Convert multiple header table to long format - r

I am reading in an Excel table with multiple rows of headers, which, through read.csv, creates an object like this in R.
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
How do I now 'flatten' this so that I end up with an R table with "Min Age", "Max Age", "Min Duration", "Max Duration", "Area", "Level" and "Price" columns? The "Area" column should show either "EU" or "US", the "Level" column either 1, 2 or 3, and the "Price" column the corresponding price found in the Excel table.
I would use the gather function from tidyr if there weren't multiple header rows, but I can't get it to work with this data. Any ideas?
The output should have a total of 36 rows plus headers.

If you skip the first row, as suggested by akrun, you will presumably end up with data that looks something like this: (with "X"s and ".1"/".2" added automatically by R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
With this data, you can then use gather to collect all headers beginning with X into one column and the prices into another. You can then separate the headers into "Level" and "Area". Finally, recode Area and remove the "X" from the levels.
df %>%
gather(headers, Price, starts_with("X")) %>%
separate(headers, c("Level", "Area")) %>%
mutate(Area = if_else(Area == "1", "EU", "US"),
Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
Created on 2018-10-12 by the reprex package (v0.2.1)
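In more recent versions of tidyr, the same reshaping can be sketched with pivot_longer, which splits the header into two columns in a single step. This is only an illustration on a cut-down table (two key columns, two levels); the regular expression assumes the headers look like X1.1, X2.1 and so on, as in the data above.

```r
library(dplyr)
library(tidyr)

# Cut-down version of the table above, for illustration only
df <- tibble::tribble(
  ~Min.Age, ~Max.Age, ~X1.1, ~X2.1, ~X1.2, ~X2.2,
  "18", "21", "0.12", "0.32", "0.80", "0.90",
  "22", "25", "0.20", "0.40", "0.85", "0.98"
)

long <- df %>%
  pivot_longer(
    cols = starts_with("X"),
    names_to = c("Level", "Area"),          # one column per captured group
    names_pattern = "X(\\d+)\\.(\\d+)",     # assumes headers of the form X<level>.<area>
    values_to = "Price"
  ) %>%
  mutate(
    Area = if_else(Area == "1", "EU", "US"), # suffix 1 is EU, suffix 2 is US
    Level = as.integer(Level),
    Price = as.numeric(Price)
  )

long
```

Each original row contributes one output row per price column, so the full 6 x 6 price block would give the 36 rows asked for in the question.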
P.S. You can find lots of spreadsheet munging workflows here: https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html

Related

R select rows in dataframe by external vector as index

I have the following data and I want to subset some rows from the table if the name is in the vector l.
df <-data.frame("Names" = c("TIGIT", "ABCB1", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "ABL1", "CD2", "IL12A", "PSEN2", "CD3G", "CD28", "PSEN1", "ITGA1"),"1S" = c("5", "6", "8", "99", "5", "0", "1", "3", "15", "15", "34", "62", "54", "6", "8", "9"), "1T" = c("6", "4", "6", "9", "5", "11", "33", "7", "8", "24", "34", "62", "66", "4", "78", "44"))
rownames(df) <- df$Names
df <- df %>% select(-"Names") # df I have
l <- c("TIGIT", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "CD2", "PSEN2", "CD3G", "CD28", "PSEN1") # genes I want to select
I want to get the following table in the output.
X1S X1T
TIGIT 5 6
CD8B 8 6
CD8A 99 9
CD1C 5 5
F2RL1 0 11
LCP1 1 33
LAG3 3 7
CD2 15 24
PSEN2 62 62
CD3G 54 66
CD28 6 4
PSEN1 8 78
It is easier to filter by the gene names if you keep them as a column
instead of making them rownames.
The following changes to your code will get you the result you are looking for.
library(tidyverse)
df <-data.frame("Names" = c("TIGIT", "ABCB1", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "ABL1", "CD2", "IL12A", "PSEN2", "CD3G", "CD28", "PSEN1", "ITGA1"),"1S" = c("5", "6", "8", "99", "5", "0", "1", "3", "15", "15", "34", "62", "54", "6", "8", "9"), "1T" = c("6", "4", "6", "9", "5", "11", "33", "7", "8", "24", "34", "62", "66", "4", "78", "44"))
genes_to_select <- c("TIGIT", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "CD2", "PSEN2", "CD3G", "CD28", "PSEN1") # genes I want to select
df <-
df %>%
filter(Names %in% genes_to_select) %>%
column_to_rownames("Names") %>%
mutate(across(.fns = as.numeric)) %>%
as.matrix()
df
#> X1S X1T
#> [1,] 5 6
#> [2,] 8 6
#> [3,] 99 9
#> [4,] 5 5
#> [5,] 0 11
#> [6,] 1 33
#> [7,] 3 7
#> [8,] 15 24
#> [9,] 62 62
#> [10,] 54 66
#> [11,] 6 4
#> [12,] 8 78
We could also use slice, provided Names is still a column rather than the row names:
library(dplyr)
library(tibble)
df %>%
slice(match(l, Names)) %>%
column_to_rownames('Names')
One line does the job:
df[rownames(df) %in% l,]
X1S X1T
TIGIT 5 6
CD8B 8 6
CD8A 99 9
CD1C 5 5
F2RL1 0 11
LCP1 1 33
LAG3 3 7
CD2 15 24
PSEN2 62 62
CD3G 54 66
CD28 6 4
PSEN1 8 78
Or if you have Names:
df[df$Names %in% l,]
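One detail worth noting is row order: %in% keeps the rows in the order of the data frame, while indexing by name returns them in the order of the lookup vector. A minimal base R sketch on a shortened, made-up version of the table (the vector wanted and the name "NOT_THERE" are hypothetical):

```r
# Shortened example table with gene names as row names
df <- data.frame(
  Names = c("TIGIT", "ABCB1", "CD8B", "CD8A"),
  X1S = c(5, 6, 8, 99),
  X1T = c(6, 4, 6, 9)
)
rownames(df) <- df$Names
df$Names <- NULL

# Lookup vector in a different order, including a name that is not in df
wanted <- c("CD8A", "TIGIT", "NOT_THERE")

# Keeps the order of df
in_df_order <- df[rownames(df) %in% wanted, ]

# Keeps the order of wanted; intersect() drops names missing from df,
# which would otherwise produce all-NA rows
in_wanted_order <- df[intersect(wanted, rownames(df)), ]

rownames(in_df_order)
rownames(in_wanted_order)
```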

How to extract records to those patients who got admitted before discharge in another hospital

I am analyzing data of patient admission/discharge in a number of hospitals for various inconsistencies.
My data structure is as follows:
row_id : a unique identifier of records (used as a foreign key in some other table)
patient_id : unique identifier key for a patient
pack_id : the medical package chosen by the patient for treatment
hosp_id : unique identifier for a hospital
admn_date : the date of admission
discharge_date : the date of discharge of the patient
Snapshot of data
row_id patient_id pack_id hosp_id admn_date discharge_date
1 1 12 1 01-01-2020 14-01-2020
2 1 62 2 03-01-2020 15-01-2020
3 1 77 1 16-01-2020 27-01-2020
4 1 86 1 18-01-2020 19-01-2020
5 1 20 2 22-01-2020 25-01-2020
6 2 55 3 01-01-2020 14-01-2020
7 2 86 3 03-01-2020 17-01-2020
8 2 72 4 16-01-2020 27-01-2020
9 1 7 1 26-01-2020 30-01-2020
10 3 54 5 14-01-2020 22-01-2020
11 3 75 5 09-02-2020 17-02-2020
12 3 26 6 22-01-2020 05-02-2020
13 4 21 7 14-04-2020 23-04-2020
14 4 12 7 23-04-2020 29-04-2020
15 5 49 8 17-03-2020 26-03-2020
16 5 35 9 27-02-2020 07-03-2020
17 6 51 10 12-04-2020 15-04-2020
18 7 31 11 11-02-2020 17-02-2020
19 8 10 12 07-03-2020 08-03-2020
20 8 54 13 20-03-2020 23-03-2020
sample dput of data is as under:
df <- structure(list(row_id = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20"), patient_id = c("1", "1", "1", "1", "1", "2", "2",
"2", "1", "3", "3", "3", "4", "4", "5", "5", "6", "7", "8", "8"
), pack_id = c("12", "62", "77", "86", "20", "55", "86", "72",
"7", "54", "75", "26", "21", "12", "49", "35", "51", "31", "10",
"54"), hosp_id = c("1", "2", "1", "1", "2", "3", "3", "4", "1",
"5", "5", "6", "7", "7", "8", "9", "10", "11", "12", "13"), admn_date = structure(c(18262,
18264, 18277, 18279, 18283, 18262, 18264, 18277, 18287, 18275,
18301, 18283, 18366, 18375, 18338, 18319, 18364, 18303, 18328,
18341), class = "Date"), discharge_date = structure(c(18275,
18276, 18288, 18280, 18286, 18275, 18278, 18288, 18291, 18283,
18309, 18297, 18375, 18381, 18347, 18328, 18367, 18309, 18329,
18344), class = "Date")), row.names = c(NA, -20L), class = "data.frame")
I have to identify the records where patient got admitted without discharge from previous treatment. For this I have used the following code taking help from this thread How to know customers who placed next order before delivery/receiving of earlier order? In R -
library(tidyverse)
df %>% arrange(patient_id, admn_date, discharge_date) %>%
mutate(sort_key = row_number()) %>%
pivot_longer(c(admn_date, discharge_date), names_to ="activity",
values_to ="date", names_pattern = "(.*)_date") %>%
mutate(activity = factor(activity, ordered = T,
levels = c("admn", "discharge")),
admitted = ifelse(activity == "admn", 1, -1)) %>%
group_by(patient_id) %>%
arrange(date, sort_key, activity, .by_group = TRUE) %>%
mutate (admitted = cumsum(admitted)) %>%
ungroup() %>%
filter(admitted >1, activity == "admn")
This nicely gives me all the records where patients were admitted without being discharged from a previous treatment.
Output-
# A tibble: 6 x 8
row_id patient_id pack_id hosp_id sort_key activity date admitted
<chr> <chr> <chr> <chr> <int> <ord> <date> <dbl>
1 2 1 62 2 2 admn 2020-01-03 2
2 4 1 86 1 4 admn 2020-01-18 2
3 5 1 20 2 5 admn 2020-01-22 2
4 9 1 7 1 6 admn 2020-01-26 2
5 7 2 86 3 8 admn 2020-01-03 2
6 8 2 72 4 9 admn 2020-01-16 2
Explanation-
Row_id 2 is correct because it overlaps with row_id 1
Row_id 4 is correct because it overlaps with row_id 3
Row_id 5 is correct because it overlaps with row_id 3 (again)
Row_id 9 is correct because it overlaps with row_id 3 (again)
Row_id 7 is correct because it overlaps with row_id 6
Row_id 8 is correct because it overlaps with row_id 7
Now I am stuck at a given validation rule: patients are allowed to be admitted to the same hospital any number of times without their previous discharge being validated. In other words, I have to extract only those records where patients were admitted to a different hospital without being discharged from another hospital. If the rule applied to the same hospital, a group_by on the hosp_id field could have done the work for me, but here the case is the reverse: overlaps are allowed for the same hosp_id but not for different ones.
Please help. How may I proceed?
If I could map each resulting row_id to the row_id of its overlapping record, maybe we could solve the problem.
Desired Output-
row_id
2
5
8
because row_ids 4, 9 and 7 overlap with records that have the same hospital id.
Thanks in advance.
P.S. Though a working solution has been given, I would like to know whether it can be done with the map/apply family of functions and/or with the data.table package.
Is this what you're looking for? (Refer to the comments in the code for details. I can provide clarifications if necessary.)
#Your data
df <- structure(list(row_id = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20"), patient_id = c("1", "1", "1", "1", "1", "2", "2",
"2", "1", "3", "3", "3", "4", "4", "5", "5", "6", "7", "8", "8"
), pack_id = c("12", "62", "77", "86", "20", "55", "86", "72",
"7", "54", "75", "26", "21", "12", "49", "35", "51", "31", "10",
"54"), hosp_id = c("1", "2", "1", "1", "2", "3", "3", "4", "1",
"5", "5", "6", "7", "7", "8", "9", "10", "11", "12", "13"), admn_date = structure(c(18262,
18264, 18277, 18279, 18283, 18262, 18264, 18277, 18287, 18275,
18301, 18283, 18366, 18375, 18338, 18319, 18364, 18303, 18328,
18341), class = "Date"), discharge_date = structure(c(18275,
18276, 18288, 18280, 18286, 18275, 18278, 18288, 18291, 18283,
18309, 18297, 18375, 18381, 18347, 18328, 18367, 18309, 18329,
18344), class = "Date")), row.names = c(NA, -20L), class = "data.frame")
#Solution
library(dplyr)
library(tidyr)
library(stringr)
library(magrittr)
library(lubridate)
#Convert patient_id column into numeric
df$patient_id <- as.numeric(df$patient_id)
#Create empty (well, 1 row) data.frame to
#collect output data
#This needs three additional columns
#(as indicated)
outdat <- data.frame(matrix(nrow = 1, ncol = 9), stringsAsFactors = FALSE)
names(outdat) <- c(names(df), "ref_discharge_date", "ref_hosp_id", "overlap")
#Logic:
#For each unique patient_id take all
#their records.
#For each row of each such set of records
#compare its discharge_date with the admn_date
#of all other records with admn_date >= its own
#admn_date
#Then register the time interval between this row's
#discharge_date and the compared row's admn_date
#as a numeric value ("overlap")
#The idea is that concurrent hospital stays will have
#negative overlaps as the admn_date (of the current stay)
#will precede the discharge_date (of the previous one)
for(i in 1:length(unique(df$patient_id))){
#i <- 7
curdat <- df %>% filter(patient_id == unique(df$patient_id)[i])
curdat %<>% mutate(admn_date = lubridate::as_date(admn_date),
discharge_date = lubridate::as_date(discharge_date))
curdat %<>% arrange(admn_date)
for(j in 1:nrow(curdat)){
#j <- 1
currow <- curdat[j, ]
#otrows <- curdat[-j, ]
#
otrows <- curdat %>% filter(admn_date >= currow$admn_date)
#otrows <- curdat
for(k in 1:nrow(otrows)){
otrows$ref_discharge_date[k] <- currow$discharge_date
#otrows$refdisc[k] <- as_date(otrows$refdisc[k])
otrows$ref_hosp_id[k] <- currow$hosp_id
otrows$overlap[k] <- as.numeric(difftime(otrows$admn_date[k], currow$discharge_date))
}
otrows$ref_discharge_date <- as_date(otrows$ref_discharge_date)
outdat <- bind_rows(outdat, otrows)
}
}
rm(curdat, i, j, k, otrows, currow)
#Removing that NA row + removing all self-rows
outdat %<>%
filter(!is.na(patient_id)) %>%
filter(discharge_date != ref_discharge_date)
#Filter out only negative overlaps
outdat %<>% filter(overlap < 0)
#Filter out only those records where the patient
#was admitted to different hospitals
outdat %<>% filter(hosp_id != ref_hosp_id)
outdat
# row_id patient_id pack_id hosp_id admn_date discharge_date ref_discharge_date ref_hosp_id overlap
# 1 2 1 62 2 2020-01-03 2020-01-15 2020-01-14 1 -11
# 2 5 1 20 2 2020-01-22 2020-01-25 2020-01-27 1 -5
# 3 8 2 72 4 2020-01-16 2020-01-27 2020-01-17 3 -1
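The nested loops above compare every stay of a patient with that patient's other stays. The same comparison can be sketched without loops as a self-join. The version below uses base R merge rather than data.table, on the five rows of patient 1 that matter for the example; the _prev suffix and the handling of same-day admissions (>=) are assumptions made for illustration.

```r
# Subset of the question's data (patient 1, first five records)
df <- data.frame(
  row_id = 1:5,
  patient_id = c(1, 1, 1, 1, 1),
  hosp_id = c(1, 2, 1, 1, 2),
  admn_date = as.Date(c("2020-01-01", "2020-01-03", "2020-01-16",
                        "2020-01-18", "2020-01-22")),
  discharge_date = as.Date(c("2020-01-14", "2020-01-15", "2020-01-27",
                             "2020-01-19", "2020-01-25"))
)

# Pair every stay with every stay of the same patient;
# columns of the second copy get the suffix "_prev"
pairs <- merge(df, df, by = "patient_id", suffixes = c("", "_prev"))

# Keep pairs where the stay starts during an earlier stay in a different hospital
overlaps <- subset(
  pairs,
  row_id != row_id_prev &                 # not the same record
    hosp_id != hosp_id_prev &             # different hospital
    admn_date >= admn_date_prev &         # admitted on or after the other admission
    admn_date < discharge_date_prev       # but before the other discharge
)

flagged <- sort(unique(overlaps$row_id))
flagged
```

The join materialises all pairs per patient, so it grows quadratically with the number of stays of a single patient; that is usually fine for admission data but worth keeping in mind.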
Group by the patient id again and then count the hospital IDs. Then merge that back on and filter the data.
Something like:
admitted_not_validated %>%
left_join(
admitted_not_validated %>%
group_by(patient_id) %>%
summarize (multi_hosp = length(unique(hosp_id)),.groups ='drop'),
by = 'patient_id') %>%
filter(multi_hosp >1)

Pivot from long format to wide format in a dataframe [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
pivot_wider when there's no names column (or when names column should be created)
(2 answers)
Closed 2 years ago.
I have the dataframe below:
dput(Moment[1:15,])
structure(list(SectionCut = c("1", "1", "1", "1", "2", "2", "2",
"2", "3", "3", "3", "3", "Left", "Left", "Left"), N_l = c("1",
"2", "3", "4", "1", "2", "3", "4", "1", "2", "3", "4", "1", "2",
"3"), UG = c("84", "84", "84", "84", "84", "84", "84", "84",
"84", "84", "84", "84", "84", "84", "84"), S = c("12", "12",
"12", "12", "12", "12", "12", "12", "12", "12", "12", "12", "12",
"12", "12"), Sample = c("S00", "S00", "S00", "S00", "S00", "S00",
"S00", "S00", "S00", "S00", "S00", "S00", "S00", "S00", "S00"
), DF = c(0.367164093630677, 0.540130283330855, 0.590662743113521,
0.497030982705986, 0.000319303760901125, 0.000504925126205843,
0.00051127115578891, 0.000395434233037301, 0.413218926236695,
0.610726262711904, 0.685000816613652, 0.59474035159783, 0.483354599644366,
0.645710184115934, 0.625883097885242)), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))
I want to separate the content of the DF column by pivoting on the SectionCut column. I would basically like to use the opposite of pivot_longer, so that in the end the values in column DF are shown under 5 different columns, one for each value of SectionCut: c("1", "2", "3", "left", "right").
We could use pivot_wider from tidyr after creating a sequence column with rowid
library(dplyr)
library(tidyr)
library(data.table)
Moment %>%
mutate(rn = rowid(SectionCut)) %>%
pivot_wider(names_from = SectionCut, values_from = DF)
-output
# A tibble: 4 x 9
# N_l UG S Sample rn `1` `2` `3` Left
# <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#1 1 84 12 S00 1 0.367 0.000319 0.413 0.483
#2 2 84 12 S00 2 0.540 0.000505 0.611 0.646
#3 3 84 12 S00 3 0.591 0.000511 0.685 0.626
#4 4 84 12 S00 4 0.497 0.000395 0.595 NA
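If you would rather not load data.table just for rowid, the per-group sequence can be sketched with dplyr alone using row_number() inside group_by(). The data below is a shortened, rounded stand-in for Moment, for illustration only.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for the Moment data
Moment <- tibble(
  SectionCut = c("1", "1", "2", "2", "Left"),
  N_l = c("1", "2", "1", "2", "1"),
  DF = c(0.37, 0.54, 0.0003, 0.0005, 0.48)
)

wide <- Moment %>%
  group_by(SectionCut) %>%
  mutate(rn = row_number()) %>%   # sequence within each SectionCut
  ungroup() %>%
  pivot_wider(names_from = SectionCut, values_from = DF)

wide
```

Groups with fewer rows than the others (here "Left") are padded with NA, as in the output above.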

aggregate subset returning this error: NAs introduced by coercion

I'm having trouble finding the mean for a subset of data. Here are the two questions I'm hoping to answer. The first seems to be working fine, but the second returns the same answer as the first, but without numbers to the right of the decimal place. What's going on?
There is also an error that appears (repeated four times):
NAs introduced by coercion
# What is the mean suspension rate for schools by farms overall?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
mean(as.numeric(as.character(suspension_rate_total))))
# What is the mean suspension rate for schools with farms > 100?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
mean(as.numeric(as.character(suspension_rate_total))), subset = farms< 100)
Data
merged_data <- structure(list(schid = c("1030642", "1030766", "1030774", "1030840",
"1130103", "1230150", "1530435", "1530492", "1530500", "1931047",
"1931708", "1931864", "1932623", "1933746", "1937226", "1938554",
"1938612", "1938885", "1995836", "1996016"), farms = c("132",
"116", "348", "406", "68", "130", "370", "204", "225", "2,616",
"1,106", "1,918", "1,148", "2,445", "1,123", "1,245", "1,369",
"1,073", "932", "178"), foster = c("2", "0", "1", "8", "1", "4",
"4", "0", "0", "22", "11", "12", "2", "8", "13", "13", "4", "3",
"2", "3"), homeless = c("14", "0", "8", "4", "1", "4", "5", "0",
"14", "35", "42", "116", "9", "8", "34", "54", "26", "31", "5",
"11"), migrant = c("0", "0", "0", "0", "0", "0", "18", "0", "0",
"0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0"), ell = c("18",
"12", "114", "45", "7", "4", "50", "28", "26", "274", "212",
"325", "95", "112", "232", "185", "121", "84", "24", "35"), suspension_rate_total = c("*",
"20", "0", "0", "95", "5", "*", "256", "78", "33", "20", "1",
"218", "120", "0", "0", "*", "*", "*", "0"), suspension_violent = c("*",
"9", "0", "0", "20", "2", "*", "38", "0", "6", "3", "0", "53",
"35", "0", "0", "*", "*", "*", "0"), suspension_violent_no_injury = c("*",
"6", "0", "0", "47", "1", "*", "121", "52", "7", "13", "1", "77",
"44", "0", "0", "*", "*", "*", "0"), suspension_weapon = c("*",
"0", "0", "0", "8", "0", "*", "1", "0", "1", "1", "0", "4", "3",
"0", "0", "*", "*", "*", "0"), suspension_drug = c("*", "0",
"0", "0", "9", "1", "*", "59", "12", "16", "0", "0", "6", "5",
"0", "0", "*", "*", "*", "0"), suspension_defiance = c("*", "1",
"0", "0", "9", "1", "*", "16", "12", "0", "3", "0", "69", "30",
"0", "0", "*", "*", "*", "0"), suspension_other = c("*", "4",
"0", "0", "2", "0", "*", "21", "2", "3", "0", "0", "9", "3",
"0", "0", "*", "*", "*", "0")), row.names = c(NA, 20L), class = "data.frame")
Thank you so much.
Tidy up your data:
# replace * with NA
merged_data$suspension_rate_total[merged_data$suspension_rate_total == '*'] <- NA
# convert character to numeric format
merged_data$suspension_rate_total <- as.numeric(merged_data$suspension_rate_total)
# remove comma in strings and convert character to numeric format
merged_data$farms <- as.numeric(gsub(",", "", merged_data$farms))
Output
# What is the mean suspension rate for schools by farms overall?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = mean, na.rm = TRUE)
# farms suspension_rate_total
# 1 68 95
# 2 116 20
# 3 130 5
# 4 178 0
# 5 204 256
# 6 225 78
# 7 348 0
# 8 406 0
# 9 1106 20
# 10 1123 0
# 11 1148 218
# 12 1245 0
# 13 1918 1
# 14 2445 120
# 15 2616 33
# What is the mean suspension rate for schools with farms > 100?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = mean, na.rm = TRUE, subset = farms > 100)
# farms suspension_rate_total
# 1 116 20
# 2 130 5
# 3 178 0
# 4 204 256
# 5 225 78
# 6 348 0
# 7 406 0
# 8 1106 20
# 9 1123 0
# 10 1148 218
# 11 1245 0
# 12 1918 1
# 13 2445 120
# 14 2616 33
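The two cleaning steps can also be wrapped in a small helper so that every affected column is converted the same way. This is a sketch only: to_number is a hypothetical name, and it assumes that "*" is the only placeholder for missing values and "," the only thousands separator.

```r
# Hypothetical helper: strip thousands separators, map "*" to NA, convert
to_number <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)
  x[x == "*"] <- NA
  as.numeric(x)
}

farms <- to_number(c("132", "2,616", "68"))
rate <- to_number(c("*", "33", "95"))

# Mean suspension rate for schools with more than 100 farms, ignoring NAs
mean_over_100 <- mean(rate[farms > 100], na.rm = TRUE)
mean_over_100
```

Because the placeholder is replaced before as.numeric is called, no coercion warning is raised.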
Are you sure 'NAs introduced by coercion' is an error and not a warning?
When you convert a character column to numeric with
as.numeric(as.character(suspension_rate_total)), values that cannot be parsed as numbers (the "*" entries in your data) are coerced to NA, and R reports this through warnings.
Also, I get different answers for both blocks of code
> aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
+ mean(as.numeric(as.character(suspension_rate_total))))
farms suspension_rate_total
1 68 95
2 116 20
3 130 5
4 132 NA
5 178 0
6 204 256
7 225 78
8 348 0
9 370 NA
10 406 0
11 932 NA
> aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
+ mean(as.numeric(as.character(suspension_rate_total))), subset = farms< 100)
farms suspension_rate_total
1 68 95
Further, the comment on your second block of code mentions farms > 100, but in your code you used subset = farms < 100.
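To see that the coercion message is a warning and that the call still returns a result, the conversion can be run with a handler that records the message. A small sketch on made-up values; the variable names are illustrative.

```r
caught <- character(0)

# Run the conversion, record any warning message, then continue
rate <- withCallingHandlers(
  as.numeric(c("20", "*", "95")),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

rate                              # "*" has become NA, the other values are kept
caught                            # the recorded warning text
mean_all <- mean(rate)            # NA, because one value is missing
mean_ok <- mean(rate, na.rm = TRUE)
```

This is why the mean for a group containing "*" comes out as NA unless na.rm = TRUE is passed.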

How to combine a list and data.frame in R?

I have a list named data3, like this (from JSON file).
data3 <- list(structure(c(14, 7, 10, 4, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(16, 10, 12, 6, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(77708, 39434, 45489, 30223, 34829 ), .Names = c("0", "3", "2", "14", "7")), structure(c(9828, 6855, 7967, 5638, 6263), .Names = c("0", "3", "2", "14", "7")), structure(c(7626, 5783, 6406, 5074, 5348), .Names = c("0", "3", "2", "14", "7")), structure(c(1012, 404, 546, 251, 300), .Names = c("0", "3", "2", "14", "7")))
and it has some missing values like
data3[4]
[[1]]
0 3 2 14 7
9828 6855 7967 5638 6263
> data3[400]
[[1]]
0 3 2
44 35 38
And I have a data.frame named data1, like this:
date d1 d2 d3 d4
3 20150402 4 5693 0 NEW
4 20150402 4 5693 0 UPGRADE(OEM)
5 20150402 4 5693 0 UPGRADE(ONLINE)
...
I need to combine them like
date d1 d2 d3 d4 0 2 3 7 14
20150402 4 5693 0 NEW 77708 39434 45489 30223 34829
The problem is that not all of data3 has the same number of elements.
I have tried this:
aaa <- NULL
for (i in 1:482){
aaa <- cbind(data1[i, ],data3[[i]])
}
but it didn't work.
Maybe there is another way to do this, but I have no idea.
I cannot reproduce your data1 data.frame, so I am posting an example that uses the first 6 rows of the popular iris dataset:
> data3 <- list(structure(c(14, 7, 10, 4, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(16, 10, 12, 6, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(77708, 39434, 45489, 30223, 34829 ), .Names = c("0", "3", "2", "14", "7")), structure(c(9828, 6855, 7967, 5638, 6263), .Names = c("0", "3", "2", "14", "7")), structure(c(7626, 5783, 6406, 5074, 5348), .Names = c("0", "3", "2", "14", "7")), structure(c(1012, 404, 546, 251, 300), .Names = c("0", "3", "2", "14", "7")))
>
>
> t(as.data.frame(data3)) -> x
> rownames(x) <- NULL
> x
0 3 2 14 7
[1,] 14 7 10 4 7
[2,] 16 10 12 6 7
[3,] 77708 39434 45489 30223 34829
[4,] 9828 6855 7967 5638 6263
[5,] 7626 5783 6406 5074 5348
[6,] 1012 404 546 251 300
> cbind(iris[1:6,],x)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species 0 3 2 14 7
1 5.1 3.5 1.4 0.2 setosa 14 7 10 4 7
2 4.9 3.0 1.4 0.2 setosa 16 10 12 6 7
3 4.7 3.2 1.3 0.2 setosa 77708 39434 45489 30223 34829
4 4.6 3.1 1.5 0.2 setosa 9828 6855 7967 5638 6263
5 5.0 3.6 1.4 0.2 setosa 7626 5783 6406 5074 5348
6 5.4 3.9 1.7 0.4 setosa 1012 404 546 251 300
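The example above works when every list element has the same names. For elements with missing entries, as in data3[400] in the question, one possible sketch is to index each vector by the union of all names, so that missing entries become NA. The two-element data3 and the two-row data1 below are made up for illustration.

```r
# Two list elements, the second one lacking the "14" and "7" entries
data3 <- list(
  c("0" = 14, "3" = 7, "2" = 10, "14" = 4, "7" = 7),
  c("0" = 44, "3" = 35, "2" = 38)
)

# Hypothetical stand-in for data1
data1 <- data.frame(
  date = c(20150402, 20150402),
  d4 = c("NEW", "UPGRADE(OEM)")
)

# Union of all names, in order of first appearance
all_names <- unique(unlist(lapply(data3, names)))

# Indexing a named vector by a missing name yields NA
x <- do.call(rbind, lapply(data3, function(v) unname(v[all_names])))
colnames(x) <- all_names

combined <- cbind(data1, x)
combined
```

This relies on data1 and data3 having the same number of rows and elements, in the same order.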
