rbindlist data.tables different dimensions - r

I run a function several times, and each call returns a list of data.tables, as in this example:
require(data.table)
myfunction<-function(x){
DT1<-data.table(a=c(1,2,3),b=c("a","b","c"))
DT2<-data.table(d=c(4,5,6), e=c("d","e","f"))
return(list(DT1=DT1, DT2=DT2))
}
result<-lapply(1:2, myfunction)
I want to bind the results table by table. The desired output is shown below, built by hand. My real example uses hundreds of tables.
l1<-rbindlist(list(result[[1]]$DT1, result[[2]]$DT1), idcol = TRUE)
l2<-rbindlist(list(result[[1]]$DT2, result[[2]]$DT2), idcol = TRUE)
DESIRED_OUTPUT<-list(l1, l2)
I tried the approach from this question, but it does not work:
rbindlist data.tables with different number of columns
======================================================================
Update
The option that @nicola proposed doesn't work when the number of elements in the list differs from 2, which is what the first example has (DT1 and DT2). As a solution, I create a variable "l" that holds the number of elements in the list returned by the function.
New example with solution.
require(data.table)
myfunction<-function(x){
DT1<-data.table(a=c(1,2,3),b=c("a","b","c"))
DT2<-data.table(d=c(4,5), e=c("d","e"))
DT3<-data.table(f=c(7,8,NA,9), g=c("g","h","i","j"))
return(list(DT1=DT1, DT2=DT2, DT3=DT3))
}
result<-lapply(1:5, myfunction)
l<-unique(sapply(result, length))
apply(matrix(unlist(result,recursive=FALSE),nrow=l),1,rbindlist,idcol=TRUE)

Here's an option:
do.call(function(...) Map(function(...) rbind(..., idcol = T), ...), result)
#$DT1
# .id a b
#1: 1 1 a
#2: 1 2 b
#3: 1 3 c
#4: 2 1 a
#5: 2 2 b
#6: 2 3 c
#
#$DT2
# .id d e
#1: 1 4 d
#2: 1 5 e
#3: 1 6 f
#4: 2 4 d
#5: 2 5 e
#6: 2 6 f
Here's another:
lapply(purrr::transpose(result), rbindlist, idcol = T)
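The same transposition idea can be sketched without purrr. This is a minimal sketch, assuming every element of result is a list carrying the same names; table_names and combined are names introduced here for illustration only.

```r
library(data.table)

# Same shape as the question's example: each call returns a named list of tables.
myfunction <- function(x) {
  DT1 <- data.table(a = c(1, 2, 3), b = c("a", "b", "c"))
  DT2 <- data.table(d = c(4, 5, 6), e = c("d", "e", "f"))
  list(DT1 = DT1, DT2 = DT2)
}
result <- lapply(1:3, myfunction)

# Assumption: every element of `result` carries the same names.
table_names <- names(result[[1]])

# For each name, pull that table out of every run and stack the copies.
combined <- lapply(
  setNames(table_names, table_names),
  function(nm) rbindlist(lapply(result, `[[`, nm), idcol = TRUE)
)
```

Because the tables are looked up by name rather than by position, this should not depend on how many tables each call returns, as long as the names agree across calls.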

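An attempt that should match on names of the list components (assuming l is the result list flattened by one level):
# flatten one level so the names are DT1, DT2, DT1, DT2
l <- unlist(result, recursive=FALSE)
Map(
function(LL,n) rbindlist(unname(LL[names(l) %in% n]), idcol=TRUE),
list(unlist(result, recursive=FALSE)),
unique(names(l))
)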
#[[1]]
# .id a b
#1: 1 1 a
#2: 1 2 b
#3: 1 3 c
#4: 2 1 a
#5: 2 2 b
#6: 2 3 c
#
#[[2]]
# .id d e
#1: 1 4 d
#2: 1 5 e
#3: 1 6 f
#4: 2 4 d
#5: 2 5 e
#6: 2 6 f

Related

Creating data.table from a list of unequal vector lengths

I am looking to create a data.table from a list of unequal vectors, but instead of repeating the values for the "shorter" vector, I want it to be filled with NAs. I have one possible solution, but it repeats values and does not retain the NA as needed.
Example:
library(data.table)
my_list <- list(A = 1:4, B = letters[1:5])
as.data.table(do.call(cbind, my_list))
A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 1 e
But I want it to look like:
as.data.table(do.call(cbind, my_list))
A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: NA e
Thank you!
We need to make the lengths equal by padding the shorter list elements with NA up to the maximum length:
mx <- max(lengths(my_list))
as.data.table(do.call(cbind, lapply(my_list, `length<-`, mx)))
-output
A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: <NA> e
Instead of cbind/as.data.table, setDT is more compact
setDT(lapply(my_list, `length<-`, mx))[]
A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: NA e
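The padding step can also be wrapped in a small function. This is only a sketch: pad_to_data_table is a hypothetical helper name, not part of data.table, and it assumes the input is a named list of atomic vectors.

```r
library(data.table)

# Hypothetical helper: pad every vector with NA up to the longest length,
# then build a data.table with one column per list element.
pad_to_data_table <- function(lst) {
  target <- max(lengths(lst))
  padded <- lapply(lst, function(v) {
    length(v) <- target   # extending a vector fills the new slots with NA
    v
  })
  as.data.table(padded)
}

my_list <- list(A = 1:4, B = letters[1:5])
padded_dt <- pad_to_data_table(my_list)
```

Since each vector is padded on its own, column A stays integer instead of being coerced to character the way cbind does.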
You may use stringi::stri_list2matrix to pad all the list elements to equal length.
my_list |>
stringi::stri_list2matrix() |>
data.table::as.data.table() |>
type.convert(as.is = TRUE) |>
setNames(names(my_list))
# A B
#1: 1 a
#2: 2 b
#3: 3 c
#4: 4 d
#5: NA e

Merge multiple numeric column as list typed column in data.table [R]

I'm trying to find a way to merge multiple numeric columns into a new list-type column.
Data Table
dt <- data.table(
a=c(1,2,3),
b=c(4,5,6),
c=c(7,8,9)
)
Expected Result
a b c d
1: 1 4 7 1,4,7
2: 2 5 8 2,5,8
3: 3 6 9 3,6,9
Attempt 1
I tried assigning a list with dt[,d:=list(c(a,b,c))], but c(a,b,c) concatenates the three whole columns, so every row gets the same nine values, which is incorrect:
a b c d
1: 1 4 7 1,2,3,4,5,6,...
2: 2 5 8 1,2,3,4,5,6,...
3: 3 6 9 1,2,3,4,5,6,...
Do a group by row and place the elements in the list
dt[, d := .(list(unlist(.SD, recursive = FALSE))), 1:nrow(dt)]
-output
dt
a b c d
1: 1 4 7 1,4,7
2: 2 5 8 2,5,8
3: 3 6 9 3,6,9
Or another option is paste and strsplit
dt[, d := strsplit(do.call(paste, c(.SD, sep=",")), ",")]
Or may use transpose
dt[, d := lapply(data.table::transpose(unname(.SD)), unlist)]
dt
a b c d
1: 1 4 7 1,4,7
2: 2 5 8 2,5,8
3: 3 6 9 3,6,9
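Another option uses purrr::pmap (run it on the original three-column dt, before d exists, since .SD would otherwise include d):
dt[, d := purrr::pmap(.SD, ~c(...))]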
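To see why Attempt 1 fails and what a row-wise version needs, here is a small base R sketch. It is written for exactly the three columns a, b and c of the question, and the names whole and row_vectors are introduced only for illustration.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# c(a, b, c) concatenates the three whole columns into one vector of length 9,
# which is why Attempt 1 puts the same nine values in every row.
whole <- dt[, c(a, b, c)]

# Map pairs up the i-th elements of each column: one short vector per row.
row_vectors <- Map(function(x, y, z) c(x, y, z), dt$a, dt$b, dt$c)

# Wrapping the list in .() adds it as a single list column.
dt[, d := .(row_vectors)]
```

Unlike the paste and strsplit option, this keeps the elements numeric rather than converting them to character.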

How to quickly split a data.table in R based on factor column into a list?

An example with the required result (achieved by hard-coding):
DT <- data.table(val=1:8, f=c('a','b','b','a','b','a','a','c'))
required <- list(DT[f=='a'], DT[f=='b'], DT[f=='c'])
There's a split method for objects of class "data.table". But unlike in base R's method for data.frames, there is an argument by that needs quoted column names.
From help('split.data.table'), my emphasis:
Details
Argument f is just for consistency in usage to data.frame method. Recommended is to use by argument instead, it will be faster, more flexible, and by default will preserve order according to order in data.
split(DT, by = 'f')
#$a
# val f
#1: 1 a
#2: 4 a
#3: 6 a
#4: 7 a
#
#$b
# val f
#1: 2 b
#2: 3 b
#3: 5 b
#
#$c
# val f
#1: 8 c
The reverse is rbindlist. It returns all the rows of the original DT, grouped by f in the order a, b, c rather than in the original row order.
rbindlist(split(DT, by = 'f'))
# val f
#1: 1 a
#2: 4 a
#3: 6 a
#4: 7 a
#5: 2 b
#6: 3 b
#7: 5 b
#8: 8 c
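If the original row order matters after the round trip, one possibility is to carry a row index through the split. This is a sketch under that assumption; row_id is a hypothetical helper column, not something split or rbindlist adds on its own.

```r
library(data.table)

DT <- data.table(val = 1:8, f = c("a", "b", "b", "a", "b", "a", "a", "c"))

# Work on a copy so the original DT is not modified by reference.
# `row_id` is a hypothetical helper column recording the original row order.
DT_indexed <- copy(DT)[, row_id := .I]

parts <- split(DT_indexed, by = "f")   # list of data.tables named a, b, c

# Stack the pieces again, then sort by the saved index to undo the grouping.
restored <- rbindlist(parts)[order(row_id)][, row_id := NULL]
```

The copy costs memory on large tables, so for big data it may be preferable to add and later drop the index column on the table itself.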

Expanding data.table by operating on a column

I want to perform an operation on a subset of rows in a data.table that result in a greater number of rows than what I started out with. Is there an easy way to expand the original data.table to accommodate this? If not, how could I accomplish this?
Here's a sample of my original data.
DT <- data.table(my.id=c(1,2,3), unmodified=c("a","b","c"), vals=c("apple",NA,"cat"))
DT
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 3 c cat
And this is my desired output.
DT
my.id unmodified vals
1: 1 a apple
2: 2 b boy
3: 2 b bat
4: 2 b bag
5: 3 c cat
The new rows can appear at the end as well, I don't care about the order. I tried DT[my.id == 2, vals := c("boy","bat","bag")], but it ignores the last 2 entries with a warning.
TIA!
EDIT: My original dataset has about 10 million rows, although the entry with a missing value occurs just once. I'd prefer not to create copies of the data.table, if possible.
You can use the summarize pattern of data.table by setting the group variables to be my.id and unmodified here; this broadcasts values within each group if the length doesn't match:
DT[, .(vals = if(my.id == 2) c("boy","bat","bag") else vals), .(my.id, unmodified)]
# my.id unmodified vals
#1: 1 a apple
#2: 2 b boy
#3: 2 b bat
#4: 2 b bag
#5: 3 c cat
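The grouped approach above can be generalised to several ids by looking replacements up in a named list. This is a rough sketch: replacements and expand_vals are hypothetical names, and it assumes each (my.id, unmodified) pair identifies a single row, as in the sample data.

```r
library(data.table)

DT <- data.table(my.id = c(1, 2, 3),
                 unmodified = c("a", "b", "c"),
                 vals = c("apple", NA, "cat"))

# Hypothetical lookup: names are ids (as character), values are the new entries.
replacements <- list("2" = c("boy", "bat", "bag"))

# Return the replacement vector for an id if one exists, else the current value.
expand_vals <- function(id, vals) {
  new_vals <- replacements[[as.character(id)]]
  if (is.null(new_vals)) vals else new_vals
}

# Grouping by the other columns lets each group return a different number of rows.
out <- DT[, .(vals = expand_vals(my.id, vals)), by = .(my.id, unmodified)]
```

Like the answer it builds on, this returns a new data.table rather than modifying DT in place.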
Another option is to subset the datasets that have 'my.id' as 2 and not 2, then rbind
rbind(DT[my.id == 2][, .(my.id, unmodified, vals = c('boy', 'bat',
'bag'))], DT[my.id != 2])[order(my.id)]
# my.id unmodified vals
#1: 1 a apple
#2: 2 b boy
#3: 2 b bat
#4: 2 b bag
#5: 3 c cat
> DT <- data.table(my.id=c(1,2,3), unmodified=c("a","b","c"), vals=c("apple",NA,"cat"))
> DT
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 3 c cat
> DT2 <- data.table(my.id=rep(2,3), unmodified=rep("b",3), vals=c("boy","bat","bag"))
> DT2
my.id unmodified vals
1: 2 b boy
2: 2 b bat
3: 2 b bag
> rbind(DT,DT2)
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 3 c cat
4: 2 b boy
5: 2 b bat
6: 2 b bag
> rbind(DT,DT2)[order(my.id),]
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 2 b boy
4: 2 b bat
5: 2 b bag
6: 3 c cat
> na.omit(rbind(DT,DT2)[order(my.id),])
my.id unmodified vals
1: 1 a apple
2: 2 b boy
3: 2 b bat
4: 2 b bag
5: 3 c cat

Strsplit on a column of a data frame [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
I have a data.frame where one of the variables is a vector (or a list), like this:
MyColumn <- c("A, B,C", "D,E", "F","G")
MyDF <- data.frame(group_id=1:4, val=11:14, cat=MyColumn)
# group_id val cat
# 1 1 11 A, B,C
# 2 2 12 D,E
# 3 3 13 F
# 4 4 14 G
I'd like to have a new data frame with as many rows as the vector
FlatColumn <- unlist(strsplit(MyColumn,split=","))
which looks like this:
MyNewDF <- data.frame(group_id=c(rep(1,3),rep(2,2),3,4), val=c(rep(11,3),rep(12,2),13,14), cat=FlatColumn)
# group_id val cat
# 1 1 11 A
# 2 1 11 B
# 3 1 11 C
# 4 2 12 D
# 5 2 12 E
# 6 3 13 F
# 7 4 14 G
In essence, for every factor which is an element of the list of MyColumn (the letters A to G), I want to assign the corresponding values of the list. Every factor appears only once in MyColumn.
Is there a neat way for this kind of reshaping/unlisting/merging? I've come up with a very cumbersome for-loop over the rows of MyDF and the length of the corresponding element of strsplit(MyColumn,split=","). I'm very sure that there has to be a more elegant way.
You can use separate_rows from tidyr:
tidyr::separate_rows(MyDF, cat)
# group_id val cat
# 1 1 11 A
# 2 1 11 B
# 3 1 11 C
# 4 2 12 D
# 5 2 12 E
# 6 3 13 F
# 7 4 14 G
How about
lst <- strsplit(MyColumn, split = ",")
k <- lengths(lst) ## expansion size
FlatColumn <- unlist(lst, use.names = FALSE)
MyNewDF <- data.frame(group_id = rep.int(MyDF$group_id, k),
val = rep.int(MyDF$val, k),
cat = FlatColumn)
# group_id val cat
#1 1 11 A
#2 1 11 B
#3 1 11 C
#4 2 12 D
#5 2 12 E
#6 3 13 F
#7 4 14 G
We can use cSplit from splitstackshape
library(splitstackshape)
cSplit(MyDF, "cat", ",", "long")
# group_id val cat
#1: 1 11 A
#2: 1 11 B
#3: 1 11 C
#4: 2 12 D
#5: 2 12 E
#6: 3 13 F
#7: 4 14 G
We can also do this in base R: use strsplit to split the 'cat' column into a list, replicate the row indices of 'MyDF' by the lengths of 'lst', and create the 'cat' column by unlisting 'lst'.
lst <- strsplit(as.character(MyDF$cat), ",")
transform(MyDF[rep(1:nrow(MyDF), lengths(lst)),-3], cat = unlist(lst))
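The same split-and-repeat idea can be sketched with data.table grouping. This assumes each (group_id, val) pair identifies one row, as in the example, and it adds a trimws call (not in the answers above) so that the stray space in "A, B,C" does not end up in the values; MyDT and long are names introduced here.

```r
library(data.table)

MyDF <- data.frame(group_id = 1:4, val = 11:14,
                   cat = c("A, B,C", "D,E", "F", "G"))
MyDT <- as.data.table(MyDF)

# Within each (group_id, val) group, split the string on commas and
# trim surrounding whitespace; the group columns are repeated automatically.
long <- MyDT[,
  .(cat = trimws(unlist(strsplit(as.character(cat), ",", fixed = TRUE)))),
  by = .(group_id, val)
]
```

The as.character call is there in case cat was read in as a factor, which is the default for data.frame in older versions of R.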
