rbindlist only elements that meet a condition - r

I have a large list. Some of the elements are strings and some of the elements are data.tables. I would like to create a big data.table, but only rbind the elements that are data.tables.
I know how to do it in a for loop, but I am looking for something more efficient as my data are big and I need something quick.
Thank you!
library(data.table)
DT1 = data.table(
ID = c("b","b","b","a","a","c"),
a = 1:6
)
DT2 = data.table(
ID = c("b","b","b","a","a","c"),
a = 11:16
)
list<- list(DT1,DT2,"string")
I am looking for a result like the one below, but since I have many entries I cannot write them out by hand like this.
rbind(DT1, DT2)

Filter the list down to its data.table elements, then rbindlist
library(data.table)
rbindlist(Filter(is.data.table, list_df))
# ID a
# 1: b 1
# 2: b 2
# 3: b 3
# 4: a 4
# 5: a 5
# 6: c 6
# 7: b 11
# 8: b 12
# 9: b 13
#10: a 14
#11: a 15
#12: c 16
data
list_df <- list(DT1,DT2,"string")

We can use keep from purrr with bind_rows
library(tidyverse)
keep(list, is.data.table) %>%
bind_rows
# ID a
# 1: b 1
# 2: b 2
# 3: b 3
# 4: a 4
# 5: a 5
# 6: c 6
# 7: b 11
# 8: b 12
# 9: b 13
#10: a 14
#11: a 15
#12: c 16
Or using rbindlist with keep
rbindlist(keep(list, is.data.table))

Using sapply() to generate a logical vector to subset your list
rbindlist(list[sapply(list, is.data.table)])
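A small variation on the subsetting answer above, as a sketch only: vapply() always returns a logical vector, whereas sapply() returns an empty list when the input list is empty, which then cannot be used as a subscript. The names mixed and is_dt are made up for this example, and the list is deliberately not called list so the base function is not shadowed.

```r
library(data.table)

DT1 <- data.table(ID = c("b", "b", "b", "a", "a", "c"), a = 1:6)
DT2 <- data.table(ID = c("b", "b", "b", "a", "a", "c"), a = 11:16)

# Hypothetical input: data.tables mixed with other objects
mixed <- list(DT1, "string", DT2)

# TRUE for each element that is a data.table, always a logical vector
is_dt <- vapply(mixed, is.data.table, logical(1))

# Bind only the data.table elements, in their original order
combined <- rbindlist(mixed[is_dt])
```

If the tables might not share exactly the same columns, rbindlist() also accepts use.names = TRUE and fill = TRUE.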

Related

Merge multiple numeric column as list typed column in data.table [R]

I'm trying to find a way to merge multiple numeric columns into a new list-type column.
Data Table
dt <- data.table(
a=c(1,2,3),
b=c(4,5,6),
c=c(7,8,9)
)
Expected Result
a b c d
1: 1 4 7 1,4,7
2: 2 5 8 2,5,8
3: 3 6 9 3,6,9
Attempt 1
I tried building the list with dt[,d:=list(c(a,b,c))], but that concatenates all three columns in full and puts the same vector in every row, which is not the result I want
a b c d
1: 1 4 7 1,2,3,4,5,6,...
2: 2 5 8 1,2,3,4,5,6,...
3: 3 6 9 1,2,3,4,5,6,...
Group by row and place each row's elements in a list
dt[, d := .(list(unlist(.SD, recursive = FALSE))), 1:nrow(dt)]
Output
dt
a b c d
1: 1 4 7 1,4,7
2: 2 5 8 2,5,8
3: 3 6 9 3,6,9
Another option is paste and strsplit (note that this gives character vectors rather than numeric ones)
dt[, d := strsplit(do.call(paste, c(.SD, sep=",")), ",")]
Or use transpose
dt[, d := lapply(data.table::transpose(unname(.SD)), unlist)]
dt
a b c d
1: 1 4 7 1,4,7
2: 2 5 8 2,5,8
3: 3 6 9 3,6,9
Or use pmap from purrr
dt[, d := purrr::pmap(.SD, ~c(...))]
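One thing to watch if you try these options one after another: .SD contains every column not used for grouping, so once d exists it is part of .SD too. A minimal sketch that restricts .SD with .SDcols and checks the element types; the names cols and d_chr are made up for this example.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Source columns to combine; keeps .SD independent of later additions
cols <- c("a", "b", "c")

# One group per row: each row's values become one numeric vector
dt[, d := .(list(unlist(.SD, use.names = FALSE))),
   by = seq_len(nrow(dt)), .SDcols = cols]

# paste + strsplit variant: the elements are character vectors
dt[, d_chr := strsplit(do.call(paste, c(.SD, sep = ",")), ","),
   .SDcols = cols]
```

Here d holds numeric vectors and d_chr holds character vectors, so pick whichever matches what you do with the column afterwards.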

replace row values based on another row value in a data.table

I have a trivial question, though I am struggling to find a simple answer. I have a data table that looks something like this:
dt <- data.table(id = rep(c("A", "B", "C"), each = 4), time = rep(1:4, times = 3), score = c(NA, 10, 15, 13, NA, 25, NA, NA, 18, 29, NA, 19))
dt
# id time score
# 1: A 1 NA
# 2: A 2 10
# 3: A 3 15
# 4: A 4 13
# 5: B 1 NA
# 6: B 2 25
# 7: B 3 NA
# 8: B 4 NA
# 9: C 1 18
# 10: C 2 29
# 11: C 3 NA
# 12: C 4 19
I would like to replace the missing values of my group "B" with the values of "A".
The final dataset should look something like this
final
# id time score
# 1: A 1 NA
# 2: A 2 10
# 3: A 3 15
# 4: A 4 13
# 5: B 1 NA
# 6: B 2 25
# 7: B 3 15
# 8: B 4 13
# 9: C 1 18
# 10: C 2 29
# 11: C 3 NA
# 12: C 4 19
In other words, wherever the score of "B" is NA, I would like to use the score of "A" at the same time instead. Note that the NA in "C" stays as it is.
I am struggling to find a clean way to do this using data.table. However, if it is simpler with other methods it would still be ok.
Thanks a lot for your help
Here is one option: get the positions, within group "B", of the rows where 'score' is NA, then use them to replace those NAs with the 'score' values at the same positions in group "A". This relies on both groups having their rows in the same 'time' order.
library(data.table)
i1 <- setDT(dt)[id == 'B', which(is.na(score))]
dt[, score:= replace(score, id == 'B' & is.na(score), score[which(id == 'A')[i1]])]
Or a similar option in dplyr
library(dplyr)
dt %>%
  mutate(score = replace(score, id == "B" & is.na(score),
                         score[which(id == "A")[i1]]))
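If the rows are not guaranteed to be in the same order within each id, matching on time explicitly may be safer. This is only a sketch of an alternative, not the approach from the answer above: it uses an update join, and the names a_scores and a_score are made up for the example. fifelse() needs a reasonably recent data.table.

```r
library(data.table)

dt <- data.table(
  id    = rep(c("A", "B", "C"), each = 4),
  time  = rep(1:4, times = 3),
  score = c(NA, 10, 15, 13, NA, 25, NA, NA, 18, 29, NA, 19)
)

# Lookup table holding A's score at each time
a_scores <- dt[id == "A", .(time, a_score = score)]

# Update join on time: only B rows with a missing score take A's value
dt[a_scores, on = "time",
   score := fifelse(id == "B" & is.na(score), i.a_score, score)]
```

B at time 1 stays NA because A is also NA there, and C is left alone because the condition only touches id "B".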

Sort a data.table programmatically using character vector of multiple column names

I need to sort a data.table on multiple columns provided as character vector of variable names.
This is my approach so far:
DT = data.table(x = rep(c("b","a","c"), each = 3), y = c(1,3,6), v = 1:9)
#column names to sort by, stored in a vector
keycol <- c("x", "y")
DT[order(keycol)]
x y v
1: b 1 1
2: b 3 2
Somehow it displays just 2 rows and drops the other records. But if I do this:
DT[order(x, y)]
x y v
1: a 1 4
2: a 3 5
3: a 6 6
4: b 1 1
5: b 3 2
6: b 6 3
7: c 1 7
8: c 3 8
9: c 6 9
It works fine.
Can anyone help with sorting using column name vector?
You need ?setorderv and its cols argument:
A character vector of column names of x by which to order
library(data.table)
DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
#column vector
keycol <-c("x","y")
setorderv(DT, keycol)
DT
x y v
1: a 1 4
2: a 3 5
3: a 6 6
4: b 1 1
5: b 3 2
6: b 6 3
7: c 1 7
8: c 3 8
9: c 6 9
Note that there is no need to assign the output of setorderv back to DT. The function updates DT by reference.
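As for why DT[order(keycol)] returned only two rows: order(keycol) sorts the two-element character vector c("x", "y") itself, so i ends up selecting just rows 1 and 2. A short sketch of two related things you might want, a sorted copy that leaves DT untouched and a mixed ascending/descending sort through the order argument of setorderv; the names sorted and mixed are made up here.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
keycol <- c("x", "y")

# Sorted copy; DT itself keeps its original row order
sorted <- setorderv(copy(DT), keycol)

# x ascending, y descending: one direction per column in keycol
mixed <- setorderv(copy(DT), keycol, order = c(1L, -1L))
```

copy() is what keeps DT unchanged here, since setorderv always reorders its input by reference.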

Renaming multiple columns in R data.table

This is related to this question from Henrik
Assign multiple columns using := in data.table, by group
But what if I want to create a new data.table with given column names instead of assigning new columns to an existing one?
f <- function(x){list(head(x,2),tail(x,2))}
dt <- data.table(group=sample(c('a','b'),10,replace = TRUE),val=1:10)
> dt
group val
1: b 1
2: b 2
3: a 3
4: b 4
5: a 5
6: b 6
7: a 7
8: a 8
9: b 9
10: b 10
I want to get a new data.table with predefined column names by calling the function f:
dt[,c('head','tail')=f(val),by=group]
I wish to get this:
group head tail
1: a 1 8
2: a 3 10
3: b 2 6
4: b 5 9
But it gives me an error. What I can do is create the table then change the column names, but that seems cumbersome:
> dt2 <- dt[,f(val),by=group]
> dt2
group V1 V2
1: a 1 8
2: a 3 10
3: b 2 6
4: b 5 9
> colnames(dt2)[-1] <- c('head','tail')
> dt2
group head tail
1: a 1 8
2: a 3 10
3: b 2 6
4: b 5 9
Is it something I can do with one call?
From running your code as-is, this is the error I get:
dt[,c('head','tail')=f(val),by=group]
# Error: unexpected '=' in "dt2[,c('head','tail')="
The problem is using = instead of := for assignment.
On to your problem of wanting a new data.table:
dt2 <- dt[, setNames(f(val), c('head', 'tail')), by = group]
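If you are free to skip f, naming the list elements directly inside j gives the same result in one call. A sketch using the dt printed in the question (typed in by hand, since sample() is random); dt2 and dt3 are just example names.

```r
library(data.table)

# Same rows as the dt printed in the question, entered manually
dt <- data.table(
  group = c("b", "b", "a", "b", "a", "b", "a", "a", "b", "b"),
  val   = 1:10
)

f <- function(x) list(head(x, 2), tail(x, 2))

# Name the output of f with setNames
dt2 <- dt[, setNames(f(val), c("head", "tail")), by = group]

# Or build a named list directly in j
dt3 <- dt[, .(head = head(val, 2), tail = tail(val, 2)), by = group]
```

Both return a new data.table with columns group, head and tail, with the groups in order of first appearance.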

How do I select all data.table columns that are in a second data.table

I have several data.tables that share the same columns, and one that has some extra columns. I want to rbind them all, but only on the common columns.
With data.frames I could simply do
rbind(df1[,names(df2)],df2,df3,...)
I could of course write out all the column names in the form
list(col1,col2,col3,col4)
but this is neither elegant nor feasible with 1,000 variables.
I am sure there is a way and I am just not seeing it. Any help would be appreciated.
Maybe you can try:
DT1 <- data.table(Col1=1:5, Col2=6:10, Col3=2:6)
DT2 <- data.table(Col1=1:4, Col3=2:5)
DT3 <- data.table(Col1=1:7, Col3=1:7)
lst1 <- mget(ls(pattern="DT\\d+"))
ColstoRbind <- Reduce(`intersect`,lapply(lst1, colnames))
# .. "looks up one level"
res <- rbindlist(lapply(lst1, function(x) x[, ..ColstoRbind]))
res
# Col1 Col3
# 1: 1 2
# 2: 2 3
# 3: 3 4
# 4: 4 5
# 5: 5 6
# 6: 1 2
# 7: 2 3
# 8: 3 4
# 9: 4 5
#10: 1 1
#11: 2 2
#12: 3 3
#13: 4 4
#14: 5 5
#15: 6 6
#16: 7 7
Update
As @Arun suggested in the comments, this might be better
rbindlist(lapply(lst1, function(x) {
  if (length(setdiff(colnames(x), ColstoRbind)) > 0) {
    x[, ..ColstoRbind]
  } else {
    x
  }
}))
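A slightly different sketch of the same idea, in case it helps: the list is built explicitly, so that ls(pattern = ...) cannot pick up unrelated objects that happen to match, and with = FALSE is used as an alternative to the .. prefix. The names tables and common are made up for this example.

```r
library(data.table)

DT1 <- data.table(Col1 = 1:5, Col2 = 6:10, Col3 = 2:6)
DT2 <- data.table(Col1 = 1:4, Col3 = 2:5)
DT3 <- data.table(Col1 = 1:7, Col3 = 1:7)

# Explicit list of the tables to combine
tables <- list(DT1, DT2, DT3)

# Column names present in every table
common <- Reduce(intersect, lapply(tables, names))

# Keep only the common columns of each table, then stack them
res <- rbindlist(lapply(tables, function(x) x[, common, with = FALSE]))
```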
