I cannot map over a nested column using data.table. Here is a small example:
library(data.table)
library(purrr)
DT <- setDT(list(
gp = c("A", "B"),
data = list(
setDT(list(d1 = c(1, 2, 3), d2 = c(2, 2, 4), d3 = c(0.2, 0.2, 0.4))),
setDT(list(d1 = c(10, 20, 30), d2 = c(20, 20, 40), d3 = c(0.2, 0.2, 0.4)))
),
metric = c("max", "min")
))
choose_a and choose_b hold the names of nested columns (two of the n inner columns); calc_name is the name of the new column computed from them by the calc_metric_mean function.
calc_metric_mean <- function(a, b, metric){
if(metric == "max"){
return(mean(c(max(a), max(b))))
}
if(metric == "min"){
return(mean(c(min(a), min(b))))
}
if(metric == "q74"){
return(mean(c(quantile(a, 0.74), quantile(b, 0.74)))) # quantile() expects probabilities in [0, 1]
}
}
choose_a <- c("d1", "d2", "d2")
choose_b <- c("d3", "d1", "d2")
calc_name <- paste(choose_a, choose_b, sep = '')
metric <- "max"
for(i in 1:length(calc_name)){
DT[, calc_name[[i]] := map_dbl(
.x = data,
~calc_metric_mean(
a = choose_a[[i]],
b = choose_b[[i]],
metric = "max"
)
)]
}
The desired result would be:
gp data d1d3 d2d1 d2d2
1: A <data.table[3x3]> 1.7 3.5 4
2: B <data.table[3x3]> 15.2 35.0 40
ADDED 2021-03-18
Second question: what if the parameter "metric" is in a column, outside the nested data?
The desired result would be:
gp data metric d1d3 d2d1 d2d2
1: A <data.table[3x3]> max 1.7 3.5 4
2: B <data.table[3x3]> min 5.1 15 20
Sorry if I haven't understood the question correctly, but if you're trying to produce the desired output from DT, a for() loop with set() is an option:
for(i in 1:length(calc_name)){
set(DT, NULL, j = calc_name[i],
value = lapply(DT$data, function(x){
calc_metric_mean(a = x[[choose_a[i]]], b = x[[choose_b[i]]], metric = "max")
}
)
)
}
DT
This approach is in some ways a nested for-loop, which isn't the most elegant, but it gets the job done, and looping with set() can still be quite fast since it updates by reference. One note is that this approach takes advantage of the fact that a data.table is a list, so an inner column can be reached with x[[choose_a[i]]].
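For illustration (a tiny check of my own, not part of the fix itself): each nested element is a data.table, i.e. a list of columns, so a column can be pulled out by name with [[:
x <- DT$data[[1]] # first nested data.table
x[["d1"]] # same as x$d1: 1 2 3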
To get my code to work, I had to make two small changes to your example setup. First, DT has to be a proper data.table for set() to update it by reference (if it was built with structure(), call setDT(DT) first). Second, I edited calc_metric_mean() to be more explicit about what it returns; otherwise, it returned NULL for me:
calc_metric_mean <- function(a, b, metric){
if(metric == "max"){
return(mean(c(max(a), max(b))))
}
if(metric == "min"){
return(mean(c(min(a), min(b))))
}
if(metric == "q74"){
return(mean(c(quantile(a, 0.74), quantile(b, 0.74)))) # quantile() expects probabilities in [0, 1]
}
}
Here is another answer, building on #diaggy's wonderful one.
for(i in 1:length(calc_name)){
DT[, calc_name[i] := lapply(DT$data, function(x){
calc_metric_mean(a = x[[choose_a[i]]], b = x[[choose_b[i]]], metric = "max")
})][]
}
This leads to the desired result too.
> DT
gp data d1d3 d2d1 d2d2
1: A <data.table[3x3]> 1.7 3.5 4
2: B <data.table[3x3]> 15.2 35 40
A few comments:
The final empty [] is necessary so that the := result of the data.table is printed (see FAQ 2.23).
The double bracket x[[ is necessary to access the inner columns of the list-column. x[, choose_a[i]] just returns the character value of choose_a[i] (j is evaluated as an expression, not as a column name, unless with = FALSE is used), so it won't work.
In a benchmark comparison, #diaggy's set()-based solution comes out ahead:
expr min lq mean median uq max neval
eval(diaggys_set) 3.589102 3.849702 4.487934 4.054001 4.516901 10.4261 100
eval(direct) 4.749001 5.127901 5.844534 5.386051 5.985651 12.9724 100
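For reference, here is a sketch of how such a comparison could be produced (the quote()/eval() wrapping is an assumption about the benchmark setup; the two loops are the ones shown above):
library(microbenchmark)
diaggys_set <- quote(
  for(i in seq_along(calc_name)){
    set(DT, NULL, j = calc_name[i],
        value = lapply(DT$data, function(x) calc_metric_mean(a = x[[choose_a[i]]], b = x[[choose_b[i]]], metric = "max")))
  }
)
direct <- quote(
  for(i in seq_along(calc_name)){
    DT[, calc_name[i] := lapply(DT$data, function(x) calc_metric_mean(a = x[[choose_a[i]]], b = x[[choose_b[i]]], metric = "max"))][]
  }
)
microbenchmark(eval(diaggys_set), eval(direct), times = 100)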
First variation: using only variables from inside the nested data
lapply() is enough; see #diaggy's answer above.
Second variation: using variables from inside and outside the nested data
If you have to take a parameter from another column, you need to switch from lapply() to mapply().
for(i in 1:length(calc_name)){
set(DT, NULL, j = calc_name[i],
value = mapply(function(x, m){
calc_metric_mean(a = x[[choose_a[i]]], b = x[[choose_b[i]]], metric = m)
}, x = DT$data, m = DT$metric, SIMPLIFY = FALSE
)
)
}
> DT
gp data metric d1d3 d2d1 d2d2
1: A <data.table[3x3]> max 1.7 3.5 4
2: B <data.table[3x3]> min 5.1 15 20
SIMPLIFY = FALSE is required so that mapply() returns a list instead of a simplified vector.
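As an aside, the same variation can be written in the purrr idiom the question started from (a sketch only; map2_dbl() is the standard purrr pairwise mapper, everything else reuses the objects defined above):
for(i in seq_along(calc_name)){
  DT[, (calc_name[i]) := map2_dbl(
    .x = data, .y = metric,
    ~ calc_metric_mean(a = .x[[choose_a[i]]], b = .x[[choose_b[i]]], metric = .y)
  )]
}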
Related
I have a dataset with year-based data predicted by multiple models, in data.table format.
library(data.table)
nYears = 20 # real data: 110
nMod = 3 # real data: ~ 100
nGrp = 45
dataset <- data.table(
group_code = rep(seq(1:nGrp ), times= 3*nYears ),
Year = rep(seq(1:nYears ), each=nGrp ),
value = rnorm(2700 , mean = 10, sd = 2),
var1 = rep (rnorm(nGrp , mean = nMod, sd = 1) , times= nMod*nYears ),
var2 = rep (rnorm(nGrp , mean = 1.5, sd = 0.5) , times= nMod*nYears ),
model = as.character(rep(seq( from = 1, to = nMod ) , each=nGrp *nYears ))
)
setkey(dataset, Year, model)
I need to perform a set of calculations on this dataset based on a vector named x, of length 1001, consisting of seq(-2, 8, by = 0.01).
To do so, I created a new data.table (dt) with repeated copies of dataset so that vector x can be merged in:
dt <- dataset[, lapply(.SD, function(x) rep(x, 1001))]
dt[, x := rep(round(seq(-2, 8, by=0.01), 2), each= nYears*nGrp*nMod) ]
Since my original dataset includes hundreds of models, this operation is not memory efficient.
The most important operation I need involves generating the normal density of x, with mean = var1 and sd = var2, by group_code, Year and model. For example:
# key computation
dt [, norm_dist := dnorm (x, var1, var2) , by= .(group_code, Year, model )]
This last operation is quite fast on my desktop. However, I have other operations to perform that require subsetting the data.table and are highly RAM-consuming. An example:
dt[ x %between% c( 2, 5.99), dt2 := rep_len( rev(dt [x %between% c(-2, 1.99)]$value), length.out=.N) , by= .(Year, model) ]
The following error pops up:
Error: cannot allocate vector of size 1.3 Gb
I believe the problem in this specific step is related to the subset and the rev() function.
Nevertheless, the whole approach of replicating dataset just to merge in the calculation vector "x" does not seem appropriate in the first place.
I was hoping someone could show me how to improve the efficiency of this code, since the original dataset contains a considerable number of models, which greatly increases its size.
Thank you!
I think this part of the code should be made clearer
dt[ x %between% c( 2, 5.99), dt2 := rep_len( rev(dt [x %between% c(-2, 1.99)]$value), length.out=.N) , by= .(Year, model) ]
as it is a bit of a black box to me, especially because this double subsetting is where your problem is generated.
The subsets x %between% c(2, 5.99) and x %between% c(-2, 1.99) should always select the same positions within each Year/model group, so you can compute those positions once and reuse them to make the code more efficient.
Try something like this to make things a bit clearer:
by_YM <- split(dt, by=c("Year", "model"))
ind1 <- which(by_YM[[1]][["x"]] %between% c( 2, 5.99))
ind2 <- which(by_YM[[1]][["x"]] %between% c(-2, 1.99))
for(i in 1:length(by_YM)){
dt_i <- by_YM[[i]]
#val1 <- rep_len(rev(dt_i$value[ind2]), length.out=length(ind1)) #val1 is equal to val, no need for rep_len
val <- rev(dt_i$value[ind2])
by_YM[[i]] <- dt_i[ind1, dt2 := val]
}
However, our dt2 columns are not equal; since I am not sure what the final result should be, I cannot debug it further.
dt2_a <- dt[Year == 20 & model == 3, dt2]
dt2_b <- by_YM[["20.3"]][, dt2]
test <- cbind(dt2_a, dt2_b)
The second code is also much faster.
library(microbenchmark)
microbenchmark( "new_code" = {
by_YM <- split(dt, by=c("Year", "model"))
ind1 <- which(by_YM[[1]][["x"]] %between% c( 2, 5.99))
ind2 <- which(by_YM[[1]][["x"]] %between% c(-2, 1.99))
for(i in 1:length(by_YM)){
dt_i <- by_YM[[i]]
val1 <- rep_len(rev(dt_i$value[ind2]), length.out=length(ind1)) #val1 is equal to val, no need for rep_len
val <- rev(dt_i$value[ind2])
by_YM[[i]] <- dt_i[ind1, dt2 := val]
}}, "old_code" = dt[ x %between% c( 2, 5.99),
dt2 := rep_len( rev(dt [x %between% c(-2, 1.99)]$value), length.out=.N) , by= .(Year, model) ],
times = 5)
Unit: milliseconds
expr min lq mean median uq max neval cld
new_code 155.426 156.4916 200.6587 185.0347 188.9436 317.3977 5 a
old_code 1290.909 1299.8570 1398.6866 1370.4526 1471.0569 1561.1574 5 b
Give it a try, and good luck.
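A further sketch (it assumes, as in the construction above, that the x values occupy the same positions within every Year/model group, and it follows the per-group reversal semantics of the loop above): the same update can be done without splitting, in one grouped assignment by reference:
ind1 <- which(dt[Year == 1 & model == "1", x] %between% c( 2, 5.99))
ind2 <- which(dt[Year == 1 & model == "1", x] %between% c(-2, 1.99))
dt[, dt2 := { v <- rep(NA_real_, .N); v[ind1] <- rev(value[ind2]); v },
   by = .(Year, model)]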
I am writing a custom aggregation function with data.table (v 1.9.6) and am struggling to pass function arguments to it. There have been similar questions on this, but none deals with multiple (variable) inputs, and none seems to have a conclusive answer, just "little hacks":
pass variables and names to data.table function
eval and quote in data.table
How can one work fully generically in data.table in R with column names in variables
I would like to take a data.table, sum and order defined variables, and create new variables on top (2 steps). The crucial thing is that everything should be parameterized, i.e. the variables to sum, the variables to group by, and the variables to order by, and each of these can be one or more variables. A small example:
dt <- data.table(a=rep(letters[1:4], 5),
b=rep(letters[5:8], 5),
c=rep(letters[3:6], 5),
x=sample(1:100, 20),
y=sample(1:100, 20),
z=sample(1:100, 20))
temp <-
dt[, .(x_sum = sum(x, na.rm = T),
y_sum = sum(y, na.rm = T)),
by = .(a, b)][order(a, b)]
temp2 <-
temp[, `:=` (x_sum_del = (x_sum - shift(x = x_sum, n = 1, type = "lag")),
y_sum_del = (y_sum - shift(x = y_sum, n = 1, type = "lag")),
x_sum_del_rel = ((x_sum - shift(x = x_sum, n = 1, type = "lag")) /
(shift(x = x_sum, n = 1, type = "lag"))),
y_sum_del_rel = ((y_sum - shift(x = y_sum, n = 1, type = "lag")) /
(shift(x = y_sum, n = 1, type = "lag")))
)
]
How can I programmatically pass the following function arguments (i.e. not single inputs but vectors/lists of inputs)?
x and y --> var_list
new names of x and y (e.g. x_sum, y_sum) --> var_name_list
group by arguments a, b --> by_var_list
order by arguments a, b --> order_var_list
temp2 should work with all pre-defined parameters. I was also thinking about using an apply function, but again struggled to pass a list of variables.
I have played around with variations of get(), as.name(), eval() and quote(), but as soon as I pass more than one variable they no longer work. I hope the question is clear; otherwise I am happy to adjust where necessary. A function call would look as follows:
fn_agg(dt, var_list, var_name_list, by_var_list, order_var_list)
I prefer computing on the language over get/mget.
fn_agg = function(dt, var_list, var_name_list, by_var_list, order_var_list) {
j_call = as.call(c(
as.name("."),
sapply(setNames(var_list, var_name_list), function(var) as.call(list(as.name("sum"), as.name(var), na.rm=TRUE)), simplify=FALSE)
))
order_call = as.call(c(
as.name("order"),
lapply(order_var_list, as.name)
))
j2_call = as.call(c(
as.name(":="),
c(
sapply(setNames(var_name_list, paste0(var_name_list,"_del")), function(var) {
substitute(.var - shift(x = .var, n = 1, type = "lag"), list(.var=as.name(var)))
}, simplify=FALSE),
sapply(setNames(var_name_list, paste0(var_name_list,"_del_rel")), function(var) {
substitute((.var - shift(x = .var, n = 1, type = "lag")) / (shift(x = .var, n = 1, type = "lag")), list(.var=as.name(var)))
}, simplify=FALSE)
)
))
dt[eval(order_call), eval(j_call), by=by_var_list
][, eval(j2_call)
][]
}
ans = fn_agg(dt, var_list=c("x","y"), var_name_list=c("x_sum","y_sum"), by_var_list=c("a","b"), order_var_list=c("a","b"))
all.equal(temp2, ans)
#[1] TRUE
Some extra notes:
Make strict input validation, as debugging issues is more difficult with metaprogramming.
Optimization of step 2 is possible, as shift is computed multiple times; an easy way is to compute _del in step 2 and _del_rel in a third step (see the sketch after these notes).
If the order variables are always the same as the by variables, you can put them into the keyby argument instead.
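A minimal sketch of the second note, written with literal column names (x_sum and friends from the example above) rather than constructed calls:
temp[, x_sum_del := x_sum - shift(x_sum, n = 1, type = "lag")] # step 2: absolute delta
temp[, x_sum_del_rel := x_sum_del / shift(x_sum, n = 1, type = "lag")] # step 3: reuse the delta, so shift() runs twice per variable instead of three times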
Here's an option using mget, as commented:
fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
by = by_var_list, .SDcols = var_list]
setorderv(temp, order_var_list)
cols1 <- paste0(var_name_list, "_del")
cols2 <- paste0(cols1, "_rel")
temp[, (cols1) := lapply(mget(var_name_list), function(x) {
x - shift(x, n = 1, type = "lag")
})]
temp[, (cols2) := lapply(mget(var_name_list), function(x) {
xshift <- shift(x, n = 1, type = "lag")
(x - xshift) / xshift
})]
temp[]
}
fn_agg(dt,
var_list = c("x", "y"),
var_name_list = c("x_sum", "y_sum"),
by_var_list = c("a", "b"),
order_var_list = c("a", "b"))
# a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e 254 358 NA NA NA NA
#2: b f 246 116 -8 -242 -0.031496063 -0.6759777
#3: c g 272 242 26 126 0.105691057 1.0862069
#4: d h 273 194 1 -48 0.003676471 -0.1983471
Instead of mget, you could also make use of data.table's .SDcols argument as in
temp[, (cols1) := lapply(.SD, function(x) {
x - shift(x, n = 1, type = "lag")
}), .SDcols = var_name_list]
Also, there are probably ways to improve the function by avoiding duplicated computation of shift(x, n = 1, type = "lag") but I only wanted to demonstrate a way to use data.table in functions.
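For instance (a sketch only; the _lag helper columns are hypothetical names, not part of the function above), inside fn_agg the lagged values could be computed once into temporary columns and reused:
shifted <- paste0(var_name_list, "_lag")
temp[, (shifted) := lapply(.SD, shift, n = 1, type = "lag"), .SDcols = var_name_list]
temp[, (cols1) := Map(`-`, mget(var_name_list), mget(shifted))] # x_sum - lag(x_sum), ...
temp[, (cols2) := Map(`/`, mget(cols1), mget(shifted))] # delta / lag(x_sum), ...
temp[, (shifted) := NULL] # drop the helper columns again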
I have to apply a function to every row of a large table (~2M rows). I used to use plyr for that, but the table keeps growing and the current solution is starting to approach unacceptable runtimes. I thought I could simply switch to data.table or dplyr and all would be fine, but that's not the case.
Here's an example:
library(data.table)
library(plyr)
library(dplyr)
dt = data.table("ID_1" = c(1:1000), # unique ID
"ID_2" = ceiling(runif(1000, 0, 100)), # other ID, duplicates possible
"group" = sample(LETTERS[1:10], 1000, replace = T),
"value" = runif(1000),
"ballast1" = "X", # keeps unchanged in derive_dt
"ballast2" = "Y", # keeps unchanged in derive_dt
"ballast3" = "Z", # keeps unchanged in derive_dt
"value_derived" = 0)
setkey(dt, ID_1)
extra_arg = c("A", "F", "G", "H")
ID_1 is guaranteed to contain no duplicates. Now I define a function to apply to every row/ID_1:
derive = function(tmprow, extra_arg){
if(tmprow$group %in% extra_arg){return(NULL)} # exclude entries occurring in extra_arg
group_index = which(LETTERS == tmprow$group)
group_index = ((group_index + sample(1:26, 1)) %% 25) + 1
new_group = LETTERS[group_index]
if(new_group %in% unique(dt$group)){return(NULL)}
new_value = runif(1)
row_derived = tmprow
row_derived$group = new_group
row_derived$value = runif(1)
row_derived$value_derived = 1
return(row_derived)
}
This one doesn't do anything useful (the actual one does). The point is that the function takes one row and computes a new row of the same format.
Now the comparison:
set.seed(42)
system.time(result_dt <- dt[, derive(.SD, extra_arg), by = ID_1])
set.seed(42)
system.time(result_dplyr <- dt %>% group_by(ID_1) %>% do(derive(., extra_arg)))
set.seed(42)
system.time(results_plyr <- x <- ddply(dt, .variable = "ID_1", .fun = derive, extra_arg))
plyr is about 8x faster than both data.table and dplyr. Obviously I'm doing something wrong here, but what?
EDIT
Thanks to eddi's answer, I could reduce the runtimes for data.table and dplyr to ~0.6 and ~0.8 of the plyr version, respectively. I initialized row_derived as a data.frame: row_derived = as.data.frame(tmprow). That's cool, but I still expected a bigger performance increase from these packages... any further suggestions?
The issue is that the assignment you use has a very high overhead in data.table, while plyr converts the row to a data.frame before passing it to your derive function and thus avoids that overhead:
library(microbenchmark)
df = as.data.frame(dt)
microbenchmark({dt$group = dt$group}, {df$group = df$group})
#Unit: microseconds
# expr min lq mean median uq max neval
# { dt$group = dt$group } 1895.865 2667.499 3092.38903 3080.3620 3389.049 4984.406 100
# { df$group = df$group } 26.045 45.244 64.13909 61.6045 79.635 157.266 100
I can't suggest a good fix, since you say your example is not the real problem, so there is no point in solving it better. Some basic directions to look at are vectorizing the code and using := or set() instead (depending on what exactly you end up doing).
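For example (a rough sketch only; it mirrors the toy derive() above rather than the real problem, and replaces the per-row calls with whole-column operations):
keep <- !(dt$group %in% extra_arg) # drop groups listed in extra_arg
idx <- match(dt$group, LETTERS)
new_group <- LETTERS[((idx + sample(1:26, nrow(dt), replace = TRUE)) %% 25) + 1]
keep <- keep & !(new_group %in% unique(dt$group)) # drop rows whose new group already exists
result <- dt[keep] # one subset instead of one call per row
set(result, j = "group", value = new_group[keep])
set(result, j = "value", value = runif(nrow(result)))
set(result, j = "value_derived", value = 1)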
I asked a question before and received a good answer but I needed to apply it to a more specific problem. The DT needs to be divided into 16 sectors based on X and Y values. The X and Y variables represent the coordinates to loop through and divide the data table. I have successfully divided this data table into 16 different 'sectors' and I need to apply the sCalc function on each sector and output a number. I'm looking for a faster way to do this.
Refer to this link for clarification if needed: Faster way to subset data table instead of a for loop R.
library(data.table)
DT <- data.table(X = rep(1:2000, times = 1600), Y = rep(1:1600, each = 2000), Norm =rnorm(1600*2000), Unif = runif(1600*2000))
sCalc <- function(DT) {
setkey(DT, Norm)
cells <- DT[1:(nrow(DT)*0.02)]
nCells <- nrow(DT)
sumCell <- sum(cells[,Norm/sqrt(Unif)])
return(sumCell/nCells)
}
startstop <- function(width, y = FALSE) {
startend <- width - (width/4 - 1)
start <- round(seq(0, startend, length.out = 4))
stop <- round(seq(width/4, width, length.out = 4))
if (length(c(start,stop)[anyDuplicated(c(start,stop))]) != 0) {
dup <- anyDuplicated(c(start,stop))
stop[which(stop == c(start,stop)[dup])] <- stop[which(stop == c(start,stop)[dup])] - 1
}
if (y == TRUE) {
coord <- list(rep(start, each = 4), rep(stop, each = 4))
} else if (y == FALSE) {
coord <- list(rep(start, times = 4), rep(stop, times = 4))
}
return(coord)
}
sectorCalc <- function(x,y,DT) {
sector <- numeric(length = 16)
for (i in 1:length(sector)) {
sect <- DT[X %between% c(x[[1]][i],x[[2]][i]) & Y %between% c(y[[1]][i],y[[2]][i])]
sector[i] <- sCalc(sect)
}
return(sector)
}
x <- startstop(2000)
y <- startstop(1600, y = TRUE)
sectorLoop <- sectorCalc(x,y,DT)
sectorLoop returns:
-4.729271 -4.769156 -4.974996 -4.931120 -4.777013 -4.644919 -4.958968 -4.663221
-4.771545 -4.909868 -4.821098 -4.795526 -4.846709 -4.931514 -4.875148 -4.847105
One solution was using the cut function.
DT[, x.sect := cut(DT[, X], seq(0, 2000, by = 500), dig.lab=10)]
DT[, y.sect := cut(DT[, Y], seq(0, 1600, by = 400), dig.lab=10)]
sectorRef <- DT[order(Norm), .(sCalc = sum(Norm[1:(0.02*.N)] / sqrt(Unif[1:(0.02*.N)]) )/(0.02*.N)), by = .(x.sect, y.sect)]
sectorRef <- sectorRef[[3]]
The above solution returns a data table with the values:
-4.919447 -4.778576 -4.757455 -4.779086 -4.739814 -4.836497 -4.776635 -4.656748
-4.939441 -4.707901 -4.751791 -4.864481 -4.839134 -4.973294 -4.663360 -5.055344
cor(sectorRef, sectorLoop)
The above returns: 0.0726904
As far as I can understand the question, the first thing I would point out is that you can use .N to tell you how many rows there are in each by = .(...) group. I think that is analogous to your nCells.
And where your cells takes the top 2% of rows in each group, this can be accomplished at the vector level by indexing [1:(0.02*.N)]. Assuming you want the top 2% in order of increasing Norm (which is the order you would get from setkey(DT, Norm), although setting a key does more than just sorting), you could call setkey(DT, Norm) before the calculation, as in the example, or, to make it clearer what you are doing, use order(Norm) inside the calculation.
The sum() part doesn't change, so the equivalent third line is:
DT[order(Norm),
.(sCalc = sum( Norm[1:(0.02*.N)] / sqrt(Unif[1:(0.02*.N)]) )/.N),
by = .(x.sect, y.sect)]
which returns the result for the 16 groups:
x.sect y.sect sCalc
1: (1500,2000] (800,1200] -0.09380209
2: (499,1000] (399,800] -0.09833151
3: (499,1000] (1200,1600] -0.09606350
4: (0,499] (399,800] -0.09623751
5: (0,499] (800,1200] -0.09598717
6: (1500,2000] (0,399] -0.09306580
7: (1000,1500] (399,800] -0.09669593
8: (1500,2000] (399,800] -0.09606388
9: (1500,2000] (1200,1600] -0.09368166
10: (499,1000] (0,399] -0.09611643
11: (1000,1500] (0,399] -0.09404482
12: (0,499] (1200,1600] -0.09387951
13: (1000,1500] (1200,1600] -0.10069461
14: (1000,1500] (800,1200] -0.09825285
15: (0,499] (0,399] -0.09890184
16: (499,1000] (800,1200] -0.09756506
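As a small aside (keyby is a standard data.table argument, not something the question used), replacing by with keyby additionally sorts the output by the grouping columns, which gives a deterministic ordering of the 16 groups:
DT[order(Norm),
   .(sCalc = sum(Norm[1:(0.02*.N)] / sqrt(Unif[1:(0.02*.N)]))/.N),
   keyby = .(x.sect, y.sect)]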
I have a data.table that I want to split into a list and then modify. I'm discovering some weird behavior when I try to delete a column on one of the data.tables in the list after calling split. Here's a MWE (that throws an error and causes my R session to crash):
library(data.table)
d = data.table(level = c(1, 1, 2, 2), value = 1:4)
list = split(d, f = d$level)
list[[1]][, level := NULL]
list
I get:
Error in .shallow(x, cols = cols, retain.key = TRUE) : Internal error: length(names)>0 but <length(dt)
I recommend using the name l for the variable instead of list, to avoid confusion with the base function.
This seems to be a bug caused by the split.data.frame method that is used in the process.
I've quite recently proposed a new split.data.table method defined below. It seems to address your problem.
Update 2016-03-30:
split.data.table has been implemented in data.table 1.9.7. Now you can simply use:
library(data.table)
d = data.table(level = c(1, 1, 2, 2), value = 1:4)
l = split(d, by = "level")
l[[1L]][, level := NULL]
l
#$`1`
# value
#1: 1
#2: 2
#
#$`2`
# level value
#1: 2 3
#2: 2 4
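As an aside (assuming a data.table version that provides the keep.by argument of split.data.table), the by column can also be dropped from every list element directly at split time:
l2 = split(d, by = "level", keep.by = FALSE)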
The old answer is below; it may be useful if you are stuck with 1.9.6 or earlier. Be aware that it won't handle factor levels the same way as split.data.frame; this is not the case for the method developed in data.table 1.9.7, which is consistent with the data.frame method.
library(data.table)
split.data.table = function(x, f, drop = FALSE, by, flatten = FALSE, ...){
if(missing(by) && !missing(f)) by = f
stopifnot(!missing(by), is.character(by), is.logical(drop), is.logical(flatten), !".ll" %in% names(x), by %in% names(x))
if(!flatten){
.by = by[1L]
tmp = x[, list(.ll=list(.SD)), by = .by, .SDcols = if(drop) setdiff(names(x), .by) else names(x)]
setattr(ll <- tmp$.ll, "names", tmp[[.by]])
if(length(by) > 1L) return(lapply(ll, split.data.table, drop = drop, by = by[-1L])) else return(ll)
} else {
tmp = x[, list(.ll=list(.SD)), by=by, .SDcols = if(drop) setdiff(names(x), by) else names(x)]
setattr(ll <- tmp$.ll, 'names', tmp[, .(nm = paste(.SD, collapse = ".")), by = by, .SDcols = by]$nm)
return(ll)
}
}
d = data.table(level = c(1, 1, 2, 2), value = 1:4)
l = split.data.table(d, by = "level")
# below setattr to be addressed in split.data.table
invisible(lapply(l, setattr, ".data.table.locked", NULL))
l[[1]][, level := NULL]
l
#$`1`
# value
#1: 1
#2: 2
#
#$`2`
# level value
#1: 2 3
#2: 2 4
I've also filed a bug report describing your case; you can find it at data.table#1481.