This question already has answers here:
Left join using data.table
(3 answers)
Assign value (stemming from configuration table) to group based on condition in column
(1 answer)
Closed 4 years ago.
I have two indexed data tables, and I want to add a column from one table to the other by index. My current approach is as follows:
A <- data.table(index = seq(6,10), a = rnorm(5))
B <- data.table(index = seq(10), b = rnorm(10))
setkey(B, index)
A[, b := B[.(A[,index]), b]]
While this gets the job done, the syntax seems a bit redundant. Is there a cleaner way to perform the same operation?
We can do this with an update join:
A[B, b := b, on = .(index)]
The setkey step is not needed here
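For clarity, a self-contained sketch using the toy A and B from the question; the i. prefix makes it explicit that b comes from the table being joined in:
library(data.table)
A <- data.table(index = seq(6, 10), a = rnorm(5))
B <- data.table(index = seq(10), b = rnorm(10))
# update join: for each row of B that matches A on 'index',
# assign B's b (referenced as i.b) into A by reference
A[B, b := i.b, on = .(index)]
A
Here every index in A is present in B, so every row of A gets a value; rows of A without a match would simply be left as NA.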
This question already has answers here:
Return elements of list as independent objects in global environment
(4 answers)
Closed 1 year ago.
Let's say that I have a data frame that looks like the following.
dt = data.frame(a = rnorm(5), b = rnorm(5))
I would like to specify the set of columns as a vector of names:
colvec <- c("a","b")
So for each of the two columns named in the vector, I'd like to dynamically take that column and create a standalone vector of the same name.
So the normal way to do this would be to do...
a = dt$a
b = dt$b
But I want to do this dynamically. Any suggestions?
Not sure if this is a good idea, but you can do:
sapply(colvec, function(x) assign(x, dt[, x], envir = .GlobalEnv))
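Another option (a sketch, assuming the same dt and colvec as above) is list2env(), which copies the elements of a named list into an environment:
# dt[colvec] is a data.frame, i.e. a named list of columns;
# list2env() creates one object per column in the global environment
list2env(dt[colvec], envir = .GlobalEnv)
As with the assign() approach, polluting the global environment this way is usually better avoided in favour of working with the data frame (or a list) directly.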
This question already has answers here:
Extract columns from data table by numeric indices stored in a vector
(2 answers)
Closed 1 year ago.
Given a data.table, how can I select a set of columns using a variable?
Example:
df[, 1:3]
is OK, but
idx <- 1:3
df[, idx]
is not OK: column named "idx" does not exist.
How can I use idx to select the columns in the simplest possible way?
We can use the .. prefix before idx to select the columns in data.table, or use with = FALSE:
library(data.table)
df[, ..idx]
df[, idx, with = FALSE]
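A quick sketch with toy data (assuming idx holds column positions, as in the question):
library(data.table)
df <- data.table(a = 1:3, b = letters[1:3], c = runif(3), d = 4:6)
idx <- 1:3
df[, ..idx]              # the .. prefix tells data.table to look idx up in the calling scope
df[, idx, with = FALSE]  # with = FALSE treats j as a vector of column positions (or names)
Both calls return the first three columns a, b and c.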
This question already has answers here:
Create group number for contiguous runs of equal values
(4 answers)
Closed 6 days ago.
Working with data.table package in R, I'm trying to get the 'group number' of some data points.
Specifically, my data is trajectories: I have many rows describing a specific observation of the particle I'm tracking, and I want to generate a specific index for the trajectory based on other identifying information I have.
If I do a [, , by] command, I can group my data by this identifying information and isolate each trajectory.
Is there a way, similar to .I or .N, which gives what I would call the index of the subset?
Here's an example with toy data:
dt <- data.table(x1 = c(rep(1,4), rep(2,4)),
x2 = c(1,1,2,2,1,1,2,2),
z = runif(8))
I need a fast way to get the trajectory index (here it should be c(1,1,2,2,3,3,4,4) across the observations) -- my real data set is moderately large.
If the trajectories are meant to be based on 'x2' alone, we can use rleid:
dt[, Grp := rleid(x2)]
Or if we need the group numbers based on 'x1' and 'x2', .GRP can be used.
dt[, Grp := .GRP, by = .(x1, x2)]
Or this can be done using rleid alone, without the by (as @Frank mentioned):
dt[, Grp := rleid(x1,x2)]
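With the toy data from the question, both approaches give the expected c(1,1,2,2,3,3,4,4). They coincide here because each (x1, x2) combination forms a single contiguous run; in general rleid() starts a new id whenever the values change, while .GRP numbers each unique group once. A quick check:
dt[, Grp1 := rleid(x1, x2)]
dt[, Grp2 := .GRP, by = .(x1, x2)]
dt[, .(x1, x2, Grp1, Grp2)]
#    x1 x2 Grp1 Grp2
# 1:  1  1    1    1
# 2:  1  1    1    1
# 3:  1  2    2    2
# 4:  1  2    2    2
# 5:  2  1    3    3
# 6:  2  1    3    3
# 7:  2  2    4    4
# 8:  2  2    4    4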
This question already has answers here:
Extracting unique rows from a data table in R [duplicate]
(2 answers)
Closed 4 years ago.
I've discovered some interesting behavior in data.table, and I'm curious if someone can explain to me why this is happening. I'm merging two data.tables (in this MWE, one has 1 row and the other 2 rows). The merged data.table has two unique rows, but when I call unique() on the merged data.table, I get a data.table with one row. Am I doing something wrong? Or is this a bug?
Here's an MWE:
library(data.table)
X = data.table(keyCol = 1)
setkey(X, keyCol)
Y = data.table(keyCol = 1, otherKey = 1:2)
setkeyv(Y, c("keyCol", "otherKey"))
X[Y, ] # 2 unique rows
unique(X[Y, ]) # Only 1 row???
I'd expect unique(X[Y, ]) to be the same as X[Y, ] since all rows are unique, but this doesn't seem to be the case.
The default value of the by argument for unique.data.table is key(x). Therefore, if you call unique(x) on a keyed data.table, it only looks at the key columns. To override this, do:
unique(x, by = NULL)
by = NULL considers all the columns. Alternatively, you can provide by = names(x).
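Applied to the MWE above (a small sketch; note also that, if I recall correctly, the default for by in unique.data.table changed to all columns in later data.table releases, so this surprise only shows up on older versions):
res <- X[Y, ]
unique(res)                   # with by defaulting to the key: only 1 row, as in the question
unique(res, by = NULL)        # considers all columns: both rows are kept
unique(res, by = names(res))  # equivalent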
This question already has answers here:
Using lists inside data.table columns
(2 answers)
Closed 8 years ago.
I have this:
dt = data.table(index=c(1,2), items=list(c(1,2,3),c(4,5)))
# index items
#1: 1 1,2,3
#2: 2 4,5
I want to change dt[index==2, items] to c(6,7).
I tried:
dt[index==2, items] = c(6,7)
dt[index==2, items := c(6,7)]
One workaround is to use ifelse:
dt[,items:=ifelse(index==2,list(c(6,7)),items)]
# index items
#1: 1 1,2,3
#2: 2 6,7
EDIT: the correct answer is
dt[index==2, items := list(list(c(6,7)))]
Indeed, you'll need one more list because data.table uses list(.) to look for values to assign to columns by reference.
There are two ways to use the := operator in data.table:
The LHS := RHS form:
DT[, c("col1", "col2", ..) := list(val1, val2, ...)]
It takes a list() argument on the RHS. To add a list column, you'll need to wrap with another list (as illustrated above).
The functional form:
DT[, `:=`(col1 = val1, ## some comments
col2 = val2, ## some more comments
...)]
It is especially useful to add some comments along with the assignment.
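Applied to the list-column example above, the two forms would look roughly like this (a sketch; in the first form the outer list() is the RHS wrapper and the inner list() is the one-element list-column value, while in the functional form the value is given directly):
dt[index == 2, items := list(list(c(6, 7)))]  # LHS := RHS form
dt[index == 2, `:=`(items = list(c(6, 7)))]   # functional form (I believe this is equivalent)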
dt[index==2]$items[[1]] <- list(c(6,7))
dt
# index items
# 1: 1 1,2,3
# 2: 2 6,7
The problem is that, the way you have it set up, dt$items is a list, not a vector, so you have to use list indexing (e.g., dt$items[[1]]). But AFAIK you can't update a list element by reference, so, e.g.,
dt[index==2,items[[1]]:=list(c(6,7))]
will not work.
BTW I also do not see the point of using data.tables for this.
This worked:
dt$items[[which(dt$index==2)]] = c(6,7)
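A quick check that it did what was intended (note this is a base-R list replacement on the column, i.e. an ordinary copy-on-modify assignment, not an update by reference):
dt
# index items
#1: 1 1,2,3
#2: 2 6,7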