This question already has answers here:
Extract columns from data table by numeric indices stored in a vector
(2 answers)
Closed 1 year ago.
Given a data.table, how can I select a set of columns using a variable?
Example:
df[, 1:3]
is OK, but
idx <- 1:3
df[, idx]
is not OK: column named "idx" does not exist.
How can I use idx to select the columns in the simplest possible way?
In data.table, prefix idx with .. so that it is looked up as a variable in the calling scope rather than as a column name, or pass with = FALSE:
library(data.table)
df[, ..idx]
df[, idx, with = FALSE]
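To make the difference concrete, here is a minimal sketch on a made-up table (the question's df is not shown, so the column names x, y, z, w are assumptions). It also shows `.SDcols` as a third option, and that the same forms accept a character vector of names.

```r
library(data.table)

# Hypothetical example table; the original df is not shown in the question
df <- data.table(x = 1:3, y = letters[1:3], z = c(1.5, 2.5, 3.5), w = 4:6)
idx <- 1:3

by_prefix <- df[, ..idx]               # .. looks idx up outside of df
by_with   <- df[, idx, with = FALSE]   # with = FALSE treats j as column positions or names
by_sdcols <- df[, .SD, .SDcols = idx]  # .SD restricted to the columns listed in idx

# A character vector of column names works the same way
nms <- c("x", "z")
by_name <- df[, ..nms]
```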
This question already has an answer here:
Selecting only unique values from a comma separated string [duplicate]
(1 answer)
Closed 2 years ago.
I am looking to find the unique values within each row of a column.
df <- as.data.frame(rbind(c('10','20','30','10','45','34'),
c('a','b','c','a','b'),
c("fs","pp","dd","dd")))
df$f7 <-paste0(df$V1,
',',
df$V2,
',',
df$V3,',',df$V4,',',df$V5,',',df$V6)
df_1 <- as.data.frame(df[,c(7)])
names(df_1)[1] <-"f1"
The expected output is :
Row1: 10,20,30,45,34
Row2: a,b,c
Row3: fs,pp,dd
Any help is highly appreciated.
Regards,
R
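We can loop over the rows with apply (MARGIN = 1 loops row-wise), take the unique values of each row, and paste them together. Restrict the call to the original columns V1 to V6 so that the pasted f7 column is not treated as another value; note that toString separates the values with ", ".
apply(df[paste0("V", 1:6)], 1, FUN = function(x) toString(unique(x)))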
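If only the pasted, comma-separated column is available (as in df_1$f1), the same idea can be applied to the strings directly. The following is a minimal base R sketch; the vector f1 below is a simplified stand-in for that column, not the exact data from the question.

```r
# Simplified stand-in for the comma-separated column
f1 <- c("10,20,30,10,45,34", "a,b,c,a,b", "fs,pp,dd,dd")

# Split each string on commas, drop repeated values, and paste back together
parts <- strsplit(f1, ",", fixed = TRUE)
uniq  <- vapply(parts, function(p) paste(unique(p), collapse = ","), character(1))

print(uniq)
```

unique keeps the first occurrence of each value, so the original order within each row is preserved.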
This question already has answers here:
Left join using data.table
(3 answers)
Assign value (stemming from configuration table) to group based on condition in column
(1 answer)
Closed 4 years ago.
I have two indexed data tables, and I want to add a column from one table to the other by index. My current approach is as follows:
A <- data.table(index = seq(6,10), a = rnorm(5))
B <- data.table(index = seq(10), b = rnorm(10))
setkey(B, index)
A[, b := B[.(A[,index]), b]]
While this gets the job done, the syntax seems a bit redundant. Is there a cleaner way to perform the same operation?
We can do this with a join
A[B, b := b, on = .(index)]
Because the join column is given with on =, the setkey step is not needed here.
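As a self-contained sketch of this update join, the version below uses the i. prefix to state explicitly that the value comes from B (the table in the i position). The seed is only there to make the example reproducible and is not part of the question.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed for reproducibility only
A <- data.table(index = 6:10, a = rnorm(5))
B <- data.table(index = 1:10, b = rnorm(10))

# For every row of B that matches a row of A on index,
# write B's b into A by reference; i.b refers to the column from B
A[B, b := i.b, on = .(index)]

print(A)
```

Rows of B with no match in A (index 1 to 5 here) are simply ignored, and A is modified in place without being keyed.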
This question already has answers here:
How do I split a data frame among columns, say at every nth column?
(1 answer)
What is the algorithm behind R core's `split` function?
(1 answer)
Closed 4 years ago.
Is there an easy way in base R to split a data frame into a list of data frames based on the levels of an index factor (taken from another data frame)?
For example,
x = data.frame(num1 = 1:26, let = letters, num2 = 10:35, LET = LETTERS)
ls = list(x[, 1:2], x[, 3:4])
But let's say we had an index indicating factor levels for the columns. Can split be used?
indx = c(1,1,2,2)
? split(x, indx)
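Use the default method of split. Calling split on a data frame dispatches to split.data.frame, which splits rows; split.default treats the data frame as a list of columns and splits those instead.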
out <- split.default(x, indx)
identical(ls, setNames(out, NULL))
#[1] TRUE
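The contrast between the two methods can be sketched as follows, reusing x and indx from the question; the row grouping in the second call is an arbitrary assumption added for comparison.

```r
x <- data.frame(num1 = 1:26, let = letters, num2 = 10:35, LET = LETTERS)
indx <- c(1, 1, 2, 2)

# Column-wise: one data frame per level of indx
by_col <- split.default(x, indx)

# Row-wise (what plain split() does on a data frame);
# the grouping vector here is made up for illustration
by_row <- split(x, rep(1:2, each = 13))

print(lapply(by_col, names))
print(vapply(by_row, nrow, integer(1)))
```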
This question already has answers here:
Apply a function to every specified column in a data.table and update by reference
(7 answers)
Closed 7 years ago.
Let DT be a data.table:
DT<-data.table(V1=factor(1:10),
V2=factor(1:10),
...
V9=factor(1:10),)
Is there a better/simpler method to do multicolumn factor conversion like this:
DT[,`:=`(
Vn1=as.numeric(V1),
Vn2=as.numeric(V2),
Vn3=as.numeric(V3),
Vn4=as.numeric(V4),
Vn5=as.numeric(V5),
Vn6=as.numeric(V6),
Vn7=as.numeric(V7),
Vn8=as.numeric(V8),
Vn9=as.numeric(V9)
)]
Column names are totally arbitrary.
Yes, the most efficient way is probably to run set in a for loop.
First pick the columns to modify (you can choose all of them by using names(DT) instead):
cols <- c("V1", "V2", "V3")
Then just run the loop
for (j in cols) set(DT, i = NULL, j = j, value = as.numeric(DT[[j]]))
Or, a bit less efficient but more readable (note the parentheses around cols, which make data.table evaluate the variable rather than create a column named cols):
## if you choose all the names in DT, you don't need to specify the `.SDcols` parameter
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
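The question creates new columns (Vn1, Vn2, ...) rather than overwriting the factors. A minimal sketch of that variant is below; the small two-column table and the name new_cols are assumptions for illustration.

```r
library(data.table)

# Small stand-in for the question's DT
DT <- data.table(V1 = factor(1:3), V2 = factor(1:3))

cols     <- c("V1", "V2")
new_cols <- paste0("Vn", seq_along(cols))  # hypothetical names: "Vn1", "Vn2"

# Assign the converted columns under new names; the factors stay untouched
DT[, (new_cols) := lapply(.SD, as.numeric), .SDcols = cols]

print(DT)
```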
Both should be efficient even for a big data set.
Beware, though, of converting factors to numeric this way: as.numeric on a factor returns the internal integer codes, not the labels, so convert through as.character first when the labels are the numbers you want.
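A short base R illustration of the pitfall, using a made-up factor whose labels differ from its codes:

```r
# Made-up factor; levels are sorted as "10", "20", "35"
f <- factor(c("10", "20", "10", "35"))

codes   <- as.numeric(f)                # internal codes: 1 2 1 3
values  <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10 35
values2 <- as.numeric(levels(f))[f]     # same result, converting each level only once

print(codes)
print(values)
```

With factor(1:10), as in the question, the codes and the labels happen to coincide, which is why the problem does not show up there.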
This question already has answers here:
Extracting unique rows from a data table in R [duplicate]
(2 answers)
Closed 4 years ago.
I've discovered some interesting behavior in data.table, and I'm curious if someone can explain to me why this is happening. I'm merging two data.tables (in this MWE, one has 1 row and the other 2 rows). The merged data.table has two unique rows, but when I call unique() on the merged data.table, I get a data.table with one row. Am I doing something wrong? Or is this a bug?
Here's an MWE:
library(data.table)
X = data.table(keyCol = 1)
setkey(X, keyCol)
Y = data.table(keyCol = 1, otherKey = 1:2)
setkeyv(Y, c("keyCol", "otherKey"))
X[Y, ] # 2 unique rows
unique(X[Y, ]) # Only 1 row???
I'd expect unique(X[Y, ]) to be the same as X[Y, ] since all rows are unique, but this doesn't seem to be the case.
In data.table versions before 1.9.8, the default value of the by argument of unique.data.table is key(x). Therefore, calling unique(x) on a keyed data.table only looks at the key columns (from 1.9.8 onward, the default is all columns). To override it, do:
unique(x, by = NULL)
by = NULL considers all the columns. Alternatively, you can provide by = names(x).
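Passing by explicitly gives the same result regardless of the installed version. A minimal sketch with the tables from the question (the name merged is introduced here for illustration):

```r
library(data.table)

X <- data.table(keyCol = 1)
setkey(X, keyCol)
Y <- data.table(keyCol = 1, otherKey = 1:2)
setkeyv(Y, c("keyCol", "otherKey"))

merged <- X[Y]

# Compare on every column: both rows are kept
all_cols <- unique(merged, by = names(merged))

# Compare on the key column only: the rows collapse to one
key_only <- unique(merged, by = "keyCol")

print(nrow(all_cols))
print(nrow(key_only))
```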