How to loop through data.table columns? [duplicate] - r

This question already has answers here:
Convert *some* column classes in data.table
(2 answers)
Closed 6 years ago.
I am looking to convert the columns of a data.table into another class, and I am stuck because I can't refer to the columns using strings.
library(data.table)
set.seed(10238)
idt <- data.table(A = rep(1:3, each = 5), B = rep(1:5, 3),
                  C = sample(15), D = sample(15))
> idt
A B C D
1: 1 1 10 14
2: 1 2 2 2
3: 1 3 13 3
4: 1 4 7 1
5: 1 5 1 8
6: 2 1 11 15
7: 2 2 4 10
8: 2 3 15 7
9: 2 4 14 12
10: 2 5 5 9
11: 3 1 8 13
12: 3 2 3 4
13: 3 3 9 6
14: 3 4 6 11
15: 3 5 12 5
# All columns are integers
> lapply(idt, class)
$A
[1] "integer"
$B
[1] "integer"
$C
[1] "integer"
$D
[1] "integer"
vec = parse(text=c('A','B','C','D'))
for (i in vec) idt[, eval( i ) := as.character( eval(i) ) ]
Error in eval(expr, envir, enclos) : object 'A' not found
I want to reassign the column classes by looping through a vector of strings holding the names of the columns I want to convert.
I am aware of other threads addressing the same problem, but they are not very understandable. My question is: why can't I loop through expressions and eval them, just as I would if I manually replaced the i in the j-expressions with each column name?
EDIT: NOT A DUPLICATE

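We can do this with a for loop over the column names of 'idt'. For each name, get(i) fetches the column's values, as.character() converts them, and := assigns the result back to the column whose name is stored in i. Wrapping it as (i) tells data.table to evaluate i rather than create a column literally named "i".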
vec <- names(idt)
for(i in vec) idt[, (i) := as.character(get(i))]
Or, using .SDcols: specify the columns of interest in .SDcols, loop through the Subset of Data (.SD) with lapply, and assign (:=) the result to the columns named in 'vec':
idt[, (vec) := lapply(.SD, as.character), .SDcols = vec]
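A self-contained sketch of both approaches above, rebuilding the question's idt so the block runs on its own. Each approach works on its own copy() so the results can be compared. The third variant, a loop over set(), is an extra not taken from the answer: set() is data.table's assignment function that updates a column by reference without going through [.data.table on every iteration.

```r
library(data.table)

set.seed(10238)
idt <- data.table(A = rep(1:3, each = 5), B = rep(1:5, 3),
                  C = sample(15), D = sample(15))
vec <- names(idt)

# 1) for loop: (i) makes data.table evaluate i to get the column name,
#    and get(i) fetches that column's values inside j
idt1 <- copy(idt)
for (i in vec) idt1[, (i) := as.character(get(i))]

# 2) .SDcols: lapply over the selected columns and assign them all at once
idt2 <- copy(idt)
idt2[, (vec) := lapply(.SD, as.character), .SDcols = vec]

# 3) set() in a loop (not from the answer): updates each column by reference
idt3 <- copy(idt)
for (col in vec) set(idt3, j = col, value = as.character(idt3[[col]]))

# every column of idt1, idt2 and idt3 is now character; idt is untouched
```

All three produce the same table; the .SDcols form is usually the most compact when the same function is applied to many columns.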

Try this:
names <- colnames(idt)
idt <- idt[, lapply(.SD, as.character), .SDcols = (names)]
.SD (the Subset of Data) holds the columns selected by .SDcols, and lapply() applies the function to each of them. Note that this builds a new data.table, which is then assigned back to idt.
Based on this answer, a better approach is to update the columns by reference with :=, which avoids copying the table:
names <- colnames(idt)
idt[, (names) := lapply(.SD, as.character), .SDcols = names]
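The two versions differ once only some of the columns are converted. A minimal sketch on a small hypothetical table: the .SDcols subset returns only the .SDcols columns, so assigning it back would drop the others, while := updates the listed columns in place and keeps the rest.

```r
library(data.table)

idt <- data.table(A = 1:3, B = 4:6, C = 7:9)
cols <- c("A", "B")

# Builds a NEW table containing only the .SDcols columns (C is not included)
subset_copy <- idt[, lapply(.SD, as.character), .SDcols = cols]

# Updates A and B of idt by reference; C stays an integer column
idt[, (cols) := lapply(.SD, as.character), .SDcols = cols]
```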

Related

Use of parenthesis on LHS when assigning multiple columns in data.table

Given the data:
library(data.table)
dt = data.table(x = 1:5, y = 6:10)
x y
1: 1 6
2: 2 7
3: 3 8
4: 4 9
5: 5 10
Let's say I want two new columns holding the means of x and y. I can easily do this by providing a character vector of column names on the left-hand side (LHS) of :=:
dt[, c("mean_x", "mean_y") := lapply(.SD, mean)]
But if I assign the column names to a vector beforehand and use the vector's name as the LHS, I get an error:
new_names = c("mean_x", "mean_y")
dt[, new_names := lapply(.SD, mean)]
# Error in `[.data.table`(dt, , `:=`(new_names , lapply(.SD, mean))) :
# Supplied 2 items to be assigned to 10 items of column 'cols'.
# If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
However, if I put parentheses around the name of the vector, it works fine:
dt[, (new_names) := lapply(.SD, mean)]
x y mean_x mean_y
1: 1 6 3 8
2: 2 7 3 8
3: 3 8 3 8
4: 4 9 3 8
5: 5 10 3 8
What do the parentheses tell data.table internally, so that it treats new_names as a vector of new column names and not something else?
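A minimal sketch of the behaviour, as we understand data.table's handling of the := LHS: a bare symbol is taken literally as a column name, whereas any other expression (including a symbol wrapped in parentheses) is evaluated in the calling scope and must yield column names. The variable col below is hypothetical, chosen only to make the difference visible.

```r
library(data.table)

dt <- data.table(x = 1:5, y = 6:10)
col <- "z"

# Bare symbol on the LHS: data.table uses the symbol's own name,
# so this adds a column literally called "col"
dt[, col := 1L]

# Parenthesised LHS: the expression is evaluated in the calling scope,
# (col) evaluates to "z", so this adds a column called "z"
dt[, (col) := 2L]

# names(dt) is now c("x", "y", "col", "z")
```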

How to append values of columns? [duplicate]

This question already has answers here:
paste two data.table columns
(4 answers)
Closed 6 years ago.
For example, take the following data.table:
dt <- data.table(x = list(1:2, 3:5, 6:9), y = c(1,2,3))
# x y
# 1: 1,2 1
# 2: 3,4,5 2
# 3: 6,7,8,9 3
I need to create a new data.table in which each value of the y column is appended to the vector stored in the x column of the same row:
# z
# 1: 1,2,1
# 2: 3,4,5,2
# 3: 6,7,8,9,3
I've tried the lapply, cbind, list and c functions, but I can't get the table I need.
UPDATE:
The question is different from "paste two data.table columns" because a simple solution with paste() or something similar doesn't work here.
This will do it:
# Append each y value to the x vector in the same row
dt[, z := mapply(c, x, y, SIMPLIFY=FALSE)]
print(dt)
x y z
1: 1,2 1 1,2,1
2: 3,4,5 2 3,4,5,2
3: 6,7,8,9 3 6,7,8,9,3
Then delete the original x and y columns:
dt[, c("x", "y") := NULL]
print(dt)
z
1: 1,2,1
2: 3,4,5,2
3: 6,7,8,9,3
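Since the question asks for a new data.table, a sketch that builds it directly instead of adding z and then deleting x and y. Map(f, ...) is base R's mapply(f, ..., SIMPLIFY = FALSE), so this performs the same row-wise append as the mapply call above; the list it returns becomes a list column.

```r
library(data.table)

dt <- data.table(x = list(1:2, 3:5, 6:9), y = c(1, 2, 3))

# Append y[i] to x[[i]] row by row; .() wraps the resulting list as column z
res <- dt[, .(z = Map(c, x, y))]

res$z[[3]]  # 6 7 8 9 3
```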
I would like to suggest a general approach for this kind of task, in case you have multiple columns that you want to combine into a single column.
Example data with multiple columns:
dt <- data.table(x = list(1:2, 3:5, 6:9), y = 1:3, z = list(4:6, NULL, 5:8))
Solution
res <- melt(dt, measure.vars = names(dt))[, .(.(unlist(value))), by = rowid(variable)]
res$V1
# [[1]]
# [1] 1 2 1 4 5 6
#
# [[2]]
# [1] 3 4 5 2
#
# [[3]]
# [1] 6 7 8 9 3 5 6 7 8
The idea here is to convert to long format and then unlist/re-list by group.
(You will receive a warning due to the different classes in the resulting value column.)
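An alternative that stays in wide format, swapped in here and not the answer's melt() approach: fold Map(c, ...) across the columns with base R's Reduce(), so row i of the result is c(x[[i]], y[[i]], z[[i]]). This is a sketch under the assumption that the columns should be concatenated left to right; it skips the intermediate long-format value column entirely.

```r
library(data.table)

dt <- data.table(x = list(1:2, 3:5, 6:9), y = 1:3, z = list(4:6, NULL, 5:8))

# Reduce() walks the columns left to right, appending each one row-wise
# to the accumulated list of vectors (a NULL entry simply adds nothing)
combined <- Reduce(function(acc, col) Map(c, acc, col), dt)

# Store the result as a single list column
res <- data.table(V1 = combined)
```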

data.table avoid column name changing [duplicate]

Pass character vectors and column names to data.table as a list of columns?
I want to select a subset of columns from a data.table in R, where some of the columns are determined earlier and passed in as a character vector, which is then combined with a fixed list of columns.
That is, given this:
library(data.table)
a <- 1:4
b <- 5:8
c <- c('aa','bb','cc','dd')
e <- 1:4
z <- data.table(a,b,c,e)
I want to do this:
z[, list(a,b)]
Which produces this output:
a b
1: 1 5
2: 2 6
3: 3 7
4: 4 8
But I want to do it in some way similar to this (which almost works):
cols <- "b"
z[, list(get(cols), a)]
Result (note that it doesn't return the name of the column stored in cols):
V1 a
1: 5 1
2: 6 2
3: 7 3
4: 8 4
But I need to do it with more than one element of cols, which does not work:
cols <- c('a', 'b')
z[, list(mget(cols), c)]
The above produces the following error:
Error: value for ‘a’ not found
I think my problem lies with scoping and which environments mget is looking in, but I can't figure out what exactly I am doing wrong. Also, how do I preserve the column titles?
Here are two (pretty much equivalent) options. One using lapply:
z[, c(lapply(cols, get), list(c))]
# V1 V2 V3
#1: 1 5 aa
#2: 2 6 bb
#3: 3 7 cc
#4: 4 8 dd
And one using mget:
z[, c(mget(cols, inherits = TRUE), c = list(c))]
# a b c
#1: 1 5 aa
#2: 2 6 bb
#3: 3 7 cc
#4: 4 8 dd
Note that get returns a bare vector, which loses the column name (there isn't much you can do about that besides adding it back manually), while mget returns a named list.
Attempting to mix standard and non-standard evaluation within a single call will probably end in tears / frustration / obfuscated code.
There are a number of options in data.table:
Use the .. notation to "look up one level" and find the vector of column names:
cols <- c('a','b')
z[, ..cols]
Use .SDcols:
z[, .SD, .SDcols = cols]
But if you really want to combine the two ways of referencing, you can use something like the following, which introduces another option, with=FALSE, allowing more general expressions for column names than a simple vector:
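ll <- function(char = NULL, uneval = NULL) {
  # Capture the call so that 'uneval' is read as unevaluated symbols
  Call <- match.call()
  cols <- lapply(Call$uneval, as.character)
  # Combine the character vector with the names of the unevaluated symbols
  unlist(c(char, cols))
}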
z[, ll(cols,c), with=FALSE]
# a b c
# 1: 1 5 aa
# 2: 2 6 bb
# 3: 3 7 cc
# 4: 4 8 dd
z[, ll(char=cols), with=FALSE]
# a b
# 1: 1 5
# 2: 2 6
# 3: 3 7
# 4: 4 8
z[, ll(uneval=c), with=FALSE]
# c
# 1: aa
# 2: bb
# 3: cc
# 4: dd
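For reference, a self-contained sketch of the .. and .SDcols options above (z is rebuilt directly here). When the hard-coded names can be written as strings, concatenating them onto the character vector is enough, and no helper like ll() is needed.

```r
library(data.table)

z <- data.table(a = 1:4, b = 5:8, c = c("aa", "bb", "cc", "dd"), e = 1:4)
cols <- c("a", "b")

# ..cols: look cols up one level (in the calling scope), not as a column
sel1 <- z[, ..cols]

# .SDcols takes any character vector, so predetermined and
# hard-coded names can simply be concatenated
sel2 <- z[, .SD, .SDcols = c(cols, "c")]

# with = FALSE: j is evaluated as an ordinary character vector of names
sel3 <- z[, c(cols, "c"), with = FALSE]
```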
Combining a variable with column names with hard-coded column names in data.table
Given z and cols from the example above:
To combine a vector of column names stored in a variable (cols) with another, hard-coded column name (c), combine them into a new character vector in the call to data.table. We can refer to cols from within j (the second argument within []) by using the "up-one-level" notation ..:
z[, c(..cols, 'c')]
Thank you to @thelatemail for providing the base to the solution above.

How do I reorder data.table columns? [duplicate]

This question already has answers here:
How to reorder data.table columns (without copying)
(2 answers)
Closed 8 years ago.
How do I permute columns in a data.table?
I can do that for a data.frame, but data.table overrides the method:
> df <- data.frame(a=1:3,b=4:6)
> df
a b
1 1 4
2 2 5
3 3 6
> df[c("b","a")]
b a
1 4 1
2 5 2
3 6 3
> dt <- as.data.table(df)
> dt
a b
1: 1 4
2: 2 5
3: 3 6
> dt[c("b","a")]
Error in `[.data.table`(dt, c("b", "a")) :
When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey.
Calls: [ -> [.data.table
Note that this is not a duplicate of "How does one reorder columns in R?".
Use setcolorder:
> library(data.table)
> dt <- data.table(a=1:3,b=4:6)
> setcolorder(dt, c("b", "a"))
> dt
b a
1: 4 1
2: 5 2
3: 6 3
This is how you do it in data.table without modifying the original table (each of these returns a reordered copy):
dt[, list(b, a)]
or
dt[, c("b", "a")]
or
dt[, c(2, 1)]
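A short sketch contrasting the two answers: column selection returns a reordered copy and leaves dt as it was, while setcolorder() reorders dt itself by reference. Selecting with a character vector in j like this assumes a reasonably recent data.table (1.9.8 or later).

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)

# Selecting columns in a new order returns a reordered copy
dt2 <- dt[, c("b", "a")]

# setcolorder() reorders dt in place, without copying the data
setcolorder(dt, c("b", "a"))
```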
