This question already has answers here:
How to reorder data.table columns (without copying)
(2 answers)
Closed 8 years ago.
How do I permute columns in a data.table?
I can do that for a data.frame, but data.table overrides the method:
> df <- data.frame(a=1:3,b=4:6)
> df
a b
1 1 4
2 2 5
3 3 6
> df[c("b","a")]
b a
1 4 1
2 5 2
3 6 3
> dt <- as.data.table(df)
> dt
a b
1: 1 4
2: 2 5
3: 3 6
> dt[c("b","a")]
Error in `[.data.table`(dt, c("b", "a")) :
When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey.
Calls: [ -> [.data.table
Note that this is not a dupe for How does one reorder columns in R?.
Use setcolorder:
> library(data.table)
> dt <- data.table(a=1:3,b=4:6)
> setcolorder(dt, c("b", "a"))
> dt
b a
1: 4 1
2: 5 2
3: 6 3
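A small sketch to check that setcolorder works by reference rather than returning a reordered copy. It assumes data.table's address() helper is available and compares the table's memory address before and after the call; the variable names before and after are only illustrative.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)

# Record where the table lives in memory before reordering
before <- address(dt)

# Reorder in place; no assignment back to dt is needed
setcolorder(dt, c("b", "a"))

after <- address(dt)

names(dt)                 # column order is now b, a
identical(before, after)  # expected to be TRUE if no copy of the table was made
```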
This is how you do it in data.table without modifying the original table; each form returns a new data.table:
dt[, list(b, a)]
or
dt[, c("b", "a")]
or
dt[, c(2, 1)]
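If the new order is held in a character vector, a sketch like the following may help. Here new_order is a hypothetical variable name, and with = FALSE tells data.table to treat j as a vector of column names rather than as an expression.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)

# Hypothetical vector holding the desired column order
new_order <- c("b", "a")

# Select the columns in that order; the result is a new table
dt_reordered <- dt[, new_order, with = FALSE]

names(dt_reordered)  # b, a
names(dt)            # a, b (the original is untouched)
```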
Related
I am learning data.table and trying to filter certain columns by using a vector containing a set of column names.
> dt <- data.table(A=1:5, B=2:6, C=3:7)
> dt
A B C
1: 1 2 3
2: 2 3 4
3: 3 4 5
4: 4 5 6
5: 5 6 7
>
> list <- c("A", "B")
> dt[ ,list, with=FALSE]
A B
1: 1 2
2: 2 3
3: 3 4
4: 4 5
5: 5 6
>
This works fine and selects the columns.
However, a name in the vector that is missing from the table causes an error:
> list <- c("A", "B", "D")
> dt[ ,list, with=FALSE]
Error in `[.data.table`(dt, , list, with = FALSE) :
column(s) not found: D
How can I ignore the missing column names in the vector and return just the existing columns of the dt data.table?
dt[ ,colnames(dt) %in% list, with=FALSE]
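An alternative sketch, not from the original answer, uses base R's intersect() to drop the names that are absent before selecting. The names wanted and keep are illustrative. One difference from the %in% form: intersect() returns the names in the order of the requested vector, while the logical %in% mask keeps the table's own column order.

```r
library(data.table)

dt <- data.table(A = 1:5, B = 2:6, C = 3:7)

# Requested columns; "D" does not exist in dt
wanted <- c("A", "B", "D")

# Keep only the requested names that are actually present
keep <- intersect(wanted, names(dt))

res <- dt[, keep, with = FALSE]
names(res)  # A, B
```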
This question already has answers here:
Convert *some* column classes in data.table
(2 answers)
Closed 6 years ago.
I am looking to convert the columns of a data.table to another class, and I am stuck because I cannot refer to the columns using strings.
set.seed(10238)
idt <- data.table(A = rep(1:3, each = 5), B = rep(1:5, 3),
C = sample(15), D = sample(15))
> idt
A B C D
1: 1 1 10 14
2: 1 2 2 2
3: 1 3 13 3
4: 1 4 7 1
5: 1 5 1 8
6: 2 1 11 15
7: 2 2 4 10
8: 2 3 15 7
9: 2 4 14 12
10: 2 5 5 9
11: 3 1 8 13
12: 3 2 3 4
13: 3 3 9 6
14: 3 4 6 11
15: 3 5 12 5
#All columns are integers
> lapply(idt, class)
$A
[1] "integer"
$B
[1] "integer"
$C
[1] "integer"
$D
[1] "integer"
vec = parse(text=c('A','B','C','D'))
for (i in vec) idt[, eval( i ) := as.character( eval(i) ) ]
Error in eval(expr, envir, enclos) : object 'A' not found
I want to reassign the column classes by looping through a vector containing the strings representing the names of the columns I want to convert.
** EDIT NOT A DUPLICATE **
I am aware of other threads addressing the same problem, but they are not very understandable. My question is why I can't loop through expressions and eval them, just as I would manually replace the i in the j-expressions with the column name for each column.
We can do this with a for loop over the column names of 'idt'. For each name we get the column's values with get(i), convert them to character, and assign (:=) the result back to the same column. Wrapping the name in parentheses, (i), makes := use the value of i as the column name.
vec <- names(idt)
for(i in vec) idt[, (i) := as.character(get(i))]
Or, using .SDcols, we specify the columns of interest in .SDcols, loop through the Subset of Data.table (.SD) with lapply, and assign (:=) the result to the vector of column names ('vec')
idt[, (vec) := lapply(.SD, as.character), .SDcols = vec]
Try this:
names <- colnames(idt)
idt <- idt[, lapply(.SD, as.character), .SDcols = (names)]
.SD can be used with data.table to get a subset of the data. .SDcols is used to tell data.table which columns to lapply the function to.
Based on this answer, the following is a better approach, because := updates the columns by reference instead of building a new table:
names <- colnames(idt)
idt[, (names) := lapply(.SD, as.character), .SDcols = names]
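For converting only some of the columns, a loop over data.table's set() function is another option. This is a sketch on a small made-up table, not code from the answers above; cols and col are illustrative names.

```r
library(data.table)

idt <- data.table(A = 1:3, B = 4:6, C = 7:9)

# Hypothetical subset of columns to convert
cols <- c("A", "C")

# set() assigns by reference; j may be a column name given as a string
for (col in cols) {
  set(idt, j = col, value = as.character(idt[[col]]))
}

sapply(idt, class)  # A and C are character, B stays integer
```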
This question already has answers here:
paste two data.table columns
(4 answers)
Closed 6 years ago.
For example there is the following data.table:
dt <- data.table(x = list(1:2, 3:5, 6:9), y = c(1,2,3))
# x y
# 1: 1,2 1
# 2: 3,4,5 2
# 3: 6,7,8,9 3
I need to create a new data.table, where values of the y column will be appended to lists stored in the x column:
# z
# 1: 1,2,1
# 2: 3,4,5,2
# 3: 6,7,8,9,3
I've tried the lapply, cbind, list, and c functions, but I can't get the table I need.
UPDATE:
This question is different from "paste two data.table columns" because a trivial solution with the paste function or something similar doesn't work.
This will do it
# Merge two lists
dt[, z := mapply(c, x, y, SIMPLIFY=FALSE)]
print(dt)
x y z
1: 1,2 1 1,2,1
2: 3,4,5 2 3,4,5,2
3: 6,7,8,9 3 6,7,8,9,3
To delete the original x and y columns afterwards:
dt[, c("x", "y") := NULL]
print(dt)
z
1: 1,2,1
2: 3,4,5,2
3: 6,7,8,9,3
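As a hedged aside, base R's Map() is a wrapper around mapply(..., SIMPLIFY = FALSE), so the same row-wise append can be sketched with it. The example rebuilds the table from the question so that it runs on its own.

```r
library(data.table)

dt <- data.table(x = list(1:2, 3:5, 6:9), y = c(1, 2, 3))

# Map() pairs up the elements of x and y and applies c() to each pair,
# always returning a list, which becomes the new list column z
dt[, z := Map(c, x, y)]

dt$z[[2]]  # 3 4 5 2
```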
I would like to suggest a general approach for this kind of task, in case you have multiple columns that you would like to combine into a single column.
Example data with multiple columns:
dt <- data.table(x = list(1:2, 3:5, 6:9), y = 1:3, z = list(4:6, NULL, 5:8))
Solution
res <- melt(dt, measure.vars = names(dt))[, .(.(unlist(value))), by = rowid(variable)]
res$V1
# [[1]]
# [1] 1 2 1 4 5 6
#
# [[2]]
# [1] 3 4 5 2
#
# [[3]]
# [1] 6 7 8 9 3 5 6 7 8
The idea here is to convert to long format and then unlist/list by group.
(You will receive a warning due to the different classes in the resulting value column.)
This question already has an answer here:
Pass column name in data.table using variable [duplicate]
(1 answer)
Closed 6 years ago.
Suppose I have a data.table with column names that are specified in a variable. For example, I might have used dcast as:
groups <- sample(LETTERS, 2) # i.e. I don't know the values
dt1 <- data.table(ID = rep(1:2, each = 2), group = groups, value = 3:6)
(dt2 <- dcast(dt1, ID~group, value.var = "value"))
# ID D Q
# 1: 1 3 4
# 2: 2 5 6
Now I want to subset based on values in the last two columns, e.g. do something like:
dt2[groups[1] == 3 & groups[2] == 4]
# Empty data.table (0 rows) of 3 cols: ID,D,Q
Is there an easy way?
I found I can do this with keys:
setkeyv(dt2, groups)
dt2[.(3, 4)]
# ID D Q
# 1: 1 3 4
But how do I do something more elaborate, such as the following?
dt2[groups[1] > 3 & groups[2] < 7]
You can use get, which (quoting ?get) will "search by name for an object":
dt2[get(groups[1]) > 2 & get(groups[2]) == 4]
# ID A J
#1: 1 3 4
We can use eval with as.name and it should be faster than get
dt2[eval(as.name(groups[1])) > 2 & eval(as.name(groups[2])) == 4]
# ID L U
#1: 1 4 3
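To make the comparison reproducible, here is a sketch with fixed column names instead of sample(); the values of groups and the table contents are assumptions chosen for illustration. It applies the "more elaborate" filter from the question with get(), and also with plain [[ ]] extraction, which is ordinary list indexing and involves no evaluation of names.

```r
library(data.table)

# Fixed here for reproducibility; in the question these come from sample()
groups <- c("D", "Q")
dt2 <- data.table(ID = 1:2, D = c(3L, 5L), Q = c(4L, 6L))

# Look the columns up by name inside i
res_get <- dt2[get(groups[1]) > 3 & get(groups[2]) < 7]

# Same filter, extracting each column as a vector with [[ ]]
res_sub <- dt2[dt2[[groups[1]]] > 3 & dt2[[groups[2]]] < 7]

res_get$ID  # 2
res_sub$ID  # 2
```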
I have noticed this interesting behaviour of data.tables:
I create a new data.table and use a function to do something to it and change the colnames with setnames. In this minimal example only setnames is used to change 'B' to 'C':
dt1 <- data.table(
A=c(1:5),
B=c(6:10))
> dt1
A B
1: 1 6
2: 2 7
3: 3 8
4: 4 9
5: 5 10
doSomething <- function(dt){
setnames(dt, "B", "C")
}
dt2 <- doSomething(dt1)
> dt2
A C
1: 1 6
2: 2 7
3: 3 8
4: 4 9
5: 5 10
All appears to have worked without a hitch. However, looking at dt1:
> dt1
A C
1: 1 6
2: 2 7
3: 3 8
4: 4 9
5: 5 10
After the function call, dt1 also has the changed column name 'C'. I know that data.tables do not work in exactly the same manner as data.frames, in that certain operations modify them in place rather than creating "duplicates" assigned to new objects. However, in this case a new object gets assigned, and still the old object changes after the operation. It is somewhat reminiscent of Python.
Is this working as intended, or should I report it as a bug? Also, is there a way to change this behaviour? I would like to keep dt1 intact after applying a function that uses setnames to it.
Cheers
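setnames changes the column names by reference, and doSomething returns the same table it was given, so dt1 and dt2 refer to one object. To keep the original names, make a copy of the original dataset ('dt1') first and then call doSomething(dt1), which will change only 'dt1'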
dt2 <- copy(dt1)
doSomething(dt1)
colnames(dt1)
#[1] "A" "C"
colnames(dt2)
#[1] "A" "B"
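If the goal is to leave dt1 intact, one option is to do the copy inside the function. This is a sketch only; doSomethingSafely is a hypothetical name and not part of data.table.

```r
library(data.table)

dt1 <- data.table(A = 1:5, B = 6:10)

# Hypothetical variant of doSomething that never touches its input
doSomethingSafely <- function(dt) {
  out <- copy(dt)          # explicit copy, so the by-reference change stays local
  setnames(out, "B", "C")  # renames the column of the copy only
  out
}

dt2 <- doSomethingSafely(dt1)

names(dt1)  # A, B
names(dt2)  # A, C
```

The cost is one full copy of the table per call, which is the trade-off that by-reference functions such as setnames are designed to avoid.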