Alternative to expand.grid for data.frames - r

I have a data.frame df and I want that every row in this df is duplicated lengthTime times and that a new column is added that counts from 1 to lengthTime for each row in df.
I know it sounds pretty complicated, but what I basically want is to apply expand.grid to df. Here is an ugly workaround, and I have the feeling that there must be an easier solution (maybe even a base-R function?):
df <- data.frame(ID = rep(letters[1:3], each=3),
CatA = rep(1:3, times = 3),
CatB = letters[1:9])
lengthTime <- 3
nrRow <- nrow(df)
intDF <- df
for (i in 1:(lengthTime - 1)) {
df <- rbind(df, intDF)
}
df$Time <- rep(1:lengthTime, each=nrRow)
I thought that I could just use expand.grid(df, 1:lengthTime), but that does not work. outer did not bring any luck either. So does anyone know a good solution?

It's been a while since this question was posted, but I recently came across it looking for just the thing in the title, namely, an expand.grid that works for data frames. The posted answers address the OP's more specific question, so in case anyone is looking for a more general solution for data frames, here's a slightly more general approach:
expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))
# For the example in the OP
expand.grid.df(df, data.frame(1:lengthTime))
# More generally
df1 <- data.frame(A=1:3, B=11:13)
df2 <- data.frame(C=51:52, D=c("Y", "N"))
df3 <- data.frame(E=c("+", "-"))
expand.grid.df(df1, df2, df3)
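As a quick check of the merge-based approach, here is a minimal base-R sketch applied to the data from the question. The column name Time and the object name res are choices made for this illustration only; merge() with by = NULL returns the Cartesian product of its two inputs.

```r
# Cross join of data frames via merge(by = NULL), as in the answer above.
expand.grid.df <- function(...) {
  Reduce(function(x, y) merge(x, y, by = NULL), list(...))
}

df <- data.frame(ID = rep(letters[1:3], each = 3),
                 CatA = rep(1:3, times = 3),
                 CatB = letters[1:9])
lengthTime <- 3

# Naming the column up front avoids an auto-generated column name.
res <- expand.grid.df(df, data.frame(Time = seq_len(lengthTime)))

nrow(res)        # 27: 9 original rows x 3 time points
table(res$Time)  # each time value occurs 9 times
```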

You can also just do a simple merge by NULL (which will cause merge to do simple combinatorial data replication):
merge(data.frame(time=1:lengthTime), iris, by=NULL)

Why not just something like df[rep(1:nrow(df),times = 3),] to extend the data frame, and then add the extra column just as you have above, with df$Time <- rep(1:lengthTime, each=nrRow)?

Quick update
There is now also the crossing() function in package tidyr which can be used instead of merge, is somewhat faster, and returns a tbl_df / tibble.
library(magrittr)  # provides %>%
data.frame(time=1:10) %>% merge(iris, by=NULL)
data.frame(time=1:10) %>% tidyr::crossing(iris)

This works:
REP <- rep(1:nrow(df), 3)
df2 <- data.frame(df[REP, ], Time = rep(1:3, each = 9))
rownames(df2) <- NULL
df2
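The same idea can be written without the hard-coded 3 and 9. The following is only a sketch: repeat_rows and its arguments n and name are hypothetical names introduced here, not part of any package.

```r
# Hypothetical helper: stack `n` copies of `df` and add a counter column.
repeat_rows <- function(df, n, name = "Time") {
  idx <- rep(seq_len(nrow(df)), times = n)         # 1..nrow(df), repeated n times
  out <- df[idx, , drop = FALSE]
  out[[name]] <- rep(seq_len(n), each = nrow(df))  # 1,1,...,2,2,...,n,n,...
  rownames(out) <- NULL                            # drop the 1.1, 1.2, ... row names
  out
}

df <- data.frame(ID = rep(letters[1:3], each = 3),
                 CatA = rep(1:3, times = 3),
                 CatB = letters[1:9])
out <- repeat_rows(df, 3)
dim(out)  # 27 rows, 4 columns
```

Using seq_len() rather than 1:n keeps the helper well behaved when df has zero rows.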

A data.table solution:
> library(data.table)
> ( df <- data.frame(ID = rep(letters[1:3], each=3),
+ CatA = rep(1:3, times = 3),
+ CatB = letters[1:9]) )
ID CatA CatB
1 a 1 a
2 a 2 b
3 a 3 c
4 b 1 d
5 b 2 e
6 b 3 f
7 c 1 g
8 c 2 h
9 c 3 i
> ( DT <- data.table(df)[, lapply(.SD, function(x) rep(x,3))][, Time:=rep(1:3, each=nrow(df))] )
ID CatA CatB Time
1: a 1 a 1
2: a 2 b 1
3: a 3 c 1
4: b 1 d 1
5: b 2 e 1
6: b 3 f 1
7: c 1 g 1
8: c 2 h 1
9: c 3 i 1
10: a 1 a 2
11: a 2 b 2
12: a 3 c 2
13: b 1 d 2
14: b 2 e 2
15: b 3 f 2
16: c 1 g 2
17: c 2 h 2
18: c 3 i 2
19: a 1 a 3
20: a 2 b 3
21: a 3 c 3
22: b 1 d 3
23: b 2 e 3
24: b 3 f 3
25: c 1 g 3
26: c 2 h 3
27: c 3 i 3
Another one:
> library(data.table)
> ( df <- data.frame(ID = rep(letters[1:3], each=3),
+ CatA = rep(1:3, times = 3),
+ CatB = letters[1:9]) )
> DT <- data.table(df)
> rbindlist(lapply(1:3, function(i) cbind(DT, Time=i)))
ID CatA CatB Time
1: a 1 a 1
2: a 2 b 1
3: a 3 c 1
4: b 1 d 1
5: b 2 e 1
6: b 3 f 1
7: c 1 g 1
8: c 2 h 1
9: c 3 i 1
10: a 1 a 2
11: a 2 b 2
12: a 3 c 2
13: b 1 d 2
14: b 2 e 2
15: b 3 f 2
16: c 1 g 2
17: c 2 h 2
18: c 3 i 2
19: a 1 a 3
20: a 2 b 3
21: a 3 c 3
22: b 1 d 3
23: b 2 e 3
24: b 3 f 3
25: c 1 g 3
26: c 2 h 3
27: c 3 i 3

Related

How to convert the values in one column into new columns, the values in another column into rows, indexing values in a third column?

Suppose I have the following data.table in R:
> data.table(Group = c(rep(1, 5), rep(2,5), rep(3,5)), Type = c("A","B","C","D","E"), Value = c(1:15))
Group Type Value
1: 1 A 1
2: 1 B 2
3: 1 C 3
4: 1 D 4
5: 1 E 5
6: 2 A 6
7: 2 B 7
8: 2 C 8
9: 2 D 9
10: 2 E 10
11: 3 A 11
12: 3 B 12
13: 3 C 13
14: 3 D 14
15: 3 E 15
I would like to create a new data table where I have:
> dat <- data.table(A = c(1,6,11), B = c(2,7,12), C = c(3,8,13), D = c(4,9,14), E = c(5,10,15))
> rownames(dat) <- c("1","2","3")
> dat
A B C D E
1: 1 2 3 4 5
2: 6 7 8 9 10
3: 11 12 13 14 15
where the rownames are now the Group values and the Type values are the column names, with the entries being the corresponding values from Value. Is there a way to do this using a function in data.table?
Using data.table rather than tidyr functions:
dt <- data.table(Group = c(rep(1, 5), rep(2,5), rep(3,5)), Type = c("A","B","C","D","E"), Value = c(1:15))
data.table::dcast(dt, Group ~ Type, value.var = "Value")
# Group A B C D E
# 1: 1 1 2 3 4 5
# 2: 2 6 7 8 9 10
# 3: 3 11 12 13 14 15
Edit: I have made the data.table:: explicit because there is also reshape2::dcast().
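Note that a data.table does not keep custom row names, so the Group values end up in an ordinary column, as in the dcast() output above. If row names are really wanted, one base-R alternative is a matrix built with tapply(). This is only a sketch: the object names d and wide are made up for this illustration, and sum is used merely because each Group/Type cell holds exactly one value.

```r
d <- data.frame(Group = rep(1:3, each = 5),
                Type  = rep(c("A", "B", "C", "D", "E"), times = 3),
                Value = 1:15)

# One cell per Group/Type pair; sum() of a single value returns that value.
wide <- with(d, tapply(Value, list(Group = Group, Type = Type), FUN = sum))

rownames(wide)  # "1" "2" "3": the Group values
wide["2", "C"]  # 8
```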

Get the group index in a data.table

I have the following data.table:
library(data.table)
DT <- data.table(a = c(1,2,3,4,5,6,7,8,9,10), b = c('A','A','A','B','B', 'C', 'C', 'C', 'D', 'D'), c = c(1,1,1,1,1,2,2,2,2,2))
> DT
a b c
1: 1 A 1
2: 2 A 1
3: 3 A 1
4: 4 B 1
5: 5 B 1
6: 6 C 2
7: 7 C 2
8: 8 C 2
9: 9 D 2
10: 10 D 2
I want to add a column that shows the index grouped by c (starts from 1 from each group in column c), but that only changes when the value of b is changed. The result wanted is shown below:
Here are two ways to do this:
Using rleid:
library(data.table)
DT[, col := rleid(b), c]
With match + unique:
DT[, col := match(b, unique(b)), c]
# a b c col
# 1: 1 A 1 1
# 2: 2 A 1 1
# 3: 3 A 1 1
# 4: 4 B 1 2
# 5: 5 B 1 2
# 6: 6 C 2 1
# 7: 7 C 2 1
# 8: 8 C 2 1
# 9: 9 D 2 2
#10: 10 D 2 2
We can use factor with levels specified and coerce it to integer
library(data.table)
DT[, col := as.integer(factor(b, levels = unique(b))), c]
Output:
DT
# a b c col
# 1: 1 A 1 1
# 2: 2 A 1 1
# 3: 3 A 1 1
# 4: 4 B 1 2
# 5: 5 B 1 2
# 6: 6 C 2 1
# 7: 7 C 2 1
# 8: 8 C 2 1
# 9: 9 D 2 2
#10: 10 D 2 2
Or using base R with rle
with(DT, as.integer(ave(b, c, FUN = function(x)
with(rle(x), rep(seq_along(values), lengths)))))
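The approaches above agree on the example data, but they differ when a value of b reappears later within the same group. A small sketch with made-up data (the column names g, run_id and val_id are chosen for this illustration only):

```r
library(data.table)

# "A" occurs, is interrupted by "B", then occurs again within the same group.
DT <- data.table(g = c(1, 1, 1, 1), b = c("A", "A", "B", "A"))

DT[, run_id := rleid(b), by = g]             # new id at every change of b
DT[, val_id := match(b, unique(b)), by = g]  # same id for the same value
DT
# run_id: 1 1 2 3
# val_id: 1 1 2 1
```

Which one is appropriate depends on whether a recurring value should count as a new run or keep its earlier index.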

Enumerate groups within groups in a data.table [duplicate]

This is related to multiple duplicates (1, 2, 3), but a slightly different problem that I'm stuck with. So far, I've seen pandas solution only.
In this data table:
dt = data.table(gr = rep(letters[1:2], each = 6),
cl = rep(letters[1:4], each = 3))
gr cl
1: a a
2: a a
3: a a
4: a b
5: a b
6: a b
7: b c
8: b c
9: b c
10: b d
11: b d
12: b d
I'd like to enumerate unique classes per group to obtain this:
gr cl id
1: a a 1
2: a a 1
3: a a 1
4: a b 2
5: a b 2
6: a b 2
7: b c 1
8: b c 1
9: b c 1
10: b d 2
11: b d 2
12: b d 2
Try
library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2
You can do the following (the data may need to be sorted first):
dt[, id := cumsum(!duplicated(cl)), by = gr]
gr cl id
1: a a 1
2: a a 1
3: a a 1
4: a b 2
5: a b 2
6: a b 2
7: b c 1
8: b c 1
9: b c 1
10: b d 2
11: b d 2
12: b d 2
The same with dplyr:
library(dplyr)
dt %>%
group_by(gr) %>%
mutate(id = cumsum(!duplicated(cl)))
Or a rleid()-like possibility:
dt %>%
group_by(gr) %>%
mutate(id = with(rle(cl), rep(seq_along(lengths), lengths)))
An alternative solution using factor which will not require ordering first
dt %>%
group_by(gr) %>%
mutate(id = as.numeric(factor(cl))) %>%
ungroup()
# # A tibble: 12 x 3
# gr cl id
# <chr> <chr> <dbl>
# 1 a a 1
# 2 a a 1
# 3 a a 1
# 4 a b 2
# 5 a b 2
# 6 a b 2
# 7 b c 1
# 8 b c 1
# 9 b c 1
#10 b d 2
#11 b d 2
#12 b d 2
Note that this will automatically assign a number / id based on the alphabetical order of the cl values, within each gr group.
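To illustrate the point about alphabetical order, here is a small base-R sketch with a made-up vector. Passing levels = unique(cl) is one way to number the values by order of first appearance instead.

```r
cl <- c("b", "b", "a", "a")

# Default factor levels are sorted, so "a" gets 1 even though "b" comes first.
alpha_id <- as.integer(factor(cl))                       # 2 2 1 1

# Levels in order of first appearance.
seen_id  <- as.integer(factor(cl, levels = unique(cl)))  # 1 1 2 2
```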

Keep all the data.table when aggregating a data.table

I would like to aggregate a data.table by a list of column and keep all the columns at the end.
A <- c(1,2,3,4,4,6,4)
B <- c("a","b","c","d","e","f","g")
C <- c(10,11,23,8,8,1,3)
D <- c(2,3,5,9,7,8,4)
dt <- data.table(A,B,C,D)
Now I want to aggregate column B with paste(B, collapse = ";") by A and C, and keep column D at the end too. Do you know a way to do this?
EDIT
This is what I obtained using dt[, newCol := toString(B), .(A, C)]:
A B C D newCol
1: 1 a 10 2 a
2: 2 b 11 3 b
3: 3 c 23 5 c
4: 4 d 8 9 d, e
5: 4 e 8 7 d, e
6: 6 f 1 8 f
7: 4 g 3 4 g
But I would like to obtain:
A B C D newCol
1: 1 a 10 2 a
2: 2 b 11 3 b
3: 3 c 23 5 c
4: 4 d 8 9 d, e
6: 6 f 1 8 f
7: 4 g 3 4 g
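One possible way to get this result, sketched under the assumption that the first row of each (A, C) group is the one to keep (as in the desired output above, where B = d and D = 9 survive): add the aggregated column by reference, then drop the later rows of each group with unique() and its by argument. The name res is chosen for this illustration only.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3, 4, 4, 6, 4),
                 B = c("a", "b", "c", "d", "e", "f", "g"),
                 C = c(10, 11, 23, 8, 8, 1, 3),
                 D = c(2, 3, 5, 9, 7, 8, 4))

# Aggregate B within each (A, C) group, keeping all columns.
dt[, newCol := toString(B), by = .(A, C)]

# Assumption: keep only the first row of each (A, C) group.
res <- unique(dt, by = c("A", "C"))
res
```

If a semicolon separator is wanted instead of ", ", paste(B, collapse = ";") can replace toString(B).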

Replace duplicate rows in data.table

I'm trying to replace values of duplicate rows in a data.table. Let's say you have
A <- c(1,2,3,4,4,6,4)
B <- c("a","b","c","d","e","f","g")
C <- c(10,11,23,8,8,1,3)
dt <- data.table(A,B,C)
I would like to do: dt[duplicated(dt,dt[,c(1,3)]),][,2] <- 0 to obtain
>dt
A B C
1: 1 a 10
2: 2 b 11
3: 3 c 23
4: 4 d 8
5: 4 0 8
6: 6 f 1
7: 4 g 3
You could do
> library(data.table)
> A <- c(1,2,3,4,4,6,4)
> B <- c("a","b","c","d","e","f","g")
> C <- c(10,11,23,8,8,1,3)
> dt <- data.table(A,B,C, stringsAsFactors = FALSE)
> dt[dt[, j = duplicated(.SD), .SDcols = c("A", "C")], B := "0"]
> dt
A B C
1: 1 a 10
2: 2 b 11
3: 3 c 23
4: 4 d 8
5: 4 0 8
6: 6 f 1
7: 4 g 3
... but now seeing David's solution is way more concise...
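For reference, a shorter variant is possible because duplicated() has a data.table method with a by argument that restricts the comparison to the named columns. This is only a sketch and is not necessarily the solution referred to above.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3, 4, 4, 6, 4),
                 B = c("a", "b", "c", "d", "e", "f", "g"),
                 C = c(10, 11, 23, 8, 8, 1, 3))

# Rows whose (A, C) pair has already occurred get B set to "0", by reference.
dt[duplicated(dt, by = c("A", "C")), B := "0"]
dt$B
```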
