Split and combination of dataframe columns in R

I have a very large dataset (around 500k rows and 15 columns). One of the columns holds several values separated by semicolons, as follows:
Date a b c d
01-01-2020 A1 B1 C1a;C1b D1
30-12-2019 A2 B2 C2a;C2b;C2c D2
33-5-2018 A3 B3 C3a;C3b;C3c;C3d D3
20-11-2019 A4 B4 C4a;C4b D4
I would like to split column c so that I have only two columns (c_01 and c_02). When there are more than two values in c, as in rows 2 and 3, I want to create one row for each unique pair of the Cs, all else equal. The result would then look like:
Date a b c_01 c_02 d
01-01-2020 A1 B1 C1a C1b D1
30-12-2019 A2 B2 C2a C2b D2
30-12-2019 A2 B2 C2a C2c D2
30-12-2019 A2 B2 C2b C2c D2
33-5-2018 A3 B3 C3a C3b D3
33-5-2018 A3 B3 C3a C3c D3
33-5-2018 A3 B3 C3a C3d D3
33-5-2018 A3 B3 C3b C3c D3
33-5-2018 A3 B3 C3b C3d D3
33-5-2018 A3 B3 C3c C3d D3
20-11-2019 A4 B4 C4a C4b D4
I have tried to use csplit to create a single column for each value and then loop over each row, but it does not really work. I have also tried the apply function to build something similar to a loop, but the dataset is too large and I keep receiving errors. Can someone help? Thank you very much!

We could use strsplit to split the 'c' column by ';', then loop over the list with map, get the pairwise combinations with combn, convert each result to a data.frame, and unnest the resulting list column of data.frames
library(dplyr)
library(tidyr)
library(purrr)
df1 %>%
mutate(c = map(strsplit(c, ";"), ~ combn(.x, 2) %>%
t %>%
as.data.frame %>%
set_names(c('c_01', 'c_02')))) %>%
unnest(c(c))
# A tibble: 11 x 6
# Date a b c_01 c_02 d
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 01-01-2020 A1 B1 C1a C1b D1
# 2 30-12-2019 A2 B2 C2a C2b D2
# 3 30-12-2019 A2 B2 C2a C2c D2
# 4 30-12-2019 A2 B2 C2b C2c D2
# 5 33-5-2018 A3 B3 C3a C3b D3
# 6 33-5-2018 A3 B3 C3a C3c D3
# 7 33-5-2018 A3 B3 C3a C3d D3
# 8 33-5-2018 A3 B3 C3b C3c D3
# 9 33-5-2018 A3 B3 C3b C3d D3
#10 33-5-2018 A3 B3 C3c C3d D3
#11 20-11-2019 A4 B4 C4a C4b D4
Or using base R
lst1 <- lapply(strsplit(df1$c, ";"),
function(x) as.data.frame(t(combn(x, 2))))
l1 <- sapply(lst1, nrow)
out <- cbind(df1[rep(seq_len(nrow(df1)), l1),c('Date', 'a', 'b', 'd')],
do.call(rbind, lst1))
row.names(out) <- NULL
names(out)[5:6] <- c("c_01", "c_02")
out
# Date a b d c_01 c_02
#1 01-01-2020 A1 B1 D1 C1a C1b
#2 30-12-2019 A2 B2 D2 C2a C2b
#3 30-12-2019 A2 B2 D2 C2a C2c
#4 30-12-2019 A2 B2 D2 C2b C2c
#5 33-5-2018 A3 B3 D3 C3a C3b
#6 33-5-2018 A3 B3 D3 C3a C3c
#7 33-5-2018 A3 B3 D3 C3a C3d
#8 33-5-2018 A3 B3 D3 C3b C3c
#9 33-5-2018 A3 B3 D3 C3b C3d
#10 33-5-2018 A3 B3 D3 C3c C3d
#11 20-11-2019 A4 B4 D4 C4a C4b
data
df1 <- structure(list(Date = c("01-01-2020", "30-12-2019", "33-5-2018",
"20-11-2019"), a = c("A1", "A2", "A3", "A4"), b = c("B1", "B2",
"B3", "B4"), c = c("C1a;C1b", "C2a;C2b;C2c", "C3a;C3b;C3c;C3d",
"C4a;C4b"), d = c("D1", "D2", "D3", "D4")), class = "data.frame",
row.names = c(NA,
-4L))
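One caveat with combn(x, 2): it stops with an error when x has fewer than two elements, which would happen for any row whose c entry contains no semicolon. Below is a minimal base R sketch of a guarded helper. The name pair_up is hypothetical, and keeping a lone value with a missing second slot is an assumption, since the question does not say how such rows should be treated.

```r
# Sketch: pairwise combinations of ';'-separated values, base R only.
# pair_up() is a hypothetical helper, not part of the answers above.
pair_up <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # Assumption: keep a lone value and leave the second slot missing.
    return(data.frame(c_01 = parts[1], c_02 = NA_character_,
                      stringsAsFactors = FALSE))
  }
  m <- t(combn(parts, 2))  # one row per unordered pair
  data.frame(c_01 = m[, 1], c_02 = m[, 2], stringsAsFactors = FALSE)
}

pair_up("C2a;C2b;C2c")  # three rows: (C2a, C2b), (C2a, C2c), (C2b, C2c)
pair_up("C5a")          # one row with c_02 missing
```

The helper could be dropped into either answer in place of the bare combn() call if single-value rows are expected in the real data.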

Related

Sort rows for a data frame

I am searching for a simple dplyr or data.table solution. I need to sort rows of a large data frame, but only have a solution with for loops.
Here is a minimum example:
A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(20)
df = data.frame(col1 = sample(c(A,B),8,1), col2 = sample(c(A,B),8,1), col3 = sample(c(A,B),8,1))
col1 col2 col3
1 B1 B1 A1
2 B2 B1 A5
3 A3 A5 B1
4 B3 B2 B3
5 A2 B2 A2
6 A1 A1 B2
7 A2 A3 A4
8 A5 A5 A1
The expected output should be:
col1 col2 col3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5
So, the sort order within each row is c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5'), with one exception: if the first column already holds one of the B's, the remaining columns continue with the A's.
The next problem is, that I have three more columns in the data frame with different numbers which should be rearranged in the same order as these three columns.
You can use apply, factor and sort twice with different orders.
order1 = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5') #Main order
order2 = c('A1', 'A2', 'A3', 'A4', 'A5', 'B1', 'B2', 'B3') #Secondary order for rows with 1st column as "B"
startB <- grepl("B", df[, 1]) #Rows with 1st column being "B"
df <- data.frame(t(apply(df, 1, \(x) sort(factor(x, levels = order1)))))
df[startB, -1] <- t(apply(df[startB, ], 1, \(x) sort(factor(x[-1], levels = order2))))
output
X1 X2 X3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5
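The question also asks how to rearrange companion numeric columns in the same way. One possible sketch, for a single row: compute the permutation with order(match(...)) instead of sorting the values directly, then apply that permutation to both the labels and the numbers. The vector nums is made-up illustration data, not taken from the question.

```r
# Custom sort order from the question (B's first, then A's)
order1 <- c("B1", "B2", "B3", "A1", "A2", "A3", "A4", "A5")

labels <- c("A3", "A5", "B1")  # one row of col1..col3
nums   <- c(10, 20, 30)        # hypothetical companion values for that row

# match() gives each label's rank in order1; order() turns ranks into a permutation
perm <- order(match(labels, order1))

labels[perm]  # "B1" "A3" "A5"
nums[perm]    # 30 10 20
```

Applied row by row, the same permutation would keep every companion column aligned with its label.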
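Might be more than a little bit too convoluted, but a dplyr and purrr option might be the following (vec1 and vec2 are not defined in this answer; judging by the output, they presumably hold the B-first and the A-first sort order, respectively):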
map2_dfr(.x = df %>%
group_split(cond = as.numeric(grepl("^B", col1))),
.y = list(vec1, vec2),
~ .x %>%
mutate(pmap_dfr(across(c(starts_with("col"), - pluck(select(.x, "cond"), 1))),
function(...) set_names(c(...)[order(match(c(...), .y))], names(c(...))))))
col1 col2 col3 cond
<chr> <chr> <chr> <dbl>
1 B1 A3 A5 0
2 B2 A2 A2 0
3 B2 A1 A1 0
4 A2 A3 A4 0
5 A1 A5 A5 0
6 B1 A1 B1 1
7 B2 A5 B1 1
8 B3 B2 B3 1
My solution so far:
A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(100)
N = 20
df_1 = data.frame(col1 = sample(c(A,B),N,1), col2 = sample(c(A,B),N,1), col3 = sample(c(A,B),N,1))
vec = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5')
df_2 = t(apply(df_1,1,function(x)match(x,vec)))
df_3 = t(apply(df_2,1,sort))
tr = rowSums(matrix(df_3 %in% c(1,2,3),nrow(df_3), ncol(df_3))) == 2
change = which((df_3[,2]*tr)!=0)
save = df_3[change,2]
df_3[change,2] = df_3[change,3]
df_3[change,3] = save
df_4 = matrix(vec[df_3],nrow(df_3),ncol(df_3))
From df_2 to df_3 the positions of the numbers change, and I can rearrange the other columns using that.
It looks a little bit complicated.

R - save as data.frame all elements of a list of lists efficiently

I have the following list, and I want to create a data.frame that holds every possible "path" for which the numeric value in the arrays is > 0.
This is the list:
> ABBCCD2
$A1
$A1$B1
D1 D2
C1 0.233 0.078
C2 0.039 0.039
$A1$B2
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A1$B3
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A2
$A2$B1
D1 D2
C1 0.100 0.033
C2 0.017 0.017
$A2$B2
D1 D2
C1 0 0
C2 0 0
$A2$B3
D1 D2
C1 0 0
C2 0 0
And this is the result I want:
> res
FUN INTC INTB INME prob
1 A1 B1 C1 D1 0.233
2 A1 B1 C1 D2 0.078
3 A1 B1 C2 D1 0.039
4 A1 B1 C2 D2 0.039
5 A1 B2 C1 D1 0.083
6 A1 B2 C1 D2 0.028
7 A1 B2 C2 D1 0.056
8 A1 B2 C2 D2 0.056
9 A1 B3 C1 D1 0.083
10 A1 B3 C1 D2 0.028
11 A1 B3 C2 D1 0.056
12 A1 B3 C2 D2 0.056
13 A2 B1 C1 D1 0.100
14 A2 B1 C1 D2 0.033
15 A2 B1 C2 D1 0.017
16 A2 B1 C2 D2 0.017
I have solved it with for loops, but this is not efficient: the real problem has 15 million possible paths, and it can take several days to run.
This is the code I have made:
m <- 0
# create an empty data frame
res <- data.frame(FUN=character(),INTC=character(),INTB=character(),INME=character(),prob=numeric())
for(i in 1:length(ABBCCD2)) { # A
  for (j in 1:length(ABBCCD2[[1]])) { # B
    for(k in 1:nrow(ABBCCD2[[1]][[1]])) { # C
      for(f in 1:ncol(ABBCCD2[[1]][[1]])) { # D
        # only keep prob > 0
        if(ABBCCD2[[i]][[j]][k,f] > 0) {
          # counter of paths with non-zero probability
          m <- m + 1
          # build the corresponding data frame row and fill it in
          res[m,] <- data.frame(FUN=names(ABBCCD2[i]), INTC=names(ABBCCD2[[i]][j]), INTB=rownames(ABBCCD2[[i]][[j]])[k],
                                INME = colnames(ABBCCD2[[i]][[j]])[f] , prob = ABBCCD2[[i]][[j]][k,f] )
        }else{
        }
      }
    }
  }
}
Any ideas to solve it more efficiently?
Thank you all
Here is an option
library(rrapply)
library(purrr)
library(dplyr)
library(tidyr)
map_depth(ABBCCD2, 2, ~ as.data.frame.table(.x)) %>%
map_dfr(~ bind_rows(.x, .id = 'INTC'), .id = 'FUN') %>%
rename_at(3:5, ~c("INTB", "INME", "prob")) %>%
filter(prob != 0)
-output
# FUN INTC INTB INME prob
#1 A1 B1 C1 D1 -1.0978872
#2 A1 B1 C2 D1 -0.8782714
#3 A1 B1 C1 D2 0.1646925
#4 A1 B1 C2 D2 1.2239280
#5 A1 B2 C1 D1 0.2088934
#6 A1 B2 C2 D1 0.2191693
#7 A1 B2 C1 D2 -1.6247005
#8 A1 B2 C2 D2 -0.4496129
#9 A2 B1 C1 D1 0.3426282
#10 A2 B1 C2 D1 -1.0963979
#11 A2 B1 C1 D2 1.8424623
#12 A2 B1 C2 D2 -0.2248845
#13 A2 B2 C1 D1 -0.9655256
#14 A2 B2 C2 D1 0.6998366
#15 A2 B2 C1 D2 -1.2647063
#16 A2 B2 C2 D2 0.4514344
data
ABBCCD2 <- list(A1 = list(B1 = structure(c(-1.0978871935389, -0.878271447742256,
0.164692499183084, 1.22392804082201), .Dim = c(2L, 2L), .Dimnames = list(
c("C1", "C2"), c("D1", "D2"))), B2 = structure(c(0.208893448902667,
0.21916929248291, -1.62470051990683, -0.449612869059051), .Dim = c(2L,
2L), .Dimnames = list(c("C1", "C2"), c("D1", "D2")))), A2 = list(
B1 = structure(c(0.34262819072166, -1.09639792471103, 1.8424623311698,
-0.224884516346163), .Dim = c(2L, 2L), .Dimnames = list(c("C1",
"C2"), c("D1", "D2"))), B2 = structure(c(-0.965525564286861,
0.699836580462635, -1.26470634026811, 0.451434438203962), .Dim = c(2L,
2L), .Dimnames = list(c("C1", "C2"), c("D1", "D2"))), B3 = structure(c(0,
0, 0, 0), .Dim = c(2L, 2L), .Dimnames = list(c("C1", "C2"
), c("D1", "D2")))))
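The answer above relies on as.data.frame.table() to turn each matrix into long form. As a small illustration of that single step, here is a sketch on one 2 x 2 matrix, using the A1/B1 values printed in the question. The column renaming and the > 0 filter mirror what the question asks for.

```r
# One leaf of the nested list: rows are C1/C2, columns are D1/D2
m <- matrix(c(0.233, 0.039, 0.078, 0.039), nrow = 2,
            dimnames = list(c("C1", "C2"), c("D1", "D2")))

# Long form: one row per (row name, column name) cell, in column-major order
long <- as.data.frame.table(m, responseName = "prob", stringsAsFactors = FALSE)
names(long)[1:2] <- c("INTB", "INME")

# Keep only the cells with positive probability
long[long$prob > 0, ]
```

Because the cells come out in column-major order, the D1 rows appear before the D2 rows; an explicit sort is needed to match the row order in the question.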
If I understand correctly, the challenges are
to convert the matrices to data.frames,
to skip zero matrices,
to bind all pieces from the nested lists into one large dataset,
to reshape into long format,
to retain the names of list elements and matrix dimensions.
(Execution not necessarily in that order)
An additional challenge was that the question shows the nested list in printed form but not in a reproducible form, e.g., dput(). See the Data section on turning the printout into a list structure.
For the sake of completeness, here are two other approaches.
nested lapply() and rbindlist()
rrapply::rrapply() and reshape2::melt()
Nested lapply() and rbindlist()
library(data.table)
library(magrittr)
res <- lapply(
ABBCCD2,
function(x) lapply(x, as.data.table, keep.rownames = "INTB") %>% rbindlist(idcol = "INTC")
) %>%
rbindlist(idcol = "FUN") %>%
melt(measure.vars = patterns("^D"), variable.name = "INME", value.name = "prob") %>%
.[prob != 0] %>%
setorderv(names(.))
res
FUN INTC INTB INME prob
1: A1 B1 C1 D1 0.233
2: A1 B1 C1 D2 0.078
3: A1 B1 C2 D1 0.039
4: A1 B1 C2 D2 0.039
5: A1 B2 C1 D1 0.083
6: A1 B2 C1 D2 0.028
7: A1 B2 C2 D1 0.056
8: A1 B2 C2 D2 0.056
9: A1 B3 C1 D1 0.083
10: A1 B3 C1 D2 0.028
11: A1 B3 C2 D1 0.056
12: A1 B3 C2 D2 0.056
13: A2 B1 C1 D1 0.100
14: A2 B1 C1 D2 0.033
15: A2 B1 C2 D1 0.017
16: A2 B1 C2 D2 0.017
magrittr piping is used to improve readability.
This approach converts the single 2 x 2 matrices into data.tables with 3 columns and 2 rows each. These are then combined by rbindlist() in two steps to form one large data.table. Finally, the two value columns are reshaped to long format and zero prob values are removed.
setorderv() is only used to allow for a direct comparison with OP's expected result.
Caveat: Zero prob values are removed after all data have been turned into long format. This may lead to unexpected results in case one of the matrices contains a zero element just by chance.
rrapply() and matrix melt()
Here is a different approach which first reshapes the matrices to long form data.tables (after excluding matrices with all zero elements) which are then combined into one large dataset by two rbindlist() steps:
library(data.table)
library(magrittr)
library(rrapply)
res2 <- rrapply(ABBCCD2,
condition = function(x) sum(abs(x)) > 0,
f = function(x) reshape2::melt(x, value.name = "prob"),
classes = "matrix", how = "prune") %>%
lapply(rbindlist, idcol = "INTC") %>%
rbindlist(idcol = "FUN") %>%
setnames(c("Var1", "Var2"), c("INTB", "INME"))%>%
setorderv(names(.))
res2
The result is the same as above.
Data
Here is a way to turn the printout into a nested list structure:
txt <- "$A1
$A1$B1
D1 D2
C1 0.233 0.078
C2 0.039 0.039
$A1$B2
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A1$B3
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A2
$A2$B1
D1 D2
C1 0.100 0.033
C2 0.017 0.017
$A2$B2
D1 D2
C1 0 0
C2 0 0
$A2$B3
D1 D2
C1 0 0
C2 0 0"
txt contains the printout as copied and pasted from the question
library(data.table)
library(magrittr)
library(rrapply)
ABBCCD2 <- fread(text = txt, sep = NULL, header = FALSE, blank.lines.skip = TRUE) %>%
.[, tstrsplit(V1, "\\$")] %>%
.[, c("V2", "V3") := zoo::na.locf(.SD, na.rm = FALSE), .SDcols = c("V2", "V3")] %>%
.[V1 != ""] %>%
split(by = c("V2", "V3"), flatten = FALSE, keep.by = FALSE) %>%
rrapply(
f = . %>%
.[, paste0(V1, collapse = "\n") %>%
{paste("rn", .)} %>%
fread() %>%
as.matrix(rownames = "rn")]
, classes = "data.frame", how = "replace")
ABBCCD2
$A1
$A1$B1
D1 D2
C1 0.233 0.078
C2 0.039 0.039
$A1$B2
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A1$B3
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A2
$A2$B1
D1 D2
C1 0.100 0.033
C2 0.017 0.017
$A2$B2
D1 D2
C1 0 0
C2 0 0
$A2$B3
D1 D2
C1 0 0
C2 0 0
Here is a base R option using stack
rev(
transform(
stack(df <- as.data.frame(
rapply(ABBCCD2,
t,
how = "replace"
)
)),
ind = paste0(ind, ".", row.names(df))
)
)
which gives
ind values
1 A1.B1.C1.D1 -1.0978872
2 A1.B1.C1.D2 0.1646925
3 A1.B1.C2.D1 -0.8782714
4 A1.B1.C2.D2 1.2239280
5 A1.B2.C1.D1 0.2088934
6 A1.B2.C1.D2 -1.6247005
7 A1.B2.C2.D1 0.2191693
8 A1.B2.C2.D2 -0.4496129
9 A2.B1.C1.D1 0.3426282
10 A2.B1.C1.D2 1.8424623
11 A2.B1.C2.D1 -1.0963979
12 A2.B1.C2.D2 -0.2248845
13 A2.B2.C1.D1 -0.9655256
14 A2.B2.C1.D2 -1.2647063
15 A2.B2.C2.D1 0.6998366
16 A2.B2.C2.D2 0.4514344
17 A2.B3.C1.D1 0.0000000
18 A2.B3.C1.D2 0.0000000
19 A2.B3.C2.D1 0.0000000
20 A2.B3.C2.D2 0.0000000

Is there a way to change data frame entries in R from numeric to a specific character?

If I have a data frame like so:
df <- data.frame(
a = c(1,1,1,2,2,2,3,3,3),
b = c(1,2,3,1,2,3,1,2,3)
)
which looks like this:
> df
a b
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Is there a quick way to change the columns a and b to match the example below, without explicitly having to type it all out?
> df
a b
a1 b1
a1 b2
a1 b3
a2 b1
a2 b2
a2 b3
a3 b1
a3 b2
a3 b3
In other words, I'm trying to take the name of the column and place it in front of the value that was originally in that row.
We can use cur_column to return the corresponding column name within across and paste (str_c) the column value with the corresponding column name
library(dplyr)
library(stringr)
df1 <- df %>%
mutate(across(everything(), ~ str_c(cur_column(), .)))
-output
df1
# a b
#1 a1 b1
#2 a1 b2
#3 a1 b3
#4 a2 b1
#5 a2 b2
#6 a2 b3
#7 a3 b1
#8 a3 b2
#9 a3 b3
Or using base R
df[] <- Map(paste0, names(df), df)
Or another option is
df[] <- paste0(names(df)[col(df)], unlist(df))
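If only some columns should get the prefix, the Map() idea can be restricted to a subset. A minimal sketch, where cols is a hypothetical choice of columns and the other columns are left untouched:

```r
df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))

# Hypothetical selection: prefix only column "a"
cols <- c("a")

# Map() pairs each column name with its column, so paste0("a", df$a) is applied
df[cols] <- Map(paste0, cols, df[cols])
df
```

Here column a becomes the character vector "a1", "a2", "a3", while b stays numeric.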

group 2 variables and then delimit the strings

I am trying to group by two variables and split the comma-separated values across the existing rows, without increasing the number of rows
eg:
#my dataframe
> df
g1 g2 g3
1 a1 a2 77.7,81.7
2 a1 a2 77.7,81.7
3 b2 b3 3,1,5
4 b2 b3 3,1,5
5 b2 b3 3,1,5
Expected Output:
g1 g2 g3
1 a1 a2 77.7
2 a1 a2 81.7
3 b2 b3 3
4 b2 b3 1
5 b2 b3 5
I tried the code below, but it does not group the values and the result is not in the expected format. Please help!
Codes:
df <- data.frame(g1 = c("a1","a1","b2","b2","b2"), g2 = c("a2","a2","b3","b3","b3"), g3 = c("77.7,81.7","77.7,81.7","3,1,5","3,1,5","3,1,5"))
library(stringr)
s <- strsplit(df$g3, split = ",")
data.frame(V1 = rep(df$g1, sapply(s, length)), V2 = unlist(s))
Building on Chris Ruehlemann's answer: you can use the following and it will still work if values reappear.
df$g3_split <- unlist(lapply(split(df,df$g1), function(x) unique(unlist(strsplit(x$g3, ","))) ))
df
g1 g2 g3 g3_split
1 a1 a2 77.7,81.7 77.7
2 a1 a2 77.7,81.7 81.7
3 b2 b3 3,77.7,5 3
4 b2 b3 3,77.7,5 77.7
5 b2 b3 3,77.7,5 5
DATA:
df <- data.frame(g1 = c("a1","a1","b2","b2","b2"),
g2 = c("a2","a2","b3","b3","b3"),
g3 = c("77.7,81.7","77.7,81.7","3,1,5","3,1,5","3,1,5"), stringsAsFactors = F)
SOLUTION:
df$g3_split <- unique(unlist(strsplit(df$g3, ",")))
RESULT:
df
g1 g2 g3 g3_split
1 a1 a2 77.7,81.7 77.7
2 a1 a2 77.7,81.7 81.7
3 b2 b3 3,1,5 3
4 b2 b3 3,1,5 1
5 b2 b3 3,1,5 5
If you want to replace g3 with the new values, just assign unique(unlist(strsplit(df$g3, ","))) to df$g3 instead of df$g3_split.
An option with separate_rows
library(dplyr)
library(tidyr)
df %>%
mutate( g3_split = g3) %>%
separate_rows(g3_split) %>%
distinct(g3_split, .keep_all = TRUE)
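The unique()-based approaches depend on values not repeating. An alternative sketch that avoids this: number the rows within each (g1, g2) group and pick the piece at that position from the split string. This assumes each group has as many rows as its g3 string has pieces; if a group has more rows than pieces, the extra rows get NA.

```r
df <- data.frame(
  g1 = c("a1", "a1", "b2", "b2", "b2"),
  g2 = c("a2", "a2", "b3", "b3", "b3"),
  g3 = c("77.7,81.7", "77.7,81.7", "3,1,5", "3,1,5", "3,1,5"),
  stringsAsFactors = FALSE
)

# Position of each row within its (g1, g2) group: 1, 2, 1, 2, 3
idx <- ave(seq_len(nrow(df)), df$g1, df$g2, FUN = seq_along)

# Split every g3 string and take the piece at the row's position
pieces <- strsplit(df$g3, ",", fixed = TRUE)
df$g3 <- mapply(function(p, i) p[i], pieces, idx)
df
```

The values stay character; wrapping the result in as.numeric() would convert them if needed.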

how to create categories conditionally using other variables values and sequence

I would appreciate any help to create a function that allows me to create categories of one variable using the order of a set of other variables values.
Specifically, I want a function that:
creates category E1 of the variable variable the first time that each combination of values of the variables A, B, and ID
appears in the dataset.
creates category E2 of the variable variable the second time that each combination of values of the variables A, B, and ID
appears in the dataset.
creates category E3 of the variable variable the third time that each combination of values of the variables A, B, and ID
appears in the dataset.
creates category En of the variable variable the nth time that each combination of values of the variables A, B, and ID
appears in the dataset.
#sample data:
rowdT<-structure(list(A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1",
"a2", "a1"), B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1",
"b2", "b1"), ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"
), E = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
0.40694392882818, 0.779607277916503, 0.550579323666347, 0.352622183880119,
0.690660491345867, 0.23378944873769)), class = c("data.table",
"data.frame"), row.names = c(NA, -9L))
sampleDT <- melt(rowdT, id.vars = c("A", "B", "ID"))
#input data:
A B ID variable value
1: a1 b2 3 E 0.6211421
2: a2 b2 4 E 0.7421095
3: a1 b2 3 E 0.3943915
4: a1 b1 1 E 0.4069439
5: a2 b2 4 E 0.7796073
6: a1 b2 3 E 0.5505793
7: a1 b1 1 E 0.3526222
8: a2 b2 4 E 0.6906605
9: a1 b1 1 E 0.2337894
#expected output:
A B ID variable value
4: a1 b1 1 E1 0.4069439
1: a1 b2 3 E1 0.6211421
2: a2 b2 4 E1 0.7421095
7: a1 b1 1 E2 0.3526222
3: a1 b2 3 E2 0.3943915
5: a2 b2 4 E2 0.7796073
9: a1 b1 1 E3 0.2337894
6: a1 b2 3 E3 0.5505793
8: a2 b2 4 E3 0.6906605
Thanks in advance for any help.
First convert your variable to a character vector for proper coercion, and then use data.table
sampleDT$variable = as.character(sampleDT$variable)
sampleDT[, variable := paste(variable,1:.N,sep = ""), by = c("A", "B", "ID")]
This creates unique tallies based on the observed combinations of A, B, and ID.
This gets the following output:
A B ID variable value
1: a1 b2 3 E1 0.6211421
2: a2 b2 4 E1 0.7421095
3: a1 b2 3 E2 0.3943915
4: a1 b1 1 E1 0.4069439
5: a2 b2 4 E2 0.7796073
6: a1 b2 3 E3 0.5505793
7: a1 b1 1 E2 0.3526222
8: a2 b2 4 E3 0.6906605
9: a1 b1 1 E3 0.2337894
which you can reorder if necessary.
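A compact variant of the same idea, as a sketch: data.table's rowid() returns the running count within each combination of its arguments, and setorder() performs the reordering. The small table below is made-up illustration data with the same column layout as sampleDT.

```r
library(data.table)

# Hypothetical miniature of the melted input
DT <- data.table(
  A = c("a1", "a2", "a1", "a1"),
  B = c("b2", "b2", "b2", "b1"),
  ID = c("3", "4", "3", "1"),
  variable = "E",
  value = c(0.62, 0.74, 0.39, 0.41)
)

# rowid() counts occurrences of each (A, B, ID) combination in row order
DT[, variable := paste0(variable, rowid(A, B, ID))]

# Sort by the new category first, then by the grouping columns
setorder(DT, variable, A, B, ID)
DT
```

With this input the first three rows are the E1 entries for each combination, followed by the single E2 entry.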
