Problem:
Is there a simple way to get all combinations of two (or more) identical vectors, but show only the unique combinations?
Reproducible example:
library(tidyr)
x = 1:3
expand_grid(a = x,
b = x,
c = x)
# A tibble: 27 x 3
a b c
<int> <int> <int>
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
5 1 2 2
6 1 2 3
7 1 3 1
8 1 3 2
9 1 3 3
10 2 1 1
# ... with 17 more rows
However, if row 1 2 1 exists, I do not want to see 1 1 2 or 2 1 1. That is, show only combinations of the three vectors that are unique regardless of order.
library(gtools)
x = 1:3
df <- as.data.frame(combinations(n = 3, r = 3, v = x, repeats.allowed = TRUE))
df
output
V1 V2 V3
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 3
6 1 3 3
7 2 2 2
8 2 2 3
9 2 3 3
10 3 3 3
You can just sort each row and then remove duplicate rows. Continuing from your expand_grid():
df <- tidyr::expand_grid(a = x,
b = x,
c = x)
data.frame(unique(t(apply(df, 1, sort))))
X1 X2 X3
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 3
6 1 3 3
7 2 2 2
8 2 2 3
9 2 3 3
10 3 3 3
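As a quick sanity check on the row count (a sketch, not part of the original answer): choosing r values from n with repetition and ignoring order gives choose(n + r - 1, r) combinations, so 10 rows are expected for n = 3 and r = 3. The variable names `r`, `grid` and `uniq` are only illustrative.

```r
x <- 1:3
r <- 3  # number of identical vectors being combined

# Full grid of all ordered combinations (length(x)^r rows)
grid <- expand.grid(rep(list(x), r))

# Sort each row, then drop duplicated rows
uniq <- unique(t(apply(grid, 1, sort)))

nrow(grid)                     # 27
nrow(uniq)                     # 10
choose(length(x) + r - 1, r)   # 10
```

The grid grows as length(x)^r while the number of unique rows grows much more slowly, so for larger inputs a generator that never builds the full grid is likely to be the better choice.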
Use comboGeneral from the RcppAlgos package; it is implemented in C++ and pretty fast.
x <- 1:3
RcppAlgos::comboGeneral(x, repetition=TRUE)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 1 3
# [4,] 1 2 2
# [5,] 1 2 3
# [6,] 1 3 3
# [7,] 2 2 2
# [8,] 2 2 3
# [9,] 2 3 3
# [10,] 3 3 3
Note: If you're running Linux, you will need gmp installed, e.g. for Ubuntu do:
sudo apt install libgmp3-dev
base
x <- 1:3
df <- expand.grid(a = x,
b = x,
c = x)
# Build an order-independent key per row; the separator keeps multi-digit values unambiguous
df[!duplicated(apply(df, 1, function(x) paste(sort(x), collapse = "_"))), ]
#> a b c
#> 1 1 1 1
#> 2 2 1 1
#> 3 3 1 1
#> 5 2 2 1
#> 6 3 2 1
#> 9 3 3 1
#> 14 2 2 2
#> 15 3 2 2
#> 18 3 3 2
#> 27 3 3 3
Created on 2021-09-09 by the reprex package (v2.0.1)
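One thing to watch when rows are identified by a pasted key: if the sorted values are joined without a separator, two different rows can produce the same string once values have more than one digit. A small sketch (the helper names `key_plain` and `key_sep` are made up for illustration):

```r
# Key built without a separator
key_plain <- function(v) paste(sort(v), collapse = "")
# Key built with a separator between values
key_sep <- function(v) paste(sort(v), collapse = "_")

# c(12, 13) and c(1, 213) are different rows ...
key_plain(c(12, 13)) == key_plain(c(1, 213))  # TRUE: both give "1213"
key_sep(c(12, 13)) == key_sep(c(1, 213))      # FALSE: "12_13" vs "1_213"
```

For x = 1:3 this makes no difference, but a separator is the safer default if the approach is reused on other data.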
Related
I have a dataframe like this.
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3))
I want to populate a new variable Sequence which identifies whenever Condition starts again from 1.
So the new dataframe would look like this.
Thanks in advance for the help!
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3),
Sequence = c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3))
base R
data$Sequence2 <- cumsum(c(TRUE, data$Condition[-1] == 1 & data$Condition[-nrow(data)] != 1))
data
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
dplyr
library(dplyr)
data %>%
mutate(
Sequence2 = cumsum(Condition == 1 & lag(Condition != 1, default = TRUE))
)
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
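To see why the cumsum approach works, it may help to look at the intermediate vector. The sketch below (base R, with illustrative names `cond`, `prev` and `is_start`) marks a row as the start of a new sequence when it is 1 and the previous row is not 1, then counts the starts cumulatively.

```r
cond <- c(1, 1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 1, 2, 3, 3)

# Previous value for each row; the first row has no predecessor
prev <- c(NA, head(cond, -1))

# A sequence starts where the value is 1 and the previous value is not 1
# (the first row counts as a start if it is 1)
is_start <- cond == 1 & (is.na(prev) | prev != 1)

which(is_start)   # 1 5 11
cumsum(is_start)  # 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3
```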
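This took a while, but I finally found this solution:
library(dplyr)
data %>%
group_by(Sequence = cumsum(
# TRUE where Condition is 1 and the next row is also 1; this matches the
# example data, where every run starts with exactly two 1s
(ifelse(Condition == 1, lead(Condition) + 1, Condition)
- Condition) == 1)
)
Condition Sequence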
<dbl> <int>
1 1 1
2 1 1
3 2 1
4 3 1
5 1 2
6 1 2
7 2 2
8 2 2
9 2 2
10 3 2
11 1 3
12 1 3
13 2 3
14 3 3
15 3 3
I have a data.frame with two columns
> data.frame(a=c(5,4,3), b =c(1,2,4))
a b
1 5 1
2 4 2
3 3 4
I want to produce a list of data.frames with different combinations of those column values; there should be a total of six possible scenarios for the above example (correct me if I am wrong):
a b
1 5 1
2 4 2
3 3 4
a b
1 5 1
2 4 4
3 3 2
a b
1 5 2
2 4 1
3 3 4
a b
1 5 2
2 4 4
3 3 1
a b
1 5 4
2 4 2
3 3 1
a b
1 5 4
2 4 1
3 3 2
Is there a simple function to do it? I don't think expand.grid worked out for me.
Actually, expand.grid can work here, but it is not recommended because it is rather inefficient when df has many rows: with n rows you generate n^n index combinations and keep only the n! that are permutations.
Below is an example using expand.grid
u <- do.call(expand.grid, rep(list(seq(nrow(df))), nrow(df)))
lapply(
asplit(
subset(
u,
apply(u, 1, FUN = function(x) length(unique(x))) == nrow(df)
), 1
), function(v) within(df, b <- b[v])
)
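To make the cost concrete, here is a small sketch (illustrative names only) that counts how many rows of the index grid survive the filter for n = 3, and tabulates n^n against n! for a few values of n.

```r
n <- 3

# All n^n index tuples
u <- do.call(expand.grid, rep(list(seq_len(n)), n))

# Keep only tuples in which every index appears exactly once
keep <- apply(u, 1, function(x) length(unique(x)) == n)

c(total = nrow(u), kept = sum(keep))  # total 27, kept 6

# Growth of the grid versus the number of permutations
sizes <- data.frame(n = 1:6, grid = (1:6)^(1:6), perms = factorial(1:6))
sizes
```

Already at n = 6 the grid has 46656 rows of which only 720 are kept.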
A more efficient option is to use perms from the pracma package:
library(pracma)
> lapply(asplit(perms(df$b),1),function(v) within(df,b<-v))
[[1]]
a b
1 5 4
2 4 2
3 3 1
[[2]]
a b
1 5 4
2 4 1
3 3 2
[[3]]
a b
1 5 2
2 4 4
3 3 1
[[4]]
a b
1 5 2
2 4 1
3 3 4
[[5]]
a b
1 5 1
2 4 2
3 3 4
[[6]]
a b
1 5 1
2 4 4
3 3 2
Use combinat::permn to create all possible permutations of the b values, and bind each permutation to column a.
df <- data.frame(a= c(5,4,3), b = c(1,2,4))
result <- lapply(combinat::permn(df$b), function(x) data.frame(a = df$a, b = x))
result
#[[1]]
# a b
#1 5 1
#2 4 2
#3 3 4
#[[2]]
# a b
#1 5 1
#2 4 4
#3 3 2
#[[3]]
# a b
#1 5 4
#2 4 1
#3 3 2
#[[4]]
# a b
#1 5 4
#2 4 2
#3 3 1
#[[5]]
# a b
#1 5 2
#2 4 4
#3 3 1
#[[6]]
# a b
#1 5 2
#2 4 1
#3 3 4
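If installing a package is not an option, the permutations can also be generated with a short recursive function in base R. This is only a sketch: `permute` is a hypothetical helper written for this example, not a function from any of the packages above, and it is not optimised for long vectors.

```r
# Hypothetical helper: returns a list of all permutations of v
permute <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # Put v[i] first, followed by every permutation of the remaining values
    for (p in permute(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

df <- data.frame(a = c(5, 4, 3), b = c(1, 2, 4))
result <- lapply(permute(df$b), function(p) data.frame(a = df$a, b = p))

length(result)  # 6
```

The order of the resulting data frames may differ from the order produced by pracma::perms or combinat::permn.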
I’m trying to create a data frame in R that looks like this:
X Y Z
3 1 1
3 1 2
3 1 3
3 2 1
3 2 2
3 2 3
4 1 1
4 1 2
4 1 3
4 2 1
...
So column Z counts up to 3; when it reaches 3, column Y increments by 1 and Z counts up to 3 again. Then X increments by 1 and the process starts again.
You could use expand.grid + rev
rev(expand.grid(z = 1:3, y = 1:2, x = 3:4))
x y z
1 3 1 1
2 3 1 2
3 3 1 3
4 3 2 1
5 3 2 2
6 3 2 3
7 4 1 1
8 4 1 2
9 4 1 3
10 4 2 1
11 4 2 2
12 4 2 3
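The reason for the rev is that expand.grid varies its first argument fastest, so the fastest-changing column (z) is passed first and the column order is reversed afterwards. A sketch of an equivalent route, sorting a grid built in the natural x, y, z order (the names `g`, `res` and `sorted` are illustrative):

```r
# z first so that it varies fastest, then reverse the column order
g <- expand.grid(z = 1:3, y = 1:2, x = 3:4)
res <- rev(g)
names(res)  # "x" "y" "z"

# Alternative: build in x, y, z order and sort the rows instead
g2 <- expand.grid(x = 3:4, y = 1:2, z = 1:3)
sorted <- g2[order(g2$x, g2$y, g2$z), ]

# Same values in the same order (row names differ)
identical(unlist(res, use.names = FALSE), unlist(sorted, use.names = FALSE))  # TRUE
```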
An option is to use tidyr::crossing().
In your case:
library(tidyr)
crossing(X = 3:4,
Y = 1:2,
Z = 1:3)
data.frame(X = rep(3:4, each = 6),
Y = rep(1:2, each = 3, times = 2),
Z = rep(1:3, times = 4))
Here is another base R solution in addition to the expand.grid approach by @Onyambu.
The advantage of the code below is that you only need to put everything into the list lst and pass it to the function f:
f <- function(lst) data.frame(mapply(function(p,n) rep(p,each=n),lst, prod(lengths(lst))/cumprod(lengths(lst))))
lst<- list(x = 3:4,y = 1:2,z = 1:3)
res <- f(lst)
such that
> res
x y z
1 3 1 1
2 3 1 2
3 3 1 3
4 3 2 1
5 3 2 2
6 3 2 3
7 4 1 1
8 4 1 2
9 4 1 3
10 4 2 1
11 4 2 2
12 4 2 3
A data.table solution for completeness:
data.table::CJ(x = 3:4, y = 1:2, z = 1:3)
x y z
1: 3 1 1
2: 3 1 2
3: 3 1 3
4: 3 2 1
5: 3 2 2
6: 3 2 3
7: 4 1 1
8: 4 1 2
9: 4 1 3
10: 4 2 1
11: 4 2 2
12: 4 2 3
I am trying to count triplets; for this I use three vectors that are packed into a data frame:
X=c(4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Y=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,4,2,2,2,2,3,4,1,1,2,2,3,3,4,4)
Z=c(4,4,5,4,4,4,4,4,6,1,1,1,1,1,1,1,2,2,2,2,7,2,3,3,3,3,3,3,3,3)
Count_Frame=data.frame(matrix(NA, nrow=(length(X)), ncol=3))
Count_Frame[1]=X
Count_Frame[2]=Y
Count_Frame[3]=Z
Counts=data.frame(table(Count_Frame))
There is the following problem: if I increase the value range in the vectors or use even more vectors the "Counts" dataframe quickly approaches its size limit due to the many 0-counts. Is there a way to exclude the 0-counts while generating "Counts"?
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(Count_Frame)) and, grouped by all the columns (.(X, Y, Z)), get the number of rows (.N).
library(data.table)
setDT(Count_Frame)[,.N ,.(X, Y, Z)]
# X Y Z N
# 1: 4 1 4 7
# 2: 4 1 5 1
# 3: 1 1 6 1
# 4: 1 1 1 3
# 5: 1 2 1 2
# 6: 1 3 1 1
# 7: 1 4 1 1
# 8: 2 2 2 4
# 9: 2 3 7 1
#10: 2 4 2 1
#11: 3 1 3 2
#12: 3 2 3 2
#13: 3 3 3 2
#14: 3 4 3 2
Instead of naming all the columns, we can use names(Count_Frame) as well (if there are many columns)
setDT(Count_Frame)[,.N , names(Count_Frame)]
You can accomplish this with aggregate:
Count_Frame$one <- 1
aggregate(one ~ X1 + X2 + X3, data=Count_Frame, FUN=sum)
This counts only the combinations that actually occur in the data, so, unlike table, it does not list the zero counts.
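To illustrate the difference in size, the sketch below rebuilds the example data (with the default column names X1, X2, X3 that the question's construction produces) and compares the number of rows from table with the number from aggregate. The names `full` and `nonzero` are illustrative.

```r
X <- c(4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Y <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,4,2,2,2,2,3,4,1,1,2,2,3,3,4,4)
Z <- c(4,4,5,4,4,4,4,4,6,1,1,1,1,1,1,1,2,2,2,2,7,2,3,3,3,3,3,3,3,3)
Count_Frame <- data.frame(X1 = X, X2 = Y, X3 = Z)

# table() creates one cell per combination of observed levels: 4 * 4 * 7
full <- as.data.frame(table(Count_Frame))
nrow(full)            # 112
sum(full$Freq > 0)    # 14

# aggregate() only returns combinations that are present
Count_Frame$one <- 1
nonzero <- aggregate(one ~ X1 + X2 + X3, data = Count_Frame, FUN = sum)
nrow(nonzero)         # 14
```

Both results account for all 30 observations; only the number of stored rows differs.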
One solution is to create a combination of the column values and count those instead:
library(tidyr)
as.data.frame(table(unite(Count_Frame, tmp, X1, X2, X3))) %>%
separate(Var1, c('X1', 'X2', 'X3'))
Resulting output is:
X1 X2 X3 Freq
1 1 1 1 3
2 1 1 6 1
3 1 2 1 2
4 1 3 1 1
5 1 4 1 1
6 2 2 2 4
7 2 3 7 1
8 2 4 2 1
9 3 1 3 2
10 3 2 3 2
11 3 3 3 2
12 3 4 3 2
13 4 1 4 7
14 4 1 5 1
Or using plyr:
library(plyr)
count(Count_Frame, colnames(Count_Frame))
output
# > count(Count_Frame, colnames(Count_Frame))
# X1 X2 X3 freq
# 1 1 1 1 3
# 2 1 1 6 1
# 3 1 2 1 2
# 4 1 3 1 1
# 5 1 4 1 1
# 6 2 2 2 4
# 7 2 3 7 1
# 8 2 4 2 1
# 9 3 1 3 2
# 10 3 2 3 2
# 11 3 3 3 2
# 12 3 4 3 2
# 13 4 1 4 7
# 14 4 1 5 1
So, expand.grid returns a df of all the combinations of the vectors passed.
df <- expand.grid(1:3, 1:3)
df <- expand.grid(1:3, 1:3, 1:3)
What I would like is a generalized function that takes 1 parameter (number of vectors) and returns the appropriate data frame.
combinations <- function(n) {
return(expand.grid(0, 1, ... n))
}
Such that
combinations(2) returns(expand.grid(1:3, 1:3))
combinations(3) returns(expand.grid(1:3, 1:3, 1:3))
combinations(4) returns(expand.grid(1:3, 1:3, 1:3, 1:3))
etc.
combinations <- function(n)
expand.grid(rep(list(1:3),n))
> combinations(2)
Var1 Var2
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
> combinations(3)
Var1 Var2 Var3
1 1 1 1
2 2 1 1
3 3 1 1
4 1 2 1
5 2 2 1
6 3 2 1
7 1 3 1
8 2 3 1
9 3 3 1
10 1 1 2
11 2 1 2
12 3 1 2
13 1 2 2
14 2 2 2
15 3 2 2
16 1 3 2
17 2 3 2
18 3 3 2
19 1 1 3
20 2 1 3
21 3 1 3
22 1 2 3
23 2 2 3
24 3 2 3
25 1 3 3
26 2 3 3
27 3 3 3
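The same idea extends to an arbitrary input vector. The following is a sketch with a hypothetical function name `grid_n` (chosen here to avoid clashing with gtools::combinations) and an added parameter `v` that is not part of the original answer:

```r
# Hypothetical generalisation: n copies of an arbitrary vector v
grid_n <- function(n, v = 1:3) {
  expand.grid(rep(list(v), n))
}

dim(grid_n(2))                    # 9 2
dim(grid_n(4))                    # 81 4
dim(grid_n(2, v = c("a", "b")))   # 4 2
```

The result always has length(v)^n rows, so it grows quickly with n.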