All possible combinations over groups - r

I have 5 groups: G1, G2, …, G5 with n1, n2, …, n5 elements respectively. I select 2 elements from each of four of the groups and 1 element from the remaining fifth group. How do I generate all possible combinations in R?

(The question does not specify whether the groups are mutually exclusive or not, so assume:
1. the groups are mutually exclusive;
2. the sub-groups (of sizes n1, n2, ...) are filled from the same pool of elements;
3. just for the sake of argument, |G1| = |G2| = |G3| = 5 (the code below can be adapted for groups with different numbers of elements).)
The following is a 3-group mock-up answer that any user can generalize to an arbitrary number of groups, so assume the group names are G1, G2, G3.
library(causfinder)
gctemplate(5, 2, 2)  # elements are coded 1,2,3,4,5; |sub-G1| = 2; |sub-G2| = 2; |sub-G3| = 5 - (2 + 2) = 1
My package (causfinder) is not on CRAN, so I give the code of the gctemplate function below. In the following table, each row is one combination and each number represents a unique element:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5 sub-G1={1,2} sub-G2={3,4} sub-G3={5}
[2,] 1 2 3 5 4
[3,] 1 2 4 5 3 sub-G1={1,2} sub-G2={4,5} sub-G3={3}
[4,] 1 3 2 4 5
[5,] 1 3 2 5 4
[6,] 1 3 4 5 2
[7,] 1 4 2 3 5
[8,] 1 4 2 5 3
[9,] 1 4 3 5 2
[10,] 1 5 2 3 4
[11,] 1 5 2 4 3
[12,] 1 5 3 4 2
[13,] 2 3 1 4 5
[14,] 2 3 1 5 4
[15,] 2 3 4 5 1
[16,] 2 4 1 3 5
[17,] 2 4 1 5 3
[18,] 2 4 3 5 1
[19,] 2 5 1 3 4
[20,] 2 5 1 4 3
[21,] 2 5 3 4 1
[22,] 3 4 1 2 5
[23,] 3 4 1 5 2
[24,] 3 4 2 5 1
[25,] 3 5 1 2 4
[26,] 3 5 1 4 2
[27,] 3 5 2 4 1
[28,] 4 5 1 2 3
[29,] 4 5 1 3 2
[30,] 4 5 2 3 1
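As a quick row-count check (my addition, not part of the original answer): choosing 2 of the 5 elements for sub-G1 and then 2 of the remaining 3 for sub-G2 fixes sub-G3, so the table must have
choose(5, 2) * choose(3, 2)   # = 30 rows, matching the output above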
The code of gctemplate:
gctemplate <- function(nvars, ncausers, ndependents){
  # all ways to choose the first block ("causers"): one combination per column
  independents <- combn(nvars, ncausers)
  ncombos <- ncol(independents)
  # number of ways to choose the second block ("dependents") from what is left
  patinajnumber <- choose(nvars - ncausers, ndependents)
  nrows <- ncombos * patinajnumber

  # second block: for every causer combination, all dependent combinations
  dependents <- matrix(NA, nrow = nrows, ncol = ndependents)
  for (i in seq_len(ncombos)) {
    dependents[(patinajnumber * (i - 1) + 1):(patinajnumber * i), ] <-
      t(combn(setdiff(seq_len(nvars), independents[, i]), ndependents))
  }

  # first block, repeated once per matching dependent combination
  independentspatinajed <- matrix(NA, nrow = nrows, ncol = ncausers)
  for (i in seq_len(ncombos)) {
    for (j in seq_len(patinajnumber)) {
      independentspatinajed[(i - 1) * patinajnumber + j, ] <- independents[, i]
    }
  }
  independentsdependents <- cbind(independentspatinajed, dependents)

  # third block: whatever elements remain in each row
  others <- matrix(NA, nrow = nrows, ncol = nvars - ncausers - ndependents)
  for (i in seq_len(nrows)) {
    others[i, ] <- setdiff(seq_len(nvars), independentsdependents[i, ])
  }

  causalitiestemplate <- cbind(independentsdependents, others)
  causalitiestemplate
}
The above solves the 3-group (G1, G2, G3) case; the very same logic generalizes to the 5-group case of the question. For reference, a base-R sketch of the 5-group selection follows.
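Here is that minimal base-R sketch (my own addition, not part of the original answer). It assumes mutually exclusive groups with made-up sizes, encodes every 2-element pick as a string, and crosses the per-group choices with expand.grid() for each choice of the group that contributes a single element.
groups <- list(G1 = 1:3, G2 = 4:6, G3 = 7:9, G4 = 10:12, G5 = 13:15)  # example sizes
all_selections <- do.call(rbind, lapply(seq_along(groups), function(single) {
  # 2-element subsets (encoded as "x-y" strings) of every group except `single`
  pick2 <- lapply(groups[-single], function(g)
    apply(combn(g, 2), 2, paste, collapse = "-"))
  # 1-element picks from the remaining group
  pick1 <- setNames(list(as.character(groups[[single]])), names(groups)[single])
  # cross all per-group choices
  expand.grid(c(pick2, pick1), stringsAsFactors = FALSE)
}))
nrow(all_selections)   # 5 * choose(3, 2)^4 * 3 = 1215 for these example sizes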

Related

R: list all directionless circular permutations/arrangements (i.e. where clockwise/anti-clockwise are the same)

How do I list all the circular permutations in R where direction does not matter? I have a vector 1:4 for illustration (however, I would like a general solution).
I use
gtools::permutations(n = 4, r = 4)
which gives me a listing of all possible permutations as follows:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 2 4 3
[3,] 1 3 2 4
[4,] 1 3 4 2
[5,] 1 4 2 3
[6,] 1 4 3 2
[7,] 2 1 3 4
[8,] 2 1 4 3
[9,] 2 3 1 4
[10,] 2 3 4 1
[11,] 2 4 1 3
[12,] 2 4 3 1
[13,] 3 1 2 4
[14,] 3 1 4 2
[15,] 3 2 1 4
[16,] 3 2 4 1
[17,] 3 4 1 2
[18,] 3 4 2 1
[19,] 4 1 2 3
[20,] 4 1 3 2
[21,] 4 2 1 3
[22,] 4 2 3 1
[23,] 4 3 1 2
[24,] 4 3 2 1
However, what I would like is the listing of the six circular permutations. So, I think that this is:
cbind(gtools::permutations(n = 3, r = 3),4)
that gives me:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 3 2 4
[3,] 2 1 3 4
[4,] 2 3 1 4
[5,] 3 1 2 4
[6,] 3 2 1 4
However, I would also like to ignore listings that are the same circle traversed in the opposite direction. Example: I do not want to distinguish c(1,2,3,4) from c(4,3,2,1) (i.e. the 1st and the 6th entry), or c(1,3,2,4) from c(2,3,1,4) (i.e. the 2nd and the 4th entry), or c(2,1,3,4) from c(3,1,2,4) (i.e. the 3rd and the 5th entry in the output). Is it simply a case of taking the first half of the results?
Is there a surer way of doing this? Many thanks for answering my question or providing suggestions.
We can't simply take the first half of the results of permutations(n-1, n-1) and append n. This is easy to see for n = 5; a quick check follows.
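A quick check of that claim (my own addition): with n = 5, rows 6 and 10 of the "first half" describe the same circle once direction is ignored, so the shortcut keeps duplicates and misses other arrangements.
p <- cbind(gtools::permutations(4, 4), 5)[1:12, ]   # "first half" with 5 appended
p[6, ]    # 1 4 3 2 5
p[10, ]   # 2 3 4 1 5  -- the same cycle as row 6, traversed in the other direction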
I suggest the following approach:
Set the first element to always be 1. This ensures we take exactly one permutation from each set of rotationally equivalent permutations. It is basically the same as what you did by making 4 always the last element in your example.
Consider only permutations for which element #2 is less than element #n. For each permutation in the set described by the first rule there is exactly one permutation that is its reverse, so this keeps only one permutation from each such pair.
This is the algorithm we use to construct such a set of permutations:
Find all pairs of elements #2 and #n where #2 is less than #n. This is combinations(n-1, 2, v = 2:n).
For each such combination, find all permutations of the remaining n-3 elements. This is permutations(n - 3, n - 3, v = rest_elements), where rest_elements is a vector of the n-3 elements left after removing 1, element #2 and element #n.
library(gtools)

get_perms <- function(n) {
  # 1 is always first; elements #2 and #n have to be such that #2 < #n
  # all combinations of elements #2 and #n:
  combs_2n <- combinations(n - 1, 2, v = 2:n)
  # for each such combination we have n-3 elements left to place,
  # i.e. (n-3)! variants
  n_perms_rest <- factorial(n - 3)
  # resulting matrix with placeholders for the middle elements
  res <- cbind(
    1,                                                           # element #1
    rep(combs_2n[, 1], each = n_perms_rest),                     # element #2
    matrix(nrow = n_perms_rest * nrow(combs_2n), ncol = n - 3),  # elements #3..#(n-1)
    rep(combs_2n[, 2], each = n_perms_rest)                      # element #n
  )
  # fill placeholders
  for (i in 1:nrow(combs_2n)) {
    rest_elements <- setdiff(2:n, combs_2n[i, ])
    rest_perms <- permutations(n - 3, n - 3, v = rest_elements)
    res[1:n_perms_rest + (i - 1) * n_perms_rest, 3:(n - 1)] <- rest_perms
  }
  res
}
get_perms(5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 4 5 3
#> [2,] 1 2 5 4 3
#> [3,] 1 2 3 5 4
#> [4,] 1 2 5 3 4
#> [5,] 1 2 3 4 5
#> [6,] 1 2 4 3 5
#> [7,] 1 3 2 5 4
#> [8,] 1 3 5 2 4
#> [9,] 1 3 2 4 5
#> [10,] 1 3 4 2 5
#> [11,] 1 4 2 3 5
#> [12,] 1 4 3 2 5
Created on 2021-08-28 by the reprex package (v2.0.1)
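As a sanity check (my addition, using the standard count of (n-1)!/2 distinct directionless circular arrangements of n elements):
nrow(get_perms(5)) == factorial(5 - 1) / 2   # TRUE: 12 rows, as shown above
nrow(get_perms(6)) == factorial(6 - 1) / 2   # TRUE: 60 rows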

unique relation between two columns X and Y using R [duplicate]

I have a data frame of integers that is a subset of all of the n choose 3 combinations of 1...n.
E.g., for n=5, it is something like:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 3 4
[5,] 1 3 5
[6,] 1 4 5
[7,] 2 1 3
[8,] 2 1 4
[9,] 2 1 5
[10,] 2 3 4
[11,] 2 3 5
[12,] 2 4 5
[13,] 3 1 2
[14,] 3 1 4
[15,] 3 1 5
[16,] 3 2 4
[17,] 3 2 5
[18,] 3 4 5
[19,] 4 1 2
[20,] 4 1 3
[21,] 4 1 5
[22,] 4 2 3
[23,] 4 2 5
[24,] 4 3 5
[25,] 5 1 2
[26,] 5 1 3
[27,] 5 1 4
[28,] 5 2 3
[29,] 5 2 4
[30,] 5 3 4
What I'd like to do is remove any rows with duplicate combinations, irrespective of ordering. E.g., row [1,] 1 2 3 is the same as row [7,] 2 1 3 and the same as row [13,] 3 1 2.
unique, duplicated, &c. don't seem to take this into account. Also, I am working with quite a large amount of data (n is ~750), so it ought to be a pretty fast operation. Are there any base functions or packages that can do this?
Sort within the rows first, then use duplicated, see below:
# example data
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items
dat[ !duplicated(apply(dat, 1, sort), MARGIN = 2), ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 4
# [3,] 1 2 5
# [4,] 1 3 4
# [5,] 1 3 5
# [6,] 1 4 5
# [7,] 2 3 4
# [8,] 2 3 5
# [9,] 2 4 5
# [10,] 3 4 5
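If you want a self-contained check without data.txt, something like the following works (the random matrix here is only a stand-in for the poster's subset, not the exact data):
set.seed(1)
dat <- t(replicate(30, sample(1:5, 3)))                        # random ordered triples
dedup <- dat[ !duplicated(apply(dat, 1, sort), MARGIN = 2), ]
nrow(dedup)                                                    # at most choose(5, 3) = 10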

Removing duplicates on subset of columns in R

I have a table which is
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 10 0.00040803 0.00255277
[2,] 1 11 3 0.01765470 0.01584580
[3,] 1 6 2 0.15514850 0.15509000
[4,] 1 8 14 0.02100531 0.02572320
[5,] 1 9 4 0.04748648 0.00843252
[6,] 2 5 10 0.00040760 0.06782680
[7,] 2 11 3 0.01765480 0.01584580
[8,] 2 6 2 0.15514810 0.15509000
[9,] 2 8 14 0.02100491 0.02572320
[10,] 2 9 4 0.04748608 0.00843252
[11,] 3 5 10 0.00040760 0.06782680
[12,] 3 11 3 0.01765480 0.01584580
[13,] 3 8 14 0.02100391 0.02572320
[14,] 3 9 4 0.04748508 0.00843252
[15,] 4 5 10 0.00040760 0.06782680
[16,] 4 11 3 0.01765480 0.01584580
[17,] 4 8 14 0.02100391 0.02572320
[18,] 4 9 4 0.04748508 0.00843252
[19,] 5 8 14 0.02100391 0.02572320
[20,] 5 9 4 0.04748508 0.00843252
I want to remove duplicates from this table; however, only columns 2, 3 and 4 matter. For example, rows 1, 6, 11 and 15 are identical if only columns 2, 3 and 4 are considered. A note on column 4: is it possible to treat two values as the same as long as they are within 10e-5 of each other, so that rows 1 and 6 would be considered identical although their column 4 values differ slightly (within that tolerance)?
Then it would be great to get an output like:
column 2 value | column 3 value | column 1 value at which the pair was first observed (with the tolerance) (1 in the example) | column 1 value at which the pair was last observed (with the tolerance) (4 in the example) | value of column 4 at first appearance (0.00040803 in the example)
This is a way of thinking about it, but I'm not sure it's what you're looking for. The logic should be able to get you started though.
dat <- YOUR DATA SET
dat
V1 V2 V3 V4 V5
1 1 5 10 0.00040803 0.00255277
2 1 11 3 0.01765470 0.01584580
3 1 6 2 0.15514850 0.15509000
4 1 8 14 0.02100531 0.02572320
5 1 9 4 0.04748648 0.00843252
# TRUNCATED
dat <- dat[, c(2, 3, 4)]
dat$V4 <- round(dat$V4, 5)
unique(dat)
V2 V3 V4
1 5 10 0.00041
2 11 3 0.01765
3 6 2 0.15515
4 8 14 0.02101
5 9 4 0.04749
9 8 14 0.02100
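If you also need column 1 afterwards, a small extension of the same idea (my sketch, still using rounding rather than a true tolerance) is to build the rounded key separately and deduplicate the full data set against it:
dat <- YOUR DATA SET                                  # full five-column data, before subsetting
key <- data.frame(dat$V2, dat$V3, round(dat$V4, 5))   # rounded comparison key
dat[!duplicated(key), ]                               # keeps all columns of each first appearance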
You could do something like this:
# read your data
yy <- read.csv('your-data.csv', header=F)
## V1 V2 V3 V4 V5
## 1 1 5 10 0.00040803 0.00255277
## 2 1 11 3 0.01765470 0.01584580
## 3 1 6 2 0.15514850 0.15509000
## 4 1 8 14 0.02100531 0.02572320
# create a logical matrix indicating value is within tolerance
mat.eq.tol <- sapply(yy$V4, function(x) abs(yy$V4-x) < 1E-5)
# minimum index
eq.min <- apply(mat.eq.tol, 1, function(x) min(which(x)))
# maximum index
eq.max <- apply(mat.eq.tol, 1, function(x) max(which(x)))
# combine result
res <- cbind(yy$V2, yy$V3, yy$V1[eq.min], yy$V1[eq.max], yy$V4[eq.min])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 10 1 4 0.00040803
## [2,] 11 3 1 4 0.01765470
## [3,] 6 2 1 2 0.15514850
## [4,] 8 14 1 5 0.02100531
## [5,] 9 4 1 5 0.04748648
## [6,] 5 10 1 4 0.00040803
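Note that res above still has one row per input row; to collapse it to one row per (V2, V3) pair, as the desired output describes, one extra step (not in the original answer) is:
res_unique <- res[!duplicated(res[, 1:2]), ]   # keep the first occurrence of each (V2, V3) pair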

Create dataframe of all array indices in R

Using R, I'm trying to construct a dataframe of the row and col numbers of a given matrix. E.g., if
a <- matrix(c(1:15), nrow=5, ncol=3)
then I'm looking to construct a dataframe that gives:
row col
1 1
1 2
1 3
. .
5 1
5 2
5 3
What I've tried:
row <- matrix(row(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
col <- matrix(col(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
out <- cbind(row, col)
colnames(out) <- c("row", "col")
results in:
row col
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 5 1
[6,] 1 2
[7,] 2 2
[8,] 3 2
[9,] 4 2
[10,] 5 2
[11,] 1 3
[12,] 2 3
[13,] 3 3
[14,] 4 3
[15,] 5 3
This isn't what I'm looking for, as the sequence of rows and cols is suddenly reversed, even though I specified byrow=T. I don't see if and where I'm making a mistake, but would hugely appreciate suggestions to overcome this problem. Thanks in advance!
I'd use expand.grid on the vectors 1:ncol and 1:nrow, then flip the columns with [,2:1] to get them in the order you want:
> expand.grid(seq(ncol(a)),seq(nrow(a)))[,2:1]
Var2 Var1
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
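If you want the columns named row and col rather than Var2 and Var1, a small addition to this answer is to name them in the expand.grid() call:
out <- expand.grid(col = seq(ncol(a)), row = seq(nrow(a)))[, 2:1]
head(out, 3)
#   row col
# 1   1   1
# 2   1   2
# 3   1   3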
Use row and col, but more directly manipulate their output ordering since they return corresponding indices in place for the input array. Use t to get the non-default order you want in the end:
data.frame(row = as.vector(t(row(a))), col = as.vector(t(col(a))))
row col
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Or, as a matrix not a data.frame:
cbind(as.vector(t(row(a))), as.vector(t(col(a))))
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 2 1
[5,] 2 2
[6,] 2 3
[7,] 3 1
[8,] 3 2
[9,] 3 3
[10,] 4 1
[11,] 4 2
[12,] 4 3
[13,] 5 1
[14,] 5 2
[15,] 5 3
You may want to have a look at ?expand.grid, which does just about exactly what you want to achieve.
Since there are many ways to skin a cat, I'll chip in with yet another variant based on rep:
data.frame(row=rep(seq(nrow(a)), each=ncol(a)), col=rep(seq(ncol(a)), nrow(a)))
...but to announce a "winner", I think you need to time the solutions:
# Make up a huge matrix...
a <- matrix(runif(1e7), 1e4)
system.time( a1<-data.frame(row = as.vector(t(row(a))),
col = as.vector(t(col(a)))) ) # 0.68 secs
system.time( a2<-expand.grid(col = seq(ncol(a)),
row = seq(nrow(a)))[,2:1] ) # 0.49 secs
system.time( a3<-data.frame(row=rep(seq(nrow(a)), each=ncol(a)),
col=rep(seq(ncol(a)), nrow(a))) ) # 0.59 secs
identical(a1, a2) && identical(a1, a3) # TRUE
...so it seems #Spacedman has the speediest solution!
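For completeness, one more base-R variant (my own, not from the answers): arrayInd() returns all cell indices in column-major order, so reorder them to get the row-major listing asked for.
a <- matrix(1:15, nrow = 5, ncol = 3)        # back to the small example
idx <- arrayInd(seq_along(a), dim(a))        # column-major: row varies fastest
idx <- idx[order(idx[, 1], idx[, 2]), ]      # reorder to row-major
colnames(idx) <- c("row", "col")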
