I have a number of individuals that I want to randomly divide into subgroups of size groupsize. I want to repeat this process n_groups times, with no repeated group constellation.
How can I achieve this in R?
I tried the following so far:
set.seed(1)
individuals <- 1:6
groupsize <- 3
n_groups <- 4
for(i in 1:n_groups) { print(sample(individuals, groupsize))}
[1] 1 4 3
[1] 1 2 6
[1] 3 2 6
[1] 3 1 5
...but I am not sure whether that really avoids repeated constellations?
Edit: After looking at the first suggestions and answers, I realized that another restriction could be interesting to me (sorry for not seeing it upfront).
Is there (in the concrete example above) a way to ensure that every individual was in contact with every other individual?
Based on your edited question, I assume that you want to make sure that all individuals are in at least one subgroup?
Then this might be the solution:
individuals <- 1:6
groupsize <- 3
n_groups <- 4
#sample groups
library(RcppAlgos)
#initialise
answer <- matrix()
# If the length of all unique elements in the answer is smaller than
# the number of individuals, take a new sample
while (length(unique(as.vector(answer))) < length(individuals)) {
answer <- comboSample(individuals, groupsize, n = n_groups)
# Line below is for demonstration only
#answer <- comboSample(individuals, groupsize, n = n_groups, seed = 123)
}
# sample answer with seed = 123 (see commented line above)
# [,1] [,2] [,3]
# [1,] 1 3 4
# [2,] 1 3 6
# [3,] 2 3 5
# [4,] 2 3 4
Test for groups that do not contain every individual:
# Test with the following matrix
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 3 4
# [3,] 1 4 5
# [4,] 2 3 4
# Note that individual '6' is not present
# byrow = TRUE so that the matrix matches the layout shown above
answer <- matrix(c(1,2,3,1,3,4,1,4,5,2,3,4), nrow = 4, ncol = 3, byrow = TRUE)
while (length(unique(as.vector(answer))) < length(individuals)) {
answer <- comboSample(individuals, groupsize, n = n_groups)
}
# is recalculated to (in this case) the following answer
# [,1] [,2] [,3]
# [1,] 4 5 6
# [2,] 3 4 5
# [3,] 1 3 6
# [4,] 2 4 5
PASSED ;-)
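Note that the loop above only guarantees that every individual appears in some group, not that every pair of individuals meets. A minimal sketch of a pairwise check follows; all_pairs_met is a hypothetical helper name, not a function from RcppAlgos, and it assumes groups are stored one per row. With 6 individuals there are choose(6, 2) = 15 pairs, while 4 groups of 3 cover at most 4 * 3 = 12 of them, so n_groups = 4 can never satisfy this stricter condition.

```r
# Hypothetical helper (not part of RcppAlgos): TRUE if every pair of
# individuals shares at least one group. `groups` has one group per row.
all_pairs_met <- function(groups, individuals) {
  # Pairs seen inside each group, encoded as "a-b" with a < b
  seen <- unlist(lapply(seq_len(nrow(groups)), function(i) {
    combn(sort(groups[i, ]), 2, FUN = paste, collapse = "-")
  }))
  # Pairs that would have to occur at least once
  needed <- combn(sort(individuals), 2, FUN = paste, collapse = "-")
  all(needed %in% seen)
}

# The seed = 123 sample from the answer: individuals 1 and 2 never meet
answer <- rbind(c(1, 3, 4), c(1, 3, 6), c(2, 3, 5), c(2, 3, 4))
all_pairs_met(answer, 1:6)
# [1] FALSE
```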
You can use while to dynamically update your combination set, which avoids duplicates, e.g.,
res <- c()
while (length(res) < pmin(n_groups, choose(length(individuals), groupsize))) {
v <- list(sort(sample(individuals, groupsize)))
if (!v %in% res) res <- c(res, v)
}
which gives
> res
[[1]]
[1] 2 5 6
[[2]]
[1] 2 3 6
[[3]]
[1] 1 5 6
[[4]]
[1] 1 2 6
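When the number of possible groups is small, a base R alternative to the rejection loop is to enumerate every combination once and sample columns without replacement. This is only a sketch under that assumption; it is not practical when choose(length(individuals), groupsize) is large, and it requires n_groups to be no larger than that number.

```r
set.seed(1)
individuals <- 1:6
groupsize <- 3
n_groups <- 4

# All possible groups, one per column (20 columns for 6 choose 3)
all_groups <- combn(individuals, groupsize)

# Sample distinct column indices, so no group can repeat
picked <- all_groups[, sample(ncol(all_groups), n_groups), drop = FALSE]

# One group per row
t(picked)
```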
Are there any direct functions that can be used to get the combinations of all the items in the vector?
myVector <- c(1,2,3)
for (i in myVector)
for (j in myVector)
for (k in myVector)
print(paste(i,j,k,sep=","))
The first part of the output was shown in a screenshot. As there are three values 1, 2, 3, there will be
3 * 3 * 3 = 27 lines
I tried to get the permutations using the function permn() as,
permn(myVector)
But it gives only 9 different values.
Is there any direct function that can produce the result of the nested loops shown first?
Using RcppAlgos::permuteGeneral.
r <- RcppAlgos::permuteGeneral(myVector, length(myVector), repetition=TRUE)
head(r, 3)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 1 3
If you want the comma separated strings, do
apply(r, 1, paste, collapse=",")
# [1] "1,1,1" "1,1,2" "1,1,3" "1,2,1" "1,2,2" "1,2,3" "1,3,1"
# [8] "1,3,2" "1,3,3" "2,1,1" "2,1,2" "2,1,3" "2,2,1" "2,2,2"
# [15] "2,2,3" "2,3,1" "2,3,2" "2,3,3" "3,1,1" "3,1,2" "3,1,3"
# [22] "3,2,1" "3,2,2" "3,2,3" "3,3,1" "3,3,2" "3,3,3"
Or, for the list output you have also shown:
RcppAlgos::permuteGeneral(myVector, length(myVector), FUN=function(x)
paste(x, collapse=","), repetition=TRUE)
# [[1]]
# [1] "1,1,1"
#
# [[2]]
# [1] "1,1,2"
#
# [[3]]
# [1] "1,1,3"
#
# [[4]]
# [1] "1,2,1"
# ...
You may decide on your own :)
Use expand.grid :
tmp <- expand.grid(myVector, myVector, myVector)
tmp
# Var1 Var2 Var3
#1 1 1 1
#2 2 1 1
#3 3 1 1
#4 1 2 1
#5 2 2 1
#6 3 2 1
#...
#...
If you want to do this automatically for the length of myVector, without manually specifying it 3 times, you can use replicate.
tmp <- do.call(expand.grid, replicate(length(myVector),
myVector, simplify = FALSE))
To paste the values together you can do :
do.call(paste, c(tmp, sep = ','))
# [1] "1,1,1" "2,1,1" "3,1,1" "1,2,1" "2,2,1" "3,2,1" "1,3,1" "2,3,1"
# [9] "3,3,1" "1,1,2" "2,1,2" "3,1,2" "1,2,2" "2,2,2" "3,2,2" "1,3,2"
#[17] "2,3,2" "3,3,2" "1,1,3" "2,1,3" "3,1,3" "1,2,3" "2,2,3" "3,2,3"
#[25] "1,3,3" "2,3,3" "3,3,3"
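expand.grid varies the first column fastest, so the strings above come out in a different order than the nested loops in the question. If the loop order matters, one option (a sketch, not part of the original answer) is to reverse the columns before pasting:

```r
myVector <- c(1, 2, 3)

# expand.grid also accepts a single list of vectors
tmp <- expand.grid(rep(list(myVector), length(myVector)))

# Reverse the column order so the last loop variable varies fastest
res <- do.call(paste, c(rev(tmp), sep = ","))
head(res, 4)
# [1] "1,1,1" "1,1,2" "1,1,3" "1,2,1"
```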
Note that there is a permutations function in the gtools package that allows you to generalize permutation outputs:
library(gtools)
permutations(3, 3, 1:3, repeats.allowed = TRUE)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 2
[3,] 1 1 3
[4,] 1 2 1
[5,] 1 2 2
[6,] 1 2 3
[7,] 1 3 1
[8,] 1 3 2
[9,] 1 3 3
[10,] 2 1 1
The function help describes the parameter settings.
It appears that pracma::combs does exactly this. That, and pracma::perms generate output sets which treat every element of the input as distinct, regardless of whether a value is repeated.
I have the elements for a matrix as follows:
diag= rep(1,5)
offdiag = c(rep(1:4), rep(1:3), rep(1:2), 1)
The final matrix I want should be a symmetric matrix that looks like this:
1 1 2 3 4
1 1 1 2 3
2 1 1 1 2
3 2 1 1 1
4 3 2 1 1
where the diagonal is filled by diag and the lower-triangular area is filled by offdiag column-wise.
In practice, all numbers are random, so I need a generic way to fill in the matrix with elements.
Thanks in advance!
Try this:
m <- matrix(NA, ncol = length(diag), nrow = length(diag))
m[lower.tri(m)] <- offdiag
m[upper.tri(m)] <- t(m)[upper.tri(t(m))]
diag(m) <- diag
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 2 3 4
# [2,] 1 1 1 2 3
# [3,] 2 1 1 1 2
# [4,] 3 2 1 1 1
# [5,] 4 3 2 1 1
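The same steps can be wrapped in a small function for arbitrary inputs. This is a sketch: make_symmetric is a hypothetical name, it assumes numeric inputs, and it mirrors the lower triangle by adding the transpose to a zero-filled matrix rather than by indexing.

```r
# Hypothetical helper: build a symmetric matrix from its diagonal and the
# column-wise entries of its lower triangle.
make_symmetric <- function(diag_vals, offdiag_vals) {
  n <- length(diag_vals)
  # The lower triangle of an n x n matrix has n * (n - 1) / 2 entries
  stopifnot(length(offdiag_vals) == n * (n - 1) / 2)
  m <- matrix(0, n, n)
  m[lower.tri(m)] <- offdiag_vals  # filled column-wise
  m <- m + t(m)                    # mirror onto the upper triangle
  diag(m) <- diag_vals
  m
}

m <- make_symmetric(rep(1, 5), c(1:4, 1:3, 1:2, 1))
isSymmetric(m)
# [1] TRUE
```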
Another alternative: Manually create a distance matrix and work from there.
class(offdiag) <- "dist"
attr(offdiag, "Size") <- length(diag)
out <- as.matrix(offdiag)
diag(out) <- diag
out
I want to create an interleaved matrix from a list of matrices.
Example input:
> l <- list(a=matrix(1:4,2),b=matrix(5:8,2))
> l
$a
[,1] [,2]
[1,] 1 3
[2,] 2 4
$b
[,1] [,2]
[1,] 5 7
[2,] 6 8
Expected output:
1 3
5 7
2 4
6 8
I have checked the interleave function in gdata but it does not show this behaviour for lists. Any help appreciated.
Here is a one-liner:
do.call(rbind, l)[order(sequence(sapply(l, nrow))), ]
# [,1] [,2]
# [1,] 1 3
# [2,] 5 7
# [3,] 2 4
# [4,] 6 8
To help understand, the matrices are first stacked on top of each other with do.call(rbind, l), then the rows are extracted in the right order:
sequence(sapply(l, nrow))
# a1 a2 b1 b2
# 1 2 1 2
order(sequence(sapply(l, nrow)))
# [1] 1 3 2 4
It will work with any number of matrices and it will do "the right thing" (subjective) even if they don't have the same number of rows.
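To illustrate the unequal-rows case, here is a small example with a made-up list l2 (not from the question): the first matrix has three rows and the second has two, so the leftover row of the longer matrix ends up last.

```r
# Hypothetical input: 3-row and 2-row matrices with the same number of columns
l2 <- list(a = matrix(1:6, 3), b = matrix(7:10, 2))

res <- do.call(rbind, l2)[order(sequence(sapply(l2, nrow))), ]
res
#      [,1] [,2]
# [1,]    1    4
# [2,]    7    9
# [3,]    2    5
# [4,]    8   10
# [5,]    3    6
```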
Rather than reinventing the wheel, you can just modify it to get you to your destination.
The interleave function from "gdata" starts with ... to let you specify a number of data.frames or matrices to put together. The first few lines of the function look like this:
head(interleave)
#
# 1 function (..., append.source = TRUE, sep = ": ", drop = FALSE)
# 2 {
# 3 sources <- list(...)
# 4 sources[sapply(sources, is.null)] <- NULL
# 5 sources <- lapply(sources, function(x) if (is.matrix(x) ||
# 6 is.data.frame(x))
You can just rewrite lines 1 and 3 as I did in this Gist to create a list version of interleave (here, I've called it Interleave)
head(Interleave)
#
# 1 function (myList, append.source = TRUE, sep = ": ", drop = FALSE)
# 2 {
# 3 sources <- myList
# 4 sources[sapply(sources, is.null)] <- NULL
# 5 sources <- lapply(sources, function(x) if (is.matrix(x) ||
# 6 is.data.frame(x))
Does it work?
l <- list(a=matrix(1:4,2),b=matrix(5:8,2), c=matrix(9:12,2))
Interleave(l)
# [,1] [,2]
# a 1 3
# b 5 7
# c 9 11
# a 2 4
# b 6 8
# c 10 12
Here is the code for generating the list:
x = matrix(1, 4, 4)
x[2,2] = 5
x[2:3, 1] = 3
x
# [,1] [,2] [,3] [,4]
#[1,] 1 1 1 1
#[2,] 3 5 1 1
#[3,] 3 1 1 1
#[4,] 1 1 1 1
res = apply(x, 2, function(i) list(m=max(i), idx=which(i == max(i))))
res
#[[1]]
#[[1]]$m
#[1] 3
#
#[[1]]$idx
#[1] 2 3
#
#
#[[2]]
#[[2]]$m
#[1] 5
#
#[[2]]$idx
#[1] 2
#
#
#[[3]]
#[[3]]$m
#[1] 1
#
#[[3]]$idx
#[1] 1 2 3 4
#
#
#[[4]]
#[[4]]$m
#[1] 1
#
#[[4]]$idx
#[1] 1 2 3 4
Now I want to compare the $m in each sub-list and get the maximum and its index in the matrix. I can do it this way:
mvector = vector('numeric', 4)
for (i in 1:4) {
mvector[i] = res[[i]]$m
}
mvector
#[1] 3 5 1 1
max_m = max(mvector)
max_m
#[1] 5
max_col = which(mvector == max_m)
max_row = res[[max_col]]$idx
max_row
#[1] 2
x[max_row, max_col]
#[1] 5
I am wondering if there is a simpler way of doing this?
Do you have to build that list? You can work on the matrix x directly:
max value (your max_m):
max(x)
# [1] 5
where in the matrix this value is found (first match only):
which.max(x)
# [1] 6
or its row and column indices (your max_row and max_col):
arrayInd(which.max(x), dim(x))
# [,1] [,2]
# [1,] 2 2
And in case of multiple maximums, you can get all of them by replacing which.max(x) with which(x == max(x)) in the two statements above.
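For the multiple-maxima case, which() can also return row and column indices directly through its arr.ind argument. A short sketch, using a variant of the question's matrix with a second 5 added for illustration:

```r
x <- matrix(1, 4, 4)
x[2, 2] <- 5
x[4, 3] <- 5  # second maximum, added only for this example

# One row per maximum, with columns named "row" and "col"
idx <- which(x == max(x), arr.ind = TRUE)
idx
#      row col
# [1,]   2   2
# [2,]   4   3
```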