I have a large set of size M (let's say 10), and I want to, repeatedly for a certain number of occasions (let's say 13), randomly split it into M/N smaller groups of size N (let's say 2). I'd like no element in the large set to be in a repeating group until it has been in a small group with everyone else. (The actual problem here: I have a class of 10 people and I want to split them into 5 pairs for a duration of 13 weeks, but I don't want anyone to be in a repeat pairing until they have been in a pairing with everyone in the class.)
How can I do this? I started by [generating non-repeating permutations from my larger group][1], but the trouble I am having is that these unique permutations don't necessarily yield unique groups. (Someone seems to have posed this same question, but [it was resolved in Python][2]. I don't understand Python, so I'm looking for an easy R solution.)
Any help much appreciated.
Edit: Thanks to all for the suggestions. I realize my original question wasn't exactly clear. The solutions suggested below work well when I only want to draw a single subset of size N each time. But my problem is actually that I want to split the whole set into M/N subsets of size N. For example, in the case of my class, I want to split the 10 students into 5 pairs of 2 on 13 different occasions, and I want the pairs to be unique until they no longer can be (i.e., after 9 occasions have passed). Unless I'm failing to see how they can be applied, I don't think any of these solutions quite solves this problem.
I see that the OP has posted a solution based on the linked math.so answer, but I would like to provide a working implementation of the other answer on that page, which gets to the heart of this problem. That answer mentions the Round-robin tournament; from the Wikipedia page, the algorithm is straightforward.
One simply fixes a position in a matrix and rotates the other indices clockwise. Given M initial players, there are M - 1 unique rounds. Thus, for our given situation, we can only obtain 9 unique sets of groups.
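For instance, with M = 6 players the first two rounds look like this, with player 1 fixed in the top-left while the other five cycle one position each round:

Round 1:  1 3 5     Round 2:  1 2 4
          2 4 6               6 3 5

i.e., the pairs (1,2), (3,4), (5,6) in round 1 and (1,6), (2,3), (4,5) in round 2.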
Below is a very straightforward base R implementation:
roll <- function(x, n) {
  if (n == 0)
    return(x)
  c(tail(x, n), head(x, -n))
}

RoundRobin <- function(m, n) {
  m <- as.integer(m)
  n <- as.integer(n)
  if (m %% 2L != 0L) {
    m <- m + 1L  # odd number of players: add a dummy player as a bye
  }
  myRounds <- vector(mode = "list", length = n)
  myRounds[[1]] <- 1:m
  for (i in 2:n) {
    myRounds[[i]] <- myRounds[[i - 1L]]
    myRounds[[i]][2:m] <- roll(myRounds[[i]][-1], 1)  # rotate everyone but player 1
  }
  lapply(myRounds, matrix, nrow = 2)  # each column of a round's matrix is one pair
}
The roll function was obtained from this answer.
Here is sample output for 10 students and 4 weeks:
RoundRobin(10, 4)
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 10 3 5 7
[2,] 9 2 4 6 8
[[4]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 2 4 6
[2,] 8 10 3 5 7
When we hit the 10th week, we see our first repeat "round".
RoundRobin(10, 13)[c(1, 2, 9, 10, 11)]
[[1]]
[,1] [,2] [,3] [,4] [,5] ## <- first week
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[2]]
[,1] [,2] [,3] [,4] [,5] ## <- second week
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
[[3]]
[,1] [,2] [,3] [,4] [,5] ## <- ninth week
[1,] 1 4 6 8 10
[2,] 3 5 7 9 2
[[4]]
[,1] [,2] [,3] [,4] [,5] ## <- tenth week
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[5]]
[,1] [,2] [,3] [,4] [,5] ## <- eleventh week
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
Note that this is a deterministic algorithm and, given its simplicity, it is pretty efficient. E.g., if you have 1000 students and want to find all 999 unique pairings, you can run this function without fear:
system.time(RoundRobin(1000, 999))
user system elapsed
0.038 0.001 0.039
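If you want actual names instead of indices, one possibility is to index a roster vector by a round's matrix (a sketch; the students roster here is made up):

students <- paste0("student", 1:10)
week1 <- RoundRobin(10, 13)[[1]]
matrix(students[week1], nrow = 2)  # the same pairings, labelled by name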
I think you may want something like this. It will produce a data frame with the unique combinations in rows. These are sampled randomly until all unique combinations are exhausted; thereafter, if more samples are required, it will sample randomly with replacement from the unique combinations:
library(magrittr)  # for the %>% pipe

create_groups <- function(M, N, samples) {
  # enumerate every unique, unordered combination of N elements of M
  df <- seq(N) %>%
    lapply(function(x) M) %>%
    do.call(expand.grid, .) %>%
    apply(1, sort) %>%
    t() %>%
    as.data.frame() %>%
    unique()
  # drop rows that reuse an element, then shuffle the rows
  df <- df[apply(df, 1, function(x) !any(duplicated(x))), ]
  df <- df[sample(nrow(df)), ]
  if (samples <= nrow(df)) return(df[seq(samples), ])
  # more samples requested than unique combinations: resample with replacement
  rbind(df, df[sample(seq(nrow(df)), samples - nrow(df), TRUE), ])
}
It's easy to see how it works if we want groups of 4 elements from 5 objects (there are only 5 possible combinations):
create_groups(letters[1:5], 4, 5)
#> V1 V2 V3 V4
#> 1 a b d e
#> 2 a b c d
#> 3 a c d e
#> 4 b c d e
#> 5 a b c e
We have a randomly ordered sample of 4-element subsets drawn from the set, with no repeats (the elements within each sample are sorted alphabetically, however).
If we want more than 5 samples, the algorithm ensures that all unique combinations are exhausted before resampling:
create_groups(letters[1:5], 4, 6)
#> V1 V2 V3 V4
#> 1 a b c e
#> 2 a c d e
#> 3 a b d e
#> 4 b c d e
#> 5 a b c d
#> 6 a b d e
Here we see there are no repeated rows until row 6, which is a repeat of row 3.
For the example in your question, there are 45 unique combinations of 2 elements drawn from 10 objects, so we get no repeats in our 13 samples:
create_groups(1:10, 2, 13)
#> V1 V2
#> 1 7 8
#> 2 4 10
#> 3 2 8
#> 4 3 10
#> 5 3 9
#> 6 1 8
#> 7 4 9
#> 8 8 9
#> 9 7 9
#> 10 4 6
#> 11 5 7
#> 12 9 10
#> 13 4 7
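As an aside, the table of unique combinations that create_groups builds via expand.grid, sort, and unique can also be produced directly with combn, which avoids materialising the full length(M)^N grid. A sketch of that equivalent first step:

df <- as.data.frame(t(combn(M, N)))  # one unique, sorted combination per row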
I am not sure, but perhaps combn + sample can work for your goal:

# with, as in the question, M <- 1:10; N <- 2; i <- 13 (weeks)
as.data.frame(t(combn(M, N))[sample(K <- choose(length(M), N), i, replace = K < i), ])
which gives
V1 V2
1 4 9
2 4 8
3 1 9
4 6 10
5 5 9
6 2 10
7 3 7
8 7 8
9 6 7
10 1 7
11 6 8
12 5 6
13 3 8
With apologies to all for not writing a clear question, here is a solution based on the approach suggested in this post. (Depending on the seed, it can get stuck; and if weeks were much larger, the code that recycles old groups would need a small adjustment.)
library(dplyr)  # provides %>% and combine()

set.seed(1)
m <- 10
n <- 2
weeks <- 13
groupmat <- combn(m, n)  # all 45 possible pairs (not used below)
students <- 1:m
pickedpairs <- matrix(data = NA, nrow = n, ncol = 0)
while (ncol(pickedpairs) < ((m - 1) * (m / n))) {
  thisweekspairs <- matrix(sample(students), nrow = n, ncol = m / n)
  # check if any of this week's pairs are already in pickedpairs;
  # if so, skip this iteration
  pairsprez <- lapply(1:ncol(thisweekspairs), function(j) {
    apply(pickedpairs, 2, function(x) sum(x %in% thisweekspairs[, j]) == n)
  }) %>% Reduce(f = "|") %>% sum
  if (pairsprez == 0) {
    pickedpairs <- cbind(pickedpairs, thisweekspairs)
  }
  print(ncol(pickedpairs))
}
# split the 45 accepted pairs into 9 weeks of 5 pairs each
uniquepairs <- lapply(1:(ncol(pickedpairs) / (m / n)), function(i) {
  pickedpairs[, (1 + (m / n) * (i - 1)):((m / n) * i)]
})
# generate the full weeks' worth of pairings by recycling old weeks
# (combine() concatenates the lists; base c() would also work)
combine(
  uniquepairs,
  uniquepairs[sample(1:length(uniquepairs), weeks - length(uniquepairs))]
)
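A quick sanity check, assuming the loop terminated: across the 9 unique weeks, every one of the choose(10, 2) = 45 possible pairs should appear exactly once.

nrow(unique(t(apply(pickedpairs, 2, sort))))  # should be 45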
We could use slice_sample with combn:

library(dplyr)
library(purrr)

# with, as in the question, M <- 1:10; N <- 2; i <- 13 (weeks)
combn(M, N, simplify = FALSE) %>%
  invoke(rbind, .) %>%
  as_tibble %>%
  slice_sample(n = i)
# A tibble: 13 x 2
# V1 V2
# <int> <int>
# 1 4 5
# 2 3 8
# 3 9 10
# 4 5 7
# 5 8 9
# 6 3 9
# 7 5 10
# 8 4 10
# 9 2 5
#10 5 6
#11 6 9
#12 2 7
#13 4 9
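One caveat: with the default replace = FALSE, slice_sample() returns at most as many rows as the table has, i.e. choose(length(M), N) unique pairs, and silently truncates beyond that. If more weeks than that are requested, a sketch of the adjustment (repeats may then appear before all unique pairs are used):

combn(M, N, simplify = FALSE) %>%
  invoke(rbind, .) %>%
  as_tibble %>%
  slice_sample(n = i, replace = TRUE)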
I'm trying to understand more about matrices in R and I stumbled upon this query. Could someone explain to me why x[2:3, 2:2] returns 5 and 6?
Am I right to say that 2:3, 2:2 simply refers to row 2, column 2 and row 3, column 2?
> x <- matrix(c(1:6), ncol = 2)
> x
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> x[2:3, 2:2]
[1] 5 6
If that's the case, why does the following 3x3 matrix return such a strange result?
> x <- matrix(c(1:9), ncol = 3)
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> x[1:2, 3:2, drop = F]
[,1] [,2]
[1,] 7 4
[2,] 8 5
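A note that may clarify what is happening: the colon operator builds the whole index vector before any subsetting happens, and 3:2 counts down, so the columns simply come back in reversed order. Likewise 2:2 is just the single column 2, which is why x[2:3, 2:2] drops to the plain vector c(5, 6).

2:2
# [1] 2
3:2
# [1] 3 2
x[1:2, c(3, 2), drop = FALSE]  # identical to x[1:2, 3:2, drop = FALSE]
#      [,1] [,2]
# [1,]    7    4
# [2,]    8    5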
How can I extract every two elements in sequence from a matrix and return the result as a matrix, so that I can feed the answer into a formula for calculation?
For example, I have a one row matrix with 6 columns:
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    1    5    5   10    1
I want to extract columns 1 and 2 in the first iteration, columns 3 and 4 in the second iteration, and so on. The result has to be in the form of a matrix:
     [,1] [,2]
[1,]    2    1
[2,]    5    5
[3,]   10    1
My original code:
data <- matrix(c(1,1,1,2,2,1,2,2,5,5,5,6,10,1,10,2,11,1,11,2), ncol = 2)
Center Matrix:
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    2    1    5    5   10    1
 [2,]    1    1    2    1   10    1
 [3,]    5    5    5    6   11    2
 [4,]    2    2    5    5   10    1
 [5,]    2    1    5    6    5    5
 [6,]    2    2    5    5   11    1
 [7,]    2    1    5    5   10    1
 [8,]    1    1    5    6   11    1
 [9,]    2    1    5    5   10    1
[10,]    5    6   11    1   10    2
objCentroidDist <- function(data, centers) {
  resultMatrix <- matrix(NA, nrow = dim(data)[1], ncol = dim(centers)[1])
  for (i in 1:nrow(centers)) {
    # works when data and centers have the same number of columns
    resultMatrix[, i] <- sqrt(rowSums(t(t(data) - centers[i, ])^2))
  }
  resultMatrix
}
objCentroidDist(data, centers)  # centers is the 10 x 6 matrix shown above
I want the result matrix to be as per below (a 10 x 3 matrix, values to be filled in):
      [,1] [,2] [,3]
 [1,]
 [2,]
 [3,]
 [4,]
 [5,]
 [6,]
 [7,]
 [8,]
 [9,]
[10,]
My concern is how to calculate the data-to-centers distances when the data matrix has two columns but the centers matrix has six, i.e., how to calculate the distance from the data matrix to every two columns of the centers matrix. Each row of the centers matrix holds three centers.
Something like this maybe?
m <- matrix(c(2, 1, 5, 5, 10, 1), ncol = 6)

list.seq.pairs <- lapply(seq(1, ncol(m), 2), function(x) {
  m[, c(x, x + 1)]
})
> list.seq.pairs
[[1]]
[1] 2 1
[[2]]
[1] 5 5
[[3]]
[1] 10 1
And, in case you're wanting to iterate over multiple rows in a matrix,
you can expand on the above like this:
mm <- matrix(1:18, ncol = 6, byrow = TRUE)

apply(mm, 1, function(x) {
  lapply(seq(1, length(x), 2), function(y) {
    x[c(y, y + 1)]
  })
})
EDIT:
I'm really not sure what you're after exactly. I think, if you want each row transformed into a 3 x 2 matrix:
mm <- matrix(1:18, ncol = 6, byrow = TRUE)

list.mats <- lapply(1:nrow(mm), function(x) {
  matrix(mm[x, ], ncol = 2, byrow = TRUE)
})
> list.mats
[[1]]
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[[2]]
[,1] [,2]
[1,] 7 8
[2,] 9 10
[3,] 11 12
[[3]]
[,1] [,2]
[1,] 13 14
[2,] 15 16
[3,] 17 18
If, however, you want to get to your results matrix- I think it's probably easiest to do whatever calculations you need to do while you're dealing with each row:
results <- t(apply(mm, 1, function(x) {
sapply(seq(1, length(x), 2), function(y) {
val1 = x[y] # Get item one
val2 = x[y+1] # Get item two
val1 / val2 # Do your calculation here
})
}))
> results
[,1] [,2] [,3]
[1,] 0.5000000 0.7500 0.8333333
[2,] 0.8750000 0.9000 0.9166667
[3,] 0.9285714 0.9375 0.9444444
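If the underlying goal is the asker's point-to-center Euclidean distance, with three (x, y) centers packed into each 6-column row, the same pattern works with the distance plugged in as the calculation. A hedged sketch for a single 2-D point (the point value is made up):

point <- c(2, 1)  # one row of a 2-column data matrix
t(apply(mm, 1, function(x) {
  sapply(seq(1, length(x), 2), function(y) {
    sqrt((x[y] - point[1])^2 + (x[y + 1] - point[2])^2)  # distance to center (x[y], x[y+1])
  })
}))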
That said, I don't understand what you're trying to do so this may miss the mark. You may have more luck if you ask a new question where you show example input and the actual expected output that you're after, with the actual values you expect.
I have a 7x7 matrix:
Mat<-matrix(nrow=7,ncol=7)
With certain elements:
Mat[2,2]<-37
Mat[2,4]<-39
Mat[2,6]<-24
Mat[4,2]<-35
Mat[4,4]<-36
Mat[4,6]<-26
Mat[6,2]<-26
Mat[6,4]<-31
Mat[6,6]<-39
I am generating random elements and want to test whether they add up to the specified values.
I have written the following code:
TF <- c()
TF[1] <- isTRUE(Mat[2,2] == sum(Mat[1,1], Mat[1,2], Mat[1,3], Mat[2,1], Mat[2,3], Mat[3,1], Mat[3,2], Mat[3,3]))
TF[2] <- isTRUE(Mat[2,4] == sum(Mat[1,3], Mat[1,4], Mat[1,5], Mat[2,3], Mat[2,5], Mat[3,3], Mat[3,4], Mat[3,5]))
TF[3] <- isTRUE(Mat[2,6] == sum(Mat[1,5], Mat[1,6], Mat[1,7], Mat[2,5], Mat[2,7], Mat[3,5], Mat[3,6], Mat[3,7]))
TF[4] <- isTRUE(Mat[4,2] == sum(Mat[3,1], Mat[3,2], Mat[3,3], Mat[4,1], Mat[4,3], Mat[5,1], Mat[5,2], Mat[5,3]))
TF[5] <- isTRUE(Mat[4,4] == sum(Mat[3,3], Mat[3,4], Mat[3,5], Mat[4,3], Mat[4,5], Mat[5,3], Mat[5,4], Mat[5,5]))
TF[6] <- isTRUE(Mat[4,6] == sum(Mat[3,5], Mat[3,6], Mat[3,7], Mat[4,5], Mat[4,7], Mat[5,5], Mat[5,6], Mat[5,7]))
TF[7] <- isTRUE(Mat[6,2] == sum(Mat[5,1], Mat[5,2], Mat[5,3], Mat[6,1], Mat[6,3], Mat[7,1], Mat[7,2], Mat[7,3]))
TF[8] <- isTRUE(Mat[6,4] == sum(Mat[5,3], Mat[5,4], Mat[5,5], Mat[6,3], Mat[6,5], Mat[7,3], Mat[7,4], Mat[7,5]))
TF[9] <- isTRUE(Mat[6,6] == sum(Mat[5,5], Mat[5,6], Mat[5,7], Mat[6,5], Mat[6,7], Mat[7,5], Mat[7,6], Mat[7,7]))
Now I am trying to make it more efficient with a nested for loop:
O <- c(2, 4, 6)
for (G in O) {
  for (H in O) {
    TF[] <- isTRUE(Mat[G, H] == sum(Mat[G-1, H-1], Mat[G-1, H], Mat[G-1, H+1],
                                    Mat[G, H-1],                Mat[G, H+1],
                                    Mat[G+1, H-1], Mat[G+1, H], Mat[G+1, H+1]))
  }
}
The problem is that TF[] has no index, so each pass overwrites the whole vector, and it does not make sense to add another for loop just to track the index.
I also have a problem finding a way to rerun the simulation if one FALSE is found.
Let's start by answering the following question:
How do you compute the sum of every surrounding cell for each cell in a matrix?
This is actually not trivial as far as I can tell (curious to see if anyone else comes up with something cool). Here is a potential solution, though not even close to being succinct. Let's start by looking at the results of the function. Here we will create matrices of only 1s so we can check that the results make sense (corners should add to 3 since there are only three contiguous cells, insides to 8, etc.):
> compute_neighb_sum(matrix(1, nrow=3, ncol=3))
[,1] [,2] [,3]
[1,] 3 5 3
[2,] 5 8 5
[3,] 3 5 3
> compute_neighb_sum(matrix(1, nrow=3, ncol=5))
[,1] [,2] [,3] [,4] [,5]
[1,] 3 5 5 5 3
[2,] 5 8 8 8 5
[3,] 3 5 5 5 3
> compute_neighb_sum(matrix(1, nrow=7, ncol=7))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 3 5 5 5 5 5 3
[2,] 5 8 8 8 8 8 5
[3,] 5 8 8 8 8 8 5
[4,] 5 8 8 8 8 8 5
[5,] 5 8 8 8 8 8 5
[6,] 5 8 8 8 8 8 5
[7,] 3 5 5 5 5 5 3
This works!
Now, let's answer your actual question:
compute_neighb_sum(mx) == mx
and this should return TRUE for all cells that are equal to the sum of their surroundings. Let's confirm:
mx <- matrix(1, nrow = 7, ncol = 7)
mx[cbind(c(3, 6), c(3, 6))] <- 8  # make two interior cells equal to 8, the sum of their (all-1) surroundings
which(compute_neighb_sum(mx) == mx, arr.ind = TRUE)  # look at mx to see what's going on
Sure enough, we get back the coordinates that we expect:
row col
[1,] 3 3
[2,] 6 6
Now, here is the function:
compute_neighb_sum <- function(mx) {
  mx.ind <- cbind( # create a 2-wide matrix of all possible indices in input
    rep(seq.int(nrow(mx)), ncol(mx)),
    rep(seq.int(ncol(mx)), each = nrow(mx))
  )
  sum_neighb_each <- function(x) {
    near.ind <- cbind( # for each x, y coord, create an index of all surrounding values
      rep(x[[1]] + -1:1, 3),
      rep(x[[2]] + -1:1, each = 3)
    )
    near.ind.val <- near.ind[ # eliminate out-of-bound values, or the actual x, y coord itself
      !(
        near.ind[, 1] < 1 | near.ind[, 1] > nrow(mx) |
        near.ind[, 2] < 1 | near.ind[, 2] > ncol(mx) |
        (near.ind[, 1] == x[[1]] & near.ind[, 2] == x[[2]])
      ),
    ]
    sum(mx[near.ind.val]) # now sum the surrounding cell values
  }
  `dim<-`( # this is just to return in the same matrix format as the input
    sapply(
      split(mx.ind, row(mx.ind)), # for each x, y coordinate in input mx
      sum_neighb_each             # compute the neighbor sum
    ),
    c(nrow(mx), ncol(mx)) # dimensions of input
  )
}
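To connect this back to the question: assuming Mat has been fully filled in with numbers (sum() over cells that are still NA returns NA), the nine tests collapse into one vectorised comparison, with no loop bookkeeping:

O <- c(2, 4, 6)
idx <- cbind(rep(O, each = 3), rep(O, 3))  # the nine target coordinates
TF <- compute_neighb_sum(Mat)[idx] == Mat[idx]
all(TF)  # TRUE only when every target cell equals its neighbour sum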
I am trying to cut one row
x = [1 2 3 4 5 6 7 8 9 10 11 12]
into multiple rows of equal length so that
y(row1) = [1 2 3 4
y(row2) = 5 6 7 8
y(row3) = 9 10 11 12]
I know I can achieve this using a combination of rbind and cbind, but the dataset I am trying to apply this to is much larger than the example, so I am looking for a way to do it more quickly and automatically. I tried cut and cut2, but those didn't work either.
The function matrix() is your friend here:
> matrix(1:12, nrow = 3, byrow = TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
Note the optional parameter byrow. The default is FALSE, which fills the matrix by columns; setting it to TRUE in this case arranges the data in the order you described. Just something to be careful about: R won't throw an error if you fill by column, but your data won't be in the right format!
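To see the pitfall, here is what the default column-wise fill returns for the same input:

matrix(1:12, nrow = 3)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12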
Use matrix:
> y <- 1:12
> y
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> matrix(y, 3, 4, byrow = TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
Edit: I included the byrow = TRUE argument to matrix (pointed out by Chase in the comments), which fills the matrix along the rows instead of down the columns.