Efficiently replicate matrix rows by group in R

I am trying to find a way to efficiently replicate rows of a matrix in R based on a group. Let's say I have the following matrix a:
a <- matrix(
c(1, 2, 3,
4, 5, 6,
7, 8, 9),
ncol = 3, byrow = TRUE
)
I want to create a new matrix where each row in a is repeated based on a number specified in a vector (what I'm calling a "group"), e.g.:
reps <- c(2, 3, 4)
In this case, the resulting matrix would be:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 4 5 6
[4,] 4 5 6
[5,] 4 5 6
[6,] 7 8 9
[7,] 7 8 9
[8,] 7 8 9
[9,] 7 8 9
This is the only solution I've come up with so far:
matrix(
rep(a, times = rep(reps, times = 3)),
ncol = 3, byrow = FALSE
)
Notice that in this solution I have to use rep() twice - first to replicate the reps vector, and then again to actually replicate each row of a.
This solution works fine, but I'm looking for a more efficient one: in my case it runs inside an optimization loop, is recomputed in every iteration, and is rather slow when a is large.
I'll note that this question is very similar, but it is about repeating each row the same number of times. This question is also similarly about efficiency, but it's about replicating entire matrices.
UPDATE
Since I'm interested in efficiency, here is a simple comparison of the solutions provided thus far...I'll update this as more come in, but in general it looks like the seq_along solution by F. Privé is the fastest.
library(dplyr)
library(tidyr)
a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)
rbenchmark::benchmark(
"original solution" = {
result <- matrix(rep(a, times = rep(reps, times = 3)),
ncol = 3, byrow = FALSE)
},
"seq_along" = {
result <- a[rep(seq_along(reps), reps), ]
},
"uncount" = {
result <- as.data.frame(a) %>%
uncount(reps)
},
replications = 1000,
columns = c("test", "replications", "elapsed", "relative")
)
               test replications elapsed relative
1 original solution         1000   0.004    1.333
2         seq_along         1000   0.003    1.000
3           uncount         1000   1.722  574.000

Simply use a[rep(seq_along(reps), reps), ].
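To spell out why this works: rep(seq_along(reps), reps) builds a vector of row indices in which index k appears reps[k] times, and indexing a with that vector copies the rows in a single step. Below is a minimal sketch; the helper name rep_rows and the drop = FALSE argument are my additions, not part of the answer (drop = FALSE keeps the result a matrix even when only one row is selected).

```r
# rep_rows is a hypothetical helper name, not from the answer above
rep_rows <- function(a, reps) {
  stopifnot(length(reps) == nrow(a))
  idx <- rep(seq_along(reps), times = reps)  # for reps = c(2, 3, 4): 1 1 2 2 2 3 3 3 3
  a[idx, , drop = FALSE]                     # drop = FALSE always returns a matrix
}

a <- matrix(1:9, ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)
result <- rep_rows(a, reps)
```

Because only an index vector is built and the matrix is subset once, no intermediate copy of the data is created before the final result.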

Another option with uncount
library(dplyr)
library(tidyr)
as.data.frame(a) %>%
uncount(reps)
-output
V1 V2 V3
1 1 2 3
2 1 2 3
3 4 5 6
4 4 5 6
5 4 5 6
6 7 8 9
7 7 8 9
8 7 8 9
9 7 8 9

Another base R option (not as elegant as the answers by @F. Privé or @akrun)
> t(do.call(cbind, mapply(replicate, reps, asplit(a, 1))))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 4 5 6
[4,] 4 5 6
[5,] 4 5 6
[6,] 7 8 9
[7,] 7 8 9
[8,] 7 8 9
[9,] 7 8 9

Related

How to repeatedly generate non-repeating smaller groups from a larger set

I have a large set of size M (let's say 10), and I want to, repeatedly for a certain number of occasions (let's say 13), randomly split it into M/N smaller groups of size N (let's say 2). I'd like no element in the large set to be in a repeating group until they have been in a small group with every one else. (The actual problem here: I have a class of 10 people and I want to split them into 5 pairs for a duration of 13 weeks, but I don't want anyone to be in a repeat pairing until they have been in a pairing with everyone in the class.)
How can I do this? I started by generating non-repeating permutations from my larger group, but the trouble I am having is that these unique permutations don't necessarily yield unique groups. (Someone seems to have posed this same question, but it was resolved in Python. I don't understand Python, so I'm looking for an easy R solution.)
Any help much appreciated.
Edit: Thanks to all for suggestions. I realize my original question wasn't exactly clear. The solutions suggested below work well when I only want to split the set into a single subset of size N, each time. But my problem is actually that I want to split the set into M/N subsets of size N. For example, in the case of my class, I want to split the 10 students into 5 pairs of 2 on 13 different occasions, and I want pairs to be unique until they no longer can be (i.e., after 9 occasions have passed). Unless I'm failing to see how they can be applied, I don't think any of these solutions quite solves this problem.
I see that the OP has provided a solution from the linked math.so solution, but I would like to provide a working solution of the other answer on that page that gets to the heart of this problem. That solution mentions Round-robin tournament. From the wikipedia page, the algorithm is straightforward.
One simply fixes a position in a matrix and rotates the other indices clockwise. Given M initial players, there are M - 1 unique rounds. Thus, for our given situation, we can only obtain 9 unique sets of groups.
Below is a very straightforward base R implementation:
roll <- function( x , n ){
if( n == 0 )
return(x)
c(tail(x,n), head(x,-n))
}
RoundRobin <- function(m, n) {
m <- as.integer(m)
n <- as.integer(n)
if (m %% 2L != 0L) {
m <- m + 1L
}
myRounds <- vector("list", n)  # preallocate one slot per round
myRounds[[1]] <- 1:m
for (i in seq_len(n)[-1]) {  # runs zero times when n is 1
myRounds[[i]] <- myRounds[[i - 1L]]
myRounds[[i]][2:m] <- roll(myRounds[[i]][-1], 1)
}
lapply(myRounds, matrix, nrow = 2)
}
The roll function was obtained from this answer.
Here is sample output for 10 students and 4 weeks:
RoundRobin(10, 4)
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 10 3 5 7
[2,] 9 2 4 6 8
[[4]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 2 4 6
[2,] 8 10 3 5 7
When we hit the 10th week, we see our first repeat "round".
RoundRobin(10, 13)[c(1, 2, 9, 10, 11)]
[[1]]
[,1] [,2] [,3] [,4] [,5] ## <- first week
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[2]]
[,1] [,2] [,3] [,4] [,5] ## <- second week
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
[[3]]
[,1] [,2] [,3] [,4] [,5] ## <- ninth week
[1,] 1 4 6 8 10
[2,] 3 5 7 9 2
[[4]]
[,1] [,2] [,3] [,4] [,5] ## <- tenth week
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[5]]
[,1] [,2] [,3] [,4] [,5] ## <- eleventh week
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
Note, this is a deterministic algorithm and given the simplicity, it is pretty efficient. E.g. if you have 1000 students and want to find all 999 unique pairings, you can run this function without fear:
system.time(RoundRobin(1000, 999))
user system elapsed
0.038 0.001 0.039
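One caveat with the column-wise pairing above: in the sample output, the pairs 3-4, 5-6 and 7-8 appear in both the first and the third round, so individual pairings can repeat even though each round as a whole is new. The circle method as it is usually described pairs the first position with the last, the second with the second-to-last, and so on. Below is a minimal sketch of that variant. The function name round_robin_pairs is hypothetical, and the sketch assumes an even number of players.

```r
# Hypothetical helper: circle method with "first vs. last" pairing.
# Assumes m is even; each round is a 2 x (m / 2) matrix, one pair per column.
round_robin_pairs <- function(m, n_rounds) {
  stopifnot(m >= 2, m %% 2 == 0)
  ids <- seq_len(m)
  half <- seq_len(m / 2)
  rounds <- vector("list", n_rounds)
  for (i in seq_len(n_rounds)) {
    # pair position k with position m + 1 - k
    rounds[[i]] <- rbind(ids[half], rev(ids)[half])
    # keep the first element fixed and rotate the rest by one position
    if (m > 2) ids <- c(ids[1], ids[m], ids[2:(m - 1)])
  }
  rounds
}

rounds <- round_robin_pairs(10, 9)

# label every pair as "small-large" so repeats are easy to detect
pair_keys <- unlist(lapply(rounds, function(r) {
  apply(r, 2, function(p) paste(sort(p), collapse = "-"))
}))
```

With 10 students and 9 rounds, pair_keys should contain all choose(10, 2) = 45 possible pairs exactly once, which is the property the question asks for.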
I think you may want something like this. It produces a data frame with the unique combinations in rows. These are sampled randomly until all unique combinations are exhausted. After that, if more samples are required, it samples randomly with replacement from the unique combinations:
library(magrittr)  # provides %>% and the `.` placeholder
create_groups <- function(M, N, samples)
{
df <- seq(N) %>%
lapply(function(x) M) %>%
do.call(expand.grid, .) %>%
apply(1, sort) %>%
t() %>%
as.data.frame() %>%
unique()
df <- df[apply(df, 1, function(x) !any(duplicated(x))), ]
df <- df[sample(nrow(df)), ]
if(samples <= nrow(df)) return(df[seq(samples), ])
rbind(df, df[sample(seq(nrow(df)), samples - nrow(df), TRUE), ])
}
It's easy to see how it works if we want groups of 4 elements from 5 objects (there are only 5 possible combinations):
create_groups(letters[1:5], 4, 5)
#> V1 V2 V3 V4
#> 1 a b d e
#> 2 a b c d
#> 3 a c d e
#> 4 b c d e
#> 5 a b c e
We get the combinations of 4 objects drawn from the set in random order, with no repeats. (The elements within each row are sorted alphabetically, however.)
If we want more than 5 samples, the algorithm ensures that all unique combinations are exhausted before resampling:
create_groups(letters[1:5], 4, 6)
#> V1 V2 V3 V4
#> 1 a b c e
#> 2 a c d e
#> 3 a b d e
#> 4 b c d e
#> 5 a b c d
#> 6 a b d e
Here we see there are no repeated rows until row 6, which is a repeat of row 3.
For the example in your question, there are 45 unique combinations of 2 elements drawn from 10 objects, so we get no repeats in our 13 samples:
create_groups(1:10, 2, 13)
#> V1 V2
#> 1 7 8
#> 2 4 10
#> 3 2 8
#> 4 3 10
#> 5 3 9
#> 6 1 8
#> 7 4 9
#> 8 8 9
#> 9 7 9
#> 10 4 6
#> 11 5 7
#> 12 9 10
#> 13 4 7
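Since expand.grid enumerates all length(M)^N tuples before filtering, the same set of unique combinations could also be built directly with combn, which only generates the choose(length(M), N) combinations. Below is a minimal sketch under that assumption. The name create_groups_combn is hypothetical, and like the function above it samples groups independently rather than splitting the whole set into M/N groups per occasion.

```r
# Hypothetical variant of create_groups() built on combn()
create_groups_combn <- function(M, N, samples) {
  combos <- t(combn(M, N))          # one unique combination per row
  k <- nrow(combos)
  idx <- sample(k)                  # random order without repeats
  if (samples > k) {
    # all unique combinations are used up: resample with replacement
    idx <- c(idx, sample(k, samples - k, replace = TRUE))
  }
  as.data.frame(combos[idx[seq_len(samples)], , drop = FALSE])
}

set.seed(42)
groups <- create_groups_combn(1:10, 2, 13)
more_than_unique <- create_groups_combn(letters[1:5], 4, 6)
```

For the 10-student example, groups has 13 distinct rows. In more_than_unique, the first 5 rows are distinct and the 6th repeats one of them.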
I am not sure whether combn + sample works for your goal (taking M <- 1:10, N <- 2 and i <- 13 to match the example in the question):
as.data.frame(t(combn(M, N))[sample(K <- choose(length(M), N), i, replace = K < i), ])
which gives
V1 V2
1 4 9
2 4 8
3 1 9
4 6 10
5 5 9
6 2 10
7 3 7
8 7 8
9 6 7
10 1 7
11 6 8
12 5 6
13 3 8
With apologies to all for not writing a clear question, here is a solution based on the solution suggested in this post. (Depending on the seed, it can get stuck, and if weeks are larger, the code to recycle old groups has to be adjusted a little.)
library(magrittr)  # provides the %>% pipe used below
set.seed(1)
m<-10
n<-2
weeks<-13
groupmat<-combn(m,n)
students <- c(1:m)
pickedpairs <- matrix(
data=NA,
nrow=n,
ncol=0
)
while( ncol(pickedpairs) < ((m-1)*(m/n)) ) {
thisweekspairs <- matrix(sample(students),nrow=n,ncol=m/n)
#check if this weeks pairs
#are already in pickedpairs
#if so, skip iteration
pairsprez <- lapply(1:ncol(thisweekspairs),function(j) {
#j<-1
apply(pickedpairs,2,function(x) sum(x%in%thisweekspairs[,j])==n)
}) %>% Reduce(f="|") %>% sum
if (pairsprez == 0) {
  # no pair from this week has been used before, so keep the whole week
  pickedpairs <- cbind(pickedpairs, thisweekspairs)
}
print(ncol(pickedpairs))
}
uniquepairs <- lapply(1:(ncol(pickedpairs)/(m/n)),function(i) {
pickedpairs[,(1 + (m/n)*(i-1)):((m/n)*i)]
})
#generate weeks' number of unique pairs
c(
uniquepairs,
uniquepairs[sample(1:length(uniquepairs),weeks-length(uniquepairs))]
)
We could use slice_sample with combn
library(dplyr)
library(purrr)
combn(M, N, simplify = FALSE) %>%
invoke(rbind, .) %>%
as_tibble %>%
slice_sample(n = i)
# A tibble: 13 x 2
# V1 V2
# <int> <int>
# 1 4 5
# 2 3 8
# 3 9 10
# 4 5 7
# 5 8 9
# 6 3 9
# 7 5 10
# 8 4 10
# 9 2 5
#10 5 6
#11 6 9
#12 2 7
#13 4 9

Draw d observation from a sample n times and calculate the mean

Consider the following data:
x <- c(2, 4, 6, 8)
mean(x)
[1] 5
Now I want to draw 2 observations. This gives me 6 combinations. I want to calculate the mean for each of the 6 combinations, and then the mean of these 6 values. That is, I should get the following means:
(2+4)/2 = 3
(2+6)/2 = 4
(2+8)/2 = 5
(4+6)/2 = 5
(4+8)/2 = 6
(6+8)/2 = 7
I know the order could be different from the above, but it should each time give me an average 5 (in the case above: (3+4+5+5+6+7)/6 = 5).
Can anyone help me?
One Line answer:
mean(rowMeans(t(combn(x,2))))
explained step by step:
By 'draw 2 observations' you actually mean selecting 2 of the observations in x:
x <- c(2, 4, 6, 8)
combn(x, 2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 2 2 2 4 4 6
## [2,] 4 6 8 6 8 8
with t() you can transform it to:
t(combn(x,2))
## [,1] [,2]
## [1,] 2 4
## [2,] 2 6
## [3,] 2 8
## [4,] 4 6
## [5,] 4 8
## [6,] 6 8
The means you can calculate by rowMeans()
rowMeans(t(combn(x,2)))
## [1] 3 4 5 5 6 7
If you calculate the mean of that, you get what you want.
mean(rowMeans(t(combn(x,2))))
## [1] 5
I think the OP was heading in the right direction by using sample to draw 2 observations n times (as the title suggests), but the solutions then shifted toward the combn function, which I do not think is the right tool here.
One option is to use sample together with replicate:
x <- c(2, 4, 6, 8)
mean(replicate(6, sample(x,2)))
#[1] 4.666667  # varies between runs because the observations are drawn at random
In the attempt above, replicate provides 6 sets of 2 observations:
replicate(6, sample(x,2))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 2 4 4 4 8 8
# [2,] 6 2 2 8 6 2
Personally, I don't think combn is the right choice here. combn returns every possible combination of the chosen number of observations, so every element of x appears the same number of times across the combinations. This implies that the mean over combn(x, 2) is always the same as mean(x), so there is little point in using combn in this case.
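The counting argument can be checked numerically: each element of x appears in choose(n - 1, d - 1) of the choose(n, d) combinations, so averaging the combination means gives back mean(x). Below is a small sketch, where d is an assumed name for the number of observations drawn; the count check relies on the values of x being distinct.

```r
x <- c(2, 4, 6, 8)
d <- 2                # number of observations per draw (assumed name)
n <- length(x)

combos <- combn(x, d)             # one combination per column
combo_means <- colMeans(combos)   # mean of each combination

# how often each element of x occurs across all combinations
counts <- sapply(x, function(v) sum(combos == v))

mean_of_means <- mean(combo_means)
```

Here counts is 3 for every element, matching choose(3, 1), and mean_of_means equals mean(x).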

How to use data from every row to create a matrix by loop

I have a data frame like
df<-data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
a b c d
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
I want to turn every row into a 2*2 matrix, giving 3 (nrow(df)) matrices: the first uses 1,4,7,10, the second uses 2,5,8,11, and the third uses 3,6,9,12. Thank you.
We can use split to split up the dataset into list and use matrix
lapply(split.default(as.matrix(df), row(df)), matrix, 2)
If we need the matrix columns to be 1, 7 followed by 4, 10, use the byrow=TRUE
lapply(split.default(as.matrix(df), row(df)), matrix, 2, byrow=TRUE)
Or use apply with MARGIN = 1 and wrap it with list to get a list output
do.call("c", apply(df, 1, function(x) list(matrix(x, ncol=2))))
If we need a for loop, preassign a as a list with length equal to the number of rows of 'df'
a <- vector("list", nrow(df))
for(i in 1:nrow(df)){ a[[i]] <- matrix(unlist(df[i,]), ncol=2)}
a
Or if it can be stored as array
array(t(df), c(2, 2, 3))
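With the array form, the i-th matrix is the i-th slice along the third dimension. Below is a short sketch of how that lines up with the list-based results; arr and as_list are assumed variable names.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = c(10, 11, 12))

# t(df) has one column per original row, so filling a 2 x 2 x 3 array
# column by column puts each row of df into its own slice
arr <- array(t(df), c(2, 2, nrow(df)))

first <- arr[, , 1]   # 2 x 2 matrix built from row 1 of df (1, 4, 7, 10)

# convert back to a list of matrices if a list is needed
as_list <- lapply(seq_len(nrow(df)), function(i) arr[, , i])
```

Each slice is filled column-wise, the same layout as matrix(x, ncol = 2) in the other approaches.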
Or using Map:
m <- matrix(c(t(df)), ncol = 2, byrow = TRUE)
p <- 2 # number of rows
Map(function(i,j) m[i:j,], seq(1,nrow(m),p), seq(p,nrow(m),p))
# [[1]]
# [,1] [,2]
# [1,] 1 4
# [2,] 7 10
# [[2]]
# [,1] [,2]
# [1,] 2 5
# [2,] 8 11
# [[3]]
# [,1] [,2]
# [1,] 3 6
# [2,] 9 12

How to perform complex looping operation on matrix

I have a sample matrix like
5 4 3
2 6 8
1 9 7
and I want output like
max(5*6,5*8,5*9,5*7) // i!=j condition
max(4*2,4*8,4*1,4*7)
max(3*2,3*6,3*1,3*9)
And so on...
The maximum values obtained from these computations should be returned in matrix form. I need this to work in general, so I need generic code.
This gets the job done but is a pretty unimaginative solution, in that it just loops through the rows and columns performing the requested calculation instead of doing anything vectorized.
sapply(1:ncol(m), function(j) sapply(1:nrow(m), function(i) max(m[i,j]*m[-i,-j])))
# [,1] [,2] [,3]
# [1,] 45 32 27
# [2,] 18 42 72
# [3,] 8 72 42
Data:
(m <- matrix(c(5, 2, 1, 4, 6, 9, 3, 8, 7), nrow=3))
# [,1] [,2] [,3]
# [1,] 5 4 3
# [2,] 2 6 8
# [3,] 1 9 7
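For reuse, the same computation can be wrapped in a function. This is only a sketch: the name max_cross_product is hypothetical, and it assumes a numeric matrix with at least two rows and two columns so that m[-i, -j] is never empty.

```r
# Hypothetical wrapper around the double loop from the answer above
max_cross_product <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m), nrow(m) >= 2, ncol(m) >= 2)
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      # multiply m[i, j] by every entry outside row i and column j
      out[i, j] <- max(m[i, j] * m[-i, -j])
    }
  }
  out
}

m <- matrix(c(5, 2, 1, 4, 6, 9, 3, 8, 7), nrow = 3)
res <- max_cross_product(m)
```

Taking the max of the products, rather than multiplying by the max of the remaining entries, keeps the result correct when the matrix contains negative values.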

transforming dataset (similarity ratings)

I want to transform the following data format (simplified representation):
image1 image2 rating
1 1 2 6
2 1 3 5
3 1 4 7
4 2 3 3
5 2 4 5
6 3 4 1
Reproduced by:
structure(list(image1 = c(1, 1, 1, 2, 2, 3), image2 = c(2, 3,
4, 3, 4, 4), rating = c(6, 5, 7, 3, 5, 1)), .Names = c("image1",
"image2", "rating"), row.names = c(NA, -6L), class = "data.frame")
To a format resembling a correlation matrix, where the first two columns serve as row and column indices and the ratings are the values:
1 2 3 4
1 NA 6 5 7
2 6 NA 3 5
3 5 3 NA 1
4 7 5 1 NA
Does anyone know of a function in R to do this?
I would rather use matrix indexing:
N <- max(dat[c("image1", "image2")])
out <- matrix(NA, N, N)
out[cbind(dat$image1, dat$image2)] <- dat$rating
out[cbind(dat$image2, dat$image1)] <- dat$rating
# [,1] [,2] [,3] [,4]
# [1,] NA 6 5 7
# [2,] 6 NA 3 5
# [3,] 5 3 NA 1
# [4,] 7 5 1 NA
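The key step is indexing with a two-column matrix: each row of cbind(i, j) names one cell, so all ratings are assigned in a single vectorized call, and the second assignment mirrors them across the diagonal. Below is a self-contained sketch using the data from the question. The row shuffling and the idx variable are my additions, included to show that row order does not matter.

```r
dat <- data.frame(
  image1 = c(1, 1, 1, 2, 2, 3),
  image2 = c(2, 3, 4, 3, 4, 4),
  rating = c(6, 5, 7, 3, 5, 1)
)
dat <- dat[c(4, 1, 6, 2, 5, 3), ]      # arbitrary row order (my addition)

N <- max(dat[c("image1", "image2")])
out <- matrix(NA_real_, N, N)
idx <- cbind(dat$image1, dat$image2)   # one (row, column) pair per rating
out[idx] <- dat$rating                 # fill cells (image1, image2)
out[idx[, 2:1]] <- dat$rating          # swap the columns to fill the mirror cells
```

Cells that never occur in dat, including the diagonal, simply stay NA, so missing combinations need no special handling.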
I don't like the <<- operator very much, but it works for this (naming your structure s):
N <- max(s[,1:2])
m <- matrix(NA, nrow=N, ncol=N)
apply(s, 1, function(x) { m[x[1], x[2]] <<- m[x[2], x[1]] <<- x[3]})
> m
[,1] [,2] [,3] [,4]
[1,] NA 6 5 7
[2,] 6 NA 3 5
[3,] 5 3 NA 1
[4,] 7 5 1 NA
Not as elegant as Karsten's solution, but it does not rely on the order of the rows, nor does it require that all combinations be present.
Here is one approach, where dat is the data frame as defined in the question
res <- matrix(0, nrow=4, ncol=4) # dim may need to be adjusted
ll <- lower.tri(res, diag=FALSE)
res[which(ll)] <- dat$rating
res <- res + t(res)
diag(res) <- NA
This works only if the rows are ordered as in the question.

Resources