Extract elements from a vector within lists - r

I have a list of length 30000 and each list element contains one vector of length 6.
Example (with a length of just 2):
trainLists <- list(c(1,2,3,4,5,6),c(7,8,9,10,11,12))
I want to "flatten" these lists into a dataframe and create 6 factors (one corresponding to each of the elements in the vectors in the list).
Thus, the result would be:
I can accomplish this with a loop such as
for (i in 1:length(trainLists)) {
factor1 [i] <- trainLists[[i]][1]
factor2 [i] <- trainLists[[i]][2]
factor3 [i] <- trainLists[[i]][3]
factor4 [i] <- trainLists[[i]][4]
factor5 [i] <- trainLists[[i]][5]
factor6 [i] <- trainLists[[i]][6]
}
but it is horribly slow. How best to accomplish this?

As noted in the comments, most of what you want to do is achieved with a simple do.call(rbind, ...), like this:
> trainLists <- list(c(1,2,3,4,5,6),c(7,8,9,10,11,12))
> trainLists
[[1]]
[1] 1 2 3 4 5 6
[[2]]
[1] 7 8 9 10 11 12
> do.call(rbind, trainLists)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
Taking things a few steps forward, you can do something like this:
cbind(example = seq_along(trainLists),
setNames(data.frame(do.call(rbind, trainLists)),
paste0("Factor_", sequence(
max(sapply(trainLists, length))))))
# example Factor_1 Factor_2 Factor_3 Factor_4 Factor_5 Factor_6
# 1 1 1 2 3 4 5 6
# 2 2 7 8 9 10 11 12
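As a hedged variation on the same idea, the sketch below uses vapply, which fails with an error if any list element does not have exactly six values. The names m and df and the Factor_ column prefix are only illustrative.

```r
trainLists <- list(c(1, 2, 3, 4, 5, 6), c(7, 8, 9, 10, 11, 12))

# vapply returns a 6 x length(trainLists) matrix and errors if an element
# is not a numeric vector of length 6; t() puts one list element per row
m <- t(vapply(trainLists, identity, numeric(6)))

df <- as.data.frame(m)
names(df) <- paste0("Factor_", seq_len(ncol(df)))
df
```

The length check is the main reason to prefer this over do.call(rbind, ...), which silently recycles shorter vectors (with a warning) instead of stopping.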

Related

Extract specific elements in a matrix with a character vector in R

I want to extract specific elements column wise from the matrix A with the information from a character vector B (contain elements in the row names of the matrix) such as:
A <- matrix(seq(1,12),ncol=4)
rownames(A) <- letters[1:3]
A
[,1] [,2] [,3] [,4]
a 1 4 7 10
b 2 5 8 11
c 3 6 9 12
B <- c("a","c","c","b")
I want to get 1,6,9,11. Thanks :)
Two possible ways:
> A[cbind(match(B, rownames(A)), seq_len(ncol(A)))]
[1] 1 6 9 11
>
> diag(A[B, seq_along(B)]) # or diag(A[B, seq_len(ncol(A))])
[1] 1 6 9 11
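To make the first approach easier to follow, here is a small sketch that builds the index matrix as a separate step. The names idx and picked are illustrative only. Indexing a matrix with a two-column matrix picks one element per row of the index, reading each row as a (row, column) pair.

```r
A <- matrix(seq(1, 12), ncol = 4)
rownames(A) <- letters[1:3]
B <- c("a", "c", "c", "b")

# one (row, column) pair per element of B
idx <- cbind(row = match(B, rownames(A)), col = seq_along(B))
idx

# picks A[1, 1], A[3, 2], A[3, 3], A[2, 4]
picked <- A[idx]
picked
```

The diag() variant first builds a length(B) by length(B) matrix and then keeps only its diagonal, so the index-matrix form is likely the better choice when B is long.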

Sampling a number of individuals in subgroups with no repeating group constellation in R

I have a number of individuals that I want to randomly divide into subgroups of size groupsize. I want to repeat this process n_group times, with no repeating group constellation.
How can I achieve this in R?
I tried the following so far:
set.seed(1)
individuals <- 1:6
groupsize <- 3
n_groups <- 4
for(i in 1:n_groups) { print(sample(individuals, groupsize))}
[1] 1 4 3
[1] 1 2 6
[1] 3 2 6
[1] 3 1 5
..but am not sure whether that really does not lead to repeating constellations..?
Edit: After looking at the first suggestions and answers I realized, that another restriction could be interesting to me (sorry for not seeing it upfront..).
Is there (in the concrete example above) a way to ensure, that every individual was in contact with every other individual?
Based on your edited question, I assume that you want to make sure that all individuals are in at least one subgroup?
Then this might be the solution:
individuals <- 1:6
groupsize <- 3
n_groups <- 4
#sample groups
library(RcppAlgos)
#initialise
answer <- matrix()
# If the length of all unique elements in the answer is smaller than
# the number of individuals, take a new sample
while (length(unique(as.vector(answer))) < length(individuals)) {
answer <- comboSample(individuals, groupsize, n = n_groups)
# Line below is for demonstration only
#answer <- comboSample(individuals, groupsize, n = n_groups, seed = 123)
}
# sample answer with seed = 123 (see commented line above)
# [,1] [,2] [,3]
# [1,] 1 3 4
# [2,] 1 3 6
# [3,] 2 3 5
# [4,] 2 3 4
Test for groups that do not contain every individual:
# Test with the following matrix
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 3 4
# [3,] 1 4 5
# [4,] 2 3 4
# Note that individual '6' is not present
answer <- matrix(c(1,2,3,1,3,4,1,4,5,2,3,4), nrow = 4, ncol = 3)
while (length(unique(as.vector(answer))) < length(individuals)) {
answer <- comboSample(individuals, groupsize, n = n_groups)
}
# is recalculated to (in this case) the following answer
# [,1] [,2] [,3]
# [1,] 4 5 6
# [2,] 3 4 5
# [3,] 1 3 6
# [4,] 2 4 5
PASSED ;-)
You can use while to dynamically update your combination set, which avoids duplicates, e.g.,
res <- c()
while (length(res) < pmin(n_groups, choose(length(individuals), groupsize))) {
v <- list(sort(sample(individuals, groupsize)))
if (!v %in% res) res <- c(res, v)
}
which gives
> res
[[1]]
[1] 2 5 6
[[2]]
[1] 2 3 6
[[3]]
[1] 1 5 6
[[4]]
[1] 1 2 6
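For the follow-up about every individual being in contact with every other one, a rough check might look like the sketch below. The helper names group_keys and covered_pairs are hypothetical, and the groups are copied from the sample result above. Note that 4 groups of 3 can contain at most 4 * choose(3, 2) = 12 distinct pairs, while 6 individuals form choose(6, 2) = 15 pairs, so full pairwise contact cannot be reached with these settings.

```r
# groups copied from the sample output above
groups <- list(c(2, 5, 6), c(2, 3, 6), c(1, 5, 6), c(1, 2, 6))

# one text key per group, independent of the order inside the group
group_keys <- function(groups) {
  vapply(groups, function(g) paste(sort(g), collapse = "-"), character(1))
}

# all distinct pairs that share at least one group
covered_pairs <- function(groups) {
  pairs <- lapply(groups, function(g) {
    combn(sort(g), 2, FUN = paste, collapse = "-")
  })
  unique(unlist(pairs))
}

anyDuplicated(group_keys(groups))   # 0 means no repeated constellation
length(covered_pairs(groups))       # number of distinct pairs that met
choose(6, 2)                        # number of pairs that exist in total
```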

How to repeatedly generate non-repeating smaller groups from a larger set

I have a large set of size M (let's say 10), and I want to, repeatedly for a certain number of occasions (let's say 13), randomly split it into M/N smaller groups of size N (let's say 2). I'd like no element in the large set to be in a repeating group until they have been in a small group with every one else. (The actual problem here: I have a class of 10 people and I want to split them into 5 pairs for a duration of 13 weeks, but I don't want anyone to be in a repeat pairing until they have been in a pairing with everyone in the class.)
How can I do this? I started by generating non-repeating permutations from my larger group, but the trouble I am having is that these unique permutations don't necessarily yield unique groups. (Someone seems to have posed this same question, but it was resolved in Python. I don't understand Python, and so I'm looking for an easy R solution.)
Any help much appreciated.
Edit: Thanks to all for suggestions. I realize my original question wasn't exactly clear. The solutions suggested below work well when I only want to split the set into a single subset of size N, each time. But my problem is actually that I want to split the set into M/N subsets of size N. For example, in the case of my class, I want to split the 10 students into 5 pairs of 2 on 13 different occasions, and I want pairs to be unique until they no longer can be (i.e., after 9 occasions have passed). Unless I'm failing to see how they can be applied, I don't think any of these solutions quite solves this problem.
I see that the OP has provided a solution from the linked math.so solution, but I would like to provide a working solution of the other answer on that page that gets to the heart of this problem. That solution mentions Round-robin tournament. From the wikipedia page, the algorithm is straightforward.
One simply fixes a position in a matrix and rotates the other indices clockwise. Given M initial players, there are M - 1 unique rounds. Thus, for our given situation, we can only obtain 9 unique sets of groups.
Below is a very straightforward base R implementation:
roll <- function( x , n ){
if( n == 0 )
return(x)
c(tail(x,n), head(x,-n))
}
RoundRobin <- function(m, n) {
m <- as.integer(m)
n <- as.integer(n)
if (m %% 2L != 0L) {
m <- m + 1L
}
myRounds <- vector("list", n)   # pre-allocate one slot per round
myRounds[[1]] <- 1:m
for (i in seq_len(n)[-1]) {     # empty loop when n == 1
myRounds[[i]] <- myRounds[[i - 1L]]
myRounds[[i]][2:m] <- roll(myRounds[[i]][-1], 1)
}
lapply(myRounds, matrix, nrow = 2)
}
The roll function was obtained from this answer.
Here is sample output for 10 students and 4 weeks:
RoundRobin(10, 4)
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 10 3 5 7
[2,] 9 2 4 6 8
[[4]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 2 4 6
[2,] 8 10 3 5 7
When we hit the 10th week, we see our first repeat "round".
RoundRobin(10, 13)[c(1, 2, 9, 10, 11)]
[[1]]
[,1] [,2] [,3] [,4] [,5] ## <- first week
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[2]]
[,1] [,2] [,3] [,4] [,5] ## <- second week
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
[[3]]
[,1] [,2] [,3] [,4] [,5] ## <- ninth week
[1,] 1 4 6 8 10
[2,] 3 5 7 9 2
[[4]]
[,1] [,2] [,3] [,4] [,5] ## <- tenth week
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[5]]
[,1] [,2] [,3] [,4] [,5] ## <- eleventh week
[1,] 1 2 4 6 8
[2,] 10 3 5 7 9
Note, this is a deterministic algorithm and given the simplicity, it is pretty efficient. E.g. if you have 1000 students and want to find all 999 unique pairings, you can run this function without fear:
system.time(RoundRobin(1000, 999))
user system elapsed
0.038 0.001 0.039
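The rounds above are all different from one another, but it is worth checking individual pairs as well: in the sample output, students 3 and 4 share a column in both the first and the third round. The sketch below is one possible variant that follows the circle method as described on Wikipedia more literally, pairing position i with position m + 1 - i instead of pairing neighbours. The names circle_rounds and pair_keys are hypothetical, and the function assumes an even number of players.

```r
# Circle method: keep player 1 fixed, rotate the others by one position
# per round, and pair position i with position m + 1 - i.
circle_rounds <- function(m) {
  stopifnot(m >= 2, m %% 2 == 0)
  half <- m / 2
  players <- seq_len(m)
  rounds <- vector("list", m - 1)
  for (r in seq_len(m - 1)) {
    # top row: first half; bottom row: second half in reverse order
    rounds[[r]] <- rbind(players[seq_len(half)], rev(players)[seq_len(half)])
    # move the last player into position 2, leaving player 1 in place
    players <- c(players[1], players[m], players[-c(1, m)])
  }
  rounds
}

# one text key per pair (column), independent of order within the pair
pair_keys <- function(rounds) {
  unlist(lapply(rounds, function(r) {
    apply(r, 2, function(p) paste(sort(p), collapse = "-"))
  }))
}

keys <- pair_keys(circle_rounds(10))
length(keys)          # 9 rounds of 5 pairs each
anyDuplicated(keys)   # 0 means no pair meets twice
```

The same pair_keys helper can be applied to the output of any other scheduling function that returns a list of 2-row matrices.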
I think you may want something like this. It will produce a data frame with the unique combinations in rows. These are sampled randomly until all unique combinations are exhausted. Thereafter, if more samples are required, it will sample randomly with replacement from the unique combinations:
library(magrittr)  # provides the %>% pipe used below
create_groups <- function(M, N, samples)
{
df <- seq(N) %>%
lapply(function(x) M) %>%
do.call(expand.grid, .) %>%
apply(1, sort) %>%
t() %>%
as.data.frame() %>%
unique()
df <- df[apply(df, 1, function(x) !any(duplicated(x))), ]
df <- df[sample(nrow(df)), ]
if(samples <= nrow(df)) return(df[seq(samples), ])
rbind(df, df[sample(seq(nrow(df)), samples - nrow(df), TRUE), ])
}
It's easy to see how it works if we want groups of 4 elements from 5 objects (there are only 5 possible combinations):
create_groups(letters[1:5], 4, 5)
#> V1 V2 V3 V4
#> 1 a b d e
#> 2 a b c d
#> 3 a c d e
#> 4 b c d e
#> 5 a b c e
We have a randomly-ordered sample of 4 objects drawn from the set, but no repeats. (the elements within each sample are ordered alphabetically however)
If we want more than 5 samples, the algorithm ensures that all unique combinations are exhausted before resampling:
create_groups(letters[1:5], 4, 6)
#> V1 V2 V3 V4
#> 1 a b c e
#> 2 a c d e
#> 3 a b d e
#> 4 b c d e
#> 5 a b c d
#> 6 a b d e
Here we see there are no repeated rows until row 6, which is a repeat of row 3.
For the example in your question, there are 45 unique combinations of 2 elements drawn from 10 objects, so we get no repeats in our 13 samples:
create_groups(1:10, 2, 13)
#> V1 V2
#> 1 7 8
#> 2 4 10
#> 3 2 8
#> 4 3 10
#> 5 3 9
#> 6 1 8
#> 7 4 9
#> 8 8 9
#> 9 7 9
#> 10 4 6
#> 11 5 7
#> 12 9 10
#> 13 4 7
I am not sure combn + sample can work for your goal
as.data.frame(t(combn(M, N))[sample(K <- choose(length(M), N), i, replace = K < i), ])
which gives
V1 V2
1 4 9
2 4 8
3 1 9
4 6 10
5 5 9
6 2 10
7 3 7
8 7 8
9 6 7
10 1 7
11 6 8
12 5 6
13 3 8
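The one-liner above relies on M, N and i already being defined. A self-contained sketch, assuming M is the set 1:10, N the group size 2 and i the 13 requested samples (values inferred from the question, not stated in the answer), could look like this. The seed and the names pairs, K and picked are illustrative.

```r
M <- 1:10   # assumed: the full set of students
N <- 2      # assumed: group size
i <- 13     # assumed: number of samples wanted

set.seed(42)
pairs <- t(combn(M, N))   # one row per possible pair
K <- nrow(pairs)          # choose(10, 2) = 45 rows

# sample rows without replacement unless more samples than pairs are needed
picked <- as.data.frame(pairs[sample(K, i, replace = K < i), ])
picked
```

This draws 13 distinct pairs, but it does not split the whole class into 5 pairs per week, which is the requirement added in the question's edit.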
With apologies to all for not writing a clear question, here is a solution based on the solution suggested in this post. (Depending on the seed, it can get stuck, and if weeks are larger, the code to recycle old groups has to be adjusted a little.)
library(magrittr)  # provides the %>% pipe used below
set.seed(1)
m<-10
n<-2
weeks<-13
groupmat<-combn(m,n)
students <- c(1:m)
pickedpairs <- matrix(
data=NA,
nrow=n,
ncol=0
)
while( ncol(pickedpairs) < ((m-1)*(m/n)) ) {
thisweekspairs <- matrix(sample(students),nrow=n,ncol=m/n)
#check if this weeks pairs
#are already in pickedpairs
#if so, skip iteration
pairsprez <- lapply(1:ncol(thisweekspairs),function(j) {
#j<-1
apply(pickedpairs,2,function(x) sum(x%in%thisweekspairs[,j])==n)
}) %>% Reduce(f="|") %>% sum
if(pairsprez>=1) {
# at least one pair was already used: discard this week's draw
pickedpairs<-pickedpairs
} else {
pickedpairs<-cbind(pickedpairs,thisweekspairs)
}
print(ncol(pickedpairs))
}
uniquepairs <- lapply(1:(ncol(pickedpairs)/(m/n)),function(i) {
pickedpairs[,(1 + (m/n)*(i-1)):((m/n)*i)]
})
#generate weeks' number of unique pairs
#(c() concatenates the two lists; no extra package needed)
c(
uniquepairs,
uniquepairs[sample(1:length(uniquepairs),weeks-length(uniquepairs))]
)
We could use slice_sample with combn
library(dplyr)
library(purrr)
combn(M, N, simplify = FALSE) %>%
invoke(rbind, .) %>%
as_tibble %>%
slice_sample(n = i)
# A tibble: 13 x 2
# V1 V2
# <int> <int>
# 1 4 5
# 2 3 8
# 3 9 10
# 4 5 7
# 5 8 9
# 6 3 9
# 7 5 10
# 8 4 10
# 9 2 5
#10 5 6
#11 6 9
#12 2 7
#13 4 9

R - How to rbind two lists while alternating their list elements

I'd like to know how to rbind two lists containing vectors into a data frame. e.g.
a<-list(c(1,2,3,4,5), c(2,3,4,5,6))
b<-list(c(3,4,5,6,7), c(4,5,6,7,8))
How to make a data frame from the two lists as the following:
1 2 3 4 5
3 4 5 6 7
2 3 4 5 6
4 5 6 7 8
So I need to take the first element of each list and then rbind them. Then take the second element of each list and then rbind to the previous data frame. I know I could use a for loop but is there a better and faster way to do this?
A variation on @DiscoSuperfly's answer that will work with objects of uneven length, like:
a <- list(c(1,2,3,4,5), c(2,3,4,5,6), c(1,1,1,1,1))
b <- list(c(3,4,5,6,7), c(4,5,6,7,8))
An answer:
L <- list(a,b)
L <- lapply(L, `length<-`, max(lengths(L)))
do.call(rbind, do.call(Map, c(rbind, L)))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 2 3 4 5
#[2,] 3 4 5 6 7
#[3,] 2 3 4 5 6
#[4,] 4 5 6 7 8
#[5,] 1 1 1 1 1
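To show why the padding step works, here is the same idea unrolled for just two lists. This is a sketch for illustration; the names n, blocks and res are not from the answer. Extending a list with length<- fills the new slots with NULL, and rbind() ignores NULL arguments.

```r
a <- list(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6), c(1, 1, 1, 1, 1))
b <- list(c(3, 4, 5, 6, 7), c(4, 5, 6, 7, 8))

n <- max(length(a), length(b))
length(b) <- n              # b[[3]] is now NULL

# rbind(a[[3]], NULL) keeps only a[[3]], so the last block has one row
blocks <- Map(rbind, a, b)
res <- do.call(rbind, blocks)
res
```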
A solution using the purrr package.
library(purrr)
map2_dfr(a, b, ~data.frame(rbind(.x, .y)))
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 3 4 5 6 7
3 2 3 4 5 6
4 4 5 6 7 8
Reduce(rbind,Map(rbind,a,b))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 2 3 4 5 6
[4,] 4 5 6 7 8
Of the answers given, this seems the fastest when using two lists, thanks in large part to @thelatemail's suggested edit (thanks!).
Try this:
rbab<-do.call(rbind,c(a,b)); rbind(rbab[c(TRUE,FALSE),],rbab[c(FALSE,TRUE),])
Output:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 2 3 4 5 6
[4,] 4 5 6 7 8
Using c(TRUE,FALSE) above rbinds every other line a and b; then we flip that to c(FALSE,TRUE) to get the rest. Finally, we rbind it all together.
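Note that the c(TRUE,FALSE) recycling picks up rows in alternating order only when a and b hold exactly two vectors each. For longer lists of equal length, one possible generalisation is to build the interleaving row index explicitly, as in this sketch (the names stacked, idx and interleaved are illustrative):

```r
a <- list(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6), c(0, 0, 0, 0, 0))
b <- list(c(3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(9, 9, 9, 9, 9))

# rows 1..3 come from a, rows 4..6 come from b
stacked <- do.call(rbind, c(a, b))

# 1, 4, 2, 5, 3, 6: row i of a followed by row i of b
idx <- as.vector(rbind(seq_along(a), length(a) + seq_along(b)))

interleaved <- stacked[idx, ]
interleaved
```

This keeps the single do.call(rbind, ...) call and replaces the two logical subsets with one integer subset.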
EDIT: Speed Test
Here's a larger scale speed test, for an objective comparison, which used two lists of 6000 elements each instead of the original a and b provided. A total of 100 iterations were used to estimate these statistics.
#Sample used:
a<-list(c(1,2,3,4,5),c(2,3,4,5,6))
b<-list(c(3,4,5,6,7),c(4,5,6,7,8))
a<-a[rep(1:2,3e3)]
b<-b[rep(1:2,3e3)]
#Here is the collaboration version (with @thelatemail):
func1 <- function(){
rbab<-do.call(rbind,c(a,b)); rbind(rbab[c(TRUE,FALSE),],rbab[c(FALSE,TRUE),])
}
#Here is my original version:
func2 <- function(){
rbind(do.call(rbind,c(a,b))[c(TRUE,FALSE),],do.call(rbind,c(a,b))[c(FALSE,TRUE),])
}
#Here's a base-R translation of @ycw's answer (*translated by thelatemail)
func3 <- function(){
do.call(rbind, Map(rbind, a, b))
}
#Here is @Onyambu's answer (also a great answer for its brevity!):
func4 <- function(){
Reduce(rbind,Map(rbind,a,b))
}
microbenchmark::microbenchmark(
func1(),func2(),func3(),func4()
)
Unit: microseconds
expr min lq mean median uq max neval
func1() 4.39 6.46 14.74 15.85 20.24 31.94 100
func2() 5789.26 6578.83 7114.21 7027.57 7531.52 9411.05 100
func3() 10279.50 10970.70 11611.90 11245.47 11866.70 16315.00 100
func4() 251098.18 265936.30 273667.45 275778.04 281740.77 291279.20 100
I created a new list with both a and b, and then make it a matrix. I am sure there is a more elegant way to do this.
a <- list(c(1,2,3,4,5), c(2,3,4,5,6), c(1,1,1,1,1))
b <- list(c(3,4,5,6,7), c(4,5,6,7,8))
# empty list
ab <- vector("list", length = length(a) + length(b))
# put a and b in correct locations
ab[seq(1, length(ab), 2)] <- a
ab[seq(2, length(ab), 2)] <- b
# make the matrix
res <- t(matrix(unlist(ab), nrow=5, ncol=length(a) + length(b)))
> ab <-rbind(unlist(a), unlist(b))
> ab <- rbind(ab[,1:5], ab[,6:10])
> ab
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 2 3 4 5 6
[4,] 4 5 6 7 8
I would do:
d <- t(as.data.frame(c(a,b)))
rbind( d[ seq(1,nrow(d),by=2) ,] , d[ seq(2,nrow(d),by=2) ,])

How to delete an element at a time from a vector while retaining the others?

I have a vector x containing 5 elements.
x <- c(1,2,3,4,5)
I would want to delete one element at each iteration and retain other elements in the vector.(as shown below)
x <- c(2,3,4,5) #vector after first iteration
x <- c(1,3,4,5) #vector after second iteration
x <- c(1,2,4,5) #vector after third iteration
x <- c(1,2,3,5) #vector after fourth iteration
and also, is it possible to store these new vectors in a list?
is there a way to extend this to multiple vectors?
You could use combn:
combn(5,4)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 2
[2,] 2 2 2 3 3
[3,] 3 3 4 4 4
[4,] 4 5 5 5 5
To get the data as a list:
as.list(data.frame(combn(5,4)))
To use this on multiple vectors or a matrix, first transform it into a data.frame, to make it easier for lapply to go over the length (columns) of the data.frame. Then you can use lapply with combn like so:
mat <- data.frame(matrix(1:10,5))
lapply(mat, function(x) combn(x,length(x)-1))
$X1
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 2
[2,] 2 2 2 3 3
[3,] 3 3 4 4 4
[4,] 4 5 5 5 5
$X2
[,1] [,2] [,3] [,4] [,5]
[1,] 6 6 6 6 7
[2,] 7 7 7 8 8
[3,] 8 8 9 9 9
[4,] 9 10 10 10 10
We can do
lapply(seq_along(x), function(i) x[-i])
drop_n <- function(n, x) x[-n]
lapply(1:5, drop_n, x)
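To extend this to several vectors, the same idiom can be wrapped in a function and applied to a list of vectors. This is a sketch; the names drop_each, vectors and result are made up for illustration.

```r
# returns a list with one shortened copy of x per dropped position
drop_each <- function(x) lapply(seq_along(x), function(i) x[-i])

vectors <- list(x = c(1, 2, 3, 4, 5), y = c(6, 7, 8, 9, 10))

# result$x[[i]] is x without its i-th element, and likewise for y
result <- lapply(vectors, drop_each)
result$x[[1]]
result$y[[5]]
```

The result is a nested list, so it also works when the input vectors have different lengths.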
Here you have a way to get what you want. You only need to change the parameter n to make it more general
# Generate a list
L <- list()
# Define the number of elements
n <- 5
# Define the values
values <- 1:n
# Complete the list
for (i in 1:n){
L[[i]] <- values[-i]
}
