sample list randomly and remove used values - r

I have a problem (maybe it is not that difficult, but I cannot figure it out):
I have a list (l) of 25 elements and I want to divide it randomly into 5 groups. The problem is that if I call sample(l, 5) five times, the samples are not unique: the same element can end up in more than one group. So basically, what I am looking for is to choose 5, remove them from the list, and then sample again.
I hope someone has a solution... thanks
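The draw-then-remove approach described in the question can be sketched directly. This is a minimal illustration that assumes l holds 25 values and the group size is 5; the names `groups`, `remaining` and `idx` are only illustrative. It removes drawn elements by position rather than by value, so it stays correct even if l contains duplicates.

```r
set.seed(1)          # only so the draw is reproducible
l <- 1:25            # stand-in for the question's 25 values
group_size <- 5

remaining <- l
groups <- list()
while (length(remaining) > 0) {
  # draw up to group_size positions from what is left
  idx <- sample.int(length(remaining), min(group_size, length(remaining)))
  groups[[length(groups) + 1]] <- remaining[idx]
  # drop the drawn positions so they cannot be picked again
  remaining <- remaining[-idx]
}
```

The answers below reach the same result in one step by shuffling once and cutting the shuffled order into blocks, which avoids the loop.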

If you want Andrew's method as a function
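sample2 <- function(x, sample.size){
  # label consecutive blocks of sample.size positions (1,1,...,2,2,...), shuffle the labels, split by them
  split(x, sample(ceiling(seq_along(x)/sample.size)))
}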
sample2(1:20, 5)
gives
$`1`
[1] 1 15 6 3 18
$`2`
[1] 11 7 5 10 14
$`3`
[1] 2 12 4 13 17
$`4`
[1] 19 16 20 8 9
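A quick check of sample2() on the question's case (25 values, groups of 5), plus what happens when the length is not a multiple of the group size: ceiling(seq_along(x)/sample.size) then produces a last label with fewer copies, so the last group is simply smaller.

```r
sample2 <- function(x, sample.size) {
  split(x, sample(ceiling(seq_along(x) / sample.size)))
}

set.seed(2)
g25 <- sample2(1:25, 5)   # the question's case
g23 <- sample2(1:23, 5)   # 23 is not a multiple of 5

lengths(g25)  # every group has 5 elements
lengths(g23)  # group sizes 5, 5, 5, 5, 3
```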

Another method...
x <- 1:20
matrix(x[sample(seq_along(x),length(x))],ncol = 4)
Here we are randomly reordering your vector by sampling its index values, then putting the result into a matrix so that each column is one group (with 20 values and ncol = 4 that gives four groups of five; for your 25 values use ncol = 5). You could also leave it as a vector, or make a list if you don't want your output as a matrix.
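For the question's case (25 values, five groups of five), the same idea with ncol = 5 is sketched below. split(m, col(m)) is one way to turn the matrix columns into a list, assuming a list is the output you want.

```r
set.seed(3)
x <- 1:25
# shuffle the values by permuting their positions
m <- matrix(x[sample(seq_along(x))], ncol = 5)
# each column of m is one random group of five
groups <- split(m, col(m))
```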

You could do something like this...
l <- as.list(LETTERS[1:25])
l2 <- split(l,rep(1:5,5)[sample(25)])
l2 #is then a list of five lists containing all elements of l...
$`1`
$`1`[[1]]
[1] "D"
$`1`[[2]]
[1] "I"
$`1`[[3]]
[1] "M"
$`1`[[4]]
[1] "W"
$`1`[[5]]
[1] "Y"
$`2`
$`2`[[1]]
[1] "C"
$`2`[[2]]
[1] "E"
$`2`[[3]]
[1] "H"
$`2`[[4]]
[1] "T"
$`2`[[5]]
[1] "X"
etc...
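If l does not need to be a list, splitting the plain character vector gives much more compact output. Note that rep(1:5, 5)[sample(25)] is just a random permutation of the group labels, so sample(rep(1:5, 5)) does the same thing; `v` and `labels` are illustrative names.

```r
set.seed(4)
v <- LETTERS[1:25]
# each label 1..5 appears five times, in random order
labels <- sample(rep(1:5, 5))
l2 <- split(v, labels)   # list of five character vectors of length 5
```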

Related

How can I get a list of indices of values related to each other by lineage? [R]

Suppose I have a data frame with two columns, where each row indicates a direct relationship between the two values in that row.
c2 <- c(2,5,7,8,10)
c1 <- c(1,3,2,7,5)
df <- data.frame(c1, c2)
Such that:
1 is related to 2 [row 1],
2 is related to 7 [row 3],
7 is related to 8 [row 4],
so I get a vector of the row indexes 1, 3 and 4;
and then 3 is related to 5 [row 2],
and 5 is related to 10 [row 5],
so I get a vector of the indexes 2 and 5. How can I get these index vectors?
It hurts my brain.
This could be solved effectively using the igraph library (components() is the current name of the older clusters() function):
library(igraph)
common_ids <- components(graph_from_data_frame(df, directed = FALSE))$membership
split(seq_len(nrow(df)), common_ids[match(df$c1, names(common_ids))])
$`1`
[1] 1 3 4
$`2`
[1] 2 5
If the members of each group are also of interest:
split(names(common_ids), common_ids)
$`1`
[1] "1" "2" "7" "8"
$`2`
[1] "3" "5" "10"
An option with igraph:
library(igraph)
lapply(
  groups(components(graph_from_data_frame(df, directed = FALSE))),
  function(x) Filter(Negate(is.na), match(x, as.character(df$c1)))
)
gives
$`1`
[1] 1 3 4
$`2`
[1] 2 5
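If installing igraph is not an option, the same row grouping can be sketched in base R with a small union-find over the values. This is a swapped-in technique, not what either answer above uses, and the helper find_root() and the other names are illustrative.

```r
c2 <- c(2, 5, 7, 8, 10)
c1 <- c(1, 3, 2, 7, 5)
df <- data.frame(c1, c2)

# follow parent links until reaching a value that is its own parent (the root)
find_root <- function(parent, v) {
  while (parent[[v]] != v) v <- parent[[v]]
  v
}

# every value starts as its own root; names and entries are the values as strings
vals <- as.character(unique(c(df$c1, df$c2)))
parent <- setNames(vals, vals)

# each row merges the groups of its two values
for (r in seq_len(nrow(df))) {
  a <- find_root(parent, as.character(df$c1[r]))
  b <- find_root(parent, as.character(df$c2[r]))
  if (a != b) parent[[b]] <- a
}

# root of each row (via its c1 value), numbered in order of first appearance
roots <- vapply(as.character(df$c1), function(v) find_root(parent, v), character(1))
row_groups <- split(seq_len(nrow(df)), match(roots, unique(roots)))
row_groups   # $`1`: 1 3 4   $`2`: 2 5
```

Without path compression this is only meant for small inputs; for large graphs igraph is the better choice.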

Using mapply to select elements from a nested list using multiple arguments

Apologies if this has already been answered somewhere, but I checked all the pages I could find and can’t find a solution to this specific problem.
I want to use an apply function to select elements from lists nested within a list. The element I want to select from each sub-list varies based on arguments contained in a separate list. Here is some example code to illustrate what I am trying to do:
# Set seed for replicable results
set.seed(123)
# Create list of lists populated with randomly generated numbers
list1 <- list()
for (i in 1:10) {
list1[[i]] <- as.list(sample.int(20, 10))
}
# Create second randomly generated list
list2 <- as.list(sample.int(10, 10))
# For loop that uses values from list2 to call specific elements from the sub-lists within list1
for (i in 1:10){
print(list1[[i]][[list2[[i]]]])
}
####################################################################################
[1] 4
[1] 8
[1] 5
[1] 8
[1] 15
[1] 17
[1] 12
[1] 15
[1] 3
[1] 15
As you can see, I can use a for loop to successfully select elements from the sub-lists nested within list1, using values from list2 in combination with the iterating value i.
Solutions offered to questions like this one (R apply function with multiple parameters) suggest that I should be able to achieve the same result using the mapply function. However, when I try to do this, I get the following error:
# Attempt to replicate output using mapply
mapply(function(x,y,z) x <- x[[z]][[y[[z]]]], x=list1, y=list2, z=1:10 )
####################################################################################
Error in x[[z]][[y[[z]]]] : subscript out of bounds
My questions are:
How can my code be altered to achieve the desired outcome?
What is causing this error? I have had similar problems with mapply in the past, when I have tried to input one or more lists alongside a vector, and have never been able to work out why it sometimes fails.
Many thanks in advance!
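Try this. It is better to let the function pick the desired value from each pair of elements. The reason you got an error is that mapply() already iterates over list1 and list2 in parallel: inside the function, x is list1[[i]] and y is list2[[i]], not the whole lists. So x[[z]] is already a single number, and y[[z]] fails once z > 1 because y has length 1; either way the indexing goes out of bounds. You only need to index x by y inside the *apply() call. Here is the code: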
#Code
unlist(mapply(function(x,y) x[y],x=list1,y=list2))
Output:
[1] 4 8 5 8 15 17 12 15 3 15
Or if you want the output in a list:
#Code 2
List <- mapply(function(x,y) x[y],x=list1,y=list2)
Output:
List
[[1]]
[1] 4
[[2]]
[1] 8
[[3]]
[1] 5
[[4]]
[1] 8
[[5]]
[1] 15
[[6]]
[1] 17
[[7]]
[1] 12
[[8]]
[1] 15
[[9]]
[1] 3
[[10]]
[1] 15
Other, simpler options (many thanks and all credit to @27ϕ9):
#Code3
mapply(`[[`, list1, list2)
Output:
[1] 4 8 5 8 15 17 12 15 3 15
Or:
#Code4
mapply(`[`, list1, list2)
Output:
[[1]]
[1] 4
[[2]]
[1] 8
[[3]]
[1] 5
[[4]]
[1] 8
[[5]]
[1] 15
[[6]]
[1] 17
[[7]]
[1] 12
[[8]]
[1] 15
[[9]]
[1] 3
[[10]]
[1] 15
If you look at your for loop, there is only one variable that changes, i.e. i. So in this case you can use lapply, or even sapply since you get a single number back.
sapply(1:10, function(i) list1[[i]][[list2[[i]]]])
#[1] 4 8 5 8 15 17 12 15 3 15
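To see why the original mapply() call fails, it helps to look at what mapply() actually passes to the function on each call. This sketch rebuilds the question's data with lapply() (same random draws in the same order as the for loop) and inspects the argument lengths; `shapes`, `fixed` and `err` are illustrative names.

```r
set.seed(123)
# same construction as in the question, written with lapply instead of a for loop
list1 <- lapply(1:10, function(i) as.list(sample.int(20, 10)))
list2 <- as.list(sample.int(10, 10))

# what mapply() hands the function on each call
shapes <- mapply(function(x, y) c(length_x = length(x), length_y = length(y)),
                 list1, list2)
shapes["length_x", ]   # every x is a whole sub-list of 10 elements
shapes["length_y", ]   # every y is a single number

# so no z is needed: index the current sub-list by the current number
fixed <- mapply(function(x, y) x[[y]], list1, list2)

# the original call: x[[z]] is already a single number, and y[[z]] does not exist once z > 1
err <- tryCatch(
  mapply(function(x, y, z) x[[z]][[y[[z]]]], x = list1, y = list2, z = 1:10),
  error = function(e) e
)
conditionMessage(err)
```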

vectors in a list - how to extract element from one of the vectors

I want to extract the 3rd element of the second vector of the first sub-list.
This is the list of vectors
A <- letters[1:4]
B <- letters[5:10]
C <- letters[11:15]
D <- c(1:10)
E <- c(20:5)
Z <- list(x = c(A,B,C), y = c(D, E))
which returns
>Z
$x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
$y
[1] 1 2 3 4 5 6 7 8 9 10 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5
I've tried this
Z[[1]][B[3]]
but it returns
[1] NA
Thank you in advance
There is no way to differentiate between A, B and C once you combine them with c(A, B, C): c() concatenates them into one flat vector. I think the elements in x and y should be in a list:
Z <- list(x = list(A,B,C), y = list(D, E))
If you have that we can do :
Z[[1]][[2]][3]
#[1] "g"
This will return 3rd element of the second vector of the first sub-list.
With lists you have these two options, for example to extract the second value of the first element of your list. The first ($) only works when the elements of your list have names.
Z$x[2]
Z[[1]][2]
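A sketch combining both answers, assuming you are free to restructure Z: naming the inner elements makes both $ and [[ work, while with the original flat vector you have to compute B's offset yourself. `Z_flat` is an illustrative name for the question's original structure.

```r
A <- letters[1:4]
B <- letters[5:10]
C <- letters[11:15]
D <- 1:10
E <- 20:5

# named sub-lists keep A, B, C (and D, E) separately addressable
Z <- list(x = list(A = A, B = B, C = C), y = list(D = D, E = E))
Z$x$B[3]            # "g"
Z[["x"]][["B"]][3]  # same element, by name with [[
Z[[1]][[2]][3]      # same element, by position

# with the original flat vector, B's position must be computed from A's length
Z_flat <- list(x = c(A, B, C), y = c(D, E))
Z_flat$x[length(A) + 3]   # "g"
```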

How to expand.grid specific objects in a list to form a new list

I have the following list:
a<-list(10,c(8,9),5,14,c(3,7),c(2,3),5,13,c(3,4),4,5,8,12,c(2,3),c(5,7))
a
[[1]]
[1] 10
[[2]]
[1] 8 9
[[3]]
[1] 5
[[4]]
[1] 14
[[5]]
[1] 3 7
[[6]]
[1] 2 3
[[7]]
[1] 5
[[8]]
[1] 13
[[9]]
[1] 3 4
[[10]]
[1] 4
[[11]]
[1] 5
[[12]]
[1] 8
[[13]]
[1] 12
[[14]]
[1] 2 3
[[15]]
[1] 5 7
Then I want to use expand.grid on every group of 3 consecutive objects in list a.
That is to say, to expand.grid 1-3, 4-6, 7-9, 10-12, 13-15, respectively, then combine these results into a new list.
The result should be a list of five data frames, one per group.
I solved it in the most naive way:
list(expand.grid(a[1:3]), expand.grid(a[4:6]), expand.grid(a[7:9]), expand.grid(a[10:12]), expand.grid(a[13:15]))
When I try to use sapply:
sapply(1:(length(a)/3), function(x){expand.grid(a[1:3+3*x-3])})
it doesn't work: the result is not a list of data frames.
I don't know why. Could you help me with this problem? Thank you so much!
We can create a grouping index with gl, split the sequence of 'a', subset the list elements using the index and use expand.grid.
lapply(split(seq_along(a), as.numeric(gl(length(a), 3, length(a)))),
function(i) expand.grid(a[i]))
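A variant of the same grouping idea, assuming the length of a is a multiple of 3: ceiling(seq_along(a) / 3) labels the objects 1, 1, 1, 2, 2, 2, ..., so split() can group the list elements directly, and lapply() applies expand.grid() to each group (expand.grid() accepts a single list of vectors). `groups` and `grids` are illustrative names.

```r
a <- list(10, c(8, 9), 5, 14, c(3, 7), c(2, 3), 5, 13, c(3, 4), 4, 5, 8, 12, c(2, 3), c(5, 7))

# group labels 1,1,1,2,2,2,...,5,5,5
groups <- split(a, ceiling(seq_along(a) / 3))
# one data frame of all combinations per group
grids <- lapply(groups, expand.grid)

sapply(grids, nrow)   # rows per group: 2 4 2 1 4
```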
We can also use sapply, but make sure to set simplify = FALSE.
The OP's code with simplify = FALSE:
sapply(1:(length(a)/3), function(x) {
  expand.grid(a[1:3+3*x-3])
}, simplify = FALSE)
According to ?sapply:
simplify: logical or character string; should the result be simplified to a vector, matrix or higher dimensional array if possible? For sapply it must be named and not abbreviated. The default value, TRUE, returns a vector or matrix if appropriate, whereas if simplify = "array" the result may be an array of “rank” (=length(dim(.))) one higher than the result of FUN(X[[i]]).
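To see what the default simplify = TRUE does to the OP's call: a data frame is a list whose length is its number of columns, so when all five results have length 3, sapply() takes the "vector or matrix if appropriate" route from the documentation quoted above and binds the columns into a matrix instead of returning a list. `f`, `res_default` and `res_list` are illustrative names.

```r
a <- list(10, c(8, 9), 5, 14, c(3, 7), c(2, 3), 5, 13, c(3, 4), 4, 5, 8, 12, c(2, 3), c(5, 7))
f <- function(x) expand.grid(a[1:3 + 3 * x - 3])

# default simplify = TRUE: each data frame has length 3 (its column count),
# so sapply() builds a 3 x 5 matrix whose cells are the individual columns
res_default <- sapply(1:(length(a) / 3), f)
is.matrix(res_default)  # TRUE
dim(res_default)        # 3 5

# simplify = FALSE keeps one data frame per group
res_list <- sapply(1:(length(a) / 3), f, simplify = FALSE)
all(vapply(res_list, is.data.frame, logical(1)))  # TRUE
```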

Replace all values of a recursive list with values of a vector

Say, I have the following recursive list:
rec_list <- list(list(rep(1,5), 10), list(rep(100, 4), 20:25))
rec_list
[[1]]
[[1]][[1]]
[1] 1 1 1 1 1
[[1]][[2]]
[1] 10
[[2]]
[[2]][[1]]
[1] 100 100 100 100
[[2]][[2]]
[1] 20 21 22 23 24 25
Now, I would like to replace all the values of the list, say, with the vector seq_along(unlist(rec_list)), and keep the structure of the list. I tried using the empty index subsetting like
rec_list[] <- seq_along(unlist(rec_list))
But this doesn't work.
How can I achieve the replacement while keeping the original structure of the list?
You can use relist:
relist(seq_along(unlist(rec_list)), skeleton = rec_list)
# [[1]]
# [[1]][[1]]
# [1] 1 2 3 4 5
#
# [[1]][[2]]
# [1] 6
#
#
# [[2]]
# [[2]][[1]]
# [1] 7 8 9 10
#
# [[2]][[2]]
# [1] 11 12 13 14 15 16
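A small check of both behaviours, as a sketch: []<- on the outer list only has two slots to fill, which is why the attempt in the question loses the structure, while relist() rebuilds the nesting from the skeleton. `new_vals`, `broken` and `res` are illustrative names.

```r
rec_list <- list(list(rep(1, 5), 10), list(rep(100, 4), 20:25))
new_vals <- seq_along(unlist(rec_list))   # 1:16

# []<- sees only the two top-level elements: the 16 values are cut to 2 (with a warning)
broken <- rec_list
suppressWarnings(broken[] <- new_vals)

# relist() uses rec_list as a template, so nesting and leaf lengths survive
res <- relist(new_vals, skeleton = rec_list)
```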
If you want to uniquely index each element of a nested list, you could start with the rapply() function, which is the recursive member of the apply() family. Here I use a special function that can uniquely index across a list of any structure:
rapply(rec_list,
local({i<-0; function(x) {i<<-i+length(x); i+seq_along(x)-length(x)}}),
how="replace")
Other functions are simpler; for example, if you just want to seq_along each subvector:
rapply(rec_list, seq_along, how="replace")
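The local() counter above can be unpacked into a named factory function to make the state explicit. make_indexer() is an illustrative name; the mechanism is the same: rapply() visits the leaves in order, and each call hands out the next length(x) indices while updating a counter in the enclosing environment.

```r
rec_list <- list(list(rep(1, 5), 10), list(rep(100, 4), 20:25))

make_indexer <- function() {
  i <- 0                           # running count of values already indexed
  function(x) {
    idx <- i + seq_along(x)        # next length(x) indices
    i <<- i + length(x)            # update the counter in the enclosing environment
    idx
  }
}

global_idx <- rapply(rec_list, make_indexer(), how = "replace")  # 1..16 across all leaves
per_leaf   <- rapply(rec_list, seq_along, how = "replace")       # restarts at 1 in each leaf
```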
