How to subset a vector inside a list of lists - r

I have the following list of lists:
foo <- list(a = list(x = 1:10, y = 11:25), b = list(x = 1:10, y = 100:110))
It looks like this:
> foo
$a
$a$x
[1] 1 2 3 4 5 6 7 8 9 10
$a$y
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$b
$b$x
[1] 1 2 3 4 5 6 7 8 9 10
$b$y
[1] 100 101 102 103 104 105 106 107 108 109 110
What I want to do is pick only the first 3 elements of each inner vector, yielding:
$a
$a$x
[1] 1 2 3
$a$y
[1] 11 12 13
$b
$b$x
[1] 1 2 3
$b$y
[1] 100 101 102
How can I achieve that?

A wild ?rapply appears:
rapply(foo, f=head, n=3, how="list")
#$a
#$a$x
#[1] 1 2 3
#
#$a$y
#[1] 11 12 13
#
#
#$b
#$b$x
#[1] 1 2 3
#
#$b$y
#[1] 100 101 102
This is a recursive apply: it descends through each list until it reaches a non-list element and then applies the function f to it. The how="list" argument ensures that you get the same list structure back.
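As a small sketch of that recursion (the list `deep` below is made up for illustration and is not from the question), rapply should reach vectors at any nesting depth, not only the second level:

```r
# Hypothetical list with uneven nesting depth (not from the question)
deep <- list(
  a = list(x = 1:10, inner = list(y = 11:25)),
  b = 100:110
)

# head(n = 3) is applied to every non-list leaf; how = "list" keeps the shape
out <- rapply(deep, f = head, n = 3, how = "list")
str(out)
```

This is why rapply is handy when the depth is not known in advance, whereas the nested lapply answers assume exactly two levels.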

Use nested lapply
lapply(foo, lapply, `[`, 1:3)
#$a
#$a$x
#[1] 1 2 3
#$a$y
#[1] 11 12 13
#$b
#$b$x
#[1] 1 2 3
#$b$y
#[1] 100 101 102
Or use head
lapply(foo, lapply, head, 3)
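The two variants differ when a vector has fewer than 3 elements. A minimal sketch, using a made-up list `short` that is not part of the question: indexing with `[` pads with NA, while head simply returns what is there.

```r
# Hypothetical input where one vector is shorter than 3 elements
short <- list(a = list(x = 1:2, y = 11:25))

with_bracket <- lapply(short, lapply, `[`, 1:3)  # pads with NA
with_head    <- lapply(short, lapply, head, 3)   # returns at most 3 elements

with_bracket$a$x
with_head$a$x
```

Which behaviour is preferable depends on whether every result must have length 3.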

library(purrr)
foo %>% map_depth(2,head,3)
#$a
#$a$x
#[1] 1 2 3
#
#$a$y
#[1] 11 12 13
#
#
#$b
#$b$x
#[1] 1 2 3
#
#$b$y
#[1] 100 101 102

Related

How to find where the interval of continuous numbers starts and ends?

I have a vector
vec <- c(2, 3, 5, 6, 7, 8, 16, 19, 22, 23, 24)
The continuous numbers are:
c(2, 3)
c(5, 6, 7, 8)
c(22, 23, 24)
So the first run starts at 2 and ends at 3;
the second run starts at 5 and ends at 8;
the third run starts at 22 and ends at 24.
Is there a function to identify where the continuous numbers start and end?
By using diff to check the differences between each consecutive value, you can find where the difference is not +1.
diff(vec)
## [1] 1 2 1 1 1 8 3 3 1 1
c(1, diff(vec)) != 1
## [1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
Then use cumsum to make a group identifier:
cumsum(c(1, diff(vec))!=1)
## [1] 0 0 1 1 1 1 2 3 4 4 4
And use this to split your data up:
split(vec, cumsum(c(1, diff(vec))!=1))
##$`0`
##[1] 2 3
##
##$`1`
##[1] 5 6 7 8
##
##$`2`
##[1] 16
##
##$`3`
##[1] 19
##
##$`4`
##[1] 22 23 24
Which can be Filtered to consecutive values:
Filter(\(x) length(x) > 1, split(vec, cumsum(c(1, diff(vec))!=1)))
##$`0`
##[1] 2 3
##
##$`1`
##[1] 5 6 7 8
##
##$`4`
##[1] 22 23 24
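Since the question asks for the start and end of each run, one possible follow-up (a sketch, with `runs` and `bounds` as names chosen here for illustration) is to take the range of every remaining group:

```r
vec <- c(2, 3, 5, 6, 7, 8, 16, 19, 22, 23, 24)

# Same grouping as above, keeping only runs longer than one element
runs <- split(vec, cumsum(c(1, diff(vec)) != 1))
runs <- Filter(function(x) length(x) > 1, runs)

# range() returns c(min, max); each run is sorted, so that is start and end
bounds <- t(sapply(runs, range))
colnames(bounds) <- c("start", "end")
bounds
```

This assumes vec is sorted in increasing order, as in the question.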
Another option:
vec=c( 2 , 3 , 5 , 6 , 7 , 8 , 16 , 19 , 22 , 23 , 24 )
x <- replace(NA, vec, vec)
# [1] NA 2 3 NA 5 6 7 8 NA NA NA NA NA NA NA 16 NA NA 19 NA NA 22 23 24
l <- split(x, with(rle(is.na(x)), rep(seq.int(length(lengths)), lengths)))
# l <- split(x, data.table::rleid(is.na(x))) ## same as above
l <- Filter(Negate(anyNA), l)
l
# $`2`
# [1] 2 3
#
# $`4`
# [1] 5 6 7 8
#
# $`6`
# [1] 16
#
# $`8`
# [1] 19
#
# $`10`
# [1] 22 23 24
If you have a length requirement:
l[lengths(l) > 1]
# $`2`
# [1] 2 3
#
# $`4`
# [1] 5 6 7 8
#
# $`10`
# [1] 22 23 24

How can you generate multiple vectors of different length in R?

I would like to use a for loop to generate multiple vectors and save their values for later use. The end result ideally would be:
vector_1 = c(1)
vector_2 = c(1,2,3)
vector_3 = c(1,2,3,4,5,6)
.
.
.
vector_i = c(1,2,3,...,n) #for some n generated during the loop. This n does not always have an upper bound.
This is so that I can use each vector later on to plot multiple lines on the same graph with the axis of the graph scaled correctly.
The following code is the best example I can come up with to try and describe the idea but obviously using 'vector_i' for each loop is not going to work.
for (i in 1:n){
length = sample(1:i^2,1)
vector_i = seq(1,length)
}
You could use the following function:
make_vectors <- function(n) lapply(seq(n), function(i) seq(sample(i^2, 1)))
Which allows:
vector <- make_vectors(5)
vector
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2 3 4
#>
#> [[3]]
#> [1] 1 2 3 4
#>
#> [[4]]
#> [1] 1 2 3 4 5 6
#>
#> [[5]]
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
And you can access each one like this:
vector[[5]]
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
This keeps your global environment tidy and allows you to iterate through your vectors more easily than writing them all as independent entities.
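If you prefer to keep the for loop from the question, a rough equivalent is to fill a pre-allocated list. This is only a sketch: the seed, `n <- 4` and the `vector_i` style names are assumptions made here for illustration.

```r
set.seed(42)  # assumed seed, only so the run is reproducible
n <- 4        # assumed number of vectors

vectors <- vector("list", n)      # pre-allocate one slot per vector
for (i in seq_len(n)) {
  len <- sample(i^2, 1)           # random length between 1 and i^2
  vectors[[i]] <- seq_len(len)
}
names(vectors) <- paste0("vector_", seq_len(n))

# Longest vector, e.g. for choosing a common x-axis limit before plotting
longest <- max(lengths(vectors))
```

Individual vectors are then available as vectors$vector_3 or vectors[[3]].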
We could use sequence
v1 <- sequence(c(1, 3, 6))
v1
#[1] 1 1 2 3 1 2 3 4 5 6
If we need it in a list
split(v1, cumsum(v1 == 1))
#$`1`
#[1] 1
#$`2`
#[1] 1 2 3
#$`3`
#[1] 1 2 3 4 5 6
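When the lengths are already known, the list can also be built directly without going through one long vector. A minimal sketch, where `lens` and the `vector_i` names are chosen here for illustration:

```r
lens <- c(1, 3, 6)                 # desired lengths, as in the question

vecs <- lapply(lens, seq_len)      # one 1..n vector per length
names(vecs) <- paste0("vector_", seq_along(lens))
vecs
```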

I want R to arrange/merge my resamples in a vector

I have a vector from 1 to 10, which I divided into blocks of size 2 and then resampled, but I am not happy with how R arranges the result. The R code below demonstrates the problem:
ts <- 1:10 # generate a number from 1 to 10
bs <- 2 # let block size equals 2
nb <- length(ts) / bs # number of blocks
blk <- split(ts, rep(1:nb, each=bs)) # split the generated numbers into "blk"
res<-sample(blk, replace=T, 20) # resample the blk 20 times with replacement
res # gives me the below result
#$`5`
#[1] 9 10
#$`5`
#[1] 9 10
#$`5`
#[1] 9 10
#$`3`
#[1] 5 6
#$`1`
#[1] 1 2
#$`2`
#[1] 3 4
#$`5`
#[1] 9 10
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`4`
#[1] 7 8
#$`3`
#[1] 5 6
#$`2`
#[1] 3 4
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
#$`1`
#[1] 1 2
I would rather have the result like:
(9,10,9,10,9,10,5,6,1,2,3,4,9,10,1,2,1,2,1,2,7,8,7,8,7,8,7,8,7,8,5,6,3,4,1,2,1,2,1,2)
so that I can use "res" as a single univariate vector and write the result into a one-line or one-column .csv file like this:
write.csv(res, "resamples.csv")
You can use unlist() to flatten the list structure. You can also remove the auto-generated names if you don't want/need them:
ts <- 1:10 # generate a number from 1 to 10
bs <- 2 # let block size equals 2
nb <- length(ts) / bs # number of blocks
blk <- split(ts, rep(1:nb, each=bs)) # split the generated numbers into "blk"
res<-unlist(sample(blk, replace=TRUE, 20)) # resample the blk 20 times with replacement
names(res) <- NULL # optional
res
#> [1] 3 4 9 10 1 2 7 8 3 4 3 4 7 8 5 6 7 8 3 4 1 2 3
#> [24] 4 9 10 5 6 7 8 9 10 9 10 5 6 9 10 1 2
Created on 2019-09-23 by the reprex package (v0.3.0)
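For the CSV part of the question, a possible sketch is to drop the names while flattening and write the vector as a single column. The seed and the temporary file path are assumptions made here so the example is self-contained.

```r
set.seed(123)  # assumed seed, only for reproducibility
ts <- 1:10
bs <- 2
blk <- split(ts, rep(seq_len(length(ts) / bs), each = bs))

# use.names = FALSE drops the block names while flattening
res <- unlist(sample(blk, 20, replace = TRUE), use.names = FALSE)

out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(data.frame(res = res), out_file, row.names = FALSE)  # one column

back <- read.csv(out_file)  # read it back to check the round trip
```

For a one-row file, writing t(res) instead of the data frame should work along the same lines.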

Adding a vector to components of a list

I have the following list:
A <- c(11)
B <- c(7, 13)
C <- c(1, 10, 11, 12)
my_list <- list(A, B, C)
> my_list
[[1]]
[1] 11
[[2]]
[1] 7 13
[[3]]
[1] 1 10 11 12
I would like to add -2, -1, 0, 1, and 2 to each number in this list, and retain all of the unique values within each list element, to obtain the following resulting list:
> my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14
I tried the following code, but I did not get the result I was hoping for:
my_new_list <- lapply(res, `+`, -2:2)
> my_new_list
$`1`
[1] 9 10 11 12 13
$`2`
[1] 5 12 7 14 9
$`3`
[1] -1 9 11 13 3
Why is this happening, and how can I obtain the result I'd like? Thanks!
Assuming that we need the unique values
lapply(my_list, function(x) sort(unique(unlist(lapply(x, `+`, -2:2)))))
Or with outer
lapply(my_list, function(x) sort(unique(c(outer(x, -2:2, `+`)))))
Or with rep and recycling
lapply(my_list, function(x) sort(unique(rep(-2:2, each = length(x)) + x)))
#[[1]]
# [1] 9 10 11 12 13
#[[2]]
# [1] 5 6 7 8 9 11 12 13 14 15
#[[3]]
# [1] -1 0 1 2 3 8 9 10 11 12 13 14
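On the "why" part of the question: `+` works element by element and recycles the shorter operand, so a length-2 vector plus -2:2 pairs the elements up one by one instead of forming every combination. A small sketch using the second list element from the question:

```r
x <- c(7, 13)

# Element-wise with recycling: 7-2, 13-1, 7+0, 13+1, 7+2
# (R also warns that the lengths are not multiples of each other)
recycled <- suppressWarnings(x + -2:2)
recycled

# outer() forms every pairwise sum instead: one row per element of x
full <- outer(x, -2:2, `+`)
sort(unique(c(full)))
```

The first result reproduces the unexpected output in the question; the second is what the answers above build on.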
How about this:
my_new_list <- lapply(my_list, function(x) unique(union(x,sapply(x, function(y) y +c(-2:2)) )))
my_new_list <- lapply(my_new_list, sort)
my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14

Is there a ready made function to transform a matrix into data frame of rows, columns and values?

I want to recast a matrix as a data frame with one column for rows, one for columns and one for the actual values stored in the matrix.
Take an example matrix like this:
mat <- matrix(paste0(rep(1:5,3), rep(1:3,each=5)), 5, 3)
mat
# [,1] [,2] [,3]
# [1,] "11" "12" "13"
# [2,] "21" "22" "23"
# [3,] "31" "32" "33"
# [4,] "41" "42" "43"
# [5,] "51" "52" "53"
With as.vector() and rep() I can construct the wanted result as such:
values <- as.vector(mat)
rows <- rep(seq_len(dim(mat)[1]), dim(mat)[2])
columns <- rep(seq_len(dim(mat)[2]), each=dim(mat)[1])
df <- data.frame(rows, columns, values)
df
# rows columns values
# 1 1 1 11
# 2 2 1 21
# 3 3 1 31
# 4 4 1 41
# 5 5 1 51
# 6 1 2 12
# 7 2 2 22
# 8 3 2 32
# 9 4 2 42
# 10 5 2 52
# 11 1 3 13
# 12 2 3 23
# 13 3 3 33
# 14 4 3 43
# 15 5 3 53
Now, is there a ready-made, perhaps more performant, function in R or one of its packages to do this?
With the reshape2 package:
library(reshape2)
melt(mat)
Var1 Var2 value
1 1 1 11
2 2 1 21
3 3 1 31
4 4 1 41
5 5 1 51
6 1 2 12
7 2 2 22
8 3 2 32
9 4 2 42
10 5 2 52
11 1 3 13
12 2 3 23
13 3 3 33
14 4 3 43
15 5 3 53
A solution with base R:
as.data.frame.table(mat)
However, the above code does not produce the desired output because the matrix has no row/col names, so as.data.frame.table() uses letters instead. If the matrix has row/col names, this works:
rownames(mat) = 1:5
colnames(mat) = 1:3
as.data.frame.table(mat)
That being said, melt is way faster with large matrices in my experience.
You could try which and its arr.ind argument
values <- as.vector(mat)
data.frame(which(mat == values, arr.ind = TRUE), values)
# row col values
# 1 1 1 11
# 2 2 1 21
# 3 3 1 31
# 4 4 1 41
# 5 5 1 51
# 6 1 2 12
# 7 2 2 22
# 8 3 2 32
# 9 4 2 42
# 10 5 2 52
# 11 1 3 13
# 12 2 3 23
# 13 3 3 33
# 14 4 3 43
# 15 5 3 53
Another option is:
data.frame(Var1=c(row(mat)), Var2=c(col(mat)), value=c(mat))
expand.grid can produce the row and column combinations efficiently, and then you can cbind the data:
cbind(expand.grid(rows=seq(nrow(mat)),columns=seq(ncol(mat))),values=c(mat))
rows columns values
1 1 1 11
2 2 1 21
3 3 1 31
4 4 1 41
5 5 1 51
6 1 2 12
7 2 2 22
8 3 2 32
9 4 2 42
10 5 2 52
11 1 3 13
12 2 3 23
13 3 3 33
14 4 3 43
15 5 3 53
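Both this and the row()/col() variant rely on R storing matrices column by column, so c(mat) lines up with indices whose row number varies fastest. A short sketch to check that the two constructions agree (`by_index` and `by_grid` are names chosen here for illustration):

```r
mat <- matrix(paste0(rep(1:5, 3), rep(1:3, each = 5)), 5, 3)

# Indices taken from the matrix itself
by_index <- data.frame(rows = c(row(mat)), columns = c(col(mat)), values = c(mat))

# Indices generated with expand.grid (first factor varies fastest)
by_grid <- cbind(
  expand.grid(rows = seq_len(nrow(mat)), columns = seq_len(ncol(mat))),
  values = c(mat)
)

# Every (row, column) pair should point back at its own value
all(mat[cbind(by_grid$rows, by_grid$columns)] == by_grid$values)
```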
The ready-made function in the base of R is as.data.frame.table:
as.data.frame.table(mat)
## Var1 Var2 Freq
## 1 A A 11
## 2 B A 21
## 3 C A 31
## etc.
It uses the row and column names of mat as values in the first two output columns and if there are none it defaults to upper case letters, as seen above.
If you don't like letters, specify the dimnames, e.g. try this which specifies numbers rather than letters and specifies rows, columns and value as the column names:
dimnames(mat) <- list(rows = 1:nrow(mat), columns = 1:ncol(mat))
as.data.frame.table(mat, responseName = "value")
## rows columns value
## 1 1 1 11
## 2 2 1 21
## 3 3 1 31
## etc.
or for a non-destructive version using numbers but not changing the column names:
as.data.frame.table(mat, base = list(paste(1:nrow(mat)), paste(1:ncol(mat))))
## Var1 Var2 Freq
## 1 1 1 11
## 2 2 1 21
## 3 3 1 31
## etc.
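Note that by default the first two columns come back as factors. If integer indices are needed, one possible sketch is to ask for character columns and convert them; this assumes the dimnames are numeric labels, as set above.

```r
mat <- matrix(paste0(rep(1:5, 3), rep(1:3, each = 5)), 5, 3)
dimnames(mat) <- list(rows = 1:nrow(mat), columns = 1:ncol(mat))

# stringsAsFactors = FALSE keeps the dimnames as character, not factor
df <- as.data.frame.table(mat, responseName = "value", stringsAsFactors = FALSE)

# Convert the character labels to integer indices
df$rows    <- as.integer(df$rows)
df$columns <- as.integer(df$columns)
str(df)
```

With factor columns, as.integer(as.character(...)) would be the safer conversion, since as.integer on a factor returns level codes rather than the labels.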
