Create a rolling list in R

Given a vector (e.g. a column of a data frame), I'd like to create a rolling list of windows. For example,
l = 0:10
would return (with a window of 3):
[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5] ...

1) rollapply: r is a 9x3 matrix, each of whose rows is one of the list elements asked for, and split turns it into a list of vectors. Although this gives what you asked for, you may really just want to iterate over that list. In that case it is easier to replace c with the function you want to apply to each window, e.g. rollapply(l, 3, sd).
library(zoo)
l <- 0:10 # test input
r <- rollapply(l, 3, c)
split(r, row(r))
giving:
$`1`
[1] 0 1 2
$`2`
[1] 1 2 3
$`3`
[1] 2 3 4
$`4`
[1] 3 4 5
$`5`
[1] 4 5 6
$`6`
[1] 5 6 7
$`7`
[1] 6 7 8
$`8`
[1] 7 8 9
$`9`
[1] 8 9 10
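If zoo isn't available, the "apply a function to each window" idea behind rollapply(l, 3, sd) can be sketched in base R with vapply over the window start positions. This is an alternative to rollapply, not how zoo implements it; window = 3 and sd are just the example values from above.

```r
l <- 0:10
window <- 3
# starting index of every complete window: 1, 2, ..., length(l) - window + 1
starts <- seq_len(length(l) - window + 1)
# apply sd() to each window directly instead of building the list first
rolling_sd <- vapply(starts, function(i) sd(l[i:(i + window - 1)]), numeric(1))
rolling_sd
```

Each window here holds three consecutive integers, so every standard deviation is 1.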
2) embed: This can also be done in base R. embed() puts the most recent value first in each row, so [, 3:1] reverses the columns:
r <- embed(l, 3)[, 3:1]
split(r, row(r))
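A quick base-R check of that column order and of the resulting list (no zoo needed), using the same test input l as above:

```r
l <- 0:10
e <- embed(l, 3)
e[1, ]              # most recent value first: 2 1 0
r <- e[, 3:1]       # reverse the columns so each row reads oldest to newest
windows <- split(r, row(r))
windows[[1]]        # 0 1 2
```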

You can use the following function. I am assuming you want the values sorted first; if not, just remove the line that calls sort():
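roll <- function(list, window) {
  # sort first; delete this line to keep the original order
  list <- sort(list, decreasing = FALSE)
  # number of complete windows (0 if the window is longer than the input)
  n <- max(length(list) - window + 1, 0)
  res <- vector(mode = "list", length = n)
  # seq_len(0) runs no iterations, unlike 1:(a negative number)
  for (i in seq_len(n)) {
    res[[i]] <- list[i:(i + window - 1)]
  }
  return(res)
}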
Pass your column/list values as list, along with the window size you want, and it should give you the desired output.
For example:
test<-0:10
roll(list = test,window = 3)
This results in the following output:
[[1]]
[1] 0 1 2
[[2]]
[1] 1 2 3
[[3]]
[1] 2 3 4
[[4]]
[1] 3 4 5
[[5]]
[1] 4 5 6
[[6]]
[1] 5 6 7
[[7]]
[1] 6 7 8
[[8]]
[1] 7 8 9
[[9]]
[1] 8 9 10
You can use this function for other cases and even change the window size as per your requirements.
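If the original order should be kept, a shorter variant can drop the sort and the explicit loop. This is a sketch; roll_unsorted is a hypothetical name, and it returns an empty list when no complete window fits.

```r
# Hypothetical variant of roll() that keeps the original order (no sort)
roll_unsorted <- function(x, window) {
  n <- length(x) - window + 1
  # no complete window fits: return an empty list instead of erroring
  if (n < 1) return(list())
  lapply(seq_len(n), function(i) x[i:(i + window - 1)])
}

roll_unsorted(c(3, 1, 2, 5), window = 2)
```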
Hope that helps!

Related

From a list of numeric values, create a list of indices

I have a list of numeric vectors:
a <- list(c(2, 3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(6, 7, 8, 9, 10))
> a
[[1]]
[1] 2 3 4 5 6 7
[[2]]
[1] 4 5 6 7 8
[[3]]
[1] 6 7 8 9 10
I want to create a list where each element corresponds to values from 1 to the max value in the original list "a". The values in each element of the new list are the indices in the original list containing the focal value.
For example, the first element in the result contains the indices in "a" with the value 1. Because no element contains 1, the result is NULL. The second element contains the indices in "a" with the value 2, i.e. the first element, 1. The value 4 is found in elements 1 and 2.
> res
[[1]]
NULL
[[2]]
[1] 1
[[3]]
[1] 1
[[4]]
[1] 1 2
[[5]]
[1] 1 2
[[6]]
[1] 1 2 3
[[7]]
[1] 1 2 3
[[8]]
[1] 2 3
[[9]]
[1] 3
[[10]]
[1] 3
I tried this with nested loops, but it takes too much time, since growing lists inside loops is very slow. I have 60,000 sublists in my main list, so is there a vectorized solution for this?
Thanks in advance.
Here is a base R way.
lapply(seq.int(max(unique(unlist(a)))), \(i){
which(sapply(a, \(x) any(i == x)))
})
Another way:
searchInList <- function(list2search, e){
idx2search <- 1:length(list2search)
list2search2 <- lapply(list2search, `length<-`, max(lengths(list2search)))
output <- matrix(unlist(list2search2), ncol = length(list2search2[[1]]), byrow = TRUE)
idx <- apply(output, 1, function(x){ (e %in% x) } )
return(idx2search[idx])
}
result <- lapply(1:max(unlist(a)), function(x) { searchInList(a, x) } )
Here is one way using match and rapply.
apply(matrix(rapply(a, \(x) !is.na(match(1:max(unlist(a)), x))),,length(a)), 1, which)
# [[1]]
# integer(0)
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 1
#
# [[4]]
# [1] 1 2
#
# [[5]]
# [1] 1 2
#
# [[6]]
# [1] 1 2 3
#
# [[7]]
# [1] 1 2 3
#
# [[8]]
# [1] 2 3
#
# [[9]]
# [1] 3
#
# [[10]]
# [1] 3
Another solution using base R:
apply(sapply(a, `%in%`, x = seq_len(max(unlist(a)))), 1, which)
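To see why this one-liner works, it helps to look at the intermediate logical matrix. Because x is passed by name, sapply calls `%in%`(a[[j]], x = values), i.e. values %in% a[[j]], once per sublist:

```r
a <- list(c(2, 3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(6, 7, 8, 9, 10))
values <- seq_len(max(unlist(a)))
# each call returns one column, so m[v, j] is TRUE when value v occurs in a[[j]]
m <- sapply(a, `%in%`, x = values)
dim(m)         # 10 rows (values 1..10), 3 columns (sublists)
which(m[4, ])  # sublists containing the value 4
```

One caveat: if every row happened to contain the same number of TRUE values, apply() would simplify the result to a matrix instead of a list; lapply(seq_len(nrow(m)), function(v) which(m[v, ])) always returns a list.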
A tidyverse approach:
library(purrr)
a <- list(c(2, 3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(6, 7, 8, 9, 10))
i = 1:10
map(i, ~map_int(imap(a, ~(..3 %in% .x)*.y, i), ~.x[.y], .x) %>% .[. != 0])
The logic is to turn each sublist into a vector that marks, for every value in i, whether it occurs there (1/0 from %in%), multiplied by the sublist's index. Taking the k-th element of each of these vectors and dropping the zeros gives the k-th element of the target list; for example, the 4th elements are 1, 2 and 0, so the value 4 is found in sublists 1 and 2.
imap(a, ~(..3 %in% .x)*.y, i)
[[1]]
[1] 0 1 1 1 1 1 1 0 0 0
[[2]]
[1] 0 0 0 2 2 2 2 2 0 0
[[3]]
[1] 0 0 0 0 0 3 3 3 3 3
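Since the question asks for something vectorized over 60,000 sublists, here is a different technique from the answers above (swapped in, not taken from them): flatten once, record which sublist each value came from, and group with split(). A sketch assuming the values are positive integers; unique() guards against a value appearing twice in the same sublist.

```r
a <- list(c(2, 3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(6, 7, 8, 9, 10))
# drop repeated values inside a sublist so each index is recorded once per value
a_u <- lapply(a, unique)
vals <- unlist(a_u)                        # every value, flattened
idx <- rep(seq_along(a_u), lengths(a_u))   # the sublist each value came from
# group the sublist indices by value; unused levels give integer(0)
res <- split(idx, factor(vals, levels = seq_len(max(vals))))
res[["4"]]
```

split() keeps empty factor levels by default, so values that never occur (here 1) get integer(0), matching the other answers.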

Accessing dataframes within a nested list

I have a list of three different types of datasets, with ten datasets in each type. It looks like this:
mat1 <- replicate(n=10,data.frame(matrix(data=rnorm(25,0,1),nrow=5,ncol=5)),simplify=FALSE)
mat2 <- replicate(n=10,data.frame(matrix(data=rnorm(25,0,1),nrow=5,ncol=5)),simplify=FALSE)
mat3 <- replicate(n=10,data.frame(matrix(data=rnorm(25,0,1),nrow=5,ncol=5)),simplify=FALSE)
combined <- list(mat1,mat2,mat3)
I want to apply the same function to each of the datasets, but I can't figure out how to access them. I tried using map from purrr, but it only applies it to the first one in the list:
map(combined[[i]],~length(.))
[[1]]
[1] 5
[[2]]
[1] 5
[[3]]
[1] 5
[[4]]
[1] 5
[[5]]
[1] 5
[[6]]
[1] 5
[[7]]
[1] 5
[[8]]
[1] 5
[[9]]
[1] 5
[[10]]
[1] 5
How can I apply a function to all datasets in a nested list?
*The function is more complex than length - it's a function from another package that I need to access using ~function
You can apply lengths to each list in combined:
lapply(combined, lengths)
#[[1]]
# [1] 5 5 5 5 5 5 5 5 5 5
#[[2]]
# [1] 5 5 5 5 5 5 5 5 5 5
#[[3]]
# [1] 5 5 5 5 5 5 5 5 5 5
Using purrr's map:
purrr::map(combined, lengths)
If length is just an example and you want a general way to apply a function to each data frame in the nested list, you can use a nested lapply:
lapply(combined, function(x) lapply(x, function(y) length(y)))
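Or use rapply. Since data frames are themselves lists, rapply recurses into them and applies the function to each column rather than to each data frame (with length, both happen to give 5):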
rapply(combined, length, how = 'list')
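A base-R sketch of two options for a real per-dataset function. my_fun below is a hypothetical stand-in (colMeans) for the package function, and the data are rebuilt with rnorm(25) so each 5 x 5 matrix is filled exactly. unlist(recursive = FALSE) removes only the outer level, leaving the 30 data frames intact.

```r
set.seed(1)
# Same structure as in the question: 3 groups x 10 data frames of 5 x 5
make_group <- function() {
  replicate(n = 10, data.frame(matrix(rnorm(25), nrow = 5, ncol = 5)), simplify = FALSE)
}
combined <- list(make_group(), make_group(), make_group())

# Hypothetical stand-in for the real per-dataset function
my_fun <- function(df) colMeans(df)

# Option 1: keep the nesting (3 lists of 10 results)
nested_res <- lapply(combined, function(group) lapply(group, my_fun))

# Option 2: drop one level of nesting and work on all 30 data frames at once
flat <- unlist(combined, recursive = FALSE)
flat_res <- lapply(flat, my_fun)
```

Option 2 is convenient when the grouping doesn't matter; flat_res[[11]] corresponds to nested_res[[2]][[1]].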

How to check if the given value belong to the vectors in list?

Suppose we have a value y = 4 and a list of vectors. I want to check which vectors in the list contain this value and, for those vectors, add the value to every element.
y<-4
M<- list( c(1,3,4,6) , c(2,3,5), c(1,3,6) ,c(1,4,5,6))
> M
[[1]]
[1] 1 3 4 6
[[2]]
[1] 2 3 5
[[3]]
[1] 1 3 6
[[4]]
[1] 1 4 5 6
The outcome should be:
> R
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
We can use keep which only keeps elements that satisfy a predicate. In this case, it is only keeping the vectors that contain y.
We then add y to each of the vectors.
library('tidyverse')
keep(M, ~y %in% .) %>%
map(~. + y)
Here is a simple hacky way to do this:
lapply(M[sapply(M, function(x){y %in% x})],function(x){x+y})
returning:
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Logic: use sapply to work out which elements of M contain a 4, then add 4 to those with lapply.
You can do this with...
lapply(M[sapply(M, `%in%`, x=y)], `+`, y)
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Here is a method with lapply and set functions.
# loop through M, check length of intersect
myList <- lapply(M, function(x) if(length(intersect(y, x)) > 0) x + y else NULL)
# now subset, dropping the NULL elements
myList <- myList[lengths(myList) > 0]
This returns:
myList
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Wow! Everyone has given great answers; just adding a version that uses Map:
Map("+",M[unlist(Map("%in%", y,M))],y)
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
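The keep/map idea can also be sketched with base R only, where Filter() plays the role of purrr::keep. This is an equivalent alternative that produces the same result:

```r
y <- 4
M <- list(c(1, 3, 4, 6), c(2, 3, 5), c(1, 3, 6), c(1, 4, 5, 6))
# keep only the vectors that contain y (base R's Filter, analogous to purrr::keep)
hits <- Filter(function(v) y %in% v, M)
# add y to every element of the kept vectors
R <- lapply(hits, `+`, y)
R
```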

R: enumerating sequences of permutations

I want to enumerate the distinct sequences of different permutations, and I'm using the function permn. I understand for 2!, I can just use permn(2) and that will enumerate 1, 2 and 2, 1.
> library(combinat)
> permn(2)
[[1]]
[1] 1 2
[[2]]
[1] 2 1
I want to do the same thing for the numbers 7 and 8. So what should I pass into the function so that it will return something like this?
> permn(...)
[[1]]
[1] 7 8
[[2]]
[1] 8 7
Pass the vector itself (a single positive integer n is treated as 1:n):
permn(c(7,8))
#[[1]]
#[1] 7 8
#
#[[2]]
#[1] 8 7
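If combinat isn't available, a small recursive function gives the same set of permutations in base R. This is a hypothetical helper (perms is not part of any package), and the order of the results may differ from permn():

```r
# Hypothetical helper: all permutations of the elements of v, as a list
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # fix v[i] in front, then permute the remaining elements
    for (rest in perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

perms(c(7, 8))
```

The number of results grows as length(v)!, so this is only practical for short vectors.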

Splitting numeric vectors in R

If I have a vector x <- c(1,2,3,5,7,9,10,12) and another vector y <- c(3,7,10), how would I produce the following?
[[1]]
1,2,3
[[2]]
5,7
[[3]]
9,10
[[4]]
12
Notice how 3, 7 and 10 become the last number of each list element (except the last one), i.e. the "breakpoints". I am sure there is a simple R function that I either don't know about or have forgotten.
Here's one way using cut and split:
split(x, cut(x, c(-Inf, y, Inf)))
#$`(-Inf,3]`
#[1] 1 2 3
#
#$`(3,7]`
#[1] 5 7
#
#$`(7,10]`
#[1] 9 10
#
#$`(10,Inf]`
#[1] 12
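You could also use the range of x as the outer breaks. include.lowest = TRUE is needed so that the minimum (1) falls into the first interval instead of becoming NA:
split(x, cut(x, unique(c(y, range(x))), include.lowest = TRUE))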
## $`[1,3]`
## [1] 1 2 3
## $`(3,7]`
## [1] 5 7
## $`(7,10]`
## [1] 9 10
## $`(10,12]`
## [1] 12
Similar to @beginneR's answer, but using findInterval instead of cut (shifting the breaks by 1 works here because the values are integers):
split(x, findInterval(x, y + 1))
# $`0`
# [1] 1 2 3
#
# $`1`
# [1] 5 7
#
# $`2`
# [1] 9 10
#
# $`3`
# [1] 12
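For non-integer data the y + 1 shift no longer follows the "breakpoint" rule (3.5 would land in the first group). A sketch using findInterval's left.open argument, which makes each interval closed on the right like cut's default, so each breakpoint stays the last element of its group:

```r
x <- c(1, 2, 3, 5, 7, 9, 10, 12)
y <- c(3, 7, 10)
# left.open = TRUE: group i collects values with y[i] < value <= y[i + 1]
g <- findInterval(x, y, left.open = TRUE)
res <- split(x, g)
res
# a non-integer value between breakpoints lands in the expected group
findInterval(3.5, y, left.open = TRUE)  # returns 1
```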
