I once saw this function but can't remember its name. It takes a rolling slice of the input vector/matrix and returns a matrix with one more dimension than the input. Here is what the function does:
rolling_slice <- function(v, window) {
  rows <- length(v) - window + 1
  m <- matrix(0, rows, window)
  for (i in 1:rows) {
    m[i, ] <- v[i:(i + window - 1)]
  }
  return(m)
}
A sample output with a vector input looks like this:
> v <- 1:10
> rolling_slice(v,3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 6
[5,] 5 6 7
[6,] 6 7 8
[7,] 7 8 9
[8,] 8 9 10
I am looking for it because I want to speed up rolling-window operations in R, and I hope this function could help by pre-indexing the input data.
I just discovered the base R function embed and now it is one of my favorite things:
> numcol <- 3
> embed(1:10, numcol)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 4 3 2
[3,] 5 4 3
[4,] 6 5 4
[5,] 7 6 5
[6,] 8 7 6
[7,] 9 8 7
[8,] 10 9 8
It basically does exactly what you describe by making a matrix of rolling windows of your data, with the second input being the window size. If order matters you can reverse the columns using:
embed(1:10, numcol)[ , numcol:1]
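As a minimal sketch of how the two approaches line up, the loop from the question can be compared against embed() with its columns reversed. The names rolling_slice_loop and rolling_slice_embed are made up for this illustration; all.equal() is used rather than identical() because the loop version fills a double matrix while embed() keeps the integer type.

```r
# Loop version from the question, kept only for comparison.
rolling_slice_loop <- function(v, window) {
  rows <- length(v) - window + 1
  m <- matrix(0, rows, window)
  for (i in seq_len(rows)) m[i, ] <- v[i:(i + window - 1)]
  m
}

# Hypothetical wrapper: embed() followed by a column reversal.
# drop = FALSE keeps the result a matrix even when there is a single row.
rolling_slice_embed <- function(v, window) {
  embed(v, window)[, window:1, drop = FALSE]
}

v <- 1:10
# Same values in the same positions (storage type aside).
all.equal(rolling_slice_loop(v, 3), rolling_slice_embed(v, 3))
```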
Sounds like zoo::rollapply() or one of the zoo roll*() functions (rollmean(), rollmedian(), rollsum(), rollmax()) is what you need.
What is your actual end application: rolling means, medians, weighted sums, a filter, a rolling standard deviation, something else? I doubt that your end application is simply taking a sliding-window slice. There's no point in generating a huge, unnecessary temporary data structure, as it will hurt both memory use and performance.
Also, for performance, this sounds like a case where data.table's sequential access will beat dplyr/tibbles/tidyverse. What data structure are you using?
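To illustrate the point about temporary structures, here is a small base R sketch, assuming the end application is a rolling mean. It computes the same result twice: once by materialising every window with embed(), and once from cumulative sums without building the window matrix. The variable names are only for this example; zoo::rollmean() would be the packaged way to get the same numbers.

```r
v <- c(2, 4, 6, 8, 10, 12)
window <- 3
n <- length(v)

# Materialised approach: build every window, then reduce each row.
means_matrix <- rowMeans(embed(v, window))

# Direct approach: each window sum is a difference of two cumulative sums,
# so no rows-by-window matrix is ever allocated.
cs <- cumsum(c(0, v))
means_direct <- (cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]) / window

all.equal(means_matrix, means_direct)
```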
You could do this vectorized in base R:
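v <- 1:10
window <- 3
m <- diag(length(v)-window+1)
# This builds the matrix of window indices; it equals the windows themselves only because v is 1:10
(row(m)+col(m)-1)[,1:window]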
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 4
# [3,] 3 4 5
# [4,] 4 5 6
# [5,] 5 6 7
# [6,] 6 7 8
# [7,] 7 8 9
# [8,] 8 9 10
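The matrix above holds positions rather than values, so it matches the rolling slice only when v is 1:n. A hedged sketch of the same idea for an arbitrary vector follows; it uses outer() to build the index matrix (a substitution for the diag()/row()/col() construction, chosen here because it avoids allocating a square matrix), then looks the values up.

```r
v <- c(10, 20, 30, 40, 50)
window <- 3
rows <- length(v) - window + 1

# Index matrix: entry (i, j) is i + j - 1, the position of element j of window i.
idx <- outer(seq_len(rows), seq_len(window) - 1L, "+")

# Indexing a plain vector with a matrix returns a plain vector,
# so the dimensions are restored afterwards.
m <- v[idx]
dim(m) <- dim(idx)
m
```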
Assume I have to generate 1000 Sample Pairs (Y1,Y2) (from a Normal Distribution with replacement). Each of the pairs should have 20 observations.
y1 <- rep(sample(c(1:10),10, replace = TRUE))
y2 <- rep(sample(c(1:10),10, replace = TRUE))
How would I now generate 1000 of these pairs so that they are easy to access for further computations?
I had the idea of looping 1000 times and saving the results in a data frame, but this may get chaotic.
Is there a simpler/nicer way to do this? A package or a function that I am missing?
Help would be appreciated!
One way is to use replicate, i.e.
replicate(5, rep(sample(c(1:10), 10, replace = TRUE)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 9 2 4 5
# [2,] 4 1 10 8 1
# [3,] 5 6 1 3 7
# [4,] 1 9 9 6 5
# [5,] 5 3 4 7 9
# [6,] 4 5 4 4 5
# [7,] 2 10 9 4 9
# [8,] 3 1 10 5 3
# [9,] 7 3 10 9 10
#[10,] 10 3 10 10 1
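For the pairs themselves, one possible layout is a list with one small data frame per pair, built with replicate(..., simplify = FALSE). The sketch below assumes 1000 pairs of 20 normal draws, as described in the question's text; rnorm() can be swapped for sample(1:10, n_obs, replace = TRUE) if the integer sampling from the question's code is what is wanted. The names n_pairs, n_obs and pairs are illustrative only.

```r
set.seed(42)  # only so the sketch is reproducible
n_pairs <- 1000
n_obs <- 20

# One data frame per pair; simplify = FALSE keeps the result a plain list.
pairs <- replicate(
  n_pairs,
  data.frame(y1 = rnorm(n_obs), y2 = rnorm(n_obs)),
  simplify = FALSE
)

length(pairs)    # number of pairs
dim(pairs[[1]])  # observations x 2 for the first pair
```

Each pair is then reachable as pairs[[i]], and per-pair computations can be run with sapply() or lapply() over the list.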
R experts, I have a question:
I have the matrix below, and I want to build a new matrix from only some of its rows: starting at the first row and skipping i rows each time, where i = 3. That is, I want a new matrix made of the 1st, 5th and 9th rows of the initial matrix, so the new matrix should be 3x3. Is there a special function in R for this, or do I need to write my own function?
[,1] [,2] [,3]
[1,] 1 2 15
[2,] 2 3 16
[3,] 3 4 1
[4,] 4 5 2
[5,] 5 6 3
[6,] 6 7 4
[7,] 7 8 5
[8,] 8 9 6
[9,] 9 10 7
Below the desired matrix:
1 2 15
5 6 3
9 10 7
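A minimal sketch of one way to get there with plain row indexing: seq() generates the row numbers 1, 5, 9 (a step of 4, i.e. three rows skipped between picks). The variable names step and m_sub are assumptions for this example.

```r
# Rebuild the matrix from the question.
m <- cbind(1:9, 2:10, c(15, 16, 1:7))

step <- 4  # take every 4th row, starting from the first
m_sub <- m[seq(1, nrow(m), by = step), , drop = FALSE]
m_sub
```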
I want to repeatedly divide a set into two complementary subsets of known size and keep them as the columns of two matrices. For example, assume the main set is {1, 2, ..., 10}, the size of the first sample is 8, and I want to repeat the sampling 3 times. I want to have:
[,1] [,2] [,3]
[1,] 10 9 1
[2,] 8 1 10
[3,] 3 7 5
[4,] 4 2 3
[5,] 1 8 8
[6,] 6 4 2
[7,] 9 5 7
[8,] 5 10 6
and
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 7 6 9
Any idea how to implement it in R avoiding for loops?
I would use replicate + sample, like this:
set.seed(1) # Just so you can replicate my results
A <- replicate(3, sample(10, 8, FALSE)) # Change 3 to the number of replications
A
# [,1] [,2] [,3]
# [1,] 3 7 8
# [2,] 4 1 9
# [3,] 5 2 4
# [4,] 7 8 6
# [5,] 2 5 7
# [6,] 8 10 2
# [7,] 9 4 3
# [8,] 6 6 1
For the other set, I would use apply + setdiff, like this:
B <- apply(A, 2, function(x) setdiff(1:10, x))
B
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 10 9 10
Another option, suggested by @thelatemail (and likely more efficient), is to use replicate to create a matrix of full permutations and then use basic subsetting to split it into your two matrices.
A <- replicate(3, sample(10))
B <- A[-(seq_len(8)), ]
A <- A[seq_len(8), ]
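As a quick sanity check on the permutation-and-split approach, the sketch below confirms that every column of A and the matching column of B together cover the whole set. The names n, k, reps, perm and ok are introduced here for illustration; drop = FALSE is added so both pieces stay matrices even if one of them has a single row.

```r
set.seed(1)  # only so the sketch is reproducible
n <- 10      # size of the main set
k <- 8       # size of the first subset
reps <- 3    # number of repetitions

perm <- replicate(reps, sample(n))       # one full permutation per column
A <- perm[seq_len(k), , drop = FALSE]    # first k entries of each permutation
B <- perm[-seq_len(k), , drop = FALSE]   # the remaining n - k entries

# Column j of A and column j of B should together be exactly 1..n.
ok <- vapply(
  seq_len(reps),
  function(j) setequal(c(A[, j], B[, j]), seq_len(n)),
  logical(1)
)
all(ok)
```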
The function mapply() does not appear to work properly in the following case:
a <- list(matrix(1:8,4,2),matrix(1:9,3,3))
b <- list(1:4,1:3)
mapply(a,b,FUN=cbind)
that gives the following matrix
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 1 9
[10,] 2 1
[11,] 3 2
[12,] 4 3
instead of the following (expected) result:
[[1]]
[,1] [,2] [,3]
[1,] 1 5 1
[2,] 2 6 2
[3,] 3 7 3
[4,] 4 8 4
[[2]]
[,1] [,2] [,3] [,4]
[1,] 1 4 7 1
[2,] 2 5 8 2
[3,] 3 6 9 3
Can anybody help me in understanding if something in my code is wrong? Thank you!
Make sure to set SIMPLIFY to FALSE:
mapply(a,b,FUN=cbind, SIMPLIFY=FALSE)
Otherwise, mapply tries to coerce everything into a single compatible result. In your case, because each call returned 12 elements, it put the two results side by side in a matrix, with the values of the first matrix in the first column and those of the second matrix in the second column.
Alternatively you can use
Map(cbind, a, b)
which always returns a list. (Map is also nice because, if a has names, it will use those names in the resulting list. That isn't useful in this case, but may be in others.)
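The difference can be seen side by side in a short sketch. The list names first and second are added here purely to show the name propagation mentioned above; they are not part of the original question.

```r
a <- list(first = matrix(1:8, 4, 2), second = matrix(1:9, 3, 3))
b <- list(1:4, 1:3)

# Default SIMPLIFY = TRUE: both results have 12 elements,
# so they are flattened into the columns of a single matrix.
simplified <- mapply(cbind, a, b)
dim(simplified)

# Map() never simplifies and carries over the names of a.
as_list <- Map(cbind, a, b)
names(as_list)
lapply(as_list, dim)
```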
I want to create a function that produces a matrix containing several lags of a variable. A simple example that works is
a <- ts(1:10)
cbind(a, lag(a, -1))
To do this for multiple lags, I have
lagger <- function(var, lags) {
### Create list of lags
lagged <- lapply(1:lags, function(x){
lag(var, -x)
})
### Join lags together
do.call(cbind, list(var, lagged))
}
Using the above example gives unexpected results;
lagger(a, 1)
gives a length 20 list with the original time series broken out into separate list slots and the final 10 each being a replication of the lagged series.
Any suggestions to getting this working? Thanks!
This gives a lag of 0 and of 1.
library(zoo)
a <- ts(11:13)
lags <- -(0:1)
a.lag <- as.ts(lag(as.zoo(a), lags))
Now a.lag is this:
> a.lag
Time Series:
Start = 1
End = 4
Frequency = 1
lag0 lag-1
1 11 NA
2 12 11
3 13 12
4 NA 13
If you don't want the NA entries, use: as.ts(na.omit(lag(as.zoo(a), lags))).
Based on @Joshua Ulrich's answer.
I think embed is the correct answer, but you get the vectors the other way around. I mean that with embed you get the lagged series, but not in the proper order. See the following:
lagged <- embed(a,4)
colnames(lagged) <- paste('t', 3:0, sep='-')
lagged
t-3 t-2 t-1 t-0
[1,] 4 3 2 1
[2,] 5 4 3 2
[3,] 6 5 4 3
[4,] 7 6 5 4
[5,] 8 7 6 5
[6,] 9 8 7 6
[7,] 10 9 8 7
This gives you the correct values but not in the correct order, since the lags are in descending order.
But if you reorder the columns like this:
lagged_OK <- lagged[,ncol(lagged):1]
colnames(lagged_OK) <- paste('t', 0:3, sep='-')
lagged_OK
t-0 t-1 t-2 t-3
[1,] 1 2 3 4
[2,] 2 3 4 5
[3,] 3 4 5 6
[4,] 4 5 6 7
[5,] 5 6 7 8
[6,] 6 7 8 9
[7,] 7 8 9 10
Then you get the right lagged matrix.
I added the colnames only for explanation purposes; you can just do:
embed(a,4)[ ,4:1]
If you really want a lagger function, try this
lagger <- function(x, lag=1){
  lag <- lag+1
  Lagged <- embed(x,lag)[ ,lag:1]
  colnames(Lagged) <- paste('lag', 0:(lag-1), sep='.')
  return(Lagged)
}
lagger(a, 4)
lag.0 lag.1 lag.2 lag.3 lag.4
[1,] 1 2 3 4 5
[2,] 2 3 4 5 6
[3,] 3 4 5 6 7
[4,] 4 5 6 7 8
[5,] 5 6 7 8 9
[6,] 6 7 8 9 10
lagger(a, 1)
lag.0 lag.1
[1,] 1 2
[2,] 2 3
[3,] 3 4
[4,] 4 5
[5,] 5 6
[6,] 6 7
[7,] 7 8
[8,] 8 9
[9,] 9 10
I'm not sure what's wrong with your function, but you can probably use embed instead.
> embed(a,4)
[,1] [,2] [,3] [,4]
[1,] 4 3 2 1
[2,] 5 4 3 2
[3,] 6 5 4 3
[4,] 7 6 5 4
[5,] 8 7 6 5
[6,] 9 8 7 6
[7,] 10 9 8 7
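As for the original lagger, one likely culprit is the call do.call(cbind, list(var, lagged)): lagged is itself a list, so cbind receives the series plus one list argument instead of the series plus each lagged series. A minimal sketch of a repair, assuming the rest of the function is kept, is to flatten everything into a single list of arguments with c(). The lag0, lag1, ... names are an addition for readable column labels, and stats::lag is spelled out in case another package masks lag.

```r
lagger <- function(var, lags) {
  # One lagged copy of the series per requested lag.
  lagged <- lapply(seq_len(lags), function(k) stats::lag(var, -k))
  names(lagged) <- paste0("lag", seq_len(lags))

  # c() splices the lagged series in next to the original,
  # so cbind sees lags + 1 separate time series.
  do.call(cbind, c(list(lag0 = var), lagged))
}

a <- ts(1:10)
res <- lagger(a, 2)
res
```

The result is a multivariate time series rather than a list, with NA where a lagged series has no value for a given time point.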