Assume I have to generate 1000 Sample Pairs (Y1,Y2) (from a Normal Distribution with replacement). Each of the pairs should have 20 observations.
y1 <- rep(sample(c(1:10),10, replace = TRUE))
y2 <- rep(sample(c(1:10),10, replace = TRUE))
How would I now generate 1000 of these pairs, so that they are easy to access for further computations?
I had the idea of looping 1000 times and saving the results in a data frame, but this may get chaotic.
Is there a simpler/nicer way to do this? A package or a function that I am missing?
Help would be appreciated!
One way is to use replicate, i.e.
replicate(5, rep(sample(c(1:10), 10, replace = TRUE)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 9 2 4 5
# [2,] 4 1 10 8 1
# [3,] 5 6 1 3 7
# [4,] 1 9 9 6 5
# [5,] 5 3 4 7 9
# [6,] 4 5 4 4 5
# [7,] 2 10 9 4 9
# [8,] 3 1 10 5 3
# [9,] 7 3 10 9 10
#[10,] 10 3 10 10 1
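To get from the single replicate call to pairs, one possible layout is two matrices in which column j of each holds one half of pair j. The sketch below assumes 10 observations per sample (as in the question's code, not the 20 mentioned in its text); the names n_pairs, n_obs and mean_diff are made up for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n_pairs <- 1000
n_obs <- 10   # assumption: taken from the question's code

# Each column is one sample; column j of y1 is paired with column j of y2
y1 <- replicate(n_pairs, sample(1:10, n_obs, replace = TRUE))
y2 <- replicate(n_pairs, sample(1:10, n_obs, replace = TRUE))

dim(y1)  # 10 rows (observations), 1000 columns (samples)

# Example of a per-pair computation: difference in sample means
mean_diff <- colMeans(y1) - colMeans(y2)
length(mean_diff)
```

Pair j is then simply y1[, j] and y2[, j], and column-wise helpers such as colMeans() or apply(y1, 2, ...) work on all samples at once.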
I have once seen this function but can't remember its name now. The function performs a rolling-slice of the input vector/matrix and outputs a matrix with 1 dimension higher. Here is what the function does:
rolling_slice <- function(v,window){
rows = length(v)-window+1
m <- matrix(0,rows,window)
for(i in 1:rows){m[i,] <- v[i:(i+window-1)]}
return(m)
}
A sample output with a vector input looks like this:
> v <- 1:10
> rolling_slice(v,3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 6
[5,] 5 6 7
[6,] 6 7 8
[7,] 7 8 9
[8,] 8 9 10
I am looking for it because I want to speed up rolling-window operations in R, and I hope this function could help by pre-indexing the input data.
I just discovered the base R function embed and now it is one of my favorite things:
> numcol <- 3
> embed(1:10, numcol)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 4 3 2
[3,] 5 4 3
[4,] 6 5 4
[5,] 7 6 5
[6,] 8 7 6
[7,] 9 8 7
[8,] 10 9 8
It basically does exactly what you describe by making a matrix of rolling windows of your data, with the second input being the window size. If order matters you can reverse the columns using:
embed(1:10, numcol)[ , numcol:1]
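As a small illustration of using the window matrix for an actual rolling computation, here is a sketch of a rolling mean built on embed. The input vector and the names w and roll_mean are made up for the example.

```r
v <- c(2, 4, 6, 8, 10)
window <- 3

# One row per window; columns reversed so each row reads left to right in time
w <- embed(v, window)[, window:1, drop = FALSE]

# Rolling mean: one value per window
roll_mean <- rowMeans(w)
roll_mean
# [1] 4 6 8
```

This materializes the full window matrix, so it is best suited to inputs where length(v) * window comfortably fits in memory.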
Sounds like zoo::rollapply/rapply() or roll*() are what you need.
What is your actual end-application: rolling means, medians, weighted sums, a filter, rolling standard deviation, something else? I doubt that your end-application is simply taking a sliding-window slice. There's no point in generating a huge, unnecessary temporary data structure, as it will kill memory and performance.
Also, for performance, this sounds like a case where data.table's sequential access will beat dplyr/tibbles/tidyverse. What data structure are you using?
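You could do this vectorized in base R. Note that this builds the matrix of window indices, which coincides with the values here only because v is 1:10: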
window <- 3
m <- diag(length(v)-window+1)
(row(m)+col(m)-1)[,1:window]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 4
# [3,] 3 4 5
# [4,] 4 5 6
# [5,] 5 6 7
# [6,] 6 7 8
# [7,] 7 8 9
# [8,] 8 9 10
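For an arbitrary input vector, the same index idea can be used to pull out the values. This is a minimal sketch using outer() to build the indices instead of diag(); the example vector and the names idx and rows are illustrative.

```r
v <- c(5, 3, 8, 1, 9, 2)
window <- 3
rows <- length(v) - window + 1

# idx[i, j] = i + j - 1, i.e. the j-th position inside the i-th window
idx <- outer(seq_len(rows), seq_len(window) - 1L, "+")

# Indexing with a matrix of positions returns a plain vector (column-major),
# so the shape has to be restored explicitly
m <- matrix(v[idx], nrow = rows)
m
#      [,1] [,2] [,3]
# [1,]    5    3    8
# [2,]    3    8    1
# [3,]    8    1    9
# [4,]    1    9    2
```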
For example suppose I have matrix A
x y z f
1 1 2 A 1005
2 2 4 B 1002
3 3 2 B 1001
4 4 8 C 1001
5 5 10 D 1004
6 6 12 D 1004
7 7 11 E 1005
8 8 14 E 1003
From this matrix I want to find the repeated values like 1001, 1005, D, 2 (in third column) and I also want to find their index (which row, or which position).
I am new to R!
Obviously it is possible to do with simple searching element by element by using a for loop, but I want to know, is there any function available in R for this kind of problem.
Furthermore, I tried using duplicated and unique. Both give me the duplicated row or column numbers, and how many of them were repeated, but I cannot search the whole matrix with either of them!
You can write a fairly simple function to get this information. Note that this solution works with a matrix, not with a data.frame. A similar function could be written for a data.frame, using the fact that a data.frame is a special kind of list.
# example data
set.seed(234)
m <- matrix(sample(1:10, size=100, replace=TRUE), 10)
find_matches <- function(mat, value) {
  nr <- nrow(mat)
  # linear (column-major) indices of the matching cells
  val_match <- which(mat == value)
  out <- matrix(NA, nrow= length(val_match), ncol= 2)
  # subtract 1 first so that matches in the last row map to row nr, not row 0
  out[,2] <- (val_match - 1) %/% nr + 1
  out[,1] <- (val_match - 1) %% nr + 1
  return(out)
}
R> m
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 8 6 6 7 6 7 4 10 6 9
[2,] 8 6 6 3 10 4 5 4 6 9
[3,] 1 6 9 2 9 2 3 6 4 2
[4,] 8 6 7 8 3 9 9 4 9 2
[5,] 1 1 5 6 7 1 5 1 10 6
[6,] 7 5 4 7 8 2 4 4 7 10
[7,] 10 4 7 8 3 1 8 6 3 4
[8,] 8 8 2 2 7 5 6 4 10 4
[9,] 10 2 9 6 6 9 7 2 4 7
[10,] 3 9 9 4 2 7 7 2 9 6
R> find_matches(m, 8)
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 4 1
[4,] 8 1
[5,] 8 2
[6,] 4 4
[7,] 7 4
[8,] 6 5
[9,] 7 7
In this function, the row index is output in column 1 and the column index is output in column 2.
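Base R can also return the row and column positions directly through the arr.ind argument of which(). The sketch below uses a small made-up numeric matrix and additionally lists every value that occurs more than once; the names pos, repeated and positions are illustrative.

```r
m <- matrix(c(1, 2, 3,
              2, 5, 6,
              7, 8, 2), nrow = 3, byrow = TRUE)

# (row, col) positions of a single value
pos <- which(m == 2, arr.ind = TRUE)
pos
#      row col
# [1,]   2   1
# [2,]   1   2
# [3,]   3   3

# Values that occur more than once anywhere in the matrix
repeated <- unique(m[duplicated(as.vector(m))])

# Positions of each repeated value, one list element per value
positions <- lapply(setNames(repeated, repeated),
                    function(val) which(m == val, arr.ind = TRUE))
```

For a data.frame with mixed column types, the same calls would first need a conversion such as as.matrix(), which turns everything into character strings.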
I am trying to create a matrix by drawing random block rows from another matrix. I have managed to do so with a loop.
set.seed(1)
a_matrix <- matrix(1:10,10,5) # the matrix with original sample
b_matrix <- matrix(NA,10, 5) # a matrix to store the bootstrap sample
S2<- seq(from =1 , to = 10, by =2) #[1] 1 3 5 7 9
m <- 2 # block size of m
for (r in S2){ start_point<-sample(1:(nrow(a_matrix)-1), 1, replace=T)
#randomly choose a number 1 to length of a_matrix -1
b_block <- a_matrix[start_point:(start_point+(m-1)), 1:ncol(a_matrix)]
# randomly select blocks from matrix a
b_matrix[r,]<-as.matrix((b_block)[1,])
b_matrix[(r+1),]<-as.matrix((b_block)[2,]) # put the blocks into matrix b
}
b_matrix
#we now have a b_matrix that is made of random blocks (size m=2)
#of the original a_matrix
The loop method works, but it is clearly not very efficient and it cannot be extended to other block sizes (e.g. a block size of 3). What is a cleaner and more extensible approach? Thanks in advance.
Here I tried to clean it up a bit and generalize the use of m:
random_block_sample <- function(a_matrix, m = 2L) {
N <- nrow(a_matrix)
stopifnot(m <= N)
n <- ceiling(N / m)
s <- sample(N - m + 1L, n, TRUE) # start_point
i <- unlist(lapply(s, seq, length.out = m))
b_matrix <- a_matrix[i, , drop = FALSE]
head(b_matrix, N)
}
set.seed(1L)
random_block_sample(a_matrix, m = 2L)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 3 3 3 3
# [2,] 4 4 4 4 4
# [3,] 4 4 4 4 4
# [4,] 5 5 5 5 5
# [5,] 6 6 6 6 6
# [6,] 7 7 7 7 7
# [7,] 9 9 9 9 9
# [8,] 10 10 10 10 10
# [9,] 2 2 2 2 2
# [10,] 3 3 3 3 3
set.seed(1L)
random_block_sample(a_matrix, m = 5L)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 2 2 2 2
# [2,] 3 3 3 3 3
# [3,] 4 4 4 4 4
# [4,] 5 5 5 5 5
# [5,] 6 6 6 6 6
# [6,] 3 3 3 3 3
# [7,] 4 4 4 4 4
# [8,] 5 5 5 5 5
# [9,] 6 6 6 6 6
# [10,] 7 7 7 7 7
I want to repeatedly divide a set into two complementary subsets of known size and keep them as the columns of two matrices. For example, assume the main set is {1, 2, ..., 10}, the size of the first sample is 8, and I want to repeat the sampling 3 times. I want to have:
[,1] [,2] [,3]
[1,] 10 9 1
[2,] 8 1 10
[3,] 3 7 5
[4,] 4 2 3
[5,] 1 8 8
[6,] 6 4 2
[7,] 9 5 7
[8,] 5 10 6
and
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 7 6 9
Any idea how to implement it in R avoiding for loops?
I would use replicate + sample, like this:
set.seed(1) # Just so you can replicate my results
A <- replicate(3, sample(10, 8, FALSE)) # Change 3 to the number of replications
A
# [,1] [,2] [,3]
# [1,] 3 7 8
# [2,] 4 1 9
# [3,] 5 2 4
# [4,] 7 8 6
# [5,] 2 5 7
# [6,] 8 10 2
# [7,] 9 4 3
# [8,] 6 6 1
For the other set, I would use apply + setdiff, like this:
B <- apply(A, 2, function(x) setdiff(1:10, x))
B
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 10 9 10
Another option, suggested by @thelatemail (and more efficient), is to use replicate to create one matrix of full permutations and then use basic subsetting to split it into your two matrices.
A <- replicate(3, sample(10))
B <- A[-(seq_len(8)), ]
A <- A[seq_len(8), ]
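As a quick sanity check of the subsetting approach, the sketch below verifies that each column of A together with the matching column of B covers the whole set exactly. The names full and ok are made up for the example, and drop = FALSE is added so that B stays a matrix even if the second subset has size 1.

```r
set.seed(1)  # arbitrary seed for reproducibility
full <- replicate(3, sample(10))      # each column is a permutation of 1:10

A <- full[seq_len(8), , drop = FALSE]   # first 8 entries of each permutation
B <- full[-seq_len(8), , drop = FALSE]  # remaining 2 entries

# For every replication, A and B together must be exactly {1, ..., 10}
ok <- vapply(seq_len(ncol(full)),
             function(j) setequal(c(A[, j], B[, j]), 1:10),
             logical(1))
all(ok)
# [1] TRUE
```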
I want to create a function that produces a matrix containing several lags of a variable. A simple example that works is
a <- ts(1:10)
cbind(a, lag(a, -1))
To do this for multiple lags, I have
lagger <- function(var, lags) {
### Create list of lags
lagged <- lapply(1:lags, function(x){
lag(var, -x)
})
### Join lags together
do.call(cbind, list(var, lagged))
}
Using the above example gives unexpected results;
lagger(a, 1)
gives a length 20 list with the original time series broken out into separate list slots and the final 10 each being a replication of the lagged series.
Any suggestions to getting this working? Thanks!
This gives a lag of 0 and of 1.
library(zoo)
a <- ts(11:13)
lags <- -(0:1)
a.lag <- as.ts(lag(as.zoo(a), lags))
Now a.lag is this:
> a.lag
Time Series:
Start = 1
End = 4
Frequency = 1
lag0 lag-1
1 11 NA
2 12 11
3 13 12
4 NA 13
If you don't want the NA entries then use: as.ts(na.omit(lag(as.zoo(a), lags))).
Based on @Joshua Ulrich's answer.
I think embed is the correct answer, but it returns the columns the other way around: the lagged series are not in the proper order. See the following:
lagged <- embed(a,4)
colnames(lagged) <- paste('t', 3:0, sep='-')
lagged
t-3 t-2 t-1 t-0
[1,] 4 3 2 1
[2,] 5 4 3 2
[3,] 6 5 4 3
[4,] 7 6 5 4
[5,] 8 7 6 5
[6,] 9 8 7 6
[7,] 10 9 8 7
This gives you the correct values, but not in the correct order, since the lags are in descending order.
But if you reorder the columns like this:
lagged_OK <- lagged[,ncol(lagged):1]
colnames(lagged_OK) <- paste('lag', 0:3, sep='.')
lagged_OK
lag.0 lag.1 lag.2 lag.3
[1,] 1 2 3 4
[2,] 2 3 4 5
[3,] 3 4 5 6
[4,] 4 5 6 7
[5,] 5 6 7 8
[6,] 6 7 8 9
[7,] 7 8 9 10
Then, you get the right lagged matrix.
I added the colnames only for explanatory purposes; you can just do:
embed(a,4)[ ,4:1]
If you really want a lagger function, try this:
lagger <- function(x, lag=1){
lag <- lag+1
Lagged <- embed(x,lag)[ ,lag:1]
colnames(Lagged) <- paste('lag', 0:(lag-1), sep='.')
return(Lagged)
}
lagger(a, 4)
lag.0 lag.1 lag.2 lag.3 lag.4
[1,] 1 2 3 4 5
[2,] 2 3 4 5 6
[3,] 3 4 5 6 7
[4,] 4 5 6 7 8
[5,] 5 6 7 8 9
[6,] 6 7 8 9 10
lagger(a, 1)
lag.0 lag.1
[1,] 1 2
[2,] 2 3
[3,] 3 4
[4,] 4 5
[5,] 5 6
[6,] 6 7
[7,] 7 8
[8,] 8 9
[9,] 9 10
I'm not sure what's wrong with your function, but you can probably use embed instead.
> embed(a,4)
[,1] [,2] [,3] [,4]
[1,] 4 3 2 1
[2,] 5 4 3 2
[3,] 6 5 4 3
[4,] 7 6 5 4
[5,] 8 7 6 5
[6,] 9 8 7 6
[7,] 10 9 8 7
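As for the original function: do.call(cbind, list(var, lagged)) ends up calling cbind(var, lagged) with lagged still being a list, so cbind builds a list-matrix instead of a multivariate time series. A minimal sketch of a fix, assuming the goal is to keep the ts-based approach, is to flatten the arguments with c(list(var), lagged). The name lagger_ts and the column labels are made up for illustration.

```r
lagger_ts <- function(var, lags) {
  # One lagged copy of the series per lag; stats::lag is spelled out in case
  # another attached package masks lag()
  lagged <- lapply(seq_len(lags), function(k) stats::lag(var, -k))
  # Flatten to list(var, lag1, lag2, ...) so cbind receives each series
  # as a separate argument
  out <- do.call(cbind, c(list(var), lagged))
  colnames(out) <- paste0("lag", 0:lags)
  out
}

a <- ts(1:10)
res <- lagger_ts(a, 2)
dim(res)
# [1] 12  3
```

Unlike embed, this keeps the time series attributes and pads the non-overlapping ends with NA.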