I want to create a function which helps characterise the results of some simulations. For the purposes of this post let the simulation function be:
example_sim <- function(time=100, npops=5){
result <- data.frame(matrix(NA, nrow = time, ncol = npops))
colnames(result) <- LETTERS[1:npops]
for(i in 1:npops){
sim <- sample.int(time, time)
result[,i] <- sim
result[,i] <- result[,i]*i
}
return(result)
}
This creates a data frame with varying length and width based on the number of populations (npops) and the time simulated.
I want to create a function which takes the output of such simulations and computes the mean and variance for each population at every time step, across a given number of simulations (nsims).
So far I have managed to get it working for two populations with the following code:
library("matrixStats")
library("reshape2")
ensembles <- function(nsims=10, time = 100, npops = 2){
result_N.A <- data.frame(matrix(NA, nrow = time, ncol = nsims))
result_N.B <- data.frame(matrix(NA, nrow = time, ncol = nsims))
for (i in seq_len(nsims)) {
simulation_with_2pops <- example_sim(time = time, npops = 2)
result_N.A[,i] <- simulation_with_2pops[,1]
result_N.B[,i] <- simulation_with_2pops[,2]
}
output <- simulation_with_2pops
# mean of population A over all simulations, one value per time step
for (j in seq_len(time)) {
output$meanA[j] <- rowMeans(result_N.A[j,])
}
# mean of population B
for (j in seq_len(time)) {
output$meanB[j] <- rowMeans(result_N.B[j,])
}
# variance of population A (rowVars comes from matrixStats)
for (j in seq_len(time)) {
output$varA[j] <- rowVars(as.matrix(result_N.A[j,]))
}
# variance of population B
for (j in seq_len(time)) {
output$varB[j] <- rowVars(as.matrix(result_N.B[j,]))
}
return(output)
}
ensembles_output<- ensembles(nsims = 10)
ensembles_output
To fully implement the function for any number of populations I would need another for loop in which I create and update one result_N.A-style object per population. (Presumably called something like result[i].)
I have also thought about creating a 3-dimensional object (time, npops, nsims) and taking slices of it to calculate the mean and variance, but I haven't had much success yet.
I am not married to this route and am very open to other recommendations.
Eventually I would like the code to also calculate the covariance and correlation between two populations specified in the parameters (for instance population A and population E). If you have any ideas on the implementation I would be very grateful to hear them.
Thank you for considering this problem.
I think using a multidimensional array is a very good idea in this case.
First, you can get the simulations of example_sim() much more cheaply using mapply(). Here is an example with time=10, npops=3 and nsims=3. Use the same set.seed(42) and parameters and check for yourself.
I use much smaller parameters here so that you can easily check the result in your head.
set.seed(42)
nsims <- 3  # number of simulations, one array slice each
sim <- replicate(nsims, mapply(\(time, i) sample.int(time, time)*i, 10, 1:3))
sim
# , , 1
#
# [,1] [,2] [,3]
# [1,] 1 16 27
# [2,] 5 14 30
# [3,] 10 8 9
# [4,] 8 2 12
# [5,] 2 10 15
# [6,] 4 20 18
# [7,] 6 4 3
# [8,] 9 12 6
# [9,] 7 18 24
# [10,] 3 6 21
#
# , , 2
#
# [,1] [,2] [,3]
# [1,] 3 10 18
# [2,] 1 8 6
# [3,] 2 4 12
# [4,] 6 16 9
# [5,] 10 6 30
# [6,] 8 2 15
# [7,] 4 20 27
# [8,] 5 14 21
# [9,] 7 12 24
# [10,] 9 18 3
#
# , , 3
#
# [,1] [,2] [,3]
# [1,] 10 8 18
# [2,] 8 18 6
# [3,] 5 6 27
# [4,] 1 16 3
# [5,] 7 10 24
# [6,] 4 12 15
# [7,] 6 20 30
# [8,] 2 4 9
# [9,] 9 2 12
# [10,] 3 14 21
Next, I believe you want to gather row-wise statistics across the simulations for each population column A, B, C, ... . Here you basically want apply(., MARGIN=1:2, FUN). For the mean there is also rowMeans(., dims=2L), which is faster.
rowMeans(sim, dims=2L)
# [,1] [,2] [,3]
# [1,] 4.666667 11.333333 21
# [2,] 4.666667 13.333333 14
# [3,] 5.666667 6.000000 16
# [4,] 5.000000 11.333333 8
# [5,] 6.333333 8.666667 23
# [6,] 5.333333 11.333333 16
# [7,] 5.333333 14.666667 20
# [8,] 5.333333 10.000000 12
# [9,] 7.666667 10.666667 20
# [10,] 5.000000 12.666667 15
apply(sim, 1:2, var)
# [,1] [,2] [,3]
# [1,] 22.333333 17.333333 27
# [2,] 12.333333 25.333333 192
# [3,] 16.333333 4.000000 93
# [4,] 13.000000 65.333333 21
# [5,] 16.333333 5.333333 57
# [6,] 5.333333 81.333333 3
# [7,] 1.333333 85.333333 219
# [8,] 12.333333 28.000000 63
# [9,] 1.333333 65.333333 48
# [10,] 12.000000 37.333333 108
I'm not sure, however, why you use simulation_with_2pops for your final output, since it only holds the result of the last iteration of the for (i in 1:nsims) loop. Anyway, I hope this helps you further.
Note: R >= 4.1 used.
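Putting these pieces together for any number of populations, a minimal sketch could look like the following. The helper names run_sims, ensembles_nd and pair_stats are made up for illustration and are not from the question; example_sim() is repeated only so that the block runs on its own. The covariance and correlation are taken across simulations, separately for every time step, which is one possible reading of the request.

```r
# Simulation function from the question, repeated so the block is self-contained.
example_sim <- function(time = 100, npops = 5) {
  result <- data.frame(matrix(NA, nrow = time, ncol = npops))
  colnames(result) <- LETTERS[1:npops]
  for (i in 1:npops) {
    result[, i] <- sample.int(time, time) * i
  }
  result
}

# Hypothetical helper: run nsims simulations and stack them into a
# time x npops x nsims array (replicate() simplifies equal-sized matrices).
run_sims <- function(nsims = 10, time = 100, npops = 2) {
  replicate(nsims, as.matrix(example_sim(time = time, npops = npops)))
}

# Hypothetical helper: mean and variance of every population at every time step.
ensembles_nd <- function(sims) {
  list(mean = rowMeans(sims, dims = 2L),  # time x npops
       var  = apply(sims, 1:2, var))      # time x npops
}

# Hypothetical helper: covariance and correlation across simulations between
# two populations (given by column index), one value per time step.
pair_stats <- function(sims, pop1, pop2) {
  x <- sims[, pop1, ]  # time x nsims
  y <- sims[, pop2, ]
  steps <- seq_len(nrow(x))
  data.frame(cov = sapply(steps, function(t) cov(x[t, ], y[t, ])),
             cor = sapply(steps, function(t) cor(x[t, ], y[t, ])))
}

set.seed(42)
sims  <- run_sims(nsims = 10, time = 20, npops = 5)
stats <- ensembles_nd(sims)
ae    <- pair_stats(sims, pop1 = 1, pop2 = 5)  # populations A and E
dim(stats$mean)
# [1] 20  5
```

Keeping the raw array around (rather than only the summaries) means further statistics can be added later without rerunning the simulations.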
I am a newbie in R. I have a matrix with 3 columns and 8,000 rows, made up of 16 groups of 500 rows (that is, 16 matrices of 500 by 3 stacked on top of each other). I want to take the first 300 rows of each group and put the resulting 16 groups of 300 by 3 into a new matrix. How do I do that?
Two 6 * 2 matrices on top of each other:
m <- matrix(1:24, ncol = 2)
# [,1] [,2]
# [1,] 1 13
# [2,] 2 14
# [3,] 3 15
# [4,] 4 16
# [5,] 5 17
# [6,] 6 18
# [7,] 7 19
# [8,] 8 20
# [9,] 9 21
#[10,] 10 22
#[11,] 11 23
#[12,] 12 24
Make it an array:
a <- array(m, c(6, 2, 2))
a <- aperm(a, c(1, 3, 2))
First three rows of each matrix:
a[1:3,,]
#, , 1
#
# [,1] [,2]
#[1,] 1 13
#[2,] 2 14
#[3,] 3 15
#
#, , 2
#
# [,1] [,2]
#[1,] 7 19
#[2,] 8 20
#[3,] 9 21
Use this if you need a matrix:
matrix(aperm(a[1:3,,], c(1, 3, 2)), ncol = 2)
# [,1] [,2]
#[1,] 1 13
#[2,] 2 14
#[3,] 3 15
#[4,] 7 19
#[5,] 8 20
#[6,] 9 21
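Scaled up to the sizes in the question, the same two aperm() steps might look like this. The input matrix here is a made-up stand-in filled with consecutive integers, and the variable names are only illustrative.

```r
# Hypothetical stand-in for the 8000 x 3 matrix: 16 groups of 500 rows.
rows_per_group <- 500
n_groups <- 16
n_keep <- 300
m <- matrix(seq_len(rows_per_group * n_groups * 3), ncol = 3)

# Split the row index into (row within group, group), then move the
# group index to the last position.
a <- array(m, c(rows_per_group, n_groups, ncol(m)))
a <- aperm(a, c(1, 3, 2))                 # 500 x 3 x 16

# Keep the first 300 rows of every group and stack the groups again.
kept <- a[seq_len(n_keep), , ]            # 300 x 3 x 16
new_m <- matrix(aperm(kept, c(1, 3, 2)), ncol = ncol(m))
dim(new_m)
# [1] 4800    3
```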
You need to generate the sequence 1:300, 501:800, ... etc, then subset out these rows. If your matrix is called mat you can do that like this:
new_mat <- mat[as.numeric(sapply((0:15 * 500), "+", 1:300)),]
If you're looking for just a way to select the first 300 rows from your matrix for each group, this could be a solution.
Given m your matrix of 8000x3 composed by 16 groups on top of each other, then:
r <- 500 # rows for each group
g <- 16 # number of groups
n <- 300 # first n rows to select
new_m <- m[rep(rep(c(TRUE, FALSE), c(n, r - n)), g), ]
dim(new_m)
#> [1] 4800 3
new_m is now a matrix 4800x3
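The logical mask used here and the explicit row numbers from the sapply() answer pick out the same rows. A quick check, assuming a made-up 8000 x 3 matrix of consecutive integers:

```r
r <- 500  # rows for each group
g <- 16   # number of groups
n <- 300  # first n rows to select
m <- matrix(seq_len(r * g * 3), ncol = 3)  # hypothetical input

# Explicit row numbers: 1:300, 501:800, ...
idx <- as.numeric(sapply(0:(g - 1) * r, "+", 1:n))

# Logical mask: 300 TRUE then 200 FALSE, repeated once per group.
mask <- rep(rep(c(TRUE, FALSE), c(n, r - n)), g)

same_rows   <- identical(which(mask), as.integer(idx))
same_result <- identical(m[idx, ], m[mask, ])
c(same_rows, same_result)
# [1] TRUE TRUE
```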
In case you are working with keras or reticulate, you could use array_reshape.
#### 0. parameters
nrows <- 4 # 500 in your example # rows for each group
ncols <- 3 # 3 in your example
ngrps <- 2 # 16 in your example # number of groups
nslct <- 3 # 300 in your example # first n rows to select
#### 1. create an example matrix
m <- matrix(1:24, nrows*ngrps, ncols)
m
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
#> [4,] 4 12 20
#> [5,] 5 13 21
#> [6,] 6 14 22
#> [7,] 7 15 23
#> [8,] 8 16 24
dim(m)
#> [1] 8 3
#--> c(ngrps * nrows, ncols)
#### 2. reshape in groups
m <- reticulate::array_reshape(m, c(ngrps,nrows,ncols))
dim(m)
#> [1] 2 4 3
# --> c(n_groups, n_rows, n_cols)
m[1,,]
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
#> [4,] 4 12 20
m[2,,]
#> [,1] [,2] [,3]
#> [1,] 5 13 21
#> [2,] 6 14 22
#> [3,] 7 15 23
#> [4,] 8 16 24
#### 3. select first nslct rows for each group
new_m <- m[,seq_len(nslct),]
# that's the result for each group
new_m[1,,]
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
new_m[2,,]
#> [,1] [,2] [,3]
#> [1,] 5 13 21
#> [2,] 6 14 22
#> [3,] 7 15 23
#### 4. recreate one matrix
reticulate::array_reshape(new_m, c(nslct*ngrps,ncols))
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
#> [4,] 5 13 21
#> [5,] 6 14 22
#> [6,] 7 15 23
Created on 2020-11-23 by the reprex package (v0.3.0)
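If you would rather not depend on reticulate, a base R sketch that should produce the same row-major layout is shown below. It relies on t() to read the matrix row by row and on aperm() to reorder the dimensions; treating this as equivalent to array_reshape with its default row-major order is an assumption worth checking on your own data.

```r
nrows <- 4  # rows for each group
ncols <- 3
ngrps <- 2  # number of groups
nslct <- 3  # first n rows to select
m <- matrix(1:24, nrows * ngrps, ncols)

# Row-major reshape to groups x rows x cols using base R only.
a <- aperm(array(t(m), c(ncols, nrows, ngrps)), c(3, 2, 1))
dim(a)
# [1] 2 4 3

# Select the first nslct rows of each group.
new_a <- a[, seq_len(nslct), , drop = FALSE]

# Flatten back to one stacked matrix, again in row-major order.
new_m <- matrix(aperm(new_a, c(3, 2, 1)), ncol = ncols, byrow = TRUE)
new_m
#      [,1] [,2] [,3]
# [1,]    1    9   17
# [2,]    2   10   18
# [3,]    3   11   19
# [4,]    5   13   21
# [5,]    6   14   22
# [6,]    7   15   23
```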
A simple way to generate an array that mimics the fill of your matrix is to use the transpose of the matrix as the input for the array function. Here is a simple example:
n <- rep(1:3, each = 4)
m1 <- matrix(n, ncol = 2, byrow = TRUE)
m1
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
# [3,] 2 2
# [4,] 2 2
# [5,] 3 3
# [6,] 3 3
m2 <- t(m1)
a1 <- array(m2, c(2, 2, 3))
a1
# , , 1
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
# , , 2
# [,1] [,2]
# [1,] 2 2
# [2,] 2 2
# , , 3
# [,1] [,2]
# [1,] 3 3
# [2,] 3 3
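One thing this example hides, because every block holds a single repeated value: each slice built from the transposed matrix is itself the transpose of the corresponding block. With distinct values a final aperm() restores the original orientation. A small sketch with made-up data:

```r
m <- matrix(1:12, ncol = 2)  # three 2 x 2 blocks stacked by row
nr <- 2                      # rows per block
nc <- ncol(m)
ngrp <- nrow(m) / nr

a_t <- array(t(m), c(nc, nr, ngrp))  # each slice is a transposed block
a   <- aperm(a_t, c(2, 1, 3))        # each slice equals the original block

m[1:2, ]
#      [,1] [,2]
# [1,]    1    7
# [2,]    2    8
a_t[, , 1]
#      [,1] [,2]
# [1,]    1    2
# [2,]    7    8
a[, , 1]
#      [,1] [,2]
# [1,]    1    7
# [2,]    2    8
```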
I have a number of subarrays, say 2 (for simplicity), each with the same number of rows and columns. Each spot in the subarrays is occupied by a number in [1, 10].
What I would like to do is move rows randomly between subarrays according to some rate of movement m in [0, 1]. m = 0 corresponds to no movement, while m = 1 means that any rows across all subarrays can be moved.
I take inspiration from:
How to swap a number of the values between 2 rows in R
but my problem is a bit different than this. I do know that sample() would be needed here.
Is there an easy way to go about accomplishing this?
This doesn't do it, but I believe I'm on the right track anyway.
m <- 0.2
a <- array(dim = c(5, 5, 2)) # 5 rows, 5 columns, 2 subarrays
res <- rep(sample(nrow(a), size = ceiling(nrow(a)*m), replace = FALSE)) # sample 20% of rows from array a.
Any assistance is appreciated.
It is significantly easier if you can use a matrix (2-dim array).
set.seed(2)
m <- 0.2
d <- c(10, 4)
a <- array(sample(prod(d)), dim = d)
a
# [,1] [,2] [,3] [,4]
# [1,] 8 17 14 1
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 7 35 3 32
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
(I'm going to set the seed again to something that conveniently gives me something "interesting" to show.)
set.seed(2)
ind <- which(runif(d[1]) < m)
ind
# [1] 1 4 7
The first randomness, runif, is compared against m and generates the indices that may change. The second randomness, sample below, takes those indices and possibly reorders them. (In this case, it reorders "1,4,7" to "4,1,7", meaning the third of the rows-that-may-change will be left unchanged.)
a[ind,] <- a[sample(ind),]
a
# [,1] [,2] [,3] [,4]
# [1,] 7 35 3 32 # <-- row 4
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 8 17 14 1 # <-- row 1
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13 # <-- row 7, unchanged
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
Note that this is probabilistic, which means a probability of 0.2 does not guarantee you 20% (or even any) of the rows will be swapped.
(Since I'm guessing you'd really like to preserve your 3-dim (or even n-dim) array, you might be able to use aperm to transfer between array <--> matrix.)
EDIT 1
As an alternative to the probabilistic use of runif, you can use:
ind <- head(sample(d[1]), n = d[1]*m)
to get closer to your goal of "20%". Since d[1]*m will often not be an integer, head silently truncates/floors the number, so you'll get the price-is-right winner: closest to but not over your desired percentage.
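The same fixed share can also be drawn directly with the size argument of sample(). A minimal sketch, with the rounding made explicit through floor() and an illustrative rate of 0.25 so that the product is not an integer:

```r
set.seed(1)
d <- c(10, 4)
m <- 0.25

n_move <- floor(d[1] * m)            # 2 rows, never more than the requested share
ind <- sample(d[1], size = n_move)   # distinct row numbers between 1 and 10
length(ind)
# [1] 2
```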
EDIT 2
A reversible method for transforming an n-dimensional array into a matrix and back again. Caveat: though the logic appears solid, my testing has only included a couple arrays.
array2matrix <- function(a) {
d <- dim(a)
ind <- seq_along(d)
a2 <- aperm(a, c(ind[2], ind[-2]))
dim(a2) <- c(d[2], prod(d[-2]))
a2 <- t(a2)
attr(a2, "origdim") <- d
a2
}
The reversal uses the "origdim" attribute if still present; this will work as long as your modifications to the matrix do not clear its attributes. (Simple row-swapping does not.)
matrix2array <- function(m, d = attr(m, "origdim")) {
ind <- seq_along(d)
m2 <- t(m)
dim(m2) <- c(d[2], d[-2])
aperm(m2, c(ind[2], ind[-2]))
}
(These two functions should probably do some more error-checks, such as is.null(d).)
A sample run:
set.seed(2)
dims <- 5:2
a <- array(sample(prod(dims)), dim=dims)
Quick show:
a[,,1,1:2,drop=FALSE]
# , , 1, 1
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# , , 1, 2
# [,1] [,2] [,3] [,4]
# [1,] 118 25 14 93
# [2,] 65 21 16 77
# [3,] 87 82 3 38
# [4,] 92 12 78 17
# [5,] 49 4 75 80
The transformation:
m <- array2matrix(a)
dim(m)
# [1] 30 4
head(m)
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# [6,] 67 47 1 54
Proof of reversibility:
identical(matrix2array(m), a)
# [1] TRUE
EDIT 3, "WRAP UP of all code"
Creating fake data:
dims <- c(5,4,2)
(a <- array(seq(prod(dims)), dim=dims))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 25 30 35 40
(m <- array2matrix(a))
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 23 28 33 38
# [9,] 24 29 34 39
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
The random-swapping of rows. I'm using 50% here.
pct <- 0.5
nr <- nrow(m)
set.seed(3)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 8 4 3 9
(ind2 <- sample(ind1))
# [1] 3 2 9 8 4
m[ind1,] <- m[ind2,]
m
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 2 7 12 17
# [9,] 4 9 14 19
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
(Note that I pre-made ind1 and ind2 here, mostly to see what was going on internally. You can replace m[ind2,] with m[sample(ind1),] for the same effect.)
BTW: if we had instead used a seed of 2, we would notice that 2 rows are not swapped:
set.seed(2)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 7 5 10 6
(ind2 <- sample(ind1))
# [1] 6 2 5 10 7
Because of this, I chose a seed of 3 for demonstration. However, this may give the appearance of things not working. Lacking more controlling code, sample does not ensure that positions change: it is certainly reasonable to expect that "randomly swap rows" could randomly choose to move row 2 to row 2. Take for example:
set.seed(267)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 3 6 5 7 2
(ind2 <- sample(ind1))
# [1] 3 6 5 7 2
The first randomly chooses five rows, and then reorders them randomly into an unchanged order. (I suggest that if you want to force that they are all movements, you should ask a new question asking about just forcing a sample vector to change.)
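For what it is worth, one simple way to force every selected index to move is to reshuffle until no element stays in place (a derangement). The helper below is a hypothetical sketch, not part of the code above; it uses plain rejection sampling, so it retries a few times on average.

```r
# Hypothetical helper: shuffle x until no element keeps its position.
derange <- function(x) {
  stopifnot(length(x) > 1, !anyDuplicated(x))
  repeat {
    y <- x[sample.int(length(x))]   # shuffle by position, safe for any values
    if (all(y != x)) return(y)
  }
}

set.seed(267)
ind1 <- c(3, 6, 5, 7, 2)
ind2 <- derange(ind1)
any(ind1 == ind2)
# [1] FALSE
```

Using m[ind1,] <- m[ind2,] with such an ind2 would then move every selected row.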
Anyway, we can regain the original dimensionality with the second function:
(a2 <- matrix2array(m))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 2 7 12 17
# [4,] 4 9 14 19
# [5,] 25 30 35 40
In the first plane of the array, rows 1 and 5 are unchanged; in the second plane, rows 1, 2, and 5 are unchanged. Five rows the same, five rows moved around (but otherwise unchanged within each row).
I want to iterate the following matrix and print sets of 2 cell values. Is there a way to do this without a for-loop?
Input:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
Expected Output:
[,1] [,2]
[1,] 1 4
[2,] 7 10
[3,] 13 16
[4,] 2 5
[5,] 8 11
[6,] 14 17
[7,] 3 6
[8,] 9 12
[9,] 15 18
This is my code:
mat<-matrix(data=seq(1:18), nrow=3,ncol=6)
r <- rep(seq(1,3),each=2)
c1 <- seq(1,6,2)
c2 <- seq(2,6,2)
m <- mat[r,c(c1:c2)] # This does not work, it only output first two cells
We can transpose the matrix, so that its values are read row by row, and then refill a new matrix by row, specifying ncol:
matrix(t(mat), ncol=2, byrow=TRUE)
# [,1] [,2]
# [1,] 1 4
# [2,] 7 10
# [3,] 13 16
# [4,] 2 5
# [5,] 8 11
# [6,] 14 17
# [7,] 3 6
# [8,] 9 12
# [9,] 15 18
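To see why this works, it may help to look at the intermediate vector: t(mat) stores the values of mat in row order, and matrix(..., byrow = TRUE) then cuts that vector into consecutive chunks of ncol values. The chunk size of 3 below is only an extra illustration and is not from the question.

```r
mat <- matrix(1:18, nrow = 3, ncol = 6)

# Values of mat read row by row.
row_order <- as.vector(t(mat))
row_order
# [1]  1  4  7 10 13 16  2  5  8 11 14 17  3  6  9 12 15 18

# Chunks of 2 (as in the question) and, for comparison, chunks of 3.
pairs   <- matrix(t(mat), ncol = 2, byrow = TRUE)
triples <- matrix(t(mat), ncol = 3, byrow = TRUE)
triples[1:2, ]
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]   10   13   16
```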
I want to repeatedly divide a set into two complementary subsets of known size and keep them as the columns of two matrices. For example, assume the main set is {1, 2, ..., 10}, the size of the first sample is 8 and I want to repeat the sampling 3 times. I want to have:
[,1] [,2] [,3]
[1,] 10 9 1
[2,] 8 1 10
[3,] 3 7 5
[4,] 4 2 3
[5,] 1 8 8
[6,] 6 4 2
[7,] 9 5 7
[8,] 5 10 6
and
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 7 6 9
Any idea how to implement it in R avoiding for loops?
I would use replicate + sample, like this:
set.seed(1) # Just so you can replicate my results
A <- replicate(3, sample(10, 8, FALSE)) # Change 3 to the number of replications
A
# [,1] [,2] [,3]
# [1,] 3 7 8
# [2,] 4 1 9
# [3,] 5 2 4
# [4,] 7 8 6
# [5,] 2 5 7
# [6,] 8 10 2
# [7,] 9 4 3
# [8,] 6 6 1
For the other set, I would use apply + setdiff, like this:
B <- apply(A, 2, function(x) setdiff(1:10, x))
B
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 10 9 10
Another option, as suggested by @thelatemail (and which would be more efficient), is to use replicate to create full permutations of the set and then use basic subsetting to split them into your two matrices.
A <- replicate(3, sample(10))
B <- A[-(seq_len(8)), ]
A <- A[seq_len(8), ]
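A quick sanity check that the two matrices really are complementary in every replication; the intermediate name perm and the check itself are additions for illustration:

```r
set.seed(1)
perm <- replicate(3, sample(10))   # each column is a permutation of 1:10
A <- perm[seq_len(8), ]            # first 8 values of each permutation
B <- perm[-seq_len(8), ]           # remaining 2 values

# Every column of A and B together should contain 1:10 exactly once.
complementary <- all(apply(rbind(A, B), 2, function(col) setequal(col, 1:10)))
complementary
# [1] TRUE
```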
I'm a beginner R user and I need to write a function that sums the rows of a data frame over a fixed interval (every 4 rows).
I've tried the following code
camp<-function(X){
i<-1
n<-nrow(X)
xc<-matrix(nrow=36,ncol=m)
for (i in 1:n){
xc<-apply(X[i:(i+4),],2,sum)
rownames(xc[i])<-rownames(X[i])
i<-i+5
}
return(xc)
}
The result is "Error in X[i:(i + 4), ] : index out of range".
How can I solve this? Any suggestions?
Thanks.
The zoo package has rollapply which is pretty handy for stuff like this...
# Make some data
set.seed(1)
m <- matrix( sample( 10 , 32 , replace = TRUE ) , 8 )
# [,1] [,2] [,3] [,4]
#[1,] 3 7 8 3
#[2,] 4 1 10 4
#[3,] 6 3 4 1
#[4,] 10 2 8 4
#[5,] 3 7 10 9
#[6,] 9 4 3 4
#[7,] 10 8 7 5
#[8,] 7 5 2 6
# Sum every 4 rows
require( zoo )
tmp <- rollapply( m , width = 4 , by = 4 , align = "left" , FUN = sum )
# [,1] [,2] [,3] [,4]
#[1,] 23 13 30 12
#[2,] 29 24 22 24
You can also use rowSums() on the result if you actually want to aggregate the columns into a single value for each block of 4 rows...
rowSums( tmp )
#[1] 78 99
Here is a way to do it :
## Sample data
m <- matrix(1:36, nrow=12)
## Create a "group" index
fac <- (seq_len(nrow(m))-1) %/% 4
## Apply sum
apply(m, 2, function(v) tapply(v, fac, sum))
Sample data :
[,1] [,2] [,3]
[1,] 1 13 25
[2,] 2 14 26
[3,] 3 15 27
[4,] 4 16 28
[5,] 5 17 29
[6,] 6 18 30
[7,] 7 19 31
[8,] 8 20 32
[9,] 9 21 33
[10,] 10 22 34
[11,] 11 23 35
[12,] 12 24 36
Result :
[,1] [,2] [,3]
0 10 58 106
1 26 74 122
2 42 90 138
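Base R also has rowsum(), which sums rows by group in a single call. A minimal sketch of the function from the question rewritten around it follows; the block size argument k is an addition for illustration, and a trailing incomplete block, if there is one, is simply summed as its own group.

```r
# Sum the rows of X in consecutive blocks of k rows.
camp <- function(X, k = 4) {
  grp <- (seq_len(nrow(X)) - 1) %/% k   # 0,0,0,0,1,1,1,1,...
  rowsum(as.matrix(X), grp)
}

m <- matrix(1:36, nrow = 12)
res <- camp(m)
res
#   [,1] [,2] [,3]
# 0   10   58  106
# 1   26   74  122
# 2   42   90  138

# Works on a data frame with numeric columns as well.
res_df <- camp(as.data.frame(m))
```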