I want to create a function which helps characterise the results of some simulations. For the purposes of this post, let the simulation function be:
example_sim <- function(time = 100, npops = 5) {
  result <- data.frame(matrix(NA, nrow = time, ncol = npops))
  colnames(result) <- LETTERS[1:npops]
  for (i in 1:npops) {
    sim <- sample.int(time, time)
    result[, i] <- sim
    result[, i] <- result[, i] * i
  }
  return(result)
}
This creates a data frame with varying length and width based on the number of populations (npops) and the time simulated.
I want to create a function which takes the output of such simulations and characterises the mean and variance for each population over a number of simulations (nsims).
So far I have managed to get it working for two populations with the following code:
library("matrixStats")
library("reshape2")
ensembles <- function(nsims=10, time = 100, npops = 2){
result_N.A <- data.frame(matrix(NA, nrow = time, ncol = nsims))
result_N.B <- data.frame(matrix(NA, nrow = time, ncol = nsims))
for( i in 1:(nsims)){
simulation_with_2pops <- example_sim(time=100,npops=2)
result_N.A[,i] <- simulation_with_2pops[,1]
result_N.B[,i] <- simulation_with_2pops[,2]
}
output <- simulation_with_2pops
for( j in 1:params$ntime){
output$meanA[j] <- rowMeans(result_N.A[j,])
}
for( j in 1:params$ntime){
output$meanB[j] <- rowMeans(result_N.B[j,])
}
for( j in 1:params$ntime){
output$varA[j] <- rowVars(as.matrix(result_N.A[j,]))
}
for( j in 1:params$ntime){
output$varB[j] <- rowVars(as.matrix(result_N.B[j,]))
}
return(output)
}
ensembles_output<- ensembles(nsims = 10)
ensembles_output
To fully implement the function for any number of populations I would need another for loop in which I create and update the result_N.A-style objects (presumably called something like result[i]).
I have also thought about creating a three-dimensional object (time, npops, nsims) and taking slices of it to calculate the mean and variance, but I haven't had much success yet.
I am not married to this approach and am very open to other recommendations.
Eventually I would like to extend the code so that the covariance and correlation are also calculated for two populations specified in the parameters (for instance population A and population E). If you have any ideas on the implementation I would be very grateful to hear them.
Thank you for considering this problem.
I think using a multidimensional array is a very good idea in this case.
First, you can get the simulations of example_sim() much more cheaply using replicate() and mapply(). Here is an example with time=10, npops=3 and nsims=3; use the same set.seed(42) and parameters and check for yourself.
I use much smaller parameters here so that you can easily verify the result by hand.
set.seed(42)
nsims <- 3
sim <- replicate(nsims, mapply(\(time, i) sample.int(time, time) * i, 10, 1:3))
sim
# , , 1
#
# [,1] [,2] [,3]
# [1,] 1 16 27
# [2,] 5 14 30
# [3,] 10 8 9
# [4,] 8 2 12
# [5,] 2 10 15
# [6,] 4 20 18
# [7,] 6 4 3
# [8,] 9 12 6
# [9,] 7 18 24
# [10,] 3 6 21
#
# , , 2
#
# [,1] [,2] [,3]
# [1,] 3 10 18
# [2,] 1 8 6
# [3,] 2 4 12
# [4,] 6 16 9
# [5,] 10 6 30
# [6,] 8 2 15
# [7,] 4 20 27
# [8,] 5 14 21
# [9,] 7 12 24
# [10,] 9 18 3
#
# , , 3
#
# [,1] [,2] [,3]
# [1,] 10 8 18
# [2,] 8 18 6
# [3,] 5 6 27
# [4,] 1 16 3
# [5,] 7 10 24
# [6,] 4 12 15
# [7,] 6 20 30
# [8,] 2 4 9
# [9,] 9 2 12
# [10,] 3 14 21
Next, I believe you want to gather row-wise statistics across the simulations for each population column A, B, C, ... . Here you basically want apply(., MARGIN = 1:2, FUN). Just for the mean there is rowMeans(., dims = 2L), which is faster.
rowMeans(sim, dims=2L)
# [,1] [,2] [,3]
# [1,] 4.666667 11.333333 21
# [2,] 4.666667 13.333333 14
# [3,] 5.666667 6.000000 16
# [4,] 5.000000 11.333333 8
# [5,] 6.333333 8.666667 23
# [6,] 5.333333 11.333333 16
# [7,] 5.333333 14.666667 20
# [8,] 5.333333 10.000000 12
# [9,] 7.666667 10.666667 20
# [10,] 5.000000 12.666667 15
apply(sim, 1:2, var)
# [,1] [,2] [,3]
# [1,] 22.333333 17.333333 27
# [2,] 12.333333 25.333333 192
# [3,] 16.333333 4.000000 93
# [4,] 13.000000 65.333333 21
# [5,] 16.333333 5.333333 57
# [6,] 5.333333 81.333333 3
# [7,] 1.333333 85.333333 219
# [8,] 12.333333 28.000000 63
# [9,] 1.333333 65.333333 48
# [10,] 12.000000 37.333333 108
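The question also asked for the covariance and correlation between two chosen populations (for instance A and E). With the array layout above this is just another row-wise sweep; the following is a minimal sketch (not part of the code above), picking populations 1 and 3:
# Sketch: row-wise covariance and correlation between populations 1 and 3,
# computed across the nsims slices of the array.
cov13 <- sapply(seq_len(nrow(sim)), function(r) cov(sim[r, 1, ], sim[r, 3, ]))
cor13 <- sapply(seq_len(nrow(sim)), function(r) cor(sim[r, 1, ], sim[r, 3, ]))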
I'm not sure, however, why you use simulation_with_2pops for your final output, since it is just the result of the last iteration of the for (i in 1:nsims) loop. Anyway, I hope this helps you further.
Note: R >= 4.1 used.
So I have a 1256 by 5 matrix.
> head(retmatx12.30.3)
              AMT          HON            KO          NEM         NVAX
[1,] -0.006673489 -0.001292867 -0.0033654493 -0.034023392 -0.023255737
[2,]  0.004447249  0.002848406  0.0082009877  0.016436786  0.007936468
[3,]  0.001789891  0.002754232 -0.0035886573  0.009529404  0.031496102
[4,] -0.003479321  0.002231823  0.0024011113  0.046052588  0.007633549
[5,] -0.006605786  0.015159190 -0.0002394852 -0.031446425  0.037878788
[6,] -0.002375004 -0.008267790 -0.0100625938 -0.001694084  0.036496350
I want to apply a function I've made to rows 1-126, then 2-127, and so on. The function is a block of matrix algebra that uses a matrix and a few vectors. Is it wise to somehow break the larger matrix into 1,131 matrices of 126 rows by 5 columns and apply the function over each (hopefully all at once)? Or is there some application of apply?
Any help is greatly appreciated. Thanks.
The actual numbers in the matrix are immaterial, so I'll use much smaller data to demonstrate one method, and a simple function to demonstrate the rolling calculation:
m <- matrix(1:24, nrow=8)
somefunc <- function(x) x %*% seq(ncol(x))
wid <- 4 # 126
somefunc(m[1:4,])
# [,1]
# [1,] 70
# [2,] 76
# [3,] 82
# [4,] 88
somefunc(m[2:5,])
# [,1]
# [1,] 76
# [2,] 82
# [3,] 88
# [4,] 94
The actual rolling work:
lapply(seq(nrow(m) - wid + 1), function(i) somefunc(m[i - 1 + seq(wid),]))
# [[1]]
# [,1]
# [1,] 70
# [2,] 76
# [3,] 82
# [4,] 88
# [[2]]
# [,1]
# [1,] 76
# [2,] 82
# [3,] 88
# [4,] 94
# [[3]]
# [,1]
# [1,] 82
# [2,] 88
# [3,] 94
# [4,] 100
# [[4]]
# [,1]
# [1,] 88
# [2,] 94
# [3,] 100
# [4,] 106
# [[5]]
# [,1]
# [1,] 94
# [2,] 100
# [3,] 106
# [4,] 112
where the first element of the output is from rows 1-4, the second from rows 2-5, then 3-6, and so on.
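Since somefunc() returns a vector of fixed length here, you can also collect the rolling results directly into a matrix with sapply() instead of a list; this is just a small variation on the lapply() call above, where column j holds the result for the window starting at row j:
# Same rolling windows as above, but sapply() simplifies the results
# into a wid-by-(nrow(m) - wid + 1) matrix, one column per window.
sapply(seq(nrow(m) - wid + 1), function(i) somefunc(m[i - 1 + seq(wid), ]))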
I have a matrix called sectorCoor which contains 18 lat/long coordinates. The number of coordinates depends on another variable, so the matrix can have anywhere from 6 to 36 rows, and the row count will always be a multiple of 6. Depending on the size of sectorCoor, I would like to divide it into blocks of 6 rows and insert the variable siteCoor at the very start of the matrix, then again after the first 6 rows, then take the next 6 rows, and so on until all the blocks of 6 have been processed. The desired layout is shown below.
Suggestions are greatly appreciated.
siteCoor,
first 6 lon lat coordinates
siteCoor
siteCoor
Next 6 lon lat coordinates
siteCoor
siteCoor
Next 6 lon lat coordinates
siteCoor
> siteCoor
[,1] [,2]
[1,] 152.7075 -27.7027
> sectorCoor
lon lat
[1,] 152.7075 -27.70270
[2,] 152.6983 -27.68203
[3,] 152.7028 -27.68085
[4,] 152.7075 -27.68046
[5,] 152.7122 -27.68085
[6,] 152.7167 -27.68203
[7,] 152.7209 -27.68394
[8,] 152.7322 -27.70592
[9,] 152.7311 -27.71000
[10,] 152.7291 -27.71382
[11,] 152.7264 -27.71724
[12,] 152.7230 -27.72015
[13,] 152.7190 -27.72243
[14,] 152.6920 -27.72015
[15,] 152.6886 -27.71724
[16,] 152.6858 -27.71382
[17,] 152.6839 -27.71000
[18,] 152.6828 -27.70592
[19,] 152.6825 -27.70173
I would preallocate a matrix of the correct size, and separately populate the siteCoor and sectorCoor rows. We can use the initial data vector passed to matrix() to populate the siteCoor rows, and then use an index-assignment to populate the sectorCoor rows.
res <- matrix(siteCoor,nrow(sectorCoor)+nrow(sectorCoor)%/%6L*2L,2L,byrow=T);
res[c(F,rep(T,6L),F),] <- sectorCoor;
res;
## [,1] [,2]
## [1,] -1 -2
## [2,] 1 19
## [3,] 2 20
## [4,] 3 21
## [5,] 4 22
## [6,] 5 23
## [7,] 6 24
## [8,] -1 -2
## [9,] -1 -2
## [10,] 7 25
## [11,] 8 26
## [12,] 9 27
## [13,] 10 28
## [14,] 11 29
## [15,] 12 30
## [16,] -1 -2
## [17,] -1 -2
## [18,] 13 31
## [19,] 14 32
## [20,] 15 33
## [21,] 16 34
## [22,] 17 35
## [23,] 18 36
## [24,] -1 -2
In the above I use a short logical vector to subscript the sectorCoor rows of res. R recycles the vector across all the rows of res, achieving the required periodicity of the storage pattern.
Data
N <- 3L;
sectorCoor <- matrix(seq_len(N*6L*2L),ncol=2L);
siteCoor <- matrix(c(-1,-2),ncol=2L);
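As a cross-check on the same demo data (a sketch, not part of the answer above), the layout can also be built by splitting sectorCoor into 6-row blocks and wrapping each block in siteCoor rows with rbind():
## sketch: wrap each 6-row block of sectorCoor in siteCoor rows and stack the pieces
nblk <- nrow(sectorCoor)%/%6L;
res2 <- do.call(rbind,lapply(seq_len(nblk),function(k)
    rbind(siteCoor,sectorCoor[((k-1L)*6L+1L):(k*6L),,drop=FALSE],siteCoor)));
all.equal(res,res2);
## [1] TRUE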
Probably it will be an easy one; I just can't get my head around it today.
How can I combine 2 columns of the same matrix in such a way that element 1 from column 1 of the original matrix will be followed by element 1 from column 2 and so on? E.g. the original matrix may look like the one below:
set.seed(200)
m <- matrix(sample(1:100, 10, replace=FALSE), ncol=2, byrow=TRUE, dimnames=NULL)
m
[,1] [,2]
[1,] 54 58
[2,] 99 68
[3,] 65 80
[4,] 67 9
[5,] 49 22
What I would like to achieve should look like this:
[,1]
[1,] 54
[2,] 58
[3,] 99
[4,] 68
[5,] 65
[6,] 80
[7,] 67
[8,] 9
[9,] 49
[10,] 22
How do I then transform the original matrix to achieve the arrangement shown in the second matrix? Of course it's only an example, not a real data. Thanks for your help.
You can use c or as.vector on the transpose (t) of your matrix, like this:
c(t(m))
# [1] 54 58 99 68 65 80 67 9 49 22
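A quick check (added here for illustration) that as.vector() on the transpose gives exactly the same result as c():
identical(as.vector(t(m)), c(t(m)))
# [1] TRUE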
Wrap it in matrix() again if you want a single-column matrix like the one you show (or, as noted in the comments, you can skip the c or as.vector at this stage, since you are not supplying any dimensions to the matrix you are creating).
matrix(c(t(m)))
# [,1]
# [1,] 54
# [2,] 58
# [3,] 99
# [4,] 68
# [5,] 65
# [6,] 80
# [7,] 67
# [8,] 9
# [9,] 49
# [10,] 22
Now I'm doing it by looping through a sorted vector, but maybe there is a faster way using built-in R functions, and maybe I don't even need to sort.
vect <- c(41, 42, 5, 6, 3, 12, 10, 15, 2, 3, 4, 13, 2, 33, 4, 1, 1)
vect <- sort(vect)
print(vect)
outvect <- mat.or.vec(length(vect), 1)
outvect[1] <- counter <- 1
for (i in 2:length(vect)) {
  if (vect[i] != vect[i - 1]) { counter <- counter + 1 }
  outvect[i] <- counter
}
print(cbind(vect, outvect))
vect outvect
[1,] 1 1
[2,] 1 1
[3,] 2 2
[4,] 2 2
[5,] 3 3
[6,] 3 3
[7,] 4 4
[8,] 4 4
[9,] 5 5
[10,] 6 6
[11,] 10 7
[12,] 12 8
[13,] 13 9
[14,] 15 10
[15,] 33 11
[16,] 41 12
[17,] 42 13
The code is used to make charts with integers on the X axis instead of the real data, because for me the distance between the X values is not important.
So in my case the smallest X value is always 1 and the largest is always equal to the number of distinct X values.
-- Edit: due to some misunderstanding about my question, I have added self-sufficient code with its output.
That's clearer. Hence:
> vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
> cbind(vect,as.numeric(factor(vect)))
[1,] 41 12
[2,] 42 13
[3,] 5 5
[4,] 6 6
[5,] 3 3
[6,] 12 8
[7,] 10 7
[8,] 15 10
[9,] 2 2
[10,] 3 3
[11,] 4 4
[12,] 13 9
[13,] 2 2
[14,] 33 11
[15,] 4 4
[16,] 1 1
[17,] 1 1
No sort needed. And, as said, see also ?factor.
and if you want to preserve the order, then:
> cbind(vect,as.numeric(factor(vect,levels=unique(vect))))
vect
[1,] 41 1
[2,] 42 2
[3,] 5 3
[4,] 6 4
[5,] 3 5
[6,] 12 6
[7,] 10 7
[8,] 15 8
[9,] 2 9
[10,] 3 5
[11,] 4 10
[12,] 13 11
[13,] 2 9
[14,] 33 12
[15,] 4 10
[16,] 1 13
[17,] 1 13
Joris's solution is right on, but if you have long vectors, it is a bit (3x) more efficient to use match and unique:
> x=sample(1e5, 1e6, replace=TRUE)
> # preserve order:
> system.time( a<-cbind(x, match(x, unique(x))) )
user system elapsed
0.20 0.00 0.22
> system.time( b<-cbind(x, as.numeric(factor(x,levels=unique(x)))) )
user system elapsed
0.70 0.00 0.72
> all.equal(a,b)
[1] TRUE
>
> # sorted solution:
> system.time( a<-cbind(x, match(x, sort(unique(x)))) )
user system elapsed
0.25 0.00 0.25
> system.time( b<-cbind(x, as.numeric(factor(x))) )
user system elapsed
0.72 0.00 0.72
> all.equal(a,b)
[1] TRUE
You can try this:
(Note that you may want different behaviour for repeated values; this will give each value a unique rank.)
> x <- sample(size=10, replace=T, x=1:100)
> x1 <- vector(length=length(x))
> x1[order(x)] <- 1:length(x)
> cbind(x, x1)
x x1
[1,] 40 1
[2,] 46 4
[3,] 43 3
[4,] 41 2
[5,] 47 5
[6,] 84 10
[7,] 75 8
[8,] 60 7
[9,] 59 6
[10,] 80 9
It looks like you are counting runs in the data; if that is the case, look at the rle() function.
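For example (a sketch based on the OP's data, not part of the original answer), rle() on the sorted vector gives the run lengths, and repeating the run index reproduces outvect:
# Sketch: rebuild outvect from the run lengths of the sorted vector.
vect <- sort(c(41, 42, 5, 6, 3, 12, 10, 15, 2, 3, 4, 13, 2, 33, 4, 1, 1))
r <- rle(vect)
rep(seq_along(r$lengths), r$lengths)
# [1]  1  1  2  2  3  3  4  4  5  6  7  8  9 10 11 12 13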
You apparently want the results of something like table(), but lined up next to the values. Try using the ave() function:
csvdata$counts <- ave(csvdata[, "X"], factor(csvdata[["X"]]), FUN=length)
The trick here is that the syntax of ave() is a bit different from tapply(): you pass in an arbitrarily long set of factor arguments, and you need to put FUN= in front of the function, because the arguments after the dots (...) are not matched by position; they need to be named.
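The question's csvdata isn't shown, so here is a hypothetical toy data frame, just to illustrate what the ave() call produces:
# Hypothetical toy data, only to illustrate the ave() call above.
csvdata <- data.frame(X = c(5, 3, 5, 7, 3, 5))
csvdata$counts <- ave(csvdata[, "X"], factor(csvdata[["X"]]), FUN = length)
csvdata
#   X counts
# 1 5      3
# 2 3      2
# 3 5      3
# 4 7      1
# 5 3      2
# 6 5      3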