How to efficiently produce a desired matrix in R?

I was trying to produce the following 7 x 4 matrix in R:
m = matrix(c(seq(25, 1, by = -4),
             seq(26, 2, by = -4),
             seq(27, 3, by = -4),
             seq(28, 4, by = -4)), nrow = 7, ncol = 4)
But I'm wondering: could I achieve the same matrix with more efficient R code than what I used above?

Here's a solution:
m <- matrix(rev(1:28), nrow = 7, ncol = 4, byrow = TRUE)[, rev(1:4)]
And this one is even faster:
m <- matrix(28:1, nrow = 7, ncol = 4, byrow = TRUE)[, 4:1]

m = matrix(rep(seq(25, 1, by = -4), 4) + rep(0:3, each = 7), nrow = 7, ncol = 4)
Not sure if you would call this more efficient...
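One way to check the "even faster" claim is to time the constructions directly. A minimal sketch, assuming the microbenchmark package is installed (the names original/byrow_rev/rep_offset are just labels for the expressions above):
library(microbenchmark)
microbenchmark(
  original   = matrix(c(seq(25, 1, by = -4), seq(26, 2, by = -4),
                        seq(27, 3, by = -4), seq(28, 4, by = -4)),
                      nrow = 7, ncol = 4),
  byrow_rev  = matrix(28:1, nrow = 7, ncol = 4, byrow = TRUE)[, 4:1],
  rep_offset = matrix(rep(seq(25, 1, by = -4), 4) + rep(0:3, each = 7),
                      nrow = 7, ncol = 4)
)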

Related

Process sets of rasters in parallel using lapp function from terra package

I have groups of rasters that I want to run a function on, probably using the lapp function from the {terra} package. Here is a simple example, using toy data, of the 'style' of thing I am hoping to accomplish.
library("terra")
rp10val = 106520
rp20val = 106520
rp50val = 154250
rp100val = 154250
rp200val = 154250
rp500val = 154250
rp1500val = 154250
sopval = 200
rp_10_vul = rast(nrow = 10, ncol = 10, vals = rep(rp10val, 10))
rp_20_vul = rast(nrow = 10, ncol = 10, vals = rep(rp20val, 10))
rp_50_vul = rast(nrow = 10, ncol = 10, vals = rep(rp50val, 10))
rp_100_vul = rast(nrow = 10, ncol = 10, vals = rep(rp100val, 10))
rp_200_vul = rast(nrow = 10, ncol = 10, vals = rep(rp200val, 10))
rp_500_vul = rast(nrow = 10, ncol = 10, vals = rep(rp500val, 10))
rp_1500_vul = rast(nrow = 10, ncol = 10, vals = rep(rp1500val, 10))
sop_tile = rast(nrow = 10, ncol = 10, vals = rep(sopval, 10))
input_raster_group <- c(rp_10_vul, rp_20_vul, rp_50_vul, rp_100_vul,
                        rp_200_vul, rp_500_vul, rp_1500_vul, sop_tile)
## In the real world, each of these lists would hold rasters with different data in them
input_raster_lists <- list(list(input_raster_group),
                           list(input_raster_group),
                           list(input_raster_group))
mcmapply(lapp,
         input_raster_lists,
         function(a, b, c, d, e, f, g, h) {a + b + c + d + e + f + g + h},
         mc.cores = 2)
## If working on Windows, it may be better to run this mapply version as a proof of concept
# mapply(lapp,
#        input_raster_lists,
#        function(a, b, c, d, e, f, g, h) {(a + b - c) / (d + e + f + g + h)})
Simplified data, to make this easier to read:
library("terra")
r10 = rast(nrow = 10, ncol = 10, vals = 10)
r20 = rast(nrow = 10, ncol = 10, vals = 20)
r50 = rast(nrow = 10, ncol = 10, vals = 50)
group <- c(r10, r20, r50)
input <- list(group, group, group)
You can use lapply to compute the lists sequentially:
x <- lapply(input, \(i) sum(i))
y <- lapply(input, \(i) app(i, sum))
z <- lapply(input, \(i) lapp(i, function(a,b,c){a+b+c}))
To use parallelization you could use e.g. parallel::parLapply or, as in your case, parallel::mcmapply.
SpatRaster objects hold a pointer (reference) to a C++ object that cannot be passed to a worker. Therefore you would need to use wrap and unwrap as I show below. I use proxy=TRUE so that the values are not forced into memory.
library(parallel)
inp <- lapply(input, \(x) wrap(x, proxy=TRUE))
f <- \(i) { unwrap(i) |> sum() |> wrap(proxy=TRUE)}
b <- mcmapply(f, inp)
out <- lapply(b, unwrap)
This approach may be useful in some cases, e.g. when you have to do many simulations on a relatively small raster that is in memory.
In most cases you would do parallelization because you are dealing with large rasters that are on disk. In that case, you could just send the filenames to the workers, and create the SpatRasters there (and write the output to disk).
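For that last pattern, here is a minimal sketch of sending filenames rather than SpatRasters to the workers (the file paths and the sum() computation are placeholders, not part of the original question):
library(parallel)
library(terra)
## hypothetical input files; each element is one group of rasters on disk
files <- list(c("group1_rp10.tif", "group1_rp20.tif", "group1_rp50.tif"),
              c("group2_rp10.tif", "group2_rp20.tif", "group2_rp50.tif"))
f <- function(fs) {
  r <- rast(fs)                         # build the SpatRaster on the worker
  out <- app(r, sum)                    # the actual computation
  outfile <- tempfile(fileext = ".tif")
  writeRaster(out, outfile)             # write the result to disk
  outfile                               # return a filename, not a SpatRaster
}
outfiles <- mclapply(files, f, mc.cores = 2)
results <- lapply(outfiles, rast)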
There is more discussion here.

Multidimensional vectorization of a for loop in R for two 2D arrays? I tried multiApply

I am trying to perform a multidimensional vectorization in R instead of using a for loop. I have two 2D matrices, A and W, that I pass to crit.func(A, W).
The original for loop effectively iterates over versions of A and W:
for (current.couple in 1:nrow(couples)) {
  a_current <- current.rows[-which(current.rows == couples$current[current.couple])]
  a_candidate <- couples$candidate[current.couple]
  A <- A.use[c(a_current, a_candidate), ]
  W <- W.use[c(a_current, a_candidate), c(a_current, a_candidate)]
  couples$D[current.couple] <- crit.func(A, W)
}
What I would like to do instead for speed is create a vectorized version. My idea is to stack all versions of A and W to form two 3D arrays and then use the 3rd dimension, the depth, as the vectorized dimension. For example, let's say I have the following A and W matrices:
A1 <- matrix(c(2.4, 5.2, 8.4, 3.1, 6.05, 9.25), nrow = 2,ncol = 3, byrow = TRUE)
A2 <- matrix(c(4.5, 7.5, 10.5, 3.2, 6.2, 9.2), nrow = 2, ncol = 3, byrow = TRUE)
A3 <- matrix(c(2.1, 5, 8.2, 3.05, 6.02, 9.1), nrow = 2,ncol = 3, byrow = TRUE)
A4 <- matrix(c(4.12, 7.31, 10.3, 3.23, 6.1, 9), nrow = 2, ncol = 3, byrow = TRUE)
W1 <- matrix(c(1, 4, 2, 5), nrow = 2, ncol = 2, byrow = TRUE)
W2 <- matrix(c(9, 6, 8, 5), nrow = 2, ncol = 2, byrow = TRUE)
W3 <- matrix(c(1, 4.2, 2.2, 5.2), nrow = 2, ncol = 2, byrow = TRUE)
W4 <- matrix(c(9.05, 6.011, 8.3, 5.2), nrow = 2, ncol = 2, byrow = TRUE)
I would then form the 3D arrays with:
# Z stack all of the A options
A_append <- array(c(A1, A2, A3, A4), c(2, 3, 4))
# Z stack all of the W options
W_append <- array(c(W1, W2, A3, A4), c(2, 2, 4))
If crit.func() computes a determinant, so that:
crit.func <- function(A, W) {
  return(det(t(A) %*% W %*% A))
}
The expected result for a vectorized solution will be:
[2.095476e-12, 0, -7.067261e-12, 7.461713e-12].
What I have tried to do is use the multiApply package:
library(multiApply)
A_append <- provideDimnames(A_append, sep = "_", base = list('row', 'col', 'lev'))
W_append <- provideDimnames(W_append, sep = "_", base = list('row', 'col', 'lev'))
# multiApply
D <- Apply(data = list(A_append, W_append), target_dims = c(1, 2, NULL),
           margins = 3, fun = crit.func)$output1
but I do not get the correct output (see below). I believe that passing list(A_append, W_append) as I did is not giving the behavior I want, and that I somehow have to name the dimensions differently, since I get the following warning:
"Guessed names for some unnamed dimensions of equal length found across different inputs in 'data'. Please check carefully the assumed names below are correct, or provide dimension names for safety, or disable the parameter guess_dim_names."
Input 1:
_unnamed_dim_1_ _unnamed_dim_2_ _unnamed_dim_3_
2 3 4
Input 2:
_unnamed_dim_1_ _unnamed_dim_4_ _unnamed_dim_3_
2 2 4
[1] "The output of multiApply:"
[1] 2.095476e-12 0.000000e+00 4.562232e-12 -1.450281e-11
Does anybody know of a better way to vectorize this for loop to get the expected behavior? Or can you see how to change the arguments I provided to multiApply's Apply() so that (A_append[, , i], W_append[, , i]) is correctly passed to crit.func()?
It may be simpler to use lists to store your matrices:
A <- list(A1, A2, A3, A4)
W <- list(W1, W2, W3, W4)
mapply(crit.func, A, W)
# [1] 1.850935e-12 0.000000e+00 6.025116e-12 -8.291046e-13
These numbers do not match your expected values, but they seem to be correct for your data:
crit.func(A1, W1)
# [1] 1.850935e-12
crit.func(A2, W2)
# [1] 0
crit.func(A3, W3)
# [1] 6.025116e-12
crit.func(A4, W4)
# [1] -8.291046e-13
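If you do want to keep the stacked 3D arrays from the question, a minimal base R alternative (it does not use multiApply, just explicit indexing of the depth dimension) would be:
sapply(seq_len(dim(A_append)[3]),
       function(i) crit.func(A_append[, , i], W_append[, , i]))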

Unlist LAST level of a list in R

I have a list of lists, ll:
ll <- list(a = list(data.frame(c = 1, d = 2), data.frame(h = 3, j = 4)),
           b = list(data.frame(c = 5, d = 6), data.frame(h = 7, j = 9)))
I want to unnest/unlist the last level of the structure (the interior list). Note that every list contains the same structure. I want to obtain lj:
lj <- list(a = data.frame(c = 1, d = 2, h = 3, j = 4),
           b = data.frame(c = 5, d = 6, h = 7, j = 9))
I have tried the following code without any success:
lj_not_success <- unlist(ll, recursive = F)
However, this code unlists the FIRST level, not the LAST one.
Any clue?
We may need to cbind the inner list elements instead of unlisting them, as the expected output is also a list of data.frames:
ll_new <- lapply(ll, function(x) do.call(cbind, x))
Checking:
> identical(lj, ll_new)
[1] TRUE
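If you already use the tidyverse, a roughly equivalent sketch (assuming purrr and dplyr are installed; note that bind_cols may return tibbles rather than plain data.frames, so identical() against lj need not hold):
library(purrr)
library(dplyr)
lj_tidy <- map(ll, bind_cols)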

Correlation between variables under the for loop

I have an issue, shown below, that I tried to solve without success. I have a data frame df1 and need to build a table of correlations between its variables within a for loop, because I do not want the code to look long and complicated.
df1 <- structure(list(a = c(1, 2, 3, 4, 5), b = c(3, 5, 7, 4, 3),
                      c = c(3, 6, 8, 1, 2), d = c(5, 3, 1, 3, 5)),
                 class = "data.frame", row.names = c(NA, -5L))
I tried the code below, using two for loops:
fv <- as.data.frame(combn(names(df1), 2, paste, collapse = "&"))
colnames(fv) <- "ColA"
fv$ColB <- sapply(strsplit(fv$ColA, "\\&"), '[', 1)
fv$ColC <- sapply(strsplit(fv$ColA, "\\&"), '[', 2)
asd <- list()
for (i in fv$ColB) {
  for (j in fv$ColC) {
    asd[i, j] <- as.data.frame(cor(df1[, i], df1[, j]))
  }
}
May I know what I am doing wrong?
We can apply cor directly on the data.frame and convert the result to 'long' format with melt. As the values in the lower triangular part mirror those in the upper triangular part, either one can be set to NA before melting:
library(reshape2)
out <- cor(df1)
out[lower.tri(out, diag = TRUE)] <- NA
melt(out, na.rm = TRUE)
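A base R sketch of the same idea, in case you prefer to avoid the reshape2 dependency (as.data.frame(as.table(...)) plays the role of melt here):
out <- cor(df1)
out[lower.tri(out, diag = TRUE)] <- NA
res <- as.data.frame(as.table(out))   # gives Var1, Var2, Freq columns
res[!is.na(res$Freq), ]               # drop the lower triangle and diagonal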

Generate data matrix using the given data matrix in R

Suppose I have a 200 x 200 matrix generated with the following R code.
n = 200
sim.mat = matrix(NA, nrow = n, ncol = n)
for (i in 1:n) {
  sim.mat[, i] = c(rnorm(i, 2, 1), rnorm(n - i, 3, 1))
}
How can I generate a 200 x 1000 matrix using the same c(rnorm(i, 2, 1), rnorm(n - i, 3, 1)) setting?
Thank you.
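One possible reading, and it is only an assumption since the question does not say how i should behave once the column index exceeds n = 200, is to recycle the column pattern so that column j uses i = ((j - 1) %% n) + 1:
n <- 200
m <- 1000
sim.mat2 <- matrix(NA, nrow = n, ncol = m)
for (j in 1:m) {
  i <- ((j - 1) %% n) + 1                      # recycle i over 1..n
  sim.mat2[, j] <- c(rnorm(i, 2, 1), rnorm(n - i, 3, 1))
}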
