R function with loop append avoiding for (using lapply instead)

I have heard that using for loops in R is discouraged, mainly because they are slow, and that I should use lapply instead because it calls C code for efficiency.
Question: Would it be possible to show me how to transform the following example into efficient lapply code (or any other member of the apply family, such as sapply)?
myFun <- function(loop){
  result <- data.frame() # initialise an empty data frame
  for (iteration in 1:loop){
    # note: the 1e6 generated values are recycled to fill each 10000 x 10000 matrix
    generateRnorm1 <- matrix(data = rnorm(n = 1000000), nrow = 10000, ncol = 10000)
    generateRnorm2 <- matrix(data = rnorm(n = 1000000), nrow = 10000, ncol = 10000)
    iterationResult <- sum(generateRnorm1, generateRnorm2)
    bindIterationResult <- cbind(iteration, iterationResult)
    result <- rbind(result, bindIterationResult)
  }
  return(result)
}
test <- myFun(loop = 10)

Here is an lapply method:
myFun2 <- function(iteration){
  generateRnorm1 <- matrix(data = rnorm(n = 1000000), nrow = 10000, ncol = 10000)
  generateRnorm2 <- matrix(data = rnorm(n = 1000000), nrow = 10000, ncol = 10000)
  cbind(iteration, iterationResult = sum(generateRnorm1, generateRnorm2))
}
# run function over 1:10
myList <- lapply(seq.int(10), myFun2)
# rbind the resulting list
result2 <- do.call(rbind, myList)
Note that there isn't much (if any) speed increase, because the body of your function takes a long time to execute; that swamps any potential speed-up from lapply.
On my computer, both methods take about 20 seconds to run.
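For what it's worth, here is a quick way to check that claim yourself. This is just a sketch, not from the original answer: it shrinks the matrices so the loop overhead itself becomes visible, and it assumes the microbenchmark package is installed.
library(microbenchmark)
# much smaller matrices, so rnorm() no longer dominates the runtime
smallFun <- function(iteration) {
  m1 <- matrix(rnorm(100), nrow = 10)
  m2 <- matrix(rnorm(100), nrow = 10)
  cbind(iteration, iterationResult = sum(m1, m2))
}
microbenchmark(
  for_loop = {res <- data.frame(); for (i in 1:100) res <- rbind(res, smallFun(i))},
  lapply   = do.call(rbind, lapply(1:100, smallFun)),
  times = 20
)
With a heavy loop body, the two approaches time out essentially the same; the growing rbind() inside the for loop is what eventually hurts, not the loop itself.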

Related

Process sets of rasters in parallel using lapp function from terra package

I have groups of rasters that I want to run a function on, probably using the lapp function from the {terra} package. Here is a simple example, using toy data, of the style of thing I am hoping to accomplish:
library("terra")
rp10val = 106520
rp20val = 106520
rp50val = 154250
rp100val = 154250
rp200val = 154250
rp500val = 154250
rp1500val = 154250
sopval = 200
rp_10_vul = rast(nrow = 10, ncol = 10, vals = rep(rp10val, 10))
rp_20_vul = rast(nrow = 10, ncol = 10, vals = rep(rp20val, 10))
rp_50_vul = rast(nrow = 10, ncol = 10, vals = rep(rp50val, 10))
rp_100_vul = rast(nrow = 10, ncol = 10, vals = rep(rp100val, 10))
rp_200_vul = rast(nrow = 10, ncol = 10, vals = rep(rp200val, 10))
rp_500_vul = rast(nrow = 10, ncol = 10, vals = rep(rp500val, 10))
rp_1500_vul = rast(nrow = 10, ncol = 10, vals = rep(rp1500val, 10))
sop_tile = rast(nrow = 10, ncol = 10, vals = rep(sopval, 10))
input_raster_group <- c(rp_10_vul, rp_20_vul, rp_50_vul, rp_100_vul,
                        rp_200_vul, rp_500_vul, rp_1500_vul, sop_tile)
## In the real world each of these lists would hold rasters with different data
input_raster_lists <- list(list(input_raster_group),
                           list(input_raster_group),
                           list(input_raster_group))
mcmapply(lapp,
         input_raster_lists,
         function(a, b, c, d, e, f, g, h){a + b + c + d + e + f + g + h},
         mc.cores = 2)
## If working on Windows, this might be better to run as a proof of concept
# mapply(lapp,
#        input_raster_lists,
#        function(a, b, c, d, e, f, g, h){(a + b - c) / (d + e + f + g + h)})
Here is simplified data to make this easier to read:
library("terra")
r10 = rast(nrow = 10, ncol = 10, vals = 10)
r20 = rast(nrow = 10, ncol = 10, vals = 20)
r50 = rast(nrow = 10, ncol = 10, vals = 50)
group <- c(r10, r20, r50)
input <- list(group, group, group)
You can use lapply to process the lists sequentially:
x <- lapply(input, \(i) sum(i))
y <- lapply(input, \(i) app(i, sum))
z <- lapply(input, \(i) lapp(i, function(a,b,c){a+b+c}))
To use parallelization you could use e.g. parallel::parLapply or, as in your case, parallel::mcmapply.
SpatRaster objects hold a pointer (reference) to a C++ object that cannot be passed to a worker. Therefore you need to use wrap and unwrap, as I show below. I use proxy=TRUE to avoid forcing the values into memory.
library(parallel)
inp <- lapply(input, \(x) wrap(x, proxy=TRUE))
f <- \(i) { unwrap(i) |> sum() |> wrap(proxy=TRUE)}
b <- mcmapply(f, inp)
out <- lapply(b, unwrap)
This approach may be useful in some cases, e.g. when you have to run many simulations on a relatively small raster that is in memory.
In most cases, though, you parallelize because you are dealing with large rasters that are on disk. In that case you could just send the filenames to the workers and create the SpatRasters there (and write the output to disk), as in the sketch below.
There is more discussion here
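As an illustration of that last point, here is a minimal sketch of the filename-based pattern (my own, not from the original answer); infiles and outfiles are hypothetical character vectors of paths to on-disk rasters:
library(parallel)
library(terra)
## Hypothetical: infiles/outfiles are vectors of raster file paths
process_file <- function(infile, outfile) {
  r   <- rast(infile)                       # build the SpatRaster on the worker
  out <- app(r, sum)                        # whatever per-group computation you need
  writeRaster(out, outfile, overwrite = TRUE)
  outfile                                   # return only the path, never the raster
}
# result_files <- mcmapply(process_file, infiles, outfiles, mc.cores = 2)
Because only character strings cross the process boundary, no wrap/unwrap is needed.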

build matrix in a for loop automatically in R

Suppose I have code like this:
probv=c(0.5,0.1,0.2,0.3)
N=c(1,2,3,4)
g1=matrix(rbinom(n = 10, size = N[1], prob = probv[1]), nrow=5)
g2=matrix(rbinom(n = 10, size = N[2], prob = probv[2]), nrow=5)
g3=matrix(rbinom(n = 10, size = N[3], prob = probv[3]), nrow=5)
g4=matrix(rbinom(n = 10, size = N[4], prob = probv[4]), nrow=5)
I want to use a for loop
for i in (1:J)
{......} J=4 in this case
use one line function to return the same output like this, I want to know
how I create a matrix g_ in the loop
which is also benefit for me when I increase the length
of my vector into 5,6,7......
for example N=c(1,2,3,4,5) probv=c(0.5,0.1,0.2,0.3,0.5)
I do not change my code to create another matrix called g5.The code can create it and I just need to change my input to achieve my goal
Thanks @akrun. What if my N is a three-dimensional array and I want to map over its last dimension? How does the Map method change?
probv=c(0.5,0.1,0.2,0.3)
N=array(1:24,c(3,2,4))
g1=matrix(rbinom(n = 10, size = N[,,1], prob = probv[1]), nrow=5)
g2=matrix(rbinom(n = 10, size = N[,,2], prob = probv[2]), nrow=5)
g3=matrix(rbinom(n = 10, size = N[,,3], prob = probv[3]), nrow=5)
g4=matrix(rbinom(n = 10, size = N[,,4], prob = probv[4]), nrow=5)
We can use Map to loop over the 'N' and 'probv' vectors, pass the corresponding values to rbinom, and create a matrix. It returns a list of matrices:
lst1 <- Map(function(x, y) matrix(rbinom(n = 10, size = x, prob = y),
                                  nrow = 5), N, probv)
Or using a for loop:
lst2 <- vector('list', length(N))
for (i in seq_along(N)) {
  lst2[[i]] <- matrix(rbinom(n = 10, size = N[i], prob = probv[i]), nrow = 5)
}
names(lst2) <- paste0("g", seq_along(lst2))
For the updated question, to extract from the array:
mnLength <- min(length(probv), dim(N)[3])
lst2 <- vector('list', mnLength)
for (i in seq_len(mnLength)) {
  lst2[[i]] <- matrix(rbinom(n = 10, size = N[,, i], prob = probv[i]), nrow = 5)
}
names(lst2) <- paste0("g", seq_along(lst2))
lst2$g1
lst2$g2
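For symmetry with the first answer, the array case can also be written with Map. A sketch (my own, not from the thread) that first splits N along its last dimension:
## split the array along its third dimension, then Map over slices and probs
slices <- lapply(seq_len(dim(N)[3]), function(i) N[,, i])
lst3 <- Map(function(x, y) matrix(rbinom(n = 10, size = x, prob = y), nrow = 5),
            slices, probv)
names(lst3) <- paste0("g", seq_along(lst3))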

arima.sim() function with varying: sample sizes, phi values and sd values

I want to simulate ARIMA(1,1,0) with varying:
sample sizes
phi values
standard deviation values.
I admire how the R code below simulates just one ARIMA(1,1,0) series, and I want to follow its format to simulate many ARIMA(1,1,0) series with varying sample sizes, phi values and standard deviation values:
wn <- rnorm(10, mean = 0, sd = 1)
ar <- wn[1:2]
for (i in 3:10){
  ar <- arima.sim(n = 10, model = list(ar = -0.7048, order = c(1, 1, 0)),
                  start.innov = 4.1, n.start = 1, innov = wn)
}
I asked a similar question here and was given a good answer, but now I see that the arima.sim() function is indispensable for simulating ARIMA time series, so I want to incorporate it into my style of simulation.
I came up with the trial below, which uses the arima.sim() function to simulate N = c(15, 20) ARIMA(1,1,0) time series with varying sample sizes, standard deviation values and phi values. It first generates N random numbers, uses the first two of them as the first two values of the series, and then makes the 3rd through nth values follow ARIMA(1,1,0).
Here is what I have tried:
N <- c(15L, 20L)
SD <- c(1, 2) ^ 2
phi <- c(0.2, 0.4)
res <- vector('list', length(N))
names(res) <- paste('N', N, sep = '_')
set.seed(123L)
for (i in seq_along(N)){
  res[[i]] <- vector('list', length(SD))
  names(res[[i]]) <- paste('SD', SD, sep = '_')
  ma <- matrix(NA_real_, nrow = N[i], ncol = length(phi))
  for (j in seq_along(SD)){
    wn <- rnorm(N[i], mean = 0, sd = SD[j])
    ar[[1:2, ]] <- wn[[1:2]]
    for (k in 3:N[i]){
      ar[k, ] <- arima.sim(n = N[[i]], model = list(ar = phi[[k]], order = c(1, 1, 0)),
                           start.innov = 4.1, n.start = 1, innov = wn)
    }
    colnames(ar) <- paste('ar_theta', phi, sep = '_')
    res[[i]][[j]] <- ar
  }
}
res1 <- lapply(res, function(dat) do.call(cbind, dat))
sapply(names(res1), function(nm) write.csv(res1[[nm]],
       file = paste0(nm, ".csv"), row.names = FALSE, quote = FALSE))
The last two lines write the time series data to .csv files in my working directory.
Here is a possible method using Map. Please edit your post to include the expected output if this does not meet your requirements.
N <- c(15L, 20L)
SD <- c(1, 2) ^ 2
phi <- c(0.2, 0.4)
## generate all combos
all_combos <- expand.grid(N = N, SD = SD, phi = phi)
## create function
fx_arima <- function(n, SD, phi) {
  arima.sim(n = n,
            model = list(ar = phi, order = c(1, 1, 0)),
            start.innov = 4.1,
            n.start = 1,
            rand.gen = function(n) rnorm(n, mean = 0, sd = SD))[-1L]
}
## find arima for all combos using Map
set.seed(123L)
res <- Map(fx_arima, all_combos[["N"]], all_combos[["SD"]], all_combos[["phi"]])
## or with a little more work, grouped by N:
set.seed(123L)
res2 <- by(all_combos, all_combos["N"],
           function(DF) {
             res <- mapply(fx_arima, DF[["N"]], DF[["SD"]], DF[["phi"]])
             colnames(res) <- paste("SD", DF[["SD"]], "phi", DF[["phi"]], sep = "_")
             res
           })
res2
## write to csv
Map(function(file, DF) write.csv(DF, paste0("N_", file, ".csv")), names(res2), res2)
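One optional addition (my own suggestion, not part of the original answer): label each element of res with its parameter combination, so individual series stay traceable after the fact:
## name each simulated series after its N / SD / phi combination
names(res) <- with(all_combos, paste("N", N, "SD", SD, "phi", phi, sep = "_"))
str(res[1:2])  # inspect the first two labelled series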

Improve the performance of this script

Here is a piece of my code that I use in an R Shiny application, but it costs a lot of time because I execute it in a reactive function that I then call five times for different graphics.
Do you have an idea how to improve the speed of this script?
I have already tried to run it with purrr, but I do not master that tool well enough.
Here is a reproducible example:
library(profvis)
profvis({
  # data frame created for the example
  DF <- data.frame("scan" = seq(1:7518), "dye1" = NA, "dye2" = NA, "dye3" = NA,
                   "dye4" = NA, "dye5" = NA, "dye6" = NA)
  DF$dye1 <- sample(100, size = nrow(DF), replace = TRUE)
  DF$dye2 <- sample(100, size = nrow(DF), replace = TRUE)
  DF$dye3 <- sample(100, size = nrow(DF), replace = TRUE)
  DF$dye4 <- sample(100, size = nrow(DF), replace = TRUE)
  DF$dye5 <- sample(100, size = nrow(DF), replace = TRUE)
  DF$dye6 <- sample(100, size = nrow(DF), replace = TRUE)
  # slowness begins here
  for (d in 3000:7518){
    # array of input data
    input <- numeric(1206)
    for (i in -100:100){
      input[1 + i + 100]    <- DF$dye1[d + i]
      input[202 + i + 100]  <- DF$dye2[d + i]
      input[403 + i + 100]  <- DF$dye3[d + i]
      input[604 + i + 100]  <- DF$dye4[d + i]
      input[805 + i + 100]  <- DF$dye5[d + i]
      input[1006 + i + 100] <- DF$dye6[d + i]
    }
  }
})
First: please seriously reconsider what you want to achieve and whether this approach is the smartest way to achieve it.
Second: use vectorization to improve your performance:
d <- 3000
input <- numeric(1206)
microbenchmark::microbenchmark(
  # loop as before
  case1 = {for (i in -100:100){
    input[1 + i + 100] <- DF$dye1[d + i]
  }},
  # use vectorization
  case2 = {input[(1 - 100 + 100):(1 + 100 + 100)] <- DF$dye1[(d - 100):(d + 100)]}
)
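Extending case2 to all six dyes at once is straightforward. Here is a sketch (my own, assuming the same 201-sample window and the block offsets from the question):
## vectorized fill of all six 201-element blocks for one scan position d
window <- (d - 100):(d + 100)              # same 201 samples as the inner loop
starts <- c(1, 202, 403, 604, 805, 1006)   # block offsets from the question
dyes   <- paste0("dye", 1:6)
input  <- numeric(1206)
for (k in seq_along(dyes)) {
  input[starts[k]:(starts[k] + 200)] <- DF[[dyes[k]]][window]
}
## or with no explicit loop at all:
input2 <- unlist(lapply(dyes, function(nm) DF[[nm]][window]), use.names = FALSE)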

Methods for iteratively changing matrices in a set in R

Currently I am working on an R project where I iteratively have to make many small changes to the final output, which is stored in a self-made class. The calculation time becomes very large as the number of iterations increases. Unfortunately, a more vectorized version of the code is not possible, because the future values to be changed depend on the current changes.
A small example of the problem I encounter is given below. The calculations using Example_Class take significantly longer than those using just a matrix. Are there methods available to speed up these calculations in R? Or should I look at an extension to, for example, C++?
Example_Class <- setClass(
  "Example",
  slots = c(
    slot_1 = "matrix",
    slot_2 = "matrix"
  ),
  prototype = list(
    slot_1 = matrix(1, ncol = 1, nrow = 7),
    slot_2 = matrix(1, ncol = 4, nrow = 7)
  )
)
Example <- Example_Class()
example_matrix_1 <- matrix(1, ncol = 1, nrow = 7)
example_matrix_2 <- matrix(1, ncol = 4, nrow = 7)
example_list <- list(example_matrix_1, example_matrix_2)
profile <- microbenchmark::microbenchmark(
  example_matrix_2[3, 3] <- (example_matrix_2[3, 3] + 1) / 2,
  example_list[[2]][3, 3] <- (example_list[[2]][3, 3] + 1) / 2,
  Example@slot_2[3, 3] <- (Example@slot_2[3, 3] + 1) / 2,
  times = 1000
)
profile
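One common workaround (a sketch of my own, not a definitive fix) is to pull the matrix out of the S4 object before the loop and write it back once at the end, since each @<- assignment copies the object and re-checks the slot:
## update a plain matrix inside the loop, store it in the S4 slot once
m <- Example@slot_2                 # extract the slot a single time
for (it in 1:1000) {
  m[3, 3] <- (m[3, 3] + 1) / 2      # plain matrix assignment, no S4 overhead
}
Example@slot_2 <- m                 # write back after the loop
If even that is too slow, moving the inner loop to C++ via Rcpp is the usual next step.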
