I am trying to convert a for loop that I currently use to process a large matrix. The loop finds the maximum value within each 30 x 30 section and stores it in a new, smaller matrix.
The current code for the for loop looks like this:
mat <- as.matrix(CHM) # CHM is the original raster image
maxm <- matrix(nrow=nrow(mat)/30, ncol=ncol(mat)/30) # create new matrix with new dimensions
for(i in 1:dim(maxm)[1]) {
  for(j in 1:dim(maxm)[2]) {
    row <- 30 * (i - 1) + 1
    col <- 30 * (j - 1) + 1
    maxm[i,j] <- max(CHM[row:(row + 29), col:(col + 29)])
  }
}
I want to convert this to a foreach loop to use parallel processing. I've got as far as producing the following code, but it doesn't work. I'm not sure how to produce the new matrix within the foreach loop:
ro<-nrow(mat)/30
co<-ncol(mat)/30
maxm <- matrix(nrow=nrow(mat)/30, ncol=ncol(mat)/30)
foreach(i=ro, .combine='cbind') %:%
  foreach(j=co, .combine='c') %dopar% {
    row <- 30 * (i - 1) + 1
    col <- 30 * (j - 1) + 1
    maxm[i,j] <- (max(CHM[row:(row + 29), col:(col + 29)]))
  }
Any suggestions please!
Prior to performing any action in parallel, one should check whether any vectorization is possible. Once that is done, the question becomes 'is parallelization reasonable?'
In this specific example, parallelization is unlikely to be as fast as you expect, because at each iteration you are saving your output into a common object. R does not generally support this in parallelization, and instead one should look for so-called 'embarrassingly parallel' problems, until one gets a better understanding of how parallel problems work. In short: don't perform parallel changes to data in R, unless you know what you're doing. It is unlikely to be faster.
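For instance, the 30 x 30 block maximum requested here can be computed with no explicit loop at all. A minimal sketch of my own (assuming the matrix dimensions are exact multiples of 30; block_max is a hypothetical helper name):
# Fold the matrix into a 4-d array and take the max over the within-block
# dimensions; column-major storage makes the folding line up with the blocks.
block_max <- function(m, k = 30) {
  nr <- nrow(m) %/% k
  nc <- ncol(m) %/% k
  a <- array(m, dim = c(k, nr, k, nc))
  apply(a, c(2, 4), max)  # an nr x nc matrix of block maxima
}
maxm <- block_max(mat)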
That said, in your case it actually becomes quite tricky. You are performing a max over non-overlapping 30 x 30 windows, and the output should be saved in a combined matrix. An alternative to saving the data directly into the matrix is to return a matrix with 3 columns x, i, j, where the latter two are indices that indicate which row/column the value of x should be placed in.
In order for this to work, as Dmitriy noted in his answer, the data needs to be exported to each cluster node (parallel session), so that we can use it. The following example shows how one can perform the parallelization.
First: Create a cluster and export the dataset
set.seed(1)
#Generate test example
n <- 3000
dat <- matrix(runif(n^2), ncol = n)
library(foreach)
library(doParallel)
#Create cluster
cl <- parallel::makeCluster(parallel::detectCores())
#Register it for the foreach loop
doParallel::registerDoParallel(cl)
#Export the dataset (could be done directly in the foreach, but this is more explicit)
parallel::clusterExport(cl, "dat")
Next we come to the foreach loop. Note that according to the documentation, nested foreach loops should be separated using the %:% operator, as shown in my example below:
output <- foreach(i = 1:(nrow(dat)/30), .combine = rbind, .inorder = FALSE) %:%
  foreach(j = 1:(ncol(dat)/30), .combine = rbind, .inorder = FALSE) %dopar% {
    row <- 30 * (i - 1) + 1
    col <- 30 * (j - 1) + 1
    c(x = max(dat[row:(row + 29), col:(col + 29)]), i = i, j = j)
  }
Note the .inorder = FALSE. As I return the indices, I don't care about order, only about speed.
Last but not least, we need to create the matrix. The Matrix package function Matrix::sparseMatrix allows for specifying values and indices.
output <- Matrix::sparseMatrix(output[,"i"], output[,"j"], x = output[,"x"])
This is still rather slow. For n = 3000 it took roughly 6 seconds to perform the calculations, plus a not-insignificant overhead from exporting the data. But it is likely faster than the same method using sequential loops.
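When done, remember to stop the cluster so no worker sessions are left running:
parallel::stopCluster(cl)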
Let me try to get an answer here.
As far as I know, R uses a cluster system for parallel computation, and each node works in its own environment. So foreach-%dopar% first copies the current .globalEnv to each cluster node and then tries to run the code written in the loop body, with no copy back after execution. You only get a result via result = foreach(...) { }. So the code maxm[i,j]<-(max(CHM[row:(row + 29), col:(col + 29)])) changes only each node's local copy of your matrix, nothing more.
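A small sketch of my own that illustrates the point: assignments made on the workers never reach the master's copy, so results must be returned instead.
library(foreach)
library(doParallel)
doParallel::registerDoParallel(2)
x <- numeric(3)
invisible(foreach(i = 1:3) %dopar% { x[i] <- i })  # each worker edits its own copy
x                                                  # still 0 0 0 on the master
res <- foreach(i = 1:3, .combine = c) %dopar% i    # return values as results instead
res                                                # 1 2 3
doParallel::stopImplicitCluster()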
So, the "correct" code, probably, will be like this:
mat <- as.matrix(CHM)
ro <- nrow(mat) / 30
co <- ncol(mat) / 30
maxm <- foreach(i = 1:ro, .combine = 'cbind') %:%
  foreach(j = 1:co, .combine = 'c') %dopar% {
    row <- 30 * (i - 1) + 1
    col <- 30 * (j - 1) + 1
    max(mat[row:(row + 29), col:(col + 29)])
  }
It may also be necessary to apply as.matrix to maxm.
I'm trying to obtain the return on daily prices for each stock I have. The data is cross-sectional and very large, so I use doParallel and nested foreach.
Here is the code I've been using so far, as a reproducible example:
stock_name <- as.data.frame(sample(x = 1:100, size = 250, replace=TRUE))
price <- as.data.frame(sample(x = 1:100, size = 1000,replace=TRUE))
## Calculating daily returns.
stock_list<-as.tbl(distinct(stock_name))
numStock<- as.integer(count(stock_list)) #150 #as.integer(count(stock_list))
nCPUcores = detectCores()
if (nCPUcores < 3) {
registerDoSEQ()
}else{
cl = makeCluster(nCPUcores - 1)
registerDoParallel(cl)
}
d_ret<-c()
foreach (stock=1:numStock, .packages = c("doParallel","foreach","data.table","plyr","dplyr")) %dopar% {
  s <- as.integer(unlist(stock_list[stock,]))
  stock_price <- as.matrix(price[which(stock_name[1,]==s),])
  u <- nrow(stock_price)
  d_ret <- foreach (p=2:u) %:% {
    c(d_ret, (stock_price[p,]-stock_price[p-1,])/stock_price[p-1,])
  }
}
stopCluster(cl)
##--
But the code doesn't work. After Florian Prive's remark, I checked the library and it seems that I should write nested foreach loops like this:
x <- foreach(b=bvec, .combine='cbind') %:%
  foreach(a=avec, .combine='c') %dopar% {
    sim(a, b)
  }
So what I understand is I shouldn't be writing anything between %:% and the second foreach.
However, in my case, the second loop changes with the first foreach because there isn't the same number of prices for each stock. Therefore I can't just write foreach(a=avec).
The second foreach would ideally depend on variable u
u<-nrow(stock_price)
Is this even possible with the foreach library?
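To make the goal concrete, here is a sequential sketch of what I want to parallelize (mirroring my code above, with stock_name[, 1] as the column of stock IDs; note that the inner length u changes with each stock):
d_ret <- list()
for (stock in 1:numStock) {
  s <- as.integer(unlist(stock_list[stock, ]))
  stock_price <- as.matrix(price[which(stock_name[, 1] == s), ])
  u <- nrow(stock_price)
  if (u >= 2) {
    # daily return for each of the u - 1 consecutive price pairs
    d_ret[[stock]] <- (stock_price[2:u, ] - stock_price[1:(u - 1), ]) /
      stock_price[1:(u - 1), ]
  }
}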
Thank you for the help
A few days ago I asked a question about calling a custom-made function within a loop, which was nicely resolved by a combination of
eval(parse(text = Function text))
Here is the link: Automatic creation and use of custom made function in R.
This allowed me to work with a for loop and automatically call the function I need from a data frame storing the body of the function to create.
Now I would like to take the question to the next level. My problem is computation time. I need to evaluate something like 52 indices from a hyperspectral image; in R my hyperspectral image is loaded as a 3d array of 512x512x204 bands.
What I would like to do is run the evaluation of the indices in parallel to reduce the computation time.
Here is a dummy example of what I would like to emulate, but without parallel computing:
library(fields) # for image.plot
# create a fake array representing my hyperspectral image
HYPR_IMG=array(NA,dim=c(5,3,4))
HYPR_IMG[,,1]=1
HYPR_IMG[,,2]=2
HYPR_IMG[,,3]=3
HYPR_IMG[,,4]=4
image.plot(HYPR_IMG[,,1], zlim=c(0,20))
image.plot(HYPR_IMG[,,2], zlim=c(0,20))
image.plot(HYPR_IMG[,,3], zlim=c(0,20))
image.plot(HYPR_IMG[,,4], zlim=c(0,20))
#create a fake DF for simulating my indices stored in the dataframe
IDXname=c("IDX1","IDX2","IDX3","IDX4")
IDXFunc=c("HYPR_IMG[,,1] + 3*HYPR_IMG[,,2]",
"HYPR_IMG[,,3] + HYPR_IMG[,,2]",
"HYPR_IMG[,,4] + HYPR_IMG[,,2] - HYPR_IMG[,,3]",
"HYPR_IMG[,,1] + HYPR_IMG[,,4] + 4*HYPR_IMG[,,2] + HYPR_IMG[,,3]")
IDX_DF=as.data.frame(cbind(IDXname,IDXFunc))
# that was what I did before
Store_DF=data.frame(NA)
for (i in 1: length(IDX_DF$IDXname)) {
  IDX_ID=IDX_DF$IDXname[i]
  IDX_Fun_tmp=IDX_DF$IDXFunc[which(IDX_DF$IDXname==IDX_ID)] # extra care to select the right function
  IDXFunc_call=paste("IDXfun_tmp=function(HYPR_IMG){",IDX_Fun_tmp,"}",sep="")
  eval(parse(text = IDXFunc_call))
  IDX_VAL=IDXfun_tmp(HYPR_IMG)
  image.plot(IDX_VAL,zlim=c(0,20)); title(main=IDX_ID)
  temp_DF=as.vector(IDX_VAL)
  Store_DF=cbind(Store_DF,temp_DF)
  names(Store_DF)[i+1] <- as.vector(IDX_ID)
}
My final goal is to end up with the very same Store_DF, storing all the index values. Here I have a for loop, but using a foreach loop things should speed up. If it matters, I am working with Windows 8 or later as the OS.
Is it really possible?
Will I be able, in the end, to reduce the overall computation time while getting the same Store_DF data frame, or something similar like a matrix with column names?
Thanks a lot!!!
For this specific example, using either the built-in parallelization of a package like data.table or a parallel apply might be more beneficial.
Below is a minimal example of how to achieve the results using parApply from the parallel package. Note the output is a matrix, which actually yields slightly better performance in base R (not necessarily the case in the tidyverse or data.table). In case the data.frame structure is vital, you will have to convert it with as.data.frame.
cl <- parallel::makeCluster( parallel::detectCores() )
result <- parallel::parApply(cl = cl, X = IDX_DF, MARGIN = 1, FUN = function(x, IMAGES){
  IDX_ID <- x[["IDXname"]]
  eval(parse(text = paste0("IDXfun_tmp <- function(HYPR_IMG){", x[["IDXFunc"]], "}")))
  IDX_VAL <- as.vector(IDXfun_tmp(IMAGES))
  names(IDX_VAL) <- IDX_ID
  IDX_VAL
}, IMAGES = HYPR_IMG)
colnames(result) = IDXname
parallel::stopCluster(cl)
Please note the stopCluster(cl) which is important to shut down any loose R sessions.
Benchmark results (4 tiny cores):
Unit: milliseconds
expr min lq mean median uq max neval
Loop 8.420432 9.027583 10.426565 9.272444 9.943783 26.58623 100
Parallel 1.382324 1.491634 2.038024 1.554690 1.907728 18.23942 100
For replications of benchmarks the code has been provided below:
cl <- parallel::makeCluster( parallel::detectCores() )
microbenchmark::microbenchmark(
  Loop = {
    Store_DF = data.frame(NA)
    for (i in 1: length(IDX_DF$IDXname)) {
      IDX_ID = IDX_DF$IDXname[i]
      IDX_Fun_tmp = IDX_DF$IDXFunc[which(IDX_DF$IDXname == IDX_ID)] # extra care to select the right function
      eval(parse(text = paste0("IDXfun_tmp = function(HYPR_IMG){", IDX_Fun_tmp, "}")))
      IDX_VAL = IDXfun_tmp(HYPR_IMG)
      # Plotting in parallel is not a good idea: it will most often not work, and
      # might slow the R session down significantly or even crash it
      #image.plot(IDX_VAL, zlim = c(0,20)); title(main = IDX_ID)
      temp_DF = as.vector(IDX_VAL)
      Store_DF = cbind(Store_DF, temp_DF)
      names(Store_DF)[i+1] <- as.vector(IDX_ID)
    }
    rm(Store_DF)
  },
  Parallel = {
    result <- parallel::parApply(cl = cl, X = IDX_DF, MARGIN = 1, FUN = function(x, IMAGES){
      IDX_ID <- x[["IDXname"]]
      eval(parse(text = paste0("IDXfun_tmp <- function(HYPR_IMG){", x[["IDXFunc"]], "}")))
      IDX_VAL <- as.vector(IDXfun_tmp(IMAGES))
      names(IDX_VAL) <- IDX_ID
      IDX_VAL
    }, IMAGES = HYPR_IMG)
    colnames(result) = IDXname
    rm(result)
  }
)
parallel::stopCluster(cl)
Edit: Using the foreach package
After a few comments about performance issues (maybe due to memory), I decided to illustrate how one could obtain the same result using the foreach package. A few notes:
The foreach package uses iterators. As standard it can be used like a for loop, where it will iterate over each column in a data.frame
Like other parallel implementations in R, if you are on Windows you will often have to export the data used for the calculations. This can sometimes be avoided with some fiddling, and foreach will sometimes let you not export data. When this is the case is unclear from the documentation.
The output of the foreach will be combined either as a list or as defined by the .combine argument, which can be rbind, cbind or any other function.
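As a tiny illustration of the .combine argument (my own sketch, using %do% so it runs without a cluster):
library(foreach)
foreach(i = 1:3) %do% i                                   # a list of length 3
foreach(i = 1:3, .combine = c) %do% i                     # the vector 1 2 3
foreach(i = 1:3, .combine = rbind) %do% c(a = i, b = i^2) # a 3 x 2 matrix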
There are a lot of comments, making the code seem a lot longer than it actually is. Removing comments and blank lines, it is 9 lines longer.
Below is the code which will yield the same output as above. Note I have used the data.table package. For more information about this package I suggest their wiki on GitHub.
cl <- parallel::makeCluster( parallel::detectCores() )
#foreach uses doParallel for the parallelization
doParallel::registerDoParallel(cl)
#To iterate over the rows, we need to use iterators
# if foreach is given a matrix it will be converted to a column iterator
rowIterator <- iterators::iter(IDX_DF, by = "row")
library(foreach)
result <-
  foreach(
    # Supply the iterator
    row = rowIterator,
    # Specify whether the results need to be in order. If not, we can get better performance
    .inorder = FALSE,
    # In most foreach loops you will have to export the data you need for the calculations
    # it worked without exporting for me, and skipping the export is faster when the data is large
    #.export = c("HYPR_IMG"),
    # We need to say how the output should be merged. If nothing is given it will be output as a list
    # data.table's rbindlist is faster than rbind (returns a data.table)
    .combine = function(...) data.table::rbindlist(list(...)),
    # otherwise we could've used:
    #.combine = rbind
    # if we don't use rbind or cbind (I used data.table::rbindlist for speed)
    # we have to tell foreach that the combine function can take more than 1 argument
    .multicombine = TRUE
  ) %dopar% { # %do% runs the loop sequentially, %dopar% in parallel; %:% separates nested foreach loops
    IDX_ID <- row[["IDXname"]]
    eval(parse(text = paste0("IDXfun_tmp <- function(HYPR_IMG){", row[["IDXFunc"]], "}")))
    IDX_VAL <- as.vector(IDXfun_tmp(HYPR_IMG))
    data.frame(ID = IDX_ID, IDX_VAL)
  }
#output is saved in result
result
result_reformatted <- data.table::dcast(result[, indx := 1:.N, by = ID],
                                        indx ~ ID,
                                        value.var = "IDX_VAL")
#if we don't want to use data.table we could use unstack instead
unstack(as.data.frame(result), IDX_VAL ~ ID)
I have a very large list (huge_list). A function (inner_fun) is called for each value of the list; inner_fun takes around 0.5 seconds, and its output is a simple numeric vector of size 3. I am trying to parallelise this approach. After going through many articles, I read that it is better to divide the work into chunks when the parallel function is very quick. So I divided it based on cores, but there is no time benefit. I am not able to understand the concept here. Can anyone give a few insights on this? My major concern is whether I am doing something wrong with the code. I am not posting the exact code here, but I have tried to replicate it as much as possible:
A few observations:
dummy_fun and dummy_fun2 take around 10 hrs with parallel set to 11
with no parallel, this takes around 20 hrs
with parallel = 2, it completes in 15 hrs
I am using 12 cores and 60 GB RAM on an Ubuntu machine
Code to make the cluster
no_of_clusters <- detectCores() - 1
cl <- makeCluster(no_of_clusters)
registerDoParallel(cl)
clusterExport(cl, varlist = c("arg1", "arg2", "inner_fun"))
Function without chunks
dummy_fun <- function(arg1, arg2, huge_list) {
  g <- foreach(i = 1:length(huge_list), .combine = rbind,
               .multicombine = TRUE) %dopar% {
    inner_fun(i, arg1, arg2, huge_list[i])
  }
  return(g)
}
Function with chunks
dummy_fun2 <- function(arg1, arg2, huge_list) {
  il <- 1:length(huge_list)
  il2 <- split(il, ceiling(seq_along(il)/(length(il)/(detectCores()-1))))
  g <- foreach(i = il2, .combine = rbind, .multicombine = TRUE) %dopar% {
    ab1 <- lapply(i, function(li) {
      inner_fun(li, arg1, arg2, huge_list[li])
    })
    do.call(rbind, ab1)
  }
  return(g)
}
You got the chunks wrong. It's not about dividing the indices into chunks of length no_of_clusters, but rather dividing them into no_of_clusters chunks.
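To see the difference on a toy vector (my own illustration):
il <- 1:10
ncores <- 2
# chunks of length ncores -> 5 chunks of 2 elements
split(il, ceiling(seq_along(il) / ncores))
# ncores chunks -> 2 chunks of 5 elements, one per worker
split(il, sort(rep_len(seq_len(ncores), length(il))))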
Try this out:
dummy_fun2 <- function(arg1, arg2, huge_list, inner_fun, ncores) {
  cl <- parallel::makeCluster(ncores)
  doParallel::registerDoParallel(cl)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  L <- length(huge_list)
  inds <- split(seq_len(L), sort(rep_len(seq_len(ncores), L)))
  foreach(l = seq_along(inds), .combine = rbind) %dopar% {
    ab1 <- lapply(inds[[l]], function(i) {
      inner_fun(i, arg1, arg2, huge_list[i])
    })
    do.call(rbind, ab1)
  }
}
Further remarks:
it's often useless to use more than half of the cores you have on your computer.
the option .multicombine is automatically used with rbind, but the .maxcombine option is really important (you need it to be more than 100). Here we use lapply for the sequential part, so this remark doesn't matter.
it's useless to have many exports when using foreach, it already exports what is necessary from the environment of dummy_fun2.
are you sure you want to use huge_list[i] (get a list of one element) rather than huge_list[[i]] (get the i-th element of the list)?
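To illustrate that last point:
lst <- list(10, 20, 30)
lst[2]    # a list of length 1, containing 20
lst[[2]]  # the element itself: 20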
I am trying to translate a for loop to a loop using foreach.
I have tried several output methods, playing with the .combine argument, but I cannot output the two vectors that I create by first initializing them to hold 1e4 zeros and then refilling each entry at each iteration.
In particular, I cannot recover the vectors that are created in this way:
Va = numeric(1e4)
Vb = numeric(1e4)
result = foreach(j = 1:1e4, .multicombine=TRUE) %dopar% {
  ... rest of the code ...
  Va[j] = sample(4,1)
  Vb[j] = sample(5,1)
  list(retSLSP, retBH)
}
Note that j is the loop variable in the foreach loop. Note also that the computations I showed are not the actual computations I have in my code, but are equivalent for the purposes of the example.
You can use shared memory that all workers can access.
library(bigmemory)
V <- big.matrix(1e4, 2)
desc <- describe(V)
result = foreach(j = 1:1e4, .multicombine=TRUE) %dopar% {
  V <- bigmemory::attach.big.matrix(desc)
  ... rest of the code ...
  V[j, 1] = sample(4,1)
  V[j, 2] = sample(5,1)
  list(retSLSP, retBH)
}
Va <- V[, 1]
Vb <- V[, 2]
rm(V, desc)
It would be better, though, to parallelize by blocks rather than over the whole loop one iteration at a time.
An example: https://stackoverflow.com/a/45196081/6103040
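For completeness, a rough sketch of my own (untested) of the block idea applied here, where each task fills a contiguous block of rows of the shared matrix:
library(foreach)
blocks <- split(1:1e4, sort(rep_len(1:4, 1e4)))  # 4 blocks of 2500 indices
result <- foreach(idx = blocks) %dopar% {
  V <- bigmemory::attach.big.matrix(desc)
  for (j in idx) {
    # ... rest of the code ...
    V[j, 1] <- sample(4, 1)
    V[j, 2] <- sample(5, 1)
  }
  NULL  # results live in the shared matrix, nothing to combine
}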
I've looked through much of the documentation and done a fair amount of Googling, but can't find an answer to the following question: Is there a way to induce 'next-like' functionality in a parallel foreach loop using the foreach package?
Specifically, I'd like to do something like (this doesn't work with next but does without):
foreach(i = 1:10, .combine = "c") %dopar% {
  n <- i + floor(runif(1, 0, 9))
  if (n %% 3) {next}
  n
}
I realize I can nest my brackets, but if I want to have a few next conditions over a long loop this very quickly becomes a syntax nightmare.
Is there an easy workaround here (either next-like functionality or a different way of approaching the problem)?
You could put your code in a function and call return. It's not clear from your example what you want it to do when n %% 3 is nonzero, so I'll return NA.
funi <- function(i) {
  n <- i + floor(runif(1, 0, 9))
  if (n %% 3) return(NA)
  n
}
foreach(i = 1:10, .combine = "c") %dopar% { funi(i) }
Although it seems strange, you can use return in the body of a foreach loop, without the need for an auxiliary function (as demonstrated by @Aaron):
r <- foreach(i = 1:10, .combine='c') %dopar% {
  n <- i + floor(runif(1, 0, 9))
  if (n %% 3) return(NULL)
  n
}
A NULL is returned in this example since it is filtered out by the c function, which can be useful.
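That filtering is simply how c behaves with NULL arguments:
c(1, NULL, 2)
#> [1] 1 2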
Also, although it doesn't work well for your example, the when function can take the place of next at times, and is useful for preventing the computation from taking place at all:
r <- foreach(i=1:5, .combine='c') %:%
  foreach(j=1:5, .combine='c') %:%
  when(i != j) %dopar% {
    10 * i + j
  }
The inner expression is only evaluated 20 times, not 25. This is particularly useful with nested foreach loops, since when has access to all of the upstream iterator values.
Update
If you want to filter out NULLs when returning the results in a list, you need to write your own combine function. Here's a complete example that demonstrates a combine function that works like the default combine function but includes a filtering mechanism:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
filteredlist <- function(a, ...) {
  values <- list(...)
  c(a, values[! sapply(values, is.null)])
}
r <- foreach(i=1:200, .combine='filteredlist', .init=list(),
             .multicombine=TRUE) %dopar% {
  # filter out odd values of i
  if (i %% 2) return(NULL)
  i
}
Note that this code works correctly when there are more than 100 task results (100 is the default value of the .maxcombine option).
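As a quick sanity check on the result:
length(r)                 # 100: only the even values of i survive
all(unlist(r) %% 2 == 0)  # TRUE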