Using a simple sequential loop, I can do something like the following to monitor a long process in R
m <- matrix(rnorm(100*100), 100, 100)
for(i in 1:nrow(m)){
mean(m[i,])
cat("Iteration", i, '\n')
}
Suppose I run this same basic idea as follows
library(doParallel)
library(foreach)
m <- matrix(rnorm(1000*1000), 1000, 1000)
registerDoParallel(2)
foreach(i=1:nrow(m), .combine=rbind) %dopar%
mean(m[i,])
cat("Iteration", i, '\n')
Here the final cat() doesn't work as it does in the first example. Is there a way to capture the iteration progress when running things in parallel? I conceptually understand why such an indicator is not quite the same, but perhaps there are ways to monitor such issues when running big calculations.
Related
I'm modelling a quite big network with EpiModel in R, and the code takes very long to run, so I want to run it on multiple cores instead of just 1. I thought this was possible in EpiModel itself, but when I try it, my code just keeps running without starting the simulations. This is the code I am using:
library(EpiModel)
library(parallel)
nw <- network::network.initialize(n=6000, directed=FALSE)
formation <- ~edges + concurrent
target.stats<-c(1500, 600)
coef.diss <- dissolution_coefs(dissolution=~offset(edges), duration = 1)
est <- netest(nw, formation, target.stats, coef.diss)
dx<-netdx(est, nsims=10, nsteps=122, dynamic=FALSE, ncores=4)
init <- init.net(i.num=1, r.num=0)
param <-param.net(inf.prob=0.55, act.rate=0.6, rec.rate=0.05)
control<-control.net(type='SIR', nsteps= 122, nsims =10, ncores=4)
mainsim <- netsim(est, param, init, control)
plot(mainsim, y='si.flow')
When I set ncores to 1 it will run, but any other number doesn't work. Does anybody know how to solve this?
I run Bayesian statistical models with each chain on a separate processing node using the runjags package in R. I want to fit multiple models at onceby nesting run.jags calls in a parallel loop using the foreach package. However, this often results in error messages, likely because the foreach loop doesn't "know" that within the loop I am calling other parallel processes, and so cores are probably double-allocated (or something). Here is an example error message:
Error in { :
task 2 failed - "The following error was encountered while attempting to run the JAGS model:
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
cannot open the connection
Here is some example code to generate data and fit two models, that request 2 cores each (requiring a total of 4 cores, which I have on my laptop). I would love to find a solution that would allow me to run multiple parallel JAGS models, in parallel. In reality I am running 5-10 models at a time which each require 3 cores, on a cluster.
library(foreach)
library(runjags)
#generate a random variable, mean of 25, sd = 5.----
y.list <- list()
for(i in 1:2){
y.list[[i]] <- rnorm(100, 25, sd = 5)
}
#Specify a JAGS model to fit an intercept.----
jags.model = "
model{
for(i in 1:N){
y.hat[i] <- intercept
y[i] ~ dnorm(y.hat[i], tau)
}
#specify priors.
intercept ~ dnorm(0,1E-3)
tau <- pow(sigma, -2)
sigma ~ dunif(0, 100)
}
"
n.cores <- 4
registerDoParallel(n.cores)
#Fit models in parallel, with chains running in parallel.----
#two processes that each require two cores (4 cores are registered and all that is required.)
output <- list()
output <-
foreach(i = 1:length(y.list)) %dopar% {
#specify data object.
jd <- list(y=y.list[[i]], N = length(y.list[[i]]))
#fit model.
jags.out <- run.jags(jags.model,
data=jd,
n.chains=2,
monitor=c('intercept','tau'),
method='rjparallel')
#return output
return(jags.out)
}
I am unable to run your sample, but the following vignette should help you out.
You may want to try to use the foreach nesting operator %:%
https://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf
foreach(i = 1:length(y.list)) %:% {
#specify data object.
jd <- list(y=y.list[[i]], N = length(y.list[[i]]))
#fit model.
jags.out <- run.jags(jags.model,
data=jd,
n.chains=2,
monitor=c('intercept','tau'),
method='rjparallel')
#return output
return(jags.out)
}
There are two things to consider here- how to nest parallel foreach() loops in general, and how to solve this particular issue.
The solution to nesting parallel foreach() loops comes from #Carlos Santillan's answer below, and is a based on a vignette that can be found here. Lets say we have one inner loop nested within an outer loop, similar to the problem above, however instead of the parallel call to run.jags we have a parallel foreach() call:
outer_list <- list()
#begin outer loop:
outer_list <-
foreach(i = 1:length(some_index)) %:% {
#grab something to feed next foreach loop.
to_inner <- grab_data[[i]]
#Do something in a nested foreach loop.
inner_list <- list()
#begin inner loop:
inner_list <-
foreach(k = 1:some_index) %dopar% {
#do some other function.
out_inner <- some_function(to_inner)
return(out_inner)
}
out_outer <- some_function(out_inner)
return(out_outer)
}
The key is using the %:% operator in the outer loop, and the %dopar% operator in the inner loop.
This will not solve the above run.jags() nested parallel problem however, since it is not a nested foreach() loop. To solve this particular nested run.jags() problem I changed the method setting in run.jags to method=parallel instead of method=rjparallel. run.jags() has multiple different parallel implementations and this particular one seems to work based on my timing analyses. Hopefully in the future there will be a more definitive answer as to why this works. I just know that it does work.
I have question about future(), doFuture() usage.
I want to run N computations in parallel (using foreach ... %dopar%) - N is number of cores I have on my machine. To do so I use future:
library(doFuture)
registerDoFuture()
plan(multiprocess)
foreach(i = seq_len(N)) %dopar% {
foo <- rnorm(1e6)
}
This works like a charm as I run N computations in parallel. But I need to implement another analysis step that uses high number of cores (eg., N). This is how code looks like:
foreach(i = seq_len(N)) %dopar% {
foo <- rnorm(1e6)
write.table(foo, paste0("file_", i, ".txt"))
# This step uses high number of cores
system(paste0("head ", "file_", i, ".txt", " > ", "file_head_", i, ".txt")
}
I'm running multiple rnorm and head in parallel, but as head uses high number of cores (lets assume this) my analysis gets stuck.
Question:
How to run only part of code in parallel using future? (How to run only rnorm in parallel and then head sequential)? Is there any solution without using another loop for this? Or maybe I should switch to doSNOW or parallel?
PS:
My real code looks more like this:
library(doFuture)
library(dplyr)
registerDoFuture()
plan(multiprocess)
foreach(i = seq_len(N)) %dopar% {
step1(i) %>%
step2() %>%
step3() %>%
step4_RUN_SEQUENTIAL() %>% # I want to run this part not in parallel
step5() # I want to run this part again in parallel
}
Response to #Andrie comment:
future() is my way to perform parallel computing in R. I'm new to it and find it easiest to use (compared to eg parallel::mcapply). However, if it's possible to do what I want in doSNOW or parallel then I'm more than happy to switch
I'm aware of that, but I'm looking for a solution with single loop
I'm trying to figure out how to pass functions and packages to the boot() function when running parallel computations. It seems very expensive to load a package or define functions inside a loop. The foreach() function that I often use for other parallel tasks has a .packages and .export arguments that handles this (see this SO question) in a nice way but I can't figure out how to do this with the boot package.
Below is a meaningless example that shows what happens when switching to parallel:
library(boot)
myMean <- function(x) mean(x)
meaninglessTest <- function(x, i){
return(myMean(x[i]))
}
x <- runif(1000)
bootTest <- function(){
out <- boot(data=x, statistic=meaninglessTest, R=10000, parallel="snow", ncpus=4)
return(boot.ci(out, type="perc"))
}
bootTest()
Complains (as expected) about that it can't find myMean.
Sidenote: When running this example it runs slower than one-core, probably because splitting this simple task over multiple cores is more time consuming than the actual task. Why isn't the default to split into even job batches of R/ncpus - is there a reason why this isn't default behavior?
Update on the sidenote: As Steve Weston noted, the parLapply that boot() uses actually splits the job into even batches/chunks. The function is a neat wrapper for clusterApply:
docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,
fun, ...))
I'm a little surprised that this doesn't have a better performance when I scale up the the number of repetitions:
> library(boot)
> set.seed(10)
> x <- runif(1000)
>
> Reps <- 10^4
> start_time <- Sys.time()
> res <- boot(data=x, statistic=function(x, i) mean(x[i]),
+ R=Reps, parallel="no")
> Sys.time()-start_time
Time difference of 0.52335 secs
>
> start_time <- Sys.time()
> res <- boot(data=x, statistic=function(x, i) mean(x[i]),
+ R=Reps, parallel="snow", ncpus=4)
> Sys.time()-start_time
Time difference of 3.539357 secs
>
> Reps <- 10^5
> start_time <- Sys.time()
> res <- boot(data=x, statistic=function(x, i) mean(x[i]),
+ R=Reps, parallel="no")
> Sys.time()-start_time
Time difference of 5.749831 secs
>
> start_time <- Sys.time()
> res <- boot(data=x, statistic=function(x, i) mean(x[i]),
+ R=Reps, parallel="snow", ncpus=4)
> Sys.time()-start_time
Time difference of 23.06837 secs
I hope that this is only due to the very simple mean function and that more complex cases behave better. I must admit that I find it a little disturbing as the cluster initialization time should be the same in the 10.000 & 100.000 case, yet the absolute time difference increases and the 4-core version takes 5 times longer. I guess this must be an effect of the list merging, as I can't find any other explanation for it.
If the function to be executed in parallel (meaninglessTest in this case) has extra dependencies (such as myMean), the standard solution is to export those dependencies to the cluster via the clusterExport function. That requires creating a cluster object and passing it to boot via the "cl" argument:
library(boot)
library(parallel)
myMean <- function(x) mean(x)
meaninglessTest <- function(x, i){
return(myMean(x[i]))
}
cl <- makePSOCKcluster(4)
clusterExport(cl, 'myMean')
x <- runif(1000)
bootTest <- function() {
out <- boot(data=x, statistic=meaninglessTest, R=10000,
parallel="snow", ncpus=4, cl=cl)
return(boot.ci(out, type="perc"))
}
bootTest()
stopCluster(cl)
Note that once the cluster workers have been initialized, they can be used by boot many times and do not need to be reinitialized, so it isn't that expensive.
To load packages on the cluster workers, you can use clusterEvalQ:
clusterEvalQ(cl, library(randomForest))
That's nice and simple, but for more complex worker initialization, I usually create a "worker init" function and execute it via clusterCall which is perfect for executing a function once on each of the workers.
As for your side note, the performance is bad because the statistic function does so little work, as you say, but I'm not sure why you think that the work isn't being split evenly between the workers. The parLapply function is used to do the work in parallel in this case, and it does split the work evenly and rather efficiently, but that doesn't guarantee better performance than running sequentially using lapply. But perhaps I'm misunderstanding your question.
I would like to speed up my bootstrap function, which works perfectly fine itself. I read that since R 2.14 there is a package called parallel, but I find it very hard for sb. with low knowledge of computer science to really implement it. Maybe somebody can help.
So here we have a bootstrap:
n<-1000
boot<-1000
x<-rnorm(n,0,1)
y<-rnorm(n,1+2*x,2)
data<-data.frame(x,y)
boot_b<-numeric()
for(i in 1:boot){
bootstrap_data<-data[sample(nrow(data),nrow(data),replace=T),]
boot_b[i]<-lm(y~x,bootstrap_data)$coef[2]
print(paste('Run',i,sep=" "))
}
The goal is to use parallel processing / exploit the multiple cores of my PC. I am running R under Windows. Thanks!
EDIT (after reply by Noah)
The following syntax can be used for testing:
library(foreach)
library(parallel)
library(doParallel)
registerDoParallel(cores=detectCores(all.tests=TRUE))
n<-1000
boot<-1000
x<-rnorm(n,0,1)
y<-rnorm(n,1+2*x,2)
data<-data.frame(x,y)
start1<-Sys.time()
boot_b <- foreach(i=1:boot, .combine=c) %dopar% {
bootstrap_data<-data[sample(nrow(data),nrow(data),replace=T),]
unname(lm(y~x,bootstrap_data)$coef[2])
}
end1<-Sys.time()
boot_b<-numeric()
start2<-Sys.time()
for(i in 1:boot){
bootstrap_data<-data[sample(nrow(data),nrow(data),replace=T),]
boot_b[i]<-lm(y~x,bootstrap_data)$coef[2]
}
end2<-Sys.time()
start1-end1
start2-end2
as.numeric(start1-end1)/as.numeric(start2-end2)
However, on my machine the simple R code is quicker. Is this one of the known side effects of parallel processing, i.e. it causes overheads to fork the process which add to the time in 'simple tasks' like this one?
Edit: On my machine the parallel code takes about 5 times longer than the 'simple' code. This factor apparently does not change as I increase the complexity of the task (e.g. increase boot or n). So maybe there is an issue with the code or my machine (Windows based processing?).
Try the boot package. It is well-optimized, and contains a parallel argument. The tricky thing with this package is that you have to write new functions to calculate your statistic, which accept the data you are working on and a vector of indices to resample the data. So, starting from where you define data, you could do something like this:
# Define a function to resample the data set from a vector of indices
# and return the slope
slopeFun <- function(df, i) {
#df must be a data frame.
#i is the vector of row indices that boot will pass
xResamp <- df[i, ]
slope <- lm(y ~ x, data=xResamp)$coef[2]
}
# Then carry out the resampling
b <- boot(data, slopeFun, R=1000, parallel="multicore")
b$t is a vector of the resampled statistic, and boot has lots of nice methods to easily do stuff with it - for instance plot(b)
Note that the parallel methods depend on your platform. On your Windows machine, you'll need to use parallel="snow".
I haven't tested foreach with the parallel backend on Windows, but I believe this will work for you:
library(foreach)
library(doSNOW)
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
registerDoSNOW(cl=cl)
n<-1000
boot<-1000
x<-rnorm(n,0,1)
y<-rnorm(n,1+2*x,2)
data<-data.frame(x,y)
boot_b <- foreach(i=1:boot, .combine=c) %dopar% {
bootstrap_data<-data[sample(nrow(data),nrow(data),replace=T),]
unname(lm(y~x,bootstrap_data)$coef[2])
}
I think the main problem is that you have a lot of small tasks. In some cases, you can improve your performance by using task chunking, which results in fewer, but larger data transfers between the master and workers, which is often more efficient:
boot_b <- foreach(b=idiv(boot, chunks=getDoParWorkers()), .combine='c') %dopar% {
sapply(1:b, function(i) {
bdata <- data[sample(nrow(data), nrow(data), replace=T),]
lm(y~x, bdata)$coef[[2]]
})
}
I like using the idiv function for this, but you could b=rep(boot/detectCores(),detectCores()) if you like.
this is an old-question but I think a lot of this can be made more efficient using data.table. the benefits will not really be noticed until larger data sets are used. Putting this answer here to help others that may have to bootstrap larger datasets
library(data.table)
setDT(data) # convert data.frame to data.table by reference
system.time({
b <- rbindlist(
lapply(
1:boot,
function(i) {
data.table(
# store the statistic
'statistic' = lm(y ~ x, data=data[sample(.N, .N, replace = T)])$coef[[2]],
# store the iteration
'iteration' = i
)
}
)
)
})
# 1.66 seconds on my system
ggplot(b) + geom_density(aes(x = statistic))
You could then further improve performance by making use of parallel package.
library(parallel)
cl <- makeCluster(detectCores()) # use all cores on machine, can change this
clusterExport( # give it the variables it needs #nolint
cl,
c(
"data"
),
envir = environment()
)
clusterEvalQ( # give it libraries needed #nolint
cl,
c(
library(data.table)
)
)
system.time({
b <- rbindlist(
parLapply( # this is changed to be in parallel
cl, # give it the cluster you created earlier
1:boot,
function(i) {
data.table(
'statistic' = lm(y ~ x, data=data[sample(.N, .N, replace = T)])$coef[[2]],
'iteration' = i
)
}
)
)
})
stopCluster(cl)
# .47 seconds on my machine