R package parallel stopCluster method seems to hang - r

I am currently working on a project for my company where I am trying to forecast demand for certain flows in parallel. For that, I'm using the following statements from the R parallel package:
cl = makeCluster(number_of_sessions)
parRapply(cl, range_list_small, context = context, fun = forecastDemand)
stopCluster(cl)
The context object in this case, is an environment which contains certain objects.
The problem is the following, I tried the script for a small sample of flows and it works perfectly. However when I runned the script for a big number of flows, it hangs for a long time (sometimes a few hours) on the stopCluster(cl) statement. I googled around, but it seems that nobody ever had the same problem before. Does somebody recognizes the problem? Or is there another way to close the cluster object. Because after this first parallel session my script has to do another parallel session for other calculations and this parallel session does not start until the stopCluster method has finished.

Related

Parallel processing in R with "parallel" package - unpredictable runtime

I've been learning to parallelize code in R using the parallel package, and specifically, the mclapply() function with 14 cores.
Something I noticed, just from a few runs of code, is that repeat calls of mclapply() (with the same arguments and same number of cores used) take significantly different lengths of time. For example, the first run took 18s, the next run took 23s, and the next one took 34s when I did them back to back to back (on the same input). So I waited a minute, ran the code again, and it was back down to taking 18s.
Is there some equivalent of "the computer needs a second to cool down" after running the code, which would mean that running separate calls of mclapply() back to back might take longer and longer amounts of time, but waiting for a minute or so and then running mclapply() again gets it back to normal?
I don't have much experience with parallelizing in R, but this is the only ad-hoc explanation I can think of. It would be very helpful to know if my reasoning checks out, and hear in more detail about why this might be happening. Thanks!
To clarify, my calls are like:
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(training_data, simulation, testing_data, mc.cores=14, mc.set.seed = TRUE)
Running this twice in a row takes a lot longer the second time for me. Waiting for a minute and then running it again, it becomes fast again.
I haven't used mcapply but I have used parallel, foreach and pbapply packages. I think the inconsistency lies in the fact that there are small overheads involved in firing workers and in communicating on progress of running tasks in parallel.

R memory blowup when predicting nnet output in parallel with foreach

I have a (large) neural net being trained by the nnet package in R. I want to be able to simulate predictions from this neural net, and do so in a parallelised fashion using something like foreach, which I've used before with success (all on a Windows machine).
My code is essentially of the form
library(nnet)
data = data.frame(out=c(0, 0.1, 0.4, 0.6),
in1=c(1, 2, 3, 4),
in2=c(10, 4, 2, 6))
net = nnet(out ~ in1 + in2, data=data, size=5)
library(doParallel)
registerDoParallel(cores=detectCores()-2)
results = foreach(test=1:10, .combine=rbind, .packages=c("nnet")) %dopar% {
result = predict(net, newdata = data.frame(in1=test, in2=5))
return(result)
}
except with a much larger NN being fit and predicted from; it's around 300MB.
The code above runs fine when using a traditional for loop, or when using %do%, but when using %dopar%, everything gets loaded into memory for each core being used - around 700MB each. If I run it for long enough, everything eventually explodes.
Having looked up similar problems, I still have no idea what is causing this. Omitting the 'predict' part has everything run smoothly.
How can I have each core lookup the unchanging 'net' rather than having it loaded into memory? Or is it not possible?
When you start new parallel workers, you're essentially creating a new environment, which means that whatever operations you perform in that new environment will require access to the relevant variables/functions.
For instance, you have to specify .packages=c("nnet") because you require the nnet package within each new worker (environment), and this is how you "clone" or "export" from the global environment to each worker env.
Because you require the trained neural network to make predictions, you will need to export it to each worker as well, and I don't see a way around the memory blowup you're experiencing. If you're still interested in parallelization but are running out of memory, my only advice is to look into doMPI.
How can I have each core lookup the unchanging 'net' rather than having it loaded into memory? Or is it not possible?
CPak's reply explains what's going on; you're effectively running multiple copies (=workers) of the main script in separate R session. Since you're on Windows, calling
registerDoParallel(cores = n)
expands to:
cl <- parallel::makeCluster(n, type = "PSOCK")
registerDoParallel(cl)
which what sets up n independent background R workers with their own indenpendent memory address spaces.
Now, if you'd been on a Unix-like system, it would instead have corresponded to using n forked R workers, cf. parallel::mclapply(). Forked processes are not supported by R on Windows. With forked processing, you would effectively get what you're asking for, because forked child processes will share the objects already allocated by the main process (as long as such objects are not modified), e.g. net.

Spawn process that run in parallel in R

I am writing a script that needs to be running continuously storing information on a MySQL database.
However, at some point of the day I will like to produce some summary of the data being colected, but writing this in the same script will stop collecting data while doing these summaries. Here's a sketch of the problem:
while (1==1) {
# get data and store it on the relational database
# At some point of the day (or time interval) do some summaries
if (time == certain_time) {
source("analyze_data.R")
}
}
The problem is that I'll like the data collection not to stop, being executed by another core of the computer.
I have seen references to packages parallel and multicore but my impression is that they are useful to repetitive tasks applied over vectors or lists.
You can use parallel to fork a process but you are right that the program will wait eternally for all the forked processes to come back together before proceeding (that is kind of the use case of parallel).
Why not run two separate R programs, one that collects the data and one that grabs it? Then, you simply run one continuously in the background and the other at set times. The problem then becomes one of getting the data out of the continuous data gathering program and into the summary program.
Do the logic outside of R:
Write 2 scripts; 1 with a while loop storing data, the other with a check. Run the while loop with one process and just leave it running.
Meanwhile, run your other (checking script) on demand to crunch the data. Or, put it in a cron job.
There are robust tools outside of R to handle this kind of thing; why do it inside R?

Using R Parallel with other R packages

I am working on a very time intensive analysis using the LQMM package in R. I set the model to start running on Thursday, it is now Monday, and is still running. I am confident in the model itself (tested as a standard MLM), and I am confident in my LQMM code (have run several other very similar LQMMs with the same dataset, and they all took over a day to run). But I'd really like to figure out how to make this run faster if possible using the parallel processing capabilities of the machines I have access to (note all are Microsoft Windows based).
I have read through several tutorials on using parallel, but I have yet to find one that shows how to use the parallel package in concert with other R packages....am I over thinking this, or is it not possible?
Here is the code that I am running using the R package LQMM:
install.packages("lqmm")
library(lqmm)
g1.lqmm<-lqmm(y~x+IEP+pm+sd+IEPZ+IEP*x+IEP*pm+IEP*sd+IEP*IEPZ+x*pm+x*sd+x*IEPZ,random=~1+x+IEP+pm+sd+IEPZ, group=peers, tau=c(.1,.2,.3,.4,.5,.6,.7,.8,.9),na.action=na.omit,data=g1data)
The dataset has 122433 observations on 58 variables. All variables are z-scored or dummy coded.
The dependent libraries will need to be evaluated on all your nodes. The function clusterEvalQ is foreseen inside the parallel package for this purpose. You might also need to export some of your data to the global environments of your subnodes: For this you can use the clusterExport function. Also view this page for more info on other relevant functions that might be useful to you.
In general, to speed up your application by using multiple cores you will have to split up your problem in multiple subpieces that can be processed in parallel on different cores. To achieve this in R, you will first need to create a cluster and assign a particular number of cores to it. Next, You will have to register the cluster, export the required variables to the nodes and then evaluate the necessary libraries on each of your subnodes. The exact way that you will setup your cluster and launch the nodes will depend on the type of sublibraries and functions that you will use. As an example, your clustersetup might look like this when you choose to utilize the doParallel package (and most of the other parallelisation sublibraries/functions):
library(doParallel)
nrCores <- detectCores()
cl <- makeCluster(nrCores)
registerDoParallel(cl);
clusterExport(cl,c("g1data"),envir=environment());
clusterEvalQ(cl,library("lqmm"))
The cluster is now prepared. You can now assign subparts of the global task to each individual node in your cluster. In the general example below each node in your cluster will process subpart i of the global task. In the example we will use the foreach %dopar% functionality that is provided by the doParallel package:
The doParallel package provides a parallel backend for the
foreach/%dopar% function using the parallel package of R 2.14.0 and
later.
Subresults will automatically be added to the resultList. Finally, when all subprocesses are finished we merge the results:
resultList <- foreach(i = 1:nrCores) %dopar%
{
#process part i of your data.
}
stopCluster(cl)
#merge data..
Since your question was not specifically on how to split up your data I will let you figure out the details of this part for yourself. However, you can find a more detailed example using the doParallel package in my answer to this post.
It sounds like you want to use parallel computing to make a single call of the lqmm function execute more quickly. To do that, you either have to:
Split the one call of lqmm into multiple function calls;
Parallelize a loop inside lqmm.
Some functions can be split up into multiple smaller pieces by specifying a smaller iteration value. Examples include parallelizing randomForest over the ntree argument, or parallelizing kmeans over the nstart argument. Another common case is to split the input data into smaller pieces, operate on the pieces in parallel, and then combine the results. That is often done when the input data is a data frame or a matrix.
But many times in order to parallelize a function you have to modify it. It may actually be easier because you may not have to figure out how to split up the problem and combine the partial results. You may only need to convert an lapply call into a parallel lapply, or convert a for loop into a foreach loop. However, it's often time consuming to understand the code. It's also a good idea to profile the code so that your parallelization really speeds up the function call.
I suggest that you download the source distribution of the lqmm package and start reading the code. Try to understand it's structure and get an idea which loops could be executed in parallel. If you're lucky, you might figure out a way to split one call into multiple calls, but otherwise you'll have to rebuild a modified version of the package on your machine.

When does foreach call .combine?

I have written some code using foreach which processes and combines a large number of CSV files. I am running it on a 32 core machine, using %dopar% and registering 32 cores with doMC. I have set .inorder=FALSE, .multicombine=TRUE, verbose=TRUE, and have a custom combine function.
I notice if I run this on a sufficiently large set of files, it appears that R attempts to process EVERY file before calling .combine the first time. My evidence is that in monitoring my server with htop, I initially see all cores maxed out, and then for the remainder of the job only one or two cores are used while it does the combines in batches of ~100 (.maxcombine's default), as seen in the verbose console output. What's really telling is the more jobs i give to foreach, the longer it takes to see "First call to combine"!
This seems counter-intuitive to me; I naively expected foreach to process .maxcombine files, combine them, then move on to the next batch, combining those with the output of the last call to .combine. I suppose for most uses of .combine it wouldn't matter as the output would be roughly the same size as the sum of the sizes of inputs to it; however my combine function pares down the size a bit. My job is large enough that I could not possibly hold all 4200+ individual foreach job outputs in RAM simultaneously, so I was counting on my space-saving .combine and separate batching to see me through.
Am I right that .combine doesn't get called until ALL my foreach jobs are individually complete? If so, why is that, and how can I optimize for that (other than making the output of each job smaller) or change that behavior?
The short answer is to use either doMPI or doRedis as your parallel backend. They work more as you expect.
The doMC, doSNOW and doParallel backends are relatively simple wrappers around functions such as mclapply and clusterApplyLB, and don't call the combine function until all of the results have been computed, as you've observed. The doMPI, doRedis, and (now defunct) doSMP backends are more complex, and get inputs from the iterators as needed and call the combine function on-the-fly, as you have assumed they would. These backends have a number of advantages in my opinion, and allow you to handle an arbitrary number of tasks if you have appropriate iterators and combine function. It surprises me that so many people get along just fine with the simpler backends, but if you have a lot of tasks, the fancy ones are essential, allowing you to do things that are quite difficult with packages such as parallel.
I've been thinking about writing a more sophisticated backend based on the parallel package that would handle results on the fly like my doMPI package, but there's hasn't been any call for it to my knowledge. In fact, yours has been the only question of this sort that I've seen.
Update
The doSNOW backend now supports on-the-fly result handling. Unfortunately, this can't be done with doParallel because the parallel package doesn't export the necessary functions.

Resources