How to combine doMPI and doMC on a cluster?

I used to do parallel computation with doMC and foreach, and I now have access to a cluster. My problem is similar to this one, Going from multi-core to multi-node in R, but there is no answer on that post.
Basically, I can request a number of tasks -n and a number of cores per task -c from my batch queuing system. I do manage to use doMPI to run parallel simulations across the number of tasks I request, but I now want to use the maxcores option of startMPIcluster so that each MPI process uses multicore functionality.
Something I have noticed is that parallel::detectCores() does not seem to see how many cores I have been allocated and returns the total number of cores of a node.
For now I have tried:
ncore = 3 #same number as the one I put with -c option
library(Rmpi)
library(doMPI)
cl <- startMPIcluster(maxcores = ncore)
registerDoMPI(cl)
## now some parallel simulations
foreach(icount(10), .packages = c('foreach', 'iterators', 'doParallel')) %dopar% {
## here I'd like to use the `ncore` cores on each simulation of `myfun()`
registerDoParallel(cores = ncore)
myfun()
}
(myfun does indeed contain a foreach loop), but if I set ncore > 1 I get an error:
Error in { : task 1 failed - "'mckill' failed"
thanks
EDIT
The machine I have access to is http://www-ccrt.cea.fr/fr/moyen_de_calcul/airain.htm, where it is specified: "MPI libraries: BullxMPI, a Bull MPI distribution, optimised and compatible with OpenMPI".

You are trying to combine several concepts at once: you are using an MPI-based cluster to launch processes on different computers, while also trying to use multi-core processing within them. This makes things needlessly complicated.
The cluster you are using is probably spread out over multiple nodes. You need some way to transfer data between these nodes if you want to do parallel processing.
In comes MPI. It is a way to easily connect workers on different machines without needing to specify IP addresses or ports, and this is indeed why you want to launch your process using mpirun or ccc_mprun (which is probably a script with some extra arguments for your specific cluster).
How do we now use this system in R? (see also https://cran.r-project.org/web/packages/doMPI/vignettes/doMPI.pdf)
Launch your script using mpirun -n 24 R --slave -f myScriptMPI.R to start 24 processes. The cluster management system will decide where to launch these worker processes: it might place all 24 of them on the same (powerful) machine, or spread them over 24 different machines. This depends on things like workload, available resources, available memory, machines currently in SLEEP mode, etc.
The above command launches myScriptMPI.R in 24 processes, possibly spread over different machines. How do they now collaborate?
library(doMPI)
cl <- startMPIcluster()
#the "master" process will continue
#the "worker" processes will wait here until receiving work from the master
registerDoMPI(cl)
## now some parallel simulations
foreach(icount(24), .packages = c('foreach', 'iterators'), .export='myfun') %dopar% {
myfun()
}
Your data will get transferred automatically from master to workers using the MPI protocol.
If you want more control over your allocation, including making "nested" MPI clusters for multicore vs. inter-node parallelization, I suggest you read the doMPI vignette.
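To make the nested idea concrete, here is a minimal sketch of one possible setup, assuming the scheduler grants each MPI task ncore cores and that the MPI build tolerates forked children inside its processes (instead of the vignette's nested MPI clusters, the inner level here simply uses fork-based parallel::mclapply; ncore and myfun are taken from the question):
# launch with something like: mpirun -n 24 R --slave -f myScriptMPI.R
library(doMPI)
library(foreach)
library(iterators)

ncore <- 3                               # cores granted per MPI task (the -c value)
cl <- startMPIcluster(maxcores = ncore)  # one MPI worker per task, ncore cores each
registerDoMPI(cl)

results <- foreach(i = icount(10), .export = 'myfun') %dopar% {
  # inner level: fork-based multicore parallelism within each MPI worker
  parallel::mclapply(seq_len(ncore), function(j) myfun(), mc.cores = ncore)
}

closeCluster(cl)
mpi.quit()
If the inner fork level still fails with errors like the 'mckill' one above, the MPI implementation may simply not tolerate forked children, in which case the nested-MPI-cluster route described in the vignette is the safer option.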

Related

How to create cluster for multiple processes?

I have a 31-CPU machine available for parallel computations. I would like to create a single 31-node cluster that would then serve parallel computations for several different R processes. How can this be done?
I am currently using makeCluster in a way like this:
cl <- makeCluster(5)
registerDoParallel(cl)
but this will only serve the current R process. How can I connect to a cluster created in a different R process?
PS: The reason I want multiple processes to access one cluster is that I want to keep adding new sets of computations that will wait in a queue for the running processes to finish. I hope it can work this way? I have used doRedis for this in the past, but there were some problems, and I would like to use a simple cluster for this purpose.

Do I have to registerDoParallel() and stopCluster() every time I want to use foreach() in R?

I read that you have to call stopCluster() after running the parallel function foreach() in R. However, I can get away with registerDoParallel() and then running foreach() as many times as I want without ever calling stopCluster(). So do I need stopCluster() or not?
Does not calling stopCluster() mean your cores stay occupied by your current task? If I am doing parallel programming with only a little single-core sequential work in between, do I still need stopCluster()? I understand there is also significant overhead in setting up the parallel workers.
parallel::makeCluster() and doParallel::registerDoParallel() create a set of copies of R running in parallel. The copies are called workers.
parallel::stopCluster() and doParallel::stopImplicitCluster() are safe ways of shutting down the workers. From the help page ?stopCluster:
It is good practice to shut down the workers by calling ‘stopCluster’: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies).
Indeed, the CPU usage of idle workers is often negligible. However, if the workers have loaded large R objects, e.g. large datasets, they may use a large part of the memory and, as a consequence, slow down computations. In that case, it is more efficient to shut down the unused workers.
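As a minimal illustration of that setup/teardown pattern (the worker count of 4 and the toy loop are arbitrary examples):
library(doParallel)

cl <- makeCluster(4)        # start 4 background R worker processes
registerDoParallel(cl)

res <- foreach(i = 1:8, .combine = c) %dopar% i^2

stopCluster(cl)             # free the workers (and their memory) when done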

Controlling the number of CPUs used by registerDoParallel

I have recently inherited a legacy R script that at some point trains a gradient boosting model on a large regression matrix. This task is parallelised using the doParallel::registerDoParallel function. Originally, the script started the parallel back end with:
cl = makeCluster(10)
doParallel::registerDoParallel(cl)
The workstation has 12 CPUs and 28 GB of RAM. The regression matrix is just over 2 GB, so I thought this setup would be manageable; however, it launched dozens of sub-processes, exhausting the memory and crashing R within a few seconds. Eventually I understood that on Linux a more predictable setup can be achieved with the cores parameter:
doParallel::registerDoParallel(cores = 1)
Here cores appears to be the number of sub-processes per CPU, i.e. with cores = 2 it launches 24 sub-processes on this 12-CPU architecture. The problem is that with such a large regression matrix, even 12 sub-processes is too many.
How can the number of sub-processes launched by registerDoParallel be restricted to, say, 8? Would it help to use a different parallelisation library?
Update: To determine the memory required by each sub-process, I ran the script without starting a cluster or calling registerDoParallel; still, 12 back-end sub-processes were spawned. The culprit is the caret::train function, which seems to manage parallelisation on its own. I have opened a new issue on GitHub asking for instructions on how to limit the resources used by this function.
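For reference, the most direct way to cap foreach at eight workers is to register an explicit eight-worker cluster; whether caret::train actually honours that backend depends on its trainControl settings, which are not shown here (the sketch below only illustrates the registration step):
library(doParallel)

cl <- makeCluster(8)     # exactly 8 worker processes, regardless of CPU count
registerDoParallel(cl)

# ... run the foreach / caret::train code here ...

stopCluster(cl)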

R + snow + cluster: Kill slaves when master dies

I am parallelizing my R code with snow using clusters (SOCK type) on Ubuntu 16 LTS. A simplified code example is below:
library(snow)

hostsList <- rep("localhost", 4)  # example: character vector of worker host names

# Make cluster of type SOCK
cl <- makeCluster(hostsList, type = "SOCK")
clusterExport(cl, "task")

# Compute long-running tasks
result <- clusterApplyLB(cl, 1:50, function(x) task(x))

# Stop cluster
stopCluster(cl)
The task function can take a long time (minutes or hours), but when, for some reason, my application no longer needs to compute the remaining tasks, it is not able to stop all the slave processes.
I can kill the master R process, but the R slave processes remain until they finish (i.e. they keep using CPU for quite some time).
I cannot kill the slave processes directly because their parent process is the system one (PPID = 1), so I don't know which slaves are related to the master process I want to stop. I also tried to use some kind of interrupt to let the master R process execute the stopCluster function, without success.
After an in-depth search, I did not find a solution for this. So, does anybody know a way to stop/kill the slaves, or have an idea how to solve this?
Thanks in advance!
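One commonly suggested workaround, sketched below under the assumption that the slaves run on the local machine (tools::pskill only signals local processes), is to record the worker PIDs right after the cluster is created, so they can be killed explicitly if the normal stopCluster path is no longer reachable:
library(snow)
library(tools)   # for pskill()

hostsList <- rep("localhost", 4)             # example host list
cl <- makeCluster(hostsList, type = "SOCK")

# Record the PID of every slave while the cluster is still reachable
workerPids <- unlist(clusterCall(cl, Sys.getpid))
writeLines(as.character(workerPids), "worker_pids.txt")

# ... long-running clusterApplyLB() work here ...

# If the master dies mid-run, a separate cleanup script can read the
# file and terminate the slaves directly:
pids <- as.integer(readLines("worker_pids.txt"))
pskill(pids, SIGTERM)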

doParallel, cluster vs cores

What is the difference between cluster and cores in registerDoParallel when using doParallel package?
Is my understanding correct that on a single machine these are interchangeable and that I will get the same results for:
cl <- makeCluster(4)
registerDoParallel(cl)
and
registerDoParallel(cores = 4)
The only difference I see is that makeCluster() has to be stopped explicitly using stopCluster().
I think the chosen answer is too general and actually not accurate, since it doesn't touch on the details of the doParallel package itself. If you read the vignette, it's actually pretty clear.
The parallel package is essentially a merger of the multicore package, which was written by Simon Urbanek, and the snow package, which was written by Luke Tierney and others. The multicore functionality supports multiple workers only on those operating systems that support the fork system call; this excludes Windows. By default, doParallel uses multicore functionality on Unix-like systems and snow functionality on Windows.
We will use snow-like functionality in this vignette, so we start by loading the package and starting a cluster.
To use multicore-like functionality, we would specify the number of cores to use instead.
In summary, this is system dependent. Cluster is the more general mode covering all platforms, and cores is only for Unix-like systems.
To keep the interface consistent, the package uses the same function for the two modes.
> library(doParallel)
> cl <- makeCluster(4)
> registerDoParallel(cl)
> getDoParName()
[1] "doParallelSNOW"
> registerDoParallel(cores=4)
> getDoParName()
[1] "doParallelMC"
Yes, it's right from the software point of view:
on a single machine these are interchangeable and I will get the same results.
To understand 'cluster' and 'cores' clearly, I suggest thinking at the 'hardware' and 'software' levels.
At the hardware level, 'cluster' means network-connected machines that work together by communicating, for example over sockets (which needs more init/stop operations, such as the stopCluster you pointed out), while 'cores' means several hardware cores in the local CPU, which typically work together through shared memory (no need to explicitly send a message from A to B).
At the software level, the boundary between cluster and cores is sometimes not that clear. A program can be run locally on cores or remotely on a cluster, and the high-level software doesn't need to know the details. So we can mix the two modes, for example using explicit communication locally by setting cl on one machine, and also running multiple cores on each of the remote machines.
Back to your question: is setting cl or cores equivalent?
From the software side, it is the same: the program will be run by the same number of clients/servers and will produce the same results.
From the hardware side, it may be different: cl means explicit communication and cores means shared memory, but if the high-level software is well optimised, both settings end up in much the same flow on a local machine. I have not looked into doParallel very deeply, so I am not sure whether these two are the same there.
But in practice, it is better to specify cores for a single machine and cl for a cluster.
Hope this helps.
The behavior of doParallel::registerDoParallel(<numeric>) depends on the operating system, see print(doParallel::registerDoParallel) for details.
On Windows machines,
doParallel::registerDoParallel(4)
effectively does
cl <- makeCluster(4)
doParallel::registerDoParallel(cl)
i.e. it sets up four ("PSOCK") workers that run in background R sessions. Then %dopar% will basically use the parallel::parLapply() machinery. With this setup, you do have to worry about global variables and packages being attached on each of the workers.
However, on non-Windows machines,
doParallel::registerDoParallel(4)
the result is that %dopar% will use the parallel::mclapply() machinery, which in turn relies on forked processes. Since forking is used, you don't have to worry about globals and packages.
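A small illustration of that practical difference, assuming a toy helper function defined in the master session (the function and data are arbitrary examples):
library(doParallel)

square_sum <- function(x) sum(x^2)   # example helper defined in the master session

# PSOCK backend: workers are fresh R sessions, so globals and packages
# have to be shipped to them (here via .export)
cl <- makeCluster(2)
registerDoParallel(cl)
res1 <- foreach(i = 1:4, .combine = c, .export = 'square_sum') %dopar% square_sum(1:i)
stopCluster(cl)

# Fork backend (non-Windows): workers inherit the master's memory,
# so square_sum is visible without any extra arguments
registerDoParallel(cores = 2)
res2 <- foreach(i = 1:4, .combine = c) %dopar% square_sum(1:i)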
