I am parallelizing my R code with snow, using SOCK-type clusters, on Ubuntu 16 LTS. A simplified code example is below:
library(snow)

# Make cluster of type SOCK
cl <- makeCluster(hostsList, type = "SOCK")
clusterExport(cl, "task")
# Compute long time tasks
result <- clusterApplyLB(cl, 1:50, function(x) task(x))
# Stop cluster
stopCluster(cl)
The task function can take a long time (minutes or hours), but when, for some reason, my application no longer needs to keep computing the tasks, it is not able to stop all the slave processes.
I can kill the master R process, but the R slave processes keep running until they finish (i.e. they keep using CPU for quite some time).
I cannot kill the slave processes because their parent is the init process (PPID = 1), so I don't know which slaves belong to the master process I want to stop. I also tried sending a kind of interrupt so that the master R process would execute the stopCluster function, without success.
After searching in depth, I have not found a solution for this. So, does anybody know a way to stop/kill the slaves, or have an idea of how to solve this?
Thanks in advance!
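For what it's worth, one workaround (not part of the original question) is to record each worker's PID right after the cluster is created, so the slaves can be killed explicitly later. A minimal sketch, assuming the workers run on machines you can reach directly:

library(snow)
library(tools)  # for pskill()

cl <- makeCluster(hostsList, type = "SOCK")

# Ask every worker for its PID and keep the list around
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))

## ... later, if the computation must be abandoned and stopCluster()
## cannot be reached, the slaves can be terminated directly:
# pskill(worker_pids)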
I have a 31-CPU machine available for parallel computations. I would like to create a single 31-node cluster which would then serve several different R processes for parallel computation. How can this be done?
I am currently using makeCluster like this:
library(doParallel)

cl <- makeCluster(5)
registerDoParallel(cl)
but this only serves the current R process. How can I connect to a cluster created in a different R process?
PS: The reason I want multiple processes to access one cluster is that I want to be constantly adding new sets of computations, which would wait in a queue for the running ones to finish. I hope it would work that way? I have used doRedis for this in the past, but there were some problems and I would like to use a simple cluster for this purpose.
Please let me know if you need an example, but I don't think it is necessary.
I've written a for loop that creates futures and stores the result of each in a list. The plan is remote, made of, say, 4 nodes on an internet machine.
After the 4th future is deployed and all cores of the remote machine are busy, R hangs until one of them is free. Since I'm not using any of my local cores, why does it have to hang? Is there a way to change this behavior?
Author of the future framework here. This behavior is by design.
Your main R session has a certain number of workers available. The number of workers depends on what future plan you have set up. You can always check the number of workers by calling nbrOfWorkers(). In your case, you have four remote workers, which means that nbrOfWorkers() returns 4.
You can have this many futures (= nbrOfWorkers()) active at any time without blocking. When you attempt to create one more future, there are no workers available to take it on. At this point, the only option is to block.
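As a hypothetical illustration (using local multisession workers instead of the remote ones), the blocking can be seen like this:

library(future)
plan(multisession, workers = 2)   # stand-in for the four remote workers
nbrOfWorkers()                    # 2

f1 <- future(Sys.sleep(60))       # dispatched to worker 1
f2 <- future(Sys.sleep(60))       # dispatched to worker 2
f3 <- future(Sys.sleep(60))       # blocks here until f1 or f2 resolves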
Now, it could be that you are asking: How can I make use of my local machine when the remote workers are all busy?
The easiest way to achieve this is by adding one or more local workers to the mix of remote workers. For example, if you allow yourself to use two workers on your local machine, you can do this as:
library(future)
remote_workers <- makeClusterPSOCK(c("n1.remote.org", "n2.remote.org"))
local_workers <- makeClusterPSOCK(2)
plan(cluster, workers = c(remote_workers, local_workers))
or even just
library(future)
remote_workers <- c("n1.remote.org", "n2.remote.org")
local_workers <- rep("localhost", times = 2)
plan(cluster, workers = c(remote_workers, local_workers))
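With either setup the plan has four workers in total, which you can verify before submitting work (slow_task() below is a hypothetical placeholder for your own function):

nbrOfWorkers()   # 4: two remote workers + two local workers

fs <- lapply(1:4, function(i) future(slow_task(i)))  # four futures run at once
vs <- lapply(fs, value)                              # collect the results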
I used to do parallel computation with doMC and foreach, and I now have access to a cluster. My problem is similar to this one, Going from multi-core to multi-node in R, but there is no response on that post.
Basically, I can request a number of tasks (-n) and a number of cores per task (-c) from my batch queuing system. I manage to use doMPI to run parallel simulations across the number of tasks I request, but I now want to use the maxcores option of startMPIcluster so that each MPI process uses multicore functionality.
Something I have noticed is that parallel::detectCores() does not seem to see how many cores I have been allocated and returns the maximum number of cores of a node.
For now I have tried:
ncore <- 3  # same number as the one I put with the -c option
library(Rmpi)
library(doMPI)
cl <- startMPIcluster(maxcores = ncore)
registerDoMPI(cl)
## now some parallel simulations
foreach(icount(10), .packages = c('foreach', 'iterators', 'doParallel')) %dopar% {
## here I'd like to use the `ncore` cores on each simulation of `myfun()`
registerDoParallel(cores = ncore)
myfun()
}
(myfun does contain a foreach loop internally), but if I set ncore > 1 I get an error:
Error in { : task 1 failed - "'mckill' failed"
Thanks.
EDIT
The machine I have access to is http://www-ccrt.cea.fr/fr/moyen_de_calcul/airain.htm, where it is specified: "MPI libraries: BullxMPI, a Bull MPI distribution optimised for and compatible with OpenMPI".
You are trying to use a lot of different concepts at the same time: you are using an MPI-based cluster to launch processes on different computers, while also trying to use multi-core processing within them. This makes things needlessly complicated.
The cluster you are using is probably spread out over multiple nodes. You need some way to transfer data between these nodes if you want to do parallel processing.
In comes MPI. This is a way to easily connect different workers on different machines, without needing to specify IP addresses or ports. And this is indeed why you want to launch your process using mpirun or ccc_mprun (which is probably a script with some extra arguments for your specific cluster).
How do we now use this system in R? (see also https://cran.r-project.org/web/packages/doMPI/vignettes/doMPI.pdf)
Launch your script using mpirun -n 24 R --slave -f myScriptMPI.R to launch 24 worker processes. The cluster management system will decide where to run these workers: it might launch all 24 of them on the same (powerful) machine, or it might spread them over 24 different machines, depending on things like workload, available resources, available memory, machines currently in sleep mode, and so on.
This command launches 24 instances of myScriptMPI.R, possibly on different machines. How do they now collaborate?
library(doMPI)
cl <- startMPIcluster()
#the "master" process will continue
#the "worker" processes will wait here until receiving work from the master
registerDoMPI(cl)
## now some parallel simulations
foreach(icount(24), .packages = c('foreach', 'iterators'), .export='myfun') %dopar% {
myfun()
}
Your data will get transferred automatically from master to workers using the MPI protocol.
If you want more control over your allocation, including making "nested" MPI clusters for multicore vs. inter-node parallelization, I suggest you read the doMPI vignette.
I am using the parallel package to run a server function multiple times at once. The server function loops until the session is manually stopped by the user.
It looks like:
library(parallel)
cluster <- makeCluster(3)
clusterCall(cluster, f)
On Windows, parallel works by creating an Rscript process for each worker in a cluster. However, these processes do not get closed when the R session terminates; they must be removed manually in Task Manager. With a dozen or so workers, this quickly becomes a hassle.
Is it possible to set these processes to close when the parent R session closes?
You must always close the connections after the parallel processing. Try the following example:
library(parallel)
cluster <- makeCluster(3)
clusterCall(cluster, f)
stopCluster(cluster) # always add this line in the end of the script
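If you also want the workers to disappear when the script stops early (an error or a user interrupt), one option beyond the answer above is to tie the cleanup to the calling function with on.exit(). A sketch, where run_servers() is a hypothetical wrapper around your server function f:

library(parallel)

run_servers <- function(f, n = 3) {
  cluster <- makeCluster(n)
  # Guarantee the Rscript workers are stopped however this function exits
  on.exit(stopCluster(cluster), add = TRUE)
  clusterCall(cluster, f)
}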
After playing around for some time with R's parallel package on my Debian-based machine, I still can't find a way to remove all zombie child processes after a computation.
I'm searching for a general and OS independent solution.
Below is a simple script illustrating the problem for 2 cores:
library(parallel)
testfun <- function(){TRUE}
cltype <- ifelse(.Platform$OS.type != "windows", "FORK", "PSOCK")
cl <- makeCluster(2, type = cltype)
p <- clusterCall(cl, testfun)
stopCluster(cl)
Unfortunately, this script leaves two zombie processes in the process table which only get killed if R is shut down.
This only seems to be an issue with "FORK" clusters. If you make a "PSOCK" cluster instead, the processes will die when you call stopCluster(cl).
Is there anything preventing you from using a "PSOCK" cluster on your Debian-based machine?
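For example, a minimal variant of the script above that forces a PSOCK cluster (so stopCluster() also terminates the worker processes):

library(parallel)
testfun <- function() TRUE

cl <- makeCluster(2, type = "PSOCK")
p <- clusterCall(cl, testfun)
stopCluster(cl)   # PSOCK workers exit here; no zombies are left behind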
The answer to your problem is probably in the help file of the makeCluster() command.
At the bottom of the file, it is written: "It is good practice to shut down the workers by calling stopCluster: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies)."
The solution (it works for me) is to define a port for your cluster when you create it:
cl <- makeCluster(2, type = cltype, port = yourPortNumber)
Another (maybe not useful) option is to set a timeout for your sockets; the timeout value is in seconds:
cl <- makeCluster(2, type = cltype, port = yourPortNumber, timeout = 50)
In any case, the aim should be to make the socket connection unavailable; either closing the ports or closing the main R process would do this.
Edit: What I meant was to close the ports on which the process is listening. It should be OS independent. You can try showConnections(all = TRUE), which will list all the connections, and then try closeAllConnections().
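In code, that suggestion looks roughly like this:

showConnections(all = TRUE)   # lists every connection, including the cluster sockets
closeAllConnections()         # closing them makes the listening workers shut down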
Sorry if this doesn't work either.