After playing around for some time with R's parallel package on my Debian-based machine, I still can't find a way to remove all zombie child processes after a computation.
I'm looking for a general, OS-independent solution.
Below is a simple script illustrating the problem for 2 cores:
library(parallel)
testfun <- function(){TRUE}
cltype <- ifelse(.Platform$OS.type != "windows", "FORK", "PSOCK")
cl <- makeCluster(2, type = cltype)
p <- clusterCall(cl, testfun)
stopCluster(cl)
Unfortunately, this script leaves two zombie processes in the process table which only get killed if R is shut down.
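To see them, one can list the current R session's child processes from within R (a Linux-specific check using procps' ps; the zombies show up as <defunct>):
# Linux-only: list children of this R session; zombies appear as "<defunct>"
system(paste("ps --ppid", Sys.getpid()))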
This only seems to be an issue with "FORK" clusters. If you make a "PSOCK" cluster instead, the processes will die when you call stopCluster(cl).
Is there anything preventing you from using a "PSOCK" cluster on your Debian-based machine?
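For reference, here is a minimal variant of the script above that forces a PSOCK cluster; its worker processes exit as soon as stopCluster() is called:
library(parallel)
testfun <- function() TRUE
cl <- makeCluster(2, type = "PSOCK")  # socket workers instead of forked ones
p <- clusterCall(cl, testfun)
stopCluster(cl)                       # the two worker processes terminate here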
The answer to your problem is probably in the help file for the makeCluster() command.
At the bottom of the file it says: "It is good practice to shut down the workers by calling stopCluster: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies)."
The solution (it works for me) is to define a port for your cluster when you create it:
cl <- makeCluster(2, type = cltype, port = yourPortNumber)
Another (possibly less useful) solution is to set a timeout for your sockets; the timeout value is in seconds:
cl <- makeCluster(2, type = cltype, port = yourPortNumber, timeout = 50)
In any case, the aim should be to make the socket connection unavailable. Either closing the ports or closing the main R process will do this.
Edit: What I meant was to close the ports the process is listening on. This should be OS-independent. You can try showConnections(all = TRUE), which lists all the connections. Then you can try closeAllConnections().
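For concreteness, the two calls mentioned above look like this (both are base R; note that closeAllConnections() closes every open connection in the session, not only the cluster sockets):
showConnections(all = TRUE)  # lists all connections, including the cluster sockets
closeAllConnections()        # closing them should make the workers terminate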
Sorry if this doesn't work either.
Please let me know if you need an example, but I don't think it is necessary.
I've written a for loop that creates futures and stores the result of each in a list. The plan is remote, say, made up of 4 nodes on a machine reachable over the internet.
After the 4th future is deployed and all cores of the remote machine are busy, R hangs until one of them is free. Since I'm not using any of my local cores, why does it have to hang? Is there a way to change this behavior?
Author of the future framework here. This behavior is by design.
Your main R session has a certain number of workers available. The number of workers depends on what future plan you have set up. You can always check the number of workers by calling nbrOfWorkers(). In your case, you have four remote workers, which means that nbrOfWorkers() returns 4.
You can have this number of futures (= nbrOfWorkers()) active at any time without blocking. When you attempt to create one more future, there are no workers available to take it on. At that point, the only option is to block.
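To illustrate (a sketch only; slow_task() is a hypothetical long-running function, and the plan is assumed to provide the four remote workers described above):
library(future)
nbrOfWorkers()                        # 4 with the remote plan described above
fs <- lapply(1:4, function(i) future(slow_task(i)))  # occupies all four workers
f5 <- future(slow_task(5))            # this call blocks until a worker is free again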
Now, it could be that you are asking: How can I make use of my local machine when the remote workers are all busy?
The easiest way to achieve this is to add one or more local workers to the mix of remote workers. For example, if you allow yourself to use two workers on your local machine, you can do this as:
library(future)
remote_workers <- makeClusterPSOCK(c("n1.remote.org", "n2.remote.org"))
local_workers <- makeClusterPSOCK(2)
plan(cluster, workers = c(remote_workers, local_workers))
or even just
library(future)
remote_workers <- c("n1.remote.org", "n2.remote.org")
local_workers <- rep("localhost", times = 2)
plan(cluster, workers = c(remote_workers, local_workers))
I am parallelizing my R code with snow using clusters (SOCK type) on Ubuntu 16 LTS. A simplified code example is below:
# Make cluster of type SOCK
cl <- makeCluster(hostsList, type = "SOCK")
clusterExport(cl, "task");
# Compute long time tasks
result <- clusterApplyLB(cl, 1:50, function(x) task(x))
# Stop cluster
stopCluster(cl)
The task function can take a long time (minutes or hours), but when, for some reason, my application no longer needs to keep computing the tasks, it is not able to stop all the slave processes.
I can kill the master R process, but the R slave processes remain until they finish (i.e., they keep using CPU for quite some time).
I cannot kill the slave processes because their parent process is the system one (PPID = 1), so I don't know which slaves are related to the master process I want to stop. I also tried using a kind of interrupt to let the master R process execute the stopCluster function, without success.
After a deep search, I did not find a solution to this. So, does anybody know a way to stop/kill the slaves, or have an idea how to solve this?
Thanks in advance!
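A possible workaround, sketched here purely for illustration (it is not from the original post): ask each worker for its PID right after the cluster is created, so the slaves can be killed explicitly later even though their PPID is 1.
# Right after makeCluster(): record each slave's PID
pids <- unlist(clusterCall(cl, Sys.getpid))
# Later, if the computation has to be abandoned, the slaves can be killed
# directly (from another R session or a shell), e.g. with:
tools::pskill(pids)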
I used to do parallel computation with doMC and foreach, and I now have access to a cluster. My problem is similar to this one, Going from multi-core to multi-node in R, but there is no response to that post.
Basically, I can request a number of tasks (-n) and a number of cores per task (-c) from my batch queuing system. I do manage to use doMPI to run parallel simulations on the number of tasks I request, but I now want to use the maxcores option of startMPIcluster so that each MPI process uses multicore functionality.
Something I have noticed is that parallel::detectCores() does not seem to see how many cores I have been allocated and returns the maximum number of cores of a node.
For now I have tried:
ncore <- 3  # same number as the one I pass with the -c option
library(Rmpi)
library(doMPI)
cl <- startMPIcluster(maxcores = ncore)
registerDoMPI(cl)
## now some parallel simulations
foreach(icount(10), .packages = c('foreach', 'iterators', 'doParallel')) %dopar% {
## here I'd like to use the `ncore` cores on each simulation of `myfun()`
registerDoParallel(cores = ncore)
myfun()
}
(myfun does indeed contain a foreach loop), but if I set ncore > 1 I get an error:
Error in { : task 1 failed - "'mckill' failed"
thanks
EDIT
The machine I have access to is http://www-ccrt.cea.fr/fr/moyen_de_calcul/airain.htm, where it is specified: "MPI libraries: BullxMPI, a Bull MPI distribution, optimised and compatible with OpenMPI".
You are trying to use a lot of different concepts at the same time. You are using an MPI-based cluster to launch on different computers, but are trying to use multi-core processing at the same time. This makes things needlessly complicated.
The cluster you are using is probably spread out over multiple nodes. You need some way to transfer data between these nodes if you want to do parallel processing.
In comes MPI. This is a way to easily connect different workers on different machines, without needing to specify IP addresses or ports. And this is indeed why you want to launch your process using mpirun or ccc_mprun (which is probably a script with some extra arguments for your specific cluster).
How do we now use this system in R? (see also https://cran.r-project.org/web/packages/doMPI/vignettes/doMPI.pdf)
Launch your script using: mpirun -n 24 R --slave -f myScriptMPI.R, for launching on 24 worker processes. The cluster management system will decide where to launch these worker processes. It might launch all 24 of them on the same (powerful) machine, or it might spread them over 24 different machines. This depends on things like workload, available resources, available memory, machines currently in SLEEP mode, etc.
The above command launches 24 copies of myScriptMPI.R, possibly spread over different machines. How do we now make them collaborate?
library(doMPI)
cl <- startMPIcluster()
#the "master" process will continue
#the "worker" processes will wait here until receiving work from the master
registerDoMPI(cl)
## now some parallel simulations
foreach(icount(24), .packages = c('foreach', 'iterators'), .export='myfun') %dopar% {
myfun()
}
Your data will get transferred automatically from master to workers using the MPI protocol.
If you want more control over your allocation, including making "nested" MPI clusters for multicore vs. inter-node parallelization, I suggest you read the doMPI vignette.
I am using the parallel package to run a server function multiple times at once. The server function loops until the session is manually stopped by the user.
It looks like:
library(parallel)
cluster <- makeCluster(3)
clusterCall(cluster, f)
On Windows, parallel works by creating an Rscript process for each worker in a cluster. However, these processes are not closed when the R session terminates; they have to be removed manually in the Task Manager. With a dozen or so workers, this quickly becomes a hassle.
Is it possible to set these processes to close when the parent R session closes?
You must always close the connections after parallel processing. Try the following example:
library(parallel)
cluster <- makeCluster(3)
clusterCall(cluster, f)
stopCluster(cluster) # always add this line in the end of the script
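If the cluster is created inside a function, one way to make this cleanup automatic (a sketch; the wrapper below is hypothetical) is to register stopCluster() with on.exit(), so the workers are shut down even if the function exits with an error:
library(parallel)
run_on_workers <- function(f, n = 3) {
  cluster <- makeCluster(n)
  # the registered handler runs when the function exits, also on error
  on.exit(stopCluster(cluster), add = TRUE)
  clusterCall(cluster, f)
}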
First off, although I'm pretty well versed in R programming, I'm fairly new to both *nix environments + parallel computing, so I appreciate your bearing with me. I'm familiar with the 'parallel', 'foreach', and different 'do_' packages in R, but only for utilizing multiple cores on a local machine.
I have a local Linux cluster of computers (running on OpenSUSE) available to me, with a number of nodes. These nodes all have R installed. Typically, if I'm trying to work on just one of the nodes, I'll use PuTTY to ssh first to the head node (with username + pwd), then to one of the (internal?) nodes. However, what I'd like to be able to do is, run R on a local Windows workstation, and send jobs to the cluster of computers.
Is it possible to set this cluster of nodes as a parallel back-end for my Windows machine? And if so, what's the most expedient way of going about this?
EDIT:
Perhaps I can narrow down the question a bit. It's easy enough to open an R process on the head node and run something like,
library( parallel )
nodes <- c("n01", "n02", "n03") ## the nodes
cl <- makePSOCKcluster( nodes )
setDefaultCluster( cl )
Now, is it possible for me to interface a local R session with this R session running on the head node in an easy way? E.g., ideally I'd like to write code on my computer of the form (pseudo-code):
clusterConnection <- connect("<cluster>")
f <- function() { clusterApply( cl, 1:10, function(i) sum( rnorm(1e7) ) ) }
results <- evaluate( f, clusterConnection )
whereby 'evaluate' performs some magic to send the function 'f' to the head node, evaluates it there, and returns the results back to the local computer, storing them in 'results'.
Is there an R function, package or otherwise that handles this sort of interfacing?
I've found a pretty adequate solution. Use Rserve to set up an R server on the head node, then connect to that through socket connections. The Rserve library on CRAN also provides a bunch of utility functions for evaluating certain functions on the server and receiving the results back on the local computer.
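A minimal sketch of that setup, assuming Rserve is installed on the head node (reachable here under the placeholder name 'headnode') and that the RSclient package is used on the local Windows machine:
## On the head node: start the Rserve daemon
library(Rserve)
Rserve()

## On the local Windows workstation: connect and evaluate remotely
library(RSclient)
con <- RS.connect("headnode")                     # placeholder host name
results <- RS.eval(con, {
  library(parallel)
  cl <- makePSOCKcluster(c("n01", "n02", "n03"))  # node names from above
  res <- clusterApply(cl, 1:10, function(i) sum(rnorm(1e7)))
  stopCluster(cl)
  res
})
RS.close(con)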