I am trying to do a multi-server (not multi-core) computation using doSNOW and foreach packages.
I have 2 Windows servers and I want to start a parallel computation on both of these Windows machines.
I have the following code:
library(foreach)
library(doSNOW)
winOptionsServer1 <-
list(host="Server1",
rscript="C:/Program Files/R/R-3.1.2/bin/Rscript.exe",
snowlib="C:/Program Files/R/R-3.1.2/library")
winOptionsServer2 <-
list(host="Server2",
rscript="C:/Program Files/R/R-3.1.2/bin/Rscript.exe",
snowlib="C:/Program Files/R/R-3.1.2/library")
cl <- makeCluster(c(rep(winOptionsServer1, 2), rep(winOptionsServer1, 2)), type="SOCK")
After calling makeCluster my machine does somrthing, but never actually completes the call. When I hit Stop in RStudio I get the following error message:
running command 'ssh -l mypc Server1 C:/PROGRA~1/R/R-31~1.2/bin/Rscript.exe "C:/Program Files/R/R-3.1.2/library/snow/RSOCKnode.R" MASTER=MY-PC PORT=11764 OUT=/dev/null SNOWLIB=C:/Program Files/R/R-3.1.2/library' had status 127
Does it mean that I have to configure something on these remote servers? What exactly should I configure? ssh? And how do I do it? Maybe I should open some ports on my remote machines? Which ones?
Does anyone have a full example of steps I need to do to run R across 2 or more machines.
P.S. doSnow works really well with multi-core running, no problem with that. I have problems with multi-servers running
Related
I am queueing and running an R script on a HPC cluster via sbatch and mpirun; the script is meant to use foreach in parallel. To do this I've used several useful questions & answers from StackOverflow: R Running foreach dopar loop on HPC MPIcluster, Single R script on multiple nodes, Slurm: Use cores from multiple nodes for R parallelization.
It seems that the script completes, but a couple of strange things happen. The most important is that the slurm job keeps on running afterwards, doing nothing(?). I'd like to understand if I'm doing things properly. I'll first give some more specific information, then explain the strange things I'm seeing, then I'll ask my questions.
– Information:
R is loaded as a module, which also calls an OpenMPI module. The packages Rmpi, doParallel, snow, foreach were already compiled and included in the module.
The cluster has nodes with 20 CPUs each. My sbatch file books 2 nodes and 20 CPUs per node.
The R script myscript.R is called in the sbatch file like this:
mpirun -np 1 Rscript -e "source('myscript.R')"
My script calls several libraries in this order:
library('snow')
library('Rmpi')
library('doParallel')
library('foreach')
and then sets up parallelization as follows at the beginning:
workers <- mpi.universe.size() - 1
cl <- makeMPIcluster(workers, outfile='', type='MPI')
registerDoParallel(cl)
Then several foreach-dopar are called in succession – that is, each starts after the previous has finished. Finally
stopCluster(cl)
mpi.quit()
are called at the very end of the script.
mpi.universe.size() correctly gives 40, as expected. Also, getDoParWorkers() gives doParallelSNOW. The slurm log encouragingly says
39 slaves are spawned successfully. 0 failed.
starting MPI worker
starting MPI worker
...
Also, calling print(clusterCall(cl, function() Sys.info()[c("nodename","machine")])) from within the script correctly reports the node names shown in the slurm queue.
– What's strange:
The R script completes all its operations, the last one being saving a plot as pdf, which I do see and is correct. But the slurm job doesn't end, it remains in the queue indefinitely with status "running".
The slurm log shows very many lines with
Type: EXEC. I can't find any relation between their number and the number of foreach called. At the very end the log shows 19 lines with Type: DONE (which make sense to me).
– My questions:
Why does the slurm job run indefinitely after the script has finished?
Why the numerous Type: EXEC messages? are they normal?
There is some masking between packages snow and doParallel. Am I calling the right packages and in the right order?
Some answers to the StackOverflow questions mentioned above recommend to call the script with
mpirun -np 1 R --slave -f 'myscript.R'
instead of using Rscript as I did. What's the difference? Note that the problems I mentioned remain even if I call the script this way, though.
I thank you very much for your help!
I am now dealing with a large dataset and I want to use parallel calculation to accelerate the process. WestGird is a Canadian computing system which has clusters with interconnect.
I use two packages doSNOW and parallel to do parallel jobs. My question is how I should write the pbs file. When I submit the job using qsub, an error occurs: mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
Here is the R script code:
install.packages("fume_1.0.tar.gz")
library(fume)
library(foreach)
library(doSNOW)
load("spei03_df.rdata",.GlobalEnv)
cl <- makeCluster(mpi.universe.size(), type='MPI' )
registerDoSNOW(cl)
MK_grid <-
foreach(i=1:6000, .packages="fume",.combine='rbind') %dopar% {
abc <- mkTrend(as.matrix(spei03_data)[i,])
data.frame(P_value=abc$`Corrected p.value`, Slope=abc$`Sen's Slope`*10,Zc=abc$Zc)
}
stopCluster(cl)
save(MK_grid,file="MK_grid.rdata")
mpi.exit()
The "fume" package is download from https://cran.r-project.org/src/contrib/Archive/fume/ .
Here is the pbs file:
#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -l walltime=2:00:00
module load application/R/3.3.1
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=1
mpirun -np 1 -hostfile $PBS_NODEFILE R CMD BATCH Trend.R
Can anyone help? Thanks a lot.
It's difficult to give advice on how to use a compute cluster that I've never used since each cluster is setup somewhat differently, but I can give you some general advice that may help.
Your job script looks reasonable to me. It's very similar to what I use on one of our Torque/Moab clusters. It's a good idea to verify that you're able to load all of the necessary R packages interactively because sometimes additional module files may need to be loaded. If you need to install packages yourself, make sure you install them in the standard "personal library" which is called something like "~/R/x86_64-pc-linux-gnu-library/3.3". That often avoids errors loading packages in the R script when executing in parallel.
I have more to say about your R script:
You need to load the Rmpi package in your R script using library(Rmpi). It isn't automatically loaded when loading doSNOW, so you will get an error when calling mpi.universe.size().
I don't recommend installing R packages in the R script itself. That will fail if install.script needs to prompt you for the CRAN repository, for example, since you can't execute interactive functions from an R script executed via mpirun.
I suggest starting mpi.universe.size() - 1 cluster workers when calling makeCluster. Since mpirun starts one worker, it may not be safe for makeCluster to spawn mpi.universe.size() additional workers since that would result in a total of mpi.universize.size() + 1 MPI processes. That works on some clusters, but it fails on at least one of our clusters.
While debugging, try using the makeCluster outfile='' option. Depending on your MPI installation, that may let you see error messages that would otherwise be hidden.
I would like to use the plumber package to carry out some flexible parallel processing and was hoping it would work within a node.js framework such that it is non-blocking...
I have the following plumber file.
# myfile.R
#* #get /mean
normalMean <- function(samples=10){
Sys.sleep(5)
data <- rnorm(samples)
mean(data)
}
I have also installed pm2 as suggested here http://plumber.trestletech.com/docs/hosting/
I have also made the same run-myfile.sh file i.e.
#!/bin/bash
R -e "library(plumber); pr <- plumb('myfile.R'); pr\$run(port=4000)"
and made it executable as suggested...
I have started up pm2 using
pm2 start /path/to/run-myfile.sh
and wanted to test to see if it could carry out a non-blocking node.js framework...
by opening up another R console and running the following...
foo <- function(){
con <- curl::curl('http://localhost:4000/mean?samples=10000',handle = curl::new_handle())
on.exit(close(con))
return(readLines(con, n = 1, ok = FALSE, warn = FALSE))
}
system.time(for (i in seq(5)){
print(foo())
})
Perhaps it is my miss-understanding of how a node.js non-blocking framework is meant to work, but in my head the last loop should take only a bit of over 5 seconds. But it seems to take 25 seconds, suggesting everything is sequential rather than parallel.
How could I use the plumber package to carry out that non-blocking nature?
pm2 can't load-balance R processes for you, unfortunately. R is single-threaded and doesn't really have libraries that allow it to behave in asynchronous fashion like NodeJS does (yet), so there aren't many great ways to parallelize code like this in plumber today. The best option would be to run multiple plumber R back-ends and distribute traffic across them. See the "load balancing" section here: http://plumber.trestletech.com/docs/docker-advanced
Basically concurrent requests are queued by httpuv so that it is not performant by itself. The author recommends multiple docker containers but it can be complicated as well as response-demanding.
There are other tech eg Rserve and rApache. Rserve forks prosesses and it is possible to configure rApache to pre-fork so as to handle concurrent requests.
See the following posts for comparison
https://www.linkedin.com/pulse/api-development-r-part-i-jaehyeon-kim/
https://www.linkedin.com/pulse/api-development-r-part-ii-jaehyeon-kim/
I'm just doing some performance testing on a new laptop. My problem starts when I tried to test it on parallel computing.
So, when I run the function detectCores() from parallel the result is 1. The problem is that the laptop has an i7- 4800MQ processor which as 4 cores.
As a result when I run my code it thinks that it has only one core and the time to execute the code is exactly the same as without the parallelization.
I’ve tested the code in a different machine with an i5 processor also with 4 cores using the same R version (R 3.0.2 64 bits) and the code runs perfectly. The only difference is that the new computer as installed windows 8.1 while the old one has windows 7
Also, when I run Sys.getenv(“NUMBER_OF_PROCESSORS”) I also get 1 as an answer.
I've search the internet looking for an answer with no joy. As anyone came across this problem before?
Manny thanks
Make sure you are loading the parallel package before running detectCores(). I also have an i-7 processor (Windows 8.1, 64 Bit) and I am able to see as 8 cores when I run detectCores(logical = TRUE) and I get 4 when I run detectCores(logical = FALSE). For more, kindly refer this link. HTH
I'm running R 2.11 64-bit on a WinXP64 machine with 8 processors. With R 2.10.1 the following code spawned 6 R processes for parallel processing:
require(foreach)
require(doSNOW)
cl = makeCluster(6, type='SOCK')
registerDoSNOW(cl)
bl2 = foreach(i=icount(length(unqmrno))) %dopar% {
(Some code here)
}
stopCluster(cl)
When I run the same code in R 2.11 Win64, the 6 R processes are not spawning, and the code hangs. I'm wondering if this is a problem with the port of SNOW to 2.11-64bit, or if any additional code is required on my part. Thanks
BTW, this works just fine on my multicore machine at home running Ubuntu Karmic 64-bit and R 2.11. Unfortunately I have to work on Win64 at work
The code seems to be working here.
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32
other attached packages:
[1] doSNOW_1.0.3 snow_0.3-3 foreach_1.3.0 codetools_0.2-2
[5] iterators_1.0.3
loaded via a namespace (and not attached):
[1] tools_2.11.0
Check your sessionInfo() to make sure your versions match mine. One thing I noted is that on my Windows 7 machine the first attempt to makeCluster made a request for a firewall exception. If you did not explicitly make allowances for the socket communication that could be why it is hanging. The defaults it opened (ugly as it is) was all TCP and UDP ports when operating under the private profile.
It is an old question, but I encountered the same problems with R-2.13.1 64 on Win 64 bits.
doSNOW was working fine with R 32-bits but not with R 64-bits, and was hanging at "cl = makeCluster(6, type='SOCK')" as well.
To resolve the problem I eventually added "C:\Program Files\R\R-2.13.1\bin\x64" to the %PATH% environment variable (win+pause/advanced system settings/advanced/environment variables/system variables). Make also sure to allow the R connections in Windows Firewall, and that C:\Program Files\R\R-2.13.1\bin contains copy of the 32-bits version of R.exe and Rscript.exe (not the x64 ones).
After doing this, when running makeCluster() 12 processes are started, 6 32 bits and 6 64 bits, but during the calculations only the 64 bits one are used.