I'm trying to parallelize code across 4 nodes (type = "SOCK"). Here is my code.
library(itertools)
library(foreach)
library(doParallel)
library(parallel)
workers <- c("<ip of node 1>", "<ip of node 2>", "<ip of node 3>", "<ip of node 4>")
cl <- makePSOCKcluster(workers, master = "<ip address of master>")
registerDoParallel(cl)
z <- read.csv("ProcessedData.csv", header=TRUE, as.is=TRUE)
z <- as.matrix(z)
system.time({
  chunks <- getDoParWorkers()
  b <- foreach(these = isplitIndices(nrow(z), chunks = chunks),
               .combine = c) %dopar% {
    a <- rep(0, length(these))
    for (i in 1:length(these)) {
      a[i] <- mean(z[these[i], ])
    }
    a
  }
})
I get this error:
4 nodes produced errors; first error: object '.doSnowGlobals' not
found.
This code runs fine if I'm using doMC, i.e. using the same machine's cores. But when I try to use other computers for the parallel computation, I get the above error. When I change it to registerDoSNOW, the error persists.
Do snow and doSNOW work on a cluster? I could create nodes on localhost using snow, but not on the cluster. Is anyone out there using snow?
To set the library path on each worker you can run:
clusterEvalQ(cl, .libPaths("Your library path"))
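For the cluster in the question, that call goes right after the cluster is created and before any %dopar% work is dispatched. A minimal sketch (the library path is a placeholder for wherever doParallel lives on the workers):
# after makePSOCKcluster(...) but before any work is sent out
clusterEvalQ(cl, .libPaths("/path/to/your/R/library"))  # placeholder path
clusterEvalQ(cl, library(doParallel))                   # verify the workers can now load it
registerDoParallel(cl)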
You can get this error if any of the workers are unable to load the doParallel package. You can reproduce it by installing doParallel into a non-default directory and pointing only the master to it via .libPaths:
> .libPaths('~/R/lib.test')
> library(doParallel)
> cl <- makePSOCKcluster(3, outfile='')
starting worker pid=26240 on localhost:11566 at 13:47:59.470
starting worker pid=26248 on localhost:11566 at 13:47:59.667
starting worker pid=26256 on localhost:11566 at 13:47:59.864
> registerDoParallel(cl)
> foreach(i=1:10) %dopar% i
Warning: namespace ‘doParallel’ is not available and has been replaced
by .GlobalEnv when processing object ‘’
Warning: namespace ‘doParallel’ is not available and has been replaced
by .GlobalEnv when processing object ‘’
Warning: namespace ‘doParallel’ is not available and has been replaced
by .GlobalEnv when processing object ‘’
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
3 nodes produced errors; first error: object '.doSnowGlobals' not found
The warning happens when a function from doParallel is deserialized on a worker. The error happens when the function is executed and tries to access .doSnowGlobals, which is defined in the doParallel namespace, not in .GlobalEnv.
You can also verify that doParallel is available on the workers by executing:
> clusterEvalQ(cl, library(doParallel))
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
3 nodes produced errors; first error: there is no package called ‘doParallel’
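Conversely, once the workers are pointed at the same non-default library, the check succeeds. A small sketch reusing the ~/R/lib.test location from the demonstration above:
clusterEvalQ(cl, .libPaths('~/R/lib.test'))   # add the master's library path on every worker
clusterEvalQ(cl, library(doParallel))         # no longer errors
foreach(i = 1:10) %dopar% i                   # now runs as expected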
A specific case of @Steve Weston's answer is when your workers aren't able to load a given package (e.g. doParallel) because the package is inside a Packrat project. Install the packages into the system library, or somewhere else the workers will be able to find them.
I encountered the same problem today, and I tried all the answers above, none of which worked for me. Then I simply reinstalled the doSNOW package, and magically, the problem was solved.
So none of these fixes worked for me at all. In my particular case, I use a custom R library location. Parallel processing worked if my working directory was the base directory where my custom libraries folder was located, but it failed if I used setwd() to change the working directory.
This custom library location was not being passed on to the worker nodes, so they were looking in R's default library directory for packages that were not there. The fix by @Nat did not work for me; the worker nodes still could not find my custom library folder. What did work was:
Before sending jobs to nodes:
paths <- .libPaths()
I then sent jobs to nodes, along with the argument paths. Then, inside the worker function I simply called:
.libPaths(paths)
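In code, the pattern looks roughly like this; parLapply and somePackage are placeholders for however you dispatch your jobs and whatever the workers need to load:
paths <- .libPaths()                            # capture the master's library paths

res <- parLapply(cl, 1:10, function(i, paths) {
  .libPaths(paths)                              # workers now search the same libraries
  library(somePackage)                          # hypothetical package the worker needs
  i                                             # ... real work would go here ...
}, paths = paths)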
Related
I've recently moved my Windows R code to a Linux installation to run DEoptim on a function. On my Windows system it all worked fine using:
ans <- DEoptim(Calibrate, lower, upper,
               DEoptim.control(trace = TRUE, parallelType = 1, parVar = parVarnames3,
                               packages = c("hydromad", "maptools", "compiler", "tcltk", "raster")))
where the function 'Calibrate' consists of multiple functions. On the Windows system I simply downloaded the various packages needed into the R library. The option parallelType=1 ran the code across a series of cores.
However, now I want to put this code onto a Linux-based computing cluster. The function 'Calibrate' works fine standalone, as does DEoptim if I run it on one core. However, when I specify parallelType=1, the code fails and returns:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
7 nodes produced errors; first error: there is no package called ‘raster’
This error is reproduced whichever package I try to load, even though the
library(raster)
command works fine and 'raster' is clearly listed as installed when I check all the libraries using:
library()
So my gut feeling is that, even though all the packages and libraries load fine, because I have used a personal library, the packages element of DEoptim.control is looking somewhere else. An example of how the packages were installed is below:
install.packages("/home/antony/R/Pkges/raster_2.4-15.tar.gz",rpeo=NULL,target="source",lib="/home/antony/R/library")
I also set the lib paths option as below:
.libPaths('/home/antony/R/library')
Does anybody have any idea what I am doing wrong and how to set the 'packages' option in DEoptim.control so I can run DEoptim across multiple cores in parallel?
Many thanks, Antony
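One quick way to test the personal-library hypothesis is to spawn a worker by hand and inspect its library search path. A minimal sketch using the paths from the question:
library(parallel)

cl <- makeCluster(1)                                    # throwaway local worker
clusterEvalQ(cl, .libPaths())                           # the personal library is probably missing here
clusterEvalQ(cl, .libPaths('/home/antony/R/library'))   # add it on the worker
clusterEvalQ(cl, library(raster))                       # should now load without error
stopCluster(cl)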
My R script worked fine in RStudio (version 0.98.1091) on Windows 7. Then I restarted my laptop, opened RStudio again, and now it gives the following error messages each time I try to execute my code:
cl <- makeCluster(mc); # build the cluster
Error: could not find function "makeCluster"
> registerDoParallel(cl)
Error: could not find function "registerDoParallel"
> fileIdndexes <- gsub("\\.[^.]*","",basename(SF))
Error in basename(SF) : object 'SF' not found
These error messages are slightly different each time I run the code. It seems that RStudio cannot find any function that is used in the code.
I restarted the R session, cleaned the workspace, and restarted RStudio. Nothing helps.
It should be noted that after many attempts to execute the code, it finally started. However, after 100 iterations it crashed with a message about localhost being unavailable.
Add library(*the package needed/where the function is*) for each of the packages you're using.
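For the functions in the error messages above, that means loading parallel and doParallel at the top of the script, before any of them are called. A sketch (how mc is defined depends on your script):
library(parallel)      # provides makeCluster()
library(doParallel)    # provides registerDoParallel()

mc <- detectCores()    # or however mc is set in your script
cl <- makeCluster(mc)  # build the cluster
registerDoParallel(cl)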
When running the following code in an R terminal:
library(parallel)
func <- function(a,b,c) a+b+c
testfun <- function() {
cl <- makeCluster(detectCores(), outfile="parlog.txt")
res <- clusterMap(cl, func, 1:10, 11:20, MoreArgs = list(c=1))
print(res)
stopCluster(cl)
}
testfun()
... it works just fine. However, when I copy the two function definitions into my own package, add a line #' @import parallel, run devtools::load_all("mypackage") in the R terminal and then call testfun(), I get an
Error in unserialize(node$con) (from myfile.r#7) :
error reading from connection
where #7 is the line containing the call to clusterMap.
So the exact same code works on the terminal but not inside a package.
If I take a look into parlog.txt, I see the following:
starting worker pid=7204 on localhost:11725 at 13:17:50.784
starting worker pid=4416 on localhost:11725 at 13:17:51.820
starting worker pid=10540 on localhost:11725 at 13:17:52.836
starting worker pid=9028 on localhost:11725 at 13:17:53.849
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
What's the root of this problem and how do I resolve it?
Note that I'm doing this with a completely fresh, naked package. (Created by devtools::create.) So no interactions with existing, possibly destructive code.
While writing the question, I actually found the answer and am going to share it here.
The problem here is the combination of the packages devtools and parallel.
Apparently, for some reason, parallel requires the package mypackage to be installed into some local library, even if you do not need to load it on the workers explicitly (e.g. using clusterEvalQ(cl, library(mypackage)) or something similar)!
I was employing the usual devtools workflow, meaning that I was working in dev_mode() all the time. However, this meant my package was installed only into devtools' special dev-mode folders (I do not know exactly how this works internally). Those folders are not searched by the worker processes spawned by parallel, since the workers are not in dev_mode.
So here is my 'workaround':
## turn off dev mode
dev_mode()
## install the package into a 'real' library
install("mypackage")
library(mypackage)
## ... and now the following works:
mypackage:::testfun()
As Hadley just pointed out correctly, another workaround would be to add a line
clusterEvalQ(cl, dev_mode())
right after cluster creation. That way, one can keep working in dev_mode().
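Applied to the example above, that second workaround would look roughly like this (a sketch; it assumes devtools is installed in a library the workers can see):
testfun <- function() {
  cl <- makeCluster(detectCores(), outfile = "parlog.txt")
  clusterEvalQ(cl, devtools::dev_mode())    # switch every worker into dev_mode as well
  res <- clusterMap(cl, func, 1:10, 11:20, MoreArgs = list(c = 1))
  print(res)
  stopCluster(cl)
}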
I'm trying to run an analysis in parallel on our computing cluster.
Unfortunately I've had to set up Rmpi myself and may not have done so properly.
Because I had to install all necessary packages into my home folder, I always have to call
.libPaths('/home/myfolder/Rlib');
before I can load packages.
However, it appears that doMPI attempts to load itself, before I can set the library path.
.libPaths('/home/myfolder/Rlib');
cat("Step 1")
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
cat("Step 2")
Children_mcmc1 = foreach(i = 1:2) %dopar% {
  cat("Step 3")
  .libPaths('/home/myfolder/Rlib');
  library(MCMCglmm)
  cat("Step 4")
  load("krmh_married.rdata")
  nitt = 1000; thin = 50; burnin = 100
  MCMCglmm(children ~ paternalage.factor,
           random = ~idParents,
           family = "poisson",
           data = krmh_married,
           pr = F, saveX = T, saveZ = T,
           nitt = nitt, thin = thin, burnin = burnin)
}
closeCluster(cl)
mpi.quit()
If I do
mpirun -H localhost -n 3 R --slave -f "3 - krmh mcmcglmm scc test 2.r"
I get (after removing some boilerplate messages)
During startup - Warning message:
Step 1
Step 1
Step 1
Step 2Error in { : task 2 failed - "cannot open the connection"
Calls: %dopar% ->
Execution halted
If I do
R --slave -f "3 - krmh mcmcglmm scc test 2.r"
I get
Step 1
Error in library(doMPI) : there is no package called 'doMPI'
Calls: local ... eval -> suppressMessages -> withCallingHandlers -> library
Execution halted
Error in library(doMPI) : there is no package called 'doMPI'
Calls: local ... eval -> suppressMessages -> withCallingHandlers -> library
Execution halted
I've tried installing doMPI on the fly, but even though Step 2 isn't printed, it seems that the error comes from inside the loop.
And of course, with all this I'm still testing on our frontend; I haven't made it to submitting the job to the intended cluster yet.
I tried to specify the .libPaths call in my .Rprofile, but I'm not sure it would get read on the cluster, and I can't even get it to be read on the frontend (and I couldn't figure out where R looks for the file).
It's much easier to install R packages into a "personal library", since it is used automatically so you don't have to call .libPaths in your scripts. You can determine what directory this is by executing:
> Sys.getenv('R_LIBS_USER')
This will automatically be the first directory returned by .libPaths if it exists, so you don't have to worry about calling .libPaths at all.
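A minimal sketch of that setup, assuming the directory does not exist yet (the package names just mirror the question):
libdir <- Sys.getenv('R_LIBS_USER')
dir.create(libdir, recursive = TRUE, showWarnings = FALSE)   # create the personal library
install.packages(c('doMPI', 'MCMCglmm'), lib = libdir)       # install into it
# Every freshly started R process (including the MPI workers) will now find it automatically.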
Note that there's no point in calling .libPaths in the body of the foreach loop since doMPI must be loaded by the cluster workers before they can execute any tasks.
I'm not sure what's going wrong in your "mpirun" case, because mpirun is starting all of the workers, so the first four lines of your script are executed by all of them. That is why "Step 1" is displayed three times. But in your second case, the cluster workers are being spawned, so the doMPI package is loaded by the RMPIworker.R script, resulting in the error loading doMPI.
I suggest that you use the mpirun approach to solve the .libPaths problem, but call startMPIcluster with the verbose=TRUE option. That will create some files in your working directory named "MPI_*.log" which may contain some useful error messages that will provide a clue to the problem.
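In the script above that is just a one-line change:
cl <- startMPIcluster(verbose = TRUE)   # writes MPI_*.log files to the working directory
registerDoMPI(cl)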
I'm attempting to run a parallel job in R using snow. I've been able to run extremely similar jobs with no trouble on older versions of R and snow. R package dependencies prevent me from reverting.
What happens: my jobs terminate at the parRapply step, i.e. the first time the nodes have to do anything other than report Sys.info(). The error message reads:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: cannot open the connection
Calls: parRapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Specs: R 2.14.0, snow 0.3-8, Red Hat Enterprise Linux Client release 5.6. The snow package has been built against the correct version of R.
Details:
The following code appears to execute fine:
cl <- makeCluster(3)
clusterEvalQ(cl,library(deSolve,lib="~/R/library"))
clusterCall(cl,function() Sys.info()[c("nodename","machine")])
I'm an end-user, not a system admin, but I'm desperate for suggestions and insights into what could be going wrong.
This cryptic error appeared because an input file that's requested during program execution wasn't actually present. Each node would attempt to load this file and then fail, but this would result only in a "cannot open the connection" message.
What this means is that almost anything can cause a "connection" error. Incredibly annoying!
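One way to surface this kind of problem early is to ask every node whether it can actually see the files it needs before doing any real work. A sketch, with "input.csv" standing in for whatever file your code reads:
library(snow)

cl <- makeCluster(3)
# Report from each node: its name, its working directory, and whether the file is visible.
clusterCall(cl, function() list(node  = Sys.info()[["nodename"]],
                                wd    = getwd(),
                                found = file.exists("input.csv")))
stopCluster(cl)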