Run Rmpi on a cluster, specify library path

I'm trying to run an analysis in parallel on our computing cluster.
Unfortunately I've had to set up Rmpi myself and may not have done so properly.
Because I had to install all necessary packages into my home folder, I always have to call
.libPaths('/home/myfolder/Rlib');
before I can load packages.
However, it appears that doMPI attempts to load itself before I can set the library path:
.libPaths('/home/myfolder/Rlib')
cat("Step 1")
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
cat("Step 2")
Children_mcmc1 = foreach(i = 1:2) %dopar% {
  cat("Step 3")
  .libPaths('/home/myfolder/Rlib')
  library(MCMCglmm)
  cat("Step 4")
  load("krmh_married.rdata")
  nitt = 1000; thin = 50; burnin = 100
  MCMCglmm(children ~ paternalage.factor,
           random = ~idParents,
           family = "poisson",
           data = krmh_married,
           pr = F, saveX = T, saveZ = T,
           nitt = nitt, thin = thin, burnin = burnin)
}
closeCluster(cl)
mpi.quit()
If I do
mpirun -H localhost -n 3 R --slave -f "3 - krmh mcmcglmm scc test 2.r"
I get (after removing some boilerplate messages)
During startup - Warning message:
Step 1
Step 1
Step 1
Step 2Error in { : task 2 failed - "cannot open the connection"
Calls: %dopar% ->
Execution halted
If I do
R --slave -f "3 - krmh mcmcglmm scc test 2.r"
I get
Step 1
Error in library(doMPI) : there is no package called 'doMPI'
Calls: local ... eval -> suppressMessages -> withCallingHandlers -> library
Execution halted
Error in library(doMPI) : there is no package called 'doMPI'
Calls: local ... eval -> suppressMessages -> withCallingHandlers -> library
Execution halted
I've tried installing doMPI on the fly, but even though "Step 2" isn't printed, the error still seems to come from the loop.
And of course, all of this is still on our frontend; I haven't made it to submitting the job to the intended cluster yet.
I tried putting the .libPaths call in my .Rprofile, but I'm not sure it would be read on the cluster, and I can't even get it to be read on the frontend (I couldn't figure out where R looks for the file).

It's much easier to install R packages into a "personal library", since that library is used automatically and you don't have to call .libPaths in your scripts. You can determine which directory this is by executing:
> Sys.getenv('R_LIBS_USER')
This will automatically be the first directory returned by .libPaths if it exists, so you don't have to worry about calling .libPaths at all.
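For example, you could create that directory once and install your packages into it (a sketch; the path R reports differs per system, and the package names are taken from your script):
lib <- Sys.getenv('R_LIBS_USER')
dir.create(lib, recursive = TRUE, showWarnings = FALSE)  # create the personal library if it doesn't exist yet
install.packages(c('doMPI', 'MCMCglmm'), lib = lib)      # installed here, R finds them without .libPaths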
Note that there's no point in calling .libPaths in the body of the foreach loop since doMPI must be loaded by the cluster workers before they can execute any tasks.
I'm not sure what's going wrong in your mpirun case. Note that mpirun starts all of the workers itself, so the first four lines of your script are executed by every process; that is why "Step 1" is displayed three times. In your second case, by contrast, the cluster workers are spawned, so the doMPI package is loaded by the RMPIworker.R script before your script's .libPaths call can take effect, resulting in the error loading doMPI.
I suggest that you use the mpirun approach to solve the .libPaths problem, but call startMPIcluster with the verbose=TRUE option. That will create files named "MPI_*.log" in your working directory, which may contain useful error messages that provide a clue to the problem.
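A minimal sketch of that diagnostic setup (the same script as above, just with logging enabled):
library(doMPI)
cl <- startMPIcluster(verbose = TRUE)  # workers write MPI_*.log files to the working directory
registerDoMPI(cl)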

Related

R/RStudio unable to run, with looping socketConnection error

A few days ago, I was having an error running models in R using 'brms', which said that my posterior samples didn't exist. Upon reading further, these links (1, 2, 3, 4) led me to think it was an rstan problem interacting with my macOS (Catalina 10.15.6).
I followed their instructions, namely:
- updated the packages Rcpp, rstan, arm, and brms
- followed these workaround instructions to alter the 'parallel' settings for stan (see the sketch after this list): https://github.com/rstudio/rstudio/issues/6692
- updated R and RStudio, since this problem was supposedly fixed a few months ago with R 4.0
- updated Xcode 11, Quartz 11, GNU Fortran 8.2
- updated the latest macOS Catalina bug fixes
- ran sudo rm -rf [path to R] to uninstall R
- tried to do a thorough uninstall of all R and RStudio files, including deleting files in my Library/Frameworks folder, any .plist files in Library/Preferences, and any .Rprofile, .Rscript, .Rapp, .Rhistory, or .Renviron files
- reinstalled R and RStudio after a restart
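For reference, the workaround from that RStudio issue is roughly the following snippet added to ~/.Rprofile (a sketch; it switches PSOCK cluster setup to sequential on macOS with R >= 4.0 inside RStudio):
if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM")) &&
    Sys.info()["sysname"] == "Darwin" && getRversion() >= "4.0.0") {
  parallel:::setDefaultClusterOptions(setup_strategy = "sequential")  # avoid the parallel setup that hangs
}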
Now, instead of having a "blank slate" to start from, I am experiencing some super weird behaviors. First, RStudio opens on a completely white blank screen and never loads. Second, when I try to open R directly either via terminal or with R Console, I get stuck in a loop for nearly 20 min that says:
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
file descriptor is too large for select()
Calls: <Anonymous> ... makePSOCKcluster -> newPSOCKnode -> socketConnection
Execution halted
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
cannot open the connection
Calls: <Anonymous> ... makePSOCKcluster -> newPSOCKnode -> socketConnection
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
port 11537 cannot be opened
Execution halted
At the very end, when it finally stops looping, it says:
/Library/Frameworks/R.framework/Versions/4.0/Resources/bin/R: cannot make pipe for command substitution: Too many open files
ERROR: option '-e' requires a non-empty argument
rm: /var/folders/54/km__8z8x78x8_ct1pw8w8bbh0000gn/T//RtmpVORdTy: Too many open files
I can't access a console or enter anything in R to try to troubleshoot. Moreover, it causes a massive slowdown to my computer and Activity Monitor shows more than 150 'R' processes running, which don't go away after quitting R, only after using 'killall R' in Terminal.
However, someone in IT helped me determine that it's something in my Mac user library or preferences, because we created a brand new user on my machine, installed R and RStudio, and had no problems loading them.
I am just a psychology grad student, so I really don't understand the back end that makes R work and I am totally baffled by these symptoms.
I suspect that these links (5, 6, 7) might help, but I don't know how to execute the solutions because right now I can't enter or run anything in R without triggering that endless loop of 'Execution halted.'
I could really use a hand, thanks!

Errors running an R script on clusters

I'm running an R script using foreach() and %dopar% on clusters. Without my having changed anything in the code, the execution of my program is interrupted every time, returning one of 3 different errors:
1.
Error in unserialize(socklist[[n]]) : error reading from connection
Calls: %dopar% ... recvOneData -> recvOneData.SOCKcluster -> unserialize
Execution halted
2.
Error in { :
task 1 failed - "unable to load shared object '/cluster/apps/r/3.5.1_openblas/x86_64/lib64/R/library/units/libs/units.so':
libudunits2.so.0: cannot open shared object file: No such file or directory"
Calls: %dopar% -> <Anonymous>
Execution halted
3.
TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.
Exited with exit code 1
In the last scenario, I tried increasing the number of nodes and the requested memory in the submission command bsub -n 23 -W 20:00 -R "rusage[mem=4072]" "R --vanilla --slave <1.algorithm_function_part0_alternative.R> resultFunPart0Alt.out", but it still returns the same error or one of the previous two. I'm using R version 3.5.1.
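One way to isolate the second error (a sketch, not from the original post): try the failing load in a plain, non-parallel R session on the cluster. If this already fails, the problem is the missing system library libudunits2 rather than anything in the foreach loop.
## run interactively, or via R --vanilla, on a cluster node
library(units)  # should reproduce "unable to load shared object ... libudunits2.so.0"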

"R CMD check" throws warning on use of 'devtools::test()', but allows 'test()', but need to use full function name

I'm running my package through R CMD check and the only (remaining) warning is the following:
W checking for unstated dependencies in 'tests' (4.4s)
'::' or ':::' import not declared from: 'devtools'
After being confused for a long time by this seemingly nonsensical warning, I realized it's coming from my "test manager" script (see the reason it's needed below). This is the file pkg/tests/testthat.R, while the tests themselves are in pkg/tests/testthat/.
# testthat.R
sink(stderr(), type = "output")
x <- tryCatch(
  {
    x <- data.frame(devtools::test())  # here's the problem!
    as.numeric(sum(x$failed) > 0)
  },
  error = function(e) {
    1
  }
)
sink(NULL, type = "output")
cat(x)  # emit the failure flag (0/1) so the calling hook can capture it
If I comment out this entire file, the R CMD check warning vanishes.
And then the weird part: if I replace devtools::test() with just test(), the R CMD check warning vanishes.
However, the purpose of this "manager" script is to be called (via Rscript) by a git pre-commit hook. This way, I can run all my tests to ensure the commit is stable. Because of this, I can't use test(), since devtools isn't loaded when the script is run via Rscript.
I tried a few things to satisfy both R CMD check and being called by Rscript:
Using library(devtools) doesn't work (throws a package not found error);
Moving testthat.R out of the /tests/ folder and into the top-level. This kills the R CMD check warning, but it now instead throws a note: Non-standard file/directory found at top level: 'testthat.R', so not exactly satisfactory (especially since keeping it in the /tests/ directory seems more logically consistent);
Testing for a function which has apparently been loaded by R CMD check, to determine behavior. Since using a naked test() works, I assumed devtools was loaded, so I prepended the following to the file (and used runTests on the problematic line). The logic being: if we can find test(), use it; if we can't, then this probably isn't R CMD check, so we can use the full name.
if (length(find("test")) == 0) {
  runTests <- devtools::test()
} else {
  runTests <- test()
}
Unfortunately, this just made things worse: the warning remains, and we also get an error on the if-else block:
> if (length(find("test")) == 0) {
+ runTests <- devtools::test()
+ } else {
+ runTests <- test()
+ }
Error in loadNamespace(name) : there is no package called 'devtools'
Calls: :: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Why devtools::test() throws an error here and just a warning on the problematic line is beyond me.
Similarly with testthat::skip(); it also doesn't work.
So, what can I do to satisfy both R CMD check and being called by Rscript? Is there a way to tell R CMD check to ignore this file?
For the record, this is my git pre-commit hook, in case it can be reformulated to solve this problem some other way:
#!/bin/sh
R_USER="D:/Users/wasabi/Documents"
export R_USER

# Check that Rscript is accessible via PATH; fail otherwise.
command -v Rscript >/dev/null || {
    echo "Rscript must be accessible via PATH. Commit aborted."
    exit 1
}

# Check whether there are unstaged changes. If so, stash them.
# This allows the tests to run only on previously committed or
# indexed (added on this commit) changes.
hasChanges=$(git diff)
if [ -n "$hasChanges" ]; then
    git stash push --keep-index
fi

exitCode=$(Rscript tests/testthat.R)

# Remember to unstash any unstaged changes.
if [ -n "$hasChanges" ]; then
    git stash pop
fi

exit $exitCode
The solution is to simply add tests/testthat.R to .Rbuildignore (either by hand in the form of a regular expression or using usethis::use_build_ignore("tests/testthat.R")).
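For reference, the usethis call writes an anchored, escaped regular expression into .Rbuildignore, which is also what the by-hand alternative would look like (a sketch):
usethis::use_build_ignore("tests/testthat.R")
## .Rbuildignore now contains the line:
## ^tests/testthat\.R$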
If you actually run R CMD check, the warning will still appear (since it runs on the source files and therefore ignores .Rbuildignore, unless you run it on the built package itself).
But the "Check Package" command in RStudio relies on devtools::check(), which builds the package first and then checks the result, therefore not producing the warning. And since that's how my team and I will actually be running the checks, it's sufficient.
Solution inspired by this question.

How to use the parallel package inside another package, using devtools?

When running the following code in an R terminal:
library(parallel)

func <- function(a, b, c) a + b + c

testfun <- function() {
  cl <- makeCluster(detectCores(), outfile = "parlog.txt")
  res <- clusterMap(cl, func, 1:10, 11:20, MoreArgs = list(c = 1))
  print(res)
  stopCluster(cl)
}

testfun()
... it works just fine. However, when I copy the two function definitions into my own package, add a line #' @import parallel, run devtools::load_all("mypackage") in the R terminal and then call testfun(), I get an
Error in unserialize(node$con) (from myfile.r#7) :
error reading from connection
where #7 is the line containing the call to clusterMap.
So the exact same code works on the terminal but not inside a package.
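For concreteness, the package version roughly looks like this (a sketch; the file layout and roxygen tags are assumed, not from the original post):
# R/myfile.r inside the package 'mypackage'
#' @import parallel
func <- function(a, b, c) a + b + c

#' @import parallel
#' @export
testfun <- function() {
  cl <- makeCluster(detectCores(), outfile = "parlog.txt")
  res <- clusterMap(cl, func, 1:10, 11:20, MoreArgs = list(c = 1))  # the line that errors
  print(res)
  stopCluster(cl)
}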
If I take a look into parlog.txt, I see the following:
starting worker pid=7204 on localhost:11725 at 13:17:50.784
starting worker pid=4416 on localhost:11725 at 13:17:51.820
starting worker pid=10540 on localhost:11725 at 13:17:52.836
starting worker pid=9028 on localhost:11725 at 13:17:53.849
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
Error: (converted from warning) namespace 'mypackage' is not available and has been replaced
by .GlobalEnv when processing object ''
What's the root of this problem and how do I resolve it?
Note that I'm doing this with a completely fresh, naked package. (Created by devtools::create.) So no interactions with existing, possibly destructive code.
While writing the question, I actually found the answer and am going to share it here.
The problem here is the combination of the packages devtools and parallel.
Apparently, for some reason, parallel requires the package mypackage to be installed in some local library, even if you do not load it on the workers explicitly (e.g. using clusterEvalQ(cl, library(mypackage)) or something similar)!
I was employing the usual devtools workflow, meaning that I was working in dev_mode() all of the time. However, this led to my package being installed only in some special dev-mode folders (I do not know exactly how this works internally). These are not searched by the worker processes invoked by parallel, since they are not in dev_mode.
So here is my 'workaround':
## turn off dev mode
dev_mode()
## install the package into a 'real' library
install("mypackage")
library(mypackage)
## ... and now the following works:
mypackage:::testfun()
As Hadley correctly pointed out, another workaround would be to add a line
clusterEvalQ(cl, dev_mode())
right after cluster creation. That way, one can keep working in dev_mode.
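In context, that alternative might look like this (a sketch; it assumes devtools is installed in a library the workers can see):
testfun <- function() {
  cl <- makeCluster(detectCores(), outfile = "parlog.txt")
  clusterEvalQ(cl, { library(devtools); dev_mode() })  # workers switch to the dev-mode library too
  res <- clusterMap(cl, func, 1:10, 11:20, MoreArgs = list(c = 1))
  print(res)
  stopCluster(cl)
}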

"Cannot open the connection" - HPC in R with snow

I'm attempting to run a parallel job in R using snow. I've been able to run extremely similar jobs with no trouble on older versions of R and snow. R package dependencies prevent me from reverting.
What happens: my jobs terminate at the parRapply step, i.e., the first time the nodes have to do anything beyond reporting Sys.info(). The error message reads:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: cannot open the connection
Calls: parRapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Specs: R 2.14.0, snow 0.3-8, RedHat Enterprise Linux Client release 5.6. The snow package has been built on the correct version of R.
Details:
The following code appears to execute fine:
cl <- makeCluster(3)
clusterEvalQ(cl, library(deSolve, lib.loc = "~/R/library"))
clusterCall(cl, function() Sys.info()[c("nodename", "machine")])
I'm an end-user, not a system admin, but I'm desperate for suggestions and insights into what could be going wrong.
This cryptic error appeared because an input file that's requested during program execution wasn't actually present. Each node would attempt to load this file and then fail, but this would result only in a "cannot open the connection" message.
What this means is that almost anything can cause a "connection" error. Incredibly annoying!
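A quick way to surface this kind of cause (a sketch, not from the original post; "input.rdata" is a placeholder for whatever file the program loads):
clusterCall(cl, getwd)                                  # confirm each worker's working directory
clusterCall(cl, function() file.exists("input.rdata"))  # check the input file is visible to every node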
