doMPI not recognizing other nodes in cluster for R script

Using RHEL7.3
Using R 3.3.2
Installed Rmpi_0.6-6.tar.gz and doMPI_0.2.1.tar.gz
Installed mpich-3.0-3.0.4-10.el7 RPM for x86_64
I created a cluster of three machines (aml1, aml2, aml3). I can run the /examples/cpi example from the mpich installation, and the processes run without issue on all three machines.
I can also run an R script that needs to be run multiple times, as discussed in the doMPI documentation, so the script runs on all of the nodes.
My problem arises when my R script has code before the %dopar% that needs to run once on the master (aml1), while the %dopar% itself runs on the cluster (aml2, aml3). In that case everything runs only on the master, doMPI reports Size of MPI universe: 0, and aml2 and aml3 are never recognized.
For example:
Run: mpirun -np 1 --hostfile ~/projects/hosts R --no-save -q < example6.R
(and my ~/projects/hosts file is defined to use 8 cores)
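The contents of ~/projects/hosts are not shown here; purely as a hypothetical sketch (the per-host split is a guess), an Open MPI-style hostfile exposing 8 slots across the three machines would look like:
aml1 slots=2
aml2 slots=3
aml3 slots=3
(MPICH's hydra launcher uses the "hostname:N" form instead of "slots=N".)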
example6.R:
library(doMPI) #load doMPI library
cl <- startMPIcluster(verbose=TRUE)
registerDoMPI(cl)  # register the cluster as the foreach parallel backend
#load data
#clean data
#perform some functions
#let's say I want to have this done in the script and only parallelize this
x <- foreach(seed=c(7, 11, 13), .combine="cbind") %dopar% {
  set.seed(seed)
  rnorm(3)
}
x
closeCluster(cl)
Output of example6.R:
Master processor name: aml1; nodename: aml1
Size of MPI universe: 0
Spawning 2 workers using the command:
/usr/lib64/R/bin/Rscript /usr/lib64/R/library/doMPI/RMPIworker.R WORKDIR=/home/spark LOGDIR=/home/spark MAXCORES=1 COMM=3 INTERCOMM=4 MTAG=10 WTAG=11 INCLUDEMASTER=TRUE BCAST=TRUE VERBOSE=TRUE
2 slaves are spawned successfully. 0 failed.
If I define cl <- startMPIcluster(count=34, verbose=TRUE), I still get the following, but at least 34 slaves are spawned:
Master processor name: aml1; nodename: aml1
Size of MPI universe: 0
34 slaves are spawned successfully. 0 failed.
How can I troubleshoot this? I would like the R script to run the first portion once on the master, and then run the %dopar% on the cluster.
Thanks!!
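As a quick diagnostic, a minimal sketch (assuming Rmpi is installed alongside doMPI) is to print what Rmpi itself reports under the same mpirun line; doMPI takes "Size of MPI universe" from this value:
library(Rmpi)
print(mpi.universe.size())  # should reflect the slots defined in the hostfile, not 0
print(mpi.comm.size(0))     # size of MPI_COMM_WORLD for this launch
mpi.quit()
If mpi.universe.size() also prints 0 here, the problem is in the MPI installation or hostfile rather than in doMPI itself.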
Update 1
Since posting, I have tried running an older version of OpenMPI:
[spark#aml1 ~]$ which mpirun
/opt/openmpi-1.8.8/bin/mpirun
Per @SteveWeston, I created the following script and ran it:
[spark#aml1 ~]$ cat sanity_check.R
library(Rmpi)
print(mpi.comm.rank(0))
mpi.quit()
With the following output:
[spark#aml1 ~]$ mpirun -np 3 --hostfile ~/projects/hosts R --slave -f sanity_check.R
FIPS mode initialized
master (rank 0, comm 1) of size 3 is running on: aml1
slave1 (rank 1, comm 1) of size 3 is running on: aml1
slave2 (rank 2, comm 1) of size 3 is running on: aml1
[1] 0
Here it just hangs -- and nothing happens.

I've already accepted @SteveWeston's answer, as it helped me better understand my original question.
I commented on his answer that I was still having issues with my R script hanging: the script would run, but it would never finish on its own or close its own cluster, and I had to kill it with Ctrl-C.
I ultimately set up an NFS environment, built and installed openmpi-1.10.5 there, and installed my R libraries there as well. R is installed separately on both machines, but they share the same library in my NFS directory. Previously I had installed and managed everything under root, including the R libraries (I know). I'm not sure if that is what caused the complications, but my issues seem to be resolved.
[master#aml1 nfsshare]$ cat sanity_check.R
library(Rmpi)
print(mpi.comm.rank(0))
mpi.quit(save="no")
[master#aml1 nfsshare]$ mpirun -np 3 --hostfile hosts R --slave -f sanity_check.R
FIPS mode initialized
[1] 1
[1] 0
[1] 2
# no need to ctrl-C here. It no longer hangs

Related

Forks are spawned on a single core on interactive HPC node

I am trying to test a script I developed locally on an interactive HPC node, and I keep running into this strange issue where mclapply works only on a single core. I see several R processes spawned in htop (as many as the number of cores), but they all occupy only one core.
Here is how I obtain the interactive node:
srun -n 16 -N 1 -t 5 --pty bash -il
Is there a setting I am missing? How can I make this work? What can I check?
P.S. I just tested this, and other programs that rely on forking for parallel processing (say, pigz) are afflicted by the same issue as well. Those that rely on MPI and message passing seem to work properly.
Yes, you are missing a setting. Try:
srun -N 1 -n 1 -c 16 -t 5 --pty bash -il
The problem is that srun -n 16 requests 16 tasks of one core each, while the interactive bash shell runs as a single task, so the shell (and everything forked from it) is confined to one of the cores requested by srun. Asking for one task with 16 CPUs (-n 1 -c 16) gives that shell all 16 cores.
Otherwise, you can first allocate your resources using salloc and, once you have them, run your actual command. For instance:
salloc -N 1 -n 1 -c 16 -t 5
srun pigz file.ext
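For the original mclapply symptom, a minimal R sketch (the workload is illustrative) that picks up the CPUs granted by -c 16 looks like this:
library(parallel)
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task (-c) is used
ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
res <- mclapply(1:32, function(i) sum(rnorm(1e6)), mc.cores = ncores)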

"no enough slots" error of running Open MPI on databricks cluster with Linux

I am trying to use MPI to run a C application on a Databricks cluster.
I downloaded Open MPI from
https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz
and installed it on the Databricks cluster. It was built on the Databricks cluster, which runs Ubuntu.
Operating system/version: Linux 4.4.0 Ubuntu
Computer hardware: x86_64
Network type: databricks
I am trying to run the following from a Python notebook on Databricks:
%sh
mpirun --allow-run-as-root -np 20 MY_c_Application
MY_c_Application was written in C and compiled on the Databricks Linux environment.
My Databricks cluster has 21 nodes, one of which is the driver. Each node has 32 cores.
When I run the above command, I get the error below.
Could you please let me know what could cause this? Or am I missing something?
Thanks
There are not enough slots available in the system to satisfy the 20
slots that were requested by the application:

  MY_c_application

Either request fewer slots for your application, or make more slots available for use.

A "slot" is the Open MPI term for an allocatable unit where we can launch a process. The number of slots available are defined by the environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number of hardware threads instead of the number of processor cores, use the --use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the number of available slots when deciding the number of processes to launch.
UPDATE
After adding a hostfile, this problem is gone.
sudo mpirun --allow-run-as-root -np 25 --hostfile my_hostfile ./MY_C_APP
thanks
Sharing the answer as provided by the original poster:
After adding a hostfile, the problem was resolved.
sudo mpirun --allow-run-as-root -np 25 --hostfile my_hostfile ./MY_C_APP

Why does mpirun behave as it does when used with Slurm?

I am using Intel MPI and have encountered some confusing behavior when using mpirun in conjunction with slurm.
If I run (in a login node)
mpirun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"
then I get as output the expected 0 and 1 printed out.
If, however, I run salloc --time=30 --nodes=1 and run the same mpirun from the interactive compute node, I get two 0s printed out instead of the expected 0 and 1.
Then, if I change -n 2 to -n 3 (still on the compute node), I get a large error from slurm saying srun: error: PMK_KVS_Barrier task count inconsistent (2 != 1) (plus a load of other stuff), but I am not sure how to explain this either...
Now, based on this OpenMPI page, it seems these kinds of operations should be supported at least for OpenMPI:
Specifically, you can launch Open MPI's mpirun in an interactive SLURM allocation (via the salloc command) or you can submit a script to SLURM (via the sbatch command), or you can "directly" launch MPI executables via srun.
Maybe the Intel MPI implementation I was using just doesn't have the same support and is not designed to be used directly in a Slurm environment (?), but I am still wondering: what is it about how mpirun and Slurm (salloc) interact that produces this behavior? Why does it print two 0s in the first case, and what are the inconsistent task counts it complains about in the second?
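For reference, the three launch styles that page describes look roughly like this (a sketch; ./my_mpi_program is a placeholder, and the exact behavior depends on the site's Slurm/MPI integration):
salloc --time=30 --nodes=1 --ntasks=2    # interactive allocation with two tasks
mpirun -n 2 ./my_mpi_program             # mpirun inside the salloc allocation
srun -n 2 ./my_mpi_program               # "direct" launch through srun
sbatch job.sh                            # batch script that itself calls mpirun or srun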

Linux cluster, Rmpi and number of processes

Since the beginning of November, I have been stuck trying to run a parallel job on a Linux cluster. I have already searched A LOT on the internet for information, but I simply can't make progress. When I started looking into parallelism in R on a cluster, I discovered Rmpi. It looked quite simple, but now I no longer know what to do. I have a script to submit my job:
#PBS -S /bin/bash
#PBS -N ANN_residencial
#PBS -q linux.q
#PBS -l nodes=8:ppn=8
cd $PBS_O_WORKDIR
source /hpc/modulos/bash/R-3.3.0.sh
export LD_LIBRARY_PATH=/hpc/nlopt-2.4.2/lib:$LD_LIBRARY_PATH
export CPPFLAGS='-I/hpc/nlopt-2.4.2/include '$CPPFLAGS
export PKG_CONFIG_PATH=/hpc/nlopt-2.4.2/lib/pkgconfig:$PKG_CONFIG_PATH
# OPENMPI 1.10 + GCC 5.3
source /hpc/modulos/bash/openmpi-1.10-gcc53.sh
mpiexec --mca orte_base_help_aggregate 0 -np 1 -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
And this is the beginning of my R program:
library(caret)
library(Rmpi)
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
So here are my questions:
1- Is this the way I should initialize the processes (i.e. using startMPIcluster without a parameter and passing -np 1 on the command line)?
2- Why, when I use these commands, does MPI complain with the following message?
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process....
OBS: It reports this for all 64 processes (because there are 8 nodes with 8 CPUs and I'm creating 63 worker processes).
3- Why, when I use these commands on a machine with 60 CPUs, does it spawn only two workers?
Finally, I got it!
To run a parallel program in R using Rmpi on a cluster, you need to configure the job script according to the system. Then, on the command line, you have to change:
mpiexec --mca orte_base_help_aggregate 0 -np 1 -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
to:
mpiexec -np NUM_PROC -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
In the R code, you must not pass anything to startMPIcluster(), so the code stays exactly as written above.
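Putting the pieces together, the resulting pattern looks roughly like this (a sketch; NUM_PROC would be 64 for nodes=8:ppn=8, and the caret/nlopt specifics are omitted):
mpiexec -np 64 -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
and, in sunhpc_mpi.r:
library(doMPI)
cl <- startMPIcluster()   # no count: the workers are the MPI ranks already started by mpiexec
registerDoMPI(cl)
# ... foreach(...) %dopar% { ... } work goes here ...
closeCluster(cl)
mpi.quit()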

MPI remote nodes do not execute

I am trying to build a cluster to parallelize an R function. I am using Open MPI on Ubuntu and the R packages Rmpi and snow. The test code I am running is:
library(Rmpi)   # provides mpi.quit()
library(snow)   # provides makeMPIcluster(), clusterCall(), stopCluster()
cl <- makeMPIcluster(8)
fun <- function() {
  Sys.info()[c("nodename")]
}
clusterCall(cl, fun)
stopCluster(cl)
mpi.quit()
The command is:
mpirun -H localhost,node2 -n 1 R --slave -f testSnowMPI.R
The problem is that all of the returned values are the local hostname. I watched the processes on localhost and on node2, and they do seem to start (4 on localhost and 4 on node2), but the slave processes on node2 quickly stop and everything is executed on localhost.
I did another test with a different script (testSnowMPI.R) on each node, and when I changed the parameter -n 1 to -n 2 I got different returns as expected, but both scripts were executed by localhost.
Another interesting test: when I issue the mpirun command on localhost but set only node2 as the execution host (-H node2), the answer I get is All nodes which are allocated for this job are already filled.
I can ping localhost from node2 and node2 from localhost, and I have already set up passphrase-less ssh between them.
It seems like the processes on node2 start normally, but they cannot send their results back to the master, and so localhost ends up doing all the work.
I ran the same tests from node2 (so that node2 was the localhost), and the behaviour was exactly the same.
Do you have any idea about the weird behaviour of these tests?
EDIT
I did some tests using only Rmpi functions (without Snow functions). I wrote this script:
library(Rmpi)   # provides mpi.spawn.Rslaves() and mpi.close.Rslaves()
mpi.spawn.Rslaves()
mpi.close.Rslaves()
The command was:
mpirun -H localhost,node2,node2 -n 1 R --slave -f testSnowMPI.R
And I got this output:
master (rank 0, comm 1) of size 3 is running on: node1
slave1 (rank 1, comm 1) of size 3 is running on: node1
slave2 (rank 2, comm 1) of size 3 is running on: node1
(node1 is the localhost)
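One thing worth checking here (a sketch, run with the same mpirun line) is whether Open MPI is actually offering slots on node2 before anything is spawned:
library(Rmpi)
print(mpi.universe.size())   # with -H localhost,node2,node2 this should be at least 3
mpi.spawn.Rslaves(nslaves = mpi.universe.size() - 1)
mpi.close.Rslaves()
mpi.quit()
If that value does not include node2's slots, the spawn step will place all of the slaves on the local host.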

Resources