PBS scheduler assigning same processor for an MPI program of 3 processors - mpi

I am doing MPI programming on a cluster with 8 nodes, each with an Intel Xeon hex-core processor. I need three processors for my MPI code.
I submit the job using qsub. When I check which processors the job is running on with "qstat -n", it says something like cn004/0*3.
So does this mean it is running on only one processor?
I ask because it is no faster than when I use a single processor (with the same domain size in both cases).
The script I use for submitting is as follows:
#! /bin/bash
#PBS -o logfile.log
#PBS -e errorfile.err
#PBS -l cput=40:00:00
#PBS -lselect=1:ncpus=3:ngpus=3
#PBS -lplace=excl
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpicc -g -W -c -I /usr/local/cuda/include mpi1.c
mpicc -g -W mpi1.o -L /usr/local/cuda/lib64 -lOpenCL
mpirun -np 3 ./a.out

"qstat -n" it says something like cn004/0*3.
Q: So does this mean it is running it on only one processor ??
The short answer is "no". This does not mean that it runs on one processor.
"cn004/0*3" should be interpreted as "The job is allocated three cpu cores. And if we were to number the cores from 0 to 5 then the cores allocated would have numbers 0,1,and 2".
If another job were to run on the node it would receive the next three consecutive numbers "3,4, and 5". In the qstat -n output this would look like "cn004/3*3".
You use the directive place=excl to ensure that other jobs would not get the node, so essentially all the six cores are available.
Now for your second question:
Q: It is not speeding up compared to when I use a single processor.
To answer this we would need to know whether the algorithm is parallelized correctly; the allocation shown by qstat only proves that three cores were reserved, not that your program actually uses them.
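If you want to confirm where the three ranks actually land, one quick check (a sketch, assuming Open MPI, which exports OMPI_COMM_WORLD_RANK to each launched process, and that util-linux's taskset is installed on the node) is to have each rank report its host and CPU affinity before running the real code:
# Each rank prints its rank, the node it runs on, and the cores it may use
mpirun -np 3 bash -c 'echo "rank $OMPI_COMM_WORLD_RANK host $(hostname) cpus $(taskset -cp $$ | cut -d: -f2)"'
If three distinct ranks appear on the allocated cores, the processes were launched correctly and any remaining lack of speed-up lies in the program itself.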

Related

Forks are spawned on a single core on interactive HPC node

I am trying to test a script I developed locally on an interactive HPC node, and I keep running into this strange issue where mclapply works on only a single core. I see several R processes spawned in htop (as many as the number of cores), but they all occupy only one core.
Here is how I obtain the interactive node:
srun -n 16 -N 1 -t 5 --pty bash -il
Is there a setting I am missing? How can I make this work? What can I check?
P.S. I just tested, and other programs that rely on forking for parallel processing (say, pigz) are affected by the same issue as well. Those that rely on MPI and message passing seem to work properly.
Yes, you are missing a setting. Try:
srun -N 1 -n 1 -c 16 -t 5 --pty bash -il
The problem is that srun -n 16 requests 16 tasks of one core each, and the interactive bash shell is a single task, so the shell (and everything it forks) is confined to one core. Requesting a single task with 16 cores (-c 16) gives the shell, and anything forked from it, access to all 16.
Otherwise, you can first allocate your resources with salloc and, once you have them, run your actual command. For instance:
salloc -N 1 -n 1 -c 16 -t 5
srun pigz file.ext
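To confirm what the interactive shell really has, a quick sanity check (a sketch; it assumes GNU coreutils' nproc and util-linux's taskset are available on the node) is:
# From inside the interactive shell:
nproc              # number of cores the shell (and its forks) may actually use
taskset -cp $$     # explicit list of cores this shell is bound to
With the corrected srun/salloc lines above, nproc should report all 16 cores rather than just one.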

Linux cluster, Rmpi and number of processes

Since the beginning of November I have been stuck trying to run a parallel job on a Linux cluster. I have already searched a lot on the internet but simply cannot make progress. While looking into parallelism in R on a cluster I discovered Rmpi. It looked quite simple, but now I no longer know what to do. I have a script to submit my job:
#PBS -S /bin/bash
#PBS -N ANN_residencial
#PBS -q linux.q
#PBS -l nodes=8:ppn=8
cd $PBS_O_WORKDIR
source /hpc/modulos/bash/R-3.3.0.sh
export LD_LIBRARY_PATH=/hpc/nlopt-2.4.2/lib:$LD_LIBRARY_PATH
export CPPFLAGS='-I/hpc/nlopt-2.4.2/include '$CPPFLAGS
export PKG_CONFIG_PATH=/hpc/nlopt-2.4.2/lib/pkgconfig:$PKG_CONFIG_PATH
# OPENMPI 1.10 + GCC 5.3
source /hpc/modulos/bash/openmpi-1.10-gcc53.sh
mpiexec --mca orte_base_help_aggregate 0 -np 1 -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
And this is the beginning of my R program:
library(caret)
library(Rmpi)
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
So here are my questions:
1. Is this how I should initialize the processes (i.e., calling startMPIcluster() without a parameter and using -np 1 on the command line)?
2. Why does MPI complain with the following message when I use these commands?
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process....
Note: the message is printed for all of the processes (there are 8 nodes with 8 CPUs each, and I am creating 63 workers).
3. Why, when I use these commands on a machine with 60 CPUs, are only two workers spawned?
Finally, I got it!
To run a parallel R program using Rmpi on a cluster, you need to configure the job script to match the system. The command line I was using:
mpiexec --mca orte_base_help_aggregate 0 -np 1 -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
has to be modified to:
mpiexec -np NUM_PROC -hostfile ${PBS_NODEFILE} /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
In the R code, you must not pass any arguments to startMPIcluster(), so the code stays exactly as written above.
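If you would rather not hard-code the count, a small variation (a sketch; NUM_PROC is just the placeholder name used above) is to derive it from $PBS_NODEFILE, which PBS writes with one line per allocated CPU slot:
# Count the CPU slots PBS allocated instead of hard-coding NUM_PROC
NUM_PROC=$(wc -l < "$PBS_NODEFILE")
mpiexec -np "$NUM_PROC" -hostfile "${PBS_NODEFILE}" /hpc/R-3.3.0/bin/R --slave -f sunhpc_mpi.r
With nodes=8:ppn=8 this yields 64 slots: one doMPI master plus the 63 workers mentioned in the question.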

error in using "qsub" when allocating nodes

My script (13-4.sh) is:
#!/bin/sh
#PBS -N sample
#PBS -l nodes=4:ppn=64
#PBS -q batch
#PBS -o $HOME/qpms9-2/out/14-4.out
#PBS -e $HOME/qpms9-2/error/14-4.out
#PBS -l walltime=100:00:00
mpirun $HOME/qpms9-2/run_mpi $HOME/qpms9-2/14-4 -l 14 -d 4
When I submit it with qsub 13-4.sh, the answer is as follows:
qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max nodes requirement
My cluster has 10 nodes (64 cores per node).
This might be an issue with your scheduler. I don't know which one you have, but you should search the relevant documentation and settings to see whether something is capping the maximum number of nodes per user or per queue. If you type
/sbin/service pbs status
you can find which scheduler you're using by checking which services are running. Popular schedulers are pbs_sched, maui, and moab.
I would also make sure that all 10 nodes are online. Depending on how your cluster is configured, you may be able to check this with
pbsnodes
Additionally, you should check that the batch queue exists and that there are no standing reservations.
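If it turns out to be a TORQUE/PBS-style setup, one way to see whether the queue itself caps the node count (a sketch; it assumes the queue really is called batch, as in the script) is to print the queue's configuration:
# Full queue status; look for resources_max.nodes / resources_max.nodect
qstat -Qf batch
# The same information via the management interface
qmgr -c "print queue batch"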

Multithreaded program only runs on a single processor after compiling, how do I troubleshoot?

I am trying to run a compiled program that is supposed to run on multiple processors. With the same data, sometimes the program runs in parallel and sometimes it doesn't (with an identical PBS script file!). I suspect something is wrong with some of the compute nodes that prevents it from running in parallel (I don't get to choose which compute node I get). How can I tell whether this is a bug in the program or a problem with the compute node?
As per the sysadmin's advice, I am using ulimit -s 100000, but this doesn't change anything. Also, this is not an MPI program (it runs on a single node, with multiple processors).
The code that I run is as follows:
quorum_error_correct_reads -q 68 \
--contaminant=/data004/software/GIF/packages/masurca/2.3.0rc1/bin/../share/adapter.jf \
-m 1 -s 1 -g 1 -a 3 --thread=32 -w 10 -e 3 \
quorum_mer_db.jf aa.renamed.fastq ab.renamed.fastq ac.renamed.fastq ad.renamed.fastq ae.renamed.fastq af.renamed.fastq ag.renamed.fastq \
--no-discard -o pe.cor --verbose
Thanks for any advice you can offer. I will greatly appreciate your help!
PS: I don't have sudo access.
EDIT: I know it is supposed to be using multiple processors because, when I SSH into the node and run top -c, I can see the above command sometimes running at around 3200% CPU the whole time and sometimes at only 100% CPU the whole time. This is the only step involved, and there are no other sub-processes within this program. Also, I am on an HPC system where I submit the job to a compute node, each with 32 processors and 512 GB RAM.
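No answer was posted for this question, but one way to narrow it down (a hedged sketch, not from the original thread) is to check, on a node where the job runs slowly, whether the process has been restricted to a single core:
# On the compute node: find the running process and inspect its CPU affinity
pid=$(pgrep -f quorum_error_correct_reads | head -n 1)
taskset -cp "$pid"                            # cores the process may run on
grep Cpus_allowed_list "/proc/$pid/status"    # the same information from /proc
If the affinity list shows only one core on the problem nodes, the constraint comes from the node or scheduler configuration rather than from the program itself.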

Determine total CPU count after qsub within PBS script

For a PBS script called with qsub, I want to know how many CPUs in total have actually been allocated, in case the number defined in the PBS file is overridden from the command line. For example, with the following PBS script file:
jobscript.pbs:
#!/bin/bash
#PBS -N test_run
#PBS -l nodes=32
#PBS -l walltime=06:00:00
#PBS -j oe
#PBS -q normal
#PBS -o output.txt
cd $PBS_O_WORKDIR
module load gcc-openmpi-1.2.7
time mpiexec visct
This script could be run with just 16 CPUs (instead of 32) using the following command line:
$ qsub -l nodes=2:ppn=8 jobscript.pbs
So I would like a robust method for determining how many CPUs are actually available from within the script.
I was able to answer my own question with the following solution, using the $PBS_NODEFILE environment variable, which contains the path to a file listing the allocated CPU slots (one line per slot):
jobscript.pbs:
#!/bin/bash
#PBS -N test_run
#PBS -l nodes=32
#PBS -l walltime=06:00:00
#PBS -j oe
#PBS -q normal
#PBS -o output.txt
# $PBS_NODEFILE has one line per allocated CPU slot, so counting
# its lines gives the total CPU count
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
echo "Total CPU count = $NP"
Thanks to "Source" after much online searching.
@MasterHD, I know you have already found your answer, but I thought I would share another way.
This code is longer, but it suits my specific needs: I use the pbsnodes command. Below is a snippet of my (Perl) code.
@nodes_whole = `pbsnodes -av -s $server | grep "pcpus"`;
$nodes_count = `pbsnodes -av -s $server | grep "pcpus" | wc -l`;
$i = 0;
$cpu_whole_count = 0;
while ($i < $nodes_count) {
    # Each line looks like "     pcpus = 16"; after splitting on whitespace,
    # field 3 holds that node's core count.
    @cpu_present = split(/\s+/, $nodes_whole[$i]);
    $cpu_whole_count += $cpu_present[3];
    $i++;
}
I do this because my script checks things like the number of CPUs, which varies by node (it may be 4, 8, or 16). I also have multiple clusters that are always changing size, and I don't want cluster- or node-specific information hard-coded into the script. Mainly, when a user submits a job, I check how many resources they can actually use: if they request, say, 200 CPUs from a queue and the job would sit queued on cluster A, my script can tell them it would be queued there but not on cluster B or D, so they have the option to change their request before they submit.
I also use it to check for nodes that are down:
@nodes_down = `pbsnodes -l -s $server`;
and to see which resources are in use:
@nodes_used = `pbsnodes -av -s $server | grep "resources_assigned.ncpus"`;
Also, in one case I have two clusters running off one head node while I wait for hardware. There I check which cluster a node is assigned to and do the count based on that, so all the users see is just another cluster, and they use the script the way they would for any of the others.
I only mention this because I have found a lot of useful ways to use pbsnodes, and it has worked well for my particular needs.
