Determine total CPU count after qsub within PBS script - mpi

For a PBS script called with qsub, I want to know how many total CPUs have actually been allocated, in case the number defined in the PBS file is overridden by inputs from the command line. For example, with the following PBS script file:
jobscript.pbs:
#!/bin/bash
#PBS -N test_run
#PBS -l nodes=32
#PBS -l walltime=06:00:00
#PBS -j oe
#PBS -q normal
#PBS -o output.txt
cd $PBS_O_WORKDIR
module load gcc-openmpi-1.2.7
time mpiexec visct
This script could be run with just 16 CPUs (instead of 32) using the following command line:
$ qsub -l nodes=2:ppn=8 jobscript.pbs
So I would like a robust method for determining how many CPUs are actually available from within the script.

I was able to answer my own question with the following solution, using the $PBS_NODEFILE environment variable, which contains the path to a file listing the allocated nodes (one line per assigned processor):
jobscript.pbs:
#!/bin/bash
#PBS -N test_run
#PBS -l nodes=32
#PBS -l walltime=06:00:00
#PBS -j oe
#PBS -q normal
#PBS -o output.txt
# Count the lines in the node file: one line per allocated CPU core
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
echo "Total CPU count = $NP"
Thanks to "Source" after much online searching.
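A couple of small variations on the same idea may also be handy (a sketch, assuming the usual Torque/PBS node-file layout of one line per allocated core): wc -l can read the node file directly, and sort -u gives the number of distinct hosts rather than cores.
# total allocated cores: one line per core in $PBS_NODEFILE
NP=$(wc -l < "$PBS_NODEFILE")
# number of distinct hosts, if you want nodes rather than cores
NNODES=$(sort -u "$PBS_NODEFILE" | wc -l)
echo "Total CPU count = $NP across $NNODES node(s)"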

MasterHD, I know you have already found your answer, but I thought I would share another way.
This code is longer, but it helps with my specific needs: I actually use the pbsnodes command. Below is a snippet of my code (Perl):
@nodes_whole = `pbsnodes -av -s $server | grep "pcpus"`;
$nodes_count = `pbsnodes -av -s $server | grep "pcpus" | wc -l`;
$i = 0;
while ($i < $nodes_count) {
    # pull the pcpus count out of each "resources_available.pcpus = N" line
    @cpu_present = split(/\s+/, $nodes_whole[$i]);
    $cpu_whole_count += $cpu_present[3];
    $i++;
}
I do this because in my script I check things like the number of CPUs, which varies with the node: a node may have 4, 8, or 16 CPUs. I also have multiple clusters that are always changing size, and I don't want the script to have cluster- or node-specific information hard-coded. Mainly, I do this because when a user submits a job I check how many resources they can actually use. If, say, they request 200 CPUs on a queue where the job would sit queued on cluster A, my script can tell them it would be queued there but would not be on cluster B or D. They then have the option to change the request before they submit.
I also use it to check for nodes that are down:
@nodes_down = `pbsnodes -l -s $server`;
And to see what resources are in use:
@nodes_used = `pbsnodes -av -s $server | grep "resources_assigned.ncpus"`;
Also, in one case I have two clusters running off one head node while I wait for hardware. In that case I check which cluster each node is assigned to and then do a count based on the nodes assigned to that cluster. That way, all the users see is another cluster, and they use the script the way they would for any of the other clusters.
I just mention it because I have found a lot of useful ways to use pbsnodes, and it has worked well for my particular needs.
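For anyone who wants to stay in the shell rather than Perl, the same per-cluster core count can be sketched in bash; this assumes a PBS flavour whose pbsnodes output contains "pcpus = N" lines, as in the snippet above (Torque reports "np = N" instead, so adjust the grep accordingly):
# sum the pcpus values reported for every node on the given server
total_cpus=$(pbsnodes -av -s "$server" | grep "pcpus" | awk '{sum += $NF} END {print sum}')
echo "Cluster $server has $total_cpus CPUs in total"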

Related

How to scatter chunks of data on cluster nodes without running out of memory

I wrote a Python code that uses mpi4py to scatter chunks of data across the processors of a cluster. Each processor writes its chunk of data into a .txt file, and then all these .txt files are merged into one.
Everything is working as expected.
However, for very large .txt files, the cluster is complaining about memory:
mpiexec noticed that process ... rank ... on node ... exited on signal 9 (Killed)
I'm trying to set the parameters in the PBS file in a way which avoids this issue. So far, this is not working:
#!/bin/bash
#PBS -S /bin/bash
## job name and output file
#PBS -N test
#PBS -j oe
#PBS -o job.o
#PBS -V
###########################################################
# USER PARAMETERS
##PBS -l select=16:mpiprocs=1:mem=8000mb
#PBS -l select=4:ncpus=16:mem=4gb
#PBS -l walltime=03:00:00
###########################################################
ulimit -Hn
# number of processes
NPROC=64
echo $NPROC
CURRDIR=$PBS_O_WORKDIR
echo $CURRDIR
cd $CURRDIR
module load anaconda/2019.10
source activate py3
cat $PBS_NODEFILE
echo starting run in current directory $CURRDIR
echo " "
mpiexec -n $NPROC -hostfile $PBS_NODEFILE python $CURRDIR/test.py
echo "finished successfully"
Any idea?
MPI uses distributed memory, that is, if you have more data than fits in one process, you spread it over multiple processes, for instance on multiple computers. So "scattering" data often doesn't make sense: it assumes that all this too-much data actually fits on one process. For a true MPI program, your processes all create their own data, or read it from a file, but you never have all data in one place.
So if you're dealing with lots of data, then a scattering approach will of course run out of memory, but it's the wrong way to approach your problem to begin with. Rewrite your program and make it truly distributed memory parallel.
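If a full rewrite is not immediately possible, one pragmatic step in the spirit of "read it from a file" is to pre-split the input on disk so that each rank only ever opens its own piece instead of receiving it via comm.scatter. A hedged sketch using GNU coreutils (the names bigdata.txt and chunk_ are purely illustrative):
# split the input into 64 line-aligned pieces, one per MPI rank: chunk_0000 .. chunk_0063
split --numeric-suffixes=0 --suffix-length=4 -n l/64 bigdata.txt chunk_
# inside test.py each rank would then open its own file (e.g. "chunk_%04d" % rank),
# so the full data set never has to sit in one process's memory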

GNUPlot cannot be executed after mpirun command in PBS script

I have a PBS script that looks something like this:
#PBS -N marcell_single_cell
#PBS -l nodes=1:ppn=1
#PBS -l walltime=20000:00:00
#PBS -e stderr.log
#PBS -o stdout.log
# Specify the shell type
#PBS -S /bin/bash
# Specify the queue type
#PBS -q dque
#uncomment this if you want to debug the process
#set -vx
cd $PBS_O_WORKDIR
ulimit -s unlimited
NPROCS=`wc -l < $PBS_NODEFILE`
#export PATH=$PBS_O_PATH
echo This job has allocated $NPROCS nodes
echo Cleaning old files...
rm -rf *.png *.plt *.log
echo Cleaning success
/opt/Lib/openmpi-2.1.3/bin/mpirun -np $NPROCS /scratch4/marcell/CellMLSimulator/bin/CellMLSimulator -ionmodel grandi2010 -solverType CVode -irepeat 4 -dt 0.01
gnuplot -p plotting.gnu
It gives an error something like this, reported in the PBS error log:
/var/spool/torque/mom_priv/jobs/6265.node01.SC: line 28: gnuplot: command not found
I've already made sure that the path to gnuplot has been added to the PATH environment variable.
However, the strange part is that if I swap the order of the commands, running gnuplot first and then mpirun, there isn't any error. I suspect that commands placed after mpirun need some special configuration, but I don't know how to do that.
I already tried following this solution, to no avail:
sleep command not found in torque pbs but works in shell
EDIT:
It seems that gnuplot fails whether it is placed before or after mpirun. This is the output of which gnuplot:
which: no gnuplot in (/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64:/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64:/opt/pgi/linux86-64/9.0-4/bin:/opt/openmpi/bin:/usr/kerberos/bin:/prog/tools/grace/grace/bin:/home/prog/ansys_inc/v121/fluent/bin:/bin:/usr/bin:/opt/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64:/opt/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64:/scratch7/feber/jdk1.8.0_101:/scratch7/feber/code/apache-maven/bin:/usr/local/bin:/scratch7/cml/bin)
It's strange, since when I look for gnuplot, there is one in /usr/local/bin:
ls -l /usr/local/bin/gnuplot
-rwxr-xr-x 1 root root 3262113 Sep 18 2017 /usr/local/bin/gnuplot
Moreover, if I run those commands directly without PBS, they execute as I expected:
/scratch4/marcell/CellMLSimulator/bin/CellMLSimulator -ionmodel grandi2010 -solverType CVode -irepeat 4 -dt 0.01
gnuplot -p plotting.gnu
It's very likely that your system has different "login/head nodes" and "compute nodes". This is a commonly used practice in many supercomputing clusters. While you build and launch your application from the head node, it gets executed on one or more compute nodes.
The compute nodes can have different hardware and software compared to the head nodes. In your case, gnuplot is installed only on the head node, as you can see from the different outputs of which gnuplot. To solve this, you have three approaches:
Request the system administrators to install gnuplot on the compute nodes.
Build and install your own version of gnuplot in a filesystem accessible from the compute nodes. It could be your home directory or somewhere else, depending on your cluster. In general, the filesystem where your application itself resides will also be available; in your case, anywhere under /scratch4/marcell/ would probably work.
Run gnuplot on the head node after the MPI jobs finish as a post-processing step. PBS/Torque does not provide a direct way to do this. You'll need to write a separate bash (not PBS) script to do this.
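For the third approach, a minimal sketch of such a head-node wrapper might look like this (file names are illustrative; if your Torque keeps completed jobs visible in qstat, test the job_state field instead of qstat's exit status):
#!/bin/bash
# run on the head node: submit the MPI job, wait until it leaves the queue, then plot locally
JOBID=$(qsub jobscript.pbs)
while qstat "$JOBID" > /dev/null 2>&1; do
    sleep 60
done
gnuplot -p plotting.gnu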

Going from multi-core to multi-node in R

I've gotten accustomed to doing R jobs on a cluster with 32 cores per node. I am now on a cluster with 16 cores per node. I'd like to maintain (or improve) performance by using more than one node (as I had been doing) at a time.
As can be seen from my dummy shell script and dummy function (below), parallelization on a single node is really easy. Is it similarly easy to extend this to multiple nodes? If so, how would I modify my scripts?
R script:
library(plyr)
library(doMC)
registerDoMC(16)
dothisfunctionmanytimes = function(d){
print(paste("my favorite number is",d$x,'and my favorite letter is',d$y))
}
d = expand.grid(1:1000,letters)
d_ply(.data=d,.fun=dothisfunctionmanytimes,.parallel=T)
Shell script:
#!/bin/sh
#PBS -N runR
#PBS -q normal
#PBS -l nodes=1:ppn=32
#PBS -l walltime=5:00:00
#PBS -j oe
#PBS -V
#PBS -M email
#PBS -m abe
. /etc/profile.d/modules.sh
module load R
#R_LIBS=/home/diag/opt/R/local/lib
R_LIBS_USER=${HOME}/R/x86_64-unknown-linux-gnu-library/3.0
OMP_NUM_THREADS=1
export R_LIBS R_LIBS_USER OMP_NUM_THREADS
cd $PBS_O_WORKDIR
R CMD BATCH script.R
(The shell script gets submitted by qsub script.sh)
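For what it is worth, the usual route to multiple nodes is to swap the fork-based doMC backend for an MPI-based one such as doMPI, and to ask PBS for more than one node. A hedged sketch of how the submission side might change (the doMPI package and an mpirun-launched Rscript are assumptions about the cluster's setup; script.R would call doMPI::startMPIcluster() and registerDoMPI() in place of registerDoMC(16)):
#!/bin/sh
#PBS -N runR
#PBS -q normal
#PBS -l nodes=2:ppn=16
#PBS -l walltime=5:00:00
#PBS -j oe
. /etc/profile.d/modules.sh
module load R
cd $PBS_O_WORKDIR
# one MPI rank per requested core; doMPI turns the non-master ranks into workers
mpirun -np 32 Rscript script.R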

error in using "qsub" when allocating nodes

My script (13-4.sh) is:
#!/bin/sh
#PBS -N sample
#PBS -l nodes=4:ppn=64
#PBS -q batch
#PBS -o $HOME/qpms9-2/out/14-4.out
#PBS -e $HOME/qpms9-2/error/14-4.out
#PBS -l walltime=100:00:00
mpirun $HOME/qpms9-2/run_mpi $HOME/qpms9-2/14-4 -l 14 -d 4
When I submit it with the command qsub 13-4.sh, the response is as follows:
qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max nodes requirement
My cluster has 10 nodes (64 cores per node).
This might be an issue with your scheduler. I don't know which one you have but you should search the relevant documentation and settings to see if there is a setting that is capping the maximum number of nodes per user. If you type
/sbin/service pbs status
you can find which scheduler you're using by checking which services are running. Popular schedulers are pbs_sched, maui, and moab.
I would also make sure that all 10 nodes are online. You might be able to test this using
pbsnodes
depending on how your cluster is configured. Additionally, you should check that the batch queue exists and that there are no standing reservations.
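If it turns out to be a queue limit rather than the scheduler itself, the queue definition is worth inspecting too. A sketch of what to look at on a Torque/PBS head node (the queue name batch is taken from the script above):
# show the queue attributes, including any resources_max.* limits such as nodect or nodes
qmgr -c "list queue batch"
# the same limits are also visible without qmgr access via
qstat -Qf batch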

PBS/TORQUE: how do I submit a parallel job on multiple nodes?

So, right now I'm submitting jobs on a cluster with qsub, but they seem to always run on a single node. I currently run them by doing
#PBS -l walltime=10
#PBS -l nodes=4:gpus=2
#PBS -r n
#PBS -N test
range_0_total=$(seq 0 $(expr $total - 1))
for i in $range_0_total
do
$PATH_TO_JOB_EXEC/job_executable &
done
wait
I would be incredibly grateful if you could tell me if I'm doing something wrong, or if it's just that my test tasks are too small.
With your approach, you need to have your for loop go through all of the entries in the file pointed to by $PBS_NODEFILE, and then inside your loop you would need "ssh $i $PATH_TO_JOB_EXEC/job_executable &".
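A minimal sketch of that ssh-based loop, assuming passwordless ssh between the allocated nodes:
# launch one copy per line of the node file, i.e. one per allocated core
while read -r node; do
    ssh "$node" "$PATH_TO_JOB_EXEC/job_executable" &
done < "$PBS_NODEFILE"
wait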
The other, easier way to do this would be to replace the for loop and wait with:
pbsdsh $PATH_TO_JOB_EXEC/job_executable
This would run a copy of your program on each core assigned to your job. If you need to modify this behavior take a look at the options available in the pbsdsh man page.
