Using GNU Parallel etc. with the PBS queue system to run 2 or more MPI codes across multiple nodes as a single job - mpi

I am trying to run more than one MPI code (e.g. 2) under the PBS queue system across multiple nodes as a single job.
For example, on my cluster 1 node = 12 procs.
I need to run 2 codes (abc1.out and abc2.out) as a single job, each code using 24 procs. Hence I need 4x12 cores for this job, and I need software that can assign 2x12 cores to each code.
Someone suggested:
How to run several commands in one PBS job submission
which is:
(cd jobdir1; myexecutable argument1 argument2) &
(cd jobdir2; myexecutable argument1 argument2) &
wait
but it doesn't work. The codes are not distributed among all processes.
Can GNU Parallel be used? I read somewhere that it can't work across multiple nodes.
If so, what's the command line for the PBS queue system?
If not, is there any software which can do this?
My final objective is similar to this but much more complicated.
Thanks for the help.

Looking at https://hpcc.umd.edu/hpcc/help/running.html#mpi it seems you need to use $PBS_NODEFILE.
Let us assume you have $PBS_NODEFILE containing the 4 reserved nodes. You then need a way to split these in 2x2. This will probably do:
run_one_set() {
cat > nodefile.$$
mpdboot -n 2 -f nodefile.$$
mpiexec -n 1 YOUR_PROGRAM
mpdallexit
rm nodefile.$$
}
export -f run_one_set
cat $PBS_NODEFILE | parallel --pipe -N2 run_one_set
(Completely untested).

Thanks for the suggestions.
By the way, I tried using GNU Parallel and so far it only works for jobs within a single node. After some trial and error, I finally found a solution.
Suppose each node has 12 procs, and you need to run 2 jobs, each requiring 24 procs.
You can request:
#PBS -l select=4:ncpus=12:mpiprocs=12:mem=32gb:ompthreads=1
Then
sort -u $PBS_NODEFILE > unique-nodelist.txt
sed -n '1,2p' unique-nodelist.txt > host.txt
sed 's/.*/& slots=12/' host.txt > host1.txt
sed -n '3,4p' unique-nodelist.txt > host.txt
sed 's/.*/& slots=12/' host.txt > host2.txt
mv host1.txt 1/
mv host2.txt 2/
(cd 1; ./run_solver.sh) &
(cd 2; ./run_solver.sh) &
wait
What the above does is get the list of nodes used and remove the duplicates,
split the list into 2 nodes for each job,
then go into directories 1 and 2 and run each job using run_solver.sh.
Inside run_solver.sh for job 1 in dir 1:
...
mpirun -n 24 --hostfile host1.txt abc
Inside run_solver.sh for job 2 in dir 2:
...
mpirun -n 24 --hostfile host2.txt def
Note the different hostfile name in each script.
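A minimal sketch of the hostfile-splitting step above, generalized into one function. The name split_nodefile and its arguments are hypothetical, and the "slots=" hostfile syntax assumes Open MPI, as the mpirun commands above suggest:

```shell
#!/bin/sh
# split_nodefile NODEFILE PER_JOB SLOTS PREFIX  (hypothetical helper)
# Takes the unique hosts in a PBS-style nodefile, groups them PER_JOB at a
# time, and writes PREFIX1.txt, PREFIX2.txt, ... with "host slots=SLOTS" lines.
split_nodefile() {
    sort -u "$1" | awk -v n="$2" -v s="$3" -v p="$4" '
        {
            # Hosts 1..n go to file 1, hosts n+1..2n to file 2, and so on.
            file = p int((NR - 1) / n + 1) ".txt"
            print $0 " slots=" s > file
        }'
}
```

For the 4-node request above, split_nodefile "$PBS_NODEFILE" 2 12 host should produce host1.txt and host2.txt, which can then be moved into directories 1 and 2 as before.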

Related

In Slurm, is it possible to assign a different number of CPUs for every task?

I am running MPI-over-OpenMP jobs in a Slurm cluster and I am trying to figure out a way to give a different number of CPUs to each generated task. For example, let's say we run this job:
srun --nodes 1 --ntasks 2 --cpus-per-task 2 ./mpi_exe
This would generate 2 MPI processes in a single node, with 2 CPUs each. However, I would like to, for example, assign 3 CPUs to the first process and 1 to the second process.
Do you know any way to achieve this?
Have a look at Heterogeneous Jobs. For your example, this should do the trick:
srun -N1 -n1 -c3 : -N1 -n1 -c1 ./mpi_exe

Process not listed on "ps -ef" (AIX 7.1)

I have an unusual problem involving the output from the ps -ef command on AIX 7.1.
A shell script monitors processes by parsing this output. I've noticed on two occasions a process (a Perl program) was omitted from this list. Everything I've read on the subject says this is not possible. The program in question starts via crontab at 6am and runs until 11pm, when it self terminates. I checked the output of ps -ef immediately after being omitted by the monitor script, and it displays:
user 1249864 9569338 0 06:00:00 - 0:19 /usr/bin/perl -w /path/to/omittedProgram.pl
... which means it's the same process that was started at 6am. The program did not terminate, then restart.
What is causing it to be omitted from the ps -ef output?
Edit: This is the program that examines the output of ps -ef, which has been running successfully for about five years. I've only noticed this problem twice, but both have been in the last 2 months:
# set global variables
PROCESS_FILE=/tmp/processList.txt
TEMP_FILE=/tmp/greppedProcesses.tmp
BOX=`uname -n`
DATE=`date`
EMAIL_LIST="Support@email.address"
# Get list of running processes
ps -ef > $PROCESS_FILE
checkProcess() {
    PROCESS_NAME=$1
    PROCESS_ABBREVIATION=$2
    PROCESS_COUNT=$3
    UNIQUE_PROCESS_IDENTIFIER=$4
    GREPPED_LINES=$TEMP_FILE-$PROCESS_ABBREVIATION
    grep $UNIQUE_PROCESS_IDENTIFIER $PROCESS_FILE | grep -v grep > $GREPPED_LINES
    NUM=`cat $GREPPED_LINES | wc -l`
    if [[ $NUM -ne $PROCESS_COUNT ]]
    then
        # Incorrect number of processes running!
        MESSAGE=`printf "The \"$PROCESS_NAME\" process count is %1d, but it should be $PROCESS_COUNT!!!" $NUM`
        echo "Monitor - starting on $DATE\n\n$MESSAGE\n\n`cat $GREPPED_LINES`" | mail -s "Problem with $PROCESS_NAME on $BOX" $EMAIL_LIST
    fi
    # Delete the temp file
    rm $GREPPED_LINES
}
checkProcess "Full Name of Program" "Program Abbreviation" <expected number of processes running> "Unique string to identify program in ps output"
checkProcess ... (for other processes) ...
exit 0
This might be a long shot in your case, but I had the same experience with "ps -ef" in the past (I don't remember the exact OS where I saw it, but my script had to work on any Linux, AIX, Solaris and HP-UX).
The "ps -ef" output might be limited to a certain number of columns when used inside a script executed without a terminal. The user, pid, ppid and cputime columns are dynamic and sometimes break the format (when the data is larger than the reserved space).
For example, if the PID of the process gets too large, the name of the process might be cut off so that it no longer appears within the already limited number of columns displayed by "ps -ef", and your monitor script would then fail.
You could try keeping the file containing the "ps -ef" output and checking whether this is the problem. There is no need to wait for the issue to happen; just check whether the file contains extra long process names (anything longer than the process you're looking for).
My workaround for this problem is to specify a large enough number of columns, like this: COLUMNS=8192 ps -ef > file.out. The variable is set just for this one command.
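A minimal sketch of how the check described above could look, assuming a POSIX shell. The function names are hypothetical, and whether ps honors COLUMNS depends on the operating system:

```shell
#!/bin/sh
# snapshot_ps FILE: save a wide "ps -ef" listing using the COLUMNS workaround.
snapshot_ps() {
    COLUMNS=8192 ps -ef > "$1"
}

# count_matches FILE STRING: print how many lines contain STRING literally.
# grep -F keeps the identifier from being read as a regular expression.
# Note that grep exits with status 1 when the count is 0.
count_matches() {
    grep -F -c -- "$2" "$1"
}

# longest_line FILE: print the length of the longest line. If every line
# stops at the same width, the listing is probably being truncated.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}
```

Comparing longest_line on a snapshot taken from cron with one taken from a terminal is one way to see whether the output is being cut off.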
I just heard from my server support team that the AIX 7.1 TL4 SP4 patch will fix this! We're installing it on our servers now and hopefully this won't happen again.

Run multiple R scripts in parallel with command line arguments

I have an R script that performs analysis on one chromosome. I want to run this script repeatedly for each chromosome (1-22, X and Y). Right now I have the script set up to accept one argument from command line, the chromosome number. I want to submit multiple jobs to my server in parallel since analysis for one chromosome takes a few hours. After playing around with some options and googling everything, I'm still not sure what the best option is as I've never submitted jobs in parallel to a server (Sun Grid Engine server). I looked into GNU parallel but I'm not sure how to use it or if it even runs for R scripts. Maybe throw everything in a shell script and submit that to the server? This is a pretty basic question, but any direction would be greatly appreciated!
parallel Rscript plot_LRR_BAF_chromosome_parallel ::: {1..22} X Y
Use GNU make with the -j option. Put the placeholder __CHROM__ in your R script; the Makefile substitutes the chromosome name for it.
chroms=1 2 3 4 5 6 7 8 9 10
define method1
$$(addsuffix .out,$(1)) : script.R
	cat $$< | sed 's/__CHROM__/$(1)/g' | R --no-save > $$@
endef
all: $(addsuffix .out,$(chroms))
$(foreach C,$(chroms),$(eval $(call method1,$(C))))
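Since the question mentions Sun Grid Engine, another option is an array job, where each task handles one chromosome. The sketch below assumes the job script is submitted with qsub -t 1-24, in which case Grid Engine sets SGE_TASK_ID for each task. The function name chrom_for_task is hypothetical:

```shell
#!/bin/sh
# chrom_for_task ID: map an array task index 1..24 to a chromosome name
# (1-22 stay numeric, 23 -> X, 24 -> Y). Returns non-zero for anything else.
chrom_for_task() {
    case "$1" in
        23) echo X ;;
        24) echo Y ;;
        [1-9]|1[0-9]|2[0-2]) echo "$1" ;;
        *) echo "invalid task id: $1" >&2; return 1 ;;
    esac
}

# In the job script, the call would look something like the line below.
# It is left commented out because the script name is copied from the
# answer above and may need its real path and extension.
# Rscript plot_LRR_BAF_chromosome_parallel "$(chrom_for_task "$SGE_TASK_ID")"
```

With this approach the scheduler, rather than GNU Parallel or make, decides how many of the 24 tasks run at the same time.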

How to use the S-output-modifier with Unix/Linux command ps?

The command
ps -o time -p 21361
works; however what I need is the running time of the process including all
the children. For example, if 21361 is a bash script, which calls other scripts,
then I want the total running time, including the running time of all children.
Now the ps documentation lists the "OUTPUT MODIFIER":
S
Sum up some information, such as CPU usage, from dead child processes into their parent. This is useful for examining a system where a parent process repeatedly forks off short-lived children to do work.
Sounds just right. Unfortunately, there is no specification of the ps syntax, so
I have no clue where to place the "S"! I have tried many combinations for hours, but
either I get syntax errors or "S" does nothing. On the Internet you find only
very basic information about ps (and always the same); I couldn't find the "S" modifier
mentioned anywhere, and nobody ever explains the syntax of ps.
I am not sure, but it might be that ps is somewhat buggy in this respect. Try this here:
$ ps p 12104 k time
PID TTY STAT TIME COMMAND
12104 ? Ss 16:17 /usr/sbin/apache2 -k start
$ ps p 12104 k time S
PID TTY STAT TIME COMMAND
12104 ? Ss 143:16 /usr/sbin/apache2 -k start
This is using the BSD options for ps. It works on my machine; however, you get an extra header row and extra columns. I would cut them away using tail, tr and cut:
$ ps p 12104 k time S | tail -n 1 | tr -s '[:space:]' | cut -d ' ' -f 4
143:39
$ ps p 12104 k time | tail -n 1 | tr -s '[:space:]' | cut -d ' ' -f 4
16:17
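To compare or add up the values extracted above, it can help to convert them to seconds. This is a minimal sketch, assuming the TIME field has the form [DD-][HH:]MM:SS (or MMM:SS as in the output above). The function name time_to_seconds is hypothetical:

```shell
#!/bin/sh
# time_to_seconds TIME: convert a ps TIME value such as 143:39, 01:02:03
# or 1-02:03:04 into a number of seconds.
time_to_seconds() {
    awk -v t="$1" 'BEGIN {
        days = 0
        # An optional "DD-" prefix holds whole days.
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        # Remaining fields are base-60 digits: [HH:]MM:SS.
        n = split(t, f, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + f[i]
        print days * 86400 + secs
    }'
}
```

For example, time_to_seconds "$(ps p 12104 k time S | tail -n 1 | tr -s '[:space:]' | cut -d ' ' -f 4)" would feed the cumulative time from the command above into the conversion.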
On MacOS X (10.7, Lion) the manual page says:
-S Change the way the process time is calculated by summing all exited children to their parent process.
So, I was able to get output using:
$ ps -S -o time,etime,pid -p 305
TIME ELAPSED PID
0:00.12 01-18:31:07 305
$
However, that output was not really any different from when the '-S' option was omitted.
I tried:
$ ps -S -o time,etime,pid -p 305
TIME ELAPSED PID
0:00.14 01-18:43:59 305
$ time dd if=/dev/zero of=/dev/null bs=1m count=100k
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 15.374440 secs (6983941055 bytes/sec)
real 0m15.379s
user 0m0.056s
sys 0m15.034s
$ ps -S -o time,etime,pid -p 305
TIME ELAPSED PID
0:00.14 01-18:44:15 305
$
As you can see, the 15 seconds of system time spent copying /dev/zero to /dev/null did not get included in the summary.
At this stage, the only way of working out what the '-S' option does, if anything, is to look at the source. You could look for sumrusage in the FreeBSD version of ps, for example.

Using make to execute independent tasks in parallel

I have a bunch of commands I would like to execute in parallel. The commands are nearly identical. They can be expected to take about the same time, and can run completely independently. They may look like:
command -n 1 > log.1
command -n 2 > log.2
command -n 3 > log.3
...
command -n 4096 > log.4096
I could launch all of them in parallel in a shell script, but the system would try to load more than strictly necessary to keep the CPU(s) busy (each task takes 100% of one core until it has finished). This would cause the disk to thrash and make the whole thing slower than a less greedy approach to execution.
The best approach is probably to keep about n tasks executing, where n is the number of available cores.
I am keen not to reinvent the wheel. This problem has already been solved in the Unix make program (when used with the -j n option). I was wondering if perhaps it was possible to write generic Makefile rules for the above, so as to avoid the linear-size Makefile that would look like:
all: log.1 log.2 ...
log.1:
command -n 1 > log.1
log.2:
command -n 2 > log.2
...
If the best solution is not to use make but another program/utility, I am open to that as long as the dependencies are reasonable (make was very good in this regard).
Here is more portable shell code that does not depend on brace expansion:
LOGS := $(shell seq 1 1024)
Note the use of := to define a more efficient variable: the simply expanded "flavor".
See pattern rules
Another way, if this is the single reason why you need make, is to use -n and -P options of xargs.
First the easy part. As Roman Cheplyaka points out, pattern rules are very useful:
LOGS = log.1 log.2 ... log.4096
all: $(LOGS)
log.%:
command -n $* > log.$*
The tricky part is creating that list, LOGS. Make isn't very good at handling numbers. The best way is probably to call on the shell. (You may have to adjust this script for your shell-- shell scripting isn't my strongest subject.)
NUM_LOGS = 4096
LOGS = $(shell for ((i=1 ; i<=$(NUM_LOGS) ; ++i)) ; do echo log.$$i ; done)
xargs -P is the "standard" way to do this.
Note that, depending on disk I/O, you may want to limit to spindles rather than cores.
If you do want to limit to cores, note the new nproc command in recent coreutils.
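A minimal sketch of the xargs -P approach, assuming an xargs that supports -P (GNU and BSD versions do; it is not in POSIX). The function name run_numbered is hypothetical, and the command it runs is whatever you pass in:

```shell
#!/bin/sh
# run_numbered JOBS FIRST LAST CMD [ARGS...]
# Runs "CMD ARGS -n i > log.i" for every i in FIRST..LAST, keeping at most
# JOBS processes running at once.
run_numbered() {
    njobs=$1 first=$2 last=$3
    shift 3
    # xargs substitutes each number for {}; the inner sh sees it as $0 and
    # sees the command words as "$@".
    seq "$first" "$last" |
        xargs -P "$njobs" -I{} sh -c '"$@" -n "$0" > "log.$0"' {} "$@"
}
```

For the case in the question this would be called as run_numbered "$(nproc)" 1 4096 command, where nproc supplies the core count mentioned above.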
With GNU Parallel you would write:
parallel command -n {} ">" log.{} ::: {1..4096}
10 second installation:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
Learn more: http://www.gnu.org/software/parallel/parallel_tutorial.html https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Resources