for i in `seq 1 8` ; do
(./runProgram &)
done
Dear Fellows,
I know how to run 8 independent processes in parallel (as above); what I am looking for next is how to
i - Run 8 copies concurrently with processor pinning (each copy on its own processor core)
ii - Run 16 copies concurrently with processor pinning (2 copies per core)
iii - Run 8 copies concurrently with processor pinning as per "i", flipping each process to the furthest core after a particular function call in the code.
My CPU has 8 cores and the machine is running Fedora. I don't know the process IDs in advance.
Please suggest.
Thanks in advance.
The easiest way to achieve i and ii is to use the taskset command:
Case i:
for i in `seq 0 7`; do
taskset -c $i ./runProgram &
done
Case ii:
for i in `seq 0 7`; do
taskset -c $i ./runProgram &
taskset -c $i ./runProgram &
done
Case iii: See the manual pages for sched_getaffinity(2) and sched_setaffinity(2) on how to change the pinning in code.
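If you only need to flip the pinning from outside the program, note that taskset can also change the affinity of an already running process via its -p option. A minimal sketch, assuming an 8-core box where the "furthest" core from core i is taken to be core (i+4) mod 8 (the real mapping depends on your topology, see lstopo):
for i in `seq 0 7`; do
taskset -c $i ./runProgram &     # start pinned to core $i
pids[$i]=$!                      # remember the PID so we can re-pin it later
done
# ... later, once the program signals that the function in question has run ...
for i in `seq 0 7`; do
taskset -cp $(( (i + 4) % 8 )) ${pids[$i]}   # move it to the "opposite" core
done
Triggering the switch exactly at a particular function call, however, really does require the sched_setaffinity(2) route inside the program itself.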
I am trying to test a script I have developed locally on an interactive HPC node, and I keep running into this strange issue: mclapply works only on a single core. I see several R processes spawned in htop (as many as the number of cores), but they all occupy only one core.
Here is how I obtain the interactive node:
srun -n 16 -N 1 -t 5 --pty bash -il
Is there a setting I am missing? How can I make this work? What can I check?
P.S. I just tested, and other programs that rely on forking to do parallel processing (say pigz) are affected by the same issue as well. Those that rely on MPI and message passing seem to work properly.
Yes, you are missing a setting. Try:
srun -N 1 -n 1 -c 16 -t 5 --pty bash -il
The problem is that -n 16 requests 16 tasks, while the interactive bash shell is a single task allocated a single core, so the shell (and everything it forks) is confined to only one of the cores requested by srun. Requesting one task with 16 CPUs (-c 16) gives that single shell all 16 cores.
Otherwise, you can first allocate your resources with salloc and, once you have them, run your actual command. For instance:
salloc -N 1 -n 1 -c 16 -t 5
srun pigz file.ext
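To double-check that the shell really sees all 16 CPUs (rather than being confined to a single one), you can inspect its affinity from inside the allocation; a quick sanity check, assuming the srun/salloc lines above:
taskset -cp $$   # prints the CPU list the shell is bound to; should list 16 CPUs
nproc            # honours the affinity mask; should print 16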
I'm testing a simple MPI program on my desktop (Ubuntu LTS 16.04 / Intel® Core™ i3-6100U CPU @ 2.30GHz × 4 / gcc 4.8.5 / OpenMPI 3.0.0) and mpirun won't let me use all of the cores on my machine (4). When I run:
$ mpirun -n 4 ./test2
I get the following error:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
./test2
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
But if I run with:
$ mpirun -n 2 ./test2
everything works fine.
I've seen from other answers that I can check the number of processors with
cat /proc/cpuinfo | grep processor | wc -l
and this tells me that I have 4 processors. I'm not interested in oversubscribing, I'd just like to be able to use all my processors. Can anyone help?
Your processor has 4 hyperthreads but only 2 cores (see the specs here).
By default, Open MPI does not run more than one MPI task per core.
You can have Open MPI run up to one MPI task per hyperthread with the following option
mpirun --use-hwthread-cpus ...
FWIW
The command you mentioned reports the number of hyperthreads.
A better way to figure out the topology of a machine is via the lstopo command from the hwloc package.
On OS X, MPI tasks are bound neither to cores nor to hyperthreads, so if you are running on a Mac, --oversubscribe -np 4 would lead to the same result.
To resolve your problem, you can use the --use-hwthread-cpus command line argument for mpirun, as already pointed out by Gilles Gouaillardet. With this option, Open MPI treats a thread provided by hyperthreading as an Open MPI processor; otherwise it treats a CPU core as an Open MPI processor, which is the default behaviour. Open MPI will then correctly determine the total number of processors available to you, that is, all processors available on all hosts specified in the Open MPI host file, so you do not need to specify the -n parameter. When --use-hwthread-cpus is used, Open MPI refers to the threads provided by hyperthreading as "hardware threads". With this technique you will not oversubscribe, and if an Open MPI processor runs inside a virtual machine, the correct number of threads assigned to that virtual machine will be used. If your processor has more than two threads per core, as on a Xeon Phi (Knights Mill, Knights Landing, etc.), all of those hardware threads per core will be used as Open MPI processors.
Use $ lscpu. The number of cores per socket times the number of sockets gives you the number of physical cores (the ones you can use for MPI), whereas cores per socket times sockets times threads per core gives you the number of logical cores (the figure you get from $ cat /proc/cpuinfo | grep processor | wc -l).
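If you want to script that arithmetic rather than read it off by eye, the same fields can be pulled straight out of lscpu; a small sketch (forcing the C locale so the util-linux labels are not translated):
sockets=$(LC_ALL=C lscpu | grep '^Socket(s):' | awk '{print $2}')
cores_per_socket=$(LC_ALL=C lscpu | grep '^Core(s) per socket:' | awk '{print $4}')
threads_per_core=$(LC_ALL=C lscpu | grep '^Thread(s) per core:' | awk '{print $4}')
echo "physical cores: $(( sockets * cores_per_socket ))"                      # one MPI task per core by default
echo "logical CPUs:   $(( sockets * cores_per_socket * threads_per_core ))"   # what /proc/cpuinfo counts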
I usually process data on the university's cluster. Most jobs I have run before were based on parallel batch shell scripts (divide the job into several batches, then submit them in parallel). An example of such a script is shown below:
#! /bin/bash
#BSUB -J model_0001
#BSUB -o z_output_model_0001.o
#BSUB -n 8
#BSUB -e z_output_model_0001.e
#BSUB -q general
#BSUB -W 5:00
#BSUB -B
#BSUB -N
some command
This time, I am testing an MPI job (based on mpi4py). The code has been tested on my laptop as a single task (1 task using 4 processors). Now I need to submit 30 multi-task jobs on the cluster (each task using 8 processors). My design is as follows: prepare 30 shell scripts similar to the one above; the command in each script is my MPI command (something like "mpiexec -n 8 mycode.py args"), and each script reserves 8 processors.
I submitted the jobs, but I am not sure whether I am doing this correctly. They are running, but I am not sure whether they actually run under MPI. How can I check? Here are two more questions:
1) For normal parallel jobs, there is usually a limit on the number of processors I can reserve for a single task -- 16. Above 16, I have never succeeded. If I use MPI, can I reserve more? MPI is different in that I basically do not need contiguous memory.
2) I think there is a priority rule on the cluster. For normal parallel jobs, reserving more processors per task (say 10 tasks with 16 processors per task) usually means much more waiting time in the queue than reserving fewer processors per task (say splitting each task into 8 sub-tasks, 80 sub-tasks in total, with 2 processors per sub-task). If I can reserve more processors for MPI, does that affect this rule? I worry that I am going to wait forever...
Well, increasing "#BSUB -n" is exactly what you need to do. That option tells LSF how many execution "slots" you are reserving. So if you want to run an MPI job with 20 ranks, you need
#BSUB -n 20
IIRC the execution slots do not need to be allocated on the same node; LSF will allocate slots from as many nodes as are required for the request to be satisfied. But it's been a while since I've used LSF, and I currently don't have access to a system using it, so I could be wrong (and it might depend on the local cluster's LSF configuration).
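For the concrete case in the question (8 ranks per job), one of the 30 scripts might look like the sketch below. It keeps the #BSUB layout from the question, and "mycode.py args" is just the placeholder command the questioner mentioned; whether mpiexec still needs an explicit -n or picks the rank count up from LSF depends on the local MPI/LSF integration:
#! /bin/bash
#BSUB -J model_0001
#BSUB -o z_output_model_0001.o
#BSUB -e z_output_model_0001.e
#BSUB -n 8
#BSUB -q general
#BSUB -W 5:00

# reserve 8 slots and start 8 MPI ranks
mpiexec -n 8 mycode.py args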
I am trying to run a compiled program that is supposed to run on multiple processors. With the same data, sometimes this program runs in parallel and sometimes it doesn't (with an identical PBS script!). I suspect that something is wrong with some of the compute nodes that prevents it from running in parallel (I don't get to choose which compute node I get). How can I determine whether this is a bug in the program or a problem with the compute node?
As per the sysadmin's advice, I am using ulimit -s 100000, but this doesn't change anything. Also, this program is not an MPI program (it runs only on a single node, with multiple processors).
The code that I run is as follows:
quorum_error_correct_reads -q 68 \
--contaminant=/data004/software/GIF/packages/masurca/2.3.0rc1/bin/../share/adapter.jf \
-m 1 -s 1 -g 1 -a 3 --thread=32 -w 10 -e 3 \
quorum_mer_db.jf aa.renamed.fastq ab.renamed.fastq ac.renamed.fastq ad.renamed.fastq ae.renamed.fastq af.renamed.fastq ag.renamed.fastq \
--no-discard -o pe.cor --verbose
Thanks for any advice you can offer. I will greatly appreciate your help!
PS: I don't have sudo access.
EDIT: I know it is supposed to be using multiple processors because, when I SSH into the node and run top -c, I can see the above command sometimes running at around 3200% CPU the whole time and sometimes at only 100% CPU the whole time. This is the only step involved, and there are no other sub-processes within this program. Also, I am using an HPC cluster, where I submit the job to a compute node with 32 processors and 512 GB of RAM.
What exactly is the third parameter in the following MPI command
mpiexec -n 2 cpi
Is it the number of cores? So if I am running on a Pentium 4, should I make it 1?
-n 2: spawn two processes.
cpi: the executable.
Experiment with what is faster: one, two, or more processes. Some codes run best with one process per core; some codes benefit from oversubscription.
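A quick way to run that experiment is simply to time the same program with different process counts; a minimal sketch, assuming cpi resolves the same way as in the command above:
for n in 1 2 4; do
echo "== $n process(es) =="
time mpiexec -n $n cpi    # wall-clock time for this process count
done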