Using strace with mpiexec

How do I strace all processes of an MPI parallel job started with mpiexec (MPICH2, Linux)?
Plain -o will mix together the output from the different processes.
PS To editors who may think that MPICH is the name of the library and MPICH2 merely a particular version: MPICH2 is actually an all-new implementation of MPI, and I have sometimes had to use both MPICH and MPICH2. So we can't simply replace MPICH2 with MPICH.

Create a wrapper around your program, which will be launched by mpiexec. Something like:
#!/bin/sh
# One log file per process: hostname plus the wrapper's PID keeps the traces
# from different ranks (and nodes) apart.
LOGFILE="strace-$(hostname).$$"
exec strace -o "$LOGFILE" my_mpi_program "$@"
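A hypothetical usage example (the script name strace_wrapper.sh and the process count are placeholders; the script must be executable and reachable on every node):
chmod +x strace_wrapper.sh
mpiexec -n 8 ./strace_wrapper.sh
Each process then writes its own strace-<hostname>.<pid> log on the node it ran on, so the traces never get interleaved.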

You may want to try STAT (Stack Trace Analysis Tool).
Check out the STAT Homepage.
It will give you a high-level overview of your process behavior, and it works especially well in the case of a hung process.

Related

OpenMPI specify executable for specific nodes

I have a heterogeneous computing cluster that I would like to run parallel computing tasks on using OpenMPI. Since not all nodes in the cluster can run the same executable (by virtue of being heterogeneous) I would like for some of the nodes to compile their own version of the program and have Open MPI invoke that executable on those nodes. My first question is whether OpenMPI enables this kind of computing across heterogeneous architectures.
If so, my second question is how to specify which executables to run on which nodes. For example, let's say node0, node1, and node2 can run executable prog1, and node3, node4, and node5 can run executable prog2, where prog1 and prog2 are the same program but compiled for different architectures using the mpicc or mpic++ wrapper compilers.
If I wanted to run this program in parallel across all nodes would I do the following:
mpirun -n 3 --hosts node0,node1,node2 prog1 : -n 3 --hosts node3,node4,node5 prog2
If not, what would I do to achieve this effect? This post indicates that heterogeneous cluster computing is supported by OpenMPI but I must build OpenMPI with the --enable-heterogeneous flag. I'm not sure how to do this since my cluster is running ArchLinux and I installed OpenMPI with pacman.
Note there is a typo (the option is --host, with no trailing s), so your command should be
mpirun -n 3 --host node0,node1,node2 prog1 : -n 3 --host node3,node4,node5 prog2
--enable-heterogeneous is needed so Open MPI can be run on heterogeneous systems (for example between an Intel x86_64 (little-endian) node and a sparcv9 (big-endian) node). If the Open MPI package that comes with ArchLinux was not configured with this flag, then you should rebuild that package. Another option is to rebuild Open MPI and install it into an alternate directory.
Last but not least, heterogeneous support is (very) lightly tested, and I strongly encourage you to use the latest Open MPI 3.0 series.
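If you go the alternate-directory route, a minimal sketch looks like this (the prefix path and source directory are placeholders; adjust them to your setup):
cd openmpi-x.y.z                                    # unpacked Open MPI source tree
./configure --prefix=$HOME/openmpi-hetero --enable-heterogeneous
make -j4 && make install
export PATH=$HOME/openmpi-hetero/bin:$PATH          # pick up the freshly built mpirun/mpicc
This leaves the pacman-installed package untouched while giving you an mpirun with heterogeneous support.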

Cray aprun is adding an extra dash to program arguments - how can I stop this?

I have an MPI application which has a command-line option -ss to specify an argument. I've been running this successfully on various Cray machines, including ARCHER (www.archer.ac.uk), an XC30, for years. The OS was recently upgraded, and as part of this ALPS was upgraded to version 5.1.1-2.0501.8507.1.1.
Now when I launch the program on the compute nodes with aprun, the program receives the option as --ss.
Checking with a shell script instead of a full application
#!/bin/bash
# print the arguments exactly as the launcher delivered them
echo $*
confirms that this option is getting double-dashed by aprun.
Clearly there is a bug in aprun (I've reported it) but how can I work around the issue until this is patched?

MPI OpenMP hybrid

I am trying to run a program written for MPI and OpenMP on a cluster of dual-core Linux machines.
When I try to set the OMP_NUM_THREADS variable
export OMP_NUM_THREADS=2
I get a message
OMP_NUM_THREADS: Undefined variable.
I don't get better performance with OpenMP... I also tried:
mpiexec -n 10 -genv OMP_NUM_THREADS 2 ./binary
and omp_set_num_threads(2) inside the program, but it didn't get any better...
Any ideas?
Update: when I run mpiexec -n 1 ./binary with omp_set_num_threads(2), the execution time is 4 s; when I run mpiexec -f machines -n 1 ./binary, it is 8 s.
I would suggest doing an echo $OMP_NUM_THREADS first, and then querying the number of threads inside the program to make sure that threads are actually being spawned; use the omp_get_num_threads() function for this. Further, if you're using macOS, this blog post can help:
https://whiteinkdotorg.wordpress.com/2014/07/09/installing-mpich-using-macports-on-mac-os-x/
The latter part of that post will help you to successfully compile and run hybrid programs. Whether a hybrid program gets better performance or not depends a lot on contention for resources. Excessive use of locks and barriers can further slow the program down. It would be great if you posted your code here for others to view and actually help you.
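As a rough sketch of the first suggestion, check the variable in the launch shell and forward it explicitly to every rank; the -genv form assumes an MPICH/Hydra-style mpiexec, while Open MPI's mpirun uses -x instead:
export OMP_NUM_THREADS=2
echo "$OMP_NUM_THREADS"                            # should print 2 before launching
mpiexec -n 10 -genv OMP_NUM_THREADS 2 ./binary     # MPICH/Hydra
# mpirun -n 10 -x OMP_NUM_THREADS ./binary         # Open MPI equivalent
Inside the program, remember that omp_get_num_threads() only reports more than 1 when called from within a parallel region.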

How to use ltrace for mpi programs?

I want to know how to use ltrace to get the library function calls of an MPI application, but simply running it under ltrace doesn't work and my mpirun cannot succeed.
Any idea?
You should be able to simply use:
$ mpiexec -n 4 -other_mpiexec_options ltrace ./executable
But that will create a huge mess since the outputs from the different ranks will merge. A much better option is to redirect the output of ltrace to a separate file for each rank. Getting the rank is easy with some MPI implementations. For example, Open MPI exports the world rank in the environment variable OMPI_COMM_WORLD_RANK. The following wrapper script would help:
#!/bin/sh
# Open MPI exports each process's rank in OMPI_COMM_WORLD_RANK,
# so every rank writes to its own trace file.
exec ltrace --output "trace.$OMPI_COMM_WORLD_RANK" "$@"
Usage:
$ mpiexec -n 4 ... ltrace_wrapper ./executable
This will produce 4 trace files, one for each rank: trace.0, trace.1, trace.2, and trace.3.
For MPICH and other MPI implementations based on it that use the Hydra process manager, the exported variable is PMI_RANK instead, so the script above has to be modified with OMPI_COMM_WORLD_RANK replaced by PMI_RANK. One could also write a universal wrapper that works with both families of MPI implementations.
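A minimal sketch of such a universal wrapper (the name ltrace_wrapper matches the usage above; the PID fallback is just a guard for non-MPI runs):
#!/bin/sh
# Use whichever rank variable the MPI implementation exports:
# Open MPI sets OMPI_COMM_WORLD_RANK, Hydra-based MPICH sets PMI_RANK.
RANK=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-$$}}
exec ltrace --output "trace.$RANK" "$@"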

How do I tell what version of MPICH or OpenMPI that I have?

I am an extremely novice user of MPI and its relatives. On the node that I have access to at my institution, MPI is installed, but I would like to know what version I have.
From this old question, an answer suggests trying:
mpiexec --version
But when I try this, I get this error message:
invalid "local" arg: --version
usage:
mpiexec [-h or -help or --help] # get this message
mpiexec -file filename # (or -f) filename contains XML job description
mpiexec [global args] [local args] executable [args]
Having said this, I am not completely sure that I have MPICH. I may instead have OpenMPI. But I do, I think, have MPICH because I ran ldd on my program, and the output included references to libmpich.so, which an answer to this old question says is indicative of MPICH rather than OpenMPI.
Do you have any ideas of how I can extract the version of MPI that I am using?
Addendum
Another answer on that old question says to try:
mpicc -v
I have tried this, and I get this output:
mpicc for MPICH2 version 1.2.1p1
Using built-in specs.
Target: x86_64-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
So I guess I have MPICH2 version 1.2.1p1. But can I know from this for sure that MPICH2 version 1.2.1p1 is currently installed? Or could it be that mpicc was configured with MPICH2 version 1.2.1p1 while a different version of MPI is now installed?
It means that MPICH2 1.2.1p1 is installed and that its mpicc is the default one on your system. If you install another MPI distribution (e.g. Open MPI), then you need to adjust the paths so that the newly installed one is used.
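A quick way to double-check which toolchain actually comes first on your PATH (ompi_info exists only if Open MPI is installed, so its absence is itself a hint):
which mpicc mpiexec
mpicc -v                           # MPICH/MPICH2 wrappers report the library version here
ompi_info 2>/dev/null | head -3    # prints Open MPI package/version lines, if present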
