VTune throws an aps error while running MPI samples

I am trying to profile my MPI application using Intel VTune. Both of the commands below fail with the error shown; I tried setting I_MPI_DEBUG=5 in two ways:
export I_MPI_DEBUG=5
mpirun -np 4 aps ~/binary/vasp_std_2022
mpirun -genv I_MPI_DEBUG=5 -np 4 aps ~/binary/vasp_std_2022
vtune: Warning: Memory bandwidth collection is not supported inside a virtual machine since uncore events cannot be collected. For full functionality, consider using a bare-metal environment.
vtune: Warning: CPU frequency data collection is not supported on this platform.
vtune: Error: amplxe-perf:
Using CPUID GenuineIntel-6-6A-6
both cgroup and no-aggregation modes only available in system-wide mode
Usage: perf stat [<options>] [<command>]
-G, --cgroup <name>       monitor event in cgroup name only
-A, --no-aggr             disable CPU count aggregation
-a, --all-cpus            system-wide collection from all CPUs
    --for-each-cgroup <name>  expand events for each cgroup
vtune: Error: Preliminary validation of the requested events failed.
aps Error: Cannot run the collection.
aps Error: Cannot process configs directory.
aps Error: Cannot process configs directory.
aps Error: Cannot process configs directory.

In your command you are using 'aps' for the profiling.
When using the aps command you should pass parameters such as the collection mode, given as --collection-mode=<mode>.
This parameter specifies a comma-separated list of data to collect. Possible values:
hwc : hardware counters
omp : OpenMP statistics
mpi : MPI statistics
all : all possible data (default)
Try running the command as below:
mpirun -genv I_MPI_DEBUG=5 -np 4 aps --collection-mode=omp ./obj
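If the collection then succeeds, aps writes a result directory (aps_result_<date> by default) that you can summarize in a second step. A rough sketch of the full workflow, assuming the documented report option of your aps/oneAPI version:
export I_MPI_DEBUG=5
mpirun -np 4 aps --collection-mode=mpi ./obj
aps --report=./aps_result_<date>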

Related

How to activate hyperthreading for ipcluster and MPI

I am starting an IPython cluster with an MPI engine to execute a jupyter notebook on multiple processes:
ipcluster start --engines=MPI -n 6 --profile=mpi
The machine has 6 cores so this works without an issue. However, I would also like to use its 12 threads. How do I tell IPython/the ipcluster command to activate hyperthreading (i.e. pass --use-hwthread-cpus to the mpirun/mpiexec command it executes)?
Error message I receive when trying the above ipcluster command with 12 engines (-n 12):
2023-01-06 14:01:35.586 [IPClusterStart] Starting 12 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
2023-01-06 14:01:35.688 [IPClusterStart] WARNING | engine set stopped 1673010095: {'exit_code': 1, 'pid': 187667, 'identifier': 'ipengine-1673010095-187622'}
2023-01-06 14:01:35.689 [IPClusterStart] ERROR |
Engines shutdown early, they probably failed to connect.
Check the engine log files for output.
If your controller and engines are not on the same machine, you probably
have to instruct the controller to listen on an interface other than localhost.
You can set this by adding "--ip=*" to your ControllerLauncher.controller_args.
Be sure to read our security docs before instructing your controller to listen on
a public interface.
2023-01-06 14:01:35.690 [IPClusterStart] ERROR | Engine output:
Invalid MIT-MAGIC-COOKIE-1 key
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 12
slots that were requested by the application:
/****/venv/bin/python
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
2023-01-06 14:01:35.690 [IPClusterStart] ERROR | IPython cluster: stopping
2023-01-06 14:01:35.691 [IPClusterStart] Stopping controller
2023-01-06 14:01:35.691 [IPController] CRITICAL | Received signal 15, shutting down
2023-01-06 14:01:35.692 [IPController] CRITICAL | terminating children...
2023-01-06 14:01:35.816 [IPClusterStart] Controller stopped: {'exit_code': 0, 'pid': 187624, 'identifier': 'ipcontroller-187622'}
2023-01-06 14:01:35.816 [IPClusterStart] Stopping engine(s): 1673010095
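One way to forward extra flags to the underlying mpiexec/mpirun is through the launcher configuration rather than the ipcluster command line. The sketch below assumes ipyparallel's MPIEngineSetLauncher exposes the mpi_args trait it inherits from MPILauncher; treat the exact trait name as an assumption to verify against your ipyparallel version. Add this to the profile's ipcluster_config.py (e.g. ~/.ipython/profile_mpi/ipcluster_config.py):
c = get_config()

# Extra arguments inserted into the mpiexec/mpirun command that launches the engines.
# --use-hwthread-cpus makes Open MPI count hardware threads (12) as slots instead of cores (6).
c.MPIEngineSetLauncher.mpi_args = ['--use-hwthread-cpus']
Then start the cluster as before: ipcluster start --engines=MPI -n 12 --profile=mpi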

Error in MPI program execution - no active ports found

I am trying to run a simple MPI job across multiple hosts of a cluster.
[capc#gpu6 mpi_tests]$ /opt/openmpi4.0.3/build/bin/mpirun --host gpu7,gpu6 ./a.out
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: gpu7
We have 2 processes.
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.
This attempted connection will be ignored; your MPI job may or may not
continue properly.
Local host: gpu6
PID: 29209
[gpu6:29203] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
[gpu6:29203] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I compiled the MPI program with mpicc, and when I run it with mpirun it hangs.
Can anyone guide me on this?
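The first warning usually means the openib BTL found an InfiniBand device with no active port, and the second often indicates that the two hosts did not agree on which network interface to use for the TCP connection. One common workaround, sketched here under the assumption that a TCP-only run is acceptable and that eth0 is a placeholder for an interface both hosts share, is to disable openib and restrict the TCP BTL to a single interface:
/opt/openmpi4.0.3/build/bin/mpirun --host gpu7,gpu6 --mca btl ^openib --mca btl_tcp_if_include eth0 ./a.out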

MPI: Pin each instance to certain cores on each node

I want to execute several instances of my program with OpenMPI 2.11. Each instance runs on its own node (-N 1) on my cluster. This works fine. I now want to pin each program-instance to the first 2 cores of its node. To do that, it looks like I need to use rankfiles. Here is my rankfile:
rank 0=+n0 slot=0-1
rank 1=+n1 slot=0-1
This, in my opinion, should limit each program-instance to cores 0 and 1 of the local machine it runs on.
I execute mpirun like so:
mpirun -np 2 -N 1 -rf /my/rank/file my_program
But mpirun fails with this error without even executing my program:
Conflicting directives for mapping policy are causing the policy
to be redefined:
New policy: RANK_FILE
Prior policy: UNKNOWN
Please check that only one policy is defined.
What's this? Did I make a mistake in the rankfile?
Instead of using a rankfile, simply use a hostfile:
n0 slots=n max_slots=n
n1 slots=n max_slots=n
Then tell Open MPI to map one process per node with two cores per process using:
mpiexec --hostfile hostfile --map-by ppr:1:node:PE=2 --bind-to core ...
ppr:1:node:PE=2 reads as: 1 process per resource; resource type is node; 2 processing elements per process. You can check the actual binding by adding the --report-bindings option.
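For example, with the two nodes above and two cores reserved per instance (a sketch; n0/n1 and the slot counts are placeholders for your actual hostnames and core counts), the hostfile and command might look like:
n0 slots=2 max_slots=2
n1 slots=2 max_slots=2
mpiexec --hostfile hostfile -np 2 --map-by ppr:1:node:PE=2 --bind-to core --report-bindings my_program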

R - Error in Rmpi with snow

I'm trying to execute an MPI cluster over 3 different computers inside a local area network with the following R code:
library(plyr)
library(class)
library(snow)
cl <- makeCluster(spec=c("localhost","ip1","ip2"),master="ip3")
but I'm getting an error:
Error in mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count, :
Calloc could not allocate memory (18446744071562067968 of 4 bytes)
Warning messages:
1: In if (nslaves <= 0) stop("Choose a positive number of slaves.") : [...]
2: In mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count, :
NA produced by coercition
What is this error due to? I couldn't find any relevant topic on this subject.
When calling makeCluster to create an MPI cluster, the spec argument should either be a number or missing, depending on whether you want the workers to be spawned or not. You can't specify the hostnames, as you would when creating a SOCK cluster. And in order to start workers on other machines with an MPI cluster, you have to execute your R script using a command such as mpirun, mpiexec, etc., depending on your MPI installation, and you specify the hosts to use via arguments to mpirun, not to makeCluster.
In your case, you might execute your script with:
$ mpirun -n 1 -H ip3,localhost,ip1,ip2 R --slave -f script.R
Since -n 1 is used, your script executes only on "ip3", not all four hosts, but MPI knows about the other three hosts, and will be able to spawn processes to them.
You would create the MPI cluster in that script with:
cl <- makeCluster(3)
This should cause a worker to be spawned on "localhost", "ip1", and "ip2", with the master process running on "ip3" (at least with Open MPI: I'm not sure about other MPI distributions). I don't believe the "master" option is used with the MPI transport: it's primarily used by the SOCK transport.
You can get lots of information about mpirun from its man page.
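Putting that together, a minimal script.R might look like the sketch below; the explicit type = "MPI" argument and the node-name check are illustrative additions, not part of the answer above:
library(snow)

# 3 workers are spawned via Rmpi/MPI; the master stays on the host where mpirun started R
cl <- makeCluster(3, type = "MPI")

# quick sanity check: ask each worker which machine it landed on
print(clusterCall(cl, function() Sys.info()[["nodename"]]))

stopCluster(cl)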
You can also try running the code on the cluster nodes as follows:
Create a file named nodelist and write the machine names in it, one per line.
Then run the following command in a terminal (replace the parenthesized placeholders, including the parentheses):
mpirun -np (number of processes) -machinefile (path to your nodelist file) Rscript (filename.R)
By default it will take the first node as the master and spawn processes on the rest of the nodes, including itself, as slaves.

MPI_Comm_spawn on remote nodes

How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");

MPI_Comm intercom;
MPI_Comm_spawn("worker",
               MPI_ARGV_NULL,
               nprocs,
               info,
               0,
               MPI_COMM_SELF,
               &intercom,
               MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?
"I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?"
I'm not sure why you don't want to start it with mpirun. You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(); this way you just get to pass it options rather than relying on the defaults.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require a "host" key in the Info object, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the job this way: mpirun --hostfile mpihosts.txt -np 1 ./parentjob. This is a feature, not a bug; now it's MPI's job to figure out where the workers go, and if you don't specify a host explicitly in the info object, it will try to put them in the most underutilized place. It also means you don't have to recompile to change the hosts you spawn to.
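For reference, the spawned "worker" is an ordinary MPI program that picks up the intercommunicator with MPI_Comm_get_parent. A minimal worker-side sketch follows; the printf and the barrier handshake are illustrative, and the parent would need a matching MPI_Barrier on its intercommunicator:
/* worker.c - minimal counterpart to the MPI_Comm_spawn call above */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);   /* intercommunicator to the spawning job */

    int rank;
    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);
    printf("worker %d running on %s\n", rank, host);

    if (parent != MPI_COMM_NULL)
        MPI_Barrier(parent);        /* simple handshake with the parent */

    MPI_Finalize();
    return 0;
}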
