mpi_comm_spawn on remote nodes - mpi

How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
MPI_ARGV_NULL,
nprocs,
info,
0,
MPI_COMM_SELF,
&intercom,
MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?

I don't want to start the parent
process with mpirun, so I'm just
looking for a way to dynamically spawn
processes on remote nodes. Is this
possible?
I'm not sure why you don't want to start it with mpirun? You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(), this way you just get to pass it options rather than relying on the default.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require Info host, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the jobs this way, mpirun --hostfile mpihosts.txt -np 1 ./parentjob This is a feature, not a bug; now it's MPIs job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put it in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.

Related

Error in MPI program execution - no active ports found

I am trying to run a simple MPI job across multiple hosts of a cluster.
[capc#gpu6 mpi_tests]$ /opt/openmpi4.0.3/build/bin/mpirun --host gpu7,gpu6 ./a.out
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: gpu7
We have 2 processes.
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.
This attempted connection will be ignored; your MPI job may or may not
continue properly.
Local host: gpu6
PID: 29209
[gpu6:29203] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
[gpu6:29203] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I have compiled the MPI program with mpicc and on running with mpirun it hangs.
Can anyone guide me regarding this?

MPI: Pin each instance to certain cores on each node

I want to execute several instances of my program with OpenMPI 2.11. Each instance runs on its own node (-N 1) on my cluster. This works fine. I now want to pin each program-instance to the first 2 cores of its node. To do that, it looks like I need to use rankfiles. Here is my rankfile:
rank 0=+n0 slot=0-1
rank 1=+n1 slot=0-1
This, in my opinion, should limit each program-instance to cores 0 and 1 of the local machine it runs on.
I execute mpirun like so:
mpirun -np 2 -N 1 -rf /my/rank/file my_program
But mpirun fails with this error without even executing my program:
Conflicting directives for mapping policy are causing the policy
to be redefined:
New policy: RANK_FILE
Prior policy: UNKNOWN
Please check that only one policy is defined.
What's this? Did I make a mistake in the rankfile?
Instead of using a rankfile, simply use a hostfile:
n0 slots=n max_slots=n
n1 slots=n max_slots=n
Then tell Open MPI to map one process per node with two cores per process using:
mpiexec --hostfile hostfile --map-by ppr:1:node:PE=2 --bind-to core ...
ppr:1:node:PE=2 reads as: 1 process per resource; resource type is node; 2 processing elements per process. You can check the actual binding by adding the --report-bindings option.

openMPI/mpich2 doesn't run on multiple nodes

I am trying to use install openMPI and mpich2 on a multi-node cluster and I am having trouble running on multiple machines in both cases. Using mpich2 I am able to run on an specific host from the head node, but if I try to run something from the compute nodes to a different node I get:
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0#destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to use sge to set up a job I get similar errors.
On the other hand, if I try to use openMPI to run jobs, I am not able to run in any remote machine, even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The machines are connected to each other, I can ping, ssh passwordlessly etc from any of them to any other, MPI_LIB and the PATH are well set in all machines.
Usually this is caused because you didn't set up a hostfile or pass the list of hosts on the command line.
For MPICH, you do this by passing the flag -host on the command line, followed by a list of hosts (host1,host2,host3,etc.).
mpiexec -host host1,host2,host3 -n 3 <executable>
You can also put these in a file:
host1
host2
host3
Then you pass that file on the command line like so:
mpiexec -f <hostfile> -n 3 <executable>
Similarly, with Open MPI, you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
You can get more information at these links:
MPICH - https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
Open MPI - http://www.open-mpi.org/faq/?category=running#mpirun-hostfile

Unicorn is killed automatically

I'm using unicorn in a staging environment (Ubuntu), when a build process is started unicorn is killed automatically with the following logs.
I, [2014-09-23T06:59:58.912673 #16717] INFO -- : reaped #<Process::Status: pid 16720 exit 0> worker=0
I, [2014-09-23T06:59:58.913144 #16717] INFO -- : reaped #<Process::Status: pid 16722 exit 0> worker=1
I, [2014-09-23T06:59:58.913464 #16717] INFO -- : master complete
I'm unable to locate why this is error is happening.
It seems your unicorn server is gracefully shutdown by sending a SIGQUIT to the master process. In this case, the master process reaps all its worker processes after they have finished their current request and then shuts down itself. Unicorn supports a couple more signals to trigger certain behaviour (e.g. adding or removing workers, reloading itself, ...). You can lean more about that at the SIGNALS documentation of unicorn.
The SIGQUIT is probably caused by your deployment process which probably tries to reload/restart your unicorn but dies something strange. Generally, you should look at your unicorn init script or your deployment process for which signals are send (e.g by using the kill command).

Can MPI_Publish_name be used for two separately started applications?

I write an OpenMPI application which consists of a server and a client part which are launched separately:
me#server1:~> mpirun server
and
me#server2:~> mpirun client
server creates a port using MPI_Open_port. The question is: Does OpenMPI have a mechanism to communicate the port to client? I suppose that MPI_Publish_name and MPI_Lookup_name doesn't work here because server wouldn't know to which other computer the information should be sent.
To me, it looks like only processes which were started using a single mpirun can communicate with MPI_Publish_name.
I also found ompi-server, but the documentation is too minimalistic for me to understand this. Does anyone know how this is used?
Related: MPICH: How to publish_name such that a client application can lookup_name it? and https://stackoverflow.com/questions/9263458/client-server-example-using-ompi-does-not-work
MPI_Publish_name is supplied with an MPI info object, which could have an Open MPI specific boolean key ompi_global_scope. If this key is set to true, then the name would be published to the global scope, i.e. to an already running instance of ompi-server. MPI_Lookup_name by default first does a global name lookup if the URI of the ompi-server was provided.
With a dedicated Open MPI server
The process involves several steps:
1) Start the ompi-server somewhere in the cluster where it could be accessed from all nodes. For debugging purposes you may pass it the --no-daemonize -r + argument. It would start and print to the standard output an URI similar to this one:
$ ompi-server --no-daemonize -r +
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351
2) In the server, build an MPI info object and set the ompi_global_scope key to true:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "ompi_global_scope", "true");
Then pass the info object to MPI_Publish_name:
MPI_Publish_name("server", info, port_name);
3) In the client, the call to MPI_Lookup_name would automatically do the lookup in the global context first (this could be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).
In order for both client and server code to know where the ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server 1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351 option.
Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start the server on the same node where both mpirun commands are executed, then you could use a file in /tmp. If you start the ompi-server on a different node, then a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:
$ ompi-server [--no-daemonize] -r file:/path/to/urifile
...
$ mpirun --ompi-server file:/path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Serverless method
If run both mpirun's on the same node, the --ompi-server could also specify the PID of an already running mpirun instance to be used as a name server. It allows you to use local name publishing in the server (i.e. skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:
head-node$ mpirun --report-pid server
[ note the PID of this mpirun instance ]
...
head-node$ mpirun --ompi-server pid:12345 client
where 12345 should be replaced by the real PID of the server's mpirun.
You can also have the server's mpirun print its URI and pass that URI to the client's mpirun:
$ mpirun --report-uri + server
[ note the URI ]
...
$ mpirun --ompi-server URI client
You could also have the URI written to a file if you specify /path/to/file (note: no file: prefix here) instead of + after the --report-uri option:
$ mpirun --report-uri /path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Note that the URI returned by mpirun has the same format as that of an ompi-server, i.e. it includes the host IP address, so it also works if the second mpirun is executed on a different node, which is able to talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).
I tested all of the above with Open MPI 1.6.1. Some of the variant might not work with earlier versions.

Resources