Can MPI_Publish_name be used for two separately started applications? - mpi

I write an OpenMPI application which consists of a server and a client part which are launched separately:
me#server1:~> mpirun server
and
me#server2:~> mpirun client
server creates a port using MPI_Open_port. The question is: Does OpenMPI have a mechanism to communicate the port to client? I suppose that MPI_Publish_name and MPI_Lookup_name doesn't work here because server wouldn't know to which other computer the information should be sent.
To me, it looks like only processes which were started using a single mpirun can communicate with MPI_Publish_name.
I also found ompi-server, but the documentation is too minimalistic for me to understand this. Does anyone know how this is used?
Related: MPICH: How to publish_name such that a client application can lookup_name it? and https://stackoverflow.com/questions/9263458/client-server-example-using-ompi-does-not-work

MPI_Publish_name is supplied with an MPI info object, which could have an Open MPI specific boolean key ompi_global_scope. If this key is set to true, then the name would be published to the global scope, i.e. to an already running instance of ompi-server. MPI_Lookup_name by default first does a global name lookup if the URI of the ompi-server was provided.
With a dedicated Open MPI server
The process involves several steps:
1) Start the ompi-server somewhere in the cluster where it could be accessed from all nodes. For debugging purposes you may pass it the --no-daemonize -r + argument. It would start and print to the standard output an URI similar to this one:
$ ompi-server --no-daemonize -r +
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351
2) In the server, build an MPI info object and set the ompi_global_scope key to true:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "ompi_global_scope", "true");
Then pass the info object to MPI_Publish_name:
MPI_Publish_name("server", info, port_name);
3) In the client, the call to MPI_Lookup_name would automatically do the lookup in the global context first (this could be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).
In order for both client and server code to know where the ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server 1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351 option.
Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start the server on the same node where both mpirun commands are executed, then you could use a file in /tmp. If you start the ompi-server on a different node, then a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:
$ ompi-server [--no-daemonize] -r file:/path/to/urifile
...
$ mpirun --ompi-server file:/path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Serverless method
If run both mpirun's on the same node, the --ompi-server could also specify the PID of an already running mpirun instance to be used as a name server. It allows you to use local name publishing in the server (i.e. skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:
head-node$ mpirun --report-pid server
[ note the PID of this mpirun instance ]
...
head-node$ mpirun --ompi-server pid:12345 client
where 12345 should be replaced by the real PID of the server's mpirun.
You can also have the server's mpirun print its URI and pass that URI to the client's mpirun:
$ mpirun --report-uri + server
[ note the URI ]
...
$ mpirun --ompi-server URI client
You could also have the URI written to a file if you specify /path/to/file (note: no file: prefix here) instead of + after the --report-uri option:
$ mpirun --report-uri /path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Note that the URI returned by mpirun has the same format as that of an ompi-server, i.e. it includes the host IP address, so it also works if the second mpirun is executed on a different node, which is able to talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).
I tested all of the above with Open MPI 1.6.1. Some of the variant might not work with earlier versions.

Related

How do I get back to the running instance of riak-shell?

I was in riak-shell when ssh lost its connection to the server. After reconnecting, I do the following:
sudo riak-shell
and get:
An instance of riak-shell is already running
So, I restarted the riak node in question. This did not seem to solve the problem. I do not see anything using ps -aux to kill. According to the docs, only one instance can run at a time. That makes sense, but when I run riak-shell from another node and try to connect to any node, I now get the following:
Error: invalid function call : connection_EXT:connect ["riak#<<<ip_address_elided>>>"]
You can connect to a specific node (whether in your riak_shell.config
or not) by typing 'connect "dev1#127.0.0.1";' substituting your
node name for dev1.
You may need to change the Erlang cookie to do this.
See also the 'reconnect' command.
Unhandled message received is {#Ref<0.0.0.135>,disconnected}
riak-shell(3)>
I have not changed the cookies during this process, and the cookie appears to be the same (at least in /etc/riak/riak_shell.config). (I am running the Riak TS AMI on AWS.)
riak-shell runs in its own Erlang VM - entirely separate from the riak node
(You don't need to run riak-shell from the machine your node is on - it uses the normal riak-erlang-client to talk to riak)
If you you are on a Linux do ps aux | grep riak_shell_app it will give you the process number you need to kill that instance:
08:30:45:~ $ ps aux | grep riak_shell_app
vagrant 4671 0.0 0.3 493260 34884 pts/4 Sl+ Aug17 0:03 /home/vagrant/riak_ee/dev/dev1/erts-5.10.3/bin/beam.smp -- -root /home/vagrant/riak_ee/dev/dev1 -progname erl -- -home /home/vagrant -- -boot /home/vagrant/riak_ee/dev/dev1/releases/2.1.1/start_clean -run riak_shell_app boot debug_off /home/vagrant/riak_ee/dev/dev1/bin/../log/riak_shell/riak_shell -noshell -config /home/vagrant/riak_ee/dev/dev1/bin/../etc/riak
I wrote a good chunk of it so let me know how you got on:
https://github.com/basho/riak_shell/graphs/contributors

openMPI/mpich2 doesn't run on multiple nodes

I am trying to use install openMPI and mpich2 on a multi-node cluster and I am having trouble running on multiple machines in both cases. Using mpich2 I am able to run on an specific host from the head node, but if I try to run something from the compute nodes to a different node I get:
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0#destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to use sge to set up a job I get similar errors.
On the other hand, if I try to use openMPI to run jobs, I am not able to run in any remote machine, even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The machines are connected to each other, I can ping, ssh passwordlessly etc from any of them to any other, MPI_LIB and the PATH are well set in all machines.
Usually this is caused because you didn't set up a hostfile or pass the list of hosts on the command line.
For MPICH, you do this by passing the flag -host on the command line, followed by a list of hosts (host1,host2,host3,etc.).
mpiexec -host host1,host2,host3 -n 3 <executable>
You can also put these in a file:
host1
host2
host3
Then you pass that file on the command line like so:
mpiexec -f <hostfile> -n 3 <executable>
Similarly, with Open MPI, you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
You can get more information at these links:
MPICH - https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
Open MPI - http://www.open-mpi.org/faq/?category=running#mpirun-hostfile

Run R/Rook as a web server on startup

I have created a server using Rook in R - http://cran.r-project.org/web/packages/Rook
Code is as follows
#!/usr/bin/Rscript
library(Rook)
s <- Rhttpd$new()
s$add(
name="pingpong",
app=Rook::URLMap$new(
'/ping' = function(env){
req <- Rook::Request$new(env)
res <- Rook::Response$new()
res$write(sprintf('<h1>Pong</h1>',req$to_url("/pong")))
res$finish()
},
'/pong' = function(env){
req <- Rook::Request$new(env)
res <- Rook::Response$new()
res$write(sprintf('<h1>Ping</h1>',req$to_url("/ping")))
res$finish()
},
'/?' = function(env){
req <- Rook::Request$new(env)
res <- Rook::Response$new()
res$redirect(req$to_url('/pong'))
res$finish()
}
)
)
## Not run:
s$start(port=9000)
$ ./Rook.r
Loading required package: tools
Loading required package: methods
Loading required package: brew
starting httpd help server ... done
Server started on host 127.0.0.1 and port 9000 . App urls are:
http://127.0.0.1:9000/custom/pingpong
Server started on 127.0.0.1:9000
[1] pingpong http://127.0.0.1:9000/custom/pingpong
Call browse() with an index number or name to run an application.
$
And the process ends here.
Its running fine in the R shell but then i want to run it as a server on system startup.
So once the start is called , R should not exit but wait for requests on the port.
How will i convince R to simply wait or sleep rather than exiting ?
I can use the wait or sleep function in R to wait some N seconds , but that doesnt fit the bill perfectly
Here is one suggestion:
First split the example you gave into (at least) two files: One file contains the definition of the application, which in your example is the value of the app parameter to the Rhttpd$add() function. The other file is the RScript that starts the application defined in the first file.
For example, if the name of your application function is named pingpong defined in a file named Rook.R, then the Rscript might look something like:
#!/usr/bin/Rscript --default-packages=methods,utils,stats,Rook
# This script takes as a single argument the port number on which to listen.
args <- commandArgs(trailingOnly=TRUE)
if (length(args) < 1) {
cat(paste("Usage:",
substring(grep("^--file=", commandArgs(), value=T), 8),
"<port-number>\n"))
quit(save="no", status=1)
} else if (length(args) > 1)
cat("Warning: extra arguments ignored\n")
s <- Rhttpd$new()
app <- RhttpdApp$new(name='pingpong', app='Rook.R')
s$add(app)
s$start(port=args[1], quiet=F)
suspend_console()
As you can see, this script takes one argument that specifies the listening port. Now you can create a shell script that will invoke this Rscript multiple times to start multiple instances of your server listening on different ports in order to enable some concurrency in responding to HTTP requests.
For example, if the Rscript above is in a file named start.r then such a shell script might look something like:
#!/bin/sh
if [ $# -lt 2 ]; then
echo "Usage: $0 <start-port> <instance-count>"
exit 1
fi
start_port=$1
instance_count=$2
end_port=$((start_port + instance_count - 1))
fifo=/tmp/`basename $0`$$
exit_command="echo $(basename $0) exiting; rm $fifo; kill \$(jobs -p)"
mkfifo $fifo
trap "$exit_command" INT TERM
cd `dirname $0`
for port in $(seq $start_port $end_port)
do ./start.r $port &
done
# block until interrupted
read < $fifo
The above shell script takes two arguments: (1) the lowest port-number to listen on and (2) the number of instances to start. For example, if the shell script is in an executable file named start.sh then
./start.sh 9000 3
will start three instances of your Rook application listening on ports 9000, 9001 and 9002, respectively.
You see the last line of the shell script reads from the fifo which prevents the script from exiting until caused to by a received signal. When one of the specified signals is trapped, the shell script kills all the Rook server processes that it started before it exits.
Now you can configure a reverse proxy to forward incoming requests to any of the server instances. For example, if you are using Nginx, your configuration might look something like:
upstream rookapp {
server localhost:9000;
server localhost:9001;
server localhost:9002;
}
server {
listen your.ip.number.here:443;
location /pingpong/ {
proxy_pass http://rookapp/custom/pingpong/;
}
}
Then your service can be available on the public Internet.
The final step is to create a control script with options such as start (to invoke the above shell script) and stop (to send it a TERM signal to stop your servers). Such a script will handle things such as causing the shell script to run as a daemon and keeping track of its process id number. Install this control script in the appropriate location and it will start your Rook application servers when the machine boots. How to do that will depend on your operating system, the identity of which is missing from your question.
Notes
For an example of how the fifo in the shell script can be used to take different actions based on received signals, see this stack overflow question.
Jeffrey Horner has provided an example of a complete Rook server application.
You will see that the example shell script above traps only INT and TERM signals. I chose those because INT results from typing control-C at the terminal and TERM is the signal used by control scripts on my operating system to stop services. You might want to adjust the choice of signals to trap depending on your circumstances.
Have you tried this?
while (TRUE) {
Sys.sleep(0.5);
}

Writing a unix daemon

I'm trying to code a daemon in Unix. I understand the part how to make a daemon up and running . Now I want the daemon to respond when I type commands in the shell if they are targeted to the daemon.
For example:
Let us assume the daemon name is "mydaemon"
In terminal 1 I type mydaemon xxx.
In terminal 2 I type mydaemon yyy.
"mydaemon" should be able to receive the argument "xxx" and "yyy".
If I interpret your question correctly, then you have to do this as an application-level construct. That is, this is something specific to your program you're going to have to code up yourself.
The approach I would take is to write "mydaemon" with the idea of it being a wrapper: it checks the process table or a pid file to see if a "mydaemon" is already running. If not, then fork/exec your new daemon. If so, then send the arguments to it.
For "send the arguments to it", I would use named pipes, like are explained here: What are named pipes? Essentially, you can think of named pipes as being like "stdin", except they appear as a file to the rest of the system, so you can open them in your running "mydaemon" and check them for inputs.
Finally, it should be noted that all of this check-if-running-send-to-pipe stuff can either be done in your daemon program, using the API of the *nix OS, or it can be done in a script by using e.g. 'ps', 'echo', etc...
The easiest, most common, and most robust way to do this in Linux is using a systemd socket service.
Example contents of /usr/lib/systemd/system/yoursoftware.socket:
[Unit]
Description=This is a description of your software
Before=yoursoftware.service
[Socket]
ListenStream=/run/yoursoftware.sock
Service=yourservicename.service
# E.x.: use SocketMode=0666 to give rw access to everyone
# E.x.: use SocketMode=0640 to give rw access to root and read-only to SocketGroup
SocketMode=0660
SocketUser=root
# Use socket group to grant access only to specific processes
SocketGroup=root
[Install]
WantedBy=sockets.target
NOTE: If you are creating a local-user daemon instead of a root daemon, then your systemd files go in /usr/lib/systemd/user/ (see pulseaudio.socket for example) or ~/.config/systemd/user/ and your socket is at /run/usr/$(id -u)/yoursoftware.sock (note that you can't actually use command substitution in pathnames in systemd.)
Example contents of /lib/systemd/system/yoursoftware.service
[Unit]
Description=This is a description of your software
Requires=yoursoftware.socket
[Service]
ExecStart=/usr/local/bin/yoursoftware --daemon --yourarg yourvalue
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
Also=yoursoftware.socket
Run systemctl daemon-reload && systemctl enable yoursoftware.socket yoursoftware.service as root
Use systemctl --user daemon-reload && systemctl --user enable yoursoftware.socket yoursoftware.service if you're creating the service to run as a local-user
A functional example of the software in C would be way too long, so here's an example in NodeJS. Here is /usr/local/bin/yoursoftware:
#!/usr/bin/env node
var SOCKET_PATH = "/run/yoursoftware.sock";
function errorHandle(e) {
if (e) console.error(e), process.exit(1);
}
if (process.argv[0] === "--daemon") {
var logFile = require("fs").createWriteStream(
"/var/log/yoursoftware.log", {flags: "a"});
require('net').createServer(errorHandle)
.listen(SOCKET_PATH, s => s.pipe(logFile));
} else process.stdin.pipe(
require('net')
.createConnection(SOCKET_PATH, errorHandle)
);
In the example above, you can run many yoursoftware instances at the same time, and the stdin of each of the instances will be piped through to the daemon, which appends all the stuff it receives to a log file.
For non-Linux OSes and distros without systemd, you would use the (typically shell-scripted) startup system to begin your process at boot and the user would receive an error like could not connect to socket /run/yoursoftware.sock when something goes wrong with your daemon.

mpi_comm_spawn on remote nodes

How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
MPI_ARGV_NULL,
nprocs,
info,
0,
MPI_COMM_SELF,
&intercom,
MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?
I don't want to start the parent
process with mpirun, so I'm just
looking for a way to dynamically spawn
processes on remote nodes. Is this
possible?
I'm not sure why you don't want to start it with mpirun? You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(), this way you just get to pass it options rather than relying on the default.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require Info host, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the jobs this way, mpirun --hostfile mpihosts.txt -np 1 ./parentjob This is a feature, not a bug; now it's MPIs job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put it in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.

Resources