Run R/Rook as a web server on startup

I have created a server using Rook in R (http://cran.r-project.org/web/packages/Rook). The code is as follows:
#!/usr/bin/Rscript
library(Rook)
s <- Rhttpd$new()
s$add(
  name = "pingpong",
  app = Rook::URLMap$new(
    '/ping' = function(env) {
      req <- Rook::Request$new(env)
      res <- Rook::Response$new()
      # write a link to the /pong URL
      res$write(sprintf('<h1><a href="%s">Pong</a></h1>', req$to_url("/pong")))
      res$finish()
    },
    '/pong' = function(env) {
      req <- Rook::Request$new(env)
      res <- Rook::Response$new()
      # write a link to the /ping URL
      res$write(sprintf('<h1><a href="%s">Ping</a></h1>', req$to_url("/ping")))
      res$finish()
    },
    '/?' = function(env) {
      req <- Rook::Request$new(env)
      res <- Rook::Response$new()
      res$redirect(req$to_url('/pong'))
      res$finish()
    }
  )
)
s$start(port = 9000)
$ ./Rook.r
Loading required package: tools
Loading required package: methods
Loading required package: brew
starting httpd help server ... done
Server started on host 127.0.0.1 and port 9000 . App urls are:
http://127.0.0.1:9000/custom/pingpong
Server started on 127.0.0.1:9000
[1] pingpong http://127.0.0.1:9000/custom/pingpong
Call browse() with an index number or name to run an application.
$
And the process ends here.
It runs fine in the R shell, but I want to run it as a server on system startup.
So once start is called, R should not exit; it should keep waiting for requests on the port.
How can I convince R to simply wait or sleep rather than exiting?
I could use a wait or sleep function in R to wait some N seconds, but that doesn't fit the bill perfectly.

Here is one suggestion:
First, split the example you gave into (at least) two files: one file contains the definition of the application, which in your example is the value of the app parameter to the Rhttpd$add() function; the other file is the Rscript that starts the application defined in the first file.
For example, if your application is named pingpong and is defined in a file named Rook.R, then the Rscript might look something like:
#!/usr/bin/Rscript --default-packages=methods,utils,stats,Rook
# This script takes as a single argument the port number on which to listen.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 1) {
  cat(paste("Usage:",
            substring(grep("^--file=", commandArgs(), value = TRUE), 8),
            "<port-number>\n"))
  quit(save = "no", status = 1)
} else if (length(args) > 1)
  cat("Warning: extra arguments ignored\n")

s <- Rhttpd$new()
app <- RhttpdApp$new(name = 'pingpong', app = 'Rook.R')
s$add(app)
s$start(port = args[1], quiet = FALSE)
suspend_console()
As you can see, this script takes one argument that specifies the listening port. Now you can create a shell script that will invoke this Rscript multiple times to start multiple instances of your server listening on different ports in order to enable some concurrency in responding to HTTP requests.
For example, if the Rscript above is in a file named start.r then such a shell script might look something like:
#!/bin/sh
if [ $# -lt 2 ]; then
    echo "Usage: $0 <start-port> <instance-count>"
    exit 1
fi
start_port=$1
instance_count=$2
end_port=$((start_port + instance_count - 1))
fifo=/tmp/`basename $0`$$
exit_command="echo $(basename $0) exiting; rm $fifo; kill \$(jobs -p)"
mkfifo $fifo
trap "$exit_command" INT TERM
cd `dirname $0`
for port in $(seq $start_port $end_port)
do
    ./start.r $port &
done
# block until interrupted
read < $fifo
The above shell script takes two arguments: (1) the lowest port-number to listen on and (2) the number of instances to start. For example, if the shell script is in an executable file named start.sh then
./start.sh 9000 3
will start three instances of your Rook application listening on ports 9000, 9001 and 9002, respectively.
As you can see, the last line of the shell script reads from the fifo, which prevents the script from exiting until a signal is received. When one of the specified signals is trapped, the shell script kills all the Rook server processes it started before exiting.
Now you can configure a reverse proxy to forward incoming requests to any of the server instances. For example, if you are using Nginx, your configuration might look something like:
upstream rookapp {
    server localhost:9000;
    server localhost:9001;
    server localhost:9002;
}

server {
    listen your.ip.number.here:443;

    location /pingpong/ {
        proxy_pass http://rookapp/custom/pingpong/;
    }
}
Then your service can be available on the public Internet.
The final step is to create a control script with options such as start (to invoke the above shell script) and stop (to send it a TERM signal to stop your servers). Such a script will handle things such as running the shell script as a daemon and keeping track of its process id. Install this control script in the appropriate location and it will start your Rook application servers when the machine boots; how to do that depends on your operating system, which you did not mention in your question.
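For illustration, here is a minimal sketch of such a control script in SysV-init style; the service name rookapp, the install path /usr/local/bin/start.sh, the pid file location, and the port/instance arguments are all placeholder assumptions to adapt to your system:
#!/bin/sh
# /etc/init.d/rookapp (sketch): start/stop wrapper around the shell script above
DAEMON=/usr/local/bin/start.sh   # assumed install location of start.sh
PIDFILE=/var/run/rookapp.pid

case "$1" in
  start)
    # run start.sh in the background and remember its process id
    "$DAEMON" 9000 3 &
    echo $! > "$PIDFILE"
    ;;
  stop)
    # start.sh traps TERM and kills the Rook instances it started
    [ -f "$PIDFILE" ] && kill -TERM "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac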
Notes
For an example of how the fifo in the shell script can be used to take different actions based on received signals, see this stack overflow question.
Jeffrey Horner has provided an example of a complete Rook server application.
You will see that the example shell script above traps only INT and TERM signals. I chose those because INT results from typing control-C at the terminal and TERM is the signal used by control scripts on my operating system to stop services. You might want to adjust the choice of signals to trap depending on your circumstances.

Have you tried this?
while (TRUE) {
  Sys.sleep(0.5)
}

Related

How to setup workers for parallel processing in R using snowfall and multiple Windows nodes?

I’ve successfully used snowfall to set up a cluster on a single server with 16 processors.
require(snowfall)
if (sfIsRunning() == TRUE) sfStop()
number.of.cpus <- 15
sfInit(parallel = TRUE, cpus = number.of.cpus)
stopifnot(sfCpus() == number.of.cpus)
stopifnot(sfParallel() == TRUE)

# Print the hostname for each cluster member
sayhello <- function() {
  info <- Sys.info()[c("nodename", "machine")]
  paste("Hello from", info[1], "with CPU type", info[2])
}
names <- sfClusterCall(sayhello)
print(unlist(names))
Now, I am looking for complete instructions on how to move to a distributed model. I have 4 different Windows machines with a total of 16 cores that I would like to use for a 16 node cluster. So far, I understand that I could manually setup a SOCK connection or leverage MPI. While it appears possible, I haven’t found clear and complete directions as to how.
The SOCK route appears to depend on code in a snowlib script. I can generate a stub from the master side with the following code:
winOptions <- list(host = "172.01.01.03",
                   rscript = "C:/Program Files/R/R-2.7.1/bin/Rscript.exe",
                   snowlib = "C:/Rlibs")
cl <- makeCluster(c(rep(list(winOptions), 2)), type = "SOCK", manual = TRUE)
It yields the following:
Manually start worker on 172.01.01.03 with
"C:/Program Files/R/R-2.7.1/bin/Rscript.exe"
C:/Rlibs/snow/RSOCKnode.R
MASTER=Worker02 PORT=11204 OUT=/dev/null SNOWLIB=C:/Rlibs
It feels like a reasonable start. I found code for RSOCKnode.R on GitHub under the snow package:
local({
    master <- "localhost"
    port <- ""
    snowlib <- Sys.getenv("R_SNOW_LIB")
    outfile <- Sys.getenv("R_SNOW_OUTFILE") ##**** defaults to ""; document

    args <- commandArgs()
    pos <- match("--args", args)
    args <- args[-(1 : pos)]
    for (a in args) {
        pos <- regexpr("=", a)
        name <- substr(a, 1, pos - 1)
        value <- substr(a, pos + 1, nchar(a))
        switch(name,
               MASTER = master <- value,
               PORT = port <- value,
               SNOWLIB = snowlib <- value,
               OUT = outfile <- value)
    }

    if (! (snowlib %in% .libPaths()))
        .libPaths(c(snowlib, .libPaths()))

    library(methods) ## because Rscript as of R 2.7.0 doesn't load methods
    library(snow)

    if (port == "") port <- getClusterOption("port")

    sinkWorkerOutput(outfile)
    cat("starting worker for", paste(master, port, sep = ":"), "\n")
    slaveLoop(makeSOCKmaster(master, port))
})
It’s not clear how to actually start a SOCK listener on the workers, unless it is buried in snow::recvData.
Looking into the MPI route, as far as I can tell, Microsoft MPI version 7 is a starting point. However, I could not find a Windows alternative for sfCluster. I was able to start the MPI service, but it does not appear to listen on port 22 and no amount of bashing against it with snowfall::makeCluster has yielded a result. I’ve disabled the firewall and tried testing with makeCluster and directly connecting to the worker from the master with PuTTY.
Is there a comprehensive, step-by-step guide to setting up a snowfall cluster on Windows workers that I’ve missed? I am fond of snowfall::sfClusterApplyLB and would like to continue using that, but if there is an easier solution, I’d be willing to change course. Looking into Rmpi and parallel, I found alternative solutions for the master side of the work, but still little to no specific detail on how to setup workers running Windows.
Due to the nature of the work environment, neither moving to AWS, nor Linux is an option.
Related questions without definitive answers for Windows worker nodes:
How to set up cluster slave nodes (on Windows)
Parallel R on a Windows cluster
Create a cluster of co-workers' Windows 7 PCs for parallel processing in R?
Several options for the HPC infrastructure were considered: MPICH, Open MPI, and MS MPI. I initially tried MPICH2 but gave up, as the latest stable Windows release (1.4.1) dates back to 2013 and has not been supported since. Open MPI is not supported on Windows, which leaves MS MPI as the only option.
Unfortunately, snowfall does not support MS MPI, so I decided to go with the pbdMPI package, which supports MS MPI by default. pbdMPI implements the SPMD paradigm, in contrast with Rmpi, which uses manager/worker parallelism.
MS MPI installation, configuration, and execution
Install MS MPI v.10.1.2 on all machines in the to-be Windows HPC cluster.
Create a directory accessible to all nodes, where the R scripts and resources will reside, for example \\HeadMachine\SharedDir.
Check that the MS MPI Launch Service (MsMpiLaunchSvc) is running on all nodes.
Check that MS MPI has the rights to run R applications on all nodes on behalf of the same user, e.g. SharedUser. The user name and the password must be the same on all machines.
Check that R is launched on behalf of the SharedUser user.
Finally, execute mpiexec with the options described below:
mpiexec.exe -n %1 -machinefile "C:\MachineFileDir\hosts.txt" -pwd SharedUserPassword -wdir "\\HeadMachine\SharedDir" Rscript hello.R
where
-wdir is a network path to the directory with the shared resources;
-pwd is the password of the SharedUser user, for example SharedUserPassword;
-machinefile is the path to the hosts.txt text file, for example C:\MachineFileDir\hosts.txt. The hosts.txt file must be readable from the head node at the specified path; it contains a list of IP addresses of the nodes on which the R script is to be run.
As a result, MPI logs in as SharedUser with the password SharedUserPassword and executes copies of the R processes on each computer listed in the hosts.txt file.
Details
hello.R:
library(pbdMPI, quiet = TRUE)
init()
cat("Hello World from process", comm.rank(), "of", comm.size(), "!\n")
finalize()
hosts.txt
The hosts.txt file (the MPI machine file) is a text file whose lines contain the network names of the computers on which the R scripts will be launched. On each line, the computer name is followed by a space and (for MS MPI) the number of MPI processes to launch, which usually equals the number of processors on that node.
Sample of hosts.txt with three nodes having 2 processors each:
192.168.0.1 2
192.168.0.2 2
192.168.0.3 2
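Since pbdMPI follows the SPMD model rather than snowfall's manager/worker model, there is no direct drop-in for sfClusterApplyLB; instead every rank picks its own share of the work and the results are combined with a collective. A minimal sketch (hypothetical task of summing square roots; assumes pbdMPI is installed on every node listed in hosts.txt):
# tasks.R - run with mpiexec as above, replacing hello.R with tasks.R
library(pbdMPI, quiet = TRUE)
init()

tasks <- 1:100                                                 # full set of work items
mine  <- seq(comm.rank() + 1, length(tasks), by = comm.size()) # this rank's slice
partial <- sum(sqrt(tasks[mine]))                              # work done by this rank

total <- allreduce(partial, op = "sum")                        # combine across all ranks
comm.cat("Rank", comm.rank(), "handled", length(mine), "items; total =", total, "\n",
         all.rank = TRUE)

finalize()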

multi-computer makePSOCKcluster on Windows: Building a step-by-step guide

I've been trying to build a cluster using multiple computers for three days now and have failed spectacularly. So now I'm going to try to suck a bunch of you into solving my problem for me. If all goes well, I would hope we can generate a step-by-step guide to use as a reference to do this in the future, because as of yet, I haven't managed to find a decent reference for setting this up (perhaps it's too specific a task?)
In my case, let's assume Windows 7, with PuTTY as the SSH client, and 'localhost' is going to serve as the master.
Furthermore, let's assume only two computers on the same network for now. I imagine the process will generalize easily enough that if I can get it to work on two computers, I can get it to work on three. So we'll work on localhost and remote-computer.
Here's what I've gathered so far (with references linked at the bottom)
Install PuTTY on localhost.
Install PuTTY on remote-computer
Install an SSH server on remote-computer
Assign it a port to listen on? (I'm not sure about this step)
Install R on localhost
Install the same version of R on remote-computer
Add R to the PATH environment variable on both localhost and remote-computer
Run the R code below from localhost
code:
library(parallel)
cl <- makePSOCKcluster(c(rep("localhost", 2),
                         rep("remote-computer", 2)))
So far, I've done steps 1-3, not sure if I need to do 4, done 5-7, and the code for step 8 just hangs indefinitely.
When I check my SSH server logs, it doesn't appear that I'm hitting the SSH server from localhost. So it appears that my first problem is configuring the SSH correctly. Has anyone succeeded in doing this and would you be willing to share your expertise?
EDIT
Oops: references
http://www.milanor.net/blog/wp-content/uploads/2013/10/03.FirstStepinParallelComputing.pdf
R Parallel - connecting to remote cores
https://stat.ethz.ch/pipermail/r-sig-hpc/2010-October/000780.html
At best, this is a partial answer. I'm still not establishing a cluster, but the steps described here are a pretty good record of how I've gotten to this point.
CONFIGURATIONS:
Install PuTTY on 'remote-computer'
Install SSH server on 'remote-computer'
Install R on 'remote-computer' (Use the same version of R as on 'localhost')
Add R to the PATH
Install PuTTY on 'localhost'
Install R on 'localhost'
Add R to the PATH
TESTING THE CONNECTION: PHASE I
From the command line, run
C:\PuTTYPath\plink.exe -pw [password] [username]@[remote_ip_address] Rscript -e rnorm(100)
(Confirm return of 100 normal random variates.)
From the command line, run
C:\PuTTYPath\plink.exe -pw [password] [username]@[remote_ip_address] RScript -e parallel:::.slaveRSOCK() MASTER=[local_ip_address] PORT=100501 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
(Confirm that a session is started in the SSH server logs on 'remote-computer'.)
TESTING THE CONNECTION: PHASE II
From an R Session, run
system(paste0("C:/PuTTYPath/plink.exe -pw [password] ",
"[username]#[remote_ip_address] ",
"RScript -e rnorm(100)"))
(Confirm return of 100 normal random variates)
From an R session, run
system(paste0("C:/PuTTY/plink.exe ",
"-pw [password] ",
"[username]#[remote_ip_address] ",
"RScript -e parallel:::.slaveRSOCK() ",
"MASTER=[local_ip_address] ",
"PORT=100501 ",
"OUT=/dev/null ",
"TIMEOUT=2592000 ",
"METHODS=TRUE ",
"XDR=TRUE"))
(Confirm that a session is started and maintained in the SSH server logs on 'remote-computer'.)
ESTABLISH A CLUSTER
From an R Session, run
library(snow)
cl <- makeCluster(spec = c("localhost", "[remote_ip_address]"),
                  rshcmd = "C:/PuTTY/plink.exe -pw [password]",
                  host = "[local_ip_address]")
(A session should be started and maintained in the SSH server logs on 'remote-computer'.
Ideally, the function will complete and 'cl' will be assigned.)
Establishing the cluster is the point at which I'm failing. I run makeCluster and watch my SSH server logs. It shows a connection is made and then immediately closed. makeCluster never finishes running, cl is not assigned, and I'm stuck on how to go on. I'm not even sure if this is an R problem or a configuration problem at this point.
EDIT AND RESOLUTION:
For no good reason, I tried running this with the snow package, as shown in the "ESTABLISH A CLUSTER" section above. With the snow package, the cluster is built and runs stably. I'm not sure why I couldn't get this to work with the parallel package, but at least I've got something functional.
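Once makeCluster returns and cl is assigned, a quick sanity check confirms the workers respond (a minimal sketch using the snow cluster created above):
clusterCall(cl, function() Sys.info()[["nodename"]])  # one hostname per worker
parSapply(cl, 1:8, function(i) i^2)                   # spread a tiny job across workers
stopCluster(cl)                                       # shut the workers down cleanly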
For those who are looking to establish clusters across several computers on Windows, @Benjamin's answer is almost correct: follow his instructions until the last step, ESTABLISH A CLUSTER, and make sure all the previous steps work on your computers. My solution is based on the package 'parallel' instead of 'snow', which is essentially the same.
Solution
Code template:
machineAddresses <- list(list(host = '[Server address]',
                              user = '[user name]',
                              rscript = "[The Rscript file in the server]",
                              rshcmd = "plink -pw [Your password]"))
cl <- makePSOCKcluster(machineAddresses, manual = FALSE)
You have to fill in all the [] placeholders in your code. On my computer, it is:
machineAddresses <- list(list(host = '192.168.1.220',
                              user = 'jeff',
                              rscript = "C:/Program Files/R/R-3.3.2/bin/Rscript",
                              rshcmd = "plink -pw qwer"))
cl <- makePSOCKcluster(machineAddresses, manual = FALSE)
Reason
Running a cluster on Windows is very tricky; the function makePSOCKcluster usually does not work as expected. The easiest way to make it work is to change manual=F to manual=T and create the workers manually. Here is a related post that discusses why makePSOCKcluster hangs forever; I think the two posts are basically stuck at the same place. I also posted my answer to that question describing how to make it work.
R Parallel - connecting to remote cores
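For reference, a sketch of the manual route with the parallel package (addresses and port are placeholders): with manual = TRUE, makePSOCKcluster prints, for each worker, the exact command to paste into a terminal on that machine, then blocks until every worker has connected back to the master.
library(parallel)
cl <- makePSOCKcluster("192.168.1.220",          # hypothetical worker address
                       manual = TRUE,
                       master = "192.168.1.100", # address the worker should call back to
                       port   = 11000)
# ...paste the printed command on the worker; makePSOCKcluster returns once it connects...
parSapply(cl, 1:4, function(i) i * 2)
stopCluster(cl)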
As I do not have the reputation to post a comment on Jeff's answer, I will post this as an answer:
The reason I have found that automatic startup of cluster nodes with makePSOCKcluster does not work on Windows is that the arg and outfile arguments in the internal parallel function newPSOCKnode are wrapped in shQuote. This causes the combination of cmd.exe and Rscript.exe to return an error, which leads to makePSOCKcluster hanging forever.
The following two function definitions enable automatic starting of the cluster nodes with makePSOCKcluster, assuming a proper configuration of ssh or putty/plink for key-based password-less login:
makePSOCKcluster <- function (names, ...)
{
    if (is.numeric(names)) {
        names <- as.integer(names[1L])
        if (is.na(names) || names < 1L)
            stop("numeric 'names' must be >= 1")
        names <- rep("localhost", names)
    }
    parallel:::.check_ncores(length(names))
    options <- parallel:::addClusterOptions(parallel:::defaultClusterOptions, list(...))
    cl <- vector("list", length(names))
    for (i in seq_along(cl)) cl[[i]] <- newPSOCKnode(names[[i]],
                                                     options = options, rank = i)
    class(cl) <- c("SOCKcluster", "cluster")
    cl
}
newPSOCKnode <- function (machine = "localhost", ...,
                          options = parallel:::defaultClusterOptions, rank)
{
    options <- parallel:::addClusterOptions(options, list(...))
    if (is.list(machine)) {
        options <- parallel:::addClusterOptions(options, machine)
        machine <- machine$host
    }
    outfile <- parallel:::getClusterOption("outfile", options)
    master <- if (machine == "localhost")
        "localhost"
    else parallel:::getClusterOption("master", options)
    port <- parallel:::getClusterOption("port", options)
    setup_timeout <- parallel:::getClusterOption("setup_timeout", options)
    manual <- parallel:::getClusterOption("manual", options)
    timeout <- parallel:::getClusterOption("timeout", options)
    methods <- parallel:::getClusterOption("methods", options)
    useXDR <- parallel:::getClusterOption("useXDR", options)
    ## outfile is deliberately NOT wrapped in shQuote() here
    env <- paste0("MASTER=", master, " PORT=", port, " OUT=",
                  #shQuote(outfile), " SETUPTIMEOUT=", setup_timeout, " TIMEOUT=",
                  (outfile), " SETUPTIMEOUT=", setup_timeout, " TIMEOUT=",
                  timeout, " XDR=", useXDR)
    arg <- "parallel:::.slaveRSOCK()"
    rscript <- if (parallel:::getClusterOption("homogeneous", options)) {
        shQuote(parallel:::getClusterOption("rscript", options))
    }
    else "Rscript"
    rscript_args <- parallel:::getClusterOption("rscript_args", options)
    if (methods)
        rscript_args <- c("--default-packages=datasets,utils,grDevices,graphics,stats,methods",
                          rscript_args)
    ## arg is deliberately NOT wrapped in shQuote() here either
    cmd <- if (length(rscript_args))
        paste(rscript, paste(rscript_args, collapse = " "), "-e",
              #shQuote(arg), env)
              arg, env)
    #else paste(rscript, "-e", shQuote(arg), env)
    else paste(rscript, "-e", arg, env)
    renice <- parallel:::getClusterOption("renice", options)
    if (!is.na(renice) && renice)
        cmd <- sprintf("nice +%d %s", as.integer(renice), cmd)
    if (manual) {
        cat("Manually start worker on", machine, "with\n ", cmd, "\n")
        utils::flush.console()
    }
    else {
        if (machine != "localhost") {
            rshcmd <- parallel:::getClusterOption("rshcmd", options)
            user <- parallel:::getClusterOption("user", options)
            cmd <- shQuote(cmd)
            cmd <- paste(rshcmd, "-l", user, machine, cmd)
        }
        if (.Platform$OS.type == "windows") {
            system(cmd, wait = FALSE, input = "")
        }
        else system(cmd, wait = FALSE)
    }
    con <- socketConnection("localhost", port = port, server = TRUE,
                            blocking = TRUE, open = "a+b", timeout = timeout)
    structure(list(con = con, host = machine, rank = rank),
              class = if (useXDR) "SOCKnode" else "SOCK0node")
}
I plan to update this response with more complete setup instructions when I have the chance.

How can I shut down Rserve gracefully?

I have tried many options, both on Mac and on Ubuntu.
I read the Rserve documentation
http://rforge.net/Rserve/doc.html
and that for the Rserve and RSclient packages:
http://cran.r-project.org/web/packages/RSclient/RSclient.pdf
http://cran.r-project.org/web/packages/Rserve/Rserve.pdf
I cannot figure out what is the correct workflow for opening/closing a connection within Rserve and for shutting down Rserve 'gracefully'.
For example, on Ubuntu I installed R from source with ./configure --enable-R-shlib (following the Rserve documentation) and also added the 'control enable' line to /etc/Rserve.conf.
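For reference, that configuration file needs only the one directive; Rserve reads /etc/Rserve.conf on startup unless a different file is given with --RS-conf (a sketch):
# /etc/Rserve.conf (sketch)
# 'control enable' allows control commands such as server shutdown
control enable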
In an R session in an Ubuntu terminal:
library(Rserve)
library(RSclient)
Rserve()
c<-RS.connect()
c ## this is an Rserve QAP1 connection
## Trying to shutdown the server
RSshutdown(c)
Error in writeBin(as.integer....): invalid connection
RS.server.shutdown(c)
Error in RS.server.shutdown(c): command failed with status code 0x4e: no control line present (control commands disabled or server shutdown)
I can, however, CLOSE the connection:
RS.close(c)
>NULL
c ## Closed Rserve connection
After closing the connection, I also tried the options (also tried with argument 'c', even though the connection is closed):
RS.server.shutdown()
RSshutdown()
So, my questions are:
1- How can I close Rserve gracefully?
2- Can Rserve be used without RSclient?
I also looked at
How to Shutdown Rserve(), running in DEBUG
but the question refers to the debug mode and is also unresolved. (I don't have enough reputation to comment/ask whether the shutdown works in the non-debug mode).
Also looked at:
how to connect to Rserve with an R client
Thanks so much!
Load Rserve and RSclient packages, then connect to the instances.
> library(Rserve)
> library(RSclient)
> Rserve(port = 6311, debug = FALSE)
> Rserve(port = 6312, debug = TRUE)
Starting Rserve...
"C:\..\Rserve.exe" --RS-port 6311
Starting Rserve...
"C:\..\Rserve_d.exe" --RS-port 6312
> rsc <- RSconnect(port = 6311)
> rscd <- RSconnect(port = 6312)
Looks like they're running...
> system('tasklist /FI "IMAGENAME eq Rserve.exe"')
> system('tasklist /FI "IMAGENAME eq Rserve_d.exe"')
Image Name PID Session Name Session# Mem Usage
========================= ======== ================ =========== ============
Rserve.exe 8600 Console 1 39,312 K
Rserve_d.exe 12652 Console 1 39,324 K
Let's shut 'em down.
> RSshutdown(rsc)
> RSshutdown(rscd)
And they're gone...
> system('tasklist /FI "IMAGENAME eq Rserve.exe"')
> system('tasklist /FI "IMAGENAME eq Rserve_d.exe"')
INFO: No tasks are running which match the specified criteria.
Rserve can be used without RSclient by starting it with command-line arguments and/or a config script. Then you can connect to it from some other program (like Tableau) or with your own code. RSclient provides a way to pass commands/data to Rserve from an instance of R.
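For example, Rserve can be launched directly from a shell, pointing it at a configuration file (a sketch; the port and config path are illustrative):
R CMD Rserve --vanilla --RS-port 6311 --RS-conf /etc/Rserve.conf
Clients then connect to that port over TCP without any RSclient involvement on the server side.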
Hope this helps :)
On a Windows system, if you want to close an RServe instance, you can use the system function in R to close it down.
For example in R:
library(Rserve)
Rserve() # run without any arguments or ports specified
system('tasklist /FI "IMAGENAME eq Rserve.exe"') # run this to see RServe instances and their PIDs
system('TASKKILL /PID {yourPID} /F') # run this to kill off the RServe instance with your selected PID
If you have closed your RServe instance with that PID correctly, the following message will appear:
SUCCESS: The process with PID xxxx has been terminated.
You can check the RServe instance has been closed down by entering
system('tasklist /FI "IMAGENAME eq Rserve.exe"')
again. If there are no RServe instances running any more, you will get the message
INFO: No tasks are running which match the specified criteria.
More help and info on this topic can be seen in this related question.
Note that the 'RSClient' approach mentioned in an earlier answer is tidier and easier than this one, but I put it forward anyway for those who start RServe without knowing how to stop it.
If you are not able to shut it down from within R, run the commands below to kill it from a terminal. These commands work on macOS.
$ ps ax | grep Rserve # get active Rserve sessions
You will see output like the following; 29155 is the process id of the active Rserve session.
29155 /Users/userid/Library/R/3.5/library/Rserve/libs/Rserve
38562 0:00.00 grep Rserve
Then run
$ kill 29155

Can MPI_Publish_name be used for two separately started applications?

I am writing an Open MPI application which consists of a server part and a client part that are launched separately:
me#server1:~> mpirun server
and
me#server2:~> mpirun client
server creates a port using MPI_Open_port. The question is: does Open MPI have a mechanism to communicate the port to the client? I suppose that MPI_Publish_name and MPI_Lookup_name don't work here because server wouldn't know to which other computer the information should be sent.
To me, it looks like only processes which were started using a single mpirun can communicate with MPI_Publish_name.
I also found ompi-server, but the documentation is too minimalistic for me to understand this. Does anyone know how this is used?
Related: MPICH: How to publish_name such that a client application can lookup_name it? and https://stackoverflow.com/questions/9263458/client-server-example-using-ompi-does-not-work
MPI_Publish_name is supplied with an MPI info object, which could have an Open MPI specific boolean key ompi_global_scope. If this key is set to true, then the name would be published to the global scope, i.e. to an already running instance of ompi-server. MPI_Lookup_name by default first does a global name lookup if the URI of the ompi-server was provided.
With a dedicated Open MPI server
The process involves several steps:
1) Start the ompi-server somewhere in the cluster where it can be accessed from all nodes. For debugging purposes you may pass it the --no-daemonize -r + arguments. It will start and print to standard output a URI similar to this one:
$ ompi-server --no-daemonize -r +
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351
2) In the server, build an MPI info object and set the ompi_global_scope key to true:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "ompi_global_scope", "true");
Then pass the info object to MPI_Publish_name:
MPI_Publish_name("server", info, port_name);
3) In the client, the call to MPI_Lookup_name would automatically do the lookup in the global context first (this could be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).
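For completeness, the client-side counterpart is just a lookup followed by a connect (a sketch; "server" must match the name the server passed to MPI_Publish_name):
char port_name[MPI_MAX_PORT_NAME];
MPI_Comm server_comm;

/* resolve the published name (global lookup first when an ompi-server URI is known) */
MPI_Lookup_name("server", MPI_INFO_NULL, port_name);
/* connect to the port the server opened with MPI_Open_port */
MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server_comm);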
In order for both client and server code to know where the ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server 1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351 option.
Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start the server on the same node where both mpirun commands are executed, then you could use a file in /tmp. If you start the ompi-server on a different node, then a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:
$ ompi-server [--no-daemonize] -r file:/path/to/urifile
...
$ mpirun --ompi-server file:/path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Serverless method
If you run both mpirun commands on the same node, the --ompi-server argument could also specify the PID of an already running mpirun instance to be used as a name server. This allows you to use local name publishing in the server (i.e. you can skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:
head-node$ mpirun --report-pid server
[ note the PID of this mpirun instance ]
...
head-node$ mpirun --ompi-server pid:12345 client
where 12345 should be replaced by the real PID of the server's mpirun.
You can also have the server's mpirun print its URI and pass that URI to the client's mpirun:
$ mpirun --report-uri + server
[ note the URI ]
...
$ mpirun --ompi-server URI client
You could also have the URI written to a file if you specify /path/to/file (note: no file: prefix here) instead of + after the --report-uri option:
$ mpirun --report-uri /path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Note that the URI returned by mpirun has the same format as that of an ompi-server, i.e. it includes the host IP address, so it also works if the second mpirun is executed on a different node, which is able to talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).
I tested all of the above with Open MPI 1.6.1. Some of the variants might not work with earlier versions.

Writing a unix daemon

I'm trying to code a daemon on Unix. I understand how to get a daemon up and running. Now I want the daemon to respond when I type commands in the shell, if they are targeted at the daemon.
For example:
Let us assume the daemon name is "mydaemon"
In terminal 1 I type mydaemon xxx.
In terminal 2 I type mydaemon yyy.
"mydaemon" should be able to receive the argument "xxx" and "yyy".
If I interpret your question correctly, then you have to do this as an application-level construct. That is, this is something specific to your program you're going to have to code up yourself.
The approach I would take is to write "mydaemon" with the idea of it being a wrapper: it checks the process table or a pid file to see if a "mydaemon" is already running. If not, then fork/exec your new daemon. If so, then send the arguments to it.
For "send the arguments to it", I would use named pipes, like are explained here: What are named pipes? Essentially, you can think of named pipes as being like "stdin", except they appear as a file to the rest of the system, so you can open them in your running "mydaemon" and check them for inputs.
Finally, it should be noted that all of this check-if-running-send-to-pipe stuff can either be done in your daemon program, using the API of the *nix OS, or it can be done in a script by using e.g. 'ps', 'echo', etc...
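A minimal shell sketch of that script approach (all names and paths here are hypothetical; the real daemon is assumed to reopen and read the fifo in a loop):
#!/bin/sh
# mydaemon wrapper: start the real daemon if it is not running, then hand over the arguments
PIPE=/tmp/mydaemon.fifo

if ! pgrep -x mydaemon-real >/dev/null; then
    [ -p "$PIPE" ] || mkfifo "$PIPE"
    /usr/local/bin/mydaemon-real "$PIPE" &   # first invocation: fork the daemon
fi

echo "$@" > "$PIPE"                          # every invocation: write the arguments to the pipe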
The easiest, most common, and most robust way to do this in Linux is using a systemd socket service.
Example contents of /usr/lib/systemd/system/yoursoftware.socket:
[Unit]
Description=This is a description of your software
Before=yoursoftware.service
[Socket]
ListenStream=/run/yoursoftware.sock
Service=yourservicename.service
# E.x.: use SocketMode=0666 to give rw access to everyone
# E.x.: use SocketMode=0640 to give rw access to root and read-only to SocketGroup
SocketMode=0660
SocketUser=root
# Use socket group to grant access only to specific processes
SocketGroup=root
[Install]
WantedBy=sockets.target
NOTE: If you are creating a local-user daemon instead of a root daemon, then your systemd files go in /usr/lib/systemd/user/ (see pulseaudio.socket for example) or ~/.config/systemd/user/ and your socket is at /run/usr/$(id -u)/yoursoftware.sock (note that you can't actually use command substitution in pathnames in systemd.)
Example contents of /lib/systemd/system/yoursoftware.service
[Unit]
Description=This is a description of your software
Requires=yoursoftware.socket
[Service]
ExecStart=/usr/local/bin/yoursoftware --daemon --yourarg yourvalue
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
Also=yoursoftware.socket
Run systemctl daemon-reload && systemctl enable yoursoftware.socket yoursoftware.service as root
Use systemctl --user daemon-reload && systemctl --user enable yoursoftware.socket yoursoftware.service if you're creating the service to run as a local-user
A functional example of the software in C would be way too long, so here's an example in NodeJS. Here is /usr/local/bin/yoursoftware:
#!/usr/bin/env node
var SOCKET_PATH = "/run/yoursoftware.sock";

function errorHandle(e) {
  if (e) console.error(e), process.exit(1);
}

if (process.argv.includes("--daemon")) {
  // daemon mode: append everything received on the socket to a log file
  var logFile = require("fs").createWriteStream(
    "/var/log/yoursoftware.log", {flags: "a"});
  require('net').createServer(s => s.pipe(logFile))
    .on("error", errorHandle)
    .listen(SOCKET_PATH);
} else {
  // client mode: forward stdin to the daemon's socket
  process.stdin.pipe(
    require('net').createConnection(SOCKET_PATH, errorHandle)
  );
}
In the example above, you can run many yoursoftware instances at the same time, and the stdin of each of the instances will be piped through to the daemon, which appends all the stuff it receives to a log file.
For non-Linux OSes and distros without systemd, you would use the (typically shell-scripted) startup system to begin your process at boot and the user would receive an error like could not connect to socket /run/yoursoftware.sock when something goes wrong with your daemon.
