Check the available slots/resources before spawning MPI processes

Before spawning a number of worker processes, as shown below, I need to check whether that many slots are available, so that the code does not crash when the requested slots are not available.
int numworkers = settings.Parallelism + 1; // omp_get_num_procs();
MPI_Comm_spawn("./processes/montecarlo", MPI_ARGV_NULL, numworkers,
               MPI_INFO_NULL, 0, MPI_COMM_SELF, &workercomm,
               MPI_ERRCODES_IGNORE);
How can I check the number of available slots in MPI?
This is happening in the context of a service accepting several requests. Let us suppose the total number of available slots is 13:
REQ1: spawn 5 processes
REQ2: spawn another 5 processes
REQ3: tries to spawn 5 processes, but crashes because only 3 are available
How can I check that only 3 slots are available? Or, failing that, how can I handle the crash that results from the lack of resources? This crash is killing the service.

You can simply ask MPI_Comm_spawn() to return an error code instead of aborting the application:
MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
int res = MPI_Comm_spawn("./processes/montecarlo", MPI_ARGV_NULL, numworkers,
                         MPI_INFO_NULL, 0, MPI_COMM_SELF, &workercomm,
                         MPI_ERRCODES_IGNORE);
if (MPI_SUCCESS != res) {
    // MPI_Comm_spawn failed; reject the request instead of crashing
}
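For completeness, here is a minimal sketch of that pattern wrapped in a helper a service handler might call; the function name, the error reporting, and the return convention are illustrative additions, not part of the original code:

#include <mpi.h>
#include <stdio.h>

/* Try to spawn numworkers workers; report failure to the caller
 * instead of letting the whole service abort. */
int spawn_workers(int numworkers, MPI_Comm *workercomm)
{
    /* Make spawn errors return an error code rather than aborting. */
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

    int res = MPI_Comm_spawn("./processes/montecarlo", MPI_ARGV_NULL,
                             numworkers, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                             workercomm, MPI_ERRCODES_IGNORE);
    if (res != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(res, msg, &len);
        fprintf(stderr, "spawning %d workers failed: %s\n", numworkers, msg);
        return -1; /* let the service reject this request gracefully */
    }
    return 0;
}

The MPI standard also defines the reserved info key "soft", which tells the implementation that launching fewer processes than requested is acceptable (for example, MPI_Info_set(info, "soft", "1:5") accepts any count from 1 to 5); whether this degrades gracefully when slots run short depends on your MPI implementation.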

Related

OpenMPI "nooversubscribe" does not work when using MPI_Comm_spawn

I have a simple MPI_Comm_spawn-based program that spawns 4 processes:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]){
    MPI_Init(&argc, &argv);

    MPI_Comm intercomm;
    int final_nranks = 4, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Get_processor_name(name, &len);
    MPI_Comm_get_parent(&intercomm);

    if(intercomm == MPI_COMM_NULL){
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "hostfile", "hostfile");
        MPI_Info_set(info, "pernode", "true");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, final_nranks, info, 0,
                       MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        printf("PARENT %s\n", name);
    } else {
        printf("CHILD %s\n", name);
    }

    MPI_Finalize();
    return 0;
}
When I run the code with
$ mpirun -np 2 --hostfile hostfile --map-by node ./a.out
I would expect something like:
PARENT node00
PARENT node01
CHILD node00
CHILD node01
CHILD node02
CHILD node03
I want each process to run on a different node. Since OpenMPI does not oversubscribe slots by default, I would expect the child processes to be spawned on different nodes, but that is not the case. Instead, I get:
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[node00:50113] *** An error occurred in MPI_Init
[node00:50113] *** reported by process [3761700866,2]
[node00:50113] *** on a NULL communicator
[node00:50113] *** Unknown error
[node00:50113] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node00:50113] *** and potentially your MPI job)
[node00:200682] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[node00:200682] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[node00:200682] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
If I remove MPI_Info_set(info, "pernode", "true"); from the code, the run does not crash, but the child processes are all spawned on the same host:
PARENT node00
PARENT node01
CHILD node00
CHILD node00
CHILD node00
CHILD node00
OpenMPI version 4.1.2

OpenMPI MPI_Comm_spawn() giving unreachable errors

I've got a master/worker system in which the master uses OpenMPI to spawn and communicate with its workers. I've used versions 4.0.4 and 3.1.6; both give similar errors.
master.cxx
#include <mpi.h>

void doStuff() {
    // does stuff
}

int main(int argc, char *argv[]) {
    int NUM_JOBS = 10;
    MPI_Init(&argc, &argv);

    // casts avoid the invalid string-literal-to-char* conversion in C++
    char *args[] = {(char *)"Arg1", (char *)"Arg2", NULL};
    MPI_Info mpi_info;
    MPI_Info_create(&mpi_info);
    MPI_Info_set(mpi_info, "add-hostfile", "nodefile");

    MPI_Comm child_comm;
    MPI_Comm_spawn("worker", args, NUM_JOBS, mpi_info, 0, MPI_COMM_SELF,
                   &child_comm, MPI_ERRCODES_IGNORE);
    doStuff();

    MPI_Finalize();
    return 0;
}
worker.cxx
#include <mpi.h>

void doStuff() {
    // does stuff
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    doStuff();
    MPI_Finalize();
    return 0;
}
Both are compiled via CMake. I have 101 cores available to me, for 1 master and up to 100 workers, with 24 cores per node. For the first run, I'm just launching workers to prove that the binary runs. I had to add some InfiniBand parameters:
run1.sh
export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx4_0:1"
mpirun -n 101 --mca btl openib,self --hostfile nodefile worker
Now I try to run the workers via the master. Note that the code as written above spawns 10 workers, so all 11 processes fit on one 24-core node.
run2.sh
export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx4_0:1"
mpirun -n 1 --mca btl openib,self --hostfile nodefile master
That works fine. So I raise NUM_JOBS in master.cxx to 100 and repeat the run using run2.sh.
This time, I get an MPI error and the job fails:
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[53646,2],88]) is on host: mymachine04
Process 2 ([[53636,1],0]) is on host: unnknown1
BTLs attempted: self openib
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[mymachine04:22480] [[53646,2],88] ORTE_ERROR_LOG: Unreachable in file dpm/dpm.c at line 493
That last line is repeated 77 times in total. Presumably not coincidentally, 101 procs - 24 procs per node = 77. The hostfile is just 101 lines of hostnames, and I get the same error whether I supply it or not. Interestingly, if I remove the first line of the hostfile before spawning my workers, on the grounds that that slot is already used by the master, I instead get an error about running out of slots. Any idea what I'm doing wrong, and how I can get all of my processes to reach each other?

Setting the TCP keepalive interval on the Hiredis async context

I'm writing a wrapper around hiredis to enable publish/subscribe functionality with reconnects should a Redis node go down.
I'm using the asynchronous redis API.
So I have a test harness that sets up a publisher and subscriber. The harness then shuts down the slave VM from which the subscriber is reading.
However, the disconnect callback isn't called until much later (when I'm destructing the Subscription object that contains the corresponding redisAsyncContext).
I thought that the solution to this might be to use TCP keepalive.
So I found that there's a suitable redis function in net.h:
int redisKeepAlive (redisContext* c, int interval);
However, the following appears to show that the redisKeepAlive function has been omitted from the library on purpose:
$ nm libhiredis.a --demangle | grep redisKeepAlive
0000000000000030 T redisKeepAlive
U redisKeepAlive
$ nm libhiredis.a -u --demangle | grep redisKeepAlive
U redisKeepAlive
Certainly when I try to use the call, the linker complains:
Subscription.cpp:167: undefined reference to `redisKeepAlive(redisContext*, int)'
collect2: error: ld returned 1 exit status
Am I out of luck - is there a way to set the TCP keepalive interval on the Hiredis async context?
Update
I've found this:
int redisEnableKeepAlive(redisContext *c);
But setting this on the asyncContext->c and adjusting REDIS_KEEPALIVE_INTERVAL seems to have no effect.
I found that the implementation of redisKeepAlive contains code that shows how to get direct access to the underlying socket descriptor:
int redisKeepAlive(redisContext *c, int interval) {
    int val = 1;
    int fd = c->fd;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val)) == -1){
        __redisSetError(c, REDIS_ERR_OTHER, strerror(errno));
        return REDIS_ERR;
    }
    /* ... (rest of the function not shown here) ... */
Maybe this'll help someone.
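Building on that hint, here is a minimal sketch that applies keepalive options directly to the socket underlying the async context. It assumes the Linux-specific TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT options and the ac->c.fd field layout of the hiredis versions discussed above; the helper name and the probe count are illustrative:

#include <hiredis/async.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Illustrative helper: enable keepalive with a short interval (seconds)
 * on the socket behind an asynchronous hiredis context. */
static int setKeepAliveInterval(redisAsyncContext *ac, int interval)
{
    int fd = ac->c.fd; /* underlying socket descriptor */
    int on = 1;
    int cnt = 3;       /* probes before the peer is declared dead */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    /* Linux-specific tuning: idle time before the first probe and the
     * interval between subsequent probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &interval, sizeof(interval)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
        return -1;
    return 0;
}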

How can I run multiple threads inside of a given MPI process?

I understand that a single MPI job launches many processes, which may run on multiple nodes.
How do I run multiple threads inside of a given MPI process using MPI_THREAD_MULTIPLE?
I was unable to find enough information on this topic.
Assuming you're using OpenMP to run multiple threads:
You write the OpenMP code as you would without MPI (this statement is oversimplified).
When MPI comes in, you need to consider how your processes will communicate. MPI does not send messages to individual threads, but to individual processes. For that reason, MPI provides four levels of thread support:
MPI_THREAD_SINGLE: only one thread will execute.
MPI_THREAD_FUNNELED: the process may have many threads, but only the master thread makes MPI calls. The master thread is the one that calls MPI_Init_thread.
MPI_THREAD_SERIALIZED: the process may have many threads, but only one makes MPI calls at a time.
MPI_THREAD_MULTIPLE: the process may have many threads, and all of them can make MPI calls at any time.
You specify the level you need when initializing MPI; MPI_Init becomes:
MPI_Init_thread(&argc, &argv, REQUIRED_THREAD_LEVEL, &provided)
For example:
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided)
Through the provided argument, MPI_Init_thread returns the thread-support level actually granted. Make sure you got a level that your code can cope with.
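As a minimal sketch of that check, assuming the code requires full MPI_THREAD_MULTIPLE support (the thread-level constants are ordered monotonically, so a simple comparison works):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    fprintf(stderr, "MPI does not provide the required thread support\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
}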
Also, avoid using MPI_Probe and MPI_Iprobe, because they are not thread-safe; use MPI_Mprobe and MPI_Improbe instead.
Here is a simple 'hello world' example, as @ab2050 asked for:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int provided;
    int rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided != MPI_THREAD_FUNNELED) {
        fprintf(stderr, "Warning: MPI did not provide MPI_THREAD_FUNNELED\n");
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel default(none), \
        shared(rank), \
        shared(ompi_mpi_comm_world), \
        shared(ompi_mpi_int), \
        shared(ompi_mpi_char)
    {
        printf("Hello from thread %d at rank %d parallel region\n",
               omp_get_thread_num(), rank);

        #pragma omp master
        {
            char helloWorld[12];
            if (rank == 0) {
                strcpy(helloWorld, "Hello World");
                MPI_Send(helloWorld, 12, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                printf("Rank %d sent: %s\n", rank, helloWorld);
            }
            else {
                MPI_Recv(helloWorld, 12, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("Rank %d received: %s\n", rank, helloWorld);
            }
        }
    }

    MPI_Finalize();
    return 0;
}
You have to run this code with two processes. Because MPI_THREAD_FUNNELED is selected, only the master thread makes MPI calls.
The following variables are listed in the OpenMP data-scoping clauses because gcc 6.1.1 requires it; older versions such as 4.8 do not require them to be declared:
ompi_mpi_comm_world
ompi_mpi_int
ompi_mpi_char
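A typical way to build and run the example, assuming the file is named hello_hybrid.c (compiler wrappers and flags may differ between MPI distributions):

mpicc -fopenmp hello_hybrid.c -o hello_hybrid
mpirun -np 2 ./hello_hybrid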

Killing a Haskell binary

If I press Ctrl+C while my Haskell program is running, an exception is thrown (always in thread 0?). You can catch this if you want, or, more likely, run some cleanup and then rethrow it. But the usual result is to bring the program to a halt, one way or another.
Now suppose I use the Unix kill command. As I understand it, kill basically sends a (configurable) Unix signal to the specified process.
How does the Haskell RTS respond to this? Is it documented somewhere? I would imagine that sending SIGTERM would have the same effect as pressing Ctrl+C, but I don't know that for a fact...
(And, of course, you can use kill to send signals that have nothing to do with killing at all. Again, I would imagine that the RTS would ignore, say, SIGHUP or SIGPWR, but I don't know for sure.)
Googling "haskell catch sigterm" led me to System.Posix.Signals from the unix package, which has a rather nice-looking system for catching and handling these signals. Just scroll down to the "Handling Signals" section.
EDIT: A trivial example:
import System.Posix.Signals
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar
termHandler :: MVar () -> Handler
termHandler v = CatchOnce $ do
putStrLn "Caught SIGTERM"
putMVar v ()
loop :: MVar () -> IO ()
loop v = do
putStrLn "Still running"
threadDelay 1000000
val <- tryTakeMVar v
case val of
Just _ -> putStrLn "Quitting" >> return ()
Nothing -> loop v
main = do
v <- newEmptyMVar
installHandler sigTERM (termHandler v) Nothing
loop v
Notice that I had to use an MVar to inform loop that it was time to quit. I tried using exitSuccess from System.Exit, but since termHandler executes in a thread that isn't the main one, it can't cause the program to exit. There might be an easier way to do it, but I've never used this module before so I don't know of one. I tested this on Ubuntu 12.10.
Searching for "signal" in the GHC source code on GitHub revealed the initDefaultHandlers function:
void
initDefaultHandlers(void)
{
    struct sigaction action, oact;

    // install the SIGINT handler
    action.sa_handler = shutdown_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGINT, &action, &oact) != 0) {
        sysErrorBelch("warning: failed to install SIGINT handler");
    }
#if defined(HAVE_SIGINTERRUPT)
    siginterrupt(SIGINT, 1);    // isn't this the default? --SDM
#endif

    // install the SIGFPE handler
    // In addition to handling SIGINT, also handle SIGFPE by ignoring it.
    // Apparently IEEE requires floating-point exceptions to be ignored by
    // default, but alpha-dec-osf3 doesn't seem to do so.
    // Commented out by SDM 2/7/2002: this causes an infinite loop on
    // some architectures when an integer division by zero occurs: we
    // don't recover from the floating point exception, and the
    // program just generates another one immediately.
#if 0
    action.sa_handler = SIG_IGN;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGFPE, &action, &oact) != 0) {
        sysErrorBelch("warning: failed to install SIGFPE handler");
    }
#endif

#ifdef alpha_HOST_ARCH
    ieee_set_fp_control(0);
#endif

    // ignore SIGPIPE; see #1619
    // actually, we use an empty signal handler rather than SIG_IGN,
    // so that SIGPIPE gets reset to its default behaviour on exec.
    action.sa_handler = empty_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGPIPE, &action, &oact) != 0) {
        sysErrorBelch("warning: failed to install SIGPIPE handler");
    }

    set_sigtstp_action(rtsTrue);
}
From that, you can see that GHC installs at least SIGINT and SIGPIPE handlers. I don't know if there are any other signal handlers hidden in the source code.
