mpiexec checkpointing error (RPi)

When I try to run an application (even a simple hello_world.c fails), I receive this error every time:
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 10 -machinefile /tmp/machinefile -n 1 ./app_name
[proxy:0:0@masterpi] requesting checkpoint
[proxy:0:0@masterpi] checkpoint completed
[proxy:0:0@masterpi] requesting checkpoint
[proxy:0:0@masterpi] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.
[proxy:0:0@masterpi] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:0@masterpi] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@masterpi] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@masterpi] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@masterpi] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@masterpi] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@masterpi] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
I just want to take a checkpoint and nothing else (and restart from it later).
Thanks in advance
UPDATE:
I have tried with MPICH2, with no luck. Or maybe I'm doing something wrong...
pi@raspberrypi ~ $ mpiexec -n 1 -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 2 ./test3
Count to: 0
[proxy:0:0@raspberrypi] requesting checkpoint
[proxy:0:0@raspberrypi] checkpoint completed
Count to: 1
[proxy:0:0@raspberrypi] requesting checkpoint
[proxy:0:0@raspberrypi] HYDT_ckpoint_checkpoint (/tmp/mpich/mpich2-1.5/src/pm/hydra/tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.
[proxy:0:0@raspberrypi] HYD_pmcd_pmip_control_cmd_cb (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip_cb.c:902): checkpoint suspend failed
[proxy:0:0@raspberrypi] HYDT_dmxu_poll_wait_for_event (/tmp/mpich/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@raspberrypi] main (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpiexec@raspberrypi] control_cb (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
[mpiexec@raspberrypi] HYDT_dmxu_poll_wait_for_event (/tmp/mpich/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@raspberrypi] HYD_pmci_wait_for_completion (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
[mpiexec@raspberrypi] main (/tmp/mpich/mpich2-1.5/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion
Test3-Code:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank;
    int size;
    int i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Busy-wait so each iteration takes long enough for the
           periodic checkpoint to trigger between prints. */
        for (i = 0; i <= 100; i++) {
            int j = 0;
            while (j < 100000000) {
                j++;
            }
            printf("Count to: %i\n", i);
        }
    }

    MPI_Finalize();
    return 0;
}
I just need one successful checkpoint and to demonstrate the restart.
If someone has a working example (it doesn't matter what it does; a simple working "Hello World" would make me happy!), I would be very glad.
Happy new year!

Unfortunately, the checkpoint/restart code in MPICH 3.0.4 is known to be buggy at the moment. That will hopefully be fixed in a future release. It looks like you're probably using it correctly. If you go back to a previous version, you might have better luck.

Here the problem was that the checkpoint interval was too small.
Setting it to 20 seconds or more solved this problem (but not the other one :( ).
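For reference, with a longer interval the checkpoint/restart cycle might look like the sketch below. This is based on the Hydra checkpointing options; the checkpoint number passed to -ckpoint-num and the prefix path are placeholders that depend on which checkpoint files actually exist on your system:
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ckpts -ckpoint-interval 30 -n 1 ./test3
# later, to restart from checkpoint number 5 under the same prefix
# (no application name is passed; Hydra restores the saved image):
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ckpts -ckpoint-num 5 -n 1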

Related

OpenMPI "nooversubscribe" does not work when using MPI_Comm_spawn

I have a simple MPI_Comm_spawn-based code that spawns 4 processes:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    MPI_Comm intercomm;
    int final_nranks = 4, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Get_processor_name(name, &len);
    MPI_Comm_get_parent(&intercomm);

    if (intercomm == MPI_COMM_NULL) {
        /* No parent communicator: this is an original process, so spawn. */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "hostfile", "hostfile");
        MPI_Info_set(info, "pernode", "true");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, final_nranks, info, 0,
                       MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        printf("PARENT %s\n", name);
    } else {
        printf("CHILD %s\n", name);
    }

    MPI_Finalize();
    return 0;
}
When I run the code with
$ mpirun -np 2 --hostfile hostfile --map-by node ./a.out
I would expect something like:
PARENT node00
PARENT node01
CHILD node00
CHILD node01
CHILD node02
CHILD node03
I want each process to run on a different node.
Since OpenMPI does not oversubscribe slots by default, I would expect the child processes to be spawned on different nodes. But that is not the case. Instead, I get:
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[node00:50113] *** An error occurred in MPI_Init
[node00:50113] *** reported by process [3761700866,2]
[node00:50113] *** on a NULL communicator
[node00:50113] *** Unknown error
[node00:50113] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node00:50113] *** and potentially your MPI job)
[node00:200682] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[node00:200682] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[node00:200682] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
If I remove MPI_Info_set(info, "pernode", "true"); from the code, the run does not crash, but the child processes are all spawned on the same host:
PARENT node00
PARENT node01
CHILD node00
CHILD node00
CHILD node00
CHILD node00
OpenMPI version 4.1.2

OpenMPI MPI_Comm_spawn() giving unreachable errors

I've got a master and worker system, where the master uses OpenMPI to spawn and communicate with its workers. I've used versions 4.0.4 and 3.1.6; both give similar errors.
master.cxx
#include <mpi.h>

void doStuff() {
    // does stuff
}

int main(int argc, char *argv[]) {
    int NUM_JOBS = 10;
    MPI_Init(&argc, &argv);

    char *args[] = {"Arg1", "Arg2", NULL};
    MPI_Info mpi_info;
    MPI_Info_create(&mpi_info);
    MPI_Info_set(mpi_info, "add-hostfile", "nodefile");

    MPI_Comm child_comm;
    MPI_Comm_spawn("worker", args, NUM_JOBS, mpi_info, 0, MPI_COMM_SELF,
                   &child_comm, MPI_ERRCODES_IGNORE);
    doStuff();

    MPI_Finalize();
    return 0;
}
worker.cxx
#include <mpi.h>

void doStuff() {
    // does stuff
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    doStuff();
    MPI_Finalize();
    return 0;
}
Both are compiled via cmake. I have 101 cores available to me, for 1 master and up to 100 workers, with 24 cores per node. For the first run, I'm just running workers to prove that the setup works. I had to add some InfiniBand parameters:
run1.sh
export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx4_0:1"
mpirun -n 101 --mca btl openib,self --hostfile nodefile worker
Now, I try to run the workers using the master. Note that the code as written above spawns 10 workers, so the 11 total procs can fit on one 24-core node.
run2.sh
export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx4_0:1"
mpirun -n 1 --mca btl openib,self --hostfile nodefile master
That works fine. So I raise NUM_JOBS to 100 in master.cxx and repeat the run using run2.sh.
This time, I get an MPI error and the job fails:
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[53646,2],88]) is on host: mymachine04
Process 2 ([[53636,1],0]) is on host: unnknown1
BTLs attempted: self openib
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[mymachine04:22480] [[53646,2],88] ORTE_ERROR_LOG: Unreachable in file dpm/dpm.c at line 493
That last line is repeated 77 times in total. Presumably not coincidentally, 101 procs - 24 procs per node = 77. The hostfile is just 101 lines of hostnames, and I get the same error whether I supply it or not. Interestingly, if I remove the first line of the hostfile before spawning my workers, on the grounds that that slot is already used by the master, I get an error about running out of slots. Any idea what I'm doing wrong, and how I can get all of my processes to reach each other?

How can I run multiple threads inside of a given MPI process?

I understand that a single MPI job launches many processes, which can run on multiple nodes.
How do I run multiple threads inside a given MPI process using MPI_THREAD_MULTIPLE?
I was unable to find enough information on the topic.
Assuming you're using OpenMP to run multiple threads:
You write the OpenMP code as you would without MPI. (This statement is oversimplified.)
When MPI comes in, you need to consider how your processes will communicate. MPI does not send messages to individual threads, only to individual processes. For that reason, MPI provides four levels of thread support:
MPI_THREAD_SINGLE: Provides only one thread.
MPI_THREAD_FUNNELED: Can provide many threads, but only the master thread can make MPI calls. The master thread is the one that calls MPI_Init...
MPI_THREAD_SERIALIZED: Can provide many threads, but only one can make MPI calls at a time.
MPI_THREAD_MULTIPLE: Can provide many threads, and all of them can make MPI calls at any time.
You need to request the level you want at initialization, which becomes:
MPI_Init_thread(&argc, &argv, HERE_PUT_THE_MODE_YOU_NEED, PROVIDED_MODE)
Ex:
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided)
In the provided argument, MPI_Init_thread returns the thread level that was actually granted. Make sure you got a level that your code can cope with.
Also, avoid using MPI_Probe and MPI_Iprobe, because they are not thread-safe. Use MPI_Mprobe and MPI_Improbe instead, as in the sketch below.
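Here is a minimal sketch of the matched-probe pattern (the tag, payload, and rank layout are illustrative assumptions, not taken from any code above); run it with at least two processes:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int provided, rank, value = 0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Message msg;
        MPI_Status status;
        /* MPI_Mprobe matches AND removes the message from the queue in one
           step, so no other thread can steal it between the probe and the
           receive -- the race that makes MPI_Probe unsafe with threads. */
        MPI_Mprobe(0, 0, MPI_COMM_WORLD, &msg, &status);
        MPI_Mrecv(&value, 1, MPI_INT, &msg, &status);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}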
Here is a simple 'hello world' example as @ab2050 asked:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int provided;
    int rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided != MPI_THREAD_FUNNELED) {
        fprintf(stderr, "Warning MPI did not provide MPI_THREAD_FUNNELED\n");
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel default(none), \
        shared(rank), \
        shared(ompi_mpi_comm_world), \
        shared(ompi_mpi_int), \
        shared(ompi_mpi_char)
    {
        printf("Hello from thread %d at rank %d parallel region\n",
               omp_get_thread_num(), rank);

        /* Only the master thread makes MPI calls (MPI_THREAD_FUNNELED). */
        #pragma omp master
        {
            char helloWorld[12];
            if (rank == 0) {
                strcpy(helloWorld, "Hello World");
                MPI_Send(helloWorld, 12, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                printf("Rank %d send: %s\n", rank, helloWorld);
            }
            else {
                MPI_Recv(helloWorld, 12, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("Rank %d received: %s\n", rank, helloWorld);
            }
        }
    }

    MPI_Finalize();
    return 0;
}
You have to run this code with two processes. Because MPI_THREAD_FUNNELED is selected, only the master thread makes MPI calls (a compile-and-run sketch follows the list below).
The following variables are specified in the OpenMP data-scoping clauses because gcc version 6.1.1 requires it. Older versions such as 4.8 do not require declaring them:
ompi_mpi_comm_world
ompi_mpi_int
ompi_mpi_char
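For completeness, a typical compile-and-run line for this hybrid example might be the following (the source and binary names are assumptions):
mpicc -fopenmp hello_funneled.c -o hello_funneled
OMP_NUM_THREADS=4 mpiexec -n 2 ./hello_funneled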

MPI: Why does my MPICH program fail for a large number of processes?

/* C Example */
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    int rank, size;
    int buffer_length = MPI_MAX_PROCESSOR_NAME;
    char hostname[buffer_length];

    MPI_Init(&argc, &argv);                           /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);             /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);             /* get number of processes */
    MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */

    printf("Hello world from process %d running on %s of %d\n",
           rank, hostname, size);

    MPI_Finalize();
    return 0;
}
The above program compiles and runs successfully on Ubuntu 12.04 for a small number of processes, but it fails when I try to execute it with thousands of processes. Why is that?
I am expecting the scheduler to keep the processes in a queue and dispatch them one by one (I am running this code on a single-core machine).
Why does the following error occur for a large number of processes, and how can I resolve this issue?
root@ubuntu:/home# mpiexec -n 1000 ./hello
[proxy:0:0@ubuntu] HYDU_create_process (./utils/launch/launch.c:26): pipe error (Too many open files)
[proxy:0:0@ubuntu] launch_procs (./pm/pmiserv/pmip_cb.c:751): create process returned error
[proxy:0:0@ubuntu] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:935): launch_procs returned error
[proxy:0:0@ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
Killed
You are running into the open-file limit on your system. The default on Ubuntu is 1024. You can try raising the limit for your session with the ulimit command:
ulimit -n 2048
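To see where you stand before raising anything (the soft limit is what the error reflects; the hard limit caps what a non-root user may set, and a permanent change usually goes through a nofile entry in /etc/security/limits.conf):
ulimit -Sn   # current soft limit on open files
ulimit -Hn   # hard limit; the soft limit cannot exceed this without root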

How does a zombie process manifest itself?

kill -s SIGCHLD
The above is the command for killing a zombie process. But my question is:
Is there any way by which a zombie process manifests itself?
steenhulthin is correct, but until it's moved someone may as well answer it here. A zombie process exists between the time that a child process terminates and the time that the parent calls one of the wait() functions to get its exit status.
A simple example:
/* Simple example that creates a zombie process. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t cpid;
    char s[4];
    int status;

    cpid = fork();
    if (cpid == -1) {
        puts("Whoops, no child process, bye.");
        return 1;
    }
    if (cpid == 0) {
        /* Child: exits immediately and stays a zombie until reaped. */
        puts("Child process says 'goodbye cruel world.'");
        return 0;
    }

    /* Parent: delay reaping so the zombie is visible in ps. */
    puts("Parent process now cruelly lets its child exist as\n"
         "a zombie until the user presses enter.\n"
         "Run 'ps aux | grep mkzombie' in another window to\n"
         "see the zombie.");
    fgets(s, sizeof(s), stdin);
    wait(&status);
    return 0;
}
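To try it (the mkzombie name is only an assumption, chosen to match the grep in the message the program prints):
gcc -o mkzombie mkzombie.c
./mkzombie
While the program waits for enter, the child shows up in ps with state Z and the <defunct> marker; once wait() returns, it disappears.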
