I use MPICH2. When I launch processes with mpiexec, the failure of one process crashes all the other processes. How can I avoid this?
In MPICH, there is a flag called -disable-auto-cleanup which will prevent the process manager from automatically cleaning up all processes when a single process fails.
However, MPI itself does not have much support for fault tolerance and this is something that the Fault Tolerance Working Group is working on adding in a future version of the MPI Standard.
For now, the best you can do is change the default MPI error handler away from MPI_ERRORS_ARE_FATAL, which causes all processes to abort, to something else like MPI_ERRORS_RETURN, which returns the error code to the application and lets it react. However, you're not likely to be able to communicate anymore after a failure has occurred, especially if you are trying to use collective communication.
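To make that concrete, here is a minimal sketch of switching MPI_COMM_WORLD over to MPI_ERRORS_RETURN and checking return codes; the ring-style exchange is my own invention, and how much communication still works after a rank dies is implementation dependent:

    /* Sketch: replace the default MPI_ERRORS_ARE_FATAL handler so that a
     * communication failure surfaces as an error code instead of aborting
     * every rank. Post-failure behaviour is implementation dependent. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int value = rank;
        /* Simple ring exchange; if a neighbour has died, this call may now
         * return an error code rather than killing the whole job. */
        int err = MPI_Sendrecv_replace(&value, 1, MPI_INT,
                                       (rank + 1) % size, 0,
                                       (rank - 1 + size) % size, 0,
                                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (err != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(err, msg, &len);
            fprintf(stderr, "rank %d: communication failed: %s\n", rank, msg);
            /* Application-specific recovery or checkpointing would go here. */
        }

        MPI_Finalize();
        return 0;
    }

Launched as something like mpiexec -disable-auto-cleanup -n 4 ./a.out, a failed neighbour may then show up as an error return (or, depending on the implementation, a hang) rather than an automatic job-wide abort.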
Related
I am trying to modify an MPI-based system to add fault tolerance (processing should continue even if machines go down).
I was thinking of using Apache Zookeeper to handle the machine-failure case. Is that the best way to proceed? Also, what happens to the MPI calls (like send, receive, broadcast) when using Zookeeper? Send/Recv calls in MPI are typically bound to a machine id (source/destination); in an environment where machines fail and may never come back, how would that work?
What performance drop should I expect from porting the existing application from MPI to a Zookeeper-based solution?
I'm facing a weird issue regarding sending signal 9 (SIGKILL) to the init process (PID 1).
As you may know, SIGKILL can't be ignored via signal handlers. When I tried sending SIGKILL to init, I noticed that nothing was happening; init would not get terminated. Trying to figure out this behaviour, I decided to attach myself to the init process with strace to see more clearly what was happening. Now comes the weird part. If I'm "looking" at the init process with strace and send it SIGKILL, the system crashes.
My question is why is this happening? Why does the system crash when I look at the process and why does it not crash when I'm not? As I said, in both cases I send SIGKILL to init. Tested on CentOS 6.5, Debian 7 and Arch.
Thanks!
The Linux kernel deliberately forces a system crash if init terminates (see http://lxr.free-electrons.com/source/kernel/exit.c?v=3.12#L501 and particularly the call to panic therein). Therefore, as a safeguard, the kernel will not deliver any fatal signal to init, and SIGKILL is no exception (see http://lxr.free-electrons.com/ident?v=3.12&i=SIGNAL_UNKILLABLE). (The code flow is convoluted enough that I'm not certain, but I suspect a kernel-generated SIGSEGV or similar would still go through.)
Applying ptrace(2) (the system call that strace uses) to process 1 apparently disables this protection. This could be said to be a bug in the kernel. I am insufficiently skilled at digging around in the code to find this bug.
I do not know if other Unix variants apply the same crash-on-exit semantics or signal protection to init. It would be reasonable to have the OS perform a clean shutdown or reboot, rather than a panic, if init terminates (at least, if it does so by calling _exit) but as far as I know, all modern Unix variants have a dedicated system call to request this, instead (reboot(2)).
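If you want to reproduce the "unkillable init" half of this without strace in the picture, a small sketch like the following (run as root; PID 1 is hard-coded) should show kill(2) reporting success while init keeps running:

    /* Sketch: send SIGKILL to init (PID 1) and show that the call succeeds
     * while init keeps running, because the kernel refuses to deliver fatal
     * signals to it. Run as root; otherwise kill() fails with EPERM. */
    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        if (kill(1, SIGKILL) == -1) {
            fprintf(stderr, "kill(1, SIGKILL) failed: %s\n", strerror(errno));
            return 1;
        }
        printf("kill(1, SIGKILL) returned 0\n");

        sleep(1);

        /* kill() with signal 0 only checks that the target still exists. */
        if (kill(1, 0) == 0)
            printf("PID 1 is still alive\n");

        return 0;
    }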
Just getting into Meteor, which by many accounts seems like a great project. One potential issue (which it may not be) is that there doesn't seem to be a meteor stop command or another programmatic way to shut down Meteor gracefully. Please let me know if I am wrong about this!
Are there potential concerns about maintaining database integrity (for example) if we interrupt the process using CTRL-C or shut it down via an Activity Monitor? And are there steps we can take to reduce or eliminate such issues?
Caveat: I recognize the above questions are somewhat vague, and I understand that this is usually considered harmful on Stack, but I hope they are still answerable ones.
Thanks,
It does look like there is a cleanup which takes place before the process is terminated (https://github.com/meteor/meteor/blob/master/tools/cleanup.js).
The first signal sent is SIGINT, which is a polite way to ask the process to shut down (and gives it time to finish its last running thread).
As for database integrity, the mongod process also tries to clean itself up before it shuts down, and it has a recovery mechanism (from the journal files) for a quick recovery on restart if it is forced to shut down.
That being said, in the middle of a longer-running thread I'm not too sure whether it's allowed to finish or is killed immediately. But Meteor does attempt to give it a chance at a graceful termination first, and then escalates to a SIGHUP and finally a SIGTERM (which is still a graceful termination signal). At no point does Meteor force or send a SIGKILL or SIGSTOP.
So Meteor apps should be safe from Ctrl+C termination. With Activity Monitor termination, it depends on what type of signal it's sent (i.e. Force Quit or just Quit).
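As a generic illustration of that distinction (this is not Meteor's actual shutdown code, which is the Node cleanup linked above), here is a sketch of a process that catches SIGINT and SIGTERM so it can finish its work, whereas SIGKILL can never be intercepted:

    /* Generic sketch: catch SIGINT and SIGTERM so the process can flush
     * state before exiting. SIGKILL cannot be caught, blocked, or ignored,
     * which is why a forced kill gives no chance to clean up. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t shutting_down = 0;

    static void on_signal(int signo)
    {
        (void)signo;
        shutting_down = 1;              /* async-signal-safe: just set a flag */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_signal;
        sigaction(SIGINT, &sa, NULL);   /* Ctrl+C */
        sigaction(SIGTERM, &sa, NULL);  /* polite kill */
        /* sigaction(SIGKILL, ...) would fail with EINVAL: it is uncatchable. */

        while (!shutting_down) {
            /* ... do the real work here ... */
            sleep(1);
        }

        puts("flushing state and exiting cleanly");
        return 0;
    }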
So, to add some closure to this: if your MongoDB is externally managed (i.e. on a production deployment server), Meteor doesn't stop it, as mongo-runner.js notes:
// Since it is externally managed, asking it to actually stop would be
// impolite, so our stoppable handle is a noop
if (process.env.MONGO_URL) {
    launch_callback();
    return handle;
}
I would like to know if there is a way for an MPI process to send a kill signal to another MPI process.
Or, put differently, is there a way to exit from an MPI environment gracefully while one of the processes is still active? (MPI_Abort() prints an error message.)
Thanks
No, this is not possible within an MPI application using the MPI library.
Individual processes are not aware of the location of the other processes, nor of their process IDs, and there is nothing in the MPI spec that provides the kill you are after.
If you were to do this manually, you'd need to use something like MPI_Alltoall to exchange process IDs and hostnames across the system, and then spawn ssh/rsh to reach the required node when you wanted to kill something. All in all, it's neither portable nor clean.
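For completeness, a rough sketch of that manual approach might look like the following; the use of MPI_Allgather, the raw-byte exchange and the ssh command line are my own simplifications for illustration:

    /* Rough, non-portable sketch of the manual approach: every rank publishes
     * its hostname and PID, after which any rank could shell out to ssh and
     * kill a specific peer. Illustration only, not a recommended design. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    typedef struct {
        char host[64];
        long pid;
    } peer_info;

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        peer_info me;
        memset(&me, 0, sizeof me);
        gethostname(me.host, sizeof me.host - 1);
        me.pid = (long)getpid();

        peer_info *all = malloc((size_t)size * sizeof *all);
        /* Shipping the struct as raw bytes keeps the sketch short; a real
         * implementation would define a proper MPI datatype. */
        MPI_Allgather(&me, (int)sizeof me, MPI_BYTE,
                      all, (int)sizeof me, MPI_BYTE, MPI_COMM_WORLD);

        if (rank == 0 && size > 1) {
            char cmd[256];
            snprintf(cmd, sizeof cmd, "ssh %s kill -9 %ld",
                     all[1].host, all[1].pid);
            printf("would run: %s\n", cmd);   /* e.g. system(cmd); */
        }

        free(all);
        MPI_Finalize();
        return 0;
    }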
MPI_Abort is the right way to do what you are trying to achieve. From the Open MPI manual:
"This routine makes a "best attempt" to abort all tasks in the group of comm." (ie. MPI_Abort(MPI_COMM_WORLD, -1) is what you need.
Any output during MPI_Abort would be machine-specific, so you may or may not receive the error message you mention.
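A minimal sketch of that usage (the rank-0 trigger and the -1 error code are arbitrary choices of mine):

    /* Sketch: one rank decides the whole job must stop and calls MPI_Abort,
     * which asks the runtime to make a best effort to kill every task
     * attached to MPI_COMM_WORLD. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            fprintf(stderr, "rank 0: fatal condition, aborting the job\n");
            MPI_Abort(MPI_COMM_WORLD, -1);   /* does not return on success */
        }

        /* Other ranks may briefly reach this point before being torn down;
         * any output around the abort is implementation specific. */
        MPI_Finalize();
        return 0;
    }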
The POSIX standard defines several routines for thread synchronization, based on concepts like mutexes and condition variables.
My question is: are these (e.g. pthread_cond_init(), pthread_mutex_init(), pthread_mutex_lock(), and so on) system calls or just library calls? I know they are included via "pthread.h", but do they ultimately result in a system call, and are they therefore implemented in the kernel of the operating system?
On Linux a pthread mutex makes a "futex" system call, but only if the lock is contended. That means that taking a lock no other thread wants is almost free.
In a similar way, sending a condition signal is only expensive when there is someone waiting for it.
So I believe that your answer is that pthread functions are library calls that sometimes result in a system call.
Whenever possible, the library avoids trapping into the kernel for performance reasons. If you already have some code that uses these calls you may want to take a look at the output from running your program with strace to better understand how often it is actually making system calls.
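If you want to see that for yourself, a small sketch like the one below (the thread count and iteration count are arbitrary), built with gcc -pthread and run under strace -f -e trace=futex, should show futex calls only once the lock is actually contended:

    /* Sketch for a strace experiment: two threads hammer the same mutex.
     * Compare a run of this against a version with a single worker thread,
     * where the lock is never contended and (on Linux with a futex-based
     * libpthread) no futex system calls should show up for the mutex. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);   /* library call; futex only if contended */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);
        return 0;
    }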
I never looked into all those library calls, but as far as I understand they all involve kernel operations, since they are supposed to provide synchronisation between processes and/or threads at a global level - I mean at the OS level.
For a mutex, for instance, the kernel needs to maintain a thread list: the threads that are currently sleeping, waiting for a locked mutex to be released. When the thread that currently owns that mutex releases it with pthread_mutex_unlock(), the kernel walks that list to find the highest-priority thread waiting for the release, marks it as the new owner in the mutex's kernel structure, and then hands the CPU over (a "context switch") to the new owner thread, which thereby returns from its POSIX library call to pthread_mutex_lock().
I especially see cooperation with the kernel as unavoidable when IPC between processes is involved (I am not talking about threads within a single process). Therefore I expect those library calls to invoke the kernel; a sketch of the process-shared case is below.
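To make the between-processes case concrete, here is a sketch (error handling omitted) of a mutex explicitly marked PTHREAD_PROCESS_SHARED and placed in shared memory, which is the situation where the kernel necessarily gets involved once the lock is contended:

    /* Sketch: a mutex shared across processes must live in shared memory and
     * be created with PTHREAD_PROCESS_SHARED. Error handling is omitted for
     * brevity; build with gcc -pthread. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Anonymous shared mapping, visible to parent and child after fork(). */
        pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &attr);

        if (fork() == 0) {               /* child */
            pthread_mutex_lock(m);
            printf("child holds the shared mutex\n");
            pthread_mutex_unlock(m);
            _exit(0);
        }

        pthread_mutex_lock(m);           /* parent */
        printf("parent holds the shared mutex\n");
        pthread_mutex_unlock(m);
        wait(NULL);
        return 0;
    }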
When you compile a program on Linux that uses pthreads, you have to add -lpthread to the compiler options. By doing this, you tell the linker to link against libpthread. So, on Linux, they are calls to a library.