I have an assignment to implement simple fault-tolerance in an OpenMPI application. The problem we are having is that, despite setting the MPI error handling to MPI_ERRORS_RETURN, when one of our nodes is unplugged from the cluster we get the following error on the next MPI_ call after a lengthy hang:
[btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection timed out (110)
My takeaway is that, with Open MPI, it is not possible to continue processing on the other nodes when one node drops from the network. Can anyone confirm this, or point me in a direction for preventing the btl_tcp_endpoint error?
We are using OpenMPI version 1.6.5.
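For reference, the error-handler setup is just the standard MPI-2 call (a minimal sketch, not our actual code):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Hand error codes back to the caller instead of aborting the job
           (the default handler on MPI_COMM_WORLD is MPI_ERRORS_ARE_FATAL). */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        /* ... application communication; every MPI call now returns a code
           we check instead of relying on the default abort-on-error ... */

        MPI_Finalize();
        return 0;
    }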
The MPI_ERRORS_RETURN code paths are not well tested (and probably not well implemented) in Open MPI. They simply haven't been a priority, so we've never really done much work in this area.
Sorry.
I am trying to do an assignment (from another university's course page) which has these lines in the starter code (Python with Mininet):
os.system("rmmod tcp_probe; modprobe tcp_probe full=1")
Popen("cat /proc/net/tcpprobe > %s" % (outfile), shell=True)
This gives an error saying that tcp_probe has been disabled.
I found out by googling that tcp_probe has been deprecated in the Linux kernel. However, the assignment just asks me to 'do the same using ftrace'. I have tried searching online but could not find out how to achieve the same thing with ftrace.
Any help is appreciated.
tl;dr:
Unfortunately, I could not find any way to get TCP tracepoints to work in Mininet, which is what ftrace uses. The reason is that Mininet's /sys/kernel/debug directory is empty, i.e., tracing cannot be enabled.
Options:
1. Using mininet-tracing (not recommended)
There probably is a way to get the kernel to include this, or you could use https://github.com/mininet/mininet-tracing, which might get you what you need, but I have seen reports that it is slow, and it was last updated 9 years ago...
2. Writing a new kernel module (I have tested this and it works)
The solution I found instead was to force logging for the TCP variant I had in mind and then look at the results that way. To enable this, you essentially need to extend some of TCP's behaviour, (quite possibly) reuse the TCP module you have in mind, and build it as a new kernel module.
Here I have provided an example that you can use. It logs socket information on each ACK. I also included a Makefile and a script to load/unload the kernel module. After you enable the module and let some traffic flow (assuming you are on a Debian-based Linux), you should be able to find the TCP logs in /var/log/kern.log.
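The full module, Makefile and load/unload script are not reproduced here, but to give an idea of the shape of such a module, here is a minimal sketch of a congestion-control module that logs on every ACK. The name ack_logger is made up, and the callback signatures (pkts_acked taking a struct ack_sample, tcp_reno_undo_cwnd being exported) assume a reasonably recent kernel, so expect to adapt it to your kernel version:

    #include <linux/module.h>
    #include <net/tcp.h>

    /* Log a line on every ACK; otherwise behave exactly like Reno. */
    static void ack_logger_pkts_acked(struct sock *sk,
                                      const struct ack_sample *sample)
    {
        const struct tcp_sock *tp = tcp_sk(sk);

        pr_info("ack_logger: cwnd=%u rtt_us=%d acked=%u\n",
                tp->snd_cwnd, sample->rtt_us, sample->pkts_acked);
    }

    static struct tcp_congestion_ops ack_logger __read_mostly = {
        .name       = "ack_logger",
        .owner      = THIS_MODULE,
        .ssthresh   = tcp_reno_ssthresh,
        .cong_avoid = tcp_reno_cong_avoid,
        .undo_cwnd  = tcp_reno_undo_cwnd,
        .pkts_acked = ack_logger_pkts_acked,
    };

    static int __init ack_logger_init(void)
    {
        return tcp_register_congestion_control(&ack_logger);
    }

    static void __exit ack_logger_exit(void)
    {
        tcp_unregister_congestion_control(&ack_logger);
    }

    module_init(ack_logger_init);
    module_exit(ack_logger_exit);
    MODULE_LICENSE("GPL");

After building and loading the module you still have to select it (e.g. set net.ipv4.tcp_congestion_control to ack_logger) before the pkts_acked hook fires for new connections; the pr_info lines then show up in /var/log/kern.log as described above.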
Note:
This is a hacky way around the issue, but was good enough for my needs, and hopefully can help someone else too.
I use MPICH2. When I launch processes with mpiexec, the failure of one process will crash all other processes. How to avoid this?
In MPICH, there is a flag called -disable-auto-cleanup which will prevent the process manager from automatically cleaning up all processes when a single process fails.
However, MPI itself does not have much support for fault tolerance and this is something that the Fault Tolerance Working Group is working on adding in a future version of the MPI Standard.
For now, the best you can do is change the default MPI error handler away from MPI_ERRORS_ARE_FATAL, which causes all processes to abort, to something else like MPI_ERRORS_RETURN, which returns the error code to the application and allows it to do something else. However, you're not likely to be able to communicate anymore after a failure has occurred, especially if you are trying to use collective communication.
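As an illustration of what checking the returned code can look like (a sketch only; the ring exchange and the recovery comment are placeholders, not a complete fault-tolerance scheme):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Placeholder exchange with the neighbouring ranks; the point is
           only that the return code is inspected rather than ignored. */
        int out = rank, in = -1;
        int rc = MPI_Sendrecv(&out, 1, MPI_INT, (rank + 1) % size, 0,
                              &in, 1, MPI_INT, (rank + size - 1) % size, 0,
                              MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len = 0;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: communication failed: %s\n", rank, msg);
            /* Recovery is application-specific; avoid further collectives
               on a communicator that has experienced a failure. */
        }

        MPI_Finalize();
        return 0;
    }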
A program is using a slow, unstable network: frequent timeouts, slow connections, etc.
The program uses a few REST APIs and even SSH. The previous developer solved timeout problems by checking for the error message and running the same instruction again until it worked. However, sometimes the network connection simply goes dark for a few hours and we have to wait.
We cannot always keep the program alive and waiting (to save power), so we end up serializing its state and having another program check for network activity, wake it up, and resume from the serialized state.
The code quality is becoming more and more hackish due to these workarounds.
Are there any best practices or solutions when it comes to writing programs for unstable networks? I'm wondering if someone has already solved that problem through a library, or if there is a book you could recommend.
Thank you
PS: I have no control over the network infrastructure.
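To make the kind of workaround concrete, a bounded retry with backoff around each network step would look roughly like this (the operation, limits and names are hypothetical, not our actual code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Retry 'op' (which returns 0 on success, nonzero on failure) with
       exponential backoff plus jitter, capped at max_delay_s seconds.
       Returns 0 on success, -1 if every attempt failed. */
    int retry_with_backoff(int (*op)(void), int max_attempts, int max_delay_s)
    {
        int delay_s = 1;
        for (int attempt = 1; attempt <= max_attempts; attempt++) {
            if (op() == 0)
                return 0;
            int jitter = rand() % (delay_s + 1);   /* spread out retries */
            fprintf(stderr, "attempt %d failed, sleeping %d s\n",
                    attempt, delay_s + jitter);
            sleep((unsigned int)(delay_s + jitter));
            delay_s *= 2;
            if (delay_s > max_delay_s)
                delay_s = max_delay_s;
        }
        return -1; /* caller can now serialize state and shut down */
    }

A wrapper like this would be called around each REST/SSH step (e.g. retry_with_backoff(fetch_status, 8, 300), where fetch_status is a hypothetical operation), and when it returns -1 the existing serialize-and-sleep path takes over.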
Which signal does gdb send when attaching to a process? Does this work the same on different UNIXes, e.g. Linux and Mac OS X?
So far I have only found out that SIGTRAP is used to implement breakpoints. Is it used for attaching as well?
AFAIK gdb does not need to send any signal itself to attach. It just suspends the "inferior" by calling ptrace (on Linux, PTRACE_ATTACH, which has the kernel stop the target with a SIGSTOP). It also reads the debugged process's memory and registers using these calls, and it can request single-stepping of instructions (provided it's implemented on that port of Linux), etc.
Software breakpoints are implemented by placing an instruction that triggers a trap (or something similar) at the right location; the debugged process can run at full speed until it is reached.
Also (besides reading man ptrace, as already mentioned) see the ptrace explanation on Wikipedia.
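If you want to see the mechanism without gdb, a minimal attach/read/detach on Linux looks roughly like this (sketch only; the address being read is just a placeholder):

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[1]);

        /* Attach: the kernel stops the target (it is sent a SIGSTOP as part
           of PTRACE_ATTACH); wait for the stop to be reported. */
        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
            perror("PTRACE_ATTACH");
            return 1;
        }
        int status;
        waitpid(pid, &status, 0);

        /* Read one word of the target's memory (address 0 here only as a
           placeholder; a real debugger would use addresses taken from the
           binary's symbol/debug information). */
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, NULL, NULL);
        if (errno != 0)
            perror("PTRACE_PEEKDATA");
        else
            printf("word at address 0: 0x%lx\n", word);

        /* Let the target run again. */
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return 0;
    }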
I would like to know if there is a way for an MPI process to send a kill signal to another MPI process.
Or, put differently, is there a way to exit from an MPI environment gracefully while one of the processes is still active? (i.e. MPI_Abort() prints an error message.)
Thanks
No, this is not possible within an MPI application using the MPI library.
Individual processes would not be aware of the location of the other processes, nor of their process IDs, and there is nothing in the MPI spec to perform the kind of kill you want.
If you were to do this manually, you'd need to use MPI_Alltoall to exchange process IDs and hostnames across the system, and then you would need to spawn ssh/rsh to reach the required node when you wanted to kill something. All in all, it's neither portable nor clean.
MPI_Abort is the right way to do what you are trying to achieve. From the Open MPI manual:
"This routine makes a "best attempt" to abort all tasks in the group of comm." (i.e. MPI_Abort(MPI_COMM_WORLD, -1) is what you need.)
Any output during MPI_Abort would be machine specific - so you may, or may not, receive the error message you mention.
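For completeness, the pattern looks like this (the failure condition is a placeholder):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Placeholder for whatever condition means "we must stop everyone". */
        int fatal_condition = 0;

        if (fatal_condition) {
            fprintf(stderr, "rank %d: aborting the whole job\n", rank);
            /* Best-effort request to kill every task in MPI_COMM_WORLD. */
            MPI_Abort(MPI_COMM_WORLD, -1);
        }

        MPI_Finalize();
        return 0;
    }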