Debugging Thread blocked indefinitely in an MVar operation - ghc

A program compiled with GHC 8.2.1 ends with the following error message.
Foo: thread blocked indefinitely in an MVar operation
Given that it's reproducible locally, is there any information available at runtime to narrow down the source of the problem?

Related

MPICH2, the failure of one process will crash all other processes

I use MPICH2. When I launch processes with mpiexec, the failure of one process crashes all the other processes. How can I avoid this?
In MPICH, there is a flag called -disable-auto-cleanup which will prevent the process manager from automatically cleaning up all processes when a single process fails.
However, MPI itself does not have much support for fault tolerance and this is something that the Fault Tolerance Working Group is working on adding in a future version of the MPI Standard.
For now, the best you can do is change the default MPI Error Handler away from MPI_ERRORS_ARE_FATAL, which causes all processes to abort, to something else like MPI_ERRORS_RETURN which would return the error code to the application and allow it to do something else. However, you're not likely to be able to communicate anymore after a failure has occurred, especially if you are trying to use collective communication.
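As a minimal sketch of that suggestion (assuming an MPI-2 level implementation such as MPICH or Open MPI; the out-of-range destination rank below is only a stand-in for a real failure), you can switch MPI_COMM_WORLD away from MPI_ERRORS_ARE_FATAL and inspect the returned error code yourself:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Replace the default MPI_ERRORS_ARE_FATAL handler on MPI_COMM_WORLD
     * so that failing calls return an error code instead of aborting
     * every process in the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Deliberately send to an out-of-range rank: with MPI_ERRORS_RETURN the
     * call comes back with an error code the application can inspect. */
    int buf = 42;
    int err = MPI_Send(&buf, 1, MPI_INT, size /* invalid rank */, 0,
                       MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "rank %d: MPI_Send failed: %s\n", rank, msg);
        /* Decide locally how to recover; as noted above, further
         * communication (especially collectives) may no longer be
         * possible after a real process failure. */
    }

    MPI_Finalize();
    return 0;
}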

When can a running process in Unix receive a SIGTRAP (value 5) signal?

There is a process called NSM in my code which abruptly received a SIGTRAP signal and was killed. So I just wanted to know: when can a process get a SIGTRAP signal?
As said in the comments, it is a signal, so it can be triggered at any time. SIGTRAP is normally handled by a debugger; in the absence of a debugger it is quite natural for the process to be terminated. If you are using static libraries in your project, you may not be linking them appropriately. Without further information in your question, I suggest you check how you link against your libraries.
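As a small illustration (not tied to the NSM process above), SIGTRAP is typically generated by a breakpoint or trace trap, or raised explicitly by the program itself; without a debugger attached its default action is to terminate the process, but a handler can be installed to observe it:

#include <signal.h>
#include <stdio.h>

static void on_trap(int sig)
{
    /* In real code keep handlers async-signal-safe; printf here is only
     * for the demonstration. */
    printf("caught signal %d (SIGTRAP)\n", sig);
}

int main(void)
{
    /* Without this handler (and without a debugger attached), raising
     * SIGTRAP terminates the process, which matches the behaviour the
     * question describes. */
    signal(SIGTRAP, on_trap);

    raise(SIGTRAP);   /* explicit trap, as a breakpoint instruction would cause */
    printf("still running after SIGTRAP\n");
    return 0;
}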

SIGKILL init process (PID 1)

I'm facing a weird issue regarding sending signal 9 (SIGKILL) to the init process (PID 1).
As you may know, SIGKILL can't be ignored via signal handlers. When I tried sending SIGKILL to init, I noticed that nothing was happening; init would not get terminated. Trying to figure out this behaviour, I decided to attach myself to the init process with strace to see more clearly what was happening. Now comes the weird part. If I'm "looking" at the init process with strace and send it SIGKILL, the system crashes.
My question is why is this happening? Why does the system crash when I look at the process and why does it not crash when I'm not? As I said, in both cases I send SIGKILL to init. Tested on CentOS 6.5, Debian 7 and Arch.
Thanks!
The Linux kernel deliberately forces a system crash if init terminates (see http://lxr.free-electrons.com/source/kernel/exit.c?v=3.12#L501 and particularly the call to panic therein). Therefore, as a safeguard, the kernel will not deliver any fatal signal to init, and SIGKILL is not excepted (see http://lxr.free-electrons.com/ident?v=3.12&i=SIGNAL_UNKILLABLE) (however, the code flow is convoluted enough that I'm not sure, but I suspect a kernel-generated SIGSEGV or similar would go through).
Applying ptrace(2) (the system call that strace uses) to process 1 apparently disables this protection. This could be said to be a bug in the kernel. I am insufficiently skilled at digging around in the code to find this bug.
I do not know if other Unix variants apply the same crash-on-exit semantics or signal protection to init. It would be reasonable to have the OS perform a clean shutdown or reboot, rather than a panic, if init terminates (at least, if it does so by calling _exit) but as far as I know, all modern Unix variants have a dedicated system call to request this, instead (reboot(2)).
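A small sketch of the protection described above (purely illustrative; run as root on Linux): the kill(2) call itself succeeds, yet init keeps running because the kernel never delivers the fatal signal to it.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Sending SIGKILL to PID 1: as root this returns 0 ("success"), but
     * SIGNAL_UNKILLABLE means the signal is simply not delivered and init
     * survives. As an unprivileged user the call fails with EPERM instead. */
    if (kill(1, SIGKILL) == -1)
        fprintf(stderr, "kill(1, SIGKILL) failed: %s\n", strerror(errno));
    else
        printf("kill(1, SIGKILL) returned success; init is still running\n");
    return 0;
}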

program with mpirun doesn't operate

I tried the quantum chemistry program GAMESS with mpirun and it ran well yesterday. However, when I tried another calculation by the same procedure, it failed with the following messages. How can I fix it? I confirmed that there was no MPI process running, and I also cleared the caches.
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with
errorcode 911.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination".
This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
Something in the application called MPI_ABORT on rank 0. You'll have to look at the code to figure out why it was called, but my guess is that there was some bad input. I don't know much about GAMESS though. You might try asking the GAMESS people directly. They have a website (http://www.msg.ameslab.gov/gamess/) which includes a way to contact them.
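For reference, the message above is exactly what Open MPI prints when some rank calls MPI_Abort. A minimal sketch (using the hypothetical error code 911, mirroring the output quoted above) that reproduces it looks roughly like this:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Applications such as GAMESS typically call MPI_Abort on bad input.
     * Rank 0 aborting here makes mpirun kill every other rank and print
     * the "MPI_ABORT was invoked on rank 0 ..." notice shown above. */
    if (rank == 0)
        MPI_Abort(MPI_COMM_WORLD, 911);

    MPI_Finalize();
    return 0;
}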

Process stops getting network data

We have a process (written in C++/managed) which receives network data via TCP/IP.
After running the process for a while while tracking network load, it seems that the network gets into a frozen state and the process stops receiving data; other processes in the system that use the network (same NIC) operate normally.
The process gets out of this frozen state by itself after several minutes.
Any idea what is happening?
Is there any counter I can track to see if my process has reached some limit?
It is going to be very difficult to answer specifically:
-- without knowing what exactly your process/application is about,
-- whether it is a network chat application, or a file server/client, or ......
-- without other details about your process, how it is implemented, and what libraries it uses, if relevant to the problem.
Also, you haven't mentioned what OS and environment you are running this process under, so there is very little anyone can do to help. It could be anything: a busy-wait loop in your code, locking problems if it is multi-threaded code, ...
Nonetheless, here are some options to check. If it's Linux, try the commands below to debug and monitor the behaviour of the process and see what the problem could be:
top
Check top to see how much CPU and memory your process is using and whether any value, such as CPU usage, is abnormally high.
pstack
This shows the stack frames the process is executing at the time of the problem.
netstat
Run this with the necessary options (tcp/udp) to check the state of the network sockets opened by your process.
gcore -s -c
This forces your process to dump core when the mentioned problem happens; you can then analyze that core file with gdb (a sketch of triggering a core dump from inside the process follows this list).
gdb
Then use the command where at the gdb prompt to get a full backtrace of the process (the function it was executing last and the previous function calls).
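As a complement to gcore, here is a minimal sketch (assumptions: Linux/glibc, core dumps enabled via ulimit -c, and a hypothetical SIGUSR1 trigger, none of which come from the question) of making the process dump core on demand from the inside, so its state can be captured the moment the freeze is observed. Unlike gcore, this terminates the process.

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void dump_core(int sig)
{
    (void)sig;
    /* Restore the default action for SIGABRT and abort: the kernel writes
     * a core file (provided "ulimit -c" is not 0), which can then be
     * opened with gdb and inspected with "where". */
    signal(SIGABRT, SIG_DFL);
    abort();
}

int main(void)
{
    /* Trigger from another terminal with:  kill -USR1 <pid>  */
    signal(SIGUSR1, dump_core);

    /* ... rest of the application ... */
    for (;;)
        pause();   /* placeholder main loop for the sketch */
}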

Resources