How does a zombie process manifest itself? - unix

kill -s SIGCHLD
The above is the code for killing any zombie process, But my question is:
Is there any way by which a Zombie process manifest itself??

steenhulthin is correct, but until it's moved someone may as well answer it here. A zombie process exists between the time that a child process terminates and the time that the parent calls one of the wait() functions to get its exit status.
A simple example:
/* Simple example that creates a zombie process. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void)
{
pid_t cpid;
char s[4];
int status;
cpid = fork();
if (cpid == -1) {
puts("Whoops, no child process, bye.");
return 1;
}
if (cpid == 0) {
puts("Child process says 'goodbye cruel world.'");
return 0;
}
puts("Parent process now cruelly lets its child exist as\n"
"a zombie until the user presses enter.\n"
"Run 'ps aux | grep mkzombie' in another window to\n"
"see the zombie.");
fgets(s, sizeof(s), stdin);
wait(&status);
return 0;
}

Related

How to do a non-blocking read on a non-socket fd

Is there a way to do a single read() in non-blocking mode on a pipe/terminal/etc, the way I can do it on a socket with recv(MSG_DONTWAIT)?
The reason I need that is because I cannot find any guarantee that a read() on a file-descriptor returned as ready for reading by select() or poll() will not block.
I know can make the file descriptor non-blocking with fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) but this will change the mode on that file descriptor globally, not just in the calling thread/process. For example:
% perl -MFcntl=F_SETFL,F_GETFL,O_NONBLOCK -e 'fcntl STDIN, F_SETFL, fcntl(STDIN, F_GETFL, 0) | O_NONBLOCK; select undef, undef, undef, undef'
^Z # put it in the background
% cat
cat: -: Resource temporarily unavailable
This will also make the fd non blocking for both reading and writing, which may confuse the hell out of another process doing the opposite on the same fd, as in:
non_blocking_read | filter | blocking_write
One way I think of is to save the file status flags on starting up and SIGCONT, and restore them on exiting and on SIGTSTP (just the way it's done with the termios settings), but this is very limited, race-prone, and will leave a mess behind in the case where the program exited abnormally.
Putting a save/restore with fcntl() before/after each read() also feels ugly and dumb, and may have other issues too. The same with an ioctl(FIONREAD) just before the read (which I'm not even sure it will work reliably with any fd; assurances in that direction will be welcome, though).
I would be happy even with system specific (eg. linux or bsd-only) solutions.
For reference, here is a discussion about fixing it in linux; the idea didn't seem to get anywhere, though.
A Linux only solution would be to reopen the file descriptor via
"/dev/stdin"|"/dev/tty"|"/dev/fd/$fd".
C example:
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
int main()
{
int fd;
char buf[8];
int flags;
if(0>(fd=open("/dev/stdin", O_RDONLY))) return 1;
if(0>(flags = fcntl(fd,F_GETFL))) return 1;
if(0>(flags = fcntl(fd,F_SETFL,flags|O_NONBLOCK))) return 1;
sleep(3);
puts("reading");
ssize_t nr = read(fd, buf, sizeof(buf));
printf("read=%zd\n", nr);
return 0;
}
Unlike a duplicated file descriptor, a reopened filedescriptor will have independent file status flags.

How can I run multiple threads inside of a given MPI process?

I understand that a single MPI job Launches many processes which could be run on multiple nodes.
How do I run multiple threads inside of a given MPI process using MPI_THREAD_MULTIPLE?
I was unable to find enough information in relation to the topic.
Assuming your using OpenMP to run multiple threads
You will write the OpenMP code as you would do with out the MPI. (this statement is over simplified)
When the MPI comes you need to consider how your process will communicate. MPI is not sending messages to individual threads but individual process. For that reason MPI provides four modes of interaction with threads.
MPI_THREAD_SINGLE: Provides only one thread
MPI_THREAD_FUNNELED: Can provide many threads, but only the master thread can make MPI calls. The master thread is the one who call MPI_Init...
MPI_THREAD_SERIALIZED: Can provide many threads, but only one can make MPI calls at a time.
MPI_THREAD_MULTIPE: Can provide many threads, and all of them can make MPI call at any time.
You need to specify the mode you want at MPI_Init, which becomes:
MPI_Init_thread(&argc, &argv, HERE_PUT_THE_MODE_YOU_NEED, PROVIDED_MODE)
Ex:
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPE, &provided)
At the provided field the MPI_Init_thread returns the provided mode. Make sure that you got a mode that your code can cope with it.
Also, avoid the use of MPI_Probe and MPI_IProbe, because they are not thread save. You should use MPI_Mprobe and MPI_Improbe.
Here is a simple 'hello world' example as #ab2050 asked:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#include "mpi.h"
int main(int argc, char *argv[]) {
int provided;
int rank;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
if (provided != MPI_THREAD_FUNNELED) {
fprintf(stderr, "Warning MPI did not provide MPI_THREAD_FUNNELED\n");
}
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
#pragma omp parallel default(none), \
shared(rank), \
shared(ompi_mpi_comm_world), \
shared(ompi_mpi_int), \
shared(ompi_mpi_char)
{
printf("Hello from thread %d at rank %d parallel region\n",
omp_get_thread_num(), rank);
#pragma omp master
{
char helloWorld[12];
if (rank == 0) {
strcpy(helloWorld, "Hello World");
MPI_Send(helloWorld, 12, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
printf("Rank %d send: %s\n", rank, helloWorld);
}
else {
MPI_Recv(helloWorld, 12, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
MPI_STATUS_IGNORE);
printf("Rank %d received: %s\n", rank, helloWorld);
}
}
}
MPI_Finalize();
return 0;
}
You have to run this code on two process. Because 'MPI_THREAD_FUNNELED' is selected only the master thread makes MPI calls.
The following variables are specified at OpenMP data scoping place
because is needed by gcc version 6.1.1. Older versions like 4.8 do not require to declare them.
ompi_mpi_comm_world
ompi_mpi_char

mpiexec checkpointing error (RPi)

When I try to run an application (just a simple hello_world.c doesn't work) I receive this error every time:
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 10 -machinefile /tmp/machinefile -n 1 ./app_name
[proxy:0:0#masterpi] requesting checkpoint
[proxy:0:0#masterpi] checkpoint completed
[proxy:0:0#masterpi] requesting checkpoint
[proxy:0:0#masterpi] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:0#masterpi] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:0#masterpi] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#masterpi] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec#masterpi] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec#masterpi] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec#masterpi] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec#masterpi] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
I want just to make a checkpoint and nothing else (and restart later).
Thanks in advance
UPDATE:
I have tried with MPICH2, no chance. Or maybe I'm wrong somewhere...
pi#raspberrypi ~ $ mpiexec -n 1 -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 2 ./test3
Count to: 0
[proxy:0:0#raspberrypi] requesting checkpoint
[proxy:0:0#raspberrypi] checkpoint completed
Count to: 1
[proxy:0:0#raspberrypi] requesting checkpoint
[proxy:0:0#raspberrypi] HYDT_ckpoint_checkpoint (/tmp/mpich/mpich2-1.5/src/pm/hydra/tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:0#raspberrypi] HYD_pmcd_pmip_control_cmd_cb (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip_cb.c:902): checkpoint suspend failed
[proxy:0:0#raspberrypi] HYDT_dmxu_poll_wait_for_event (/tmp/mpich/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#raspberrypi] main (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpiexec#raspberrypi] control_cb (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
[mpiexec#raspberrypi] HYDT_dmxu_poll_wait_for_event (/tmp/mpich/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[mpiexec#raspberrypi] HYD_pmci_wait_for_completion (/tmp/mpich/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
[mpiexec#raspberrypi] main (/tmp/mpich/mpich2-1.5/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion
Test3-Code:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
int rank;
int size;
int i = 0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Status status;
if (rank == 0) {
for(i; i <=100; i++){
int j = 0;
while(j < 100000000){
j++;
}
printf("Count to: %i\n", i);
}
} else {
}
MPI_Finalize();
return 0;
}
I just need to have one successful checkpoint and to show the restart.
If someone has a working example (irrelevant what it makes, simple working "Hello World" would make me happy!) I would be very glad.
Happy new year!
Unfortunately, the checkpoint/restart code in MPICH 3.0.4 is known to be buggy at the moment. That will hopefully get fixed in a future release. It looks like you're probably using it correctly. It's possible that if you go back to a previous version, you might have better luck.
Here the problem was with the too small interval for checkpointing.
Setting it to 20s or more has solved this (but not the other :( ) problem.

MPI: Why does my MPICH program fails for large no. of processes?

/* C Example */
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
int main (int argc, char* argv[])
{
int rank, size;
int buffer_length = MPI_MAX_PROCESSOR_NAME;
char hostname[buffer_length];
MPI_Init (&argc, &argv); /* starts MPI */
MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */
printf( "Hello world from process %d running on %s of %d\n", rank, hostname, size );
MPI_Finalize();
return 0;
}
The above program compiles and run successfully on ubuntu 12.04 for smaller no. of processes. But it fails when I try to execute with 1000s of processes. Why it is so?
I am expecting that scheduler would keep the threads in queue and can dispatch one by one (I am running this code on a single core machine)
Why the following error is coming for large no. of processes and how to resolve this issue?
root#ubuntu:/home# mpiexec -n 1000 ./hello
[proxy:0:0#ubuntu] HYDU_create_process (./utils/launch/launch.c:26): pipe error (Too many open files)
[proxy:0:0#ubuntu] launch_procs (./pm/pmiserv/pmip_cb.c:751): create process returned error
[proxy:0:0#ubuntu] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:935): launch_procs returned error
[proxy:0:0#ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
Killed
You are running into the open file limit on your system. Default in Ubuntu is 1024. You can try raising the limit in your session with the ulimit command.
ulimit -n 2048

QProcess::readAllStandardOutput gives flaky outputs

I am trying to create a GUI in Qt4 for my tcl based tool. In order to populate widgets I need to execute some tcl commands. I read about QProcess and I am invoking tcl scripts using QProcess and then grabbing their output from stdout.
Suppose I execute 3 commands in tcl then when I query stdout I believe I should see 3 outputs corresponding to each of the three commands, however this is not happening consistently. Behavior is flaky.
As you can see in the main.cpp I am executing multiple commands using runTclCommand() function and in the end executing getData() function to read stdout.
main.cpp:
#include <QApplication>
#include <QProcess>
#include <QDebug>
#include "Tclsh.h"
int main(int argc, char *argv[])
{
QApplication a(argc, argv);
QByteArray out;
Tclsh *tcl = new Tclsh;
tcl->startTclsh();
tcl->runTclCommand("set ::tcl_interactive 1\n");
tcl->runTclCommand("set a 23\n");
tcl->runTclCommand("puts $a\n");
tcl->runTclCommand("set a 40\n");
tcl->runTclCommand("puts $a\n");
// out = idl->getData();
out = tcl->getData();
}
Tclsh.cpp:
#include <QProcessEnvironment>
#include <QProcess>
#include <QDebug>
#include "Tclsh.h"
void Tclsh::startTclsh() {
QString program = "/usr/bin/tclsh8.4";
this->setProcessChannelMode(QProcess::MergedChannels);
this->start(program);
if ( !this->waitForStarted()) {
qDebug()<<"ERROR Starting tclsh";
}
return;
}
void Tclsh::runTclCommand(const char *cmd) {
qDebug()<<"CMD:"<<cmd;
this->write(cmd);
if (!this->waitForBytesWritten()) {
qDebug()<<"Error in writing data";
}
}
QByteArray Tclsh::getData() {
if (!this->waitForReadyRead()) {
qDebug()<<"Error in reading stdout: Ready read signal is not emitted";
}
QByteArray data = this->readAllStandardOutput();
qDebug()<<"DATA:"<<data;
return data;
}
However, sometime I get the following output:
CMD: set ::tcl_interactive 1
CMD: set a 23
CMD: puts $a
CMD: set a 40
CMD: puts $a
DATA: "1
% 23
% 23
% "
And sometimes this:
CMD: set ::tcl_interactive 1
CMD: set a 23
CMD: puts $a
CMD: set a 40
CMD: puts $a
DATA: "1
"
I do not understand why this is happening. I would really appreciate if someone can point me to the error in my approach here.
Thanks,
Newbie
Edit: After some more research, here are my thoughts
According to Qt manual, readyRead signal will be emitted whenever new data is available (as specified by #Frank Osterfeld also, thanks!). It will not wait for complete output data to be available (which is justified since it does not know when will that happen). Hence my approach is not good. What I can do is something like this:
start the process -> wait for process to finish -> read stdout
This will ensure that flaky behavior does not arise as process is already finished when I am reading hence no new data can come.
However, in this proposed approach I am not clear about one thing: Does stdout is specific to a process? I mean can it happen that process which was supposed to read stdout output from process1, can get other stdout data from some other process which happen to write stdout at the same time as process1?
Thanks,
Newbie
I am closing this question. Reading from a channel more than once does not seem to a be a good idea. Instead what I do now is write what I want to write in one go --> close the channel for writing --> then read it back. In that way I get consistent output.

Resources