I have a simple question (in my mind) and I cannot find an answer. How do I suppress output messages from mpirun?
For example, I have an MPI based program that takes input file names. If a file name is bad, the program generates a log file such as:
Beginning initialization...
*****************************
Reading topology file...
Error: Topology file mysample.top was not found.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 21581 on
node newton-compute-2-25.local exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
The behavior is correct; the program terminates execution (by calling MPI_Abort) with a message that the input file is bad. The messages from MPI are not necessary, and these are what I would like to suppress.
I did try adding the -q and --quiet options to the mpirun call, but they appear to do nothing for this particular problem. I am using Open MPI, in case the implementation matters.
Edit: I should mention that the MPI messages go to stderr, which is separate from stdout. That is fine, but I still do not want to see them mixed in with the error messages from the program.
Since mpirun must be able to handle errors from all of the nodes it runs on, I'm pretty confident you can't separate the MPI error stream from the processes' error streams. You can discard all of stderr with 2>/dev/null, or send it to an error log with 2> err.log, but again, I don't believe you can split the errors apart.
Related
When exactly does a signal start executing in Unix? Is the signal processed when the system switches into kernel mode, or immediately when the process receives it? I assume it is processed immediately upon receipt.
A signal is the Unix mechanism for allowing a user space process to receive asynchronous notifications. As such, signals are always "delivered by" the kernel. And hence, it is impossible for a signal to be delivered without a transition into kernel mode. Therefore it doesn't make sense to talk of a process "receiving" a signal (or sending one) without the involvement of the kernel.
Signals can be generated in different ways.
They can be generated by a device driver within the kernel (for example, tty driver in response to the interrupt, kill, or stop keys or in response to input or output by a backgrounded process).
They can be generated by the kernel in response to an emergent out-of-memory condition.
They can be generated by a processor exception in response to something the process itself does during its execution (illegal instruction, divide by zero, reference an illegal address).
They can be generated directly by another process (or by the receiving process itself) via kill(2).
SIGPIPE can be generated as a result of writing to a pipe that has no reader.
But in every case, the signal is delivered to the receiving process by the kernel and hence through a kernel-mode transition.
The kernel might need to force that transition -- pre-empt the receiving process -- in order to deliver the signal (for example, in the case of a CPU-bound process running on processor A being sent a signal by a different process running on processor B).
In some cases, the signal may be handled for the process by the kernel itself (for example, with SIGKILL -- or several others when no signal handler is configured).
Actually invoking a process' signal handler is done by manipulating the process' user space stack so that the signal handler is invoked on return from kernel-mode and then, if/when the signal handler procedure returns, the originally executing code can be resumed.
As to when it is processed, that is subject to a number of different factors.
There are operating system (i.e. kernel) operations that are never interrupted by signals (these are generally relatively short duration operations), in which case the signal will be processed after their completion.
The process may have temporarily blocked signal delivery, in which case the signal will be "pending" until it is unblocked (see the sketch after this list).
The process could be swapped out or non-runnable for any of a number of reasons -- in which case, its signal handler cannot be invoked until the process is runnable again.
Resuming the process in order to deliver the signal might be delayed by interrupts and higher priority tasks.
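To make the "blocked/pending" case above concrete, here is a minimal sketch (plain POSIX calls compiled as C++; the handler name and the 5-second window are just for illustration). It installs a SIGINT handler with sigaction(), blocks SIGINT with sigprocmask() so that a Ctrl+C only becomes pending, and then unblocks it, at which point the kernel delivers the signal and the handler finally runs.

// Minimal sketch: SIGINT handler installed with sigaction(), plus a window
// in which SIGINT is blocked so a generated signal stays "pending".
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

// The handler runs in user space, but only after the kernel has decided
// to deliver the signal on the way back to user mode.
static void on_sigint(int) { got_sigint = 1; }

int main() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, nullptr);

    // Block SIGINT: a Ctrl+C pressed now is only marked pending.
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, nullptr);

    puts("SIGINT blocked for 5 seconds; press Ctrl+C now...");
    sleep(5);

    // Unblock: a pending SIGINT is delivered here, before sigprocmask returns.
    sigprocmask(SIG_UNBLOCK, &block, nullptr);

    if (got_sigint)
        puts("SIGINT was pending and has now been delivered.");
    else
        puts("no SIGINT arrived.");
    return 0;
}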
A signal is delivered to the receiving process by the kernel, normally the next time that process is scheduled to run.
Depending on the signal type, the process might handle it with the default action, might ignore it, or might execute a custom handler. It depends a lot on what the process is and which signal it receives. The exception is the kill signal (9), which is handled entirely by the kernel, cannot be caught or ignored, and terminates the process that was supposed to receive it.
What's the difference between the SIGINT signal and the SIGTERM signal? I know that SIGINT is equivalent to pressing ctrl+c on the keyboard, but what is SIGTERM for? If I wanted to stop some background process gracefully, which of these should I use?
The only difference in the response is up to the developer. If the developer wants the application to respond to SIGTERM differently than to SIGINT, then different handlers are registered. If you want to stop a background process gracefully, you would typically send SIGTERM. If you are developing an application, you should respond to SIGTERM by exiting gracefully. SIGINT is often handled the same way, but not always. For example, it is often convenient to respond to SIGINT by reporting status or partial computation. This makes it easy for a user running the application on a terminal to get partial results, but slightly more difficult to terminate the program, since it generally requires the user to open another shell and send a SIGTERM via kill. In other words, it depends on the application, but the convention is to respond to SIGTERM by shutting down gracefully; the default action for both signals is termination, and most applications respond to SIGINT by stopping gracefully as well.
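As a sketch of that convention (a hypothetical long-running loop; the names and timings are made up for illustration): SIGINT reports partial progress, while SIGTERM asks the program to finish up and exit.

// Sketch: treat SIGINT as "report status", SIGTERM as "shut down gracefully".
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t report_requested = 0;
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int)  { report_requested = 1; }  // Ctrl+C: show progress
static void on_sigterm(int) { stop_requested = 1; }    // kill <pid>: wind down

int main() {
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, nullptr);
    sa.sa_handler = on_sigterm;
    sigaction(SIGTERM, &sa, nullptr);

    long iterations = 0;
    while (!stop_requested) {
        ++iterations;            // stand-in for the real work
        usleep(1000);
        if (report_requested) {
            fprintf(stderr, "progress: %ld iterations so far\n", iterations);
            report_requested = 0;
        }
    }
    fprintf(stderr, "SIGTERM received, shutting down gracefully.\n");
    return 0;
}

Running this in a terminal, Ctrl+C just prints the progress line, while kill <pid> from another shell (SIGTERM by default) makes the loop exit cleanly.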
If I wanted to stop some background process gracefully, which of these should I use?
The Unix list of signals dates back to the time when computers had serial terminals and modems, which is where the concept of a controlling terminal originates. When a modem drops the carrier, the line is hung up.
SIGHUP(1) therefore indicated a loss of connection, forcing programs to exit or restart. For daemons like syslogd and sshd, processes without a terminal connection that are supposed to keep running, SIGHUP is typically the signal used to restart or reset.
SIGINT(2) and SIGQUIT(3) are literally "interrupt" and "quit" - "from keyboard" - giving the user immediate control if a program goes haywire. With a physical character-based terminal this was the only way to stop a program!
SIGTERM(15) is not related to any terminal handling and is sent from another process (for example via kill). This would be the conventional signal to send to a background process.
SIGINT is the program interrupt signal, which is sent when a user presses Ctrl+C.
SIGTERM is the termination signal; it is sent to a process to request that the process terminate, but it can be caught or ignored by that process.
I have a bash script where I kill a running process by sending the SIGTERM signal to its process ID. However, I want to know the return code of the process I just sent the signal to.
Is that possible?
I cannot use 'wait' because the process to kill was not started from my script, and I'm receiving
"pid ##### is not a child of this shell"
I did some tests on the command line: in the console where the process was running, after I sent the SIGTERM signal (from another console), I checked the exit code and it was 143.
I want to kill the process from a different script and catch that number.
As shellter said, you cannot get the exit code of a process except by using wait (or waitpid(), etc.), and you can only do that if you are its parent (see the sketch after the three cases below).
But even if you could, think about this:
When you send a process a SIGTERM, only one of three things can happen:
The process has not installed any signal handler for SIGTERM. In this case it dies immediately as a result of the signal. But in this case the exit code is uninteresting – you already know what it is. On most platforms it is 143 (128 + integer value of SIGTERM), indicating, unsurprisingly, that the process has died as a result of SIGTERM.
The process has configured SIGTERM to be ignored. In this case, nothing happens, the process does not die, and so there is no exit code to obtain anyway.
The process has installed a signal handler for SIGTERM. In this case, the handler is invoked. The handler might do anything at all: possibly nothing, possibly exit immediately, possibly carry out some cleanup operation and exit later, possibly something completely different. Even if the process does exit, that's only an indirect result of the signal, and it happens at a later time, so there is no exit code to obtain that comes directly from the delivery of the signal.
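To illustrate the first case under the only condition where the status can be collected at all (being the parent), here is a minimal sketch: fork a child, send it SIGTERM, and read the status with waitpid(). The 143 you saw is simply how the shell reports a signal death: 128 + the signal number, and SIGTERM is 15.

// Sketch: only the parent can collect a process's exit status.
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    if (pid == 0) {             // child: wait here until it is killed
        pause();
        return 0;
    }

    sleep(1);                   // give the child time to start
    kill(pid, SIGTERM);         // send SIGTERM, as the script does

    int status = 0;
    waitpid(pid, &status, 0);   // only possible because we are the parent
    if (WIFSIGNALED(status))
        // A shell reports this as 128 + signal number, e.g. 143 for SIGTERM.
        printf("child killed by signal %d\n", WTERMSIG(status));
    else if (WIFEXITED(status))
        printf("child exited normally with code %d\n", WEXITSTATUS(status));
    return 0;
}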
Apparently, mpirun uses a SIGINT handler which "forwards" the SIGINT signal to each of the processes it spawned.
This means you can write an interrupt handler for your mpi-enabled code, execute mpirun -np 3 my-mpi-enabled-executable and then SIGINT will be raised for each of the three processes. Shortly after that, mpirun exits. This works fine when you have a small custom handler which only prints an error message and then exits. However, when your custom interrupt handler is doing a non-trivial job (e.g. doing serious computations or persisting data), the handler does not run to completion. I'm assuming this is because mpirun decided to exit too soon.
Here's the stderr upon pressing ctrl-c (i.e. causing SIGINT) after executing my-mpi-enabled-executable. This is the desirable expected behavior:
interrupted by signal 2.
running viterbi... done.
persisting parameters... done.
the master process will now exit.
Here's the stderr upon pressing ctrl-c after executing mpirun -np 1 my-mpi-enabled-executable. This is the problematic behavior:
interrupted by signal 2.
running viterbi... mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8970 on node pharaoh exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
Answering any of the following questions will solve my problem:
How to override the mpirun SIGINT handler (if at all possible)?
How to avoid the termination of the processes mpirun spawned right after mpirun terminates?
Is there another signal which mpirun may be sending to the child processes before mpirun terminates?
Is there a way to "capture" the so-called "signal 0 (Unknown signal 0)" (see the second stderr above)?
I'm running openmpi-1.6.3 on linux.
As per the Open MPI manpage, you can send SIGUSR1 or SIGUSR2 to mpirun, which will forward the signal to the processes it spawned and not shut itself down.
When I had the same issue, I came across this question and the answer by @Zulan.
In particular I wanted to catch a SIGINT (Ctrl+C) from the user, do some work and then exit in an orderly fashion, so using SIGUSR1 was not an option. Reading the man page that @Zulan linked, however, shows that mpirun (at least the Open MPI version) catches SIGINT and then sends a SIGTERM signal to the child processes. Thus, catching SIGTERM in my code allowed me to call the proper exit routines.
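Here is a minimal sketch of that approach, assuming Open MPI's behaviour of turning the user's Ctrl+C into a SIGTERM for the spawned processes. The handler only sets a flag; the main loop checks it and performs the cleanup and MPI_Finalize() itself, which also keeps MPI calls out of the signal handler (see the note below). save_checkpoint() is a placeholder for whatever needs to be persisted.

// Sketch: catch the SIGTERM that mpirun sends on Ctrl+C, set a flag,
// and let the main loop do the actual cleanup and MPI_Finalize().
#include <mpi.h>
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t terminate_requested = 0;

static void on_sigterm(int) { terminate_requested = 1; }  // no MPI calls here

// Placeholder for whatever state the application needs to persist.
static void save_checkpoint(int rank) {
    fprintf(stderr, "rank %d: persisting parameters...\n", rank);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigterm;
    sigaction(SIGTERM, &sa, nullptr);

    for (long step = 0; step < 100000000 && !terminate_requested; ++step) {
        // ... one iteration of the real computation ...
    }

    if (terminate_requested)
        save_checkpoint(rank);   // orderly shutdown triggered by the signal

    MPI_Finalize();
    return 0;
}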
Note that signal handling is not safe with MPI, as noted here.
I'm using an OpenSSL library in multi-threading application.
For various reasons I'm using blocking SSL connections, and there is a situation where the client hangs in the
SSL_connect
function.
I moved the connection procedure to another thread and created a timer. On timeout, the connection thread is terminated using:
QThread::terminate()
The thread is terminable, but on the next attempt to start the thread I get:
QThread::start: Thread termination error:
I checked the "max thread issue" and that's not the case.
I'm working on CentOS 6.0 with Qt 4.5 and OpenSSL 1.0.
The question is how to completely terminate a thread.
The Qt documentation for terminate() says:
The thread may or may not be terminated immediately, depending on the operating system's scheduling policies. Use QThread::wait() after terminate() for synchronous termination.
but also:
Warning: This function is dangerous and its use is discouraged. The thread can be terminated at any point in its code path. Threads can be terminated while modifying data. There is no chance for the thread to clean up after itself, unlock any held mutexes, etc. In short, use this function only if absolutely necessary.
Assuming you didn't reimplement QThread::run() (which is usually not necessary), or you reimplemented run() and called exec() yourself, the usual way to stop a thread would be:
_thread->quit();
_thread->wait();
The first line asynchronously tells the thread to stop executing, which usually means the thread will finish whatever it is currently doing and then return from its event loop. However, quit() always returns instantly, which is why you need to call wait() so that the main thread blocks until _thread has actually finished. After that, you can safely start() the thread again.
If you really want to get rid of the thread as quickly as possible, you can also call wait() after terminate(), or at least before you call start() again.
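Putting that together, here is a minimal sketch (Qt 4 style): the default run() enters the thread's event loop, quit() asks that loop to return, wait() blocks until it actually has, and start() can then be called again.

// Sketch: stop a QThread that runs the default event loop, then restart it.
#include <QtCore/QCoreApplication>
#include <QtCore/QThread>

int main(int argc, char** argv) {
    QCoreApplication app(argc, argv);

    QThread thread;          // default run() just calls exec()
    thread.start();          // the event loop starts in the new thread

    // ... later, when the work is done or should be abandoned ...
    thread.quit();           // asynchronously asks the event loop to return
    thread.wait();           // blocks until run() has actually finished

    thread.start();          // now it is safe to start the thread again

    thread.quit();
    thread.wait();
    return 0;
}

Keep in mind that quit() only takes effect once control returns to the thread's event loop, so it will not break out of a slot that is blocked inside SSL_connect; the blocking call itself still needs its own timeout (for example on the underlying socket).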