mpi multiple init finalize - mpi

Assuming I have a good reason to do the following (I think I have), how can I make it work?
#include "mpi.h"
int main( int argc, char *argv[] )
{
    int myid, numprocs;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    // ...
    MPI_Finalize();
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    // ...
    MPI_Finalize();
    return 0;
}
I got the error:
--------------------------------------------------------------------------
Calling any MPI-function after calling MPI_Finalize is erroneous.
The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[ange:13049] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
The reason for doing this:
I have Python wrapping around C++ code. Some wrapped classes have a constructor that calls MPI_Init and a destructor that calls MPI_Finalize. I would like to be able to freely create, delete, and re-create in Python the objects that wrap these C++ classes. The ultimate goal is to create a web service entirely in Python that imports the Python C++ extension once and executes some Python code in response to each user request.
EDIT: I think I'll refactor the C++ code to make the MPI_Init and MPI_Finalize calls in the constructor and destructor optional, so that they can be done exactly once in the Python script (using mpi4py).

You've basically got the right solution, so I'll just confirm. It is in fact erroneous to call MPI_Init and MPI_Finalize multiple times, and if you have an entity that calls these internally on creation/destruction, then you can only instantiate that entity once. If you want to create multiple instances, you'll need to change the entity to do one of the following:
Offer an option to not call Init and Finalize that the user can set externally
Use MPI_Initialized and MPI_Finalized to decide whether it needs to call either of the above
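The second option could look roughly like the RAII guard below. This is only a sketch: to keep it compilable without an MPI installation, FakeRuntime is a hypothetical stand-in, and in real code its four member functions would forward to MPI_Initialized, MPI_Finalized, MPI_Init and MPI_Finalize. The names RuntimeGuard and owns_runtime are also made up for illustration.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the MPI runtime so the sketch compiles without MPI.
// In real code: initialized() -> MPI_Initialized, finalized() -> MPI_Finalized,
// init() -> MPI_Init, finalize() -> MPI_Finalize.
struct FakeRuntime {
    bool is_initialized = false;
    bool is_finalized = false;
    int init_calls = 0;

    bool initialized() const { return is_initialized; }
    bool finalized() const { return is_finalized; }
    void init() { is_initialized = true; ++init_calls; }
    void finalize() { is_finalized = true; }
};

// RAII guard: initializes the runtime only if nobody has done so yet, and
// finalizes only if this guard performed the initialization.
class RuntimeGuard {
public:
    explicit RuntimeGuard(FakeRuntime& rt) : rt_(rt) {
        if (rt_.finalized())
            throw std::runtime_error("runtime already finalized; cannot re-initialize");
        if (!rt_.initialized()) {
            rt_.init();
            owner_ = true;
        }
    }
    ~RuntimeGuard() {
        if (owner_ && !rt_.finalized())
            rt_.finalize();
    }
    RuntimeGuard(const RuntimeGuard&) = delete;
    RuntimeGuard& operator=(const RuntimeGuard&) = delete;

    bool owns_runtime() const { return owner_; }

private:
    FakeRuntime& rt_;
    bool owner_ = false;
};
```

Note that such a guard only prevents double calls. Once the owning guard has finalized the runtime, no later guard can bring it back, so for a create/delete/re-create workflow the initialization and finalization still have to live outside the wrapped class (for example in the Python script).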

Related

googlebenchmark and MPI: Is there hope?

I want to run a particular MPI function under google benchmark. Something like:
#include <mpi.h>
#include <benchmark/benchmark.h>
template<class Real>
void MPIInitFinalize(benchmark::State& state)
{
    auto mpi = []() {
        MPI_Init(nullptr, nullptr);
        foo();
        MPI_Finalize();
    };
    for(auto _ : state) {
        mpi();
    }
}
BENCHMARK_TEMPLATE(MPIInitFinalize, double);
BENCHMARK_MAIN();
Of course, we know what will happen:
*** The MPI_Init() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
I understand that MPI isn't cool with what I want to do. But google benchmark is simply too useful to not at least try to find a hack to make this work.
Is there anything that can be done? Can I fork a process and pass the lambda to it? Is there a threading pattern that will work? Even expensive workarounds would be helpful, as I can just subtract the cost of running the hack without a call to foo() from the cost of running it with the call to foo().
If you don't need to include MPI_Init and MPI_Finalize in your timing (which you probably don't want anyway), you can take a look at this gist: https://gist.github.com/mdavezac/eb16de7e8fc08e522ff0d420516094f5
It contains an example of how to benchmark MPI-enabled code with google benchmark. The basic idea is to call google benchmark from your own main function (using ::benchmark::Initialize(&argc, argv) and ::benchmark::RunSpecifiedBenchmarks()), synchronize using MPI_Barrier, time your code using std::chrono::high_resolution_clock, and use MPI_Allreduce to find the slowest process. You can then publish that time using state.SetIterationTime (but only on the main process); for this to take effect the benchmark has to be registered with UseManualTime().
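The manual-timing part of that idea can be sketched without MPI or google benchmark. The helper names time_region and slowest are hypothetical, slowest is merely a stand-in for what MPI_Allreduce with MPI_MAX would compute across ranks, and std::chrono::steady_clock is used here instead of high_resolution_clock because it is guaranteed to be monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Time one call of `work` in seconds using a monotonic clock.
// In an MPI benchmark, an MPI_Barrier would go right before the start
// timestamp so that all ranks begin together.
template <class F>
double time_region(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical stand-in for MPI_Allreduce with MPI_MAX: given the elapsed
// time measured on each rank, return the slowest one (0.0 if there are none).
double slowest(const std::vector<double>& per_rank_seconds) {
    return per_rank_seconds.empty()
        ? 0.0
        : *std::max_element(per_rank_seconds.begin(), per_rank_seconds.end());
}
```

In a real benchmark, the value corresponding to slowest(...) is what would be handed to state.SetIterationTime inside the benchmark loop.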

How to determine if an MPI communicator is valid?

In my program, I have wrapped up some MPI communicators into a data structure. Unfortunately, sometimes the destructor of an object of this type might get called before it has been initialized. In my destructor, I of course call MPI_Comm_free. But if this is called on an invalid communicator the code crashes.
I've been looking through the MPI standard, but I can't find a function to test if a communicator is valid. I also assume I can't use MPI_Comm_set_errhandler to try and catch the free exception because there isn't a valid communicator to set the handler of. I could maintain a flag value of my own saying if the communicator is valid, but I prefer to avoid duplicating state information like that. Is there any built-in way I can safely check if a communicator is valid?
Here is a basic program demonstrating my problem:
#include <mpi.h>
typedef struct {
    MPI_Comm comm;
} mystruct;
void cleanup(mystruct* a) {
    MPI_Comm_free(&(a->comm));
}
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    mystruct a;
    /* Some early exit condition triggers cleanup without
       initialization */
    cleanup(&a);
    MPI_Finalize();
    return 0;
}
MPI_COMM_NULL is the constant used for invalid communicator handles. However, you cannot determine whether an MPI communicator variable has been initialized. In C, it is impossible to determine whether a variable has been initialized: non-static local variables start with an indeterminate value, and reading one causes undefined behavior.
You must initialize the communicator with MPI_COMM_NULL yourself, and compare against MPI_COMM_NULL before calling MPI_Comm_free. This only makes sense if you cannot create the actual communicator at initialization time.
Note: MPI_Comm_free also sets comm to MPI_COMM_NULL.
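The sentinel pattern described above might look like the following sketch. Everything MPI-specific is replaced by hypothetical stand-ins so that it compiles without MPI: Handle stands for MPI_Comm, kNullHandle for MPI_COMM_NULL, and release_handle for MPI_Comm_free. The counter g_release_calls exists only to make the behavior observable.

```cpp
// Hypothetical stand-ins so the sketch compiles without MPI. In real code
// Handle would be MPI_Comm, kNullHandle would be MPI_COMM_NULL, and
// release_handle would be MPI_Comm_free.
using Handle = int;
constexpr Handle kNullHandle = -1;

int g_release_calls = 0;  // counts real releases, for demonstration only

void release_handle(Handle* h) {
    ++g_release_calls;
    *h = kNullHandle;  // mirrors MPI_Comm_free resetting the handle
}

struct Wrapper {
    Handle comm = kNullHandle;  // always start from the null sentinel
};

// Safe to call on a never-initialized or already-cleaned-up wrapper.
void cleanup(Wrapper* w) {
    if (w->comm != kNullHandle) {
        release_handle(&w->comm);
    }
}
```

Because the handle itself carries the "valid or not" information, no separate flag is needed, which addresses the concern about duplicating state.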

My signal / slot connection does not work

I repeatedly see people having problems with slots not being called. I would like to collect some of the most common reasons. So maybe I can help people and avoid a lot of redundant questions.
What are reasons for signal / slot connections not working? How can such problems be avoided?
There are some rules that make life with signals and slots easier and cover the most common reasons for defective connections. If I forgot something please tell me.
1) Check the debug console output:
When execution errors occur, the debug output can show you the reason.
2) Use the full signature of signal and slot:
Instead of
connect(that, SIGNAL(mySignal), this, SLOT(mySlot));
write
connect(that, SIGNAL(mySignal(int)), this, SLOT(mySlot(int)));
and check your spelling and capitalization.
3) Use existing overloads:
Carefully check if you are using the desired overloads of signal and slot and if the overloads you used actually exist.
4) Your signal and slot must be compatible:
This especially means the parameters must be of the same type (references are tolerated) and have the same order.
Compile-time syntax also needs the same number of parameters. Old runtime syntax allows connecting signals to slots with fewer parameters.
5) Always check return value of connect method (programmers should never ignore return values):
Instead of
connect(that, SIGNAL(mySignal(int)), this, SLOT(mySlot(int)));
always use something like
bool success = connect(that, SIGNAL(mySignal(int)), this, SLOT(mySlot(int)));
Q_ASSERT(success);
Or, if you like, throw an exception or implement full error handling. You may also use a macro like this:
#ifndef QT_NO_DEBUG
#define CHECK_TRUE(instruction) Q_ASSERT(instruction)
#else
#define CHECK_TRUE(instruction) (instruction)
#endif
CHECK_TRUE(connect(that, SIGNAL(mySignal(int)), this, SLOT(mySlot(int))));
6) You need an event loop for queued connections:
I.e. whenever you connect signals/slots of two objects owned by different threads (so-called queued connections), you need to call exec(); in the slot's thread!
The event loop also needs to be actually served. Whenever the slot's thread is stuck in some kind of busy loop, queued connections are NOT executed!
7) You need to register custom types for queued connections:
So when using custom types in queued connections you must register them for this purpose.
First declare the type using the following macro:
Q_DECLARE_METATYPE(MyType)
Then use one of the following calls:
qRegisterMetaType<MyTypedefType>("MyTypedefType"); // For typedef defined types
qRegisterMetaType<MyType>(); // For other types
8) Prefer new compile time syntax over old run-time checked syntax:
Instead of
connect(that, SIGNAL(mySignal(int)), this, SLOT(mySlot(int)));
use this syntax
connect(that, &ThatObject::mySignal, this, &ThisObject::mySlot);
which checks signal and slot at compile time and does not even require the destination to be an actual slot.
If your signal is overloaded use the following syntax:
connect(that, static_cast<void (ThatObject::*)(int)>(&ThatObject::mySignal), this, &ThisObject::mySlot); // <Qt5.7
connect(that, QOverload<int>::of(&ThatObject::mySignal), this, &ThisObject::mySlot); // >=Qt5.7 & C++11
connect(that, qOverload<int>(&ThatObject::mySignal), this, &ThisObject::mySlot); // >=Qt5.7 & C++14
Starting with Qt5.14, overloaded signals are deprecated. Disable deprecated Qt features to get rid of the above shenanigans.
Also do not mix const/non-const signals/slots for that syntax (normally signals and slots will be non-const).
9) Your classes need a Q_OBJECT macro:
In classes where you are using "signals" and "slots" specifications you need to add a Q_OBJECT macro like this:
class SomeClass : public QObject
{
    Q_OBJECT
signals:
    void MySignal(int x);
};
class SomeMoreClass : public QObject
{
    Q_OBJECT
public slots:
    void MySlot(int x);
};
This macro adds necessary meta information to the class.
10) Your objects must be alive:
As soon as either the sender object or the receiver object is destroyed, Qt automatically discards the connection.
If the signal isn't emitted: Does the sender object still exist?
If the slot isn't called: Does the receiver object still exist?
To check the lifetime of both objects use a debugger break point or some qDebug() output in the constructors/destructors.
11) It still does not work:
To do a very quick and dirty check of your connection, emit the signal yourself using some dummy arguments and see if the slot is called:
connect(that, SIGNAL(mySignal(int)), this, SLOT(mySlot(int)));
emit that->mySignal(0); // Ugly, don't forget to remove it immediately
Finally, of course, it is possible that the signal simply is not emitted. If you followed the above rules, probably something is wrong in your program's logic. Read the documentation. Use the debugger. And if there is no other way, ask at stackoverflow.
In my practice, I have encountered cases of incorrectly overriding eventFilter in the object receiving the signal. Some novice programmers forget to return "false" at the end of function. And thus do not allow the MetaCall event to pass to the receiving object. In this case, the signal is not processed at the receiving object.
Short answer
You (almost) don't have to worry about that anymore. Always use the QMetaMethod/Pointer to member prototype of connect, as it will fail at compile time if the signal and slot are not compatible.
connect(sourceObject, &SourceClass::signal, destObject, &DestClass::slot);
This prototype will only fail at runtime if the sourceObject or destObject is null (which is to be expected). Argument incompatibility, however, will show up during compilation.
Only rare situations require the older SIGNAL/SLOT literal-based syntax, so this should be your last resort.
Compatibility
The signatures are compatible if the following conditions are met:
You are connecting a signal to a slot or a signal
The destination signal/slot has the same number of arguments as the source signal, or fewer
Arguments of the source signal can be implicitly converted to the corresponding argument (matched in order) in the destination signal/slot, if used
Examples
OK - signalA(int, std::string) => signalC(int, std::string)
Note that we are connecting to a signal
OK - signalA(int, std::string) => slotB(int, std::string)
OK - signalA(int, std::string) => slotB(int)
String parameter ignored
OK - signalA(int, std::string) => slotB()
All parameters ignored
OK - signalA(int, const char*) => slotB(int, QString)
Implicitly converted with QString(const char*)
Fails - signalA(int, std::string) => slotB(std::string)
int not implicitly convertible to std::string
Fails - signalA(int, std::string) => slotB(std::string, int)
Incorrect order
Fails - signalA(int, std::string) => slotB(int, std::string, int)
Too many arguments on the right side

Controlled premature termination of mpi program running under slurm?

I am running a script that does multiple subsequent mpirun calls through Slurm's squeue command. Each call to mpirun writes its output to its own directory, but there is a dependency between them in that a given run uses data from the previous run's output directory.
The MPI program internally performs an iterative optimization algorithm, which terminates once some convergence criteria are met. Every once in a while the algorithm reaches a state in which those criteria are not quite met yet, but by plotting the output (which is continuously written to disk) one can quite easily tell that the important things have converged and that further iterations would not change the nature of the final result anymore.
What I am therefore looking for is a way to manually terminate the run in a controlled way and have the outer script proceed to the next mpirun call. What is the best way to achieve this? I do not have direct access to the node on which the calculation is actually performed, but I of course have access to all of Slurm's commands and the working directories of the individual runs. I have access to the MPI program's full source code.
One solution that would work is the following: if one wants to terminate a run manually, one places a file with a special name like killme in the working directory, which could easily be done with touch killme. The MPI program would regularly check for the existence of this file and terminate in a controlled manner if it exists. The outer script or Slurm would not be involved at all here, and the script would just continue with the next mpirun call. What do you think of this solution? Can you think of anything better?
Here is a short code snippet for getting SIGUSR1 as a signal.
More detailed explanation can be found here.
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
/* Handler for SIGUSR1. Note: fprintf and exit are not async-signal-safe;
   they are used here only to keep the demonstration short. */
void sighandler(int signum, siginfo_t *info, void *ptr) {
    fprintf(stderr, "Received signal %d\n", signum);
    fprintf(stderr, "Signal originates from process %lu\n",
            (unsigned long) info->si_pid);
    fprintf(stderr, "Shutting down properly.\n");
    exit(0);
}
int main(int argc, char** argv) {
    struct sigaction act;
    printf("pid %lu\n", (unsigned long) getpid());
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = sighandler;
    act.sa_flags = SA_SIGINFO;
    sigaction(SIGUSR1, &act, NULL);
    /* Sleep until a signal arrives instead of spinning. */
    while (1) {
        pause();
    }
    return 0;
}
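For an iterative solver, a variant that only sets a flag in the handler and lets the main loop finish the current iteration may be a better fit than exiting inside the handler. The sketch below assumes that structure; the names g_stop_requested, request_stop, install_stop_handler and iterate are hypothetical. SIGUSR1 would be the natural signal on a cluster (Slurm can deliver signals to a job, for example via scancel's --signal option), but any signal number can be passed in.

```cpp
#include <csignal>

// Set by the handler, polled by the main loop. volatile std::sig_atomic_t is
// the type that may safely be written from a signal handler.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void request_stop(int) {
    g_stop_requested = 1;  // do nothing else inside the handler
}

// Installs the handler for `signum`; returns false if installation failed.
bool install_stop_handler(int signum) {
    return std::signal(signum, request_stop) != SIG_ERR;
}

// Runs `step` for up to max_iters iterations, stopping early once a stop has
// been requested. Returns the number of completed iterations.
template <class Step>
int iterate(int max_iters, Step&& step) {
    int done = 0;
    while (done < max_iters && !g_stop_requested) {
        step(done);
        ++done;
    }
    return done;
}
```

After iterate returns, the program can write its final output and exit with a normal return code, which lets the outer script continue with the next run.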

Correct way to quit a Qt program?

How should I quit a Qt program, e.g. when a data file is being loaded, file corruption is discovered, and the user needs to quit the app or re-load the data file?
Should I:
1) call exit(EXIT_FAILURE)
2) call QApplication::quit()
3) call QCoreApplication::quit()
And what is the difference between (2) and (3)?
QApplication is derived from QCoreApplication and thereby inherits quit() which is a public slot of QCoreApplication, so there is no difference between QApplication::quit() and QCoreApplication::quit().
As we can read in the documentation of QCoreApplication::quit() it "tells the application to exit with return code 0 (success).". If you want to exit because you discovered file corruption then you may not want to exit with return code zero which means success, so you should call QCoreApplication::exit() because you can provide a non-zero returnCode which, by convention, indicates an error.
It is important to note that "if the event loop is not running, this function (QCoreApplication::exit()) does nothing", so in that case you should call exit(EXIT_FAILURE).
You can call qApp->exit();. I always use that and never had a problem with it.
If your application is a command line application, you might indeed want to return an exit code. It's completely up to you what the code is.
While searching this very question I discovered this example in the documentation.
QPushButton *quitButton = new QPushButton("Quit");
connect(quitButton, &QPushButton::clicked, &app, &QCoreApplication::quit, Qt::QueuedConnection);
Mutatis mutandis for your particular action of course.
Along with this note.
It's good practice to always connect signals to this slot using a
QueuedConnection. If a signal connected (non-queued) to this slot is
emitted before control enters the main event loop (such as before "int
main" calls exec()), the slot has no effect and the application never
exits. Using a queued connection ensures that the slot will not be
invoked until after control enters the main event loop.
It's common to connect the QGuiApplication::lastWindowClosed() signal
to quit()
If you're using Qt Jambi, this should work:
QApplication.closeAllWindows();
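If you need to close your application from main(), you can use this code:
int main(int argc, char *argv[]){
    QApplication app(argc, argv);
    ...
    // The event loop is not running yet, so QCoreApplication::exit() would
    // have no effect here (and it returns void); return from main() instead.
    if(!QSslSocket::supportsSsl()) return 1;
    ...
    return app.exec();
}
The program will terminate if SSL support (e.g. OpenSSL) is not available.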
//How to Run App
bool ok = QProcess::startDetached("C:\\TTEC\\CozxyLogger\\CozxyLogger.exe");
qDebug() << "Run = " << ok;
//How to Kill App
system("taskkill /im CozxyLogger.exe /f");
qDebug() << "Close";

Resources