How to compile an MPI application in "serial" mode (without using an MPI compiler)? - mpi

This question might sound a bit weird...
Imagine I have an MPI application, but I don't have a system with MPI installed.
So I want to compile the application with no MPI support (one process, one thread) without modifying the source code.
Is that possible?
I found a "mimic_mpi.h" wrapper somewhere that is supposed to do exactly what I want, but some MPI functions were missing from it (e.g., MPI_Cart_create, MPI_Cart_get), so I didn't succeed.
mimic_mpi.h http://openmx.sourcearchive.com/documentation/3.2.4.dfsg-3/mimic__mpi_8h-source.html
mimic_mpi.c http://openmx.sourcearchive.com/documentation/3.2.4.dfsg-3/mimic__mpi_8c-source.html
Do you know any other approach I could use to compile MPI apps with no MPI support?
Thanks in advance!

You can easily run a "real" MPI application with a single process. In practice this even works without using mpiexec/mpirun, although I'm not sure whether that's officially supported. That said, a full and conforming one-process "serial" MPI implementation would probably become rather complex and a library in its own right - so in that case, why not just use a real, full MPI implementation?
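For example, with any of the common implementations (./app stands in for your binary):

    mpiexec -n 1 ./app   # one process, fully standard
    ./app                # often works too ("singleton" init), though support varies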
I hope you see the circle I'm trying to draw:
If you want full MPI behavior, just use an MPI implementation - even if it's limited to a single process.
In practice, applications that want to work both with and without MPI often roll their own MPI abstraction: domain-specific communication wrappers, #ifdef HAVE_MPI guards, or more complex macros.
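For illustration, here is a minimal sketch of that pattern; the my_comm_* names are hypothetical, not taken from any real project. Built with -DHAVE_MPI and an MPI compiler, the wrapper forwards to real MPI; built without, it degenerates to a one-process serial fallback:

    // comm.h - hypothetical communication wrapper (sketch)
    #ifdef HAVE_MPI
    #include <mpi.h>
    #endif

    inline void my_comm_init(int *argc, char ***argv) {
    #ifdef HAVE_MPI
        MPI_Init(argc, argv);
    #else
        (void)argc; (void)argv;   // nothing to do in serial mode
    #endif
    }

    inline int my_comm_rank() {
    #ifdef HAVE_MPI
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        return rank;
    #else
        return 0;                 // the only process is rank 0
    #endif
    }

    inline int my_comm_size() {
    #ifdef HAVE_MPI
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    #else
        return 1;                 // exactly one process
    #endif
    }

    inline void my_comm_finalize() {
    #ifdef HAVE_MPI
        MPI_Finalize();
    #endif
    }

Application code calls only the my_comm_* functions; the build decides whether they mean MPI or a no-op.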

Related

Linking against a library that uses MPI when I don't: what happens under the hood

I am linking against a library that is built with OpenMPI support for its internal processes.
My application is built with no MPI support, and I link against this library. I have no idea what is happening under the hood with regard to MPI. If library A loads/calls functions from OpenMPI, does this mean I can run my application with mpirun to get the library processes to distribute themselves? If I decide to make the application MPI-aware and want to use MPICH2 instead of OpenMPI, or if I want to use library B that is linked against MPICH instead of OpenMPI, will the library and my application behave themselves in their individual message spaces? Is it typical to force application developers to explicitly link an MPI implementation in order to use an MPI-enabled library?
The normal practice is that the developer ultimately compiles and links everything against a single MPI implementation. Where libraries depend on MPI, I typically see builds of the library for each MPI implementation available on a system. If you could somehow manage to link in two MPI implementations, and keep their separate MPI_Init calls and MPI_COMM_WORLD definitions straight in the separate pieces of code, it might even work. That would be really tenuous, though. Don't do it.
As for your earlier question, it is almost possible to have an application call into a library using MPI and just have it do the right thing. First, the code will have to call MPI_Init somewhere; whether that happens in the client or is wholly encapsulated in the library may vary. The library will have to know which MPI communicator it is supposed to use; typically the client code passes one in. Finally, the client code will have to account for the fact that it will run on all MPI processes, not just one of them. So if it does any I/O or other computation that should only happen in one process, you'll need to guard it accordingly.
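As a sketch of those conventions (libsolver and its solver_run function are hypothetical): the client initializes MPI, hands the library its communicator explicitly, and guards its own I/O by rank:

    #include <mpi.h>
    #include <cstdio>
    // #include "libsolver.h"  // hypothetical MPI-aware library

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);          // someone must call this exactly once

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Hypothetical library call: pass the communicator in explicitly,
        // so the library never has to assume MPI_COMM_WORLD.
        // solver_run(MPI_COMM_WORLD, /* ... */);

        // main() runs on *every* process, so restrict I/O to one rank.
        if (rank == 0)
            std::printf("result written by rank 0 only\n");

        MPI_Finalize();
        return 0;
    }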

Rust on grid computing

I'm looking to create Rust implementations of some small bioinformatics programs for my research. One of my main considerations is performance, and while I know that I could schedule the Rust program to run on a grid with qsub (the cluster I have access to uses Oracle Grid Engine), I'm worried that not calling MPI directly will cause performance issues for the Rust program.
Will scheduling the program without using an MPI library hinder performance greatly? Should I use an MPI library in Rust, and if so, are there any known MPI libraries for Rust? I've looked for one but I haven't found anything.
I have used several supercomputing facilities (I'm an astrophysicist) and have often faced the same problem: I know C/C++ very well but prefer to work with other languages.
In general, any approach other than MPI will do, but consider that often such supercomputers have heavily optimised MPI libraries, often tailored for the specific hardware integrated in the cluster. It is difficult to tell how much the performance of your Rust programs will be affected if you do not use MPI, but the safest bet is to stay with the MPI implementation provided on the cluster.
There is no performance penalty in using a Rust wrapper around a C library like an MPI library, as the bottleneck is the time needed to transfer data between nodes (e.g., via MPI_Send), not the negligible cost of an additional function call. (Moreover, for Rust there is not even an additional function call: calls through its FFI go directly to the C function.)
However, despite the very good FFI provided by Rust, it is not going to be easy to create MPI bindings. The problem lies in the fact that MPI is not a library but a specification. Popular MPI libraries are OpenMPI (http://www.open-mpi.org) and MPICH (http://www.mpich.org). Each differs slightly in how it implements the standard, and these differences are usually papered over with C preprocessor macros. Very few FFIs are able to deal with complex macros; I don't know how Rust scores here.
For instance, I am implementing an MPI program in Free Pascal, but I am not able to use the existing MPICH bindings (http://wiki.lazarus.freepascal.org/MPICH), as the cluster I am using provides its own MPI library and I prefer to use it for the reason stated above. I was unable to reuse the MPICH bindings because they assume that constants like MPI_BYTE are hardcoded integer constants, while in my case they are pointers to opaque structures that seem to be created when MPI_Init is called.
Julia bindings to MPI (https://github.com/lcw/MPI.jl) solve this problem by running C and Fortran programs during the installation that generate Julia code with the correct values for such constants. See e.g. https://github.com/lcw/MPI.jl/blob/master/deps/make_f_const.f
In my case I preferred to implement a middleware, i.e., a small C library which wraps MPI calls in a more "predictable" interface. (This is more or less what the Python and OCaml bindings do too; see https://forge.ocamlcore.org/projects/ocamlmpi/ and http://mpi4py.scipy.org.) Things are running smoothly; so far I haven't had any problems.
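A minimal sketch of such a middleware layer (the mw_* names are hypothetical): it hides implementation-specific constants like MPI_BYTE behind plain functions with a fixed C ABI, so the foreign-language side only ever sees ints and raw pointers:

    // mw.cpp - hypothetical middleware exposing MPI through a plain C ABI.
    // Compile with the cluster's own MPI compiler wrapper; the FFI side
    // then needs only these signatures, never MPI's macros or types.
    #include <mpi.h>

    extern "C" {

    int mw_init(void)     { return MPI_Init(nullptr, nullptr); }
    int mw_finalize(void) { return MPI_Finalize(); }

    int mw_rank(void) { int r; MPI_Comm_rank(MPI_COMM_WORLD, &r); return r; }
    int mw_size(void) { int s; MPI_Comm_size(MPI_COMM_WORLD, &s); return s; }

    // Send/receive raw bytes; MPI_BYTE stays on this side of the ABI, so
    // it no longer matters whether it is an integer or an opaque pointer.
    int mw_send_bytes(const void *buf, int n, int dest, int tag) {
        return MPI_Send(const_cast<void *>(buf), n, MPI_BYTE, dest, tag,
                        MPI_COMM_WORLD);
    }

    int mw_recv_bytes(void *buf, int n, int src, int tag) {
        return MPI_Recv(buf, n, MPI_BYTE, src, tag, MPI_COMM_WORLD,
                        MPI_STATUS_IGNORE);
    }

    } // extern "C"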
Will scheduling the program without using an MPI library hinder performance greatly?
There are lots of ways to carry out parallel computing. MPI is one, and, as the comments on your question indicate, you can call MPI from Rust with a bit of gymnastics.
But there are other approaches, like the PGAS family (Chapel, OpenSHMEM, Co-array Fortran), or alternative messaging like what Charm++ uses.
MPI is "simply" providing a (very useful, highly portable, aggressively optimized) messaging abstraction, but as long as you have some way to manage the parallelism, you can run anything on a cluster.

Getting OpenCL program code from GPU

I have a program which uses OpenCL to do math. How can I get the OpenCL source code that executes on my GPU when this program does its calculations?
The most straightforward approach is to look for the kernel string in the application. Sometimes you'll be able to just find its source lying in some .cl file, otherwise you can try to scan the application's binaries with something like strings. If the application is not purposefully obfuscating the kernel source, you're likely to find it using one of those methods.
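For example (OpenCL kernels are marked with the __kernel qualifier, which makes a handy search string; the_app is a placeholder for the program's binary):

    strings the_app | grep -B2 -A10 "__kernel"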
A more bulletproof approach would be to catch the strings provided to the OpenCL API. You can even provide your own OpenCL implementation that just prints out the kernel strings in the relevant cl function. It's actually pretty easy: start with pocl and change the implementation of clCreateProgramWithSource to print out the input strings - this is a trivial code change.
You can then install that modified version as an OpenCL implementation and make sure the application uses it. This might be tricky if the application requires certain OpenCL capabilities, but your implementation can of course lie about those.
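In the same spirit, an alternative to rebuilding pocl is an LD_PRELOAD shim that intercepts clCreateProgramWithSource, dumps the strings, and forwards to the real implementation. A sketch (error handling omitted; shim.so and the_opencl_app are placeholder names):

    // shim.cpp - interposer that dumps kernel source handed to the API.
    // Build: g++ -shared -fPIC shim.cpp -o shim.so -ldl
    // Run:   LD_PRELOAD=./shim.so ./the_opencl_app
    #define CL_TARGET_OPENCL_VERSION 120
    #include <CL/cl.h>
    #include <dlfcn.h>   // RTLD_NEXT (g++ defines _GNU_SOURCE by default)
    #include <cstdio>

    extern "C" cl_program clCreateProgramWithSource(
            cl_context context, cl_uint count, const char **strings,
            const size_t *lengths, cl_int *errcode_ret) {
        // Dump every source string the application passes in.
        for (cl_uint i = 0; i < count; ++i) {
            if (lengths && lengths[i])
                std::fwrite(strings[i], 1, lengths[i], stderr);
            else
                std::fputs(strings[i], stderr);   // NUL-terminated entry
            std::fputc('\n', stderr);
        }
        // Forward to the real OpenCL library.
        using fn_t = cl_program (*)(cl_context, cl_uint, const char **,
                                    const size_t *, cl_int *);
        static fn_t real =
            (fn_t)dlsym(RTLD_NEXT, "clCreateProgramWithSource");
        return real(context, count, strings, lengths, errcode_ret);
    }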
Notice that in the future, SPIR may make this sort of thing impossible - you'll be able to get an IR of the kernel, but not its source.
clGetProgramInfo(..., CL_PROGRAM_BINARIES, ...) gets you the compiled binary, but interpreting it depends on the architecture. Various SDKs have tools that might get you GPU assembly, though.
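For completeness, a sketch of pulling that binary out; it assumes program is a valid cl_program that was built for exactly one device:

    #define CL_TARGET_OPENCL_VERSION 120
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    // Write the device-specific binary of a built program to `path`.
    void dump_binary(cl_program program, const char *path) {
        size_t size = 0;
        clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                         sizeof(size), &size, nullptr);

        std::vector<unsigned char> binary(size);
        unsigned char *ptrs[1] = { binary.data() };   // one slot per device
        clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                         sizeof(ptrs), ptrs, nullptr);

        // The contents are architecture-dependent: PTX text on NVIDIA,
        // vendor ISA or an IR elsewhere.
        if (FILE *f = std::fopen(path, "wb")) {
            std::fwrite(binary.data(), 1, size, f);
            std::fclose(f);
        }
    }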

QThreads vs. pthreads

I have a quick question. I am supposed to create a small multithreaded program to grab data from multiple sensors, and I have knowledge of both pthreads and QThreads. I have access to both libraries. Personally, I am biased towards Qt because of its design and various functionalities. But is there a significant advantage to using one over the other?
Thanks
QThreads are built on top of pthreads. They provide an object-oriented abstraction, making it easier to work with threads. QThreads are also portable: they run on whatever system Qt supports, using the underlying thread system, while pthreads are specific to POSIX systems.
The almost-only disadvantage of using QThreads is that you'll need to link your application against Qt; this dependence could make it a little more difficult to distribute your application.
Be aware, though, that QThreads are managed through an event loop, so you can't just kill a thread the way you can with pthreads. If a thread is doing long, heavy work, it cannot be stopped until it returns control to its event loop. In some cases that matters.
I think at the heart of things, QThread under Linux uses pthreads; I'm not sure what is under the hood on the Windows side. Unless there are specific pthread API functions that you need that aren't available with QThread, I would stick with QThread just to benefit from the portability it gives you. I wouldn't expect any significant performance difference. QThread will also allow you to use the signal/slot mechanism across thread boundaries.
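A minimal sketch of that cross-thread signal/slot pattern using the worker-object idiom (the Sensor class is hypothetical, standing in for the question's sensor readers):

    #include <QCoreApplication>
    #include <QThread>
    #include <QObject>
    #include <QDebug>

    class Sensor : public QObject {          // hypothetical sensor reader
        Q_OBJECT
    public slots:
        void poll() { emit reading(42.0); }  // runs in the worker thread
    signals:
        void reading(double value);
    };

    int main(int argc, char **argv) {
        QCoreApplication app(argc, argv);

        QThread thread;
        Sensor sensor;
        sensor.moveToThread(&thread);   // sensor's slots now run in `thread`

        // Queued automatically: sender and receiver live in different threads.
        QObject::connect(&thread, &QThread::started, &sensor, &Sensor::poll);
        QObject::connect(&sensor, &Sensor::reading, &app, [&](double v) {
            qDebug() << "got reading" << v;  // &app context: main thread
            thread.quit();                   // stop the worker's event loop
            app.quit();
        });

        thread.start();
        int rc = app.exec();
        thread.wait();                  // join the worker before exiting
        return rc;
    }

    #include "main.moc"  // required when Q_OBJECT classes live in a .cpp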

Is porting qt to another OS as simple as this?

The article Porting Qt for Embedded Linux to Another Operating System lists five things you have to do to port Qt for Embedded Linux to another OS. From the article:
There are several issues to be aware of if you plan to do your own port to another operating system. In particular you must resolve Qt for Embedded Linux's shared memory and semaphores (used to share window regions), and you must provide something similar to Unix-domain sockets for inter-application communication. You must also provide a screen driver, and if you want to implement sound you must provide your own sound server. Finally you must modify the event dispatcher used by Qt for Embedded Linux.
Is it really this easy to port Qt to another OS, or have I missed some information?
Another important component to port would be QAtomic, to ensure that atomic operations and implicit sharing work correctly. See also:
http://labs.trolltech.com/blogs/2007/08/28/say-hello-to-qatomicint-and-qatomicpointer/
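To see why QAtomic is so central: every implicitly shared class (QString, QList, ...) reference-counts its payload through QAtomicInt, so copies and detaches all over Qt depend on these primitives being correct on the target platform. A sketch of that pattern (SharedData, acquire, and release are illustrative names, not Qt API):

    #include <QAtomicInt>

    struct SharedData {
        QAtomicInt ref;    // must be truly atomic on the target platform
        // ... payload ...
    };

    void acquire(SharedData *d) { d->ref.ref(); }  // atomic increment
    void release(SharedData *d) {
        if (!d->ref.deref())   // atomic decrement; false once count hits 0
            delete d;          // last owner frees the payload
    }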
Since Qt has been ported a large number of times, it seems logical that porting would be inherently simple. In reality, the difficulty depends on the platform you are porting to and how many features it currently supports.
Assuming you find all those things easy, then the port is easy.
After investigating this in more detail, I have come to the conclusion that the article "Porting Qt for Embedded Linux to Another Operating System" assumes you are porting Qt to a very "Linux-like" OS.
I have attempted this and currently making progress.
Some difficulties:
IDE - I have to manually add all the Qt files and fight the compiler with #ifdefs until everything builds with all dependencies in place.
Linux(ness) - I've had to disable all the Linux/Windows things that are not supported on my target OS: threads, sockets, processes. Even the timers are slightly different.
Tips:
Start small: I compiled QtCore as a standard lib within my IDE; next up is QtGui, which is a behemoth compared to QtCore.
I plan to run only a single QThread, so I had to artificially create a thread object to avoid null pointers. You cannot compile out the thread information, as it is key to all QObjects.
So far I have a QEventLoop running within a QCoreApplication.
I wrote some inline assembly but had serious difficulties with my IDE and compilation, so I left it in C++ and let the compiler handle it for me. Because I am single-threaded, I am not too concerned with the shared-data/exclusive-access guarantees required of the atomic operations.
