Multiple Programs (vs. Kernels) in an OpenCL application - opencl

One program can contain multiple kernels. Does it make any difference if I break my kernels across multiple programs? Can they all use the same context and run on the same devices and queues?
I'm playing around with OpenCL, and happen to be working in PyOpenCL. The structure is such that the command queue is owned by a class which compiles several kernels into a program. I am implementing a few subclasses that implement some kernels that I'll run on the base class' queue. Do I need to inject those kernels back into the base class and compile it into one program, or can I compile separate programs for each subclass that all rely on the context in the base class?

An OpenCL context can have multiple programs associated with it, so you can split your kernels across several programs and everything will still work.

Related

Writing OpenCl Kernels in Python using PyOpenCl

When I write code in PyOpenCl, do I still need to write the kernels in C, or can I write them somehow in Python?
Yes, you still need to write the kernels in C.
It really is not much of a pain to deal with. And if you want a bit more abstraction, you can create a domain specific language with Python that maps to parts of C kernels.
The reason C is required for writing kernels is because OpenCL exists to create extremely performant applications. In order to make the most out of a GPU, you need to control the exact on-chip operations that the application does (such as bitwise operations), and how the application allocates the GPU's memory spaces (global, shared, and local). C is a great language for having that sort of control.

Linking Library that uses MPI, I don't, what happens behind the hood

I am linking against a library that is built with OpenMPI support for internal processes.
My application is being built with no MPI support, and I link against this library. I have no idea what is happening behind the hood with regards to mpi. If library A loads/calls functions from openmpi, does this mean I can run my application with runmpi to get the library processes to distribute themselves? If I decide to make the application MPI aware and want to use mpich2 instead of OpenMPI or if I want to use Library B that is linked against mpich instead of openmpi, will the library and my application behave themselves in their individual message spaces? Is it typical to force application developers to explicitly link an mpi implementation to use an mpi-enabled library?
The normal practice is that the developer ultimately compiles and links everything against a single MPI implementation. Where there are libraries depending on MPI, I typically see builds of the library for each available MPI implementation on a system. If you could somehow manage to link in two MPI implementations, and come up with their separate MPI_Init and MPI_COMM_WORLD definitions to use in the separate pieces of code, it might even work. That would be really tenuous, though. Don't do it.
As for your earlier question, it is almost possible to have an application call into a library using MPI and just have it do the right thing. First, the code will have to call MPI_Init somewhere. whether that's in the client or wholly encapsulated in the library may vary. The library will have to know what MPI communicator it's supposed to use; typically the client code would pass one in. Finally, the client code will have to take account of the fact that it will run on all MPI processes, not just one of them. So if it does any IO or other computation that should only happen in one process, then you'll need to set conditions accordingly.

Rust on grid computing

I'm looking to create Rust implementations of some small bioinformatics programs for my research. One of my main considerations is performance, and while I know that I could schedule the Rust program to run on a grid with qsub - the cluster I have access to uses Oracle's GridEngine - I'm worried that the fact that I'm not calling MPI directly will cause performance issues with the Rust program.
Will scheduling the program without using an MPI library hinder performance greatly? Should I use an MPI library in Rust, and if so, are there any known MPI libraries for Rust? I've looked for one but I haven't found anything.
I have used several supercomputing facilities (I'm an astrophysicist) and have often faced the same problem: I know C/C++ very well but prefer to work with other languages.
In general, any approach other than MPI will do, but consider that often such supercomputers have heavily optimised MPI libraries, often tailored for the specific hardware integrated in the cluster. It is difficult to tell how much the performance of your Rust programs will be affected if you do not use MPI, but the safest bet is to stay with the MPI implementation provided on the cluster.
There is no performance penalty in using a Rust wrapper around a C library like a MPI library, as the bottleneck is the time needed to transfer data (e.g. via a MPI_Send) between nodes, not the negligible cost of an additional function call. (Moreover, this is not the case for Rust: there is no additional function call, as already stated above.)
However, despite the very good FFI provided by Rust, it is not going to be easy to create MPI bindings. The problem lies in the fact that MPI is not a library, but a specification. Popular MPI libraries are OpenMPI (http://www.open-mpi.org) and MPICH (http://www.mpich.org). Each of them differs slightly in the way they implement the standard, and they usually cover such differences using C preprocessor macros. Very few FFIs are able to deal with complex macros; I don't know how Rust scores here.
As an instance, I am implementing an MPI Program in Free Pascal but I am not able to use the existing MPICH bindings (http://wiki.lazarus.freepascal.org/MPICH), as the cluster I am using provides its own MPI library and I prefer to use this one for the reason stated above. I was unable to reuse MPICH bindings, as they assumed that constants like MPI_BYTE were hardcoded integer constants. But in my case they are pointers to opaque structures that seem to be created when MPI_Init is called.
Julia bindings to MPI (https://github.com/lcw/MPI.jl) solve this problem by running C and Fortran programs during the installation that generate Julia code with the correct values for such constants. See e.g. https://github.com/lcw/MPI.jl/blob/master/deps/make_f_const.f
In my case I preferred to implement a middleware, I.e., a small C library which wraps MPI calls with a more "predictable" interface. (This is more or less what the Python and Ocaml bindings do too, see https://forge.ocamlcore.org/projects/ocamlmpi/ and http://mpi4py.scipy.org.) Things are running smoothly, so far I haven't got any problem.
Will scheduling the program without using an MPI library hinder performance greatly?
There are lots of ways to carry out parallel computing. MPI is one, and as comments to your question indicate you can call MPI from Rust with a bit of gymnastics.
But there are other approaches, like the PGAS family (Chapel, OpenSHMEM, Co-array Fortran), or alternative messaging like what Charm++ uses.
MPI is "simply" providing a (very useful, highly portable, aggressively optimized) messaging abstraction, but as long as you have some way to manage the parallelism, you can run anything on a cluster.

How to get all running processes in Qt

I have two questions:
Is there any API in Qt for getting all the processes that are running right now?
Given the name of a process, can I check if there is such a process currently running?
Process APIs are notoriously platform-dependent. Qt provides just the bare minimum for spawning new processes with QProcess. Interacting with any processes on the system (that you didn't start) is out of its depth.
It's also beyond the reach of things like Boost.Process. Well, at least for now. Note their comment:
Boost.Process' long-term goal is to provide a portable abstraction layer over the operating system that allows the programmer to manage any running process, not only those spawned by it. Due to the complexity in offering such an interface, the library currently focuses on child process management alone.
I'm unaware of any good C++ library for cross-platform arbitrary process listing and management. You kind of have to just pick the platforms you want to support and call their APIs. (Or call out to an external utility of some kind that will give you back the info you need.)

Howto compile MPI application in "serial" mode (without using MPI compiler)?

This question might sound a bit weird...
Imagine I have an MPI application, but I don't have a system with MPI installed.
So I want to compile the application with no MPI support (1-process, 1-thread) without modifying source code.
Is that possible?
I found somewhere a "mimic_mpi.h" wrapper which is supposed to do exactly what I want. But there were some MPI functions missing in there (e.g., MPI_Cart_create, MPI_Cart_get, etc.), so I didn't succeed.
mimic_mpi.h http://openmx.sourcearchive.com/documentation/3.2.4.dfsg-3/mimic__mpi_8h-source.html
mimic_mpi.c http://openmx.sourcearchive.com/documentation/3.2.4.dfsg-3/mimic__mpi_8c-source.html
Do you know any other approach I could use to compile MPI apps with no MPI support?
Thanks in advance!
You can run a "real" MPI application easily with a single process. In practice this even works without using mpiexec/mpirun although I'm not sure if that's officially supported. That said a full and confirming 1-process MPI "serial" implementation would probably become rather complex and its own library - so in that case, why not just use a real full MPI implementation?
I hope you see the circle I'm trying to draw:
If you want full MPI behavior, just use an MPI implementation - regardless if it's just limited to a single process.
In practice, applications that want to be able to function with or without MPI often seem to use their own MPI abstractions using domain specific communication wrappers, #ifdef HAVE_MPI or more complex macros.

Resources