Using ScaLAPACK routine from MPI C program - mpi

I currently have an MPI program written in C and I want to use a routine from ScaLAPACK.
I'm working on a parallel version of LDA, and one step is inverting a matrix.
I found a routine in ScaLAPACK that solves this pdgetri.f (it's written in fortran, I'm not sure that a c routine exists), but I'm not sure how to configure it to work. I'm using Windows and an Intel Dual Core Laptop. The purpose is more didactic than for performance.

SCALAPACK relies on BLACS to provide abstraction to whatever message passing system is in use. If you have an existing MPI communicator established in your code, you can use blacs_gridmap to initialise a BLACS context which is mapped onto your communicator. That context can then be used to create SCALAPACK distributed arrays and those arrays then passed to SCALPACK routines which will then operate on them.
How you tackle the C-Fortran interfacing problem will depend a lot on what compiler(s) you are using. If you have a "modern" compiler which supports Fortran 2003 features, you can use the C interoperability language features to write an interface wrapper for the functions you need, and then call them directly. On UNIX/LINUX style systems, F2C style interfacing was the defacto way to call Fortran from C, although some of the details where usually compiler specific. I don't use Windows at all, so I can really help you if you can't use Fortran 2003 interoperability.

Related

Writing OpenCl Kernels in Python using PyOpenCl

When I write code in PyOpenCl, do I still need to write the kernels in C, or can I write them somehow in Python?
Yes, you still need to write the kernels in C.
It really is not much of a pain to deal with. And if you want a bit more abstraction, you can create a domain specific language with Python that maps to parts of C kernels.
The reason C is required for writing kernels is because OpenCL exists to create extremely performant applications. In order to make the most out of a GPU, you need to control the exact on-chip operations that the application does (such as bitwise operations), and how the application allocates the GPU's memory spaces (global, shared, and local). C is a great language for having that sort of control.

Linking Library that uses MPI, I don't, what happens behind the hood

I am linking against a library that is built with OpenMPI support for internal processes.
My application is being built with no MPI support, and I link against this library. I have no idea what is happening behind the hood with regards to mpi. If library A loads/calls functions from openmpi, does this mean I can run my application with runmpi to get the library processes to distribute themselves? If I decide to make the application MPI aware and want to use mpich2 instead of OpenMPI or if I want to use Library B that is linked against mpich instead of openmpi, will the library and my application behave themselves in their individual message spaces? Is it typical to force application developers to explicitly link an mpi implementation to use an mpi-enabled library?
The normal practice is that the developer ultimately compiles and links everything against a single MPI implementation. Where there are libraries depending on MPI, I typically see builds of the library for each available MPI implementation on a system. If you could somehow manage to link in two MPI implementations, and come up with their separate MPI_Init and MPI_COMM_WORLD definitions to use in the separate pieces of code, it might even work. That would be really tenuous, though. Don't do it.
As for your earlier question, it is almost possible to have an application call into a library using MPI and just have it do the right thing. First, the code will have to call MPI_Init somewhere. whether that's in the client or wholly encapsulated in the library may vary. The library will have to know what MPI communicator it's supposed to use; typically the client code would pass one in. Finally, the client code will have to take account of the fact that it will run on all MPI processes, not just one of them. So if it does any IO or other computation that should only happen in one process, then you'll need to set conditions accordingly.

Rust on grid computing

I'm looking to create Rust implementations of some small bioinformatics programs for my research. One of my main considerations is performance, and while I know that I could schedule the Rust program to run on a grid with qsub - the cluster I have access to uses Oracle's GridEngine - I'm worried that the fact that I'm not calling MPI directly will cause performance issues with the Rust program.
Will scheduling the program without using an MPI library hinder performance greatly? Should I use an MPI library in Rust, and if so, are there any known MPI libraries for Rust? I've looked for one but I haven't found anything.
I have used several supercomputing facilities (I'm an astrophysicist) and have often faced the same problem: I know C/C++ very well but prefer to work with other languages.
In general, any approach other than MPI will do, but consider that often such supercomputers have heavily optimised MPI libraries, often tailored for the specific hardware integrated in the cluster. It is difficult to tell how much the performance of your Rust programs will be affected if you do not use MPI, but the safest bet is to stay with the MPI implementation provided on the cluster.
There is no performance penalty in using a Rust wrapper around a C library like a MPI library, as the bottleneck is the time needed to transfer data (e.g. via a MPI_Send) between nodes, not the negligible cost of an additional function call. (Moreover, this is not the case for Rust: there is no additional function call, as already stated above.)
However, despite the very good FFI provided by Rust, it is not going to be easy to create MPI bindings. The problem lies in the fact that MPI is not a library, but a specification. Popular MPI libraries are OpenMPI (http://www.open-mpi.org) and MPICH (http://www.mpich.org). Each of them differs slightly in the way they implement the standard, and they usually cover such differences using C preprocessor macros. Very few FFIs are able to deal with complex macros; I don't know how Rust scores here.
As an instance, I am implementing an MPI Program in Free Pascal but I am not able to use the existing MPICH bindings (http://wiki.lazarus.freepascal.org/MPICH), as the cluster I am using provides its own MPI library and I prefer to use this one for the reason stated above. I was unable to reuse MPICH bindings, as they assumed that constants like MPI_BYTE were hardcoded integer constants. But in my case they are pointers to opaque structures that seem to be created when MPI_Init is called.
Julia bindings to MPI (https://github.com/lcw/MPI.jl) solve this problem by running C and Fortran programs during the installation that generate Julia code with the correct values for such constants. See e.g. https://github.com/lcw/MPI.jl/blob/master/deps/make_f_const.f
In my case I preferred to implement a middleware, I.e., a small C library which wraps MPI calls with a more "predictable" interface. (This is more or less what the Python and Ocaml bindings do too, see https://forge.ocamlcore.org/projects/ocamlmpi/ and http://mpi4py.scipy.org.) Things are running smoothly, so far I haven't got any problem.
Will scheduling the program without using an MPI library hinder performance greatly?
There are lots of ways to carry out parallel computing. MPI is one, and as comments to your question indicate you can call MPI from Rust with a bit of gymnastics.
But there are other approaches, like the PGAS family (Chapel, OpenSHMEM, Co-array Fortran), or alternative messaging like what Charm++ uses.
MPI is "simply" providing a (very useful, highly portable, aggressively optimized) messaging abstraction, but as long as you have some way to manage the parallelism, you can run anything on a cluster.

Howto compile MPI application in "serial" mode (without using MPI compiler)?

This question might sound a bit weird...
Imagine I have an MPI application, but I don't have a system with MPI installed.
So I want to compile the application with no MPI support (1-process, 1-thread) without modifying source code.
Is that possible?
I found somewhere a "mimic_mpi.h" wrapper which is supposed to do exactly what I want. But there were some MPI functions missing in there (e.g., MPI_Cart_create, MPI_Cart_get, etc.), so I didn't succeed.
mimic_mpi.h http://openmx.sourcearchive.com/documentation/3.2.4.dfsg-3/mimic__mpi_8h-source.html
mimic_mpi.c http://openmx.sourcearchive.com/documentation/3.2.4.dfsg-3/mimic__mpi_8c-source.html
Do you know any other approach I could use to compile MPI apps with no MPI support?
Thanks in advance!
You can run a "real" MPI application easily with a single process. In practice this even works without using mpiexec/mpirun although I'm not sure if that's officially supported. That said a full and confirming 1-process MPI "serial" implementation would probably become rather complex and its own library - so in that case, why not just use a real full MPI implementation?
I hope you see the circle I'm trying to draw:
If you want full MPI behavior, just use an MPI implementation - regardless if it's just limited to a single process.
In practice, applications that want to be able to function with or without MPI often seem to use their own MPI abstractions using domain specific communication wrappers, #ifdef HAVE_MPI or more complex macros.

Can C/C++ software be compiled into bytecode for later execution? (Architecture independent unix software.)

I would want to compile existing software into presentation that can later be run on different architectures (and OS).
For that I need a (byte)code that can be easily run/emulated on another arch/OS (LLVM IR? Some RISC assemby?)
Some random ideas:
Compiling into JVM bytecode and running with java. Too restricting? C-compilers available?
MS CIL. C-Compilers available?
LLVM? Can Intermediate representation be run later?
Compiling into RISC arch such as MMIX. What about system calls?
Then there is the system call mapping thing, but e.g. BSD have system call translation layers.
Are there any already working systems that compile C/C++ into something that can later be run with an interpreter on another architecture?
Edit
Could I compile existing unix software into not-so-lowlevel binary, which could be "emulated" more easily than running full x86 emulator? Something more like JVM than XEN HVM.
There are several C to JVM compilers listed on Wikipedia's JVM page. I've never tried any of them, but they sound like an interesting exercise to build.
Because of its close association with the Java language, the JVM performs the strict runtime checks mandated by the Java specification. That requires C to bytecode compilers to provide their own "lax machine abstraction", for instance producing compiled code that uses a Java array to represent main memory (so pointers can be compiled to integers), and linking the C library to a centralized Java class that emulates system calls. Most or all of the compilers listed below use a similar approach.
C compiled to LLVM bit code is not platform independent. Have a look at Google portable native client, they are trying to address that.
Adobe has alchemy which will let you compile C to flash.
There are C to Java or even JavaScript compilers. However, due to differences in memory management, they aren't very usable.
Web Assembly is trying to address that now by creating a standard bytecode format for the web, but unlike the JVM bytecode, Web Assembly is more low level, working at the abstraction level of C/C++, and not Java, so it's more like what's typically called an "assembly language", which is what C/C++ code is normally compiled to.
LLVM is not a good solution for this problem. As beautiful as LLVM IR is, it is by no means machine independent, nor was it intended to be. It is very easy, and indeed necessary in some languages, to generate target dependent LLVM IR: sizeof(void*), for example, will be 4 or 8 or whatever when compiled into IR.
LLVM also does nothing to provide OS independence.
One interesting possibility might be QEMU. You could compile a program for a particular architecture and then use QEMU user space emulation to run it on different architectures. Unfortunately, this might solve the target machine problem, but doesn't solve the OS problem: QEMU Linux user mode emulation only works on Linux systems.
JVM is probably your best bet for both target and OS independence if you want to distribute binaries.
As Ankur mentions, C++/CLI may be a solution. You can use Mono to run it on Linux, as long as it has no native bits. But unless you already have a code base you are trying to port at minimal cost, maybe using it would be counter productive. If it makes sense in your situation, you should go with Java or C#.
Most people who go with C++ do it for performance reasons, but unless you play with very low level stuff, you'll be done coding earlier in a higher level language. This in turn gives you the time to optimize so that by the time you would have been done in C++, you'll have an even faster version in whatever higher level language you choose to use.
The real problem is that C and C++ are not architecture independent languages. You can write things that are reasonably portable in them, but the compiler also hardcodes aspects of the machine via your code. Think about, for example, sizeof(long). Also, as Richard mentions, there's no OS independence. So unless the libraries you use happen to have the same conventions and exist on multiple platforms then it you wouldn't be able to run the application.
Your best bet would be to write your code in a more portable language, or provide binaries for the platforms you care about.

Resources