I am writing a machine learning toolkit to run an algorithm with different settings in parallel (each process runs the algorithm for one setting). I am trying to decide between mpi4py and Python's built-in multiprocessing.
There are a few pros and cons I am weighing.
Ease of use:
mpi4py: there seem to be more concepts to learn and a few more tricks needed to make it work well
multiprocessing: quite easy and clean API
Speed:
mpi4py: people say it is lower level, so I expect it could be faster than Python's multiprocessing?
multiprocessing: much slower compared with mpi4py?
Clean and short code:
mpi4py: seems to require more code
multiprocessing: preferred; easy-to-use API
The working context: I am aiming to run the code on a single computer or a GPU server, not across different machines on a network (which only MPI can do).
Since the main goal is machine learning, the parallelization does not need to be highly optimized; the key goal is to keep the code base easy, clean, and quick to maintain while still exploiting the benefits of parallelization.
Given this background, is multiprocessing enough, or is there a strong reason to use mpi4py?
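For illustration, here is a minimal sketch of what I have in mind with multiprocessing (run_algorithm and the settings list are placeholders for my actual code):

    from multiprocessing import Pool

    def run_algorithm(setting):
        # placeholder: train/evaluate the model for one hyperparameter setting
        return {"setting": setting, "score": 0.0}

    if __name__ == "__main__":
        settings = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]
        # one worker process per setting
        with Pool(processes=len(settings)) as pool:
            results = pool.map(run_algorithm, settings)
        print(results)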
With mpi4py you can divide the task across multiple processes, but on a single computer with a limited number of cores the benefit will be correspondingly limited. You might still find it handy during training, though.
mpi4py is constructed on top of the MPI-1/2 specifications and provides an object-oriented interface that closely follows the MPI-2 C++ bindings.
MPI for Python provides MPI bindings for the Python language, allowing programmers to exploit multiple-processor computing systems.
MPI for Python supports convenient, pickle-based communication of generic Python objects as well as fast, near C-speed, direct array-data communication of buffer-provider objects.
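For comparison, here is a minimal mpi4py sketch of the same one-setting-per-process pattern (run_algorithm and the settings are placeholders, as in the question; launched with something like mpiexec -n 3 python script.py):

    from mpi4py import MPI

    def run_algorithm(setting):
        # placeholder: train/evaluate the model for one hyperparameter setting
        return {"setting": setting, "score": 0.0}

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # one setting per MPI rank (the list length must equal the number of ranks)
        settings = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]
    else:
        settings = None

    # pickle-based scatter/gather handles generic Python objects
    setting = comm.scatter(settings, root=0)
    result = run_algorithm(setting)
    results = comm.gather(result, root=0)

    if rank == 0:
        print(results)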
Can you send MPI messages across two different MPI implementations?
Example: if I have MS-MPI installed on a Windows machine and Open-MPI on a Linux cluster (on the same network), can I pass messages between code running on the two different operating systems?
Basically multiple-program multiple-data (MPMD), but using Windows and Linux resources together.
I just need to know whether this is possible; if so, any further info would be nice but is not required.
No, that is not possible. You will run into serious trouble even if you try this with different versions or configurations of a single implementation.
MPI is targeted at homogeneous installations on HPC systems. The communication protocol is not standardized, only the programming interface is.
In general: no. This is not specified within the MPI standard, and most implementations do not support such jobs. It is a rather uncommon use case, I guess.
However, Intel MPI does provide cross-OS launching of jobs; see:
https://software.intel.com/en-us/mpi-developer-guide-linux-cross-os-launch-mode
Some detail:
Intel MPI, like other commercial MPI implementations, is based on the MPICH open-source project. I am not sure whether cross-OS launching can be achieved somehow via MPICH itself, though. A quick Google search only turned up negative, and possibly outdated, results, e.g.
https://lists.mcs.anl.gov/pipermail/mpich2-dev/2005-July/000085.html
I'm looking to create Rust implementations of some small bioinformatics programs for my research. One of my main considerations is performance, and while I know that I could schedule the Rust program to run on a grid with qsub - the cluster I have access to uses Oracle's GridEngine - I'm worried that the fact that I'm not calling MPI directly will cause performance issues with the Rust program.
Will scheduling the program without using an MPI library hinder performance greatly? Should I use an MPI library in Rust, and if so, are there any known MPI libraries for Rust? I've looked for one but I haven't found anything.
I have used several supercomputing facilities (I'm an astrophysicist) and have often faced the same problem: I know C/C++ very well but prefer to work with other languages.
In general, any approach other than MPI will do, but consider that often such supercomputers have heavily optimised MPI libraries, often tailored for the specific hardware integrated in the cluster. It is difficult to tell how much the performance of your Rust programs will be affected if you do not use MPI, but the safest bet is to stay with the MPI implementation provided on the cluster.
There is no performance penalty in using a Rust wrapper around a C library like an MPI library, as the bottleneck is the time needed to transfer data (e.g. via MPI_Send) between nodes, not the negligible cost of an additional function call. (Moreover, this is not even the case for Rust: as already stated above, there is no additional function call.)
However, despite the very good FFI provided by Rust, it is not going to be easy to create MPI bindings. The problem lies in the fact that MPI is not a library, but a specification. Popular MPI libraries are OpenMPI (http://www.open-mpi.org) and MPICH (http://www.mpich.org). Each of them differs slightly in the way they implement the standard, and they usually cover such differences using C preprocessor macros. Very few FFIs are able to deal with complex macros; I don't know how Rust scores here.
For instance, I am implementing an MPI program in Free Pascal, but I cannot use the existing MPICH bindings (http://wiki.lazarus.freepascal.org/MPICH), as the cluster I am using provides its own MPI library and I prefer to use it for the reason stated above. I was unable to reuse the MPICH bindings because they assume that constants like MPI_BYTE are hardcoded integer constants, whereas in my case they are pointers to opaque structures that appear to be created when MPI_Init is called.
Julia bindings to MPI (https://github.com/lcw/MPI.jl) solve this problem by running C and Fortran programs during the installation that generate Julia code with the correct values for such constants. See e.g. https://github.com/lcw/MPI.jl/blob/master/deps/make_f_const.f
In my case I preferred to implement a middleware, i.e., a small C library which wraps MPI calls with a more "predictable" interface. (This is more or less what the Python and OCaml bindings do too; see https://forge.ocamlcore.org/projects/ocamlmpi/ and http://mpi4py.scipy.org.) Things are running smoothly; so far I haven't had any problems.
Will scheduling the program without using an MPI library hinder performance greatly?
There are lots of ways to carry out parallel computing. MPI is one, and as comments to your question indicate you can call MPI from Rust with a bit of gymnastics.
But there are other approaches, like the PGAS family (Chapel, OpenSHMEM, Co-array Fortran), or alternative messaging like what Charm++ uses.
MPI is "simply" providing a (very useful, highly portable, aggressively optimized) messaging abstraction, but as long as you have some way to manage the parallelism, you can run anything on a cluster.
The makeCluster function in the SNOW package supports the cluster types "SOCK", "PVM", "MPI", and "NWS", but I'm not very clear on the differences among them, and more specifically which would be best for my program.
Currently I have a queue of tasks of varying lengths going into a load-balancing cluster with clusterApplyLB, and I am using a 64-bit, 32-core Windows machine.
I am looking for a brief description of the differences among the four cluster types, which would be best for my use and why.
Welcome to parallel programming. You may want to peruse the vignette of the excellent parallel package that comes with R, as it gives a general introduction. It also gives you an idea of what you can or cannot do on Windows -- in short, PVM and MPI are standard parallel programming approaches supported by namesake libraries. These exist on Windows, but are less frequently used and often not as mature as their Unix counterparts.
If you want to stick with snow, your options are essentially limited to SOCK-type clusters. Again, the package documentation will have pointers.
Can someone elaborate on the differences between the OpenMPI and MPICH implementations of MPI?
Which of the two is the better implementation?
Purpose
First, it is important to recognize how MPICH and Open-MPI are different, i.e. that they are designed to meet different needs. MPICH is supposed to be a high-quality reference implementation of the latest MPI standard and the basis for derivative implementations that meet special-purpose needs. Open-MPI targets the common case, both in terms of usage and network conduits.
Support for Network Technology
Open-MPI documents their network support here. MPICH lists this information in the README distributed with each version (e.g. this is for 3.2.1). Note that because both Open-MPI and MPICH support the OFI (aka libfabric) networking layer, they support many of the same networks. However, libfabric is a multi-faceted API, so not every network may be supported the same in both (e.g. MPICH has an OFI-based IBM Blue Gene/Q implementation, but I'm not aware of equivalent support in Open-MPI). However, the OFI-based implementations of both MPICH and Open-MPI work on shared memory, Ethernet (via TCP/IP), Mellanox InfiniBand, Intel Omni Path, and likely other networks. Open-MPI also supports all of these networks and others natively (i.e. without OFI in the middle).
In the past, a common complaint about MPICH was that it did not support InfiniBand, whereas Open-MPI did. However, MVAPICH and Intel MPI (among others) - both of which are MPICH derivatives - support InfiniBand, so if one is willing to define MPICH as "MPICH and its derivatives", then MPICH has extremely broad network support, including both InfiniBand and proprietary interconnects like Cray Seastar, Gemini and Aries as well as IBM Blue Gene (/L, /P and /Q). Open-MPI also supports the Cray Gemini interconnect, but its usage is not supported by Cray. More recently, MPICH supported InfiniBand through a netmod (now deprecated), but MVAPICH2 has extensive optimizations that make it the preferred implementation in nearly all cases.
Feature Support from the Latest MPI Standard
An orthogonal axis to hardware/platform support is coverage of the MPI standard. Here MPICH is usually far and away superior. MPICH has been the first implementation of every single release of the MPI standard, from MPI-1 to MPI-3. Open-MPI has only recently supported MPI-3 and I find that some MPI-3 features are buggy on some platforms (MPICH is not bug-free, of course, but bugs in MPI-3 features have been far less common).
Historically, Open-MPI has not had holistic support for MPI_THREAD_MULTIPLE, which is critical for some applications. It might be supported on some platforms but cannot generally be assumed to work. On the other hand, MPICH has had holistic support for MPI_THREAD_MULTIPLE for many years, although the implementation is not always high-performance (see "Locking Aspects in Multithreaded MPI Implementations" for one analysis).
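To make this concrete, with the mpi4py Python bindings discussed earlier in this document you can request MPI_THREAD_MULTIPLE and check what the implementation actually provides (a minimal sketch; whether you get THREAD_MULTIPLE back depends on the MPI library and how it was built):

    import mpi4py
    # request the "multiple" thread level before MPI is initialized
    mpi4py.rc.thread_level = "multiple"

    from mpi4py import MPI

    # the thread level the underlying MPI implementation actually granted
    provided = MPI.Query_thread()
    if provided < MPI.THREAD_MULTIPLE:
        print("MPI_THREAD_MULTIPLE not available; provided level:", provided)
    else:
        print("MPI_THREAD_MULTIPLE is supported")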
Another feature that was broken in Open-MPI 1.x was one-sided communication, aka RMA. This has more recently been fixed, and I find, as a very heavy user of these features, that it is generally working well in Open-MPI 3.x (see e.g. the ARMCI-MPI test matrix in Travis CI for results showing RMA working with both implementations, at least in shared memory). I've seen similar positive results on Intel Omni Path, but have not tested Mellanox InfiniBand.
Process Management
One area where Open-MPI used to be significantly superior was the process manager. The old MPICH launcher (MPD) was brittle and hard to use. Fortunately, it has been deprecated for many years (see the MPICH FAQ entry for details). Thus, criticism of MPICH because of MPD is spurious.
The Hydra process manager is quite good and has similar usability and a similar feature set to ORTE (in Open-MPI); e.g. both support HWLOC for control over process topology. There are reports of Open-MPI process launching being faster than MPICH derivatives for larger jobs (1000+ processes), but since I don't have firsthand experience here, I am not comfortable stating any conclusions. Such performance issues are usually network-specific and sometimes even machine-specific.
I have found Open-MPI to be more robust when using MacOS with a VPN, whereas MPICH may hang during startup due to hostname-resolution issues. As this is a bug, the issue may disappear in the future.
Binary Portability
While both MPICH and Open-MPI are open-source software that can be compiled on a wide range of platforms, the portability of MPI libraries in binary form, or programs linked against them, is often important.
MPICH and many of its derivatives support ABI compatibility (website), which means that the binary interface to the library is constant and therefore one can compile with mpi.h from one implementation and then run with another. This is true even across multiple versions of the libraries. For example, I frequently compile Intel MPI but LD_PRELOAD a development version of MPICH at runtime. One of the big advantages of ABI compatibility is that ISVs (Independent Software Vendors) can release binaries compiled against only one member of the MPICH family.
ABI is not the only type of binary compatibility. The scenarios described above assume that users employ the same version of the MPI launcher (usually mpirun or mpiexec, along with its compute-node daemons) and MPI library everywhere. This is not necessarily the case for containers.
While Open-MPI does not promise ABI compatibility, they have invested heavily in supporting containers (docs, slides). This requires great care in maintaining compatibility across different versions of the MPI launcher, launcher daemons, and MPI Library, because a user may launch jobs using a newer version of the MPI launcher than the launcher daemons in the container support. Without careful attention to launcher interface stability, container jobs will not launch unless the versions of each component of the launcher are compatible. This is not an insurmountable problem:
The workaround used by the Docker world, for example, is to containerize the infrastructure along with the application. In other words, you include the MPI daemon in the container with the application itself, and then require that all containers (mpiexec included) be of the same version. This avoids the issue as you no longer have cross-version infrastructure operations.
I acknowledge Ralph Castain of the Open-MPI team for explaining the container issues to me. The immediately preceding quote is his.
Platform-Specific Comparison
Here is my evaluation on a platform-by-platform basis:
Mac OS: both Open-MPI and MPICH should work just fine. To get the latest features of the MPI-3 standard, you need to use a recent version of Open-MPI, which is available from Homebrew. There is no reason to think about MPI performance if you're running on a Mac laptop.
Linux with shared-memory: both Open-MPI and MPICH should work just fine. If you want a release version that supports all of MPI-3 or MPI_THREAD_MULTIPLE, you probably need MPICH though, unless you build Open-MPI yourself, because e.g. Ubuntu 16.04 only provides the ancient version 1.10 via APT. I am not aware of any significant performance differences between the two implementations. Both support single-copy optimizations if the OS allows them.
Linux with Mellanox InfiniBand: use Open-MPI or MVAPICH2. If you want a release version that supports all of MPI-3 or MPI_THREAD_MULTIPLE, you likely need MVAPICH2 though. I find that MVAPICH2 performs very well but haven't done a direct comparison with OpenMPI on InfiniBand, in part because the features for which performance matters most to me (RMA aka one-sided) have been broken in Open-MPI in the past.
Linux with Intel Omni Path (or its predecessor, True Scale): I have used MVAPICH2, Intel MPI, MPICH, and Open-MPI on such systems, and all of them work. Intel MPI tends to be the most optimized, while Open-MPI delivered the best performance of the open-source implementations because it has a well-optimized PSM2-based back-end. I have some notes on GitHub on how to build different open-source implementations, but such information goes stale rather quickly.
Cray or IBM supercomputers: MPI comes installed on these machines automatically and it is based upon MPICH in both cases. There have been demonstrations of MPICH on Cray XC40 (here) using OFI, Intel MPI on Cray XC40 (here) using OFI, MPICH on Blue Gene/Q using OFI (here), and Open-MPI on Cray XC40 using both OFI and uGNI (here), but none of these are vendor supported.
Windows: I see no point in running MPI on Windows except through a Linux VM, but both Microsoft MPI and Intel MPI support Windows and are MPICH-based. I have heard reports of successful builds of MPICH or Open-MPI using Windows Subsystem for Linux but have no personal experience.
Notes
In full disclosure, I currently work for Intel in a research/pathfinding capacity (i.e. I do not work on any Intel software products) and formerly worked for Argonne National Lab for five years, where I collaborated extensively with the MPICH team.
If you are doing development rather than running a production system, go with MPICH. MPICH has a built-in debugger, while Open-MPI did not the last time I checked.
In production, Open-MPI most likely will be faster. But then you may want to research other alternatives, such as Intel MPI.
I concur with the previous poster. Try both to see which one your application runs faster on, then use that one for production. They are both standards-compliant. If it is your desktop, either is fine. OpenMPI comes out of the box on MacBooks, and MPICH seems to be more Linux/Valgrind friendly. It is between you and your toolchain.
If it is a production cluster you need to do more extensive benchmarking to make sure it is optimized to your network topology. Configuring it on a production cluster will be the main difference in terms of your time as you will have to RTFM.
Both are standards-compliant, so it shouldn't matter which you use from a correctness point of view. Unless there is some feature you need, such as specific debug extensions, benchmark both and pick whichever is faster for your apps on your hardware. Also consider that there are other MPI implementations that might give better performance or compatibility, such as MVAPICH (which can have the best InfiniBand performance) or Intel MPI (which is widely supported by ISVs). HP worked hard to get their MPI qualified with lots of ISV codes too, but I'm not sure how it is faring after being sold on to Platform...
From my experience, one good feature that OpenMPI supports but MPICH does not is process affinity. For example, in OpenMPI, using -npersocket you can set the number of ranks launched on each socket. OpenMPI's rankfile is also quite handy when you want to pin ranks to specific cores or oversubscribe them.
Lastly, if you need to control the mapping of ranks to cores, I would definitely suggest writing and compiling your code using OpenMPI.