This is how we use the MPI_Init function:
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    …
}
Why does MPI_Init take pointers to argc and argv instead of their values?
According to the answer given here:
Passing arguments via command line with MPI
Most MPI implementations will remove all the mpirun-related arguments in this function so that, after calling it, you can address command line arguments as though it were a normal (non-mpirun) command execution.
i.e. after
mpirun -np 10 myapp myparam1 myparam2
argc = 7 (?) because of the mpirun parameters (it also seems to add some), and the indices of myparam1 and myparam2 are unknown,
but after
MPI_Init(&argc, &argv)
argc = 3, myparam1 is at argv[1], and myparam2 is at argv[2].
Apparently this is outside the standard, but I've tested it with MPICH on Linux and it certainly seems to be the case. Without this behaviour it would be very difficult (impossible?) to distinguish application parameters from mpirun parameters.
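A minimal sketch of how to observe this (assuming an implementation such as MPICH that filters the launcher-related arguments, as described in the quoted answer): print argv before and after the call.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Before MPI_Init, argv may still contain launcher-related arguments. */
    for (int i = 0; i < argc; ++i)
        printf("before MPI_Init: argv[%d] = %s\n", i, argv[i]);

    MPI_Init(&argc, &argv);   /* may rewrite argc/argv in place */

    /* Afterwards, only the application's own arguments should remain. */
    for (int i = 0; i < argc; ++i)
        printf("after MPI_Init:  argv[%d] = %s\n", i, argv[i]);

    MPI_Finalize();
    return 0;
}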
My guess: it potentially allows MPI to remove its own arguments from the command line.
Passing the argument count by pointer allows its value to be modified from the point of view of main.
According to the Open MPI man pages:
MPI_Init(3) man page
Open MPI accepts the C/C++ argc and argv arguments to main, but neither modifies, interprets, nor distributes them.
I'm not an expert, but I believe the simple answer is that each node you're working with runs its own copy of the code. Passing these arguments allows each of the nodes to have access to argc and argv even though they were not passed to it through the command-line interface.
The original or master node that calls MPI_Init is passed these arguments; MPI_Init allows the other nodes to access them as well.
It is less overhead to just pass two pointers.
I'm trying to call a function through a function pointer from GNU C inline assembly. Minimalist example:
#include <iostream>

void someFunction(){
    std::cout << "Hello World" << std::endl;
}

void (*functionPointer)() = someFunction;

int main(){
    __asm__("call *%P0" : : "m"(functionPointer));
    return 0;
}
With GCC 10.3.0, this results in the following linker error:
relocation truncated to fit: R_X86_64_32S against `.bss'
collect2.exe: error: ld returned 1 exit status
Any ideas?
This is unsafe. First of all, a function call has to be assumed to clobber all the call-clobbered registers, and the red zone below RSP, so it's a huge pain to do it safely from GNU C inline asm. Calling printf in extended inline ASM shows a safe example that declares clobbers on all the relevant integer, mmx, x87, and xmm registers, and avoids the red zone.
*%P0 makes GCC print *functionPointer instead of *functionPointer(%rip), which can't link into a PIE executable because it's a 32-bit-sign-extended absolute addressing mode. 32-bit absolute addresses no longer allowed in x86-64 Linux?
Remember, you're doing a memory-indirect call, so you want a normal data addressing mode, not the bare symbol name. So you want just *%0, exactly as if it might be a register, so GCC could emit call *%rax if it wanted to, with an "rm" constraint.
https://godbolt.org/z/hWeexz8cf
%P only makes sense when you want a direct call, like asm("call %P0" : : "i"(callee));, e.g. call callee not call *callee.
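For illustration, a sketch of the memory-indirect form described above, written in plain C (printf-family output instead of std::cout) and assuming x86-64; the clobber list is illustrative rather than exhaustive, and a truly safe version would also have to account for FP/vector state and the red zone:
#include <stdio.h>

void someFunction(void) { puts("Hello World"); }
void (*functionPointer)(void) = someFunction;

int main(void)
{
    /* "rm" lets GCC pick either a memory operand, giving the PIE-friendly
       call *functionPointer(%rip), or a register, giving e.g. call *%rbx. */
    __asm__ volatile("call *%0"
                     : /* no outputs */
                     : "rm"(functionPointer)
                     : "rax", "rcx", "rdx", "rsi", "rdi",
                       "r8", "r9", "r10", "r11", "memory");
    return 0;
}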
Currently I have a Python program (serial) that calls a C executable (parallel through MPI) through subprocess.run. However, this is a terribly clunky implementation as it means I have to pass some very large arrays back and forth from the Python to the C program using the file system. I would like to be able to directly pass the arrays from Python to C and back. I think ctypes is what I should use. As I understand it, I would create a dll instead of an executable from my C code to be able to use it with Python.
However, to use MPI you need to launch the program using mpirun/mpiexec. This is not possible if I am simply using the C functions from a dll, correct?
Is there a good way to enable MPI for the function called from the dll? The two possibilities I've found are:
launch the Python program in parallel using mpi4py, then pass MPI_COMM_WORLD to the C function (per this post: How to pass MPI information to ctypes in python; see the sketch after this list)
somehow initialize and spawn processes inside the function without using mpirun. I'm not sure if this is possible.
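For option 1, a rough sketch of what the C side of the shared library might look like (the function and parameter names are hypothetical): mpi4py can hand over the communicator as a Fortran integer handle via comm.py2f(), which the C code converts back with MPI_Comm_f2c.
#include <mpi.h>
#include <stdio.h>

/* Hypothetical entry point called from Python through ctypes. */
void compute(MPI_Fint fcomm, const double *data, int n)
{
    MPI_Comm comm = MPI_Comm_f2c(fcomm);   /* convert the integer handle */
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    printf("rank %d of %d received %d values\n", rank, size, n);
    /* ... parallel work on data ... */
}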
One possibility, if you are OK with passing everything through the C program's rank 0, is to use subprocess.Popen() with stdin=subprocess.PIPE and the communicate() function on the Python side and fread() on the C side.
This is obviously fragile, but it does keep everything in memory. Also, if your data size is large (which you said it is), you may have to write the data to the child process in chunks. Another option could be to use exe.stdin.write(x) rather than exe.communicate(x).
I created a small example program
C code (program named child):
#include "mpi.h"
#include "stdio.h"
int main(int argc, char *argv[]){
MPI_Init(&argc, &argv);
int size, rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
double ans;
if(rank == 0){
fread(&ans, sizeof(ans), 1, stdin);
}
MPI_Bcast(&ans, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
printf("rank %d of %d received %lf\n", rank, size, ans);
MPI_Finalize();
}
Python code (named driver.py):
#!/usr/bin/env python
import ctypes as ct
import subprocess as sp
x = ct.c_double(3.141592)
exe = sp.Popen(['mpirun', '-n', '4', './child'], stdin=sp.PIPE)
exe.communicate(x)
x = ct.c_double(101.1)
exe = sp.Popen(['mpirun', '-n', '4', './child'], stdin=sp.PIPE)
exe.communicate(x)
results:
> python ./driver.py
rank 0 of 4 received 3.141592
rank 1 of 4 received 3.141592
rank 2 of 4 received 3.141592
rank 3 of 4 received 3.141592
rank 0 of 4 received 101.100000
rank 2 of 4 received 101.100000
rank 3 of 4 received 101.100000
rank 1 of 4 received 101.100000
I tried using MPI_Comm_connect() and MPI_Comm_accept() through mpi4py, but I couldn't seem to get that working on the Python side.
Since most of the time is spent in the C subroutine, which is invoked multiple times, and you are running within a resource manager, I would suggest the following approach:
Start all the MPI tasks at once via the following command (assuming you have allocated n+1 slots):
mpirun -np 1 python wrapper.py : -np <n> a.out
You likely want to start with an MPI_Comm_split() in order to generate a communicator only for the n tasks implemented by the C program.
Then you will define a "protocol" so the Python wrapper can pass parameters to the C tasks and wait for the result, or direct the C program to MPI_Finalize().
You might as well consider using an inter-communicator (the first group for Python, the second for C), but this is really up to you. Inter-communicator semantics can be unintuitive, so make sure you understand how they work if you want to go in that direction.
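A rough sketch of the split on the C side (assuming the mpirun layout above, with the Python wrapper on world rank 0); note that MPI_Comm_split is collective over MPI_COMM_WORLD, so the wrapper has to call it too, e.g. via mpi4py's MPI.COMM_WORLD.Split():
#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank;
    MPI_Comm workers;   /* communicator containing only the C tasks */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* All C tasks pass the same color; the Python wrapper (world rank 0)
       passes a different color (or MPI_UNDEFINED) from its side. */
    MPI_Comm_split(MPI_COMM_WORLD, 1, world_rank, &workers);

    /* ... receive parameters from rank 0 over MPI_COMM_WORLD, do the
       parallel work over `workers`, send results back, and loop until the
       wrapper signals that it is time to call MPI_Finalize() ... */

    MPI_Comm_free(&workers);
    MPI_Finalize();
    return 0;
}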
I'll use a simple specific example to illustrate what I'm trying to do.
file main.c:
#include <stdio.h>

unsigned int X;

int main()
{
    printf("&X = %p\r\n", (void *)&X);
    return 0;
}
I want to know if it's possible (using a linker-script/gcc options) to manually specify an address for X at compile/link time, because I know it lies somewhere in memory, outside my executable.
I only want to know if this is possible, I know I can use a pointer (i.e. unsigned int*) to access a specific memory location (r/w) but that's not what I'm after.
What I'm after is making GCC generate code in which all accesses to global variables/static function variables are either done through a level of indirection, i.e. through a pointer (-fPIC is not good enough because static global variables are not accessed via the GOT), or have their addresses manually specified (at link/compile time).
Thank you
What I'm after is making GCC generate code in which all accesses to
global variables/static function variables … their addresses can be
manually specified (at link/compile time).
You can specify the addresses of the .bss and .data sections (which contain the uninitialized and initialized variables respectively) with linker commands. The relative placement of the variables in the sections is up to the compiler/linker.
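For example (the addresses here are purely illustrative), the GNU linker exposes this directly through its section-origin options, which can be passed through the compiler driver:
cc main.c -Wl,-Tdata=0x20000000 -Wl,-Tbss=0x20100000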
If you need only individual variables to be placed, this can be done by declaring them extern and specifying their addresses in a file, e.g. addresses.ld:
X = 0x12345678;
(note: the spaces around = are needed), which is added to the compiler/linker arguments:
cc main.c addresses.ld
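For completeness, a sketch of how main.c from the question could be adjusted to match this answer (X is now only declared here; its address comes from addresses.ld at link time):
#include <stdio.h>

extern unsigned int X;   /* defined as an absolute symbol in addresses.ld */

int main(void)
{
    printf("&X = %p\r\n", (void *)&X);   /* should print the address given in addresses.ld */
    return 0;
}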
Is it possible to have one function to wrap both MPI_Init and MPI_Init_thread? The purpose of this is to have a cleaner API while maintaining backward compatibility. What happens to a call to MPI_Init_thread when it is not supported by the MPI run time? How do I keep my wrapper function working for MPI implementations when MPI_Init_thread is not supported?
MPI_INIT_THREAD is part of the MPI-2.0 specification, which was released 15 years ago. Virtually all existing MPI implementations are MPI-2 compliant except for some really archaic ones. You might not get the desired level of thread support, but the function should be there and you should still be able to call it instead of MPI_INIT.
Your best and most portable option is to have a configure-like mechanism probe for MPI_Init_thread in the MPI library, e.g. by trying to compile a very simple MPI program and seeing if it fails with an unresolved symbol reference, or by directly examining the export table of the MPI library with nm (for archives) or objdump (for shared ELF objects). Once you've determined that the MPI library has MPI_Init_thread, you can have a preprocessor symbol defined, e.g. CONFIG_HAS_INITTHREAD. Then have your wrapper look similar to this one:
int init_mpi(int *pargc, char ***pargv, int desired, int *provided)
{
#if defined(CONFIG_HAS_INITTHREAD)
    return MPI_Init_thread(pargc, pargv, desired, provided);
#else
    *provided = MPI_THREAD_SINGLE;
    return MPI_Init(pargc, pargv);
#endif
}
Of course, if the MPI library is missing MPI_INIT_THREAD, then MPI_THREAD_SINGLE and the other thread support level constants will also not be defined in mpi.h, so you might need to define them somewhere.
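A possible fallback for that case (the numeric values are illustrative only; the MPI standard only requires the ordering MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE):
/* Only for MPI-1 libraries whose mpi.h lacks the thread-level constants. */
#if !defined(CONFIG_HAS_INITTHREAD) && !defined(MPI_THREAD_SINGLE)
#define MPI_THREAD_SINGLE     0
#define MPI_THREAD_FUNNELED   1
#define MPI_THREAD_SERIALIZED 2
#define MPI_THREAD_MULTIPLE   3
#endif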
My program is as follows:
module x
  use mpi   ! x includes the mpi module
  implicit none
  ...
contains
  subroutine do_something_with_mpicommworld
    !use mpi   ! uncommenting this makes a difference (**)
    call MPI_...(MPI_COMM_WORLD, ..., ierr)
  end subroutine
  ...
end module x

program main
  use mpi
  use x
  call MPI_INIT(...)
  call do_something_with_mpicommworld
end program main
This program fails with the following error: MPI_Cart_create(199): Invalid communicator, unless the line marked with (**) is uncommented.
Now, maybe my knowledge of Fortran 90 is incomplete, but I thought that if you have a use clause in the module definition (see my module x), whichever global variable exists in the included module (in the case of x: MPI_COMM_WORLD from the included module mpi) will have the same value in any of the contained subroutines (do_something_with_mpicommworld) even when those subroutines do not explicitly include the module (e.g. when (**) is commented out). Or, to put it simply, if you include a module within another module, the subroutines contained in the second module have access to the globals in the included module without a separate use statement.
When I ran my programme, I saw a different behaviour. The sub contained in x was creating errors unless it had the 'use mpi' statement.
So what is the problem, do I have a wrong idea about Fortran 90, or is there something special about MPI module which induces such behaviour?
It's annoyingly hard to find exact details about what should and shouldn't happen in these cases, and my expectation was the same as yours -- the `use mpi' should work as above. So I tried the following:
module hellompi
  use mpi
  implicit none
contains
  subroutine hello
    integer :: ierr, nprocs, rank

    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    print *, 'Hello world, from ', rank, ' of ', nprocs
    print *, MPI_COMM_WORLD

    call MPI_FINALIZE(ierr)
    return
  end subroutine hello
end module hellompi
and it works fine under both gfortran and ifort with OpenMPI. Adding a cart_create doesn't change anything.
What strikes me as weird with your case is that it isn't complaining that MPI_COMM_WORLD isn't defined -- so obviously some of the relevant information is being propagated to the subroutine. Can you post a simpler full example which still fails to work?
Thank you Johnatan for your answer. The problem was really, really simple: I had added the subroutine in question after the "end module" :-D. 'implicit none' did not apply to the now-external subroutine, and the compiler happily initialised a brand new variable MPI_COMM_WORLD to whatever it thought suitable following the standard implicit rules.
This is just a lesson to me to enforce 'implicit none' not only by keyword, but also via the compiler flag. Evil lurks after every end statement.
I'm sorry you went through the trouble of making the test example, I'd buy you a beer if I could :-)