I do not want to use mpiexec -n 4 ./a.out to run my program on my core i7 processor (with 4 cores). Instead, I want to run ./a.out, have it detect the number of cores and fire up MPI to run a process per core.
This SO question and answer MPI Number of processors? led me to use mpiexec.
The reason I want to avoid mpiexec is because my code is destined to be a library inside a larger project I'm working on. The larger project has a GUI and the user will be starting long computations that will call my library, which will in turn use MPI. The integration between the UI and the computation code is not trivial... so launching an external process and communicating via a socket or some other means is not an option. It must be a library call.
Is this possible? How do I do it?
This is quite a nontrivial thing to achieve in general. Also, there is hardly any portable solution that does not depend on some MPI implementation specifics. What follows is a sample solution that works with Open MPI and possibly with other general-purpose MPI implementations (MPICH, Intel MPI, etc.). It involves a second executable or a means for the original executable to directly call your library when given some special command-line argument. It goes like this.
Assume the original executable was started simply as ./a.out. When your library function is called, it calls MPI_Init(NULL, NULL), which initialises MPI. Since the executable was not started via mpiexec, it falls back to the so-called singleton MPI initialisation, i.e. it creates an MPI job that consists of a single process. To perform distributed computations, you have to start more MPI processes and that's where things get complicated in the general case.
MPI supports dynamic process management, in which one MPI job can start a second one and communicate with it using intercommunicators. This happens when the first job calls MPI_Comm_spawn or MPI_Comm_spawn_multiple. The first one is used to start simple MPI jobs that use the same executable for all MPI ranks, while the second one can start jobs that mix different executables. Both need information as to where and how to launch the processes. This comes from the so-called MPI universe, which provides information not only about the started processes, but also about the available slots for dynamically started ones. The universe is constructed by mpiexec or by some other launcher mechanism that takes, e.g., a host file with a list of nodes and the number of slots on each node. In the absence of such information, some MPI implementations (Open MPI included) will simply start the executables on the same node as the original process. MPI_Comm_spawn[_multiple] has an MPI_Info argument that can be used to supply a list of key-value pairs with implementation-specific information. Open MPI supports the add-hostfile key that can be used to specify a hostfile to be used when spawning the child job. This is useful for, e.g., allowing the user to specify via the GUI a list of hosts to use for the MPI computation. But let's concentrate on the case where no such information is provided and Open MPI simply runs the child job on the same host.
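To illustrate, a minimal sketch of passing a user-selected hostfile through the MPI_Info argument might look like this (the file name hosts.txt and the worker count N are placeholders; the add-hostfile key is Open MPI specific):

// Sketch: ask Open MPI to place the spawned workers on hosts listed in a
// user-provided hostfile ("hosts.txt" is a made-up name).
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "add-hostfile", "hosts.txt");

MPI_Comm child_comm;
MPI_Comm_spawn("./worker", MPI_ARGV_NULL, N-1, info, 0,
               MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
MPI_Info_free(&info);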
Assume the worker executable is called worker, or that the original executable can serve as the worker when called with some special command-line option, -worker for example. If you want to perform the computation with N processes in total, you need to launch N-1 workers. This is simple:
(separate executable)
MPI_Comm child_comm;
MPI_Comm_spawn("./worker", MPI_ARGV_NULL, N-1, MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
(same executable, with an option)
MPI_Comm child_comm;
char *argv[] = { "-worker", NULL };
MPI_Comm_spawn("./a.out", argv, N-1, MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
If everything goes well, child_comm will be set to the handle of an intercommunicator that can be used to communicate with the new job. As intercommunicators are kind of tricky to use and the parent-child job division requires complex program logic, one could simply merge the two sides of the intercommunicator into a "big world" communicator that replaces MPI_COMM_WORLD. On the parent's side:
MPI_Comm bigworld;
MPI_Intercomm_merge(child_comm, 0, &bigworld);
On the child's side:
MPI_Comm parent_comm, bigworld;
MPI_Comm_get_parent(&parent_comm);
MPI_Intercomm_merge(parent_comm, 1, &bigworld);
After the merge is complete, all processes can communicate using bigworld instead of MPI_COMM_WORLD. Note that child jobs do not share their MPI_COMM_WORLD with the parent job.
To put it all together, here is a complete functioning example with two separate program codes.
main.c
#include <stdio.h>
#include <mpi.h>

int main (void)
{
   MPI_Init(NULL, NULL);

   printf("[main] Spawning workers...\n");

   MPI_Comm child_comm;
   MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                  MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

   MPI_Comm bigworld;
   MPI_Intercomm_merge(child_comm, 0, &bigworld);

   int size, rank;
   MPI_Comm_rank(bigworld, &rank);
   MPI_Comm_size(bigworld, &size);
   printf("[main] Big world created with %d ranks\n", size);

   // Perform some computation
   int data = 1, result;
   MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
   data *= (1 + rank);
   MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld);
   printf("[main] Result = %d\n", result);

   MPI_Barrier(bigworld);

   MPI_Comm_free(&bigworld);
   MPI_Comm_free(&child_comm);

   MPI_Finalize();
   printf("[main] Shutting down\n");
   return 0;
}
worker.c
#include <stdio.h>
#include <mpi.h>

int main (void)
{
   MPI_Init(NULL, NULL);

   MPI_Comm parent_comm;
   MPI_Comm_get_parent(&parent_comm);

   int rank, size;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   printf("[worker] %d of %d here\n", rank, size);

   MPI_Comm bigworld;
   MPI_Intercomm_merge(parent_comm, 1, &bigworld);

   MPI_Comm_rank(bigworld, &rank);
   MPI_Comm_size(bigworld, &size);
   printf("[worker] %d of %d in big world\n", rank, size);

   // Perform some computation
   int data;
   MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
   data *= (1 + rank);
   MPI_Reduce(&data, NULL, 1, MPI_INT, MPI_SUM, 0, bigworld);
   printf("[worker] Done\n");

   MPI_Barrier(bigworld);

   MPI_Comm_free(&bigworld);
   MPI_Comm_free(&parent_comm);

   MPI_Finalize();
   return 0;
}
Here is how it works:
$ mpicc -o main main.c
$ mpicc -o worker worker.c
$ ./main
[main] Spawning workers...
[worker] 0 of 2 here
[worker] 1 of 2 here
[worker] 1 of 3 in big world
[worker] 2 of 3 in big world
[main] Big world created with 3 ranks
[worker] Done
[worker] Done
[main] Result = 6
[main] Shutting down
The child job has to use MPI_Comm_get_parent to obtain the intercommunicator to the parent job. When a process is not part of such a child job, the returned value will be MPI_COMM_NULL. This allows for an easy way to implement both the main program and the worker in the same executable. Here is a hybrid example:
#include <stdio.h>
#include <mpi.h>

MPI_Comm bigworld_comm = MPI_COMM_NULL;
MPI_Comm other_comm = MPI_COMM_NULL;

int parlib_init (const char *argv0, int n)
{
   MPI_Init(NULL, NULL);

   MPI_Comm_get_parent(&other_comm);
   if (other_comm == MPI_COMM_NULL)
   {
      printf("[main] Spawning workers...\n");
      MPI_Comm_spawn(argv0, MPI_ARGV_NULL, n-1, MPI_INFO_NULL, 0,
                     MPI_COMM_SELF, &other_comm, MPI_ERRCODES_IGNORE);
      MPI_Intercomm_merge(other_comm, 0, &bigworld_comm);
      return 0;
   }

   int rank, size;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   printf("[worker] %d of %d here\n", rank, size);
   MPI_Intercomm_merge(other_comm, 1, &bigworld_comm);
   return 1;
}

int parlib_dowork (void)
{
   int data = 1, result = -1, size, rank;

   MPI_Comm_rank(bigworld_comm, &rank);
   MPI_Comm_size(bigworld_comm, &size);

   if (rank == 0)
   {
      printf("[main] Doing work with %d processes in total\n", size);
      data = 1;
   }

   MPI_Bcast(&data, 1, MPI_INT, 0, bigworld_comm);
   data *= (1 + rank);
   MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld_comm);

   return result;
}

void parlib_finalize (void)
{
   MPI_Comm_free(&bigworld_comm);
   MPI_Comm_free(&other_comm);
   MPI_Finalize();
}

int main (int argc, char **argv)
{
   if (parlib_init(argv[0], 4))
   {
      // Worker process
      (void)parlib_dowork();
      printf("[worker] Done\n");
      parlib_finalize();
      return 0;
   }

   // Main process
   // Show GUI, save the world, etc.
   int result = parlib_dowork();
   printf("[main] Result = %d\n", result);

   parlib_finalize();
   printf("[main] Shutting down\n");
   return 0;
}
And here is an example output:
$ mpicc -o hybrid hybrid.c
$ ./hybrid
[main] Spawning workers...
[worker] 0 of 3 here
[worker] 2 of 3 here
[worker] 1 of 3 here
[main] Doing work with 4 processes in total
[worker] Done
[worker] Done
[main] Result = 10
[worker] Done
[main] Shutting down
Some things to keep in mind when designing such parallel libraries:
MPI can only be initialised once. If necessary, call MPI_Initialized to check whether MPI has already been initialised.
MPI can only be finalised once. Again, MPI_Finalized is your friend. It can be used in something like an atexit() handler to implement a universal MPI finalisation on program exit (see the sketch below).
When used in threaded contexts (as is usual when GUIs are involved), MPI must be initialised with support for threads. See MPI_Init_thread.
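For example, a defensive init/finalise wrapper could look roughly like this (just a sketch under the assumption that the host application may or may not have initialised MPI itself; the function names are made up):

#include <stdlib.h>
#include <mpi.h>

// Finalise MPI at program exit, but only if nobody has done it already.
static void finalize_mpi_at_exit (void)
{
   int finalized;
   MPI_Finalized(&finalized);
   if (!finalized)
      MPI_Finalize();
}

// Initialise MPI once, with thread support, no matter how often this is called.
void parlib_ensure_mpi (void)
{
   int initialized, provided;
   MPI_Initialized(&initialized);
   if (!initialized)
   {
      MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
      atexit(finalize_mpi_at_exit);
   }
}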
You can get the number of CPUs by using, for example, this solution, and then start the MPI processes by calling MPI_Comm_spawn. But you will need to have a separate executable file.
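For instance, on a POSIX system (an assumption about the platform) the core count can be queried with sysconf and fed straight into MPI_Comm_spawn; ./worker stands for the separate worker executable:

#include <unistd.h>
#include <mpi.h>

// Sketch: spawn one worker per online core, minus the already-running master.
void spawn_one_worker_per_core (MPI_Comm *child_comm)
{
   int ncores = (int) sysconf(_SC_NPROCESSORS_ONLN);
   if (ncores < 2)
      ncores = 2;   // make sure at least one worker is spawned

   MPI_Comm_spawn("./worker", MPI_ARGV_NULL, ncores - 1, MPI_INFO_NULL, 0,
                  MPI_COMM_SELF, child_comm, MPI_ERRCODES_IGNORE);
}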
I am trying these two pieces of logic and I do not know what the difference between them is. I am trying to use MPI_Send() and MPI_Recv() in my program. As I understand it, in MPI, processes communicate via their ranks and the tags of each message. So, what is the difference if I try
Logic 1:
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0) {
number = -1;
MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 0\n",
number);
}
Logic 2:
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0) {
    number = -1;
    int i = 0;
    for (i = 1; i < world_size; i++) {
        MPI_Send(&number, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    }
} else {
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 1 received number %d from process 0\n",
           number);
}
I try: mpirun -np 100 ./test <arguments>
Logic 1: after a few minutes, my computer hangs.
Logic 2: it works and prints 100 lines like: Process kali received number -1 from process 0
I think both versions get the rank of the process and pass it as a parameter to MPI_Send. What is the difference?
I am working on Debian Kali Linux, with OpenMPI 1.8.
I am new to MPI. Thanks for help.
They are quite different.
On the one hand, in Logic 1, a single message is sent from process 0 to process 1. On the other hand, in Logic 2, world_size-1 messages are sent by process 0 and each remaining process receives one message from process 0. The second case could be replaced by a call to MPI_Bcast().
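For reference, a collective version of Logic 2 could be sketched like this (same variables as in your snippets):

int number = 0;
if (world_rank == 0)
    number = -1;
// Rank 0 broadcasts; every rank (including 0) ends up with number == -1.
MPI_Bcast(&number, 1, MPI_INT, 0, MPI_COMM_WORLD);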
Had you tried mpirun -np 2 ./test <arguments>, both pieces of code would have done the same thing... but that is the only case!
Both code excerpts seem correct. The failure in the first case may be due to the fact that the integer number is not initialized on ranks 2 to world_size-1. For instance, if number is the length of an array, it can trigger a segmentation fault. If number is part of a stopping condition in a for loop, it can trigger an infinite loop (or a very long one).
Edited
As we are missing the full source code, it is not possible to determine whether the initialization/finalization functions were properly used.
Here we have the same source code from the initial answer plus what's required to properly run an MPI app:
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
   MPI_Init(&argc, &argv);

   int world_rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
   int world_size;
   MPI_Comm_size(MPI_COMM_WORLD, &world_size);

   int number;
   if (world_rank == 0) {
      number = -1;
      MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
   } else if (world_rank == 1) {
      MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
               MPI_STATUS_IGNORE);
      printf("Process 1 received number %d from process 0\n",
             number);
   }

   MPI_Finalize();
   return 0;
}
Compile:
> mpicc -std=c99 -O2 -g -Wall -I. -o app app.c -lm
Run:
> mpirun -n 10 app
Process 1 received number -1 from process 0
>
Basically everything is working fine, so I guess the problem could be related to initialization/finalization.
Initial Response
Your application is hanging with Logic 1 because there are 99 processes waiting for a message, but the master process is only sending a message to the process with rank 1.
As you're using a blocking function (i.e. MPI_Send vs. MPI_Isend), there are 98 processes waiting forever for a message from rank 0 with tag 0 on communicator MPI_COMM_WORLD.
In this MPI program only the slave nodes do work. How can I modify it so that the master works as well? Having the master do some of the work should improve the performance of the system.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int A, B, C, N, slaveid, recvid, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /*-------------------------- master ------------------------------*/
    if (rank == 0) {
        N = 10;
        for (slaveid = 1; slaveid < size; slaveid++) {
            MPI_Send(&N, 1, MPI_INT, slaveid, 1, MPI_COMM_WORLD);
        }
        for (recvid = 1; recvid < size; recvid++) {
            MPI_Recv(&A, 1, MPI_INT, recvid, 2, MPI_COMM_WORLD, &status);
            printf(" My id = %d and i send = %d\n", recvid, A);
        }
    }

    /*-------------------------- slave -------------------------------*/
    if (rank > 0) {
        MPI_Recv(&B, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        C = B * 3;
        MPI_Send(&C, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
Within the block delimited by
if(rank == 0){
}
insert, at the appropriate location, the line
work_like_a_slave(argument1, argument2,...)
The appropriate location is probably between the loop that sends messages and the loop that receives messages so that the master isn't entirely idle while the slaves toil.
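A rough sketch of that placement, reusing the question's variables and doing the slave's computation inline (the actual work is of course application specific):

if (rank == 0) {
    N = 10;
    for (slaveid = 1; slaveid < size; slaveid++)
        MPI_Send(&N, 1, MPI_INT, slaveid, 1, MPI_COMM_WORLD);

    A = N * 3;   // the master works like a slave while the slaves compute
    printf(" My id = 0 and i computed = %d\n", A);

    for (recvid = 1; recvid < size; recvid++) {
        MPI_Recv(&A, 1, MPI_INT, recvid, 2, MPI_COMM_WORLD, &status);
        printf(" My id = %d and i send = %d\n", recvid, A);
    }
}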
Whether this has a measurable impact on performance depends on a number of factors your question doesn't give enough information about to support a good guess: how many slaves there are and therefore how busy the master is sending and receiving messages, how much work each process does compared with the messaging it does, and so on.
Be prepared, if the numbers work against you, for any measurable impact to be negative, that is for pressing the master into service to actually slow down your computation.
I am having a new little problem;
I have a little pointer called:
int *a;
Now, somewhere inside my main method, I allocate some space for it using the following lines and assign a value:
a = (int *) malloc(sizeof(int));
*a=5;
..and then I attempt to transmit it (say to process 1):
MPI_Bsend(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
On the other end, if I try to receive into that pointer
int *b;
MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("This is what I received: %d \n", *b);
I get an error about the buffer!
However, if instead of declaring b as a pointer I do the following:
int b;
MPI_Recv(&b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("This is what I received: %d \n", b);
...all seems to be good! Could someone help me figure out what's happening and how to use only pointers?
Thanks in advance!
The meaning of the line
MPI_Bsend(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
is the following: "a points to a place in memory where I have 1 integer. Send it."
In the code you posted above, this is absolutely true: a does point to an integer, and so it is sent. This is why you can receive it using your second method, since the meaning of the line
MPI_Recv(&b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
is "receive 1 integer, and store it at &b. b is a regular int, so it's all good. In the first receive, you're trying to receive an integer into an int* variable there is no allocated memory that b is pointing to, so Recv has nowhere to write to. However, I should point out:
NEVER pass a pointer's contents to another process in MPI
MPI processes cannot read each others' memory, and virtual addressing makes one process' pointer completely meaningless to another.
This problem is related to handling pointers and allocating memory; it's not an MPI specific issue.
In your second variant, int b automatically allocates memory for one integer. By passing &b you are passing a pointer to an allocated memory segment. In your first variant, memory for the pointer is automatically allocated, but NOT the memory the pointer is pointing to. Thus, when you pass in the pointer, MPI tries to write to non-allocated memory, which causes the error.
It would work this way though:
int *b = (int *) malloc(sizeof(int));
MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
The error you are getting is that you are copying the result from MPI_Recv into some memory *b that you don't own and that isn't initialised.
Not an expert on MPI, but surely you can't transfer a pointer (i.e. a memory address) to a process that could be running on another machine!
I want to use MPI (MPICH2) on windows. I write this command:
MPI_Barrier(MPI_COMM_WORLD);
I expect it to block all processes until all group members have called it, but that is not what happens. Here is a schematic of my code:
int a;
if(myrank == RootProc)
a = 4;
MPI_Barrier(MPI_COMM_WORLD);
cout << "My Rank = " << myrank << "\ta = " << a << endl;
(With 2 processes:) The root processor (0) acts correctly, but the processor with rank 1 doesn't know the value of a, so it displays -858993460 instead of 4.
Can any one help me?
Regards
You're only assigning a in process 0. MPI doesn't share memory, so if you want the a in process 1 to get the value of 4, you need to call MPI_Send from process 0 and MPI_Recv from process 1.
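A minimal sketch of that, reusing the variables from your snippet (plain C printf instead of cout, just to keep the example short):

int a = 0;
if (myrank == 0) {
    a = 4;
    MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   // ship the value to rank 1
} else if (myrank == 1) {
    MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
printf("My Rank = %d\ta = %d\n", myrank, a);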
Variable a is not initialized, which is probably why it displays that number. In MPI, variable a is duplicated between the processes, so there are two copies of a, one of which is uninitialized. You want to write:
int a = 4;
if (myrank == RootProc)
...
Or, alternatively, do an MPI_Send in the root (id 0) and an MPI_Recv in the slave (id 1) so the value in the root is also set in the slave.
Note: that code triggers a small alarm in my head, so I need to check something and I'll edit this with more info. Until then though, the uninitialized value is most certainly a problem for you.
Ok, I've checked the facts - your code was not properly indented and I missed the missing {}. The barrier looks fine now, although the snippet you posted does not do too much and is not a very good example of a barrier, because the slave enters it directly, whereas the root sets the value of the variable to 4 and then enters it. To test that it actually works, you probably want some sort of sleep mechanism in one of the processes - that will yield (hope it's the correct term) to the other process as well, preventing it from printing the cout until the sleep is over.
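A self-contained sketch of such a test might look like this (sleep() is POSIX; on Windows with MPICH2 you would use Sleep() from windows.h instead):

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main (int argc, char **argv)
{
   int myrank;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

   if (myrank == 0)
      sleep(5);                      // rank 0 dawdles before reaching the barrier

   MPI_Barrier(MPI_COMM_WORLD);      // nobody gets past this until rank 0 arrives
   printf("Rank %d passed the barrier\n", myrank);

   MPI_Finalize();
   return 0;
}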
Blocking is not enough; you have to send data to the other processes (memory is not shared between processes).
To share data across ALL processes use:
int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm )
so in your case:
MPI_Bcast(&a, 1, MPI_INT, 0, MPI_COMM_WORLD);
here you send one integer pointed to by &a from process 0 to all others.
//MPI_Bcast is sender for root process and receiver for non-root processes
You can also send some data to a specific process with:
int MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm )
and then receive by:
int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)