In this MPI program only the slave nodes do any work. How can I modify it so that the master works as well? Putting the master to work should also improve the performance of the system.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int N, A, B, C, slaveid, recvid, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /*-------------------------- Master ------------------------------*/
    if (rank == 0) {
        N = 10;
        for (slaveid = 1; slaveid < size; slaveid++) {
            MPI_Send(&N, 1, MPI_INT, slaveid, 1, MPI_COMM_WORLD);
        }
        for (recvid = 1; recvid < size; recvid++) {
            MPI_Recv(&A, 1, MPI_INT, recvid, 2, MPI_COMM_WORLD, &status);
            printf("My id = %d and I send = %d\n", recvid, A);
        }
    }
    /*-------------------------- Slave ------------------------------*/
    if (rank > 0) {
        MPI_Recv(&B, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        C = B * 3;
        MPI_Send(&C, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
    }
    MPI_Finalize();
}
Within the block delimited by
if(rank == 0){
}
insert, at the appropriate location, the line
work_like_a_slave(argument1, argument2,...)
The appropriate location is probably between the loop that sends messages and the loop that receives messages so that the master isn't entirely idle while the slaves toil.
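For instance, a minimal sketch of the modified master block, where work_like_a_slave is replaced by the same computation the slaves already perform (that choice of work for the master is an assumption, not part of the original program):

if (rank == 0) {
    N = 10;
    for (slaveid = 1; slaveid < size; slaveid++) {
        MPI_Send(&N, 1, MPI_INT, slaveid, 1, MPI_COMM_WORLD);
    }

    /* The master does its own share between sending and receiving. */
    C = N * 3;
    printf("My id = %d and I computed = %d\n", rank, C);

    for (recvid = 1; recvid < size; recvid++) {
        MPI_Recv(&A, 1, MPI_INT, recvid, 2, MPI_COMM_WORLD, &status);
        printf("My id = %d and I send = %d\n", recvid, A);
    }
}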
Whether this has a measurable impact on performance depends on a number of factors your question doesn't give enough information about to base a good guess on: how many slaves there are (and therefore how busy the master is sending and receiving messages), how much work each process does compared with the messaging it does, and so on.
Be prepared, if the numbers work against you, for any measurable impact to be negative, that is, for pressing the master into service to actually slow down your computation.
Related
Suppose I have a very large array of things and I have to do some operation on all these things.
In case the operation fails for one element, I want to stop the work (which is distributed across a number of processors) on the whole array.
I want to achieve this while keeping the number of sent/received messages to a minimum.
Also, I don't want to block processors if there is no need to.
How can I do it using MPI?
This seems to be a common question with no easy answer. Both of the other answers have scalability issues. The ring-communication approach has linear communication cost, while in the one-sided MPI_Win solution a single process will be hammered with memory requests from all workers. That may be fine for a low number of ranks, but it poses issues as the rank count increases.
Non-blocking collectives can provide a more scalable solution. The main idea is to post an MPI_Ibarrier on all ranks except one designated root. This root listens for point-to-point stop messages via MPI_Irecv and completes the MPI_Ibarrier once it receives one.
The tricky part is that there are four different cases, "{root, non-root} x {found, not-found}", that need to be handled. It can also happen that multiple ranks send a stop message, requiring an unknown number of matching receives on the root. That is solved with an additional reduction that counts the number of ranks that sent a stop request.
Here is an example of how this can look in C:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

const int iter_max = 10000;
const int difficulty = 20000;

int find_stuff()
{
    int num_iters = rand() % iter_max;
    for (int i = 0; i < num_iters; i++) {
        if (rand() % difficulty == 0) {
            return 1;
        }
    }
    return 0;
}

const int stop_tag = 42;
const int root = 0;

int forward_stop(MPI_Request* root_recv_stop, MPI_Request* all_recv_stop, int found_count)
{
    int flag;
    MPI_Status status;
    if (found_count == 0) {
        MPI_Test(root_recv_stop, &flag, &status);
    } else {
        // If we find something on the root, we actually wait until we receive our own message.
        MPI_Wait(root_recv_stop, &status);
        flag = 1;
    }
    if (flag) {
        printf("Forwarding stop signal from %d\n", status.MPI_SOURCE);
        MPI_Ibarrier(MPI_COMM_WORLD, all_recv_stop);
        MPI_Wait(all_recv_stop, MPI_STATUS_IGNORE);
        // We must post some additional receives if multiple ranks found something at the same time
        MPI_Reduce(MPI_IN_PLACE, &found_count, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);
        for (found_count--; found_count > 0; found_count--) {
            MPI_Recv(NULL, 0, MPI_CHAR, MPI_ANY_SOURCE, stop_tag, MPI_COMM_WORLD, &status);
            printf("Additional stop from: %d\n", status.MPI_SOURCE);
        }
        return 1;
    }
    return 0;
}

int main()
{
    MPI_Init(NULL, NULL);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    srand(rank);

    MPI_Request root_recv_stop;
    MPI_Request all_recv_stop;
    if (rank == root) {
        MPI_Irecv(NULL, 0, MPI_CHAR, MPI_ANY_SOURCE, stop_tag, MPI_COMM_WORLD, &root_recv_stop);
    } else {
        // You may want to use an extra communicator here, to avoid messing with other barriers
        MPI_Ibarrier(MPI_COMM_WORLD, &all_recv_stop);
    }

    while (1) {
        int found = find_stuff();
        if (found) {
            printf("Rank %d found something.\n", rank);
            // Note: We cannot post this as blocking, otherwise there is a deadlock with the reduce
            MPI_Request req;
            MPI_Isend(NULL, 0, MPI_CHAR, root, stop_tag, MPI_COMM_WORLD, &req);
            if (rank != root) {
                // We know that we are going to receive our own stop signal.
                // This avoids running another useless iteration
                MPI_Wait(&all_recv_stop, MPI_STATUS_IGNORE);
                MPI_Reduce(&found, NULL, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);
                MPI_Wait(&req, MPI_STATUS_IGNORE);
                break;
            }
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        if (rank == root) {
            if (forward_stop(&root_recv_stop, &all_recv_stop, found)) {
                break;
            }
        } else {
            int stop_signal;
            MPI_Test(&all_recv_stop, &stop_signal, MPI_STATUS_IGNORE);
            if (stop_signal) {
                MPI_Reduce(&found, NULL, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);
                printf("Rank %d stopping after receiving signal.\n", rank);
                break;
            }
        }
    }
    MPI_Finalize();
}
While this is not the simplest code, it should:
Introduce no additional blocking
Scale with the implementation of a barrier (usually O(log N))
Have a worst-case latency, from one rank finding something to all ranks stopping, of 2 × the loop time (+ 1 point-to-point message + 1 barrier + 1 reduction).
If many/all ranks find a solution at the same time, it still works but may be less efficient.
A possible strategy to derive this global stop condition in a non-blocking fashion is to rely on MPI_Test.
scenario
Consider that each process posts an asynchronous receive of one MPI_INT from its left rank with a given tag, so as to build a ring, and then starts its computation. If a rank encounters the stop condition, it sends its own rank to its right rank. In the meantime, each rank uses MPI_Test during the computation to check the MPI_Irecv for completion; if it has completed, the rank enters a branch in which it first waits for the message and then transitively propagates the received rank to the right, unless the right rank is equal to the payload of the message (so as not to loop).
Once this is done, you should have all processes in the branch, ready to trigger an arbitrary recovery operation.
Complexity
The topology retained here is a ring, as it minimizes the number of messages (at most n-1); however, it increases the propagation time. Other topologies could be retained with more messages but a lower propagation time, for example a tree with an n·ln(n) message complexity.
Implementation
Something like this.
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

int left_rank  = (rank == 0) ? (size - 1) : (rank - 1);
int right_rank = (rank == (size - 1)) ? 0 : (rank + 1);

int stop_cond_rank;
MPI_Request stop_cond_request;
int stop_cond = 0;

/* Post the receive from the left neighbour once; it is polled with MPI_Test below. */
MPI_Irecv(&stop_cond_rank, 1, MPI_INT, left_rank, 123, MPI_COMM_WORLD, &stop_cond_request);

while (1)
{
    /* Compute here and set stop_cond accordingly */

    if (stop_cond)
    {
        /* Cancel and complete the pending left recv */
        MPI_Cancel(&stop_cond_request);
        MPI_Wait(&stop_cond_request, MPI_STATUS_IGNORE);
        if (rank != right_rank)
            MPI_Send(&rank, 1, MPI_INT, right_rank, 123, MPI_COMM_WORLD);
        break;
    }

    int did_recv = 0;
    MPI_Test(&stop_cond_request, &did_recv, MPI_STATUS_IGNORE);
    if (did_recv)
    {
        stop_cond = 1;
        /* Propagate the originating rank to the right, unless the ring has come full circle */
        if (right_rank != stop_cond_rank)
            MPI_Send(&stop_cond_rank, 1, MPI_INT, right_rank, 123, MPI_COMM_WORLD);
        break;
    }
}

if (stop_cond)
{
    /* Handle the stop condition */
}
else
{
    /* Cleanup */
    MPI_Cancel(&stop_cond_request);
    MPI_Wait(&stop_cond_request, MPI_STATUS_IGNORE);
}
That is a question I've asked myself a few times without finding any completely satisfactory answer... The only thing I thought of (besides MPI_Abort(), which does it but is a bit extreme) is to create an MPI_Win storing a flag that will be raised by whichever process faces the problem, and checked regularly by all processes to see whether they can go on processing. The checking is done using non-blocking calls, the same way as described in this answer.
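A minimal sketch of one way this could look, assuming the flag lives on rank 0 and is polled with one-sided lock/get/unlock epochs (the fake work loop and the polling interval are purely illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *flag_mem;
    MPI_Win win;
    /* Rank 0 exposes a single int flag; everybody else exposes an empty window. */
    MPI_Win_allocate(rank == 0 ? (MPI_Aint)sizeof(int) : 0, sizeof(int),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &flag_mem, &win);
    if (rank == 0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        *flag_mem = 0;
        MPI_Win_unlock(0, win);
    }
    MPI_Barrier(MPI_COMM_WORLD); /* make sure the flag is initialised before anyone reads it */

    for (int iter = 0; iter < 1000; iter++) {
        /* ... do a chunk of real work here ... */

        int problem_here = 0; /* set to 1 when this rank hits the problem (illustrative) */
        if (problem_here) {
            int one = 1;
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            MPI_Put(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
            MPI_Win_unlock(0, win);
        }

        /* Poll the flag every few iterations; the frequency is the hand-tuned trade-off. */
        if (iter % 10 == 0) {
            int stop = 0;
            MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
            MPI_Get(&stop, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
            MPI_Win_unlock(0, win);
            if (stop)
                break;
        }
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}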
The main weaknesses of this are:
It depends on the processes willingly checking the status of the flag; there is no immediate interruption of their work to notify them.
The frequency of this checking must be adjusted by hand. You have to find the trade-off between the time you waste processing data when there's no need to (because something happened elsewhere) and the time it takes to check the status...
In the end, what we would need is a way of defining a callback action triggered by an MPI call such as MPI_Abort() (basically replacing the abort action by something else). I don't think this exists, but maybe I overlooked it.
I am trying these two pieces of logic and I do not know what the difference between them is. I am trying to use MPI_Send() and MPI_Recv() in my program. As I understand it, in MPI, processes communicate via the rank of each process and the tag of each message. So, what is the difference if I try
Logic 1:
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

int number;
if (world_rank == 0) {
    number = -1;
    MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 1 received number %d from process 0\n", number);
}
Logic 2:
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

int number;
if (world_rank == 0) {
    number = -1;
    int i;
    for (i = 1; i < world_size; i++) {
        MPI_Send(&number, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    }
} else {
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 1 received number %d from process 0\n", number);
}
I try: mpirun -np 100 ./test <arguments>
Logic 1: after a few minutes, my computer hangs.
Logic 2: it works, and prints 100 lines: Process kali received number -1 from process 0
I think both pieces of logic get the rank of the process and pass it as a parameter to MPI_Send. What is the difference?
I am working on Debian Kali Linux, with OpenMPI 1.8.
I am new to MPI. Thanks for help.
They are quite different.
On the one hand, in Logic 1, a single message is sent from process 0 to process 1. On the other hand, in Logic 2, world_size-1 messages are sent by process 0 and each remaining process receives one message from process 0. The second case could be replaced by a call to MPI_Bcast().
Had you tried mpirun -np 2 ./test <arguments>, both codes would have done the same thing... but that is the only case!
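As for the MPI_Bcast() alternative mentioned above, a minimal sketch could look like this (it assumes world_rank was obtained exactly as in the question):

int number = 0;
if (world_rank == 0) {
    number = -1;
}
/* Every rank, including the root, calls MPI_Bcast; afterwards every rank holds -1. */
MPI_Bcast(&number, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("Process %d has number %d\n", world_rank, number);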
Both code excerpts seem correct. The failure in the first case may be due to the fact that the integer number is not initialized on processes 2 to world_size-1. For instance, if number is the length of an array, it can trigger a segmentation fault. If number is part of a stopping condition in a for loop, it can trigger an infinite loop (or a very long one).
Edited
As we are missing the full source code, it is not possible to determine whether the initialization/finalization functions were properly used.
Here we have the same source code as in the initial response, plus what is required to properly run an MPI app:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int number;
    if (world_rank == 0) {
        number = -1;
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (world_rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received number %d from process 0\n", number);
    }

    MPI_Finalize();
}
Compile:
> mpicc -std=c99 -O2 -g -Wall -I. -o app app.c -lm
Run:
> mpirun -n 10 app
Process 1 received number -1 from process 0
>
Basically everything is working fine, so I guess the problem could be related to initialization/finalization.
Initial Response
Your application hangs with Logic 1 because there are 99 processes waiting for a message, but the master process only sends a message to the process identified by rank 1.
As you're using blocking functions (i.e. MPI_Send/MPI_Recv rather than their non-blocking counterparts), there are 98 processes waiting forever for a message from the process with rank 0, tag 0, and communicator MPI_COMM_WORLD.
It was my understanding that MPI communicators restrict the scope of communication, such
that messages sent from one communicator should never be received in a different one.
However, the program inlined below appears to contradict this.
I understand that the MPI_Send call returns before a matching receive is posted because of the internal buffering it does under the hood (as opposed to MPI_Ssend). I also understand that MPI_Comm_free doesn't destroy the communicator right away, but merely marks it for deallocation and waits for any pending operations to finish. I suppose that my unmatched send operation will be forever pending, but then I wonder how come the same object (integer value) is reused for the second communicator!?
Is this normal behaviour, a bug in the MPI library implementation, or is it that my program is just incorrect?
Any suggestions are much appreciated!
LATER EDIT: posted follow-up question
#include "stdio.h"
#include "unistd.h"
#include "mpi.h"
int main(int argc, char* argv[]) {
int rank, size;
MPI_Group group;
MPI_Comm my_comm;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_group(MPI_COMM_WORLD, &group);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 1) {
int msg = 123;
MPI_Send(&msg, 1, MPI_INT, 0, 0, my_comm);
printf("rank 1: message sent\n");
}
sleep(1);
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
sleep(2);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 0) {
int msg;
MPI_Recv(&msg, 1, MPI_INT, 1, 0, my_comm, MPI_STATUS_IGNORE);
printf("rank 0: message received\n");
}
sleep(1);
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
MPI_Finalize();
return 0;
}
outputs:
created communicator -2080374784
rank 1: message sent
freeing communicator -2080374784
created communicator -2080374784
rank 0: message received
freeing communicator -2080374784
The number you're seeing is simply a handle for the communicator. It's safe to reuse the handle since you've freed it.
As to why you're able to send the message, look at how you're creating the communicator. When you use MPI_Comm_group, you're getting a group containing the ranks associated with the specified communicator. In this case, you get all of the ranks, since you are getting the group for MPI_COMM_WORLD. Then, you are using MPI_Comm_create to create a communicator based on a group of ranks. You are using the same group you just got, which will contain all of the ranks. So your new communicator has all of the ranks from MPI_COMM_WORLD.
If you want your communicator to only contain a subset of ranks, you'll need to use a different function (or multiple functions) to make the desired group(s). I'd recommend reading through Chapter 6 of the MPI Standard; it includes all of the functions you'll need. Pick what you need to build the communicator you want.
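For example, here is a minimal sketch of building a communicator that contains only a subset of the ranks (the choice of the even ranks and the variable names are just for illustration; the fragment assumes <stdlib.h> is included for malloc):

MPI_Group world_group, even_group;
MPI_Comm even_comm;
int world_size, n_even, i;

MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_group(MPI_COMM_WORLD, &world_group);

/* List the ranks we want in the new communicator: 0, 2, 4, ... */
n_even = (world_size + 1) / 2;
int *even_ranks = malloc(n_even * sizeof(int));
for (i = 0; i < n_even; i++)
    even_ranks[i] = 2 * i;

MPI_Group_incl(world_group, n_even, even_ranks, &even_group);
/* Collective over MPI_COMM_WORLD; ranks not in even_group get MPI_COMM_NULL back. */
MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);

free(even_ranks);
MPI_Group_free(&even_group);
MPI_Group_free(&world_group);
/* ... use even_comm where it is not MPI_COMM_NULL, then MPI_Comm_free(&even_comm) ... */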
I'm experimenting with MPI and was wondering if this code could cause a deadlock.
MPI_Comm_rank(comm, &my_rank);
if (my_rank == 0) {
    MPI_Send(sendbuf, count, MPI_INT, 1, tag, comm);
    MPI_Recv(recvbuf, count, MPI_INT, 1, tag, comm, &status);
} else if (my_rank == 1) {
    MPI_Send(sendbuf, count, MPI_INT, 0, tag, comm);
    MPI_Recv(recvbuf, count, MPI_INT, 0, tag, comm, &status);
}
MPI_Send may or may not block. It will block until the sender can reuse the send buffer. Some implementations will return to the caller when the buffer has been handed to a lower communication layer. Some others will return to the caller when there is a matching MPI_Recv() at the other end. So it is up to your MPI implementation whether this program will deadlock or not.
Because this program behaves differently among different MPI implementations, you may consider rewriting it so that there are no possible deadlocks:
MPI_Comm_rank(comm, &my_rank);
if (my_rank == 0) {
    MPI_Send(sendbuf, count, MPI_INT, 1, tag, comm);
    MPI_Recv(recvbuf, count, MPI_INT, 1, tag, comm, &status);
} else if (my_rank == 1) {
    MPI_Recv(recvbuf, count, MPI_INT, 0, tag, comm, &status);
    MPI_Send(sendbuf, count, MPI_INT, 0, tag, comm);
}
Always be aware that for every MPI_Send() there must be a matching MPI_Recv(), both "parallel" in time. For example, the following may end in deadlock because the matching send/recv pairs are not aligned in time; they cross each other:
RANK 0                        RANK 1
----------                    ----------
MPI_Send() ---       ---- MPI_Send()    |
              ---  ---                  |
                ----                    |
                 --                     | TIME
                ----                    |
              ---  ---                  |
MPI_Recv() <--       ---> MPI_Recv()    v
These processes, on the other hand, won't end in deadlock, provided of course that there are indeed two processes with ranks 0 and 1 in the same communicator domain.
RANK 0                        RANK 1
----------                    ----------
MPI_Send() -----------------> MPI_Recv()   |
                                           | TIME
                                           |
MPI_Recv() <----------------- MPI_Send()   v
The above fixed program may fail if the size of the communicator comm does not allow for rank 1 (only rank 0 exists). In that case, the if-else won't take the else route, so no process will be listening for the MPI_Send() and rank 0 will deadlock.
If you need to use your current communication layout, then you may prefer to use MPI_Isend() or MPI_Issend() instead for nonblocking sends, thus avoiding deadlock.
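For instance, a minimal sketch of that non-blocking variant, assuming the same buffer and variable names as in the question:

MPI_Request req;
MPI_Comm_rank(comm, &my_rank);
if (my_rank == 0) {
    MPI_Isend(sendbuf, count, MPI_INT, 1, tag, comm, &req); /* returns without waiting for the recv */
    MPI_Recv(recvbuf, count, MPI_INT, 1, tag, comm, &status);
    MPI_Wait(&req, MPI_STATUS_IGNORE);                      /* sendbuf may be reused after this */
} else if (my_rank == 1) {
    MPI_Isend(sendbuf, count, MPI_INT, 0, tag, comm, &req);
    MPI_Recv(recvbuf, count, MPI_INT, 0, tag, comm, &status);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}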
The post by @mcleod_ideafix is very good. I want to add a couple more things about non-blocking MPI calls.
The way most MPI implementations work is that they copy the data out of the user buffer into some other place. It might be a buffer internal to the implementation, or it might be something better on the right kind of network. When that data is copied out of the user buffer and the buffer can be reused by the application, the MPI_SEND call returns. This may be before the matching MPI_RECV is called or it may not. The larger the data you are sending, the more likely it is that your message will block until the MPI_RECV call is made.
The best way to avoid this is to use non-blocking calls MPI_IRECV and MPI_ISEND. This way you can post your MPI_IRECV first, then make your call to MPI_ISEND. This avoids extra copies when the messages arrive (because the buffer to hold them is already available via the MPI_IRECV) which makes things faster, and it avoids the deadlock situation. So now your code would look like this:
MPI_Request requests[2];
MPI_Status statuses[2];

MPI_Comm_rank(comm, &my_rank);
if (my_rank == 0) {
    /* MPI_Irecv takes no status argument; completion statuses come from MPI_Waitall */
    MPI_Irecv(recvbuf, count, MPI_INT, 1, tag, comm, &requests[0]);
    MPI_Isend(sendbuf, count, MPI_INT, 1, tag, comm, &requests[1]);
} else if (my_rank == 1) {
    MPI_Irecv(recvbuf, count, MPI_INT, 0, tag, comm, &requests[0]);
    MPI_Isend(sendbuf, count, MPI_INT, 0, tag, comm, &requests[1]);
}
MPI_Waitall(2, requests, statuses);
As mcleod_ideafix explained, your code can result in a deadlock.
Here you go: an explanation and two possible solutions, one by rearranging the execution order, and one by using asynchronous send/recv calls.
Here is the solution with the async calls:
if (rank == 0) {
    MPI_Isend(..., 1, tag, MPI_COMM_WORLD, &req);
    MPI_Recv(..., 1, tag, MPI_COMM_WORLD, &status);
    MPI_Wait(&req, &status);
} else if (rank == 1) {
    MPI_Recv(..., 0, tag, MPI_COMM_WORLD, &status);
    MPI_Send(..., 0, tag, MPI_COMM_WORLD);
}
I am having a new little problem.
I have a little pointer called:
int *a;
Now, somewhere inside my main method, I allocate some space for it using the following lines and assign a value:
a = (int *) malloc(sizeof(int));
*a=5;
...and then I attempt to transmit it (say, to process 1):
MPI_Bsend(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
On the other end, if I try to receive using that pointer:
int *b;
MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("This is what I received: %d \n", *b);
I get an error about the buffer!
However, if instead of declaring 'b' as a pointer I do the following:
int b;
MPI_Recv(&b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("This is what I received: %d \n", b);
...all seems to be good! Could someone help me figure out what's happening and how to use only the pointer?
Thanks in advance!
The meaning of the line
MPI_Bsend(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
is the following: "a is a point in memory where I have 1 integer. Send it.`
In the code you posted above, this is absolutely true: a does point to an integer, and so it is sent. This is why you can receive it using your second method, since the meaning of the line
MPI_Recv(&b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
is "receive 1 integer, and store it at &b. b is a regular int, so it's all good. In the first receive, you're trying to receive an integer into an int* variable there is no allocated memory that b is pointing to, so Recv has nowhere to write to. However, I should point out:
NEVER pass a pointer's contents to another process in MPI
MPI processes cannot read each others' memory, and virtual addressing makes one process' pointer completely meaningless to another.
This problem is related to handling pointers and allocating memory; it's not an MPI specific issue.
In your second variant, int b automatically allocates memory for one integer. By passing &b you are passing a pointer to an allocated memory segment. In your first variant, memory is automatically allocated for the pointer itself, but NOT for the memory the pointer is pointing to. Thus, when you pass in the pointer, MPI tries to write to non-allocated memory, which causes the error.
It would work this way though:
int *b = (int *) malloc(sizeof(int));
MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
The error you are getting is because you are copying the result from MPI_Recv into memory (*b) that you don't own and that isn't initialised.
I'm not an expert on MPI, but surely you can't transfer a pointer (i.e. a memory address) to a process that could be running on another machine!