MPI in C and pointer transmissions

I am having a small new problem.
I have a pointer:
int *a;
Somewhere inside my main function I allocate some space for it and assign a value:
a = (int *) malloc(sizeof(int));
*a=5;
...and then I attempt to transmit it (say, to process 1):
MPI_Bsend(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
On the other end, if I try to receive into a pointer
int *b;
MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("This is what I received: %d \n", *b);
I get an error about the buffer!
However, if instead of declaring b as a pointer I do the following:
int b;
MPI_Recv(&b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("This is what I received: %d \n", b);
...all seems to be good! Could someone help me figure out what's happening and how to do this using only pointers?
Thanks in advance!

The meaning of the line
MPI_Bsend(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
is the following: "a is a location in memory where I have 1 integer. Send it."
In the code you posted above, this is absolutely true: a does point to an integer, and so it is sent. This is why you can receive it using your second method, since the meaning of the line
MPI_Recv(&b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
is "receive 1 integer, and store it at &b. b is a regular int, so it's all good. In the first receive, you're trying to receive an integer into an int* variable there is no allocated memory that b is pointing to, so Recv has nowhere to write to. However, I should point out:
NEVER pass a pointer's contents to another process in MPI
MPI processes cannot read each others' memory, and virtual addressing makes one process' pointer completely meaningless to another.
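To make the correct pattern concrete, here is a minimal sketch of both sides (it uses plain MPI_Send rather than MPI_Bsend so no user-attached buffer is involved): each process owns its own allocation, and only the integer that a points to travels between them, never the pointer value itself.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int *a = malloc(sizeof(int));   /* sender owns this memory */
        *a = 5;
        MPI_Send(a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        free(a);
    } else if (rank == 1) {
        int *b = malloc(sizeof(int));   /* receiver must allocate its own memory */
        MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("This is what I received: %d\n", *b);
        free(b);
    }

    MPI_Finalize();
    return 0;
}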

This problem is related to handling pointers and allocating memory; it's not an MPI-specific issue.
In your second variant, int b automatically allocates memory for one integer. By passing &b you are passing a pointer to allocated memory. In your first variant, memory for the pointer itself is allocated, but NOT for the memory the pointer points to. Thus, when you pass in the pointer, MPI tries to write to non-allocated memory, which causes the error.
It would work this way though:
int *b = (int *) malloc(sizeof(int));
MPI_Recv(b, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

The error you are getting is because MPI_Recv is copying its result into memory (*b) that you don't own and that isn't initialised.
Not an expert on MPI, but surely you can't transfer a pointer (i.e. a memory address) to a process that could be running on another machine!

Related

broadcasting structs without knowing attributes

I'm trying to use MPI_Bcast to share an instance of cudaIpcMemHandle_t, but I cannot figure out how to create the corresponding MPI_Datatype needed for Bcast. I do not know the underlying structure of the CUDA type, hence methods like this one don't seem to work. Am I missing something?
Following up from the useful comment, I have a solution that seems to work, though it's only been tested in a toy program:
// Broadcast the memory handle
cudaIpcMemHandle_t memHandle[1];
if (rank == 0) {
    // set the handle contents, e.g. call cudaIpcGetMemHandle
}
MPI_Barrier(MPI_COMM_WORLD);
// share the size of the handle with the other processes
int hand_size[1];
if (rank == 0)
    hand_size[0] = sizeof(memHandle[0]);
MPI_Bcast(&hand_size[0], 1, MPI_INT, 0, MPI_COMM_WORLD);
// broadcast the handle as a byte array
char memHandle_C[hand_size[0]];
if (rank == 0)
    memcpy(&memHandle_C, &memHandle[0], hand_size[0]);
MPI_Bcast(&memHandle_C, hand_size[0], MPI_BYTE, 0, MPI_COMM_WORLD);
if (rank > 0)
    memcpy(&memHandle[0], &memHandle_C, hand_size[0]);
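An arguably simpler variant, assuming every rank compiles against the same CUDA headers so that sizeof(cudaIpcMemHandle_t) is identical everywhere, is to broadcast the handle directly as raw bytes. This is only a sketch and has not been tested against the original program:
// Sketch: the handle is a fixed-size struct, so ship it as bytes.
cudaIpcMemHandle_t memHandle;
if (rank == 0) {
    // fill memHandle on the root, e.g. via cudaIpcGetMemHandle(&memHandle, devPtr)
}
MPI_Bcast(&memHandle, (int) sizeof(memHandle), MPI_BYTE, 0, MPI_COMM_WORLD);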

MPI message received in different communicator

It was my understanding that MPI communicators restrict the scope of communication, such
that messages sent from one communicator should never be received in a different one.
However, the program inlined below appears to contradict this.
I understand that the MPI_Send call returns before a matching receive is posted because of the internal buffering it does under the hood (as opposed to MPI_Ssend). I also understand that MPI_Comm_free doesn't destroy the communicator right away, but merely marks it for deallocation and waits for any pending operations to finish. I suppose that my unmatched send operation will be forever pending, but then I wonder how come the same object (integer value) is reused for the second communicator!?
Is this normal behaviour, a bug in the MPI library implementation, or is it that my program is just incorrect?
Any suggestions are much appreciated!
LATER EDIT: posted follow-up question
#include "stdio.h"
#include "unistd.h"
#include "mpi.h"
int main(int argc, char* argv[]) {
int rank, size;
MPI_Group group;
MPI_Comm my_comm;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_group(MPI_COMM_WORLD, &group);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 1) {
int msg = 123;
MPI_Send(&msg, 1, MPI_INT, 0, 0, my_comm);
printf("rank 1: message sent\n");
}
sleep(1);
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
sleep(2);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 0) {
int msg;
MPI_Recv(&msg, 1, MPI_INT, 1, 0, my_comm, MPI_STATUS_IGNORE);
printf("rank 0: message received\n");
}
sleep(1);
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
MPI_Finalize();
return 0;
}
outputs:
created communicator -2080374784
rank 1: message sent
freeing communicator -2080374784
created communicator -2080374784
rank 0: message received
freeing communicator -2080374784
The number you're seeing is simply a handle for the communicator. It's safe to reuse the handle since you've freed it.
As to why you're able to send the message, look at how you're creating the communicator. When you use MPI_Comm_group, you get a group containing the ranks associated with the specified communicator. In this case, you get all of the ranks, since you are getting the group for MPI_COMM_WORLD. Then you are using MPI_Comm_create to create a communicator based on a group of ranks. You are using the same group you just got, which contains all of the ranks. So your new communicator has all of the ranks from MPI_COMM_WORLD.
If you want your communicator to only contain a subset of the ranks, you'll need to use a different function (or multiple functions) to make the desired group(s). I'd recommend reading through Chapter 6 of the MPI Standard; it includes all of the functions you'll need. Pick what you need to build the communicator you want.
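For example, here is a fragment sketching how one might build a communicator over only a subset of ranks (the even ranks of MPI_COMM_WORLD) using MPI_Group_incl; the variable names are illustrative, not from the question, and the usual mpi.h/stdlib.h includes are assumed:
MPI_Group world_group, even_group;
MPI_Comm even_comm;
int world_size;

MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_group(MPI_COMM_WORLD, &world_group);

/* list the ranks we want in the new communicator: 0, 2, 4, ... */
int n_even = (world_size + 1) / 2;
int *even_ranks = malloc(n_even * sizeof(int));
for (int i = 0; i < n_even; i++)
    even_ranks[i] = 2 * i;

MPI_Group_incl(world_group, n_even, even_ranks, &even_group);
MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);
/* even_comm is MPI_COMM_NULL on ranks that are not in even_group */

free(even_ranks);
MPI_Group_free(&even_group);
MPI_Group_free(&world_group);
if (even_comm != MPI_COMM_NULL)
    MPI_Comm_free(&even_comm);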

Deadlock with MPI

I'm experimenting with MPI and was wondering if this code could cause a deadlock.
MPI_Comm_rank (comm, &my_rank);
if (my_rank == 0) {
    MPI_Send (sendbuf, count, MPI_INT, 1, tag, comm);
    MPI_Recv (recvbuf, count, MPI_INT, 1, tag, comm, &status);
} else if (my_rank == 1) {
    MPI_Send (sendbuf, count, MPI_INT, 0, tag, comm);
    MPI_Recv (recvbuf, count, MPI_INT, 0, tag, comm, &status);
}
MPI_Send may or may not block. It will block until the sender can reuse the send buffer. Some implementations return to the caller when the buffer has been handed off to a lower communication layer. Some others return to the caller when there's a matching MPI_Recv() at the other end. So it's up to your MPI implementation whether this program deadlocks or not.
Because this program behaves differently among different MPI implementations, you may consider rewriting it so there is no possible deadlock:
MPI_Comm_rank (comm, &my_rank);
if (my_rank == 0) {
    MPI_Send (sendbuf, count, MPI_INT, 1, tag, comm);
    MPI_Recv (recvbuf, count, MPI_INT, 1, tag, comm, &status);
} else if (my_rank == 1) {
    MPI_Recv (recvbuf, count, MPI_INT, 0, tag, comm, &status);
    MPI_Send (sendbuf, count, MPI_INT, 0, tag, comm);
}
Always be aware that for every MPI_Send() there must be a pairing MPI_Recv(), both "parallel" in time. For example, the following may end in deadlock because the pairing send/recv calls are not aligned in time; they cross each other:
RANK 0                          RANK 1
----------                      ----------
MPI_Send() ---        ---- MPI_Send()      |
              ---  ---                     |
                 ----                      |
                  --                       | TIME
                 ----                      |
              ---  ---                     |
MPI_Recv() <--        ---> MPI_Recv()      v
These processes, on the other hand, won't end in deadlock, provided, of course, that there are indeed two processes with ranks 0 and 1 in the same communicator domain.
RANK 0                          RANK 1
----------                      ----------
MPI_Send() ------------------> MPI_Recv()   |
                                            | TIME
                                            |
MPI_Recv() <------------------ MPI_Send()   v
The above fixed program may still fail if the size of the communicator comm does not allow for rank 1 (only rank 0). In that case the if-else won't take the else route, so no process will be listening for the MPI_Send() and rank 0 will deadlock.
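A cheap guard against that situation is to check the communicator size before attempting the exchange; a minimal fragment, assuming the usual stdio.h and mpi.h includes:
int comm_size;
MPI_Comm_size(comm, &comm_size);
if (comm_size < 2) {
    fprintf(stderr, "this exchange needs at least 2 ranks in comm\n");
    MPI_Abort(comm, 1);   /* or handle the single-rank case some other way */
}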
If you need to use your current communication layout, then you may prefer to use MPI_Isend() or MPI_Issend() instead for nonblocking sends, thus avoiding deadlock.
The post by @mcleod_ideafix is very good. I want to add a couple more things about non-blocking MPI calls.
The way most MPI implementations work is that they copy the data out of the user buffer into some other place. It might be a buffer internal to the implementation, or it might be something smarter on the right kind of network. When that data has been copied out of the user buffer and the buffer can be reused by the application, the MPI_SEND call returns. This may be before the matching MPI_RECV is called, or it may not. The larger the data you are sending, the more likely it is that your message will block until the MPI_RECV call is made.
The best way to avoid this is to use non-blocking calls MPI_IRECV and MPI_ISEND. This way you can post your MPI_IRECV first, then make your call to MPI_ISEND. This avoids extra copies when the messages arrive (because the buffer to hold them is already available via the MPI_IRECV) which makes things faster, and it avoids the deadlock situation. So now your code would look like this:
MPI_Request requests[2];
MPI_Status statuses[2];

MPI_Comm_rank (comm, &my_rank);
if (my_rank == 0) {
    MPI_Irecv (recvbuf, count, MPI_INT, 1, tag, comm, &requests[0]);
    MPI_Isend (sendbuf, count, MPI_INT, 1, tag, comm, &requests[1]);
} else if (my_rank == 1) {
    MPI_Irecv (recvbuf, count, MPI_INT, 0, tag, comm, &requests[0]);
    MPI_Isend (sendbuf, count, MPI_INT, 0, tag, comm, &requests[1]);
}
MPI_Waitall(2, requests, statuses);
As mcleod_ideafix explained, your code can result in a deadlock.
Here you go: an explanation and two possible solutions, one by rearranging the execution order, one by using asynchronous send/recv calls.
Here's the solution with async calls:
if (rank == 0) {
    MPI_Isend(..., 1, tag, MPI_COMM_WORLD, &req);
    MPI_Recv(..., 1, tag, MPI_COMM_WORLD, &status);
    MPI_Wait(&req, &status);
} else if (rank == 1) {
    MPI_Recv(..., 0, tag, MPI_COMM_WORLD, &status);
    MPI_Send(..., 0, tag, MPI_COMM_WORLD);
}

How to improve this program so the master also works

In this MPI program only the slave nodes do work. How can I modify it so that the master works as well? Having the master do work too should improve the performance of the system.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int A, B, C, N, slaveid, recvid, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /*-------------------------- Master ------------------------------*/
    if (rank == 0) {
        N = 10;
        for (slaveid = 1; slaveid < size; slaveid++) {
            MPI_Send(&N, 1, MPI_INT, slaveid, 1, MPI_COMM_WORLD);
        }
        for (recvid = 1; recvid < size; recvid++) {
            MPI_Recv(&A, 1, MPI_INT, recvid, 2, MPI_COMM_WORLD, &status);
            printf(" My id = %d and i send = %d\n", recvid, A);
        }
    }
    /*-------------------------- Slave ------------------------------*/
    if (rank > 0) {
        MPI_Recv(&B, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        C = B * 3;
        MPI_Send(&C, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
Within the block delimited by
if(rank == 0){
}
insert, at the appropriate location, the line
work_like_a_slave(argument1, argument2,...)
The appropriate location is probably between the loop that sends messages and the loop that receives messages so that the master isn't entirely idle while the slaves toil.
Whether this has a measurable impact on performance depends on a number of factors about which your question doesn't provide enough information to base a good guess: how many slaves there are (and therefore how busy the master is sending and receiving messages), how much work each process does compared with the amount of messaging it does, and so on.
Be prepared, if the numbers work against you, for any measurable impact to be negative, that is, for pressing the master into service to actually slow down your computation.
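A minimal sketch of that suggestion, reusing the variables from the question and letting the master compute the same B*3-style result between its send loop and its receive loop (the exact placement and the work itself are application-specific):
/*-------------------------- Master (also works) ------------------------------*/
if (rank == 0) {
    N = 10;
    for (slaveid = 1; slaveid < size; slaveid++)
        MPI_Send(&N, 1, MPI_INT, slaveid, 1, MPI_COMM_WORLD);

    /* master does one share of the work while the slaves compute */
    C = N * 3;
    printf(" My id = %d and i send = %d\n", rank, C);

    for (recvid = 1; recvid < size; recvid++) {
        MPI_Recv(&A, 1, MPI_INT, recvid, 2, MPI_COMM_WORLD, &status);
        printf(" My id = %d and i send = %d\n", recvid, A);
    }
}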

MPI Barrier C++

I want to use MPI (MPICH2) on Windows. I write this command:
MPI_Barrier(MPI_COMM_WORLD);
And I expect it to block all processes until all group members have called it. But that is not what happens. Here is a schematic of my code:
int a;
if (myrank == RootProc)
    a = 4;
MPI_Barrier(MPI_COMM_WORLD);
cout << "My Rank = " << myrank << "\ta = " << a << endl;
(With 2 processors:) The root processor (0) acts correctly, but the processor with rank 1 doesn't know the a variable, so it displays -858993460 instead of 4.
Can anyone help me?
Regards
You're only assigning a in process 0. MPI doesn't share memory, so if you want the a in process 1 to get the value of 4, you need to call MPI_Send from process 0 and MPI_Recv from process 1.
Variable a is not initialized - that is probably why it displays that number. In MPI, variable a is duplicated between the processes, so there are two copies of a, one of which is uninitialized. You want to write:
int a = 4;
if (myrank == RootProc)
...
Or, alternatively, do an MPI_Send in the root (id 0) and an MPI_Recv in the slave (id 1) so the value in the root is also set in the slave.
Note: that code triggers a small alarm in my head, so I need to check something and I'll edit this with more info. Until then, though, the uninitialized value is most certainly a problem for you.
OK, I've checked the facts - your code was not properly indented and I missed the missing {}. The barrier looks fine now, although the snippet you posted does not do much and is not a very good example of a barrier, because the slave enters it directly, whereas the root first sets the value of the variable to 4 and then enters it. To test that it actually works, you probably want some sort of sleep mechanism in one of the processes - that will hold the other process at the barrier, preventing it from printing the cout until the sleep is over.
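For instance, a small sketch of such a test in C (assuming a POSIX sleep() from unistd.h, the usual stdio.h/mpi.h includes, and myrank obtained via MPI_Comm_rank): rank 0 dawdles before the barrier, and no rank prints the "after" line until every rank has entered the barrier.
if (myrank == 0)
    sleep(3);                                /* rank 0 is deliberately slow */
printf("rank %d: before the barrier\n", myrank);
MPI_Barrier(MPI_COMM_WORLD);
printf("rank %d: after the barrier\n", myrank);  /* no rank prints this until all ranks have entered the barrier */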
Blocking is not enough; you have to send data to the other processes (memory is not shared between processes).
To share data across ALL processes use:
int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm )
so in your case:
MPI_Bcast(&a, 1, MPI_INT, 0, MPI_COMM_WORLD);
here you send the one integer pointed to by &a from process 0 to all the others.
// MPI_Bcast acts as the sender for the root process and as the receiver for the non-root processes
You can also send data to a specific process with:
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest,
             int tag, MPI_Comm comm)
and then receive by:
int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
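As a hedged sketch of using that pair in the question's setting (rank 0 sends the value of a to rank 1, which receives it into its own copy of a):
int a = 0;
MPI_Status status;
if (myrank == 0) {
    a = 4;
    MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);           /* one int to rank 1, tag 0 */
} else if (myrank == 1) {
    MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* matching receive from rank 0 */
}
/* after this, a == 4 on both rank 0 and rank 1 */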
