MPI send recv confusion - mpi

I have attached a sample of the MPI program I am trying to write. When I run this program using "mpirun -np 4 a.out", my output is:
Sender: 1
Data received from 1
Sender: 2
Data received from 1
Sender: 2
And the run hangs there. I don't understand why the sender variable changes its value after MPI_Recv. Any ideas?
Thank you,
Pradeep
program mpi_test
  include 'mpif.h'
  !----------------( Initialize variables )--------------------
  integer, dimension(3) :: recv, send
  integer :: sender, np, rank, ierror

  call mpi_init( ierror )
  call mpi_comm_rank( mpi_comm_world, rank, ierror )
  call mpi_comm_size( mpi_comm_world, np, ierror )

  !----------------( Main program )--------------------
  ! receive the data from the other processors
  if (rank.eq.0) then
    do sender = 1, np-1
      print *, "Sender: ", sender
      call mpi_recv(recv, 3, mpi_int, sender, 1, &
                    mpi_comm_world, status, ierror)
      print *, "Data received from ", sender
    end do
  end if

  ! send the data to the main processor
  if (rank.ne.0) then
    send(1) = 3
    send(2) = 4
    send(3) = 4
    call mpi_send(send, 3, mpi_int, 0, 1, mpi_comm_world, ierr)
  end if

  !----------------( clean up )--------------------
  call mpi_finalize(ierror)
  return
end program mpi_test

This is a typical stack smashing scenario. You have not declared the status variable, so implicit typing silently makes it a single REAL variable for you. But status has to be an INTEGER array of MPI_STATUS_SIZE elements:
integer, dimension(MPI_STATUS_SIZE) :: status
What happens in your case is that status is too small to hold the actual MPI status object, so the receive call overwrites some of the neighbouring stack variables, sender among them. Simply declare status as shown above and the problem goes away.
Another thing: modern Fortran supports the IMPLICIT NONE statement, which disables implicit typing of undeclared variables. If you put implicit none just before the include statement, the compiler will report the undeclared status as an error instead.

Related

Wrong synchronization of RMA calls in MPI

I am trying to use MPI's RMA scheme with fences. In some cases it works fine, but for systems with multiple nodes I get the following error:
Error message: MPI failed with Error_code = 71950898
Wrong synchronization of RMA calls , error stack:
MPI_Rget(176): MPI_Rget(origin_addr=0x2ac7b10, origin_count=1, MPI_INTEGER, target_rank=0, target_disp=0, target_count=1, MPI_INTEGER, win=0xa0000000, request=0x7ffdc1efe634) failed
(unknown)(): Wrong synchronization of RMA calls
Error from PE:0/4
This is a schematic of how I set up the code:
call MPI_init(..)
CALL MPI_WIN_CREATE(..)
do i = 1, 10
  MPI_Win_fence(0, handle, err)
  calc_values()
  MPI_Put(values)
  MPI_Put(values)
  MPI_Put(values)
  MPI_Win_fence(0, handle, err)
  MPI_Rget(values, req)
  MPI_WAIT(req)
  do_something(values)
  MPI_Rget(values, req)
  MPI_WAIT(req)
  do_something(values)
enddo
call MPI_finalize()
I know that MPI_Put is non-blocking. Is it guaranteed that the MPI_Put is finished after MPI_Win_fence(0, handle, err), or do I have to use MPI_Rput?
What does this error even mean: "Wrong synchronization of RMA calls"?
How do I fix my communication scheme?
Make sure you add MPI_Win_fence calls as necessary to ensure synchronization (you also need to make sure your window is created before any rank puts data into it):
MPI_Win_fence(0, window);
Please look at the example below (source) and note that it makes two fence calls: one before the MPI_Put and one after it, so that the put is complete before the target reads its buffer.
// Create the window
int window_buffer = 0;
MPI_Win window;
MPI_Win_create(&window_buffer, sizeof(int), sizeof(int), MPI_INFO_NULL,
               MPI_COMM_WORLD, &window);

if (my_rank == 1)
{
    printf("[MPI process 1] Value in my window_buffer before MPI_Put: %d.\n", window_buffer);
}

MPI_Win_fence(0, window);

if (my_rank == 0)
{
    // Push my value into the first integer in MPI process 1 window
    int my_value = 12345;
    MPI_Put(&my_value, 1, MPI_INT, 1, 0, 1, MPI_INT, window);
    printf("[MPI process 0] I put data %d in MPI process 1 window via MPI_Put.\n", my_value);
}

// Wait for the MPI_Put issued to complete before going any further
MPI_Win_fence(0, window);

if (my_rank == 1)
{
    printf("[MPI process 1] Value in my window_buffer after MPI_Put: %d.\n", window_buffer);
}

// Destroy the window
MPI_Win_free(&window);
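Applied to the loop in the question, a minimal sketch of the same fence pattern might look like the following. This is only a sketch: win, values, n, target and the helper routines are placeholders taken from the schematic, plain MPI_Put/MPI_Get are used rather than the request-based calls, and each closing fence is what guarantees that the preceding puts (your first question) and the get are complete.

/* Sketch only: 'win', 'values', 'n', 'target' and the helper routines are
   placeholders from the question's schematic, not a complete program. */
for (int i = 0; i < 10; i++) {
    MPI_Win_fence(0, win);                     /* open the epoch */
    calc_values(values);
    MPI_Put(values, n, MPI_INT, target, 0, n, MPI_INT, win);
    MPI_Win_fence(0, win);                     /* all puts are complete here */

    MPI_Get(values, n, MPI_INT, target, 0, n, MPI_INT, win);
    MPI_Win_fence(0, win);                     /* the get is complete here */
    do_something(values);
}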

HLA Assembly Recursive Fibonacci Program

I have written some code to solve this prompt:
Create an HLA Assembly language program that prompts for a number from the user. Create and call a function that calculates a value in the Fibonacci sequence. In mathematics, the Fibonacci sequence is named after the Italian mathematician Leonardo of Pisa who was known during his lifetime as Fibonacci. The Fibonacci sequence starts with 1 and 1. Each later term in the sequence is the sum of the two previous values. So the series will be: 1,1,2,3,5,8,13 and so on. In order to receive full credit, you must use recursion to solve this problem building a function whose signature is:
procedure fibRec( value : int8 ); #nodisplay; #noframe;
Here are some example program dialogues to guide your efforts:
Provide a number: 3
fib(3) = 2
Provide a number: 5
fib(5) = 5
In an effort to help you focus on building an Assembly program, I’d like to offer you the following C statements which match the program specifications stated above. If you like, use them as the basis for building your Assembly program.
SAMPLE C CODE:
------------------------
int main( )
{
    int value;
    printf( "Provide a value: " );
    scanf( "%d", &value );
    int f = fibRec( value );
    printf( "fib( %d ) = %d\n", value, f );
    return( 0 );
}

int fibRec( int value )
{
    int result = 1;
    if (value == 1 || value == 2)   // base case
        result = 1;
    else
        result = fibRec( value-1 ) + fibRec( value-2 );
    return( result );
}
and my approach is to try to use the C implementation and convert it to HLA.
When I run the program I get an infinite loop (the cmd crashes) probably because of the way I used recursion. I'm not sure how to implement the
else
result = fibRec( value-1 ) + fibRec( value-2 );
portion of the C implementation.
Here is what I have:
program fib;
#include("stdlib.hhf");

static
    value : int8;
    //returnAddress : dword;
    //temp: int16;

procedure fibRec( value : int8 ); #nodisplay; #noframe;
begin fibRec;
    mov(CL, value);
    mov(1, DL);
    cmp(CL, 1);
    je Res1;
    cmp(CL, 2);
    je Res1;
    jmp Else1;

    //else result = fibRec( value-1 ) + fibRec( value-2 );
    Else1:
        //mov(1, DL);
        dec(CL);
        call fibRec;
        sub(2, CL);
        call fibRec;
        add(CL, DL);
        jmp ProgExit;

    Res1:
        mov(1, DL);
        jmp ProgExit;

    ProgExit:
end fibRec;

/////////////////////////////////////////////////////////////////////////////////////////////////////

begin fib;
    stdout.put( "Provide a value: " );
    stdin.get(value);   //CHANGED TO IVALUE
    mov(CL, value);     //SAVES THE INPUT TO A REGISTER
    call fibRec;        // MUST CALL THE PROCEDURE
    stdout.put("fib(");
    stdout.puti8(value);
    stdout.put(") = ");
    stdout.put(DL);
end fib;
Learn how to debug your code; there are obvious problems you would spot if you stepped through it. For example, at the very beginning mov(CL, value); overwrites the user input stored in value with whatever happens to be in CL (HLA's mov is mov(source, destination)).
Then in the procedure you declare a parameter "value" but work with CL instead, overwriting the content of value (I'm not sure what it is in HLA, a stack variable or memory?).
You use the 8-bit registers CL/DL for your values, but the C example uses int (32-bit signed).
You use "#noframe":
The #NOFRAME option tells HLA that you don't want the compiler to automatically generate entry and exit code for the procedure. This tells HLA not to automatically generate the RET instruction (along with several other instructions).
But you don't have a "ret();" at the end of your procedure, so execution will continue into whatever code happens to be placed after your procedure.
And finally, about your recursion problem.
ASM is not C: when you call a subroutine, the registers stay "live" as they are, all the time; there is only a single set of them.
So this is quite wrong:
dec(CL);
call fibRec;
sub(2, CL);
call fibRec;
add(CL, DL);
After the first call your CL and DL are already overwritten.
The easiest and most straightforward way to preserve register values is to use the stack, i.e. push ecx, edx ahead of the call, then pop edx, ecx afterwards to restore them.
For example, here is the fib subroutine written in 32-bit x86 assembly (NASM Intel syntax! So it's mov destination, source, the opposite order from your HLA!):
fibRecursion:
    ; expects unsigned "n" (1+) in eax, returns fibonacci(n) in eax
    ; will crash on large "n" due to stack overflow
    cmp     eax, 2
    ja      moreThanTwo
    mov     eax, 1          ; n: 0, 1 and 2 returns "1"
    ret

moreThanTwo:
    push    edx             ; preserve edx
    dec     eax
    push    eax             ; store n-1 in stack
    call    fibRecursion    ; eax = fib(n-1)
    xchg    eax, [esp]      ; store fib(n-1) in stack, get n-1 into eax
    dec     eax
    call    fibRecursion    ; eax = fib(n-2)
    pop     edx             ; edx = fib(n-1)
    add     eax, edx        ; eax = fib(n) = eax+edx
    pop     edx             ; restore edx
    ret
Yep, so now you just have to fix the syntax for HLA... (more like rewrite it, so that you're sure you understand how it works).
And learn how to debug your code, I think I forgot to mention this.
Also, did I mention you should debug your code?
I did debug mine, so I'm 100% sure it works as expected (for small "n", like a few hundred or thousand; I'm not sure how big the default stack is for Linux elf32 binaries, and I'm not going to find out where it crashes with a stack overflow).

When to use tags when sending and receiving messages in MPI?

I'm not sure when I have to use different numbers for the tag field in MPI send, receive calls. I've read this, but I can't understand it.
Sometimes there are cases when A might have to send many different
types of messages to B. Instead of B having to go through extra
measures to differentiate all these messages, MPI allows senders and
receivers to also specify message IDs with the message (known as
tags). When process B only requests a message with a certain tag
number, messages with different tags will be buffered by the network
until B is ready for them.
Do I have to use tags, for example, when I have multiple "isend" calls (with different tags) from process A and only one "ireceive" call in process B?
Message tags are optional. You can use arbitrary integer values for them and use whichever semantics you like and seem useful to you.
Like you suggested, tags can be used to differentiate between messages that consist of different types (MPI_INTEGER, MPI_REAL, MPI_BYTE, etc.). You could also use tags to add some information about what the data actually represents (if you have an nxn matrix, a message to send a row of this matrix will consist of n values, as will a message to send a column of that matrix; nevertheless, you may want to treat row and column data differently).
Note that the receive operation has to match the tag of the message it wants to receive. This does not mean, however, that you have to specify the same tag on both sides: you can also use the wildcard MPI_ANY_TAG as the message tag, and the receive operation will then match any tag. You can find out which tag the sender actually used with the help of MPI_Probe.
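For example, a receiver that does not know the tag (or the message length) in advance could be sketched like this in C; the integer payload and the helper name recv_any are assumptions for illustration only:

#include <mpi.h>
#include <stdlib.h>

/* Sketch: receive one message of unknown tag and length from any sender. */
void recv_any(MPI_Comm comm)
{
    MPI_Status status;
    int count;

    /* Block until a message is available, without actually receiving it yet. */
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);
    MPI_Get_count(&status, MPI_INT, &count);

    int* buf = (int*)malloc(count * sizeof(int));

    /* Receive exactly the message we probed, using the tag the sender chose. */
    MPI_Recv(buf, count, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
             comm, MPI_STATUS_IGNORE);

    /* ... dispatch on status.MPI_TAG here ... */
    free(buf);
}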
In general you can get by without tags: there is no requirement that you use them, and if you need to know the message size before parsing the message you can use MPI_Probe, which lets you handle different messages without assigning tags at all. I typically do use tags, though, because MPI_Recv requires that you know the message size before fetching the data: if you have messages of different sizes and types, tags help you differentiate between them by having multiple threads or processes each listening for a different subset. Tag 1 can mean messages of type X and tag 2 messages of type Y. Tags also give you multiple "channels" of communication without the work of creating separate communicators and groups.
#include <mpi.h>
#include <iostream>

using namespace std;

int main( int argc, char* argv[] )
{
    // Init MPI
    MPI_Init( &argc, &argv );

    // Get the rank and size
    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    // If Master
    if( rank == 0 ){
        char message_r1[] = "Hello Rank 1";
        char message_r2[] = "Hello Rank 2";

        // Send a message over tag 0
        MPI_Send( message_r1, 13, MPI_CHAR, 1, 0, MPI_COMM_WORLD );

        // Send a message over tag 1
        MPI_Send( message_r2, 13, MPI_CHAR, 2, 1, MPI_COMM_WORLD );
    }
    else{
        // Buffer
        char buffer[256];
        MPI_Status status;

        // Wait for your own message
        MPI_Recv( buffer, 13, MPI_CHAR, 0, rank-1, MPI_COMM_WORLD, &status );

        cout << "Rank: " << rank << ", Message: " << buffer << std::endl;
    }

    // Finalize MPI
    MPI_Finalize();
}
Tags can be useful in distributed computing algorithms where there can be multiple types of messages. Consider the leader election problem, where a process (election candidate) sends a message of type requestVote and the other processes respond with a message of type voteGrant.
There are many such algorithms that distinguish between message types, and a tag is a convenient way to categorize those messages, as in the sketch below.
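A minimal sketch of that idea in C; the tag values, the payload (a term number), and the rank roles are made up purely for illustration:

#include <mpi.h>

#define TAG_REQUEST_VOTE 1   /* hypothetical tag for "requestVote" messages */
#define TAG_VOTE_GRANT   2   /* hypothetical tag for "voteGrant" messages   */

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int term = 7;   /* arbitrary payload */
    if (rank == 0) {
        /* The candidate asks every other rank for a vote ... */
        for (int r = 1; r < size; r++)
            MPI_Send(&term, 1, MPI_INT, r, TAG_REQUEST_VOTE, MPI_COMM_WORLD);
        /* ... and collects the grants, recognised purely by their tag. */
        for (int r = 1; r < size; r++) {
            int granted;
            MPI_Recv(&granted, 1, MPI_INT, r, TAG_VOTE_GRANT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    } else {
        int requested_term, grant = 1;
        MPI_Recv(&requested_term, 1, MPI_INT, 0, TAG_REQUEST_VOTE,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&grant, 1, MPI_INT, 0, TAG_VOTE_GRANT, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}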

How do I check that all MPI procs were used to call a procedure?

I have designed a procedure that must be called by all processors in the communicator in order to function properly. If the user called it with only the root rank, I want the procedure to know this and then produce a meaningful error message to the user of the procedure. At first I thought of having the procedure call a checking routine shown below:
subroutine AllProcsPresent
  ! Checks that all procs have been used to call this procedure
  use MPI_stub, only: nproc, Allreduce
  integer :: counter
  counter=1
  call Allreduce(counter) ! This is a stub procedure that will add "counter" across all procs
  if (counter(1)==get_nproc()) then
    return
  else
    print *, "meaningful error"
  end if
end subroutine AllProcsPresent
But this won't work because the Allreduce is going to wait for all procs to check in, and if only root made the call, the other procs will never arrive. Is there a way to do what I'm trying to do?
There's not much you can do here. You might want to look at 'collecheck' for ideas, but it's hard to find a good resource for that package. Here's its git home:
http://git.mpich.org/mpe.git/tree/HEAD:/src/collchk
If you look at 'NOTES' there's an item about "call consistency" described as "Ensures that all processes in the communicator have made the same call in a given event". Hope that can give you some ideas.
Ensuring that a collective operation is entered by all ranks within a communicator is the responsibility of the programmer.
However, you might consider using the MPI 3.0 non-blocking collective MPI_Ibarrier with an MPI_Test loop and a timeout. Note that non-blocking collectives can't be cancelled, so if the other ranks do not join the operation within your timeout, you will have to abort the entire job. Something like:
/* Needs <mpi.h> and <unistd.h> (for usleep). */
void AllPresent(MPI_Comm comm, double timeout) {
    int all_here = 0;
    MPI_Request req;
    MPI_Ibarrier(comm, &req);
    double start_time = MPI_Wtime();
    do {
        MPI_Test(&req, &all_here, MPI_STATUS_IGNORE);
        usleep(10000);   /* 10 ms; sleep() only takes whole seconds, so sleep(0.01) would not wait */
        double now = MPI_Wtime();
        if (now - start_time > timeout) {
            /* Print an error message */
            MPI_Abort(comm, 1);
        }
    } while (!all_here);
    /* Run your procedure now */
}
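A hypothetical way to use it from a routine that must be entered by every rank (the name MyCollectiveRoutine and the 5-second timeout are made up):

/* Hypothetical wrapper: verify that everyone arrived before doing collective work. */
void MyCollectiveRoutine(MPI_Comm comm)
{
    AllPresent(comm, 5.0);   /* aborts the whole job if some rank never calls in */
    /* ... now it is safe to call MPI_Allreduce etc. ... */
}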

How does the MPI_SENDRECV work?

I have a question about MPI_SENDRECV.
here is an example:
PROGRAM sendrecv
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER a, b, myrank, nprocs, ierr
  integer istat(MPI_STATUS_SIZE)

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

  if (myrank.eq.0) then
    a=1; b=3
  else
    a=2; b=4
  endif

  if (myrank == 0) then
    call MPI_SENDRECV(b, 1, MPI_REAL, 1, 0, &
                      a, 1, MPI_REAL, 1, 0, &
                      MPI_COMM_WORLD, istat, ierr)
  elseif (myrank == 1) then
    call MPI_SENDRECV(b, 1, MPI_REAL, 0, 0, &
                      a, 1, MPI_REAL, 0, 0, &
                      MPI_COMM_WORLD, istat, ierr)
  end if

  if (myrank.eq.0) then
    write(*,*) b, a
  else
    write(*,*) a, b
  endif

  CALL MPI_FINALIZE(ierr)
END
After this we get 3 4 and 3 4.
My question is: what if we replace the MPI_SENDRECV as follows (assuming that MPI_SENDRECV first sends and then receives)?
if (myrank == 0) then
  call MPI_SEND(b, 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)
  call MPI_RECV(a, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, istat, ierr)
elseif (myrank == 1) then
  call MPI_SEND(b, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, ierr)
  call MPI_RECV(a, 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, istat, ierr)
end if
Then this will deadlock.
So this means that MPI_SENDRECV does not first send and then receive, but sends and receives simultaneously, right?
You are right, MPI_Sendrecv is not the same as a send followed by a receive. Think of it as an MPI_Isend, MPI_Irecv, and a pair of MPI_Waits. So in effect, the send and the receive proceed in parallel.
Incidentally, this is how it is commonly implemented in the MPI libraries.
If you wanted to fix the deadlock in your second example, the processes would have to issue the sends and receives in a different order: rank 0 would issue a send followed by a receive, and rank 1 a receive followed by a send.
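As a sketch of that equivalence in C (two ranks exchanging one integer; this is conceptual, not the actual library implementation):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {                       /* only ranks 0 and 1 take part */
        int a = 0, b = (rank == 0) ? 3 : 4;
        int partner = 1 - rank;

        /* Post both transfers first, then wait for both, so neither rank
           can block the other: this is the spirit of MPI_Sendrecv. */
        MPI_Request reqs[2];
        MPI_Isend(&b, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&a, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        printf("rank %d received a = %d\n", rank, a);
    }

    MPI_Finalize();
    return 0;
}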
Even though the message is routed to the receiving process B, process B still has to indicate that it wants to receive A's data; only once it does so is the data actually transmitted, and A is then notified that the transfer happened and can go back to work.
Your second code can't satisfy that condition: each rank posts a receive from itself, so neither rank ever answers the other's send. It should look like the following (note that both ranks still call MPI_Send first, so this relies on the library buffering the small messages; swapping the order on one of the ranks, as described above, avoids even that dependence):
if (myrank == 0) then
  call MPI_SEND(b, 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)
  call MPI_RECV(a, 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, istat, ierr)
elseif (myrank == 1) then
  call MPI_SEND(b, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, ierr)
  call MPI_RECV(a, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, istat, ierr)
end if
