Is there a non-blocking version of MPI_Comm_create_group()?

Suppose we want to create three communicators for three processes with ranks 0, 1 and 2, where the groups of the communicators are {0,1}, {0,2} and {1,2}.
It seems that (see the example code below) if each process calls MPI_Comm_create_group() in the following order, the program deadlocks: process 0 blocks in its call for {0,1} waiting for process 1, process 1 blocks in its call for {1,2} waiting for process 2, and process 2 blocks in its call for {0,2} waiting for process 0, so none of the calls can ever complete.
Process 0:
    create a communicator for the group {0,1}
    create a communicator for the group {0,2}
Process 1:
    create a communicator for the group {1,2}
    create a communicator for the group {0,1}
Process 2:
    create a communicator for the group {0,2}
    create a communicator for the group {1,2}
What solutions are there? One obvious solution is to change the order of the calls to MPI_Comm_create_group() so that all processes agree on a single global order of the groups; a sketch of that reordering follows the example code below.
The example code:
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    MPI_Init(NULL, NULL);

    int worldsize, worldrank;
    MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
    MPI_Comm_rank(MPI_COMM_WORLD, &worldrank);

    if (worldsize < 3)
    {
        if (worldrank == 0) printf("Please launch at least 3 processes!\n");
        MPI_Finalize();
        return 0;
    }

    MPI_Group worldgroup, newgroup;
    MPI_Comm newcomm1, newcomm2;
    MPI_Comm_group(MPI_COMM_WORLD, &worldgroup);

    int set01[2] = {0, 1};
    int set02[2] = {0, 2};
    int set12[2] = {1, 2};

    switch (worldrank)
    {
        case 0:
            MPI_Group_incl(worldgroup, 2, set01, &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm1);
            MPI_Group_incl(worldgroup, 2, set02, &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 1, &newcomm2);
            break;
        case 1:
            MPI_Group_incl(worldgroup, 2, set12, &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 2, &newcomm1);
            MPI_Group_incl(worldgroup, 2, set01, &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm2);
            break;
        case 2:
            MPI_Group_incl(worldgroup, 2, set02, &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 1, &newcomm1);
            MPI_Group_incl(worldgroup, 2, set12, &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 2, &newcomm2);
            break;
    }

    MPI_Finalize();
}
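A sketch of that reordering: if every process creates its communicators following one global order of the groups ({0,1}, then {0,2}, then {1,2}), the circular wait cannot form. Only the switch statement of the program above changes (same groups and tags as before):

switch (worldrank)
{
    case 0: /* {0,1} first, then {0,2} */
        MPI_Group_incl(worldgroup, 2, set01, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm1);
        MPI_Group_incl(worldgroup, 2, set02, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 1, &newcomm2);
        break;
    case 1: /* {0,1} first, then {1,2} */
        MPI_Group_incl(worldgroup, 2, set01, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm1);
        MPI_Group_incl(worldgroup, 2, set12, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 2, &newcomm2);
        break;
    case 2: /* {0,2} first, then {1,2} */
        MPI_Group_incl(worldgroup, 2, set02, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 1, &newcomm1);
        MPI_Group_incl(worldgroup, 2, set12, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 2, &newcomm2);
        break;
}

With this ordering, {0,1} completes first on processes 0 and 1, then {0,2} on processes 0 and 2, then {1,2} on processes 1 and 2, so no process ever waits on a partner that is stuck in a different call.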

Related

MPI_SendRecv deadlock on execution

I'm trying to send information from one processor to another in a ring, starting from an offset processor, using MPI_Sendrecv, but I get a deadlock. What is wrong in my code? Basically, I need to use MPI_Sendrecv to solve this kind of problem.
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int offset = 9;
    int size, rank, value, next, prev, sendval, recval, namelen;
    double t0, t;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    value = 5;
    if (size > 1)
    {
        next = (rank + 1) % size;
        prev = (size + rank - 1) % size;
        sendval = value + rank;

        if (rank == offset)
        {
            MPI_Sendrecv(&sendval, 1, MPI_INT, next, 1, &recval, 1, MPI_INT, prev, 10, MPI_COMM_WORLD, &status);
        }
        else
        {
            MPI_Recv(&recval, 1, MPI_INT, prev, 10, MPI_COMM_WORLD, &status);
            MPI_Send(&sendval, 1, MPI_INT, next, 10, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}
You have mismatched message tags:
MPI_Sendrecv(&sendval, 1, MPI_INT, next, 1, &recval, 1, MPI_INT, prev, 10, MPI_COMM_WORLD, &status);
//                                       ^
...
MPI_Recv(&recval, 1, MPI_INT, prev, 10, MPI_COMM_WORLD, &status);
//                                  ^^
The tag in the send part of the send-receive operation should also be 10.
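With that fix, the call on the offset rank becomes (a sketch; only the send tag changes):

MPI_Sendrecv(&sendval, 1, MPI_INT, next, 10,   /* send tag now matches the receivers' tag */
             &recval,  1, MPI_INT, prev, 10,
             MPI_COMM_WORLD, &status);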

When using MPI_Isend() and MPI_Irecv(), should the pair use the same request? And when the message is an array, how should it be passed?

In this code I am trying to implement a broadcast using non-blocking sends and receives, as practice. I have several questions and issues.
1. Should I pair Isend() and Irecv() so that they use the same request?
2. When the message is an array, how should it be passed? In this case, message or &message?
3. Why can't I run this code with fewer or more than 8 processes? If a rank doesn't exist, shouldn't the code simply skip that branch?
4. The snippet at the bottom is there to print the total time once, but the Waitall() does not work, and I do not understand why.
5. When passing arrays longer than 2^12 I get a segmentation fault, although I have checked the limits of Isend() and Irecv() and they are supposed to handle even larger messages.
6. I used long double to record the time; is this common or good practice? When I used smaller types such as float or double, I would get NaN.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);

    int i, rank, size, ready;
    long int N = pow(2, 10);
    float* message = (float *)malloc(sizeof(float *) * N + 1);
    long double start, end;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    //MPI_Request* request = (MPI_Request *)malloc(sizeof(MPI_Request *) * size);
    MPI_Request request[size-1];

    /*Stage I: -np 8*/
    if(rank == 0){
        for(i = 0; i < N; i++){
            message[i] = N*rand();
            message[i] /= rand();
        }
        start = MPI_Wtime();
        MPI_Isend(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[0]);
        MPI_Isend(&message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[1]);
        MPI_Isend(&message, N, MPI_FLOAT, 4, 0, MPI_COMM_WORLD, &request[3]);
        printf("Processor root-rank %d- sent the message...\n", rank);
    }
    if (rank == 1){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[0]);
        MPI_Wait(&request[0], MPI_STATUS_IGNORE);
        printf("Processor rank 1 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request[2]);
        MPI_Isend(&message, N, MPI_FLOAT, 5, 0, MPI_COMM_WORLD, &request[4]);
    }
    if(rank == 2){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[1]);
        MPI_Wait(&request[1], MPI_STATUS_IGNORE);
        printf("Processor rank 2 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 6, 0, MPI_COMM_WORLD, &request[5]);
    }
    if(rank == 3){
        MPI_Irecv(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[2]);
        MPI_Wait(&request[2], MPI_STATUS_IGNORE);
        printf("Processor rank 3 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 7, 0, MPI_COMM_WORLD, &request[6]);
    }
    if(rank == 4){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[3]);
        MPI_Wait(&request[3], MPI_STATUS_IGNORE);
        printf("Processor rank 4 received the message.\n");
    }
    if(rank == 5){
        MPI_Irecv(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[4]);
        MPI_Wait(&request[4], MPI_STATUS_IGNORE);
        printf("Processor rank 5 received the message.\n");
    }
    if(rank == 6){
        MPI_Irecv(&message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[5]);
        MPI_Wait(&request[5], MPI_STATUS_IGNORE);
        printf("Processor rank 6 received the message.\n");
    }
    if(rank == 7){
        MPI_Irecv(&message, N, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request[6]);
        MPI_Wait(&request[6], MPI_STATUS_IGNORE);
        printf("Processor rank 7 received the message.\n");
    }

    /*MPI_Testall(size-1, request, &ready, MPI_STATUS_IGNORE);*/
    /*if (ready){*/
    end = MPI_Wtime();
    printf("Total Time: %Lf\n", end - start);
    /*}*/

    MPI_Finalize();
}
Each MPI task runs in its own address space, so there is no correlation between request[1] on rank 0 and request[1] on rank 2. That means you do not have to "pair" the requests. That being said, if you think "pairing" the requests improves the readability of your code, you might want to do so even if this is not required.
The buffer parameter of MPI_Isend() and MPI_Irecv() is a pointer to the start of the data; here that is message (and not &message).
If you run with, say, 2 MPI tasks, MPI_Isend(..., dest=2, ...) on rank 0 will fail because 2 is not a valid rank in the MPI_COMM_WORLD communicator.
Many requests are uninitialized when MPI_Waitall() (well, MPI_Testall() here) is invoked. One option is to first initialize all of them to MPI_REQUEST_NULL.
Using &message results in memory corruption, and that likely explains the crash.
From the MPI standard, the prototype is double MPI_Wtime(), so you should use double here (the NaN values likely come from the memory corruption described above).
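Putting those points together, a minimal sketch of the corrected pieces (only the changed lines are shown; the rest of the send/receive structure stays as above):

double start, end;                              /* MPI_Wtime() returns double        */
float *message = malloc(N * sizeof(float));     /* buffer of N floats                */

MPI_Request request[size-1];
for (int i = 0; i < size - 1; i++)
    request[i] = MPI_REQUEST_NULL;              /* untouched entries are then valid
                                                   arguments to Testall/Waitall      */

/* ... the remaining Isend/Irecv calls change the same way, e.g. ... */
MPI_Isend(message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[0]);   /* message, not &message */
MPI_Irecv(message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[0]);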

MPI on Sun Grid Engine cluster

I am running MPI applications on a cluster with Sun Grid Engine, using OpenMPI.
Has anyone ever experienced MPI communications that hang while the applications are running?
For example: the process on rank 0 calls MPI_Send to the process on rank 1, and the process on rank 1 calls MPI_Recv from the process on rank 0. The ranks are correct and the tags are correct, yet the communication simply never happens, so the application never terminates.
N.B. The applications work on my laptop and on other machines, so it is not the case that I have some IDs wrong... Also, these applications work on the cluster the first time, but when I submit the exact same job again, the MPI communication hangs as described above.
If anyone has experienced something similar, any help would be appreciated, thanks.
EDIT:
I have implemented a very simple example of one of the applications:
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdarg.h>
#include <pthread.h>
#include <time.h>
#include <sys/time.h>

#define NUM_CELLS 5 //Total number of integers to sort

/* Initial process to create numbers and send to first cell */
void *start(void *data)
{
    int i, num;
    time_t t;
    srand((unsigned) time(&t));

    //Create array of random numbers and print
    for(i = 0; i < NUM_CELLS; i++)
    {
        num = rand() % 100;
        printf("0 SEND\n");
        MPI_Send(&num, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
        printf("0 SENT\n");
    }
    return NULL;
}

/* Process for individual sort cell in sort pump */
void *sort_cell(void *data)
{
    int *pos = (int *)data;
    int num = 2, i;

    for(i = 0; i < NUM_CELLS; i++)
    {
        //Receive num
        printf("%d WAIT\n", *pos);
        if(*pos == 1)
            MPI_Recv(&num, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        else if(*pos == 2)
            MPI_Recv(&num, 1, MPI_INT, 1, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        else if(*pos == 3)
            MPI_Recv(&num, 1, MPI_INT, 2, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        else if(*pos == 4)
            MPI_Recv(&num, 1, MPI_INT, 3, 4, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        else if(*pos == 5)
            MPI_Recv(&num, 1, MPI_INT, 4, 5, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%d RECV\n", *pos);

        //Keep larger number and send smaller number to next cell
        printf("%d SEND\n", *pos);
        if(*pos == 1)
            MPI_Send(&num, 1, MPI_INT, 2, 2, MPI_COMM_WORLD);
        else if(*pos == 2)
            MPI_Send(&num, 1, MPI_INT, 3, 3, MPI_COMM_WORLD);
        else if(*pos == 3)
            MPI_Send(&num, 1, MPI_INT, 4, 4, MPI_COMM_WORLD);
        else if(*pos == 4)
            MPI_Send(&num, 1, MPI_INT, 1, 5, MPI_COMM_WORLD);
        printf("%d SENT\n", *pos);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int i;
    double elapsedTime;
    struct timeval t1, t2;

    //Start timer
    gettimeofday(&t1, NULL);

    int my_rank, provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    //Stop timer
    gettimeofday(&t2, NULL);
    elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;    //sec to ms
    elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0; //us to ms
    printf("Rank %d - Setup time: %f milliseconds\n", my_rank, elapsedTime);

    //Start timer
    gettimeofday(&t1, NULL);

    //Execute processes in parallel according to mapping
    if(my_rank == 0)
    {
        int num_threads = 1;
        pthread_t threads[num_threads];
        pthread_create(&threads[0], NULL, start, NULL);
        for(i = 0; i < num_threads; i++)
            (void) pthread_join(threads[i], NULL);
    }
    if(my_rank == 1)
    {
        int pos = 1;
        int num_threads = 2;
        pthread_t threads[num_threads];
        pthread_create(&threads[0], NULL, sort_cell, (void *) &pos);
        int pos1 = 5;
        pthread_create(&threads[1], NULL, sort_cell, (void *) &pos1);
        for(i = 0; i < num_threads; i++)
            (void) pthread_join(threads[i], NULL);
    }
    if(my_rank == 2)
    {
        int pos = 2;
        int num_threads = 1;
        pthread_t threads[num_threads];
        pthread_create(&threads[0], NULL, sort_cell, (void *) &pos);
        for(i = 0; i < num_threads; i++)
            (void) pthread_join(threads[i], NULL);
    }
    if(my_rank == 3)
    {
        int pos = 3;
        int num_threads = 1;
        pthread_t threads[num_threads];
        pthread_create(&threads[0], NULL, sort_cell, (void *) &pos);
        for(i = 0; i < num_threads; i++)
            (void) pthread_join(threads[i], NULL);
    }
    if(my_rank == 4)
    {
        int pos = 4;
        int num_threads = 1;
        pthread_t threads[num_threads];
        pthread_create(&threads[0], NULL, sort_cell, (void *) &pos);
        for(i = 0; i < num_threads; i++)
            (void) pthread_join(threads[i], NULL);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    //Stop timer
    gettimeofday(&t2, NULL);
    elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;    //sec to ms
    elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0; //us to ms
    printf("Rank %d - Execution time: %f milliseconds\n", my_rank, elapsedTime);

    MPI_Finalize();
    return 0;
}
Depending on how the sort_cells are assigned to the ranks, it sometimes works, and I cannot understand why.
The application is as follows:
Sort_cell -> Sort_cell -> Sort_cell -> Sort_cell -> Sort_cell -> Sort_cell
ranks: 0 1 2 3 4 1
(where each Sort_cell is a process run on the specified ranks respectively)
If these are run on ranks that are grouped, i.e. 00111, 01233, 00112, etc., it works, but once I execute with a stray rank (as in the example above), i.e. 00110, 01221, 01231, etc., the MPI communication hangs. Any suggestions?
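One detail that may be worth verifying in this situation, since start() and sort_cell() call MPI from concurrently running pthreads: the program requests MPI_THREAD_MULTIPLE but never inspects provided, and such calls are only defined when the library actually grants that level. A minimal check right after MPI_Init_thread() could look like this (a sketch, not a confirmed diagnosis of the hang):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE)
{
    /* start() and sort_cell() call MPI from concurrent threads, which is
       only safe when the library grants MPI_THREAD_MULTIPLE */
    fprintf(stderr, "MPI_THREAD_MULTIPLE not granted (got level %d)\n", provided);
    MPI_Abort(MPI_COMM_WORLD, 1);
}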

MPI topology for arbitrary domains

I have a rectangular 3 x 2 domain, as shown in figure (a). Using an MPI Cartesian topology, I can identify the 4 neighbors of each cell (-1 means no neighbor, i.e. MPI_PROC_NULL), as demonstrated below.
Figure: Rectangular and arbitrary domain
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int rank, size;
    int top = 0, right = 1, bottom = 2, left = 3;
    int neighbors[4], dimSize[2] = {3, 2};
    int usePeriods[2] = {0, 0}, newCoords[2];
    MPI_Comm cartComm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 6) {
        if (rank == 0)
            printf("Please Launch with ONLY 6 processes \n");
        MPI_Finalize();
        exit(0);
    }

    // Create a Cartesian communicator
    MPI_Cart_create(MPI_COMM_WORLD, 2, dimSize, usePeriods, 1, &cartComm);

    // Obtain the 2D coordinates in the new communicator
    MPI_Cart_coords(cartComm, rank, 2, newCoords);

    // Obtain the direct neighbor ranks
    MPI_Cart_shift(cartComm, 0, 1, neighbors + left, neighbors + right);
    MPI_Cart_shift(cartComm, 1, 1, neighbors + top, neighbors + bottom);

    printf("Rank: %d \t Neighbors(top, right, bottom, left): %2d, %2d, %2d, %2d \n",
           rank, neighbors[0], neighbors[1], neighbors[2], neighbors[3]);

    MPI_Finalize();
    return 0;
}
Results:
Rank: 0 Neighbors(top, right, bottom, left): -1, 2, 1, -1
Rank: 1 Neighbors(top, right, bottom, left): 0, 3, -1, -1
Rank: 2 Neighbors(top, right, bottom, left): -1, 4, 3, 0
Rank: 3 Neighbors(top, right, bottom, left): 2, 5, -1, 1
Rank: 4 Neighbors(top, right, bottom, left): -1, -1, 5, 2
Rank: 5 Neighbors(top, right, bottom, left): 4, -1, -1, 3
My question: is there a way to use MPI to identify these neighbors for a non-rectangular domain such as the one shown in figure (b)? Here we assume no periodic neighbors. At the moment, I can only think of applying a similar Cartesian topology approach to the 3x3 grid of figure (c):
mpirun -np 9 ./run_cartesian
Next, remove the NULL cells (gray) and renumber the ranks from 0 to 5. Save the result to a file, then load the file (on the root) and scatter the corresponding neighbors to each processor.
mpirun -np 6 ./run_load_scatter
Any suggestions for a better way to do this? Thanks.
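As a rough, self-contained sketch of that masking idea (the active/gray pattern below is hypothetical and merely stands in for figure (b); the renumbering to ranks 0-5 and the save/load/scatter step are omitted):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int rank, size, neighbors[4];
    int top = 0, right = 1, bottom = 2, left = 3;
    int dimSize[2] = {3, 3}, usePeriods[2] = {0, 0};
    /* Hypothetical mask over the 3x3 grid, indexed by the rank in cartComm
       (row-major over the grid): 1 = used cell, 0 = gray cell */
    int active[9] = {1, 1, 0, 1, 1, 1, 0, 1, 0};
    MPI_Comm cartComm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size != 9) {
        if (rank == 0) printf("Please launch with 9 processes\n");
        MPI_Finalize();
        return 0;
    }

    MPI_Cart_create(MPI_COMM_WORLD, 2, dimSize, usePeriods, 1, &cartComm);
    MPI_Comm_rank(cartComm, &rank);   /* rank in the Cartesian communicator */

    MPI_Cart_shift(cartComm, 0, 1, neighbors + left, neighbors + right);
    MPI_Cart_shift(cartComm, 1, 1, neighbors + top, neighbors + bottom);

    /* Drop links that point at gray cells */
    for (int d = 0; d < 4; d++)
        if (neighbors[d] != MPI_PROC_NULL && !active[neighbors[d]])
            neighbors[d] = MPI_PROC_NULL;

    if (active[rank])
        printf("Active rank %d Neighbors(top, right, bottom, left): %d, %d, %d, %d\n",
               rank, neighbors[top], neighbors[right], neighbors[bottom], neighbors[left]);

    /* Renumbering the 6 active ranks to 0-5 could then be done with, e.g.,
       MPI_Comm_split(cartComm, active[rank], rank, &activeComm);           */
    MPI_Finalize();
    return 0;
}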

difficulty with MPI_Gather function

I have a local array (named lvotes) on each of the processors (assume 3 processors), and the first element of each stores a value, i.e.:
P0 : 4
P1 : 6
P2 : 7
Now, using MPI_Gather, I want to gather them all on P0, so it will look like:
P0 : 4, 6, 7
I used gather this way:
MPI_Gather(lvotes, P, MPI_INT, lvotes, 1, MPI_INT, 0, MPI_COMM_WORLD);
But I get problems. It's my first time coding in MPI, and I could use any suggestions.
Thanks
This is a common issue for people using the gather/scatter collectives for the first time: in both the send and receive counts, you specify the count of items to send to or receive from each process. So although it's true that you'll be receiving, in total, (say) P items, where P is the number of processors, that's not what you specify to the gather operation; you specify that you are sending a count of 1 and receiving a count of 1 (from each process), like so:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main ( int argc, char **argv ) {
    int rank;
    int size;
    int lvotes;
    int *gvotes;

    MPI_Init ( &argc, &argv );
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
    MPI_Comm_size ( MPI_COMM_WORLD, &size );

    if (rank == 0)
        gvotes = malloc( size * sizeof(int) );

    /* everyone sets their first lvotes element */
    lvotes = rank+4;

    /* Gather to process 0 */
    MPI_Gather(&lvotes, 1, MPI_INT,   /* send 1 int from lvotes..                */
               gvotes,  1, MPI_INT,   /* gather 1 int from each process into gvotes */
               0, MPI_COMM_WORLD);    /* ... on root process 0                   */

    printf("P%d: %d\n", rank, lvotes);

    if (rank == 0) {
        printf("P%d: Gathered ", rank);
        for (int i = 0; i < size; i++)
            printf("%d ", gvotes[i]);
        printf("\n");
    }

    if (rank == 0)
        free(gvotes);

    MPI_Finalize();
    return 0;
}
Running it with 3 processes gives:
$ mpirun -np 3 ./gather
P1: 5
P2: 6
P0: 4
P0: Gathered 4 5 6
