MPI topology for arbitrary domains

I have a rectangular 3 x 2 domain, as shown in figure (a). Using an MPI Cartesian topology, I can identify the 4 neighbors of each cell (-1 means no neighbor) as demonstrated below:
Figure: Rectangular and arbitrary domain
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int rank, size;
    int top = 0, right = 1, bottom = 2, left = 3;
    int neighbors[4], dimSize[2] = {3, 2};
    int usePeriods[2] = {0, 0}, newCoords[2];
    MPI_Comm cartComm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 6) {
        if (rank == 0)
            printf("Please launch with exactly 6 processes\n");
        MPI_Finalize();
        exit(0);
    }

    // Create a Cartesian communicator
    MPI_Cart_create(MPI_COMM_WORLD, 2, dimSize, usePeriods, 1, &cartComm);

    // Obtain the 2D coordinates in the new communicator
    MPI_Cart_coords(cartComm, rank, 2, newCoords);

    // Obtain the direct neighbor ranks
    MPI_Cart_shift(cartComm, 0, 1, neighbors + left, neighbors + right);
    MPI_Cart_shift(cartComm, 1, 1, neighbors + top, neighbors + bottom);

    printf("Rank: %d \t Neighbors(top, right, bottom, left): %2d, %2d, %2d, %2d \n",
           rank, neighbors[0], neighbors[1], neighbors[2], neighbors[3]);

    MPI_Finalize();
    return 0;
}
Results:
Rank: 0 Neighbors(top, right, bottom, left): -1, 2, 1, -1
Rank: 1 Neighbors(top, right, bottom, left): 0, 3, -1, -1
Rank: 2 Neighbors(top, right, bottom, left): -1, 4, 3, 0
Rank: 3 Neighbors(top, right, bottom, left): 2, 5, -1, 1
Rank: 4 Neighbors(top, right, bottom, left): -1, -1, 5, 2
Rank: 5 Neighbors(top, right, bottom, left): 4, -1, -1, 3
My question: is there a way to use MPI to identify these neighbors for a non-rectangular domain such as the one shown in figure (b)? Here, we assume no periodic neighbors. At the moment, the only approach I can think of is to apply the same Cartesian topology to the enclosing 3x3 grid (figure (c)):
mpirun -np 9 ./run_cartesian
Next, remove the NULL cells (gray) and renumber the remaining ranks 0-5. Save the result to a file, then load the file (on the root) and scatter the corresponding neighbors to each processor.
mpirun -np 6 ./run_load_scatter
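For reference, a minimal sketch of the load-and-scatter step described above (the file name neighbors.txt, its layout of four neighbor ranks per kept cell, and the flattened table are assumptions for illustration; error checking is omitted):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int rank, size, neighbors[4];
    int *table = NULL;   /* flattened table: 4 neighbor ranks per process, root only */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* hypothetical file written by the 3x3 run: one line per kept cell,
           "top right bottom left", with -1 for missing neighbors */
        FILE* f = fopen("neighbors.txt", "r");
        table = malloc(4 * size * sizeof(int));
        for (int i = 0; i < 4 * size; i++)
            fscanf(f, "%d", &table[i]);
        fclose(f);
    }

    /* each rank receives its own 4 neighbor ranks */
    MPI_Scatter(table, 4, MPI_INT, neighbors, 4, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank: %d \t Neighbors(top, right, bottom, left): %2d, %2d, %2d, %2d \n",
           rank, neighbors[0], neighbors[1], neighbors[2], neighbors[3]);

    if (rank == 0) free(table);
    MPI_Finalize();
    return 0;
}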
Any suggestion for better ways to do this? Thanks.

Related

Why isn't the vector able to hold all 52 elements?

I have a program where I use a vector to simulate all the possible outcomes when counting cards in blackjack. There are only three possible values: -1, 0, and 1. There are 52 cards in a deck, so the vector has 52 elements, each assigned one of the values mentioned above. The program works when I scale down the size of the vector; it still compiles at this size, but I get no output and get the warning "warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data".
#include <iostream>
#include "subtracter.h"
#include <time.h>
#include <vector>
#include <random>
using namespace std;

// declares how many of each card there is
int acecard = 4;
int twocard = 4;
int threecard = 4;
int fourcard = 4;
int fivecard = 4;
int sixcard = 4;
int sevencard = 4;
int eightcard = 4;
int ninecard = 4;
int tencard = 16;

// a vector that describes how many cards there are with a certain value
vector<int> cardvalues = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};

// a vector keeping track of how many of each card there's left in the deck
vector<int> deck = { acecard, twocard, threecard, fourcard, fivecard, sixcard, sevencard, eightcard, ninecard, tencard };

int start()
{
    int deckcount;
    deckcount = 0;
    int decksize;
    decksize = cardvalues.size();
    while (decksize >= 49)
    {
        deckcount += cardsubtracter(cardvalues);
    };
    return deckcount;
}

int cardcounting()
{
    int deckcount;
    deckcount = start();
    deckcount += cardsubtracter(cardvalues);
    return deckcount;
}

int main()
{
    int value;
    value = cardcounting();
    int size;
    size = cardvalues.size();
    cout << value << "\n";
    cout << size;
    return 0;
}
#include <iostream>
#include <random>
#include <vector>
using namespace std;

int numbergenerator(int x, int y)
{
    int number;
    random_device generator;
    uniform_int_distribution<> distrib(x, y);
    number = distrib(generator); // picks a random index in [x, y]
    return number;
}

int cardsubtracter(vector<int> mynum)
{
    int counter;
    int size;
    int number;
    size = mynum.size() - 1;             // gives the range of indices to pick from the vector
    number = numbergenerator(0, size);   // gives a random index into the vector
    counter = mynum[number];             // uses the random index to pick a value from the vector
    mynum.erase(mynum.begin() + number); // removes that value from the vector
    return counter;
}
I looked up the maximum size of vectors and it said that a vector of integers can hold up to 2^32 values, which should be more than enough here. I also tried creating a new file and copying the code over to it in case there was something wrong with this file.
There could be different reasons why a vector may not be able to hold all 52 elements. Some possible reasons are:
Insufficient memory: Each element in a vector requires a certain amount of memory, and the total memory required for all 52 elements may exceed the available memory. This can happen if the elements are large, or if there are many other variables or data structures in the environment that consume memory.
Data type limitations: The data type of the vector may not be able to accommodate all 52 elements. For example, if the vector is of type "integer", it can only hold integers up to a certain limit, beyond which it will overflow or produce incorrect results.
Code errors: There may be errors in the code that prevent all 52 elements from being added to the vector. For example, if the vector is being filled in a loop, there may be a mistake in the loop condition or in the indexing that causes the loop to terminate early or skip some elements.
To determine the exact reason for the vector not being able to hold all 52 elements, it is necessary to examine the code, the data types involved, and the memory usage.

How to update a message without using if statements?

I have a number of processors, let's say 9, that are arranged in a ring. The processors communicate around the ring in a non-blocking fashion using MPI_Isend() and MPI_Irecv(). The task is to receive the rank of the previous processor, add it to the processor's own rank, and then pass the result on to the next neighbor. This continues until it reaches processor 0 again; then processor 0 prints the sum, which is n(n+1)/2 (in this case 45). I know that these non-blocking functions return immediately even if the communication is not finished, and that MPI_Wait() is needed to ensure completion of the communication. I also know that it is better to have a buffer of size 2 to store the rank and the sum. But I don't know how and when to update the message before sending it to the next rank.
I don't want to use if statements like if(rank==0) then send to 1 and add, then if(rank==1) receive from 0, add 1, and send to 2, and so on, since this is highly inefficient for a large number of processors.
int main (int argc, char *argv[])
{
    int size, rank, next, prev;
    int buf[2];
    int tag1 = 1, tag2 = 2;   /* arbitrary message tags */
    int ierror;
    MPI_Request reqs[4];
    MPI_Status stats[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    prev = rank - 1;
    next = rank + 1;
    if (rank == 0) prev = size - 1;
    if (rank == (size - 1)) next = 0;

    // MPI_Irecv(&buf, count, datatype, source, tag, comm, &request)
    ierror = MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
    ierror = MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);
    // MPI_Isend(&buf, count, datatype, dest, tag, comm, &request)
    ierror = MPI_Isend(&buf[0], 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
    ierror = MPI_Isend(&buf[1], 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);
    // only four requests are outstanding on each rank, so wait on four
    ierror = MPI_Waitall(4, reqs, stats);

    MPI_Finalize();
    return 0;
}
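For what it's worth, the ring accumulation described above is often written so that only rank 0 is special (it seeds the message and finally prints the result), while every other rank executes the identical receive-add-send body, so no per-rank if chain is needed. The following is only a sketch of that pattern, using blocking calls for brevity (the non-blocking variant from the question would add MPI_Wait after each MPI_Isend/MPI_Irecv):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* run with at least 2 processes */

    int prev = (rank - 1 + size) % size;
    int next = (rank + 1) % size;

    if (rank == 0) {
        sum = 0;                            /* seed the running sum */
        MPI_Send(&sum, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        MPI_Recv(&sum, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Sum of ranks = %d\n", sum);
    } else {
        /* identical body on every other rank: receive, add own rank, forward */
        MPI_Recv(&sum, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sum += rank;
        MPI_Send(&sum, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}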

When using MPI_Isend() and MPI_Irecv(), should the pair use the same request? If the message is an array, how should it be passed?

In this code I am trying to broadcast using non-blocking sends and receives as practice. I have multiple questions and issues.
1. Should I pair Isend() and Irecv() to use the same request?
2. When the message is an array, how should it be passed? In this case, message or &message?
3. Why can't I run this code with fewer or more than 8 processors? If a rank doesn't exist, shouldn't it just go on without executing that piece of code?
4. The snippet at the bottom is there to print the total time once, but the Waitall() does not work, and I do not understand why.
5. When passing arrays longer than 2^12, I get a segmentation fault, even though I have checked the limits of Isend() and Irecv() and they are supposed to handle much larger messages.
6. I used long double to record the time; is this a common or good practice? When I used smaller types like float or double I would get nan.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);
    int i, rank, size, ready;
    long int N = pow(2, 10);
    float* message = (float *)malloc(sizeof(float *) * N + 1);
    long double start, end;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    //MPI_Request* request = (MPI_Request *)malloc(sizeof(MPI_Request *) * size);
    MPI_Request request[size-1];

    /* Stage I: -np 8 */
    if(rank == 0){
        for(i = 0; i < N; i++){
            message[i] = N*rand();
            message[i] /= rand();
        }
        start = MPI_Wtime();
        MPI_Isend(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[0]);
        MPI_Isend(&message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[1]);
        MPI_Isend(&message, N, MPI_FLOAT, 4, 0, MPI_COMM_WORLD, &request[3]);
        printf("Processor root-rank %d- sent the message...\n", rank);
    }
    if(rank == 1){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[0]);
        MPI_Wait(&request[0], MPI_STATUS_IGNORE);
        printf("Processor rank 1 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request[2]);
        MPI_Isend(&message, N, MPI_FLOAT, 5, 0, MPI_COMM_WORLD, &request[4]);
    }
    if(rank == 2){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[1]);
        MPI_Wait(&request[1], MPI_STATUS_IGNORE);
        printf("Processor rank 2 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 6, 0, MPI_COMM_WORLD, &request[5]);
    }
    if(rank == 3){
        MPI_Irecv(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[2]);
        MPI_Wait(&request[2], MPI_STATUS_IGNORE);
        printf("Processor rank 3 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 7, 0, MPI_COMM_WORLD, &request[6]);
    }
    if(rank == 4){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[3]);
        MPI_Wait(&request[3], MPI_STATUS_IGNORE);
        printf("Processor rank 4 received the message.\n");
    }
    if(rank == 5){
        MPI_Irecv(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[4]);
        MPI_Wait(&request[4], MPI_STATUS_IGNORE);
        printf("Processor rank 5 received the message.\n");
    }
    if(rank == 6){
        MPI_Irecv(&message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[5]);
        MPI_Wait(&request[5], MPI_STATUS_IGNORE);
        printf("Processor rank 6 received the message.\n");
    }
    if(rank == 7){
        MPI_Irecv(&message, N, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request[6]);
        MPI_Wait(&request[6], MPI_STATUS_IGNORE);
        printf("Processor rank 7 received the message.\n");
    }
    /*MPI_Testall(size-1, request, &ready, MPI_STATUS_IGNORE);*/
    /*if (ready){*/
        end = MPI_Wtime();
        printf("Total Time: %Lf\n", end - start);
    /*}*/
    MPI_Finalize();
}
1. Each MPI task runs in its own address space, so there is no correlation between request[1] on rank 0 and request[1] on rank 2. That means you do not have to "pair" the requests. That being said, if you think "pairing" the requests improves the readability of your code, you might want to do so even if it is not required.
2. The buffer parameter of MPI_Isend() and MPI_Irecv() is a pointer to the start of the data; that is message (and not &message) here.
3. If you run with, let's say, 2 MPI tasks, MPI_Send(..., dest=2, ...) on rank 0 will fail because 2 is an invalid rank in the MPI_COMM_WORLD communicator.
4. Many requests are uninitialized when MPI_Waitall() (well, MPI_Testall() here) is invoked. One option is to first initialize all of them to MPI_REQUEST_NULL.
5. Using &message results in memory corruption, and that likely explains the crash.
6. From the MPI standard, the prototype is double MPI_Wtime(), so you should use double here (the NaN likely comes from the memory corruption described above).
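To make points 2, 4, and 6 concrete, here is a minimal sketch; the flat set of sends from rank 0 (instead of the tree in the question) is a simplification for brevity. Every request slot is initialized to MPI_REQUEST_NULL so MPI_Waitall() is safe on every rank, the buffer is passed as message rather than &message, and the timer uses double:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;
    long int N = 1 << 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    float *message = malloc(N * sizeof(float));   /* allocate N floats */
    MPI_Request request[size];
    for (int i = 0; i < size; i++)
        request[i] = MPI_REQUEST_NULL;            /* unused slots are ignored by MPI_Waitall */

    double start = MPI_Wtime();                   /* MPI_Wtime() returns double */
    if (rank == 0) {
        for (int i = 0; i < N; i++) message[i] = (float)i;
        for (int dest = 1; dest < size; dest++)   /* pass message, not &message */
            MPI_Isend(message, N, MPI_FLOAT, dest, 0, MPI_COMM_WORLD, &request[dest]);
    } else {
        MPI_Irecv(message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[0]);
    }
    MPI_Waitall(size, request, MPI_STATUSES_IGNORE);  /* every slot is a valid or null request */

    if (rank == 0) printf("Total Time: %f\n", MPI_Wtime() - start);
    free(message);
    MPI_Finalize();
    return 0;
}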

Is there a non-blocking version of MPI_Comm_create_group()?

Suppose that we want to create 3 communicators for three processes with ranks 0, 1, and 2, where the groups of the communicators are {0,1}, {0,2}, and {1,2}.
It seems that (see the example code below) if each process calls MPI_Comm_create_group() in the following order, there will be a deadlock.
Process 0
create a communicator for the group {0,1}
create a communicator for the group {0,2}
Process 1
create a communicator for the group {1,2}
create a communicator for the group {0,1}
Process 2
create a communicator for the group {0,2}
create a communicator for the group {1,2}
What solutions are there? One solution is obviously changing the order of the calls to MPI_Comm_create_group() (a sketch of that reordering follows the example code below).
The example code:
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    MPI_Init(NULL, NULL);

    int worldsize, worldrank;
    MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
    MPI_Comm_rank(MPI_COMM_WORLD, &worldrank);

    if (worldsize < 3)
    {
        if (worldrank == 0) printf("Please launch at least 3 processes!\n");
        MPI_Finalize();
        return 0;
    }

    MPI_Group worldgroup, newgroup;
    MPI_Comm newcomm1, newcomm2;
    MPI_Comm_group(MPI_COMM_WORLD, &worldgroup);

    int set01[2] = {0, 1};
    int set02[2] = {0, 2};
    int set12[2] = {1, 2};

    switch (worldrank)
    {
    case 0:
        MPI_Group_incl(worldgroup, 2, set01, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm1);
        MPI_Group_incl(worldgroup, 2, set02, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 1, &newcomm2);
        break;
    case 1:
        MPI_Group_incl(worldgroup, 2, set12, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 2, &newcomm1);
        MPI_Group_incl(worldgroup, 2, set01, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm2);
        break;
    case 2:
        MPI_Group_incl(worldgroup, 2, set02, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 1, &newcomm1);
        MPI_Group_incl(worldgroup, 2, set12, &newgroup);
        MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 2, &newcomm2);
        break;
    }

    MPI_Finalize();
}
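For reference, a sketch of the reordering mentioned above, assuming the same three groups: every process walks the group list in one fixed global order and only participates in the calls for the groups it belongs to, so the pairwise collectives complete one after another and no circular wait can form.

#include <mpi.h>
#include <stdio.h>

int main()
{
    MPI_Init(NULL, NULL);

    int worldsize, worldrank;
    MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
    MPI_Comm_rank(MPI_COMM_WORLD, &worldrank);
    if (worldsize < 3) {
        if (worldrank == 0) printf("Please launch at least 3 processes!\n");
        MPI_Finalize();
        return 0;
    }

    MPI_Group worldgroup;
    MPI_Comm_group(MPI_COMM_WORLD, &worldgroup);

    /* the same fixed order on every process: {0,1}, {0,2}, {1,2} */
    int sets[3][2] = { {0, 1}, {0, 2}, {1, 2} };
    MPI_Comm newcomm[3] = { MPI_COMM_NULL, MPI_COMM_NULL, MPI_COMM_NULL };

    for (int i = 0; i < 3; i++) {
        /* only members of the group take part in the (group-collective) call */
        if (worldrank == sets[i][0] || worldrank == sets[i][1]) {
            MPI_Group newgroup;
            MPI_Group_incl(worldgroup, 2, sets[i], &newgroup);
            MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, i, &newcomm[i]);
            MPI_Group_free(&newgroup);
        }
    }

    MPI_Group_free(&worldgroup);
    MPI_Finalize();
}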

Difficulty with the MPI_Gather function

I have a local array (named lvotes) on each of the processors (assume 3 processors), and the first element of each stores a value, i.e.:
P0 : 4
P1 : 6
P2 : 7
Now, using MPI_Gather, I want gather them all in P0, so It will look like :
P0 : 4, 6, 7
I used gather this way:
MPI_Gather(lvotes, P, MPI_INT, lvotes, 1, MPI_INT, 0, MPI_COMM_WORLD);
But I am getting errors. It's my first time coding in MPI, so I could use any suggestions.
Thanks
This is a common issue for people using the gather/scatter collectives for the first time: both the send and receive counts specify the number of items to send to or receive from each process. So although it's true that the root will, in total, receive P items (where P is the number of processes), that's not what you specify to the gather operation; you specify that you are sending a count of 1 and receiving a count of 1 (from each process), like so:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main ( int argc, char **argv ) {
    int rank;
    int size;
    int lvotes;
    int *gvotes;

    MPI_Init ( &argc, &argv );
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
    MPI_Comm_size ( MPI_COMM_WORLD, &size );

    if (rank == 0)
        gvotes = malloc(size * sizeof(int) );

    /* everyone sets their first lvotes element */
    lvotes = rank + 4;

    /* Gather to process 0 */
    MPI_Gather(&lvotes, 1, MPI_INT,    /* send 1 int from lvotes... */
               gvotes,  1, MPI_INT,    /* ...gathering 1 int from each process into gvotes... */
               0, MPI_COMM_WORLD);     /* ...on root process 0 */

    printf("P%d: %d\n", rank, lvotes);

    if (rank == 0) {
        printf("P%d: Gathered ", rank);
        for (int i = 0; i < size; i++)
            printf("%d ", gvotes[i]);
        printf("\n");
    }

    if (rank == 0)
        free(gvotes);

    MPI_Finalize();
    return 0;
}
Running gives
$ mpirun -np 3 ./gather
P1: 5
P2: 6
P0: 4
P0: Gathered 4 5 6
