I have a 3D cube stored in a vector and split vertically into faces across the MPI processes. For the computation, each process needs the face+1 and face-1 of its neighbours so it can compare the boundary values. The problem is that the send/receive exchange fails, and I cannot find the cause, since the indices look correct to me.
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <malloc.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include "aux.h"
#define STABILITY 1.0f / sqrt(3.0f)
void mdf_heat(double *u0,
              double *u1,
              double *aux,
              const unsigned int npX,
              const unsigned int npY,
              const unsigned int npZ,
              const double deltaH,
              const double deltaT,
              const double inErr,
              const double boundaries,
              const int me,
              const int np)
{
    double left, right, up, down, top, bottom;
    double alpha = deltaT / (deltaH * deltaH);
    MPI_Status status;
    int continued = 1;
    unsigned int steps = 0;
    while (continued)
    {
        steps++;
        if (me == 0)
        {
            MPI_Send(&u0[npX * npY * (npZ - 1)], npX * npY, MPI_DOUBLE, me + 1, 0, MPI_COMM_WORLD);
            // printf("(S:%d)(P:%d) Envio a %d\n", steps, me, me + 1);
            MPI_Recv(&aux[0], npX * npY, MPI_DOUBLE, me + 1, 0, MPI_COMM_WORLD, &status);
            printf("(S:%d)(P:%d) Recibo de %d\n", steps, me, me + 1);
        }
        else if (me == np - 1)
        {
            MPI_Send(&u0[0], npX * npY, MPI_DOUBLE, me - 1, 0, MPI_COMM_WORLD);
            // printf("(S:%d)(P:%d) Envio a %d\n", steps, me, me - 1);
            MPI_Recv(&aux[0], npX * npY, MPI_DOUBLE, me - 1, 0, MPI_COMM_WORLD, &status);
            printf("(S:%d)(P:%d) Recibo de %d\n", steps, me, me - 1);
        }
        else
        {
            MPI_Send(&u0[0], npX * npY, MPI_DOUBLE, me - 1, 0, MPI_COMM_WORLD);
            MPI_Send(&u0[npX * npY * (npZ - 1)], npX * npY, MPI_DOUBLE, me + 1, 0, MPI_COMM_WORLD);
            // printf("(S:%d)(P:%d) Envio a %d y %d\n", steps, me, me - 1, me + 1);
            MPI_Recv(&aux[0], npX * npY, MPI_DOUBLE, me - 1, 0, MPI_COMM_WORLD, &status);
            MPI_Recv(&aux[npX * npY], npX * npY, MPI_DOUBLE, me + 1, 0, MPI_COMM_WORLD, &status);
            printf("(S:%d)(P:%d) Recibo de %d y %d\n", steps, me, me - 1, me + 1);
        }
        ....
    }
    fprintf(stdout, "[%d] Done! in %u steps\n", me, steps);
}
int main(int ac, char **av)
{
    ....
    unsigned int npX = (unsigned int)(sizeX / deltaH);
    unsigned int npY = (unsigned int)(sizeY / deltaH);
    unsigned int npZ = (unsigned int)(sizeZ / deltaH) / np;
    u0_per_process = (double *)calloc(npZ * npY * npX, sizeof(double));
    u1_per_process = (double *)calloc(npZ * npY * npX, sizeof(double));
    if (me == 0 || me == np - 1)
        aux_per_process = (double *)calloc(npZ * npY * 1, sizeof(double));
    else
        aux_per_process = (double *)calloc(npZ * npY * 2, sizeof(double));
    printf("p(%d) (%u, %u, %u)\n", me, npX, npY, npZ);
    mdf_heat(u0_per_process, u1_per_process, aux_per_process, npX, npY, npZ, deltaH, deltaT, 1e-15, 100.0f, me, np);
    ....
}
Console error:
mpirun ./a.out 0.125
p(1) (8, 8, 2)
p(3) (8, 8, 2)
p(2) (8, 8, 2)
p(0) (8, 8, 2)
(S:1)(P:3) Recibo de 2
(S:1)(P:0) Recibo de 1
(S:1)(P:2) Recibo de 1 y 3
(S:1)(P:1) Recibo de 0 y 2
[mateev:113552] *** Process received signal ***
[mateev:113549] *** Process received signal ***
[mateev:113549] Signal: Segmentation fault (11)
[mateev:113549] Signal code: Address not mapped (1)
[mateev:113549] Failing at address: 0x48
[mateev:113550] *** Process received signal ***
[mateev:113550] Signal: Segmentation fault (11)
[mateev:113550] Signal code: Address not mapped (1)
[mateev:113550] Failing at address: 0x48
[mateev:113552] Signal: Segmentation fault (11)
[mateev:113552] Signal code: Address not mapped (1)
[mateev:113552] Failing at address: 0x48
a.out: malloc.c:4036: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
malloc(): invalid size (unsorted)
[mateev:113552] *** Process received signal ***
[mateev:113552] Signal: Aborted (6)
[mateev:113552] Signal code: (-6)
[mateev:113550] *** Process received signal ***
[mateev:113550] Signal: Aborted (6)
[mateev:113550] Signal code: (-6)
[mateev:113549] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x430c0)[0x7f8bb12180c0]
[mateev:113549] [ 1] /home/vladimir/.openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x77)[0x7f8bb0177417]
[mateev:113549] [ 2] /home/vladimir/.openmpi/lib/libmpi.so.40(PMPI_Send+0x123)[0x7f8bb1488fb3]
[mateev:113549] [ 3] ./a.out(+0x1419)[0x55899d429419]
[mateev:113549] [ 4] ./a.out(+0x1965)[0x55899d429965]
[mateev:113549] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f8bb11f90b3]
[mateev:113549] [ 6] ./a.out(+0x128e)[0x55899d42928e]
[mateev:113549] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node mateev exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I have tried playing with the indices; if I send only a single double there is no segmentation fault, but I cannot send the entire face.
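For reference, a face exchange like this is often written with MPI_Sendrecv, which pairs every send with its matching receive and avoids ordering problems between separate blocking calls. Below is a minimal sketch, not the original code: it assumes each face holds npX*npY doubles and that aux has room for two faces on every rank (the face from below at aux[0], the face from above at aux[npX*npY]); ranks without a neighbour pass MPI_PROC_NULL, which turns that send/receive into a no-op.
// hedged sketch of a face exchange with MPI_Sendrecv (assumed aux layout)
int below = (me > 0)      ? me - 1 : MPI_PROC_NULL;
int above = (me < np - 1) ? me + 1 : MPI_PROC_NULL;
int face  = npX * npY;                       // one face = npX*npY doubles
// send my bottom face down, receive the lower neighbour's top face
MPI_Sendrecv(&u0[0],  face, MPI_DOUBLE, below, 0,
             &aux[0], face, MPI_DOUBLE, below, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// send my top face up, receive the upper neighbour's bottom face
MPI_Sendrecv(&u0[face * (npZ - 1)], face, MPI_DOUBLE, above, 0,
             &aux[face],            face, MPI_DOUBLE, above, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);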
In this code I am trying to broadcast using non-blocking send and receive as practice. I have several questions and issues:
1. Should I pair Isend() and Irecv() to use the same request?
2. When the message is an array, how should it be passed? In this case, message or &message?
3. Why can I not run this code on fewer or more than 8 processes? If the rank doesn't exist, shouldn't it just continue without executing that piece of code?
4. The snippet at the bottom is there to print the total time once, but the Waitall() does not work, and I do not understand why.
5. When passing arrays longer than 2^12 I get a segmentation fault, even though I have checked the limits of Isend() and Irecv() and they are supposed to handle even larger messages.
6. I used long double to record the time; is this a common or good practice? When I used smaller types like float or double I would get NaN.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<mpi.h>
int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);
    int i, rank, size, ready;
    long int N = pow(2, 10);
    float* message = (float *)malloc(sizeof(float *) * N + 1);
    long double start, end;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    //MPI_Request* request = (MPI_Request *)malloc(sizeof(MPI_Request *) * size);
    MPI_Request request[size-1];
    /*Stage I: -np 8*/
    if(rank == 0){
        for(i = 0; i < N; i++){
            message[i] = N*rand();
            message[i] /= rand();
        }
        start = MPI_Wtime();
        MPI_Isend(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[0]);
        MPI_Isend(&message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[1]);
        MPI_Isend(&message, N, MPI_FLOAT, 4, 0, MPI_COMM_WORLD, &request[3]);
        printf("Processor root-rank %d- sent the message...\n", rank);
    }
    if (rank == 1){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[0]);
        MPI_Wait(&request[0], MPI_STATUS_IGNORE);
        printf("Processor rank 1 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request[2]);
        MPI_Isend(&message, N, MPI_FLOAT, 5, 0, MPI_COMM_WORLD, &request[4]);
    }
    if(rank == 2){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[1]);
        MPI_Wait(&request[1], MPI_STATUS_IGNORE);
        printf("Processor rank 2 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 6, 0, MPI_COMM_WORLD, &request[5]);
    }
    if(rank == 3){
        MPI_Irecv(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[2]);
        MPI_Wait(&request[2], MPI_STATUS_IGNORE);
        printf("Processor rank 3 received the message.\n");
        MPI_Isend(&message, N, MPI_FLOAT, 7, 0, MPI_COMM_WORLD, &request[6]);
    }
    if(rank == 4){
        MPI_Irecv(&message, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &request[3]);
        MPI_Wait(&request[3], MPI_STATUS_IGNORE);
        printf("Processor rank 4 received the message.\n");
    }
    if(rank == 5){
        MPI_Irecv(&message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[4]);
        MPI_Wait(&request[4], MPI_STATUS_IGNORE);
        printf("Processor rank 5 received the message.\n");
    }
    if(rank == 6){
        MPI_Irecv(&message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[5]);
        MPI_Wait(&request[5], MPI_STATUS_IGNORE);
        printf("Processor rank 6 received the message.\n");
    }
    if(rank == 7){
        MPI_Irecv(&message, N, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request[6]);
        MPI_Wait(&request[6], MPI_STATUS_IGNORE);
        printf("Processor rank 7 received the message.\n");
    }
    /*MPI_Testall(size-1,request,&ready, MPI_STATUS_IGNORE);*/
    /* if (ready){*/
    end = MPI_Wtime();
    printf("Total Time: %Lf\n", end - start);
    /*}*/
    MPI_Finalize();
}
1. Each MPI task runs in its own address space, so there is no correlation between request[1] on rank 0 and request[1] on rank 2. That means you do not have to "pair" the requests. That being said, if you think "pairing" the requests improves the readability of your code, you can do so even though it is not required.
2. The buffer parameter of MPI_Isend() and MPI_Irecv() is a pointer to the start of the data, which here is message (and not &message).
3. If you run with, say, 2 MPI tasks, MPI_Send(..., dest=2, ...) on rank 0 will fail because 2 is an invalid rank in the MPI_COMM_WORLD communicator.
4. Many requests are uninitialised when MPI_Waitall() (well, MPI_Testall() here) is invoked. One option is to first initialise all of them to MPI_REQUEST_NULL.
5. Using &message results in memory corruption, and that likely explains the crash.
6. From the MPI standard, the prototype is double MPI_Wtime(), so you should use double here (the NaNs likely come from the memory corruption described above).
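Putting those points together, a minimal sketch of what the send side and the final wait could look like (illustrative only; the request count of 7 simply mirrors the code above):
// illustrative sketch: pass message (not &message) as the buffer, and
// pre-initialise every request to MPI_REQUEST_NULL so that MPI_Waitall()
// is safe even for requests a given rank never starts
MPI_Request request[7];
for (int i = 0; i < 7; i++)
    request[i] = MPI_REQUEST_NULL;
if (rank == 0) {
    MPI_Isend(message, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &request[0]);
    MPI_Isend(message, N, MPI_FLOAT, 2, 0, MPI_COMM_WORLD, &request[1]);
    MPI_Isend(message, N, MPI_FLOAT, 4, 0, MPI_COMM_WORLD, &request[3]);
}
/* ... receives on the other ranks, also with message as the buffer ... */
// a null request completes immediately, so this works on every rank
MPI_Waitall(7, request, MPI_STATUSES_IGNORE);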
I want to send a set of data with MPI_Type_struct, and one of the members is a pointer to an array (the matrices I am going to use are very large, so I need to malloc them). The problem is that all of the data arrives correctly except the matrix. I know it is possible to pass the matrix through the pointer, because if I send only the matrix buffer by itself, the results are correct.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
void main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int size, rank;
    int m, n;
    m = n = 2;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    typedef struct estruct
    {
        float *array;
        int sizeM, sizeK, sizeN, rank_or;
    };
    struct estruct kernel, server;
    MPI_Datatype types[5] = {MPI_FLOAT, MPI_INT, MPI_INT, MPI_INT, MPI_INT};
    MPI_Datatype newtype;
    int lengths[5] = {n*m, 1, 1, 1, 1};
    MPI_Aint displacements[5];
    displacements[0] = (size_t) & (kernel.array[0]) - (size_t)&kernel;
    displacements[1] = (size_t) & (kernel.sizeM) - (size_t)&kernel;
    displacements[2] = (size_t) & (kernel.sizeK) - (size_t)&kernel;
    displacements[3] = (size_t) & (kernel.sizeN) - (size_t)&kernel;
    displacements[4] = (size_t) & (kernel.rank_or) - (size_t)&kernel;
    MPI_Type_struct(5, lengths, displacements, types, &newtype);
    MPI_Type_commit(&newtype);
    if (rank == 0)
    {
        kernel.array = (float *)malloc(m * n * sizeof(float));
        for(int i = 0; i < m*n; i++) kernel.array[i] = i;
        kernel.sizeM = 5;
        kernel.sizeK = 5;
        kernel.sizeN = 5;
        kernel.rank_or = 5;
        MPI_Send(&kernel, 1, newtype, 1, 0, MPI_COMM_WORLD);
    }
    else
    {
        server.array = (float *)malloc(m * n * sizeof(float));
        MPI_Recv(&server, 1, newtype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%i \n", server.sizeM);
        printf("%i \n", server.sizeK);
        printf("%i \n", server.sizeN);
        printf("%i \n", server.rank_or);
        for(int i = 0; i < m*n; i++) printf("%f\n", server.array[i]);
    }
    MPI_Finalize();
}
Assuming only two processes are run, I expect the process with rank 1 to receive and print the correct elements of the matrix (the other members are received correctly), but the actual output is:
5
5
5
5
0.065004
0.000000
0.000000
0.000000
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 26206 RUNNING AT pmul
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I hope someone can help me.
I have some sample code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    // Find out rank, size
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // We are assuming at least 2 processes for this task
    if (world_size < 2) {
        fprintf(stderr, "World size must be greater than 1 for %s\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int number;
    if (world_rank == 1) {
        number = -1;
        MPI_Send(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        raise(SIGSEGV);
    } else if (world_rank == 0) {
        MPI_Recv(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 0 received number %d from process 1\n", number);
    }
    printf("rank %d finalize\n", world_rank);
    MPI_Finalize();
}
Rank 1 raises a signal to simulate a crash. After the raise(), rank 1 exits, but rank 0 still prints "rank 0 finalize".
Is there any way to know in rank 0 whether rank 1 crashes in this case? Is it possible to let mpirun kill rank 0 when rank 1 crashes?
Note there is a race condition in your scenario: mpirun might not have enough time to notice that task 1 crashed and kill task 0 before the message is printed.
You can force Open MPI to kill all tasks as soon as a crash is detected with the option below
mpirun -mca orte_abort_on_non_zero_status 1 ...
I would like to use shared memory between processes. I tried MPI_Win_allocate_shared, but it gives me a strange error when I execute the program:
Assertion failed in file ./src/mpid/ch3/include/mpid_rma_shm.h at line 592: local_target_rank >= 0
internal ABORT
Here's my source:
# include <stdlib.h>
# include <stdio.h>
# include <time.h>
# include "mpi.h"
int main ( int argc, char *argv[] );
void pt(int t[], int s);
int main ( int argc, char *argv[] )
{
    int rank, size, shared_elem = 0, i;
    MPI_Init ( &argc, &argv );
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
    MPI_Comm_size ( MPI_COMM_WORLD, &size );
    MPI_Win win;
    int *shared;
    if (rank == 0) shared_elem = size;
    MPI_Win_allocate_shared(shared_elem*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &shared, &win);
    if(rank==0)
    {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, MPI_MODE_NOCHECK, win);
        for(i = 0; i < size; i++)
        {
            shared[i] = -1;
        }
        MPI_Win_unlock(0,win);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    int *local = (int *)malloc( size * sizeof(int) );
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    for(i = 0; i < 10; i++)
    {
        MPI_Get(&(local[i]), 1, MPI_INT, 0, i, 1, MPI_INT, win);
    }
    printf("processus %d (avant): ", rank);
    pt(local,size);
    MPI_Win_unlock(0,win);
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
    MPI_Put(&rank, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
    MPI_Win_unlock(0,win);
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    for(i = 0; i < 10; i++)
    {
        MPI_Get(&(local[i]), 1, MPI_INT, 0, i, 1, MPI_INT, win);
    }
    printf("processus %d (apres): ", rank);
    pt(local,size);
    MPI_Win_unlock(0,win);
    MPI_Win_free(&win);
    MPI_Free_mem(shared);
    MPI_Free_mem(local);
    MPI_Finalize ( );
    return 0;
}
void pt(int t[],int s)
{
    int i = 0;
    while(i < s)
    {
        printf("%d ",t[i]);
        i++;
    }
    printf("\n");
}
I get the following result:
processus 0 (avant): -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
processus 0 (apres): 0 -1 -1 -1 -1 -1 -1 -1 -1 -1
processus 4 (avant): 0 -1 -1 -1 -1 -1 -1 -1 -1 -1
processus 4 (apres): 0 -1 -1 -1 4 -1 -1 -1 -1 -1
Assertion failed in file ./src/mpid/ch3/include/mpid_rma_shm.h at line 592: local_target_rank >= 0
internal ABORT - process 5
Assertion failed in file ./src/mpid/ch3/include/mpid_rma_shm.h at line 592: local_target_rank >= 0
internal ABORT - process 6
Assertion failed in file ./src/mpid/ch3/include/mpid_rma_shm.h at line 592: local_target_rank >= 0
internal ABORT - process 9
Can someone please help me figure out what's going wrong & what that error means ? Thanks a lot.
MPI_Win_allocate_shared is a departure from the very abstract nature of MPI. It exposes the underlying memory organisation and allows the programs to bypass the expensive (and often confusing) MPI RMA operations and utilise the shared memory directly on systems that have such. While MPI typically deals with distributed-memory environments where ranks do not share the physical memory address space, a typical HPC system nowadays consists of many interconnected shared-memory nodes. Thus, it is possible for ranks that execute on the same node to attach to shared memory segments and communicate by sharing data instead of message passing.
MPI provides a communicator split operation that allows one to create subgroups of ranks such that the ranks in each subgroup are able to share memory:
MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, key, info, &newcomm);
On a typical cluster, this essentially groups the ranks by the nodes they execute on. Once the split is done, a shared-memory window allocation can be executed over the ranks in each newcomm. Note that for a multi-node cluster job this will result in several independent newcomm communicators and thus several shared memory windows. Ranks on one node won't (and shouldn't) be able to see the shared memory windows on other nodes.
In that regard, MPI_Win_allocate_shared is a platform-independent wrapper around the OS-specific mechanisms for shared memory allocation.
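As an illustration (not part of the original code), the split-then-allocate pattern typically looks like the sketch below; the segment size of 100 ints is an arbitrary placeholder and error checking is omitted:
// create one communicator per shared-memory node
MPI_Comm nodecomm;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                    MPI_INFO_NULL, &nodecomm);
int noderank;
MPI_Comm_rank(nodecomm, &noderank);
// rank 0 of each node contributes the whole segment, the others contribute
// 0 bytes and later query rank 0's base address
MPI_Aint winsize = (noderank == 0) ? 100 * sizeof(int) : 0;
int *shared;
MPI_Win win;
MPI_Win_allocate_shared(winsize, sizeof(int), MPI_INFO_NULL,
                        nodecomm, &shared, &win);
if (noderank != 0) {
    MPI_Aint qsize;
    int qdisp;
    MPI_Win_shared_query(win, 0, &qsize, &qdisp, &shared);
}
// ... use shared directly (with appropriate synchronisation), then:
// MPI_Win_free(&win);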
There are several problems with this code and its usage. Some of them are mentioned in @Hristo Iliev's answer.
1. You have to run all the processes on the same node so you have an intra-node communicator, or use a "split by shared memory" communicator (MPI_Comm_split_type with MPI_COMM_TYPE_SHARED).
2. You need to run this code with at least 10 processes, since the loops read 10 elements.
3. local should be deallocated with free(), not MPI_Free_mem().
4. You should obtain the shared pointer from a query (MPI_Win_shared_query) on the ranks that did not allocate the segment.
5. shared also needs to be released, but that is taken care of by MPI_Win_free(), so the explicit MPI_Free_mem(shared) is removed.
This is the resulting code:
# include <stdlib.h>
# include <stdio.h>
# include <time.h>
# include "mpi.h"
int main ( int argc, char *argv[] );
void pt(int t[], int s);
int main ( int argc, char *argv[] )
{
    int rank, size, shared_elem = 0, i;
    MPI_Init ( &argc, &argv );
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
    MPI_Comm_size ( MPI_COMM_WORLD, &size );
    MPI_Win win;
    int *shared;
    // if (rank == 0) shared_elem = size;
    // MPI_Win_allocate_shared(shared_elem*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &shared, &win);
    if (rank == 0)
    {
        /* rank 0 allocates the whole segment: size elements, expressed in bytes */
        MPI_Win_allocate_shared(size * sizeof(int), sizeof(int), MPI_INFO_NULL,
                                MPI_COMM_WORLD, &shared, &win);
    }
    else
    {
        int disp_unit;
        MPI_Aint ssize;
        /* the other ranks allocate nothing and query rank 0's base pointer */
        MPI_Win_allocate_shared(0, sizeof(int), MPI_INFO_NULL,
                                MPI_COMM_WORLD, &shared, &win);
        MPI_Win_shared_query(win, 0, &ssize, &disp_unit, &shared);
    }
    if(rank==0)
    {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, MPI_MODE_NOCHECK, win);
        for(i = 0; i < size; i++)
        {
            shared[i] = -1;
        }
        MPI_Win_unlock(0,win);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    int *local = (int *)malloc( size * sizeof(int) );
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    for(i = 0; i < 10; i++)
    {
        MPI_Get(&(local[i]), 1, MPI_INT, 0, i, 1, MPI_INT, win);
    }
    printf("processus %d (avant): ", rank);
    pt(local,size);
    MPI_Win_unlock(0,win);
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
    MPI_Put(&rank, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
    MPI_Win_unlock(0,win);
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    for(i = 0; i < 10; i++)
    {
        MPI_Get(&(local[i]), 1, MPI_INT, 0, i, 1, MPI_INT, win);
    }
    printf("processus %d (apres): ", rank);
    pt(local,size);
    MPI_Win_unlock(0,win);
    MPI_Win_free(&win);
    // MPI_Free_mem(shared);
    free(local);
    // MPI_Free_mem(local);
    MPI_Finalize ( );
    return 0;
}
void pt(int t[],int s)
{
    int i = 0;
    while(i < s)
    {
        printf("%d ",t[i]);
        i++;
    }
    printf("\n");
}
I am trying to write my own MPI function that computes the smallest number in a vector and broadcasts it to all processes. I treat the processes as a binary tree and find the minimum as I move from the leaves to the root, then send the result from the root back down through the children. But I get a segmentation fault when trying to receive the minimum value from the left child (process rank 3) of process rank 1, in a run with just 4 processes ranked 0 to 3.
void Communication::ReduceMin(double &partialMin, double &totalMin)
{
    MPI_Barrier(MPI_COMM_WORLD);
    double *leftChild, *rightChild;
    leftChild = (double *)malloc(sizeof(double));
    rightChild = (double *)malloc(sizeof(double));
    leftChild[0] = rightChild[0] = 1e10;
    cout<<"COMM REDMIN: "<<myRank<<" "<<partialMin<<" "<<nProcs<<endl;
    MPI_Status *status;
    //MPI_Recv from 2*i+1 and 2*i+2
    if(nProcs > 2*myRank+1)
    {
        cout<<myRank<<" waiting from "<<2*myRank+1<<" for "<<leftChild[0]<<endl;
        MPI_Recv((void *)&leftChild[0], 1, MPI_DOUBLE, 2*myRank+1, 2*myRank+1, MPI_COMM_WORLD, status); //SEG FAULT HERE
        cout<<myRank<<" got from "<<2*myRank+1<<endl;
    }
    if(nProcs > 2*myRank+2)
    {
        cout<<myRank<<" waiting from "<<2*myRank+2<<endl;
        MPI_Recv((void *)rightChild, 1, MPI_DOUBLE, 2*myRank+2, 2*myRank+2, MPI_COMM_WORLD, status);
        cout<<myRank<<" got from "<<2*myRank+1<<endl;
    }
    //find the local minimum
    cout<<myRank<<" finding the min"<<endl;
    double myMin = min(min(leftChild[0], rightChild[0]), partialMin);
    //MPI_Send to (i+1)/2-1
    if(myRank!=0)
    {
        cout<<myRank<<" sending "<<myMin<<" to "<<(myRank+1)/2 - 1<<endl;
        MPI_Send((void *)&myMin, 1, MPI_DOUBLE, (myRank+1)/2 - 1, myRank, MPI_COMM_WORLD);
    }
    double min;
    //MPI_Recv from (i+1)/2-1
    if(myRank!=0)
    {
        cout<<myRank<<" waiting from "<<(myRank+1)/2-1<<endl;
        MPI_Recv((void *)&min, 1, MPI_DOUBLE, (myRank+1)/2 - 1, (myRank+1)/2 - 1, MPI_COMM_WORLD, status);
        cout<<myRank<<" got from "<<(myRank+1)/2-1<<endl;
    }
    totalMin = min;
    //MPI_Send to 2*i+1 and 2*i+2
    if(nProcs > 2*myRank+1)
    {
        cout<<myRank<<" sending to "<<2*myRank+1<<endl;
        MPI_Send((void *)&min, 1, MPI_DOUBLE, 2*myRank+1, myRank, MPI_COMM_WORLD);
    }
    if(nProcs > 2*myRank+2)
    {
        cout<<myRank<<" sending to "<<2*myRank+1<<endl;
        MPI_Send((void *)&min, 1, MPI_DOUBLE, 2*myRank+2, myRank, MPI_COMM_WORLD);
    }
}
PS: I know I can use
MPI_Barrier(MPI_COMM_WORLD);
MPI_Reduce((void *)&partialMin, (void *)&totalMin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
MPI_Bcast((void *)&totalMin, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
But I want to write my own code for fun.
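(As an aside, that reduce-plus-broadcast pair can also be expressed as a single call, e.g. MPI_Allreduce(&partialMin, &totalMin, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD); the hand-written tree is, of course, the point of the exercise here.)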
The error is in the way you use the status argument in the receive calls. Instead of passing the address of an MPI_Status instance, you simply pass an uninitialised pointer and that leads to the crash:
MPI_Status *status; // status declared as a pointer and never initialised
...
MPI_Recv((void *)&leftChild[0], 1, MPI_DOUBLE, 2*myRank+1, 2*myRank+1,
MPI_COMM_WORLD, status); // status is an invalid pointer here
You should change your code to:
MPI_Status status;
...
MPI_Recv((void *)&leftChild[0], 1, MPI_DOUBLE, 2*myRank+1, 2*myRank+1,
MPI_COMM_WORLD, &status);
Since you do not examine the status at all in your code, you can simply pass MPI_STATUS_IGNORE in all calls:
MPI_Recv((void *)&leftChild[0], 1, MPI_DOUBLE, 2*myRank+1, 2*myRank+1,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);