MPI programming issue with MPI_Gather

I am trying to use MPI to sort numbers. After the different processes have sorted their parts, I want to use MPI_Gather to collect all the sorted numbers and then print them, but this is not working. Any help would be appreciated. Below is my code.
#include <stdio.h>
#include <time.h>
#include <math.h>
#include <stdlib.h>
#include <mpi.h> /* Include MPI's header file */
/* The IncOrder function that is called by qsort is defined as follows */
int IncOrder(const void *e1, const void *e2)
{
return (*((int *)e1) - *((int *)e2));
}
void CompareSplit(int nlocal, int *elmnts, int *relmnts, int *wspace, int keepsmall);
//int IncOrder(const void *e1, const void *e2);
int main(int argc, char *argv[]){
int n; /* The total number of elements to be sorted */
int npes; /* The total number of processes */
int myrank; /* The rank of the calling process */
int nlocal; /* The local number of elements, and the array that stores them */
int *elmnts; /* The array that stores the local elements */
int *relmnts; /* The array that stores the received elements */
int oddrank; /* The rank of the process during odd-phase communication */
int evenrank; /* The rank of the process during even-phase communication */
int *wspace; /* Working space during the compare-split operation */
int i;
MPI_Status status;
/* Initialize MPI and get system information */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
n = 30000;//atoi(argv[1]);
nlocal = n/npes; /* Compute the number of elements to be stored locally. */
/* Allocate memory for the various arrays */
elmnts = (int *)malloc(nlocal*sizeof(int));
relmnts = (int *)malloc(nlocal*sizeof(int));
wspace = (int *)malloc(nlocal*sizeof(int));
/* Fill-in the elmnts array with random elements */
srand(time(NULL));
for (i=0; i<nlocal; i++) {
elmnts[i] = rand()%100+1;
printf("\n%d:",elmnts[i]); //print generated random numbers
}
/* Sort the local elements using the built-in quicksort routine */
qsort(elmnts, nlocal, sizeof(int), IncOrder);
/* Determine the ranks of the processes that myrank needs to communicate with
 * during the odd and even phases of the algorithm */
if (myrank%2 == 0) {
oddrank = myrank-1;
evenrank = myrank+1;
} else {
oddrank = myrank+1;
evenrank = myrank-1;
}
/* Processes at the two ends of the linear order have no partner in some phases: use MPI_PROC_NULL */
if (oddrank == -1 || oddrank == npes)
oddrank = MPI_PROC_NULL;
if (evenrank == -1 || evenrank == npes)
evenrank = MPI_PROC_NULL;
/* Get into the main loop of the odd-even sorting algorithm; p processes need p phases */
for (i=0; i<npes; i++) {
if (i%2 == 1) /* Odd phase */
MPI_Sendrecv(elmnts, nlocal, MPI_INT, oddrank, 1, relmnts,
nlocal, MPI_INT, oddrank, 1, MPI_COMM_WORLD, &status);
else /* Even phase */
MPI_Sendrecv(elmnts, nlocal, MPI_INT, evenrank, 1, relmnts,
nlocal, MPI_INT, evenrank, 1, MPI_COMM_WORLD, &status);
/* With no partner (MPI_PROC_NULL) nothing is received, so relmnts is stale: skip */
if (status.MPI_SOURCE != MPI_PROC_NULL)
CompareSplit(nlocal, elmnts, relmnts, wspace, myrank < status.MPI_SOURCE);
}
MPI_Gather(elmnts,nlocal,MPI_INT,relmnts,nlocal,MPI_INT,0,MPI_COMM_WORLD);
/* The master host display the sorted array */
//int len = sizeof(elmnts)/sizeof(int);
if(myrank == 0) {
printf("\nSorted array :\n");
int j;
for (j=0;j<n;j++) {
printf("relmnts[%d] = %d\n",j,relmnts[j]);
}
printf("\n");
//printf("sorted in %f s\n\n",((double)clock() - start) / CLOCKS_PER_SEC);
}
free(elmnts); free(relmnts); free(wspace);
MPI_Finalize();
}
/* This is the CompareSplit function */
void CompareSplit(int nlocal, int *elmnts, int *relmnts, int *wspace, int keepsmall){
int i, j, k;
for (i=0; i<nlocal; i++)
wspace[i] = elmnts[i]; /* Copy the elmnts array into the wspace array */
if (keepsmall) { /* Keep the nlocal smaller elements */
for (i=j=k=0; k<nlocal; k++) {
if (j == nlocal || (i < nlocal && wspace[i] < relmnts[j]))
elmnts[k] = wspace[i++];
else
elmnts[k] = relmnts[j++];
}
} else { /* Keep the nlocal larger elements */
for (i=k=nlocal-1, j=nlocal-1; k>=0; k--) {
if (j < 0 || (i >= 0 && wspace[i] >= relmnts[j])) /* j < 0: relmnts is exhausted */
elmnts[k] = wspace[i--];
else
elmnts[k] = relmnts[j--];
}
}
}

If I understand your code, you've gathered the separately sorted sublists back onto one process, into the array relmnts, and then printed them in order of occurrence. But I can't see where you've done anything about sorting relmnts. (I often don't understand other people's code, so if I have misunderstood, stop reading now.)
You seem to be hoping that the gather will mysteriously merge the sorted sub-lists into a sorted list for you. It ain't going to happen! You will need to merge the elements from the sorted sub-lists yourself, possibly after gathering them back to one process, or possibly with some sort of 'cascading gather'.
By this I mean: suppose that you had 32 processes and 32 sub-lists. You would merge the sub-lists from process 1 and process 2 onto process 1, 3 and 4 onto 3, ..., 31 and 32 onto 31. Then you would merge from processes 1 and 3 onto 1, and so on. After 5 steps (32 = 2^5) you'd have the whole list merged, in sorted order, on process 1. (I'm a Fortran programmer, so I start counting at 1; I should have written 'the process with rank 0', etc.)
Incidentally, the example you put in your comment to your own question may be misleading: it sort of looks like you gathered 3 sub-lists of 4 elements each and rammed them together. But there are no elements in sub-list 1 which are smaller than any of the elements in sub-list 2, that sort of thing. How did that happen if the original list was unsorted?


Anomalous MPI behavior

I am wondering if anyone can offer an explanation.
I'll start with the code:
/*
Barrier implemented using tournament-style coding
*/
// Constraints: Number of processes must be a power of 2, e.g.
// 2,4,8,16,32,64,128,etc.
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
void mybarrier(MPI_Comm);
// global debug bool
int verbose = 1;
int main(int argc, char * argv[]) {
int rank;
int size;
int i;
int sum = 0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
int check = size;
// check to make sure the number of processes is a power of 2
if (rank == 0){
while(check > 1){
if (check % 2 == 0){
check /= 2;
} else {
printf("ERROR: The number of processes must be a power of 2!\n");
MPI_Abort(MPI_COMM_WORLD, 1);
return 1;
}
}
}
// simple task, with barrier in the middle
for (i = 0; i < 500; i++){
sum ++;
}
mybarrier(MPI_COMM_WORLD);
for (i = 0; i < 500; i++){
sum ++;
}
if (verbose){
printf("process %d arrived at finalize\n", rank);
}
MPI_Finalize();
return 0;
}
void mybarrier(MPI_Comm comm){
// MPI variables
int rank;
int size;
int * data;
MPI_Status * status;
// Loop variables
int i;
int a;
int skip;
int complete = 0;
int currentCycle = 1;
// Initialize MPI vars
MPI_Comm_rank(comm, &rank);
MPI_Comm_size(comm, &size);
// step 1, gathering
while (!complete){
skip = currentCycle * 2;
// if currentCycle divides rank evenly, then it is a target
if ((rank % currentCycle) == 0){
// if skip divides rank evenly, then it needs to receive
if ((rank % skip) == 0){
MPI_Recv(data, 0, MPI_INT, rank + currentCycle, 99, comm, status);
if (verbose){
printf("1: %d from %d\n", rank, rank + currentCycle);
}
// otherwise, it needs to send. Once sent, the process is done
} else {
if (verbose){
printf("1: %d to %d\n", rank, rank - currentCycle);
}
MPI_Send(data, 0, MPI_INT, rank - currentCycle, 99, comm);
complete = 1;
}
}
currentCycle *= 2;
// main process will never send, so this code will allow it to complete
if (currentCycle >= size){
complete = 1;
}
}
complete = 0;
currentCycle = size / 2;
// step 2, scattering
while (!complete){
// if currentCycle is 1, then this is the last loop
if (currentCycle == 1){
complete = 1;
}
skip = currentCycle * 2;
// if currentCycle divides rank evenly then it is a target
if ((rank % currentCycle) == 0){
// if skip divides rank evenly, then it needs to send
if ((rank % skip) == 0){
if (verbose){
printf("2: %d to %d\n", rank, rank + currentCycle);
}
MPI_Send(data, 0, MPI_INT, rank + currentCycle, 99, comm);
// otherwise, it needs to receive
} else {
if (verbose){
printf("2: %d waiting for %d\n", rank, rank - currentCycle);
}
MPI_Recv(data, 0, MPI_INT, rank - currentCycle, 99, comm, status);
if (verbose){
printf("2: %d from %d\n", rank, rank - currentCycle);
}
}
}
currentCycle /= 2;
}
}
Expected behavior
The code is to increment a sum to 500, wait for all other processes to reach that point using blocking MPI_Send and MPI_Recv calls, and then increment sum to 1000.
Observed behavior on cluster
Cluster behaves as expected
Anomalous behavior observed on my machine
All processes in main function are reported as being 99, which I have linked specifically to the tag of the second while loop of mybarrier.
In addition
My first draft was written with for loops, and with that one, the program executes as expected on the cluster as well, but on my machine execution never finishes, even though all processes call MPI_Finalize (but none move beyond it).
MPI Versions
My machine is running OpenRTE 2.0.2
The cluster is running OpenRTE 1.6.3
The questions
I have observed that my machine consistently behaves unexpectedly, while the cluster executes normally. This is true of other MPI code I have written as well. Were there major changes between 1.6.3 and 2.0.2 that I'm not aware of?
At any rate, I'm baffled, and I was wondering if anyone could offer some explanation as to why my machine seems to not run MPI correctly. I hope I have provided enough details, but if not, I will be happy to provide whatever additional information you require.
There is a problem with your code; maybe that's what is causing the weird behavior you are seeing.
You are passing the MPI_Recv calls a status pointer that does not point to an allocated MPI_Status object. In fact, that pointer is not even initialized, so if it happens not to be NULL, MPI_Recv will end up writing to some arbitrary location in memory, causing undefined behavior. The correct form is the following:
MPI_Status status;
...
MPI_Recv(..., &status);
Or if you want to use the heap:
MPI_Status *status = malloc(sizeof(MPI_Status));
...
MPI_Recv(..., status);
...
free(status);
Also, since you are not using the status returned by the receive, you can pass MPI_STATUS_IGNORE instead:
MPI_Recv(..., MPI_STATUS_IGNORE);

Sizeof pointer of pointer in C [duplicate]

First off, here is some code:
#include <stdio.h>
int main()
{
int days[] = {1,2,3,4,5};
int *ptr = days;
printf("%zu\n", sizeof(days));
printf("%zu\n", sizeof(ptr));
return 0;
}
Is there a way to find out the size of the array that ptr is pointing to (instead of just giving its size, which is four bytes on a 32-bit system)?
No, you can't. The compiler doesn't know what the pointer is pointing to. There are tricks, like ending the array with a known out-of-band value and then counting the size up until that value, but that's not using sizeof().
Another trick is the one mentioned by Zan, which is to stash the size somewhere. For example, if you're dynamically allocating the array, allocate a block one int bigger than the one you need, stash the size in the first int, and return ptr+1 as the pointer to the array. When you need the size, decrement the pointer and peek at the stashed value. Just remember to free the whole block starting from the beginning, and not just the array.
The answer is, "No."
What C programmers do is store the size of the array somewhere. It can be part of a structure, or the programmer can cheat a bit and malloc() more memory than requested in order to store a length value before the start of the array.
For dynamic arrays (malloc or C++ new) you need to store the size of the array as mentioned by others or perhaps build an array manager structure which handles add, remove, count, etc. Unfortunately C doesn't do this nearly as well as C++ since you basically have to build it for each different array type you are storing which is cumbersome if you have multiple types of arrays that you need to manage.
For static arrays, such as the one in your example, there is a common macro used to get the size, but it is not recommended as it does not check if the parameter is really a static array. The macro is used in real code though, e.g. in the Linux kernel headers although it may be slightly different than the one below:
#include <stdio.h>
#if !defined(ARRAY_SIZE)
#define ARRAY_SIZE(x) (sizeof((x)) / sizeof((x)[0]))
#endif
int main()
{
int days[] = {1,2,3,4,5};
int *ptr = days;
printf("%zu\n", ARRAY_SIZE(days));
printf("%zu\n", sizeof(ptr));
return 0;
}
You can google for reasons to be wary of macros like this. Be careful.
If possible, use the C++ standard library (for example std::vector), which is much safer and easier to use.
There is a clean solution with C++ templates, without using sizeof(). The following getSize() function returns the size of any static array:
#include <cstddef>
template<typename T, size_t SIZE>
size_t getSize(T (&)[SIZE]) {
return SIZE;
}
Here is an example with a foo_t structure:
#include <cstddef>
#include <cstdio>
template<typename T, std::size_t SIZE>
std::size_t getSize(T (&)[SIZE]) {
return SIZE;
}
struct foo_t {
int ball;
};
int main()
{
foo_t foos3[] = {{1},{2},{3}};
foo_t foos5[] = {{1},{2},{3},{4},{5}};
std::printf("%zu\n", getSize(foos3));
std::printf("%zu\n", getSize(foos5));
return 0;
}
Output:
3
5
As all the correct answers have stated, you cannot get this information from the decayed pointer value of the array alone. If the decayed pointer is the argument received by the function, then the size of the originating array has to be provided in some other way for the function to come to know that size.
Here's a suggestion different from what has been provided thus far that will work: pass a pointer to the array instead. This suggestion is similar to the C++ style suggestions, except that C does not support templates or references:
#define ARRAY_SZ 10
void foo (int (*arr)[ARRAY_SZ]) {
printf("%zu\n", sizeof(*arr)/sizeof(**arr));
}
But, this suggestion is kind of silly for your problem, since the function is defined to know exactly the size of the array that is passed in (hence, there is little need to use sizeof at all on the array). What it does do, though, is offer some type safety. It will prohibit you from passing in an array of an unwanted size.
int x[20];
int y[10];
foo(&x); /* error */
foo(&y); /* ok */
If the function is supposed to be able to operate on any size of array, then you will have to provide the size to the function as additional information.
For this specific example, yes, there is, IF you use typedefs (see below). Of course, if you do it this way, you're just as well off to use SIZEOF_DAYS, since you know what the pointer is pointing to.
If you have a (void *) pointer, as is returned by malloc() or the like, then, no, there is no way to determine what data structure the pointer is pointing to and thus, no way to determine its size.
#include <stdio.h>
#define NUM_DAYS 5
typedef int days_t[ NUM_DAYS ];
#define SIZEOF_DAYS ( sizeof( days_t ) )
int main() {
days_t days;
days_t *ptr = &days;
printf( "SIZEOF_DAYS: %zu\n", SIZEOF_DAYS );
printf( "sizeof(days): %zu\n", sizeof(days) );
printf( "sizeof(*ptr): %zu\n", sizeof(*ptr) );
printf( "sizeof(ptr): %zu\n", sizeof(ptr) );
return 0;
}
Output:
SIZEOF_DAYS: 20
sizeof(days): 20
sizeof(*ptr): 20
sizeof(ptr): 4
There is no magic solution. C is not a reflective language. Objects don't automatically know what they are.
But you have many choices:
Obviously, add a parameter
Wrap the call in a macro and automatically add a parameter
Use a more complex object. Define a structure which contains the dynamic array and also the size of the array. Then, pass the address of the structure.
You can do something like this:
#include <stdio.h>
int main(void)
{
int days[] = { /*length:*/5, /*values:*/ 1,2,3,4,5 };
int *ptr = days + 1;
printf("array length: %d\n", ptr[-1]);
return 0;
}
My solution to this problem is to save the length of the array in a struct Array, as meta-information about the array.
#include <stdio.h>
#include <stdlib.h>
struct Array
{
int length;
double *array;
};
typedef struct Array Array;
Array* NewArray(int length)
{
/* Allocate the memory for the struct Array */
Array *newArray = (Array*) malloc(sizeof(Array));
/* Insert only non-negative length's*/
newArray->length = (length > 0) ? length : 0;
newArray->array = (double*) malloc(newArray->length*sizeof(double)); /* use the clamped length */
return newArray;
}
void SetArray(Array *structure,int length,double* array)
{
structure->length = length;
structure->array = array;
}
void PrintArray(Array *structure)
{
if(structure->length > 0)
{
int i;
printf("length: %d\n", structure->length);
for (i = 0; i < structure->length; i++)
printf("%g\n", structure->array[i]);
}
else
printf("Empty Array. Length 0\n");
}
int main()
{
int i;
Array *negativeTest, *days = NewArray(5);
double moreDays[] = {1,2,3,4,5,6,7,8,9,10};
for (i = 0; i < days->length; i++)
days->array[i] = i+1;
PrintArray(days);
SetArray(days,10,moreDays);
PrintArray(days);
negativeTest = NewArray(-5);
PrintArray(negativeTest);
return 0;
}
But you have to take care to set the right length for the array you want to store, because there is no way to check this length, as the other answers have explained at length.
This is how I personally do it in my code. I like to keep it as simple as possible while still able to get values that I need.
#include <stdlib.h>
typedef struct intArr {
int size;
int* arr;
} intArr_t;
int main() {
intArr_t arr;
arr.size = 6;
arr.arr = (int*)malloc(sizeof(int) * arr.size);
if (arr.arr == NULL) return 1;
for (int i = 0; i < arr.size; i++) {
arr.arr[i] = i * 10;
}
free(arr.arr);
return 0;
}
No, you can't use sizeof(ptr) to find the size of the array ptr is pointing to.
Allocating extra memory (more than the size of the array) can help, though, if you want to store the length in the extra space.
#include <stdio.h>
int main()
{
int days[] = {1,2,3,4,5};
int *ptr = days;
printf("%zu\n", sizeof(days));
printf("%zu\n", sizeof(ptr));
return 0;
}
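The size of days[] is 20, which is the number of elements times the size of its element type (5 x 4 bytes here).
The size of the pointer, on the other hand, is 4 no matter what it points to (on a 32-bit system; typically 8 on a 64-bit one),
because a pointer only stores the address of the element it points to.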
Strings have a '\0' character at the end, so their length can be obtained with functions like strlen. The problem with an integer array, for example, is that you can't reserve any value as an end marker, so one possible solution is to refer to the elements through an array of pointers and use the NULL pointer as the end marker.
#include <stdio.h>
/* the following function will produce the warning:
* ‘sizeof’ on array function parameter ‘a’ will
* return size of ‘int *’ [-Wsizeof-array-argument]
*/
void foo( int a[] )
{
printf( "%zu\n", sizeof a );
}
/* so we have to implement something else one possible
* idea is to use the NULL pointer as a control value
* the same way '\0' is used in strings but this way
* the pointer passed to a function should address pointers
* so the actual implementation of an array type will
* be a pointer to pointer
*/
typedef char * type_t; /* line 18 */
typedef type_t ** array_t;
int main( void )
{
array_t initialize( int, ... );
/* initialize an array with four values "foo", "bar", "baz", "foobar"
* if one wants to use integers rather than strings than in the typedef
* declaration at line 18 the char * type should be changed with int
* and in the format used for printing the array values
* at line 45 and 51 "%s" should be changed with "%i"
*/
array_t array = initialize( 4, "foo", "bar", "baz", "foobar" );
int size( array_t );
/* print array size */
printf( "size %i:\n", size( array ));
void aprint( char *, array_t );
/* print array values */
aprint( "%s\n", array ); /* line 45 */
type_t getval( array_t, int );
/* print an indexed value */
int i = 2;
type_t val = getval( array, i );
printf( "%i: %s\n", i, val ); /* line 51 */
void delete( array_t );
/* free some space */
delete( array );
return 0;
}
/* the output of the program should be:
* size 4:
* foo
* bar
* baz
* foobar
* 2: baz
*/
#include <stdarg.h>
#include <stdlib.h>
array_t initialize( int n, ... )
{
/* here we store the array values */
type_t *v = (type_t *) malloc( sizeof( type_t ) * n );
va_list ap;
va_start( ap, n );
int j;
for ( j = 0; j < n; j++ )
v[j] = va_arg( ap, type_t );
va_end( ap );
/* the actual array will hold the addresses of those
* values plus a NULL pointer
*/
array_t a = (array_t) malloc( sizeof( type_t *) * ( n + 1 ));
a[n] = NULL;
for ( j = 0; j < n; j++ )
a[j] = v + j;
return a;
}
int size( array_t a )
{
int n = 0;
while ( *a++ != NULL )
n++;
return n;
}
void aprint( char *fmt, array_t a )
{
while ( *a != NULL )
printf( fmt, **a++ );
}
type_t getval( array_t a, int i )
{
return *a[i];
}
void delete( array_t a )
{
free( *a );
free( a );
}
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#define array(type) struct { size_t size; type elem[]; } /* C99 flexible array member */
void *array_new(int esize, int ecnt)
{
size_t *a = (size_t *)malloc(esize*ecnt+sizeof(size_t));
if (a) *a = ecnt;
return a;
}
#define array_new(type, count) array_new(sizeof(type),count)
#define array_delete free
#define array_foreach(type, e, arr) \
for (type *e = (arr)->elem; e < (arr)->size + (arr)->elem; ++e)
int main(int argc, char const *argv[])
{
array(int) *iarr = array_new(int, 10);
array(float) *farr = array_new(float, 10);
array(double) *darr = array_new(double, 10);
array(char) *carr = array_new(char, 11);
for (int i = 0; i < iarr->size; ++i) {
iarr->elem[i] = i;
farr->elem[i] = i*1.0f;
darr->elem[i] = i*1.0;
carr->elem[i] = i+'0';
}
array_foreach(int, e, iarr) {
printf("%d ", *e);
}
array_foreach(float, e, farr) {
printf("%.0f ", *e);
}
array_foreach(double, e, darr) {
printf("%.0lf ", *e);
}
carr->elem[carr->size-1] = '\0';
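printf("%s\n", carr->elem);
/* release the four arrays (array_delete is just free) */
array_delete(iarr); array_delete(farr); array_delete(darr); array_delete(carr);
return 0;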
}
#include <stdint.h>
#define array_size 10
struct {
int16_t size;
int16_t array[array_size];
int16_t property1[(array_size/16)+1];
int16_t property2[(array_size/16)+1];
} array1 = {array_size, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}};
#undef array_size
For a second array of a different size, array_size is again stored in the size member:
#define array_size 30
struct {
int16_t size;
int16_t array[array_size];
int16_t property1[(array_size/16)+1];
int16_t property2[(array_size/16)+1];
} array2 = {array_size};
#undef array_size
Usage is:
int main(void) {
int16_t size = array1.size;
for (int i=0; i!=size; i++) {
array1.array[i] *= 2;
}
return 0;
}
Many implementations have a function that tells you the reserved size of a block allocated with malloc() or calloc(); for example, glibc has malloc_usable_size().
However, this returns the size of the reserved block, which can be larger than the size passed to malloc()/realloc().
There is a popular macro that you can define to find the number of elements in an array (the Microsoft CRT even provides it out of the box, as _countof):
#define countof(x) (sizeof(x)/sizeof((x)[0]))
Then you can write:
int my_array[] = { ... some elements ... };
printf("%zu", countof(my_array)); // 'z' is correct type specifier for size_t

MPI Broadcast Very Slow

I am writing an MPI program and the MPI_Bcast function is very slow on one particular machine I am using. In order to narrow down the problem, I have the following two test programs. The first does many MPI_Send/MPI_Recv operations from process 0 to the others:
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
#define N 1000000000
int main(int argc, char** argv) {
int rank, size;
/* initialize MPI */
MPI_Init(&argc, &argv);
/* get the rank (process id) and size (number of processes) */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
/* have process 0 do many sends */
if (rank == 0) {
int i, j;
for (i = 0; i < N; i++) {
for (j = 1; j < size; j++) {
if (MPI_Send(&i, 1, MPI_INT, j, 0, MPI_COMM_WORLD) != MPI_SUCCESS) {
printf("Error!\n");
exit(0);
}
}
}
}
/* have the rest receive that many values */
else {
int i;
for (i = 0; i < N; i++) {
int value;
if (MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE) != MPI_SUCCESS) {
printf("Error!\n");
exit(0);
}
}
}
/* quit MPI */
MPI_Finalize( );
return 0;
}
This program runs in only 2.7 seconds or so with 4 processes.
This next program does exactly the same thing, except it uses MPI_Bcast to send the values from process 0 to the other processes:
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
#define N 1000000000
int main(int argc, char** argv) {
int rank, size;
/* initialize MPI */
MPI_Init(&argc, &argv);
/* get the rank (process id) and size (number of processes) */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
/* have process 0 do many sends */
if (rank == 0) {
int i, j;
for (i = 0; i < N; i++) {
if (MPI_Bcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD) != MPI_SUCCESS) {
printf("FAIL\n");
exit(0);
}
}
}
/* have the rest receive that many values */
else {
int i;
for (i = 0; i < N; i++) {
if (MPI_Bcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD) != MPI_SUCCESS) {
printf("FAIL\n");
exit(0);
}
}
}
/* quit MPI */
MPI_Finalize( );
return 0;
}
Both programs have the same value for N, and neither program returns an error from the communication calls. The second program should be at least a little bit faster, but it is not: it is much slower, at roughly 34 seconds, around 12x slower!
This problem only manifests itself on one machine, but not others even though they are running the same operating system (Ubuntu) and don't have drastically different hardware. Also, I'm using OpenMPI on both.
I'm really pulling my hair out, does anyone have an idea?
Thanks for reading!
A couple of observations.
The MPI_Bcast is receiving the result into the "&i" buffer. The MPI_Recv is receiving the result into "&value". Is there some reason that decision was made?
The Send/Recv model will naturally synchronize. The MPI_Send calls are blocking and serialized. The matching MPI_Recv should always be ready when the MPI_Send is called.
In general, collectives tend to have larger advantages as the job size scales up.
I compiled and ran the programs using IBM Platform MPI. I lowered the N value by 100x to 10 Million, to speed up the testing. I changed the MPI_Bcast to receive the result in a "&value" buffer rather than into the "&i" buffer. I ran each case three times, and averaged the times. The times are the "real" value returned by "time" (this was necessary as the ranks were running remotely from the mpirun command).
With 4 ranks over shared memory, the Send/Recv model took 6.5 seconds, the Bcast model took 7.6 seconds.
With 32 ranks (8/node x 4 nodes, FDR InfiniBand), the Send/Recv model took 79 seconds, the Bcast model took 22 seconds.
With 128 ranks (16/node x 8 nodes, FDR InfiniBand), the Send/Recv model took 134 seconds, the Bcast model took 44 seconds.
Given these timings AFTER the reduction in the N value by 100x to 10000000, I am going to suggest that the "2.7 second" time was a no-op. Double check that some actual work was done.

Need explanation MPI_Scatter()

I'm trying to do the Monte Carlo problem using MPI, where we generate x random numbers between 0 and 1 and then send n of them to each processor. I'm using the scatter function, but my code doesn't run right: it compiles, but it doesn't ask for the input. I don't understand how MPI loops by itself without loops. Can someone explain this, and what is wrong with my code?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include "mpi.h"
main(int argc, char* argv[]) {
int my_rank; /* rank of process */
int p; /* number of processes */
int source; /* rank of sender */
int dest; /* rank of receiver */
int tag = 0; /* tag for messages */
char message[100]; /* storage for message */
MPI_Status status; /* return status for */
double *total_xr, *p_xr, total_size_xr, p_size_xr; /* receive */
/* Start up MPI */
MPI_Init(&argc, &argv);
/* Find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* Find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &p);
double temp;
int i, partial_sum, x, total_sum, ratio_p, area;
total_size_xr = 0;
partial_sum = 0;
if(my_rank == 0){
while(total_size_xr <= 0){
printf("How many random numbers should each process get?: ");
scanf("%f", &p_size_xr);
}
total_size_xr = p*p_size_xr;
total_xr = malloc(total_size_xr*sizeof(double));
//xr generator will generate numbers between 1 and 0
srand(time(NULL));
for(i=0; i<total_size_xr; i++)
{
temp = 2.0 * rand()/(RAND_MAX+1.0) -1.0;
//this makes sure every generated number stays between 0 and 1 and doesn't go negative
while(temp < 0.0)
{
temp = 2.0 * rand()/(RAND_MAX+1.0) -1.0;
}
//array set to total random numbers generated to be scatter into processors
total_xr[i] = temp;
}
}
else{
//this will be the buffer for the processors to hold their own numbers to add
p_xr = malloc(p_size_xr*sizeof(double));
printf("\n\narray set\n\n");
//scatter xr into processors
MPI_Scatter(total_xr, total_size_xr, MPI_DOUBLE, p_xr, p_size_xr, MPI_DOUBLE, 0, MPI_COMM_WORLD);
//while in the processor, the partial sum will be calculated using xr and the formula sqrt(1-x*x)
for(i=0; i<p_size_xr; i++)
{
x = p_xr[i];
temp = sqrt(1 - (x*x));
partial_sum = partial_sum + temp;
}
//}
//we will send the partial sums to master processor which is processor 0 and add them and place
//the result in total_sum
MPI_Reduce(&partial_sum, &total_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
//once we have all of the sums we need to multiply the total sum and multiply it with 1/N
//N being the number of processors, the area should contain the value of pi.
ratio_p = 1/p;
area = total_sum*ratio_p;
printf("\n\nThe area under the curve of f(x) = sqrt(1-x*x), between 0 and 1 is, %f\n\n", area);
/* Shut down MPI */
MPI_Finalize();
} /* main */
In general, it's not good to rely on STDIN/STDOUT in an MPI program. The MPI implementation could put rank 0 on a node other than the one from which you launch your job, in which case the input has to be forwarded correctly. While this will work most of the time, it's not usually a good idea.
A better way to do things is to have your user input be in a file that the application can read or via command line variables. Those will be much more portable.
I'm not sure what you mean by MPI looping by itself without loops. Maybe you can clarify that comment if you still need an answer there.

Passing argument without a cast in MPI

For my homework I have to test for several large matrices using this Conjugate Gradient Program with MPI (see code below). I copied the program from my book and it is supposed to compile but I get the errors:
In function 'main':
37:warning: passing argument 1 of read_replicated_vector makes pointer from integer without a cast
37: warning: passing argument 2 of read_replicated_vector makes pointer from integer without a cast
37: warning: passing argument 3 of read_replicated_vector makes integer from pointer without a cast
37: warning: passing argument 4 of read_replicated_vector from incompatible pointer type
37: error: void value not ignored as it ought to be
44: warning: passing argument 1 of print_replicated_vector makes pointer from integer without a cast
44: warning: passing argument 3 of print_replicated_vector makes integer from pointer without a cast
44: error: too many arguments to function print_replicated_vector
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"
#include "MyMPI.h"
main (int argc, char *argv[])
{
double **a; /* Solving Ax = b for x */
double *astorage; /* Holds elements of A */
double *b; /* Constant vector */
double *x; /* Solution vector */
int p; /* MPI Processes */
int id; /* Process rank */
int m; /* Rows in A */
int n; /* Columns in A */
int n1; /* Elements in b */
/* Initialize a and b so that solution is x[i] = i */
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &p);
MPI_Comm_rank (MPI_COMM_WORLD, &id);
read_block_row_matrix (id, p, argv[1], (void *) &a,
(void *) &astorage, MPI_DOUBLE, &m, &n);
n1 = read_replicated_vector (id, p, argv[2], (void **) &b, MPI_DOUBLE);
if ((m != n) || (n != n1))
{
if (!id)
printf ("Incompatible dimensions (%d x %d) x (%d)\n", m, n, n1);
}
else {
x = (double *) malloc (n * sizeof(double));
cg (p, id, a, b, x, n);
print_replicated_vector (id, p, x, MPI_DOUBLE, n); // here
}
MPI_Finalize();
}
id and p are not pointers, so I do think I need to pass them by reference in the calls to MPI_Comm_size and MPI_Comm_rank, though I tried doing that.
Edit
//Input Function
void read_replicated_vector (
char *s, /* IN - File name */
void **v, /* OUT - Vector */
MPI_Datatype dtype, /* IN - Vector type */
int *n, /* OUT - Vector length */
MPI_Comm comm) /* IN - Communicator */
{
int datum_size; /* Bytes per vector element */
int i;
int id; /* Process rank */
FILE *infileptr; /* Input file pointer */
int p; /* Number of processes */
MPI_Comm_rank (comm, &id);
MPI_Comm_size (comm, &p);
datum_size = get_size (dtype);
if (id == (p-1))
{
infileptr = fopen (s, "r");
if (infileptr == NULL) *n = 0;
else fread (n, sizeof(int), 1, infileptr);
}
MPI_Bcast (n, 1, MPI_INT, p-1, MPI_COMM_WORLD);
if (! *n) terminate (id, "Cannot open vector file");
*v = my_malloc (id, *n * datum_size);
if (id == (p-1))
{
fread (*v, datum_size, *n, infileptr);
fclose (infileptr);
}
MPI_Bcast (*v, *n, dtype, p-1, MPI_COMM_WORLD);
}
// Output Function
void print_replicated_vector (
void *v, /* IN - Address of vector */
MPI_Datatype dtype, /* IN - Vector element type */
int n, /* IN - Elements in vector */
MPI_Comm comm) /* IN - Communicator */
{
int id; /* Process rank */
MPI_Comm_rank (comm, &id);
if (!id)
{
print_subvector (v, dtype, n);
printf ("\n\n");
}
}
Your warnings are because you're calling the function
void print_replicated_vector (void *, MPI_Datatype, int, MPI_Comm);
with a first parameter of type int (and with five arguments, while that prototype takes four):
print_replicated_vector (id, p, x, MPI_DOUBLE, n); // here
C code will sometimes store a pointer in an int; the compiler assumes that is what you want to do, performs the appropriate type conversions, and warns you about them. But to make the code correct, you have to make the types match up, i.e. pass a pointer to id with &id or whatever the appropriate argument is (I don't know what print_replicated_vector does or what you want it to do).
