matrix multiplication using Mpi_Scatter and Mpi_Gather - mpi

I newbie to mpi programming. I was trying to write matrix multiplication. Went through the post MPI Matrix Multiplication with scatter gather about matrix multiplication using scatter and gather routine.
I tried modifying the code available on above post as below...
#define N 4
#include <stdio.h>
#include <math.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stddef.h>
#include "mpi.h"
void print_results(char *prompt, int a[N][N]);
int main(int argc, char *argv[])
{
int i, j, k, rank, size, tag = 99, blksz, sum = 0;
int a[N][N]={{1,2,3,4},{5,6,7,8},{9,1,2,3},{4,5,6,7,}};
int b[N][N]={{1,2,3,4},{5,6,7,8},{9,1,2,3},{4,5,6,7,}};
int c[N][N];
int aa[N],cc[N];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
//scatter rows of first matrix to different processes
MPI_Scatter(a, N*N/size, MPI_INT, aa, N*N/size, MPI_INT,0,MPI_COMM_WORLD);
//broadcast second matrix to all processes
MPI_Bcast(b, N*N, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
//perform vector multiplication by all processes
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
sum = sum + aa[j] * b[i][j];
}
cc[i] = sum;
sum = 0;
}
MPI_Gather(cc, N*N/size, MPI_INT, c, N*N/size, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
print_results("C = ", c);
}
void print_results(char *prompt, int a[N][N])
{
int i, j;
printf ("\n\n%s\n", prompt);
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
printf(" %d", a[i][j]);
}
printf ("\n");
}
printf ("\n\n");
}
I ran above program as
$mpirun -np 4 ./a.out
For above program I am getting following incorrect output..
C =
0 0 -562242168 32766
1 0 4197933 0
-562242176 32766 0 0
4197856 0 4196672 0
C =
0 0 -1064802792 32765
1 0 4197933 0
-1064802800 32765 0 0
4197856 0 4196672 0
C =
30 70 29 60
70 174 89 148
29 89 95 74
60 148 74 126
C =
0 0 -1845552920 32765
1 0 4197933 0
-1845552928 32765 0 0
4197856 0 4196672 0
I have following queries
1. Why result matrix C is getting printed by all processes. It is
supposed to be printed by only main process.
2. Why incorrect result is being printed?
Corrections and help in this regard will be appreciated.

The result matrix c is getting printed by all processes because every process executes the function void print_results(char *prompt, int a[N][N]). Since you are gathering at the process having rank 0, add a statement if (rank == 0) before calling the print_results(...) function. Further, the result is incorrect because of a wrong loop logic in :
for (j = 0; j < N; j++)
{
sum = sum + aa[j] * b[i][j];
}
This should be :
for (j = 0; j < N; j++)
{
sum = sum + aa[j] * b[j][i];
}
Also there is no need to broadcast b as all processes already already have a copy of it and you can avoid MPI_Barrier(). The complete program then becomes :
#define N 4
#include <stdio.h>
#include <math.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stddef.h>
#include "mpi.h"
void print_results(char *prompt, int a[N][N]);
int main(int argc, char *argv[])
{
int i, j, k, rank, size, tag = 99, blksz, sum = 0;
int a[N][N]={{1,2,3,4},{5,6,7,8},{9,1,2,3},{4,5,6,7,}};
int b[N][N]={{1,2,3,4},{5,6,7,8},{9,1,2,3},{4,5,6,7,}};
int c[N][N];
int aa[N],cc[N];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
//scatter rows of first matrix to different processes
MPI_Scatter(a, N*N/size, MPI_INT, aa, N*N/size, MPI_INT,0,MPI_COMM_WORLD);
//broadcast second matrix to all processes
MPI_Bcast(b, N*N, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
//perform vector multiplication by all processes
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
sum = sum + aa[j] * b[j][i]; //MISTAKE_WAS_HERE
}
cc[i] = sum;
sum = 0;
}
MPI_Gather(cc, N*N/size, MPI_INT, c, N*N/size, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
if (rank == 0) //I_ADDED_THIS
print_results("C = ", c);
}
void print_results(char *prompt, int a[N][N])
{
int i, j;
printf ("\n\n%s\n", prompt);
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
printf(" %d", a[i][j]);
}
printf ("\n");
}
printf ("\n\n");
}
Then c =
C =
54 37 47 57
130 93 119 145
44 41 56 71
111 79 101 123

Call to mpi_finalize doesn't indicate that all the MPI processes are terminated like in OpenMP !
In most of mpi implementation, all the processes execute the instruction before the MPI_init and after MPI_Finalized.
A good practice is to do nothing before MPI_Init and after MPI_Finalized.

Related

What is an algorithm to generate the following sequence?

I want to find out the function to generate the sequence with the following pattern.
1 2 3 1 1 2 2 3 3 1 1 1 1 2 2 2 2 3 3 3 3 ....
Where the lower bound number is 1 upper number bound number is 3. Each time numbers start from 1 and each number repeats 2 ^ n times, with n starting with 0.
Here it goes, I hope it will help.
#include <iostream>
#include <math.h>
int main(){
for(int n = 0; n < 5;n++){
for(int i = 1; i < 4;i++){
for(int j = 0;j < pow(2,n) ;j++){
std::cout << i;
}
}
}
return 0;
}
Here is a code in C++:
#include <iostream>
#include <cmath>
int main()
{
// These are the loop control variables
int n, m, i, j, k;
// Read the limit
cin >> n;
// Outermost loop to execute the pattern {1..., 2..., 3...} n times
for (i = 0; i < n; ++i)
{
// This loop generates the required numbers 1, 2, and 3
for (j = 1; j <= 3; ++j)
{
// Display the generated number 2^i times
m = pow(2, i);
for (k = 0; k < m; ++k)
{
std::cout << j << ' ';
}
}
}
}
You can use the same logic in any language you choose to implement it.

MPI Send and receive a pointer in MPI_Type_struct

I want to send a set of data with the MPI_Type_struct and one of them is a pointer to an array (because the matrices that I'm going to use are going to be very large and I need to do malloc). The problem I see is that all the data is passed correctly except the matrix. I know that it is possible to pass a matrix through the pointer since if I only send the pointer of the matrix, correct results are observed.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
void main(int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int size, rank;
int m,n;
m=n=2;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
typedef struct estruct
{
float *array;
int sizeM, sizeK, sizeN, rank_or;
} ;
struct estruct kernel, server;
MPI_Datatype types[5] = {MPI_FLOAT, MPI_INT,MPI_INT,MPI_INT,MPI_INT};
MPI_Datatype newtype;
int lengths[5] = {n*m,1,1,1,1};
MPI_Aint displacements[5];
displacements[0] = (size_t) & (kernel.array[0]) - (size_t)&kernel;
displacements[1] = (size_t) & (kernel.sizeM) - (size_t)&kernel;
displacements[2] = (size_t) & (kernel.sizeK) - (size_t)&kernel;
displacements[3] = (size_t) & (kernel.sizeN) - (size_t)&kernel;
displacements[4] = (size_t) & (kernel.rank_or) - (size_t)&kernel;
MPI_Type_struct(5, lengths, displacements, types, &newtype);
MPI_Type_commit(&newtype);
if (rank == 0)
{
kernel.array = (float *)malloc(m * n * sizeof(float));
for(int i = 0; i < m*n; i++) kernel.array[i] = i;
kernel.sizeM = 5;
kernel.sizeK = 5;
kernel.sizeN = 5;
kernel.rank_or = 5;
MPI_Send(&kernel, 1, newtype, 1, 0, MPI_COMM_WORLD);
}
else
{
server.array = (float *)malloc(m * n * sizeof(float));
MPI_Recv(&server, 1, newtype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("%i \n", server.sizeM);
printf("%i \n", server.sizeK);
printf("%i \n", server.sizeN);
printf("%i \n", server.rank_or);
for(int i = 0; i < m*n; i++) printf("%f\n",server.array[i]);
}
MPI_Finalize();
}
Assuming that only two processes are executed,I expect that process with rank = 1 receive and display the correct elements of the matrix on the screen (the other elements are well received), but the actual output is:
5
5
5
5
0.065004
0.000000
0.000000
0.000000
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 26206 RUNNING AT pmul
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I hope someone can help me.

How to multiply two matrices in MPI using MPI_Scatter/MPI_Gather

I'm trying to multiply two matrices in C using MPI collective communications MPI_Scatter and MPI_Gather.
My code executes with correct result, but when I'm testing processes after MPI_Scatter, I'm getting zeros on every process.
So, I'm thinking there is some problem with sendcount and recievecount parameters in MPI_Scatter and MPI_Gather functions.
Thanks for helping :)
// matrix multiplication using MPI_Scatter/MPI_Gather
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define N 10
int main(int argc, char *argv[])
{
int a[N][N], b[N][N], c[N][N], r1, c1, r2, c2, i, j, k, rank, size;
MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// min number of processes must be 2
if(size < 2){
fprintf(stderr, "Number of processes must be min 2 for %s\n", argv[0]);
MPI_Abort(MPI_COMM_WORLD, 1);
}
// if rank of process is zero, initialize matrix a and b
if(rank == 0){
printf("Insert number of rows and columns for matrix a: ");
scanf("%d %d", &r1, &c1);
printf("Insert number of rows and columns for matrix b: ");
scanf("%d %d",&r2, &c2);
// If the number of columns of matrix a isn't equal to the number of rows of matrix b
// ask user to insert again number of rows and columns
while (c1!=r2){
printf("Error! number of columns of matrix a isn't equal to number of rows of matrix b\n\n");
printf("Enter rows and column for matrix a: ");
scanf("%d %d", &r1, &c1);
printf("Enter rows and column for matrix b: ");
scanf("%d %d",&r2, &c2);
}
// Enter elements of matrix a
printf("\nEnter elements of matrix a:\n");
for(i = 0; i < r1; ++i){
for(j = 0; j < c1; ++j){
printf("Enter elements of matrix a%d%d: ",i+1, j+1);
scanf("%d", &a[i][j]);
}
}
// Enter elements of matrix b
printf("\nEnter elements of matrix b:\n");
for(i = 0; i < r2; ++i){
for(j = 0; j < c2; ++j){
printf("Enter elements of matrix b%d%d: ",i+1, j+1);
scanf("%d",&b[i][j]);
}
}
}
// send one row of matrix a to every process in the group
MPI_Scatter(a, N*N/size, MPI_INT, a, N*N/size, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(b, N*N, MPI_INT, 0, MPI_COMM_WORLD);
for(i=0; i<r1; ++i){
for(j=0; j<c2; ++j){
// initialization of matrix c(result matrix)
c[i][j]=0;
for(k=0; k<c1; ++k)
c[i][j]+= a[i][k]*b[k][j]; // multiplication of matrix a and matrix b, save result in matrix c
}
}
MPI_Gather(c, N*N/size, MPI_INT, c, N*N/size, MPI_INT, 0, MPI_COMM_WORLD);
// print result
if(rank == 0){
printf("\nResult of matrix multiplication:\n");
for(i = 0; i < r1; ++i){
for(j = 0; j < c2; ++j){
printf("%d ", c[i][j]);
if(j == c2-1){
printf("\n\n");
}
}
}
}
MPI_Finalize();
}

Read input integer from external file for MPI

How could I read external input file for mpi? I need to read one integer from external file (zadanie4_vstup.txt), to compute simple factorial. I have tried to substitute the second argument in MPI_Init() with address of int variable (n), but it looks it is nonsense.
Thank you.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char ** argv)
{
FILE *fr, *fw;
fr = fopen("zadanie4_vstup.txt", "r");
fw = fopen("zadanie4_vystup.txt", "w");
int nproc, me;
int fakt=1, i, buff, n;
MPI_Status stat;
fscanf(fr, "%d", &n);
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
#pragma omp parallel for private(i) reduction(*:fakt)
for(i=me*n/nproc+1; i<=(me+1)*n/nproc; i++) {
fakt *= i;
}
if(nproc > 1) {
if(me == 0) {
for(i=1; i<nproc; i++) {
MPI_Recv(&buff, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &stat);
fakt*=buff;
}
} else {
MPI_Send(&fakt, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
}
if(me == 0) {
fprintf(fw, "%d! = %d\n", n, fakt);
}
fclose(fr);
fclose(fw);
MPI_Finalize();
}
here is a version of your program that reads n on the command line.
note i simplified the communications by using MPI_Reduce()
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int nproc, me;
int fakt=1, res, i, buff, n;
MPI_Status stat;
MPI_Init(&argc, &argv);
n = atoi(argv[1]);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
#pragma omp parallel for private(i) reduction(*:fakt)
for(i=me*n/nproc+1; i<=(me+1)*n/nproc; i++) {
fakt *= i;
}
MPI_Reduce(&fakt, &res, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
if(me == 0) {
printf("%d! = %d\n", n, res);
}
MPI_Finalize();
return 0;
}
for example
$ mpirun -np 4 ./fakt 6
6! = 720

Open MPI's MPI_reduce not combining array sums

I am very new to Open MPI. I have made a small program that computes the sum of an array, by splitting array into pieces equal to the number of processes. The problem in my program is that each process is computing right sum of its share of the array, but the individually computed sums are not summed by MPI_reduce function. I tried my best to solve and also consulted the Open MPI manual, but there is still something that I might be missing. I would be grateful for any kind of guidance. Below is the program I made:
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
int n, rank, nrofProcs, i;
int sum, ans;
// 0,1,2, 3,4,5, 6,7,8, 9
int myarr[] = {1,5,9, 2,8,3, 7,4,6, 10};
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nrofProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
n = 10;
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
sum = 0.0;
int remaining = n % nrofProcs;
int lower =rank*(n/nrofProcs);
int upper = (lower+(n/nrofProcs))-1;
for (i = lower; i <= upper; i++)
{
sum = sum + myarr[i];
}
if(rank==nrofProcs-1)
{
while(i<=remaining)
{
sum = sum + myarr[i];
i++;
}
}
/* (PROBLEM IS HERE, IT IS NOT COMBINING "sums") */
MPI_Reduce(&sum, &ans, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
// if (rank == 0)
printf( "rank: %d, Sum ans: %d\n", rank, sum);
/* shut down MPI */
MPI_Finalize();
return 0;
}
Output:
rank: 2, Sum ans: 17
rank: 1, Sum ans: 13
rank: 0, Sum ans: 15
(Output should be rank: 0, Sum ans: 55)
Sorry, I made some mistakes, that I corrected after running parallel debugging on my program. Here I am sharing code to split an array of length N on M processes, where N and M can have any value:
/*
An MPI program split an array of length N on M processes, where N and M can have any value
*/
#include <math.h>
#include "mpi.h"
#include <iostream>
#include <vector>
using namespace std;
int main(int argc, char *argv[])
{
int n, rank, nrofProcs, i;
int sum, ans;
// 0,1,2, 3,4,5, 6,7,8, 9, 10
int myarr[] = {1,5,9, 2,8,3, 7,4,6,11,10};
vector<int> myvec (myarr, myarr + sizeof(myarr) / sizeof(int) );
n = myvec.size(); // number of items in array
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nrofProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
sum = 0.0;
int remaining = n % nrofProcs;
int lower =rank*(n/nrofProcs);
int upper = (lower+(n/nrofProcs))-1;
for (i = lower; i <= upper; i++)
{
sum = sum + myvec[i];
}
if(rank==nrofProcs-1)
{
int ctr=0;
while(ctr<remaining)
{
sum = sum + myvec[i];
ctr++;
i++;
}
}
/* combine everyone's calculations */
MPI_Reduce(&sum, &ans, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)
cout << "rank: " <<rank << " Sum ans: " << ans<< endl;
/* shut down MPI */
MPI_Finalize();
return 0;
}

Resources