MPI_Isend and MPI_Wait cause segmentation fault with large matrix


The code simply allocates memory for a matrix and uses non-blocking calls to send the matrix from rank 0 to rank 1. It works fine for a smaller matrix size (1024), but it results in a segmentation fault with a larger size (16384).
Below is the code:
double **A;
int i,j,size,rankid,rankall;
size = 16384;
MPI_Request reqr,reqs;
MPI_Status star,stas;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD,&rankall);
MPI_Comm_rank(MPI_COMM_WORLD,&rankid);
A = (double**)calloc(size,sizeof(double*));
for(i=0;i<size;i++){
A[i] = (double *)calloc(size,sizeof(double));
for(j=0;j<size;j++){
if(rankid ==0){
A[i][j] = 1;
}
}
}
if(rankid ==0){
MPI_Isend(&A[0][0],size*size,MPI_DOUBLE,1,1,MPI_COMM_WORLD,&reqs);
MPI_Wait(&reqs,&stas);
}
if(rankid ==1){
MPI_Irecv(&A[0][0],size*size,MPI_DOUBLE,0,1,MPI_COMM_WORLD,&reqr);
MPI_Wait(&reqr,&star);
}
MPI_Finalize();
The debugger backtrace showed:
#0 0x00007FFFF7947093 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000000000043a5B0 in MPID_Segment_contig_m2m ()
#2 0x00000000004322cb in MPID_Segment_manipulate ()
#3 0x000000000043a?Ba in MPID_Segment_pack ()
#4 0x000000000042BB99 in lmt_shm_send_progress ()
#5 0x000000000042?e1F in MPID_nem_lmt_shm_start_send ()
#6 0x0000000000425aFF in pkt_CTS_handler ()
#7 0x000000000041Fb52 in MPIDI_CH3I_Progress ()
#8 0x0000000000405Bc1 in MPIR_Wait_impl ()
#9 0x000000000040594e in PMPI_Wait ()
#10 0x0000000000402ea5 in main (argc=1,argv=0x7fffffffe4a8)
at ./simpletest.c:26
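The question as captured here has no answer attached. A likely cause, judging from the code: every row is allocated by its own calloc call, so &A[0][0] is only the start of row 0 (size doubles), not of a contiguous block of size*size doubles. MPI_Isend(&A[0][0], size*size, ...) therefore reads far past the end of row 0 (and MPI_Irecv writes past it on rank 1). With size = 1024 that overrun may happen to stay inside mapped heap memory; with size = 16384 it evidently does not, which is consistent with the backtrace ending in a libc call inside MPICH's MPID_Segment_pack path. A common fix is to allocate all elements in one block and point the row pointers into it. A minimal sketch, where alloc_matrix and free_matrix are hypothetical helper names:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an n x n matrix of doubles as ONE contiguous block of n*n
   zeroed elements plus an array of row pointers into that block, so
   &A[0][0] really is the start of all n*n values and can be passed to
   MPI_Isend/MPI_Irecv with count n*n. Returns NULL on failure. */
double **alloc_matrix(size_t n)
{
    assert(n > 0);
    double *data = calloc(n * n, sizeof *data);   /* all elements, zeroed */
    double **rows = malloc(n * sizeof *rows);     /* one pointer per row */
    if (data == NULL || rows == NULL) {
        free(data);
        free(rows);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        rows[i] = data + i * n;                   /* row i starts at offset i*n */
    return rows;
}

/* Frees a matrix created by alloc_matrix: the data block, then the row pointers. */
void free_matrix(double **A)
{
    if (A != NULL) {
        free(A[0]);
        free(A);
    }
}
```

In the question's code, A = alloc_matrix(size); would replace the calloc calls (the loop that sets A[i][j] = 1 on rank 0 stays), the MPI_Isend/MPI_Irecv calls can stay unchanged, and free_matrix(A); goes before MPI_Finalize(). Note that with size = 16384 the matrix alone needs 2 GiB per rank (16384 * 16384 * 8 bytes).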

Related

Recursive Fibonacci ARM Assembly

Edit: I have removed my code as I do not want to get caught for cheating on my assignment. I will repost the code once my assignment has been submitted. I apologize for posting it on Stack Overflow; I just had nowhere else to go for help. Please respect my edit to remove the code. I have tried deleting it, but it will not let me, as I need to request it.
[MIPS code I was trying to follow][1]
[C Code I was trying to follow][2]
I am trying to convert recursive Fibonacci code into ARM assembly, but I am running into issues. When running my ARM assembly, the final value of the sum is 5 when it should be 2. It seems as though my code loops, but maybe one too many times. Any help would be much appreciated, as I am new to this.

Below is a C translation of what your code is doing, followed by a test run. This simply isn't a usual recursive Fibonacci.
#include <stdio.h>
void f ( int );
int R2 = 0;
int main () {
for ( int i = 0; i < 10; i++ ) {
R2 = 0;
f ( i );
printf ( "f ( %d ) = %d\n", i, R2 );
}
}
void f ( int n ) {
if ( n == 0 ) { R2 += 0; return; }
if ( n == 1 ) { R2 += 1; return; }
f ( n-1 );
f ( n-2 );
R2 += n-1;
}
f ( 0 ) = 0
f ( 1 ) = 1
f ( 2 ) = 2
f ( 3 ) = 5
f ( 4 ) = 10
f ( 5 ) = 19
f ( 6 ) = 34
f ( 7 ) = 59
f ( 8 ) = 100
f ( 9 ) = 167
Either you started with a broken Fibonacci algorithm, or you substantially changed it when going to assembly. I don't know how this can be fixed, except by following a working algorithm.

Note that in the C code the only addition is in fib(n-1) + fib(n-2). In particular, the special cases just do return 0; and return 1; respectively. Thus your "else add 0/1 to sum" lines are wrong: you should replace those additions with moves.
Also, you do MOV R1, R0 //copy fib(n-1), which is incorrect because fib(n-1) has been returned in R2, not R0. That should be MOV R1, R2.
With these changes the code works, even if it is slightly non-standard.
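For reference, here is a sketch of a conventional recursive Fibonacci in C (not the poster's removed code): the base cases return 0 and 1 instead of adding to a sum, and the only addition is fib(n-1) + fib(n-2). In assembly, the result of the first recursive call must be preserved (in a callee-saved register or on the stack) across the second call.

```c
#include <assert.h>

/* Conventional recursive Fibonacci: base cases RETURN 0 and 1
   (nothing is added to a running sum), and the only addition is
   fib(n - 1) + fib(n - 2) in the recursive case. */
int fib(int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    if (n == 1)
        return 1;
    int a = fib(n - 1);   /* in assembly: keep this value safe across the next call */
    int b = fib(n - 2);
    return a + b;
}
```

With this definition fib(3) is 2, the value the poster expected.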

Behaviour of Future.delayed() in Dart

I generally program in C++ and am aware of how the Sleep function works, but while learning Dart (for Flutter) I came across this delay function:
void countSeconds(s) {
for( var i = 1 ; i <= s; i++ ) {
Future.delayed(Duration(seconds: i), () => print(i));
}
}
It prints the value of i after i seconds, but shouldn't it print 1 after 1 second, 2 after another 2 seconds (i.e. at 3 s), 3 after another 3 seconds (i.e. at 6 s), and so on? How does it work?

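This will print 1 after 1 s, 2 after another 2 s (at 3 s), 3 after another 3 s (at 6 s), and so on (await is only allowed inside an async function):
Future<void> countSeconds(int s) async {
  for (var i = 1; i <= s; i++) {
    await Future.delayed(Duration(seconds: i), () => print(i));
  }
}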
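Future.delayed returns a Future immediately; it only schedules the callback. In your loop nothing waits for that Future, so all the timers start at practically the same moment and i is printed i seconds after the start. With await, each iteration waits for the previous delay to complete before starting the next one, so the delays add up.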
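Written out for five iterations (ignoring timer and scheduling overhead), the time t_i at which value i is printed is:

```latex
\begin{aligned}
\text{without await:}\quad & t_i = i && \Rightarrow\ t = 1, 2, 3, 4, 5\ \text{s}\\
\text{with await:}\quad & t_i = \sum_{k=1}^{i} k = \frac{i(i+1)}{2} && \Rightarrow\ t = 1, 3, 6, 10, 15\ \text{s}
\end{aligned}
```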

QtConcurrent::map segmentation fault

While trying to implement a "parallel for" using QtConcurrent::map, I wrote:
QFuture<void> parForAsync(size_t n, std::function<void (size_t)> Op)
{
size_t nThreads =
static_cast<size_t>(QThreadPool::globalInstance()->maxThreadCount());
size_t nn = n/nThreads + 1;
using Sequence = QVector<std::function<void()>>;
Sequence vFuns;
for(size_t i = 0; i < n; i+=nn)
{
size_t firstIdx = i,
lastIdx = i + nn > n ? n : i + nn;
vFuns.push_back([=]()->void
{
for(size_t i = firstIdx; i < lastIdx; ++i)
{
Op(i);
}
});
}
return QtConcurrent::map<Sequence> //<-Segmentation fault!
(vFuns, [](std::function<void()> f)
{
f();
});
}
I get a segmentation fault at this point:
template<typename _Res, typename... _ArgTypes>
function<_Res(_ArgTypes...)>::
function(const function& __x)
: _Function_base()
{
if (static_cast<bool>(__x))
{
__x._M_manager(_M_functor, __x._M_functor, __clone_functor); //<-Segmentation fault!
_M_invoker = __x._M_invoker;
_M_manager = __x._M_manager;
}
}
Why is this happening? It seems that the std::function passed the validity check. How can I make this code work?
Thanks in advance!

I cannot reproduce your case, but I can give you an example that illustrates the issue:
QFuture<void> test ()
{
QVector<int> v; // LOCAL VARIABLE IN SCOPE OF test FUNCTION
// preparing v vector
QFuture<void> f = QtConcurrent::map(v,someFunction); // returns immediately
return f;
}
[1] QtConcurrent::map takes v by reference, NOT BY COPY.
[2] QtConcurrent::map returns immediately.
[3] So when the test function ends, the parallel operations started by map are still using the v vector, which has already been destroyed because it is a local variable of test. The same applies to the local vFuns vector in your parForAsync.
You can call waitForFinished() on the QFuture, but then your function no longer makes sense, because it blocks until the parallel task ends.

An MPI program to add the numbers from 1 to 16000000, different results

*The master task first initializes an array and then distributes an equal portion of that array to the other tasks. After the other tasks receive their portion of the array, they perform an addition operation on each array element. They also maintain a sum for their portion of the array. The master task does likewise with its portion of the array. As each of the non-master tasks finishes, it sends its updated portion of the array to the master.
An MPI collective communication call is used to collect the sums maintained by each task. Finally, the master task displays selected parts of the final array and the global sum of all array elements.*
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define ARRAYSIZE 16000000
#define MASTER 0
float data[ARRAYSIZE];
int main (int argc, char *argv[])
{
int numtasks, taskid, rc, dest, source, offset, i, j, tag1,
tag2, chunksize;
float mysum, sum;
float update(int myoffset, int chunk, int myid);
MPI_Status status;
/******Initializations******/
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks % 4 != 0) {
printf("Quitting. Number of MPI tasks must be divisible by 4. \n");
MPI_Abort(MPI_COMM_WORLD, rc);
exit(0);
}
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
chunksize = ARRAYSIZE / numtasks;
tag2 = 1;
tag1 = 2;
/******Master task only ******/
if (taskid == MASTER){
/* Initialize the array */
sum = 0;
for (i = 0; i < ARRAYSIZE; i++){
data[i] = i * 1.0;
sum = sum + data[i];
}
printf("Initialized array sum = %e\n", sum);
/* Send each task its portion of the array - master keeps 1st part */
offset = chunksize;
for (dest = 1; dest < numtasks; dest++){
MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
MPI_Send(&data[offset], chunksize, MPI_FLOAT, dest, tag2, MPI_COMM_WORLD);
printf("Sent %d elements to task %d offset = %d\n", chunksize, dest, offset);
offset = offset + chunksize;
}
/* Master does its part of the work */
offset = 0;
mysum = update(offset, chunksize, taskid);
/* Get final sum */
MPI_Reduce(&mysum, &sum, 1, MPI_FLOAT, MPI_SUM, MASTER, MPI_COMM_WORLD);
printf("***Final sum = %e ***\n", sum);
} /* end of master section */
/******Non-master tasks only ******/
if (taskid > MASTER){
/* Receive my portion of array from the master task */
source = MASTER;
MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
MPI_Recv(&data[offset], chunksize, MPI_FLOAT, source, tag2, MPI_COMM_WORLD, &status);
mysum = update(offset, chunksize, taskid);
MPI_Reduce(&mysum, &sum, 1, MPI_FLOAT, MPI_SUM, MASTER, MPI_COMM_WORLD);
} /* end of non-master */
MPI_Finalize();
} /* end of main */
float update(int myoffset, int chunk, int myid){
int i;
float mysum;
/* Perform addition to each of my array elements and keep my sum */
mysum = 0;
for (i = myoffset; i < myoffset + chunk; i++){
mysum = mysum + data[i];
}
printf("Task %d mysum = %e\n", myid, mysum);
return mysum;
}
/******The result of this program is: ******/
MPI task 0 has started...
MPI task 1 has started...
MPI task 2 has started...
MPI task 3 has started...
Initialized array sum = 1.335708e+14
Sent 4000000 elements to task 1 offset= 4000000
Sent 4000000 elements to task 2 offset= 8000000
Task 1 mysum = 2.442024e+13
Sent 4000000 elements to task 3 offset= 12000000
Task 2 mysum = 3.991501e+13
Task 3 mysum = 5.809336e+13
Task 0 mysum = 7.994294e+12
Sample results:
0.000000e+00 1.000000e+00 2.000000e+00 3.000000e+00 4.000000e+00
4.000000e+06 4.000001e+06 4.000002e+06 4.000003e+06 4.000004e+06
8.000000e+06 8.000001e+06 8.000002e+06 8.000003e+06 8.000004e+06
1.200000e+07 1.200000e+07 1.200000e+07 1.200000e+07 1.200000e+07
*** Final sum= 1.304229e+14 ***
*So my question is: why don't these two sums hold the same value?*

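You are storing the result in a 32-bit floating-point number (i.e. a float), which simply doesn't have enough precision. A float has a 24-bit significand (about 7 significant decimal digits): the array values themselves (integers below 2^24 = 16777216) are stored exactly, but once a running sum approaches 10^14 the gap between adjacent float values is several million, so almost every addition is rounded. What you are seeing is a classic example of how rounding errors accumulate differently depending on the order in which you add the numbers.
If you replace all your floats with doubles (including MPI_FLOAT with MPI_DOUBLE in the MPI calls), the results agree, because every partial sum is an integer below 2^53 and is therefore represented exactly in a double: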
mpiexec -n 4 ./arraysum
Initialized array sum = 1.280000e+14
Sent 4000000 elements to task 1 offset = 4000000
Task 1 mysum = 2.400000e+13
Sent 4000000 elements to task 2 offset = 8000000
Task 2 mysum = 4.000000e+13
Sent 4000000 elements to task 3 offset = 12000000
Task 0 mysum = 7.999998e+12
Task 3 mysum = 5.600000e+13
***Final sum = 1.280000e+14 ***
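To see the effect without MPI, here is a minimal serial sketch (sum_float and sum_double are illustrative names, not from the question) that adds 0, 1, ..., n-1 in order with each accumulator type:

```c
#include <assert.h>

/* Adds 0, 1, ..., n-1 in order using a float accumulator,
   like the serial initialization loop in the master task. */
float sum_float(long n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (long i = 0; i < n; i++)
        sum += (float)i;   /* for n <= 2^24 each i is exact; the growing sum is not */
    return sum;
}

/* The same additions with a double accumulator. */
double sum_double(long n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (long i = 0; i < n; i++)
        sum += (double)i;  /* every partial sum is an integer below 2^53: exact */
    return sum;
}
```

For n = 16000000 the exact total is 127999992000000; sum_double returns exactly that, while sum_float cannot (that value is not representable as a float). Splitting the same additions into four partial sums, as the MPI tasks do, typically rounds differently again, which is why the two printed float totals disagree.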

Recursion in C unable to return result from the prototype!?

I'm not sure why this recursion is not working! I'm trying to get the total of the integers from i = 0 to n. I'm also testing recursion instead of a for loop to see how it performs. The program runs properly but stops after the input. I would appreciate any comments, thanks!
int sigma (int n)
{
if (n <= 0) // Base Call
return 1;
else {
printf ("%d", n);
int sum = sigma( n+sigma(n-1) );
return sum;
}
// recursive call to calculate any sum>0;
// for example: input=3; sum=(3+sigma(3-1)); sum=(3+sigma(2))
// do sigma(2)=2+sigma(2-1)=2+sigma(1);
// so sigma(1)=1+sigma(1-1)=1+sigma(0)=1;
// finally, sigma(3)=3+2+1+0=6
}
int main (int argc, char *argv[])
{
int n;
printf("Enter a positive integer for sum : ");
scanf( " %d ", &n);
int sum = sigma(n);
printf("The sum of all numbers for your entry: %d\n", sum);
getch();
return 0;
}

Change
int sum = sigma( n+sigma(n-1) );
to
int sum = n + sigma( n-1 );
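As you've written it, the argument grows instead of shrinking: for example, sigma(1) calls sigma(0), which returns 1, and then calls sigma(1 + 1) = sigma(2), which calls sigma(1) again, so the recursion never ends.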
Also, return 0 from the guard case, not 1.

I think it should be:
int sum = n + sigma(n-1);
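Putting both fixes together, here is a minimal sketch of the corrected function, plus a for-loop version for the comparison the question mentions (sigma_loop and check_sigma are illustrative names, not from the question):

```c
#include <assert.h>

/* Recursive sum 0 + 1 + ... + n. */
int sigma(int n)
{
    if (n <= 0)                 /* base case: nothing left to add, so 0 (not 1) */
        return 0;
    return n + sigma(n - 1);    /* the argument shrinks by 1 on every call */
}

/* The same sum computed with a for loop. */
int sigma_loop(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Checks that both versions agree for every n in 0..max_n. */
void check_sigma(int max_n)
{
    for (int n = 0; n <= max_n; n++)
        assert(sigma(n) == sigma_loop(n));
}
```

One more thing that can make the program appear to stop after the input: the trailing space in scanf( " %d ", &n) tells scanf to skip whitespace after the number, including the newline, and keep waiting until a non-whitespace character arrives. scanf("%d", &n) avoids that.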
