Signal 11 (Segmentation fault) on MPI_Test

I am getting a segmentation fault on MPI_Test in the following code. The code involves a sender process and a receiver process. The sender uses non-blocking sends to send an integer value to the receiver 1000000 times. This is just test code. The segmentation fault occurs on MPI_Test in the sender process, and I cannot figure out why.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]){
MPI_Init(&argc,&argv);
int rank,nodes;
int i,j;
MPI_Status stat;
int size,wait;
int msgs = atoi(argv[1]);
MPI_Comm_size(MPI_COMM_WORLD, &nodes);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Request req[msgs][nodes-1],req1;
if(rank==0){
size=2;
for(i=0;i<msgs;i++){
for(j=1;j<nodes;j++){
MPI_Isend(&size,1,MPI_INT,
j,0,MPI_COMM_WORLD,&(req[i][j-1]));
}
}
wait = 1;
i=0,j=0;
while(wait){
printf("at i=%d j=%d\n",i,j);
MPI_Test(&req[i][j], &wait, &stat);
wait = 1-wait;
if(!wait){
j++;
if(j==nodes-1){
j=0;
i++;
wait=1;
}
else{
wait=1;
}
if(i==msgs){
wait=0;
}
}
}
printf("Finished\n");
}
else{
for(i=0;i<msgs;i++){
MPI_Irecv (&size,1,MPI_INT,0,0,MPI_COMM_WORLD,&req1);
wait = 1;
while(wait){
MPI_Test(&req1, &wait, &stat);
wait = 1-wait;
}
if(size!=2){
printf("Received size=%d rank=%d\n",size,rank);
}
size=0;
}
printf("Finished rank=%d\n",rank);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}
The output of the above program is:
Finished
OK
OK
OK
OK
OK
after which it fails with a segmentation fault:
mpirun noticed that process rank 0 with PID 4576 exited on signal 11 (Segmentation fault).

Related

Catch signals SIGUSR1 or SIGTERM with signalfd on MPI applications?

A simple MPI program that tries to catch the SIGUSR1 and SIGTERM signals with the signalfd system call:
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>
#include <mpi.h>
/*Simple MPI Code*/
int main(int argc, char* argv[]){
MPI_Init(&argc, &argv);
int err, nbProcs, rank;
sigset_t sigset;
int fd;
MPI_Comm_size(MPI_COMM_WORLD, &nbProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* Set up SIGTERM and SIGUSR1 to be delivered via signalfd */
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGUSR1);
/* */
if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0) {
printf("Failed to signalmask\n");
return -1;
}
fd = signalfd(-1, &mask, 0);
if (fd < 0)
return -1;
while (1) {
struct signalfd_siginfo si;
int ret;
// here the MPI process doesn't return when I send SIGTERM or SIGUSR1
ret = read(fd, &si, sizeof(si));
if (ret < 0)
return -1;
if (ret != sizeof(si))
return -1;
if (si.ssi_signo == SIGTERM)
printf("receive SIGTERM\n");
else if (si.ssi_signo == SIGUSR1)
printf("receive SIGUSR1\n");
}
MPI_Finalize();
return 0;
}
In this code, I am using signalfd to catch SIGUSR1 and SIGTERM. At the call
ret = read(fd, &si, sizeof(si));
the read() system call does not return, and the program dies at this point. When I use the same code without MPI, as plain C, it works perfectly. Can anyone explain how signalfd works inside an MPI program?

Does MPI_Irecv complete on MPI_Ssend, or is MPI_Wait (or another variant) still needed?

I understand that calls to MPI_Irecv need to be paired with an MPI_Wait (or MPI_Test etc.) to complete. However, with the code below, the MPI_Ssend seems to be forcing the Irecv to complete.
#include <cstdlib>
#include <iostream>
#include "mpi.h"
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
int rank = 0;
int size = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (size != 2)
MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
int send_buffer_0 = 0;
int send_buffer_1 = 1;
int recv_buffer_0 = -1;
int recv_buffer_1 = -1;
MPI_Request request;
if (rank == 0) {
MPI_Irecv(&recv_buffer_0, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
MPI_Ssend(&send_buffer_0, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else {
MPI_Irecv(&recv_buffer_1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
MPI_Ssend(&send_buffer_1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
// MPI_Wait(&request, MPI_STATUS_IGNORE);
if (rank == 1)
std::cout << "Rank " << rank << " recv'd " << recv_buffer_1 << std::endl;
MPI_Finalize();
return 0;
}
If I change the sends to MPI_Send, then the MPI_Wait is needed for the message to be received; I suspect the MPI implementation is buffering the send.
Is the MPI_Ssend call forcing the Irecv to complete, so that a wait or test is not needed? Is this implementation-specific or expected from the standard?
Here is what the standard says about the synchronous mode:
A send that uses the synchronous mode can be started whether or not a matching receive was posted. However, the send will complete successfully only if a matching receive is posted, and the receive operation has started to receive the message sent by the synchronous send. Thus, the completion of a synchronous send not only indicates that the send buffer can be reused, but it also indicates that the receiver has reached a certain point in its execution, namely that it has started executing the matching receive. If both sends and receives are blocking operations then the use of the synchronous mode provides synchronous communication semantics: a communication does not complete at either end before both processes rendezvous at the communication.
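Nothing here implies that you can skip the wait in the code you provided: completion of the synchronous send only tells the sender that the matching receive has started, not that the receive request has completed.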

MPI_ERR_TRUNCATE: message truncated

I am getting an MPI_ERR_TRUNCATE: message truncated error in the following code. This is test code in which the receiver process receives two messages from the sender. The first message carries the number of integers that the second message will contain; the second message carries those integers.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char* argv[]){
MPI_Init(&argc,&argv);
int rank,nodes;
int i,j;
MPI_Status stat;
int size,wait;
int msgs = atoi(argv[1]);
MPI_Comm_size(MPI_COMM_WORLD, &nodes);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Request req1;
MPI_Request*** req = (MPI_Request***)malloc(sizeof(MPI_Request**)*msgs);
for(i=0;i<msgs;i++){
req[i] = (MPI_Request**)malloc(sizeof(MPI_Request*)*(nodes-1));
for(j=0;j<nodes-1;j++){
req[i][j] = (MPI_Request*)malloc(sizeof(MPI_Request)*2);
}
}
if(rank==0){
int sent=0;
int data[10];
int pendingThreshold=100;
int sentThreshold=1000;
int completed=0;
size=2;
time_t t;
srand((unsigned) time(&t));
for(i=0;i<msgs;i++){
for(j=1;j<nodes;j++){
size=rand()%9+1;
printf("Sending size = %d at i=%d\n",size,i);
MPI_Isend(&size,1,MPI_INT,
j,0,MPI_COMM_WORLD,&(req[i][j-1][0]));
MPI_Isend (&data[0],size,
MPI_INT,j,1,MPI_COMM_WORLD,&(req[i][j-1][1]));
//Code for ensuring number of non blocking operations
//do not exceed a certain threshold
if(sent==sentThreshold){
while(sent>pendingThreshold){
int k=0;
wait=1;
while(wait){
MPI_Test(&req[completed][k][0], &wait, &stat);
wait = 1-wait;
if(!wait){
MPI_Test(&req[completed][k][1], &wait, &stat);
wait = 1-wait;
if(!wait){
k++;
if(k==nodes-1){
completed++;
sent--;
}
else{
wait=1;
}
}
}
}
}
}
}
sent++;
}
//Code for ensuring all non blocking operations are complete
wait = 1;
printf("Finished\n");
i=completed,j=0;
while(wait){
MPI_Test(&req[i][j][0], &wait, &stat);
wait = 1-wait;
if(!wait){
MPI_Test(&req[i][j][1], &wait, &stat);
wait = 1-wait;
if(!wait){
j++;
if(j==nodes-1){
j=0;
i++;
wait=1;
}
else{
wait=1;
}
if(i==msgs){
wait=0;
}
}
}
}
printf("Finished\n");
}
else{
int data[10];
MPI_Request req2;
for(i=0;i<msgs;i++){
MPI_Irecv (&size,1,MPI_INT,0,0,MPI_COMM_WORLD,&req1);
wait = 1;
while(wait){
MPI_Test(&req1, &wait, &stat);
wait = 1-wait;
}
wait = 1;
while(wait && i){
MPI_Test(&req2, &wait, &stat);
wait = 1-wait;
}
printf("Receiving size=%d at i=%d\n",size,i);
MPI_Irecv (&data[0],size,MPI_INT,0,1,MPI_COMM_WORLD,&req2);
size=0;
}
printf("Finished rank=%d\n",rank);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}
This program fails with MPI_ERR_TRUNCATE after many successful send-receive pairs. The error occurs when the receiver gets a wrong value for the number of integers to expect in its second message. For example:
Sending size = 8 at i=1496
Sending size = 2 at i=1497
Sending size = 7 at i=1498
Sending size = 5 at i=1499
Sending size = 5 at i=1500
Sending size = 5 at i=1501
Sending size = 4 at i=1502
Sending size = 9 at i=1503
Sending size = 4 at i=1504
Receiving size=8 at i=1496
Receiving size=2 at i=1497
Receiving size=7 at i=1498
Receiving size=6 at i=1499
Receiving size=6 at i=1500
Receiving size=6 at i=1501
Receiving size=6 at i=1502
Receiving size=6 at i=1503
The error occurs at message number 1499, and from then on the receiver keeps getting the same size for every further message. I ran the code with 100000 messages to be sent to the receiver process.
The following code uses MPI_Iprobe instead and works fine even for 1000000 messages.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char* argv[]){
MPI_Init(&argc,&argv);
int rank,nodes;
int i,j;
MPI_Status stat;
int size,wait;
int msgs = atoi(argv[1]);
MPI_Comm_size(MPI_COMM_WORLD, &nodes);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Request req1;
MPI_Request** req = (MPI_Request**)malloc(sizeof(MPI_Request*)*msgs);
for(i=0;i<msgs;i++){
req[i] = (MPI_Request*)malloc(sizeof(MPI_Request)*(nodes-1));
}
if(rank==0){
int sent=0;
int data[10];
int pendingThreshold=1000;
int sentThreshold=10000;
int completed=0;
size=2;
time_t t;
srand((unsigned) time(&t));
for(i=0;i<msgs;i++){
for(j=1;j<nodes;j++){
size=rand()%9+1;
printf("Sending size = %d at i=%d\n",size,i);
MPI_Isend (&data[0],size,
MPI_INT,j,1,MPI_COMM_WORLD,&(req[i][j-1]));
if(sent==sentThreshold){
while(sent>pendingThreshold){
int k=0;
wait=1;
while(wait){
MPI_Test(&req[completed][k], &wait, &stat);
wait = 1-wait;
if(!wait){
k++;
if(k==nodes-1){
completed++;
sent--;
}
else{
wait=1;
}
}
}
}
}
}
sent++;
}
wait = 1;
printf("Finished\n");
i=completed,j=0;
while(wait){
MPI_Test(&req[i][j], &wait, &stat);
wait = 1-wait;
if(!wait){
j++;
if(j==nodes-1){
j=0;
i++;
wait=1;
}
else{
wait=1;
}
if(i==msgs){
wait=0;
}
}
}
printf("Finished\n");
}
else{
int data[10];
MPI_Request req2;
for(i=0;i<msgs;i++){
wait = 1;
while(wait){
MPI_Iprobe(0,1,MPI_COMM_WORLD,&wait,&stat);
wait = 1-wait;
}
MPI_Get_count(&stat,MPI_INT,&size);
printf("Receiving size=%d at i=%d\n",size,i);
MPI_Irecv (&data[0],size,MPI_INT,0,1,MPI_COMM_WORLD,&req2);
size=0;
}
printf("Finished rank=%d\n",rank);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}

How to use and interpret MPI-IO Error codes?

#include <stdio.h>
#include <iostream>
#include <Windows.h>
#include <C:\Program Files\Microsoft MPI\Inc\mpi.h>
using namespace std;
#define BUFSIZE 128
int main (int argc, char *argv[])
{
int err;
int rank;
int size;
double start_time = 0.0;
double end_time;
MPI_Comm comm = MPI_COMM_WORLD;
MPI_File file;
char cbuf[BUFSIZE];
for(int i = 0; i < BUFSIZE; i++)
{
cbuf[i] = 'a' + i;
}
if(err = MPI_Init(&argc, &argv))
{
printf("%s \n", "Error! MPI is halted!");
MPI_Abort(comm, err);
}
MPI_Comm_size(comm, &size);
MPI_Comm_rank(comm, &rank);
if(rank == 0)
{
start_time = MPI_Wtime();
}
err = MPI_File_open(comm, "testfile", MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &file);
if(err != MPI_SUCCESS)
{
printf("Error %d! Can't open the file!\n", err);
MPI_Abort(comm, err);
return EXIT_FAILURE;
}
err = MPI_File_set_view(file, (MPI_Offset) (rank * BUFSIZE * sizeof(char)), MPI_CHAR, MPI_CHAR, "native", MPI_INFO_NULL);
if(err != MPI_SUCCESS)
{
printf("%s \n", "Error! Can't set the view!");
MPI_Abort(comm, err);
return EXIT_FAILURE;
}
err = MPI_File_write(file, cbuf, BUFSIZE, MPI_CHAR, MPI_STATUS_IGNORE);
if(err != MPI_SUCCESS)
{
printf("%s \n", "Error! Problems with writing!");
MPI_Abort(comm, err);
return EXIT_FAILURE;
}
MPI_File_close(&file);
if(rank == 0)
{
end_time = MPI_Wtime();
printf("Time elapsed : %f ms\n", (end_time - start_time) * 1000);
}
MPI_Finalize();
return EXIT_SUCCESS;
}
I'm trying to write some characters to a file with MPI-IO. When I run the program, MPI_File_open returns error code 288 and the file can't be opened. I used the command line mpiexec -n 10 myapp.exe. I searched for the error code but didn't find anything.
Go one step further. Your error code doesn't mean anything by itself, but you can feed it to MPI_Error_string and get something more human-readable. I have this function in every MPI-IO code I write:
static void handle_error(int errcode, char *str)
{
char msg[MPI_MAX_ERROR_STRING];
int resultlen;
MPI_Error_string(errcode, msg, &resultlen);
fprintf(stderr, "%s: %s\n", str, msg);
MPI_Abort(MPI_COMM_WORLD, 1);
}
And then define this macro:
#define MPI_CHECK(fn) { int errcode; errcode = (fn);\
if (errcode != MPI_SUCCESS) handle_error (errcode, #fn ); }
So I can call routines like this:
MPI_CHECK(MPI_File_open(comm, "testfile",
MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &file) );

Raspberry Pi UART program in C using termios receives garbage (Rx and Tx are connected directly)

I have a simple program written in C which uses termios to send a basic string to the Raspberry Pi UART and attempts to read and output the response. The Rx and Tx pins on the Raspberry Pi are connected with a jumper so whatever is sent should be immediately received.
Despite the program outputting that it successfully sent and received 5 characters for the chosen string ('Hello'), trying to print the contents of the buffer just produces one or two garbage characters.
The program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <termios.h>
int main(int argc, char* argv[]) {
struct termios serial;
char* str = "Hello";
char buffer[10];
if (argc == 1) {
printf("Usage: %s [device]\n\n", argv[0]);
return -1;
}
printf("Opening %s\n", argv[1]);
int fd = open(argv[1], O_RDWR | O_NOCTTY | O_NDELAY);
if (fd == -1) {
perror(argv[1]);
return -1;
}
if (tcgetattr(fd, &serial) < 0) {
perror("Getting configuration");
return -1;
}
// Set up Serial Configuration
serial.c_iflag = 0;
serial.c_oflag = 0;
serial.c_lflag = 0;
serial.c_cflag = 0;
serial.c_cc[VMIN] = 0;
serial.c_cc[VTIME] = 0;
serial.c_cflag = B115200 | CS8 | CREAD;
tcsetattr(fd, TCSANOW, &serial); // Apply configuration
// Attempt to send and receive
printf("Sending: %s\n", str);
int wcount = write(fd, &str, strlen(str));
if (wcount < 0) {
perror("Write");
return -1;
}
else {
printf("Sent %d characters\n", wcount);
}
int rcount = read(fd, &buffer, sizeof(buffer));
if (rcount < 0) {
perror("Read");
return -1;
}
else {
printf("Received %d characters\n", rcount);
}
buffer[rcount] = '\0';
printf("Received: %s\n", buffer);
close(fd);
}
Outputs:
Opening /dev/ttyAMA0
Sending: Hello
Sent 5 characters
Received 5 characters
Received: [garbage]
I can't see any major problem with the code myself, but I might be wrong. I can successfully send and receive characters using PuTTY connected with the same settings, so it can't really be a hardware problem. Although I haven't tried it in PuTTY, trying to connect with anything less than 115200 baud with this program will result in nothing being received.
Where am I going wrong?
int wcount = write(fd, &str, strlen(str));
int rcount = read(fd, &buffer, sizeof(buffer));
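In these lines, str is already a pointer, so &str passes the address of the pointer variable itself, and write() sends the bytes of the pointer rather than the string. buffer is an array, so &buffer yields the same address as buffer, but passing buffer directly is the idiomatic form.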
The lines should be:
int wcount = write(fd, str, strlen(str));
int rcount = read(fd, buffer, sizeof(buffer));
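The fix can be checked without a UART. The sketch below is a stand-in under stated assumptions: it uses a pipe in place of the serial device, and the helper name loopback is hypothetical. It also reads at most cap - 1 bytes, because the original buffer[rcount] = '\0' would write past the end of the array if read() filled all 10 bytes.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes `msg` into a pipe and reads it back into
 * `out`, which has room for `cap` bytes (cap must be at least 1).
 * Returns the number of bytes read, or -1 on failure. */
ssize_t loopback(const char *msg, char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* Pass the pointer itself: `msg` already points at the characters.
     * Passing `&msg` would send the bytes of the pointer variable. */
    ssize_t written = write(fds[1], msg, strlen(msg));
    close(fds[1]);
    if (written < 0) {
        close(fds[0]);
        return -1;
    }

    /* Read at most cap - 1 bytes so the terminator always fits. */
    ssize_t got = read(fds[0], out, cap - 1);
    close(fds[0]);
    if (got < 0)
        return -1;

    out[got] = '\0';
    return got;
}
```

Called with a 10-byte buffer and the string "Hello", this should return 5 and leave "Hello" in the buffer.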
