I've got a very strange bug while using MPI: a successfully created communicator can't be freed. The attempt to free it results in a FATAL ERROR on all nodes except the ones included in the communicator's group. A minimal working example is below. What could be the reason for such strange behaviour?
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Group group_world;                   // group of MPI_COMM_WORLD
    MPI_Group group_new;                     // new group
    MPI_Comm comm_new;                       // new communicator
    int group_new_ranks[3] = {10, 20, 30};   // new communicator's ranks

    MPI_Init(&argc, &argv);
    MPI_Comm_group(MPI_COMM_WORLD, &group_world);                // get group_world - MPI_SUCCESS for all nodes
    MPI_Group_incl(group_world, 3, group_new_ranks, &group_new); // get new group - MPI_SUCCESS for all nodes
    MPI_Comm_create(MPI_COMM_WORLD, group_new, &comm_new);       // create new communicator - MPI_SUCCESS for all nodes
    MPI_Comm_free(&comm_new);                // FATAL ERROR for all nodes except 10, 20, 30
    MPI_Group_free(&group_new);
    MPI_Group_free(&group_world);
    MPI_Finalize();
    return 0;
}
MPI_Comm_create() returns MPI_COMM_NULL to all processes not within the group. You're passing MPI_COMM_NULL to MPI_Comm_free(), which is not allowed.
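Since MPI_Comm_free() is only valid on a real communicator, a minimal fix (a sketch, keeping the rest of your program as-is) is to guard the call on the ranks that got MPI_COMM_NULL:

    // Only ranks inside the group (10, 20, 30) get a real communicator;
    // everyone else gets MPI_COMM_NULL, which must not be freed.
    if (comm_new != MPI_COMM_NULL)
        MPI_Comm_free(&comm_new);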
I think I figured out why it's not working: it was the line with libusb_device_handle**. I am trying to create multiple handles. Is this correct, or am I only allowed one handle? Is it because I am using pointers wrong? Do I need to create multiple contexts for separate devices?
Using libusb-1.0, I am trying to get the device descriptor for a USB device. The code compiles and no error messages are displayed, but it ends in a segmentation fault. I have narrowed it down to a cause: opening a libusb_device and declaring a libusb_device_descriptor variable causes the seg fault.
#include <libusb.h>

int main(int argc, char const *argv[])
{
    libusb_context* context;
    libusb_device** devices;
    libusb_device_handle** handles; // uninitialized pointer-to-pointer; handles[i] below points nowhere
    libusb_device_descriptor descriptor;

    libusb_init(&context);
    libusb_set_debug(context, LIBUSB_LOG_LEVEL_WARNING);
    int count = libusb_get_device_list(context, &devices);
    for (int i = 0; i < count; ++i)
    {
        libusb_open(devices[i], &handles[i]); // writes through the uninitialized 'handles'
        libusb_close(handles[i]);
    }
    libusb_free_device_list(devices, 1);
    libusb_exit(context);
    return 0;
}
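For what it's worth, here is a sketch of a fixed loop, assuming the diagnosis above is right. libusb_open() allocates the handle itself, so a single libusb_device_handle* whose address is passed to libusb_open() is enough, and one context can serve multiple devices:

    // One pointer is enough: libusb_open() fills it in on success.
    libusb_device_handle* handle = NULL;
    for (int i = 0; i < count; ++i)
    {
        if (libusb_open(devices[i], &handle) == 0) // 0 means success
        {
            libusb_close(handle);
            handle = NULL;
        }
    }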
I was trying to use a range-based for loop [modern C++ style], but the code crashed each time!
It was something like:
for (auto &k : mimeData->formats())
{ ... }
And later, to my surprise, I found that the QStringList returned by formats() is either invalid or a completely separate object, though the internal data ought to be the same!
So I tried to figure it out with a simpler example:
#include <iostream>
#include <string>
#include <list>
#include <QCoreApplication>
#include <QMimeData>

using namespace std;

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    cout << boolalpha;

    list<string> ls;
    cout << (ls.begin() != ls.end()) << "\n";

    QStringList qsl;
    cout << (qsl.begin() != qsl.end()) << "\n";

    QMimeData qme;
    cout << (qme.formats().begin() != qme.formats().end()) << "\n";
    cout << "OMG! is it empty? -> " << qme.formats().empty() << "\n";
    return a.exec();
}
The output is something like:
false
false
true
OMG! is it empty? -> true
Unless I bind the result to an rvalue reference, I can't tell what is happening internally!
I really need a solution that works with range-based for loops, not Qt's foreach!
P.S. I don't want to copy it, to avoid O(n).
Looking at the docs, there's no guarantee that the QMimeData class keeps the QStringList of supported formats (http://doc.qt.io/qt-4.8/qmimedata.html#formats) as a field.
The source code supports that (Qt5.4/Src/qtbase/src/corelib/kernel/qmimedata.cpp:593):
QStringList QMimeData::formats() const
{
    Q_D(const QMimeData);
    QStringList list;
    for (int i = 0; i < d->dataList.size(); i++)
        list += d->dataList.at(i).format;
    return list;
}
Therefore the list is constructed anew on every call to formats(); further calls will always yield a separate container.
Since you do need the list to persist while you traverse it, I'd recommend keeping a local copy. Do note that C++11 allows it to be move-constructed (or, in fact, optimized even better).
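A sketch of what that looks like (mimeData here stands for your QMimeData pointer):

    // One local copy; the loop then iterates over a stable container
    // instead of a fresh temporary from each formats() call.
    const QStringList formats = mimeData->formats();
    for (const QString &format : formats)
    {
        // ... use format ...
    }

The copy is constructed once (and may be moved or elided), so the traversal itself stays O(n) overall.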
It was my understanding that MPI communicators restrict the scope of communication, such that messages sent on one communicator should never be received on a different one. However, the program inlined below appears to contradict this.
I understand that the MPI_Send call returns before a matching receive is posted because of the internal buffering it does under the hood (as opposed to MPI_Ssend). I also understand that MPI_Comm_free doesn't destroy the communicator right away, but merely marks it for deallocation and waits for any pending operations to finish. I suppose that my unmatched send operation will be forever pending, but then I wonder how come the same object (integer value) is reused for the second communicator!?
Is this normal behaviour, a bug in the MPI library implementation, or is my program just incorrect?
Any suggestions are much appreciated!
LATER EDIT: posted follow-up question
#include "stdio.h"
#include "unistd.h"
#include "mpi.h"
int main(int argc, char* argv[]) {
int rank, size;
MPI_Group group;
MPI_Comm my_comm;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_group(MPI_COMM_WORLD, &group);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 1) {
int msg = 123;
MPI_Send(&msg, 1, MPI_INT, 0, 0, my_comm);
printf("rank 1: message sent\n");
}
sleep(1);
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
sleep(2);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 0) {
int msg;
MPI_Recv(&msg, 1, MPI_INT, 1, 0, my_comm, MPI_STATUS_IGNORE);
printf("rank 0: message received\n");
}
sleep(1);
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
MPI_Finalize();
return 0;
}
outputs:
created communicator -2080374784
rank 1: message sent
freeing communicator -2080374784
created communicator -2080374784
rank 0: message received
freeing communicator -2080374784
The number you're seeing is simply a handle for the communicator. It's safe to reuse the handle since you've freed it.
As to why you're able to send the message, look at how you're creating the communicator. When you use MPI_Comm_group, you get a group containing the ranks associated with the specified communicator. In this case you get all of the ranks, since you are getting the group for MPI_COMM_WORLD. Then you use MPI_Comm_create to create a communicator based on a group of ranks. You are using the same group you just got, which contains all of the ranks, so your new communicator has all of the ranks from MPI_COMM_WORLD.
If you want your communicator to contain only a subset of ranks, you'll need to use a different function (or multiple functions) to make the desired group(s). I'd recommend reading through Chapter 6 of the MPI Standard; it includes all of the functions you'll need. Pick what you need to build the communicator you want.
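For example, here is a sketch (assuming, purely for illustration, that you want a communicator containing only the even-numbered world ranks) using MPI_Group_incl:

    // Build a group of the even world ranks, then create a communicator
    // from it; ranks outside the group get MPI_COMM_NULL back.
    MPI_Group world_group, even_group;
    MPI_Comm even_comm;
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    int n_even = (world_size + 1) / 2;
    int *even_ranks = (int *)malloc(n_even * sizeof(int));
    for (int i = 0; i < n_even; i++)
        even_ranks[i] = 2 * i;

    MPI_Group_incl(world_group, n_even, even_ranks, &even_group);
    MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);
    free(even_ranks);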
I have this simple C code, which runs fine:
#include <stdio.h>
#include <stdlib.h>

struct node {
    int x;
    struct node* next;
};

int main(int argc, char *argv[])
{
    int a, b, c;
    struct node *root, *node1, *node2, *leaf;

    root = malloc(sizeof(struct node));
    //root = NULL;
    root->next = 0;
    root->x = 12;

    system("PAUSE");
    return 0;
}
Now if I assign root = NULL or 0 instead of using malloc, it gives me this error:
"An access violation (Segmentation fault) raised in your program"
Yet on the next line I can assign 0 to the next pointer. Please explain.
On this forum I read that a segmentation fault occurs when we try to access protected memory, while the operating system describes a page fault as the required data not being found in memory. Are they related (the segmentation fault of C/C++ and the page fault of the operating system)?
If you set root = NULL;, it will do exactly that, and will not generate a segmentation fault.
The segfault happens on the next line, when you try to dereference the NULL pointer.
If you comment out the two lines below it, you will observe no segfault.
You cannot and should not dereference a NULL pointer.
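In other words, a minimal sketch of the difference:

    struct node *root = NULL; /* fine: just stores a null pointer value */
    root->next = 0;           /* crash: must dereference root to reach its 'next' member */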
My first thought was that MPI_Scatter and the send-buffer allocation should go inside an if(proc_id == 0) clause, because the data should be scattered only once and each process needs only a portion of the data in the send buffer; however, it didn't work correctly.
It appears that the send-buffer allocation and MPI_Scatter must be executed by all processes before the application works correctly.
So I wonder: what's the philosophy behind the existence of MPI_Scatter, if all processes have access to the send buffer?
Any help would be appreciated.
Edit:
The code I wrote is like this:
if (proc_id == 0) {
    int *data = (int *)malloc(size * sizeof(int) * proc_size * recv_size);
    for (int i = 0; i < proc_size * recv_size; i++) data[i] = i;
    ierr = MPI_Scatter(&(data[0]), recv_size, MPI_INT,
                       &recievedata, recv_size, MPI_INT, 0, MPI_COMM_WORLD);
}
I thought it would be enough for the root process to scatter the data, and that all the other processes need to do is receive it. So I put MPI_Scatter, along with the send-buffer definition and allocation, inside the if(proc_id == 0) statement. There were no compile-time or runtime errors or warnings, but the receive buffers of the other processes didn't get their corresponding parts of the data.
Your question isn't very clear, and would be a lot easier to understand if you showed some code that you were having trouble with. Here's what I think you're asking -- and I'm only guessing because this is an error I've seen people new to MPI in C make.
If you have some code like this:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int proc_id, size, ierr;
    int *data;
    int recievedata;

    ierr = MPI_Init(&argc, &argv);
    ierr |= MPI_Comm_size(MPI_COMM_WORLD, &size);
    ierr |= MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);

    if (proc_id == 0) {
        data = (int *)malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) data[i] = i;
    }

    ierr = MPI_Scatter(&(data[0]), 1, MPI_INT,
                       &recievedata, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d recieved <%d>\n", proc_id, recievedata);

    if (proc_id == 0) free(data);

    ierr = MPI_Finalize();
    return 0;
}
why doesn't it work, and why do you get a segmentation fault? Of course the other processes don't have access to data; that's the whole point.
The answer is that on the non-root processes, the sendbuf argument (the first argument to MPI_Scatter()) isn't used, so the non-root processes don't need access to data. But you still can't go around dereferencing a pointer that you haven't defined; you need to make sure all the C code is valid. data can be NULL or completely undefined on all the other processes, as long as you're not accidentally dereferencing it. So this works just fine, for instance:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int proc_id, size, ierr;
    int *data;
    int recievedata;

    ierr = MPI_Init(&argc, &argv);
    ierr |= MPI_Comm_size(MPI_COMM_WORLD, &size);
    ierr |= MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);

    if (proc_id == 0) {
        data = (int *)malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) data[i] = i;
    } else {
        data = NULL;   /* never dereferenced on non-root ranks */
    }

    ierr = MPI_Scatter(data, 1, MPI_INT,
                       &recievedata, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d recieved <%d>\n", proc_id, recievedata);

    if (proc_id == 0) free(data);

    ierr = MPI_Finalize();
    return 0;
}
If you're using "multidimensional arrays" in C, and say scattering a row of a matrix, then you have to jump through an extra hoop or two to make this work, but it's still pretty easy.
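For instance, here is a sketch (with a hypothetical column count NCOLS, reusing proc_id and size from above) that scatters one row per rank by allocating the matrix as a single contiguous block:

    #define NCOLS 4                 /* hypothetical row width */
    int *matrix = NULL;
    int row[NCOLS];
    if (proc_id == 0) {
        /* one contiguous allocation, so row i starts at matrix[i*NCOLS] */
        matrix = (int *)malloc(size * NCOLS * sizeof(int));
        for (int i = 0; i < size * NCOLS; i++) matrix[i] = i;
    }
    MPI_Scatter(matrix, NCOLS, MPI_INT, row, NCOLS, MPI_INT, 0, MPI_COMM_WORLD);
    if (proc_id == 0) free(matrix);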
Update:
Note that in the above code, all ranks called Scatter - both the sender and the receivers. (Actually, the sender is also a receiver.)
In the message-passing paradigm, both the sender and the receiver have to cooperate to send data. In principle, these tasks could be on different computers, housed perhaps in different buildings -- nothing is shared between them. So there's no way for task 1 to just "put" data into some part of task 2's memory. (Note that MPI-2 has "one-sided messages", but even that requires a significant degree of coordination between sender and receiver, as a window has to be set aside to push data into or pull data out of.)
The classic example of this is send/receive pairs: it's not enough that (say) process 0 sends data to process 3; process 3 also has to receive it.
The MPI_Scatter function contains both the send and the receive logic. The root process (specified here as 0) sends out the data, and all the receivers receive it; everyone participating has to call the routine. Scatter is an example of an MPI collective operation, where all tasks in the communicator have to call the same routine. Broadcast, barrier, reductions, and gathers are other examples.
If you have only process 0 call the scatter operation, your program will hang, waiting forever for the other tasks to participate.