MPI datatype for structure of arrays

In an MPI application I have a distributed array of floats and two "parallel" arrays of integers: for each float value there are two associated integers that describe the corresponding value. For the sake of cache-efficiency I want to treat them as three different arrays, i.e. as a structure of arrays, rather than an array of structures.
Now, I have to gather all these values into the first node. I can do this in just one communication instruction, by defining an MPI type, corresponding to a structure, with one float and two integers. But this would force me to use the array of structures pattern instead of the structure of arrays one.
So, I can choose between:
Performing three different communications, one for each array and keep the efficient structure of arrays arrangement
Defining an MPI type, perform a single communication, and deal with the resulting array of structures by adjusting my algorithm or rearranging the data
Do you know a third option that would allow me do have the best of both worlds, i.e. having a single communication and keeping the cache-efficient configuration?

You can take a look at Packing and Unpacking:
http://www.mpi-forum.org/docs/mpi-11-html/node62.html
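For illustration, here is a minimal sketch of how the packing side might look with three separate arrays of length N; the helper name send_packed is made up, and the receiver would MPI_Recv into a char buffer of the same size and call MPI_Unpack in the same order.
#include <mpi.h>
#include <vector>

// Hypothetical helper: pack three parallel arrays into one buffer and send it
// as a single MPI_PACKED message.
void send_packed(float* f, int* m, int* n, int N, int dest, MPI_Comm comm) {
    int size_f, size_i;
    MPI_Pack_size(N, MPI_FLOAT, comm, &size_f);  // upper bound for the float block
    MPI_Pack_size(N, MPI_INT, comm, &size_i);    // upper bound for one int block

    const int bufsize = size_f + 2 * size_i;
    std::vector<char> buf(bufsize);

    int pos = 0;  // MPI_Pack advances this cursor through the buffer
    MPI_Pack(f, N, MPI_FLOAT, &buf[0], bufsize, &pos, comm);
    MPI_Pack(m, N, MPI_INT,   &buf[0], bufsize, &pos, comm);
    MPI_Pack(n, N, MPI_INT,   &buf[0], bufsize, &pos, comm);

    MPI_Send(&buf[0], pos, MPI_PACKED, dest, 0, comm);
}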
However, if you want to pass the same "structure" often, you should define your own MPI derived datatype.

E.g. by using the *array_of_blocklengths* parameter of MPI_Type_create_struct:
// #file mpi_compound.cpp
#include <iterator>
#include <cstdlib> // for rng
#include <ctime>   // for rng inits
#include <iostream>
#include <algorithm>

#include <mpi.h>

const std::size_t N = 10;

struct Asset {
    float f[N];
    int m[N], n[N];

    void randomize() {
        srand(time(NULL));
        srand48(time(NULL));
        std::generate(&f[0], &f[0] + N, drand48);
        std::generate(&n[0], &n[0] + N, rand);
        std::generate(&m[0], &m[0] + N, rand);
    }
};

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank, comm_size;
    MPI_Status stat;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    Asset a;

    MPI_Datatype types[3] = { MPI_FLOAT, MPI_INT, MPI_INT };
    int bls[3] = { N, N, N };
    MPI_Aint disps[3];
    disps[0] = 0;
    disps[1] = int(&(a.m[0]) - (int*)&a) * sizeof(int);
    disps[2] = int(&(a.n[0]) - (int*)&a) * sizeof(int);

    MPI_Datatype MPI_USER_ASSET;
    MPI_Type_create_struct(3, bls, disps, types, &MPI_USER_ASSET);
    MPI_Type_commit(&MPI_USER_ASSET);

    if(rank == 0) {
        a.randomize();
        std::copy(&a.f[0], &a.f[0] + N, std::ostream_iterator<float>(std::cout, " "));
        std::cout << std::endl;
        std::copy(&a.m[0], &a.m[0] + N, std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
        std::copy(&a.n[0], &a.n[0] + N, std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
        MPI_Send(&a.f[0], 1, MPI_USER_ASSET, 1, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&a.f[0], 1, MPI_USER_ASSET, 0, 0, MPI_COMM_WORLD, &stat);
        std::cout << "\t=> ";
        std::copy(&a.f[0], &a.f[0] + N, std::ostream_iterator<float>(std::cout, " "));
        std::cout << std::endl << "\t=> ";
        std::copy(&a.m[0], &a.m[0] + N, std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl << "\t=> ";
        std::copy(&a.n[0], &a.n[0] + N, std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
    }

    MPI_Type_free(&MPI_USER_ASSET);
    MPI_Finalize();
    return 0;
}
This worked with
mpirun -n 2 ./mpi_compound
using mpich2 v1.5 (HYDRA) on x86_64 Linux and a g++ 4.4.5-8 based mpic++.
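Note that the three arrays do not even have to live inside one struct: a struct datatype can also be built from the absolute addresses of three independently allocated arrays (obtained with MPI_Get_address) and sent in one call starting from MPI_BOTTOM, which keeps the structure-of-arrays layout exactly as the question asks. A minimal sketch, assuming each process holds arrays of length N and using a made-up helper name send_soa:
#include <mpi.h>
#include <vector>

// Sketch only: one message, three separate arrays (true structure of arrays).
void send_soa(std::vector<float>& f, std::vector<int>& m, std::vector<int>& n,
              int dest, MPI_Comm comm) {
    const int N = f.size();
    int bls[3] = { N, N, N };
    MPI_Datatype types[3] = { MPI_FLOAT, MPI_INT, MPI_INT };

    // absolute addresses of the three arrays become the displacements
    MPI_Aint disps[3];
    MPI_Get_address(&f[0], &disps[0]);
    MPI_Get_address(&m[0], &disps[1]);
    MPI_Get_address(&n[0], &disps[2]);

    MPI_Datatype soa_type;
    MPI_Type_create_struct(3, bls, disps, types, &soa_type);
    MPI_Type_commit(&soa_type);

    // MPI_BOTTOM + absolute displacements: the buffers stay where they are
    MPI_Send(MPI_BOTTOM, 1, soa_type, dest, 0, comm);

    MPI_Type_free(&soa_type);
}
The receiver builds the same kind of type over its own three arrays and receives one element from MPI_BOTTOM. Because the type encodes absolute addresses, it has to be rebuilt whenever the arrays are reallocated.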

Related

MPI: Multiple Overlapping Communicators

I want to create MPI communicators linking the process with rank 0 to every other process. Suppose n is the total number of processes. Then the process with rank 0 is supposed to have n-1 communicators while each of the other processes has one communicator. Is this possible, and, if it is, why can I not use the program below to achieve this?
Compiling the code below with the mpic++ compiler produces no warnings or errors on my computer. But when I run the resulting program with 3 or more processes (mpiexec -n 3), it never terminates.
I'm likely misunderstanding the concept of communicators in MPI. Maybe someone can help me understand why the program below gets stuck, and what a better way to create those communicators would be? Thanks.
#include <iostream>
#include <vector>
#include <thread>
#include <chrono>

#include "mpi.h"

void FinalizeMPI();
void InitMPI(int argc, char** argv);

int main(int argc, char** argv) {
    InitMPI(argc, argv);

    int rank, comm_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    if(comm_size < 2) {
        FinalizeMPI();
        return 0;
    }

    MPI_Group GroupAll;
    MPI_Comm_group(MPI_COMM_WORLD, &GroupAll);

    if(rank == 0) {
        std::vector<MPI_Group> myGroups(comm_size - 1);
        std::vector<MPI_Comm> myComms(comm_size - 1);
        for(int k = 1; k < comm_size; ++k) {
            int ranks[2]{0, k};
            MPI_Group_incl(GroupAll, 2, ranks, &myGroups[k-1]);
            int err = MPI_Comm_create(MPI_COMM_WORLD, myGroups[k-1], &myComms[k-1]);
            std::cout << "Error: " << err << std::endl;
        }
    } else {
        MPI_Group myGroup;
        MPI_Comm myComm;
        int ranks[2]{0, rank};
        MPI_Group_incl(GroupAll, 2, ranks, &myGroup);
        int err = MPI_Comm_create(MPI_COMM_WORLD, myGroup, &myComm);
        std::cout << "Error: " << err << std::endl;
    }

    std::cout << "Communicators created: " << rank << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));

    FinalizeMPI();
    return 0;
}

void FinalizeMPI() {
    int flag;
    MPI_Finalized(&flag);
    if(!flag)
        MPI_Finalize();
}

void InitMPI(int argc, char** argv) {
    int flag;
    MPI_Initialized(&flag);
    if(!flag) {
        int provided_Support;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided_Support);
        if(provided_Support != MPI_THREAD_MULTIPLE) {
            exit(0);
        }
    }
}
MPI_Comm_create is a collective operation on the initial communicator (MPI_COMM_WORLD): you must call it on all processes. Since every process has to take part in every call, and rank 0 makes comm_size-1 calls while each other rank makes only one, the later calls never match up and the program hangs.
The simplest fix is to use MPI_Comm_create_group in exactly the places where you currently call MPI_Comm_create. It is similar to MPI_Comm_create, except that it is collective only over the group you pass to it.
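A minimal sketch of that change, assuming an MPI-3 implementation; MPI_Comm_create_group additionally takes a tag that must match on the participating ranks, and the helper name make_pair_comm is made up for illustration:
#include <mpi.h>

// Sketch: create a communicator containing only ranks 0 and k, collectively
// over just those two ranks (requires MPI-3's MPI_Comm_create_group).
MPI_Comm make_pair_comm(MPI_Group group_all, int k) {
    int ranks[2] = { 0, k };
    MPI_Group pair_group;
    MPI_Group_incl(group_all, 2, ranks, &pair_group);

    MPI_Comm pair_comm = MPI_COMM_NULL;
    const int tag = 0;  // must be the same on both participating ranks
    MPI_Comm_create_group(MPI_COMM_WORLD, pair_group, tag, &pair_comm);

    MPI_Group_free(&pair_group);
    return pair_comm;
}
Rank 0 would call this once per k inside its loop, and every other rank would call it once with its own rank as k. Because each call is collective only over the two ranks involved, the call counts no longer have to match across the whole communicator.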

CGAL Convex Hull, with Qt

I am attempting to convert an existing application from C# to C++/Qt. The existing code uses the MIConvexHull library to calculate the convex hull of a set of points in 3-dimensional space. It uses the Faces function to get a list of the faces, and then loops through them to get the individual vertices of each face. I want to do this with the CGAL library, but there doesn't seem to be an obvious way to do it. I can create the convex hull using the convex_hull_3 function, but from there it isn't obvious what to do.
I need to iterate through the facets of the resulting polyhedron object. For each facet, I need to iterate through the vertices. For each vertex, I need to extract the x, y and z coordinates, to form a QVector3D object.
Here is a code snippet of the existing C# code. In this case, baseContour is a list of 3D vertices.
var triangulationFaces = MIConvexHull.ConvexHull.Create(baseContour).Faces;
var triangulationPoints = new List<Point3D>();
var triangulationIndices = new List<int>();
int i = 0;
foreach (var f in triangulationFaces)
{
    var x = f.Vertices.Select(p => new Point3D(p.Position[0], p.Position[1], p.Position[2])).ToList();
    triangulationPoints.AddRange(x);
    triangulationIndices.Add(3 * i);
    triangulationIndices.Add(3 * i + 1);
    triangulationIndices.Add(3 * i + 2);
    i++;
}
I am at a loss for how to do this with the CGAL library. I have read quite a bit of the documentation, but it seems to assume you already have graduate-level knowledge of computational geometry, which I do not. Anything to point me in the right direction would be appreciated.
There is an example in the user manual.
I used it to do what you want:
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Polyhedron_3.h>
#include <CGAL/boost/graph/graph_traits_Polyhedron_3.h>
#include <CGAL/Unique_hash_map.h>
#include <CGAL/convex_hull_3.h>

#include <vector>
#include <fstream>
#include <iostream>

#include <boost/foreach.hpp>

typedef CGAL::Exact_predicates_inexact_constructions_kernel   K;
typedef CGAL::Polyhedron_3<K>                                  Polyhedron_3;
typedef K::Point_3                                             Point_3;
typedef boost::graph_traits<Polyhedron_3>::vertex_descriptor  vertex_descriptor;
typedef boost::graph_traits<Polyhedron_3>::face_descriptor    face_descriptor;

int main(int argc, char* argv[])
{
    // get the input points from a file
    std::ifstream in(argv[1]);
    std::vector<Point_3> points;
    Point_3 p;
    while(in >> p){
        points.push_back(p);
    }

    // define polyhedron to hold convex hull
    Polyhedron_3 poly;

    // compute convex hull of non-collinear points
    CGAL::convex_hull_3(points.begin(), points.end(), poly);
    std::cout << "The convex hull contains "
              << num_vertices(poly) << " vertices"
              << " and " << num_faces(poly) << " faces" << std::endl;

    // A hash map that will associate an index to each vertex
    CGAL::Unique_hash_map<vertex_descriptor, int> index;
    int i = 0;

    // If your compiler supports C++11 you can use the next line
    // instead of the line with BOOST_FOREACH:
    // for(vertex_descriptor vd : vertices(poly)){
    BOOST_FOREACH(vertex_descriptor vd, vertices(poly)){
        std::cout << vd->point() << std::endl;
        index[vd] = i++;
    }

    // loop over the faces, and for each face loop over its vertices
    // Again, you can replace BOOST_FOREACH with for( .. : .. )
    BOOST_FOREACH(face_descriptor fd, faces(poly)){
        BOOST_FOREACH(vertex_descriptor vd, vertices_around_face(halfedge(fd, poly), poly)){
            std::cout << index[vd] << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}
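To get from there to the QVector3D objects mentioned in the question, one possibility (assuming the same typedefs as above, triangular facets as in the C# version, and that the Qt headers are available) is roughly:
#include <QVector3D>
#include <QList>

QList<QVector3D> triangulationPoints;
QList<int> triangulationIndices;

int face_index = 0;
BOOST_FOREACH(face_descriptor fd, faces(poly)){
    // the kernel's Point_3 exposes x(), y(), z(); QVector3D stores floats
    BOOST_FOREACH(vertex_descriptor vd, vertices_around_face(halfedge(fd, poly), poly)){
        const Point_3& p = vd->point();
        triangulationPoints.append(QVector3D(p.x(), p.y(), p.z()));
    }
    triangulationIndices.append(3 * face_index);
    triangulationIndices.append(3 * face_index + 1);
    triangulationIndices.append(3 * face_index + 2);
    ++face_index;
}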

parallelize for loop using boost MPI

I am learning to use Boost.MPI to parallelize a large amount of computation; below is a simple test to see whether I have the MPI logic right. However, I cannot get it to work. I use world.size() = 10; there are 50 elements in total in the data array, so each process does 5 iterations. I would like to update the data array by having each process send its updated copy to the root process, and have the root process receive the updated arrays and print them out. But only a few elements end up updated.
Thanks for helping me.
#include <boost/mpi.hpp>
#include <iostream>
#include <cstdlib>

namespace mpi = boost::mpi;
using namespace std;

#define max_rows 100
int data[max_rows];

int modifyArr(const int index, const int arr[]) {
    return arr[index]*2+1;
}

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    int num_rows = 50;
    int my_number;

    if (world.rank() == 0) {
        for ( int i = 0; i < num_rows; i++)
            data[i] = i + 1;
    }

    broadcast(world, data, 0);

    for (int i = world.rank(); i < num_rows; i += world.size()) {
        my_number = modifyArr(i, data);
        data[i] = my_number;

        world.send(0, 1, data);
        //cout << "i=" << i << " my_number=" << my_number << endl;

        if (world.rank() == 0)
            for (int j = 1; j < world.size(); j++)
                mpi::status s = world.recv(boost::mpi::any_source, 1, data);
    }

    if (world.rank() == 0) {
        for ( int i = 0; i < num_rows; i++)
            cout << "i=" << i << " results = " << data[i] << endl;
    }

    return 0;
}
Your problem is probably here:
mpi::status s = world.recv(boost::mpi::any_source, 1, data);
This is the only way data can get back to the master node.
However, you do not tell the master node where in data to store the answers it is getting. Since data is the address of the array, everything should get stored in the zeroth element.
Interleaving which elements of the array you are processing on each node is a pretty bad idea. You should assign blocks of the array to each node so that you can send entire chunks of the array at once. That will reduce communication overhead significantly.
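A minimal sketch of that block decomposition, assuming num_rows divides evenly by world.size() (it does for 50 elements on 10 processes), using a single gather collective instead of the per-iteration send/recv:
#include <boost/mpi.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    const int num_rows = 50;
    const int chunk = num_rows / world.size();  // assumes an even split
    std::vector<int> data(num_rows);

    if (world.rank() == 0)
        for (int i = 0; i < num_rows; i++)
            data[i] = i + 1;
    mpi::broadcast(world, &data[0], num_rows, 0);

    // each rank updates only its own contiguous block
    const int begin = world.rank() * chunk;
    for (int i = begin; i < begin + chunk; i++)
        data[i] = data[i] * 2 + 1;

    // one collective gathers the blocks back to the root, in rank order
    if (world.rank() == 0) {
        std::vector<int> result(num_rows);
        mpi::gather(world, &data[begin], chunk, &result[0], 0);
        for (int i = 0; i < num_rows; i++)
            std::cout << "i=" << i << " results = " << result[i] << std::endl;
    } else {
        mpi::gather(world, &data[begin], chunk, 0);
    }
    return 0;
}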
Also, if your issue is simply speeding up for loops, you should consider OpenMP, which can do things like this:
#pragma omp parallel for
for(int i = 0; i < 100; i++)
    data[i] *= 4;
Bam! I just split that for loop up between all of my threads with no further work needed.

Qt4 QHash hash collision?

I am using Qt 4.8 and I notice that it has a QHash class which can be used as follows:
QHash<QString, int> hash;
hash["one"] = 1;
hash["three"] = 3;
hash["seven"] = 7;
hash.insert("twelve", 12);
If there is a hash collision, will it be handled correctly?
Yes, collisions will be handled. QHash is a standard implementation of the classic hash-table based container and wouldn't be very reliable if it didn't handle collisions correctly. Typically, a hash-table based container maps keys not to a single entry in the list but to a "bucket" which may contain more than one entry when different keys map to the same hash value.
When fetching values, the hash value for the key leads to the correct bucket, and then the container iterates through the entries in the bucket until it finds a match for the particular key you are looking for.
Although I could not find a specific reference in the documentation to the "correctness" of Qt's implementation, this quote alludes to it. I can't imagine it being otherwise.
QHash's internal hash table grows by powers of two, and each time it
grows, the items are relocated in a new bucket, computed as qHash(key)
% QHash::capacity() (the number of buckets).
A simple test will increase our confidence:
BadHashObject.h
#ifndef BADHASHOBJECT_H
#define BADHASHOBJECT_H

class BadHashObject
{
public:
    BadHashObject(const int value): value(value){}
    int getValue() const
    {
        return value;
    }
private:
    int value;
};

bool operator==(const BadHashObject &b1, const BadHashObject &b2)
{
    return b1.getValue() == b2.getValue();
}

uint qHash(const BadHashObject &/*key*/)
{
    return 1;
}

#endif // BADHASHOBJECT_H
main.cpp
#include <iostream>
#include <QHash>
#include "BadHashObject.h"

using namespace std;

int main(int, char **)
{
    cout << "Hash of BadHashObject(10) is: " << qHash(BadHashObject(10)) << endl;
    cout << "Hash of BadHashObject(100) is: " << qHash(BadHashObject(100)) << endl;

    cout << "Adding BadHashObject(10), value10 and BadHashObject(100), value100" << endl;
    QHash<BadHashObject, QString> hashMap;
    hashMap.insert(BadHashObject(10), QString("value10"));
    hashMap.insert(BadHashObject(100), QString("value100"));

    cout << "Size of hashMap: " << hashMap.size() << endl;
    cout << "Value stored with key 10: " << hashMap.value(BadHashObject(10)).toStdString() << endl;
    cout << "Value stored with key 100: " << hashMap.value(BadHashObject(100)).toStdString() << endl;
}
The BadHashObject class stores an int and its hash function will always return 1 so all objects added to a QHash using this type as a key will result in a collision. The output from our test program shows that the collision is handled properly.
Hash of BadHashObject(10) is: 1
Hash of BadHashObject(100) is: 1
Adding BadHashObject(10), value10 and BadHashObject(100), value100
Size of hashMap: 2
Value stored with key 10: value10
Value stored with key 100: value100

Runtime allocation of multidimensional array

So far I thought that the following syntax was invalid,
int B[ydim][xdim];
But today I tried it and it worked! I ran it many times to make sure it did not work by chance, and even valgrind didn't report any segfaults or memory leaks! I am very surprised. Is it a new feature introduced in g++? I have always used 1D arrays to store matrices, indexing them with the correct strides, as done with A in the program below. But this new method, as with B, is so simple and elegant that it is what I have always wanted. Is it really safe to use? See the sample program.
PS. I am compiling it with g++-4.4.3, if that matters.
#include <cstdlib>
#include <iostream>

int test(int ydim, int xdim) {
    // Allocate 1D array
    int *A = new int[xdim*ydim]();  // with C++ new operator
    // int *A = (int *) malloc(xdim*ydim * sizeof(int));  // or with C style malloc
    if (A == NULL)
        return EXIT_FAILURE;

    // Declare a 2D array of variable size
    int B[ydim][xdim];

    // populate matrices A and B
    for(int y = 0; y < ydim; y++) {
        for(int x = 0; x < xdim; x++) {
            A[y*xdim + x] = y*xdim + x;
            B[y][x] = y*xdim + x;
        }
    }

    // read out matrix A
    for(int y = 0; y < ydim; y++) {
        for(int x = 0; x < xdim; x++)
            std::cout << A[y*xdim + x] << " ";
        std::cout << std::endl;
    }
    std::cout << std::endl;

    // read out matrix B
    for(int y = 0; y < ydim; y++) {
        for(int x = 0; x < xdim; x++)
            std::cout << B[y][x] << " ";
        std::cout << std::endl;
    }

    delete [] A;
    // free(A);  // or in C style
    return EXIT_SUCCESS;
}

int main() {
    return test(5, 8);
}
int B[ydim][xdim] declares a 2-D array on the stack. new, on the other hand, allocates the array on the heap.
For any non-trivial array size, it's almost certainly better to have it on the heap, lest you run out of stack space, or in case you want to pass the array back to something outside the current scope.
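If what makes B attractive is simply having runtime dimensions with 2-D indexing, a small heap-backed wrapper gives much the same convenience without relying on the VLA extension; a minimal sketch (the Matrix name and operator() indexing are just one possible choice):
#include <cstddef>
#include <vector>

// Heap-allocated matrix with runtime dimensions and row-major storage.
struct Matrix {
    std::size_t ydim, xdim;
    std::vector<int> data;

    Matrix(std::size_t y, std::size_t x) : ydim(y), xdim(x), data(y * x) {}

    int& operator()(std::size_t y, std::size_t x)       { return data[y * xdim + x]; }
    int  operator()(std::size_t y, std::size_t x) const { return data[y * xdim + x]; }
};

// usage:
//   Matrix B(ydim, xdim);
//   B(y, x) = y * xdim + x;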
This is a C99 'variable length array', or VLA. g++ supports them in C++ too, but only as an extension: they are not part of the C++ standard.
Nice, aren't they?
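They do remain a compiler extension rather than ISO C++, though. If you want g++ to flag them, -Wvla (or -pedantic) should warn about the declaration of B, e.g. (assuming the sample program is saved as test.cpp):
g++ -Wvla -pedantic test.cpp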
