I have to transfer a number of elements of type:
typedef struct
{
float w;
int a, b;
} edge;
to different processes, hence I am creating an MPI derived type like this:
unsigned int typecount;
MPI_Datatype PEDGE, types[2] = { MPI_FLOAT, MPI_INT };
MPI_Aint offsets[2], extent;
int blocklen[2] = { 1, 2 };
typecount = 2;
offsets[0] = 0;
MPI_Type_extent(MPI_FLOAT, &extent);
offsets[1] = (1*extent);
MPI_Type_struct (typecount, blocklen, offsets, types, &PEDGE);
MPI_Type_commit(&PEDGE);
When I do a sizeof(edge) I get 12 bytes, but I get only 8 bytes when I do sizeof(PEDGE). Why is that? Apart from this, my code for sending some elements of PEDGE type to arrays of edge type is failing, probably because of this mismatch.
The problem here is that an MPI_Datatype object such as PEDGE is not itself the new datatype, merely an opaque handle to some implementation-specific entity that MPI can interpret as a datatype. As such, sizeof(PEDGE) reports the size of the handle, not the size of the data it describes. Use MPI_Type_size() instead, which reports the number of bytes of actual data in the type.
As for the sends failing, I can't say much without seeing your code, but your datatype definition does look correct.
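One way to double-check the layout that the derived type assumes, without involving MPI at all, is to compare it against what the compiler reports through offsetof and sizeof. The following is only a sketch in plain C; the helper names edge_int_block_offset and edge_is_packed are made up for illustration.

```c
#include <stddef.h>

typedef struct {
    float w;
    int a, b;
} edge;

/* Byte offset of the two-int block, as the compiler actually lays it out. */
size_t edge_int_block_offset(void)
{
    return offsetof(edge, a);
}

/* Nonzero when the struct has no padding, i.e. its size is the sum of its members. */
int edge_is_packed(void)
{
    return sizeof(edge) == sizeof(float) + 2 * sizeof(int);
}
```

If edge_int_block_offset() equals the extent of MPI_FLOAT, the offsets array in the question matches the struct. On a platform where the compiler inserts padding, the value from offsetof would be the safer one to put into offsets[1].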
I have a question for which I found many threads, but none of them explicitly answered it.
I am trying to have a multidimensional array inside the kernel of the GPU using thrust. Flattening would be difficult, as all the dimensions are non-homogeneous and I go up to 4D. Now I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome), so I tried going the way over raw-pointers.
My reasoning is, a raw pointer points onto memory on the GPU, why else would I be able to access it from within the kernel. So I should technically be able to have a device_vector, which holds raw pointers, all pointers that should be accessible from within the GPU. This way I constructed the following code:
thrust::device_vector<Vector3r*> d_fluidmodelParticlePositions(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
FluidModel *model = sim->getFluidModelFromPointSet(fluidModelIndex);
const unsigned int numParticles = model->numActiveParticles();
thrust::device_vector<Vector3r> d_neighborPositions(model->getPositions().begin(), model->getPositions().end());
d_fluidmodelParticlePositions[fluidModelIndex] = CudaHelper::GetPointer(d_neighborPositions);
thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
for(unsigned int pid = 0; pid < nModels; pid++)
{
FluidModel *fm_neighbor = sim->getFluidModelFromPointSet(pid);
thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
for(unsigned int i = 0; i < numParticles; i++)
{
const unsigned int nNeighbors = sim->numberOfNeighbors(fluidModelIndex, pid, i);
d_nNeighbors[i] = nNeighbors;
thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
for(unsigned int j = 0; j < nNeighbors; j++)
{
d_neighborIndexes[j] = sim->getNeighbor(fluidModelIndex, pid, i, j);
}
d_neighborIndexesArray[i] = CudaHelper::GetPointer(d_neighborIndexes);
}
d_fluidNeighborIndexes[pid] = CudaHelper::GetPointer(d_neighborIndexesArray);
d_nNeighborsFluid[pid] = CudaHelper::GetPointer(d_nNeighbors);
}
d_allFluidNeighborParticles[fluidModelIndex] = CudaHelper::GetPointer(d_fluidNeighborIndexes);
d_nFluidNeighborsCrossFluids[fluidModelIndex] = CudaHelper::GetPointer(d_nNeighborsFluid);
}
The compiler doesn't complain, and accessing, for example, d_nFluidNeighborsCrossFluids from within the kernel works, but it returns wrong values. I access it like this (again, from within a kernel):
d_nFluidNeighborsCrossFluids[iterator1][iterator2][iterator3];
// Note: out of bounds indexing guaranteed to not happen, indexing is definitely right
The question is, why does it return wrong values? The logic behind it should work in my opinion, since my indexing is correct and the pointers should be valid addresses from within the kernel.
Thank you already for your time and have a great day.
EDIT:
Here is a minimal reproducible example. For some reason the values appear correct despite the code having the same structure as mine, but cuda-memcheck reports some errors. Uncommenting the two commented lines leads to the main problem I am trying to solve. What is cuda-memcheck telling me here?
/* Part of this example has been taken from code of Robert Crovella
in a comment below */
#include <thrust/device_vector.h>
#include <stdio.h>
template<typename T>
static T* GetPointer(thrust::device_vector<T> &vector)
{
return thrust::raw_pointer_cast(vector.data());
}
__global__
void k(unsigned int ***nFluidNeighborsCrossFluids, unsigned int ****allFluidNeighborParticles){
const unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
if(i > 49)
return;
printf("i: %d nNeighbors: %d\n", i, nFluidNeighborsCrossFluids[0][0][i]);
//for(int j = 0; j < nFluidNeighborsCrossFluids[0][0][i]; j++)
// printf("i: %d j: %d neighbors: %d\n", i, j, allFluidNeighborParticles[0][0][i][j]);
}
int main(){
const unsigned int nModels = 2;
const int numParticles = 50;
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
for(unsigned int pid = 0; pid < nModels; pid++)
{
thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
for(unsigned int i = 0; i < numParticles; i++)
{
const unsigned int nNeighbors = i;
d_nNeighbors[i] = nNeighbors;
thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
for(unsigned int j = 0; j < nNeighbors; j++)
{
d_neighborIndexes[j] = i + j;
}
d_neighborIndexesArray[i] = GetPointer(d_neighborIndexes);
}
d_nNeighborsFluid[pid] = GetPointer(d_nNeighbors);
d_fluidNeighborIndexes[pid] = GetPointer(d_neighborIndexesArray);
}
d_nFluidNeighborsCrossFluids[fluidModelIndex] = GetPointer(d_nNeighborsFluid);
d_allFluidNeighborParticles[fluidModelIndex] = GetPointer(d_fluidNeighborIndexes);
}
k<<<256, 256>>>(GetPointer(d_nFluidNeighborsCrossFluids), GetPointer(d_allFluidNeighborParticles));
// cudaGetLastError() resets the error state, so call it once and keep the result
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
printf("Sync kernel error: %s\n", cudaGetErrorString(err));
cudaDeviceSynchronize();
}
A device_vector is a class definition. That class has various methods and operators associated with it. The thing that allows you to do this:
d_nFluidNeighborsCrossFluids[...]...;
is a square-bracket operator. That operator is a host operator (only). It is not usable in device code. Issues like this give rise to the general statements that "thrust::device_vector is not usable in device code." The device_vector object itself is generally not usable. However the data it contains is usable in device code, if you attempt to access it via a raw pointer.
Here is an example of a thrust device vector that contains an array of pointers to the data contained in other device vectors. That data is usable in device code, as long as you don't attempt to make use of the thrust::device_vector object itself:
$ cat t1509.cu
#include <thrust/device_vector.h>
#include <stdio.h>
template <typename T>
__global__ void k(T **data){
printf("the first element of vector 1 is: %d\n", (int)(data[0][0]));
printf("the first element of vector 2 is: %d\n", (int)(data[1][0]));
printf("the first element of vector 3 is: %d\n", (int)(data[2][0]));
}
int main(){
thrust::device_vector<int> vector_1(1,1);
thrust::device_vector<int> vector_2(1,2);
thrust::device_vector<int> vector_3(1,3);
thrust::device_vector<int *> pointer_vector(3);
pointer_vector[0] = thrust::raw_pointer_cast(vector_1.data());
pointer_vector[1] = thrust::raw_pointer_cast(vector_2.data());
pointer_vector[2] = thrust::raw_pointer_cast(vector_3.data());
k<<<1,1>>>(thrust::raw_pointer_cast(pointer_vector.data()));
cudaDeviceSynchronize();
}
$ nvcc -o t1509 t1509.cu
$ cuda-memcheck ./t1509
========= CUDA-MEMCHECK
the first element of vector 1 is: 1
the first element of vector 2 is: 2
the first element of vector 3 is: 3
========= ERROR SUMMARY: 0 errors
$
EDIT: In the mcve you have now posted, you point out that an ordinary run of the code appears to give correct results, but when you use cuda-memcheck, errors are reported. You have a general design problem that will cause this.
In C++, when an object is defined within a curly-braces region:
{
{
Object A;
// object A is in-scope here
}
// object A is out-of-scope here
}
// object A is out of scope here
k<<<...>>>(anything that points to something in object A); // is illegal
and you exit that region, the object defined within the region is now out of scope. For objects with constructors/destructors, this usually means the destructor of the object will be called when it goes out-of-scope. For a thrust::device_vector (or std::vector) this will deallocate any underlying storage associated with that vector. That does not necessarily "erase" any data, but attempts to use that data are illegal and would be considered UB (undefined behavior) in C++.
When you establish pointers to such data inside an in-scope region, and then go out-of-scope, those pointers no longer point to anything that would be legal to access, so attempts to dereference the pointer would be illegal/UB. Your code is doing this. Yes, it does appear to give the correct answer, because nothing is actually erased on deallocation, but the code design is illegal, and cuda-memcheck will highlight that.
I suppose one fix would be to pull all this stuff out of the inner curly-braces, and put it at main scope, just like the d_nFluidNeighborsCrossFluids device_vector is. But you might also want to rethink your general data organization strategy and flatten your data.
You should really provide a minimal, complete, verifiable/reproducible example; yours is neither minimal, nor complete, nor verifiable.
I will, however, answer your side-question:
I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome)
While a device_vector manages a bunch of data on the GPU, it is itself a host-side data structure; otherwise you would not have been able to use it in host-side code. On the host side, what it holds should be something like: the capacity, the size in elements, the device-side pointer to the actual data, and maybe more information. This is similar to how a std::vector variable may refer to data that's on the heap, but if you create the variable locally, the fields I mentioned above will exist on the stack.
Now, those fields of the device vector that are located in host memory are not generally accessible from the device-side. In device-side code you would typically use the raw pointer to the device-side data the device_vector manages.
Also, note that if you have a thrust::device_vector<T> v, each use of operator[] means a bunch of separate CUDA calls to copy data to or from the device (unless there's some caching going on under the hood). So you really want to avoid using square-brackets with this structure.
Finally, remember that pointer-chasing can be a performance killer, especially on a GPU. You might want to consider massaging your data structure somewhat in order to make it amenable to flattening.
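Both answers suggest flattening. One common layout for jagged data (an assumption here, since neither answer spells one out) stores all rows back to back in a single values array, plus an offsets array that says where each row starts. The sketch below is host-side C with hypothetical names (flat_rows and its helpers), only meant to show the indexing.

```c
#include <stdlib.h>

/* Jagged 2D data stored flat: row i occupies values[offsets[i]] up to,
   but not including, values[offsets[i + 1]]. */
typedef struct {
    size_t rows;
    size_t *offsets;       /* rows + 1 entries */
    unsigned int *values;  /* offsets[rows] entries */
} flat_rows;

/* Build the offsets from per-row counts and allocate the value storage.
   Returns 0 on success, -1 if an allocation fails. */
int flat_rows_init(flat_rows *f, const size_t *counts, size_t rows)
{
    size_t total = 0;
    f->rows = rows;
    f->values = NULL;
    f->offsets = malloc((rows + 1) * sizeof *f->offsets);
    if (f->offsets == NULL)
        return -1;
    for (size_t i = 0; i < rows; i++) {
        f->offsets[i] = total;
        total += counts[i];
    }
    f->offsets[rows] = total;
    /* Allocate at least one element so a NULL result always means failure. */
    f->values = malloc((total ? total : 1) * sizeof *f->values);
    if (f->values == NULL) {
        free(f->offsets);
        f->offsets = NULL;
        return -1;
    }
    return 0;
}

/* Number of entries in row i. */
size_t flat_rows_count(const flat_rows *f, size_t i)
{
    return f->offsets[i + 1] - f->offsets[i];
}

/* Pointer to entry j of row i; the caller keeps j below flat_rows_count(f, i). */
unsigned int *flat_rows_at(flat_rows *f, size_t i, size_t j)
{
    return &f->values[f->offsets[i] + j];
}

void flat_rows_free(flat_rows *f)
{
    free(f->values);
    free(f->offsets);
    f->values = NULL;
    f->offsets = NULL;
}
```

On the GPU side, the same idea would need only two allocations (for example two device_vectors kept alive at main scope) and two raw pointers passed to the kernel, instead of a tree of short-lived vectors. Extra dimensions could be handled by adding further offset arrays in the same way.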
I am trying to use a PMPI wrapper to record some function parameters, e.g. those of MPI_Send. I need to record them so that I can later reconstruct the content of all those parameters.
The wrapper for MPI_Send looks like this:
/* ================== C Wrappers for MPI_Send ================== */
_EXTERN_C_ int PMPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
_EXTERN_C_ int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) {
int _wrap_py_return_val = 0;
do_wrap_send_series((char *)"MPI_Send", buf, count, datatype, dest, tag, comm);
_wrap_py_return_val = PMPI_Send(buf, count, datatype, dest, tag, comm);
return _wrap_py_return_val;
}
The problem is that I can't record a pointer's value and use it later on, since pointer values can differ across runs.
At least MPI_Datatype is a pointer type; correct me if I am wrong.
How I found out that MPI_Datatype is a pointer type: when I print one with %d, mpicc warns (on x86_64):
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘struct ompi_datatype_t *’
The definition of struct ompi_datatype_t is:
struct ompi_datatype_t {
opal_datatype_t super; /**< Base opal_datatype_t superclass */
/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
int32_t id; /**< OMPI-layers unique id of the type */
int32_t d_f_to_c_index; /**< Fortran index for this datatype */
struct opal_hash_table_t *d_keyhash; /**< Attribute fields */
void* args; /**< Data description for the user */
void* packed_description; /**< Packed description of the datatype */
uint64_t pml_data; /**< PML-specific information */
/* --- cacheline 6 boundary (384 bytes) --- */
char name[MPI_MAX_OBJECT_NAME];/**< Externally visible name */
/* --- cacheline 7 boundary (448 bytes) --- */
/* size: 448, cachelines: 7, members: 7 */
};
typedef struct ompi_datatype_t ompi_datatype_t;
So it looks like each MPI_Datatype has a unique id.
So I tried to access the id field, but I got this error:
error: dereferencing pointer to incomplete type ‘struct ompi_datatype_t’
struct ompi_datatype_t seems to be an internal data structure. Is there any way to achieve my goal?
Tool to generate PMPI wrapper: here
Generally speaking, MPI_Datatype is an opaque handle, so you cannot make any assumptions about it, especially if your wrappers should be portable.
MPI_Datatype is indeed a pointer in Open MPI, but it is a number in MPICH iirc.
(Older) Fortran uses an integer to refer to a datatype, so one option is to use the following functions
MPI_Fint MPI_Type_c2f(MPI_Datatype datatype);
MPI_Datatype MPI_Type_f2c(MPI_Fint datatype);
to convert between an MPI_Datatype and an MPI_Fint (an int unless you built Open MPI with 8-byte Fortran integers).
That being said, if you want to compare datatypes between runs, you might want to consider these subroutines
int MPI_Type_set_name(MPI_Datatype type, const char *type_name);
int MPI_Type_get_name(MPI_Datatype type, char *type_name, int *resultlen);
That way you do not have to worry about race conditions, or about changes in the order in which your app creates derived datatypes.
I am using the D programming language.
I want to have a struct containing a multidimensional static array of ints initially filled with a non-zero value (in my case, zero is a valid entry, and I want to initially mark all entries as invalid).
As it is a struct, it can not have a default constructor.
Instead, I can supply a default value for the member of the struct.
The question is: how do I write this multidimensional array value in a short and readable way? Is there any convenience function, special-case syntax or idiom to do that?
Here is what I came up with.
import std.range;
import std.stdio;
struct S
{
static immutable int SIZE = 3;
static immutable int NA = -1;
int [SIZE] [SIZE] a = NA.repeat(SIZE).array().repeat(SIZE).array();
}
void main()
{
S s;
writeln(s);
}
This prints the array of -1s as expected:
S([[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]])
Still, the expression NA.repeat(SIZE).array().repeat(SIZE).array() looks lengthy, and I suspect there could be a better (more idiomatic, more readable) way to express my intent.
Update with a few more attempts:
int [SIZE] [SIZE] a = NA; does not compile, even with the current beta: dmd-2.066-b2.
int [SIZE] [SIZE] a = NA.repeat (SIZE).array (); compiles and does the thing.
Still, the consistency suffers.
int [SIZE] [SIZE] a = [NA, NA, NA]; looks like it is essentially the above expression, simplified.
It compiles but fills only the first three-element subarray with NAs.
The other two subarrays contain some garbage-like stuff.
Is that a partial initialization feature of some kind?
To me, it looks more like a bug, like, compiler accepting invalid code.
int [SIZE] [SIZE] a = [NA]; sets the first subarray to [-1, 0, 0] and the rest to the same garbage as the previous attempt.
There is also fill in std.algorithm, but it works for ranges (not ranges of ranges), and does not look like it's readily usable in an initializer.
At least it won't be shorter.
What about something like this:
module main;
import std.stdio: writeln;
enum SIZE = 3;
enum NA = -1;
struct Int {
int v = NA; // default value for every element
alias v this;
}
struct S
{
Int [SIZE] [SIZE] a;
}
void main()
{
S s;
writeln(s);
}
I have written the code below in Qt. When I enter values, program.exe stops working.
struct aim
{
int i : 1;
int j : 1;
};
int main()
{
aim missed;
printf("Enter value of i :: ");
scanf("%u",missed.i);
printf("Enter value of j :: ");
scanf("%u",missed.j);
}
can anyone help me out with this problem?
There are a few problems with your code:
1. A 1-bit signed integer isn't very useful; it can only hold the values -1 and 0.
2. You can't have a pointer to a bit-field; a bit-field has no address of its own.
3. Also, there's nothing in the %u specifier that tells the scanf() function that the target value is a bit-field (nor is there any other % specifier that can do this, see 2).
The solution is to scanf() to a temporary variable, range-check the received value, then store it in the bit field.
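The approach described above might look like the following sketch. It assumes unsigned 1-bit fields (the question used int) and reads from a string with sscanf so that it is easy to try out; with scanf the temporary-variable pattern is the same. The helpers parse_bit and aim_from_text are hypothetical names.

```c
#include <stdio.h>

/* Unsigned 1-bit fields can hold 0 or 1. */
struct aim {
    unsigned int i : 1;
    unsigned int j : 1;
};

/* Read one value into an ordinary temporary and accept it only if it fits
   in a single bit. Returns 0 on success, -1 on a parse or range error. */
static int parse_bit(const char *text, unsigned int *out)
{
    unsigned int tmp;
    if (sscanf(text, "%u", &tmp) != 1)
        return -1;
    if (tmp > 1u)
        return -1;
    *out = tmp;
    return 0;
}

/* Fill both bit-fields from text; *target is left untouched on failure. */
int aim_from_text(struct aim *target, const char *i_text, const char *j_text)
{
    unsigned int i, j;
    if (parse_bit(i_text, &i) != 0 || parse_bit(j_text, &j) != 0)
        return -1;
    target->i = i;  /* plain assignment, no pointer to the bit-field needed */
    target->j = j;
    return 0;
}
```

The bit-fields are only ever written by assignment, so no address of a bit-field is taken anywhere.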
This fails because neither C nor C++ allows taking the address of a bit-field member, and scanf needs a pointer to the place where it should store the value.
people, i've an issue now..
#include <stdio.h>
#include <stdlib.h>
typedef struct a
{
int *aa;
int *bb;
struct b *wakata;
}a;
typedef struct b
{
int *you;
int *me;
}b;
int main()
{
a *aq;
aq = (a*)malloc(sizeof(a*));
*aq->wakata->you = 1;
*aq->wakata->me = 2;
free(aq);
return 0;
}
and compiled, then debugged :
gcc -o tes tes.c --debug
sapajabole#cintajangankaupergi:/tmp$ gdb -q ./tes
Reading symbols from /tmp/tes...done.
(gdb) r
Starting program: /tmp/tes
Program received signal SIGSEGV, Segmentation fault.
0x08048414 in main () at tes.c:22
22 *aq->wakata->you = 1;
well, the question is, how to set the value to variable inside struct 'b' through struct 'a' ?
anyone ?
The initial allocation of a is only allocating 4 bytes (in a 32-bit architecture). It should be:
aq = (a*)malloc(sizeof(a));
And wakata has not been initialized: Maybe this:
aq->wakata = (b*)malloc(sizeof(b));
And it will need a corresponding free as well prior to the free of aq.
free(aq->wakata);
And since you have pointers to the integers, those would also need to be allocated (you and me). But it is not clear if that is your goal. You probably should remove the * from the int declarations so that they are simply int members rather than the pointers to int.
Looks like you have a few mistakes here. See the code below.
In general a few things to keep in mind. You can't access memory before you malloc it. Also, there is a difference between memory and pointers e.g. int and int *
#include <stdio.h>
#include <stdlib.h>
typedef struct a
{
int aa;
int bb;
struct b *wakata;
}a;
typedef struct b
{
int you;
int me;
}b;
int main()
{
a * aq = malloc(sizeof(a));
aq->wakata = malloc(sizeof(b));
aq->wakata->you = 1;
aq->wakata->me = 2;
free(aq->wakata);
free(aq);
return 0;
}
wakata isn't pointing to any valid memory. You have to malloc memory for it, and then also for wakata->you and wakata->me
Pointers do not contain data. They point at data. That is why they are called pointers.
When you malloc enough space to store an a instance named aq, you allocate space for the pointers contained in that structure. You do not cause them to point at anything, nor do you allocate space to contain the things that they would point at.
You're not allocating space for b in struct a. You have defined 'a' as holding pointers, not structs. Also, I think malloc(sizeof(a*)) should be malloc(sizeof(a))
aq = (a*)malloc(sizeof(a)); // You should probably use calloc here
aq->wakata = (b*)malloc(sizeof(b));
you and me don't seem to need to be pointers, just normal ints
You have some problems with your code.
When you allocate memory for the struct a, you should do
aq = (a*)malloc(sizeof(a));
You have now allocated memory for the struct a, but not for the struct b pointed to by the wakata member, so you need to do
aq->wakata = (b*)malloc(sizeof(b));
Finally, in the struct b there should not be int* members, but int members. This way, you'll be able to correctly assign a value to them.
Remember that you should check for the correct allocation of memory by checking if the malloc return value is not NULL.
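Putting the suggestions from these answers together, with the NULL checks included, could look like the sketch below. The helper names a_create and a_destroy are made up for illustration, and the int members are plain ints rather than pointers, as recommended above.

```c
#include <stdlib.h>

typedef struct b {
    int you;
    int me;
} b;

typedef struct a {
    int aa;
    int bb;
    b *wakata;
} a;

/* Allocate an a together with the b it points to. Returns NULL if either
   allocation fails; nothing is leaked in that case. */
a *a_create(int you, int me)
{
    a *aq = malloc(sizeof *aq);
    if (aq == NULL)
        return NULL;
    aq->aa = 0;
    aq->bb = 0;
    aq->wakata = malloc(sizeof *aq->wakata);
    if (aq->wakata == NULL) {
        free(aq);
        return NULL;
    }
    aq->wakata->you = you;
    aq->wakata->me = me;
    return aq;
}

/* Free the inner struct first, then the outer one. Accepts NULL. */
void a_destroy(a *aq)
{
    if (aq == NULL)
        return;
    free(aq->wakata);
    free(aq);
}
```

Writing sizeof *aq instead of sizeof(a) is a design choice that sidesteps the original sizeof(a*) mistake, because the size always follows the type of the pointer being assigned.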