Pass double pointer in a struct to CUDA - pointers

I've got the following struct:
struct Param
{
    double** K_RP;
};
And I want to perform the following operations on "K_RP" in CUDA:
__global__ void Test( struct Param prop)
{
    int ix = threadIdx.x;
    int iy = threadIdx.y;
    prop.K_RP[ix][iy]=2.0;
}
If "prop" has the following form, how should I do my "cudaMalloc" and "cudaMemcpy" operations?
int main( )
{
    Param prop;
    Param cuda_prop;
    prop.K_RP=alloc2D(Imax,Jmax);
    //cudaMalloc cuda_prop ?
    //cudaMemcpyH2D prop to cuda_prop ?
    Test<<< dim3(1,1), dim3(Imax,Jmax) >>> ( cuda_prop);
    //cudaMemcpyD2H cuda_prop to prop ?
    return (0);
}

Questions like this get asked from time to time. If you search on the cuda tag, you'll find a variety of examples with answers. Here's one example.
In general, dynamically allocated data contained within structures or other objects requires special handling. This question/answer explains why and how to do it for the single pointer (*) case.
Handling double pointers (**) is difficult enough that most people would recommend "flattening" the storage so that it can be handled by reference with a single pointer (*). If you really want to see how the double pointer (**) method works, review this question/answer. It's not trivial.
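To make that concrete, here is a minimal sketch of the flattened approach applied to the struct in the question (the Imax/Jmax values and the plain malloc standing in for alloc2D are my assumptions): K_RP becomes a single pointer, and element (ix, iy) is addressed as ix*Jmax + iy.

#include <cuda_runtime.h>
#include <cstdlib>

struct Param
{
    double* K_RP;                          // flattened Imax x Jmax array
};

__global__ void Test(Param prop, int Jmax)
{
    int ix = threadIdx.x;
    int iy = threadIdx.y;
    prop.K_RP[ix * Jmax + iy] = 2.0;       // same element as K_RP[ix][iy] before
}

int main()
{
    const int Imax = 8, Jmax = 8;          // assumed sizes
    Param prop, cuda_prop;
    prop.K_RP = (double*)malloc(Imax * Jmax * sizeof(double));

    cudaMalloc((void**)&cuda_prop.K_RP, Imax * Jmax * sizeof(double));
    cudaMemcpy(cuda_prop.K_RP, prop.K_RP, Imax * Jmax * sizeof(double), cudaMemcpyHostToDevice);

    Test<<<dim3(1, 1), dim3(Imax, Jmax)>>>(cuda_prop, Jmax);

    cudaMemcpy(prop.K_RP, cuda_prop.K_RP, Imax * Jmax * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(cuda_prop.K_RP);
    free(prop.K_RP);
    return 0;
}

Because the struct is passed to the kernel by value, the device pointer it carries works directly in device code; no separate copy of the struct itself is needed.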

Related

Why does thrust::device_vector not seem to have a chance to hold raw pointers to other device_vectors?

I have a question for which I found many threads, but none of them explicitly answered it.
I am trying to have a multidimensional array inside the kernel of the GPU using thrust. Flattening would be difficult, as the dimensions are non-homogeneous and I go up to 4D. Now I know I cannot have device_vectors of device_vectors, for whatever underlying reason (an explanation would be welcome), so I tried the route of raw pointers instead.
My reasoning is: a raw pointer points to memory on the GPU, why else would I be able to access it from within the kernel? So I should technically be able to have a device_vector that holds raw pointers, all of which should be accessible from within the GPU. With that in mind I constructed the following code:
thrust::device_vector<Vector3r*> d_fluidmodelParticlePositions(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
    FluidModel *model = sim->getFluidModelFromPointSet(fluidModelIndex);
    const unsigned int numParticles = model->numActiveParticles();
    thrust::device_vector<Vector3r> d_neighborPositions(model->getPositions().begin(), model->getPositions().end());
    d_fluidmodelParticlePositions[fluidModelIndex] = CudaHelper::GetPointer(d_neighborPositions);
    thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
    thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
    for(unsigned int pid = 0; pid < nModels; pid++)
    {
        FluidModel *fm_neighbor = sim->getFluidModelFromPointSet(pid);
        thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
        thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
        for(unsigned int i = 0; i < numParticles; i++)
        {
            const unsigned int nNeighbors = sim->numberOfNeighbors(fluidModelIndex, pid, i);
            d_nNeighbors[i] = nNeighbors;
            thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
            for(unsigned int j = 0; j < nNeighbors; j++)
            {
                d_neighborIndexes[j] = sim->getNeighbor(fluidModelIndex, pid, i, j);
            }
            d_neighborIndexesArray[i] = CudaHelper::GetPointer(d_neighborIndexes);
        }
        d_fluidNeighborIndexes[pid] = CudaHelper::GetPointer(d_neighborIndexesArray);
        d_nNeighborsFluid[pid] = CudaHelper::GetPointer(d_nNeighbors);
    }
    d_allFluidNeighborParticles[fluidModelIndex] = CudaHelper::GetPointer(d_fluidNeighborIndexes);
    d_nFluidNeighborsCrossFluids[fluidModelIndex] = CudaHelper::GetPointer(d_nNeighborsFluid);
}
Now the compiler doesn't complain, and accessing, for example, d_nFluidNeighborsCrossFluids from within the kernel works, but it returns wrong values. I access it like this (again, from within a kernel):
d_nFluidNeighborsCrossFluids[iterator1][iterator2][iterator3];
// Note: out of bounds indexing guaranteed to not happen, indexing is definitely right
The question is, why does it return wrong values? The logic behind it should work in my opinion, since my indexing is correct and the pointers should be valid addresses from within the kernel.
Thank you already for your time and have a great day.
EDIT:
Here is a minimal reproducible example. For some reason the values appear right despite having the same structure as my code, but cuda-memcheck reveals some errors. Uncommenting the two commented lines leads me to the main problem I am trying to solve. What is cuda-memcheck telling me here?
/* Part of this example has been taken from code of Robert Crovella
in a comment below */
#include <thrust/device_vector.h>
#include <stdio.h>
template<typename T>
static T* GetPointer(thrust::device_vector<T> &vector)
{
    return thrust::raw_pointer_cast(vector.data());
}
__global__
void k(unsigned int ***nFluidNeighborsCrossFluids, unsigned int ****allFluidNeighborParticles){
    const unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    if(i > 49)
        return;
    printf("i: %d nNeighbors: %d\n", i, nFluidNeighborsCrossFluids[0][0][i]);
    //for(int j = 0; j < nFluidNeighborsCrossFluids[0][0][i]; j++)
    //    printf("i: %d j: %d neighbors: %d\n", i, j, allFluidNeighborParticles[0][0][i][j]);
}
int main(){
    const unsigned int nModels = 2;
    const int numParticles = 50;
    thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
    thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
    for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
    {
        thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
        thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
        for(unsigned int pid = 0; pid < nModels; pid++)
        {
            thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
            thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
            for(unsigned int i = 0; i < numParticles; i++)
            {
                const unsigned int nNeighbors = i;
                d_nNeighbors[i] = nNeighbors;
                thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
                for(unsigned int j = 0; j < nNeighbors; j++)
                {
                    d_neighborIndexes[j] = i + j;
                }
                d_neighborIndexesArray[i] = GetPointer(d_neighborIndexes);
            }
            d_nNeighborsFluid[pid] = GetPointer(d_nNeighbors);
            d_fluidNeighborIndexes[pid] = GetPointer(d_neighborIndexesArray);
        }
        d_nFluidNeighborsCrossFluids[fluidModelIndex] = GetPointer(d_nNeighborsFluid);
        d_allFluidNeighborParticles[fluidModelIndex] = GetPointer(d_fluidNeighborIndexes);
    }
    k<<<256, 256>>>(GetPointer(d_nFluidNeighborsCrossFluids), GetPointer(d_allFluidNeighborParticles));
    if (cudaGetLastError() != cudaSuccess)
        printf("Sync kernel error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaDeviceSynchronize();
}
A device_vector is a class definition. That class has various methods and operators associated with it. The thing that allows you to do this:
d_nFluidNeighborsCrossFluids[...]...;
is a square-bracket operator. That operator is a host operator (only). It is not usable in device code. Issues like this give rise to the general statements that "thrust::device_vector is not usable in device code." The device_vector object itself is generally not usable. However the data it contains is usable in device code, if you attempt to access it via a raw pointer.
Here is an example of a thrust device vector that contains an array of pointers to the data contained in other device vectors. That data is usable in device code, as long as you don't attempt to make use of the thrust::device_vector object itself:
$ cat t1509.cu
#include <thrust/device_vector.h>
#include <stdio.h>
template <typename T>
__global__ void k(T **data){
    printf("the first element of vector 1 is: %d\n", (int)(data[0][0]));
    printf("the first element of vector 2 is: %d\n", (int)(data[1][0]));
    printf("the first element of vector 3 is: %d\n", (int)(data[2][0]));
}
int main(){
    thrust::device_vector<int> vector_1(1,1);
    thrust::device_vector<int> vector_2(1,2);
    thrust::device_vector<int> vector_3(1,3);
    thrust::device_vector<int *> pointer_vector(3);
    pointer_vector[0] = thrust::raw_pointer_cast(vector_1.data());
    pointer_vector[1] = thrust::raw_pointer_cast(vector_2.data());
    pointer_vector[2] = thrust::raw_pointer_cast(vector_3.data());
    k<<<1,1>>>(thrust::raw_pointer_cast(pointer_vector.data()));
    cudaDeviceSynchronize();
}
$ nvcc -o t1509 t1509.cu
$ cuda-memcheck ./t1509
========= CUDA-MEMCHECK
the first element of vector 1 is: 1
the first element of vector 2 is: 2
the first element of vector 3 is: 3
========= ERROR SUMMARY: 0 errors
$
EDIT: In the mcve you have now posted, you point out that an ordinary run of the code appears to give correct results, but when you use cuda-memcheck, errors are reported. You have a general design problem that will cause this.
In C++, when an object is defined within a curly-braces region:
{
    {
        Object A;
        // object A is in-scope here
    }
    // object A is out-of-scope here
}
// object A is out-of-scope here
k<<<...>>>(anything that points to something in object A); // is illegal
and you exit that region, the object defined within the region is now out of scope. For objects with constructors/destructors, this usually means the destructor of the object will be called when it goes out-of-scope. For a thrust::device_vector (or std::vector) this will deallocate any underlying storage associated with that vector. That does not necessarily "erase" any data, but attempts to use that data are illegal and would be considered UB (undefined behavior) in C++.
When you establish pointers to such data inside an in-scope region, and then go out-of-scope, those pointers no longer point to anything that would be legal to access, so attempts to dereference the pointer would be illegal/UB. Your code is doing this. Yes, it does appear to give the correct answer, because nothing is actually erased on deallocation, but the code design is illegal, and cuda-memcheck will highlight that.
I suppose one fix would be to pull all this stuff out of the inner curly-braces, and put it at main scope, just like the d_nFluidNeighborsCrossFluids device_vector is. But you might also want to rethink your general data organization strategy and flatten your data.
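As a sketch of that fix (the container names keepAlive and d_pointers are illustrative, not from the posted code): keep every inner device_vector owned by something that is still in scope when the kernel launches, and only record raw pointers into data that outlives the launch.

// Sketch: the std::vector at main scope owns the inner device_vectors, so the
// device allocations (and the raw pointers recorded below) stay valid until
// after the kernel has run. Names keepAlive/d_pointers are illustrative.
#include <thrust/device_vector.h>
#include <vector>

__global__ void use(unsigned int **rows, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) rows[i][0] = i;   // write the first element of each row
}

int main()
{
    const int n = 50;
    std::vector<thrust::device_vector<unsigned int>> keepAlive(n);  // owns the rows
    thrust::device_vector<unsigned int*> d_pointers(n);             // device array of row pointers
    for (int i = 0; i < n; i++)
    {
        keepAlive[i] = thrust::device_vector<unsigned int>(i + 1);  // row i has i+1 entries
        d_pointers[i] = thrust::raw_pointer_cast(keepAlive[i].data());
    }
    use<<<1, n>>>(thrust::raw_pointer_cast(d_pointers.data()), n);
    cudaDeviceSynchronize();
    return 0;   // keepAlive is destroyed only after the kernel has finished
}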
You should really provide a minimal, complete, verifiable/reproducible example; yours is neither minimal, nor complete, nor verifiable.
I will, however, answer your side-question:
I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome)
While a device_vector refers to a bunch of data on the GPU, it's a host-side data structure - otherwise you would not have been able to use it in host-side code. On the host side, what it holds should be something like: the capacity, the size in elements, the device-side pointer to the actual data, and maybe more information. This is similar to how an std::vector variable may refer to data that's on the heap, but if you create the variable locally, the fields I mentioned above will exist on the stack.
Now, those fields of the device vector that are located in host memory are not generally accessible from the device-side. In device-side code you would typically use the raw pointer to the device-side data the device_vector manages.
Also, note that if you have a thrust::device_vector<T> v, each use of operator[] means a bunch of separate CUDA calls to copy data to or from the device (unless there's some caching going on under the hood). So you really want to avoid using square brackets with this structure.
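For example, a small sketch (the helper name read_back is made up): copy the whole vector back to the host in one transfer and then index the host copy with ordinary memory accesses.

// Sketch: one bulk device-to-host transfer instead of one transfer per element.
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

void read_back(const thrust::device_vector<int> &d_vec)
{
    thrust::host_vector<int> h_vec = d_vec;   // single bulk copy to the host
    for (size_t i = 0; i < h_vec.size(); i++)
    {
        int x = h_vec[i];                     // ordinary host memory access, no CUDA call
        (void)x;
    }
}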
Finally, remember that pointer-chasing can be a performance killer, especially on a GPU. You might want to consider massaging your data structure somewhat in order to make it amenable to flattening.
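One common way to do that for ragged neighbor lists is a CSR-style layout: a flat index array plus an offsets array built from a prefix sum over the per-particle counts. A minimal sketch with made-up data standing in for the simulation (the names d_offsets/d_indices are mine); the same idea nests for the extra model dimensions.

// CSR-style flattening sketch for ragged neighbor lists (assumed names).
// Neighbors of particle i live in indices[offsets[i]] .. indices[offsets[i+1]-1].
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cstdio>

__global__ void count_neighbors(const unsigned int *offsets,
                                const unsigned int *indices,
                                unsigned int numParticles)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return;
    unsigned int begin = offsets[i], end = offsets[i + 1];
    printf("particle %u has %u neighbors, first index %u\n",
           i, end - begin, begin < end ? indices[begin] : 0u);
}

int main()
{
    const unsigned int numParticles = 4;
    // host-side ragged data: particle i has i neighbors (as in the MCVE)
    thrust::host_vector<unsigned int> h_offsets(numParticles + 1);
    h_offsets[0] = 0;
    for (unsigned int i = 0; i < numParticles; i++)
        h_offsets[i + 1] = h_offsets[i] + i;              // prefix sum of counts
    thrust::host_vector<unsigned int> h_indices(h_offsets[numParticles]);
    for (unsigned int i = 0, k = 0; i < numParticles; i++)
        for (unsigned int j = 0; j < i; j++)
            h_indices[k++] = i + j;
    // one device allocation per array, no pointer chasing on the GPU
    thrust::device_vector<unsigned int> d_offsets = h_offsets;
    thrust::device_vector<unsigned int> d_indices = h_indices;
    count_neighbors<<<1, numParticles>>>(
        thrust::raw_pointer_cast(d_offsets.data()),
        thrust::raw_pointer_cast(d_indices.data()), numParticles);
    cudaDeviceSynchronize();
    return 0;
}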

C functions returning an array

Sorry for the post. I have researched this but still have had no joy getting it to work. There are two parts to the question, too. Please ignore the TWI register code, as it is application-specific; I need help with a nuts-and-bolts C problem.
So, to reduce memory usage for a project, I have started to write my own TWI (Wire.h) library for the ATmega328P. It has not been put into a library yet because (1) I have no idea how to do that yet (I will get to it later) and (2) it is a work in progress that keeps getting added to.
The problem I'm having is with reading multiple bytes.
Problem 1
I have a function that I need to return an Array
byte *i2cBuff1[16];
void setup () {
    i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
}
/////////////////////READ BYTES////////////////////
byte* i2cReadBytes(byte i2cAdd, byte i2cReg, byte i2cNumBytes) {
    static byte result[i2cNumBytes];
    for (byte i = 0; i < i2cNumBytes; i ++) {
        result[i] += i2cAdd + i2cReg;
    }
    return result;
}
What I understand :o) is that I have declared a static byte array in the function and return a pointer to it as the function's return value.
The caller expects a pointer to a byte array to be returned, which is what the function supplies.
Well... it doesn't work. I have checked multiple sites and I think this should work. The error message I get is:
MPU6050_I2C_rev1:232: error: incompatible types in assignment of 'byte* {aka unsigned char*}' to 'byte* [16] {aka unsigned char* [16]}'
i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
Problem 2
OK, say the code sample above worked. I am trying to reduce the amount of memory I use in my sketch. Even though the memory used inside a function is released after the call returns, the function must still reserve some amount of 'space' in some way for when it is called. Ideally I would like to avoid static variables within the function that duplicate data already held in the main program.
Does anyone know the trade-off of repeated function calls, i.e. looping a function call with a bit-shift operator, as opposed to calling a function once to complete a process and return an array? Or is that the whole point: that C does not really support returning arrays in the first place?
Hope this made sense, just want to get the best from the little I got.
BR
Danny
This line:
byte *i2cBuff1[16];
declares i2cBuff1 as an array of 16 byte* pointers. But i2cReadBytes doesn't return an array of pointers, it returns an array of bytes. The declaration should be:
byte *i2cBuff1;
Another problem is that a static array can't have a dynamic size. A variable-length array has to be an automatic array, so that its size can change each time the function is called. You should use dynamic allocation with malloc() (I used calloc() instead because it automatically zeroes the memory).
byte* i2cReadBytes(byte i2cAdd, byte i2cReg, byte i2cNumBytes) {
    // cast needed because Arduino sketches are compiled as C++
    byte *result = (byte *)calloc(i2cNumBytes, sizeof(byte));
    for (byte i = 0; i < i2cNumBytes; i ++) {
        result[i] += i2cAdd + i2cReg;
    }
    return result;
}
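For completeness, a sketch of the matching caller, assuming the declarations from the question (byte, mpuAdd, i2cReadBytes) are in scope; since the buffer now lives on the heap, the caller owns it and should free it when done.

byte *i2cBuff1;                                   // a single pointer, not an array of 16 pointers

void setup() {
    i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);    // returns a heap buffer of 16 bytes
    // ... use i2cBuff1[0] .. i2cBuff1[15] ...
    free(i2cBuff1);                               // release it once finished, or it leaks
    i2cBuff1 = NULL;
}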

c - Array of pointer to functions, having different number of arguments

I am making a simple scheduler that executes functions contained in a FIFO queue.
Those functions have a same return type int, but have different number of int arguments.
I tried to implement it this way, but it does not seem to work. The compiler forbids conversion between int(*)(), int(*)(int), int(*)(int, int), or any of that sort. (Arduino Sketch compiler)
Is there a way to solve this problem, or could you recommend a better way around? Thanks!
My code:
typedef int (*fnptr)(); // Tried this!
int foo(int var) {
    return 0;
}
int main() {
    fnptr fp = &foo; // error: invalid conversion from
                     // 'int (*)(int)' to 'int (*)()'
                     // [-fpermissive]
    return 0;
}
You can cast:
fnptr fp = reinterpret_cast<fnptr>(foo);
The ()s are the "function call operator"; adding them makes no sense at all in this situation, since it changes the expression from "take the address of this function" to "take the address of this function's return value".
Note that above I don't even include the &; this is because the name of a function acts pretty much like a function pointer, so it is already an address.
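Keep in mind that calling a function through a pointer of the wrong type is undefined behavior, so cast back to the real signature before the call. A sketch of how that might look in a queue-like setting (Task, nargs and run are illustrative names, not from the original code):

// Sketch: store handlers of different arities behind one fnptr type, but cast
// back to the real signature before calling; calling through the wrong type is
// undefined behavior.
typedef int (*fnptr)();

int foo(int var)      { return var * 2; }
int bar(int a, int b) { return a + b; }

struct Task {
    fnptr fp;
    int   nargs;
};

int run(Task t, int a0, int a1) {
    switch (t.nargs) {
        case 1:  return reinterpret_cast<int (*)(int)>(t.fp)(a0);
        case 2:  return reinterpret_cast<int (*)(int, int)>(t.fp)(a0, a1);
        default: return t.fp();          // genuine zero-argument handlers
    }
}

int main() {
    Task queue[2] = { { reinterpret_cast<fnptr>(foo), 1 },
                      { reinterpret_cast<fnptr>(bar), 2 } };
    int r0 = run(queue[0], 21, 0);       // calls foo(21)  -> 42
    int r1 = run(queue[1], 2, 3);        // calls bar(2,3) -> 5
    return (r0 == 42 && r1 == 5) ? 0 : 1;
}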

go tour when to not use pointer to struct literal in a variable

Per the Go tour page 28 and page 53
They show a variable that is a pointer to a struct literal. Why is this not the default behavior? I'm unfamiliar with C, so it's hard to wrap my head around it. The only time I can see when it might not be more beneficial to use a pointer is when the struct literal is unique and won't be in use for the rest of the program, so you would want it to be garbage collected as soon as possible. I'm not even sure if a modern language like Go even works that way.
My question is this. When should I assign a pointer to a struct literal to a variable, and when should I assign the struct literal itself?
Thanks.
Using a pointer instead of just a struct literal is helpful when:
- the struct is big and you pass it around
- you want to share it, that is, you want all modifications to affect your struct instead of affecting a copy
In other cases, it's fine to simply use the struct literal. For a small struct, you can think about the question just as you would for an int or an *int: most of the time the int is fine, but sometimes you pass a pointer so that the receiver can modify your int variable.
In the Go tour exercises you link to, the Vertex struct is small and has about the same semantics as any number. In my opinion it would have been fine to use it as a struct directly and to define the Scaled function in #53 like this:
func (v Vertex) Scaled(f float64) Vertex {
    v.X = v.X * f
    v.Y = v.Y * f
    return v
}
because having
v2 := v1.Scaled(5)
would create a new vertex just like
var f2 float32 = f1 * 5
creates a new float.
This is similar to how the standard Time struct (defined here) is handled: it is usually kept in variables of type Time and not *Time.
But there is no definite rule and, depending on the use, I could very well have kept both Scale and Scaled.
You're probably right that most of the time you want pointers, but personally I find the need for an explicit pointer refreshing. It makes it so there's no difference between int and MyStruct. They behave the same way.
If you compare this to C# - a language which implements what you are suggesting - I find it confusing that the semantics of this:
static void SomeFunction(Point p)
{
    p.x = 1;
}
static void Main()
{
    Point p = new Point();
    SomeFunction(p);
    // what is p.x?
}
Depend on whether or not Point is defined as a class or a struct.

CLI/C++: void* to System::Object

This is a similar question to this SO post, which I have been unable to use to solve my problem. I have included some code here, which will hopefully help someone to bring home the message that the other posting was getting at.
I want to write a CLI/C++ method that can take a void pointer as a parameter and return the managed object (whose type I know) that it points to. I have a managed struct:
public ref struct ManagedStruct { double a; double b;};
The method I am trying to write takes a void pointer to the managed struct as a parameter and returns the struct:
ManagedStruct^ VoidPointerToObject(void* data)
{
    Object^ result = Marshal::PtrToStructure(IntPtr(data), Object::typeid);
    return (ManagedStruct^)result;
}
The method is called here:
int main(array<System::String ^> ^args)
{
    // The instance of the managed type is created:
    ManagedStruct^ myData = gcnew ManagedStruct();
    myData->a = 1; myData->b = 2;
    // Suppose there was a void pointer that pointed to this managed struct
    void* voidPtr = &myData;
    // A method to return the original struct from the void pointer
    Object^ result = VoidPointerToObject(voidPtr);
    return 0;
}
It crashes in the VoidPointerToObject method on calling PtrToStructure, with the error: The specified structure must be blittable or have layout information
I know this is an odd thing to do, but it is a situation I have encountered a few times, especially when unmanaged code makes a callback to managed code and passes a void* as a parameter.
(original explanation below)
If you need to pass a managed handle as a void* through native code, you should use
void* voidPtr = GCHandle::ToIntPtr(GCHandle::Alloc(o)).ToPointer();
// ...
GCHandle h = GCHandle::FromIntPtr(IntPtr(voidPtr));
Object^ result = h.Target;
h.Free();
(or use the C++/CLI helper class gcroot)
Marshal::PtrToStructure works on value types.
In C++/CLI, that means value class or value struct. You are using ref struct, which is a reference type despite use of the keyword struct.
A related problem:
void* voidPtr = &myData;
doesn't point to the object, it points to the handle.
In order to create a native pointer to data on the managed heap, you need to use pinning. For this reason, conversion between void* and Object^ isn't as useful as first glance suggests.
