Segmentation fault when assigning integer to pointer

I'm trying to assign my node value to a pointer, but gdb shows a segmentation fault when the code is run. What can I do?
void biggerPotion(No* node, int bottleSize, int *aux){
    if(node == NULL)
        return;
    biggerPotion(node->left, bottleSize, aux);
    biggerPotion(node->right, bottleSize, aux);
    if((node->value >= bottleSize) && (node->value < *aux))
        *aux = node->value; // here is the issue
}
Other relevant parts of the code are:
for(i = 0; i < nBottles; i++){
    a = 1000; // a is an int I declared earlier
    biggerPotion(potions, bottleSize[i], &a);
}

Okay, since the errant line is:
*aux = node->value;
then either aux is the problem or node is (because they're the only two pointers being dereferenced on that line).
I would print them both out before executing that if block just to be certain:
fprintf(stderr, "node is %p, aux is %p\n", (void *)node, (void *)aux);
Given the heavy use of node and the single use of aux, it's probably the latter that's causing the issue, in which case you should examine what you're passing to the top-level call of biggerPotion. You should post that top-level call, including the declaration of whatever variable you're passing in.
In any case, you can test that by simply changing:
*aux = node->value;
into:
{
    int temp = node->value;
    (void)temp; // avoid an unused-variable warning
}
If the problem disappears then it's definitely the aux pointer being wrong somehow. Make sure you are actually passing in a pointer, such as with:
int myVar;
biggerPotion(rootNodePtr, 42, &myVar);
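For contrast, here is a hypothetical broken caller (the names are illustrative, not from your post) that would produce exactly this crash:

int *a;                                   // uninitialized pointer, points nowhere valid
biggerPotion(potions, bottleSize[0], a);  // *aux = node->value then writes through garbage
// correct: declare "int a;" and pass &a, as in the loop you posted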

Related

Swap two adjacent nodes in a doubly linked list

I am learning the concept of pointers in C. I wrote the function below to swap two adjacent nodes in a doubly linked list:
void swapNode(DLListNode *a, DLListNode *b)
{
    DLListNode *temp = a;
    a->value = b->value;
    b->value = temp->value;
}
It doesn't work: the value of b is copied into a successfully, but the value of a is not copied into b. Then I found that if I write the code like this, it works. Could someone please explain the difference to me? Much appreciated.
void swapNode(DLListNode *a, DLListNode *b)
{
    DLListNode temp = *a;
    a->value = b->value;
    b->value = temp.value;
}
The first version does not take a copy of the value that a points to. It merely creates a second reference to the node that a already references. When a->value gets a new value, that is of course synonymous with temp->value getting a new value.
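Concretely, a two-line illustration of that aliasing (with a made-up value):

DLListNode *temp = a; // temp and a now point to the SAME node
a->value = 5;         // ...so temp->value is now 5 as well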
In the second version, you create a node, which gets its properties from what a references. So here you do make a copy of a value property (and the next and prev properties). Now, when a->value gets changed, temp is unrelated to that change, and so temp.value is still what it was before that assignment to a->value. And that is exactly what you need to happen to make a successful swap.
It would even be possible to copy only the value property, and not the whole node (which also has other properties, like prev and next), since you really only need a copy of value and nothing else (I will assume here that value is an int):
void swapNode(DLListNode *a, DLListNode *b)
{
    int value = a->value;
    a->value = b->value;
    b->value = value;
}
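A hypothetical usage example (assuming value is an int field and that zero-initializing the other members is fine):

DLListNode n1 = {0}, n2 = {0};
n1.value = 1;
n2.value = 2;
swapNode(&n1, &n2); // now n1.value == 2 and n2.value == 1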

Why does thrust::device_vector not seem to be able to hold raw pointers to other device_vectors?

I found many threads on this, but none explicitly answered my question.
I am trying to have a multidimensional array inside a GPU kernel using Thrust. Flattening would be difficult, as the dimensions are non-homogeneous and I go up to 4D. Now I know I cannot have device_vectors of device_vectors, for whatever underlying reason (an explanation would be welcome), so I tried going the route of raw pointers.
My reasoning is: a raw pointer points to memory on the GPU, why else would I be able to access it from within the kernel? So I should technically be able to have a device_vector which holds raw pointers, all of them accessible from within the GPU. With this reasoning I constructed the following code:
thrust::device_vector<Vector3r*> d_fluidmodelParticlePositions(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);

for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
    FluidModel *model = sim->getFluidModelFromPointSet(fluidModelIndex);
    const unsigned int numParticles = model->numActiveParticles();

    thrust::device_vector<Vector3r> d_neighborPositions(model->getPositions().begin(), model->getPositions().end());
    d_fluidmodelParticlePositions[fluidModelIndex] = CudaHelper::GetPointer(d_neighborPositions);

    thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
    thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
    for(unsigned int pid = 0; pid < nModels; pid++)
    {
        FluidModel *fm_neighbor = sim->getFluidModelFromPointSet(pid);

        thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
        thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);

        for(unsigned int i = 0; i < numParticles; i++)
        {
            const unsigned int nNeighbors = sim->numberOfNeighbors(fluidModelIndex, pid, i);
            d_nNeighbors[i] = nNeighbors;

            thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
            for(unsigned int j = 0; j < nNeighbors; j++)
            {
                d_neighborIndexes[j] = sim->getNeighbor(fluidModelIndex, pid, i, j);
            }
            d_neighborIndexesArray[i] = CudaHelper::GetPointer(d_neighborIndexes);
        }
        d_fluidNeighborIndexes[pid] = CudaHelper::GetPointer(d_neighborIndexesArray);
        d_nNeighborsFluid[pid] = CudaHelper::GetPointer(d_nNeighbors);
    }
    d_allFluidNeighborParticles[fluidModelIndex] = CudaHelper::GetPointer(d_fluidNeighborIndexes);
    d_nFluidNeighborsCrossFluids[fluidModelIndex] = CudaHelper::GetPointer(d_nNeighborsFluid);
}
Now the compiler won't complain, and accessing, for example, d_nFluidNeighborsCrossFluids from within the kernel works, but it returns wrong values. I access it like this (again, from within a kernel):
d_nFluidNeighborsCrossFluids[iterator1][iterator2][iterator3];
// Note: out of bounds indexing guaranteed to not happen, indexing is definitely right
The question is, why does it return wrong values? The logic behind it should work in my opinion, since my indexing is correct and the pointers should be valid addresses from within the kernel.
Thank you already for your time and have a great day.
EDIT:
Here is a minimal reproducible example. For some reason the values appear right despite it having the same structure as my code, but cuda-memcheck reveals some errors. Uncommenting the two commented lines leads me to the main problem I am trying to solve. What is cuda-memcheck telling me here?
/* Part of this example has been taken from code of Robert Crovella
   in a comment below */
#include <thrust/device_vector.h>
#include <stdio.h>

template<typename T>
static T* GetPointer(thrust::device_vector<T> &vector)
{
    return thrust::raw_pointer_cast(vector.data());
}

__global__
void k(unsigned int ***nFluidNeighborsCrossFluids, unsigned int ****allFluidNeighborParticles){
    const unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    if(i > 49)
        return;
    printf("i: %d nNeighbors: %d\n", i, nFluidNeighborsCrossFluids[0][0][i]);
    //for(int j = 0; j < nFluidNeighborsCrossFluids[0][0][i]; j++)
    //    printf("i: %d j: %d neighbors: %d\n", i, j, allFluidNeighborParticles[0][0][i][j]);
}

int main(){
    const unsigned int nModels = 2;
    const int numParticles = 50;

    thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
    thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);

    for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
    {
        thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
        thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
        for(unsigned int pid = 0; pid < nModels; pid++)
        {
            thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
            thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
            for(unsigned int i = 0; i < numParticles; i++)
            {
                const unsigned int nNeighbors = i;
                d_nNeighbors[i] = nNeighbors;
                thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
                for(unsigned int j = 0; j < nNeighbors; j++)
                {
                    d_neighborIndexes[j] = i + j;
                }
                d_neighborIndexesArray[i] = GetPointer(d_neighborIndexes);
            }
            d_nNeighborsFluid[pid] = GetPointer(d_nNeighbors);
            d_fluidNeighborIndexes[pid] = GetPointer(d_neighborIndexesArray);
        }
        d_nFluidNeighborsCrossFluids[fluidModelIndex] = GetPointer(d_nNeighborsFluid);
        d_allFluidNeighborParticles[fluidModelIndex] = GetPointer(d_fluidNeighborIndexes);
    }

    k<<<256, 256>>>(GetPointer(d_nFluidNeighborsCrossFluids), GetPointer(d_allFluidNeighborParticles));

    if (cudaGetLastError() != cudaSuccess)
        printf("Sync kernel error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaDeviceSynchronize();
}
A device_vector is a class definition. That class has various methods and operators associated with it. The thing that allows you to do this:
d_nFluidNeighborsCrossFluids[...]...;
is the square-bracket operator (operator[]). That operator is a host-only operator; it is not usable in device code. Issues like this give rise to the general statement that "thrust::device_vector is not usable in device code." The device_vector object itself is generally not usable there. However, the data it contains is usable in device code, if you access it via a raw pointer.
Here is an example of a thrust device vector that contains an array of pointers to the data contained in other device vectors. That data is usable in device code, as long as you don't attempt to make use of the thrust::device_vector object itself:
$ cat t1509.cu
#include <thrust/device_vector.h>
#include <stdio.h>

template <typename T>
__global__ void k(T **data){
    printf("the first element of vector 1 is: %d\n", (int)(data[0][0]));
    printf("the first element of vector 2 is: %d\n", (int)(data[1][0]));
    printf("the first element of vector 3 is: %d\n", (int)(data[2][0]));
}

int main(){
    thrust::device_vector<int> vector_1(1,1);
    thrust::device_vector<int> vector_2(1,2);
    thrust::device_vector<int> vector_3(1,3);
    thrust::device_vector<int *> pointer_vector(3);
    pointer_vector[0] = thrust::raw_pointer_cast(vector_1.data());
    pointer_vector[1] = thrust::raw_pointer_cast(vector_2.data());
    pointer_vector[2] = thrust::raw_pointer_cast(vector_3.data());
    k<<<1,1>>>(thrust::raw_pointer_cast(pointer_vector.data()));
    cudaDeviceSynchronize();
}
$ nvcc -o t1509 t1509.cu
$ cuda-memcheck ./t1509
========= CUDA-MEMCHECK
the first element of vector 1 is: 1
the first element of vector 2 is: 2
the first element of vector 3 is: 3
========= ERROR SUMMARY: 0 errors
$
EDIT: In the mcve you have now posted, you point out that an ordinary run of the code appears to give correct results, but when you use cuda-memcheck, errors are reported. You have a general design problem that will cause this.
In C++, when an object is defined within a curly-braces region:
{
    {
        Object A;
        // object A is in-scope here
    }
    // object A is out-of-scope here
}
// object A is out-of-scope here
k<<<...>>>(anything that points to something in object A); // is illegal
and you exit that region, the object defined within the region is now out of scope. For objects with constructors/destructors, this usually means the destructor of the object will be called when it goes out-of-scope. For a thrust::device_vector (or std::vector) this will deallocate any underlying storage associated with that vector. That does not necessarily "erase" any data, but attempts to use that data are illegal and would be considered UB (undefined behavior) in C++.
When you establish pointers to such data inside an in-scope region, and then go out-of-scope, those pointers no longer point to anything that would be legal to access, so attempts to dereference the pointer would be illegal/UB. Your code is doing this. Yes, it does appear to give the correct answer, because nothing is actually erased on deallocation, but the code design is illegal, and cuda-memcheck will highlight that.
I suppose one fix would be to pull all this stuff out of the inner curly-braces, and put it at main scope, just like the d_nFluidNeighborsCrossFluids device_vector is. But you might also want to rethink your general data organization strategy and flatten your data.
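For example, a sketch of that first fix (illustrative only, not tested against your full code; it assumes a Thrust version whose device_vector is movable): stash every inner vector in a container that lives at main scope, so the raw pointers stored in the pointer tables remain valid when the kernel runs.

std::vector<thrust::device_vector<unsigned int>> keepAlive; // lives at main scope

// ... inside the nested loops, instead of letting d_neighborIndexes
// be destroyed at the closing brace:
keepAlive.push_back(std::move(d_neighborIndexes)); // the storage now outlives the loop
d_neighborIndexesArray[i] = GetPointer(keepAlive.back());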
You should really provide a minimal, complete, verifiable/reproducible example; yours is neither minimal, nor complete, nor verifiable.
I will, however, answer your side-question:
I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome)
While a device_vector refers to a bunch of data on the GPU, it's a host-side data structure; otherwise you would not have been able to use it in host-side code. On the host side, what it holds is something like: the capacity, the size in elements, the device-side pointer to the actual data, and maybe more information. This is similar to how an std::vector variable may refer to data that's on the heap, but if you create the variable locally, the fields I mentioned above will exist on the stack.
Now, those fields of the device vector that are located in host memory are not generally accessible from the device-side. In device-side code you would typically use the raw pointer to the device-side data the device_vector manages.
Also, note that if you have a thrust::device_vector<T> v, each use of operator[] means a bunch of separate CUDA calls to copy data to or from the device (unless there's some caching going on under the hood). So you really want to avoid using square brackets with this structure.
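For instance (a minimal sketch, not from the original post), one bulk transfer replaces a thousand per-element v[i] accesses:

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <vector>

int main(){
    thrust::device_vector<int> v(1000, 42);
    std::vector<int> host(v.size());
    thrust::copy(v.begin(), v.end(), host.begin()); // one device-to-host copy instead of 1000
}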
Finally, remember that pointer-chasing can be a performance killer, especially on a GPU. You might want to consider massaging your data structure somewhat in order to make it amenable to flattening.
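As a sketch of what flattening could look like here (hypothetical names, CSR-style): concatenate all neighbor indices into one flat array plus an offsets array, so the kernel does index arithmetic instead of chasing three levels of pointers.

#include <thrust/device_vector.h>

// host-side sketch: neighbors of particle i live in
// indices[offsets[i]] .. indices[offsets[i+1]-1]
thrust::device_vector<unsigned int> indices; // all neighbor indices, concatenated
thrust::device_vector<unsigned int> offsets; // numParticles + 1 entries

// device-side access, via raw pointers passed to the kernel:
//   unsigned int begin = offsets[i], end = offsets[i + 1];
//   for (unsigned int k = begin; k < end; ++k) use(indices[k]);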

Understanding the method for OpenCL reduction on float

Following this link, I am trying to understand how this kernel code operates (there are two versions of it, one with volatile local float *source and the other with volatile global float *source, i.e. local and global versions). Below I take the local version:
float sum = 0;
void atomic_add_local(volatile local float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand well, each work-item shares access to the source variable thanks to the volatile qualifier, doesn't it?
Afterwards, if I take a work-item, the code adds the operand value to the newVal.floatVal variable. Then atomic_cmpxchg is called, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) are still valid, i.e. by comparing the value stored at address source with prevVal.intVal.
During this atomic operation (which is not interruptible, by definition), if the value stored at source is different from prevVal.intVal, then the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes, like an integer).
Can we say that each work-item has mutex access (I mean a locked access) to the value located at the source address?
But for each work-item thread, is there only one iteration of the while loop?
I think there will be one iteration, because the comparison "*source == prevVal.intVal ? newVal.intVal : newVal.intVal" will always assign the newVal.intVal value to the value stored at the source address, won't it?
I have not understood all the subtleties of this trick for this kernel code.
Update
Sorry, I think I now understand almost all the subtleties, especially in the while loop:
First case: for a given single thread, if at the call of atomic_cmpxchg the value prevVal.floatVal is still equal to *source, then atomic_cmpxchg changes the value pointed to by source and returns the old value, which is equal to prevVal.intVal, so we break out of the while loop.
Second case: if, between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg, the value *source has been changed (by another thread?), then atomic_cmpxchg returns the old value, which is no longer equal to prevVal.intVal, so the while condition is true and we stay in the loop until the first case applies.
Is my interpretation correct?
If I understand well, each work-item shares access to the source variable thanks to the volatile qualifier, doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
    int old = *ptr;
    if (old == expected) {
        *ptr = new;
    }
    return old;
}
To summarize, each thread, individually, does:
1. Load the current value of *source from memory into prevVal.floatVal
2. Compute the desired value of *source in newVal.floatVal
3. Execute the atomic compare-exchange described above (using the type-punned values)
4. If the result of atomic_cmpxchg == prevVal.intVal, the compare-exchange was successful: break. Otherwise, the exchange didn't happen; go to 1 and try again.
The above loop eventually terminates, because eventually each thread succeeds in its atomic_cmpxchg.
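The same loop shape can be written with C++11 atomics on the host (a sketch for illustration only; this is not OpenCL code), type-punning the float through a 32-bit integer exactly as the kernel does:

#include <atomic>
#include <cstring>

void atomic_add_float(std::atomic<unsigned int> *source, float operand) {
    unsigned int prev = source->load();
    unsigned int next;
    do {
        float f;
        std::memcpy(&f, &prev, sizeof f);   // reinterpret the bits as float
        f += operand;
        std::memcpy(&next, &f, sizeof next);
        // on failure, compare_exchange_weak reloads *source into prev,
        // so the next iteration retries with fresh data
    } while (!source->compare_exchange_weak(prev, next));
}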
Can we say that each work-item has mutex access (I mean a locked access) to the value located at the source address?
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.

Memory leak, Pointer changing reference

I'm writing a signal processing routine using the PortAudio library. I'm using a structure which contains a pointer to float that is intended to be used as a buffer, and I pass it to an audio callback function.
My problem is that after the callback processing is finished, my pointer has changed and thus cannot be freed. This is not such a big deal, but the thing is that I don't understand when and how the pointer is changed, and I get the feeling that I'm missing something important.
Here is a simplified version of the code :
typedef struct{
    float* tmp;
    //other stuff
} Data;

Data data;
data.tmp = NULL;
data.tmp = (float*) calloc(N, sizeof(float)); // N is the size of the buffer

Pa_OpenDefaultStream(some args, // opens a PortAudio stream and passes &data to the callback
                     callback,
                     &data);
A stream is then started in another high-priority thread and the callback is executed as many times as needed. During the callback, tmp is used as a ring buffer and new data is constantly copied into it.
static int callback(args, void* data){
    Data* x = (Data*) tmp;
    x->tmp = update();
}
where update() returns a pointer to a float which is initialized the same way as tmp is (calloc).
float* update(){
    //do stuff
    return m_tmp2;
}

float* m_tmp2 = (float*) calloc(N, sizeof(float)); // same N as before
But after the stream is closed I get an error when calling free before quitting.
free(data.tmp); // throws a SIGABRT error
Some breakpoint debugging showed me that the pointer's value changes during the callback processing, but I don't get when and how it happens, because everything else runs smoothly. It must be something during the callback execution, and I'm sure update() returns a pointer to a buffer of the same size as tmp. Or is it linked to PortAudio?
Please, any clues?
I'm not really sure I understand it right. You allocate the float buffer (x->tmp) every time the callback function is called:
static int callback(args, void* data){
    Data* x = (Data*) tmp;
    x->tmp = update();
}
I assume the above is a typo and you actually mean:
static int callback(args, void* data){
    Data* x = (Data*) data;
    x->tmp = update();
}
Well, you actually change the pointer value of tmp by assigning it the result of update(), because update() returns a different memory location on the heap, which changes where tmp points:
float* update(){
    //do stuff
    return m_tmp2;
}
So data.tmp will point to a new location every time the callback function is called. That is exactly the behavior you describe, so I don't see anything unexpected here. Maybe I'm missing something?
Also, you should probably provide a mechanism to keep track of the buffers, so that every tmp (float *) you allocate for your circular buffer can be freed, not just the first one allocated before the first callback is called.
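A minimal sketch of that bookkeeping (illustrative only; it assumes it is acceptable to call free inside the callback, which real-time audio code usually avoids): release the previous buffer before overwriting the pointer.

static int callback(/* args */ void *userData) {
    Data *x = (Data *)userData;
    float *old = x->tmp;
    x->tmp = update(); // update() returns a freshly calloc'd buffer
    free(old);         // release the buffer being replaced, so nothing leaks
    return 0;
}

With this, the final free(data.tmp) at shutdown then releases only the last buffer returned by update().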

pointer vs double pointer for Linked List and Binary Tree

For a singly linked list:
1.1. This is what I saw in a tutorial (I only copied the important part):
sortedInsert(Node **root, int key){};
int main(){
    Node *root = &a;
    sortedInsert(&root, 4);
}
1.2. However, I just used a single pointer rather than a double pointer, and everything worked fine; I could insert the key successfully.
sortedInsert(Node *root, int key){};
int main(){
    Node *root = &a;
    sortedInsert(root, 4);
}
For a binary tree:
2.1. From the tutorial (double pointer):
void insert_Tree(Tree **root, int key){
}
int main(){
    Tree *root = NULL;
    insert_Tree(&root, 10);
}
2.2. What I did is below (single pointer). I failed to insert the key; when I checked the node after insertion, it was still NULL.
void insert_Tree(Tree *root, int key){
    if(root == NULL){
        root = (Tree *)malloc(sizeof(Tree));
        root->val = key;
        root->left = NULL;
        root->right = NULL;
        cout<<"insert data "<<key<<endl;
    }else if(key < root->val){
        insert_Tree(root->left, key);
        cout<<"go left"<<endl;
    }else{
        insert_Tree(root->right, key);
        cout<<"go right"<<endl;
    }
}
int main(){
    Tree *root = NULL;
    insert_Tree(root, 10);
}
I have a few questions:
1) Which is right: the 1.1/2.1 double pointer or the 1.2/2.2 single pointer? Please explain in detail; it would be even better if you could show an example. I think both of them are right.
2) Why did I insert the key successfully into the linked list with a single pointer, yet fail in the tree insertion with a single pointer?
Thanks very much, I appreciate everyone's help.
I suspect you were lucky with your linked list test. Try inserting something at the head of the list.
To expand on that...
main() has a pointer to the head of the list, which it passes by value into your version of sortedInsert(). If sortedInsert() inserts into the middle or end of the list, then no problem: the head is not changed, and when it returns to main() the head is the same. However, if your version of sortedInsert() has to insert a new head, fine, it can do that, but how does it return the information about the new head back to main()? It can't; when it returns, main() will still be pointing at the old head.
Passing a pointer to main()'s copy of the head pointer allows sortedInsert() to change its value if it has to.
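To make that concrete for your tree case, here is a sketch of 2.2 rewritten with a double pointer (the same idea as 2.1; it shows the shape of the fix):

void insert_Tree(Tree **root, int key){
    if(*root == NULL){
        *root = (Tree *)malloc(sizeof(Tree));
        (*root)->val = key;
        (*root)->left = NULL;
        (*root)->right = NULL;
    }else if(key < (*root)->val){
        insert_Tree(&(*root)->left, key); // pass the ADDRESS of the child pointer
    }else{
        insert_Tree(&(*root)->right, key);
    }
}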
Both of your approaches are correct, but where you used a single pointer, your head pointer isn't being updated. All you need to do is return the new head by writing return head; at the end of your function.
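A sketch of that alternative (hypothetical signature, assuming an int payload):

Node *sortedInsert(Node *head, int key){
    /* ... insert the node, possibly creating a new head ... */
    return head; // caller: root = sortedInsert(root, 4);
}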
