Passing a pointer based struct to cuda - pointers

I have C code which uses a pointer to a struct. I'm trying to figure out how to pass it to cuda without much luck.
I have
typedef struct node { /* describes a tip species or an ancestor */
struct node *next, *back; /* pointers to nodes */
etc...
} node;
Then
typedef node **pointptr;
static pointptr treenode;
In my code I iterate through all of these, and I'm trying to figure out how to pass them to the kernel so I can perform the following operation:
for (i = 1; i <= nonodes; i++) {
treenode[i - 1]->back = NULL;
etc....
}
But I can't figure out how to pass it.
Any ideas?

The problem is that in order to use your tree inside the kernel, your next and back should probably point somewhere in device memory. Assuming you construct your tree on the host and then pass it, you could do something like:
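node* traverse(node* n) {
    if (n == NULL)
        return NULL;
    node x = *n; /* copy the non-pointer fields as well */
    node* d;
    /* assumes the links form a tree; cyclic next/back links would recurse forever */
    x.back = traverse(n->back);
    x.next = traverse(n->next);
    cudaMalloc((void**)&d, sizeof(node));
    cudaMemcpy(d, &x, sizeof(node), cudaMemcpyHostToDevice);
    return d;
}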
and by calling it on the root you'd end up with a pointer to the root of the tree in device memory, which you could pass to your kernel directly. I haven't tested this code, and you'd have to write something similar to delete the tree afterwards.
Alternatively, you could store your tree nodes contiguously inside an array, with indices in the back and next instead of pointers (possibly changing them back to pointers in device code if necessary).
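The index-based layout mentioned above could be built on the host roughly as follows. This is only a sketch: Node is a hypothetical stand-in for the question's struct with the payload fields omitted, and FlatNode, flatten and the use of -1 as the "null" index are names and conventions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the question's node; payload fields omitted.
struct Node {
    Node* next = nullptr;
    Node* back = nullptr;
};

// Index-based copy of a node: -1 plays the role of NULL.
struct FlatNode {
    std::int32_t next = -1;
    std::int32_t back = -1;
};

// Convert an array of host node pointers (like treenode) into a
// contiguous array whose links are indices into that same array.
std::vector<FlatNode> flatten(const std::vector<Node*>& nodes) {
    // Map each host pointer to its position in the array.
    std::unordered_map<const Node*, std::int32_t> index_of;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        assert(nodes[i] != nullptr);
        index_of[nodes[i]] = static_cast<std::int32_t>(i);
    }
    auto to_index = [&](const Node* p) -> std::int32_t {
        if (p == nullptr) return -1;
        auto it = index_of.find(p);
        // Links to nodes outside the array cannot be represented.
        return it == index_of.end() ? -1 : it->second;
    };
    std::vector<FlatNode> flat(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        flat[i].next = to_index(nodes[i]->next);
        flat[i].back = to_index(nodes[i]->back);
    }
    return flat;
}
```

Because the result is one contiguous block with no host pointers in it, it can be copied to the device with a single cudaMemcpy of flat.size() * sizeof(FlatNode) bytes, and cyclic links are not a problem.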

Check this question:
Copying a multi-branch tree to GPU memory
Although it does not answer your question exactly, I think it may clear some things out and ultimately help you tackle your problem.


OpenCL - Storing a large array in private memory

I have a large array of floats called source_array, with around 50,000 elements. I am currently trying to apply a collection of modifications to the array and evaluate the result. Basically, in pseudocode:
__kernel void doSomething (__global float *source_array, __global boolean *res, __global int *mod_value) {
// Modify values of source_array with mod_value;
// Evaluate the modified array.
}
So in the process I would need a variable to hold the modified array, because source_array should be constant for all work items. If I modify it directly, it might interfere with another work item (not sure if I am right here).
The problem is that the array is too big for private memory, so I can't declare it in kernel code. What should I do in this case?
I considered adding another parameter to the kernel to serve as a placeholder for the modified array, but again it would interfere with other work items.
Private "memory" on GPUs literally consists of registers, which generally are in short supply. So the __private address space in OpenCL is not suitable for this as I'm sure you've found.
Victor's answer is correct - if you really need temporary memory for each work item, you will need to create a (global) buffer object. If all work items need to independently mutate it, it will need a size of <WORK-ITEMS> * <BYTES-PER-ITEM> and each work-item will need to use its own slice of the buffer. If it's only temporary, you never need to copy it back to host memory.
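The slice arithmetic described above can be sketched on the host like this. It is a plain C++ simulation, not kernel code: in a real kernel gid would come from get_global_id(0) and scratch would be a __global buffer. The function name run_work_item and the "evaluation" (a simple sum) are hypothetical placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates one work item: it copies a modified version of `source`
// into its own slice of `scratch` and evaluates that copy.
// The shared `source` array is never written.
float run_work_item(std::size_t gid,
                    const std::vector<float>& source,
                    std::vector<float>& scratch,
                    float mod_value) {
    const std::size_t n = source.size();
    // scratch must hold WORK-ITEMS * n elements.
    assert(scratch.size() >= (gid + 1) * n);
    float* mine = scratch.data() + gid * n;  // this work item's slice
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        mine[i] = source[i] + mod_value;     // modify the private copy
        sum += mine[i];                      // placeholder evaluation
    }
    return sum;
}
```

Each work item only ever touches indices gid * n to (gid + 1) * n - 1, so no two work items can interfere with each other.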
However, this sounds like an access pattern that will work very inefficiently on GPUs. You will do much better if you decompose your problem differently. For example, you may be able to make whole work-groups cooperate on one subrange of the array at a time: the group copies the subrange into local (group-shared) memory, divides the work between its work items, writes the results back to global memory, then reads the next subrange into local memory, and so on. Coordinating between work-items in a group is much more efficient than having each work item access a huge range of global memory. We can only help you with this algorithmic approach if you are more specific about the computation you are trying to perform.
Why not initialize this array in a host memory buffer and wrap it in an OpenCL buffer object? For example:
const size_t buffer_size = 50000 * sizeof(float);
/* allocate with malloc, new float[50000], or a static initializer = {0.1f, 0.2f, ...} */
float *host_array_ptr = (float*)malloc(buffer_size);
/*
put your data into host_array_ptr here
*/
cl_int err_code;
cl_mem my_array = clCreateBuffer( my_cl_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, buffer_size, host_array_ptr, &err_code );
Then you can use this cl_mem my_array in the OpenCL kernel.

Store all the node pointers in an array of pointers for Binary Search Tree

Recently I was trying to manipulate the binary search tree and got stuck here. I want to have an array(array of pointers) inside which I want to store the pointers of each node of the binary search tree in in-order fashion. I DON'T NEED THE VALUE OF EACH NODE I need the pointers so that I can access their value, left subtree and right subtree. What I have done is
struct node{
int key;
struct node *left, *right;
};
node **arr;
int x=0;
void inorder(struct node *root){
if (root != NULL){
inorder(root->left);
//cout<<"X : "<<x<<endl;
arr[x] = root;
x++;
printf("%d \n", root->key);
inorder(root->right);
}
}
Please help. Thanks.
You can do that, but if a sorted array of node pointers satisfies your needs, then you don't need a binary search tree: you can perform binary search on the array. This data structure has the same access speed as a tree (it can even be slightly faster because the data is tightly packed in memory) and is very memory efficient. But insertion of new data is costly: O(n). So this solution is not appropriate if many insertions are expected. But in that case, by maintaining the sorted array as well, you lose all the benefits of the tree structure.
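As a rough sketch of that idea, the in-order traversal can fill a std::vector instead of a raw array, and std::lower_bound can then search it. The function name find_node is made up for this example, and the struct mirrors the one in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct node {
    int key;
    node* left;
    node* right;
};

// Append the node pointers to `out` in ascending key order.
void inorder(node* root, std::vector<node*>& out) {
    if (root == nullptr) return;
    inorder(root->left, out);
    out.push_back(root);
    inorder(root->right, out);
}

// Binary search on the sorted pointer array; returns nullptr if absent.
node* find_node(const std::vector<node*>& sorted, int key) {
    // Debug-only precondition check (O(n); compiled out with NDEBUG).
    assert(std::is_sorted(sorted.begin(), sorted.end(),
                          [](const node* a, const node* b) {
                              return a->key < b->key;
                          }));
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const node* n, int k) { return n->key < k; });
    return (it != sorted.end() && (*it)->key == key) ? *it : nullptr;
}
```

Using a vector also avoids having to allocate arr up front, since push_back grows the storage as the traversal proceeds.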

Qt: function returns object, putting it into a pointer

Hopefully this isn't too stupid but I want to make sure I'm doing this right.
Some Qt functions return Qt objects as values, but we may want to store them in a pointer somewhere. For example, in QDomDocument, the function documentElement returns a QDomElement, not a pointer to it. Now, as a member of my class I have:
QDomElement *listRootElement;
In a function that sets things up I am using this:
listRootElement = new QDomElement;
*listRootElement = mainIndex->documentElement();
(mainIndex is a QDomDocument.)
This seems to work, but I just want to make sure I'm doing it right and that nothing will come back to bite me.
It would be very similar for some of the image functions where a QPixmap might be returned, and I want to maintain pointers to QPixmap objects.
Thanks for any comments!
Assuming that you want to store a pointer to a QDomElement for some reason, and assuming that you are aware of the potential pitfalls with pointers (like, two pointers might point to the same object):
The only thing to keep in mind is that the popular 'parent takes care of deleting children' system which Qt uses is only available for QObject (sub-)classes. So when new'ing a QString or a QDomElement or something like that, keep in mind that you do have to delete it yourself, too.
I'm guessing, but I think this:
listRootElement = new QDomElement(mainIndex->documentElement());
...may allow the compiler to optimise better (see this question for some reasoning).
You're overwriting the initially allocated object:
QDomElement *listRootElement; // undefined ptr value, I'd prefer null or new right away
listRootElement = new QDomElement;
*listRootElement = mainIndex->documentElement();
You're essentially doing:
int *x = new int(42);
*x = 47;
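This works because both QDomElement and int implement the assignment operator (=).
Note that there's no need to delete anything here: the returned temporary is copied into your newly allocated object and is then destroyed automatically. The object you allocated with new is still yours to delete later.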

How do game trainers change an address in memory that's dynamic?

Let's assume I am a game and I have a global int* that contains my health. A game trainer's job is to modify this value to whatever in order to achieve god mode. I've looked up tutorials on game trainers to understand how they work, and the general idea is to use a memory scanner to try and find the address of a certain value, then modify the value at this address by injecting a DLL or whatever.
But I made a simple program with a global int* and its address changes every time I run the app, so I don't get how game trainers can hard code these addresses? Or is my example wrong?
What am I missing?
The way this is usually done is by tracing the pointer chain from a static variable up to the heap address containing the variable in question. For example:
#include <vector>
struct CharacterStats
{
    int health;
    // ...
};
void die(); // defined elsewhere
class Character
{
public:
    CharacterStats* stats;
    // ...
    void hit(int damage)
    {
        stats->health -= damage;
        if (stats->health <= 0)
            die();
    }
};
class Game
{
public:
    Character* main_character;
    std::vector<Character*> enemies;
    // ...
};
Game* game;
int main()
{
    game = new Game();
    game->main_character = new Character();
    game->main_character->stats = new CharacterStats;
    // ...
}
In this case, if you follow mikek3332002's advice and set a breakpoint inside the Character::hit() function and nop out the subtraction, it would cause all characters, including enemies, to be invulnerable. The solution is to find the address of the "game" variable (which should reside in the data segment or a function's stack), and follow all the pointers until you find the address of the health variable.
Some tools, e.g. Cheat Engine, have functionality to automate this, and attempt to find the pointer chain by themselves. You will probably have to resort to reverse-engineering for more complicated cases, though.
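Following such a chain amounts to "load a pointer, add an offset, repeat". A minimal in-process sketch is shown below. The struct layouts mirror the example above but are simplified, resolve_chain is a made-up name, and the code assumes that all object pointers share one representation, which holds on mainstream platforms. A real external trainer would replace each memcpy with a cross-process read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified, hypothetical layouts mirroring the example above.
struct CharacterStats { int armor; int health; };
struct Character { int id; CharacterStats* stats; };
struct Game { Character* main_character; };

// `base` is the address of a static pointer variable. The pointer stored
// there is loaded first. Every offset except the last is added to the
// current address and a new pointer is loaded from that location; the
// last offset is added without dereferencing.
void* resolve_chain(const void* base,
                    const std::vector<std::size_t>& offsets) {
    assert(base != nullptr && !offsets.empty());
    unsigned char* addr = nullptr;
    std::memcpy(&addr, base, sizeof addr);  // load the static pointer
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
        // Load the pointer member stored at addr + offsets[i].
        std::memcpy(&addr, addr + offsets[i], sizeof addr);
    }
    return addr + offsets.back();
}
```

For the layouts above, the chain for the health value would be the address of the game variable plus the offsets offsetof(Game, main_character), offsetof(Character, stats) and offsetof(CharacterStats, health). The offsets stay the same from run to run even though the heap addresses change.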
Discovering the access pointers is quite cumbersome, and static memory values are difficult to adapt to different compilers or game versions.
There is another method besides following pointers: API hooking of malloc(), free(), etc. Discovery starts with recording all dynamic memory allocations while doing a memory search in parallel. The heap memory address that was found is then reverse matched against the recorded memory allocations. This tells you the size of the object and the offset of your value within the object. You repeat this with backtracing and get the jump-back code address of a malloc() call or a C++ constructor. With that information you can track and modify all objects which get allocated from there. You can dump the objects, compare them, and find a lot more interesting values. For example, the universal elite game trainer "ugtrain" does it like this on Linux. It uses LD_PRELOAD.
Adaptation works by "objdump -D"-based disassembly and just searching for the library function call with the known memory size in it.
See: http://en.wikipedia.org/wiki/Trainer_%28games%29
Ugtrain source: https://github.com/sriemer/ugtrain
The malloc() hook looks like this:
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>
static __thread bool no_hook = false;
void *malloc (size_t size)
{
    void *mem_addr;
    static void *(*orig_malloc)(size_t size) = NULL;
    /* handle malloc() recursion correctly */
    if (no_hook)
        return orig_malloc(size);
    /* get the libc malloc function */
    no_hook = true;
    if (!orig_malloc)
        *(void **) (&orig_malloc) = dlsym(RTLD_NEXT, "malloc");
    mem_addr = orig_malloc(size);
    /* real magic -> backtrace and send out spied information */
    postprocess_malloc(size, mem_addr);
    no_hook = false;
    return mem_addr;
}
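Once a hook has recorded the allocations, the reverse-matching step described earlier (found address to allocation size and offset) could look roughly like this. This is an illustrative sketch and not ugtrain's actual code; AllocationLog, Match and their members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

struct Match {
    std::uintptr_t base;  // start of the containing allocation
    std::size_t size;     // size that was requested from malloc()
    std::size_t offset;   // position of the found value inside it
};

class AllocationLog {
public:
    // Called from the malloc() hook.
    void on_malloc(std::uintptr_t addr, std::size_t size) {
        assert(size > 0);
        live_[addr] = size;
    }
    // Called from the free() hook.
    void on_free(std::uintptr_t addr) { live_.erase(addr); }

    // Find the live allocation containing `addr`; false if there is none.
    bool match(std::uintptr_t addr, Match& out) const {
        auto it = live_.upper_bound(addr);  // first block starting after addr
        if (it == live_.begin()) return false;
        --it;                               // last block starting at or before addr
        if (addr - it->first >= it->second) return false;
        out = Match{it->first, it->second, addr - it->first};
        return true;
    }

private:
    std::map<std::uintptr_t, std::size_t> live_;  // start address -> size
};
```

If match() fails for an address found by the memory search, the value is not on the heap, which leads to the case discussed next.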
But if the found memory address is located within the executable or a library in memory, then ASLR is likely the cause of the changing address. On Linux, libraries are PIC (position-independent code), and with the latest distributions all executables are PIE (position-independent executables).
EDIT: never mind it seems it was just good luck, however the last 3 numbers of the pointer seem to stay the same. Perhaps this is ASLR kicking in and changing the base image address or something?
Aaahhhh, my bad: I was using %d for printf to print the address and not %p. After using %p the address stayed the same.
#include <stdio.h>
int *something = NULL;
int main()
{
something = new int;
*something = 5;
fprintf(stdout, "Address of something: %p\nValue of something: %d\nPointer Address of something: %p", (void*)&something, *something, (void*)something);
getchar();
return 0;
}
Example for a dynamically allocated variable
The value I want to find is the number of lives, so that I can stop it from being reduced to 0 and causing a game over.
1. Play the game and search for the location of the lives variable in this instance.
2. Once found, use a disassembler/debugger to watch that location for changes.
3. Lose a life.
4. The debugger should report the address of the instruction where the decrement occurred.
5. Replace that instruction with no-ops.
I got this pattern from a program called tsearch.
A few related websites found from researching this topic:
http://deviatedhacking.com/index.php?/topic/75-dynamic-memory-allocation/
http://www.edgeofnowhere.cc/viewforum.php?f=183
http://www.oldschoolhack.de/tutorials/Theories%20and%20methods%20of%20code-caves.htm
http://webcache.googleusercontent.com/search?q=cache:4wzMzFIZx54J:gamehacking.com/forums/tutorials-beginners/11597-c-making-game-trainer.html+reading+a+dynamic+memory+address+game+trainer&cd=2&hl=en&ct=clnk&gl=au&client=firefox-a (A google cache version)
http://www.codeproject.com/KB/cpp/codecave.aspx
Things like Gameshark codes were figured out by dumping the memory image of the application, then doing one thing, then looking to see what changed. There might be a few things changing, but there should be patterns to look for. E.g. dump memory, shoot, dump memory, shoot again, dump memory, reload. Then look for changes and get an idea of where and how ammo is stored. For health it'll be similar, but a lot more things will be changing (since you'll be moving at the very least). It'll be easiest to do it while minimizing the "external effects": don't try to diff memory dumps during a firefight, because a lot is happening. Do your diffs while standing in lava, or falling off a building, or something of that nature.
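The dump-and-compare idea can be sketched as a filter over several snapshots. This is a simplified illustration that only looks at single bytes and assumes all dumps have the same length; decreasing_offsets is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the offsets whose byte value decreased between every pair of
// consecutive snapshots (for example, ammo after each shot).
std::vector<std::size_t> decreasing_offsets(
        const std::vector<std::vector<std::uint8_t>>& dumps) {
    assert(dumps.size() >= 2);
    std::vector<std::size_t> hits;
    const std::size_t n = dumps.front().size();
    for (std::size_t off = 0; off < n; ++off) {
        bool always_down = true;
        for (std::size_t d = 1; d < dumps.size() && always_down; ++d) {
            assert(dumps[d].size() == n);
            always_down = dumps[d][off] < dumps[d - 1][off];
        }
        if (always_down) hits.push_back(off);
    }
    return hits;
}
```

The more snapshots are taken between controlled actions, the fewer unrelated offsets survive the filter.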

How to link Two Multi-Dimensional arrays using pointers?

I need to basically merge a Binary Heap, and Linear Probing Hashtable to make a "compound" data structure, which has the functionality of a heap, with the sorting power of a hashtable.
What I need to do is create two 2-dimensional arrays, one for each data structure (Binary Heap and Hashtable), then link them to each other with pointers so that when I change things, such as deleting a value in the Binary Heap, it also gets deleted in the Hashtable.
Therefore, I need to have one row of the Heap array pointing from the Heap to the Hashtable, and one row of the Hashtable array pointing from the Hashtable to the Heap.
Create a container that contains both, with accessor functions/methods (depending on your language of implementation) that performs all the operations required of your algorithm.
IE:
Delete from container: does a delete from Binary and from hash.
Add to container: adds to binary and to hash.
EDIT:
Oh, an assignment - fun! :)
I'd do this:
Still implement a container. But instead of using a standard library for the btree/hash, implement them like this:
Make a type that can be put in your data member that has a pointer to the BTree node and the Hashtable Node that the data element lives in.
To delete a data element, given a pointer to it, you can perform the delete algorithm on a btree (navigate to parent from node pointer, delete child (left or right), restructure tree) and on the hash table (delete from hash list). When adding a value, perform the add algorithm on btree and hash, but be sure you update the node pointers in the data before you return.
Some pseudocode (I'll use C, but I'm not sure what language you're using):
typedef struct
{
    BTreeNode* btree;
    HashNode* hash;
} ContainerNode;
to put data in your container:
typedef struct
{
ContainerNode node;
void* data; /* whatever the data is */
} Data;
a BTreeNode has something like:
typedef struct _BTreeNode
{
struct _BTreeNode* parent;
struct _BTreeNode* left;
struct _BTreeNode* right;
} BTreeNode;
and a HashNode has something like:
typedef struct _HashNode
{
struct _HashNode* next;
} HashNode;
/* ala singly linked list */
and your BTree would be a pointer to a BTreeNode and your hashtable would be an array of pointers to HashNodes. Like this:
typedef struct
{
BTreeNode* btree;
HashNode* hashtable[HASHTABLESIZE];
} Container;
void delete(Container* c, ContainerNode* n)
{
delete_btree_node(n->btree);
delete_hashnode(n->hash);
}
ContainerNode* add(Container* c, void* data)
{
    ContainerNode* n = malloc(sizeof(ContainerNode));
    n->btree = add_to_btree(n);
    n->hash = add_to_hash(n);
    return n;
}
I'll let you complete those other functions (can't do the whole assignment for you ;) )
Why bother with the links?
You have two associative structures, so just duplicate any operation on one to the other (ensuring that if one operation fails with an exception, you either crash the whole thing or leave the object in a valid state, if you care about such things).
Unless you can make use of the structure of one to help you with the other (and I don't see how you can, since either one can entirely rearrange its internal state on any modification operation), this is just as effective and much simpler.
Of course this means that the O() cost of any modification operation is the cost of the most expensive one, and memory costs are doubled, but that is true of the original plan too, unless there is some trick I'm missing.
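A minimal sketch of that "duplicate every operation" approach is below. It deliberately uses standard containers as stand-ins: std::multiset plays the role of the binary heap (ordered access to the minimum) and std::unordered_multiset the role of the linear-probing hash table. An assignment would swap in hand-written structures. The class name Compound and its methods are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_set>

class Compound {
public:
    // Every modification is applied to both structures.
    void add(int v) {
        ordered_.insert(v);
        hashed_.insert(v);
    }

    // Remove one occurrence of v from both; false if v is not present.
    bool remove(int v) {
        auto h = hashed_.find(v);
        if (h == hashed_.end()) return false;
        hashed_.erase(h);
        ordered_.erase(ordered_.find(v));  // present by construction
        return true;
    }

    // Fast membership test through the hash side.
    bool contains(int v) const { return hashed_.count(v) != 0; }

    // Heap-like access to the smallest element through the ordered side.
    int min() const {
        assert(!ordered_.empty());
        return *ordered_.begin();
    }

    int pop_min() {
        int v = min();
        remove(v);
        return v;
    }

    std::size_t size() const { return hashed_.size(); }

private:
    std::multiset<int> ordered_;           // stand-in for the binary heap
    std::unordered_multiset<int> hashed_;  // stand-in for the hash table
};
```

Neither structure needs to know about the other's internals; the wrapper is the only place where the two are kept in sync.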
