Understanding the method for OpenCL reduction on float

Following this link, I am trying to understand how the kernel code works. There are two versions of this kernel code, one with volatile local float *source and the other with volatile global float *source (i.e. local and global versions). Below I take the local version:
float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
union {
unsigned int intVal;
float floatVal;
} newVal;
union {
unsigned int intVal;
float floatVal;
} prevVal;
do {
prevVal.floatVal = *source;
newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
Next, for a given work-item, the code stores prevVal.floatVal + operand in newVal.floatVal. After this operation, atomic_cmpxchg is called, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.
During this atomic operation (which is uninterruptible by definition), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes like an integer).
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address?
But for each work-item thread, is there only one iteration of the while loop?
I think there will be one iteration because the comparison "*source == prevVal.intVal ? newVal.intVal : newVal.intVal" will always assign the newVal.intVal value to the value stored at the source address, won't it?
I have not understood all the subtleties of the trick used in this kernel code.
Update
Sorry, I now understand almost all the subtleties, especially in the while loop:
First case: for a given single thread, before the call of atomic_cmpxchg, if prevVal.floatVal is still equal to *source, then atomic_cmpxchg will change the value that source points to and return the old value, which is equal to prevVal.intVal, so we break from the while loop.
Second case: if, between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg, the value *source has changed (by another thread ??), then atomic_cmpxchg returns the old value, which is no longer equal to prevVal.intVal, so the while condition is true and we stay in the loop until that condition no longer holds.
Is my interpretation correct?

If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
newVal.floatVal = prevVal.floatVal + operand;
} while(atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
int old = *ptr;
if (old == expected) {
*ptr = new;
}
return old;
}
To summarize, each thread, individually, does:
1. Load the current value of *source from memory into prevVal.floatVal.
2. Compute the desired value of *source in newVal.floatVal.
3. Execute the atomic compare-exchange described above (using the type-punned values).
4. If the result of atomic_cmpxchg == prevVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
The above loop eventually terminates: an atomic_cmpxchg can only fail because another thread's atomic_cmpxchg succeeded in the meantime, so some thread always makes progress and each thread eventually succeeds in doing its own.
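The steps above can be sketched on the host side in C++. This is only an illustration under stated assumptions, not the kernel itself: std::atomic<std::uint32_t> and compare_exchange_weak stand in for OpenCL's atomic_cmpxchg, std::memcpy replaces the union for type punning, and the names atomic_add_float, load_float and example_atomic_add are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical host-side analogue of atomic_add_local: `source` holds the bit
// pattern of a float, and `operand` is added to that float atomically.
void atomic_add_float(std::atomic<std::uint32_t> &source, float operand) {
    std::uint32_t prevBits = source.load();                // step 1: load current value
    std::uint32_t newBits;
    do {
        float prevVal;
        std::memcpy(&prevVal, &prevBits, sizeof prevVal);  // bits -> float
        const float newVal = prevVal + operand;            // step 2: desired value
        std::memcpy(&newBits, &newVal, sizeof newBits);    // float -> bits
        // Steps 3 and 4: on failure, compare_exchange_weak writes the current
        // contents of `source` into prevBits, so the loop retries with fresh data.
    } while (!source.compare_exchange_weak(prevBits, newBits));
}

// Reads the float whose bit pattern is stored in `source`.
float load_float(const std::atomic<std::uint32_t> &source) {
    const std::uint32_t bits = source.load();
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Example check, single-threaded for brevity.
void example_atomic_add() {
    std::atomic<std::uint32_t> sum{0};  // all-zero bits are the float 0.0f
    atomic_add_float(sum, 1.5f);
    atomic_add_float(sum, 2.25f);
    assert(load_float(sum) == 3.75f);
}
```

Several threads could call atomic_add_float on the same object concurrently; a thread whose compare-exchange fails simply recomputes its sum from the value another thread just stored.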
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address?
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.
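For contrast, here is a minimal host-side C++ sketch of what a spinlock built from atomics could look like. It is an assumption-laden illustration, not OpenCL code: SpinLock, locked_add and example_spinlock are hypothetical names, and std::atomic_flag stands in for an OpenCL atomic. The difference from the compare-exchange loop is that a thread here waits for exclusive ownership before touching the data, whereas the lock-free loop never owns anything and just retries its update.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock: the flag is set while some thread holds the lock.
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    void lock() {
        // Spin until the previous value of the flag was "clear".
        while (flag.test_and_set(std::memory_order_acquire)) { /* busy-wait */ }
    }
    bool try_lock() { return !flag.test_and_set(std::memory_order_acquire); }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Adds `operand` to `target` while holding the lock and returns the new value.
float locked_add(SpinLock &lock, float &target, float operand) {
    lock.lock();
    target += operand;  // plain, non-atomic update protected by the lock
    const float result = target;
    lock.unlock();
    return result;
}

// Example check: a held lock cannot be taken again until it is released.
void example_spinlock() {
    SpinLock lock;
    assert(lock.try_lock());
    assert(!lock.try_lock());
    lock.unlock();
}
```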

Related

Can I have boolean buffer in OpenCL and change its value during kernel execution, example to break while loop

I want to do some experiments in OpenCL, and I want to know whether it is possible to change state during kernel execution from host code using a buffer.
I attempted to alter the state of a while loop in the kernel code by modifying the buffer value from within the host code; however, the execution hangs.
void my_kernel(
__global bool *in,
__global int *out)
{
int i = get_global_id(0);
while(1) {
if(1 == *in) {
printf("while loop is finished");
break;
}
}
printf("out[0] = %d\n", out[0]);
}
I call the function clEnqueueWriteBuffer() a second time to change the state of the input value.
input[0] = 1;
err = clEnqueueWriteBuffer(commands, input_buffer,
CL_TRUE, 0, sizeof(int), (void*)input,
0, NULL,NULL);

At least for OpenCL 1.x, this is not permitted, and any behaviour you may observe in one implementation cannot be relied upon.
See the NOTE in the OpenCL 1.2 specification, section 5.2.2, Reading, Writing and Copying Buffer Objects:
Calling clEnqueueWriteBuffer to update the latest bits in a region of the buffer object with the ptr argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being written is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid undefined behavior:
The host memory region given by (host_ptr + offset, cb) contains the latest bits when the enqueued write command begins execution.
The buffer object or memory objects created from this buffer object are not mapped.
The buffer object or memory objects created from this buffer object are not used by any command-queue until the write command has finished execution.
The final condition is not met by your code, therefore its behaviour is undefined.
I am not certain if the situation is different with OpenCL 2.x's Shared Virtual Memory (SVM) feature, as I have no practical experience using it. Perhaps someone else can contribute an answer for that.

Finding pointer with 'find out what writes to this address' strange offset

I'm trying to find a base pointer for UrbanTerror42.
My setup is as follows: I have a server with 2 players.
Cheat Engine runs on client A.
I climb a ladder with client B and then scan for increase/decrease.
When I have found the values, I use "find out what writes to this address".
But the offsets are very high and point to empty memory.
I don't really know how to proceed.
For the sake of clarity, I have looked up several other values and they have the same problem.
I've already looked at a number of tutorials and forums, but they always deal with values where the offsets are between 0 and 100, not 80614.
I would really appreciate it if someone could tell me why this happens and what I have to do/learn to proceed.
Thanks in advance.

Urban Terror uses the Quake Engine. Early versions of this engine use the Quake Virtual Machine, and the game logic is implemented as bytecode which is compiled into assembly by the Quake Virtual Machine. Custom allocation routines are used to load these modules into memory. Relative and hardcoded offsets/addresses are created at runtime to accommodate these relocations, and they do not use the normal relocation table method of the portable executable file format. This is why you see these seemingly strange numbers that change every time you run the game.
Quake Virtual Machine modules use the .qvm file format, and the QVMs loaded in memory are tracked in the QVM table. You must find the QVM table to uncover this mystery. Once you find the 2-3 QVMs and record their addresses, finding the table is easy, as you're simply doing a scan for pointers that point to these addresses and narrowing down your results by finding those which are close in memory to each other.
The QVM is defined like:
struct vmTable_t
{
vm_t vm[3];
};
struct vm_s {
// DO NOT MOVE OR CHANGE THESE WITHOUT CHANGING THE VM_OFFSET_* DEFINES
// USED BY THE ASM CODE
int programStack; // the vm may be recursively entered
intptr_t(*systemCall)(intptr_t *parms);
//------------------------------------
char name[MAX_QPATH];
// for dynamic linked modules
void *dllHandle;
intptr_t entryPoint; //(QDECL *entryPoint)(int callNum, ...);
void(*destroy)(vm_s* self);
// for interpreted modules
qboolean currentlyInterpreting;
qboolean compiled;
byte *codeBase;
int codeLength;
int *instructionPointers;
int instructionCount;
byte *dataBase;
int dataMask;
int stackBottom; // if programStack < stackBottom, error
int numSymbols;
struct vmSymbol_s *symbols;
int callLevel; // counts recursive VM_Call
int breakFunction; // increment breakCount on function entry to this
int breakCount;
BYTE *jumpTableTargets;
int numJumpTableTargets;
};
typedef struct vm_s vm_t;
The value in EAX in your original screenshot should be the same as either the codeBase or dataBase member variable of the QVM structure. The offsets are just relative to these addresses. Similarly to how you deal with ASLR, you must calculate the addresses at runtime.
Here is a truncated version of my code that does exactly this and additionally grabs important structures from memory, as an example:
void OA_t::GetVM()
{
cg = nullptr;
cgs = nullptr;
cgents = nullptr;
bLocalGame = false;
cgame = nullptr;
for (auto &vm : vmTable->vm)
{
if (strstr(vm.name, "qagame")) { bLocalGame = true; continue; }
if (strstr(vm.name, "cgame"))
{
cgame = &vm;
gamestatus = GSTAT_GAME;
//char* gamestring = Cvar_VariableString("fs_game");
switch (cgame->instructionCount)
{
case 136054: //version 88
cgents = (cg_entities*)(cgame->dataBase + 0x1649c);
cg = (cg_t*)(cgame->dataBase + 0xCC49C);
cgs = (cgs_t*)(cgame->dataBase + 0xf2720);
return;
Full source code for reference available at OpenArena Aimbot Source Code, it even includes a video overview of the code.
Full disclosure: that is a link to my website and the only viable resource I know of that covers this topic.

Memory allocation for Pointer

In which section is memory allocated if I write something like
1. int *ptr;
*ptr = 22;
2. int *ptr = new int(22);
What I understand is that when we use the keyword new, memory is reserved on the heap and the address of that reserved memory is returned.
But what happens if we don't use the keyword new? Where is the memory allocated?
Are both syntaxes the same? If not, what is the exact difference between these two statements?

Your code examples can be rephrased as follows:
1st:
int * ptr;
*ptr = 22;
2nd:
int * ptr;
ptr = new int; //the only difference
*ptr = 22;
What happens in the second one:
int * ptr; means "create a variable capable of storing the address of an int variable". For now the variable isn't initialized, so it stores garbage. If you interpret garbage as a pointer, it can point anywhere (it can be 0, or 0xabcdef11, or 0x31323334, or literally ANYTHING which is left in non-cleared memory from previous usage).
ptr = new int; means "allocate a memory area capable of holding an int and store its address in the ptr variable". From this line on, ptr points to a specific memory location.
*ptr = 22; means "put the value 22 into the memory pointed to by ptr".
In the first example you create the variable, but don't initialize it. ptr contains garbage, but you ask to interpret it as an address and store 22 at this address. What can happen:
the address is invalid (e.g. 0, or out of address range, or points to protected memory) => the program crashes.
the address is valid and writable, but the memory area is used by another part of the program: you'll write 22, but it will corrupt someone's data, and the result is totally unpredictable.
the address is valid and writable, and the memory area isn't in use. You'll write 22, but you aren't guaranteed to read it back. The memory can become used for a different purpose and 22 will be overwritten.
anything else. All this is actually undefined behavior; everything is possible.
That's why it's always recommended to initialize pointer immediately:
int * ptr = NULL; //or better "nullptr" starting from C++11
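An attempt to store a value with *ptr = 22; will then, on most platforms, crash the program immediately instead of silently corrupting memory (dereferencing a null pointer is still undefined behavior).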
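As a rough illustration of the recommendation above, here is a small C++ sketch. The helper names store_if_valid and example_pointer_init are hypothetical; the point is only that a pointer initialized to nullptr can be checked before use, while an uninitialized one cannot.

```cpp
#include <cassert>

// Hypothetical helper: writes `value` through `ptr` only if it points somewhere.
bool store_if_valid(int *ptr, int value) {
    if (ptr == nullptr) {
        return false;  // nothing has been allocated yet, so refuse to write
    }
    *ptr = value;
    return true;
}

// Walks through the second example from the question with an initialized pointer.
int example_pointer_init() {
    int *ptr = nullptr;                 // initialized: detectably points nowhere
    assert(!store_if_valid(ptr, 22));   // the invalid write is caught, not executed

    ptr = new int;                      // allocate storage for one int
    const bool stored = store_if_valid(ptr, 22);
    assert(stored);

    const int result = *ptr;            // read the value back
    delete ptr;                         // release what new allocated
    return result;
}
```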

C functions returning an array

Sorry for the post. I have researched this but..... still no joy in getting this to work. There are two parts to the question too. Please ignore the TWI register code as it's application specific; I need help with a nuts-and-bolts C problem.
So... to reduce memory usage for a project I have started to write my own TWI (wire.h lib) for the ATmega328P. It's not been put into a lib yet as '1' I have no idea how to do that yet... will get to that later, and '2' it's a work in progress which keeps getting added to.
The problem I'm having is with reading multiple bytes.
Problem 1
I have a function that I need to return an Array
byte *i2cBuff1[16];
void setup () {
i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
}
/////////////////////READ BYTES////////////////////
byte* i2cReadBytes(byte i2cAdd, byte i2cReg, byte i2cNumBytes) {
static byte result[i2cNumBytes];
for (byte i = 0; i < i2cNumBytes; i ++) {
result[i] += i2cAdd + i2cReg;
}
return result;
}
What I understand :o ) is that I have declared a static byte array in the function, which I point to as the return value of the function.
The function call requests the return of a pointer to a byte array, which is supplied.
Well .... it doesn't work .... I have checked multiple sites and I think this should work. The error message I get is:
MPU6050_I2C_rev1:232: error: incompatible types in assignment of 'byte* {aka unsigned char*}' to 'byte* [16] {aka unsigned char* [16]}'
i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
Problem 2
OK, say the code sample above worked. I am trying to reduce the amount of memory that I use in my sketch. By using any memory in the function, even though the memory is released after the function call, the function must need to reserve an amount of 'space' in some way for when the function is called. Ideally I would like to avoid the use of static variables within the function that are duplicated within the main program.
Does anyone know the trade-off with repeated function calls.... i.e. looping a function call with a bit shift operator, as opposed to calling a function once to complete a process and return ... an array? Or was this the whole point, that C does not really support array return in the first place?
Hope this made sense, just want to get the best from the little I got.
BR
Danny

This line:
byte *i2cBuff1[16];
declares i2cBuff1 as an array of 16 byte* pointers. But i2cReadBytes doesn't return an array of pointers; it returns a pointer to the first byte of an array of bytes. The declaration should be:
byte *i2cBuff1;
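Another problem is that a static array can't have a size that is only known at run time. A variable-length array has to be an automatic array, so that its size can change each time the function is called (and a pointer to an automatic array must not be returned, because the array stops existing when the function returns). You should use dynamic allocation with malloc() instead (I used calloc() because it automatically zeroes the memory). The caller is responsible for calling free() on the returned pointer.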
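byte* i2cReadBytes(byte i2cAdd, byte i2cReg, byte i2cNumBytes) {
// The cast is needed when the sketch is compiled as C++ (as the error message suggests).
byte *result = (byte *)calloc(i2cNumBytes, sizeof(byte));
if (result == NULL) return NULL; // allocation failed
for (byte i = 0; i < i2cNumBytes; i ++) {
result[i] += i2cAdd + i2cReg;
}
return result; // the caller must free() this buffer when done
}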
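Regarding the memory concern in Problem 2, a commonly used alternative to returning an array is to let the caller supply the buffer, so no heap allocation and no static array inside the function are needed. The sketch below is only an illustration: i2cReadBytesInto and example_read_into are hypothetical names, the byte alias stands in for Arduino's byte type, the device address 0x68 is an arbitrary example value, and the loop body keeps the placeholder arithmetic from the question instead of real TWI register access.

```cpp
#include <cassert>
#include <cstdint>

using byte = std::uint8_t;  // stand-in for Arduino's byte type (assumption)

// Fills the caller's buffer instead of returning an array.
// Returns the number of bytes written.
byte i2cReadBytesInto(byte i2cAdd, byte i2cReg, byte *buffer, byte numBytes) {
    assert(buffer != nullptr);
    for (byte i = 0; i < numBytes; ++i) {
        // Placeholder value, as in the question; real code would read the bus here.
        buffer[i] = static_cast<byte>(i2cAdd + i2cReg);
    }
    return numBytes;
}

// Example check: the caller owns a fixed 16-byte buffer and passes it in.
void example_read_into() {
    byte i2cBuff1[16] = {0};
    const byte written = i2cReadBytesInto(0x68, 0x6F, i2cBuff1, 16);
    assert(written == 16);
    assert(i2cBuff1[0] == static_cast<byte>(0x68 + 0x6F));
}
```

With this shape the only storage involved is the one buffer declared by the caller, which may suit a small microcontroller better than repeated calloc()/free() calls.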

swapping address values of pointers

Below is my code. Why am I not getting swapped numbers as a result? Instead of swapping the numbers, I tried to swap their addresses.
int *swap(int *ptr1,int *ptr2){
int *temp;
temp = ptr1;
ptr1= ptr2;
ptr2=temp;
return ptr1,ptr2;
}
int main(){
int num1=2,num2=4,*ptr1=&num1,*ptr2=&num2;
swap(ptr1,ptr2);
printf("\nafter swaping the first number is : %d\t and the second number is : %d\n",*ptr1,*ptr2);
}

I can see two problems in your code.
First, within the swap function, ptr1 and ptr2 are local copies of the pointers in main with the same name. Changing them in swap only changes those copies, not the originals.
Second, the return statement doesn't do anything useful. The function swap is declared as returning a single int *. The return statement actually only returns ptr2 - for why that is, look up the "comma operator" in C. But you ignore the return value in main anyway, so it makes no odds.
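To make the first point concrete, here is a minimal sketch of two ways the swap could be written so that the caller sees the effect. The function names swap_pointers, swap_values and example_swap are hypothetical. Swapping the pointers themselves requires passing the address of each pointer, while swapping the numbers only requires dereferencing.

```cpp
#include <cassert>

// Swaps the two pointers in the caller: takes the address of each pointer.
void swap_pointers(int **a, int **b) {
    assert(a != nullptr && b != nullptr);
    int *temp = *a;
    *a = *b;
    *b = temp;
}

// Swaps the pointed-to values and leaves the pointers unchanged.
void swap_values(int *a, int *b) {
    assert(a != nullptr && b != nullptr);
    const int temp = *a;
    *a = *b;
    *b = temp;
}

// Example check using the numbers from the question.
void example_swap() {
    int num1 = 2, num2 = 4;
    int *ptr1 = &num1, *ptr2 = &num2;

    swap_pointers(&ptr1, &ptr2);          // ptr1 now points to num2, ptr2 to num1
    assert(*ptr1 == 4 && *ptr2 == 2);
    assert(num1 == 2 && num2 == 4);       // the numbers themselves did not move

    swap_values(&num1, &num2);            // now the numbers are exchanged
    assert(num1 == 4 && num2 == 2);
}
```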
