Where's the memory leak? - pointers

So I'm learning pointers and having a difficult time identifying the memory leak here. I confess I have never used malloc() before and am new to pointer arithmetic. Thanks in advance.
/* filename: p3.c */
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *buffer;
    char *p;
    int n;

    /* allocate 10 bytes */
    buffer = (char *) malloc(10);

    p = buffer;
    for (n = 0; n <= 10; n++)
        *p++ = '*';

    p = buffer;
    for (n = 0; n <= 10; n++)
        printf("%c ", *p++);

    return 0;
}

The rule is rather simple: for every malloc there must be a free. If you have more mallocs than frees, you forgot to de-allocate memory, so you have a memory leak. If you have more frees than mallocs, you're trying to de-allocate memory that has already been de-allocated, and that's not something you want either.

You simply need to release your buffer with the free() function when you don't need it anymore:
    /* ... */
    free(buffer);

    return 0;
}
Simply remember to balance each call to malloc with a call to free once the memory is no longer needed.
The operations on your p variable won't affect buffer. They are two pointers pointing to the same area (at the start), but they are still two distinct variables, so incrementing p won't increment buffer.
So there is nothing wrong with the pointer operations on p, except for the fact that you are writing out of bounds, as stated by Daniel Fisher in the comments of your question.
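As a tiny illustration (a sketch reusing the question's names), only p moves; buffer keeps pointing at the first byte, which is why free(buffer) still works later:

char *buffer = malloc(10);
char *p = buffer;   /* both point at the first of the 10 bytes      */

p++;                /* p now points at buffer[1]                    */
                    /* buffer is unchanged, so free(buffer) is fine */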
Also note that you should always check for NULL after the malloc call, as malloc may fail. It's pretty rare nowadays, but if it does fail, your program will probably crash, as you will then dereference a NULL pointer:
buffer = malloc( 10 );
if( buffer == NULL )
{
    /* Error management - Do not use buffer */
}
The cast to char * is not needed on malloc, unless you are dealing with C++. In C, it's valid to assign a void pointer to another pointer type.

It is not n <= 10 you want, but n < 10: the buffer holds 10 bytes, so the valid indices are 0 through 9, and n <= 10 steps one byte past the end.

You call malloc and never call free. Of course it leaks.
In principle, every single allocation you request from the alloc family of functions should be freed as soon as you are done with it.
Buffers that you continue to use up to the termination of the program are formally leaks, but not a problem as long as you are allocating a well-defined number of them. That includes what you are doing here.
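Putting the answers together, a corrected version of the program might look like the sketch below (loop bound fixed, malloc result checked, buffer freed); it otherwise keeps the question's structure:

/* filename: p3.c -- corrected sketch */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buffer;
    char *p;
    int n;

    buffer = malloc(10);          /* no cast needed in C            */
    if (buffer == NULL)
        return 1;                 /* malloc failed, do not use buffer */

    p = buffer;
    for (n = 0; n < 10; n++)      /* n < 10: indices 0..9 only      */
        *p++ = '*';

    p = buffer;
    for (n = 0; n < 10; n++)
        printf("%c ", *p++);
    printf("\n");

    free(buffer);                 /* one free for the one malloc    */
    return 0;
}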

Related

Can I have a boolean buffer in OpenCL and change its value during kernel execution, for example to break a while loop?

I want to do some experiments in OpenCL, and I want to know whether it is possible to change state during kernel execution from the host code by using a buffer.
I attempted to alter the state of a while loop in the kernel code by modifying the buffer value from the host code, but the execution hangs.
__kernel void my_kernel(
    __global bool *in,
    __global int  *out)
{
    int i = get_global_id(0);

    while (1) {
        if (1 == *in) {
            printf("while loop is finished");
            break;
        }
    }
    printf("out[0] = %d\n", out[0]);
}
I then call clEnqueueWriteBuffer() a second time to change the state of the input value.
input[0] = 1;
err = clEnqueueWriteBuffer(commands, input_buffer,
                           CL_TRUE, 0, sizeof(int), (void *)input,
                           0, NULL, NULL);
At least for OpenCL 1.x, this is not permitted, and any behaviour you may observe in one implementation cannot be relied upon.
See the NOTE in the OpenCL 1.2 specification, section 5.2.2, Reading, Writing and Copying Buffer Objects:
Calling clEnqueueWriteBuffer to update the latest bits in a region of the buffer object with the ptr argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being written is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid undefined behavior:
The host memory region given by (host_ptr + offset, cb) contains the latest bits when the enqueued write command begins execution.
The buffer object or memory objects created from this buffer object are not mapped.
The buffer object or memory objects created from this buffer object are not used by any command-queue until the write command has finished execution.
The final condition is not met by your code, therefore its behaviour is undefined.
I am not certain if the situation is different with OpenCL 2.x's Shared Virtual Memory (SVM) feature, as I have no practical experience using it, perhaps someone else can contribute an answer for that.
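For completeness, a host-side ordering that does satisfy the quoted requirement is sketched below (commands and input_buffer are the question's names; kernel, size_global and the omitted error handling are assumptions). It waits for the kernel to finish before touching the buffer, which of course also means it no longer signals a kernel that is still running:

size_t size_global[1] = {4000};              /* assumed global work size        */

err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL,
                             size_global, NULL, 0, NULL, NULL);
err = clFinish(commands);                    /* kernel done, buffer now idle    */

input[0] = 1;                                /* safe: no enqueued command uses  */
err = clEnqueueWriteBuffer(commands, input_buffer,      /* the buffer any more  */
                           CL_TRUE, 0, sizeof(int), (void *)input,
                           0, NULL, NULL);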

Need help in understanding Pointers and Strings using stack and heap memory

I was trying to understand what happens under the hood when pointers, strings and functions are combined with heap/stack memory. I was able to understand and learn, but I ended up with two errors whose cause I could not work out.
My problem lies here:
// printf("%s\n", *ptrToString); // Gives bad mem access error if heap memory used
// printf("%s\n", ptrToString); // Output is wrong if stack was used for memory, and prints some hex values instead
Can anyone explain what I am missing here? Also, I would welcome some feedback on my code and any improvements that could be made.
Thanks
Full code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define USE_STACK_MEMORY 0

char* NewString(char string[])
{
    unsigned long num_chars;
    char *copy = NULL;

    // Find string length
    num_chars = strlen(string);

    // Allocate memory
#if USE_STACK_MEMORY
    copy = alloca(sizeof(copy) + num_chars + 1); // Use stack memory
#else
    copy = malloc(sizeof(copy) + num_chars + 1); // Use heap memory
#endif

    // Make a local copy
    strcpy(copy, string);

    // If we use stack then it returns a string literal
    return copy;
}

int main(void)
{
    char *ptrToString = NULL;

    ptrToString = NewString("HI");

    printf("%s\n", ptrToString);
    // printf("%s\n", *ptrToString); // Gives bad mem access error if heap memory used
    // printf("%s\n", ptrToString);  // Output is wrong if stack was used for memory, and prints some hex values instead

#if !USE_STACK_MEMORY
    if ( ptrToString ) {
        free(ptrToString);
    }
#endif

    return 0;
}
The first print passes the value the pointer points to, i.e. the first character of your string. printf's %s then interprets that character value as the address where a string would be, which is why you get a bad memory access.
The second print is wrong for stack memory because the memory you allocate with alloca is automatically freed as soon as your NewString function returns.
From the man page of alloca:
The alloca() function allocates size bytes of space in the stack frame
of the caller. This temporary space is automatically freed when the
function that called alloca() returns to its caller.
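To make the working case explicit, here is a minimal heap-only sketch (it keeps the question's NewString name; the %s conversion is given the pointer itself, and the caller frees the copy):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: a heap copy outlives the function, unlike alloca memory. */
char *NewString(const char *s)
{
    char *copy = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

int main(void)
{
    char *p = NewString("HI");
    if (p != NULL) {
        printf("%s\n", p);                /* pass the pointer, not *p */
        free(p);
    }
    return 0;
}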

Understanding the method for OpenCL reduction on float

Following this link, I am trying to understand how the kernel code works (there are 2 versions of this kernel code, one with volatile local float *source and the other with volatile global float *source, i.e. local and global versions). Below I take the local version:
float sum = 0;

void atomic_add_local(volatile local float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;

    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
Afterwards, if I take a work-item, the code will add the operand value to newVal.floatVal. Then, after this operation, the atomic_cmpxchg function is called, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.
During this atomic operation (which is not interruptible, by definition), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is encoded on 4 bytes, like an integer).
Can we say that each work-item has a mutex access (I mean a locked access) to the value located at the source address?
But for each work-item thread, is there only one iteration of the while loop?
I think there will be one iteration, because the comparison "*source == prevVal.intVal ? newVal.intVal : newVal.intVal" will always assign newVal.intVal to the value stored at the source address, won't it?
I have not understood all the subtleties of this trick for this kernel code.
Update
Sorry, I now understand almost all the subtleties, especially in the while loop:
First case: for a given single thread, before the call to atomic_cmpxchg, if prevVal.floatVal is still equal to *source, then atomic_cmpxchg will change the value at the source pointer and return the old value, which is equal to prevVal.intVal, so we break out of the while loop.
Second case: if, between the prevVal.floatVal = *source; instruction and the call to atomic_cmpxchg, the value of *source has changed (by another thread??), then atomic_cmpxchg returns the old value, which is no longer equal to prevVal.intVal, so the condition in the while loop is true and we stay in the loop until the condition is no longer true.
Is my interpretation correct?
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside the loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
    int old = *ptr;
    if (old == expected) {
        *ptr = new;
    }
    return old;
}
To summarize, each thread, individually, does:
1. Load the current value of *source from memory into prevVal.floatVal.
2. Compute the desired value of *source in newVal.floatVal.
3. Execute the atomic compare-exchange described above (using the type-punned values).
4. If the result of atomic_cmpxchg == prevVal.intVal, the compare-exchange was successful, so break. Otherwise, the exchange didn't happen; go back to 1 and try again.
The above loop eventually terminates, because eventually, each thread succeeds in doing their atomic_cmpxchg.
Can we say that each work-item has a mutex access (I mean a locked access) to the value located at the source address?
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.
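To see how this function is typically driven, here is a small usage sketch (assumed code, not from the linked example: the kernel name sum_local, the acc local buffer and the per-group output layout are all illustrative):

// Each work-item folds its element into one __local accumulator via the
// lock-free compare-exchange loop of atomic_add_local shown above.
__kernel void sum_local(__global const float *in,
                        __global float *partial_sums,
                        __local float *acc)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);

    if (lid == 0)
        *acc = 0.0f;                    // one work-item initialises the accumulator
    barrier(CLK_LOCAL_MEM_FENCE);

    atomic_add_local(acc, in[gid]);     // may loop internally until its cmpxchg succeeds
    barrier(CLK_LOCAL_MEM_FENCE);

    if (lid == 0)
        partial_sums[get_group_id(0)] = *acc;   // one partial sum per work-group
}

On the host side the __local argument would be set with clSetKernelArg(kernel, 2, sizeof(cl_float), NULL), and the per-group partial sums would still need a final reduction.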

C functions returning an array

Sorry for the post. I have researched this but..... still no joy in getting this to work. There are two parts to the question too. Please ignore the TWI register code as it's application specific; I need help with a nuts-and-bolts C problem.
So... to reduce memory usage for a project I have started to write my own TWI (wire.h lib) for the ATMEL328p. It's not been put into a lib yet as (1) I have no idea how to do that yet... will get to that later, and (2) it's a work in progress which keeps getting added to.
The problem I'm having is with reading multiple bytes.
Problem 1
I have a function from which I need to return an array:
byte *i2cBuff1[16];

void setup () {
    i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
}

/////////////////////READ BYTES////////////////////
byte* i2cReadBytes(byte i2cAdd, byte i2cReg, byte i2cNumBytes) {
    static byte result[i2cNumBytes];
    for (byte i = 0; i < i2cNumBytes; i ++) {
        result[i] += i2cAdd + i2cReg;
    }
    return result;
}
What I understand :o) is that I have declared a static byte array in the function, which I point to as the return value of the function.
The function call expects the return of a pointer to a byte array, which is what is supplied.
Well .... it doesn't work .... I have checked multiple sites and I think this should work. The error message I get is:
MPU6050_I2C_rev1:232: error: incompatible types in assignment of 'byte* {aka unsigned char*}' to 'byte* [16] {aka unsigned char* [16]}'
i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
Problem 2
Ok, say the code sample above worked. I am trying to reduce the amount of memory that I use in my sketch. Even though the memory is released after the function call, the function must still need to reserve an amount of 'space' in some way for when it is called. Ideally I would like to avoid the use of static variables within the function that are duplicated within the main program.
Does anyone know the trade-off with repeated function calls, i.e. looping a function call with a bit-shift operator, as opposed to calling a function once to complete a process and return... an array? Or was this the whole point, that C does not really support returning an array in the first place?
Hope this made sense, just want to get the best from the little I got.
BR
Danny
This line:
byte *i2cBuff1[16];
declares i2cBuff1 as an array of 16 byte* pointers. But i2cReadBytes doesn't return an array of pointers; it returns a pointer to bytes. The declaration should be:
byte *i2cBuff1;
Another problem is that a static array can't have a dynamic size. A variable-length array has to be an automatic array, so that its size can change each time the function is called. You should use dynamic allocation with malloc() (I used calloc() instead because it automatically zeroes the memory).
byte* i2cReadBytes(byte i2cAdd, byte i2cReg, byte i2cNumBytes) {
    byte *result = calloc(i2cNumBytes, sizeof(byte));
    for (byte i = 0; i < i2cNumBytes; i ++) {
        result[i] += i2cAdd + i2cReg;
    }
    return result;
}
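With dynamic allocation the caller owns the returned buffer and must release it; a sketch of the corresponding call site (reusing the question's names) could be:

byte *i2cBuff1 = NULL;           /* a single pointer, not an array of pointers */

void setup () {
    i2cBuff1 = i2cReadBytes(mpuAdd, 0x6F, 16);
    if (i2cBuff1 != NULL) {
        /* ... use the 16 bytes ... */
        free(i2cBuff1);          /* pairs with the calloc() inside i2cReadBytes */
        i2cBuff1 = NULL;
    }
}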

Basic OpenCL Mutex Implementation (Currently Hanging)

I am trying to write a mutex for OpenCL. The idea is for every single individual work item to be able to proceed atomically. Currently, I believe the problem may be that thread warps are unable to proceed when one thread in a warp gets the lock.
My current simple kernel is below, for summing numbers. "numbers" is an array of floats as input, "sum" is a one-element array for the result, and "semaphore" is a one-element array holding the semaphore. I based it heavily on the example here.
void acquire(__global int* semaphore) {
    int occupied;
    do {
        occupied = atom_xchg(semaphore, 1);
    } while (occupied > 0);
}

void release(__global int* semaphore) {
    atom_xchg(semaphore, 0); // the previous value, which is returned, is ignored
}

__kernel void test_kernel(__global float* numbers, __global float* sum, __global int* semaphore) {
    int i = get_global_id(0);
    acquire(semaphore);
    *sum += numbers[i];
    release(semaphore);
}
I am calling the kernel effectively like:
int numof_dimensions = 1;
size_t offset_global[1] = {0};
size_t size_global[1]   = {4000}; // the length of the numbers array
size_t* size_local      = NULL;

clEnqueueNDRangeKernel(command_queue, kernel, numof_dimensions,
                       offset_global, size_global, size_local,
                       0, NULL, NULL);
As above, when running, the graphics card hangs, and the driver restarts itself. How can I fix it so that it doesn't?
What you are trying to do is not possible because of the GPU execution model, where all threads on a "processor" share the instruction pointer, even in branches. Here is a post that explains the problem in detail: http://vansa.ic.cz/author/admin/.
BTW, the example code that you found has the exact same problem and would never work.
The answer to this might seem obvious in retrospect, but it's not unless you thought of it.
Basically, the GPU's prediction of the ideal local group size (size of a thread warp) is greater than 1, and so thread warps lock up. To fix it, you just need to specify it to be 1 (i.e. "size_t size_local[1] = {1};"). Doing this produces a correct result.
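In terms of the enqueue call from the question, that means passing an explicit local size instead of NULL (a sketch of just the changed lines):

size_t size_local[1] = {1};   // one work-item per work-group, so no intra-warp lock-up

clEnqueueNDRangeKernel(command_queue, kernel, numof_dimensions,
                       offset_global, size_global, size_local,
                       0, NULL, NULL);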
