Recursive deallocation of larger tries

I have written a basic function for recursively deallocating a trie data structure in C:
// Root pointer is passed as arg in initial call
void destroy(node *trav)
{
    for (int i = 0; i < N; i++)
    {
        if (trav->children[i])
        {
            destroy(trav->children[i]);
        }
    }
    free(trav);
}
This function seems to work perfectly fine with any smaller dictionary file. The largest file that the program successfully loaded and unloaded contained 134,480 words.
However, it produces a segmentation fault when deallocating a larger trie. The larger file, which causes a segmentation fault, contains 506,915 words.
The error message produced by Valgrind states: "Invalid read of size 8", followed by several backtraces and finally: "Address is not stack'd, malloc'd or (recently) free'd".
What might be causing this?

What might be causing this?
Stack overflow might be causing this, although that seems somewhat unlikely: there are almost no locals, so each frame probably consumes only about 32 bytes of stack, which would allow recursion 8M/32 == 262144 levels deep with the default 8 MiB stack on Linux.
However, if your trie is extremely unbalanced, stack overflow is possible.
You can try ulimit -s unlimited and see if that makes the problem go away.
Or you could run your program under GDB and examine the instruction at which the SIGSEGV is reported. If it's a CALL, PUSH, or another form of "move to stack" instruction, stack overflow is very likely.
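If deep recursion does turn out to be the culprit, the usual fix is to keep the pending nodes on the heap instead of on the call stack. Below is a minimal sketch of an iterative version, assuming the same node type with a children[N] array; the function name destroy_iterative, the initial capacity, and the growable pointer array are my own illustration, not part of the original program:

// Iterative deallocation using an explicit, heap-allocated stack instead of
// the call stack, so trie depth no longer limits what can be freed.
void destroy_iterative(node *root)
{
    if (root == NULL)
        return;

    size_t capacity = 1024, top = 0;
    node **stack = malloc(capacity * sizeof(node *));
    if (stack == NULL)
        return;                       // out of memory; give up rather than crash

    stack[top++] = root;

    while (top > 0)
    {
        node *current = stack[--top]; // take the next pending node

        for (int i = 0; i < N; i++)
        {
            if (current->children[i])
            {
                if (top == capacity)  // grow the explicit stack when it fills up
                {
                    capacity *= 2;
                    node **grown = realloc(stack, capacity * sizeof(node *));
                    if (grown == NULL)
                        break;        // cannot grow; remaining children leak
                    stack = grown;
                }
                stack[top++] = current->children[i];
            }
        }
        free(current);                // children were recorded before freeing the parent
    }
    free(stack);
}

The deallocation order differs from the recursive version (a parent is freed before its children), but every reachable node is still freed exactly once.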

Related

What is the rule behind instruction count in Intel PIN?

I wanted to count instructions in a simple recursive fibo function, which is O(2^n). I succeeded in doing so with bubble sort and matrix multiplication, but in this case it seemed like the instruction count ignored my fibo function. Here is the code used for instrumentation:
// Insert a call at the entry point of a routine to increment the call count
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)docount, IARG_PTR, &(rc->_rtnCount), IARG_END);
// For each instruction of the routine
for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
{
    // Insert a call to docount to increment the instruction counter for this rtn
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_PTR, &(rc->_icount), IARG_END);
}
I started to wonder what's the difference between this program and the previous ones and my first thought was: here I'm not using an array.
This is what I realised after some manual tests:
a = 5;        // instruction ignored by PIN, and
              // pretty much everything not using an array is ignored too
fibo[1] = 1;  // instruction counted properly
a = fibo[1];  // instruction ignored by PIN
So it seems like the only instructions counted are writes to memory (that is my assumption). After I changed my fibo function to the following, it works:
long fibonacciNumber(int n, long *fiboNumbers)
{
    if (n < 2) {
        fiboNumbers[n] = n;
        return n;
    }
    fiboNumbers[n] = fiboNumbers[n-1] + fiboNumbers[n-2];
    return fibonacciNumber(n - 1, fiboNumbers) + fibonacciNumber(n - 2, fiboNumbers);
}
But I would also like to count instructions for programs that aren't written by me. Is there a way to count all types of instructions? Is there any particular reason why only these instructions are counted? Any help appreciated.
//Edit
I used the disassembly option in Visual Studio to check how it looks, and it still makes no sense to me. I can't find the reason why only the assignment to an array is interpreted by PIN as an instruction.
[screenshot: instruction comparison]
This exceeded all my expectations: it was counted as 2 instructions (even 2 instructions, not one).
PIN, like other low-level profiling and analysis tools, measures individual instructions: low-level operations like "add these two registers" or "load a value from that memory address". The sequence of instructions that a program comprises is generally produced from a high-level language like C++ by a compiler. An individual line of C++ code might be transformed into exactly one instruction, but it's also common for a line to translate to several instructions or even to zero instructions; and the instructions for a line of code may be interleaved with those of other lines.
Your compiler can output an assembly-language file for your source code, showing which instructions were produced for which lines of code. (For GCC and Clang, this is done with the -S flag.) Note that reading a compiler's assembly output is not the best way to learn assembly. I would also point you to godbolt.org, a very convenient tool for examining assembly output.
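As a concrete illustration of the "several instructions or zero instructions" point, consider the small, self-contained C program below; the variable names mirror the question, and the behaviour described in the comments is the typical (not guaranteed) outcome. Compiling it with gcc -S -O2 and reading the generated assembly usually shows the dead local store vanishing entirely, while the store to the global array survives as a real memory write:

#include <stdio.h>

int fibo[10];            /* global array: stores to it are real memory writes */

int main(void)
{
    int a = 5;           /* dead store to a local: often compiled to zero instructions */
    fibo[1] = 1;         /* store to memory: usually survives as an actual store instruction */
    a = fibo[1];         /* load from memory; may be folded away by the optimizer */

    printf("%d %d\n", a, fibo[1]);  /* keeps the values observable */
    return 0;
}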

'clEnqueueNDRangeKernel' failed with error 'out of resources'

From my kernel I call a function, say f, which has an infinite loop that breaks on ++depth > 5. This works fine without the following snippet:
for (int j = 0; j < 9; j++) {
    f1 = inside(prev.o, s[j]);
    f2 = inside(x.o, s[j]);
    if (f1 ^ f2) {
        stage = 0;
        break;
    } else if (fabs(offset(x.o, s[j])) < EPSILON) {
        id = j;
        stage = 1;
        break;
    }
}
Looping over the 9 elements in s is the only thing I do here. This is inside the infinite loop. I checked, and it does not have a problem running 2 times, but the third time it runs out of memory. What is going on? It's not like I am creating any new variables anywhere. There is a lot of code in the while loop that does more complicated computation than the above snippet, and that does not run into a problem. My guess is that I'm doing something wrong with storing s.
If you read the OpenCL documentation, the error is not produced because the kernel code is wrong. The code is not even run at all; it all happens at the queueing step:
OpenCL: clEnqueueNDRangeKernel
CL_OUT_OF_RESOURCES:
If there is a failure to queue the execution instance of kernel on the command-queue because of insufficient resources needed to execute the kernel. For example, the explicitly specified local_work_size causes a failure to execute the kernel because of insufficient resources such as registers or local memory.
Another example would be the number of read-only image args used in kernel exceed the CL_DEVICE_MAX_READ_IMAGE_ARGS value for device or the number of write-only image args used in kernel exceed the CL_DEVICE_MAX_WRITE_IMAGE_ARGS value for device or the number of samplers used in kernel exceed CL_DEVICE_MAX_SAMPLERS for device.
Check the local memory size, local group size, constant memory and kernel arguments size.
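One practical way to do that check is to query the limits at run time and compare them with what the kernel actually needs. A minimal sketch in plain C, assuming you already have a valid cl_device_id and a built cl_kernel (error checking omitted for brevity):

#include <stdio.h>
#include <CL/cl.h>   /* <OpenCL/opencl.h> on Mac OS X */

/* Prints the device and per-kernel resource limits that commonly matter
   when clEnqueueNDRangeKernel fails with CL_OUT_OF_RESOURCES. */
void print_resource_limits(cl_device_id device, cl_kernel kernel)
{
    cl_ulong device_local_mem = 0, kernel_local_mem = 0, kernel_private_mem = 0;
    size_t   device_max_wg = 0, kernel_max_wg = 0;

    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                    sizeof(device_local_mem), &device_local_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(device_max_wg), &device_max_wg, NULL);

    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernel_max_wg), &kernel_max_wg, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                             sizeof(kernel_local_mem), &kernel_local_mem, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PRIVATE_MEM_SIZE,
                             sizeof(kernel_private_mem), &kernel_private_mem, NULL);

    printf("device local mem:   %llu bytes\n", (unsigned long long)device_local_mem);
    printf("device max WG size: %zu\n", device_max_wg);
    printf("kernel max WG size: %zu\n", kernel_max_wg);
    printf("kernel local mem:   %llu bytes\n", (unsigned long long)kernel_local_mem);
    printf("kernel private mem: %llu bytes\n", (unsigned long long)kernel_private_mem);
}

Comparing those numbers against the work-group size and buffers you actually pass usually pinpoints which resource is being exhausted.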

I don't understand the result of this little program

I've made this little program to test a small part of a bigger program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char c[] = "ddddddddddddd";
    char *g = malloc(4 * sizeof(char));
    *g = NULL;
    strcpy(g, c);
    printf("Hello world %s!\n", g);
    return 0;
}
I expected that the program would print "Hello world dddd!", since the length of g is 4*sizeof(char), but it prints "Hello world ddddddddddddd!". Can you explain where I'm wrong?
Don't do that, it's undefined behaviour.
The strcpy function will happily copy all those characters in c regardless of the size of g.
That's because it copies characters up to the first \0 in c. In this particular case it may corrupt your heap, or it may not, depending on the minimum size of things that get allocated on the heap (many allocators have a "resolution" of sixteen bytes, for example).
There are other functions you can use (though they're optional) if you want your code to be more robust, such as strncpy (provided you understand its limitations), or strcpy_s(), as detailed in Annex K of the ISO C11 standard (and earlier iterations as well).
Or, if you can't use those for some reason, it's up to the developer to ensure they don't break the rules.
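For illustration, here are two ways the copy could be made safe: either allocate enough room for the whole string, or keep the small buffer and truncate explicitly. This is only a sketch mirroring the question's code; note that strncpy does not write a terminating '\0' when the source is longer than the limit, so one is added by hand:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char c[] = "ddddddddddddd";

    /* Option 1: allocate enough room for the whole string plus the '\0'. */
    char *g = malloc(strlen(c) + 1);
    if (g == NULL)
        return 1;
    strcpy(g, c);
    printf("Hello world %s!\n", g);   /* prints all the d's, safely */
    free(g);

    /* Option 2: keep a 4-byte buffer but truncate explicitly. */
    char h[4];
    strncpy(h, c, sizeof(h) - 1);
    h[sizeof(h) - 1] = '\0';          /* strncpy won't terminate a too-long copy */
    printf("Hello world %s!\n", h);   /* prints "ddd" */
    return 0;
}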

What are kernel blocks in OpenCL?

In the article "How to set up Xcode to run OpenCL code, and how to verify the kernels before building" NeXTCoder referred to some code as the "Short Answer", i.e. https://developer.apple.com/library/mac/#documentation/Performance/Conceptual/OpenCL_MacProgGuide/XCodeHelloWorld/XCodeHelloWorld.html.
In that code the author says "Wrap your kernel code into a kernel block:" without explaining what is a "kernel block". (The OpenCL Programmer Guide for Mac OS X by Apple makes no mention of kernel block.)
The host program calls "square_kernel", but the sample kernel is called "square", and the sample kernel block is labelled kernelName (in italics). Can you please tell me how to put the 3 pieces together: kernel, kernel block & host program, to run in Xcode 5.1? I only have one kernel. Thanks.
It's not really jargon. It's a closure-like entity.
OpenCL C 2.0 adds support for the clang block syntax. You use the ^ operator to declare a Block variable and to indicate the beginning of a Block literal. The body of the Block itself is contained within {}, as shown in the example (as usual with C, ; indicates the end of the statement). The Block is able to make use of variables from the same scope in which it was defined.
Example:
int multiplier = 7;
int (^myBlock)(int) = ^(int num) {
    return num * multiplier;
};

printf("%d\n", myBlock(3));
// prints 21
Source:
https://www.khronos.org/registry/cl/sdk/2.1/docs/man/xhtml/blocks.html
The term "kernel block" only seems to be a jargon to refer to the "part of the code that is the kernel". Particularly, the kernel block in this case is simply the function that is declared to be a kernel, by adding kernel before its declaration. Or, even simpler, and from the way how the term is used on this website, I would say that "kernel block" is the same as "kernel".
The kernelName (in italics) is a placeholder. The code there shows the general pattern of how to define any kernel:
It is prefixed with kernel
It returns void
It has a name ... the kernelName, which may for example be square
It has several input and output parameters
The reason why the kernel is called square, but invoked with square_kernel, seems to be some magic that is done by Xcode: it seems to read the .cl file and create a .h file that contains additional declarations derived from the .cl file (as can be seen in this question, where a kernel called rebound is defined, and GCL generated a rebound_kernel declaration).
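Putting that pattern into practice, the kernel for the Apple sample could look roughly like the sketch below (the parameter names are illustrative, not taken from Apple's code):

// square.cl -- the "kernel block": an ordinary OpenCL C function
// marked with the kernel qualifier, returning void.
kernel void square(global float *input,
                   global float *output,
                   const unsigned int count)
{
    size_t i = get_global_id(0);    // one work-item per element
    if (i < count)
        output[i] = input[i] * input[i];
}

On the host side you can either create and enqueue it yourself with clCreateKernel(program, "square", &err) and clEnqueueNDRangeKernel, or, when using Apple's Xcode/GCL integration, call the square_kernel wrapper that Xcode generates from the .cl file.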

Global variable touched by a passed-in parameter becomes unusable

folks!
I pass a struct full of data to my kernel, and I run into the following difficulty using it (very stripped down):
[edit: Mac OS X / Xcode 3.2 on a MacBook Pro; this compile is obviously for CPU]
typedef struct
{
    float xoom;
    int sizex;
} varholder;

float zX, xd;

__kernel void Harlan( __global varholder * vh )
{
    int X = get_global_id(0), Y = get_global_id(1);
    zX = ( ( X - vh->sizex/2 ) / vh->xoom + vh->sizex/2 );  // (a)
    xd = zX;                                                // (b) BOOM!!
}
After executing line (a), the line marked (b), a simple assignment, gives "LLVM compiler failed to compile a function".
If, however, we do not execute line (a), then line (b) is fine.
So, through fiddling around a LOT with this, it seems as if it is the assignment statement (a), which uses a passed-in parameter, that messes up later access to the variable zX. However, of course I need to be able to use the results of calculations further down the line.
I have zX and xd declared at the file level because my helper functions need them.
Any thoughts?
Thanks!
David
p.s. I'm now registered so will be able to upvote and accept answers, which I am sadly unable to do for the last person who helped me (used same username to register, but can't seem to vote on the old post; sorry!).
No, say it ain't so!
I am sincerely hoping that this is not a "correct" answer to my own question. I found the following on another forum (though not in answer to the same question!), and I am afraid that it refers to what I'm trying to do:
(quote)
You're doing something the standard prohibits. Section 6.5 says:
'All program scope variables must be declared in the __constant address space.'
In other words, program scope variables cannot be mutable.
(end quote)
... well, tcha!!!! What an astoundingly inconvenient restriction! I'm sure there's reasoning behind it.
[edit: Not At All inconvenient! it was in fact astonishingly easy to work around, given a fresh start the next morning. (And no alcohol.)]
You guys & dolls all knew this, right, and didn't have the heart to tell me?...

Resources