OpenCL does not support recursive functions, but does this restriction also cover indirect recursion?
void recursiveA(int *a, int b) // call this first to start the recursion
{
    a[b] = 3;
    if(b < 10)
    {
        recursiveB(a, b + 1); // A calls B
    }
}

void recursiveB(int *a, int b)
{
    a[b] = 3;
    if(b < 10)
    {
        recursiveA(a, b + 1); // B calls A while A has not yet finished;
                              // are A's entry point & arguments corrupted?
    }
}
instead of
void recurse(int *a, int b)
{
    a[b] = 3;
    if(b < 10)
    {
        recurse(a, b + 1); // some OpenCL devices do not have this ability,
                           // so this is not possible in OpenCL
    }
}
So, can we call a function "R" from another function even if the first invocation of "R" has not finished? Do these functions use the same fixed addresses for their arguments every time we call them?
Do I have to use a custom "stack" implementation to do indirect recursion until OpenCL 2.0 is released?
OpenCL does not support recursive control flow, which includes mutual recursion. Therefore, to ensure that your code works properly on every platform you may wish to target, you should refrain from using any form of recursion, and instead write your algorithms using an iterative approach.
In practice, OpenCL compilers may be able to handle certain recursive algorithms just fine. For example, if your function is tail-recursive, then the compiler can produce a non-recursive form by applying standard tail-call optimisation techniques. I've just tried the second recursive code snippet you posted, and it was accepted by multiple OpenCL compilers. The first code snippet caused them all to crash, which indicates that they couldn't apply the necessary transformations to avoid recursive calls (though clearly they should produce a suitable error message rather than crashing).
So, you may be able to get away with simple recursion with some OpenCL implementations, but for maximum portability across different platforms I would strongly recommend that you avoid it.
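As for the follow-up about a custom "stack": yes, until recursion support arrives, the usual workaround is exactly that. Below is a minimal sketch (the function name, stack layout, and STACK_CAP are mine, not from any API) of the mutual recursion above flattened into a loop driven by an explicit stack in private memory:
#define STACK_CAP 16

void iterate(int *a, int b)
{
    int stack[STACK_CAP]; // explicit "call stack" in private memory
    int top = 0;
    stack[top++] = b;     // the initial "call"
    while(top > 0)
    {
        int cur = stack[--top];          // pop the pending call
        a[cur] = 3;                      // the work both A and B performed
        if(cur < 10 && top < STACK_CAP)
            stack[top++] = cur + 1;      // push what recursiveA/B would have called
    }
}
For this toy example the stack never holds more than one entry, but the same pattern carries over to real recursive workloads such as tree traversals, with each stack entry holding whatever state the recursive call would have received.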
In dynamic languages, how is dynamically typed code JIT compiled into machine code? More specifically: does the compiler infer the types at some point? Or is it strictly interpreted in these cases?
For example, if I have something like the following pseudocode:
def func(arg)
    if (arg)
        return 6
    else
        return "Hi"
How can the execution platform know before running the code what the return type of the function is?
In general, it doesn't. However, it can assume either type, and optimize for that. The details depend on what kind of JIT it is.
The so-called tracing JIT compilers interpret and observe the program, and record types, branches, etc. for a single run (e.g. a loop iteration). They record these observations, insert a (quite fast) check that these assumptions still hold when the code is executed, and then optimize the heck out of the following code based on these assumptions. For example, if your function is called in a loop with a constantly true argument and one is added to its result, the JIT compiler first records instructions like this (we'll ignore call frame management, memory allocation, variable indirection, etc., not because those aren't important, but because they take a lot of code and are optimized away too):
; calculate arg
guard_true(arg)
boxed_object o1 = box_int(6)
guard_is_boxed_int(o1)
int i1 = unbox_int(o1)
int i2 = 1
int i3 = add_int(i1, i2)
and then optimizes it like this:
; calculate arg
; may even be elided, arg may be constant without you realizing it
guard_true(arg)
; guard_is_boxed_int constant-folded away
; unbox_int constant-folded away
; add_int constant-folded away
int i3 = 7
Guards can also be moved to allow optimizing earlier code, combined to have fewer guards, elided if they are redundant, strengthened to allow more optimizations, etc.
If guards fail too frequently, or some code is otherwise rendered useless, it can be discarded, or at least patched to jump to a different version on guard failure.
Other JITs take a more static approach. For instance, you can do quick, inaccurate type inference to at least recognize a few operations. Some JIT compilers only operate on function scope (they are thus called method JIT compilers by some), so they probably can't make much of your code snippet (which is one reason tracing JIT compilers are so popular). Nevertheless, they exist: an example is the latest revision of Mozilla's JavaScript engine, IonMonkey, although it apparently takes inspiration from tracing JITs as well. You can also add not-always-valid optimizations (e.g. inline a function that may be changed later) and remove them when they become wrong.
When all else fails, you can do what interpreters do: box objects, use pointers to them, tag the data, and select code based on the tag. But this is extremely inefficient; the whole purpose of JIT compilers is to get rid of that overhead, so they will only do it when there is no reasonable alternative (or while they are still warming up).
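As a concrete illustration of that fallback, here is a minimal sketch in C of boxing and tag-based dispatch (the type names and layout are illustrative, not taken from any particular VM):
#include <stdio.h>

typedef enum { TAG_INT, TAG_STR } Tag;

typedef struct {
    Tag tag;                              // run-time type tag
    union { int i; const char *s; } as;   // the boxed payload
} Value;

// What the pseudocode func(arg) amounts to without JIT optimization:
// the return type is only known at run time, via the tag.
Value func(int arg) {
    Value v;
    if (arg) { v.tag = TAG_INT; v.as.i = 6; }
    else     { v.tag = TAG_STR; v.as.s = "Hi"; }
    return v;
}

int main(void) {
    Value v = func(0);
    if (v.tag == TAG_INT)                 // every use dispatches on the tag
        printf("%d\n", v.as.i);
    else
        printf("%s\n", v.as.s);
    return 0;
}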
I know that there is no way to use std classes such as string, vector, map, or set in a CUDA kernel. However, it's very uncomfortable without them. I have to write a lot of code in CUDA kernels, so I would like to use at least strings and vectors. I'm not talking about something like thrust. I want to be able to write something like this:
__global__ void kernel()
{
    cuda_vector<int> a;
    for(int i = 0; i < 10; i++)
        a.push_back(i);
}

int main()
{
    kernel<<<1,512>>>();
    return 0;
}
This should create 512 threads, and in each thread I want to create a cuda_vector object and use it like std::vector. I didn't find any solution on the internet, so I started to write my own class. Each function of this class is defined as a __host__ and __device__ function so that I can use it on both CPU and GPU.
Theoretically it can be implemented, but only on the Fermi architecture, because we need to allocate memory dynamically on the device. I have a GTX 580 and started to write my own vector, but it's tiring and needs a lot of time. Isn't there any implementation which I can use? I can't believe that there isn't any. Do so many software developers write CUDA code without it? And has no one tried to write his/her own version?
The reason you don't find something like std::vector for CUDA is performance. A traditional vector object doesn't fit well with the CUDA model. If you are planning on using only 512 threads, and each one will be managing a std::vector-like object, your performance is going to be worse than running the same code on the CPU.
GPU threads are not like CPU threads; they should be as light as possible. Use thread blocks and shared memory to have the threads cooperate. If you are manipulating a string, each thread should work on one character; if you are using vectors on the CPU, pass an array of them to the GPU and have each thread work on one element. Basically, think about how to solve the problem with the CUDA programming model, as opposed to solving it with a CPU approach and then translating it to CUDA.
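For illustration, here is a minimal sketch of that per-element style (the kernel and variable names are mine): instead of a per-thread container, the host passes one plain array and every thread owns exactly one index.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // this thread's element
    if (i < n)
        data[i] += 1;
}

// Host side (error checking omitted):
//   int *d_data;
//   cudaMalloc(&d_data, n * sizeof(int));
//   cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
//   addOne<<<(n + 511) / 512, 512>>>(d_data, n);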
I've not used it, but the CuPP framework may be of interest to you, especially the vector<T> implementation. Looks like it could do what you need it to do.
To my mind, the power of functional purity shows when deep code paths can be verified as side-effect free. What are people's experiences of the scale of code tree that can sit inside a pure specifier, and what of the level of code reuse?
A few things I spotted:
std.algorithm is mostly not marked as pure, but could potentially be largely pure, either by a pure version of the algorithms demanding purity of the instantiating function or mixin, or else by the purity specifier itself being statically polymorphic.
Useful converters like to!string(someInt) aren't currently pure.
User-defined structs seem to have problems (as illustrated below) with:
1. pure destructors on a nested struct
2. a pure postblit function even on a non-nested struct
The following code currently gives multiple errors on DMD 2.052 win 32-bit:
struct InnerStruct
{
    pure this(this) {}
    pure ~this() {}
}

struct OuterStruct
{
    InnerStruct innerStruct;
    pure this(this) {}
    pure ~this() {}
}

pure void somePureFunc()
{
    OuterStruct s1 = OuterStruct(); // pure nested destructor does not compile
    OuterStruct s2 = s1;
    InnerStruct is1 = InnerStruct(); // pure non-nested destructor seems to compile
    InnerStruct is2 = is1; // pure non-nested postblit does not compile
}

void main()
{
    somePureFunc();
}
pure_postblit.d(18): Error: pure function 'somePureFunc' cannot call impure function '__cpctor'
pure_postblit.d(20): Error: pure function 'somePureFunc' cannot call impure function '__cpctor'
pure_postblit.d(18): Error: pure function 'somePureFunc' cannot call impure function '~this'
pure_postblit.d(17): Error: pure function 'somePureFunc' cannot call impure function '~this'
In theory the point of pure in D is that it's supposed to allow guarantees that a function is side effect free regardless of how that function is implemented. There are two kinds of purity in D:
All functions marked pure are weakly pure. They may not access any global mutable state (global variables, thread-local variables, static variables, etc.) or perform I/O. They may, however, mutate their arguments. The point of these functions is that they may be called from strongly pure functions (detailed below) without violating the guarantees of strong purity.
All functions that are weakly pure and do not have any arguments with mutable indirection are strongly pure. The const and immutable type constructors can be used to guarantee this. (When dealing with structs and classes, the this pointer is considered a parameter.) Strongly pure functions have all of the nice properties that functional programming people talk about, even if they're implemented using mutable state. A strongly pure function always returns the same value for any given arguments and has no observable side effects. Strongly pure functions are referentially transparent, meaning their return value may be substituted for a call to them with a given set of parameters without affecting observable behavior. Any strongly pure function can be safely executed in parallel with any other strongly pure function.
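To make the distinction concrete, here is a small sketch (my function names, not from Phobos):
// Weakly pure: no global state or I/O, but it may mutate its argument,
// because int[] carries mutable indirection.
pure void fill(int[] arr, int value)
{
    foreach (ref e; arr)
        e = value;
}

// Strongly pure: const(int)[] removes the mutable indirection, so equal
// arguments always produce an equal result, with no observable side effects.
pure int sum(const(int)[] arr)
{
    int total = 0;
    foreach (e; arr)
        total += e;
    return total;
}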
Unfortunately the interaction between generic code and pure (as well as const and immutable) is rather poor. There have been several proposals to fix this, but none have been accepted yet.
std.algorithm is written to be as generic as possible, so it can't require that its lambda functions and the ranges it accepts be pure. Furthermore, the type system features that were added in D2 are generally the most buggy features in the language, because more basic things have been prioritized ahead of fixing the relevant issues. Right now, pure is basically not usable except for trivial cases like std.math.
I know that pointers in Go allow mutation of a function's arguments, but wouldn't it have been simpler if the designers had adopted just references (with appropriate const or mutable qualifiers)? Now we have pointers, and for some built-in types like maps and channels, implicit pass by reference.
Am I missing something or are pointers in Go just an unnecessary complication?
Pointers are useful for several reasons. Pointers allow control over memory layout, which affects the efficiency of the CPU cache. In Go we can define a structure where all the members are in contiguous memory:
type Point struct {
    x, y int
}

type LineSegment struct {
    source, destination Point
}
In this case the Point structures are embedded within the LineSegment struct. But you can't always embed data directly. If you want to support structures such as binary trees or linked lists, then you need to support some kind of pointer.
type TreeNode struct {
    value int
    left  *TreeNode
    right *TreeNode
}
Java, Python, etc. don't have this problem, because they do not allow you to embed composite types, so there is no need to syntactically differentiate between embedding and pointing.
Issues with Swift/C# structs solved with Go pointers
A possible alternative to accomplish the same is to differentiate between struct and class, as C# and Swift do. But this has limitations. While you can usually specify that a function takes a struct as an inout parameter to avoid copying the struct, this doesn't allow you to store references (pointers) to structs. That means you can never treat a struct as a reference type when you find that useful, e.g. to create a pool allocator (see below).
Custom Memory Allocator
Using pointers you can also create your own pool allocator (this is very simplified, with lots of checks removed, just to show the principle):
type TreeNode struct {
    value        int
    left         *TreeNode
    right        *TreeNode
    nextFreeNode *TreeNode // for memory allocation
}
var pool [1024]TreeNode
var firstFreeNode *TreeNode = &pool[0]

func poolAlloc() *TreeNode {
    node := firstFreeNode
    firstFreeNode = firstFreeNode.nextFreeNode
    return node
}

func freeNode(node *TreeNode) {
    node.nextFreeNode = firstFreeNode
    firstFreeNode = node
}
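One of the checks removed above is the initial linking of the pool: for the sketch to hand out more than one node, the entries must first be chained into a free list, along these lines (initPool is my name, not from the original answer):
func initPool() {
    for i := 0; i < len(pool)-1; i++ {
        pool[i].nextFreeNode = &pool[i+1] // chain each entry to the next
    }
    firstFreeNode = &pool[0]
}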
Swap two values
Pointers also allow you to implement swap, that is, swapping the values of two variables:
func swap(a *int, b *int) {
    temp := *a
    *a = *b
    *b = temp
}
Conclusion
Java has never been able to fully replace C++ for systems programming at places such as Google, in part because performance cannot be tuned to the same extent due to the lack of control over memory layout and usage (cache misses hurt performance significantly). Go aims to replace C++ in many areas and thus needs to support pointers.
I really like the example taken from https://www.golang-book.com/books/intro/8
package main

import "fmt"

func zero(x int) {
    x = 0
}

func main() {
    x := 5
    zero(x)
    fmt.Println(x) // x is still 5
}
as contrasted with
package main

import "fmt"

func zero(xPtr *int) {
    *xPtr = 0
}

func main() {
    x := 5
    zero(&x)
    fmt.Println(x) // x is 0
}
Go is designed to be a terse, minimalist language. It therefore started with just values and pointers. Later, by necessity, some reference types (slices, maps, and channels) were added.
The Go Programming Language : Language Design FAQ : Why are maps, slices, and channels references while arrays are values?
"There's a lot of history on that topic. Early on, maps and channels were syntactically pointers and it was impossible to declare or use a non-pointer instance. Also, we struggled with how arrays should work. Eventually we decided that the strict separation of pointers and values made the language harder to use. Introducing reference types, including slices to handle the reference form of arrays, resolved these issues. Reference types add some regrettable complexity to the language but they have a large effect on usability: Go became a more productive, comfortable language when they were introduced."
Fast compilation is a major design goal of the Go programming language; that has its costs. One of the casualties appears to be the ability to mark variables (except for basic compile time constants) and parameters as immutable. It's been requested, but turned down.
golang-nuts : go language. Some feedback and doubts.
"Adding const to the type system forces it to appear everywhere, and
forces one to remove it everywhere if something changes. While there
may be some benefit to marking objects immutable in some way, we don't
think a const type qualifier is to way to go."
References cannot be reassigned, while pointers can. This alone makes pointers useful in many situations where references could not be used.
Rather than answering it in the context of Go, I would answer this question in the context of any language (e.g. C, C++, Go) that implements the concept of pointers; the same reasoning applies to Go as well.
There are typically two memory sections where allocation takes place: heap memory and stack memory (let's leave the global section/memory out, as it would go beyond the scope here).
Heap memory: this is what most languages make use of, be it Java, C#, Python… But it comes with a penalty called garbage collection, which is a direct performance hit.
Stack memory: variables can be allocated on the stack in languages like C, C++, Go, and Java. Stack memory doesn't require garbage collection; hence it is a performant alternative to heap memory.
But there is a problem: when we allocate an object in heap memory, we get back a reference, which can be passed to multiple methods/functions, and through that reference they can all read/update the same object directly. Sadly, the same is not true for stack memory: whenever a stack variable is passed to a method/function, it is passed by value (as in Java), unless the language has the concept of pointers (as C, C++, and Go do).
This is where pointers come into the picture. Pointers let multiple methods/functions read/update data that lives in stack memory.
In a nutshell, pointers allow the use of stack memory instead of heap memory when variables/structures/objects must be processed by multiple methods/functions, thereby avoiding the performance hit of the garbage collection mechanism.
Another reason for introducing pointers in Go could be that Go is meant to be an efficient systems programming language, just like C, C++, and Rust, and to work smoothly with the system calls provided by the underlying operating system, since many system call APIs have pointers in their prototypes.
One may argue that this could be achieved by introducing a pointer-free layer on top of the system call interface. Yes, it could, but having pointers keeps the language very close to the system call layer, which is a trait of a good systems programming language.
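As a small illustration of that last point (a hedged, Linux-specific example; Gettimeofday is just one convenient system call wrapper), Go's syscall package mirrors the kernel interface by passing pointers:
package main

import (
    "fmt"
    "syscall"
)

func main() {
    var tv syscall.Timeval
    // The wrapper passes &tv down to the kernel, which writes through the pointer.
    if err := syscall.Gettimeofday(&tv); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("seconds since epoch:", tv.Sec)
}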
I'm facing a problem where both recursion and using a loop seem like natural solutions. Is there a convention or "preferred method" for cases like this? (Obviously it is not quite as simple as below)
Recursion
Item Search(string desired, Scope scope) {
    foreach(Item item in scope.items)
        if(item.name == desired)
            return item;
    return scope.Parent ? Search(desired, scope.Parent) : null;
}
Loop
Item Search(string desired, Scope scope) {
    for(Scope cur = scope; cur != null; cur = cur.Parent)
        foreach(Item item in cur.items)
            if(item.name == desired)
                return item;
    return null;
}
I favor recursive solutions when:
The implementation of the recursion is much simpler than the iterative solution, usually because it exploits a structural aspect of the problem in a way that the iterative approach cannot
I can be reasonably assured that the depth of the recursion will not cause a stack overflow, assuming we're talking about a language that implements recursion this way
Condition 1 doesn't seem to be the case here. The iterative solution is about the same level of complexity, so I'd stick with the iterative route.
If performance matters, then benchmark both and choose on a rational basis. If not, then choose based on complexity, with concern for possible stack overflow.
There is a guideline from the classic book The Elements of Programming Style (by Kernighan and Plauger) that the algorithm should follow the data structure. That is, recursive structures are often processed more clearly with recursive algorithms.
Recursion is used to express an algorithm that is naturally recursive in a form that is more easily understandable. A "naturally recursive" algorithm is one where the answer is built from the answers to smaller sub-problems which are in turn built from the answers to yet smaller sub-problems, etc. For example, computing a factorial.
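As a small illustration (a sketch in C, with my own function names), the recursive form mirrors the definition n! = n * (n-1)!, while the iterative form accumulates the same product:
unsigned long factorial_rec(unsigned int n) {
    return n <= 1 ? 1 : n * factorial_rec(n - 1); // n! = n * (n-1)!
}

unsigned long factorial_iter(unsigned int n) {
    unsigned long result = 1;
    for (unsigned int i = 2; i <= n; i++)
        result *= i; // the same product, accumulated in a loop
    return result;
}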
In a programming language that is not functional, an iterative approach is nearly always faster and more efficient than a recursive approach, so the reason to use recursion is clarity, not speed. If a recursive implementation ends up being less clear than an iterative implementation, then by all means avoid it.
In this particular case, I would judge the iterative implementation to be clearer.
If you are using a functional language (doesn't appear to be so), go with recursion. If not, the loop will probably be better understood by anyone else working on the project. Of course, some tasks (like recursively searching a directory) are better suited to recursion than others.
Also, if the code cannot be optimized for tail recursion, the loop is safer.
Well, I saw tons of answers, and even an accepted answer, but never saw the correct one, and was wondering why...
Long story short:
Always avoid recursion if you can produce the same result with loops!
How does recursion work?
• A frame in stack memory is allocated for a single function call.
• The frame contains a reference to the actual method.
• If the method has objects, they are put into heap memory, and the frame holds references to those objects.
• These steps are repeated for every single method call!
Risks:
• StackOverflowError, when the stack has no memory left for new recursive calls.
• OutOfMemoryError, when the heap has no memory left for the objects those calls keep alive.
How does a loop work?
• All the steps above happen once; repeatedly executing the code inside the loop consumes no additional stack memory.
Risks:
• The single risk is a while loop whose condition never becomes false... That won't cause a crash or anything else; it just won't quit the loop if you naively write while(true). :)
Test:
Do this in your software:
private Integer someFunction() {
    return someFunction();
}
You will get a StackOverflowError within a second, and maybe an OutOfMemoryError too.
Then do this:
while(true) {
}
The software will just freeze, and no crash will happen.
Last but not least, for loops:
Always go with for loops where you can, because this kind of loop more or less forces you to state the bound beyond which it won't run. Sure, you can be stubborn and find a way to make a for loop never stop, but I advise you to always use loops instead of recursion, for the sake of memory management and the performance of your software, which is a big issue these days.
References:
Stack-based memory allocation
Use the loop. It's easier to read and understand (reading code is always a lot harder than writing it), and is generally a lot faster.
It is provable that all tail-recursive algorithms can be unrolled into a loop, and vice versa. Generally speaking, a recursive implementation of a recursive algorithm is clearer to follow for the programmer than the loop implementation, and is also easier to debug. Also generally speaking, the real-world performance of the loop implementation will be faster, as a branch/jump in a loop is typically faster to execute than pushing and popping a stack frame.
Personally speaking, for tail-recursive algorithms I prefer sticking with the recursive implementation in all but the most-performance-intensive situations.
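To illustrate the unrolling (a sketch in C, with my own function names): a call is a tail call when nothing remains to be done after it returns, so the call can be replaced by a jump back to the top of the function:
int sum_rec(const int *a, int n, int acc) {
    if (n == 0)
        return acc;
    return sum_rec(a + 1, n - 1, acc + a[0]); // tail call: no work left afterwards
}

int sum_loop(const int *a, int n) {
    int acc = 0;
    while (n != 0) { // the same computation, with the call turned into a jump
        acc += a[0];
        a += 1;
        n -= 1;
    }
    return acc;
}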
I prefer loops because:
Recursion is error-prone.
All the code remains in one function/method.
There are memory and speed savings.
I use stacks (a LIFO scheme) to make the loops work.
In Java, stacks are covered by the Deque interface:
// Get all the writable folders under one folder
// Java-like pseudocode
void searchWritableDirs(Folder rootFolder) {
    List<Folder> response = new List<Folder>();      // results
    Deque<Folder> folderDeque = new Deque<Folder>(); // stack of elements to inspect
    folderDeque.add(rootFolder);
    while( ! folderDeque.isEmpty()) {
        Folder actual = folderDeque.pop();             // get the next element
        if (actual.isWritable()) response.add(actual); // add it to the response
        for(Folder actualSubfolder: actual.getSubFolder()) {
            // Here we iterate subfolders; with the stack, recursion is not needed
            folderDeque.push(actualSubfolder);
        }
    }
    log("Folders " + response.size());
}
Less complicated, more compact than
// Get all the writable folders under one folder
// Java-like pseudocode
void searchWritableDirs(Folder rootFolder) {
    List<Folder> response = new List<Folder>(); // results
    rec_searchWritableDirs(rootFolder, response);
    log("Folders " + response.size());
}

private void rec_searchWritableDirs(Folder actual, List<Folder> response) {
    if (actual.isWritable()) response.add(actual); // add it to the response
    for(Folder actualSubfolder: actual.getSubFolder()) {
        // Here we iterate subfolders; recursion is needed
        rec_searchWritableDirs(actualSubfolder, response);
    }
}
The latter has less code, but it is split across two functions, and its code is harder to understand, IMHO.
I would say the recursive version is more understandable, but only with comments:
Item Search(string desired, Scope scope) {
    // search local items
    foreach(Item item in scope.items)
        if(item.name == desired)
            return item;
    // also search parent
    return scope.Parent ? Search(desired, scope.Parent) : null;
}
It is far easier to explain this version. Try to write a nice comment on the loop version and you will see.
I find the recursion more natural, but you may be forced to use the loop if your compiler doesn't do tail call optimization and your tree/list is too deep for the stack size.
If the system you're working on has a small stack (embedded systems), the recursion depth would be limited, so choosing the loop-based algorithm would be advisable.
I usually prefer the use of loops. Most good OOP designs will allow you to use loops without having to use recursion (and thus keep the program from pushing all those nasty parameters and addresses onto the stack).
Recursion has more of a use in procedural code, where it seems more logical to think in a recursive manner (due to the fact that you can't easily store state or metadata, and thus you create more situations that would merit its use).
Recursion is good for prototyping a function and/or writing a base, but after you know the code works and you come back to it during the optimization phase, try to replace it with a loop.
Again, this is all opinionated. Go with what works best for you.
You can also write the loop in a more readable format. C's for(init; condition; increment) has some readability drawbacks, since the increment command is mentioned at the start but executed at the end of each iteration.
Also, YOUR TWO SAMPLES ARE NOT EQUIVALENT: the recursive sample will fail and the loop will not if you call them as Search(null, null). This makes the loop version better for me.
Here are the samples, modified (and assuming null is false):
Recursion (fixed and tail-call optimizable)
Item Search(string desired, Scope scope) {
    if (!scope) return null;
    foreach(Item item in scope.items)
        if(item.name == desired)
            return item;
    // search parent (recursive)
    return Search(desired, scope.Parent);
}
Loop
Item Search(string desired, Scope scope) {
    // start
    Scope cur = scope;
    while(cur) {
        foreach(Item item in cur.items)
            if(item.name == desired)
                return item;
        // search parent
        cur = cur.Parent;
    } // loop
    return null;
}
If your code is compiled, it will likely make little difference. Do some testing and see how much memory is used and how fast it runs.
Avoid recursion. Chances are that piece of code will eventually have to be maintained, and that will be easier if it is not written with recursion. Second, it will most likely have a slower execution time.