I wrote a function in Maxima that performs a left fold, similar to Haskell's foldl:
if (is(li#[])) then
It works fine: it folds the list from the left and evaluates along the way, which prevents a long unevaluated expression from accumulating in the buffer.
The problem I am facing in running this with,
Error in PROGN [or a callee]: Bind stack overflow.
But if I run it up to 96, it generates the result correctly.
I don't understand why this simple addition causes a problem, as I don't have any infinite loop or memory-hungry task going on.

Well, foldl is defined as a recursive function, and it will call itself as many times as there are elements in the list. So whether it works depends on the Lisp implementation-specific limit for the function call stack. For GCL it seems the limit is relatively small. For other Lisp implementations, the limit is greater. But the only way to make it work for all sizes of the list is to write it iteratively.
There are built-in functions similar to foldl -- see lreduce, rreduce, xreduce, and tree_reduce.
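To illustrate the point about iteration, here is a minimal sketch in Python rather than Maxima (the name foldl_iter is made up for this example). A loop carries the accumulator, so the depth of the call stack does not grow with the length of the list.

```python
from operator import add

def foldl_iter(f, acc, items):
    """Left fold written as a loop: f(...f(f(acc, x1), x2)..., xn)."""
    for x in items:
        acc = f(acc, x)  # evaluate eagerly at every step, nothing piles up
    return acc

# A list far longer than a small call stack would tolerate recursively.
print(foldl_iter(add, 0, range(1, 100001)))
```

The same shape should carry over to a for loop in Maxima, although the built-in lreduce is probably the simpler choice there.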


Why is the program back in deep recursion after pressing semicolon?

I'm trying to understand the semicolon functionality.
I have this code:
del(X,[Y|Tail],[Y|Rest]) :-
permutation(L,[X|P]) :- del(X,L,L1), permutation(L1,P).
It's a simple predicate that shows all permutations of a given list.
I used the built-in graphical debugger in SWI-Prolog because I wanted to understand how it works, and I understand the first case, which returns the list given as the argument. Here is the diagram which I made for better understanding.
But I don't get it for the next solution. When I press the semicolon, it doesn't start in the place where it ended; instead it starts in some deep recursion where L=[] (like in step 9). I don't get it: didn't the recursion end earlier? It had to go out of the recursion to return the answer, and after the semicolon it's again deep in recursion.
Could someone clarify that to me? Thanks in advance.
One analogy that I find useful in demystifying Prolog is that Backtracking is like Nested Loops, and when the innermost loop's variables' values are all found, the looping is suspended, the vars' values are reported, and then the looping is resumed.
As an example, let's write down a simple generate-and-test program to find all pairs of natural numbers above 0 that sum up to a prime number. Let's assume is_prime/1 is already given to us.
We write this in Prolog as
above(0, N), between(1, N, M), Sum is M+N, is_prime(Sum).
We write this in an imperative pseudocode as
for N from 1 step 1:
    for M from 1 step 1 until N:
        Sum := M+N
        if is_prime(Sum):
            report_to_user_and_ask(Sum)
Now when report_to_user_and_ask is called, it prints Sum out and asks the user whether to abort or to continue. The loops are not exited, on the contrary, they are just suspended. Thus all the loop variables values that got us this far -- and there may be more tests up the loops chain that sometimes succeed and sometimes fail -- are preserved, i.e. the computation state is preserved, and the computation is ready to be resumed from that point, if the user presses ;.
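The suspend-and-resume behaviour can be sketched with a Python generator. This is only an analogy, not how a Prolog system is implemented; prime_sums and the trial-division is_prime are names assumed for this example. Each yield plays the role of reporting a solution, and each next() plays the role of pressing ;.

```python
from itertools import count, islice

def is_prime(n):
    """Trial division; good enough for a small demonstration."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_sums():
    for n in count(1):                # outer loop: N = 1, 2, 3, ...
        for m in range(1, n + 1):     # inner loop: M = 1 .. N
            total = m + n
            if is_prime(total):
                yield n, m, total     # report, then stay suspended right here

solutions = prime_sums()
print(next(solutions))  # first answer
print(next(solutions))  # like pressing ';': resumes inside both loops
```

Neither loop is restarted between the two next() calls; the values of n and m are still there when the generator wakes up.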
I first saw this in Peter Norvig's AI book's implementation of Prolog in Common Lisp. He used mapping (Common Lisp's mapcan which is concatMap in Haskell or flatMap in many other languages) as a looping construct though, and it took me years to see that nested loops is what it is really all about.
Goals conjunction is expressed as the nesting of the loops; goals disjunction is expressed as the alternatives to loop through.
A further twist is that the structure of the nested loops isn't fixed from the outset. It is fluid: the nested loops of a given loop can be created depending on the current state of that loop, i.e. depending on the current alternative being explored there; the loops are written as we go. In (most of the) languages where such dynamic creation of nested loops is impossible, it can be encoded with recursion / function invocation nested inside the loops. (Here's one example, with some pseudocode.)
If we keep all such loops (created for each of the alternatives) in memory even after they are finished with, what we get is the AND-OR tree (mentioned in the other answer) thus being created while the search space is being explored and the solutions are found.
(non-coincidentally this fluidity is also the essence of "monad"; nondeterminism is modeled by the list monad; and the essential operation of the list monad is the flatMap operation which we saw above. With fluid structure of loops it is "Monad"; with fixed structure it is "Applicative Functor"; simple loops with no structure (no nesting at all): simply "Functor" (the concepts used in Haskell and the like). Also helps to demystify those.)
So, the proper slogan could be Backtracking is like Nested Loops, either fixed, known from the outset, or dynamically-created as we go. It's a bit longer though. :)
Here's also a Prolog example, which "as if creates the code to be run first (N nested loops for a given value of N), and then runs it." (There's even a whole dedicated tag for it on SO, too, it turns out, recursive-backtracking.)
And here's one in Scheme ("creates nested loops with the solution being accessible in the innermost loop's body"), and a C++ example ("create n nested loops at run-time, in effect enumerating the binary encoding of 2^n, and print the sums out from the innermost loop").
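As a rough Python counterpart to those linked examples (not a transcription of any of them), recursion can create the nested loops at run time. The names nested_loops and bit_sums are hypothetical. Each level of recursion contributes one loop, and the body runs only in the innermost one.

```python
def nested_loops(depth, choices, body, picked=()):
    """Run `depth` nested loops over `choices`; call body in the innermost one."""
    if depth == 0:
        body(picked)          # innermost loop body: every loop variable is set
        return
    for c in choices:         # this recursion level is one loop
        nested_loops(depth - 1, choices, body, picked + (c,))

def bit_sums(n):
    """Enumerate all n-bit tuples and collect the sum of each one."""
    sums = []
    nested_loops(n, (0, 1), lambda bits: sums.append(sum(bits)))
    return sums

print(bit_sums(3))
```

The number of loops is just an argument here, which is the sense in which the loop structure is created as we go.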
There is a big difference between recursion in functional/imperative programming languages and Prolog (and it really became clear to me only in the last 2 weeks or so):
In functional/imperative programming, you recurse down a call chain, then come back up, unwinding the stack, then output the result. It's over.
In Prolog, you recurse down an AND-OR tree (really, alternating AND and OR nodes), selecting a predicate to call on an OR node (the "choicepoint"), from left to right, and calling every predicate in turn on an AND node, also from left to right. An acceptable tree has exactly one predicate returning TRUE under each OR node, and all predicates returning TRUE under each AND node. Once an acceptable tree has been constructed, by the very search procedure, we are (i.e. the "search cursor" is) on a rightmost bottommost node.
Success in constructing an acceptable tree also means a solution to the query entered at the Prolog Toplevel (the REPL) has been found: The variable values are output, but the tree is kept (unless there are no choicepoints).
And this is also important: all variables are global in the sense that if a variable X has been passed all the way down the call chain from predicate to predicate to the rightmost bottommost node, and then constrained at the last possible moment by unifying it with 2, for example X = 2, then the Prolog Toplevel is aware of that without further ado: nothing needs to be passed up the call chain.
If you now press ;, search doesn't restart at the top of the tree, but at the bottom, i.e. at the current cursor position: the nearest parent OR node is asked for more solutions. This may result in much search until a new acceptable tree has been constructed and we are at a new rightmost bottommost node. The new variable values are output and you may again enter ;.
This process cycles until no acceptable tree can be constructed any longer, upon which false is output.
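The cycle described above can be imitated in Python with nested generators. The Prolog clauses in the question are only partly shown, so this sketch assumes the usual definition in which del/3 removes one element from a list; select and permutation are names chosen for the example. After the first answer is produced, every generator in the chain is still suspended at its deepest point, and asking for the next answer resumes the innermost one first.

```python
def select(items):
    """Like del(X, L, L1): yield each element X together with the rest L1."""
    for i, x in enumerate(items):
        yield x, items[:i] + items[i + 1:]

def permutation(items):
    """Like permutation(L, P): yield the permutations one at a time."""
    if not items:
        yield []                            # base case: the empty list
        return
    for x, rest in select(items):           # choicepoint: which element first
        for tail in permutation(rest):      # deeper choicepoints stay open
            yield [x] + tail

answers = permutation([1, 2, 3])
print(next(answers))  # first solution
print(next(answers))  # like pressing ';': the deepest open choice is retried
```

On the second next(), control re-enters the innermost suspended generator (the one that just produced the empty list), finds it exhausted, and then tries the next alternative one level up, which matches the "back in deep recursion" effect seen in the debugger.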
Note that having this AND-OR as an inspectable and modifiable data structure at runtime allows some magical tricks to be deployed.
There is bound to be a lot of power in debugging tools which record this tree to help the user who gets the dreaded sphynxian false from a Prolog program that is supposed to work. There are now Time Traveling Debuggers for functional and imperative languages, after all...

R programming: Level of allowed recursion depth differs when calling a local helper function

This is a purely academic question:
I have recently been working with languages that use tail recursion optimization. For practice I wrote two recursive implementations of sum functions in R, one of them being tail recursive. I quickly realized there is no tail recursion optimization in R. I can live with that.
However, I also noticed a different level of allowed depth when using the local helper function for the tail recursion.
Here is the code:
## Recursive
sum <- function (i, end, fun){
if (i>=end) 0
else fun(i) + sum(i+1, end, fun)
## Tail recursive
sum_tail <- function (i, end, fun){
sum_helper<- function(i, acc){
if (i>=end) acc
else sum_helper(i+1, acc+fun(i))
sum_helper(i, 0)
## Simple example
harmonic <- function(k){
print(sum(1, 1200, harmonic)) # <- This works fine, but is close to the limit
# print(sum_tail(1, 1200, harmonic)) <- This will crash
print(sum_tail(1, 996, harmonic)) # <- This is the deepest allowed
I am fairly intrigued. Can someone explain this behavior or point me towards a document explaining how the allowed recursion depth is calculated?
I'm not sure of R's internal implementation of the call stack, but it's pretty obvious from here that there is a maximum stack depth. (Many languages have this for various reasons, mostly related to memory and detecting infinite recursion.) You can set it with options() (see the expressions setting), and the default seems to depend on the platform -- on my machine, I can do print(sum_tail(1, 996, harmonic)) without difficulty.
Sidebar: you really shouldn't name your naive implementation sum() because you wind up shadowing a builtin. I know you're just playing with recursion here, but you should also generally avoid writing your own implementation of sum() -- it's not provided just as a convenience function but also because it's nontrivial to implement a numerically correct version of sum() with floating point.
In your naive implementation, the call to fun() returns before the recursive call -- this means that each recursive call increases the depth of the call stack by exactly 1. In the other case, you've got an additional function call that's waiting to be evaluated. For more details, you should look into how R handles closures and how lazy / eager evaluation in R is handled. If I recall correctly, R uses environments (roughly, R's notion of scope, and deeply related to closures) to wrap arguments in certain situations and delay their evaluation, thus effectively using lazy evaluation. There's a lot of information on R internals available online, see here for a quick overview of argument evaluation. I'm not sure how accurate I am on the details, but it seems that the arguments to the tail-call are themselves getting placed on the call stack, thus increasing the depth of the call-stack by more than 1.
Sidebar the Second: I don't recall well enough how R implements this, and I know placing helper functions in the body is common practice, but placing a helper function definition in the recursive call could lead to each recursive call defining the helper function anew. This could interact in various ways with the way environments and closures are handled, but I'm not sure.
The functions traceback() and trace() could be useful in exploring the call behavior if you're curious about more of the details.
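Since there is no tail call optimization to rely on, the accumulator version can be turned into a loop by hand. The sketch below is in Python rather than R, with sum_iter as a made-up name; it keeps the question's convention that the range stops before end. The 1/k term in the usage line is only an assumed stand-in for the question's harmonic function, whose body is not shown.

```python
def sum_iter(i, end, fun):
    """Sum fun(i) for i, i+1, ..., end-1 without any recursion."""
    acc = 0
    while i < end:
        acc += fun(i)   # same update as sum_helper(i+1, acc+fun(i))
        i += 1
    return acc

# Depth is no longer a concern, so far larger ranges are fine.
print(sum_iter(1, 1200, lambda k: 1.0 / k))  # 1/k is an assumption
```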

Quicksort and tail recursive optimization

Introduction to Algorithms (p. 169) talks about using tail recursion for Quicksort.
The original Quicksort algorithm earlier in the chapter is (in pseudo-code)
Quicksort(A, p, r)
    if (p < r)
        q: <- Partition(A, p, r)
        Quicksort(A, p, q)
        Quicksort(A, q+1, r)
The optimized version using tail recursion is as follows
Quicksort(A, p, r)
    while (p < r)
        q: <- Partition(A, p, r)
        Quicksort(A, p, q)
        p: <- q+1
Where Partition sorts the array according to a pivot.
The difference is that the second algorithm only calls Quicksort once to sort the LHS.
Can someone explain to me why the 1st algorithm could cause a stack overflow, whereas the second wouldn't? Or am I misunderstanding the book?
First let's start with a brief, probably not accurate but still valid, definition of what stack overflow is.
As you probably know by now, there are two different kinds of memory, which are implemented as two different data structures: the heap and the stack.
In terms of size, the heap is bigger than the stack, and to keep it simple let's say that every time a function call is made, a new environment (local variables, parameters, etc.) is created on the stack. Given that, and the fact that the stack's size is limited, if you make too many nested function calls you will run out of space, and hence you will have a stack overflow.
The problem with recursion is that, since you are creating at least one environment on the stack per recursive call, you occupy a lot of space in the limited stack very quickly, so stack overflows are commonly associated with recursive calls.
So there is this thing called tail call optimization (TCO) that reuses the same environment every time a recursive call is made, so the space occupied on the stack is constant, which prevents the stack overflow issue.
Now, there are some rules that must hold in order to perform a tail call optimization. First, each call must be complete, and by that I mean that the function should be able to give a result at any moment if you interrupt the execution; in SICP this is called an iterative process, even when the function is recursive.
If you analyze your first example, you will see that each step is defined by two recursive calls, which means that if you stop the execution at any time you won't be able to give a partial result, because the result depends on those calls finishing. In this scenario you can't reuse the stack environment, because the total information is split between all those recursive calls.
However, the second example doesn't have that problem: A is constant and the state of p and r can be locally determined, so since all the information needed to keep going is there, TCO can be applied.
The essence of the tail recursion optimization is that there is no recursion when the program is actually executed. When the compiler or interpreter is able to kick TRO in, it means that it will essentially figure out how to rewrite your recursively-defined algorithm into a simple iterative process with the stack not used to store nested function invocations.
The first code snippet can't be TR-optimized because there are 2 recursive calls in it.
Tail recursion by itself is not enough. The algorithm with the while loop can still use O(N) stack space; reducing it to O(log(N)) is left as an exercise in that section of CLRS.
Assume we are working in a language with array slices and tail call optimization. Consider the difference between these two algorithms:
Quicksort(arraySlice) {
    if (arraySlice.length > 1) {
        slices = Partition(arraySlice)
        (smallerSlice, largerSlice) = sortBySize(slices)
        Quicksort(largerSlice) // Not a tail call, requires a stack frame until it returns.
        Quicksort(smallerSlice) // Tail call, can replace the old stack frame.
    }
}
Quicksort(arraySlice) {
    if (arraySlice.length > 1) {
        slices = Partition(arraySlice)
        (smallerSlice, largerSlice) = sortBySize(slices)
        Quicksort(smallerSlice) // Not a tail call, requires a stack frame until it returns.
        Quicksort(largerSlice) // Tail call, can replace the old stack frame.
    }
}
The second one is guaranteed never to need more than about log2(length) stack frames, because smallerSlice is at most half as long as arraySlice. For the first one the inequality is reversed: it will always need at least log2(length) stack frames, and can require O(N) stack frames in the worst case, where smallerSlice always has length 1.
If you don't keep track of which slice is smaller or larger, you will have similar worst cases to the first overflowing case, even though it will require O(log(n)) stack frames on average. If you always sort the smaller slice first, you will never need more than log_2(length) stack frames.
If you are using a language that doesn't have tail call optimization, you can write the second (not stack-blowing) version as:
Quicksort(arraySlice) {
    while (arraySlice.length > 1) {
        slices = Partition(arraySlice)
        (smallerSlice, arraySlice) = sortBySize(slices)
        Quicksort(smallerSlice) // Still not a tail call, requires a stack frame until it returns.
    }
}
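A runnable sketch of the same idea in Python, under a few assumptions that are not in the pseudocode above: it sorts a list in place using index bounds instead of slices, uses a Lomuto partition with the last element as pivot, and takes an optional stats dictionary (a made-up addition) that records the deepest recursion level reached. It recurses only into the smaller side and loops on the larger one.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] (inclusive); returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None, depth=1, stats=None):
    """Sort a[lo..hi] in place: recurse on the smaller side, loop on the larger."""
    if hi is None:
        hi = len(a) - 1
    if stats is not None:
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
    while lo < hi:
        q = partition(a, lo, hi)
        if q - lo < hi - q:
            quicksort(a, lo, q - 1, depth + 1, stats)  # smaller left part
            lo = q + 1                                  # continue with the right
        else:
            quicksort(a, q + 1, hi, depth + 1, stats)  # smaller right part
            hi = q - 1                                  # continue with the left

data = list(range(1000))   # already sorted: worst case for this pivot choice
stats = {}
quicksort(data, stats=stats)
print(stats["max_depth"])
```

Already sorted input makes every partition maximally unbalanced, which would drive a naive recursive version to a depth near the input length; here the recorded depth stays small because each recursive call gets less than half of the current range.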
Another thing worth noting is that if you are implementing something like Introsort which changes to Heapsort if the recursion depth exceeds some number proportional to log(N), you will never hit the O(N) worst case stack memory usage of quicksort, so you technically don't need to do this. Doing this optimization (popping smaller slices first) still improves the constant factor of the O(log(N)) though, so it is strongly recommended.
Well, the most obvious observation would be:
Most common stack overflow problem - definition
The most common cause of stack overflow is excessively deep or infinite recursion.
The second uses less deep recursion than the first (n branches per call instead of n^2), hence it is less likely to cause a stack overflow.
(so lower complexity means less chance to cause a stack overflow)
But somebody would have to add why the second can never cause a stack overflow while the first can.
Well, if you consider the complexity of the two methods, the first method obviously has more complexity than the second, since it recurses on both the LHS and the RHS; as a result, there is a greater chance of getting a stack overflow.
Note: that doesn't mean there is absolutely no chance of getting a stack overflow in the second method.
In function 2 that you shared, tail call elimination is implemented. Before proceeding further, let us understand what a tail recursive function is: if the last statement in the code is the recursive call and the function does not do anything after that, it is called a tail recursive function. So the first function is a tail recursive function. For such a function, with some changes in the code, one can remove the last recursive call, as you showed in function 2, which performs the same work as function 1. This process is called tail recursion optimization or tail call elimination, and the following are the results of it:
Optimizing in terms of auxiliary space
Optimizing in terms of recursion call overhead
Last recursive call is eliminated by using the while loop. The good thing is that for function 2, no auxiliary space is used for the right call as its recursion is eliminated using p: <- q+1 and the overall function does not have recursion call overhead. So whatever way partition happens maximum space needed is theta(log n)

How does one implement a "stackless" interpreted language?

I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
(a x y)) ; tail call! goto instead of call.
; (do not grow the stack, keep return addr)
(defun a (x y)
(+ x y))
; ...
(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
; return the value here to be used by print, and how does it know
; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across C functions? longjmp, perhaps? I really don't know. Please bear with me, I am mostly self- (Internet-) taught in programming.
One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working Scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions, like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS** until they are evaluated and put into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically a continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
I hope I've been of some help. I just wish my answer (and link) was shorter.
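For a much smaller illustration of trampolined style than the paper's interpreter, here is a sketch in Python. It is not taken from the paper; trampoline and count_down are hypothetical names. Each step returns either a final value or a zero-argument function describing the next step, and one top-level loop keeps calling those steps, so the host language's stack never gets deeper than a single step.

```python
def trampoline(step):
    """Top-level loop: keep running steps until a non-callable value appears.

    Simplification: a final result that is itself callable would be
    mistaken for another step, so this sketch only returns plain values.
    """
    while callable(step):
        step = step()
    return step

def count_down(n, acc=0):
    """Sum n + (n-1) + ... + 1, one small step per bounce."""
    if n == 0:
        return acc                                # final value: the loop stops
    return lambda: count_down(n - 1, acc + n)     # next step, not yet run

# 100000 "tail calls" without growing the host stack.
print(trampoline(lambda: count_down(100000)))
```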
What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like), that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example you can rewrite your example in CPS like this:
(defun f (x y k)
(a x y k))
(defun a (x y k)
(+ x y k))
(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.
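The CPS rewrite above can be made concrete with a small Python sketch. The assumptions are mine: every function returns a zero-argument thunk instead of calling onward directly, add_cps stands in for the CPS version of +, and run is a hypothetical driver loop. The final continuation here is a list's append method, playing the part of print.

```python
def add_cps(x, y, k):
    """CPS addition: compute the sum, then hand it to the continuation k."""
    return lambda: k(x + y)

def a(x, y, k):
    return lambda: add_cps(x, y, k)   # tail call: k is passed along unchanged

def f(x, y, k):
    return lambda: a(x, y, k)         # tail call: no new continuation needed

def run(step):
    """Non-recursive driver: apply one pending call after another."""
    while callable(step):
        step = step()
    return step

results = []
run(f(1, 2, results.append))          # like (f 1 2 print)
print(results)
```

Because f and a pass k through untouched, nothing accumulates for the tail calls; only a non-tail call would have to wrap k in a new continuation that remembers what to do with the result.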
Tail recursion can be thought of as reusing for the callee the same stack frame that you are currently using for the caller. So you could just re-set the arguments and goto to the beginning of the function.

How does erlang handle case statements mixed with tail recursion

Let's say I have this code here:
do_recv_loop(State) ->
{do,Stuff} ->
case Stuff of
one_thing ->
another_thing ->
_ ->
{die} -> im_dead_now;
_ -> do_recv_loop(State)
Now, in theory this is tail-recursive, as none of the three calls to do_recv_loop require anything to be returned. But will erlang recognize that this is tail recursive and optimize appropriately? I'm worried that the nested structure might make it not able to recognize it.
Yes, it will. Erlang is required to optimize tail calls, and this is clearly a tail call since nothing happens after the function is called.
I used to wish there were a tailcall keyword in Erlang so the compiler could warn me about invalid uses, but then I got used to it.
Yes, it is tail recursive. The main gotcha to be aware of is if you are wrapped inside exceptions. In that case, sometimes the exception needs to live on the stack and that will make something that looks tail-recursive into something deceptively not so.
The tail-call optimization is applicable if the call is in tail-position. Tail position is the "last thing before the function will return". Note that in
fact(0) -> 1;
fact(N) -> N * fact(N-1).
the recursive call to fact is not in tail position because after fact(N-1) is calculated, you need to run the continuation N * _ (i.e., multiply by N).
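To make the difference in call position visible, here is the same factorial in three forms, sketched in Python with made-up names. Note the assumption: Python itself does not eliminate tail calls, so fact_acc only shows the shape that Erlang can optimize, and fact_loop shows what that shape amounts to once the frame is reused.

```python
def fact_body(n):
    """Body recursion: the multiplication happens after the call returns."""
    if n == 0:
        return 1
    return n * fact_body(n - 1)        # pending work: multiply by n

def fact_acc(n, acc=1):
    """Accumulator style: the recursive call is the last thing that happens."""
    if n == 0:
        return acc
    return fact_acc(n - 1, acc * n)    # nothing left to do after this call

def fact_loop(n):
    """What the tail call becomes when the frame is reused: a plain loop."""
    acc = 1
    while n > 0:
        acc, n = acc * n, n - 1
    return acc

print(fact_body(10), fact_acc(10), fact_loop(10))
```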
This I think is relevant because you are asking about how you know if your recursive function is optimized by the compiler. Since you aren't using lists:reverse/1 the below might not apply but for someone else with the exact same question but with a different code example it might be very relevant.
From The Eight Myths of Erlang Performance in the Erlang Efficiency Guide:
In R12B and later releases, there is an optimization that will in many cases reduces the number of words used on the stack in body-recursive calls, so that a body-recursive list function and tail-recursive function that calls lists:reverse/1 at the end will use exactly the same amount of memory.
I think the takeaway is that you may have to measure in some cases to see what will be best.
I'm pretty new to Erlang but from what I've gathered, the rule seems to be that in order to be tail-recursive, the function has to do one of two things in any given logical branch:
not make a recursive call
return the value of the recursive call and do nothing else after it
That recursive call can be nested into as many if, case, or receive calls as you want as long as nothing actually happens after it.
