How does one implement a "stackless" interpreted language? - recursion

I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
(a x y)) ; tail call! goto instead of call.
; (do not grow the stack, keep return addr)
(defun a (x y)
(+ x y))
; ...
(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
; return the value here to be used by print, and how does it know
; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across c functions? longjmp, perhaps? I really don't know. Please bear with me, I am mostly self- (Internet- ) taught in programming.

One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
http://en.wikisource.org/wiki/Scheme:_An_Interpreter_for_Extended_Lambda_Calculus/Section_5
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS**, until they are evaluated and out into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
I hope I've been of some help. I just wish my answer (and link) was shorter.

What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like), that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example you can rewrite your example in CPS like this:
(defun f (x y k)
(a x y k))
(defun a (x y k)
(+ x y k))
(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.

Tail recursion can be thought of as reusing for the callee the same stack frame that you are currently using for the caller. So you could just re-set the arguments and goto to the beginning of the function.

Related

How is it possible that a function can call itself

I know about recursion, but I don't know how it's possible. I'll use the fallowing example to further explain my question.
(def (pow (x, y))
(cond ((y = 0) 1))
(x * (pow (x , y-1))))
The program above is in the Lisp language. I'm not sure if the syntax is correct since I came up with it in my head, but it will do. In the program, I am defining the function pow, and in pow it calls itself. I don't understand how it's able to do this. From what I know the computer has to completely analyze a function before it can be defined. If this is the case, then the computer should give an undefined message when I use pow because I used it before it was defined. The principle I'm describing is the one at play when you use an x in x = x + 1, when x was not defined previously.
Compilers are much smarter than you think.
A compiler can turn the recursive call in this definition:
(defun pow (x y)
(cond ((zerop y) 1)
(t (* x (pow x (1- y))))))
into a goto intruction to re-start the function from scratch:
Disassembly of function POW
(CONST 0) = 1
2 required arguments
0 optional arguments
No rest parameter
No keyword parameters
12 byte-code instructions:
0 L0
0 (LOAD&PUSH 1)
1 (CALLS2&JMPIF 172 L15) ; ZEROP
4 (LOAD&PUSH 2)
5 (LOAD&PUSH 3)
6 (LOAD&DEC&PUSH 3)
8 (JSR&PUSH L0)
10 (CALLSR 2 57) ; *
13 (SKIP&RET 3)
15 L15
15 (CONST 0) ; 1
16 (SKIP&RET 3)
If this were a more complicated recursive function that a compiler cannot unroll into a loop, it would merely call the function again.
From what I know the computer has to completely analyze a function before it can be defined.
When the compiler sees that one defines a function POW, then it tells itself: now we are defining function POW. If it then inside the definition sees a call to POW, then the compiler says to itself: oh, this seems to be a call to the function that I'm currently compiling and it can then create code to make a recursive call.
A function is just a block of code. It's name is just help so you don't have to calculate the exact address it will end up in. The programming language will turn the names into where the program is to go to execute.
How one function call another is by storing the address of the next command in this function on the stack, perhaps add arguments to the stack and then jump to the address location of the function. The function itself jumps to the return address it finds so that control goes back to the callee. There are several calling conventions implemented by the language on which side do what. CPUs don't really have function support so just like there is nothing called a while loop in CPUs functions are emulated.
Just like functions have names, arguments have names too, however they are mere pointers just like the return address. When calling itself it just adds a new return address and arguments onto the stack and jump to itself. The top of the stack will be different and thus the same variable names are unique addresses to the call so x and y in the previous call is somewhere else than the current x and y. In fact there is no special treatment needed for calling itself than calling anything else.
Historically the first high level language, Fortran, did not support recursion. It would call itself but when it returned it returned to the original callee without doing the rest of the function after the self call. Fortran itself would have been impossible to write without recursion so while itself used recursion it did not offer it to the programmer that used it. This limitation is the reason why John McCarthy discovered Lisp.
I think to see how this can work in general, and in particular in cases where recursive calls can't be turned into loops, it's worth thinking about how a general compiled language might work, because the problems are not different.
Let's imagine how a compiler might turn this function into machine code:
(defun foo (x)
(+ x (bar x)))
And let's assume that it does not know anything about bar at the time of compilation. Well, it has two options.
It can compile foo in such a way that the call to bar is translated a set of instructions which say, 'look up the function definition stored under the name bar, whatever it currently is, and arrange to call that function with the right arguments'.
It can compile foo in such a way that there is a machine-level function call to a function but the address of that function is left as a placeholder of some kind. And it can then attach some metadata to foo which says: 'before this function is called you need to find the function named bar, find its address, splice it into the code in the right place, and remove this metadata.
Both of these mechanisms allow foo to be defined before it's known what bar is. And note that instead of bar I could have written foo: these mechanisms deal with recursive calls too. They differ apart from that, however.
The first mechanism means that, every time foo is called it needs to do some kind of dynamic lookup for bar which will involve some overhead (but this overhead can be pretty small):
as a consequence of this the first mechanism will be slightly slower than it might be;
but, also as a consequence of this, if bar gets redefined, then the new definition will get picked up, which is a very desirable thing for an interactive language, which Lisp implementations usually are.
The second mechanism means that, after foo has all its references to other functions linked in to it, then the calls happen at the machine level:
this means they will be quick;
but that redefinition will be, at best, more complicated or, at worst, not possible at all.
The second of these implementations is close to how traditional compilers compile code: they compile code leaving a bunch of placeholders with associated metadata saying what names those placeholders correspond to. A linker, (sometimes known as a link-loader, or loader) then grovels over all the files produced by the compiler as well as other libraries of code and resolves all these references, resulting in a bit of code which can actually be run.
A very simple-minded Lisp system might work entirely by the first mechanism (I am pretty sure that this is how Python works, for instance). A more advanced compiler will probably work by some combination of the first and second mechanism. As an example of this, CL allows the compiler to make assumptions that apparent self-calls in functions really are self-calls, and so the compiler may well compile them as direct calls (essentially it will compile the function and then link it on the fly). But when compiling code in general, it might call 'through the name' of the function.
There are also more-or-less heroic strategies which things could do: for instance at the first call of a function link it, on the fly, to all the things it refers to, and note in their definitions that if they change then this thing needs to be unlinked as well so it all happens again. These kind of tricks once seemed implausible, but compilers for languages like JavaScript do things at least as hairy as this all the time now.
Note that compilers and linkers for modern systems actually do something more complicated than I've described, because of shared libraries &c: what I described is more-or-less what happened pre shared-library.

In functional languages, how is the compiler able to translate non-tail recursion into loops to avoid stack overflows (if at all)?

I've been recently learning about functional languages and how many don't include for loops. While I don't personally view recursion as more difficult than a for loop (and often easier to reason out) I realized that many examples of recursion aren't tail recursive and therefor cannot use simple tail recursion optimization in order to avoid stack overflows. According to this question, all iterative loops can be translated into recursion, and those iterative loops can be transformed into tail recursion, so it confuses me when the answers on a question like this suggest that you have to explicitly manage the translation of your recursion into tail recursion yourself if you want to avoid stack overflows. It seems like it should be possible for a compiler to do all the translation from either recursion to tail recursion, or from recursion straight to an iterative loop with out stack overflows.
Are functional compilers able to avoid stack overflows in more general recursive cases? Are you really forced to transform your recursive code in order to avoid stack overflows yourself? If they aren't able to perform general recursive stack-safe compilation, why aren't they?
Any recursive function can be converted into a tail recursive one.
For instance, consider the transition function of a Turing machine, that
is the mapping from a configuration to the next one. To simulate the
turing machine you just need to iterate the transition function until
you reach a final state, that is easily expressed in tail recursive
form. Similarly, a compiler typically translates a recursive program into
an iterative one simply adding a stack of activation records.
You can also give a translation into tail recursive form using continuation
passing style (CPS). To make a classical example, consider the fibonacci
function.
This can be expressed in CPS style in the following way, where the second
parameter is the continuation (essentially, a callback function):
def fibc(n, cont):
if n <= 1:
return cont(n)
return fibc(n - 1, lambda a: fibc(n - 2, lambda b: cont(a + b)))
Again, you are simulating the recursion stack using a dynamic data structure:
in this case, lambda abstractions.
The use of dynamic structures (lists, stacks, functions, etc.) in all previous
examples is essential. That is to say, that in order to simulate a generic
recursive function iteratively, you cannot avoid dynamic memory allocation,
and hence you cannot avoid stack overflow, in general.
So, memory consumption is not only related to the iterative/recursive
nature of the program. On the other side, if you prevent dynamic memory
allocation, your
programs are essentially finite state machines, with limited computational
capabilities (more interesting would be to parametrise memory according to
the dimension of inputs).
In general, in the same way as you cannot predict termination, you cannot
predict an unbound memory consumption of your program: working with
a Turing complete language, at compile time
you cannot avoid divergence, and you cannot avoid stack overflow.
Tail Call Optimization:
The natural way to do arguments and calls is to sort out the cleaning up when exiting or when returning.
For tail calls to work you need to alter it so that the tail call inherits the current frame. Thus instead of making a new frame it massages the frame so that the next call returns to the current functions caller instead of this function, which really only cleans up and returns if it's a tail call.
Thus TCO is all about cleaning up before the last call.
Continuation Passing Style - make tail calls out of everything
A compiler can change the code such that it only does primitive operations and pass it to continuations. Thus the stack usage gets moved onto the heap since the computation to be continued is made a function.
An example is:
function hypotenuse(k1, k2) {
return sqrt(add(square(k1), square(k2)))
}
becomes
function hypotenuse(k, k1, k2) {
(function (sk1) {
(function (sk2) {
(function (ar) {
k(sqrt(ar));
}(add(sk1,sk2));
}(square(k2));
}(square(k1));
}
Notice every function has exactly one call now and the order of evaluation is set.
According to this question, all iterative loops can be translated into recursion
"Translated" might be a bit of a stretch. The proof that for every iterative loop there is an equivalent recursive program is trivial if you understand Turing completeness: since a Turing machine can be implemented using strictly iterative structures and strictly recursive structures, every program that can be expressed in an iterative language can be expressed in a recursive language, and vice-versa. This means that for every iterative loop there is an equivalent recursive construct (and the other way around). However, that doesn't mean we have some automated way of transforming one into the other.
and those iterative loops can be transformed into tail recursion
Tail recursion can perhaps be easily transformed into an iterative loop, and the other way around. But not all recursion is tail recursion. Here's an example. Suppose we have some binary tree. It consists of nodes. Each node can have a left and a right child and a value. If a node has no children, then isLeaf returns true for it. We'll assume there's some function max that returns the maximum of two values, and if one of the values is null it returns the other one. Now we want to define a function that finds the maximum value among all the leaf nodes. Here it is in some pseudo-code I cooked up.
findmax(node) {
if (node == null) {
return null
}
if (node.isLeaf) {
return node.value
} else {
return max(findmax(node.left), findmax(node.right))
}
}
There's two recursive calls in the max function, so we can't optimize for tail recursion. We need the results of both before we can supply them to the max function and determine the result of the call for the current node.
Now, there may be a way of getting the same result, using recursion and only a single tail-recursive call. It is functionally equivalent, but it is a different algorithm. Compilers can do a lot of transformations to create a functionally equivalent program with lots of optimizations, but they're not quite clever enough to create functionally equivalent algorithms.
Even the transformation of a function that only calls itself recursively once into a tail-recursive version would be far from trivial. Such an adaptation usually employs some argument passed into the recursive invocation that is used as an "accumulator" for the current results.
Look at the next naive implementation for calculating a factorial of a number (e.g. fact(5) = 5*4*3*2*1):
fact(number) {
if (number == 1) {
return 1
} else {
return number * fact(number - 1)
}
}
It's not tail-recursive. But it can be made so in this way:
fact(number, acc) {
if (number == 1) {
return acc
} else {
return fact(number - 1, number * acc)
}
}
// Helper function
fact(number) {
return fact(number, 1)
}
This requires an interpretation of what is being done. Recognizing the case for stuff like this is easy enough, but what if you call a function instead of a multiplication? How will the compiler know that for the initial call the accumulator must be 1 and not, say, 0? How do you translate this program?
recsub(number) {
if (number == 1) {
return 1
} else {
return number - recsub(number - 1)
}
}
This is as of yet outside the scope of the sort of compiler we have now, and may in fact always be.
Maybe it would be interesting to ask this on the computer science Stack Exchange to see if they know of some papers or proofs that investigate this more in-depth.

What do "continuations" mean in functional programming?(Specfically SML)

I have read a lot about continuations and a very common definition I saw is, it returns the control state.
I am taking a functional programming course taught in SML.
Our professor defined continuations to be:
"What keeps track of what we still have to do"
; "Gives us control of the call stack"
A lot of his examples revolve around trees. Before this chapter, we did tail recursion. I understand that tail recursion lets go of the stack to hold the recursively called functions by having an additional argument to "build" up the answer. Reversing a list would be built in a new accumulator where we append to it accordingly. Also, he said something about functions are called(but not evaluated) except till we reach the end where we replace backwards. He said an improved version of tail recursion would be using CPS(Continuation Programming Style).
Could someone give a simplified explanation of what continuations are and why they are favoured over other programming styles?
I found this stackoverflow link that helped me, but still did not clarify the idea for me:
I just don't get continuations!
Continuations simply treat "what happens next" as first class objects that can be used once unconditionally, ignored in favour of something else, or used multiple times.
To address what Continuation Passing Style is, here is some expression written normally:
let h x = f (g x)
g is applied to x and f is applied to the result.
Notice that g does not have any control. Its result will be passed to f no matter what.
in CPS this is written
let h x next = (g x (fun result -> f result next))
g not only has x as an argument, but a continuation that takes the output of g and returns the final value. This function calls f in the same manner, and gives next as the continuation.
What happened? What changed that made this so much more useful than f (g x)? The difference is that now g is in control. It can decide whether to use what happens next or not. That is the essence of continuations.
An example of where continuations arise are imperative programming languages where you have control structures. Whiles, blocks, ordinary statements, breaks and continues are all generalized through continuations, because these control structures take what happens next and decide what to do with it, for example we can have
...
while(condition1) {
statement1;
if(condition2) break;
statement2;
if(condition3) continue;
statement3;
}
return statement3;
...
The while, the block, the statement, the break and the continue can all be described in a functional model through continuations. Each construct can be considered to be a function that accepts the
current environment containing
the enclosing scopes
optional functions accepting the current environment and returning a continuation to
break from the inner most loop
continue from the inner most loop
return from the current function.
all the blocks associated with it (if-blocks, while-block, etc)
a continuation to the next statement
and returns the new environment.
In the while loop, the condition is evaluated according to the current environment. If it is evaluated to true, then the block is evaluated and returns the new environment. The result of evaluating the while loop again with the new environment is returned. If it is evaluated to false, the result of evaluating the next statement is returned.
With the break statement, we lookup the break function in the environment. If there is no function found then we are not inside a loop and we give an error. Otherwise we give the current environment to the function and return the evaluated continuation, which would be the statement after the the while loop.
With the continue statement the same would happen, except the continuation would be the while loop.
With the return statement the continuation would be the statement following the call to the current function, but it would remove the current enclosing scope from the environment.

Fixed-Point Combinators

I am new to the world of fixed-point combinators and I guess they are used to recurse on anonymous lambdas, but I haven't really got to use them, or even been able to wrap my head around them completely.
I have seen the example in Javascript for a Y-combinator but haven't been able to successfully run it.
The question here is, can some one give an intuitive answer to:
What are Fixed-point combinators, (not just theoretically, but in context of some example, to reveal what exactly is the fixed-point in that context)?
What are the other kinds of fixed-point combinators, apart from the Y-combinator?
Bonus Points: If the example is not just in one language, preferably in Clojure as well.
UPDATE:
I have been able to find a simple example in Clojure, but still find it difficult to understand the Y-Combinator itself:
(defn Y [r]
((fn [f] (f f))
(fn [f]
(r (fn [x] ((f f) x))))))
Though the example is concise, I find it difficult to understand what is happening within the function. Any help provided would be useful.
Suppose you wanted to write the factorial function. Normally, you would write it as something like
function fact(n) = if n=0 then 1 else n * fact(n-1)
But that uses explicit recursion. If you wanted to use the Y-combinator instead, you could first abstract fact as something like
function factMaker(myFact) = lamba n. if n=0 then 1 else n * myFact(n-1)
This takes an argument (myFact) which it calls were the "true" fact would have called itself. I call this style of function "Y-ready", meaning it's ready to be fed to the Y-combinator.
The Y-combinator uses factMaker to build something equivalent to the "true" fact.
newFact = Y(factMaker)
Why bother? Two reasons. The first is theoretical: we don't really need recursion if we can "simulate" it using the Y-combinator.
The second is more pragmatic. Sometimes we want to wrap each function call with some extra code to do logging or profiling or memoization or a host of other things. If we try to do this to the "true" fact, the extra code will only be called for the original call to fact, not all the recursive calls. But if we want to do this for every call, including all the recursive call, we can do something like
loggingFact = LoggingY(factMaker)
where LoggingY is a modified version of the Y combinator that introduces logging. Notice that we did not need to change factMaker at all!
All this is more motivation why the Y-combinator matters than a detailed explanation from how that particular implementation of Y works (because there are many different ways to implement Y).
To answer your second question about fix-point combinators other than Y. There are countably infinitely many standard fix-point combinators, that is, combinators fix that satisfy the equation
fix f = f (fix f)
There are also contably many non-standard fix-point combinators, which satisfy the equation
fix f = f (f (fix f))
etc. Standard fix-point combinators are recursively enumerable, but non-standard are not. Please see the following web page for examples, references and discussion.
http://okmij.org/ftp/Computation/fixed-point-combinators.html#many-fixes

What are best practices for including parameters such as an accumulator in functions?

I've been writing more Lisp code recently. In particular, recursive functions that take some data, and build a resulting data structure. Sometimes it seems I need to pass two or three pieces of information to the next invocation of the function, in addition to the user supplied data. Lets call these accumulators.
What is the best way to organize these interfaces to my code?
Currently, I do something like this:
(defun foo (user1 user2 &optional acc1 acc2 acc3)
;; do something
(foo user1 user2 (cons x acc1) (cons y acc2) (cons z acc3)))
This works as I'd like it to, but I'm concerned because I don't really need to present the &optional parameters to the programmer.
3 approaches I'm somewhat considering:
have a wrapper function that a user is encouraged to use that immediately invokes the extended definiton.
use labels internally within a function whose signature is concise.
just start using a loop and variables. However, I'd prefer not since I'd like to really wrap my head around recursion.
Thanks guys!
If you want to write idiomatic Common Lisp, I'd recommend the loop and variables for iteration. Recursion is cool, but it's only one tool of many for the Common Lisper. Besides, tail-call elimination is not guaranteed by the Common Lisp spec.
That said, I'd recommend the labels approach if you have a structure, a tree for example, that is unavoidably recursive and you can't get tail calls anyway. Optional arguments let your implementation details leak out to the caller.
Your impulse to shield implementation details from the user is a smart one, I think. I don't know common lisp, but in Scheme you do it by defining your helper function in the public function's lexical scope.
(define (fibonacci n)
(let fib-accum ((a 0)
(b 1)
(n n))
(if (< n 1)
a
(fib-accum b (+ a b) (- n 1)))))
The let expression defines a function and binds it to a name that's only visible within the let, then invokes the function.
I have used all the options you mention. All have their merits, so it boils down to personal preference.
I have arrived at using whatever I deem appropriate. If I think that leaving the &optional accumulators in the API might make sense for the user, I leave it in. For example, in a reduce-like function, the accumulator can be used by the user for providing a starting value. Otherwise, I'll often rewrite it as a loop, do, or iter (from the iterate library) form, if it makes sense to perceive it as such. Sometimes, the labels helper is also used.

Resources