transforming tail recursion into non tail recursion - functional-programming

I would like to know if there is any method that converts tail-recursive function into a non tail recursive one. I know, tail recursive functions are better than no tail recursive functions if one's focus is on efficiency, but in verification tail-recursion is not suitable.
thanks,

I doubt this question is well posed, because tail recursion elimination is an optimisation that you could simply ignore or turn off for the purposes of verification if you need to. (I guess that replacing a call with a jump might confuse it or damage certain security properties.)
However, if you do really want to do this, you need to change your source code so that the recursive function call is no longer in tail position. The tricky part will be making sure that the optimiser does not change your code back again.
Consider this F# tail-recursive factorial function:
let rec fact n v =
if n <= 1
then v
else fact (n-1) (n*v)
Now, let's add a function that does nothing at all unless you set some global variable to true:
// we'll never set this to true, but the compiler doesn't know this
let mutable actuallyDoSomething = false
let doNothing() =
if actuallyDoSomething then failwith "Don't set actuallyDoSomething!"
Now we can change the tail call in fact:
let rec fact n v =
if n <= 1
then v
else
let res = fact (n-1) (n*v)
doNothing()
res
Similar tricks should work in other languages.

Related

Why use a recursive function rather than `while true do` in F#?

Whilst watching a Pluralsight course by Tomas Petricek (who I assume knows what he is talking about), I saw code like the following...
let echo =
MailboxProcessor<string>.Start(fun inbox ->
async {
while do true
let! msg = inbox.Receive()
printfn "Hello %s" msg
})
Ignore the fact that this was to demo agents, I'm interested in the inner function, which uses while do true to keep it running indefinitely.
Whilst looking around for other example of agents, I saw that many other people use code like this...
let counter =
MailboxProcessor.Start(fun inbox ->
let rec loop n =
async { do printfn "n = %d, waiting..." n
let! msg = inbox.Receive()
return! loop(n+msg) }
loop 0)
Code copied from Wikibooks.
The inner function here is recursive, and is started off by calling it with a base value before the main function declaration ends.
Now I realise that in the second case recursion is a handy way of passing a private value to the inner function without having to use a mutable local value, but is there any other reason to use recursion here rather than while do true? Would there be any benefit in writing the first code snippet using recursion?
I find the non-recursive version much easier to read (subjective opinion of course), which seems like a good reason to use that whenever possible.
Talking about MailboxProcessor specifically, I think the choice depends on what exactly you are doing. In general, you can always use while loop or recursion.
Recursion makes it easier to use immutable state and I find while loop nicer if you have no state or if you use mutable state. Using mutable state is often quite useful, because MailboxProcessor protects you from concurrent accesses and you can keep the state local, so things like Dictionary (efficient hash table) are often useful.
In general:
If you don't need any state, I would prefer while
If you have mutable state (like Dictionary or ResizeArray), I'd go for while
If you have some immutable state (like functional list or an integer), then recursion is nicer
If your logic switches between multiple modes of operation then you can write it as two mutually recursive functions, which is not doable nicely with loops.
In many cases it depends on how you like to code. Like in your example.
Everything you can write recursiv you can also write with a loop, but sometime like with recursive data structures it is easier to write the in a recursive style.
At university I learned that with recursive programming you only have to look at your next step which is pretty handy!
You might be interested in this question as it explains my answer a bit further:
recursion versus iteration
In F# for and while loop expressions lacks capabilities common in other languages:
continue - Skip the rest of the loop and restart from the top of loop expression.
break - Stop the loop prematurely.
If you want continue and break don't do what I did in the beginning a write a very complex test expression for while loops. Instead tail recursion is the best answer in F#:
let vs : int [] = ...
let rec findPositiveNumberIndex i =
if i < vs.Length then
if vs.[i] > 0 then
Some i
else
findPositiveNumberIndex (i + 1)
else
None
match findPositiveNumberIndex 0 with
| Some i -> printfn "First positive number index: %d" i
| None -> printfn "No positive numbers found"
In code like this F# applies something call tail-call optimization (TCO) which transforms the code above into a while loop with break. This means that we won't run out of stack space and the loop is efficient. TCO is a feature that C# lacks so we don't want to write code like above in C#.
Like others are saying with tail recursion you can sometimes avoid mutable state but that's not all.
With tail recursion your loop expressions can return a result which is nice.
In addition, if you like to iterate fast in F# over types like int64 or with increment other than 1 and -1 you have to rely on tail recursion. The reason is that F# only does efficient for expressions for ints and increments of 1 and -1.
for i in 0..100 do
printfn "This is a fast loop"
for i in 0..2..100 do
printfn "This is a slow loop"
for i in 0L..100L do
printfn "This is a slow loop"
Sometimes when you are hunting for performance a trick is to loop towards 0 (saves a CPU register). Unfortunately the way the F# generates for loop code this doesn't perform as well one would hope:
for i = 100 downto 0 do
printfn "Unfortunately this is not as efficient as it can be"
Tail recursion towards 0 saves the CPU register.
(Unfortunately the F# compiler don't merge the test and the loop instruction for tail-recursion so it's not so good as it could be)

Recursive Sequences in F#

Let's say I want to calculate the factorial of an integer. A simple approach to this in F# would be:
let rec fact (n: bigint) =
match n with
| x when x = 0I -> 1I
| _ -> n * fact (n-1I)
But, if my program needs dynamic programming, how could I sustain functional programming whilst using memoization?
One idea I had for this was making a sequence of lazy elements, but I ran into a problem. Assume that the follow code was acceptable in F# (it is not):
let rec facts =
seq {
yield 1I
for i in 1I..900I do
yield lazy (i * (facts |> Seq.item ((i-1I) |> int)))
}
Is there anything similar to this idea in F#?
(Note: I understand that I could use a .NET Dictionary but isn't invoking the ".Add()" method imperative style?)
Also, Is there any way I could generalize this with a function? For example, could I create a sequence of length of the collatz function defined by the function:
let rec collatz n i =
if n = 0 || n = 1 then (i+1)
elif n % 2 = 0 then collatz (n/2) (i+1)
else collatz (3*n+1) (i+1)
If you want to do it lazily, this is a nice approach:
let factorials =
Seq.initInfinite (fun n -> bigint n + 1I)
|> Seq.scan ((*)) 1I
|> Seq.cache
The Seq.cache means you won't repeatedly evaluate elements you've already enumerated.
You can then take a particular number of factorials using e.g. Seq.take n, or get a particular factorial using Seq.item n.
At first, i don't see in your example what you mean with "dynamic programming".
Using memorization doesn't mean something is not "functional" or breaks immutability. The important
point is not how something is implemented. The important thing is how it behaves. A function that uses
a mutable memoization is still considered pure, as long as it behaves like a pure function/immutable
function. So using a mutable variables in a limited scope that is not visible to the caller is still
considered pure. If the implementation would be important we could also consider tail-recursion as
not pure, as the compiler transform it into a loop with mutable variables under the hood. There
also exists some List.xyz function that use mutation and transform things into a mutable variable
just because of speed. Those function are still considered pure/immutable because they still behave like
pure function.
A sequence itself is already lazy. It already computes all its elements only when you ask for those elements.
So it doesn't make much sense to me to create a sequence that returns lazy elements.
If you want to speed up the computation there exists multiple ways how to do it. Even in the recursion
version you could use an accumulator that is passed to the next function call. Instead of doing deep
recursion.
let fact n =
let rec loop acc x =
if x = n
then acc * x
else loop (acc*x) (x+1I)
loop 1I 1I
That overall is the same as
let fact' n =
let mutable acc = 1I
let mutable x = 1I
while x <= n do
acc <- acc * x
x <- x + 1I
acc
As long you are learning functional programming it is a good idea to get accustomed to the first version and learn
to understand how looping and recursion relate to each other. But besides learning there isn't a reason why you
always should force yourself to always write the first version. In the end you should use what you consider more
readable and easier to understand. Not whether something uses a mutable variable as an implementation or not.
In the end nobody really cares for the exact implementation. We should view functions as black-boxes. So as long as
a function behaves like a pure function, everything is fine.
The above uses an accumulator, so you don't need to repetitive call a function again to get a value. So you also
don't need an internal mutable cache. if you really have a slow recursive version and want to speed it up with
caching you can use something like that.
let fact x =
let rec fact x =
match x with
| x when x = 1I -> 1I
| x -> (fact (x-1I)) * x
let cache = System.Collections.Generic.Dictionary<bigint,bigint>()
match cache.TryGetValue x with
| false,_ ->
let value = fact x
cache.Add(x,value)
value
| true,value ->
value
But that would probably be slower as the versions with an accumulator. If you want to cache calls to fact even across multiple
fact calls across your whole application then you need an external cache. You could create a Dictionary outside of fact and use a
private variable for this. But you also then can use a function with a closure, and make the whole process itself generic.
let memoize (f:'a -> 'b) =
let cache = System.Collections.Generic.Dictionary<'a,'b>()
fun x ->
match cache.TryGetValue x with
| false,_ ->
let value = f x
cache.Add(x,value)
value
| true,value ->
value
let rec fact x =
match x with
| x when x = 1I -> 1I
| x -> (fact (x-1I)) * x
So now you can use something like that.
let fact = memoize fact
printfn "%A" (fact 100I)
printfn "%A" (fact 100I)
and create a memoized function out of every other function that takes 1 parameter
Note that memoization doesn't automatically speed up everything. If you use the memoize function on fact
nothing get speeded up, it will even be slower as without the memoization. You can add a printfn "Cache Hit"
to the | true,value -> branch inside the memoize function. Calling fact 100I twice in a row will only
yield a single "Cache Hit" line.
The problem is how the algorithm works. It starts from 100I and it goes down to 0I. So calculating 100I ask
the cache of 99I, it doesn't exists, so it tries to calculate 98I and ask the cache. That also doesn't exists
so it goes down to 1I. It always asked the cache, never found a result and calculates the needed value.
So you never get a "Cache Hit" and you have the additional work of asking the cache. To really benefit from the
cache you need to change fact itself, so it starts from 1I up to 100I. The current version even throws StackOverflow
for big inputs, even with the memoize function.
Only the second call benefits from the cache, That is why calling fact 100I twice will ever only print "Cache Hit" once.
This is just an example that is easy to get the behaviour wrong with caching/memoization. In general you should try to
write a function so it is tail-recursive and uses accumulators instead. Don't try to write functions that expects
memoization to work properly.
I would pick a solution with an accumulator. If you profiled your application and you found that this is still to slow
and you have a bottleneck in your application and caching fact would help, then you also can just cache the results of
facts directly. Something like this. You could use dict or a Map for this.
let factCache = [1I..100I] |> List.map (fun x -> x,fact x) |> dict
let factCache = [1I..100I] |> List.map (fun x -> x,fact x) |> Map.ofList

What is the difference between these two recursive ocaml functions?

let rec x1() = x1();()
let rec x2() = x2();;
Calling x1();; generates a stack overflow whereas calling x2();; causes the program to run indefinitely. What is the difference between the 2 functions?
let rec x1() = x1();()
This function is not tail recursive. It calls itself x1(); and when this call returns, the function will return a unit ().
let rec x2() = x2();;
This function is calling itself at the very end; therefore the compiler can perform tail-call optimization - which will mean that the recursive function calls will never use up all the stack space.
This page explains tail recursion: http://ocaml.org/learn/tutorials/if_statements_loops_and_recursion.html#Tailrecursion - It is a fundamental technique used by functional programming languages so that we can use recursion for implementing loops without running out of memory.
I first learnt about the process stack when I read Smashing The Stack For Fun And Profit; I still think that it has the best description of what the process stack is all about.

Why does ocaml need both "let" and "let rec"? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why are functions in Ocaml/F# not recursive by default?
OCaml uses let to define a new function, or let rec to define a function that is recursive. Why does it need both of these - couldn't we just use let for everything?
For example, to define a non-recursive successor function and recursive factorial in OCaml (actually, in the OCaml interpreter) I might write
let succ n = n + 1;;
let rec fact n =
if n = 0 then 1 else n * fact (n-1);;
Whereas in Haskell (GHCI) I can write
let succ n = n + 1
let fact n =
if n == 0 then 1 else n * fact (n-1)
Why does OCaml distinguish between let and let rec? Is it a performance issue, or something more subtle?
Well, having both available instead of only one gives the programmer tighter control on the scope. With let x = e1 in e2, the binding is only present in e2's environment, while with let rec x = e1 in e2 the binding is present in both e1 and e2's environments.
(Edit: I want to emphasize that it is not a performance issue, that makes no difference at all.)
Here are two situations where having this non-recursive binding is useful:
shadowing an existing definition with a refinement that use the old binding. Something like: let f x = (let x = sanitize x in ...), where sanitize is a function that ensures the input has some desirable property (eg. it takes the norm of a possibly-non-normalized vector, etc.). This is very useful in some cases.
metaprogramming, for example macro writing. Imagine I want to define a macro SQUARE(foo) that desugars into let x = foo in x * x, for any expression foo. I need this binding to avoid code duplication in the output (I don't want SQUARE(factorial n) to compute factorial n twice). This is only hygienic if the let binding is not recursive, otherwise I couldn't write let x = 2 in SQUARE(x) and get a correct result.
So I claim it is very important indeed to have both the recursive and the non-recursive binding available. Now, the default behaviour of the let-binding is a matter of convention. You could say that let x = ... is recursive, and one must use let nonrec x = ... to get the non-recursive binder. Picking one default or the other is a matter of which programming style you want to favor and there are good reasons to make either choice. Haskell suffers¹ from the unavailability of this non-recursive mode, and OCaml has exactly the same defect at the type level : type foo = ... is recursive, and there is no non-recursive option available -- see this blog post.
¹: when Google Code Search was available, I used it to search in Haskell code for the pattern let x' = sanitize x in .... This is the usual workaround when non-recursive binding is not available, but it's less safe because you risk writing x instead of x' by mistake later on -- in some cases you want to have both available, so picking a different name can be voluntary. A good idiom would be to use a longer variable name for the first x, such as unsanitized_x. Anyway, just looking for x' literally (no other variable name) and x1 turned a lot of results. Erlang (and all language that try to make variable shadowing difficult: Coffeescript, etc.) has even worse problems of this kind.
That said, the choice of having Haskell bindings recursive by default (rather than non-recursive) certainly makes sense, as it is consistent with lazy evaluation by default, which makes it really easy to build recursive values -- while strict-by-default languages have more restrictions on which recursive definitions make sense.

How does one implement a "stackless" interpreted language?

I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
(a x y)) ; tail call! goto instead of call.
; (do not grow the stack, keep return addr)
(defun a (x y)
(+ x y))
; ...
(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
; return the value here to be used by print, and how does it know
; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across c functions? longjmp, perhaps? I really don't know. Please bear with me, I am mostly self- (Internet- ) taught in programming.
One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
http://en.wikisource.org/wiki/Scheme:_An_Interpreter_for_Extended_Lambda_Calculus/Section_5
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS**, until they are evaluated and out into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
I hope I've been of some help. I just wish my answer (and link) was shorter.
What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like), that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example you can rewrite your example in CPS like this:
(defun f (x y k)
(a x y k))
(defun a (x y k)
(+ x y k))
(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.
Tail recursion can be thought of as reusing for the callee the same stack frame that you are currently using for the caller. So you could just re-set the arguments and goto to the beginning of the function.

Resources