Recursive Sequences in F#

Let's say I want to calculate the factorial of an integer. A simple approach to this in F# would be:
let rec fact (n: bigint) =
    match n with
    | x when x = 0I -> 1I
    | _ -> n * fact (n - 1I)
But if my program calls for dynamic programming, how can I stay in a functional style while using memoization?
One idea I had for this was building a sequence of lazy elements, but I ran into a problem. Assume that the following code were acceptable in F# (it is not):
let rec facts =
    seq {
        yield 1I
        for i in 1I..900I do
            yield lazy (i * (facts |> Seq.item ((i - 1I) |> int)))
    }
Is there anything similar to this idea in F#?
(Note: I understand that I could use a .NET Dictionary, but isn't invoking the .Add() method imperative style?)
Also, is there any way I could generalize this with a function? For example, could I create a sequence of the lengths computed by the Collatz function defined by:
let rec collatz n i =
    if n = 0 || n = 1 then (i + 1)
    elif n % 2 = 0 then collatz (n / 2) (i + 1)
    else collatz (3 * n + 1) (i + 1)

If you want to do it lazily, this is a nice approach:
let factorials =
    Seq.initInfinite (fun n -> bigint n + 1I)
    |> Seq.scan ((*)) 1I
    |> Seq.cache
The Seq.cache means you won't repeatedly evaluate elements you've already enumerated.
You can then take a particular number of factorials using e.g. Seq.take n, or get a particular factorial using Seq.item n.
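For example (a small usage sketch of my own, not part of the original answer):

// first six entries: 0! through 5!
let firstSix = factorials |> Seq.take 6 |> Seq.toList   // [1I; 1I; 2I; 6I; 24I; 120I]

// a single factorial by index; already-enumerated elements come from the cache
let tenFactorial = factorials |> Seq.item 10            // 3628800I, i.e. 10!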

First, I don't see what you mean by "dynamic programming" in your example.
Using memoization doesn't mean something is not "functional" or breaks immutability. The important
point is not how something is implemented but how it behaves. A function that uses a mutable
memoization cache internally is still considered pure, as long as it behaves like a pure function.
Using mutable variables in a limited scope that is not visible to the caller is still considered
pure. If the implementation were what mattered, we would also have to consider tail recursion
impure, because the compiler transforms it into a loop with mutable variables under the hood. There
are also some List.xyz functions that use mutation internally, purely for speed. Those functions are
still considered pure/immutable because they still behave like pure functions.
A sequence itself is already lazy: it computes its elements only when you ask for them.
So it doesn't make much sense to me to create a sequence that returns lazy elements.
If you want to speed up the computation, there are multiple ways to do it. Even in the recursive
version you could pass an accumulator to the next function call instead of recursing deeply:
let fact n =
    let rec loop acc x =
        if x = n
        then acc * x
        else loop (acc * x) (x + 1I)
    loop 1I 1I
Overall, that is the same as:
let fact' n =
    let mutable acc = 1I
    let mutable x = 1I
    while x <= n do
        acc <- acc * x
        x <- x + 1I
    acc
As long as you are learning functional programming, it is a good idea to get accustomed to the first version and learn
to understand how looping and recursion relate to each other. But beyond learning, there is no reason to force yourself
to always write the first version. In the end you should use whatever you consider more readable and easier to
understand, not whether something happens to use a mutable variable in its implementation.
In the end nobody really cares about the exact implementation. We should view functions as black boxes. So as long as
a function behaves like a pure function, everything is fine.
The version above uses an accumulator, so you don't need to repeatedly call a function again to get a value, and you
also don't need an internal mutable cache. If you really have a slow recursive version and want to speed it up with
caching, you can use something like this:
let fact x =
    let rec fact x =
        match x with
        | x when x = 1I -> 1I
        | x -> (fact (x - 1I)) * x
    let cache = System.Collections.Generic.Dictionary<bigint, bigint>()
    match cache.TryGetValue x with
    | false, _ ->
        let value = fact x
        cache.Add(x, value)
        value
    | true, value ->
        value
But that would probably be slower than the versions with an accumulator. If you want to cache calls to fact across
multiple fact calls in your whole application, then you need an external cache. You could create a Dictionary outside
of fact and use a private variable for this. But you can also use a function with a closure, and make the whole process
generic.
let memoize (f: 'a -> 'b) =
    let cache = System.Collections.Generic.Dictionary<'a, 'b>()
    fun x ->
        match cache.TryGetValue x with
        | false, _ ->
            let value = f x
            cache.Add(x, value)
            value
        | true, value ->
            value

let rec fact x =
    match x with
    | x when x = 1I -> 1I
    | x -> (fact (x - 1I)) * x
So now you can use it like this:
let fact = memoize fact
printfn "%A" (fact 100I)
printfn "%A" (fact 100I)
and create a memoized function out of any other single-parameter function.
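For instance, applied to the collatz function from the question (my own sketch, assuming that definition is in scope):

// memoize the Collatz chain-length function from the question
let collatzLength = memoize (fun n -> collatz n 0)

printfn "%d" (collatzLength 27)   // computed on the first call
printfn "%d" (collatzLength 27)   // answered from the cache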
Note that memoization doesn't automatically speed everything up. If you use the memoize function on fact,
nothing gets faster; it will even be slower than without the memoization. You can add a printfn "Cache Hit"
to the | true, value -> branch inside the memoize function. Calling fact 100I twice in a row will only
yield a single "Cache Hit" line.
The problem is how the algorithm works together with the memoization. Because let fact = memoize fact only shadows
the name, the recursive calls inside fact still refer to the original, unmemoized fact. So calculating fact 100I
consults the cache exactly once, for 100I itself, misses, and then runs the plain recursion all the way from 100I
down to 1I; only the final result ends up in the cache. And even if the recursive calls did go through the cache, a
first call would ask for 99I, then 98I, and so on down to 1I without ever finding anything. Either way you never get
a "Cache Hit" on the first call, and you have the additional work of asking the cache. To really benefit from the
cache you need to change fact itself, so it works from 1I up to 100I. The current version even throws a
StackOverflowException for big inputs, even with the memoize function.
Only the second call benefits from the cache. That is why calling fact 100I twice will only ever print "Cache Hit" once.
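If you do want the recursive calls themselves to hit the cache, the cache has to live outside the recursive function so that every call, including the recursive ones, consults it. A minimal sketch of my own (not from the answer above; still not stack-safe for very large inputs):

// cache shared across all calls, including the recursive ones
let memoCache = System.Collections.Generic.Dictionary<bigint, bigint>()

let rec cachedFact x =
    match memoCache.TryGetValue x with
    | true, value -> value
    | false, _ ->
        let value = if x <= 1I then 1I else x * cachedFact (x - 1I)
        memoCache.Add(x, value)
        value

Now cachedFact 100I followed by cachedFact 101I performs only one multiplication for the second call, because 100I is already cached.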
This is just an example of how easy it is to get the behaviour wrong with caching/memoization. In general, you should
try to write functions so they are tail-recursive and use accumulators instead; don't write functions that depend on
memoization to work properly.
I would pick a solution with an accumulator. If you profiled your application and found that this is still too slow,
that you have a bottleneck, and that caching fact would help, then you can also just cache the results of fact
directly. Something like this, using dict or a Map:
let factCache = [1I..100I] |> List.map (fun x -> x,fact x) |> dict
let factCache = [1I..100I] |> List.map (fun x -> x,fact x) |> Map.ofList
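Either way, a lookup is then just an indexer access (my own usage note):

printfn "%A" factCache.[10I]   // 3628800, i.e. fact 10I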

Related

Shadowing in F#

I'm wondering why F# allows shadowing, particularly within the same scope.
I've always thought of value binding in purely functional programming constructs as being akin to assignments in algebra/mathematics.
So for example,
y = x + 1
will be a valid mathematical expression but
y = y + 1
won't be.
However, because of shadowing,
let y = y + 1
is a perfectly valid expression in F#.
Why does the language allow this?
The most correct answer to this is that the creator, Don Syme, felt it was a useful feature to add to the language.
There are some good uses for it, though. One is when working with F#-style optional parameters:
type C() =
    member _.M(?x) =
        let x = Option.defaultValue 0 x
        printfn "%d" x
F# optional parameters are options, which brings a high degree of correctness and consistency. But imagine having 3 or more optional parameters to a method. It would get annoying to have to rebind the value for each of them and use a different name for them! This is one area where shadowing comes in handy.
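For illustration, a sketch of my own (the Connection type and its parameters are made up):

type Connection() =
    member _.Open(?host, ?port, ?timeout) =
        // shadowing lets every parameter keep its natural name
        let host = Option.defaultValue "localhost" host
        let port = Option.defaultValue 8080 port
        let timeout = Option.defaultValue 30 timeout
        printfn "connecting to %s:%d (timeout %ds)" host port timeout

// usage: Connection().Open(port = 9090)
// prints: connecting to localhost:9090 (timeout 30s)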
It's also handy when writing recursive subroutines. Consider the following naive implementation of sum for a list:
let mySum xs =
    let rec loop xs acc =
        match xs with
        | [] -> acc
        | h :: t -> loop t (h + acc)
    loop xs 0
I don't need to rebind xs for the inner loop because of shadowing. Since it's generic, xs is about as good of a name as I can come up with, so it would be annoying to have had to use a different name for the inner loop.
Shadowing isn't all good news, though. If you're not careful, types from one open declaration can shadow types from one that was declared previously. This can be confusing. F# editor tooling can distinguish bindings from the ones that shadow them, but you don't get that with plain text. So the bottom line is: think carefully when applying shadowing in F#.
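For example (a hypothetical sketch; the Validation module here is made up):

// FSharp.Core already defines Result<'T,'TError>
module Validation =
    type Result<'t> =
        | Valid of 't
        | Failed of string

open Validation
// From here on, a plain Result refers to Validation.Result,
// silently shadowing the built-in type.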

F# tree-building function causes stack overflow in Xamarin Studio

I'm trying to build up some rules in a tree structure, with logic gates, i.e. and, not, or, as well as conditions, e.g. property x equals value y. I wrote the most obvious recursive function first, which worked. I then tried to write a version that wouldn't cause a stack overflow, in continuation-passing style, taking my cue from this post about generic tree folding and this answer on Stack Overflow.
It works for small trees (depth of approximately 1000), but unfortunately when using a large tree it causes a stack overflow when I run it on my Mac with Xamarin Studio. Can anyone tell me whether I've misunderstood how F# treats tail-recursive code, or whether this code isn't tail-recursive?
The full sample is here.
let FoldTree andF orF notF leafV t data =
    let rec Loop t cont =
        match t with
        | AndGate (left, right) ->
            Loop left (fun lacc ->
                Loop right (fun racc ->
                    cont (andF lacc racc)))
        | OrGate (left, right) ->
            Loop left (fun lacc ->
                Loop right (fun racc ->
                    cont (orF lacc racc)))
        | NotGate exp ->
            Loop exp (fun acc -> cont (notF acc))
        | EqualsExpression (property, value) -> cont (leafV (property, value))
    Loop t id

let evaluateContinuationPassingStyle tree data =
    FoldTree (&&) (||) (not) (fun (prop, value) -> data |> Map.find prop |> ((=) value)) tree data
The code is tail-recursive, you got it right. But the problem is with Mono. See, Mono is not as high-quality an implementation of .NET as the official thing. In particular, it doesn't do tail call elimination. Like, at all.
For the simplest (and most prevalent) case of self-recursion this doesn't matter too much, because the compiler catches it earlier. The F# compiler is smart enough to spot that the function is calling itself, figure out under what conditions, and convert it into a neat while loop, so that the compiled code doesn't make any calls at all.
But when your tail call is to a function passed as parameter, the compiler can't do that, because the actual function being called isn't known until runtime. In fact, even mutual recursion of two functions can't be converted into a loop reliably.
Possible solutions:
Switch to .NET Core.
Don't use recursive continuations; use an accumulator instead (might not be possible).
Use self-recursion and pass a manually maintained stack of continuations.
If all else fails, use a mutable stack (see the sketch below).
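To make the last option concrete, here is a rough sketch of my own (not from the answer); it assumes a tree type along the following lines and specializes the fold to bool:

type Expression =
    | AndGate of Expression * Expression
    | OrGate of Expression * Expression
    | NotGate of Expression
    | EqualsExpression of string * string

// Work items: either visit a subtree, or combine results already computed.
type Frame =
    | Visit of Expression
    | ApplyAnd
    | ApplyOr
    | ApplyNot

// Evaluate with explicit work/result stacks instead of the call stack,
// so tree depth is limited only by heap size.
let evaluateWithStack leafV tree =
    let work = System.Collections.Generic.Stack<Frame>()
    let results = System.Collections.Generic.Stack<bool>()
    work.Push(Visit tree)
    while work.Count > 0 do
        match work.Pop() with
        | Visit (AndGate (l, r)) ->
            work.Push ApplyAnd
            work.Push (Visit r)
            work.Push (Visit l)     // left child is popped and evaluated first
        | Visit (OrGate (l, r)) ->
            work.Push ApplyOr
            work.Push (Visit r)
            work.Push (Visit l)
        | Visit (NotGate e) ->
            work.Push ApplyNot
            work.Push (Visit e)
        | Visit (EqualsExpression (prop, value)) ->
            results.Push(leafV (prop, value))
        | ApplyAnd ->
            let r, l = results.Pop(), results.Pop()
            results.Push(l && r)
        | ApplyOr ->
            let r, l = results.Pop(), results.Pop()
            results.Push(l || r)
        | ApplyNot ->
            results.Push(not (results.Pop()))
    results.Pop()

With the same leaf predicate as in the question, fun (prop, value) -> data |> Map.find prop |> ((=) value), this computes the same result as evaluateContinuationPassingStyle without growing the call stack.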

Transforming tail recursion into non-tail recursion

I would like to know if there is any method that converts a tail-recursive function into a non-tail-recursive one. I know tail-recursive functions are better than non-tail-recursive ones when the focus is efficiency, but for verification tail recursion is not suitable.
Thanks,
I doubt this question is well posed, because tail recursion elimination is an optimisation that you could simply ignore or turn off for the purposes of verification if you need to. (I guess that replacing a call with a jump might confuse it or damage certain security properties.)
However, if you do really want to do this, you need to change your source code so that the recursive function call is no longer in tail position. The tricky part will be making sure that the optimiser does not change your code back again.
Consider this F# tail-recursive factorial function:
let rec fact n v =
    if n <= 1
    then v
    else fact (n-1) (n*v)
Now, let's add a function that does nothing at all unless you set some global variable to true:
// we'll never set this to true, but the compiler doesn't know this
let mutable actuallyDoSomething = false

let doNothing() =
    if actuallyDoSomething then failwith "Don't set actuallyDoSomething!"
Now we can change the tail call in fact:
let rec fact n v =
    if n <= 1
    then v
    else
        let res = fact (n-1) (n*v)
        doNothing()
        res
Similar tricks should work in other languages.

Why did I still get a stack overflow even though I used tail recursion in OCaml?

I wrote a function which generates a list of randomized ints in OCaml.
let create_shuffled_int_list n =
  Random.self_init ();
  let rec create n' acc =
    if n' = 0 then acc
    else create (n' - 1) (acc @ [Random.int (n / 2)])
  in
  create n [];;
When I tried to generate 10000 integers, it gives an error: Exception: RangeError: Maximum call stack size exceeded.
However, I believed in the function: I have used tail recursion, and it should not give a stack overflow error, right?
Any idea?
From the core library documentation:
val append : 'a list -> 'a list -> 'a list
Catenate two lists. Same function as the infix operator @. Not tail-recursive (length of the first argument). The @ operator is not tail-recursive either.
So it's not your function that's causing the overflow, it's the @ function. Seeing as you only care about producing a shuffled list, however, there's no reason to be appending things onto the end of lists. Even if the @ operator were tail-recursive, list append is still O(n). List prepending, however, is O(1). So if you stick your new random numbers on the front of your list, you avoid the overflow (and make your function much, much faster):
let create_shuffled_int_list n =
  Random.self_init ();
  let rec create n' acc =
    if n' = 0 then acc
    else create (n' - 1) (Random.int (n / 2) :: acc)
  in
  create n [];;
If you care about the order (not sure why), then just stick a List.rev on the end:
List.rev (create n []);;
As an aside, you should not call Random.self_init inside a function, since:
the user of your function may want to control the seed in order to obtain reproducible results (testing, sharing results...)
this may reset the seed with a not-so-random entropy source, and you probably want to do it only once.
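A sketch of that advice (mine, not from the answer): seed once at program startup and keep the list-building function seed-free:

(* seed once, at program start *)
let () = Random.self_init ()

let create_shuffled_int_list n =
  let rec create n' acc =
    if n' = 0 then acc
    else create (n' - 1) (Random.int (n / 2) :: acc)
  in
  create n []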

Why does OCaml need both "let" and "let rec"? [duplicate]

Possible Duplicate:
Why are functions in Ocaml/F# not recursive by default?
OCaml uses let to define a new function, or let rec to define a function that is recursive. Why does it need both of these - couldn't we just use let for everything?
For example, to define a non-recursive successor function and recursive factorial in OCaml (actually, in the OCaml interpreter) I might write
let succ n = n + 1;;
let rec fact n =
  if n = 0 then 1 else n * fact (n-1);;
Whereas in Haskell (GHCi) I can write
let succ n = n + 1
let fact n =
  if n == 0 then 1 else n * fact (n-1)
Why does OCaml distinguish between let and let rec? Is it a performance issue, or something more subtle?
Well, having both available instead of only one gives the programmer tighter control over scoping. With let x = e1 in e2, the binding is only present in e2's environment, whereas with let rec x = e1 in e2 the binding is present in both e1's and e2's environments.
(Edit: I want to emphasize that it is not a performance issue, that makes no difference at all.)
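A tiny illustration of the scoping difference (my own sketch):

(* non-recursive: the x on the right-hand side is the OUTER x *)
let x = 1 in
let x = x + 1 in
x                 (* evaluates to 2 *)

(* recursive: the name being defined is visible in its own body *)
let rec fact n = if n = 0 then 1 else n * fact (n - 1) in
fact 5            (* evaluates to 120 *)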
Here are two situations where having this non-recursive binding is useful:
shadowing an existing definition with a refinement that uses the old binding. Something like: let f x = (let x = sanitize x in ...), where sanitize is a function that ensures the input has some desirable property (e.g. it takes the norm of a possibly-non-normalized vector, etc.). This is very useful in some cases.
metaprogramming, for example macro writing. Imagine I want to define a macro SQUARE(foo) that desugars into let x = foo in x * x, for any expression foo. I need this binding to avoid code duplication in the output (I don't want SQUARE(factorial n) to compute factorial n twice). This is only hygienic if the let binding is not recursive; otherwise I couldn't write let x = 2 in SQUARE(x) and get a correct result.
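To spell out that hygiene point (my own sketch): with a non-recursive let, SQUARE(x) under let x = 2 expands to

let x = 2 in
let x = x in   (* the right-hand x is the outer x = 2 *)
x * x          (* 4, as expected *)

whereas a recursive binding let rec x = x would refer to itself instead of the outer x, and the expansion would no longer mean anything.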
So I claim it is very important indeed to have both the recursive and the non-recursive binding available. Now, the default behaviour of the let binding is a matter of convention. You could say that let x = ... is recursive, and that one must use let nonrec x = ... to get the non-recursive binder. Picking one default or the other is a matter of which programming style you want to favour, and there are good reasons to make either choice. Haskell suffers¹ from the unavailability of this non-recursive mode, and OCaml has exactly the same defect at the type level: type foo = ... is recursive, and there is no non-recursive option available -- see this blog post.
¹: when Google Code Search was available, I used it to search Haskell code for the pattern let x' = sanitize x in .... This is the usual workaround when non-recursive binding is not available, but it's less safe because you risk writing x instead of x' by mistake later on -- in some cases you want both available, so picking a different name can be deliberate. A good idiom would be to use a longer variable name for the first x, such as unsanitized_x. Anyway, just looking for x' literally (no other variable name) and x1 turned up a lot of results. Erlang (and every language that tries to make variable shadowing difficult: CoffeeScript, etc.) has even worse problems of this kind.
That said, the choice of having Haskell bindings recursive by default (rather than non-recursive) certainly makes sense, as it is consistent with lazy evaluation by default, which makes it really easy to build recursive values -- while strict-by-default languages have more restrictions on which recursive definitions make sense.
