I'm solving problem 4 of the 99 Problems in OCaml, and I'm still learning OCaml.
Question 4 reads:
OCaml standard library has List.length but we ask that you reimplement it. Bonus for a tail recursive solution.
As I understand it, a tail-recursive solution earns bonus points because it is more efficient than a potentially easier, more verbose solution.
I came up with this for a tail recursive solution
let length in_list =
  let rec find_len cur_length = function
    | [] -> cur_length
    | hd::tl -> find_len (cur_length + 1) tl
  in find_len 0 in_list
;;
And as I understand tail recursion, this is valid because it operates recursively on the tail.
My question is: what is the opposite of this? What would a valid non-tail-recursive solution look like?
I guess it would be something that operates recursively on the head of the list.
I came up with:
let hd_rec_length in_list =
  let rec pop_last saved_list = function
    | [] -> saved_list
    | [last] -> saved_list
    | hd::tl -> pop_last (saved_list @ [hd]) tl
  in
  let rec hd_rec_find_len cur_length in_list =
    match in_list with
    | [] -> cur_length
    | hd::tl -> hd_rec_find_len (cur_length+1) (pop_last [] in_list)
  in hd_rec_find_len 0 in_list
;;
But my gut tells me I'm missing something more obvious. The second solution seems like too much work, and the first seems more natural and easy. What am I missing?
A tail-recursive function is a recursive function where all recursive calls happen in a tail position. What that means is that the recursive call must be the last thing that happens in any given path through the function. All of the recursive functions in your question are tail recursive.
Tail recursion does not mean recursing on the tail of the list. In fact a tail recursive function doesn't have to involve lists at all.
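A minimal OCaml sketch of that point, using only the standard library; `gcd` and `sum_to` are illustrative names, not from the question. Both are tail-recursive and neither touches a list:

```ocaml
(* Euclid's algorithm: the recursive call is the last thing evaluated,
   so it is a tail call even though no list is involved. *)
let rec gcd a b =
  if b = 0 then a
  else gcd b (a mod b)

(* Sum of 1..n with an accumulator: every recursive call is a tail call. *)
let sum_to n =
  let rec go acc i =
    if i > n then acc
    else go (acc + i) (i + 1)
  in
  go 0 1
```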
So a non-tail-recursive function is any function that does something after a recursive call returns (or that makes several recursive calls along the same path, since at most one of them can be the last thing that happens). Usually that means applying some other function to the result of the recursive call after it returns.
A non-tail recursive version of length would be:
let rec length = function
  | [] -> 0
  | _::tl -> 1 + length tl
This is not tail-recursive because the last thing that happens here is the addition, not the call to length. So after the recursive call returns, we add 1 to its result and then we return as well.
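For comparison, here is a small OCaml sketch putting both shapes side by side (the names `length_tail` and `length_body` are illustrative). They compute the same thing; only the work left after the recursive call differs:

```ocaml
(* Tail-recursive: the result of the recursive call is returned unchanged,
   so no stack frame has to be kept around for it. *)
let length_tail l =
  let rec go acc = function
    | [] -> acc
    | _ :: tl -> go (acc + 1) tl
  in
  go 0 l

(* Body-recursive: after [length_body tl] returns we still have to add 1,
   so each element needs its own stack frame. *)
let rec length_body = function
  | [] -> 0
  | _ :: tl -> 1 + length_body tl
```

Depending on the OCaml version and the stack limit, `length_body` may raise `Stack_overflow` on very long lists, whereas `length_tail` runs in constant stack space.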
Related
I need to implement a powerset function in ML that takes a list of ints, subject to the following constraints:
1) Never call built-in library functions except map, foldr, and foldl.
2) Never be recursive. All recursion must occur within map and fold.
3) Never contain let or local expressions
I've already implemented the function without these constraints using normal recursion of the form:
powerset(x::xs) = powerset(xs) @ x.powerset(xs)
I'm having trouble thinking of a way to translate this type of implementation into one that uses only maps and folds.
I'm not necessarily asking for someone to implement it for me, but I would appreciate a nudge in the right direction or any help for that matter.
Thanks
Here's how I go about solving problems with folds.
Think about the final value you want to obtain. In your case, this would be the powerset of the input list. The crucial question is now this: can the powerset of a list be recomputed after inserting a new element? That is, given P(S), the powerset of some set of items S, is it possible to compute P(S ∪ {x}) in terms of only P(S) and x?
I believe you've already answered this question definitively above. Not explicitly, perhaps, but if you rework your Powerset function, you'll find that the right idea is hidden in it. You'll want to package up this code into a little helper function:
fun extendPowerset (PS : 'a list list, x : 'a) : 'a list list =
    (* PS is the powerset of some set S. This function
     * should return the powerset of S ∪ {x}. *)
    ...
Great, now how do you plug that into a fold? Well, at each step of the fold, you are given two things:
the next element of the list, and
some value computed from all previous elements (think of this as a "summary" of the previous elements)
In return, you must compute a slightly larger summary: it should summarize the next element in addition to all previous. Doesn't this feel vaguely similar? The extendPowerset function is basically doing exactly what you need, where the "summary" of the previous elements is the powerset of those elements. I'll leave the rest to you.
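One way the pieces might fit together, sketched in OCaml rather than SML, with `List.map`, `List.fold_left` and `List.fold_right` standing in for `map`, `foldl` and `foldr`. `extend_powerset` is a hypothetical counterpart of the `extendPowerset` stub above; to respect the constraints, the two halves are concatenated with a fold rather than with `@`, and there is no explicit recursion or local `let`:

```ocaml
(* Given ps, the powerset of some set S, return the powerset of S with x
   added: every old subset, followed by every old subset with x in front.
   fold_right conses the old subsets onto the new ones, i.e. an append. *)
let extend_powerset ps x =
  List.fold_right (fun s acc -> s :: acc) ps (List.map (fun s -> x :: s) ps)

(* The fold's "summary" of the elements seen so far is their powerset,
   starting from the powerset of the empty set, which is [[]]. *)
let powerset xs = List.fold_left extend_powerset [ [] ] xs
```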
Aside: note that you can also append two lists with a fold. As a hint, try working out what value is computed by foldl op:: [1,2] [3,4]. It's not quite like appending [1,2] and [3,4], but it's close...
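The aside can be checked in OCaml, where `List.fold_left` plays the role of SML's `foldl` (OCaml passes the accumulator first to the folding function). Folding cons from the right instead is one way to get a real append:

```ocaml
(* Mirrors SML's  foldl op:: [1,2] [3,4] : cons each element of [3; 4]
   onto an accumulator that starts as [1; 2]. *)
let almost_append = List.fold_left (fun acc x -> x :: acc) [1; 2] [3; 4]
(* = [4; 3; 1; 2] *)

(* Folding from the right preserves order, so this appends xs to ys. *)
let append xs ys = List.fold_right (fun x acc -> x :: acc) xs ys
```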
Like you have already done, I would also first write a powerset function using explicit recursion. I would then try and discover the higher-order functional patterns and substitute those afterwards, one pattern at a time.
You might for example turn a helper function
fun consMany (x, []) = []
  | consMany (x, L::Ls) = (x::L) :: consMany (x, Ls)
into the expression
map (fn L => x::L) Ls
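The same rewrite in OCaml, as a sketch (`cons_many` and `cons_many_map` are illustrative names); both functions return the same list:

```ocaml
(* Explicitly recursive helper: put x in front of every list in ls. *)
let rec cons_many x = function
  | [] -> []
  | l :: ls -> (x :: l) :: cons_many x ls

(* The same computation expressed with map. *)
let cons_many_map x ls = List.map (fun l -> x :: l) ls
```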
You might eliminate a let-expression (useful for re-using a once-computed value twice)
fun foo (x::xs) =
    let val rest = foo xs
    in ... x ... rest ... rest ...
    end
using a function:
fun foo (x::xs) =
    bar (x, foo xs)
and bar (x, rest) = ... x ... rest ... rest ...
although, this might as well be an anonymous function:
fun foo (x::xs) =
    (fn rest => ... x ... rest ... rest ...) (foo xs)
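Applied to the powerset itself, the let-elimination step might look like this in OCaml. It is only a sketch: it still uses explicit recursion and `@`, which a later fold rewrite would remove; only the `let` is gone, and the recursive result is still computed once:

```ocaml
(* Explicit recursion with a let: the powerset of xs is computed once
   and used twice. *)
let rec powerset_let = function
  | [] -> [ [] ]
  | x :: xs ->
      let rest = powerset_let xs in
      rest @ List.map (fun s -> x :: s) rest

(* The same function without let: an anonymous function is applied to the
   recursive result, so rest is still computed only once. *)
let rec powerset_fun = function
  | [] -> [ [] ]
  | x :: xs -> (fun rest -> rest @ List.map (fun s -> x :: s) rest) (powerset_fun xs)
```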
And you might turn a list-recursive function
fun derp [] = ...
  | derp (x::xs) = g (x, derp xs)
into the fold:
fun derp xs = foldr g (...) xs
but if the order of your results doesn't matter, foldl is tail-recursive.
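A concrete OCaml instance of that pattern, using a hypothetical `sum_squares` in place of `derp`: the explicit recursion, the equivalent right fold, and a left fold, which is safe here because addition does not care about order:

```ocaml
(* Shape:  derp [] = base  |  derp (x :: xs) = g x (derp xs) *)
let rec sum_squares = function
  | [] -> 0
  | x :: xs -> (x * x) + sum_squares xs

(* The same shape as a right fold: the step function is g, 0 is the base. *)
let sum_squares_foldr xs = List.fold_right (fun x acc -> (x * x) + acc) xs 0

(* Order does not matter for a sum, so the tail-recursive left fold works too. *)
let sum_squares_foldl xs = List.fold_left (fun acc x -> acc + (x * x)) 0 xs
```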
I have this function that finds the even numbers in a list and returns a new list with only those numbers:
def even([]), do: []
def even([head | tail]) when rem(head, 2) == 0 do
  [head | even(tail)]
end
def even([_head| tail]) do
  even(tail)
end
Is this already tail-call optimized? Or does every clause have to call itself at the end (the second version of the "even" function doesn't)? If not, how can it be refactored to be tail-call recursive?
I know this can be done with filter or reduce but I wanted to try without it.
You're right that this function is not tail recursive, because the second clause's last call is the list prepend operation, not a call to itself. To make it tail-recursive, you'll have to use an accumulator. Since the accumulator collects the elements in reverse order, the base-case clause (the first clause of even/2) needs to reverse it.
def even(list), do: even(list, [])
def even([], acc), do: :lists.reverse(acc)
def even([head | tail], acc) when rem(head, 2) == 0 do
  even(tail, [head | acc])
end
def even([_head| tail], acc) do
  even(tail, acc)
end
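The same accumulator-and-reverse pattern, sketched in OCaml for comparison (`evens_body` and `evens_tail` are illustrative names, and OCaml's `List.rev` plays the role of `:lists.reverse`):

```ocaml
(* Body-recursive: the cons happens after the recursive call returns. *)
let rec evens_body = function
  | [] -> []
  | x :: tl when x mod 2 = 0 -> x :: evens_body tl
  | _ :: tl -> evens_body tl

(* Tail-recursive: matching elements are pushed onto acc, which therefore
   ends up reversed, so it is reversed once at the end. *)
let evens_tail l =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: tl when x mod 2 = 0 -> go (x :: acc) tl
    | _ :: tl -> go acc tl
  in
  go [] l
```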
But in Erlang, your "body-recursive" code is automatically optimized and may not be slower than a tail-recursive solution which does a :lists.reverse call at the end. The Erlang documentation recommends writing whichever of the two results in cleaner code in such cases.
Myth: Tail-Recursive Functions are Much Faster Than Recursive Functions.
According to the myth, using a tail-recursive function that builds a list in reverse followed by a call to lists:reverse/1 is faster than a body-recursive function that builds the list in correct order; the reason being that body-recursive functions use more memory than tail-recursive functions.
That was true to some extent before R12B. It was even more true before R7B. Today, not so much. A body-recursive function generally uses the same amount of memory as a tail-recursive function. It is generally not possible to predict whether the tail-recursive or the body-recursive version will be faster. Therefore, use the version that makes your code cleaner (hint: it is usually the body-recursive version).
For a more thorough discussion about tail and body recursion, see Erlang's Tail Recursion is Not a Silver Bullet.
I'm trying to build up some rules in a tree structure, with logic gates (and, not, or) as well as conditions, e.g. property x equals value y. I wrote the most obvious recursive function first, which worked. I then tried to write a version in continuation-passing style that wouldn't cause a stack overflow, taking my cue from this post about generic tree folding and this answer on Stack Overflow.
It works for small trees (depth of approximately 1000), but unfortunately with a large tree it causes a stack overflow when I run it on my Mac with Xamarin Studio. Can anyone tell me whether I've misunderstood how F# treats tail-recursive code, or whether this code isn't tail-recursive?
The full sample is here.
let FoldTree andF orF notF leafV t data =
    let rec Loop t cont =
        match t with
        | AndGate (left, right) ->
            Loop left (fun lacc ->
                Loop right (fun racc ->
                    cont (andF lacc racc)))
        | OrGate (left, right) ->
            Loop left (fun lacc ->
                Loop right (fun racc ->
                    cont (orF lacc racc)))
        | NotGate exp ->
            Loop exp (fun acc -> cont (notF acc))
        | EqualsExpression(property,value) -> cont (leafV (property,value))
    Loop t id
let evaluateContinuationPassingStyle tree data =
    FoldTree (&&) (||) (not) (fun (prop,value) -> data |> Map.find prop |> ((=) value)) tree data
The code is tail-recursive; you got it right. But the problem is with Mono. See, Mono is not as high-quality an implementation of .NET as the official one. In particular, it doesn't do tail call elimination. Like, at all.
For the simplest (and most prevalent) case of self-recursion this doesn't matter too much, because the compiler catches it earlier. The F# compiler is smart enough to spot that the function is calling itself, figure out under what conditions, and convert it into a neat while loop, so that the compiled code doesn't make any calls at all.
But when your tail call is to a function passed as parameter, the compiler can't do that, because the actual function being called isn't known until runtime. In fact, even mutual recursion of two functions can't be converted into a loop reliably.
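For contrast, a sketch of the same continuation-passing fold in OCaml, whose compilers generally eliminate tail calls to closures as well as self-calls, so the calls to `loop` and `k` below should not grow the stack. The `expr` type, the association-list lookup and `nest_not` are assumptions standing in for the question's tree type and `Map`:

```ocaml
type expr =
  | And of expr * expr
  | Or of expr * expr
  | Not of expr
  | Equals of string * string

(* Continuation-passing fold: every call to loop or k is in tail position;
   the pending work lives in heap-allocated closures instead of the stack. *)
let fold_tree and_f or_f not_f leaf_v t =
  let rec loop t k =
    match t with
    | And (l, r) -> loop l (fun lacc -> loop r (fun racc -> k (and_f lacc racc)))
    | Or (l, r) -> loop l (fun lacc -> loop r (fun racc -> k (or_f lacc racc)))
    | Not e -> loop e (fun acc -> k (not_f acc))
    | Equals (p, v) -> k (leaf_v (p, v))
  in
  loop t (fun x -> x)

(* data is an association list of property -> value. *)
let evaluate data tree =
  fold_tree ( && ) ( || ) not (fun (p, v) -> List.assoc p data = v) tree

(* Helper to build a deep tree: n nested Not gates around e. *)
let rec nest_not n e = if n = 0 then e else nest_not (n - 1) (Not e)
```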
Possible solutions:
Switch to .NET Core.
Don't use recursive continuations; use an accumulator instead (might not be possible).
Use self-recursion and pass a manually maintained stack of continuations.
If all else fails, use a mutable stack.
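The third option might look roughly like this, sketched in OCaml with a hypothetical `expr` type: instead of continuations, a self-recursive `run` keeps an explicit list of pending tasks plus a list of intermediate values, so the only recursion is a tail call to itself. The task names and `nest_not` are illustrative:

```ocaml
type expr =
  | And of expr * expr
  | Or of expr * expr
  | Not of expr
  | Equals of string * string

(* Pending work: evaluate a subtree, or combine values already computed. *)
type task =
  | Eval of expr
  | Apply_and
  | Apply_or
  | Apply_not

let evaluate_stack lookup tree =
  (* tasks: what is left to do; values: results computed so far, newest first *)
  let rec run tasks values =
    match tasks, values with
    | [], [ v ] -> v
    | Eval (And (l, r)) :: rest, _ -> run (Eval l :: Eval r :: Apply_and :: rest) values
    | Eval (Or (l, r)) :: rest, _ -> run (Eval l :: Eval r :: Apply_or :: rest) values
    | Eval (Not e) :: rest, _ -> run (Eval e :: Apply_not :: rest) values
    | Eval (Equals (p, v)) :: rest, _ -> run rest (lookup p v :: values)
    | Apply_and :: rest, r :: l :: vs -> run rest ((l && r) :: vs)
    | Apply_or :: rest, r :: l :: vs -> run rest ((l || r) :: vs)
    | Apply_not :: rest, v :: vs -> run rest (not v :: vs)
    | _ -> invalid_arg "evaluate_stack: malformed state"
  in
  run [ Eval tree ] []

(* Helper to build a deep tree: n nested Not gates around e. *)
let rec nest_not n e = if n = 0 then e else nest_not (n - 1) (Not e)
```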
Let's say I want to calculate the factorial of an integer. A simple approach to this in F# would be:
let rec fact (n: bigint) =
    match n with
    | x when x = 0I -> 1I
    | _ -> n * fact (n-1I)
But, if my program needs dynamic programming, how could I sustain functional programming whilst using memoization?
One idea I had for this was making a sequence of lazy elements, but I ran into a problem. Assume that the following code was acceptable in F# (it is not):
let rec facts =
    seq {
        yield 1I
        for i in 1I..900I do
            yield lazy (i * (facts |> Seq.item ((i-1I) |> int)))
    }
Is there anything similar to this idea in F#?
(Note: I understand that I could use a .NET Dictionary but isn't invoking the ".Add()" method imperative style?)
Also, is there any way I could generalize this with a function? For example, could I create a sequence of the lengths produced by the Collatz function defined by:
let rec collatz n i =
    if n = 0 || n = 1 then (i+1)
    elif n % 2 = 0 then collatz (n/2) (i+1)
    else collatz (3*n+1) (i+1)
If you want to do it lazily, this is a nice approach:
let factorials =
    Seq.initInfinite (fun n -> bigint n + 1I)
    |> Seq.scan ((*)) 1I
    |> Seq.cache
The Seq.cache means you won't repeatedly evaluate elements you've already enumerated.
You can then take a particular number of factorials using e.g. Seq.take n, or get a particular factorial using Seq.item n.
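OCaml has no built-in bigint, so this sketch uses native `int` (correct only up to 20!); the idea it shows is the caching: each tail is a `Lazy.t`, which is forced at most once and then shared, similar in spirit to `Seq.cache`. The `stream` type, `factorials_from` and `nth` are hypothetical helpers, not standard library API:

```ocaml
(* A lazy, memoized stream: the tail is computed at most once, then shared. *)
type 'a stream = Cons of 'a * 'a stream Lazy.t

(* Assuming acc = n!, this is the stream n!, (n+1)!, (n+2)!, ... *)
let rec factorials_from n acc =
  Cons (acc, lazy (factorials_from (n + 1) (acc * (n + 1))))

let factorials = factorials_from 0 1

(* Walk n cells into the stream, forcing tails on the way. *)
let rec nth (Cons (x, tl)) n =
  if n = 0 then x else nth (Lazy.force tl) (n - 1)
```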
First, I don't see what you mean by "dynamic programming" in your example.
Using memoization doesn't mean something is not "functional" or breaks immutability. The important point is not how something is implemented but how it behaves. A function that uses mutable memoization is still considered pure, as long as it behaves like a pure, immutable function. So using mutable variables in a limited scope that is not visible to the caller is still considered pure. If the implementation mattered, we would also have to consider tail recursion impure, since the compiler transforms it into a loop with mutable variables under the hood. Some List.xyz functions also use mutation internally, purely for speed. Those functions are still considered pure/immutable because they still behave like pure functions.
A sequence itself is already lazy: it computes its elements only when you ask for them.
So it doesn't make much sense to me to create a sequence that returns lazy elements.
If you want to speed up the computation, there are multiple ways to do it. Even in the recursive version, you could use an accumulator that is passed to the next function call instead of doing deep recursion.
let fact n =
    let rec loop acc x =
        if x > n
        then acc
        else loop (acc*x) (x+1I)
    loop 1I 1I
Overall, that is the same as
let fact' n =
    let mutable acc = 1I
    let mutable x = 1I
    while x <= n do
        acc <- acc * x
        x <- x + 1I
    acc
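The same pair of functions sketched in OCaml with native `int` (so only valid up to 20!); both return the same result for any `n >= 0`:

```ocaml
(* Accumulator version: the loop carries the running product in acc,
   and the self tail call can be compiled as a jump. *)
let fact n =
  let rec loop acc x =
    if x > n then acc
    else loop (acc * x) (x + 1)
  in
  loop 1 1

(* Roughly the imperative loop the recursive version corresponds to. *)
let fact_loop n =
  let acc = ref 1 in
  let x = ref 1 in
  while !x <= n do
    acc := !acc * !x;
    x := !x + 1
  done;
  !acc
```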
While you are learning functional programming, it is a good idea to get accustomed to the first version and learn how looping and recursion relate to each other. But beyond learning, there is no reason to force yourself to always write the first version. In the end you should use what you consider more readable and easier to understand, not decide based on whether the implementation uses a mutable variable.
Nobody really cares about the exact implementation. We should view functions as black boxes. As long as a function behaves like a pure function, everything is fine.
The above uses an accumulator, so you don't need to call a function repeatedly to get a value, and you also don't need an internal mutable cache. If you really have a slow recursive version and want to speed it up with caching, you can use something like this:
let fact x =
    let rec fact x =
        match x with
        | x when x <= 1I -> 1I
        | x -> (fact (x-1I)) * x
    let cache = System.Collections.Generic.Dictionary<bigint,bigint>()
    match cache.TryGetValue x with
    | false,_ ->
        let value = fact x
        cache.Add(x,value)
        value
    | true,value ->
        value
But that would probably be slower than the versions with an accumulator. It also only caches within a single call, because the Dictionary is created anew every time fact runs. If you want to cache results across multiple fact calls throughout your whole application, you need an external cache. You could create a Dictionary outside of fact and keep it in a private variable. But you can also use a function with a closure and make the whole process generic.
let memoize (f:'a -> 'b) =
    let cache = System.Collections.Generic.Dictionary<'a,'b>()
    fun x ->
        match cache.TryGetValue x with
        | false,_ ->
            let value = f x
            cache.Add(x,value)
            value
        | true,value ->
            value
let rec fact x =
    match x with
    | x when x <= 1I -> 1I
    | x -> (fact (x-1I)) * x
Now you can use it like this:
let fact = memoize fact
printfn "%A" (fact 100I)
printfn "%A" (fact 100I)
In the same way, you can create a memoized version of any other function that takes one parameter.
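A rough OCaml counterpart of `memoize`, using `Hashtbl` in place of `Dictionary`, plus a small usage check that counts how often the wrapped function actually runs (`square` and `calls` are illustrative):

```ocaml
(* Wrap any one-argument function with a private cache. The Hashtbl is
   created once per call to memoize and captured by the returned closure. *)
let memoize f =
  let cache = Hashtbl.create 16 in
  fun x ->
    match Hashtbl.find_opt cache x with
    | Some v -> v
    | None ->
        let v = f x in
        Hashtbl.add cache x v;
        v

(* Usage: count how many times the underlying function really runs. *)
let calls = ref 0
let square x = incr calls; x * x
let fast_square = memoize square
```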
Note that memoization doesn't automatically speed everything up. If you use the memoize function on fact, nothing gets faster; it will even be slower than without memoization. You can add a printfn "Cache Hit" to the | true,value -> branch inside the memoize function. Calling fact 100I twice in a row will then print "Cache Hit" only once.
The problem is how the algorithm works. It starts from 100I and goes down to 1I: calculating 100I needs 99I, which needs 98I, and so on. On the first call none of those values is in the cache yet, so even if every step asked the cache, it would never find a result and would only pay for the extra lookups. In the code above it is actually worse: the recursive calls inside fact refer to the original, unmemoized fact, so only the top-level argument is ever cached. To really benefit from the cache you need to change fact itself, for example so it builds up from 1I to 100I. The current version also overflows the stack for big inputs, with or without the memoize function.
Only the second call benefits from the cache. That is why calling fact 100I twice prints "Cache Hit" only once.
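One common way to make the recursive calls go through the cache is open recursion: the function takes the memoized version of itself as a parameter. A sketch in OCaml (`memo_rec` and the counter `computed` are illustrative names; native `int` limits it to 20!). It fixes the cache bypass but not the stack depth, since `fact` is still body-recursive:

```ocaml
(* Open recursion: f receives g, the memoized function itself, as [self],
   so every recursive call also consults and fills the cache. *)
let memo_rec f =
  let cache = Hashtbl.create 16 in
  let rec g x =
    match Hashtbl.find_opt cache x with
    | Some v -> v
    | None ->
        let v = f g x in
        Hashtbl.add cache x v;
        v
  in
  g

(* Counts how many values are actually computed (cache misses). *)
let computed = ref 0

(* Factorial written against [self] instead of calling itself directly. *)
let fact =
  memo_rec (fun self n ->
    incr computed;
    if n <= 1 then 1 else n * self (n - 1))
```

After `fact 10`, a call to `fact 12` only computes 12 and 11 and reuses the cached 10!.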
This example shows how easy it is to get the behaviour wrong with caching/memoization. In general you should try to write a function so that it is tail-recursive and uses accumulators instead. Don't write functions that rely on memoization to work properly.
I would pick a solution with an accumulator. If you profiled your application and found that this is still too slow, that it is a bottleneck, and that caching fact would help, then you can also just cache the factorials directly, something like this. You could use dict or a Map for this:
let factCache = [1I..100I] |> List.map (fun x -> x,fact x) |> dict
let factCache = [1I..100I] |> List.map (fun x -> x,fact x) |> Map.ofList
I wrote a function which generates a list of randomized ints in OCaml.
let create_shuffled_int_list n =
  Random.self_init ();
  let rec create n' acc =
    if n' = 0 then acc
    else
      create (n'-1) (acc @ [Random.int (n/2)])
  in
  create n [];;
When I tried to generate 10000 integers, it failed with the error "Exception: RangeError: Maximum call stack size exceeded."
However, I believe I used tail recursion in the function, so it should not give a stack overflow error, right?
Any idea?
From the core library documentation
val append : 'a list -> 'a list -> 'a list
Catenate two lists. Same function as the infix operator @. Not tail-recursive (length of the first argument). The @ operator is not tail-recursive either.
So it's not your function that's causing the overflow, it's the @ function. Since you only care about producing a shuffled list, however, there's no reason to append things to the end of lists. Even if the @ operator were tail-recursive, appending is O(n) in the length of the first list, so building the list this way costs O(n²) overall. Prepending, however, is O(1). So if you put your new random numbers on the front of your list, you avoid the overflow (and make your function much, much faster):
let create_shuffled_int_list n =
  Random.self_init ();
  let rec create n' acc =
    if n' = 0 then acc
    else
      create (n'-1) (Random.int (n/2) :: acc)
  in
  create n [];;
If you care about the order (not sure why), then just stick a List.rev on the end:
List.rev (create n []);;
As an aside, you should not call Random.self_init in a function, since:
the user of your function may want to control the seed in order to obtain reproducible results (testing, sharing results...)
this may reset the seed with a not so random entropy source and you probably want to do this only once.
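Putting both suggestions together, a sketch in OCaml where the caller passes in a `Random.State.t` (so seeding stays under their control) and the list is built by prepending. `create_int_list` and its `bound` parameter are hypothetical, not from the question:

```ocaml
(* Build n random ints in [0, bound) by prepending, which is O(1) per
   element, then reverse once. The caller owns the generator state. *)
let create_int_list state n bound =
  let rec create k acc =
    if k = 0 then List.rev acc
    else create (k - 1) (Random.State.int state bound :: acc)
  in
  create n []
```

A program that wants fresh randomness can still call `Random.self_init ()` once at startup, or build its state with `Random.State.make_self_init ()`.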