Accidental recursion, blowing up the stack with Seq.append, without using `rec` - recursion

I had code that was waiting to blow up something lurking around. Using F# 4.1 Result it is similar to this:
module Result =
let unwindSeq (sourceSeq: #seq<Result<_, _>>) =
sourceSeq
|> Seq.fold (fun state res ->
match state with
| Error e -> Error e
| Ok innerResult ->
match res with
| Ok suc ->
Seq.singleton suc
|> Seq.append innerResult
|> Ok
| Error e -> Error e) (Ok Seq.empty)
The obvious bottleneck here is Seq.singleton added to Seq.append. I understand that this is slow (and badly written), but why does it have to blow up the stack? I don't think that Seq.append is inherently recursive...
// blows up stack, StackOverflowException
Seq.init 1000000 Result.Ok
|> Result.unwindSeq
|> printfn "%A"
And as an aside, to unwind a sequence of Result, I fixed this function by using a simple try-catch-reraise, but that feels sub-par too. Any ideas as to how to do this more idiomatically without force-evaluating the sequence or blowing up the stack?
Not-so-perfect unwinding (it also forces the result-fail type), but at least without pre-evaluation of the sequence:
let unwindSeqWith throwArgument (sourceSeq: #seq<Result<_, 'a -> 'b>>) =
try
sourceSeq
|> Seq.map (throwOrReturnWith throwArgument)
|> Ok
with
| e ->
(fun _ -> raise e)
|> Error

I believe the idiomatic way of folding a sequence of Results in the way you suggest would be:
let unwindSeq<'a,'b> =
Seq.fold<Result<'a,'b>, Result<'a seq, 'b>>
(fun acc cur -> acc |> Result.bind (fun a -> cur |> Result.bind (Seq.singleton >> Seq.append a >> Ok)))
(Ok Seq.empty)
Not that this will be any faster than your current implementation, it just leverages Result.bind to do most of the work. I believe the stack is overflowing because a recursive function somewhere in the F# library, likely in the Seq module. My best evidence for this is that materializing the sequence to a List first seems to make it work, as in the following example:
let results =
Seq.init 2000000 (fun i -> if i <= 1000000 then Result.Ok i else Error "too big")
|> Seq.toList
results
|> unwindSeq
|> printfn "%A"
However, this may not work in your production scenario if the sequence is too big to materialize in memory.

Related

F# stop Seq.map when a predicate evaluates true

I'm currently generating a sequence in a similar way to:
migrators
|> Seq.map (fun m -> m())
The migrator function is ultimately returning a discriminated union like:
type MigratorResult =
| Success of string * TimeSpan
| Error of string * Exception
I want to stop the map once I encounter my first Error but I need to include the Error in the final sequence.
I have something like the following to display a final message to the user
match results |> List.rev with
| [] -> "No results equals no migrators"
| head :: _ ->
match head with
| Success (dt, t) -> "All migrators succeeded"
| Error (dt, ex) -> "Migration halted owing to error"
So I need:
A way to stop the mapping when one of the map steps produces an Error
A way to have that error be the final element added to the sequence
I appreciate there may be a different sequence method other than map that will do this, I'm new to F# and searching online hasn't yielded anything as yet!
I guess there are multiple approaches here, but one way would be to use unfold:
migrators
|> Seq.unfold (fun ms ->
match ms with
| m :: tl ->
match m () with
| Success res -> Some (Success res, tl)
| Error res -> Some (Error res, [])
| [] -> None)
|> List.ofSeq
Note the List.ofSeq at the end, that's just there for realizing the sequence. A different way to go would be to use sequence comprehensions, some might say it results in a clearer code.
The ugly things Tomaš alludes to are 1) mutable state, and 2) manipulation of the underlying enumerator. A higher-order function which returns up to and including when the predicate holds would then look like this:
module Seq =
let takeUntil pred (xs : _ seq) = seq{
use en = xs.GetEnumerator()
let flag = ref true
while !flag && en.MoveNext() do
flag := not <| pred en.Current
yield en.Current }
seq{1..10} |> Seq.takeUntil (fun x -> x % 5 = 0)
|> Seq.toList
// val it : int list = [1; 2; 3; 4; 5]
For your specific application, you'd map the cases of the DU to a boolean.
(migrators : seq<MigratorResult>)
|> Seq.takeUntil (function Success _ -> false | Error _ -> true)
I think the answer from #scrwtp is probably the nicest way to do this if your input is reasonably small (and you can turn it into an F# list to use pattern matching). I'll add one more version, which works when your input is just a sequence and you do not want to turn it into a list.
Essentially, you want to do something that's almost like Seq.takeWhile, but it gives you one additional item at the end (the one, for which the predicate fails).
To use a simpler example, the following returns all numbers from a sequence until one that is divisible by 5:
let nums = [ 2 .. 10 ]
nums
|> Seq.map (fun m -> m % 5)
|> Seq.takeWhile (fun n -> n <> 0)
So, you basically just need to look one element ahead - to do this, you could use Seq.pairwise which gives you the current and the next element in the sequence"
nums
|> Seq.map (fun m -> m % 5)
|> Seq.pairwise // Get sequence of pairs with the next value
|> Seq.takeWhile (fun (p, n) -> p <> 0) // Look at the next value for test
|> Seq.mapi (fun i (p, n) -> // For the first item, we return both
if i = 0 then [p;n] else [n]) // for all other, we return the second
|> Seq.concat
The only ugly thing here is that you then need to flatten the sequence again using mapi and concat.
This is not very nice, so a good thing to do would be to define your own higher-order function like Seq.takeUntilAfter that encapsulates the behavior you need (and hides all the ugly things). Then your code could just use the function and look nice & readable (and you can experiment with other ways of implementing this).

Using continuation / CPS to implement tail-recursive MergeSort in OCaml

I am trying to implement a tail-recursive MergeSort in OCaml.
Since Mergesort naturally is not tail-recursive, so I am using CPS to implement it.
Also my implementation is inspired by Tail-recursive merge sort in OCaml
Below is my code
let merge compare_fun l1 l2 =
let rec mg l1 l2 acc =
match l1, l2 with
| ([], []) -> List.rev acc
| ([], hd2::tl2) -> mg [] tl2 (hd2::acc)
| (hd1::tl1, []) -> mg tl1 [] (hd1::acc)
| (hd1::tl1, hd2::tl2) ->
let c = compare_fun hd1 hd2
in
if c = 1 then mg l1 tl2 (hd2::acc)
else if c = 0 then mg tl1 tl2 (hd2::hd1::acc)
else mg tl1 l2 (hd1::acc)
in
mg l1 l2 [];;
let split_list p l =
let rec split_list p (acc1, acc2) = function
| [] -> (List.rev acc1, List.rev acc2)
| hd::tl ->
if p > 0 then split_list (p-1) (hd::acc1, acc2) tl
else split_list (p-2) (acc1, hd::acc2) tl
in
split_list p ([], []) l;;
let mergeSort_cps compare_fun l =
let rec sort_cps l cf = (*cf = continuation func*)
match l with
| [] -> cf []
| hd::[] -> cf [hd]
| _ ->
let (left, right) = split_list ((List.length l)/2) l
in
sort_cps left (fun leftR -> sort_cps right (fun rightR -> cf (merge compare_fun leftR rightR)))
in
sort_cps l (fun x -> x);;
When I compile it, and run it with a 1,000,000 integers, it gives the error of stackoverflow. Why?
Edit
Here is the code I used for testing:
let compare_int x y =
if x > y then 1
else if x = y then 0
else -1;;
let create_list n =
Random.self_init ();
let rec create n' acc =
if n' = 0 then acc
else
create (n'-1) ((Random.int (n/2))::acc)
in
create n [];;
let l = create_list 1000000;;
let sl = mergeSort_cps compare_int l;;
in http://try.ocamlpro.com/, it gave this error: Exception: RangeError: Maximum call stack size exceeded.
in local ocaml top level, it didn't have any problem
Adding another answer to make a separate point: it seems that much of the confusion among answerers is caused by the fact that you don't use the standard OCaml compiler, but the TryOCaml website which runs a distinct OCaml backend, on top of javascript, and has therefore slightly different optimization and runtime characteristics.
I can reliably reproduce the fact that, on the TryOCaml website, the CPS-style function mergeSort_cps you show fails on lists of length 1_000_000 with the following error:
Exception: InternalError: too much recursion.
My analysis is that this is not due to a lack of tail-rec-ness, but by a lack of support, on the Javascript backend, of the non-obvious way in which the CPS-translated call is tailrec: recursion goes through a lambda-abstraction boundary (but still in tail position).
Turning the code in the direct, non-tail-rec version makes the problem go away:
let rec merge_sort compare = function
| [] -> []
| [hd] -> [hd]
| l ->
let (left, right) = split_list (List.length l / 2) l in
merge compare (merge_sort compare left) (merge_sort compare right);;
As I said in my other answer, this code has a logarithmic stack depth, so no StackOverflow will arise from its use (tail-rec is not everything). It is simpler code that the Javascript backend handles better.
Note that you can make it noticeably faster by using a better implementation split (still with your definition of merge) that avoids the double traversal of List.length then splitting:
let split li =
let rec split ls rs = function
| [] -> (ls, rs)
| x::xs -> split rs (x::ls) xs in
split [] [] li;;
let rec merge_sort compare = function
| [] -> []
| [hd] -> [hd]
| l ->
let (left, right) = split l in
merge compare (merge_sort compare left) (merge_sort compare right);;
Reading the comments, it seems that your Stack_overflow error is hard to reproduce.
Nevertheless, your code is not entirely in CPS or tail-recursive: in merge_sort, the calls to split_list and merge are made in a non-tail-call position.
The question is: by making your CPS transform and generous use of accumulators, what will be the worst stack depth related to recursion? Saving stack depth on the sort calls is in fact not very interesting: as each split the list in two, the worst stack depth would be O(log n) for n the size the input list.
On the contrary, split and merge would have made a linear O(n) usage of the stack if they weren't written in accumulator-passing style, so they are important to make tail-rec. As your implementation of those routines is tail-rec, there should be no need to worry about stack usage, and neither to convert the sort routine itself in CPS form that makes the code harder to read.
(Note that this logarithmic-decrease argument is specific to mergesort. A quicksort can have linear stack usage in worst case, so it could be important to make it tail-rec.)

F# replacing variables with actual values results in endless loop (recursive function)

I recently started with F# and implemented a very basic recursive function that represents the Sieve of Eratosthenes. I came up with the following, working code:
static member internal SieveOfEratosthenesRecursive sequence accumulator =
match sequence with
| [] -> accumulator
| head::tail -> let rest = tail |> List.filter(fun number -> number % head <> 0L)
let newAccumulator = head::accumulator
Prime.SieveOfEratosthenesRecursive rest newAccumulator
This function is not really memory efficient so I tried to eliminate the variables "rest" and "newAccumulator". I came up with the following code
static member internal SieveOfEratosthenesRecursive sequence accumulator =
match sequence with
| [] -> accumulator
| head::tail -> tail |> List.filter(fun number -> number % head <> 0L)
|> Prime.SieveOfEratosthenesRecursive (head::accumulator)
As far as I understand the tutorials I've read Prime.SieveOfEratosthenesRecursive will be called with the filtered tail as first parameter and a list consisting of head::accumulator as second one. However when I try to run the code with the reduced variable usage, the program gets trappen in an infinite loop. Why is this happening and what did I do wrong?
As far as I understand the tutorials I've read Prime.SieveOfEratosthenesRecursive will be called with the filtered tail as first parameter and a list consisting of head::accumulator as second one.
You have this backwards.
In the first version, you're passing rest then newAccumulator; in the second version, you're effectively passing newAccumulator then rest. I.e., you've transposed the arguments.
Prime.SieveOfEratosthenesRecursive (head::accumulator) is a partial function application wherein you're applying (head::accumulator) as the first argument (sequence). This partial function application yields a unary function (expecting accumulator), to which you are passing (via |>) what is called rest in the first version of your code.
Changing SieveOfEratosthenesRecursive's argument order is the easiest solution, but I would consider something like the following idiomatic as well:
static member internal SieveOfEratosthenesRecursive sequence accumulator =
match sequence with
| [] -> accumulator
| head::tail ->
tail
|> List.filter(fun number -> number % head <> 0L)
|> Prime.SieveOfEratosthenesRecursive <| (head::accumulator)
or
static member internal SieveOfEratosthenesRecursive sequence accumulator =
let inline flipzip a b = b, a
match sequence with
| [] -> accumulator
| head::tail ->
tail
|> List.filter(fun number -> number % head <> 0L)
|> flipzip (head::accumulator)
||> Prime.SieveOfEratosthenesRecursive
FWIW, eliminating rest and newAccumulator as named variables here is not going to impact your memory usage in the slightest.
The last call in your second function is equivalent to:
Prime.SieveOfEratosthenesRecursive newAccumulator rest
where you switch positions of two params. Since newAccumulator grows bigger after each recursive call, you will never reach the base case of empty list.
The rule of thumb is putting the most frequently changing parameter at last:
let rec sieve acc xs =
match xs with
| [] -> acc
| x::xs' -> xs' |> List.filter (fun y -> y % x <> 0L)
|> sieve (x::acc)
The above function could be shortened using function keyword:
let rec sieve acc = function
| [] -> acc
| x::xs' -> xs' |> List.filter (fun y -> y % x <> 0L)
|> sieve (x::acc)
Using pipe (|>) operator only makes the function more readable, it doesn't affect memory usage at all.

Recursive lambdas in F#

Take this example code (ignore it being horribly inefficient for the moment)
let listToString (lst:list<'a>) = ;;' prettify fix
let rec inner (lst:list<'a>) buffer = ;;' prettify fix
match List.length lst with
| 0 -> buffer
| _ -> inner (List.tl lst) (buffer + ((List.hd lst).ToString()))
inner lst ""
This is a common pattern I keep coming across in F#, I need to have an inner function who recurses itself over some value - and I only need this function once, is there in any way possible to call a lambda from within it self (some magic keyword or something) ? I would like the code to look something like this:
let listToString2 (lst:list<'a>) = ;;' prettify fix
( fun
(lst:list<'a>) buffer -> match List.length lst with ;;' prettify fix
| 0 -> buffer
| _ -> ##RECURSE## (List.tl lst) (buffer + ((List.hd lst).ToString()))
) lst ""
But as you might expect there is no way to refer to the anonymous function within itself, which is needed where I put ##RECURSE##
Yes, it's possible using so called y-combinators (or fixed-point combinators). Ex:
let rec fix f x = f (fix f) x
let fact f = function
| 0 -> 1
| x -> x * f (x-1)
let _ = (fix fact) 5 (* evaluates to "120" *)
I don't know articles for F# but this haskell entry might also be helpful.
But: I wouldn't use them if there is any alternative - They're quite hard to understand.
Your code (omit the type annotations here) is a standard construct and much more expressive.
let listToString lst =
let rec loop acc = function
| [] -> acc
| x::xs -> loop (acc ^ (string x)) xs
loop "" lst
Note that although you say you use the function only once, technically you refer to it by name twice, which is why it makes sense to give it a name.

How do I use the F# Reflection library?

I am trying to follow this example (from p137 of Rob Pickering's "Foundations of F#" book) but I can't get it to work with the latest F# CTP.
I appear to be missing the definition of 'Value' on the 3rd line where it does
Value.GetInfo(x)
This generates :
error FS0039: The namespace or module 'Value' is not defined.
Can anyone tell me where this is coming from or what the new syntax is if this is now done differently? (be gentle - this is my first play with F#)
Here's the example I am working from:-
#light
open Microsoft.FSharp.Reflection
let printTupleValues x =
match Value.GetInfo(x) with
| TupleValue vals ->
print_string "("
vals
|> List.iteri
(fun i v ->
if i <> List.length vals - 1 then
Printf.printf " %s, " (any_to_string v)
else
print_any v)
print_string " )"
| _ -> print_string "not a tuple"
printTupleValues ("hello world", 1)
The F# reflection library was rewritten for either Beta 1 or the CTP. Here is your code slightly changed to use the new library, and to avoid using the F# PlusPack (print_string is for OCaml compatibility).
open Microsoft.FSharp.Reflection
let printTupleValues x =
if FSharpType.IsTuple( x.GetType() ) then
let s =
FSharpValue.GetTupleFields( x )
|> Array.map (fun a -> a.ToString())
|> Array.reduce (fun a b -> sprintf "%s, %s" a b)
printfn "(%s)" s
else
printfn "not a tuple"
printTupleValues ("hello world", 1)
Or, if you prefer using match to decompose the tuple, then try this using an active pattern. Advantage is you can add support for additional types pretty easily.
open Microsoft.FSharp.Reflection
let (|ParseTuple|_|) = function
| o when FSharpType.IsTuple( o.GetType() ) ->
Some( FSharpValue.GetTupleFields(o) )
| _ -> None
let printTupleValues = function
| ParseTuple vals ->
let s =
vals
|> Array.map (fun a -> a.ToString())
|> Array.reduce (fun a b -> sprintf "%s, %s" a b)
printfn "(%s)" s
| _ ->
printf "not a tuple"
printTupleValues ("hello world", 1)
I don't know whether your function has been renamed or removed in the current F# versions.
You should take a look at FSharp.Reflection in your IDE's object explorer to check that and maybe read this page.

Resources