Can the F# compiler optimize these mutually recursive functions?


I wrote the following function that checks the validity of bracketed expressions:
let matched str =
    let rec matched' stack = function
        | "" -> isEmpty stack
        | str ->
            match first str with
            | '(' | '[' | '{' as p -> matched' (push p stack) (rest str)
            | ')' -> matchClosing '(' stack str
            | ']' -> matchClosing '[' stack str
            | '}' -> matchClosing '{' stack str
            | _ -> matched' stack (rest str)
    and matchClosing expected stack s =
        match peek stack with
        | Some c when c = expected -> matched' (pop stack) (rest s)
        | _ -> false
    matched' [] str
If we substitute the implementation of matchClosing into matched', we get a tail recursive function. Can the F# compiler recognize this and optimize away the recursive calls?

AFAICT your example isn't complete, which makes it harder to check. I filled in the missing pieces and was able to compile it.
Using ILSpy, one sees that the mutual recursion is still in place:
// F#: | ')' -> matchClosing '(' stack str
case ')':
return Program.matchClosing#39('(', stack, str);
// F#: | matched' t (tail s)
return Program.matched'#28(t, s.Tail);
So while it should be technically possible to unpack two mutually tail-recursive functions into a loop, it's not done.
When checking the IL code, we see that the calls are tagged with the tail. prefix:
// F#: | matchClosing '(' stack str
IL_0083: tail. // Here
IL_0085: call bool Program::matchClosing#39(char, class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<char>, valuetype Program/SubString)
// F#: | matched' t (tail s)
IL_002a: tail. // Here
IL_002c: call bool Program::'matched\'#28'(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<char>, valuetype Program/SubString)
In release mode, the .NET JIT compiler is kind enough to honor the tail. prefix:
// As you can see when debugging the code in WinDbg
02410bdf e8fbd3176b call clr!JIT_TailCall (6d58dfdf)
We also see when we debug in WinDbg that the stack doesn't grow. Unfortunately, clr!JIT_TailCall does a fair amount of work, so while it doesn't consume stack it consumes clock cycles instead, as noted here: How to eliminate time spent in JIT_TailCall for functions that are genuinely non-recursive
However, in Debug mode (and at least in older versions of Mono) the tail. prefix is ignored:
// As you can see when debugging the code in WinDbg (this is a normal call)
02f619c1 e8c2f4ffff call 02f60e88
We also see when we debug in WinDbg that the stack grows.
So the answer to your question should be:
No, the F# compiler isn't able to transform the mutually tail-recursive calls into a loop.
However, the F# compiler tags the calls with the tail. prefix.
The Release mode JIT compiler kindly honors the tail. prefix and generates tail calls that don't grow the stack (but are inefficient).
In Debug mode (and possibly on Mono) the tail. prefix is ignored, no tail calls are generated by the JIT compiler, and the stack will grow.
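For illustration, here is a minimal sketch of the substitution the question proposes: with matchClosing inlined by hand, the only recursive calls are direct self-calls in tail position. The question does not show its helpers (first, rest, push, pop, peek), so this version assumes a char list for the input and a plain list as the stack; the name loop is made up for the example.

```fsharp
// Sketch only: checks bracket balance with a single self-recursive function.
// Assumption: input is converted to a char list and the stack is a plain list.
let matched (str: string) =
    let rec loop stack chars =
        match chars with
        | [] -> List.isEmpty stack
        | c :: rest ->
            match c with
            | '(' | '[' | '{' -> loop (c :: stack) rest          // push
            | ')' | ']' | '}' ->
                // Body of matchClosing, inlined by hand.
                let expected = match c with ')' -> '(' | ']' -> '[' | _ -> '{'
                match stack with
                | top :: below when top = expected -> loop below rest   // pop
                | _ -> false
            | _ -> loop stack rest                               // ignore other characters
    loop [] (List.ofSeq str)
```

Because every recursive call is a direct call to loop in tail position, this form does not depend on the runtime honoring tail calls between two different functions.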

Related

Stack implementation in recursive function

I'm trying to implement a recursive backtracking function using depth-first search, and I'm stuck at a point where I need to know my previous position in a matrix.
The idea is this: I have a matrix as a 2D array and this is my function:
Mark the current point. If the point is what I'm looking for, I set that point in the matrix as part of the solution, along with all the previously marked points.
Otherwise I call the function on a valid adjacent point.
The problem is the third case: if there are no valid adjacent points, then I need to mark the point as wrong and call the function on my previous location. To do that I think I need a stack that keeps track of my previous movements, but I'm having a hard time figuring out how to do so in F#.
let rec solve (x,y) =
    mark (x,y)
    if (x,y) = pointimlookingfor then
        for x in 0.. array width-1 do
            for y in 0..array height-1 do
                if Myarray.[x,y]=markedpoint then
                    Myarray.[x,y]<-partofsolution
    else if List.isEmpty adjacentslist then
        Myarray.[x,y]<-wrong point
        solve (the previous visited point)
    else
        for (adjacentpoint) in adjacentslist do
            solve(adjacentpoint)
Any ideas?

In most functional languages, the default list type is an immutable linked-list, which you can use as a simple stack, because of its construction.
cons is push into stack, and head is pop from stack.
With that, we can write a simple stack module.
module Stack =
    let empty = []
    let push item stack = item::stack
    let pop = function
        | [] -> failwith "No items in stack"
        | _::xs -> xs
    let peek stack = stack |> List.tryHead
So,
Stack.empty |> Stack.push 1 |> Stack.push 2 |> Stack.pop |> Stack.pop = Stack.empty //true
In actual practice, instead of explicitly using functions like the above, the easiest way is to use pattern matching on some accumulator which you carry with you as you recurse/fold.
As an example, let's re-create a classic use case for a stack: balancing parentheses.
Every time you encounter an opening brace, you push it onto the stack. When you encounter a closing brace, you pop from the stack and check whether it matches the last one you pushed. If it doesn't, the input is unbalanced.
let isBalanced stack = function
    | '(' | '{' | '[' as opened -> opened::stack //push into stack
    | ')' | '}' | ']' as closed ->
        match stack with
        | opened::rest -> //pop from stack
            match opened, closed with
            | '(', ')'
            | '{', '}'
            | '[', ']' -> rest
            | _ -> failwith "Mismatched braces"
        | [] -> failwith "Closing before open"
    | _ -> stack
"abc() { [ 1; 2; 3] }" |> Seq.fold (isBalanced) []
There are more concise ways to write this, but this illustrates how you can simulate a classical stack with immutable structures.
In your case, you could push an (x,y) tuple on to the stack, and let the algorithm backtrack by destructuring it: (x,y)::tail.
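A rough sketch of that idea applied to the grid search from the question follows. The Cell type, the 4-connected neighbour rule, and the names solve, loop and openNeighbours are all assumptions made for the example, since the question only gives pseudocode.

```fsharp
// Hypothetical cell states; the question does not define its own.
type Cell = Free | Wall | Visited | Dead | OnPath

let solve (grid: Cell[,]) (start: int * int) (goal: int * int) =
    let inBounds (x, y) =
        x >= 0 && y >= 0 && x < Array2D.length1 grid && y < Array2D.length2 grid
    // Neighbours that have not been visited yet (assumes 4-connected movement).
    let openNeighbours (x, y) =
        [ (x + 1, y); (x - 1, y); (x, y + 1); (x, y - 1) ]
        |> List.filter (fun (nx, ny) -> inBounds (nx, ny) && grid.[nx, ny] = Free)
    // path is the stack: its head is the current point, its tail is the way back.
    let rec loop path =
        match path with
        | [] -> false                          // backtracked past the start: no solution
        | (x, y) :: previous ->
            if (x, y) = goal then
                for (px, py) in path do grid.[px, py] <- OnPath
                true
            else
                match openNeighbours (x, y) with
                | (nx, ny) :: _ ->
                    grid.[nx, ny] <- Visited   // mark, then push
                    loop ((nx, ny) :: path)
                | [] ->
                    grid.[x, y] <- Dead        // dead end: pop and resume from the previous point
                    loop previous
    grid.[fst start, snd start] <- Visited
    loop [ start ]
```

The stack is passed as an argument instead of being stored in a mutable variable, so "going back" is simply a recursive call on the tail of the list.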

Understanding side effects with monadic traversal

I am trying to properly understand how side effects work when traversing a list in F# using monadic style, following Scott's guide here
I have an AsyncSeq of items, and a side-effecting function that can return a Result<'a,'b> (it is saving the items to disk).
I get the general idea - split the head and tail, apply the func to the head. If it returns Ok then recurse through the tail, doing the same thing. If an Error is returned at any point then short circuit and return it.
I also get why Scott's ultimate solution uses foldBack rather than fold - it keeps the output list in the same order as the input as each processed item is prepended to the previous.
I can also follow the logic:
The result from the list's last item (processed first as we are using foldBack) will be passed as the accumulator to the next item.
If it is an Error and the next item is Ok, the next item is discarded.
If the next item is an Error, it replaces any previous results and becomes the accumulator.
That means by the time you have recursed over the entire list from right to left and ended up at the start, you either have an Ok of all of the results in the correct order or the most recent Error (which would have been the first to occur if we had gone left to right).
The thing that confuses me is that surely, since we are starting at the end of the list, all side effects of processing every item will take place, even if we only get back the last Error that was created?
This seems to be confirmed here as the print output starts with [5], then [4,5], then [3,4,5] etc.
The thing that confuses me is that this isn't what I see happening when I use AsyncSeq.traverseChoiceAsync from the FSharpx lib (which I wrapped to process Result instead of Choice). I see side effects happening from left to right, stopping on the first error, which is what I want to happen.
It also looks like Scott's non-tail-recursive version (which doesn't use foldBack and just recurses over the list) goes from left to right? The same goes for the AsyncSeq version. That would explain why I see it short-circuit on the first error, but surely if it completes Ok then the output items would be reversed, which is why we normally use foldBack?
I feel I am misunderstanding or misreading something obvious! Could someone please explain it to me? :)
Edit:
rmunn has given a really great comprehensive explanation of the AsyncSeq traversal below. The TLDR was that
Scott's initial implementation and the AsyncSeq traverse both do go from left to right as I thought and so only process until they hit an error
they keep their contents in order by prepending the head to the processed tail rather than prepending each processed result to the previous (which is what the built in F# fold does).
foldBack would keep things in order but would indeed execute every case (which could take forever with an async seq)

It's pretty simple: traverseChoiceAsync isn't using foldBack. Yes, with foldBack the last item would be processed first, so that by the time you get to the first item and discover that its result is Error you'd have triggered the side effects of every item. Which is, I think, precisely why whoever wrote traverseChoiceAsync in FSharpx chose not to use foldBack, because they wanted to ensure that side effects would be triggered in order, and stop at the first Error (or, in the case of the Choice version of the function, the first Choice2Of2 — but I'll pretend from this point on that that function was written to use the Result type.)
Let's look at the traverseChoiceAsync function in the code you linked to, and read through it step by step. I'll also rewrite it to use Result instead of Choice, because the two types are basically identical in function but with different names in the DU, and it'll be a little easier to tell what's going on if the DU cases are called Ok and Error instead of Choice1Of2 and Choice2Of2. Here's the original code:
let rec traverseChoiceAsync (f:'a -> Async<Choice<'b, 'e>>) (s:AsyncSeq<'a>) : Async<Choice<AsyncSeq<'b>, 'e>> = async {
    let! s = s
    match s with
    | Nil -> return Choice1Of2 (Nil |> async.Return)
    | Cons(a,tl) ->
        let! b = f a
        match b with
        | Choice1Of2 b ->
            return! traverseChoiceAsync f tl |> Async.map (Choice.mapl (fun tl -> Cons(b, tl) |> async.Return))
        | Choice2Of2 e ->
            return Choice2Of2 e }
And here's the original code rewritten to use Result. Note that it's a simple rename, and none of the logic needs to be changed:
let rec traverseResultAsync (f:'a -> Async<Result<'b, 'e>>) (s:AsyncSeq<'a>) : Async<Result<AsyncSeq<'b>, 'e>> = async {
    let! s = s
    match s with
    | Nil -> return Ok (Nil |> async.Return)
    | Cons(a,tl) ->
        let! b = f a
        match b with
        | Ok b ->
            return! traverseResultAsync f tl |> Async.map (Result.map (fun tl -> Cons(b, tl) |> async.Return))
        | Error e ->
            return Error e }
Now let's step through it. The whole function is wrapped inside an async { } block, so let! inside this function means "unwrap" in an async context (essentially, "await").
let! s = s
This takes the s parameter (of type AsyncSeq<'a>) and unwraps it, binding the result to a local name s that henceforth will shadow the original parameter. When you await the result of an AsyncSeq, what you get is the first element only, while the rest is still wrapped in an async that needs to be further awaited. You can see this by looking at the result of the match expression, or by looking at the definition of the AsyncSeq type:
type AsyncSeq<'T> = Async<AsyncSeqInner<'T>>
and AsyncSeqInner<'T> =
    | Nil
    | Cons of 'T * AsyncSeq<'T>
So when you do let! x = s when s is of type AsyncSeq<'T>, the value of x will either be Nil (when the sequence has run to its end) or it will be Cons(head, tail) where head is of type 'T and tail is of type AsyncSeq<'T>.
So after this let! s = s line, our local name s now refers to an AsyncSeqInner type, which contains the head item of the sequence (or Nil if the sequence was empty), and the rest of the sequence is still wrapped in an AsyncSeq so it has yet to be evaluated (and, crucially, its side effects have not yet happened).
match s with
| Nil -> return Ok (Nil |> async.Return)
There's a lot happening in this line, so it'll take a bit of unpacking, but the gist is that if the input sequence s had Nil as its head, i.e. had reached its end, then that's not an error, and we return an empty sequence.
Now to unpack. The outer return is inside the async { } block, so it takes the Result (whose value is Ok something) and turns it into an Async<Result<something>>. Remembering that the return type of the function is declared as Async<Result<AsyncSeq>>, the inner something is clearly an AsyncSeq type. So what's going on with that Nil |> async.Return? Well, async isn't an F# keyword, it's the name of an instance of AsyncBuilder. Inside a computation expression foo { ... }, return x is translated into foo.Return(x). So calling async.Return x is just the same as writing async { return x }, except that it avoids nesting a computation expression inside another computation expression, which the compiler accepts but which would be a little nasty to parse mentally. So Nil |> async.Return is async.Return Nil, which means it produces a value of Async<x> where x is the type of the value Nil. And as we just saw, this Nil is a value of type AsyncSeqInner, so Nil |> async.Return produces an Async<AsyncSeqInner>. And another name for Async<AsyncSeqInner> is AsyncSeq. So this whole expression produces an Async<Result<AsyncSeq>> that has the meaning of "We're done here, there are no more items in the sequence, and there was no error".
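As a small illustration of that equivalence (the value names here are made up for the example), both forms below produce an Async<int> that yields the same value when run:

```fsharp
// Calling the builder method directly...
let viaBuilderMethod : Async<int> = async.Return 42
// ...and using the computation expression, which is translated into the same call.
let viaBlock : Async<int> = async { return 42 }

let a = Async.RunSynchronously viaBuilderMethod
let b = Async.RunSynchronously viaBlock
```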
Phew. Now for the next line:
| Cons(a,tl) ->
Simple: if the next item in the AsyncSeq named s was a Cons, we deconstruct it so that the actual item is now called a, and the tail (another AsyncSeq) is called tl.
let! b = f a
This calls f on the value we just got out of s, and then unwraps the Async part of f's return value, so that b is now a Result<'b, 'e>.
match b with
| Ok b ->
More shadowed names. Inside this branch of the match, b now names a value of type 'b rather than a Result<'b, 'e>.
return! traverseResultAsync f tl |> Async.map (Result.map (fun tl -> Cons(b, tl) |> async.Return))
Hoo boy. That's too much to tackle at once. Let's write this as if the |> operators were lined up on separate lines, and then we'll go through each step one at a time. (Note that I've wrapped an extra pair of parentheses around this, just to clarify that it's the final result of this whole expression that will be passed to the return! keyword).
return! (
    traverseResultAsync f tl
    |> Async.map (
        Result.map (
            fun tl -> Cons(b, tl) |> async.Return)))
I'm going to tackle this expression from the inside out. The inner line is:
fun tl -> Cons(b, tl) |> async.Return
The async.Return thing we've already seen. This is a function that takes a tail (we don't currently know, or care, what's inside that tail, except that by the necessity of the type signature of Cons it must be an AsyncSeq) and turns it into an AsyncSeq that is b followed by the tail. I.e., this is like b :: tl in a list: it sticks b onto the front of the AsyncSeq.
One step out from that innermost expression is:
Result.map
Remember that the function map can be thought of in two ways: one is "take a function and run it against whatever is 'inside' this wrapper". The other is "take a function that operates on 'T and make it into a function that operates on Wrapper<'T>". (If you don't have both of those clear in your mind yet, https://sidburn.github.io/blog/2016/03/27/understanding-map is a pretty good article to help grok that concept). So what this is doing is taking a function of type AsyncSeq -> AsyncSeq and turning it into a function of type Result<AsyncSeq> -> Result<AsyncSeq>. Alternately, you could think of it as taking a Result<tail> and calling fun tail -> ... against that tail result, then re-wrapping the result of that function in a new Result. Important: Because this is using Result.map (Choice.mapl in the original) we know that if tail is an Error value (or if the Choice was a Choice2Of2 in the original), the function will not be called. So if traverseResultAsync produces a result that starts with an Error value, it's going to produce an Async<Result<foo>> where the value of Result<foo> is an Error, and so the value of the tail will be discarded. Keep that in mind for later.
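Both views of map, and the fact that the function is skipped on Error, can be seen in a small sketch. The function addOne and the call counter are hypothetical and exist only to make the skipped call observable.

```fsharp
// Counts how many times addOne is actually invoked.
let calls = ref 0
let addOne x =
    calls.Value <- calls.Value + 1
    x + 1

// View 1: run the function against the value inside the wrapper.
let okResult : Result<int, string> = Result.map addOne (Ok 1)
// On an Error the function is never invoked and the Error passes through unchanged.
let errResult : Result<int, string> = Result.map addOne (Error "boom")

// View 2: turn an int -> int function into a Result<int, string> -> Result<int, string> function.
let lifted : Result<int, string> -> Result<int, string> = Result.map addOne
```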
Okay, next step out.
Async.map
Here, we have a Result<AsyncSeq> -> Result<AsyncSeq> function produced by the inner expression, and this converts it to an Async<Result<AsyncSeq>> -> Async<Result<AsyncSeq>> function. We've just talked about this, so we don't need to go over how map works again. Just remember that the effect of this Async<Result<AsyncSeq>> -> Async<Result<AsyncSeq>> function that we've built up will be the following:
Await the outer async.
If the result is Error, return that Error.
If the result is Ok tail, produce an Ok (Cons (b, tail)).
Next line:
traverseResultAsync f tl
I probably should have started with this, because this will actually run first, and then its value will be passed into the Async<Result<AsyncSeq>> -> Async<Result<AsyncSeq>> function that we've just analysed.
So what this whole thing will do is to say "Okay, we took the first part of the AsyncSeq we were handed, and passed it to f, and f produced an Ok result with a value we're calling b. So now we need to process the rest of the sequence similarly, and then, if the rest of the sequence produces an Ok result, we'll stick b on the front of it and return an Ok sequence with contents b :: tail. BUT if the rest of the sequence produces an Error, we'll throw away the value of b and just return that Error unchanged."
return!
This just takes the result we just got (either an Error or an Ok (b :: tail), already wrapped in an Async) and returns it unchanged. But note that the call to traverseResultAsync is NOT tail-recursive, because its value had to be passed into the Async.map (...) expression first.
And now we still have one more bit of traverseResultAsync to look at. Remember when I said "Keep that in mind for later"? Well, that time has arrived.
| Error e ->
return Error e }
Here we're back in the match b with expression. If b was an Error result, then no further recursive calls are made, and the whole traverseResultAsync returns an Async<Result> where the Result value is Error. And if we were currently nested deep inside a recursion (i.e., we're in the return! traverseResultAsync ... expression), then our return value will be Error, which means the result of the "outer" call, as we've kept in mind, will also be Error, discarding any other Ok results that might have happened "before".
Conclusion
And so the effect of all of that is:
Step through the AsyncSeq, calling f on each item in turn.
The first time f returns Error, stop stepping through, throw away any previous Ok results, and return that Error as the result of the whole thing.
If f never returns Error and instead returns Ok b every time, return an Ok result that contains an AsyncSeq of all those b values, in their original order.
Why are they in their original order? Because the logic in the Ok case is:
If sequence was empty, return an empty sequence.
Split into head and tail.
Get value b from f head.
Process the tail.
Stick value b in front of the result of processing the tail.
So if we started with (conceptually) [a1; a2; a3], which actually looks like Cons (a1, Cons (a2, Cons (a3, Nil))) we'll end up with Cons (b1, Cons (b2, Cons (b3, Nil))) which translates to the conceptual sequence [b1; b2; b3].
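The difference in side-effect order is easier to see on plain lists. The sketch below is not the FSharpx code; traverseResult mirrors the head-then-tail recursion described above, traverseResultFoldBack is a foldBack-based variant, and save is a hypothetical side-effecting function that records each call and fails on 3.

```fsharp
// Left to right: f runs on the head first, and recursion stops at the first Error.
// (Like the AsyncSeq version, this is not tail recursive.)
let rec traverseResult (f: 'a -> Result<'b, 'e>) (items: 'a list) : Result<'b list, 'e> =
    match items with
    | [] -> Ok []
    | head :: tail ->
        match f head with
        | Error e -> Error e
        | Ok b -> traverseResult f tail |> Result.map (fun rest -> b :: rest)

// foldBack: f runs on every item, starting from the last one.
let traverseResultFoldBack (f: 'a -> Result<'b, 'e>) (items: 'a list) : Result<'b list, 'e> =
    let folder item acc =
        match f item, acc with
        | Ok b, Ok rest -> Ok (b :: rest)
        | Error e, _ -> Error e      // a newer Error replaces whatever came before
        | _, Error e -> Error e      // an Ok item is discarded once an Error exists
    List.foldBack folder items (Ok [])

// Hypothetical side effect: remember which items were processed, fail on 3.
let visited = ResizeArray<int>()
let save x =
    visited.Add x
    if x = 3 then Error (sprintf "failed on %d" x) else Ok (x * 10)
```

Running traverseResult save [1; 2; 3; 4; 5] touches only 1, 2 and 3, while the foldBack variant touches all five items in reverse order before returning the same Error.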

See @rmunn's great answer above for the explanation. I just wanted to post a little helper for anyone who reads this in the future. It allows you to use the AsyncSeq traverse with Result instead of the old Choice type it was written with:
let traverseResultAsyncM (mapping : 'a -> Async<Result<'b,'c>>) source =
    let mapping' =
        mapping
        >> Async.map (function
            | Ok x -> Choice1Of2 x
            | Error e -> Choice2Of2 e)
    AsyncSeq.traverseChoiceAsync mapping' source
    |> Async.map (function
        | Choice1Of2 x -> Ok x
        | Choice2Of2 e -> Error e)
Also here is a version for non-async mappings:
let traverseResultM (mapping : 'a -> Result<'b,'c>) source =
    let mapping' x = async {
        return
            mapping x
            |> function
            | Ok x -> Choice1Of2 x
            | Error e -> Choice2Of2 e
        }
    AsyncSeq.traverseChoiceAsync mapping' source
    |> Async.map (function
        | Choice1Of2 x -> Ok x
        | Choice2Of2 e -> Error e)

Why is this not tail recursive

I've got a (co?)recursive pair of functions that process a list of tuples, and fold them into batches based on some start and end criteria.
I don't do F# that much, so I may be being stupid.
I've already amended a simple non-tail-recursive version into this by explicitly introducing a "tot" parameter that holds the current folded state. I believed the result to be tail recursive, yet I get the dreaded stack overflow on large inputs (in both debugger and (debug) .exe).
There probably is a better way of doing this as an explicit fold, but that's almost not the point. The point is: why is it seemingly not tail recursive?
let rec ignoreUntil2 (xs : List<(string * string)>) tot = //: List<(string * string)> -> List<List<(string * string)>> -> List<List<(string * string)>> =
    match xs with
    | [] -> tot
    | ((s1,s2)::tail) ->
        if s2.StartsWith("Start importing record: Product") then
            takeUntil2 [] ((s1,s2)::tail) tot
        else
            ignoreUntil2 tail tot
and takeUntil2 acc xs tot = // : List<(string * string)> -> List<(string * string)> -> List<List<(string * string)>> -> List<List<(string * string)>> =
    match xs with
    | [] -> acc :: tot
    | ((s1,s2)::tail) ->
        let newAcc = ((s1,s2)::acc)
        if s2.StartsWith("Finished importing record: Product") then
            ignoreUntil2 tail (newAcc :: tot)
        else
            takeUntil2 newAcc tail tot

Your code is tail recursive.
(in both debugger and (debug) .exe)
By default the F# compiler does not emit tail calls in debug mode. You'll need to either enable the --tailcalls+ option explicitly or compile in release mode.
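The question mentions that an explicit fold might work as well. A sketch of that alternative is below; it is meant to produce the same batches (in the same reversed order) as the two functions in the question, and the State type and the names batches and step are invented for the example. Since List.fold is an ordinary loop, it does not rely on tail calls between two functions.

```fsharp
// Which of the two original functions we would currently be "in".
type State =
    | Ignoring                              // corresponds to ignoreUntil2
    | Taking of (string * string) list      // corresponds to takeUntil2 with its acc

let batches (xs: (string * string) list) =
    let step (state, tot) ((s1, s2): string * string) =
        match state with
        | Ignoring ->
            if s2.StartsWith("Start importing record: Product") then (Taking [ (s1, s2) ], tot)
            else (Ignoring, tot)
        | Taking acc ->
            let newAcc = (s1, s2) :: acc
            if s2.StartsWith("Finished importing record: Product") then (Ignoring, newAcc :: tot)
            else (Taking newAcc, tot)
    match List.fold step (Ignoring, []) xs with
    | Ignoring, tot -> tot
    | Taking acc, tot -> acc :: tot         // input ended in the middle of a batch
```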

How can I make a retry function tail recursive?

I have a discriminated union that is similar to the Result type used in Scott's Railway Oriented Programming. For simplicity's sake, it's slightly simplified here:
type ErrorMessage = ErrorMessage of string
type ValidationResult<'a> =
    | Success of 'a
    | Error of ErrorMessage
I have a corresponding module ValidationResult that contains functions that act on these ValidationResults, one of them is a recursive retryable function that allows the parameter, f: unit -> 'a, to be called again (such as reading from stdin) if the ValidationResult is Error:
module ValidationResult
let doubleMap success error = function
    | Success x -> success x
    | Error e -> error e
let rec retryable errorHandler f =
    let result = f ()
    let retry e =
        errorHandler e
        retryable errorHandler f
    doubleMap id retry result
But it isn't tail recursive and I would like to convert it to be so. How can I do that?

The F# compiler compiles tail-recursive functions in two different ways.
If the function is simple (calls itself directly), then it is compiled into a loop.
If the tail recursion involves multiple different functions (or even function values), then the compiler uses the tail. IL prefix to do a tail call. This is also a tail call, but it is handled by the .NET runtime rather than eliminated by the F# compiler.
In your case, the retryable function is already tail-recursive, but it is the second kind. Daniel's answer makes it simple enough so that it becomes the first kind.
However, you can keep the function as you have it and it will be tail-recursive. The only thing to note is that the compiler does not generate the tail. prefix by default in Debug mode (as it messes up the call stack), so you need to enable it explicitly (in project options, check "Generate tail calls").

Just removing the call to doubleMap should do it:
let rec retryable errorHandler f =
    match f() with
    | Success x -> x
    | Error e ->
        errorHandler e
        retryable errorHandler f
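A quick way to try this version out is sketched below. The types and retryable are repeated so the snippet stands alone; flaky, attempts and errorsSeen are hypothetical stand-ins for something like reading from stdin, set up to fail many times before succeeding.

```fsharp
type ErrorMessage = ErrorMessage of string

type ValidationResult<'a> =
    | Success of 'a
    | Error of ErrorMessage

let rec retryable errorHandler f =
    match f () with
    | Success x -> x
    | Error e ->
        errorHandler e
        retryable errorHandler f   // direct self-call in tail position

// Hypothetical input source: fails 100000 times, then succeeds.
let attempts = ref 0
let flaky () =
    attempts.Value <- attempts.Value + 1
    if attempts.Value <= 100000 then Error (ErrorMessage "not yet")
    else Success attempts.Value

// The error handler here only counts how often it was called.
let errorsSeen = ref 0
let result = retryable (fun _ -> errorsSeen.Value <- errorsSeen.Value + 1) flaky
```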

How to return the index of a for loop in OCaml?

let find_free_next heap start =
  for i = start to ((Array.length heap)-1) do
    match heap.(i) with
      Hdr (Free (h), g) ->
        i
  done
How can I return the index of a loop as an integer once the match has been found?

If you want to stick to the imperative style, you can use an exception to exit the loop:
exception Found of int
let find_free_next heap start =
  try
    for i = start to Array.length heap - 1 do
      match heap.(i) with
      | Hdr (Free (h), g) -> raise (Found i)
      | _ -> () (* If it is not what you are seeking *)
    done;
    raise Not_found
  with
  | Found n -> n
But generally, as people have already written, a functional style is preferred in OCaml:
let find_free_next heap start =
  let len = Array.length heap in
  let rec find i =
    if i >= len then None
    else
      match heap.(i) with
      | Hdr (Free h, g) -> Some i
      | _ -> find (i+1)
  in
  find start
In this example, there is not much difference between the two versions, but use of exceptions for exiting loops/recursions must be used with caution; you can introduce control flow bugs pretty easily with them, and they are sometimes hard to debug.
BTW, you can use Array.unsafe_get heap i to speed up your array access, since you can be sure that i is always in the valid range of the array in the above examples. (Oh, we need a start >= 0 check in addition, though.)

Asumu Takikawa is right, the for loop in OCaml doesn't return a result. In idiomatic OCaml, you should use recursion instead. Ideally there would be a standard function like List.find that works for arrays. There is a function BatArray.findi in OCaml Batteries Included that does what you seem to want.

Simpler, and more efficient (no allocation at all):
let rec find_free_next heap start =
  if start = Array.length heap then raise Not_found;
  match heap.(start) with
  | Hdr (Free h, g) -> start
  | _ -> find_free_next heap (start+1)
Or, in imperative style:
let exit = Exit
let find_free_next heap start =
  let pos = ref (-1) in
  try
    for i = start to Array.length heap - 1 do
      match heap.(i) with
      | Hdr (Free h, g) -> pos := i; raise exit
      | _ -> ()
    done;
    raise Not_found
  with Exit -> !pos
(Notice that raise exit does not allocate only because the exception is precomputed.)

Loops in OCaml are supposed to be imperative, so a loop shouldn't return a result (aside from unit). If you try to return a non-unit result, the compiler will give a warning.
The reason that OCaml doesn't let you return a result from a loop is that this isn't a very functional idiom. If you use a recursive function instead of a loop, it's easy to exit early and return a result (by returning the result instead of recurring). If you want to write idiomatic OCaml, you probably want to use recursion in this case.
