How to combine Lwt filters? - asynchronous

I am currently learning Lwt. I am interested into using asynchronous processes to replace some shell routines by OCaml routines.
Let us take a look at a simplified first attempt, where a filter is created by combining two threads running cat:
let filter_cat ()=
Lwt_process.pmap_lines ("cat", [| "cat" |])
let filter_t () =
Lwt_io.stdin
|> Lwt_io.read_lines
|> filter_cat ()
|> filter_cat ()
|> Lwt_io.write_lines Lwt_io.stdout
let () =
filter_t ()
|> Lwt_main.run
This filter somehow works but hangs up when its standard input closes instead of exiting. If I remove one of the filter_cat, it works as expected.
I am guessing that I do not compose these filters appropriately and therefore cannot join the two threads I am starting. What is the correct way to compose these filters, so that the program terminates after it reads EOF on stdin?
You can find this program together with a BSD Owl Makefile in a Github gist.

The answer to this, is that there is a little bug in Lwt. There is an internal function, monitor that which performs the piping:
(* Monitor the thread [sender] in the stream [st] so write errors are
reported. *)
let monitor sender st =
let sender = sender >|= fun () -> None in
let state = ref Init in
Lwt_stream.from
(fun () ->
match !state with
| Init ->
let getter = Lwt.apply Lwt_stream.get st in
let result _ =
match Lwt.state sender with
| Lwt.Sleep ->
(* The sender is still sleeping, behave as the
getter. *)
getter
| Lwt.Return _ ->
(* The sender terminated successfully, we are
done monitoring it. *)
state := Done;
getter
| Lwt.Fail _ ->
(* The sender failed, behave as the sender for
this element and save current getter. *)
state := Save getter;
sender
in
Lwt.try_bind (fun () -> Lwt.choose [sender; getter]) result result
| Save t ->
state := Done;
t
| Done ->
Lwt_stream.get st)
The problem is in the definition
let getter = Lwt.apply Lwt_stream.get st
When the getter process meets the end of the stream, then it is saved, but the sender is lost, which seems to prevent completion. This can be fixed by improving the definition of getter by telling it to behave as the sender when the end of the stream has been reached.

Related

Railway oriented programming with Async operations

Previously asked similar question but somehow I'm not finding my way out, attempting again with another example.
The code as a starting point (a bit trimmed) is available at https://ideone.com/zkQcIU.
(it has some issue recognizing Microsoft.FSharp.Core.Result type, not sure why)
Essentially all operations have to be pipelined with the previous function feeding the result to the next one. The operations have to be async and they should return error to the caller in case an exception occurred.
The requirement is to give the caller either result or fault. All functions return a Tuple populated with either Success type Article or Failure with type Error object having descriptive code and message returned from the server.
Will appreciate a working example around my code both for the callee and the caller in an answer.
Callee Code
type Article = {
name: string
}
type Error = {
code: string
message: string
}
let create (article: Article) : Result<Article, Error> =
let request = WebRequest.Create("http://example.com") :?> HttpWebRequest
request.Method <- "GET"
try
use response = request.GetResponse() :?> HttpWebResponse
use reader = new StreamReader(response.GetResponseStream())
use memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(reader.ReadToEnd()))
Ok ((new DataContractJsonSerializer(typeof<Article>)).ReadObject(memoryStream) :?> Article)
with
| :? WebException as e ->
use reader = new StreamReader(e.Response.GetResponseStream())
use memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(reader.ReadToEnd()))
Error ((new DataContractJsonSerializer(typeof<Error>)).ReadObject(memoryStream) :?> Error)
Rest of the chained methods - Same signature and similar bodies. You can actually reuse the body of create for update, upload, and publish to be able to test and compile code.
let update (article: Article) : Result<Article, Error>
// body (same as create, method <- PUT)
let upload (article: Article) : Result<Article, Error>
// body (same as create, method <- PUT)
let publish (article: Article) : Result<Article, Error>
// body (same as create, method < POST)
Caller Code
let chain = create >> Result.bind update >> Result.bind upload >> Result.bind publish
match chain(schemaObject) with
| Ok article -> Debug.WriteLine(article.name)
| Error error -> Debug.WriteLine(error.code + ":" + error.message)
Edit
Based on the answer and matching it with Scott's implementation (https://i.stack.imgur.com/bIxpD.png), to help in comparison and in better understanding.
let bind2 (switchFunction : 'a -> Async<Result<'b, 'c>>) =
fun (asyncTwoTrackInput : Async<Result<'a, 'c>>) -> async {
let! twoTrackInput = asyncTwoTrackInput
match twoTrackInput with
| Ok s -> return! switchFunction s
| Error err -> return Error err
}
Edit 2 Based on F# implementation of bind
let bind3 (binder : 'a -> Async<Result<'b, 'c>>) (asyncResult : Async<Result<'a, 'c>>) = async {
let! result = asyncResult
match result with
| Error e -> return Error e
| Ok x -> return! binder x
}
Take a look at the Suave source code, and specifically the WebPart.bind function. In Suave, a WebPart is a function that takes a context (a "context" is the current request and the response so far) and returns a result of type Async<context option>. The semantics of chaining these together are that if the async returns None, the next step is skipped; if it returns Some value, the next step is called with value as the input. This is pretty much the same semantics as the Result type, so you could almost copy the Suave code and adjust it for Result instead of Option. E.g., something like this:
module AsyncResult
let bind (f : 'a -> Async<Result<'b, 'c>>) (a : Async<Result<'a, 'c>>) : Async<Result<'b, 'c>> = async {
let! r = a
match r with
| Ok value ->
let next : Async<Result<'b, 'c>> = f value
return! next
| Error err -> return (Error err)
}
let compose (f : 'a -> Async<Result<'b, 'e>>) (g : 'b -> Async<Result<'c, 'e>>) : 'a -> Async<Result<'c, 'e>> =
fun x -> bind g (f x)
let (>>=) a f = bind f a
let (>=>) f g = compose f g
Now you can write your chain as follows:
let chain = create >=> update >=> upload >=> publish
let result = chain(schemaObject) |> Async.RunSynchronously
match result with
| Ok article -> Debug.WriteLine(article.name)
| Error error -> Debug.WriteLine(error.code + ":" + error.message)
Caution: I haven't been able to verify this code by running it in F# Interactive, since I don't have any examples of your create/update/etc. functions. It should work, in principle — the types all fit together like Lego building blocks, which is how you can tell that F# code is probably correct — but if I've made a typo that the compiler would have caught, I don't yet know about it. Let me know if that works for you.
Update: In a comment, you asked whether you need to have both the >>= and >=> operators defined, and mentioned that you didn't see them used in the chain code. I defined both because they serve different purposes, just like the |> and >> operators serve different purposes. >>= is like |>: it passes a value into a function. While >=> is like >>: it takes two functions and combines them. If you would write the following in a non-AsyncResult context:
let chain = step1 >> step2 >> step3
Then that translates to:
let asyncResultChain = step1AR >=> step2AR >=> step3AR
Where I'm using the "AR" suffix to indicate versions of those functions that return an Async<Result<whatever>> type. On the other hand, if you had written that in a pass-the-data-through-the-pipeline style:
let result = input |> step1 |> step2 |> step3
Then that would translate to:
let asyncResult = input >>= step1AR >>= step2AR >>= step3AR
So that's why you need both the bind and compose functions, and the operators that correspond to them: so that you can have the equivalent of either the |> or the >> operators for your AsyncResult values.
BTW, the operator "names" that I picked (>>= and >=>), I did not pick randomly. These are the standard operators that are used all over the place for the "bind" and "compose" operations on values like Async, or Result, or AsyncResult. So if you're defining your own, stick with the "standard" operator names and other people reading your code won't be confused.
Update 2: Here's how to read those type signatures:
'a -> Async<Result<'b, 'c>>
This is a function that takes type A, and returns an Async wrapped around a Result. The Result has type B as its success case, and type C as its failure case.
Async<Result<'a, 'c>>
This is a value, not a function. It's an Async wrapped around a Result where type A is the success case, and type C is the failure case.
So the bind function takes two parameters:
a function from A to an async of (either B or C)).
a value that's an async of (either A or C)).
And it returns:
a value that's an async of (either B or C).
Looking at those type signatures, you can already start to get an idea of what the bind function will do. It will take that value that's either A or C, and "unwrap" it. If it's C, it will produce an "either B or C" value that's C (and the function won't need to be called). If it's A, then in order to convert it to an "either B or C" value, it will call the f function (which takes an A).
All this happens within an async context, which adds an extra layer of complexity to the types. It might be easier to grasp all this if you look at the basic version of Result.bind, with no async involved:
let bind (f : 'a -> Result<'b, 'c>) (a : Result<'a, 'c>) =
match a with
| Ok val -> f val
| Error err -> Error err
In this snippet, the type of val is 'a, and the type of err is 'c.
Final update: There was one comment from the chat session that I thought was worth preserving in the answer (since people almost never follow chat links). Developer11 asked,
... if I were to ask you what Result.bind in my example code maps to your approach, can we rewrite it as create >> AsyncResult.bind update? It worked though. Just wondering i liked the short form and as you said they have a standard meaning? (in haskell community?)
My reply was:
Yes. If the >=> operator is properly written, then f >=> g will always be equivalent to f >> bind g. In fact, that's precisely the definition of the compose function, though that might not be immediately obvious to you because compose is written as fun x -> bind g (f x) rather than as f >> bind g. But those two ways of writing the compose function would be exactly equivalent. It would probably be very instructive for you to sit down with a piece of paper and draw out the function "shapes" (inputs & outputs) of both ways of writing compose.
Why do you want to use Railway Oriented Programming here? If you just want to run a sequence of operations and return information about the first exception that occurs, then F# already provides a language support for this using exceptions. You do not need Railway Oriented Programming for this. Just define your Error as an exception:
exception Error of code:string * message:string
Modify the code to throw the exception (also note that your create function takes article but does not use it, so I deleted that):
let create () = async {
let ds = new DataContractJsonSerializer(typeof<Error>)
let request = WebRequest.Create("http://example.com") :?> HttpWebRequest
request.Method <- "GET"
try
use response = request.GetResponse() :?> HttpWebResponse
use reader = new StreamReader(response.GetResponseStream())
use memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(reader.ReadToEnd()))
return ds.ReadObject(memoryStream) :?> Article
with
| :? WebException as e ->
use reader = new StreamReader(e.Response.GetResponseStream())
use memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(reader.ReadToEnd()))
return raise (Error (ds.ReadObject(memoryStream) :?> Error)) }
And then you can compose functions just by sequencing them in async block using let! and add exception handling:
let main () = async {
try
let! created = create ()
let! updated = update created
let! uploaded = upload updated
Debug.WriteLine(uploaded.name)
with Error(code, message) ->
Debug.WriteLine(code + ":" + message) }
If you wanted more sophisticated exception handling, then Railway Oriented Programming might be useful and there is certainly a way of integrating it with async, but if you just want to do what you described in your question, then you can do that much more easily with just standard F#.

How run in background a blocking call in F#?

I need to call in the background a API that call a webservice. I don't wish to turn the (very complex) method to async, just say "do all this on the background".
But I'm lost in how do this with F#. This is what I have:
let task = async {
let result = SyncApi.syncData(login.url, login.zone, login.user, login.pwd) <-- THIS MUST RUN IN BACKGROUND...
match result with
|Some(msg) -> failwith msg
| None -> ()
}
task
|> Async.Catch
|> Async.RunSynchronously
|> fun x ->
match x with
| Choice1Of2 x -> rootPage.Navigation.PopToRootAsync(true) |> ignore
| Choice2Of2 ex -> showMsgError(ex.Message)
If you're looking for simple fire and forget style to start the API call an don't use the result on the current thread, Async.Start(task) might be what you're looking for. It takes the Async workflow, starts it on a thread pool and immediately returns so that your method can continue.
But seeing that you need the result to either change navigation or show an error message, you might need to call the SyncApi synchronously on the current thread and wait for its result.
Alternatively, if your application model allows it, you can do something like this:
(* Define the task including error handling. *)
let task = async {
let result = SyncApi.syncData(login.url, login.zone, login.user, login.pwd)
match result with
| Some msg ->
(* This may have to be posted back to the UI context.
Correct way depends on technology (Xamarin vs. WPF vs. MVC...) *)
showMsgError msg
| None -> ()
}
(* Fire and forget the async API call. *)
Async.Start(task)
(* Optimistically navigate away immediately,
while `task` may still be in progress. *)
rootPage.Navigation.PopToRootAsync(true) |> ignore
This will start the task on a thread pool, navigate away, but in case the async task failed, it will trigger the error message. However it assumes that your application can show the error message asynchronously for example as a popup, not only in the context of the page that started the task.

Select First Async Result

Is there an async operator to get the value first returned by two asynchronous values (Async<_>)?
For example, given two Async<_> values where one A1 returns after 1 second and A2 returns after 2 seconds, then I want the result from A1.
The reason is I want to implement an interleave function for asynchronous sequences, so that if there are two asynchronous sequences "defined" like this (with space indicating time as with marble diagrams):
S1 = -+-----+------------+----+
S2 = ---+-------+----------+-----+
Then I want to generate a new asynchronous sequence that acts like this:
S3 = -+-+---+---+--------+-+--+--+
Interleave S1 S2 = S3
But two do that, I probably need a kind of async select operator to select select values.
I think this would be like "select" in Go, where you can take the first available value from two channels.
TPL has a function called Task.WhenAny - I probably need something similar here.
I don't think the operator is available in the F# library. To combine this from existing operations, you could use Async.StartAsTask and then use the existing Task.WhenAny operator. However, I'm not exactly sure how that would behave with respect to cancellation.
The other option is to use the Async.Choose operator implemented on F# Snippets web site. This is not particularly elegant, but it should do the trick! To make the answer stand-alone, the code is attached below.
/// Creates an asynchronous workflow that non-deterministically returns the
/// result of one of the two specified workflows (the one that completes
/// first). This is similar to Task.WaitAny.
static member Choose(a, b) : Async<'T> =
Async.FromContinuations(fun (cont, econt, ccont) ->
// Results from the two
let result1 = ref (Choice1Of3())
let result2 = ref (Choice1Of3())
let handled = ref false
let lockObj = new obj()
let synchronized f = lock lockObj f
// Called when one of the workflows completes
let complete () =
let op =
synchronized (fun () ->
// If we already handled result (and called continuation)
// then ignore. Otherwise, if the computation succeeds, then
// run the continuation and mark state as handled.
// Only throw if both workflows failed.
match !handled, !result1, !result2 with
| true, _, _ -> ignore
| false, (Choice2Of3 value), _
| false, _, (Choice2Of3 value) ->
handled := true
(fun () -> cont value)
| false, Choice3Of3 e1, Choice3Of3 e2 ->
handled := true;
(fun () ->
econt (new AggregateException
("Both clauses of a choice failed.", [| e1; e2 |])))
| false, Choice1Of3 _, Choice3Of3 _
| false, Choice3Of3 _, Choice1Of3 _
| false, Choice1Of3 _, Choice1Of3 _ -> ignore )
op()
// Run a workflow and write result (or exception to a ref cell
let run resCell workflow = async {
try
let! res = workflow
synchronized (fun () -> resCell := Choice2Of3 res)
with e ->
synchronized (fun () -> resCell := Choice3Of3 e)
complete() }
// Start both work items in thread pool
Async.Start(run result1 a)
Async.Start(run result2 b) )
Tomas already answered the precise question. However, you might be interested to know that my Hopac library for F# directly supports Concurrent ML -style first-class, higher-order, selective events, called alternatives, which directly provide a choose -combinator and provide a more expressive concurrency abstraction mechanism than Go's select statement.
Regarding your more specific problem of interleaving two asynchronous sequences, I recently started experimenting with ideas on how Rx-style programming could be done with Hopac. One potential approach I came up with is to define a kind of ephemeral event streams. You can find the experimental code here:
Alts.fsi
Alts.fs
As you can see, one of the operations defined for event streams is merge. What you are looking for may be slightly different semantically, but would likely be straightforward to implement using Hopac -style alternatives (or Concurrent ML -style events).

Why would disposal of resources be delayed when using the "use" binding within an async computation expression?

I've got an agent which I set up to do some database work in the background. The implementation looks something like this:
let myAgent = MailboxProcessor<AgentData>.Start(fun inbox ->
let rec loop =
async {
let! data = inbox.Receive()
use conn = new System.Data.SqlClient.SqlConnection("...")
data |> List.map (fun e -> // Some transforms)
|> List.sortBy (fun (_,_,t,_,_) -> t)
|> List.iter (fun (a,b,c,d,e) ->
try
... // Do the database work
with e -> Log.error "Yikes")
return! loop
}
loop)
With this I discovered that if this was called several times in some amount of time I would start getting SqlConnection objects piling up and not being disposed, and eventually I would run out of connections in the connection pool (I don't have exact metrics on how many "several" is, but running an integration test suite twice in a row could always cause the connection pool to run dry).
If I change the use to a using then things are disposed properly and I don't have a problem:
let myAgent = MailboxProcessor<AgentData>.Start(fun inbox ->
let rec loop =
async {
let! data = inbox.Receive()
using (new System.Data.SqlClient.SqlConnection("...")) <| fun conn ->
data |> List.map (fun e -> // Some transforms)
|> List.sortBy (fun (_,_,t,_,_) -> t)
|> List.iter (fun (a,b,c,d,e) ->
try
... // Do the database work
with e -> Log.error "Yikes")
return! loop
}
loop)
It seems that the Using method of the AsyncBuilder is not properly calling its finally function for some reason, but it's not clear why. Does this have something to do with how I've written my recursive async expression, or is this some obscure bug? And does this suggest that utilizing use within other computation expressions could produce the same sort of behavior?
This is actually the expected behavior - although not entirely obvious!
The use construct disposes of the resource when the execution of the asynchronous workflow leaves the current scope. This is the same as the behavior of use outside of asynchronous workflows. The problem is that recursive call (outside of async) or recursive call using return! (inside async) does not mean that you are leaving the scope. So in this case, the resource is disposed of only after the recursive call returns.
To test this, I'll use a helper that prints when disposed:
let tester () =
{ new System.IDisposable with
member x.Dispose() = printfn "bye" }
The following function terminates the recursion after 10 iterations. This means that it keeps allocating the resources and disposes of all of them only after the entire workflow completes:
let rec loop(n) = async {
if n < 10 then
use t = tester()
do! Async.Sleep(1000)
return! loop(n+1) }
If you run this, it will run for 10 seconds and then print 10 times "bye" - this is because the allocated resources are still in scope during the recursive calls.
In your sample, the using function delimits the scope more explicitly. However, you can do the same using nested asynchronous workflow. The following only has the resource in scope when calling the Sleep method and so it disposes of it before the recursive call:
let rec loop(n) = async {
if n < 10 then
do! async {
use t = tester()
do! Async.Sleep(1000) }
return! loop(n+1) }
Similarly, when you use for loop or other constructs that restrict the scope, the resource is disposed immediately:
let rec loop(n) = async {
for i in 0 .. 10 do
use t = tester()
do! Async.Sleep(1000) }

Does Async.StartChild have a memory leak?

When I run the following test (built with F#2.0) I get OutOfMemoryException. It takes about 5 min to reach exception on my system (i7-920 6gb ram if it was running as x86 process), but in any case we can see how memory is growing in task manager.
module start_child_test
open System
open System.Diagnostics
open System.Threading
open System.Threading.Tasks
let cnt = ref 0
let sw = Stopwatch.StartNew()
Async.RunSynchronously(async{
while true do
let! x = Async.StartChild(async{
if (Interlocked.Increment(cnt) % 100000) = 0 then
if sw.ElapsedMilliseconds > 0L then
printfn "ops per sec = %d" (100000L*1000L / sw.ElapsedMilliseconds)
else
printfn "ops per sec = INF"
sw.Restart()
GC.Collect()
})
do! x
})
printfn "done...."
I don't see nothing wrong with this code, and don't see any reasons for memory growing. I made alternate implementation to make sure my arguments are valid:
module start_child_fix
open System
open System.Collections
open System.Collections.Generic
open System.Threading
open System.Threading.Tasks
type IAsyncCallbacks<'T> = interface
abstract member OnSuccess: result:'T -> unit
abstract member OnError: error:Exception -> unit
abstract member OnCancel: error:OperationCanceledException -> unit
end
type internal AsyncResult<'T> =
| Succeeded of 'T
| Failed of Exception
| Canceled of OperationCanceledException
type internal AsyncGate<'T> =
| Completed of AsyncResult<'T>
| Subscribed of IAsyncCallbacks<'T>
| Started
| Notified
type Async with
static member StartChildEx (comp:Async<'TRes>) = async{
let! ct = Async.CancellationToken
let gate = ref AsyncGate.Started
let CompleteWith(result:AsyncResult<'T>, callbacks:IAsyncCallbacks<'T>) =
if Interlocked.Exchange(gate, Notified) <> Notified then
match result with
| Succeeded v -> callbacks.OnSuccess(v)
| Failed e -> callbacks.OnError(e)
| Canceled e -> callbacks.OnCancel(e)
let ProcessResults (result:AsyncResult<'TRes>) =
let t = Interlocked.CompareExchange<AsyncGate<'TRes>>(gate, AsyncGate.Completed(result), AsyncGate.Started)
match t with
| Subscribed callbacks ->
CompleteWith(result, callbacks)
| _ -> ()
let Subscribe (success, error, cancel) =
let callbacks = {
new IAsyncCallbacks<'TRes> with
member this.OnSuccess v = success v
member this.OnError e = error e
member this.OnCancel e = cancel e
}
let t = Interlocked.CompareExchange<AsyncGate<'TRes>>(gate, AsyncGate.Subscribed(callbacks), AsyncGate.Started)
match t with
| AsyncGate.Completed result ->
CompleteWith(result, callbacks)
| _ -> ()
Async.StartWithContinuations(
computation = comp,
continuation = (fun v -> ProcessResults(AsyncResult.Succeeded(v))),
exceptionContinuation = (fun e -> ProcessResults(AsyncResult.Failed(e))),
cancellationContinuation = (fun e -> ProcessResults(AsyncResult.Canceled(e))),
cancellationToken = ct
)
return Async.FromContinuations( fun (success, error, cancel) ->
Subscribe(success, error, cancel)
)
}
For this test it works well without any considerably memory consumption. Unfortunately I'm not much experienced in F# and have doubts if I miss some things. In case if it is bug how can I report it to F# team?
I think you're correct - there seems to be a memory leak in the implementation of StartChild.
I did a bit of profiling (following a fantastic tutorial by Dave Thomas) and the open-source F# release and I think I even know how to fix that. If you look at the implementation of StartChild, it registers a handler with the current cancellation token of the workflow:
let _reg = ct.Register(
(fun _ ->
match !ctsRef with
| null -> ()
| otherwise -> otherwise.Cancel()), null)
The objects that stay alive in the heap are instances of this registered function. They could be unregistered by calling _reg.Dispose(), but that never happens in the F# source code. I tried adding _reg.Dispose() to the functions that get called when the async completes:
(fun res -> _reg.Dispose(); ctsRef := null; resultCell.RegisterResult (Ok res, reuseThread=true))
(fun err -> _reg.Dispose(); ctsRef := null; resultCell.RegisterResult (Error err,reuseThread=true))
(fun err -> _reg.Dispose(); ctsRef := null; resultCell.RegisterResult (Canceled err,reuseThread=true))
... and based on my experiments, this fixes the problem. So, if you want a workaround, you can probably copy all the required code from control.fs and add this as a fix.
I'll send a bug report to the F# team with a link to your question. If you find something else, you can contact them by sending bug reports to fsbugs at microsoft dot com.

Resources