Flattening nested async sequences in F# - asynchronous

Consider a series of F# sequences:
let seqOf123 = seq { for i in 1..3 do yield i }
let seqOf456 = seq { for i in 4..6 do yield i }
let seqOf789 = seq { for i in 7..9 do yield i }
... and a sequence which contains all of them:
let seqOfSeqOfNums =
seq {
yield seqOf123
yield seqOf456
yield seqOf789
}
Now we have a sequence of sequences that we can flatten using built-in Seq.concat function and wrap in async clause to execute asynchronously:
let firstAsyncSeqOfNums = async { return Seq.concat seqOfSeqOfNums }
We have a resulting async sequence of 9 numbers with a signature Async<seq<int>>that we will come back to.
Now consider a series of async sequences:
let asyncSeqOf123 = async { return seqOf123 }
let asyncSeqOf456 = async { return seqOf456 }
let asyncSeqOf789 = async { return seqOf789 }
... and a sequence containing them:
let seqOfAsyncSeqOfNums =
seq {
yield asyncSeqOf123
yield asyncSeqOf456
yield asyncSeqOf789
}
We have now a sequence of type seq<Async<seq<int>>>. We can't flatten this one using Seq.concat because it's a sequence of async sequences. So how can we convert it to a type Async<seq<int>> where all integer data are flattened? We can try to do the following:
let secondAsyncSeqOfNums =
async {
return seqOfAsyncSeqOfNums
|> Seq.map (fun x -> x |> Async.RunSynchronously)
|> Seq.concat
}
It looks like it does its job: it has a type Async<seq<int>> and if we pipe it to Async.RunSynchronously it will produce the same sequence of 9 integers. But the way it produces the same sequence is not equivalent to firstAsyncSeqOfNums that appears above. The implementation of secondAsyncSeqOfNums calls Async.RunSynchronously for every nested sequence of integers during the generation of a singe flattened sequence. But can this be avoided? Note that we are generating an async flattened sequence that ideally would need only a single call to Async.RunSynchronously to evaluate its content. But I can't find a way to rewrite the code without Async.RunSynchronously being called multiple times.

Are you looking for something like this:
> let aMap f wf = async {
- let! a = wf
- return f a
- };;
val aMap : f:('a -> 'b) -> wf:Async<'a> -> Async<'b>
> let aConcat wf = Async.Parallel wf |> aMap Seq.concat;;
val aConcat : wf:seq<Async<#seq<'b>>> -> Async<seq<'b>>

Related

Asynchronous mapFold

For the following example, Array.mapFold produces the result ([|1; 4; 12|], 7).
let mapping s x = (s * x, s + x)
[| 1..3 |]
|> Array.mapFold mapping 1
Now suppose our mapping is asynchronous.
let asyncMapping s x = async { return (s * x, s + x) }
I am able to create Array.mapFoldAsync for the following to work.
[| 1..3 |]
|> Array.mapFoldAsync asyncMapping 1
|> Async.RunSynchronously
Is there a succinct way to achieve this without creating Array.mapFoldAsync?
I am asking as a way to learn other techniques - my attempts using Array.fold were horrible.
I don't think it would generally be of much benefit to combine mapFold with an Async function, because the expected result is a tuple ('values * 'accumulator), but using an Async function will at best give you an Async<'values * 'accumulator>. Consider the following attempt to make Array.mapFold work with Async:
let mapping s x = async {
let! s' = s
let! x' = x
return (s' * x', s' + x')
}
[| 1..3 |]
|> Array.map async.Return
|> Array.mapFold mapping (async.Return 1)
Even this doesn't work, because of the type mismatch: The type ''a * Async<'b>' does not match the type 'Async<'c * 'd>'.
You may also have noticed that while there is an Array.Parallel.map, there's no Array.Parallel.fold or Array.Parallel.mapFold. If you try to write your own mapFoldAsync, you may see why. The mapping part is pretty easy, just partially apply Array.map and compose with Async.Parallel:
let mapAsync f = Array.map f >> Async.Parallel
You can implement an async fold as well, but since each evaluation depends on the previous result, you can't leverage Async.Parallel this time:
let foldAsync f state array =
match array |> Array.length with
| 0 -> async.Return state
| length ->
async {
let mutable acc = state
for i = 0 to length - 1 do
let! value = f acc array.[i]
acc <- value
return acc
}
Now, when we try to combine these to build a mapFoldAsync, it becomes apparent that we can't leverage parallel execution on the mapping anymore, because both the values and the accumulator can be based on the result of the previous evaluation. That means our mapFoldAsync will be a modified 'foldAsync', not a composition of it with mapAsync:
let mapFoldAsync (f: 's -> 'a -> Async<'b * 's>) (state: 's) (array: 'a []) =
match array |> Array.length with
| 0 -> async.Return ([||], state)
| length ->
async {
let mutable acc = state
let results = Array.init length <| fun _ -> Unchecked.defaultof<'b>
for i = 0 to length - 1 do
let! (x,y) = f acc array.[i]
results.[i] <- x
acc <- y
return (results, acc)
}
While this will give you a way to do a mapFold with an async mapping function, the only real benefit would be if the mapping function did something with high-latency, such as a service call. You won't be able to leverage parallel execution for speed-up. If possible, I would suggest considering an alternative solution, based on your real-world scenario.
Without external libraries (I recommend to try AsyncSeq or Hopac.Streams)
you could do this:
let mapping s x = (fst s * x, snd s + x) |> async.Return
module Array =
let mapFoldAsync folderAsync (state: 'state) (array: 'elem []) = async {
let mutable finalState = state
for elem in array do
let! nextState = folderAsync finalState elem
finalState <- nextState
return finalState
}
[| 1..4 |]
|> Array.mapFoldAsync mapping (1,0)
|> Async.RunSynchronously

Idiomatically parsing a whitespace-separated string into a Vec of tuples of differing types

I have a struct EnclosingObject which contains a field of a Vec of tuples. I want to implement FromStr for this struct in a way that an EnclosingObject can be parsed from a string with the following structure: <number of tuples> <tuple1 str1> <tuple1 str2> <tuple1 i32> <tuple2 str1> <tuple2 str2> ...
This is what I have come up with so far (ignoring the case of an invalid number of tuples):
use std::str::FromStr;
use std::num::ParseIntError;
#[derive(Debug)]
struct EnclosingObject{
tuples: Vec<(String, String, i32)>,
}
impl FromStr for EnclosingObject {
type Err = ParseIntError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let elems_vec = s.split_whitespace().collect::<Vec<_>>();
let mut elems = elems_vec.as_slice();
let num_tuples = elems[0].parse::<usize>()?;
elems = &elems[1..];
let mut tuples = Vec::with_capacity(num_tuples);
for chunk in elems.chunks(3).take(num_tuples){
tuples.push((chunk[0].into(),
chunk[1].into(),
chunk[2].parse::<i32>()?));
}
Ok(EnclosingObject{
tuples : tuples
})
}
}
fn main(){
println!("{:?}", EnclosingObject::from_str("3 a b 42 c d 32 e f 50"));
}
(playground)
As expected, for a valid string it prints out:
Ok(EnclosingObject { tuples: [("a", "b", 42), ("c", "d", 32), ("e", "f", 50)] })
and for an invalid string e.g. "3 a b x c d 32 e f 50":
Err(ParseIntError { kind: InvalidDigit })
Can I parse this Vec of tuples in a more elegant/idiomatic way, such as by using iterators?
I tried a combination of map and collect, but the problem with this is the error handling:
let tuples = elems
.chunks(3)
.take(num_tuples)
.map(|chunk| (chunk[0].into(),
chunk[1].into(),
chunk[2].parse::<i32>()?))
.collect();
The questionmark-operator seems not to work in this context (within the tuple). So I transformed it a bit:
let tuples = try!(elems
.chunks(3)
.take(num_tuples)
.map(|chunk| {
let integer = chunk[2].parse::<i32>()?;
Ok((chunk[0].into(),
chunk[1].into(),
integer))})
.collect());
... which works, but again appears a bit cumbersome.
The questionmark-operator seems not to work in this context (within the tuple).
The problem is that ? returns an Err in case of failure and you weren't returning an Ok in case of success. The operator works just fine if you do that. Beyond that, you can avoid the extraneous allocation of the Vec by operating on the iterator from splitting on whitespace:
fn from_str(s: &str) -> Result<Self, Self::Err> {
let mut elems = s.split_whitespace();
let num_tuples = elems.next().expect("error handling: count missing").parse()?;
let tuples: Vec<_> = elems
.by_ref()
.tuples()
.map(|(a, b, c)| Ok((a.into(), b.into(), c.parse()?)))
.take(num_tuples)
.collect::<Result<_, _>>()?;
if tuples.len() != num_tuples { panic!("error handling: too few") }
if elems.next().is_some() { panic!("error handling: too many") }
Ok(EnclosingObject { tuples })
}
I've also used Itertools' tuples method which automatically groups an iterator into tuples and collected into a Result<Vec<_>, _>. I reduced the redundant tuples: tuples in the struct and added some placeholders for the remainder of the error handling. I removed the Vec::with_capacity because I trust that the size_hint set by take will be good enough. If you didn't trust it, you could still use with_capacity and then extend the vector with the iterator.

Optimizing syntax for mapped async sequences in F#

I am trying to figure out efficient syntax for the following F# expression. Let's say I have an F# async computation:
let asyncComp n = async { return n }
It has a signature 'a -> Async<'a>. Now I define a sequence of those:
let seqOfAsyncComps =
seq {
yield asyncComp 1
yield asyncComp 2
yield asyncComp 3
}
Now I have an item of seq<Async<int>>. What if I want to asynchronously map the elements from seq<Async<int>> to seq<Async<int>>. This won't work:
let seqOfAsyncSquares =
seqOfAsyncComps |> Seq.map (fun x -> x * x) // ERROR!!!
Of course, x is Async<int>, I have to extract an int first, so I can do the following instead:
let seqOfAsyncSquares =
seqOfAsyncComps |> Seq.map (fun x -> async {
let! y = x
return y * y }) // OK
This works fine but the syntax is clumsy. It takes away F# compactness, and if I want to chain several seq processing I have to do the same trick in every map, filter or iter.
I suspect there might be a more efficient syntax to deal with sequences that consist of async computations.
You could use Async.map (that I've only now blatantly stolen from Tomas Petricek):
module Async =
let map f workflow = async {
let! res = workflow
return f res }
let seqOfAsyncSquares' =
seqOfAsyncComps |> Seq.map (Async.map (fun x -> x * x))
If you evaluate it, you'll see that it seems to produce the expected outcome:
> seqOfAsyncSquares' |> Async.Parallel |> Async.RunSynchronously;;
val it : int [] = [|1; 4; 9|]

F#- AsyncSeq - how to return values in a list

Attempting to find anagrams in a list of words using F Sharps Async Sequences (I am aware there are better algorithms for anagram finding but trying to understand Async Sequneces)
From the 'runTest' below how can I
1. async read the collecion returned and output to screen
2. block until all results return & display final count/collection
open System
open System.ServiceModel
open System.Collections.Generic
open Microsoft.FSharp.Linq
open FSharp.Control
[<Literal>]
let testWord = "table"
let testWords = new List<string>()
testWords.Add("bleat")
testWords.Add("blate")
testWords.Add("junk")
let hasWord (word:string) =
let mutable res = true
let a = testWord.ToCharArray() |> Set.ofArray
let b = word.ToCharArray() |> Set.ofArray
let difference = Set.intersect a b
match difference.Count with
| 0 -> false
| _ -> true
let test2 (words:List<string>, (word:string)) : AsyncSeq<string> =
asyncSeq {
let res =
(words)
|> Seq.filter(fun x-> (hasWord(x)) )
|> AsyncSeq.ofSeq
yield! res
}
let runTest = test2(testWords,testWord)
|> //pull stuff from stream
|> // output to screen
|> ignore
()
So as you have the test2 function returning an asyncSeq. Your questions:
1. async read the collecion returned and output to screen
If you want to have some side-effecting code (such as outputting to the screen) you can use AsyncSeq.iter to apply a function to each item as it becomes available. Iter returns an Async<unit> so you can then "kick it off" using an appropriate Async method (blocking/non-blocking).
For example:
let processItem i =
// Do whatever side effecting code you want to do with an item
printfn "Item is '%s'" i
let runTestQ1 =
test2 (testWords, testWord)
|> AsyncSeq.iter processItem
|> Async.RunSynchronously
2. block until all results return & display final count/collection
If you want all the results collected so that you can work on them together, then you can convert the AsyncSeq into a normal Seq using AsyncSeq.toBlockingSeq and then convert it to a list to force the Seq to evaluate.
For example:
let runTestQ2 =
let allResults =
test2 (testWords, testWord)
|> AsyncSeq.toBlockingSeq
|> Seq.toList
// Do whatever you would like with your list of results
printfn "Final list is '%A' with a count of %i" allResults (allResults.Length)

Combine F# async and maybe computation expression

Say i want to return an Option while in an async workflow:
let run =
async {
let! x = doAsyncThing
let! y = doNextAsyncThing x
match y with
| None -> return None
| Some z -> return Some <| f z
}
Ideally I would use the maybe computation expression from FSharpx at the same time as async to avoid doing the match. I could make a custom builder, but is there a way to generically combine two computation expressions? It might look something like this:
let run =
async {
let! x = doAsyncThing
let! y = doNextAsyncThing x
return! f y
}
Typically in F# instead of using generic workflows you define the workflow by hand, or use one that is ready available as in your case async and maybe but if you want to use them combined you will need to code a specific workflow combination by hand.
Alternatively you can use F#+ which is a project that provides generic workflows for monads, in that case it will be automatically derived for you, here's a working example, using your workflow and then using OptionT which is a monad transformer:
#r "nuget: FSharpPlus, 1.2"
open FSharpPlus
open FSharpPlus.Data
let doAsyncThing = async {return System.DateTime.Now}
let doNextAsyncThing (x:System.DateTime) = async {
let m = x.Millisecond
return (if m < 500 then Some m else None)}
let f x = 2 * x
// then you can use Async<_> (same as your code)
let run = monad {
let! x = doAsyncThing
let! y = doNextAsyncThing x
match y with
| None -> return None
| Some z -> return Some <| f z}
let res = Async.RunSynchronously run
// or you can use OptionT<Async<_>> (monad transformer)
let run' = monad {
let! x = lift doAsyncThing
let! y = OptionT (doNextAsyncThing x)
return f y}
let res' = run' |> OptionT.run |> Async.RunSynchronously
The first function has to be 'lifted' into the other monad, because it only deals with Async (not with Option), the second function deals with both so it only needs to be 'packed' into our OptionT DU.
As you can see both workflows are derived automatically, the one you had (the async workflow) and the one you want.
For more information about this approach read about Monad Transformers.
A simple way to do so is to use Option module:
let run =
async {
let! x = doAsyncThing
let! y = doNextAsyncThing x
return Option.map f y
}
I suppose you don't have to deal with option in context of async so often. FSharpx also provides many more high-order functions for option type. Most of the cases, I think using them is enough.
To get the feeling of using these functions, please take a look at this nice article.
type MaybeMonad() =
member __.Bind(x, f) =
match x with
| Some v -> f v
| None -> None
member __.Return(x) =
Some x
let maybe = MaybeMonad()
let run = async {
let! x = doAsyncThing
let! y = doNextAsyncThing x
return maybe {
let! y_val = y
return f y_val
}
}
just use f# Computation expressions inside.

Resources