bound/throttle concurrent jobs, without creating a thread-per-job - asynchronous

I want to process a collection of io-bound jobs concurrently, but bound/limit the number of outstanding (actively running) concurrent jobs.
Chunking is an easy way to increase concurrency, but creates bottlenecks if the items take varying amounts of time.
The way I found to do this is has some issues 1). Is there a way do this avoiding the issues below while remaining comparably idiomatic and succinct?
1) use a BlockingCollection (shown below). However, this leads to a solution in which the concurrency here is generated by boundedSize number of "consumer" threads. I'm looking a solution that doesn't require boundedSize number of threads to achieve boundedSize concurrent jobs. (what if boundedSize is very large?). I didn't see how I could take an item, process it, and then signal completion. I can only take items... and since I don't want to rip through the whole list at once, the consumer needs to run it's work Synchronously.
type JobNum = int
let RunConcurrentlyBounded (boundedSize:int) (start : JobNum) (finish : JobNum) (mkJob: JobNum -> Async<Unit>) =
// create a BlockingCollection
use bc = new BlockingCollection<Async<Unit>>(boundedSize)
// put async jobs on BlockingCollection
Async.Start(async {
{ start .. finish }
|> Seq.map mkJob
|> Seq.iter bc.Add
bc.CompleteAdding()
})
// each consumer runs it's job synchronously
let mkConsumer (consumerId:int) = async { for job in bc.GetConsumingEnumerable() do do! job }
// create `boundedSize` number of consumers in parallel
{ 1 .. boundedSize }
|> Seq.map mkConsumer
|> Async.Parallel
|> Async.RunSynchronously
|> ignore
let Test () =
let boundedSize = 15
let start = 1
let finish = 50
let mkJob = (fun jobNum -> async {
printfn "%A STARTED" jobNum
do! Async.Sleep(5000)
printfn "%A COMPLETED" jobNum
})
RunConcurrentlyBounded boundedSize start finish mkJob
I'm aware of TPL and mailbox processors, but thought there might've been something simple & robust, but avoids the high number of thread creation route.
Ideally there would just be one producer thread and one consumer thread; I suspect that BlockingCollection might not be the right concurrency primitive for such a case?

this seems as good as I'm going to get, by using SemaphoreSlim.
I suppose the underlying ThreadPool is really controlling the concurrency here.
let RunConcurrentlySemaphore (boundedSize:int) (start : JobNum) (finish : JobNum) (mkJob: JobNum -> Async<Unit>) =
use ss = new SemaphoreSlim(boundedSize);
{ start .. finish }
|> Seq.map (mkJob >> fun job -> async {
do! Async.AwaitTask(ss.WaitAsync())
try do! job finally ss.Release() |> ignore
})
|> Async.Parallel
|> Async.RunSynchronously

Related

Running Parallel async functions and waiting for the result in F# in a loop

I have a function that I want to run 5 at a time from a chunked list and then wait a second so as not to hit an Api rate limit. I know the code is wrong but it is as far as I have got.
let ordersChunked = mOrders |> List.chunkBySize 5
for fiveOrders in ordersChunked do
let! tasks =
fiveOrders
|> List.map (fun (order) -> trackAndShip(order) )
|> Async.Parallel
Async.Sleep(1000) |> ignore
trackAndShip is an async function not a task and I get a little confused about the two.
Some of the answers I have read add |> Async.RunSynchronously to the end of it - and I do that at the top level but I don't feel it is good practice here ( and I would need to convert async to Task? )
This is probably a very simple answer if you are familiar with async / tasks but I am still "finding my feet" in this area.
Unlike C#'s tasks, Async computation do not start when they are created and you have to explicitly start them. Furthermore, even if you would start the Async operation, it won't do what you expect because the sleep will not execute because you just ignore it.
From the let! in your example code, I'm going to assume you're the code snippet is inside a computation expression. So the following may be what you want
let asyncComputation =
async {
let ordersChunked = [] |> List.chunkBySize 5
for fiveOrders in ordersChunked do
let! tasks =
fiveOrders
|> List.map (fun (order) -> trackAndShip(order) )
|> Async.Parallel
do! Async.Sleep(1000) // Note the do! here, without it your sleep will be ignored
}
Async.RunSynchronously asyncComputation
Note that the Async.Sleep is now chained into the CE by using do!. The Async.RunSynchronously call is one way to start the async computation: it runs the async operation and blocks the current thread while waiting for the async operation's result. There are many other ways of running it described in the documentation.

F# ASync.Parallel plus main thread

I have an F# program to copy files that I want to work asynchronously. So far I have:
let asyncFileCopy (source, target, overwrite) =
let copyfn (source,target,overwrite) =
printfn "Copying %s to %s" source target
File.Copy(source, target, overwrite)
printfn "Copyied %s to %s" source target
let fn = new Func<string * string * bool, unit>(copyfn)
Async.FromBeginEnd((source, target, overwrite), fn.BeginInvoke, fn.EndInvoke)
[<EntryPoint>]
let main argv =
let copyfile1 = asyncFileCopy("file1", "file2", true)
let copyfile2 = asyncFileCopy("file3", "file4", true)
let asynctask =
[copyfile1; copyfile2]
|> Async.Parallel
printfn "doing other stuff"
Async.RunSynchronously asynctask |> ignore
which works (the files are copied) but not in the way I want. I want to start the parallel copy operations so that they begin copying. Meanwhile, I want to continue doing stuff on the main thread. Later, I want to wait for the asynchronous tasks to complete. What my code seems to do is set up the parallel copies, then do other stuff, but not actually execute the copies until it hits Async.Runsychronously.
Is there a way to, in effect, Async.Run"a"synchronously, to get the copies started in the thread pool, then do other stuff and later wait for the copies to finish?
Figured it out:
let asynctask =
[copyfile1; copyfile2]
|> Async.Parallel
|> Async.StartAsTask
let result = Async.AwaitIAsyncResult asynctask
printfn "doing other stuff"
Async.RunSynchronously result |> ignore
printfn "Done"
The key is using StartAsTask, AwaitIAsyncResult and only later RunSynchronously to await task completion

How can I throttle a large array of async workflows passed to Async.Parallel

I have an array holding a large number of small async database queries; for example:
// I actually have a more complex function that
// accepts name/value pairs for query parameters.
let runSql connString sql = async {
use connection = new SqlConnection(connString)
use command = new SqlCommand(sql, connection)
do! connection.OpenAsync() |> Async.AwaitIAsyncResult |> Async.Ignore
return! command.ExecuteScalarAsync() |> Async.AwaitTask
}
let getName (id:Guid) = async {
// I actually use a parameterized query
let querySql = "SELECT Name FROM Entities WHERE ID = '" + id.ToString() + "'"
return! runSql connectionString querySql
}
let ids : Guid array = getSixtyThousandIds()
let asyncWorkflows = ids |> Array.map getName
//...
Now, the problem: The next expression runs all 60K workflows at once, flooding the server. This leads to many of the SqlCommands timing out; it also typically causes out of memory exceptions in the client (which is F# interactive) for reasons I do not understand and (not needing to understand them) have not investigated:
//...
let names =
asyncWorkflows
|> Async.Parallel
|> Async.RunSynchronously
I've written a rough-and-ready function to batch the requests:
let batch batchSize asyncs = async {
let batches = asyncs
|> Seq.mapi (fun i a -> i, a)
|> Seq.groupBy (fst >> fun n -> n / batchSize)
|> Seq.map (snd >> Seq.map snd)
|> Seq.map Async.Parallel
let results = ref []
for batch in batches do
let! result = batch
results := (result :: !results)
return (!results |> List.rev |> Seq.collect id |> Array.ofSeq)
}
To use this function, I replace Async.Parallel with batch 20 (or another integer value):
let names =
asyncWorkflows
|> batch 20
|> Async.RunSynchronously
This works reasonably well, but I would prefer to have a system that starts each new async as soon as one completes, so rather than successive batches of size N starting after each previous batch of size N has finished, I am always awaiting N active SqlCommands (until I get to the end, of course).
Questions:
Am I reinventing the wheel? In other words, are there library functions that do this already? (Would it be profitable to look into exploiting ParallelEnumerable.WithDegreeOfParallelism somehow?)
If not, how should I implement a continuous queue instead of a series of discrete batches?
I am not primarily seeking suggestions to improve the existing code, but such suggestions will nonetheless be received with interest and gratitude.
FSharpx.Control offers an Async.ParallelWithThrottle function. I'm not sure if it is the best implementation as it uses SemaphoreSlim. But the ease of use is great and since my application doesn't need top performance it works well enough for me. Although since it is a library if someone knows how to make it better it is always a nice thing to make libraries top performers out of the box so the rest of us can just use the code that works and just get our work done!
Async.Parallel had support for throttling added in FSharp v 4.7. You do:
let! results = Async.Parallel(workflows, maxDegreeOfParallelism = dop)
if doing more than 1200 workflows concurrently in FSharp.Core versions <= 6.0.5, see this resolved issue
Proposal for a more explicit API

Why would disposal of resources be delayed when using the "use" binding within an async computation expression?

I've got an agent which I set up to do some database work in the background. The implementation looks something like this:
let myAgent = MailboxProcessor<AgentData>.Start(fun inbox ->
let rec loop =
async {
let! data = inbox.Receive()
use conn = new System.Data.SqlClient.SqlConnection("...")
data |> List.map (fun e -> // Some transforms)
|> List.sortBy (fun (_,_,t,_,_) -> t)
|> List.iter (fun (a,b,c,d,e) ->
try
... // Do the database work
with e -> Log.error "Yikes")
return! loop
}
loop)
With this I discovered that if this was called several times in some amount of time I would start getting SqlConnection objects piling up and not being disposed, and eventually I would run out of connections in the connection pool (I don't have exact metrics on how many "several" is, but running an integration test suite twice in a row could always cause the connection pool to run dry).
If I change the use to a using then things are disposed properly and I don't have a problem:
let myAgent = MailboxProcessor<AgentData>.Start(fun inbox ->
let rec loop =
async {
let! data = inbox.Receive()
using (new System.Data.SqlClient.SqlConnection("...")) <| fun conn ->
data |> List.map (fun e -> // Some transforms)
|> List.sortBy (fun (_,_,t,_,_) -> t)
|> List.iter (fun (a,b,c,d,e) ->
try
... // Do the database work
with e -> Log.error "Yikes")
return! loop
}
loop)
It seems that the Using method of the AsyncBuilder is not properly calling its finally function for some reason, but it's not clear why. Does this have something to do with how I've written my recursive async expression, or is this some obscure bug? And does this suggest that utilizing use within other computation expressions could produce the same sort of behavior?
This is actually the expected behavior - although not entirely obvious!
The use construct disposes of the resource when the execution of the asynchronous workflow leaves the current scope. This is the same as the behavior of use outside of asynchronous workflows. The problem is that recursive call (outside of async) or recursive call using return! (inside async) does not mean that you are leaving the scope. So in this case, the resource is disposed of only after the recursive call returns.
To test this, I'll use a helper that prints when disposed:
let tester () =
{ new System.IDisposable with
member x.Dispose() = printfn "bye" }
The following function terminates the recursion after 10 iterations. This means that it keeps allocating the resources and disposes of all of them only after the entire workflow completes:
let rec loop(n) = async {
if n < 10 then
use t = tester()
do! Async.Sleep(1000)
return! loop(n+1) }
If you run this, it will run for 10 seconds and then print 10 times "bye" - this is because the allocated resources are still in scope during the recursive calls.
In your sample, the using function delimits the scope more explicitly. However, you can do the same using nested asynchronous workflow. The following only has the resource in scope when calling the Sleep method and so it disposes of it before the recursive call:
let rec loop(n) = async {
if n < 10 then
do! async {
use t = tester()
do! Async.Sleep(1000) }
return! loop(n+1) }
Similarly, when you use for loop or other constructs that restrict the scope, the resource is disposed immediately:
let rec loop(n) = async {
for i in 0 .. 10 do
use t = tester()
do! Async.Sleep(1000) }

Async.TryCancelled doesn't work with Async.RunSynchronously

I try to create an agent that updates UI based on user interaction. If user clicks on a button, the GUI should be refreshed. The preparation of model takes a long time, so it is desirable that if user clicks on other button, the preparation is cancelled and the new one is started.
What I have so far:
open System.Threading
type private RefreshMsg =
| RefreshMsg of AsyncReplyChannel<CancellationTokenSource>
type RefresherAgent() =
let mutable cancel : CancellationTokenSource = null
let doSomeModelComputation i =
async {
printfn "start %A" i
do! Async.Sleep(1000)
printfn "middle %A" i
do! Async.Sleep(1000)
printfn "end %A" i
}
let mbox =
MailboxProcessor.Start(fun mbx ->
let rec loop () = async {
let! msg = mbx.Receive()
match msg with
| RefreshMsg(chnl) ->
let cancelSrc = new CancellationTokenSource()
chnl.Reply(cancelSrc)
let update = async {
do! doSomeModelComputation 1
do! doSomeModelComputation 2
//do! updateUI // not important now
}
let cupdate = Async.TryCancelled(update, (fun c -> printfn "refresh cancelled"))
Async.RunSynchronously(cupdate, -1, cancelSrc.Token)
printfn "loop()"
return! loop()
}
loop ())
do
mbox.Error.Add(fun exn -> printfn "Error in refresher: %A" exn)
member x.Refresh() =
if cancel <> null then
// I don't handle whether the previous computation finished
// I just cancel it; might be improved
cancel.Cancel()
cancel.Dispose()
cancel <- mbox.PostAndReply(fun reply -> RefreshMsg(reply))
printfn "x.Refresh end"
//sample
let agent = RefresherAgent()
agent.Refresh()
System.Threading.Thread.Sleep(1500)
agent.Refresh()
I return a CancellationTokenSource for each request and store it in a mutable variable (the x.Refresh() is thread safe, it is called on UI thread).
If Refresh() is called for the first time, the cancellation source is returned. If Refresh() is called for the second time, I call Cancel which should abort the async task that I run through Async.RunSynchronously.
However, an exception is raised. The output from my sample is
x.Refresh end
start 1
middle 1
end 1
refresh cancelled
Error in refresher: System.OperationCanceledException: The operation was canceled.
at Microsoft.FSharp.Control.AsyncBuilderImpl.commit[a](Result`1 res)
Now as I think about this, it might make sense, because the thread on which the agent runs, was interrputed, right? But, how do I achieve the desired behaviour?
I need to cancel async workflow inside the agent, so that the agent can continue consuming new messages. Why do I use the mailbox processor? Cause it is guaranteed that only one thread is trying to create UI model, so I save resources.
Let's suppose I create UI model by downloading data from several web services, that's why I use async call. When user changes a combo and select other option, I want to stop querying the webservices (= cancel the async calls) with old value and want to create new model base od web services call with new value.
Any suggestion that I can use instead of my solution and will solve my problem, is also welcome.
I have difficulties in trying to understand what you want to achieve. But maybe this does not matter - the error just says that the workflow you are executing with RunSynchronously was canceled (RunSynchronously will throw the exception) - so you can wrap this call into a try-match block and just ignore the OC-Exception
a better option might be to refactor your cupdate and to the try-match inside of this - you can even bring the in TryCancelled into it if you catch the OC-Exceptions directly ;)
let update =
async {
try
do! doSomeModelComputation 1
do! doSomeModelComputation 2
with
| :? OperationCanceledException ->
printfn "refresh cancelled"
}
Async.RunSynchronously(update, -1, cancelSrc.Token)
But I still don't get the part why you want this Synchronously

Resources