Does async really make sense here? Npgsql with F# - asynchronous

I have the following transaction:
let insertPositionsAsync accountId timestamp (positions: PositionInfo list) : Async<Result<int, ExchangeError>> =
    async {
        try
            use connection = getConnection location
            do! connection.OpenAsync() |> Async.AwaitTask
            use writer =
                connection.BeginBinaryImport(
                    $"COPY {accountId}.{tablePositionsName} (ts,instrument,average_price,leverage,unrealized_pnl,side,initial_margin,maintenance_margin,position_initial_margin,open_order_initial_margin,quantity,max_notional)
                    FROM STDIN (FORMAT BINARY)"
                )
            for t in positions do
                do! writer.StartRowAsync() |> Async.AwaitTask
                do! writer.WriteAsync(timestamp, NpgsqlDbType.Timestamp) |> Async.AwaitTask
                do! writer.WriteAsync(t.Instrument.Ticker, NpgsqlDbType.Varchar) |> Async.AwaitTask
                do! writer.WriteAsync(t.AveragePrice, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.Leverage, NpgsqlDbType.Integer) |> Async.AwaitTask
                do! writer.WriteAsync(t.UnrealizedPnl, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.Side.ToString().ToLower(), NpgsqlDbType.Varchar) |> Async.AwaitTask
                do! writer.WriteAsync(t.InitialMargin, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.MaintenanceMargin, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.PositionInitialMargin, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.OpenOrderInitialMargin, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.Quantity, NpgsqlDbType.Double) |> Async.AwaitTask
                do! writer.WriteAsync(t.MaxNotional, NpgsqlDbType.Double) |> Async.AwaitTask
            // CompleteAsync returns a ValueTask, so convert it to a Task before awaiting
            let! c = writer.CompleteAsync().AsTask() |> Async.AwaitTask
            return Ok (int c)
        with ex ->
            error $"insertPositionsAsync {ex.Humanize()}"
            return Error (ServiceException ex)
    }
My understanding is that the loop:
    for t in positions do
        do! writer.StartRowAsync() |> Async.AwaitTask
        do! writer.WriteAsync(timestamp, NpgsqlDbType.Timestamp) |> Async.AwaitTask
        ...
    let! c = writer.CompleteAsync().AsTask() |> Async.AwaitTask
is happening in the driver, and that it's just collecting data in some local storage. Does it make sense to have all these async calls then (performance-wise)?
But the async API must exist for some reason. What could I be missing?

It's true that in many (even most) cases, the APIs on NpgsqlBinaryImporter will complete synchronously, since they simply write to Npgsql's memory buffer. This is true not just for individual value writing (WriteAsync), but for StartRowAsync as well - none of these APIs perform I/O unless the internal buffer is full.
So in effect, you're streaming out an arbitrarily long stream of data to PostgreSQL, via a memory buffer. Given enough data (after all, this is a "bulk import"), at some point the buffer will be exhausted and the API will flush; the question is whether that flush should block the calling thread or happen asynchronously.
So yes, it does make sense to use the async APIs, if you want I/O to happen asynchronously (even if most calls don't result in I/O). There should be very little overhead for calling the async methods (as opposed to the sync ones), so the async ones should generally be the default; but measure the overhead yourself to get the full picture.
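To make the "most calls complete without I/O" point concrete, here is a minimal sketch of a buffered writer. `BufferedSink` is a hypothetical type invented for this illustration; it is not how Npgsql is implemented, it only mimics the behaviour described above: a write copies into memory, and the underlying stream is touched only when the buffer is full or on an explicit flush.

```fsharp
open System.IO

/// Hypothetical buffered writer (an assumption for illustration, not Npgsql code).
type BufferedSink(target: Stream, capacity: int) =
    let buffer : byte[] = Array.zeroCreate capacity
    let mutable used = 0
    let mutable flushes = 0

    /// Number of times the underlying stream was actually written to.
    member this.Flushes = flushes

    /// Performs real (asynchronous) I/O, but only if there is buffered data.
    member this.FlushAsync() = async {
        if used > 0 then
            do! target.WriteAsync(buffer, 0, used) |> Async.AwaitTask
            used <- 0
            flushes <- flushes + 1 }

    /// Usually just copies one byte into memory; flushes first when the buffer is full.
    member this.WriteByteAsync(b: byte) = async {
        if used = capacity then
            do! this.FlushAsync()
        buffer.[used] <- b
        used <- used + 1 }

let target = new MemoryStream()
let sink = BufferedSink(target, 4)

async {
    for b in 1uy .. 10uy do
        do! sink.WriteByteAsync(b)
    do! sink.FlushAsync() }
|> Async.RunSynchronously

printfn "%d bytes written in %d flushes" target.Length sink.Flushes
```

With a 4-byte buffer and 10 writes, only two of the writes trigger a flush, plus the final explicit one. The other eight return without touching the stream, which is why the async variants cost little when no I/O is needed.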

As with any performance related question, the only meaningful answer is measure it and see for yourself!
More generally, async makes it possible to do the communication with the database without blocking an operating system thread. This may or may not have performance implications - but it depends on what else is happening in your program.
If you just have a single logical process communicating with the database, it makes no difference whether you do this in a synchronous or asynchronous bit of code.
If you want to write data to a database, you could do this from multiple concurrent writers. In this case, async would make a difference, as you could create more concurrent writers. But I think it would likely not help much, because the database will probably not be able to write faster when you have an excessive number of writer threads.
A more interesting case is when you are doing something else in your program. For example, if you have a web server that needs to handle many concurrent requests, it makes sense to use asynchronous communication with a database, because it will not block threads that you need to handle other web requests.
The summary is, you should measure the performance, but it will likely depend on what else is going on in your program. I do not think changing a single-threaded writer script to async would make any difference (aside from adding some small overhead).

There's a discussion of this API here. It starts with someone saying:
It would be helpful to have asynchronous versions of BeginBinaryImport and BeginBinaryExport so they can be used from e.g. api endpoints without blocking the server.
For Import, given the write happens on Close/Dispose I'm guessing the write methods would not become async
However, one of the developers then comments that:
Write can also block. So you need WriteAsync too.

Related

F#: Synchronously start Async within a SynchronizationContext

Async.SwitchSynchronizationContext allows an Async action to switch to running within a given SynchronizationContext. I would like to synchronously begin an Async computation within a SynchronizationContext, rather than switching inside the Async.
This would ensure that Async actions are run in the desired order, and that they are not run concurrently.
Is this possible? The documentation for Async.SwitchSynchronizationContext mentions using SynchronizationContext.Post, but no such functionality is exposed for Async.
I have not tested this, but I think the easiest way to achieve what you want is to combine the SynchronizationContext.Send method (mentioned in the comments) with the Async.StartImmediate operation. The former lets you start some work synchronously in the synchronization context. The latter lets you start an async workflow in the current context.
If you combine the two, you can define a helper that starts a function in a given synchronization context and, in this function, immediately starts an async workflow:
let startInContext (sync: SynchronizationContext) work =
    // Use the context that was passed in, not whichever one happens to be current
    sync.Send((fun _ -> Async.StartImmediate(work)), null)
For example:
async {
    printfn "Running on the given sync context"
    do! Async.Sleep(1000)
    printfn "Should be back on the original sync context" }
|> startInContext SynchronizationContext.Current

F# Async: equivalent of Boost asio's strands

Boost's asio library allows the serialisation of asynchronous code in the following way. Handlers to asynchronous functions such as those which read from a stream, may be associated to a strand. A strand is associated with an "IO context". An IO context owns a thread pool. However many threads in the pool, it is guaranteed that no two handlers associated with the same strand are run concurrently. This makes it possible, for instance, to implement a state machine as if it were single-threaded, where all handlers for that machine serialise over a private strand.
I have been trying to figure out how this might be done with F#'s Async. I could not find any way to make sure that chosen sets of Async processes never run concurrently. Can anyone suggest how to do this?
It would be useful to know what is the use case that you are trying to implement. I don't think F# async has anything that would directly map to strands and you would likely use different techniques for implementing different things that might all be implemented using strands.
For example, if you are concerned with reading data from a stream, an F# async block lets you write code that is asynchronous but sequential. The following runs a single logical process (which might be moved between threads of a thread pool when you wait using let!):
let readTest () = async {
    use fs = File.OpenRead(@"C:\Temp\test.fs")
    let buffer = Array.zeroCreate 10
    // A ref cell, because the value is updated across let! boundaries
    let read = ref 1
    while read.Value <> 0 do
        let! r = fs.AsyncRead(buffer, 0, 10)
        printfn "Read: %A" buffer.[0 .. r-1]
        read.Value <- r }
readTest() |> Async.Start
If you wanted to deal with events that occur without any control (i.e. push based rather than pull based), for example, when you cannot ask the system to read next buffer of data, you could serialize the events using a MailboxProcessor. The following sends two messages to the agent almost at the same time, but they are processed sequentially, with 1 second delay:
let agent = MailboxProcessor.Start(fun inbox -> async {
    while true do
        let! msg = inbox.Receive()
        printfn "Got: %s" msg
        do! Async.Sleep(1000)
    })
agent.Post("hello")
agent.Post("world")
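The same idea extends to the "state machine as if it were single-threaded" use case from the question. The following is a minimal sketch, not a direct equivalent of strands: the `CounterMsg` type and the `counter` agent are names made up for this example. All state lives inside the agent's loop, so no two handlers for that state ever run concurrently, however many workflows post to it.

```fsharp
// Hypothetical message type for this illustration
type CounterMsg =
    | Increment of int
    | Get of AsyncReplyChannel<int>

// The agent owns the state; messages are handled one at a time, in order
let counter =
    MailboxProcessor.Start(fun inbox ->
        let rec loop total = async {
            let! msg = inbox.Receive()
            match msg with
            | Increment n ->
                return! loop (total + n)
            | Get reply ->
                reply.Reply(total)
                return! loop total }
        loop 0)

// Post from many workflows running in parallel on the thread pool
[ for _ in 1 .. 100 -> async { counter.Post(Increment 1) } ]
|> Async.Parallel
|> Async.RunSynchronously
|> ignore

// The query is queued behind all the increments posted above
let total = counter.PostAndReply(fun channel -> Get channel)
printfn "Total: %d" total
```

No locks are needed here, because the running total is only ever touched by the agent's own loop.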

Why Async.StartChild does not take CancellationToken?

I am struggling to understand Async.[StartChild|Start] API design.
What I would like is to start an async process which does some tcp stream reading and calling a callback according to commands arriving on tcp.
As this async process does not really return any single value, it seems like I should use Async.Start. At some point I want to "close" my tcp client, and Async.Start takes a CancellationToken, which gives me the ability to implement "close". So far so good.
The problem is, I would like to know when the tcp client is done with cancellation. There is some buffer flushing work done once Cancel is requested, so I do not want to terminate the application before the tcp client has finished its cleanup. But Async.Start returns unit, which means I have no way of knowing when such an async process is complete. So, it looks like Async.StartChild should help. I should be able to invoke cancellation, and when cleanup is done, this async will invoke the next continuation in the chain (or throw an exception?). But... Async.StartChild does not take a CancellationToken, only a timeout.
Why does Async.StartChild implement just a single cancellation strategy (timeout) instead of exposing a more generic one (accepting a CancellationToken)?
To answer the first part of the question - if you need to do some cleanup work, you can just put it in finally and it will be called when the workflow is cancelled. For example:
let work =
    async {
        try
            printfn "first work"
            do! Async.Sleep 1000
            printfn "second work"
        finally
            printfn "cleanup" }
Say you run this using Async.Start, wait for 500ms and then cancel the computation:
let cts = new System.Threading.CancellationTokenSource()
Async.Start(work, cts.Token)
System.Threading.Thread.Sleep(500)
cts.Cancel()
The output will be "first work, cleanup". As you can see, cancelling the computation will run all the finally clauses.
To answer the second part of the question - if you need to wait until the work completes, you can use RunSynchronously (but then, perhaps you do not actually need asynchronous workflows, if you are blocking anyway...).
The following starts a background process that cancels the main work after 500ms and then starts the main work synchronously:
let cts = new System.Threading.CancellationTokenSource()
async {
    do! Async.Sleep(500)
    cts.Cancel() } |> Async.Start
try Async.RunSynchronously(work, cancellationToken=cts.Token)
with :? System.OperationCanceledException -> ()
printfn "completed"
This prints "first work, cleanup, completed" - as you can see, the RunSynchronously call was blocked until the work was cancelled.
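If blocking inside RunSynchronously is not what you want, another option is to signal completion from the finally block and wait for that signal only when you actually need it, for example just before the application exits. This is a minimal sketch under that assumption. The names `started`, `cleanedUp`, `log` and `workWithSignal` are made up for the example, and the events are plain ManualResetEventSlim instances rather than anything specific to Async.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading

let started = new ManualResetEventSlim(false)
let cleanedUp = new ManualResetEventSlim(false)
let log = ConcurrentQueue<string>()

let workWithSignal =
    async {
        try
            log.Enqueue "first work"
            started.Set()
            do! Async.Sleep 5000
            log.Enqueue "second work"
        finally
            // Runs on cancellation too; signal only after cleanup is done
            log.Enqueue "cleanup"
            cleanedUp.Set() }

let cts = new CancellationTokenSource()
Async.Start(workWithSignal, cts.Token)

// Make sure the workflow is running, then request cancellation
started.Wait()
cts.Cancel()

// Block only here, until cleanup has finished (or the timeout expires)
let finished = cleanedUp.Wait(TimeSpan.FromSeconds 10.0)
printfn "Cleanup finished: %b, log: %A" finished (log.ToArray())
```

The caller stays free to do other work between Cancel and the final wait, which Async.Start alone does not allow you to observe.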

F# Async File Copy

To copy a file asynchronously, will something like this work?
let filecopyasync (source, target) =
    let task = Task.Run((fun () -> File.Copy(source, target, true)))
    // do other stuff
    Async.AwaitIAsyncResult task
In particular, will this fire up a new thread to do the copy while I "do other stuff"?
UPDATE:
Found another solution:
let asyncFileCopy (source, target, overwrite) =
    printfn "Copying %s to %s" source target
    let fn = new Func<string * string * bool, unit>(File.Copy)
    Async.FromBeginEnd((source, target, overwrite), fn.BeginInvoke, fn.EndInvoke)
let copyfile1 = asyncFileCopy("file1", "file2", true)
let copyfile2 = asyncFileCopy("file3", "file4", true)
[copyfile1; copyfile2] |> seq |> Async.Parallel |> Async.RunSynchronously |> ignore
Your question is conflating two issues, namely multithreading and asynchrony. It's important to realise that these are entirely different concepts:
Asynchrony is about a workflow of tasks where we respond to the completion of those tasks independently of the main program flow.
Multithreading is an execution model, one which can be used to implement asynchrony, although asynchrony can be achieved in other ways (such as hardware interrupts).
Now, when it comes to I/O, the question you should not be asking is "Can I spin up another thread to do it for me?"
Why, you ask?
If you do some I/O in the main thread, you typically block the main thread waiting for results. If you evade this problem by creating a new thread, you haven't actually solved the issue, you've just moved it around. Now you've blocked either a new thread that you've created or a thread pool thread. Oh dear, same problem.
Threads are an expensive and valuable resource and shouldn't be squandered on waiting for blocking I/O to complete.
So, what is the real solution?
Well, we achieve asynchrony via one of these other approaches. That way, we can request that the OS perform some I/O and request that it let us know when the I/O operation is complete. That way, the thread is not blocked while we're waiting for results. In Windows, this is implemented via something called I/O completion ports.
How do I do this in F#?
The .NET CopyToAsync method is probably the easiest approach. Since this returns a plain task, it's helpful to create a helper method:
type Async with
    static member AwaitPlainTask (task : Task) =
        task.ContinueWith(ignore) |> Async.AwaitTask
Then
[<Literal>]
let DEFAULT_BUFFER_SIZE = 4096
let copyToAsync source dest =
    async {
        use sourceFile = new FileStream(source, FileMode.Open, FileAccess.Read, FileShare.Read, DEFAULT_BUFFER_SIZE, true)
        use destFile = new FileStream(dest, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None, DEFAULT_BUFFER_SIZE, true)
        do! sourceFile.CopyToAsync(destFile) |> Async.AwaitPlainTask
    }
You could then use this with Async.Parallel to perform multiple copies concurrently.
Note: This is different from what you wrote above because File.Copy is a synchronous method that returns unit, while CopyToAsync is an async method that returns Task. You cannot magically make synchronous methods asynchronous by putting async wrappers around them; instead, you need to make sure you are using async all the way down.
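As a rough end-to-end sketch of the Async.Parallel suggestion: the example below creates three small files in a temporary directory, copies them concurrently and reads the copies back. The file names, the temporary directory and the `copyFileAsync` name are assumptions made for this example. It also assumes a version of FSharp.Core in which Async.AwaitTask accepts a plain Task, so no helper is needed.

```fsharp
open System.IO

/// Copies one file using streams opened for asynchronous I/O.
let copyFileAsync (source: string) (dest: string) =
    async {
        use src = new FileStream(source, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, true)
        use dst = new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None, 4096, true)
        do! src.CopyToAsync(dst) |> Async.AwaitTask
    }

// Scratch directory and sample files, made up for this example
let dir =
    Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName

let pairs =
    [ for i in 1 .. 3 ->
        let source = Path.Combine(dir, sprintf "source%d.txt" i)
        File.WriteAllText(source, sprintf "contents %d" i)
        source, Path.Combine(dir, sprintf "target%d.txt" i) ]

// Run all copies concurrently and wait for every one of them to finish
pairs
|> List.map (fun (source, dest) -> copyFileAsync source dest)
|> Async.Parallel
|> Async.RunSynchronously
|> ignore

let copied = pairs |> List.map (fun (_, dest) -> File.ReadAllText dest)
printfn "%A" copied
```

RunSynchronously is used here only so that the script waits for the result. Inside a larger async workflow you would bind the result of Async.Parallel with let! instead.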
You can test it yourself with a few printfns. I found I had to use Async.RunSynchronously to force the main thread to wait for the copy to complete. I'm not sure why the await didn't work, but you can see the expected set of outputs indicating that the copy happened in the background.
open System
open System.IO
open System.Threading
open System.Threading.Tasks
let filecopyasync (source, target) =
    let task = Task.Run((fun () ->
        printfn "CopyThread: %d" Thread.CurrentThread.ManagedThreadId
        Thread.Sleep(10000)
        File.Copy(source, target, true)
        printfn "copydone"))
    printfn "mainThread: %d" Thread.CurrentThread.ManagedThreadId
    let result = Async.AwaitIAsyncResult task
    Thread.Sleep(3000)
    printfn "doing stuff"
    // Blocks until the task has finished; the bool result is not needed here
    Async.RunSynchronously result |> ignore
    printfn "done"
Output:
filecopyasync (@"foo.txt", @"bar.txt");;
mainThread: 1
CopyThread: 7
doing stuff
copydone
done
If all you're trying to do is run something on another thread while you do something else, then your initial Task.Run approach should be fine (note that you can get a Task<unit> if you call Task.Run<_> instead of the non-generic Task.Run, which might be marginally easier to deal with).
However, you should be clear about your goals - arguably a "proper" asynchronous file copy wouldn't require a separate .NET thread (which is a relatively heavy-weight primitive) and would rely on operating system features like completion ports instead; since System.IO.File doesn't provide a native CopyAsync method you'd need to write your own (see https://stackoverflow.com/a/35467471/82959 for a simple C# implementation that would be easy to transliterate).

WPF background operations using Asynchronous Workflows

To execute operations on a background thread and avoid blocking the UI in a WPF application, I often find myself writing this pattern:
async {
    // some code on the UI thread
    let uiThread = SynchronizationContext.Current
    do! Async.SwitchToThreadPool()
    let! result = // some Async<'t>
    do! Async.SwitchToContext uiThread
    // do things with the result if it wasn't () all along
}
Am I doing this right at all? Is this idiomatic? Should it be done differently?
If this is correct, of course I would prefer not to have to do it like that all the time - is there a built-in shorter way to achieve the same thing? None of the existing Async functions appears to do something like that.
If not, does it make sense to just turn the above code into a function?
let onThreadPool operation =
    async {
        let context = SynchronizationContext.Current
        do! Async.SwitchToThreadPool()
        let! result = operation
        do! Async.SwitchToContext context
        return result
    }
That adds another level of async { } nesting - can this cause issues at "some" point?
What you're doing here definitely makes sense. One useful operation here is Async.StartImmediate, which starts the async workflow on the current thread. If you call this from the UI thread, this guarantees that the workflow will also start on the UI thread and so you can capture the synchronization context inside the workflow.
The other trick is that many built-in asynchronous F# operations automatically jump back to the original synchronization context (those that are created using Async.FromContinuations, including e.g. AsyncDownloadString), so when you're calling one of those, you do not even need to explicitly jump back to the original synchronization context.
But for other asynchronous operations (and for non-async operations that you want to run in the background), your onThreadPool function looks like a great way of doing this.
@random-dev is right: capturing the context must happen outside the workflow.
let onThreadPool operation =
    let context = SynchronizationContext.Current
    async {
        do! Async.SwitchToThreadPool()
        let! result = operation
        do! Async.SwitchToContext context
        return result
    }

Resources