I remember reading that if a task completes faster than n milliseconds, it should not be run as async. But I can't find any article or docs where this interval is described.
The answer depends on the concrete use case. One corner case is when the main thread needs the result of the (possibly) async function and can do nothing without it; then the function should never be run as async. At the other extreme, if the main thread does not need the result of the async function, then any function which executes longer than the time it takes to launch an async execution (several microseconds) should be run in async mode.
I need to wait for many Futures to complete their computation (a bunch of HTTP requests). I don't want to await each of them in order, as that would evaluate the Futures one after the other, essentially synchronously, which is wasteful since some Futures might complete sooner than others.
I can't find in the docs whether Future.wait is truly asynchronous, firing all the Futures at the same time, or whether it is the same as calling them one after the other, waiting for the previous one to complete before calling the next.
In a nutshell: I am looking for the most performant way to get the results of many HTTP requests in Dart.
Future.wait does await multiple futures asynchronously, in the sense that each future is started independently of the others. By contrast, Future.forEach will loop through the list of futures and await them in sequence.
For example, if you had the following list of futures:
var futures = [
  Future.delayed(Duration(seconds: 3)).then((_) => print('1')),
  Future.delayed(Duration(seconds: 2)).then((_) => print('2')),
  Future.delayed(Duration(seconds: 1)).then((_) => print('3')),
];
Calling Future.wait on this list would result in the following output:
3 (after 1 second)
2 (after 2 seconds)
1 (after 3 seconds)
Whereas calling Future.forEach on the list would result in this:
1 (after 3 seconds)
2 (after 5 seconds)
3 (after 6 seconds)
This is because wait fires them all simultaneously then handles them as they all resolve while forEach fires them individually in order and waits for each one to resolve before going to the next one.
(Here is the usual disclaimer that asynchronous programming is not the same thing as parallel programming, and that Dart is an inherently single-threaded language so true parallelism cannot be achieved with futures alone.)
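The same behaviour can be sketched in JavaScript (an analogy, not Dart), where Promise.all plays the role of Future.wait and a sequential await loop plays the role of Future.forEach; the `delay` helper here is a stand-in for an HTTP request:

```javascript
// Helper: a promise that resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

async function concurrentWait() {
  const order = [];
  // All three delays start immediately; Promise.all just waits for them all.
  await Promise.all([
    delay(30, '1').then(v => order.push(v)),
    delay(20, '2').then(v => order.push(v)),
    delay(10, '3').then(v => order.push(v)),
  ]);
  return order; // completion order: shortest delay finishes first
}

async function sequentialWait() {
  const order = [];
  for (const [ms, v] of [[30, '1'], [20, '2'], [10, '3']]) {
    await delay(ms, v); // the next delay doesn't start until this one settles
    order.push(v);
  }
  return order; // creation order; total time is the sum of the delays
}
```

concurrentWait resolves to ['3', '2', '1'] after roughly the longest delay (~30 ms), while sequentialWait resolves to ['1', '2', '3'] after roughly the sum of the delays (~60 ms).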
If you check the implementation of the Future.wait method in:
https://github.com/dart-lang/sdk/blob/a75ffc89566a1353fb1a0f0c30eb805cc2e8d34c/sdk/lib/async/future.dart#L354-L443
You can actually see that it goes through each Future in the input list and calls its then method, registering a callback to be run when that Future completes:
for (var future in futures) {
  int pos = remaining;
  future.then((T value) {
    ...
You could therefore say that all Future instances are "started" at the same time since each completed Future will end up checking if there are more futures to wait for.
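The counting scheme the SDK snippet above uses can be sketched in a few lines of JavaScript (a hypothetical `waitAll` helper, not the real Dart code): register a completion callback on every future up front, and resolve once a `remaining` counter reaches zero:

```javascript
// Minimal sketch of a Future.wait-style combinator: all inputs get their
// completion callback attached immediately, so none has to wait for another.
function waitAll(promises) {
  return new Promise((resolve, reject) => {
    const results = new Array(promises.length);
    let remaining = promises.length;
    if (remaining === 0) { resolve(results); return; }
    promises.forEach((p, pos) => {
      Promise.resolve(p).then(value => {
        results[pos] = value;              // slot preserved regardless of finish order
        if (--remaining === 0) resolve(results);
      }, reject);                          // first rejection fails the whole batch
    });
  });
}
```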
However, please note that Dart is single threaded with a job queue behind the scenes, so unless some of the work is done in Isolate instances, each job will end up running serially. More details about this can be found here:
https://medium.com/dartlang/dart-asynchronous-programming-isolates-and-event-loops-bffc3e296a6a
But yes, if you have a bunch of Futures which can be handled in any order, it makes sense to use the Future.wait method.
I have a function that looks like this:
fun <R> map(block: (T) -> R): Result<R> { ... }
and I'd like to make a suspending version:
suspend fun <R> mapAsync(block: suspend (T) -> R): Result<R> { ... }
The logic in both bodies is identical, but one suspends and one doesn't.
I don't want to duplicate this logic. The only way I've found to make this work is to have the map function call the mapAsync function and then wrap the result in runBlocking:
fun <R> map(block: (T) -> R): Result<R> =
runBlocking { mapAsync { block(it) } }
So I have two questions:
Is there any performance considerations in taking a "normal" function, passing it as a suspend parameter, then block until the result is done?
Based on what I've read, it sounds like the initial thread keeps "doing the work" inside the suspend block until it hits the first suspend point. Then, the continuation is put into the wait queue and the initial thread is free to perform other work.
However, in this case, there isn't any "real" suspend point because the actual function is just (T) -> R, though I don't know if the compiler can tell that.
I'm worried that this setup is actually utilizing another thread from the pool that is just notifying my first thread to wake up...
Is there a better way to have a suspend and non-suspend set of functions utilize the same code?
You have encountered the infamous "colored function" problem. The two worlds are indeed separate and, while you can add a superficial layer that unifies them, you can't get it at zero performance cost. This is so fundamental that, even assuming that your suspend block never actually suspends, and the wrapping layer leverages that assumption and doesn't even use runBlocking on it, you will still pay the price of "being ready to suspend". The price isn't huge, though: it means creating a small object per each suspend fun call that holds the data that would normally reside on the thread's native call stack. In your case only the outer block is suspendable, so that's just one such object.
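The same "price of being ready to suspend" can be illustrated in JavaScript (an analogy, since this answer is about Kotlin): a function marked async allocates and returns a Promise on every call, even if it never actually suspends, much like Kotlin allocating one continuation object per suspend fun call:

```javascript
// Two functions with identical logic; only the "color" differs.
function plainDouble(x) {
  return x * 2; // returns a plain value, nothing allocated
}

async function asyncDouble(x) {
  return x * 2; // same logic, but wrapped in a freshly allocated Promise
}

console.log(typeof plainDouble(21));             // "number"
console.log(asyncDouble(21) instanceof Promise); // true
```

Callers of asyncDouble must go through the asynchronous machinery (await or .then) even though nothing ever suspends, which is exactly the superficial unification cost described above.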
runBlocking runs the coroutine on the thread where you called it and it will finish synchronously on the same thread unless it suspends itself. Therefore your case where you'd have some synchronous code in a suspend block wouldn't suffer an additional performance hit from thread coordination.
If the coroutine does suspend itself, then there will have to be some external worker thread which will react to the event that allows the coroutine to resume, and there will have to be some coordination between that thread and your original runBlocking thread. This is a fundamental mechanism that's there with or without coroutines.
Your approach is correct: runBlocking was specifically designed to serve as a bridge between blocking and non-blocking code. From the documentation:
Runs new coroutine and blocks current thread interruptibly until its
completion. This function should not be used from coroutine. It is
designed to bridge regular blocking code to libraries that are written
in suspending style, to be used in main functions and in tests.
https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/run-blocking.html
Also further read:
https://github.com/Kotlin/kotlinx.coroutines/blob/master/docs/basics.md#bridging-blocking-and-non-blocking-worlds
And some interesting videos by Roman Elizarov:
https://youtu.be/_hfBv0a09Jc
https://youtu.be/a3agLJQ6vt8
I'm a bit confused about the difference between asynchronous calls and callbacks.
I read posts that teach about callbacks, but none of the answers address how they differ from asynchronous calls.
Are callbacks the same as lambda expressions?
Do callbacks run on a different thread?
Can anyone explain this in plain, simple English?
Very simply, a callback needn't be asynchronous.
http://docs.apigee.com/api-baas/asynchronous-vs-synchronous-calls
Synchronous:
If an API call is synchronous, it means that code execution will
block (or wait) for the API call to return before continuing. This
means that until a response is returned by the API, your application
will not execute any further, which could be perceived by the user as
latency or performance lag in your app. Making an API call
synchronously can be beneficial, however, if there is code in your app
that will only execute properly once the API response is received.
Asynchronous:
Asynchronous calls do not block (or wait) for the API call to return
from the server. Execution continues on in your program, and when the
call returns from the server, a "callback" function is executed.
In Java, C and C#, "callbacks" are usually synchronous (with respect to a "main event loop").
In Javascript, on the other hand, callbacks are usually asynchronous - you pass a function that will be invoked ... but other events will continue to be processed until the callback is invoked.
If you don't care what Javascript events occur in which order - great. Otherwise, one very powerful mechanism for managing asynchronous behavior in Javascript is to use "promises":
http://www.html5rocks.com/en/tutorials/es6/promises/
PS:
To answer your additional questions:
Yes, a callback may be a lambda - but it's not a requirement.
In Javascript, just about every callback will be an "anonymous function" (basically a "lambda expression").
Yes, callbacks may be invoked from a different thread - but it's certainly not a requirement.
Callbacks may also (and often do) spawn a thread (thus making themselves "asynchronous").
'Hope that helps
====================================================================
Hi, Again:
Q: #paulsm4 can you please elaborate with an example how the callback
and asynchronous call works in the execution flow? That will be
greatly helpful
First we need to agree on a definition for "callback". Here's a good one:
https://en.wikipedia.org/wiki/Callback_%28computer_programming%29
In computer programming, a callback is a piece of executable code that
is passed as an argument to other code, which is expected to call back
(execute) the argument at some convenient time. The invocation may be
immediate as in a synchronous callback, or it might happen at a later
time as in an asynchronous callback.
We must also define "synchronous" and "asynchronous". Basically, if a callback does all its work before returning to the caller, it's "synchronous". If it can return to the caller immediately after it's invoked, and the caller and the callback can work in parallel, then it's "asynchronous".
The problem with synchronous callbacks is they can appear to "hang". The problem with asynchronous callbacks is you can lose control of "ordering" - you can't necessarily guarantee that "A" will occur before "B".
Common examples of callbacks include:
a) a button press handler (each different "button" will have a different "response"). These are usually invoked "asynchronously" (by the GUI's main event loop).
b) a sort "compare" function (so a common "sort()" function can handle different data types). These are usually invoked "synchronously" (called directly by your program).
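Both flavours from the list above can be shown in a few lines of JavaScript, where (b)'s comparator runs synchronously inside sort(), while (a)'s handler is deferred by the event loop:

```javascript
const events = [];

// (b) synchronous callback: the comparator runs during the sort() call itself
const nums = [3, 1, 2];
nums.sort((a, b) => a - b);
events.push('sorted: ' + nums.join(','));

// (a) asynchronous callback: queued now, executed by the event loop later
setTimeout(() => events.push('timer fired'), 0);
events.push('after setTimeout returned');
// At this point events = ['sorted: 1,2,3', 'after setTimeout returned'];
// 'timer fired' is appended only after the current run-to-completion ends.
```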
A CONCRETE EXAMPLE:
a) I have a "C" language program with a "print()" function.
b) "print()" is designed to use one of three callbacks: "PrintHP()", "PrintCanon()" and "PrintPDF()".
c) "PrintPDF()" calls a library to render my data in PDF. It's synchronous - the program doesn't return back from "print()" until the .pdf rendering is complete. It usually goes pretty quickly, so there's no problem.
d) I've coded "PrintHP()" and "PrintCanon()" to spawn threads to do the I/O to the physical printer. "Print()" exits as soon as the thread is created; the actual "printing" goes on in parallel with program execution. These two callbacks are "asynchronous".
Q: Make sense? Does that help?
They are quite similar, but this is just my humble opinion.
When you use callbacks, you specify which method you should be called back on, and you rely on the code you call to actually call you back. You could specify your callback to end up anywhere, and you are not guaranteed to be called back.
In Asynchronous programming, the call stack should unwind to the starting position, just as in normal synchronous programming.
Caveat: I am specifically thinking of the C# await functionality as there are other async techniques.
I want to comment paulsm4 above, but I have no enough reputation, so I have to give another new answer.
According to Wikipedia, "a callback is a piece of executable code that is passed as an argument to other code, which is expected to call back (execute) the argument at some convenient time. The invocation may be immediate as in a synchronous callback, or it might happen at a later time as in an asynchronous callback." So the qualifiers "synchronous" and "asynchronous" apply to the callback, which is the key point. We often confuse them with an "asynchronous function", which is really the caller function. For example,
const xhr = new XMLHttpRequest();
xhr.addEventListener('loadend', () => {
  log.textContent = `${log.textContent}Finished with status: ${xhr.status}`;
});
xhr.open('GET', 'https://raw.githubusercontent.com/mdn/content/main/files/en-us/_wikihistory.json');
xhr.send();
here, xhr.send() is an asynchronous caller function, while the anonymous function defined in xhr.addEventListener is an asynchronous callback function.
For clarification, the following is a synchronous callback example:
function doOperation(callback) {
  const name = "world";
  callback(name);
}
function doStep(name) {
  console.log(`hello, ${name}`);
}
doOperation(doStep)
Then, let's answer the specific questions:
Are callbacks the same as lambda expressions?
A: No. A callback is just a function; it can be named or anonymous (a lambda expression).
Do callbacks run on a different thread?
A: If a callback is synchronous, it runs on the same thread as the caller. If it is asynchronous, it runs after the caller has moved on: either on another thread, or on the same thread via an event loop (as in JavaScript), so it does not block the caller's execution.
A call is Synchronous: It returns control to the caller when it's done.
A call is Async.: Otherwise.
I am reading a large XML file using XmlReader and am exploring potential performance improvements via async and pipelining. The following initial foray into the world of async shows that the async version (which for all intents and purposes is, at this point, equivalent to the synchronous version) is much slower. Why would this be? All I've done is wrap the "normal" code in an async block and call it with Async.RunSynchronously.
Code
open System
open System.IO.Compression // support assembly required + FileSystem
open System.Xml // support assembly required

let readerNormal (reader: XmlReader) =
    let temp = ResizeArray<string>()
    while reader.Read() do
        ()
    temp

let readerAsync1 (reader: XmlReader) =
    async {
        let temp = ResizeArray<string>()
        while reader.Read() do
            ()
        return temp
    }

let readerAsync2 (reader: XmlReader) =
    async {
        while reader.Read() do
            ()
    }

[<EntryPoint>]
let main argv =
    let path = @"C:\Temp\LargeTest1000.xlsx"
    use zipArchive = ZipFile.OpenRead path
    let sheetZipEntry = zipArchive.GetEntry(@"xl/worksheets/sheet1.xml")

    let stopwatch = System.Diagnostics.Stopwatch()
    stopwatch.Start()
    let sheetStream = sheetZipEntry.Open() // again
    use reader = XmlReader.Create(sheetStream)
    let temp1 = readerNormal reader
    stopwatch.Stop()
    printfn "%A" stopwatch.Elapsed

    System.GC.Collect()
    let stopwatch = System.Diagnostics.Stopwatch()
    stopwatch.Start()
    let sheetStream = sheetZipEntry.Open() // again
    use reader = XmlReader.Create(sheetStream)
    let temp1 = readerAsync1 reader |> Async.RunSynchronously
    stopwatch.Stop()
    printfn "%A" stopwatch.Elapsed

    System.GC.Collect()
    let stopwatch = System.Diagnostics.Stopwatch()
    stopwatch.Start()
    let sheetStream = sheetZipEntry.Open() // again
    use reader = XmlReader.Create(sheetStream)
    readerAsync2 reader |> Async.RunSynchronously
    stopwatch.Stop()
    printfn "%A" stopwatch.Elapsed

    printfn "DONE"
    System.Console.ReadLine() |> ignore
    0 // return an integer exit code
INFO
I am aware that the above async code does not do any actual asynchronous work; what I am trying to ascertain here is the overhead of simply making it async.
I don't expect it to go faster just because I've wrapped it in an async block. My question is the opposite: why the dramatic (IMHO) slowdown?
TIMINGS
A comment below correctly pointed out that I should provide timings for datasets of various sizes which is implicitly what had led me to be asking this question in the first instance.
The following are some times based on small vs large datasets. While the absolute values are not too meaningful, the relativities are interesting:
30 elements (small dataset)
Normal: 00:00:00.0006994
Async1: 00:00:00.0036529
Async2: 00:00:00.0014863
(A lot slower but presumably indicative of Async setup costs - this is as expected)
1.5 million elements
Normal: 00:00:01.5749734
Async1: 00:00:03.3942754
Async2: 00:00:03.3760785
(~ 2x slower. Surprised that the difference in timing is not amortized as the dataset gets bigger. If this is the case, then pipelining/parallelization can only improve performance here if you have more than two cores - to outweigh the overhead that I can't explain...)
There's no asynchronous work to do. In effect, all you get is the overheads and no benefits. async {} doesn't mean "everything in the braces suddenly becomes asynchronous". It simply means you have a simplified way of using asynchronous code - but you never call a single asynchronous function!
Additionally, "asynchronous" doesn't necessarily mean "parallel", and it doesn't necessarily involve multiple threads. For example, when you make an asynchronous request to read a file (which you're not doing here), it means that the OS is told what you want done, and how you should be notified when it is done. When you run code like this using RunSynchronously, you're simply blocking one thread while posting asynchronous file requests - a scenario pretty much identical to using synchronous file requests in the first place.
The moment you do RunSynchronously, you throw away any reason whatsoever to use asynchronous code in the first place. You're still using a single thread, you just blocked another thread at the same time - instead of saving on threads, you waste one, and add another to do the real work.
EDIT:
Okay, I've investigated with the minimal example, and I've got some observations.
The difference is absolutely brutal with a profiler on - the non-async version is somewhat slower (up to 2x), but the async version is just never ending. It seems as if a huge number of allocations is going on - and yet, when I break the profiler, I can see that the non-async version (running in 4 seconds) makes a hundred thousand allocations (~20 MiB), while the async version (running over 10 minutes) only makes mere thousands. Maybe the memory profiler interacts badly with F# async? The CPU time profiler doesn't have this problem.
The generated IL is very different for the two cases. Most importantly, even though our async code doesn't actually do anything asynchronous, it creates a ton of async builder helpers, sprinkles a ton of (asynchronous) Delay calls through the code, and going into outright absurd territory, each iteration of the loop is an extra method call, including the setup of a helper object.
Apparently, F# automatically translates while into an asynchronous while. Now, given how well compressed xlsx data usually is, very little I/O is involved in those Read operations, so the overhead absolutely dominates - and since every iteration of the "loop" has its own setup cost, the overhead scales with the amount of data.
While this is mostly caused by the while not actually doing anything, it also obviously means that you need to be careful about what you select as async, and you need to avoid using it in a case where CPU time dominates (as in this case - after all, both the async and non-async case are almost 100% CPU tasks in practice). This is further worsened by the fact that Read reads a single node at a time - something that's relatively trivial even in a big, non-compressed xml file. The overheads absolutely dominate. In effect, this is analogous to using Parallel.For with a body like sum += i - the setup cost of each of the iterations absolutely dwarfs any actual work being done.
The CPU profiling makes this rather obvious - the two most work intensive methods are:
XmlReader.Read (expected)
Thread::intermediateThreadProc - also known as "this code runs on a thread pool thread". The overhead from this in no-op code like this is around 100% - yikes. Apparently, even though there is no real asynchronicity anywhere, the callbacks are never run synchronously. Every iteration of the loop posts work to a new thread pool thread.
The lesson learned? Probably something like "don't use loops in async if the loop body does very little work". The overhead is incurred for each and every iteration of the loop. Ouch.
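The same lesson can be sketched in JavaScript (an analogy, not the F# code): awaiting inside a hot loop schedules at least one microtask per iteration, so with a trivial loop body the scheduling overhead dominates the actual work:

```javascript
// Plain loop: the body is a single addition, no per-iteration scheduling.
function sumPlain(n) {
  let s = 0;
  for (let i = 0; i < n; i++) s += i;
  return s;
}

// Awaited loop: identical result, but each iteration takes a round trip
// through the microtask queue, analogous to the async-while overhead above.
async function sumAwaited(n) {
  let s = 0;
  for (let i = 0; i < n; i++) {
    s += await Promise.resolve(i); // one microtask per iteration
  }
  return s;
}
```

Both compute the same sum; timing the two versions on a large n shows the awaited one paying a fixed scheduling cost per iteration, which is exactly why the overhead scales with the amount of data rather than being amortized.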
Asynchronous code doesn't magically make your code faster. As you've discovered, it'll tend to make isolated code slower, because there's overhead involved with managing the asynchrony.
What it can do is to be more efficient, but that's not the same as being inherently faster. The main purpose of Async is to make Input/Output code more efficient.
If you invoke a 'slow', blocking I/O operation directly, you'll block the thread until the operation returns.
If you instead invoke that slow operation asynchronously, it may free up the thread to do other things. It does require that there's an underlying implementation that's not thread-bound, but uses another mechanism for receiving the response. I/O Completion Ports could be such a mechanism.
Now, if you run a lot of asynchronous code in parallel, it may turn out to be faster than attempting to run the blocking implementation in parallel, because the async versions use fewer resources (fewer threads = less memory).