Why is the Async version slower than the single-threaded version?

I am reading a large XML file using XmlReader and am exploring potential performance improvements via Async & pipelining. The following initial foray into the world of Async is showing that the Async version (which for all intents and purposes at this point is the equivalent of the Synchronous version) is much slower. Why would this be? All I've done is wrap the "normal" code in an async block and call it with Async.RunSynchronously.
Code
open System
open System.IO.Compression // support assembly required + FileSystem
open System.Xml // support assembly required
let readerNormal (reader:XmlReader) =
    let temp = ResizeArray<string>()
    while reader.Read() do
        ()
    temp
let readerAsync1 (reader:XmlReader) =
    async{
        let temp = ResizeArray<string>()
        while reader.Read() do
            ()
        return temp
    }
let readerAsync2 (reader:XmlReader) =
    async{
        while reader.Read() do
            ()
    }
[<EntryPoint>]
let main argv =
    let path = @"C:\Temp\LargeTest1000.xlsx"
    use zipArchive = ZipFile.OpenRead path
    let sheetZipEntry = zipArchive.GetEntry(@"xl/worksheets/sheet1.xml")
    let stopwatch = System.Diagnostics.Stopwatch()
    stopwatch.Start()
    let sheetStream = sheetZipEntry.Open() // again
    use reader = XmlReader.Create(sheetStream)
    let temp1 = readerNormal reader
    stopwatch.Stop()
    printfn "%A" stopwatch.Elapsed
    System.GC.Collect()
    let stopwatch = System.Diagnostics.Stopwatch()
    stopwatch.Start()
    let sheetStream = sheetZipEntry.Open() // again
    use reader = XmlReader.Create(sheetStream)
    let temp1 = readerAsync1 reader |> Async.RunSynchronously
    stopwatch.Stop()
    printfn "%A" stopwatch.Elapsed
    System.GC.Collect()
    let stopwatch = System.Diagnostics.Stopwatch()
    stopwatch.Start()
    let sheetStream = sheetZipEntry.Open() // again
    use reader = XmlReader.Create(sheetStream)
    readerAsync2 reader |> Async.RunSynchronously
    stopwatch.Stop()
    printfn "%A" stopwatch.Elapsed
    printfn "DONE"
    System.Console.ReadLine() |> ignore
    0 // return an integer exit code
INFO
I am aware that the above Async code does not do any actual Async work - what I am trying to ascertain here is the overhead of simply making it Async.
I don't expect it to go faster just because I've wrapped it in an Async. My question is the opposite: why the dramatic (IMHO) slowdown.
TIMINGS
A comment below correctly pointed out that I should provide timings for datasets of various sizes which is implicitly what had led me to be asking this question in the first instance.
The following are some times based on small vs large datasets. While the absolute values are not too meaningful, the relativities are interesting:
30 elements (small dataset)
Normal: 00:00:00.0006994
Async1: 00:00:00.0036529
Async2: 00:00:00.0014863
(A lot slower but presumably indicative of Async setup costs - this is as expected)
1.5 million elements
Normal: 00:00:01.5749734
Async1: 00:00:03.3942754
Async2: 00:00:03.3760785
(~ 2x slower. Surprised that the difference in timing is not amortized as the dataset gets bigger. If this is the case, then pipelining/parallelization can only improve performance here if you have more than two cores - to outweigh the overhead that I can't explain...)

There's no asynchronous work to do. In effect, all you get is the overheads and no benefits. async {} doesn't mean "everything in the braces suddenly becomes asynchronous". It simply means you have a simplified way of using asynchronous code - but you never call a single asynchronous function!
Additionally, "asynchronous" doesn't necessarily mean "parallel", and it doesn't necessarily involve multiple threads. For example, when you do an asynchronous request to read a file (which you're not doing here), it means that the OS is told what you want to be done, and how you should be notified when it is done. When you run code like this using RunSynchronously, you're simply blocking one thread while posting asynchronous file requests - a scenario pretty much identical to using synchronous file requests in the first place.
The moment you do RunSynchronously, you throw away any reason whatsoever to use asynchronous code in the first place. You're still using a single thread, you just blocked another thread at the same time - instead of saving on threads, you waste one, and add another to do the real work.
EDIT:
Okay, I've investigated with the minimal example, and I've got some observations.
The difference is absolutely brutal with a profiler on - the non-async version is somewhat slower (up to 2x), but the async version is just never ending. It seems as if a huge number of allocations is going on - and yet, when I break the profiler, I can see that the non-async version (running in 4 seconds) makes a hundred thousand allocations (~20 MiB), while the async version (running over 10 minutes) only makes mere thousands. Maybe the memory profiler interacts badly with F# async? The CPU time profiler doesn't have this problem.
The generated IL is very different for the two cases. Most importantly, even though our async code doesn't actually do anything asynchronous, it creates a ton of async builder helpers, sprinkles a ton of (asynchronous) Delay calls through the code, and going into outright absurd territory, each iteration of the loop is an extra method call, including the setup of a helper object.
Apparently, F# automatically translates while into an asynchronous while. Now, given how well compressed xlsx data usually is, very little I/O is involved in those Read operations, so the overhead absolutely dominates - and since every iteration of the "loop" has its own setup cost, the overhead scales with the amount of data.
While this is mostly caused by the while not actually doing anything, it also obviously means that you need to be careful about what you select as async, and you need to avoid using it in a case where CPU time dominates (as in this case - after all, both the async and non-async case are almost 100% CPU tasks in practice). This is further worsened by the fact that Read reads a single node at a time - something that's relatively trivial even in a big, non-compressed xml file. The overheads absolutely dominate. In effect, this is analogous to using Parallel.For with a body like sum += i - the setup cost of each of the iterations absolutely dwarfs any actual work being done.
The CPU profiling makes this rather obvious - the two most work intensive methods are:
XmlReader.Read (expected)
Thread::intermediateThreadProc - also known as "this code runs on a thread pool thread". The overhead from this in a no-op code like this is around 100% - yikes. Apparently, even though there is no real asynchronicity anywhere, the callbacks are never run synchronously. Every iteration of the loop posts work to a new thread pool thread.
The lesson learned? Probably something like "don't use loops in async if the loop body does very little work". The overhead is incurred for each and every iteration of the loop. Ouch.

Asynchronous code doesn't magically make your code faster. As you've discovered, it'll tend to make isolated code slower, because there's overhead involved with managing the asynchrony.
What it can do is to be more efficient, but that's not the same as being inherently faster. The main purpose of Async is to make Input/Output code more efficient.
If you invoke a 'slow', blocking I/O operation directly, you'll block the thread until the operation returns.
If you instead invoke that slow operation asynchronously, it may free up the thread to do other things. It does require that there's an underlying implementation that's not thread-bound, but uses another mechanism for receiving the response. I/O Completion Ports could be such a mechanism.
Now, if you run a lot of asynchronous code in parallel, it may turn out to be faster than attempting to run the blocking implementation in parallel, because the async versions use fewer resources (fewer threads = less memory).

Related

How can I make CUDA return control after kernel launch?

It might be a stupid question but is there a way to return asynchronously from a kernel? For example, I have this kernel which does a first stream compaction which is outputted to the user but before it must do a second stream compaction to update its internal structure.
Is there a way to return control to the user after the first stream compaction is done, while the GPU continues its second stream compaction in the background? Of course, the second stream compaction works only on shared memory and global memory, and produces nothing the user needs to retrieve.
I can't use thrust.
A GPU kernel does not, in itself, take control from the "user", i.e. from CPU threads on the system with the GPU.
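However, with CUDA's runtime, the default way to invoke a GPU kernel enqueues it on the default stream; the launch call itself returns without waiting, but later blocking calls (such as cudaMemcpy or cudaDeviceSynchronize) make your thread wait until the kernel's execution concludes: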
my_kernel<<<my_grid_dims,my_block_dims,dynamic_shared_memory_size>>>(args,go,here);
but you can also use streams. These are hardware-supported execution queues on which you can enqueue work (memory copying, kernel execution etc.) asynchronously, just like you asked.
Your launch in this case may look like:
cudaStream_t my_stream;
cudaError_t result = cudaStreamCreateWithFlags(&my_stream, cudaStreamNonBlocking);
if (result != cudaSuccess) { /* error handling */ }
my_kernel<<<my_grid_dims,my_block_dims,dynamic_shared_memory_size,my_stream>>>(args,go,here);
There are lots of resources on using streams; try this blog post for starters. The CUDA programming guide has a large section on asynchronous execution.
Streams and various libraries
Thrust has offered asynchronous functionality for a while, using thrust::future and other constructs. See here.
My own Modern-C++ CUDA API wrappers make it somewhat easier to work with streams, relieving you of the need to check for errors all the time and to remember to destroy streams and release memory before it goes out of scope. See this example; the syntax looks something like this:
auto stream = device.create_stream(cuda::stream::async);
stream.enqueue.copy(d_a.get(), a.get(), nbytes);
stream.enqueue.kernel_launch(my_kernel, launch_config, d_a.get(), more, args);
(and errors throw an exception)

Efficiently connecting an asynchronous IMFSourceReader to a synchronous IMFTransform

Given an asynchronous IMFSourceReader connected to a synchronous-only IMFTransform:
In the IMFSourceReaderCallback::OnReadSample() callback, is it a good idea not to call IMFTransform::ProcessInput directly within OnReadSample, but instead to push the produced sample onto another queue for another thread to call the transform's ProcessInput on?
Or would I just be replicating identical work source readers typically do internally? Or put another way does work within OnReadSample run the risk of blocking any further decoding work within the source reader that could have otherwise happened more asynchronously?
So I am suggesting something like:
WorkQueue transformInputs;
...
// Called back async
HRESULT OnReadSampleCallback(... IMFSample* sample)
{
    // Push sample and return immediately
    Push(transformInputs, sample);
    return S_OK;
}
// Different worker thread awoken for transformInputs queue samples
void OnTransformInputWork()
{
    // Transform object is not async capable
    transform->ProcessInput(0, Pop(transformInputs), 0);
    ...
}
This is touched on, but not elaborated on here 'Implementing the Callback Interface':
https://learn.microsoft.com/en-us/windows/win32/medfound/using-the-source-reader-in-asynchronous-mode
Or is it completely dependent on whatever the source reader sets up internally and not easily determined?
It is not a good idea to perform a long blocking operation in IMFSourceReaderCallback::OnReadSample. Nothing is going to be fatal or serious but this is not the intended usage.
Taking into consideration your previous question about audio format conversion though, audio sample data conversion is fast enough to happen on such callback.
Also, although this is not documented (it depends on the actual implementation), ProcessInput is often instant and only references the input data. ProcessOutput would be the computationally expensive call in this case. If you don't call ProcessOutput right there in the same callback, you might run into a situation where the MFT is no longer accepting input, and so you'd have to implement a queue anyway.
With all this in mind, you would just do the processing in the callback and accept the performance impact, assuming your processing is not too heavy; otherwise you would introduce the queue.
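If the queue does turn out to be necessary, the hand-off between the callback and a worker thread could look roughly like the sketch below. It is a minimal illustration only: SampleQueue, run_pipeline and the use of int in place of an IMFSample pointer are all hypothetical stand-ins, and no Media Foundation calls are made.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical thread-safe queue; int stands in for an IMFSample pointer.
class SampleQueue {
public:
    // Called from the callback thread: enqueue and return immediately.
    void push(int sample) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(sample);
        }
        ready_.notify_one();
    }

    // Called once when no more samples will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Called from the worker thread: sleeps until a sample arrives.
    // Returns false once the queue is closed and fully drained.
    bool pop(int& sample) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        sample = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
    bool closed_ = false;
};

// Feeds `count` fake samples through the queue and returns what the worker
// "transformed" (here: doubled), in processing order.
std::vector<int> run_pipeline(int count) {
    SampleQueue queue;
    std::vector<int> processed;

    std::thread worker([&] {
        int sample = 0;
        while (queue.pop(sample)) {
            processed.push_back(sample * 2);  // stand-in for the transform work
        }
    });

    for (int i = 0; i < count; ++i) {
        queue.push(i);  // stand-in for the body of the OnReadSample callback
    }
    queue.close();
    worker.join();
    return processed;
}
```

With a single worker the samples are processed in arrival order, so the callback only pays for a lock and a push.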

A MailboxProcessor that operates with a LIFO logic

I am learning about F# agents (MailboxProcessor).
I am dealing with a rather unconventional problem.
I have one agent (dataSource) which is a source of streaming data. The data has to be processed by an array of agents (dataProcessor). We can consider dataProcessor as some sort of tracking device.
Data may flow in faster than the speed with which the dataProcessor may be able to process its input.
It is OK to have some delay. However, I have to ensure that the agent stays on top of its work and does not get piled under obsolete observations.
I am exploring ways to deal with this problem.
The first idea is to implement a stack (LIFO) in dataSource. dataSource would send over the latest observation available when dataProcessor becomes available to receive and process the data. This solution may work but it may get complicated as dataProcessor may need to be blocked and re-activated, and communicate its status to dataSource, leading to a two-way communication problem. This problem may boil down to a blocking queue in the consumer-producer problem, but I am not sure.
The second idea is to have dataProcessor take care of message sorting. In this architecture, dataSource will simply post updates in dataProcessor's queue. dataProcessor will use Scan to fetch the latest data available in its queue. This may be the way to go. However, I am not sure if in the current design of MailboxProcessor it is possible to clear a queue of messages, deleting the older obsolete ones. Furthermore, here, it is written that:
Unfortunately, the TryScan function in the current version of F# is
broken in two ways. Firstly, the whole point is to specify a timeout
but the implementation does not actually honor it. Specifically,
irrelevant messages reset the timer. Secondly, as with the other Scan
function, the message queue is examined under a lock that prevents any
other threads from posting for the duration of the scan, which can be
an arbitrarily long time. Consequently, the TryScan function itself
tends to lock-up concurrent systems and can even introduce deadlocks
because the caller's code is evaluated inside the lock (e.g. posting
from the function argument to Scan or TryScan can deadlock the agent
when the code under the lock blocks waiting to acquire the lock it is
already under).
Having the latest observation bounced back may be a problem.
The author of this post, @Jon Harrop, suggests that:
I managed to architect around it and the resulting architecture was actually better. In essence, I eagerly Receive all messages and filter using my own local queue.
This idea is surely worth exploring but, before starting to play around with code, I would welcome some inputs on how I could structure my solution.
Thank you.
Sounds like you might need a destructive scan version of the mailbox processor. I implemented this with TPL Dataflow in a blog series that you might be interested in.
My blog is currently down for maintenance but I can point you to the posts in markdown format.
Part1
Part2
Part3
You can also check out the code on github
I also wrote about the issues with scan in my lurking horror post
Hope that helps...
tl;dr I would try this: take Mailbox implementation from FSharp.Actor or Zach Bray's blog post, replace ConcurrentQueue by ConcurrentStack (plus add some bounded capacity logic) and use this changed agent as a dispatcher to pass messages from dataSource to an army of dataProcessors implemented as ordinary MBPs or Actors.
tl;dr2 If workers are a scarce and slow resource and we need to process a message that is the latest at the moment when a worker is ready, then it all boils down to an agent with a stack instead of a queue (with some bounded capacity logic) plus a BlockingQueue of workers. The dispatcher dequeues a ready worker, then pops a message from the stack and sends this message to the worker. After the job is done, the worker enqueues itself to the queue when it becomes ready (e.g. before let! msg = inbox.Receive()). The dispatcher's consumer thread then blocks until any worker is ready, while the producer thread keeps the bounded stack updated. (A bounded stack could be done with an array + offset + size inside a lock; the design below is more complex than that.)
Details
MailBoxProcessor is designed to have only one consumer. This is even commented in the source code of MBP here (search for the word 'DRAGONS' :) )
If you post your data to MBP then only one thread could take it from internal queue or stack.
In your particular use case I would use ConcurrentStack directly or, better, wrapped into BlockingCollection:
It will allow many concurrent consumers
It is very fast and thread safe
BlockingCollection has a BoundedCapacity property that allows you to limit the size of a collection. Add blocks while the collection is full, but you could use TryAdd, which returns false instead. If A is a main stack and B is a standby, then TryAdd to A; on false, Add to B and swap the two with Interlocked.Exchange, then process the needed messages in A, clear it, and make a new standby (or use three stacks if processing A could take longer than B takes to become full again). In this way you do not block and do not lose any messages, but can discard unneeded ones in a controlled way.
BlockingCollection has methods like AddToAny/TakeFromAny, which work on arrays of BlockingCollections. This could help, e.g.:
dataSource produces messages to a BlockingCollection with ConcurrentStack implementation (BCCS)
another thread consumes messages from BCCS and sends them to an array of processing BCCSs. You said that there is a lot of data. You may sacrifice one thread to be blocking and dispatching your messages indefinitely
each processing agent has its own BCCS or implemented as an Agent/Actor/MBP to which the dispatcher posts messages. In your case you need to send a message to only one processorAgent, so you may store processing agents in a circular buffer to always dispatch a message to least recently used processor.
Something like this:
(data stream produces 'T)
            |
    [dispatcher's BCCS]
            |
(a dispatcher thread consumes 'T and pushes to processors, manages capacity of BCCS and LRU queue)
      |                                   |
[processor1's BCCS/Actor/MBP]  ...  [processorN's BCCS/Actor/MBP]
      |                                   |
  (process)                           (process)
Instead of ConcurrentStack, you may want to read about heap data structure. If you need your latest messages by some property of messages, e.g. timestamp, rather than by the order in which they arrive to the stack (e.g. if there could be delays in transit and arrival order <> creation order), you can get the latest message by using heap.
If you still need Agents semantics/API, you could read several sources in addition to Dave's links, and somehow adapt the implementation to multiple concurrent consumers:
An interesting article by Zach Bray on an efficient Actors implementation. There you do need to replace (under the comment // Might want to schedule this call on another thread.) the line execute true with a line async { execute true } |> Async.Start or similar, because otherwise the producing thread will be the consuming thread, which is not good for a single fast producer. However, for a dispatcher like the one described above this is exactly what is needed.
The FSharp.Actor (aka Fakka) development branch and the FSharp MBP source code (first link above) could be very useful for implementation details. The FSharp.Actor library has been in a freeze for several months, but there is some activity in the dev branch.
Should not miss discussion about Fakka in Google Groups in this context.
I have a somewhat similar use case and for the last two days I have researched everything I could find on the F# Agents/Actors. This answer is a kind of TODO for myself to try these ideas, of which half were born during writing it.
The simplest solution is to greedily eat all messages in the inbox when one arrives and discard all but the most recent. Easily done using TryReceive:
let rec readLatestLoop oldMsg =
    async { let! newMsg = inbox.TryReceive 0
            match newMsg with
            | None -> return oldMsg
            | Some newMsg -> return! readLatestLoop newMsg }
let readLatest() =
    async { let! msg = inbox.Receive()
            return! readLatestLoop msg }
When faced with the same problem I architected a more sophisticated and efficient solution that I called cancellable streaming, and described it in an F# Journal article here. The idea is to start processing messages and then cancel that processing if they are superseded. This significantly improves concurrency if significant processing is being done.
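The greedy "keep only the most recent message" idea does not depend on MailboxProcessor. As a rough illustration in C++ rather than F#, the sketch below uses a hypothetical LatestMailbox class (not part of any library mentioned above) whose receive_latest waits for at least one message, returns the newest one, and discards the rest.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical mailbox that hands the consumer only the newest message.
template <typename T>
class LatestMailbox {
public:
    // Producer side: never blocks on the consumer.
    void post(T message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(message));
        }
        arrived_.notify_one();
    }

    // Consumer side: blocks until at least one message is pending, then
    // returns the most recent one and discards everything older.
    // If `dropped` is non-null it receives the number of discarded messages.
    T receive_latest(std::size_t* dropped = nullptr) {
        std::unique_lock<std::mutex> lock(mutex_);
        arrived_.wait(lock, [this] { return !pending_.empty(); });
        T latest = std::move(pending_.back());
        if (dropped) *dropped = pending_.size() - 1;
        pending_.clear();
        return latest;
    }

private:
    std::mutex mutex_;
    std::condition_variable arrived_;
    std::deque<T> pending_;
};
```

A slow processor calling receive_latest in a loop always works on the freshest observation, and the dropped count gives a cheap way to monitor how far behind it is falling.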

Main loop in event-driven programming and alternatives

To the best of my knowledge, event-driven programs require a main loop such as
while (1) {
}
I am just curious if this while loop can cost a high CPU usage? Is there any other way to implement event-driven programs without using the main loop?
Your example is misleading. Usually, an event loop looks something like this:
Event e;
while ((e = get_next_event()) != E_QUIT)
{
    handle(e);
}
The crucial point is that the function call to our fictitious get_next_event() pumping function will be generous and encourage a context switch or whatever scheduling semantics apply to your platform, and if there are no events, the function would probably allow the entire process to sleep until an event arrives.
So in practice there's nothing to worry about, and no, there's not really any alternative to an unbounded loop if you want to process an unbounded amount of information during your program's runtime.
Usually, the problem with a loop like this is that while it's doing one piece of work, it can't be doing anything else (e.g. Windows SDK's old 'cooperative' multitasking). The next naive jump up from this is generally to spawn a thread for each piece of work, but that's incredibly dangerous. Most people would end up with an executor that generally has a thread pool inside. Then, the handle call is actually just enqueueing the work and the next available thread dequeues it and executes it. The number of concurrent threads remains fixed as the total number of worker threads in the pool and when threads don't have anything to do, they are not eating CPU.
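The executor described above can be sketched as a small fixed-size thread pool. This is an illustrative sketch only: ThreadPool, enqueue and sum_with_pool are hypothetical names, and the tasks are deliberately trivial, so in a real program their scheduling overhead would outweigh the work.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: enqueue() only queues the work, and idle
// workers sleep on a condition variable instead of spinning.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t worker_count) {
        for (std::size_t i = 0; i < worker_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers finish all queued tasks, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Sums 1..n by handing each addition to the pool as a separate task.
int sum_with_pool(int n, std::size_t worker_count) {
    std::atomic<int> total{0};
    {
        ThreadPool pool(worker_count);
        for (int i = 1; i <= n; ++i) {
            pool.enqueue([&total, i] { total += i; });
        }
    }  // the destructor drains the queue and joins the workers
    return total.load();
}
```

The number of threads stays fixed at worker_count no matter how many tasks are queued, which is the property that makes this safer than spawning one thread per piece of work.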

asynchronous and non-blocking calls? also between blocking and synchronous

What is the difference between asynchronous and non-blocking calls? Also between blocking and synchronous calls (with examples please)?
In many circumstances they are different names for the same thing, but in some contexts they are quite different. So it depends. Terminology is not applied in a totally consistent way across the whole software industry.
For example in the classic sockets API, a non-blocking socket is one that simply returns immediately with a special "would block" error message, whereas a blocking socket would have blocked. You have to use a separate function such as select or poll to find out when is a good time to retry.
But asynchronous sockets (as supported by Windows sockets), or the asynchronous IO pattern used in .NET, are more convenient. You call a method to start an operation, and the framework calls you back when it's done. Even here, there are basic differences. Asynchronous Win32 sockets "marshal" their results onto a specific GUI thread by passing Window messages, whereas .NET asynchronous IO is free-threaded (you don't know what thread your callback will be called on).
So they don't always mean the same thing. To distil the socket example, we could say:
Blocking and synchronous mean the same thing: you call the API, it hangs up the thread until it has some kind of answer and returns it to you.
Non-blocking means that if an answer can't be returned rapidly, the API returns immediately with an error and does nothing else. So there must be some related way to query whether the API is ready to be called (that is, to simulate a wait in an efficient way, to avoid manual polling in a tight loop).
Asynchronous means that the API always returns immediately, having started a "background" effort to fulfil your request, so there must be some related way to obtain the result.
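The three styles in this list can be put side by side in a small sketch. Everything here is hypothetical and chosen for illustration: slow_answer stands in for the slow API, and the standard library's std::future and std::thread stand in for whatever mechanism a real socket or I/O framework would use.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical slow operation shared by all three styles.
int slow_answer() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 42;
}

// Blocking / synchronous: the caller's thread waits inside the call.
int call_blocking() {
    return slow_answer();
}

// Non-blocking: each readiness check returns immediately, so the caller has
// to poll and is free to do other work between checks.
int call_non_blocking(int& polls) {
    std::future<int> result = std::async(std::launch::async, slow_answer);
    polls = 0;
    while (result.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
        ++polls;  // "would block": do something else, then retry
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    return result.get();
}

// Asynchronous: returns immediately; on_done is invoked on a background
// thread once the answer exists. The caller must join the returned thread.
std::thread call_async(std::function<void(int)> on_done) {
    return std::thread([on_done] { on_done(slow_answer()); });
}
```

In the asynchronous case the caller never asks whether the result is ready; it is told, which is the difference from the polling loop in call_non_blocking.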
synchronous / asynchronous is to describe the relation between two modules.
blocking / non-blocking is to describe the situation of one module.
An example:
Module X: "I".
Module Y: "bookstore".
X asks Y: do you have a book named "c++ primer"?
blocking: before Y answers X, X keeps waiting there for the answer. Now X (one module) is blocking. X and Y are two threads or two processes or one thread or one process? we DON'T know.
non-blocking: before Y answers X, X just leaves and does other things. X may come back every two minutes to check if Y has finished its job? Or X won't come back until Y calls him? We don't know. We only know that X can do other things before Y finishes its job. Here X (one module) is non-blocking. X and Y are two threads or two processes or one process? we DON'T know. BUT we are sure that X and Y couldn't be one thread.
synchronous: before Y answers X, X keeps waiting there for the answer. It means that X can't continue until Y finishes its job. Now we say: X and Y (two modules) are synchronous. X and Y are two threads or two processes or one thread or one process? we DON'T know.
asynchronous: before Y answers X, X leaves there and X can do other jobs. X won't come back until Y calls him. Now we say: X and Y (two modules) are asynchronous. X and Y are two threads or two processes or one process? we DON'T know. BUT we are sure that X and Y couldn't be one thread.
Please pay attention to the sentences above that say when X comes back. Why does the non-blocking description (2) contain two cases, whereas the asynchronous description (4) contains only one? This is a key to the difference between non-blocking and asynchronous.
Let me try to explain the four words with another way:
blocking: OMG, I'm frozen! I can't move! I have to wait for that specific event to happen. If that happens, I would be saved!
non-blocking: I was told that I had to wait for that specific event to happen. OK, I understand and I promise that I would wait for that. But while waiting, I can still do some other things, I'm not frozen, I'm still alive, I can jump, I can walk, I can sing a song etc.
synchronous: My mom is gonna cook, she sends me to buy some meat. I just said to my mom: We are synchronous! I'm so sorry but you have to wait even if I might need 100 years to get some meat back...
asynchronous: We will make a pizza, we need tomato and cheese. Now I say: Let's go shopping. I'll buy some tomatoes and you will buy some cheese. We needn't wait for each other because we are asynchronous.
Here is a typical example about non-blocking & synchronous:
// thread X
while (true)
{
    msg = recv(Y, NON_BLOCKING_FLAG);
    if (msg is not empty)
    {
        break;
    }
    else
    {
        sleep(2000); // 2 sec
    }
}
// thread Y
// prepare the book for X
send(X, book);
You can see that this design is non-blocking, whereas X and Y (two modules) are synchronous, because X can't continue to do anything else (X can't jump out of the loop) until it gets the book from Y. (You could say that most of the time this loop does nothing useful, but in the CPU's eyes X is running, which means that X is non-blocking. If you want, you can replace sleep(2000) with any other code.)
Normally in this case, making X blocking is much better, because the non-blocking version spends a lot of resources on a pointless loop. But this example helps to illustrate the fact that non-blocking doesn't mean asynchronous.
The four words easily confuse us; what we should remember is that they serve the design of an architecture. Learning how to design a good architecture is the only way to distinguish them.
For example, we may design such a kind of architecture:
// Module X = Module X1 + Module X2
// Module X1
while (true)
{
    msg = recv(many_other_modules, NON_BLOCKING_FLAG);
    if (msg is not null)
    {
        if (msg == "done")
        {
            break;
        }
        // create a thread to process msg
    }
    else
    {
        sleep(2000); // 2 sec
    }
}
// Module X2
broadcast("I got the book from Y");
// Module Y
// prepare the book for X
send(X, book);
In the example here, we can say that
X1 is non-blocking
X1 and X2 are synchronous
X and Y are asynchronous
If you need, you can also describe those threads created in X1 with the four words.
One more time: the four words serve the design of an architecture. So what we need is to make a proper architecture, instead of distinguishing the four words like a language lawyer. If you run into cases where you can't distinguish the four words very clearly, forget about them and use your own words to describe your architecture.
So the more important things are: when do we use synchronous instead of asynchronous? when do we use blocking instead of non-blocking? Is making X1 blocking better than non-blocking? Is making X and Y synchronous better than asynchronous? Why is Nginx non-blocking? Why is Apache blocking? These questions are what you must figure out.
To make a good choice, you must analyze your needs and test the performance of different architectures. There is no single architecture that suits every need.
Asynchronous refers to something done in parallel, say in another thread.
Non-blocking often refers to polling, i.e. checking whether given condition holds (socket is readable, device has more data, etc.)
Synchronous is defined as happening at the same time (in predictable timing, or in predictable ordering).
Asynchronous is defined as not happening at the same time. (with unpredictable timing or with unpredictable ordering).
This is what causes the first confusion, which is that asynchronous is some sort of synchronization scheme, and yes it is used to mean that, but in actuality it describes processes that are happening unpredictably with regards to when or in what order they run. And such events often need to be synchronized in order to make them behave correctly, where multiple synchronization schemes exists to do so, one of those called blocking, another called non-blocking, and yet another one confusingly called asynchronous.
So you see, the whole problem is about finding a way to synchronize an asynchronous behavior, because you've got some operation that needs the response of another before it can begin. Thus it's a coordination problem, how will you know that you can now start that operation?
The simplest solution is known as blocking.
Blocking is when you simply choose to wait for the other thing to be done and return you a response before moving on to the operation that needed it.
Say you need to put butter on toast, and thus you first need to toast the bread. The way you'd coordinate them is that you'd first toast the bread, then stare endlessly at the toaster until it pops the toast, and then you'd proceed to put butter on it.
It's the simplest solution, and works very well. There's no real reason not to use it, unless you happen to also have other things you need to be doing which don't require coordination with the operations. For example, doing some dishes. Why wait idle staring at the toaster constantly for the toast to pop, when you know it'll take a bit of time, and you could wash a whole dish while it finishes?
That's where two other solutions known respectively as non-blocking and asynchronous come into play.
Non-blocking is when you choose to do other unrelated things while you wait for the operation to be done. Checking back on the availability of the response as you see fit.
So instead of staring at the toaster waiting for it to pop, you go and wash a whole dish. Then you peek at the toaster to see if the toast has popped. If it hasn't, you go wash another dish, checking back at the toaster between each dish. When you see the toast has popped, you stop washing the dishes, take the toast, and move on to putting butter on it.
Having to constantly check on the toast can be annoying though. Imagine the toaster is in another room: in between dishes you waste your time going to that other room to check on the toast.
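The polling loop could look roughly like this in Java. This is only a sketch: PollingToast, pause, and the timings are invented for the example, and isDone() plays the role of peeking at the toaster.

```java
import java.util.concurrent.CompletableFuture;

class PollingToast {
    // Hypothetical helper: sleep without forcing callers to handle InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static String makeBreakfast() {
        CompletableFuture<String> toaster = CompletableFuture.supplyAsync(() -> {
            pause(50); // toasting takes a while
            return "toast";
        });
        int dishes = 0;
        // isDone() returns immediately: we peek at the toaster, never wait on it.
        while (!toaster.isDone()) {
            pause(5); // wash one dish between checks
            dishes++;
        }
        // The future is already complete here, so join() does not block.
        return "buttered " + toaster.join() + " after " + dishes + " dishes";
    }

    public static void main(String[] args) {
        System.out.println(makeBreakfast());
    }
}
```

The number of dishes washed varies from run to run, since it depends on how the two activities happen to be scheduled.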
Here comes asynchronous.
Asynchronous is when you choose to do other unrelated things while you wait for the operation to be done. Instead of checking on it though, you delegate the work of checking to something else, which could be the operation itself or a watcher, and you have that thing notify and possibly interrupt you when the response is available so you can proceed to the other operation that needed it.
It's a weird terminology. It doesn't make a whole lot of sense, since all these solutions are ways to create synchronous coordination of dependent tasks. That's why I prefer to call it evented.
So for this one, you decide to upgrade your toaster so it beeps when the toasts are done. You happen to be constantly listening, even while you are doing dishes. On hearing the beep, you queue up in your memory that as soon as you are done washing your current dish, you'll stop and go put the butter on the toast. Or you could choose to interrupt the washing of the current dish, and deal with the toast right away.
If you have trouble hearing the beep, you can have your partner watch the toaster for you, and come tell you when the toast is ready. Your partner can in turn choose any of the above three strategies to coordinate the task of watching the toaster and telling you when the toast is ready.
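A minimal Java sketch of the evented style, assuming the "beep" is modelled as a callback registered with thenAccept. The names EventedToast and pause are hypothetical; the final join() exists only so the demo does not return before the callback has run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class EventedToast {
    // Hypothetical helper: sleep without forcing callers to handle InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static List<String> makeBreakfast() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> buttered = CompletableFuture
                .supplyAsync(() -> {
                    pause(50); // toasting takes a while
                    return "toast";
                })
                // The "beep": this callback is invoked for us once the toast is ready.
                .thenAccept(toast -> log.add("buttered " + toast));

        // Meanwhile we do other work and never check on the toaster ourselves.
        log.add("washed a dish");

        buttered.join(); // demo only: keep the method alive until the callback has fired
        return log;
    }

    public static void main(String[] args) {
        System.out.println(makeBreakfast());
    }
}
```

Here the callback runs on whichever thread completes the future, which corresponds to the "partner" doing the notifying rather than you doing the checking.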
On a final note, it's good to understand that while non-blocking and async (or what I prefer to call evented) do allow you to do other things while you wait, you don't have to. You can choose to constantly loop on checking the status of a non-blocking call, doing nothing else. That's often worse than blocking though (like looking at the toaster, then away, then back at it until it's done), so a lot of non-blocking APIs allow you to transition into a blocking mode. For evented, you can just wait idle until you are notified. The downside in that case is that adding the notification was complex and potentially costly to begin with. You had to buy a new toaster with beep functionality, or convince your partner to watch it for you.
And one more thing: you need to realize the trade-offs all three provide. One is not obviously better than the others. Think of my example. If your toaster is so fast that you won't have time to wash a dish, or even begin washing it, then getting started on something else is just a waste of time and effort. Blocking will do. Similarly, if washing a dish takes 10 times longer than the toasting, you have to ask yourself what's more important to get done. The toast might get cold and hard by that time, so it's not worth it, and blocking will also do. Or you should pick faster things to do while you wait. There's more, obviously, but my answer is already pretty long. My point is that you need to think about all that, and about the complexity of implementing each, to decide if it's worth it and if it'll actually improve your throughput or performance.
Edit:
Even though this is already long, I also want it to be complete, so I'll add two more points.
There also commonly exists a fourth model known as multiplexed. This is when, while you wait for one task, you start another, and while you wait for both, you start one more, and so on, until you've got many tasks all started. Then you wait idle, but on all of them. As soon as any one is done, you can proceed with handling its response, and then go back to waiting for the others. It's known as multiplexed because, while you wait, you need to check each task one after the other to see if it is done, over and over, until one is. It's a bit of an extension on top of normal non-blocking.
In our example it would be like starting the toaster, then the dishwasher, then the microwave, etc. And then waiting on any of them. Where you'd check the toaster to see if it's done, if not, you'd check the dishwasher, if not, the microwave, and around again.
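The round-robin check described here might be sketched in Java as follows. The appliance names, delays, and the MultiplexedKitchen class are invented for illustration; real multiplexers such as java.nio.channels.Selector let the OS do the waiting instead of looping like this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class MultiplexedKitchen {
    // Hypothetical helper: sleep without forcing callers to handle InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Start an appliance that produces `result` after roughly `millis` milliseconds.
    static CompletableFuture<String> start(String result, long millis) {
        return CompletableFuture.supplyAsync(() -> {
            pause(millis);
            return result;
        });
    }

    static List<String> run() {
        // Start everything first, without waiting on any single task.
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        pending.put("toaster", start("toast", 30));
        pending.put("dishwasher", start("clean dishes", 50));
        pending.put("microwave", start("soup", 10));

        List<String> handled = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Check each appliance in turn; handle whichever ones have finished.
            Iterator<Map.Entry<String, CompletableFuture<String>>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                CompletableFuture<String> task = it.next().getValue();
                if (task.isDone()) {
                    handled.add(task.join());
                    it.remove();
                }
            }
            pause(1); // brief rest before going around again
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Results are handled in completion order rather than start order, so the returned list can differ between runs.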
Even though I believe it to be a big mistake, synchronous is often used to mean one thing at a time. And asynchronous many things at a time. Thus you'll see synchronous blocking and non-blocking used to refer to blocking and non-blocking. And asynchronous blocking and non-blocking used to refer to multiplexed and evented.
I don't really understand how we got there. But when it comes to IO and Computation, synchronous and asynchronous often refer to what is better known as non-overlapped and overlapped. That is, asynchronous means that IO and Computation are overlapped, aka, happening concurrently. While synchronous means they are not, thus happening sequentially. For synchronous non-blocking, that would mean you don't start other IO or Computation, you just busy wait and simulate a blocking call. I wish people stopped misusing synchronous and asynchronous like that. So I'm not encouraging it.
Edit2:
I think a lot of people got a bit confused by my definition of synchronous and asynchronous. Let me try and be a bit more clear.
Synchronous is defined as happening with predictable timing and/or ordering. That means you know when something will start and end.
Asynchronous is defined as not happening with predictable timing and/or ordering. That means you don't know when something will start and end.
Both of those can be happening in parallel or concurrently, or they can be happening sequentially. But in the synchronous case, you know exactly when things will happen, while in the asynchronous case you're not sure exactly when things will happen, but you can still put some coordination in place that at least guarantees some things will happen only after others have happened (by synchronizing some parts of it).
Thus when you have asynchronous processes, asynchronous programming lets you place some order guarantees so that some things happen in the right sequence, even though you don't know when things will start and end.
Here's an example, if we need to do A then B and C can happen at any time. In a sequential but asynchronous model you can have:
A -> B -> C
or
A -> C -> B
or
C -> A -> B
Every time you run the program, you could get a different one of those, seemingly at random. Now this is still sequential, nothing is parallel or concurrent, but you don't know when things will start and end, except you have made it so B always happens after A.
If you add concurrency only (no parallelism), you can also get things like:
A<start> -> C<start> -> A<end> -> C<end> -> B<start> -> B<end>
or
C<start> -> A<start> -> C<end> -> A<end> -> B<start> -> B<end>
or
A<start> -> A<end> -> B<start> -> C<start> -> B<end> -> C<end>
etc...
Once again, you don't really know when things will start and end, but you have made it so B is coordinated to always start after A ends. That's not necessarily immediately after A ends; it's at some unknown time after A ends, and C could happen in between, fully or partially.
And if you add parallelism, now you have things like:
A<start> -> A<end> -> B<start> -> B<end> ->
C<start> -> C<keeps going> -> C<keeps going> -> C<end>
or
A<start> -> A<end> -> B<start> -> B<end>
C<start> -> C<keeps going> -> C<end>
etc...
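The A, B, C scenario can be tried out with a small Java sketch like the one below. It assumes CompletableFuture as the coordination tool, and OrderingDemo, step, and the random delays are hypothetical. The only guarantee encoded is that B starts after A ends; C is left unconstrained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadLocalRandom;

class OrderingDemo {
    // Record the start and end of a step, with a small random delay in between.
    static void step(List<String> log, String name) {
        log.add(name + "<start>");
        try {
            Thread.sleep(ThreadLocalRandom.current().nextInt(10));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        log.add(name + "<end>");
    }

    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> a = CompletableFuture.runAsync(() -> step(log, "A"));
        // The one ordering guarantee we add: B may only start once A has ended.
        CompletableFuture<Void> b = a.thenRun(() -> step(log, "B"));
        // C has no constraint, so it can land anywhere relative to A and B.
        CompletableFuture<Void> c = CompletableFuture.runAsync(() -> step(log, "C"));
        CompletableFuture.allOf(b, c).join();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running it several times should show C's entries moving around in the log while A<end> always precedes B<start>.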
Now if we look at the synchronous case, in a sequential setting you would have:
A -> B -> C
And this is always the order: each time you run the program, you get A, then B, and then C. Even though C conceptually, from the requirements, can happen at any time, in a synchronous model you still define exactly when it will start and end. Of course, you could specify it like:
C -> A -> B
instead, but since it is synchronous, this will be the ordering every time the program is run, unless you change the code again to change the order explicitly.
Now if you add concurrency to a synchronous model you can get:
C<start> -> A<start> -> C<end> -> A<end> -> B<start> -> B<end>
And once again, this would be the order no matter how many times you ran the program. Similarly, you could explicitly change it in your code, but it would be consistent across program executions.
Finally, if you add parallelism as well to a synchronous model you get:
A<start> -> A<end> -> B<start> -> B<end>
C<start> -> C<end>
Once again, this would be the case on every program run. An important aspect here is that to make it fully synchronous this way, B must start after both A and C end. If C is an operation that can complete faster or slower, say depending on the CPU power of the machine or other performance considerations, then to make it synchronous you still need to make B wait for it to end; otherwise you get asynchronous behavior again, where not all timings are deterministic.
You'll get this kind of synchronous thing a lot in coordinating CPU operations with the CPU clock, and you have to make sure that you can complete each operation in time for the next clock cycle, otherwise you need to delay everything by one more clock to give room for this one to finish, if you don't, you mess up your synchronous behavior, and if things depended on that order they'd break.
Finally, lots of systems have synchronous and asynchronous behavior mixed in. If you have any kind of inherently unpredictable events, like when a user will click a button or when a remote API will return a response, but you need things to have guaranteed ordering, you will basically need a way to synchronize the asynchronous behavior so it guarantees order and timing as needed. Some strategies to synchronize those are what I talked about previously: blocking, non-blocking, async, multiplexed, etc. See the emphasis on "async"; this is what I mean by the word being confusing. Somebody decided to call a strategy to synchronize asynchronous processes "async". This then wrongly made people think that asynchronous meant concurrent and synchronous meant sequential, or that somehow blocking was the opposite of asynchronous. As I just explained, synchronous and asynchronous in reality are a different concept that relates to the timing of things as being in sync (in time with each other, either on some shared clock or in a predictable order) or out of sync (not on some shared clock, or in an unpredictable order). Asynchronous programming, by contrast, is a strategy to synchronize two events that are themselves asynchronous (happening at an unpredictable time and/or order), and for which we need to add some guarantees of when they might happen, or at least in what order.
So we're left with two things using the word "asynchronous" in them:
Asynchronous processes: processes for which we don't know at what time they will start and end, and thus in what order they will end up running.
Asynchronous programming: a style of programming that lets you synchronize two asynchronous processes using callbacks or watchers that interrupt the executor in order to let it know something is done, so that you can add predictable ordering between the processes.
A nonblocking call returns immediately with whatever data are available: the full number of bytes requested, fewer, or none at all.
An asynchronous call requests a transfer that will be performed in its entirety but will complete at some future time.
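The "fewer, or none at all" behaviour of a non-blocking call can be observed with a small Java sketch. It assumes an in-process java.nio.channels.Pipe as the data source so that no network peer is needed; the class NonBlockingRead and the 8-byte buffer are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

class NonBlockingRead {
    // Returns the byte counts of two reads: one before any data exists,
    // and one after 3 bytes have been written to the pipe.
    static int[] readCounts() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                ByteBuffer buffer = ByteBuffer.allocate(8); // we "request" up to 8 bytes

                // Nothing has been written yet, so this returns 0 immediately.
                int first = source.read(buffer);

                sink.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
                int second;
                // Poll until the written bytes become visible on the read side.
                while ((second = source.read(buffer)) == 0) {
                    Thread.yield();
                }
                return new int[] {first, second};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = readCounts();
        System.out.println("first read: " + counts[0] + ", second read: " + counts[1]);
    }
}
```

Neither read waits for the 8-byte buffer to fill: each returns with whatever happens to be available at that moment, which is the contrast with an asynchronous call that completes only once the whole transfer is done.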
Putting this question in the context of NIO and NIO.2 in Java 7, async IO is one step more advanced than non-blocking.
With Java NIO non-blocking calls, one would set selectable channels (SocketChannel, ServerSocketChannel, etc.) to non-blocking mode by calling AbstractSelectableChannel.configureBlocking(false). Note that FileChannel is not a selectable channel and cannot be configured this way.
After those IO calls return, however, you will likely still need to control the checks such as if and when to read/write again, etc.
For instance,
while (!isDataEnough()) {
    socketchannel.read(inputBuffer);
    // do something else and then read again
}
With the asynchronous API in Java 7, these controls can be made in more versatile ways.
One of the 2 ways is to use CompletionHandler. Notice that both read calls are non-blocking.
asyncsocket.read(inputBuffer, 60, TimeUnit.SECONDS /* 60 secs for timeout */,
    null /* attachment, unused here */,
    new CompletionHandler<Integer, Object>() {
        public void completed(Integer result, Object attachment) {...}
        public void failed(Throwable e, Object attachment) {...}
    });
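The other of the two ways is the Future-returning form of read. A hedged sketch follows, using AsynchronousFileChannel on a temporary file rather than the socket channel above, purely so that it runs without a network peer. The class AsyncFileRead, the helper demo, and the file contents are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class AsyncFileRead {
    static String readAsync(Path file) {
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            // read() returns at once; the transfer itself completes later.
            Future<Integer> pending = channel.read(buffer, 0);

            // ... unrelated work could happen here ...

            pending.get(); // wait only at the point where the data is actually needed
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo helper: write a small temp file, read it back asynchronously, clean up.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("async-demo", ".txt");
            try {
                Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
                return readAsync(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello"
    }
}
```

With a Future the caller decides when to wait, whereas with a CompletionHandler the runtime calls back when the operation finishes.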
As you can probably see from the multitude of different (and often mutually exclusive) answers, it depends on who you ask. In some arenas, the terms are synonymous. Or they might each refer to two similar concepts:
One interpretation is that the call will do something in the background essentially unsupervised in order to allow the program to not be held up by a lengthy process that it does not need to control. Playing audio might be an example - a program could call a function to play (say) an mp3, and from that point on could continue on to other things while leaving it to the OS to manage the process of rendering the audio on the sound hardware.
The alternative interpretation is that the call will do something that the program will need to monitor, but will allow most of the process to occur in the background only notifying the program at critical points in the process. For example, asynchronous file IO might be an example - the program supplies a buffer to the operating system to write to file, and the OS only notifies the program when the operation is complete or an error occurs.
In either case, the intention is to allow the program to not be blocked waiting for a slow process to complete - how the program is expected to respond is the only real difference. Which term refers to which also changes from programmer to programmer, language to language, or platform to platform. Or the terms may refer to completely different concepts (such as the use of synchronous/asynchronous in relation to thread programming).
Sorry, but I don't believe there is a single right answer that is globally true.
Blocking call: Control returns only when the call completes.
Non-blocking call: Control returns immediately. Later, the OS somehow notifies the process that the call is complete.
Synchronous program: A program which uses Blocking calls. In order not to freeze during the call it must have 2 or more threads (that's why it's called Synchronous - threads are running synchronously).
Asynchronous program: A program which uses Non blocking calls. It can have only 1 thread and still remain interactive.
Non-blocking: This function won't wait while on the stack.
Asynchronous: Work may continue on behalf of the function call after that call has left the stack
Synchronous means to start one after the other's result, in a sequence.
Asynchronous means start together, no sequence is guaranteed on the result
Blocking means something that causes an obstruction to perform the next step.
Non-blocking means something that keeps running without waiting for anything, overcoming the obstruction.
Blocking eg: I knock on the door and wait till they open it. ( I am idle here )
Non-Blocking eg: I knock on the door, if they open it instantly, I greet them, go inside, etc. If they do not open instantly, I go to the next house and knock on it. ( I am doing something or the other, not idle )
Synchronous eg: I will go out only if it rains. ( dependency exists )
Asynchronous eg: I will go out. It can rain. ( independent events, doesn't matter when they occur )
Synchronous or Asynchronous, both can be blocking or non-blocking and vice versa
The blocking models require the initiating application to block when the I/O has started. This means that it isn't possible to overlap processing and I/O at the same time. The synchronous non-blocking model allows overlap of processing and I/O, but it requires that the application check the status of the I/O on a recurring basis. This leaves asynchronous non-blocking I/O, which permits overlap of processing and I/O, including notification of I/O completion.
To put it simply,
function sum(a, b) {
    return a + b;
}
is non-blocking, while asynchronous is used to execute a blocking task and then return its response.
Combining block/non-block with synchronous/asynchronous gives four cases:
block, synchronous: Blocking I/O must be synchronous I/O, because it has to be executed in order. Synchronous I/O is not necessarily blocking I/O.
block, asynchronous: Does not exist.
non-block, synchronous: Non-blocking and synchronous I/O at the same time is polling/multiplexing.
non-block, asynchronous: Non-blocking and asynchronous I/O at the same time is parallel execution, such as a signal trigger.
block/non-block describes the behavior of the initiating entity itself: what the entity does while waiting for I/O completion.
synchronous/asynchronous describes the behavior between the I/O-initiating entity and the I/O executor (the operating system, for example): whether these two entities can execute in parallel.
They differ in spelling only. There is no difference in what they refer to. To be technical you could say they differ in emphasis. Non blocking refers to control flow(it doesn't block.) Asynchronous refers to when the event\data is handled(not synchronously.)
Blocking: control returns to the invoking process after processing of the primitive (sync or async) completes
Non-blocking: control returns to the process immediately after invocation

Resources