I had been thinking that asynchronous I/O always has a callback form. But recently I discovered that some low-level implementations use polling-style APIs:
kqueue
libpq
And this leads me to think that maybe all (or most) asynchronous I/O (on files, sockets, Mach ports, etc.) is ultimately implemented in a polling manner, and that the callback form is just an abstraction provided by higher-level APIs.
This could be a silly question, but I don't know how most asynchronous I/O is actually implemented at a low level. I have only used the system-level notifications, and when I look at kqueue - which is the system notification mechanism - it's a polling-style API!
How should I understand asynchronous I/O at a low level? How is a high-level asynchronous notification built on top of a low-level polling system (if that is actually what happens)?
At the lowest hardware level (or at least the lowest level worth looking at), asynchronous operations in modern operating systems truly are asynchronous.
For example, when you read a file from the disk, the operating system translates your call to read into a series of disk operations (seek to location, read blocks X through Y, etc.). On most modern OSes, these commands get written either to special registers or to special locations in main memory, and the disk controller is informed that there are operations pending. The operating system then goes on about its business, and when the disk controller has completed all of the operations assigned to it, it triggers an interrupt, causing the thread that requested the read to pick up where it left off.
Regardless of what type of low-level asynchronous operation you're looking at (disk I/O, network I/O, mouse and keyboard input, etc.), ultimately, there is some stage at which a command is dispatched to hardware, and the "callback" as it were is not executed until the hardware reaches out and informs the OS that it's done, usually in the form of an interrupt.
That's not to say that there aren't some asynchronous operations implemented using polling. One trivial (but naive and costly) way to implement any blocking operation asynchronously is just to spawn a thread that waits for the operation to complete (perhaps polling in a tight loop), and then call the callback when it's finished. Generally speaking, though, common asynchronous operations at the OS level are truly asynchronous.
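As a rough illustration, here is a minimal Python sketch of that naive approach (async_read is a made-up helper, not a real API): a worker thread performs an ordinary blocking read and then invokes the callback, so nothing is truly asynchronous at the OS level.
import threading

def async_read(path, callback):
    # Hypothetical helper: run a blocking read on a worker thread and
    # invoke the callback when it finishes. The "asynchrony" here is
    # faked by burning a thread on the blocking call.
    def worker():
        with open(path, 'rb') as f:
            data = f.read()          # ordinary blocking read
        callback(data)
    threading.Thread(target=worker).start()

async_read(__file__, lambda data: print(len(data), 'bytes read'))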
It's also worth mentioning that just because an API is blocking doesn't mean it's polling: you can put a blocking API on an asynchronous operation, and a non-blocking API on a synchronous operation. With things like select and kqueues, for example, the thread actually just goes to sleep until something interesting happens. That "something interesting" comes in the form of an interrupt (usually), and that's taken as an indication that the operating system should wake up the relevant threads to continue work. It doesn't just sit there in a tight loop waiting for something to happen.
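To make that concrete, here is a minimal Python sketch using the selectors module (which is backed by kqueue, epoll, or similar, depending on the platform): the call to sel.select() simply puts the thread to sleep until the kernel reports that the registered socket is readable; there is no busy loop.
import selectors
import socket

sel = selectors.DefaultSelector()      # kqueue/epoll/poll under the hood
r, w = socket.socketpair()
sel.register(r, selectors.EVENT_READ)

w.send(b'ping')                        # make the read end "interesting"

# Blocks (sleeps) until the kernel reports a ready file descriptor.
for key, events in sel.select():
    print(key.fileobj.recv(4))         # -> b'ping'

sel.unregister(r)
r.close()
w.close()
sel.close()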
There really is no way to tell whether a system uses polling or "real" callbacks (like interrupts) just from its API, but yes, there are asynchronous APIs that are truly backed by asynchronous operations.
Related
When it comes to Python's asyncio library, and to asynchronous programming in general, I always think of it as doing "concurrent" I/O operations only, within a single thread, to make better use of the CPU.
The asyncio library has the function asyncio.sleep(seconds), but what bothers me is that a sleep is not an I/O operation: it is handled at the kernel level by the CPU hardware, without any external device that could count as I/O [my definition of I/O is any hardware other than the CPU and RAM].
So why does the asyncio library (Asynchronous I/O) treat this operation as an asynchronous I/O operation?
This is not a network interface controller we send requests to, nor a hard disk. I have no problem with making every operation we can "concurrent" within a single thread; however, the "I/O" in the library's name makes me feel that this isn't the proper terminology. I would be happy for a clarification.
One more related question: does the term asynchronous programming refer only to "concurrent" I/O operations, or to every operation within a single thread, including CPU operations like x = x + 1? (I guess that last operation could also be done "concurrently" within a single thread, but it would be unnecessary.)
Link:
https://docs.python.org/3/library/asyncio.html
Code snippet:
import asyncio

async def main():
    print('Hello ...')
    await asyncio.sleep(1)
    print('... World!')

asyncio.run(main())
Paraphrasing Wikipedia, "Asynchronous programming" generally refers to the occurrence of events outside of the main program flow and ways of handling such events. As such, asynchronous operations are not necessarily I/O ones.
These asynchronous events are generally handled at the hardware or OS level, and it is important to understand that at this level almost anything is asynchronous: jobs are put into queues and scheduled by the OS, then they are regularly polled for completion by the OS, which then notifies the main application that the job is done.
Such asynchronous events include:
Network requests (multiplexed and polled by the OS),
Timers (managed by hardware timers and interrupts),
Communication with various external devices such as keyboards (hardware interrupts),
Communication with internal devices such as the GPU (jobs are committed to command queues),
etc.
The purpose of the AsyncIO library is to allow asynchronous programs to be expressed in a more "structured" and linear way. To that end, it wraps many common asynchronous operations, such as I/O and timers, into async-await equivalents. AsyncIO is thus not restricted to asynchronous I/O operations only; one could, for example, implement an AsyncIO async-await interface on top of a GPU.
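As a minimal sketch of that idea (my_sleep is a made-up name), a plain timer can be exposed as an awaitable by completing a Future from the event loop's timer machinery - roughly what asyncio.sleep itself does - and the same Future-based pattern can wrap any completion-signalled operation:
import asyncio

async def my_sleep(delay):
    # Create a Future, arrange for the loop's timer to complete it later,
    # and await it. No thread blocks while the timer is pending.
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    loop.call_later(delay, fut.set_result, None)
    await fut

async def main():
    print('Hello ...')
    await my_sleep(1)
    print('... World!')

asyncio.run(main())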
Using MPI_SEND (the standard blocking send) is simpler than using MPI_ISEND (the standard non-blocking send), because the latter must be used together with another MPI function to ensure that the communication has "completed", so that the send buffer can be reused. But apart from that, does MPI_SEND have any advantage over MPI_ISEND? It seems that, in general, MPI_ISEND prevents deadlock and also allows better performance (because the calling process can do other things while the communication proceeds in the background, handled by the MPI implementation).
So, is it a good idea to use the blocking version at all?
Performance-wise, MPI_Send() has the potential to be faster than MPI_Isend() immediately followed by MPI_Wait() (and it is indeed faster in Open MPI).
But most importantly, if your MPI library does not provide a progress thread, your message might sit on the sender node until MPI is progressed by your code (which typically occurs when an MPI subroutine is invoked, and definitely happens when MPI_Wait() is called).
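As an illustration of the pattern being discussed (shown here with the mpi4py Python bindings rather than the C API, purely as a sketch): a non-blocking send is started, some local work is overlapped with it, and only the wait guarantees that the buffer can be reused - and, absent a progress thread, the transfer may only advance inside MPI calls such as that wait.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
data = np.arange(1000, dtype='d')

if rank == 0:
    # Start the non-blocking send, overlap some local work, then complete it.
    req = comm.Isend([data, MPI.DOUBLE], dest=1, tag=0)
    local_sum = data.sum()   # computation overlapped with the pending send
    req.Wait()               # only now may the send buffer be reused
elif rank == 1:
    buf = np.empty(1000, dtype='d')
    comm.Recv([buf, MPI.DOUBLE], source=0, tag=0)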
I understand that .NET knows how to use multiple threads to serve multiple requests.
So, if our service probably won't get more requests than the number of threads our server can create (which looks like a huge number), the only reason I can see to use async is for a single request that performs multiple blocking operations which could be done in parallel.
Am I right?
Another advantage may be that serving multiple requests with the same thread is cheaper than using multiple threads. How significant is this difference?
(Note: our service has no UI. I saw that there is a single thread for that, but it isn't relevant here.)
thanks!
Am I right?
No. Doing multiple independent blocking operations is the job of concurrent APIs anyway (though they sometimes need synchronization, such as a lock or mutex, to maintain object state and avoid race conditions). The purpose of async-await is to schedule I/O operations - file reads/writes, calls to a remote service, database reads/writes - which don't need a thread of their own, since they are queued on the operating system's I/O completion ports.
Benefits of Async-Await:
It doesn't start an I/O operation on a separate thread. A thread is a costly resource in terms of memory and allocation, and it would do little more than wait for the I/O call to come back. Separate threads should be used for compute-bound operations, not I/O-bound ones.
It frees up the UI / caller thread, keeping it completely responsive and able to carry out other tasks / operations.
It is the evolution of the Asynchronous Programming Model (BeginXxx / EndXxx), which was fairly complex to understand and implement.
Another advantage may be that serving multiple requests with the same thread is cheaper than using multiple threads. How significant is this difference?
It's a good strategy, depending on the kind of request from the caller. If requests are compute-bound, it is better to invoke a parallel API and finish them fast; if they are I/O-bound, there's async-await. The only issue with multiple threads is resource allocation and context switching, which need to be factored in; on the other hand, they efficiently utilize the processor cores, which are fairly under-utilized in present-day systems - most of the time the processor is lying idle.
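The answer above is about .NET, but the same split exists in Python's asyncio and may make the distinction concrete (the function names below are made up): I/O-bound work is simply awaited, so no extra thread sits idle waiting, while compute-bound work is handed to a pool of workers.
import asyncio
from concurrent.futures import ProcessPoolExecutor

def compute_bound(n):
    # CPU-heavy work: better handed to a pool of worker processes.
    return sum(i * i for i in range(n))

async def io_bound():
    # Stand-in for a network or database call; while it is pending,
    # the thread is free to run other coroutines.
    await asyncio.sleep(1)
    return 'io done'

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        io_task = asyncio.create_task(io_bound())
        cpu_result = await loop.run_in_executor(pool, compute_bound, 1_000_000)
        print(cpu_result, await io_task)

if __name__ == '__main__':
    asyncio.run(main())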
To overlap MPI communication and computation, I am working on issuing asynchronous I/O (MPI calls) together with a user-defined computation function that operates on the data from the I/O.
MS Windows' 'overlapped I/O' is no friend of MPI (it supports overlapped I/O only for file I/O and socket communication, not for MPI operations...).
I cannot find an appropriate MPI API for this - does anyone have any insight?
There are no completion callbacks in MPI. Non-blocking operations always return a request handle that must either be synchronously waited on using MPI_Wait and family or periodically tested using the non-blocking MPI_Test and family.
With the help of either MPI_Waitsome or MPI_Testsome, it is possible to implement a dispatch mechanism that monitors multiple requests and calls specific functions upon their completion. None of the MPI calls has any timeout characteristics though - it is either "wait forever" (MPI_Wait...) or "check without waiting" (MPI_Test...).
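A minimal sketch of such a dispatch loop, shown with the mpi4py Python bindings for brevity (on_message is a made-up handler): rank 0 posts one non-blocking receive per peer and then calls a handler as each request completes.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    bufs = [np.empty(1, dtype='d') for _ in range(size - 1)]
    reqs = [comm.Irecv([bufs[i], MPI.DOUBLE], source=i + 1, tag=0)
            for i in range(size - 1)]

    def on_message(i, buf):          # user-defined completion handler
        print('got', buf[0], 'from rank', i + 1)

    for _ in range(len(reqs)):
        # Waitany blocks until some request completes and returns its index;
        # Waitsome/Testsome work similarly but can report several at once.
        idx = MPI.Request.Waitany(reqs)
        on_message(idx, bufs[idx])
else:
    comm.Send([np.array([float(rank)]), MPI.DOUBLE], dest=0, tag=0)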
Non-blocking sends/recvs return immediately in MPI and the operation is completed in the background. The only way I can see that happening is that the current process/thread invokes/creates another process/thread, loads an image of the send/recv code into it, and returns. That new process/thread then completes the operation and sets a flag somewhere, which Wait/Test checks. Am I correct?
There are two ways that progress can happen:
In a separate thread. This is an option in most MPI implementations (usually chosen at configure/compile time). In this version, as you speculated, the MPI implementation has another thread that runs a separate progress engine. That thread manages all of the MPI messages and sending/receiving data. This way works well if you're not using all of the cores on your machine, as it makes progress in the background without adding overhead to your other MPI calls.
Inside other MPI calls. This is the more common way of doing things and is the default for most implementations I believe. In this version, non-blocking calls are started when you initiate the call (MPI_I<something>) and are essentially added to an internal queue. Nothing (probably) happens on that call until you make another call to MPI later that actually does some blocking communication (or waits for the completion of previous non-blocking calls). When you enter that future MPI call, in addition to doing whatever you asked it to do, it will run the progress engine (the same thing that's running in a thread in version #1). Depending on what the MPI call that's supposed to be happening is doing, the progress engine may run for a while or may just run through once. For instance, if you called MPI_WAIT on an MPI_IRECV, you'll stay inside the progress engine until you receive the message that you're waiting for. If you are just doing an MPI_TEST, it might just cycle through the progress engine once and then jump back out.
More exotic methods. As Jeff mentions in his post, there are more exotic methods that depend on the hardware on which you're running. You may have a NIC that will do some magic for you in terms of moving your messages in the background or some other way to speed up your MPI calls. In general, these are very specific to the implementation and hardware on which you're running, so if you want to know more about them, you'll need to be more specific in your question.
All of this is specific to your implementation, but most of them work in some way similar to this.
Are you asking whether a separate thread for message processing is the only solution for non-blocking operations?
If so, the answer is no; in fact, many setups use a different strategy. Usually, progress on message processing is made during all MPI calls. I'd recommend having a look at this blog entry by Jeff Squyres.
See the answer by Wesley Bland for a more complete explanation.