CFQ IO scheduler request queues - asynchronous

The CFQ IO scheduler in Linux has a set of request queues. Synchronous requests from processes go into separate per-process request queues, while all asynchronous requests go into a set of shared queues.
How are requests classified as synchronous or asynchronous? Does asynchronous in this context mean IO done using kernel AIO (with all other normal read()/write() and buffered fread()/fwrite() calls counted as synchronous)?

Synchronous requests are those on which the process blocks until they complete; asynchronous requests are those that complete in parallel while the process continues running.
Typically, all normal reads a program makes are synchronous, since the process cannot advance until it has the data it requested. Writes, however, are most often asynchronous by nature: as long as the process is guaranteed to see all the writes it has performed, which the buffer/page cache takes care of, the process does not care when the data is actually written to the storage device once it has called the write system call.
From there on it gets complicated: an fsync() system call is a synchronous request, and the same is true for some metadata-changing calls on journalled file systems, but not on non-journalled ones, and so on...
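The difference between a buffered write and fsync() can be seen from user space. The following is a minimal Python sketch, assuming a temporary file created only for illustration; the helper name write_then_sync is hypothetical. It shows the calling pattern only and does not show how CFQ itself classifies the resulting requests.

```python
import os
import tempfile


def write_then_sync(data: bytes) -> bytes:
    """Write data, force it to the storage device, then read it back."""
    fd, path = tempfile.mkstemp()  # scratch file, illustration only
    try:
        # write() normally returns once the data sits in the page cache;
        # the kernel flushes it to the device later, so the caller does
        # not wait for the device.
        os.write(fd, data)

        # fsync() blocks the caller until the kernel reports the data as
        # written out: synchronous from the process's point of view.
        os.fsync(fd)
    finally:
        os.close(fd)

    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(write_then_sync(b"hello"))
```

The process would read back its own data even without the fsync() call, because reads are served from the same page cache.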

Related

Multiple consumer on single JMS queue

A JMS queue has 2 consumers, a synchronous and an asynchronous Java application process, both waiting for responses.
1) The synchronous application sends a request and waits up to 60 seconds for the response matching the JMS correlation ID.
2) The asynchronous thread listens constantly on the same queue.
In this scenario, when a response arrives on the queue within 60 seconds, I would expect the load to be distributed across both the synchronous and the asynchronous application. However, for some unknown reason almost all the response messages are consumed by the synchronous process, and only in some cases are messages picked up by the asynchronous process.
Are there any factors that could cause the synchronous application to pick up almost all the messages?
There is usually no guarantee that the load will be distributed evenly, especially when it's a synchronous versus an async consumer. The synchronous consumer has to poll, wait, poll, wait, while the async consumer is probably waiting on the socket in a separate thread until a message arrives and then calls your callback. So the async consumer will almost always be there first.
Any chance you can change to Topics and discard the messages you don't want? Or change your sync consumer to be async? Another alternative would be to build a small 'async' gateway in front of your synchronous consumer: a little application that consumes asynchronously and then copies each message received to a second queue, where the sync consumer picks it up. Depending on your JMS provider, it might support this type of 'JMS bridge' already. What are you using?
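The gateway idea can be sketched without any JMS provider. In the Python sketch below, two in-process queue.Queue objects stand in for the shared JMS queue and the second queue (an assumption; this is not the JMS API), and the names run_bridge, drain_sync and demo are hypothetical. The bridge is the only consumer on the shared queue, so the synchronous consumer no longer competes for messages.

```python
import queue
import threading


def run_bridge(source: queue.Queue, target: queue.Queue, stop: object) -> None:
    """Gateway: the only reader of the shared queue; forwards every message."""
    while True:
        msg = source.get()  # blocks until a message arrives, no polling loop
        target.put(msg)     # copy to the second queue (the stop marker too)
        if msg is stop:
            return


def drain_sync(target: queue.Queue, stop: object, timeout: float = 1.0) -> list:
    """Synchronous consumer: the only reader of the second queue."""
    received = []
    while True:
        msg = target.get(timeout=timeout)  # waits up to `timeout` per message
        if msg is stop:
            return received
        received.append(msg)


def demo() -> list:
    stop = object()  # marker that ends the demo, not a real message
    shared, second = queue.Queue(), queue.Queue()
    bridge = threading.Thread(target=run_bridge, args=(shared, second, stop))
    bridge.start()
    for i in range(5):
        shared.put(f"response-{i}")
    shared.put(stop)
    received = drain_sync(second, stop)
    bridge.join()
    return received


if __name__ == "__main__":
    print(demo())
```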

How does non-blocking IO actually work from the client's perspective?

So I came across the idea of blocking and non-blocking I/O. But what I understood from the concept and some of the sample implementations is that we implement code on the server side to achieve this nature of the code.
But now my question is, if (for example postman sending HTTP request to the server) the request has to wait for the server to respond, then what's the point of non-blocking I/O? (Please correct me if I am wrong) Or the whole concept is just for the increase of throughput of the server instead of actual async nature w.r.t. to client.
For example, in one of my projects I created a POST request that creates a processing request in the system and returns a transaction ID. Using this transaction ID, I can then query the server to learn the outcome.
I may sound too naive, but the concept has confused me a lot. I do not understand this concept clearly. Please help.
Thanks
the request has to wait for the server to respond, then what's the point of non-blocking I/O?
There's some confusion here. Waiting for a response and (non-)blocking I/O are very loosely related. You always have to wait for a response; that's why you've made the request to begin with. But the question is: how?
Non-blocking HTTP: "Dear server, here's my request, please process it and send me a response, I'm going to do something else in the meantime, like calculating n-th digit of Pi (I'm a weirdo)".
Blocking HTTP: "Dear server, here's my request, please process it and send me a response, I'm going to patiently wait for it doing nothing".
Or the whole concept is just for the increase of throughput of the server instead of actual async nature w.r.t. to client.
The whole concept is to be able to do other things while waiting for I/O. And to do that while minimizing the use of threads, which don't scale well.
Asynchronous systems, i.e. systems without the "I'm going to wait idly" part, tend to perform better at the cost of complexity.
Side note: nonblocking i/o can be used both on the server side and client side. For example almost all JS engines in browsers are built on top of some asynchronous engine. JS is often single-threaded, meaning nonblocking i/o is necessary to achieve any concurrency.
But what I understood from the concept and some of the sample implementations is that we implement code on the server side to achieve this nature of the code.
You implement the code wherever you are doing the non-blocking I/O. What a server does has no bearing on whether a client uses blocking or non-blocking I/O, and what a client does has no bearing on whether a server uses blocking or non-blocking I/O.
if (for example postman sending HTTP request to the server) the request has to wait for the server to respond, then what's the point of non-blocking I/O?
So that you're not wasting resources.
Let's consider first a simple console application that hits the web and then does something with the results. In this case there's very little to gain with non-blocking I/O as the application is just going to be sitting around waiting for something to do anyway.
Now let's consider a simple console application that hits 50 different web resources and collates the responses. Now non-blocking I/O is more useful, because with blocking I/O it would have to either get one resource after the other, or spin up 50 threads. With non-blocking I/O, one thread or a small number of threads is all that is needed to hit 50 resources and respond promptly to each response as it returns.
Now let's consider a GUI version of this application that wants to remain responsive to user input, while also running on low-power low-memory devices in which blocked threads are all the more expensive. The advantages of the above are increased.
Finally, consider a web application that is doing I/O both with the client and also as a client to a database, file system and maybe other web applications. It may have multiple requests at the same time, and blocking on either the I/O it does with the client or any of the I/O it does with db, file or other applications would cost a thread, which would put a scalability limit on how many requests it can handle simultaneously. Not blocking on I/O allows threads to be used for other requests while the I/O is pending.
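The 50-resource case can be sketched with Python's asyncio. This is only an illustration under stated assumptions: asyncio.sleep stands in for the network round trip, and the names fetch, fetch_all and timed_fetch_all are hypothetical. All requests are in flight at once on a single thread, so the total time stays close to one round trip instead of fifty.

```python
import asyncio
import time


async def fetch(resource_id: int, delay: float = 0.1) -> str:
    # asyncio.sleep stands in for a network round trip (assumption);
    # while it is pending, the event loop runs the other coroutines.
    await asyncio.sleep(delay)
    return f"response-{resource_id}"


async def fetch_all(count: int) -> list:
    # Start every request, then wait for all of them on one thread.
    return await asyncio.gather(*(fetch(i) for i in range(count)))


def timed_fetch_all(count: int = 50) -> tuple:
    """Return (responses, elapsed seconds) for `count` concurrent fetches."""
    start = time.perf_counter()
    responses = asyncio.run(fetch_all(count))
    return responses, time.perf_counter() - start


if __name__ == "__main__":
    responses, elapsed = timed_fetch_all()
    print(len(responses), "responses in", round(elapsed, 2), "s")
```

Fetching the same 50 resources one after the other with blocking calls would take about 50 times the per-request delay.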

NiFi processor to handle asynchronous tasks efficiently

I have a NiFi processor that calls an external service that can take days to return a result. During this time the processor can call Thread.sleep() periodically to relinquish the CPU.
The issue is that even if Thread.sleep() is called in the onTrigger() method, the NiFi processor will not read in and handle new FlowFiles, since it is waiting for onTrigger() to finish. From NiFi's perspective the CPU is still blocked waiting for the asynchronous call to finish.
Is there a way to maintain concurrency when asynchronous calls are being made in the onTrigger() method of a NiFi processor?
Val Bonn's suggestion of pushing asynchronous FlowFiles back to a WAIT queue works well. As asynchronous requests come in, Java Process objects are created and held in memory. The FlowFile is then routed to a WAIT relationship, which is connected back into the processor. Periodically, FlowFiles from the WAIT queue are checked against the corresponding Process to see whether it has completed; if so, they are routed to a SUCCESS relationship, otherwise they are penalized. This allows many long-running asynchronous processes to be kicked off without allocating precious CPU resources for each incoming request. One source of complexity was handling processor shutdowns invoked from the UI. In these situations an onStopped method is invoked that waits for all in-memory processes to complete and archives their stderr and stdout to disk. When the processor is started again, the archive is read back in and paired against any FlowFiles in the WAIT queue.
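The WAIT-queue pattern described above can be sketched outside NiFi. The Python below is an illustration only, not NiFi code: subprocess.Popen plays the role of the Java Process objects, a plain dict plays the role of the WAIT queue, and the names start_job, route and run_until_done are hypothetical. The key point is that each pass checks the processes without blocking on any of them.

```python
import subprocess
import sys
import time


def start_job(seconds: float) -> subprocess.Popen:
    # Hypothetical long-running external task: a child Python that sleeps.
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )


def route(pending: dict) -> tuple:
    """One pass over the WAIT queue: split finished jobs from running ones."""
    success, wait = [], {}
    for flowfile_id, proc in pending.items():
        if proc.poll() is None:  # non-blocking check: still running
            wait[flowfile_id] = proc
        else:
            success.append(flowfile_id)
    return success, wait


def run_until_done(durations: dict, interval: float = 0.05) -> list:
    """Kick off every job up front, then re-check the WAIT queue periodically."""
    pending = {fid: start_job(secs) for fid, secs in durations.items()}
    completed = []
    while pending:
        done, pending = route(pending)
        completed.extend(done)
        if pending:
            time.sleep(interval)  # stands in for the penalty delay (assumption)
    return completed


if __name__ == "__main__":
    print(run_until_done({"flowfile-a": 0.1, "flowfile-b": 0.3}))
```

In a real processor the loop would not exist: each onTrigger() call would perform one route() pass and return, and the framework would schedule the next one.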

Using asynchronous API to create nodes in Zookeeper

While reading about ZooKeeper, I found that the accepted answer to the following question says that concurrent writes are not allowed.
Explaining Apache ZooKeeper
Now my question is: since ZooKeeper has linear writes, does that stop me from using the asynchronous APIs to create nodes and receive the response in a callback? Internally it may not allow concurrent writes, or am I missing something?
Even though ZooKeeper operates as an ensemble, writes are always served through the leader. Therefore, the leader is able to queue write requests and complete them sequentially.
Using the asynchronous API does no harm to this approach. Even though the write requests are asynchronous (from the client side), the leader always makes sure that they are served sequentially. Once an asynchronous write request is served, the client is notified through the callback. It is as simple as that. Remember, the requests are asynchronous as viewed by the client, but from the leader's point of view they are served sequentially.
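The two viewpoints can be illustrated with a toy model. This Python sketch does not use any ZooKeeper client library: the ToyLeader class and its submit_create method are hypothetical, and a single-worker thread pool stands in for the leader's sequential write queue. The client submits all creates without waiting, and each callback fires only after its write has been applied in order.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyLeader:
    """Toy stand-in for a leader that applies writes one at a time."""

    def __init__(self) -> None:
        # A single worker means queued writes are applied strictly in order.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self.log = []

    def submit_create(self, path: str, callback) -> Future:
        # Returns immediately; the client is notified through the callback.
        future = self._executor.submit(self._apply, path)
        future.add_done_callback(lambda f: callback(f.result()))
        return future

    def _apply(self, path: str) -> str:
        self.log.append(path)  # the sequential "write"
        return path

    def close(self) -> None:
        self._executor.shutdown(wait=True)  # wait for queued writes


def demo() -> tuple:
    leader = ToyLeader()
    acked = []
    for i in range(5):
        # Asynchronous from the client's side: no waiting between calls.
        leader.submit_create(f"/node-{i}", acked.append)
    leader.close()
    return leader.log, acked


if __name__ == "__main__":
    print(demo())
```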

Tornado and asynchronous requests handling

My question is two-part:
What exactly is meant by an 'asynchronous server', which is what people usually call Tornado? Can someone please provide a concrete example to illustrate the concept/definition?
In the case of Tornado, what exactly does 'non-blocking' mean? Is this related to the asynchronous nature above? In addition, I read somewhere that it always uses a single thread for handling all requests. Does this mean that requests are handled sequentially, one by one, or in parallel? If the latter, how does Tornado do it?
Tornado uses asynchronous, non-blocking I/O to solve the C10K problem. That means all I/O operations are event driven, that is, they use callbacks and event notification rather than waiting for the operation to return. Node.js and Nginx use a similar model. The exception is tornado.database, which is blocking. The Tornado IOLoop source is well documented if you want to look at it in detail. For a concrete example, see below.
Non-blocking and asynchronous are used interchangeably in Tornado, although in other cases there are differences; this answer gives an excellent overview. Tornado uses one thread and handles requests sequentially, albeit very very quickly as there is no waiting for IO. In production you'd typically run multiple Tornado processes.
As for a concrete example, say you have an HTTP request for which Tornado must fetch some data (asynchronously) and then respond. Here's (very roughly) what happens:
1) Tornado receives the request and calls the appropriate handler method in your application
2) Your handler method makes an asynchronous database call, with a callback
3) Database call returns, callback is called, and response is sent.
What's different about Tornado (versus, for example, Django) is that between steps 2 and 3 the process can continue handling other requests. The Tornado IOLoop simply holds the connection open and continues processing its callback queue, whereas with Django (and any synchronous web framework) the thread will hang, waiting for the database to return.
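The three steps and the interleaving between them can be traced with a small sketch. It uses Python's standard asyncio as a stand-in for the Tornado IOLoop (an assumption, this is not Tornado code), asyncio.sleep as a stand-in for the asynchronous database call, and hypothetical names handle_request, serve and run. Request B arrives while A is still waiting on its "database", and one thread serves both.

```python
import asyncio


async def handle_request(name: str, db_delay: float, events: list) -> None:
    events.append(f"{name}: handler called")  # step 1
    await asyncio.sleep(db_delay)             # step 2: async "database call"
    events.append(f"{name}: response sent")   # step 3


async def serve(events: list) -> None:
    # Two requests handled by one thread; A's database call is the slow one.
    await asyncio.gather(
        handle_request("A", 0.2, events),
        handle_request("B", 0.05, events),
    )


def run() -> list:
    events = []
    asyncio.run(serve(events))
    return events


if __name__ == "__main__":
    for line in run():
        print(line)
```

B's handler is called and its response is sent while A is between steps 2 and 3. With a blocking database call in place of the await, B would not start until A had finished.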
This is my test of the performance of web.py (CherryPy) and Tornado.
How does CherryPy work? It handles requests well compared with Tornado when concurrency is low.

Resources