NiFi processor to performantly handle asynchronous tasks

I have a NiFi processor that calls an external service which can take days to return a result. During this time the processor can call Thread.sleep() periodically to relinquish the CPU.
The issue is that even if Thread.sleep() is called in the onTrigger() method, the NiFi processor will not read in and handle new FlowFiles, since it is waiting for onTrigger() to finish. From NiFi's perspective, the thread is still blocked waiting for the asynchronous call to finish.
Is there a way to maintain concurrency when asynchronous calls are being made in the onTrigger() method of a NiFi processor?

Val Bonn's suggestion of pushing asynchronous FlowFiles back to a WAIT queue works well. As asynchronous requests come in, Java Process objects are created and held in memory. The FlowFile is then routed to a WAIT relationship, which is connected back into the processor. Periodically, FlowFiles from the WAIT queue are checked against their corresponding Process: if it has completed, the FlowFile is routed to a SUCCESS relationship; otherwise it is penalized. This allows many long-running asynchronous processes to be kicked off without tying up precious CPU resources for each incoming request.
One source of complexity was handling processor shutdowns invoked from the UI. In these situations an onStopped method is invoked that waits for all in-memory processes to complete and archives their stderr and stdout to disk. When the processor is started again, the archive is read back in and paired against any FlowFiles in the WAIT queue.
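A minimal sketch of the WAIT-loop idea, assuming FlowFiles can be identified by a string ID. It does not use NiFi's API (no ProcessSession, FlowFile, or Relationship objects); the class name, the FAILURE route for an unknown ID, and the use of CompletableFuture in place of java.lang.Process are illustrative assumptions. What it shows is that each check is non-blocking, so onTrigger() can return immediately instead of sleeping.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the WAIT-loop pattern. NiFi's API is NOT used: FlowFiles are
// modeled as String ids, and the routing decision is returned instead of
// being passed to session.transfer(...).
class WaitLoopSketch {
    enum Route { WAIT, SUCCESS, FAILURE }

    // Running jobs kept in memory, keyed by a (hypothetical) FlowFile id.
    // The answer keeps java.lang.Process objects; a CompletableFuture stands
    // in here so the sketch runs without spawning OS processes.
    private final Map<String, CompletableFuture<String>> running = new ConcurrentHashMap<>();

    // First visit: remember the job and send the FlowFile to the WAIT
    // relationship, which loops back into the same processor.
    Route start(String flowFileId, CompletableFuture<String> job) {
        running.put(flowFileId, job);
        return Route.WAIT;
    }

    // Later visits: a non-blocking check, so the calling thread is released
    // right away and can handle other FlowFiles.
    Route check(String flowFileId) {
        CompletableFuture<String> job = running.get(flowFileId);
        if (job == null) {
            return Route.FAILURE; // hypothetical: no matching job (e.g. lost on restart)
        }
        if (!job.isDone()) {
            return Route.WAIT;    // in NiFi: penalize and route back to WAIT
        }
        running.remove(flowFileId);
        return Route.SUCCESS;
    }

    public static void main(String[] args) {
        WaitLoopSketch processor = new WaitLoopSketch();
        CompletableFuture<String> job = new CompletableFuture<>();
        System.out.println(processor.start("ff-1", job)); // WAIT
        System.out.println(processor.check("ff-1"));      // WAIT
        job.complete("result");
        System.out.println(processor.check("ff-1"));      // SUCCESS
    }
}
```

With real java.lang.Process objects, the same non-blocking test could be done with process.isAlive().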

Related

Multiple consumers on a single JMS queue

A JMS queue has two consumers, a synchronous and an asynchronous Java application process, both waiting for responses.
1) The synchronous application sends a request and waits up to 60 seconds for the response, based on the JMS correlation ID.
2) An asynchronous thread constantly listens on the same queue.
In this scenario, when the response arrives on the queue within 60 seconds, I would expect the load to be distributed across both the synchronous and the asynchronous application. However, for some unknown reason almost all the response messages are consumed by the synchronous process, and only in some cases are the messages picked up by the asynchronous process.
Are there any factors that could cause the synchronous application to pick up almost all the messages?
There is usually no guarantee that the load will be distributed evenly, especially between a synchronous and an asynchronous consumer. The synchronous consumer has to poll, wait, poll, wait, while the async consumer is probably waiting on the socket in a separate thread until a message arrives and then calls your callback. So the async consumer will almost always be there first.
Any chance you can change to Topics and discard the messages you don't want? Or change your sync consumer to be async? Another alternative would be to build a small 'async' gateway in front of your synchronous consumer: a little application that consumes asynchronously and copies each message it receives to a second queue, where the sync consumer picks it up. Depending on your JMS provider, it might already support this type of 'JMS bridge'. What are you using?
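The 'async gateway' can be sketched with in-memory BlockingQueues standing in for the two JMS queues; no JMS API is used, and AsyncGatewaySketch and startGateway are hypothetical names. A background thread blocks on the shared queue and copies each message to a second queue that only the synchronous consumer reads, so the synchronous side keeps its "wait up to 60 seconds" semantics.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of an "async gateway": BlockingQueues stand in for JMS queues.
class AsyncGatewaySketch {
    // Starts a daemon thread that forwards every message from source to target.
    static Thread startGateway(BlockingQueue<String> source, BlockingQueue<String> target) {
        Thread gateway = new Thread(() -> {
            try {
                while (true) {
                    // Block until a message arrives, then copy it to the second queue.
                    target.put(source.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested: stop forwarding
            }
        });
        gateway.setDaemon(true);
        gateway.start();
        return gateway;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> shared = new LinkedBlockingQueue<>();
        BlockingQueue<String> forSyncConsumer = new LinkedBlockingQueue<>();
        startGateway(shared, forSyncConsumer);

        shared.put("response-1");
        // The synchronous consumer still waits with a timeout, as before.
        String reply = forSyncConsumer.poll(60, TimeUnit.SECONDS);
        System.out.println(reply); // response-1
    }
}
```

With a real provider, the gateway would more likely be a javax.jms MessageListener whose onMessage() sends a copy to the second queue.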

MVC3 AsyncController - Can we send heartbeat data to the client?

In order to overcome the (apparent) 4 minute idle connection timeout on the Azure load balancer, it seems necessary to send some data down the pipe to the client every now and again to keep the connection from being regarded as idle.
Our controller is set up as an AsyncController, and it fires several different asynchronous methods on other objects, all of which are set up to use IO Completion Ports. Thus, we return from our method immediately, and when the completion packet is processed, IIS hooks back up to the original request so that we can render our View.
Is there any way to periodically send a few bytes down the wire in this case? In a "classic" situation, we could have executed the method and then just spun while we waited, sending data every few seconds until the asynchronous method was complete. But, in this situation, the IIS thread is freed to go do other business, and we hook back up to it in our completion callback. What to do? Is this possible?
While your particular case is Windows Azure specific (the 4-minute timeout of the load balancers), the question itself is purely about IIS/ASP.NET. Anyway, I don't think it is possible to send "ping-backs" to the client while in an AsyncController/AsyncPage. This is the whole idea of AsyncPages/Controllers: IIS sets the socket aside and lets the thread serve other requests, and it comes back only when you have brought OutstandingOperations down to zero with AsyncManager.OutstandingOperations.Decrement(). Only then is control given back to send the final response to the client. And once you are at the point of sending the response, there is no turning back.
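To illustrate why nothing can be sent early, here is a Java analogue of the OutstandingOperations counter. It is not the ASP.NET API; the class and callback names are hypothetical. The response is produced only when the last pending operation decrements the counter to zero, which is why there is no hook for intermediate "ping-backs".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Java analogue (not the ASP.NET API) of AsyncManager.OutstandingOperations:
// the final response may only be sent once the counter drops back to zero.
class OutstandingOperationsSketch {
    private final AtomicInteger outstanding = new AtomicInteger();
    private final Runnable sendFinalResponse;

    OutstandingOperationsSketch(Runnable sendFinalResponse) {
        this.sendFinalResponse = sendFinalResponse;
    }

    // Called before starting each asynchronous operation.
    void increment() {
        outstanding.incrementAndGet();
    }

    // Called from each completion callback; only the last one triggers the
    // response. Until then, nothing is written to the client for this request.
    void decrement() {
        if (outstanding.decrementAndGet() == 0) {
            sendFinalResponse.run();
        }
    }

    public static void main(String[] args) {
        OutstandingOperationsSketch ops =
            new OutstandingOperationsSketch(() -> System.out.println("rendering view"));
        ops.increment();
        ops.increment();
        ops.decrement(); // one operation still pending: nothing is sent
        ops.decrement(); // prints "rendering view"
    }
}
```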
I would rather question the architecture: why do you think someone would wait 4 minutes to get a response (even with a good animated "please wait")? A lot of things may happen during this time, from a browser crash, through an Internet disruption, to a total power loss at the client. If you are on real Azure, why not just send tasks to a Worker Role via a queue (Azure Storage Queues or Service Bus Queues)? Another option for such long-running tasks is a fully AJAX-based solution using SignalR, where you communicate the status of the long-running operation via SignalR.
UPDATE 1 (due to comments)
In addition to the approach suggested by @knightpfhor, this can also be achieved with queues. The requestor creates a task with some unique ID and sends it to a "Task submission" queue. It then "listens" to (or polls at regular or irregular intervals) a "Task completion" queue for a message with the given task ID.
In any case, I don't see a reason for keeping the client connected for the whole duration of the long-running task. There are a number of ways to decouple such communication.
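The submission/completion-queue approach from the update can be sketched in memory, assuming BlockingQueues in place of Azure Storage Queues or Service Bus queues (no Azure SDK is used). The class name, the uppercase placeholder "work", and the unclaimed map that holds results belonging to other requestors are illustrative assumptions.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// In-memory sketch: the two BlockingQueues stand in for cloud queues.
class TaskQueueSketch {
    static final class Task {
        final String id;
        final String payload;
        Task(String id, String payload) { this.id = id; this.payload = payload; }
    }

    static final class Result {
        final String taskId;
        final String output;
        Result(String taskId, String output) { this.taskId = taskId; this.output = output; }
    }

    final BlockingQueue<Task> submissions = new LinkedBlockingQueue<>();
    final BlockingQueue<Result> completions = new LinkedBlockingQueue<>();
    // Results taken off the completion queue that belong to other requestors.
    private final Map<String, String> unclaimed = new ConcurrentHashMap<>();

    // Requestor: create a task with a unique ID, enqueue it, and return at once.
    String submit(String payload) {
        String id = UUID.randomUUID().toString();
        submissions.add(new Task(id, payload));
        return id;
    }

    // Worker (e.g. a Worker Role): take one task and publish its result.
    void workOnce() throws InterruptedException {
        Task task = submissions.take();
        completions.put(new Result(task.id, task.payload.toUpperCase())); // placeholder work
    }

    // Requestor: poll the completion queue until the result for taskId appears,
    // giving up after timeoutMillis (the client can simply ask again later).
    String awaitResult(String taskId, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            String mine = unclaimed.remove(taskId);
            if (mine != null) {
                return mine;
            }
            Result result = completions.poll(100, TimeUnit.MILLISECONDS);
            if (result == null) {
                continue;
            }
            if (result.taskId.equals(taskId)) {
                return result.output;
            }
            unclaimed.put(result.taskId, result.output); // someone else's result
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        TaskQueueSketch queues = new TaskQueueSketch();
        String id = queues.submit("report");
        Thread worker = new Thread(() -> {
            try {
                queues.workOnce();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        System.out.println(queues.awaitResult(id, 5_000)); // REPORT
    }
}
```

Because the requestor only holds a task ID, the HTTP request that submitted the task can end immediately; a later request (or a SignalR push) can deliver the result.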

Why create new thread with startAsync instead of doing work in servlet thread?

In Servlet 3.0 one can use startAsync to move long work to another thread, so that you can free up the servlet thread.
It seems that I'm missing something, because I don't see why not just use the servlet thread for the work. Is the thread created by startAsync somehow cheaper?
In most situations when handling requests, you are blocking or waiting on some external resource/condition. In this case you are occupying the thread (and hence a lot of memory) without doing any work.
With Servlet 3.0 you can serve thousands of concurrent connections, many more than the available threads. Think about an application that provides file downloads with limited throughput. Most of the time your threads are idle because they are waiting to send the next chunk of data. With ordinary servlets you cannot serve more clients than the number of your HTTP threads, even though most of the time these threads are idle/sleeping.
With Servlet 3.0 you can have thousands of connected clients with a few HTTP threads. You can find a real-world example in my article "Tenfold increase in server throughput with Servlet 3.0 asynchronous processing", inspired by this question: "Restrict download file bandwidth/speed in Servlet".
Is the thread created by startAsync somehow cheaper?
There is no thread created by startAsync! It just tells the servlet container: hey, although the doGet/doPost method has finished, I am not done with this request, please do not close it. That's the whole point: you probably won't create a new thread for each async request. Here is another example: you have thousands of browsers waiting for a stock price change using Comet. With standard servlets this would mean thousands of idle threads waiting for some event.
With Servlet 3.0 you can just keep all asynchronous requests waiting in an ArrayList or some queue. When the stock price change arrives, send it to all clients one after another. No more than one thread is needed in this scenario, and all HTTP threads are free to process other requests.
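A minimal sketch of the "park the requests, broadcast from one thread" idea, assuming a hypothetical PendingRequest interface in place of the servlet AsyncContext so that it runs without a servlet container. In a real servlet, doGet() would call request.startAsync(), park the returned AsyncContext, and the broadcaster would write to its response and then call complete().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PriceBroadcastSketch {
    // Hypothetical stand-in for javax.servlet.AsyncContext.
    interface PendingRequest {
        void complete(String body);
    }

    private final Queue<PendingRequest> waiting = new ConcurrentLinkedQueue<>();

    // Called from doGet() after startAsync(): park the request and return,
    // which frees the HTTP thread. No thread is created here.
    void park(PendingRequest request) {
        waiting.add(request);
    }

    // Called by a single thread when a price change arrives: answer every
    // parked request one after another. poll() removes each request exactly once.
    void onPriceChange(String price) {
        PendingRequest request;
        while ((request = waiting.poll()) != null) {
            request.complete(price);
        }
    }

    int waitingCount() {
        return waiting.size();
    }

    public static void main(String[] args) {
        PriceBroadcastSketch hub = new PriceBroadcastSketch();
        List<String> delivered = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            int client = i;
            hub.park(body -> delivered.add("client " + client + ": " + body));
        }
        hub.onPriceChange("ACME 42.10");
        System.out.println(delivered);          // [client 1: ACME 42.10, client 2: ACME 42.10, client 3: ACME 42.10]
        System.out.println(hub.waitingCount()); // 0
    }
}
```

Draining with poll() instead of iterating and then clearing the collection avoids losing a request that is parked while a broadcast is in progress.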
"With Servlet 3.0 you can just keep all asynchronous requests waiting in an ArrayList or some queue"
The problem is this: you still need a new thread to process the request and to pick it up to finally send the response.
So we free up HTTP threads, but have to create some thread to process the request.

CFQ IO scheduler request queues

The CFQ I/O scheduler in Linux has a set of request queues. Synchronous requests from processes go into separate per-process request queues, while all asynchronous requests go into a set of shared queues.
How are requests classified as synchronous or asynchronous? Does asynchronous in this context mean I/O done using kernel AIO (with all other normal read()/write() and buffered fread()/fwrite() calls counted as synchronous)?
Synchronous requests are those for which the process is blocked until they complete; asynchronous requests are those whose completion the process does not wait for, so it can continue running in parallel.
Typically, all normal reads a program makes are synchronous, since the process cannot advance until it has the data it requested. Writes, however, are most often asynchronous by nature: as long as the process is guaranteed to see all the writes it has performed (which the buffer/page cache takes care of), it does not care when the data is actually written to the storage device once it has called the write system call.
From there on it gets complicated: an fsync() system call is a synchronous request, and the same is true for some metadata-changing calls on journalled file systems, but not on non-journalled ones, and so on.
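From user space, the write/fsync distinction can be seen with Java NIO: FileChannel.write() usually returns as soon as the data is in the page cache, while FileChannel.force() asks for the data to be written to the storage device, much like fsync(). This sketch shows the system-call behavior only; it does not show how CFQ itself tags requests, and the exact mapping of force() onto fsync()/fdatasync() is a JVM implementation detail.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SyncVsAsyncWriteSketch {
    // Writes text to a fresh temp file. Without force(), write() typically
    // returns once the bytes are in the page cache and the kernel flushes them
    // later (an asynchronous request from the scheduler's point of view).
    // With force(true), the call blocks until data and metadata are on the
    // device, much like fsync() (a synchronous request).
    static Path write(String text, boolean waitForDisk) throws IOException {
        Path file = Files.createTempFile("io-demo", ".txt");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            if (waitForDisk) {
                channel.force(true);
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        for (boolean waitForDisk : new boolean[] {false, true}) {
            Path file = write("hello", waitForDisk);
            // Either way the process reads back its own write, via the page cache.
            String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            System.out.println(waitForDisk + " -> " + back); // false -> hello, then true -> hello
            Files.delete(file);
        }
    }
}
```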

How to limit transfer rate using HttpHandler

I'm programming a file transfer handler with a speed limit feature, with the rate based on the user's level. How do I control/calculate the transfer rate in an HttpHandler?
Some ASP.NET resources tell me that using Thread.Sleep will block the ASP.NET thread pool.
It is generally a bad idea to Sleep any thread from ASP.NET, because those threads could otherwise be used to service requests from the pool. If there were, say, 10 threads in the pool, sleeping 10 threads that were processing downloads would cause all other requests to pile up in the queue until a download had finished.
You are perhaps best served by creating an IHttpAsyncHandler instead of an IHttpHandler, as prescribed in:
http://msdn.microsoft.com/en-us/library/ms227433.aspx
You can use a timer to periodically pump x bytes of data to the client (but be sure to periodically poll for a closed connection using IsClientConnected or some such).
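The timer approach can be sketched in Java (this is not the ASP.NET API) with a ScheduledExecutorService: one shared timer thread writes at most chunkSize bytes every periodMillis, so the throughput is roughly chunkSize * 1000 / periodMillis bytes per second and no request thread ever sleeps. The clientConnected supplier is a hypothetical stand-in for checking HttpResponse.IsClientConnected, and the method and class names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class TimerPumpSketch {
    // Writes data to out in chunks of at most chunkSize bytes, one chunk per
    // timer tick. The returned future completes when all bytes are written,
    // or fails if the client disconnects or a write throws.
    static CompletableFuture<Void> pump(ScheduledExecutorService timer, byte[] data, OutputStream out,
                                        int chunkSize, long periodMillis, BooleanSupplier clientConnected) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        int[] offset = {0}; // position of the next byte to send
        Runnable tick = () -> {
            if (done.isDone()) {
                return; // a late tick after completion: do nothing
            }
            if (!clientConnected.getAsBoolean()) {
                done.completeExceptionally(new IOException("client disconnected"));
                return;
            }
            try {
                int n = Math.min(chunkSize, data.length - offset[0]);
                out.write(data, offset[0], n);
                out.flush();
                offset[0] += n;
                if (offset[0] == data.length) {
                    done.complete(null);
                }
            } catch (IOException e) {
                done.completeExceptionally(e);
            }
        };
        ScheduledFuture<?> handle = timer.scheduleAtFixedRate(tick, 0, periodMillis, TimeUnit.MILLISECONDS);
        // Stop the periodic task once the transfer ends, however it ends.
        done.whenComplete((ignored, error) -> handle.cancel(false));
        return done;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        // 3 bytes every 10 ms: roughly 300 bytes per second.
        pump(timer, data, out, 3, 10, () -> true).get(5, TimeUnit.SECONDS);
        System.out.println(out.toString("US-ASCII")); // 0123456789
        timer.shutdown();
    }
}
```

A per-user speed limit then reduces to choosing chunkSize and periodMillis from the user's level, while one timer thread can serve many downloads.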
You might want to try using timers and a timer callback to do this. The idea would be to have a timer (or maybe two) that determines when your handler can run and for how long. Every time the "go" timer expires, it starts a thread that writes your data to the response until the "stop" timer expires (or the same timer expires again); that thread then finishes what it was doing, does the housekeeping for the next thread, resets the "go" timer, and exits. Your main thread just sets up the initial timer and the data for the transfer, then starts the timer and exits. Presumably you'd need to keep a handle to the response somewhere so that you could access it again. By varying how long the handler has to wait/execute, you can control how many resources it uses.
