I have a question regarding Flink and its timer service.
I have a keyed stream (keyBy) that uses a timer.
When the timer fires, I need to send an HTTP request that might take a while to respond.
My question is: should I make the HTTP call asynchronous?
Or does Flink already invoke the timer callback on a new thread, asynchronously per key?
Thanks in advance
You can use a ProcessFunction that stores the data required for the HTTP request, and that can have a timer. When it fires, you emit a record that has the request data, which a subsequent AsyncFunction will use to make the periodic request that you need.
If you are asking whether the onTimer method is invoked in a separate thread for each key, then I am pretty sure that it is not. So you would need to invoke the HTTP call asynchronously in this case.
But to be completely honest, I don't think it is a good idea, in general, to use the onTimer function to perform HTTP calls. I don't know anything about your use case, but I think you should consider a different mechanism, such as creating a side output and then using the Flink Async operator. Using asynchronous calls inside onTimer can be tricky, especially once you consider retries, timeouts and possible failures.
According to the comment, the use case is to call a service every X minutes and then post something to Kafka. So what you could do is simply have a process function that schedules timers. Each time a timer fires, you generate an output record with the data needed for the request, if any data is needed. Then you use the Async operator to actually perform the requests, parse the response and, based on the response, generate an output record that can then be written to Kafka.
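The split described above (a cheap timer callback that only emits a record, followed by a separate asynchronous stage that performs the request) can be illustrated in plain Java. This is only a sketch of the shape of the pipeline, not Flink code: TimerThenAsyncSketch, RequestData, onTimerFired, asyncRequest and fakeHttpCall are hypothetical names, and fakeHttpCall stands in for a real non-blocking HTTP client. In an actual job the first stage would be a process function and the second an async function.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Plain-Java illustration only; none of these names are Flink API.
class TimerThenAsyncSketch {

    // Hypothetical record emitted when a timer fires: just the data the request needs.
    static final class RequestData {
        final String key;
        final String payload;

        RequestData(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Stage 1: the timer callback stays cheap and non-blocking; it only emits a record.
    static RequestData onTimerFired(String key, String storedState) {
        return new RequestData(key, storedState);
    }

    // Placeholder for the real HTTP call (assumption: replaced by an HTTP client in practice).
    static String fakeHttpCall(RequestData req) {
        return req.key + ":OK(" + req.payload + ")";
    }

    // Stage 2: stand-in for the async stage, with a timeout and a failure mapping.
    static CompletableFuture<String> asyncRequest(RequestData req) {
        return CompletableFuture
                .supplyAsync(() -> fakeHttpCall(req))
                .orTimeout(2, TimeUnit.SECONDS)              // bound slow responses
                .exceptionally(err -> req.key + ":FAILED");  // turn failures into an output record
    }

    // Fires one timer per key and collects the records that would be sent downstream.
    static List<String> run(List<String> keys) {
        List<CompletableFuture<String>> inFlight = keys.stream()
                .map(k -> onTimerFired(k, "state-of-" + k))
                .map(TimerThenAsyncSketch::asyncRequest)
                .collect(Collectors.toList());
        return inFlight.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b")));
    }
}
```

Keeping retries, timeouts and failure handling in the second stage is what keeps the timer callback itself trivial.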
I have a Netty ChannelInboundHandler in which I'm making calls to an external service (which is relatively slow). Depending upon the result of that call, I'm rewriting the pipeline to invoke different subsequent handlers. I have this working using a different executor group for the handler, but that's still inefficient, since the handler thread is doing nothing while waiting for the external service to respond.
Complicating the issue is that I'm doing this from a derivative of the PortUnificationServerHandler (itself a derivative of ByteToMessageDecoder), since the external service looks at the SNI hostname to determine whether to insert an SslHandler and decode, or just to pass the traffic straight through.
I've seen how the HexDumpProxy example makes a call to an external service, but I don't see how this can be done from within something like ByteToMessageDecoder. My rough idea is to create a future for the external request, then have the futures call ChannelHandlerContext.fireUserEventTriggered on completion with a custom event that my handler can look for and do the pipeline rewrites. But that feels ugly, and tests didn't make it look like my event would reach my own handler...
Suggestions?
I was reading Microsoft's documentation, specifically the async programming article, and I didn't understand the section that explains how the server's threads behave when using async code:
because it (the server) uses async and await, each of its threads is freed up when the I/O-bound work starts, rather than when it finishes.
Could anyone explain what it means that the threads are freed up when the I/O starts?
Here is the article: https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth
When ASP.NET gets an HTTP request, it takes a thread from the thread pool and uses that to execute the handler for that request (e.g., a specific controller action).
For synchronous actions, the thread stays assigned to that HTTP request until the action completes. For asynchronous actions, the await in the action method may cause the thread to return an incomplete task to the ASP.NET runtime. In this case, ASP.NET will free up the thread to handle other requests while the I/O is in flight.
Further reading about the difference between synchronous and asynchronous request handling and how asynchronous work doesn't require a thread at all times.
When your application makes a call to an external resource, such as a database or an HTTP endpoint via HttpClient, the thread that initiated the call needs to wait.
In the synchronous approach, it waits idly until it gets a response.
In the asynchronous approach, the thread is released as soon as the app makes the external call.
Here is an article about how it happens:
https://medium.com/@karol.rossa/asynchronous-programming-73b4f1988cc6
And a performance comparison between the async and sync approaches:
https://medium.com/@karol.rossa/asynchronous-performance-1be01a71925d
Here's an analogy for you: have you ever ordered at a restaurant with a large group and had someone not be ready to order when the waiter came to them? Did they bring in a different waiter to wait for him or did the waiter just come back to him after he took other people's orders?
The fact that the waiter is allowed to come back to him later means that he's freed up immediately after calling on him rather than having to wait around until he's ready.
Asynchronous I/O works the same way. When you do a web service call, for example, the slowest part (from the perspective of the client at least) is waiting for the result to come back: most of the delay is introduced by the network (and the other server), during which time the client thread would otherwise have nothing to do but wait. Async allows the client to do other things in the background.
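The waiter analogy can be made concrete with a small sketch. The thread above is about .NET; the example below uses Java's CompletableFuture purely to illustrate the same idea, and every name in it (FreedThreadSketch, fakeIo, serveAll) is made up for the illustration. A single worker thread plays the waiter, and a scheduled timer stands in for the network: the worker starts each request, is released while the simulated I/O is pending, and picks the request up again when the response arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FreedThreadSketch {

    // Simulated non-blocking I/O: completes a future after delayMs without
    // holding the calling thread. The timer thread stands in for the OS/network.
    static CompletableFuture<String> fakeIo(ScheduledExecutorService timer, int id, long delayMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        timer.schedule(() -> { result.complete("response-" + id); }, delayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    // Handles the given number of requests on ONE worker thread and returns how many finished.
    static int serveAll(int requests, long ioDelayMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger completed = new AtomicInteger();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                final int id = i;
                CompletableFuture<Void> done = CompletableFuture
                        .supplyAsync(() -> id, worker)                  // start handling on the worker
                        .thenCompose(x -> fakeIo(timer, x, ioDelayMs))   // worker is free while I/O is pending
                        .thenAcceptAsync(resp -> completed.incrementAndGet(), worker); // resume on the worker
                pending.add(done);
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            worker.shutdown();
            timer.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int served = serveAll(50, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(served + " requests served in about " + elapsedMs + " ms");
    }
}
```

If the worker instead slept for the duration of each simulated call, the same requests would be processed strictly one after another, which is the situation the quoted sentence about freeing threads "when the I/O-bound work starts" is contrasting against.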
Do I need to do anything to make all requests asynchronous or are they automatically handled that way?
I ran some tests and it appears that each request comes in on its own thread, but I figure better to ask as I might have tested wrong.
Update: (I have a bad habit of not explaining fully - sorry) Here's my concern. A client browser makes a REST request to my server of http://data.domain/com/employee_database/?query=state:Colorado. That comes in to the appropriate method in the controller. That method queries the database and returns an object which is then turned into a JSON structure and returned to the calling app.
Now let's say 10,000 clients all make a similar query to the same server. So I have 10,000 requests coming in at once. Will my controller method be called simultaneously in 10,000 distinct threads? Or must the first request return before the second request is called?
I'm not asking about the code in my handler method having asynchronous components. For my case the request becomes a single SQL query so the code has nothing that can be handled asynchronously. And until I get the requested data, I can't return from the method.
No, REST is not async by default; requests are handled synchronously. However, your web server (IIS) has a setting for the maximum number of threads that can work at the same time, and it maintains a queue of received requests. So a request goes into the queue; if a thread is available it gets executed, otherwise it waits in the IIS queue until a thread becomes available.
I think you should be using async IO operations, such as async database calls, in your case. Yes, in Web API every request has its own thread, but threads can run out if there are many concurrent requests. Threads also use memory, so if your API gets hit by too many requests it may put pressure on your system.
The benefit of using async over sync is that you use your system resources wisely. Instead of blocking the thread while it waits for the database call to complete, as the sync implementation does, async frees the thread to handle more requests or whatever other work needs a thread. Once the IO (database) call completes, a thread (possibly a different one) picks up from there and continues. Async can also improve your API's throughput under load when your IO operations take a long time to complete.
To be honest, your question is not very clear. If you are making an HTTP GET using HttpClient, say with the GetAsync method, the request is fired and you can do whatever you want on your thread until you get the response back. So this request is asynchronous.
If you are asking about the server side that handles this request (assuming it is ASP.NET Web API), then whether it is asynchronous depends on how you implemented your web API. If your action method does three things, say 1, 2, and 3, one after the other synchronously in blocking mode, the same thread services the whole request. On the other hand, say #2 above is an HTTP call to a web service. If you use HttpClient and make that call asynchronously, you can get into a situation where one request is serviced by more than one thread. For that to happen, you must make the HTTP call from your action method asynchronously and mark the method with the async keyword. In that case, when you call await inside the action method, execution of the action method returns and the thread servicing your request is free to service some other request. When the response is available, the same or some other thread continues from where it left off.
A long, boring answer perhaps, but this is difficult to explain in words alone. I hope it gives you some clarity.
UPDATE:
Your action method will execute in parallel in 10,000 threads (ideally). I say ideally because a CLR thread pool with 10,000 threads is not typical and probably impractical as well. There are physical limits as well as limits imposed by the framework, but I guess the answer to your question is that the requests will be serviced in parallel. The correct term here is 'parallel', not 'async'.
Whether it is sync or async is your choice. You choose by the way you write your action. If you return a Task and also use async IO under the hood, it is async. In all other cases it is synchronous.
Don't feel tempted to slap async on your action and use Task.Run. That is async-over-sync (a known anti-pattern). It must be truly async all the way down to the OS kernel.
No framework can make sync IO automatically async, so it cannot happen under the hood. Async IO is callback-based which is a severe change in programming model.
This does not answer what you should do of course. That would be a new question.
I have a web service that receives requests from users and returns some json. I need to save the json string in the database so for the moment, the write query occurs before the response is sent back.
Is there a way to send the response first and then do the write query, after the response left the web service?
Thanks.
There's a couple of different options here - they all have tradeoffs, though, and would be pretty esoteric. You don't mention why you want to do this, so I'm guessing performance. If that's the case, I think you're barking up the wrong tree - a simple write is almost certainly not your performance problem.
So, off the top of my head:
1. Queuing, as Ragesh mentions, would be a nice approach. This gets you semantics similar to a transaction while offloading the write. You still have to write to the queue, though, which may be about the same overhead as writing to the DB.
2. You could spawn a new thread (using either the ThreadPool or System.Threading.Thread; there's some debate about which is preferable in ASP.NET) to handle the write. This can generally work, but you may have issues with unhandled exceptions, app domain restarts, etc.
3. You could store the JSON data in a static or Application variable, then use a Timer to periodically write it to the DB. This will be multithreaded code, so you will need to synchronize reads and writes to the collection.
4. Similar to #3, store the JSON data in Cache and use the invalidation callback to write to the DB.
5. Lots of variations on store somewhere (memory, disk, flat DB table, etc.), process later (ASP.NET, scheduled task, Windows Service, Sql Agent, etc.).
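The "store in memory, flush on a timer" option from the list above might look roughly like this. The answers in this thread are about ASP.NET; the sketch is written in Java only to show the shape of the idea, and WriteBehindBufferSketch, enqueue, flush and fakeDatabase are hypothetical names, with an in-memory list standing in for the real database. The caveats from the list still apply: anything buffered is lost if the process restarts before the next flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class WriteBehindBufferSketch {
    // Thread-safe buffer shared between request threads and the flush timer.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    // Stand-in for the real database (assumption); only touched under the lock.
    private final List<String> fakeDatabase = new ArrayList<>();

    // Called on the request thread: cheap, no database round trip.
    void enqueue(String json) {
        pending.add(json);
    }

    // Called periodically by a timer; synchronized so overlapping flushes cannot interleave.
    synchronized int flush() {
        int written = 0;
        String json;
        while ((json = pending.poll()) != null) {
            fakeDatabase.add(json); // a real implementation would run the SQL write here
            written++;
        }
        return written;
    }

    synchronized int storedCount() {
        return fakeDatabase.size();
    }

    public static void main(String[] args) throws InterruptedException {
        WriteBehindBufferSketch buffer = new WriteBehindBufferSketch();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(buffer::flush, 100, 100, TimeUnit.MILLISECONDS);

        buffer.enqueue("{\"user\":1}"); // the response can be sent right after this returns
        buffer.enqueue("{\"user\":2}");

        Thread.sleep(300);              // give the timer a chance to flush
        timer.shutdown();
        System.out.println("rows stored: " + buffer.storedCount());
    }
}
```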
@frenchie says: a response starts by reading the json string from the db and ends with writing it back. In other words, if the user sends a request, the json string that's going to be read must be the one that was written in the previous response.
That complicates things, since inherent in async work is not knowing when something is done. If you require the async portion (writing back to the DB) to be done before handling the next request, you'll have to wait to make sure it actually completed. To do that, you'll need to keep server-side state about the client, which is not exactly a best practice as far as services go (though it sounds like you're already doing that with these JSON request/response pairs).
Given the complications, I would make sure that you've done your profiling and determined it is indeed a performance problem.
You can schedule the query as background work like this:
    // queue the work on the thread pool
    ThreadPool.QueueUserWorkItem(state => AsynchronousExecuteReference());
    // and the method it runs
    static void AsynchronousExecuteReference()
    {
        // run your SQL update here
    }
Another example uses a Thread inside a class, so you can pass parameters to it.
    using System.Threading;

    public class RunThreadProcess
    {
        // Some parameters
        public int cProductID;
        // my thread
        private Thread t = null;
        // start it
        public Thread Start()
        {
            t = new Thread(new ThreadStart(this.work));
            t.IsBackground = true;
            t.SetApartmentState(ApartmentState.MTA);
            t.Start();
            return t;
        }
        // the actual work
        private void work()
        {
            // do the thread work;
            // all parameters (e.g. cProductID) are available here
        }
    }
And here is how I run it:
    var OneAction = new RunThreadProcess();
    OneAction.cProductID = 100;
    OneAction.Start();
Do not worry about memory: the GC knows that this object is in use until the thread ends. I have checked this, and the GC does not collect it; it waits for the thread to end.
You should look at using message queues like MSMQ, ActiveMQ or RabbitMQ to do this. When you receive your request, you'll put the relevant data in to the queue, and send your response to the client. At the other end of the queue, you'll have some process that reads from the queue and inserts data in to your database.
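The flow described above (enqueue, respond, and let a separate consumer write to the database) can be sketched with an in-process queue. This is only an illustration in Java: a LinkedBlockingQueue stands in for a real broker such as MSMQ, ActiveMQ or RabbitMQ, so unlike those it gives no durability, and QueueThenRespondSketch, handle, startConsumer, STOP and fakeDatabase are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class QueueThenRespondSketch {
    private static final String STOP = "__stop__"; // hypothetical shutdown marker

    // Request handler: put the data on the queue, then return the response right away.
    static String handle(BlockingQueue<String> queue, String json) {
        queue.add(json);
        return "accepted";
    }

    // Consumer at the other end of the queue: takes messages and "inserts" them.
    static Thread startConsumer(BlockingQueue<String> queue, List<String> fakeDatabase) {
        Thread t = new Thread(() -> {
            try {
                for (String msg = queue.take(); !msg.equals(STOP); msg = queue.take()) {
                    fakeDatabase.add(msg); // a real consumer would write to the database here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Sends n messages, stops the consumer, and reports how many rows were stored.
    static int run(int n) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> fakeDatabase = new CopyOnWriteArrayList<>();
        Thread consumer = startConsumer(queue, fakeDatabase);
        for (int i = 0; i < n; i++) {
            handle(queue, "{\"id\":" + i + "}");
        }
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fakeDatabase.size();
    }

    public static void main(String[] args) {
        System.out.println("rows stored: " + run(5));
    }
}
```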
This is missing the point of request/response, unless you want to get into async commands like a service bus, but that's pub/sub, not request/response. The point of request/response is to do the work on the server after receiving the request and before sending the response, even if that work is sending an async message to a service bus.
You could try moving your web service URL to an ASPX page, where the page lifecycle comes into play.
In the code-behind, call your routine that does the main portion of the work in Page_Load or Page_Prerender (or whenever is appropriate prior to the response being sent) and then do your DB work in the Page_Unload event which occurs after the response has been sent (http://msdn.microsoft.com/en-us/library/ie/ms178472.aspx).
I'm setting up a web service in Axis2 whose job will be to take a bunch of XML and put it on a queue to be processed later. I understand it's possible to set up a client to invoke a synchronous web service asynchronously by using an "invokeNonBlocking" operation on the "Call" instance. (ref http://onjava.com/pub/a/onjava/2005/07/27/axis2.html?page=4)
So my question is: is there any advantage to using an asynchronous web service in this case? It seems redundant because 1) the client isn't blocked and 2) the service has to accept the XML and write it to the queue regardless of whether it's synchronous or asynchronous.
In my opinion, asynchronous is the appropriate way to go. A couple of things to consider:
Do you have multiple clients accessing this service at any given moment?
How often is this process occurring?
It does take a little more effort to implement the async methods, but I guarantee that in the end you will be much happier with the result. For one, you don't have to manage threading. Your primary concern might just be the volatility of the data in the queue (i.e. race/deadlock conditions).
A "sync call" seems appropriate, I agree.
If the request from the client isn't time-consuming, then I don't see the advantage in making the call asynchronous either. From what I understand of the situation in question, the web service will perform its "processing" of the request some time in the future.
If, on the contrary, the request had required a time-consuming process, then an async call would have been appropriate.
After ruminating some more about it, I'm thinking that the service should be asynchronous. The reason is that it would put the task of writing the data to the queue into a separate thread, thus lessening the chances of a timeout. It makes the process more complicated, but if I can avoid a timeout, then it's got to be done.