Let me provide a brief overview of the problem I'm struggling with at the moment.
Assume we have an API endpoint (an async endpoint) which we are calling like this:
http POST 'http://localhost:8080/check_smth/?param1=param1&param2=param2&param3=null&param4=null&reply_to=http://some_url:some_port/' \
Postman-Token:84dcfd8c-8e0a-438c-95ef-bc4d39809f35 \
cache-control:no-cache
Once we hit the API endpoint, some processing starts under the hood. When it's done, it notifies the URL that we provided as the value of the reply_to parameter.
I'm wondering whether there is some built-in functionality in Gatling that would let me specify a URL in the request and then wait on it for that async process to finish.
If the first option is not possible in Gatling, is it possible to poll the DB somehow, waiting for a specific status, and start the next iteration of requests once it has changed?
I'm sorry if it sounds silly and unprofessional, but I'm just getting started with Gatling. I will appreciate any thoughts on how to achieve/not achieve the desired result.
Thanks
I don't think there's any way to accept an incoming request, but you might be able to use the JDBC protocol to poll the DB until the record is present.
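To make the polling idea concrete, here is a rough sketch of the wait loop, written in plain Python rather than Gatling's own DSL. The `jobs` table, its `status` column, and the `wait_for_status` helper are all made up for illustration, and the in-memory SQLite database only stands in for whatever DB the async process really writes to.

```python
import sqlite3
import time


def wait_for_status(conn, job_id, wanted="done", timeout=5.0, interval=0.1):
    """Poll the (hypothetical) jobs table until the job reaches `wanted`.

    Returns True if the status was seen before the timeout, else False.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        row = conn.execute(
            "SELECT status FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is not None and row[0] == wanted:
            return True
        time.sleep(interval)  # back off before asking again
    return False


# Demo data: an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO jobs VALUES ('job-1', 'done')")
conn.execute("INSERT INTO jobs VALUES ('job-2', 'pending')")
conn.commit()
```

The timeout matters: without it, a job that never completes would stall the whole test run instead of being reported as a failure.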
Suppose I have an API that creates a cloud instance asynchronously. After I make the API call it just returns a success response, but the cloud instance has not been initialized yet. It takes 1-2 minutes to create the instance, and after that its information (e.g. IP, hostname, OS) is saved to the DB, which means I have to wait 1-2 minutes before I can fetch the data again to show it. At first I tried making a loading component, but the problem is that I don't know when the cloud instance is initialized (each instance takes a different amount of time to create). I'm considering using WebSockets or cron, or should I redesign my API? Has anyone designed an asynchronous system before? How do you handle such a case?
If the API that you call gives you no information on when it's done with its asynchronous processing, it seems to me that you'll have to check at intervals until you find that the resource is ready; i.e. to poll it.
This seems to me to roughly fit the description and intent of the Polling Consumer pattern. In general, for asynchronous systems design, I can't recommend Enterprise Integration Patterns enough.
As others noted, you can either have a notification channel using WebSockets or poll the backend. Personally I'd probably go with the latter for this case and would actually create several APIs: one for initiating the work, which gives back a URL with a "job id" in it, and others where the status of the job can be polled.
RESTfully, that would look something like: POST /instances to initiate a job, GET /instances to see all the instances that are running/created/stopped, and GET /instances/<id> to see the status of a specific instance (initiating, failed, running, or whatever).
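As a rough illustration of those three endpoints, here is a framework-free Python sketch where each function plays the role of one route. The function names, the in-memory `INSTANCES` store, the `mark_running` hook, and the 202 status code are my own assumptions, not something the design above prescribes.

```python
import uuid

# In-memory stand-in for the real datastore (hypothetical).
INSTANCES = {}


def post_instances(spec):
    """POST /instances: record the job and return at once with a status URL."""
    instance_id = uuid.uuid4().hex
    INSTANCES[instance_id] = {"id": instance_id, "spec": spec, "status": "initiating"}
    # 202 Accepted is an assumption; the text only says a URL with the
    # job id comes back.
    return 202, {"Location": f"/instances/{instance_id}"}


def get_instances():
    """GET /instances: list every instance with its current status."""
    return 200, list(INSTANCES.values())


def get_instance(instance_id):
    """GET /instances/<id>: report the status of one instance."""
    instance = INSTANCES.get(instance_id)
    if instance is None:
        return 404, {"error": "unknown instance"}
    return 200, instance


def mark_running(instance_id, ip):
    """Called by whatever provisions the instance once it is ready."""
    INSTANCES[instance_id].update(status="running", ip=ip)
```

The client keeps calling the URL from the `Location` header until the status is no longer `initiating`.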
WebSockets would work, but might be overkill for this use case. I would probably display a status of 'creating' or something similar after receiving the success response from the API call, and then start polling the API to see if the creation process has finished.
I'm working on an API (Pragmatic REST API or very similar). I would like to know if it is possible to make an API request that returns a quick response (in JSON) while the heavy code continues to run in the background.
I suppose this is possible by using a queue system, but I have no idea where to start with this.
You can have your API delegate long-running things to another process.
You mentioned queues; that's one way of doing things. All you really need is an application that can execute whatever long-running tasks you have.
Let's imagine a simple system that can do this.
Your API receives a request to do something.
Instead of doing this something, the API writes one record into a database with the details of what needs to be done. Another app watches that table, sees a new record, runs the thing, updates the record with the status / result / whatever it needs.
On any requests from now on, the API can check the record and return whatever is there.
This is the simplest thing I can think of. You can easily do other things as well, talk to a queue system, send it data, let something else execute it.
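Here is a minimal sketch of that table-watching setup in Python with SQLite. The `tasks` schema and the `api_submit`, `api_status`, and `worker_run_once` names are invented for the example, and `payload.upper()` is only a placeholder for the real long-running work. In practice the worker would be a separate process looping over a shared database; a single in-memory connection is used here just to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, "
    "status TEXT NOT NULL DEFAULT 'pending', "
    "result TEXT)"
)


def api_submit(payload):
    """API side: store what needs doing and return the task id right away."""
    cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def api_status(task_id):
    """API side: later requests just read whatever the worker has recorded."""
    row = conn.execute(
        "SELECT status, result FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    return None if row is None else {"status": row[0], "result": row[1]}


def worker_run_once():
    """Worker side: pick the oldest pending task, run it, save the outcome."""
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False  # nothing to do right now
    task_id, payload = row
    result = payload.upper()  # placeholder for the real long-running work
    conn.execute(
        "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
        (result, task_id),
    )
    conn.commit()
    return True
```

With more than one worker you would also need to claim a row (for example by setting its status to 'running') before working on it, so that two workers never pick up the same task.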
Looking at your comments, what you are suggesting is not really a good way of building APIs. Why do I say this?
Well, let's say that you receive a request, and the API starts a work thread and sends back a 200 to the client. Great, the client knows work has started, but how does it know when that process has ended, and how does it receive whatever data it expects back?
Let's go a bit deeper next.
What happens when 1000 clients call that one endpoint and your API is attempting to start 1000 work threads? You've killed your API, no work gets done and no client gets anything.
This is why I suggest to delegate the work to something else, not the API. Let the API do what it does best, run quick things and return results and delegate other things to something else.
Can anyone clarify what the purpose of using a queue is?
What I understand is that a webhook is just a URL: you make a POST request to that URL and then do some stuff based on the body/data of the request. So why do I need to queue the data, store it in a database, then loop through the database again and perform the stuff?
The short answer is, you don't have to use a queue. A webhook is just an HTTP request (typically POST) notifying your application of some type of event. The reason you might want to consider a queue is because of typical issues you could run into.
One of these is the response time back to the webhook requester (the source). Many sources want a response (HTTP status 200) as quickly as possible so they can dequeue the request from their webhook system. If processing the webhook takes some time, a source will typically advise you to use a queue so that the lengthier processing runs asynchronously, after the 200 response has been sent.
Another possible reason could be for removing duplicate requests. There is no guarantee with webhooks that you will only receive a single request per event. A queue can be used to de-dupe these requests.
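Both reasons can be seen in a small Python sketch: the handler answers immediately, pushes the real work onto a queue, and drops repeat deliveries. It assumes each event carries a unique `id` field, which depends entirely on the webhook source; `handle_webhook`, `pending`, and `seen_event_ids` are names made up for the example, and the seen-set lives in memory only.

```python
import queue

pending = queue.Queue()   # work waiting for a background consumer
seen_event_ids = set()    # ids already accepted (in memory only)


def handle_webhook(event):
    """Accept a webhook delivery and answer quickly; defer the real work."""
    event_id = event["id"]  # assumes the source sends a unique id per event
    if event_id in seen_event_ids:
        # Still answer 200 so the source stops retrying this delivery.
        return 200, "duplicate ignored"
    seen_event_ids.add(event_id)
    pending.put(event)
    return 200, "queued"
```

A separate consumer would then pull events off `pending` at its own pace.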
I would recommend you stick with a simple request handler if possible, then evolve a more sophisticated handler if you run into issues. Consider queues as a potential design approach if you run into issues like those above.
You need some way to prevent a conflict if the webhook is invoked multiple times very close together.
It doesn't necessarily have to be a queue, though. If the webhook performs database queries and updates, you can use a transaction to ensure that this is atomic for each invocation.
In this respect, it's little different from any other web utility. You should do something similar in scripts that process web forms.
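A rough Python/SQLite sketch of the transaction approach follows. The tables, the `apply_event` helper, and the idea of keying on an event id are assumptions made for illustration; the point is only that recording the event and applying its effect happen in one transaction, so a repeated invocation changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()


def apply_event(event_id, account, delta):
    """Apply an event at most once; returns False for a repeat delivery."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # The primary key rejects a second insert of the same event id.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (delta, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```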
I have yet to understand the behavior of a web server thread if I make an async call to, say, a database and immediately return a response (say, OK) to the client without waiting for the async call to return. First of all, is it a good approach? What happens to the thread that made the async call if it is used again to serve another request and the previous async call then returns to this particular thread? Or does the web server hold this thread, waiting until the async call it made returns? Then the issue would be that many hanging threads stay open and the web server would not be available to take more requests. I am looking for an answer.
It depends on the way your HTTP server works. But you should be very cautious.
Let's say you have a main event loop taking care of incoming HTTP connections, and worker threads which manage the HTTP communications.
A worker thread should be considered ready to take on a new HTTP request only when it is effectively completely ready for that.
In terms of pure HTTP, the most important thing is to avoid sending a response before having received the whole query. It seems simple, and it's usually the case. But if the query has a body, which may be a chunked body, it could take time to receive the whole message.
You should never send a response before that, unless it's something like a 400 Bad Request response followed by a real TCP/IP connection close. If you fail to do so, and you have a message length parsing issue, the fact that you sent a response before the end of the query may lead to security problems. It could be used to exploit differences in the parsing of messages between your server and any other HTTP agent in front of your server (SSL terminator, reverse proxy, etc.), in some sort of HTTP smuggling issue. For this agent, if you sent a response, it means you had the whole message, and it can send the next message, whereas you will in fact think this is just another part of the body.
Now, if you have the whole message, you can decide to send an early response and detach an asynchronous task to do the actual work. But this means:
You have to assume that no more output should be generated. You will not try to send any output to the request issuer, and you should consider that the communication is now closed.
The worker thread should not receive new requests to manage, and this is the hard part. If this thread is marked as available for a new request, it may also be killed by the thread manager (in Nginx or Apache there are request counters associated with workers, and workers are killed after reaching a limit, to create fresh ones). It may also receive a graceful reload command (usually it's a kill), etc.
So you start to enter a zone where you should know the internals of the HTTP server, which may or may not be managed by you, and where changes may appear sooner or later. And you start to do very strange things, which usually lead to strange issues that are hard to reproduce.
Usually the best way to handle asynchronous tasks, while still being able to understand what happens, is to use a messaging system. Put a list of tasks in a queue, and have a parallel asynchronous worker process that does things with these tasks. Track the status of these tasks if you need it.
The same may apply to the client: after receiving a very fast HTTP answer, it may need to perform some AJAX polling for the task status. And you may only have to check the status of the task in the queue to send a response.
You will get more control on the whole thing.
Personally, I really dislike having detached threads, coming from strange code, performing heavy tasks without any way of outputting a status or reporting errors, and maybe preventing clean application stop calls (still waiting for strange threads to join) that do not involve a killall.
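The queue-plus-worker setup recommended in this answer could look roughly like the Python sketch below. A single managed thread stands in for the separate worker process purely to keep the example short; the `submit`, `worker`, and `status` names are made up for illustration. Unlike a detached thread, this worker records the status of every task, reports failures, and stops cleanly when it receives a sentinel.

```python
import queue
import threading

tasks = queue.Queue()
status = {}  # task id -> "queued" | "running" | "done" | "failed"
status_lock = threading.Lock()


def set_status(task_id, value):
    with status_lock:
        status[task_id] = value


def submit(task_id, func, *args):
    """Request handler side: enqueue the task and return immediately."""
    set_status(task_id, "queued")
    tasks.put((task_id, func, args))


def worker():
    """Worker side: run tasks one by one and record how each one ended."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: stop cleanly
            tasks.task_done()
            break
        task_id, func, args = item
        set_status(task_id, "running")
        try:
            func(*args)
            set_status(task_id, "done")
        except Exception:
            set_status(task_id, "failed")  # errors are recorded, not lost
        finally:
            tasks.task_done()


# daemon=True so a forgotten worker never blocks interpreter exit.
worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

A status endpoint only has to look up `status[task_id]`, and shutting down is a matter of calling `tasks.put(None)` followed by `worker_thread.join()`.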
It depends whether this asynchronous operation performs something which the client should be notified about.
If you return 200 OK (i.e. successfully completed) and later the asynchronous operation fails then the client will not know about the error.
You of course have some options, like sending some kind of push notification over a WebSocket, or having the client send another request that returns the actual result, and things like that. So it basically depends on your needs.
Is there any way to achieve this?
My actual requirement is this: I want to return success from my REST service as soon as I get the data and perform a basic action on it, and then I want to continue with some more operations on the data.
I thought of two approaches
Threading - Currently I don't know how I will make it through threading.
Bulk update - I will schedule a task that will do all this processing after maybe an hour or so.
But I am not very sure how I should start implementing this.
Any help?
In the context of HTTP we have to remember that once we finish the response, the conversation is over. There is really no way to keep the request alive after you have given the client a response. You would need to implement some worker process that runs outside of HTTP that processes things "queued up" or "triggered" by an HTTP request.