I have an application that retrieves data from a specific database and, through a SOA client, sends this data to the integration. I have several threads instantiating this client and sending the data in parallel. However, the number of submissions is limited to 1,000,000 per hour, so when I reach this limit I have to send the remaining records in the next submission window, and so on. What implementation/technology can I use to ensure that all records are submitted?
Sounds like any persistent queue would be helpful here. I'd make it so that all requests behave the same: the server would only reply with a place to get the data (or the client would give a callback to which the data should be sent), and all the server would do on a request is queue it and return the next step. A separate process can then read from the queue and process the requests in whichever way makes sense.
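A minimal sketch of that idea in Java, assuming a fixed hourly quota; the queue here is in-memory for brevity, but it would need to be persistent (JMS broker, database table, etc.) so nothing is lost on a restart:

```java
import java.util.concurrent.*;

// Sketch: a worker drains a queue of records but never sends more than
// QUOTA records per hour. In production the queue should be persistent
// (e.g. a JMS broker or a database table) so nothing is lost on restart.
public class ThrottledSender {
    private static final int QUOTA = 1_000_000;   // the provider's hourly limit
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Semaphore permits = new Semaphore(QUOTA);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Refill the hourly quota: release whatever was consumed this hour.
        scheduler.scheduleAtFixedRate(
            () -> permits.release(QUOTA - permits.availablePermits()),
            1, 1, TimeUnit.HOURS);

        new Thread(() -> {
            while (true) {
                try {
                    String record = queue.take();  // blocks until work arrives
                    permits.acquire();             // blocks once the quota is spent
                    send(record);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }).start();
    }

    public void submit(String record) { queue.add(record); }

    private void send(String record) {
        // placeholder for the real SOA client call
    }
}
```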
I would like to understand the internals of Session#setMaxInactiveInterval.
I understand that if no HTTP requests are received within the given interval, then the session is cleared. All the objects belonging to the session will be gone.
Does that also mean that existing requests that are part of the session will be terminated?
I have a scenario where a large file transfer happens over a single request.
So if one request (A) is long-running and no other requests are sent within the time interval, will request A be terminated?
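For reference, the interval under discussion is the one set like this (a minimal servlet sketch; 30 minutes is just an example value):

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class SessionConfigServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        HttpSession session = req.getSession();
        // Seconds of inactivity (no new request joining this session) after
        // which the container is allowed to invalidate the session.
        session.setMaxInactiveInterval(30 * 60); // 30 minutes, example value
    }
}
```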
Can anyone clarify what the purpose of using a queue is?
What I understand is that a webhook is just a URL: you do a POST request to that URL and then do some stuff based on the body/data of the request. So why do I need to queue the data, store it in a database, and then loop through the database again to perform that stuff?
The short answer is, you don't have to use a queue. A webhook is just an HTTP request (typically POST) notifying your application of some type of event. The reason you might want to consider a queue is because of typical issues you could run into.
One of these is the response time back to the webhook requester (the source). Many sources want a response (HTTP status 200) as quickly as possible so they can dequeue the request from their webhook system. If processing the webhook takes some time, a source will typically advise you to use a queue: return the 200 right away and run the lengthier processing asynchronously.
Another possible reason could be for removing duplicate requests. There is no guarantee with webhooks that you will only receive a single request per event. A queue can be used to de-dupe these requests.
I would recommend you stick with a simple request handler if possible, then evolve a more sophisticated handler if you run into issues. Consider queues as a potential design approach if you run into issues like those above.
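As an illustration of that deferred-processing pattern, here is a minimal servlet sketch that only enqueues the payload and acknowledges immediately (the in-memory queue and class names are placeholders; a real setup would use a durable queue):

```java
import java.io.BufferedReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import javax.servlet.http.*;

// Sketch: acknowledge the webhook fast, process the payload later.
public class WebhookServlet extends HttpServlet {
    // In production this would be a durable queue, not in-memory.
    private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws java.io.IOException {
        String body;
        try (BufferedReader reader = req.getReader()) {
            body = reader.lines().collect(Collectors.joining("\n"));
        }
        QUEUE.add(body);                            // defer the real work
        resp.setStatus(HttpServletResponse.SC_OK);  // source can dequeue immediately
    }
}
```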
You need some way to prevent a conflict if the webhook is invoked multiple times very close together.
It doesn't necessarily have to be a queue, though. If the webhook performs database queries and updates, you can use a transaction to ensure that this is atomic for each invocation.
In this respect, it's little different from any other web utility. You should do something similar in scripts that process web forms.
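As an example of the transactional approach, a hedged JDBC sketch (the processed_events table and its unique event_id column are assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class WebhookDeduper {
    // Assumes a table processed_events with a UNIQUE constraint on event_id.
    // Inserting the ID first means a duplicate delivery fails the insert,
    // so the real updates in the same transaction run at most once.
    public void handle(Connection conn, String eventId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO processed_events (event_id) VALUES (?)")) {
            insert.setString(1, eventId);
            insert.executeUpdate();          // throws on a duplicate event_id
            // ... the webhook's real queries and updates go here ...
            conn.commit();
        } catch (SQLException duplicateOrFailure) {
            conn.rollback();                 // atomic: nothing happened at all
        }
    }
}
```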
I have an application where requests to a controller take a while to process. The controller starts a thread per request and eventually writes some data to a database. I need to limit how many requests can be processed. So let's say our limit is 100: if the controller is already processing 100 requests, the 101st request should get a 503 status until at least one request has completed.
I could use an application-wide static counter to keep count of the current processes, but is there a better way to do this?
EDIT:
The reason why the controller takes a while to respond is that it calls another API, which is backed by a large database spanning several TB of geostationary data. Even if I could optimize this in theory, it's not something I have control over. To make matters worse, the third-party API simply times out if I have more than 10 concurrent requests. I am already dropping incoming requests onto a Service Bus queue. I just need a good way, on my API controller, to keep a global count of how many requests are coming in and return 503 whenever it exceeds a set number of requests.
The requests to the API controller should not be limited. An idea would be to take the requests and store the list of work that needs completing (database, queue, etc.).
Then create something outside the web request that processes this work; this is where you can manage how many are processed at once, using parallel processing/multi-threading etc. (using a Windows service / Worker Role / Hangfire etc.).
Once processed, you can communicate back to the page via SignalR to fetch the data to display, or to show status.
The benefit of this is that you can always go back to the page or refresh and get some kind of status, without re-running the whole process.
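If you still want the in-controller cap described in the question, a semaphore makes that "global count" explicit; a minimal Java sketch, with the limit of 100 taken from the question (class and method names are illustrative):

```java
import java.util.concurrent.Semaphore;

// Sketch: at most MAX_CONCURRENT requests are processed at once;
// request 101 is rejected immediately with 503.
public class ConcurrencyGuard {
    private static final int MAX_CONCURRENT = 100;
    private final Semaphore slots = new Semaphore(MAX_CONCURRENT);

    // Returns the HTTP status the controller should send.
    public int handle(Runnable work) {
        if (!slots.tryAcquire()) {
            return 503;          // over the limit: Service Unavailable
        }
        try {
            work.run();          // the long-running call to the third-party API
            return 200;
        } finally {
            slots.release();     // free the slot even if the work threw
        }
    }
}
```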
I am yet to understand the behavior of a web server thread when I make an async call to, say, a database, and immediately return a response (say, OK) to the client without even waiting for the async call to come back. First of all, is that a good approach? What happens to the thread that made the async call if it is used again to serve another request and then the previous async call returns to that particular thread? Or does the web server hold this thread, waiting until the async call it made returns? The issue then would be that many hanging threads stay open and the web server would not be available to take more requests. I am looking for an answer.
It depends on the way your HTTP server works. But you should be very cautious.
Let's say you have a main event loop taking care of incoming HTTP connections, and worker threads which manage the HTTP communications.
A worker thread should be considered ready to accept a new HTTP request only when it is completely ready to do so.
In terms of pure HTTP, the most important thing is to avoid sending a response before having received the whole query. It seems simple, and it's usually the case. But if the query has a body, which may be a chunked body, it can take time to receive the whole message.
You should never send a response before that, unless it's something like a 400 Bad Request response followed by a real TCP/IP connection close. If you fail to do so and you have a message-length parsing issue, the fact that you sent a response before the end of the query may lead to security problems. It could be used to exploit differences in message parsing between your server and any other HTTP agent in front of it (SSL terminator, reverse proxy, etc.), in some sort of HTTP request smuggling issue. For that agent, if you sent a response, it means you had the whole message, so it can send the next message, and you will in fact treat that next message as just another part of the body.
Now, if you have the whole message, you can decide to send an early response and detach an asynchronous task to really perform the work. But this means:
you have to assume that no more output will be generated: you will not try to send any further output to the request issuer, and you should consider the communication closed
the worker thread should not receive new requests to manage, and this is the hard part. If this thread is marked as available for a new request, it may also be killed by the thread manager (Nginx and Apache keep request counters associated with workers, and kill them after a limit is reached, to create fresh ones). It may also receive a graceful reload command (usually implemented as a kill), etc.
So you start to enter a zone where you need to know the internals of the HTTP server, which may or may not be managed by you, and where changes may appear sooner or later. And you start doing very strange things, which usually leads to strange issues that are hard to reproduce.
Usually the best way to handle asynchronous tasks, while still being able to understand what happens, is to use a messaging system. Put a list of tasks in a queue, and have a parallel asynchronous worker process that does things with these tasks. Track the status of these tasks if you need to.
The same applies to the client: after receiving a very fast HTTP answer, it may need to do some AJAX polling for the task status. You would then only have to check the status of the task in the queue to send a response.
You will get more control over the whole thing.
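A minimal Java sketch of that shape, where the request handler only enqueues and the status stays queryable (all names are illustrative):

```java
import java.util.UUID;
import java.util.concurrent.*;

// Sketch: the HTTP handler only enqueues and answers fast;
// a worker pool does the heavy lifting; status is queryable at any time.
public class TaskBroker {
    public enum Status { QUEUED, RUNNING, DONE, FAILED }

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final ConcurrentMap<String, Status> statuses = new ConcurrentHashMap<>();

    // Called from the request thread: cheap, returns immediately.
    public String enqueue(Runnable task) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.QUEUED);
        workers.submit(() -> {
            statuses.put(id, Status.RUNNING);
            try {
                task.run();
                statuses.put(id, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);  // errors are reported, not lost
            }
        });
        return id;   // the client polls with this ID
    }

    public Status status(String id) { return statuses.get(id); }
}
```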
Personally, I really dislike detached threads, coming from strange code, performing heavy tasks without any way of outputting a status or reporting errors, and maybe preventing a clean application shutdown (still waiting for strange threads to join) that doesn't require a killall.
It depends whether this asynchronous operation performs something which the client should be notified about.
If you return 200 OK (i.e. successfully completed) and later the asynchronous operation fails then the client will not know about the error.
You of course have some options, like sending some kind of push notification over a WebSocket, or sending another request which would return the actual result, and things like that. So it basically depends on your needs...
I have to create a Java EE application which converts large documents into different formats. Each conversion takes between 10 seconds and 2 minutes.
The SOAP requests will be made from a client application which I also have to create.
What's the best way to handle these long-running requests? Clearly the process takes too much time to run without any feedback to the user.
I can think of the following ways to provide some kind of feedback, but I'm not sure if there isn't a better way, perhaps something standardized.
1. The client performs the request from a thread and the server sends the document in the response, which can take a few minutes. Until then the client shows a "Please wait" message, progress spinner, etc. (This seems simple to implement.)
2. The client sends a "Start conversion" command. The server returns some kind of job ID which the client can use to poll frequently for a status update or the final document. (This seems user-friendly, because I can display progress, but it also requires the server to be stateful.)
3. The client sends a "Start conversion" command. The server somehow notifies the client when it is done. (Here I don't even know how to do this.)
Are there other approaches? Which one is the best in terms of performance, stability, fault tolerance, user-friendliness, etc.?
Since this is almost all done server-side, there isn't much a client can do besides poll the server somehow for updates on the status.
#1 is OK, but users get impatient really fast. "A few minutes" is a bit too long for most people. You'd need HTTP Streaming to implement #3, but I think that's overkill.
I would just go with #2.
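For what #2 could look like, a hedged sketch using JAX-RS for brevity (the paths, pool size, and converter are assumptions; the same shape works behind SOAP operations):

```java
import java.util.UUID;
import java.util.concurrent.*;
import javax.ws.rs.*;
import javax.ws.rs.core.Response;

// Sketch of option #2: start a conversion, hand back a job ID, let the client poll.
@Path("/conversions")
public class ConversionResource {
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4);
    private static final ConcurrentMap<String, Future<byte[]>> JOBS =
            new ConcurrentHashMap<>();

    @POST
    public String start(byte[] document) {
        String jobId = UUID.randomUUID().toString();
        JOBS.put(jobId, POOL.submit(() -> convert(document)));  // 10 s to 2 min
        return jobId;
    }

    @GET
    @Path("{id}")
    public Response poll(@PathParam("id") String id) {
        Future<byte[]> job = JOBS.get(id);
        if (job == null)   return Response.status(404).build();
        if (!job.isDone()) return Response.status(202).build(); // still converting
        try {
            return Response.ok(job.get()).build();              // the converted document
        } catch (Exception e) {
            return Response.serverError().build();
        }
    }

    private static byte[] convert(byte[] document) {
        return document; // placeholder for the real conversion
    }
}
```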
For #3, the server should return a unique ID back to the client, and using that ID the client has to ask the server for the result at a later time.
Option 4, for those who want to use WebSockets: your request is answered with a job ID, and you get progress updates over the WebSocket.
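A minimal sketch of that flow with the javax.websocket API (the endpoint path and message format are made up):

```java
import java.io.IOException;
import java.util.UUID;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

// Sketch: the first message starts the job and gets a job ID back;
// progress updates are pushed over the same socket.
@ServerEndpoint("/convert")
public class ConversionSocket {

    @OnMessage
    public void onMessage(String document, Session session) throws IOException {
        String jobId = UUID.randomUUID().toString();
        session.getBasicRemote().sendText("{\"jobId\":\"" + jobId + "\"}");
        new Thread(() -> run(jobId, session)).start();  // don't block the socket thread
    }

    private void run(String jobId, Session session) {
        try {
            for (int pct = 0; pct <= 100; pct += 25) {  // stand-in for real progress
                session.getBasicRemote().sendText(
                    "{\"jobId\":\"" + jobId + "\",\"progress\":" + pct + "}");
                Thread.sleep(1000);
            }
        } catch (IOException | InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```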