I am trying to switch my application to the async version of the Apache HttpComponents client. The goal is to be able to handle more outbound connections (in the near future).
The payload of the requests is quite small (<5KB).
The endpoints I hit are around 20 in number.
With the sync version of Apache HttpClient, the throughput is about 200 requests/sec.
The average response time is about 100ms/request.
I abort the requests after a max of 180ms.
After switching to Async, the response time went up by 20ms/request.
The throughput also reduced to 160/sec. The number of aborted requests doubled.
This is after fine tuning the application a lot.
Is there anything I can do to improve the performance of async client?
I set maxConnectionsPerRoute high and have a large connection pool.
Are there any params that are key to getting the most out of async client?
Did you forget to set maxConnTotal?
The default maxConnTotal is 20, and it is a global limit.
I forgot to set it once.
I am using gRPC for a bidirectional streaming service. Because this is an online real-time service and we don't want our clients to wait too long, we want to limit the number of concurrent RPCs and reject the extra requests.
We are able to do it in Python with the following code:
grpcServer = grpc.server(futures.ThreadPoolExecutor(max_workers=10), maximum_concurrent_rpcs=10)
In this case, only 10 requests are processed at a time and other requests are rejected; the clients get a RESOURCE_EXHAUSTED error.
However, we find it hard to do this in C++. We use the following code, taken from "grpc sync server limit handle thread":
ServerBuilder builder;
builder.SetSyncServerOption(ServerBuilder::SyncServerOption::NUM_CQS, 10);
grpc::ResourceQuota quota;
quota.SetMaxThreads(10 * 2);
builder.SetResourceQuota(quota);
We are using gRPC in many services. Some use Kaldi, some use libtorch.
In some cases, the above code behaves normally: it processes 10 requests at a time and rejects other requests, and the processing speed (our service requires a lot of CPU computation) is OK.
In some cases, it only accepts 9 requests at a time.
In some cases, it accepts 10 requests at a time, but the processing speed is significantly lower than before.
We also tried
builder.AddChannelArgument(GRPC_ARG_MAX_CONCURRENT_STREAMS, 10);
But it is useless because GRPC_ARG_MAX_CONCURRENT_STREAMS is the maximum number of concurrent RPCs on a single HTTP connection.
Could someone please point out the C++ equivalent of the Python code? We want our service to handle 10 requests at a time and reject other requests; we do not want any request to wait in the queue.
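The policy being asked for (admit up to N calls, reject the rest immediately instead of queueing them) can be sketched independently of gRPC. The snippet below is only an illustration in Python, not the gRPC C++ API: `ConcurrencyLimiter` and `try_run` are hypothetical names, and the assumption is that a check like this would sit at the start of each request handler and return RESOURCE_EXHAUSTED when no slot is free.

```python
import threading


class ConcurrencyLimiter:
    """Admit at most `limit` calls at once; reject the rest immediately."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_run(self, handler, *args):
        # Non-blocking acquire: a caller never waits in a queue for a slot.
        if not self._slots.acquire(blocking=False):
            return ("RESOURCE_EXHAUSTED", None)
        try:
            return ("OK", handler(*args))
        finally:
            # Free the slot even if the handler raised.
            self._slots.release()
```

With `limit=10`, the eleventh concurrent call gets `("RESOURCE_EXHAUSTED", None)` right away rather than waiting behind the others.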
Say I have a web service used internally by other web services, with an average response time of 1 minute.
What are the pros and cons of such a service with "synchronous" responses versus making the service return an ID for the request, process it in the background, and make the clients poll for results?
Are there any cons to HTTP connections which stay active for more than one minute? Does the default TCP keep-alive matter here?
Depending on your application it may matter. A couple of things are worth mentioning:
HTTP protocol is sync
There is a very widespread misconception that HTTP is async. HTTP is a synchronous protocol, but your client can handle it asynchronously. For example, when you call a service over HTTP, your HTTP client may schedule the call on a background thread (async). However, the HTTP call itself waits until it either times out or the response comes back, and during all this time the HTTP call chain is waiting synchronously.
Sockets
HTTP uses sockets, and there is a hard limit on the number of sockets. Every HTTP connection (if a new one is created each time) opens a new socket. If you have hundreds of requests at a time, you can imagine how many HTTP calls are waiting synchronously, and you may run out of sockets. I'm not sure about other operating systems, but on Windows, even after you are done with a request, its sockets are not disposed of straight away and linger for a couple of minutes.
Network Connectivity
Keeping an HTTP connection alive for a long time is not recommended. What if you lose the network partially or completely? Your HTTP request would time out and you wouldn't know its status at all.
Keeping all these things in mind, it's better to schedule long-running tasks in a background process.
If you keep the user waiting while your long job is running on server, you are tying up a valuable HTTP connection while waiting.
The best practice from a RESTful point of view is to reply with HTTP 202 (Accepted) and return a response with a link to poll.
If you want to hang the client while waiting, you should set a request timeout at the client end.
If there are firewalls in between, they might drop connections that are inactive for some time.
Higher Response Throughput
Typically, you want your OLTP system (web server) to respond as quickly as possible. Since you're queuing the task in the background, your web server can handle more requests, which results in higher response throughput and processing capacity.
More Memory Friendly
Queuing long-running tasks as background jobs via message queues prevents excessive use of web server memory. This is good because it makes your application less likely to run out of memory.
More Resilient to Server Crash
If you queue a task in the background and something goes wrong, the job can be moved to a dead-letter queue, which helps you ultimately fix the problem and re-process the request that caused the unhandled exception.
I think I know what is happening here, but would appreciate a confirmation and/or reading material that can turn that "think" into just "know". The actual questions are at the end of the post, in the TL;DR section:
Scenario:
I am in the middle of testing my MVC application for a case where one of the internal components is stalling (timeouts on connections to our database).
On one of my web pages there is a jQuery datatable which queries for an update via AJAX every half a second. My current task is to display the correct error if that data request times out. So, to test, I made a stored procedure that asks the DB server to wait 3 seconds before responding, which is longer than the configured timeout settings, so this guarantees a timeout exception for me to trap.
I am testing in the Chrome browser, with one client. The application is being debugged in VS2013 IIS Express.
Problem:
I did not expect the following symptoms to show up when my purposeful slowdown is activated:
1) After launching the page with the rigged datatable, the application slowed down in handling all requests from the client browser. There are 3 other components that send AJAX update requests in parallel with the one I purposefully broke, and this same slowdown also applied to any actions I made in the web application that would generate a request (like navigating to other pages). The browser's debugger showed the requests were being sent on time, but the corresponding breakpoints on the server side were getting hit much later (delays of over 10 seconds, up to several minutes).
2) My server kept processing requests even after I closed the tab with the application. I closed the browser and made sure that the chrome.exe process was terminated, but breakpoints on various Controller actions were still getting hit for 20 minutes afterward, mostly on the actions that were "triggered" by automatically looping AJAX requests from several pages I was trying to visit during my tests. Breakpoints were also hit on main pages I was trying to navigate to. On the second test I used RawCap to monitor the loopback interface to make sure that nothing still running in the background was actually making requests.
Theory I would like confirmed or denied with an alternate explanation:
So the above scenario was making looped requests at a frequency that the server couldn't handle - the client datatable loop was sending them every .5 seconds, and each one would take at least 3 seconds to generate the timeout. And obviously somewhere in IIS express there has to be a limit of how many concurrent requests it is able to handle...
What was a surprise for me was that I sort of assumed that if that limit (which I also assumed to exist) was reached, then requests would be denied - instead it appears they were queued for an absolutely useless amount of time to be processed later - I mean, under what scenario would it be useful to process a queued web request half an hour later?
So my questions so far are these:
TL;DR questions:
Does IIS Express (that comes with Visual Studio 2013) have a concurrent connection limit?
If yes :
{
Is this limit configurable somewhere, and if yes, where?
How does IIS Express handle situations where that limit is reached, and is that handling also configurable somewhere? (I mean queueing vs. an immediate error such as "server is busy".)
}
If no:
{
How does the server handle scenarios when requests are coming faster than they can be processed and can that handling be configured anywhere?
}
Here - http://www.iis.net/learn/install/installing-iis-7/iis-features-and-vista-editions
I found that IIS7 at least allowed an unlimited number of simultaneous connections, but how does that actually work if the server is just not fast enough to process all requests? Can a limit be configured anywhere, as well as the handling of that limit being reached?
Would appreciate any links to online reading material on the above.
First, here's a brief web server 101. Production-class web servers are multithreaded, and roughly one thread = one request. You'll typically see some sort of setting for your web server called its "max requests", and this, again, roughly corresponds to how many threads it can spawn. Each thread has overhead in terms of CPU and RAM, so there's a very real upward limit to how many a web server can spawn given the resources the machine it's running on has.
When a web server reaches this limit, it does not start denying requests, but rather queues requests to be handled once threads free up. For example, suppose a web server has a max requests of 1000 (typical) and it suddenly gets bombarded with 1500 requests. The first 1000 will be handled immediately and the further 500 will be queued until some of the initial requests have been responded to, freeing up threads and allowing some of the queued requests to be processed.
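The queue-rather-than-deny behavior is easy to reproduce on a small scale with a fixed-size thread pool. This is only a toy model, not how IIS is implemented; `serve` is a hypothetical helper that reports how long each request sat in the queue before a thread picked it up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def serve(num_requests, max_threads, service_time):
    """Return, per request, the seconds it waited before a thread started it."""
    submitted = time.monotonic()

    def handle(_request_id):
        started = time.monotonic()
        time.sleep(service_time)  # simulated work
        return started - submitted

    # Requests beyond max_threads are queued by the pool, not rejected.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(handle, range(num_requests)))


if __name__ == "__main__":
    for i, wait in enumerate(serve(num_requests=3, max_threads=2, service_time=0.2)):
        print(f"request {i} waited {wait:.2f}s")
```

With 2 threads and 3 requests, the third one is accepted but does not start until one of the first two finishes, which is the same effect as the 500 queued requests in the example above.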
A related topic area here is async, which in the context of a web application, allows threads to be returned to the "pool" when they're in a wait-state. For example, if you were talking to an API, there's a period of waiting, usually due to network latency, between sending the request and getting a response from the API. If you handled this asynchronously, then during that period, the thread could be returned to the pool to handle other requests (like those 500 queued up requests from the previous example). When the API finally responded, a thread would be returned to finish processing the request. Async allows the server to handle resources more efficiently by using threads that otherwise would be idle to handle new requests.
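As a rough illustration of why async helps with wait-bound work, here is a small asyncio sketch in which a single thread overlaps many simulated API calls. `call_slow_api` and `handle_many` are hypothetical names, and `asyncio.sleep` stands in for network latency.

```python
import asyncio
import time


async def call_slow_api(delay):
    # While this coroutine awaits, the thread is free to run other requests.
    await asyncio.sleep(delay)
    return "ok"


async def handle_many(num_requests, delay):
    """Run all requests concurrently on one thread; return results and elapsed time."""
    started = time.monotonic()
    results = await asyncio.gather(*(call_slow_api(delay) for _ in range(num_requests)))
    return results, time.monotonic() - started


if __name__ == "__main__":
    results, elapsed = asyncio.run(handle_many(50, 0.1))
    print(len(results), "requests finished in", round(elapsed, 2), "seconds")
```

Handled one after another on a blocked thread, 50 calls of 0.1 s each would take about 5 s; overlapped like this they take roughly the time of a single call.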
Then, there's the concept of client-server. In protocols like HTTP, the client makes a request and the server responds to that request. However, there's no persistent connection between the two. (This is somewhat untrue as of HTTP 1.1. Connections between the client and server are sometimes persisted, but this is only to allow faster future requests/responses, as the time it takes to initiate the connection is not a factor. However, there's no real persistent communication about the status of the client/server still in this scenario). The main point here is that if a client, like a web browser, sends a request to the server, and then the client is closed (such as closing the tab in the browser), that fact is not communicated to the server. All the server knows is that it received a request and must respond, and respond it will, even though there's technically nothing on the other end to receive it, any more. In other words, just because the browser tab has been closed, doesn't mean that the server will just stop processing the request and move on.
Then there's timeouts. Both clients and servers will have some timeout value they'll abide by. The distributed nature of the Internet (enabled by protocols like TCP/IP and HTTP) means that nodes in the network are assumed to be transient. There's no persistent connection (aside from the same note above), and network interruptions could occur between the client making a request and the server responding to the request. If the client/server did not plan for this, they could simply sit there forever waiting. However, these timeouts can vary widely. A server will usually time out in responding to a request within 30 seconds (though it could potentially be set indefinitely). Clients like web browsers tend to be a bit more forgiving, having timeouts of 2 minutes or longer in some cases. When the server hits its timeout, the request will be aborted. Depending on why the timeout occurred, the client may receive various error responses. When the client times out, however, there's usually no notification to the server. That means that if the server's timeout is higher than the client's, the server will continue trying to respond, even though the client has already moved on. Closing a browser tab could be considered an immediate client timeout, but again, the server is none the wiser and keeps trying to do its job.
So, what all this boils down to is this. First, when polling (which is what you're doing by submitting an AJAX request repeatedly at some interval of time), you need to build in a cancellation scheme. For example, if the last 5 requests have timed out, you should stop polling, at least for some period of time. Even better would be to have the response of one AJAX request initiate the next. So, instead of using something like setInterval, you could use setTimeout and have the AJAX callback initiate it. That way, the requests only continue if the chain is unbroken. If one AJAX request fails, the polling stops immediately. However, in that scenario, you may need some fallback to re-initiate the request chain after some period of time. This prevents bombarding your already failing server endlessly with new requests. Also, there should always be some upper limit on how long polling should continue. If the user leaves the tab open for days, not using it, should you really keep polling the server for all that time?
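The polling rules described above (send the next request only after the previous one has finished, stop after repeated failures, and cap the total polling time) can be sketched as a plain loop. This is a Python stand-in for the browser-side setTimeout chain, not jQuery code; `poll_chain` and its parameters are hypothetical, and `fetch` represents whatever performs one data request.

```python
import time


def poll_chain(fetch, on_data, interval=0.5, max_failures=5,
               max_duration=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until too many consecutive failures or the time limit is reached."""
    deadline = clock() + max_duration
    failures = 0
    while clock() < deadline:
        try:
            # The next request is only issued after this one has completed.
            on_data(fetch())
            failures = 0
        except Exception:
            failures += 1
            if failures >= max_failures:
                return "stopped: too many failures"
        sleep(interval)
    return "stopped: time limit"
```

Because each iteration waits for the previous request to finish, a slow or failing server never sees more than one outstanding poll from this client, unlike a fixed-interval timer that keeps firing regardless.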
On the server-side, you can use async with cancellation tokens. This does two things: 1) it gives your server a little more breathing room to handle more requests and 2) it provides a way to unwind the request if some portion of it should time out. More information about that can be found at: http://www.asp.net/mvc/overview/performance/using-asynchronous-methods-in-aspnet-mvc-4#CancelToken
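The linked article covers the ASP.NET mechanism. As a language-neutral illustration of the same idea, here is a minimal asyncio sketch in which a timeout cancels the in-flight work so it can unwind. `query_database` and `handle_request` are hypothetical names, and the sleep stands in for the slow stored procedure.

```python
import asyncio


async def query_database(log, duration=3.0):
    try:
        await asyncio.sleep(duration)  # simulated slow stored procedure
        return "rows"
    except asyncio.CancelledError:
        # Cancellation arrives here, so resources can be released.
        log.append("query cancelled")
        raise


async def handle_request(timeout, log):
    try:
        # wait_for cancels the inner coroutine when the timeout expires.
        return await asyncio.wait_for(query_database(log), timeout)
    except asyncio.TimeoutError:
        return "error: data request timed out"
```

Calling `asyncio.run(handle_request(0.5, log))` returns the error string after about half a second instead of holding the request for the full 3 seconds.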
Our front-end MVC3 web application is using AsyncController, because each of our instances is servicing many hundreds of long-running, IO bound processes.
Since Azure will terminate "inactive" HTTP sessions after some pre-determined interval (which seems to vary depending on which website you read), how can we keep the connections alive?
Our clients MUST stay connected, and our processes will run from 30 seconds to 5 minutes or more. How can we keep the client connected/alive? I initially thought of having a timeout on the Async method, and just hitting the Response object with a few bytes of output, sort of like chunking the response, and then going back and waiting some more. However, I don't think this will work, since MVC3 is handling the hookup of an IIS thread back to the asynchronous response, which will have already rendered a view at that time.
How can we run a really long process on an AsyncController, but have the client not be disconnected by the Azure Load Balancer? Sending an immediate response to the caller, and asking that caller to poll or check another resource URL is not acceptable.
The Azure load balancer idle timeout is 4 minutes. Can you try configuring TCP keep-alive on the client side with an interval of less than 4 minutes? That should keep the connection alive.
On the other hand, it's pretty expensive to keep a connection open per client for a long time. This will limit the number of clients you can handle per server. Also, I think IIS may still decide to close a connection regardless of keep-alives if it thinks it needs the connection to serve other requests.
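For illustration, this is roughly what enabling TCP keep-alive on a client socket looks like in Python. It is a sketch under assumptions: `enable_keepalive` is a hypothetical helper, the 120 second idle value is just an example below the 4 minute limit, and the per-socket tuning options are platform-specific, so each one is guarded.

```python
import socket


def enable_keepalive(sock, idle_seconds=120, interval_seconds=30, probes=4):
    """Turn on TCP keep-alive and, where the platform allows, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants are not defined on every OS, so check before using them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

The socket would be configured this way before connecting; clients written in other stacks would need the equivalent setting in their own HTTP library.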
I have a server application which runs in the Amazon EC2 cloud. From my client (the browser) I make an HTTP request which uploads a file to the server, which then processes the file. If there is a lot of processing (large file), the server always times out with a 504 backend continuation error, always exactly after 120 seconds. Though I get this error, the server continues to process the request and completes it (verified by checking the database), but I cannot see the final result on my client because of the timeout.
I am clueless as to why this is happening. Has anyone faced a similar 504 timeout? Is there some intermediate proxy server not in my control which is timing out?
I have a similar problem and in my case I believe it is due to the connection between the Elastic Load Balancer (ELB) and the EC2 instance.
For a long-term solution I will go with the 303 Status response + back-end processing suggested by james.garriss above.
For a short-term solution it may be possible for Amazon support to increase the ELB timeout (see their response in https://forums.aws.amazon.com/thread.jspa?messageID=491594). Unfortunately there doesn't seem to be any way to change the timeout yourself through either the API or the console.
[Update] AWS now does allow you to update the idle timeout either through console, CLI or .ebextensions configuration. See http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/config-idle-timeout.html (thanks #Daniel Patz for the update)
Assuming that the correct status code is being returned, the problem is that an intermediate proxy is timing out. "The server, while acting as a gateway or proxy, did not receive a timely response from the upstream server specified by the URI." (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.5) It most likely indicates that the origin server is having some sort of issue (i.e., taking a long time to process your request), so it's not responding quickly.
Perhaps the best solution is to re-craft your server app so that it responds with a "303 See Other" status code; then your client can retrieve the data at a later point, once the server is done processing and has created the final result.
Edit: Another idea is to re-craft your server app so that it responds with a "413 Request Entity Too Large" status code when the request entity size is too large. This will get rid of the error, though it may make your app less useful if it can only process "small" files.
Other possible solutions:
Increase timeout value of the proxy (if it's under your control)
Make your request to a different server (if there's another, faster server with the same app)
Make your request differently (if possible) such that you are sending less data at a time
It is possible that the browser times out during the script execution.