I have been getting the exception below in my application when calling another service. On investigation, I was told that requests sitting in the thread pool queue, waiting for an open connection, are reaching the maximum wait time and are therefore being rejected. As an action item, I need to tune the maximum total number of connections (http.maxTotalConnections) that my application can serve concurrently, to avoid long wait times for requests.
Our current http.maxTotalConnections value is set to 100.
Below is the error:
exception=reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException:
Pool#acquire(Duration) has been pending for more than the configured
timeout of 50ms
I have the following questions:
How does one decide the right value for http.maxTotalConnections?
How do we find the right balance between having enough total connections and the point where adding more starts to hurt application performance?
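As a starting point for the first question, one common rule of thumb (an assumption here, not something stated in the error or the configuration) is Little's Law: the average number of in-flight requests equals the request rate multiplied by the average time each request holds a connection. A minimal Python sketch, where the function name, the headroom factor, and the example numbers are all hypothetical:

```python
import math


def estimate_pool_size(requests_per_second: float,
                       avg_latency_seconds: float,
                       headroom: float = 1.5) -> int:
    """Estimate a connection pool size from Little's Law.

    in_flight = arrival rate * time each request holds a connection.
    `headroom` is an illustrative safety factor for bursts, not a standard value.
    """
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight * headroom))


# Hypothetical load: 200 requests/s, each holding a connection for 250 ms.
print(estimate_pool_size(200, 0.25))  # 200 * 0.25 = 50 in flight, times 1.5 headroom
```

If the estimate comes out well below the current value of 100, the pool size may not be the bottleneck, and the 50ms acquire timeout or the downstream latency may deserve a closer look. Latency measured under real load, rather than a guess, should drive the final number.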
Related
I'm monitoring my application health in SolarWinds, and the strange thing I'm noticing is that the Request Wait Time counter is steady. It has not changed for quite a few hours, even though there are no requests in the queue and Requests/Sec is not very high. What could be the reason for this?
Request Wait Time
The number of milliseconds that the most recent request waited in the queue for processing.
https://msdn.microsoft.com/en-us/library/fxk122b4.aspx
In other words: This value should remain the same until IIS processes another request.
Credit: https://serverfault.com/a/579180
Is there a timeout for an HTTP request that is held in the IIS request queue?
If there is a timeout, what happens when a request stays in the IIS request queue for longer than that?
a - Is it discarded, or does the server execute it once threads become available?
Good question, I'm surprised it's infinite by default, as a surge would overload IIS with requests (up to the limit, which is 3000 by default).
If you have a well-tuned application, I would say 1-3 seconds is a good range. Users typically don't wait longer than a second anyway; they'll hit refresh. In my case I have a dinosaur with all kinds of clunky reports, so I have set it to 30 seconds.
I am running Windows 8.1 and I have an integration test suite that leverages HostableWebCore to spin up isolated ASP.NET web server processes. For performance reasons, I am launching 8 of these at a time and once they are started up I send a very simple web request to each, which is handled by an MVC application loaded into each. Every instance is listening on a different port.
The problem is that the requests are getting held up (I believe) in HTTP.sys (or whatever it is called these days). If I look at Fiddler, I can see all 8 requests immediately (within a couple of milliseconds) hit the ServerGotRequest state. However, the requests sit in this state for 20-100 seconds, depending on how many I run in parallel at a time.
The reason I suspect this is an HTTP.sys problem is that the time I have to wait for any of them to respond increases with the number of hosting applications I spin up in parallel. If I only launch a single hosting application, it will start responding in ~20 seconds. If I spin up 2, they will both start responding in ~30 seconds. If I spin up 4, ~40 seconds. If I spin up 8, ~100 seconds (which is the default WebClient request timeout).
Because of this long delay, I have enough time to attach a debugger and put a breakpoint in my controller action and that breakpoint will be hit after the 20-100 second delay, suggesting that my process hasn't yet received the request. All of the hosts are sitting idle for those 20-100 seconds after ~5-10 seconds of cold start CPU churning. All of the hosts appear to receive the requests at the same time, as if something was blocking any request from going through and then all of a sudden let everything through.
My problem is, I have been unable to locate any information related to how one can debug HTTP.sys. How can I see what it is doing? What is causing the block? Why is it waiting to forward on the requests to the workers? Why do they all come through together?
Alternatively, if someone has any idea how I can work around this and get the requests to come through immediately (without the waiting) I would very much appreciate it.
Another note: I can see System (PID 4) immediately register to listen on the port I have specified as soon as the hosting applications launch.
Additional Information:
This is what one of my hosting apps looks like under netsh http show servicestate:
Server session ID: FD0000012000004C
    Version: 2.0
    State: Active
    Properties:
        Max bandwidth: 4294967295
        Timeouts:
            Entity body timeout (secs): 120
            Drain entity body timeout (secs): 120
            Request queue timeout (secs): 120
            Idle connection timeout (secs): 120
            Header wait timeout (secs): 120
            Minimum send rate (bytes/sec): 150
    URL groups:
    URL group ID: FB00000140000018
        State: Active
        Request queue name: IntegrationTestAppPool10451{974E3BB1-7774-432B-98DB-99850825B023}
        Properties:
            Max bandwidth: inherited
            Max connections: inherited
            Timeouts:
                Timeout values inherited
            Logging information:
                Log directory: C:\inetpub\logs\LogFiles\W3SVC1
                Log format: 0
            Number of registered URLs: 2
            Registered URLs:
                HTTP://LOCALHOST:10451/
                HTTP://*:10451/
Request queue name: IntegrationTestAppPool10451{974E3BB1-7774-432B-98DB-99850825B023}
    Version: 2.0
    State: Active
    Request queue 503 verbosity level: Basic
    Max requests: 1000
    Number of active processes attached: 1
    Controller process ID: 12812
    Process IDs:
        12812
Answering this mainly for posterity. It turns out that my problem wasn't HTTP.sys but ASP.NET. ASP.NET takes a shared lock when it compiles files, and this lock is identified by System.Web.HttpRuntime.AppDomainAppId. I believe that because all of my apps are built dynamically from a common applicationHost.config file, they all have the same AppDomainAppId (/LM/W3SVC/1/ROOT). This means they all share one lock, so page compilation effectively happens sequentially across all of the apps. Because the apps keep taking turns acquiring and releasing the lock, none of them gets through the whole process quickly; once one of them makes it through, the others are close behind, which is why they all finish at around the same time.
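The serialization effect can be reproduced with a toy model. The sketch below is not ASP.NET code; it only assumes, as described above, that each "compilation" takes a lock keyed by an app ID. The function name and the sleep-based fake work are hypothetical. When every app has the same ID, at most one compilation runs at a time; with distinct IDs they overlap.

```python
import threading
import time


def run_compilations(app_ids, work_seconds=0.05):
    """Run one fake 'compilation' per app ID and return the peak number running at once."""
    # One lock per distinct app ID, created up front to avoid races on creation.
    locks = {app_id: threading.Lock() for app_id in set(app_ids)}
    state = {"running": 0, "peak": 0}
    counter_guard = threading.Lock()

    def compile_app(app_id):
        with locks[app_id]:                # apps with the same ID queue up here
            with counter_guard:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(work_seconds)       # stand-in for the actual compile work
            with counter_guard:
                state["running"] -= 1

    threads = [threading.Thread(target=compile_app, args=(a,)) for a in app_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


# Four apps sharing one ID never overlap.
shared_peak = run_compilations(["/LM/W3SVC/1/ROOT"] * 4)
# Four apps with hypothetical distinct IDs can compile in parallel.
distinct_peak = run_compilations([f"/LM/W3SVC/{i}/ROOT" for i in range(1, 5)])
print(shared_peak, distinct_peak)
```

Under this model, giving each hosted app its own ID (for example, a distinct site ID in its generated configuration) would remove the contention, though whether that is practical depends on how the hosts are built.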
I have a message-driven bean (MDB) that handles messages in the following way:
1. It takes data from incoming message.
2. Calls external service via HTTP (literally, sends GET requests using HttpURLConnection), using the data from step 1. No matter how long the call takes - the message MUST NOT be dropped.
3. Uses the outcome from step 2 to persist data (using entity beans).
The rate of incoming messages is:
I. Low most of the time: on the order of units to tens per day.
II. Sometimes high: on the order of hundreds within a few minutes.
QUESTION:
Given that the service in step 2 is relatively slow (20 seconds per request, and it degrades as the workload increases), what is the best way to deal with situation II?
WHAT I TRIED:
1. Letting the MDB wait until the service call completes, no matter how long it takes. This tends to roll back MDB transactions on timeout and re-deliver the message, which increases the workload and makes things even worse.
2. Setting a timeout on the HttpURLConnection gives some guarantee about the completion time of the MDB's onMessage() method, but leaves an open question: how to proceed with 'timed out' messages.
Any ideas are very much appreciated.
Thank you!
In that case you can just increase the transaction timeout for your message-driven beans.
This is what I ended up with (mostly application server configuration):
1. A relatively short timeout (compared to the transaction timeout) for the HTTP call. The rationale: in my experience, long-running transactions tend to have adverse side effects, such as threads that appear "hung" from the app server's point of view, or the need for extra attention to database configuration. I chose 80 seconds as the timeout value.
2. A re-delivery interval for failed messages increased to several minutes.
3. Careful adjustment of the number of threads that handle messages simultaneously. I balanced this value against the throughput of the HTTP service.
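The shape of that setup can be sketched outside an application server. The Python below is only an illustration of the three ideas (a bounded number of concurrent callers, a per-call timeout, and setting timed-out messages aside for later re-delivery); it is not how an MDB container implements them. The names process_messages, call_service, and fake_service are hypothetical, and call_service stands for an HTTP GET that raises TimeoutError when its own timeout fires.

```python
from concurrent.futures import ThreadPoolExecutor


def process_messages(messages, call_service, max_workers=4):
    """Call the service for each message with at most `max_workers` calls in flight.

    Returns (results, to_redeliver): results maps message -> response, and
    to_redeliver lists messages whose call timed out, so nothing is dropped.
    """
    results = {}
    to_redeliver = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(msg, pool.submit(call_service, msg)) for msg in messages]
        for msg, future in futures:
            try:
                results[msg] = future.result()
            except TimeoutError:
                # Keep the message for a later attempt instead of losing it.
                to_redeliver.append(msg)
    return results, to_redeliver


def fake_service(msg):
    """Stand-in for the slow HTTP service; 'slow' messages simulate a timeout."""
    if msg.startswith("slow"):
        raise TimeoutError("simulated HTTP timeout")
    return msg.upper()


done, retry_later = process_messages(["a", "slow-1", "b"], fake_service, max_workers=2)
print(done, retry_later)
```

Here max_workers plays the role of the tuned thread count: keeping it close to what the HTTP service can actually sustain avoids piling more load onto a service that degrades as load grows.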
Say I have a script that does long polling on the server to check whether the user has any new messages. The server side would be something like this:
while counter < 5
    if something_changed
        push_changes_to_client
        break
    else
        counter++
        sleep 5
This checks the database 5 times, and each time there is no change it waits 5 s until the next check, which results in a maximum execution time of about 25 s.
What happens when the client moves from one page to another really fast? I suppose the server script keeps running even after the client has moved to a different page, where it sends another request for changes.
Does this mean that when a lot of people are moving quickly around the site (spending less than the 25 s maximum execution time on each page), the server has to keep running all the scripts that are trying to respond to pages that no longer exist? Wouldn't this cause the server to use up its thread pool pretty fast?
In a thread-per-connection model with synchronous sleep calls, this indeed may tie up a large number of threads. However, if the "sleep" simply schedules a callback and returns, the thread pool logjam can be avoided.
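A minimal sketch of the second approach, using Python's asyncio as a stand-in for whatever server framework is actually in use (the function name long_poll and the has_changes callback are hypothetical). Each await asyncio.sleep(...) hands control back to the event loop, so many idle pollers can share a single thread:

```python
import asyncio


async def long_poll(has_changes, max_checks=5, interval=5.0):
    """Check for changes up to `max_checks` times, yielding the thread between checks."""
    for _ in range(max_checks):
        if has_changes():
            return True                    # the caller would push changes to the client here
        await asyncio.sleep(interval)      # non-blocking: other pollers run meanwhile
    return False


async def run_many(count, interval):
    """Run `count` idle pollers concurrently on one event-loop thread."""
    pollers = (long_poll(lambda: False, interval=interval) for _ in range(count))
    return await asyncio.gather(*pollers)


# 1000 pollers with nothing to report, using a short interval to keep the demo quick.
outcomes = asyncio.run(run_many(1000, interval=0.01))
print(len(outcomes), any(outcomes))
```

This does not by itself stop work for clients that have already navigated away; detecting a dropped connection and cancelling the poller is framework-specific and is left out of the sketch.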