First, there's a great overview of the IIS7 HTTP request lifecycle and the various settings that affect performance here:
ASP.NET Thread Usage on IIS 7.0 and 6.0
Very specifically though, in .NET 4 the defaults for maxConcurrentRequestsPerCPU and requestQueueLimit are both set to 5000, i.e. equivalent to the following (in aspnet.config):
<system.web>
    <applicationPool
        maxConcurrentRequestsPerCPU="5000"
        maxConcurrentThreadsPerCPU="0"
        requestQueueLimit="5000" />  <!-- ** see note below -->
</system.web>
It seems to me that on a multi-CPU/core server the requestQueueLimit here will always be hit well before the 'per CPU' limit. Thus, if a maximum of 5000 requests per CPU is what you actually want, I would expect that requestQueueLimit needs to be increased to 5000 * CPUCount, or just disabled altogether.
Is my interpretation correct? If so, can I disable requestQueueLimit (set it to zero)? The documentation on this setting doesn't appear to address this question (so maybe I'm missing something or misreading?).
** side note from the above article: The requestQueueLimit is poorly named. It actually limits the maximum number of requests that can be serviced by ASP.NET concurrently. This includes both requests that are queued and requests that are executing. If the "Requests Current" performance counter exceeds requestQueueLimit, new incoming requests will be rejected with a 503 status code.
***Is my interpretation correct?
Yes, if you want to execute more than 5000 requests concurrently, you'll need to increase the requestQueueLimit. The requestQueueLimit restricts the total number of requests in the system. Due to its legacy, it is actually the total number of requests in the system, and not the number of requests in some queue. Its goal is to prevent the server from toppling over due to lack of physical memory, virtual memory, etc. When the limit is reached, incoming requests will receive a quick 503 "Server Too Busy" response. By the way, the current number of requests in the system is exposed by the "ASP.NET\Requests Current" performance counter.
***can I disable requestQueueLimit? (set it to zero?)
You can effectively disable it by setting it to a large value, like 50000. You must set the value in the aspnet.config file. I doubt your server can handle 50000 concurrent requests, but if so, then double that. Setting it to zero will not disable it; oddly, it means no more than one request can execute concurrently.
By the way, it looks like there is a bug in v4. For integrated mode, it only successfully reads the value of requestQueueLimit if it is configured in the aspnet.config file as described on MSDN. For some reason, v4 was not reading it from machine.config when I experimented with it a little while ago.
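For reference, a minimal sketch of the raised limit in aspnet.config (50000 is just the illustrative figure from above; the file typically lives next to the CLR, e.g. %windir%\Microsoft.NET\Framework64\v4.0.30319\aspnet.config):

<configuration>
    <system.web>
        <applicationPool requestQueueLimit="50000" />
    </system.web>
</configuration>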
You might want to check out IIS Tuner, an open source application which optimizes IIS settings for better performance; its source code may be worth a look. This page could also be useful for your questions.
Related
I have a website that is frequently overloaded with requests from thousands of clients. I cannot scale my servers infinitely, and the application in its current state cannot handle the traffic. I would prefer to first let the clients that have already started a transaction complete it, and only then allow other clients to start a transaction. I am looking for a way to divide HTTP requests into two groups: the requests that are allowed to finish their transaction, and the rest, which should receive a 503 Server Busy page.
I can handle some number of transactions concurrently. The remaining transactions I would like to hold off for a while with a Server Busy page. I thought I could use Varnish for that, but I cannot come up with the right condition in VCL.
I would like to determine in Varnish the number of current connections to the backend. If the current number of connections is higher than some value (e.g. 100) and the request does not carry a session cookie, the response will be 503 Server Busy. If the number of connections is greater than 100 but the session cookie exists, the request will be passed to the backend.
AFAIK, in Varnish VCL I can only get the health of the backend (director), which is true/false. But when a backend is considered unhealthy, requests are not passed to it at all. And when I use max_connections on the backend, all connections above the limit get a 503 error.
Is there a way to achieve this behavior with Varnish, nginx, Apache, or any other tool?
Does your content have to be dynamic no matter what? I run a site that handles 3 to 4 million uniques a day and use features like grace mode to handle invalidation.
Another option that may help is ESI (Edge Side Includes), which can reduce load by caching everything that isn't dynamic.
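If you end up on Varnish, here is a minimal sketch of grace mode plus ESI in Varnish 3 VCL (the 2-minute grace window is an arbitrary example, and it assumes your backend pages emit <esi:include> tags):

sub vcl_recv {
    # Accept cached objects up to 2 minutes past their TTL
    set req.grace = 2m;
}

sub vcl_fetch {
    # Keep objects past their TTL so grace mode has something to serve
    set beresp.grace = 2m;
    # Parse ESI tags so non-dynamic fragments can be cached independently
    set beresp.do_esi = true;
}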
When you set the httpRuntime executionTimeout, does that cut off a response that has already been partially sent to the client?
I've noticed in the IIS request queue that occasionally there are some requests that run for a lot longer than our executionTimeout setting, and their state is SendResponse in the IIS Web Core module. Does the executionTimeout include the time that it takes to send the response to the client?
The only documentation that I could find is this snippet from the following page, but it is from 2003 and covers older versions of IIS:
https://msdn.microsoft.com/en-us/library/ms972959.aspx
Request Execution Time. The number of milliseconds taken to execute the last request. In version 1.0 of the Framework, the execution time begins when the worker process receives the request, and stops when the ASP.NET ISAPI sends HSE_REQ_DONE_WITH_SESSION to IIS. For IIS version 5, this includes the time taken to write the response to the client, but for IIS version 6, the response buffers are sent asynchronously, and so the time taken to write the response to the client is not included.
The IIS version is 7.5 on Windows Server 2008 R2 running an ASP.NET 4.5 web application and debug is set to false.
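For context, a minimal sketch of where the setting in question lives in web.config (110 seconds is the documented default, and the timeout is only enforced when debug is false, as in our setup):

<system.web>
    <compilation debug="false" />
    <httpRuntime executionTimeout="110" />
</system.web>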
I apologize for not addressing your question directly, but there is something that might help with your problem:
There's a minBytesPerSecond setting in the <webLimits> section that puts a lower bound on transmission speed. I suspect it was originally added as a defense against Slowloris, but you can bump the value up to kick out users whose pipe can't support your site. The default value is 240, which is pretty low.
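A minimal sketch of how that might look in applicationHost.config (240 is the default mentioned above; raise it to drop slower clients):

<configuration>
    <system.applicationHost>
        <webLimits minBytesPerSecond="240" />
    </system.applicationHost>
</configuration>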
I think I know what is happening here, but I would appreciate confirmation and/or reading material that can turn that "think" into "know". The actual questions are at the end of the post, in the TL;DR section:
Scenario:
I am in the middle of testing my MVC application for a case where one of the internal components is stalling (timeouts on connections to our database).
On one of my web pages there is a jQuery DataTable which queries for an update via AJAX every half a second. My current task is to display the correct error if that data request times out. So to test, I made a stored procedure that asks the DB server to wait 3 seconds before responding, which is longer than the configured timeout settings; this guarantees a timeout exception for me to trap.
I am testing in the Chrome browser with one client. The application is being debugged in VS2013 with IIS Express.
Problem:
I did not expect the following symptoms to show up when my purposeful slowdown is activated:
1) After launching the page with the rigged DataTable, the application slowed down in handling all requests from the client browser. There are 3 other components that send AJAX update requests in parallel to the one I purposefully broke, and this same slowdown also applied to any actions I took in the web application that would generate a request (like navigating to other pages). The browser's debugger showed the requests were being sent on time, but the corresponding breakpoints on the server side were getting hit much later (delays of over 10 seconds to even several minutes).
2) My server kept processing requests even after I closed the tab with the application. I closed the browser and made sure that the chrome.exe process was terminated, but breakpoints on various controller actions were still getting hit for 20 minutes afterward, mostly on the actions "triggered" by the automatically looping AJAX requests from the several pages I had visited during my tests. Breakpoints were also hit on the main pages I had tried to navigate to. On a second test I used RawCap to monitor the loopback interface, to make sure nothing was still making requests in the background.
Theory I would like confirmed or denied with an alternate explanation:
So the above scenario was making looped requests at a frequency the server couldn't handle: the client DataTable loop was sending them every 0.5 seconds, and each one would take at least 3 seconds to generate the timeout. And obviously somewhere in IIS Express there has to be a limit on how many concurrent requests it is able to handle...
What surprised me was this: I had assumed that if that limit (which I also assumed to exist) was reached, further requests would be denied. Instead, it appears they were queued for an absolutely useless amount of time to be processed later. I mean, under what scenario would it be useful to process a queued web request half an hour later?
So my questions so far are these:
TL;DR questions:
Does IIS Express (that comes with Visual Studio 2013) have a concurrent connection limit?
If yes :
{
Is this limit configurable somewhere, and if yes, where?
How does IIS Express handle situations where that limit is reached, and is that handling also configurable somewhere? (I mean queuing vs. an immediate "server busy" error.)
}
If no:
{
How does the server handle scenarios where requests come in faster than they can be processed, and can that handling be configured anywhere?
}
Here: http://www.iis.net/learn/install/installing-iis-7/iis-features-and-vista-editions
I found that IIS7 at least allows an unlimited number of simultaneous connections, but how does that actually work if the server is just not fast enough to process all requests? Can a limit be configured anywhere, as well as the handling of what happens when that limit is reached?
Would appreciate any links to online reading material on the above.
First, here's a brief web server 101. Production-class web servers are multithreaded, and roughly one thread = one request. You'll typically see some sort of setting for your web server called its "max requests", and this, again, roughly corresponds to how many threads it can spawn. Each thread has overhead in terms of CPU and RAM, so there's a very real upper limit to how many a web server can spawn given the resources of the machine it's running on.
When a web server reaches this limit, it does not start denying requests, but rather queues requests to be handled once threads free up. For example, if a web server has a max requests of 1000 (typical) and it suddenly gets bombarded with 1500 requests, the first 1000 will be handled immediately and the remaining 500 will be queued until some of the initial requests have been responded to, freeing up threads and allowing some of the queued requests to be processed.
A related topic area here is async, which, in the context of a web application, allows threads to be returned to the "pool" when they're in a wait state. For example, if you were talking to an API, there's a period of waiting, usually due to network latency, between sending the request and getting a response from the API. If you handled this asynchronously, then during that period, the thread could be returned to the pool to handle other requests (like those 500 queued-up requests from the previous example). When the API finally responded, a thread would be returned to finish processing the request. Async allows the server to handle resources more efficiently by using threads that would otherwise be idle to handle new requests.
Then, there's the concept of client-server. In protocols like HTTP, the client makes a request and the server responds to that request. However, there's no persistent connection between the two. (This is somewhat untrue as of HTTP 1.1. Connections between the client and server are sometimes persisted, but this is only to allow faster future requests/responses, as the time it takes to initiate the connection is no longer a factor. Even in this scenario, however, there's no real persistent communication about the status of the client/server.) The main point here is that if a client, like a web browser, sends a request to the server, and then the client is closed (such as closing the tab in the browser), that fact is not communicated to the server. All the server knows is that it received a request and must respond, and respond it will, even though there's technically nothing on the other end to receive it anymore. In other words, just because the browser tab has been closed doesn't mean that the server will just stop processing the request and move on.
Then there are timeouts. Both clients and servers will have some timeout value they abide by. The distributed nature of the Internet (enabled by protocols like TCP/IP and HTTP) means that nodes in the network are assumed to be transient. There's no persistent connection (aside from the same note above), and network interruptions could occur between the client making a request and the server responding to the request. If the client/server did not plan for this, they could simply sit there forever waiting. However, these timeouts can vary widely. A server will usually time out in responding to a request within 30 seconds (though it could potentially be set indefinitely). Clients like web browsers tend to be a bit more forgiving, having timeouts of 2 minutes or longer in some cases. When the server hits its timeout, the request will be aborted. Depending on why the timeout occurred, the client may receive various error responses. When the client times out, however, there's usually no notification to the server. That means that if the server's timeout is higher than the client's, the server will continue trying to respond, even though the client has already moved on. Closing a browser tab could be considered an immediate client timeout, but again, the server is none the wiser and keeps trying to do its job.
So, here's what all this boils down to. First, when doing long-polling (which is what you're doing by submitting an AJAX request repeatedly on some interval), you need to build in a cancellation scheme. For example, if the last 5 requests have timed out, you should stop polling, at least for some period of time. Even better would be to have the response of one AJAX request initiate the next. So, instead of using something like setInterval, you could use setTimeout and have the AJAX callback initiate it. That way, the requests only continue if the chain is unbroken. If one AJAX request fails, the polling stops immediately. However, in that scenario, you may need some fallback to re-initiate the request chain after some period of time. This prevents endlessly bombarding your already failing server with new requests. Also, there should always be some upper limit on how long polling should continue. If the user leaves the tab open for days without using it, should you really keep polling the server all that time?
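For example, a minimal client-side sketch of that self-scheduling pattern (the URL, intervals, and the updateTable/showError helpers are all hypothetical):

function poll() {
    $.ajax({ url: "/updates/latest", timeout: 5000 })
        .done(function (data) {
            updateTable(data);       // hypothetical UI update
            setTimeout(poll, 500);   // schedule the next request only on success
        })
        .fail(function () {
            showError();             // hypothetical error display
            setTimeout(poll, 30000); // back off before re-initiating the chain
        });
}
poll();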
On the server side, you can use async with cancellation tokens. This does two things: 1) it gives your server a little more breathing room to handle more requests and 2) it provides a way to unwind the request if some portion of it times out. More information about that can be found at: http://www.asp.net/mvc/overview/performance/using-asynchronous-methods-in-aspnet-mvc-4#CancelToken
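A minimal server-side sketch of that approach, assuming ASP.NET MVC 4 or later (the controller name, action, and 3-second timeout are illustrative, and Task.Delay stands in for a real cancellable data call):

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Mvc;

public class UpdatesController : Controller
{
    // Abort the action (and signal the token) if it runs longer than 3 seconds.
    [AsyncTimeout(3000)]
    [HandleError(ExceptionType = typeof(TimeoutException), View = "TimeoutError")]
    public async Task<ActionResult> Latest(CancellationToken cancellationToken)
    {
        // While this await is pending, the thread returns to the pool to
        // serve other requests; cancellation unwinds the request early.
        await Task.Delay(500, cancellationToken);
        return Json(new { status = "ok" }, JsonRequestBehavior.AllowGet);
    }
}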
The following article by Thomas Marquardt describes how IIS handles ASP.NET requests, the max/min CLR worker threads and managed I/O threads that can be configured, the various request queues involved, and their default sizes.
Now as per the article, the following occurs in IIS 6.0:
ASP.NET picks up the request from an IIS I/O thread and posts HSE_STATUS_PENDING back to the IIS I/O thread
The request is handed over to a CLR worker thread
If the requests are high-latency and all the threads are occupied (the thread count approaches httpRuntime.minFreeThreads), then requests are posted to the application-level request queue (this queue is per AppDomain)
ASP.NET also checks the number of concurrently executing requests. The article states that "if the number of concurrently executing requests is too high", it queues incoming requests in an ASP.NET global request queue (this queue is per worker process) (please check Update 2)
I want to know the "threshold value" at which ASP.NET considers the number of currently executing requests too high and starts queuing requests to the global ASP.NET request queue.
I think this threshold will depend on the configured maximum number of worker threads, but there might be some formula based on which ASP.NET determines that the number of concurrently executing requests is too high and starts queuing requests to the ASP.NET global request queue. What might this formula be? Or is this setting configurable?
Update
I read through the article again, and in the comments section I found this:
1) On IIS 6 and in IIS 7 classic mode, each application (AppDomain) has a queue that it uses to maintain the availability of worker threads. The number of requests in this queue increases if the number of available worker threads falls below the limit specified by httpRuntime minFreeThreads. When the limit specified by httpRuntime appRequestQueueLimit is exceeded, the request is rejected with a 503 status code and the client receives an HttpException with the message "Server too busy." There is also an ASP.NET performance counter, "Requests In Application Queue", that indicates how many requests are in the queue. Yes, the CLR thread pool is the one exposed by the .NET ThreadPool class.

2) The requestQueueLimit is poorly named. It actually limits the maximum number of requests that can be serviced by ASP.NET concurrently. This includes both requests that are queued and requests that are executing. If the "Requests Current" performance counter exceeds requestQueueLimit, new incoming requests will be rejected with a 503 status code.
So essentially requestQueueLimit limits the number of requests that are queued (I am assuming it sums the number of requests queued in the application queues, plus the global ASP.NET request queue, plus the number of requests currently executing) and are executing. Although this does not answer the original question, it does provide information about when we might receive a 503 Server Busy error because of a high number of concurrent or high-latency requests.
(Check update 2)
Update 2
There was a mistake on my part in my understanding: I had mixed up the descriptions for IIS 6 and IIS 7.
Essentially, when ASP.NET is hosted on IIS 7.0/7.5 in integrated mode, the application-level queues are no longer present; instead, ASP.NET maintains a single global request queue.
So IIS 7/7.5 will start queuing requests to the global request queue if the number of executing requests is deemed high. The question applies more to IIS 7/7.5 than to IIS 6.
As far as IIS 6.0 is concerned, there is no global ASP.NET request queue, but the following is true:
1. ASP.NET picks up the request from an IIS I/O thread and posts HSE_STATUS_PENDING back to the IIS I/O thread
2. The request is handed over to a CLR worker thread
3. If the requests are high-latency and all the threads are occupied (the thread count approaches httpRuntime.minFreeThreads), then requests are posted to the application-level request queue (this queue is per AppDomain)
4. ASP.NET also checks the number of requests queued and currently executing before accepting a new request. If this number is greater than the value specified by processModel.requestQueueLimit, incoming requests are rejected with a 503 Server Busy error.
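For reference, a minimal sketch of where that limit lives (the processModel element can only be set in machine.config; 5000 is the default discussed above):

<system.web>
    <processModel requestQueueLimit="5000" />
</system.web>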
This article might help to understand the settings a little better.
minFreeThreads: This setting is used by the worker process to queue all incoming requests if the number of available threads in the thread pool falls below the value of this setting. This setting effectively limits the number of requests that can run concurrently to maxWorkerThreads * #CPUs - minFreeThreads. Set minFreeThreads to 88 * # of CPUs. This limits the number of concurrent requests to 12 per CPU (assuming maxWorkerThreads is 100).
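Putting the quote's numbers into config form, a sketch for a single-CPU machine (machine.config; autoConfig is typically set to false so the manual thread values take effect):

<system.web>
    <processModel autoConfig="false" maxWorkerThreads="100" maxIoThreads="100" />
    <!-- 100 worker threads * 1 CPU - 88 free threads = 12 concurrent requests -->
    <httpRuntime minFreeThreads="88" />
</system.web>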
Edit:
In this SO post, Thomas provides more detail and examples of request handling in the integrated pipeline. Be sure to read the comments on the answer for additional explanations.
A native callback (in webengine.dll) picks up the request on a CLR worker thread; we compare maxConcurrentRequestsPerCPU * CPUCount to the total number of active requests. If we've exceeded the limit, the request is inserted into the global queue (native code). Otherwise, it will be executed. If it was queued, it will be dequeued when one of the active requests completes.
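In rough C#-flavored pseudocode, the check described above amounts to something like this (a sketch of the logic only; the real implementation is native code in webengine.dll):

static class AdmissionSketch
{
    // Returns true if a newly arrived request should go to the global queue.
    public static bool ShouldQueue(int totalActiveRequests,
                                   int maxConcurrentRequestsPerCpu,
                                   int cpuCount)
    {
        // Compare total active requests against the per-CPU limit
        // scaled by the number of CPUs.
        return totalActiveRequests >= maxConcurrentRequestsPerCpu * cpuCount;
    }
}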
Can I determine from within an ASP.NET application the transfer rate, i.e. how many KB per second are transferred?
You can look at some of ASP.NET's performance counters.
See here for some examples.
Some specific ones that may help you figure out what you want are:
Request Bytes Out Total
The total size, in bytes, of responses sent to a client. This does not include standard HTTP response headers.
Requests/Sec
The number of requests executed per second. This represents the current throughput of the application. Under constant load, this number should remain within a certain range, barring other server work (such as garbage collection, cache cleanup thread, external server tools, and so on).
Requests Total
The total number of requests since the service was started.
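If you want to read those counters from code rather than PerfMon, here is a minimal sketch (category and counter names as they appear in PerfMon; the "__Total__" instance aggregates all ASP.NET applications on the machine):

using System;
using System.Diagnostics;
using System.Threading;

class TransferRateSample
{
    static void Main()
    {
        using (var bytesOut = new PerformanceCounter(
            "ASP.NET Applications", "Request Bytes Out Total", "__Total__"))
        using (var reqPerSec = new PerformanceCounter(
            "ASP.NET Applications", "Requests/Sec", "__Total__"))
        {
            reqPerSec.NextValue();  // the first sample primes the rate counter
            Thread.Sleep(1000);     // sample over a one-second window
            Console.WriteLine("Request Bytes Out Total: {0}", bytesOut.NextValue());
            Console.WriteLine("Requests/Sec:            {0}", reqPerSec.NextValue());
        }
    }
}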
There are a number of debugging tools you can use to check this in the browser. It will of course vary by page, cache settings, server load, network connection speed, etc.
Check out http://www.fiddlertool.com/fiddler/
Or if you are using Firefox, the Firebug add-on: http://addons.mozilla.org/en-US/firefox/addon/1843