Cannot Create the Desired Throughput on the IIS Server - asp.net

In short, I am trying to run a load test, but I cannot achieve the desired throughput on the IIS server (Windows Server 2016 Datacenter) even though there seems to be no bottleneck in terms of CPU, memory, disk, or network.
Here is my configuration:
IIS Server: 16 vCPU, 32GB memory
SQL Server: 4 vCPU, 8GB memory
Test Server (sending the requests): 8 vCPU, 16GB memory
In order to remove concurrency limits on the IIS server, I made the following changes (a consolidated sketch of where each setting lives follows the list):
<serverRuntime appConcurrentRequestLimit="1000000" />
<applicationPool
maxConcurrentRequestsPerCPU="1000000"
maxConcurrentThreadsPerCPU="0"
requestQueueLimit="1000000" />
Default Application Pool Queue Length: 65000
<processModel minWorkerThreads="5000" />
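For reference, here is that consolidated sketch of where each of these settings typically lives; the paths are the usual defaults and may differ on your machine:

<!-- applicationHost.config (or the site's web.config), under system.webServer -->
<serverRuntime appConcurrentRequestLimit="1000000" />

<!-- %windir%\Microsoft.NET\Framework64\v4.0.30319\aspnet.config -->
<system.web>
  <applicationPool
      maxConcurrentRequestsPerCPU="1000000"
      maxConcurrentThreadsPerCPU="0"
      requestQueueLimit="1000000" />
</system.web>

<!-- machine.config (processModel is a machine-level setting) -->
<processModel minWorkerThreads="5000" />

The application pool queue length (65000 above) corresponds to the application pool's queueLength attribute and can also be set from the command line, e.g. appcmd set apppool "DefaultAppPool" /queueLength:65000 (the pool name here is a placeholder).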
I created a WPF application that issues the desired number of concurrent requests against the IIS server using HttpClient and deployed it on the test server. (I also raised ServicePointManager.DefaultConnectionLimit to 1000000.) I then tested with 5000 requests, all of which returned 200 OK.
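For context, here is a minimal sketch of the kind of client loop described above (this is not the actual WPF code; the target URL and request count are placeholders):

using System;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class LoadTest
{
    static async Task Main()
    {
        // Remove the default per-host connection limit, as described above.
        ServicePointManager.DefaultConnectionLimit = 1000000;

        var client = new HttpClient();
        var total = Stopwatch.StartNew();

        // Fire 5000 requests concurrently and time each one individually.
        var tasks = Enumerable.Range(0, 5000).Select(async _ =>
        {
            var sw = Stopwatch.StartNew();
            var response = await client.GetAsync("http://iis-server/test"); // placeholder URL
            sw.Stop();
            return (Status: response.StatusCode, Elapsed: sw.ElapsedMilliseconds);
        }).ToArray();

        var results = await Task.WhenAll(tasks);
        total.Stop();

        Console.WriteLine($"Total: {total.ElapsedMilliseconds} ms, " +
                          $"avg: {results.Average(r => r.Elapsed):F0} ms, " +
                          $"max: {results.Max(r => r.Elapsed)} ms, " +
                          $"min: {results.Min(r => r.Elapsed)} ms");
    }
}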
Normally, a single request returns in about 20 ms. Here are the results of the test as measured in the WPF application:
Total time starting from sending the first request through getting the last response: 9380ms
Average response time: 3919ms
Max. response time: 7243ms
Min. response time: 77ms
When I look at the performance counters on the test server, I see that 5000 requests completed in about 3 seconds. Here is the graph I obtained from perfmon:
But when I look at the performance counters on the IIS server, I see that requests are received and executed continuously over the course of 9 seconds. So the average observed throughput is about 400 requests per second. I also tried the test with 10000 requests, but the average throughput stayed around 400 req/sec.
Why doesn't ASP.NET complete receiving all the requests at the end of the first 3 seconds? How can I increase throughput to any desired value so that I can conduct a proper load test?

After a lot of experimenting, I found out that any value over 2000 for minWorkerThreads seems to be ignored. I checked it using the ThreadPool.GetMinThreads method. I also added a maxWorkerThreads value of 2100, as @StephenCleary suggested. With these values, the problem disappeared. The strange thing is that I have not seen this limitation on the minWorkerThreads value mentioned anywhere in the Microsoft documentation.
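To see what the runtime actually applied, the effective thread-pool values can be read back from inside the application along these lines (a sketch; run it from Application_Start or a diagnostic page, and log the output however you prefer):

// processModel values are per CPU, while GetMinThreads/GetMaxThreads report
// machine-wide totals, so compare accordingly.
System.Threading.ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
System.Threading.ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
System.Diagnostics.Trace.WriteLine($"Min worker: {minWorker}, min IOCP: {minIocp}");
System.Diagnostics.Trace.WriteLine($"Max worker: {maxWorker}, max IOCP: {maxIocp}");

Note also that ThreadPool.SetMinThreads refuses (returns false) a minimum above the current maximum, which may be related to the cap observed here until maxWorkerThreads was raised.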

Related

IIS https requests are 4-5 times slower than http, lsass.exe is consuming 40% CPU

We have an ASP.NET application running on IIS on Windows Server 2016, hosted on a D48s Azure virtual machine (48 cores).
Most of the time, the web app is processing requests from regular users at a pace of 200-300 requests per second.
But at times, the server receives high incoming traffic as webhooks from external sites – up to 1000-5000 requests per second.
When this happens, the CPU usage gets really high, and we noticed that the Local Security Authority Process (lsass.exe) is consuming most of the CPU at that time: it can take up to 40-50% of the total CPU usage (the orange graph is lsass.exe CPU usage):
Needless to say, the server becomes really busy, and other requests start to slow down.
By the way, each webhook request is very lightweight and adds a record to a table in SQL Server, for later processing.
We ran a load test on our server and found that lsass becomes this active only when requests are made using https. However, if the very same requests are made using http, lsass.exe is not active at all.
The second discovery was that when using http the server was able to process 4-5 times more requests under the same load, compared to https!
Here is a screenshot from Performance Monitor: the green line shows Requests per second, and the brown line shows lsass.exe CPU usage. On the left is what happens when using http, and on the right is what happens when using https:
So the bottom line is that:
https makes the requests 4-5 times slower.
When using https, lsass.exe starts to eat a lot of CPU resources.
Questions are:
Why is lsass.exe so active during https requests?
I found an article on the web saying that lsass.exe was used by IIS 6.0 to cipher / decipher https traffic, but that starting from IIS 7.0 it is no longer used. However, our experiments indicate the contrary.
I don't understand how 4000-5000 requests per second, each with a body of 3-5 kilobytes, can make the CPU of a 48-core server so busy.
Maybe there are some hidden SSL settings in IIS that can make https more efficient? (A couple of inspection commands are sketched after this list.)
We found info on SSL offloading: can this be done on Azure?
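Not a full answer to the SSL-settings question, but the server-side TLS configuration can at least be inspected with standard Windows tooling; the IP:port in the second command is a placeholder for your binding:

# Cipher suites the server is willing to negotiate (Windows Server 2016)
Get-TlsCipherSuite | Format-Table Name

# Certificate binding that HTTP.sys uses for the site
netsh http show sslcert ipport=0.0.0.0:443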
Update
We created a new Azure VM from scratch (a 32-core D32s), installed IIS and created a simple ASP.NET Web Forms app that has only 1 "Hello World" aspx page that does nothing (no SQL Server requests etc.)
With JMeter we created a load test to this page, and the same pattern appeared here:
1) http: the server was processing 20 000 requests per second, and lsass.exe was not active.
2) https: the server was processing only 1000-1500 requests per second, and lsass.exe was consuming 10% of total CPU.
Here is the Performance Monitor graph (http on the left, https on the right):
By the way, JMeter and the ASP.NET web app were run from the same VM, so network round-trips were minimal.
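For anyone reproducing this, running the JMeter plan in non-GUI mode keeps the tool's own overhead on the shared VM low (the file names here are placeholders):

jmeter -n -t hello-world-plan.jmx -l results.jtl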
How can https be 15-20 times slower than http in this simple situation?
And what is the role of lsass.exe in this situation?

NiFi HandleHttpResponse 503 Errors

In NiFi, I have an HTTP endpoint that accepts POST requests with payloads varying from 7kb to 387kb or larger (up to 4mb). The goal is to have a clustered implementation capable of handling approximately 10,000 requests per second. However, no matter whether I run NiFi clustered with 3 nodes or as a single instance, I've never been able to average more than 15-20 requests/second without the Jetty service returning a 503 error. I've tried reducing the time penalty and increasing the number of Maximum Outstanding Requests in the StandardHttpContextMap. No matter what I try, whether on my local machine or on a remote VM, I cannot get any impressive number of requests to go through.
Any idea why this is occurring and how to fix this? Even when clustering, I notice one node (not even the primary node) does the majority of the work and I think this explains why the throughput isn't much higher for a clustered implementation.
No matter at what bulletin level, this is the error I get in the nifi-app.log:
2016-08-09 09:54:41,568 INFO [qtp1644282758-117] o.a.n.p.standard.HandleHttpRequest HandleHttpRequest[id=6e30cb0d-221f-4b36-b35f-735484e05bf0] Sending back a SERVICE_UNAVAILABLE response to 127.0.0.1; request was POST 127.0.0.1
This is the same whether I'm running just two processors (HandleHttpRequest and HandleHttpResponse) or my general flow, where I route on content, replace some text, and write to a database or JMS messaging system. I can get higher throughput (up to 40 requests/sec) when I run just the web service without the full flow, but it still has a failure (KO) rate of about 90%, so it's not much better; it still seems to be an issue with the Jetty service.

wso2 esb out of memory when clients disconnect

I have a WSO2 ESB 4.8.0 instance running with some proxies deployed in it.
Everything works great until the clients calling the proxies begin to disconnect before receiving the response from the ESB.
After a few minutes the ESB fails with an OutOfMemory error.
The average request size is 1.2 KB.
The average response size is 1.6 MB.
The server is running with: -Xms256m -Xmx1024m -XX:MaxPermSize=256m.
In the heap dump I can see that the main objects retaining memory are java.lang.Thread instances (PassThroughMessageProcessor); there are a lot of them, each retaining about 36 MB.
Sometimes there is also this error: java.lang.OutOfMemoryError: GC overhead limit exceeded
If clients don't disconnect, everything works fine.
Any idea?
If clients are disconnecting, maybe the root cause is that your back end is taking too long to answer.
There are a few things you can do:
Decrease the timeout for back end responses, so messages don't pile up on the ESB (an example endpoint timeout is sketched below the list).
Use the throttling mediator or handler in order to reject messages above a certain rate.
Both of these address the symptoms (OOM at the ESB). Beyond that, a redesign of the integration model may be necessary, perhaps using a store-and-forward approach.
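For the first point, here is a sketch of what an endpoint-level timeout looks like in the Synapse configuration (the URI and duration are placeholders; "discard" drops the late response instead of holding it in memory):

<endpoint>
   <address uri="http://backend.example.com/service">
      <timeout>
         <duration>30000</duration>
         <responseAction>discard</responseAction>
      </timeout>
   </address>
</endpoint>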

How can I debug buffering with HTTP.sys?

I am running Windows 8.1 and I have an integration test suite that leverages HostableWebCore to spin up isolated ASP.NET web server processes. For performance reasons, I am launching 8 of these at a time and once they are started up I send a very simple web request to each, which is handled by an MVC application loaded into each. Every instance is listening on a different port.
The problem is that the requests are getting held up (I believe) in HTTP.sys (or whatever it is called these days). If I look at fiddler, I can see all 8 requests immediately (within a couple milliseconds) hit the ServerGotRequest state. However, the requests sit in this state for 20-100 seconds, depending on how many I run in parallel at a time.
The reason I suspect HTTP.sys is that the amount of time I have to wait for any of them to respond increases with the number of hosting applications I spin up in parallel. If I only launch a single hosting application, it starts responding in ~20 seconds. If I spin up 2, they both start responding in ~30 seconds. If I spin up 4, ~40 seconds. If I spin up 8, ~100 seconds (which is the default WebClient request timeout).
Because of this long delay, I have enough time to attach a debugger and put a breakpoint in my controller action and that breakpoint will be hit after the 20-100 second delay, suggesting that my process hasn't yet received the request. All of the hosts are sitting idle for those 20-100 seconds after ~5-10 seconds of cold start CPU churning. All of the hosts appear to receive the requests at the same time, as if something was blocking any request from going through and then all of a sudden let everything through.
My problem is, I have been unable to locate any information related to how one can debug HTTP.sys. How can I see what it is doing? What is causing the block? Why is it waiting to forward on the requests to the workers? Why do they all come through together?
Alternatively, if someone has any idea how I can work around this and get the requests to come through immediately (without the waiting) I would very much appreciate it.
Another note: I can see System (PID 4) immediately register to listen on the port I have specified as soon as the hosting applications launch.
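One concrete way to watch HTTP.sys itself is to capture an ETW trace from its provider while reproducing the delay (the trace name and output file below are arbitrary):

logman start httpsys-trace -p Microsoft-Windows-HttpService 0xFFFF -o httptrace.etl -ets
rem ...reproduce the slow requests, then stop the trace...
logman stop httpsys-trace -ets
netsh trace convert input=httptrace.etl output=httptrace.txt

The converted log should show when HTTP.sys received each request and when it was delivered to a worker process's request queue, which should narrow down where the 20-100 seconds are spent.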
Additional Information:
This is what one of my hosting apps looks like under netsh http show servicestate
Server session ID: FD0000012000004C
Version: 2.0
State: Active
Properties:
Max bandwidth: 4294967295
Timeouts:
Entity body timeout (secs): 120
Drain entity body timeout (secs): 120
Request queue timeout (secs): 120
Idle connection timeout (secs): 120
Header wait timeout (secs): 120
Minimum send rate (bytes/sec): 150
URL groups:
URL group ID: FB00000140000018
State: Active
Request queue name: IntegrationTestAppPool10451{974E3BB1-7774-432B-98DB-99850825B023}
Properties:
Max bandwidth: inherited
Max connections: inherited
Timeouts:
Timeout values inherited
Logging information:
Log directory: C:\inetpub\logs\LogFiles\W3SVC1
Log format: 0
Number of registered URLs: 2
Registered URLs:
HTTP://LOCALHOST:10451/
HTTP://*:10451/
Request queue name: IntegrationTestAppPool10451{974E3BB1-7774-432B-98DB-99850825B023}
Version: 2.0
State: Active
Request queue 503 verbosity level: Basic
Max requests: 1000
Number of active processes attached: 1
Controller process ID: 12812
Process IDs:
12812
Answering this mainly for posterity. It turns out that my problem wasn't HTTP.sys; it was ASP.NET. ASP.NET takes a shared lock when it compiles files, and that lock is identified by System.Web.HttpRuntime.AppDomainAppId. I believe that since all of my apps are built dynamically from a common applicationHost.config file, they all have the same AppDomainAppId (/LM/W3SVC/1/ROOT). This means they all share one lock, so in effect page compilation for all of the apps happens sequentially. Because they keep acquiring and releasing the lock, none of them gets to the end of its compilation quickly, and they all tend to finish around the same time: once one of them makes it through, the others are close behind and finish just after.
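If anyone else runs into this, one way to take dynamic page compilation (and with it that shared lock) out of the test startup is to precompile each site before the run with aspnet_compiler; the paths below are placeholders:

%windir%\Microsoft.NET\Framework64\v4.0.30319\aspnet_compiler.exe -p C:\src\TestSite -v / C:\precompiled\TestSite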

IIS7 Integrated Pipeline: Interaction between maxConcurrentRequestsPerCPU and requestsQueueLimit settings

Firstly there's a great overview of the IIS7 HTTP request lifecycle and various settings that affect performance here:
ASP.NET Thread Usage on IIS 7.0 and 6.0
Very specifically though, in .NET 4 the defaults for maxConcurrentRequestsPerCPU and requestQueueLimit are set to 5000, i.e. equivalent to the following (in aspnet.config):
<system.web>
<applicationPool
maxConcurrentRequestsPerCPU="5000"
maxConcurrentThreadsPerCPU="0"
requestQueueLimit="5000" /> (** see note below)
</system.web>
Seems to me that on a multi-CPU/core server the requestQueueLimit here will always be hit well before the 'per CPU' limit. Thus, if a max of 5000 requests per CPU is what you actually want, then I would expect that the requestQueueLimit needs to be increased to 5000 * CPUCount, or just disabled altogether.
Is my interpretation correct? If so can I disable requestQueueLimit? (set it to zero?). The documentation on this setting doesn't appear to address this question (so maybe I'm missing something or misreading?)
** side note from the above article: The requestQueueLimit is poorly named. It actually limits the maximum number of requests that can be serviced by ASP.NET concurrently. This includes both requests that are queued and requests that are executing. If the "Requests Current" performance counter exceeds requestQueueLimit, new incoming requests will be rejected with a 503 status code.
***Is my interpretation correct?
Yes, if you want to execute more than 5000 requests concurrently, you'll need to increase the requestQueueLimit. The requestQueueLimit restricts the total number of requests in the system. Due to its legacy, it is actually the total number of requests in the system, and not the number of requests in some queue. Its goal is to prevent the server from toppling over due to lack of physical memory, virtual memory, etc. When the limit is reached, incoming requests will receive a quick 503 "Server Too Busy" response. By the way, the current number of requests in the system is exposed by the "ASP.NET\Requests Current" performance counter.
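That counter can be watched from PowerShell while a load test runs, for example:

Get-Counter '\ASP.NET\Requests Current' -Continuous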
***can I disable requestQueueLimit? (set it to zero?)
You can effectively disable it by setting it to a large value, like 50000. You must set the value in the aspnet.config file. I doubt your server can handle 50000 concurrent requests, but if so, then double that. Setting it to zero will not disable it; oddly, it means no more than one request can execute concurrently.
By the way, it looks like there is a bug in v4. For integrated mode, it only successfully reads the value of requestQueueLimit if it is configured in the aspnet.config file as described on MSDN. For some reason, v4 was not reading it from machine.config when I experimented with it a while ago.
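Concretely, raising the limit means editing aspnet.config (on a 64-bit server, both the Framework and Framework64 copies), along these lines; the 50000 is just the example value from above:

<!-- %windir%\Microsoft.NET\Framework64\v4.0.30319\aspnet.config -->
<configuration>
  <system.web>
    <applicationPool
        maxConcurrentRequestsPerCPU="5000"
        maxConcurrentThreadsPerCPU="0"
        requestQueueLimit="50000" />
  </system.web>
</configuration>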
You might want to check this application's source code. IIS Tuner is an open source application that optimizes IIS settings for better performance. This page could also be useful for your questions.
