We have an ASP.NET 4.0 (integrated mode) web application running on IIS 7.5 x64 (Windows Server 2008) with 12 GB RAM and 16 cores that has a problem with spikes of queued requests. Normally the queue is zero, but occasionally (roughly 15 times over a 10 minute period) it spikes to about 20-100 requests.
Sometimes this queue also correlates with a higher number of requests/sec, but that isn't always the case.
Requests Current seems to always be between 15 and 30.
The number of current logical and physical threads is as low as 60-100.
CPU load averages 6%.
Requests/sec is around 150-200.
Connections Active seems to be slowly increasing; it's about 7,000.
Connections Established seems fairly consistent, around 130-140.
Since we are running .NET 4.0 in integrated mode, I suppose we should be able to handle up to 5,000 simultaneous requests, or at least 1,000 (the http.sys kernel queue):
http://blogs.msdn.com/b/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx
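For reference, those .NET 4 concurrency limits live in the framework-level Aspnet.config file; a sketch showing the v4 defaults (the path is the usual one for 64-bit .NET 4 and may differ on your box, e.g. %windir%\Microsoft.NET\Framework64\v4.0.30319\Aspnet.config):

    <!-- Aspnet.config (sketch): .NET 4 integrated-mode concurrency defaults -->
    <system.web>
      <applicationPool
          maxConcurrentRequestsPerCPU="5000"
          maxConcurrentThreadsPerCPU="0"
          requestQueueLimit="5000" />
    </system.web>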
What could be causing .NET to queue the requests even though there are threads left and requests/sec is low?
Just a guess: Garbage collection suspends all threads, so perhaps the period immediately following the garbage collection would look like a request spike, since IIS would be piling up requests during the GC. Can you correlate the spikes with garbage collections? If your application is I/O-bound, it may not be possible to drive the CPU load very high, since the threads will spend most of their time blocked.
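One low-effort way to check that correlation is to sample the GC and queue counters side by side. A minimal sketch in C# (the instance name "w3wp" is an assumption; on a box with several worker processes it may be "w3wp#1" etc., so check perfmon for the exact name):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class GcQueueCorrelator
    {
        static void Main()
        {
            // "% Time in GC" for the worker process, next to ASP.NET "Requests Queued".
            var gc = new PerformanceCounter(".NET CLR Memory", "% Time in GC", "w3wp");
            var queued = new PerformanceCounter("ASP.NET", "Requests Queued");

            while (true)
            {
                // Queue spikes that line up with GC-time spikes support the GC theory.
                Console.WriteLine("{0:T}  GC%={1,6:F1}  Queued={2,5:F0}",
                    DateTime.Now, gc.NextValue(), queued.NextValue());
                Thread.Sleep(1000);
            }
        }
    }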
The apparent leak of active connections is disturbing though, if it really is ever-increasing.
Related
I am running a large-scale ERP system on the following server configuration. The application is developed using AngularJS and ASP.NET 4.5.
Dell PowerEdge R730 (quad-core 2.7 GHz, 32 GB RAM, 5 x 500 GB hard disks in RAID 5). Software: the host OS is VMware ESXi 6.0, with two VMs running on it. One is Windows Server 2012 R2 with 16 GB of memory allocated; this contains the IIS 8 server with my application code. The other VM is also Windows Server 2012 R2, with SQL Server 2012 and 16 GB of memory allocated; this contains only my application database.
You see, I separated the application server and database server for load balancing purposes.
My application contains a registration module where the load is expected to be very high (around 10,000 visitors over 10 minutes).
To support this volume of requests, I have done the following on my IIS server (scripted equivalents for the pool settings are sketched below):
-> increased the request queue length of the application pool to 5000
-> enabled output caching for .aspx files
-> enabled static and dynamic compression in IIS
-> set the virtual memory limit and private memory limit of each application pool to 0
-> increased the maximum worker processes of each application pool to 6
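For the application-pool items in that list, the scripted equivalents look roughly like this with appcmd (a sketch; "MyAppPool" is a placeholder, and appcmd lives in %windir%\system32\inetsrv):

    :: request queue length
    appcmd set apppool "MyAppPool" /queueLength:5000
    :: virtual and private memory recycling limits (0 = no limit)
    appcmd set apppool "MyAppPool" /recycling.periodicRestart.memory:0
    appcmd set apppool "MyAppPool" /recycling.periodicRestart.privateMemory:0
    :: web garden with 6 worker processes
    appcmd set apppool "MyAppPool" /processModel.maxProcesses:6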
I then used Gatling to run load tests on my application, injecting 500 users at once into my registration module.
However, I see that only 40-45% of my RAM is being used, and each worker process uses at most about 130 MB.
Gatling reports that around 20% of my requests get a 403 error, and that more than 60% of all HTTP requests have a response time greater than 20 seconds.
A single user makes 380 HTTP requests over a span of around 3 minutes and transfers 1.5 MB in total. I have simulated 500 such users.
Is there anything missing in my server tuning? I have already tuned my application code to minimize memory leaks, increase timeouts, and so on.
There is a known issue with the newest generation of PowerEdge servers that use the Broadcom network chipset. Apparently, the "VM" feature for the network is broken, which results in horrible network latency on VMs.
Head to Dell and get the most recent firmware and Windows drivers for the Broadcom chipset.
Head to VMware Downloads and get the latest Broadcom driver.
As for the worker process settings, for maximum performance, you should consider running the same number of worker processes as there are NUMA nodes, so that there is 1:1 affinity between the worker processes and NUMA nodes. This can be done by setting "Maximum Worker Processes" AppPool setting to 0. In this setting, IIS determines how many NUMA nodes are available on the hardware and starts the same number of worker processes.
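Scripted, that's a one-liner (a sketch; the pool name is a placeholder):

    appcmd set apppool "MyAppPool" /processModel.maxProcesses:0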
I guess the one caveat to the answer you received would be that if your server isn't NUMA-aware (i.e. uses symmetric multiprocessing), you won't see those IIS options under CPU; but the above poster seems to know a good bit more than I do about the machine. Sorry I don't have enough street cred to add this as a comment. As far as IIS goes, you may also want to make sure your app pool doesn't use the default recycle conditions, and pick a time like midnight for the recycle. If you have root-level settings applied, the default app pool recycling at 29 hours may also trigger garbage collection against your child pool, causing delays even with concurrent GC. It sounds like you might benefit a bit from gcServer=true, though that's pretty tough to assess from here.
Has your SQL Server been optimized for that type of workload? If your data isn't paramount, you could squeeze out faster execution times with delayed durability, then assess queries that are returning too much data by looking at async IO wait types. In general there's not enough here to really assess SQL optimizations, but if the database isn't configured right (size/growth options), you could be hitting a lot of timeouts due to file growth, VLF fragmentation, etc.
Server Specs
Microsoft Windows Server 2003 Enterprise Edition SP2
IIS 6
.NET 4
Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz
2.00 GB of RAM
Physical Address Extension
I am having trouble finding the cause of our server's random downtime. Our clients inform us that their website goes down for hours at a time. Sometimes users are able to log in, but the site is extremely slow, unstable, and unusable. Sometimes users are not able to log in at all. When users are able to log in, not all images are displayed (they get the "image not found" placeholder).
We upgraded their website from .NET 1 to .NET 4 because we thought the cause of their downtime and random user logouts was running the website on .NET 1. The website then ran fine with no issues for a few months.
The first time the server started to go down after that, it was because the drive the website resided on had run out of disk space. There was 40 GB partitioned to this drive, and 20 GB was added. This didn't resolve the issue for very long.
The second time the server would randomly go down, I noticed in the Event Viewer that the worker process associated with the app pool used by the website would periodically need to be recycled. That is, in the Security tab of the Event Viewer I would periodically see an event with ID 1074 reading 'A worker process with process id of '1540' serving application pool 'Net4' has requested a recycle because the worker process reached its allowed processing time limit.'. I then went into this app pool's properties and saw that the app pool would be recycled every 29 hours, which is the default. I modified this to have the app pool recycle every day at 3:00 am, and since then we have not seen this event in the Event Viewer. We were able to catch the website during one of its downtimes before this was changed and recycled the app pool manually, which resolved the issue in that one instance.
This did not permanently fix the issue however, as we are still receiving emails from our client informing us that the website is down for hours at a time.
I then set up a performance monitor counter log, and we have managed to monitor the server's performance during many of these downtimes. It does not appear to be a problem with memory as there is plenty of space on the drive. It does not appear to be a memory leak or related to excessive paging, as there are no running processes which take up an excessive amount of % Processor Time, and the Pages/sec Memory counter does not peak at an excessive amount during most of the downtime (I'll explain why excessive paging occurs later). The total IO Data Bytes/sec and IO Other Data Bytes/sec Process counters do not appear to be unusually high or low during downtime. The total Thread Count and Handle Count Process counters do not exhibit any abnormal spikes or drops during this time. The total thread count at a given time seems to be between 600 and 900, give or take, and the total handle count between 15,000 and 23,000, give or take. The % Time in Jit counter (.NET CLR Jit) for the w3wp instance is 0 about half of the time, and randomly peaks at almost 100 the other half, most of the time peaking for just a moment but occasionally for about 10 minutes, unrelated to downtime.
There are random times throughout the day where the process dsmcsvc takes up most, if not all, of the % Processor Time. This is a process run by the Symantec Antivirus software. When this process takes up the % Processor Time, there is a corresponding event in the Event Viewer signifying that a new virus definition file has been loaded: an Application event with ID 7, 'New virus definition file loaded. Version: #version number#'. When this event occurs, the Pages/sec counter spikes. Sometimes it spikes to only 200-300, but it will at times peak at over 10,000. This event seems to be completely unrelated to website downtime. I have researched the Symantec Antivirus software and found that there is a known memory leak in old versions of it. I have also found that this software is known to cause high memory usage when the link to a process called NavLogon.exe is broken or does not exist; this process does not appear to exist on the server, so I currently have no way of restoring the link to it. I also found that this software uses Crypt32.dll, and that old versions of Crypt32.dll have a known memory leak. The Crypt32.dll which exists on the server was last updated in 2007.
The performance monitor log tracks the total Sessions Active counter for ASP.NET Applications. During downtime, the total number of sessions does not exhibit any abnormal behavior; there is a normal number of active sessions during this time. Active sessions at a given time can be between 0 and 200. I was informed that the time when the most users are active is during first shift, yet this number peaks between about 10 pm and 2 am every day.
The site runs JavaScript client side, and Visual Basic.net server side. All users have about 10-15 session variables almost all of the time.
When the site goes down there are no events which seem to correspond to its downtime in the Event Viewer.
I have also set up a W3C Extended Log File Format log for this site. During downtime there seems to be an excessive number of GET requests for Telerik.RadUploadProgressHandler.ashx.
I have seriously run out of ideas at this point and have extensively searched the web for solutions and come up empty. Any feedback as to why this may be occurring would be great.
It does not appear to be a problem with memory as there is plenty of space on the drive.
Really? Memory and hard drive space are two completely different things. 2GB of RAM was okay a decade ago, when that server was new, but today it's laughably small.
But don't bother upgrading or adding RAM. This server is old enough that the problem is probably just that the hardware is reaching the end of its useful life. Additionally, the operating system is also nearing its end of life: Server 2003 is scheduled for end of life on July 14, 2015. After that date, there will be no new patches of any kind produced for Server 2003... not even critical security patches. That will make Server 2003 completely unsuitable as a web server.
This seems like a good time to execute a transition to a completely new server.
I am load testing a .NET 4.0 MVC application hosted on IIS 7.5 (default config, in particular processModel autoConfig=true), and am observing odd behavior in how .NET manages its threads.
http://msdn.microsoft.com/en-us/library/0ka9477y(v=vs.100).aspx mentions that "When a minimum is reached, the thread pool can create additional threads or wait until some tasks complete".
It seems the duration that threads are blocked for plays a role in whether the pool creates new threads or waits for tasks to complete, and the outcome is not necessarily optimal throughput.
Question: Is there any way to control that behavior, so threads are generated as needed and request queuing is minimized?
Observation:
I ran two tests on a test controller action that does not do much besides Thread.Sleep for an arbitrary time (a sketch of such an action follows the two scenarios):
50 requests/second with the page sleeping 1 second
5 requests/second with the page sleeping for 10 seconds
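For reference, the action under test looks roughly like this (a sketch; names are illustrative):

    using System.Web.Mvc;

    // Sketch of the test action: it simply blocks the request thread,
    // simulating a synchronous, IO-bound page.
    public class LoadTestController : Controller
    {
        public ActionResult Sleep(int ms)
        {
            System.Threading.Thread.Sleep(ms); // hold the thread for `ms` milliseconds
            return Content("done");
        }
    }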
For both cases .NET would ideally use 50 threads to keep up with incoming traffic. What I observe is that in the first case it does not do that; instead it chugs along executing some 20-odd requests concurrently, letting the incoming requests queue up. In the second case threads seem to be added as needed.
Both tests generated traffic for 100 seconds. Here are corresponding perfmon screenshots.
In both cases the Requests Queued counter is highlighted (note the 0.01 scaling)
50/sec Test
For most of the test, 22 requests are handled concurrently (turquoise line). As each takes about a second, that means almost 30 requests/sec queue up, until the test stops generating load after 100 seconds and the queue slowly gets worked off. Briefly the concurrency jumps to just above 40, but never to 50, the minimum needed to keep up with the traffic at hand.
It is almost as if the threadpool management algorithm determines that it doesn't make sense to create new threads, because it has a history of ~22 tasks completing (i.e. threads becoming available) per second, completely ignoring the fact that it has a queue of some 2,800 requests waiting to be handled.
5/sec Test
Conversely in the 5/sec test threads are added at a steady rate (red line). The server falls behind initially, and requests do queue up, but no more than 52, and eventually enough threads are added for the queue to be worked off with more than 70 requests executing concurrently, even while load is still being generated.
Of course the workload is higher in the 50/sec test, as 10x the number of http requests is being handled, but the server has no problem at all handling that traffic, once the threadpool is primed with enough threads (e.g. by running the 5/sec test).
It just seems to not be able to deal with a sudden burst of traffic, because it decides not to add any more threads to deal with the load (it would rather throw 503 errors than add more threads in this scenario, it seems). I find this hard to believe, as a 50 requests/second traffic burst is surely something IIS is supposed to be able to handle on a 16 core machine. Is there some setting that would nudge the threadpool towards erring slightly more on the side of creating new threads, rather than waiting for tasks to complete?
Looks like it's a known issue:
"Microsoft recommends that you tune the minimum number of threads only when there is load on the Web server for only short periods (0 to 10 minutes). In these cases, the ThreadPool does not have enough time to reach the optimal level of threads to handle the load."
Exactly describes the situation at hand.
Solution: slightly increased minWorkerThreads in machine.config to handle the expected traffic bursts (the setting is per core, so a value of 4 gives us 64 threads on the 16-core machine).
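For reference, the change looks roughly like this (a sketch; depending on the framework version you may need autoConfig="false" for explicit thread settings to take effect):

    <!-- machine.config (sketch): raise the per-core minimum worker threads -->
    <system.web>
      <processModel autoConfig="false" minWorkerThreads="4" minIoThreads="4" />
    </system.web>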
I am running IIS 6 on Windows Server 2003 32-bit. I have read that IIS 6 has a maximum virtual memory limit of 2 GB (3 GB with the /3GB switch flipped).
What I am unclear on is whether this means all ASP.NET sessions have 2 GB between them, or 2 GB each.
So if I have a session variable storing 200 KB and have 10,000 active sessions, am I going to be hitting up against this 2 GB limit?
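As a rough back-of-the-envelope check: 200 KB x 10,000 sessions = 2,000,000 KB, or about 1.9 GB, so InProc session state alone would nearly exhaust a 2 GB address space before the CLR, compiled code, and caches take their share.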
In general the advice is to leave these options unticked for ASP.NET applications, since they affect how quickly the app pool recycles; more information here, summary below:
Physical and virtual memory: this section is for recycling application pools which consume too much memory. Focusing on physical memory, I typically like to limit app pools to around 800 MB to 1200 MB max on a 32-bit app with very few app pools, depending on the number of pools and the amount of memory. On a server with 2 GB of RAM I'd set it at around 800 MB max; on a 4 GB server around 1 GB, and more if there's more memory, with a max around 1200 MB. On a 64-bit web front end with 8-16 GB of memory I've heard of settings of 2 GB, or even letting it ride rather than limiting it.
You really need to profile it, since worker processes can really grow as they process and cache: the greater the amount of memory and the greater the load, the larger the worker process will grow. When people ask about configuring the app pool, this is where they are usually asking what the numbers should be. What you are doing here is explicitly limiting the app pool from consuming more memory.
Notice this setting is on the Recycle tab; there's a reason for that. When an app pool reaches the max, it isn't like the max processor setting: IIS will cycle the worker process, which is like a tiny reboot, or similar to an iisreset (but not quite, since sometimes we want this to happen so we can release our memory). You really don't want to cycle more than a couple of times per 24-hour period in an ideal world. I've heard of some trying to cycle right before the morning peak occurs so they have the most memory available, then cycling right at the end of the day before the backups or crawling begins.
Basically the recommendation is not to set a limit (leave the options unchecked), because once the limit is hit, IIS will recycle the application pool, causing all active users to be temporarily disconnected from the site. Your users will likely receive an HTTP 500 while the application pool recycles, and once it's back there will be a delay while the application pool warms up.
Sessions
For an application of any size, do not use InProc (in-memory) sessions; use State Server or SQL Server to store your sessions. http://msdn.microsoft.com/en-us/library/ms178586.aspx
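Switching away from InProc is a small web.config change (a sketch; the connection strings are placeholders):

    <!-- web.config (sketch): out-of-process session state -->
    <sessionState mode="StateServer"
                  stateConnectionString="tcpip=localhost:42424"
                  timeout="20" />
    <!-- or SQL-backed:
    <sessionState mode="SQLServer"
                  sqlConnectionString="Data Source=sqlserver;Integrated Security=SSPI"
                  timeout="20" />
    -->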
Conclusion
It really depends on the profile of your application, but if you're expecting 10,000 active sessions, don't use InProc, don't use IIS 6, and don't use a 32-bit server.
We have a fairly popular site that gets around 4 million users a month. It is hosted on a dedicated box with 16 GB of RAM and two processors with a total of 24 cores.
At any given time the CPU is always under 40% and memory usage under 12 GB, but at the highest traffic we see very poor performance: the site is very, very slow. We have two app pools, one for our main site and one for our forum; only the site is slow. We don't have any CPU or memory restrictions per app pool.
I have looked at the performance counters and saw something very interesting: at our peak time, for some reason, requests are being queued. Overall context-switching numbers are very high, around 30,000-110,000.
As I understand it, high context switching is caused by locks. Can anyone give me example code that would cause a high number of context switches?
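For illustration, a contrived sketch of the pattern: many threads serializing on one hot lock spend most of their time blocking and waking, which shows up as a high context-switches/sec counter in perfmon.

    using System;
    using System.Threading;

    class LockContentionDemo
    {
        static readonly object Gate = new object();
        static long counter;

        static void Main()
        {
            // 100 threads all fighting over a single lock: each does almost no
            // work per acquisition, so the scheduler constantly parks and wakes
            // threads, driving up the context-switch rate.
            for (int i = 0; i < 100; i++)
            {
                new Thread(() =>
                {
                    while (true)
                    {
                        lock (Gate)
                        {
                            counter++;       // trivially short critical section
                            Thread.Sleep(1); // hold the lock long enough to pile the others up
                        }
                    }
                }) { IsBackground = true }.Start();
            }
            Console.ReadLine();
        }
    }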
I am not too concerned with the context switching, and I don't think the numbers are huge: you have a lot of threads running in IIS (since it's a 24-core machine), so higher context-switching numbers are expected. However, I am definitely concerned with the request queuing.
I would do several things and see how it affects your performance counters:
Your server's CPU is evidently under-utilized, since you run below 40% all the time. You can try setting a higher value for "Threads per processor limit" in IIS until you get to 50-60% utilization. An optimal value of threads per core by the books is 20, but it depends on the scenario and you can experiment with higher or lower values; I would recommend trying a value >= 30. Low CPU utilization can also be a sign of blocking IO operations.
Adjust the "Queue Length" settings in IIS properties. If you have configured the "Threads per processor limit" to be 20, then you should configure the Queue Length to be 20 x 24 cores = 480. Again, if the requests are getting Queued, that can be a sign that all your threads are blocked serving other requests or blocked waiting for an IO response.
Don't serve your static files from IIS. Move them to a CDN, Amazon S3, or whatever else; this will significantly improve your server's performance, because thousands of requests will go somewhere else. If you MUST serve the files from IIS, then configure IIS file compression. In addition, use expire headers for your static content so it gets cached on the client, which will save a lot of bandwidth.
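If the files must stay on IIS, the expire headers can be set in web.config (a sketch; the max-age value is just an example):

    <!-- web.config (sketch): far-future expires for static content -->
    <system.webServer>
      <staticContent>
        <clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="7.00:00:00" />
      </staticContent>
    </system.webServer>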
Use async IO wherever possible (reading/writing from disk, DB, network, etc.) in your ASP.NET controllers, handlers, and so on, to make sure you are using your threads optimally. Blocking the available threads with synchronous IO (which is done in 95% of the ASP.NET apps I have seen in my life) can easily cause the thread pool to be fully utilized under heavy load, and queuing will occur.
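As a sketch of what that looks like in an MVC action (assumes .NET 4.5+ and MVC 4+; the controller name, query, and connection string are placeholders):

    using System.Data.SqlClient;
    using System.Threading.Tasks;
    using System.Web.Mvc;

    public class ReportsController : Controller
    {
        // The request thread returns to the pool while the database call is in
        // flight, so it can serve other requests instead of sitting blocked.
        public async Task<ActionResult> Index()
        {
            using (var conn = new SqlConnection("...")) // placeholder connection string
            {
                await conn.OpenAsync();
                var cmd = new SqlCommand("SELECT COUNT(*) FROM Orders", conn); // hypothetical query
                var count = (int)await cmd.ExecuteScalarAsync();
                return View(count);
            }
        }
    }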
Do a general optimization pass to reduce the number of requests that hit your server and the processing time of individual requests. This can include minification and bundling of your CSS/JS files, refactoring your JavaScript to make fewer round trips to the server, refactoring your controller/handler methods to be faster, etc. I have added links below to the Google and Yahoo recommendations.
Disable ASP.NET debugging in IIS.
Google and Yahoo recommendations:
https://developers.google.com/speed/docs/insights/rules
https://developer.yahoo.com/performance/rules.html
If you follow all this advice, I am sure you will see some improvements!