We're running a custom application on our intranet and we have found a problem after upgrading it recently where IIS hangs with 100% CPU usage, requiring a reset.
Rather than subject users to the hangs, we've rolled back to the previous release while we determine a solution. The first step is to reproduce the problem -- but we can't.
Here's some background:
Prod has a single virtualized (vmware) web server with two CPUs and 2 GB of RAM. The database server has 4GB, and 2 CPUs as well. It's also on VMWare, but separate physical hardware.
During normal usage the application runs fine. The w3wp.exe process normally uses betwen 5-20% CPU and around 200MB of RAM. CPU and RAM fluctuate slightly under normal use, but nothing unusual.
However, when we start running into problems, the RAM climbs dramatically and the CPU pegs at 98% (or as much as it can get). The site becomes unresponsive, necessitating a IIS restart. Resetting the app pool does nothing in this situation, a full IIS restart is required.
It does not happen during the night (no usage). It happens more when the site is under load, but it has also happened under non-peak periods.
First step to solving this problem is reproducing it. To simulate the load, we starting using JMeter to simulate usage. Our load script is based on actual usage around the time of the crash. Using JMeter, we can ramp the usage up quite high (2-3 times the load during the crash) but the site behaves fine. CPU is up high, and the site does become sluggish, but memory usage is reasonable and nothing is hanging.
Does anyone have any tips on how to reproduce a problem like this in a non-production environment? We'd really like to reproduce the error, determine a solution, then test again to make sure we've resolved it. During the process we've found a number of small things that we've improved that might solve the problem, but I'd really feel a lot more confident if we could reproduce the problem and test the improved version.
Any tools, techniques or theories much appreciated!
You can find some information about troubleshooting this kind of problem at this blog entry. Her blog is generally a good debugging resource.
I have an article about debugging ASP.NET in production which may provide some pointers.
Is your test env the same really as live?
i.e
2 separate vm instances on 2 physical servers - with the network connection and account types?
Is there any other instances on the Database?
Is there any other web applications in IIS?
Is the .Net Config right?
Is the App Pool Config right for service accounts ?
Try look at this - MS Article on II6 Optmising for Performance
Lots of tricks.
Related
I have a website application running in it's own application pool on IIS 7.0. The application is an ASP.NET MVC 3 website.
I have noticed the memory usage for this applications corresponding w3wp IIS worker service is quite high ( 800 MB, with some fluctuation ).
I am trying to diagnose the problem and have tried the following:
I have disabled output page caching for the website at IIS level and then recycled the application pool. This causes the w3wp process to restart. The memory usage for this process then slowly creeps up to around 800 MB, it takes around 30 seconds to do so. There are no page requests being handled at this time. When I restart the website from IIS the memory size of the process does not alter.
I have tried running a debug copy of the application from VS 2010, there are no problems with memory usage.
Some ideas I have/questions are:
Is this problem related to the websites code? - Given that the memory rockets before any page requests have been sent/handled, I would assume this is NOT a code problem?
The application built in MVC has no handling of caching written into it.
The website uses real-time displaying of data, it uses ajax requests periodically, and is generally left 'open' for long periods of time.
Why does the memory usage rocket up after the application is recycled and no user requests are being sent? Is this because it is loading old cache information into it's memory from disk?
The application does NOT crash, I'm just concerned about the memory usage, it is not that big of a website...
Any ideas/help with getting to the bottom of this problem would be appreciated.
The best thing to do if you can afford to use a debugger is install the Windows Debugging Tools and use something like WinDbg and SOS.dll to figure out exactly what is it in memory.
once you've installed the tools then you can:
Launch Windbg.exe running elevated (as Administrator)
Use "File->Attach To Process" and choose w3wp.exe for the app you are trying to figure out. If you have many you can use Task Manager and add the command-line column to see the PID or use IIS Manager->Worker Processes to figure it out, and then choose that process in WinDBG.
run:
.loadby sos clr
!dumpheap -stat
At that point you should be able to see all types sorted by the most memory consumption so you can start with the ones at the bottom. (I would recommend exclude Strings, and Object since those are usually a side-effect and not the cause).
Use "!dumpheap -type type-here" to find the instances and use !gcroot for those to figure out why they are in memory, maybe due to a static field, or an event handler leaked, WCF channels not disposed, or things like that are common sources.
I just looked my server and my pools use 900-1000 MB Virtual size Memory, and 380 MB Working set. My sites run smooth with out problem for some years now, and I have checked the sites from all sides. My pool never recycles and the server runs until the next update continuously with 40% stable free physical memory.
If your memory is not continuously growing, then this memory is the code plus the data that you set as static, const, the string, and the possible cache, inside your application.
You can use process explorer to see the working and the virtual size memory.
You can also think to run a profile against your code to see if you have any "memory leak" or other issue. Find one from google: https://www.google.com/search?hl=en&q=asp.net+memory+profiler.
It probably doesn't apply here but thought I would throw it in for good measure. Recently I had a problem where my memory would go right up and max out when it really could of cleaned up 80% of it. Problem: It thought it about 2 more gig than it actually did so the GC was quite lazy. (It was due to a VM ware bug -windows was reporting 8 Gig but physically there was only 6.4). See blog.http://www.worthalook.net/2014/01/give-back-memory/
Something that might help: if you "rewrite" (open/save) the web.config , then your application will reset, you should monitor the memory usage from that point. If it keeps growing during usage, this could mean memory leak or insane caching. You might be able to identify which actions on your site lead to memory increase. During a long time the memory usage of an application should be stable.
All of my websites are hosted in IIS and configured with one application pool. This application pool consists 10 websites running.
It is working fine till today, but all of sudden I am observing that there is sudden up and down % in CPU usage. I am unable to trace out the problem.
Is there anyway to check which website is taking much load among all in the application pool?
Performance counters, task manager and native code analysis tools only tell part of the story. To gain a deeper understanding of what is happening inside your ASP.NET application you need to use WinDBG, SOS and ADPlus.
Tess Ferrandez has a great series of articles on tracking down what is to blame here:
.NET Debugging Demos Lab 4: High CPU hang
.NET Debugging Demos Lab 4: High CPU Hang - Review
This is a real world example:
High CPU in .NET app using a static Generic.Dictionary
You will probably want to separate your sites into individual application pools so you can identify and isolate the site that is causing the high CPU (but it already looks like you have a suspect so I'd isolate that one). From then you can follow Tess's advice and guidance to track down the cause.
You should also take a look at the logs to see if you're experiencing an unexpected spike or increase in traffic. Perhaps there's a badly behaved search engine site indexer nailing the site. If that's the case then maybe you need to (if you haven't already done so) create a robots.txt to prevent crawlers from indexing parts of the site that don't need to be indexed. On top of that if certain crawlers are being overly promiscious then just ban them. Perhaps consider a sitemap for google to tame and tune its activities.
If your server has reached it's max capacity, you will see CPU go up and down erratically because the GC will start trying to recover resources(cache..etc), which in turn causes your sites to work even harder. It's an endless cycle.
Have you been monitoring your performance counters? Do you have any idea what normal capacity is for your site? If you cannot answer these questions, I suggest you gather some perf numbers as soon as possible.
My rule of thumb is to always measure first, then make necessary changes.
Most of the time performance bottlenecks aren't where you think they would be.
There is really no performance counter way to tell, because the CPU counters are at the process level. Your best bet would be to do a time corelation with other events in the event log and .NET/ASP.NET counters for garbage collection, requests etc.
If you really want to go hardcore, you could use the SysInternals toolset to take snapshots of your app pool over time and then do a post-analysis to figure out what code was executed when the spike happened. Here is a related example from Mark Russinovich's blog - http://blogs.technet.com/b/markrussinovich/archive/2008/04/07/3031251.aspx.
I'd like to describe strange issue I've noticed while analyzing my asp.net application in production and ask for some advice or opinion on the following matter.
Application usually runs with some 80-90 MB of memory footprint. This seems stable since no memory leaks have been detected so far - no slight increase in memory usage over time. Yet, problem occurs when application pool recycles (I'm using shared hosting and judging by logs it occurs either when app is idle for 20 mins or every ~30 hours - something like that). The issue is that used memory almost doubles for some period on recycle - it goes to some 160-170 MBs without any explanation. This is confusing, since it is common claim that recycling should purge the memory and all other resources - at least I get it that way. System holds this amount of memory for some 7-8 hours and then memory usage drops to it's usual level of 90-100 MB, again, with no apparent reason (at least not know to me). All the time, application seems to work well - no significant delays or troubles with site availability - to the users everything seems OK, no complaints so far. When looking at the memory consumption over time graph - it looks almost like a step function.
The important thing is that I haven't been able to reproduce this sort of behavior in my testing environment. Occasionally, I've been getting notes from provider administrators that my app is using more resources than allowed and this really bugs me.
So, what I would like to know - is there any possible scenario where application pool recycling does not release all memory resources ? Is there any advice or guideline what I should focus on ? I'm not an expert in this area, but I've been reading about things like overlapping recycling, serialization problems on recycle and couple more issues ... Any ideas ? Similar experience ?
Thanks
This post provides a pretty good overview of what happens when your site's app pool is recycled: http://blogs.msdn.com/b/tess/archive/2006/08/02/asp-net-case-study-lost-session-variables-and-appdomain-recycles.aspx
My speculation is that your memory usage is increasing due to the JIT-compilation that follows every recycle of the app pool. My guess is that your shared host has different configuration and environmental settings than your development server.
IMHO, if you're using ~100 megs of memory on a shared host, you are asking for trouble if it's a host like DiscountASP.NET or GoDaddy. If you care at all about that website, go get a VPS or some more configurable hosting where you can pay a premium for a higher memory limit.
I work for a hosting company, providing ASP.Net 3.5 hosting. Honestly, we usually provide quite good uptime and velocity. However, we are having problems with one of our shared pools. As usual, we try to maximize the number of webs that can run into one pool.
Lately we are suffering continuous hangs. The process doesn't crash, but starts to show OutOfMemoryExceptions or stops processing requests. We think this is responsability of one of the applications (it would be great to know which one).
I have some memory dumps that I have processed with WinDbg. I've run f.e:
!dumpheap -stat
This method provide global memory usage of objects. Nothing remarkable... Also I've checked:
~*e!clrstack
I see various non managed threads. In those who are managed appears stacks like:
[HelperMethodFrame_1OBJ: 0f30e320]
System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle...
0f30e3ec 7928b3ff System.Threading.WaitHandle.WaitAny(System.Threading...
0f30e40c 7a55fc89 System.Net.TimerThread.ThreadProc()...
0f30e45c 792d6e46 System.Threading.ThreadHelper.ThreadStart_Context(System...
0f30e468 792f5781 System.Threading.ExecutionContext.runTryCode(System...
At least, I haven't seen exception throwing or similar (in that moment). I've also had access to two scripts written by Tess Ferrandez for calculating the number of sessions and size. Also here not promising results. Anything peculiar or remarkable (24000 bytes as average).
I would like to know what kind of strategies are you usually using facing this kind of problems. Have you ever used Microsoft Support?
Thanks a lot!
Very nice question, well a bad asp.net can hang all shared web apps on the same pool...
Ok let see... if the problem is on memory, get the VMMap from Sysinternals, and also the Process Explorer
Run them both, and from Process explorer find the PID number of pool that you wish to investigate, its under the inetinfo.exe, and have probably the name aspnet_wp.exe.
Now on the VMMap add for monitoring this Pool using for help the PID, and voila, you see the memory and the open images (aspx files) that probably are a lot and make the problems... The files that you going to see are located on temporary of asp.net Framework, but you can connect them and see from witch site they come from.
Well if the problem is not on memory, but the programmer have create bad loops, or even create thread sleeps, then I think process explorer is a way to investigate the pools and search for whats eating the power.
Additional
Maybe a pool recycle every 15minute can solve this issue ?
More about
In those videos there are a lot of informations about VMMap and memory manager.
Mysteries of Windows Memory Management, Part 1, and , Part 2
There are many tools, but it sounds like your main goal is to determine what's causing the problem. This can be done very simply with a binary search.
Break the pool in half, and see which one crashes. Repeat until you have a crashed pool with only one application in it.
This is already O(log2n), but you can speed the process up arbitrarily by dividing into more than two sub-pools.
An ASP.NET web app running on IIS6 periodically shoots the CPU up to 100%. It's the W3WP that's responsible for nearly all CPU usage during these episodes. The CPU stays pinned at 100% anywhere from a few minutes to over an hour.
This is on a staging server and the site is only getting very light traffic from testers at this point.
We've running ANTS profiler on the server, but it's been unenlightening.
Where can we start finding out what's causing these episodes and what code is keeping the CPU busy during all that time?
Standard Windows performance counters (look for other correlated activity, such as many GET requests, excessive network or disk I/O, etc); you can read them from code as well as from perfmon (to trigger data collection if CPU use exceeds a threshold, for example)
Custom performance counters (particularly to time for off-box requests and other calls where execution time is uncertain)
Load testing, using tools such as Visual Studio Team Test or WCAT
If you can test on or upgrade to IIS 7, you can configure Failed Request Tracing to generate a trace if requests take more a certain amount of time
Use logparser to see which requests arrived at the time of the CPU spike
Code reviews / walk-throughs (in particular, look for loops that may not terminate properly, such as if an error happens, as well as locks and potential threading issues, such as the use of statics)
CPU and memory profiling (can be difficult on a production system)
Process Explorer
Windows Resource Monitor
Detailed error logging
Custom trace logging, including execution time details (perhaps conditional, based on the CPU-use perf counter)
Are the errors happening when the AppPool recycles? If so, it could be a clue.
It's not much of an answer, but you might need to go old school and capture an image snapshot of the IIS process and debug it. You might also want to check out Tess Ferrandez's blog - she is a kick a** microsoft escalation engineer and her blog focuses on debugging windows ASP.NET, but the blog is relevant to windows debugging in general. If you select the ASP.NET tag (which is what I've linked to) then you'll see several items that are similar.
If your CPU is spiking to 100% and staying there, it's quite likely that you either have a deadlock scenario or an infinite loop. A profiler seems like a good choice for finding an infinite loop. Deadlocks are much more difficult to track down, however.
Process Explorer is an excellent tool for troubleshooting. You can try it for finding the problem of high CPU usage. It gives you an insight into the way your application works.
You can also try Procdump to dump the process and analyze what really happened on the CPU.
Also, look at your perfmon counters. They can tell you where a lot of that cpu time is being spent. Here's a link to the most common counters to use:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/852720c8-7589-49c3-a9d1-73fdfc9126f0.mspx?mfr=true
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/be425785-c1a4-432c-837c-a03345f3885e.mspx?mfr=true
We had this on a recursive query that was dumping tons of data to the output - have you double checked everything does exit and no infinite loops exist?
Might try to narrow it down with a single page - we found ANTS to not be much help in that same case either - what we ended up doing was running the site hit a page watch the CPU - hit the next page watch CPU - very methodical and time consuming but if you cant find it with some code tracing you might be out of luck -
We were able to use IIS log files to track it to a set of pages that were suspect -
Hope that helps !
This is a guess at best, but perhaps your development team is building and deploying the application in debug mode, in stead of release mode. This will cause the occurrence of .pdb files. The implication of this is that your application will take up additional resources to collect system state and debugging information during the execution of your system, causing more processor utilization.
So, it would be simple enough to ensure that they are building and deploying in release mode.
This is a very old post, I know, but this is also a common problem. All of the suggested methods are very nice but they will always point to a process, and there are many chances that we already know that our site is making problems, but we just want to know what specific page is spending too much time in processing.
The most precise and simple tool in my opinion is IIS itself.
Just click on your server in the left pane of IIS.
Click on 'Worker Processes' in the main pane. you already see what application pool is taking too much CPU.
Double click on this line (eventually refresh by clicking 'Show All') to see what pages consume too much CPU time ('Time elapsed'
column) in this pool
If you identify a page that takes time to load, use SharePoint's Developer Dashboard to see which component takes time.