Debugging ASP.NET shared pool techniques

I work for a hosting company providing ASP.NET 3.5 hosting. Honestly, we usually provide quite good uptime and performance. However, we are having problems with one of our shared pools. As usual, we try to maximize the number of sites that can run in one pool.
Lately we have been suffering continuous hangs. The process doesn't crash, but it starts throwing OutOfMemoryExceptions or stops processing requests. We think one of the applications is responsible (it would be great to know which one).
I have some memory dumps that I have processed with WinDbg. For example, I've run:
!dumpheap -stat
This command shows overall memory usage grouped by object type. Nothing remarkable there. I've also checked:
~*e!clrstack
I see various unmanaged threads. The managed ones show stacks like:
[HelperMethodFrame_1OBJ: 0f30e320]
System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle...
0f30e3ec 7928b3ff System.Threading.WaitHandle.WaitAny(System.Threading...
0f30e40c 7a55fc89 System.Net.TimerThread.ThreadProc()...
0f30e45c 792d6e46 System.Threading.ThreadHelper.ThreadStart_Context(System...
0f30e468 792f5781 System.Threading.ExecutionContext.runTryCode(System...
At least I haven't seen any exceptions being thrown at that moment. I've also had access to two scripts written by Tess Ferrandez for calculating the number and size of sessions. No promising results there either; nothing peculiar or remarkable (24,000 bytes on average).
I would like to know what kind of strategies you usually use when facing this kind of problem. Have you ever used Microsoft Support?
Thanks a lot!

Very nice question. A single badly behaved ASP.NET application can hang all the shared web apps in the same pool.
OK, let's see. If the problem is memory, get VMMap from Sysinternals, and also Process Explorer.
Run them both, and in Process Explorer find the PID of the pool you wish to investigate; it sits under inetinfo.exe and is probably named aspnet_wp.exe (on IIS 6 and later the worker process is w3wp.exe).
Now in VMMap, add that pool for monitoring using its PID, and voilà: you can see the memory and the open images (compiled .aspx assemblies), which are probably numerous and causing the problems. The files you see live under the framework's Temporary ASP.NET Files folder, but you can map them back and see which site they come from.
If the problem is not memory, but a programmer has created bad loops or even thread sleeps, then I think Process Explorer is the way to investigate the pools and find out what is eating the CPU.
Additional
Maybe a pool recycle every 15 minutes can work around this issue?
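If you do end up relying on recycling and the box runs IIS 7 or later, the schedule and memory limits can also be set from code with Microsoft.Web.Administration (on IIS 5/6 you would use the metabase or IIS manager instead). A minimal sketch, assuming a pool named "SharedPool" (the name is a placeholder), run with administrative rights:

// Requires a reference to Microsoft.Web.Administration.dll (IIS 7+).
using System;
using Microsoft.Web.Administration;

class RecycleConfig
{
    static void Main()
    {
        using (var manager = new ServerManager())
        {
            // "SharedPool" is a placeholder; use your actual pool name.
            ApplicationPool pool = manager.ApplicationPools["SharedPool"];
            if (pool == null) return;

            // Recycle on a fixed interval (every 15 minutes)...
            pool.Recycling.PeriodicRestart.Time = TimeSpan.FromMinutes(15);

            // ...and/or when private memory exceeds a threshold (value is in KB).
            pool.Recycling.PeriodicRestart.PrivateMemory = 500 * 1024;

            manager.CommitChanges();
        }
    }
}

Bear in mind this only hides the symptom; the memory-based threshold is usually preferable to a blind timer because healthy pools are left alone.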
More about
Those videos contain a lot of information about VMMap and the Windows memory manager.
Mysteries of Windows Memory Management, Part 1 and Part 2

There are many tools, but it sounds like your main goal is to determine what's causing the problem. This can be done very simply with a binary search.
Break the pool in half, and see which one crashes. Repeat until you have a crashed pool with only one application in it.
This is already O(log₂ n), but you can speed the process up arbitrarily by dividing into more than two sub-pools.

Related

ASP.DLL memory leak (or something) forcing constant restart of w3wp

We've been struggling with this for the past 12 or so months. We think it's due to either one or two apps that are leaking memory, or a large number of leaks that have finally accumulated over years of programming in classic ASP. We've begun the conversion to ASP.NET, but we still have a large number of apps in classic ASP.
We've tried changing how IIS restarts, depending on CPU and memory usage, and we've tried to clean up some processes. We've installed multiple analysis tools to try to track down exactly where it's coming from, to no avail.
Just today we were finally able to track down a more detailed error message: "Detected possible blocking or leaked critical section at asp!g Template cache+88 owned by thread 72 in W3WP". It also states that "ASP.DLL is currently holding a Critical Section Lock on ASP template cache manager...".
So, is there any tool that will help track where our leak is coming from? Or maybe a better way to restart this before it freezes our whole web process?
I appreciate your time!
Use a cache class (caching rendered HTML) for the most-viewed pages; see http://www.webdevbros.net/2006/11/18/cache-object-for-classic-asp/.
Also close all database connections at the end of every page.
These steps should help with the memory leak.

Memory problems in ASP.NET

I have memory problems in my ASP.NET application. The problem is that I can't see any issues when running it locally (between 100-200 MB), but on the production system I get 503 errors because the memory limit (512 MB) is being reached (it runs on shared hosting).
How can I pin down the problem? I don't think I have access to the current memory usage; at least I have not found any way to get it, and the company that hosts my site says there is none.
I have absolutely no experience tracking down memory leaks. :)
Thanks
Use a trial version of Red Gate's ANTS Memory Profiler:
http://www.red-gate.com/products/ants_memory_profiler/
or JetBrains dotTrace
http://www.jetbrains.com/profiler/
Both tools are very simple and easy to use and do a great job of identifying potential memory leaks, etc.
The most common sources of leaks are missed Dispose calls and poor management of event handlers. Depending on the size of your code base, you may be able to just "spot" the trouble spots, but I find using a tool speeds up the process greatly, as both will present before/after snapshots of the object graphs so you can see what is and is not being cleaned up by the GC.
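To make the event-handler case concrete, here is a minimal sketch with invented class names (PriceFeed and PriceLabel are not from the question): a long-lived publisher keeps every subscriber reachable until the handler is removed, which is exactly the kind of thing these profilers surface in the object graph.

using System;

// Hypothetical long-lived object, e.g. held in a static field or cache for the app's lifetime.
public class PriceFeed
{
    public event EventHandler PriceChanged;

    public void Raise()
    {
        var handler = PriceChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

// A short-lived object (per request, per page) that subscribes to the long-lived publisher.
public class PriceLabel : IDisposable
{
    private readonly PriceFeed _feed;

    public PriceLabel(PriceFeed feed)
    {
        _feed = feed;
        _feed.PriceChanged += OnPriceChanged;   // the feed now holds a reference to this instance
    }

    private void OnPriceChanged(object sender, EventArgs e) { /* refresh whatever this represents */ }

    public void Dispose()
    {
        // Without this line, every PriceLabel ever created stays reachable
        // through the feed's invocation list and is never collected.
        _feed.PriceChanged -= OnPriceChanged;
    }
}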
Good overview of memory management:
http://msdn.microsoft.com/en-us/library/ee817660.aspx
I don't know that this is completely answerable here, but here's a start for you... The other answers address specific memory issues, but first, you need to understand how memory is allocated and deallocated (reserved, used, and released) by the computer, by the .NET runtime, and in turn by your program.
Then you need to understand your code well enough to know which functions happen on a per-user basis, and look at how much memory they use. From there, you can get into your code and track down issues, but you need a firm understanding of the basics.
If I were you, I'd start with this article, and plan on spending some more time researching and learning. Hopefully, this article will not only answer questions, but give you enough knowledge to ask more specific/better questions. It's a good article, and I believe it will really help you, but it's not the whole kit and caboodle; there's quite a bit to learn.
http://msdn.microsoft.com/en-us/magazine/cc188781.aspx
The article is a bit old, and I'm assuming you're using more recent tools, so when you're done digesting that article, jump to http://msdn.microsoft.com/en-us/library/ms182372.aspx to learn about the Visual Studio Profiler.
This isn't necessarily an answer to your problem, per se, but more of a suggestion as to how to track things like this down.
One thing that I've found helps in tracking down these sorts of issues is to build some sort of instrumentation into your application. It could start as simply as keeping a cache of page request durations. This could be accomplished by creating a static cache class to hold either all requests (not recommended) or just long-running requests that you define (a safer approach), and having it triggered in the begin- and end-request events (an HTTP module would be ideal). You could then create a basic dashboard page that lists the contents of the cache to see potential trouble spots.
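A rough sketch of that idea, kept .NET 3.5-friendly; SlowRequestModule, the two-second threshold, and the dashboard-facing Snapshot method are invented names for illustration, and BeginRequest/EndRequest stand in for the begin/end events mentioned above.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Web;

// Register in web.config under <httpModules> (classic pipeline) or <modules> (integrated pipeline).
public class SlowRequestModule : IHttpModule
{
    // Keep only requests slower than this threshold, as suggested above.
    private static readonly TimeSpan Threshold = TimeSpan.FromSeconds(2);

    // Simple bounded, lock-protected store so the "cache" cannot itself leak memory.
    private static readonly object Sync = new object();
    private static readonly Queue<string> SlowRequests = new Queue<string>();
    private const int MaxEntries = 200;

    public void Init(HttpApplication app)
    {
        app.BeginRequest += (s, e) =>
            app.Context.Items["reqTimer"] = Stopwatch.StartNew();

        app.EndRequest += (s, e) =>
        {
            var sw = app.Context.Items["reqTimer"] as Stopwatch;
            if (sw == null) return;
            sw.Stop();
            if (sw.Elapsed < Threshold) return;

            string entry = string.Format("{0:u} {1} {2} ms",
                DateTime.UtcNow, app.Context.Request.RawUrl, sw.ElapsedMilliseconds);

            lock (Sync)
            {
                SlowRequests.Enqueue(entry);
                if (SlowRequests.Count > MaxEntries) SlowRequests.Dequeue();
            }
        };
    }

    // A dashboard page can call this to list recent slow requests.
    public static string[] Snapshot()
    {
        lock (Sync) { return SlowRequests.ToArray(); }
    }

    public void Dispose() { }
}

The dashboard page then just loops over SlowRequestModule.Snapshot() and renders the entries.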
First things first: a 503 is not only caused by memory. If your application crashes 5 times in 5 minutes, rapid-fail protection shuts the application pool down and you get a 503 Service Unavailable error.
A limit of around 500 MB seems pretty low to me, so memory could well be adding to your problem. If it is a 503 error, you need to troubleshoot the issue from a crash perspective. Link
If you are having memory issues, you will typically get OutOfMemory exceptions, in which case you should take multiple memory dumps of your process (w3wp.exe) and analyze them. Link has many posts on how to analyze memory dumps for leaks. Right now, it is too early for you to call it a memory leak.

W3WP.EXE using 100% CPU - where to start?

An ASP.NET web app running on IIS6 periodically shoots the CPU up to 100%. It's the W3WP that's responsible for nearly all CPU usage during these episodes. The CPU stays pinned at 100% anywhere from a few minutes to over an hour.
This is on a staging server and the site is only getting very light traffic from testers at this point.
We've run ANTS Profiler on the server, but it's been unenlightening.
Where can we start finding out what's causing these episodes and what code is keeping the CPU busy during all that time?
Standard Windows performance counters (look for other correlated activity, such as many GET requests, excessive network or disk I/O, etc.); you can read them from code as well as from perfmon (to trigger data collection if CPU use exceeds a threshold, for example); a sketch of reading them from code follows this list
Custom performance counters (particularly to time for off-box requests and other calls where execution time is uncertain)
Load testing, using tools such as Visual Studio Team Test or WCAT
If you can test on or upgrade to IIS 7, you can configure Failed Request Tracing to generate a trace if requests take more than a certain amount of time
Use logparser to see which requests arrived at the time of the CPU spike
Code reviews / walk-throughs (in particular, look for loops that may not terminate properly, such as if an error happens, as well as locks and potential threading issues, such as the use of statics)
CPU and memory profiling (can be difficult on a production system)
Process Explorer
Windows Resource Monitor
Detailed error logging
Custom trace logging, including execution time details (perhaps conditional, based on the CPU-use perf counter)
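For the first point in this list, here is a hedged sketch of reading a couple of standard counters from code; the category and counter names are the usual defaults on an English install, and the 90% threshold is an arbitrary example of the trigger idea.

using System;
using System.Diagnostics;
using System.Threading;

class CounterProbe
{
    static void Main()
    {
        // Total CPU and the ASP.NET request queue; names assume the default English counter sets.
        using (var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total"))
        using (var queued = new PerformanceCounter("ASP.NET", "Requests Queued"))
        {
            cpu.NextValue();            // the first read of a rate counter is always 0
            Thread.Sleep(1000);         // sample over one second

            float cpuPct = cpu.NextValue();
            float queueLen = queued.NextValue();

            Console.WriteLine("CPU: {0:F1}%  Requests Queued: {1}", cpuPct, queueLen);

            // Example trigger: start extra data collection when CPU stays high.
            if (cpuPct > 90)
                Console.WriteLine("CPU above threshold - start detailed tracing or take a dump here.");
        }
    }
}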
Are the errors happening when the AppPool recycles? If so, it could be a clue.
It's not much of an answer, but you might need to go old school and capture an image snapshot of the IIS process and debug it. You might also want to check out Tess Ferrandez's blog - she is a kick-a** Microsoft escalation engineer and her blog focuses on debugging ASP.NET on Windows, but it is relevant to Windows debugging in general. If you select the ASP.NET tag (which is what I've linked to) then you'll see several items that are similar.
If your CPU is spiking to 100% and staying there, it's quite likely that you either have a deadlock scenario or an infinite loop. A profiler seems like a good choice for finding an infinite loop. Deadlocks are much more difficult to track down, however.
Process Explorer is an excellent troubleshooting tool. You can use it to find the cause of high CPU usage; it gives you an insight into the way your application works.
You can also try ProcDump to dump the process and analyze what was really happening on the CPU.
Also, look at your perfmon counters. They can tell you where a lot of that CPU time is being spent. Here are links to the most common counters to use:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/852720c8-7589-49c3-a9d1-73fdfc9126f0.mspx?mfr=true
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/be425785-c1a4-432c-837c-a03345f3885e.mspx?mfr=true
We had this happen with a recursive query that was dumping tons of data to the output. Have you double-checked that everything exits and that no infinite loops exist?
You might try to narrow it down to a single page. We found ANTS not to be much help in that same case either. What we ended up doing was running the site, hitting a page, watching the CPU, hitting the next page, watching the CPU again: very methodical and time-consuming, but if you can't find it with some code tracing you might be out of luck.
We were able to use IIS log files to track it down to a set of suspect pages.
Hope that helps!
This is a guess at best, but perhaps your development team is building and deploying the application in debug mode instead of release mode. Besides producing .pdb files, this disables compiler optimizations and makes the application spend extra resources collecting state and debugging information while it runs, causing higher processor utilization.
So it would be simple enough to ensure that they are building and deploying in release mode.
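If you want to verify rather than assume, the deployed configuration can also be checked at run time. A small sketch (the Trace warning is just an example sink; any logging you already have will do):

using System.Web.Configuration;

public static class DebugModeCheck
{
    // Call this once at startup, e.g. from Application_Start in Global.asax.
    public static void WarnIfDebug()
    {
        var compilation =
            (CompilationSection)WebConfigurationManager.GetSection("system.web/compilation");

        if (compilation != null && compilation.Debug)
        {
            // The site is running with <compilation debug="true">.
            System.Diagnostics.Trace.TraceWarning(
                "Debug compilation is enabled - expect higher CPU and memory use.");
        }
    }
}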
This is a very old post, I know, but it is also a common problem. All of the suggested methods are very nice, but they will always point to a process, and chances are we already know that our site is causing problems; we just want to know which specific page is spending too much time in processing.
The most precise and simple tool in my opinion is IIS itself.
Just click on your server in the left pane of IIS.
Click on 'Worker Processes' in the main pane. You can already see which application pool is taking too much CPU.
Double-click that line (refresh by clicking 'Show All' if necessary) to see which pages consume too much CPU time (the 'Time elapsed' column) in this pool.
If you identify a page that takes time to load, use SharePoint's Developer Dashboard to see which component takes time.

How do you profile a production ASP.NET application?

We have some performance problems with one of our applications. I thought about using something like dotTrace to find out where the problems are, but dotTrace would probably degrade performance even more.
What's the best way to profile an application in a production environment without affecting performance too much?
The general answer is "don't do it".
Other than that, you can gain a lot by using performance counters. If the built-in counters don't help, you can create your own.
Among other things, the performance counters may give you an idea of how to reproduce the performance problems through load testing.
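As an illustration of creating your own, here is a minimal sketch; the "MyApp" category and "Orders/sec" counter are made-up names, and category creation needs administrative rights, so it is normally done once by an installer or admin script rather than by the web app itself.

using System.Diagnostics;

public static class AppCounters
{
    private const string Category = "MyApp";            // hypothetical category name
    private const string OrdersCounter = "Orders/sec";  // hypothetical counter name

    private static PerformanceCounter _orders;

    // One-time setup, run with elevated rights (installer/admin script), not per request.
    public static void EnsureCategory()
    {
        if (PerformanceCounterCategory.Exists(Category)) return;

        var counters = new CounterCreationDataCollection();
        counters.Add(new CounterCreationData(
            OrdersCounter,
            "Orders processed per second",
            PerformanceCounterType.RateOfCountsPerSecond32));

        PerformanceCounterCategory.Create(
            Category, "Counters for MyApp",
            PerformanceCounterCategoryType.SingleInstance, counters);
    }

    // Call this from the code path you want to measure.
    public static void OrderProcessed()
    {
        if (_orders == null)
            _orders = new PerformanceCounter(Category, OrdersCounter, false); // false = writable

        _orders.Increment();
    }
}

Once the counter exists you can watch it in perfmon next to the built-in CPU and ASP.NET counters, which is exactly what makes load-test comparisons easy.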
The next idea is to narrow down the area you're interested in. There's no sense impacting performance for the entire application if it turns out to be your web service access that's slow.
Next, be sure to have instrumented your application, preferably by using configuration. The Enterprise Library Logging Application Block is great for that, as it allows you to add the logging to your application, but have it configured off. Then, you can configure what kind of information to log, and where to log it to.
This gives you choices about how expensive the logging should be, from logging to the event log to logging to an XML file. And you can decide all of this at runtime.
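A call site might look roughly like this, assuming the Enterprise Library Logging assemblies are referenced and a "Performance" category is defined in configuration (both are assumptions of this sketch); when logging is configured off, the guard keeps the call cheap.

using System.Diagnostics;
using Microsoft.Practices.EnterpriseLibrary.Logging;

public class OrderService   // hypothetical class, purely for illustration
{
    public void ProcessOrder(int orderId)
    {
        Stopwatch timer = Stopwatch.StartNew();

        // ... the actual work being instrumented ...

        timer.Stop();

        // Cheap when logging is configured off; where the "Performance" category goes
        // (event log, flat file, nowhere) is decided entirely by configuration.
        if (Logger.IsLoggingEnabled())
        {
            Logger.Write(
                "ProcessOrder " + orderId + " took " + timer.ElapsedMilliseconds + " ms",
                "Performance");
        }
    }
}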
Finally, you're not going to be able to use dotTrace or anything else that requires restarting IIS and adding code to your running application. Not in production. The ideas above are there so that you don't need to.
Profiling memory or CPU?
Memory: the best way would be to create a memory dump of the w3wp process (launch Task Manager, right-click the process, then "Create dump"), then copy the dump to your local machine and analyse it with WinDbg, and look at which classes consume the most memory. There are lots of questions/answers here on Stack Overflow on how to do that (how to use WinDbg and analyse the .NET heap).
CPU: we use a short command-line profiler by Sam Saffron (woohoo, one of the creators of Stack Overflow!). His project is abandoned, but we forked it and maintain it here: https://github.com/jitbit/cpu-analyzer. Everyone's welcome to contribute. It attaches to your process's threads using Microsoft's managed debugging engine and finds the call stacks that take the longest time to execute.
Did you load-test the application with a number of sessions anywhere near the actual load of the production environment?
The first thing that comes to mind is that your app is not scaling well under load, or that your database is not scaling well with an increase in size (causing problems even with a very limited number of concurrent sessions), but it could be anything, really.
My suggestion is to replicate the production environment and run proper load testing, then look at the data; it will give you some clues.
You don't want to play games with your production environment, but if you don't have it already, you could use logging to keep track of the sequence and duration of key events and take it from there.
You could use ANTS Profiler:
http://www.red-gate.com/products/ants_performance_profiler/index2.htm
They claim that "the overhead was hardly noticeable".
There is a 14 day free trial so you could give it a try.
Edit: I agree with John's comment; it will be disruptive and require some downtime to start and stop it. It's best to use it in a test environment to identify the bottlenecks.
You can use ANTS Profiler as well as the system's performance counters. They will help you determine what the problem is.
Here are some details about performance counters:
http://msdn.microsoft.com/en-us/library/fxk122b4.aspx
http://msdn.microsoft.com/en-us/library/ms979204.aspx
http://www.codeproject.com/KB/dotnet/perfcounter.aspx
I would recommend taking several memory dumps of the process in production, looking at all the stack traces, and seeing if you can find a pattern.

Replicating load related crashes in non-production environments

We're running a custom application on our intranet and we have found a problem after upgrading it recently where IIS hangs with 100% CPU usage, requiring a reset.
Rather than subject users to the hangs, we've rolled back to the previous release while we determine a solution. The first step is to reproduce the problem -- but we can't.
Here's some background:
Prod has a single virtualized (VMware) web server with two CPUs and 2 GB of RAM. The database server has 4 GB and two CPUs as well. It's also on VMware, but on separate physical hardware.
During normal usage the application runs fine. The w3wp.exe process normally uses between 5-20% CPU and around 200 MB of RAM. CPU and RAM fluctuate slightly under normal use, but nothing unusual.
However, when we start running into problems, the RAM climbs dramatically and the CPU pegs at 98% (or as much as it can get). The site becomes unresponsive, necessitating an IIS restart. Resetting the app pool does nothing in this situation; a full IIS restart is required.
It does not happen during the night (no usage). It happens more when the site is under load, but it has also happened during non-peak periods.
The first step to solving this problem is reproducing it. To simulate the load, we started using JMeter to simulate usage. Our load script is based on actual usage around the time of the crash. Using JMeter, we can ramp the usage up quite high (2-3 times the load during the crash) but the site behaves fine. CPU is up high and the site does become sluggish, but memory usage is reasonable and nothing is hanging.
Does anyone have any tips on how to reproduce a problem like this in a non-production environment? We'd really like to reproduce the error, determine a solution, then test again to make sure we've resolved it. During the process we've found a number of small things that we've improved that might solve the problem, but I'd really feel a lot more confident if we could reproduce the problem and test the improved version.
Any tools, techniques or theories much appreciated!
You can find some information about troubleshooting this kind of problem at this blog entry. Her blog is generally a good debugging resource.
I have an article about debugging ASP.NET in production which may provide some pointers.
Is your test environment really the same as live?
i.e. two separate VM instances on two physical servers, with the same network connection and account types?
Are there any other instances on the database server?
Are there any other web applications in IIS?
Is the .NET config right?
Is the app pool config right for the service accounts?
Try looking at this: MS article on IIS 6 optimising for performance.
Lots of tricks.
