I manage an ASP.NET MVC 3 website with multiple online transactions. On the website, customers can place orders and pay bills, while vendors can bill customers. All of this can happen simultaneously, so I use semaphores to ensure thread safety.
What I have noticed is that about once a week, the website stalls for ten minutes. My first thought was deadlocks in the semaphores, but after setting up a semaphore log and analysing the results, there seem to be no deadlocks. Also, the website comes back by itself after ten minutes.
While investigating, I noticed that the entire website becomes unresponsive, not just the parts using the semaphores. They all use the database, though. That is why my primary suspect is the database.
What is stranger is that every time, the website freezes for ten minutes, almost to the second. Could SQL Server have scheduled maintenance or anything else that could explain this delay? If not, do you have any idea what could cause this?
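With a semaphore log like the one described, one way to rule the semaphores in or out is to check whether any semaphore was acquired and never released in the window around a freeze. A minimal Python sketch, assuming a hypothetical log format of (thread_id, semaphore_name, action) entries; the function name is illustrative only:

```python
from collections import Counter


def outstanding_acquisitions(events):
    """Count acquisitions with no matching release, per semaphore.

    `events` is a hypothetical log format: an iterable of
    (thread_id, semaphore_name, action) tuples in logged order, where
    action is "acquire" or "release". A semaphore may be released by a
    different thread than the one that acquired it, so the balance is
    kept per semaphore rather than per thread.
    """
    balance = Counter()
    for _thread_id, semaphore, action in events:
        if action == "acquire":
            balance[semaphore] += 1
        elif action == "release":
            balance[semaphore] -= 1
        else:
            raise ValueError(f"unknown action: {action!r}")
    # Anything still positive was taken and never given back.
    return {name: n for name, n in balance.items() if n > 0}
```

Feeding it the entries logged during one ten-minute stall and getting an empty result would support the conclusion that the semaphores are not what is blocking.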
The answer to your question seems to be "yes". Something is happening in the environment that is locking things up.
Have you run sp_who2 to see what is running when it stalls?
If that is inconvenient, then set up a job to dump sp_who2 output into a table every five minutes. When it stalls, you can see what is running and work from there.
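Once the snapshots are in a table, the first thing to look for in the rows captured during a stall is the head of any blocking chain: a session that blocks others but is not itself blocked. A rough Python sketch, assuming the rows of one snapshot have been exported as dictionaries keyed by sp_who2's SPID and BlkBy columns (the export step and the function name are hypothetical):

```python
def head_blockers(rows):
    """Return SPIDs that block other sessions but are not blocked themselves.

    `rows` is one sp_who2 snapshot as a list of dicts with (at least) the
    "SPID" and "BlkBy" columns as strings. sp_who2 shows "  ." in BlkBy
    for a session that is not blocked.
    """
    blocked_by = {}
    for row in rows:
        spid = int(row["SPID"].strip())
        blk = row["BlkBy"].strip()
        blocked_by[spid] = int(blk) if blk.isdigit() else None
    blockers = {b for b in blocked_by.values() if b is not None}
    # A head blocker is unblocked itself (or missing from the snapshot).
    return sorted(b for b in blockers if blocked_by.get(b) is None)
```

The SPIDs it returns are the ones whose Command, DBName and LastBatch values in the same snapshot are worth reading first.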
I have faced what may be a similar problem, where the master database seems to get locked up. As a consequence, renaming databases does not work. Fortunately, this is not a live transaction environment, so waiting five minutes and trying again does the trick.
ASP.NET hangs can happen for a variety of reasons. When the problem is in SQL Server, you typically get connection or command timeouts, not hangs.
You're much better off doing the following:
Grab the Debugging Tools for Windows.
Use AdPlus to grab a memory dump (adplus -hang -pn processname.exe), or use DebugDiag and set up a dump rule.
Use WinDbg (or VS 2010 for the 4.0 framework), after you set up a symbol cache, and start examining what's happening with commands such as !threads or !dumpheap -stat to inspect the threads and the heap objects.
Please note that debugging production issues is very hard and WinDbg is not a friendly tool, but guessing and looking at logs is even less so.
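Since a single dump rarely shows a leak by itself, one common approach is to take two dumps of the same process some time apart and compare their !dumpheap -stat statistics. A small Python sketch that diffs two pasted outputs, assuming the usual MT / Count / TotalSize / Class Name columns; rows with the same class name (for example the same type loaded in several app domains) are summed, and the function names are hypothetical:

```python
import re

# MT (hex address), Count, TotalSize, Class Name; separators tolerated just in case.
ROW = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+([\d,]+)\s+([\d,]+)\s+(.+?)\s*$")


def parse_heap_stat(text):
    """Map class name -> (count, total_size) from pasted !dumpheap -stat output."""
    stats = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # "Statistics:", the column header, "Total ... objects", etc.
        _mt, count, total, name = m.groups()
        old_count, old_size = stats.get(name, (0, 0))
        stats[name] = (old_count + int(count.replace(",", "")),
                       old_size + int(total.replace(",", "")))
    return stats


def biggest_growth(before_text, after_text, n=5):
    """Return (bytes_grown, objects_grown, class_name) for the top n growers."""
    before = parse_heap_stat(before_text)
    after = parse_heap_stat(after_text)
    growth = []
    for name, (count, size) in after.items():
        old_count, old_size = before.get(name, (0, 0))
        if size > old_size:
            growth.append((size - old_size, count - old_count, name))
    growth.sort(reverse=True)
    return growth[:n]
```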
We've been struggling with this for the past 12 or so months. We think it's due to either one or two apps that are leaking memory, or a large number of leaks that have finally accumulated over years of programming in classic ASP. We've begun the conversion to ASP.NET, but we still have a large number of apps in classic ASP.
We've tried changing how IIS restarts depending on CPU and memory usage, and we've tried to clean up some processes. We've installed multiple analysis tools to track exactly where it's coming from, to no avail.
Just today we were finally able to track down a more detailed error message: "Detected possible blocking or leaked critical section at asp!g Template cache+88 owned by thread 72 in W3WP". It also states that "ASP.DLL is currently holding a Critical Section Lock on ASP template cache manager...".
So, is there any tool that will help track where our leak is coming from? Or maybe a better way to restart this before it freezes our whole web process?
I appreciate your time!
Use a cache class that stores the rendered HTML of the most-viewed pages; see http://www.webdevbros.net/2006/11/18/cache-object-for-classic-asp/.
Close all connections at the end of each page.
These will solve the memory leak.
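The two suggestions above can be sketched in a few lines. This is Python rather than classic ASP, so treat it only as an illustration of the idea: PageCache and render_home are hypothetical names, an in-memory SQLite database stands in for the site's real database, the cache keeps rendered HTML per URL for a fixed time, and contextlib.closing guarantees the connection is closed when the page finishes, even if an error occurs:

```python
import sqlite3
import time
from contextlib import closing


class PageCache:
    """Keeps rendered HTML per URL for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # url -> (expires_at, html)

    def get_or_render(self, url, render):
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no rendering, no database work
        html = render()      # missing or expired: render once and store
        self._entries[url] = (now + self.ttl, html)
        return html


def render_home():
    # closing() calls conn.close() when the block exits, even if the query raises.
    with closing(sqlite3.connect(":memory:")) as conn:
        (total,) = conn.execute("SELECT 1 + 1").fetchone()
    return f"<p>{total}</p>"
```

A page would then call cache.get_or_render("/", render_home), so the database is hit at most once per URL per TTL window.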
I work for a hosting company that provides ASP.NET 3.5 hosting. Honestly, we usually provide quite good uptime and speed. However, we are having problems with one of our shared pools. As usual, we try to maximize the number of sites that can run in one pool.
Lately we have been suffering continuous hangs. The process doesn't crash, but it starts throwing OutOfMemoryExceptions or stops processing requests. We think one of the applications is responsible (it would be great to know which one).
I have some memory dumps that I have analysed with WinDbg. For example, I've run:
!dumpheap -stat
This command shows the overall memory usage by object type. Nothing remarkable... I've also checked:
~*e!clrstack
I see various unmanaged threads. The managed ones show stacks like:
[HelperMethodFrame_1OBJ: 0f30e320]
System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle...
0f30e3ec 7928b3ff System.Threading.WaitHandle.WaitAny(System.Threading...
0f30e40c 7a55fc89 System.Net.TimerThread.ThreadProc()...
0f30e45c 792d6e46 System.Threading.ThreadHelper.ThreadStart_Context(System...
0f30e468 792f5781 System.Threading.ExecutionContext.runTryCode(System...
At least, I haven't seen any exceptions being thrown or anything similar at that moment. I've also had access to two scripts written by Tess Ferrandez for calculating the number and size of sessions. These results weren't promising either: nothing peculiar or remarkable (24000 bytes on average).
I would like to know what strategies you usually use when facing this kind of problem. Have you ever used Microsoft Support?
Thanks a lot!
Very nice question. A badly behaved ASP.NET app can hang all the web apps sharing the same pool...
OK, let's see... If the problem is memory, get VMMap from Sysinternals, and also Process Explorer.
Run them both, and in Process Explorer find the PID of the pool you wish to investigate. It's under inetinfo.exe and is probably named aspnet_wp.exe.
Now, in VMMap, start monitoring this pool by its PID, and voila: you see the memory and the loaded images (compiled aspx files), which are probably numerous and causing the problems. The files you will see are located in the ASP.NET Framework's temporary files folder, but you can map them back to see which site they come from.
If the problem is not memory, but the programmer has created bad loops or even thread sleeps, then I think Process Explorer is a good way to investigate the pools and find what's eating the CPU.
Additional
Maybe recycling the pool every 15 minutes could solve this issue?
More information
These videos contain a lot of information about VMMap and the memory manager:
Mysteries of Windows Memory Management, Part 1 and Part 2
There are many tools, but it sounds like your main goal is to determine what's causing the problem. This can be done very simply with a binary search.
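Break the pool in half and see which half crashes. Repeat until you have a crashed pool with only one application in it.
This already needs only O(log₂ n) rounds for n applications, and you can speed the process up further by dividing into more than two sub-pools: with k sub-pools per round it takes about log base k of n rounds, at the cost of running more pools at once.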
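The search can be sketched as follows, with ways=2 for the plain binary search and a larger value for the multi-pool variant. crashes() is a hypothetical stand-in for "move these apps into their own pool and wait for a hang"; the sketch assumes exactly one app is at fault and that the fault reproduces reliably:

```python
def find_culprit(apps, crashes, ways=2):
    """Split the suspects into `ways` pools per round until one app is left.

    Returns (culprit, number_of_rounds).
    """
    if ways < 2:
        raise ValueError("ways must be at least 2")
    suspects = list(apps)
    if not suspects:
        raise ValueError("no apps to search")
    rounds = 0
    while len(suspects) > 1:
        size = -(-len(suspects) // ways)  # ceiling division: at most `ways` groups
        groups = [suspects[i:i + size] for i in range(0, len(suspects), size)]
        rounds += 1
        # In practice the groups of one round run side by side in separate pools.
        bad = next((g for g in groups if crashes(g)), None)
        if bad is None:
            raise RuntimeError("no group reproduced the hang in this round")
        suspects = bad
    return suspects[0], rounds
```

With 16 sites, halving takes 4 rounds, while splitting four ways takes 2 rounds but needs four pools at a time.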
Some background info:
We have several websites running on a 64-bit machine with IIS6
These websites all have the same core code, but different skins and content
We have a SQL 2005 database which is fairly heavily used throughout the site
Historically we've used SQL stored procs, but have been gradually transitioning to NHibernate. The majority of our code uses NHibernate now, but not all.
These sites have been running fine on our live web server for a while, although we get a few errors a day regarding SQL connectivity / deadlocking.
Last Thursday we noticed the sites going very slowly, and checking Task Manager revealed one of the websites was hogging over 1.6 GB of memory. Ever since then, we've been restarting the app and watching it slowly grow over the course of the day.
We apparently have a memory leak (or at least, that's the effect), but I'm losing hair trying to work out how to trace it.
It only appears to be happening on this one website, even though, as far as I am aware, nothing had changed in the code before it started happening. It is, however, our busiest website, so it could be a traffic issue.
Debug Diagnostics hasn't revealed any issues.
Refreshing certain pages very quickly causes the memory to jump up rapidly, then fall slightly, but all the time the gradual progression is upwards.
I cannot replicate the issue on our test servers or locally, probably because traffic has something to do with it.
My suspicion is that the problem lies in database connectivity / locking. However, I'm not sure how that would cause the problem specified.
Any ideas?
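One way to separate the short spikes from rapid refreshing from the slow climb over the day is to fit a straight line through periodic memory readings taken since the last restart: a clearly positive slope sustained over a full day is what a leak looks like, whereas spikes that fall back have much less influence on the fit. A minimal Python sketch of an ordinary least-squares slope, assuming hypothetical (hours_since_restart, megabytes) samples read from Task Manager or a perfmon log:

```python
def trend_per_hour(samples):
    """Least-squares slope of memory use, in MB per hour.

    `samples` is a list of (hours_since_restart, private_mb) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need at least two distinct times")
    return cov / var
```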
Edit
Okay, so I'm not exactly sure I've found the problem, but we're getting closer. It's definitely SQL related. The error log reveals lots of errors since last Thursday.
It all started after we ran some Windows updates on our servers. One of the updates failed on the SQL server, so I'm not sure whether this caused some problems.
The warnings we're getting are:
SQL Server has encountered XX occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .. tempdb.mdf
Where XX is anything between 17 and 90! Does that sound like a deadlocking issue?
Followed by the following errors:
Unable to complete login process due to delay in opening server connection
These coincide with our log times for when the websites have been "blipping".
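Matching the two logs by timestamp can be automated. A rough Python sketch, assuming the SQL Server error log entries and the site's "blip" times have already been parsed into datetime values (the parsing is not shown, and the function name is hypothetical); it pulls the occurrence count out of each slow-I/O warning and keeps only those near a blip:

```python
import re
from datetime import datetime, timedelta

SLOW_IO = re.compile(
    r"encountered (\d+) occurrence\(s\) of I/O requests taking longer than 15 seconds"
)


def slow_io_near_blips(log_entries, blip_times, window=timedelta(minutes=5)):
    """Return (timestamp, occurrences) for slow-I/O warnings near any blip.

    `log_entries` is an iterable of (datetime, message) pairs and
    `blip_times` a list of datetimes when the sites stalled.
    """
    hits = []
    for ts, message in log_entries:
        m = SLOW_IO.search(message)
        if m and any(abs(ts - blip) <= window for blip in blip_times):
            hits.append((ts, int(m.group(1))))
    return hits
```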
We've increased the page file size on the SQL server to the recommended size, as it was set to a maximum of 4 GB while the recommended size was 12 GB. I think we may need to roll back the Windows updates we ran on Thursday if that doesn't fix it.
Unfortunately I can't get into Activity monitor as it tells me Timeout expired!
Edit
Okay, after a reboot I can get into Activity Monitor. How many sleeping processes would you say is normal? We have roughly 127 sleeping, serving over 10 websites.
If there is a deadlock or timeout issue, will NHibernate not clean up its connections properly?
Okay, so in the end it seems it's quite complex: SQL deadlocks and data problems, apparently made worse by anti-virus software that was locking up or choking on a file.
Turning off the anti-virus reduced the problems, but we still need to resolve the underlying data issues.
One of our web servers is suffering from random w3wp.exe crashes, and after a couple of weeks of debugging I simply cannot figure out why. The only thing that has helped so far is reducing the maximum number of worker processes from 15 to 5. However, this isn't ideal, as we are using a multi-CPU machine in the hope of reducing the total number of servers needed. We serve a large volume of small requests, so parallel processing is a requirement.
As far as I am aware all possible sources of parallel processing collision have been addressed using thread locking.
Windows 2008 64-bit SP2
IIS7
Dual 3.1 GHz Xeon
4 GB RAM
First error:
Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 70D9CECA (70D40000) with exit code 80131506.
Followed straight away by:
Faulting application w3wp.exe, version 7.0.6002.18005, time stamp 0x49e023cf, faulting module clr.dll, version 4.0.30319.1, time stamp 0x4ba1d9ef, exception code 0xc0000005, fault offset 0x0005ceca, process id 0x%9, application start time 0x%10.
Many Thanks
Edit
Problem eventually solved. It turned out SQL Server was unmounting the database straight after every query, so every new query had to wait for it to re-mount. Anyway, telling SQL Server not to do that seems to have solved it. I have no idea how, but it's working, so I'm happy.
Exception code 0xc0000005 generally points to a memory access violation. Look for any unsafe components you may be using.
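Of the two codes in the question, 0xc0000005 is an NTSTATUS value (STATUS_ACCESS_VIOLATION), while the exit code 80131506 is an HRESULT. Splitting the HRESULT into its fields shows facility 0x13, which is FACILITY_URT, the facility the .NET runtime uses for its own errors; that fits the "internal error in the .NET Runtime" wording. A small Python sketch of the standard HRESULT bit layout (failure bit 31, 11-bit facility in bits 16 to 26, 16-bit code); the function name is hypothetical:

```python
FACILITY_URT = 0x13  # facility code used by the .NET runtime (CLR)


def decode_hresult(value):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    value &= 0xFFFFFFFF               # treat negative ints as 32-bit values too
    is_failure = bool(value >> 31)    # severity bit
    facility = (value >> 16) & 0x7FF  # 11-bit facility field
    code = value & 0xFFFF             # facility-specific error code
    return is_failure, facility, code
```

For the exit code above, decode_hresult(0x80131506) gives (True, 0x13, 0x1506).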
You are in for a doozy of a ride. These exceptions are extremely tricky to track down and correct.
The first step is to get the IIS Debug Diagnostic Tool (v1.1). Once you have installed it, you'll need to set up some crash-tracking rules and then attach the debugger to your running processes. Keep in mind that this tool collects a LOT of data (it can be in excess of 1 GB), so combing through it may be a hassle, but it has a good chance of telling you which modules are causing the crash and which modules are interfering.
The reason w3wp.exe is crashing, though, is that an unhandleable exception is occurring during phases of the request after your code, health monitoring, etc. have already completed.
In my own case, I found that moving session state out of the process solved the problem. I never discovered the full reason, but our best guess was that the memory requirements for paging were too great for w3wp.exe to handle all at once. Once we moved to an external session state server, the problem went away.
It may be time to rethink your web garden. Scott Forsyth has an interesting 11-minute vlog on why web gardens are counterproductive: http://dotnetslackers.com/articles/iis/Why-You-Shouldnt-Use-Web-Gardens-in-IIS-Week-24.aspx
Links to articles he mentions in his VLog:
Tuning recommendations for IIS6 and IIS7 (read the whole article): http://support.microsoft.com/kb/821268
Further information: http://blogs.msdn.com/b/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx
His bottom line is that if you have performance problems that are resolved by web gardens, use web gardens as a great crutch until the underlying performance issue (usually resource contention) is resolved.
An ASP.NET web app running on IIS6 periodically shoots the CPU up to 100%. The w3wp.exe process is responsible for nearly all CPU usage during these episodes. The CPU stays pinned at 100% anywhere from a few minutes to over an hour.
This is on a staging server and the site is only getting very light traffic from testers at this point.
We've run ANTS Profiler on the server, but it's been unenlightening.
Where can we start finding out what's causing these episodes and what code is keeping the CPU busy during all that time?
Standard Windows performance counters (look for other correlated activity, such as many GET requests or excessive network or disk I/O); you can read them from code as well as from perfmon (for example, to trigger data collection if CPU use exceeds a threshold)
Custom performance counters (particularly to time off-box requests and other calls whose execution time is uncertain)
Load testing, using tools such as Visual Studio Team Test or WCAT
If you can test on or upgrade to IIS 7, you can configure Failed Request Tracing to generate a trace when requests take more than a certain amount of time
Use Log Parser to see which requests arrived at the time of the CPU spike
Code reviews / walk-throughs (in particular, look for loops that may not terminate properly, such as if an error happens, as well as locks and potential threading issues, such as the use of statics)
CPU and memory profiling (can be difficult on a production system)
Process Explorer
Windows Resource Monitor
Detailed error logging
Custom trace logging, including execution time details (perhaps conditional, based on the CPU-use perf counter)
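Several items in the list above (perf counters read from code, conditional trace logging, triggering data collection) come down to reacting only when the CPU stays high, not on a single spike. A minimal Python sketch of that trigger logic, assuming a stream of percentage readings from whatever source you use (for example typeperf sampling "\Processor(_Total)\% Processor Time"); the function name, threshold and run length are arbitrary assumptions:

```python
def sustained_spikes(samples, threshold=90.0, min_consecutive=3):
    """Yield the sample index at which CPU has been above `threshold`
    for `min_consecutive` readings in a row, once per episode."""
    run = 0
    for i, pct in enumerate(samples):
        if pct > threshold:
            run += 1
            if run == min_consecutive:
                yield i  # start collecting a dump or detailed trace here
        else:
            run = 0  # episode over; the next one must build up again
```

Each yielded index is the point where you would kick off a dump or turn on detailed logging for that episode.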
Are the errors happening when the AppPool recycles? If so, it could be a clue.
It's not much of an answer, but you might need to go old school, capture an image snapshot of the IIS process, and debug it. You might also want to check out Tess Ferrandez's blog: she is a kick a** Microsoft escalation engineer, and her blog focuses on debugging ASP.NET on Windows, though it is relevant to Windows debugging in general. If you select the ASP.NET tag (which is what I've linked to), you'll see several similar items.
If your CPU is spiking to 100% and staying there, it's quite likely that you either have a deadlock scenario or an infinite loop. A profiler seems like a good choice for finding an infinite loop. Deadlocks are much more difficult to track down, however.
Process Explorer is an excellent troubleshooting tool. You can use it to track down the source of high CPU usage, and it gives you insight into how your application works.
You can also use ProcDump to dump the process and analyze what was really happening on the CPU.
Also, look at your perfmon counters. They can tell you where a lot of that CPU time is being spent. Here are links to the most common counters to use:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/852720c8-7589-49c3-a9d1-73fdfc9126f0.mspx?mfr=true
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/be425785-c1a4-432c-837c-a03345f3885e.mspx?mfr=true
We had this with a recursive query that was dumping tons of data to the output. Have you double-checked that everything exits and that no infinite loops exist?
You might try to narrow it down to a single page. We found ANTS not to be much help in that same case either. What we ended up doing was running the site, hitting a page and watching the CPU, then hitting the next page and watching the CPU. It's very methodical and time-consuming, but if you can't find it with some code tracing, you might be out of luck.
We were able to use IIS log files to track it down to a set of suspect pages.
Hope that helps !
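The log-file approach can be scripted instead of eyeballed. A rough Python sketch, assuming W3C extended logs with the date, time, cs-uri-stem and time-taken fields enabled (time-taken is in milliseconds); IIS writes these timestamps in UTC, so convert the spike time before comparing. The function name is hypothetical:

```python
from datetime import datetime


def slow_requests(log_lines, start, end, top=10):
    """Return the `top` slowest (time_taken_ms, uri) requests logged in [start, end]."""
    fields = []
    rows = []
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the lines that follow
            continue
        if line.startswith("#") or not line.strip():
            continue  # other directives and blank lines
        rec = dict(zip(fields, line.split()))
        ts = datetime.strptime(rec["date"] + " " + rec["time"], "%Y-%m-%d %H:%M:%S")
        if start <= ts <= end:
            rows.append((int(rec["time-taken"]), rec["cs-uri-stem"]))
    rows.sort(reverse=True)
    return rows[:top]
```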
This is a guess at best, but perhaps your development team is building and deploying the application in debug mode instead of release mode. This produces .pdb files. The implication is that your application will use additional resources to collect system state and debugging information during execution, causing higher processor utilization.
So, it would be simple enough to ensure that they are building and deploying in release mode.
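A quick way to check a deployed site for debug leftovers is sketched below in Python. It looks for the .pdb files mentioned above and, as an additional check not in the original answer, for debug="true" on the compilation element in web.config, which is ASP.NET's own debug compilation switch; the function name is hypothetical:

```python
import pathlib
import xml.etree.ElementTree as ET


def debug_signs(site_root):
    """Return (pdb_file_names, compilation_debug_enabled) for a deployed site."""
    root = pathlib.Path(site_root)
    # .pdb files shipped anywhere under the site folder (e.g. in bin).
    pdbs = sorted(p.name for p in root.rglob("*.pdb"))
    debug_enabled = False
    config = root / "web.config"
    if config.is_file():
        for element in ET.parse(config).iter():
            # Compare local names so a namespaced <configuration> still matches.
            if element.tag.rsplit("}", 1)[-1] == "compilation":
                if element.get("debug", "false").strip().lower() == "true":
                    debug_enabled = True
    return pdbs, debug_enabled
```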
This is a very old post, I know, but this is also a common problem. All of the suggested methods are very nice, but they will always point to a process, and there's a good chance we already know that our site is causing problems; we just want to know which specific page is spending too much time processing.
The most precise and simple tool in my opinion is IIS itself.
Just click on your server in the left pane of IIS.
Click 'Worker Processes' in the main pane. You can already see which application pool is taking too much CPU.
Double-click this line (refreshing with 'Show All' if necessary) to see which pages in this pool consume too much CPU time (the 'Time elapsed' column).
If you identify a page that takes time to load, use SharePoint's Developer Dashboard to see which component takes time.