Troubleshooting Intermittent Failures on Web Applications (ASP.NET)

I've had reports of a web app going down twice in three weeks, and I need to do some root cause analysis. It works fine after a reboot. I'm not really an expert in this field.
It is hosted on IIS and Windows 2003.
There is nothing interesting in the event viewer, and IIS logs just show lots of successful GET operations. There is nothing interesting in SQL logs on the remote SQL server it connects to.
I'm not sure how to decipher the IIS log. It just looks like a bunch of successful GET messages with no errors.
I don't think I can get much further with root cause analysis on my own. How can I track down the cause of the issue?

The only thing you could try to get some real results is this excellent blog by Tess Ferrandez. I think that you will find crash lab very enlightening :)

Take a look at this, it might help you find the app shutdown cause.

Depending on your traffic, twice in three weeks doesn't sound like a lot. The root cause may relate to the fix: if you were able to bring it back up by restarting IIS, it could be a memory leak. If you had to restart the server, it could be a deeper problem.
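
For the question about deciphering the IIS log, here is a minimal sketch in Python, assuming the logs are in the W3C extended format (where a "#Fields:" line names the columns). It counts status codes and reports quiet periods with no logged requests, which may help pin down when the app stopped responding. The function names, the sample lines, and the 10-minute gap are assumptions for illustration; also keep in mind that W3C-format timestamps are normally UTC.

```python
from collections import Counter
from datetime import datetime, timedelta


def parse_w3c_log(lines):
    """Yield one dict per request, keyed by the latest '#Fields:' directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):  # skip truncated or malformed rows
                yield dict(zip(fields, values))


def summarize(entries, gap=timedelta(minutes=10)):
    """Return (status code counts, list of (last_seen, next_seen) quiet periods)."""
    status_counts = Counter()
    gaps = []
    previous = None
    for entry in entries:
        status_counts[entry.get("sc-status", "?")] += 1
        if "date" in entry and "time" in entry:
            stamp = datetime.strptime(
                entry["date"] + " " + entry["time"], "%Y-%m-%d %H:%M:%S"
            )
            if previous is not None and stamp - previous >= gap:
                gaps.append((previous, stamp))
            previous = stamp
    return status_counts, gaps


# Hypothetical sample; in practice, pass an open log file instead.
SAMPLE = """#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-method cs-uri-stem sc-status
2009-06-01 10:00:01 GET /default.aspx 200
2009-06-01 10:00:05 GET /default.aspx 200
2009-06-01 10:45:00 GET /default.aspx 500
""".splitlines()

counts, gaps = summarize(parse_w3c_log(SAMPLE))
print(dict(counts))
for start, end in gaps:
    print("no requests logged between", start, "and", end)
```

A log that shows only successful GETs can still be informative: if the app hung, the evidence may be the absence of entries rather than error rows, so the quiet periods are worth lining up against the reported outage times.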

Related

Debugging an ASP.NET website that is running slowly?

We're getting more and more complaints from users that our ASP.NET 4.5.2 website is running slowly or just generally "freezing up." Things look fine from our test servers and from our workstations, but we're probably using better workstation hardware and browsers than our customers. We're running ASP.NET 4.5.2, C#, SQL Server.
What are some areas that we should concentrate on for debugging such a nebulous request? Should I be looking at system performance and resources on the application servers? System performance and resources on the SQL server? We're tracking application page load times, and they don't seem to be excessive or much changed from months ago, even though customer complaints have gone up.
What are some best practices for starting our investigation, and where's the low hanging fruit on improving performance overall?
If your page only gets slower "sometimes" during the day, I would suggest first checking Performance Monitor on your IIS server. This could easily be an issue with the server hitting its limits (machine or IIS settings). One way of verifying this is to create a sandbox server and run your application from there for your testers.
After that, if you are executing stored procedures, add a monitoring function to them to gather some cases, and then check whether any of them causes the process to freeze or stall.
I must also mention the possibility of locked tables, so a code review may be in order (the most time-consuming of all the above).
This should give you a hint as to where your issue originates.
Good luck
If you suspect SQL problems, you can run SQL Server Profiler to check what is running at the moment and whether something could be "freezing up" your system. This way you can see what is going on while the system is slow.

How do I track down sporadic ASP.NET performance problems in a production environment?

I've had sporadic performance problems with my website for a while now. 90% of the time the site is very fast. But occasionally it is just really, really slow. I mean like 5-10 seconds load time kind of slow. I thought I had narrowed it down to the server I was on, so I migrated everything to a new dedicated server from a completely different web hosting company. But the problems continue.
I guess what I'm looking for is a good tool that'll help me track down the problem, because it's clearly not the hardware. I'd like to be able to log certain events in my ASP.NET code and have that same logger also track server performance/resources at the time. If I can then look back at the logs then I can see what exactly my website was doing at the time of extreme slowness.
Is there a .NET logging system that'll allow me to make calls into it with code while simultaneously tracking performance? What would you recommend?
Every intermittent performance problem I have ever had turned out to be caused by something in the database.
You need to check out my blog post Unexplained-SQL-Server-Timeouts-and-Intermittent-Blocking. No, it's not caused by a heavy INSERT or UPDATE process like you would expect.
I would run a database trace for 1/2 a day. Yes, the trace has to be done on production because the problem doesn't usually happen in a low use environment.
Your trace log rows will have a "Duration" column showing how long each event took. You are looking for the long-running ones, and for the events before them that might be holding them up. Once you find the pattern, you need to figure out how those statements interact.
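
The "long-running events and whatever was running before them" idea can be sketched roughly as follows in Python. This assumes the trace rows have been exported and loaded as dictionaries with a start time, a duration in milliseconds, and the statement text; the key names, the 5-second threshold, and the sample rows are all hypothetical, and the Duration units in a real trace depend on how it was captured and saved.

```python
from datetime import datetime, timedelta


def find_suspects(rows, slow_ms=5000):
    """Pair each slow event with the events that started earlier and were
    still running when the slow event began."""
    report = []
    for slow in rows:
        if slow["duration_ms"] < slow_ms:
            continue
        overlapping = [
            r for r in rows
            if r is not slow
            and r["start"] <= slow["start"]
            and r["start"] + timedelta(milliseconds=r["duration_ms"]) > slow["start"]
        ]
        report.append((slow, overlapping))
    return report


# Hypothetical exported trace rows.
ROWS = [
    {"start": datetime(2010, 1, 1, 12, 0, 0), "duration_ms": 8000, "text": "statement A"},
    {"start": datetime(2010, 1, 1, 12, 0, 2), "duration_ms": 6000, "text": "statement B"},
    {"start": datetime(2010, 1, 1, 12, 0, 20), "duration_ms": 15, "text": "statement C"},
]

for slow, overlapping in find_suspects(ROWS):
    names = [r["text"] for r in overlapping] or ["nothing"]
    print(slow["text"], "ran", slow["duration_ms"], "ms; overlapped by:", ", ".join(names))
```

An overlap only makes an earlier statement a candidate, not a proven blocker; it narrows down which statements to look at more closely.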
IIS 7.0 has built-in ETW tracing capability. ETW is the fastest logging mechanism with the least overhead, and it is built into the kernel. With respect to IIS, it can log every call. The best part of ETW is that you can include everything in the system and get a holistic picture of the application and the server. For example, you can include registry, file system, and context-switching events, and get call stacks along with durations.
Here is the basic overview of ETW and specific to IIS, and I also have a few posts on ETW.
I would start by monitoring ASP.NET related performance counters. You could even add your own counters to your application, if you wanted. Also, look to the number of w3wp.exe processes running at the time of the slow down vs normal. Look at their memory usage. Sounds to me like a memory leak that eventually results in a termination of the worker process, which of course fixes the problem, temporarily.
You don't provide specifics of what your application is doing in terms of the resources (database, networking, files) that it is using. In addition to the steps from the other posters, I would take a look at anything that is happening "out-of-process", such as:
Database connections
Files opened
Network shares accessed
...basically anything that is not happening in the ASP.NET process.
I would start off with the following list of items:
Turn on ASP.NET Health Monitoring to start getting some metrics and numbers.
Check the memory utilization on the server. Does recycling IIS periodically make the issue go away (a possible memory leak)?
ELMAH is a good tool to start looking at the exceptions. Also, go through the logs your application might be generating.
Then, I would look for anti-virus software running at a particular time or some long running processes which might be slowing down the machine etc., a database backup schedule...
HTH.
Of course ultimately I just want to solve the intermittent slowness issues (and I'm not yet sure if I have). But in my initial question I was asking for a rather specific logger.
I never did find an answer for that so I wrote my own stopwatch threshold logging. It's not quite as detailed as my initial idea but it has the benefit of being very easy to apply globally to a web application.
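
The poster's implementation is not shown, so the following is only a rough, language-neutral illustration of the stopwatch-threshold idea, written in Python: time a block of work and log it only when it exceeds a threshold. The names `stopwatch` and `slow_requests` and the thresholds are assumptions; in ASP.NET the same pattern would presumably be built around System.Diagnostics.Stopwatch and hooked into the start and end of each request.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("slow_requests")


@contextmanager
def stopwatch(label, threshold_seconds=1.0):
    """Time the enclosed block and log a warning only if it ran too long."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed >= threshold_seconds:
            logger.warning(
                "%s took %.3f s (threshold %.3f s)", label, elapsed, threshold_seconds
            )


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    with stopwatch("fast step", threshold_seconds=0.5):
        time.sleep(0.01)   # under the threshold: nothing is logged
    with stopwatch("slow step", threshold_seconds=0.05):
        time.sleep(0.1)    # over the threshold: one warning is logged
```

Logging only the outliers keeps the overhead and log volume low enough to leave the timing switched on in production, which is where the intermittent slowness shows up.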
In my experience, performance-related issues are almost always I/O-related and rarely caused by the CPU.
One way to get a gauge on where things stand without writing instrumentation code or installing software is to use Performance Monitor in Windows to see where the time is being spent.
Another quick way to get a sense of where problems might be is to run a small load test locally on your machine while a code profiler (like the one built into VS) is attached to the process to tell you where all the time is going. I usually find a few "quick wins" with that approach.

How to find issue on remote server that you don't have access to?

Ok, so this is my dilemma... I have an ASP.NET MVC site that is running into some condition that pegs the processor on the IIS boxes it's running on. I don't have access to these servers (it's a farm of about 5 IIS6 boxes behind a NetScaler). I am doing some logging to a SQL database, but the problem is that when the CPU pegs, my database starts timing out. The IIS servers are hosted in house, but I can't get access to them.
And to make things even more complicated, I can't reproduce any of these issues in my QA environment (which I don't have access to either). QA is set up similarly to our prod environment, but it runs on a single box that isn't behind a NetScaler.
So, any thoughts on the best way to try to track down where my issues lie? Thanks!
Since you are already logging to a database, why don't you log to another database installed on a different computer? That way, when your MVC application starts killing the CPU, the logging database won't be affected (since it is running on another machine).
Or you could log to an FTP folder that you can access.
Hope I helped.
Regards.
ASP.NET Trace. Haven't used MVC, but I'm assuming it still works...
http://msdn.microsoft.com/en-us/library/y13fw6we%28VS.71%29.aspx
If you want to know what is going on with the system, you could read from the event viewer programmatically:
http://support.microsoft.com/kb/815314
This should help you learn what is going on with the system. You can build a web interface on top of it and capture any information you may want to look at.

ASP.NET Application becomes unresponsive

I made an application for querying and inserting data into a database using ASP.NET 3.5 and LINQ to SQL.
It works fine in the development server.
But after deploying to the staging server, after the first few requests, the application seems unresponsive no matter what I type in the URL. The whole IIS application is frozen. I know I can restart the application to fix that. But I don't want it to happen again in the future.
What are the possible causes of this?
I've just found a ref about this problem:
http://blogs.msdn.com/lucascan/archive/2009/04/14/troubleshooting-an-unresponsive-web-server-iis-part-1-of-2-gathering-the-data.aspx
http://forums.iis.net/p/1154624/1893546.aspx
It's not easy to provide an exact cause since we have no idea how the application was written, what dependencies exist, whether service packs/patches are installed etc. What we could help with is debugging the application.
Things I would start with:
Find out if other applications have the same problems.
Review the server event logs on both servers.
Check memory, CPU usage, etc. on the server with Performance Monitor (perfmon.exe).
See what SQL is being generated with SQL Profiler.
Use an HTTP analyzer like Fiddler to find out whether the server is returning anything that the browser is not displaying.
As BrianLy says, this is one of those situations that is tricky to pin down. We had several problems with ASP.NET apps taking seemingly forever to start; this was down to our corporate firewall blocking crl.microsoft.com.
It's probably a stab in the dark, but it might be worth investigating. The chances of your issue being this sound slim though.
A quick test to see if it is something related to this is to add 127.0.0.1 crl.microsoft.com to your hosts file.

article about number of connected users?

Several months (maybe even a year or two) ago, I saw an ASP.NET article that showed how to tell how many people were connected to a running web application. Of course I only glanced over the article and didn't save it. Does anyone remember seeing the article, or know where I can find it or something like it? I have searched Google from my best memory of the title and content, but I'm getting no hits.
The reason I'm asking is that I have a WCF web service that has crashed several times after I published updates, and the only thing I can think of that would cause these weird problems is that people are connected to it and it's corrupting the files. I'm not going to publish any more updates during the day now, but we also have a couple of people who work during the night, and it would be nice to see whether people are connected before "flipping the switch".
Any help would be greatly appreciated...
Thanks,
Wali
The following article shows how you can use Session_Start and Session_End of the Global Application Class to count the number of active sessions:
How to show number of online users / visitors for ASP.NET website?
In your search, consider using keywords like
perfmon asp.net sessions
Intel has a good article. Unfortunately, it's 404 at the moment, but Google cache has a nice copy. Original link to the Intel "Using perfmon to tune n-tier .NET applications"
When your WCF service crashes, there are likely to be entries in the Windows event log. If not, then the service should be doing logging of its own. I suggest you look and find out whether the service has been telling you what's wrong.
