Last night one of the websites (.NET 4.0 Web Forms) hosted on my Win 2008 R2 (IIS 7.5) server started to time out, throwing the following error for all connected users.
TYPE System.Web.HttpException
MESSAGE Request timed out.
DETAIL System.Web.HttpException (0x80004005): Request timed out.
The outage was confined to just one website within IIS, the others continued to work fine.
Unfortunately I was unable to identify why the website was timing out. Here are the steps I took:
The first thing I did was look at Task Manager, which showed normal CPU and memory usage. Network activity was also moderate.
I then opened IIS to look at the live connections under 'Worker Processes'. There were about 60 live connections, so it didn't look like anything DDoS-related.
Checked database connectivity (hosted on a separate server), all fine!
I then restarted the website in IIS. That didn't work.
I then tried a complete iisreset... still no luck :(
In the end (and under some duress) the only thing I could think to do to resolve this was to restart the server.
Restarting the server worked, but I am nervous not knowing why this happened in the first place. Can anyone recommend any checks that I failed to carry out? Is there an official checklist for working through these sorts of IIS problems? I have reviewed the IIS logs but don't see anything unusual in the run-up to the outage.
Any pointers or links to useful resources to help me understand and mitigate against this in future will be much appreciated.
EDIT
The only time I logged into the server that day was to add an additional web handler component (for remote deploy) to IIS Web Deploy. I'm doubtful this caused the outage, as the server worked fine for 6 hours afterwards.
Because iisreset didn't help and you had to restart the whole machine, I would suspect a global resource shortage, with the most heavily used (or most resource-hungry) website being the one affected. It could be a lack of available RAM, or network connection exhaustion caused by malfunctioning calls (for example, a lot of CLOSE_WAIT sockets exhausting the connection pool; we've seen that in production due to a malfunctioning external service). It could also have been a problem with one specific client, which was disconnected by the machine restart, so the problem eventually disappeared.
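If you want to quickly check for that kind of socket build-up next time (before resorting to a restart), a minimal C# sketch using the .NET networking API gives you the same view as netstat, grouped by connection state:

    // Quick check for CLOSE_WAIT build-up (a sign of leaked connections).
    // Same information as "netstat -ano", grouped by connection state.
    using System;
    using System.Linq;
    using System.Net.NetworkInformation;

    class ConnectionStateCheck
    {
        static void Main()
        {
            var connections = IPGlobalProperties.GetIPGlobalProperties()
                                                .GetActiveTcpConnections();

            foreach (var group in connections.GroupBy(c => c.State)
                                             .OrderByDescending(g => g.Count()))
            {
                Console.WriteLine("{0,-12} {1}", group.Key, group.Count());
            }
        }
    }

A large and steadily growing CLOSE_WAIT count usually means the remote side has closed connections that your code (or a client library) never closes on its own end.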
I would start from:
Historical analysis
review the Event Viewer for any errors/warnings from that period of time,
although you have already looked through the IIS logs, I would go over them again with the help of Log Parser Lizard and build some statistics such as number of requests per client, network bandwidth per client, average response time per client and so on.
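If Log Parser isn't available on the box, a quick C# pass over the W3C log can produce similar per-client statistics; the log path below is hypothetical, and the c-ip and time-taken fields must be enabled in IIS logging for the lookups to work:

    // Rough per-client statistics from a W3C IIS log.
    // The log path is hypothetical; c-ip and time-taken must be enabled
    // as logging fields for the lookups below to succeed.
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    class IisLogStats
    {
        static void Main()
        {
            string logFile = @"C:\inetpub\logs\LogFiles\W3SVC1\u_ex140101.log";
            string[] fields = null;
            var counts = new Dictionary<string, int>();
            var totalMs = new Dictionary<string, long>();

            foreach (var line in File.ReadLines(logFile))
            {
                if (line.StartsWith("#Fields:"))
                {
                    fields = line.Substring("#Fields:".Length).Trim().Split(' ');
                    continue;
                }
                if (line.StartsWith("#") || fields == null) continue;

                var parts = line.Split(' ');
                string clientIp = parts[Array.IndexOf(fields, "c-ip")];
                long ms = long.Parse(parts[Array.IndexOf(fields, "time-taken")]);

                counts[clientIp] = counts.ContainsKey(clientIp) ? counts[clientIp] + 1 : 1;
                totalMs[clientIp] = totalMs.ContainsKey(clientIp) ? totalMs[clientIp] + ms : ms;
            }

            // Top 20 clients by request count, with average response time.
            foreach (var kv in counts.OrderByDescending(e => e.Value).Take(20))
                Console.WriteLine("{0,-16} {1,6} requests, avg {2} ms",
                    kv.Key, kv.Value, totalMs[kv.Key] / kv.Value);
        }
    }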
Monitoring
continuously monitor these Performance Counters (a small sampler sketch follows the list below):
\Processor(_Total)\% Processor Time,
\.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec,
\Memory\Available MBytes,
\Web Service(Default Web Site)\Current Connections (one per each of your site names),
\ASP.NET v4.0.30319\Request Wait Time,
\ASP.NET v4.0.30319\Requests Current,
\ASP.NET v4.0.30319\Requests Queued,
\Process(XXX)\Working Set,
\Process(XXX)\% Processor Time (one XXX per w3wp process),
\Network Interface(XXX)\Bytes Total/sec
run the Performance Analysis of Logs (PAL) tool on the counter data covering the time of failure to get a very detailed analysis of the performance counter data,
run netstat -ano to analyze the network connections (or, even better, the TCPView tool)
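Perfmon data collector sets (or logman) are the usual way to record these, but if you prefer to script a quick sanity check, a minimal C# sampler along these lines will do; the counter and instance names follow the list above and may need adjusting for your machine:

    // Minimal console sampler for a handful of the counters listed above.
    // Instance names ("_Total", the ASP.NET version) may need adjusting;
    // run it on the web server itself.
    using System;
    using System.Diagnostics;
    using System.Threading;

    class CounterSampler
    {
        static void Main()
        {
            var counters = new[]
            {
                new PerformanceCounter("Processor", "% Processor Time", "_Total"),
                new PerformanceCounter("Memory", "Available MBytes"),
                new PerformanceCounter("ASP.NET v4.0.30319", "Requests Current"),
                new PerformanceCounter("ASP.NET v4.0.30319", "Requests Queued"),
            };

            while (true)
            {
                // Rate counters return 0 on the first read, so sample in a loop.
                foreach (var c in counters)
                    Console.WriteLine("{0}\\{1}: {2:F1}",
                        c.CategoryName, c.CounterName, c.NextValue());
                Console.WriteLine("----");
                Thread.Sleep(5000);
            }
        }
    }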
If none of this leads you to a conclusion, create a Debug Diagnostic rule to capture a memory dump of the process for long-running requests and analyze it with WinDbg and the PSSCor extension for .NET debugging.
Related
We have a web server (WinSrv2019) running a few ASP.NET 4.8 websites with heavy traffic.
Each site runs in its own 64-bit Application Pool (Integrated Pipeline), .NET CLR version v4.0.
One of the websites apparently gets stuck and stops responding.
Sometimes this lasts a few minutes; sometimes it can remain stuck for an hour.
During this period our own application logs are empty.
Moreover, the IIS log is empty as well; incoming requests during this period aren't being logged at all.
On the client side the browser doesn't get any error, it just keeps waiting for a response.
Since it happens with a single site only, we've ruled out server hardware as the cause; server resource usage never goes above 50%.
On the other hand, we have no idea what could cause an Application Pool to get stuck in this way.
Please advise:
How can we investigate this issue?
Is there any tool that allows us to "debug" what happens inside IIS at a low level?
Thanks in advance
I have 3 machines: one running IIS, one with the database, and one from which I test the performance of my application. Which means:
Using The Grinder I run 1000 instances of my application (hosted on IIS and working against the database on the SQL Server machine), and using perfmon I observe that there really are 1000 requests.
BUT the problem is that connecting to this application (IIS) from another computer is very slow. I suppose there is some bottleneck on the IIS side but I cannot find it - CPU usage is less than 10%.
I think I changed every option in the IIS Manager and machine.config and web.config files - nothing seems to have any effect.
The first thing is to confirm whether you actually have a slowness issue while browsing the site.
Check the IIS logs and look at the time-taken field. If the time taken is more than 10 seconds, it is considered slow.
The slowness might have several causes: it could be the network, or it could be something in your code.
I suggest capturing a network trace using Netmon or Wireshark if it looks network-related.
If it's not the network, you can collect a process dump using the DebugDiag 2 Update 2 tool.
You can check the link below for how to collect and analyze the dumps and track down where the slowness is:
https://msdn.microsoft.com/en-us/library/ff420662.aspx
I have a w3wp.exe that is restarting on my IIS server (see specs below). Memory gradually climbs to ~3 GB, then it restarts itself roughly every 1-2 minutes.
Memory Usage:
The odd thing is that once this memory drop happens (it looks like a restart, though the app pool does not get recycled/restarted), GET requests are queued but then serviced as soon as the service warms up again, causing a delay in responses to our clients, who had initially been reporting occasional delayed response times.
I have followed this link to capture a stack dump once the .exe restarts (private bytes go to ~0), but nothing gets logged (no .dmp file) by DebugDiag when the service restarts.
I see tons of warnings in my web server (IIS) log, but that's it:
A process serving application pool 'MyApplication' suffered a fatal
communication error with the Windows Process Activation Service. The
process id was '1732'. The data field contains the error number.
ASK: I'm not sure if this is a memory limitation, if caching is not playing well with my threads/tasks, if caching is blowing up, if there is a watchdog service restarting my application, etc. Has anybody run across something similar with w3wp.exe restarting? It's hard to tell because DebugDiag is not giving me a dump once it restarts.
SPECS:
MVC4 Web API servicing GET requests (code is a debug build with debug=true)
Uses MemoryCache for model and business objects with cache eviction set to 2 hrs; uses a Task (TPL) for each new request (see the sketch after this list)
Database: SQL Server 2008R2
Web Servers: Windows Server 2008R2 Enterprise SP1 (64bit, 64G RAM)
IIS 7.5
One application pool...no other LOB applications running on this server
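For context, the caching/threading setup is roughly the following; this is a simplified sketch with hypothetical names, not the actual code:

    // Simplified sketch of the setup described above: MemoryCache with a
    // 2-hour absolute expiration and a Task per incoming request.
    // ReportCache, LoadBusinessObject and the key format are hypothetical.
    using System;
    using System.Runtime.Caching;
    using System.Threading.Tasks;

    static class ReportCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static Task<object> GetAsync(string key)
        {
            // Each GET request is handled on its own task (TPL).
            return Task.Run(() =>
            {
                var cached = Cache.Get(key);
                if (cached != null) return cached;

                object value = LoadBusinessObject(key);   // expensive load
                Cache.Set(key, value, new CacheItemPolicy
                {
                    AbsoluteExpiration = DateTimeOffset.Now.AddHours(2) // 2 hr eviction
                });
                return value;
            });
        }

        private static object LoadBusinessObject(string key)
        {
            // Placeholder for the real model / business object load.
            return new { Key = key, LoadedAt = DateTime.UtcNow };
        }
    }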
Your first step is to reproduce the problem in your test environment. Set up some kind of load-generation app (you can write one yourself pretty easily) and get the same problem happening. Then turn off debug in web.config and see if that fixes the issue. Then change it to a release build and test again.
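A bare-bones load generator along those lines only takes a few lines with HttpClient; the URL and the worker/request counts below are placeholders:

    // Bare-bones load generator: N concurrent workers looping GET requests.
    // The URL and the worker/request counts are placeholders.
    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    class LoadGen
    {
        static void Main()
        {
            const string url = "http://your-test-server/api/values"; // hypothetical endpoint
            const int workers = 50;
            const int requestsPerWorker = 100;

            // Lift the default outbound connection limit so the client
            // itself doesn't become the bottleneck.
            ServicePointManager.DefaultConnectionLimit = workers;

            var client = new HttpClient();
            var sw = Stopwatch.StartNew();

            var tasks = Enumerable.Range(0, workers).Select(_ => Task.Run(async () =>
            {
                for (int i = 0; i < requestsPerWorker; i++)
                {
                    var response = await client.GetAsync(url);
                    response.EnsureSuccessStatusCode();
                }
            })).ToArray();

            Task.WaitAll(tasks);
            Console.WriteLine("{0} requests in {1:F1}s",
                workers * requestsPerWorker, sw.Elapsed.TotalSeconds);
        }
    }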
I've never used MemoryCache; try reducing the cache eviction time, or just turn it off, and see if that fixes the issue. Good luck :)
My environment: Windows Server 2008, IIS 7.0, ASP.NET
I developed a Silverlight client.
This client gets updates from the ASP.NET host through a WCF web service.
We get 100% CPU usage and connection drops when we have a very low number of users (~50).
The server should clearly be able to handle a lot more than that.
I ran some tests on our DEV server and indeed 100 requests / s maxes out the CPU.
What's odd is that even if the service is replaced by a dummy sending back hardcoded data, the service still maxes out the CPU.
The thread count looked very low at about 20 so I thought there was some contention somewhere.
I changed all configuration options I could find to increase the worker threads (processModel, httpRuntime and the MaxRequestsPerCPU registry entry).
Nothing changed.
Then I stopped the IIS server and ran the web service as a console (removing all the ASP authentication references).
The service maxed out the CPU as well.
Then comes the magic part: I killed the console app and restarted IIS, and now the service runs at 5-60% CPU with 100 requests / s and I can see 50+ worker threads.
I did the same thing on our preprod machine and had the same magic effect.
Rebooting the machines keeps the good behaviour.
So my question is: what happened to fix my IIS server? I really can't understand what fixed it.
Cheers.
Find out the root cause of the high CPU usage, and then you can find a fix:
http://support.microsoft.com/kb/919791
I have a WCF service hosted in IIS that takes a long time (around 5 hours) to execute. The service generates reports using SSRS (SQL Server Reporting Services) and saves them to a location on the server. The service was originally stopping after generating a few reports, so I disabled "recycling of worker processes", "shut down worker processes after being idle" and "limit kernel request queue" in the application pool; that fixed the issue, and all the reports are now generated regardless of how long they take. But I am not sure this is the right fix. What is the impact of unchecking these settings in the application pool for a WCF service in IIS, and is there a better way to get around this problem?
For any long-running process it is much better to do it outside of IIS.
In this case I would have a regular Windows service running that monitors a request queue. When a request to generate a report comes in, it would spin off a thread to perform the generation.
The web service would be responsible for three things: first, adding an item to the queue to be handled; second, checking the queue status to see whether the report is ready; third, sending the completed report back to the calling client.
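A rough sketch of that split is below. The names are hypothetical, and in practice the queue should live in a database or MSMQ so both processes can see it and jobs survive restarts; an in-memory collection is used here only to keep the sketch self-contained:

    // Rough sketch of the suggested split: a thin WCF contract in IIS and a
    // worker loop inside the Windows service that owns the long-running work.
    using System;
    using System.Collections.Concurrent;
    using System.ServiceModel;
    using System.Threading.Tasks;

    [ServiceContract]
    public interface IReportService
    {
        [OperationContract]
        Guid RequestReport(string reportName);   // 1. queue the request, return a ticket

        [OperationContract]
        bool IsReportReady(Guid ticket);         // 2. check status

        [OperationContract]
        byte[] GetReport(Guid ticket);           // 3. return the finished report
    }

    // Inside the Windows service (outside IIS): a worker draining the queue.
    public class ReportWorker
    {
        private readonly BlockingCollection<Guid> _queue = new BlockingCollection<Guid>();

        public void Enqueue(Guid ticket)
        {
            _queue.Add(ticket);
        }

        public void Start()
        {
            Task.Factory.StartNew(() =>
            {
                foreach (var ticket in _queue.GetConsumingEnumerable())
                {
                    // Run the existing SSRS generation, save the output, and
                    // mark the ticket complete so IsReportReady() returns true.
                    GenerateAndSaveReport(ticket);
                }
            }, TaskCreationOptions.LongRunning);
        }

        private void GenerateAndSaveReport(Guid ticket)
        {
            // Placeholder for the SSRS report generation + save-to-disk logic.
        }
    }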
This would allow the client to essentially fire and forget the report request and call back later to check on its status. Further, it would mean that if IIS recycled for whatever reason, you are still OK.
For bonus points I would add some error handling code that when the windows service restarted it could restart report jobs that were in the middle of execution. This would make it a bit more robust and allow you to reboot the server at any point.
I have also disabled all the automatic shutdown triggers in IIS for my application without any issue. I monitor the memory limits, and of course the program works smoothly without any memory issues.
I think these automatic shutdown triggers are designed mostly for servers that host many websites together, where some of them may be poorly written. But if you are the master of your IIS, and you have checked that your program has no memory issues, then it is better not to shut it down, or at least to control the shutdown process in some way.
OK, it is better to run a long-running process outside IIS, but it is not so simple to develop, not so simple to install, and not so simple to monitor.