How to troubleshoot an unresponsive ASP.NET Web API on IIS?

I am having an issue with a Web API hosted on IIS. Here are the details:
Environment: ASP.NET MVC, IIS, SQL Server
Web API hosted on a separate server, load balanced across 2 servers with BIG-IP
The API works fine in PROD, but in lower environments it becomes unresponsive or very slow. Sometimes it takes 5 minutes to process a request; otherwise it never completes. The issue occurs only once or twice a month. An app pool recycle or IIS restart didn't seem to help, but after rebooting the server it works fine and processes the same request in 10 to 20 seconds.
Authentication is required for every request (i.e. a 401 followed by a 200). When the issue occurs, the IIS log has only the 401 entry. The issue seems to happen on only 1 server, as everything works fine again after restarting only that server.
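To confirm the "401 without a following 200" pattern across a whole log file rather than by eye, something like the sketch below could help. It assumes the IIS log is in W3C extended format and that the c-ip, cs-uri-stem and sc-status fields are enabled; the sample lines and addresses are made up for illustration.

```python
from collections import defaultdict


def parse_w3c(lines):
    """Yield one dict per request from IIS W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the rows that follow.
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))


def unanswered_challenges(lines):
    """Return (client IP, URL) pairs that got a 401 but never a 200."""
    statuses = defaultdict(set)
    for row in parse_w3c(lines):
        statuses[(row["c-ip"], row["cs-uri-stem"])].add(row["sc-status"])
    return sorted(
        key for key, seen in statuses.items()
        if "401" in seen and "200" not in seen
    )


# Hypothetical sample data, not taken from the real server.
SAMPLE = """\
#Fields: date time c-ip cs-method cs-uri-stem sc-status time-taken
2021-09-20 13:00:01 10.0.0.5 GET /api/orders 401 15
2021-09-20 13:00:01 10.0.0.5 GET /api/orders 200 120
2021-09-20 13:05:10 10.0.0.7 GET /api/orders 401 12
""".splitlines()

print(unanswered_challenges(SAMPLE))
```

Running it against the log of each load-balanced server separately would show whether only one node stops completing the authenticated request.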
During the issue, when another API request comes in, it goes to the affected server. What might be the reason that the request doesn't go to the other server, which is free?
The system logs were fine. CPU utilization was 3%, and memory usage etc. looked good on the server at the time of the issue. The IIS configuration settings are the same as PROD and look good.
What tools can be used to monitor IIS Apps, Server? Are there any free tools? Any help to troubleshoot this issue will be great.
Some errors on server:
Activation of app Microsoft.Windows.Cortana_cw5n1h2txyewy!CortanaUI failed with error: This app can't be activated by the Built-in Administrator. See the Microsoft-Windows-TWinUI/Operational log for additional information.
The Open Procedure for service "BITS" in DLL "C:\Windows\System32\bitsperf.dll" failed. Performance data for this service will not be available. The first four bytes (DWORD) of the Data section contains the error code.
Windows cannot load the extensible counter DLL ASP.NET_2.0.50727. The first four bytes (DWORD) of the Data section contains the Windows error code.
Cryptographic Services failed while processing the OnIdentity() call in the System Writer Object.
UPDATE:
DebugDiag2 Analysis - CrashHangAnalysis report.
Is this causing a deadlock?
Thread ID   Total CPU Time   Entry Point for Thread
2           00:00:00.031     ntdll!RtlReleaseSRWLockExclusive+2200
0           00:00:00.030     w3wp+2e50
1           00:00:00.000     nativerd!DllGetClassObject+24680
3           00:00:00.000     ntdll!RtlReleaseSRWLockExclusive+2200
4           00:00:00.000     w3tp!THREAD_POOL::CreateThreadPool+350
4 Threads (40% of all threads) have this same call stack.
Note: Grouping of identical threads can be disabled in the 'Preferences' tab of the Analysis Options
Thread 2 - System ID 4704
Entry point ntdll!RtlReleaseSRWLockExclusive+2200
Create time 9/20/2021 1:00:34 PM
Time spent in user mode 0 Days 00:00:00.000
Time spent in kernel mode 0 Days 00:00:00.031
This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.
Thread 3 - System ID 2576
Entry point ntdll!RtlReleaseSRWLockExclusive+2200
Create time 9/20/2021 1:00:34 PM
Time spent in user mode 0 Days 00:00:00.000
Time spent in kernel mode 0 Days 00:00:00.000
This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.
Thread 8 - System ID 2112
Entry point ntdll!RtlReleaseSRWLockExclusive+2200
Create time 9/20/2021 1:00:34 PM
Time spent in user mode 0 Days 00:00:00.000
Time spent in kernel mode 0 Days 00:00:00.000
This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.
Thread 9 - System ID 5832
Entry point ntdll!RtlReleaseSRWLockExclusive+2200
Create time 9/20/2021 1:01:04 PM
Time spent in user mode 0 Days 00:00:00.000
Time spent in kernel mode 0 Days 00:00:00.000
This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.
Instruction Address Source
[0x7ffadb029444] ntdll!NtWaitForWorkViaWorkerFactory+14
[0x7ffadaf9eb4e] ntdll!RtlReleaseSRWLockExclusive+296e
[0x7ffada7184d4] kernel32!BaseThreadInitThunk+14
[0x7ffadafd1781] ntdll!RtlUserThreadStart+21

Related

Sessions are killed after short time in 64bit application pool

We have a .NET web application hosted on IIS 7.5.
Earlier this application was running in a 32-bit application pool, but some time ago we switched to a 64-bit application pool.
Recently users have started to complain that after 1-2 minutes of idling their session is killed, which we confirmed today.
In the web.config file the session timeout is set to 60 minutes.
We have also noticed in Task Manager that the w3wp process for this application consumes about 2-2.4 GB of memory, so maybe the problem is that the application pool is recycling to reclaim memory?
Recycling is scheduled at fixed times, 21:00 and 4:00.
What could be the reason for these problems with sessions?
EDIT:
I have inspected some counters and done a basic memory dump analysis, but I don't see any problems.
In the !eeheap output from the dump I see only about 10-30MB of generation 2 objects in each heap, and there are 24 heaps:
Heap 0 (0000000003083a90)
generation 0 starts at 0x00000000fff568b8
generation 1 starts at 0x00000000ffa6acf0
generation 2 starts at 0x00000000ff471000
ephemeral segment allocation context: none
segment begin allocated size
00000000ff470000 00000000ff471000 00000000ffff8de0 0xb87de0(12090848)
Large object heap starts at 0x00000006ff471000
segment begin allocated size
00000006ff470000 00000006ff471000 00000006ff7495c8 0x2d85c8(2983368)
Heap Size: Size: 0xe603a8 (15074216) bytes.
Heap 1 (00000000030889c0)
generation 0 starts at 0x000000013fc36ed8
generation 1 starts at 0x000000013f949348
generation 2 starts at 0x000000013f471000
ephemeral segment allocation context: none
segment begin allocated size
000000013f470000 000000013f471000 000000014035e7b8 0xeed7b8(15652792)
Large object heap starts at 0x0000000703471000
segment begin allocated size
0000000703470000 0000000703471000 00000007035c5d58 0x154d58(1396056)
Heap Size: Size: 0x1042510 (17048848) bytes.
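With 24 heaps it is easier to total the managed heap sizes with a script than by hand. A minimal sketch, assuming the !eeheap output was saved as text and that every heap ends with a "Heap Size: Size: 0x... (N) bytes." line as in the two heaps shown above (the helper name is made up):

```python
import re

# Matches e.g. "Heap Size: Size: 0xe603a8 (15074216) bytes."
HEAP_SIZE = re.compile(
    r"Heap Size:\s+Size:\s+0x[0-9a-fA-F]+\s+\((\d+)\)\s+bytes"
)


def total_heap_bytes(eeheap_text):
    """Return (number of heaps found, sum of their sizes in bytes)."""
    sizes = [int(m.group(1)) for m in HEAP_SIZE.finditer(eeheap_text)]
    return len(sizes), sum(sizes)


# Only the two heap totals quoted in the question; the other 22 are not shown.
SAMPLE = """
Heap Size: Size: 0xe603a8 (15074216) bytes.
Heap Size: Size: 0x1042510 (17048848) bytes.
"""

count, total = total_heap_bytes(SAMPLE)
print(count, "heaps,", total, "bytes,", round(total / 2**20, 1), "MiB")
```

The total over all heaps is the number to compare against the Bytes in all Heaps counter. A large gap between that and the process's private bytes points at memory outside the managed GC heaps.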
EDIT: 2015-08-19 09:00
Those are the counters for 09:00 2015-08-19
What worries me is why Task Manager shows 2.5 GB of memory when Bytes in all Heaps shows only about 100MB, and why Private Bytes (216MB) is bigger than Bytes in all Heaps.
The load in this current moment is about 40 users on this server.
EDIT 2015-08-19 14:09
After some time I see that there could be a problem with assemblies.
How can I check this with windbg when I'm on .NET 4.5 where there is no !dda command?

Try copying the running app to a different pool, and in the new one disable all assemblies / references that you don't need, to see which one is causing this.
As you said, I think some assembly is crashing your application pool, perhaps because it doesn't support 64-bit.
Try disabling all references that you don't use, updating everything, etc.

Server random downtime windows server 2003 sp2 .net4

Server Specs
Microsoft Windows Server 2003 Enterprise Edition SP2
IIS 6
.net4
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 2.00GB of RAM
Physical Address Extension
I am having trouble finding the cause of our server's random downtime. Our clients inform us that their website goes down for hours at a time. Sometimes users are able to log in however the site is extremely slow/unstable and unusable. Sometimes users are not able to log in at all. When users are able to log in not all images are displayed (they get the image not found image).
We upgraded their website from .net1 to .net4 because we thought the cause of their downtime and random user log out was due to them running their website on .net1. The website was running fine with no issues for a few months.
The first time the server started to go down after that was due to the drive on which the website resided running out of disk space. 40GB had been allocated to this drive and 20GB was added. This didn't resolve the issue for very long.
The second time the server started randomly going down, I noticed in the Event Viewer that the worker process associated with the app pool used by the website periodically needed to be recycled. That is, in the Security tab of the Event Viewer I would periodically see an event with ID 1074 reading 'A worker process with process id of '1540' serving application pool 'Net4' has requested a recycle because the worker process reached its allowed processing time limit.'. I then went into this app pool's properties and saw that the app pool would be recycled every 29 hours, which is the default. I changed this so that the app pool recycles every day at 3:00am. Since then we have not seen this event in the Event Viewer. We were able to catch the website during one of its downtimes before this was changed and recycled the app pool manually. This resolved the issue in this one instance.
This did not permanently fix the issue however, as we are still receiving emails from our client informing us that the website is down for hours at a time.
I then set up a performance monitor counter log. We have managed to monitor the server's performance during many of these downtimes. It does not appear to be a problem with memory as there is plenty of space on the drive. It does not appear to be a memory leak or related to excessive paging, as there are no running processes which take up an excessive amount of % Processor Time and the Pages/sec Memory counter does not peak at an excessive amount during most of the downtime (I'll explain why excessive paging occurs later). The total IO Data Bytes/sec and IO Other Data Bytes/sec Process counters do not appear to be unusually high or low during downtime. The total Thread Count and Handle Count Process counters do not exhibit any abnormal spikes or drops during this time. The total thread count, at a given time, seems to be between 600 and 900, give or take. The total handle count, at a given time, seems to be between 15,000 and 23,000, give or take. The % Time in Jit counter (.NET CLR Jit) for instance w3wp is 0 for about half of the time and randomly peaks at almost 100 the other half, most of the time peaking for just a moment but occasionally peaking for about 10 minutes, unrelated to downtime.
There are random times throughout the day when the process dsmcsvc takes up most, if not all, of the % Processor Time. This is a process run by the Symantec Antivirus software. When this process takes up the % Processor Time there is a corresponding event in the Event Viewer signifying that a new virus definition file has been loaded, that is, an Application event with ID 7 'New virus definition file loaded. Version: #version number#'. When this event occurs, the Pages/sec counter spikes. Sometimes it spikes to only 200-300 but at times it peaks over 10,000. This event seems to be completely unrelated to website downtime. I have researched the Symantec Antivirus software and found that there is a known memory leak in old versions of this software. I have found that this software is known to cause high memory usage when the link to a process called NavLogon.exe is broken/does not exist. This process does not appear to exist on the server, so I currently have no way of restoring the link to it. I also found that this software uses Crypt32.dll and that old versions of Crypt32.dll have a known memory leak. The Crypt32.dll which exists on the server was last updated in 2007.
The Performance Monitor log monitors the total Sessions Active ASP.NET Applications counter. During downtime, the total number of sessions does not exhibit any abnormal behavior; there is a normal number of active sessions during this time. Active sessions at a given time can be between 0 and 200. I was informed that the time when the most users are active is during 1st shift; however, between about 10pm and 2am every day, this number peaks.
The site runs JavaScript client side, and Visual Basic.net server side. All users have about 10-15 session variables almost all of the time.
When the site goes down there are no events which seem to correspond to its downtime in the Event Viewer.
I have also set up a W3C Extended Log File Format log for this site. During downtime there seems to be an excessive number of GET requests for Telerik.RadUploadProgressHandler.ashx.
I have seriously run out of ideas at this point and have extensively searched the web for solutions and come up empty. Any feedback as to why this may be occurring would be great.

It does not appear to be a problem with memory as there is plenty of space on the drive.
Really? Memory and hard drive space are two completely different things. 2GB of RAM was okay a decade ago, when that server was new, but today it's laughably small.
But don't bother upgrading or adding RAM. This server is old enough that the problem is probably just the hardware reaching the end of its useful life. Additionally, the operating system is also nearing its end of life. Server 2003 is scheduled for end of life on July 14, 2015. After that date, there will be no new patches of any kind produced for Server 2003... not even critical security patches. That will make Server 2003 completely unsuitable as a web server.
This seems like a good time to execute a transition to a completely new server.

Troubleshooting an IIS .NET website outage

Last night one of the websites (.NET 4.0 forms) hosted on my Win 2008 R2 (IIS 7.5) Server started to time out throwing the following error for all connected users.
TYPE System.Web.HttpException
MESSAGE Request timed out.
DETAIL System.Web.HttpException (0x80004005): Request timed out.
The outage was confined to just one website within IIS, the others continued to work fine.
Unfortunately I was unable to identify why the website was timing out. Here are the steps I took:
First thing I did was look at the task manager which revealed normal CPU and memory usage. Network activity was also moderate.
I then opened IIS to look at the live connections under 'Worker Processes'. There were about 60 live connections, so it didn't look like anything DDoS related.
Checked database connectivity (hosted on a separate server), all fine!
I then reset the website on IIS. That didn't work.
I then tried a complete iisreset... still no luck :(
In the end (and under some duress) the only thing I could think to do to resolve this was to restart the server.
Restarting the server worked, but I am nervous not knowing why this happened in the first place. Can anyone recommend any checks that I failed to carry out? Is there an official checklist for working through these sorts of IIS problems? I have reviewed the IIS logs but don't see anything unusual in the run-up to the outage.
Any pointers or links to useful resources to help me understand and mitigate against this in future will be much appreciated.
EDIT
The only time I logged into the server that day was to add an additional web handler component (for remote deploy) to IIS Web Deploy. I'm doubtful this caused the outage, as the server worked for 6 hours afterwards.

Because iisreset didn't help and you had to restart the whole machine, I would suspect a global resource shortage, with the most used (or most resource-consuming) website being the one impacted. It could be caused by a lack of available RAM, or by network connection congestion due to some malfunctioning calls (for example, a lot of CLOSE_WAIT sockets exhausting the connection pool; we've seen that in production because of a malfunctioning external service). It could also be a problem with one specific client, which was disconnected by the machine restart, so the problem eventually disappeared.
I would start from:
Historical analysis
review Event Viewer for any errors/warnings from that period of time,
although you have already looked into the IIS logs, I would do it once again with the help of Log Parser Lizard to produce some statistics, such as the number of requests per client, network bandwidth per client, average response time per client and so on.
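If Log Parser Lizard is not at hand, the same kind of per-client statistics can be approximated with a few lines of Python. This is only a sketch: it assumes W3C extended logs with the c-ip and time-taken fields enabled (time-taken is in milliseconds), and the sample rows are invented.

```python
from collections import defaultdict


def per_client_stats(lines):
    """Return {client IP: (request count, average time-taken in ms)}."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # c-ip -> [count, summed time-taken]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        entry = totals[row["c-ip"]]
        entry[0] += 1
        entry[1] += int(row["time-taken"])
    return {ip: (count, total / count) for ip, (count, total) in totals.items()}


# Hypothetical sample data for illustration only.
SAMPLE = """\
#Fields: date time c-ip cs-uri-stem sc-status time-taken
2015-03-01 22:00:01 192.0.2.10 /default.aspx 200 100
2015-03-01 22:00:02 192.0.2.10 /default.aspx 200 300
2015-03-01 22:00:03 192.0.2.20 /login.aspx 200 50
""".splitlines()

for ip, (count, avg_ms) in sorted(per_client_stats(SAMPLE).items()):
    print(ip, count, avg_ms)
```

A single client with a very high request count or average response time around the outage would support the "one specific client" theory.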
Monitoring
continuously monitor Performance Counters:
\Processor(_Total)\% Processor Time,
\.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec,
\Memory\Available MBytes,
\Web Service(Default Web Site)\Current Connections (one per site name),
\ASP.NET v4.0.30319\Request Wait Time,
\ASP.NET v4.0.30319\Requests Current,
\ASP.NET v4.0.30319\Requests Queued,
\Process(XXX)\Working Set,
\Process(XXX)\% Processor Time (XXX for each w3wp process),
\Network Interface(XXX)\Bytes Total/sec
run the Performance Analysis of Logs (PAL) tool on data from the time of failure for a very detailed analysis of the performance counter data,
run netstat -ano to analyze network connections (or, even better, the TCPView tool)
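One way to collect such counters continuously without extra tooling is the built-in typeperf command. The sketch below only builds and prints the command line; the counter subset, the interval, the sample count and the output file name are assumptions to adjust, and the exact counter paths on a given server can be listed with typeperf -q.

```python
import subprocess

# A subset of the counters listed above; verify the paths with "typeperf -q".
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\ASP.NET v4.0.30319\Requests Current",
    r"\ASP.NET v4.0.30319\Requests Queued",
]


def typeperf_command(counters, interval_s=15, samples=240,
                     out_file="iis_counters.csv"):
    """Build a typeperf command that logs the counters to a CSV file."""
    return [
        "typeperf", *counters,
        "-si", str(interval_s),   # sample interval in seconds
        "-sc", str(samples),      # number of samples to collect
        "-f", "CSV",              # output format
        "-o", out_file,           # output file (hypothetical name)
        "-y",                     # answer yes to prompts such as overwrite
    ]


def collect(counters):
    """Run typeperf (Windows only) and wait for it to finish."""
    return subprocess.run(typeperf_command(counters), check=True)


print(" ".join(typeperf_command(COUNTERS)))
```

The resulting CSV can be opened in Performance Monitor or fed to PAL.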
If none of this leads you to a conclusion, create a Debug Diagnostic rule to capture a memory dump of the process on long-running requests and analyze it with WinDbg and the PSSCor extension for .NET debugging.
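For the CLOSE_WAIT scenario mentioned earlier, the output of netstat -ano can be summarized per process. A rough sketch, assuming the usual Windows column layout (Proto, Local Address, Foreign Address, State, PID); the sample output and PIDs are made up:

```python
from collections import Counter


def close_wait_by_pid(netstat_output):
    """Count TCP connections in CLOSE_WAIT per owning PID."""
    counts = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # TCP rows have five columns; UDP rows have no State column.
        if len(parts) == 5 and parts[0] == "TCP" and parts[3] == "CLOSE_WAIT":
            counts[parts[4]] += 1
    return counts


# Hypothetical sample; on a real server pipe in the output of "netstat -ano".
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4
  TCP    10.0.0.5:49712         10.0.0.9:1433          CLOSE_WAIT      5120
  TCP    10.0.0.5:49713         10.0.0.9:1433          CLOSE_WAIT      5120
  TCP    10.0.0.5:49800         10.0.0.12:443          CLOSE_WAIT      6000
  UDP    0.0.0.0:123            *:*                                    1012
"""

for pid, n in close_wait_by_pid(SAMPLE).most_common():
    print(pid, n)
```

The PID can then be matched to a w3wp.exe instance in Task Manager to see which application pool is holding the sockets.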

web app slow performance when leave idle for some time

We have a web application deployed on IIS 7.5, target framework 4.0.
The application performs slowly on the first request after being left idle for a few minutes and then performs as expected. This happens each time the application is idle.
With the help of Fiddler I found that it is the TCP/IP connection that takes the time, about 21 seconds, while in subsequent calls this time is 0.
The idle timeout and the connection timeout are both set high in the IIS settings.
The server is Windows 2008 R2.
There is nothing in the Event Viewer related to the website.
We use forms authentication, but the timeout for that is also set to about 10 hours in the config file.
Can anybody point me to the setting which is affecting the response time after the app has been idle for some time?
Note: this was working properly when deployed within the LAN, but the problem starts when deployed outside the LAN or in a separate domain.

Problem
The problem is the IIS app pool idle timeout. By default it is set to 20 minutes; if no request arrives within 20 minutes, the app pool shuts down,
and when a request comes in after those 20 minutes, it starts again.
The problem is that the first visit to an app pool needs to create a new w3wp.exe worker process, which is slow because the app pool needs to be created, ASP.NET or another framework needs to be loaded, and then your application needs to be loaded. So it may take 20-30 seconds, depending on the application's size.
Solution
To avoid this type of delay, set the idle timeout to 0.
The app pool will then stay loaded and respond quickly.
app pool setting

By default, the IIS application pool is shut down after 20 minutes of inactivity. After that, when you make a request, IIS basically has to start the website up again, which leads to the behavior you are describing. You can change the idle timeout of the application pool in IIS to avoid it.
You could also look into the Auto-Start feature of the 4.0 framework.

Well, a bit late, but this may help someone else. I had the same problem, with nothing in the logs, and spent days on it. Then, looking at the network adapter properties / configuration / power management, I found that "Allow the computer to turn off this device to save power" was checked. I unchecked it and the problem was solved.

W3WP memory and threads utilization every day at midnight

I have an ASP.NET application (Microsoft "Stock Trader 5.0") installed on IIS 7.5 (Win 2008 R2), and I'm using a load application to put stress on the ASP.NET application.
Every morning when I check the perfmon counters, I see that around midnight (12 AM) there was a memory and thread utilization problem (thread count jumps from 150 to 1400, and private memory increases by 200 MB for a few minutes).
This issue happens every day, only at midnight. I tried disabling IIS logging and stopping any scheduled tasks that run around midnight, but I still get this issue.
Is there something else that I can try or check to solve this issue?
Thanks!

Try having a look at the access log files in: C:\inetpub\logs\LogFiles
They might show any spikes in the number of requests that would explain the increase in thread count and memory usage.
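A quick way to look for such spikes is to bucket the log entries by minute. The sketch below assumes W3C extended logs with the date and time fields enabled (IIS writes these in UTC by default, so "midnight" in the log may be offset from local time); the sample rows are invented.

```python
from collections import Counter


def requests_per_minute(lines):
    """Count requests per 'YYYY-MM-DD HH:MM' bucket in a W3C log."""
    fields = []
    buckets = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            row = dict(zip(fields, line.split()))
            # time is HH:MM:SS, so the first five characters give HH:MM.
            buckets[row["date"] + " " + row["time"][:5]] += 1
    return buckets


# Hypothetical sample data for illustration only.
SAMPLE = """\
#Fields: date time c-ip cs-uri-stem sc-status
2014-05-01 23:59:58 192.0.2.1 /trade.aspx 200
2014-05-02 00:00:01 192.0.2.1 /trade.aspx 200
2014-05-02 00:00:02 192.0.2.2 /trade.aspx 200
2014-05-02 00:00:30 192.0.2.3 /quotes.aspx 200
""".splitlines()

for minute, count in requests_per_minute(SAMPLE).most_common(3):
    print(minute, count)
```

If the busiest minutes do not line up with the thread count jump, the spike is probably not driven by incoming requests.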
You could also use profiling tools like New Relic that will give you a "CT scan" view of your application.
