Some requests on IIS hang for minutes and end in a lost connection - asp.net

I have an awkward issue with IIS 10.0 on Windows Server 2016 and ASP.Net 4.5.2 and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. When this issue occurs, other GET requests will receive a swift response and a correct result. Normally, POST-request do no take long to be processed, typically less than 3 seconds.
Recycling the associated worker process will make the issue go away, for hours or days.
When today inspected the web server when the issue was going on, I saw little CPU usage, less than 10%, memory 56%, the worker process a modest 615 MB. I saw neither logging in the W3C log of these requests, nor in my custom application logs.
I added the Web-Request-Monitor conform How do I see currently executing web request on IIS 8, but in doing so, the the worker process probably got recycled, as the issue is not currently occurring.
There are a reverse proxy and an access manager between the internet and my web server. I suppose they can have something to do with this issue, but it certainly is related to IIS, as recycling helps.
All of this is happening on a acceptation web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there will be very little traffic from other clients, if none at all.
What could be next steps to investigate this issue further?

Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains accessing the same log file. This occasionally resulted in concurrency issues with accessing the log file. It appeared that log4net could not properly handle this and got stuck while writing to the log file.
This log file was configured with the RollingFileAppender option. Since we also used AdoNetAppender, we decided to remove file logging all together.
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler. They had Time Elapsed of hundreds of seconds.
I also activated the Failed Requests Tracing with a rule for long running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed request all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each requests. When debugging in my development environment, at one point, I started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent) log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent) log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent) log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception) log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work, in the acceptance environment, so the setting may very well have been set to true then, causing the issue to start.
Another indication that this is happening with log4net is that when the issue last occurred, I realized that logging of level standard, only occurs in some POST requests. I found a POST-request that does not log and requests to it where handled normally, while the other POST-requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.

IIS recycle fix this issue doesn't mean that this is an IIS issue because all asp.net application run in .net runtime unless it is proved that the request is hang in IIS module.
So you may need to wait this issue happen again, then create a Failed request tracing rule for time-taken. Then it will tell us this issue is happening on IIS pipeline module or .net runtime.
If all request hang in .net runtime. Then you may have to capture a hang dump and do a deep analysis via WINDGB and mex extension. It will tell us what's happening there.

Related

ASP.NET deployed application stops responding

We have an ASP.NET deployed application running on IIS 7. Lately we started having problems with the website, which usually starts at high traffic times, and the issue is that the page stops loading without showing an error. It essentially continues spinning and does not load. IIS reset would usually fix the issue but we have tried everything to resolve it with no success. Below are additional information about what we have already tried.I can intentionally put the website into this state by running 25 concurrent users to the landing page after which I would have to reset iis because it would stop responding. I am thinking this might have to do with a settings in IIS. Maximum Concurrent Connection is set to the default which is 4294967295. Kind of at a lost here.
We turned on Failed log tracing on IIS. The data in the error log did not provide anything conclusive.This might be partly related to the request not failing completely hence no log was created. Most errors were collected based on time the page took to respond.
I have also looked at the app pool and host log files and found nothing out of place
25 concurrent users is nothing. What's the back end stack? There's not too many details listed here but I'd start with looking at each stage the request is in and in addition enable failed request tracing. Mike (formerly from the IIS team) has a great write up on this, in a nutshell though:
Troubleshoot hanging requests on IIS in 3 steps
View requests
%windir%system32inetsrvappcmd list requests /elapsed:30000
Enabled Failed Request Tracing (modify if not using Default Web Site of course)
%windir%system32inetsrvappcmd configure trace "Default Web Site" /enablesite
%windir%system32inetsrvappcmd configure trace "Default Web Site" /enable /path:test.aspx /timeTaken:00:00:30
Then hopefully you can find some details via
appcmd list traces | findstr "yourpage.aspx"

Debugging requests which are 'stuck' in an IIS worker process

In case of TL;DR - I basically need guidance regarding what tools are available to debug requests which are issued to IIS and which stall inside a module.
I have a problem with an old ASP 2.0 app at the moment whereby it will periodically become unavailable and recycling the app pool (horrible as that may be) doesn't bring it back up 100% of the time.
So first of all it presents itself as requests entering the app pool and being trapped in state 'BeginRequest' in RewriteModule.
It is not a specific request which is always the first to experience this issue. The issue cannot be easily recreated either.
Eventually requests join this backlog and when it becomes 70+ deep the app pool fails to respond to pings from WAS and it forcibly recycles. Predictably it doesn't stop on-time and the old app pool is forced to stop. When the new app pool comes up it either works just fine or it instantly experiences the same issue as the outgoing one and requests begin to queue.
In issues like this all the official guidance is understandably focussed around looking at why the RewriteModule may choke.
I have validated my redirections and though complex there are no obvious issues with syntax (XML validates).
Likewise in inetmgr loading up the URL Rewrite Module seems to parse the configs fine and show them visually.
Basic stuff like permissions is all fine.
When the app is working normally I also used Failed Request Tracing/Logging to look at the request pipeline for a sample URL which stalled and I can confirm that there is no circular logic or weird errors presenting - the request seems to be handled just fine. This also showed me how high up the rewritemodule is invoked and from this I really don't see how the issue could be app-related as .NET isn't invoked at this point.
Annoyingly when an app pool is experiencing this issue and I can throw in requests which just stall Failed Request Tracing is no good because you actually need a request to get to the end of it's journey and fail otherwise it refuses to log anything out.
I resorted to taking process dumps of affected w3wp.exe's and running them through DebugDiag. Unfortunately the only thing I see is that threads are open accessing the rewritemodule but precious little about what they are stuck on.
As anyone else would do I've tried to track the start of the issue back to any recently installed patches or code changes but nothing matches. Likewise this is happening on 3x servers otherwise I would try reinstalling the rewritemodule. Other sites on the same server which invoke rewritemodule are unaffected.
Has anyone else experienced issues like this - the net seems to have relatively little info in this case. Perhaps you can recommend further debugging tools or approaches for IIS which I can adapt to this scenario? This is sort of a cry for help from someone more used to Apache/Nginx - sorry for the long post.

Secured WCF service timing out on 2nd invocation of client channel

We have a secured & authenticated WCF service which cannot use service references. Thus, we provide the interface for the contracts and open client channel manually.
We have found out that as long we open it once, everything works fine. We can call several methods several times. However, if the channel is closed or just set to a new instance, the Login() (which happens to be required for first step prior to using the service), times out.
To make the matters even more mysterious, this only happens on our production server. If I run the same project locally, I am able to login many times as I want. Consuming the methods inside a web browser (even on a code-behind ASPX page) do not have this problem even with the production server. ONLY when it's a .NET client trying to open a client channel against the production server, do we have this problem.
We are not even sure where to start looking. Any advices would be greatly appreciated.
UPDATE:
As per #Rene's suggestion, we turned on logging on both sides. From client's log, there is a record of error which is basically the same timeout error we already got via the exception. Nothing meaningful. On the server's logs, there are records of service methods being invoked successfully even after 2nd login() and from server's POV, the request is served.
Additionally, I discovered that I could not even reproduce this issue on my machine using same test project to reproduce this problem. This reproduces on my developer's machine. I verified that we were at same version of .NET framework and Visual Studio. It has to be surely a client-side problem. What could be it?
In case anyone else is looking for answer, we finally found it -- the issue is due to the need to set on client's side System.Net.ServicePointManager.DefaultConnectionLimit to some higher value. The default value is 2 but in reality this allows only one proxy to be created and be usable. Setting it to 3 would allow 2 proxies to be created & be used.

iis 7.5 ASP.net hanging requests

I am having some performance issues with my iis webserver. It is hanging randomly and I am trying to figure out how to speed up the server. I enabled Failed request tracing on the server and set it to generate a log when the request is over 3 seconds.
The resulting logs(xml) dont show much but there is a point in the compact performance log that indicates what part of the log the server is hanging on. Below is the part of the log where the large time loss is occurring.
65. i GENERAL_GET_URL_METADATA PhysicalPath="", AccessPerms="513" 17:46:32.577
66. i HANDLER_CHANGED OldHandlerName="", NewHandlerName="ExtensionlessUrlHandler-Integrated-4.0", NewHandlerModules="ManagedPipelineHandler", NewHandlerScriptProcessor="", NewHandlerType="System.Web.Handlers.TransferRequestHandler" 17:46:32.577
67. i VIRTUAL_MODULE_UNRESOLVED Name="FormsAuthentication", Type="System.Web.Security.FormsAuthenticationModule" 17:46:47.771
I am not sure what Handler changed is but it is taking a long time, any tips would be great on where to start looking.
It is hard to come up with a solution without having any piece of code in sight. Here are some general hints/tips you can follow in order to have great performances with an ASP.NET application.
The fastest way to do a request is to not do it in the first place. Try caching everything that can be cached. There are server-side caches and client-side caches. Each have their own uses, but you are not limited to only one type.
Make sure you do not cache and/or keep references of any request-related objects into memory. ASP.NET have a limited number of concurrent requests and keeping a request reference in memory will hang your server if it runs out of threads
Close the request as soon as you are done with it
Everything that is not needed by the client at the time of the request should be done in the background
Make sure you have no memory leak in your application. Garbage Collections are often the cause of hangs in ASP.NET application. When garbage collecting, all running threads are paused. This is especially true for Gen 2 garbage collections. You can enable background generation 2 garbage collections.
Isolate the problematic code. Use a profiler and see which type of request is CPU-intensive. Then dig deeper and see what inside that request makes it slow.
In any well-balanced application, objects should either be short-lived and live forever. In the case of an ASP.NET application, the objects created during the course of a request should ideally die within that request or during the next GC gen 0.
Consider object pooling for large objects and objects that are long to initialize
Make sure your app pool doesn't totally crash and restarts (look the IIS logs and/or the Windows Events)
Some useful debugging tools you can use:
LeanSentry. Great for diagnosing ASP.NET server hangs
windbg. High learning curve but by far the most powerful debugging tool you can use
PerfView. Useful for analyzing ETW events like I/O or CPU usage
There are many ways to improve server performance. But before that you should start with checking CPU usage during the "hang". An infinite loop in the application code may cause this behavior. Unless there is I/O, locking, or sleeps in the loop, you will be able to see it from the CPU usage as you will get exactly one full core's worth of CPU usage for each infinite loop.
Help link to improve server performance
More Info:
I can see entry related to VIRTUAL MODULE UNRESOLVED: which is related to bad use of Response.Redirect(url); Also make sure you have deployed your app on integrated mode on IIS.
here's a simple checklist you might want to reconsider:
Always pre-compiling your site, as opposed to copying it! you might gain a significant performance boost compiling your website before deployment: ASP.NET Precompilation Overview
Do not run the production application with debug="true" enabled, when debug flag is true in your web.config, Much more memory is used within the application at runtime, and since some additional debug paths are enabled, codes can execute much slower
Check your Web.config file to ensure trace is disabled in the section
IIS 7.5 comes with the Auto-Start Feature. WAS (Windows Process Activation Service) starts all the application pools that are configured to start automatically, ensure that your application pool is configured to AlwaysRunning in the IIS 7.5 applicationHost.config, check out here for more detail.
Every asp.net server can be well configured by aspnet.config file located in the root of the framework folder. Ensure that Publisher Evidence for Code Access Security (CAS) is set to false in your aspnet.config file, This might increase the initial page load when you restart the ASP.NET app pool. you can read more about it here.
Also you might want to try Application Initialization Module for IIS 7.5, this module also available on IIS 8.0 can decrease the response time for first requests by pre-loading worker processes

iis startup delay with aspx pages

Environment: Windows Server 2003; IIS 6, ASP.NET 2.0.50727
I'm going crazy with a brand new web server that we set up (note that this problem doesn't happen on our other web servers which have the same configuration). When loading and asp.net app the first time, the page hangs for over a full minute before showing the page in the browser. After it loads the first page, everything runs very quickly.
Note 1: You will probably say that the application is being compiled for the first time. But I've ruled that out. I put trace messages EVERYWHERE in the app and all the trace messages run within a second of requesting the page. Thus, the app compiles and runs immediately. But when the app is finished rendering the page and my last trace message is printed, nothing happens. IIS is doing something behind the scenes for a full minute before transferring the finished page along http to the user's browser.
Note 2: We found that after hitting the app the first time and things run fine, if we wait an hour then we get the delay again. Thus, IIS has something in its cache that it clears out after an hour and causes our site to stall again.
Note 3: Between each test we stop/start IIS to force it to hang upon loading the app.
Note 4: We watched the Task Manager to see if IIS was spiking and taking up a lot of resources processing something. But that wasn't it. We did see a very quick spike to 50% immediately before the browser showed the page, but for the previous 60 seconds there was only 1% usage on the server.
Note 5: On another test I created a HelloWorld.html page and this does not cause IIS to hang. Thus, it has something to do with calling the ASP.NET library the very first time it sends a rendered page across http. Also, since the app has already been compiled and runs instantly, it's just the part of asp.net that sends the rendered page to the user's browser that causes the delay.
Any ideas? We are a a loss here. All of our other web servers are setup the same way and work fine, but this is a new install. So there must be a configuration setting that was missed or maybe something needs to be installed?
Thanks,
Brian
If you have access to the servers, then make sure that app pool recycling is actually logged to the event logs
cscript adsutil.vbs get w3svc/AppPools/DefaultAppPool/LogEventOnRecycle
you can set it to log everything with
cscript adsutil.vbs Set w3svc/AppPools/DefaultAppPool/LogEventOnRecycle 255
See more here
Then check if there were any recycles.
App initialization, creation the worker process, threads, load the app domain and all the references dll's can take some time, that's normal, but that 1 minute delay is something else probably.
Try to precompile the app on the server and see if that helps
aspnet_compiler -m /LM/W3SVC/[site id ]/Root/[your appname]
If you want to dig deeper, you can check the event trace ETW.
logman query providers
Save the IIS /ASP.NET related Guids to a file like iisproviders.txt
logman start ExampleTrace -pf iisproviders.txt -ets -rt
reproduce
LogParser "SELECT * FROM ExampleTrace" -i:ETW
logman stop ExampleTrace -ets
You can find more hereTroubleshooting appdomain restarts and other issues with ETW tracing
I would also check the w3wp.exe with procexp if it has a TCP connection time out or with Procmon for other clues.
If you have experience with windbg, then you can make a request to the app then quickly attach the debugger to the process
windbg -p [process id of the app pool]
.loadby sos mscorwks
g
and take it from there. If there are exceptions, process crash, etc you should be able to catch it...
Once we had a weird server issue like this and a .NET reinstall solved the problem, still not sure what was the culprit.
Could be some aspnet.config settings on this box that are different from others. Have you tried copying over their config files to this server? There appears to be certificate options along with registry modifications that you can do to remove some lag time during the initial load of a page (precompiling aside)
See here and here
One thing you might want to check on is if there are any database access going on on your page load. That might be blocking the creation of the page during initial page load. Then when the query is cached (either by the db engine or another cache mechanism like memcached), subsequent page loads work as normal.
As per your last comment,
I could stop/start IIS multiple times and the app always ran instantly. I thought it was fixed for good. But now I just tried again (it has been sitting idle for the past couple of hours) and now it is back to hanging on the first request.
This could mean that the cache has expired and thus needs to hit the database once again, causing the delay in page load.

Resources