I am running an ASP.NET application with a custom module registered under IIS7.
Until two days ago, everything was running fine. Now I notice that requests have started to hang in the AuthenticateRequest state, in the WindowsAuthentication module. My custom module intercepts requests at the BeginRequest state, processes them, and completes request processing using HttpContext.Current.ApplicationInstance.CompleteRequest(). The requests that it doesn't process are left for IIS to take through the other modules.
The problem (the request hang) occurs in pages that my custom module doesn't process.
Any ideas where I should start troubleshooting this problem? I have consistently reproduced this problem on three different machines today. I also found that we did not change our web.config file in the last month.
Any help towards troubleshooting this problem is greatly appreciated.
Thanks in advance,
Charles Prakash Dasari
Finally I found the solution to my problem.
The custom module I have implemented uses async handlers:
context.AddOnBeginRequestAsync(
new BeginEventHandler(BeginBeginRequest),
new EndEventHandler(EndBeginRequest)
);
In the case where my module does not process the request, the begin event handler has nothing to do and completes immediately. Up until a couple of days ago, I was jumping off to another thread to process these requests in the Begin method. Recently I changed it so that I jump off to another thread only if my module has to process the request, and that change caused the problem. Apparently IIS does not like it when I complete my processing on the same thread.
So now I jump off to another thread again - no matter what. IIS is happy and my app is not hanging any more.
I still have to investigate further to find out why this happens, and whether it is really a bug in IIS or in the way I return the IAsyncResult from the BeginBeginRequest method. But for now I know that I have to process this request on a different thread.
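One possible explanation, which the post does not confirm, is that the IAsyncResult returned on the same-thread path did not report that it had completed synchronously. A minimal sketch of a result object for that path follows; CompletedAsyncResult and ShouldHandle are hypothetical names, not taken from the original module.

```csharp
using System;
using System.Threading;

// Hypothetical helper: represents work that finished before Begin returned.
public sealed class CompletedAsyncResult : IAsyncResult
{
    // Created in the signaled state, so any waiter returns immediately.
    private readonly ManualResetEvent _waitHandle = new ManualResetEvent(true);

    public CompletedAsyncResult(AsyncCallback callback, object state)
    {
        AsyncState = state;
        // The work is already done, so notify the caller right away.
        if (callback != null)
        {
            callback(this);
        }
    }

    public object AsyncState { get; private set; }
    public WaitHandle AsyncWaitHandle { get { return _waitHandle; } }
    public bool CompletedSynchronously { get { return true; } }
    public bool IsCompleted { get { return true; } }
}

// Assumed usage inside the module's begin handler (requires System.Web):
//
// IAsyncResult BeginBeginRequest(object sender, EventArgs e,
//                                AsyncCallback cb, object extraData)
// {
//     if (!ShouldHandle(sender))   // ShouldHandle is a made-up predicate
//         return new CompletedAsyncResult(cb, extraData);
//     ...                          // otherwise start the real async work
// }
```

Whether this removes the need to switch threads in this particular module is untested; it only shows the shape a synchronously completed IAsyncResult usually takes.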
I have an awkward issue with IIS 10.0 on Windows Server 2016 and ASP.Net 4.5.2 and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. While the issue is occurring, GET requests still receive a swift response and a correct result. Normally, POST requests do not take long to process, typically less than 3 seconds.
Recycling the associated worker process will make the issue go away, for hours or days.
When I inspected the web server today while the issue was occurring, I saw little CPU usage (less than 10%), memory at 56%, and the worker process at a modest 615 MB. These requests appeared neither in the W3C log nor in my custom application logs.
I added the Web-Request-Monitor feature as described in "How do I see currently executing web request on IIS 8", but in doing so the worker process probably got recycled, as the issue is not currently occurring.
There are a reverse proxy and an access manager between the internet and my web server. I suppose they can have something to do with this issue, but it certainly is related to IIS, as recycling helps.
All of this is happening on an acceptance web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there is very little traffic from other clients, if any at all.
What could be next steps to investigate this issue further?
Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains accessing the same log file. This occasionally resulted in concurrency issues with accessing the log file. It appeared that log4net could not properly handle this and got stuck while writing to the log file.
This log file was configured with a RollingFileAppender. Since we also used AdoNetAppender, we decided to remove file logging altogether.
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler. They had Time Elapsed of hundreds of seconds.
I also activated the Failed Requests Tracing with a rule for long running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed requests all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each request. When debugging in my development environment, at one point, I started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception)
log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances, and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work in the acceptance environment, so the setting may well have been set to true then, causing the issue to start.
Another indication that log4net is involved: when the issue last occurred, I realized that logging at the standard level only occurs in some POST requests. I found a POST request that does not log, and requests to it were handled normally while the other POST requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.
The fact that an IIS recycle fixes the issue doesn't mean that this is an IIS issue, because every ASP.NET application runs in the .NET runtime. It is only an IIS issue if the request is shown to hang in an IIS module.
So you may need to wait for the issue to happen again, then create a Failed Request Tracing rule on time-taken. That will tell us whether the hang occurs in an IIS pipeline module or in the .NET runtime.
If all the requests hang in the .NET runtime, you may have to capture a hang dump and do a deep analysis with WinDbg and the MEX extension. That will tell us what is happening there.
In case of TL;DR - I basically need guidance regarding what tools are available to debug requests which are issued to IIS and which stall inside a module.
I have a problem with an old ASP 2.0 app at the moment whereby it will periodically become unavailable and recycling the app pool (horrible as that may be) doesn't bring it back up 100% of the time.
So first of all it presents itself as requests entering the app pool and being trapped in state 'BeginRequest' in RewriteModule.
It is not a specific request which is always the first to experience this issue. The issue cannot be easily recreated either.
Eventually requests join this backlog and when it becomes 70+ deep the app pool fails to respond to pings from WAS and it forcibly recycles. Predictably it doesn't stop on-time and the old app pool is forced to stop. When the new app pool comes up it either works just fine or it instantly experiences the same issue as the outgoing one and requests begin to queue.
In issues like this all the official guidance is understandably focussed around looking at why the RewriteModule may choke.
I have validated my redirections and though complex there are no obvious issues with syntax (XML validates).
Likewise in inetmgr loading up the URL Rewrite Module seems to parse the configs fine and show them visually.
Basic stuff like permissions is all fine.
When the app is working normally I also used Failed Request Tracing/Logging to look at the request pipeline for a sample URL which stalled and I can confirm that there is no circular logic or weird errors presenting - the request seems to be handled just fine. This also showed me how high up the rewritemodule is invoked and from this I really don't see how the issue could be app-related as .NET isn't invoked at this point.
Annoyingly, when an app pool is experiencing this issue and I can throw in requests which just stall, Failed Request Tracing is no good, because a request has to reach the end of its journey and fail; otherwise nothing is logged.
I resorted to taking process dumps of affected w3wp.exe's and running them through DebugDiag. Unfortunately the only thing I see is that threads are open accessing the rewritemodule but precious little about what they are stuck on.
As anyone else would do I've tried to track the start of the issue back to any recently installed patches or code changes but nothing matches. Likewise this is happening on 3x servers otherwise I would try reinstalling the rewritemodule. Other sites on the same server which invoke rewritemodule are unaffected.
Has anyone else experienced issues like this - the net seems to have relatively little info in this case. Perhaps you can recommend further debugging tools or approaches for IIS which I can adapt to this scenario? This is sort of a cry for help from someone more used to Apache/Nginx - sorry for the long post.
We have a secured & authenticated WCF service which cannot use service references. Thus, we provide the interface for the contracts and open client channel manually.
We have found that as long as we open the channel only once, everything works fine. We can call several methods several times. However, if the channel is closed or just replaced with a new instance, the Login() call (which is the required first step before using the service) times out.
To make matters even more mysterious, this only happens on our production server. If I run the same project locally, I am able to log in as many times as I want. Consuming the methods from a web browser (even from a code-behind ASPX page) does not have this problem, even against the production server. We have this problem ONLY when a .NET client tries to open a client channel against the production server.
We are not even sure where to start looking. Any advice would be greatly appreciated.
UPDATE:
As per @Rene's suggestion, we turned on logging on both sides. The client's log contains an error record which is basically the same timeout error we already got via the exception. Nothing meaningful. The server's logs show service methods being invoked successfully even after the second Login(), so from the server's point of view the request is served.
Additionally, I discovered that I could not reproduce this issue on my own machine using the same test project, although it does reproduce on my developer's machine. I verified that we were on the same versions of the .NET Framework and Visual Studio. Surely it has to be a client-side problem. What could it be?
In case anyone else is looking for an answer, we finally found it: on the client side, System.Net.ServicePointManager.DefaultConnectionLimit needs to be set to a higher value. The default value is 2, but in practice this allowed only one proxy to be created and used. Setting it to 3 allowed 2 proxies to be created and used.
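A minimal sketch of that client-side setting, assuming it runs once at client startup before any channel is opened. The ClientStartup class and the value 10 are illustrative choices, not taken from the post.

```csharp
using System.Net;

public static class ClientStartup
{
    // Call once, before the first channel or proxy is created.
    public static void Configure()
    {
        // Raise the per-host limit on concurrent outbound HTTP connections.
        // 10 is an arbitrary example value.
        ServicePointManager.DefaultConnectionLimit = 10;
    }
}
```

The limit applies to connections created after it is set, which is why the sketch sets it up front rather than just before the second login.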
I am working on solving a problem that I have had for a couple of days now. Every time one of my sites is rebuilt or the AppPool is recycled, the first page load hangs forever (well, I've only waited up to 30 minutes). It is only happening on one particular site out of ~10 sites. It is an ASP.NET site.
Here are the things I have observed:
In IIS Manager under Worker Processes I can see the request. Verb = GET, State = ExecuteRequestHandler, Module Name = ManagedPipelineHandler. Time Elapsed just keeps increasing, of course.
If I close down the browser in which I made the initial request and then open a new one to make another request, the page will load instantly.
In my code the Application_Start of my Global.asax file is not called on the first request. It is called on the second request.
The worker process is causing the memory usage on my machine to go through the roof.
I'm inexperienced in troubleshooting IIS, but hours and hours of searching has led me nowhere.
The only major code change we have made on the site recently is that we have started implementing logging using log4net. I have tried removing all log4net code, both from my web.config file and from Global.asax, but still no luck.
Has anyone else experienced this and if so how did you solve it?
Any and all help will be much appreciated.
ADD:
If I place a .txt file in the root of the site and load that as the first thing after a build it will load instantly.
However, the worker process still acts exactly as before and the memory usage still goes through the roof.
Final edit:
I feel like such an idiot. I can't explain why, but for some reason my break points in Global.asax suddenly got hit and I was able to identify the problem. It was a call to a database via Entity Framework that was badly written - i.e. the filtering was done after all the rows from the column in question had been fetched. And to make it worse, the filtering was done inside a foreach loop. Anyway, now everything is back to normal and I'm happy.
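The exact query is not shown in the post, so the following is only a sketch of the general pattern described: filtering in a loop after everything has been fetched, versus filtering in the query itself. Order, Status and OrderQueries are made-up names, and the IQueryable parameter stands in for an Entity Framework set.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Order
{
    public int Id { get; set; }
    public string Status { get; set; }
}

public static class OrderQueries
{
    // Slow shape: ToList() materializes every row first,
    // and the filter then runs in memory inside the loop.
    public static List<Order> OpenOrdersSlow(IQueryable<Order> orders)
    {
        var result = new List<Order>();
        foreach (var order in orders.ToList())
        {
            if (order.Status == "Open")
            {
                result.Add(order);
            }
        }
        return result;
    }

    // Faster shape: the filter is part of the query, so a database-backed
    // provider can translate it into a WHERE clause before fetching rows.
    public static List<Order> OpenOrdersFast(IQueryable<Order> orders)
    {
        return orders.Where(o => o.Status == "Open").ToList();
    }
}
```

Both methods return the same rows; the difference is how much data is pulled into the worker process first, which would fit the memory growth described above.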
Possibly stating the obvious but you haven't got any silly code in your global asax in the app_start that could be causing this?
Sounds like an infinite loop or something?
Just a quick note on what happened in my case:
Neither Process Monitor nor Failed Request Tracing was of any help. The website simply loaded (nearly) forever.
Finally, after waiting for several minutes an error occurred stating that it "cannot locate the network path".
The reason was that I had entered a connection string pointing to a non-existent SQL Server instance, so it somehow kept searching for the server. Finally, a timeout occurred.
The solution was to simply specify the correct SQL Server in the connection string inside Web.Config.
I won't get into specifics unless I need to, but I have an app which loads 8 widgets in what should be an async manner. Instead, though, only 3 or so of them load asynchronously, and the others end up getting queued and wait for the first ones to finish. Each of the widget actions makes at least one web service call, so that is a factor too. I assume it's just a thread or request limitation from the browser, or IIS, or whatever. The problem, though, is that IIS isn't freeing up the threads from the first few widgets to be used for the other ones. Even after the first few are totally done loading, there seems to be only one usable thread.
I am testing on a Windows 7 machine with IIS 7.
If any more info is needed, let me know.
Apparently it was a session lock issue. Readonly session state fixed this.
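The post does not say how read-only session state was enabled. Assuming the widgets are ASP.NET MVC actions (an assumption), one way is the SessionState attribute on the controller, sketched below; WidgetController, Load and the "user" session key are made-up names. In Web Forms, the counterpart is EnableSessionState="ReadOnly" in the @ Page directive.

```csharp
using System.Web.Mvc;
using System.Web.SessionState;

// Read-only session access: requests for the same session no longer take
// the exclusive session lock, so they can run concurrently.
[SessionState(SessionStateBehavior.ReadOnly)]
public class WidgetController : Controller
{
    public ActionResult Load(int id)
    {
        // Reading from the session still works in read-only mode.
        object user = Session["user"];
        ViewBag.WidgetId = id;
        ViewBag.User = user;
        return PartialView();
    }
}
```

Actions that need to write to the session would have to live in a separate controller that keeps the default behavior.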