IIS startup delay with ASPX pages - ASP.NET

Environment: Windows Server 2003; IIS 6, ASP.NET 2.0.50727
I'm going crazy with a brand-new web server that we set up (note that this problem doesn't happen on our other web servers, which have the same configuration). When loading an ASP.NET app for the first time, the page hangs for over a full minute before it appears in the browser. After the first page loads, everything runs very quickly.
Note 1: You will probably say that the application is being compiled for the first time, but I've ruled that out. I put trace messages EVERYWHERE in the app, and all of them are written within a second of requesting the page. Thus, the app compiles and runs immediately. But after the app finishes rendering the page and my last trace message is printed, nothing happens. IIS is doing something behind the scenes for a full minute before sending the finished page over HTTP to the user's browser.
Note 2: We found that after the first request succeeds and things run fine, if we wait an hour, we get the delay again. Thus, IIS has something in its cache that it clears out after an hour, causing our site to stall again.
Note 3: Between each test we stop/start IIS to force it to hang upon loading the app.
Note 4: We watched the Task Manager to see if IIS was spiking and taking up a lot of resources processing something. But that wasn't it. We did see a very quick spike to 50% immediately before the browser showed the page, but for the previous 60 seconds there was only 1% usage on the server.
Note 5: In another test I created a HelloWorld.html page, and it does not cause IIS to hang. Thus, the problem has something to do with calling the ASP.NET library the very first time it sends a rendered page over HTTP. Also, since the app has already been compiled and runs instantly, it's just the part of ASP.NET that sends the rendered page to the user's browser that causes the delay.
Any ideas? We are at a loss here. All of our other web servers are set up the same way and work fine, but this is a new install, so there must be a configuration setting that was missed, or maybe something needs to be installed.
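To confirm Note 1 from the client side, a small probe can separate the time until the response headers arrive from the time needed to read the whole body. This is a minimal Python sketch; the function name `time_request` and the 120-second timeout are my own choices, not anything from IIS or ASP.NET.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_request(url, timeout=120.0):
    """Time one GET request, split into 'headers received' and 'body read'.

    Hypothetical helper: if almost all of the time passes before the
    headers arrive, the server is stalling before it sends anything.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    start = time.perf_counter()
    try:
        conn.request("GET", path)
        response = conn.getresponse()  # returns after the status line and headers
        headers_after = time.perf_counter() - start
        body = response.read()  # then wait for the full body
        total = time.perf_counter() - start
        return {
            "status": response.status,
            "headers_after": headers_after,
            "total": total,
            "bytes": len(body),
        }
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    result = time_request(sys.argv[1])
    print(f"{result['status']}: headers after {result['headers_after']:.2f}s, "
          f"total {result['total']:.2f}s, {result['bytes']} bytes")
```

With ASP.NET's default response buffering, headers and body typically arrive together, so on the problem server a large `headers_after` on the first request would match the stall described above.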
Thanks,
Brian

If you have access to the servers, make sure that app pool recycling is actually logged to the event logs:
cscript adsutil.vbs get w3svc/AppPools/DefaultAppPool/LogEventOnRecycle
You can set it to log everything with:
cscript adsutil.vbs Set w3svc/AppPools/DefaultAppPool/LogEventOnRecycle 255
See more here
Then check if there were any recycles.
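The value 255 works because LogEventOnRecycle is a bitmask with one bit per recycle reason. The sketch below decodes a value; the flag names and bit values are as I recall them from the IIS documentation (Time, Requests, Schedule, Memory, IsapiUnhealthy, OnDemand, ConfigChange, PrivateMemory), so treat them as an assumption and verify them for your IIS version.

```python
from enum import IntFlag


class RecycleEvent(IntFlag):
    # Bit values as recalled from the IIS LogEventOnRecycle documentation;
    # treat them as an assumption and verify for your IIS version.
    TIME = 1
    REQUESTS = 2
    SCHEDULE = 4
    MEMORY = 8
    ISAPI_UNHEALTHY = 16
    ON_DEMAND = 32
    CONFIG_CHANGE = 64
    PRIVATE_MEMORY = 128


def decode(value):
    """Return the names of the recycle reasons enabled in a LogEventOnRecycle value."""
    return [flag.name for flag in RecycleEvent if flag & value]


if __name__ == "__main__":
    print(decode(255))  # all eight reasons, i.e. "log everything"
```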
App initialization (creating the worker process and its threads, loading the app domain and all the referenced DLLs) can take some time, and that's normal, but a one-minute delay is probably something else.
Try precompiling the app on the server and see if that helps:
aspnet_compiler -m /LM/W3SVC/[site id]/Root/[your appname]
If you want to dig deeper, you can use Event Tracing for Windows (ETW). First list the available providers:
logman query providers
Save the IIS/ASP.NET-related GUIDs to a file such as iisproviders.txt, then start the trace:
logman start ExampleTrace -pf iisproviders.txt -ets -rt
Reproduce the problem, then read the trace with LogParser and stop the session:
LogParser "SELECT * FROM ExampleTrace" -i:ETW
logman stop ExampleTrace -ets
You can find more in "Troubleshooting appdomain restarts and other issues with ETW tracing".
I would also check w3wp.exe with Process Explorer (procexp) for a TCP connection timeout, or with Process Monitor (Procmon) for other clues.
If you have experience with WinDbg, you can make a request to the app, then quickly attach the debugger to the worker process:
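Picking the IIS/ASP.NET providers out of the `logman query providers` listing by hand is error-prone. A minimal Python sketch, assuming the listing has one provider per line with the name followed by a `{GUID}`, and that the keyword list below matches the provider names on your server. The `{GUID} flags level` line format and the default flags/level values are assumptions; check them against the logman documentation.

```python
import re

# Assumed keywords; check the actual provider names in your listing.
KEYWORDS = ("IIS", "ASP.NET", "HTTP Service")

# "<provider name>   {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"
LINE = re.compile(r"^(?P<name>.+?)\s+(?P<guid>\{[0-9A-Fa-f-]{36}\})\s*$")


def select_providers(listing, keywords=KEYWORDS):
    """Return (name, guid) pairs whose name contains one of the keywords."""
    selected = []
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # header, separator or status line
        name = match.group("name").strip()
        if any(k.lower() in name.lower() for k in keywords):
            selected.append((name, match.group("guid")))
    return selected


def to_provider_file(providers, flags="0xFFFFFFFF", level=5):
    """Format one '{GUID} flags level' line per provider for logman -pf.

    The flags/level defaults are assumptions (all keywords, verbose level).
    """
    return "".join(f"{guid} {flags} {level}\n" for _, guid in providers)


if __name__ == "__main__":
    import subprocess

    listing = subprocess.run(["logman", "query", "providers"],
                             capture_output=True, text=True).stdout
    with open("iisproviders.txt", "w") as f:
        f.write(to_provider_file(select_providers(listing)))
```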
windbg -p [process id of the app pool]
.loadby sos mscorwks
g
and take it from there. If there are exceptions, a process crash, etc., you should be able to catch them.
We once had a weird server issue like this, and reinstalling .NET solved the problem; we're still not sure what the culprit was.

It could be some aspnet.config settings on this box that differ from the others. Have you tried copying their config files over to this server? There appear to be certificate options, along with registry modifications, that you can use to remove some lag during the initial load of a page (precompiling aside).
See here and here

One thing you might want to check is whether there is any database access during your page load. That might be blocking the creation of the page during the initial load. Once the query results are cached (either by the database engine or by another cache mechanism such as memcached), subsequent page loads work as normal.
Regarding your last comment:
I could stop/start IIS multiple times and the app always ran instantly. I thought it was fixed for good. But now I just tried again (it has been sitting idle for the past couple of hours) and now it is back to hanging on the first request.
This could mean that the cache has expired and thus needs to hit the database once again, causing the delay in page load.
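The pattern described here (slow first request, fast afterwards, slow again once the cached data expires) can be illustrated with a time-to-live cache. This is a generic Python sketch, not ASP.NET's cache; `TtlCache`, `loader` and the one-hour TTL used in the check are hypothetical, chosen to mirror the "every hour" pattern in Note 2.

```python
import time


class TtlCache:
    """Cache entries for ttl_seconds; a miss calls the (slow) loader."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry can be tested
        self._entries = {}   # key -> (stored_at, value)

    def get(self, key, loader):
        """Return (value, hit); hit is False when the loader had to run."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], True
        value = loader(key)  # e.g. the expensive database query
        self._entries[key] = (now, value)
        return value, False
```

Every request that arrives after the entry has expired pays the full cost of `loader` again, which is why the delay would reappear after an idle period.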

Related

Some requests on IIS hang for minutes and end in a lost connection

I have an awkward issue with IIS 10.0 on Windows Server 2016 and ASP.Net 4.5.2 and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. When this issue occurs, other GET requests receive a swift response and a correct result. Normally, POST requests do not take long to process, typically less than 3 seconds.
Recycling the associated worker process will make the issue go away, for hours or days.
When I inspected the web server today while the issue was occurring, I saw little CPU usage (less than 10%), memory at 56%, and the worker process at a modest 615 MB. These requests appeared neither in the W3C log nor in my custom application logs.
I added the Web-Request-Monitor feature as described in "How do I see currently executing web requests on IIS 8", but in doing so the worker process probably got recycled, as the issue is not currently occurring.
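As far as I know, IIS writes a W3C log entry only when a request finishes, so a request that is still hanging has no entry yet. Once such requests do complete, the `time-taken` field (in milliseconds) makes them easy to find. A minimal Python sketch, assuming the log's `#Fields:` directive includes `time-taken`; `slow_requests` and the 60-second threshold are my own choices.

```python
def parse_w3c(lines):
    """Yield one dict per entry of a W3C extended log, keyed by #Fields names."""
    fields = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split(" ")
            if len(values) == len(fields):  # skip truncated lines
                yield dict(zip(fields, values))


def slow_requests(lines, threshold_ms=60_000):
    """Return entries whose time-taken (milliseconds) is at least threshold_ms."""
    slow = []
    for entry in parse_w3c(lines):
        taken = entry.get("time-taken", "")
        if taken.isdigit() and int(taken) >= threshold_ms:
            slow.append(entry)
    return slow


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for entry in slow_requests(log):
            print(entry.get("time"), entry.get("cs-method"),
                  entry.get("cs-uri-stem"), entry.get("time-taken"))
```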
There are a reverse proxy and an access manager between the internet and my web server. I suppose they can have something to do with this issue, but it certainly is related to IIS, as recycling helps.
All of this is happening on an acceptance web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there is very little traffic from other clients, if any at all.
What could be next steps to investigate this issue further?
Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains accessing the same log file. This occasionally resulted in concurrency issues with accessing the log file. It appeared that log4net could not properly handle this and got stuck while writing to the log file.
This log file was configured with the RollingFileAppender option. Since we also used AdoNetAppender, we decided to remove file logging altogether.
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler. They had Time Elapsed of hundreds of seconds.
I also activated the Failed Requests Tracing with a rule for long running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed requests all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each request. While debugging in my development environment, I at one point started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception)
log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
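Why a single instance lock can stall every request is easy to reproduce outside log4net. The Python sketch below models the pattern only (it is not log4net's code): `LockedAppender` and `demo` are hypothetical, and a write that blocks on an event stands in for a write stuck on a contended log file.

```python
import threading


class LockedAppender:
    """Hypothetical appender: every append takes the same instance lock."""

    def __init__(self, write):
        self._lock = threading.Lock()
        self._write = write

    def do_append(self, message, timeout=None):
        """Write under the lock; with a timeout, give up instead of blocking forever."""
        acquired = self._lock.acquire(timeout=-1 if timeout is None else timeout)
        if not acquired:
            return False  # a real request thread would simply keep waiting here
        try:
            self._write(message)
            return True
        finally:
            self._lock.release()


def demo(wait=0.2):
    release = threading.Event()
    started = threading.Event()
    written = []

    def stuck_write(message):
        written.append(message)
        if message == "first":
            started.set()
            release.wait()  # simulate a write stuck on a shared log file

    appender = LockedAppender(stuck_write)
    holder = threading.Thread(target=appender.do_append, args=("first",))
    holder.start()
    started.wait()  # the first "request" now holds the lock
    blocked_ok = appender.do_append("second", timeout=wait)
    release.set()   # the stuck write finally completes
    holder.join()
    after_ok = appender.do_append("third", timeout=wait)
    return blocked_ok, after_ok, written
```

While the first write is stuck, every other caller of `do_append` waits on the same lock, which matches requests piling up on a logging statement.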
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances, and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work in the acceptance environment, so the setting may very well have been set to true then, causing the issue to start.
Another indication that this is happening in log4net: when the issue last occurred, I realized that logging at the standard level only occurs in some POST requests. I found a POST request that does not log, and requests to it were handled normally, while the other POST requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.
The fact that an IIS recycle fixes the issue doesn't mean that it is an IIS issue, because all ASP.NET applications run in the .NET runtime; that conclusion only holds if it is proven that the request hangs in an IIS module.
So you may need to wait for the issue to happen again, then create a Failed Request Tracing rule for time taken. It will tell us whether the issue is happening in an IIS pipeline module or in the .NET runtime.
If all requests hang in the .NET runtime, you may have to capture a hang dump and do a deep analysis with WinDbg and the MEX extension. It will tell us what's happening there.

Debugging requests which are 'stuck' in an IIS worker process

In case of TL;DR - I basically need guidance regarding what tools are available to debug requests which are issued to IIS and which stall inside a module.
I have a problem with an old ASP 2.0 app at the moment whereby it will periodically become unavailable and recycling the app pool (horrible as that may be) doesn't bring it back up 100% of the time.
So first of all it presents itself as requests entering the app pool and being trapped in state 'BeginRequest' in RewriteModule.
It is not a specific request which is always the first to experience this issue. The issue cannot be easily recreated either.
Eventually requests join this backlog, and when it is 70+ deep, the app pool fails to respond to pings from WAS and is forcibly recycled. Predictably, it doesn't stop on time, and the old app pool is forced to stop. When the new app pool comes up, it either works just fine or instantly experiences the same issue as the outgoing one, and requests begin to queue.
In issues like this all the official guidance is understandably focussed around looking at why the RewriteModule may choke.
I have validated my redirections, and though they are complex, there are no obvious syntax issues (the XML validates).
Likewise in inetmgr loading up the URL Rewrite Module seems to parse the configs fine and show them visually.
Basic stuff like permissions is all fine.
When the app is working normally I also used Failed Request Tracing/Logging to look at the request pipeline for a sample URL which stalled and I can confirm that there is no circular logic or weird errors presenting - the request seems to be handled just fine. This also showed me how high up the rewritemodule is invoked and from this I really don't see how the issue could be app-related as .NET isn't invoked at this point.
Annoyingly, when an app pool is experiencing this issue and I can throw in requests that just stall, Failed Request Tracing is no good, because a request actually needs to reach the end of its journey and fail; otherwise it refuses to log anything.
I resorted to taking process dumps of affected w3wp.exe's and running them through DebugDiag. Unfortunately the only thing I see is that threads are open accessing the rewritemodule but precious little about what they are stuck on.
As anyone else would, I've tried to trace the start of the issue back to recently installed patches or code changes, but nothing matches. Likewise, this is happening on three servers; otherwise I would try reinstalling the rewrite module. Other sites on the same server that invoke the rewrite module are unaffected.
Has anyone else experienced issues like this - the net seems to have relatively little info in this case. Perhaps you can recommend further debugging tools or approaches for IIS which I can adapt to this scenario? This is sort of a cry for help from someone more used to Apache/Nginx - sorry for the long post.

ASP.Net timeout processing file after upload

We have an ASP.NET web application on IIS7 that is used to upload Excel files and then load them into a SQL database by running jobs on the SQL server. The app waits until the job completes, then shows the user a message. Because some larger files are being used, the app is throwing the error below.
Network Error (tcp_error)
A communication error occurred: "" The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time. For assistance, contact your network support team.
The app uses an asp:View to progress through various steps. I have tried bumping the session timeout and httpRuntime executionTimeout values to account for how long the job takes to run, but it does not appear to have any effect. I know the job completes, but the app isn't showing that feedback to the user. I think the error is thrown when the app hits the logic to display the view showing all the final messages.
I can only guess that a) there is another timeout setting I'm not aware of, b) another config file setting is overruling my web.config setting for the app, or c) the asp:View is counting all the various steps as one long process and not resetting the "clock" as each step is completed.
As I said, the file uploads fine and the job completes fine; the app just can't advance to the last step, where it shows the user the final view. Any ideas on what I can look for to fix this issue? My only other option would be to rewrite the app to not wait for the job to finish and to notify the user some other way.
Update 1
After further testing, it appears the error comes from the custom ASP.NET code we created that does a SQL bulk copy, not from running the SQL job. The current test runs for around 220 seconds locally but times out on a test server.
Update 2
After more research, I'm inclined to think user pevgeniev is correct and this is just a limiting factor of the browser. The only thing that prevents me from marking this as answered is that I don't know why file uploads don't appear to have the same issue.
If you're getting this error in the browser, then the timeout is on the client side, and there isn't much you can do server side. As you've suggested, you could rewrite the app so that it polls for the result from the client, rather than expecting to finish the task in a single request.
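The polling approach keeps each HTTP request short: one request starts the work and returns an ID, and the page then asks for the status until it is done. A minimal, framework-free Python sketch of the idea; `JobRunner`, `start`, `status` and `poll` are hypothetical names, and in the real app `status` would sit behind a small endpoint that the page calls every few seconds.

```python
import threading
import time
import uuid


class JobRunner:
    """Run long jobs in the background and report their state by ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def start(self, work):
        """Start work() on a background thread and return its job ID at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"state": "running", "result": None, "error": None}

        def run():
            try:
                update = {"state": "done", "result": work()}
            except Exception as exc:  # report failures instead of losing them
                update = {"state": "failed", "error": str(exc)}
            with self._lock:
                self._jobs[job_id].update(update)

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def status(self, job_id):
        """Return a snapshot of the job's state; each call is a short request."""
        with self._lock:
            return dict(self._jobs[job_id])


def poll(runner, job_id, interval=0.05, max_wait=5.0):
    """Client side: ask for the status until the job leaves 'running'."""
    deadline = time.monotonic() + max_wait
    state = runner.status(job_id)
    while state["state"] == "running" and time.monotonic() < deadline:
        time.sleep(interval)
        state = runner.status(job_id)
    return state
```

Because no single request lasts longer than one status check, a proxy or browser timeout on long requests no longer applies to the bulk copy itself.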

Re-enable ASP.NET session that caused IIS hang

I'm trying to implement some fail safes on a client's web server which is running two of their most important sites (ASP.NET on IIS7). I'm going to set up application pool limiting so that if any w3wp process uses 90%+ CPU for longer than a minute then it gets killed (producing a temporary 503 Service Unavailable message to any visitors), and based on my local testing will be restarted within a minute - a much better solution than having one CPU-hogging process taking down the whole server for any length of time.
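In IIS 7 this policy maps to the application pool's `cpu` settings. As far as I know, `limit` is expressed in 1/1000ths of a percent (so 90% is 90000), `resetInterval` is the measurement window, and `action="KillW3wp"` kills the worker process. The Python sketch below only builds the `appcmd` command line; the pool name is hypothetical, and the attribute names should be checked against the IIS configuration reference before running it.

```python
import subprocess

APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"  # default install location


def cpu_limit_args(pool_name, percent, interval_minutes):
    """Build appcmd arguments that kill the pool's w3wp when it exceeds percent CPU."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    limit = round(percent * 1000)  # cpu.limit is in 1/1000ths of a percent
    hours, minutes = divmod(int(interval_minutes), 60)
    return [
        APPCMD, "set", "apppool", f"/apppool.name:{pool_name}",
        f"/cpu.limit:{limit}",
        "/cpu.action:KillW3wp",
        f"/cpu.resetInterval:{hours:02d}:{minutes:02d}:00",
    ]


if __name__ == "__main__":
    # "ClientSitePool" is a hypothetical pool name.
    subprocess.run(cpu_limit_args("ClientSitePool", 90, 1), check=True)
```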
This seems to work, however during my fiddling on my local IIS7 instance I've noticed that if a request calls my "Kill.aspx", even when the site comes back up IIS will not serve the session that caused it to hang. I can only restart the test site from a different session - but as soon as I clear my cookies on the "killer" browser I can get to the site again.
So, whatever malicious behaviour IIS is trying to curb with this would not work against an even slightly determined opponent. In most cases, if excrement does hit fan it will be coding/configuration error and not the fault of the user who happened to request a page at that time.
Therefore, I'd like to turn this feature off as the theoretical user would have no idea that they need to clear their cookies before they can access the site again. I would really appreciate any ideas on how this might be possible.
You should be using ASP.NET session state in StateServer mode instead of InProc (see MSDN for details). That way, your session will live in a different process and won't be affected by an IIS crash.
Turn what "feature" off? If the worker process is reset (and you're using in-proc session state), the session is blown away on the reset.
You might want to investigate moving your session storage to a state server or some other out of process scenario.
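The difference between in-process and out-of-process session storage can be modelled in a few lines. This Python sketch is only an analogy: `ExternalSessionStore` stands in for the ASP.NET State Service or a SQL Server session store, and creating a new `Worker` stands in for IIS restarting w3wp.exe.

```python
class ExternalSessionStore:
    """Stand-in for an out-of-process store that outlives worker processes."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, values):
        self._data[session_id] = dict(values)


class Worker:
    """Stand-in for a worker process; in-proc sessions die with it."""

    def __init__(self, store=None):
        self._store = store
        self._in_proc = {}  # lost whenever this Worker is replaced

    def handle(self, session_id, key=None, value=None):
        if self._store is not None:
            session = self._store.load(session_id)
        else:
            session = self._in_proc.setdefault(session_id, {})
        if key is not None:
            session[key] = value
        if self._store is not None:
            self._store.save(session_id, session)
        return dict(session)
```

Out-of-process storage requires session values to be serializable, which is worth checking before switching modes.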
Also, you might want to set the application pool to use several worker processes (aka: web garden) this way if one process is killed the others continue serving content.
Next, as another option you might want to set up multiple web servers and load balance them.
Finally, you might want to profile the app to see exactly how they are causing it to spin into nothingness. My guess is that there are a number of code issues you are simply covering up with this idea.

Mysterious IIS Problem: Site stops serving dynamic pages, no errors in logs

This may be the most mysterious problem I've ever encountered.
We have an IIS7 install with 3 Web Sites on it, each with its own Application Pool. Once a day, for about an hour, a specific one of them goes down.
What I mean by "goes down" is:
It stops responding to requests for dynamic pages (ex. default.aspx) but will serve static files fine (logo.png).
Wireshark tells me that these dynamic page requests actually return HTTP 500 Internal Server Error responses, but in the browser I don't see an error; I just see the browser spinning.
If I log on locally to the box and surf around everything runs fine. All the pages pull up, so the database is being queried. It all seems perfectly normal.
There are no errors in the event log.
There are no errors recorded that have been captured by our internal (Application-level) error logging.
The basic IIS log file, which I thought logged every request, shows no record of these requests coming in.
And, if I restart the App Pool for the Web Site, everything comes back immediately. Or, if I just wait an hour or so, it comes back.
So, I've ruled out:
DNS issues, since I have no problem terminal servicing into the box by hostname.
Database issues, since the site works fine when I'm local to the box and surfing around
HTTP firewall issues, since I'm seeing the requests in wireshark, and am even getting images to serve up.
I have to assume it's a problem with my application, but IIS doesn't even show that these requests ever happened, and nothing in IIS or my app is logging errors.
It also doesn't go down at the same time each day. This started at night (around midnight) and seems to have gradually moved its daily time by an hour or so, to the point where it now hits at 9 AM.
Any clues you might have for further troubleshooting would be greatly appreciated.
Tom
I'd fire up Performance Monitor and look at requests and exceptions being thrown. Not a whole lot of value in my answer, but it might start pointing you in the right direction.
Actually, check the event logs first, see if something is throwing errors. Also, check memory usage and paging.
