ASP.NET script combiner returns blank response at times - asp.net

I am using a script manager for ASP.NET MVC to combine and compact CSS files and JavaScript files for pages on a website. For the most part this works as expected, however there are times (couple of times per week) when the HTTP handler responsible for returning the content returns an empty response and so pages load without any CSS - the HTML returns and the images load as well. When this happens, refreshing the page does not resolve the problem, while resetting IIS always resolves the problem. Also, without resetting IIS, after some time the problem stops.
Normally the script handler logs errors, however there are no errors logged during the issue. It seems as if the handler is never invoked. There are no failed request logs generated by IIS.
I monitored resource usage when this was happening and did not notice anything out of the ordinary. The web server is running IIS 7 and has low CPU usage. I increased some parameters in IIS settings regarding the number of allowable requests to process, the problem still exists though perhaps less frequent. The website receives about 1.5 million pageviews monthly.

This was found to be an issue with HTTP cache headers.

Related

A potential CEF cache corruption scenario

We have a .NET WPF container app in which we host several web apps using CEFSharp.WinForms control. At times, we see that for some users, some JavaScript resource requests fail with the ERR_CONTENT_DECODING_FAILED error message. This issue gets resolved if we reload the app after either clearing the CEF cache or after disabling the cache from the network tab in the developer toolbar window. Please note that this issue isn't confined to a specific subset of resource files: instead, we have seen it happening sporadically for a variety of JavaScript resource files (some hosted on Apache while the others hosted on IIS servers).
While a possible cause for usual ERR_CONTENT_DECODING_FAILED error is a server-side content-encoding issue, in this specific case, we believe this could potentially be related to the CEF browser caching. Please see the analysis section below for the reasons we believe so.
Application Setup
When we initialize CEF settings, we set MultiThreadedMessageLoop setting to true and set CachePath property to a location under %localappdata% on windows 10 machine. When the container app starts, it creates three CEF web browser controls and launches web apps in them. All three apps load concurrently. After that, more CEF web browsers are created as the user visits more apps. The user also reloads some of these apps over time. All the web apps are internal apps sharing the same domain but physically hosted on different web servers. The JavaScript resource files in question usually have caching policy set to allow them to be cached for a week.
CEFSharp version - 79.1.360.0
CEF version - r79.1.36+g90301bd+chromium-79.0.3945.130
Chromium version - 79.0.3945.130
Our Analysis so far
We checked the web-server logs for the failing JavaScript resources. We observed that in most cases, the server requests for those resource files (by the impacted user) were made a few days ago. The users are usually able to use the application well for some days before they sporadically start getting this error.
We checked the network logs (*.HAR file). We see that for the failing JavaScript resource, _transferSize is 0 (which seems to indicate that response was served from the cache)
When the error occurs, it gets resolved when we reload the app after either clearing the cache or disabling the cache from the network tab.
We tried artificially simulating this error. We used Fiddler's autoresponder feature to deliberately respond with a bad server response (the content was 'gzip' encoded however Content-Encoding header indicated 'br'). We could simulate the ERR_CONTENT_DECODING_FAILED error, however, we could see in network logs that _tranferSize was a non-zero value. We also observed that chrome did not cache the bad response. This test indicates that when the original JavaScript response was cached by the browser, it must have been a correctly encoded response, or else the browser would not have cached it.
All of the above points lead us to believe that, JavaScript resource files were downloaded (with correct encoding) and cached in CEF cache. The user was also able to use the apps for some time. After that however, in certain scenarios, some of these files potentially got corrupted in CEF cache, leading to the content decoding error.
We tried using CEF response filter mechanism as explained here to capture the bad response when content decoding error occurs. Unfortunately, we observed that dataIn stream which gets passed to filter function is null when the response fails with this error.
Summary and Questions
This is a sporadic issue which our users are facing. We haven't found a way to deterministically recreate this problem. However based on our analysis so far, we believe some JavaScript files may be getting corrupted in CEF cache over time. We are not sure if the fact that we host several CEF web browsers and load them concurrently could be playing some role in causing this issue.
Has anyone else observed/reported a similar issue? Do you have any idea if we are missing or overlooking something here or going in the wrong direction? Any pointers will be greatly appreciated.

Some requests on IIS hang for minutes and end in a lost connection

I have an awkward issue with IIS 10.0 on Windows Server 2016 and ASP.Net 4.5.2 and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. When this issue occurs, other GET requests will receive a swift response and a correct result. Normally, POST-request do no take long to be processed, typically less than 3 seconds.
Recycling the associated worker process will make the issue go away, for hours or days.
When today inspected the web server when the issue was going on, I saw little CPU usage, less than 10%, memory 56%, the worker process a modest 615 MB. I saw neither logging in the W3C log of these requests, nor in my custom application logs.
I added the Web-Request-Monitor conform How do I see currently executing web request on IIS 8, but in doing so, the the worker process probably got recycled, as the issue is not currently occurring.
There are a reverse proxy and an access manager between the internet and my web server. I suppose they can have something to do with this issue, but it certainly is related to IIS, as recycling helps.
All of this is happening on a acceptation web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there will be very little traffic from other clients, if none at all.
What could be next steps to investigate this issue further?
Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains accessing the same log file. This occasionally resulted in concurrency issues with accessing the log file. It appeared that log4net could not properly handle this and got stuck while writing to the log file.
This log file was configured with the RollingFileAppender option. Since we also used AdoNetAppender, we decided to remove file logging all together.
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler. They had Time Elapsed of hundreds of seconds.
I also activated the Failed Requests Tracing with a rule for long running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed request all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each requests. When debugging in my development environment, at one point, I started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent) log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent) log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent) log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception) log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work, in the acceptance environment, so the setting may very well have been set to true then, causing the issue to start.
Another indication that this is happening with log4net is that when the issue last occurred, I realized that logging of level standard, only occurs in some POST requests. I found a POST-request that does not log and requests to it where handled normally, while the other POST-requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.
IIS recycle fix this issue doesn't mean that this is an IIS issue because all asp.net application run in .net runtime unless it is proved that the request is hang in IIS module.
So you may need to wait this issue happen again, then create a Failed request tracing rule for time-taken. Then it will tell us this issue is happening on IIS pipeline module or .net runtime.
If all request hang in .net runtime. Then you may have to capture a hang dump and do a deep analysis via WINDGB and mex extension. It will tell us what's happening there.

iis startup delay with aspx pages

Environment: Windows Server 2003; IIS 6, ASP.NET 2.0.50727
I'm going crazy with a brand new web server that we set up (note that this problem doesn't happen on our other web servers which have the same configuration). When loading and asp.net app the first time, the page hangs for over a full minute before showing the page in the browser. After it loads the first page, everything runs very quickly.
Note 1: You will probably say that the application is being compiled for the first time. But I've ruled that out. I put trace messages EVERYWHERE in the app and all the trace messages run within a second of requesting the page. Thus, the app compiles and runs immediately. But when the app is finished rendering the page and my last trace message is printed, nothing happens. IIS is doing something behind the scenes for a full minute before transferring the finished page along http to the user's browser.
Note 2: We found that after hitting the app the first time and things run fine, if we wait an hour then we get the delay again. Thus, IIS has something in its cache that it clears out after an hour and causes our site to stall again.
Note 3: Between each test we stop/start IIS to force it to hang upon loading the app.
Note 4: We watched the Task Manager to see if IIS was spiking and taking up a lot of resources processing something. But that wasn't it. We did see a very quick spike to 50% immediately before the browser showed the page, but for the previous 60 seconds there was only 1% usage on the server.
Note 5: On another test I created a HelloWorld.html page and this does not cause IIS to hang. Thus, it has something to do with calling the ASP.NET library the very first time it sends a rendered page across http. Also, since the app has already been compiled and runs instantly, it's just the part of asp.net that sends the rendered page to the user's browser that causes the delay.
Any ideas? We are a a loss here. All of our other web servers are setup the same way and work fine, but this is a new install. So there must be a configuration setting that was missed or maybe something needs to be installed?
Thanks,
Brian
If you have access to the servers, then make sure that app pool recycling is actually logged to the event logs
cscript adsutil.vbs get w3svc/AppPools/DefaultAppPool/LogEventOnRecycle
you can set it to log everything with
cscript adsutil.vbs Set w3svc/AppPools/DefaultAppPool/LogEventOnRecycle 255
See more here
Then check if there were any recycles.
App initialization, creation the worker process, threads, load the app domain and all the references dll's can take some time, that's normal, but that 1 minute delay is something else probably.
Try to precompile the app on the server and see if that helps
aspnet_compiler -m /LM/W3SVC/[site id ]/Root/[your appname]
If you want to dig deeper, you can check the event trace ETW.
logman query providers
Save the IIS /ASP.NET related Guids to a file like iisproviders.txt
logman start ExampleTrace -pf iisproviders.txt -ets -rt
reproduce
LogParser "SELECT * FROM ExampleTrace" -i:ETW
logman stop ExampleTrace -ets
You can find more hereTroubleshooting appdomain restarts and other issues with ETW tracing
I would also check the w3wp.exe with procexp if it has a TCP connection time out or with Procmon for other clues.
If you have experience with windbg, then you can make a request to the app then quickly attach the debugger to the process
windbg -p [process id of the app pool]
.loadby sos mscorwks
g
and take it from there. If there are exceptions, process crash, etc you should be able to catch it...
Once we had a weird server issue like this and a .NET reinstall solved the problem, still not sure what was the culprit.
Could be some aspnet.config settings on this box that are different from others. Have you tried copying over their config files to this server? There appears to be certificate options along with registry modifications that you can do to remove some lag time during the initial load of a page (precompiling aside)
See here and here
One thing you might want to check on is if there are any database access going on on your page load. That might be blocking the creation of the page during initial page load. Then when the query is cached (either by the db engine or another cache mechanism like memcached), subsequent page loads work as normal.
As per your last comment,
I could stop/start IIS multiple times and the app always ran instantly. I thought it was fixed for good. But now I just tried again (it has been sitting idle for the past couple of hours) and now it is back to hanging on the first request.
This could mean that the cache has expired and thus needs to hit the database once again, causing the delay in page load.

Mysterious IIS Problem: Site stops serving dynamic pages, no errors in logs

This may be the most mysterious problem I've ever encountered.
We have an IIS7 install with 3 Web Sites on it, each with it's own Application Pool. Once a day, for about an hour, a specific one of them goes down.
What I mean by "goes down" is:
It stops responding to requests for dynamic pages (ex. default.aspx) but will serve static files fine (logo.png).
Wireshark tells me that these dynamic page requests are actually return HTTP 500 Internal Server errors, but in the browser, I don't see an error. I just see the browser spinning.
If I log on locally to the box and surf around everything runs fine. All the pages pull up, so the database is being queried. It all seems perfectly normal.
There are no errors in the event log.
There are no errors recorded that have been captured by our internal (Application-level) error logging.
The basic IIS log file, which I thought logged every request, shows no record of these requests coming in.
And, if I restart the App Pool for the Web Site, everything comes back immediately. Or, if I just wait an hour or so, it comes back.
So, I've ruled out:
DNS issues, since I have no problem terminal servicing into the box by hostname.
Database issues, since the site works fine when I'm local to the box and surfing around
HTTP firewall issues, since I'm seeing the requests in wireshark, and am even getting images to serve up.
I have to assume it's a problem with my application, but IIS doesn't even show that these requests ever happened, and nothing in IIS or my app is logging errors.
It also doesn't even go down at the same time each day. This started at night (#midnight) and seems that it's gradually started moving it's daily time by an hour or so, until the point now where it hit at 9AM.
Any clues you might have for further troubleshooting would be greatly appreciated.
Tom
I'd fire up performance monitor and look for requests and exceptions being thrown. Not a whole lot of value in my answer but it might started pointing you in the right direction.
Actually, check the event logs first, see if something is throwing errors. Also, check memory usage and paging.

browser timeouts while asp.net application keeps running

I'm encountering a situation where it takes a long time for ASP.NET to generate reply with the web page (more than 2 hours). It due to the codebehind running for a while (very long, slow loop).
Browser (both IE & Firefox) stops waiting for the reply (after about an hour) and gives generic cannot display webpage error (similar to what you would see if you'd try to navige to non-existing server).
At the same time asp.net app keeps going (I can see it in debugger) and eventually completes.
Why does this happen? Are there any settings in web.config to influence this? I'm hoping there's a timeout setting that I'm missing that's causing this.
Maybe a settings in IE or Firefox? But I think they wait while the server is keeping connection alive.
I'm experiencing this even when I launch app in debug mode (with compilation debug="true") on my local machine from VS (so it's not running on IIS, but on ASP.NET Dev Server).
I know it's bad that it takes so long to generate the page, but it doesn't matter at this stage. Speeding it up would take a lot of extra work and the delay doesn't really matter. This is used internally.
I realize I can redesign around this issue running logic to a background process and getting notified when it's done through AJAX, or pull it to a desktop app or service or whatever. Something along those lines will be done eventually, but that's not what I'm asking about right now.
Sounds like you're using IE and it is timing out while waiting for a response from the server.
You can find a technet article to adjust this limit:
http://support.microsoft.com/kb/181050
CAUSE
By design, Internet Explorer imposes a
time-out limit for the server to
return data. The time-out limit is
five minutes for versions 4.0 and 4.01
and is 60 minutes for versions 5.x, 6,
and 7. As a result, Internet Explorer
does not wait endlessly for the server
to come back with data when the server
has a problem. Back to the top
RESOLUTION
In general, if a page does not return within a few
minutes, many users perceive that a
problem has occurred and stop the
process. Therefore, design your server
processes to return data within 5
minutes so that users do not have to
wait for an extensive period of time.
The entire paradigm of the Web is of request/response. Not request, wait two hours, response!
If the work takes so long to do, then have the page request trigger the work, and then not wait for it. Put the long-running code into a Windows service, and have the service listen to an MSMQ queue (or use WCF with an MSMQ endpoint). Have the page send requests for work to this queue. The service will read a request, maybe start up a new thread to process it, then write a response to another queue, file, or whatever.
The same page, or a different, "progress" page can poll the response queue or file for responses, and update the user, assuming the user still cares after two hours.
For something that takes this long, I would figure out a way to kick it off via AJAX and then periodically check on it's status. The background process should update some status variable on a regular basis and store it's data in the cache or session when complete. When it completes and the browser detects this (via AJAX), have the browser do a real postback (or get by changing location.href), pick up the saved data, and generate the page.
I have a process that can take a few minutes so I spin off a separate thread and send the result via ftp. If an error occures in the process I send myself an error message including the stack trace. You may want to consider sending the results via email or some other place then the browser and use a thread as well.

Resources