I was assigned some debugging to do on an existing ASP.NET WebForms application that uses a lot of UpdatePanels.
One of the buttons launches tasks that take a very long time to run, which is expected.
The problem is a strange behavior: on Internet Explorer, the query stops exactly 5 minutes after it started, with the following error:
On IE11 with enterprise mode enabled (simulates IE8) (the website uses Enterprise Mode):
Sys.WebForms.PageRequestManagerServerErrorException: An unknown error
occurred while processing the request on the server. The status code
returned from the server was: 12019
On native IE11:
XMLHttpRequest: Network Error 0x2ef3, Could not complete the operation due to error 00002ef3
Sys.WebForms.PageRequestManagerServerErrorException: An unknown error
occurred while processing the request on the server. The status code
returned from the server was: 0
So the update panel is not refreshed, but I see that even if the query failed, the server continues to perform the tasks (writes to the database), so it's not a server timeout.
Moreover, if I perform the same action locally, on the IIS server, the query isn't interrupted.
What baffles me is that it works without any issue with Chrome (the query we're testing lasts 6 minutes)
I read a lot on the Internet but found nothing that helps. To my knowledge we don't have a load balancer (this error seems to appear mostly in reports about sites behind a load balancer).
I looked for every timeout I could find (server and client side) and increased all the values to ridiculous amounts, with no effect.
I tried to apply the changes to the registry as explained here, without success.
I tried with Fiddler, but not sure how to interpret its results and both on IE and Chrome, 5 minutes after the start of the query I get an HTTP 401 error and it asks for an authentication (we're using Windows authentication). That only happens when using Fiddler, no authentication is asked usually.
Not sure what to do with that :). Again, Chrome seems to handle it without problem.
I don't have any more ideas...
I suspect a company proxy issue, but I don't understand why it works with Chrome. Asking for configuration changes on the proxy might be complicated, so I need to be absolutely sure the proxy is the cause, and I don't know how to confirm that.
Any idea/suggestion?
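One way to gather evidence for or against a proxy idle timeout is to time the same long call from several vantage points (through the proxy, and directly on the IIS server) and check whether the failures cluster around one fixed duration. The following is only a minimal sketch of that idea in Python; `time_until_done`, `looks_like_fixed_timeout`, the 300-second value and the 5-second tolerance are all assumptions for illustration, not part of the application.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def time_until_done(request_fn: Callable[[], object]) -> Tuple[float, Optional[BaseException]]:
    """Run request_fn and return (elapsed seconds, exception or None).

    request_fn is a hypothetical callable that performs the long request,
    for example with http.client or any other HTTP client.
    """
    start = time.monotonic()
    try:
        request_fn()
        return time.monotonic() - start, None
    except Exception as exc:  # connection reset, timeout, and so on
        return time.monotonic() - start, exc


def looks_like_fixed_timeout(durations: Sequence[float],
                             suspected: float = 300.0,
                             tolerance: float = 5.0) -> bool:
    """True when at least two failures all happened near the suspected timeout."""
    return len(durations) >= 2 and all(
        abs(d - suspected) <= tolerance for d in durations
    )
```

If failures measured through the proxy consistently land near 300 seconds while the same call made on the server itself completes, that points at something between the client and the server rather than at the application.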
I'm hosting a website that serves users in regions around the world, and a weird issue came up recently.
I have already checked other posts on the Internet, including the Stack Overflow question with a lot of discussion, "Chrome net::ERR_HTTP2_PROTOCOL_ERROR 200 after a reconnect", but none of the answers helped.
The website is built on ASP.NET WebForms as a legacy "website" project (not a web application).
There is an important function that runs several processes once a user clicks a button on the website.
Let's say there are 100 lines of code in that function, and I've added some flags to log which steps have been hit and processed.
The weird situation is:
Only China users are facing the issue. (website is not hosted in China)
Some users are using Firefox, which returned the error below; in English it is "Secure Connection Failed".
According to several posts, including the Firefox documentation, there should be an error code on screen, such as
ssl_error_no_cypher_overlap, but there is nothing.
Firefox error
Some users are using other, Chrome-based browsers, which return:
Chrome error
In addition, I checked the process log for the users who reported this, and in most cases the code did not run to completion. In other words, if there are 100 lines of code, some requests just stopped at line 50.
The website has TLS 1.2 enabled, and the HTTP/2 protocol (h2) is in use according to the Chrome Network tab.
I'm wondering whether the client browser shutting down the connection for some reason could produce the result I see (execution stopping in the middle of the code flow). In my opinion, once a request has been posted to the server, the process should finish the entire flow no matter what the client does.
Any ideas or thoughts will be appreciated!
I was just dealing with that exact situation.
From what I read in various posts on the HTTP2_PROTOCOL_ERROR, I think what happens is the response is started but code problem(s) prevent the server from completing the response. The incomplete response gives the protocol error in Chrome, and, because it's over TLS, Firefox sees it as a security error. (I'd share links, but I've already closed all those windows - sorry.)
Somehow my code was preventing the server from completing the response without causing an exception.
I was able to track down the offending code by commenting out the body of every code-behind procedure on the page and then bringing them back one at a time.
Good luck to you!
I can't give you a concrete example, but in my case, there was no problem on the application side.
Has your in-house infrastructure engineer recently added any settings?
For example, have you added WAF settings? You may want to check.
FYI
I have an awkward issue with IIS 10.0 on Windows Server 2016 and ASP.Net 4.5.2 and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. While this issue occurs, other GET requests receive a swift response and a correct result. Normally, POST requests do not take long to process, typically less than 3 seconds.
Recycling the associated worker process will make the issue go away, for hours or days.
When I inspected the web server today while the issue was occurring, I saw little CPU usage (less than 10%), memory at 56%, and the worker process at a modest 615 MB. These requests appeared neither in the W3C log nor in my custom application logs.
I added Web-Request-Monitor, following "How do I see currently executing web request on IIS 8", but in doing so the worker process probably got recycled, as the issue is not currently occurring.
There are a reverse proxy and an access manager between the internet and my web server. I suppose they can have something to do with this issue, but it certainly is related to IIS, as recycling helps.
All of this is happening on an acceptance web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there will be very little traffic from other clients, if any at all.
What could be next steps to investigate this issue further?
Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains accessing the same log file. This occasionally resulted in concurrency issues with accessing the log file. It appeared that log4net could not properly handle this and got stuck while writing to the log file.
This log file was configured with the RollingFileAppender option. Since we also used AdoNetAppender, we decided to remove file logging altogether.
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler. They had Time Elapsed of hundreds of seconds.
I also activated the Failed Requests Tracing with a rule for long running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed requests all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each request. When debugging in my development environment, at one point, I started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception)
log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
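To illustrate why a single lock around the append path can make unrelated requests hang, here is a minimal Python sketch. It is not log4net's code: `try_log`, `demo`, and the simulated stuck writer are hypothetical stand-ins. One thread takes the lock and stalls (as a writer blocked on a contended file might), and every other caller that wants to log has to wait for it.

```python
import threading


def try_log(lock, message, sink, timeout):
    """Append message to sink under lock; give up after timeout seconds."""
    if not lock.acquire(timeout=timeout):
        return False  # another thread still holds the appender lock
    try:
        sink.append(message)
        return True
    finally:
        lock.release()


def demo():
    lock = threading.Lock()
    sink = []
    holding = threading.Event()
    release = threading.Event()

    def stuck_writer():
        # Simulates an append that never returns while holding the lock.
        with lock:
            holding.set()
            release.wait()

    writer = threading.Thread(target=stuck_writer, daemon=True)
    writer.start()
    holding.wait()

    blocked = try_log(lock, "request A", sink, timeout=0.2)  # times out
    release.set()  # comparable to recycling: the stuck holder goes away
    writer.join()
    after = try_log(lock, "request B", sink, timeout=0.2)  # now succeeds
    return blocked, after, sink
```

Without the timeout used in this sketch, "request A" would wait indefinitely, which matches the observation that only requests that log were hanging.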
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work, in the acceptance environment, so the setting may very well have been set to true then, causing the issue to start.
Another indication that log4net is involved: when the issue last occurred, I realized that logging at the standard level only occurs in some POST requests. I found a POST request that does not log, and requests to it were handled normally, while the other POST requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.
The fact that an IIS recycle fixes the issue doesn't mean it is an IIS issue: every ASP.NET application runs inside the .NET runtime, so unless it is proven that the request hangs in an IIS module, the application itself remains a suspect.
You may need to wait for the issue to happen again and then create a Failed Request Tracing rule on time-taken. That will tell you whether the hang occurs in an IIS pipeline module or in the .NET runtime.
If all the requests hang in the .NET runtime, you may have to capture a hang dump and do a deep analysis with WinDbg and the MEX extension. That will show what is happening there.
In case of TL;DR - I basically need guidance regarding what tools are available to debug requests which are issued to IIS and which stall inside a module.
I have a problem with an old ASP 2.0 app at the moment whereby it will periodically become unavailable and recycling the app pool (horrible as that may be) doesn't bring it back up 100% of the time.
So first of all it presents itself as requests entering the app pool and being trapped in state 'BeginRequest' in RewriteModule.
It is not a specific request which is always the first to experience this issue. The issue cannot be easily recreated either.
Eventually requests join this backlog and when it becomes 70+ deep the app pool fails to respond to pings from WAS and it forcibly recycles. Predictably it doesn't stop on-time and the old app pool is forced to stop. When the new app pool comes up it either works just fine or it instantly experiences the same issue as the outgoing one and requests begin to queue.
In issues like this all the official guidance is understandably focussed around looking at why the RewriteModule may choke.
I have validated my redirections and though complex there are no obvious issues with syntax (XML validates).
Likewise in inetmgr loading up the URL Rewrite Module seems to parse the configs fine and show them visually.
Basic stuff like permissions is all fine.
When the app is working normally I also used Failed Request Tracing/Logging to look at the request pipeline for a sample URL which stalled and I can confirm that there is no circular logic or weird errors presenting - the request seems to be handled just fine. This also showed me how high up the rewritemodule is invoked and from this I really don't see how the issue could be app-related as .NET isn't invoked at this point.
Annoyingly, when an app pool is experiencing this issue and I can throw in requests which just stall, Failed Request Tracing is no good, because a request actually needs to get to the end of its journey and fail; otherwise it refuses to log anything out.
I resorted to taking process dumps of affected w3wp.exe's and running them through DebugDiag. Unfortunately the only thing I see is that threads are open accessing the rewritemodule but precious little about what they are stuck on.
As anyone else would do I've tried to track the start of the issue back to any recently installed patches or code changes but nothing matches. Likewise this is happening on 3x servers otherwise I would try reinstalling the rewritemodule. Other sites on the same server which invoke rewritemodule are unaffected.
Has anyone else experienced issues like this - the net seems to have relatively little info in this case. Perhaps you can recommend further debugging tools or approaches for IIS which I can adapt to this scenario? This is sort of a cry for help from someone more used to Apache/Nginx - sorry for the long post.
I already put this into the old forum so I hope this will be fine.
Suddenly, users at one location are getting errors on the CMS side. If they work from elsewhere, there are no problems. I know the forum usage is low, but if I am to slap the network people silly, I need some pointers.
User gets several errors during the loading homepage process.
Err 1: A few times: JavaScript alert -
[synchronizer] unable to get client-side resource with ID xxxx
Err 2: Sometimes:
Unspecified error. on /library/javascript/mdvc.js
Err 3: several times:
A GUI system error occured. Details:[CmdsHTTPDone]
<tcmapi:Response xmlns:tcmapi="http://www.tridion.com/ContentManager/5.0/TCMAPI" success="false" actionWF="false" ID="WebGUIResponder.aspx"><tcmapi:Error><tcm:Line Cause="true" mlns:tcm="http://www.tridion.com/ContentManager/5.0"><![CDATA[Request message cannot be empty. ]]></tcm:Line></tcmapi:Error></tcmapi:Response>
Err 4: Sometimes we also get "permission denied" errors on TaskBarControl.js or other scripts.
In the end.. all views empty.
When we use a web proxy tool (Fiddler2) to see what is sent and received, users do NOT get any problems. They can log in and use the CMS normally. As long as the local web proxy tool is running, users have no problems with the CMS. As soon as the tool is shut down, the same problems come back.
So we cannot even debug with this tool, because we don't know what effect Fiddler has on the connection that makes it work. This happens in just one location, for Prod and Test (same issues), while DEV is still fine. My deduction is that "some rule in the local network" is wrong, but how do I proceed?
The CME GUI loaded in the browser regularly checks back with the CME server. This looks like the browser cannot get a connection to the CME server.
For further troubleshooting, you can do a full reload (CTRL-F5) in the web browser to see whether it really is a connection issue.
If it is a connection issue it might not be Tridion related at all.
This is probably a proxy issue -- especially since you say that you cannot reproduce it using Fiddler. Fiddler works by acting as a proxy, so that would explain the lack of symptoms when using it.
You can try just using your browser's developer tools (press F12). Then watch for any requests that come back with a different status code than 200 or 304. You can then show this to your network team who can hopefully troubleshoot the issue from there.
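Picking out the interesting requests from a captured list can be sketched as follows. This is a minimal Python illustration that assumes the request log has already been reduced to `(url, status)` pairs; that structure, the function name `unexpected_responses`, and the example URLs are hypothetical, not something the browser tools produce in this exact form.

```python
from typing import Iterable, List, Tuple

EXPECTED = (200, 304)  # status codes treated as normal in the advice above


def unexpected_responses(entries: Iterable[Tuple[str, int]],
                         expected: Tuple[int, ...] = EXPECTED) -> List[Tuple[str, int]]:
    """Return the (url, status) pairs whose status is not in expected."""
    return [(url, status) for url, status in entries if status not in expected]


# Made-up data: a status of 0 stands for a request that never got a response.
sample = [
    ("/library/javascript/mdvc.js", 200),
    ("/WebGUIResponder.aspx", 0),
    ("/images/logo.gif", 304),
    ("/TaskBarControl.js", 407),
]
flagged = unexpected_responses(sample)
```

The flagged entries are the ones worth handing to the network team, together with the time at which each request was made.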
I wonder whether refreshing a page that throws a runtime error can overload the web server. For example, I refreshed the page domain.com/default.asp?id=99999999999999999999999999999999999999999, which generates the following error:
Microsoft VBScript runtime error '800a000d'
Type mismatch: 'Cint'
/default.asp, line 9
After this, the server stopped responding for all sites hosted on it, or my IP was blocked for some time by the firewall.
It depends on what the rest of the code around that error looks like (which you can't see). You won't overload the server in the DoS sense of too many requests (a flood), since those would be handled before the request gets to IIS for server-side processing.
But if the code where the page breaks does other processing based on that value, it could crash IIS or the app pool. It could also be stuck waiting on a DB call and have to time out before the server responds. It will either time out or reset itself, and that is when you see the site functional again.
Either way, the code or the website/server should be set up better to alleviate the problem. Hopefully the admins will figure that out when they investigate why the site keeps crashing due to your web hits ;)
Issue seems to be type casting. Try below steps.
Add this line on top of the page where you get this error:
"Option Explicit"
You can get a more meaningful error message:
This link provides details for each error message.
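Independently of the error message, the failing conversion can be avoided by validating the query-string value before converting it. VBScript's CInt only accepts values from -32,768 to 32,767, so an id such as the one in the question cannot be converted. Below is a minimal, language-neutral sketch of that check written in Python; `parse_id` is a hypothetical helper, not code from the page in question, and unlike CInt it rejects fractional input instead of rounding it.

```python
from typing import Optional

CINT_MIN, CINT_MAX = -32768, 32767  # range accepted by VBScript's CInt


def parse_id(raw: str) -> Optional[int]:
    """Return raw as an int if it is a whole number within CInt's range, else None."""
    text = raw.strip()
    digits = text[1:] if text[:1] in ("+", "-") else text
    # Accept plain ASCII digits only; anything else is rejected up front.
    if not (digits.isascii() and digits.isdigit()):
        return None
    value = int(text)
    return value if CINT_MIN <= value <= CINT_MAX else None
```

A page would then return a "bad request" style response when the result is `None`, rather than letting the conversion raise a runtime error.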