Chrome ERR_HTTP2_PROTOCOL_ERROR + Firefox Secure Connection Failed - asp.net

I'm hosting a website that serves users in regions around the world, and recently a strange issue has come up.
I've already checked other posts online, including the heavily discussed Stack Overflow question "Chrome net::ERR_HTTP2_PROTOCOL_ERROR 200 after a reconnect", but none of the answers helped.
The site is built as a legacy ASP.NET Web Forms "Web Site" project (not a Web Application project).
There's an important function that performs several processing steps when the user clicks a button on the site.
Say there are 100 lines of code in that function; I've added flags that log which steps have been reached and processed.
The strange part:
Only users in China are affected (the site is not hosted in China).
Users on Firefox get the page below, which in English reads "Secure Connection Failed".
According to several posts, including Firefox's own documentation, this page should show an error code such as ssl_error_no_cypher_overlap, but there is none.
Firefox error
Users on other, Chromium-based browsers get:
Chrome error
In addition, I checked the process logs for the requests these users reported. Most of them did not finish all the code: if there are 100 lines of code, some requests simply stopped at line 50.
The site has TLS 1.2 enabled, and the Chrome DevTools Network tab shows that HTTP/2 (h2) is in use.
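To see what a client in an affected region actually negotiates, one option is to run a small probe from that network and compare it with a run from elsewhere. The Python sketch below uses only the standard library `ssl` module; the function names are illustrative, and the host must be replaced with the real site. It reports the TLS version, the ALPN protocol chosen by the server (`h2` means HTTP/2 was agreed) and the cipher; it does not exercise HTTP/2 framing itself.

```python
import socket
import ssl


def make_probe_context() -> ssl.SSLContext:
    """Client context that verifies the certificate and offers h2 and http/1.1 via ALPN."""
    ctx = ssl.create_default_context()
    # Refuse anything older than TLS 1.2, matching the server's stated configuration.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def probe(host: str, port: int = 443, timeout: float = 10.0):
    """Return (tls_version, alpn_protocol, cipher_name) negotiated with host:port."""
    ctx = make_probe_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.selected_alpn_protocol(), tls.cipher()[0]
```

Usage: `print(probe("www.example.com"))`. If the handshake fails only from the affected network, `probe` raises an `ssl.SSLError` or a socket error, which is itself a useful data point.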
I'm wondering whether the client browser could be shutting down the connection for some reason, which would produce the result I see (processing stopping in the middle of the code flow). In my understanding, once a request has been posted to the server, the process should run through the entire flow no matter what the client does.
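Whether a request keeps running after the browser disconnects depends on the server stack. One common way it stops early is that the handler writes or flushes output between processing steps, and that write fails once the connection is gone, so the remaining steps never run. The Python sketch below only illustrates this ordering issue; `FakeClient` and `ClientGone` are hypothetical stand-ins for a real connection, and this is not a model of how IIS or ASP.NET actually react to a disconnect.

```python
class ClientGone(Exception):
    """Raised when writing to a connection the client has already closed."""


class FakeClient:
    """Hypothetical stand-in for the HTTP connection: accepts `disconnect_after`
    writes, then behaves as if the browser dropped the connection."""

    def __init__(self, disconnect_after: int):
        self.disconnect_after = disconnect_after
        self.writes = 0

    def write(self, data: str) -> None:
        if self.writes >= self.disconnect_after:
            raise ClientGone("remote host closed the connection")
        self.writes += 1


def interleaved_handler(steps, client, log):
    """Writes progress after every step, so a dropped client aborts the rest."""
    for i, step in enumerate(steps, 1):
        step()
        log.append(i)
        client.write(f"step {i} done\n")  # raises once the client is gone


def work_first_handler(steps, client, log):
    """Finishes every step before touching the connection."""
    for i, step in enumerate(steps, 1):
        step()
        log.append(i)
    try:
        client.write("all steps done\n")
    except ClientGone:
        pass  # the client left, but the work is already complete
```

With five no-op steps and `FakeClient(2)`, `interleaved_handler` logs steps 1 to 3 and then raises `ClientGone`, while `work_first_handler` logs all five steps.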
Any ideas or thoughts will be appreciated!

I was just dealing with that exact situation.
From what I've read in various posts about HTTP2_PROTOCOL_ERROR, I think what happens is that the response is started, but a problem in the code prevents the server from completing it. The incomplete response shows up as a protocol error in Chrome, and because it's over TLS, Firefox reports it as a security error. (I'd share links, but I've already closed all those windows, sorry.)
Somehow my code was preventing the server from completing the response without causing an exception.
I was able to track down the offending code by commenting out the body of every code-behind procedure on the page and then bringing them back one at a time.
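The one-at-a-time approach can be expressed as a small loop. In this Python sketch, the procedure names and the `page_completes` check are hypothetical: in practice the check would be a manual or scripted page load that confirms the response finishes. It assumes the page completes when every procedure body is commented out.

```python
from typing import Callable, List, Optional, Sequence


def find_offender(
    procedures: Sequence[str],
    page_completes: Callable[[List[str]], bool],
) -> Optional[str]:
    """Re-enable procedures in order and return the first one whose addition
    stops the page from completing its response, or None if all pass."""
    enabled: List[str] = []
    for name in procedures:
        enabled.append(name)
        if not page_completes(enabled):
            return name
    return None
```

A linear pass like this is simple and matches the manual process; with many procedures, re-enabling them in halves (a bisection) would need fewer page loads.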
Good luck to you!

I can't give you a concrete example, but in my case, there was no problem on the application side.
Has your in-house infrastructure engineer recently changed any settings?
For example, were any WAF (web application firewall) rules added? You may want to check.
FYI

Related

Weird / arbitrary behavior of Collabora Online

So I’ve been experiencing some very strange behavior with Collabora Online. Since migrating to a new Collabora Online server (version 21.11.3), nothing works reliably anymore. None of the anomalies described below follows a pattern; they are completely unpredictable and therefore cannot be reproduced.
The save button sometimes works and sometimes does not: sometimes the changes are saved on the file server as they should be, and sometimes no HTTP request to the WOPI host is made at all. I tried saving manually with the PostMessage API. When I add an event listener to the window of the iframe that embeds the Collabora Online editor, I notice that it rarely gets triggered by user actions such as “saving the document.” Feels very buggy to me.
Often the document is loaded and rendered instantly without any problems. When I try to load the same document some time later, however, the iframe sometimes shows strange error messages instead. I attached them to this thread. The first error message says that it is cleaning up the document from the last session; whatever that means, it can last for hours. The second error message says it cannot establish a connection to the document (although it worked minutes before).
Could this be a configuration error on a server that Collabora Online runs on or depends on?
Error1
Error2
This might be a bug → https://github.com/CollaboraOnline/online/issues/4773
Please participate there!

A potential CEF cache corruption scenario

We have a .NET WPF container app in which we host several web apps using the CEFSharp.WinForms control. At times, for some users, some JavaScript resource requests fail with the ERR_CONTENT_DECODING_FAILED error message. The issue is resolved if we reload the app after either clearing the CEF cache or disabling the cache from the Network tab in the developer tools window. Note that this issue isn't confined to a specific subset of resource files: we have seen it happen sporadically for a variety of JavaScript resource files (some hosted on Apache, others on IIS servers).
While a common cause of the ERR_CONTENT_DECODING_FAILED error is a server-side content-encoding issue, in this specific case we believe it could be related to CEF browser caching. See the analysis section below for our reasons.
Application Setup
When we initialize CEF settings, we set MultiThreadedMessageLoop to true and set the CachePath property to a location under %localappdata% on a Windows 10 machine. When the container app starts, it creates three CEF web browser controls and launches web apps in them; all three apps load concurrently. After that, more CEF web browsers are created as the user visits more apps, and the user also reloads some of these apps over time. All the web apps are internal apps sharing the same domain but physically hosted on different web servers. The JavaScript resource files in question usually have a caching policy that allows them to be cached for a week.
CEFSharp version - 79.1.360.0
CEF version - r79.1.36+g90301bd+chromium-79.0.3945.130
Chromium version - 79.0.3945.130
Our Analysis so far
We checked the web server logs for the failing JavaScript resources. In most cases, the impacted user's last requests for those resource files had been made a few days earlier. Users are usually able to use the application without problems for some days before they sporadically start getting this error.
We checked the network logs (*.HAR files). For the failing JavaScript resource, _transferSize is 0, which seems to indicate that the response was served from the cache.
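With many sessions to compare, the cache check can be scripted over exported HAR files. A minimal Python sketch, assuming a Chrome-style HAR where `_transferSize` (a Chrome extension field, not part of the HAR 1.2 spec) sits on each entry's `response` object; the function names are illustrative.

```python
import json
from typing import Iterator, Tuple
from urllib.parse import urlsplit


def load_har(path: str) -> dict:
    # utf-8-sig tolerates a byte-order mark at the start of the file
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


def cached_entries(har: dict, suffix: str = ".js") -> Iterator[Tuple[str, object]]:
    """Yield (url, status) for entries whose path ends with suffix and whose
    response reports _transferSize == 0 (body apparently served from cache)."""
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        response = entry.get("response", {})
        # Compare the path only, so query strings like ?v=3 don't hide .js files.
        if urlsplit(url).path.endswith(suffix) and response.get("_transferSize") == 0:
            yield url, response.get("status")
```

Usage: `for url, status in cached_entries(load_har("session.har")): print(status, url)`.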
When the error occurs, it gets resolved when we reload the app after either clearing the cache or disabling the cache from the network tab.
We tried to simulate this error artificially. We used Fiddler's AutoResponder feature to deliberately return a bad server response (the content was gzip-encoded, but the Content-Encoding header indicated 'br'). This reproduced the ERR_CONTENT_DECODING_FAILED error; however, the network logs showed a non-zero _transferSize. We also observed that Chrome did not cache the bad response. This test indicates that when the original JavaScript response was cached by the browser, it must have been correctly encoded; otherwise the browser would not have cached it.
All of the above leads us to believe that the JavaScript resource files were downloaded (with correct encoding) and cached in the CEF cache, and the user was able to use the apps for some time. After that, however, some of these files may have become corrupted in the CEF cache under certain conditions, leading to the content decoding error.
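Both failure modes can be reproduced outside the browser: a body whose bytes don't match its declared Content-Encoding (the Fiddler test), and a correctly encoded body whose stored bytes were later damaged (the cache-corruption hypothesis). Python's standard library has no Brotli decoder, so this sketch uses a gzip body labeled `deflate` as a stand-in for the gzip-labeled-`br` case; `DecodingFailed` is a hypothetical analogue of ERR_CONTENT_DECODING_FAILED, not browser code.

```python
import gzip
import zlib


class DecodingFailed(Exception):
    """Hypothetical stand-in for the browser's ERR_CONTENT_DECODING_FAILED."""


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Decode a response body according to its Content-Encoding header value."""
    try:
        if content_encoding == "gzip":
            return gzip.decompress(body)
        if content_encoding == "deflate":
            # HTTP "deflate" is the zlib-wrapped format, which zlib.decompress reads.
            return zlib.decompress(body)
        if content_encoding in ("", "identity"):
            return body
    except (OSError, EOFError, zlib.error) as exc:
        # BadGzipFile is an OSError; truncated gzip data raises EOFError.
        raise DecodingFailed(f"{content_encoding}: {exc}") from exc
    raise DecodingFailed(f"unsupported encoding {content_encoding!r}")
```

For example, with `good = gzip.compress(script)`, `decode_body(good, "gzip")` returns the script, while `decode_body(good, "deflate")` (mislabeled) and `decode_body(good[: len(good) // 2], "gzip")` (damaged after the fact, header unchanged) both raise `DecodingFailed`. The second case matches the hypothesis that a body which decoded correctly when it was cached can still fail later.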
We tried using the CEF response filter mechanism, as explained here, to capture the bad response when the content decoding error occurs. Unfortunately, the dataIn stream passed to the filter function is null when the response fails with this error.
Summary and Questions
This is a sporadic issue that our users are facing, and we haven't found a way to recreate it deterministically. However, based on our analysis so far, we believe some JavaScript files may be getting corrupted in the CEF cache over time. We are not sure whether hosting several CEF web browsers and loading them concurrently plays a role in causing this issue.
Has anyone else observed/reported a similar issue? Do you have any idea if we are missing or overlooking something here or going in the wrong direction? Any pointers will be greatly appreciated.

ASP.Net WebForms Communication Failure in Production

I am experiencing a problem in production with two specific Web Forms pages that do a server-side postback to perform calculations.
There is a <button runat=server onserverclick=doMath>Calc</button>.
All of the data for the calculations is on the web page and there is no database communication, but the code is written old-school: everything happens server-side via postbacks, with no AJAX panels etc.
When the button is pressed in production, some users get a "page cannot be displayed" error after 30-60 seconds. The application logs on the server have a matching entry stating that an object reference was null. Further testing makes it clear that the data behind the null reference is being sent to the web server but is not arriving in its entirety, and no response reaches the user even though an error is logged.
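One way to make a partial post visible instead of surfacing as a null reference is to validate the body before using it. This Python sketch shows the idea only, assuming the handler can see the raw body and the declared Content-Length; in the real page the equivalents would be ASP.NET's Request.ContentLength and Request.Form, and the field names used with it are hypothetical.

```python
from typing import List
from urllib.parse import parse_qs


def check_postback(body: bytes, content_length: int, required: List[str]) -> List[str]:
    """Return a list of problems with a URL-encoded form post; empty means OK.
    `required` lists the field names the calculation needs."""
    problems = []
    if len(body) != content_length:
        problems.append(f"body truncated: got {len(body)} of {content_length} bytes")
    fields = parse_qs(body.decode("utf-8", errors="replace"), keep_blank_values=True)
    for name in required:
        if name not in fields:
            problems.append(f"missing field {name!r}")
    return problems
```

Logging these problems, together with the client address, would show whether failures cluster on particular networks, which bears on the network-versus-code question below.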
The code seems not to be the cause. However, if that were the case, I think I would see this happening on more than two pages, and these two pages are very similar and related to each other. On the other hand, because the problem is intermittent and only happens to some users, I also think it is a network communication problem. For example:
From home I can use the calc button over and over and I only get the error once out of 1000 clicks.
From the office I can get the error almost every single click.
The problem never takes place in dev or in qa. I am hoping for help with a method to isolate the source of the problem or maybe someone has seen this before.
EventValidation is off.
Path pings show that some nodes are dropping packets, but they are not "our" servers.
After opening Wireshark, I discovered some additional information: when the "timeout" occurs, a handshake is failing.
bad handshake?
Unfortunately, I am not a network guru. Even if this is the problem I am still concerned as it only seems to happen with two specific pages.

Error 403 on SECOND postback of the same form (and various other situations)

We recently migrated our application (IIS server + DB server) to AWS and also modified the network architecture a little. The entry point of the system is an Astaro Firewall (we use the AWS AMI), which also hosts the SSL certificate for the web server. Everything related to the firewall was set up by a vendor, and we only have read-only privileges.
We are getting 403 errors in a few situations; I will describe one, as they may all be related.
We have a form that queries the database and returns a report in HTML format (the report also has checkboxes for making updates). The first time the form is submitted, we always get the report back. If we want to post the form again with new data, it fails with error 403. We noticed that it doesn't fail when the first results contained very few rows (or none).
Looking at the details of the POSTs in Developer Tools, the only apparent difference between a request that works and one that gets a 403 is the size of the posted data. The second post is always bigger because it contains the data of the first report (the page also lets you select rows with checkboxes).
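If the firewall enforces a request-body size limit (an assumption, but consistent with a size-dependent 403 that never appears in the IIS logs), a binary search over payload sizes can pin the threshold down in a few dozen requests. In this Python sketch, `probe` is a hypothetical callback the caller supplies: it would POST the form padded to the given number of bytes and return True unless the reply is 403.

```python
from typing import Callable


def largest_accepted_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest payload size in [low, high] for which probe
    succeeds, assuming sizes up to some limit succeed and larger ones fail."""
    if not probe(low):
        raise ValueError("probe fails even at the lower bound")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid  # mid was accepted: the limit is at least mid
        else:
            high = mid - 1  # mid was rejected: the limit is below mid
    return low
```

A result that lands on a round number (for example a power of two) would be a strong hint to give the vendor when asking which firewall rule applies.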
Also, the IIS logs show no trace of the failing POST. Nothing at all.
This problem happens only in production; in the dev environment everything works flawlessly. The only difference is that production has the firewall/SSL, while development is completely open. This is why we think it may be related to SSL.
The vendor is not the most helpful, so we are looking for help to pinpoint the issue and take the situation into our own hands.
Any input appreciated.

synchronizer - unable to get client-side resource

I already posted this on the old forum, so I hope posting it here is fine.
Suddenly, CMS users at one location are getting errors; when they work from elsewhere, there are no problems. I know forum usage is low, but if I'm going to slap the network people silly, I need some pointers.
Users get several errors while the home page is loading.
Err 1: A few times: JavaScript alert -
[synchronizer] unable to get client-side resource with ID xxxx
Err 2: Sometimes:
Unspecified error. on /library/javascript/mdvc.js
Err 3: several times:
A GUI system error occured. Details:[CmdsHTTPDone]
<tcmapi:Response xmlns:tcmapi="http://www.tridion.com/ContentManager/5.0/TCMAPI" success="false" actionWF="false" ID="WebGUIResponder.aspx"><tcmapi:Error><tcm:Line Cause="true" mlns:tcm="http://www.tridion.com/ContentManager/5.0"><![CDATA[Request message cannot be empty. ]]></tcm:Line></tcmapi:Error></tcmapi:Response>
Err 4: Sometimes we also get "permission denied" errors on TaskBarControl.js or other scripts.
In the end, all views are empty.
When the user runs a web proxy tool (Fiddler2) to see what is sent and received, they do NOT get any problems: they can log in and use the CMS normally. As long as the local proxy tool is running, the user has no problems with the CMS; as soon as it is shut down, the same problems come back.
So we cannot even debug with this tool, as we don't know what effect Fiddler has on the connection that makes it work. This happens at only one location, for both Prod and Test (same issues), while DEV is still fine, so my deduction is that "some rule in the local network" is wrong. But how do I proceed?
The CME GUI loaded in the browser regularly checks back with the CME server. This looks like the browser cannot get a connection to the CME server.
For further troubleshooting, you can try a full reload (CTRL-F5) of the web browser to see whether it indeed has a connection issue.
If it is a connection issue it might not be Tridion related at all.
This is probably a proxy issue -- especially since you say that you cannot reproduce it using Fiddler. Fiddler works by acting as a proxy, so that would explain the lack of symptoms when using it.
You can try just using your browser's developer tools (press F12). Then watch for any requests that come back with a status code other than 200 or 304. You can then show these to your network team, who can hopefully troubleshoot the issue from there.
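When there are many requests, exporting the Network tab as a HAR file and filtering it can make the list easier to hand over. A minimal Python sketch, assuming the standard HAR layout (`log.entries[].request` and `.response.status`); the function names are illustrative.

```python
import json
from typing import List, Tuple


def load_har(path: str) -> dict:
    # utf-8-sig tolerates a byte-order mark at the start of the file
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


def unexpected_responses(har: dict, ok: Tuple[int, ...] = (200, 304)) -> List[tuple]:
    """Return (status, method, url) for every HAR entry whose status is not in ok."""
    found = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        status = entry.get("response", {}).get("status")
        if status not in ok:
            found.append((status, request.get("method"), request.get("url")))
    return found
```

Usage: `for row in unexpected_responses(load_har("cms.har")): print(*row)`. Capturing one HAR at the affected location and one elsewhere gives the network team a direct comparison.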
