Weird / arbitrary behavior of Collabora Online - iframe

So I’ve been experiencing some very strange behavior with Collabora Online. Since migrating to a new Collabora Online server (version 21.11.3), nothing works reliably anymore. The abnormalities described below follow no pattern, are totally unpredictable, and therefore cannot be reproduced.
The save button sometimes works and sometimes does not: the changes sometimes get saved to the file server as they should, and sometimes no HTTP request to the WOPI host is made at all. I tried to save manually via the PostMessage API. When I add an event listener to the iframe window in which the Collabora Online editor is embedded, I notice that it is rarely triggered by user actions such as “saving the document.” It feels very buggy to me.
Often the document loads and renders instantly without any problems. When I load the same document some time later, the iframe sometimes shows some weird error messages instead. I attached them to this thread. The first error message says that it is cleaning up the document from the last session. Whatever that means, it can last for hours. The second error message says it cannot establish a connection to the document (although it worked minutes before).
Could this be a configuration error on a server that Collabora Online runs on or depends on?
Error1
Error2

This might be a bug → https://github.com/CollaboraOnline/online/issues/4773
Please participate there!

Related

Chrome ERR_HTTP2_PROTOCOL_ERROR + Firefox Secure Connection Failed

I'm hosting a website that serves users globally, and recently a weird issue came up.
I already checked other posts on the Internet, including the heavily discussed Stack Overflow question "Chrome net::ERR_HTTP2_PROTOCOL_ERROR 200 after a reconnect", but none of the answers helped.
The site is built as a legacy ASP.NET Web Forms "website" project (not a web application).
There's an important function that performs several processing steps once a user clicks a button on the site.
Let's say there are 100 lines of code in that function, and I've added some flags to log which steps have been hit and processed.
The weird situation is:
Only users in China are facing the issue (the website is not hosted in China).
Some users are on Firefox, which returns the error below; in English it reads "Secure Connection Failed".
According to several posts and the Firefox documentation, there should be an error code on screen, such as
ssl_error_no_cypher_overlap, but there is nothing.
Firefox error
Other users are on Chrome-based browsers, which return:
Chrome error
In addition, I checked the process logs for the users who reported this. Most of their requests did not run all the code; in other words, of the 100 lines of code, some requests simply stopped at around line 50.
The website has TLS 1.2 enabled, and the Chrome Network tab shows that HTTP/2 (h2) is in use.
I'm wondering whether the client browser shutting down the connection for some reason could produce the result I see (processing stopped in the middle of the code flow). In my opinion, once a request has been posted to the server, the process should finish the entire flow no matter what the client does.
Any ideas or thoughts will be appreciated!
I was just dealing with that exact situation.
From what I read in various posts on the HTTP2_PROTOCOL_ERROR, I think what happens is the response is started but code problem(s) prevent the server from completing the response. The incomplete response gives the protocol error in Chrome, and, because it's over TLS, Firefox sees it as a security error. (I'd share links, but I've already closed all those windows - sorry.)
Somehow my code was preventing the server from completing the response without causing an exception.
I was able to track down the offending code by commenting out the body of every code-behind procedure on the page and then bringing them back one at a time.
Good luck to you!
I can't give you a concrete example, but in my case, there was no problem on the application side.
Has your in-house infrastructure engineer recently added any settings?
For example, have you added WAF settings? You may want to check.
FYI

Backend doesn't respond to HTTP requests

I know that the information that follows is most certainly not enough to solve my problem; I can provide more details if and when they are needed.
The situation is the following. I was programming normally and, as I did, I made a PUT request to my application backend. The browser tab suddenly logged many (like, many) errors concerning different information which I cannot remember due to the fact that it closed itself soon after. Almost at the same time, both VSCode windows I had open and running the backend and frontend of the application closed themselves.
Since then, the backend does not appear to answer any HTTP request made to it. It doesn't get to the point where the controller code runs, for I have put a console.log there and nothing is logged. Similarly, no errors are shown in the console when the request is made. It still connects to an MQTT broker, as it was supposed to.
The Insomnia request loads forever unless I cancel it, and Insomnia still notices when the connection is reset, giving the
Failure when receiving data from the peer
error.
Lastly, the frontend created a debug.log file in the project's folder that contains 21 lines that read
[0624/203732.834:ERROR:crash_report_database_win.cc(428)] unexpected header
with the only difference being the numbers at the start. It also created a yarn-error.log, a very long log with a line that caught my eye (because it had the word "Error" in it) that reads, among other things,
Trace: Error: EBUSY: resource busy or locked...
I have no clue what happened or what I should do.
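A quick way to narrow down the "loads forever" symptom is to check whether the backend port still accepts TCP connections and simply never answers, or whether nothing is listening at all. The following is a minimal Python sketch of such a probe, not something taken from the post: the function name `probe`, the result labels, and the host/port in the usage note are all assumptions chosen for illustration.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify how an HTTP endpoint reacts to a bare GET request.

    Returns one of: "connect_failed", "no_response", "reset",
    "closed_without_data", "responded". The labels are hypothetical.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Nothing is listening, or the connection attempt timed out.
        return "connect_failed"

    with sock:
        sock.settimeout(timeout)
        request = (
            "GET / HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        ).encode("ascii")
        try:
            sock.sendall(request)
            data = sock.recv(1024)
        except socket.timeout:
            # The port accepted the connection but sent nothing back in time.
            return "no_response"
        except OSError:
            # Covers a connection reset by the peer.
            return "reset"

    return "responded" if data else "closed_without_data"
```

Calling something like `probe("127.0.0.1", 3000)` (the port is a guess; use whatever port the backend listens on) and getting "no_response" would suggest the process is alive and listening but the request never reaches a handler that replies, while "connect_failed" would point at the server not listening at all.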
Run the following:
npm install express

ASP.Net WebForms Communication Failure in Production

I am experiencing a problem in production with two specific webforms that perform a server-side postback to perform calculations.
There is a <button runat=server onserverclick=doMath>Calc</button>.
All of the data for the calculations is on the web page, and there is no database communication, but the code is written old school and everything happens server-side via postbacks; no ajax panels etc.
When the button is pressed in production, for some users, a "page cannot be displayed" error is returned after 30-60 seconds. In the application logs on the server there is a matching log entry that states an object reference was null. After repeated testing it is clear that the data for the null reference is being sent to the web server, but it is not getting there in its entirety, and no response is making it to the user even though an error is logged.
The code seems not to be relevant; however, if that were the case, I think I would see this taking place on more than two pages. And these two pages are very similar and related to each other. However, because the problem is intermittent and only happens to some users, I also think it is a network communication problem. For example:
From home I can use the calc button over and over and I only get the error once out of 1000 clicks.
From the office I can get the error almost every single click.
The problem never takes place in dev or in qa. I am hoping for help with a method to isolate the source of the problem or maybe someone has seen this before.
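One possible way to start isolating this is to turn the "once out of 1000 clicks" versus "almost every click" impression into numbers that can be compared per network. Below is a minimal Python sketch of such a tally. It is only an illustration: `tally_outcomes`, `failure_rate`, and the simulated `fake_attempt` are hypothetical names, and in real use `attempt` would wrap one actual postback to the page.

```python
from collections import Counter
from typing import Callable


def tally_outcomes(attempt: Callable[[], str], runs: int) -> Counter:
    """Call `attempt` repeatedly and count each outcome label or exception type."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            outcomes[attempt()] += 1
        except Exception as exc:  # count failures by exception class name
            outcomes[type(exc).__name__] += 1
    return outcomes


def failure_rate(outcomes: Counter, ok_label: str = "ok") -> float:
    """Fraction of runs that did not end with `ok_label`."""
    total = sum(outcomes.values())
    return 0.0 if total == 0 else 1 - outcomes[ok_label] / total


# Simulated stand-in for a real request: every 4th call fails.
calls = {"n": 0}


def fake_attempt() -> str:
    calls["n"] += 1
    if calls["n"] % 4 == 0:
        raise TimeoutError("simulated handshake timeout")
    return "ok"


result = tally_outcomes(fake_attempt, 8)
print(dict(result), failure_rate(result))
```

Running the same tally from home, from the office, and from a machine next to the web server would show whether the failure rate follows the network path rather than the page.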
EventValidation is off.
Path Pings show that there are some nodes dropping packets, but they are not "our" servers.
After cracking open Wireshark I have discovered some additional information. When the "timeout" takes place a handshake is failing.
bad handshake?
Unfortunately, I am not a network guru. Even if this is the problem I am still concerned as it only seems to happen with two specific pages.

Error 403 on SECOND postback of the same form (and various other situations)

We recently migrated our application (IIS Server + DB Server) to AWS and also modified the network architecture a little bit. The entry point of the system is an Astaro Firewall (we use the AWS AMI), which also hosts the SSL certificate of the web server. Everything related to the firewall has been done by a vendor and we only have some read-only privileges.
We are getting 403 errors in a few situations but I will explain one, as they all may be related.
We have a form which queries the database and returns a report in HTML format (this report also has some checkboxes for doing updates). The first time the form is submitted, we always get the report back. If we post the form again, updated with new data, it fails and returns error 403. We noted that it doesn't fail when the first results returned a very low number of rows (or none).
Looking at the details of the POSTs in Developer Tools, the only difference between a working reply and a 403 reply seems to be the size of the data posted. The second post is always bigger because it contains the data of the first report (as the page also has other options for checking rows).
Also, looking at the IIS logs, we don't see any trace of the failing POST. Nothing at all.
This problem happens only in production. In the dev environment it all works flawlessly. The only difference is that production has the firewall/SSL, while development is all open. This is why we think it may be related to SSL.
The vendor is not the most helpful, so we are looking for help to pinpoint the issue and take the situation into our own hands.
Any input appreciated.
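If the size theory is right, it should be possible to find the body size at which the 403 starts, which would be a concrete number to bring to the vendor. Here is a minimal Python sketch of a binary search for that threshold. It rests on assumptions: that rejection depends only on size, that `post` is a helper you write yourself to send a POST with a body of the given size and return the HTTP status, and that the 65536-byte limit in the simulation is invented purely for the demo.

```python
from typing import Callable, Optional


def smallest_rejected_size(
    post: Callable[[int], int],
    accepted: int,
    rejected: int,
    rejected_status: int = 403,
) -> Optional[int]:
    """Binary-search the smallest body size (in bytes) that gets `rejected_status`.

    `accepted` must be a size known to work and `rejected` a size known to fail.
    Returns None when those two starting points do not behave as expected.
    """
    if post(accepted) == rejected_status or post(rejected) != rejected_status:
        return None

    low, high = accepted, rejected  # invariant: low passes, high is rejected
    while high - low > 1:
        mid = (low + high) // 2
        if post(mid) == rejected_status:
            high = mid
        else:
            low = mid
    return high


# Simulated stand-in for the real request; the limit is a made-up value.
HYPOTHETICAL_LIMIT = 65536


def fake_post(size: int) -> int:
    return 403 if size > HYPOTHETICAL_LIMIT else 200


print(smallest_rejected_size(fake_post, 1024, 1_000_000))
```

A threshold that lands on a round number would hint at a configured request-size limit somewhere in front of IIS, which would also fit the failing POSTs never appearing in the IIS logs.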

synchronizer - unable to get client-side resource

I already put this into the old forum so I hope this will be fine.
Suddenly, users of the CMS side in one location are getting errors. If they work elsewhere there are no problems. I know the forum usage is low, but if I am to slap the network people silly I need some pointers.
Users get several errors while the homepage is loading.
Err 1: A few times: JavaScript alert -
[synchronizer] unable to get client-side resource with ID xxxx
Err 2: Sometimes:
Unspecified error. on /library/javascript/mdvc.js
Err 3: several times:
A GUI system error occured. Details:[CmdsHTTPDone]
<tcmapi:Response xmlns:tcmapi="http://www.tridion.com/ContentManager/5.0/TCMAPI" success="false" actionWF="false" ID="WebGUIResponder.aspx"><tcmapi:Error><tcm:Line Cause="true" mlns:tcm="http://www.tridion.com/ContentManager/5.0"><![CDATA[Request message cannot be empty. ]]></tcm:Line></tcmapi:Error></tcmapi:Response>
Err 4: Sometimes we also get "permission denied" errors on TaskBarControl.js or other scripts.
In the end, all views are empty.
When a web proxy tool (Fiddler2) is used to see what is sent/received, users do NOT get any problems. They can log in and use the CMS without any problems. As long as the local web proxy tool is running, users have no problems with the CMS. As soon as the tool is shut down, the same problems come back.
So we cannot even debug with this tool, as we don't know what impact Fiddler has on the connection that makes it work. This happens in just one location, for Prod and Test (same issues), but DEV is still fine. So my deduction is that "some rule in the local network" is wrong, but how to proceed?
The CME GUI loaded in the browser regularly checks back with the CME server. This looks like the browser cannot get a connection with the CME server.
For further troubleshooting, you can do a full reload (CTRL-F5) of the web browser to see whether it is indeed a connection issue.
If it is a connection issue it might not be Tridion related at all.
This is probably a proxy issue -- especially since you say that you cannot reproduce it using Fiddler. Fiddler works by acting as a proxy, so that would explain the lack of symptoms when using it.
You can try just using your browser's developer tools (press F12). Then watch for any requests that come back with a different status code than 200 or 304. You can then show this to your network team who can hopefully troubleshoot the issue from there.
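If clicking through the Network tab gets tedious, the same check can be run over a HAR export, since browser developer tools can usually save the Network log as a HAR file. Here is a minimal Python sketch, assuming the standard HAR layout (`log.entries[]` with `request` and `response.status`); the function names and the sample data, including the URLs, are invented for illustration.

```python
import json

# Statuses treated as normal in the advice above.
EXPECTED_STATUSES = {200, 304}


def unexpected_responses(har):
    """Return (status, method, url) for each HAR entry with an unexpected status."""
    flagged = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry["response"]["status"]
        if status not in EXPECTED_STATUSES:
            request = entry["request"]
            flagged.append((status, request["method"], request["url"]))
    return flagged


def load_har(path):
    """Read a HAR file saved from the browser's Network tab."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


# Hypothetical sample shaped like a HAR export.
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://cms.example/a.js"},
             "response": {"status": 200}},
            {"request": {"method": "POST", "url": "https://cms.example/responder"},
             "response": {"status": 407}},
            {"request": {"method": "GET", "url": "https://cms.example/b.js"},
             "response": {"status": 304}},
        ]
    }
}

for status, method, url in unexpected_responses(sample_har):
    print(status, method, url)
```

With a real capture, `unexpected_responses(load_har("capture.har"))` (the file name is a placeholder) gives a short list that can be handed to the network team.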
