Why is there a sudden increase in TTransportException? - evernote

We've recently been receiving a lot of com.evernote.thrift.transport.TTransportException errors with HTTP codes 400, 429, and 418.
429 corresponds to Too Many Requests, but this case used to be surfaced as an EDAMSystemException with RATE_LIMIT_REACHED.

We managed to get in touch with Evernote developer support, who realised that a change on their end was causing our issues.
We'd observed:
rate limits being far stricter than normal
rate limit errors coming back as empty 429 responses, and therefore not being picked up by the SDK's error handling (we're using the Ruby SDK); a back-off sketch is shown below
Our issue is fixed now. My hope is that the change that fixed things for us has been rolled out to everyone, but if not, I recommend emailing devsupport#evernote.com.
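For anyone hitting the same thing before a fix reaches them, here is a minimal, SDK-agnostic back-off sketch in Python. The exception class and function names are made up for illustration (they are not the Evernote SDK's API); the idea is simply to treat a bare 429 the same way as RATE_LIMIT_REACHED and retry after a delay.

import random
import time

class TransportError(Exception):
    # Hypothetical stand-in for a transport-level failure that carries the
    # raw HTTP status (the real SDK exceptions look different).
    def __init__(self, http_status):
        super().__init__(f"HTTP {http_status}")
        self.http_status = http_status

def call_with_backoff(call, max_attempts=5):
    # Retry `call` when it fails with a bare HTTP 429, using exponential
    # backoff with jitter, since an empty 429 carries no rateLimitDuration.
    for attempt in range(max_attempts):
        try:
            return call()
        except TransportError as err:
            if err.http_status != 429:
                raise  # not a rate-limit problem; let it surface
            time.sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")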

Related

Why is my website experiencing random slow API requests?

I have a VB.NET/Vue website hosted on an internal IIS 8.5 Windows 2012 R2 server. Our company has about 30 users on the site at any given time. The users are experiencing random delays throughout the day, and on some days there are no delays at all (the site works great most of the time). What I'm looking for is any suggestion on where to start looking to solve the issue. Here's what I've found so far.
User goes to site and initiates an api request from the UI
User sees a loading icon for anywhere up to a minute or so while the request returns
The request eventually reaches the server after some time and executes really fast within milliseconds and returns the response to the user
By this time, many users have already refreshed the page, making new requests that succeed on page load. For the users who are patient and wait, the response does eventually come back.
So to sum everything up, there are several users experiencing delays on a daily basis.
Some days we don't have any delays, but most days several users experience multiple delays ranging from a few seconds up to 30 seconds or a minute.
I've found all this using LogRocket and New Relic, and what is happening is that these requests complete within milliseconds once they reach the server, but the request doesn't seem to reach the server for some period of time.
I've been monitoring CPU/memory/network on these servers and it all looks fine to me while these issues are occurring.
It seems that the problem lies between the user's computer and whatever hardware/software sits in front of the web server.
Update here... I found that the problem is occurring on the user's computer in all these instances. Using Google Chrome's Performance API, I was able to track timing info for these requests and found that the problem shows up right after fetchStart. So whatever is happening in that window is the cause of the issue.
Example below:
entryType: resource
startTime: 1119531.820000033
duration: 56882.43999995757
initiatorType: xmlhttprequest
nextHopProtocol: http/1.1
workerStart: 0
redirectStart: 0
redirectEnd: 0
fetchStart: 1119531.820000033
domainLookupStart: 1176401.0199999902
domainLookupEnd: 1176402.2699999623
connectStart: 1176402.2699999623
connectEnd: 1176404.8350000521
secureConnectionStart: 1176403.6700000288
requestStart: 1176404.8549999716
responseStart: 1176413.5300000198
responseEnd: 1176414.2599999905
transferSize: 15145
encodedBodySize: 14884
decodedBodySize: 14884
serverTiming: []
workerTiming: []
fetchStart is at 1119531.82, while requestStart is at 1176404.85, a gap of roughly 57 seconds, so the problem is something between fetchStart and requestStart. Still looking into what is causing this.
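To make the arithmetic explicit, here is a small sketch (assuming the timing entry above has been serialised to JSON in the browser, e.g. with navigator.sendBeacon, and is being inspected offline in Python):

import json

# Values are milliseconds relative to the page's time origin; the numbers
# are taken from the resource timing entry above.
entry = json.loads("""{
  "startTime": 1119531.820000033,
  "fetchStart": 1119531.820000033,
  "domainLookupStart": 1176401.0199999902,
  "requestStart": 1176404.8549999716,
  "responseStart": 1176413.5300000198,
  "responseEnd": 1176414.2599999905
}""")

stalled = entry["domainLookupStart"] - entry["fetchStart"]  # stuck before DNS even starts
ttfb = entry["responseStart"] - entry["requestStart"]       # network + server time
total = entry["responseEnd"] - entry["startTime"]

print(f"stalled before DNS: {stalled / 1000:.1f} s")  # ~56.9 s in this trace
print(f"time to first byte: {ttfb:.1f} ms")           # ~8.7 ms
print(f"total duration:     {total / 1000:.1f} s")    # matches the duration field above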
In 2022 we are experiencing something very similar with a small fraction of our customers. There is a significant gap between the timing API's startTime and requestStart. This gap can be up to 8 minutes -- I admire the patience of customers waiting that long. The wait periods are also close to multiples of a minute.
In our case, there appears to be a (transparent?) proxy between those browsers and our server infrastructure that is triggering the problem. In particular, it forces a downgrade from HTTP/2 to HTTP/1.1. Whitelisting our website in that proxy does solve the problem. This isn't a very satisfactory solution, but it does make the customers happier!
[UPDATE]
In our case, it turned out that we were sending a Content-Length header with a non-zero value on 304 responses. This is technically invalid and it caused problems with the proxy. It happened because of Django's CommonMiddleware, which always puts a Content-Length header on responses. The solution was to add a new piece of middleware that strips out the Content-Length (and the content) on 304 responses.
It turned out that the content was already being stripped by our nginx frontend, but it is better not to generate it in the first place.
And what was the content? -- in our case, it was the 4 characters 'null'!
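For reference, a minimal sketch of that kind of middleware (illustrative only, not our exact code; the class name is invented):

class StripNotModifiedBodyMiddleware:
    # Remove the body and Content-Length from 304 responses so that nothing
    # non-empty (like the stray 'null') ever reaches a downstream proxy.
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        if response.status_code == 304 and not response.streaming:
            response.content = b""
            if "Content-Length" in response:
                del response["Content-Length"]
        return response

# List this above django.middleware.common.CommonMiddleware in MIDDLEWARE so
# it runs after CommonMiddleware on the response path and sees the
# Content-Length header that CommonMiddleware added.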

Failed Google Page Speed Test with Lighthouse returned an error: FAILED_DOCUMENT_REQUEST

When I check my site's speed (https://www.readonlinenewspaper.com) using PageSpeed Insights, I am not able to see any results and get an error message like the one below:
Lighthouse returned an error: FAILED_DOCUMENT_REQUEST. Lighthouse was unable to reliably load the page you requested. Make sure you are testing the correct URL and that the server is properly responding to all requests. (Details: net::ERR_CONNECTION_FAILED)
It is probably caused by one of two things:
1. The site just takes too long to load.
Your page takes well over 40 seconds to load (on a high-speed desktop connection, albeit in the UK; I am guessing the server is hosted somewhere else given the long delay on requests), so PageSpeed Insights assumes it is broken, as the page never finishes loading within its timeout period.
Your country flags are the main cause of this. Consider a CSS image sprite or inline SVGs instead, as the total of 438 requests on your page is so high that you will never get good performance (generally only 8 requests can be made at once, which means over 50 round trips to your server for resources).
If each batch of eight resources takes 200 ms to complete, that is roughly 10 seconds of latency (dead time waiting for a response) on its own; for me they were taking 800 to 1000 ms each! (A quick back-of-envelope calculation is sketched below.)
This is particularly slow, so perhaps there is something wrong with your hosting configuration or website setup. (You aren't storing the flag URLs in the database and looking them up one at a time in a loop, by any chance, are you?)
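A quick back-of-envelope check of those numbers (the 200 ms per batch is just an assumed round trip; 800 to 1000 ms is what I actually measured):

import math

requests_on_page = 438
parallel_connections = 8                                           # typical browser limit
round_trips = math.ceil(requests_on_page / parallel_connections)   # 55 round trips

for batch_ms in (200, 800, 1000):
    waiting_s = round_trips * batch_ms / 1000
    print(f"{batch_ms} ms per batch -> ~{waiting_s:.0f} s of waiting")
# 200 ms -> ~11 s, 800 ms -> ~44 s, 1000 ms -> ~55 s, consistent with a
# 40+ second page load.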
2. Hotjar
For some reason PageSpeed Insights doesn't seem to play well with Hotjar.
It is something to do with WebSockets, but I never got to the bottom of it; I just know that this is a problem I see often when people use Hotjar, and it is related to WebSockets (maybe something to do with the wss:// protocol or their implementation).
Try disabling Hotjar and running the test again to see if it works then (perhaps test on another page when investigating this, as it is only the homepage that is unbearably slow to load because of the flags, as per point one).
P.S. The resource online-newspapers-banner-02.jpg is not being loaded over HTTPS, so fix that. It has nothing to do with your question; I just noticed the site was showing as "not secure" and I think that is the cause.

Which HTTP status should be returned when a SPECIFIC page will be down for several days

We are uploading a new version of our website.
For various reasons, some pages that exist in the old version aren't ready yet for the new version, and we need to take them down temporarily.
Which HTTP status should we return for these pages, considering they will be up and running again within several days?
Is using 503 (Service Unavailable) only for these pages the right way, or will it have a negative impact on the entire website?
(Using ASP.NET in case it's related in some way...)
The status code 503 seems to be the best choice here:
The 503 (Service Unavailable) status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay.
It shouldn’t be relevant that it’s not due to overload or maintenance in your case. What is relevant is that it’s your fault (hence a status code from the 5xx class), and that it’s temporary (hence 503), so there’s no need to let them know the real reason.
While 503 is typically used for the whole site, I can see no reason why it shouldn't be used for specific pages only. A possible drawback: if a bot successively crawls a few documents that give 503, it might think that the whole site is affected and stop crawling for now.
If you know when the page will be available again, you can send the Retry-After header:
When sent with a 503 (Service Unavailable) response, Retry-After indicates how long the service is expected to be unavailable to the client.
(FWIW, the Googlebot seems to support this.)
In the post "Website outages and blackouts the right way" (by a Google employee at the time), my assumptions are confirmed as far as Google Search is concerned: 503 should also be used when only specific pages are affected, and the crawl rate might be reduced if Googlebot gets many 503 answers.
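The question mentions ASP.NET, so the following is only a framework-neutral illustration of the idea (a tiny Python WSGI app with made-up paths), not drop-in code: answer 503 with a Retry-After header for the specific pages that are offline, and serve everything else normally.

# Hypothetical paths that are temporarily offline.
TEMPORARILY_OFFLINE = {"/pricing", "/reports/archive"}
RETRY_AFTER_SECONDS = 3 * 24 * 3600  # "several days"

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in TEMPORARILY_OFFLINE:
        start_response("503 Service Unavailable", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Retry-After", str(RETRY_AFTER_SECONDS)),
        ])
        return [b"This page is temporarily unavailable. Please check back in a few days."]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Normal page content."]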

What is an acceptable poll rate for an RSS feed?

I'm currently polling the YouTube RSS feed servers about 3 times per second (multiple channels, each getting the feed every 3 seconds). Is this too much, and will fetching it this often result in getting updates more slowly because the server starts serving me a cached copy? I'm trying to get new videos as quickly as possible, but I can't find any guidelines on this sort of thing.
I used to fetch them using the /v3/search/ endpoint at the same rate, but the responses on my server were always behind what I got when I ran it on my local machine (where I didn't poll this often).
Also, are there any good alternatives? (I tried push notifications, but they were highly unreliable.)
You should be using caching in your app to reduce bandwidth and latency. When caching, store the ETag so that you can include it when requesting a resource. If the resource has not changed, you will get a 304 (Not Modified) response code, which means you can use your cached version. Otherwise, you will get the updated version of the resource.
More info:
https://developers.google.com/youtube/v3/getting-started#etags
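As a hedged sketch of that pattern (Python with the requests library; the channel ID is a placeholder, and it assumes the feed actually returns an ETag header, which you should verify against the real response):

import time
import requests

FEED_URL = "https://www.youtube.com/feeds/videos.xml?channel_id=CHANNEL_ID"

def poll(etag=None):
    # Conditional GET: send the last ETag so an unchanged feed costs a 304.
    headers = {"If-None-Match": etag} if etag else {}
    resp = requests.get(FEED_URL, headers=headers, timeout=10)
    if resp.status_code == 304:
        return None, etag                       # nothing new; keep cached copy
    resp.raise_for_status()
    return resp.text, resp.headers.get("ETag")  # fresh body and its new ETag

etag = None
while True:
    body, etag = poll(etag)
    if body is not None:
        print("feed changed, re-parse it")
    time.sleep(60)  # poll far less aggressively than every 3 seconds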

Facebook Invite Acceptance Rate dropping and steeper decline

Our app is having an issue with its Invite Acceptance Rate, and I'm trying to find out if anyone is experiencing a similar problem. We had an app update in mid-December, and there was about a 70% drop; the rate then started coming back up, only to decline again after the first of the year.
We had been using the Request 2.0 framework for some time, but our "Upgrade to Request 2.0" setting was disabled. This was fixed over the weekend, but that still wouldn't explain the initial drop in mid-December, since the Request 2.0 migration was not enforced until the first of the year.
We've looked at several features in the latest app update, but nothing seems to be the root cause. Any ideas? Or has anyone experienced a similar issue?
