I have an application that retrieves Google Analytics statistics for 28 Google users (i.e., accounts/logins/emails) from the Reporting API.
Each client requests its own ga.data (a few metrics for the last 3 days) once every 10 minutes.
Everything worked fine for a long time, but yesterday at around 5:00 PM UTC one of the GA users started receiving a 403 rateLimitExceeded error in response.
A cycle of 10 repeated requests gets the same 403 error. Ten minutes later, a new cycle starts and the result is the same.
All other clients in the same application keep updating normally, without receiving the 403 rateLimitExceeded error.
I sleep for 1 second before each request, and I send a unique quotaUser with every request. My application makes less than 1 request per second and stays within 20K requests per day.
As far as I know, the 403 rateLimitExceeded error means the whole application has exceeded its overall daily request limit. In my case, however, all other clients keep updating properly, and the overall daily limit of 50K is not being exceeded.
UPDATE:
At 0:00 UTC this GA user stopped getting the 403 error and now keeps updating normally.
What could be the possible reasons for this client getting the 403 rateLimitExceeded error, and what can I do to avoid the same problem in the future?
The reason was too many failed requests in a short time period. In this case, you have to wait for the quotas to reset at midnight.
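Since the answer attributes the block to failed requests piling up, and the original setup retries 10 times in every 10-minute cycle, one way to generate fewer failures is to cap retries and back off exponentially, which is what Google's API documentation generally recommends for rate-limit errors. The sketch below assumes a hypothetical `RateLimitError` raised by your own API wrapper; `call_with_backoff` and its default retry counts and delays are illustrative choices, not anything the Reporting API client provides.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical: raised by your own wrapper when the API returns 403 rateLimitExceeded."""


def call_with_backoff(request_fn, max_retries=3, base_delay=1.0, max_delay=32.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimitError with exponential backoff and jitter.

    After max_retries retries the error is re-raised, so a client that keeps
    being rejected stops producing a stream of failed requests.
    """
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Wait base_delay * 2**attempt seconds (capped at max_delay) plus up to 1 s of jitter.
            delay = min(base_delay * (2 ** attempt), max_delay) + random.uniform(0, 1)
            sleep(delay)
```

With the defaults, a persistently rejected client makes 4 attempts per cycle instead of 10, waiting roughly 1, 2 and 4 seconds between them.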
According to the docs, the daily request limits all reset at midnight Pacific time, but the daily error limit is described only as "per day". Is it also reset at midnight, or is it a rolling 24-hour period?
I don't have an authoritative answer, but based on a lot of experimentation over the last week it looks like a rolling 24-hour window rather than a fixed reset time like the daily request limits.
daily error limit
The daily error limit is something Google added a few years back. The idea behind it is to catch applications that send too many errors, meaning invalid requests. These are not the errors related to quota or to sending requests too fast.
Google added this limit to stop people from looping through bad requests without fixing their code. That said, I have never heard of anyone actually hitting this limit.
I was told at the time that if your app keeps producing these errors, the length of time you are banned gets longer each time. I say "banned" because it really is like a ban: Google stops you from making requests that are useless because they are wrong.
As for when it resets, off hand I remember something about days, or even a permanent ban. If you can give me the full error message you are getting, I can contact the team for you and get more details. Or PM me on Twitter with the project ID and I might be able to get you some information that way as well.
You're correct that the daily server error quotas are 24-hour rolling windows. We updated the documentation on server error quotas on the Developers site:
Server error rate quotas are enforced over rolling windows of time for each of the hourly and daily error rate quotas. One hour after a project and view pair's first server error, the quota resets. If a project and view pair sends 10 server errors in one hour, the project and view pair is blocked from the API until the hour after the first server error elapses.
For example, if a project and view pair has not sent a server error in the last 24 hours, this project and view pair will have used 0 of its quota of 50. Let's say this project and view pair now sends a server error at 6:12 AM. If it sends 49 additional server errors before 6:12 AM of the following day, it will be blocked until 6:12 AM of the following day. At 6:12 AM of the following day, the server error rate quota will completely reset for this project and view pair.
There are two server error quotas: 1) daily and 2) hourly. If either of these quotas is exhausted, the project and view pair is blocked from making requests for up to the next 24 hours or up to the next hour, respectively. The duration of being blocked does not increase if a project and view pair repeatedly exhausts this quota. For example, it is impossible to be blocked for more than 24 hours.
Thanks,
The Google Analytics Team
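The window behavior described above (a window that opens at the first server error and resets completely once it elapses, with 10 errors per hour and 50 per day) can be modelled client-side to predict when a project and view pair will be unblocked. This is a sketch of that description, not Google's implementation; `ErrorWindow`, `ServerErrorQuota` and the use of plain epoch seconds are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class ErrorWindow:
    """A window that opens at the first error and fully resets `length` seconds later."""
    limit: int
    length: int
    start: Optional[float] = None
    count: int = 0

    def _expire(self, now: float) -> None:
        # Once the window has elapsed, the quota resets completely.
        if self.start is not None and now >= self.start + self.length:
            self.start = None
            self.count = 0

    def record(self, now: float) -> None:
        self._expire(now)
        if self.start is None:
            self.start = now  # the first error opens a new window
        self.count += 1

    def blocked_until(self, now: float) -> Optional[float]:
        self._expire(now)
        if self.count >= self.limit:
            return self.start + self.length
        return None


class ServerErrorQuota:
    """Hypothetical tracker for one project and view pair: hourly and daily error quotas."""

    def __init__(self, hourly_limit: int = 10, daily_limit: int = 50):
        self.windows = [ErrorWindow(hourly_limit, HOUR), ErrorWindow(daily_limit, DAY)]

    def record_error(self, now: float) -> None:
        for window in self.windows:
            window.record(now)

    def blocked_until(self, now: float) -> Optional[float]:
        # Blocked until the latest end time among exhausted windows, or None if not blocked.
        ends = [w.blocked_until(now) for w in self.windows]
        ends = [e for e in ends if e is not None]
        return max(ends) if ends else None
```

In the 6:12 AM example, the daily window opens at 6:12 AM, and after the 50th error `blocked_until` returns 6:12 AM of the following day, never later, matching the statement that a block cannot exceed 24 hours.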
I have a VB.NET/Vue website hosted on an internal IIS 8.5 server running Windows Server 2012 R2. About 30 users in our company use the site at any given time. The users experience random delays throughout the day, while on some days there are no delays at all (the site works well most of the time). I'm looking for suggestions on where to start looking to solve the issue. Here's what I've found so far:
User goes to the site and initiates an API request from the UI
User sees a loading icon for anywhere up to a minute or so while waiting for the request to return
The request eventually reaches the server, executes within milliseconds, and returns the response to the user
By this time, many users have already refreshed the page, making new requests that succeed on page load. For the users who are patient and wait, the response eventually arrives.
Here are some screenshots:
So, to sum up, several users experience delays every day.
Some days we have no delays, but on most days several users experience multiple delays ranging from a few seconds to 30 seconds or a minute.
I've found all this using LogRocket and New Relic: all these requests complete within milliseconds on the server, but the request doesn't seem to reach the server for some period of time.
I've been monitoring CPU, memory, and network on these servers, and everything looks fine when these issues occur.
It seems that the problem lies somewhere between the user's computer and the web server, in whatever hardware or software sits in between.
Update: I found that in all these instances the problem occurs on the user's computer. Using Google Chrome's Performance API, I tracked timing info for these requests and found that the delay begins at fetchStart, so whatever happens right after it is the cause of the issue.
Example below:
entryType: resource
startTime: 1119531.820000033
duration: 56882.43999995757
initiatorType: xmlhttprequest
nextHopProtocol: http/1.1
workerStart: 0
redirectStart: 0
redirectEnd: 0
fetchStart: 1119531.820000033
domainLookupStart: 1176401.0199999902
domainLookupEnd: 1176402.2699999623
connectStart: 1176402.2699999623
connectEnd: 1176404.8350000521
secureConnectionStart: 1176403.6700000288
requestStart: 1176404.8549999716
responseStart: 1176413.5300000198
responseEnd: 1176414.2599999905
transferSize: 15145
encodedBodySize: 14884
decodedBodySize: 14884
serverTiming: []
workerTiming: []
fetchStart is at 1119531.820000033 and requestStart is at 1176404.8549999716, a gap of about 56.9 seconds, so the problem is something between fetchStart and requestStart. I'm still looking into what is causing this.
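To narrow the gap further, the timing entry above can be split into consecutive phases. The sketch below uses the values from that entry (in milliseconds); `phase_breakdown` and the phase names are illustrative, not part of the Performance API. It shows that about 56,869 ms of the 56,882 ms duration passes before domainLookupStart, i.e. before DNS lookup for this request even begins, while DNS, connect, server time and download together take only a few milliseconds.

```python
# Values copied from the PerformanceResourceTiming entry above (milliseconds).
entry = {
    "fetchStart": 1119531.820000033,
    "domainLookupStart": 1176401.0199999902,
    "domainLookupEnd": 1176402.2699999623,
    "connectStart": 1176402.2699999623,
    "connectEnd": 1176404.8350000521,
    "requestStart": 1176404.8549999716,
    "responseStart": 1176413.5300000198,
    "responseEnd": 1176414.2599999905,
}


def phase_breakdown(e):
    """Split a resource timing entry into consecutive phases, in milliseconds."""
    return {
        "before_dns": e["domainLookupStart"] - e["fetchStart"],
        "dns": e["domainLookupEnd"] - e["domainLookupStart"],
        "connect_tcp_tls": e["connectEnd"] - e["connectStart"],
        "request_to_first_byte": e["responseStart"] - e["requestStart"],
        "download": e["responseEnd"] - e["responseStart"],
    }


for name, ms in phase_breakdown(entry).items():
    print(f"{name:>22}: {ms:10.2f} ms")
```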
In 2022, we are experiencing something very similar with a small fraction of our customers. There is a significant gap between the timing API's requestStart and startTime. This gap can be up to 8 minutes; I admire the patience of customers who wait that long. The wait periods are also close to multiples of a minute.
In our case, it appears that there is a (transparent?) proxy between those browsers and our server infrastructure which appears to be triggering the problem. In particular, it forces a downgrade of HTTP/2 to HTTP/1.1. Whitelisting our website in that proxy does solve the problem. This isn't a very satisfactory solution, but it does make the customer happier!
[UPDATE]
In our case, it turned out that we were sending a Content-Length header with a non-zero value on a 304 response. This is technically invalid, and it caused problems with the proxy. It happened because of the Django CommonMiddleware, which always puts a Content-Length header on responses. The solution was to add a new piece of middleware that strips out the Content-Length header (and the content) on a 304 response.
It turned out that the content was already being stripped by our nginx frontend, but it is better not to generate it in the first place.
And what was the content? In our case, it was the 4 characters 'null'!
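The fix described in the update can be sketched as a Django middleware class (the style that takes `get_response` in `__init__`). This is not the poster's actual code; the class name is hypothetical. It assumes the class is listed above `django.middleware.common.CommonMiddleware` in `MIDDLEWARE`, because Django runs the response phase in reverse order, so this class then sees the response after CommonMiddleware has added Content-Length.

```python
class StripContentOn304Middleware:
    """Hypothetical middleware: remove body and Content-Length from 304 responses.

    Must be listed above CommonMiddleware in settings.MIDDLEWARE so that its
    response handling runs after CommonMiddleware has set Content-Length.
    """

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        if response.status_code == 304:
            # Streaming responses have no settable .content, so only clear normal ones.
            if not getattr(response, "streaming", False):
                response.content = b""
            if response.has_header("Content-Length"):
                del response["Content-Length"]
        return response
```

Enabling it would mean adding its dotted path (for example a hypothetical `"myapp.middleware.StripContentOn304Middleware"`) to `MIDDLEWARE` above `"django.middleware.common.CommonMiddleware"`.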
We have a backend API that runs in almost constant time (it simply sleeps for a given period). When we call it for a long time through a managed API that proxies to it, we see that its execution time occasionally increases to up to twice the average.
From analyzing the Amazon ALB data in production, it seems that the time the request spends inside Synapse stays the same, but the connection time (the time it takes the request to enter the queue for processing) is high.
In an isolated environment we noticed that these lags happen approximately every 10 minutes. In production, where we have multiple workers that receive requests all the time, the picture is less clear, as it happens more often (possibly the lag accumulates).
Is anyone aware of any periodic activity in the worker that delays requests entering the queue every few minutes? Is there a parameter that controls this? Otherwise, any idea how to figure out the cause?
Attached is an image demonstrating it.
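One way to confirm that the lags really are periodic, and to compare their spacing with any candidate timer, is to extract (timestamp, latency) pairs from the request logs and measure the gaps between spikes. The sketch assumes the samples are already parsed and sorted by timestamp (log parsing is not shown); `spike_intervals` and the 1.5× median threshold are arbitrary illustrative choices.

```python
from statistics import median


def spike_intervals(samples, factor=1.5):
    """Return the gaps (in seconds) between consecutive latency spikes.

    samples: list of (timestamp_seconds, latency_ms) tuples sorted by timestamp.
    A spike is any sample whose latency exceeds `factor` times the median latency.
    """
    threshold = factor * median(lat for _, lat in samples)
    spikes = [t for t, lat in samples if lat > threshold]
    return [later - earlier for earlier, later in zip(spikes, spikes[1:])]
```

If the returned intervals cluster around one value, that value can be compared with configured timeouts and scheduled tasks in the worker.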
Could be due to gateway token cache invalidation. The default timeout is 15 minutes.
Once key validation is done via the Key Manager, the key validation info is cached in the gateway. For subsequent API invocations, key validation is done from this cache, so the execution time is lower.
After the cache is invalidated, token validation is done via the Key Manager again (using the DB), which increases the execution time.
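The effect described in this answer can be illustrated with a generic time-to-live cache: lookups are fast until an entry expires, and then one call pays for the slow path before the entry is cached again. This is a minimal sketch, not the gateway's actual cache; `TTLCache` and the `loader` callback (standing in for the Key Manager call) are hypothetical, and a TTL of 900 seconds mirrors the 15-minute default mentioned above.

```python
import time


class TTLCache:
    """Cache values for `ttl` seconds; after expiry the next lookup takes the slow path."""

    def __init__(self, ttl, loader, clock=time.monotonic):
        self.ttl = ttl
        self.loader = loader   # hypothetical slow path, e.g. validating a token via the key manager
        self.clock = clock
        self._entries = {}     # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]                       # fast path: cached validation result
        value = self.loader(key)                # slow path: first use or expired entry
        self._entries[key] = (value, now + self.ttl)
        return value
```

With steady traffic on one token, the slow path runs once per TTL period, which would show up as a periodic latency spike at roughly that interval.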
Investigating further, we found out two causes for the spikes.
Our handler writes logs to a shared file system that was configured as sync instead of async, which caused delays. Switching it to async removed most of the spikes.
The additional spikes seem to be related to registry updates. We did not investigate those, as they were more sporadic.
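The general principle behind the logging fix is to keep slow file writes off the request path. The original handler and platform are not shown, so the sketch below only illustrates that principle with Python's standard `logging.handlers.QueueHandler` and `QueueListener`: the request thread puts records on an in-memory queue, and a background thread performs the write. `make_async_logger` is a hypothetical helper name.

```python
import logging
import logging.handlers
import queue


def make_async_logger(name, target_handler):
    """Return (logger, listener); records are written by a background thread.

    The calling thread only enqueues records; the QueueListener thread hands
    them to target_handler, which may be slow (e.g. a file on a shared mount).
    """
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

In an application the target would typically be a `logging.FileHandler` on the shared path, and `listener.stop()` should be called at shutdown so queued records are flushed.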
I've been using the Google Analytics API to authenticate on behalf of my customers and show them customized reports. Request throughput had always been good enough (11 req/sec).
However, for the past couple of days the throughput has dropped drastically, from 11 req/sec to 5 req/sec, and reports that used to take 10 seconds now take 40 seconds.
Nothing has changed recently, neither the queries nor the way we access the API.
I searched but haven't found any reports of performance degradation from Google.
Here are the requests for the last 4 days:
Is there anything I can do to verify what's wrong or where the problem is?
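One way to narrow this down is to measure per-request latency separately from throughput. Throughput is roughly concurrency divided by latency, so if the number of concurrent requests is unchanged and median latency has grown by about the same factor that throughput dropped, the slowdown is per request on the API side rather than in your own pipeline. The sketch assumes a hypothetical `request_fn` that issues one unchanged report query; the thread-pool approach and the percentile choice are illustrative.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure(request_fn, n_requests, concurrency, clock=time.perf_counter):
    """Run request_fn n_requests times on `concurrency` threads.

    Returns throughput (requests/second) and p50/p95 per-request latency (seconds).
    """
    def timed_call(_):
        start = clock()
        request_fn()
        return clock() - start

    wall_start = clock()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(n_requests)))
    wall = clock() - wall_start
    cuts = statistics.quantiles(latencies, n=20)  # 19 cut points: cuts[9] = p50, cuts[18] = p95
    return {"req_per_sec": n_requests / wall, "p50_s": cuts[9], "p95_s": cuts[18]}
```

Running it with the same `n_requests` and `concurrency` at different times of day and comparing `p50_s` gives a baseline to attach to a support request.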
Recently we moved a dedicated server from Windows Server 2003 to a more powerful server running Windows Server 2012. Since then, a problem has appeared on the new server: some requests occasionally run very slowly.
I added logging at various stages of the requests and got confusing data. For example, a call to a web service method takes 7 seconds between PreRequestHandlerExecute and PostRequestHandlerExecute (timed in Global.asax), but log records made inside the called method show that it executed in less than a second (the log records at the start and end of the method have the same millisecond timestamp). So the method itself executed quickly. The question is: what consumed the 7 seconds between PreRequestHandlerExecute and PostRequestHandlerExecute?
Note that the problem is not reproducible: I can never reproduce it myself and only see it in the logs happening to other people (I set up an email notification that is sent to me whenever a request takes more than 3 seconds).
Sometimes the execution time of some pages reaches crazy values like 2 minutes, and from the log records I have at different stages of page execution I cannot see what consumes that time. The problem did not exist on the old 2003 server.