I've recently been fixing my application, which apparently hit some GA quota limits, and I've found a couple of things that aren't clear to me:
Does the 4-concurrent-requests limitation apply per application, per web property, or something else?
If we break the 10-requests-per-second or 4-concurrent-requests limit, how long does it take before GA stops responding with the 503 ServiceUnavailable error?
Does the per-application quota refer to the application name string only? We are running two different web applications using different GA application strings. Both apps connect to the GA API from the same IP address. Can we expect the per-application quota to be calculated separately for each application string in this case?
Are the status codes sent with the 503 ServiceUnavailable response documented anywhere? Can we be sure that rateLimitExceeded refers to the 10-requests-per-second limitation? How can I find out the cause of a 503 response?
By the way, is it possible that stricter quota restrictions than documented sometimes take effect?
For example, is it possible that GA replies with a 503 ServiceUnavailable response after just 6 fast consecutive requests, or because of some other undesired client behavior that isn't covered in the documentation?
Regards,
Pavel
It's just been answered by Nick in the GA Google Group.
Main points: the 10 qps and 4 parallel requests limits are counted per IP address, so even an application running on a different machine in the same network may count against the same quota.
I've submitted a documentation bug report to the GData issue tracker.
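For anyone else hitting the same limits, here is a minimal sketch of the kind of client-side backoff that could be used when GA answers 503. The Bearer token header, retry count and delays are my own assumptions, not values from the documentation:

// Retry a GA request with exponential backoff whenever the API answers 503
// (e.g. rateLimitExceeded). The delays and retry count are illustrative only.
async function fetchWithBackoff(url: string, token: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (response.status !== 503) {
      return response; // success, or an error we don't retry here
    }
    // Quota hit: wait 2^attempt seconds plus some jitter before trying again.
    const delayMs = 2 ** attempt * 1000 + Math.random() * 1000;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Still receiving 503 after ${maxRetries} retries: ${url}`);
}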
We are currently monitoring a web API using the data in the Performance page of Application Insights, to give us the number of requests received per operation.
The architecture of our API solution is to use APIM as the frontend and an App Service as the backend. Both instances have App Insights enabled, and we don't see a reasonable correlation between the number of requests to APIM and the requests to the App Service. Also, this is most noticeable only in a couple of operations.
For example,
Apim-GetUsers operation has a count of 60,000 requests per day (APIM's AI instance)
APIM App Insights Performance Page
AS-GetUsers operation has a count of 3,000,000 requests per day (App Service's AI instance)
App Service App Insights Performance Page
Apim-GetUsers routes the request to AS-GetUsers and Apim-GetUsers is the only operation that can call AS-GetUsers.
Given this, I would expect to see ~60,000 requests on the App Service's AI Performance page for that operation; instead, we see that huge number.
I looked into this a little and found out about sampling, and that some App Insights features use the itemCount property to find the exact number of requests. In summary:
Is my expectation correct, and if so, what could cause this? Also, would disabling adaptive sampling and using a fixed sampling rate give me the expected result?
Is my expectation wrong, and if so, what is a good way to get the expected result? Should I not use the Performance page for that metric?
I haven't tried a whole lot yet, as I don't have access to play with the settings until I can find a viable solution, but I looked into sampling and the itemCount property as mentioned above. APIM sampling is set to 100%.
I ran a query in Log Analytics on the requests table: when I simply counted the rows, I got a number close to the one I see in APIM, but when I summed itemCount, as suggested by some MS docs, I got the huge number seen on the Performance page.
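For reference, the two queries I compared were roughly the following, reconstructed from memory and kept here as strings in a TypeScript snippet; the 1-day window and the operation-name filter are assumptions about what I actually ran:

// The two Log Analytics queries being compared (reconstructed, not exact).
const storedRowCount = `
  requests
  | where timestamp > ago(1d)
  | where name == "AS-GetUsers"
  | summarize storedRows = count()`;

const estimatedRequestCount = `
  requests
  | where timestamp > ago(1d)
  | where name == "AS-GetUsers"
  | summarize totalRequests = sum(itemCount)`;

// count() only counts the telemetry rows kept after sampling (close to the
// APIM figure), while sum(itemCount) scales each stored row back up to the
// pre-sampling estimate (the huge number shown on the Performance page).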
List of NuGet packages and version that you are using:
Microsoft.Extensions.Logging.ApplicationInsights 2.14.0
Microsoft.ApplicationInsights.AspNetCore 2.14.0
Runtime version (e.g. net461, net48, netcoreapp2.1, netcoreapp3.1, etc. You can find this information from the *.csproj file):
netcoreapp3.1
Hosting environment (e.g. Azure Web App, App Service on Linux, Windows, Ubuntu, etc.):
App Service on Windows
Edit 1: Picture of operation_Id and itemCount
I have a VB.NET/Vue website hosted on an internal IIS 8.5 Windows 2012 R2 server. Our company has about 30 users on the site at any given time. The users are experiencing random delays throughout the day, and on some days there are no delays (the site works great most of the time). What I'm looking for is any suggestion on where to start looking to solve the issue. Here's what I've found so far.
User goes to the site and initiates an API request from the UI
User sees a loading icon for up to a minute or so while the request is pending
The request eventually reaches the server, executes within milliseconds, and returns the response to the user
By this time, many users have already refreshed the page, making new requests that succeed on page load. For the users who are patient and wait, the response eventually comes back.
Here are some screenshots:
So to sum everything up, there are several users experiencing delays on a daily basis.
Some days we don't have any delays, but most days several users experience multiple delays ranging from a few seconds up to 30 seconds or a minute.
I found all of this using LogRocket and New Relic: the requests complete within milliseconds once they reach the server, but they don't seem to reach the server for some period of time.
I've been monitoring CPU/memory/network on these servers and it all looks fine when these issues occur.
It seems that the problem lies between the user's computer and whatever hardware/software sits in front of the web server.
Update here... I found that the problem is occurring on the user's computer in all these instances. Using Google Chrome's Performance API, I was able to track timing info for these requests and found that the problem is at fetchStart. So whatever is happening there is the cause of the issue.
Example below:
entryType: resource
startTime: 1119531.820000033
duration: 56882.43999995757
initiatorType: xmlhttprequest
nextHopProtocol: http/1.1
workerStart: 0
redirectStart: 0
redirectEnd: 0
fetchStart: 1119531.820000033
domainLookupStart: 1176401.0199999902
domainLookupEnd: 1176402.2699999623
connectStart: 1176402.2699999623
connectEnd: 1176404.8350000521
secureConnectionStart: 1176403.6700000288
requestStart: 1176404.8549999716
responseStart: 1176413.5300000198
responseEnd: 1176414.2599999905
transferSize: 15145
encodedBodySize: 14884
decodedBodySize: 14884
serverTiming: []
workerTiming: []
fetchStart is at 1119531.820000033 and requestStart is at 1176404.8549999716, so the request sat for roughly 57 seconds between fetchStart and requestStart. Still looking into what is causing this.
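Here is a minimal sketch of how such entries can be captured in the browser; the 5-second threshold and the console output are illustrative choices, and a real app would ship the data to LogRocket/New Relic instead:

// Watch resource timing entries and flag any XHR/fetch request where the gap
// between fetchStart and requestStart exceeds a threshold.
const SLOW_START_THRESHOLD_MS = 5000;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as PerformanceResourceTiming[]) {
    if (entry.initiatorType !== "xmlhttprequest" && entry.initiatorType !== "fetch") {
      continue;
    }
    // requestStart can be 0 for cross-origin resources without Timing-Allow-Origin.
    const stalledMs = entry.requestStart - entry.fetchStart;
    if (entry.requestStart > 0 && stalledMs > SLOW_START_THRESHOLD_MS) {
      console.warn(`${entry.name} stalled ${Math.round(stalledMs)} ms before requestStart`);
    }
  }
});

observer.observe({ type: "resource", buffered: true });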
In 2022, we are experiencing something very similar with a small fraction of our customers. There is a significant gap between the Timing API's requestStart and startTime. This gap can be up to 8 minutes -- I admire the patience of customers waiting that long. The wait periods are also close to multiples of a minute.
In our case, there appears to be a (transparent?) proxy between those browsers and our server infrastructure that is triggering the problem. In particular, it forces a downgrade from HTTP/2 to HTTP/1.1. Whitelisting our website in that proxy does solve the problem. This isn't a very satisfactory solution, but it does make the customer happier!
[UPDATE]
In our case, it turned out that we were sending a Content-Length header with a non-zero value on 304 responses. This is technically invalid, and it caused problems with the proxy. It happened because Django's CommonMiddleware always puts a Content-Length header on responses. The solution was to add a new piece of middleware that strips the Content-Length header (and the content) from 304 responses.
It turned out that the content was already being stripped by our nginx frontend, but it is better not to generate it in the first place.
And what was the content? -- in our case, it was the 4 characters 'null'!
My team and I have been at this for 4 full days now, analyzing every log available to us - Azure Application Insights, you name it, we've analyzed it. And we cannot get to the bottom of this issue.
We have a customer who is integrated with our API to make search calls, and they are complaining of intermittent but persistent 502.3 Bad Gateway errors.
Here is the flow of our architecture:
All resources are in Azure. The endpoint our customers call is a .NET Framework 4.7 Web App Service in Azure that acts as the stateless handler for all the API calls and responses.
This API app sends the calls to an Azure Service Fabric cluster - that cluster load-balances incoming traffic and distributes the API calls to our Search Service Application. The Search Service Application then generates an Elasticsearch query from the API call and sends that query to our Elasticsearch cluster.
Elasticsearch then sends the results back to Service Fabric, and the process reverses from there until the results are sent back to the customer from the API endpoint.
What may separate our process from a typical API is that our response payload can be relatively large, depending on the search. On average over these last several days, the payload of a single response has been anywhere from 6 MB to 12 MB. Our searches simply return a lot of data from Elasticsearch. In any case, a normal search is typically executed and returned in 15 seconds or less. As of right now, we have already increased our timeout window to 5 minutes just to try to handle what is happening and reduce timeout errors, given how long their searches are taking. We increased the timeout via the following code in Startup.cs:
services.AddSingleton<HttpClient>(s =>
{
    // Single shared HttpClient with the timeout raised from the 100-second
    // default to 5 minutes.
    return new HttpClient() { Timeout = TimeSpan.FromSeconds(300) };
});
I've read in some places that you actually have to do this in the web.config file as opposed to here, or at least in addition to it. Not sure if this is true?
So the customer who is getting the 502.3 errors has significantly increased the volume they send us over the last week, but we believe we are fully scaled to handle it. They are still trying to put the issue on us, but after many days of research, I'm starting to wonder if the problem is actually on their side. Could it be that they are not equipped to take the increased payload on their side? Can it be that their integration architecture is not scaled enough to take the return payload from the increased volumes? When we observe the resource usage (CPU/RAM/IO) of all of the above applications, it is all normal - all below 50%. This also makes me wonder if this is on their side.
I know it's a bit of a subjective question, but I'm hoping for some insight from someone who may have experienced this before, but even more importantly, from someone who has experience with a .NET API app in Azure that returns large datasets in its responses.
Any code blocks of our API app, or screenshots from Application Insights are available to post upon request - just not sure what exactly anyone would want to see yet as I type this.
I use calls like
http://maps.googleapis.com/maps/api/geocode/xml?address=Switzerland+Bern+&components=country:CH&sensor=false
to get geocoordinates. This works for some hours, and after that the response is
<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse> <status>OVER_QUERY_LIMIT</status> </GeocodeResponse>
I am pretty sure that I don't exceed the 2,500 calls I can make with my server API key, and in the API Console I have only reached 10% of my quota.
I even have billing enabled. Is there a way to debug this further, or is there something I am missing?
This is my API Console:
This is the usage data
I have the same issue. From research I understand the geocoder has a request limit of 2,500 per day and 20 per second. These are on an IP address basis.
See https://developers.google.com/maps/documentation/geocoding/index#Limits
That means if your application is on a shared server, and there are other applications sharing the IP address and using the geocode service, their requests will be bundled with yours to provide the total.
I do not know of any way of seeing the request report for the geocoder; the report in the API Console is for Maps, which covers displaying a map, not geocoding an address.
As far as I'm aware, there is no way around the geocoder limit; simply having a standard paid API account does not seem to make any difference to the request limit.
Maps for Business accounts however allow a usage limit of up to 100,000 requests per day.
The best option may be for you to change the architecture of your application (if possible) from server-side geocoding to client-side geocoding, which would negate the restriction on your server's IP. See https://developers.google.com/maps/articles/geocodestrat
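As a rough illustration of that architecture change, the lookup runs in the visitor's browser with the Maps JavaScript API (assuming the API script is already loaded on the page), so each request counts against the visitor's IP rather than your server's:

// Client-side geocoding sketch: the request is made from the visitor's browser.
function geocodeAddress(address: string): void {
  const geocoder = new google.maps.Geocoder();
  geocoder.geocode({ address }, (results, status) => {
    if (status === google.maps.GeocoderStatus.OK && results && results.length > 0) {
      const location = results[0].geometry.location;
      console.log(`${address} -> ${location.lat()}, ${location.lng()}`);
    } else {
      console.warn(`Geocoding failed for "${address}": ${status}`);
    }
  });
}

geocodeAddress("Bern, Switzerland");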
I think it's a problem on Google's side. Try to avoid server-side geocoding; use client-side geocoding instead.
I'm placing up to 25 markers on a map, but when I hit 12 I get an "OVER_QUERY_LIMIT" error.
I have hit nowhere near the 2,500 hits a day limit.
If I plot only 11 markers, I have no problem.
Does anyone know why this is?
edit
OK, after a lot of testing I have determined that I can't call geocoder.geocode more than a certain number of times before I have to wait for the earlier calls to finish.
I have implemented a version that sends a batch of requests, waits, and then sends more. It's working, but it's a total fudge (a rough sketch of the approach is below).
Is there a way to geocode a bunch of addresses at once without this limitation?
My client does not store the lat/lng of the addresses, so I need to derive it from the address.
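For reference, the fudge looks roughly like this; the batch size and pause shown here are illustrative, not the exact numbers I use:

// Geocode addresses in small batches, pausing between batches so the
// rate limit isn't hit. Batch size and pause duration are illustrative.
async function geocodeInBatches(
  addresses: string[],
  batchSize = 10,
  pauseMs = 1000
): Promise<Array<{ address: string; location: google.maps.LatLng | null }>> {
  const geocoder = new google.maps.Geocoder();

  const geocodeOne = (address: string) =>
    new Promise<google.maps.LatLng | null>((resolve) => {
      geocoder.geocode({ address }, (results, status) => {
        resolve(
          status === google.maps.GeocoderStatus.OK && results && results.length > 0
            ? results[0].geometry.location
            : null
        );
      });
    });

  const out: Array<{ address: string; location: google.maps.LatLng | null }> = [];
  for (let i = 0; i < addresses.length; i += batchSize) {
    const batch = addresses.slice(i, i + batchSize);
    const locations = await Promise.all(batch.map(geocodeOne));
    batch.forEach((address, j) => out.push({ address, location: locations[j] }));
    if (i + batchSize < addresses.length) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs)); // wait before the next batch
    }
  }
  return out;
}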
The JS geocoder is rate limited:
"The Google Maps API provides a geocoder class for geocoding addresses dynamically from user input. These requests are rate-limited to discourage abuse of the service. If instead, you wish to geocode static, known addresses, see the Geocoding web service documentation."
From http://code.google.com/apis/maps/documentation/javascript/geocoding.html#Geocoding
The web service documentation also mentions a rate limit, but presumably it's higher:
http://code.google.com/apis/maps/documentation/geocoding/#Limits
You can also cache addresses for a limited amount of time for performance purposes. That would allow you to check your cache first before running into the problem.
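As a rough sketch of that caching idea, assuming sessionStorage is an acceptable place for a short-lived cache (it is cleared when the tab closes):

// Check a short-lived cache before calling the geocoder again for the same address.
function getCachedLocation(address: string): { lat: number; lng: number } | null {
  const cached = sessionStorage.getItem(`geocode:${address}`);
  return cached ? JSON.parse(cached) : null;
}

function cacheLocation(address: string, location: google.maps.LatLng): void {
  sessionStorage.setItem(
    `geocode:${address}`,
    JSON.stringify({ lat: location.lat(), lng: location.lng() })
  );
}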