Google Maps returns a status of OVER_QUERY_LIMIT under some conditions. I'm looking for any details on what those conditions are.
I'm able to deal with the status by resubmitting the Directions Service route request after a delay. I've been able to minimize the number of OVER_QUERY_LIMIT statuses by managing and throttling the requests with several 'magic number' setTimeout delays. I'm concerned that while these 'magic numbers' work for me, on my machine, with my network connection, at my location, on the day they were set, they may be inappropriate under other circumstances.
Does anyone have any experience with addressing the OVER_QUERY_LIMIT issue particularly with adaptive strategies not susceptible to any of the above?
Edit ---------------------------------------------------------------------
It seems that the question might not be entirely clear. I found and understand the online documentation. The question is about what it means, what the details are, and strategies for dealing with them. In particular:
What is "to many request per second"? Is it 1 or 5 or 50 or what? How is it determined? What does it respond to? How does it change with use?
What strategies, particularly adaptive strategies, seem to work for dealing with the issue?
Edit again: Perhaps it's still not clear. ------------------------------------
In the particular case of a Directions Service request:
Does anyone know what the actual values are for 'requests per second'?
Does anyone know if the value is a constant or if it changes?
Does anyone know, if it changes, what affects the change?
Does anyone know how long a wait is necessary before making another Directions Service request?
More generally, does anyone know how these change with other Google Maps services?
Google's services are subject to a quota and a rate limit (which can depend on server load).
From the documentation on OVER_QUERY_LIMIT in the web services:
Usage limits exceeded
If you exceed the usage limits you will get an OVER_QUERY_LIMIT status code as a response.
This means that the web service will stop providing normal responses and switch to returning only status code OVER_QUERY_LIMIT until more usage is allowed again. This can happen:
Within a few seconds, if the error was received because your application sent too many requests per second.
Some time in the next 24 hours, if the error was received because your application sent too many requests per day. The time of day at which the daily quota for a service is reset varies between customers and for each API, and can change over time.
Upon receiving a response with status code OVER_QUERY_LIMIT, your application should determine which usage limit has been exceeded. This can be done by pausing for 2 seconds and resending the same request. If status code is still OVER_QUERY_LIMIT, your application is sending too many requests per day. Otherwise, your application is sending too many requests per second.
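For example, a minimal sketch of that classification, assuming server-side calls to the Directions web service (the helper names and returned strings are illustrative, not an official SDK):

using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

static class DirectionsQuotaProbe
{
    // Sketch only: classify an OVER_QUERY_LIMIT response per the documentation above.
    public static async Task<string> ClassifyOverQueryLimitAsync(HttpClient http, string requestUrl)
    {
        // First attempt.
        if (await GetStatusAsync(http, requestUrl) != "OVER_QUERY_LIMIT")
            return "ok";

        // Pause 2 seconds and resend the same request, as the docs suggest.
        await Task.Delay(TimeSpan.FromSeconds(2));

        // Still limited after the pause: daily quota; otherwise: per-second rate limit.
        return await GetStatusAsync(http, requestUrl) == "OVER_QUERY_LIMIT"
            ? "daily quota exceeded"
            : "per-second rate limit hit";
    }

    private static async Task<string> GetStatusAsync(HttpClient http, string requestUrl)
    {
        // The Directions web service returns a top-level "status" field in its JSON response.
        using var doc = JsonDocument.Parse(await http.GetStringAsync(requestUrl));
        return doc.RootElement.GetProperty("status").GetString();
    }
}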
This error occurs when you make more than 8 or 9 requests per second.
I solved this with the following code.
function (response, status) {
    if (status === 'OK') {
        var directionsDisplay = new google.maps.DirectionsRenderer();
        directionsDisplay.setMap(map);
        directionsDisplay.setDirections(response);
    }
    else if (status === google.maps.DirectionsStatus.OVER_QUERY_LIMIT) {
        // Pause further requests, then clear the flag after 2 seconds.
        wait = true;
        setTimeout(function () { wait = false; }, 2000);
        //alert("OQL: " + status);
    }
}
While quoting the documentation seems popular, no one really seems to know the actual requests-per-second limit or much about the actual algorithm used. A bit of investigation suggests that the most likely algorithm is a decaying rate function. I've demonstrated that it is possible to issue many (>15) requests in a burst, but after that it is only possible to issue a few (<4) requests in >1000 ms. A significant pause allows a repeat with similar, though somewhat smaller, numbers.
I eventually adopted a queued, rate-limited function. While it is not the fastest possible approach (it fails to capitalize on the burst characteristic), it is simple and appears to be very robust, with extremely few OVER_QUERY_LIMIT errors.
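As a rough illustration of the pattern (not my exact code, and shown server-side in C# for concreteness), a queued, rate-limited dispatcher can look like this; the 400 ms spacing is an arbitrary number to tune:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of a queued, rate-limited dispatcher: requests are queued and released
// one at a time with a fixed spacing, instead of being fired as fast as possible.
class RateLimitedQueue
{
    private readonly ConcurrentQueue<Func<Task>> _pending = new ConcurrentQueue<Func<Task>>();
    private readonly TimeSpan _interval;

    public RateLimitedQueue(TimeSpan interval) => _interval = interval;

    public void Enqueue(Func<Task> request) => _pending.Enqueue(request);

    // Drain the queue, pausing between requests so the per-second limit is never hit.
    public async Task RunAsync()
    {
        while (_pending.TryDequeue(out var request))
        {
            await request();             // e.g. one directions route request
            await Task.Delay(_interval); // fixed spacing; tune this value
        }
    }
}

// Usage sketch (SendRouteRequestAsync is a hypothetical helper):
//   var queue = new RateLimitedQueue(TimeSpan.FromMilliseconds(400));
//   queue.Enqueue(() => SendRouteRequestAsync(origin, destination));
//   await queue.RunAsync();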
I have a web app and a Windows Service which communicate using Firebase Cloud Messaging. The web app subscribes to a couple of topics to receive messages, and the Windows Service app sends messages to one of these topics. In some cases it can be several messages per second, and it gives me this error:
FirebaseAdmin.Messaging.FirebaseMessagingException: Topic quota exceeded
I don't quite get it. Is there a limit to the number of messages that can be sent to a specific topic, or what does this mean?
So far I have only found info about topic names and subscription limits, but I couldn't actually find anything about a "topic quota", except maybe this page of the docs (https://firebase.google.com/docs/cloud-messaging/concept-options#fanout_throttling), although I am not sure it refers to the same thing, or whether and how it can be changed. In the Firebase Console I can't find anything either. Has anybody got an idea?
Well.. from this document it seems pretty clear that this can happen:
The frequency of new subscriptions is rate-limited per project. If you send too many subscription requests in a short period of time, FCM servers will respond with a 429 RESOURCE_EXHAUSTED ("quota exceeded") response. Retry with exponential backoff.
I do agree that the document should have stated what volume triggers the blocking mechanism instead of just telling the developer to "Retry with exponential backoff". But, at the end of the day, Google also produced this document to help developers understand how to properly implement this mechanism. In a nutshell:
If the request fails, wait 1 + random_number_milliseconds seconds and retry the request.
If the request fails, wait 2 + random_number_milliseconds seconds and retry the request.
If the request fails, wait 4 + random_number_milliseconds seconds and retry the request.
And so on, up to a maximum_backoff time.
My conclusion: reduce the number of messages sent to the topic OR implement a retry mechanism to recover unsuccessful attempts.
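A minimal sketch of such a retry with exponential backoff and jitter, using the FirebaseAdmin SDK from the error above (the attempt cap and base delay are arbitrary choices, not documented FCM values):

using System;
using System.Threading.Tasks;
using FirebaseAdmin.Messaging;

static class FcmSender
{
    // Sketch: retry a topic send with exponential backoff plus jitter.
    public static async Task<string> SendWithBackoffAsync(Message message, int maxAttempts = 5)
    {
        var random = new Random();
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                // Success: returns the message ID assigned by FCM.
                return await FirebaseMessaging.DefaultInstance.SendAsync(message);
            }
            catch (FirebaseMessagingException) when (attempt < maxAttempts - 1)
            {
                // Wait 1s, 2s, 4s, ... plus up to 1s of random jitter, then retry.
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt))
                               + TimeSpan.FromMilliseconds(random.Next(0, 1000)));
            }
        }
        throw new InvalidOperationException("Unreachable: the last attempt either returned or threw.");
    }
}

// Usage sketch:
//   await FcmSender.SendWithBackoffAsync(new Message { Topic = "my-topic", Data = payload });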
It could be one of these issues:
1. Too high a subscription rate
As noted here:
The frequency of new subscriptions is rate-limited per project. If you send too many subscription requests in a short period of time, FCM servers will respond with a 429 RESOURCE_EXHAUSTED ("quota exceeded") response. Retry with exponential backoff.
But this doesn't seem to be your problem, as you don't open new subscriptions but instead send messages at a high rate.
2. Too many messages sent to one device
As noted here:
Maximum message rate to a single device
For Android, you can send up to 240 messages/minute and 5,000 messages/hour to a single device. This high threshold is meant to allow for short term bursts of traffic, such as when users are interacting rapidly over chat. This limit prevents errors in sending logic from inadvertently draining the battery on a device.
For iOS, we return an error when the rate exceeds APNs limits.
Caution: Do not routinely send messages near this maximum rate. This could waste end users' resources, and your app may be marked as abusive.
Final notes
Fanout throttling doesn't seem to be the issue here, as that rate limit is really high.
The best way to fix your issue would be to:
Lower your rates, control the number of "devices" notified, and overall limit your usage over short periods of time
Keep your rates as they are, but implement a back-off retry policy in your Windows Service app
Maybe look into a service more suited to your usage (as FCM is strongly focused on end-client notifications), like Pub/Sub
My team and I have been at this for 4 full days now, analyzing every log available to us, Azure Application Insights, you name it, we've analyzed it. And we cannot get down to the cause of this issue.
We have a customer who is integrated with our API to make search calls and they are complaining of intermittent but continual 502.3 Bad Gateway errors.
Here is the flow of our architecture:
All resources are in Azure. The endpoint our customers call is a .NET Framework 4.7 Web App Service in Azure that acts as the stateless handler for all the API calls and responses.
This API app sends the calls to an Azure Service Fabric cluster. That cluster load-balances on the way in and distributes the API calls to our Search Service Application. The Search Service Application then generates an ElasticSearch query from the API call and sends that query to our ElasticSearch cluster.
ElasticSearch then sends the results back to Service Fabric, and the process reverses from there until the results are sent back to the customer from the API endpoint.
What may separate our process from a typical API is that our response payload can be relatively large, depending on the search. On average these last several days, the payload of a single response can be anywhere from 6MB to 12MB. Our searches simply return a lot of data from ElasticSearch. In any case, a normal search is typically executed and returned in 15 seconds or less. As of right now, we have already increased our timeout window to 5 minutes just to try to handle what is happening and reduce timeout errors, given that their searches are taking so long. We increased the timeout via the following code in Startup.cs:
services.AddSingleton<HttpClient>(s => {
    return new HttpClient() { Timeout = TimeSpan.FromSeconds(300) };
});
I've read in some places that you actually have to do this in the web.config file as opposed to here, or at least in addition to it. Not sure if this is true?
So the customer who is getting the 502.3 errors has significantly increased the volumes they are sending us over the last week, but we believe we are fully scaled to be able to handle it. They are still trying to put the issue on us, but after many days of research, I'm starting to wonder if the problem is actually on their side. Could it be that they are not equipped to take the increased payload on their side? Could their integration architecture not be scaled enough to take the return payload from the increased volumes? When we observe our resource usage (CPU/RAM/IO) on all of the above applications, it is all normal, all below 50%. This also makes me wonder if this is on their side.
I know it's a bit of a subjective question, but I'm hoping for some insight from someone who may have experienced this before, but even more importantly, from someone who has experience with a .NET API app in Azure which returns large datasets in its responses.
Any code blocks of our API app, or screenshots from Application Insights are available to post upon request - just not sure what exactly anyone would want to see yet as I type this.
Currently, I am using the Measurement Protocol to push data to GA. The problem is that I don't get any response back indicating success or error in production. Is there a way to get one? If so, please suggest how.
Because of this, I am looking for any other options to achieve the same thing, for example, can we achieve it using Analytics 360?
The Google Analytics production data-collection endpoint does not return a request status (it always responds 200 OK); this is by design, to ensure ultra-light processing speed.
What I usually recommend to clients using the Measurement Protocol server-side is to:
Log a reasonable number of requests (or all of them) somewhere. Storage is extremely cheap nowadays, and since you know the data format, you will be able to manually extract the data if an emergency happens.
Every once in a while (randomly, one in a thousand or one in a million requests, or even more often depending on the importance of the data), validate the request against the GA debug endpoint and parse the returned JSON. If there are any warnings or errors, send a notification for further investigation. This way, if anything goes wrong, you will be on top of the problem before the BI & Marketing teams are affected.
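A minimal sketch of that occasional validation step, posting one hit to the Measurement Protocol debug endpoint and checking the parse result (the sample payload and the alert hook are illustrative):

using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

static class GaDebugValidator
{
    // Sketch: send one hit to the GA debug endpoint and check whether it parses cleanly.
    public static async Task<bool> ValidateHitAsync(HttpClient http, string payload)
    {
        // The /debug/collect endpoint validates the hit instead of collecting it
        // and returns a JSON report.
        var response = await http.PostAsync(
            "https://www.google-analytics.com/debug/collect",
            new StringContent(payload));

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());

        // hitParsingResult[0].valid tells you whether GA would have accepted the hit.
        return doc.RootElement.GetProperty("hitParsingResult")[0]
                  .GetProperty("valid").GetBoolean();
    }
}

// Usage sketch (the payload is an illustrative pageview; NotifyTeam is a hypothetical alert hook):
//   if (!await GaDebugValidator.ValidateHitAsync(http, "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fhome"))
//       NotifyTeam();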
I have the following problem: I have two network clients, where one is a device that is to be "claimed" by its owner, and the other is the program which claims it. When the claimee hits the server, it announces it's available to be claimed, and then the claimer can claim it (after authenticating and supplying information only it could know, of course). However, if the claimer hits the server first, then I have a classic "lost signal" problem. The claimer can retry and that's fine, but I can end up with the following race condition, the main point in question:
Claimee hits the server and announces, then its connection fails
Claimer comes in and find the announced record, and claims it
Claimee reconnects with a status of unclaimed, and overwrites the claim
I've thought of a few solutions:
1) Expire old claimee announces after 60 seconds, and have the claimer retry. This is still susceptible to the above problem, but shrinks the window to about 60 seconds. In addition, the claimee takes about 30-40 seconds to bootstrap, so it should pragmatically make the problem very hard to encounter, or reproduce.
2) Have claims issued by the claimer be valid for any claimee announce up to 30 seconds after the claim came in. This works, but it starts to muddle the definition of a claimee announce: the announce isn't always interpreted as "reset the claimee status", because for up to 30 seconds after the last claim it means "join the last claim".
Those are the high points, but they may not be enough of a description of the problem, so let me know if I can add any comments to elucidate further. These are workable solutions, but I'm looking for an analogy to a known problem, perhaps, and to see if there are ideas I haven't thought of.
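To make option 2 a bit more concrete, this is roughly the kind of server-side handling I have in mind (the 30-second window and field names are placeholders):

using System;

// Sketch of option 2: an announce that arrives shortly after a claim joins that claim
// instead of resetting it.
class DeviceRecord
{
    public string ClaimedBy;        // null when unclaimed
    public DateTime? ClaimedAtUtc;  // when the last claim was recorded
}

static class ClaimProtocol
{
    public static void HandleAnnounce(DeviceRecord record, DateTime nowUtc)
    {
        var claimWindow = TimeSpan.FromSeconds(30);

        if (record.ClaimedAtUtc.HasValue && nowUtc - record.ClaimedAtUtc.Value <= claimWindow)
        {
            // A claim was recorded recently: treat this announce as the claimee
            // reconnecting and joining that claim, so keep the claim intact.
            return;
        }

        // Otherwise the announce keeps its usual meaning: reset to unclaimed.
        record.ClaimedBy = null;
        record.ClaimedAtUtc = null;
    }
}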
Maybe I didn't understand the problem description correctly, but you also have another problem: what if both are connected just fine and then the claimee fails? The claimer will need to deal with this issue as well, unless you're assuming that this scenario can never happen.
In general there are several ways to implement a solution for both problems, but probably the most reliable one would be inspired by the implementation used by Java's RMI.
When you send a message to the claimee, add a unique ID to it. When you don't get an answer, you can retry sending the message several times with the same ID (messages can get lost), and after some longer timeout you can accept that the claimee is unavailable. Now you can again look for connection information at the server and restart the process.
For this you'd need to cache all messages which haven't yet been processed on the claimer's side. Additionally, on the claimee's side you'll need to cache the last X message IDs and their results (if available). This is necessary in order not to perform the operations in one message multiple times, and also to be able to reply with the correct result again (since result messages can also get lost).
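A minimal sketch of that claimee-side cache (the ID type, result type and cache size are arbitrary choices for illustration):

using System;
using System.Collections.Generic;

// Sketch of the claimee-side cache: each message carries a unique ID, and results are
// cached so a re-sent message gets the same reply without re-running the operation.
class IdempotentHandler
{
    private const int MaxCached = 100;                 // the "last X" - arbitrary size
    private readonly Dictionary<Guid, string> _results = new Dictionary<Guid, string>();
    private readonly Queue<Guid> _order = new Queue<Guid>();

    public string Handle(Guid messageId, Func<string> operation)
    {
        // Already processed: reply with the cached result instead of redoing the work.
        if (_results.TryGetValue(messageId, out var cached))
            return cached;

        var result = operation();
        _results[messageId] = result;
        _order.Enqueue(messageId);

        // Evict the oldest entry once the cache grows past its limit.
        if (_order.Count > MaxCached)
            _results.Remove(_order.Dequeue());

        return result;
    }
}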
When designing a service in NServiceBus, at what point do you start questioning how many messages handled by a service is too many, and start breaking these out into a new service?
Consider the following: I have a sales service which can currently be broken into a few distinct business components: sales order validation, sales order processing, purchase order validation and purchase order processing.
There are currently about 20 message handlers and 2 sagas used within this service. My concern is that during high-volume traffic from my website this can cause an initial spike in messages, jumping into the hundreds. Considering that the messages need to be processed in the order they are taken off the queue, this can cause a delay for the last in the queue (depending on what processing each message does).
When separating concerns within a service into smaller business components, I find this makes things a little easier. Sure, it's a logical separation, but it seems to provide a layer of clarity and understanding. To me it seems an easier option to do this than to create new services, where in the end the more services I have, the more maintenance I need to do.
Does anyone have any similar concerns to this?
I think you have actually answered your own question :)
As soon as the message volume reaches a point where the lag becomes an issue, you could look at instancing your endpoint. You do not necessarily need to reduce the number of handlers. You could simply install the service a number of times and have specific message types sent to the relevant endpoint by mapping.
So it becomes a matter of a simple instance installation and some config changes. You can then split messages either by source, so that messages from a particular origin (maybe priority ones) end up on a particular endpoint, or by message type, as in the sketch below.
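For example, with the newer NServiceBus routing API this could look roughly like the following sketch (the message types, endpoint names and transport are invented for illustration; adjust to your NServiceBus version):

using NServiceBus;

// Invented message types, purely for illustration.
class SubmitSalesOrder : ICommand { }
class SubmitPurchaseOrder : ICommand { }

static class EndpointRouting
{
    public static EndpointConfiguration Configure()
    {
        var endpointConfiguration = new EndpointConfiguration("Website");
        var transport = endpointConfiguration.UseTransport<MsmqTransport>();
        var routing = transport.Routing();

        // Route specific message types to a dedicated (additional) endpoint instance,
        // while all other messages keep going to the existing endpoint.
        routing.RouteToEndpoint(typeof(SubmitSalesOrder), "Sales.Orders.Priority");
        routing.RouteToEndpoint(typeof(SubmitPurchaseOrder), "Sales.PurchaseOrders");

        return endpointConfiguration;
    }
}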
I happened to do the same thing on a previous project (not using NServiceBus though) where we needed document conversion messages coming from the UI to be processed ASAP. We simply installed the conversion service again with its own set of queues and changed the UI configuration to send the conversion messages to the new endpoint. The background conversion messages were still going to the previous endpoint. So here the source determined the separation.