HERE Maps Routing: HTTP 429 Too Many Requests error

I have a Python web app which uses the HERE Maps Routing API (V8) to fetch approx. 10-20 routes in one go. Periodically, I get an HTTP 429 response saying "Too Many Requests: Rate limit for this service has been reached". The requests to the API are made asynchronously, but I've set a rate limit of 10 requests per second as per the Freemium Plan Limits (https://developer.here.com/pricing#plan-details).
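Roughly, the throttling works like the sketch below (simplified; aiohttp, the batching scheme, and all names are illustrative, not the exact code):

    # Simplified sketch: send async requests in batches of ~10 per second.
    import asyncio
    import aiohttp

    async def fetch_routes(urls, per_second=10):
        bodies = []
        async with aiohttp.ClientSession() as session:
            for i in range(0, len(urls), per_second):
                batch = urls[i:i + per_second]
                started = asyncio.get_running_loop().time()
                responses = await asyncio.gather(*(session.get(u) for u in batch))
                for resp in responses:
                    bodies.append(await resp.json())
                # Sleep out whatever is left of this one-second window.
                elapsed = asyncio.get_running_loop().time() - started
                await asyncio.sleep(max(0.0, 1.0 - elapsed))
        return bodies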
Is anyone able to confirm what the rate limit actually is, or how this error can be avoided?
Thanks!

When using location features provided with HERE REST APIs, you should optimize your application for rate limiting. Depending on your plan and the services you use, the rate of API requests your client makes may be limited when service utilization is high.
How do I know if my requests will be limited?
While this depends on your plan and the services you use, it is best to ensure your application is optimized in any case to handle rate limiting when it occurs.
What happens when rate limiting occurs?
When a request is rate limited, the response status code will be HTTP 429 Too Many Requests.
Suggested optimization approach: exponential back-off
While there are different approaches to optimizing for rate limiting, we suggest an exponential back-off algorithm, as it is a common and straightforward method. The specific implementation will depend on your application. In summary (a sketch in code follows the list):
1. Check for HTTP 429 response status codes as you send requests to the service.
2. When you receive an HTTP 429 status, wait for a set time (one second is typical) before making the next request.
3. Resume sending requests. If rate limiting occurs again, back off by increasing the wait time by some factor.
4. Keep increasing the wait time exponentially until requests are no longer rate limited.
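A minimal Python sketch of that loop, assuming the requests library; the 64-second cap on the wait time is an illustrative choice, not a documented limit:

    # Exponential back-off on HTTP 429, as described above.
    import time
    import requests

    def get_with_backoff(url, params=None, max_wait=64):
        wait = 1  # start with a one-second wait, as suggested above
        while True:
            resp = requests.get(url, params=params)
            if resp.status_code != 429:
                return resp
            # Rate limited: wait, then double the wait for the next attempt.
            time.sleep(wait)
            wait = min(wait * 2, max_wait)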

Related

How to handle HTTP status code 429 (rate limiting) for a particular domain in Scrapy?

I am using Scrapy to build a broad crawler which will crawl a few thousand pages from 50-60 different domains.
I sometimes encounter a 429 status code and am thinking of ways of dealing with it. I am setting polite policies for concurrent requests per domain and AutoThrottle, but this question is about the worst case, where 429s still occur.
By default, Scrapy drops the request.
If we add 429 to RETRY_HTTP_CODES, Scrapy will use the default retry middleware, which will reschedule the request at the end of the queue. This still allows other requests to the same domain to hit the server - does that prolong the temporary block imposed by rate limiting? If not, why not use this approach alone instead of the more complex solutions described below?
Another approach is to block the spider when it encounters a 429. However, one of the comments mentions that this will lead to a timeout in other active requests. It would also block requests to all domains, which is inefficient, as requests to other domains should continue normally. Does it make sense to temporarily reschedule requests to the particular domain instead of continuously pinging its server with further requests? If yes, how can this be implemented in Scrapy?
Would that alone solve the issue? To be specific:
1. When rate limiting has already been triggered, does sending more requests (which will receive 429 responses) prolong the period for which rate limiting is applied, or does it have no effect?
2. How can Scrapy pause requests to a particular domain while continuing its other tasks, including requests to other domains?
EDIT:
The default retry middleware cannot be used as-is because it has a maximum retry count, RETRY_TIMES. Once that is exhausted for a particular request, the request is dropped - exactly what we don't want in the case of a 429.
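One hedged sketch of this idea: a custom downloader middleware that retries 429s outside the RETRY_TIMES budget with a growing delay. Note that time.sleep() blocks the Twisted reactor, so the pause stalls all domains, not just the offending one - the trade-off raised above. The class name and the retry429_delay meta key are illustrative:

    # Sketch: retry 429 responses indefinitely with a growing delay.
    import time

    from scrapy.downloadermiddlewares.retry import RetryMiddleware

    class TooManyRequestsRetryMiddleware(RetryMiddleware):  # illustrative name
        def process_response(self, request, response, spider):
            if response.status == 429:
                delay = request.meta.get("retry429_delay", 1)
                spider.crawler.engine.pause()
                time.sleep(delay)               # blocks the reactor: stalls ALL domains
                spider.crawler.engine.unpause()
                retry_req = request.copy()
                retry_req.dont_filter = True    # bypass the duplicate filter
                retry_req.meta["retry429_delay"] = min(delay * 2, 60)
                return retry_req
            return super().process_response(request, response, spider)

Enable it in DOWNLOADER_MIDDLEWARES in place of the default retry middleware, and keep 429 out of RETRY_HTTP_CODES so the parent class does not count it against RETRY_TIMES.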

NiFi: "There are too many outstanding HTTP requests with a total 100 outstanding requests"

Each time I try to log in to Apache NiFi, and sometimes while working in the interface, I get the error "There are too many outstanding HTTP requests with a total 100 outstanding requests". Memory and CPU are available on the cluster machines, and the NiFi flows run normally even when I can't log in. Is there a reason I keep getting this, and what can I try to prevent it?
You can change nifi.cluster.node.max.concurrent.requests to a greater number. As stated here, the number of requests is limited to 100 by default; raising the property increases the possible number of requests.
It might not fix whatever is causing your cluster to see so many requests, but it should at least let you log in to your cluster.
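For example, in nifi.properties (the value below is illustrative; choose one that fits your cluster):

    # nifi.properties - the default is 100
    nifi.cluster.node.max.concurrent.requests=200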

NiFi HandleHttpResponse 503 Errors

In NiFi, I have an HTTP endpoint that accepts POST requests with payloads varying from 7 kB to 387 kB or larger (up to 4 MB). The goal is a clustered implementation capable of handling approximately 10,000 requests per second. However, whether NiFi is clustered with 3 nodes or running as a single instance, I've never been able to average more than 15-20 requests/second without the Jetty service returning a 503 error. I've tried reducing the penalty duration and increasing the Maximum Outstanding Requests in the StandardHttpContextMap. No matter what I try, whether on my local machine or on a remote VM, I cannot get any impressive number of requests to go through.
Any idea why this is occurring and how to fix this? Even when clustering, I notice one node (not even the primary node) does the majority of the work and I think this explains why the throughput isn't much higher for a clustered implementation.
Regardless of the bulletin level, this is the error I get in nifi-app.log:
2016-08-09 09:54:41,568 INFO [qtp1644282758-117] o.a.n.p.standard.HandleHttpRequest HandleHttpRequest[id=6e30cb0d-221f-4b36-b35f-735484e05bf0] Sending back a SERVICE_UNAVAILABLE response to 127.0.0.1; request was POST 127.0.0.1
This is the same whether I'm running just two processors (HandleHttpRequest and HandleHttpResponse) or my general flow, where I'm routing on content, replacing some text, and writing to a database or JMS messaging system. I can get higher throughput (up to 40 requests/sec) when I'm running just the web service without the full flow, but with a failure (KO) rate of about 90% it's not much better - it still seems to be an issue with the Jetty service.
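For reference, throughput and the KO rate against such an endpoint can be measured with a small script along these lines (the URL, payload size, and worker count are illustrative assumptions):

    # Sketch: concurrent POSTs against an HTTP endpoint; report request
    # rate and the share of failed ("KO") responses.
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "http://localhost:8011/contentListener"  # hypothetical endpoint
    PAYLOAD = b"x" * 7 * 1024                      # 7 kB body
    N, WORKERS = 1000, 50

    def post(_):
        try:
            return requests.post(URL, data=PAYLOAD, timeout=10).ok
        except requests.RequestException:
            return False

    start = time.time()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        results = list(pool.map(post, range(N)))
    elapsed = time.time() - start
    print(f"{N / elapsed:.1f} req/s, KO rate {results.count(False) / N:.0%}")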

Call to slow service over HTTP from within message-driven bean (MDB)

I have a message-driven bean which serves messages in the following way:
1. It takes data from the incoming message.
2. It calls an external service via HTTP (literally, sends GET requests using HttpURLConnection), using the data from step 1. No matter how long the call takes, the message MUST NOT be dropped.
3. It uses the outcome from step 2 to persist data (using entity beans).
The rate of incoming messages is:
I. Low most of the time: on the order of units or tens per day.
II. Sometimes high: on the order of hundreds within a few minutes.
QUESTION:
Given that the service in step 2 is relatively slow (20 seconds per request, degrading as workload increases), what is the best way to deal with situation II?
WHAT I TRIED:
1. Letting the MDB wait until the service call completes, no matter how long it takes. This tends to roll back MDB transactions on timeout and to re-deliver messages, increasing the workload and making things even worse.
2. Setting a timeout on HttpURLConnection, which gives some guarantees about the completion time of the MDB's onMessage() method, but leaves an open question: how to proceed with the timed-out messages.
Any ideas are very much appreciated.
Thank you!
In that case you can just increase the transaction timeout for your message-driven beans.
This is what I ended up with (mostly, this is application server configuration):
1. A relatively short timeout (compared to the transaction timeout) for the HTTP call. The rationale: in my experience, long-running transactions tend to have adverse side effects, such as threads that appear "hung" from the application server's point of view, or extra attention required for database configuration. I chose 80 seconds as the timeout value.
2. A re-delivery interval for failed messages increased to several minutes.
3. Careful adjustment of the number of threads that handle messages simultaneously; I balanced this value against the throughput of the HTTP service.

How can I measure the server utilization?

How can I measure server utilization in terms of requests per unit of time (let's say one hour), assuming the server's maximum capacity is known (for example, 1000 requests per hour)?
I know the equation will be:
utilization = number of requests executed by the server / server capacity
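For example, if the server executed 850 requests in one hour against a capacity of 1000 requests per hour, utilization = 850 / 1000 = 0.85, i.e. 85%.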
But how can I measure the requests sent from a client to a server?
I need a valid way to define a "request", please.
This cannot be answered in general, as a "request" in a client/server model cannot be identified without knowing the protocol. To illustrate: multiple HTTP requests can be sent over one connection, and UDP-based protocols do not use connections at all.
The most general definition I can come up with for a request in an unidentified client/server protocol is: a message initiated by the client that requires a response from the server. The number of such messages is an observed variable, not a derived one.
In a program you would obtain this variable via a callback or an RPC to the server in question, or from a program able to provide it by inspecting log files.
For utilisation you can get all you need from the "sar" utility.
However, "request" will be very specific to the software you are running. For instance, if you are running the Apache web server, by default it will log each request, and you can scan these logs to extract your request data.
Be aware, though, that these are "technical" requests and may not match your users' idea of a request. Think of Amazon: I may think of my book order as one "request", while Amazon's servers will log it as 50 or so HTTP requests.
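A sketch of that log-scanning approach, assuming the Apache common log format; the log path and the capacity figure are illustrative:

    # Count requests per hour in an Apache access log and compute
    # utilization against an assumed capacity of 1000 requests/hour.
    import re
    from collections import Counter

    CAPACITY_PER_HOUR = 1000                         # assumed maximum capacity
    TS = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2})")  # matches [day/Mon/year:hour

    hits = Counter()
    with open("/var/log/apache2/access.log") as log:  # path is illustrative
        for line in log:
            m = TS.search(line)
            if m:
                hits[m.group(1)] += 1

    for hour, n in sorted(hits.items()):
        print(f"{hour}: {n} requests, utilization = {n / CAPACITY_PER_HOUR:.1%}")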