When does the Xively API return 406 "Not acceptable" or "403 Rate too fast"? - xively

The Xively API is rate-limited but I'm trying to understand what the limits are so that I can adjust my client accordingly. In fact there seems to be more than one limit: in some cases I see a 406 (Not acceptable) HTTP response, and other times I see a 403 (Rate too fast) HTTP response.
I think the 406 occurs when the number of API calls exceeds a certain rate - in my test the limit seems to be around 25 API calls per minute. The HTTP response includes a "Retry-After: 5" header.
If my test queries more than one device the limit still seems to be 25 API calls per minute - I don't think this limit is per device. The 406 error code is not mentioned in the Xively API documentation.
The 403 error code is described in the Xively documentation: https://xively.com/dev/docs/api/communicating/usage_limits/
The page talks about per-device limits and suggests the limit is different for reading and writing but doesn't really give any more detail than that.
Can anyone shed any more light on what the limits actually are? I am currently using a development-mode account - it's possible the 406 error only occurs in development mode. However the link mentioned above suggests you can get the 403 error in production mode too.
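Whatever the exact limits turn out to be, a client can treat both responses as a signal to slow down. The following is a minimal sketch, not something taken from the Xively documentation. It assumes that a Retry-After header, when present, holds a plain number of seconds (as in the "Retry-After: 5" seen above), and it falls back to exponential backoff otherwise. The function name retry_delay and the default values are hypothetical.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if no wait is needed.

    status  -- HTTP status code of the response
    headers -- dict of response headers
    attempt -- number of retries already made (0 for the first failure)
    """
    # 406 and 403 are the two rate-limit responses observed in the question.
    # Assumption: every 403 is treated as "Rate too fast"; a real client
    # would also inspect the body, since 403 can mean other things.
    if status not in (403, 406):
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a plain number of seconds; fall back to backoff.
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * (2 ** attempt))
```

The caller would sleep for the returned number of seconds and then repeat the request, increasing attempt each time.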

Related

Which HTTP status code should be used for business errors in API design?

Let's say I have an API endpoint that executes some business operation which can result in many different failures that are not connected directly to the request.
The request is correctly formed and I cannot return 4xx failures, but the logic of the application dictates that I return different error messages.
Now I want the client to be able to differentiate these error messages so that different actions can be taken depending on the code. I can return custom JSON, for example:
{
  "code": 15,
  "message": "Some business error has occurred"
}
Now the question is which HTTP status code I should use on such occasions if no standard code like Conflict or NotFound makes sense.
It seems that 500 InternalServerError is logical, but then how can I additionally flag that the request cannot be retried? Should I just document which custom codes must not be retried, so that the client may retry whenever it does not get one of those?
Consult RFC 7231:
503 Service Unavailable looks like a potential candidate, but the RFC mentions that this is supposed to represent a problem "which will likely be alleviated after some delay." This would indicate to a client that it could try the same call later, maybe after business hours or on the weekend. This is not what you want.
501 Not Implemented could be possible, but the RFC mentions "This is the appropriate response when the server does not recognize the request method and is not capable of supporting it for any resource. A 501 response is cacheable by default;" This does not appear to be the case here - the HTTP method itself was presumably valid - the failure here seems to be happening at the business rules layer (e.g. sending in an account number that is not in the database), rather than an HTTP method (GET, POST, etc.) that you never got around to implementing.
That leaves the last serious candidate,
500 Internal Server Error
The 500 (Internal Server Error) status code indicates that the server
encountered an unexpected condition that prevented it from fulfilling
the request.
This is the error code that is normally used for generic "an exception occurred in the app" situations. 500 is the best choice.
As to how to distinguish this from a "temporal internal trouble" error, you can include this as part of the HTTP body - just make sure that your client can parse out the custom codes!
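As a rough sketch of that idea, the body can carry the application code together with an explicit retry hint. The "retryable" field, the set of non-retryable codes, and both function names are assumptions made for illustration; only the "code" and "message" fields come from the question.

```python
import json

# Hypothetical application codes that must never be retried.
NON_RETRYABLE_CODES = {15}

def build_error(code, message):
    """Server side: return (http_status, json_body) for a business failure."""
    body = {
        "code": code,
        "message": message,
        # Assumed extra field telling the client whether a retry makes sense.
        "retryable": code not in NON_RETRYABLE_CODES,
    }
    return 500, json.dumps(body)

def should_retry(http_status, raw_body):
    """Client side: retry a 500 unless the body says otherwise."""
    if http_status != 500:
        return False
    try:
        body = json.loads(raw_body)
    except ValueError:
        # A 500 without a parseable body is treated as a transient fault.
        return True
    if not isinstance(body, dict):
        return True
    return bool(body.get("retryable", True))
```

With this shape the client never needs a documented list of codes; it only reads the flag, and falls back to retrying when the server sent a plain 500.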

What should the HTTP Status Code of a Degraded Health Check Be?

I have a health check endpoint at /status that returns the following status codes and response bodies:
Healthy - 200 OK
Degraded - ?
Unhealthy - 503 Service Unavailable
What should the HTTP status code be for a degraded response? A 'degraded' check is used for checks that did succeed but are slow or unstable. What HTTP status code makes the most sense?
The most suitable HTTP status code for a "Degraded" status response from a health endpoint is nothing other than 200 OK.
I say this because I can't find any better code in the official Hypertext Transfer Protocol (HTTP) Status Code Registry maintained by IANA, pointed to by [RFC7231] HTTP/1.1: Semantics and Content. Unofficial codes should be avoided, because they only make your API more difficult to understand.
You should design your APIs so that they become easy to use. Resource names, HTTP verbs, status codes, etc. should be more or less self-explanatory, so that people who already know "the REST language" can immediately understand how to use your API without having to decipher vague names or unusual status codes. Which brings me to the next part of my answer...
Other comments on your design
The most natural way to interpret a 5xx response to any request is that the operation in question failed.
So a 503 Service Unavailable response to a GET /status request means that the status checking operation itself failed. Such a response would only be useful if we can be certain that /status is a health endpoint, as pointed out in the API Health Check draft referred to in Nkosi's answer:
A health endpoint is only meaningful in the context of the component
it indicates the health of. It has no other meaning or purpose. As
such, its health is a conduit to the health of the component.
Clients SHOULD assume that the HTTP response code returned by the
health endpoint is applicable to the entire component (e.g. a larger
API or a microservice).
But with a URL path of just /status, it is not completely obvious that this really is a health endpoint. From looking at the URL, we only know that it returns information about the status of something, but we can't really be sure what that "something" is.
Since you're also telling us that yes, it is in fact a health endpoint, I must suggest that you change the name to health. I would also suggest placing it under some base path, e.g. /things/health, to make it more clear which component it indicates the health of.
If, on the other hand, /status was actually a resource of its own, i.e. something that represents the status of some other component/thing (like its name currently suggests), then 200 OK is the only reasonable status for successful invocations, even if the thing that it indicates the status of is "Unhealthy". In that case, a 5xx would mean that no status could be obtained, and details in the response payload would be assumed to be related to a failure in the /status service itself.
So be careful with how you name things and what status codes you use!
Consider returning a custom code within the 2xx Success range that is not already taken by the known/common status codes, similar to some of the unofficial codes not supported by any standard.
For example, 218 This is fine (Apache Web Server):
Used as a catch-all error condition for allowing response bodies to flow through Apache when ProxyErrorOverride is enabled. When ProxyErrorOverride is enabled in Apache, response bodies that contain a status code of 4xx or 5xx are automatically discarded by Apache in favor of a generic response or a custom response specified by the ErrorDocument directive.
After doing some research I came across a draft, Health Check Response Format for HTTP APIs (draft-inadarei-api-health-check-03), which makes a similar suggestion:
In case of the “warn” status, endpoints MUST return HTTP status in the 2xx-3xx range, and additional information SHOULD be provided, utilizing optional fields of the response.
Here the "warn" status in the draft means "healthy, with some concerns", which I believe aligns closely with your desired model.
While not definitive, I believe it provides some ideas to help with the eventual design.
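To make the draft's rule concrete, here is a small sketch that maps the three draft statuses to HTTP codes. The draft only constrains the ranges ("warn" in 2xx-3xx); picking 200 and 503 specifically is this sketch's choice, and the function name health_response is hypothetical. The optional "notes" field follows my reading of the draft and should be checked against it.

```python
import json

# "warn" stays in the 2xx range, as the draft requires. The concrete codes
# (200 and 503) are an assumption of this sketch, not mandated by the draft.
HTTP_CODE = {"pass": 200, "warn": 200, "fail": 503}

def health_response(status, notes=None):
    """Return (http_code, headers, body) for a health endpoint."""
    if status not in HTTP_CODE:
        raise ValueError("unknown health status: %r" % status)
    payload = {"status": status}
    if notes:
        # Additional information about the degraded state.
        payload["notes"] = list(notes)
    headers = {"Content-Type": "application/health+json"}
    return HTTP_CODE[status], headers, json.dumps(payload)
```

A load balancer that only looks at the status code keeps routing to a "warn" node, while a monitor that parses the body can still raise an alert.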
I would be wary of splitting hairs like this on a healthcheck on the upstream server side. The service providing the healthcheck should be lightly (and concurrently) testing all its upstream dependencies based on its own set of policies or rules - request timeouts, connection failures and so on. In reality the healthcheck either works or it doesn't and the application shouldn't really need to be keeping track of the results of the healthcheck (other than capturing metrics about what happened). IMHO a stateful healthcheck is a recipe for disaster.
I typically use the following interface for application healthchecks:
204 - No Content, everything is working within tolerances
500 - Something failed, and here's some details in the response about what went wrong
Where it gets tricky depends on your architecture. You may have a VIP or reverse proxy that interprets this response and decides whether a given node is healthy, in which case it will either route the request to a healthy node or return 503 Service Unavailable. This decision is going to be made on some policy basis: x healthcheck requests failed over a y time period across z upstream services.
If you use a mesh then everyone can feed data back to the service registry to keep the health state up to date and it can be based on actual service calls rather than a healthcheck.
The client is perfectly placed to make a decision based on the health of the services it depends on, as it can keep track of the various responses from each service. Circuit breakers are an excellent way to handle that and can do it continuously on actual requests rather than just on the healthcheck. Circuit breaker libraries (such as resilience4j) will do this for you at the cost of setting up some policies about how many failed/slow requests constitute a bad service. Service registries like Netflix Eureka can help with discovery and ongoing monitoring.
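For readers unfamiliar with the pattern, the following is a minimal count-based circuit breaker. It is a sketch of the general idea only and does not reflect the API of resilience4j or any other library; the class name, the thresholds, and the injectable clock are all assumptions for illustration.

```python
import time

class CircuitBreaker:
    """Minimal count-based circuit breaker (illustrative sketch only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent to the service."""
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        """Record the outcome of a real request."""
        if ok:
            # Any success closes the circuit and clears the failure count.
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            # Open (or re-open) the circuit and restart the cool-down.
            self.opened_at = self.clock()
```

The client calls allow() before each request and record() after it, so the health decision is driven by real traffic rather than by a separate healthcheck.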
Assuming you are referring to the status code of a liveness/healthcheck endpoint of a service: to distinguish the degraded case from 200 OK, a 203 seems applicable and in line with:
https://datatracker.ietf.org/doc/draft-inadarei-api-health-check/
https://www.rfc-editor.org/rfc/rfc7234#section-5.5 (despite being deprecated, a Warning: 199 header MAY carry details)
Align max-age with livenessProbe.periodSeconds:
HTTP/1.1 203 Non-Authoritative Information
Warning: 199 - "FooBar Warning Details"
Content-Type: application/health+json
Cache-Control: max-age=10
Connection: close
{"status": "warn"}

Http status code to send if server state invalid?

I am writing a REST service and some of the push requests will only work during a certain window of time. For example during active work hours. Outside of those times the server will send an error.
I have looked at the available HTTP status codes and I am not sure which one best to apply for an 'invalid server state' or equivalent situation. I am considering a 400 (Bad Request) or a 422 (Unprocessable Entity)?
For the 422, the definition I have is "The request was well-formed but was unable to be followed due to semantic errors." and wondering whether this is really the most applicable case?
400 Bad Request looks like the right response to me. It's definitely a client error, but there's nothing wrong with the request itself; just the timing of the request. If the response body contains some additional information to make that clear (along the lines of "Our offices are closed. Please make your request between the hours of 9 AM and 5 PM GMT, Monday to Friday.") then you've successfully used a simple and common response type in the appropriate manner. Which makes for a good API.
As an additional note; the reason I'd say that a 422 would be less correct is that the meaning of the request is clear. It's just a timing issue, there's no semantic error.
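A sketch of the check this answer describes might look as follows. The 9 AM to 5 PM, Monday to Friday window comes from the example message above; treating GMT as UTC and the function name check_push_window are assumptions.

```python
from datetime import datetime, timezone

def check_push_window(now):
    """Return (http_status, message) for a push request arriving at `now`.

    `now` must be a timezone-aware datetime.
    """
    # Assumption: GMT is treated as UTC for the purpose of this check.
    utc = now.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5   # Monday=0 ... Friday=4
    in_hours = 9 <= utc.hour < 17    # 09:00 inclusive to 17:00 exclusive
    if is_weekday and in_hours:
        return 200, "OK"
    return 400, ("Our offices are closed. Please make your request between "
                 "the hours of 9 AM and 5 PM GMT, Monday to Friday.")
```

The handler would run this check before doing any work and return the message as the response body when the status is 400.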

Which HTTP status code should I use for a health-check failure?

I'm implementing a /_status/ endpoint which does some sanity checks on data in our database.
For example, we are collecting measurements and the status should go "bad" if the latest measurement is over an hour old.
I would like to point Pingdom at this URL to leverage their alerting infrastructure and tell us when something's wrong.
On a "good" status I will serve an HTML page with an HTTP 200 OK status. But what would an appropriate HTTP status code be for "bad"? Or would it be more correct not to convey this information via status code, but via HTML content instead?
Thanks!
Well... this is an old question, but I ended up here, so I thought I'd give my two cents here:
It seems pretty clear that a 2xx should be returned if all is OK
If health is not OK, I think it should return a 5xx result (4xx talks about the client being at fault in the request; 2xx and 3xx are all successful to some degree).
I think that a 5xx is correct because this is a special request that is answering about the state of the whole service. Also, most load balancers offer liveness checks based on response codes, and not all of them offer a way to parse a more complex payload (other than perhaps a RegExp match, which can make the check brittle).
I agree with @Julien that a 500 (specifically) doesn't seem appropriate, and we've decided on 503 Service Unavailable.
503 seems to fit for a couple of reasons:
It's a 5xx family result code which indicates that something is going on on the server side.
It has a temporary nature to it indicating that it may recover.
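Applied to the measurement example from the question, the decision could be sketched like this. The one-hour threshold comes from the question; the function name and the choice to treat "no measurement at all" as unhealthy are assumptions.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=1)  # "bad" if the latest measurement is older

def status_code(latest_measurement, now):
    """Return 200 if the latest measurement is fresh enough, else 503."""
    if latest_measurement is None:
        # Assumption: having no measurements at all counts as unhealthy.
        return 503
    return 200 if now - latest_measurement <= MAX_AGE else 503
```

Because the result is a plain status code, a monitor such as Pingdom or a load balancer can act on it without parsing the page content.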
We just had a similar discussion in our group. We decided for our purposes that the HTTP response codes should be reporting on your server's success or failure to honor the request. For a GET, this would mean whether or not you can respond with the requested resource. In this case, the requested resource is a health report, so as long as you're returning that successfully, it should be a 200 response.
We're returning JSON for our health check, with a top-level "isHealthy" field set to true or false. Our load balancer and other monitors will parse the JSON and use this field to determine if the system is healthy or not.
If you don't want to parse JSON in your monitors, you could try putting a custom response header to indicate binary health of the system, e.g., System-Health: true or System-Health: false. You might have better luck getting monitors which can check that.
If you really want to use a response code, I would recommend an additional endpoint called something like "health" which returns a "204 No Content" when healthy, and a "404 Not Found" when not healthy. In this case, the resource defined by the URL is, symbolically, the health of your system, and so if it's healthy, you can return a successful response. If it's unhealthy, then its health can't be found, hence the 404.
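The first approach in this answer (always 200, with health carried in the payload and optionally in a header) could be sketched as follows. The "isHealthy" field and the System-Health header come from the answer; the function names and the rule that an unparseable report counts as unhealthy are assumptions.

```python
import json

def health_report(is_healthy):
    """Server side: always 200, health carried in the body and a header."""
    headers = {
        "Content-Type": "application/json",
        "System-Health": "true" if is_healthy else "false",
    }
    return 200, headers, json.dumps({"isHealthy": is_healthy})

def monitor_says_healthy(status, headers, body):
    """Monitor side: healthy only if the report arrived and says so."""
    if status != 200:
        return False  # the report itself could not be served
    try:
        parsed = json.loads(body)
    except ValueError:
        return False  # assumption: an unreadable report counts as unhealthy
    return isinstance(parsed, dict) and parsed.get("isHealthy") is True
```

A monitor that cannot parse JSON could compare the System-Health header value against "true" instead.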
If your data is 'bad' because there is a service failure (even if that is a backend job failing) then a HTTP 500 seems like a valid response. It indicates that something, somewhere is broken.
It isn't very specific, you're shrugging your shoulders and saying:
The 500 (Internal Server Error) status code indicates that the server
encountered an unexpected condition that prevented it from fulfilling
the request.
ietf rfc7231
If you ask for health and the server state is not healthy, I'm partial to 409 Conflict, which "Indicates that the request could not be processed because of conflict in the current state of the resource".
Some people might object that if you can respond then the request can be processed, but I disagree. Every error message is a response. The server defines resource semantics. If you ask for the good news resource and the server responds "here is bad news", it didn't give you what it defines to have offered at that resource.
In practice, it's much easier to say 2xx = "up" and 4xx = "down", pipe request counts into an availability metric, and have a load balancer remove the server from its pool based on the response code. Coming up with ways to argue that "hey, we told you something, so 200 OK" just seems like missing the forest for the trees to me.

HTTP Status code for generic failure

I am looking for a correct status code to send for a general failure through an API.
The exact scenario is failing to add a product to a shopping cart.
The failure could happen for a large number of reasons, but I would like to return a single HTTP code.
Which would be best?
I have been looking through them and can't see anything that exactly fits the needs here.
Some of the possible failure conditions could be:
Not enough stock to satisfy
Stock limit reached for that particular product
Product no longer available
If it's server error then it should be 500. If it's client error, use 400.
It's hard to be more precise than that without seeing the URI and what you do with it. For example, if "Product no longer available" is the result of a GET request, then it should be 404 (Not Found). But if it was a POST request, then it should be 200 or 202.
The other two might not be errors. It could be that the client sent a correct request but the stock has since been consumed by someone else; in this case the server should return 409 (Conflict). If the request was for too much stock from the start, then it should just be 200/202.
If you had to have only one code, just use 400 and 200 (see above).
