Which HTTP redirect status code is best for this REST API scenario?

I'm working on a REST API. The key objects ("nouns") are "items", and each item has a unique ID. E.g. to get info on the item with ID foo:
GET http://api.example.com/v1/item/foo
New items can be created, but the client doesn't get to pick the ID. Instead, the client sends some info that represents that item. So to create a new item:
POST http://api.example.com/v1/item/
hello=world&hokey=pokey
With that command, the server checks if we already have an item for the info hello=world&hokey=pokey. So there are two cases here.
Case 1: the item doesn't exist; it's created. This case is easy.
201 Created
Location: http://api.example.com/v1/item/bar
Case 2: the item already exists. Here's where I'm struggling... not sure what's the best redirect code to use.
301 Moved Permanently? 302 Found? 303 See Other? 307 Temporary Redirect?
Location: http://api.example.com/v1/item/foo
I've studied the Wikipedia descriptions and RFC 2616, and none of these seem to be perfect. Here are the specific characteristics I'm looking for in this case:
The redirect is permanent, as the ID will never change. So for efficiency, the client can and should make all future requests to the ID endpoint directly. This suggests 301, as the other three are meant to be temporary.
The redirect should use GET, even though this request is POST. This suggests 303, as all others are technically supposed to re-use the POST method. In practice, browsers will use GET for 301 and 302, but this is a REST API, not a website meant to be used by regular users in browsers.
It should be broadly usable and easy to play with. Specifically, 303 is HTTP/1.1 whereas 301 and 302 are HTTP/1.0. I'm not sure how much of an issue this is.
At this point, I'm leaning towards 303 just to be semantically correct (use GET, don't re-POST) and just suck it up on the "temporary" part. But I'm not sure if 302 would be better since in practice it's been the same behavior as 303, but without requiring HTTP/1.1. But if I go down that line, I wonder if 301 is even better for the same reason plus the "permanent" part.
Thoughts appreciated!
Edit: Let me try to better explain the semantics of this "get or create" operation with a more concrete example: URL shortening. This is actually much closer to my app anyway.
For URL shorteners, the most common operation by far is retrieving by ID. E.g. for http://bit.ly/4Agih5, bit.ly receives an ID of 4Agih5 and must redirect the user to its corresponding URL.
bit.ly already has an API, but it's not truly RESTful. For the sake of example, let me make up a more RESTful API. For example, querying the ID might return all sorts of info about it (e.g. analytics):
GET http://api.bit.ly/item/4Agih5
Now if I want to submit a new URL to bit.ly to shorten, I don't know the ID of my URL in advance, so I can't use PUT. I'd use POST instead.
POST http://api.bit.ly/item/
url=http://stackoverflow.com/ (but encoded)
If bit.ly hasn't seen this URL before, it'll create a new ID for it and redirect me via 201 Created to the new ID. But if it has seen that URL, it'll still redirect me without making a change. This way, I can hit that redirect location either way to get the info/metadata on the shortened URL.
Like this example of URL shortening, in my app, collisions don't matter. One URL maps to one ID, and that's it. So it doesn't really matter if the URL has been shortened before or not; either way, it makes sense to point the client to the ID for it, whether that ID needs to be created first or not.
So I probably won't be changing this approach; I'm just asking about the best redirect method for it. Thanks!
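To make the two cases concrete, here is a minimal sketch of the "get or create" behavior, assuming 303 is chosen for case 2. The ItemStore class, its in-memory dict, and the generated IDs are hypothetical and only for illustration; the URL shape comes from the examples above.

```python
from typing import Dict, Tuple

BASE_URL = "http://api.example.com/v1/item/"  # URL shape taken from the question


class ItemStore:
    """Hypothetical in-memory store mapping a request payload to a server-chosen ID."""

    def __init__(self) -> None:
        self._id_by_payload: Dict[str, str] = {}

    def get_or_create(self, payload: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers) for POST /v1/item/ with the given form body."""
        existing_id = self._id_by_payload.get(payload)
        if existing_id is not None:
            # Case 2: the item already exists; point the client at it (assumes 303).
            return 303, {"Location": BASE_URL + existing_id}
        # Case 1: create the item under a new server-assigned ID (made-up scheme).
        new_id = f"item{len(self._id_by_payload) + 1}"
        self._id_by_payload[payload] = new_id
        return 201, {"Location": BASE_URL + new_id}
```

Either way the client ends up with a Location header it can GET, which is the property the question cares about.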

I'd argue for 303. Supposing right now hello=world&hokey=pokey uniquely identifies item foo, but later item foo's hokey value changes to "smokey"? Now those original values are no longer a unique identifier for that resource. I'd argue that a temporary redirect is appropriate.

I think one of the reasons that you are struggling with this scenario is because (unless we are missing some key information) the interaction is not very logical.
Let me explain why I think this. The initial premise is that the user is requesting to create something and has provided some key information for the resource they wish to create.
You then state that if that key information refers to an existing object then you wish to return that object. The problem is that the user did not wish to retrieve an existing object; they wished to create a new one. If they cannot create the resource because either it already exists or there is a key collision, then the user should be informed of that fact.
Choosing to retrieve an existing object when the user has attempted to create a new one seems to be a misleading approach.
Maybe one alternative would be to return a 400 Bad Request if the resource already exists and include a link to the existing object in the entity body. The client application could choose to swallow the bad request error and simply follow the link to the existing entity, and by doing so hide the issue from the user. That would be the choice of the client application, but at least the server is behaving in a clear manner.
Based on the new example, let me suggest a completely different approach. It may not work in your case, as always the devil is in the details, but maybe it will be helpful.
From the client's perspective it really has no interest in whether the server is creating a new shortened URL or pulling back an existing one. In fact, whether the server needs to generate a new ID or not is an implementation detail that is completely hidden.
Hiding the creation process could be very valuable. Maybe the server can predict in advance that lots of short URLs will soon be requested for an event such as a conference. It could pre-generate these URLs in quiet periods to balance the load on its servers.
So, based on that assumption, why not just use
GET /ShortUrl?longUrl=http://www.example.org/en/article/something-that-is-crazy-long.html&suggestion=crazyUrl
If the url already existed then you might get back
303 See Other
Location: http://example.org/ShortUrl/3e4tyz
If it previously didn't, you might get
303 See Other
Location: http://example.org/ShortUrl/crazyurl
I realize that this looks like we are breaking the rules of GET by creating something in response to a GET, but I believe in this case there is nothing wrong with it, because the client did not ask for the shortened URL to be created and really does not care either way. It is idempotent because it does not matter how many times you call it.
One interesting question that I don't know the answer to is whether proxies will cache the initial GET and redirect. That might be an interesting property as future requests by other users for the same url may never need to actually get to the origin server, the proxy could handle the request completely.

POST does not support a 'lookup or create' approach. The server cannot tell the client "I would create that, but it already existed. Look here for the existing entry". None of the 2xx codes work because the request is not successful. None of the 3xx codes work, because the intention is not to redirect the POST to a new resource. And 303 is also not appropriate since nothing changed (see 303 spec).
What you could do is provide a form or template to the client to be used with PUT that tells the client how to construct the PUT URI. If the PUT results in a 200 the client knows the resource existed and if 201 is returned that a new resource has been created.
For example:
Template for URI: http://service/items/{key}
PUT http://service/items/456
[data]
201 Created
or
PUT http://service/items/456
[data]
200 OK
You can also do a 'create but do not replace if exists' using If-None-Match:
PUT http://service/items/456
If-None-Match: *
[data]
412 Precondition Failed
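The three PUT outcomes above can be sketched as a small function. This is only an illustration: handle_put and the dict-backed store are hypothetical names, and a real server would also handle ETags other than "*".

```python
from typing import Dict, Optional


def handle_put(store: Dict[str, bytes], key: str, body: bytes,
               if_none_match: Optional[str] = None) -> int:
    """Return the status code for PUT /items/{key} against an in-memory store."""
    exists = key in store
    if if_none_match == "*" and exists:
        # "Create but do not replace": the precondition fails, nothing is stored.
        return 412
    store[key] = body
    # 201 when the PUT created the resource, 200 when it replaced an existing one.
    return 200 if exists else 201
```

The client learns from the status alone whether the resource was new (201), already there and overwritten (200), or already there and left untouched (412).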
Jan

From the client's point of view, I would think that you could just send a 201 for case 2 the same as for case 1 as to the client the record is now "created".

The HTTP/1.1 spec (RFC 2616) suggests 303:
303 See Other
The response to the request can be found under a different URI and
SHOULD be retrieved using a GET method on that resource. This method
exists primarily to allow the output of a POST-activated script to
redirect the user agent to a selected resource. The new URI is not a
substitute reference for the originally requested resource.

Related

HTTP Status Code for detected query string manipulations

What is the best HTTP Status Code to use if the server detects that the query string of an URI has been tampered with by the Client?
It's important to know what the nature is of the tampering. If you simply want to forbid people from accessing certain urls, 403 is often the most appropriate.
But there may be something more specific.
Let's say we have a collection of some sort and the items can be accessed via some identifier (.../collection/1). Now let's assume that a user can access the items with identifier between 1 and 100 using a GUI with buttons for instance that sets up the rest call, but not for identifiers > 100. Now if a user manipulates the HTML and the request in the browser and tries to access an item with id = 200, what status code is most appropriate to return?
Let's say /collection/101. If the item exists, but the user simply is not allowed to access that item, a 403 is appropriate.
If 101 is never accessible by anyone, because for example 101 is actually accessed via /collection2/101, a 404 status code is the most appropriate.
If /collection/101 is not accessible because the server has a 'state' that needs to change or be resolved first, and that state can be resolved by the user and is not an access-control issue, a 409 may be appropriate. But I'd say that usually this is not appropriate for requests such as GET.
The way you describe it, it sounds more like a permission issue. So then 403 is an easy choice.
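As a rough sketch of the 403 versus 404 distinction described in this answer (status_for_item and its two boolean inputs are hypothetical; how you determine existence and permission is up to your application):

```python
def status_for_item(item_exists: bool, user_may_access: bool) -> int:
    """Choose between 404, 403 and 200 for GET /collection/{id}."""
    if not item_exists:
        return 404  # nothing lives at this URL for anyone
    if not user_may_access:
        return 403  # the item exists, but this user is not allowed to see it
    return 200
```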

HTTP Status Code for Resource not yet available

I have a REST endpoint accepting a POST request to mark a code as redeemed. The code can only be redeemed between certain dates.
How should I respond if someone attempts to redeem the code early?
I suspect HTTP 403, Forbidden, is the right choice but then the w3c states that "the request SHOULD NOT be repeated" whereas in this case I would anticipate the request being repeated, just at a later date.
409 Conflict
The request could not be completed due to a conflict with the current
state of the resource. This code is only allowed in situations where
it is expected that the user might be able to resolve the conflict and
resubmit the request. The response body SHOULD include enough
information for the user to recognize the source of the conflict.
Ideally, the response entity would include enough information for the
user or user agent to fix the problem; however, that might not be
possible and is not required.
403 Forbidden makes more sense if they are trying to redeem a coupon that has already been redeemed, though 410 Gone seems elegant in this situation as well.
404 Not Found isn't ideal because the resource does in fact exist, however you can use it if you don't want to specify a reason with the 403 or if you want to hide the existence of the resource for security reasons.
If you are using HATEOAS, then you can also head your clients off at the pass (so to speak) by only including a redeem hypermedia control in the coupon resource (retrieved via a GET) when the coupon can be redeemed; though this won't stop overly bound clients from trying to redeem it anyway.
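Here is a minimal sketch of how the status choice in this answer might look in code. The function name and parameters are hypothetical, and treating an expired code the same as an already redeemed one is an assumption, since the question only asks about redeeming early.

```python
from datetime import date


def redeem_status(today: date, starts: date, ends: date, already_redeemed: bool) -> int:
    """Pick a status code for the POST that redeems a code."""
    if already_redeemed:
        return 403  # already used; 410 Gone is the alternative mentioned above
    if today < starts:
        return 409  # not yet redeemable; the client may retry at a later date
    if today > ends:
        return 403  # assumption: an expired code is treated like a used one
    return 200
```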
EDIT: Thanks to some good critiques (see below), I want to caveat this answer. It is based on Richardson & Ruby's writeup, which arguably doesn't mesh well with the httpbis writing on 403 Forbidden. (Personally, I'm now leaning towards 409 as explained by Tom in a separate answer.)
403 Forbidden is the best choice. I will cite RESTful Web Services by Richardson & Ruby line by line. As you will see, 403 is a great fit:
The client's request is formed correctly, but the server doesn't want to carry it out.
Check!
This is not merely the case of insufficient credentials: that would be a 401 ("Unauthorized"). This is more like a resource that is only accessible at certain times, or from certain IP addresses.
Check!
A response of 403 implies that the client requested a resource that really exists. As with 401 ("Unauthorized"), if the server doesn't want to give out even this information, it can lie and send a 404 ("Not Found") instead.
You wrote above: "The Code representation is available to be GETted before it goes live." So, you aren't trying to hide anything. So, stick with the 403. Check!
If the client's request is well-formed, why is this status code in the 4xx series (client-side error) instead of the 5xx series (server-side error)? Because the server made its decision based on some aspect of the request other than its form; say, the time of day the request was made.
Check! The client's request was formed correctly, but it was inappropriate for the particular time.
We went four for four. The 403 code is a winner. No other codes match as well.
All of this said, a plain, non-specific 400 wouldn't be wrong, but would not be as specific or useful.
Another answer suggested the 409 Conflict code. Although worth considering, it isn't as good a fit. Here is why. According to Richardson & Ruby again:
Getting this [409] response means that you tried to put the server's resources into an impossible or inconsistent state. Amazon S3 gives this response code when you try to delete a bucket that is not empty.
Claiming a promotion before it is 'active' wouldn't "put a server resource into an inconsistent state." It would break some business rules -- and result in cheating -- but it wouldn't cause a logical contradiction that I see.
So, whether you realized it at the onset of asking your question or not, 403 is a great choice. :)
Since Rest URLs should represent resources I would reply with 404 - Not Found
The resource is only available between certain dates, so on any other date it is not found.
When it says the request "SHOULD NOT be repeated", it is referring to the message that you should send to the viewer.
It has nothing to do with whether an actual request is repeated. (The user will get the same 403 message over and over again if s/he so desires.)
That said, a 404 is not appropriate for this because the resource is available - just that the code is not redeemable/forbidden to redeem. It is actually harmful because it tells the user that you probably made a mistake in your URL link or server configuration.
Of course, this assumes that on the appropriate date you return a 200 instead.

Put versus Post - REST

While looking at the code in "petclinic", part of Spring 3.0 samples I noticed the following lines
<c:choose>
<c:when test="${owner.new}"><c:set var="method" value="post"/></c:when>
<c:otherwise><c:set var="method" value="put"/></c:otherwise>
</c:choose>
In this discussion at SO it seems that PUT should be used for "create/update" and POST for "updates".
Which is right?
What is the impact of using post for "create" and put for "update"?
Note : According to the HTTP/1.1 spec. quoted in the referenced SO discussion, the code given above seems to have the correct behavior.
Both POST and PUT have well-defined behavior as per the HTTP spec.
The result of a POST request should be a new resource that is subordinate to the request URL; the response should contain a Location header with the URL of the newly created resource.
The result of a PUT should be an update of the resource at the request URL. If there is no existing resource at the request URL, a new one can be created.
The confusion arises from the fact that POST is also used with forms as a mechanism to pass the form data. Most common implementation of forms is to post back to the same URL at which the form page is located, thus giving the false idea that the POST operation is used for an update. However, in this particular usage, the form page is not the resource.
With all this in mind, here's the correct (in my opinion of course :-)) usage:
POST should be used to create new resources when:
- the new resource is subordinate to an existing resource
- the resource identity/URL is not known at creation time
PUT should be used to update existing resources with well-known URL. It can be used to create a resource at well-known URL as well; however, it does help to think about this scenario in a different way - if the resource URL is known before the PUT request is made, this could be treated the same as the resource at this location already existing but being empty.
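A small sketch of that usage, assuming a hypothetical in-memory Collection served at /items/ (the class, the URL prefix, and the counter-based IDs are made up for illustration):

```python
import itertools
from typing import Dict, Tuple


class Collection:
    """Hypothetical in-memory collection served at /items/."""

    def __init__(self) -> None:
        self.resources: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def post(self, body: str) -> Tuple[int, str]:
        """Create a subordinate resource; the server picks the URL."""
        url = f"/items/{next(self._ids)}"
        self.resources[url] = body
        return 201, url  # the URL would be sent back in the Location header

    def put(self, url: str, body: str) -> int:
        """Store the body at a URL the client already knows."""
        created = url not in self.resources
        self.resources[url] = body
        return 201 if created else 200
```

Repeating the same POST creates a second resource, while repeating the same PUT leaves the collection unchanged, which is the practical difference between the two.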
It's quite simple:
POST allows anything to happen, and it isn't restricted to creating "subordinate" resources, but allows the client to "provide a block of data ... to a data-handling process" (RFC 2616 sec 9.5). POST means "Here's that data you asked for just now"
PUT is used as an opposite of GET. The usual flow is that you GET a resource, modify it somehow, and then you PUT it back at the same URI that you got it from. PUT means "Please store this file at this URI".
The uniformity of PUT (which is to store a file) allows intermediaries (e.g. caches) to invalidate any cached responses they might have at that exact URI (since they know that it's about to change). The uniformity of PUT also allows clients (that understand this) to modify a resource by first retrieving it (GET) and then send a modified copy back (PUT). It also allows clients to retry on a network failure, due to PUT's idempotency.
Side note: Using PUT to create resources is dubious. While it's possible within the spec, I don't see it as a good idea, just as using POST to perform searches isn't a good idea, just as tunneling SOAP over HTTP isn't a good idea. AtomPub explicitly states that PUT isn't used to create atom entries.
POST's ubiquity comes from the fact that HTML defines <form> elements that result in POSTing an application/x-www-form-urlencoded entity, with which the recipient can do anything it pleases, including:
- creating subordinate resources (the response is usually a 201 with a Location header)
- creating a completely different resource (again usually a 201 response and Location header)
- creating many subordinate and/or unrelated resources (perhaps with a simple response indicating the URIs of the created resources)
- doing nothing except return a response (e.g. 200 or 302) (a case where perhaps GET should have been used)
- modifying the resource that received the POST itself (returning or redirecting back to the updated resource)
- deleting one or more resources
- any combination of the above
The only one who knows what will happen in a POST request is the user who initiated the request (by clicking the huge "yes I confirm deleting my Facebook profile" button) and the server that's handling the request. To the rest of the world, the request is opaque and doesn't mean anything other than "this URI is being passed some data".
So the answer to your question is that both POST and PUT can be used for both create and update.
POST is often used to create resources (like AtomPub 9.2)
PUT semantics fit well for modifying resources (like AtomPub 9.3)
POST may be used to modify resources (like a www form edit your profile)
PUT can technically be used to create resources (although I advise against it)

Should I stop redirecting after successful POST or PUT requests?

It seems common in the Rails community, at least, to respond to successful POST, PUT or DELETE requests by redirecting instead of returning success. For instance, if I PUT a legal change to my user profile, the idiomatic response would be a 302 Redirect to the profile page.
Isn't this wrong? Shouldn't we be returning 200 OK from the request? Or a 201 Created, in the case of a POST request? Either of those, in the HTTP/1.1 Status Definitions are allowed to (or required to) include a response, anyway.
I guess I'm wondering, before I go and "fix" my application, whether there is a darn good reason why the community has gone the way of redirects instead of successful responses.
I'll assume, your use of the PUT verb notwithstanding, that you're talking about a web app that will be accessed primarily through the browser. In that case, the usual reason for following up a POST with a redirect is the post-redirect-get pattern, which avoids duplicate requests caused by a user refreshing or using the back and forward controls of their browser. It seems that in many instances this pattern is overloaded by redirecting not to a success page, but to the next most likely place the user would visit. I don't think either way you mention is necessarily wrong, but doing the redirect may be more user-friendly at the expense of not strictly adhering to the semantics of HTTP.
It's called the POST-Redirect-GET (PRG) pattern. This pattern prevents clients from (accidentally) re-executing non-idempotent requests when, for example, navigating back and forth in the browser's history.
It's a good general web development practice which doesn't apply only to RoR. I'd just keep it as is.
In a perfect world, yes, probably. However HTTP clients and servers are a mess when it comes to standardization and don't always agree on proper protocol. Redirecting after a post helps avoid things like duplicate form submissions.
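For anyone who hasn't seen the pattern spelled out, here is a minimal sketch of POST-Redirect-GET. The handle function, the /comments path, and the module-level list are hypothetical, and it uses 303 rather than the 302 that Rails sends; the idea is the same.

```python
from typing import Dict, List, Tuple

Response = Tuple[int, Dict[str, str], str]

comments: List[str] = []  # hypothetical application state


def handle(method: str, path: str, body: str = "") -> Response:
    """Tiny request handler illustrating POST-Redirect-GET."""
    if method == "POST" and path == "/comments":
        comments.append(body)
        # Redirect instead of rendering here, so a reload repeats only the GET.
        return 303, {"Location": "/comments"}, ""
    if method == "GET" and path == "/comments":
        return 200, {}, "\n".join(comments)
    return 404, {}, ""
```

After the redirect, the page the user is looking at came from a GET, so refreshing it re-runs the GET and never the POST.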

Do I need to use http redirect code 302 or 307?

Suppose I have a page on my website to show media releases for the current month
http://www.mysite.com/mediareleases.aspx
And for reasons which it's mundane to go into*, this page MUST be given a query string with the current day of the month in order to produce this list:
http://www.mysite.com/mediareleases.aspx?prevDays=18
As such I need to redirect clients requesting http://www.mysite.com/mediareleases.aspx to http://www.mysite.com/mediareleases.aspx?prevDays=whateverDayOfTheMonthItIs
My question is, if I want google to index the page without the query parameter, should I use status code 302 or 307 to perform the redirect?
Both indicate that the page has "temporarily" moved - which is what I want because the page "moves" every day if you get my meaning.
[*] I'm using a feature of a closed-source .NET CMS so my hands are tied.
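For reference, the redirect being asked about amounts to something like the sketch below. The function name is hypothetical and the real redirect would be issued by the .NET page; this only shows the status and Location that would go out.

```python
from datetime import date
from typing import Dict, Tuple

PAGE = "http://www.mysite.com/mediareleases.aspx"  # URL from the question


def redirect_to_current_day(today: date, status: int = 302) -> Tuple[int, Dict[str, str]]:
    """Build a temporary redirect to the page with today's day of the month."""
    if status not in (302, 307):
        raise ValueError("expected a temporary redirect status")
    return status, {"Location": f"{PAGE}?prevDays={today.day}"}
```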
Google's documentation seems to indicate that both 302 and 307 are treated equivalently, and that "Googlebot will continue to crawl and index the original location."
But in the face of ambiguity, you might as well dig into the RFCs and try to do the Right Thing, with the naïve hope that the crawlers will do the same. In this case, RFC 2616 § 10.3 contains nearly identical definitions for each response code, with one exception:
302: Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
307: Since the redirection MAY be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
Which does not strike me as a significant distinction. My reading is that 302 instructs clients that webmasters are untrustworthy, and 307 explicitly tells webmasters that clients will not trust them, so they may freely alter the redirect.
I think the more telling point is the note in 302's definition:
Note: RFC 1945 and RFC 2068 specify that the client is not allowed to change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method. The status codes 303 and 307 have been added for servers that wish to make unambiguously clear which kind of reaction is expected of the client.
Which, to me, indicates that 302 and 307 are largely equivalent, but that HTTP/1.0 clients failed to implement 302 correctly the first time around.
Short answer: neither. In most cases the code you really want to use is 303.
For the long answer, first we need some background.
When getting a redirect code the client can (A) load the new location using the same request type or (B) it can overwrite it and use GET.
The HTTP 1.0 spec did not have 303 and 307, it only had 302, which mandated the (A) behavior. But in practice it was discovered that (A) led to a problem with submitted forms.
Say you have a contact form, the visitor fills it and submits it and the client gets a 302 to a page saying "thanks, we'll get back to you". The form was sent using POST so the thanks page is also loaded using POST. Now suppose the visitor hits reload; the request is resent the same way it was obtained the first time, which is with a POST (and the same payload in the body). End result: the form gets submitted twice (and once more for every reload). Even if the client asks the user for confirmation before doing that, it's still annoying in most cases.
This problem became so prevalent that client producers decided to override the spec and issue GET requests for the redirected location. Basically, it was an oversight in the HTTP 1.0 spec. What clients needed most was a 303 (and behavior (B) above), but instead they only got 302 (and (A)).
If HTTP 1.0 had offered both 302 and 303 there would have been no problem. But it didn't, so the result was a 302 which nobody used correctly. So HTTP 1.1 added 303 (badly needed) but also decided to add 307, which is technically identical to 302, but is a sort of "explicit 302"; it says "yeah, I know the issues surrounding 302, I know what I'm doing, give me behavior (A)".
Now, back to our question. You see now why in most cases you will want 303.
Cases where you want to preserve the request type are very rare. And if you do find yourself in such a case, the answer is simple: use 302. Either the client speaks HTTP 1.0, in which case it can't understand 307; or it speaks HTTP 1.1, which means it has no reason to preserve the rebellious behavior of old clients, i.e. it implements 302 correctly, so use it!
5 years on... note that the behaviour of 307 has been updated by RFC-7231#6.4.7 in June 2014, and is now significantly different from a 302, in that the method may not change:
The 307 (Temporary Redirect) status code indicates that the target
resource resides temporarily under a different URI and the user agent
MUST NOT change the request method if it performs an automatic
redirection to that URI.
Probably not an issue for the original question, but may be relevant to others who come across this question just looking for the difference.
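To summarize the difference for readers who just want the method behavior, here is a rough sketch of what a typical user agent does with the request method on each code. The function name is made up, and the 301/302 branch reflects common client practice (which RFC 7231 permits) rather than a requirement.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Method a typical user agent uses for the follow-up request."""
    if status == 307:
        return method  # MUST NOT change the request method
    if status == 303:
        # The follow-up is a retrieval request: GET, or HEAD if that was asked for.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Historical practice: POST is rewritten to GET; other methods are kept.
        return "GET" if method == "POST" else method
    raise ValueError(f"not a redirect status handled here: {status}")
```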
I feel your pain. As for a solution, it's hard to say what search engines will do. It seems that each one has its own way of handling redirects. This link suggests that a 302 will index the contents of the redirected page but still use the main page link, but it's not clear what a 307 will do.
Another way you could consider proceeding is with a javascript redirect and a <noscript> tag explaining what's going on. That will also foul up non-javascript browsers, and you'd have to proceed with caution to avoid Google's sneaky-site detection routine, but I suspect that as long as your noscript contains a hyperlink that matches the new URL you'd be OK.
Either way I'd still pursue doing a purely server-side request if at all possible. Heck, if your expected traffic is light, you could treat your home page as a proxy in the case where there's no querystring. Have it use a background thread to request itself with the querystring and pipe out the results. :-)
Edit: I just saw you're using .NET. Maybe consider this answer from SO: "C# Can i modify Request.Form's variables?".

Resources