Should HTTP POST be discouraged? - http

Quoting from the CouchDB documentation:
It is recommended that you avoid POST when possible, because proxies and other network intermediaries will occasionally resend POST requests, which can result in duplicate document creation.
To my understanding, this should not be happening on the protocol level (a confused user armed with a doubleclick is a completely different story). What is the best course of action, then?
Should we really try to avoid POST requests and replace them by PUT? I don't like that as they convey a different meaning.
Should we anticipate this and protect the requests by unique IDs where we want to avoid accidental duplication? I don't like that either: it complicates the code and prevents situations where multiple identical posts may be desired.

Should we really try to avoid POST
requests and replace them by PUT? I
don't like that as they convey a
different meaning.
For document creation (as mentioned in the doc you cite), that is exactly the correct meaning. For document modification, occasional resent requests are not a problem.

We should avoid POST requests when the request is idempotent and does change server state (it should not change the server state each time it is performed, in document creation this means the document should only be created once).
POST is suitable when the request is non-idempotent. This means that everytime the request is sent there is a change in server state. For an update or delete this should not matter.
Of course, due to web browser support only GET and POST requests have really been adopted because POST does all that PUT does (it's just not designed for idempotent requests).

I must have been extremely lucky in my WebMaster days to never have seen duplicate POST requests.
TCP/IP sends numbered messages, I wouldn't know why any webserver wouldn't discard of the duplicates.
Is this duplicate POST thing really an issue?

Related

Consequences of POST not being idempotent (RESTful API)

I am wondering if my current approach makes sense or if there is a better way to do it.
I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that.
However since POST is not idempotent the request may get lost and sending it again may create a second object. Also requests being lost might be quite common since the API is often accessed through mobile networks.
As a result I decided to split the whole thing into a two-step process:
First sending a POST request to create a new object which returns the URI of the new object in the Location header.
Secondly performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours the server may delete it through some kind of batch job.
Does that sound reasonable or is there a better approach?
The only advantage of POST-creation over PUT-creation is the server generation of IDs.
I don't think it worths the lack of idempotency (and then the need for removing duplicates or empty objets).
Instead, I would use a PUT with a UUID in the URL. Owing to UUID generators you are nearly sure that the ID you generate client-side will be unique server-side.
well it all depends, to start with you should talk more about URIs, resources and representations and not be concerned about objects.
The POST Method is designed for non-idempotent requests, or requests with side affects, but it can be used for idempotent requests.
on POST of form data to /some_collection/
normalize the natural key of your data (Eg. "lowercase" the Title field for a blog post)
calculate a suitable hash value (Eg. simplest case is your normalized field value)
lookup resource by hash value
if none then
generate a server identity, create resource
Respond => "201 Created", "Location": "/some_collection/<new_id>"
if found but no updates should be carried out due to app logic
Respond => 302 Found/Moved Temporarily or 303 See Other
(client will need to GET that resource which might include fields required for updates, like version_numbers)
if found but updates may occur
Respond => 307 Moved Temporarily, Location: /some_collection/<id>
(like a 302, but the client should use original http method and might do automatically)
A suitable hash function might be as simple as some concatenated fields, or for large fields or values a truncated md5 function could be used. See [hash function] for more details2.
I've assumed you:
need a different identity value than a hash value
data fields used
for identity can't be changed
Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options. Either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Round 2000, I read an article advocating this solution called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because its there, without ever broaching the problems with non-idempotent requests.
In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.
Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.
But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.
Another approach: screen incoming POST's against some log to see whether it is a repeat. Should be easy to find: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. And you could check extra parameters like the originating IP, same authentication, ...
No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.
If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.
I think you could also collapse creation and update request into only one request (upsert). In order to create a new resource, client POST a “factory” resource, located for example at /factory-url-name. And then the server returns the URI for the new resource.
Why don't you use a request Id on your originating point (your originating point should do two things, send a GET request on request_id=2 to see if it's request has been applied - like a response with person created and created as part of request_id=2
This will ensure your originating system knows what was the last request that was executed as the request id is stored in db.
Second thing, if your originating point finds that last request was still at 1 not yet 2, then it may try again with 3, to make sure if by any chance just the GET response has gotten lost but the request 2 was created in the db.
You can introduce number of tries for your GET request and time to wait before firing again a GET etc kinds of system.

HTTP GET and POST semantics and limitations

Earlier this week, I had to do something which feels like a semantics violation. Let me explain.
I was making a simple AJAX client application, which was to make a request to a service with a given number of parameters. Since the whole app is basically read-only, I thought that using HTTP GET was the way to go. Some of the parameters that I had to pass were simple (such as the sort order, or page number).
However, one of the required parameters could be of variable length, and this made me worry. Since I was encoding all of the parameters in the querystring of the GET request, it seemed to me that this placed an unnecessary upper limit of (roughly) 2000 characters for the request URL. And regardless, I didn't like seeing 500-character-long request URLs.
So, since a POST request doesn't have a limitation like that, I decided to switch. But this doesn't feel right. I am under the impression that a POST denotes modification of data - but I'm using it for a simple read-only request.
Is there a better way to do this? To perform a GET, with many parameters? I've heard of one method - where you perform a preliminary POST of the parameters themselves, and then perform a GET. But, this technique leaves much to be desired.
But looking past this specific case, what are the real semantics and limitations of HTTP request methods? And why does GET not support any kind of parameter payload? Using the querystring in the URL almost feels like a hack to me.
A few points on this issue:
The HTTP spec (RFC 2616) doesn't forbit GET requests to have parameters, so it's not a matter of the semantics of HTTP GET itself. However, many HTTP stacks (for clients, services, or proxies) forbid bodies in HTTP requests, the fact that you can't use them is mostly an implementation detail (quite prevalent) than a semantic issue with the HTTP GET requests
Similarly, the limitation of the URI (or query string) length isn't specified on the RFC either. It's mostly a security mitigation implemented by several HTTP server stacks to prevent a bad client from consuming server resources (for example, in IIS/ASP.NET the default limit is 2k but you can increase it via some elements in web.config). Again, it's not a semantic but a practical issue.
POST requests do indicate data modification if you're following the REST philosophy, but there are many examples of HTTP POST requests used for read-only operations. SOAP uses POST in all of its requests, regardless of whether the operation it is calling is a "safe" or a "modifying" one. So you can use POST for those operations as well. However, by deviating from the REST (and the "canonical" HTTP) usage, you'll lose some of the features of the protocol, such as caching which can be applied for GET requests, but not for POST.
Your example of using two requests (POST with parameters + GET to "get" the results) seems overkill. As I mentioned, POST requests don't necessarily mean modifying resources, so you don't have to create a new "protocol" (POST+GET) to access your operation when one request is enough.

Why is using a HTTP GET to update state on the server in a RESTful call incorrect?

OK, I know already all the reasons on paper why I should not use a HTTP GET when making a RESTful call to update the state of something on the server. Thus returning possibly different data each time. And I know this is wrong for the following 'on paper' reasons:
HTTP GET calls should be idempotent
N > 0 calls should always GET the same data back
Violates HTTP spec
HTTP GET call is typically read-only
And I am sure there are more reasons. But I need a concrete simple example for justification other than "Well, that violates the HTTP Spec!". ...or at least I am hoping for one. I have also already read the following which are more along the lines of the list above: Does it violate the RESTful when I write stuff to the server on a GET call? &
HTTP POST with URL query parameters -- good idea or not?
For example, can someone justify the above and why it is wrong/bad practice/incorrect to use a HTTP GET say with the following RESTful call
"MyRESTService/GetCurrentRecords?UpdateRecordID=5&AddToTotalAmount=10"
I know it's wrong, but hopefully it will help provide an example to answer my original question. So the above would update recordID = 5 with AddToTotalAmount = 10 and then return the updated records. I know a POST should be used, but let's say I did use a GET.
How exactly and to answer my question does or can this cause an actual problem? Other than all the violations from the above bullet list, how can using a HTTP GET to do the above cause some real issue? Too many times I come into a scenario where I can justify things with "Because the doc said so", but I need justification and a better understanding on this one.
Thanks!
The practical case where you will have a problem is that the HTTP GET is often retried in the event of a failure by the HTTP implementation. So you can in real life get situations where the same GET is received multiple times by the server. If your update is idempotent (which yours is), then there will be no problem, but if it's not idempotent (like adding some value to an amount for example), then you could get multiple (undesired) updates.
HTTP POST is never retried, so you would never have this problem.
If some form of search engine spiders your site it could change your data unintentionally.
This happened in the past with Google's Desktop Search that caused people to lose data because people had implemented delete operations as GETs.
Here is an important reason that GETs should be idempotent and not be used for updating state on the server in regards to Cross Site Request Forgery Attacks. From the book: Professional ASP.NET MVC 3
Idempotent GETs
Big word, for sure — but it’s a simple concept. If an
operation is idempotent, it can be executed multiple times without
changing the result. In general, a good rule of thumb is that you can
prevent a whole class of CSRF attacks by only changing things in your
DB or on your site by using POST. This means Registration, Logout,
Login, and so forth. At the very least, this limits the confused
deputy attacks somewhat.
One more problem is there. If GET method is used , data is sent in the URL itself . In web server's logs , this data gets saved somewhere in the server along with the request path. Now suppose that if someone has access to/reads those log files , your data (can be user id , passwords , key words , tokens etc. ) gets revealed . This is dangerous and has to be taken care of .
In server's log file, headers and body are not logged but request path is . So , in POST method where data is sent in body, not in request path, your data remains safe .
i think that reading this resource:
http://www.servicedesignpatterns.com/WebServiceAPIStyles could be helpful to you to make difference between message API and resource api ?

Using GET for a non-idempotent request

Simply put, I have a website where you can sign up as a user and add data. Currently it only makes sense to add specific data once, so an addition should be idempotent, but theoretically you could add the same data multiple times. I won't get into that here.
According to RFC 2616, GET requests should be idempotent (really nullipotent). I want users to be able to do something like visit
http://example.com/<username>/add/?data=1
And this would add that data. It would make sense to have a PUT request do this with REST, but I have no idea how to make a PUT request with a browser and I highly doubt most people do or would want to bother to. Even using POST would be appropriate, but this has a similar problem.
Is there some technically correct way to allow users to add data using only GET (e.g. by visiting the link manually, or allowing external websites to use the link). When they visit this page I could make my own POST/PUT request either with javascript or cURL, but this still seems to violate the spirit of idempotent GET requests.
Is there some technically correct way to allow users to add data using
only GET ... ?
No matter how you go about letting clients access it, you'll end up violating RFC2616. It's ultimately up to you how you handle requests (nothing's going to stop you from doing this), but keep in mind that if you go against the HTTP specification, you might cause unexpected side-effects to clients and proxies who do abide by it.
Also see: Why shouldn't data be modified on an HTTP GET request?
As far as not being able to PUT from the browser, there are workarounds for that [1], [2], most of which use POST but also pass some sort of _method request parameter that's intercepted by the server and routes to the appropriate server-side action.

Is PUT/DELETE idempotent with REST automatic?

I am learning about REST and PUT/DELETE, I have read that both of those (along with GET) is idempotent meaning that multiple requests put the server into the same state.
Does a duplicate PUT/DELETE request ever leave the web browser (when using XMLHttpRequest)? In other words, will the server be updating the same database record for each PUT request, or will duplicate requests be ignored automatically?
If yes, how is using PUT or DELETE different from just using POST?
I read an article which suggested that RESTful web services were the way forward. Is there any particular reason why HTML5 forms do not support PUT/DELETE methods?
REST is just a design structure for data access and manipulation. There's no set-in-stone rules for how a server must react to data requests.
That being said, typically a REST request of PUT or DELETE would be as follows:
DELETE /item/10293
or
PUT /item/23848
foo=bar
fizz=buzz
herp=derp
The requests given are associated with a specific ID. Because of this, telling the server to delete the same ID 15 times will end up with pretty much the same result as calling it once, unless there's some sort of re-numbering going on.
With the PUT request, telling the server to update a specific item to specific values will also lead to the same result.
A case where a command would be non-idempotent would typically involve some sort of relative value:
DELETE /item/last
Calling that 15 times would likely remove 15 items, rather than the same last item. An alternative using HTTP properly might look like:
POST /item/last?action=delete
Again, REST isn't an official spec, it's just a structure with some common qualities. There are many ways to implement a RESTful structure.
As for HTML5 forms supporting PUT & DELETE, it's really up to the browsers to start supporting different methods rather than the spec itself. If all the browsers started implementing different methods for form submission, I'm sure they'd be added to the spec.
With the web going the way it is, a good RESTful implementation is liable to also incorporate some form of AJAX anyway, so to me it seems largely unnecessary.
Does a duplicate PUT/DELETE request ever leave the web browser (when using XMLHttpRequest)?
Yeah, sure. Idempotence is only a convention and it's not enforced. If you make a request, duplicate or not, it will run through.
In other words, will the server be updating the same database record for each PUT request, or will duplicate requests be ignored automatically?
If it conforms to REST it should update the same database record twice, for example running UPDATE user SET name = 'John' twice. There is not guarantee what it will or will not do though, it depends on how it's implemented.
If yes, how is using PUT or DELETE different from just using POST?
It's just a convention. PUT and DELETE requests may or may not be handled differently from POST in the site's code.
I read an article which suggested that RESTful web services were the way forward. Is there any particular reason why HTML5 forms do not support PUT/DELETE methods?
I'm not really sure, to be honest. You can work around this by using a hidden <input> field called _method or similar and set it to DELETE or PUT, and then handle that server side.
PUT operation are idempotent but not safe operation. On success if PUT operation is repeated it will not insert duplicate records. Repeat PUT operation in case of NetworkFailure errors after verifying conditional headers like If-unmodified-since and/or if-match. Don't repeat in case of 4XX or 5XX error codes.
REST aims to establish a syntax convention regarding the HTTP method to use; each back end is free to implement anything they want, devs could break the convention but will cause unnecessary confusion if used by others not involved in the development.
For DELETE, if you delete some item with an ID, the server should responded it's deleted; if delete again, it's no more there so server responded "already removed", also good because your purpose is fulfilled. Same for PUT, because you provide the new status of your resource, the status yet-to-be; if it's already updated, mission complete and it's the same for the client.

Resources