When should one use GET instead of POST in a web application? - http

It seems that sticking to POST is the way to go because it results in clean looking URLs. GET seems to create long confusing URLs. POST is also better in terms of security. Good for protecting passwords in forms. In fact I hear that many developers only use POST for forms. I have also heard that many developers never really use GET at all.
So why and in what situation would one use GET if POST has these 2 advantages?
What benefit does GET have over POST?

you are correct, however it can be better to use gets for search pages and such. Places where you WANT the URL's to be obvious and discoverable. If you look at Google's (or any search page), it puts a www.google.com/?q=my+search at the end so people could link directly to the search.
You actually use GET much more than you think. Simply returning the web page is a GET request. There are also POST, PUT, DELETE, HEAD, OPTIONS and these are all used in RESTful programming interfaces.
GET vs. POST has no implications on security, they are both insecure unless you use HTTP/SSL.

Check the manual, I'm surprised that nobody has pointed out that GET and POST are semantically different and intended for quite different purposes.
While it may appear in a lot of cases that there is no functional difference between the 2 approaches, until you've tested every browser, proxy and server combination you won't be able to rely on that being a consistent in every case. e.g. mobile devices / proxies often cache aggressivley even where they are requested not to (but I've never come across one which incorrectly caches a POST response).
The protocol does not allow for anything other than simple, scalar datatypes as parameters in a GET - e.g. you can only send a file using POST or PUT.
There are also implementation constraints - last time I checked, the size of a URL was limited to around 2k in MSIE.
Finally, as you've noted, there's the issue of data visibility - you may not want to allow users to bookmark a URL containing their credit card number / password.
POST is the way to go because it results in clean looking URLs
That rather defeats the purpose of what a URL is all about. Read RFC 1630 - The Need For a Universal Syntax.

Sometimes you want your web application to be discoverable as in users can just about guess what a URL should be for a certain operation. It gives a nicer user experience and for this you would use GET and base your URLs on some sort of RESTful specification like http://microformats.org/wiki/rest/urls

If by 'web application' you mean 'website', as a developer you don't really have any choice. It's not you as a developer that makes the GET or POST requests, it's your user. They make the requests via their web browser.
When you request a web page by typing its URL into the address bar of the browser (or clicking a link, etc), the browser issues a GET request.
When you submit a web page using a button, you make a POST request.
In a GET request, additional data is sent in the query string. For example, the URL www.mysite.com?user=david&password=fish sends the two bits of data 'user' and 'password'.
In a POST request, the values in the form's controls (e.g. text boxes etc) are sent. This isn't visible in the address bar, but it's completely visible to anyone viewing your web traffic.
Both GET and POST are completely insecure unless SSL is used (e.g. web addresses beginning https).

Related

Is it dangerous if a web resource POSTs to itself?

While reading some articles about writing web servers using Twisted, I came across this page that includes the following statement:
While it's convenient for this example, it's often not a good idea to
make a resource that POSTs to itself; this isn't about Twisted Web,
but the nature of HTTP in general; if you do this, make sure you
understand the possible negative consequences.
In the example discussed in the article, the resource is a web resource retrieved using a GET request.
My question is, what are the possible negative consequences that can arrive from having a resource POST to itself? I am only concerned about the aspects related to the HTTP protocol, so please ignore the fact that I mentioned about Twisted.
The POST verb is used for making a new resource in a collection.
This means that POSTing to a resource has no direct meaning (POST endpoints should always be collections, not resources).
If you want to update your resource, you should PUT to it.
Sometimes, you do not know if you want to update or create the resource (maybe you've created it locally and want to create-or-update it). I think that in that case, the PUT verb is more appropriate because POST really means "I want to create something new".
There's nothing inherently wrong about a page POSTing back to itself - in fact, many of the widely-used frameworks (ASP.NET, etc.) use that method to handle various events that happen on the client - some data is posted back to the same page where the server processes it and sends a new reponse.

Using GET for a non-idempotent request

Simply put, I have a website where you can sign up as a user and add data. Currently it only makes sense to add specific data once, so an addition should be idempotent, but theoretically you could add the same data multiple times. I won't get into that here.
According to RFC 2616, GET requests should be idempotent (really nullipotent). I want users to be able to do something like visit
http://example.com/<username>/add/?data=1
And this would add that data. It would make sense to have a PUT request do this with REST, but I have no idea how to make a PUT request with a browser and I highly doubt most people do or would want to bother to. Even using POST would be appropriate, but this has a similar problem.
Is there some technically correct way to allow users to add data using only GET (e.g. by visiting the link manually, or allowing external websites to use the link). When they visit this page I could make my own POST/PUT request either with javascript or cURL, but this still seems to violate the spirit of idempotent GET requests.
Is there some technically correct way to allow users to add data using
only GET ... ?
No matter how you go about letting clients access it, you'll end up violating RFC2616. It's ultimately up to you how you handle requests (nothing's going to stop you from doing this), but keep in mind that if you go against the HTTP specification, you might cause unexpected side-effects to clients and proxies who do abide by it.
Also see: Why shouldn't data be modified on an HTTP GET request?
As far as not being able to PUT from the browser, there are workarounds for that [1], [2], most of which use POST but also pass some sort of _method request parameter that's intercepted by the server and routes to the appropriate server-side action.

Is it a good idea to check the HTTP request query string and indicate an error once there's an unexpected parameter?

My ASP.NET application will have to handle HTTP GET requests that will have the following URL format:
http://mySite/getStuff?id="actualIdHere"
currently the requirement is to validate that there're no parameters in the query string except id and indicate an error like "unknown parameter P passed".
Is such requirement a good idea? Will it interfere with some obviously valid cases of using the application I haven't thought of?
It would be better to just validate the presence of id.
Validating unknown parameters doesn't serve much of a purpose, they will just be ignored.
Just edited my answer here:
There are also tracking solutions out there that will add to your query string.
One that comes to mind is web analytics.
If your application is going to be a public web site, you will want to implement some tracking of your traffic (e.g. google analytics).
If you want to implement a marketing campaign to draw traffic to your site, you will likely need to add a few parameters (specific to the tracking system you're using) to your querystring to check the effectiveness of your campaign.
It depends on your target audience.
It is not a good practice for public websites where you are aware of SEO, for example if you implement Google Analytics then a user come to your site from Search Results may have a parameter in URL like googleclid.
However in more protected websites it is fine.
It might affect forward compatiblity. For example, if you have separate client applications/websites that actually call this URLs, and future versions of these clients might provide additional parameters to getStuff (like a sort ordering, backlink, etc), making hard requirements on the parameters might make it harder to roll out new versions smoothly (i.e. cannot roll out new clients until the server is updated).
This in addition to the traffic forwarding parameters public websites might get as additional input, like the other answers mention.

Why shouldn't data be modified on an HTTP GET request?

I know that using non-GET methods (POST, PUT, DELETE) to modify server data is The Right Way to do things. I can find multiple resources claiming that GET requests should not change resources on the server.
However, if a client were to come up to me today and say "I don't care what The Right Way to do things is, it's easier for us to use your API if we can just use call URLs and get some XML back - we don't want to have to build HTTP requests and POST/PUT XML," what business-conducive reasons could I give to convince them otherwise?
Are there caching implications? Security issues? I'm kind of looking for more than just "it doesn't make sense semantically" or "it makes things ambiguous."
Edit:
Thanks for the answers so far regarding prefetching. I'm not as concerned with prefetching since is mostly surrounding internal network API use and not visitable HTML pages that would have links that could be prefetched by a browser.
Prefetch: A lot of web browsers will use prefetching. Which means that it will load a page before you click on the link. Anticipating that you will click on that link later.
Bots: There are several bots that scan and index the internet for information. They will only issue GET requests. You don't want to delete something from a GET request for this reason.
Caching: GET HTTP requests should not change state and they should be idempotent. Idempotent means that issuing a request once, or issuing it multiple times gives the same result. I.e. there are no side effects. For this reason GET HTTP requests are tightly tied to caching.
HTTP standard says so: The HTTP standard says what each HTTP method is for. Several programs are built to use the HTTP standard, and they assume that you will use it the way you are supposed to. So you will have undefined behavior from a slew of random programs if you don't follow.
How about Google finding a link to that page with all the GET parameters in the URL and revisiting it every now and then? That could lead to a disaster.
There's a funny article about this on The Daily WTF.
GETs can be forced on a user and result in Cross-site Request Forgery (CSRF). For instance, if you have a logout function at http://example.com/logout.php, which changes the server state of the user, a malicious person could place an image tag on any site that uses the above URL as its source: http://example.com/logout.php. Loading this code would cause the user to get logged out. Not a big deal in the example given, but if that was a command to transfer funds out of an account, it would be a big deal.
Good reasons to do it the right way...
They are industry standard, well documented, and easy to secure. While you fully support making life as easy as possible for the client you don't want to implement something that's easier in the short term, in preference to something that's not quite so easy for them but offers long term benefits.
One of my favourite quotes
Quick and Dirty... long after the
Quick has departed the Dirty remains.
For you this one is a "A stitch in time saves nine" ;)
Security:
CSRF is so much easier in GET requests.
Using POST won't protect you anyway but GET can lead easier exploitation and mass exploitation by using forums and places which accepts image tags.
Depending on what you do in server-side using GET can help attacker to launch DoS (Denial of Service). An attacker can spam thousands of websites with your expensive GET request in an image tag and every single visitor of those websites will carry out this expensive GET request against your web server. Which will cause lots of CPU cycle to you.
I'm aware that some pages are heavy anyway and this is always a risk, but it's bigger risk if you add 10 big records in every single GET request.
Security for one. What happens if a web crawler comes across a delete link, or a user is tricked into clicking a hyperlink? A user should know what they're doing before they actually do it.
I'm kind of looking for more than just "it doesn't make sense semantically" or "it makes things ambiguous."
...
I don't care what The Right Way to do things is, it's easier for us
Tell them to think of the worst API they've ever used. Can they not imagine how that was caused by a quick hack that got extended?
It will be easier (and cheaper) in 2 months if you start with something that makes sense semantically. We call it the "Right Way" because it makes things easier, not because we want to torture you.

HTTP Verbs and Content Negotiation or GET Strings for REST service?

I am designing a REST service and am trying to weight the pros and cons of using the full array of http verbs and content negotiation vs GET string variables. Does my choice affect cacheability? Neither solution may be right for every area.
Which is best for the crud and queries (e.g. ?action=PUT)?
Which is best for api version picking (e.g. ?version=1.0)?
Which is best for return data type(e.g. ?type=json)?
CRUD/Queries are best represented with HTTP verbs. A create and update is usually a PUT or POST. A retrieve would be a GET. Deletes would be a DELETE. Thats the generally mapping. The main point is that a GET doesn't cause side effects, and that the verbs do what you'd expect them to do.
Putting the action in the URI is OK if thats the -only- way to pass it (e.g, the http client library doesn't allow you to send non-GET/POST requests). Most libraries do, though, so its strongly advised not to pass the verb via the URL.
The "best" way to version the API would be using HTTP headers on a per-request basis; this lets clients upgrade/downgrade specific requests instead of every single one. Of course, that granularity of versioning needs to be baked in at the start and could severely complicate the server-side code. Most people just use the URL used the access the servers. A longer explanation is in a blog post by Peter Williams, "Versioning Rest Web Services"
There is no best return data type; it depends on your app. JSON might be easier for Ajax websites, whereas XML might be easier for complicated structures you want to query with Xpath. Protocol Buffers are a third option. Its also debated whether its better to put the return protocol is best specified in the URL or in the HTTP headers.
For the most part, headers will have the largest affect on caching, since proxies are suppose to respect them when told, as are user agents (obviously UA's behave differently, though). Caching based on URL alone is very dependent on the layers. Some user agents don't cache anything with a query string (Safari, iirc), and proxies are free to cache or not cache as they feel appropriate.

Resources