Why shouldn't data be modified on an HTTP GET request?

Why shouldn't data be modified on an HTTP GET request? - http

I know that using non-GET methods (POST, PUT, DELETE) to modify server data is The Right Way to do things. I can find multiple resources claiming that GET requests should not change resources on the server.
However, if a client were to come up to me today and say "I don't care what The Right Way to do things is, it's easier for us to use your API if we can just use call URLs and get some XML back - we don't want to have to build HTTP requests and POST/PUT XML," what business-conducive reasons could I give to convince them otherwise?
Are there caching implications? Security issues? I'm kind of looking for more than just "it doesn't make sense semantically" or "it makes things ambiguous."
Edit:
Thanks for the answers so far regarding prefetching. I'm not as concerned with prefetching since is mostly surrounding internal network API use and not visitable HTML pages that would have links that could be prefetched by a browser.

Prefetch: A lot of web browsers will use prefetching. Which means that it will load a page before you click on the link. Anticipating that you will click on that link later.
Bots: There are several bots that scan and index the internet for information. They will only issue GET requests. You don't want to delete something from a GET request for this reason.
Caching: GET HTTP requests should not change state and they should be idempotent. Idempotent means that issuing a request once, or issuing it multiple times gives the same result. I.e. there are no side effects. For this reason GET HTTP requests are tightly tied to caching.
HTTP standard says so: The HTTP standard says what each HTTP method is for. Several programs are built to use the HTTP standard, and they assume that you will use it the way you are supposed to. So you will have undefined behavior from a slew of random programs if you don't follow.

How about Google finding a link to that page with all the GET parameters in the URL and revisiting it every now and then? That could lead to a disaster.
There's a funny article about this on The Daily WTF.

GETs can be forced on a user and result in Cross-site Request Forgery (CSRF). For instance, if you have a logout function at http://example.com/logout.php, which changes the server state of the user, a malicious person could place an image tag on any site that uses the above URL as its source: http://example.com/logout.php. Loading this code would cause the user to get logged out. Not a big deal in the example given, but if that was a command to transfer funds out of an account, it would be a big deal.

Good reasons to do it the right way...
They are industry standard, well documented, and easy to secure. While you fully support making life as easy as possible for the client you don't want to implement something that's easier in the short term, in preference to something that's not quite so easy for them but offers long term benefits.
One of my favourite quotes
Quick and Dirty... long after the
Quick has departed the Dirty remains.
For you this one is a "A stitch in time saves nine" ;)

Security:
CSRF is so much easier in GET requests.
Using POST won't protect you anyway but GET can lead easier exploitation and mass exploitation by using forums and places which accepts image tags.
Depending on what you do in server-side using GET can help attacker to launch DoS (Denial of Service). An attacker can spam thousands of websites with your expensive GET request in an image tag and every single visitor of those websites will carry out this expensive GET request against your web server. Which will cause lots of CPU cycle to you.
I'm aware that some pages are heavy anyway and this is always a risk, but it's bigger risk if you add 10 big records in every single GET request.

Security for one. What happens if a web crawler comes across a delete link, or a user is tricked into clicking a hyperlink? A user should know what they're doing before they actually do it.

I'm kind of looking for more than just "it doesn't make sense semantically" or "it makes things ambiguous."
...
I don't care what The Right Way to do things is, it's easier for us
Tell them to think of the worst API they've ever used. Can they not imagine how that was caused by a quick hack that got extended?
It will be easier (and cheaper) in 2 months if you start with something that makes sense semantically. We call it the "Right Way" because it makes things easier, not because we want to torture you.

Related

CSRF protection while making use of server side caching

Situation
There is a site at examp.le that costs a lot of CPU/RAM to generate and a more lean examp.le/backend that will perform various tasks to read, write and serve user-specific data for authenticated requests. A lot of resources could be saved by utilizing a server side cache on the examp.le site but not on examp.le/backend and just asynchronously grab all user-specific data from the backend once the page arrives at the client. (Total loading time may even be lower, despite the need of an additional request.)
Threat model
CSRF attacks. Assuming (maybe foolishly) that examp.le is reliably safeguarded against XSS code injection, we still need to consider scripts on malicious site exploit.me that cause the victims browser to run a request against examp.le/backend with their authorization cookies included automagically and cause the server to perform some kind of data mutation on behalf of the user.
Solution / problem with that
As far as I understand, the commonly used countermeasure is to include another token in the generated exampl.le page. The server can verify this token is linked to the current user's session and will only accept requests that can provide it. But I assume caching won't work very well if we are baking a random token into every response to examp.le..?
So then...
I see two possible solutions: One would be some sort of "hybrid caching" where each response to examp.le is still programmatically generated but that program is just merging small dynamic parts to some cached output. Wouldn't work with caching systems that work on the higher layers of the server stack, let alone a CDN, but still might have its merits. I don't know if there is a standard ways or libraries to do this, or more specifically if there are solutions for wordpress (which happens to be the culprit in my case).
The other (preferred) solution would be to get an initial anti-CSRF token directly from examp.le/backend. But I'm not quite clear in my understanding about the implications of that. If the script on exploit.me could somehow obtain that token, the whole mechanism would make no sense to begin with. The way I understand it, if we leave exploitable browser bugs and security holes out of the picture and consider only requests coming from a non-obscure browser visiting exploit.me, then the HTTP_ORIGIN header can be absolutely trusted to be tamper proof. Is that correct? But then that begs the question: wouldn't we get mostly the same amount of security in this scenario by only checking authentication cookie and origin header, without throwing tokens back and forth?
I'm sorry if this question feels a bit all over the place, but I'm partly still in the process of getting the whole picture clear ;-)

First of all: Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF) are two different categories of attacks. I assume, you meant to tackle CSRF problem only.
Second of all: it's crucial to understand what CSRF is about. Consider following.
A POST request to exampl.le/backend changes some kind of crucial data
The request to exampl.le/backend is protected by authentication mechanisms, which generate valid session cookies.
I want to attack you. I do it by sending you a link to a page I have forged at cats.com\best_cats_evr.
If you are logged in to exampl.le in one browser tab and you open cats.com\best_cats_evr in another, the code will be executed.
The code on the site cats.com\best_cats_evr will send a POST request to exampl.le/backend. The cookies will be attached, as there is not reason why they should not. You will perform a change on exampl.le/backend without knowing it.
So, having said that, how can we prevent such attacks?
The CSRF case is very well known to the community and it makes little sense for me to write everything down myself. Please check the OWASP CSRF Prevention Cheat Sheet, as it is one of the best pages you can find in this topic.
And yes, checking the origin would help in this scenario. But checking the origin will not help, if I find XSS vulnerability in exampl.le/somewhere_else and use it against you.
What would also help would be not using POST requests (as they can be manipulated without origin checks), but use e.g. PUT where CORS should help... But this quickly turns out to be too much of rocket science for the dev team to handle and sticking to good old anti-CSRF tokens (supported by default in every framework) should help.

Why is using a HTTP GET to update state on the server in a RESTful call incorrect?

OK, I know already all the reasons on paper why I should not use a HTTP GET when making a RESTful call to update the state of something on the server. Thus returning possibly different data each time. And I know this is wrong for the following 'on paper' reasons:
HTTP GET calls should be idempotent
N > 0 calls should always GET the same data back
Violates HTTP spec
HTTP GET call is typically read-only
And I am sure there are more reasons. But I need a concrete simple example for justification other than "Well, that violates the HTTP Spec!". ...or at least I am hoping for one. I have also already read the following which are more along the lines of the list above: Does it violate the RESTful when I write stuff to the server on a GET call? &
HTTP POST with URL query parameters -- good idea or not?
For example, can someone justify the above and why it is wrong/bad practice/incorrect to use a HTTP GET say with the following RESTful call
"MyRESTService/GetCurrentRecords?UpdateRecordID=5&AddToTotalAmount=10"
I know it's wrong, but hopefully it will help provide an example to answer my original question. So the above would update recordID = 5 with AddToTotalAmount = 10 and then return the updated records. I know a POST should be used, but let's say I did use a GET.
How exactly and to answer my question does or can this cause an actual problem? Other than all the violations from the above bullet list, how can using a HTTP GET to do the above cause some real issue? Too many times I come into a scenario where I can justify things with "Because the doc said so", but I need justification and a better understanding on this one.
Thanks!

The practical case where you will have a problem is that the HTTP GET is often retried in the event of a failure by the HTTP implementation. So you can in real life get situations where the same GET is received multiple times by the server. If your update is idempotent (which yours is), then there will be no problem, but if it's not idempotent (like adding some value to an amount for example), then you could get multiple (undesired) updates.
HTTP POST is never retried, so you would never have this problem.

If some form of search engine spiders your site it could change your data unintentionally.
This happened in the past with Google's Desktop Search that caused people to lose data because people had implemented delete operations as GETs.

Here is an important reason that GETs should be idempotent and not be used for updating state on the server in regards to Cross Site Request Forgery Attacks. From the book: Professional ASP.NET MVC 3
Idempotent GETs
Big word, for sure — but it’s a simple concept. If an
operation is idempotent, it can be executed multiple times without
changing the result. In general, a good rule of thumb is that you can
prevent a whole class of CSRF attacks by only changing things in your
DB or on your site by using POST. This means Registration, Logout,
Login, and so forth. At the very least, this limits the confused
deputy attacks somewhat.

One more problem is there. If GET method is used , data is sent in the URL itself . In web server's logs , this data gets saved somewhere in the server along with the request path. Now suppose that if someone has access to/reads those log files , your data (can be user id , passwords , key words , tokens etc. ) gets revealed . This is dangerous and has to be taken care of .
In server's log file, headers and body are not logged but request path is . So , in POST method where data is sent in body, not in request path, your data remains safe .

i think that reading this resource:
http://www.servicedesignpatterns.com/WebServiceAPIStyles could be helpful to you to make difference between message API and resource api ?

Is it dangerous if a web resource POSTs to itself?

While reading some articles about writing web servers using Twisted, I came across this page that includes the following statement:
While it's convenient for this example, it's often not a good idea to
make a resource that POSTs to itself; this isn't about Twisted Web,
but the nature of HTTP in general; if you do this, make sure you
understand the possible negative consequences.
In the example discussed in the article, the resource is a web resource retrieved using a GET request.
My question is, what are the possible negative consequences that can arrive from having a resource POST to itself? I am only concerned about the aspects related to the HTTP protocol, so please ignore the fact that I mentioned about Twisted.

The POST verb is used for making a new resource in a collection.
This means that POSTing to a resource has no direct meaning (POST endpoints should always be collections, not resources).
If you want to update your resource, you should PUT to it.
Sometimes, you do not know if you want to update or create the resource (maybe you've created it locally and want to create-or-update it). I think that in that case, the PUT verb is more appropriate because POST really means "I want to create something new".

There's nothing inherently wrong about a page POSTing back to itself - in fact, many of the widely-used frameworks (ASP.NET, etc.) use that method to handle various events that happen on the client - some data is posted back to the same page where the server processes it and sends a new reponse.

When should one use GET instead of POST in a web application?

It seems that sticking to POST is the way to go because it results in clean looking URLs. GET seems to create long confusing URLs. POST is also better in terms of security. Good for protecting passwords in forms. In fact I hear that many developers only use POST for forms. I have also heard that many developers never really use GET at all.
So why and in what situation would one use GET if POST has these 2 advantages?
What benefit does GET have over POST?

you are correct, however it can be better to use gets for search pages and such. Places where you WANT the URL's to be obvious and discoverable. If you look at Google's (or any search page), it puts a www.google.com/?q=my+search at the end so people could link directly to the search.
You actually use GET much more than you think. Simply returning the web page is a GET request. There are also POST, PUT, DELETE, HEAD, OPTIONS and these are all used in RESTful programming interfaces.
GET vs. POST has no implications on security, they are both insecure unless you use HTTP/SSL.

Check the manual, I'm surprised that nobody has pointed out that GET and POST are semantically different and intended for quite different purposes.
While it may appear in a lot of cases that there is no functional difference between the 2 approaches, until you've tested every browser, proxy and server combination you won't be able to rely on that being a consistent in every case. e.g. mobile devices / proxies often cache aggressivley even where they are requested not to (but I've never come across one which incorrectly caches a POST response).
The protocol does not allow for anything other than simple, scalar datatypes as parameters in a GET - e.g. you can only send a file using POST or PUT.
There are also implementation constraints - last time I checked, the size of a URL was limited to around 2k in MSIE.
Finally, as you've noted, there's the issue of data visibility - you may not want to allow users to bookmark a URL containing their credit card number / password.
POST is the way to go because it results in clean looking URLs
That rather defeats the purpose of what a URL is all about. Read RFC 1630 - The Need For a Universal Syntax.

Sometimes you want your web application to be discoverable as in users can just about guess what a URL should be for a certain operation. It gives a nicer user experience and for this you would use GET and base your URLs on some sort of RESTful specification like http://microformats.org/wiki/rest/urls

If by 'web application' you mean 'website', as a developer you don't really have any choice. It's not you as a developer that makes the GET or POST requests, it's your user. They make the requests via their web browser.
When you request a web page by typing its URL into the address bar of the browser (or clicking a link, etc), the browser issues a GET request.
When you submit a web page using a button, you make a POST request.
In a GET request, additional data is sent in the query string. For example, the URL www.mysite.com?user=david&password=fish sends the two bits of data 'user' and 'password'.
In a POST request, the values in the form's controls (e.g. text boxes etc) are sent. This isn't visible in the address bar, but it's completely visible to anyone viewing your web traffic.
Both GET and POST are completely insecure unless SSL is used (e.g. web addresses beginning https).

Prioritise ASP.NET requests

I would be surprised if this is possible, but you never know.
Is there a way in which I could prioritise ASP.NET requests? For example, if the request is a NEW request (coming from Location X) I would like it to take priority over a request coming from a known location.
This will be running under IIS 7 so can I make use of the integrated pipeline to pre-process requests before they take threads out the ThreadPool?
Hmmm. Any feedback welcomed, even if it's to say No!
Thanks
Duncan

I don't think what you're after is possible in the truest sense of what you're asking for, but it might be possible to 'simulate' what you're after at the application level. John's right, they're processed first come, first served. But you might be able to give some kind of priority to your web application by setting a cookie for all visitors, and checking if that cookie is present before you render your homepage. If it is not present, you could assume that the request is new and therefore continue to render your homepage (or whatever). If it is present, you might choose to redirect them to another page (or perhaps a cached copy of your page).
Like I said, this isn't the 'truest' sense of what you are after, but if your homepage is particulary process intensive right now, and you want some way to separate recurring visitors from new visitors, this might do the trick.
Since you've asked, though - I'd have to ask you why it is necessary in your implementation to prioritise requests as you have mentioned. Is load on your web server a problem, and you want to appear more responsive to new customers?
Just hazarding a guess - interesting question, though! :)
Best,
Richard.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex