Update database with GET requests?

I've read online that you shouldn't update your database with GET requests for the following reasons:
GET requests are idempotent and safe
it violates the HTTP spec
GET should only read data from the server/database
Let's say that we have built a URL shortener service. When someone clicks on the link or pastes it into the browser's address bar, it will be a GET request.
So, if I want to update the stats of a shortened link in my database every time it's clicked, how can I do this if GET requests are idempotent?
The only way I can think of is making a PUT request from inside the server-side code that handles the GET request.
Is this a good practice or is there a better way to do this?

It seems as if you're mixing up a few things here.
While you shouldn't use GET requests to transfer sensitive data (as it is shown in the request URL and most likely logged somewhere along the way), there is nothing wrong with using them in your use case. You are only updating a variable server-side.
Just keep in mind that the request parameters are stored in the URL when using GET requests, and you should be OK.
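As a rough illustration, here is a minimal sketch of what that might look like, assuming a Python/Flask handler and an in-memory dict standing in for the real links table (all names here are made up for the example):

    # Sketch: the GET handler that resolves a short link also records the click
    # as a server-side side effect before redirecting to the target URL.
    from flask import Flask, abort, redirect

    app = Flask(__name__)

    # Hypothetical store mapping short codes to target URLs and click counts.
    links = {"abc123": {"url": "https://example.com/some/long/path", "clicks": 0}}

    @app.route("/<code>")
    def follow_short_link(code):
        link = links.get(code)
        if link is None:
            abort(404)
        link["clicks"] += 1  # update the stats while handling the GET
        return redirect(link["url"], code=302)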

Related

How can I communicate to the user's browser that a POST request it made is side-effect-free?

I have to add a page to my website that will be accessed via a POST request. The request is side-effect-free, hence it is safe for the user to use their browser's "Refresh" button on the page. The reason why it has to be POST and not GET is that the volume of data needed to characterise the request is large (it includes a collection of arbitrarily many GUIDs identifying resources to be operated upon at a later stage in the process).
When the user of a browser refreshes a page that was the result of a POST request, the browser will typically warn them that the form will be resubmitted and may cause an action to be repeated. This is not a concern in this case, because as I said, the action of requesting this page is side-effect-free. I therefore want to communicate to the user's browser that no such warning should be presented to the user if they use the "Refresh" function. How can I do this?
You cannot prevent the browser from warning the user about resubmitting a POST request.
References
Mozilla forums (Firefox's predecessor) discussed the feature extensively starting in 2002. Some discussion of other browsers also occurs. It is clear that the decision was made to enforce the feature, and although workarounds were suggested, they were not taken up.
Google Chrome (2008) and other subsequent browsers also included the feature.
The reasons for this relate to the difference between GET and POST in RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1 (1999).
GET
retrieve whatever information is identified by the Request-URI
POST
request that the origin server accept the
entity enclosed in the request as a new subordinate of the resource
This implies that whilst a GET request only retrieves data, a POST request modifies the data in some way. As per the discussion on the Mozilla forum, the decision was that allowing the warning to be disabled created more risks for the user than the inconvenience of keeping it.
Solutions
Instead, a solution is to use sessions to store the data from the POST request and redirect the user, with a GET request, to a URL that looks in the session data to find the original request parameters.
This assumes the server-side application has session support and that it is enabled.
1. User submits a POST request with data that generates a specific result: POST /results
2. Server stores that data in the session with a known key
3. Server responds with a 302 redirect to a chosen URL (it could be the same one)
4. The client requests the new page with a GET request: GET /results
5. Server identifies that the incoming GET request is asking for the results of the previous POST request and retrieves the data from the session using the known key
6. If the user refreshes the page, steps 4 and 5 are repeated
To make the solution more robust, the POST data could be assigned to a unique key that is passed as part of the path or query in the 302 redirect GET /results?set=1. This would enable multiple different pages to be viewed and refreshed, for example in different browser tabs. Consideration must be given to ensuring that the unique key is valid and does not allow access to other session data.
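As a rough sketch of the steps above, assuming a Python/Flask application with sessions enabled (the route names and session key are illustrative):

    # Post/Redirect/Get backed by a server-side session (Flask).
    from flask import Flask, redirect, request, session, url_for

    app = Flask(__name__)
    app.secret_key = "change-me"  # required for Flask sessions

    @app.route("/results", methods=["POST"])
    def submit_results():
        # Step 2: store the submitted parameters in the session under a known key.
        session["search_params"] = request.form.getlist("guid")
        # Step 3: 302 redirect so the browser lands on a page produced by a GET.
        return redirect(url_for("show_results"), code=302)

    @app.route("/results", methods=["GET"])
    def show_results():
        # Steps 4 and 5: read the parameters back from the session.
        # Refreshing this page simply repeats these steps.
        params = session.get("search_params", [])
        return f"Results for {len(params)} items"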
Some systems, such as Kibana, Grafana, pastebin.com and many others, go one step further. The POST request values are stored in a persistent data store and a unique short URL is provided to the user. The short URL can be used in GET requests and shared with other users to view the same result of what was originally a POST request.
You can solve this problem by implementing the Post/Redirect/Get pattern.
You typically get a browser warning when trying to re-send a POST request, for security reasons. Consider a form where you input personal data to register an account or order a product. If you double-sent your data, it might happen that you register twice or buy the same thing two times (of course, this is just a theoretical example). Thus, the user should be warned when trying to send the same POST request several times. This behaviour is intended and cannot be disabled, but it can be avoided by using the aforementioned PRG pattern.
See the Post/Redirect/Get diagram on Wikipedia (published under the LGPL).
In simple words, this pattern can be used to avoid double submissions of form data that could possibly cause undesired results. You have to configure your server to redirect the affected incoming POST requests using the status code 303 ("see other"). Then the user will be redirected (using a GET request) to a confirmation page, showing that the request has been successful and will now be processed. If the user now reloads the page, he / she will be redirected to the same page without re-submitting the POST request.
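A minimal sketch of just the redirect step, assuming a Python/Flask application (the route names are illustrative and not from the original answer):

    # Answer the POST with 303 "See Other" so the browser follows up with a GET.
    from flask import Flask, redirect, url_for

    app = Flask(__name__)

    @app.route("/order", methods=["POST"])
    def place_order():
        # ... process the submitted form data here ...
        return redirect(url_for("order_confirmation"), code=303)

    @app.route("/order/confirmation", methods=["GET"])
    def order_confirmation():
        # Reloading this page re-issues only this GET, so no resubmission warning.
        return "Your request was received and is being processed."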
However, this strategy might not always work. If the server has not yet received the first submission (because of traffic, for instance) and the user re-submits, a second POST request could still be sent.
If you provide more information on your tech stack, I can expand my answer by adding specific code samples.
You can't prevent all browsers from showing this "Are you sure you want to re-submit this form?" popup when the user refreshes a page that is the result of a POST request. So you will have to turn this POST request into a GET request if you want to prevent this popup when your users hit F5 on that page.
And for a search form, which you kind of admitted this was for, turning a POST into a GET has its own problems.
For starters, are you sure you need POST to begin with? Is the data really too large to fit in the query string? Taking a reasonable limit of 1024 characters, that's around 30 GUIDs (give or take some space for the repeated &q=). And why do the search parameters need to be GUIDs to begin with? If you can map them or look them up somehow, you could perhaps limit the size of each parameter to a handful of characters instead of the 32 of a non-dashed GUID, and with 5 characters per key you could suddenly fit 200 parameters in the query string.
Still not enough? Then you need a POST indeed.
One approach, mentioned in comments, is using AJAX, so your search form doesn't actually submit, but instead it sends the query data in the background through a JavaScript HTTP POST request and updates the page with the results. This has the benefit that refreshing the page doesn't prompt, as there's only a GET as far as the browser is concerned, but there's one drawback: search results don't get a unique URL, so you can't cache, bookmark or share them.
If you don't care about caching or URL bookmarking, then AJAX definitely is the simplest option here and you need to read no further.
For all non-AJAX approaches, you need to persist the query parameters somewhere, enabling a Post/Redirect/Get pattern. This pattern ends up with a page that is the result of a GET request, which users can refresh without said popup. What the other answers are being quite handwavy about is how to properly do this.
Options are:
Serverside session
When POSTing to the server, you can let the server persist the query parameters in the session (all major server-side frameworks enable you to use sessions), then redirect the user to a generic /search-results page. On the server side, that page reads the data from the session and presents the user with results built by querying the database with the query parameters from the session.
Drawbacks:
Sessions generally time out, and they do so for good reasons. If your user hits F5 after, say, 20 minutes, their session data is gone, and so are their query parameters.
Sessions are shared between browser tabs. If your user is searching for thing A in tab 1 and for thing B in tab 2, the parameters of the tab that was submitted last will overwrite the earlier ones when those tabs are refreshed.
Sessions are per browser. There's generally no trivial way to share sessions (apart from putting the session ID in the URL, but see the first bullet), so you can't bookmark or share your search results.
Local storage / cookies
You could think "but cookies can contain more data than the query string", but just no. Apart from having a size limit as well, they're also shared between tabs, can't easily be shared between users, and can't be bookmarked.
Local storage also isn't an option, because while that can contain way more data - it doesn't get sent to the server. It's local storage.
Serverside persistent storage
If your search queries actually are that complex that you need multiple KB of query parameters, then you could probably benefit from persisting the query parameters in a database.
So for each search request, you create a new search_query database record that contains the appropriate parameters for the query to execute. And, given that search results aren't private, you could even write some code that first looks up whether the given parameter combination has been used before and reuses that record.
So you get a unique search_id that points to a set of parameters with which you can perform a query. Now you can redirect your user, so they perform a GET request to this page:
/search-results?search_id=Xxx
And there you render the results for the given query. Benefits:
You can cache, bookmark and share the URL /search-results?search_id=Xxx
You can refresh the page displaying the search results without an annoying popup
Each browser tab displays its own search results
Of course this approach also has drawbacks:
Unless you use an unguessable key for search_id, users can enumerate earlier searches by other users
Each search costs permanent serverside storage, unless you decide to evict earlier searches based on some criteria
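A rough sketch of this approach, assuming a Python/Flask application and an in-memory dict standing in for the search_query table (names are illustrative; a real implementation would use a database):

    # Persist the POSTed parameters under a search_id, then redirect to a GET URL.
    import uuid
    from flask import Flask, redirect, request, url_for

    app = Flask(__name__)
    saved_searches = {}  # search_id -> list of GUIDs (use a real table in practice)

    @app.route("/search", methods=["POST"])
    def create_search():
        search_id = uuid.uuid4().hex  # unguessable key, so searches can't be enumerated
        saved_searches[search_id] = request.form.getlist("guid")
        return redirect(url_for("search_results", search_id=search_id), code=303)

    @app.route("/search-results", methods=["GET"])
    def search_results():
        guids = saved_searches.get(request.args.get("search_id"), [])
        # ... run the real query with these parameters and render the results ...
        return f"Showing results for {len(guids)} GUIDs"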

Why is using a HTTP GET to update state on the server in a RESTful call incorrect?

OK, I already know all the reasons on paper why I should not use an HTTP GET when making a RESTful call to update the state of something on the server, thus possibly returning different data each time. And I know this is wrong for the following 'on paper' reasons:
HTTP GET calls should be idempotent
N > 0 calls should always GET the same data back
Violates HTTP spec
HTTP GET call is typically read-only
And I am sure there are more reasons. But I need a simple, concrete example as justification, other than "Well, that violates the HTTP spec!" ...or at least I am hoping for one. I have also already read the following, which are more along the lines of the list above: Does it violate the RESTful when I write stuff to the server on a GET call? &
HTTP POST with URL query parameters -- good idea or not?
For example, can someone justify the above and why it is wrong/bad practice/incorrect to use a HTTP GET say with the following RESTful call
"MyRESTService/GetCurrentRecords?UpdateRecordID=5&AddToTotalAmount=10"
I know it's wrong, but hopefully it will help provide an example to answer my original question. So the above would update recordID = 5 with AddToTotalAmount = 10 and then return the updated records. I know a POST should be used, but let's say I did use a GET.
How exactly, to answer my question, can this cause an actual problem? Other than all the violations from the bullet list above, how can using an HTTP GET to do the above cause some real issue? Too many times I come into a scenario where I can justify things with "Because the doc said so", but I need justification and a better understanding on this one.
Thanks!
The practical case where you will have a problem is that an HTTP GET is often retried in the event of a failure by the HTTP implementation. So in real life you can get situations where the same GET is received multiple times by the server. If your update is idempotent, there will be no problem, but if it's not idempotent (like adding some value to an amount, as yours does), then you could get multiple (undesired) updates.
HTTP POST is never retried, so you would never have this problem.
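To make the retry hazard concrete, here is a small hypothetical sketch in plain Python contrasting an idempotent update with an additive one like AddToTotalAmount:

    # The same request delivered twice (for example, an automatic retry) is harmless
    # for an idempotent update, but double-counts for an additive one.
    records = {5: {"total": 100}}

    def set_total(record_id, value):      # idempotent: repeating it changes nothing
        records[record_id]["total"] = value

    def add_to_total(record_id, amount):  # not idempotent: repeating it double-counts
        records[record_id]["total"] += amount

    set_total(5, 110)
    set_total(5, 110)                     # retried request
    print(records[5]["total"])            # 110 - no harm done

    records[5]["total"] = 100
    add_to_total(5, 10)
    add_to_total(5, 10)                   # retried request
    print(records[5]["total"])            # 120 - the amount was added twice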
If some form of search engine spiders your site it could change your data unintentionally.
This happened in the past with Google's Desktop Search, which caused people to lose data because they had implemented delete operations as GETs.
Here is an important reason why GETs should be idempotent and not be used for updating state on the server: Cross-Site Request Forgery attacks. From the book Professional ASP.NET MVC 3:
Idempotent GETs
Big word, for sure — but it’s a simple concept. If an
operation is idempotent, it can be executed multiple times without
changing the result. In general, a good rule of thumb is that you can
prevent a whole class of CSRF attacks by only changing things in your
DB or on your site by using POST. This means Registration, Logout,
Login, and so forth. At the very least, this limits the confused
deputy attacks somewhat.
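To illustrate the point, here is a hypothetical sketch (Python/Flask) of a state-changing GET endpoint; any third-party page that embeds the URL, for example in an image tag, would trigger it in a logged-in victim's browser:

    # CSRF-prone design: logout (a state change) is reachable via a plain GET.
    # A page elsewhere containing <img src="https://victim.example/logout">
    # silently logs the visitor out of the victim site.
    from flask import Flask, session

    app = Flask(__name__)
    app.secret_key = "change-me"

    @app.route("/logout")  # GET by default
    def logout():
        session.clear()
        return "Logged out"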
One more problem: if the GET method is used, the data is sent in the URL itself. In the web server's logs, this data gets saved along with the request path. Now suppose someone has access to, or reads, those log files: your data (user IDs, passwords, keywords, tokens, etc.) is revealed. This is dangerous and has to be taken care of.
In the server's log file, headers and body are not logged, but the request path is. So with the POST method, where data is sent in the body and not in the request path, your data remains safe.
I think that reading this resource: http://www.servicedesignpatterns.com/WebServiceAPIStyles could be helpful to you to understand the difference between a message API and a resource API.

Using GET for a non-idempotent request

Simply put, I have a website where you can sign up as a user and add data. Currently it only makes sense to add specific data once, so an addition should be idempotent, but theoretically you could add the same data multiple times. I won't get into that here.
According to RFC 2616, GET requests should be idempotent (really nullipotent). I want users to be able to do something like visit
http://example.com/<username>/add/?data=1
And this would add that data. It would make sense to have a PUT request do this with REST, but I have no idea how to make a PUT request with a browser and I highly doubt most people do or would want to bother to. Even using POST would be appropriate, but this has a similar problem.
Is there some technically correct way to allow users to add data using only GET (e.g. by visiting the link manually, or allowing external websites to use the link)? When they visit this page I could make my own POST/PUT request with either JavaScript or cURL, but this still seems to violate the spirit of idempotent GET requests.
Is there some technically correct way to allow users to add data using
only GET ... ?
No matter how you go about letting clients access it, you'll end up violating RFC2616. It's ultimately up to you how you handle requests (nothing's going to stop you from doing this), but keep in mind that if you go against the HTTP specification, you might cause unexpected side-effects to clients and proxies who do abide by it.
Also see: Why shouldn't data be modified on an HTTP GET request?
As far as not being able to PUT from the browser, there are workarounds for that [1], [2], most of which use POST but also pass some sort of _method request parameter that's intercepted by the server and routed to the appropriate server-side action.
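A rough sketch of that kind of workaround, assuming a Python/Flask application (the _method parameter handling and routes are illustrative):

    # Browsers can only send GET/POST from a plain form, so a _method parameter
    # tells the server which verb was actually intended.
    from flask import Flask, request

    app = Flask(__name__)

    class MethodOverrideMiddleware:
        """Rewrite POST requests carrying _method=PUT or _method=DELETE in the query string."""
        def __init__(self, wsgi_app):
            self.wsgi_app = wsgi_app

        def __call__(self, environ, start_response):
            if environ["REQUEST_METHOD"] == "POST":
                query = environ.get("QUERY_STRING", "")
                for verb in ("PUT", "DELETE"):
                    if f"_method={verb}" in query:
                        environ["REQUEST_METHOD"] = verb
                        break
            return self.wsgi_app(environ, start_response)

    app.wsgi_app = MethodOverrideMiddleware(app.wsgi_app)

    @app.route("/<username>/add", methods=["PUT"])
    def add_data(username):
        # Reached by POST /<username>/add?_method=PUT&data=1 from a plain HTML form.
        return f"Stored {request.args.get('data')} for {username}"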

Is it okay to set a cookie with a HTTP GET request?

This might be a bit of an ethical question, but I'm having quite a discussion in the office about the following issue:
Is it okay to set a cookie with an HTTP GET request? Because whenever an HTTP request changes something in the application, you should use a POST request. HTTP GET should only be used to retrieve data identified by the Request-URI.
In this case, the application doesn't change, but because the cookie is altered, the user might get a different experience when the page loads again, meaning that the HTTP GET request changed the application behaviour (nothing changed server-side though).
Get request reference
The discussion started because we want to use a normal anchor element to set a cookie.
The problem with GETs, especially if they are on an <a> tag, is when they get spidered by the likes of Google.
In your case, you'd needlessly be creating cookies that will, more than likely, never get used.
I'd also argue that the GET rule isn't really about changing the application, but more about changing data. I appreciate the subtle distinction with the cookie (i.e. you are not changing data on YOUR system), but generally it's a good rule to have, and irrespective of where the data is stored, GET shouldn't really be used to change it.
The user can always have a different experience when they issue another GET request - you do not expect an (imagined) time service, "GET /time/current", to always return the same set of data.
Also, it is not said that you are not allowed to change server-side state in response to GET requests - it's perfectly 'legal' to increment a page hit counter, for example, even if you store it in the database.
Consider section 9.1.1, Safe Methods, of RFC 2616:
Naturally, it is not possible to ensure that the server does not
generate side-effects as a result of performing a GET request; in
fact, some dynamic resources consider that a feature. The important
distinction here is that the user did not request the side-effects, so
therefore cannot be held accountable for them.
Also, I would say it is perfectly acceptable to change or set a cookie in response to a GET request, because you are just returning some data.
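For reference, a minimal sketch (Python/Flask, names illustrative) of a GET handler that both increments a hit counter and sets a cookie, the kind of server-side side effect discussed above:

    # A GET handler with side effects the client did not explicitly request.
    from flask import Flask, make_response

    app = Flask(__name__)
    hit_counter = {"count": 0}  # stand-in for a database counter

    @app.route("/page")
    def page():
        hit_counter["count"] += 1
        resp = make_response(f"You are visitor number {hit_counter['count']}")
        resp.set_cookie("theme", "dark", max_age=3600)  # changes the next visit's experience
        return resp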

Stop Direct Page Calls to Ajax Pages

Is there a "clever" way of stopping direct page calls in ASP.NET? (Page functionality, not the page itself)
By clever, I mean not having to add hashes between pages to stop the AJAX pages being called directly. In a nutshell, this is stopping users from accessing the Ajax pages without the request coming from one of your website's pages in a legitimate way. I understand that nothing is impossible to break; I am simply interested in seeing what other interesting methods there are.
If not, is there any way that one could do it without using sessions/cookies?
Have a look at this question: Differentiating Between an AJAX Call / Browser Request
The best answer from the above question is to check for a requested-by or custom header.
Ultimately, your web server is receiving requests (including headers) of what the client sends you - all data that can be spoofed. If a user is determined, then any request can look like an AJAX request.
I can't think of an elegant method to prevent this (there are inelegant and probably non-perfect methods whereby you provide a hash of some sort of request counter between ajax and non-ajax requests).
Can I ask why your application is so sensitive to "ajax" pages being called directly? Could you design around this?
You can check the request headers to see if the call was initiated by AJAX. Usually, you should find that x-requested-with has the value XMLHttpRequest. Or, in the case of ASP.NET AJAX, check whether ScriptManager.IsInAsyncPostBack == true. However, I'm not sure about preventing the request in the first place.
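A minimal sketch of that header check, with Python/Flask standing in for the ASP.NET handler (and bear in mind the header can be spoofed):

    # Reject requests that lack the X-Requested-With header most AJAX libraries add.
    # This is a convenience filter, not real security.
    from flask import Flask, abort, jsonify, request

    app = Flask(__name__)

    @app.route("/ajax/data")
    def ajax_data():
        if request.headers.get("X-Requested-With") != "XMLHttpRequest":
            abort(403)  # looks like a direct page call rather than an AJAX request
        return jsonify({"items": [1, 2, 3]})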
Have you looked into header authentication? If you only want your app to be able to make ajax calls to certain pages, you can require authentication for those pages...not sure if that helps you or not?
Basic Access Authentication
or the more secure
Digest Access Authentication
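As a sketch of the Basic option, assuming a Python/Flask endpoint with placeholder credentials:

    # Require HTTP Basic credentials on the AJAX endpoint.
    from flask import Flask, Response, request

    app = Flask(__name__)

    @app.route("/ajax/secure-data")
    def secure_data():
        auth = request.authorization
        if auth is None or (auth.username, auth.password) != ("app", "secret"):
            return Response("Authentication required", 401,
                            {"WWW-Authenticate": 'Basic realm="ajax"'})
        return "sensitive payload"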
Another option would be to append some sort of identifier to your URL query string in your application before requesting the page, and have some sort of authentication method on the server side.
I don't think there is a way to do it without using a session. Even if you use an Http header, it is trivial for someone to create a request with the exact same headers.
Using session with ASP.NET Ajax requests is easy. You may run into some problems, like session expiration, but you should be able to find a solution.
With sessions you will be able to guarantee that only logged-in users can access the Ajax services. When servicing an Ajax request simply test that there is a valid session associated with it. Of course a logged-in user will be able to access the service directly. There is nothing you can do to avoid this.
If you are concerned that a logged-in user may try to contact the service directly in order to steal data, you can add a rate limit to the service. For example, do not allow users to access the service more often than once a minute (or whatever rate is needed for the application to work properly).
See what Google and Amazon are doing for their web services. They allow you to contact them directly (even providing APIs to do this), but they impose limits on how many requests you can make.
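A rough sketch of such a per-session rate limit, assuming a Python/Flask service with sessions (the one-minute window and route are illustrative):

    # Refuse to serve a logged-in session more than once per minute.
    import time
    from flask import Flask, abort, session

    app = Flask(__name__)
    app.secret_key = "change-me"

    RATE_LIMIT_SECONDS = 60  # illustrative window

    @app.route("/ajax/report")
    def report():
        if "user_id" not in session:
            abort(401)  # only logged-in users may call the service
        now = time.time()
        if now - session.get("last_call", 0) < RATE_LIMIT_SECONDS:
            abort(429)  # too many requests from this session
        session["last_call"] = now
        return "report data"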
I do this in PHP by declaring a variable in a file that's included everywhere, and then checking whether that variable is set in the AJAX call file.
This way, you can't directly call the file ever because that variable will never have been defined.
This is the "non-trivial" way, hence it's not too elegant.
The only real idea I can think of is to keep track of every link (as in, everything does a postback and then a Response.Redirect). That way you could keep a static List<> or something of IP addresses (and possibly browser IDs and such) that says which pages that visitor is currently allowed to access, along with a timeout to keep them from going straight to a page three days from now.
I recommend rethinking your design to be sure this is really needed, though. And also note that IPs and such can be spoofed.
Also, if you follow this route, be sure to read up on when static variables get disposed and such. You wouldn't want one of those annoying "your session has expired" messages when they have been using the site for 10 minutes.
