Should I stop redirecting after successful POST or PUT requests? - http

It seems common in the Rails community, at least, to respond to successful POST, PUT or DELETE requests by redirecting instead of returning success. For instance, if I PUT a legal change to my user profile, the idiomatic response would be a 302 Redirect to the profile page.
Isn't this wrong? Shouldn't we be returning 200 OK from the request? Or a 201 Created, in the case of a POST request? Either of those, per the HTTP/1.1 status code definitions, is allowed (or required) to include a response body anyway.
I guess I'm wondering, before I go and "fix" my application, whether there is a darn good reason why the community has gone the way of redirects instead of success responses.

I'll assume, your use of the PUT verb notwithstanding, that you're talking about a web app that will be accessed primarily through the browser. In that case, the usual reason for following up a POST with a redirect is the post-redirect-get pattern, which avoids duplicate requests caused by a user refreshing or using the back and forward controls of their browser. It seems that in many instances this pattern is overloaded by redirecting not to a success page, but to the next most likely place the user would visit. I don't think either way you mention is necessarily wrong, but doing the redirect may be more user-friendly at the expense of not strictly adhering to the semantics of HTTP.

It's called the POST-Redirect-GET (PRG) pattern. It prevents clients from accidentally re-executing non-idempotent requests, for example when navigating back and forth in the browser's history.
It's good general web development practice and isn't specific to RoR. I'd just keep it as is.
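For illustration, here is a minimal sketch of the pattern using only the Python standard library; the /thanks page and the form handling are made up for the example. Rails' redirect_to does the same dance with a 302, which browsers treat like a 303 in practice (more on that in the 302-vs-307 question below).

    # Minimal POST-Redirect-GET sketch (Python standard library only).
    # The endpoints are hypothetical; the point is the 303 + Location
    # response to a successful POST.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class PRGHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read and process the form body (processing omitted here).
            length = int(self.headers.get("Content-Length", 0))
            _form_data = self.rfile.read(length)

            # 303 See Other: the browser follows up with a GET, so a
            # refresh re-requests the result page, not the POST.
            self.send_response(303)
            self.send_header("Location", "/thanks")
            self.end_headers()

        def do_GET(self):
            # The page the redirect lands on; safe to reload or bookmark.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Thanks, we got your submission.\n")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), PRGHandler).serve_forever()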

In a perfect world, yes, probably. However, HTTP clients and servers are a mess when it comes to standardization and don't always agree on proper protocol. Redirecting after a POST helps avoid problems like duplicate form submissions.

Related

Serving 404 directly

So I have an Nginx server set up which is supposed to redirect all http to https (and non-www to www) using 4 server blocks.
The issue is that any 404 or non-existent http URL first gets a 301 redirect to the https version that would exist if the page did (hence creating an extra URL and redirect).
See example:
1) http://example.com/thisurldoesntexit
301 Redirect
2) https://example.com/thisurldoesntexit
404
3) https://example.com/notfound
Is there a way to redirect user directly to a https 404 (URL 3)?
First of all, as has already been pointed out, doing a 301 redirect from a non-existent page to a single /notfound moniker is really bad practice, and is likely against the RFCs.
What if the user simply mistyped a single character of a long URL? Modern browsers make it non-trivial to go back to what was typed in order to correct it. The user would have to decide whether your site is worth retyping from scratch, or whether your competitor might have thought of a better experience.
What if the user simply followed a broken link that is broken in a very obvious way and could easily be fixed? E.g., http://www.example.org/www.example.com/page, where the creator mistyped an absolute URL as a relative one, or a URI like /page.html., with an extra dot at the end. Either way, you'd be totally confusing the user about what's going on and offering a terrible user experience, when, left alone, the URL could easily have been corrected promptly.
But, more importantly, what real problem are you actually trying to solve?!
For better or worse, it's pretty common practice to redirect indiscriminately from the http scheme to the https scheme, without regard to whether a given page exists. In fact, if you employ HSTS, content served over http effectively becomes meaningless; a browser with the policy in place will never even request anything over http from then on.
Undoubtedly, in order to know whether or not a given page exists, you must consult the backend. As such, you might as well do the http-to-https redirect from within your backend; but it'll likely tie up valuable server resources for little to no extra benefit.
Moreover, the presence or absence of a page may be dictated by the contents of the cookies. As such, if you require your backend to discern whether a page exists for an http request, you'll effectively be leaking private information that https was meant to protect in the first place. (In turn, if your site has no such private information, then maybe you shouldn't be using https in the first place.)
So, overall, the whole approach is just a REALLY, REALLY bad idea!
Consider instead:
Do NOT do a 301 redirect from all non-existent pages to a single /notfound page. Very bad practice, very bad UX.
It is totally OK to do an indiscriminate redirect from http to https, without accounting for whether or not the page exists (see the sketch below). In fact, it's not only okay, it's the way God intended: an adversary should not be capable of discerning whether a given page exists on an https-based site, so if you do find and implement a solution for your "problem", you'll effectively create a security vulnerability and a data leak.
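As a minimal sketch of that indiscriminate redirect (Python standard library for illustration; the hostname and port are placeholders, and a real Nginx deployment would use the standard return 301 https://www.example.com$request_uri; block to the same effect):

    # Indiscriminate http -> https redirect: every request gets the same
    # 301, whether or not the page exists; only the https backend ever
    # decides that. Hostname and port are placeholders.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class RedirectAllToHTTPS(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(301)
            # Carry the path and query over verbatim; no existence check.
            self.send_header("Location", "https://www.example.com" + self.path)
            self.end_headers()

        do_HEAD = do_GET  # HEAD gets the same redirect, minus any body

    if __name__ == "__main__":
        HTTPServer(("", 8080), RedirectAllToHTTPS).serve_forever()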
Use the https://www.drupal.org/project/fast_404 module to serve 404 pages directly without much overhead.
I'd suggest redirecting to a 404 page is a poor choice, and you should instead serve the 404 on the incorrect URL.
My reasons for stating this are:
By redirecting away from the page, you are issuing headers that implicitly say "the content does not exist at this URL, but it does over here". I'm not sure how the various search engines would react to being redirected to a 404.
I can speak from my own experience as a user: having the URL change on me when I've mistyped a single character is very frustrating, as I then need to spend the time typing out the entire URL again.
You avoid having logic in your .htaccess file (or equivalent) to judge a page as a 404. This greatly simplifies your initial logic (which, by the by, gets evaluated on every single page load), and it removes far more redirects than just the odd http://badurl to https://badurl to https://404 chain.
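Putting that together with the answer above, a sketch of the https side serving the 404 in place (paths and content are placeholders): the unknown URL answers 404 itself, the address bar is untouched, and nothing ever redirects to a /notfound page.

    # Serve the 404 directly at the requested URL; no redirect involved.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    KNOWN_PATHS = {"/", "/about"}  # placeholder routing table

    class DirectNotFound(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path in KNOWN_PATHS:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"page content here\n")
            else:
                # 404 in place: the user can correct a typo in the
                # address bar without retyping the whole URL.
                self.send_error(404, "Not Found")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8443), DirectNotFound).serve_forever()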

What happens if a 302 URI can't be found?

If I make an HTTP request for index.html on http://www.example.com, but that URL has a 302 redirect in place pointing to http://www.foo.com/index.html, what happens if the redirect target (http://www.foo.com/index.html) isn't available? Will the user agent try the original URL (http://www.example.com/index.html), or just return an error?
Background to the question: I manage a legacy site that supports a few existing customers but doesn't allow new sign-ups. Pretty much all the pages are redirected (using 302s rather than 301s, for some unknown reason...) to a newer site. This includes the sign-up page. On one of the pages that isn't redirected there is still a link to the sign-up page, which itself links through to a third-party payment page (i.e. on another domain). Last week our current site went down for a couple of hours, and in that period someone successfully signed up through the old site. The only way I can imagine this happened is that, if a 302 doesn't find its intended URL, some (all?) user agents bypass the redirect and go to the originally requested URL.
By the way, I'm aware there are many better ways to handle the particular situation we're in with the two sites. We're on it! This is just one of those weird situations I want to get to the bottom of.
You should receive a 404 Not Found status code.
Since HTTP is a stateless protocol, there is no real connection between two requests of a user agent. The redirection status codes are just a way for servers to politely tell their clients that the resource they were looking for is somewhere else now. The clients, however, are in no way obliged to actually request the resource from that other URL.
Oh, the signup page is at that URL now? Well then I don't want it anymore... I'll go and look at some kittens instead.
Moreover, even if the client decides to request the new URL (which it usually does ^^), this can be considered a completely new communication between server and client. Neither server nor client should remember that there was a previous request which resulted in a redirection status code. Instead, the current request should be treated as if it were the first (and only) request. And what happens when you request a URL that cannot be found? You get a 404 Not Found status code.
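You can watch this happen with any HTTP client. A sketch using Python's urllib and the question's hypothetical URLs:

    # A client following a 302 whose target is missing simply gets the
    # target's 404 back; it never falls back to the original URL.
    import urllib.request
    import urllib.error

    try:
        # urlopen follows the 302 to http://www.foo.com/index.html itself...
        urllib.request.urlopen("http://www.example.com/index.html")
    except urllib.error.HTTPError as err:
        # ...and what surfaces is the final hop's status: 404 Not Found.
        print(err.code, err.reason)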

How to work around POST being changed to GET on 302 redirect?

Some parts of my website are accessible only via HTTPS (not the whole website; a security vs. performance compromise), and that HTTPS is enforced with a 302 redirect on requests to the secure part that arrive over plain HTTP.
The problem is that all major browsers will automatically switch a POST to a GET on a 302 redirect (AFAIK this should only happen on a 303, but nobody seems to care). An additional issue is that all the POST data is lost.
So what are my options here, other than accepting POSTs to the secure site over HTTP and redirecting afterwards, or changing loads of code to make sure all POSTs to the secure part of the website go over HTTPS from the beginning?
You are right, that is the only reliable way. The POST request should go over an https connection from the very beginning. Moreover, it is recommended that the form that leads to such a POST also be loaded over https; usually the first form after which you have an https connection is a login form. Browsers apply different security restrictions to pages loaded over http and over https, which lowers the risk of executing some malicious script in a context that holds sensitive data.
I think that's what 307 is for. RFC 2616 does say:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
but it says the same thing about 302 and we know what happens there.
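For what it's worth, a sketch of what issuing the 307 looks like (Python standard library for illustration; host and port are placeholders):

    # Answer a plain-http POST to the secure area with a 307, which
    # (unlike a browser-mangled 302) tells the client to repeat the
    # request with the same method and body against the https URL.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class UpgradeToHTTPS(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain the request body so the connection stays well-behaved.
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)

            self.send_response(307)  # "repeat this exact request over there"
            self.send_header("Location", "https://www.example.com" + self.path)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), UpgradeToHTTPS).serve_forever()

Note, though, that by the time this 307 goes out, the POST body has already crossed the network unencrypted, which is exactly the bigger problem the next answer describes.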
Unfortunately, you have a bigger problem than browsers not dealing with response codes the way the RFC's say, and that has to do with how HTTP works. Simplified, the process looks like this:
The browser sends the request
The browser indicates it has sent the entire request
The server sends the response
Presumably your users are sending some sensitive information in their post and this is why you want them to use encryption. However, if you send a redirect response (step 3) to the user's unencrypted POST (step 1), the user has already sent all of the sensitive information out unencrypted.
It could be that you don't consider the information the user sends that sensitive, and only consider the response that you send to be sensitive. However, this turns out not to make sense. Sensitive information should be available only to certain individuals, and the information used to authenticate the user is necessarily part of the request, which means your response is now available to anyone. So, if the response is sensitive, the request is sensitive as well.
It seems that you are going to want to change lots of code to make sure all secure posts use HTTPS (you probably should have written them that way in the first place). You might also want to reconsider your decision to only host some of your website on HTTPS. Are you sure your infrastructure can't handle using all HTTPS connections? I suspect that it can. If not, it's probably time for an upgrade.

Web security -- HTTP-Location = HTTP-Referrer if outside domain? Why?

What is the point of doing this?
I want a reason why it's a good idea to send a person back to where they came from if the referrer is outside of the domain. I want to know why a handful of websites out there insist that this is good practice. It's easily exploitable, easily bypassed by anyone who's logging in with malicious intent, and just glares in my face as a useless "security" measure. I don't like to have my biased opinions on things without other input, so explain this one to me.
The request headers are only as trustworthy as your client; why would you use them as a means of validation?
There are three reasons why someone might want to do this. Checking the referer is a method of CSRF prevention. A site may not want people to link to sensitive content and thus uses this to bounce the browser back. It may also be used to prevent spiders from accessing content that the publisher wishes to restrict.
I agree it is easy to bypass this referer restriction in your own browser using something like TamperData. It should also be noted that the browser's HTTP request will not contain a referer if you're coming from an https:// page to an http:// page.
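For what it's worth, the check being debated is tiny. A sketch of roughly what such sites do (Python for illustration; the host name is a placeholder):

    # Sketch of the referer check under discussion. As noted above it is
    # advisory at best: the header is client-supplied, trivially forged,
    # and legitimately absent (e.g. on https -> http navigation).
    from urllib.parse import urlparse

    ALLOWED_HOST = "www.example.com"  # placeholder for the site's own host

    def referer_is_local(referer_header: str) -> bool:
        if not referer_header:
            return False  # many legitimate requests send no Referer at all
        return urlparse(referer_header).hostname == ALLOWED_HOST

    # A site doing the behaviour in the question would, when this check
    # fails, send the browser back to the URL named in the Referer.
    print(referer_is_local("https://www.example.com/page"))   # True
    print(referer_is_local("https://evil.example.net/page"))  # False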

Do I need to use http redirect code 302 or 307?

Suppose I have a page on my website to show media releases for the current month
http://www.mysite.com/mediareleases.aspx
And for reasons that are too mundane to go into*, this page MUST be given a query string with the current day of the month in order to produce this list:
http://www.mysite.com/mediareleases.aspx?prevDays=18
As such I need to redirect clients requesting http://www.mysite.com/mediareleases.aspx to http://www.mysite.com/mediareleases.aspx?prevDays=whateverDayOfTheMonthItIs
My question is, if I want google to index the page without the query parameter, should I use status code 302 or 307 to perform the redirect?
Both indicate that the page has "temporarily" moved - which is what I want because the page "moves" every day if you get my meaning.
[*] I'm using a feature of a closed-source .NET CMS so my hands are tied.
Google's documentation seems to indicate that both 302 and 307 are treated equivalently, and that "Googlebot will continue to crawl and index the original location."
But in the face of ambiguity, you might as well dig into the RFCs and try to do the Right Thing, with the naïve hope that the crawlers will do the same. In this case, RFC 2616 § 10.3 contains nearly identical definitions for each response code, with one exception:
302: Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
307: Since the redirection MAY be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
Which does not strike me as a significant distinction. My reading is that 302 instructs clients that webmasters are untrustworthy, and 307 explicitly tells webmasters that clients will not trust them, so they may freely alter the redirect.
I think the more telling point is the note in 302's definition:
Note: RFC 1945 and RFC 2068 specify that the client is not allowed to change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method. The status codes 303 and 307 have been added for servers that wish to make unambiguously clear which kind of reaction is expected of the client.
Which, to me, indicates that 302 and 307 are largely equivalent, but that HTTP/1.0 clients failed to implement 302 correctly the first time around.
Short answer: neither. In most cases the code you really want to use is 303.
For the long answer, first we need some background.
When getting a redirect code the client can (A) load the new location using the same request type or (B) it can overwrite it and use GET.
The HTTP 1.0 spec did not have 303 and 307; it only had 302, which mandated the (A) behavior. But in practice it was discovered that (A) led to a problem with submitted forms.
Say you have a contact form; the visitor fills it in and submits it, and the client gets a 302 to a page saying "thanks, we'll get back to you". The form was sent using POST, so the thanks page is also loaded using POST. Now suppose the visitor hits reload: the request is re-sent the same way it was made the first time, i.e. as a POST (with the same payload in the body). End result: the form gets submitted twice (and once more for every further reload). Even if the client asks the user for confirmation before doing that, it's still annoying in most cases.
This problem became so prevalent that client vendors decided to override the spec and issue GET requests for the redirected location. Basically, it was an oversight in the HTTP 1.0 spec. What clients needed most was a 303 (and behavior (B) above), but instead they only got 302 (and (A)).
If HTTP 1.0 had offered both 302 and 303, there would have been no problem. But it didn't, so the result was a 302 that nobody used correctly. So HTTP 1.1 added 303 (badly needed) but also decided to add 307, which is technically identical to 302 but is a sort of "explicit 302"; it says "yes, I know the issues surrounding 302, I know what I'm doing, give me behavior (A)".
Now, back to our question. You see now why in most cases you will want 303.
Cases where you want to preserve the request type are very rare. And if you do find yourself in such a case, the answer is simple: use 302. Either the client speaks HTTP 1.0, in which case it can't understand 307; or it speaks HTTP 1.1, which means it has no reason to preserve the rebellious behavior of old clients, i.e. it implements 302 correctly, so use it!
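Tying this back to the question: both the bare URL and the query-string URL are fetched with GET, so the method-rewriting distinction above never even comes into play; 302, 303, and 307 all behave identically here, and the choice mainly signals intent to crawlers. A sketch of the redirect itself, with the path from the question (handler and port are placeholders):

    # GET /mediareleases.aspx is sent to the same page with today's day
    # of the month in the query string. Both hops are GETs, so any of
    # 302/303/307 would behave the same; 302 is used as the generic
    # "temporary" signal, since the target changes every day.
    import datetime
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class MediaReleases(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/mediareleases.aspx":
                day = datetime.date.today().day
                self.send_response(302)
                self.send_header("Location",
                                 f"/mediareleases.aspx?prevDays={day}")
                self.end_headers()
            else:
                self.send_response(200)
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), MediaReleases).serve_forever()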
5 years on... note that the behaviour of 307 was updated by RFC 7231 § 6.4.7 in June 2014, and is now significantly different from a 302, in that the method may not change:
The 307 (Temporary Redirect) status code indicates that the target
resource resides temporarily under a different URI and the user agent
MUST NOT change the request method if it performs an automatic
redirection to that URI.
Probably not an issue for the original question, but may be relevant to others who come across this question just looking for the difference.
I feel your pain. As for a solution, it's hard to say what search engines will do. It seems that each one has its own way of handling redirects. This link suggests that a 302 will index the contents of the redirected page but still use the main page link, but it's not clear what a 307 will do.
Another way you could consider proceeding is with a javascript redirect and a <noscript> tag explaining what's going on. That will also foul up non-javascript browsers, and you'd have to proceed with caution to avoid Google's sneaky-site detection routine, but I suspect that as long as your noscript contains a hyperlink that matches the new URL you'd be OK.
Either way I'd still pursue doing a purely server-side request if at all possible. Heck, if your expected traffic is light, you could treat your home page as a proxy in the case where there's no querystring. Have it use a background thread to request itself with the querystring and pipe out the results. :-)
edit: just saw you're using .NET. Maybe consider this answer from SO: "C# Can i modify Request.Form's variables?".
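A sketch of that "treat the page as a proxy" idea (Python purely for illustration, since the poster's stack is .NET; www.mysite.com is the host from the question):

    # When the canonical URL is hit with no query string, fetch the
    # query-string version internally and return its body, so crawlers
    # see a plain 200 on the canonical URL and no redirect at all.
    import datetime
    import urllib.request
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    class ProxyFront(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/mediareleases.aspx":
                day = datetime.date.today().day
                target = ("http://www.mysite.com/mediareleases.aspx"
                          f"?prevDays={day}")
                with urllib.request.urlopen(target) as upstream:
                    body = upstream.read()
                    content_type = upstream.headers.get(
                        "Content-Type", "text/html")
                self.send_response(200)
                self.send_header("Content-Type", content_type)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        ThreadingHTTPServer(("localhost", 8080), ProxyFront).serve_forever()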
