For which 3xx HTTP codes is the Location header mandatory? - http

RFC 2616 defines the Location header as:
The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource
...
For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource.
AFAIK, for 3xx Redirection codes, the Location header is:
300 Multiple Choices : optional
301 Moved Permanently : required
302 Found : required
303 See Other : required
304 Not Modified : irrelevant
305 Use Proxy : irrelevant (?)
306 Switch Proxy : irrelevant (?)
307 Temporary Redirect : required
308 Permanent Redirect : required
But that's just from personal experience. Is there a standard that defines which HTTP codes require the Location header to be sent?
That is, for which 3xx codes should an HTTP client throw an exception when received without a corresponding Location header?

This question was asked back when RFC 2616 was still the authority, so revisiting it now that RFCs 7230 to 7235 are in place looked like a fun research project. So, let's see what we've got here.
The Location header is now defined in RFC 7231, section 7.1.2:
The "Location" header field is used in some responses to refer to a specific resource in relation to the response. The type of relationship is defined by the combination of request method and status code semantics.
[…]
For 201 (Created) responses, the Location value refers to the primary resource created by the request. For 3xx (Redirection) responses, the Location value refers to the preferred target resource for automatically redirecting the request.
The section does not confine this header to the 3xx range of status codes. In fact, the only individual status codes it mentions explicitly are 201 (Created) and 303 (See Other). It says nothing about the header actually being required for any status code, though.
The purpose of the 3xx-range of codes is now described by RFC 7231, section 6.4:
The 3xx (Redirection) class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. If a Location header field is provided, the user agent MAY automatically redirect its request to the URI referenced by the Location field value, even if the specific status code is not understood.
The wording suggests that neither the presence of the Location header nor automatic redirection to its value is mandatory.
At the time of this writing, the IANA HTTP Status Code Registry lists the codes 300 through 308 as registered. With one deprecated (305) and one reserved (306), this leaves seven active codes:
300: Multiple Choices – RFC 7231, section 6.4.1
The 300 code is to be returned if the server is aware of multiple representations of a resource. As of RFC 7231, there is no longer a recommended way to communicate the list of possible representations, though the Link header (RFC 5988) is mentioned. Regarding the Location header, the RFC has this to say:
If the server has a preferred choice, the server SHOULD generate a Location header field containing a preferred choice's URI reference. The user agent MAY use the Location field value for automatic redirection.
Meaning the Location header is only to be used if the server has a preferred representation. If the header is absent, the server simply has no such preference.
It bears mentioning that the Location header by itself is unfit to list all possible representations as it is by its grammar a single-value field that cannot contain a list. Hence, the meaning of
Location: //example.com/a
Location: //example.com/b
is undefined.
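A client can at least detect this situation before deciding what to do. The following is a minimal sketch, assuming the raw header block is available as text; the helper name location_values is hypothetical, and how to react to more than one value is left to the caller because the spec does not define it.

```python
from email.parser import Parser
from typing import List


def location_values(raw_headers: str) -> List[str]:
    """Return every Location value found in a raw header block.

    HTTP header blocks share the name/value line syntax that the standard
    library's email parser understands, so it is reused here.
    """
    message = Parser().parsestr(raw_headers, headersonly=True)
    # get_all() returns all occurrences in order; the second argument is
    # the fallback when the header is absent.
    return message.get_all("Location", [])


raw = "Location: //example.com/a\nLocation: //example.com/b\n\n"
values = location_values(raw)
if len(values) > 1:
    # Undefined by the spec: a client has to pick its own policy here
    # (reject the response, take the first value, ...).
    print("ambiguous redirect:", values)
```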
301: Moved Permanently – RFC 7231, section 6.4.2
This response code is to let the client know there is an entirely new location for the requested resource: subsequent requests are to be directed at the location specified in the Location header.
The server SHOULD generate a Location header field in the response containing a preferred URI reference for the new permanent URI. The user agent MAY use the Location field value for automatic redirection.
Again, the presence of the Location header is not an absolute requirement, although omitting it would be of questionable practical use. The semantics would be akin, but not equal, to the 410 (Gone) response: "This resource has permanently moved to a new, yet unknown location."
302: Found – RFC 7231, section 6.4.3
Originally this code was specified as "Moved Temporarily" and was renamed to "Found" in later specs. In contrast to 301, this one cannot (or should not) be cached or used to permanently rewrite URLs. The relevant part of the spec reads:
The server SHOULD generate a Location header field in the response containing a URI reference for the different URI. The user agent MAY use the Location field value for automatic redirection.
I believe the semantics of a missing Location header are pretty much the same as with 301: "This resource has temporarily moved to a new, yet unknown location."
303: See Other – RFC 7231, section 6.4.4
303 is primarily meant to be returned in response to a POST request but is applicable to any method. In general, it lets the client know that there is a more appropriate representation at a substitute URL, or that the requested resource cannot be transferred via HTTP.
In the context of this question, this is a bit of a headscratcher. RFC 2616, section 10.3.4 states:
The different URI SHOULD be given by the Location field in the response.
The relevant section of the newer RFC 7231 seems to simply presume the Location header being present:
the server is redirecting the user agent to a different resource, as indicated by a URI in the Location header field
There is nothing in the errata to clarify this, so I am inclined to assume the position of RFC 2616. The semantics of an absent Location header do differ depending on request method:
For POST this would be the same as 201 (Created) or 202 (Accepted)
For any other method, this would be identical to 404 (Not Found)
304: Not Modified – RFC 7232, section 4.1
This response is special in that it stretches the meaning of the "[indication] that further action needs to be taken by the user agent in order to fulfill the request." It should be understood as a redirect not to a new URI but to a local cache. The relevant parts of RFC 7232 do not mention the Location header at all. In fact, it would make little sense: as I understand it, the semantics would be something like "the requested representation of this entity has remained unchanged and you will find it in your local cache at …" That would be a serious breach of separation of concerns, which is not to say that Location is disallowed here. Still, Content-Location or a Link header with a rel=self part would be more appropriate. The former is mentioned explicitly:
The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary.
305: Use Proxy – RFC 2616, section 10.3.6; RFC 7231, section 6.4.5
This status code has been deprecated as of RFC 7231 due to security concerns (cf Appendix B). Its definition in RFC 2616 reads:
The requested resource MUST be accessed through the proxy given by the Location field.
This implies the presence of a Location header, yet it does not explicitly require it. Omitting this header would have the semantic meaning of "this resource can only be accessed through some proxy."
306: Switch Proxy – draft-cohen-http-305-306-responses-00
This code was introduced in a draft after RFC 2068 had been finalized, and it was already obsoleted by RFC 2616. To my knowledge, the draft never reached the status of a recommendation, so it is included purely for completeness. The rationale of the draft is to give proxies a mechanism to direct clients (temporarily) to other proxies for subsequent requests.
Part of this draft is the introduction of the Set-Proxy header which is to be used in place of the Location header per section 2.2:
In the original HTTP/1.1 spec, the 'Location' header was used to indicate the proxy setting. Its use is DEPRECATED by the 'Set-proxy' header in the context of a 305 response. All new implementations MUST send the Set-proxy header. Implementations MAY send the 'Location' header so as to allow backward compatibility.
Set-Proxy is then required in the context of 306 while the Location header is purely optional. As the required Set-Proxy mechanism is meant to replace Location, the absence of the latter introduces no semantic change.
307: Temporary Redirect – RFC 7231, section 6.4.7
307 was introduced as a result of a semantic change of 302 in HTTP/1.1: while a redirect via 302 may change the request method, a request redirected via 307 must use the same method as the original request.
The relevant part of the spec reads:
The server SHOULD generate a Location header field in the response containing a URI reference for the different URI. The user agent MAY use the Location field value for automatic redirection.
Again, Location seems to be optional. For semantic changes due to an absent header, see 302.
308: Permanent Redirect – RFC 7538
Like 307, redirects via 308 are to keep their original request method. One could say 308 is to 301 as 307 is to 302.
From section 3 of the spec:
The server SHOULD generate a Location header field in the response containing a preferred URI reference for the new permanent URI.
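The method-handling differences between these codes can be sketched as a small lookup. This is only an illustration under assumptions: the function name follow_up_method is hypothetical, and rewriting POST to GET on 301/302 is behaviour the spec merely permits for historical reasons, not something it requires.

```python
def follow_up_method(status: int, method: str) -> str:
    """Pick the request method for the redirected request.

    303: the follow-up is a retrieval request (GET, or HEAD for HEAD).
    301/302: a user agent MAY change POST to GET (historical behaviour);
             this sketch assumes a client that does so.
    307/308: the method must not change.
    """
    method = method.upper()
    if status == 303:
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302) and method == "POST":
        return "GET"
    # 307, 308, and every other case: keep the original method.
    return method


print(follow_up_method(302, "POST"))  # GET
print(follow_up_method(307, "POST"))  # POST
```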
So, in summary we have got this situation:
Implied: 1 (305)
Optional: 1 (306)
No mention: 1 (304)
SHOULD: 6 (300; 301; 302; 303; 307; 308)
That "SHOULD" is to be read in the context of RFC 2119:
This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
This is different from the absolute requirement of a "MUST" or "REQUIRED" (also in that RFC). So in a nutshell: There is no 3xx-class code in which the Location header is mandatory.
It should be noted that the problem of a missing Location header is not a new one. From another answer:
301, 302, 303, and 307 provide a Location only if the next URL is known. Otherwise, the client/user has to decide what to do next
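For a client, the practical consequence is that a 3xx response without Location is not a protocol error but a response with nothing to follow automatically. A minimal sketch, assuming the headers arrive as a plain mapping (the function name redirect_target is hypothetical):

```python
from typing import Mapping, Optional


def redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the Location value to follow, or None if there is none.

    None means "present the response to the user or caller as it is"
    rather than "raise an exception", since no 3xx code makes the
    Location header mandatory.
    """
    if not 300 <= status <= 399:
        return None
    for name, value in headers.items():
        # Header field names are case-insensitive.
        if name.lower() == "location":
            return value.strip() or None
    return None


print(redirect_target(301, {"Location": "/new-home"}))  # /new-home
print(redirect_target(302, {}))                         # None
```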

Related

Use 301 or 303 to redirect http to https

I'm not sure which is the best to use for my site when redirecting from http to https.
At the moment I am using IIS rewrite rules to do the redirect. The guides I've read on how to do this use either a 301 or a 303. And after reading up on 301 and 303, I'm still not sure which is best to use.
My understanding is they are pretty much similar in what they do in regards to a redirect between http and https.
Is there any difference and will it affect SEO in any way using one over the other?
From the spec:
301
The 301 (Moved Permanently) status code indicates that the target
resource has been assigned a new permanent URI and any future
references to this resource ought to use one of the enclosed URIs.
Clients with link-editing capabilities ought to automatically re-link
references to the effective request URI to one or more of the new
references sent by the server, where possible.
The server SHOULD generate a Location header field in the response
containing a preferred URI reference for the new permanent URI. The
user agent MAY use the Location field value for automatic
redirection. The server's response payload usually contains a short
hypertext note with a hyperlink to the new URI(s).
Note: For historical reasons, a user agent MAY change the request
method from POST to GET for the subsequent request. If this
behavior is undesired, the 307 (Temporary Redirect) status code
can be used instead.
A 301 response is cacheable by default; i.e., unless otherwise
indicated by the method definition or explicit cache controls
303
The 303 (See Other) status code indicates that the server is
redirecting the user agent to a different resource, as indicated by a
URI in the Location header field, which is intended to provide an
indirect response to the original request. A user agent can perform
a retrieval request targeting that URI (a GET or HEAD request if
using HTTP), which might also be redirected, and present the eventual
result as an answer to the original request. Note that the new URI
in the Location header field is not considered equivalent to the
effective request URI.
This status code is applicable to any HTTP method. It is primarily
used to allow the output of a POST action to redirect the user agent
to a selected resource, since doing so provides the information
corresponding to the POST response in a form that can be separately
identified, bookmarked, and cached, independent of the original
request.
A 303 response to a GET request indicates that the origin server
does not have a representation of the target resource that can be
transferred by the server over HTTP. However, the Location field
value refers to a resource that is descriptive of the target
resource, such that making a retrieval request on that other resource
might result in a representation that is useful to recipients without
implying that it represents the original target resource. Note that
answers to the questions of what can be represented, what
representations are adequate, and what might be a useful description
are outside the scope of HTTP.
Except for responses to a HEAD request, the representation of a 303
response ought to contain a short hypertext note with a hyperlink to
the same URI reference provided in the Location header field.
Google says:
Redirect your users and search engines to the HTTPS page or resource with server-side 301 HTTP redirects.
I recommend following Google's advice rather than trying to implement a 303 strategy.
Source: https://support.google.com/webmasters/answer/6073543?hl=en
When you redirect http to https, you essentially want to preserve the "link juice" from an SEO perspective.
As you might know, Google considers the number of backlinks a site has when ranking it.
The main difference between a 301 and a 303 redirect, from that perspective, is whether it passes link juice. There are technical differences like the ones you mentioned, but from an SEO perspective 301 is the better choice.
Here is a blog post that shows how 303 might affect SEO of your site.
https://digitalreadymarketing.com/303-redirect-effect-seo/
In case you are interested in learning more about duplicate content (http and https is a typical duplicate content issue), check this post.
https://digitalreadymarketing.com/what-is-duplicate-content-how-to-find-solve-them/
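The question uses IIS rewrite rules, but the redirect itself is easy to illustrate independently of the server. The following is a rough sketch in Python against the WSGI environ keys; the helper name https_redirect is hypothetical, and it assumes the Host header carries no explicit port.

```python
def https_redirect(environ):
    """Return (status, headers) for a 301 to the HTTPS URL, or None.

    None means the request already arrived over HTTPS and needs no redirect.
    """
    if environ.get("wsgi.url_scheme") == "https":
        return None
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    target = "https://" + host + path
    if query:
        target += "?" + query
    # 301 rather than 303: the resource has permanently moved to the
    # HTTPS URL, which is the case Google's advice describes.
    return "301 Moved Permanently", [("Location", target)]


print(https_redirect({
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "PATH_INFO": "/shop",
    "QUERY_STRING": "page=2",
}))
```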

Does the Location header accept the // protocol notation?

Most if not all browsers support the following notation:
<script src="//domain.com/script.js">
The // notation means use the same protocol as the current one, i.e.:
http://domain.com/script.js if the current page has been served over HTTP
https://domain.com/script.js if the current page has been served over HTTPS
This notation works with other HTML tags as well: <a>, <link>, etc.
Is this notation also valid for the Location header?
For example, is it valid to reply this:
HTTP/1.0 301 Moved Permanently
Location: //domain.com/other-resource
A URL starting with // is an example of a relative URL.
The Location-header needs an absolute URL, which means the answer you are looking for unfortunately is: no, it's not supported.
This is specified in Section 14.30 of RFC2616 on HTTP/1.1:
The field value consists of a single absolute URI.
Edit: But please consider the comments attached to this answer. My answer should maybe have been qualified by "according to the currently accepted published standard" or something. I am not the one to ask about what exists in reality ;)
No, that is not valid. Nor does it really make much sense:
The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. For 201 (Created) responses, the Location is that of the new resource which was created by the request. For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource. The field value consists of a single absolute URI.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30
If you know there is a secure URL available, why would it matter what protocol the current page uses?
It is not valid per RFC 2616, but it works in practice and is valid in the current revision of HTTP/1.1 (see http://svn.tools.ietf.org/svn/wg/httpbis/specs/rfc7231.html#rfc.section.7.1.2)
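A client that accepts such a value has to resolve it against the URL it originally requested. A minimal sketch using Python's standard library (the wrapper name resolve_location is hypothetical; urljoin is the real standard-library function doing the work):

```python
from urllib.parse import urljoin


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a possibly relative Location value against the request URL."""
    return urljoin(request_url, location)


# A value starting with // inherits the scheme of the request URL.
print(resolve_location("http://example.org/page", "//domain.com/other-resource"))
print(resolve_location("https://example.org/page", "//domain.com/other-resource"))
```

Absolute values pass through unchanged, so the same call works for both the RFC 2616 style and the relative style.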

How should a client handle Location field in an HTTP response when it includes a username & password?

The specification does not mention what to do with the username:password in a URI returned by a Location such as:
Location: http://user:secret@w3.org/hidden/pages
Are we supposed to ignore them? It doesn't seem to make sense, but I was wondering what to do if it were to happen (e.g. server misconfiguration, a strange idea from some administrator/programmer...)
14.30 Location
The Location response-header field is used to redirect the recipient
to a location other than the Request-URI for completion of the request
or identification of a new resource. For 201 (Created) responses, the
Location is that of the new resource which was created by the request.
For 3xx responses, the location SHOULD indicate the server's preferred
URI for automatic redirection to the resource. The field value
consists of a single absolute URI.
Location = "Location" ":" absoluteURI
An example is:
Location: http://www.w3.org/pub/WWW/People.html
Note: The Content-Location header field (section 14.14) differs
from Location in that the Content-Location identifies the original
location of the entity enclosed in the request. It is therefore
possible for a response to contain header fields for both Location
and Content-Location. Also see section 13.10 for cache
requirements of some methods.
RFC 2617 may have an answer. From section 3.3:
...For example
a server could be responsible for authenticating content that
actually sits on another server. It would achieve this by having the
first 401 response include a domain directive whose value includes a
URI on the second server, and an opaque directive whose value
contains the state information. The client will retry the request, at
which time the server might respond with a 301/302 redirection,
pointing to the URI on the second server. The client will follow the
redirection, and pass an Authorization header , including the
<opaque> data.
So I interpret that to mean that the Location header you get back from an HTTP redirect shouldn't actually contain the user:secret@ part at all, just the rest of the example URL you gave. You (the client) would be responsible for remembering the user/pass you sent in the Authorization header of the original request that was redirected, and for passing the same header again in the second request.
Update
Also, RFC 2396 section 3.2.2 has some words about using user/password in the URL:
Some URL schemes use the format "user:password" in the userinfo
field. This practice is NOT RECOMMENDED, because the passing of
authentication information in clear text (such as URI) has proven to
be a security risk in almost every case where it has been used.
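If a client decides to ignore credentials embedded in a redirect target, one possible policy is to strip the userinfo part before following it. This is a sketch of that policy only, not something the specifications prescribe; the function name strip_userinfo is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return the URL without any user:password@ prefix in its authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals must be re-bracketed when rebuilding the authority.
        host = "[" + host + "]"
    if parts.port is not None:
        host = host + ":" + str(parts.port)
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_userinfo("http://user:secret@w3.org/hidden/pages"))
```

Note that urlsplit lower-cases the host name, which is harmless because host names are case-insensitive.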

Is HTTP POST request allowed to send back a response body?

As per the HTTP specification:
If a resource has been created on the
origin server, the response SHOULD
be 201 (Created) and contain an entity
which describes the status of the
request and refers to the new
resource, and a Location header
(see section 14.30).
Does this mean that the response to a POST request should always carry a redirect URI in the Location header and no response body?
It is perfectly acceptable to specify a response body and use the Location header at the same time. When using the Location header with a 201 response, you're not redirecting the client, you're just telling it where it can find the resource in future.
Redirects only apply to 3xx responses.
The W3C docs for this explain further, though the text is actually quite ambiguous:
The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. For 201 (Created) responses, the Location is that of the new resource which was created by the request. For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource.
I read that as saying "...redirect... or... identif[y]... new resource", but it's not exactly a plain English sentence.
Based on paragraph 9.5 of the HTTP 1.1 specification, which is the reference for questions like that, here is my understanding:
Yes you can, and the specification is clear about what you can do and how to do it:
The action performed by the POST method might not result in a resource that can be identified by a URI. In this case, either 200 (OK) or 204 (No Content) is the appropriate response status, depending on whether or not the response includes an entity that describes the result.
If a resource has been created on the origin server, the response SHOULD be 201 (Created) and contain an entity which describes the status of the request and refers to the new resource, and a Location header (see section 14.30).
Responses to this method are not cacheable, unless the response includes appropriate Cache-Control or Expires header fields. However, the 303 (See Other) response can be used to direct the user agent to retrieve a cacheable resource.
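To make the 201 case concrete, here is a minimal WSGI sketch that answers a POST with a Location header and a response body at the same time. The app name create_item_app and the path /items/1 are made up for illustration; a real application would create the resource and derive the path from it.

```python
import json


def create_item_app(environ, start_response):
    """Hypothetical WSGI app: every POST 'creates' the resource /items/1."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    # The entity describes the status of the request and refers to the
    # new resource; the Location header names that resource as well.
    body = json.dumps({"status": "created", "href": "/items/1"}).encode("utf-8")
    start_response("201 Created", [
        ("Location", "/items/1"),
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be served for a quick manual test with wsgiref.simple_server.make_server from the standard library.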

What is the purpose of the HTTP header field “Content-Location”?

Confused/inspired by a comment to my question Do search engines respect the HTTP header field “Content-Location”?, I’d like to know, what the exact purpose of the Content-Location header field in HTTP is and how it can be used.
In response to a GET request, Content-Location in HTTP can be used when a requested resource has multiple representations available, e.g. multiple languages. The selection of the resource returned will depend on the Accept headers in the original GET request.
Usually, the location specified in the Content-Location header is different to the location specified in the original request's URI.
In response to a PUT or POST request,
If the Content-Location URI is different than the requested URI, then the cache entry at the indicated URI is invalidated. (see https://www.rfc-editor.org/rfc/rfc7234#section-4.4 and https://www.rfc-editor.org/rfc/rfc2616#section-13.10)
If the Content-Location URI is the same as the requested URI, then that indicates to caches that the response to the PUT/POST request is the same as the response that would be received by a 200 response to a GET request at the same location and can thus be cached. (see https://www.rfc-editor.org/rfc/rfc7231#section-3.1.4.2) Note that Firefox and Chrome do not appear to implement this.
The Content-Location HTTP header is supposed to declare the unique location of the resource that was used for a response to an HTTP GET. For example, if the request was GET /frontpage HTTP/1.1, the server may add the header Content-Location: http://domain.com/frontpage.english.msie-optimized, informing the user agent that if this specific response is needed later, the provided location should be used, because what the original location returns may depend on various things (which should then be declared via the "Vary" header).
However, note that the HTTP Content-Location header is problematic in real-world usage because different browsers (user agents) handle it differently:
http://mail.python.org/pipermail/web-sig/2004-October/000985.html
This is because of RFC 2616 section 14.14, which says that "The value of Content-Location also defines the base URI for the entity". In short, a conforming user agent will compute the base URL for the fetched document from the Content-Location header. If the document does not define a BASE URL itself and the fetched URL and Content-Location differ enough (that is, the "directory"/"path" part of the URL is different), relative URLs will resolve differently.
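The effect on relative links can be sketched with urljoin. The URLs and the helper name resolve_relative are made up for illustration, and the sketch models only the RFC 2616 behaviour described above; RFC 7231 (Appendix B) later changed Content-Location so that it no longer affects the base URI.

```python
from urllib.parse import urljoin
from typing import Optional


def resolve_relative(request_url: str, relative_ref: str,
                     content_location: Optional[str] = None) -> str:
    """Resolve a link in a document the way an RFC 2616 client would.

    When Content-Location is present it becomes the base URI;
    otherwise the URL that was actually requested is the base.
    """
    base = request_url
    if content_location:
        base = urljoin(request_url, content_location)
    return urljoin(base, relative_ref)


# Same document, same relative link, two different results:
print(resolve_relative("http://domain.com/frontpage", "img/logo.png"))
print(resolve_relative("http://domain.com/frontpage", "img/logo.png",
                       "http://domain.com/pages/en/frontpage"))
```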
In addition, I've yet to see any advantage in using HTTP Content-Location (I once hoped that it could be used to hint at a permanent bookmark location when the currently viewed URL was volatile, such as domain.com/news/latest, but that doesn't seem to be the case).
My current advice is to forget about Content-Location for HTTP, though you may use it for MIME email.
Section 14.14 of RFC 2616 states:
The Content-Location entity-header field MAY be used to supply the
resource location for the entity enclosed in the message when that
entity is accessible from a location separate from the requested
resource's URI...
This is used in AtomPub (RFC 5023, Section 9.2):
If the creation request contained an Atom Entry Document, and the
subsequent response from the server contains a Content-Location header
that matches the Location header character-for-character, then the
client is authorized to interpret the response entity as being a
complete representation of the newly created Entry. Without a matching
Content-Location header, the client MUST NOT assume the returned
entity is a complete representation of the created Resource.
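The AtomPub rule quoted above boils down to an exact string comparison. A minimal sketch, assuming the response headers are available as a dict with these exact key spellings (the function name is_complete_entry is hypothetical):

```python
from typing import Mapping


def is_complete_entry(headers: Mapping[str, str]) -> bool:
    """True only if Content-Location matches Location character-for-character."""
    location = headers.get("Location")
    content_location = headers.get("Content-Location")
    # No normalisation on purpose: the RFC asks for a character-for-character
    # match, so even equivalent URIs written differently do not count.
    return location is not None and location == content_location


print(is_complete_entry({
    "Location": "http://example.org/edit/first-post.atom",
    "Content-Location": "http://example.org/edit/first-post.atom",
}))
```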
Check out RFC 2557 at http://www.faqs.org/rfcs/rfc2557.html for a deeper explanation if you are interested. I'm currently writing about this for a class. It's a little old but still relevant.

Resources