What is the purpose of the HTTP header field “Content-Location”? - http

Confused/inspired by a comment to my question Do search engines respect the HTTP header field “Content-Location”?, I’d like to know, what the exact purpose of the Content-Location header field in HTTP is and how it can be used.

In response to a GET request, Content-Location in HTTP can be used when a requested resource has multiple representations available, e.g. multiple languages. The selection of the resource returned will depend on the Accept headers in the original GET request.
Usually, the location specified in the Content-Location header is different to the location specified in the original request's URI.
In response to a PUT or POST request:
If the Content-Location URI is different than the requested URI, then the cache entry at the indicated URI is invalidated. (see https://www.rfc-editor.org/rfc/rfc7234#section-4.4 and https://www.rfc-editor.org/rfc/rfc2616#section-13.10)
If the Content-Location URI is the same as the requested URI, then that indicates to caches that the response to the PUT/POST request is the same as the response that would be received by a 200 response to a GET request at the same location and can thus be cached. (see https://www.rfc-editor.org/rfc/rfc7231#section-3.1.4.2) Note that Firefox and Chrome do not appear to implement this.
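The invalidation rule for PUT/POST responses can be sketched roughly as follows. This is only an illustration of the behaviour described in RFC 7234 section 4.4, not how any particular cache is implemented. The function name `uris_to_invalidate`, the plain `dict` of headers (with exact-case names), and the set of unsafe methods are assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: the unsafe methods this hypothetical cache cares about.
UNSAFE_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


def uris_to_invalidate(method, status, request_uri, headers):
    """Return the set of URIs a cache would drop after an unsafe request."""
    # Only non-error responses (2xx/3xx) to unsafe methods trigger invalidation.
    if method.upper() not in UNSAFE_METHODS or not 200 <= status < 400:
        return set()

    targets = {request_uri}
    request_host = urlsplit(request_uri).netloc.lower()
    for name in ("Location", "Content-Location"):
        value = headers.get(name)
        if not value:
            continue
        absolute = urljoin(request_uri, value)
        # URIs on a different host are skipped, so that one site cannot
        # evict another site's cache entries.
        if urlsplit(absolute).netloc.lower() == request_host:
            targets.add(absolute)
    return targets
```

For a PUT to http://example.com/doc answered with Content-Location: /doc.en, both URIs would be invalidated under this sketch.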

The Content-Location HTTP header is supposed to declare the unique location of the resource that was used for a response to an HTTP GET. For example, if the request was GET /frontpage HTTP/1.1, the server may add the header Content-Location: http://domain.com/frontpage.english.msie-optimized, informing the user agent that if this specific response is needed later, the provided location should be used, because the response at the original location may depend on various things (which should then be explained via the "Vary" header).
However, note that the HTTP Content-Location header is problematic in real-world usage because different browsers (user agents) handle it differently:
http://mail.python.org/pipermail/web-sig/2004-October/000985.html
This is because of RFC 2616 section 14.14, which says that "The value of Content-Location also defines the base URI for the entity". In short, a conforming user agent will compute the base URL for the fetched document using the Content-Location header. This may result in different relative URLs being used if the fetched document does not define a BASE URL and the real fetched URL and the Content-Location differ enough (that is, the "directory"/"path" part of the URL is different).
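To illustrate the base URL problem, here is a small sketch using Python's urllib.parse.urljoin. The helper `effective_base`, the flag `honor_content_location`, and the path /pages/en/frontpage.english are hypothetical and only serve the example.

```python
from urllib.parse import urljoin


def effective_base(request_url, content_location=None, honor_content_location=True):
    """Pick the base URL a user agent would use for relative links."""
    if content_location and honor_content_location:
        # Content-Location may itself be relative to the requested URL.
        return urljoin(request_url, content_location)
    return request_url


request_url = "http://domain.com/frontpage"
content_location = "/pages/en/frontpage.english"  # hypothetical value

# A user agent that ignores Content-Location:
base_a = effective_base(request_url, content_location, honor_content_location=False)
print(urljoin(base_a, "style.css"))  # http://domain.com/style.css

# A user agent that follows RFC 2616 section 14.14:
base_b = effective_base(request_url, content_location)
print(urljoin(base_b, "style.css"))  # http://domain.com/pages/en/style.css
```

The same relative link ends up pointing at two different resources depending on which behaviour the user agent implements.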
In addition, I've yet to see any advantage for using HTTP Content-Location (I once hoped that this could be used for hinting about permanent bookmark location in case currently viewed URL was volatile, such as domain.com/news/latest but that doesn't seem to be the case).
My current advice is to forget about Content-Location for HTTP, but you may use it for MIME email.

Section 14.14 of RFC 2616 states:
The Content-Location entity-header field MAY be used to supply the
resource location for the entity enclosed in the message when that
entity is accessible from a location separate from the requested
resource's URI...
This is used in AtomPub (RFC 5023, Section 9.2):
If the creation request contained an Atom Entry Document, and the
subsequent response from the server contains a Content-Location header
that matches the Location header character-for-character, then the
client is authorized to interpret the response entity as being a
complete representation of the newly created Entry. Without a matching
Content-Location header, the client MUST NOT assume the returned
entity is a complete representation of the created Resource.
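A client-side check for this AtomPub rule might look like the sketch below. The function name and the plain `dict` of headers are assumptions; requiring a 201 status is also an assumption of the example, since the quoted text only talks about the response to a creation request.

```python
def is_complete_entry_representation(status, headers):
    """True if the response body may be treated as the full new Entry."""
    location = headers.get("Location")
    content_location = headers.get("Content-Location")
    return (
        status == 201
        and location is not None
        and content_location is not None
        # Character-for-character comparison: no normalization on purpose.
        and location == content_location
    )
```

Note that the comparison is deliberately a plain string equality, so http://example.org/entry/1 and http://EXAMPLE.org/entry/1 would not match.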

Check out RFC 2557 at http://www.faqs.org/rfcs/rfc2557.html for a deeper explanation if you are interested. I'm currently writing about this for a class. It's a little old but still relevant.

Related

Does the Location header accept the // protocol notation?

Most if not all browsers support the following notation:
<script src="//domain.com/script.js">
The // notation means use the same protocol as the current one, i.e.:
http://domain.com/script.js if the current page has been served over HTTP
https://domain.com/script.js if the current page has been served over HTTPS
This notation works with other HTML tags as well: <a>, <link>, etc.
Is this notation also valid for the Location header?
For example, is it valid to reply with this:
HTTP/1.0 301 Moved Permanently
Location: //domain.com/other-resource
A URL starting with // is an example of a relative URL.
The Location-header needs an absolute URL, which means the answer you are looking for unfortunately is: no, it's not supported.
This is specified in Section 14.30 of RFC2616 on HTTP/1.1:
The field value consists of a single absolute URI.
Edit: But please consider the comments attached to this answer. My answer should maybe have been qualified by "according to the currently accepted published standard" or something. I am not the one to ask about what exists in reality ;)
No that is not valid. Neither does it really make that much sense:
The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. For 201 (Created) responses, the Location is that of the new resource which was created by the request. For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource. The field value consists of a single absolute URI.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30
If you know there is a secure URL available, why would it matter what protocol the current page uses?
It is not valid per RFC 2616, but it works in practice and is valid in the current revision of HTTP/1.1 (see http://svn.tools.ietf.org/svn/wg/httpbis/specs/rfc7231.html#rfc.section.7.1.2)
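Under RFC 7231, a Location value is a URI reference that the client resolves against the URL it requested. A minimal sketch of that resolution with Python's urllib.parse.urljoin (the function name `resolve_location` is made up for the example):

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Resolve a Location header value against the requested URL."""
    return urljoin(request_url, location)


# A scheme-relative value inherits the scheme of the original request.
print(resolve_location("http://domain.com/page", "//domain.com/other-resource"))
print(resolve_location("https://domain.com/page", "//domain.com/other-resource"))
```

The first call yields http://domain.com/other-resource and the second https://domain.com/other-resource, which mirrors how the // notation behaves in HTML.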

For which 3xx HTTP codes is the Location header mandatory?

RFC 2616 defines the Location header as:
The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource
...
For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource.
AFAIK, for 3xx Redirection codes, the Location header is:
300 Multiple Choices : optional
301 Moved Permanently : required
302 Found : required
303 See Other : required
304 Not Modified : irrelevant
305 Use Proxy : irrelevant (?)
306 Switch Proxy : irrelevant (?)
307 Temporary Redirect : required
308 Permanent Redirect : required
But that's just from personal experience. Is there a standard that defines which HTTP codes require the Location header to be sent?
That is, for which 3xx codes should an HTTP client throw an exception when received without a corresponding Location header?
This question was asked back in the days when RFC 2616 was still the authority, so it looked like a fun research project now that RFCs 7230 to 7235 are in place. So, let's see what we've got here.
The Location header is now defined in RFC 7231, section 7.1.2:
The "Location" header field is used in some responses to refer to a specific resource in relation to the response. The type of relationship is defined by the combination of request method and status code semantics.
[…]
For 201 (Created) responses, the Location value refers to the primary resource created by the request. For 3xx (Redirection) responses, the Location value refers to the preferred target resource for automatically redirecting the request.
The section does not confine this header solely to the 3xx-range of status codes. In fact, the only status codes explicitly being mentioned are 201 (Created) and 303 (See Other). No word about this header being actually required by any status code, though.
The purpose of the 3xx-range of codes is now described by RFC 7231, section 6.4:
The 3xx (Redirection) class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. If a Location header field is provided, the user agent MAY automatically redirect its request to the URI referenced by the Location field value, even if the specific status code is not understood.
The wording suggests that neither the presence of the header nor automatic redirection to its value is mandatory.
At the time of this writing, the IANA HTTP Status Code Registry lists the codes 300 through 308 as registered. With one being obsoleted (305) and one being reserved (306), this leaves seven active codes:
300: Multiple Choices – RFC 7231, section 6.4.1
The 300 code is to be returned if the server is aware of multiple representations of a resource. As of RFC 7231, there is no longer a recommended way to communicate a list of possible representations, though the Link header via RFC 5988 is being mentioned. Regarding the Location header, the RFC has this to say:
If the server has a preferred choice, the server SHOULD generate a Location header field containing a preferred choice's URI reference. The user agent MAY use the Location field value for automatic redirection.
Meaning the Location header is only to be used if the server has a preferred representation. If the header is absent, the server simply has no such preference.
It bears mentioning that the Location header by itself is unfit to list all possible representations as it is by its grammar a single-value field that cannot contain a list. Hence, the meaning of
Location: //example.com/a
Location: //example.com/b
is undefined.
301: Moved Permanently – RFC 7231, section 6.4.2
This response code is to let the client know there is an entirely new location for the requested resource: subsequent requests are to be directed at the location specified in the Location header.
The server SHOULD generate a Location header field in the response containing a preferred URI reference for the new permanent URI. The user agent MAY use the Location field value for automatic redirection.
Again, the presence of the Location header is no absolute requirement, although its absence would be of questionable practicality. The semantics would be akin, but not equal, to the 410 (Gone) response: "This resource has permanently moved to a new, yet unknown location."
302: Found – RFC 7231, section 6.4.3
Originally this was specified as "Moved Temporarily" and was renamed to "Found" in later specs. In contrast to 301, this one cannot (or should not) be cached or used to permanently rewrite URLs. The relevant part of the spec reads:
The server SHOULD generate a Location header field in the response containing a URI reference for the different URI. The user agent MAY use the Location field value for automatic redirection.
I believe the semantics of a missing Location header would be pretty much the same as with 301: "This resource has temporarily moved to a new, yet unknown location."
303: See Other – RFC 7231, section 6.4.4
303 is supposed to be returned in response to a POST request but is applicable to any method. In general, it is meant to let the client know that there is a more appropriate representation at a substitute URL, or that the requested resource cannot be transmitted via HTTP.
In the context of this question, this is a bit of a headscratcher. RFC 2616, section 10.3.4 states:
The different URI SHOULD be given by the Location field in the response.
The relevant section of the newer RFC 7231 seems to simply presume the Location header being present:
the server is redirecting the user agent to a different resource, as indicated by a URI in the Location header field
There is nothing in the errata to clarify this, so I am inclined to assume the position of RFC 2616. The semantics of an absent Location header do differ depending on request method:
For POST this would be the same as 201 (Created) or 202 (Accepted)
For any other method, this would be identical to 404 (Not Found)
304: Not Modified – RFC 7232, section 4.1
This response is special in a way, as it stretches the "[indication] that further action needs to be taken by the user agent in order to fulfill the request." It should be understood as a redirect not to a new URI but to a local cache. There is no mention of the Location header in the relevant parts of RFC 7232 at all. In fact, it would make little sense, as to my understanding the semantics would be something like "the requested representation of this entity has remained unchanged and you will find it in your local cache at …" That would be a great breach of separation of concerns, but that is not to say Location is not allowed here. Still, Content-Location or a Link header with a rel=self part would be more appropriate. The former receives explicit mention:
The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary.
305: Use Proxy – RFC 2616, section 10.3.6; RFC 7231, section 6.4.5
This status code has been deprecated as of RFC 7231 due to security concerns (cf Appendix B). Its definition in RFC 2616 reads:
The requested resource MUST be accessed through the proxy given by the Location field.
This implies the presence of a Location header, yet it does not explicitly require it. Omitting this header would have the semantic meaning of "this resource can only be accessed through some proxy."
306: Switch Proxy – draft-cohen-http-305-306-responses-00
This code was introduced in a draft after RFC 2068 had been finalized and was already obsoleted by RFC 2616. To my knowledge, this draft never reached the status of a recommendation, so this is purely for completeness. The rationale of this draft is to supply proxies with a mechanism to direct clients (temporarily) to other proxies for subsequent requests.
Part of this draft is the introduction of the Set-Proxy header which is to be used in place of the Location header per section 2.2:
In the original HTTP/1.1 spec, the 'Location' header was used to indicate the proxy setting. Its use is DEPRECATED by the 'Set-proxy' header in the context of a 305 response. All new implementations MUST send the Set-proxy header. Implementations MAY send the 'Location' header so as to allow backward compatibility.
Set-Proxy is then required in context of 306 while the Location header is purely optional. As the required Set-Proxy mechanism is meant to replace Location, the absence of latter header introduces no semantic changes.
307: Temporary Redirect – RFC 7231, section 6.4.7
307 was introduced as a result of a semantic change of 302 in HTTP/1.1: while redirects via 302 can change the request method, with 307 the redirected request must use the same request method as the original request.
The relevant part of the spec reads:
The server SHOULD generate a Location header field in the response containing a URI reference for the different URI. The user agent MAY use the Location field value for automatic redirection.
Again, Location seems to be optional. For semantic changes due to an absent header, see 302.
308: Permanent Redirect – RFC 7538
Like 307, redirects via 308 are to keep their original request method. One could say 308 is to 301 as 307 is to 302.
From section 3 of the spec:
The server SHOULD generate a Location header field in the response containing a preferred URI reference for the new permanent URI.
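The difference in method handling between the redirect codes discussed above can be summed up in a short sketch. The function name is hypothetical, and rewriting POST to GET on 301/302 is an assumption modelled on what RFC 7231 permits and many clients do; it is not something the spec requires.

```python
def redirected_method(status, method):
    """Return the method a client would use for the follow-up request."""
    method = method.upper()
    if status == 303:
        # 303 always switches to a retrieval request.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Assumption: follow common practice and turn POST into GET.
        return "GET" if method == "POST" else method
    if status in (307, 308):
        # 307 and 308 must keep the original method.
        return method
    raise ValueError(f"status {status} is not a method-relevant redirect")
```

For example, a POST answered with 302 would be retried as GET under this sketch, while the same POST answered with 307 stays a POST.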
So, in summary we have got this situation:
Implied: 1 (305)
Optional: 1 (306)
No mention: 1 (304)
SHOULD: 6 (300; 301; 302; 303; 307; 308)
That "SHOULD" is to be read in the context of RFC 2119:
This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
This is different from the absolute requirement of a "MUST" or "REQUIRED" (also in that RFC). So in a nutshell: There is no 3xx-class code in which the Location header is mandatory.
It should be noted that the problem of a missing Location header is not a new one. From another answer:
301, 302, 303, and 307 provide a Location only if the next URL is known. Otherwise, the client/user has to decide what to do next
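Given that conclusion, a client probably should not throw an exception for a 3xx response without Location, but rather hand the decision back to the caller. A minimal sketch, assuming a plain `dict` of headers and a hypothetical function name; excluding 304, 305 and 306 from automatic redirection is a choice made for this example:

```python
from urllib.parse import urljoin

# Assumption: codes for which Location is not a redirect target.
NOT_FOLLOWED = {304, 305, 306}


def next_request_url(status, headers, request_url):
    """Return the URL to follow, or None if the caller has to decide."""
    if not 300 <= status < 400 or status in NOT_FOLLOWED:
        return None
    location = headers.get("Location")
    if not location:
        # No Location: nothing to follow automatically, but not an error.
        return None
    return urljoin(request_url, location)
```

A return value of None then simply means "show the response to the user or let the application decide".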

How should a client handle Location field in an HTTP response when it includes a username & password?

The specification does not mention what to do with the username:password in a URI returned by a Location such as:
Location: http://user:secret@w3.org/hidden/pages
Are we supposed to ignore such? It doesn't seem to make sense, but I was wondering what if it were to happen (i.e. server misconfiguration, strange idea from some administrator/programmer...)
14.30 Location
The Location response-header field is used to redirect the recipient
to a location other than the Request-URI for completion of the request
or identification of a new resource. For 201 (Created) responses, the
Location is that of the new resource which was created by the request.
For 3xx responses, the location SHOULD indicate the server's preferred
URI for automatic redirection to the resource. The field value
consists of a single absolute URI.
Location = "Location" ":" absoluteURI
An example is:
Location: http://www.w3.org/pub/WWW/People.html
Note: The Content-Location header field (section 14.14) differs
from Location in that the Content-Location identifies the original
location of the entity enclosed in the request. It is therefore
possible for a response to contain header fields for both Location
and Content-Location. Also see section 13.10 for cache
requirements of some methods.
RFC 2617 may have an answer. From section 3.3:
...For example
a server could be responsible for authenticating content that
actually sits on another server. It would achieve this by having the
first 401 response include a domain directive whose value includes a
URI on the second server, and an opaque directive whose value
contains the state information. The client will retry the request, at
which time the server might respond with a 301/302 redirection,
pointing to the URI on the second server. The client will follow the
redirection, and pass an Authorization header , including the
<opaque> data.
So I interpret that to mean that the Location header you get back from an HTTP redirect shouldn't actually contain the user:secret@ part at all, just the rest of the example URL you gave, and that you (the client) would be responsible for remembering the user/pass you sent in the Authorization header of the original request that was redirected, and for passing the same header again in the second request.
Update
Also, RFC 2396 section 3.2.2 has some words about using user/password in the URL:
Some URL schemes use the format "user:password" in the userinfo
field. This practice is NOT RECOMMENDED, because the passing of
authentication information in clear text (such as URI) has proven to
be a security risk in almost every case where it has been used.
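If a client does receive a redirect URL with userinfo such as http://user:secret@w3.org/hidden/pages, one possible defensive policy is to separate the credentials from the URL before doing anything else. This is only a sketch of that idea; neither RFC mandates it, and the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url):
    """Return (url_without_userinfo, (username, password) or None)."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url, None
    # Everything after the last "@" in the authority is host[:port].
    host = parts.netloc.rpartition("@")[2]
    cleaned = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return cleaned, (parts.username, parts.password)


print(strip_userinfo("http://user:secret@w3.org/hidden/pages"))
```

What the client then does with the extracted credentials (ignore them, or ask the user) is a separate decision.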

Appropriate HTTP status code for request specifying invalid Content-Encoding header?

What status code should be returned if a client sends an HTTP request and specifies a Content-Encoding header which cannot be decoded by the server?
Example
A client POSTs JSON data to a REST resource and encodes the entity body using the gzip coding. However, the server can only decode DEFLATE codings because it failed the gzip class in server school.
What HTTP response code should be returned? I would say 415 Unsupported Media Type but it's not the entity's Content-Type that is the problem -- it's the encoding of the otherwise supported entity body.
Which is more appropriate: 415? 400? Perhaps a custom response code?
Addendum: I have, of course, thoroughly checked rfc2616. If the answer is there I may need some new corrective eyewear, but I don't believe that it is.
Update:
This has nothing to do with sending a response that might be unacceptable to a client. The problem is that the client is sending the server what may or may not be a valid media type in an encoding the server cannot understand (as per the Content-Encoding header the client packaged with the request message).
It's an edge-case and wouldn't be encountered when dealing with browser user-agents, but it could crop up in REST APIs accepting entity bodies to create/modify resources.
As I'm reading it, 415 Unsupported Media Type sounds like the most appropriate.
From RFC 2616:
10.4.16 415 Unsupported Media Type
The server is refusing to service the request because the entity of the request is in a format not supported by the requested resource for the requested method.
Yeah, the text part says "media type" rather than "encoding", but the actual description doesn't include any mention of that distinction.
The new hotness, RFC 7231, is even explicit about it:
6.5.13. 415 Unsupported Media Type
The 415 (Unsupported Media Type) status code indicates that the
origin server is refusing to service the request because the payload
is in a format not supported by this method on the target resource.
The format problem might be due to the request's indicated
Content-Type or Content-Encoding, or as a result of inspecting the
data directly.
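On the server side, the behaviour described in RFC 7231 could be sketched like this, using the example of a server that only understands deflate. The function name, the plain `dict` of headers and the return shape are assumptions; answering 400 for a corrupt body is a choice made for this example.

```python
import zlib

# Assumption: the codings this hypothetical server can decode.
SUPPORTED_ENCODINGS = {"identity", "deflate"}


def decode_request_body(headers, body):
    """Return (status, payload) for a request body given as bytes."""
    encoding = headers.get("Content-Encoding", "identity").strip().lower()
    if encoding not in SUPPORTED_ENCODINGS:
        # The coding itself is not supported: 415 per RFC 7231.
        return 415, f"unsupported Content-Encoding: {encoding}".encode()
    if encoding == "deflate":
        try:
            # HTTP "deflate" is the zlib format, which zlib.decompress reads.
            return 200, zlib.decompress(body)
        except zlib.error:
            # Supported coding, but the data is broken.
            return 400, b"body is not valid deflate data"
    return 200, body
```

A gzip-encoded POST would get a 415 here, while a deflate-encoded one is decoded and passed on.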
They should make that the final question on Who Wants To Be a Millionaire!
Well, the browser made a request that the server cannot service because the information the client provided is in a format that cannot be handled by the server. However, this isn't the server's fault for not supporting the data the client provided, it's the client's fault for not listening to the server's Accept-* headers and providing data in an inappropriate encoding. That would make it a Client Error (400 series error code).
My first instinct is 400 Bad Request is the appropriate response in this case.
405 Method Not Allowed isn't right because it refers to the HTTP verb being one that isn't allowed.
406 Not Acceptable looks like it might have promise, but it refers to the server being unable to provide data to the client that satisfies the Accept-* request headers that it sent. This doesn't seem like it would fit your case.
412 Precondition Failed is rather vaguely defined. It might be appropriate, but I wouldn't bet on it.
415 Unsupported Media Type isn't right because it's not the data type that's being rejected, it's the encoding format.
After that we get into the realm of non-standard response codes.
422 Unprocessable Entity describes a response that should be returned if the request was well-formed but if it was semantically incorrect in some way. This seems like a good fit, but it's a WebDAV extension to HTTP and not standard.
Given the above, I'd personally opt for 400 Bad Request. If any other HTTP experts have a better candidate though, I'd listen to them instead. ;)
UPDATE: I'd previously been referencing the HTTP statuses from their page on Wikipedia. Whilst the information there seems to be accurate, it's also less than thorough. Looking at the specs from W3C gives a lot more information on HTTP 406, and it's leading me to think that 406 might be the right code after all.
10.4.7 406 Not Acceptable
The resource identified by the request is only capable of generating
response entities which have content characteristics not acceptable
according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity
containing a list of available entity characteristics and location(s)
from which the user or user agent can choose the one most appropriate.
The entity format is specified by the media type given in the
Content-Type header field. Depending upon the format and the
capabilities of the user agent, selection of the most appropriate
choice MAY be performed automatically. However, this specification
does not define any standard for such automatic selection.
Note: HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.
If the response could be unacceptable, a user agent SHOULD temporarily
stop receipt of more data and query the user for a decision on further
actions.
While it does mention the Content-Type header explicitly, the wording mentions "entity characteristics", which you could read as covering stuff like GZIP versus DEFLATE compression.
One thing worth noting is that the spec says that it may be appropriate to just send the data as is, along with the headers to tell the client what format it's in and what encoding it uses, and just leave it for the client to sort out. So if the client sends a header indicating it accepts GZIP compression, but the server can only generate a response with DEFLATE, then sending that along with headers saying it's DEFLATE should be okay (depending on the context).
Client: Give me a GZIPPED page.
Server: Sorry, no can do. I can DEFLATE pack it for you. Here's the DEFLATE packed page. Is that okay for you?
Client: Welllll... I didn't really want DEFLATE, but I can decode it okay so I'll take it.
(or)
Client: I think I'll have to clear that with my user. Hold on.

Is HTTP POST request allowed to send back a response body?

As per the HTTP specification:
If a resource has been created on the
origin server, the response SHOULD
be 201 (Created) and contain an entity
which describes the status of the
request and refers to the new
resource, and a Location header
(see section 14.30).
Does this mean that POST request should always send redirect URI in Location header with no response body?
It is perfectly acceptable to specify a response body and use the Location header at the same time. When using the Location header with a 201 response, you're not redirecting the client, you're just telling it where it can find the resource in future.
Redirects only apply to 3xx responses.
The W3C docs for this explain further, though the text is actually quite ambiguous:
The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. For 201 (Created) responses, the Location is that of the new resource which was created by the request. For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource.
I read that as saying "...redirect... or... identif[y]... new resource", but it's not exactly a plain English sentence.
Based on paragraph 9.5 of the HTTP 1.1 specification, which is the reference for questions like that, here is my understanding:
Yes you can, and the specification is clear about what you can do and how to do it:
The action performed by the POST method might not result in a resource that can be identified by a URI. In this case, either 200 (OK) or 204 (No Content) is the appropriate response status, depending on whether or not the response includes an entity that describes the result.
If a resource has been created on the origin server, the response SHOULD be 201 (Created) and contain an entity which describes the status of the request and refers to the new resource, and a Location header (see section 14.30).
Responses to this method are not cacheable, unless the response includes appropriate Cache-Control or Expires header fields. However, the 303 (See Other) response can be used to direct the user agent to retrieve a cacheable resource.
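As an illustration of a 201 response that carries both a Location header and a body, here is a framework-independent sketch. The function name, the tuple it returns and the JSON body format are assumptions made for the example.

```python
import json


def created_response(resource_url, resource):
    """Build (status, headers, body) for a newly created resource."""
    body = json.dumps(resource).encode("utf-8")
    headers = {
        # Where the new resource lives; this is not a redirect.
        "Location": resource_url,
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    }
    return 201, headers, body


status, headers, body = created_response(
    "http://example.com/items/42", {"id": 42, "status": "created"}
)
```

The client receives the description of the result in the body and can use the Location value to fetch the resource later.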

Resources