What's the rationale behind the HTTP Date header? - http

I have read RFC 2616, but I still wonder what the Date field is for. There is the Last-Modified field, which actually has a meaning besides just serving as metadata, namely caching ('If-Modified-Since').
But what is the use of duplicating that information in a separate Date header?

Per the spec, it is used in age calculations. If you don't know what time the server thinks it is, you won't be able to calculate the "age" of a resource. Here's the relevant text from the spec:
Summary of age calculation algorithm, when a cache receives a response:
age_value
is the value of Age: header received by the cache with
this response.
date_value
is the value of the origin server's Date: header
request_time
is the (local) time when the cache made the request
that resulted in this cached response
response_time
is the (local) time when the cache received the
response
now
is the current (local) time
apparent_age = max(0, response_time - date_value);
corrected_received_age = max(apparent_age, age_value);
response_delay = response_time - request_time;
corrected_initial_age = corrected_received_age + response_delay;
resident_time = now - response_time;
current_age = corrected_initial_age + resident_time;
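As a concrete illustration of that algorithm, here is a minimal Python sketch (the function name and sample values are mine, not from the spec); a real cache would pull these values from the actual request and response:

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def current_age(age_header, date_header, request_time, response_time, now):
    # RFC 2616 13.2.3 age calculation; all datetimes must be timezone-aware.
    age_value = int(age_header or 0)                  # Age header, 0 if absent
    date_value = parsedate_to_datetime(date_header)   # origin server's Date header
    apparent_age = max(0.0, (response_time - date_value).total_seconds())
    corrected_received_age = max(apparent_age, age_value)
    response_delay = (response_time - request_time).total_seconds()
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = (now - response_time).total_seconds()
    return corrected_initial_age + resident_time

# Example: response received 5 seconds after the request was made
t_req = datetime(2012, 3, 26, 12, 53, 0, tzinfo=timezone.utc)
t_resp = datetime(2012, 3, 26, 12, 53, 5, tzinfo=timezone.utc)
print(current_age("0", "Mon, 26 Mar 2012 12:53:02 GMT",
                  t_req, t_resp, datetime.now(timezone.utc)))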

The Date header is needed only to make the Expires header work properly:
Date: Mon, 26 Mar 2012 12:53:02 GMT
Expires: Wed, 25 Apr 2012 12:53:02 GMT
A server or a client may have an incorrect clock, so the client (browser) uses the difference between Expires and Date to work out how long the resource stays fresh.
That was one of the reasons why the Cache-Control header was introduced.
It uses seconds until expiry instead of a fixed time.
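As a rough sketch of that difference (header values taken from the example above; the helper name is mine), a client can derive the freshness lifetime either from max-age directly or from Expires minus the server's Date, which sidesteps the client's own possibly skewed clock:

from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    # Simplified: exact header-name casing assumed.
    # Prefer Cache-Control: max-age, which is expressed in seconds.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name == "max-age":
            return int(value)
    # Otherwise fall back to Expires - Date; both come from the server's
    # clock, so a wrong client clock does not distort the result.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0  # no explicit freshness information

print(freshness_lifetime({
    "Date": "Mon, 26 Mar 2012 12:53:02 GMT",
    "Expires": "Wed, 25 Apr 2012 12:53:02 GMT",
}))  # -> 2592000 seconds (30 days)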
I tested Chrome and Firefox and they are fine with responses that have no Date header, so it can be safely omitted unless you are still using the obsolete Expires header. If Date is missing, it is simply assumed to be the same as the client's time.
It's just insane that the spec makes the header mandatory: formatting and parsing the date consumes CPU and network bandwidth.

Please consider not using the Date header, as it is on the list of "Forbidden header names".
The following description from the MDN web docs might help:
A forbidden header name is the name of any HTTP header that cannot be modified programmatically; specifically, an HTTP request header name (in contrast with a Forbidden response header name).
Modifying such headers is forbidden because the user agent retains full control over them. Names starting with Sec- are reserved for creating new headers safe from APIs using Fetch that grant developers control over headers, such as XMLHttpRequest.
Forbidden header names start with Proxy- or Sec-, or are one of the following names:
Accept-Charset
Accept-Encoding
Access-Control-Request-Headers
Access-Control-Request-Method
Connection
Content-Length
Cookie
Cookie2
Date
DNT
Expect
Host
Keep-Alive
Origin
Proxy-
Sec-
Referer
TE
Trailer
Transfer-Encoding
Upgrade
Via

Related

What happens if the origin web server sets the expires value in the response header to a time long in the past?

What happens if the origin web server sets the expires value in the response header to a time long in the past?
For instance, consider that the current time is Fri, 25 Jan 2013 GMT, and the Expires header is set as:
Expires: Thu, 01 Dec 1994 16:00:00 GMT
How will the client respond in the above case?
Any help would be appreciated.
Responding with a past date in the Expires header (earlier than the Date header value) makes no sense and would be a sign of some serious misconfiguration. That being said, clients would treat such a response as "already expired" and not cache it. The same applies if the Expires date is equal to the Date header value (which is the correct way for a server to mark the response as "already expired") or has an invalid format.
See the Expires section of RFC 2616 for details.
An expiry time pointing to a specific, fixed date in the past is often seen in relatively poor configs and/or relatively poor (beginners') code where the intent is to prevent the HTTP 1.1 client from caching the response. This will work, but a specific, fixed date in the past is plain ridiculous. A value of 0 would make more sense, or a value exactly equal to the Date response header, which is seen less often but is recommended by the HTTP Expires header specification, cited below (emphasis mine):
...
HTTP/1.1 clients and caches MUST treat other invalid date formats, especially including the value "0", as in the past (i.e., "already expired").
To mark a response as "already expired," an origin server sends an Expires date that is equal to the Date header value. (See the rules for expiration calculations in section 13.2.4.)
...
See also:
How to control web page caching, across all browsers? - you'll see that some of the lower rated answers mention a specific and fixed date in the past, which just evidences ignorance.
From RFC 7234:
If an origin server wishes to force a cache to validate every
request, it can assign an explicit expiration time in the past to
indicate that the response is already stale.
This makes clear that it is a valid behaviour, and that a client should treat the response as stale and validate it on every subsequent call.
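A minimal sketch of the check a cache might apply, assuming the dates from the question above (the function is illustrative, not taken from any particular cache implementation):

from email.utils import parsedate_to_datetime

def already_expired(date_header, expires_header):
    # Per RFC 2616 14.21: an invalid Expires value (e.g. "0"), or one that is
    # not later than the Date header, means the response is already stale.
    date = parsedate_to_datetime(date_header)
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return True  # unparseable date -> treat as in the past
    return expires <= date

print(already_expired("Fri, 25 Jan 2013 00:00:00 GMT",
                      "Thu, 01 Dec 1994 16:00:00 GMT"))  # True -> revalidate
print(already_expired("Fri, 25 Jan 2013 00:00:00 GMT", "0"))  # True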

What is Cache-Control: private?

When I visit chesseng.herokuapp.com I get a response header that looks like
Cache-Control:private
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/css
Date:Tue, 16 Oct 2012 06:37:53 GMT
Last-Modified:Tue, 16 Oct 2012 03:13:38 GMT
Status:200 OK
transfer-encoding:chunked
Vary:Accept-Encoding
X-Rack-Cache:miss
and then I refresh the page and get
Cache-Control:private
Connection:keep-alive
Date:Tue, 16 Oct 2012 06:20:49 GMT
Status:304 Not Modified
X-Rack-Cache:miss
so it seems like caching is working. If that works for caching, then what is the point of Expires and Cache-Control: max-age? To add to the confusion, when I test the page at https://developers.google.com/speed/pagespeed/insights/ it tells me to "Leverage browser caching".
Cache-Control: private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache, such as a proxy server.
From RFC2616 section 14.9.1
To answer your question about why caching is working, even though the web-server didn't include the headers:
Expires: [a date]
Cache-Control: max-age=[seconds]
The server kindly asked any intermediate proxies to not cache the contents (i.e. the item should only be cached in a private cache, i.e. only on your own local machine):
Cache-Control: private
But the server forgot to include any sort of caching hints:
they forgot to include Expires (so the browser knows to use the cached copy until that date)
they forgot to include Max-Age (so the browser knows how long the cached item is good for)
they forgot to include ETag (so the browser can do a conditional request)
But they did include a Last-Modified date in the response:
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Because the browser knows the date the file was modified, it can perform a conditional request. It will ask the server for the file, but instruct the server to only send the file if it has been modified since 2012/10/16 3:13:38:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
The server receives the request, realizes that the client has the most recent version already. Rather than sending the client 200 OK, followed by the contents of the page, it instead tells you that your cached version is good:
304 Not Modified
Your browser did have to suffer the round-trip delay of sending a request to the server, and waiting for the response, but it did save having to re-download the static content.
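A sketch of that conditional request using only the Python standard library; the URL is a placeholder, and urllib surfaces the 304 as an HTTPError:

import urllib.error
import urllib.request

req = urllib.request.Request("https://example.org/",  # placeholder URL
                             headers={"If-Modified-Since": "Tue, 16 Oct 2012 03:13:38 GMT"})
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, "- server sent a fresh copy of the content")
except urllib.error.HTTPError as e:
    if e.code == 304:
        print("304 Not Modified - reuse the locally cached copy")
    else:
        raise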
Why Max-Age? Why Expires?
Because Last-Modified sucks.
Not everything on the server has a date associated with it. If I'm building a page on the fly, there is no date associated with it - it's now. But I'm perfectly willing to let the user cache the homepage for 15 seconds:
200 OK
Cache-Control: max-age=15
If the user hammers F5, they'll keep getting the cached version for 15 seconds. If it's a corporate proxy, then all 67,198 users hitting the same page in the same 15-second window will all get the same contents - all served from the nearby cache. Performance win for everyone.
The virtue of adding Cache-Control: max-age is that the browser doesn't even have to perform a "conditional" request.
if you specified only Last-Modified, the browser has to perform an If-Modified-Since request, and watch for a 304 Not Modified response
if you specified max-age, the browser won't even have to suffer the network round-trip; the content will come right out of the cache.
The difference between "Cache-Control: max-age" and "Expires"
Expires is a legacy (c. 1998) equivalent of the modern Cache-Control: max-age header:
Expires: you specify a date (yuck)
max-age: you specify seconds (goodness)
And if both are specified, then the browser uses max-age:
200 OK
Cache-Control: max-age=60
Expires: Tue, 03 Apr 2018 19:28:37 GMT
Any web-site written after 1998 should not use Expires anymore, and instead use max-age.
What is ETag?
ETag is similar to Last-Modified, except that it doesn't have to be a date - it just has to be a something.
If I'm pulling a list of products out of a database, the server can send the last rowversion as an ETag, rather than a date:
200 OK
ETag: "247986"
My ETag can be the SHA1 hash of a static resource (e.g. image, js, css, font), or of the cached rendered page (i.e. this is what the Mozilla MDN wiki does; they hash the final markup):
200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
And exactly like in the case of a conditional request based on Last-Modified:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
304 Not Modified
I can perform a conditional request based on the ETag:
GET / HTTP/1.1
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
304 Not Modified
An ETag is superior to Last-Modified because it works for things besides files, or things that have no notion of a date. It just is.
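A minimal sketch of how a server might derive and check such an ETag; the SHA-1 hashing and the helper names are illustrative choices, not a prescribed implementation:

import hashlib

def make_etag(body: bytes) -> str:
    # Hash the rendered representation, as in the MDN example above.
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def respond(request_headers: dict, body: bytes):
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""     # client's copy is still valid
    return 200, {"ETag": etag}, body

page = b"<html>...</html>"
print(respond({}, page))                                   # first visit: 200 + body
print(respond({"If-None-Match": make_etag(page)}, page))   # revisit: 304, empty body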
RFC 2616, section 14.9.1:
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache...A private (non-shared) cache MAY cache the response.
Browsers could use this information. Of course, the current "user" may mean many things: OS user, a browser user (e.g. Chrome's profiles), etc. It's not specified.
For me, a more concrete example of Cache-Control: private is that proxy servers (which typically have many users) won't cache it. It is meant for the end user, and no one else.
FYI, the RFC makes clear that this does not provide security. It is about showing the correct content, not securing content.
This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
The Expires entity-header field gives the date/time after which the response is considered stale. The Cache-Control: max-age directive gives the age value (in seconds) beyond which the response is considered stale.
Although the header fields above give the client a mechanism to decide whether to send a request to the server, in some situations the client sends a request to the server because the age of the cached response is greater than the max-age value. Does that mean the server needs to send the resource to the client again? Maybe the resource never changed.
To resolve this problem, HTTP/1.1 provides the Last-Modified header. The server gives the client the last modification date of the response. When the client needs this resource again, it sends an If-Modified-Since header field to the server. If this date is before the modification date of the resource, the server sends the resource to the client with a 200 code. Otherwise, it returns a 304 code, which means the client can use the copy it has cached.
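A rough server-side sketch of that comparison (the timestamp matches the example earlier in this thread; the handler shape is an assumption, not any specific framework's API):

from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

LAST_MODIFIED = datetime(2012, 10, 16, 3, 13, 38, tzinfo=timezone.utc)

def handle_get(request_headers: dict, body: bytes):
    ims = request_headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(ims) >= LAST_MODIFIED:
        return 304, {}, b""   # not modified since the date the client has
    return 200, {"Last-Modified": format_datetime(LAST_MODIFIED, usegmt=True)}, body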

no 'Last-Modified' HTTP header -> however cached?

From a browser perspective,
What occurs if a component (image, script, stylesheet...) is served without a Last-Modified HTTP header field?
Is it still cached by the browser, even though it won't be able to perform a validity check (If-Modified-Since) in the future, due to the lack of date/time information?
Eg:
GET /foo.png HTTP/1.1
Host: example.org
--
200 OK
Content-Type: image/png
...
Is foo.png still cached?
--
Do you know of any online service that can serve a raw HTTP response that I write myself, so I can test what I'm asking?
Thank you.
Generally speaking, responses can be cached unless they explicitly say that they can't (e.g., with cache-control: no-store).
However, most caches will not store responses that don't have something that they can base freshness on, e.g., Cache-Control, Expires, or Last-Modified.
For the complete rules, see:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-13#section-2.1
See:
http://www.mnot.net/blog/2009/02/24/unintended_caching
for an example of how this can surprise some people.
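A very rough sketch of that storage decision (a simplification of the real rules in the draft linked above; header lookup is case-sensitive here for brevity):

def cache_can_store(headers: dict) -> bool:
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control:
        return False  # explicitly forbidden
    # Most caches also want something to base freshness or validation on.
    return any(h in headers for h in ("Cache-Control", "Expires", "Last-Modified"))

print(cache_can_store({"Content-Type": "image/png"}))      # False: nothing to go on
print(cache_can_store({"Cache-Control": "max-age=3600"}))  # True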
Yes, the image may get cached even without a Last-Modified response header.
The browser will then cache the image until its TTL expires. You can set the image's Time To Live using appropriate response headers, e.g. this would set the TTL to one hour:
Cache-Control: max-age=3600
Date: Tue, 29 Mar 2011 20:18:17 GMT
Expires: Tue, 29 Mar 2011 21:18:17 GMT
Even without any Last-Modified in the response, the browser may still use the Date header for subsequent If-Modified-Since requests.
I disabled the Last-Modified header on a large site and FF 13 doesn't take the contents from cache, although a max-age is given etc. Contents without a Last-Modified header ALWAYS get a status 200 OK when requested, not a 304, so the browser is not serving them from its cache.

What HTTP response headers are required

What HTTP response headers are required to be sent from server to the client?
I am working to optimize the HTTP response headers to minimize the HTTP response overhead. I know "overhead" is somewhat exaggerated, but I like a clean output.
I see a lot of websites sending redundant cache headers, etc.
e.g.
It is redundant to specify both Expires and Cache-Control: max-age, or to specify both Last-Modified and ETag.
Source
HTTP/1.1: Header Field Definitions
It depends on what you define as being required: there are no header fields that must be sent with every response no matter what the circumstances are, but there are header fields that you really should send. The only header field that comes close is Date, but even it has circumstances under which it is not required.
In the parlance of RFC 2119, the term MUST means that something is a requirement of the specification and not meeting the requirement would be invalid. There are no header fields defined by RFCs 7230, 7231, 7232, 7233, 7234, or 7235 that MUST be sent by an origin server in all cases.
The following headers, for example, can be omitted (though you probably should send them):
7.1.1.2. Date
An origin server MUST NOT send a Date header field if it does not
have a clock capable of providing a reasonable approximation of the
current instance in Coordinated Universal Time. An origin server MAY
send a Date header field if the response is in the 1xx
(Informational) or 5xx (Server Error) class of status codes. An
origin server MUST send a Date header field in all other cases.
Note the last sentence of the quote. The Date header field MUST be sent if the origin server is capable of providing a "reasonable approximation" of the date in UTC, but there is nothing stopping a server from misrepresenting itself.
7.4.2. Server
An origin server MAY generate a Server field in its responses.
3.3.2. Content-Length
Aside from [a finite number of predefined cases], in the absence of
Transfer-Encoding, an origin server SHOULD send a Content-Length
header field when the payload body size is known prior to sending the
complete header section.
On the subject of Content-Length and Transfer-Encoding, note that it is possible for neither to be sent, in which case the length of the response is "determined by the number of octets received prior to the server closing the connection."
3.1.1.5. Content-Type
If a Content-Type header field is not present, the recipient
MAY either assume a media type of application/octet-stream
(RFC2046, Section 4.5.1) or examine the data to determine its type.
There are circumstances under which particular headers can be required, for example:
An origin server that does not support persistent connections MUST send Connection: close in every response that does not have a 1xx status code.
An origin server MUST generate an Allow header in a 405 (Method Not Allowed) response.
An origin server generating a 401 (Unauthorized) response MUST send a WWW-Authenticate header field containing at least one challenge.
It depends on the specifics of the response, but generally, a response from an origin server should have:
Date
Content-Type
Server
and either Content-Length, Transfer-Encoding or Connection: close.
If you want to do caching, add Cache-Control (e.g., with max-age); Expires isn't generally necessary any more. If you want clients to be able to validate, add Last-Modified or ETag.
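As a sketch of that recommendation, here is a tiny origin server using Python's http.server (the values are illustrative); send_response already emits the Date and Server headers, and the rest are added explicitly:

from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    server_version = "DemoServer/0.1"   # value reported in the Server header

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)         # also emits the Date and Server headers
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", "max-age=60")   # optional caching hint
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()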

How to cache an HTTP POST response?

I would like to create a cacheable HTTP response for a POST request.
My actual implementation responds with the following for the POST request:
HTTP/1.1 201 Created
Expires: Sat, 03 Oct 2020 15:33:00 GMT
Cache-Control: private,max-age=315360000,no-transform
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 9
ETag: 2120507660800737950
Last-Modified: Wed, 06 Oct 2010 15:33:00 GMT
.........
But it looks like the browsers (Safari and Firefox tested) are not caching the response.
In the HTTP RFC the corresponding part says:
Responses to this method are not cacheable unless the response includes appropriate Cache-Control or Expires header fields. However, the 303 (See Other) response can be used to direct the user agent to retrieve a cacheable resource.
So I think it should be cached. I know I could set a session variable and set a cookie and do a 303 redirect, but I want to cache the response of the POST request.
Is there any way to do this?
P.S.: I started with a simple 200 OK; that does not work either.
I'd also note that caching is always optional (it's a MAY in the HTTP/1.1 RFC). Since under most circumstances, a successful POST invalidates a cache entry, it's probably simply the case that the browser caches you're looking at just don't implement caching POST responses (since this would be pretty uncommon--usually this is accomplished by formatting things as a GET, which it sounds like you've done).
Short answer: POST caching rarely makes sense. A cache may serve GET requests to a URL which is the same as that of a previous POST, whose response came with a Content-Location header containing the POST's request URI.
From rfc-7231 (http-bis, superseding rfc-2616):
Responses to POST requests are only cacheable when they include
explicit freshness information (see Section 4.2.1 of [RFC7234]).
However, POST caching is not widely implemented. For cases where an
origin server wishes the client to be able to cache the result of a
POST in a way that can be reused by a later GET, the origin server
MAY send a 200 (OK) response containing the result and a
Content-Location header field that has the same value as the POST's
effective request URI (Section 3.1.4.2).
See also Mark Nottingham's blog:
POSTs don't deal in representations of identified state, 99 times out
of 100. However, there is one case where it does; when the server goes
out of its way to say that this POST response is a representation of
its URI, by setting a Content-Location header that's the same as the
request URI. When that happens, the POST response is just like a GET
response to the same URI; it can be cached and reused -- but only for
future GET requests.
The RFC also describes a POST-redirect-GET (PRG) sequence, which has a similar effect, allowing the response cycle to a POST to fill the cache for a subsequent GET - and which is probably more widely implemented.
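A hedged sketch of the Content-Location variant, again using Python's http.server (path, payload, and max-age are illustrative); serve it with HTTPServer as in any BaseHTTPRequestHandler setup:

from http.server import BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body so the connection stays well-behaved.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b'{"result": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Content-Location", self.path)    # same as the request URI
        self.send_header("Cache-Control", "max-age=300")   # explicit freshness
        self.end_headers()
        self.wfile.write(body)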
Can you try changing the Cache-Control to public instead of private and see if it works?
