HTTP header weird field explanation - http

If you look at the response headers of http://www.yale.edu, there is a field such as
Expires: Sun, 19 Nov 1978 05:00:00 GMT.
What does it stand for?
If you want to look at it yourself, just type in a terminal:
curl -I http://www.yale.edu

A couple of years back, this was the main way of specifying when assets expire. Expires is simply a basic date-time stamp. It's still fairly useful for old user agents which roam uncharted territories. It is, however, important to note that the Cache-Control directives max-age and s-maxage take precedence on most modern systems. It's nevertheless good practice to set matching values here for the sake of compatibility. It's also important to ensure you format the date properly, or it might be considered expired.
taken from here
After that the response is no longer cached. See here
Also worth looking at.
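To avoid the malformed-date pitfall mentioned above, here is a minimal Python sketch (the one-year offset is an illustrative choice, not something from the original answer) that produces a correctly formatted Expires value alongside a matching Cache-Control:

from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Expires must be an HTTP-date in RFC 1123 format and in GMT,
# e.g. "Sun, 06 Nov 1994 08:49:37 GMT"; anything else risks being
# treated as already expired.
expires_at = datetime.now(timezone.utc) + timedelta(days=365)

headers = {
    # usegmt=True forces the "GMT" zone designator required for HTTP dates
    "Expires": format_datetime(expires_at, usegmt=True),
    # Modern caches prefer Cache-Control; keep the two values consistent
    "Cache-Control": "max-age=31536000",
}

for name, value in headers.items():
    print(f"{name}: {value}")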

Related

Why is the Nginx etag created from last-modified-time and content-length?

Nginx etag source
etag->value.len = ngx_sprintf(etag->value.data, "\"%xT-%xO\"",
                              r->headers_out.last_modified_time,
                              r->headers_out.content_length_n)
                  - etag->value.data;
r->headers_out.etag = etag;
If the file's last-modified time on the server is changed but the file content has not been updated, will the ETag value be the same?
Why isn't the ETag value generated from a content hash?
Why isn't the ETag value generated from a content hash?
Unless nginx has documented the reason, it's hard to say why.
My speculation is that they did it this way because it's very fast and only takes a constant amount of time. Computing a hash can be a costly operation, with the amount of time needed depending on the size of the response. nginx, with a reputation for simplicity and speed, may not have been willing to add that overhead.
If the file's last-modified time on the server is changed but the file content has not been updated, will the ETag value be the same?
No, it will not be the same and therefore the file will have to be re-served. The result is a slower response than you would get with a hash-based ETag, but the response will be correct.
The bigger concern with this algorithm is that the content could change while the ETag stays the same, in which case the response will be incorrect. This could happen if the file changes (in a way that keeps the same length) faster than the one-second precision of the Last-Modified time. (In theory a hash-based approach has the same issue—that is, it's possible for two different files to produce the same hash—but collisions are so unlikely that it's not a concern in practice.)
So presumably nginx weighed this tradeoff—a faster response, but one that has a slight chance of being incorrect—and decided that it was worth it.
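To make the tradeoff concrete, here is a rough Python sketch (an illustration only, not nginx's code) contrasting an mtime/size ETag in nginx's style with a hash-based one:

import hashlib
import os

def etag_nginx_style(path):
    # Mimics nginx's scheme: hex last-modified time and hex size.
    # Constant-time to compute, but it changes whenever the mtime changes.
    st = os.stat(path)
    return '"{:x}-{:x}"'.format(int(st.st_mtime), st.st_size)

def etag_content_hash(path):
    # Hash-based alternative: stable as long as the bytes are identical,
    # but it requires reading the whole file.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return '"{}"'.format(h.hexdigest())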

last-modified vs expires precedence

I have read through a lot of discussions, and most of them claim that Expires takes precedence over Last-Modified, meaning that if a response has already expired, the client will not even send If-Modified-Since to the server, and of course the response code will not be 304.
But my situation is totally weird. I return Last-Modified in the response, and somehow the CDN/proxy side adds an Expires header whose value is the same as the Date response header. I would expect identical Expires and Date values to make the response stale immediately, but in fact my client browser still sends the request with an If-Modified-Since header, which causes a 304 response code to be returned from the server.
I read through RFC 2616, and it doesn't say much about this either. So what is happening in this case?
Almost 2 years and still no answer...
I managed to find some references:
The freshness lifetime is calculated based on several headers. If a
"Cache-control: max-age=N" header is specified, then the freshness
lifetime is equal to N. If this header is not present, which is very
often the case, it is checked if an Expires header is present. If an
Expires header exists, then its value minus the value of the Date
header determines the freshness lifetime. Finally, if neither header
is present, look for a Last-Modified header. If this header is
present, then the cache's freshness lifetime is equal to the value of
the Date header minus the value of the Last-modified header divided by
10.
Although I can't find the precedence confirmed in the RFC, I think this MDN quote is reliable enough.
It is quite possible that some browsers don't implement it this way... So to avoid any issues, the best approach is to not return these two headers in the response at the same time.
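The heuristic quoted above is straightforward to express in code. This is only a sketch of the MDN description (header parsing is simplified and well-formed HTTP dates are assumed):

from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    # Freshness lifetime in seconds, per the quoted heuristic:
    # max-age, else Expires - Date, else (Date - Last-Modified) / 10.
    for directive in headers.get("Cache-Control", "").split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return float(directive[len("max-age="):])

    date, expires = headers.get("Date"), headers.get("Expires")
    if date and expires:
        return (parsedate_to_datetime(expires)
                - parsedate_to_datetime(date)).total_seconds()

    last_modified = headers.get("Last-Modified")
    if date and last_modified:
        return (parsedate_to_datetime(date)
                - parsedate_to_datetime(last_modified)).total_seconds() / 10

    return None  # nothing to base a freshness lifetime on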

Is If-Modified-Since strong or weak validation?

HTTP 1.1 states that ETag/If-None-Match validation can be either strong or weak. My question is: is Last-Modified/If-Modified-Since validation strong or weak?
This has implications whether sub-range requests can be made or not.
From http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p5-range-23.html#rfc.section.4.3:
"A response might transfer only a subrange of a representation if the connection closed prematurely or if the request used one or more Range specifications. After several such transfers, a client might have received several ranges of the same representation. These ranges can only be safely combined if they all have in common the same strong validator, where "strong validator" is defined to be either an entity-tag that is not marked as weak (Section 2.3 of [Part4]) or, if no entity-tag is provided, a Last-Modified value that is strong in the sense defined by Section 2.2.2 of [Part4]."
An ETag can be strong or weak, indicated by its prefix: a weak ETag is prefixed with W/. Normally it will be strong, except if you access dynamic content where the content management system (CMS) handles that, which is IMHO very uncommon.
However, Last-Modified/If-Modified-Since validation should be strong too, if and only if nobody manipulates the metadata of the files on the filesystem. On Linux that is pretty simple to do with the touch command, but I think you normally don't need to care about that. If somebody manipulates your server you have a different problem entirely.
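For reference, a small Python sketch of the two comparison rules (the W/ prefix marks a weak validator; only a strong comparison is safe for combining ranges):

def is_weak(etag):
    # Weak validators are marked with a "W/" prefix, e.g. W/"abc123"
    return etag.startswith("W/")

def strong_compare(a, b):
    # Strong comparison: both validators must be strong and byte-identical.
    return not is_weak(a) and not is_weak(b) and a == b

print(strong_compare('"abc"', '"abc"'))    # True
print(strong_compare('W/"abc"', '"abc"'))  # False: one validator is weak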

Pre flush head tag with gzip support

http://developer.yahoo.com/performance/rules.html
There it says it is good to pre-flush the head tag.
But I have a question: will it help while using gzip? (I am using Apache 2.)
I think the full document will get gzipped in one shot and then sent to the client.
Or is it also possible to have gzip as well as pre-flushing the head tag?
EDITED
The original version of this question suggested we were dealing with HTTP headers rather than the <head> section of an HTML document. I will leave my original answer below, but it actually has no relevance to this specific question.
To answer the question about pre-flushing the <head> section of a document - while it would be possible to do this in combination with gzip, it is probably not possible without more granular control over the gzip process than Apache affords. It is possible to break a gzipped stream into chunks that can be decompressed on their own (see this) but if there is a way to control Apache's gzip implementation to such a degree then I am not aware of it.
Doing so would likely decrease the efficacy of the gzip, making the compressed size larger, and would only be worth doing when the <head> of a document was particularly large, say, greater than 10KB (this is a somewhat arbitrary value I arrived at by reading about how gzip works under the bonnet, and should definitely not be taken as gospel).
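To illustrate the mechanism behind breaking a gzipped stream into independently decompressible chunks (this is only a sketch using zlib, not a way of controlling Apache's mod_deflate), a sync flush after the <head> lets the client decompress it before the rest of the body arrives:

import zlib

compressor = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container

head = b"<html><head>...stylesheets, scripts...</head>"
body = b"<body>...rest of the document...</body></html>"

# Flush after the head so those compressed bytes can be sent immediately;
# Z_SYNC_FLUSH keeps the stream valid for the data that follows,
# at a small cost in compression ratio.
first_chunk = compressor.compress(head) + compressor.flush(zlib.Z_SYNC_FLUSH)
second_chunk = compressor.compress(body) + compressor.flush(zlib.Z_FINISH)

# The client can inflate first_chunk on its own, before second_chunk arrives.
decompressor = zlib.decompressobj(wbits=31)
print(decompressor.decompress(first_chunk))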
Original answer, relating to the HTTP headers:
Purely from the viewpoint of the HTTP protocol, rather than exactly how you would implement it on an Apache-based server, I can't see any reason why you can't pre-flush the headers and also use gzip to compress the body. Keeping in mind the fact that the headers are never gzipped (if they were, how would the client know they had been?), the transfer encoding of the content should have no effect on when you send the headers.
There are, however, a couple of things to keep in mind:
Once the headers have been sent, you can't change your mind about the transfer encoding. So if you send headers which state that the body will be gzipped, then realise that your body is only 4 bytes, you would still have to gzip it anyway, which would actually increase the size of the body. This probably wouldn't be a problem unless you were omitting the Content-Length: header, which, while possible, is bad practice as it means you cannot use persistent connections. This leads on to the next point...
You cannot send a Content-Length: header in this scenario. This is because you don't know the size of the body until you have compressed it, by which time it is ready to send, so you are not really (from the server's point of view) pre-flushing the headers, even if you do send them separately before you start to send the body.
If it takes you a long time to compress the body of the message (slow/heavily loaded server, very large body, etc.), and you don't start the compression until after you have sent the headers, there is a risk the client may time out waiting for the rest of the response. This depends entirely on the client, but there are so many HTTP client implementations out there that this possibility cannot be totally discounted.
In short, yes, it is possible to do it, but there is no catch-all "Yes, do it" or "No, don't do it" answer - whether you would do it depends on each request and the nature of its response.

Max value for cache control header in HTTP

I'm using Amazon S3 to serve static assets for my website. I want to have browsers cache these assets for as long as possible. What meta-data headers should I include with my assets?
Cache-Control: max-age=???
Generally one year is advised as a standard max value. See RFC 2616:
To mark a response as "never expires," an origin server sends an
Expires date approximately one year from the time the response is
sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one
year in the future.
Although that applies to the older Expires standard, it makes sense to apply it to Cache-Control too in the absence of any explicit standards guidance. It's as long as you should generally need anyway, and picking an arbitrarily longer value could break some user agents. So:
Cache-Control: max-age=31536000
Consider not storing it for "as long as possible," and instead settling for as long as reasonable. For instance, it's unlikely you'd need to cache it for longer than say 10 years...am I right?
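For S3 specifically, a minimal boto3 sketch might look like the following (the bucket name, key, and local file are placeholders; CacheControl and Expires are standard put_object parameters):

from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

with open("site.css", "rb") as f:  # placeholder local file
    s3.put_object(
        Bucket="my-static-assets",        # placeholder bucket name
        Key="css/site.css",               # placeholder object key
        Body=f,
        ContentType="text/css",
        CacheControl="max-age=31536000",  # one year, the usual upper bound
        # A matching Expires for older agents, as discussed above
        Expires=datetime.now(timezone.utc) + timedelta(days=365),
    )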
The RFC discusses max-age here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
Eric Lawrence says that prior to IE9, Internet Explorer would treat as stale any resource with a Cache-Control: max-age value over 2147483648 (2^31) seconds, approximately 68 years (http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx).
Other user agents will of course vary, so...try and choose a number that is unlikely (rather than likely!) to cause an overflow. Max-age greater than 31536000 (one year) makes little sense, and informally this is considered a reasonable maximum value.
The people who created the recommendation of maximum 1 year caching did not think it through properly.
First of all, if a visitor is being served an outdated cached file, then why would it provide any benefit to have it suddenly load a fresh version after 1 year? If a file has 1 year TTL, from a functional perspective, it obviously means that the file is not intended to be changed at all.
So why would one need more than 1 year?
1) Why not? It doesn't serve any purpose to tell the visitor's browser "hey, this file is 1 year old, it might be an idea to check if it has been updated".
2) CDN services. Most content delivery networks use the cache header to decide how long to serve a file efficiently from the edge server. If you have 1 year cache control for the files, it will at some point start re-requesting non-changed files from the origin server, and the edge cache will need to be entirely re-populated, causing slower loads for clients and unnecessary calls to the origin.
What is the point of having max 1 year? What browsers will choke on an amount set higher than 31536000?

Resources