Maximum value for the Cache-Control header in HTTP

I'm using Amazon S3 to serve static assets for my website. I want browsers to cache these assets for as long as possible. What metadata headers should I include with my assets?
Cache-Control: max-age=???

Generally one year is advised as a standard max value. See RFC 2616:
To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
Although that applies to the older Expires header, it makes sense to apply it to Cache-Control too in the absence of any explicit standards guidance. It's as long as you should generally need anyway, and picking an arbitrarily longer value could break some user agents. So:
Cache-Control: max-age=31536000
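
Since the question is specifically about S3, here is a minimal sketch of attaching that header at upload time with the AWS SDK for JavaScript v3. The bucket, key, and file paths are placeholders, and the optional immutable directive only makes sense if the filename is fingerprinted:

import { readFile } from "node:fs/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Upload one asset with a one-year Cache-Control header (placeholder names).
await s3.send(new PutObjectCommand({
  Bucket: "my-static-assets",            // hypothetical bucket
  Key: "js/app.abc123.js",               // fingerprinted key, so it can be cached "forever"
  Body: await readFile("dist/app.js"),
  ContentType: "application/javascript",
  CacheControl: "public, max-age=31536000, immutable",
}));

S3 stores Cache-Control as object metadata and echoes it back on every GET, so nothing else needs to be configured on the serving side.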

Consider not storing it for "as long as possible," and instead settling for as long as reasonable. For instance, it's unlikely you'd need to cache it for longer than say 10 years...am I right?
The RFC discusses max-age here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
Eric Lawrence says that prior to IE9, Internet Explorer would treat as stale any resource with a Cache-Control: max-age value over 2147483648 (2^31) seconds, approximately 68 years (http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx).
Other user agents will of course vary, so try to choose a number that is unlikely (rather than likely!) to cause an overflow. A max-age greater than 31536000 (one year) makes little sense, and informally this is considered a reasonable maximum value.

The people who created the recommendation of maximum 1 year caching did not think it through properly.
First of all, if a visitor is being served an outdated cached file, then why would it provide any benefit to have it suddenly load a fresh version after 1 year? If a file has 1 year TTL, from a functional perspective, it obviously means that the file is not intended to be changed at all.
So why would one need more than 1 year?
1) Why not? It doesn't serve any purpose to tell the visitor's browser "hey, this file is 1 year old, it might be an idea to check if it has been updated".
2) CDN services. Most content delivery networks use the cache header to decide how long to serve a file efficiently from the edge server. With 1-year cache control on the files, the CDN will at some point start re-requesting unchanged files from the origin server, and the edge cache will need to be entirely re-populated, causing slower loads for clients and unnecessary calls to the origin.
What is the point of having max 1 year? What browsers will choke on an amount set higher than 31536000?

Related

How long can an ETag be (storing AWS S3 ETag in database)

Assuming I want to store an ETag in a database column, what length should I allocate?
As far as I can tell there is no limit on the length of an ETag in the spec (https://www.rfc-editor.org/rfc/rfc7232#section-2.3). Even if I use a varchar(max), technically someone could use more than 2 billion characters in an ETag, but we know that's not realistic. We also know web servers will barf on more than a few KB in total headers (Maximum on HTTP header values?), so the limit is way lower than that.
Typically ETags are going to be hashes (they don't have to be; '5' is a perfectly valid ETag), so I'm thinking 64 bytes is a minimum (SHA512), and 100 is probably 'safe'. But does anyone have a better limit? What have people seen in the wild?
(I actually only care about AWS S3 ETag values; if someone has an answer for that specific case, I'll take it.)
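
For the S3-specific case, here is a hedged sketch of the two ETag shapes S3 is documented to produce: a quoted hex MD5 for a single PUT, and a hex MD5 of the concatenated binary part digests plus a part count for multipart uploads (objects encrypted with SSE-KMS or SSE-C are an exception and are not MD5-based):

import { createHash } from "node:crypto";

const body = Buffer.from("hello world");

// Single PUT, no SSE-KMS/SSE-C: quoted hex MD5 of the body.
const simpleEtag = `"${createHash("md5").update(body).digest("hex")}"`;
console.log(simpleEtag, simpleEtag.length);   // 32 hex chars + 2 quotes = 34

// Multipart upload: MD5 over the concatenated binary MD5s of the parts, then "-<partCount>".
const parts = [Buffer.from("part one"), Buffer.from("part two")];
const partDigests = Buffer.concat(parts.map((p) => createHash("md5").update(p).digest()));
const multipartEtag = `"${createHash("md5").update(partDigests).digest("hex")}-${parts.length}"`;
console.log(multipartEtag, multipartEtag.length);   // up to 40 chars with a 5-digit part count ("-10000")

So roughly 40 characters covers the documented S3 formats, though a wider column costs little if you ever need to store ETags from other origins.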

Why is the Nginx etag created from last-modified-time and content-length?

Nginx etag source
etag->value.len = ngx_sprintf(etag->value.data, "\"%xT-%xO\"",
                              r->headers_out.last_modified_time,
                              r->headers_out.content_length_n)
                  - etag->value.data;
r->headers_out.etag = etag;
If the file's last-modified time on the server changes but the file content has not been updated, will the etag value be the same?
Why is the etag not generated from a content hash?
Why is the etag not generated from a content hash?
Unless nginx has documented the reason, it's hard to say why.
My speculation is that they did it this way because it's very fast and only takes a constant amount of time. Computing a hash can be a costly operation, with the amount of time needed depending on the size of the response. nginx, with a reputation for simplicity and speed, may not have been willing to add that overhead.
If the file's last-modified time on the server changes but the file content has not been updated, will the etag value be the same?
No, it will not be the same and therefore the file will have to be re-served. The result is a slower response than you would get with a hash-based ETag, but the response will be correct.
The bigger concern with this algorithm is that the content could change while the ETag stays the same, in which case the response will be incorrect. This could happen if the file changes (in a way that keeps the same length) faster than the one-second precision of the Last-Modified time. (In theory a hash-based approach has the same issue—that is, it's possible for two different files to produce the same hash—but collisions are so unlikely that it's not a concern in practice.)
So presumably nginx weighed this tradeoff—a faster response, but one that has a slight chance of being incorrect—and decided that it was worth it.
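
For illustration, here is a rough TypeScript equivalent of the "%xT-%xO" format used above, applied to a local file; nginx itself uses the response's Last-Modified time and Content-Length, and the file name here is just an example:

import { statSync } from "node:fs";

// "%xT-%xO": last-modified time in seconds and content length, both in lowercase hex.
function nginxStyleEtag(path: string): string {
  const st = statSync(path);
  const mtimeSeconds = Math.floor(st.mtimeMs / 1000);
  return `"${mtimeSeconds.toString(16)}-${st.size.toString(16)}"`;
}

console.log(nginxStyleEtag("index.html"));   // e.g. "65a1b2c3-4d2" (mtime-length)

Touching the file without changing its contents changes the first half and therefore the ETag, which is exactly the behaviour described above.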

Maximum Cookie Size of current browsers (Year 2018)

From the django docs:
Both RFC 2109 and RFC 6265 state that user agents should support cookies of at least 4096 bytes. For many browsers this is also the maximum size.
Source: https://docs.djangoproject.com/en/2.1/ref/request-response/
Is this still valid today?
What is the maximum cookie size of current browsers?
The cookie spec in RFC 6265 (April 2011) is still the current one (there is no newer draft or RFC) and is supported by all major browsers (IE, Chrome, Opera, Firefox) today.
At least 4096 bytes for the entire cookie (as measured by the sum of all of the cookie names, values, and attributes).
At least 50 cookies per domain, provided they don't go over the above limit.
At least 3000 cookies total.
So all modern browsers support AT LEAST this. Any other limit values are a gamble.
See 6.1. Limits in https://datatracker.ietf.org/doc/rfc6265/ for more details
You can test it out by setting and reading back a cookie size from JavaScript in an iteration if you are interested in modern browsers only.
That is what I did in the past, and this is exactly what this site is about; it also includes the limits by browser.
But keep in mind that the matching cookies will travel with every HTTP request, so they could dramatically affect the perceived response time.
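
A sketch of that probing approach (browser TypeScript; the cookie name and the 4000-5000 byte search range are arbitrary, and remember the 4096-byte limit counts the name and attributes too, not just the value):

// Grow a throwaway cookie until the browser silently drops it.
function maxCookieValueSize(): number {
  let accepted = 0;
  for (let n = 4000; n <= 5000; n += 16) {
    const value = "x".repeat(n);
    document.cookie = `probe=${value}`;
    if (document.cookie.includes(value)) {
      accepted = n;                      // still readable, so it was stored
    } else {
      break;                             // dropped: we passed the limit
    }
  }
  document.cookie = "probe=; expires=Thu, 01 Jan 1970 00:00:00 GMT";   // clean up
  return accepted;
}

console.log(`Largest cookie value accepted: ~${maxCookieValueSize()} bytes`);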
Here are the details, which you can refer to at http://browsercookielimits.iain.guru/
Typically, the following are allowed:
300 cookies in total
4096 bytes per cookie
20 cookies per domain
81920 bytes per domain*
* Given 20 cookies of maximum size 4096 bytes: 20 × 4096 = 81920 bytes.

HTTP header weird field explanation

If you look at the response headers of http://www.yale.edu, there is a field such as
Expires: Sun, 19 Nov 1978 05:00:00 GMT.
What does it stand for?
If you want to look at it yourself, just type in terminal
curl -I http://www.yale.edu
A couple of years back, this was the main way of specifying when assets expire. Expires is simply a basic date-time stamp. It's fairly useful for old user agents which still roam uncharted territories. It is, however, important to note that the Cache-Control directives max-age and s-maxage take precedence on most modern systems. It's nevertheless good practice to set matching values here for the sake of compatibility. It's also important to ensure you format the date properly, or it might be considered expired.
taken from here
Since that date is in the past, the response is treated as already expired and will not be cached.
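
To follow the advice above about setting matching values, here is a minimal sketch of sending both headers from a Node server; the port and max-age are illustrative:

import { createServer } from "node:http";

const ONE_YEAR = 31536000;   // seconds

createServer((req, res) => {
  res.setHeader("Cache-Control", `public, max-age=${ONE_YEAR}`);                     // modern clients use this
  res.setHeader("Expires", new Date(Date.now() + ONE_YEAR * 1000).toUTCString());    // fallback in HTTP-date format
  res.end("cacheable response");
}).listen(8080);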

Pre-flush head tag with gzip support

http://developer.yahoo.com/performance/rules.html
There it says it is good to pre-flush the head tag.
But I have a question: will it help while using gzip? (I am using Apache 2.)
I think the full document will get gzipped in one shot and then sent to the client.
Or is it also possible to have gzip as well as pre-flushing of the head tag?
EDITED
The original version of this question suggested we were dealing with HTTP headers rather than the <head> section of an HTML document. I will leave my original answer below, but it actually has no relevance to this specific question.
To answer the question about pre-flushing the <head> section of a document - while it would be possible to do this in combination with gzip, it is probably not possible without more granular control over the gzip process than Apache affords. It is possible to break a gzipped stream into chunks that can be decompressed on their own (see this) but if there is a way to control Apache's gzip implementation to such a degree then I am not aware of it.
Doing so would likely decrease the efficacy of the gzip, making the compressed size larger, and would only be worth doing when the <head> of a document was particularly large, say, greater than 10KB (this is a somewhat arbitrary value I arrived at by reading about how gzip works under the bonnet, and should definitely not be taken as gospel).
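
As an illustration of that idea outside Apache, here is a rough Node sketch: a zlib sync flush emits a compressed chunk the browser can decompress immediately, at the cost of slightly worse compression. The paths, port, and delay are made up:

import { createServer } from "node:http";
import { createGzip, constants } from "node:zlib";

createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/html", "Content-Encoding": "gzip" });

  const gzip = createGzip();
  gzip.pipe(res);                             // Node falls back to chunked transfer encoding

  gzip.write('<!doctype html><html><head><link rel="stylesheet" href="/app.css"></head>');
  gzip.flush(constants.Z_SYNC_FLUSH);         // the compressed <head> bytes reach the client now

  setTimeout(() => {                          // pretend the body takes time to generate
    gzip.end("<body>...rest of the page...</body></html>");
  }, 500);
}).listen(8080);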
Original answer, relating to the HTTP headers:
Purely from the viewpoint of the HTTP protocol, rather than exactly how you would implement it on an Apache based server, I can't see any reason why you can't pre-flush the headers and also use gzip to compress the body. Keeping in mind the fact that the headers are never gzipped (if they were, how would the client know they had been?), the transfer encoding of the content should have no effect on when you send the headers.
There are, however, a couple of things to keep in mind:
Once the headers have been sent, you can't change your mind about the transfer encoding. So if you send headers stating that the body will be gzipped, then realise that your body is only 4 bytes, you would still have to gzip it anyway, which would actually increase the size of the body. This probably wouldn't be a problem unless you were omitting the Content-Length: header, which, while possible, means you must either use chunked transfer encoding or close the connection to mark the end of the response. This leads on to the next point...
You cannot send a Content-Length: header in this scenario. This is because you don't know what the size of the body is until you have compressed it, by which time it is ready to send, so you are not really (from the server's point of view) pre-flushing the headers, even if you do send them separately before you start to send the body.
If it takes you a long time to compress the body of the message (slow/heavily loaded server, very large body, etc.), and you don't start the compression until after you have sent the headers, there is a risk the client may time out waiting for the rest of the response. This depends entirely on the client, but there are so many HTTP client implementations out there that this possibility cannot be totally discounted.
In short, yes, it is possible to do it, but there is no catch-all "Yes, do it" or "No, don't do it" answer - whether you would do it depends on each request and the nature of its response.
