HTTP headers for the most permanent caching possible

For example: a stable release of jQuery will never change until the next version. When that happens, the URL will change.
Also, images such as a website logo can be cached, and when the logo changes I simply change the URL used to reference it.
The header I know of is
Expires: Tue, 01 Feb 2050 00:00:00 GMT
I believe there are one or more additional headers I can use to improve caching by proxies, and maybe there is something else I don't know about.
Are there any other headers I should know about?
Granted, the cache may get cleared for reasons beyond my control, but I want to cache as much as possible.
Also, this does not cover CSS/JavaScript minification or compilation, and it does not count image compression or content compression such as gzip.

Expires is an HTTP/1.0 header; HTTP/1.1 introduced the more versatile Cache-Control, which lets you specify not just an expiration date but also cacheability and revalidation behavior.
I recommend reading Mark Nottingham's Caching Tutorial.
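For a versioned resource such as a specific jQuery release, or a logo whose URL changes on every update, a response might combine both headers so that HTTP/1.0 and HTTP/1.1 caches understand it (the values below are only an illustration):
Cache-Control: public, max-age=31536000
Expires: Tue, 01 Feb 2050 00:00:00 GMT
Some newer clients also understand the immutable directive (Cache-Control: public, max-age=31536000, immutable), which hints that the response will never change and need not be revalidated even on refresh, though support varies.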

How in Flex to load cached preloaded images

In my application, I make numerous calls to preload images into the browser cache in the background using Loader instances, ignoring the complete event. I don't store the results in the application; rather, I want to store them in the browser cache. The images have far-future Expires headers.
When I want to use a particular image(s), I again use a Loader instance and call the same url and listen for the complete event to load the file to an Image.
The problem is that when I re-request the URL for the "cached" image, an HTTP request is made with a 200 response status, which I presume means it is hitting the server.
How do I make sure that a request for a cached image never hits the server from Flex?
In general, I am finding that any request to a URL for a cached image (with a far-future Expires header) makes another request to the server, or at least that is my interpretation of what Firebug shows.
Any ideas how to do this? Or am I misinterpreting what Firebug is telling me?
Thanks.
So, yes, I was misinterpreting Firebug. It turns out that Firebug logs the URL request and it looks like a normal request. However, if you monitor the network with a tool like Wireshark, you will notice that there are no outgoing packets to the URL for the cached images. Flex does load the cached images.
Just to be safe with the image caching, I added the following Cache-Control header (though I think the Expires header is enough; it is set one year out at the time of posting this).
Cache-Control: max-age=31536000, must-revalidate
Expires: Thu, 01 Dec 2011 16:00:00 GMT
So, if you set the cache headers correctly (note that an invalid date in the Expires header does not work), Flex will load from the cache when you request the URL of a cached image.
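For reference, a complete set of response headers for a long-cached image might look roughly like this (the content type and validator values are illustrative; Last-Modified or ETag is what later allows a cheap 304 revalidation if the entry ever does expire):
HTTP/1.1 200 OK
Content-Type: image/png
Cache-Control: max-age=31536000, must-revalidate
Expires: Thu, 01 Dec 2011 16:00:00 GMT
Last-Modified: Wed, 01 Dec 2010 16:00:00 GMT
ETag: "img-v1"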

Redirect browser to another location and force refresh

I would like to ask for suggestions regarding browser cache invalidation.
Let's assume we've got an index page that is returned to the client with http headers:
Cache-Control: public, max-age=31534761
Expires: Fri, 17 Feb 2012 18:22:04 GMT
Last-Modified: Thu, 17 Feb 2011 18:22:04 GMT
Vary: Accept-Encoding
Should the user try to hit that index page again, it is very likely that the browser won't even send a request to the server - it will just present the user with the cached version of the page.
My question is: is it possible to create a web resource (for instance at uri /invalidateIndex) such that when a user hits that resource he is redirected to the index page in a way that forces the browser to invalidate its cache and ask the server for fresh content?
I'm having similar problems with a project of my own, so I have a few suggestions, if you haven't already found some solution...
1. I've seen this as the way jQuery forces Ajax requests not to be cached: it adds an HTTP parameter to the URL with a random value or name, so that each new request effectively has a different URL and the browser then never uses the cache (see the sketch after this list). You could have the /invalidateIndex URI redirect to such a URL. The problem, of course, is that the browser never actually invalidates the original index URL, and that it will always re-request your index.
2. You could, of course, change the Cache-Control header to a smaller max-age, say down to an hour, so that the cache is invalidated every hour or so.
3. You could also use ETags, wherein the cached data carry a tag that is sent with each request, essentially asking the server whether the index has changed or not.
Options 2 and 3 can even be combined, I think...
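Roughly, the cache-busting variant of suggestion 1 looks like this on the wire (the timestamp parameter is made up, in the style of jQuery's cache:false option):
GET /index?_=1699999999999 HTTP/1.1
Host: example.com
Because the query string differs on every visit, no cached entry ever matches, at the cost of giving up caching for that URL entirely.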
There is no direct way of asking a browser to purge its cache of a particular file, but if you have only a few systems like this and plenty of bandwidth, you could try returning large objects on the same protocol, host, and port so that the cache starts evicting old objects. See https://bugzilla.mozilla.org/show_bug.cgi?id=81640 for example.

How to optimize caching of images on a webpage

I have a website that contains pages with many small images. The images are set to cache, with the headers containing:
Expires "Thu, 31 Dec 2037 23:55:55 GMT"
Cache-Control "public, max-age=315360000"
When someone loads a page, however, it seems that we are still forced to send a 304 response for each image, which is better than sending the whole image but still takes some time. Of course, this sort of caching is up to the browser, but is it possible to suggest to the browser that it use the cached images without making any request at all?
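(For reference, the per-image exchange described above looks roughly like this; the file name and date are made up.)
GET /images/icon-star.png HTTP/1.1
If-Modified-Since: Thu, 01 Jan 2015 00:00:00 GMT

HTTP/1.1 304 Not Modified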
If you have many small images on a page, consider making a CSS sprite with all the images - that will reduce the number of requests a lot. A List Apart explains the concept.
Take a look at RFC 2616, part of the HTTP/1.1 protocol:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
You'll see lots of options to play with, primarily intended for proxies, not browsers. You can't really force the browser to completely stop sending If-Modified-Since requests.
Older proxies in particular might ignore your Cache-Control hints; see the relevant paragraph on the aforementioned page:
Note that HTTP/1.0 caches might not implement Cache-Control and might only implement Pragma: no-cache (see section 14.32).
If you are really concerned about such short requests, check whether the HTTP keep-alive feature is enabled on your server (which has side effects of its own, of course).
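For instance, with keep-alive enabled a server typically advertises it roughly like this (the timeout and max values are only examples):
Connection: keep-alive
Keep-Alive: timeout=5, max=100
This lets the many small conditional requests reuse one TCP connection instead of opening a new connection for each image.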

What's the difference between Cache-Control: max-age=0 and no-cache?

The header Cache-Control: max-age=0 implies that the content is considered stale (and must be re-fetched) immediately, which is in effect the same thing as Cache-Control: no-cache.
I had this same question, and found some info in my searches (your question came up as one of the results). Here's what I determined...
There are two sides to the Cache-Control header. One side is where it can be sent by the web server (aka. "origin server"). The other side is where it can be sent by the browser (aka. "user agent").
When sent by the origin server
I believe max-age=0 simply tells caches (and user agents) the response is stale from the get-go, so they SHOULD revalidate the response (e.g. with an If-Modified-Since or If-None-Match header) before using a cached copy, whereas no-cache tells them they MUST revalidate before using a cached copy. From 14.9.1 What is Cacheable:
no-cache
...a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.
In other words, caches may sometimes choose to use a stale response (although I believe they have to then add a Warning header), but no-cache says they're not allowed to use a stale response no matter what. Maybe you'd want the SHOULD-revalidate behavior when baseball stats are generated in a page, but you'd want the MUST-revalidate behavior when you've generated the response to an e-commerce purchase.
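Put as response headers, the contrast being drawn is roughly this (the annotations are mine):
Cache-Control: max-age=0     (stale immediately; caches SHOULD revalidate before reuse)
Cache-Control: no-cache      (caches MUST revalidate before any reuse)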
Although you're correct in your comment that no-cache is not supposed to prevent storage, storage might in practice be another difference when using no-cache. I came across a page, Cache Control Directives Demystified, that says (I can't vouch for its correctness):
In practice, IE and Firefox have started treating the no-cache directive as if it instructs the browser not to even cache the page. We started observing this behavior about a year ago. We suspect that this change was prompted by the widespread (and incorrect) use of this directive to prevent caching.
...
Notice that of late, "cache-control: no-cache" has also started behaving like the "no-store" directive.
As an aside, it appears to me that Cache-Control: max-age=0, must-revalidate should basically mean the same thing as Cache-Control: no-cache. So maybe that's a way to get the MUST-revalidate behavior of no-cache, while avoiding the apparent migration of no-cache to doing the same thing as no-store (ie. no caching whatsoever)?
When sent by the user agent
I believe shahkalpesh's answer applies to the user agent side. You can also look at 13.2.6 Disambiguating Multiple Responses.
If a user agent sends a request with Cache-Control: max-age=0 (aka. "end-to-end revalidation"), then each cache along the way will revalidate its cache entry (e.g. with an If-None-Match or If-Modified-Since header) all the way to the origin server. If the reply is then 304 (Not Modified), the cached entity can be used.
On the other hand, sending a request with Cache-Control: no-cache (aka. "end-to-end reload") doesn't revalidate, and the caches along the way MUST NOT use a cached copy when responding.
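To make the user-agent side concrete, an end-to-end revalidation might look roughly like this (the resource and ETag value are hypothetical):
GET /logo.png HTTP/1.1
Host: example.com
Cache-Control: max-age=0
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"
With Cache-Control: no-cache on the request instead, the intermediate caches must pass the request through to the origin and may not answer from their stored copies.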
max-age=0
This is equivalent to clicking Refresh, which means, give me the latest copy unless I already have the latest copy.
no-cache
This is holding Shift while clicking Refresh, which means, just redo everything no matter what.
Old question now, but if anyone else comes across this through a search as I did, it appears that IE9 uses this distinction to configure the behaviour of resources when using the back and forward buttons. When max-age=0 is used, the browser will use the last version when viewing a resource on a back/forward press. If no-cache is used, the resource will be refetched.
Further details about IE9 caching can be seen in this MSDN caching blog post.
In my recent tests with IE8 and Firefox 3.5, it seems that both are RFC-compliant. However, they differ in their "friendliness" to the origin server. IE8 treats no-cache responses with the same semantics as max-age=0,must-revalidate. Firefox 3.5, however, seems to treat no-cache as equivalent to no-store, which sucks for performance and bandwidth usage.
Squid Cache, by default, seems to never store anything with a no-cache header, just like Firefox.
My advice would be to set public,max-age=0 for non-sensitive resources you want to have checked for freshness on every request, but still allow the performance and bandwidth benefits of caching. For per-user items with the same consideration, use private,max-age=0.
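For example, that advice would translate into response headers along these lines (illustrative only):
Cache-Control: public, max-age=0      (shared, non-sensitive resource, revalidated on every request)
Cache-Control: private, max-age=0     (per-user resource, revalidated on every request)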
I would avoid the use of no-cache entirely, as it seems it has been bastardized by some browsers and popular caches to the functional equivalent of no-store.
Additionally, do not emulate Akamai and Limelight. While they essentially run massive caching arrays as their primary business, and should be experts, they actually have a vested interest in causing more data to be downloaded from their networks. Google might not be a good choice for emulation, either. They seem to use max-age=0 or no-cache randomly depending on the resource.
max-age
When an intermediate cache is forced, by means of a max-age=0 directive, to revalidate its own cache entry, and the client has supplied its own validator in the request, the supplied validator might differ from the validator currently stored with the cache entry. In this case, the cache MAY use either validator in making its own request without affecting semantic transparency.
However, the choice of validator might affect performance. The best approach is for the intermediate cache to use its own validator when making its request. If the server replies with 304 (Not Modified), then the cache can return its now validated copy to the client with a 200 (OK) response. If the server replies with a new entity and cache validator, however, the intermediate cache can compare the returned validator with the one provided in the client's request, using the strong comparison function. If the client's validator is equal to the origin server's, then the intermediate cache simply returns 304 (Not Modified). Otherwise, it returns the new entity with a 200 (OK) response.
If a request includes the no-cache directive, it SHOULD NOT include min-fresh, max-stale, or max-age.
courtesy: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4
Don't accept this as answer - I will have to read it to understand the true usage of it :)
I'm hardly a caching expert, but Mark Nottingham is. Here are his caching docs. He also has excellent links in the References section.
Based on my reading of those docs, it looks like max-age=0 could allow the cache to send a cached response to requests that came in at the "same time" where "same time" means close enough together they look simultaneous to the cache, but no-cache would not.
By the way, it's worth noting that some mobile devices, particularly Apple products like the iPhone and iPad, completely ignore headers like no-cache, no-store, Expires: 0, or whatever else you may use to try to force them not to re-use expired form pages.
This has caused us no end of headaches. For example, a user's iPad is left asleep on a page reached through a form process, say step 2 of 3. The device then totally ignores the store/cache directives and, as far as I can tell, simply takes a virtual snapshot of the page in its last state. That is, it ignores what it was told explicitly, and not only that, it takes a page that should not be stored and stores it without actually checking it again, which leads to all kinds of strange session issues, among other things.
I'm just adding this in case someone comes along and can't figure out why they are getting session errors, particularly with iPhones and iPads, which seem by far to be the worst offenders in this area.
I've done fairly extensive debugger testing of this issue, and my conclusion is that the devices ignore these directives completely.
Even in regular use, I've found that some mobile browsers also fail to check for new versions via, say, Expires: 0 combined with last-modified dates, to determine whether they should fetch a new copy.
It simply doesn't happen, so what I was forced to do was add query strings to the css/js files I needed to force updates on, which tricks the stupid mobile devices into thinking it's a file it does not have, like: my.css?v=1, then v=2 for a css/js update. This largely works.
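In other words, each update simply changes the URL the device asks for (the file name and version numbers are just examples):
GET /my.css?v=1 HTTP/1.1     (first release; gets cached)
GET /my.css?v=2 HTTP/1.1     (after an update; looks like a new file, so it is fetched fresh)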
User browsers, by the way, if left at their defaults (as of 2016, and as I continuously discover, since we make a LOT of changes and updates to our site), also fail to check for last-modified dates on such files, but the query-string method fixes that issue. I've noticed this with clients and office people who use the default settings in their browsers and have no awareness of caching issues with CSS/JS: they almost invariably fail to get the new CSS/JS after a change. In other words, the defaults for their browsers, mostly MSIE/Firefox, are not doing what they are told to do; they ignore changes, ignore last-modified dates, and do not revalidate, even with Expires: 0 set explicitly.
This was a good thread with a lot of good technical information, but it's also important to note how bad the support for this stuff is, in mobile devices in particular. Every few months I have to add more layers of protection against their failure to follow the header commands they receive, or to properly interpret those commands.
This is answered directly in the MDN docs about cache control:
Most HTTP/1.0 caches don't support no-cache directives, so historically max-age=0 was used as a workaround. But only max-age=0 could cause a stale response to be reused when caches disconnected from the origin server. must-revalidate addresses that.
That's why the example below is equivalent to no-cache.
Cache-Control: max-age=0, must-revalidate
But for now, you can simply use no-cache instead.
And also in the MDN docs about cache validation:
It is often stated that the combination of max-age=0 and must-revalidate has the same meaning as no-cache.
Cache-Control: max-age=0, must-revalidate
max-age=0 means that the response is immediately stale, and must-revalidate means that it must not be reused without revalidation once it is stale — so in combination, the semantics seem to be the same as no-cache.
However, that usage of max-age=0 is a remnant of the fact that many implementations prior to HTTP/1.1 were unable to handle the no-cache directive — and so to deal with that limitation, max-age=0 was used as a workaround.
But now that HTTP/1.1-conformant servers are widely deployed, there's no reason to ever use that max-age=0-and-must-revalidate combination — you should instead just use no-cache.
One thing that (surprisingly) hasn't been mentioned is that a request can explicitly indicate that it will accept stale data, using the max-stale directive. In that case, if the server responded with max-age=0, the cache would merely consider the response stale, and would be free to use it to satisfy the client's request [which asked for potentially-stale data]. By contrast, if the server sends no-cache that really does trump any request by the client (with max-stale) for stale data, as the cache MUST revalidate.
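For completeness, such a request might look like this (hypothetical path; the optional value of max-stale is the number of seconds of staleness the client is willing to accept):
GET /scores.json HTTP/1.1
Host: example.com
Cache-Control: max-stale=600
Against a response that carried max-age=0, a cache could answer from its now-stale copy; against no-cache, it would still have to revalidate with the origin first.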
