What is the definitive way to avoid any kind of caching of HTTP data? We can modify the client as well as the server, so I think we can split the task between the two.
The client can append a random parameter to each request, e.g. http://URL/path?rand=6372637263. My feeling is that this alone is not 100% reliable - there may be intelligent proxies that can detect the trick. On the other hand, if the URL differs from the previous one, a proxy cannot simply decide to send back a cached response.
On the server we can control a bunch of HTTP headers:
Expires: Tue, 03 Jul 2001 06:00:00 GMT
Last-Modified: {now} GMT
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Any comments on this? What is the best approach?
Server-side cache control headers should look like:
Expires: Tue, 03 Jul 2001 06:00:00 GMT
Last-Modified: {now} GMT
Cache-Control: max-age=0, no-cache, must-revalidate, proxy-revalidate
Avoid rewriting URLs on the client, because it pollutes caches and causes other odd semantic issues. Furthermore:
Use one Cache-Control header (see RFC 2616), because behaviour with multiple entries is undefined. Also, the MSIE-specific entries in the second Cache-Control header are at best redundant.
no-store is about data security. (It only means "don't write this to disk" - caches are still allowed to store the response in memory.)
Pragma: no-cache is meaningless in a server response - it's a request header meaning that any caches receiving the request must forward it to the origin.
Using both Expires (HTTP/1.0) and Cache-Control (HTTP/1.1) is not redundant, since proxies exist that only speak HTTP/1.0 or will downgrade the protocol.
Technically, the Last-Modified header is redundant in light of no-cache, but it's a good idea to leave it in.
Some browsers will ignore subsequent directives in a cache-control header after they come across one they don't recognise - so put the important stuff first.
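Put together, the advice above can be sketched as a small helper that builds the recommended response headers. This is a sketch in Python; the function name is hypothetical, not part of any framework:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def no_cache_headers(now=None):
    """Build anti-caching response headers per the advice above.
    (Hypothetical helper, not a library function.)"""
    now = now or datetime.now(timezone.utc)
    return {
        # A date in the past covers HTTP/1.0 caches that ignore Cache-Control.
        "Expires": "Tue, 03 Jul 2001 06:00:00 GMT",
        "Last-Modified": format_datetime(now, usegmt=True),
        # One single Cache-Control header, most important directives first.
        "Cache-Control": "max-age=0, no-cache, must-revalidate, proxy-revalidate",
    }
```

Any server framework can then emit these as response headers; the point is that Cache-Control is a single header and the directive order puts the important ones first.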
Adding the header
Cache-Control: private
guarantees that a gateway cache won't cache such a response.
I'd also like to recommend Fabien Potencier's lecture about caching: http://www.slideshare.net/fabpot/caching-on-the-edge
To disable the cache, you should use
Expires: 0
Or
Cache-Control: no-store
If you use one, you should not use the other.
I see big players (e.g. Akamai) have started to drop the Expires header altogether and only use Cache-Control, e.g.:
curl -I https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-snc7/395029_379645875452936_1719075242_n.jpg
HTTP/1.1 200 OK
Last-Modified: Fri, 01 Jan 2010 00:00:00 GMT
Date: Sun, 25 Nov 2012 16:46:43 GMT
Connection: keep-alive
Cache-Control: max-age=1209600
So is there still any reason to keep using Expires?
Cache-Control was introduced in HTTP/1.1 to replace Expires. If both headers are present, Cache-Control takes precedence over Expires:
If a response includes both an Expires header and a max-age
directive, the max-age directive overrides the Expires header, even
if the Expires header is more restrictive. This rule allows an origin
server to provide, for a given response, a longer expiration time to
an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache. This might be
useful if certain HTTP/1.0 caches improperly calculate ages or
expiration times, perhaps due to desynchronized clocks.
But there are still clients out there that can only speak HTTP/1.0. So for HTTP/1.0 requests/responses you should still use Expires.
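The precedence rule quoted from the RFC can be illustrated with a small helper. This is a sketch (hypothetical function name, not a full RFC-compliant Cache-Control parser):

```python
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers, date_header):
    """Seconds the response stays fresh: max-age, if present,
    overrides Expires. (Hypothetical helper; simplified parsing.)"""
    cc = headers.get("Cache-Control", "")
    for part in cc.split(","):
        part = part.strip()
        if part.startswith("max-age="):
            return int(part[len("max-age="):])
    expires = headers.get("Expires")
    if expires:
        delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date_header)
        return int(delta.total_seconds())
    return 0  # no explicit lifetime; a cache would fall back to heuristics

# max-age wins even when Expires says "already stale":
hdrs = {"Cache-Control": "private, max-age=31536000",
        "Expires": "Fri, 25 Mar 2011 16:41:05 GMT"}
freshness_lifetime(hdrs, "Fri, 25 Mar 2011 16:41:05 GMT")  # → 31536000
```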
OK, I'm having a play with Expires headers in IIS 6 on our development server and I don't really get it!
So if I don't add an Expires header to a file, I get the following request/response when viewed with Firebug:
Accept */*
Accept-Encoding gzip, deflate
Accept-Language en-gb,en;q=0.5
Cache-Control no-cache
Connection keep-alive
Cookie __utma=222382046.267771103.1330592028.1337002926.1340787333.122; __utmz=222382046.1330592028.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=76038230.629470783.1340728034.1340728034.1340786921.2; __utmz=76038230.1340728034.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); timeOutCookie=Wed%20Jun%2027%202012%2011%3A17%3A22%20GMT+0100%20%28GMT%20Daylight%20Time%29; __utmb=76038230.26.10.1340786921; __utmb=222382046.5.10.1340787333; ASP.NET_SessionId=yhib5kyxf1m5azuhoogrstt5; __utmc=76038230; Travel2=ECC62DC4F9C36A41F3BCF0C54F96D877FEA32D4867DB1A3A97D0C6A3BE79EE98517B9B1A4E24289C863D86A2A4A846EA1FF4BF3822E8B6CBF872E25DD1ADF306F724EE1500AA71E28CFCD02476748163929B73856C505E50D185C05E6322488F
Host site
Pragma no-cache
Referer http://site/Agents/Flights/FlightSearch.aspx?
Response:
Accept-Ranges bytes
Content-Length 17864
Content-Type application/x-javascript
Date Wed, 27 Jun 2012 09:21:07 GMT
Etag "0de7d7f192dcd1:a07d"
Last-Modified Tue, 08 May 2012 12:53:00 GMT
Server Microsoft-IIS/6.0
X-Powered-By ASP.NET
Now if I press F5, the file is retrieved from the client cache - cool!
Now if I add the Expires header and press Ctrl+F5, I get a slightly different request/response:
Accept */*
Accept-Encoding gzip, deflate
Accept-Language en-gb,en;q=0.5
Cache-Control no-cache
Connection keep-alive
Cookie __utma=222382046.267771103.1330592028.1337002926.1340787333.122; __utmz=222382046.1330592028.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=76038230.629470783.1340728034.1340728034.1340786921.2; __utmz=76038230.1340728034.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); timeOutCookie=Wed%20Jun%2027%202012%2011%3A21%3A11%20GMT+0100%20%28GMT%20Daylight%20Time%29; __utmb=76038230.27.10.1340786921; __utmb=222382046.5.10.1340787333; ASP.NET_SessionId=yhib5kyxf1m5azuhoogrstt5; __utmc=76038230; Travel2=ECC62DC4F9C36A41F3BCF0C54F96D877FEA32D4867DB1A3A97D0C6A3BE79EE98517B9B1A4E24289C863D86A2A4A846EA1FF4BF3822E8B6CBF872E25DD1ADF306F724EE1500AA71E28CFCD02476748163929B73856C505E50D185C05E6322488F
Host site
Pragma no-cache
Referer http://site/Agents/Flights/FlightSearch.aspx?
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Response:
Accept-Ranges bytes
Cache-Control max-age=86400
Content-Length 17864
Content-Type application/x-javascript
Date Wed, 27 Jun 2012 09:24:41 GMT
Etag "0de7d7f192dcd1:a082"
Last-Modified Tue, 08 May 2012 12:53:00 GMT
Server Microsoft-IIS/6.0
X-Powered-By ASP.NET
Brilliant, I've now got a max-age in the Cache-Control. What's confusing me is that, as far as I can tell, this makes no practical difference to how the site performs in terms of downloads. If I press F5 it gets the file from the cache; if I press Ctrl+F5 it gets it from the server with an HTTP 200.
So how does this improve performance? How do you get an HTTP 304 instead of an HTTP 200? I just don't get what this practically achieves.
Any help would be appreciated, thanks.
When you set Expires or max-age explicitly, you are telling the client that it will be safe to cache the response for that much time. The client will happily get it from cache, it will not touch your server, there will be no 304. Unless you do Ctrl+F5, which forces the browser to do a full request anew, resulting in 200.
Now what if you set neither Expires nor max-age? This just means that the client will pick an expiration time by itself, heuristically. Your response is still cached; the browser just has to guess for how long.
So, Expires/max-age are useful in two cases.
If you want to recommend caching for a specific period of time—longer than a browser would guess. This is often done with versioned static content, which never changes, so expiration time is set on the order of years.
If you want to prevent caching, in which case you set Cache-Control: no-cache and Expires in the past (some versions of IE will ignore no-cache).
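As an aside, the heuristic most caches use when neither Expires nor max-age is present is the RFC 7234 suggestion of 10% of the time since Last-Modified. A rough sketch (helper name is hypothetical), using the dates from the IIS trace above:

```python
from email.utils import parsedate_to_datetime

def heuristic_freshness(date_header, last_modified):
    """RFC 7234 suggests 10% of the interval since Last-Modified when no
    explicit Expires/max-age is given. (Hypothetical helper.)"""
    age = parsedate_to_datetime(date_header) - parsedate_to_datetime(last_modified)
    return int(age.total_seconds() * 0.10)

# A file last modified ~50 days ago is heuristically fresh for ~5 days:
heuristic_freshness("Wed, 27 Jun 2012 09:21:07 GMT",
                    "Tue, 08 May 2012 12:53:00 GMT")
```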
Conditional requests, 304 and all that, only come into play after the content has already expired. To revalidate it, the client may do a conditional GET, which, depending on your server setup, may or may not result in a 304.
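The server-side decision behind that 304 can be sketched roughly like this (hypothetical helper; real servers also do proper date comparison, weak ETags, and so on):

```python
def respond(request_headers, etag, last_modified):
    """Return the status code for a (possibly conditional) GET.
    (Hypothetical helper illustrating 304 revalidation.)"""
    # If-None-Match takes precedence over If-Modified-Since.
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        return 304 if inm == etag else 200
    ims = request_headers.get("If-Modified-Since")
    # Simplified: exact string match; real servers parse and compare dates.
    if ims is not None and ims == last_modified:
        return 304
    return 200
```

With the ETag from the trace above, a repeat request carrying If-None-Match would get a 304 and skip re-downloading the 17 KB body.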
The performance improvements come from making fewer HTTP requests. When a browser is parsing a page and sees it has to request a CSS file, if it already has a copy in its cache with max-age=31536000, it knows its cached copy is good for one year and doesn't have to make the HTTP request to fetch the file.
Fewer round trips to the server should result in a faster-loading page and a better experience for users.
I have a particular HTTP response which I don't want cached, because it contains private/sensitive data.
I'm already setting Cache-Control to no-store,
which should handle clients supporting HTTP/1.1.
How do I use the Expires header to do the same for HTTP/1.0? Should I just set it with an arbitrary timestamp from 1970 or something? Is there a special value to tell it never to cache?
The HTTP RFC says:
To mark a response as "already expired," an origin server sends an Expires date that is equal to the Date header value.
You should set the Expires header to a date in the past, and you should also set the must-revalidate flag on the Cache-Control header.
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-control: no-cache, must-revalidate
You can find a good article dealing with caching issues on the doctype wiki:
Setting an Expires header in the past ensures that HTTP/1.0 and
HTTP/1.1 proxies and browsers will not cache the content. The
Cache-control directive also tells HTTP/1.1 proxies not to cache the
content. Even if proxies may be configured to return stale content
when they should not, the must-revalidate re-affirms that they SHOULD
NOT do it.
Here is the logo currently used on www.google.com:
http://www.google.com/images/logos/ps_logo2.png
Here's its HTTP response:
HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 05 Aug 2010 22:54:44 GMT
Date: Fri, 25 Mar 2011 16:41:05 GMT
Expires: Fri, 25 Mar 2011 16:41:05 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 26209
Age: 0
Via: 1.1 localhost.localdomain
The Cache-Control header says it's good for 1 year. But Expires is the same as Date, i.e. it's stale immediately.
Why the difference?
Cache-Control overrides Expires on any HTTP/1.1 cache or client.
So I assume Google wants to cache the image for HTTP/1.1 but not cache it at all for HTTP/1.0.
I don't know why Google cares. I would think they'd want to cache the logo even for older clients.
The reason is that Google wants the user to cache the image but not intermediate shared caches (hence the private directive).
Many intermediate cache systems can be outdated and ignore newer HTTP features (such as the Cache-Control header), so this approach stops them from caching the resource (via the Expires header). For the rest of the agents, which understand both, Cache-Control overrides the Expires header.
This is a common practice, referenced in RFC 2616 §14.9.3:
An origin server might wish to use a relatively new HTTP cache control feature, such as the "private" directive, on a network including older caches that do not understand that feature. The origin server will need to combine the new feature with an Expires field whose value is less than or equal to the Date value. This will prevent older caches from improperly caching the response.
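A sketch of that pattern - private caching for HTTP/1.1 clients plus an already-expired Expires for older caches (helper name is hypothetical):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def private_cache_headers(max_age, now=None):
    """HTTP/1.1 clients cache privately for max_age seconds; older
    HTTP/1.0 caches see an already-expired response.
    (Hypothetical helper illustrating the RFC 2616 §14.9.3 pattern.)"""
    now = now or datetime.now(timezone.utc)
    stamp = format_datetime(now, usegmt=True)
    return {
        "Date": stamp,
        "Expires": stamp,  # equal to Date => already expired for HTTP/1.0
        "Cache-Control": f"private, max-age={max_age}",
    }
```

This reproduces the shape of the Google response above: Expires equal to Date, and the real lifetime carried in Cache-Control.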
Often I use
Cache-Control: no-cache
or
Cache-Control: max-age=0
The spec says must-revalidate is for max-stale... (but does the server ever issue max-stale?)
So for normal web servers - Apache, or Rails with Mongrel - I think there is usually no max-stale, so must-revalidate is not needed?
must-revalidate should be specified by servers where it would be incorrect (and not just suboptimal) for a client to get a stale response. This applies to all requests with max-stale, as you mentioned. It also applies if a cache temporarily loses connectivity to the origin (a cache is allowed to return a stale entry with a Warning header in that case). That said, I think you are right that this directive is rarely needed in practice; it's seen most often where the origin wants to let a client cache a copy of the resource (for bandwidth conservation) but always validate it before use, as in:
Cache-Control: private, max-age=0, must-revalidate