OK, I'm having a play with Expires headers in IIS6 on our development server and I don't really get it!
If I don't add an Expires header to a file, I get the following request/response when viewed with Firebug:
Accept */*
Accept-Encoding gzip, deflate
Accept-Language en-gb,en;q=0.5
Cache-Control no-cache
Connection keep-alive
Cookie __utma=222382046.267771103.1330592028.1337002926.1340787333.122; __utmz=222382046.1330592028.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=76038230.629470783.1340728034.1340728034.1340786921.2; __utmz=76038230.1340728034.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); timeOutCookie=Wed%20Jun%2027%202012%2011%3A17%3A22%20GMT+0100%20%28GMT%20Daylight%20Time%29; __utmb=76038230.26.10.1340786921; __utmb=222382046.5.10.1340787333; ASP.NET_SessionId=yhib5kyxf1m5azuhoogrstt5; __utmc=76038230; Travel2=ECC62DC4F9C36A41F3BCF0C54F96D877FEA32D4867DB1A3A97D0C6A3BE79EE98517B9B1A4E24289C863D86A2A4A846EA1FF4BF3822E8B6CBF872E25DD1ADF306F724EE1500AA71E28CFCD02476748163929B73856C505E50D185C05E6322488F
Host site
Pragma no-cache
Referer http://site/Agents/Flights/FlightSearch.aspx?
Response:
Accept-Ranges bytes
Content-Length 17864
Content-Type application/x-javascript
Date Wed, 27 Jun 2012 09:21:07 GMT
Etag "0de7d7f192dcd1:a07d"
Last-Modified Tue, 08 May 2012 12:53:00 GMT
Server Microsoft-IIS/6.0
X-Powered-By ASP.NET
If I now press F5, the browser retrieves the file from its cache. Cool!
If I add the Expires header and press Ctrl+F5, I get a slightly different request/response:
Accept */*
Accept-Encoding gzip, deflate
Accept-Language en-gb,en;q=0.5
Cache-Control no-cache
Connection keep-alive
Cookie __utma=222382046.267771103.1330592028.1337002926.1340787333.122; __utmz=222382046.1330592028.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=76038230.629470783.1340728034.1340728034.1340786921.2; __utmz=76038230.1340728034.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); timeOutCookie=Wed%20Jun%2027%202012%2011%3A21%3A11%20GMT+0100%20%28GMT%20Daylight%20Time%29; __utmb=76038230.27.10.1340786921; __utmb=222382046.5.10.1340787333; ASP.NET_SessionId=yhib5kyxf1m5azuhoogrstt5; __utmc=76038230; Travel2=ECC62DC4F9C36A41F3BCF0C54F96D877FEA32D4867DB1A3A97D0C6A3BE79EE98517B9B1A4E24289C863D86A2A4A846EA1FF4BF3822E8B6CBF872E25DD1ADF306F724EE1500AA71E28CFCD02476748163929B73856C505E50D185C05E6322488F
Host site
Pragma no-cache
Referer http://site/Agents/Flights/FlightSearch.aspx?
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Response:
Accept-Ranges bytes
Cache-Control max-age=86400
Content-Length 17864
Content-Type application/x-javascript
Date Wed, 27 Jun 2012 09:24:41 GMT
Etag "0de7d7f192dcd1:a082"
Last-Modified Tue, 08 May 2012 12:53:00 GMT
Server Microsoft-IIS/6.0
X-Powered-By ASP.NET
Brilliant, I've now got a max-age in the Cache-Control header. What's confusing me is that, as far as I can tell, this makes no practical difference to how the site performs in terms of downloads. If I press F5 it gets the file from the cache; if I press Ctrl+F5 it gets it from the server with an HTTP 200.
So how does this improve performance? How do you get an HTTP 304 instead of an HTTP 200? I just don't get what this achieves in practice.
Any help would be good, thanks.
When you set Expires or max-age explicitly, you are telling the client that it will be safe to cache the response for that much time. The client will happily get it from cache, it will not touch your server, there will be no 304. Unless you do Ctrl+F5, which forces the browser to do a full request anew, resulting in 200.
Now what if you set neither Expires nor max-age? This just means that the client will pick an expiration time by itself, heuristically. Your response is still cached, only the browser has to guess for how long.
So, Expires/max-age are useful in two cases.
If you want to recommend caching for a specific period of time—longer than a browser would guess. This is often done with versioned static content, which never changes, so expiration time is set on the order of years.
If you want to prevent caching, in which case you set Cache-Control: no-cache and Expires in the past (some versions of IE will ignore no-cache).
Conditional requests, 304 and all that, only come to play after the content has already expired. To revalidate it, the client might do a conditional GET, which, depending on your server setup, may or may not result in 304.
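As a rough illustration of the rules above, here is a minimal Python sketch of how a cache might work out how long a response stays fresh. The function name `freshness_lifetime` and the 10% heuristic fraction are assumptions made for the example (the HTTP specs mention 10% of the time since Last-Modified as a typical choice, but real browsers differ), and the sketch ignores directives such as no-cache and no-store.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, heuristic_fraction=0.1):
    """Return the number of seconds a cache may treat a response as fresh.

    Order of precedence: explicit max-age, then Expires minus Date, then a
    heuristic guess based on Last-Modified (assumed fraction, see above).
    """
    match = re.search(r"\bmax-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return int(match.group(1))

    date = parsedate_to_datetime(headers["Date"])
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - date).total_seconds()))

    if "Last-Modified" in headers:
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        age_of_file = (date - last_modified).total_seconds()
        return max(0, int(age_of_file * heuristic_fraction))

    return 0  # nothing to go on: treat as already stale


# The two IIS6 responses from the question.
without_max_age = {
    "Date": "Wed, 27 Jun 2012 09:21:07 GMT",
    "Last-Modified": "Tue, 08 May 2012 12:53:00 GMT",
}
with_max_age = {
    "Cache-Control": "max-age=86400",
    "Date": "Wed, 27 Jun 2012 09:24:41 GMT",
    "Last-Modified": "Tue, 08 May 2012 12:53:00 GMT",
}

print(freshness_lifetime(without_max_age))  # heuristic guess
print(freshness_lifetime(with_max_age))     # explicit: 86400
```

Under the 10% assumption the first response would be treated as fresh for roughly five days, which is why F5 served it from cache even without an explicit header. The second is fresh for exactly one day.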
The performance improvements come from making fewer HTTP requests. When a browser is parsing a page and sees it has to request a CSS file, if it already has a copy in its cache with max-age=31536000, it knows its cached copy of the file is good for 1 year and doesn't have to make the HTTP request to fetch the file.
Fewer round trips to the server should result in a faster-loading page and a better experience for users.
Related
A grid of EC2 web servers is running behind an ELB load balancer. The ELB is behind Amazon's CloudFront content delivery network. Content Delivery Networks are very new to me. My understanding is that CloudFront is supposed to speed up performance by caching static content at its "edges". But this isn't what's happening.
Consider my EC2 instances whose content should always have a lifetime of five minutes. For static content this usually means declaring the following in my web.config file:
<staticContent>
<clientCache cacheControlCustom="public" cacheControlMode="UseMaxAge" cacheControlMaxAge="00.00:05:00"/>
</staticContent>
...and for the dynamic stuff, it usually means executing the following commands against an HttpResponse object:
resp.Cache.SetCacheability(HttpCacheability.Public);
resp.Cache.SetMaxAge(TimeSpan.FromMinutes(5));
With that as background...
When my browser hits the ELB directly, everything works as expected. Firebug consistently shows that 304 (Not Modified) is returned for content that exists in the browser's cache, has passed its five minute expiration, but has not been changed on the server. Here are the response headers for a download of defs.js, for example:
HTTP/1.1 304 Not Modified
Accept-Ranges: bytes
Cache-Control: public,max-age=300
Date: Tue, 22 Apr 2014 13:54:16 GMT
Etag: "0152435d158cf1:0"
Last-Modified: Tue, 15 Apr 2014 17:36:18 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Connection: keep-alive
IIS correctly sees that the file hasn't been changed since April 15th and returns 304.
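The decision the server makes here can be sketched roughly as follows. This is an illustrative Python model, not IIS's actual implementation; the function name `conditional_status` is made up for the example. It checks If-None-Match first and only falls back to If-Modified-Since when no ETag validator was sent.

```python
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop the weak-validator prefix so W/"x" compares equal to "x"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        if "*" in candidates or _strip_weak(etag) in candidates:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200

    return 200  # unconditional request: send the full body


# Validators taken from the defs.js response above.
status = conditional_status(
    {
        "If-None-Match": '"0152435d158cf1:0"',
        "If-Modified-Since": "Tue, 15 Apr 2014 17:36:18 GMT",
    },
    etag='"0152435d158cf1:0"',
    last_modified="Tue, 15 Apr 2014 17:36:18 GMT",
)
print(status)  # 304
```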
But look what happens when the file is grabbed through CloudFront.
HTTP/1.1 200 OK
Content-Type: application/x-javascript
Content-Length: 205
Connection: keep-alive
Accept-Ranges: bytes
Cache-Control: public,max-age=300
Date: Tue, 22 Apr 2014 14:07:33 GMT
Etag: "0152435d158cf1:0"
Last-Modified: Tue, 15 Apr 2014 17:36:18 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Age: 16
X-Cache: Hit from cloudfront
Via: 1.1 0f140ef1be762325ad24a7167aa57e65.cloudfront.net (CloudFront)
X-Amz-Cf-Id: Evfdhs-pxFojnzkQWuG-Ubp6B2TC5xbunhavG8ivXURdp2fw_noXjw==
In this case CloudFront forces the browser to download the entire file again even though, as you can see:
(a) it knows the file hasn't been modified since April 15th (see Last-Modified header), and
(b) CloudFront does have a cached copy of the file on hand (see X-Cache header)
Perhaps you're wondering if my browser is sending a valid If-Modified-Since header. Indeed it is. Here are the request headers:
GET /code/shared/defs.js HTTP/1.1
Host: d2fn6fv5a0cu3b.cloudfront.net
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://d2fn6fv5a0cu3b.cloudfront.net/
Connection: keep-alive
If-Modified-Since: Tue, 15 Apr 2014 17:36:18 GMT
If-None-Match: "0152435d158cf1:0"
Cache-Control: max-age=0
It's an odd situation. If I just sit in front of my browser and keep doing page Reloads (Cmd-R), maybe about half the time CloudFront will correctly return a 304 and the other half of the time it'll incorrectly return 200 along with all of the content. Waiting for the five minute expiration before interacting with the page yields primarily 200's and only a few 304's. This odd behavior applies to all of the files (.css, .js, .png, etc.) referenced on the HTML page as well as for the containing HTML page itself. I know my app is coded properly because as mentioned above, hitting the ELB directly without going through CloudFront results in the expected 304 result. Any ideas?
The answer was found in an obscure sentence written in a seemingly unrelated piece of Amazon documentation:
When you configure CloudFront to forward cookies to your origin [...] If-Modified-Since and If-None-Match conditional requests are not supported.
Strange, but the reality of the situation is far worse: it's not that forwarding cookies to your origin servers disables conditional requests, but rather that it disables them sometimes, to the point where the HTTP result code (304 vs 200) is virtually random.
It's important to note that you'll be bitten by this bizarre behavior even if you're not using cookies at all. It's still absolutely essential that the Forward Cookies drop-down be set to "None" as shown in the image below:
Switching the setting to "None" fixes the errant behavior described in my original post.
This solution presents you with another problem though. You're telling CloudFront to totally strip out all cookies prior to forwarding the request to your origin. But your origin server might need those cookies. Further, if you're using the ELB (load balancer) as your origin, a critical cookie that the ELB depends upon to maintain sticky sessions will be totally dropped. Not good.
The solution to the cookie-stripping problem will depend on how your site is organized. In my case, transmission of cookies (session-related or otherwise) is only necessary when posting AJAX data to myDomain.com/ajax/. Because all cookie-dependent URLs fall under ajax/*, I had to create a new behavior rule for that path; in that rule, and that rule only, the Forward Cookies drop-down is set to "All" instead of "None."
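To make the resulting setup concrete, here is a small Python model of the two rules described above. It is only an illustration of the routing logic, not the CloudFront API: the `BEHAVIORS` list and the `forward_cookies_for` function are hypothetical names, and the real matching is done by CloudFront itself, which as far as I know checks path patterns in the listed order and applies the default behavior last.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the distribution's behaviors, most specific first.
BEHAVIORS = [
    ("ajax/*", "All"),  # cookie-dependent AJAX endpoints keep their cookies
    ("*", "None"),      # default: strip cookies so conditional requests work
]


def forward_cookies_for(path):
    """Return the Forward Cookies setting that would apply to a request path."""
    path = path.lstrip("/")
    for pattern, setting in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return setting
    return "None"


print(forward_cookies_for("/ajax/save"))            # All
print(forward_cookies_for("/code/shared/defs.js"))  # None
```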
So there it is. Hope this helps someone.
I see big players (e.g. Akamai) have started to drop the Expires header altogether and only use Cache-Control, e.g.
curl -I https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-snc7/395029_379645875452936_1719075242_n.jpg
HTTP/1.1 200 OK
Last-Modified: Fri, 01 Jan 2010 00:00:00 GMT
Date: Sun, 25 Nov 2012 16:46:43 GMT
Connection: keep-alive
Cache-Control: max-age=1209600
So is there still any reason to keep using Expires?
Cache-Control was introduced in HTTP 1.1 to replace Expires. If both headers are present, Cache-Control is preferred over Expires:
If a response includes both an Expires header and a max-age
directive, the max-age directive overrides the Expires header, even
if the Expires header is more restrictive. This rule allows an origin
server to provide, for a given response, a longer expiration time to
an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache. This might be
useful if certain HTTP/1.0 caches improperly calculate ages or
expiration times, perhaps due to desynchronized clocks.
But there are still clients out there that can only speak HTTP 1.0. So for HTTP 1.0 requests/responses, you should still use Expires.
What is the definitive solution for avoiding any kind of caching of HTTP data? We can modify the client as well as the server, so I think we can split the task between the two.
The client can append a random parameter to each request, e.g. http://URL/path?rand=6372637263. My feeling is that this alone does not work 100% of the time; there might be some intelligent proxies that can detect it. On the other hand, I think that if the URL differs from the previous one, a proxy cannot simply decide to send back a cached response.
The server can control a bunch of HTTP headers:
Expires: Tue, 03 Jul 2001 06:00:00 GMT
Last-Modified: {now} GMT
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Any comments to this, what is the best approach?
Server-side cache control headers should look like:
Expires: Tue, 03 Jul 2001 06:00:00 GMT
Last-Modified: {now} GMT
Cache-Control: max-age=0, no-cache, must-revalidate, proxy-revalidate
Avoid rewriting URLs on the client because it pollutes caches, and causes other weird semantic issues. Furthermore:
Use one Cache-Control header (see rfc 2616) because behaviour with multiple entries is undefined. Also the MSIE specific entries in the second cache-control are at best redundant.
no-store is about data security. (it only means don't write this to disk - caches are still allowed to store the response in memory).
Pragma: no-cache is meaningless in a server response - it's a request header meaning that any caches receiving the request must forward it to the origin.
Using both Expires (http/1.0) and cache-control (http/1.1) is not redundant since proxies exist that only speak http/1.0, or will downgrade the protocol.
Technically, the last modified header is redundant in light of no-cache, but it's a good idea to leave it in there.
Some browsers will ignore subsequent directives in a cache-control header after they come across one they don't recognise - so put the important stuff first.
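As a minimal sketch of the points above, the headers could be assembled like this in Python. The function name `no_cache_headers` is made up for the example, and how the headers get attached to a response depends on your framework. Note the single Cache-Control value with the most important directives first, and no Pragma header.

```python
from email.utils import formatdate


def no_cache_headers(now=None):
    """Build response headers that ask clients and proxies not to reuse a response.

    `now` is a Unix timestamp; it defaults to the current time.
    """
    return {
        # Fixed date in the past, for HTTP/1.0 caches that only know Expires.
        "Expires": "Tue, 03 Jul 2001 06:00:00 GMT",
        # Stamped with the current time in HTTP date format.
        "Last-Modified": formatdate(now, usegmt=True),
        # One header only, important directives first.
        "Cache-Control": "max-age=0, no-cache, must-revalidate, proxy-revalidate",
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```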
Adding header
Cache-control: private
ensures that a compliant gateway (shared) cache won't cache the response.
I'd like to recommend Fabien Potencier's lecture about caching: http://www.slideshare.net/fabpot/caching-on-the-edge
To disable the cache, you should use
Expires: 0
Or
Cache-Control: no-store
If you use one, you should not use the other.
I have a particular HTTP response which I don't want cached because it has private/sensitive data in it
I'm already setting Cache-Control to no-store,
which should handle clients supporting HTTP/1.1.
How do I use the Expires header to do the same for HTTP/1.0? Should I just set it with an arbitrary timestamp from 1970 or something? Is there a special value to tell it never to cache?
The HTTP RFC says:
To mark a response as "already expired," an origin server sends an Expires date that is equal to the Date header value.
You should set the expires header to a date in the past. And you should also set the must-revalidate flag on the Cache-Control header.
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-control: no-cache, must-revalidate
You can find a good article dealing with caching issues on the doctype wiki:
Setting an Expires header in the past ensures that HTTP/1.0 and
HTTP/1.1 proxies and browsers will not cache the content. The
Cache-control directive also tells HTTP/1.1 proxies not to cache the
content. Even if proxies may be configured to return stale content
when they should not, the must-revalidate re-affirms that they SHOULD
NOT do it.
Here is the logo currently used on www.google.com:
http://www.google.com/images/logos/ps_logo2.png
Here's its HTTP response:
HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 05 Aug 2010 22:54:44 GMT
Date: Fri, 25 Mar 2011 16:41:05 GMT
Expires: Fri, 25 Mar 2011 16:41:05 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 26209
Age: 0
Via: 1.1 localhost.localdomain
The Cache-Control header says it's good for 1 year. But Expires is the same as Date, i.e. it's stale immediately.
Why the difference?
Cache-Control overrides Expires on any HTTP/1.1 cache or client.
So I assume Google wants to cache the image for HTTP/1.1 but not cache it at all for HTTP/1.0.
I don't know why Google cares. I would think they'd want to cache the logo even for older clients.
The reason is that Google wants the user's browser to cache the image, but not intermediate shared caches (hence the private directive).
Many intermediate caches may be outdated and ignore newer HTTP features (such as the Cache-Control header), so this approach stops them from caching the resource (via the Expires header). For the remaining agents that understand both, Cache-Control overrides Expires.
This is a common practice, referenced in RFC 2616, section 14.9.3:
An origin server might wish to use a relatively new HTTP cache control feature, such as the "private" directive, on a network including older caches that do not understand that feature. The origin server will need to combine the new feature with an Expires field whose value is less than or equal to the Date value. This will prevent older caches from improperly caching the response.
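To see the effect on the Google logo response, here is a small Python sketch comparing what an old HTTP/1.0-style cache (which only understands Expires) and an HTTP/1.1 cache (which prefers max-age) would conclude. The function name `lifetime_seen_by` and the `"1.0"`/`"1.1"` labels are assumptions for the example, and the model deliberately ignores everything except Expires and max-age.

```python
import re
from email.utils import parsedate_to_datetime


def lifetime_seen_by(cache_version, headers):
    """Return the freshness lifetime in seconds as a given kind of cache sees it."""
    date = parsedate_to_datetime(headers["Date"])
    expires = parsedate_to_datetime(headers["Expires"])
    expires_lifetime = max(0, int((expires - date).total_seconds()))

    if cache_version == "1.0":
        # Old caches do not understand Cache-Control at all.
        return expires_lifetime

    # HTTP/1.1 caches let max-age override Expires when both are present.
    match = re.search(r"\bmax-age=(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else expires_lifetime


logo_headers = {
    "Date": "Fri, 25 Mar 2011 16:41:05 GMT",
    "Expires": "Fri, 25 Mar 2011 16:41:05 GMT",
    "Cache-Control": "private, max-age=31536000",
}

print(lifetime_seen_by("1.0", logo_headers))  # 0
print(lifetime_seen_by("1.1", logo_headers))  # 31536000
```

The old cache sees a response that is already stale and has no reason to keep it, while an HTTP/1.1 browser keeps it for a year. An HTTP/1.1 shared cache is also told to stay out by the private directive.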