I want browsers to always add the "If-Modified-Since" request header (except on the first request) to avoid unnecessary traffic.
Response headers are:
Accept-Ranges:bytes
Cache-Control:max-age=0, must-revalidate
Connection:Keep-Alive
Content-Length:2683
Content-Type:text/html; charset=UTF-8
Date:Thu, 05 Apr 2012 13:06:19 GMT
Keep-Alive:timeout=15, max=497
Last-Modified:Thu, 05 Apr 2012 13:05:11 GMT
Server:Apache/2.2.21 (Red Hat)
FF 11 and IE 9 both send "If-Modified-Since" and get 304 in response, but Chrome 18 doesn't and gets 200.
Why? How can I force Chrome to send the "If-Modified-Since" header?
I do not know if it is important or not, but all requests go through HTTPS.
I've been chasing this issue for some time, thought I'd share what I found.
"The rule is actually quite simple: any error with the certificate means the page will not be cached."
https://code.google.com/p/chromium/issues/detail?id=110649
If you are using a self-signed certificate, even if you tell Chrome to add an exception for it so that the page loads, no resources from that page will be cached, and subsequent requests will not have an If-Modified-Since header.
I just now found this question, and after puzzling over Chrome's If-Modified-Since behavior, I've found the answer.
Chrome's decision to cache files is based on the Expires header that it receives. The Expires header has two main requirements:
It must be in Greenwich Mean Time (GMT), and
It must be formatted according to RFC 1123 (which is basically RFC 822 with a four-digit year).
The format is as follows:
Expires: Sat, 07 Sep 2013 05:21:03 GMT
For example, in PHP, the following outputs a properly formatted header.
$maxAge = 3600; // Expires in one hour.
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $maxAge) . " GMT");
("GMT" is appended to the string instead of the "e" timezone flag because, when used with gmdate(), the flag will output "UTC," which RFC 1123 considers invalid. Also note that the PHP constants DateTime::RFC1123 and DATE_RFC1123 will not provide the proper formatting, since they output the difference to GMT in hours [i.e. +02:00] rather than "GMT".)
See the W3C's date/time format specifications for more info.
In short, Chrome will only recognize the header if it follows this exact format. This, combined with the Cache-Control header...
header("Cache-Control: private, must-revalidate, max-age=" . $duration);
...allowed me to implement proper cache control. Once Chrome recognized those headers, it began caching the pages I sent it (even with query strings!), and it also began sending the If-Modified-Since header. I compared it to a stored "last-modified" date, sent back HTTP/1.1 304 Not Modified, and everything worked perfectly.
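For anyone wanting to reproduce that last step, here is a minimal sketch of the revalidation described above ($file and the use of filemtime() are illustrative assumptions, not from the original answer):
$lastModified = filemtime($file); // or a stored "last-modified" timestamp
header("Last-Modified: " . gmdate("D, d M Y H:i:s", $lastModified) . " GMT");
$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : '';
if ($since !== '' && strtotime($since) >= $lastModified) {
    header("HTTP/1.1 304 Not Modified"); // the client's copy is still current
    exit;                                // a 304 must not carry a body
}
// ...otherwise send the full response body here...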
Hope this helps anyone else who stumbles along!
I've noticed almost the same behaviour, and my findings are:
First of all, the 200 status indicator in Chrome is not the whole truth; you need to look at the "Size / Content" column as well. If it says "(from cache)", the resource was taken directly from cache without even asking whether it was modified.
This caching of resources that lack any indication of freshness (Expires or max-age) seems to apply when requesting static files that have a Last-Modified header. I've noted that Chrome (ver. 22):
Asks for the file the first time (obviously since it is not in cache).
Asks if it is modified the second time (since it is in cache but has no indication of freshness).
Uses it directly from the third time on (even in a new browser session).
I'm a bit puzzled by this behaviour, but it is fairly reasonable: if a resource is static, was modified a long time ago, and hasn't changed since the last check, you could assume it is going to stay valid for a while longer (I don't know how Chrome calculates the exact lifetime, though).
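The HTTP caching specs do sanction this kind of guessing: with no explicit freshness information, a cache may assign a heuristic lifetime, commonly a fraction (RFC 2616 and RFC 7234 both suggest 10%) of the interval since Last-Modified. A sketch of that calculation (Chrome's exact rule is not documented here):
$age = time() - strtotime($lastModifiedHeader); // seconds since Last-Modified
$heuristicLifetime = (int) ($age * 0.10);       // assume fresh for ~10% of that interval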
I had the same problem: in Chrome all requests were always status code 200; in other browsers, 304.
It turned out I had "Disable cache (while DevTools is open)" checked on the DevTools - Settings - General page. :)
Don't disable cache in Chrome Dev Tools (on "Network" tab).
Cache-Control should be Cache-Control: public. Pass true as the second parameter to PHP's header() function so that it replaces any previously set Cache-Control header: header("Cache-Control: public", true);
I have also found that Chrome (recent, v95+) returns a cached 200 response if I have DevTools open. It never even sends the request on to the server! If I close DevTools, it behaves as it should, and the server receives the expected request.
Related
I have some JS hosted on AWS. I want it cached so that I don't pay extra for 304 GET requests, but I'm puzzled why the two Cache-Control headers are different.
Request Method:GET
Status Code:304 Not Modified
Request header of helper.js
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
If-Modified-Since:Tue, 20 Aug 2013 13:08:13 GMT
and response header
Age:4348
Cache-Control:max-age=604800
Connection:keep-alive
Why are they different? Does it mean that Cache-Control is wrong? I used the Chrome console to get the headers.
I don't think that Cache-Control is wrong; it seems that your content is already cached. From the request headers, I understand that the browser is asking the server, "hey, has the content changed since Tue, 20 Aug 2013 13:08:13 GMT?" In return, the server responds with 304 Not Modified, indicating that the content has not changed and should be cached for 604800 more seconds before revalidating. Remember that the caching policy is defined on the server side, so you may want to look at your server definitions for js files. Usually, in the deployment environment, I instruct my web server to send cache headers for *.js, *.png, etc. After configuring the web server to send cache headers, it is the browser's job to take care of the rest. In that case, your browser works as expected.
You can look at RFC2616 for 304 response. You may also want to look at this decent caching tutorial. It should clear some ideas.
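As an illustration of the kind of web-server configuration described above, here is a hypothetical Apache (mod_expires) snippet; the content types and lifetimes are assumptions, not taken from the question:
<IfModule mod_expires.c>
    ExpiresActive On
    # Emit Expires/Cache-Control max-age headers for static assets
    ExpiresByType application/javascript "access plus 7 days"
    ExpiresByType image/png "access plus 7 days"
</IfModule>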
The problem is with Chrome. If you press the Refresh button, it revalidates the cache, but if you press Enter in the address bar, it takes the resources from the cache.
When I visit chesseng.herokuapp.com I get a response header that looks like
Cache-Control:private
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/css
Date:Tue, 16 Oct 2012 06:37:53 GMT
Last-Modified:Tue, 16 Oct 2012 03:13:38 GMT
Status:200 OK
transfer-encoding:chunked
Vary:Accept-Encoding
X-Rack-Cache:miss
and then I refresh the page and get
Cache-Control:private
Connection:keep-alive
Date:Tue, 16 Oct 2012 06:20:49 GMT
Status:304 Not Modified
X-Rack-Cache:miss
so it seems like caching is working. If that works for caching, then what is the point of Expires and Cache-Control: max-age? To add to the confusion, when I test the page at https://developers.google.com/speed/pagespeed/insights/ it tells me to "Leverage browser caching".
Cache-Control: private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache, such as a proxy server.
From RFC2616 section 14.9.1
To answer your question about why caching is working, even though the web-server didn't include the headers:
Expires: [a date]
Cache-Control: max-age=[seconds]
The server kindly asked any intermediate proxies not to cache the contents (i.e. the item should only be cached in a private cache, on your own local machine):
Cache-Control: private
But the server forgot to include any sort of caching hints:
they forgot to include Expires (so the browser knows to use the cached copy until that date)
they forgot to include Max-Age (so the browser knows how long the cached item is good for)
they forgot to include ETag (so the browser can do a conditional request)
But they did include a Last-Modified date in the response:
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Because the browser knows the date the file was modified, it can perform a conditional request. It will ask the server for the file, but instruct the server to only send the file if it has been modified since 2012/10/16 3:13:38:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
The server receives the request, realizes that the client has the most recent version already. Rather than sending the client 200 OK, followed by the contents of the page, it instead tells you that your cached version is good:
304 Not Modified
Your browser did have to suffer the round-trip delay of sending a request to the server, and waiting for the response, but it did save having to re-download the static content.
Why Max-Age? Why Expires?
Because Last-Modified sucks.
Not everything on the server has a date associated with it. If I'm building a page on the fly, there is no date associated with it - it's now. But I'm perfectly willing to let the user cache the homepage for 15 seconds:
200 OK
Cache-Control: max-age=15
If the user hammers F5, they'll keep getting the cached version for 15 seconds. If it's a corporate proxy, then all 67,198 users hitting the same page in the same 15-second window will all get the same contents - all served from a nearby cache. Performance win for everyone.
The virtue of adding Cache-Control: max-age is that the browser doesn't even have to perform a "conditional" request.
if you specified only Last-Modified, the browser has to perform an If-Modified-Since request, and watch for a 304 Not Modified response
if you specified max-age, the browser won't even have to suffer the network round-trip; the content will come right out of the caches.
The difference between "Cache-Control: max-age" and "Expires"
Expires is a legacy (c. 1998) equivalent of the modern Cache-Control: max-age header:
Expires: you specify a date (yuck)
max-age: you specify seconds (goodness)
And if both are specified, then the browser uses max-age:
200 OK
Cache-Control: max-age=60
Expires: Tue, 03 Apr 2018 19:28:37 GMT
Any web-site written after 1998 should not use Expires anymore, and instead use max-age.
What is ETag?
ETag is similar to Last-Modified, except that it doesn't have to be a date - it just has to be something.
If I'm pulling a list of products out of a database, the server can send the last rowversion as an ETag, rather than a date:
200 OK
ETag: "247986"
My ETag can be the SHA1 hash of a static resource (e.g. image, js, css, font), or of the cached rendered page (i.e. this is what the Mozilla MDN wiki does; they hash the final markup):
200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
And exactly like in the case of a conditional request based on Last-Modified:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
304 Not Modified
I can perform a conditional request based on the ETag:
GET / HTTP/1.1
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
304 Not Modified
An ETag is superior to Last-Modified because it works for things besides files, or things that have no notion of date. It just is.
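A minimal sketch of that ETag round-trip in PHP (the sha1-of-body choice and the render_page() helper are illustrative assumptions, not from the answer above):
$body = render_page();           // hypothetical: build the response body
$etag = '"' . sha1($body) . '"'; // opaque validator; any stable token works
header("ETag: " . $etag);
$match = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] : '';
if ($match === $etag) {
    header("HTTP/1.1 304 Not Modified"); // client already has this version
    exit;
}
echo $body;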
RFC 2616, section 14.9.1:
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache...A private (non-shared) cache MAY cache the response.
Browsers could use this information. Of course, the current "user" may mean many things: OS user, a browser user (e.g. Chrome's profiles), etc. It's not specified.
For me, a more concrete example of Cache-Control: private is that proxy servers (which typically have many users) won't cache it. It is meant for the end user, and no one else.
FYI, the RFC makes clear that this does not provide security. It is about showing the correct content, not securing content.
This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
The Expires entity-header field gives the date/time after which the response is considered stale. The Cache-Control: max-age directive gives the age value (in seconds) beyond which the response is considered stale.
Although these header fields give the client a mechanism to decide whether to send a request to the server, a question remains: when the client does send a request because the cached response's age exceeds max-age, does the server really need to send the resource again? Maybe the resource never changed.
To resolve this problem, HTTP/1.1 provides the Last-Modified header. The server gives the client the last modification date of the response. When the client needs the resource again, it sends an If-Modified-Since header field to the server. If that date is before the modification date of the resource, the server sends the resource to the client with a 200 code; otherwise, it returns a 304 code, meaning the client can use the copy it cached.
Background
I'm attempting to help a colleague debug an issue that hasn't been an issue for the past 6 months. After the most recent deployment of an ASP.NET MVC 2 application, FileResult responses that force a PDF file at the user for opening or saving are having trouble existing long enough on the client machine for the PDF reader to open them.
Earlier versions of IE (especially 6) are the only browsers affected. Firefox, Chrome, and newer versions of IE (>8) all behave as expected. With that in mind, the next section defines the actions necessary to recreate the issue.
Behavior
User clicks a link that points to an action method (a plain hyperlink with an href attribute).
The action method generates a PDF represented as a byte stream. The method always recreates the PDF.
In the action method, headers are set to instruct browsers how to cache the response. They are:
response.AddHeader("Cache-Control", "public, must-revalidate, post-check=0, pre-check=0");
response.AddHeader("Pragma", "no-cache");
response.AddHeader("Expires", "0");
For those unfamiliar with exactly what the headers do:
a. Cache-Control: public
Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non- shared cache.
b. Cache-Control: must-revalidate
When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server
c. Cache-Control: pre-check (introduced with IE5)
Defines an interval in seconds after which an entity must be checked for freshness prior to showing the user the resource.
d. Cache-Control: post-check (introduced with IE5)
Defines an interval in seconds after which an entity must be checked for freshness. The check may happen after the user is shown the resource, but ensures that on the next roundtrip the cached copy will be up-to-date.
e. Pragma: no-cache (to ensure backwards compatibility with HTTP/1.0)
When the no-cache directive is present in a request message, an application SHOULD forward the request toward the origin server even if it has a cached copy of what is being requested
f. Expires
The Expires entity-header field gives the date/time after which the response is considered stale.
We return the file from the action
return File(file, "mime/type", fileName);
The user is presented with an Open/Save dialog box
Clicking "Save" works as expected, but clicking "Open" launches the PDF reader, but the temporary file IE stored has already been deleted by the time the reader tries to open the file, so it complains that the file is missing (and it is).
There are a half dozen other apps here that use the same headers to force Excel, CSV, PDF, Word, and a ton of other content at users and there's never been an issue.
The Question
Are the headers correct for what we're trying to do? We want the file to exist temporarily (get cached), but always be replaced by a new version even though the requests may be identical.
The response headers are set in the action method before returning a FileResult. I've asked my colleague to try creating a new class that inherits from FileResult and instead override the ExecuteResult method so that it modifies the headers and then calls base.ExecuteResult() -- no status on that.
I have a hunch the "Expires" header of "0" is the culprit. According to this W3C article, setting it to "0" implies "already expired." I do want it to be expired, I just don't want IE to go removing it off of the filesystem before the application handling it gets a chance to open it.
As always, thanks!
Edit: The Solution
Upon further testing (using Fiddler to inspect the headers), we were seeing that the response headers we thought were getting set were not the ones being interpreted by the browser. Having not been familiar with the code myself, I was unaware of an underlying issue: the headers were getting stomped on outside of the action method.
Nonetheless, I'm going to leave this question open. Still outstanding is this: there seems to be some discrepancy between the Expires header having a value of 0 vs. -1. If anybody can lay claim to differences by design, in regards to IE, I would still like to hear about it. As for a solution though, the above headers do work as intended with the Expires value set to -1 in all browsers.
Update 1
The post How to control web page caching, across all browsers? describes in detail that caching can be prevented in all browsers with the help of setting Expires = 0. I'm still not sold on this 0 vs -1 argument...
I think you should just use
HttpContext.Current.Response.Cache.SetMaxAge(new TimeSpan(0));
or
HttpContext.Current.Response.Headers.Set("Cache-Control", "private, max-age=0");
to set max-age=0, which means nothing more than that the cache must revalidate (see here). If you additionally set an ETag in the header, containing some custom checksum or hash of the data, the ETag from the previous request will be sent back to the server. The server is then able either to return the data or, if the data are exactly the same as before, to return an empty body with HttpStatusCode.NotModified as the status code; in that case the web browser will take the data from the local browser cache.
I recommend you use Cache-Control: private, which enforces two important things: 1) it switches off caching of the data on proxies, which sometimes have very aggressive caching settings; 2) it allows caching of the data, but does not permit sharing the cache with other users. This can solve privacy problems, because the data you return to one user might not be something another user is allowed to read. By the way, HttpContext.Current.Response.Cache.SetMaxAge(new TimeSpan(0)) sets Cache-Control: private, max-age=0 in the HTTP header by default. If you do want to use Cache-Control: public, you can use SetCacheability(HttpCacheability.Public) to overwrite the behavior, or use Headers.Set instead of Cache.SetMaxAge.
If you are interested in studying more caching options of the HTTP protocol, I recommend reading the caching tutorial.
UPDATED: I decided to write some more information to clarify my position. According to the information from Wikipedia, even web browsers as old as Mosaic 2.7, Netscape 2.0 and Internet Explorer 3.0 supported, as of March 1996, the pre-standard of HTTP/1.1 described in RFC 2068. So I suppose (but have not tested it) that those old web browsers support the max-age=0 HTTP header. In any case, Netscape 2.06 and Internet Explorer 4.0 definitively support HTTP 1.1.
So you should first ask yourself: which HTML standard do you use? Do you still use HTML 2.0 instead of the later HTML 3.2, published in January 1997? I suppose you use at least HTML 4.0, published in December 1997. So if you build your application on at least HTML 4.0, your site can be oriented toward web clients that support HTTP 1.1 and ignore (not support) web clients that don't.
Now about the other "Cache-Control" headers beyond "private, max-age=0". Including those headers is, in my opinion, pure paranoia. Since I had some caching problems myself, I too tried including various other headers, but after carefully reading section 14.9 of RFC 2616, I use only "Cache-Control: private, max-age=0".
The only "Cache-Control" header which can be additionally discussed is "must-revalidate" described on the section 14.9.4 which I referenced before. Here is the quote:
The must-revalidate directive is necessary to support reliable operation for certain protocol features. In all circumstances an HTTP/1.1 cache MUST obey the must-revalidate directive; in particular, if the cache cannot reach the origin server for any reason, it MUST generate a 504 (Gateway Timeout) response.
Servers SHOULD send the must-revalidate directive if and only if failure to revalidate a request on the entity could result in incorrect operation, such as a silently unexecuted financial transaction. Recipients MUST NOT take any automated action that violates this directive, and MUST NOT automatically provide an unvalidated copy of the entity if revalidation fails.
Although this is not recommended, user agents operating under severe connectivity constraints MAY violate this directive but, if so, MUST explicitly warn the user that an unvalidated response has been provided. The warning MUST be provided on each unvalidated access, and SHOULD require explicit user confirmation.
Sometimes, if I have a problem with my Internet connection, I see an empty page with a "Gateway Timeout" message. It comes from the usage of the "must-revalidate" directive. I don't think the "Gateway Timeout" message really helps the user.
So only those who would prefer to start a self-destruct procedure upon hearing a busy signal when calling their boss should additionally use the "must-revalidate" directive in the "Cache-Control" header. To everyone else, I recommend just "Cache-Control: private, max-age=0" and nothing more.
For IE, I remember having to set Expires: -1. How to prevent caching in Internet Explorer seems to confirm this with the following code snippet.
<% Response.CacheControl = "no-cache" %>
<% Response.AddHeader "Pragma", "no-cache" %>
<% Response.Expires = -1 %>
Looking back at the code, this is what I found. Also, I vaguely remember that if you set Cache-Control: private, it may not behave correctly with SSL.
Response.AddHeader("Cache-Control", "no-cache");
Response.AddHeader("Expires", "-1");
Also, So, You Don't Want To Cache, Huh? mentions -1, but uses methods on Response.Cache instead:
// Stop Caching in IE
Response.Cache.SetCacheability(System.Web.HttpCacheability.NoCache);
// Stop Caching in Firefox
Response.Cache.SetNoStore();
However, ASP Page caching issue (IE8) says this code doesn't work.
I have these headers being sent to the client by the server:
Cache-Control:private
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/html
Date:Sun, 27 Nov 2011 11:10:38 GMT
ETag:"12341234"
Set-Cookie:connect.sid=e1u...7o; path=/; expires=Sun, 27 Nov 2011 11:40:38 GMT; httpOnly
Transfer-Encoding:chunked
last-modified:Sat, 26 Nov 2011 21:42:45 GMT
I want the client to validate that the file hasn't changed on the server, receiving a "200" if it has changed and a "304" otherwise.
Firefox sends:
if-modified-since: Sat, 26 Nov 2011 21:42:45 GMT
if-none-match: "12341234"
Why isn't Chrome sending the same on a refresh of the page? I'm after the behavior that .NET produces with:
context.Response.Cache.SetCacheability(HttpCacheability.ServerAndPrivate)
After spending half a day on this yesterday, I tracked down what was causing the issue for me. As long as you have the Chrome object inspector/client debugger/network monitor/thing that pops up when you hit F12 open, Chrome will not send cache request headers. Period. (Update: in newer versions of Chrome, there is a "Disable cache" checkbox.) Even if you don't have the "Network" tab open (e.g. you have the JavaScript console open), this checkbox still disables all caching.
It's sad, because debugging this from the client side obligates you to leave the Network panel open to see which headers are being sent and received and which codes are being returned. Without the Network panel open, there is no way to know from the client side whether your content is being cached.
If you dig into your server access logs, you will notice your server returning 304s (cached content) the minute you close the debug window on your Chrome client. Hope this helps.
Chrome 24.0.1312.57
I found one answer to this behaviour when using HTTPS (you do not specify whether you are requesting via HTTP or HTTPS). As quoted in an earlier answer above from https://code.google.com/p/chromium/issues/detail?id=110649, any error with the certificate means the page will not be cached. So if you are using a self-signed certificate, even if you tell Chrome to add an exception for it so that the page loads, no resources from that page will be cached, and subsequent requests will not have an If-Modified-Since header.
In my experience you need more than just the "Private" Cache-Control header. You need either "Max-Age" or "Expires" to force Chrome to revalidate content with the server.
Remember that revalidation will only start after these time values have elapsed, so they may need to be set to a small value.
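For example (illustrative values, not from the answer above), a response like this lets Chrome reuse the content for five seconds and then forces revalidation against Last-Modified:
Cache-Control: private, max-age=5
Last-Modified: Thu, 05 Apr 2012 13:05:11 GMT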
In addition (https://stackoverflow.com/a/14899869/362780):
F12 > Settings > General > Disable cache (while DevTools is open) -> uncheck this...
Browsers have a lot of counterintuitive behavior when it comes to caching. You would expect that if a response includes a Last-Modified date, the browser would revalidate it before reusing it. But none of the major browsers actually do that.
The ideal settings for your situation depend on when you want the browser to revalidate; see the link below.
Not only do browsers act counterintuitively, different browsers also behave differently in the same situation, for example when the user clicks the refresh button.
You can read how the different browsers (Internet Explorer, Edge, Safari, FireFox, Chrome) behave with different caching directives (Etag, last-modified, must-revalidate, expires, max-age, no-cache, no-store) at https://gertjans.home.xs4all.nl/javascript/cache-control.html
I know this question is old, but still...
I noticed that Chrome remembers the last kind of refresh you made. So if you press Ctrl+Shift+R (refresh and bypass the cache) and then press Ctrl+R (plain refresh), Chrome keeps on bypassing the cache and does not show a 304 in the received response. There is a workaround for this: press Ctrl+Shift+R, then go to the address bar, focus it, and hit Enter. If your ETags are set correctly and your server is ready to serve a 304, you'll see the new response code in the debugger: 304. So it works.
From a browser perspective,
What happens if a component (image, script, stylesheet...) is served without a Last-Modified HTTP header field?
Is it cached by the browser anyway, even though it won't be able to perform a validity check (If-Modified-Since) in the future, due to the lack of date/time information?
Eg:
GET /foo.png HTTP/1.1
Host: example.org
--
200 OK
Content-Type: image/png
...
Is foo.png cached anyway?
--
Do you know of any online service that can serve a raw HTTP response I write myself, in order to test what I'm asking?
Thank you.
Generally speaking, responses can be cached unless they explicitly say that they can't (e.g., with cache-control: no-store).
However, most caches will not store responses that don't have something that they can base freshness on, e.g., Cache-Control, Expires, or Last-Modified.
For the complete rules, see:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-13#section-2.1
See:
http://www.mnot.net/blog/2009/02/24/unintended_caching
for an example of how this can surprise some people.
Yes, the image may get cached even without a Last-Modified response header.
The browser will then cache the image until its TTL expires. You can set the image's Time To Live using appropriate response headers, e.g. this would set the TTL to one hour:
Cache-Control: max-age=3600
Date: Tue, 29 Mar 2011 20:18:17 GMT
Expires: Tue, 29 Mar 2011 21:18:17 GMT
Even without any Last-Modified in the response, the browser may still use the Date header for subsequent If-Modified-Since requests.
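For instance, a hypothetical follow-up request reusing the Date value from the example above:
GET /foo.png HTTP/1.1
If-Modified-Since: Tue, 29 Mar 2011 20:18:17 GMT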
I disabled the Last-Modified header on a large site and FF 13 doesn't take the contents from cache, although a max-age is given, etc. Contents without a Last-Modified header ALWAYS get a 200 OK status when requested, not a 304, so the browser doesn't serve them from the cache.