I have a couple of queries related to Cache-Control.
If I specify Cache-Control max-age=3600, must-revalidate for a static html/js/images/css file, with Last Modified Header defined in HTTP header:
Does browser/proxy cache(like Squid/Akamai) go all the way to origin server to validate before max-age expires? Or will it serve content from cache till max-age expires?
After max-age expiry (that is expiry from cache), is there a If-Modified-Since check or is content re-downloaded from origin server w/o If-Modified-Since check?
a) If the server includes this header:
Cache-Control "max-age=3600, must-revalidate"
it is telling both client caches and proxy caches that once the content is stale (older than 3600 seconds) they must revalidate at the origin server before they can serve the content. This should be the default behavior of caching systems, but the must-revalidate directive makes this requirement unambiguous.
b) The client should revalidate. It might revalidate using the If-Match or If-None-Match headers with an ETag, or it might use the If-Modified-Since or If-Unmodified-Since headers with a date.
a. Look at the ‘Stats’ tab on this page and see what happens.
b. After expiration the browser will check at the server if the file is updated. If not, the server will respond with a 304 Not Modified header and nothing is downloaded.
You can check this behaviour yourself by looking at the ‘Net’ panel in Firebug or similar tools. Just re-enter the URL in the address bar and compare the number of HTTP requests with the number of requests when your cache is empty.
The given answers are incorrect, at least for web browsers in 2019.
"After expiration the browser will check at the server if the file is updated" <- not true
I have a static file served with "Cache-Control: public,must-revalidate,max-age=864000" and both Chrome and Firefox do a request every time (and get a 304 Not Modified back every time).
Related
many websites' html file uses cache-control: public, max-age=0, must-revalidate, like this one.
According to this MDN page, this is the same as Cache-Control: no-cache for most modern browsers. And I understand how it works from a high-level.
For a page with this cache strategy, for a return visit or a refresh, the browser would send a conditional request to revalidate the cached asset, i.e. the HTML file. My question is, does this request always bypass any intermediate cache, e.g. an edge server in a CDN and go straight to the origin server (with no s-max-age set explicitly) or any intermediate cache can validate the cache on behalf of the origin server?
The MDN page doesn't explicitly give me the answer. My guess is that it would still go to CDN for revalidation as the origin server can proactively purge the CDN once a new build occurs to give it a fresh copy.
I've read a lot of related articles on the matter and also the very good article about HTTP caching here:
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en#invalidating-and-updating-cached-responses
but it is still not clear to me:
Why isn't sending an ETag header enough to invalidate the browser cache for a particular resource? Why does everyone recommend actually changing the URL/filename of the resource to force the browser to re-download the file? If the browser has already cached the file with a particular ETag and the ETag is modified on the server, wouldn't that suffice?
I find the following pages helpful:
https://jakearchibald.com/2016/caching-best-practices/
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
This line from MDN's ETag page shares the key point (emphasis added):
If a user visits a given URL again (that has an ETag set), and it is stale, that is too old to be considered usable, the client will send the value of its ETag along in an If-None-Match header field...
The ETag will be used by the client to revalidate resources once they become "stale". But what constitutes "stale"?
This is where the Cache-Control header comes in handy. The Cache-Control header can be sent with a response to let the client know how long the client may cache an item until it should be considered stale. For example, Cache-Control: no-cache would indicate that the resource should be considered stale immediately. See the MDN Cache-Control page for more information on available Cache-Control values.
When the browser attempts to process a request for a cached resource that is considered stale, it will first send a revalidation request to the server with the resource's last ETag value included via the If-None-Match request header, as described on MDN's ETag page. It can also use the Last-Modified response header sent as the If-Modified-Since request header as a secondary option.
If the server determines that the client's ETag value (in the If-None-Match request header) is current, then it will respond with a 304 (Not Modified) HTTP status code and an empty body, indicating that the client can use the cached entry. Otherwise, the server will respond with a 200 HTTP status code and the new response body.
Other resources:
Difference between no-cache and must-revalidate
What's default value of cache-control?
To answer your questions directly:
Why isn't sending an ETag header enough to invalidate the browser cache for a particular resource? -- Because the ETag header is not validated until the cached entry is considered stale, such as via an expiration date set in the Cache-Control response header.
Why does everyone recommend actually changing the URL/filename of the resource to force the browser to re-download the file? -- Changing the URL/filename or adding a query string will force the client to avoid using a cache. This is simple and is a virtually guaranteed way of cache-busting. This does not mean it's necessary, but it tends to be safe in the realm of inconsistent browser behaviors.
If the browser has already cached the file with a particular ETag and the ETag is modified on the server, wouldn't that suffice? -- Technically it should suffice as long as the appropriate Cache-Control headers (including the Pragma and Expires headers) are included. See How to control web page caching, across all browsers? for more details.
I have one question: suppose in each http request there is a cache-control: max-age=0 header, so each request will go all the way to the origin web server.
Does it mean CDN is not useful anymore if all requests are like this?
from other post:
When sent by the user agent
I believe shahkalpesh's answer applies to the user agent side. You can also look at 13.2.6 Disambiguating Multiple Responses.
If a user agent sends a request with Cache-Control: max-age=0 (aka. "end-to-end revalidation"), then each cache along the way will revalidate its cache entry (eg. with the If-Not-Modified header) all the way to the origin server. If the reply is then 304 (Not Modified), the cached entity can be used.
On the other hand, sending a request with Cache-Control: no-cache (aka. "end-to-end reload") doesn't revalidate and the server MUST NOT use a cached copy when responding.
It makes sense and match my result.
when cache is not expired in chrome,it will send request to CDN,CDN will query this with if-modified-since with origin ,then serve the end user.
By setting the max-age to 0, you effectively expire your page in your CDN edge cache immediately. Therefore, your CDN always hit your origin and render the CDN useless as you suggested.
Noticed from your other question that you are using Akamai. If so, then you can use the Edge-Control header to override your cache-control if you don't have direct control over that value, but still want to be able to leverage CDN functionality.
What value should have cache-control header to enable ETag\Last-Modified? I want my resources files to be cached but never used without validation from server, i.e. browser should send If-none-match or If-modified-since header and receive 304 HTTP status code to use file from cache.
The short answer is Cache-control: no-cache. Browser/caching proxy will have to always validate data before serving. For success validation ETag and Last-Modified headers must be present. Otherwise resource will be downloaded always fully from the server.
I'm trying to understand how the cache-control http header works.
The cache-control header can have the no-cache value. I have checked the definition in w3c and it said:
If the no-cache directive does not specify a field-name, then a cache
MUST NOT use the response to satisfy a subsequent request without
successful revalidation with the origin server.
It tells no-cache value will trigger validation for every request.
What I want to know is, what is cache validation and what it does in the http protocol?
thanks for your help guys. now i understand validation means check if cache contain latest content from server.
my further question would be what issues no-cache will fix. please provide some scenario, like after applied no-cache in http header, what security issue will be fixed.
thanks guys
The no-cache directive is not intended for a security purpose. Security gets covered in rules that define which data/resources a cdn/proxy server is not permitted to cache. So, if security is required, the no-store directive should be used by the client/server. Look under :
paragraph 2 under section 13.4 on https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
https://www.rfc-editor.org/rfc/rfc7234#section-3
The no-cache directive is used by the client when it is ready to accept a resource from a cache, provided there is a confirmation from the server that the cached resource is up to date (fresh). The proxy/cdn can use two methods to re-validate the resource's freshness :
If client sent an ETAG value, proxy/cdn can forward it to the server under an If-None-Match header. If server responds with '304 Not Modified', then the cached resource is fresh to serve.
Using an If-Modified-Since header with a date value that was received the last time the resource was downloaded from the server (was to be found under the Last-Modified header in server's last response).