I have some js hosted on AWS. I want to cache it to not to pay extra for 304 GET request, but I'm puzzled why two headers are different.
Request Method:GET
Status Code:304 Not Modified
Request header of helper.js
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
If-Modified-Since:Tue, 20 Aug 2013 13:08:13 GMT
and response header
Age:4348
Cache-Control:max-age=604800
Connection:keep-alive
Why they are different? Does it mean that Cache-Control is wrong? I used Chrome console to get the headers.
I don't think that Cache-Control is wrong and it seems that your content is already cached. From the request headers, I understand that the first request was done at Tue, 20 Aug 2013 13:08:13 GMT as the browser indicates the server "hey, has the content changed since that time?". In return, the server responses with 304 Not Modified header, indicating that the content has not been changed and it should be cached 604800 seconds more until revalidating it. Remember that the caching is done on server side. So, you may want to look at your server defitinitons on js files. Usually, in the deployment environment, I instruct my webserver to send cache header for *.js *.png etc. After configuring the web server for sending cache headers, it is the browser's work to take care of the rest. In that case, your browser works as expected.
You can look at RFC2616 for 304 response. You may also want to look at this decent caching tutorial. It should clear some ideas.
The problem is with Chrome. If you press Refresh button it invalidates the cache, but if you press Enter in address bar it gets the resources from cache.
Related
I have a url with a test PDF on it, this is my origin:
https://powered-by.qbank.se/miso/MISO_Testing_Document279626.pdf
I have that origin setup in an Azure CDN using the Microsoft provider. it's url is:
https://misocdn-fail.azureedge.net/MISO_Testing_Document279626.pdf
When I update the PDF on the origin site, all the browsers that I have tested will bring back the NEW document with just an F5 refresh, not even a ctrl-F5. But, the CDN continues the cache the PDF basically indefinetly (2 days acording to docs or til I purge)
My question is, why isn't my CDN able to detect the change at the origin and browser is?
I understand that the CDN caches, but I don't understand what it is that a browser is doing to figure out this content is new?
To better understand the phenomenon it is a good start to obesrve the response headers received from the direct access url.
One way to do that is to use curl -I <YOUR_URL> in your terminal.
You will see something like:
HTTP/1.1 200 OK
Date: Mon, 01 Oct 2018 09:03:57 GMT
Server: Apache
Last-Modified: Fri, 28 Sep 2018 19:11:57 GMT
ETag: "11ff1-576f33ab4c2a0"
Accept-Ranges: bytes
Content-Length: 73713
Cache-Control: max-age=86400
Expires: Tue, 02 Oct 2018 09:03:57 GMT
Content-Type: application/pdf
Out of these headers the browser uses the Cache-Control, ETag and Last-Modified to determine the freshness of the requested content.
Cache-Control: max-age=<seconds> is the maximum amount of time (relative to the time of the request) a resource will be considered fresh.
Now, according to Mozilla Developer Network –MDN– Freshness is described as below:
Once a resource is stored in a cache, it could theoretically be served by the cache forever. Caches have finite storage so items are periodically removed from storage. This process is called cache eviction. On the other side, some resources may change on the server so the cache should be updated. As HTTP is a client-server protocol, servers can't contact caches and clients when a resource changes; they have to communicate an expiration time for the resource. Before this expiration time, the resource is fresh; after the expiration time, the resource is stale. Eviction algorithms often privilege fresh resources over stale resources. Note that a stale resource is not evicted or ignored; when the cache receives a request for a stale resource, it forwards this request with a If-None-Match to check if it is in fact still fresh. If so, the server returns a 304 (Not Modified) header without sending the body of the requested resource, saving some bandwidth.
So to validate a cached resource, an If-None-Match header will be issued by the browser if the ETag header was part of the response for the resource.
This is the mechanism that makes your browser download the new version of your pdf when accessed directly. Please also note, that these headers are present in the request from the CDN url as well, but the CDN edge servers are still storing your old file.
When it comes to the CDN cache, the ETag and Last-Modified headers are not respected. It is only the Cache-Control header in the HTTP response from the origin server that defines the time-to-live (TTL) period of a resource. In your case, it is 86400 seconds. So theoretically, the new version of your pdf will be served after 1 day from the first request through the CDN link.
Up until that moment the old pdf will be hosted by the CDN edge servers. You can read more about the Azure CDN expiration management in the Azure CDN documentation.
When I visit chesseng.herokuapp.com I get a response header that looks like
Cache-Control:private
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/css
Date:Tue, 16 Oct 2012 06:37:53 GMT
Last-Modified:Tue, 16 Oct 2012 03:13:38 GMT
Status:200 OK
transfer-encoding:chunked
Vary:Accept-Encoding
X-Rack-Cache:miss
and then I refresh the page and get
Cache-Control:private
Connection:keep-alive
Date:Tue, 16 Oct 2012 06:20:49 GMT
Status:304 Not Modified
X-Rack-Cache:miss
so it seems like caching is working. If that works for caching then what is the point of Expires and Cache-Control:max-age. To add to confusion, when I test the page at https://developers.google.com/speed/pagespeed/insights/ it tells me to "Leverage browser caching".
Cache-Control: private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache, such as a proxy server.
From RFC2616 section 14.9.1
To answer your question about why caching is working, even though the web-server didn't include the headers:
Expires: [a date]
Cache-Control: max-age=[seconds]
The server kindly asked any intermediate proxies to not cache the contents (i.e. the item should only be cached in a private cache, i.e. only on your own local machine):
Cache-Control: private
But the server forgot to include any sort of caching hints:
they forgot to include Expires (so the browser knows to use the cached copy until that date)
they forgot to include Max-Age (so the browser knows how long the cached item is good for)
they forgot to include E-Tag (so the browser can do a conditional request)
But they did include a Last-Modified date in the response:
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Because the browser knows the date the file was modified, it can perform a conditional request. It will ask the server for the file, but instruct the server to only send the file if it has been modified since 2012/10/16 3:13:38:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
The server receives the request, realizes that the client has the most recent version already. Rather than sending the client 200 OK, followed by the contents of the page, it instead tells you that your cached version is good:
304 Not Modified
Your browser did have to suffer the round-trip delay of sending a request to the server, and waiting for the response, but it did save having to re-download the static content.
Why Max-Age? Why Expires?
Because Last-Modified sucks.
Not everything on the server has a date associated with it. If I'm building a page on the fly, there is no date associated with it - it's now. But I'm perfectly willing to let the user cache the homepage for 15 seconds:
200 OK
Cache-Control: max-age=15
If the user hammers F5, they'll keep getting the cached version for 15 seconds. If it's a corporate proxy, then all 67,198 users hitting the same page in the same 15-second window will all get the same contents - all served from close cache. Performance win for everyone.
The virtue of adding Cache-Control: max-age is that the browser doesn't even have to perform a "conditional" request.
if you specified only Last-Modified, the browser has to perform a If-Modified-Since request, and watch for a 304 Not Modified response
if you specified max-age, the browser won't even have to suffer the network round-trip; the content will come right out of the caches.
The difference between "Cache-Control: max-age" and "Expires"
Expires is a legacy (c. 1998) equivalent of the modern Cache-Control: max-age header:
Expires: you specify a date (yuck)
max-age: you specify seconds (goodness)
And if both are specified, then the browser uses max-age:
200 OK
Cache-Control: max-age=60
Expires: 20180403T192837
Any web-site written after 1998 should not use Expires anymore, and instead use max-age.
What is ETag?
ETag is similar to Last-Modified, except that it doesn't have to be a date - it just has to be a something.
If I'm pulling a list of products out of a database, the server can send the last rowversion as an ETag, rather than a date:
200 OK
ETag: "247986"
My ETag can be the SHA1 hash of a static resource (e.g. image, js, css, font), or of the cached rendered page (i.e. this is what the Mozilla MDN wiki does; they hash the final markup):
200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
And exactly like in the case of a conditional request based on Last-Modified:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
304 Not Modified
I can perform a conditional request based on the ETag:
GET / HTTP/1.1
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
304 Not Modified
An ETag is superior to Last-Modified because it works for things besides files, or things that have a notion of date. It just is
RFC 2616, section 14.9.1:
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache...A private (non-shared) cache MAY cache the response.
Browsers could use this information. Of course, the current "user" may mean many things: OS user, a browser user (e.g. Chrome's profiles), etc. It's not specified.
For me, a more concrete example of Cache-Control: private is that proxy servers (which typically have many users) won't cache it. It is meant for the end user, and no one else.
FYI, the RFC makes clear that this does not provide security. It is about showing the correct content, not securing content.
This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
The Expires entity-header field gives the date/time after which the response is considered stale.The Cache-control:maxage field gives the age value (in seconds) bigger than which response is consider stale.
Althought above header field give a mechanism to client to decide whether to send request to the server. In some condition, the client send a request to sever and the age value of response is bigger then the maxage value ,dose it means server needs to send the resource to client? Maybe the resource never changed.
In order to resolve this problem, HTTP1.1 gives last-modifided head. The server gives the last modified date of the response to client. When the client need this resource, it will send If-Modified-Since head field to server. If this date is before the modified date of the resouce, the server will sends the resource to client and gives 200 code.Otherwise,it will returns 304 code to client and this means client can use the resource it cached.
I have these headers being sent to the client by the server:
Cache-Control:private
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/html
Date:Sun, 27 Nov 2011 11:10:38 GMT
ETag:"12341234"
Set-Cookie:connect.sid=e1u...7o; path=/; expires=Sun, 27 Nov 2011 11:40:38 GMT; httpOnly
Transfer-Encoding:chunked
last-modified:Sat, 26 Nov 2011 21:42:45 GMT
I want the client to validate that the file hasn't changed on the server and send a "200" if it has otherwise a "304".
Firefox sends:
if-modified-since: Sat, 26 Nov 2011 21:42:45 GMT
if-none-match: "12341234"
Why isn't the chrome sending the same on a refresh of the page? I'm after the behavior that .Net has running:
context.Response.Cache.SetCacheability(HttpCacheability.ServerAndPrivate)
After spending half a day on this yesterday, I tracked down what was causing the issue for me. So long as you have the Chrome object inspector/Client Debugger/Network monitor/Thing that pops up when you hit F12, Chrome will not send cache request headers. Period. (update: in newer versions of Chrome, there is a checkbox "Disable cache"). Even if you don't have the "network" tab open (ex: have the javascript console open), this checkbox still disables all cacheing.
Its sad, because debugging this from the client side obligates you to leave the network panel open to see what headers are being sent and received, and what codes are being returned. Without the network panel open, there is no way to know if your content is being cached from the client side.
If you dig into your server access logs, you will notice your server returning 304s(Cached Content) the minute you close the debug window on your Chrome client. Hope this helps.
Chrome 24.0.1312.57
I found one answer to this behaviour when using HTTPS, thought I'd share what I found. You do not specify if you are requesting via HTTP or HTTPS.
"The rule is actually quite simple: any error with the certificate means the page will not be cached."
https://code.google.com/p/chromium/issues/detail?id=110649
If you are using a self-signed certificate, even if you tell Chrome to add an exception for it so that the page loads, no resources from that page will be cached, and subsequent requests will not have an If-Modified-Since header.
In my experience you need more than just the "Private" Cache-Control header. You need either "Max-Age" or "Expires" to force Chrome to revalidate content with the server.
Remember that revalidation will only start after these time values have elapsed, so they may need to be set to a small value.
In addition (https://stackoverflow.com/a/14899869/362780):
F12 > Settings > General > Disable cache (while DevTools is open) -> uncheck this...
Browsers have a lot of counter intuitive behavior when it comes to caching. You would expect, that if the response includes a last-modified-date, that the browser would revalidate this before reusing it. But none of the major browsers actually do that.
The ideal settings for your situation depend on when you want the browser to revalidate, see link below.
Not only do browsers act counter intuitively, different browsers also behave differently in the same situation. For example when the user clicks on the refresh button.
You can read how the different browsers (Internet Explorer, Edge, Safari, FireFox, Chrome) behave with different caching directives (Etag, last-modified, must-revalidate, expires, max-age, no-cache, no-store) at https://gertjans.home.xs4all.nl/javascript/cache-control.html
I know this question is old, but still..
I noticed that chrome remembers the last refresh that you made. So if you press ctrl+shift+r (refreshing and deleting cache), and pressing ctrl+r (just refreshing), chrome keeps on deleting cache and does not show the 304 in the received response. There is a workaround for this. Press ctrl+shift+r and then go to the address bar, focus it, and hit enter. If your etags are set correctly, and your server is ready to serve a 304, you'll see a new response code in the debugger - 304. so it works.
From a browser perspective,
What occur if a component (image, script, stylesheet...) is served without a Last-Modified HTTP header field...
Is it however cached by the browser even if it won't be able to perform a validity check(If-Modified-Since) in future, due to his lack of date/time information?
Eg:
GET /foo.png HTTP/1.1
Host: example.org
--
200 OK
Content-Type: image/png
...
Is foo.png however cached?
--
Would you know any online service to serve my raw HTTP response that I can write myself in order to test what I'm asking ?
Thank you.
Generally speaking, responses can be cached unless they explicitly say that they can't (e.g., with cache-control: no-store).
However, most caches will not store responses that don't have something that they can base freshness on, e.g., Cache-Control, Expires, or Last-Modified.
For the complete rules, see:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-13#section-2.1
See:
http://www.mnot.net/blog/2009/02/24/unintended_caching
for an example of how this can surprise some people.
Yes, the image may get cached even without a Last-Modified response header.
The browser will then cache the image until its TTL expires. You can set the image's Time To Live using appropriate response headers, e.g. this would set the TTL to one hour:
Cache-Control: max-age=3600
Date: Tue, 29 Mar 2011 20:18:17 GMT
Expires: Tue, 29 Mar 2011 21:18:17 GMT
Even without any Last-Modified in the response, the browser may still use the Date header for subsequent If-Modified-Since requests.
I disabled the Last-Modified header on a large site and FF 13 doesn't take the contents from cache, although a max-age is given etc. Contents without a Last-Modified header ALWAYS get a status 200 ok when requested, not a 304. So the browser looks for it in the cache.
I have a couple of queries related to Cache-Control.
If I specify Cache-Control max-age=3600, must-revalidate for a static html/js/images/css file, with Last Modified Header defined in HTTP header:
Does browser/proxy cache(like Squid/Akamai) go all the way to origin server to validate before max-age expires? Or will it serve content from cache till max-age expires?
After max-age expiry (that is expiry from cache), is there a If-Modified-Since check or is content re-downloaded from origin server w/o If-Modified-Since check?
a) If the server includes this header:
Cache-Control "max-age=3600, must-revalidate"
it is telling both client caches and proxy caches that once the content is stale (older than 3600 seconds) they must revalidate at the origin server before they can serve the content. This should be the default behavior of caching systems, but the must-revalidate directive makes this requirement unambiguous.
b) The client should revalidate. It might revalidate using the If-Match or If-None-Match headers with an ETag, or it might use the If-Modified-Since or If-Unmodified-Since headers with a date.
a. Look at the ‘Stats’ tab on this page and see what happens.
b. After expiration the browser will check at the server if the file is updated. If not, the server will respond with a 304 Not Modified header and nothing is downloaded.
You can check this behaviour yourself by looking at the ‘Net’ panel in Firebug or similar tools. Just re-enter the URL in the address bar and compare the number of HTTP requests with the number of requests when your cache is empty.
The given answers are incorrect, at least for web browsers in 2019.
"After expiration the browser will check at the server if the file is updated" <- not true
I have a static file served with "Cache-Control: public,must-revalidate,max-age=864000" and both Chrome and Firefox do a request every time (and get a 304 Not Modified back every time).