I see a lot of sites that return an HTML page with no caching parameters in the header: no Pragma, no Cache-Control, no ETag, no expiration date... nothing. From the HTTP 1.1 spec, it seems like this means it can be cached, but I am not sure. Does anyone know the exact rule that governs caching if there are no cache directives in the response?
I think I found the answer. The HTTP 1.1 spec allows for a scenario where there are no cache directives. In this case the receiving cache can use a heuristic expiration algorithm. For me that is the answer: it is cacheable. However, I have found that Firefox and Chrome will not cache it. I did see a post that found the Chrome source that sets "DEFAULT_CACHE_TIME = 300", which I believe means 300 seconds (5 minutes), though I'm not sure of the unit. Anyway, I just needed to know if the spec allowed a cache to cache an object that had no cache directives.
The quote below is from:
http://home.anadolu.edu.tr/~egermen/EEM534/Refreshment%20policies%20for%20Web%20content%20caches%20.pdf
Otherwise, no explicit freshness lifetime is provided by the origin server and a heuristic is used: the freshness lifetime is assigned to be a fraction (HTTP/1.1 mentions 10% as an example) of the time difference between the timestamp at the DATE header and the time specified by the LAST-MODIFIED header, subject to a maximum allowed value (usually 24 h, since HTTP/1.1 requires that the cache must attach a warning if heuristic expiration is used and the object’s age exceeds 24 h).
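The heuristic described in the quote can be sketched in a few lines of Python. This is a minimal sketch, assuming the 10% fraction and 24 h cap mentioned in the quote; real browsers pick their own values, and the function name is hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Illustrative parameters taken from the quote above, not from any browser.
HEURISTIC_FRACTION = 0.10            # "10% as an example"
HEURISTIC_CAP = timedelta(hours=24)  # "usually 24 h"


def heuristic_freshness(date_header: str, last_modified_header: str) -> timedelta:
    """Estimate a freshness lifetime for a response with no explicit expiry.

    Lifetime = fraction of (Date - Last-Modified), capped at HEURISTIC_CAP.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    interval = date - last_modified
    if interval <= timedelta(0):
        # A Last-Modified at or after Date gives no useful signal.
        return timedelta(0)
    return min(interval * HEURISTIC_FRACTION, HEURISTIC_CAP)


# A resource last modified 20 hours before the response was generated
# is treated as fresh for 2 hours under this heuristic.
lifetime = heuristic_freshness(
    "Mon, 01 Jan 2024 10:00:00 GMT", "Sun, 31 Dec 2023 14:00:00 GMT"
)
```

The cap matters for old resources: a file untouched for a year would otherwise get a lifetime of more than a month.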
nginx version: nginx/1.14.0 (Ubuntu)
I'm trying to study how the browser cache works.
Could you explain to me why, in the case of HTML, the browser sent a request to the server, whereas in the case of CSS it didn't?
In other words, why do we get a 304 for the HTML but a 200 (from disk cache) for the CSS?
The server didn't provide any information to the browser on how long to cache its resources. (That is, it didn't include Cache-Control or Expires headers.) Therefore the browser is free to come up with its own heuristic freshness, as described in RFC 7234:
Since origin servers do not always provide explicit expiration times,
a cache MAY assign a heuristic expiration time when an explicit time
is not specified, employing algorithms that use other header field
values (such as the Last-Modified time) to estimate a plausible
expiration time.
Presumably the browser assigned a longer freshness lifetime to the static CSS resource than it did to the HTML page, which makes sense.
If you care how your resources are cached, the answer is simple: provide explicit direction using the appropriate cache headers.
What's the difference in browser behavior between the two headers Cache-Control: max-age=0 and Cache-Control: max-age=-1?
If the browser receives max-age=0, it will revalidate cache immediately.
If the browser receives max-age=10, it will revalidate cache after 10 seconds.
What's the browser behavior with max-age=-1? Is it the same as max-age=0? If so, why do we need both?
max-age takes an argument that matches delta-seconds:
The delta-seconds rule specifies a non-negative integer, representing
time in seconds.
delta-seconds = 1*DIGIT
max-age=-1 is therefore not a valid directive, and the specification doesn't define an interpretation. The spec suggests:
Caches are
encouraged to consider responses that have invalid freshness
information to be stale.
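A cache following this advice might parse the directive strictly, as in the hypothetical helper below. It is a minimal sketch: it accepts only delta-seconds (1*DIGIT, tolerating a quoted value) and maps anything else, such as -1, to a lifetime of 0 so that the response is treated as stale.

```python
import re
from typing import Optional

# delta-seconds = 1*DIGIT  (ASCII digits only; no sign, no decimals)
DELTA_SECONDS = re.compile(r"[0-9]+")


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return max-age in seconds, 0 if the value is invalid, None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() != "max-age":
            continue
        value = value.strip().strip('"')  # tolerate a quoted value
        if DELTA_SECONDS.fullmatch(value):
            return int(value)
        # Invalid freshness information: treat the response as stale.
        return 0
    return None
```

With this choice, max-age=-1 behaves exactly like max-age=0 for this cache, which is one reasonable reading of the advice above; nothing in the spec requires browsers to agree.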
From https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
max-age
When an intermediate cache is forced, by means of a max-age=0 directive, to revalidate its own cache entry, and the client has supplied its own validator in the request, the supplied validator might differ from the validator currently stored with the cache entry. In this case, the cache MAY use either validator in making its own request without affecting semantic transparency.
However, the choice of validator might affect performance. The best approach is for the intermediate cache to use its own validator when making its request. If the server replies with 304 (Not Modified), then the cache can return its now validated copy to the client with a 200 (OK) response. If the server replies with a new entity and cache validator, however, the intermediate cache can compare the returned validator with the one provided in the client's request, using the strong comparison function. If the client's validator is equal to the origin server's, then the intermediate cache simply returns 304 (Not Modified). Otherwise, it returns the new entity with a 200 (OK) response.
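The decision described in the quoted paragraph can be written as a small function. This is a sketch of that one decision only, assuming entity tags are the validators in play; the function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong (no W/ prefix) and identical."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def reply_to_client(
    origin_status: int, origin_etag: Optional[str], client_etag: Optional[str]
) -> Tuple[int, str]:
    """What an intermediate cache returns after revalidating with its own validator.

    origin_status is the origin's answer to the cache's conditional request.
    Returns (status for the client, which body to send).
    """
    if origin_status == 304:
        # The cache's stored copy was confirmed; send it as a full response.
        return 200, "cached copy"
    if client_etag and origin_etag and strong_match(client_etag, origin_etag):
        # The client already holds what the origin just sent.
        return 304, "no body"
    return 200, "new entity"
```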
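From https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Max-Age (note that this page covers the CORS preflight header Access-Control-Max-Age, not the Cache-Control max-age directive):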
delta-seconds
Maximum number of seconds the results can be cached.
Firefox caps this at 24 hours (86400 seconds) and Chromium at 10 minutes (600 seconds). Chromium also specifies a default value of 5 seconds.
A value of -1 will disable caching, requiring a preflight OPTIONS check for all calls.
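The limits quoted from MDN can be expressed as a small lookup. This is only a sketch: the numbers are the ones in the excerpt and may not match current browser releases, the function name is hypothetical, and it returns None where the excerpt gives no information (Firefox with no header).

```python
from typing import Optional

# Limits as quoted from MDN above (they may have changed since).
CAP_SECONDS = {"firefox": 86400, "chromium": 600}
CHROMIUM_DEFAULT_SECONDS = 5


def preflight_cache_seconds(browser: str, header_value: Optional[str]) -> Optional[int]:
    """How long a CORS preflight result would be cached, per the quoted limits."""
    if header_value is None:
        # The excerpt only documents a default for Chromium.
        return CHROMIUM_DEFAULT_SECONDS if browser == "chromium" else None
    seconds = int(header_value)
    if seconds < 0:
        # -1 disables caching: a preflight OPTIONS request precedes every call.
        # (Other negative values are treated the same way here, an assumption.)
        return 0
    return min(seconds, CAP_SECONDS[browser])
```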
I'm wondering how the browser determines whether a cached resource has expired or not.
Assume that I have set the max-age directive to 300. I made a request at 14:00, and 3 minutes later I made another request to the same resource. How can the browser tell the resource hasn't expired (the current age, 180 seconds, is less than max-age)? Does the browser hold an "expiry date" or a "current age" for every requested resource? If so, how can I inspect the "current age" at the time I made the request?
Check what browsers store in their cache
To have a better understanding on how the browser cache works, check what the browsers actually store in their cache:
Firefox: Navigate to about:cache.
Chrome: Navigate to chrome://cache.
Note that there's a key for each cache entry (requested URL). Associated with the key, you will find the whole response details (status codes, headers and content). With those details, the browser is able to determine the age of a requested resource and whether it's expired or not.
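A simplified model of such an entry, applied to the 14:00 example from the question, might look like this. It is hypothetical: it stores the time the response arrived and compares the elapsed time with max-age, ignoring the Date and Age headers that a real cache also takes into account; the example.com URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CacheEntry:
    """Simplified per-URL cache entry: the stored response plus its arrival time."""
    status: int
    headers: dict
    body: bytes
    response_time: datetime  # local clock when the response was received
    max_age: int             # seconds, taken from Cache-Control when storing

    def age(self, now: datetime) -> timedelta:
        return now - self.response_time

    def is_fresh(self, now: datetime) -> bool:
        return self.age(now) < timedelta(seconds=self.max_age)


cache = {}  # keyed by request URL
cache["https://example.com/app.js"] = CacheEntry(
    200, {"Cache-Control": "max-age=300"}, b"...", datetime(2024, 1, 1, 14, 0), 300
)
entry = cache["https://example.com/app.js"]
```

At 14:03 the entry's age is 180 seconds, below max-age=300, so it is reused without contacting the server; from 14:05 onwards it is stale.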
The reference for HTTP caching
RFC 7234, the reference for caching in HTTP/1.1 when this was written (it has since been obsoleted by RFC 9111), tells you a good part of the story about how caching is supposed to work:
2. Overview of Cache Operation
Proper cache operation preserves the semantics of HTTP transfers
while eliminating the transfer of information already
held in the cache. Although caching is an entirely OPTIONAL feature
of HTTP, it can be assumed that reusing a cached response is
desirable and that such reuse is the default behavior when no
requirement or local configuration prevents it. [...]
Each cache entry consists of a cache key and one or more HTTP
responses corresponding to prior requests that used the same key.
The most common form of cache entry is a successful result of a
retrieval request: i.e., a 200 (OK) response to a GET request, which
contains a representation of the resource identified by the request
target. However, it is also possible to
cache permanent redirects, negative results (e.g., 404 (Not Found)),
incomplete results (e.g., 206 (Partial Content)), and responses to
methods other than GET if the method's definition allows such caching
and defines something suitable for use as a cache key.
The primary cache key consists of the request method and target URI.
However, since HTTP caches in common use today are typically limited
to caching responses to GET, many caches simply decline other methods
and use only the URI as the primary cache key. [...]
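The keying rule in the last paragraph amounts to something like the following hypothetical helper. It is a sketch that performs no URI normalization.

```python
from typing import Optional, Tuple, Union

CacheKey = Union[str, Tuple[str, str]]


def primary_cache_key(method: str, uri: str, get_only: bool = True) -> Optional[CacheKey]:
    """Primary cache key: request method plus target URI.

    A cache that only stores GET responses can key on the URI alone and
    decline every other method (returns None for "do not cache").
    """
    method = method.upper()
    if get_only:
        return uri if method == "GET" else None
    return (method, uri)
```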
Some rules are defined regarding storing responses in caches:
3. Storing Responses in Caches
A cache MUST NOT store a response to any request, unless:
  The request method is understood by the cache and defined as being cacheable, and
  the response status code is understood by the cache, and
  the no-store cache directive does not appear in request or response header fields, and
  the private response directive does not appear in the response, if the cache is shared, and
  the Authorization header field does not appear in the request, if the cache is shared, unless the response explicitly allows it, and
  the response either:
    contains an Expires header field, or
    contains a max-age response directive, or
    contains a s-maxage response directive and the cache is shared, or
    contains a Cache Control Extension that allows it to be cached, or
    has a status code that is defined as cacheable by default, or
    contains a public response directive.
Note that any of the requirements listed above can be overridden by a cache-control extension. [...]
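The storage rule reads naturally as a single predicate. The sketch below is a simplification under stated assumptions: header names are matched with the exact capitalization shown, only GET and HEAD count as cacheable methods, the "status code understood" check and Cache Control Extensions are left out, and the Authorization exception uses the directives RFC 7234 section 3.2 names (must-revalidate, public, s-maxage). The function names are hypothetical.

```python
# Status codes cacheable by default, as listed in RFC 7231 section 6.1.
CACHEABLE_BY_DEFAULT = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(method: str, status: int, request_headers: dict,
              response_headers: dict, shared: bool) -> bool:
    req_cc = parse_cache_control(request_headers.get("Cache-Control", ""))
    resp_cc = parse_cache_control(response_headers.get("Cache-Control", ""))

    if method not in ("GET", "HEAD"):  # assumption: only these are cacheable here
        return False
    if "no-store" in req_cc or "no-store" in resp_cc:
        return False
    if shared and "private" in resp_cc:
        return False
    if shared and "Authorization" in request_headers and not (
        resp_cc.keys() & {"must-revalidate", "public", "s-maxage"}
    ):
        return False
    # The response must carry at least one of these signals.
    return (
        "Expires" in response_headers
        or "max-age" in resp_cc
        or (shared and "s-maxage" in resp_cc)
        or status in CACHEABLE_BY_DEFAULT
        or "public" in resp_cc
    )
```

Note that a plain 200 response with no caching headers at all passes, because 200 is cacheable by default.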
Usually (but not always) the server providing the resource will send a Date header, indicating the time at which the response was generated. Caching entities can use that Date and the current time to find the resource's age. If the Date response header does not appear, the caching entity will probably record the time the response was received in other metadata and use that for computing the age. Another possibly helpful response header to look for is Last-Modified.
So first, you should check whether the cached resource has a Date header for your own age calculation. If it is not present, the result will depend on which browser you are using and how that browser handles caching for Date-less resources. More information on HTTP caching and the various factors involved can be found in this caching tutorial.
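For reference, RFC 7234 (section 4.2.3) spells out the age calculation a cache performs. Below is a direct transcription into Python, with all times as Unix timestamps in seconds; the missing-Date fallback follows RFC 7231's rule that a recipient records the time the response was received. The function name is hypothetical.

```python
from typing import Optional


def current_age(now: float, request_time: float, response_time: float,
                date_value: Optional[float] = None, age_value: float = 0) -> float:
    """Age of a cached response in seconds (RFC 7234, section 4.2.3).

    request_time / response_time: local clock when the request was sent and
    the response arrived. date_value: the Date header; age_value: the Age header.
    """
    if date_value is None:
        # No Date header: use the time the response was received instead.
        date_value = response_time
    apparent_age = max(0, response_time - date_value)
    response_delay = response_time - request_time
    corrected_age_value = age_value + response_delay
    corrected_initial_age = max(apparent_age, corrected_age_value)
    resident_time = now - response_time
    return corrected_initial_age + resident_time
```

A response is then fresh while this age stays below its freshness lifetime (for example, the max-age value).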
Hope this helps!
I have a simple question. I googled but found no answer.
I have a page, and I want to disable caching for its content.
Yes, I can add a Cache-Control directive such as
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
But the question is: if there are NO cache-related HTTP headers in the response, such as Cache-Control, Expires, Pragma, Last-Modified, ..., does a browser or proxy ever cache it? If so, when?
Thank you!
RFC-compliant clients can be kept from caching a page through a variety of instructions. However, as far as a browser's history is concerned, anything goes.
If there are no headers suitable for cache validation, neither intermediaries nor clients should consider a response to be cacheable:
A cache MUST NOT store a response to any request, unless […] the response either:
  contains an Expires header field, or
  contains a max-age response directive, or
  contains a s-maxage response directive and the cache is shared, or
  contains a Cache Control Extension that allows it to be cached, or
  has a status code that is defined as cacheable by default, or
  contains a public response directive
A loophole may be those responses with status codes considered to be cacheable by default as per RFC 7231, section 6.1:
Responses with status codes that are defined as cacheable by default (e.g., 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, and 501 in this specification) can be reused by a cache with heuristic expiration unless otherwise indicated by the method definition or explicit cache controls
The linked section of RFC 7234 is deliberately vague. My reading is that caches are to interpolate reasonable expiration times based on whatever other headers they can find. This may very well allow expiration times to be based on the parameters of a Set-Cookie header. Going back to section 3, the closing statement reinforces this by stating that
[…] in normal operation, some caches will not store a response that has neither a cache validator nor an explicit expiration time, as such responses are not usually useful to store. However, caches are not prohibited from storing such responses.
Browsers, however, are free to serve pages out of their history at will. From section 6:
The freshness model does not necessarily apply to history mechanisms. That is, a history mechanism can display a previous representation even if it has expired.
In conclusion, intermediaries have a lot of liberty to cache a response with no obvious cache control instructions, provided that the request method (e.g. GET, HEAD) and the response code (see above) are cacheable in the first place. The browser's cache is supposed to behave like any ordinary intermediary (in a way it is one, really), but in the context of the history mechanism it is free to deliberately ignore all caching mechanisms (present or absent) and load pages directly from its memory.
I'm trying to work out a new caching policy for the static resources on a website. A common problem is that whenever JavaScript, CSS, etc. are updated, many users hold onto stale versions, because currently no caching-specific HTTP headers are included in the file responses.
This becomes a serious problem when, for example, the JavaScript updates are linked to server-side updates, and the stale JavaScript chokes on the new server responses.
Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily. So, setting the cache policy to a maximum of one hour seems alright, like cache-control: max-age=3600, no-cache.
My understanding is that this will always fetch a new copy of the resource if the cached copy is older than one hour. I'd specifically like to know if it's possible to set an HTTP header or combination of headers that will instruct browsers to only fetch a new copy if the resource was last checked more than one hour ago AND the resource has changed.
I'm just trying to avoid browsers blindly fetching new copies just because the cached resource is older than one hour, so I'd also like to add the condition that the resource has been changed.
Just to illustrate further what I'm asking:
New user arrives at site and gets fresh copy of script.js
User stays on site for 45 mins, browser uses cached copy of script.js all the time
User comes back to site 2 hours later, and browser asks the server if script.js has changed
If it has, then it gets a fresh copy and the process repeats
If it has not changed, then it uses the cached copy for the next hour, after which it will check again
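The five steps can be simulated to check the expected behavior under Cache-Control: public, max-age=3600 combined with an ETag. This is a toy model, not browser code: FakeServer and fetch are hypothetical, time is measured in seconds, and a 304 simply restarts the freshness clock.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MAX_AGE = 3600  # Cache-Control: public, max-age=3600


@dataclass
class Cached:
    body: str
    etag: str
    stored_at: float  # when the response (or its latest 304) arrived


class FakeServer:
    """Hypothetical origin serving script.js with an ETag."""

    def __init__(self, body: str, etag: str):
        self.body, self.etag = body, etag
        self.requests = 0

    def get(self, if_none_match: Optional[str] = None) -> Tuple[int, Optional[str], str]:
        self.requests += 1
        if if_none_match == self.etag:
            return 304, None, self.etag
        return 200, self.body, self.etag


def fetch(cache: Dict[str, Cached], server: FakeServer, now: float) -> Tuple[str, str]:
    entry = cache.get("script.js")
    if entry is not None and now - entry.stored_at < MAX_AGE:
        return entry.body, "from cache"  # fresh: no request at all
    status, body, etag = server.get(entry.etag if entry else None)
    if status == 304 and entry is not None:
        entry.stored_at = now  # revalidated: fresh for another hour
        return entry.body, "304"
    cache["script.js"] = Cached(body, etag, now)
    return body, "200"
```

Walking through the steps: the first visit gets a 200, the 45-minute session is served from cache, the return two hours later costs only a 304, and a changed file is picked up at the first check after the hour runs out.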
Have I misunderstood things? Is what I'm asking how it actually works, or do I have to do something different?
You have some serious misconceptions about what the various cache control directives do and why cache behaves as it does.
Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily ...
The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time.
That isn't what no-cache means or what it is intended for. It means that a client MUST NOT use a cached copy to satisfy a subsequent request without successful revalidation. It does not mean, and has never meant, "do not cache"; that is what the no-store directive is for.
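The distinction can be summarised as a small decision function. It is a sketch: the function name is hypothetical, and qualified forms such as no-cache="Set-Cookie" (which apply only to the named fields) are treated here like a bare no-cache.

```python
from typing import Tuple


def store_and_reuse(cache_control: str) -> Tuple[bool, bool]:
    """Return (may store the response, may reuse it while fresh without asking the server)."""
    directives = {
        part.strip().split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return False, False  # must not be kept at all
    if "no-cache" in directives:
        return True, False   # may be kept, but every reuse needs successful revalidation
    return True, True
```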
Also, the max-age directive is just the primary means by which caches calculate the freshness lifetime and expiration time of cached entries. The Expires header (minus the value of the Date header) can also be used, as can a heuristic based on the current UTC time and any Last-Modified header value.
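The order in which a cache picks the freshness lifetime (RFC 7234, section 4.2.1) can be sketched as follows. Assumptions: Cache-Control has already been parsed into a dict, header names use the capitalization shown, a missing Date header skips the Expires branch (a real cache would substitute the time of receipt), and the heuristic value is supplied by the caller. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(cc: dict, headers: dict, shared: bool,
                       heuristic: Optional[int] = None) -> Optional[int]:
    """Freshness lifetime in seconds; None means no lifetime could be determined."""
    if shared and "s-maxage" in cc:
        return int(cc["s-maxage"])
    if "max-age" in cc:
        return int(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            delta = (expires - date).total_seconds()
        except (TypeError, ValueError):
            # Invalid dates such as "Expires: 0" mean "already expired".
            return 0
        return max(0, int(delta))
    return heuristic
```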
Really, if your goal is to retain the cached copy of a resource for as long as it is meaningful while minimising requests and responses, you have a number of options.
The ETag (entity tag) header: this is supplied by the server in response to a request, in either a "strong" or a "weak" form. It is usually a hash based on the resource in question. When a client re-requests a resource, it can pass the stored ETag value in the If-None-Match request header. If the resource has not changed, the server will respond with 304 Not Modified.
You can think of ETags as fingerprints for resources. They can massively reduce the amount of information sent over the wire, as a full body is only served when the data has changed, but they have no bearing on the number or frequency of requests.
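On the server side, the ETag / If-None-Match exchange might look like the sketch below. Hashing the body with SHA-256 is one common way to produce a strong ETag, not the only one; handle_get is a hypothetical handler returning (status, headers, body).

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # Strong ETag: a quoted fingerprint of the exact bytes served.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, if_none_match: Optional[str]):
    etag = make_etag(body)
    if if_none_match is not None:
        # If-None-Match may list several tags; it uses weak comparison,
        # so a W/ prefix on a client-sent tag is ignored.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, {"ETag": etag}, b""  # unchanged: no body on the wire
    return 200, {"ETag": etag}, body
```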
The Last-Modified header: this is supplied by the server in response to a request, in HTTP-date format, and tells the client when the resource was last modified.
When a client re-requests a resource, it can pass the stored Last-Modified value in the If-Modified-Since request header. If the resource has not been modified since then, the server will respond with 304 Not Modified.
You can think of Last-Modified as a weaker form of entity checking than ETags. It addresses the same problem (bandwidth/redundancy) in a less robust way, and again it has no bearing at all on the actual number of requests made.
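The Last-Modified / If-Modified-Since exchange can be sketched the same way. handle_conditional is hypothetical; last_modified must be a timezone-aware UTC datetime, and because HTTP dates only carry whole seconds the comparison drops microseconds.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def handle_conditional(last_modified: datetime, if_modified_since: Optional[str]):
    """Return (status, headers) for a GET carrying an optional If-Modified-Since."""
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s resolution
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an invalid date is ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # not modified since the client's copy
    return 200, headers
```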
Revving: a technique that uses a combination of the Expires header and the name (URN) of a resource (see Steve Souders' blog post).
Here one basically sets a far-future Expires header, say 5 years from now, to ensure the static resource is cached for a long time.
You then have two options for updating: either append a versioning query string to the request URL, e.g. "/mystyles.css?v=1.1", and update the version number as and when the resource changes; or, better, version the file name itself, e.g. "/mystyles.v1.1.css", so that each version is cached for as long as possible.
This way, not only do you reduce the amount of bandwidth used, you also eliminate all checks to see if the resource has changed until you rename it.
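A sketch of revving with the file-name approach, assuming a short content hash as the version string. That is a common variant; the example above uses a hand-maintained version such as v1.1. revved_name and far_future_expires are hypothetical helpers.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from pathlib import PurePosixPath


def revved_name(path: str, content: bytes) -> str:
    """'/mystyles.css' -> '/mystyles.<hash>.css'; the name changes whenever the content does."""
    p = PurePosixPath(path)
    version = hashlib.sha1(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{version}{p.suffix}"))


def far_future_expires(now: datetime, years: int = 5) -> dict:
    """An Expires header roughly `years` years ahead; `now` must be a UTC datetime."""
    expires = now + timedelta(days=365 * years)
    return {"Expires": format_datetime(expires, usegmt=True)}


headers = far_future_expires(datetime.now(timezone.utc))
```

Pages then reference the revved name, so a new deployment produces a new URL and the old cached copy is simply never requested again.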
I suppose the main point here is that none of the cache control directives you mention (max-age, public, etc.) has any bearing at all on whether a 304 response is generated. For that, use either ETag / If-None-Match or Last-Modified / If-Modified-Since, or a combination of them (with If-Modified-Since as a fallback mechanism to If-None-Match).
It seems that I have misunderstood how it works, because some testing in Chrome has revealed exactly the behavior that I was looking for in the 5 steps I mentioned.
It doesn't blindly grab a fresh copy from the server when the max-age has expired. It does a GET, and if the response is 304 (Not Modified), it continues using the cached copy until the next hour has expired, at which point it checks for changes again etc.
The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time. So what I was really looking for is:
Cache-Control: public, max-age=3600