Max-age and 304 Not Modified Processing - http

I've been looking at the standards - but was not entirely sure about the following:
If we have a variant (resource, image, page, etc.) that is served with a cache setting of max-age=259200 (3 days) and the server is also processing ETags and Last-Modified dates - then what will happen when the max-age is reached but the resource has not been modified?
What I'm hoping will happen is that after 3 days the client will request the resource again and, if it has not changed, will receive a 304 Not Modified response. If the Cache-Control header of the 304 response still contains max-age=259200, then I'm hoping the client will continue to use its local cached copy and not request it again for another 3 days.
What I'm afraid will happen is that once the max-age is reached - the client will no longer cache the resource - making a fresh request each time the resource is loaded - followed by a 304 Not Modified response if the resource has not been modified. i.e. we're now getting http requests for every use as opposed to using the local cache for another 3 days.
Thoughts?

It will be cached for 3 more days. RFC 2616, section 10.3.5:
If a cache uses a received 304 response to update a cache entry, the cache MUST update the entry to reflect any new field values given in the response.
Details about age calculation.
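To make that concrete, here is a minimal Python sketch of what a private cache might do with a 304 (RFC 7234, section 4.3.4, calls this "freshening" a stored response). It assumes lower-cased header names and a simplified age (seconds since the response was received or last validated, ignoring Date and Age headers); CacheEntry, max_age, is_fresh and freshen are hypothetical names, not any browser's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    headers: Dict[str, str]  # stored response header fields, lower-cased names
    body: bytes              # the cached representation itself
    response_time: float     # seconds; when the response was received or last validated


def max_age(headers: Dict[str, str]) -> Optional[int]:
    """Return the max-age directive from Cache-Control, or None if absent."""
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(entry: CacheEntry, now: float) -> bool:
    """Fresh while the (simplified) current age is below the freshness lifetime."""
    lifetime = max_age(entry.headers)
    return lifetime is not None and now - entry.response_time < lifetime


def freshen(entry: CacheEntry, not_modified_headers: Dict[str, str], now: float) -> None:
    """Apply a 304 Not Modified: keep the cached body, take the new header values."""
    entry.headers.update({k.lower(): v for k, v in not_modified_headers.items()})
    entry.response_time = now  # the age effectively restarts at validation time
```

With max-age=259200, an entry received at t=0 goes stale after 3 days; a 304 carrying the same Cache-Control at that point makes it fresh for another 3 days, which is the behaviour the question hopes for.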

Related

cache-control not working without etag

I am sending the following header in the response: "Cache-Control: public, max-age=300", but every time I hit refresh I still get a 200 response (the request is made to the server again). The same happens if I add the "Expires" header.
But if I add an ETag to the headers, then I get 304 on refresh (the request goes to the server, the server prepares the response, then matches the ETag and returns a 304 response).
What should I change so that the "Cache-Control" header is used and the content is served from the local cache, with no request sent to the server until the age exceeds "max-age"?
EDIT: Here is an image that doesn't get cached https://image-dev-dot-quizizz-dev.appspot.com/resource/gs/quizizz-image/rejected.jpeg
Your image is in a proxy cache: notice that the response includes an Age header. Furthermore, every subsequent request within 300 seconds takes much less time. Why is the status not 304? According to this article:
200 (from cache) vs 304
Now the other day, while performing a site performance audit, I noticed that a lot of our assets were returning 304 statuses. While comparing another site, I noticed that it was returning a 200 (from cache) status code. This intrigued me and I wanted to dig deeper.
It turns out that when a 200 (from cache) response is given, it means that a future expiration date on the content is set. In essence, the browser doesn’t even really communicate with the server to check on the file. It knows not to do it until the expiration date has passed.
By contrast, a 304 goes to the server and receives a response back that the data has not changed. The server is telling the browser to use the cache as a result.

How can I tell the "current age" of a cached page?

I'm wondering how the browser determines whether a cached resource has expired or not.
Assume that I have set max-age to 300 in the Cache-Control header. I made a request at 14:00, and 3 minutes later I made another request to the same resource. How can the browser tell that the resource hasn't expired (the current age, 180 seconds, is less than max-age)? Does the browser hold an "expiry date" or "current age" for every requested resource? If so, how can I inspect the "current age" at the time I made the request?
Check what browsers store in their cache
To get a better understanding of how the browser cache works, check what browsers actually store in their cache:
Firefox: Navigate to about:cache.
Chrome: Navigate to chrome://cache (older versions only; newer Chrome releases have removed this page).
Note that there's a key for each cache entry (requested URL). Associated with the key, you will find the whole response details (status codes, headers and content). With those details, the browser is able to determine the age of a requested resource and whether it's expired or not.
The reference for HTTP caching
RFC 7234, the reference for caching in HTTP/1.1 at the time of writing (since superseded by RFC 9111), tells you a good part of the story about how caching is supposed to work:
2. Overview of Cache Operation
Proper cache operation preserves the semantics of HTTP transfers while eliminating the transfer of information already held in the cache. Although caching is an entirely OPTIONAL feature of HTTP, it can be assumed that reusing a cached response is desirable and that such reuse is the default behavior when no requirement or local configuration prevents it. [...]
Each cache entry consists of a cache key and one or more HTTP responses corresponding to prior requests that used the same key. The most common form of cache entry is a successful result of a retrieval request: i.e., a 200 (OK) response to a GET request, which contains a representation of the resource identified by the request target. However, it is also possible to cache permanent redirects, negative results (e.g., 404 (Not Found)), incomplete results (e.g., 206 (Partial Content)), and responses to methods other than GET if the method's definition allows such caching and defines something suitable for use as a cache key.
The primary cache key consists of the request method and target URI. However, since HTTP caches in common use today are typically limited to caching responses to GET, many caches simply decline other methods and use only the URI as the primary cache key. [...]
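As an illustration of the cache-key rule quoted above, here is a minimal Python sketch, assuming a cache that (like many in practice) only stores GET responses and keys them by URI alone; SimpleCache and its methods are hypothetical names, not a real library.

```python
from typing import Dict, List, Optional


class SimpleCache:
    """Toy cache: each cache key maps to one or more stored responses."""

    def __init__(self, get_only: bool = True) -> None:
        self.get_only = get_only
        self.entries: Dict[str, List[dict]] = {}

    def key(self, method: str, target_uri: str) -> Optional[str]:
        method = method.upper()
        if self.get_only:
            # Common simplification: decline non-GET methods, key on the URI only.
            return target_uri if method == "GET" else None
        # Full primary cache key: request method plus target URI.
        return f"{method} {target_uri}"

    def store(self, method: str, target_uri: str, response: dict) -> bool:
        k = self.key(method, target_uri)
        if k is None:
            return False  # this cache declines the method
        self.entries.setdefault(k, []).append(response)
        return True
```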
Some rules are defined regarding storing responses in caches:
3. Storing Responses in Caches
A cache MUST NOT store a response to any request, unless:
  The request method is understood by the cache and defined as being cacheable, and
  the response status code is understood by the cache, and
  the no-store cache directive does not appear in request or response header fields, and
  the private response directive does not appear in the response, if the cache is shared, and
  the Authorization header field does not appear in the request, if the cache is shared, unless the response explicitly allows it, and
  the response either:
    contains an Expires header field, or
    contains a max-age response directive, or
    contains a s-maxage response directive and the cache is shared, or
    contains a Cache Control Extension that allows it to be cached, or
    has a status code that is defined as cacheable by default, or
    contains a public response directive.
Note that any of the requirements listed above can be overridden by a cache-control extension. [...]
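The storage rules above can be read as one predicate. The following is a simplified Python sketch, assuming lower-cased header names, a cache that treats only GET and HEAD as cacheable methods, and a naive Cache-Control parser that ignores quoted commas; Cache Control Extensions are not modelled. The status-code sets reflect RFC 7231 (plus 308 from RFC 7538) as best I know them, and may_store and the constant names are hypothetical.

```python
from typing import Dict, Mapping, Optional

CACHEABLE_METHODS = {"GET", "HEAD"}  # assumption: methods this cache treats as cacheable
# Status codes cacheable by default (RFC 7231 section 6.1, plus 308 from RFC 7538).
CACHEABLE_BY_DEFAULT = {200, 203, 204, 206, 300, 301, 308, 404, 405, 410, 414, 501}
# Assumption: this cache also understands a few codes it will not store by default.
UNDERSTOOD_STATUSES = CACHEABLE_BY_DEFAULT | {302, 303, 307}


def directives(value: str) -> Dict[str, Optional[str]]:
    """Naive Cache-Control parser: 'public, max-age=60' -> {'public': None, 'max-age': '60'}."""
    result: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            result[name.lower()] = arg.strip('"') or None
    return result


def may_store(method: str, req: Mapping[str, str], status: int,
              resp: Mapping[str, str], shared: bool) -> bool:
    req_cc = directives(req.get("cache-control", ""))
    resp_cc = directives(resp.get("cache-control", ""))
    if method.upper() not in CACHEABLE_METHODS or status not in UNDERSTOOD_STATUSES:
        return False
    if "no-store" in req_cc or "no-store" in resp_cc:
        return False
    if shared and "private" in resp_cc:
        return False
    if shared and "authorization" in req:
        # Only allowed if the response explicitly permits it.
        if not any(d in resp_cc for d in ("must-revalidate", "public", "s-maxage")):
            return False
    # The response must carry some explicit or default permission to be stored.
    return ("expires" in resp
            or "max-age" in resp_cc
            or (shared and "s-maxage" in resp_cc)
            or status in CACHEABLE_BY_DEFAULT
            or "public" in resp_cc)
```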
Usually (but not always) the server providing the resource will send a Date header, indicating the time at which the response was generated. Caching entities can use that Date and the current time to find the resource's age. If the Date response header does not appear, the caching entity will probably record the resource's request time in other metadata and use that metadata for computing the age. Another possibly helpful response header to look for is the Last-Modified response header.
So first, you should check whether the cached resource has a Date header for your own age calculation. If it is not present, it will depend on which browser you are using and how that browser handles caching for Date-less resources. More information on HTTP caching and the various factors involved can be found in this caching tutorial.
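For the question of how an age like 180 seconds is derived, RFC 7234 (section 4.2.3) defines an age calculation. Below is a Python sketch of it, assuming times are POSIX timestamps recorded by the cache and that, when Date is missing, the receive time stands in for it; current_age and its parameters are hypothetical names.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def current_age(headers: Mapping[str, str], request_time: float,
                response_time: float, now: float) -> float:
    """Age of a stored response in seconds (RFC 7234, section 4.2.3).

    request_time / response_time: when the cache sent the request and received
    the response; now: the current time. All are POSIX timestamps.
    """
    date_header = headers.get("date")
    # Assumption: without a Date header, use the time the response was received.
    date_value = (parsedate_to_datetime(date_header).timestamp()
                  if date_header else response_time)
    age_value = int(headers.get("age", "0"))  # age reported by upstream caches
    apparent_age = max(0.0, response_time - date_value)
    response_delay = response_time - request_time
    corrected_age_value = age_value + response_delay
    corrected_initial_age = max(apparent_age, corrected_age_value)
    resident_time = now - response_time  # time spent in this cache
    return corrected_initial_age + resident_time
```

In the 14:00 example, a response received at 14:00 with max-age=300 has a current age of 180 at 14:03, so it is still fresh (180 < 300).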
Hope this helps!

Recognize HTTP 304 in service worker / fetch()

I built a service worker which always responds with data from the cache and then, in the background, sends a request to the server. If the server responds with HTTP 304 Not Modified, everything is fine; if the server responds with HTTP 200, the data has changed, so the new file is put into the cache and the user is notified and asked to refresh the page.
I use the If-Modified-Since / Last-Modified headers to make sure the client gets the most up-to-date version. When a request is sent via fetch(), the request passes through the HTTP cache on its way to the network, and the response also passes through the HTTP cache when it arrives on the client. The problem is that when the response has the status 304 Not Modified, the HTTP cache responds to the service worker with the cached version and replaces the status with 200 (as described in the Fetch specification, HTTP-network-or-cache fetch). In the service worker, there is no way to find out whether the 200 response was originally sent by the server (the user needs to be updated) or was produced by the cache after the server responded with 304 (the most up-to-date version is already loaded).
There is the cache mode flag, which can be set to no-cache, but this bypasses the HTTP cache also when the request is sent to the server, which means that the If-Modified-Since header is not set and the server has no chance to find out which version the client has. Also, this flag is only supported by Firefox Nightly right now.
I think the best solution is to set a custom HTTP header like x-was-modified when the server responds with 200. This custom header can be accessed in the service worker and can be used to find out whether a resource was updated or not - even if the HTTP cache replaces the 304 status with 200.
Is this a legit solution/workaround? Are there any suggested approaches to solve this problem?
Should I even rely on HTTP headers which are supposed to handle the HTTP cache when implementing the service worker cache? Or should I rather use custom x-if-modified-since / x-last-modified headers and use indexedDB to store the information on the client and append it to each request?
Why does fetch() even replace the 304 code with 200 if there is an up-to-date version in the cache?
You can’t rely on the status code (304 vs. 200) to determine whether something has changed. What if some other part of your code requests the same resource, thus updating the browser’s cache?
Instead, simply compare the response’s Last-Modified header to what you sent in If-Modified-Since, or whatever you last saw in Last-Modified. If the values don’t match, something has changed.
For more precision (if the data can change several times in 1 second), consider using ETag instead of Last-Modified.
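The comparison suggested above does not depend on the status code at all. Here is a language-neutral sketch in Python rather than service-worker JavaScript, assuming lower-cased header names and that the client remembers the validators it last saw; has_changed is a hypothetical helper.

```python
from typing import Mapping


def has_changed(last_seen: Mapping[str, str], response_headers: Mapping[str, str]) -> bool:
    """Compare validators instead of trusting a 200 that may have been a 304."""
    for name in ("etag", "last-modified"):  # prefer ETag: finer than 1-second dates
        old, new = last_seen.get(name), response_headers.get(name)
        if old is not None and new is not None:
            return old != new
    return True  # nothing to compare: treat as changed
```

Here last_seen would be whatever the client stored the previous time, for example alongside the cached copy.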
Why does fetch() even replace the 304 code with 200 if there is an up-to-date version in the cache?
Because usually people just want to get fresh content, regardless of where it comes from. A 304 response is only interesting to those who implement their own HTTP caches.

Checking if HTTP resource has changed after maximum cache time has expired

I'm trying to work out a new caching policy for the static resources on a website. A common problem is that whenever JavaScript, CSS, etc. is updated, many users hold onto stale versions, because currently no caching-specific HTTP headers are included in the file responses.
This becomes a serious problem when, for example, the JavaScript updates are linked to server-side updates and the stale JavaScript chokes on the new server responses.
Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily. So, setting the cache policy to a maximum of one hour seems alright, like cache-control: max-age=3600, no-cache.
My understanding is that this will always fetch a new copy of the resource if the cached copy is older than one hour. I'd specifically like to know if it's possible to set an HTTP header or combination of headers that will instruct browsers to only fetch a new copy if the resource was last checked more than one hour ago AND if the resource has changed.
I'm just trying to avoid browsers blindly fetching new copies just because the cached resource is older than one hour, so I'd also like to add the condition that the resource has been changed.
Just to illustrate further what I'm asking:
New user arrives at site and gets fresh copy of script.js
User stays on site for 45 mins, browser uses cached copy of script.js all the time
User comes back to site 2 hours later, and browser asks the server if script.js has changed
If it has, then it gets a fresh copy and the process repeats
If it has not changed, then it uses the cached copy for the next hour, after which it will check again
Have I misunderstood things? Is what I'm asking how it actually works, or do I have to do something different?
You have some serious misconceptions about what the various cache control directives do and why cache behaves as it does.
Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily ...
The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time.
That isn't what no-cache means or what it is intended for. It means that a client MUST NOT use a cached copy to satisfy a subsequent request without successful revalidation. It does not, and has never, meant "do not cache"; that is what the no-store directive is for.
Also, the max-age directive is just the primary means for caches to calculate the freshness lifetime and expiration time of cached entries. The Expires header (minus the value of the Date header) can also be used, as can a heuristic based on the current UTC time and any Last-Modified header value.
Really, if your goal is to retain the cached copy of a resource for as long as it is meaningful while minimising requests and responses, you have a number of options.
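The three sources of freshness lifetime described above can be sketched in Python, assuming a private cache (s-maxage ignored), lower-cased header names, and the 10% of (Date minus Last-Modified) heuristic that RFC 7234 mentions as a typical setting; the function names are hypothetical. The directive set also shows that no-cache and no-store are separate things: no-cache still allows storing but forces revalidation.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Set


def directive_names(headers: Mapping[str, str]) -> Set[str]:
    """Names of Cache-Control directives, e.g. {'max-age', 'no-cache'}."""
    parts = headers.get("cache-control", "").split(",")
    return {p.strip().split("=", 1)[0].lower() for p in parts if p.strip()}


def _timestamp(http_date: str) -> Optional[float]:
    try:
        return parsedate_to_datetime(http_date).timestamp()
    except (TypeError, ValueError):
        return None  # unparseable date


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[float]:
    # 1. max-age takes priority.
    for part in headers.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return float(value)
    date = _timestamp(headers["date"]) if "date" in headers else None
    # 2. Expires minus Date; an invalid Expires (e.g. "0") means already expired.
    if "expires" in headers and date is not None:
        expires = _timestamp(headers["expires"])
        return 0.0 if expires is None else max(0.0, expires - date)
    # 3. Heuristic: a fraction (here 10%) of the time since Last-Modified.
    if "last-modified" in headers and date is not None:
        last_modified = _timestamp(headers["last-modified"])
        if last_modified is not None:
            return max(0.0, date - last_modified) / 10
    return None  # no explicit or heuristic lifetime available
```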
The ETag (entity tag) header - this is supplied by the server in response to a request in either a "strong" or "weak" form. It is usually a hash based on the resource in question. When a client re-requests a resource, it can pass the stored value of the ETag with the If-None-Match request header. If the resource has not changed, the server will respond with 304 Not Modified.
You can think of ETags as fingerprints for resources. They can be used to massively reduce the amount of information sent over the wire, as only changed data is sent, but they have no bearing on the number or frequency of requests.
The Last-Modified header - this is supplied by the server in response to a request in HTTP-date format; it tells the client the last time the resource was modified.
When a client re-requests a resource, it can pass the stored value of the Last-Modified header with the If-Modified-Since request header. If the resource has not been modified since that time, the server will respond with 304 Not Modified.
You can think of Last-Modified as a weaker form of entity checking than ETags. It addresses the same problem (bandwidth/redundancy) in a less robust way, and again it has no bearing at all on the actual number of requests made.
Revving - a technique that uses a combination of the Expires header and the name (URL) of a resource (see Steve Souders' blog post).
Here one basically sets a far-future Expires header, say 5 years from now, to ensure the static resource is cached for a long time.
You then have two options for updating: either append a versioning query string to the request URL, e.g. "/mystyles.css?v=1.1", and update the version number as and when the resource changes; or, better, version the file name itself, e.g. "/mystyles.v1.1.css", so that each version is cached for as long as possible.
This way, not only do you reduce the amount of bandwidth, you also eliminate all checks to see if the resource has changed until you rename it.
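A small Python sketch of both revving options and the far-future Expires header, assuming the 5-year horizon is approximated as 5 × 365 days; query_string_version, filename_version and far_future_expires are hypothetical helpers.

```python
import posixpath
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def query_string_version(path: str, version: str) -> str:
    """Option 1: '/mystyles.css' -> '/mystyles.css?v=1.1'."""
    return f"{path}?v={version}"


def filename_version(path: str, version: str) -> str:
    """Option 2 (preferred above): '/mystyles.css' -> '/mystyles.v1.1.css'."""
    root, ext = posixpath.splitext(path)
    return f"{root}.v{version}{ext}"


def far_future_expires(now: datetime, years: int = 5) -> str:
    """HTTP-date roughly `years` ahead for the Expires header (now must be UTC-aware)."""
    return format_datetime(now + timedelta(days=365 * years), usegmt=True)
```

Because each version gets its own URL, the browser never needs to revalidate an old name; changing the version simply makes it request a URL it has not cached yet.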
I suppose the main point here is that none of the cache control directives you mention (max-age, public, etc.) have any bearing at all on whether a 304 response is generated. For that, use either ETag / If-None-Match or Last-Modified / If-Modified-Since, or a combination of them (with If-Modified-Since as a fallback mechanism to If-None-Match).
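That precedence (If-None-Match first, If-Modified-Since only as the fallback) can be sketched on the server side in Python. Assumptions: lower-cased request header names, a UTC-aware Last-Modified timestamp, and weak comparison for If-None-Match; make_etag and is_not_modified are hypothetical helpers, and hashing the body is just one way to produce an ETag.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def make_etag(body: bytes) -> str:
    """A strong ETag derived from a hash of the representation."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    """Weak comparison ignores the W/ prefix."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(current_etag: str, last_modified: datetime,
                    request: Mapping[str, str]) -> bool:
    """True if the server should answer 304 Not Modified."""
    if_none_match = request.get("if-none-match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = {_opaque(t) for t in if_none_match.split(",")}
        return "*" in tags or _opaque(current_etag) in tags
    if_modified_since = request.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # invalid date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP-dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since
    return False
```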
It seems that I have misunderstood how it works, because some testing in Chrome has revealed exactly the behavior that I was looking for in the 5 steps I mentioned.
It doesn't blindly grab a fresh copy from the server when the max-age has expired. It does a GET, and if the response is 304 (Not Modified), it continues using the cached copy until the next hour has expired, at which point it checks for changes again etc.
The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time. So what I was really looking for is:
Cache-Control: public, max-age=3600
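The five-step behaviour observed in Chrome can be modelled with a toy simulation, assuming Cache-Control: public, max-age=3600, an ETag-based validator, and a simplified age (seconds since the copy was stored or last revalidated). The Browser class is hypothetical and not how any real browser is implemented.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AGE = 3600  # from "Cache-Control: public, max-age=3600"


@dataclass
class Browser:
    etag: Optional[str] = None         # validator of the cached copy
    stored_at: Optional[float] = None  # when the copy was stored or last revalidated
    requests: int = 0                  # network requests actually made

    def load(self, now: float, server_etag: str) -> str:
        if self.stored_at is not None and now - self.stored_at < MAX_AGE:
            return "200 (from cache)"  # fresh: no request at all
        self.requests += 1  # stale or missing: go to the network (conditional if we hold an ETag)
        if self.etag == server_etag:
            self.stored_at = now  # 304 freshens the copy for another MAX_AGE
            return "304 Not Modified"
        self.etag, self.stored_at = server_etag, now  # changed (or first visit): new copy
        return "200 OK"
```

Replaying the steps (first visit, 45 minutes of browsing, return 2 hours later, then a change after another hour) gives 200 OK, 200 (from cache), 304 Not Modified, 200 (from cache), 200 OK, with only three network requests.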

304 Not Modified with 200 (from cache)

I'm trying to understand what exactly is the difference between "status 304 not modified" and "200 (from cache)"
I'm getting 304 on the JavaScript files that I changed most recently. Why does this happen?
Thanks for the assistance.
https://sookocheff.com/post/api/effective-caching/ is an excellent source to form the required understanding around these 2 HTTP status codes.
After reading this thoroughly, I came to this understanding:
In typical usage, when a URL is retrieved, the web server will return the resource's current representation along with its corresponding ETag value, which is placed in an HTTP response header "ETag" field. The client may then decide to cache the representation, along with its ETag. Later, if the client wants to retrieve the same URL resource again, it will first determine whether the locally cached version of the URL has expired (through the Cache-Control and Expires headers). If the URL has not expired, it will retrieve the locally cached resource. If it determines that the URL has expired (is stale), then the client will contact the server and send its previously saved copy of the ETag along with the request in an "If-None-Match" field. (Source: https://en.wikipedia.org/wiki/HTTP_ETag)
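The client-side flow in that quote comes down to one decision per request. A Python sketch, assuming the age and freshness lifetime have already been computed and ignoring Vary and no-cache; next_request is a hypothetical helper that returns None for a cache hit, or the headers for the request to send.

```python
from typing import Dict, Mapping, Optional


def next_request(stored: Mapping[str, str], age: float,
                 lifetime: float) -> Optional[Dict[str, str]]:
    """None means "serve from cache"; otherwise the headers for the GET to send."""
    if age < lifetime:
        return None  # fresh: reported as "200 (from cache)", no network traffic
    conditional: Dict[str, str] = {}
    if "etag" in stored:
        conditional["If-None-Match"] = stored["etag"]
    if "last-modified" in stored:
        conditional["If-Modified-Since"] = stored["last-modified"]
    return conditional  # empty dict: no validators, so a plain GET
```

A 304 answer to that conditional GET is what lets the browser keep and re-freshen its stored copy.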
But even when the expiry time for an asset is set in the future, the browser can still contact the server for a conditional GET using the ETag, as per the 'Vary' header. Details on the 'Vary' header: https://www.fastly.com/blog/best-practices-using-vary-header/
"200 (from cache)" means the browser found a cached response for the request made. So instead of making a network call to fetch that resource from the remote server, it simply made use of the cached response.
Now these cached responses have a lifetime associated with them. With time, the responses can become stale. And if these are not allowed to be served stale (see section 4.2.4 of RFC 7234), the browser needs to contact the remote server and verify whether these cached responses are still valid. The 304 response status code is the server's way of letting the browser know that the response hasn't changed and is still valid.
If the browser receives a 304 response, it can "freshen" the relevant stale response(s). And subsequent requests to the resource can again be served from the browser cache without checking with the remote server (until the response becomes stale again).
304 Not Modified
A 304 Not Modified means that the files have not been modified since the specified version by the "If-Modified-Since", or "If-None-Match".
200 OK
This is the response you'll get if an HTTP request succeeds. For a GET request, the body contains the requested resource; for a POST request, it typically holds the result of the action.
Happy coding!
