`Cache-Control: max-age=0` in an HTTP request

I have one question: suppose every HTTP request carries a Cache-Control: max-age=0 header, so each request goes all the way to the origin web server.
Does that mean a CDN is no longer useful if all requests are like this?

From another post:
When sent by the user agent
I believe shahkalpesh's answer applies to the user agent side. You can also look at section 13.2.6, Disambiguating Multiple Responses (RFC 2616).
If a user agent sends a request with Cache-Control: max-age=0 (a.k.a. "end-to-end revalidation"), then each cache along the way will revalidate its cache entry (e.g. with the If-Modified-Since or If-None-Match header) all the way to the origin server. If the reply is then 304 (Not Modified), the cached entity can be used.
On the other hand, sending a request with Cache-Control: no-cache (a.k.a. "end-to-end reload") doesn't revalidate, and the server MUST NOT use a cached copy when responding.
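The two request forms above can be sketched as the decision a cache makes about a stored entry. This is a simplified model, assuming ages and lifetimes in whole seconds and using the RFC 2616 terms quoted above; it ignores max-stale, min-fresh, Pragma and quoted arguments containing commas, and the function names are hypothetical.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or True} (simplified)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def request_action(request_cache_control, age, freshness_lifetime):
    """Decide what a cache does with a stored entry for one incoming request.

    Returns "serve" (use the stored copy), "revalidate" (conditional request
    upstream, e.g. with If-None-Match / If-Modified-Since) or "reload"
    (full request upstream, stored copy not used).
    """
    cc = parse_cache_control(request_cache_control)
    if "no-cache" in cc:
        return "reload"              # RFC 2616 "end-to-end reload"
    fresh = age < freshness_lifetime
    if "max-age" in cc:
        # The client refuses a stored response older than max-age seconds,
        # so max-age=0 forces revalidation of any copy that has aged at all.
        fresh = fresh and age <= int(cc["max-age"])
    return "serve" if fresh else "revalidate"


print(request_action("max-age=0", age=30, freshness_lifetime=3600))  # revalidate
print(request_action("no-cache", age=30, freshness_lifetime=3600))   # reload
print(request_action("", age=30, freshness_lifetime=3600))           # serve
```

The newer RFC 7234 phrases the no-cache request directive as "must not use a stored response without successful validation", so a cache may also satisfy it with a conditional request; the "reload" branch follows the older RFC 2616 wording quoted above.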
This makes sense and matches my results.
Even when the copy cached in Chrome has not expired, Chrome sends the request to the CDN, and the CDN revalidates it with the origin using If-Modified-Since before serving the end user.

By setting max-age to 0, you effectively expire your page in your CDN edge cache immediately. Therefore, your CDN always hits your origin, which renders the CDN useless, as you suggested.
I noticed from your other question that you are using Akamai. If so, you can use the Edge-Control header to override your Cache-Control header if you don't have direct control over its value but still want to leverage CDN functionality.
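The point about max-age=0 can be illustrated with a toy model of an edge cache, assuming a single fixed freshness lifetime per cache (a hypothetical class, not any CDN's API). With a lifetime of 0 seconds every request contacts the origin, even if the origin only answers 304 Not Modified; with a positive lifetime, repeat requests are answered at the edge.

```python
class EdgeCache:
    """Toy CDN edge node: one stored copy per URL, fixed freshness lifetime."""

    def __init__(self, max_age):
        self.max_age = max_age       # seconds a stored copy counts as fresh
        self.stored_at = {}          # url -> time the copy was stored/revalidated
        self.origin_contacts = 0

    def get(self, url, now):
        stored = self.stored_at.get(url)
        if stored is not None and now - stored < self.max_age:
            return "HIT"             # fresh: answered at the edge
        # Missing or stale: contact the origin (full fetch or revalidation).
        self.origin_contacts += 1
        self.stored_at[url] = now
        return "MISS" if stored is None else "REVALIDATED"


for max_age in (0, 3600):
    edge = EdgeCache(max_age)
    for t in range(100):             # 100 requests, one per second
        edge.get("/index.html", now=t)
    print(max_age, edge.origin_contacts)   # prints "0 100", then "3600 1"
```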

Related

What happens if the HTTP request Cache-Control header is different from the response Cache-Control header?

In my example (shown as a screenshot in the original post), the response's Cache-Control header is set to max-age, which is the maximum amount of time a resource is considered fresh. I believe that if we make a request within that time frame, the browser will serve the local copy without asking the server. The request's Cache-Control header, however, is set to no-cache, which, according to MDN, means:
response may be stored by any cache, even if the request is normally non-cacheable. However, the stored response MUST always go through validation with the origin server first before using it.
So here we have a contradiction:
The Cache-Control directive in the request is no-cache, so the user agent has to consult the server first before using the cache to fulfill the request.
The Cache-Control in the response has max-age=86400, suggesting that within that time frame user agents can just use the cache to fulfill the request.
If the time specified in the response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of the request's no-cache directive?
Yes, a request will be sent to the origin server. From the specification:
The no-cache request directive indicates that a cache MUST NOT use a stored response to satisfy the request without successful validation on the origin server.
There's no contradiction. The max-age in the response indicates how long it can be considered to be fresh. It doesn't obligate anyone to use it. Indeed, caching is an entirely optional part of HTTP, so sending a full request to the origin every time would also be fully compliant with the specification.
Now imagine that the response uses no-cache and the request uses max-age=86400. Again, a request would be sent to the origin server, because "the no-cache response directive indicates that the response MUST NOT be used to satisfy a subsequent request without successful validation on the origin server."
So the real asymmetry here is not between requests and responses, but between caching (optional) and not caching (obligatory when specified).
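The rule in this answer can be condensed into a small predicate. It is a sketch that assumes the only relevant directives are no-cache and max-age on either side, treats a response without max-age as having no freshness (ignoring Expires, s-maxage and heuristics), and uses hypothetical function names.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or True} (simplified)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if part:
            name, sep, arg = part.partition("=")
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def must_contact_origin(request_cc, response_cc, age):
    """True if a cache must go to the origin before answering this request."""
    req = parse_cache_control(request_cc)
    resp = parse_cache_control(response_cc)
    # Not caching is obligatory: no-cache on either side forces validation.
    if "no-cache" in req or "no-cache" in resp:
        return True
    # Caching is optional and only allowed while the response is fresh...
    lifetime = int(resp.get("max-age", 0))
    if age >= lifetime:
        return True
    # ...and young enough for the client, if the request set its own max-age.
    return "max-age" in req and age > int(req["max-age"])


print(must_contact_origin("no-cache", "max-age=86400", age=60))   # True
print(must_contact_origin("max-age=86400", "no-cache", age=60))   # True
print(must_contact_origin("", "max-age=86400", age=60))           # False
```

A cache is still free to contact the origin when this returns False; the predicate only marks when it is obliged to.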
Yes, the cache will be bypassed and a request will be sent to the server.
If the client sets max-age and there is no max-stale present, no request is sent until the max-age expires. On the other hand, if the client sets no-cache, a request is always sent to the server, regardless of how fresh the stored copy is.
In conclusion, the max-age of the current request is compared with the age of the stored response; if the request's max-age is 0 or the request carries no-cache, a request must always be sent, because the client does not want a cached copy of that resource to be reused without asking the server.

Are both "Cache request directives" and "Cache response directives" needed?

If I already have "Cache request directives", what is the point of "Cache response directives"? Do they add anything? Will my application run the same without them?
I am looking for proof of whether "Cache response directives" are redundant. If they are, I will not bother with them.
I assume you are asking as an application developer and if so, you should not bother with any Cache-Control header your application receives in a request.
Why?
Because that Cache-Control header is intended for caches before the request reaches your application.
It is not for your application.
This is explained in RFC 7234 Section 5.2:
The "Cache-Control" header field is used to specify directives for caches along the request/response chain.
The purpose of the header is to tell caches what to do with the request.
Your application receives the header because it is attached to a request.
But just because you receive it, it doesn't mean it is for you.
Bottom line: ignore any Cache-Control header in a request.
Cache-Control in a response comes from your application and it is also intended for caches.
You use it to tell caches what to do with the response.
Basically, you use the header to specify whether the response is cacheable and if it is, for how long.
It is not merely a copy of the Cache-Control header received in a request.
Do they add anything?
Yes, they do.
Cache-Control in a response tells caches whether the response is cacheable and if it is,
it allows caches to serve an equivalent request immediately with a cached response.
This reduces your application's load and improves response times from a client's point of view.
RFC7234 Section 4.2 states:
When a response is "fresh" in the cache, it can be used to satisfy subsequent requests without contacting the origin server, thereby improving efficiency.
Your next question:
Will my application run the same without them?
It depends.
If your application doesn't add an appropriate Cache-Control header to responses that must not be reused from a cache as-is, future requests may receive stale responses.
So I recommend that, at the very least, you add Cache-Control: no-cache to responses that must not be served from a cache without revalidation (or Cache-Control: no-store if they must not be stored at all).
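A minimal sketch of this advice as a WSGI application (Python's standard web-server interface). The paths and the one-day policy for /static/ are hypothetical; the point is that the application ignores the request's Cache-Control and decides the response's Cache-Control itself.

```python
def app(environ, start_response):
    # environ.get("HTTP_CACHE_CONTROL") holds the request's Cache-Control, which is
    # addressed to caches along the way; this application deliberately ignores it.
    path = environ.get("PATH_INFO", "/")
    if path.startswith("/static/"):
        cache_control = "public, max-age=86400"  # hypothetical: static files fresh for a day
    else:
        cache_control = "no-cache"               # reuse only after revalidation
    body = b"example body"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", cache_control),
    ])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves it on port 8000.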
Additional explanation for your question in the comment section
The header should generally come from your backend, not your frontend.
This allows caches to accurately accelerate requests to your backend and keeps your frontend request code simple.
There is one exception: if the backend isn't yours and its response freshness policy doesn't match your requirement.
An example scenario may be in order:
Let's say, that in addition to sending requests to your own backend, your frontend also sends requests to someone else's backend.
This particular backend specifies that its responses are cacheable for at most 5 minutes, by sending either Cache-Control: max-age=300 or an appropriate Expires header.
Let's also say that you want the responses to be no more than 10 seconds stale, because 5 minutes is too stale for you.
Since the backend isn't yours, you can't change the 5-minute directive, but you can send your requests with Cache-Control: max-age=10, thereby forcing caches to fetch a fresh response if a cached response is older than 10 seconds, despite the 5-minute directive from the backend.
That is the appropriate situation to send Cache-Control header from your frontend: the backend isn't yours and its response freshness policy doesn't match your requirement.
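A sketch of the frontend side of this scenario, using urllib.request only to build (not send) the request; the URL is hypothetical. The helper spells out the resulting rule: a cache may reuse its copy only while it is fresh under the backend's 300 seconds and no older than the client's 10 seconds.

```python
import urllib.request


def build_request(url, max_staleness):
    """Build a GET request asking caches for a response at most max_staleness seconds old."""
    return urllib.request.Request(url, headers={"Cache-Control": f"max-age={max_staleness}"})


def cache_may_reuse(age, response_max_age, request_max_age):
    # Fresh under the backend's policy AND acceptable under the client's limit.
    return age < response_max_age and age <= request_max_age


req = build_request("https://backend.example.com/prices", 10)  # hypothetical URL
# urllib normalizes the name to "Cache-control"; header names are case-insensitive.
print(req.header_items())   # [('Cache-control', 'max-age=10')]
print(cache_may_reuse(age=60, response_max_age=300, request_max_age=10))  # False
print(cache_may_reuse(age=5, response_max_age=300, request_max_age=10))   # True
```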
Are both "Cache request directives" and "Cache response directives" needed?
Yes. Cache-Control in the request header and Cache-Control in the response header are both needed. Even if you already have Cache-Control in the request header, Cache-Control in the response is not redundant. They are two different things. According to RFC 7234:
cache directives are unidirectional in that the presence of a directive in a request does not imply that the same directive is to be given in the response.
Generally speaking, Cache-Control in the response header controls cache behaviour from the resource provider's point of view: should the resource be stored in a cache? How long is it valid? Does it need to be revalidated when requested? Since response headers can be set on every HTTP response, "Cache response directives" provide a way to define a cache policy for all resources.
Cache-Control in the request header, however, controls cache behaviour from the resource consumer's point of view. It is more like defining an exceptional case in which the cache policy of a specific resource should be adjusted. If you check RFC 7234, most of the "Request Cache-Control Directives" are described as indicating that the client is willing to... or that the client is unwilling to...
Also, since request headers can only be configured in some cases (e.g. Ajax), "Cache request directives" are absent from many HTTP requests. For example, after an HTML file is parsed, the browser creates many HTTP requests to fetch static resources (image files, CSS files, etc.), and there is no way to set a Cache-Control header on these requests programmatically.
If I already have "Cache request directives", what is the point of "Cache response directives"?
If you only have "Cache request directives" and never send a Cache-Control response header, some problems will arise:
Without a Cache-Control response header, the cache behaviour of all resources is decided by the browser (e.g. it calculates a freshness lifetime heuristically with the LM-factor algorithm). In the worst case, there would be no caching at all.
For static resources (e.g. image files, CSS files), since you can't set Cache-Control on the request, you lose the ability to control caching.
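When a response carries neither Cache-Control nor Expires, RFC 7234 section 4.2.2 lets caches use a heuristic and mentions 10% of the time since Last-Modified as a typical value (the LM-factor mentioned above). A sketch, assuming the standard HTTP date format; real browsers may apply their own fractions and caps, and the function name is hypothetical.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header, last_modified_header, fraction=0.1):
    """Estimated freshness lifetime, in seconds, from the Date and Last-Modified headers."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    # A fraction of how long the resource has gone unchanged; never negative.
    return max(0.0, (date - last_modified).total_seconds() * fraction)


# Modified 10 days before the response was generated: cached for about 1 day.
print(heuristic_freshness("Wed, 11 Jan 2023 00:00:00 GMT",
                          "Sun, 01 Jan 2023 00:00:00 GMT"))
```

This is why a static file served without any Cache-Control can still be reused for a while, but for a duration the site owner never chose.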

Is Cache-Control:must-revalidate obliging to validate all requests, or just the stale ones?

I'm confused by this header. I have read that Cache-Control: must-revalidate obliges caches to validate requests with the origin before serving a cached item, but does that apply only to stale items, or to all of them whether stale or fresh? I have read both in different places.
What is the difference from Cache-Control: no-cache? These headers look equivalent to me.
UPDATE 1: I have read this from a book:
The Cache-Control: must-revalidate response header tells the cache to bypass the freshness calculation mechanisms and revalidate on every access.
Peter O. has pointed out what the RFC says, so that old book is wrong.
UPDATE 2: In this tutorial : http://www.mnot.net/cache_docs/
no-cache: forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
must-revalidate: tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you’re telling the cache that you want it to strictly follow your rules.
Section 14.9.4 of HTTP/1.1:
When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server
Section 14.8 of HTTP/1.1:
If the response includes the "must-revalidate" cache-control directive, the cache MAY use that response in replying to a subsequent request. But if the response is stale, all caches MUST first revalidate it with the origin server...
So it appears that only stale responses must be revalidated if must-revalidate is received.
For no-cache, see section 14.9.1:
If the no-cache directive does not specify a field-name [which is the case here], then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server...
Thus, no-cache applies both to fresh and stale responses.
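The difference between the two directives, as described above, fits in one hypothetical predicate. allow_stale is an assumption standing in for situations in which HTTP lets a cache serve stale content (for example, when it is disconnected from the origin); directive parsing is simplified and ignores no-cache with field names.

```python
def may_serve_without_origin(response_cc, is_fresh, allow_stale=False):
    """Can a cache answer from its stored copy without contacting the origin?"""
    # Simplified parsing: keep only directive names, e.g. "max-age=60" -> "max-age".
    cc = {part.strip().split("=")[0].lower() for part in response_cc.split(",")}
    if "no-cache" in cc:
        return False         # revalidate every time, fresh or stale
    if is_fresh:
        return True          # must-revalidate places no restriction on fresh copies
    if "must-revalidate" in cc:
        return False         # stale: never served without revalidating
    return allow_stale       # stale, no must-revalidate: depends on cache policy


print(may_serve_without_origin("max-age=60, must-revalidate", is_fresh=True))    # True
print(may_serve_without_origin("max-age=60, must-revalidate", is_fresh=False,
                               allow_stale=True))                                # False
print(may_serve_without_origin("max-age=60", is_fresh=False, allow_stale=True))  # True
print(may_serve_without_origin("no-cache", is_fresh=True))                       # False
```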
EDIT:
This phrase may be relevant here (section 13.3):
When a cache has a stale entry that it would like to use as a response to a client's request, it first has to check with the origin server (or possibly an intermediate cache with a fresh response) to see if its cached entry is still usable.
So must-revalidate is probably relevant when there are intermediate caches, since otherwise a cache could check an intermediate cache for a fresh response rather than checking the origin server directly.

What is the difference between no-cache and no-store in Cache-control?

I don't get the practical difference between Cache-Control: no-store and Cache-Control: no-cache.
As far as I know, no-store means that no cache device is allowed to cache that response. On the other hand, no-cache means that no cache device is allowed to serve a cached response without first validating it with the origin. But what is that validation about? A conditional GET?
What if a response has no-cache, but it has no Last-Modified or ETag?
The flow chart in the following guide (not reproduced here) helps with understanding.
Ref: (https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en#cache-control)
But what is that check about?
Exactly: checking Last-Modified or ETag. Using those headers, the client asks the server whether it has a new version of the data, and if the answer is no, the client serves the cached data.
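On the server side, this check is a conditional GET. A simplified sketch, assuming the server knows its current ETag and Last-Modified (either may be absent) and that request headers arrive as a plain dict; it uses strong ETag comparison only and ignores the "*" form. It also covers the case raised in the question of a response with no Last-Modified or ETag: there is no validator to send, so every validation turns into a full 200 response.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag=None, last_modified=None):
    """Return 304 if the client's cached copy is still current, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if etag is not None and etag in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and last_modified is not None:
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200   # no usable validator: send the full response


print(conditional_status({"If-None-Match": '"v1"'}, etag='"v1"'))   # 304
print(conditional_status({"If-None-Match": '"v0"'}, etag='"v1"'))   # 200
print(conditional_status({}))                                       # 200
```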
Update
From RFC
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.
As you identified, no-cache doesn't mean there is never caching, but rather that the user agent has to always ask the server if it's OK to use what it cached. By contrast, no-store says to not even keep a copy, which means there's nothing to ask about. If you know the answer to "Can I reuse this?" is always no, you get a performance boost by skipping cache validation and saving room in the cache for other data.
Aside from performance, there is a behavior difference with browser history. HTTP 1.1 section 13.13 says that "expiration time does not apply to history mechanisms." The no-cache header describes expiration, and so doesn't apply to history mechanisms such as the back button. Thus, the user can navigate backward to a previous page with no-cache without the server being contacted.
The no-store header, on the other hand, prevents the data from being stored outside of a session, in which case it simply isn't available for a history mechanism to use. With no-store, if the user ends his session by navigating to another domain and then goes back, the only way for the browser to know what to display is to get the initial page again from the server.
Here's how a Chromium issue on this topic makes the distinction:
no-cache doesn't mean "don't cache this" (that would be no-store). no-cache means don't use this for normal loads unless the resource is revalidated for freshness. History navigations are not normal loads.
no-store: the client will not cache the response at all.
no-cache: the client will cache the response, but it will check with the server before using the cached data ("has the data changed on the server or not?") with the help of the If-Modified-Since or If-None-Match header.
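The two behaviours can be sketched with a toy in-memory cache. The class is hypothetical, and freshness checks are deliberately omitted so that only the effect of the two directives is visible.

```python
class ToyCache:
    """Stores responses per URL, honoring only no-store and no-cache."""

    def __init__(self):
        self.entries = {}   # url -> (body, needs_validation)

    def store(self, url, body, cache_control):
        directives = {d.strip().lower() for d in cache_control.split(",")}
        if "no-store" in directives:
            self.entries.pop(url, None)      # keep no copy at all
            return
        self.entries[url] = (body, "no-cache" in directives)

    def lookup(self, url):
        """Return ("miss", None), ("validate", body) or ("use", body)."""
        if url not in self.entries:
            return ("miss", None)            # nothing to ask the server about
        body, needs_validation = self.entries[url]
        # no-cache: the copy exists, but the server must confirm it first.
        return ("validate", body) if needs_validation else ("use", body)


cache = ToyCache()
cache.store("/a", b"secret", "no-store")
cache.store("/b", b"profile", "no-cache")
print(cache.lookup("/a"))   # ('miss', None)
print(cache.lookup("/b"))   # ('validate', b'profile')
```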

HTTP Cache Control max-age, must-revalidate

I have a couple of queries related to Cache-Control.
If I specify Cache-Control: max-age=3600, must-revalidate for a static HTML/JS/image/CSS file, with a Last-Modified header in the HTTP response:
Does a browser or proxy cache (like Squid/Akamai) go all the way to the origin server to validate before max-age expires? Or will it serve content from the cache until max-age expires?
After max-age expires (that is, once the cached copy becomes stale), is there an If-Modified-Since check, or is the content re-downloaded from the origin server without an If-Modified-Since check?
a) If the server includes this header:
Cache-Control "max-age=3600, must-revalidate"
it is telling both client caches and proxy caches that once the content is stale (older than 3600 seconds) they must revalidate at the origin server before they can serve the content. This should be the default behavior of caching systems, but the must-revalidate directive makes this requirement unambiguous.
b) The client should revalidate. It might revalidate using the If-None-Match header with an ETag, or the If-Modified-Since header with a date.
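A sketch of the client side for the max-age=3600, must-revalidate case, assuming stored response headers are kept in a dict with the capitalization shown. It models the specification's rules, not the observed behaviour of any particular browser, and the function names are hypothetical.

```python
def revalidation_headers(stored):
    """Conditional-request headers built from a stored response's validators."""
    headers = {}
    if "ETag" in stored:
        headers["If-None-Match"] = stored["ETag"]
    if "Last-Modified" in stored:
        headers["If-Modified-Since"] = stored["Last-Modified"]
    return headers


def next_step(stored, age):
    """Serve from cache while fresh; once stale, revalidate (required by must-revalidate)."""
    directives = {}
    for part in stored.get("Cache-Control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg
    max_age = int(directives.get("max-age") or 0)
    if age < max_age:
        return "serve-from-cache", {}
    return "revalidate", revalidation_headers(stored)


stored = {
    "Cache-Control": "max-age=3600, must-revalidate",
    "Last-Modified": "Sun, 01 Jan 2023 00:00:00 GMT",
}
print(next_step(stored, age=100))    # ('serve-from-cache', {})
print(next_step(stored, age=4000))   # ('revalidate', {'If-Modified-Since': 'Sun, 01 Jan 2023 00:00:00 GMT'})
```

If the origin answers the conditional request with 304 Not Modified, the stored body is reused and nothing is re-downloaded.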
a. Look at the ‘Stats’ tab on this page and see what happens.
b. After expiration the browser will check at the server if the file is updated. If not, the server will respond with a 304 Not Modified header and nothing is downloaded.
You can check this behaviour yourself by looking at the ‘Net’ panel in Firebug or similar tools. Just re-enter the URL in the address bar and compare the number of HTTP requests with the number of requests when your cache is empty.
The given answers are incorrect, at least for web browsers in 2019.
"After expiration the browser will check at the server if the file is updated" <- not true
I have a static file served with "Cache-Control: public,must-revalidate,max-age=864000" and both Chrome and Firefox do a request every time (and get a 304 Not Modified back every time).
