how to properly use cache-control header in an HTTP request

I have a website that caches data; it uses a content delivery network (Akamai), and this is the response header: 'cache-control': 'must-revalidate, max-age=600'. This means: revalidate after 600 seconds (once stale). If I want the CDN to query the origin server on each request, I can send cache-control: no-cache. When I send this request, I get the same response header back, which seems to indicate that it isn't being revalidated. Is it actually not being revalidated, or is it? Since the website is well-known, it is safe to say that it responds to headers correctly.

What you've observed is correct behavior.
Your Cache-Control request header applies to this request, while the Cache-Control response header applies to future requests. Whether or not your client wants a fresh response to this request will not, and should not, change the server's general directions as to how its resources can be cached.
As long as you use no-cache in your requests you should not get a cached response.
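To see this in practice, here is a minimal sketch using Python's requests library (the URL is a placeholder, and diagnostic headers such as Age and X-Cache depend on your CDN's configuration). The response's Cache-Control stays the same either way; what changes is whether the answer came straight from a cache or was revalidated against the origin.

```python
# A minimal sketch using the requests library. The URL is a placeholder, and
# diagnostic headers such as Age and X-Cache depend on your CDN configuration.
import requests

url = "https://www.example.com/data.json"  # hypothetical cached resource

# Plain request: an intermediate cache may answer from its store.
plain = requests.get(url)

# Ask every cache along the way to revalidate with the origin before answering.
forced = requests.get(url, headers={"Cache-Control": "no-cache"})

for label, resp in (("plain", plain), ("no-cache", forced)):
    print(label,
          resp.status_code,
          resp.headers.get("Cache-Control"),  # same policy either way
          resp.headers.get("Age"),            # typically 0/absent after revalidation
          resp.headers.get("X-Cache"))        # e.g. HIT vs MISS, CDN-specific
```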

Related

what happens if the HTTP request cache control header is different from the response cache control header

In the request and response I captured (screenshot not reproduced here), the response header has Cache-Control set to max-age=86400, which is the maximum amount of time the resource is considered fresh. I believe that if we make a request within that time frame, the browser will serve the local copy without asking the server. The request header, however, has Cache-Control set to no-cache, which, according to MDN, means:
response may be stored by any cache, even if the request is normally non-cacheable. However, the stored response MUST always go through validation with the origin server first before using it,
So here we have a contradiction:
the cache-control directive in the request is no-cache, so the user agent has to consult the server first before using the cache to fulfill the request.
The Cache-Control in the response has a max-age of 86400, suggesting that within that time frame user agents can just use the cache to fulfill the request.
If the time specified in response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of its no-cache or not?
If the time specified in response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of its no-cache or not?
Yes, a request will be sent to the origin server. From the specification:
The no-cache request directive indicates that a cache MUST NOT use a stored response to satisfy the request without successful validation on the origin server.
There's no contradiction. The max-age in the response indicates how long it can be considered to be fresh. It doesn't obligate anyone to use it. Indeed, caching is an entirely optional part of HTTP, so sending a full request to the origin every time would also be fully compliant with the specification.
Now imagine that the response uses no-cache and the request uses max-age=86400. Again, a request would be sent to the origin server, because "the no-cache response directive indicates that the response MUST NOT be used to satisfy a subsequent request without successful validation on the origin server."
So the real asymmetry here is not between requests and responses, but between caching (optional) and not caching (obligatory when specified).
If the time specified in response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of its no-cache or not?
Yes, the cache will be bypassed and a request will be sent to the server.
If the client sets max-age and there is no max-stale present, no request is sent until the max-age expires. On the other hand, if the client sets no-cache, a request is always sent, regardless of freshness.
In conclusion, the max-age value in the current request is compared against the age of the stored response; if there is no stored response, or the request says no-cache, a request must always be sent, because the client is not supposed to reuse a stored response for that resource without revalidating it first.
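To make the asymmetry concrete, here is a toy sketch of the decision a cache makes (illustrative only, not any real cache implementation): a stored response may be reused only while it is fresh, and only if the request does not carry no-cache.

```python
# Illustrative only: a toy freshness check, not any real cache implementation.
def can_reuse_without_revalidation(response_max_age: int,
                                   response_age: int,
                                   request_cache_control: str = "") -> bool:
    """True if a cache may answer from its store without contacting the origin."""
    directives = {d.strip() for d in request_cache_control.lower().split(",")}
    if "no-cache" in directives:
        return False                         # request forces revalidation, even if fresh
    return response_age < response_max_age   # otherwise freshness decides

print(can_reuse_without_revalidation(86400, 120))              # True: still fresh
print(can_reuse_without_revalidation(86400, 120, "no-cache"))  # False: must revalidate
```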

Are both "Cache request directives" and "Cache response directives" needed?

If I already have "Cache request directives", what is the point of "Cache response directives"? Do they add anything? Will my application run the same without them?
I'm looking for proof of whether "Cache response directives" are redundant. If they are, I will not bother with them.
I assume you are asking as an application developer and if so, you should not bother with any Cache-Control header your application receives in a request.
Why?
Because that Cache-Control header is intended for caches before the request reaches your application.
It is not for your application.
This is explained in RFC7234 Section 5.2 (emphasis mine):
The "Cache-Control" header field is used to specify directives for caches along the request/response chain.
The purpose of the header is to tell caches what to do with the request.
Your application receives the header because it is attached to a request.
But just because you receive it, it doesn't mean it is for you.
Bottom line: ignore any Cache-Control header in a request.
Cache-Control in a response comes from your application and it is also intended for caches.
You use it to tell caches what to do with the response.
Basically, you use the header to specify whether the response is cacheable and if it is, for how long.
It is not merely a copy of the Cache-Control header received in a request.
Do they add anything?
Yes, they do.
Cache-Control in a response tells caches whether the response is cacheable and, if it is, allows caches to serve an equivalent request immediately with a cached response.
This reduces your application's load and improves response times from a client's point of view.
RFC7234 Section 4.2 states:
When a response is "fresh" in the cache, it can be used to satisfy subsequent requests without contacting the origin server, thereby improving efficiency.
Your next question:
Will my application run the same without them?
It depends.
If your application doesn't add an appropriate Cache-Control header to responses that must not be cached, future requests may receive stale responses.
So, I recommend that at the very least, add Cache-Control: no-cache to responses that must not be cached.
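As a concrete illustration, here is a minimal sketch assuming a Flask backend (the endpoint is hypothetical):

```python
# A minimal sketch assuming a Flask backend; the endpoint is hypothetical.
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/account/balance")
def balance():
    resp = jsonify({"balance": 42})
    # This response must not be served stale: caches have to revalidate before reuse.
    resp.headers["Cache-Control"] = "no-cache"
    return resp
```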
Additional explanation for your question in the comment section
The header should generally come from your backend, not your frontend.
This allows caches to accurately accelerate requests to your backend and keeps your frontend request code simple.
There is one exception: if the backend isn't yours and its response freshness policy doesn't match your requirement.
An example scenario may be in order:
Let's say, that in addition to sending requests to your own backend, your frontend also sends requests to someone else's backend.
This particular backend specifies that its responses are cacheable for at most 5 minutes, by sending either Cache-Control: max-age=300 or an appropriate Expires header.
Let's also say that you want the responses to be no more than 10 seconds old, because 5 minutes is too stale for you.
Since the backend isn't yours, you can't change the 5-minute directive, but you can send your requests with Cache-Control: max-age=10, thereby forcing the caches to fetch a fresh response if a cached response is older than 10 seconds, despite the 5-minute directive from the backend.
That is the appropriate situation to send Cache-Control header from your frontend: the backend isn't yours and its response freshness policy doesn't match your requirement.
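Here is a sketch of that scenario with Python's requests library (the third-party URL is hypothetical):

```python
# Sketch of the scenario above with the requests library; the URL is hypothetical.
import requests

resp = requests.get(
    "https://api.third-party.example/quotes",
    # Accept a cached copy only if it is at most 10 seconds old, even though the
    # backend's own policy (max-age=300) would allow caches to reuse it for 5 minutes.
    headers={"Cache-Control": "max-age=10"},
)
print(resp.status_code, resp.headers.get("Age"))
```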
Are both "Cache request directives" and "Cache response directives" needed?
Yes. Cache-Control in the request header and Cache-Control in the response header are both needed. Even if you already have Cache-Control in the request header, Cache-Control in the response is not redundant. They are two different things. According to RFC7234:
cache directives are unidirectional in that the presence of a directive in a request does not imply that the same directive is to be given in the response.
Generally speaking, Cache-Control in the response header controls the cache behaviour from the resource provider's point of view: should the resource be stored in cache? How long is it valid? Does it need to be revalidated when requested? etc. As response headers can be configured for every HTTP response, "Cache response directives" provide a way to define a cache policy for all resources.
Cache-Control in the request header, however, controls the cache behaviour from the resource consumer's point of view. It's more like defining exceptional cases where the cache policy for a specific resource should be adjusted. If you check RFC7234, most of the "Request Cache-Control Directives" indicate that "the client is willing to..." or "the client is unwilling to...".
Also, as request headers can only be configured in some cases (e.g. Ajax), "Cache request directives" simply don't exist for many HTTP requests. For example, after an HTML file is parsed, many HTTP requests are created to fetch static resources (image files, CSS files, etc.), and there is no way to configure the Cache-Control header for these requests programmatically.
If I already have "Cache request directives", what is the point of "Cache response directives"?
If you only have "Cache request directives" and never get Cache-Control response header, some problems will happen:
Without a Cache-Control response header, the cache behaviour of all resources is decided by the browser (e.g. a freshness lifetime calculated heuristically from Last-Modified, the LM-factor algorithm). In the worst case, there would be no caching at all.
For static resources (e.g. image files, CSS files), as you can't configure Cache-Control in the request, you lose the ability to control caching.
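As an illustration of that second point, here is a hedged sketch assuming Flask serves the static files itself: since you cannot attach request directives to the browser's requests for /static/*, the response header is the only control you have.

```python
# A hedged sketch assuming Flask serves the static files itself.
from flask import Flask, request

app = Flask(__name__)

@app.after_request
def add_cache_headers(response):
    # The browser's requests for /static/* carry no custom request directives,
    # so this response header is the only caching control available.
    if request.path.startswith("/static/"):
        response.headers.setdefault("Cache-Control", "public, max-age=86400")
    return response
```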

`cache-control: max-age=0` in http request

I have one question: suppose every HTTP request carries a cache-control: max-age=0 header, so each request will go all the way to the origin web server.
Does it mean CDN is not useful anymore if all requests are like this?
From another post:
When sent by the user agent
I believe shahkalpesh's answer applies to the user agent side. You can also look at 13.2.6 Disambiguating Multiple Responses.
If a user agent sends a request with Cache-Control: max-age=0 (aka "end-to-end revalidation"), then each cache along the way will revalidate its cache entry (e.g. with an If-None-Match or If-Modified-Since header) all the way to the origin server. If the reply is then 304 (Not Modified), the cached entity can be used.
On the other hand, sending a request with Cache-Control: no-cache (aka. "end-to-end reload") doesn't revalidate and the server MUST NOT use a cached copy when responding.
That makes sense and matches my results: when the cache has not expired in Chrome, the browser still sends the request to the CDN; the CDN revalidates with the origin using If-Modified-Since and then serves the end user.
By setting the max-age to 0, you effectively expire your page in your CDN edge cache immediately. Therefore, your CDN always hits your origin, rendering the CDN useless, as you suggested.
I noticed from your other question that you are using Akamai. If so, you can use the Edge-Control header to override your Cache-Control if you don't have direct control over that value but still want to be able to leverage CDN functionality.
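For illustration, here is a hedged sketch of an origin response that sets both headers. The Edge-Control directive name (cache-maxage) is Akamai-specific; confirm it against your Akamai property configuration before relying on it.

```python
# Hedged sketch of an origin response setting both headers with Flask. The
# Edge-Control directive name (cache-maxage) is Akamai-specific; verify it
# against your Akamai property configuration before relying on it.
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/news/latest")  # hypothetical endpoint
def latest_news():
    resp = jsonify({"items": []})
    # Browsers and other downstream caches: always revalidate.
    resp.headers["Cache-Control"] = "max-age=0, must-revalidate"
    # Akamai edge servers: keep serving the cached copy for 10 minutes.
    resp.headers["Edge-Control"] = "cache-maxage=10m"
    return resp
```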

What does cache validation (no-cache for cache-control header) do in http protocol?

I'm trying to understand how the cache-control http header works.
The Cache-Control header can have the no-cache value. I checked the definition in the W3C specification, which says:
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server.
So the no-cache value triggers validation for every request.
What I want to know is, what is cache validation and what it does in the http protocol?
Thanks for your help, everyone. I now understand that validation means checking whether the cache contains the latest content from the server.
My further question is: what issues does no-cache fix? Please provide a scenario, e.g. after applying no-cache in the HTTP header, what security issue is fixed?
The no-cache directive is not intended for a security purpose. Security is covered by rules that define which data/resources a CDN/proxy server is not permitted to cache. So, if security is required, the no-store directive should be used by the client/server. See:
paragraph 2 under section 13.4 on https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
https://www.rfc-editor.org/rfc/rfc7234#section-3
The no-cache directive is used by the client when it is ready to accept a resource from a cache, provided there is confirmation from the server that the cached resource is up to date (fresh). The proxy/CDN can use two methods to revalidate the resource's freshness:
If the client sent an ETag value, the proxy/CDN can forward it to the server in an If-None-Match header. If the server responds with 304 Not Modified, the cached resource is fresh enough to serve.
Alternatively, it can send an If-Modified-Since header with the date received the last time the resource was downloaded from the server (found in the Last-Modified header of the server's last response).
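Here is a minimal sketch of that validation step using Python's requests library (the URL is a placeholder):

```python
# A minimal sketch of the validation step with the requests library; the URL is a placeholder.
import requests

url = "https://www.example.com/article.html"

first = requests.get(url)
etag = first.headers.get("ETag")
stored_body = first.content  # the "stored response"

# Later, revalidate instead of re-downloading the whole resource:
second = requests.get(url, headers={"If-None-Match": etag} if etag else {})
if second.status_code == 304:
    # Not Modified: the stored copy is still valid, so reuse it.
    body = stored_body
else:
    # The resource changed (or no validator was available): use the new body.
    body = second.content
```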

Why is Cache-Control attribute sent in request header (client to server)?

After reading about the Cache-Control field of the HTTP header,
I understand that the Cache-Control field in the HTTP response header (server to client) specifies the directives for the intermediate proxy servers/client browser on how to handle the response, by sending different values for the Cache-Control field: private, public, no-cache, or no-store in the response header.
But I don't get why we need to send Cache-Control as a request header (client to server)?
Cache-Control: no-cache is generally used in a request header (sent from the web browser to the server) to force validation of the resource in the intermediate proxies.
If the client doesn't send this header, intermediate proxies will return a copy of the content if it is fresh (has not expired according to the Expires or max-age fields). Cache-Control: no-cache directs these proxies to revalidate the copy even if it is fresh.
A client can send a Cache-Control header in a request in order to request specific caching behavior, such as revalidation, from the origin server and any intermediate proxy servers along the request path.
In addition to the above answer:
There might be a setup where cache chaining is implemented. In that case, if the request reaches the first cache and is not satisfied there, it may go on to a further chained cache.
Thus, in order to always get the response from the server, we include Cache-Control in the request headers. This ensures that the response always comes from (or is at least validated by) the server.
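For completeness, here is a sketch of such a request with Python's requests library (the URL is hypothetical); Pragma: no-cache is the legacy HTTP/1.0 equivalent that some older intermediaries still understand.

```python
# Sketch with the requests library; the URL is hypothetical. Pragma: no-cache is
# the HTTP/1.0 equivalent that some older intermediaries still understand.
import requests

resp = requests.get(
    "https://www.example.com/dashboard",
    headers={
        "Cache-Control": "no-cache",  # every cache in the chain must revalidate
        "Pragma": "no-cache",         # legacy HTTP/1.0 caches
    },
)
print(resp.status_code, resp.headers.get("Age"))
```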
