After reading about the Cache-Control field of the HTTP header,
I understand that the Cache-Control field in the HTTP response header (server to client) specifies the directives for the intermediate proxy servers/client browser on how to handle the response, by sending different values for the Cache-Control field: private, public, no-cache, or no-store in the response header.
But I don't get why we need to send Cache-Control as a request header (client to server)?
Cache-Control: no-cache is generally used in a request header (sent from web browser to server) to force validation of the resource in the intermediate proxies.
If the client doesn't send this request to the server, intermediate proxies will return a copy of the content if it is fresh (has not expired according to Expire or max-age fields). Cache-Control directs these proxies to revalidate the copy even if it is fresh.
A client can send a Cache-Control header in a request in order to request specific caching behavior, such as revalidation, from the origin server and any intermediate proxy servers along the request path.
In addition to the above answer,
There might be a setup where cache chaining is implemented. In that case if the request comes to first cache where it is not satisfied, it might go to further chained cache.
Thus in order to get the response always from the server we include cache-control in request headers. This will insure that response is always from the server.
Related
I have a website that caches data, it uses a content-delivery-network called akamai, and this is the response header. 'cache-control': 'must-revalidate, max-age=600'. This means, re-validate after 600 seconds (stale). If i want the cdn to query the origin server each request, i can do this... cache-control: no-cache. When i send this request, i get the same response header... indicating that it isn't being re-validated? Is it actually not being re-validated, or is it being re-validated? Since the website is well-known, it is safe to say that the website is correctly responding to headers.
What you've observed is correct behavior.
Your Cache-Control request header applies to this request, while the Cache-Control response applies to future requests. Whether or not your client wants a fresh response to this request will not and should not change the server's general directions as to how its resources can be cached.
As long as you use no-cache in your requests you should not get a cached response.
See the screenshot above. The response header has a cache-control set to max-age, which means the maximum amount of time a resource is considered fresh. I believe if we make a request within the time frame, the browser will serve the local copies without bothering asking the server. and the request header has a cache-control set to no-cache, that means, according to MDN,
response may be stored by any cache, even if the request is normally
non-cacheable. However, the stored response MUST always go through
validation with the origin server first before using it,
So here we have a contradiction:
the cache-control directive in the request is no-cache, so the user agent has to consult the server first before using the cache to fulfill the request.
The cache-control in response has a max-age being 86400, suggesting that within that time frame user agents can just use the cache to fulfill the request.
If the time specified in response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of its no-cache or not?
If the time specified in response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of its no-cache or not?
Yes, a request will be sent to the origin server. From the specification:
The no-cache request directive indicates that a cache MUST NOT use
a stored response to satisfy the request without successful
validation on the origin server.
There's no contradiction. The max-age in the response indicates how long it can be considered to be fresh. It doesn't obligate anyone to use it. Indeed, caching is an entirely optional part of HTTP, so sending a full request to the origin every time would also be fully compliant with the specification.
Now imagine that the response uses no-cache and the request uses max-age=86400. Again, a request would be sent to the origin server, because "the no-cache response directive indicates that the response MUST NOT be used to satisfy a subsequent request without successful validation on the origin server."
So the real asymmetry here is not between requests and responses, but between caching (optional) and not caching (obligatory when specified).
If the time specified in response's max-age hasn't expired, does the browser bypass the cache and send a request to the server because of
its no-cache or not?
Yes, it will be bypassed and sent a request to the server.
If the client sets max-age and there is no max-stale present, there is no request until the max-age expires. On the other hand, If the client sets no-cache, it always means a request sent without any conditions.
In conclusion, the max-age value of the current request compare to the last value of the response, and if there is no value or equal to no-cache that means always must send a request because the client not spouse to cache anything about that resource
I have one question: suppose in each http request there is a cache-control: max-age=0 header, so each request will go all the way to the origin web server.
Does it mean CDN is not useful anymore if all requests are like this?
from other post:
When sent by the user agent
I believe shahkalpesh's answer applies to the user agent side. You can also look at 13.2.6 Disambiguating Multiple Responses.
If a user agent sends a request with Cache-Control: max-age=0 (aka. "end-to-end revalidation"), then each cache along the way will revalidate its cache entry (eg. with the If-Not-Modified header) all the way to the origin server. If the reply is then 304 (Not Modified), the cached entity can be used.
On the other hand, sending a request with Cache-Control: no-cache (aka. "end-to-end reload") doesn't revalidate and the server MUST NOT use a cached copy when responding.
It makes sense and match my result.
when cache is not expired in chrome,it will send request to CDN,CDN will query this with if-modified-since with origin ,then serve the end user.
By setting the max-age to 0, you effectively expire your page in your CDN edge cache immediately. Therefore, your CDN always hit your origin and render the CDN useless as you suggested.
Noticed from your other question that you are using Akamai. If so, then you can use the Edge-Control header to override your cache-control if you don't have direct control over that value, but still want to be able to leverage CDN functionality.
What value should have cache-control header to enable ETag\Last-Modified? I want my resources files to be cached but never used without validation from server, i.e. browser should send If-none-match or If-modified-since header and receive 304 HTTP status code to use file from cache.
The short answer is Cache-control: no-cache. Browser/caching proxy will have to always validate data before serving. For success validation ETag and Last-Modified headers must be present. Otherwise resource will be downloaded always fully from the server.
I'm trying to understand how the cache-control http header works.
The cache-control header can have the no-cache value. I have checked the definition in w3c and it said:
If the no-cache directive does not specify a field-name, then a cache
MUST NOT use the response to satisfy a subsequent request without
successful revalidation with the origin server.
It tells no-cache value will trigger validation for every request.
What I want to know is, what is cache validation and what it does in the http protocol?
thanks for your help guys. now i understand validation means check if cache contain latest content from server.
my further question would be what issues no-cache will fix. please provide some scenario, like after applied no-cache in http header, what security issue will be fixed.
thanks guys
The no-cache directive is not intended for a security purpose. Security gets covered in rules that define which data/resources a cdn/proxy server is not permitted to cache. So, if security is required, the no-store directive should be used by the client/server. Look under :
paragraph 2 under section 13.4 on https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
https://www.rfc-editor.org/rfc/rfc7234#section-3
The no-cache directive is used by the client when it is ready to accept a resource from a cache, provided there is a confirmation from the server that the cached resource is up to date (fresh). The proxy/cdn can use two methods to re-validate the resource's freshness :
If client sent an ETAG value, proxy/cdn can forward it to the server under an If-None-Match header. If server responds with '304 Not Modified', then the cached resource is fresh to serve.
Using an If-Modified-Since header with a date value that was received the last time the resource was downloaded from the server (was to be found under the Last-Modified header in server's last response).