How can a user set no-cache on browser requests?

I understand, to some degree, the HTTP(S) response Cache-Control: headers and the associated caching controls, but what about request Cache-Control: headers? How does a user control his own request headers? Users of a normal browser have no way to manually tweak any request parameters beyond those that the URL itself indirectly generates.
How is request Cache-Control even a thing? Is it only intended for programmatically generated HTTP(S) requests (curl, wget, JavaScript)? Or for interaction between caches and origins?

Most browsers don't give users much fine-grained cache control. They'll let you clear the local cache, which is purely a local operation. Many will also let you request a page with caching disabled; see "Force browser to refresh css, javascript, etc." for details.
To give a specific example, in Firefox, requesting a page sends these headers:
GET /... HTTP/1.1
...
However, if I use 'Reload current page', the request will include cache-control headers to request uncached data from upstream:
GET /... HTTP/1.1
...
Cache-Control: max-age=0
...
Similarly for a resource on that page referenced through <img src...>.
GET /... HTTP/1.1
...
Accept: image/webp,*/*
...
Cache-Control: max-age=0
As you suggest, this isn't fine-grained control; I'm not aware of any browsers that allow anything as complex as choosing the max-age for regular browsing.
However, it is a good example of the general cache-control header interacting with the browser's user-facing functionality.
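For comparison, a programmatic client can set request directives itself. A minimal sketch using Python's standard urllib.request; the URL is a placeholder, and the request is only constructed here, not sent:

```python
import urllib.request

# Build a request that asks caches along the way to revalidate (max-age=0),
# mirroring the header Firefox adds on "Reload current page".
req = urllib.request.Request(
    "https://example.com/",  # placeholder URL
    headers={"Cache-Control": "max-age=0"},
)

# urllib stores header names via str.capitalize(), hence "Cache-control".
print(req.get_header("Cache-control"))

# Sending it would be: urllib.request.urlopen(req)
```

curl and wget offer the same thing through their own header options; the point is that only non-browser clients (or JavaScript fetch calls) get to choose arbitrary values.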

Related

`cache-control: max-age=0` in http request

I have one question: suppose every HTTP request carries a Cache-Control: max-age=0 header, so each request goes all the way to the origin web server.
Does that mean a CDN is no longer useful if all requests look like this?
From another post:
When sent by the user agent
I believe shahkalpesh's answer applies to the user agent side. You can also look at 13.2.6 Disambiguating Multiple Responses.
If a user agent sends a request with Cache-Control: max-age=0 (a.k.a. "end-to-end revalidation"), then each cache along the way will revalidate its cache entry (e.g. with the If-Modified-Since header) all the way to the origin server. If the reply is then 304 (Not Modified), the cached entity can be used.
On the other hand, sending a request with Cache-Control: no-cache (a.k.a. "end-to-end reload") doesn't revalidate, and the server MUST NOT use a cached copy when responding.
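The two request directives can be summarized as a decision a single cache makes for an entry it already holds. This is a simplified sketch following the RFC 2616 semantics quoted above (RFC 7234 later redefined a request no-cache as "revalidate before use"); the function name and return labels are hypothetical:

```python
def cache_action(request_cache_control: str, age: int, freshness_lifetime: int) -> str:
    """Return "serve" (use cached copy), "revalidate" (conditional request
    upstream, e.g. If-Modified-Since) or "reload" (unconditional fetch)."""
    directives = {}
    for part in request_cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip()

    if "no-cache" in directives:
        # End-to-end reload: the cached copy must not be used at all.
        return "reload"
    if "max-age" in directives and age > int(directives["max-age"]):
        # The client refuses entries older than max-age seconds;
        # max-age=0 rejects any entry with a nonzero age.
        return "revalidate"
    if age < freshness_lifetime:
        return "serve"
    return "revalidate"


print(cache_action("max-age=0", age=30, freshness_lifetime=3600))  # revalidate
print(cache_action("no-cache", age=30, freshness_lifetime=3600))   # reload
print(cache_action("", age=30, freshness_lifetime=3600))           # serve
```

With max-age=0 on every request, a CDN edge always lands in the "revalidate" branch, which is why it keeps contacting the origin; a 304 still saves the body transfer, though.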
That makes sense and matches my results.
When the cache has not expired in Chrome, it sends the request to the CDN; the CDN queries the origin with If-Modified-Since and then serves the end user.
By setting max-age to 0, you effectively expire your page in your CDN edge cache immediately. Therefore, your CDN always hits your origin, which renders the CDN useless, as you suggested.
I noticed from your other question that you are using Akamai. If so, you can use the Edge-Control header to override your Cache-Control if you don't have direct control over that value but still want to leverage CDN functionality.

What does cache validation (no-cache for cache-control header) do in http protocol?

I'm trying to understand how the cache-control http header works.
The Cache-Control header can have the no-cache value. I checked the definition on the W3C site, and it says:
If the no-cache directive does not specify a field-name, then a cache
MUST NOT use the response to satisfy a subsequent request without
successful revalidation with the origin server.
So the no-cache value triggers validation for every request.
What I want to know is: what is cache validation, and what does it do in the HTTP protocol?
Thanks for your help, guys. Now I understand that validation means checking whether the cache contains the latest content from the server.
My further question is what issues no-cache fixes. Please provide some scenarios; for example, after applying no-cache in the HTTP header, which security issue is fixed?
The no-cache directive is not intended for a security purpose. Security is covered by rules that define which data/resources a CDN/proxy server is not permitted to cache. So, if security is required, the client/server should use the no-store directive. Look under:
paragraph 2 under section 13.4 on https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
https://www.rfc-editor.org/rfc/rfc7234#section-3
The no-cache directive is used by the client when it is ready to accept a resource from a cache, provided the server confirms that the cached resource is up to date (fresh). The proxy/CDN can use two methods to revalidate the resource's freshness:
If an ETag value is available (the client sent one, or the cached response carried one), the proxy/CDN can forward it to the server in an If-None-Match header. If the server responds with '304 Not Modified', the cached resource is fresh to serve.
It can send an If-Modified-Since header containing the date received the last time the resource was downloaded from the server (found in the Last-Modified header of the server's last response).
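Both validators can be sketched as the check the origin server performs on a conditional GET. A minimal Python sketch, assuming the server knows the resource's current ETag and a timezone-aware Last-Modified time; following RFC 7232, If-None-Match takes precedence when both headers are present. The function name is hypothetical, and only exact (strong) ETag matching is done, which is a simplification:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get_status(current_etag: str, last_modified: datetime,
                           if_none_match: Optional[str] = None,
                           if_modified_since: Optional[str] = None) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if if_none_match is not None:
        # If-None-Match wins over If-Modified-Since when both are sent.
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or current_etag in tags else 200
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


lm = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status('"abc123"', lm, if_none_match='"abc123"'))  # 304
print(conditional_get_status('"abc123"', lm,
                             if_modified_since=format_datetime(lm, usegmt=True)))  # 304
print(conditional_get_status('"def456"', lm, if_none_match='"abc123"'))  # 200
```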

Cache-Control: 'private' makes 'no-cache="set-cookie"' unnecessary?

My reading of the definition of the 'private' directive for the Cache-Control header is that it will prevent any part of the response from being cached by intermediate proxies. So based on that, it sounds like if I'm using the 'private' directive then there's no need to also use a 'no-cache="set-cookie"' directive to tell intermediate proxies to suppress caching of the Set-Cookie header.
However, in section 4.2.3 in this document, it says:
The origin server should send the following additional HTTP/1.1
response headers, depending on circumstances:
To suppress caching of the Set-Cookie header: Cache-control: no-cache="set-cookie".
and one of the following:
To suppress caching of a private document in shared caches: Cache-control: private.
[...]
and I see a ton of examples online that have both directives.
So do I really need both of those to prevent intermediate proxies from caching a Set-Cookie header? I've been doing some testing, and it seems like Internet Explorer is responding to the 'no-cache="set-cookie"' directive by issuing a full request every subsequent time, so I'd rather not include it if it's not necessary.
Cache-Control: private stops intermediary (shared) caches from storing the response, so the no-cache="set-cookie" directive isn't needed in this case.
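The reasoning can be sketched as the check a shared (intermediary) cache makes before storing a response. This is a deliberately simplified Python sketch covering only the directives discussed here; the function name is hypothetical, and the naive comma split does not handle quoted field lists such as no-cache="set-cookie, set-cookie2" or field-qualified private="...":

```python
def shared_cache_may_store(cache_control: str) -> bool:
    """Return True if a shared cache may store a response with this header."""
    names = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    # "private" and "no-store" both keep the whole response (Set-Cookie
    # included) out of shared caches, so no-cache="set-cookie" adds nothing
    # for them once private is present.
    return not ({"private", "no-store"} & names)


print(shared_cache_may_store("private"))                        # False
print(shared_cache_may_store('private, no-cache="set-cookie"'))  # False
print(shared_cache_may_store("public, max-age=3600"))           # True
```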

What is the difference between no-cache and no-store in Cache-control?

I don't get the practical difference between Cache-Control: no-store and Cache-Control: no-cache.
As far as I know, no-store means that no cache device is allowed to cache that response. On the other hand, no-cache means that no cache device is allowed to serve a cached response without validating it first with the source. But what is that validation about? A conditional GET?
What if a response has no-cache but has no Last-Modified or ETag?
See the Cache-Control flow chart at the link below for a better understanding.
Ref: (https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en#cache-control)
But what is that check about?
Exactly: it checks Last-Modified or ETag. Using those headers, the client asks the server whether it has a newer version of the data; if the answer is no, the cached data is served.
Update
From the RFC:
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use
the response to satisfy a subsequent request without successful revalidation with the
origin server. This allows an origin server to prevent caching even by caches that
have been configured to return stale responses to client requests.
As you identified, no-cache doesn't mean there is never caching, but rather that the user agent has to always ask the server if it's OK to use what it cached. By contrast, no-store says to not even keep a copy, which means there's nothing to ask about. If you know the answer to "Can I reuse this?" is always no, you get a performance boost by skipping cache validation and saving room in the cache for other data.
Aside from performance, there is a behavior difference with browser history. HTTP/1.1 section 13.13 says that "expiration time does not apply to history mechanisms." The no-cache directive describes expiration, and so doesn't apply to history mechanisms such as the back button. Thus, the user can navigate back to a previous page served with no-cache without the server being contacted.
The no-store directive, on the other hand, prevents the data from being stored outside of a session, in which case it simply isn't available for a history mechanism to use. With no-store, if the user ends his session by navigating to another domain and then goes back, the only way for the browser to know what to display is to fetch the initial page from the server again.
Here's how a Chromium issue on this topic makes the distinction:
no-cache doesn't mean "don't cache this" (that would be no-store). no-cache means don't use this for normal loads unless the resource is revalidated for freshness. History navigations are not normal loads.
no-store: the client does not cache the response at all.
no-cache: the client caches the response, but checks with the server before using the cached data ("has the data changed on the server or not?") with the help of the If-Modified-Since or If-None-Match header.
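The contrast can be sketched as a toy in-memory private cache. This is a hypothetical illustration, not how any browser actually implements its cache: no-store responses are never kept, while no-cache responses are kept but flagged so they are revalidated (e.g. with their stored ETag in If-None-Match) before reuse on normal loads:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    body: bytes
    etag: Optional[str]
    must_revalidate: bool  # True for responses marked no-cache


class ToyPrivateCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def store(self, url: str, cache_control: str, body: bytes,
              etag: Optional[str] = None) -> None:
        directives = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
        if "no-store" in directives:
            self._entries.pop(url, None)  # keep no copy at all
            return
        self._entries[url] = Entry(body, etag, must_revalidate="no-cache" in directives)

    def lookup(self, url: str) -> Optional[Entry]:
        return self._entries.get(url)


cache = ToyPrivateCache()
cache.store("/a", "no-store", b"secret")
cache.store("/b", "no-cache", b"page", etag='"v1"')
print(cache.lookup("/a"))  # None
entry = cache.lookup("/b")
print(entry.must_revalidate, entry.etag)  # True "v1"
```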

HTTP Headers: Controlling Cache and History Mechanism

I'm trying to figure out the best HTTP headers to send for four use cases. I'm hoping to come up with headers that do not depend on user agent / protocol version sniffing, but I'll accept that if nothing else fits. All URLs are fetched through a fully custom handler, so I can select all headers as I like; this is all about intermediate proxies and user agents. If possible, this should be compatible with both HTTP/1.0 and HTTP/1.1 clients. If multiple solutions exist, the best one will be the shortest one when sent over the wire.
Static public content
All "Static public content" is stuff that HTTP is really all about: if the URL is the same, the content is the same. I can do this easily: for example, I put the user profile icon at http://domain.com/profiles/xyz/icon/1234abcd where "1234abcd" is the SHA-1 of the file contents of the icon. If I change the icon in the future, I'll create a new URL and modify all existing referrers that should use the new icon. What are the best headers to declare that this may be cached forever and may be shared? I'm currently thinking something along the lines of:
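The content-addressed URL scheme described in the question can be sketched in a few lines of Python. The helper name is hypothetical, and while the question's example shows a shortened hash (1234abcd), this sketch keeps the full 40-character SHA-1 hex digest:

```python
import hashlib


def icon_url(user: str, icon_bytes: bytes) -> str:
    """Build an immutable URL: changed contents always produce a new URL."""
    digest = hashlib.sha1(icon_bytes).hexdigest()
    return f"http://domain.com/profiles/{user}/icon/{digest}"


print(icon_url("xyz", b"\x89PNG example bytes"))
```

Because the URL changes whenever the bytes change, the response at any given URL never changes, which is what makes a far-future Expires safe.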
Date: <current time>
Expires: <current time + one year>
Is this enough to allow caching by user agents and proxies? Do I need Last-Modified or Pragma?
Static non-public content
All "Static non-public content" is stuff that is static but may not be available to everybody. In fact, this content will be available only to selected logged-in users (the session is kept with a session cookie holding the session UUID). If the URL is the same, the content is the same. However, the response is not public. A use case could be an image shared with selected friends in a social network service. I'm currently thinking something along the lines of:
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=<huge number>, s-maxage=0
Is this enough to allow caching by user agents and disable proxies? Do I need Pragma?
Volatile public content
All "Volatile public content" is stuff that is volatile and available to everybody. Something like the front page of http://slashdot.org/ when not logged in. The intent is to allow rapidly updating content at a non-changing URL. Note that I do NOT want to break the user agent history mechanism (that is, clicking something from a volatile page and then hitting the back button should not result in fetching the volatile page from the server; however, clicking a link that goes to the front page should fetch the resource from the server). I'm currently thinking something along the lines of:
Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Is this enough to prevent caching but to allow history mechanism (back button)? I know that if I send Cache-Control: no-store, must-revalidate I can force reloading but this is not what I want because that will break the back button, too. Do I need Last-Modified or Pragma?
Even though this is public, it probably does not make sense to allow intermediate proxies to cache this because it's volatile.
Volatile non-public content
All "Volatile non-public content" is stuff that is volatile and not available to everybody (private). Something like the front page of http://slashdot.org/ when you are logged in. The intent is to allow rapidly updating content at a non-changing URL. Note that I do NOT want to break the user agent history mechanism (that is, clicking something from a volatile page and then hitting the back button should not result in fetching the volatile page from the server; however, clicking a link that goes to the front page should fetch the resource from the server). I'm currently thinking something along the lines of:
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Is this enough to prevent caching but to allow history mechanism (back button)? Do I need Pragma?
Things that still need testing with my suggested headers:
Verify that private content will not be leaked through HTTP/1.0 proxies.
Verify that caching works correctly in proxies.
Verify that caching works correctly in user agents.
Verify that user agent history mechanism works in user agents (all cases).
Verify that following a link to a volatile page fetches fresh content from the server.
Verify all the results when using HTTPS instead of HTTP.
I'll answer my own question:
Static public content
Date: <current time>
Expires: <current time + one year>
Rationale: This is compatible with the HTTP/1.0 proxies and RFC 2616 Section 14: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
The Last-Modified header is not needed for correct caching (because conforming user agents follow the Expires header) but may be included for end-user consumption. Including the Last-Modified header may also decrease server data transfer in case the user hits the Reload/Refresh button. If a Last-Modified header is added, it should reflect real data instead of something made up. If you want to decrease server data transfer (in case the user hits the Reload/Refresh button) and cannot include a real Last-Modified header, you may add an ETag header to allow conditional GET (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26). If you already include Last-Modified, also adding ETag is just a waste. Note that Last-Modified is clearly superior because it's supported by HTTP/1.0 clients and proxies, too. A suitable value for ETag in the case of dynamic pages is the SHA-1 of the contents of the page/resource. Note that using Last-Modified or ETag will not help with the server load, only with the server's outgoing internet pipe / data transfer rate.
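The "Static public content" headers can be generated as below. A sketch assuming the handler has the response body in memory; the function name is hypothetical, and the ETag (SHA-1 of the body) stands in for a real Last-Modified, as the answer suggests when none is available:

```python
import hashlib
import time
from email.utils import formatdate
from typing import Dict, Optional

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; Expires SHOULD NOT exceed one year


def static_public_headers(body: bytes, now: Optional[float] = None) -> Dict[str, str]:
    now = time.time() if now is None else now
    return {
        "Date": formatdate(now, usegmt=True),
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
        # Validator for Reload/Refresh when no real Last-Modified exists.
        "ETag": '"' + hashlib.sha1(body).hexdigest() + '"',
    }


print(static_public_headers(b"", now=0))
```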
Static non-public content
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=31536000, s-maxage=0
Vary: Cookie
Rationale: The Date and Expires headers are for HTTP/1.0 compatibility: because there's no sensible way to specify that the response is private, these headers communicate that the response may not be cached. The Cache-Control header tells that this response may be cached by a private cache but a shared cache may not cache the response. The s-maxage=0 is added because private may not be supported by all proxies that support Cache-Control (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3 - I have no idea which proxies are broken). The max-age is set to the value of 60*60*24*365 (1 year) because the HTTP/1.1 specification does not define any upper limit for this parameter; I guess that this is implementation dependent. The Expires header SHOULD be limited to one year in the future, so using the same logic here should be okay. The Vary: Cookie header is required because the session that is used to check whether the visitor is allowed to see the content is transferred in a cookie; because the returned response depends on the cookie value, the cache may not use a cached response if the Cookie header changes.
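The "Static non-public content" headers, including the one-year max-age arithmetic, can be sketched the same way. The function name is hypothetical; the header values are exactly those proposed in the answer:

```python
import time
from email.utils import formatdate
from typing import Dict, Optional

MAX_AGE = 60 * 60 * 24 * 365  # one year = 31536000 seconds


def static_private_headers(now: Optional[float] = None) -> Dict[str, str]:
    now = time.time() if now is None else now
    stamp = formatdate(now, usegmt=True)
    return {
        "Date": stamp,
        "Expires": stamp,  # HTTP/1.0 caches: stale immediately, do not reuse
        "Cache-Control": f"private, max-age={MAX_AGE}, s-maxage=0",
        "Vary": "Cookie",  # the response depends on the session cookie
    }


print(static_private_headers(now=0)["Cache-Control"])
```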
I might personally break the last part. By not including the Vary: Cookie header I can improve caching a lot. For example: I have a profile image at http://example.com/icon/12 which is returned only to selected authenticated users. I have a visitor X with session id 5f2, and I allow the image for that user. Visitor X logs out and later logs in again. Now X has session id 2e8 stored in his session cookie. If I have Vary: Cookie, the user agent of X cannot use the cached image and is forced to reload it into its cache. Because the content varies by Cookie, a conditional GET with the last modification time cannot be used. I haven't tested whether using ETag could help in this case; it might, because the server response would be the same (matching the SHA-1 ETag computed from the contents of the response). Be warned that Internet Explorer (at least up to version 9) always forces a conditional GET for resources that include Vary: Cookie even if a suitable response is already in the cache (source: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx). This is because the internal cache implementation of MSIE does not remember which Cookie it sent the first time, so it cannot know whether the current Cookie is the same one.
However, here's an example of a problem that is caused by dropping the Vary: Cookie header, to show why it is indeed required for technically correct behavior: see the example above and imagine that after X has logged out, visitor Y logs in with the same user agent (the user agent may have been restarted between X and Y; it does not matter). If Y views a page that includes a link to http://example.com/icon/12, then Y will see the icon embedded inside the page even though Y wouldn't be able to see the icon if X had not been using the same user agent previously. In my case I don't consider this a big enough problem, because Y would be able to access the icon manually by inspecting the user agent cache regardless of any added Vary: Cookie. However, this issue may prevent Y from noticing that he wouldn't technically have access to this content (this may be important e.g. if Y is co-authoring the content). If the content is considered sensitive, the server must send no-store regardless of the problems caused by this Cache-Control directive.
Here too, adding Last-Modified header will help with users hitting Reload/Refresh button (see discussion above).
Volatile public content
Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and client supports conditional GET. If the Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It's critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients.
If it is acceptable for updates to reach users even slightly delayed, then Expires, max-age and s-maxage [sic] should be adjusted suitably. For example, adding 5 seconds to those might help a lot for a highly popular site, as suggested by symcbean's answer. Note that unlike conditional GET, increasing the expiry time will decrease server load instead of just decreasing server outgoing data traffic (because the server will see fewer requests in total).
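The "Volatile public content" headers with an adjustable expiry can be sketched as follows. The function name and the ttl parameter are hypothetical; ttl=0 gives the headers listed above, and ttl=5 gives the 5-second variant suggested by symcbean:

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def volatile_public_headers(last_modified: float, ttl: int = 0,
                            now: Optional[float] = None) -> Dict[str, str]:
    """Headers for volatile public content; ttl > 0 allows brief reuse."""
    now = time.time() if now is None else now
    return {
        "Date": formatdate(now, usegmt=True),
        "Expires": formatdate(now + ttl, usegmt=True),
        "Cache-Control": f"public, max-age={ttl}, s-maxage={ttl}",
        # Real modification time, so clients can send a conditional GET.
        "Last-Modified": formatdate(last_modified, usegmt=True),
    }


print(volatile_public_headers(last_modified=0, ttl=5, now=0))
```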
Volatile non-public content
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Vary: Cookie
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and client supports conditional GET. If the Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It's critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients. Also note that Cache-Control must not include no-cache, must-revalidate or no-store because using any of these directives will break the back button in at least one user agent. However, if the content the server is transferring contains sensitive material that should not be stored in permanent storage, the no-store flag MUST be used regardless of breaking the back button. Warning: note that the use of no-store cannot prevent sensitive material ending up on the hard disk without encryption if the operating system has swapping enabled and the swap is not encrypted! Also note that using no-store makes very little sense unless the connection is encrypted (HTTPS/SSL).
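The "Volatile non-public content" rule (no no-cache or must-revalidate, but no-store when the material is sensitive) can be sketched as below. The function name and the sensitive flag are hypothetical; appending no-store to the otherwise unchanged header is one reading of "the no-store flag MUST be used":

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def volatile_private_headers(last_modified: float, sensitive: bool = False,
                             now: Optional[float] = None) -> Dict[str, str]:
    now = time.time() if now is None else now
    # Deliberately no no-cache / must-revalidate: they break the back button.
    cache_control = "private, max-age=0, s-maxage=0"
    if sensitive:
        # Sensitive data must not be stored, even at the cost of history.
        cache_control += ", no-store"
    return {
        "Date": formatdate(now, usegmt=True),
        "Expires": formatdate(now, usegmt=True),
        "Cache-Control": cache_control,
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Vary": "Cookie",
    }


print(volatile_private_headers(last_modified=0, sensitive=True, now=0)["Cache-Control"])
```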
Mostly OK, however you do need to bear in mind that HTTP/1.0 proxies may cache content served up as
Cache-Control: private
So you should set an explicit Date-modified header as well as the expires header.
For your 'Static non-public content' you should add a 'Vary: Cookie' header.
For your 'Volatile public content': How fast is it changing? Setting a TTL of +5 seconds may offload a lot of effort from your servers.
For 'Volatile non-public content' you should probably add no-cache,must-revalidate to the Cache-Control header.
Pragma headers issued from the server should have no effect on either clients or proxies.
Do test what happens when your cache expires (in my experience, you can end up with a system even slower than one accessed with no populated cache, due to all the conditional requests / 304 responses).

Resources