What is Cache-Control: private?

When I visit chesseng.herokuapp.com I get response headers that look like
Cache-Control: private
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/css
Date: Tue, 16 Oct 2012 06:37:53 GMT
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Status: 200 OK
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Rack-Cache: miss
and then I refresh the page and get
Cache-Control: private
Connection: keep-alive
Date: Tue, 16 Oct 2012 06:20:49 GMT
Status: 304 Not Modified
X-Rack-Cache: miss
so it seems like caching is working. If that works for caching, then what is the point of Expires and Cache-Control: max-age? To add to the confusion, when I test the page at https://developers.google.com/speed/pagespeed/insights/ it tells me to "Leverage browser caching".

Cache-Control: private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache, such as a proxy server.
From RFC 2616, Section 14.9.1

To answer your question about why caching is working, even though the web-server didn't include the headers:
Expires: [a date]
Cache-Control: max-age=[seconds]
The server kindly asked any intermediate proxies not to cache the contents (i.e. the item should only be cached in a private cache, on your own local machine):
Cache-Control: private
But the server forgot to include any sort of caching hints:
they forgot to include Expires (so the browser knows to use the cached copy until that date)
they forgot to include Max-Age (so the browser knows how long the cached item is good for)
they forgot to include ETag (so the browser can do a conditional request)
But they did include a Last-Modified date in the response:
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Because the browser knows the date the file was modified, it can perform a conditional request. It will ask the server for the file, but instruct the server to only send the file if it has been modified since 2012/10/16 3:13:38:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
The server receives the request, realizes that the client has the most recent version already. Rather than sending the client 200 OK, followed by the contents of the page, it instead tells you that your cached version is good:
304 Not Modified
Your browser did have to suffer the round-trip delay of sending a request to the server, and waiting for the response, but it did save having to re-download the static content.
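For the curious, here is a rough Python sketch of that same conditional request, using only the standard library (the URL and date are just the ones from the question; swap in your own):
import urllib.error
import urllib.request

# Ask for the page, but only if it changed after the Last-Modified
# date we saved from the earlier 200 OK response.
req = urllib.request.Request(
    "http://chesseng.herokuapp.com/",  # placeholder URL from the question
    headers={"If-Modified-Since": "Tue, 16 Oct 2012 03:13:38 GMT"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, "- server sent a fresh copy")
except urllib.error.HTTPError as err:
    if err.code == 304:
        print("304 Not Modified - safe to reuse the cached copy")
    else:
        raise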
Why Max-Age? Why Expires?
Because Last-Modified sucks.
Not everything on the server has a date associated with it. If I'm building a page on the fly, there is no date associated with it - it's now. But I'm perfectly willing to let the user cache the homepage for 15 seconds:
200 OK
Cache-Control: max-age=15
If the user hammers F5, they'll keep getting the cached version for 15 seconds. If it's a corporate proxy, then all 67,198 users hitting the same page in the same 15-second window will all get the same contents - all served from the nearby shared cache. Performance win for everyone.
The virtue of adding Cache-Control: max-age is that the browser doesn't even have to perform a "conditional" request.
if you specified only Last-Modified, the browser has to perform an If-Modified-Since request and watch for a 304 Not Modified response
if you specified max-age, the browser won't even have to suffer the network round-trip; the content will come right out of the cache (a sketch of this freshness check follows)
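A minimal sketch of that freshness check in Python; the function is illustrative, not any library's API:
import time

def is_fresh(stored_at: float, max_age: int) -> bool:
    # Fresh while the copy's age (seconds since it was stored) is below max-age.
    return (time.time() - stored_at) < max_age

stored_at = time.time() - 10         # pretend the copy entered the cache 10 s ago
print(is_fresh(stored_at, 15))       # True: serve from cache, no request at all
print(is_fresh(stored_at - 10, 15))  # False: 20 s old, time to revalidate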
The difference between "Cache-Control: max-age" and "Expires"
Expires is a legacy (c. 1998) equivalent of the modern Cache-Control: max-age header:
Expires: you specify a date (yuck)
max-age: you specify seconds (goodness)
And if both are specified, then the browser uses max-age:
200 OK
Cache-Control: max-age=60
Expires: Tue, 03 Apr 2018 19:28:37 GMT
Any web-site written after 1998 should not use Expires anymore, and instead use max-age.
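If you want to see the precedence rule in code, here is a hedged Python sketch (the header values are made up for illustration):
from email.utils import parsedate_to_datetime

headers = {
    "Date": "Tue, 03 Apr 2018 19:27:37 GMT",
    "Expires": "Tue, 03 Apr 2018 19:32:37 GMT",  # 300 s from Date, via Expires
    "Cache-Control": "max-age=60",               # 60 s, via max-age
}

def freshness_lifetime(h: dict) -> float:
    # max-age wins whenever both it and Expires are present.
    for directive in h.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name == "max-age":
            return float(value)
    if "Expires" in h and "Date" in h:
        # Fall back to the legacy header: lifetime = Expires - Date.
        return (parsedate_to_datetime(h["Expires"])
                - parsedate_to_datetime(h["Date"])).total_seconds()
    return 0.0

print(freshness_lifetime(headers))  # 60.0, not the 300 s Expires would give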
What is ETag?
ETag is similar to Last-Modified, except that it doesn't have to be a date - it just has to be something.
If I'm pulling a list of products out of a database, the server can send the last rowversion as an ETag, rather than a date:
200 OK
ETag: "247986"
My ETag can be the SHA1 hash of a static resource (e.g. image, js, css, font), or of the cached rendered page (i.e. this is what the Mozilla MDN wiki does; they hash the final markup):
200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
And exactly like in the case of a conditional request based on Last-Modified:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
304 Not Modified
I can perform a conditional request based on the ETag:
GET / HTTP/1.1
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
304 Not Modified
An ETag is superior to Last-Modified because it works for things besides files, or things that have no meaningful notion of a date. It just is.
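A rough server-side sketch of the ETag flow in Python, hashing the body with SHA-1 as in the MDN example (the helper names are mine, not a standard API):
import hashlib

def make_etag(body: bytes) -> str:
    # A strong validator: any change to the body changes the hash.
    return '"%s"' % hashlib.sha1(body).hexdigest()

def respond(body: bytes, if_none_match: str | None):
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""   # client's copy is still current
    return 200, {"ETag": etag}, body      # send the body plus its validator

body = b"<html>...rendered page...</html>"
status, headers, _ = respond(body, None)           # first visit
status_again, *_ = respond(body, headers["ETag"])  # conditional revisit
print(status, status_again)                        # 200 304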

RFC 2616, section 14.9.1:
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache... A private (non-shared) cache MAY cache the response.
Browsers could use this information. Of course, the current "user" may mean many things: OS user, a browser user (e.g. Chrome's profiles), etc. It's not specified.
For me, a more concrete example of Cache-Control: private is that proxy servers (which typically have many users) won't cache it. It is meant for the end user, and no one else.
FYI, the RFC makes clear that this does not provide security. It is about showing the correct content, not securing content.
This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.

The Expires entity-header field gives the date/time after which the response is considered stale. The Cache-Control: max-age field gives the age value (in seconds) beyond which the response is considered stale.
The header fields above give the client a mechanism to decide whether to send a request to the server. But in some conditions the client sends a request to the server even though the age of the cached response exceeds the max-age value; does that mean the server needs to send the resource again? Maybe the resource never changed.
To resolve this problem, HTTP/1.1 provides the Last-Modified header. The server gives the client the last-modified date of the response. When the client needs the resource again, it sends an If-Modified-Since header field to the server. If this date is before the modified date of the resource, the server sends the resource to the client with a 200 code. Otherwise, it returns a 304 code, which means the client can use the resource it cached.
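A small Python sketch of that If-Modified-Since comparison (the function and dates are illustrative only):
from email.utils import parsedate_to_datetime

def status_for(resource_mtime: str, if_modified_since: str | None) -> int:
    if if_modified_since is not None:
        # Unchanged since the client's date: a 304 with no body is enough.
        if parsedate_to_datetime(resource_mtime) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200  # modified (or no validator sent): send the full resource

mtime = "Tue, 16 Oct 2012 03:13:38 GMT"
print(status_for(mtime, None))                             # 200
print(status_for(mtime, mtime))                            # 304
print(status_for(mtime, "Mon, 15 Oct 2012 00:00:00 GMT"))  # 200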

Related

Why does my CDN cache a certain document, but browsers do not?

I have a url with a test PDF on it, this is my origin:
https://powered-by.qbank.se/miso/MISO_Testing_Document279626.pdf
I have that origin set up in an Azure CDN using the Microsoft provider. Its URL is:
https://misocdn-fail.azureedge.net/MISO_Testing_Document279626.pdf
When I update the PDF on the origin site, all the browsers that I have tested will bring back the NEW document with just an F5 refresh, not even a Ctrl-F5. But the CDN continues to cache the PDF basically indefinitely (2 days according to the docs, or until I purge it).
My question is: why isn't my CDN able to detect the change at the origin when the browser is?
I understand that the CDN caches, but I don't understand what it is that a browser is doing to figure out this content is new?
To better understand the phenomenon, it is a good start to observe the response headers received from the direct-access URL.
One way to do that is to use curl -I <YOUR_URL> in your terminal.
You will see something like:
HTTP/1.1 200 OK
Date: Mon, 01 Oct 2018 09:03:57 GMT
Server: Apache
Last-Modified: Fri, 28 Sep 2018 19:11:57 GMT
ETag: "11ff1-576f33ab4c2a0"
Accept-Ranges: bytes
Content-Length: 73713
Cache-Control: max-age=86400
Expires: Tue, 02 Oct 2018 09:03:57 GMT
Content-Type: application/pdf
Out of these headers the browser uses the Cache-Control, ETag and Last-Modified to determine the freshness of the requested content.
Cache-Control: max-age=<seconds> is the maximum amount of time (relative to the time of the request) a resource will be considered fresh.
Now, according to Mozilla Developer Network –MDN– Freshness is described as below:
Once a resource is stored in a cache, it could theoretically be served by the cache forever. Caches have finite storage so items are periodically removed from storage. This process is called cache eviction. On the other side, some resources may change on the server so the cache should be updated. As HTTP is a client-server protocol, servers can't contact caches and clients when a resource changes; they have to communicate an expiration time for the resource. Before this expiration time, the resource is fresh; after the expiration time, the resource is stale. Eviction algorithms often privilege fresh resources over stale resources. Note that a stale resource is not evicted or ignored; when the cache receives a request for a stale resource, it forwards this request with a If-None-Match to check if it is in fact still fresh. If so, the server returns a 304 (Not Modified) header without sending the body of the requested resource, saving some bandwidth.
So to validate a cached resource, an If-None-Match header will be issued by the browser if the ETag header was part of the response for the resource.
This is the mechanism that makes your browser download the new version of your pdf when accessed directly. Please also note that these headers are present in the response from the CDN URL as well, but the CDN edge servers are still storing your old file.
When it comes to the CDN cache, the ETag and Last-Modified headers are not respected. It is only the Cache-Control header in the HTTP response from the origin server that defines the time-to-live (TTL) period of a resource. In your case, it is 86400 seconds. So theoretically, the new version of your pdf will be served after 1 day from the first request through the CDN link.
Up until that moment the old pdf will be hosted by the CDN edge servers. You can read more about the Azure CDN expiration management in the Azure CDN documentation.
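If you prefer Python to curl -I, here is a quick sketch comparing the caching headers of the two URLs from the question (they may no longer resolve, so treat it as a template):
import urllib.request

URLS = [
    "https://powered-by.qbank.se/miso/MISO_Testing_Document279626.pdf",
    "https://misocdn-fail.azureedge.net/MISO_Testing_Document279626.pdf",
]

for url in URLS:
    # HEAD fetches only the headers, like curl -I.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        print(url)
        for name in ("Cache-Control", "ETag", "Last-Modified", "Expires", "Age"):
            print(" ", name + ":", resp.headers.get(name))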

difference between request header cache policy and response header

I have some js hosted on AWS. I want to cache it so I don't pay extra for 304 GET requests, but I'm puzzled why the two headers are different.
Request Method: GET
Status Code: 304 Not Modified
Request headers of helper.js:
Accept: */*
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cache-Control: max-age=0
Connection: keep-alive
If-Modified-Since: Tue, 20 Aug 2013 13:08:13 GMT
and response headers:
Age: 4348
Cache-Control: max-age=604800
Connection: keep-alive
Why are they different? Does it mean that Cache-Control is wrong? I used the Chrome console to get the headers.
I don't think that Cache-Control is wrong, and it seems that your content is already cached. From the request headers, I understand that the copy the browser holds was last modified at Tue, 20 Aug 2013 13:08:13 GMT; the browser is asking the server, "hey, has the content changed since that time?". In return, the server responds with a 304 Not Modified status, indicating that the content has not changed and can stay cached, fresh for max-age=604800 seconds (minus the 4348 seconds of Age already accumulated). Remember that the cache policy is defined on the server side, so you may want to look at your server definitions for js files. Usually, in the deployment environment, I instruct my web server to send cache headers for *.js, *.png, etc. After configuring the web server to send cache headers, it is the browser's work to take care of the rest. In that case, your browser works as expected.
You can look at RFC 2616 for the 304 response. You may also want to look at this decent caching tutorial. It should clear up some ideas.
The problem is with Chrome. If you press the Refresh button, it revalidates the cached resources, but if you press Enter in the address bar, it serves them from the cache.
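For completeness, the arithmetic behind those two headers as a tiny Python sketch (values taken from the response above):
age = 4348          # Age: seconds the copy has already sat in a cache
max_age = 604800    # Cache-Control: max-age=604800, i.e. one week

remaining = max_age - age
print(f"fresh for another {remaining} s (about {remaining / 86400:.1f} days)")
# fresh for another 600452 s (about 6.9 days)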

HTTP Headers: Controlling Cache and History Mechanism

I'm trying to figure out the best HTTP headers to send for four use cases. I'm hoping to come up with headers that do not depend on user agent / protocol version sniffing but I'll accept that if nothing else fits. All URLs are fetched through fully custom handler so I can select all headers as I like, this is all about intermediate proxies and user agents. If possible, this should be compatible with both HTTP/1.0 and HTTP/1.1 clients. If multiple solutions exists, the best one will be the shortest one when sent over the wire.
Static public content
All "Static public content" is stuff that HTTP is really all about: if the URL is the same, the content is the same. I can do this easily: for example, I put user profile icon into http://domain.com/profiles/xyz/icon/1234abcd where "1234abcd" is the SHA-1 of the file contents of the icon. If I change to icon in the future, I'll create a new URL and and modify all existing referrers that should use the new icon. What are the best headers to declare that this may be cached forever and may be shared? I'm currently thinking something along the lines:
Date: <current time>
Expires: <current time + one year>
Is this enough to allow caching by user agents and proxies? Do I need Last-Modified or Pragma?
Static non-public content
All "Static non-public content" is stuff that is static but may not be available to everybody. In fact, this content will be available only to selected logged in users (session is kept with session cookie holding session UUID). If the URL is the same, the content is the same. However, the response is not public. An use case could be an image shared to selected friends in a social network service. I'm currently thinking something along the lines:
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=<huge number>, s-maxage=0
Is this enough to allow caching by user agents and to disable proxy caching? Do I need Pragma?
Volatile public content
All "Volatile public content" is stuff that is volatile and available to everybody. Something like frontpage of http://slashdot.org/ when not logged in. The intent is to allow rapidly updating content in a non-changing URL. Note that I do NOT want to break the user agent history mechanism (that is, clicking something from a volatile page and then hitting the back button should not result in fetching the volatile page from the server -- however, clicking a link that goes to front page should fetch the resource from the server). I'm currently thinking something along the lines:
Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Is this enough to prevent caching but to allow history mechanism (back button)? I know that if I send Cache-Control: no-store, must-revalidate I can force reloading but this is not what I want because that will break the back button, too. Do I need Last-Modified or Pragma?
Even though this is public, it probably does not make sense to allow intermediate proxies to cache this because it's volatile.
Volatile non-public content
All "Volatile non-public content" is stuff that is volatile and not available to everybody (private). Something like frontpage of http://slashdot.org/ when you are logged in. The intent is to allow rapidly updating content in a non-changing URL. Note that I do NOT want to break the user agent history mechanism (that is, clicking something from a volatile page and then hitting the back button should not result in fetching the volatile page from the server -- however, clicking a link that goes to front page should fetch the resource from the server). I'm currently thinking something along the lines:
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Is this enough to prevent caching but to allow history mechanism (back button)? Do I need Pragma?
Things that still need testing with my suggested headers:
Verify that private content will not be leaked through HTTP/1.0 proxies.
Verify that caching works correctly in proxies.
Verify that caching works correctly in user agents.
Verify that user agent history mechanism works in user agents (all cases).
Verify that following a link to a volatile page fetches fresh content from the server.
Verify all the results when using HTTPS instead of HTTP.
I'll answer my own question:
Static public content
Date: <current time>
Expires: <current time + one year>
Rationale: This is compatible with the HTTP/1.0 proxies and RFC 2616 Section 14: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
The Last-Modified header is not needed for correct caching (because conforming user agents follow the Expires header) but may be included for end-user consumption. Including the Last-Modified header may also decrease the server data transfer in case the user hits the Reload/Refresh button. If a Last-Modified header is added, it should reflect real data instead of something made up. If you want to decrease server data transfer (in case the user hits the Reload/Refresh button) and cannot include a real Last-Modified header, you may add an ETag header to allow a conditional GET (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26). If you already include Last-Modified, also adding ETag is just waste. Note that Last-Modified is clearly superior because it's supported by HTTP/1.0 clients and proxies, too. A suitable value for ETag in the case of dynamic pages is the SHA-1 of the contents of the page/resource. Note that using Last-Modified or ETag will not help with server load, only with the server's outgoing internet pipe / data transfer rate.
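A hedged Python sketch of generating those two headers; format_datetime with usegmt=True emits a valid HTTP-date:
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

now = datetime.now(timezone.utc)
headers = {
    "Date": format_datetime(now, usegmt=True),
    "Expires": format_datetime(now + timedelta(days=365), usegmt=True),
}
print(headers)  # e.g. {'Date': 'Mon, 01 Oct 2018 09:03:57 GMT', 'Expires': ...}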
Static non-public content
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=31536000, s-maxage=0
Vary: Cookie
Rationale: The Date and Expires headers are for HTTP/1.0 compatibility, and because there's no sensible way there to specify that the response is private, these headers communicate that the response may not be cached. The Cache-Control header tells that this response may be cached by a private cache but a shared cache may not cache it. The s-maxage=0 is added because private may not be supported by all proxies that support Cache-Control (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3 - I have no idea which proxies are broken). The max-age is set to 60*60*24*365 (1 year) because the HTTP/1.1 specification does not define any upper limit for this parameter; I guess this is implementation-dependent. The Expires header SHOULD be limited to one year in the future, so using the same logic here should be okay. The Vary: Cookie header is required because the session that is used to check whether the visitor is allowed to see the content is transferred in a cookie; because the returned response depends on the cookie value, the cache may not use a cached response if the Cookie header changes.
I might personally break the last part. By not including the Vary: Cookie header I can improve caching a lot. For example: I have a profile image at http://example.com/icon/12 which is returned only for selected authenticated users. I have a visitor X with session id 5f2 and I allow the image to that user. Visitor X logs out and then later logs in again. Now X has session id 2e8 stored in his session cookie. If I have Vary: Cookie, the user agent of X cannot use the cached image and is forced to reload it into its cache. Because the content varies by Cookie, a conditional GET with the last modification time cannot be used. I haven't tested whether using ETag could help in this case, because there the server response would be the same (matching the SHA-1 ETag computed from the contents of the response). Be warned that Internet Explorer (at least up to version 9) always forces a conditional GET for resources that include Vary: Cookie, even if a suitable response is already in the cache (source: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx). This is because the internal cache implementation of MSIE does not remember which Cookie it sent the first time, so it cannot know whether the current Cookie is the same one.
However, here's an example of a problem that is caused by dropping the Vary: Cookie header to show why this is indeed required for technically correct behavior: see the example above and imagine that after X has logged out, visitor Y logs in with the same user agent (the user agent may have been restarted between X and Y, it does not matter). If Y views a page that includes a link to http://example.com/icon/12 then Y will see the icon embedded inside the page even though Y wouldn't be able to see the icon if X had not been using the same user agent previously. In my case I don't consider this a big enough problem because Y would be able to access the icon manually by inspecting the user agent cache regardless of possibly added Vary: Cookie. However, this issue may prevent Y from noticing that he wouldn't technically have access to this content (this may be important e.g. if Y is co-authoring the content). If the content is considered sensitive, the server must send no-store regardless of the problems caused by this Cache-Control directive.
Here too, adding Last-Modified header will help with users hitting Reload/Refresh button (see discussion above).
Volatile public content
Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and client supports conditional GET. If the Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It's critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients.
If the content may be delayed even slightly, then Expires, max-age and s-maxage [sic] should be adjusted suitably. For example, adding 5 seconds to those might help a lot for a highly popular site, as suggested by symcbean's answer. Note that unlike a conditional GET, increasing the expiry time will decrease server load instead of just decreasing server outgoing data traffic (because the server will see fewer requests in total).
Volatile non-public content
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Vary: Cookie
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and client supports conditional GET. If the Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It's critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients. Also note that Cache-Control must not include no-cache, must-revalidate or no-store because using any of these directives will break the back button in at least one user agent. However, if the content the server is transferring contains sensitive material that should not be stored in permanent storage, the no-store flag MUST be used regardless of breaking the back button. Warning: note that the use of no-store cannot prevent sensitive material ending up on the hard disk without encryption if the operating system has swapping enabled and the swap is not encrypted! Also note that using no-store makes very little sense unless the connection is encrypted (HTTPS/SSL).
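To pull my four recipes together, here is a rough Python sketch of a header builder (the flag names are my own invention; Last-Modified and the no-store caveats above are omitted for brevity):
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def cache_headers(static: bool, public: bool) -> dict:
    now = datetime.now(timezone.utc)
    h = {"Date": format_datetime(now, usegmt=True)}
    if static and public:
        # Cache forever, anywhere: Expires one year out is all that's needed.
        h["Expires"] = format_datetime(now + timedelta(days=365), usegmt=True)
        return h
    h["Expires"] = h["Date"]  # stale immediately for HTTP/1.0 caches
    scope = "public" if public else "private"
    max_age = 31536000 if static else 0
    h["Cache-Control"] = f"{scope}, max-age={max_age}, s-maxage=0"
    if not public:
        h["Vary"] = "Cookie"  # response depends on the session cookie
    return h

print(cache_headers(static=True, public=True))    # static public
print(cache_headers(static=False, public=False))  # volatile non-public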
Mostly OK; however, you do need to bear in mind that HTTP/1.0 proxies may cache content served up as
Cache-Control: private
So you should set an explicit Last-Modified header as well as the Expires header.
For your 'Static non-public content' you should add a 'Vary: Cookie' header.
For your 'Volatile public content': how fast is it changing? Setting a TTL of +5 seconds may offload a lot of effort from your servers.
For 'Volatile non-public content' you should probably add no-cache, must-revalidate to the Cache-Control header.
Pragma headers issued from the server should have no effect on clients or proxies.
Do test out what happens when your cache expires (IME you can end up with a system even slower than one accessed with no populated cache, due to all the conditional requests / 304 responses).

no 'Last-Modified' HTTP header -> cached anyway?

From a browser perspective,
what happens if a component (image, script, stylesheet...) is served without a Last-Modified HTTP header field?
Is it still cached by the browser, even though it won't be able to perform a validity check (If-Modified-Since) in the future, due to the lack of date/time information?
Eg:
GET /foo.png HTTP/1.1
Host: example.org
--
200 OK
Content-Type: image/png
...
Is foo.png cached anyway?
--
Would you know of any online service that can serve a raw HTTP response I write myself, in order to test what I'm asking?
Thank you.
Generally speaking, responses can be cached unless they explicitly say that they can't (e.g., with cache-control: no-store).
However, most caches will not store responses that don't have something that they can base freshness on, e.g., Cache-Control, Expires, or Last-Modified.
For the complete rules, see:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-13#section-2.1
See:
http://www.mnot.net/blog/2009/02/24/unintended_caching
for an example of how this can surprise some people.
Yes, the image may get cached even without a Last-Modified response header.
The browser will then cache the image until its TTL expires. You can set the image's Time To Live using appropriate response headers, e.g. this would set the TTL to one hour:
Cache-Control: max-age=3600
Date: Tue, 29 Mar 2011 20:18:17 GMT
Expires: Tue, 29 Mar 2011 21:18:17 GMT
Even without any Last-Modified in the response, the browser may still use the Date header for subsequent If-Modified-Since requests.
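In lieu of an online service, a minimal local Python server along these lines can serve a response with a TTL but no Last-Modified, so you can watch what your browser does (the PNG body is a fake placeholder):
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoValidatorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"\x89PNG placeholder, not a real image"
        now = datetime.now(timezone.utc)
        self.send_response(200)  # BaseHTTPRequestHandler adds Date itself
        self.send_header("Content-Type", "image/png")
        self.send_header("Cache-Control", "max-age=3600")
        self.send_header("Expires",
                         format_datetime(now + timedelta(hours=1), usegmt=True))
        # Deliberately no Last-Modified and no ETag.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8000), NoValidatorHandler).serve_forever()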
I disabled the Last-Modified header on a large site and FF 13 doesn't take the contents from cache, although a max-age is given etc. Contents without a Last-Modified header ALWAYS get a 200 OK status when requested, not a 304, so the browser never answers them from its cache.

How to cache an HTTP POST response?

I would like to create a cacheable HTTP response for a POST request.
My actual implementation responds the following for the POST request:
HTTP/1.1 201 Created
Expires: Sat, 03 Oct 2020 15:33:00 GMT
Cache-Control: private,max-age=315360000,no-transform
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 9
ETag: 2120507660800737950
Last-Modified: Wed, 06 Oct 2010 15:33:00 GMT
.........
But it looks like the browsers (Safari and Firefox tested) are not caching the response.
In the HTTP RFC the corresponding part says:
Responses to this method are not cacheable unless the response includes appropriate Cache-Control or Expires header fields. However, the 303 (See Other) response can be used to direct the user agent to retrieve a cacheable resource.
So I think it should be cached. I know I could set a session variable and set a cookie and do a 303 redirect, but I want to cache the response of the POST request.
Is there any way to do this?
P.S.: I started with a simple 200 OK; that did not work either.
I'd also note that caching is always optional (it's a MAY in the HTTP/1.1 RFC). Since, under most circumstances, a successful POST invalidates a cache entry, it's probably simply the case that the browser caches you're looking at just don't implement caching of POST responses (since this would be pretty uncommon; usually this is accomplished by formatting things as a GET, which it sounds like you've done).
Short answer: POST caching rarely makes sense. A cache may serve GET requests to a URL which is the same as that of a previous POST, whose response came with a Content-Location header containing the POST's request URI.
From RFC 7231 (httpbis, superseding RFC 2616):
Responses to POST requests are only cacheable when they include explicit freshness information (see Section 4.2.1 of [RFC7234]). However, POST caching is not widely implemented. For cases where an origin server wishes the client to be able to cache the result of a POST in a way that can be reused by a later GET, the origin server MAY send a 200 (OK) response containing the result and a Content-Location header field that has the same value as the POST's effective request URI (Section 3.1.4.2).
See also Mark Nottingham's blog:
POSTs don't deal in representations of identified state, 99 times out of 100. However, there is one case where it does; when the server goes out of its way to say that this POST response is a representation of its URI, by setting a Content-Location header that's the same as the request URI. When that happens, the POST response is just like a GET response to the same URI; it can be cached and reused, but only for future GET requests.
The RFC also describes a POST-redirect-GET (PRG) sequence, which has a similar effect, allowing the response cycle of a POST to fill the cache for a subsequent GET; that is probably more widely implemented.
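A rough Python sketch of that Content-Location pattern (localhost, port, and body are placeholders; whether a given browser cache honors it is another question):
from http.server import BaseHTTPRequestHandler, HTTPServer

class PostCacheHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body so the connection stays clean.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b"result of processing the POST"
        self.send_response(200)
        # Same value as the effective request URI, per RFC 7231.
        self.send_header("Content-Location", self.path)
        self.send_header("Cache-Control", "max-age=3600")  # explicit freshness
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8000), PostCacheHandler).serve_forever()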
Can you try changing the Cache-Control to public instead of private and see if it works?
