no 'Last-Modified' HTTP header -> however cached? - http

From a browser perspective,
What occur if a component (image, script, stylesheet...) is served without a Last-Modified HTTP header field...
Is it however cached by the browser even if it won't be able to perform a validity check(If-Modified-Since) in future, due to his lack of date/time information?
Eg:
GET /foo.png HTTP/1.1
Host: example.org
--
200 OK
Content-Type: image/png
...
Is foo.png however cached?
--
Would you know any online service to serve my raw HTTP response that I can write myself in order to test what I'm asking ?
Thank you.

Generally speaking, responses can be cached unless they explicitly say that they can't (e.g., with cache-control: no-store).
However, most caches will not store responses that don't have something that they can base freshness on, e.g., Cache-Control, Expires, or Last-Modified.
For the complete rules, see:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-13#section-2.1
See:
http://www.mnot.net/blog/2009/02/24/unintended_caching
for an example of how this can surprise some people.

Yes, the image may get cached even without a Last-Modified response header.
The browser will then cache the image until its TTL expires. You can set the image's Time To Live using appropriate response headers, e.g. this would set the TTL to one hour:
Cache-Control: max-age=3600
Date: Tue, 29 Mar 2011 20:18:17 GMT
Expires: Tue, 29 Mar 2011 21:18:17 GMT
Even without any Last-Modified in the response, the browser may still use the Date header for subsequent If-Modified-Since requests.

I disabled the Last-Modified header on a large site and FF 13 doesn't take the contents from cache, although a max-age is given etc. Contents without a Last-Modified header ALWAYS get a status 200 ok when requested, not a 304. So the browser looks for it in the cache.

Related

Why does my CDN cache a certain document,but browsers do not?

I have a url with a test PDF on it, this is my origin:
https://powered-by.qbank.se/miso/MISO_Testing_Document279626.pdf
I have that origin setup in an Azure CDN using the Microsoft provider. it's url is:
https://misocdn-fail.azureedge.net/MISO_Testing_Document279626.pdf
When I update the PDF on the origin site, all the browsers that I have tested will bring back the NEW document with just an F5 refresh, not even a ctrl-F5. But, the CDN continues the cache the PDF basically indefinetly (2 days acording to docs or til I purge)
My question is, why isn't my CDN able to detect the change at the origin and browser is?
I understand that the CDN caches, but I don't understand what it is that a browser is doing to figure out this content is new?
To better understand the phenomenon it is a good start to obesrve the response headers received from the direct access url.
One way to do that is to use curl -I <YOUR_URL> in your terminal.
You will see something like:
HTTP/1.1 200 OK
Date: Mon, 01 Oct 2018 09:03:57 GMT
Server: Apache
Last-Modified: Fri, 28 Sep 2018 19:11:57 GMT
ETag: "11ff1-576f33ab4c2a0"
Accept-Ranges: bytes
Content-Length: 73713
Cache-Control: max-age=86400
Expires: Tue, 02 Oct 2018 09:03:57 GMT
Content-Type: application/pdf
Out of these headers the browser uses the Cache-Control, ETag and Last-Modified to determine the freshness of the requested content.
Cache-Control: max-age=<seconds> is the maximum amount of time (relative to the time of the request) a resource will be considered fresh.
Now, according to Mozilla Developer Network –MDN– Freshness is described as below:
Once a resource is stored in a cache, it could theoretically be served by the cache forever. Caches have finite storage so items are periodically removed from storage. This process is called cache eviction. On the other side, some resources may change on the server so the cache should be updated. As HTTP is a client-server protocol, servers can't contact caches and clients when a resource changes; they have to communicate an expiration time for the resource. Before this expiration time, the resource is fresh; after the expiration time, the resource is stale. Eviction algorithms often privilege fresh resources over stale resources. Note that a stale resource is not evicted or ignored; when the cache receives a request for a stale resource, it forwards this request with a If-None-Match to check if it is in fact still fresh. If so, the server returns a 304 (Not Modified) header without sending the body of the requested resource, saving some bandwidth.
So to validate a cached resource, an If-None-Match header will be issued by the browser if the ETag header was part of the response for the resource.
This is the mechanism that makes your browser download the new version of your pdf when accessed directly. Please also note, that these headers are present in the request from the CDN url as well, but the CDN edge servers are still storing your old file.
When it comes to the CDN cache, the ETag and Last-Modified headers are not respected. It is only the Cache-Control header in the HTTP response from the origin server that defines the time-to-live (TTL) period of a resource. In your case, it is 86400 seconds. So theoretically, the new version of your pdf will be served after 1 day from the first request through the CDN link.
Up until that moment the old pdf will be hosted by the CDN edge servers. You can read more about the Azure CDN expiration management in the Azure CDN documentation.

difference between request header cache policy and response header

I have some js hosted on AWS. I want to cache it to not to pay extra for 304 GET request, but I'm puzzled why two headers are different.
Request Method:GET
Status Code:304 Not Modified
Request header of helper.js
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
If-Modified-Since:Tue, 20 Aug 2013 13:08:13 GMT
and response header
Age:4348
Cache-Control:max-age=604800
Connection:keep-alive
Why they are different? Does it mean that Cache-Control is wrong? I used Chrome console to get the headers.
I don't think that Cache-Control is wrong and it seems that your content is already cached. From the request headers, I understand that the first request was done at Tue, 20 Aug 2013 13:08:13 GMT as the browser indicates the server "hey, has the content changed since that time?". In return, the server responses with 304 Not Modified header, indicating that the content has not been changed and it should be cached 604800 seconds more until revalidating it. Remember that the caching is done on server side. So, you may want to look at your server defitinitons on js files. Usually, in the deployment environment, I instruct my webserver to send cache header for *.js *.png etc. After configuring the web server for sending cache headers, it is the browser's work to take care of the rest. In that case, your browser works as expected.
You can look at RFC2616 for 304 response. You may also want to look at this decent caching tutorial. It should clear some ideas.
The problem is with Chrome. If you press Refresh button it invalidates the cache, but if you press Enter in address bar it gets the resources from cache.

What is Cache-Control: private?

When I visit chesseng.herokuapp.com I get a response header that looks like
Cache-Control:private
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/css
Date:Tue, 16 Oct 2012 06:37:53 GMT
Last-Modified:Tue, 16 Oct 2012 03:13:38 GMT
Status:200 OK
transfer-encoding:chunked
Vary:Accept-Encoding
X-Rack-Cache:miss
and then I refresh the page and get
Cache-Control:private
Connection:keep-alive
Date:Tue, 16 Oct 2012 06:20:49 GMT
Status:304 Not Modified
X-Rack-Cache:miss
so it seems like caching is working. If that works for caching then what is the point of Expires and Cache-Control:max-age. To add to confusion, when I test the page at https://developers.google.com/speed/pagespeed/insights/ it tells me to "Leverage browser caching".
Cache-Control: private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache, such as a proxy server.
From RFC2616 section 14.9.1
To answer your question about why caching is working, even though the web-server didn't include the headers:
Expires: [a date]
Cache-Control: max-age=[seconds]
The server kindly asked any intermediate proxies to not cache the contents (i.e. the item should only be cached in a private cache, i.e. only on your own local machine):
Cache-Control: private
But the server forgot to include any sort of caching hints:
they forgot to include Expires (so the browser knows to use the cached copy until that date)
they forgot to include Max-Age (so the browser knows how long the cached item is good for)
they forgot to include E-Tag (so the browser can do a conditional request)
But they did include a Last-Modified date in the response:
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Because the browser knows the date the file was modified, it can perform a conditional request. It will ask the server for the file, but instruct the server to only send the file if it has been modified since 2012/10/16 3:13:38:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
The server receives the request, realizes that the client has the most recent version already. Rather than sending the client 200 OK, followed by the contents of the page, it instead tells you that your cached version is good:
304 Not Modified
Your browser did have to suffer the round-trip delay of sending a request to the server, and waiting for the response, but it did save having to re-download the static content.
Why Max-Age? Why Expires?
Because Last-Modified sucks.
Not everything on the server has a date associated with it. If I'm building a page on the fly, there is no date associated with it - it's now. But I'm perfectly willing to let the user cache the homepage for 15 seconds:
200 OK
Cache-Control: max-age=15
If the user hammers F5, they'll keep getting the cached version for 15 seconds. If it's a corporate proxy, then all 67,198 users hitting the same page in the same 15-second window will all get the same contents - all served from close cache. Performance win for everyone.
The virtue of adding Cache-Control: max-age is that the browser doesn't even have to perform a "conditional" request.
if you specified only Last-Modified, the browser has to perform a If-Modified-Since request, and watch for a 304 Not Modified response
if you specified max-age, the browser won't even have to suffer the network round-trip; the content will come right out of the caches.
The difference between "Cache-Control: max-age" and "Expires"
Expires is a legacy (c. 1998) equivalent of the modern Cache-Control: max-age header:
Expires: you specify a date (yuck)
max-age: you specify seconds (goodness)
And if both are specified, then the browser uses max-age:
200 OK
Cache-Control: max-age=60
Expires: 20180403T192837
Any web-site written after 1998 should not use Expires anymore, and instead use max-age.
What is ETag?
ETag is similar to Last-Modified, except that it doesn't have to be a date - it just has to be a something.
If I'm pulling a list of products out of a database, the server can send the last rowversion as an ETag, rather than a date:
200 OK
ETag: "247986"
My ETag can be the SHA1 hash of a static resource (e.g. image, js, css, font), or of the cached rendered page (i.e. this is what the Mozilla MDN wiki does; they hash the final markup):
200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
And exactly like in the case of a conditional request based on Last-Modified:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
304 Not Modified
I can perform a conditional request based on the ETag:
GET / HTTP/1.1
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
304 Not Modified
An ETag is superior to Last-Modified because it works for things besides files, or things that have a notion of date. It just is
RFC 2616, section 14.9.1:
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache...A private (non-shared) cache MAY cache the response.
Browsers could use this information. Of course, the current "user" may mean many things: OS user, a browser user (e.g. Chrome's profiles), etc. It's not specified.
For me, a more concrete example of Cache-Control: private is that proxy servers (which typically have many users) won't cache it. It is meant for the end user, and no one else.
FYI, the RFC makes clear that this does not provide security. It is about showing the correct content, not securing content.
This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
The Expires entity-header field gives the date/time after which the response is considered stale.The Cache-control:maxage field gives the age value (in seconds) bigger than which response is consider stale.
Althought above header field give a mechanism to client to decide whether to send request to the server. In some condition, the client send a request to sever and the age value of response is bigger then the maxage value ,dose it means server needs to send the resource to client? Maybe the resource never changed.
In order to resolve this problem, HTTP1.1 gives last-modifided head. The server gives the last modified date of the response to client. When the client need this resource, it will send If-Modified-Since head field to server. If this date is before the modified date of the resouce, the server will sends the resource to client and gives 200 code.Otherwise,it will returns 304 code to client and this means client can use the resource it cached.

Chrome doesn't send "If-Modified-Since"

I want browsers to always add (except first time) "If-Modified-Since" request header to avoid unnecessary traffic.
Response headers are:
Accept-Ranges:bytes
Cache-Control:max-age=0, must-revalidate
Connection:Keep-Alive
Content-Length:2683
Content-Type:text/html; charset=UTF-8
Date:Thu, 05 Apr 2012 13:06:19 GMT
Keep-Alive:timeout=15, max=497
Last-Modified:Thu, 05 Apr 2012 13:05:11 GMT
Server:Apache/2.2.21 (Red Hat)
FF 11 and IE 9 both send "If-Modified-Since" and get 304 in response but Chrome 18 doesn't and get 200.
Why? How to force Chrome to sent "If-Modified-Since" header?
I do not know if it important or not but all requests going through HTTPS.
I've been chasing this issue for some time, thought I'd share what I found.
"The rule is actually quite simple: any error with the certificate means the page will not be cached."
https://code.google.com/p/chromium/issues/detail?id=110649
If you are using a self-signed certificate, even if you tell Chrome to add an exception for it so that the page loads, no resources from that page will be cached, and subsequent requests will not have an If-Modified-Since header.
I just now found this question, and after puzzling over Chrome's If_Modified_Since behavior, I've found the answer.
Chrome's decision to cache files is based on the Expires header that it receives. The Expires header has two main requirements:
It must be in Greenwich Mean Time (GMT), and
It must be formatted according to RFC 1123 (which is basically RFC 822 with a four-digit year).
The format is as follows:
Expires: Sat, 07 Sep 2013 05:21:03 GMT
For example, in PHP, the following outputs a properly formatted header.
$duration = time() + 3600 // Expires in one hour.
header("Expires: " . gmdate("D, d M Y H:i:s", $duration) . " GMT");
("GMT" is appended to the string instead of the "e" timezone flag because, when used with gmdate(), the flag will output "UTC," which RFC 1123 considers invalid. Also note that the PHP constants DateTime::RFC1123 and DATE_RFC1123 will not provide the proper formatting, since they output the difference to GMT in hours [i.e. +02:00] rather than "GMT".)
See the W3C's date/time format specifications for more info.
In short, Chrome will only recognize the header if it follows this exact format. This, combined with the Cache-Control header...
header("Cache-Control: private, must-revalidate, max-age=" . $duration);
...allowed me to implement proper cache control. Once Chrome recognized those headers, it began caching the pages I sent it (even with query strings!), and it also began sending the If_Modified_Since header. I compared it to a stored "last-modified" date, sent back HTTP/1.1 304 Not Modified, and everything worked perfectly.
Hope this helps anyone else who stumbles along!
I've noticed almost the same behaviour and my findings are:
First of all the 200 status indicator in chrome is not the whole truth, you need to look at the "Size Content" column as well. If this says "(from cache)" the resource was take directly from cache without even asking if it was modified.
This caching behaviour of resources that lack any indication of expires or max-age seems to apply when requesting static files that have a last-modified header. I've noted that chrome (ver. 22):
Asks for the file the first time (obviously since it is not in cache).
Asks if it is modified the second time (since it is in cache but has no indication of freshness).
Uses it directly the third time and then on (even if it is a new browser session).
I'm a bit puzzled by this behaviour but it is fairly reasonable, if it is static, was modified a long time ago, and hasn't changed since last check you could assume that it is going to be valid for a while longer (don't know how they calculate it though).
I had the same problem, in Chrome all requests were always status code 200, in other browsers 304.
It turned out I had the disable cache (while DevTools is open) checked in on Devtools - Settings - General page..:)
Don't disable cache in Chrome Dev Tools (on "Network" tab).
Cache-Control should be Cache-Control: public. Use true as second parameter to header PHP function: header("Cache-Control: public", true);
I have also found that Chrome (recent v95+) also returns a cached 200 response if I have DevTools open. It never even sends the request on to the server! If I close DevTools, the behaviour is as it should, and the server receives the expected request.

How to cache an HTTP POST response?

I would like to create a cacheable HTTP response for a POST request.
My actual implementation responds the following for the POST request:
HTTP/1.1 201 Created
Expires: Sat, 03 Oct 2020 15:33:00 GMT
Cache-Control: private,max-age=315360000,no-transform
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 9
ETag: 2120507660800737950
Last-Modified: Wed, 06 Oct 2010 15:33:00 GMT
.........
But it looks like that the browsers (Safari, Firefox tested) are not caching the response.
In the HTTP RFC the corresponding part says:
Responses to this method are not cacheable unless the response includes appropriate Cache-Control or Expires header fields. However, the 303 (See Other) response can be used to direct the user agent to retrieve a cacheable resource.
So I think it should be cached. I know I could set a session variable and set a cookie and do a 303 redirect, but I want to cache the response of the POST request.
Is there any way to do this?
P.S.: I've started with a simple 200 OK, so it does not work.
I'd also note that caching is always optional (it's a MAY in the HTTP/1.1 RFC). Since under most circumstances, a successful POST invalidates a cache entry, it's probably simply the case that the browser caches you're looking at just don't implement caching POST responses (since this would be pretty uncommon--usually this is accomplished by formatting things as a GET, which it sounds like you've done).
Short answer: POST caching rarely makes sense. A cache may serve GET requests to a URL which is the same as that of a previous POST, whose response came with a Content-Location header containing the POST's request URI.
From rfc-7231 (http-bis, superseding rfc-2616):
Responses to POST requests are only cacheable when they include
explicit freshness information (see Section 4.2.1 of [RFC7234]).
However, POST caching is not widely implemented. For cases where an
origin server wishes the client to be able to cache the result of a
POST in a way that can be reused by a later GET, the origin server
MAY send a 200 (OK) response containing the result and a
Content-Location header field that has the same value as the POST's
effective request URI (Section 3.1.4.2).
See also Mark Nottinghams Blog:
POSTs don't deal in representations of identified state, 99 times out
of 100. However, there is one case where it does; when the server goes
out of its way to say that this POST response is a representation of
its URI, by setting a Content-Location header that's the same as the
request URI. When that happens, the POST response is just like a GET
response to the same URI; it can be cached and reused -- but only for
future GET requests.
The rfc also describes a PRG sequence which has a similar effect, allowing the response cycle to a POST to fill the cache for a subsequent GET - which is probably more widely implemented.
Can you try to change the Cache-Control to public instead of private and see if it's working?

Resources