Why won't the browser cache my http response? - http

I have a REST api developed with django-tastypie. I have a couple of resources that are quite heavy but are not mutable, so I would like the browser to cache them to avoid unnecesary requests.
I've set the HTTP Expire header to a date far two years in the future, this is what the browser gets:
HTTP/1.1 200 OK
Date: Wed, 16 May 2012 17:29:33 GMT
Server: Apache/2.2.14 (Ubuntu)
Vary: Cookie,Accept-Encoding,User-Agent
Expires: Tue, 06 May 2014 17:29:33 GMT
Cache-Control: no-cache, public
Content-Encoding: gzip
Access-Control-Allow-Origin: *
Content-Length: 1051
Keep-Alive: timeout=15, max=82
Connection: Keep-Alive
Content-Type: application/json; charset=utf-8
I'm using jQuery.ajax to issue the request. The expires header looks good, but the request is made each time I refresh the page.

This is your problem:
Cache-Control: no-cache
from the spec:
This allows an origin server to prevent caching even by caches that
have been configured to return stale responses to client requests.

If this content can change, try to use ifModified: true in the jQuery.ajax

In your .ajax call, set the cache: attribute to true, like this:
$.ajax({
url: postUrl,
type: 'POST',
cache: true, /* this should be true by default, but in your case I would check this*/
data: stuff
});

Related

cache-control headers with last-modified in the past

I get the following header on CURL request.
Am I correct to assume that because the last-modified is that back in the past - the request is not cached in the browser?
➜ curl -I https://dommain.com/assets/sounds/intro.mp3
HTTP/2 200
date: Sat, 12 Nov 2022 23:39:39 GMT
content-type: audio/mpeg
content-length: 33976
last-modified: Tue, 01 Jan 1980 00:00:01 GMT
etag: "12cea601-84b8"
cache-control: no-cache
accept-ranges: bytes
strict-transport-security: max-age=15724800; includeSubDomains
No. You might be thinking of the Expires header. In the absence of Cache-Control an Expires header with a date in the past would mean that the cached response couldn't be used without revalidation.
In this case, the cache policy is set by the Cache-Control: no-cache header. That means that the browser can store the response, but can't serve it without confirming that it's up to date by making a conditional request.
Since the response has an ETag header, that will be used to determine if the cached response is still valid or not. In the absence of ETag, Last-Modified would be used for that purpose.
In sum, what you have here is a perfectly standard approach to caching, especially for large files. The response can be stored indefinitely, but can only be served from the cache after the origin server has confirmed that the stored response is still valid.

CloudFront ignores Cache-Control headers

CloudFront ignores my cache header and my pictures have to be picked up from the server again after a while.
~$ curl -I http://d2573vy43ojbo7.cloudfront.net/attachments/store/limit/64/3720c5574063aebc90511061b99de858740ad764c6981d2bf30ff121ada0/image.jpg
HTTP/1.1 200 OK
Content-Type: image/jpeg
Content-Length: 1645
Connection: keep-alive
Server: nginx/1.4.1
Date: Thu, 12 Feb 2015 14:37:41 GMT
Status: 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers:
Access-Control-Allow-Method:
Cache-Control: public, must-revalidate, max-age=31536000
Expires: Fri, 12 Feb 2016 14:37:41 GMT
Content-Disposition: inline; filename="image.jpg"
Last-Modified: Thu, 12 Feb 2015 14:37:41 GMT
X-Content-Type-Options: nosniff
X-Request-Id: 239b0fda-cae9-452f-9d1b-ccbf035bbf69
X-Runtime: 3.457939
X-Cache: Miss from cloudfront
Via: 1.1 6cde3c778df412041adc7610331b57bc.cloudfront.net (CloudFront)
X-Amz-Cf-Id: yicAkZYc5XpowKRFMOXDKSJKBMWZ4kq2B3vLK8Q-Py124D8lQq_1lg==
I tried to get the same file yesterday and then it was the same, after the second time i tried it was reached and served by CloudFront but not anymore. It's the same for all my images. They are cached but are removed from the cache after a couple of hours.
What's wrong? My cache behavior settings on CloudFront is set to default and it uses Origin Cache Headers.
Take a look here: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
If an object in an edge location isn't frequently requested, CloudFront might evict the object—remove the object before its expiration date—to make room for objects that are more popular.
It means that object is not popular enough to stay in cache for a longer time. If you have enough viewers hitting this object AND this particular CloudFront location, it would have stayed in cache longer

Cross-domain chunked uploads using CORS

I have user-submitted files that I'm trying to upload in 10 MB chunks. I'm currently using raw XMLHttpRequest (and XDomainRequest) to push each individual slice (File.prototoype.slice) on the front end. The back end is Nginx using the upload module.
Just for reference, here's the synopsis of how I'm using slice:
element.files[0].slice(...)
I understand the cross-browser prefixed methods webkitSlice and mozSlice and all that.
The problem I have is with actually making the cross-domain request. I'm uploading from server.local to upload.server.local. In Firefox, the options request goes through fine and then the actual post fails. In Chrome and Opera, the options request fails with
OPTIONS https://URL Resource failed to load
Here are the headers from Firefox:
Request Headers
OPTIONS /path/to/asset HTTP/1.1
Host: upload.server.local:8443
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Origin: https://server.local:8443
Access-Control-Request-Method: POST
Access-Control-Request-Headers: content-disposition,content-type,x-content-range,x-session-id
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Response Headers
HTTP/1.1 204 No Content
Server: nginx/1.2.6
Date: Wed, 13 Feb 2013 03:27:44 GMT
Connection: keep-alive
access-control-allow-origin: https://server.local:8443
Access-Control-Allow-Methods: POST, OPTIONS
Access-Control-Allow-Headers: x-content-range, origin, content-disposition, x-session-id, content-type, cache-control, pragma, referrer, host
access-control-allow-credentials: true
Access-Control-Max-Age: 10000
The actual post request never leaves the browser. Nginx access logs never see the post. The browser halts it for some reason. How do I unravel why this post is being blocked?
Chromium 24
Firefox 18
Opera 12.14
I've verified all browsers support CORS properly here.
By pointing my uploads to https://cors-test.appspot.com/test, I have confirmed that the problem is definitely with the server-side headers.
The POST won't leave the browser if the preflight check does not return sufficient permissions and thus the POST request is not fully authorized. The request/response included in the question does look sufficient to me.
Are you sure you are setting withCredentials = true in your XMLHttpRequest?
Are you sure that you have valid (not self-signed) SSL certificates on your servers? The HTTPS might fail the CORS check even if you have added an exception for browsing the site with an invalid certificate.
Have you tried emptying your cache? You have Access-Control-Max-Age: 10000 set in your response headers. That's close to 3 hours. I know you've been working on this longer than that but while testing especially, set that header to zero instead so you don't go crazy with browser caching of old access permissions.
In general I'd start with going as permissive as possible with the CORS headers and slowly ratcheting up the the security to see where it fails. However, this is not completely straightforward. For example, according to the MDN documentation on CORS,
When responding to a credentialed request, server must specify a domain, and cannot use wild carding. The above example would fail if the header was wildcarded as: Access-Control-Allow-Origin: *
When I send the request part of your question to https://cors-test.appspot.com/test, I get back the following:
HTTP/1.1 200 OK
Cache-Control: no-cache
Access-Control-Allow-Origin: https://server.local:8443
Access-Control-Allow-Headers: content-disposition,content-type,x-content-range,x-session-id
Access-Control-Allow-Methods: POST
Access-Control-Max-Age: 0
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Type: application/json
Content-Encoding: gzip
Content-Length: 35
Vary: Accept-Encoding
Date: Thu, 23 May 2013 06:37:34 GMT
Server: Google Frontend
So you can start from there and add more and more security until it breaks to figure out what is the culprit.

Difference between Pragma and Cache-Control headers?

I read about Pragma header on Wikipedia which says:
"The Pragma: no-cache header field is an HTTP/1.0 header intended for
use in requests. It is a means for the browser to tell the server and
any intermediate caches that it wants a fresh version of the resource,
not for the server to tell the browser not to cache the resource. Some
user agents do pay attention to this header in responses, but the
HTTP/1.1 RFC specifically warns against relying on this behavior."
But I haven't understood what it does? What is the difference between the Cache-Control header whose value is no-cache and Pragma whose value is also no-cache?
Pragma is the HTTP/1.0 implementation and cache-control is the HTTP/1.1 implementation of the same concept. They both are meant to prevent the client from caching the response. Older clients may not support HTTP/1.1 which is why that header is still in use.
There is no difference, except that Pragma is only defined as applicable to the requests by the client, whereas Cache-Control may be used by both the requests of the clients and the replies of the servers.
So, as far as standards go, they can only be compared from the perspective of the client making a requests and the server receiving a request from the client. The http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.32 defines the scenario as follows:
HTTP/1.1 caches SHOULD treat "Pragma: no-cache" as if the client had
sent "Cache-Control: no-cache". No new Pragma directives will be
defined in HTTP.
Note: because the meaning of "Pragma: no-cache as a response
header field is not actually specified, it does not provide a
reliable replacement for "Cache-Control: no-cache" in a response
The way I would read the above:
if you're writing a client and need no-cache:
just use Pragma: no-cache in your requests, since you may not know if Cache-Control is supported by the server;
but in replies, to decide on whether to cache, check for Cache-Control
if you're writing a server:
in parsing requests from the clients, check for Cache-Control; if not found, check for Pragma: no-cache, and execute the Cache-Control: no-cache logic;
in replies, provide Cache-Control.
Of course, reality might be different from what's written or implied in the RFC!
Stop using (HTTP 1.0)
Replaced with (HTTP 1.1 since 1999)
Expires: [date]
Cache-Control: max-age=[seconds]
Pragma: no-cache
Cache-Control: no-cache
If it's after 1999, and you're still using Expires or Pragma, you're doing it wrong.
I'm looking at you Stackoverflow:
200 OK
Pragma: no-cache
Content-Type: application/json
X-Frame-Options: SAMEORIGIN
X-Request-Guid: a3433194-4a03-4206-91ea-6a40f9bfd824
Strict-Transport-Security: max-age=15552000
Content-Length: 54
Accept-Ranges: bytes
Date: Tue, 03 Apr 2018 19:03:12 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-yyz8333-YYZ
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1522782193.766958,VS0,VE30
Vary: Fastly-SSL
X-DNS-Prefetch-Control: off
Cache-Control: private
tl;dr: Pragma is a legacy of HTTP/1.0 and hasn't been needed since Internet Explorer 5, or Netscape 4.7. Unless you expect some of your users to be using IE5: it's safe to stop using it.
Expires: [date] (deprecated - HTTP 1.0)
Pragma: no-cache (deprecated - HTTP 1.0)
Cache-Control: max-age=[seconds]
Cache-Control: no-cache (must re-validate the cached copy every time)
And the conditional requests:
Etag (entity tag) based conditional requests
Server: Etag: W/“1d2e7–1648e509289”
Client: If-None-Match: W/“1d2e7–1648e509289”
Server: 304 Not Modified
Modified date based conditional requests
Server: last-modified: Thu, 09 May 2019 19:15:47 GMT
Client: If-Modified-Since: Fri, 13 Jul 2018 10:49:23 GMT
Server: 304 Not Modified
last-modified: Thu, 09 May 2019 19:15:47 GMT

How can I stop browsers caching my web page using HTTP 1.1 headers?

Although I have set Expires to a date in the past, and Cache-Control to no-store, no-cache, I still get one of my web pages cached.
Here are the HTTP headers sent to the browser:
Date: Tue, 02 Nov 2010 09:13:23 GMT
Server: Apache/2.2.15 (el)
X-Powered-By: PHP/5.2.13
Set-Cookie: PHPSESSID=2luvb7b316lfc8ht570s1l1v84; path=/
Set-Cookie: Newsletter_Counter=17; expires=Wed, 02-Nov-2011 09:13:23 GMT; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Connection: close
Content-Type: text/html; charset=UTF-8
Same behavior for FF 3.6, Safari and IE 8.
How do I get browsers to stop caching the page?
Browsers decide caching themselves. You can use a random GET parameter to force browsers not to cache, e.g.
http://www.foo.com/yourfile.zip?id=1234
The following headers have always worked well for me (for HTTP/1.1). You should not need Pragma: no-cache.
Cache-Control: no-cache
Expires: <some date in the past>
Vary: *
Try changing your Vary value to the asterisk from my example.
Per http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44:
"A Vary field value of "*" implies that a cache cannot determine from the request headers of a subsequent request whether this response is the appropriate representation."
Using Cache-Control: no-store should forbid any storage:
no-store
[…] If sent in a response, a cache MUST NOT store any part of either this response or the request that elicited it. This directive applies to both non- shared and shared caches. […]
You certainly seem to be doing the right things (but like a lot of people seem to assume that sending a 'Pragma: no-cache' response header has some effect on browser side caching - it should not).
What do you mean its getting cached? It will not (usually) be fetched again from the server if the user clicks on the 'back button' and was retrieved using a GET operation.

Resources