AWS CloudFront, `Vary` header and content negotiation

I'm trying to implement content negotiation based on the client's Accept header, so that clients that accept image/webp get WebP images while clients that don't get plain old JPEG. The WebP and JPEG images are served from the same URL, e.g. /images/foo-image/, and the content returned varies on the Accept header presented by the client. This now works great on my site.
The next challenge is to get this working with AWS CloudFront sitting in front of my site. I'm setting the Vary header to Vary: Accept to let CloudFront know that it has to cache and serve different content based on the client's Accept header.
Unfortunately, this doesn't seem to work: CloudFront just serves up whatever it first gets its hands on, Vary and Accept notwithstanding. Interestingly, CloudFront does seem to be able to vary content based on Accept-Encoding (i.e. gzip).
Does anyone know what gives?
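For context, the negotiation on the origin works roughly like this (a minimal Flask sketch, not the poster's actual code; the route and file paths are illustrative):

```python
# Serve WebP or JPEG from the same URL, depending on the Accept header.
from flask import Flask, request, make_response, send_file

app = Flask(__name__)

@app.route("/images/<name>/")
def image(name):
    accepts_webp = "image/webp" in request.headers.get("Accept", "")
    if accepts_webp:
        resp = make_response(send_file(f"images/{name}.webp", mimetype="image/webp"))
    else:
        resp = make_response(send_file(f"images/{name}.jpg", mimetype="image/jpeg"))
    # Tell downstream caches that the body depends on the Accept header.
    resp.headers["Vary"] = "Accept"
    return resp
```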

It turns out this is documented as not supposed to work:
The only acceptable value for the Vary header is Accept-Encoding. CloudFront ignores other values.
UPDATE: AWS now has support for more sophisticated content negotiation. I wrote a blog post on how to take advantage of this.

Just to update this question: CloudFront now supports caching by different headers, so you can now do this.
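For example, with the older forwarded-values style of configuration you can whitelist the Accept header so it becomes part of the cache key. A minimal boto3 sketch, assuming an existing distribution (the ID is a placeholder; newer setups would express the same thing with a cache policy):

```python
import boto3

cf = boto3.client("cloudfront")
DIST_ID = "E1234EXAMPLE"  # placeholder distribution ID

# Fetch the current config together with its ETag (required for the update).
cfg = cf.get_distribution_config(Id=DIST_ID)
dist_cfg = cfg["DistributionConfig"]

# Forward the Accept header to the origin and include it in the cache key.
dist_cfg["DefaultCacheBehavior"]["ForwardedValues"]["Headers"] = {
    "Quantity": 1,
    "Items": ["Accept"],
}

cf.update_distribution(Id=DIST_ID, IfMatch=cfg["ETag"], DistributionConfig=dist_cfg)
```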

Related

nginx Http2 Push fails when Vary: Accept header set

Basically, HTTP/2 push using http2_push_preload doesn't work if you set the Vary: Accept header on your response, which you do when doing content negotiation on the Accept request header. I'm using content negotiation to send (HTTP/2 push) WebP images instead of JPEG to clients that support them.
HTTP/2 push works for .js, .css and similar files in the same call and shows "Push / Other" in Chrome DevTools, but it fails for this one specific case (JPEG content-negotiated to WebP) and shows just "Other" (not pushed).
Content negotiation for Brotli and gzip compression works fine and gets pushed properly using Vary: Accept-Encoding, and the same goes for languages using Vary: Accept-Language.
Only Vary: Accept fails.
Please help, I'm at the point of giving up.
P.S.: I was going through the nginx source at https://github.com/nginx/nginx/blob/master/src/http/v2/ngx_http_v2.c. Do a Ctrl+F and you will find cases only for "Accept-Encoding" and "Accept-Language", nothing for "Accept". So I think the "Accept" case is not yet supported by nginx?
P.P.S: I'm not overpushing, only using http2 push for the hero image.
Edit: Here are the bug tickets on the nginx site for those who want to track them:
https://trac.nginx.org/nginx/ticket/1851
https://trac.nginx.org/nginx/ticket/1817
Edit 2: The nginx team has responded by saying they are not going to support it for security reasons (you can find the response in the duplicate bug post), which I believe is about pushing from different origins, like CDNs? Anyway, I need this feature, so the only options left are to:
Create a custom patch or package.
Use some other server software that supports it.
Manually implement in the website code a feature to rewrite .jpg paths to .jpg.webp when requests come from clients that support WebP (see the sketch after this list).
(I don't give up :P)
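A minimal sketch of option 3, rewriting image paths in the generated HTML for WebP-capable clients (the function name and the .jpg.webp naming convention are illustrative):

```python
import re

def rewrite_image_paths(html: str, accept_header: str) -> str:
    """Rewrite .jpg srcs to .jpg.webp when the client advertises WebP."""
    if "image/webp" not in accept_header:
        return html
    # Assumes a foo.jpg.webp variant exists next to every foo.jpg.
    return re.sub(r'(src="[^"]+\.jpg)"', r'\1.webp"', html)

# Usage (e.g. in a request handler):
#   page = rewrite_image_paths(page, request_accept_header)
```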
I'm not entirely surprised by this, and Apache does the same. If you want this to change, I suggest raising a bug with nginx, but I wouldn't be surprised if they didn't prioritise it.
It also seems browsers don't handle this situation very well.
HTTP/2 push is fraught with opportunities to over-push, and this is one example. You should not push if the client does not support WebP, and you often won't know that with the information you have at this point. Chrome seems to send webp in the Accept header when it asks for the HTML, for example, but Firefox does not.
Preload is a much better, safer option that will respect Vary headers and also cache status.
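For instance, rather than pushing, the HTML response can announce the hero image with a preload Link header and let the browser fetch it with its own Accept header (a minimal Flask sketch; the image path is illustrative):

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/")
def index():
    resp = make_response("<html>...page with a hero image...</html>")
    # The browser requests /images/hero.jpg itself, sending its normal
    # Accept header, so Vary: Accept negotiation still works and the
    # browser cache is respected.
    resp.headers["Link"] = "</images/hero.jpg>; rel=preload; as=image"
    return resp
```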

How to ensure my CDN caches CORS requests by origin?

I currently use Akamai as a CDN for my app, which is served over multiple subdomains.
I recently realized that Akamai caches CORS requests identically, regardless of the origin from which they were requested.
This of course causes requests from clients with a different Origin than the cached response to fail (since they receive a different Access-Control-Allow-Origin response header than they should).
Many suggest supplying the Vary: Origin response header to avoid this issue, but according to Akamai's docs and this Akamai community post, this isn't supported by Akamai.
How can I force Akamai to cache things uniquely by Origin if an Origin header is present in the request?
I did some research, and it appears this can be done by adding a new Rule in your Akamai config: an If block that checks for the presence of an Origin request header, containing a Cache ID Modification rule that adds that header's value to the cache key.
Note that if you do this, remember: it changes your cache key at Akamai, so anything that was cached before is essentially NOT CACHED anymore! Also, as Akamai's own warnings note, this can make it harder to force-reset your cache using Akamai's URL purging tools. You could remove the If block and just include the Origin header in a similar Cache ID Modification rule, if you were OK with changing the cache key for all the content this rule applies to.
So in short, try this out on a small section of your site first!
More details can be found in this related post on Stack Overflow
We have hosted an API on Akamai. I had a similar requirement, but we wanted to use the cached response on Akamai for all touchpoints. Without the right CORS settings, it would cache the response from the first origin and keep it, and subsequent requests from other touchpoints would fail because of the cached Origin header.
We solved the problem using the API Gateway feature provided by Akamai. You can find it under API Definition, where custom cache parameters can also be defined. See the screenshot of the CORS settings. Now it caches the response from the backend and serves it to requesters according to the allowed-origin list.
(Screenshot: CORS settings in API Definition)
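For reference, the origin behaviour these cache settings assume looks something like this (a minimal Flask sketch; the allow-list and route are hypothetical):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical allow-list of trusted origins.
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

@app.route("/api/data")
def data():
    resp = jsonify({"ok": True})
    origin = request.headers.get("Origin")
    if origin in ALLOWED_ORIGINS:
        # Echo back only allow-listed origins.
        resp.headers["Access-Control-Allow-Origin"] = origin
    # Declare that the response differs per Origin; the CDN must also
    # include Origin in its cache key for this to be safe.
    resp.headers["Vary"] = "Origin"
    return resp
```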

Disable cache sharing among websites

Is there a way to tell the browser not to share a cached resource among websites?
I want to give websites a link to some JavaScript on my server, and I want the response to be different for each domain, using the Referer header as a check.
The cached response should be available only to the domain that requested it; when end users visit another site that uses the same script link, another request should be made.
I don't know whether I understand your question.
Is your scenario like this: stackoverflow.com and yourwebsite.com use the same script, "https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js", but you don't want to share the cached script with stackoverflow.com?
This is under the control of googleapis.com's web server.
So if the cached resource's origin server (googleapis.com) wants to implement the feature you describe, it may use the Vary response header, which defines the secondary key of a cache entry.
Maybe Vary: Origin, but that only works for CORS requests.
Maybe Vary: Referer, but the Referer contains the URL path.
It still doesn't solve your problem but I hope it helps.
See the MDN HTTP caching documentation and RFC 7234, Section 4.1.
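To make the mechanics concrete, here is what the script's origin server could do (a minimal Flask sketch with an illustrative route; note that browsers may strip or trim the Referer):

```python
from urllib.parse import urlparse
from flask import Flask, request, Response

app = Flask(__name__)

@app.route("/widget.js")
def widget():
    referer = request.headers.get("Referer", "")
    site = urlparse(referer).hostname or "unknown"
    # Vary the script body per requesting site.
    body = f'console.log("widget loaded for {site}");'
    resp = Response(body, mimetype="application/javascript")
    # Caches must key on the full Referer value, which includes the path,
    # so this fragments the cache heavily (as noted above).
    resp.headers["Vary"] = "Referer"
    return resp
```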

HTTP "Don't execute!" Header

On my website, files can be shared by URLs like
"/file/file_id",
and the server sends back exactly the file contents, with the filename specified as well.
I guess I should do something with the Content-Type header. If I say
Content-Type: "image"
Firefox gladly executes HTML files too. It seems to be solved by
Content-Type: "image/jpeg"
For one, I think just having to say "I'm an image!" should be sufficient by the standards. With a typo (leaving off "jpeg"), for example, I could expose my whole site. Plus, now I have to look after all the common image types and implement headers for each of them.
Secondly, it would be great if there were a header for this (DO NOT EXECUTE). Is there one?
I looked at the "X-XSS-Protection" header, but it looks like something only IE understands anyway. Sorry if this was answered somewhere; I have not found it.
X-Content-Type-Options: nosniff
Makes browsers respect the Content-Type you send, so if you're careful to only send known-safe types (e.g. not SVG!), it'll be fine.
There's also CSP that might be a second line of defence:
Content-Security-Policy: default-src 'none'
Sites that are very careful about security host 3rd party content on a completely different top-level domain (to get same-origin policy protection and avoid cookie injection through compromised subdomains).
Traditionally there have been many ways to circumvent the different protections. As such, a full defense relies on multiple mechanisms (defense-in-depth).
Most larger companies solve this by hosting such files on a custom domain (e.g. googleusercontent.com). If an attacker is able to execute script on such a domain, at least that does not give XSS access to the main website.
X-Content-Type-Options is a non-standard header and was until recently not supported in Firefox, but it is still part of the defense. It's possible to construct files that are valid in many formats at once (I have a file that is a "valid" GIF, HTML, JavaScript and PDF).
Images can normally be served directly (with X-Content-Type-Options: nosniff).
Other files can be served with Content-Type: text/plain, while the rest get "Content-Disposition: attachment" to force a download instead of rendering in the browser.
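Putting the pieces together, a defensive file endpoint might look like this (a minimal Flask sketch; the lookup helper and the type allow-list are hypothetical):

```python
from flask import Flask, make_response, send_file

app = Flask(__name__)

SAFE_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif"}  # deliberately no SVG

def lookup(file_id):
    """Hypothetical metadata lookup; replace with your real storage layer."""
    return f"uploads/{file_id}", "application/octet-stream", file_id

@app.route("/file/<file_id>")
def serve_file(file_id):
    path, content_type, filename = lookup(file_id)
    resp = make_response(send_file(path, mimetype=content_type))
    # Stop browsers from second-guessing (sniffing) the declared type.
    resp.headers["X-Content-Type-Options"] = "nosniff"
    if content_type not in SAFE_IMAGE_TYPES:
        # Anything that isn't a known-safe image is forced to download.
        resp.headers["Content-Disposition"] = f'attachment; filename="{filename}"'
    return resp
```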

How can I avoid zero-byte files on CloudFront edge locations?

We've just discovered that one of CloudFront's edge locations is returning a zero-byte file for one of our JavaScript assets. The invalidation is running right now, but I'm beginning to think this phenomenon may be the source of widespread but strangely unreproducible bugs that our customers have been reporting for months.
We're using CloudFront with a Custom Origin (nginx serving static files from an EC2 server). It appears that with every deploy to our application that introduces new asset names (e.g. a changed file version), we have a non-zero chance of one or more CloudFront edge locations getting a zero-byte file.
Is there any way to avoid this?
Is there any way to detect this?
There is a very similar problem which has been discussed on the AWS forum. It seems to boil down to your custom origin not sending a Content-Length header.
Note the excerpt from the forum, which may be related:
Unfortunately your origin doesn't appear to provide a Content-Length header. Without a Content-Length header CloudFront can't determine that a truncated object was received and will cache it. If your origin can send a Content-Length header any truncated objects will not be cached. See the Developer Guide for more details.
Try adding the Content-Length header; that should do the trick.
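To answer the detection part of the question: a periodic check can compare each asset's declared Content-Length against the bytes actually received (a stdlib-only sketch; the URL is a placeholder):

```python
import urllib.request

def check_content_length(url: str) -> None:
    """Fetch an asset and flag missing or mismatched Content-Length."""
    with urllib.request.urlopen(url) as resp:
        declared = resp.headers.get("Content-Length")
        body = resp.read()
        if declared is None:
            print(f"{url}: no Content-Length header (truncation undetectable)")
        elif int(declared) != len(body):
            print(f"{url}: TRUNCATED ({len(body)} of {declared} bytes)")
        else:
            print(f"{url}: OK ({declared} bytes)")

check_content_length("https://dxxxx.cloudfront.net/assets/app.js")  # placeholder URL
```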
