HTTP ETag reproduction

Having recently discovered problems relating to the HTTP ETag and our CDN, I've tried to capture some ETags in Fiddler for well-known sites. However, it appears that whatever combination of browser and website I use, I'm not seeing any pass by.
Is there any reason for this? Can you suggest a combination in which I can see them? Perhaps they're not widely used anymore?

They are definitely widely used; I've used them myself often. The most common use case is conditional requests: always check whether there's new content, but only send the content back from the server if it has changed (a short sketch of this follows at the end of this answer).
However, Last-Modified can also do this instead, and an ETag isn't needed at all if you don't force the browser to always check for new content (no must-revalidate).
The reason your CDN isn't using them is one of the following:
They are using Last-Modified instead
They don't force a revalidation and set an expiry time well in the future
They couldn't determine an ETag for a particular piece of content
Misconfiguration
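
To make the conditional-request mechanism concrete, here is a minimal sketch using Python's requests library; the URL is hypothetical:

import requests

# First request: the server returns the full body plus an ETag
r1 = requests.get("https://example.com/logo.png")
etag = r1.headers.get("ETag")

# Revalidation: echo the ETag back; the server answers 304 Not Modified
# (with an empty body) if the content hasn't changed
r2 = requests.get("https://example.com/logo.png",
                  headers={"If-None-Match": etag})
print(r2.status_code)  # 304 while the cached copy is still valid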

How does HSTS preload work on the backend?

I know that HSTS (HTTP to HTTPS) will work from the very first request if my site is registered in the preload list.
On the other hand, I am also declaring preload in the HSTS header on my web server.
If I access my site for the very first time over HTTP, which one takes effect first?
I mean, will the browser consult the preload list first, or contact the web server first?
You need to submit your site to the browsers' preload list. The maintainers will then vet that you are issuing the preload header (to prevent bad actors from submitting sites whose owners don't want it), and include it in the built-in list in a future release.
Some browsers also regularly scan or crawl websites looking for preload headers to include, though I believe this is done less often; it's better to explicitly submit your site.
After the site is included in the browser's preload list, any request for the http:// version will automatically be converted to https://. This happens before you send the request, so before you ever see the HSTS header in a response.
That's the point of preloading: to protect you before you even make a single request.
Personally, I'm not a fan of preload. Hard-coding a list of sites into a browser has obvious scaling issues but, more importantly, when you do that you're taking a risk with something you can't change back without waiting months or possibly years for browser vendors to pick up the reverted setting and remove the entry. I personally believe preload is overkill for most sites.
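
For reference, the header a site serves to qualify for the preload list looks like this; as I understand the submission requirements, a max-age of at least one year plus includeSubDomains are mandatory:

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload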

Securing HTTP referer

I develop software which stores files in directories with random names to prevent unauthorized users from downloading them.
The first precaution is to store them on a separate top-level domain (to prevent cookie theft).
The second danger is the HTTP referer, which may reveal the name of the secret directory.
My experiments with the Chrome browser show that the HTTP referer is sent only when I click a link inside my (secret) file. So the trouble is limited to files which may contain links (in Chrome, HTML and PDF). Can I rely on this behavior (the referer not being sent when the next page is opened not from a link in the current (secret) page, but by some other method such as entering the URL directly) for all browsers?
If so, the problem is limited to HTML and PDF files. But that alone is not a complete security solution.
I suspect that we can fully solve this problem by adding Content-Disposition: attachment when serving all our secret files. Will it prevent the HTTP referer from being sent?
Also note that I am going to use HTTPS so that a man-in-the-middle cannot download our secret files.
You can use the Referrer-Policy header to try to control referer behaviour. Note, however, that this requires clients to implement it.
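For example, to suppress the referer entirely, the response would carry:

Referrer-Policy: no-referrer
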
Instead of trying to conceal the file location, may I suggest you implement proper authentication and authorization handling?
I agree that Referrer-Policy is your best first step, but as DaSourcerer notes, it is not universally implemented on browsers you may support.
A fully server-side solution is as follows (a code sketch appears at the end of this answer):
User connects to .../<secret>
Server generates a one-time token and redirects to .../<token>
Server provides the document and invalidates the token
Now the referer will point to .../<token>, which is no longer valid. This has usability trade-offs, however:
Reloading the page will not work (though you may be able to address this with a cookie or session)
Users cannot share the URL from the URL bar, since it's technically invalid (in some cases that could be a minor benefit)
You may be able to get the same basic benefits without the usability trade-offs by doing the same thing with an IFRAME rather than redirecting. I'm not certain how IFRAME influences Referer.
This entire solution is basically just Referer masking done proactively. If you can rewrite the links in the document, then you could instead use Referer masking on the way out. (i.e. rewrite all the links so that they point to https://yoursite.com/redirect/....) Since you mention PDF, I'm assuming that this would be challenging (or that you otherwise do not want to rewrite the document).
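
For concreteness, here is a minimal sketch of the redirect flow above using Flask; the route paths, token store, and file locations are hypothetical, and a real deployment would add token expiry and path validation:

import secrets
from flask import Flask, abort, redirect, send_file

app = Flask(__name__)
tokens = {}  # token -> real file path; use a store with expiry in practice

@app.route("/files/<secret>/<name>")
def entry(secret, name):
    # Steps 1-2: generate a one-time token and redirect to it
    token = secrets.token_urlsafe(16)
    tokens[token] = f"/data/{secret}/{name}"
    return redirect(f"/t/{token}")

@app.route("/t/<token>")
def serve(token):
    # Step 3: serve the document and invalidate the token
    path = tokens.pop(token, None)
    if path is None:
        abort(404)  # token already spent, so a plain reload fails here
    return send_file(path)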

Can we use ETags to get the latest version of an image from a CDN?

We have a use case where we are storing our images in a CDN. Let's say we store a.jpg in the cache; if the user uploads a newer version of the file, it will flush the CDN cache and overwrite a.jpg. The challenge is that the browser might also have cached the file. Since we cannot flush the cached image in the browser, we are thinking of using one of the two approaches mentioned below:
Append a version: a_v1.jpg, a_v2.jpg (the version ID is the checksum). This eliminates the need to flush the browser and CDN caches. I found a lot of documentation about this on the internet, and many people are using it (see the sketch below).
Use the ETag of the file to eliminate the stale cache in the browser. I found that CDNs support ETags, but I did not find literature on ETags being used for images.
Can you please share your thoughts about using the ETag header for cache busting? Is it a good practice?
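
As an illustration of approach 1, a minimal sketch of deriving the versioned name from a checksum (the function name and suffix length are arbitrary):

import hashlib

def versioned_name(filename: str, data: bytes) -> str:
    # a.jpg -> a_<first 8 hex chars of the checksum>.jpg; the URL changes
    # whenever the content does, so stale browser caches are bypassed
    digest = hashlib.sha256(data).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}_{digest}.{ext}" if dot else f"{filename}_{digest}"

print(versioned_name("a.jpg", b"image bytes"))  # e.g. a_1f2e3d4c.jpg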
Well, I wouldn't suggest ETags. They have their advantages, but they have their setbacks as well. Say you are running two servers: the ETag for the same content served from each of these servers might differ.
The best thing I would suggest is to control what the browser is caching and for how long.
What I mean is: send expiry headers in the response from the CDN to the client browser, say a 5-minute TTL. The browser will respect the expiry header, and once it has expired, the browser will send a fresh request to the CDN when the page is refreshed.
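For example, the 5-minute TTL suggested above would be sent as:

Cache-Control: max-age=300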

nginx: Take action on proxy_pass response headers

I'd like to use nginx as a front-end proxy, but then have it conditionally proxy to another URL depending on the MIME type (Content-Type header) of the response.
For instance, suppose 1% of my clients are using a User-Agent that doesn't handle PNGs. For that UA, if the response is of type image/png, I want to proxy_pass again to a special URL that'll fetch the PNG and convert it for me.
Ideally I'd do this without hurting performance and caching for the 99% of users that don't need this special handling. I can't modify the backend application. (Otherwise I could have it detect the UA and fix the response, or send an X-Accel-Redirect to get nginx to run another location block.)
If this isn't possible or has bad performance, where would I look to start writing a module to achieve the desired effect? As in, which extension point gets me closest to implementing this logic?
Edit: It seems like I could use Lua to perform a subrequest and then inspect the response headers there. But that'd mean passing every request through Lua, which seems suboptimal.
Although I'm sure there could be valid reasons to do what you want to do, your actual image/png example is not as straightforward as it may look. Browsers no longer include image/png in their Accept HTTP request headers like they used to in the old days when PNG was new, so you'd have to maintain detection and mapping tables for the really, really old browsers.
Additionally, from the architecture perspective, it's not very clear what you are trying to accomplish.
If this is static data, then why are you proxying it to the backend in the first place, instead of serving the static data directly from nginx? Will the \.png$ regex not match the affected request URIs? Couldn't you solve this without involving a backend, or even by rewriting the request without sending the wrong one to the backend first?
If this is really dynamic, then why waste time making a request only to receive a reply of an unacceptable type? Instead of discarding such requests after the fact, you could build special-case mapping tables based on your knowledge of how the app works and bypass the needless requests from the start.
If the app is truly a black box and you require a general-purpose solution that'll work for any app, then it's still unclear what the use case is, and why the extra requests have to be made only to be discarded.
If you're really looking to mess with only 1% of your traffic as in your image/png example, then perhaps it might make sense to redirect all requests from the affected 1% of the old browsers to a separate backend, which would have the logic to do what you require.
Frankly, if you want to target the really, really old browsers, then I think PNG support should be the last of your worries. Many web apps include very complex and special-purpose JavaScript that wouldn't even work in new alternative browsers with new User-Agent strings, let alone old web browsers that didn't even have PNG support.
So, according to http://wiki.nginx.org/HttpCoreModule#.24http_HEADER, I assume you can have something along the lines of:

if ($content_type ~ "whatever your content type") {
    # $content_type is the request's Content-Type header; note also that
    # proxy_pass inside "if" must not include a URI part
    proxy_pass http://your_backend;
}
Would that be something that works for you?

Client with ETag always performs conditional GET

I am working on a new API. With ETags enabled, Chrome always conditionally GETs a resource even when it has a long cache lifetime; that is, Chrome no longer uses the cached version without first confirming the ETag.
Is this how ETags are supposed to work with GET?
If so, and considering that I don't need them for update race-checking, I will remove the ETags and alleviate the extra work my server is doing.
Edit
I would have expected Chrome to use the cached item according to the max-age until it expires, and then use a conditional GET to update its cached item.
As I write this, it has occurred to me that it would not be able to 'update its cached item', since the 304 contains no max-age with which to extend the expiry time.
So I guess that with ETags enabled, a client will always conditionally GET unless there's a good reason to use the cached version, such as having no network.
Thus, it appears that ETags harm server performance when no concurrency control is needed.
Edit
I think I've already guessed the answer (above) but here's the question in a nutshell:
When ETags are enabled and max-age is far in the future, should user agents always use a conditional GET instead of the previously cached, still-fresh response?
How you load your page in the browser might be relevant here:
If you press Ctrl and reload the page using the refresh button, that will cause an unconditional reload of your resources with 200s returned.
If you just reload using the refresh button (or equivalent key like F5), conditional requests will be sent and 304s will be returned for static resources.
If you press enter in the address box, add the page to your bookmarks and load from there, or visit the page from a hyperlink, the cache max-age should work as expected.
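
For illustration (the values are made up), a response like the following should be reused straight from cache during normal navigation until the max-age elapses, while refresh-button reloads revalidate with If-None-Match and typically get a 304 back:

HTTP/1.1 200 OK
Cache-Control: max-age=31536000
ETag: "v123"
Content-Type: image/png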
With ETags enabled, Chrome always conditionally GETs a resource with a long cache lifetime, [...] Is this how ETags are supposed to work with GET?
The answer is in the RFC:
10.3.5 304 Not Modified
If the client has performed a conditional GET request and access is
allowed, but the document has not been modified, the server SHOULD
respond with this status code.
So, yes.
If this is not the answer you expect, you might want to include some actual traffic in your question that shows the order of request-response pairs that you expect and that you actually get.
considering that I don't need them for update race-checking
If you're not going to use conditional requests (If-Match and the like), you may indeed omit the ETag calculation and processing.
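
For completeness, that kind of race-checking looks like this on the wire (the URL and ETag value are made up): the client echoes the ETag it last saw, and the server refuses the write with 412 Precondition Failed if the resource changed in the meantime.

PUT /api/things/42 HTTP/1.1
If-Match: "v123"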
