Should Cache-Control max-age be set if marked public? - http

I was trying to figure out the best cache headers for my site and couldn't find a good resource about setting both max-age and public.
In my situation I have a number of files that I am not worried about expiring; if I were to set max-age, it would probably be a month or a year. But my question is, should max-age be omitted, specifying just Cache-Control: public? Isn't that essentially saying "cache for as long as you can"?
TL;DR Are there any advantages/disadvantages to setting max-age if public is set and the objects have an indefinite expiry time?

Cache-Control: public means that the resource is publicly available, and thus may be stored in a cache shared by several users (such as a corporate Internet proxy). It has nothing to do with how long a resource may be cached; that is what max-age controls. See RFC 2616.
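To make the distinction concrete, here is a minimal sketch of splitting a Cache-Control value into its directives. The function name parse_cache_control is hypothetical, and the naive comma split ignores quoted arguments, so treat it as an illustration rather than a full parser.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {directive: argument} mapping.

    Simplification: splits on commas, so quoted arguments that themselves
    contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


only_public = parse_cache_control("public")
with_lifetime = parse_cache_control("public, max-age=31536000")

# "public" alone carries no lifetime; only max-age does.
print("max-age" in only_public)   # False
print(with_lifetime["max-age"])   # 31536000
```

With only public present, a cache has no explicit lifetime to work with, which is why adding max-age is still worthwhile for long-lived files.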

Related

Difference of 304 and 200 (from disk cache)

nginx version: nginx/1.14.0 (Ubuntu)
Trying to study how to deal with browser cache.
Could you explain to me why in case of html the browser sent request to the server, whereas in case of css it didn't?
In other words, why in case of html we have 304, but in case of css we have 200 (from disk cache)?
The server didn't provide any information to the browser on how long to cache its resources. (That is, it didn't include Cache-Control or Expires headers.) Therefore the browser is free to come up with its own heuristic freshness, as described in RFC 7234:
Since origin servers do not always provide explicit expiration times,
a cache MAY assign a heuristic expiration time when an explicit time
is not specified, employing algorithms that use other header field
values (such as the Last-Modified time) to estimate a plausible
expiration time.
Presumably the browser assigned a longer freshness time to the static CSS resource than it did to the HTML page. Which makes sense.
If you care how your resources are cached, the answer is simple—provide explicit direction using the appropriate cache headers.
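As a rough illustration of heuristic freshness, the sketch below uses the approach RFC 7234 mentions as typical: a fraction (10% here) of the time between Last-Modified and the response date. The function name and the sample dates are assumptions; real browsers apply their own rules, so this does not claim to reproduce what any particular browser did in the question.

```python
from datetime import datetime, timedelta, timezone


def heuristic_freshness(date: datetime, last_modified: datetime,
                        fraction: float = 0.1) -> timedelta:
    """Estimate a freshness lifetime as a fraction of the resource's age."""
    age_at_response = date - last_modified
    if age_at_response < timedelta(0):
        # Last-Modified in the future: treat the response as not fresh.
        return timedelta(0)
    return age_at_response * fraction


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
css_lifetime = heuristic_freshness(now, now - timedelta(days=100))
html_lifetime = heuristic_freshness(now, now - timedelta(hours=1))

print(css_lifetime)    # 10 days, 0:00:00
print(html_lifetime)   # 0:06:00
```

A stylesheet untouched for months gets a long lifetime and is served from disk cache, while a recently modified HTML page becomes stale within minutes and triggers a conditional request.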

Allow reverse proxy cache but not browser cache

There are so many questions asking "how to make sure the pages are not cached", and answers like "this is how to instruct both clients and proxy servers not to cache". I'm instead looking for a way to achieve "allow proxy caching but not client (i.e. browser) caching".
In fact, I found that setting no cache-related headers can achieve this, but I'm not sure if this is the right way to do it and there's no explicit way of instructing it.
In HTTP jargon these caches are referred to as shared or public caches (proxies) and private caches (browsers).
You should use a response header similar to this:
cache-control: public, max-age=0, s-maxage=${seconds}
Here ${seconds} is the TTL of the cached elements. The key is the s-maxage directive.
If a response includes an s-maxage directive, then for a shared cache (but not for a private cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header. The s-maxage directive also implies the semantics of the proxy-revalidate directive (see section 14.9.4), i.e., that the shared cache must not use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. The s-maxage directive is always ignored by a private cache.
Notice that this header does not stop the browser from storing the resource and using it to serve a future request. Instead, it forces the browser to revalidate the stored copy first (which is possible if a Last-Modified or ETag header is also returned).
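The effect of s-maxage can be sketched as a small function that picks the freshness lifetime a cache would use, depending on whether it is shared. The function name freshness_lifetime and the value 600 are illustrative assumptions, and the parsing is deliberately naive.

```python
from typing import Optional


def freshness_lifetime(cache_control: str, shared: bool) -> Optional[int]:
    """Return the lifetime in seconds that a cache would apply, or None."""
    directives = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.lower()] = arg
    # s-maxage overrides max-age, but only for shared caches.
    if shared and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None


header = "public, max-age=0, s-maxage=600"
print(freshness_lifetime(header, shared=True))    # 600
print(freshness_lifetime(header, shared=False))   # 0
```

The proxy may serve its copy for ten minutes, while the browser sees a lifetime of zero and has to revalidate on every use.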

Why does Google PageSpeed ask to specify ETag even when cache headers are set

I have set cache headers to be far in the future (1 year from now) and have disabled ETags as advised by YSlow (http://developer.yahoo.com/performance/rules.html#etags), but Google PageSpeed seems to require ETag (or Last-Modified) even after the cache headers are set.
"It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources."
The two rules seem to conflict with each other.
YSlow does not advise removing ETags in general, only in some environments. If you do not use ETags, you should use Last-Modified instead.
ETag and Last-Modified are for conditional GET-Requests when re-requesting an already cached and maybe expired resource.
Cache-Control max-age is for defining how long a cached item is valid for sure without asking again. (When expired by this rule then the browser will make a conditional GET ...)
So in your case:
Browser caches the resource for one year. Within that year it makes no request for this resource at all; it is served directly from the local cache. (Uses the Cache-Control header settings.)
Browser makes a conditional request after the year has expired to check whether something changed. The server responds with HTTP 304 and an empty body when nothing changed. In that case the browser continues to use its cached item without retransmission. (Uses the ETag and/or Last-Modified header settings.)
(The browser may or may not respect your settings. For example, a browser may make a conditional request before the year has passed.)
For highly optimized sites Cache-Control is far more important, because you set far-future expiry headers and simply change the URL of the resource whenever it changes. While this prevents the use of conditional requests, it lets you be extremely aggressive when defining the expiry while still serving new versions of the resource immediately to everybody at the same time, because the new URL looks like a new resource to the browser.
For Java there exists a framework called jawr which makes use of these and other concepts without having negative impact to your site development.
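The URL-versioning idea described above can be sketched without any framework: derive part of the file name from a hash of the content, so a changed file automatically gets a new URL. The function name fingerprinted_path, the 8-character digest length, and the sample path are assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_path(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = fingerprinted_path("/static/app.js", b"console.log('v1');")
v2 = fingerprinted_path("/static/app.js", b"console.log('v2');")

# Different content yields a different URL, so a stale cached copy is never reused.
print(v1 != v2)   # True
```

Pages that reference the file must be generated with the fingerprinted name, which is the part that build tools usually automate.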
ETag and Cache-Control headers are not mutually exclusive. The reason the page you linked to recommends removing ETags is to reduce the size of the HTTP headers, which will at best save you a few bytes. Here's a use case showing where and why it still makes sense to have both:
You provide application.js with one week expiry date, and an etag fingerprint
A week passes and the user comes back to your site: the file has expired, so the browser dispatches a conditional request. If the file has not been modified, the server answers 304 Not Modified and the browser can skip downloading the file again. (Last-Modified works too.)
If you don't provide an ETag or Last-Modified, the browser has to request and download the entire file.
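The conditional-request exchange described above can be sketched from the server's side. The names make_etag and handle_get are hypothetical, and the comparison is simplified: it only handles a single strong ETag in If-None-Match, not lists, weak validators, or "*".

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(request_headers: Dict[str, str],
               body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, response body) for a GET."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=604800"}  # one week
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is still current: send no body.
        return 304, headers, b""
    return 200, headers, body


script = b"console.log('hello');"
first_status, first_headers, _ = handle_get({}, script)
second_status, _, second_body = handle_get(
    {"If-None-Match": first_headers["ETag"]}, script)

print(first_status, second_status)   # 200 304
```

Without a validator such as ETag or Last-Modified the client has nothing to send in the second request, so every revalidation becomes a full download.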
Good related resource: https://developers.google.com/speed/articles/caching

HTTP Headers: Is Cache-Control enough, or do I still need Expires?

HTTP 1.1 introduced a new class of headers, Cache-Control response headers, to give Web publishers more control over their content, and to address the limitations of Expires.
Expires is kind of a pain due to its limitations. First, because there’s an absolute date involved, the clocks on the Web server and the client's cache must be synchronized; if they have a different idea of the time, the intended results won’t be achieved, and caches might wrongly consider stale content as fresh.
Another problem with Expires is that it’s easy to forget that you’ve set some content to expire at a particular time. If you don’t update an Expires time before it passes, each and every request will go back to your Web server, increasing load and latency.
So, do we need to use Expires anymore, or is Cache-Control (specifically, max-age set to some far future number of seconds) enough for my static content? I'd like to avoid using Expires, but should I set both?
Generally speaking, it's considered a best-practice to use both, as Expires will be understood by even HTTP/1.0 proxies and clients (rare though they may be).
Almost all server platforms will dynamically calculate the Expires header for you.
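If your platform does not calculate it for you, deriving Expires from the same lifetime as max-age is straightforward. The sketch below uses Python's standard library; the function name expiry_headers and the fixed sample date are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def expiry_headers(max_age: int, now: datetime) -> Dict[str, str]:
    """Build matching Cache-Control and Expires headers.

    `now` must be timezone-aware and in UTC, as required by usegmt=True.
    """
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"max-age={max_age}",
        "Expires": format_datetime(expires, usegmt=True),
    }


headers = expiry_headers(31536000, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(headers["Cache-Control"])   # max-age=31536000
print(headers["Expires"])         # Tue, 31 Dec 2024 00:00:00 GMT
```

Because Expires is recomputed for each response, it never silently lapses the way a hard-coded date would. Caches that understand max-age give it precedence over Expires.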

Ideal HTTP cache control headers for different types of resources

I want to find a minimal set of headers, that work with "all" caches and browsers (also when using HTTPS!)
On my web site, I'll have three kinds of resources:
(1) Forever cacheable (public / equal for all users)
Example: 0A470E87CC58EE133616F402B5DDFE1C.cache.html (auto generated by GWT)
These files are automatically assigned a new name, when they change content (based on the MD5).
They should get cached as much as possible, even when using HTTPS (so I assume, I should set Cache-Control: public, especially for Firefox?)
They shouldn't require the client to make a round-trip to the server to validate, if the content has changed.
(2) Changing occasionally (public / equal for all users)
Examples: index.html, mymodule.nocache.js
These files change their content without changing the URL, when a new version of the site is deployed.
They can be cached, but probably need a round-trip to be revalidated every time.
(3) Individual for each request (private / user specific)
Example: JSON responses
These resources should never be cached unencrypted to disk under any circumstances. (Except that I may have a few specific requests that could be cached.)
I have a general idea on which headers I would probably use for each type, but there's always something I could be missing.
I would probably use these settings:
Cache-Control: max-age=31556926 – Representations may be cached by any cache. The cached representation is to be considered fresh for 1 year:
To mark a response as "never expires," an origin server sends an
Expires date approximately one year from the time the response is
sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one
year in the future.
Cache-Control: no-cache – Representations are allowed to be cached by any cache. But caches must submit the request to the origin server for validation before releasing a cached copy.
Cache-Control: no-store – Caches must not cache the representation under any condition.
See Mark Nottingham’s Caching Tutorial for further information.
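The three settings above could be applied with a small dispatch function. This is only a sketch: the function name cache_headers_for and the path rules (a ".cache.html" suffix for fingerprinted files, an "/api/" prefix for per-user JSON) are assumptions loosely based on the question's examples, not a recommendation for any particular server.

```python
from typing import Dict

ONE_YEAR = 31536000  # seconds


def cache_headers_for(path: str) -> Dict[str, str]:
    """Pick a Cache-Control policy by resource class (hypothetical rules)."""
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".cache.html"):
        # (1) Content-addressed files: the name changes when the content does.
        return {"Cache-Control": f"public, max-age={ONE_YEAR}"}
    if path.startswith("/api/"):
        # (3) Per-user responses: must not be stored anywhere.
        return {"Cache-Control": "no-store"}
    # (2) Stable URLs with changing content: store, but revalidate every time.
    return {"Cache-Control": "no-cache"}


print(cache_headers_for("/0A470E87CC58EE133616F402B5DDFE1C.cache.html"))
print(cache_headers_for("/index.html"))
print(cache_headers_for("/api/profile"))
```

Keeping the policy in one function makes it easier to check that every resource class has an explicit rule instead of falling back to heuristic caching.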
Cases one and two are actually the same scenario.
You should set Cache-Control: public and then generate a URL which includes the build number / version of the site, so that you have immutable resources that could potentially last forever.
You also want to set the Expires header about a year in the future so that the client will not need to issue a freshness check. (RFC 2616, quoted above, advises against Expires dates more than one year ahead.)
For case 3, you could use all of the following for maximum flexibility:
"Cache-Control", "no-cache, must-revalidate"
"Expires", 0
"Pragma", "no-cache"
