"Uncache" a resource - http

Using the browser cache offers the big advantage of saving a lot of traffic and making your site faster. The big disadvantage, however, is that cached resources cannot be "uncached" until they expire and the browser requests them again. Or can they?
Is there a way to explicitly tell the browser (in a separate request, via JavaScript, etc.) to uncache a certain resource?
I know of appending version strings like image.jpg?12342, but I'm looking for a more elegant alternative.
ETags are a cool thing, but they don't really cache: we may save sending the actual resource payload, but the browser still makes a request.

You might want to check out a cache manifest, especially if you don't have access to the server.
Yes, it is primarily used to "cache" files so that web apps can be used offline; however, you can also explicitly declare certain URIs as non-cached, and versioning your cache manifest will trigger a reload of the listed URIs.
CACHE MANIFEST
# Version x
NETWORK:
uri-path.here
http://html5doctor.com/go-offline-with-application-cache/
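If you go this route, a small piece of script can react once the browser has re-downloaded the cached files after a manifest version bump. A minimal sketch using the (now-deprecated) AppCache JavaScript API:
// Fires after the browser has re-downloaded the cached files because the manifest changed.
window.applicationCache.addEventListener('updateready', function () {
  if (window.applicationCache.status === window.applicationCache.UPDATEREADY) {
    window.applicationCache.swapCache(); // activate the freshly downloaded cache
    window.location.reload();            // the next load uses the new resources
  }
});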

Related

How to display the cached version first and check the etag/modified-since later?

With caching headers I can either make the client not check online for updates for a certain period of time, and/or check the ETag every time. What I do not know is whether I can do both: use the cached version first, but meanwhile, in the background, check for an update. If there is a new version, it would be used the next time the page is opened.
For a page that is completely static except when the user changes it themselves, this would be much more efficient than having to block on checking the ETag every time.
One workaround I thought of is using JavaScript: set headers to cache the page indefinitely and have some JavaScript make a request with an If-Modified-Since header or something similar, which could then dynamically update the page. The big issue with this is that it cannot invalidate the existing cache, so it would have to keep dynamically updating the page, theoretically forever. I'd also prefer to keep it pure HTTP (or HTML, if there is some tag that can do this), but I cannot find any relevant hits online.
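A rough sketch of what I mean (the storage key and the way the page is refreshed are just placeholders):
var since = localStorage.getItem('pageLastModified'); // hypothetical storage key
fetch(window.location.href, {
  cache: 'no-store', // bypass the browser's HTTP cache for this background check
  headers: since ? { 'If-Modified-Since': since } : {}
}).then(function (response) {
  if (response.status === 200) { // a 304 would mean the cached copy is still current
    localStorage.setItem('pageLastModified', response.headers.get('Last-Modified'));
    response.text().then(function (html) {
      document.open();      // crude full-page refresh; a real version would patch the DOM
      document.write(html);
      document.close();
    });
  }
});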
A related question mentions "the two rules of caching": never cache HTML and cache everything else forever. Just to be clear, I mean to cache the HTML. The whole purpose of the thing I am building is for it to be very fast on very slow connections (high latency, low throughput, like EDGE). Every roundtrip saved is a second or two shaved off of loading time.
Update: reading more caching resources, it seems the Vary: Cookie header might do the trick in my case. I would like to know if there is a more general solution though, and I haven't really dug into the Vary header yet, so I don't know whether that works.
Solution 1 (HTTP)
There is a cache-control extension, stale-while-revalidate, which describes exactly what you want.
When present in an HTTP response, the stale-while-revalidate Cache-Control extension indicates that caches MAY serve the response in which it appears after it becomes stale, up to the indicated number of seconds.
If a cached response is served stale due to the presence of this extension, the cache SHOULD attempt to revalidate it while still serving stale responses (i.e., without blocking).
Cache-Control: max-age=60, stale-while-revalidate=86400
When the browser first requests the page, it caches the result for 60s. During that 60s period, requests are answered from the cache without contacting the origin server. During the next 86400s, content is served from the cache and simultaneously re-fetched from the origin server. Only once both periods (60s + 86400s) have expired will the cache stop serving cached content and wait for fresh data from the origin server.
This solution has only one drawback: I was not able to find any browser or intermediate cache that currently supports this cache-control extension.
Solution 2 (Javascript)
Another solution is to use Service Workers and their ability to construct custom responses to requests. In combination with the Cache API, this is enough to provide the requested feature.
The problem is that this solution only works in browsers (not in intermediate caches or other HTTP services), and not all browsers support Service Workers and the Cache API.
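For illustration, a minimal Service Worker sketch of the stale-while-revalidate behaviour (the cache name is arbitrary):
// service-worker.js: answer from the cache immediately, refresh the cache in the background
self.addEventListener('fetch', function (event) {
  event.respondWith(
    caches.open('pages-v1').then(function (cache) {
      return cache.match(event.request).then(function (cached) {
        var refresh = fetch(event.request).then(function (response) {
          cache.put(event.request, response.clone()); // store the fresh copy for next time
          return response;
        });
        return cached || refresh; // serve stale if we have it, otherwise wait for the network
      });
    })
  );
});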

Long waiting (TTFB) time for scripts / styles on Azure Website

I have this intriguing problem on an Azure Website. My website uses 4 script files and 3 style files, each minified. They are not that big; the biggest is close to 200 KB. The website has already started up, and Azure's Always On option is turned on. When I call the Web API for data, it returns in <50 ms.
Yet when the app is reloaded, it needs 250 ms just to get the first byte of the tiniest script, and the others need much more. The initial HTML is loaded in 60 ms. Scripts/styles are cached, so they are not downloaded again, but the TTFB is killing the performance. This repeats on every single reload. The app does not contain any sophisticated configuration, so it should run much faster than this.
What can cause such problems?
Although your static files are cached, the browser still issues requests with an If-Modified-Since header (which results in a 304).
While it doesn't need to download the actual content, it still needs to wait for the RTT plus the server think time before it can continue.
I would suggest two things:
Adding Cache-Control and Expires headers - will help avoid the 304 in some cases (pretty much unless you hit F5); see the example after this list
Using a proper CDN - such as Incapsula or others, which will minimize the RTT plus think time. It can also be used to easily control cache settings for various resources.
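For example (the one-year lifetime and the date are only illustrative):
Cache-Control: public, max-age=31536000
Expires: Thu, 31 Dec 2037 23:59:59 GMT
With these in place, repeat views can reuse the cached files without even sending the conditional request.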
More good stuff here.
Good Luck!
From here:
As you saw earlier, IIS 7 caches the compressed versions of static files. So, if a request arrives for a static file whose compressed version is already in the cache, it doesn't need to be compressed again.
But what if there is no compressed version in the cache? Will IIS 7 then compress the file right away and put it in the cache? The answer is yes, but only if the file is being requested frequently. By not compressing files that are only requested infrequently, IIS 7 saves CPU usage and cache space.
By default, a file is considered to be requested frequently if it is requested two or more times per 10 seconds.
So, the reason your users are being served an uncompressed version of the JavaScript file is that it didn't meet the default threshold for being compressed; in other words, the JavaScript file was not requested 2 times within 10 seconds.
To control this, there is one attribute we must change on the <serverRuntime> element, which controls compression: frequentHitThreshold. In order for your file to be compressed when it is requested once, change your <serverRuntime> element to look like this:
<serverRuntime enabled="true" frequentHitThreshold="1" />
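For reference, in a site-level web.config this attribute would sit on the same element under system.webServer (assuming the serverRuntime section is unlocked for overrides at that level):
<configuration>
  <system.webServer>
    <serverRuntime enabled="true" frequentHitThreshold="1" />
  </system.webServer>
</configuration>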
This will slightly impact your CPU performance if you serve many JavaScript files to frequent visitors, but if you have enough traffic for compressing these files to impact the CPU, then they are most likely already compressed and cached!
My guess would be Azure's Always On.
If it works anything like the one CloudFlare provides, it essentially proxies the request and tries to cache it.
Depending on the exact implementation of this cache on Azure's side, it might wait for the script's output to complete in order to cache/validate it and only then pass it on to the browser.
You might have a chance by checking the caching configuration and disabling Always On for your scripts, if possible.
The scripts and styles are static files and by default are compressed before being sent to the client (you can check this via the HTTP header "Content-Encoding: gzip"). So the TTFB consists of the network latency, the browser's HTTP channel scheduling and the static file compression time on the server.
On the other hand, your Web API data is dynamic and by default is not compressed, so it is possible that its TTFB is lower than the TTFB for the static files.
However, you should not switch off static compression: the TTFB might be minimized, but the content transfer time would be extended. Actually, you don't need to worry about TTFB too much; see more info: https://blog.cloudflare.com/ttfb-time-to-first-byte-considered-meaningles/
I ended up storing the files in Azure Storage and serving them through the Azure CDN. It gives fast responses and costs next to nothing. I add the files to blob storage on every publish, in a pre-build event, using Gulp.
Well... there are 2 main problems with your site:
you are using Azure - a high-priced service with poor performance... don't ask me why people think this is a good service
you are storing client files side by side with the server files; while server files have to live on a specific server, client files can practically be served from... anywhere
So please use a CDN (or any other server) for your client-side files (mainly CSS and JS; you may consider moving fonts and images as well)

Are "forever" and "never" the only useful caching durations?

In the presentation "Cache is King" by Steve Souders (at around 14:30), it is implied that there are in practice only two caching durations that you should use for your resources: "forever" and "never" (my own terminology).
"Forever" means that you effectively make the resource permanently immutable by setting a very high max age, such as one year. If you want to modify the resource at some point, the presentation suggests, you simply publish the modified resource at a different URL. (It is suggested that this renaming is necessary, in part or entirely, because of the large number of misconfigured proxies on the Internet.)
"Never" means that you effectively disable all forms of caching and require browsers to download the resource every time it is requested.
On the one hand, any performance advice given by the head performance engineer at Google carries weight on its own. On the other hand, HTTP caching was presumably designed with variable cache durations for a reason (not just "forever" and "never"), and changing the URL to a resource only because the resource has been modified seems to go against the spirit of HTTP.
Are "forever" and "never" the only cache durations that you should use in practice? Is this in conflict with other best practices on the web?
In addition to the typical "user with a browser" use case, I would also like to know how these principles apply to REST/hypermedia APIs.
Many people would disagree with limiting yourself to "forever" or "never" as you describe it.
For one thing, it ignores the option of allowing caching while always revalidating. In this case, if the client (or proxy) has cached the resource, it sends a conditional HTTP request. If the client/proxy has cached the latest version of the resource, then the server sends a short 304 response rather than the entire resource. If the client's (or proxy's) copy is out of date, then the server sends the entire resource.
With this scheme, the client will always get an up-to-date version of the resource, and if the resource doesn't change much bandwidth will be saved.
To save even more bandwidth, the client can be instructed to revalidate only when the resource is older than a certain period of time.
And if bad proxies are a problem, the server can specify that only clients and not proxies may cache the resource.
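In Cache-Control terms, those options look roughly like this (the lifetimes are illustrative):
Cache-Control: no-cache (store it, but revalidate on every use)
Cache-Control: max-age=3600 (revalidate only once the copy is an hour old)
Cache-Control: private, max-age=3600 (only the browser, not shared proxies, may cache it)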
I found this document pretty concisely describes your options for caching. This page is longer but also gives some excellent information.
"It depends" really, on your use case, what you are trying to achieve, and your branding proposition.
If all you want to achieve is some bandwidth saving, you could do a total cost breakdown. The serving cost might not amount to much. Browsers are pretty smart at optimizing image hits anyway, for example, so know your HTTP protocol. "Forever", combined with versioned resource URLs and URL rewrite rules, might be a good fit, as the Google engineer suggested.
Resource volatility is another factor. If you are only serving daily stock charts, for example, they could safely be cached for some time, but not forever.
Are your computation costs heavy? Are your users sensitive to timeliness? Is the data live or fixed? For example, you might be serving airline routes, the path of a hurricane, option greeks or a BI report to the COO. You might want it cached, but the TTL will likely vary by user class, all the way down to never. "Forever" cannot work for live data, but "never" might be the wrong answer too.
Degree of cooperation between the server and the client may be another factor. For example in a business operations environment where procedures can be distributed and expected to be followed, it might be worthwhile to again look at TTLs.
HTH. I doubt if there is a magical answer.
Ideally, you should cache until the content changes; if you cannot clear/refresh the cache when the content changes, you need a duration. But indeed, if you can, cache forever or do not cache at all. There is no need to refresh if you already know nothing has changed.
If you know that the underlying data will be static for any length of time, caching makes sense. We have a web service that exposes data from a database that is populated by a nightly ETL job from an external source. Our RESTful web service only goes to the database when it changes. In our case, we know exactly when the data changes and we invalidate the cache right after the ETL process finishes.

Advantages of ETags versus updating the URL

ETags allow browsers to perform conditional GETs. Only if the resource in question has been altered will the resource have to be re-downloaded. However, the browser still has to wait for the server to respond to its request.
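A typical conditional GET exchange looks like this (the ETag value is made up):
GET /css/styles.css HTTP/1.1
If-None-Match: "33a64df5"

HTTP/1.1 304 Not Modified
ETag: "33a64df5"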
An alternative to ETags is to introduce a token into the URL pointing to the resource:
http://example.com/css/styles.css?token=134124134
or
http://example.com/css/134124134/styles.css
Both approaches avoid having to re-download an unchanged resource.
However, using URLs with tokens allows the server to set a far-future expiry header on the resource. This saves the round trip taken up by a conditional GET - if the resource is unchanged then the URL pointing to it will be unchanged.
Are there any advantages to using ETags over URLs with tokens?
The major downside for read-only resources that I see is that if we all took this approach for all static resources then client caches would start to fill with all sorts of out-dated resources.
Also, think of all the intermediary caches that would start holding loads of useless files.
You are fighting against the web with this approach and if it became popular then something would have to change because it is not a scalable solution.
Could there be some kind of hybrid approach where you use a limited set of tokens and set the expiry small enough that an old cached resource would expire before the token was reused?
ETags are also used for read-write resources, and in that case I suspect the token solution just does not work.
I think the biggest difference/potential advantage would be configuration: the URL setting must be configured/set up inside the application (e.g., the HTML actually must include the value). ETags are configured for the entire web server, and the HTML doesn't have to be modified to take advantage of them.
Also, the ETags will (assuming they are configured correctly) change when the file they point at changes; adding a token to the URL requires some additional "thing" that tells it to change (either a person editing the HTML, some configuration setting, etc.).
Have a constant URI?

Single web server and ETags

Does anyone know if it is worth disabling ETags on a web application that is hosted on a single web server? Currently we don't make use of ETags in our application.
If it is worth disabling them - why?
Many thanks.
I don't know if this helps, but you can read about etags here:
http://developer.yahoo.net/blog/archives/2007/07/high_performanc_11.html
and here is what Jeff Atwood thinks about ETags:
ETags are a checksum field served up with each server file so the client can tell if the server resource is different from the cached version the client holds locally. Yahoo recommends turning ETags off because they cause problems on server farms due to the way they are generated with machine-specific markers. So unless you run a server farm, you should ignore this guidance. It'll only make your site perform worse because the client will have a more difficult time determining if its cache is stale or fresh. It is possible for the client to use the existing last-modified date fields to determine whether the cache is stale, but last-modified is a weak validator, whereas Entity Tag (ETag) is a strong validator. Why trade strength for weakness?
An interview with Steve Souders on .NET Rocks may also help:
Steve Souders: ... in the default implementation of IIS and Apache, both of those servers put something in the ETag that makes it very likely that, if the user ever has to check the validity of that resource, the browser is going to be incorrectly told that the resource is no longer valid. In Apache's case, what they put in the ETag is the inode number of the file on that web server, so if you have more than one web server hosting your site - which most large websites do - that inode number is never going to match across two servers. So if yesterday the user went to server one and today they try to validate that resource and they go to server two, the ETag is not going to match, and the ETag overrides the last-modified date, so instead of just returning a 200-byte 304 response, the server has to return a 50k response with the entire image.
"If you’re hosting your website on one server, it isn’t necessary to remove ETags. The same ETag will be used every time and the validation check will take place efficiently and correctly."
Source: Dean Hume

Resources