How in Flex to load cached preloaded images - apache-flex

In my application, I make numerous calls to preload images into the browser cache in the background using Loader instances, ignoring the complete event. I don't store the results in the application; I want them stored in the browser cache instead. The images have long Expires header dates.
When I want to use a particular image(s), I again use a Loader instance and call the same url and listen for the complete event to load the file to an Image.
The problem is that when I re-request the url for the "cached" image, it is making an http request with response 200 status, which I presume means it is hitting the server.
How do I make sure that a request for a cached image never hits the server from Flex?
In general, I am finding that any request to a url for a cached image (with a long expires header) is making another request to the server, or at least that is my interpretation of it in Firebug.
Any ideas how to do this? Or am I misinterpreting what Firebug is telling me?
Thanks.

So, yes, I was misinterpreting Firebug. It turns out that Firebug logs the URL request and it looks like a normal request. However, if you monitor the network with a tool like Wireshark, you will notice that there are no outgoing packets to the URL for the cached images. Flex does load the cached images.
Just to be safe on the image caching, I added the following Cache-Control header (though I think the Expires header is enough; it is set one year out at the time of posting this).
Cache-Control: max-age=31536000, must-revalidate
Expires: Thu, 01 Dec 2011 16:00:00 GMT
So, if you set the cache headers correctly (note that an Expires header whose value is not a valid date does not work), Flex will load from the cache when you request the URL of a cached image.
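Since an invalid date in the Expires header breaks caching, it can help to generate the value rather than type it by hand. The following is a minimal Python sketch, not part of the original setup: the function name long_cache_headers is hypothetical, and it assumes the server side can run Python and set response headers itself.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000, the max-age used above


def long_cache_headers(now: datetime) -> dict:
    """Build Cache-Control and Expires headers one year out from `now`.

    `now` must be a timezone-aware UTC datetime.
    """
    expires = now + timedelta(seconds=ONE_YEAR_SECONDS)
    return {
        "Cache-Control": f"max-age={ONE_YEAR_SECONDS}",
        # usegmt=True produces the HTTP-date form ending in "GMT"
        "Expires": format_datetime(expires, usegmt=True),
    }


headers = long_cache_headers(datetime(2010, 12, 1, 16, 0, 0, tzinfo=timezone.utc))
print(headers["Expires"])  # Thu, 01 Dec 2011 16:00:00 GMT
```

Deriving both headers from the same timestamp keeps max-age and Expires consistent with each other.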

Related

Caching strategy using ETag and Expires/Cache-control with no assets version/ID

After reading a lot about caching validators (more intensively after reading this answer on SO), I was left with a question I couldn't find answered anywhere.
My use case is serving a static asset (a JavaScript file, e.g. https://example.com/myasset.js) to be used on other websites, so not hurting their PageSpeed/GTmetrix scores matters the most.
I also need their users to receive updated versions of my static asset every time I deploy new changes.
For this, I have the following response headers:
Cache-Control: max-age=10800
etag: W/"4efa5de1947fe4ce90cf10992fa"
In short, the browser behaves as follows when using ETag:
For the first request, the browser has no value for the If-None-Match request header, so the server sends back status code 200 (OK), the content itself, and a response header with the ETag value.
For subsequent requests, the browser adds the previously received ETag value in the form of an If-None-Match request header. The server can then compare this value with the current ETag and, if both match, return 304 (Not Modified), telling the browser to keep using its cached copy of the file; otherwise it returns 200 followed by the new content and the new ETag value.
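The two-step flow above can be sketched from the server's point of view. This is a simplified Python illustration under stated assumptions: make_etag and respond are hypothetical names, the ETag is derived from a SHA-256 hash of the body, and the comparison is an exact string match against a single If-None-Match value (real servers also handle lists of tags and weak comparison).

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a weak ETag from the content (the hash choice is an assumption)."""
    return 'W/"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=10800"}
    if if_none_match == etag:
        # Validator matches: the client's cached copy is still current.
        return 304, headers, b""
    return 200, headers, body


first = respond(b"console.log('v1');", None)                 # no validator yet
second = respond(b"console.log('v1');", first[1]["ETag"])    # unchanged file
third = respond(b"console.log('v2');", first[1]["ETag"])     # file was redeployed
print(first[0], second[0], third[0])  # 200 304 200
```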
However, I couldn't find any information about how the Cache-Control: max-age header affects the above behavior, for example:
Will the browser request new updates before max-age has elapsed? That would mean I could define a higher max-age value (PageSpeed/GTmetrix will be happy about it) and force the refresh using only the ETag fingerprint.
If not, then what are the advantages of using ETag and adding extra bits to the network?
No, the browser will not send any requests until max-age has passed.
The advantage of using ETag is that, if the file hasn't changed, you don't need to resend the entire file to the client. The response will be a small 304.
Note that you can achieve the best of both worlds by using the stale-while-revalidate directive, which allows stale responses to be served while the cache silently revalidates the resource in the background.
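To make the interaction between max-age and stale-while-revalidate concrete, here is a small Python sketch of the decision a cache makes for a stored response. The function name cache_decision and the 600-second stale-while-revalidate window are assumptions for illustration; a real cache also has to account for other directives.

```python
def cache_decision(age: int, max_age: int, stale_while_revalidate: int = 0) -> str:
    """Classify a cached response by its age in seconds.

    "fresh":                  serve from cache, no request is sent
    "stale-while-revalidate": serve from cache, revalidate in the background
    "revalidate":             ask the server (e.g. with If-None-Match) before reuse
    """
    if age < max_age:
        return "fresh"
    if age < max_age + stale_while_revalidate:
        return "stale-while-revalidate"
    return "revalidate"


# Cache-Control: max-age=10800, stale-while-revalidate=600 (600 is an assumed value)
print(cache_decision(3600, 10800, 600))   # fresh
print(cache_decision(10900, 10800, 600))  # stale-while-revalidate
print(cache_decision(20000, 10800, 600))  # revalidate
```

Only the last case costs the user a round trip, and with an ETag that round trip can end in a small 304 instead of a full download.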

Checking if HTTP resource has changed after maximum cache time has expired

I'm trying to work out a new caching policy for the static resources on a website. A common problem is whenever javascript, CSS etc. is updated, many users hold onto stale versions because currently there are no caching specific HTTP headers included in the file responses.
This becomes a serious problem when, for example, the javascript updates are linked to server-side updates, and the stale javascript chokes on the new server responses.
Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily. So, setting the cache policy to a maximum of one hour seems alright, like cache-control: max-age=3600, no-cache.
My understanding is that this will always fetch a new copy of the resource if the cached copy is older than one hour. I'd specifically like to know if it's possible to set a HTTP header or combination of headers that will instruct browsers to only fetch a new copy if the resource was last checked more than one hour ago AND if the resource has changed.
I'm just trying to avoid browsers blindly fetching new copies just because the cached resource is older than one hour, so I'd also like to add the condition that the resource has been changed.
Just to illustrate further what I'm asking:
New user arrives at site and gets fresh copy of script.js
User stays on site for 45 mins, browser uses cached copy of script.js all the time
User comes back to site 2 hours later, and browser asks the server if script.js has changed
If it has, then it gets a fresh copy and the process repeats
If it has not changed, then it uses the cached copy for the next hour, after which it will check again
Have I misunderstood things? Is what I'm asking how it actually works, or do I have to do something different?
"Have I misunderstood things? Is what I'm asking how it actually works, or do I have to do something different?"
You have some serious misconceptions about what the various cache control directives do and why cache behaves as it does.
"Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily ..."
"The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time."
That isn't what no-cache means or what it is intended for. It means that a client MUST NOT use a cached copy to satisfy a subsequent request without successful revalidation. It does not mean, and has never meant, "do not cache"; that is what the no-store directive is for.
Also, the max-age directive is just the primary means for caches to calculate the freshness lifetime and expiration time of cached entries. The Expires header (minus the value of the Date header) can also be used, as can a heuristic based on the current UTC time and any Last-Modified header value.
Really if your goal is to retain the cached copy of a resource for as long as it is meaningful - whilst minimising requests and responses you have a number of options.
The Etag (Entity Tag) header - this is supplied by the server in response to a request in either a "strong" or "weak" form. It is usually a hash based on the resource in question. When a client re-requests a resource it can pass the stored value of the Etag with the If-None-Match request header. If the resource has not changed then the server will respond with 304 Not Modified.
You can think of ETags as fingerprints for resources. They can be used to massively reduce the amount of information sent over the wire, as only fresh data is served, but they have no bearing on the number or frequency of requests.
The Last-Modified header - this is supplied by the server in response to a request in HTTP-date format - it tells the client the last time the resource was modified.
When a client re-requests a resource it can pass the stored value of the Last-Modified header with the If-Modified-Since request header. If the resource has not changed since that time then the server will respond with 304 Not Modified.
You can think of Last-Modified as a weaker form of entity checking than ETags. It addresses the same problem (bandwidth/redundancy) in a less robust way, and again it has no bearing at all on the actual number of requests made.
Revving - a technique that uses a combination of the Expires header and the name (URN) of a resource. (See Steve Souders' blog post.)
Here one basically sets a far forward Expires header - say 5 years from now - to ensure the static resource is cached for a long time.
You then have two options for updating: either append a versioning query string to the request URL, e.g. "/mystyles.css?v=1.1", and update the version number as and when the resource changes; or, better, version the file name itself, e.g. "/mystyles.v1.1.css", so that each version is cached for as long as possible.
This way you not only reduce the amount of bandwidth, you also eliminate all checks to see whether the resource has changed until you rename it.
I suppose the main point here is that none of the cache control directives you mention (max-age, public, etc.) have any bearing at all on whether a 304 response is generated or not. For that, use either ETag / If-None-Match or Last-Modified / If-Modified-Since, or a combination of them (with If-Modified-Since as a fallback mechanism to If-None-Match).
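The combination mentioned above, If-None-Match first with If-Modified-Since as the fallback, might look like this on the server side. This is a hedged Python sketch: is_not_modified is a hypothetical helper, the ETag comparison is an exact string match rather than the weak comparison real servers use, and dates are assumed to be in HTTP-date format.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_not_modified(etag: str, last_modified: datetime,
                    if_none_match: Optional[str],
                    if_modified_since: Optional[str]) -> bool:
    """Return True when a 304 Not Modified response is appropriate.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or etag in candidates
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: behave as if no validator was sent
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since
    return False


modified = datetime(2011, 2, 17, 18, 22, 4, tzinfo=timezone.utc)
print(is_not_modified('"abc"', modified, '"abc"', None))                            # True
print(is_not_modified('"abc"', modified, None, "Thu, 17 Feb 2011 18:22:04 GMT"))    # True
print(is_not_modified('"abc"', modified, '"xyz"', "Thu, 17 Feb 2011 18:22:04 GMT")) # False
```

Note that none of this decides how often the browser asks; that remains the job of max-age or Expires.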
It seems that I have misunderstood how it works, because some testing in Chrome has revealed exactly the behavior that I was looking for in the 5 steps I mentioned.
It doesn't blindly grab a fresh copy from the server when the max-age has expired. It does a GET, and if the response is 304 (Not Modified), it continues using the cached copy until the next hour has expired, at which point it checks for changes again etc.
The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time. So what I was really looking for is:
Cache-Control: public, max-age=3600

304 Status code for my Azure CDN images hosted in Azure Blob Storage

My images are stored in Azure Blob Storage and referenced from my web application through my Azure CDN. However, all images return a 304 response status. Ideally, I don't want the browser to go back to the CDN to check validity on every request; I want it to always use its cache, at least for the lifetime of the cached image.
With my limited knowledge of caching, I understand that the cache uses the ETag value to check whether the requested image is still the same version. In this case it is, and the CDN returns a 304 response. But because the CacheControl header is set to public, max-age=2592000, I would hope the browser would use the cached copy of the image without asking. I have another CDN set up with a hosted service endpoint, which returns a 200 response because I remove the ETag value.
Any help with this would be greatly appreciated.
When an ETag "triggers" a 304 response, it means the browser has sent a validating request with If-None-Match to the server. This is normally done after max-age has elapsed. You can find a good description of this here:
https://stackoverflow.com/a/500103/2550808
It is also worth mentioning that the Firefox cache settings should be at their defaults: go to the about:config page and check this setting: http://kb.mozillazine.org/Browser.cache.check_doc_frequency
Going back to your question, something might be wrong with the Cache-Control header the server returns to the browser. In my modest personal experience I haven't come across the explicitly public form of the header; it would more likely be just this:
Cache-Control: max-age=3600, must-revalidate
Anyway, here is pretty good description of headers pertaining to caching:
https://www.mnot.net/cache_docs/
Alternatively, there might be other reasons for incessant re-validation to consider:
Vary headers in the server's 200 response with the file may affect caching;
JavaScript calling reload on the location object, passing true for bReloadSource;

Redirect browser to another location and force refresh

I would like to kindly ask you for a suggestion regarding browser cache invalidation.
Let's assume we've got an index page that is returned to the client with http headers:
Cache-Control: public, max-age=31534761
Expires: Fri, 17 Feb 2012 18:22:04 GMT
Last-Modified: Thu, 17 Feb 2011 18:22:04 GMT
Vary: Accept-Encoding
Should the user try to hit that index page again, it is very likely that the browser won't even send a request to the server - it will just present the user with the cached version of the page.
My question is: is it possible to create a web resource (for instance at uri /invalidateIndex) such that when a user hits that resource he is redirected to the index page in a way that forces the browser to invalidate its cache and ask the server for fresh content?
I'm having similar problems with a project of my own, so I have a few suggestions, if you haven't already found some solution...
1. I've seen this as the way jQuery forces ajax requests not to be cached: it adds an HTTP parameter to the URL with a random value or name, so that each new request has essentially a different URL and the browser never uses the cache. You could have the /invalidateIndex URI redirect to such a URL. The problem, of course, is that the browser never actually invalidates the original index URL, and that it will always re-request your index.
2. You could change the Cache-Control header to a smaller max-age, say an hour, so that the cached copy expires every hour or so.
3. You could also use ETags, wherein the cached data carries a tag that is sent with each revalidation request, essentially asking the server whether the index has changed.
Options 2 and 3 can even be combined, I think.
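The first suggestion, redirecting to the index with an extra throwaway parameter, can be sketched as a small URL helper. This Python example is illustrative only: cache_busted is a hypothetical name, the "_" parameter name and the example.com URL are assumptions, and the redirect itself (e.g. a 302 from /invalidateIndex) is left to whatever server framework is in use.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, token: Optional[str] = None) -> str:
    """Append a throwaway query parameter so the URL misses the browser cache.

    By default the token is the current time in milliseconds.
    """
    if token is None:
        token = str(int(time.time() * 1000))
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_busted("https://example.com/index?lang=en", "12345"))
# https://example.com/index?lang=en&_=12345
```

As noted above, this fetches a fresh copy under a new URL; the entry cached under the original index URL stays untouched.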
There is no direct way of asking a browser to purge its cache of a particular file, but if you have only a few systems like this and plenty of bandwidth, you could try returning large objects on the same protocol, host, and port so that the cache starts evicting old objects. See https://bugzilla.mozilla.org/show_bug.cgi?id=81640 for example.

Should I remove Etag for htm and php pages?

I generate htm files dynamically using PHP and .htaccess. I read somewhere that I should remove ETags for files of type text/html. Is that correct? I am wondering whether, if I use ETags and don't change the content, I could save some bandwidth. I would appreciate it if you could tell me whether I can use ETags for htm files.
As far as I know, the ETag is an HTTP header generated by the HTTP server and used by the caching system.
The idea:
You ask stackoverflow.com for the image logo.png
stackoverflow.com answers with the image and an ETag header (etag: XXXXXX)
Before asking for the image again, your browser checks its cache for a resource called logo.png from stackoverflow.com and sends the stored ETag (XXXXXX) with the request
If the ETag still matches, the server answers with HTTP 304 (Not Modified) and the browser loads the image from the cache, without downloading it
If it does not match, the server sends the image again.
So... for what purpose do you want to use ETags?
If you want to understand more about ETags, it could be interesting to install HttpFox for Firefox.
Apache has its own cache system, and it is used whenever you download or request any "static" content, like HTML files and images.
If you want to do it in a dynamic context, you must implement it yourself.
ETags can be useful for speeding up your website, even with dynamic content (like PHP scripts).
Especially on mobile connections this is important, since connection speed is slower.
I use ETag headers on some mobile websites like this:
https://gist.github.com/oliworx/4951478
Hint: You must not include the current time or other frequently changing content in the page, because this prevents it from being cached by the client (the browser).
Remove Etags if possible
The best cache method is max-age. The HTTP specification requires caches to prefer max-age over Expires when both are present.
When max-age is used, the browser will use the cached version while it is fresh and not even query the server.
This also means that if you are replacing a resource on your web page (e.g. CSS, JS, IMG, link), you should rename the resource.
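One way to make the renaming automatic is to put a short content hash into the file name, so the name changes exactly when the content does. The following Python sketch is an illustration, not something from the answer: revved_name is a hypothetical helper and the path /css/mystyles.css is an example.

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = revved_name("/css/mystyles.css", b"body { color: black; }")
new = revved_name("/css/mystyles.css", b"body { color: navy; }")
print(old)  # /css/mystyles.<8 hex characters>.css
print(old != new)  # True
```

Pages then reference the hashed name, and each version can be served with a very long max-age because its URL never changes meaning.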
The next best caching method is Expires.
In every PHP page with an echo it is not a bad idea to always include a max-age header.
header('Cache-Control: max-age=31536000');
These are wise also, (Example Content Type is only for HTML)
header('Content-Type: text/html; charset=utf-8');
header('Connection: Keep-Alive');
header('Keep-Alive: timeout=50, max=100');
An ETag has no expiration. On its own, the resource must be checked every time.
If you are using max-age or Expires, the browser will not make an HTTP request to check the resource while the cached copy is fresh.
When included with max-age and/or Expires, it is a waste of header space and wastes a few server CPU cycles to generate or look up the ETag value.
The problem with ETag is that unless the resource is very large it will have little benefit. In an HTTP request, the time required for transmission of the data is often minimal compared to the connect and wait times.
With ETag the browser still has to make an HTTP request. When the ETag has not changed, the response is 304.
Here is a typical HTTP request:
Only 3 milliseconds to download 2.9KB
454 milliseconds request time. + 58ms DNS (very fast)
DNS Lookup: 58 ms
Initial Connection: 192 ms
Time to First Byte: 262 ms
Content Download: 3 ms
Bytes In (downloaded): 2.9 KB
eTag would save 3 milliseconds.
If resource was cached, it would have freed the connection for another resource in addition to saving the 400-500 ms.
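The arithmetic behind that claim, using the figures listed above and assuming the four phases simply add up, can be checked in a few lines of Python:

```python
# Timings from the example request above, in milliseconds.
timings_ms = {
    "dns_lookup": 58,
    "initial_connection": 192,
    "time_to_first_byte": 262,
    "content_download": 3,
}

total_ms = sum(timings_ms.values())
download_share = timings_ms["content_download"] / total_ms

print(total_ms)                       # 515
print(round(download_share * 100, 1)) # 0.6 (percent of the total)
```

A 304 response skips only the content download, well under 1% of the total here, whereas a fresh cached copy skips the whole request.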
Here is a 301 response from Intel
441 ms
DNS Lookup: 103 ms
Initial Connection: 219 ms
Time to First Byte: 222 ms
Content Download: ms
Bytes In (downloaded): 0.1 KB

Resources