What would explain a file served via a CDN being 7/8 of the expected size?

I've got an Android .apk file that I want to distribute to a few thousand devices, but I don't want to put it on the Market, so I've decided to serve it via a CDN.
However, the file I'm receiving via the CDN has been corrupted somehow. It doesn't seem to have merely been truncated; at least, the bytes I've examined in a hex dump are all different.
For what it's worth, I'm setting the MIME type to 'application/vnd.android.package-archive'; I suspect my problem is related to this.
It's suspicious to me that the file is close to 7/8 of the original size: 1155060 vs 1321106 bytes (a ratio of about 0.874, where 7/8 = 0.875). This makes me wonder whether the file is getting treated as 7-bit ASCII somewhere along the way...

Aha, I've just realised that the file from the CDN is actually a gzipped version of the original.
I think I'm on the right track in suspecting that the MIME type is confusing the CDN and it's defaulting to gzipping the file.
I'm guessing the 7/8 size difference is probably just a coincidence.
Edit:
Yes, this was it. A bodgy fix was to set the MIME type to application/zip, which presumably doesn't confuse the CDN (I've got a custom updater, so I can work around the fact that the MIME type is wrong).
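For anyone who wants to verify the same thing, a quick check is to look for the gzip magic bytes and compare the decompressed download against the original (a minimal Python sketch; the file names are placeholders):

    import gzip, hashlib

    cdn_copy = open('downloaded.apk', 'rb').read()   # what the CDN actually served
    original = open('original.apk', 'rb').read()     # what was uploaded
    print(cdn_copy[:2] == b'\x1f\x8b')               # True if it starts with the gzip magic bytes
    inflated = gzip.decompress(cdn_copy)
    print(len(inflated) == len(original),
          hashlib.md5(inflated).hexdigest() == hashlib.md5(original).hexdigest())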

How does nginx SSI work?

I can't find much info about it, so I have a few questions.
I need to use it to help solve the "big problem" of cache invalidation.
1) If I understand correctly, nginx needs to scan every file it serves to find out whether it needs to include other files. That sounds like it would perform badly?
2) Does it fetch the included files one after another, or all at the same time?
3) Does it mean that if I have 3 SSI includes in 1 file, my nginx will receive 3 more requests for every request to this file?
4) Is it still in use in 2015?
I can't find any info about it except the minimal nginx docs, which don't give much technical detail. Thanks.
Based on the documentation of ngx_http_ssi_module, it causes all responses passing through it to be scanned for SSI commands.
So it isn't really about files at all: it doesn't matter where the content comes from (a plain file, a reverse proxy, FastCGI/PHP), the module analyzes whatever response is generated.
I don't think SSI will bring you a major performance penalty (unless you are serving and scanning big binary files). You can limit which content is scanned for SSI by MIME type (by default, only text/html is scanned). If you need exact figures, you'll have to run performance tests with and without SSI.
If we're talking about including other files from your SSI commands, then by default they are all fetched in parallel, so from a timing perspective it doesn't matter whether you include 1 or 3 files. Of course, with 3 files your server has more work to do.
Yes, more included files/URLs means more requests hitting nginx (unless those URLs point to some other server).
The SSI module is still present in the latest nginx release, but I don't know whether any big company actually uses it. It all depends on what you want to achieve; I still haven't understood how you plan to do cache invalidation with SSI.
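For reference, turning SSI on and limiting which responses get scanned looks roughly like this in nginx.conf (a minimal sketch; the location block is a placeholder, and text/html is already the default for ssi_types):

    location / {
        ssi on;                  # scan responses served from this location for SSI commands
        ssi_types text/html;     # only scan these MIME types (text/html is the default)
    }

An include command inside a served HTML page then looks like:

    <!--# include virtual="/fragments/header.html" -->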

Stylus / Nib CSS with Node / Express secure app triggers Chrome warnings about insecure JavaScript

Console messages:
The page at [...] ran insecure content from http://fonts.googleapis.com/css?family=Quicksand.
and
The page at [...] displayed insecure content from http://themes.googleusercontent.com/static/fonts/quicksand/v2/sKd0EMYPAh5PYCRKSryvWz8E0i7KZn-EPnyo3HZu7kw.woff.
and
Resource interpreted as Font but transferred with MIME type font/woff: "http://themes.googleusercontent.com/static/fonts/quicksand/v2/sKd0EMYPAh5PYCRKSryvWz8E0i7KZn-EPnyo3HZu7kw.woff".
I know what caused it, vaguely: I just started implementing the Stylus and Nib CSS modules.
In my Google research I found: http://mccormicky.com/1595/importing-google-web-fonts-lightspeed-web-store-ssl/
which makes it clear that the font requests should be switched to https for the page to be considered secure. So I looked into my /public/styles/style.styl file and found the offending line; one extra "s" was sufficient to clear the warning. Indeed, it's fine now.
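For example, if the font is pulled in with an @import in style.styl, the change is just the URL scheme (a hypothetical line, since the actual one isn't quoted here):

    // before
    @import url('http://fonts.googleapis.com/css?family=Quicksand')
    // after
    @import url('https://fonts.googleapis.com/css?family=Quicksand')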
It's really as simple as making the font request (found in the CSS styles file) over https instead of http.
I hadn't intended to be answering my own question (I really thought the fix would be something complicated with the headers), but by the time I'd half finished writing it and done my due-diligence research, it was answerable. So, whatever.

Resource interpreted as Font but transferred with MIME type application/octet-stream

Based on the research I've done, it seems that some of my headers aren't set correctly, and I have no idea how to fix this. Do I need to change a few config files on the server?
Backstory: There's a website with a few pages hosted on another server (I know, it's horrible). The site is using a custom font. It works fine on the pages housed on the server holding the fonts, but the other server isn't loading the fonts. After poking around in Firebug I learned that when the page requests the font files, the server returns a status of 200 OK, but the response body is blank. This happens for all font formats (eot, woff, otf, ttf).
I really need some help on this. Anything would be greatly appreciated.
Don't both the font and the file need to be on the same server for this to work?
I know the 'Resource interpreted as Font but transferred with MIME type application/octet-stream' warning appears because the server isn't declaring a font MIME type, so Chrome doesn't know what it's getting; I generally don't worry about it. It's basically Chrome saying 'I don't know what this is'. You can set custom MIME types on the server to get rid of the warning if you want.
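If the server holding the fonts happens to be Apache, you can declare the font types with AddType in the main config or an .htaccess file (a hedged sketch; the exact MIME type strings people use vary):

    AddType application/vnd.ms-fontobject .eot
    AddType font/woff                     .woff
    AddType font/ttf                      .ttf
    AddType font/otf                      .otf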

Custom webserver caching

I'm working with a custom webserver on an embedded system and having some problems correctly setting my HTTP Headers for caching.
Our webserver is generating all dynamic content as XML and we're using semi-static XSL files to display it with some dynamic JSON requests thrown in for good measure along with semi-static images. I say "semi-static" because the problems occur when we need to do a firmware update which might change the XSL and image files.
Here's what needs to be done: cache the XSL and image files and do not cache the XML and JSON responses. I have full control over the HTTP response and am currently:
Using ETags with the XSL and image files, using the modified time and size to generate the ETag
Setting Cache-Control: no-cache on the XML and JSON responses
As I said, everything works dandy until a firmware update, when the old XSL and image files sometimes remain cached. I've seen it work fine with the latest versions of Firefox and Safari, but I've had some problems with IE.
I know one solution to this problem would be to simply rename the XSL and image files for each version (e.g. logo-v1.1.png, logo-v1.2.png) and set the Expires header to a date in the future, but this would be difficult with the XSL files and I'd like to avoid it.
Note: There is a clock on the unit, but it requires the user to set it and might not be 100% reliable, which may be what's causing my caching issues when using ETags.
What's the best practice that I should employ? I'd like to avoid as many webserver requests as possible but invalidating old XSL and image files after a software update is the #1 priority.
Are we working on the same project? I went down a lot of dead ends figuring out the best way to handle this.
I set my .html and my .shtml files (dynamic JSON data) to expire immediately. ("Cache-Control: no-cache\r\nExpires: -1\r\n")
Everything else is set to expire in 10 years. ("Cache-Control: max-age=290304000\r\n")
My makefile runs a Perl script over all the .html files and identifies what you call "semi-static" content (images, JavaScript, CSS). The script then computes an MD5 checksum of each of those files and appends the checksum to the reference in the HTML:
<script type="text/Javascript" src="js/all.js?7f26be24ed2d05e7d0b844351e3a49b1">
Everything after the question mark is ignored by the server, but the browser treats each distinct query string as a distinct URL, so it won't reuse a cached copy unless everything between the quotes matches.
I use all.js and all.css because everything's combined and minified using the same script.
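A rough Python equivalent of that checksum-appending step (a sketch only; the Perl original isn't shown here, and the regex and paths are assumptions) might look like:

    import hashlib, re
    from pathlib import Path

    def checksum(path):
        # MD5 of the asset's current contents
        return hashlib.md5(Path(path).read_bytes()).hexdigest()

    def stamp_references(html, asset_root):
        # append ?<md5> to local .js/.css references, e.g. js/all.js -> js/all.js?7f26be24...
        def repl(m):
            return '%s="%s?%s"' % (m.group(1), m.group(2), checksum(Path(asset_root) / m.group(2)))
        return re.sub(r'(src|href)="([^"?]+\.(?:js|css))"', repl, html)

    # usage: rewrite a page in place, assuming its assets live under ./site
    page = Path('site/index.html')
    page.write_text(stamp_references(page.read_text(), 'site'))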
Out of curiosity, what embedded webserver are you using?
Try Cache-Control: no-store for the XML and JSON. no-cache tells the client that the response can be stored; it just can't be reused without revalidating it with the origin server.
BTW, setting an ETag alone won't make the response cacheable; you should also set Cache-Control: max-age=nnn.
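Put together, the responses would carry headers along these lines (a sketch; pick a max-age that suits your firmware update cadence, and the ETag value here is just a placeholder):

    For the semi-static XSL and image files:
        Cache-Control: max-age=86400
        ETag: "abc123"
    For the XML and JSON responses:
        Cache-Control: no-store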
You can check how your responses will be treated with http://redbot.org/

ASP.NET: Legitimate architecture/HttpModule concern?

An architect at my work recently read Yahoo!'s Exceptional Performance Best Practices guide, where it says to use a far-future Expires header for resources used by a page, such as JavaScript, CSS, and images. The idea is that you set an Expires header for these resources years into the future so they're always cached by the browser, and whenever you change a file and therefore need the browser to request it again instead of using its cache, you change the filename by adding a version number.
Instead of incorporating this into our build process though, he has another idea. Instead of changing file names in source and on the server disk for each build (granted, that would be tedious), we're going to fake it. His plan is to set far-future expires on said resources, then implement two HttpModules.
One module will intercept all the Response streams of our ASPX and HTML pages before they go out, look for resource links and tack on a version parameter that is the file's last modified date. The other HttpModule will handle all requests for resources and simply ignore the version portion of the address. That way, the browser always requests a new resource file each time it has changed on disk, without ever actually having to change the name of the file on disk.
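In rough terms, the first module's rewriting step amounts to the following (a hedged Python sketch of the idea only, not his actual HttpModule code; the regex and the webroot parameter are assumptions):

    import os, re

    def add_version(html, webroot):
        # tack ?v=<last-modified timestamp> onto local script/img/link references
        def repl(m):
            asset = os.path.join(webroot, m.group(2).lstrip('/'))
            return '%s="%s?v=%d"' % (m.group(1), m.group(2), int(os.path.getmtime(asset)))
        return re.sub(r'(src|href)="(/[^"?]+\.(?:js|css|png|gif|jpg))"', repl, html)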
Make sense?
My concern relates to the module that rewrites the ASPX/HTML page Response stream. He's simply going to apply a bunch of Regex.Replace() calls on the "src" attributes of <script> and <img> tags and the "href" attribute of <link> tags. This is going to happen for every single request on the server whose content type is "text/html": potentially hundreds or thousands of requests a minute.
I understand that HttpModules are hooked into the IIS pipeline, but this has got to add a prohibitive delay in the time it takes IIS to send out HTTP responses. No? What do you think?
A few things to be aware of:
If the idea is to add a query string to the static file names to indicate their version, unfortunately that will also prevent caching by the kernel-mode HTTP driver (http.sys)
Scanning each entire response based on a bunch of regular expressions will be slow, slow, slow. It's also likely to be unreliable, with hard-to-predict corner cases.
A few alternatives:
Use control adapters to explicitly replace certain URLs or paths with the current version. That allows you to focus specifically on images, CSS, etc.
Change folder names instead of file names when you version static files
Consider using ASP.NET skins to help centralize file names. That will help simplify maintenance.
In case it's helpful, I cover this subject in my book (Ultra-Fast ASP.NET), including code examples.
He's worried about stuff not being cached on the client. Obviously this depends somewhat on how the user population has their browsers configured; if it's the default config, then I doubt you need to worry. Trying to second-guess client caching is hard, the results aren't guaranteed, and it's not going to help new users anyway.
As far as the HttpModules go, in principle I'd say they're fine, but you'll want them to be blindingly fast and efficient if you take that track; it's probably worth trying out. I can't speak to the appropriateness of using regex to do what you want inside them, though.
If you're looking for high performance, I suggest you (or your architect) do some reading (and I don't mean that in a nasty way). I learnt something recently which I think will help; let me explain (and maybe you guys know this already).
Browsers only hold a limited number of simultaneous connections open to a specific hostname at any one time; IE6, for example, will only open two connections to, say, www.foo.net.
If you serve your images from, say, images.foo.net, you get a whole extra set of connections straight away.
The idea is to separate different content types onto different hostnames (css.foo.net, scripts.foo.net, ajaxcalls.foo.net); that way you make sure the browser is really working on your behalf.
http://code.google.com/p/talifun-web
StaticFileHandler - serves static files in a cacheable, resumable way.
CrusherModule - serves compressed, versioned JS and CSS in a cacheable way.
You don't quite get kernel-mode caching speed, but serving from HttpRuntime.Cache has its advantages. The kernel-mode cache can't cache partial responses, and you don't get fine-grained control over it. The most important things to implement are a consistent ETag header and an Expires header; this will improve your site's performance more than anything else.
Reducing the number of files served is probably one of the best ways to improve the speed of your website. The CrusherModule combines all the CSS on your site into one file and all the JS into another.
Memory is cheap and hard drives are slow, so use it!
