IIS6 not doing gzip compression when including Via header in request - http

I have some static content going through a CDN. I am using IIS6's built in compression (gzip & deflate) for static content and this is working fine when I request it. However, when the CDN makes the initial request for the content, it is not being returned compressed. They therefore don't have compressed content to forward to people requesting it. (Yes this raises the issue of people requesting [the zipped] content from the CDN with a browser that can't handle the compression. - We'll put that to one side for now though)
Here's an example of requesting without the 'Via' header:
HEAD /flash/swfobject.js HTTP/1.1
User-Agent: curl/7.19.7 (i386-pc-win32)
Host: localhost:9120
Accept: */*
Connection: Keep-Alive
accept-encoding: gzip
And it returns a compressed response:
HTTP/1.1 200 OK
Content-Length: 4357
Content-Type: application/x-javascript
Content-Encoding: gzip
Expires: Wed, 01 Jan 2020 00:00:00 GMT
Last-Modified: Wed, 18 Nov 2009 15:36:52 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding
Server: Microsoft-IIS/6.0
Date: Thu, 19 Nov 2009 10:27:50 GMT
However, if I include a 'Via' header in the request (as the CDN does) then the result comes back uncompressed:
Request:
HEAD /flash/swfobject.js HTTP/1.1
User-Agent: curl/7.19.7 (i386-pc-win32)
Host: localhost:9120
Accept: */*
Connection: Keep-Alive
Via: 1.1 204.160.105.17:80 (Footprint 4.5/FPMCP)
accept-encoding: gzip
Response:
HTTP/1.1 200 OK
Content-Length: 14602
Content-Type: application/x-javascript
Expires: Wed, 01 Jan 2020 00:00:00 GMT
Last-Modified: Wed, 18 Nov 2009 15:36:54 GMT
Accept-Ranges: bytes
Server: Microsoft-IIS/6.0
Date: Thu, 19 Nov 2009 10:29:52 GMT
Yes these demos use 'localhost' in the request. I get the same result using the actual domain name from various machines on various networks though.
Two questions then:
Could this be IIS not applying the compression due to the extra header? and if so what can I do about it?
How can I tell if the proxy is decompressing the content before returning it?
Bonus question 3 - What can I do to investigate this problem further?
I am aware of question 332049, but that has the header in the response, not the request.

I stumbled across your question while researching this myself. I uncovered an article on MSDN and the short answer is that the Via header is used for proxies and proxies typically mess up compression. You either have the option of removing the header or you can change the setting in the IIS metabase (HcNoCompressionForProxies="FALSE"). I had success with both options.

Related

Where does this Content-Length come from?

I visit the page http://furryfaust.com/waveform and it redirects to http://furyfaust.com/waveform/ with this initial response.
HTTP/1.1 301 Moved Permanently
Location: /waveform/
Date: Sat, 09 Jul 2016 21:32:56 GMT
Content-Length: 45
Content-Type: text/html; charset=utf-8
I found this out using chrome developer tools. I'm curious where the content-length is determined from since there is no response message body.
The page is served by a web framework called gin (https://github.com/gin-gonic/gin).
There is a content. The developer-tools don't show it for some reason, but using telnet I got the following response:
HTTP/1.1 301 Moved Permanently
Location: /waveform/
Date: Sat, 09 Jul 2016 21:40:38 GMT
Content-Length: 45
Content-Type: text/html; charset=utf-8
Moved Permanently.
Why the dev-tools don't show it? I don't know. Probably because Chrome doesn't even read the content because it'll redirect anyway.

HTTP caching does not work

Opening same URL from browser, and the server returns below header.
Repeating again, why it always give HTTP 200, instead of 304.
Any idea?
HTTP/1.1 200 OK Server: Cowboy Connection: keep-alive X-Powered-By:
Express Content-Type: text/html Cache-Control: public, max-age=600
Date: Sat, 07 Nov 2015 04:41:50 GMT Transfer-Encoding: chunked Via:
1.1 vegur

nginx is slower than apache when serving pdf file, anything else is faster

I have one PDF file, put it on two different servers (same hardware spec), one running Apache 2.2, the other running Nginx 1.8.0. When using IE 11, Nginx got hung, Apache went through. When using FF/Chrome, both went through but Apache is faster.
Second, I've added "pdf" into this Nginx config to force browsers to cache pdf files, but it seems, the browsers always ignore it and always download the full file every time it is accessed.
location ~* \.(ico|css|js|gif|jpe?g|png|ttf|svg|eot|woff|webm|mp3|mp4|pdf|txt|xml)(\?[0-9]+)?$ {
expires max;
log_not_found off;
access_log off;
}
How to expedite pdf-file download in Nginx, and how to force the browser to cache it?
UPDATED: Response Header Nginx
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Thu, 28 May 2015 16:39:53 GMT
Content-Type: application/pdf
Content-Length: 14709260
Last-Modified: Wed, 27 May 2015 20:35:11 GMT
Connection: keep-alive
Etag: "55662a7f-e0720c"
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Accept-Ranges: bytes
Response Header Apache:
HTTP/1.1 200 OK
Date: Thu, 28 May 2015 17:29:41 GMT
Server: Apache/2.2.9 (Unix) DAV/2 PHP/5.2.6 mod_jk/1.2.26
Last-Modified: Wed, 27 May 2015 20:57:52 GMT
Etag: "196c55-e0720c-517167f19423e"
Accept-Ranges: bytes
Content-Length: 14709260
Keep-Alive: timeout=5, max=85
Connection: Keep-Alive
Content-Type: application/pdf

CloudFront ignores Cache-Control headers

CloudFront ignores my cache header and my pictures have to be picked up from the server again after a while.
~$ curl -I http://d2573vy43ojbo7.cloudfront.net/attachments/store/limit/64/3720c5574063aebc90511061b99de858740ad764c6981d2bf30ff121ada0/image.jpg
HTTP/1.1 200 OK
Content-Type: image/jpeg
Content-Length: 1645
Connection: keep-alive
Server: nginx/1.4.1
Date: Thu, 12 Feb 2015 14:37:41 GMT
Status: 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers:
Access-Control-Allow-Method:
Cache-Control: public, must-revalidate, max-age=31536000
Expires: Fri, 12 Feb 2016 14:37:41 GMT
Content-Disposition: inline; filename="image.jpg"
Last-Modified: Thu, 12 Feb 2015 14:37:41 GMT
X-Content-Type-Options: nosniff
X-Request-Id: 239b0fda-cae9-452f-9d1b-ccbf035bbf69
X-Runtime: 3.457939
X-Cache: Miss from cloudfront
Via: 1.1 6cde3c778df412041adc7610331b57bc.cloudfront.net (CloudFront)
X-Amz-Cf-Id: yicAkZYc5XpowKRFMOXDKSJKBMWZ4kq2B3vLK8Q-Py124D8lQq_1lg==
I tried to get the same file yesterday and then it was the same, after the second time i tried it was reached and served by CloudFront but not anymore. It's the same for all my images. They are cached but are removed from the cache after a couple of hours.
What's wrong? My cache behavior settings on CloudFront is set to default and it uses Origin Cache Headers.
Take a look here: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
If an object in an edge location isn't frequently requested, CloudFront might evict the object—remove the object before its expiration date—to make room for objects that are more popular.
It means that object is not popular enough to stay in cache for a longer time. If you have enough viewers hitting this object AND this particular CloudFront location, it would have stayed in cache longer

How to crawl a wordpress blog?

I write a c program to crawl blogs. It works well until it meets this blog: www.ipujia.com. I send the HTTP request:
GET http://www.ipujia.com/ HTTP/1.0
to the website and get the response as below:
HTTP/1.1 301 Moved Permanently
Date: Sun, 27 Feb 2011 13:15:26 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8e-fips-rhel5
mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_perl/2.0.4
Perl/v5.8.8
X-Powered-By: PHP/5.2.14
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sun, 27 Feb 2011 13:15:27 GMT
Location: http://http/www.ipujia.com/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
This is strange because I cannot get the index page following the Location. Does anyone have any ideas?
The Location field in the response contains a malformed URI.
Location: http://http/www.ipujia.com/ (notice the protocol error)
Should be
Location: http://www.ipujia.com/
Unless you are in control of the server there is little you could do here.
To solve it could you not parse the "Location" response and attempt to extract a valid URI from the it?

Resources