Varnish doesn't gzip HTML pages

I have multiple Tomcat 7.0.56 instances running on CentOS 6 and Varnish 4 running on another CentOS server. Varnish has to do two important things for us: act as a reverse proxy (works like a charm) and compress all data that can be compressed. We don't care about caching in our architecture.
On the second point we have a problem: Varnish gzips CSS and JS but doesn't gzip HTML.
In my default.vcl I don't compress files such as pictures and SWF, or my pages designed for mobile, and I set beresp.do_gzip to true for everything else.
My vcl_recv:
sub vcl_recv {
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$" || req.url ~ "Mobile\.") {
            unset req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "MSIE") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            unset req.http.Accept-Encoding;
        }
    }
    set req.backend_hint = h.backend(client.identity);
}
My vcl_backend_response:
sub vcl_backend_response {
    if (beresp.http.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$" || beresp.http.url ~ "Mobile\.") {
        set beresp.do_gzip = false;
    } else {
        set beresp.do_gzip = true;
        set beresp.http.X-Cache = "ZIP";
    }
}
All streams passing through Varnish are correctly gzipped except HTML pages, yet those pages' headers look almost correct.
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:no-cache
Connection:keep-alive
Cookie:JSESSIONID=F116C2729E96D2150EEEACEB90F95EA9.node1; UUID=631a2947-14ac4e00ca6-0233de72a654bb34bce4a88d9e172e25
Host:tomcat.domain.tld
Pragma:no-cache
Referer:http://tomcat.domain.tld/path/to/ServletControl?sourceview=liste_menu
User-Agent:Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Response Headers
Accept-Ranges:bytes
Age:0
Cache-Control:no-store,no-cache
Connection:keep-alive
Content-Language:fr-FR
Content-Type:text/html;charset=UTF-8
Date:Wed, 07 Jan 2015 14:55:04 GMT
Expires:0
MII:1800
Pragma:no-store,no-cache
Server:Apache-Coyote/1.1
Set-Cookie:UUID=631a2947-...e25; Version=1; Max-Age=10000; Expires=Wed, 07-Jan-2015 17:41:44 GMT; Path=/gce162
Transfer-Encoding:chunked
Vary:Accept-Encoding
Via:1.1 varnish-v4
X-Cache:ZIP
X-Varnish:163870
We can see the X-Cache header with the value ZIP and the Vary header with Accept-Encoding, but no Content-Encoding: gzip.
So I don't understand: why doesn't Varnish gzip the HTML even though it writes Vary: Accept-Encoding?
Any help is welcome. Thank you.
Baddou

I believe it's related to the Transfer-Encoding: chunked response header being returned because Content-Length is not set for the HTML files.
To disable the chunked response, try adding set beresp.do_esi = true; in the else branch of vcl_backend_response, as sketched below.
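A minimal sketch of what the modified vcl_backend_response could look like (the header check is kept exactly as in the question; only the do_esi line is new):
sub vcl_backend_response {
    if (beresp.http.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$" || beresp.http.url ~ "Mobile\.") {
        set beresp.do_gzip = false;
    } else {
        # Enabling ESI processing makes Varnish buffer the body, so it can
        # compute a Content-Length instead of chunking the response.
        set beresp.do_esi = true;
        set beresp.do_gzip = true;
        set beresp.http.X-Cache = "ZIP";
    }
}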
See also https://www.varnish-cache.org/trac/ticket/1506 and How do I disable 'Transfer-Encoding: chunked' encoding in Varnish?

Related

Are browsers supposed to handle 304 responses automagically?

Might be a silly question, but I haven't found any clear answer yet.
My server handles ETag caching for some quite big JSON responses we have, returning 304 NOT MODIFIED with an empty body if the If-None-Match header contains the same hash as the one newly generated (shallow ETags).
Are browsers supposed to handle this automagically, or do the in-browser client apps consuming the API asynchronously need to implement some logic to handle such responses (i.e. use the cached version if 304 is responded, create/update the cached version otherwise)?
Because so far, I've manually implemented this logic client-side, but I'm wondering whether I just reinvented a square wheel...
In other words, with the Cache-Control header for example, the in-browser client apps don't need to parse the value, check max-age, store it somehow, set up a timeout, etc.: everything is handled by the browser directly. The question is: are browsers supposed to behave the same way when they receive a 304?
Here is how I wrote my client so far (built with AngularJS, running in browsers):
myModule
    .factory("MyRepository", ($http) => {
        return {
            fetch: (etag) => {
                return $http.get(
                    "/api/endpoint",
                    etag ? { headers: { "If-None-Match": etag } } : undefined
                );
            }
        };
    })
    .factory("MyService", (MyRepository, $q) => {
        let latestEtag = null;
        let latestVersion = null;
        return {
            fetch: () => {
                return MyRepository
                    .fetch(latestEtag)
                    .then((response) => {
                        latestEtag = response.headers("ETag");
                        latestVersion = response.data;
                        return angular.copy(latestVersion);
                    })
                    .catch((response) => {
                        // Serve the cached copy on 304, propagate real errors
                        return 304 === response.status
                            ? angular.copy(latestVersion)
                            : $q.reject(response);
                    });
            }
        };
    });
So basically, is the above logic effectively needed, or am I supposed to be able to simply use $http.get("/api/endpoint") directly?
The code above works fine, which seems to suggest that 304s need to be handled programmatically, although I've never seen such "custom" implementations in the articles I've read.
304 responses are handled automagically by the browser, as shown below.
I created a simple page:
<html>
<head>
    <script src="./axios.min.js"></script>
    <script src="./jquery-3.3.1.js"></script>
</head>
<body>
    <h1>this is a test</h1>
</body>
</html>
and then added a test.json file:
root@vagrant:/var/www/html# cat test.json
{
"name": "tarun"
}
And then added the following to the nginx config:
location ~* \.(jpg|jpeg|png|gif|ico|css|js|json)$ {
    expires 365d;
}
Now the results (the browser network screenshots are omitted here).
AXIOS: the first request is a 200 and the second one a 304, but there is no impact on the JS code.
jQuery: same thing with jQuery as well.
From the curl output you can see that the server didn't send a body for the second, 304 request:
$ curl -v 'http://vm/test.json' -H 'If-None-Match: "5ad71064-17"' -H 'DNT: 1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.9' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36' -H 'Accept: */*' -H 'Referer: http://vm/' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' -H 'If-Modified-Since: Wed, 18 Apr 2018 09:31:16 GMT' --compressed
* Trying 192.168.33.100...
* TCP_NODELAY set
* Connected to vm (192.168.33.100) port 80 (#0)
> GET /test.json HTTP/1.1
> Host: vm
> If-None-Match: "5ad71064-17"
> DNT: 1
> Accept-Encoding: gzip, deflate
> Accept-Language: en-US,en;q=0.9
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
> Accept: */*
> Referer: http://vm/
> X-Requested-With: XMLHttpRequest
> Connection: keep-alive
> If-Modified-Since: Wed, 18 Apr 2018 09:31:16 GMT
>
< HTTP/1.1 304 Not Modified
< Server: nginx
< Date: Wed, 18 Apr 2018 09:42:45 GMT
< Last-Modified: Wed, 18 Apr 2018 09:31:16 GMT
< Connection: keep-alive
< ETag: "5ad71064-17"
<
* Connection #0 to host vm left intact
So you don't need to handle a 304; the browser will do that work for you.
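In other words, assuming the server keeps sending the validation headers shown above, the AngularJS service from the question can collapse to a plain GET and let the browser's cache supply the body on a 304 (a minimal sketch, not the asker's actual code):
myModule.factory("MyService", ($http) => {
    return {
        // The browser adds If-None-Match itself and transparently serves the
        // cached body when the server replies 304 Not Modified, so the app
        // always sees a normal 200 response with data.
        fetch: () => $http.get("/api/endpoint").then((response) => response.data)
    };
});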
Yes, probably all modern major browsers handle response validation using conditional requests well. A relevant excerpt from the article The State of Browser Caching, Revisited by Mark Nottingham:
Validation allows a cache to check with the server to see if a stale stored response can be reused.
All of the tested browsers support validation based upon ETag and Last-Modified. The tricky part is making sure that the 304 Not Modified response is correctly combined with the stored response; specifically, the headers in the 304 update the stored response headers.
All of the tested browsers do update stored headers upon a 304, both in the immediate response and subsequent ones served from cache.
This is good news; updating headers with a 304 is an important mechanism, and when they get out of sync it can cause problems.
For more information check HTTP Caching article by Ilya Grigorik.

Cookies not being set by IIS in HTTP header

I use ASP.NET forms authentication that seems to work ok online but not in my development environment for Internet Explorer, Firefox, and Chrome. As far as I can see IIS is not sending the Set-Cookie HTTP header when a page is being requested:
GET http://127.0.0.1:81/ HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: nb-NO
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
Host: 127.0.0.1:81
DNT: 1
Connection: Keep-Alive
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Wed, 08 Oct 2014 19:24:58 GMT
Content-Length: 13322
I've tried adding 127.0.0.1 www.example.com to the \Windows\System32\drivers\etc\hosts file and accessing http://www.example.com:81 instead, but that has no effect. Here are my web.config settings:
<!-- ASP.NET forms authentication is enabled by default -->
<authentication mode="Forms">
    <!-- Set the page to redirect to when the user attempts to access a restricted resource -->
    <forms loginUrl="~/Account/Login.aspx" timeout="2880" />
</authentication>
I've found a work-around by always setting a dummy cookie if no cookies are sent inside the ASP.NET web page:
/// <summary>
/// Force the browser to use cookies if none are in use.
/// Sets an empty cookie.
/// </summary>
void ForceCookiesIfRequired()
{
    if (Request.Cookies == null || Request.Cookies.Count == 0)
    {
        // No cookies, so set a dummy blank one
        var cookie1 = new HttpCookie(FormsAuthentication.FormsCookieName, String.Empty) { Expires = DateTime.Now.AddYears(-1) };
        Response.Cookies.Add(cookie1);
    }
}

protected override void OnInit(EventArgs e)
{
    base.OnInit(e);
    // Force the use of cookies if none are sent
    ForceCookiesIfRequired();
}
I never had to do this before; has some Microsoft patch or upgrade broken ASP.NET forms authentication? One way to check your own solution is to clear the cookies, I guess: that forces the Set-Cookie HTTP header to be sent by the IIS server again.

How do I disable 'Transfer-Encoding: chunked' encoding in Varnish?

Using Varnish 4, I have a set of backends that are responding with a valid Content-Length header and no Transfer-Encoding header.
On the first hit from a client, rather than responding to the client with those headers, Varnish is dropping the Content-Length header and adding Transfer-Encoding: chunked to the response. (Interestingly, the payload doesn't appear to have any chunks in it - it's one contiguous payload).
This causes serious problems for clients like Flash video players that are trying to do segment-size, bandwidth, etc analysis based on the Content-Length header. Their analysis fails, and they can't do things like multi-bitrate streaming, etc.
I've tried a number of semi-obvious things like the following (see the VCL sketch after this list):
beresp.do_stream = true
beresp.do_gzip = false
unset req.http.Accept-Encoding
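For reference, a minimal sketch of where those attempts would live in VCL (Varnish 4 syntax, as elsewhere in this question; this is what was tried, not a working fix):
sub vcl_recv {
    # Attempt: stop compressed variants from being negotiated at all
    unset req.http.Accept-Encoding;
}

sub vcl_backend_response {
    set beresp.do_stream = true;   # attempt: stream the body straight through
    set beresp.do_gzip = false;    # attempt: don't recompress in Varnish
}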
Sample backend response:
HTTP/1.1 200 OK
Cache-Control: public, max-age=600
Content-Type: video/mp4
Date: Tue, 13 May 2014 19:44:35 GMT
Server: Apache
Content-Length: 796618
Connection: keep-alive
Sample varnish response:
HTTP/1.1 200 OK
Server: Apache
Cache-Control: public, max-age=600
Content-Type: video/mp4
Date: Tue, 13 May 2014 23:10:06 GMT
X-Varnish: 2
Age: 0
Transfer-Encoding: chunked
Accept-Ranges: bytes
Subsequent loads of the object do include the Content-Length header, just not the first load into the cache.
VCL: https://gist.github.com/onethumb/e64a405cc579909cace1
varnishlog output: https://gist.github.com/onethumb/e66a2bc4727a3a5340b6
Varnish Trac: https://www.varnish-cache.org/trac/ticket/1506
For the time being, do_stream = false will do what you want.
Avoiding chunked encoding for the case where the backend sends unchunked is a possible future improvement to Varnish.
Example:
sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "video") {
        set beresp.do_stream = false;
        set beresp.do_gzip = false;
        //set resp.http.Content-Length = beresp.http.Content-Length;
    }
    if (beresp.http.Edge-Control == "no-store") {
        set beresp.uncacheable = true;
        set beresp.ttl = 60s;
        set beresp.http.Smug-Cacheable = "No";
        return (deliver);
    }
}
So the solution is not at all intuitive, but you must enable ESI processing:
sub vcl_backend_response {
    set beresp.do_esi = true;
    if (beresp.http.Content-Type ~ "video") {
        set beresp.do_stream = true;
        set beresp.do_gzip = false;
        //set resp.http.Content-Length = beresp.http.Content-Length;
    }
    if (beresp.http.Edge-Control == "no-store") {
        set beresp.uncacheable = true;
        set beresp.ttl = 60s;
        set beresp.http.Smug-Cacheable = "No";
        return (deliver);
    }
}
So I discovered this by browsing the source code.
In particular, Varnish does this:
if (!req->disable_esi && req->obj->esidata != NULL) {
    /* In ESI mode, we can't know the aggregate length */
    req->res_mode &= ~RES_LEN;
    req->res_mode |= RES_ESI;
}
The above code sets the res_mode flag.
A little while later:
if (!(req->res_mode & (RES_LEN|RES_CHUNKED|RES_EOF))) {
    /* We havn't chosen yet, do so */
    if (!req->wantbody) {
        /* Nothing */
    } else if (req->http->protover >= 11) {
        req->res_mode |= RES_CHUNKED;
    } else {
        req->res_mode |= RES_EOF;
        req->doclose = SC_TX_EOF;
    }
}
This sets the res_mode flag to RES_CHUNKED if the HTTP protocol is HTTP/1.1 or higher (which it is in your example) and the res_mode flag isn't set. Now even later:
if (req->res_mode & RES_CHUNKED)
    http_SetHeader(req->resp, "Transfer-Encoding: chunked");
Varnish sends the chunked transfer encoding if the RES_CHUNKED flag is set.
The only way I see to effectively disable this is by enabling ESI mode. It gets disabled in a few other ways, but those aren't practical (e.g. for HTTP HEAD requests or pages with a 304 status code).
I upgraded from Varnish 4.0 to 5.2 and now this works correctly for the first request as well.

MVC 3 client caching

I am trying to make modifications to an existing CDN. What I am trying to do is create a short cache time and use conditional GETs to see if the file has been updated.
I am tearing my hair out because even though I am setting a last-modified date and seeing it in the response headers, on subsequent GET requests I am not seeing an If-Modified-Since header being sent. At first I thought it was my local development environment, or the fact that I was using Fiddler as a proxy for testing, so I deployed to a QA server. But what I am seeing in Firebug is quite different from what I am doing: I see the last-modified date, for some reason the cache-control is set to private, and I have cleared any Output Caching headers; the only header IIS 7.5 is set to write is HTTP keep-alive, so all the caching should be driven by the code.
This seemed like such a no-brainer, yet I've been adding and removing headers all day with no luck. I checked global.asax and everywhere else (I didn't write the app, so I was looking for any hidden surprises) and am stumped. Below are the current code and the request and response headers. I have the expiration set to 30 seconds just for testing purposes. I have looked at several samples and don't see myself doing anything different, but it simply won't work.
Response Headers
Cache-Control private, max-age=30
Content-Length 597353
Content-Type image/jpg
Date Tue, 03 Sep 2013 21:33:55 GMT
Expires Tue, 03 Sep 2013 21:34:25 GMT
Last-Modified Tue, 03 Sep 2013 21:33:55 GMT
Server Microsoft-IIS/7.5
X-AspNet-Version 4.0.30319
X-AspNetMvc-Version 3.0
X-Powered-By ASP.NET
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Cookie __utma=1.759556114.1354835397.1377631052.1377732484.36; __utmz=1.1354835397.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host hqat4app1
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
The relevant code is:
public ActionResult Resize(int id, int size, bool grayscale)
{
    _logger.Debug(() => string.Format("Resize {0} {1} {2}", id, size, grayscale));
    string imageFileName = null;
    if (id > 0)
        using (new UnitOfWorkScope())
            imageFileName = RepositoryFactory.CreateReadOnly<Image>().Where(o => o.Id == id).Select(o => o.FileName).SingleOrDefault();
    CacheImageSize(id, size);
    if (!ImageWasModified(imageFileName))
    {
        Response.Cache.SetExpires(DateTime.Now.AddSeconds(30));
        Response.StatusCode = (int)HttpStatusCode.NotModified;
        Response.Status = "304 Not Modified";
        return new HttpStatusCodeResult((int)HttpStatusCode.NotModified, "Not-Modified");
    }
    byte[] fileContents;
    if (ShouldReturnDefaultImage(imageFileName))
        fileContents = GetDefaultImageContents(size, grayscale);
    else
    {
        bool foundImageFile;
        fileContents = GetImageContents(id, size, grayscale, imageFileName, out foundImageFile);
        if (!foundImageFile)
        {
            // No file found, clear cache, disable output cache
            //ClearOutputAndRuntimeCacheForImage(id, grayscale);
            //Response.DisableKernelCache();
        }
    }
    string contentType = GetBestContentType(imageFileName);
    Response.Cache.SetCacheability(HttpCacheability.Public);
    Response.Cache.SetLastModified(DateTime.Now);
    return new FileContentResult(fileContents, contentType);
}
private bool ImageWasModified(string fileName)
{
    bool foundImageFile;
    string filePath = GetFileOrDefaultPath(fileName, out foundImageFile);
    if (foundImageFile)
    {
        string header = Request.Headers["If-Modified-Since"];
        if (!string.IsNullOrEmpty(header))
        {
            DateTime isModifiedSince;
            if (DateTime.TryParse(header, out isModifiedSince))
            {
                return isModifiedSince < System.IO.File.GetLastWriteTime(filePath);
            }
        }
    }
    return true;
}

SignalR routing issue, get 200 ok but response empty

I have an existing MVC application into which I am integrating a hub. I have set up the hub like so:
routeTable.MapHubs("myapp/chat/room", new HubConfiguration { EnableCrossDomain = true, EnableDetailedErrors = true, EnableJavaScriptProxies = true });
Then on the client side I am connecting like so:
var connection = $.hubConnection(SystemConfiguration.ServiceUrl + "/myapp/chat/room", { useDefaultPath: false });
var hub = connection.createHubProxy("ChatHub"); // Same name as in the hub attribute
connection.start().done(function () { /* do stuff */ });
Then I see the HTTP Request like so:
http://localhost:23456/myapp/chat/room/negotiate?_=1374187915970
Response Headers
Access-Control-Allow-Credentials true, true
Access-Control-Allow-Headers content-type, x-requested-with, *
Access-Control-Allow-Methods GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Origin http://localhost:34567, http://localhost:34567
Access-Control-Max-Age 10000
Cache-Control no-cache
Content-Length 420
Content-Type application/json; charset=UTF-8
Date Thu, 18 Jul 2013 22:52:18 GMT
Expires -1
Pragma no-cache
Server Microsoft-IIS/8.0
X-AspNet-Version 4.0.30319
X-Content-Type-Options nosniff
Request Headers
Accept application/json, text/javascript, */*; q=0.01
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Content-Type application/x-www-form-urlencoded; charset=UTF-8
Host localhost:23456
Origin http://localhost:34567
Referer http://localhost:34567/myapp/chat?chatId=1764a2e3-ff6f-4a17-9c5f-d99642301dbf
User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0
The response contains no body, though it has a 200 status. I am debugging on the server and the hub methods are never hit. The only non-standard thing in this scenario is that I have a custom CORS HttpModule which intercepts traffic and appends the required CORS headers, as you can see in the response, so I am not sure whether this confuses SignalR's CORS support in some way. Anyway, I can see the HttpModule being hit, so the request gets past there fine, but it is somehow lost between there and the hub.
Tried googling but not much info on this topic...
The issue seems to be down to my CORS handling at the HttpModule level; it must somehow conflict with SignalR. If I put a check in the module to see whether the URL contains "chat/room" and just skip the request when it does, it then works fine. It feels like a hack, but at least it works now. A sketch of the check follows.
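A minimal sketch of that workaround, assuming a custom CORS module wired up via BeginRequest (the module and helper names here are hypothetical; only the "chat/room" check comes from the answer above):
using System;
using System.Web;

public class CorsModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.BeginRequest += (sender, e) =>
        {
            var context = ((HttpApplication)sender).Context;

            // Skip the custom CORS handling for SignalR's hub route so it
            // doesn't interfere with SignalR's own CORS negotiation.
            if (context.Request.RawUrl.Contains("chat/room"))
                return;

            AppendCorsHeaders(context.Response);
        };
    }

    public void Dispose() { }

    private static void AppendCorsHeaders(HttpResponse response)
    {
        // ... the existing custom CORS header logic goes here ...
        response.AppendHeader("Access-Control-Allow-Origin", "http://localhost:34567");
    }
}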
