According to Firebug, here are the response headers the first time the resource is retrieved:
Accept-Ranges: bytes
Cache-Control: public, max-age=86400
Content-Language: en
Content-Length: 232
Content-Location: http://localhost/myapp/cacheTest.html
Content-Type: text/html; charset=WINDOWS-1252
Date: Wed, 05 Sep 2012 15:59:31 GMT
Last-Modified: Tue, 01 May 2012 05:00:00 GMT
Server: Restlet-Framework/2.0.3
Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
I click away and click back, and here are the request headers sent to the server:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-us,en;q=0.5
Connection: keep-alive
Host: localhost
Referer: http://localhost/myapp/cacheTest2.html
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0
And so, naturally, the server can't send a 304 like I want, and instead sends the entire resource again.
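For clarity, the exchange I was expecting on the second visit looks like this sketch, echoing the Last-Modified value from above:

GET /myapp/cacheTest.html HTTP/1.1
Host: localhost
If-Modified-Since: Tue, 01 May 2012 05:00:00 GMT

HTTP/1.1 304 Not Modified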
This was happening in Firefox 14, and I thought it might be a bug, so I upgraded. But it is still happening in Firefox 15. Chrome has no problem.
I have tried both with and without an "Expires" header; it makes no difference. Firefox simply refuses to send an If-Modified-Since header.
Okay, I feel like a doofus, but I decided to put my pride aside and, rather than just deleting this question, share what the solution was in case anyone else ever does the same thing...
Once upon a time, in order to test something, I had turned off caching in Firefox. I turned it back on, and now it is sending the header.
For me, the problem turned out to be that the Last-Modified date in the response I was sending wasn't in the exact RFC 1123 format. Chrome didn't mind; it happily sent my malformed timestamp back in the If-Modified-Since header. Firefox, however, quietly ignored it.
I can see from your headers this wasn't the reason in your case, but I'm posting this answer anyhow, since it took me a while to realise this was the issue, and maybe some day somebody else will have the same problem.
This is under Linux, FWIW (Mint 17, to be precise) but I expect both browsers would behave the same way under other OSes.
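If you want to rule this out for your own responses, here is a minimal sketch (assuming Java 8's java.time) of emitting the fixed-length date format HTTP expects. Note that the built-in DateTimeFormatter.RFC_1123_DATE_TIME produces an unpadded day-of-month ("Tue, 1 May 2012 ..."), which is exactly the kind of almost-right timestamp a browser may quietly ignore:

import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class HttpDate {
    // HTTP wants the fixed-length date form: two-digit day, English names, GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    public static void main(String[] args) {
        // Prints e.g. "Wed, 05 Sep 2012 15:59:31 GMT"
        System.out.println(HTTP_DATE.format(ZonedDateTime.now(ZoneOffset.UTC)));
    }
}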
For me, it was that Firefox (ESR 60.4.0) wasn't sending "If-Modified-Since" or "If-None-Match" headers for some resources (like CSS and JS) when I loaded the site from the address bar.
However, when asking for a reload with Ctrl+R, it was sending both headers, but the resources were still reloaded with "200 OK" even when they should have returned "304 Not Modified".
After some tracking, I found out that it was due to the Apache 2.4.25 deflate module. If resources were compressed, they effectively would not be cached (that is, they would be reloaded on the next access). Looking further into it, it turns out to be due to ETag handling when using deflate: mod_deflate appends "-gzip" to the ETag, so the validator the client sends back in If-None-Match never matches what the server compares against.
So the most reasonable kludge for me was "FileETag None", and now I properly get a "304" even for compressed documents when I do Ctrl+R.
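For reference, the relevant configuration; the second directive is the alternative workaround from the mod_deflate documentation (it rewrites the client's If-None-Match so that both the plain and the "-gzip" ETag variants match), in case you want to keep ETags:

# Stop emitting ETags, so mod_deflate's "-gzip" suffix can no longer
# break conditional requests:
FileETag None

# Alternative from the mod_deflate docs, keeping ETags intact:
RequestHeader edit "If-None-Match" '^"((.*)-gzip)"$' '"$1", "$2"'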
Amazingly, even after that, the Network panel still showed a green "200 OK" indicating a full retrieval of CSS and JS (and no "If-Modified-Since" in the detailed "Request headers" panel), which drove me crazy until I figured out that this column is sometimes misleading: there is another column called "Transferred", and if it says "cached" instead of a number of bytes, it means Firefox actually got the resource from its internal cache, not from the network request it shows. Checking the access_log on the server side confirmed that there was no network activity in those cases, so that part is just bad UI.
Another reason that can cause Firefox not to cache requests is a full disk, at least on OS X.
This is extra puzzling because Safari at that point still properly caches requests, and Firefox could at least cache the requests in memory.
Clearing the cache and freeing some space on the disk helps.
I'm not completely sure whether this belongs on SO, but I don't know where else to ask.
While I was checking the loading speed of a web app of mine, I noticed that apparently no HTTP response (no matter what type: HTML, CSS, JS) is gzip/deflate compressed. That is, no response header like "Content-Encoding: gzip" is present in any response, and the browser reports that the resource is not compressed.
Tested and confirmed in multiple browsers (IE10, FF 17, Chrome 23, Opera 12.10, Safari 5.x).
Tested and confirmed on two machines running Windows 8 Pro.
Double-checked with Fiddler: the response is not compressed and does not contain a Content-Encoding header.
This doesn't only happen for my web apps; no other web site I tested appears to send compressed responses (according to the browser).
On Windows 7 the responses arrive compressed and with all headers.
HTTPS responses are compressed.
Here's an example of the response headers (note the lack of the content-encoding header):
I also checked the server side. The server is running Windows Server 2008 R2/IIS 7.5. I used Failed Request Tracing to find out what the server is sending. The resource appears to be compressed:
Also, the server seems to send the proper headers:
My conclusion: it must be Windows 8 that is intervening here. Apparently it modifies HTTP responses. I suppose that Windows 8 receives the compressed response, decompresses it, removes the Content-Encoding header and passes the modified response further down the pipeline.
Now my questions:
Can anybody confirm that Windows 8 modifies HTTP responses and that it works the way I described?
Is there a way to monitor or even disable this behavior?
Thanks in advance for your answers.
Regards,
Andre
Update:
I used Wireshark to see what arrives at the client. As I expected, the resources arrive compressed and the Content-Encoding header is still present on the wire; the capture also showed, for comparison, the response as received by Chrome.
This confirms my assumption that Windows 8 is intervening.
It turned out that the culprit was my antivirus software, Avast, more specifically the integrated real-time network-shield. Turning it off causes responses to appear compressed in the browsers again.
What remains interesting is that Avast was running on the Windows 7 machines as well, even though on those machines responses were compressed where applicable during my tests.
What is the current state of affairs when it comes to whether to do
Transfer-Encoding: gzip
or
Content-Encoding: gzip
when I want to allow clients with e.g. limited bandwidth to signal their willingness to accept a compressed response, while the server has the final say whether or not to compress?
The latter is what e.g. Apache's mod_deflate and IIS do, if you let them take care of compression. Depending on the size of the content to be compressed, they will additionally apply Transfer-Encoding: chunked.
They will also include a Vary: Accept-Encoding header, which already hints at the problem: Content-Encoding seems to be part of the entity, so changing the Content-Encoding amounts to a change of the entity, i.e. a different Accept-Encoding header means e.g. a cache cannot use its cached version of the otherwise identical entity.
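Concretely, a typical on-the-fly compressed response from such a server looks something like this (a sketch, not captured output):

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
Transfer-Encoding: chunked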
Is there a definite answer on this that I have missed (and that's not buried inside a message in a long thread in some apache newsgroup)?
My current impression is:
Transfer-Encoding would in fact be the right way to do what is mostly done with Content-Encoding by existing server and client implementations
Content-Encoding, because of its semantic implications, carries a couple of issues (what should the server do to the ETag when it transparently compresses a response?)
The reason is a chicken-and-egg problem: browsers don't support it because servers don't, because browsers don't...
So I am assuming the right way would be a Transfer-Encoding: gzip (or, if I additionally chunk the body, it would become Transfer-Encoding: gzip, chunked). And no reason to touch Vary or ETag or any other header in that case as it's a transport-level thing.
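On the wire, that would look something like the following sketch; note that in RFC 2616 the request-side counterpart for transfer codings is the TE header rather than Accept-Encoding, and that, as assumed above, neither ETag nor Vary needs to change:

GET /data HTTP/1.1
Host: example.com
TE: gzip

HTTP/1.1 200 OK
Content-Type: text/plain
ETag: "v1"
Transfer-Encoding: gzip, chunked

<chunked, gzip-compressed body>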
For now I don't care too much about the 'hop-by-hop'-ness of Transfer-Encoding, something that others seem to be concerned about first and foremost, because proxies might uncompress and forward uncompressed to the client. However, proxies might just as well forward it as-is (compressed), if the original request has the proper Accept-Encoding header, which in case of all browsers that I know is a given.
Btw, this issue is at least a decade old, see e.g.
https://bugzilla.mozilla.org/show_bug.cgi?id=68517 .
Any clarification on this will be appreciated. Both in terms of what is considered standards-compliant and what is considered practical. For example, HTTP client libraries only supporting transparent "Content-Encoding" would be an argument against practicality.
The correct usage, as defined in RFC 2616 and actually implemented in the wild, is for the client to send an Accept-Encoding request header (the client may specify multiple encodings). The server may then, and only then, encode the response according to the client's supported encodings (if the data is not already stored in that encoding), indicating in the Content-Encoding response header which encoding is being used. The client can then read the data off the socket based on the Transfer-Encoding (i.e., chunked) and then decode it based on the Content-Encoding (e.g., gzip).
So, in your case, the client would send an Accept-Encoding: gzip request header, and then the server may decide to compress (if not already) and send a Content-Encoding: gzip and optionally Transfer-Encoding: chunked response header.
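A minimal sketch of that client side in Java; the URL is hypothetical, and note that HttpURLConnection handles the chunked transfer coding transparently, so only the Content-Encoding needs to be undone by hand:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipClient {
    public static void main(String[] args) throws Exception {
        // Advertise gzip support; the server may or may not honour it.
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://example.com/").openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");

        // Decode based on the Content-Encoding response header.
        InputStream body = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            body = new GZIPInputStream(body);
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(body, "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}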
And yes, the Transfer-Encoding header can be used in requests, but only for HTTP 1.1, which requires that both client and server implementations support the chunked encoding in both directions.
ETag uniquely identifies the resource data on the server, not the data actually being transmitted. If a given URL resource changes its ETag value, it means the server-side data for that resource has changed.
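For illustration, the round trip an ETag enables (values hypothetical):

GET /styles.css HTTP/1.1
Host: example.com
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"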
Quoting Roy T. Fielding, one of the authors of RFC 2616:
changing content-encoding on the fly in an inconsistent manner (neither "never" nor "always") makes it impossible for later requests regarding that content (e.g., PUT or conditional GET) to be handled correctly. This is, of course, why performing on-the-fly content-encoding is a stupid idea, and why I added Transfer-Encoding to HTTP as the proper way to do on-the-fly encoding without changing the resource.
Source: https://issues.apache.org/bugzilla/show_bug.cgi?id=39727#c31
In other words: Don't do on-the-fly Content-Encoding, use Transfer-Encoding instead!
Edit: That is, unless you want to serve gzipped content to clients that only understand Content-Encoding. Which, unfortunately, seems to be most of them. But be aware that you leave the realms of the spec and might run into issues such as the one mentioned by Fielding as well as others, e.g. when caching proxies are involved.
I recently discovered something (that surprised me) when opening an audio file in Firefox or Chrome. If I don't specify the HTTP response header "Accept-Ranges: bytes", Firefox won't be able to determine the length of the Ogg file (in seconds) before reaching the end in playback. Chrome will discover the length of the Ogg file (in seconds), but the audio player appears to crash when it reaches the end, and refuses to re-play the file after the crash. Other browsers were not tested.
Working HTTP response headers:
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Type: application/ogg
Content-Length: 245646
Failing HTTP response headers:
HTTP/1.1 200 OK
Content-Type: application/ogg
Content-Length: 245646
This is strange to me, because I'm not using any partial-content ranges. My server implementation doesn't even support them (so I think my server might be lying when it says "Accept-Ranges: bytes"). I certainly don't see why this header should be required for playback within the browser. Do both browsers just have bugs, which are exposed when I don't set the Accept-Ranges header? This seems unlikely to me. Can anyone explain?
Thanks!
Not sure whether this is a bug or just a skewed interpretation of the standard (RFC 2616, section 14.5, Accept-Ranges). The standard states MAY and not MUST...
OTOH it may be that the browsers' audio playback modules need this header in order to stream and/or seek. You could try what happens if you send "Accept-Ranges: none"... if they conform to HTTP 1.1 even a little, that should just work...
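For context: advertising "Accept-Ranges: bytes" invites the browser to seek with partial-content requests like the following sketch, so a server that advertises ranges it cannot actually honour may well confuse the player:

GET /audio.ogg HTTP/1.1
Host: example.com
Range: bytes=245000-245645

HTTP/1.1 206 Partial Content
Content-Range: bytes 245000-245645/245646
Content-Length: 646
Content-Type: application/ogg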
I'm seeing a behaviour in Firefox which seems to me unexpected. This is not particularly repeatable (unfortunately) but does crop up from time to time. Once it starts, it is repeatable on refreshing the page until a full refresh (ctrl-f5) is done. Last time, I managed to get a trace.
Basically, FF4.0.1 is requesting a resource (from an ASP.NET MVC 3 application running under IIS7):
GET http://www.notarealdomain.com/VersionedContent/Scripts/1.0.40.5653/jquery.all.js HTTP/1.1
Host: www.notarealdomain.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: */*
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Proxy-Connection: keep-alive
Referer: http://www.notarealdomain.com/gf
If-Modified-Since: Sat, 01 Jan 2000 00:00:00 GMT
It then gets the following response from the server (via a proxy, but I can see in the IIS server logs that the request was made all the way to the server):
HTTP/1.1 304 Not Modified
Date: Mon, 09 May 2011 14:00:47 GMT
Cache-Control: public
Via: 1.1 corp-proxy (NetCache NetApp/6.0.3P2D5)
This seems reasonable - the client makes a conditional request (if-modified-since), and the server responds "ok - use your copy" (304 Not Modified).
The problem is that the client in this case then does not dish up the file: it behaves as if there were no content (i.e. if an image, it doesn't appear; if JS, the page behaves as if the .js file were missing; if CSS, the page renders without the CSS styles; etc.). This is apparent both on the web page itself and when using the excellent HttpWatch tool. HttpWatch clearly shows that the browser did have the item in cache, but didn't use it as the content source.
What am I missing here? The above conversation seems reasonable, so why does FF make a conditional request and then not use its cached copy when told to do so? If you then ctrl-F5 to force a full refresh, the behaviour goes away, and only returns sporadically.
We also have some anecdotal evidence that this occurs with FF3 and Chrome too, but we haven't confirmed this with any forensic data.
Has anyone else seen this, and does anyone know what might be going wrong, or what further steps might isolate the problem?
OK - I got to the bottom of this problem. As ever - the answer was staring me in the face!
The problem was not with the caching behaviour at all. Another part of the system was sporadically failing and writing out a zero-length file, which was being cached by Firefox.
So when Firefox sent a conditional request to the server and received its 304/Not Modified, it dutifully went off and used the corrupt zero-length version of the file that it found in its cache.
All the signs were there for me to see - it just took a bit of time to get there :)
Thanks all, for your comments and suggestions.
I'm using Apache Abdera to POST atom multipart data to my server, and am having some odd problems that I can't pin down.
It looks like an issue with chunked transfer encoding, but I'm insufficiently experienced to be certain. The problem manifests as the server throwing an error indicating that the request I sent it contains only one mime part, not two as required. I attached Wireshark to the interface and captured the conversation, and it went like this:
POST /sss/col-uri/2ee98ea1-f9ad-4f01-9b1c-cfa3c4a6dc3c HTTP/1.1
Host: localhost
Expect: 100-continue
Transfer-Encoding: chunked
Content-Type: multipart/related; boundary="1306399868259";type="application/atom+xml;type=entry"
The server's response:
HTTP/1.1 100 Continue
My client continues:
198
--1306399868259
Content-Type: application/atom+xml;type=entry
Content-Disposition: attachment; name="atom"
<entry xmlns="http://www.w3.org/2005/Atom"><title xmlns="http://purl.org/dc/terms/">Richard Woz Ere</title><bibliographicCitation xmlns="http://purl.org/dc/terms/">this is my citation</bibliographicCitation><content type="application/zip" src="cid:48bd9436-e8b6-4f68-aa83-5c88eda52fd4" /></entry>
0
b0e9
--1306399868259
Content-Type: application/zip
Content-Disposition: attachment; name="payload"; filename="example.zip"
Content-ID: <48bd9436-e8b6-4f68-aa83-5c88eda52fd4>
Packaging: http://purl.org/net/sword/package/SimpleZip
And at this point the server responds with:
HTTP/1.1 400 Bad Request
Date: Thu, 26 May 2011 08:51:08 GMT
Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/0.9.8l DAV/2 mod_wsgi/3.3 Python/2.6.1
Connection: close
Transfer-Encoding: chunked
Content-Type: text/xml
Indicating the error (which is well understood). My client goes on to stream a pile of base64-encoded bits onto the output stream, but in the meantime the server is no longer listening; it has already decided that the request was erroneous.
Unfortunately, I'm not in charge of the HTTP layer - this is all handled by Abdera using Apache httpclient. My code that does this looks like this:
client.execute("POST", url.toString(), new SWORDMultipartRequestEntity(deposit), options);
Here, the SWORDMultipartRequestEntity is a copy of the standard Abdera MultipartRequestEntity class, with a few extra headers thrown in (see, for example, Packaging in the above snippet); the "deposit" argument is just an object holding the atom part and the input stream.
When I attach a debugger, I get to this line of code fine; it then disappears down a rat hole, and I get this error back.
Any hints or tips? I've pretty much exhausted my angles of attack!
The only thing that stands out to me is that immediately after the atom:entry document there is a line with a lone "0" on it, which appears to be chunked-transfer-encoding speak for "I'm finished". I'm not sure how it got there, or whether it really has any effect. Help much appreciated.
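For reference, a chunked body is a sequence of "<size in hex> CRLF <data> CRLF" frames terminated by a zero-size chunk, so the intended framing of this request body would be roughly:

198\r\n
<0x198 bytes of chunk data>\r\n
b0e9\r\n
<0xb0e9 bytes of chunk data>\r\n
0\r\n
\r\n

A stray 0 frame between the two parts therefore ends the request body early, and the server parses whatever follows as the start of a new message.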
Cheers,
Richard
The lonely 0 may indeed be a problem. My uninformed guess is that it results from some call to flush(), which then writes the whole buffer as another HTTP chunk. Unfortunately, at the point where flush is called, the buffer has already been flushed and its size is therefore zero. So the HttpChunkedOutputFilter (or whatever it is called) should be taught that an empty buffer does not need to be flushed.
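A minimal sketch of the suspected failure mode and the guard described above (the class and field names here are hypothetical, not httpclient's actual code):

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative chunked-encoding filter. Without the check in flush(),
// flushing an empty buffer would emit a zero-size chunk, which on the
// wire means "end of body".
class ChunkingOutputStream extends FilterOutputStream {
    private final byte[] cache = new byte[2048];
    private int cachePosition = 0;

    ChunkingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        cache[cachePosition++] = (byte) b;
        if (cachePosition == cache.length) {
            flushCache();
        }
    }

    // Emits one chunk: <size in hex> CRLF <data> CRLF
    private void flushCache() throws IOException {
        out.write(Integer.toHexString(cachePosition).getBytes("US-ASCII"));
        out.write("\r\n".getBytes("US-ASCII"));
        out.write(cache, 0, cachePosition);
        out.write("\r\n".getBytes("US-ASCII"));
        cachePosition = 0;
    }

    @Override
    public void flush() throws IOException {
        if (cachePosition > 0) { // the guard: never emit an empty chunk here
            flushCache();
        }
        out.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        out.write("0\r\n\r\n".getBytes("US-ASCII")); // the one legitimate zero chunk
        out.close();
    }
}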
[update:] You should set a breakpoint in the ChunkedOutputStream class, especially in the flush method. I just looked at its code and it seems to be OK, but maybe I missed something.