I'm running a web application using Tomcat 7. The app is returning everyone's favorite HTTP header, Transfer-Encoding: chunked. When I navigate to the root url with Chrome (and Firefox, fwiw), Chrome only downloads the first 16384 bytes of the resource. The resource is definitely larger than that size. Why is Chrome only fetching the initial chunk? I see this when accessing the resource with curl:
$ curl -i http://localhost:8080
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=9C7D0EE5D26AD7C99551204CE2896422; Path=/; HttpOnly
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 03 Dec 2013 23:07:09 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
(I know there are several questions on this topic, but they seem to involve disabling Transfer-Encoding: chunked. I want to know why Chrome isn't handling the situation correctly.)
Related
I'm trying to obtain the HTML dump of some RFC's from IETF website, via a simple GET request. However, it responds with status code 301. I'm making use of netcat to simulate the HTTP GET request with the following command :
$ printf 'GET /html/rfc3986 HTTP/1.1\r\nHost: tools.ietf.org\r\nConnection: close\r\n\r\n' | nc tools.ietf.org 80
The following reply is obtained as a result of the above command :
HTTP/1.1 301 Moved Permanently
Date: Wed, 09 Sep 2020 15:36:36 GMT
Server: Apache/2.2.22 (Debian)
Location: https://tools.ietf.org/html/rfc3986
Vary: Accept-Encoding
Content-Length: 323
Connection: close
Content-Type: text/html; charset=iso-8859-1
X-Pad: avoid browser bug
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
<hr>
<address>Apache/2.2.22 (Debian) Server at tools.ietf.org Port 80</address>
</body></html>
However, if I try to send a HTTP/1.0 based HEAD request to the Location value determined in the above reply, I get status 404 in reply. I made use of HEAD method just to check the status code of the reply.
Command :
printf 'HEAD https://tools.ietf.org/html/rfc3986 HTTP/1.0\r\n\r\n' | nc tools.ietf.org 80
Reply:
HTTP/1.1 404 Not Found
Date: Wed, 09 Sep 2020 16:32:18 GMT
Server: Apache/2.2.22 (Debian)
Vary: accept-language,accept-charset,Accept-Encoding
Accept-Ranges: bytes
Connection: close
Content-Type: text/html; charset=iso-8859-1
Content-Language: en
Expires: Wed, 09 Sep 2020 16:32:18 GMT
Is there a mistake in the way I'm making use of GET method to obtain the results?
You are sending a plain text request to port 80, so the URL you are trying is effectively http://tools.ietf.org/html/rfc3986
The response is telling you to instead request https://tools.ietf.org/html/rfc3986. That's not a different path on the same server, but a full URL.
The difference is that it begins https meaning you need to make a TLS-secured connection on port 443.
That's not going to be possible with a trivial use of netcat, so you're better off using an HTTP client like curl or wget
From this wikipedia example
a typical HTTP response may look like this
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Encoding: UTF-8
Content-Length: 138
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
ETag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Connection: close
<html>
<head>
<title>An Example Page</title>
</head>
<body>
Hello World, this is a very simple HTML document.
</body>
</html>
In this case because of this header:
Connection: close
the client will know that the HTTP body has finished, because the TCP connection will be closed, Say for example Connection: close wasn't present, the client would still know the HTTP body was finished because this header:
Content-Length: 138
Is saying, once you get 138 bytes, we're done here. But in the case where neither of these headers are used and instead the server sends Transfer-encoding: chunked how does a browser know when a response has completed, so that it can move onto other things?
From Wikipedia:
The chunked keyword in the Transfer-Encoding header is used to
indicate chunked transfer. Each chunk is preceded by its size. The
transmission ends when a zero-length chunk is encountered.
Context
My github pages are not refreshing. After diagnosing my conclusion is it's a server side caching effect.
What I did + diagnostic results
The site is working OK.
I made a change in index.html in my local
repo, then commit and push
I completely cleared my browser cache (btw also using cache clear plugins, and Chrome dev tools set not using cache)
Reloaded the page, with ctrl+f5 and ctrl+R (change is not applied)
Checked using github.com read index.html, the change is there, committed.
Monitored the traffic with Fiddler. The request for index.html sent, full response received, the content is the old NOT changed.
Examined the response header with Fiddler, says: (see header exhibit)
Reverse diagnostic
I've issued a request with a usual trick typeing: index.html?v001orAnythingYouWant and I got the new version of the page
Problem
Problem solved one can say, but it is not true. When I refresh images, css, js still this effect will prevent me to see the new result.
Question
How can I configure or overcome this server side caching, of course only for development/testing time?
Response header exhibit
HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Last-Modified: Fri, 06 May 2016 12:24:29 GMT
Access-Control-Allow-Origin: *
Expires: Fri, 06 May 2016 12:45:44 GMT
Cache-Control: max-age=600
X-GitHub-Request-Id: B91F111E:5AA6:47804:572C8F9F
Content-Length: 43752
Accept-Ranges: bytes
Date: Fri, 06 May 2016 12:35:57 GMT
Via: 1.1 varnish
Age: 13
Connection: keep-alive
X-Served-By: cache-fra1238-FRA
X-Cache: HIT
X-Cache-Hits: 1
Vary: Accept-Encoding
X-Fastly-Request-ID: 1758f53052edbfb40a0044407d53d5654ad1e983
I'd like to get the whole page using telnet:
telnet
o test.bugs3.com 80
GET / HTTP/1.0
Actually I can get almost any website but this one. The same problem occurs with other free servers. I just want to know what exactly causes some restriction like that.
The request is as following:
Connected.
HTTP/1.1 200 OK
Server:
Date: Mon, 11 Nov 2013 04:11:47 GMT
Content-Type: text/html
Content-Length: 328
Last-Modified: Thu, 16 May 2013 12:17:53 GMT
Connection: close
Accept-Ranges: bytes
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>Account unavailable</title>
</head><body>
<h1>Account unavailable</h1>
<p>Maybe account have been moved, deleted, suspended or not activated yet.
<p>The requested resource could not be found but may be available again in
the future.
<hr>
</body></html>
It's because you aren't sending a Host: test.bugs3.com\r\n header. RFC 2616 #14.23: "A client MUST include a Host header field in all HTTP/1.1 request messages."
I have a quick question but in advance I've read the RFC 2616 Chapter 14.22 about Host and HTTP Header but I still not understand where in httpd.conf or configuration file of a webserver should be changed? Please correct me if I'm wrong.
Look at following two HTTP GET I did to an Apache. The first one is GET for HTTP 1.0 , the other one is GET for HTTP 1.1. See the output:
HTTP/1.0 200 OK
Date: Thu, 24 Oct 2013 03:46:22 GMT
Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a PHP/5.2.9 mod_throttle/3.1.2 mod_psoft_traffic/0.2 mod_ssl/2.8.31 OpenSSL/0.9.8b
Vary: *
Last-Modified: Fri, 10 Aug 2012 20:22:30 GMT
ETag: "17c815b-3b-50256d86"
Accept-Ranges: bytes
Content-Length: 59
Connection: close
Content-Type: text/html
<html>
<body>
<center>webli7</center>
</body>
</html>
HTTP/1.1 400 Bad Request
Date: Thu, 24 Oct 2013 04:04:40 GMT
Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a PHP/5.2.9 mod_throttle/3.1.2 mod_psoft_traffic/0.2 mod_ssl/2.8.31 OpenSSL/0.9.8b
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
16e
The HTTP protocol version is decided dynamicaly, not through configuration files. The client send a request specifying the highest protocol version that its support. Then, the server must respond with either the version requested by the client, or any earlier version that it prefers.
Since Apache does support HTTP/1.1, it should therefore match exactly the version provided by the client.
There exist a flag that you may set in Apache's config to force Apache to use HTTP/1.0 in certain situations, even though the browser requested HTTP/1.1. This is used to fix bugs in HTTP/1.1 handling of some very old browser. Today, you should not need to play with this flag.
As for your error, I would suggest that you make sure that your GET does provide the Host: header. This header is required in HTTP/1.1, yet optional in HTTP/1.0, and having it missing would certainly result in a 400 error.