HTTP/1.1 to HTTP/2: what about headers?

In HTTP 1.1, the status line was
scheme/version code reason
HTTP/1.1 200 OK
I see :scheme and :status headers in the HPACK spec. However, I don't see anything for the version or the reason phrase. Is there not one?
In a request in HTTP 1.1, the request line was
method uri scheme/version
POST http://myhost.com HTTP/1.1
I see :method and I see :path, which I think is just a relative path, not the same as the full absolute URI (and since Chrome and Firefox are pushing HTTPS for HTTP/2, this may make sense). I do not see a version header, though.
Is there a version header? Or is it seen that this will always be known before the protocol decision such that it is not really needed?
What about reason phrases? Is it assumed these are pretty constant, so they go away (I am guessing here)?

In HTTP/1, the version token was needed to differentiate HTTP/1.0 from HTTP/1.1, since they had the same wire representation but supported different features.
For example, a client declaring HTTP/1.1 implicitly tells the server that it supports persistent connections and content chunking.
With HTTP/2, the protocol version is negotiated.
In clear-text HTTP/2, the Upgrade header carries the token h2c, where the 2 means version 2 of the protocol.
Similarly, for encrypted HTTP/2, the token h2 is negotiated via ALPN. (For HTTP/3, which runs over QUIC, the ALPN token is h3; there is no clear-text variant.)
Reason messages have been dropped as being redundant, as the status code was already conveying all the necessary information (not to mention that they could be attack vectors).
For these reasons, HTTP/2 has neither a version nor a reason pseudo-header.
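For illustration, here is roughly how the HTTP/1.1 exchange from the question maps onto HTTP/2 pseudo-headers (the :authority and :scheme values just mirror the example, and the path is assumed to be /):

Request:
:method = POST
:scheme = http
:authority = myhost.com
:path = /

Response:
:status = 200

No version token, no reason phrase: the version is implied by the negotiated protocol, and the status code stands on its own.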

How to send HTTP Headers during/after HTTP Body stream? Is there spec work on this?

Today, HTTP headers all need to be sent before a single bit of HTTP body is sent to the browser.
This is especially problematic with new technologies such as React 18 Streaming where certain headers, such as caching headers and 103 Early Hints, can be determined with certainty only at the end of the HTTP stream. Ideally these late headers would be sent to the browser just before ending the stream.
Are there efforts from spec working groups or browser vendors to enable headers to be sent during/after the HTTP body?
After doing research, it seems that there is no spec work about this, but I wonder if there is a browser vendor working on this? (Some browser folks are active here on StackOverflow.)
Context: I'm the author of vite-plugin-ssr and react-streaming.
There is a specification for trailer fields for use with chunked encoding (HTTP/1.1, https://httpwg.org/specs/rfc7230.html#header.trailer).
The HTTP/2 spec (which does not support chunked encoding) directly allows a HEADERS frame to follow the DATA frames that carry the HTTP body: https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.
Library support may vary, as most HTTP libraries attempt to abstract away the differences in the underlying protocols. In JavaScript you will be interested in enabling trailing headers in the cross-browser standard fetch API. The MDN docs suggest that support is coming, with reference to a trailers field on the Response object: https://developer.mozilla.org/en-US/docs/Web/API/Response.
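To make the HTTP/1.1 mechanism concrete, here is a minimal sketch in Python (plain sockets; the X-Body-Digest field name, its value, and the port are invented for illustration) of a server sending a chunked response followed by a trailer field:

import socket

# Minimal one-shot HTTP/1.1 server: sends a chunked body, then a
# trailer field after the final (zero-length) chunk.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 8080))
srv.listen(1)
conn, _ = srv.accept()
conn.recv(65536)  # read (and ignore) the request

body = b"hello trailers"
head = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Trailer: X-Body-Digest\r\n"  # announce the trailer field
    b"\r\n"
)
chunk = b"%x\r\n%s\r\n" % (len(body), body)  # a single data chunk
tail = (
    b"0\r\n"                             # zero-length final chunk
    b"X-Body-Digest: example-value\r\n"  # the trailer itself
    b"\r\n"
)
conn.sendall(head + chunk + tail)
conn.close()
srv.close()

A client that understands trailers reads the chunked body up to the zero chunk and then parses any header lines that follow, up to the final blank line.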

HTTP Connection: Keep-Alive

I was looking at the HTTP 1.1 spec, specifically the part related to the 'Connection' header. I noticed that the only token specified for the 'Connection' header is "close". After a little digging I found that the 'Keep-Alive' token found in the 'Connection' header of many server implementations, including Vim's, which runs Apache 2.2.3, is left over from HTTP 1.0. Given the widespread use of HTTP 1.1, how much value is there in adding Keep-Alive and similar tokens inherited from HTTP 1.0?
Some value; depends on the specific use.
In HTTP 1.1, all connections are considered persistent unless declared otherwise.
In practice, implementations do what they want:
When the client sends another request [after an HTTP Connection: Keep-Alive], it uses the same connection.
This will continue until either the client or the server decides that
the conversation is over, and one of them drops the connection.
So, it's really up to the implementers of the clients and servers to determine how long they keep the TCP connection open for. For example,
The default connection timeout of Apache 2.0 httpd is as little as
15 seconds, and for Apache 2.2 only 5 seconds.
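To see what persistence looks like from the client side, here is a small sketch using Python's standard http.client (example.com is just a placeholder): two requests ride on one TCP connection as long as neither side closes it:

import http.client

# One HTTPConnection object maps to one TCP connection; with
# HTTP/1.1 it stays open between requests unless a side sends
# Connection: close or an idle timeout (like Apache's) fires.
conn = http.client.HTTPConnection("example.com", 80)

conn.request("GET", "/")
first = conn.getresponse()
first.read()  # drain the body before reusing the connection

conn.request("GET", "/")  # same TCP connection, no new handshake
second = conn.getresponse()
print(second.status, second.getheader("Connection"))

conn.close()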
It looks like SPDY will form the basis for the upcoming HTTP 2.0. This changes connection handling dramatically.
Sources:
http://en.wikipedia.org/wiki/HTTP_persistent_connection#HTTP_1.1
http://en.wikipedia.org/wiki/SPDY
http://en.wikipedia.org/wiki/HTTP_2.0
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-http2-08

Chunked encoding and content-length header

Is it possible to set the Content-Length header and also use chunked transfer encoding? And does doing so solve the problem of the client not knowing the length of the response when using chunked?
The scenario I'm thinking about is when you have a large file to transfer and there's no problem determining its size, but it's too large to be buffered completely.
(If you're not using chunked, then the whole response must get buffered first? Right??)
Thanks.
No:
"Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored." (RFC 2616, Section 4.4)
And no, you can use Content-Length and stream; the protocol doesn't constrain how your implementation works.
Well, you can always send a header stating the size of the file.
Something like response.addHeader("File-Size","size of the file");
And ignore the Content-Length header.
The client implementation has to be tweaked to read this value, but hey, you can achieve both things you want :)
You have to use either Content-Length or chunking, but not both.
If you know the length in advance, you can use Content-Length instead of chunking even if you generate the content on the fly and never have it all at once in your buffer.
However, you should not do that if the data is really large because a proxy might not be able to handle it. For large data, chunking is safer.
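A sketch of that second option in Python (plain sockets; the file path and port are hypothetical): the length comes from stat(), so Content-Length can be sent up front while the body is streamed in small pieces, never fully buffered:

import os
import socket

PATH = "/tmp/big.bin"  # hypothetical large file

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 8080))
srv.listen(1)
conn, _ = srv.accept()
conn.recv(65536)  # read (and ignore) the request

# The size is known from the filesystem without reading the file.
size = os.path.getsize(PATH)
conn.sendall(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/octet-stream\r\n"
    + b"Content-Length: %d\r\n\r\n" % size
)
with open(PATH, "rb") as f:
    while True:
        piece = f.read(64 * 1024)  # stream 64 KiB at a time
        if not piece:
            break
        conn.sendall(piece)
conn.close()
srv.close()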
These headers can be the cause of a Postman "Parse Error":
"Content-Length" and "Transfer-Encoding" can't be present in the response headers together.
In my case (a Spring controller), using a parameterized ResponseEntity<?> instead of the raw ResponseEntity type fixed the issue.
The question asks:
Is it possible to set the content-length header and also use chunked transfer encoding?
The RFC HTTP/1.1 spec, quoted in Julian's answer, says:
Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding.
There is an important difference between what's possible, and what's allowed by a protocol. It is certainly possible, for example, for you to write your own HTTP/1.1 client which sends malformed messages with both headers. You would be violating the HTTP/1.1 spec in doing so, and so you'd imagine some alarm bells would go off and a bunch of Internet police would burst into your house and say, "Stop, arrest that client!" But that doesn't happen, of course. Your request will get sent to wherever it's going.
OK, so you can send a malformed message. So what? Surely on the receiving end, the server will detect the HTTP/1.1 protocol client-side violation, vanquish your malformed request, and serve you back a stern 400 response telling you that you are due in court the following Monday for violating the protocol. But no, actually, that probably won't happen. Of course, it's beyond the scope of HTTP/1.1 to prescribe what happens to misbehaving clients; i.e. while the HTTP/1.1 protocol is analogous to the "law", there is nothing in HTTP/1.1 analogous to the judicial system.
The best that the HTTP/1.1 protocol can do is dictate how a server must act/respond in the case of receiving such a malformed request. However, it's quite lenient in this case. In particular, the server does not have to reject such malformed requests. In fact, in such a scenario, the rule is:
If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored.
Unfortunately, though, some HTTP servers will violate that part of the HTTP/1.1 protocol and will actually give precedence to the Content-Length header, if both headers are present. This can cause a serious problem, if the message visits two servers in sequence in the same system and they disagree about where one HTTP message ends and the next one starts. It leaves the system vulnerable to HTTP Desync attacks a.k.a. Request Smuggling.
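As a schematic illustration of the desync (host and values invented; the 6 counts the body bytes "0", CRLF, CRLF, "X"), consider a front-end that honors Content-Length forwarding this to a back-end that honors Transfer-Encoding:

POST / HTTP/1.1
Host: example.com
Content-Length: 6
Transfer-Encoding: chunked

0

X

The front-end treats all 6 body bytes as one request; the back-end sees the body end at the zero chunk and leaves the stray X on the connection, where it becomes the first byte of the next request. The two servers now disagree about where messages begin and end.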

How could we fool the HTTP protocol?

Although HTTP is ubiquitous, it comes with its baggage of headers, which in my case is becoming a problem.
The data I need to transfer is a tiny fraction of the HTTP header size.
Is there another protocol I can use that is still understood by browsers and other networks, but doesn't come with the baggage of HTTP?
Is there any other way to skip the headers and add them at the destination, so that only a minuscule amount of data is transferred over the network?
No.
No.
Many HTTP headers are optional. A typical browser request is much larger than a minimal request, which might look like:
GET /doc HTTP/1.1
Host: example.com
Connection: close
(I can say with confidence that requests of this form work because I use them all the time when testing Web server response via telnet example.com 80.)
Possibly you can get useful results simply by omitting some headers.
HTTP requests can be quite small. As chaos points out in his answer, you don't really need to send many headers with a request. The only header that's essential is Host. I can simplify chaos' example a bit more by using HTTP 1.0, which doesn't feature persistent connections.
GET / HTTP/1.0
Host: example.com
(blank line is necessary)
The reply can be similarly simple:
HTTP/1.0 200 OK
Content-Type: text/html
data content
In this case, the overhead of HTTP is about 40 bytes in each of the request and the response. A typical Ethernet frame carries up to 1500 bytes, so you have plenty of room left over in the response packet for the actual data.
There are other HTTP headers, and they do have value. You can include cache information and do conditional GETs. You can use an HTTP/1.1 persistent socket to make subsequent requests faster. Etc, etc. You don't have to use any of this stuff if you don't want, but one nice thing about HTTP is there's a standard way to do more complicated protocols when you need it.
As for doing minimal HTTP in JavaME, if you really care about every byte, you may be best off writing your own simple HTTP client by working with a plain TCP socket. If you're talking to a known server you don't need to implement much at all. (If you're talking to arbitrary servers, you need to pay more attention to error handling, redirects, etc.)
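JavaME aside, the idea looks roughly the same in any language; here is a minimal sketch with Python sockets (example.com as placeholder):

import socket

HOST = "example.com"  # placeholder server

# Minimal HTTP/1.0 client: send one request, read until the server
# closes the connection (HTTP/1.0's default way to end a response).
sock = socket.create_connection((HOST, 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: %s\r\n\r\n" % HOST.encode())

raw = b""
while True:
    data = sock.recv(4096)
    if not data:
        break
    raw += data
sock.close()

headers, _, body = raw.partition(b"\r\n\r\n")
print(headers.decode("latin-1"))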
WebSockets are coming in HTML5 and should suit your needs. A standard HTTP connection can be renegotiated to change protocol to WebSockets. The specification might still be a bit young, but it might fit the bill.

Is HTTP/1.0 still in use?

Say one is to write an HTTP server/client; how important is it to support HTTP/1.0? Is it still used anywhere nowadays?
Edit: I'm less concerned with the usefulness/importance of HTTP/1.0, and more with the amount of software that actually uses it for non-internal purposes (unit testing being internal use, for example) in the real world (browsers, robots, smartphones/stupidphones, etc.).
As of 2016, you would think that its prominence would have declined even more, since HTTP/1.1 was introduced in 1999, about 17 years ago.
I checked 7,727,198 lines of logs to see what percentages of HTTP/1.0 and HTTP/1.1 I get:

Protocol      Counts   Percent
--------------------------------
HTTP/0.9           0     0.00%
HTTP/1.0   1,636,187    21.17%  (all)
HTTP/1.0      15,415     0.20%  (without the obvious robots)
HTTP/1.1   6,091,011    78.83%
HTTP/2             0     0.00%
From what I can see, most of the HTTP/1.0 requests are from robots. So I tried to remove entries that were obviously from robots (i.e. a user agent including the word robot, bot, slurp, etc.).
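For reference, a sketch of that kind of tally (Python; assumes common/combined-format access log lines at a hypothetical path, using the bot words mentioned above):

import re
from collections import Counter

# Tally request protocol versions from an access log, with a second
# count that skips lines whose user agent looks like a robot.
BOTS = re.compile(r"robot|bot|slurp|spider|crawl", re.IGNORECASE)
PROTO = re.compile(r'"[A-Z]+ [^"]* (HTTP/[0-9.]+)"')

total, without_bots = Counter(), Counter()
with open("access.log") as log:  # hypothetical path
    for line in log:
        match = PROTO.search(line)
        if not match:
            continue
        total[match.group(1)] += 1
        if not BOTS.search(line):  # crude filter on the whole line
            without_bots[match.group(1)] += 1

for proto in sorted(total):
    print(proto, total[proto], without_bots[proto])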
So it looks like the amount of end users still stuck with HTTP/1.0 is very limited today (0.2%). However, if you want to let robots check out your websites, you may need/want to keep HTTP/1.0 operational. Most will anyway include the Host: ... header even though they advertise their connection as an HTTP/1.0 protocol.
Also, the differences between HTTP/1.0 and HTTP/1.1 are very blurry in terms of implementation. Most people happily mix both. I would not worry too much about still accepting/handling HTTP/1.0 requests.
On another server I am starting to see HTTP/2.0 requests that look like this (I got 2,427 of them against 34,161,268 HTTP/1.0 and HTTP/1.1 requests, so 0.007%):
PRI * HTTP/2.0
(That line is the start of the HTTP/2 connection preface defined in RFC 7540, not an ordinary request.)
wget uses HTTP/1.0, and it is still relatively popular (though it does support a few HTTP/1.1 features like the Host: header, which is necessary to access any virtual hosts).
A fair number of servers will deliberately return HTTP/1.0 responses because some (older) browsers will afford an HTTP/1.0 server a higher connection limit than the 2-connection limit imposed for HTTP/1.1's persistent connections.
But in general, most "HTTP/1.0" implementations are really just slightly limited versions of the HTTP/1.1 implementations, and many HTTP/1.1 implementations don't really support some features of that version (e.g. pipelining in particular).
I use it all the time when I'm telnet-ing to a server to verify connectivity or figure out why it's not working:
$ telnet 192.168.1.1 80
GET / HTTP/1.0\r\n
\r\n
...
(Because making a 1.0 request doesn't require that I provide any extra headers).
HTTP/1.0 is very important for writing very basic clients that don't need the overhead of all the 1.1 things like persistent connections, chunked transfer coding, and the other machinery HTTP/1.1 brings. Post a request, get a response, and disconnect is very easy to code for. This might be useful in writing test cases for your server that just want to test the application functionality and NOT the HTTP protocol implementation.
There are lots of mobile browsers and applications that use 1.0 because they don't have the space or need for more sophisticated 1.1 implementations, and the latency issues with non-3G connections on non-smartphones completely negate any benefits of 1.1 features.
There are also lots of proxies that degrade everything to 1.0 regardless of what the client asks for, and then there are the IE issues.
So the short answer is, for a general purpose HTTP server, 1.0 is very relevant.
Looking into this myself for other purposes:
"HTTP/1.0 is in use by proxies, some mobile clients, and IE when
configured to use a proxy. So 1.0 appears to still account for a non-
trivial % of traffic on the web overall.
...
Yes, there are many 1.0 clients still out there."
Source (July 2009): http://groups.google.com/group/erlang-programming/msg/08f6b72d5156ef74
:-(
Update (March 2011):
If you are going to build a client/server thingy, make the client use HTTP/1.1, and make the server accept both 1.1 and 1.0.
Doing web-development, it is a PITA to get clients trying to load a page without the Host header, because I have no way to know which site I am supposed to load :-S
So you'd better not build a client like that ;-)
IME it's been a very long time since I've seen a true HTTP/1.0 request (including from mobile devices, fuzzylollipop).
I say a true request, as MSIE still pretends to downgrade to HTTP/1.0 by default when you connect via a proxy (unless you change this in the config): all the outgoing requests are flagged as HTTP/1.0, yet it still includes HTTP/1.1-specific request headers and respects all the HTTP/1.1 responses.
Curiously, IIS, in a mirror image, happily ignores the HTTP version (although I've not experimented much with this to see if it only does so for MSIE user agents).
So by curious coincidence, MSIE and IIS work much better with proxies than with standards-compliant tools.
C.
