RFC 7230 §4.1 defined "chunk extensions": additional key-value pairs that can be sent along with each chunk of an HTTP message encoded with Transfer-Encoding: chunked:
The chunked encoding allows each chunk to include zero or more chunk extensions, immediately following the chunk-size, for the sake of supplying per-chunk metadata (such as a signature or hash), mid-message control information, or randomization of message body size.
HTTP/2 claims,
HTTP/2 is intended to be as compatible as possible with current uses of HTTP. This means that, from the application perspective, the features of the protocol are largely unchanged. To achieve this, all request and response semantics are preserved, although the syntax of conveying those semantics has changed.
In HTTP/2, the DATA frames are used to carry data, in chunks, instead of the chunked Transfer-Encoding:
HTTP/2 uses DATA frames to carry message payloads. The "chunked" transfer encoding defined in Section 4.1 of [RFC7230] MUST NOT be used in HTTP/2.
but, AFAICT, DATA frames offer no support for chunk extensions, only optional padding.
In particular, how is an HTTP/1.1 to HTTP/2 proxy, receiving a request containing chunks bearing well-formed chunk-extensions supposed to translate those chunks to HTTP/2?
In https://greenbytes.de/tech/webdav/rfc7540.html#rfc.section.8.1.p.4:
HTTP/2 uses DATA frames to carry message payloads. The chunked transfer encoding defined in Section 4.1 of [RFC7230] MUST NOT be used in HTTP/2.
So no chunked encoding, and thus no chunk extensions.
RFC 7230 §4.1.1 states:
A recipient MUST ignore unrecognized chunk extensions.
You'll have difficulty recognizing any chunk extensions, since none have ever been defined that I can tell.
Transfer-Encoding is a hop-by-hop header, so you can compliantly discard any chunk extensions you receive and generate DATA frames from the chunked data.
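As an illustration, here is a minimal sketch (hypothetical names, in Python) of the parsing step such a proxy would perform: read the chunk-size, drop anything after the ';', and keep only the data for the DATA frame:

```python
# Sketch: parse an HTTP/1.1 chunk header, discarding any chunk extensions.
# Per RFC 7230 §4.1.1, the chunk-size may be followed by ";name=value" pairs
# that a recipient is free to ignore. Function name is hypothetical.

def parse_chunk_header(line: bytes) -> int:
    """Return the chunk size; chunk extensions after ';' are dropped."""
    header = line.rstrip(b"\r\n")
    size_field, _, _extensions = header.partition(b";")  # extensions discarded
    return int(size_field.strip(), 16)  # chunk-size is hexadecimal

# A chunk header with two well-formed extensions:
assert parse_chunk_header(b"1a3;sig=abc;hash=xyz\r\n") == 0x1A3
# The terminating zero-length chunk:
assert parse_chunk_header(b"0\r\n") == 0
```

The proxy then forwards the chunk data itself in a DATA frame; the extensions have nowhere to go and are compliantly dropped.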
In HTTP/2, the DATA frames are used to carry data, in chunks, instead of the chunked Transfer-Encoding
Simply put, each DATA frame is a chunk. There's no need to use the HTTP/1.1 chunked encoding, because HTTP/2 data frames provide the same set of features.
Related
I've read about chunked encoding in a few places, but still don't quite get why, for example, a radio streaming server such as this one sends its data with chunked Transfer-Encoding. A client of such a server just continuously reads data from the server. It doesn't at any point need to know how much data is currently being sent, since it is always consuming data (as long as it stays connected).
It seems that the client doesn't use these chunk sizes; it just discards them, which only adds more work. Can anyone explain this to me?
HTTP/1.1 needs to either send a Content-Length header or use chunked encoding. If you can't know the length in advance, you must use chunked encoding. A continuous stream would, in theory, have an infinite length.
An HTTP client needs one of these to know when the response ends; after the response, a new HTTP request can be sent on the same connection.
You are correct that in the case of a continuous stream it's not needed to detect when the stream ends, because it doesn't. The authors of HTTP/1.1 could have added a third 'continuous stream of bytes' option; perhaps that use-case wasn't considered all those years ago.
Note that HTTP/2 doesn't have chunked encoding anymore, but it does something similar and sends multiple body frames.
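To make that concrete, here is a hypothetical sketch of chunked framing for an open-ended stream; the server emits each piece as soon as it is available and never needs a total length:

```python
# Sketch: frame an open-ended stream of data as HTTP/1.1 chunks.
# Each chunk is "<size-in-hex>\r\n<data>\r\n"; the stream would only be
# terminated with "0\r\n\r\n" if it ever ended. Names are hypothetical.

def encode_chunk(data: bytes) -> bytes:
    return b"%x\r\n%s\r\n" % (len(data), data)

def chunked_stream(pieces):
    """Yield wire bytes for each piece as it becomes available."""
    for piece in pieces:
        if piece:  # a zero-length piece would prematurely end the stream
            yield encode_chunk(piece)

wire = b"".join(chunked_stream([b"hello ", b"world"]))
# wire is b'6\r\nhello \r\n5\r\nworld\r\n'
```

The client does have to parse and discard the size lines, which is the small extra work the question is asking about; it buys a message framing that works even when the total length is unknowable.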
I have a question on usage of Content-Encoding and Transfer-Encoding:
Please let me know if my below understanding is right:
A client can specify which encoding types it is willing to accept using the Accept-Encoding header in its request. So, if the server wishes to encode the message before transmission, e.g. with gzip, it can compress the entity (content), add Content-Encoding: gzip, and send the HTTP response. On reception, the client can decompress and parse the entity.
In the case of Transfer-Encoding, the client may specify what kind of encoding it is willing to accept and process on the fly. That is, if the client sends TE: gzip; q=1, the server, if it wishes, can send a 200 OK with Transfer-Encoding: gzip, compressing the stream as it sends it, and the client, upon receiving the content, can decompress it on the fly as it parses.
Is my understanding right here? Please comment.
Also, what is the basic advantage of compressing the entity on the fly versus compressing it first and then transmitting it? Is Transfer-Encoding valid only for chunked responses, since we do not know the size of the entity before transmission?
The difference really is not about on-the-fly or not -- Content-Encoding can be both pre-computed and applied on the fly.
The differences are:
Transfer Encoding is hop-by-hop, not end-to-end
Transfer Encodings other than "chunked" (sadly) aren't implemented in practice
Transfer Encoding is on the message layer, Content Encoding on the payload layer
Using Content Encoding affects entity tags etc.
See http://greenbytes.de/tech/webdav/rfc7230.html#transfer.codings and http://greenbytes.de/tech/webdav/rfc7231.html#data.encoding.
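One consequence worth making concrete: because Content-Encoding changes the entity itself, anything derived from the representation's bytes (such as a strong ETag) changes with it, whereas a hop-by-hop transfer coding would not. A toy sketch (the hash-based tag is purely illustrative, not how any particular server computes ETags):

```python
# Sketch: Content-Encoding produces a different representation, so a
# strong validator computed over the representation's bytes differs too.
# Hash-based "ETag" here is purely illustrative.
import gzip
import hashlib

body = b"hello world" * 100
compressed = gzip.compress(body)  # the gzip'd entity, Content-Encoding: gzip

etag_identity = hashlib.sha256(body).hexdigest()[:16]
etag_gzip = hashlib.sha256(compressed).hexdigest()[:16]
assert etag_identity != etag_gzip  # different representations, different tags
```

With Transfer-Encoding the message body is transformed only for the current hop, so the entity (and its validators) stays the same end-to-end.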
Can some experts explain the differences between the two? Is it true that chunked is a streaming protocol and multipart is not? What is the benefit of using multipart?
More intuitively,
Chunking is a way to send a single message from server to client, where the server doesn't have to wait for the entire response to be generated but can send pieces (chunks) as and when they become available. This happens at the data-transfer level and is transparent to the client. Appropriately, it is a 'Transfer-Encoding' type.
Multipart, on the other hand, happens at the application level and is interpreted by application logic. Here the server is telling the client that the content, even though it is one response body, has different logical parts that can be parsed accordingly. Again appropriately, this is a 'Content-Type' setting, as clients ought to know it.
Given that the transfer can be chunked independently of the content type, a multipart HTTP message can be transferred using chunked encoding by the server if need be.
Neither is a protocol. HTTP is the protocol. In fact, the P in HTTP stands for Protocol.
You can read more on chunked and multipart under Hypertext Transfer Protocol 1.1
Chunked is a transfer coding found in section 3.6 Transfer Codings.
Multipart is a media type found in section 3.7.2 Multipart Types, a subsection of 3.7 Media Types.
Chunked also affects other aspects of the protocol, such as Content-Length: as specified under section 4.4, chunked must be used when the message length cannot be predetermined (mainly when delivering dynamic content).
From 14.41 (Transfer-Encoding header field)
The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity.
Put more simply, chunking is how you transfer a block of data, while multipart is the shape of the data.
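A hypothetical sketch of that layering (boundary and part contents made up): the multipart structure lives in the payload bytes, and chunked framing is applied on top, at arbitrary boundaries:

```python
# Sketch: a multipart body is just payload bytes (Content-Type level);
# chunked encoding then frames those bytes for transfer (Transfer-Encoding
# level). Boundary and part contents are made up for illustration.
BOUNDARY = b"frontier"

def multipart_body(parts):
    out = b""
    for content_type, data in parts:
        out += b"--" + BOUNDARY + b"\r\n"
        out += b"Content-Type: " + content_type + b"\r\n\r\n"
        out += data + b"\r\n"
    return out + b"--" + BOUNDARY + b"--\r\n"

def chunk(data: bytes) -> bytes:
    return b"%x\r\n%s\r\n" % (len(data), data)

body = multipart_body([(b"text/plain", b"part one"),
                       (b"text/plain", b"part two")])
# Chunk boundaries need not align with part boundaries:
wire = chunk(body[:20]) + chunk(body[20:]) + b"0\r\n\r\n"
```

De-chunking on the receiving side reproduces the multipart body byte for byte; only then does the application-level multipart parsing happen.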
I am trying to write a simple proxy. I just want to know whether it is possible to have chunked Http GET requests?
The answer is no and yes simultaneously. GET requests don't have any content, so they obviously cannot use chunked transfer encoding (there is nothing to transfer). However the response to a GET request can contain a body that is encoded using chunked transfer encoding. So whenever there is a body, chunked transfer encoding may be used. The wikipedia page has more information and also links to the corresponding RFC.
I'm trying to get my webserver to correctly gzip an http response that is chunk encoding.
My understanding of the non-gzip response is that it looks like this:
<the response headers>
and then for each chunk,
<chunk length in hex>\r\n<chunk>\r\n
and finally, a zero length chunk:
0\r\n\r\n
I've tried to get gzip compression working and I could use some help figuring out what should actually be returned. This documentation implies that the entire response should be gzipped, as opposed to gzipping each chunk:
HTTP servers sometimes use compression (gzip) or deflate methods to optimize transmission.
Chunked transfer encoding can be used to delimit parts of the compressed object.
In this case the chunks are not individually compressed. Instead, the complete payload
is compressed and the output of the compression process is chunk encoded.
I tried to gzip the entire thing and return the response even without chunked, and it didn't work. I tried setting the Content-Encoding header to "gzip". Can someone explain what changes must be made to the above scheme to support gzipping of chunks? Thanks.
In case the other answers weren't clear enough:
First you gzip the body with zlib (this can be done in a stream so you don't need the whole thing in memory at once, which is the whole point of chunking).
Then you send that compressed body in chunks (presumably the ones provided by the gzip stream, with the chunk header to declare how long it is), with the Content-Encoding: gzip and Transfer-Encoding: chunked headers (and no Content-Length header).
If you're using gzip or zcat or some such utility for the compression, it probably won't work. Needs to be zlib. If you're creating the chunks and then compressing them, that definitely won't work. If you think you're doing this right and it's not working, you might try taking a packet trace and asking questions based on that and any error messages you're getting.
You gzip the content, and only then apply the chunked encoding:
"Since "chunked" is the only transfer-coding required to be understood by HTTP/1.1 recipients, it plays a crucial role in delimiting messages on a persistent connection. Whenever a transfer-coding is applied to a payload body in a request, the final transfer-coding applied MUST be "chunked". If a transfer-coding is applied to a response payload body, then either the final transfer-coding applied MUST be "chunked" or the message MUST be terminated by closing the connection. When the "chunked" transfer-coding is used, it MUST be the last transfer-coding applied to form the message-body. The "chunked" transfer-coding MUST NOT be applied more than once in a message-body."
(HTTPbis Part1, Section 6.2.1)
Likely you are not really sending an appropriately gzipped response.
Try setting the window bits (windowBits) to 31 in zlib, and use deflateInit2().