Content-Encoding vs Transfer-Encoding in HTTP

I have a question on the usage of Content-Encoding and Transfer-Encoding.
Please let me know if my understanding below is right:
A client can specify in its request which encoding types it is willing to accept using the Accept-Encoding header. So, if the server wishes to encode the message before transmission, e.g. with gzip, it can compress the entity (content), add Content-Encoding: gzip, and send the HTTP response across. On reception, the client can decompress and parse the entity.
In the case of Transfer-Encoding, the client may specify what kind of encoding it is willing to accept and have the work performed on the fly. That is, if the client sends TE: gzip;q=1, the server may, if it wishes, send a 200 OK with Transfer-Encoding: gzip, compressing the stream as it sends it; the client, upon receiving the content, can decompress on the fly and parse it.
Is my understanding right here? Please comment.
Also, what is the basic advantage of compressing the entity on the fly versus compressing it first and then transmitting it? Is Transfer-Encoding valid only for chunked responses, since we do not know the size of the entity before transmission?

The difference really is not about on-the-fly or not -- Content-Encoding can be both pre-computed and on the fly.
The differences are:
Transfer Encoding is hop-by-hop, not end-to-end
Transfer Encodings other than "chunked" (sadly) aren't implemented in practice
Transfer Encoding is on the message layer, Content Encoding on the payload layer
Using Content Encoding affects entity tags etc.
See http://greenbytes.de/tech/webdav/rfc7230.html#transfer.codings and http://greenbytes.de/tech/webdav/rfc7231.html#data.encoding.
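As a concrete sketch of the Content-Encoding flow from the question, using only Python's standard library (httpbin.org is just a convenient public test endpoint, not anything the answer above prescribes):

    import gzip
    import http.client

    conn = http.client.HTTPSConnection("httpbin.org")
    # The client advertises which content codings it can decode.
    conn.request("GET", "/gzip", headers={"Accept-Encoding": "gzip"})
    resp = conn.getresponse()

    body = resp.read()
    # If the server chose to apply a content coding, it labels the payload,
    # and the recipient has to undo it to recover the entity.
    if resp.getheader("Content-Encoding") == "gzip":
        body = gzip.decompress(body)

    print(resp.status, len(body), "bytes after decoding")
    conn.close()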

Related

Chunked Transfer Encoding, Content-Length, and Byte Serving

In https://gist.github.com/CMCDragonkai/6bfade6431e9ffb7fe88, it says
Do note that byte serving is compatible with chunked encoding, this
would be applicable where you know the total content length, want to
allow partial or resumable downloads, but you want to stream each
partial response to the client.
I thought that if you want to allow partial and resumable downloads, you need to use the Content-Length HTTP header, which is not allowed with chunked encoding. Is my understanding incorrect?
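The two mechanisms can coexist, which is what the gist is getting at. A hedged sketch, with a placeholder host and path, of a Range request whose 206 response carries Content-Range for resumability while the body itself is streamed with chunked transfer coding:

    import http.client

    conn = http.client.HTTPConnection("example.com")  # placeholder host
    conn.request("GET", "/big-file", headers={"Range": "bytes=0-1023"})
    resp = conn.getresponse()

    # Content-Range carries the byte positions and the total size, so the
    # client can resume later even though this particular message has no
    # Content-Length and arrives in chunks.
    print(resp.status)                          # 206 if the range was honored
    print(resp.getheader("Content-Range"))      # e.g. "bytes 0-1023/1048576"
    print(resp.getheader("Transfer-Encoding"))  # "chunked" if streamed
    data = resp.read()                          # http.client removes the chunk framing

In other words, the total length lives in Content-Range rather than Content-Length, which is why the combination works.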

Difference between multipart and chunked protocol?

Can some experts explain the differences between the two? Is it true that chunked is a streaming protocol and multipart is not? What is the benefit of using multipart?
More intuitively,
Chunking is a way to send a single message from server to client where the server doesn't have to wait for the entire response to be generated, but can send pieces (chunks) as they become available. This happens at the data-transfer level and is transparent to the application; appropriately, it is a 'Transfer-Encoding' type.
Multipart, on the other hand, happens at the application level and is interpreted by application logic. Here the server is telling the client that the content, even though it is a single response body, has different logical parts that can be parsed accordingly. Again appropriately, this is a 'Content-Type' setting, as the client ought to know about it.
Given that the transfer can be chunked independently of the content type, a multipart HTTP message can be transferred using chunked encoding by the server if need be.
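To make the application-level framing tangible, here is a sketch of a multipart body assembled by hand in Python; the boundary string and the parts are made up for illustration:

    BOUNDARY = "gc0p4Jq0M2Yt08j34c0p"  # arbitrary value; must not occur inside the parts

    parts = [
        ("text/plain", b"first logical part"),
        ("application/json", b'{"second": "part"}'),
    ]

    body = b""
    for ctype, data in parts:
        body += f"--{BOUNDARY}\r\nContent-Type: {ctype}\r\n\r\n".encode()
        body += data + b"\r\n"
    body += f"--{BOUNDARY}--\r\n".encode()

    # The enclosing message would declare:
    #   Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
    print(body.decode())

Note that the boundary and per-part headers live inside the payload, so parsing them is the application's job.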
Neither is a protocol. HTTP is the protocol. In fact, the P in HTTP stands for Protocol.
You can read more on chunked and multipart in RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1:
Chunked is a transfer coding, found in section 3.6 Transfer Codings.
Multipart is a media type, found in section 3.7.2 Multipart Types, a subsection of 3.7 Media Types.
Chunked also affects other aspects of the protocol, such as the content length: as specified under section 4.4, chunked must be used when the message length cannot be determined in advance (mainly when delivering dynamic content).
From section 14.41 (Transfer-Encoding header field):
The Transfer-Encoding general-header field indicates what (if any)
type of transformation has been applied to the message body in order
to safely transfer it between the sender and the recipient. This
differs from the content-coding in that the transfer-coding is a
property of the message, not of the entity.
Put more simply, chunking is how you transfer a block of data, while multipart is the shape of the data.
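And to make the transfer-level point equally concrete, a sketch of how a chunked message body is framed on the wire: a hex chunk size, CRLF, the chunk data, repeated until a zero-size chunk ends the body. The helper name and payload pieces are made up:

    def encode_chunked(pieces):
        """Frame an iterable of byte strings as a chunked message body."""
        out = b""
        for piece in pieces:
            out += b"%x\r\n%s\r\n" % (len(piece), piece)
        out += b"0\r\n\r\n"  # last-chunk plus an empty trailer section
        return out

    print(encode_chunked([b"Hello, ", b"world!"]))
    # b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'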

avoiding chunked encoding of HTTP/1.1 response

I want to avoid ever getting a chunked-encoded HTTP response from a (conforming) HTTP server. I am reading RFC 2616, section 14.39 TE, and it seems to me that I could avoid it by specifying TE: chunked;q=0. If I cannot avoid the chunked encoding, I want to avoid the trailers. Will specifying TE: trailers;q=0 work?
From RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1, section 3.6.1 Chunked Transfer Coding:
All HTTP/1.1 applications MUST be able to receive and decode the
"chunked" transfer-coding, and MUST ignore chunk-extension extensions
they do not understand.
This is still the case in the updated RFC 7230, Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, section 4.1 Chunked Transfer Coding, although in slightly different wording:
A recipient MUST be able to parse and decode the chunked transfer
coding.
So if you want to conform to HTTP/1.1, you will have to accept chunked encoding.
Update:
As for the trailers: I think if you don't send a TE header field in your request, a conforming server shouldn't send you any trailers. If it still sends trailers, you are probably safe to ignore them (again section 3.6.1):
A server using chunked transfer-coding in a response MUST NOT use the
trailer for any header fields unless at least one of the following is
true:
a) the request included a TE header field that indicates "trailers" is
acceptable in the transfer-coding of the response, as described in
section 14.39; or,
b) the server is the origin server for the response, the trailer
fields consist entirely of optional metadata, and the recipient
could use the message (in a manner acceptable to the origin server)
without receiving this metadata. In other words, the origin server
is willing to accept the possibility that the trailer fields might
be silently discarded along the path to the client.
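Since every conforming HTTP/1.1 recipient must be able to decode chunked bodies anyway, a minimal decoder is short. A sketch that strips chunk extensions and stops at the zero-size chunk without parsing any trailer fields (the function name is made up):

    def decode_chunked(raw: bytes) -> bytes:
        body, pos = b"", 0
        while True:
            line_end = raw.index(b"\r\n", pos)
            # The chunk size is hex; any ";name=value" chunk extension is dropped.
            size = int(raw[pos:line_end].split(b";")[0], 16)
            if size == 0:
                return body  # zero-size chunk: end of body; trailers, if any, follow
            start = line_end + 2
            body += raw[start:start + size]
            pos = start + size + 2  # skip the CRLF that terminates the chunk data

    print(decode_chunked(b"7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"))
    # b'Hello, world!'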

Transfer-Encoding: gzip vs. Content-Encoding: gzip

What is the current state of affairs when it comes to choosing between
Transfer-Encoding: gzip
and
Content-Encoding: gzip
when I want to allow clients with, e.g., limited bandwidth to signal their willingness to accept a compressed response, while the server has the final say on whether or not to compress?
The latter is what e.g. Apache's mod_deflate and IIS do if you let them take care of compression. Depending on the size of the content to be compressed, they will additionally apply Transfer-Encoding: chunked.
They will also include Vary: Accept-Encoding, which already hints at the problem: Content-Encoding seems to be part of the entity, so changing the Content-Encoding amounts to a change of the entity, i.e. a different Accept-Encoding header means, e.g., that a cache cannot use its cached version of the otherwise identical entity.
Is there a definite answer on this that I have missed (and that's not buried inside a message in a long thread in some apache newsgroup)?
My current impression is:
Transfer-Encoding would in fact be the right way to do what is mostly done with Content-Encoding by existing server and client implementations
Content-Encoding, because of its semantic implications, carries a couple of issues (what should the server do to the ETag when it transparently compresses a response?)
The reason is chicken'n'egg: Browsers don't support it because servers don't because browsers don't
So I am assuming the right way would be Transfer-Encoding: gzip (or, if I additionally chunk the body, Transfer-Encoding: gzip, chunked), with no reason to touch Vary, ETag, or any other header in that case, as it's a transport-level thing.
For now I don't care too much about the 'hop-by-hop'-ness of Transfer-Encoding, something others seem to be concerned about first and foremost, because proxies might uncompress and forward uncompressed to the client. However, proxies might just as well forward it as-is (compressed) if the original request has the proper Accept-Encoding header, which in the case of all browsers that I know of is a given.
Btw, this issue is at least a decade old, see e.g.
https://bugzilla.mozilla.org/show_bug.cgi?id=68517 .
Any clarification on this will be appreciated. Both in terms of what is considered standards-compliant and what is considered practical. For example, HTTP client libraries only supporting transparent "Content-Encoding" would be an argument against practicality.
The correct usage, as defined in RFC 2616 and actually implemented in the wild, is for the client to send an Accept-Encoding request header (the client may specify multiple encodings). The server may then, and only then, encode the response according to the client's supported encodings (if the data is not already stored in that encoding) and indicate in the Content-Encoding response header which encoding is being used. The client can then read data off the socket based on the Transfer-Encoding (i.e. chunked) and then decode it based on the Content-Encoding (e.g. gzip).
So, in your case, the client would send an Accept-Encoding: gzip request header, and the server may then decide to compress (if not already compressed) and send a Content-Encoding: gzip and, optionally, a Transfer-Encoding: chunked response header.
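A rough server-side sketch of that negotiation, in the spirit of what mod_deflate does (the port and page content are invented for illustration): compress only if the client advertised gzip, label the payload with Content-Encoding, and send Vary: Accept-Encoding so caches key on the negotiated header:

    import gzip
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGE = b"<html><body>hello</body></html>" * 100  # made-up payload

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = PAGE
            gzip_ok = "gzip" in self.headers.get("Accept-Encoding", "")
            if gzip_ok:
                body = gzip.compress(body)
            self.send_response(200)
            if gzip_ok:
                # The payload is now a different entity, hence Content-Encoding.
                self.send_header("Content-Encoding", "gzip")
            self.send_header("Vary", "Accept-Encoding")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), Handler).serve_forever()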
And yes, the Transfer-Encoding header can be used in requests, but only in HTTP/1.1, which requires that both client and server implementations support the chunked encoding in both directions.
ETag uniquely identifies the resource data on the server, not the data actually being transmitted. If a given URL resource changes its ETag value, it means the server-side data for that resource has changed.
Quoting Roy T. Fielding, one of the authors of RFC 2616:
changing content-encoding on the fly in an inconsistent manner
(neither "never" nor "always) makes it impossible for later requests
regarding that content (e.g., PUT or conditional GET) to be handled
correctly. This is, of course, why performing on-the-fly
content-encoding is a stupid idea, and why I added Transfer-Encoding
to HTTP as the proper way to do on-the-fly encoding without changing
the resource.
Source: https://issues.apache.org/bugzilla/show_bug.cgi?id=39727#c31
In other words: Don't do on-the-fly Content-Encoding, use Transfer-Encoding instead!
Edit: That is, unless you want to serve gzipped content to clients that only understand Content-Encoding. Which, unfortunately, seems to be most of them. But be aware that you then leave the realm of the spec and might run into issues such as the one mentioned by Fielding, as well as others, e.g. when caching proxies are involved.

Is it possible to have chunked Http GET requests?

I am trying to write a simple proxy. I just want to know whether it is possible to have chunked Http GET requests?
The answer is no and yes simultaneously. A GET request normally has no body, so there is nothing that chunked transfer encoding could be applied to. However, the response to a GET request can contain a body, and that body may well be encoded using chunked transfer encoding. So wherever there is a message body, chunked transfer encoding may be used. The Wikipedia page has more information and also links to the corresponding RFC.
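A small sketch of that asymmetry, assuming a placeholder host and a hypothetical downstream handler: the GET request itself carries no body, while the chunked response body is transparently de-chunked by http.client:

    import http.client

    def handle(piece: bytes) -> None:
        # Hypothetical downstream handler; a real proxy would forward the bytes.
        print(f"forwarding {len(piece)} bytes")

    conn = http.client.HTTPConnection("example.com")  # placeholder host
    conn.request("GET", "/stream")                    # the GET itself has no body
    resp = conn.getresponse()
    while True:
        piece = resp.read(4096)  # decoded data; chunk framing already removed
        if not piece:
            break
        handle(piece)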
