How does a browser (or other HTTP client) know the size of a chunked HTTP response before it has finished downloading? - http

I'm teaching myself how to make a rudimentary HTTP server, and I've recently learned about Transfer-Encoding: chunked. I understand that each chunk reports its own size, but all of the documentation I can find seems to indicate that Content-Length on the initial response is only useful for a standard response body, and is ignored for chunked content, making it effectively impossible for a client to know the size of a chunked response body until it's finished.
However, nearly every file I've ever downloaded in my time on the internet has somehow reported its size to the browser ahead of time, so it's clearly not only possible, but common, to the point where it's odd not to.
Is this (non-standard?) behavior common HTTP clients implement, reading the Content-Length (or some other) header as an indicator of total chunked length, or something else entirely?

Related

What is the point in Transfer-Encoding chunked

I've read about chunked encoding in a few places, but still don't quite get why, for example, a radio streaming server such as this one sends its data using chunked Transfer-Encoding. A client of such a server just continuously reads data from the server. At no point does it need to know how much data is currently being sent, since it is always consuming data (as long as it stays connected).
It seems that the client doesn't use these pieces of data; it just discards them, which only adds work for the client. Can anyone explain this to me?
HTTP/1.1 needs to either send a Content-Length, or chunked encoding. If you can't know the length, you must use chunked encoding. A continuous stream would in theory have an infinite length.
An HTTP client needs one of these to know when the response ends. After the response, a new HTTP request can be sent on the same connection.
You are correct that in the case of a continuous stream it's not needed to detect when the stream ends, because it doesn't. It could have been possible for the authors of HTTP/1.1 to have a third 'continuous stream of bytes' option for HTTP. Perhaps that use-case wasn't considered all those years ago.
Note that HTTP/2 doesn't have chunked encoding anymore, but it does something similar and sends multiple body frames.
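To make the "knowing when the response ends" point concrete, here is a minimal sketch of how a client could decode a chunked body. The function name read_chunked_body and the sample bytes are made up for illustration; a real client would also enforce size limits and parse trailer fields instead of skipping them.

```python
import io


def read_chunked_body(stream):
    """Decode a chunked body from a binary stream and return the payload."""
    payload = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("stream ended before a complete chunk-size line")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # the zero-length chunk marks the end of the body
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        payload += data
    # Skip optional trailer fields up to the blank line that ends the message.
    while True:
        line = stream.readline()
        if line == b"\r\n":
            break
        if not line:
            raise ValueError("stream ended before the end of the trailers")
    return bytes(payload)


# Two chunks, the terminating chunk, then bytes that belong to whatever follows.
wire = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\nNEXT")
body = read_chunked_body(wire)
rest = wire.read()
print(body)  # b'Wikipedia'
print(rest)  # b'NEXT'
```

The reader stops exactly after the terminating chunk, so the bytes left in the stream can be treated as the start of the next response on the same connection.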

How to write malformed HTTP response to "guarantee" something akin to HTTP 500

Say I started writing to the response body, but there was some error, and I need to indicate that it's an HTTP 500 even if an HTTP 200 OK header was already written as a header...
How can I write something to the body of the response that's guaranteed to be malformed so that the response is interpreted as some sort of error by the client?
In general, this is impossible. Some clients only care about the response header, and may stop paying attention to what you send after the header.
But with certain clients, in certain cases, this may be possible.
I assume HTTP/1.1 here. HTTP/2 probably gives even more opportunities, because there’s more to screw up in the protocol, and the implementations are often stricter. Conversely, HTTP/1.0 is dumber and laxer, so harder to break.
1. Close the connection before the end of response, as indicated by your framing. If your response is framed with Content-Length: 100, close before you’ve sent the 100th byte of payload. If your response is framed with Transfer-Encoding: chunked, close before you’ve sent the final empty chunk. If the client expects to receive the entire payload, it may (and should) treat this as an error. But some won’t, including very popular client libraries.
2. If the payload is in a structured format, like JSON or XML, then do the same as 1, but before closing, send something that would disrupt that format. For example, no valid JSON text can end with {. Even if the client doesn’t recognize the incomplete payload as an error, it might then fail on trying to parse it.
3. Same as 1, but instead of closing the connection, just stop sending data. The client will “hang” until its receive operation times out, which it may treat as an error. This may be a bad idea if the client is operated by someone who is not prepared for such extravagant timeouts.
4. Only with Transfer-Encoding: chunked: Same as 3, but instead of hanging, send bogus very long chunks and/or keep sending chunks indefinitely, until the client gives up or crashes. Probably a very bad idea, bordering on malicious.
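As an illustration of the first option (closing before the framing says the response is complete), the sketch below feeds canned bytes to Python's standard http.client and checks how it reacts. FakeSocket and the sample responses are made up for the demo, and this only shows what this one client library does; as noted above, other clients may silently accept a truncated body.

```python
import http.client
import io


class FakeSocket:
    """Stand-in for a socket that replays canned bytes (illustration only)."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def outcome(raw):
    """Parse raw response bytes and report whether the body was complete."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    response.begin()
    try:
        return ("ok", response.read())
    except http.client.IncompleteRead as exc:
        return ("incomplete", exc.partial)


CHUNKED = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
complete = CHUNKED + b"5\r\nhello\r\n0\r\n\r\n"
no_final_chunk = CHUNKED + b"5\r\nhello\r\n"  # "closed" before the empty chunk
short_length = b"HTTP/1.1 200 OK\r\nContent-Length: 100\r\n\r\nonly part"

for name, raw in [("complete", complete),
                  ("no final chunk", no_final_chunk),
                  ("short Content-Length", short_length)]:
    print(name, outcome(raw))
```

Here the end of the canned bytes plays the role of the server closing the connection. http.client raises IncompleteRead in both truncated cases, which calling code can treat as a failed response even though the status line said 200.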

HEAD headers differ from GET, chunked transfer

A web application under test behaves in an odd way. A HEAD request returns the header Content-Length, but the subsequent GET returns Transfer-Encoding: chunked. I expected the headers to be equal, and the RFC says SHOULD, so my question is: how legitimate and how common is this behaviour?
UPDATE It turns out that the root cause of the problem is HAProxy's behaviour. For a HEAD request, the response is propagated as is from the underlying application. But for GET, HAProxy applies compression and switches to chunked transfer. I'll close this question as off-topic and perhaps ask at ServerFault.
If the server uses chunked encoding for GET, but returns Content-Length for HEAD, this is IMHO an indication that the information returned for HEAD is unlikely to be correct.
A HEAD response carries no entity-body, whereas a GET response does. If the HTTP server has chunked transfer encoding enabled, it does not send Content-Length in the GET response because the header is not used: the server does not need to know the length of the content before it starts transmitting a response to the client. It can begin transmitting dynamically generated content before knowing the total size of that content. Perhaps this is the most likely explanation.
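A rough sketch of why a server does not need the total size up front: each piece is framed with its own length as soon as it is produced. The names encode_chunked and generate_rows are hypothetical, and the generator merely stands in for dynamically generated content.

```python
def encode_chunked(parts):
    """Yield the wire bytes for each part as soon as it becomes available."""
    for part in parts:
        if part:  # an empty chunk would be mistaken for the terminator
            yield b"%x\r\n%s\r\n" % (len(part), part)
    yield b"0\r\n\r\n"  # terminating zero-length chunk, no trailer fields


def generate_rows():
    """Hypothetical stand-in for content produced on the fly."""
    for i in range(3):
        yield b"row %d\n" % i


wire = b"".join(encode_chunked(generate_rows()))
print(wire)
```

In a real server each yielded piece would be written to the socket immediately rather than joined; the join here is only to make the result easy to inspect.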

Chunked encoding and content-length header

Is it possible to set the Content-Length header and also use chunked transfer encoding? And does doing so solve the problem of the client not knowing the length of the response when chunked is used?
The scenario I'm thinking about is when you have a large file to transfer and there's no problem in determining its size, but it's too large to be buffered completely.
(If you're not using chunked, then the whole response must get buffered first, right?)
thanks.
No:
"Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored." (RFC 2616, Section 4.4)
And no, you can use Content-Length and stream; the protocol doesn't constrain how your implementation works.
Well, you can always send a header stating the size of the file.
Something like response.addHeader("File-Size","size of the file");
And ignore the Content-Length header.
The client implementation has to be tweaked to read this value, but hey you can achieve both the things you want :)
You have to use either Content-Length or chunking, but not both.
If you know the length in advance, you can use Content-Length instead of chunking even if you generate the content on the fly and never have it all at once in your buffer.
However, you should not do that if the data is really large because a proxy might not be able to handle it. For large data, chunking is safer.
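A minimal sketch of "Content-Length plus streaming", assuming the size can be read from the file system before sending. The function send_file and the in-memory "connection" are made up for the demo; a real server would write to a socket and handle the file changing size mid-transfer.

```python
import io
import os
import tempfile


def send_file(path, out, block_size=64 * 1024):
    """Write a response for the file at path to out, one block at a time."""
    size = os.path.getsize(path)
    out.write(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % size)
    with open(path, "rb") as source:
        while True:
            block = source.read(block_size)
            if not block:
                break
            out.write(block)  # only one block is held in memory at a time
    return size


# Demo with a temporary file and an in-memory stand-in for the connection.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 200_000)
connection = io.BytesIO()
sent = send_file(tmp.name, connection, block_size=4096)
os.remove(tmp.name)
print(sent, len(connection.getvalue()))
```

The header is sent once with the full size, and the body then goes out in 4096-byte writes, so the client can show progress without the server ever buffering the whole file.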
These headers can be the cause of a Postman parse error:
"Content-Length" and "Transfer-Encoding" can't be present in the response headers together.
Using a parameterized ResponseEntity<?> instead of a raw ResponseEntity in the controller can fix the issue.
The question asks:
Is it possible to set the content-length header and also use chunked transfer encoding?
The RFC HTTP/1.1 spec, quoted in Julian's answer, says:
Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding.
There is an important difference between what's possible, and what's allowed by a protocol. It is certainly possible, for example, for you to write your own HTTP/1.1 client which sends malformed messages with both headers. You would be violating the HTTP/1.1 spec in doing so, and so you'd imagine some alarm bells would go off and a bunch of Internet police would burst into your house and say, "Stop, arrest that client!" But that doesn't happen, of course. Your request will get sent to wherever it's going.
OK, so you can send a malformed message. So what? Surely on the receiving end, the server will detect the HTTP/1.1 protocol client-side violation, vanquish your malformed request, and serve you back a stern 400 response telling you that you are due in court the following Monday for violating the protocol. But no, actually, that probably won't happen. Of course, it's beyond the scope of HTTP/1.1 to prescribe what happens to misbehaving clients; i.e. while the HTTP/1.1 protocol is analogous to the "law", there is nothing in HTTP/1.1 analogous to the judicial system.
The best that the HTTP/1.1 protocol can do is dictate how a server must act/respond in the case of receiving such a malformed request. However, it's quite lenient in this case. In particular, the server does not have to reject such malformed requests. In fact, in such a scenario, the rule is:
If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored.
Unfortunately, though, some HTTP servers will violate that part of the HTTP/1.1 protocol and will actually give precedence to the Content-Length header, if both headers are present. This can cause a serious problem, if the message visits two servers in sequence in the same system and they disagree about where one HTTP message ends and the next one starts. It leaves the system vulnerable to HTTP Desync attacks a.k.a. Request Smuggling.
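To see how two parsers can disagree about where a message ends, here is a small sketch that takes one body and computes its end both ways. The function end_by_chunked and the sample bytes are invented for illustration, and the chunk parsing assumes no trailer fields.

```python
def end_by_chunked(body):
    """Offset where the body ends if the chunked framing is trusted."""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            return body.index(b"\r\n", pos) + 2  # assumes no trailer fields
        pos += size + 2  # skip the chunk data and its trailing CRLF


# A malformed message carries both headers; this is the part after the headers.
body = b"0\r\n\r\nLEFTOVER"
content_length = len(body)  # the value the Content-Length header claims

left_for_length_parser = body[content_length:]
left_for_chunked_parser = body[end_by_chunked(body):]
print(left_for_length_parser)   # b''
print(left_for_chunked_parser)  # b'LEFTOVER'
```

A parser that trusts Content-Length consumes all 13 bytes as the body, while one that follows the chunked framing stops after the empty chunk and treats the remaining bytes as the start of the next message. That disagreement is the root of the desync problem described above.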

How can I set Transfer-Encoding to chunked, explicitly or implicitly, in an ASP.NET response?

Can I simply set the Transfer-Encoding header?
Will calling Response.Flush() at some point cause this to occur implicitly?
EDIT
No, I cannot call Response.Headers.Add("Transfer-Encoding","anything"); that throws.
Any other suggestions?
Related:
Enable Chunked Transfer Encoding in ASP.NET
TL;DR: Specifying the content-length is the best way to achieve a fast first byte; you'll allow chunking at the TCP level rather than the HTTP level. If you don't know the content-length, setting context.Response.BufferOutput to false will send output as it's written to the output stream using chunked transfer-encoding.
Why do you want to set Transfer-Encoding: chunked? Chunked transfers are essentially a work-around to permit sending documents whose content-length is not known in advance. ASP.NET, however, by default buffers the entire output and hence does know the overall content length.
Of course, HTTP is layered over TCP, and behind the scene TCP is "chunking" anyhow by splitting even a monolithic HTTP response into packets - meaning that if you specify the content-length up front and disable output buffering, you'll get the best latency without requiring HTTP-level chunking. Thus, you don't need HTTP-level chunking to provide a fast first byte when you know the content-length.
Although I'm not an expert on HTTP, I have implemented a simple streaming media server with seeking support, dynamic compression, caching etc. and I do have a reasonable grasp of the relevance of a fast first byte - and chunking is generally an inferior option if you know the content-length - which is almost certainly why ASP.NET won't let you set it manually - it's just not necessary.
However, if you don't know the HTTP content length before transmission and buffering is too expensive, you turn off output buffering and presumably the server will use a chunked transfer encoding by necessity.
When does the server use chunked transfer encoding? I just tested, and indeed if context.Response.BufferOutput is set to false, and when the content length is not set, the response is chunked; such a response is 1-2% larger in my entirely non-scientific quick test of a 1.7MB content-encoding: gzip xml document. Since gzip relies on context to reduce redundancy, I'd expected the compression ratio to suffer more, but it seems that chunking doesn't necessarily greatly reduce compression ratios.
If you look at the framework code in reflector, it seems that the transfer encoding is indeed set automatically as needed - i.e. if buffering is off AND no content length is known AND the response is to an HTTP/1.1 request, chunked transfer encoding is used. However, if the server is IIS7 and this is a worker request (?integrated mode?), the code branches to a native method - probably with the same behavior, but I can't verify that.
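The decision described above can be paraphrased as a small function. This is only a sketch of the logic as stated in this answer, not the framework's actual code; choose_framing and its return values are made up, and the branches other than the chunked one are assumptions.

```python
def choose_framing(buffered, content_length, http_version):
    """Pick how the response body is delimited (paraphrase, not framework code)."""
    if content_length is not None:
        return "content-length"  # the length was set explicitly
    if buffered:
        # Assumption: with buffering on, the whole body is available,
        # so its length can be computed before the headers are sent.
        return "content-length"
    if http_version >= (1, 1):
        return "chunked"  # unbuffered, unknown length, HTTP/1.1 request
    # Assumption: HTTP/1.0 has no chunked encoding, so the only way left
    # to mark the end of the body is closing the connection.
    return "close"


print(choose_framing(buffered=False, content_length=None, http_version=(1, 1)))
print(choose_framing(buffered=False, content_length=1024, http_version=(1, 1)))
print(choose_framing(buffered=False, content_length=None, http_version=(1, 0)))
```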
It looks like you need to set up IIS for this. IIS 6 has a property AspEnableChunkedEncoding in the metabase and you can see the IIS 7 mappings for this on MSDN at http://msdn.microsoft.com/en-us/library/aa965021(VS.90).aspx.
This will enable you to set TRANSFER-ENCODING: chunked in your header. I hope this helps.
Even if you set Buffer to false and leave the content length unset, you need to make sure that the "Dynamic Content Compression" feature is disabled in IIS7 for chunked responses to work. Also, the client browser must support at least HTTP 1.1; chunked mode won't work with HTTP 1.0.
Response.Buffer = False
This will set the HTTP header "Transfer-Encoding: chunked" and send the response on each call to Response.Write.
