Can I be sure that a chunked HTTP response will be sent uninterrupted by anything else? I need to differentiate responses (and requests) and this isn't a simple case of reading content length, seeing a closed connection or a no-body response code.
Can I read each chunk and once chunk-size is 0 I will have read exactly one response (or request)? i.e. is it possible for part of any other response to have been sent interleaved? I suspect it is sent consecutively and uninterrupted as there doesn't appear to be any kind of identification in the spec for chunked transfer, so how could more than one be reassembled?
Finally, if a response is sent chunked, does the client send anything more than its original request? I'm thinking along the lines of flow control and error checking but that is all handled at lower layers, so I suspect that the client does not send anything more.
Thanks!
"Interleaved" with what exactly? HTTP doesn't allow to send several responses concurrently on the same connection. Even with pipelining responses are still sent after each other. That is, you will see all the chunks coming in order before the response to any other request.
As for your final question, no, the client doesn't send anything more than the original request.
Related
I've read about chunked encoding in few places, but still don't quite get why for example a Radio streaming server such as this one sends its data as chunked Transfer-Encoding. A client of such server just continuously reads data from the server. It doesn't in any point needs to know how much data is currently being sent since it is always consuming data (as long as it stays connected).
It seems that the client doesn't use these pieces of data, he just discards them, which adds more job for him.. Can anyone explain this to me?
HTTP/1.1 needs to either send a Content-Length, or chunked encoding. If you can't know the length, you must use chunked encoding. A continuous stream would in theory have an infinite length.
A HTTP client needs either of these to know when the response ends. After the respons a new HTTP request could be sent.
You are correct that in the case of a continuous stream it's not needed to detect when the stream ends, because it doesn't. It could have been possible for the authors of HTTP/1.1 to have a third 'continuous stream of bytes' option for HTTP. Perhaps that use-case wasn't considered all those years ago.
Note that HTTP/2 doesn't have chunked encoding anymore, but it does something similar and sends multiple body frames.
Say I started writing to the response body, but there was some error, and I need to indicate that it's an HTTP 500 even if an HTTP 200 OK header was already written as a header...
How can I write something to the body of the response that's guaranteed to be malformed so that the response is interpreted as some sort of error by the client?
In general, this is impossible. Some clients only care about the response header, and may stop paying attention to what you send after the header.
But with certain clients, in certain cases, this may be possible.
I assume HTTP/1.1 here. HTTP/2 probably gives even more opportunities, because there’s more to screw up in the protocol, and the implementations are often stricter. Conversely, HTTP/1.0 is dumber and laxer, so harder to break.
Close the connection before the end of response, as indicated by your framing. If your response is framed with Content-Length: 100, close before you’ve sent the 100th byte of payload. If your response is framed with Transfer-Encoding: chunked, close before you’ve sent the final empty chunk. If the client expects to receive the entire payload, it may (and should) treat this as an error. But some won’t, including very popular client libraries.
If the payload is in a structured format, like JSON or XML, then do the same as 1, but before closing, send something that would disrupt that format. For example, no valid JSON text can end with {. Even if the client doesn’t recognize the incomplete payload as an error, it might then fail on trying to parse it.
Same as 1, but instead of closing the connection, just stop sending data. The client will “hang” until its receive operation times out, which it may treat as an error. This may be a bad idea if the client is operated by someone who is not prepared for such extravagant timeouts.
Only with Transfer-Encoding: chunked: Same as 3, but instead of hanging, send bogus very long chunks and/or keep sending chunks indefinitely, until the client gives up or crashes. Probably a very bad idea, bordering on malicious.
I have two machines, A and B.
A sends an HTTP request to B and asks for some document.
B responds back and sends the requested document and gives a 200 OK message, but machine A is complaining that the document is not received because of a network failure.
Does HTTP code 200 also work as acknowledgment that the document is received?
Does the HTTP 200 code also work as an acknowledgment that document has been received?
No. Not at all.
It is not even a guarantee that the document was completely transmitted.
The response code is in the first line of the response stream. The server could fail, or be disconnected from the client anywhere between sending the first line and the last byte of the response. The server may not even know this has happened.
In fact, there is no way that the server can know if the client received a complete (or partial) HTTP response. There is no provision for an acknowledgment in the HTTP protocol.
Now you could implement an application protocol over the top of HTTP in which the client is required to send a second HTTP request to the server to say "yes, I got the document". But this would involve some "application logic" implemented in the user's browser; e.g. in Javascript.
Absolutely not.
HTTP 200 is generated by the server, and only means that it understood the request and thinks it is able to fulfill it (e.g. the file is actually there).
All sorts of errors may occur during the transmission of the full response document (network connection breaking, packet loss, etc) which will not show up in the HTTP response, but need to be detected separately.
A pretty good guide to the HTTP protocol is found here: http://blog.catchpoint.com/2010/09/17/anatomyhttp/
You should make a distinction between the HTTP protocol and the underlying stream transport protocol, which should be reliable for HTTP purposes. The stream transport protocol will ACKnowledge all data transmission, including the response, so that both ends of exchange will affirm that the data is transmitted correctly. If the transport stream fails, then you will get a 'network failure' or similar error. When this happens, the HTTP protocol cannot continue; the data is no longer reliable or even complete.
What a 200 OK message means, at the HTTP level, is that the server has the document you're after and is about to transmit it to you. Normally you will get a content-length header as well, so you will be able to ascertain if/when the body is complete as an additional check on top of the stream protocol. From the HTTP protocol perspective, a response receives no acknowledgement, so once a response has been sent there is no verification.
However, as the stream transport is reliable, the act of sending the response will either be successful or result in an error. This does verify whether the document has been received by the network target (as noted by TripeHound, in the case of non-direct connection, e.g. a proxy, this is not a guarantee of delivery to the final target).
It's very simple to see that the 200 OK response code can't be a guarantee of anything about the response document. It's sent before the document is transmitted, so only a violation of causality could allow it to be dependent on successful reception of the document. It only serves as an indicator that the request was received properly and the server believes that it's able to fulfill the request. If the request requires extra processing (e.g. running a script), rather than just returning a static document, the response code should generally be sent after this has been completed, so it's normally an indicator that this was successful (but there are situations where this is not feasible, such as requests with persistent connections and push notifications -- the script could fail later).
On a more general level, it's never possible to provide an absolute guarantee that all messages have been received in any protocol, due to the Two Generals Problem. No acknowledgement system can get around this, because at some point there has to be a last acknowledgement; there's no way to know if this is received successfully, because that would require another acknowledgement, contradicting the premise that it was the last one.
HTTP is designed with an awareness of the possibility of various sorts of "middleboxes" - proxies operating with or without the knowledge of the client.
If there is a proxy involved, then even knowing that the server had transmitted all the data and recieved an normal close connection would not tell you anything about whether the document has been received by the machine who generated the HTTP request.
A sends a request to B. There may be all kinds of obstacles in the way that prevent the request from reaching B. In the case of https, the request may be reaching B but be rejected and it counts as if it hadn't reached B. In all these cases, B will not send any status at all.
Once the request reaches B, and there are no bugs crashing B, and no hardware failure etc. B will examine the request and determine what to do and what status to report. If A requested a file that is there and A is allowed access, B will start sending a "status 200" together with the file data.
Again all kinds of things can go wrong. A may receive nothing, or the "status 200" with no data or incomplete data etc. (By "receive" I mean that data arrives on the Ethernet cable, or through WiFi).
Usually the user of A will use some library that handles the ugly bits. With some decent library, the user can expect that they either get some error, or a status complete with the corresponding data. If a status 200 arrives at A with only half the data, the user will (depending on the design of the library) receive an error, not a status, and definitely not a status 200.
Or you may have a library that reports the status 200 and tells you "here's the first 2,000 bytes", "here's the next 2,000 bytes" and so on, and at some point when things go wrong, you might be told "sorry, there was an error, the data is incomplete".
But in general, the case that the user gets a status 200, and no data, will not happen.
How is a socket used by an HTTP client properly closed after transmitting the request? Or does it have to remain open (bidirectionally) until the complete response has been received? If so, how is the end of the request body determined by the server?
According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4, closing the socket is not an option for a request. That doesn't sound logical to me - why should a half-closed TCP connection be a problem for the server, if the client doesn't try to transmit anything after closing its half of the socket? The client can still receive data after all.
It seems to me that shutting down the write part of a socket would be a very practical way of letting the server know that the request has been finished. http://docs.python.org/howto/sockets.html#disconnecting even specifically mentions that use case.
If that's really the wrong way to do it, what's the alternative? Do I really always have to send a "Content-length" or use chunked transport to enable the server to properly find the end of a request? How does that work for requests with unknown body length?
Transfer-Encoding: chunked is specifically designed to allow sending data with an unknown body length, for both requests and responses. The end of the data is determined by receiving a chunk whose payload size is 0. If you do not send a chunked request, then you must send a Content-Length instead.
Are you talking about this?:
Closing the connection cannot be used to indicate the end of a request body, since that would leave no possibility for the server to send back a response.
I think the text is talking about full close, you can do a half(write)-close. I'm not sure that's a HTTP compilant way of doing it, but I would think most servers will accept it.
Regarding your second question, simply use chunked encoding:
All HTTP/1.1 applications that receive entities MUST accept the "chunked" transfer-coding (section 3.6), thus allowing this mechanism to be used for messages when the message length cannot be determined in advance.
If I make multiple HTTP Get Requests to the same server and get HTTP 200 OK responses to each one how do I tell which request maps to which response using Wireshark?
Currently it looks like an http request is made, and the next HTTP 200 OK response is quickly received so everything is in a the proper sequence. I have seen things to the contrary however. For example using the Google Maps API v2 I've made several requests for location information and then the information is received in an arbitrary order (closely resembling the order in which I requested it, but not necessarily perfect.)
So my intuition is I cannot assume that my responses will be received in a specific order, even though they may be in order most of the time. So I'm wondering how I can determine this order from the response.
Update: Clarification as to what I need. I just need to know that the server has received the request. It seems like I need to do this by looking at sequence numbers and perhaps even ACKS. The reasoning behind this approach is I'm basically observing a web app and checking it is sending the information and the information is being received.
Update: This has nothing to do with wireshark specifically. I believe it is confusing people so I removing it from the title. It has to do with the HTTP protocol on top of the TCP/IP protocol and how we map responses to requests.
Thanks.
After you have stopped capturing packets follow this steps:
position the cursor on a GET request
Open the Analyze menu
click "Follow TCP Stream"
You get a new window with requests and responses in sequence.
While I was googling for a complete different question, I saw this one and I think I can provide a more complete answer :
HTTP dictates that responses must arrive in the order they were requested, Therefore, if you are looking at a single TCP connection at a given time you should be seeing :
Request ; Response ; Request ; Response ...
Also in HTTP/1.1, there is support for "Pipeline" where the client doesn't have to wait for responses to arrive in order to issue the next request. What could be observed in such cases is :
Request ; Response ; Request ; Request ; Response ; Response ; Request ; Response
In the HTTP response itself, there is no reference to the specific request that triggered it.
Filipo's suggestion is classic when debugging / observing a single TCP connection, but, when observing multiple TCP connections, you can't click the follow TCP Stream because you'd have to do it for each connection.
If you have many TCP connections, and many requests/responses you will have to look at TCP Source port in the request packet, and the TCP dest port in the response packet to know which response is related to each tcp connection, and then apply the HTTP request/response order rules.
Also, Wireshark CAN decompress the response body, and it will do it automatically if all the response body has arrived, but it will do so NOT in the Follow TCP Stream.
I always use Wireshark to debug HTTP.
Seems like this ability is not provided by the HTTP protocol at the application layer so I must go down to the transportation layer to determine this. In my case the TCP/IP layer using sequence numbers.
HTTP only presumes a reliable
transport; any protocol that provides
such guarantees can be used; the
mapping of the HTTP/1.1 request and
response structures onto the
transport data units of the protocol
in question is outside the scope of
this specification.
Read more:
http://www.faqs.org/rfcs/rfc2616.html#ixzz0e20kxKcz
Don't use Wireshark to debug HTTP, use an HTTP debugger such as Fiddler2