HTTP pipelining - concurrent responses per connection

I was just reading this Wikipedia article on HTTP pipelining and from the diagram it appears that responses can be sent concurrently on one connection. Am I misinterpreting the diagram or is this allowed?
Section 8.1.2.2 of RFC 2616 states:
A server MUST send its responses to those requests in the same order
that the requests were received.
Whilst that stops short of explicitly ruling out concurrent responses, it does not address whether responses must not only start in the correct order relative to the requests, but also finish in that order.
I also cannot imagine the practicalities of dealing with concurrent responses - how would the client know to which response the received data applies?
Therefore my interpretation of the RFC is that, whilst additional requests can be made while the response to the first request is being processed, it is not allowed for the client to send concurrent requests or for the server to send concurrent responses on the same connection.
Is this correct? I've attached a diagram below to illustrate my interpretation.
It would prevent the problems I mentioned from occurring, but it does not appear to completely align with the diagram in Wikipedia.

Short answer: Yes, clients and servers can send requests and responses concurrently.
However, a server cannot send multiple responses to one request, i.e. the request-response pattern still applies. RFC 2616 (and the Wikipedia article you are referring to) simply states that a client does not need to wait for the server's response before sending an additional request on the same connection. So the requests in your diagram look good :).
But the server doesn't have to wait for each of its responses to be received before it can start transmitting the next one. It can just send the responses to the client as it receives and processes the client's requests. (Which results in the diagram shown in the Wikipedia article.)
How does the client know to which request a response applies?
Well, let's ignore all that network-delay business for a minute and assume that pipelined request or response messages arrive all at once, but only after all of them have been sent.
The client sends its requests in a certain order (without waiting for responses between requests).
The server receives the requests in the same order (TCP guarantees that) all at once.
The server takes the first request message, processes it, and stores the response in a queue.
The server takes the second request message, processes it, and stores the response in a queue.
(You get the idea...)
The server sends the contents of that queue to the client. The responses are stored in order so the response to the first request is at the beginning of that queue followed by the response to the second request and so on...
The client receives the responses in the same order (TCP guarantees that) and associates the first response with the first request it made and so on.
This still works even if we don't assume that we receive all the messages at once because TCP guarantees that the data that was sent is received in the same order.
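To make the ordering explicit, here is a toy model of that server-side queue in Python (the request strings are placeholders; there is no real parsing here, it is just a sketch of the first-in, first-out behaviour described above):

from collections import deque

# Toy model of the steps above: requests are processed in arrival order
# and the finished responses are queued in that same order.
incoming = deque(["GET /request1.html", "GET /request2.html", "GET /request3.html"])
response_queue = deque()

while incoming:
    request = incoming.popleft()                     # take the next request, in order
    response_queue.append(f"200 OK for {request}")   # "process" it, queue the response

# Draining the queue front to back sends responses in request order.
while response_queue:
    print(response_queue.popleft())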
We could also ignore the network completely and just look at the messages that are transferred between server and client.
Client -> Server
GET /request1.html HTTP/1.1
Host: example.com
...
GET /request2.html HTTP/1.1
Host: example.com
...
GET /request3.html HTTP/1.1
Host: example.com
...
Server -> Client
HTTP/1.1 200 OK
Content-Length: 234
...
HTTP/1.1 200 OK
Content-Length: 123
...
HTTP/1.1 200 OK
Content-Length: 345
...
The great thing about TCP is that this particular stream of messages always looks the same. You can send all of the requests first and then receive the responses; you can send request 1 first, receive the first response, send the remaining requests, and receive the remaining responses; you can send the first and part of the second request, receive part of the first response, send the remaining requests, receive the remaining responses; etc. Because TCP guarantees to keep the order of the transmitted messages, we can always associate the first request with the first response and so on.
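To see this on a real socket, here is a minimal Python sketch that pipelines three requests on one connection. The host and paths are the placeholders from the messages above, and note that not every server actually supports pipelining:

import socket

# Send three requests back to back on one TCP connection, without waiting
# for any response in between. The last request asks the server to close
# the connection afterwards so the read loop below terminates.
sock = socket.create_connection(("example.com", 80))
sock.sendall(
    b"GET /request1.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    b"GET /request2.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    b"GET /request3.html HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
)

# TCP delivers the bytes in order, so the first response on the stream
# answers the first request, the second answers the second, and so on.
data = b""
while chunk := sock.recv(4096):
    data += chunk
sock.close()
print(data.decode(errors="replace"))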
I hope this answers your question...

Related

Does an HTTP "200 OK" response guarantee that the document has been received by the machine that generated the HTTP request?

I have two machines, A and B.
A sends an HTTP request to B and asks for some document.
B responds and sends the requested document with a 200 OK message, but machine A complains that the document was not received because of a network failure.
Does the HTTP 200 code also work as an acknowledgment that the document was received?
Does the HTTP 200 code also work as an acknowledgment that the document has been received?
No. Not at all.
It is not even a guarantee that the document was completely transmitted.
The response code is in the first line of the response stream. The server could fail, or be disconnected from the client anywhere between sending the first line and the last byte of the response. The server may not even know this has happened.
In fact, there is no way that the server can know if the client received a complete (or partial) HTTP response. There is no provision for an acknowledgment in the HTTP protocol.
Now you could implement an application protocol on top of HTTP in which the client is required to send a second HTTP request to the server to say "yes, I got the document". But this would involve some "application logic" implemented in the user's browser, e.g. in JavaScript.
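A minimal sketch of such an application-level acknowledgment, assuming a hypothetical /document resource and /ack endpoint (both made up for illustration; this uses the third-party requests library):

import requests

BASE = "https://example.com"  # hypothetical service

# Fetch the document. raise_for_status() turns HTTP error codes into
# exceptions, and a connection failure while reading the body raises too.
resp = requests.get(f"{BASE}/document")
resp.raise_for_status()
document = resp.content

# Only after the full body has been read does the client acknowledge
# receipt with a second HTTP request.
ack = requests.post(f"{BASE}/ack", json={"received_bytes": len(document)})
ack.raise_for_status()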
Absolutely not.
HTTP 200 is generated by the server, and only means that it understood the request and thinks it is able to fulfill it (e.g. the file is actually there).
All sorts of errors may occur during the transmission of the full response document (network connection breaking, packet loss, etc) which will not show up in the HTTP response, but need to be detected separately.
A pretty good guide to the HTTP protocol is found here: http://blog.catchpoint.com/2010/09/17/anatomyhttp/
You should make a distinction between the HTTP protocol and the underlying stream transport protocol, which must be reliable for HTTP's purposes. The stream transport protocol will ACKnowledge all data transmission, including the response, so that both ends of the exchange can affirm that the data was transmitted correctly. If the transport stream fails, then you will get a 'network failure' or similar error. When this happens, the HTTP protocol cannot continue; the data is no longer reliable or even complete.
What a 200 OK message means, at the HTTP level, is that the server has the document you're after and is about to transmit it to you. Normally you will get a Content-Length header as well, so you will be able to ascertain if and when the body is complete, as an additional check on top of the stream protocol. From the HTTP protocol perspective, a response receives no acknowledgement: once a response has been sent, there is no verification.
However, as the stream transport is reliable, the act of sending the response will either be successful or result in an error. This does verify whether the document has been received by the network target (as noted by TripeHound, in the case of non-direct connection, e.g. a proxy, this is not a guarantee of delivery to the final target).
It's very simple to see that the 200 OK response code can't be a guarantee of anything about the response document. It's sent before the document is transmitted, so only a violation of causality could allow it to be dependent on successful reception of the document. It only serves as an indicator that the request was received properly and the server believes that it's able to fulfill the request. If the request requires extra processing (e.g. running a script), rather than just returning a static document, the response code should generally be sent after this has been completed, so it's normally an indicator that this was successful (but there are situations where this is not feasible, such as requests with persistent connections and push notifications -- the script could fail later).
On a more general level, it's never possible to provide an absolute guarantee that all messages have been received in any protocol, due to the Two Generals Problem. No acknowledgement system can get around this, because at some point there has to be a last acknowledgement; there's no way to know if this is received successfully, because that would require another acknowledgement, contradicting the premise that it was the last one.
HTTP is designed with an awareness of the possibility of various sorts of "middleboxes" - proxies operating with or without the knowledge of the client.
If there is a proxy involved, then even knowing that the server had transmitted all the data and received a normal connection close would not tell you anything about whether the document has been received by the machine that generated the HTTP request.
A sends a request to B. There may be all kinds of obstacles in the way that prevent the request from reaching B. In the case of HTTPS, the request may reach B but be rejected, in which case it counts as if it hadn't reached B. In all these cases, B will not send any status at all.
Once the request reaches B, and there are no bugs crashing B, no hardware failures, etc., B will examine the request and determine what to do and what status to report. If A requested a file that is there and that A is allowed to access, B will start sending a "status 200" together with the file data.
Again, all kinds of things can go wrong. A may receive nothing, or the "status 200" with no data or incomplete data, etc. (By "receive" I mean that data arrives on the Ethernet cable, or through WiFi.)
Usually the user of A will use some library that handles the ugly bits. With some decent library, the user can expect that they either get some error, or a status complete with the corresponding data. If a status 200 arrives at A with only half the data, the user will (depending on the design of the library) receive an error, not a status, and definitely not a status 200.
Or you may have a library that reports the status 200 and tells you "here's the first 2,000 bytes", "here's the next 2,000 bytes" and so on, and at some point when things go wrong, you might be told "sorry, there was an error, the data is incomplete".
But in general, the case that the user gets a status 200, and no data, will not happen.
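As a sketch of how a decent library surfaces this: Python's standard http.client compares the body it reads against the Content-Length header and raises IncompleteRead if the connection dies first, so a truncated document becomes an error rather than a quietly short body (the host here is a placeholder):

import http.client

conn = http.client.HTTPConnection("example.com")
conn.request("GET", "/")
response = conn.getresponse()
try:
    body = response.read()   # reads the body, verifying it against Content-Length
    print(response.status, len(body), "bytes received in full")
except http.client.IncompleteRead as err:
    # The status line said 200, but the body was cut short.
    print("truncated response:", err)
finally:
    conn.close()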

How does the browser map the web response back to the request?

Say I make a web request (www.amazon.com) to the Amazon web server through my browser. The browser makes the connection to the Internet through my Internet service provider.
The request reaches the Amazon server, which processes it and sends back the response. Two questions here:
Does the Amazon server make a new connection to the Internet to send the response back, or does the incoming request (initiated by me) wait on the socket until Amazon produces the response?
Once my browser receives the response, how does it map the response (sent from Amazon) back to the particular request? I believe there must be some unique identifier, like a requestId, present in the response through which the browser maps it to the request. Is that correct?
Does the Amazon server make a new connection to the Internet to send the response back, or does the incoming request (initiated by me) wait on the socket until Amazon produces the response?
It uses the same connection. Most of the time it's not even possible to connect back to a web browser due to firewall restrictions or Network Address Translation (NAT).
Once my browser receives the response, how does it map the response (sent from Amazon) back to the particular request? I believe there must be some unique identifier, like a requestId, present in the response through which the browser maps it to the request. Is that correct?
It receives the response on the same socket, so the socket is the identifier. If HTTP/2 multiplexing is used, each multiplexed stream has a stream identifier, which is used to map the response back to the request.
The client opens a TCP connection to the server, sends an HTTP request, and the server sends the response over the same connection. So the browser knows from the connection that the response belongs to a specific request. This applies to basic HTTP/1.
This has to be distinguished from the programming model of an AJAX web application, which is asynchronous rather than synchronous. The application does not actively wait for a response; it is instead triggered later, when the response arrives. The connection handling described above is what happens "under the hood".
Back to the connection handling: there are optimizations of HTTP that make things more complicated. HTTP/1.1 has a feature called "keep-alive" and HTTP/2 goes further in this direction. The idea is to send more data over a single TCP connection, because establishing a TCP connection is expensive (three-way handshake, slow start). So multiple requests and responses are sent over a single TCP connection. Your question arises again with this optimization: if, for example, there is a sequence of requests A, B and a sequence of corresponding responses B, A within a single connection, how does the browser know which request a response belongs to? HTTP/2 introduces the concept of streams (RFC 7540, section 5):
A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.
The order in which frames are sent on a stream is significant.
Streams are identified by an integer.
So, the stream identifier and the order within a stream can be used by the browser to find out which request a response belongs to.
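A toy illustration of that bookkeeping in Python (this is not real HTTP/2 framing, just a sketch of how stream identifiers let interleaved frames be reassembled per request; the paths and payloads are made up):

from collections import defaultdict

# Requests the browser sent, keyed by the stream ID it assigned to each.
pending = {1: "GET /index.html", 3: "GET /style.css", 5: "GET /logo.png"}

# Frames may arrive interleaved across streams, but within one stream
# their order is significant (RFC 7540, section 5).
frames = [
    (3, b"body of style.css"),
    (1, b"first half of index.html"),
    (5, b"logo.png bytes"),
    (1, b"second half of index.html"),
]

reassembled = defaultdict(list)
for stream_id, payload in frames:
    reassembled[stream_id].append(payload)   # collect per stream, in order

for stream_id, parts in sorted(reassembled.items()):
    print(pending[stream_id], "->", b"".join(parts))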
HTTP/2 introduces another interesting feature, called "push". The server can proactively send resources to the client that the client has not even requested. So resources such as images can already be sent when the HTML is requested, avoiding another communication round trip.
HTTP uses the Transmission Control Protocol (TCP). This is how it happens:
Does the Amazon server make a new connection to the Internet to send the response back, or does the incoming request (initiated by me) wait on the socket until Amazon produces the response?
No. Most browsers use HTTP/1.1, so the connection between client and server is established once and reused until it is closed (a persistent connection).
Once my browser receives the response, how does it map the response (sent from Amazon) back to the particular request? I believe there must be some unique identifier, like a requestId, present in the response through which the browser maps it to the request. Is that correct?
There is a protocol (HTTP) governing how the messages are exchanged. HTTP dictates that responses must arrive in the order they were requested. So it goes like:
Request; Response; Request; Response; Request; Response; ...
There is also a specific format for an HTTP request (from your browser, the HTTP client) and for an HTTP response message (from the Amazon HTTP server). Response status codes let the browser know whether its request succeeded, and otherwise indicate the error.
A few sample codes: 200 OK, 301 Moved Permanently, 404 Not Found, 500 Internal Server Error.

What is the difference between the HTTP 100 and 200 status codes?

What is the difference between the HTTP 100 and 200 status codes?
Are they the same?
I was told that 200 is the standard code when the HTTP request is successful without any errors whatsoever.
Is that right?
What about this 100 code? I have found different explanations of this status code. Could somebody explain it using a real-world example, please?
Because right now I don't know the difference and both seem to be the same to me.
Let me give you an example:
Suppose you're sending a large object to the server using a PUT request; you may include an Expect header like this:
PUT /media/file.mp4 HTTP/1.1
Host: api.example.org
Content-Length: 1073741824
Expect: 100-continue
This tells the server that it should respond with a 100 Continue status code if the server is going to be able to accept the request:
HTTP/1.1 100 Continue
When the client receives this, it knows the server will accept the request, and it may start sending the request body.
The big benefit here is that if there’s a problem with the request, a server can immediately respond with an error before the client starts sending the request body.
A simple use-case is that a server might first require authentication using 401 Unauthorized, or it might know in advance that the Content-Type that the client wants to send to the server is not something the server will want to accept.
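Here is a rough sketch of that handshake over a raw socket in Python. The host and upload path come from the example above, and a real server must support Expect: 100-continue for this to behave as shown:

import socket

body = b"x" * 1024  # stand-in for the large upload
sock = socket.create_connection(("api.example.org", 80))
sock.sendall(
    b"PUT /media/file.mp4 HTTP/1.1\r\n"
    b"Host: api.example.org\r\n"
    + f"Content-Length: {len(body)}\r\n".encode()
    + b"Expect: 100-continue\r\n\r\n"
)

# Wait for the interim response before committing to the upload.
interim = sock.recv(4096)
if interim.startswith(b"HTTP/1.1 100"):
    sock.sendall(body)  # server agreed, so send the request body
    print(sock.recv(4096).decode(errors="replace"))  # final response
else:
    # e.g. 401 Unauthorized, sent before any of the body went over the wire
    print(interim.decode(errors="replace"))
sock.close()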
Mainly cited from:
https://evertpot.com/http/100-continue/
https://www.rfc-editor.org/rfc/rfc7231#section-5.1.1
From: http://www.rfc-editor.org/rfc/rfc7231.txt
6.2.1. 100 Continue
The 100 (Continue) status code indicates that the initial part of a request has been received and has not yet been rejected by the server. The server intends to send a final response after the request has been fully received and acted upon.
When the request contains an Expect header field that includes a 100-continue expectation, the 100 response indicates that the server wishes to receive the request payload body, as described in Section 5.1.1. The client ought to continue sending the request and discard the 100 response.
If the request did not contain an Expect header field containing the 100-continue expectation, the client can simply discard this interim response.

Mapping HTTP requests to HTTP responses

If I make multiple HTTP GET requests to the same server and get HTTP 200 OK responses to each one, how do I tell which request maps to which response using Wireshark?
Currently it looks like an HTTP request is made and the next HTTP 200 OK response is quickly received, so everything is in the proper sequence. I have seen things to the contrary, however. For example, using the Google Maps API v2, I've made several requests for location information and then received the information in an arbitrary order (closely resembling the order in which I requested it, but not necessarily identical).
So my intuition is that I cannot assume my responses will be received in a specific order, even though they may be in order most of the time. So I'm wondering how I can determine this order from the responses.
Update: Clarification as to what I need. I just need to know that the server has received the request. It seems I need to do this by looking at sequence numbers and perhaps even ACKs. The reasoning behind this approach is that I'm basically observing a web app and checking that it is sending the information and that the information is being received.
Update: This has nothing to do with Wireshark specifically. I believe it is confusing people, so I am removing it from the title. It has to do with the HTTP protocol on top of the TCP/IP protocol and how we map responses to requests.
Thanks.
After you have stopped capturing packets, follow these steps:
Position the cursor on a GET request.
Open the Analyze menu.
Click "Follow TCP Stream".
You will get a new window with the requests and responses in sequence.
While I was googling for a completely different question, I saw this one, and I think I can provide a more complete answer:
HTTP dictates that responses must arrive in the order they were requested. Therefore, if you are looking at a single TCP connection at a given time, you should see:
Request ; Response ; Request ; Response ...
Also, HTTP/1.1 supports pipelining, where the client doesn't have to wait for responses to arrive before issuing the next request. What could be observed in such cases is:
Request ; Response ; Request ; Request ; Response ; Response ; Request ; Response
In the HTTP response itself, there is no reference to the specific request that triggered it.
Filipo's suggestion is the classic approach when debugging or observing a single TCP connection, but when observing multiple TCP connections you can't just click Follow TCP Stream, because you'd have to do it for each connection.
If you have many TCP connections and many requests/responses, you will have to look at the TCP source port in the request packet and the TCP destination port in the response packet to know which response belongs to which TCP connection, and then apply the HTTP request/response ordering rules.
Also, Wireshark can decompress the response body, and it will do so automatically once the whole response body has arrived, but not in the Follow TCP Stream view.
I always use Wireshark to debug HTTP.
It seems this ability is not provided by the HTTP protocol at the application layer, so I must go down to the transport layer to determine this. In my case that is the TCP/IP layer, using sequence numbers.
HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used; the mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of this specification.
Read more: http://www.faqs.org/rfcs/rfc2616.html
Don't use Wireshark to debug HTTP; use an HTTP debugger such as Fiddler2.

HTTP chunks: All chunks sent consecutively and uninterrupted?

Can I be sure that a chunked HTTP response will be sent uninterrupted by anything else? I need to differentiate responses (and requests), and this isn't a simple case of reading a Content-Length, seeing a closed connection, or a no-body response code.
Can I read each chunk, and once the chunk size is 0, will I have read exactly one response (or request)? That is, is it possible for part of any other response to have been sent interleaved? I suspect it is sent consecutively and uninterrupted, as there doesn't appear to be any kind of identification in the spec for chunked transfer, so how could more than one be reassembled?
Finally, if a response is sent chunked, does the client send anything more than its original request? I'm thinking along the lines of flow control and error checking, but that is all handled at lower layers, so I suspect the client does not send anything more.
Thanks!
"Interleaved" with what exactly? HTTP doesn't allow to send several responses concurrently on the same connection. Even with pipelining responses are still sent after each other. That is, you will see all the chunks coming in order before the response to any other request.
As for your final question, no, the client doesn't send anything more than the original request.
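To illustrate why no identifier is needed, here is a minimal Python sketch of reading exactly one chunked body: the chunk sizes themselves delimit the message, and a size of 0 marks its end (chunk extensions and trailers are ignored for brevity):

import io

def read_chunked_body(stream):
    # Each chunk is a hex size line, CRLF, <size> bytes of data, CRLF.
    # A chunk size of 0 terminates the body.
    body = b""
    while True:
        size_line = stream.readline().strip()
        size = int(size_line.split(b";")[0], 16)  # drop optional chunk extensions
        if size == 0:
            stream.readline()   # final CRLF (trailers ignored in this sketch)
            return body
        body += stream.read(size)
        stream.readline()       # CRLF that follows each chunk's data

# Two chunks ("Hello, " is 7 bytes, "world!" is 6) followed by the 0 chunk.
raw = b"7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"
print(read_chunked_body(io.BytesIO(raw)))   # b'Hello, world!'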
