I'm sifting through some network traces and noticed on my own machine that when I connect over HTTP, packets look something like:
client --> server: GET
server --> client: tcp ack
server --> client: HTTP response
client --> server: tcp ack
However, I looked at some CIFS (SMB) traces I have saved from a few years back. I see things like:
client --> server: Create Request
server --> client: Create response (This packet also acks the request)
At a high level, I'm wondering why the difference - what is causing the different behaviors? What is controlling whether the application response is placed on the request ack or another packet: the application or OS?
This behavior depends on both the OS and the application. In Linux, the kernel doesn't send an ACK right away; instead it waits a short time (up to around 200 ms), hoping that it has some data to send back so that the ACK can piggyback on the data.
If the timer goes off, then the ACK is sent immediately.
Example 1.
Client sends the GET request.
Server tries to create an HTTP response, but before it finishes, the 200 ms have passed
and it must send the ACK before the HTTP response.
Example 2.
Client sends the GET request.
Server creates an HTTP response within the timer limit, and the ACK can piggyback
on the data.
This means that if your application gets slower at generating the response, the ACK will be sent without piggybacking on the data. Also, depending on the OS, the delay timer can be longer or shorter, which again changes how ACKs are sent.
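The two examples boil down to a race between the application and the delayed-ACK timer. Here is a toy model of that decision in Python. It is not kernel code; the function name is made up, and the 200 ms default is just the approximate figure mentioned above.

```python
def ack_behaviour(response_ready_ms: float, delack_timeout_ms: float = 200.0) -> str:
    """Toy model: does the ACK ride on the response or go out alone?

    response_ready_ms: how long the application needs to produce its reply.
    delack_timeout_ms: assumed delayed-ACK timer (the real value varies by OS).
    """
    if response_ready_ms < delack_timeout_ms:
        # Data reached the TCP stack before the timer fired,
        # so the ACK is carried in the same segment as the data.
        return "piggybacked"
    # The timer fired first: a bare ACK goes out and the data follows later.
    return "separate ack"
```

A fast server (say 50 ms) lands in the first branch, which matches the CIFS trace; a slow one (say 350 ms) lands in the second, which matches the HTTP trace.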
I'm an application developer looking to learn more about the transport layer of my requests that I've been making all these years. I've also been learning more of the backend and am building my own live data service with websockets, which has me curious about how data actually moves around.
As such I've learned about TCP, and I understand how it works, but there's still one term that confuses me-- a "TCP Connection". I have seen it everywhere, and actually there was a thread opened with the exact same question... but as the OP said in the comments, nobody actually answered the question:
TCP vs UDP - What is a TCP connection?
"when we say that there is a connection established between two hosts,
what does that mean? If I could get a magic microscope and inspect the
server or the client, and - a-ha! - find the connection, what would I
be looking at? Some variable allocated by the OS code? Some entry in
some kind of table? How and when does that gets there, and how and
when it is removed from there"
I've been reading to try to figure this out on my own,
Here is a nice resource that details HTTP flow, also mentions "TCP Connection"
https://blog.catchpoint.com/2010/09/17/anatomyhttp/
Here is another thread about HTTP Keep-alive, same "TCP Connection":
HTTP Keep Alive and TCP keep alive
My understanding:
When a client wants data from server, SYN/ACK handshake happens, this "connection" is established, and both parties agree on the starting sequence number, maximum packet size, etc.
As long as this "connection" is still open, the client can request/receive data without doing another handshake. TCP keep-alive sends a heartbeat to keep this "connection" open.
1) Somehow a HTTP Header "Keep-alive" also keeps this TCP "connection" open, even though HTTP headers are part of the packet payload and it doesn't seem to make sense that the TCP layer would parse the HTTP headers?
To me it seems like a "connection" between two machines in the literal sense can never be closed, because a client is always free to hit a server with packets (like the first SYN packet, for example)
2) Is a TCP "connection" just the client and server saving the sequence number from the other's IP address? maybe it's just a flag that's saying "hey this client is cool, accept messages from them without a handshake"? So would closing a connection just be wiping that data out from memory?
... both parties agree on the starting sequence number
No, they don't "agree" on a number. Each direction has its own sequence numbering. The client sends, in its SYN to the server, the initial sequence number (ISN) for the data from client to server; the server sends, in its SYN, the ISN for the data from server to client.
Somehow a HTTP Header "Keep-alive" also keeps this TCP "connection" open ...
Not really. With HTTP keep-alive the client just asks the server nicely not to close the connection after the HTTP response has been sent, so that another HTTP request can be sent over the same TCP connection. The server may or may not follow the client's wish.
To me it seems like a "connection" between two machines in the literal sense can never be closed,
Each side can send a packet with the FIN flag to signal that it will no longer send any data. Once both sides have sent a FIN, the connection is considered closed, since no one will send anything and thus nothing can be received. If one side decides that it does not want to receive any more data, it can send a packet with the RST flag.
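The "each side sends its own FIN" part can be observed with ordinary sockets. Below is a minimal sketch on the loopback interface, assuming Python 3.8+; the function name and the payloads are made up for illustration. `shutdown(SHUT_WR)` makes the stack send a FIN for one direction while the other direction stays usable.

```python
import socket

def half_close_demo():
    # Listener on an ephemeral loopback port (the address is arbitrary).
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)   # client's FIN: "I will not send any more"

    received = b""
    while True:
        chunk = server_side.recv(1024)
        if not chunk:                 # empty read: the peer's FIN has arrived
            break
        received += chunk

    # The server-to-client direction is still open, so a reply is possible.
    server_side.sendall(b"reply")
    server_side.close()               # server's FIN: now both directions are done

    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        reply += chunk

    client.close()
    listener.close()
    return received, reply
```

The empty `recv()` result is how an application learns that the other side has sent its FIN.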
Is a TCP "connection" just the client and server saving the sequence number from the other's IP address?
Kind of. Each side saves the current state of the connection, i.e. the IPs and ports involved, the sequence number currently expected for receiving, the current sequence number for sending, outstanding bytes which were not ACKed yet, and so on. If no such state exists (for example because one side crashed), then there is no connection.
... maybe it's just a flag that's saying "hey this client is cool, accept messages from them without a handshake"
If a packet is received that fits an existing state, then it is considered part of the connection, i.e. it will be processed and the state will be updated.
So would closing a connection just be wiping that data out from memory?
Closing means telling the other side that no more data will be sent (using FIN). Once both sides have done this, both can basically remove the state, and then there is no connection anymore.
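To make the "magic microscope" view a bit more concrete, here is a heavily simplified sketch of such per-connection state. All names (`ConnState`, `connections`, `on_segment`) are hypothetical, and a real TCP stack tracks far more (receive windows, timers, out-of-order data); this only shows the idea of a table entry that exists while the connection does.

```python
from dataclasses import dataclass

@dataclass
class ConnState:
    snd_nxt: int   # next sequence number we will send
    rcv_nxt: int   # next sequence number we expect to receive

# Key: (local_ip, local_port, remote_ip, remote_port)
connections = {}

def on_segment(key, seq, payload_len):
    """Return True if the segment fits an existing connection's state."""
    state = connections.get(key)
    if state is None or seq != state.rcv_nxt:
        return False               # no state, or not the byte we expect next
    state.rcv_nxt += payload_len   # in-order data: advance what we expect next
    return True
```

In this model, "establishing" a connection is adding an entry after the handshake, and "closing" it is deleting the entry once both FINs have been exchanged.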
I'm doing a test where I examine how much HTTP long polling, compared to WebSockets, affects the battery performance of my iPhone. Basically what I have is a Node.js server with Express that sends out a random string to the iPhone every 0.5 seconds or every 10 seconds. I've inspected the messages in Chrome and I can see the keep-alive header is present. I know keep-alive is a default feature since HTTP/1.1. From what I've understood, the TCP connection will be held open and can be used for pipelining, and this is certainly the case when I'm sending out pings from the server every 0.5 seconds. But when I send one every 10 seconds, will the connection be closed during that time?
How do I know how long the connection is open? This seems to be a crucial part to have in mind when doing the tests.
Will the HTTP-handshake still be made when the TCP-connection is open?
AFAIK, in HTTP 1, the server cannot send a response back to the client if that client didn't send a request first. That might sound irrelevant to your question but bear with me.
The Connection: keep-alive header tells the client that it can reuse the connection if it wants to, not that it must. The client can decide to close it at any time; it all depends on the client library implementation, and you don't have any guarantee.
The only way to force the client to not close the connection is to not finish the response. The only way to do that is to send a response with a Transfer-Encoding: chunked, and never send the final chunk (this has some serious caveats, like a buffer overrun on the client...).
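For reference, this is roughly what chunked framing looks like on the wire. The response stays "unfinished" for as long as the server withholds the last chunk. `encode_chunk` and `LAST_CHUNK` are names made up for this sketch.

```python
def encode_chunk(data: bytes) -> bytes:
    # Each chunk is: size in hexadecimal, CRLF, the data itself, CRLF.
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"

# A zero-length chunk marks the end of the body (no trailers in this sketch).
LAST_CHUNK = b"0\r\n\r\n"
```

A server that keeps writing `encode_chunk(...)` values and never writes `LAST_CHUNK` leaves the client waiting for more body data.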
So to answer your 2 points:
You can't, this low-level detail is totally hidden (for good reasons) from the client.
There is no HTTP handshake. There is a TCP handshake, which is made when the client socket connects to the server socket, and there is a TLS handshake, which is made after the TCP connection is established and before any request is sent. Once the connection is open, HTTP requests are sent by the client and the server responds with resources.
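What you can do from the client side is check whether two requests actually went over the same TCP connection, for example by comparing the local port. Here is a small self-contained sketch using Python's http.client against a throwaway local server. The handler, the function name and the paths are made up, and reading `conn.sock` relies on an attribute of `HTTPConnection` that is widely used but not formally documented.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # needed for persistent connections

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):          # keep the demo quiet
        pass

def client_ports_for_two_requests():
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    ports = []
    for path in ("/a", "/b"):
        conn.request("GET", path)
        conn.getresponse().read()          # finish the response before reusing
        ports.append(conn.sock.getsockname()[1])
    conn.close()
    server.shutdown()
    server.server_close()
    return ports
```

If both entries in the returned list are the same port, the second request reused the connection, so no new TCP handshake took place.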
From HTTP: The Definitive Guide:
But without Content-Length, clients cannot distinguish between
successful connection close at the end of a message and connection
close due to a server crash in the middle of a message.
Let's assume that for this purpose "server crash" means a crash of the server's hardware or OS without the TCP connection being closed, or possibly the link being broken.
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
Since the closing is normally done by the kernel, this would mean that the whole system crashed or that the connection broke somewhere else (a router crashed, a power blackout on the server side, or similar).
You can only detect this if you send data to the server and don't get any useful response back.
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
HTTP uses TCP as the underlying protocol, so if TCP detects a connection close, HTTP will too. Additionally, HTTP can detect in most cases whether the response is complete, by using the Content-Length header or the equivalent information in chunked transfer encoding. In the few cases where the end of the response is only indicated by a connection close, HTTP can only rely on TCP to detect problems. So much for the theory; in practice most browsers simply ignore an incomplete response and show as much as they got.
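The Content-Length check is simple enough to sketch. The function below is a hypothetical helper, not part of any library; it takes whatever bytes arrived before the connection closed and only handles the Content-Length case (no chunked encoding).

```python
def body_status(raw_response: bytes) -> str:
    """Classify a response that ended when the connection closed."""
    head, sep, body = raw_response.partition(b"\r\n\r\n")
    if not sep:
        return "headers incomplete"
    declared = None
    for line in head.split(b"\r\n")[1:]:       # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    if declared is None:
        # End of body was signalled only by the close: cannot tell
        # a clean finish from a crash in the middle of the message.
        return "unknown"
    return "complete" if len(body) >= declared else "truncated"
```

The "unknown" branch is exactly the situation the book quote describes.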
I was just reading this Wikipedia article on HTTP pipelining and from the diagram it appears that responses can be sent concurrently on one connection. Am I misinterpreting the diagram or is this allowed?
Section 8.1.2.2 of RFC 2616 states:
A server MUST send its responses to those requests in the same order
that the requests were received.
Whilst that stops short of explicitly ruling out concurrent responses, it does not mention a need to ensure that responses must not only start in the correct order with relation to requests, but also finish in the correct order.
I also cannot imagine the practicalities of dealing with concurrent responses - how would the client know to which response the received data applies?
Therefore my interpretation of the RFC is that whilst additional requests can be made whilst the response to the first request is being processed, it is not allowed for the client to send concurrent requests or the server to send concurrent responses on the same connection.
Is this correct? I've attached a diagram below to illustrate my interpretation.
It would prevent the problems I mentioned from occurring, but it does not appear to completely align with the diagram in Wikipedia.
Short answer: Yes, clients and servers can send requests and responses concurrently.
However, a server cannot send multiple responses to one request, i.e. the request-response pattern still applies. RFC 2616 (and the Wikipedia article you are referring to) simply state that a client does not need to wait for the server's response before sending an additional request on the same connection. So the requests in your diagram look good :).
But the server doesn't have to wait for each of its responses to finish before it can start transmission of the next response. It can just send the responses to the client as it receives the client's requests. (Which results in the diagram shown in the Wikipedia article.)
How does the client know to which request a response applies?
Well, let's ignore that whole network delay stuff for a minute here and assume that pipelined request or response messages arrive at once but only after all of them have been sent.
The client sends its requests in a certain order (without waiting for responses in between requests).
The server receives the requests in the same order (TCP guarantees that) all at once.
The server takes the first request message, processes it, and stores the response in a queue.
The server takes the second request message, processes it, and stores the response in a queue.
(You get the idea...)
The server sends the contents of that queue to the client. The responses are stored in order so the response to the first request is at the beginning of that queue followed by the response to the second request and so on...
The client receives the responses in the same order (TCP guarantees that) and associates the first response with the first request it made and so on.
This still works even if we don't assume that we receive all the messages at once because TCP guarantees that the data that was sent is received in the same order.
We could also ignore the network completely and just look at the messages that are transferred between server and client.
Client -> Server
GET /request1.html HTTP/1.1
Host: example.com
...
GET /request2.html HTTP/1.1
Host: example.com
...
GET /request3.html HTTP/1.1
Host: example.com
...
Server -> Client
HTTP/1.1 200 OK
Content-Length: 234
...
HTTP/1.1 200 OK
Content-Length: 123
...
HTTP/1.1 200 OK
Content-Length: 345
...
The great thing about TCP is that this particular stream of messages always looks the same. You can send all of the requests first and then receive the responses; you can send request 1 first, receive the first response, send the remaining requests, and receive the remaining responses; you can send the first and part of the second request, receive part of the first response, send the remaining requests, receive the remaining responses; etc. Because TCP guarantees to keep the order of the transmitted messages, we can always associate the first request with the first response and so on.
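Here is a small sketch of the matching step on the client side: cut the incoming byte stream into responses using Content-Length and pair them with the requests by position. `pair_responses` is a made-up helper; it assumes the whole stream has already arrived and that every response carries a Content-Length header.

```python
def pair_responses(requests, stream: bytes):
    """Pair each request path with the body of the response at the same position."""
    pairs = []
    rest = stream
    for path in requests:
        head, _, rest = rest.partition(b"\r\n\r\n")   # headers end at the blank line
        length = 0
        for line in head.split(b"\r\n")[1:]:          # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body, rest = rest[:length], rest[length:]     # the next response starts right after
        pairs.append((path, body))
    return pairs
```

Nothing in the responses says which request they belong to; the pairing works purely because the order is preserved.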
I hope this answers your question...
I was asked to build a site, and one of the co-developers told me that I would need to include the keep-alive header.
Well, I read a lot about it and I still have questions.
msdn ->
The open connection improves performance when a client makes multiple
requests for Web page content, because the server can return the
content for each request more quickly. Otherwise, the server has to
open a new connection for every request
Looking at
When The IIS (F) sends keep alive header (or user sends keep-alive) , does it mean that (E,C,B) save a connection which is only for my session ?
Where does this info is kept ( "this connection belongs to "Royi") ?
Does it mean that no one else can use that connection
If so - does it mean that keep alive-header - reduce the number of overlapped connection users ?
if so , for how long does the connection is saved to me ? (in other words , if I set keep alive- "keep" till when?)
p.s. for those who interested :
clicking this sample page will return keep alive header
Where is this info kept ("this connection is between computer A and server F")?
A TCP connection is recognized by source IP and port and destination IP and port. Your OS, all intermediate session-aware devices and the server's OS will recognize the connection by this.
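You can see this identification from user space. A minimal sketch, assuming Python 3.8+ and a throwaway listener on the loopback interface (the function name is made up): two connections to the same server share the destination IP and port and differ only in the source port.

```python
import socket

def two_connection_ids():
    listener = socket.create_server(("127.0.0.1", 0))
    a = socket.create_connection(listener.getsockname())
    b = socket.create_connection(listener.getsockname())
    # (source IP, source port, destination IP, destination port), as seen by each client
    ids = [s.getsockname() + s.getpeername() for s in (a, b)]
    for s in (a, b, listener):
        s.close()
    return ids
```

The OS picks a different ephemeral source port for each connection, which is what keeps the two apart.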
HTTP works with request-response: client connects to server, performs a request and gets a response. Without keep-alive, the connection to an HTTP server is closed after each response. With HTTP keep-alive you keep the underlying TCP connection open until certain criteria are met.
This allows for multiple request-response pairs over a single TCP connection, eliminating some of TCP's relatively slow connection startup.
When The IIS (F) sends keep alive header (or user sends keep-alive) , does it mean that (E,C,B) save a connection
No. Routers don't need to remember sessions. In fact, multiple TCP packets belonging to the same TCP session need not all go through the same routers; that is for TCP to manage. Routers just choose the best IP path and forward packets. Keep-alive only matters to the client, the server and any intermediate session-aware devices.
which is only for my session ?
Does it mean that no one else can use that connection
That is the intention of TCP connections: it is an end-to-end connection intended for only those two parties.
If so - does it mean that keep alive-header - reduce the number of overlapped connection users ?
Define "overlapped connections". See HTTP persistent connection for some advantages and disadvantages, such as:
Lower CPU and memory usage (because fewer connections are open simultaneously).
Enables HTTP pipelining of requests and responses.
Reduced network congestion (fewer TCP connections).
Reduced latency in subsequent requests (no handshaking).
if so , for how long does the connection is saved to me ? (in other words , if I set keep alive- "keep" till when?)
A typical keep-alive response header looks like this:
Keep-Alive: timeout=15, max=100
See Hypertext Transfer Protocol (HTTP) Keep-Alive Header for example (a draft for HTTP/2 where the keep-alive header is explained in greater detail than in both RFC 2616 and RFC 2068):
A host sets the value of the timeout parameter to the time that the host will allow an idle connection to remain open before it is closed. A connection is idle if no data is sent or received by a host.
The max parameter indicates the maximum number of requests that a client will make, or that a server will allow to be made on the persistent connection. Once the specified number of requests and responses have been sent, the host that included the parameter could close the connection.
However, the server is free to close the connection after an arbitrary time or number of requests (just as long as it returns the response to the current request). How this is implemented depends on your HTTP server.
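If you want to read these hints on the client, parsing the header value is straightforward. `parse_keep_alive` is a hypothetical helper; it only keeps parameters with plain integer values, such as timeout and max.

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a Keep-Alive header value such as 'timeout=15, max=100'."""
    params = {}
    for part in value.split(","):
        name, sep, val = part.strip().partition("=")
        if sep and val.strip().isdigit():
            params[name.strip().lower()] = int(val)
    return params
```

Treat the result as a hint only: as said above, the server may still close the connection earlier.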