Keep-alive header clarification - http

I was asked to build a site , and one of the co-developer told me That I would need to include the keep-alive header.
Well I read alot about it and still I have questions.
msdn ->
The open connection improves performance when a client makes multiple
requests for Web page content, because the server can return the
content for each request more quickly. Otherwise, the server has to
open a new connection for every request
Looking at
When The IIS (F) sends keep alive header (or user sends keep-alive) , does it mean that (E,C,B) save a connection which is only for my session ?
Where does this info is kept ( "this connection belongs to "Royi") ?
Does it mean that no one else can use that connection
If so - does it mean that keep alive-header - reduce the number of overlapped connection users ?
if so , for how long does the connection is saved to me ? (in other words , if I set keep alive- "keep" till when?)
p.s. for those who interested :
clicking this sample page will return keep alive header

Where is this info kept ("this connection is between computer A and server F")?
A TCP connection is recognized by source IP and port and destination IP and port. Your OS, all intermediate session-aware devices and the server's OS will recognize the connection by this.
HTTP works with request-response: client connects to server, performs a request and gets a response. Without keep-alive, the connection to an HTTP server is closed after each response. With HTTP keep-alive you keep the underlying TCP connection open until certain criteria are met.
This allows for multiple request-response pairs over a single TCP connection, eliminating some of TCP's relatively slow connection startup.
When The IIS (F) sends keep alive header (or user sends keep-alive) , does it mean that (E,C,B) save a connection
No. Routers don't need to remember sessions. In fact, multiple TCP packets belonging to same TCP session need not all go through same routers - that is for TCP to manage. Routers just choose the best IP path and forward packets. Keep-alive is only for client, server and any other intermediate session-aware devices.
which is only for my session ?
Does it mean that no one else can use that connection
That is the intention of TCP connections: it is an end-to-end connection intended for only those two parties.
If so - does it mean that keep alive-header - reduce the number of overlapped connection users ?
Define "overlapped connections". See HTTP persistent connection for some advantages and disadvantages, such as:
Lower CPU and memory usage (because fewer connections are open simultaneously).
Enables HTTP pipelining of requests and responses.
Reduced network congestion (fewer TCP connections).
Reduced latency in subsequent requests (no handshaking).
if so , for how long does the connection is saved to me ? (in other words , if I set keep alive- "keep" till when?)
An typical keep-alive response looks like this:
Keep-Alive: timeout=15, max=100
See Hypertext Transfer Protocol (HTTP) Keep-Alive Header for example (a draft for HTTP/2 where the keep-alive header is explained in greater detail than both 2616 and 2086):
A host sets the value of the timeout parameter to the time that the host will allows an idle connection to remain open before it is closed. A connection is idle if no data is sent or received by a host.
The max parameter indicates the maximum number of requests that a client will make, or that a server will allow to be made on the persistent connection. Once the specified number of requests and responses have been sent, the host that included the parameter could close the connection.
However, the server is free to close the connection after an arbitrary time or number of requests (just as long as it returns the response to the current request). How this is implemented depends on your HTTP server.

Related

How is a TCP "Connection" maintained, and how does HTTP Keep-Alive affect it?

I'm an application developer looking to learn more about the transport layer of my requests that I've been making all these years. I've also been learning more of the backend and am building my own live data service with websockets, which has me curious about how data actually moves around.
As such I've learned about TCP, and I understand how it works, but there's still one term that confuses me-- a "TCP Connection". I have seen it everywhere, and actually there was a thread opened with the exact same question... but as the OP said in the comments, nobody actually answered the question:
TCP vs UDP - What is a TCP connection?
"when we say that there is a connection established between two hosts,
what does that mean? If I could get a magic microscope and inspect the
server or the client, and - a-ha! - find the connection, what would I
be looking at? Some variable allocated by the OS code? Some entry in
some kind of table? How and when does that gets there, and how and
when it is removed from there"
I've been reading to try to figure this out on my own,
Here is a nice resource that details HTTP flow, also mentions "TCP Connection"
https://blog.catchpoint.com/2010/09/17/anatomyhttp/
Here is another thread about HTTP Keep-alive, same "TCP Connection":
HTTP Keep Alive and TCP keep alive
My understanding:
When a client wants data from server, SYN/ACK handshake happens, this "connection" is established, and both parties agree on the starting sequence number, maximum packet size, etc.
as long as this "connection" is still open, client can request/receive data without doing another handshake. TCP Keep-alive sends a heartbeat to keep this "connection" open
1) Somehow a HTTP Header "Keep-alive" also keeps this TCP "connection" open, even though HTTP headers are part of the packet payload and it doesn't seem to make sense that the TCP layer would parse the HTTP headers?
To me it seems like a "connection" between two machines in the literal sense can never be closed, because a client is always free to hit a server with packets (like the first SYN packet, for example)
2) Is a TCP "connection" just the client and server saving the sequence number from the other's IP address? maybe it's just a flag that's saying "hey this client is cool, accept messages from them without a handshake"? So would closing a connection just be wiping that data out from memory?
... both parties agree on the starting sequence number
No, they don't "agree" one a number. Each direction has their own sequence numbering. So the client sends in the SYN to the server the initial sequence number (ISN) for the data from client to server, the server sends in its SYN the ISN for the data from server to client.
Somehow a HTTP Header "Keep-alive" also keeps this TCP "connection" open ...
Not really. With HTTP keep-alive the client just asks a server nicely to not close the connection after the HTTP response was sent so that another HTTP request can be sent using the same TCP connection. The server might decide to follow the clients wish or not.
To me it seems like a "connection" between two machines in the literal sense can never be closed,
Each side can send a packet with a FIN flag to signal that it will no longer send any data. If both sides has send the FIN the the connection is considered close since no one will send anything and thus nothing can be received. If one side decides that it does not want to receive any more data it can send a packet with a RST flag.
Is a TCP "connection" just the client and server saving the sequence number from the other's IP address?
Kind of. Each side saves the current state of the connection, i.e. IP's and ports involved, currently expected sequence number for receiving, current sequence number for sending, outstanding bytes which were not ACKed yet ... If no such state is there (for example one site crashed) then there is no connection.
... maybe it's just a flag that's saying "hey this client is cool, accept messages from them without a handshake"
If a packet got received which fits an existing state then it is considered part of the connection, i.e. it will be processed and the state will be updated.
So would closing a connection just be wiping that data out from memory?
Closing is telling the other that no more data will be send (using FIN) and if both side have done it both can basically remove the state and then there is no connection anymore.

HTTP-Long Polling keep-alive and handshakes

I'm doing a test where I examine how much HTTP-long polling compared to Websockets is affecting the battery performance on my iPhone. Basically what I have is a Node.js with express server that sends out a random string every 0.5 or 10th second to the iPhone. I've inspected the messages in Chrome and I can see the keep-alive header is present. I know keep-alive is a default feature since HTTP/1.1. From what I've understood the TCP-connection will be held open and can be used for pipelining, and this is certainly the case when I'm sending out pings from the server every 0.5 seconds. But when I send out every 10 seconds, will the connection be closed during that time?
How do I know how long the connection is open? This seems to be a crucial part to have in mind when doing the tests.
Will the HTTP-handshake still be made when the TCP-connection is open?
AFAIK, in HTTP 1, the server cannot send a response back to the client if that client didn't send a request first. That might sound irrelevant to your question but bear with me.
The Connection: keep-alive header tells the client that it can reuse the connection if he want to, not that it must. The client can decide to close it any time, it all depends on the client library implementation and you don't have any guarantee.
The only way to force the client to not close the connection is to not finish the response. The only way to do that is to send a response with a Transfer-Encoding: chunked, and never send the final chunk (this has some serious caveats, like a buffer overrun on the client...).
So to answer your 2 points:
You can't, this low-level detail is totally hidden (for good reasons) from the client.
There is no HTTP handshake, there is a TCP handshake which is made when the client socket connects to the server socket. There is the TLS handshake which is made after the TCP connection and before any request is made. Once the connection is open, http requests are sent by the client and the server responds with resources.

How HTTP client detect web server crash

From HTTP:The definitive guide :
But without Content-Length, clients cannot distinguish between
successful connection close at the end of a message and connection
close due to a server crash in the middle of a message.
Let's assume that for this purpose the "server crash" means crash of the server's HW or OS without closing the TCP connection or possibly link being broken.
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
Since the closing will be done by the kernel that would mean, that the whole system crashed or that the connection broke somewhere else (router crashed, power blackout at server side or similar).
You can only detect this if you sent data to the server and don't get any useful response back.
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
HTTP uses TCP as the underlying protocol, so if TCP detects a connection close HTTP will too. Additionally HTTP can detect in most cases if the response is complete, by using information from Content-length header or similar information with chunked transfer encoding. In the few cases where the end of response is only indicated by a connection close HTTP can only rely on TCP do detect problems. So far the theory, but in practice most browsers simply ignore an incomplete response and show as much as they got.

Relation between HTTP Keep Alive duration and TCP timeout duration

I am trying to understand the relation between TCP/IP and HTTP timeout values. Are these two timeout values different or same? Most Web servers allow users to set the HTTP Keep Alive timeout value through some configuration. How is this value used by the Web servers? is this value just set on the underlying TCP/IP socket i.e is the HTTP Keep Alive timeout and TCP/IP Keep Alive Timeout same? or are they treated differently?
My understanding is (maybe incorrect):
The Web server uses the default timeout on the underlying TCP socket (i.e. indefinite) regardless of the configured HTTP Keep Alive timeout and creates a Worker thread that counts down the specified HTTP timeout interval. When the Worker thread hits zero, it closes the connection.
EDIT:
My question is about the relation or difference between the two timeout durations i.e. what will happen when HTTP keep-alive timeout duration and the timeout on the Socket (SO_TIMEOUT) which the Web server uses is different? should I even worry about these two being same or not?
An open TCP socket does not require any communication whatsoever between the two parties (let's call them Alice and Bob) unless actual data is being sent. If Alice has received acknowledgments for all the data she's sent to Bob, there's no way she can distinguish among the following cases:
Bob has been unplugged, or is otherwise inaccessible to Alice.
Bob has been rebooted, or otherwise forgotten about the open TCP socket he'd established with Alice.
Bob is connected to Alice, and knows he has an open connection, but doesn't have anything he wants to say.
If Alice hasn't heard from Bob in awhile and wants to distinguish among the above conditions, she can resend her last byte of data, wrapped in a suitable TCP frame to be recognizable as a retransmission, essentially pretending she hasn't heard the acknowledgment. If Bob is unplugged, she'll hear nothing back, even if she repeatedly sends the packet over a period of many seconds. If Bob has rebooted or forgotten the connection, he will immediately respond saying the connection is invalid. If Bob is happy with the connection and simply has nothing to say, he'll respond with an acknowledgment of the retransmission.
The Timeout indicates how long Alice is willing to wait for a response when she sends a packet which demands a reply. The Keepalive time indicates how much time she should allow to lapse before she retransmits her last bit of data and demands an acknowledgment. If Bob goes missing, the sum of the Keepalive and Timeout values will indicate the worst-case time between Alice receiving her last bit of data and her deciding that Bob is dead.
They're two separate mechanisms; the name is a coincidence.
HTTP keep-alive (also known as persistent connections) is keeping the TCP socket open so that another request can be made without setting up a new connection.
TCP keep-alive is a periodic check to make sure that the connection is still up and functioning. It's often used to assure that a NAT box (e.g., a DSL router) doesn't "forget" the mapping between an internal and external ip/port.
KeepAliveTimeout Directive
Description: Amount of time the server will wait for subsequent
requests on a persistent connection Syntax: KeepAliveTimeout seconds
Default: KeepAliveTimeout 15 Context: server config, virtual host
Status: Core Module: core The number of seconds Apache will wait for a
subsequent request before closing the connection. Once a request has
been received, the timeout value specified by the Timeout directive
applies.
Setting KeepAliveTimeout to a high value may cause performance
problems in heavily loaded servers. The higher the timeout, the more
server processes will be kept occupied waiting on connections with
idle clients.
In a name-based virtual host context, the value of the first defined
virtual host (the default host) in a set of NameVirtualHost will be
used. The other values will be ignored.
TimeOut Directive
Description: Amount of time the server will wait for certain events
before failing a request Syntax: TimeOut seconds Default: TimeOut 300
Context: server config, virtual host Status: Core Module: core The
TimeOut directive currently defines the amount of time Apache will
wait for three things:
The total amount of time it takes to receive a GET request. The amount
of time between receipt of TCP packets on a POST or PUT request. The
amount of time between ACKs on transmissions of TCP packets in
responses. We plan on making these separately configurable at some
point down the road. The timer used to default to 1200 before 1.2, but
has been lowered to 300 which is still far more than necessary in most
situations. It is not set any lower by default because there may still
be odd places in the code where the timer is not reset when a packet
is sent.

Use HTTP Keep-Alive for server to communicate to client

Recently in an interview I was asked how I would approach an online chat client application. I went through the standard "polling" solution but was cut off because the interviewer was looking for the "HTTP 1.1 keep-alive" method. Having used HTTP for quite a while and remembering that the whole point was to be "stateless", this never occurred to me (also, not to mention that the keep-alive is not consistently implemented).
My question is, is it possible for a web server to broadcast and/or send information to a client when the "keep-alive" header has been set?
With HTTP 1.1, keep-alive is the default behavior. (In HTTP 1.0, the default behavior was to close the connection.) The server must send the 'Connection: close" header to terminate the connection with the first response. So there is still a TCP socket available to push data through, but just pushing data from the server would violate the HTTP protocol in a major way. Even using keep-alive, the client would still have to poll the server.
It is important to distinguish between HTTP Keepalive and TCP Keepalive. HTTP keepalive prevents the connection from being closed by the server or client. TCP keepalive is used when the connection might be idle for an extended period of time and might be dropped by a NAT proxy or firewall. TCP keepalive is activated on a per-socket basis by setsockopt() calls.
When doing a 'long poll' to eliminate the need to re-poll, TCP keepalive might be needed.
Keep-alive simply holds a TCP socket open, so each time you poll, you save the overhead of the TCP setup/teardown packets--but you still have to poll.
However, "long polling" is a strategy for the web server to broadcast notifications to the client. Essentially, the client issues a GET request, but instead of immediately responding, the web server waits until they have a notification to send, at which point they respond to the GET request. This eliminates any need for packets to go across the wire for polling purposes, and keeps the connection stateless, which as you correctly mention is one of the purposes of the protocol.
You might read more about Comet servers. That sounds basically like the approach that the interviewer was asking about. Their effectiveness is disputed by some, but it has been used in several similar situations.
For example, I believe gmail uses comet technologies for some things (but don't quote me on it).
Another example that seems relevant is BOSH, which is a protocol for transmitting chat information using HTTP and XMPP. But I don't believe that using keep-alive is involved in that.

Resources