HTTP-Long Polling keep-alive and handshakes - http

I'm doing a test where I examine how much HTTP-long polling compared to Websockets is affecting the battery performance on my iPhone. Basically what I have is a Node.js with express server that sends out a random string every 0.5 or 10th second to the iPhone. I've inspected the messages in Chrome and I can see the keep-alive header is present. I know keep-alive is a default feature since HTTP/1.1. From what I've understood the TCP-connection will be held open and can be used for pipelining, and this is certainly the case when I'm sending out pings from the server every 0.5 seconds. But when I send out every 10 seconds, will the connection be closed during that time?
How do I know how long the connection is open? This seems to be a crucial part to have in mind when doing the tests.
Will the HTTP-handshake still be made when the TCP-connection is open?

AFAIK, in HTTP 1, the server cannot send a response back to the client if that client didn't send a request first. That might sound irrelevant to your question but bear with me.
The Connection: keep-alive header tells the client that it can reuse the connection if he want to, not that it must. The client can decide to close it any time, it all depends on the client library implementation and you don't have any guarantee.
The only way to force the client to not close the connection is to not finish the response. The only way to do that is to send a response with a Transfer-Encoding: chunked, and never send the final chunk (this has some serious caveats, like a buffer overrun on the client...).
So to answer your 2 points:
You can't, this low-level detail is totally hidden (for good reasons) from the client.
There is no HTTP handshake, there is a TCP handshake which is made when the client socket connects to the server socket. There is the TLS handshake which is made after the TCP connection and before any request is made. Once the connection is open, http requests are sent by the client and the server responds with resources.

Related

How is a TCP "Connection" maintained, and how does HTTP Keep-Alive affect it?

I'm an application developer looking to learn more about the transport layer of my requests that I've been making all these years. I've also been learning more of the backend and am building my own live data service with websockets, which has me curious about how data actually moves around.
As such I've learned about TCP, and I understand how it works, but there's still one term that confuses me-- a "TCP Connection". I have seen it everywhere, and actually there was a thread opened with the exact same question... but as the OP said in the comments, nobody actually answered the question:
TCP vs UDP - What is a TCP connection?
"when we say that there is a connection established between two hosts,
what does that mean? If I could get a magic microscope and inspect the
server or the client, and - a-ha! - find the connection, what would I
be looking at? Some variable allocated by the OS code? Some entry in
some kind of table? How and when does that gets there, and how and
when it is removed from there"
I've been reading to try to figure this out on my own,
Here is a nice resource that details HTTP flow, also mentions "TCP Connection"
https://blog.catchpoint.com/2010/09/17/anatomyhttp/
Here is another thread about HTTP Keep-alive, same "TCP Connection":
HTTP Keep Alive and TCP keep alive
My understanding:
When a client wants data from server, SYN/ACK handshake happens, this "connection" is established, and both parties agree on the starting sequence number, maximum packet size, etc.
as long as this "connection" is still open, client can request/receive data without doing another handshake. TCP Keep-alive sends a heartbeat to keep this "connection" open
1) Somehow a HTTP Header "Keep-alive" also keeps this TCP "connection" open, even though HTTP headers are part of the packet payload and it doesn't seem to make sense that the TCP layer would parse the HTTP headers?
To me it seems like a "connection" between two machines in the literal sense can never be closed, because a client is always free to hit a server with packets (like the first SYN packet, for example)
2) Is a TCP "connection" just the client and server saving the sequence number from the other's IP address? maybe it's just a flag that's saying "hey this client is cool, accept messages from them without a handshake"? So would closing a connection just be wiping that data out from memory?
... both parties agree on the starting sequence number
No, they don't "agree" one a number. Each direction has their own sequence numbering. So the client sends in the SYN to the server the initial sequence number (ISN) for the data from client to server, the server sends in its SYN the ISN for the data from server to client.
Somehow a HTTP Header "Keep-alive" also keeps this TCP "connection" open ...
Not really. With HTTP keep-alive the client just asks a server nicely to not close the connection after the HTTP response was sent so that another HTTP request can be sent using the same TCP connection. The server might decide to follow the clients wish or not.
To me it seems like a "connection" between two machines in the literal sense can never be closed,
Each side can send a packet with a FIN flag to signal that it will no longer send any data. If both sides has send the FIN the the connection is considered close since no one will send anything and thus nothing can be received. If one side decides that it does not want to receive any more data it can send a packet with a RST flag.
Is a TCP "connection" just the client and server saving the sequence number from the other's IP address?
Kind of. Each side saves the current state of the connection, i.e. IP's and ports involved, currently expected sequence number for receiving, current sequence number for sending, outstanding bytes which were not ACKed yet ... If no such state is there (for example one site crashed) then there is no connection.
... maybe it's just a flag that's saying "hey this client is cool, accept messages from them without a handshake"
If a packet got received which fits an existing state then it is considered part of the connection, i.e. it will be processed and the state will be updated.
So would closing a connection just be wiping that data out from memory?
Closing is telling the other that no more data will be send (using FIN) and if both side have done it both can basically remove the state and then there is no connection anymore.

How HTTP client detect web server crash

From HTTP:The definitive guide :
But without Content-Length, clients cannot distinguish between
successful connection close at the end of a message and connection
close due to a server crash in the middle of a message.
Let's assume that for this purpose the "server crash" means crash of the server's HW or OS without closing the TCP connection or possibly link being broken.
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
Since the closing will be done by the kernel that would mean, that the whole system crashed or that the connection broke somewhere else (router crashed, power blackout at server side or similar).
You can only detect this if you sent data to the server and don't get any useful response back.
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
HTTP uses TCP as the underlying protocol, so if TCP detects a connection close HTTP will too. Additionally HTTP can detect in most cases if the response is complete, by using information from Content-length header or similar information with chunked transfer encoding. In the few cases where the end of response is only indicated by a connection close HTTP can only rely on TCP do detect problems. So far the theory, but in practice most browsers simply ignore an incomplete response and show as much as they got.

Keep-alive header clarification

I was asked to build a site , and one of the co-developer told me That I would need to include the keep-alive header.
Well I read alot about it and still I have questions.
msdn ->
The open connection improves performance when a client makes multiple
requests for Web page content, because the server can return the
content for each request more quickly. Otherwise, the server has to
open a new connection for every request
Looking at
When The IIS (F) sends keep alive header (or user sends keep-alive) , does it mean that (E,C,B) save a connection which is only for my session ?
Where does this info is kept ( "this connection belongs to "Royi") ?
Does it mean that no one else can use that connection
If so - does it mean that keep alive-header - reduce the number of overlapped connection users ?
if so , for how long does the connection is saved to me ? (in other words , if I set keep alive- "keep" till when?)
p.s. for those who interested :
clicking this sample page will return keep alive header
Where is this info kept ("this connection is between computer A and server F")?
A TCP connection is recognized by source IP and port and destination IP and port. Your OS, all intermediate session-aware devices and the server's OS will recognize the connection by this.
HTTP works with request-response: client connects to server, performs a request and gets a response. Without keep-alive, the connection to an HTTP server is closed after each response. With HTTP keep-alive you keep the underlying TCP connection open until certain criteria are met.
This allows for multiple request-response pairs over a single TCP connection, eliminating some of TCP's relatively slow connection startup.
When The IIS (F) sends keep alive header (or user sends keep-alive) , does it mean that (E,C,B) save a connection
No. Routers don't need to remember sessions. In fact, multiple TCP packets belonging to same TCP session need not all go through same routers - that is for TCP to manage. Routers just choose the best IP path and forward packets. Keep-alive is only for client, server and any other intermediate session-aware devices.
which is only for my session ?
Does it mean that no one else can use that connection
That is the intention of TCP connections: it is an end-to-end connection intended for only those two parties.
If so - does it mean that keep alive-header - reduce the number of overlapped connection users ?
Define "overlapped connections". See HTTP persistent connection for some advantages and disadvantages, such as:
Lower CPU and memory usage (because fewer connections are open simultaneously).
Enables HTTP pipelining of requests and responses.
Reduced network congestion (fewer TCP connections).
Reduced latency in subsequent requests (no handshaking).
if so , for how long does the connection is saved to me ? (in other words , if I set keep alive- "keep" till when?)
An typical keep-alive response looks like this:
Keep-Alive: timeout=15, max=100
See Hypertext Transfer Protocol (HTTP) Keep-Alive Header for example (a draft for HTTP/2 where the keep-alive header is explained in greater detail than both 2616 and 2086):
A host sets the value of the timeout parameter to the time that the host will allows an idle connection to remain open before it is closed. A connection is idle if no data is sent or received by a host.
The max parameter indicates the maximum number of requests that a client will make, or that a server will allow to be made on the persistent connection. Once the specified number of requests and responses have been sent, the host that included the parameter could close the connection.
However, the server is free to close the connection after an arbitrary time or number of requests (just as long as it returns the response to the current request). How this is implemented depends on your HTTP server.

Figure out the point (latest sent byte) after TCP disconnection

I am wondering if it is possible to figure out the last byte that a server has sent to a client using TCP connection. To put it in details, I have a client and a server, both in C++. They are communicating using XMLRPC and the connection is TCP. The client can send a big request to the server and it might take some time for the server to reply, due to some calculations. In any part of the connection, if it gets disconnected, the entire process should be done from the scratch, which causes the server vulnerable to DoS attack.
My question is if I can figure out where the connection was disconnected so that after reestablishing the connection (for the same client using some Identifications), the server can send the remaining bytes from the previous request instead of processing request again.
You should code that support into your protocol. For example, break responses into 4096 byte chunks; then the client can reconnect and say: "I received the first 19 blocks, continue with block 20 please!"

Use HTTP Keep-Alive for server to communicate to client

Recently in an interview I was asked how I would approach an online chat client application. I went through the standard "polling" solution but was cut off because the interviewer was looking for the "HTTP 1.1 keep-alive" method. Having used HTTP for quite a while and remembering that the whole point was to be "stateless", this never occurred to me (also, not to mention that the keep-alive is not consistently implemented).
My question is, is it possible for a web server to broadcast and/or send information to a client when the "keep-alive" header has been set?
With HTTP 1.1, keep-alive is the default behavior. (In HTTP 1.0, the default behavior was to close the connection.) The server must send the 'Connection: close" header to terminate the connection with the first response. So there is still a TCP socket available to push data through, but just pushing data from the server would violate the HTTP protocol in a major way. Even using keep-alive, the client would still have to poll the server.
It is important to distinguish between HTTP Keepalive and TCP Keepalive. HTTP keepalive prevents the connection from being closed by the server or client. TCP keepalive is used when the connection might be idle for an extended period of time and might be dropped by a NAT proxy or firewall. TCP keepalive is activated on a per-socket basis by setsockopt() calls.
When doing a 'long poll' to eliminate the need to re-poll, TCP keepalive might be needed.
Keep-alive simply holds a TCP socket open, so each time you poll, you save the overhead of the TCP setup/teardown packets--but you still have to poll.
However, "long polling" is a strategy for the web server to broadcast notifications to the client. Essentially, the client issues a GET request, but instead of immediately responding, the web server waits until they have a notification to send, at which point they respond to the GET request. This eliminates any need for packets to go across the wire for polling purposes, and keeps the connection stateless, which as you correctly mention is one of the purposes of the protocol.
You might read more about Comet servers. That sounds basically like the approach that the interviewer was asking about. Their effectiveness is disputed by some, but it has been used in several similar situations.
For example, I believe gmail uses comet technologies for some things (but don't quote me on it).
Another example that seems relevant is BOSH, which is a protocol for transmitting chat information using HTTP and XMPP. But I don't believe that using keep-alive is involved in that.

Resources