Reusing Client TCP Connection for HTTP - http

Why can't I reuse an HTTP client socket in a C web client, even though I never call close(http_socket_fd)? The first write/read on the socket file descriptor works perfectly well, but every successive read returns zero (no error is reported).
Basically, I don't want to keep recreating new client connection sockets to the same host for successive requests. Is it not possible in C to reuse an open client socket (which already has HTTP keep-alive enabled) for successive reads/writes? It seems to be possible with Java: http://www.mail-archive.com/httpclient-dev#jakarta.apache.org/msg04687.html
Example: (PSEUDO_CODE)
MANY_DOMAINS = 30000;
// initial connection: one TCP socket per domain
for (EACH_DOMAIN) {
    http_socket_to_domain_x = open(NEW_TCP_SOCKET_PER_DOMAIN);
    get(http_socket_to_domain_x, initial_url_path);
    read(http_socket_to_domain_x, initial_http_response);
}
// successive requests - NO RECREATING TCP SOCKET!
for (EACH_DOMAIN) {
    for (EACH_URL_PER_DOMAIN) {
        get(http_socket_to_domain_x, another_url_path);
        read(http_socket_to_domain_x, another_http_response);
    }
    // finally
    close(http_socket_to_domain_x);
}

You want to read up on persistent HTTP/1.1 connections.

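A read() that returns zero means the peer has closed the connection, so the server is probably closing it after the first response. Are you sending an HTTP/1.1 request (or, with HTTP/1.0, a Connection: keep-alive header)? See http://www.io.com/~maus/HttpKeepAlive.html for a bit more.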
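As a rough illustration of the advice above, here is a minimal sketch in C of the two pieces a client needs before it can reuse a socket: a request that asks for a persistent connection, and a check of the response headers to see whether the server intends to keep the connection open. The function names build_get_request and connection_reusable are hypothetical, and the header matching is deliberately simplified (it assumes a single space after the colon).

```c
#include <stdio.h>
#include <ctype.h>

/* Format a GET request that asks the server to keep the connection open.
 * Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t cap, const char *host, const char *path)
{
    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: keep-alive\r\n"
                     "\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Case-insensitive "does s begin with prefix?" */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Case-insensitive substring test. */
static int contains_ci(const char *hay, const char *needle)
{
    for (; *hay; hay++)
        if (starts_with_ci(hay, needle))
            return 1;
    return 0;
}

/* Given the response header block (status line plus headers), return 1 if
 * the server is expected to keep the connection open, 0 otherwise.
 * Simplified: assumes "Connection: value" with exactly one space. */
int connection_reusable(const char *headers)
{
    if (contains_ci(headers, "\r\nConnection: close"))
        return 0;
    if (starts_with_ci(headers, "HTTP/1.1"))
        return 1;   /* persistent by default in HTTP/1.1 */
    return contains_ci(headers, "\r\nConnection: keep-alive");
}
```

Even when the connection is reusable, the previous response body has to be read completely (as delimited by Content-Length or chunked encoding) before the next request is written to the same socket. If connection_reusable returns 0, or a read returns zero, the client has to open a new socket.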

Related

Netty and TCP - How to properly send an empty message

We have a simple TCP server behind an AWS Network ELB (similar to an echo server with long-lived connections) written in Netty, and I'm trying to implement a keep-alive mechanism similar to the TCP keep-alive mechanism to keep our idle connections open. Unfortunately, we cannot rely on TCP keep-alive itself, since NELBs do not forward keep-alive TCP packets to the other side of the load balancer.
What I'm thinking of doing is to watch for idle connections and send an empty string (empty byte array) to clients. What I have done so far in the code is:
Add an IdleStateHandler with some timeout values
Register a GprsKeepAliveHandler, a subclass of ChannelDuplexHandler, overriding the userEventTriggered method to send (ctx.writeAndFlush) the Unpooled.EMPTY_BUFFER.
This way, I expect to receive a RST packet if the connection is gone. Otherwise the connection will become active again.
The problem is that Netty does not do anything with the empty message: it does not send any packets to the client (monitored with Wireshark). If I change the message to Unpooled.wrappedBuffer(new byte[]{0}), I see what I expect to see.
Questions
I couldn't find a better way to achieve my objective (keep connections alive and detect dead connections). If there's a better way please let me know.
What is the proper way to send an empty message in Netty? (I saw this question but it didn't help)
If the issue is because of OS TCP stack behavior, is there a way to solve this problem?
From my perspective, you need to send something meaningful, because what you are trying to build is application-level heartbeating (e.g. ping/pong). Also see Is it is possible to force TCP socket to send 0 bytes in case of packet losses - python
It seems that Netty does not make any syscall in the case of empty messages. (see this)
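The ping/pong idea can be sketched independently of Netty. The following C fragment is only an illustration of the logic, not of any Netty API: the one-byte frame format, the type codes and the function names are all assumptions made for the example. The point is that a heartbeat carries at least one byte of payload, so the TCP stack actually has something to transmit.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical frame type codes for an application-level heartbeat. */
enum { FRAME_PING = 0x01, FRAME_PONG = 0x02 };

/* Write a one-byte heartbeat frame into buf.
 * Returns the number of bytes written, or 0 if there is no room. */
size_t encode_heartbeat(uint8_t *buf, size_t cap, uint8_t type)
{
    if (cap < 1)
        return 0;
    buf[0] = type;
    return 1;
}

/* True when nothing was sent or received for at least idle_limit seconds,
 * i.e. it is time to send a ping. */
int heartbeat_due(time_t last_activity, time_t now, time_t idle_limit)
{
    return now - last_activity >= idle_limit;
}

/* True when the most recent ping has gone unanswered for longer than
 * pong_timeout seconds, so the connection should be treated as dead. */
int peer_presumed_dead(time_t ping_sent, time_t last_pong,
                       time_t now, time_t pong_timeout)
{
    return last_pong < ping_sent && now - ping_sent > pong_timeout;
}
```

In a Netty server the idle check would normally be left to IdleStateHandler, and only the frame encoding and the "no pong received" decision would live in the custom handler.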

HTTP-Long Polling keep-alive and handshakes

I'm running a test to examine how much HTTP long polling, compared to WebSockets, affects battery performance on my iPhone. Basically, what I have is a Node.js server with Express that sends out a random string to the iPhone every 0.5 or every 10 seconds. I've inspected the messages in Chrome and I can see the keep-alive header is present. I know keep-alive has been the default since HTTP/1.1. From what I've understood, the TCP connection will be held open and can be used for pipelining, and this is certainly the case when I'm sending out pings from the server every 0.5 seconds. But when I send one every 10 seconds, will the connection be closed during that time?
How do I know how long the connection is open? This seems to be a crucial part to have in mind when doing the tests.
Will the HTTP-handshake still be made when the TCP-connection is open?
AFAIK, in HTTP 1, the server cannot send a response back to the client if that client didn't send a request first. That might sound irrelevant to your question but bear with me.
The Connection: keep-alive header tells the client that it can reuse the connection if it wants to, not that it must. The client can decide to close it at any time; it all depends on the client library implementation, and you don't have any guarantee.
The only way to force the client to not close the connection is to not finish the response. The only way to do that is to send a response with a Transfer-Encoding: chunked, and never send the final chunk (this has some serious caveats, like a buffer overrun on the client...).
So to answer your 2 points:
You can't, this low-level detail is totally hidden (for good reasons) from the client.
There is no HTTP handshake. There is a TCP handshake, which is made when the client socket connects to the server socket, and, when TLS is used, a TLS handshake, which is made after the TCP connection is established and before any request is sent. Once the connection is open, HTTP requests are sent by the client and the server responds with resources.

How HTTP client detect web server crash

From HTTP: The Definitive Guide:
But without Content-Length, clients cannot distinguish between
successful connection close at the end of a message and connection
close due to a server crash in the middle of a message.
Let's assume that, for this purpose, "server crash" means a crash of the server's hardware or OS without the TCP connection being closed, or possibly a broken link.
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
Since the kernel closes the connection when a process dies, a connection that is not closed means that the whole system crashed or that the connection broke somewhere else (router crashed, power blackout at the server side, or similar).
You can only detect this by sending data to the server and not getting any useful response back.
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
HTTP uses TCP as the underlying protocol, so if TCP detects a connection close, HTTP will too. Additionally, HTTP can detect in most cases whether the response is complete, by using the Content-Length header or the equivalent framing information in chunked transfer encoding. In the few cases where the end of the response is indicated only by a connection close, HTTP can only rely on TCP to detect problems. So much for the theory; in practice, most browsers simply ignore an incomplete response and show as much as they got.
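The distinction drawn above can be written down as a small decision function. This is a sketch in C under stated assumptions: the enum and the name classify_body are made up for the example, a negative content_length stands for "header absent", and chunked encoding is left out.

```c
#include <stddef.h>

enum body_status {
    BODY_INCOMPLETE,  /* keep reading */
    BODY_COMPLETE,    /* all announced bytes have arrived */
    BODY_TRUNCATED,   /* peer closed before Content-Length was reached */
    BODY_UNKNOWN      /* peer closed, no length given: complete or not? */
};

/* content_length < 0 means the response carried no Content-Length header.
 * received is the number of body bytes read so far; peer_closed is nonzero
 * once read()/recv() has returned 0. */
enum body_status classify_body(long content_length, long received,
                               int peer_closed)
{
    if (content_length >= 0) {
        if (received >= content_length)
            return BODY_COMPLETE;
        return peer_closed ? BODY_TRUNCATED : BODY_INCOMPLETE;
    }
    /* Without a length, connection close is the only end marker, so a
     * normal end and a mid-message failure look the same to the client. */
    return peer_closed ? BODY_UNKNOWN : BODY_INCOMPLETE;
}
```

Note that when the server machine dies without sending a FIN, peer_closed never becomes true, so the result stays BODY_INCOMPLETE forever. A client therefore also needs a read timeout of its own (for example via the SO_RCVTIMEO socket option or poll() with a timeout) to give up on such a connection.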

Keep-alive header clarification

I was asked to build a site, and one of my co-developers told me that I would need to include the keep-alive header.
Well, I have read a lot about it and I still have questions.
MSDN:
The open connection improves performance when a client makes multiple
requests for Web page content, because the server can return the
content for each request more quickly. Otherwise, the server has to
open a new connection for every request
Looking at this diagram (image not available):
When IIS (F) sends the keep-alive header (or the user sends keep-alive), does it mean that (E, C, B) save a connection which is only for my session?
Where is this info kept ("this connection belongs to Royi")?
Does it mean that no one else can use that connection?
If so, does it mean that the keep-alive header reduces the number of overlapped connection users?
If so, for how long is the connection saved for me? (In other words, if I set keep-alive, "keep" until when?)
P.S. for those who are interested:
clicking this sample page will return the keep-alive header
Where is this info kept ("this connection is between computer A and server F")?
A TCP connection is recognized by source IP and port and destination IP and port. Your OS, all intermediate session-aware devices and the server's OS will recognize the connection by this.
HTTP works with request-response: client connects to server, performs a request and gets a response. Without keep-alive, the connection to an HTTP server is closed after each response. With HTTP keep-alive you keep the underlying TCP connection open until certain criteria are met.
This allows for multiple request-response pairs over a single TCP connection, eliminating some of TCP's relatively slow connection startup.
When IIS (F) sends the keep-alive header (or the user sends keep-alive), does it mean that (E, C, B) save a connection
No. Routers don't need to remember sessions. In fact, multiple TCP packets belonging to same TCP session need not all go through same routers - that is for TCP to manage. Routers just choose the best IP path and forward packets. Keep-alive is only for client, server and any other intermediate session-aware devices.
which is only for my session ?
Does it mean that no one else can use that connection
That is the intention of TCP connections: it is an end-to-end connection intended for only those two parties.
If so, does it mean that the keep-alive header reduces the number of overlapped connection users?
Define "overlapped connections". See HTTP persistent connection for some advantages and disadvantages, such as:
Lower CPU and memory usage (because fewer connections are open simultaneously).
Enables HTTP pipelining of requests and responses.
Reduced network congestion (fewer TCP connections).
Reduced latency in subsequent requests (no handshaking).
If so, for how long is the connection saved for me? (In other words, if I set keep-alive, "keep" until when?)
A typical keep-alive response looks like this:
Keep-Alive: timeout=15, max=100
See Hypertext Transfer Protocol (HTTP) Keep-Alive Header for example (a draft for HTTP/2 where the keep-alive header is explained in greater detail than in both RFC 2616 and RFC 2068):
A host sets the value of the timeout parameter to the time that the host will allow an idle connection to remain open before it is closed. A connection is idle if no data is sent or received by a host.
The max parameter indicates the maximum number of requests that a client will make, or that a server will allow to be made on the persistent connection. Once the specified number of requests and responses have been sent, the host that included the parameter could close the connection.
However, the server is free to close the connection after an arbitrary time or number of requests (just as long as it returns the response to the current request). How this is implemented depends on your HTTP server.
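To make the two parameters concrete, here is a small sketch in C that extracts them from a header value such as the one shown above. The struct and the function name parse_keep_alive are hypothetical, and the parsing is simplified: it expects lowercase parameter names and does not validate the rest of the value.

```c
#include <stdlib.h>
#include <string.h>

/* Parsed Keep-Alive parameters; -1 means the parameter was not given. */
struct keep_alive {
    long timeout;  /* seconds an idle connection may stay open */
    long max;      /* number of requests allowed on the connection */
};

/* Parse a header value such as "timeout=15, max=100". */
struct keep_alive parse_keep_alive(const char *value)
{
    struct keep_alive ka = { -1, -1 };
    const char *p;

    if ((p = strstr(value, "timeout=")) != NULL)
        ka.timeout = strtol(p + strlen("timeout="), NULL, 10);
    if ((p = strstr(value, "max=")) != NULL)
        ka.max = strtol(p + strlen("max="), NULL, 10);
    return ka;
}
```

For the example header, parse_keep_alive("timeout=15, max=100") yields timeout 15 and max 100. A client should treat both values as hints only, since the server may still close the connection earlier.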

Figure out the point (latest sent byte) after TCP disconnection

I am wondering if it is possible to figure out the last byte that a server has sent to a client over a TCP connection. In detail: I have a client and a server, both in C++. They communicate using XMLRPC over TCP. The client can send a big request to the server, and it might take some time for the server to reply because of the calculations involved. If the connection is lost at any point, the entire process has to be redone from scratch, which makes the server vulnerable to DoS attacks.
My question is if I can figure out where the connection was disconnected so that after reestablishing the connection (for the same client using some Identifications), the server can send the remaining bytes from the previous request instead of processing request again.
You should build that support into your protocol. For example, break responses into 4096-byte chunks; then the client can reconnect and say: "I received the first 19 blocks, continue with block 20 please!"
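The block arithmetic behind that suggestion is simple enough to sketch in C. The names BLOCK_SIZE, block_count and block_slice are assumptions for this example, block indices are 0-based (so "block 20" in the sentence above is index 19), and the server is assumed to keep the finished response around, keyed by some client identifier, so that it can serve a slice without recomputing it.

```c
#include <stddef.h>

#define BLOCK_SIZE 4096u

/* Number of blocks needed to carry total_len bytes. */
size_t block_count(size_t total_len)
{
    return (total_len + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Compute the byte range of block `index` (0-based) within a response of
 * total_len bytes. On success stores the offset and length and returns 1;
 * returns 0 if the index is past the last block. */
int block_slice(size_t total_len, size_t index, size_t *off, size_t *len)
{
    if (index >= block_count(total_len))
        return 0;
    *off = index * BLOCK_SIZE;
    *len = total_len - *off < BLOCK_SIZE ? total_len - *off : BLOCK_SIZE;
    return 1;
}
```

A client that has already received 19 blocks would ask for index 19 after reconnecting, and the server would resume sending from byte offset 19 * 4096 = 77824 of the stored response.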
