TCP keep-alive to determine if client disconnected in netty - tcp

I'm trying to determine if a client has closed a socket connection from netty. Is there a way to do this?

On a usual case where a client closes the socket via close() and the TCP closing handshake has been finished successfully, a channelInactive() (or channelClosed() in 3) event will be triggered.
However, on an unusual case such as where a client machine goes offline due to power outage or unplugged LAN cable, it can take a lot of time until you discover the connection was actually down. To detect this situation, you have to send some message to the client periodically and expect to receive its response within a certain amount of time. It's like a ping - you should define a periodic ping and pong message in your protocol which practically does nothing but checking the health of the connection.
Alternatively, you can enable SO_KEEPALIVE, but the keepalive interval of this option is usually OS-dependent and I would not recommend using it.
To help a user implement this sort of behavior relatively easily, Netty provides ReadTimeoutHandler. Configure your pipeline so that ReadTimeoutHandler raises an exception when there's no inbound traffic for a certain amount of time, and close the connection on the exception in your exceptionCaught() handler method. If you are the party who is supposed to send a periodic ping message, use a timer (or IdleStateHandler) to send it.

If you are writing a server, and netty is your client, then your server can detect a disconnect by calling select() or equivalent to detect when the socket is readable and then call recv(). If recv() returns 0 then the socket was closed gracefully by the client. If recv() returns -1 then check errno or equivalent for the actual error (with few exceptions, most errors should be treated as an ungraceful disconnect). The thing about unexpected disconnects is that they can take a long time for the OS to detect, so you would have to either enable TCP keep-alives, or require the client to send data to the server on a regular basis. If nothing is received from the client for a period of time then just assume the client is gone and close your end of the connection. If the client wants to, it can then reconnect.

If you read from a connection that has been closed by the peer you will get an end-of-stream indication of some kind, depending on the API. If you write to such a connection you will get an IOException: 'connection reset'. TCP doesn't provide any other way of detecting a closed connection.
TCP keep-alive (a) is off by default and (b) only operates every two hours by default when enabled. This probably isn't what you want. If you use it and you read or write after it has detected that the connection is broken, you will get the reset error above,

It depends on your protocol that you use ontop of netty. If you design it to support ping-like messages, you can simply send those messages. Besides that, netty is only a pretty thin wrapper around TCP.
Also see this SO post which describes isOpen() and related. This however does not solve the keep-alive problem.

Related

Netty and TCP - How to properly send an empty message

We have a simple TCP server behind an AWS Network ELB (similar to Echo server with long-lived connections) written in Netty and I'm trying to implement a keep-alive mechanism similar to TCP keep-alive mechanism to keep our idle connections open. Unfortunately we cannot rely on TCP keep-alive mechanism since NELBs do not forward keep-alive TCP packets to the other side of the loadbalancer.
What I'm thinking to do is to watch for idle connections and send an empty string (empty byte array) to clients. What I did so far in the code is:
Add a IdleStateHandler with some timeout values
Register a GprsKeepAliveHandler, a sub class of ChannelDuplexHandler, overriding userEventTriggered method sending (ctx.writeAndFlush) the Unpooled.EMPTY_BUFFER.
This way, I expect to receive a RST packet if the connection is gone. Otherwise the connection will become active again.
The problem is Netty does not do anything with the empty message, it does not send any packets to the client (monitored with Wireshark). If I change the message to Unpooled.wrappedBuffer(new byte[]{0}) I see what I'm expect to see.
Questions
I couldn't find a better way to achieve my objective (keep connections alive and detect dead connections). If there's a better way please let me know.
What is the proper way to send an empty message in Netty? (I saw this question but it didn't help)
If the issue is because of OS TCP stack behavior, is there a way to solve this problem?
from my perspective you need to send something meaningful, because you try to do (e.g. ping/pong, heartbeating behavior). Also see Is it is possible to force TCP socket to send 0 bytes in case of packet losses - python
It seems that Netty does not make any syscall in case of empty messages. (see this)

How HTTP client detect web server crash

From HTTP:The definitive guide :
But without Content-Length, clients cannot distinguish between
successful connection close at the end of a message and connection
close due to a server crash in the middle of a message.
Let's assume that for this purpose the "server crash" means crash of the server's HW or OS without closing the TCP connection or possibly link being broken.
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
If the web server crashes without closing TCP connection, how does the client detect that the connection "has been closed"?
Since the closing will be done by the kernel that would mean, that the whole system crashed or that the connection broke somewhere else (router crashed, power blackout at server side or similar).
You can only detect this if you sent data to the server and don't get any useful response back.
From what I know, if FIN segment is not sent the client will keep waiting for the data unless there is a timer or it tries to send some data (failing which detects TCP connection shutdown).
How is this done in HTTP?
HTTP uses TCP as the underlying protocol, so if TCP detects a connection close HTTP will too. Additionally HTTP can detect in most cases if the response is complete, by using information from Content-length header or similar information with chunked transfer encoding. In the few cases where the end of response is only indicated by a connection close HTTP can only rely on TCP do detect problems. So far the theory, but in practice most browsers simply ignore an incomplete response and show as much as they got.

Does asynchronous receive guarantee the detection of connection failure?

From what I know, a blocking receive on a TCP socket does not always detect a connection error (due either to a network failure or to a remote-endpoint failure) by returning a -1 value or raising an IO exception: sometimes it could just hang indefinitely.
One way to manage this problem is to set a timeout for the blocking receive. In case an upper bound for the reception time is known, this bound could be set as timeout and the connection could be considered lost simply when the timeout expires; when such an upper bound is not known a priori, for example in a pub-sub system where a connection stays open to receive publications, the timeout to be set would be somewhat arbitrary but its expiration could trigger a ping/pong request to verify that the connection (and the endpoint too) is still up.
I wonder whether the use of asynchronous receive also manages the problem of detecting a connection failure. In boost::asio I would call socket::asynch_read_some() registering an handler to be asynchronously called, while in java.nio I would configure the channel as non-blocking and register it to a selector with an OP_READ interest flag. I imagine that a correct connection-failure detection would mean that, in the first case the handler would be called with a non-0 error_code, while in the second case the selector would select the faulty channel but a subsequent read() on the channel would either return -1 or throw an IOException.
Is this behaviour guaranteed with asynchronous receive, or could there be scenarios where after a connection failure, for example, in boost::asio the handler will never be called or in java.nio the selector will never select the channel?
Thank you very much.
I believe you're referring to the TCP half-open connection problem (the RFC 793 meaning of the term). Under this scenario, the receiving OS will never receive indication of the lost connection, so it will never notify the app. Whether the app is readding synchronously or asynchronously doesn't enter into it.
The problem occurs when the transmitting side of the connection somehow is no longer aware of the network connection. This can happen, for example, when
the transmitting OS abruptly terminates/restarts (power outage, OS failure/BSOD, etc.).
the transmitting side closes its side while there is a network disruption between the two sides and cleans up its side: e.g transmitting OS reboots cleanly during disruption, transmitting Windows OS is unplugged from the network
When this happens, the receiving side may be waiting for data or a FIN that will never come. Unless the receiving side sends a message, there's no way for it to realize the transmitting side is no longer aware of the receiving side.
Your solution (a timeout) is one way to address the issue, but it should include sending a message to the transmitting side. Again, it doesn't matter the read is synchronous or asynchronous, just that it doesn't read and wait indefinitely for data or a FIN. Another solution is using a TCP KEEPALIVE feature that is supported by some TCP stacks. But the hard part of any generalized solution is usually determining a proper timeout, since the timeout is highly dependent on characteristics of the specific application.
Because of how TCP works, you will typically have to send data in order to notice a hard connection failure, to find out that no ACK packet will ever be returned. Some protocols attempt to identify conditions like this by periodically using a keep-alive or ping packet: if one side does not receive such a packet in X time (and perhaps after trying and failing one itself), it can consider the connection dead.
To answer your question, blocking and non-blocking receive should perform identically except for the act of blocking itself, so both will suffer from this same issue. In order to make sure that you can detect a silent failure from the remote host, you'll have to use a form of keep-alive like I described.

Detecting HTTP close using inet

In my mochiweb application, I am using a long held HTTP request. I wanted to detect when the connection with the user died, and I figured out how to do that by doing:
Socket = Req:get(socket),
inet:setopts(Socket, [{active, once}]),
receive
{tcp_closed, Socket} ->
% handle clean up
Data ->
% do something
end.
This works when: user closes his tab/browser or refreshes the page. However, when the internet connection dies suddenly (say wifi signal lost all of a sudden), or when the browser crashes abnormally, I am not able to detect a tcp close.
Am I missing something, or is there any other way to achieve this?
There is a TCP keepalive protocol and it can be enabled with inet:setopts/2 under the option {keepalive, Boolean}.
I would suggest that you don't use it. The keep-alive timeout and max-retries tends to be system wide, and it is optional after all. Using timeouts on the protocol level is better.
The HTTP protocol has the status code Request Timeout which you can send to the client if it seems dead.
Check out the after clause in receive blocks that you can use to timeout waiting for data, or use the timer module, or use erlang:start_timer/3. They all have different performance characteristics and resource costs.
There isn't a default "keep alive" (but can be enabled if supported) protocol over TCP: in case there is a connection fault when no data is exchanged, this translates to a "silent failure". You would need to account for this type of failure by yourself e.g. implement some form of connection probing.
How does this affect HTTP? HTTP is a stateless protocol - this means that every request is independent of every other. The "keep alive" functionality of HTTP doesn’t change that i.e. "silent failure" can still occur.
Only when data is exchanged can this condition be detected (or when TCP Keep Alive is enabled).
I would suggest sending the application level keep alive messages over HTTP chunked-encoding. Have your client/server smart enough to understand the keep alive messages and ignore them if they arrive on time or close and re-establish the connection again.

Rejecting a TCP connection before it's being accepted?

There are 3 different accept versions in winsock. Aside from the basic accept which is there for standard compliance, there's also AcceptEx which seems the most advanced version (due to it's overlapped io capabilities), and WSAAccept. The latter supports a condition callback, which as far as I understand, allows the rejection of connection requests before they're accepted (when the SO_CONDITIONAL_ACCEPT option is enabled). None of the other versions supports this functionality.
Since I prefer to use AcceptEx with overlapped io, I wonder how come this functionality is only available in the simpler version?
I don't know enough about the inner workings of TCP to tell wehter there's actually any difference between rejecting a connection before it has been accepted, and disconnecting a socket right after a connection has been established? And if there is, is there any way to mimic the WSAAccept functionality with AcceptEx?
Can someone shed some light over this issue?
When a connection is established, the remote end sends a packet with the SYN flag set. The server answers with a SYN,ACK packet, and after that the remote end sends an ACK packet, which may already contain data.
There are two ways to break a TCP connection from forming. The first is resetting the connection - this is the same as the common "connection refused" message seen when connecting to a port nobody is listening to. In this case, the original SYN packet is answered with a RST packet, which terminates the connection immediately and is stateless. If the SYN is resent, RST will be generated from every received SYN packet.
The second is closing the connection as soon as it has been formed. On the TCP level, there is no way to close the connection both ways immediately - the only thing you can say is that "I am not going to send any more data". This happens so that when the initial SYN, SYN,ACK, ACK exchange has finished, the server sends a FIN packet to the remote end. In most cases, telling the other end with FIN that "I am not going to send any more data" makes the other end close the connection as well, and send it's own FIN packet. A connection terminated this way isn't any different from a normal connection where no data was sent for some reason. This means that the normal state tracking for TCP connections and the lingering close states will persist, just like for normal connections.
Now, on the C API side, this looks a bit different. When calling listen() on a port, the OS starts accepting connections on that port. This means that is starts replying SYN,ACK packets to connections, regardless if the C code has called accept() yet. So, on the TCP side, it makes no difference whether the connection is somehow closed before or after accept. The only additional concern is that a listening socket has a backlog, which means the number of non-accepted connections it can have waiting, before it starts saying RST to the remote end.
However, on windows, the SO_CONDITIONAL_ACCEPT call allows the application to take control of the backlog queue. This means that the server will not answer anything to a SYN packet until the application does something with the connection. This means, that rejecting connections at this level can actually send RST packets to the network without creating state.
So, if you cannot get the SO_CONDITIONAL_ACCEPT functionality enabled somehow on the socket you are using AcceptEx on, it will show up differently to the network. However, not many places actually use the immediate RST functionality, so I would think the requirement for that must mean a very specialized system indeed. For most common use cases, accepting a socket and then closing it is the normal way to behave.
I can't comment on the Windows side of things but as far as TCP is concerned, rejecting a connection is a bit different than disconnecting from it.
For one, disconnecting from a connection means there were more resources already "consumed" (e.g. ports state maintained in Firewalls & end-points, forwarding capacity used in switches/routers etc.) in both the network and the hosts. Rejecting a connection is less resource intensive.

Resources