C++ TCP connection unexpected write() result - tcp

I'm working on a C++ TCP server - multi client program. The problem is why the return value of write() may be a number greater than 0 even a client has been disconnected (in this case, I test with localhost and press Ctrl+C so one of the client programs ends, but it forces the main program to close).
There are some questions:
1. How to detect if write to client failed
2. Why the return value of write() is not <= 0 when it failed?
3. How to prevent the server program force close caused by a client program closed?

This is inherent nature of TCP protocol. Succeeding in write() does not mean the message was read (or even reached) the server. Without going into too much details on TCP stack implementation, just accept this as a fact of life.
If you want to make sure the message has been read, you'd have to introduce your own acknowlegement protocol on top. Lack of acknowlegement would mean the message was lost.
Answering question on crash when writing to closed socket:
Application terminates because writing to closed socket by default causes SIGPIPE signal. There are two ways to prevent this:
Set SIGPIPE handler to SIG_IGN and deal with write() errors.
Depending on your OS, either set SO_NOSIGPIPE on the socket using setsockopt(), or use MSG_NOSIGNAL as a flag to your send().

Related

TCP keep-alive to determine if client disconnected in netty

I'm trying to determine if a client has closed a socket connection from netty. Is there a way to do this?
On a usual case where a client closes the socket via close() and the TCP closing handshake has been finished successfully, a channelInactive() (or channelClosed() in 3) event will be triggered.
However, on an unusual case such as where a client machine goes offline due to power outage or unplugged LAN cable, it can take a lot of time until you discover the connection was actually down. To detect this situation, you have to send some message to the client periodically and expect to receive its response within a certain amount of time. It's like a ping - you should define a periodic ping and pong message in your protocol which practically does nothing but checking the health of the connection.
Alternatively, you can enable SO_KEEPALIVE, but the keepalive interval of this option is usually OS-dependent and I would not recommend using it.
To help a user implement this sort of behavior relatively easily, Netty provides ReadTimeoutHandler. Configure your pipeline so that ReadTimeoutHandler raises an exception when there's no inbound traffic for a certain amount of time, and close the connection on the exception in your exceptionCaught() handler method. If you are the party who is supposed to send a periodic ping message, use a timer (or IdleStateHandler) to send it.
If you are writing a server, and netty is your client, then your server can detect a disconnect by calling select() or equivalent to detect when the socket is readable and then call recv(). If recv() returns 0 then the socket was closed gracefully by the client. If recv() returns -1 then check errno or equivalent for the actual error (with few exceptions, most errors should be treated as an ungraceful disconnect). The thing about unexpected disconnects is that they can take a long time for the OS to detect, so you would have to either enable TCP keep-alives, or require the client to send data to the server on a regular basis. If nothing is received from the client for a period of time then just assume the client is gone and close your end of the connection. If the client wants to, it can then reconnect.
If you read from a connection that has been closed by the peer you will get an end-of-stream indication of some kind, depending on the API. If you write to such a connection you will get an IOException: 'connection reset'. TCP doesn't provide any other way of detecting a closed connection.
TCP keep-alive (a) is off by default and (b) only operates every two hours by default when enabled. This probably isn't what you want. If you use it and you read or write after it has detected that the connection is broken, you will get the reset error above,
It depends on your protocol that you use ontop of netty. If you design it to support ping-like messages, you can simply send those messages. Besides that, netty is only a pretty thin wrapper around TCP.
Also see this SO post which describes isOpen() and related. This however does not solve the keep-alive problem.

How does TCP connection terminate if one of the machine dies?

If a TCP connection is established between two hosts (A & B), and lets say host A has sent 5 octets to host B, and then the host B crashes (due to unknown reason).
The host A will wait for acknowledgments, but on not getting them, will resend octets and also reduce the sender window size.
This will repeat couple times till the window size shrinks to zero because of packet loss. My question is, what will happen next?
In this case, TCP eventually times out waiting for the ack's and return an error to the application. The application have to read/recv from the TCP socket to learn about that error, a subsequent write/send call will fail as well. Up till the point that TCP determined that the connection is gone, write/send calls will not fail, they'll succeed as seen from the application or block if the socket buffer is full.
In the case your host B vanishes after it has sent its ACKs, host A will not learn about that until it sends something to B, which will eventually also time out, or result in an ICMP error. (Typically the first write/send call will not fail as TCP will not fail the connection immediately, and keep in mind that write/send calls does not wait for ACKs until they complete).
Note also that retransmission does not reduce the window size.
Please follow this link
now a very simple answer to your question in my view is, The connection will be timed out and will be closed. another possibility that exists is that some ICMP error might be generated due to due un-responsive machine.
Also, if the crashed machine is online again, then the procedure described in the link i just pasted above will be observed.
Depends on the OS implementation. In short it will wait for ACK and resend packets until it times out. Then your connection will be torn down. To see exactly what happens in Linux look here other OSes follow similar algorithm.
in your case, A FIN will be generated (by the surviving node) and connection will eventually migrate to CLOSED state. If you keep grep-ing for netstat output on the destination ip address, you will watch the migration from ESTABLISHED state to TIMED_WAIT and then finally disappear.
In your case, this will happen since TCP keeps a timer to get the ACK for the packet it has sent. This timer is not long enough so detection will happen pretty quickly.
However, if the machine B dies after A gets ACK and after that A doesn't send anything, then the above timer can't detect the same event, however another timer (calls idle timeout) will detect that condition and connection will close then. This timeout period is high by default. But normally this is not the case, machine A will try to send stuff in between and will detect the error condition in send path.
In short, TCP is smart enough to close the connection by itself (and let application know about it) except for one case (Idle timeout: which by default is very high).
cforfun
In normal cases, each side terminating its end of the connectivity by sending a special message with a FIN(finish) bit set.
The device receiving this FIN responds with an acknowledgement to the FIN to indicate that it has been received.
The connection as a whole is not considered terminated until both the devices complete the shut down procedure by sending an FIN and receiving an acknowledgement.

TCP client-server SIGPIPE

I am designing and testing a client server program based on TCP sockets(Internet domain). Currently , I am testing it on my local machine and not able to understand the following about SIGPIPE.
*. SIGPIPE appears quite randomly. Can it be deterministic?
The first tests involved single small(25 characters) send operation from client and corresponding receive at server. The same code, on the same machine runs successfully or not(SIGPIPE) totally out of my control. The failure rate is about 45% of times(quite high). So, can I tune the machine in any way to minimize this.
**. The second round of testing was to send 40000 small(25 characters) messages from the client to the server(1MB of total data) and then the server responding with the total size of data it actually received. The client sends data in a tight loop and there is a SINGLE receive call at the server. It works only for a maximum of 1200 bytes of total data sent and again, there are these non deterministic SIGPIPEs, about 70% times now(really bad).
Can some one suggest some improvement in my design(probably it will be at the server). The requirement is that the client shall be able to send over medium to very high amount of data (again about 25 characters each message) after a single socket connection has been made to the server.
I have a feeling that multiple sends against a single receive will always be lossy and very inefficient. Shall we be combining the messages and sending in one send() operation only. Is that the only way to go?
SIGPIPE is sent when you try to write to an unconnected pipe/socket. Installing a handler for the signal will make send() return an error instead.
signal(SIGPIPE, SIG_IGN);
Alternatively, you can disable SIGPIPE for a socket:
int n = 1;
setsockopt(thesocket, SOL_SOCKET, SO_NOSIGPIPE, &n, sizeof(n));
Also, the data amounts you're mentioning are not very high. Likely there's a bug somewhere that causes your connection to close unexpectedly, giving a SIGPIPE.
SIGPIPE is raised because you are attempting to write to a socket that has been closed. This does indicate a probable bug so check your application as to why it is occurring and attempt to fix that first.
Attempting to just mask SIGPIPE is not a good idea because you don't really know where the signal is coming from and you may mask other sources of this error. In multi-threaded environments, signals are a horrible solution.
In the rare cases were you cannot avoid this, you can mask the signal on send. If you set the MSG_NOSIGNAL flag on send()/sendto(), it will prevent SIGPIPE being raised. If you do trigger this error, send() returns -1 and errno will be set to EPIPE. Clean and easy. See man send for details.

Does asynchronous receive guarantee the detection of connection failure?

From what I know, a blocking receive on a TCP socket does not always detect a connection error (due either to a network failure or to a remote-endpoint failure) by returning a -1 value or raising an IO exception: sometimes it could just hang indefinitely.
One way to manage this problem is to set a timeout for the blocking receive. In case an upper bound for the reception time is known, this bound could be set as timeout and the connection could be considered lost simply when the timeout expires; when such an upper bound is not known a priori, for example in a pub-sub system where a connection stays open to receive publications, the timeout to be set would be somewhat arbitrary but its expiration could trigger a ping/pong request to verify that the connection (and the endpoint too) is still up.
I wonder whether the use of asynchronous receive also manages the problem of detecting a connection failure. In boost::asio I would call socket::asynch_read_some() registering an handler to be asynchronously called, while in java.nio I would configure the channel as non-blocking and register it to a selector with an OP_READ interest flag. I imagine that a correct connection-failure detection would mean that, in the first case the handler would be called with a non-0 error_code, while in the second case the selector would select the faulty channel but a subsequent read() on the channel would either return -1 or throw an IOException.
Is this behaviour guaranteed with asynchronous receive, or could there be scenarios where after a connection failure, for example, in boost::asio the handler will never be called or in java.nio the selector will never select the channel?
Thank you very much.
I believe you're referring to the TCP half-open connection problem (the RFC 793 meaning of the term). Under this scenario, the receiving OS will never receive indication of the lost connection, so it will never notify the app. Whether the app is readding synchronously or asynchronously doesn't enter into it.
The problem occurs when the transmitting side of the connection somehow is no longer aware of the network connection. This can happen, for example, when
the transmitting OS abruptly terminates/restarts (power outage, OS failure/BSOD, etc.).
the transmitting side closes its side while there is a network disruption between the two sides and cleans up its side: e.g transmitting OS reboots cleanly during disruption, transmitting Windows OS is unplugged from the network
When this happens, the receiving side may be waiting for data or a FIN that will never come. Unless the receiving side sends a message, there's no way for it to realize the transmitting side is no longer aware of the receiving side.
Your solution (a timeout) is one way to address the issue, but it should include sending a message to the transmitting side. Again, it doesn't matter the read is synchronous or asynchronous, just that it doesn't read and wait indefinitely for data or a FIN. Another solution is using a TCP KEEPALIVE feature that is supported by some TCP stacks. But the hard part of any generalized solution is usually determining a proper timeout, since the timeout is highly dependent on characteristics of the specific application.
Because of how TCP works, you will typically have to send data in order to notice a hard connection failure, to find out that no ACK packet will ever be returned. Some protocols attempt to identify conditions like this by periodically using a keep-alive or ping packet: if one side does not receive such a packet in X time (and perhaps after trying and failing one itself), it can consider the connection dead.
To answer your question, blocking and non-blocking receive should perform identically except for the act of blocking itself, so both will suffer from this same issue. In order to make sure that you can detect a silent failure from the remote host, you'll have to use a form of keep-alive like I described.

Behavior of shutdown(sock, SHUT_RD) with TCP

When using a TCP socket, what does
shutdown(sock, SHUT_RD);
actually do? Does it just make all recv() calls return an error code? If so, which error code?
Does it cause any packets to be sent by the underlying TCP connection? What happens to any data that the other side sends at this point - is it kept, and the window size of the connection keeps shrinking until it gets to 0, or is it just discarded, and the window size doesn't shrink?
Shutting down the read side of a socket will cause any blocked recv (or similar) calls to return 0 (indicating graceful shutdown). I don't know what will happen to data currently traveling up the IP stack. It will most certainly ignore data that is in-flight from the other side. It will not affect writes to that socket at all.
In fact, judicious use of shutdown is a good way to ensure that you clean up as soon as you're done. An HTTP client that doesn't use keepalive can shutdown the write-side as soon as it is done sending the request, and a server that sees Connection: closed can likewise shutdown the read-side as soon as it is done receiving the request. This will cause any further erroneous activity to be immediately obvious, which is very useful when writing protocol-level code.
Looking at the Linux source code, shutdown(sock, SHUT_RD) doesn't seem to cause any state changes to the socket. (Obviously, shutdown(sock, SHUT_WR) causes FIN to be set.)
I can't comment on the window size changes (or lack thereof). But you can write a test program to see. Just make your inetd run a chargen service, and connect to it. :-)
shutdown(,SHUT_RD) does not have any counterpart in TCP protocol, so it is pretty much up to implementation how to behave when someone writes to a connection where other side indicated that it will not read or when you try to read after you declared that you wont.
On slightly lower level it is beneficial to remember that TCP connection is a pair of flows using which peers send data until they declare that they are done (by SHUT_WR which sends FIN). And these two flows are quite independent.
I test shudown(sock,SHUT_RD) on Ubuntu 12.04. I find that when you call shutdown(sock,SHUT_RD) if there are no any type of data(include FIN....) in the TCP buffer, the successive read call will return 0(indicates end of stream). But if there are some data which arrived before or after shutdown function, read call will process normally as if shutdown function was not called. It seems that shutdown(sock,SHUT_RD) doesn't cause any TCP states changed to the socket
It has two effects, one of them platform-dependent.
recv() will return zero, indicating end of stream.
Any further writes to the connection by the peer will either be (a) silently thrown away by the receiver (BSD), (b) be buffered by the receiver and eventually cause send() to block or return -1/EAGAIN/EWOULDBLOCK (Linux), or (c) cause the receiver to send an RST (Windows).
shutdown(sock, SHUT_RD) causes any writer to the socket to receive a sigpipe signal.
Any further reads using the read system call will return a -1 and set errno to EINVAL.
The use of recv will return a -1 and set errno to indicate the error (probably ENOTCONN or ENOTSOCK).

Resources