How to query TCP connection state in go? - tcp

On the client side of a TCP connection, I am attempting to reuse established connections as much as possible to avoid the overhead of dialing every time I need a connection. Fundamentally, it's connection pooling, although technically, my pool size just happens to be one.
I'm running into a problem in that if a connection sits idle for long enough, the other end disconnects. I've tried using something like the following to keep connections alive:
err = conn.(*net.TCPConn).SetKeepAlive(true)
if err != nil {
    fmt.Println(err)
    return
}
err = conn.(*net.TCPConn).SetKeepAlivePeriod(30 * time.Second)
if err != nil {
    fmt.Println(err)
    return
}
But this isn't helping. In fact, it's causing my connections to close sooner. I'm pretty sure this is because (on a Mac) this means the connection health starts being probed after 30 seconds of idle time and is then probed 8 more times at 30-second intervals. The server side must not support keepalive, so after 4 minutes and 30 seconds the client disconnects.
There might be nothing I can do to keep an idle connection alive indefinitely, and that would be absolutely ok if there were some way for me to at least detect that a connection has been closed so that I can seamlessly replace it with a new one. Alas, even after reading all the docs and scouring the blogosphere for help, I can't find any way at all in go to query the state of a TCP connection.
There must be a way. Does anyone have any insight into how that can be accomplished? Many thanks in advance to anyone who does!
EDIT:
Ideally, I'd like to learn how to handle this, low-level with pure go-- without using third-party libraries to accomplish this. Of course if there is some library that does this, I don't mind being pointed in its direction so I can see how they do it.

The socket api doesn't give you access to the state of the connection. You can query the current state in various ways from the kernel (/proc/net/tcp[6] on linux for example), but that doesn't make any guarantee that further sends will succeed.
I'm a little confused on one point here. My client is ONLY sending data. Apart from acking the packets, the server sends nothing back. Reading doesn't seem an appropriate way to determine connection status, as there's nothing TO read.
The socket API is defined such that you detect a closed connection by a read returning 0 bytes. That's the way it works. In Go, this is translated to a Read returning io.EOF. This will usually be the fastest way to detect a broken connection.
So am I supposed to just send and act on whatever errors occur? If so, that's a problem, because I'm observing that I typically do not get any errors at all when attempting to send over a broken pipe -- which seems totally wrong.
If you look closely at how TCP works, this is the expected behavior. If the connection is closed on the remote side, then your first send will trigger an RST from the server, fully closing the local connection. You either need to read from the connection to detect the close, or if you try to send again you will get an error (assuming you've waited long enough for the packets to make a round trip), like "broken pipe" on linux.
To clarify... I can dial, unplug an ethernet cable, and STILL send without error. The messages don't get through, obviously, but I receive no error
If the connection is actually broken, or the server is totally unresponsive, then you're sending packets off to nowhere. The TCP stack can't tell the difference between packets that are really slow, packet loss, congestion, or a broken connection. The system needs to wait for the retransmission timeout, and retry the packet a number of times before failing. The standard configuration for retries alone can take between 13 and 30 minutes to trigger an error.
What you can do in your code is
Turn on keepalive. This will notify you of a broken connection more quickly, because the idle connection is always being tested.
Read from the socket. Either have a concurrent Read in progress, or check for something to read first with select/poll/epoll (Go usually uses the first)
Set timeouts (deadlines in Go) for everything. (A sketch combining keepalive and deadlines follows this list.)
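For points 1 and 3, here is a minimal sketch of what that can look like, assuming conn is a *net.TCPConn you got from dialing and that the net and time packages are imported; the function name sendWithDeadline, the payload argument, and the durations are made up for this example:
func sendWithDeadline(conn *net.TCPConn, payload []byte) error {
    if err := conn.SetKeepAlive(true); err != nil {
        return err
    }
    if err := conn.SetKeepAlivePeriod(30 * time.Second); err != nil {
        return err
    }
    // A deadline is an absolute point in time, so re-arm it before each Write (or Read).
    if err := conn.SetWriteDeadline(time.Now().Add(10 * time.Second)); err != nil {
        return err
    }
    if _, err := conn.Write(payload); err != nil {
        // Timeouts, "broken pipe" and "connection reset" all surface here;
        // treat any of them as a signal to discard and replace the pooled connection.
        return err
    }
    return nil
}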
If you're not expecting any data from the connection, checking for a closed connection is very easy in Go; dispatch a goroutine to read from the connection until there's an error.
notify := make(chan error)
go func() {
    buf := make([]byte, 1024)
    for {
        n, err := conn.Read(buf)
        if err != nil {
            notify <- err
            return
        }
        if n > 0 {
            fmt.Printf("unexpected data: %s\n", buf[:n])
        }
    }
}()
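One possible way to consume that channel from the sending side, sketched under the assumption that your writes come from their own loop; the sendLoop name and the outgoing channel are illustrative only:
func sendLoop(conn net.Conn, notify <-chan error, outgoing <-chan []byte) error {
    for {
        select {
        case err := <-notify:
            // The reader goroutine hit io.EOF or another error:
            // the connection is gone, so return and let the caller redial.
            return err
        case msg := <-outgoing:
            if _, err := conn.Write(msg); err != nil {
                return err
            }
        }
    }
}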

There is no such thing as 'TCP connection state', by design. There is only what happens when you send something. There is no TCP API, at any level down to the silicon, that will tell you the current state of a TCP connection. You have to try to use it.
If you're sending keepalive probes, the server doesn't have any choice but to respond appropriately. The server doesn't even know that they are keepalives. They aren't. They are just duplicate ACKs. Supporting keepalive just means supporting sending keepalives.

Related

TCP write error but not really

I have been testing a program which has simple communication between two machines over a 1Gbps line. While running TCP communications over the line I occasionally receive write errors on the client side (due to a timeout) when the network is totally flooded (running at or close to 100% usage). This generally happens when I am running multiple instances of the same program going to different ports.
My question is, is it possible to get a write error but still receive the message on the server side? It appears that is what is happening, and I am not quite sure why. Could it be that the ACK coming back to the client is what is timing out?
Yes, that is possible. TCP does not guarantee that data whose send succeeded was received, nor that data whose send failed was not received. This problem is unsolvable. It is called the Two Generals' Problem. There is always a way to lose messages/packets such that the sender comes to the wrong conclusion. TCP guarantees that the receiver receives the same stream of bytes that the sender sent, but possibly cut off at an arbitrary point.
This unreliability has performance reasons, too. TCP data is buffered on both hosts as well as on the network. Acknowledgement is delayed.
You have to live with this. If you make your scenario more concrete I can suggest some strategies of dealing with this.
send puts data into the TCP send buffer.
If the send buffer does not have enough space, send will block until the data is completely or partly copied into the send buffer, or until the configured timeout expires.
Read timeouts and write timeouts are normal. You should check for them and handle them, typically by retrying the read/write operation after the timeout. You should also pay attention to read/write errors other than timeouts.
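In Go terms, a deadline expiring surfaces as a net.Error whose Timeout() method returns true. A small sketch of the retry idea, assuming the errors, net and time packages; the writeRetry name, the attempt count and the 5-second deadline are invented for illustration:
func writeRetry(conn net.Conn, data []byte, attempts int) error {
    var err error
    for i := 0; i < attempts; i++ {
        conn.SetWriteDeadline(time.Now().Add(5 * time.Second))
        if _, err = conn.Write(data); err == nil {
            return nil
        }
        // Retry only genuine timeouts; any other error means the connection is broken.
        // Note: a timed-out partial write is not handled here; real code would track
        // how many bytes were written before the deadline hit.
        var nerr net.Error
        if !errors.As(err, &nerr) || !nerr.Timeout() {
            return err
        }
    }
    return err
}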

TCP keep-alive to determine if client disconnected in netty

I'm trying to determine if a client has closed a socket connection from netty. Is there a way to do this?
In the usual case, where a client closes the socket via close() and the TCP closing handshake has finished successfully, a channelInactive() (or channelClosed() in Netty 3) event will be triggered.
However, on an unusual case such as where a client machine goes offline due to power outage or unplugged LAN cable, it can take a lot of time until you discover the connection was actually down. To detect this situation, you have to send some message to the client periodically and expect to receive its response within a certain amount of time. It's like a ping - you should define a periodic ping and pong message in your protocol which practically does nothing but checking the health of the connection.
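To show the shape of such a health check outside netty, here is a sketch in Go (the document's other examples are Go); the checkHealth name, the PING/PONG frames and the single timeout window are assumptions of this sketch, not part of any real protocol, and it assumes no other goroutine is reading the connection concurrently:
func checkHealth(conn net.Conn, timeout time.Duration) error {
    ping := []byte("PING\n") // placeholder frame; a real protocol defines its own
    pong := make([]byte, 5)  // this sketch expects a fixed-size "PONG\n" reply
    conn.SetWriteDeadline(time.Now().Add(timeout))
    if _, err := conn.Write(ping); err != nil {
        return err
    }
    // The peer must answer within the same window; no reply means the
    // connection should be treated as dead and closed.
    conn.SetReadDeadline(time.Now().Add(timeout))
    if _, err := io.ReadFull(conn, pong); err != nil {
        return err
    }
    return nil
}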
Alternatively, you can enable SO_KEEPALIVE, but the keepalive interval of this option is usually OS-dependent and I would not recommend using it.
To help a user implement this sort of behavior relatively easily, Netty provides ReadTimeoutHandler. Configure your pipeline so that ReadTimeoutHandler raises an exception when there's no inbound traffic for a certain amount of time, and close the connection on the exception in your exceptionCaught() handler method. If you are the party who is supposed to send a periodic ping message, use a timer (or IdleStateHandler) to send it.
If you are writing a server, and netty is your client, then your server can detect a disconnect by calling select() or equivalent to detect when the socket is readable and then call recv(). If recv() returns 0 then the socket was closed gracefully by the client. If recv() returns -1 then check errno or equivalent for the actual error (with few exceptions, most errors should be treated as an ungraceful disconnect). The thing about unexpected disconnects is that they can take a long time for the OS to detect, so you would have to either enable TCP keep-alives, or require the client to send data to the server on a regular basis. If nothing is received from the client for a period of time then just assume the client is gone and close your end of the connection. If the client wants to, it can then reconnect.
If you read from a connection that has been closed by the peer you will get an end-of-stream indication of some kind, depending on the API. If you write to such a connection you will get an IOException: 'connection reset'. TCP doesn't provide any other way of detecting a closed connection.
TCP keep-alive (a) is off by default and (b) only operates every two hours by default when enabled. This probably isn't what you want. If you use it and you read or write after it has detected that the connection is broken, you will get the reset error above.
It depends on the protocol that you use on top of netty. If you design it to support ping-like messages, you can simply send those messages. Besides that, netty is only a pretty thin wrapper around TCP.
Also see this SO post which describes isOpen() and related. This however does not solve the keep-alive problem.

Is validation necessary with TCP?

I had to implement an application which, in short, sent packets every few seconds to a server; when the server received them, it sent a response to the client, which only then proceeded to send another packet. This sounds all good, but we were using TCP, and the responses came as soon as the server got the packet, not after any post-processing or anything like that. So that makes me wonder, why would you do something like this? The client had a queue where I kept all the packets and did something like this:
try {
    send packet // exception is thrown if connection is lost
    remove packet from queue
} catch exception {
    try to reconnect
}
so in this case the packet gets removed from the queue only if the send was successful.
Any idea about this? Is this best practice? I would appreciate if someone could clear this for me.
Thanks
One option would be to put the packets into a queue and send them. After sending, move them into a "pending" queue. Once the other end has processed them, you mark them as completed. Then you are up against other problems: what if the other end processes them but the ack never gets to your end? This is a well-researched problem, and I suggest you read up on distributed transactions and two-phase commit if you need to be sure.
Sending isn't enough in some cases. If it's absolutely critical that the data you're shoving out the door MUST be received, then you should wait for an acknowledgement that the packet was received/processed by the remote end.
Even if the network-level stuff works perfectly and the packets arrive at the destination, that destination machine might still crash or otherwise lose the data. If you remove-on-send, then that data's gone. Waiting for acknowledgements from the remote end would at least give you the ability to resend packets that were corrupted/lost.
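A rough Go sketch of the queue-plus-acknowledgement idea from the answers above; the Message type, the Sender struct, the encode callback and the ack handling are all invented for illustration, and real code would also need persistence, retransmission and locking:
// Message is a hypothetical application-level frame with its own ID.
type Message struct {
    ID   uint64
    Body []byte
}

type Sender struct {
    pending map[uint64]Message // sent, but not yet acknowledged by the peer
}

// Send parks the message in the pending set and then transmits it; it is only
// removed once the peer's application-level ack arrives.
func (s *Sender) Send(conn net.Conn, m Message, encode func(Message) []byte) error {
    s.pending[m.ID] = m
    if _, err := conn.Write(encode(m)); err != nil {
        return err // message stays pending; resend it after reconnecting
    }
    return nil
}

// HandleAck removes a message once the peer has confirmed processing it.
func (s *Sender) HandleAck(id uint64) {
    delete(s.pending, id)
}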

How does TCP connection terminate if one of the machine dies?

If a TCP connection is established between two hosts (A & B), and let's say host A has sent 5 octets to host B, and then host B crashes (for an unknown reason).
The host A will wait for acknowledgments, but on not getting them, will resend octets and also reduce the sender window size.
This will repeat a couple of times until the window size shrinks to zero because of packet loss. My question is, what will happen next?
In this case, TCP eventually times out waiting for the ACKs and returns an error to the application. The application has to read/recv from the TCP socket to learn about that error; a subsequent write/send call will fail as well. Up until the point that TCP determines that the connection is gone, write/send calls will not fail; they'll succeed as seen from the application, or block if the socket buffer is full.
In the case where your host B vanishes after it has sent its ACKs, host A will not learn about that until it sends something to B, which will eventually also time out, or result in an ICMP error. (Typically the first write/send call will not fail, as TCP will not fail the connection immediately, and keep in mind that write/send calls do not wait for ACKs before they complete.)
Note also that retransmission does not reduce the window size.
Please follow this link
Now, a very simple answer to your question, in my view: the connection will time out and be closed. Another possibility is that some ICMP error might be generated due to the unresponsive machine.
Also, if the crashed machine comes online again, then the procedure described in the link I pasted above will be observed.
It depends on the OS implementation. In short, it will wait for ACKs and resend packets until it times out. Then your connection will be torn down. To see exactly what happens in Linux, look here; other OSes follow a similar algorithm.
In your case, a FIN will be generated (by the surviving node) and the connection will eventually migrate to the CLOSED state. If you keep grep-ing the netstat output for the destination IP address, you will watch the migration from the ESTABLISHED state to TIME_WAIT and then finally disappear.
In your case, this will happen since TCP keeps a timer to get the ACK for the packet it has sent. This timer is not very long, so detection will happen pretty quickly.
However, if machine B dies after A gets the ACK, and after that A doesn't send anything, then the above timer can't detect that event; another timer (called the idle timeout) will detect that condition and the connection will close then. That timeout period is high by default. But normally this is not the case: machine A will try to send something in between and will detect the error condition in the send path.
In short, TCP is smart enough to close the connection by itself (and let the application know about it) except for one case (the idle timeout, which by default is very high).
In normal cases, each side terminates its end of the connection by sending a special message with the FIN (finish) bit set.
The device receiving this FIN responds with an acknowledgement to indicate that the FIN was received.
The connection as a whole is not considered terminated until both devices complete the shutdown procedure by sending a FIN and receiving an acknowledgement.

Behavior of shutdown(sock, SHUT_RD) with TCP

When using a TCP socket, what does
shutdown(sock, SHUT_RD);
actually do? Does it just make all recv() calls return an error code? If so, which error code?
Does it cause any packets to be sent by the underlying TCP connection? What happens to any data that the other side sends at this point - is it kept, and the window size of the connection keeps shrinking until it gets to 0, or is it just discarded, and the window size doesn't shrink?
Shutting down the read side of a socket will cause any blocked recv (or similar) calls to return 0 (indicating graceful shutdown). I don't know what will happen to data currently traveling up the IP stack. It will most certainly ignore data that is in-flight from the other side. It will not affect writes to that socket at all.
In fact, judicious use of shutdown is a good way to ensure that you clean up as soon as you're done. An HTTP client that doesn't use keepalive can shut down the write side as soon as it is done sending the request, and a server that sees Connection: close can likewise shut down the read side as soon as it is done receiving the request. This will cause any further erroneous activity to be immediately obvious, which is very useful when writing protocol-level code.
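In Go, the half-close calls on a *net.TCPConn are CloseWrite and CloseRead. A minimal sketch of the HTTP-like pattern described above, assuming the io and net packages; the fetchOnce name and the raw request/response handling are simplified for illustration:
func fetchOnce(conn *net.TCPConn, request []byte) ([]byte, error) {
    if _, err := conn.Write(request); err != nil {
        return nil, err
    }
    // Done sending: half-close the write side (this sends FIN), the Go
    // analogue of shutdown(sock, SHUT_WR). The read side stays usable.
    if err := conn.CloseWrite(); err != nil {
        return nil, err
    }
    // Read the whole response until the peer closes its write side (io.EOF).
    return io.ReadAll(conn)
}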
Looking at the Linux source code, shutdown(sock, SHUT_RD) doesn't seem to cause any state changes to the socket. (Obviously, shutdown(sock, SHUT_WR) causes FIN to be set.)
I can't comment on the window size changes (or lack thereof). But you can write a test program to see. Just make your inetd run a chargen service, and connect to it. :-)
shutdown(, SHUT_RD) does not have any counterpart in the TCP protocol, so it is pretty much up to the implementation how to behave when someone writes to a connection whose other side has indicated that it will not read, or when you try to read after you declared that you won't.
At a slightly lower level, it is helpful to remember that a TCP connection is a pair of flows over which the peers send data until they declare that they are done (via SHUT_WR, which sends FIN), and these two flows are quite independent.
I tested shutdown(sock, SHUT_RD) on Ubuntu 12.04. I found that when you call shutdown(sock, SHUT_RD) and there is no data of any kind (including FIN) in the TCP buffer, the subsequent read call will return 0 (indicating end of stream). But if some data arrived before or after the shutdown call, read will proceed normally as if shutdown had not been called. It seems that shutdown(sock, SHUT_RD) doesn't cause any TCP state changes on the socket.
It has two effects, one of them platform-dependent.
recv() will return zero, indicating end of stream.
Any further writes to the connection by the peer will either be (a) silently thrown away by the receiver (BSD), (b) be buffered by the receiver and eventually cause send() to block or return -1/EAGAIN/EWOULDBLOCK (Linux), or (c) cause the receiver to send an RST (Windows).
shutdown(sock, SHUT_RD) causes any writer to the socket to receive a SIGPIPE signal.
Any further reads using the read system call will return a -1 and set errno to EINVAL.
The use of recv will return a -1 and set errno to indicate the error (probably ENOTCONN or ENOTSOCK).

Resources