How do I detect tcp-client disconnect with gen_tcp? - tcp

I'am trying to use gen_tcp module.
There is example of server-side code, which I have troubles.
%% First, I bind server port and wait for peer connection
{ok, Sock} = gen_tcp:listen(7890, [{active, false}]),
{ok, Peer} = gen_tcp:accept(Sock),
%% Here client calls `gen_tcp:close/1` on socket and goes away.
%% After that I am tryin' send some message to client
SendResult = gen_server:send(Peer, <<"HELLO">>),
%% Now I guess that send is failed with {error, closed}, but...
ok = SendResult.
When I call gen_tcp:send/2 again, second call wil return {error, closed} as expected. But I want to understand, why first call succeeded? Am I missing some tcp-specific details?
This strange (for me) behavior is only for {active, false} connection.

In short, the reason for this is that there's no activity on the socket that can determine that the other end has closed. The first send appears to work because it operates on a socket that, for all intents and purposes, appears to be connected and operational. But that write activity determines that the other end is closed, which is why the second send fails as expected.
If you were to first read or recv from the socket, you'd quickly learn the other end was closed. Alternatively, if the socket were in an Erlang active mode, then you'd also learn of the other end closing because active modes poll the socket.
Aside from whether the socket is in an active mode or not, this has nothing to do with Erlang. If you were to write C code directly to the sockets API, for example, you'd see the same behavior.

Related

How to query TCP connection state in go?

On the client side of a TCP connection, I am attempting to to reuse established connections as much as possible to avoid the overhead of dialing every time I need a connection. Fundamentally, it's connection pooling, although technically, my pool size just happens to be one.
I'm running into a problem in that if a connection sits idle for long enough, the other end disconnects. I've tried using something like the following to keep connections alive:
err = conn.(*net.TCPConn).SetKeepAlive(true)
if err != nil {
fmt.Println(err)
return
}
err = conn.(*net.TCPConn).SetKeepAlivePeriod(30*time.Second)
if err != nil {
fmt.Println(err)
return
}
But this isn't helping. In fact, it's causing my connections to close sooner. I'm pretty sure this is because (on a Mac) this means the connection health starts being probed after 30 seconds and then is probed at 8 times at 30 second intervals. The server side must not be supporting keepalive, so after 4 minutes and 30 seconds, the client is disconnecting.
There might be nothing I can do to keep an idle connection alive indefinitely, and that would be absolutely ok if there were some way for me to at least detect that a connection has been closed so that I can seamlessly replace it with a new one. Alas, even after reading all the docs and scouring the blogosphere for help, I can't find any way at all in go to query the state of a TCP connection.
There must be a way. Does anyone have any insight into how that can be accomplished? Many thanks in advance to anyone who does!
EDIT:
Ideally, I'd like to learn how to handle this, low-level with pure go-- without using third-party libraries to accomplish this. Of course if there is some library that does this, I don't mind being pointed in its direction so I can see how they do it.
The socket api doesn't give you access to the state of the connection. You can query the current state it in various ways from the kernel (/proc/net/tcp[6] on linux for example), but that doesn't make any guarantee that further sends will succeed.
I'm a little confused on one point here. My client is ONLY sending data. Apart from acking the packets, the server sends nothing back. Reading doesn't seem an appropriate way to determine connection status, as there's noting TO read.
The socket API is defined such that that you detect a closed connection by a read returning 0 bytes. That's the way it works. In Go, this is translated to a Read returning io.EOF. This will usually be the fastest way to detect a broken connection.
So am I supposed to just send and act on whatever errors occur? If so, that's a problem because I'm observing that I typically do not get any errors at all when attempting to send over a broken pipe-- which seems totally wrong
If you look closely at how TCP works, this is the expected behavior. If the connection is closed on the remote side, then your first send will trigger an RST from the server, fully closing the local connection. You either need to read from the connection to detect the close, or if you try to send again you will get an error (assuming you've waited long enough for the packets to make a round trip), like "broken pipe" on linux.
To clarify... I can dial, unplug an ethernet cable, and STILL send without error. The messages don't get through, obviously, but I receive no error
If the connection is actually broken, or the server is totally unresponsive, then you're sending packets off to nowhere. The TCP stack can't tell the difference between packets that are really slow, packet loss, congestion, or a broken connection. The system needs to wait for the retransmission timeout, and retry the packet a number of times before failing. The standard configuration for retries alone can take between 13 and 30 minutes to trigger an error.
What you can do in your code is
Turn on keepalive. This will notify you of a broken connection more quickly, because the idle connection is always being tested.
Read from the socket. Either have a concurrent Read in progress, or check for something to read first with select/poll/epoll (Go usually uses the first)
Set timeouts (deadlines in Go) for everything.
If you're not expecting any data from the connection, checking for a closed connection is very easy in Go; dispatch a goroutine to read from the connection until there's an error.
notify := make(chan error)
go func() {
buf := make([]byte, 1024)
for {
n, err := conn.Read(buf)
if err != nil {
notify <- err
return
}
if n > 0 {
fmt.Println("unexpected data: %s", buf[:n])
}
}
}()
There is no such thing as 'TCP connection state', by design. There is only what happens when you send something. There is no TCP API, at any level down to the silicon, that will tell you the current state of a TCP connection. You have to try to use it.
If you're sending keepalive probes, the server doesn't have any choice but to respond appropriately. The server doesn't even know that they are keepalives. They aren't. They are just duplicate ACKs. Supporting keepalive just means supporting sending keepalives.

Erlang: exit/shutdown synchronous or asynchronous?

You have a process tree you want to kill, so you send an exit(PID, shutdown) to the supervisor. There's other stuff you need to do, but it can't be done until this process tree is shutdown. For instance, let's say this process tree writes to a database. You want to shut everything down cleanly. You want to shut down the database, but obviously you need to shut down the process tree first, else the tree could be in the middle of a write to the database.
My question is, when I send the exit signal, is it synchronous or asynchronous? If it is synchronous, it seems I have no worries, but if it is asynchronous, I will need to do something like establish a process monitor and check whether the tree shut down before I proceed with database shutdown, correct?
Thanks.
Short answer: OTP shutdown is synchronous. exit/2 is a single asynchronous message.
Long answer: All messages in Erlang are asynchronous. The shutdown message is no different. However, there is more to shutdown than just sending a message. The supervisor listens for {'DOWN', ...} messages after sending the exit signal. Only after it receives a 'DOWN' message or times out does it proceed, so in effect it is synchronous. Checkout the supervisor source code. On line 894 is where the functions that actually makes the exit call is defined:
shutdown(Pid, Time) ->
case monitor_child(Pid) of
ok ->
exit(Pid, shutdown), %% Try to shutdown gracefully
receive
{'DOWN', _MRef, process, Pid, shutdown} ->
ok;
{'DOWN', _MRef, process, Pid, OtherReason} ->
{error, OtherReason}
after Time ->
exit(Pid, kill), %% Force termination.
receive
{'DOWN', _MRef, process, Pid, OtherReason} ->
{error, OtherReason}
end
end;
{error, Reason} ->
{error, Reason}
end.
The source code can be viewed on GitHub here: https://github.com/erlang/otp/blob/maint/lib/stdlib/src/supervisor.erl#L894
erlang:exit/2 calls on the other hand is simply an asynchronous exit signal
If you need to manage this yourself, do your own monitoring:
sneak_attack(BankGuard) ->
monitor(process, BankGuard),
exit(BankGuard, kill),
Cash = receive {'DOWN', _, process, BankGuard, _} -> rob_bank() end,
send_to_bahamas(Cash).
In this example rob_bank() and anything after is blocked waiting on the 'DOWN' message from BankGuard.
Also, note that this is a much more general concept than just shutting something down. All messages in Erlang are asynchronous but unlike UDP, ordering (between two processes) and delivery (so long as the destination is alive) is guaranteed. So synchronous messaging is simply monitoring the target, sending a tagged message, and blocking on receipt of the return message.

TCP keep-alive to determine if client disconnected in netty

I'm trying to determine if a client has closed a socket connection from netty. Is there a way to do this?
On a usual case where a client closes the socket via close() and the TCP closing handshake has been finished successfully, a channelInactive() (or channelClosed() in 3) event will be triggered.
However, on an unusual case such as where a client machine goes offline due to power outage or unplugged LAN cable, it can take a lot of time until you discover the connection was actually down. To detect this situation, you have to send some message to the client periodically and expect to receive its response within a certain amount of time. It's like a ping - you should define a periodic ping and pong message in your protocol which practically does nothing but checking the health of the connection.
Alternatively, you can enable SO_KEEPALIVE, but the keepalive interval of this option is usually OS-dependent and I would not recommend using it.
To help a user implement this sort of behavior relatively easily, Netty provides ReadTimeoutHandler. Configure your pipeline so that ReadTimeoutHandler raises an exception when there's no inbound traffic for a certain amount of time, and close the connection on the exception in your exceptionCaught() handler method. If you are the party who is supposed to send a periodic ping message, use a timer (or IdleStateHandler) to send it.
If you are writing a server, and netty is your client, then your server can detect a disconnect by calling select() or equivalent to detect when the socket is readable and then call recv(). If recv() returns 0 then the socket was closed gracefully by the client. If recv() returns -1 then check errno or equivalent for the actual error (with few exceptions, most errors should be treated as an ungraceful disconnect). The thing about unexpected disconnects is that they can take a long time for the OS to detect, so you would have to either enable TCP keep-alives, or require the client to send data to the server on a regular basis. If nothing is received from the client for a period of time then just assume the client is gone and close your end of the connection. If the client wants to, it can then reconnect.
If you read from a connection that has been closed by the peer you will get an end-of-stream indication of some kind, depending on the API. If you write to such a connection you will get an IOException: 'connection reset'. TCP doesn't provide any other way of detecting a closed connection.
TCP keep-alive (a) is off by default and (b) only operates every two hours by default when enabled. This probably isn't what you want. If you use it and you read or write after it has detected that the connection is broken, you will get the reset error above,
It depends on your protocol that you use ontop of netty. If you design it to support ping-like messages, you can simply send those messages. Besides that, netty is only a pretty thin wrapper around TCP.
Also see this SO post which describes isOpen() and related. This however does not solve the keep-alive problem.

Erlang socket doesn't receive until the second setopts {active,once}

First I would like to apologize, I'm giving so much information to make it as clear as possible what the problem is. Please let me know if there's still anything which needs clarifying.
(Running erlang R13B04, kernel 2.6.18-194, centos 5.5)
I have a very strange problem. I have the following code to listen and process sockets:
%Opts used to make listen socket
-define(TCP_OPTS, [binary, {packet, raw}, {nodelay, true}, {reuseaddr, true}, {active, false},{keepalive,true}]).
%Acceptor loop which spawns off sock processors when connections
%come in
accept_loop(Listen) ->
case gen_tcp:accept(Listen) of
{ok, Socket} ->
Pid = spawn(fun()->?MODULE:process_sock(Socket) end),
gen_tcp:controlling_process(Socket,Pid);
{error,_} -> do_nothing
end,
?MODULE:accept_loop(Listen).
%Probably not relevant
process_sock(Sock) ->
case inet:peername(Sock) of
{ok,{Ip,_Port}} ->
case Ip of
{172,16,_,_} -> Auth = true;
_ -> Auth = lists:member(Ip,?PUB_IPS)
end,
?MODULE:process_sock_loop(Sock,Auth);
_ -> gen_tcp:close(Sock)
end.
process_sock_loop(Sock,Auth) ->
try inet:setopts(Sock,[{active,once}]) of
ok ->
receive
{tcp_closed,_} ->
?MODULE:prepare_for_death(Sock,[]);
{tcp_error,_,etimedout} ->
?MODULE:prepare_for_death(Sock,[]);
%Not getting here
{tcp,Sock,Data} ->
?MODULE:do_stuff(Sock,Data);
_ ->
?MODULE:process_sock_loop(Sock,Auth)
after 60000 ->
?MODULE:process_sock_loop(Sock,Auth)
end;
{error,_} ->
?MODULE:prepare_for_death(Sock,[])
catch _:_ ->
?MODULE:prepare_for_death(Sock,[])
end.
This whole setup works wonderfully normally, and has been working for the past few months. The server operates as a message passing server with long-held tcp connections, and it holds on average about 100k connections. However now we're trying to use the server more heavily. We're making two long-held connections (in the future probably more) to the erlang server and making a few hundred commands every second per each of those connections. Each of those commands, in the common case, spawn off a new thread which will probably make some kind of read from mnesia, and send some messages based on that.
The strangeness comes when we try to test those two command connections. When we turn on the stream of commands, any new connection has about 50% chance of hanging. For instance, using netcat if I were to connect and send along the string "blahblahblah" the server should immediately return back an error. In doing this it won't make any calls outside the thread (since all it's doing is trying to parse the command, which will fail because blahblahblah isn't a command). But about 50% of the time (when the two command connections are running) typing in blahblahblah results in the server just sitting there for 60 seconds before returning that error.
In trying to debug this I pulled up wireshark. The tcp handshake always happens immediately, and when the first packet from the client (netcat) is sent it acks immediately, telling me that the tcp stack of the kernel isn't the bottleneck. My only guess is that the problem lies in the process_sock_loop function. It has a receive which will go back to the top of the function after 60 seconds and try again to get more from the socket. My best guess is that the following is happening:
Connection is made, thread moves on to process_sock_loop
{active,once} is set
Thread receives, but doesn't get data even though it's there
After 60 seconds thread goes back to the top of process_sock_loop
{active, once} is set again
This time the data comes through, things proceed as normal
Why this would be I have no idea, and when we turn those two command connections off everything goes back to normal and the problem goes away.
Any ideas?
it's likely that your first call to set {active,once} is failing due to a race condition between your call to spawn and your call to controlling_process
it will be intermittent, likely based on host load.
When doing this, I'd normally spawn a function that blocks on something like:
{take,Sock}
and then call your loop on the sock, setting {active,once}.
so you'd change the acceptor to spawn, set controlling_process then Pid ! {take,Sock}
something to that effect.
note: I don't know if the {active,once} call actually throws when you aren't the controlling processes, if it doesn't, then what I just said makes sense.

Behavior of shutdown(sock, SHUT_RD) with TCP

When using a TCP socket, what does
shutdown(sock, SHUT_RD);
actually do? Does it just make all recv() calls return an error code? If so, which error code?
Does it cause any packets to be sent by the underlying TCP connection? What happens to any data that the other side sends at this point - is it kept, and the window size of the connection keeps shrinking until it gets to 0, or is it just discarded, and the window size doesn't shrink?
Shutting down the read side of a socket will cause any blocked recv (or similar) calls to return 0 (indicating graceful shutdown). I don't know what will happen to data currently traveling up the IP stack. It will most certainly ignore data that is in-flight from the other side. It will not affect writes to that socket at all.
In fact, judicious use of shutdown is a good way to ensure that you clean up as soon as you're done. An HTTP client that doesn't use keepalive can shutdown the write-side as soon as it is done sending the request, and a server that sees Connection: closed can likewise shutdown the read-side as soon as it is done receiving the request. This will cause any further erroneous activity to be immediately obvious, which is very useful when writing protocol-level code.
Looking at the Linux source code, shutdown(sock, SHUT_RD) doesn't seem to cause any state changes to the socket. (Obviously, shutdown(sock, SHUT_WR) causes FIN to be set.)
I can't comment on the window size changes (or lack thereof). But you can write a test program to see. Just make your inetd run a chargen service, and connect to it. :-)
shutdown(,SHUT_RD) does not have any counterpart in TCP protocol, so it is pretty much up to implementation how to behave when someone writes to a connection where other side indicated that it will not read or when you try to read after you declared that you wont.
On slightly lower level it is beneficial to remember that TCP connection is a pair of flows using which peers send data until they declare that they are done (by SHUT_WR which sends FIN). And these two flows are quite independent.
I test shudown(sock,SHUT_RD) on Ubuntu 12.04. I find that when you call shutdown(sock,SHUT_RD) if there are no any type of data(include FIN....) in the TCP buffer, the successive read call will return 0(indicates end of stream). But if there are some data which arrived before or after shutdown function, read call will process normally as if shutdown function was not called. It seems that shutdown(sock,SHUT_RD) doesn't cause any TCP states changed to the socket
It has two effects, one of them platform-dependent.
recv() will return zero, indicating end of stream.
Any further writes to the connection by the peer will either be (a) silently thrown away by the receiver (BSD), (b) be buffered by the receiver and eventually cause send() to block or return -1/EAGAIN/EWOULDBLOCK (Linux), or (c) cause the receiver to send an RST (Windows).
shutdown(sock, SHUT_RD) causes any writer to the socket to receive a sigpipe signal.
Any further reads using the read system call will return a -1 and set errno to EINVAL.
The use of recv will return a -1 and set errno to indicate the error (probably ENOTCONN or ENOTSOCK).

Resources