TCP connections terminate after some time on 4G hotspot

I am using Ubuntu 20 and am connected over Wi-Fi to the hotspot of a Galaxy A20e running Android 12, which is receiving 4G. I have a connection problem that does not occur when using my ADSL connection.
Test description: I opened a dozen shells and established SSH connections to a dozen servers, then just waited without interacting with the shells.
After some time (usually 5 to 10 minutes), I observe a "client_loop: send disconnect: Broken pipe" message from ssh in some of the shells; then, after a longer time, I observe the same message in other shells, until after maybe one hour all connections are gone.
Using Wireshark, I filtered on packets exchanged with one of the servers and I see no packets exchanged except the keep-alive probe sent from my computer every 5 minutes (which conforms to my SSH client configuration). At some point, in response to one of those probes, the server returns RST, ACK, which means the connection has been interrupted.
I tried to find the root cause of the connection cut by using pings, but I did not find anything special during the 5-minute window in which the cut happens. The only thing I noticed is that some pings are occasionally not received.
Any ideas?
Thanks

This question belongs on https://serverfault.com/
Try adding this to your .ssh/config:
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 5
Most probably your 4G router or your 4G operator terminates idle connections.

Related

Nodemcu: UDP communication gets influenced by whether there is a UDP listener?

I have a weird problem with the nodemcu firmware (2.1.0) on an ESP8266, and I'm running out of ideas for what else I could try to solve it.
I have a simple Lua script running which listens on UDP for commands to switch a relay on and off, and sends alive messages via UDP every 60 seconds to a defined IP address.
If nothing is listening on the server side for the UDP "alive" messages, the ESP reacts fine; all good.
As soon as I start netcat to listen for the UDP packets coming from the ESP, the ESP starts to hang every couple of minutes for at least 30-60 seconds.
It is especially confusing because UDP is supposed to be a connectionless protocol. How can a listener on UDP influence the behavior of the sender?
These are the relevant parts of the lua script:
[...]
alive=60000  -- alive-message interval in milliseconds
[...]
-- send a status message (with our IP appended) to the configured server
function srvupd(s)
  if (connected==1) then
    s = s.." "..ip
    srv:send(serverport, serveradr, s.."\n")
  end
end
-- timer 2 fires every 'alive' milliseconds and sends the alive message
if (alive>0) then
  tmr.alarm(2, alive, 1, function()
    srvupd("alive")
  end)
end
srv=net.createUDPSocket()
srv:listen(80)
-- close and re-open the socket after every send
srv:on("sent", function()
  srv:close()
  srv:listen(80)
end)
-- handle incoming commands; pause the alive timer while processing
srv:on("receive", function(client, request, port, ip)
  if (alive>0) then tmr.stop(2) end
  print(string.format("received '%s' from %s:%d", request, ip, port))
  buf="unknown"
  if (request == "ch1on") then gpio.write(relay1, relayon); buf="ok" end
  [...]
  client:send(port, ip, buf)
  if (alive>0) then tmr.start(2) end
end)
And this is how I use netcat to listen to the UDP messages from the ESP in a bash script:
#!/bin/bash
while true
do
    msg=$(netcat -4 -u -n -l -D 192.168.0.5 2701 -w0 -q0)
    echo -e "$msg"
done
In the situation where the ESP no longer reacts to UDP commands, the alive messages are still being sent every minute. The UDP commands are even received by the ESP, because as soon as processing continues, a "channel-on" command sent some time ago gets executed.
These temporary blockings of the ESP only happen when I listen to its UDP messages.
I've checked all kinds of combinations, like separate UDP sockets for the listener and the alive sender on the ESP, closing and reopening the server after a message was sent (as in the current version above), etc.
I've even tried receiving commands via TCP and only sending the alive messages via UDP.
The behaviour remains the same. Everything works as long as nothing is receiving the UDP messages from the ESP. As soon as I start netcat, the ESP starts to hang within a couple of minutes.
Any ideas? Since this is UDP, it is already difficult to understand how it can happen at all.
kind regards
Tjareson
The issue has been solved in the meantime. A friend of mine pointed me to the only common basis for the UDP issue, which is ARP.
The behaviour only occurred when the ESP was in a different network than the UDP listener (like 192.168.1.x and 192.168.5.y).
Even if it remains a bit unclear, the guess is that netcat makes an ARP request when receiving a message and that this somehow isn't processed properly by the router when it takes place between two different networks.
After putting the listener bash script in the same network (basically by giving the Raspberry Pi it runs on a second IP in the network the ESP is in), the blocked ESP communication didn't happen again.

Creating a UDP server

I am trying to create a UDP server and client. How can I check whether a client is "connected" or has disconnected, given that UDP is not really a connection? How do multiplayer games do it?
Have you ever played a first-person game and lost your internet connection at some point? You should have gotten a message with a countdown before it automatically disconnected you (in such games the countdown to a timeout is typically shown in the top right).
With UDP, the "connection" is maintained by continuously sending and receiving packets. So, as an example, if the server hasn't received anything from a client for 30 seconds, it can presume the client disconnected without letting the server know (disconnected by timeout).
The implementation is usually quite straightforward: use a timer and reset it each time you receive a packet; if the timer exceeds TIMEOUT_VALUE, disconnect that client, as in the sketch below.
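As a rough illustration of that timer idea, here is a minimal sketch in Python (not from the question; the port and timeout value are arbitrary):
import socket
import time

TIMEOUT = 30.0  # seconds of silence before we declare a client gone

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))   # arbitrary port for this sketch
sock.settimeout(1.0)           # wake up once a second to run the timeout sweep

last_seen = {}                 # client address -> time of last packet

while True:
    try:
        data, addr = sock.recvfrom(1024)
        last_seen[addr] = time.time()  # any packet resets the client's timer
        # ... handle the packet (game input, heartbeat, etc.) ...
    except socket.timeout:
        pass  # no packet this second; fall through to the sweep
    now = time.time()
    for addr in [a for a, t in last_seen.items() if now - t > TIMEOUT]:
        print(addr, "disconnected by timeout")
        del last_seen[addr]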

Linux Doesn't Respond to SYN on ESTABLISHED connection

So I have a remote device using a Lantronix XPort module connecting to a VPS. They establish a TCP connection and everything is great; the server ACKs everything.
At some point the remote device stops transmitting data. 30 seconds go by.
The device then starts sending SYN packets as if trying to establish a new connection. The device is configured to maintain a connection to the server, and it always uses the same source port. (I realize this is bad, but it is hard for me to change.)
The server sees a SYN packet from the same (source IP, source port), so the server thinks the connection is ESTABLISHED. The server does not respond to the SYN packet.
Why does the server not respond with an ACK as described in Figure 10 of RFC 793? ( https://www.ietf.org/rfc/rfc793.txt )
How can I get the server to kill the connection or respond with an ACK?
It could be that, during that 30-second silence, the device is waiting for an ACK from the server, and that ACK was dropped somewhere along the line. In that case, I think it should retransmit.
The server is running Ubuntu with kernel 3.12.9-x86_64-linode37.
Thank you for any help!
My first suggestion is to change the client to reuse the same connection, or to gracefully close the connection before re-opening it.
As you do NOT have control over the client and all you can do is on the server, you can try this:
Configure keep-alive to be sent after 10 seconds of silence and to probe only once. If the client does not respond, the server closes the connection. By doing this, the server should be back in listening mode within roughly 10 seconds of silence from an unresponsive client. Note that the idle time before the first probe is controlled by tcp_keepalive_time, so that has to be lowered as well, and that these settings only apply to sockets that enable SO_KEEPALIVE. You can play with the following sysctls and arrive at optimal values.
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 1
======
Also, regarding the missing ACK that you mention in your question: TCP takes care of those things. 30 seconds is far too long a time for the first retransmission from the sender. If the client/device does not get an ACK, it will/should not wait 30 seconds and then try to open a new connection. If you are seeing that, there is an insane TCP stack at the client. What is that device, and which OS/TCP stack is it using?
The behavior depends on the kernel version: kernel 3.12.9-x86_64 ignores the SYN packet on an ESTABLISHED connection, but on kernel 4.9.0 the server answers with an ACK, the client receives that ACK and responds with an RST, and then sends a new SYN.
See also: incoming-tcp-syns-possibilities and TCP packets ignored on ESTABLISHED connection.

Should I send keep-alive packets when I use a TCP connection?

My question is: I have created a TCP connection, and when it goes without transferring any data for about an hour, it gets disconnected from the server, but I am not notified that it's disconnected. Should I send keep-alive packets to the server, should I send them from the server to the client, or both?
Yes, you should. A few days ago I created a TCP socket/server application and I had the same problem. I fixed it by sending keep-alive packets.
If you send keep-alive packets, your problem will disappear.
I've heard some people say that the OS will send keep-alive packets for you. I am not very familiar with this, but sending keep-alive packets explicitly worked for me.
it do not notify me that it's disconnected
It can't. There is no means by which it could do so, other than inducing an end-of-stream or a 'connection reset' the next time you try an I/O.
Should I send keepalive packets ...
If you have this problem, yes. In general, probably not.
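If you do send them, the simplest portable approach is an application-level heartbeat rather than relying on TCP keep-alive. A minimal client-side sketch in Python, assuming a hypothetical PING/PONG protocol where the server answers each PING with a PONG (host, port, and message format are placeholders):
import socket

# Connect to a hypothetical server for this sketch.
s = socket.create_connection(("example.com", 9000))
s.settimeout(10.0)  # give the peer 10 seconds to answer each probe

def peer_alive(sock):
    """Send one heartbeat and report whether the peer answered."""
    try:
        sock.sendall(b"PING\n")
        return sock.recv(16) == b"PONG\n"
    except (socket.timeout, OSError):
        return False  # timeout or connection reset: treat the peer as dead
Call peer_alive() from a timer (for example once a minute) and close the socket when it returns False; the OS-level alternative is the SO_KEEPALIVE socket option.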

TCP Socket no connection timeout

I open a TCP socket and connect it to another socket somewhere else on the network. I can then successfully send and receive data. I have a timer that sends something to the socket every second.
I then rudely interrupt the connection by forcibly breaking it (pulling out the Ethernet cable in this case). My socket still reports that it is successfully writing data out every second. This continues for approximately one hour and 30 minutes, at which point a write error is finally raised.
What determines this timeout, after which the socket finally accepts that the other end has disappeared? Is it the OS (Ubuntu 11.04), the TCP/IP specification, or a socket configuration option?
Pulling the network cable will not break a TCP connection (1), though it will disrupt communications. You can plug the cable back in and, once IP connectivity is re-established, all the backed-up data will move. This is what makes TCP reliable, even on cellular networks.
When TCP sends data, it expects an ACK in reply. If none comes within some amount of time, it retransmits the data and waits again. The time it waits between transmissions generally increases exponentially.
After some number of retransmissions, or some amount of total time with no ACK, TCP considers the connection "broken". How many times or how long depends on your OS and its configuration, but it typically times out on the order of many minutes.
From Linux's tcp(7) man page:
tcp_retries2 (integer; default: 15; since Linux 2.2)
    The maximum number of times a TCP packet is retransmitted in
    established state before giving up. The default value is 15, which
    corresponds to a duration of approximately between 13 to 30 minutes,
    depending on the retransmission timeout. The RFC 1122 specified
    minimum limit of 100 seconds is typically deemed too short.
This is likely the value you'll want to adjust to change how long it takes to detect if your connection has vanished.
(1) There are exceptions to this. The operating system, upon noticing a cable being removed, could notify upper layers that all connections should be considered "broken".
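For intuition about where the "13 to 30 minutes" figure comes from, here is a back-of-the-envelope calculation in Python, assuming Linux's 200 ms minimum retransmission timeout (RTO), doubling on each retry and capped at 120 s:
# Each (re)transmission waits one RTO before the next attempt; the RTO
# doubles each time, starting near 200 ms and capped at 120 s on Linux.
rto, total = 0.2, 0.0
for _ in range(16):  # original send + 15 retransmissions (tcp_retries2 = 15)
    total += rto
    rto = min(rto * 2, 120.0)
print(total)  # ~924.6 seconds, i.e. roughly 15 minutes
With a larger measured RTT the initial RTO is larger, which is why the man page gives a range rather than a single number.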
If you want quick socket-error propagation to your application code, you may want to try this socket option:
TCP_USER_TIMEOUT (since Linux 2.6.37)
    This option takes an unsigned int as an argument. When the
    value is greater than 0, it specifies the maximum amount of
    time in milliseconds that transmitted data may remain
    unacknowledged before TCP will forcibly close the
    corresponding connection and return ETIMEDOUT to the
    application. If the option value is specified as 0, TCP will
    use the system default.
See the full description in linux/man/tcp(7). This option is more flexible than editing tcp_retries2 (you can set it on the fly, right after socket creation) and applies exactly to the situation where your client's socket is unaware of the server's state and may end up in a so-called half-open state.
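For example, from Python (a sketch; socket.TCP_USER_TIMEOUT is exposed on Linux with Python 3.6+, and the host and port here are placeholders):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Abort with ETIMEDOUT if sent data stays unacknowledged for 10 seconds.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10000)
s.connect(("example.com", 9000))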
TCP user timeout may work for your case: the TCP user timeout controls how long transmitted data may remain unacknowledged before the connection is forcibly closed.
There are three OS-dependent TCP keepalive parameters.
On Linux the defaults are:
tcp_keepalive_time: default 7200 seconds
tcp_keepalive_probes: default 9
tcp_keepalive_intvl: default 75 seconds
The total timeout is tcp_keepalive_time + (tcp_keepalive_probes * tcp_keepalive_intvl); with these defaults, 7200 + (9 * 75) = 7875 seconds.
To set these parameters on Linux:
sysctl -w net.ipv4.tcp_keepalive_time=1800 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=20
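These sysctls are system-wide. If you only want the shorter timeout for one application, the same three parameters can be overridden per socket on Linux; a sketch in Python:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)      # keepalive is off by default
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1800)  # per-socket tcp_keepalive_time
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 20)   # per-socket tcp_keepalive_intvl
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)      # per-socket tcp_keepalive_probes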
