I have configured HAProxy (1.5.4, but I also tried 1.5.14) to balance, in TCP mode, two servers exposing the AMQP protocol (WSO2 Message Broker) on port 5672.
The clients create and use permanent connections to the AMQP servers via HAProxy.
I've changed the client and server TCP keepalive timeout, setting net.ipv4.tcp_keepalive_time=120 (CentOS 7).
In HAProxy I've set timeout client/server to 200 seconds (longer than the 120-second keepalive interval) and used the option clitcpka.
Then I started Wireshark and sniffed all the TCP traffic: after the last request from the clients, the TCP keepalive packets are sent regularly every 120 seconds, but 200 seconds after the last client request the connections are closed (so the keepalive packets are effectively ignored).
Below is the configuration:
haproxy.conf
global
log 127.0.0.1 local3
maxconn 4096
user haproxy
group haproxy
daemon
debug
listen messagebroker_balancer 172.19.19.91:5672
mode tcp
log global
retries 3
timeout connect 5000ms
option redispatch
timeout client 200000ms
timeout server 200000ms
option tcplog
option clitcpka
balance leastconn
server s1 172.19.19.79:5672 check inter 5s rise 2 fall 3
server s2 172.19.19.80:5672 check inter 5s rise 2 fall 3
TCP keepalive operates at the transport layer. Its only purpose is to generate some traffic on the connection, so that intermediate systems like packet filters don't lose any state, and so that the end systems can notice if the connection to the other side broke (maybe because something crashed or a network cable was unplugged).
TCP keepalive has nothing to do with the application-level idle timeout, which you have set explicitly to 200s:
timeout client 200000ms
timeout server 200000ms
These timeouts get triggered if the connection is idle, that is, if no data gets transferred. TCP keepalive does not transport any data; the payload of these packets is empty.
The timeout client detects a dead client application on a responsive client OS. You can always have an application that occupies a connection but doesn't speak to you. This is bad because the number of connections isn't infinite (maxconn).
Similarly, set timeout server for the backend.
These options were for HAProxy talking to the application. Now, there is a completely separate check where OS talks to OS (without touching the app or HAProxy):
With option clitcpka, option srvtcpka, or option tcpka you allow the inactive connection to be detected and killed by the OS, even when HAProxy doesn't actively check it. This primarily needs OS settings (Linux).
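For example, a minimal sketch of enabling it in both directions in the listen section from the question (addresses taken from the config above):

listen messagebroker_balancer 172.19.19.91:5672
mode tcp
option clitcpka   # OS-level keepalive towards the clients
option srvtcpka   # OS-level keepalive towards the servers
timeout client 200000ms
timeout server 200000ms
server s1 172.19.19.79:5672 check inter 5s rise 2 fall 3
server s2 172.19.19.80:5672 check inter 5s rise 2 fall 3

The probe timing itself still comes from the OS settings that follow.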
If no data has been sent for 110 seconds, send the first keep-alive (KA) probe, but don't kill the connection yet:
sysctl net.ipv4.tcp_keepalive_time=110
Wait 30 seconds after each KA before sending the next, once they're enabled on this connection:
sysctl net.ipv4.tcp_keepalive_intvl=30
Allow 3 KAs to go unacknowledged, then kill the TCP connection:
sysctl net.ipv4.tcp_keepalive_probes=3
In this situation the OS kills the connection 200 seconds (110 + 3 × 30) after packets stop coming.
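Note these sysctl commands take effect immediately but are lost on reboot; a sketch of persisting them on CentOS 7 (the file name is an arbitrary choice):

# /etc/sysctl.d/90-tcp-keepalive.conf
net.ipv4.tcp_keepalive_time = 110
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3

Then reload with sysctl --system.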
Related
I know that TCP is connection oriented. But if I set up a forwarding server (a syslog server, for example) which forwards logs over TCP, is the connection always on, or is a connection established each time logs are forwarded to the server?
It depends on the server configuration.
If you are working on Linux, you can use the command
cat /proc/sys/net/ipv4/tcp_keepalive_time
to check your current keepalive value in seconds.
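The interval between probes and the probe count live next to it, for example:

cat /proc/sys/net/ipv4/tcp_keepalive_intvl
cat /proc/sys/net/ipv4/tcp_keepalive_probes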
Architecture:
We have a bunch of IoT devices connected via an AWS Network Load Balancer (NLB) to our backend servers.
This is a bidirectional channel (not a request response style, but messages passed from either party to the other).
Objective:
How to keep connections (on both sides of the NLB) alive during inactivity.
Description:
Frequently clients go into inactive mode and do not send (or receive) anything to (or from) the servers. If this state lasts longer than 350 seconds (the connection idle timeout value of NLBs), the LB silently kills the connection. This is bad, because we see a lot of RST packets everywhere.
Questions:
I'm aware of the SO_KEEPALIVE feature and can enable it on our backend servers. This keeps the connection between the backend servers and the NLB alive. But what about clients? Do NLBs forward TCP keep-alive packets to the other party? (Here it says they do not.) If they do not, how do we keep client connections open? (At the moment, I'm thinking of sending an empty message to keep the connection alive.)
Is this behavior specific to AWS NLBs, or do load balancers generally work this way?
The AWS docs say that the NLB TCP listener has the ability to keep a connection alive with TCP keep-alive packets: link
For TCP listeners, clients or targets can use TCP keepalive packets to reset the idle timeout.
Based on my tests, the client receives the TCP keep-alive packets sent by the server and correctly responds back.
The server doesn't interrupt the connection, which means it receives the response from the client.
This means that the NLB TCP listener actually forwards keep-alive packets.
Based on the same docs, the NLB TLS listener shouldn't react the same way to TCP keep-alive packets:
TCP keepalive packets are not supported for TLS listeners.
But actual test results surprised me when Wireshark showed keep-alive packets received on a client connected through a TLS listener.
My test results from 2 months ago don't correspond to what I'm experiencing now, and I think the behaviour may have changed.
(Previously the server kept the connection open even after the client became unavailable in an unexpected manner.)
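If you want to repeat this kind of test, Wireshark tags these probes during its TCP analysis, so a display filter along these lines should isolate them:

tcp.analysis.keep_alive || tcp.analysis.keep_alive_ack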
Not an answer, just to document what I found/did:
NLBs do not forward keep-alive packets, meaning you have to enable them on both the server and the clients.
The NLB's idle timeout cannot be changed; it's 350 seconds.
I couldn't find any way to forge an empty TCP packet to fool the LB into forwarding it to the other side.
In the end, we implemented the keep-alive feature at the application layer (sending an empty message to clients periodically).
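For illustration, a minimal Python sketch of that approach (the 50-second period and the single newline byte are assumptions, not values from our deployment; pick anything comfortably below the NLB's 350-second idle timeout that your protocol can ignore):

import socket
import threading
import time

HEARTBEAT_INTERVAL = 50  # seconds; assumed value, well under the NLB's 350s idle timeout

def keep_alive(conn: socket.socket) -> None:
    # Periodically send an application-level "empty" message so the NLB sees traffic.
    while True:
        time.sleep(HEARTBEAT_INTERVAL)
        try:
            conn.sendall(b"\n")  # any byte sequence the client-side protocol ignores
        except OSError:
            return  # connection is gone; stop the heartbeat

# For each accepted client connection:
# threading.Thread(target=keep_alive, args=(conn,), daemon=True).start()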
Consider a scenario where there exists one server and multiple clients, and each client creates TCP connections to interact with the server. There are three usages of TCP keepalive:
Server-side keepalive: The server sends TCP keepalive to make sure that the client is alive. If the client is dead, the server closes the TCP connection to the client.
Client-side keepalive: Clients send TCP keepalive to prevent the server from closing the TCP connection to the client.
Both-side keepalive: Both server and clients send TCP keepalive as described in 1 and 2.
Which of the above usages of TCP keepalive are typical?
Actually, both server and client peers may use TCP keepalive. It is useful to ensure that the operating system will eventually release any resources associated with dead connections. Note that if a connection between two hosts gets lost because of some issue with a router between them, then both hosts have to independently detect that the connection is dead and clean up for themselves.
Now, each host will maintain a timer on each connection indicating when it last received a packet associated with that connection. A host will send a keepalive packet when that timer goes over a certain threshold, which is defined locally (that is, hosts do not exchange information about their own keepalive configuration). So the host with the lowest keepalive time will take the initiative of sending a keepalive packet to the other host. If the packet indeed goes through, the other host (that is, the one with the higher keepalive time) will respond to that packet and reset its own timer; therefore, the host with the higher keepalive time will never need to send a keepalive packet itself, unless the connection has indeed been lost.
Arguably, it could be said that servers are generally more aggressive about keepalive than client machines (that is, they will more often be configured with a lower keepalive time), because hanging connections often have undesirable effects on server software (for example, the software may accept a limited number of concurrent connections, or the server may fork a new process instance associated with each connection).
Server-side keepalive: The server sends TCP keepalive to make sure that the client is alive. If the client is dead, the server closes the TCP connection to the client.
If the client is dead, the server gets a 'connection reset' error, after which it should close the connection.
Client-side keepalive: Clients send TCP keepalive to prevent the server from closing the TCP connection to the client.
No. The client sends keepalive so that if the server is dead, the client will get a 'connection reset' error, after which it should close the connection.
Both-side keepalive
Both sides are capable of getting a 'connection reset' due to keepalive failure, as above.
Which of the above usages is typical?
Any of them, or none. If a peer is sending regularly it doesn't really need keepalive as well. It is therefore often of more use to a server than a client.
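On Linux, either peer can also opt in per socket instead of relying on the system-wide sysctls; a sketch in Python (the timing values are arbitrary examples):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # enable keepalive on this socket
# Linux-specific per-socket overrides of the net.ipv4.tcp_keepalive_* defaults:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 110)  # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # unacknowledged probes before the connection is killed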
I want to set up a load balancer for syslog-ng messages. Let's say several boxes are sending TCP 514 messages to the frontend interface of the HAProxy box (192.168.0.20), and there is one Graylog server to which those messages are passed (10.0.0.2).
The simplest possible config below doesn't work.
defaults
mode tcp
frontend main
bind 192.168.0.20:514
use_backend graylog
backend graylog
server graylog1 10.0.0.2:514
Tcpdump shows that HAProxy is sending an RST in response to incoming messages on 514. Shouldn't I see HAProxy listening on 514 with netstat?
An RST in response to a SYN packet means the port is not open for connections. Use the netstat utility to determine whether the ports are open. An RST can also be sent when one side wants to close an established connection for good.
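For example, either of these should show haproxy bound to port 514 once the frontend is up:

netstat -ltnp | grep 514
ss -ltnp | grep 514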
Here is a config that should work. You have to be root (or sudo) to bind to port 514 though.
defaults
mode tcp
timeout connect 5000ms
timeout client 50000ms
listen graylog
bind *:514
mode tcp
balance roundrobin
server graylog1 10.0.0.1:514
server graylog2 10.0.0.1:514
timeout connect 20s
timeout server 30s
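As an aside, instead of running HAProxy as root you can grant the binary the low-port capability (the binary path here is an assumption; adjust it to your install):

setcap 'cap_net_bind_service=+ep' /usr/sbin/haproxy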
I have a client and a server component. The server may be installed behind a firewall or load balancer. Many sites/forums suggest using the TCP keep-alive feature to avoid connection termination due to inactivity.
The question is whether the keep-alive message from the client will actually reach the server.
I tried to simulate the deployment using the tcptrace utility and found that the keep-alive messages do not reach the server, yet the client was still getting an ACK for each keep-alive message.
I am not sure whether all LBs/FWs work in the same manner.
Is keep-alive a good option to avoid connection termination due to inactivity on a socket when a firewall or load balancer is in the path?
The answer is, of course: "it depends".
Many firewalls and load balancers maintain separate frontend and backend TCP connections, e.g.:
client <-- TCP --> firewall/balancer <-- TCP --> server
For situations like this, using TCP keepalive will not work as you'd expect. Why not? TCP keepalive works for that TCP session only, and the keepalive probe packets are more like "administrative overhead" packets than data-bearing packets. This means that a) using TCP keepalive on the client end only keeps the TCP connection to the firewall/balancer alive, and b) the firewall/balancer does not "forward" those keepalive probe packets across to the backend connection.
So is using TCP keepalive useful? Yes. There are other types of proxies which work at lower layers in the OSI stack, and which do forward those packets; using TCP keepalive is good for keeping your idle connection alive through those types of network intermediaries.
If your client/server application uses a long-lived, possibly idle TCP connection through firewalls/balancers, the best way to ensure that that connection is not torn down (sometimes politely, e.g. with a RST packet sent by the firewall/balancer, sometimes silently) is to use a "ping" or "heartbeat" message at the application layer. (Think of this as an "application keepalive".) This is just some kind of message that is sent e.g. from the client to the server. A simple and effective technique is to have the client periodically send some bytes to the server, which the server echoes back to the client. The client knows which bytes it sent, and when it receives those same bytes back from the server, it knows that everything in the network path is still working as expected.
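A bare-bones Python sketch of that client-side technique (the 60-second period and the assumption that the server echoes the bytes back unchanged are illustrative, not a prescribed protocol):

import os
import socket
import time

PING_INTERVAL = 60  # seconds; assumed, must stay below the intermediary's idle timeout

def ping_loop(sock: socket.socket) -> None:
    # Application-level keepalive: send random bytes, expect the server to echo them back.
    while True:
        time.sleep(PING_INTERVAL)
        token = os.urandom(8)
        sock.sendall(token)
        echoed = sock.recv(len(token), socket.MSG_WAITALL)
        if echoed != token:
            raise ConnectionError("heartbeat echo mismatch; the network path is broken")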
Hope this helps!