Why can't I connect more than 8000 clients to MQTT brokers via HAProxy? - tcp

I am trying to establish 10k client connections (potentially 100k) with my 2 MQTT brokers using HAProxy as a load balancer.
I have a working simulator (using the Java Paho library) that can simulate 10k clients. On the same machine I run the 2 MQTT brokers in Docker. For the load balancer I'm using another machine running a virtual image of Ubuntu 16.04.
When I connect directly to an MQTT broker, those connections are established without a problem. However, when I go through HAProxy I only get around 8.8k connections, while the rest throw: Error at client{insert number here}: Connection lost (32109) - java.net.SocketException: Connection reset. When I connect the simulator directly to a broker (same machine), about 20k TCP connections open, but when I use the load balancer only 17k do. This leaves me thinking that the load balancer is causing the problem.
It is important to add that whenever I run the simulator I'm unable to use the browser (cannot connect to the internet). I haven't tested whether this affects only the browser, but could it mean that I am actually running out of ports or something similar, and the real issue here is not in the load balancer?
Here is my HAProxy configuration:
global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 500000
    ulimit-n 500000
    maxpipes 500000

defaults
    log global
    mode http
    timeout connect 3h
    timeout client 3h
    timeout server 3h

listen mqtt
    bind *:8080
    mode tcp
    option tcplog
    option clitcpka
    balance leastconn
    server broker_1 address:1883 check
    server broker_2 address:1884 check

listen stats
    bind 0.0.0.0:1936
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
This is what the MQTT broker shows for every successful/unsuccessful connection:
...
//Successful connection
1613382861: New connection from xxx:32850 on port 1883.
1613382861: New client connected from xxx:60974 as 356 (p2, c1, k1200, u'admin').
...
//Unsuccessful connection
1613382699: New connection from xxx:42861 on port 1883.
1613382699: Client <unknown> closed its connection.
...
And this is what ulimit -a shows on the LB machine.
core file size (blocks) (-c) 0
data seg size (kb) (-d) unlimited
scheduling priority (-e) 0
file size (blocks) (-f) unlimited
pending signals (-i) 102355
max locked memory (kb) (-l) 82000
max memory size (kb) (-m) unlimited
open files (-n) 500000
POSIX message queues (bytes) (-q) 819200
real-time priority (-r) 0
stack size (kb) (-s) 8192
cpu time (seconds) (-t) unlimited
max user processes (-u) 500000
virtual memory (kb) (-v) unlimited
file locks (-x) unlimited
Note: The LB process has the same limits.
I followed various tutorials and increased the open file limit as well as the port limit, TCP header size, etc. The number of connected users increased from 2.8k to about 8.5-9k (which is still way lower than the 300k the author of the tutorial had). The ss -s command shows about 17,000 TCP and inet connections.
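Roughly the kind of tuning those tutorials prescribe (a sketch of typical values, not necessarily exactly what I applied):
sysctl -w fs.file-max=500000                        # system-wide open file descriptor cap
sysctl -w net.ipv4.ip_local_port_range="1024 65535" # widen the ephemeral (source) port range
sysctl -w net.core.somaxconn=65535                  # bigger accept backlog on listening sockets
sysctl -w net.ipv4.tcp_max_syn_backlog=65535        # allow more half-open connections during the connect burst
ulimit -n 500000                                    # per-process open file limit in the shell that starts HAProxy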
Any pointers would greatly help!
Thanks!

You can't do a normal LB of MQTT traffic, as you can't "pin" the connection based on the MQTT topic. If you send a SUBSCRIBE to Broker1 for topic "test/blatt/#", but the next client PUBLISHes "test/blatt/foo" to Broker2, then if the two brokers are not bridged, your first subscriber will never get that message.
If your clients are terminating the TCP connection sometime after the CONNECT, or HAProxy is round-robining the packets between the two brokers, you will get errors like this. You need to somehow persist the connections, and I don't know how you do that with HAProxy. Non-free LBs like A10 Thunder or F5 LTM can persist TCP connections... but you still need the MQTT brokers bridged for it all to work.
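For what it's worth, HAProxy can pin clients to a backend by source IP with a stick table; a minimal, untested sketch along the lines of the question's listen section (this only matters across reconnects, since each MQTT client holds a single long-lived TCP connection):
listen mqtt
    bind *:8080
    mode tcp
    balance leastconn
    stick-table type ip size 200k expire 3h
    stick on src
    server broker_1 address:1883 check
    server broker_2 address:1884 check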

Turns out I was running out of resources on my computer.
I moved the simulator to another machine and managed to get 15k connections running. Due to resource limits I can't get more than that. The computer running the server side uses 20/32 GB of RAM, and the computer running the simulator used 32/32 GB for approximately 15k devices. Now I see why running both on the same computer is not an option.

Related

Data cost of keeping a tcp connection open

Let's suppose 2 computers:
The first is running a netcat server on a TCP port.
The second is running a netcat client, connected to the previous netcat server.
(netcat is an example; you can imagine a basic C program with a socket)
We can send data between the 2 computers.
Let's imagine nobody sends data for multiple days.
Is there a timeout in the TCP stack?
Does netcat (or the operating system) send some packets to keep the connection open?
What I want to know is how much data is sent if there is no top-level activity.
Thanks
Is there a timeout in the TCP stack?
There are many different timeouts in the TCP stack, depending on what state the connection is currently in and how it was configured (e.g. with keepalive or not). The idle connection timeout (which is what you refer to) does not seem to be defined. With keepalive the timeout is ~2 hours. That being said, pretty much every firewall in the world will set up some timeout. Based on this reddit thread, 15 minutes looks like a reasonable assumption, maybe even 1 hour. But multiple days? I doubt the connection will stay alive in any network (except your own).
Does netcat (or the operating system) send some packets to keep the connection open?
No. You will have to do it yourself by sending data. With the keepalive option for TCP, the OS will do it for you (note: keepalive is disabled by default), but this works between direct peers, i.e. it may fail when proxies are involved. Sending data is definitely the better approach.
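A minimal sketch of opting in at the application level (plain java.net.Socket rather than netcat; host and port are placeholders):
import java.net.Socket;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // "server.example" and 4000 stand in for the netcat server's address
        try (Socket s = new Socket("server.example", 4000)) {
            // Ask the OS to send TCP keepalive probes on this idle connection;
            // probe timing is still governed by kernel settings
            // (net.ipv4.tcp_keepalive_time / _intvl / _probes on Linux).
            s.setKeepAlive(true);
            s.getOutputStream().write("hello\n".getBytes());
        }
    }
}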

Troubleshoot RServe config option keep.alive

I am using RServe 1.7.3 on a headless RHEL 7.9 VM. On the client, I am using RserveCLI2.
On long running jobs, the TCP/IP connection becomes blocked by a firewall after 2 hours.
I came across the keep.alive configuration option, which has been available since RServe 1.7.2 (RServe News/Changelog).
The specs read:
added support for keep.alive configuration option - it is global to
all servers and if enabled the client sockets are instructed to keep
the connection alive by periodic messages.
I added the following to /etc/Rserv.conf:
keep.alive enable
but this does not prevent the connection from being blocked.
Unfortunately, I cannot run a network monitoring tool like Wireshark to monitor the traffic between client and server.
How could I troubleshoot this?
Some specific questions I have:
Is the path of the config file indeed /etc/Rserv.conf, as specified in the documentation for Rserve? Notice that it does not have a final e, unlike Rserve.
Does this behaviour depend on the RServe client in use, or is it completely handled at the socket level?
Can I inspect the runtime settings of RServe, to see if keep.alive is enabled?
We got this to work.
To summarize, we adjusted some kernel settings to make sure keep-alive packets are sent at shorter intervals, to prevent the connection from being deemed dead by network components.
This is how and why.
The keep.alive enable setting is in fact an instruction to the socket layer to periodically emit keep-alive packets from server to client. The client is expected to return an ACK on these packets. The behaviour is governed by three kernel-level settings, as explained in TCP Keepalive HOWTO - Using TCP keepalive under Linux:
tcp_keepalive_time (defaults to 7200 seconds)
tcp_keepalive_intvl (defaults to 75 seconds)
tcp_keepalive_probes (defaults to 9 times)
tcp_keepalive_time is the time that passes before the first keep-alive packet is sent after the TCP/IP connection is established. tcp_keepalive_intvl is the wait time between subsequent packets, and tcp_keepalive_probes is the number of consecutive unacknowledged packets after which the system decides the connection is dead.
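With those defaults a dead peer is only declared after 7200 + 9 × 75 = 7875 seconds, i.e. roughly 2 hours and 11 minutes, and nothing at all is probed during the first 2 hours of idle time.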
So the first keep-alive packet was only sent after 2 hours. By then, some network component had already decided the connection was dead, so the keep-alive packet never made it to the client and thus no ACK was ever sent.
We lowered both tcp_keepalive_time and tcp_keepalive_intvl to 600 seconds.
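A sketch of how such a change can be applied on RHEL 7 (the drop-in file name is an arbitrary example):
sysctl -w net.ipv4.tcp_keepalive_time=600    # runtime change, lost on reboot
sysctl -w net.ipv4.tcp_keepalive_intvl=600
printf 'net.ipv4.tcp_keepalive_time = 600\nnet.ipv4.tcp_keepalive_intvl = 600\n' > /etc/sysctl.d/99-keepalive.conf
sysctl --system                              # reload, so the values also survive a reboot via the drop-in file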
With tcpdump -i [interface] port 6311 we were able to monitor the keep-alive packets.
15:40:11.225941 IP <server>.6311 > <some node>.<port>: Flags [.], ack 1576, win 237, length 0
15:40:11.226196 IP <some node>.<port> > <server>.6311: Flags [.], ack 401, win 511, length 0
This continues until the results are sent back and the connection is closed; at least, it did for the 12 hours I tested.
So we use keep-alive here not to check for dead peers, but to prevent disconnection due to network inactivity, as discussed in TCP Keepalive HOWTO - 2.2. Why use TCP keepalive?. In that scenario you want low values for the keep-alive time and interval.
Note that these are kernel level settings, and thus are applied system-wide. We use a dedicated server, so this is no issue for us, but may be in other cases.
Finally, for completeness, I'll answer my own three questions.
The path of the configuration file is indeed /etc/Rserv.conf, as was confirmed by changing another setting (remote enable to remote disable).
This is handled at the socket level.
I am not sure, but using tcpdump shows that Rserve emits keep-alive packets, which is a more useful way to inspect what's happening.

What happens to a waiting WebSocket connection on a TCP level when server is busy (blocked)

I am load testing my WebSocket Tornado server, running on Ubuntu Server 14.04.
I am playing with a big client machine loading 60,000 users at 150 per second (that's what my small server can comfortably take). The client is a RedHat machine. When a load test suite finishes, I have to wait a few seconds before I can rerun it.
Within these few seconds, my WebSocket server is handling the closing of the 60,000 connections. I can see it in my Graphite dashboard (the server logs every connect and disconnect there).
I am also logging relevant outputs of the netstat -s and ss -s commands to my Graphite dashboard. When the test suite finishes, I can immediately see the TCP established count drop from 60,000 to ~0. Other socket states (closed, timewait, synrecv, orphaned) remain constant and very low. My client's sockets go to timewait for a short period and then that number goes to 0 too. When I immediately rerun the suite and all the TCP sockets on both ends are free, but the server has not finished processing the previous closing batch yet, I see no changes at the TCP socket level until the server is finished processing and starts accepting new connections again.
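For reference, counters of this kind can be sampled like so (ss -s and netstat -s are what I log; ss -ltn is just another view, not part of my dashboard):
ss -s                                      # totals per TCP state (established, timewait, ...)
netstat -s | grep -i -E 'listen|overflow'  # listen/accept-queue overflow and drop counters
ss -ltn                                    # Recv-Q of a LISTEN socket = connections waiting in its accept queue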
My question is: where is the information about the sockets waiting to be established stored (on RedHat and Ubuntu)? No counter/queue length that I am tracking shows this.
Thanks in advance.

Get rsyslog forwarding messages after remote server restart

I have syslog successfully forwarding logs to an upstream server like so:
$MainMsgQueueType LinkedList
$MainMsgQueueSize 10000
$MainMsgQueueDiscardMark 8000
$MainMsgQueueDiscardSeverity 1
$MainMsgQueueSaveOnShutdown off
$MainMsgQueueTimeoutEnqueue 0
$ActionQueueType LinkedList # in memory queue
$ActionQueueFileName fwdRule1 # unique name prefix for spool files
$ActionQueueSize 10000 # Only allow 10000 elements in the queue
$ActionQueueDiscardMark 8000 # Only allow 8000 elements in the queue before dropping msgs
$ActionQueueDiscardSeverity 1 # Discard Alert,Critical,Error,Warning,Notice,Info,Debug, NOT Emergency
$ActionQueueSaveOnShutdown off # do not save messages to disk on shutdown
$ActionQueueTimeoutEnqueue 0
$ActionResumeRetryCount -1 # infinite retries if host is down
$RepeatedMsgReduction off
*.* @@remoteserver.mynetwork.com:5544
On the remoteserver I have something that talks syslog and listens on that port. To test, I have a simple log client that logs 100 messages a second to syslog.
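(A stand-in for such a test client, purely for illustration since my actual client isn't shown here, can be as simple as a logger loop:)
while true; do
    for i in $(seq 1 100); do logger -t loadtest "test message $i"; done
    sleep 1
done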
This all works fine, and I have configured the queues above so that in the event that the remoteserver is unavailable, the queues start filling up, and then eventually messages get discarded, thus safeguarding syslog from blocking its logging clients.
When I stop the remote log sink on remoteserver:5544, rsyslog remains stable (queues filling up / full), and when I restart the remote log sink a while later, rsyslog detects the server again and reestablishes a TCP connection.
HOWEVER - rsyslog then only forwards 1 message to it, despite the queue holding many thousands of messages and the logging client continuing to log 100 messages a second.
How can I make rsyslog start forwarding messages again once it has detected that the remote server is back up (without restarting rsyslog)?
I am using rsyslog 4.6.2-2.
I am using, and want to use, TCP.
The problem, in case anybody comes across this, was that the work directory was set to:
$WorkDirectory /var/spool/rsyslog
And the above config does this:
$ActionQueueFileName fwdRule1
Even though it is supposed to be an in-memory queue. Because of this, when the queue reached 800 (bizarrely, not 8000), disk-assisted mode was activated and rsyslog attempted to write messages to /var/spool/rsyslog. This directory didn't exist. Randomly (hence a race condition and a bug in rsyslog must exist), after continually trying to open a queue file on disk in that directory, rsyslog got into a twisted state, gave up, and continued queueing messages in memory until it hit the 10,000 high mark. Restarting the downstream log server failed to make it recover.
Taking out all references to ActionQueueFileName and making sure the WorkDirectory exists fixed this issue.
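A sketch of the resulting fix (paths as in the question):
mkdir -p /var/spool/rsyslog    # make the work directory actually exist
# and/or remove this line so the action queue stays purely in memory:
# $ActionQueueFileName fwdRule1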

Tcp Socket Closed

I always thought that if you didn't implement a heartbeat, there was no way to know if one side of a TCP connection died unexpectedly. If the process was just killed on one side and didn't exit gracefully, there was no way for the socket to send a FIN or let the other side know that it was closed.
(See some of the comments here for example http://www.perlmonks.org/?node_id=566568 )
But there is a stock order server that I connect to that has a new "cancel all orders on disconnect" feature, which cancels live orders if the client disconnects. It works even when I kill the process on my end, and there is definitely no heartbeat from my app to it.
So how is it able to detect when I've killed the process? My app is running on Windows Server 2003 and the order server is on SUSE Linux Enterprise Server 10. Does Windows detect that the process associated with the socket is no longer alive and send the FIN?
When a process exits - for whatever reason - the OS will close the TCP connections it had open.
There are numerous other ways a TCP connection can go dead undetected:
someone yanks out a network cable in between.
the computer at the other end gets nuked.
a NAT gateway in between silently drops the connection.
the OS at the other end crashes hard.
the FIN packets get lost.
By enabling TCP keepalive, though, you'll detect it eventually - at least within a couple of hours.
It could be using TCP keepalive to check for dead peers:
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
As far as I know, the OS detects the process termination and closes all the file descriptors/sockets/handles the process was using. So there isn't a difference between "killing" the application and "gracefully terminating" it. Of course, the kernel itself must still be running (= PC turned on, wire connected...). But it is the OS's job to send the FIN and so on...
Also, if a host becomes unreachable (turned off, disconnected...), an intermediate gateway (or the client itself) may detect the event (e.g. loss of carrier, DHCP lease not renewed...) and reply to packets sent to the dead host with an ICMP error (host/network unreachable). This causes the peer's TCP connection to die, but it happens only if the client has some packet to send to the host.
