nginx connection limit

We had two nginx servers running perfectly at 1000 req/s total in front of three php5-fpm servers connected over TCP. We thought one nginx server would be sufficient and redirected all of our traffic to it, but that server could not serve more than 750 req/s. It has gigabit Ethernet and total traffic on it does not exceed 100 Mbit/s (Debian 6.0).
We could not find a reason, and after some googling it seemed it might be related to TCP issues. But it did not seem likely that we would need to change anything at this number of connections and this bandwidth (around 70 Mbit/s). Later we redirected half of our traffic back to the other nginx server and again reached 1000 req/s.
We have been looking at the nginx error and access logs. Is there any tool or file that could help us find the cause of the problem?

Most Linux distributions have 28232 ephemeral ports available. A server needs one ephemeral port for each connection in order to keep the primary port (e.g. HTTP port 80) free for new connections.
So it would seem that if the server is handling 1000 requests/sec for content generated by php5-fpm over TCP, you are allocating 2000 ports/sec. That is not really the case: it is likely 5% PHP and 95% static content (no port allocation), and IIRC nginx<->php-fpm keeps ports open for subsequent requests. There are lots of factors that can affect these numbers, but for argument's sake let's say 1000 port allocations/sec.
On the surface this does not seem like a problem, but by default ports are not immediately released and made available for new connections. There are various reasons for this behavior, and I highly recommend a thorough understanding of TCP before arbitrarily making changes detailed here (or anywhere else).
Primarily, a connection state called TIME_WAIT (the socket is waiting after close to handle packets still in the network; see the netstat man page) is what keeps ports from being released for reuse. On recent (all?) Linux kernels TIME_WAIT is hard-coded to 60 seconds, and according to RFC 793 a connection may stay in TIME_WAIT for up to four minutes!
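If you want to verify the 60-second figure: it is not a sysctl but a constant compiled into the kernel. With the kernel sources at hand, grep for TCP_TIMEWAIT_LEN in include/net/tcp.h; the definition looks roughly like this (the comment wording varies by version):
#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state */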
This means at least 1000 ports will be in use for at least 60 seconds. In the real world, you need to account for transit time, keep-alive requests (multiple requests reuse the same connection), and service ports (between nginx and the backend server). Let's arbitrarily knock it down to 750 ports/sec.
In ~37 seconds all your available ports will be used up (28232 / 750 = 37). That's a problem, because it takes 60 seconds to release a port!
To see all the ports in use, run ApacheBench (ab) or something similar that can generate the number of requests per second you are tuning for, then run:
root:~# netstat -n -t -o | grep timewait
You'll get output like (but many, many more lines):
tcp 0 0 127.0.0.1:40649 127.1.0.2:80 TIME_WAIT timewait (57.58/0/0)
tcp 0 0 127.1.0.1:9000 127.0.0.1:50153 TIME_WAIT timewait (57.37/0/0)
tcp 0 0 127.0.0.1:40666 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40650 127.1.0.2:80 TIME_WAIT timewait (57.58/0/0)
tcp 0 0 127.0.0.1:40662 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40663 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40661 127.1.0.2:80 TIME_WAIT timewait (57.61/0/0)
For a running total of allocated ports:
root:~# netstat -n -t -o | wc -l
If you're receiving failed requests, the number will be at/close to 28232.
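A handy one-liner (not from the original answer) to break that total down by TCP state while the benchmark runs:
root:~# netstat -n -t | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn
The TIME_WAIT line is the one to watch.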
How to solve the problem?
Increase the number of ephemeral ports from 28232 to 63976.
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
Allow Linux to reuse TIME_WAIT ports before the timeout expires.
sysctl -w net.ipv4.tcp_tw_reuse="1"
Add additional IP addresses.
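Note that sysctl -w changes are lost on reboot; to make the first two permanent, the usual approach is to put them in /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload:
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_tw_reuse = 1
root:~# sysctl -p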

Related

Troubleshoot RServe config option keep.alive

I am using RServe 1.7.3 on a headless RHEL 7.9 VM. On the client, I am using RserveCLI2.
On long-running jobs, the TCP/IP connection gets blocked by a firewall after 2 hours.
I came across the keep.alive configuration option, which has been available since RServe 1.7.2 (RServe News/Changelog).
The specs read:
added support for keep.alive configuration option - it is global to
all servers and if enabled the client sockets are instructed to keep
the connection alive by periodic messages.
I added the following to /etc/Rserv.conf:
keep.alive enable
but this does not prevent the connection from being blocked.
Unfortunately, I cannot run a network monitoring tool, like Wireshark, to monitor the traffic between client and server.
How could I troubleshoot this?
Some specific questions I have:
Is the path of the config file indeed /etc/Rserv.conf, as specified in Documentation for Rserve? Note that the file name does not have a final e, unlike Rserve.
Does this behaviour depend on the RServe client in use, or is this handled completely at the socket level?
Can I inspect the runtime settings of RServe, to see if keep.alive is enabled?
We got this to work.
To summarize, we adjusted some kernel settings to make sure keep-alive packets are sent at shorter intervals, to prevent the connection from being deemed dead by network components.
This is how and why.
The keep.alive enable setting is in fact an instruction to the socket layer to periodically emit keep-alive packets from server to client. The client is expected to return an ACK on these packets. The behaviour is governed by three kernel-level settings, as explained in TCP Keepalive HOWTO - Using TCP keepalive under Linux:
tcp_keepalive_time (defaults to 7200 seconds)
tcp_keepalive_intvl (defaults to 75 seconds)
tcp_keepalive_probes (defaults to 9 times)
The tcp_keepalive_time is how long the connection has to be idle before the first keep-alive packet is sent, tcp_keepalive_intvl is the wait time between subsequent packets, and tcp_keepalive_probes is the number of consecutive unacknowledged packets that make the system decide the connection is dead.
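To see what a given host is actually using, the three settings can be read with sysctl (the output below shows the defaults listed above):
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9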
So, the first keep-alive packet was only sent after 2 hours. By that time, some network component had already decided the connection was dead, the keep-alive packet never made it to the client, and thus no ACK was ever sent.
We lowered both tcp_keepalive_time and tcp_keepalive_intvl to 600 seconds.
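In terms of commands, lowering them to 600 seconds amounts to something like the following (the matching lines also go into /etc/sysctl.conf so the change survives a reboot):
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_intvl=600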
With tcpdump -i [interface] port 6311 we were able to monitor the keep-alive packets.
15:40:11.225941 IP <server>.6311 > <some node>.<port>: Flags [.], ack 1576, win 237, length 0
15:40:11.226196 IP <some node>.<port> > <server>.6311: Flags [.], ack 401, win 511, length 0
This continues until the results are sent back and the connection is closed. At least, I tested for a duration of 12 hours.
So, we use keep-alive here not to check for dead peers, but to prevent disconnection due to network inactivity, as is discussed in TCP Keepalive HOWTO - 2.2. Why use TCP keepalive?. In that scenario, you want to use low values for keep-alive time and interval.
Note that these are kernel level settings, and thus are applied system-wide. We use a dedicated server, so this is no issue for us, but may be in other cases.
Finally, for completeness, I'll answer my own three questions.
The path of the configuration file is /etc/Rserv.conf, as was confirmed by changing another setting (remote enable to remote disable).
This is handled at the socket level.
I am not sure, but using tcpdump shows that Rserve emits keep-alive packets, which is a more useful way to inspect what's happening.

Unable to reduce TIME_WAIT

I'm attempting to reduce the amount of time a connection spends in the TIME_WAIT state by setting tcp_fin_timeout, as detailed here:
root:~# sysctl -w net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_fin_timeout = 30
However, this setting does not appear to affect anything. When I look at the netstat of the machine, the connections still wait the default 60s:
root:~# watch netstat -nato
tcp 0 0 127.0.0.1:34185 127.0.0.1:11209 TIME_WAIT timewait (59.14/0/0)
tcp 0 0 127.0.0.1:34190 127.0.0.1:11209 TIME_WAIT timewait (59.14/0/0)
Is there something I'm missing? The machine is running Ubuntu 14.04.1.
Your link describes an urban myth. The actual function of net.ipv4.tcp_fin_timeout is as follows:
This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
This doesn't have anything to do with TIME_WAIT. It establishes a timeout for an orphaned socket in FIN_WAIT_2, after which the connection is reset (which bypasses TIME_WAIT altogether). This is a DoS measure, as stated, and should never come into play in a correctly written client-server application. You don't want to set it so low that ordinary connections are reset: you will lose data. You don't want to fiddle with it at all, actually.
The correct way to reduce TIME_WAIT states is given here.
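For reference, the first answer on this page goes in the same direction: rather than trying to shorten TIME_WAIT itself, widen the ephemeral port range and let the kernel reuse TIME_WAIT sockets for new outbound connections:
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
sysctl -w net.ipv4.tcp_tw_reuse="1"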

What could cause so many TIME_WAIT connections to be open?

So, I have application A on one server which sends 710 HTTP POST messages per second to application B on another server, which is listening on a single port. The connections are not keep-alive; they are closed.
After a few minutes, application A reports that it can't open new connections to application B.
I am running netstat continuously on both machines and see a huge number of TIME_WAIT connections on each; virtually all connections shown are in TIME_WAIT. From reading online, it seems a connection stays in this state for 30 seconds after each side closes it (30 seconds on our machines, according to the /proc/sys/net/ipv4/tcp_fin_timeout value).
I have a script running on each machine that's continuously doing:
netstat -na | grep 5774 | wc -l
and:
netstat -na | grep 5774 | grep "TIME_WAIT" | wc -l
The value of each, on each machine, seems to get to around 28,000 before application A reports that it can't open new connections to application B.
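On newer systems, ss can do the same counting more cheaply than netstat; a rough equivalent of the two scripts above (5774 being the port from the question) is:
ss -tan | grep 5774 | wc -l
ss -tan state time-wait | grep 5774 | wc -l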
I've read that this file: /proc/sys/net/ipv4/ip_local_port_range provides the total number of connections that can be open at once:
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
61000 - 32768 = 28232, which is right in line with the approximately 28,000 TIME_WAITs I am seeing.
My question is: how is it possible to have so many connections in TIME_WAIT?
It seems that at 710 connections per second being closed, I should see approximately 710 * 30 seconds = 21300 of these at a given time. I suppose that just because there are 710 being opened per second doesn't mean that there are 710 being closed per second...
The only other thing I can think of is a slow OS getting around to closing the connections.
TCP's TIME_WAIT indicates that the local endpoint (this side) has closed the connection. The connection is kept around so that any delayed packets can still be matched to it and handled appropriately. The connections are removed when they time out, which on Linux is after 60 seconds (the RFC allows up to four minutes).
Assuming all of those connections were valid, everything is working correctly: at roughly 710 connections closed per second and a 60-second TIME_WAIT, you would expect about 710 * 60 = 42,600 sockets in TIME_WAIT at steady state, which is more than the 28,232 ephemeral ports available, so running out is expected. You can eliminate the TIME_WAIT state on this side by having the remote end close the connection, or you can modify system parameters to increase recycling (though that can be dangerous to do).
Vincent Bernat has an excellent article on TIME_WAIT and how to deal with it:
The Linux kernel documentation is not very helpful about what net.ipv4.tcp_tw_recycle does:
Enable fast recycling TIME-WAIT sockets. Default value is 0. It should
not be changed without advice/request of technical experts.
Its sibling, net.ipv4.tcp_tw_reuse is a little bit more documented but the language is about the same:
Allow to reuse TIME-WAIT sockets for new connections when it is safe
from protocol viewpoint. Default value is 0. It should not be changed
without advice/request of technical experts.
The mere result of this lack of documentation is that we find numerous tuning guides advising to set both these settings to 1 to reduce the number of entries in the TIME-WAIT state. However, as stated by tcp(7) manual page, the net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers as it won’t handle connections from two different computers behind the same NAT device, which is a problem hard to detect and waiting to bite you:
Enable fast recycling of TIME-WAIT sockets. Enabling this option is
not recommended since this causes problems when working with NAT
(Network Address Translation).
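To check where a given kernel stands on these two knobs (note that net.ipv4.tcp_tw_recycle was removed entirely in Linux 4.12, so the second lookup simply fails on newer kernels):
sysctl net.ipv4.tcp_tw_reuse
sysctl net.ipv4.tcp_tw_recycle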

Understanding TCP dynamics

What I am experiencing:
We have an app server that connects to a rabbitmq server.
As time goes on, the number of ESTABLISHED connections seen from the rabbitmq server goes up, but the count seen from the app server remains fairly constant.
I run this on both:
root@app01:~# netstat -ant | grep EST | grep 5672 | grep 172.25.12.48
tcp 0 0 172.25.12.48:50587 10.48.64.230:5672 ESTABLISHED
tcp 0 0 172.25.12.48:50588 10.48.64.230:5672 ESTABLISHED
root@rabbit01:~# netstat -ant | grep EST | grep 5672 | grep 172.25.12.48
tcp6 0 0 10.48.64.230:5672 172.25.12.48:38408 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:50588 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:33491 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:50587 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:34541 ESTABLISHED
Example results give, say, 6 on the app server and 15 (even as high as 46) on the rabbitmq server.
I restart the rabbitmq server and obviously everything is back to normal, 2 connections each side.
I am assuming that either the switch times out the connection or the application terminates the process uncleanly. I am looking into this, but I would like to understand the TCP behaviour better.
Current settings on the rabbitmq server:
tcp_retries1 set to 3
tcp_retries2 set to 15
So I would normally have expected to see those 'invalid' connections drop after around 13-30 minutes if I understand the forums correctly.
Yet, looking at the tcp keepalive values:
tcp_keepalive_time is set to 7200 (So after 2h it will send the first keepalive probe.)
tcp_keepalive_intvl is 75 (So 75 seconds after the first one, it will resend a probe.)
tcp_keepalive_probes is 9 (So it will send 9 probes in total)
So the keepalive process will take 7200 + (9 * 75) = 7875 seconds, or roughly 2 hours 11 minutes, before closing the connection.
Since the connection (presumably) disappears, this leads to two questions:
1. Which of these two timeouts actually applies?
2. Am I missing another possibility, apart from the switch or the app terminating abnormally, that could cause these invalid connections?
Source: https://www.frozentux.net/ipsysctl-tutorial/chunkyhtml/tcpvariables.html
Our F5s were the reason for this.
It seems that after an idle timeout, the F5 drops the entry from its tables. So from the app server's side the connection is invalid, and TCP closes it.
From the rabbitmq server, though, that connection somehow remains valid.
I still don't understand why a switch has the ability to keep a connection state valid, but this was the reason.
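A common mitigation in this situation, in line with the keep-alive answer above, is to have the kernel send keep-alive probes well before the middlebox's idle timeout, provided the application enables SO_KEEPALIVE on its sockets. As a sketch (600 and 60 seconds are only illustrative values; pick something below the F5's idle timeout):
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_intvl=60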

Too many connections to memcached in TIME_WAIT state

I am having trouble with connections to memcached.
I assume there are no free local ports at busy hours.
netstat -n | grep "127.0.0.1" | grep TIME_WAIT | wc
This command gives me 36-50k connections; possibly more at busy hours.
How can I extend the port range, or is there another way to fix it?
We have fixed it.
So if you have many connections in the TIME_WAIT state (more than 10-20k), I recommend making some changes to the TCP/IP settings:
Modify net.ipv4.tcp_fin_timeout. We use 20 seconds, and I think we could go to 15 or 10 seconds, because connections between the servers are really fast.
Extend the port range: modify net.ipv4.ip_local_port_range and set it to "1024 65535".
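Translated into commands, those two changes would look like this (add the same values to /etc/sysctl.conf to persist them across reboots):
sysctl -w net.ipv4.tcp_fin_timeout=20
sysctl -w net.ipv4.ip_local_port_range="1024 65535"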

Resources