What I am experiencing:
We have an app server that connects to a rabbitmq server.
As time goes on, the number of ESTABLISHED connections seen from the rabbitmq server goes up, but the count seen from the app server remains fairly constant.
I run this on both machines:
root@app01:~# netstat -ant | grep EST | grep 5672 | grep 172.25.12.48
tcp 0 0 172.25.12.48:50587 10.48.64.230:5672 ESTABLISHED
tcp 0 0 172.25.12.48:50588 10.48.64.230:5672 ESTABLISHED
root@rabbit01:~# netstat -ant | grep EST | grep 5672 | grep 172.25.12.48
tcp6 0 0 10.48.64.230:5672 172.25.12.48:38408 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:50588 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:33491 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:50587 ESTABLISHED
tcp6 0 0 10.48.64.230:5672 172.25.12.48:34541 ESTABLISHED
Typical results give, say, 6 connections on the app server and 15 (sometimes as high as 46) on the rabbitmq server.
When I restart the rabbitmq server, everything obviously goes back to normal: 2 connections on each side.
I am assuming that either the switch times out the connection or the application terminates it uncleanly. I am looking into this, but I would like to understand the TCP behaviour better.
Current settings on the rabbitmq server:
tcp_retries1 set to 3
tcp_retries2 set to 15
So I would normally have expected to see those 'invalid' connections drop after around 13-30 minutes if I understand the forums correctly.
Yet, looking at the tcp keepalive values:
tcp_keepalive_time is set to 7200 (So after 2h it will send the first keepalive probe.)
tcp_keepalive_intvl is 75 (So 75 seconds after the first one, it will resend a probe.)
tcp_keepalive_probes is 9 (So it will send 9 probes in total)
So the keepalive process will take 7200+(9*75)=7875, or roughly 2 hours 11 minutes before closing.
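Worth noting, as far as I understand it: the tcp_retries counters only come into play when there is unacknowledged data in flight, and keepalive probes are only sent on sockets where the application has enabled SO_KEEPALIVE, so an idle connection whose peer has silently vanished can sit in ESTABLISHED indefinitely. To double-check the values used in the calculation above:
# Show the retransmission and keepalive settings referenced above
sysctl net.ipv4.tcp_retries1 net.ipv4.tcp_retries2
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes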
Given that the connection (presumably) disappears, this leads to two questions:
1. Which is correct?
2. Am I missing any cause, other than the switch or the app terminating abnormally, that could produce these invalid connections?
Source: https://www.frozentux.net/ipsysctl-tutorial/chunkyhtml/tcpvariables.html
Our F5s were the reason for this.
It seems that after an idle timeout, the F5 drops the entry from its connection tables. So from the app server's side the connection is invalid, and TCP closes it.
From the rabbitmq server's side, though, the connection somehow remains valid.
I still don't understand why a switch has the ability to keep a connection state valid, but this was the reason.
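For anyone hitting the same thing, a possible mitigation (just a sketch; the 300 seconds is an assumed value and needs to be lower than your F5's idle timeout) is to make idle connections get probed before the F5 forgets them, either via application-level heartbeats or via TCP keepalive on the rabbitmq host:
# Assumed example: send the first keepalive probe after 5 minutes idle instead of 2 hours
# (only takes effect on sockets that have SO_KEEPALIVE enabled)
sysctl -w net.ipv4.tcp_keepalive_time=300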
I have 8 verticles in my application. Each verticle runs on a separate thread, and each verticle has its own WebClient (the Vert.x HTTP client).
I am setting the max pool size to 10:
WebClientOptions webClientOptions = new WebClientOptions().setMaxPoolSize(10);
However, when I checked with
/usr/sbin/ss -o state established -tn | tail -n +2 | awk '{ print $4 }' | sort |uniq -c | sort -n
On a production host, I can see that there are more than 10 connections per IP:Port.
Question 1:
Is maxPoolSize global for the entire application, or per verticle?
So for X.X.X.X:Y, will my application open up to 10 connections, or up to 80?
Question 2:
When I send a request to a host that has more than one IP in its DNS, is the connection pool per host or per IP?
For example, gogo.com resolves to 2 IP addresses. Can I create 10 connections to gogo.com, or 20?
To understand how it works, let's look at the actual code of HttpClientImpl.
You would be most interested in this part:
https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/http/impl/HttpClientImpl.java#L161
As you can see, each WebClient/HttpClient has its own connection pool. So 8 clients with a maxPoolSize of 10 will result in 80 connections.
As to your second question, the connection pool is per host, not per IP, as far as I know and can see from the code. So you'll always be able to establish up to 10 connections:
https://github.com/eclipse/vert.x/blob/39c22d657d2daf640cfbdd8c63e5110fc73474fb/src/main/java/io/vertx/core/http/impl/ConnectionManager.java#L56
Footnote: this is all true only if you don't touch http2MaxPoolSize. If you do, the math is a bit different.
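If you want to confirm the per-host behaviour on a live box, something along these lines works (a sketch; gogo.com is just the example host from the question, and dig/ss are assumed to be available):
# Count established connections per resolved IP of the host
for ip in $(dig +short gogo.com | grep -E '^[0-9.]+$'); do
  printf '%s: ' "$ip"
  ss -tn state established dst "$ip" | tail -n +2 | wc -l
done
If the pool really is per host, the total across both addresses should stay within maxPoolSize per client, rather than maxPoolSize per IP.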
I'm attempting to reduce the amount of time a connection spends in the TIME_WAIT state by setting tcp_fin_timeout, as detailed here:
root:~# sysctl -w net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_fin_timeout = 30
However, this setting does not appear to affect anything. When I look at netstat on the machine, the connections still wait the default 60s:
root:~# watch netstat -nato
tcp 0 0 127.0.0.1:34185 127.0.0.1:11209 TIME_WAIT timewait (59.14/0/0)
tcp 0 0 127.0.0.1:34190 127.0.0.1:11209 TIME_WAIT timewait (59.14/0/0)
Is there something I'm missing? The machine is running Ubuntu 14.04.1.
Your link is an urban myth. The actual function of net.ipv4.tcp_fin_timeout is as follows:
This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
This doesn't have anything to do with TIME_WAIT. It establishes a timeout for a socket in FIN_WAIT_1, after which the connection is reset (which bypasses TIME_WAIT altogether). This is a DoS measure, as stated, and should never come into play in a correctly written client-server application. You don't want to set it so low that ordinary connections are reset: you will lose data. In fact, you don't want to fiddle with it at all.
The correct way to reduce TIME_WAIT states is given here.
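As an aside, if you want to watch those timers directly (a sketch equivalent to the watch netstat above), ss shows the remaining lifetime; on Linux the countdown starts from a compiled-in 60 seconds and is not affected by tcp_fin_timeout:
# TIME_WAIT sockets with the remaining timer shown in the last column
ss -o state time-wait -tn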
So, I have application A on one server which sends 710 HTTP POST messages per second to application B on another server, which is listening on a single port. The connections are not keep-alive; each one is closed after the request.
After a few minutes, application A reports that it can't open new connections to application B.
I am running netstat continuously on both machines and see a huge number of TIME_WAIT connections on each; virtually all of the connections shown are in TIME_WAIT. From reading online, this seems to be the state a connection stays in for 30 seconds after each side closes it (30 seconds on our machines, according to the /proc/sys/net/ipv4/tcp_fin_timeout value).
I have a script running on each machine that's continuously doing:
netstat -na | grep 5774 | wc -l
and:
netstat -na | grep 5774 | grep "TIME_WAIT" | wc -l
The value of each, on each machine, seems to get to around 28,000 before application A reports that it can't open new connections to application B.
I've read that this file, /proc/sys/net/ipv4/ip_local_port_range, determines the total number of outgoing connections that can be open at once:
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
61000 - 32768 = 28232, which is right in line with the approximately 28,000 TIME_WAITs I am seeing.
My question is: how is it possible to have so many connections in TIME_WAIT?
With 710 connections per second being closed, I would expect to see approximately 710 * 30 seconds = 21,300 of them at any given time. I suppose that 710 being opened per second doesn't necessarily mean that 710 are being closed per second...
The only other thing I can think of is a slow OS getting around to closing the connections.
TCP's TIME_WAIT indicates that the local endpoint (this side) has closed the connection. The connection is kept around so that any delayed packets can be matched to it and handled appropriately. The connections will be removed when they time out, within at most four minutes.
Assuming that all of those connections were valid, everything is working correctly. You can eliminate the TIME_WAIT state on this side by having the remote end close the connection first, or you can modify system parameters to recycle the sockets faster (though it can be dangerous to do so).
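A quick sanity check of the numbers (just a sketch; it assumes the Linux default of 60 seconds in TIME_WAIT, which is compiled into the kernel and is not what tcp_fin_timeout controls):
# Sockets expected in TIME_WAIT at steady state vs. ephemeral ports available
echo $((710 * 60))        # 42600 connections closed within the last 60 seconds
echo $((61000 - 32768))   # 28232 ephemeral ports in the default ip_local_port_range
So the ephemeral port range fills up before the 60-second timers can keep pace, which would explain why application A stops opening new connections once the count reaches roughly 28,000.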
Vincent Bernat has an excellent article on TIME_WAIT and how to deal with it:
The Linux kernel documentation is not very helpful about what net.ipv4.tcp_tw_recycle does:
Enable fast recycling TIME-WAIT sockets. Default value is 0. It should
not be changed without advice/request of technical experts.
Its sibling, net.ipv4.tcp_tw_reuse is a little bit more documented but the language is about the same:
Allow to reuse TIME-WAIT sockets for new connections when it is safe
from protocol viewpoint. Default value is 0. It should not be changed
without advice/request of technical experts.
The mere result of this lack of documentation is that we find numerous tuning guides advising to set both these settings to 1 to reduce the number of entries in the TIME-WAIT state. However, as stated by tcp(7) manual page, the net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers as it won’t handle connections from two different computers behind the same NAT device, which is a problem hard to detect and waiting to bite you:
Enable fast recycling of TIME-WAIT sockets. Enabling this option is
not recommended since this causes problems when working with NAT
(Network Address Translation).
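If you want to check what a box currently has set (a sketch; note that net.ipv4.tcp_tw_recycle was removed entirely in Linux 4.12, so that key simply does not exist on newer kernels):
# Inspect the TIME-WAIT reuse/recycle knobs discussed above
sysctl net.ipv4.tcp_tw_reuse
sysctl net.ipv4.tcp_tw_recycle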
I have a big Java system on Unix with many subsystems (email, connections, etc.) that listen on many ports, but I don't know which of my classes/subsystems listens on which port.
Is there a tool that can help me figure this out?
Example:
This is what I get when I run netstat, and I don't know what in my Java system is using port 2503 and what is using 2505:
>netstat -nap |grep 250
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:2503 0.0.0.0:* LISTEN 26659/java
tcp 0 0 0.0.0.0:2505 0.0.0.0:* LISTEN 26659/java
The same PID/app is using both ports. You have your answer.
You will have to be much more specific about your problem, given that the answer is so obvious from the information provided; I can only assume this isn't what you're looking for.
Once you have the PID, you can use ps:
ps -Af |grep 26659 |less
The -f option will show not only the program (java) that is using the port, but also the command line used to start it. So if you have multiple Java processes running, each started as a separate task, you will see which one is using the port.
You will probably want to view the result in less so you can scroll through the very long command lines common to Java.
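If the command line alone doesn't narrow it down, here is a sketch of a possible next step (26659 and port 2503 are just the values from the example above): list the listening sockets owned by that PID with lsof, then take a thread dump with jstack while a client connects to the port; the thread blocked in accept() (or the relevant selector loop) usually points at the class that owns it.
# Listening TCP sockets belonging to that process (-a ANDs the -p and -i selections)
lsof -a -nP -p 26659 -iTCP -sTCP:LISTEN
# Thread dump of the JVM; look for threads in ServerSocket.accept() or a selector loop
jstack 26659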
We had 2 nginx servers running perfectly at 1000 reqs/second total in front of 3 php5-fpm servers connected over TCP. We thought that one nginx server would be sufficient and redirected all of our traffic to it. But that server could not serve more than 750 reqs/sec. It has Gigabit Ethernet and total traffic on it does not exceed 100 Mbit/s (Debian 6.0).
We could not find any reason, and after googling it seemed it might be related to TCP issues. But it did not seem likely that we should need to change anything at this number of connections and this bandwidth (around 70 Mbit/s). Later we redirected half of our traffic back to the other nginx server and again reached 1000 reqs/second.
We have been looking at the nginx error and access logs. Is there any tool or file that could help us find the cause of the problem?
Most Linux distributions have 28232 ephemeral ports available. A machine needs one ephemeral port for every connection it initiates (for example, every connection nginx opens to a backend), while its primary port (i.e. the HTTP server's port 80) is shared by all the connections it accepts.
So, it would seem that if the server is handling 1000 requests/sec for content generated by php5-fpm over TCP, you are creating up to 2000 short-lived connections/sec on this box (one client<->nginx and one nginx<->php-fpm per request). This is not really the case: it is likely 5% PHP and 95% static (no port allocation), and IIRC nginx<->php-fpm keeps ports open for subsequent requests. There are lots of factors that can affect these numbers, but for argument's sake, let's say 1000 port allocations/sec.
On the surface this does not seem like a problem, but by default ports are not immediately released and made available for new connections. There are various reasons for this behavior, and I highly recommend a thorough understanding of TCP before arbitrarily making the changes detailed here (or anywhere else).
Primarily, a connection state called TIME_WAIT ("the socket is waiting after close to handle packets still in the network", netstat man page) is what holds ports back from being released for reuse. On recent (all?) Linux kernels TIME_WAIT is hard-coded to 60 seconds, and according to RFC 793 a connection may stay in TIME_WAIT for up to four minutes!
This means at least 1000 ports will be in use for at least 60 seconds. In the real world, you need to account for transit time, keep-alive requests (multiple requests reusing the same connection), and service ports (between nginx and the backend server). Let's arbitrarily knock it down to 750 ports/sec.
In ~37 seconds all your available ports will be used up (28232 / 750 = 37). That's a problem, because it takes 60 seconds to release a port!
To see all the ports in use, run ApacheBench (ab) or something similar that can generate the number of requests per second you are tuning for. Then run:
root:~# netstat -n -t -o | grep timewait
You'll get output like this (but with many, many more lines):
tcp 0 0 127.0.0.1:40649 127.1.0.2:80 TIME_WAIT timewait (57.58/0/0)
tcp 0 0 127.1.0.1:9000 127.0.0.1:50153 TIME_WAIT timewait (57.37/0/0)
tcp 0 0 127.0.0.1:40666 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40650 127.1.0.2:80 TIME_WAIT timewait (57.58/0/0)
tcp 0 0 127.0.0.1:40662 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40663 127.1.0.2:80 TIME_WAIT timewait (57.69/0/0)
tcp 0 0 127.0.0.1:40661 127.1.0.2:80 TIME_WAIT timewait (57.61/0/0)
For a running total of allocated ports:
root:~# netstat -n -t -o | wc -l
If you're receiving failed requests, the number will be at/close to 28232.
How to solve the problem?
1. Increase the number of ephemeral ports from 28232 to 63976:
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
2. Allow Linux to reuse TIME_WAIT ports before the timeout expires:
sysctl -w net.ipv4.tcp_tw_reuse="1"
3. Add additional IP addresses.
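If the sysctl changes above help, you will probably want them to survive a reboot as well; a sketch (the file path is the common default and may differ per distribution):
# Persist the settings, then reload them
echo 'net.ipv4.ip_local_port_range = 1024 65000' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
sysctl -p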