Assessing/diagnosing time connections are in SYN_RECV before being established - tcp

I'm trying to improve the performance of a (virtual) web server with a fairly standard CentOS/Apache setup and one thing I noticed is that new connections seem to "stick" in the SYN_RECV state, sometimes for several seconds, before finally being established and handled by Apache.
My first guess was that Apache could be reaching the limit for the number of connections it's prepared to handle simultaneously, but e.g. with keep-alive off netstat is reporting a few established connections (just those not involving localhost, so discarding "housekeeping" connections e.g. between Apache and Tomcat), whereas with keep-alive on it will happily get up to 100+ established connections (but with no clear difference to the SYN_RECV behaviour either way -- there's typically 10-20 connections sitting in SYN_RECV at any one time).
What are people's recommendations for investigating where the bottleneck is that's preventing the connections from being established quickly?
P.S. Follow-on question: does anybody know what a TYPICAL statistic would be for the time for a connection to be established once first "hitting" the server?
Update in case anyone else encounters this: in the end, I wrote a small Java program to take data from /proc/net/tcp and analyse and it appears that this is happening for a small proportion of connections (although that still means that at any one time there can be a number of connections in this state, because they can stay this way for a number of seconds) and looks like an issue local to those connections. Over 90% of connections are still going through in < 500ms and 81% in < 200ms. So if others get this, there isn't necessarily need for panic immediately.

Try capturing a packet trace and see if SYN ACKs are being retransmitted (and the number of re-tx). This could indicate a routing issue (SYN comes in via path A and SYN-ACK goes via path B which is broken).
Also see if these connections have a specific pattern (such as originating from the same network).

Related

Extremely high latency when network gets busy, TCP, libevent

In our C/S based online game project, we use TCP for network transmission. We include Libevent, utilise a bufferevent for each connection to handling with the network I/O automatically.
It works well before,but the lagging problem comes to the surface recently. When I do some stress testing to make the network busier, the latency becomes extremely high, several seconds or more. The server sinks into a confusing state:
the average CPU usage decreased (0%-60%-0%-60% repeat, waiting something?)
the net traffic decreased (nethogs)
the clients connected to server still alive (netstat & tcpdump)
It looks like something magically slowed all system down, but new connection to server responded quit in time.
When I changed the protocol to UDP, it works well on the same situation: no obvious latency, the system runs fast. Net traffic is around 3M/S.
The project is running on an Intranet. I also tested the max download speed, nearly 18M/S.
I studied part of Libevent's header files and ducumentations, tried to setup a rate limit to all connections. It did some improvements, but not completely resolved the problem even though I had tried several different configurations. Here is my parameters: read_rate 163840, read_burst 163840, write_rate 163840, write_burst 163840, tick_len 500ms.
Thank you for your help!
TCP = Transmission Control Protocol. It responds to packet loss by retransmitting unacknowledged packets after a delay. In the case of repeated loss, it will exponentially back off. Take a look at this network capture of an attempt to open a connection to host that is not responding:
It sends the initial SYN, and then after not getting an ack for 1s it tries again. After not getting an ack it then sends another after ~2s, then ~4s, then ~8s, and so on. So you can see that you can get some serious latency in the face of repeated packet loss.
Since you said you were deliberately stressing the network, and that the CPU usage is inconsistent, one possible explanation is that TCP is waiting to retransmit lost packets.
The best way to see what is going on is to get a network capture of what is actually transmitted. If your hosts are connected to a single switch, you can "span" a port of interest to the port of another host where you can make the capture.
If your switch isn't capable of this, or if you don't have the administrative control of the switch, then you will have to get the capture from one of hosts involved in your online game. The disadvantage of this is that taking the capture will possibly alter what happens, and it doesn't see what is actually on the wire. For example, you might have TCP segmentation offload enabled for your interface, in which case the capture will see large packets that will be broken up by the network interface.
I would suggest installing wireshark to analyse the network capture (which you can do in real time by using wireshark to do the capture as well). Any time you are working with a networked system I would recommend using wireshark so that you have some visibility into what is actually happening on the network. The first filter I would suggest you use is tcp.analysis.flags which will show you packets suggestive of problems.
I would also suggest turning off the rate limiting first to try to see what is going on (rate limiting is adding another reason to not send packets, which is probably going to make it harder to diagnose what is going on). Also, 500ms might be a longish tick_len depending on how your game operates. If your burst configuration allows the rate to be used up in 100ms, you will end up waiting 400ms before you can transmit again. The IO Graph is a very helpful feature of Wireshark in this regard. It can help you see transmission rates, although the default tick interval and unit are not very helpful in this regard. Here is an example of a bursty flow being rate limited to 200mbit/s:
Note that the tick interval is 1ms and the unit is bits/tick, which makes the top of the chart 1gb/s, the speed of the interface in question.

How To Compute HTTP request processing time without network latency?

Because of geographic distance between server and client network latency can vary a lot. So I want to get "pure" req. processing time of service without network latency.
I want to get network latency as TCP connecting time. As far as I understand this time depends a lot on network.
Main idea is to compute:
TCP connecting time,
TCP first packet receive time,
Get "pure" service time = TCP first packet receive (waiting time) – TCP connecting.
I divide TCP connecting by 2 because in fact there are 2 requests-response (3-way handshake).
I have two questions:
Should I compute TCP all packets receive time instead of only first packet?
Is this method okay in general?
PS: As a tool I use Erlang's gen_tcp. I can show the code.
If at all, i guess the "pure" service time = TCP first packet receive - TCP connecting.. You have written other way round.
A possible answer to your first questions is , you should ideally compute atleast some sort of average by considering pure service time of many packets rather than just first packet.
Ideally it can also have worst case, average case, best case service times.
For second question to be answered we would need why would you need pure service time only. I mean since it is a network application, network latencies(connection time etc...) should also be included in the "response time", not just pure service time. That is my view based on given information.
I have worked on a similar question when working for a network performance monitoring vendor in the past.
IMHO, there are a certain number of questions to be asked before proceeding:
connection time and latency: if you base your network latency metric, beware that it takes into account 3 steps: Client sends a TCP/SYN, Server responds with a TCP/SYN-ACK, the Client responds by a final ACK to set up the TCP connection. This means that the CT is equivalent to 1.5 RTT (round trip time). This validates taking the first two steps of the TCP setup process in acccount like you mention.
Taking in account later TCP exchanges: while this first sounds like a great idea to keep evaluating network latency in the course of the session, this becomes a lot more tricky. Here is why: 1. Not all packets have to be acknowledged (RFC1122 or https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment) , which will generate false measurements when it occurs, so you will need an heuristic to take these off your calculations. 2. Not all systems consider acknowledging packets a high priority tasks. This means that some high values will pollute your network latency data and simply reflect the level of load of the server for example.
So if you use only the first (and reliable) measurement you may miss some network delay variation (especially in apps using long lasting TCP sessions).

Requirements for Repeated TCP connects

I am using Winsock, and I have a need to issue a TCP connect repeatedly to a third-party server. These applications will stay up potentially for days at a time. I am the only client connecting to the server. The time between connects is on the order of seconds, and the connection stays up only long enough to send a single message of a few bytes. I am currently seeing that the connects start to fail (WSAECONNREFUSED) after a few hours. Is there anything I must do (e.g. socket options, etc.) to ensure these frequent repeated connects will succeed for an indefinite amount of time? Thanks!
When doing a lot of transaction based connections and having issues with TCP's TIME_WAIT state duration (which last 2MSL = 120 seconds) leading to no more connections available for a client host toward a special server host, you should consider UDP and managing yourself the re-sending of lost requests.
I know that sounds odd. But standard services like DNS are required to use UDP to handle a ton of transactions (request then a single answer in one UDP segment) in order to avoid issues you are experimenting yourself. Web browsers send a request using UDP to the DNS. Re-request is done using UDP after a short time, no longer than a few milliseconds I guess. Sometimes the resolved name is too long and does not fit in the UDP paquet. As a consequence the DNS server send a UDP reply with a dedicated flag raised, in order to ask the client to use TCP this time.
Moreover you may consider also the T/TCP extension (Transactional TCP) of TCP, if available on your Windows platform. It provides TCP reliability with shorter TIME_WAIT state, as nearly no costs in the modifications of your client code. As far as I know it may work even though the server does not handle that extension. As a side note it is currently not used on the internet as it is know to have some flaw...

TCP Connection Life

How long can I expect a client/server TCP connection to last in the wild?
I want it to stay permanently connected, but things happen, so the client will have to reconnect. At what point do I say that there's a problem in the code rather than there's a problem with some external equipment?
I agree with Zan Lynx. There's no guarantee, but you can keep a connection alive almost indefinitely by sending data over it, assuming there are no connectivity or bandwidth issues.
Generally I've gone for the application level keep-alive approach, although this has usually because it's been in the client spec so I've had to do it. But just send some short piece of data every minute or two, to which you expect some sort of acknowledgement.
Whether you count one failure to acknowledge as the connection having failed is up to you. Generally this is what I have done in the past, although there was a case I had wait for three failed responses in a row to drop the connection because the app at the other end of the connection was extremely flaky about responding to "are you there?" requests.
If the connection fails, which at some point it probably will, even with machines on the same network, then just try to reestablish it. If that fails a set number of times then you have a problem. If your connection persistently fails after it's been connected for a while then again, you have a problem. Most likely in both cases it's probably some network issue, rather than your code, or maybe a problem with the TCP/IP stack on your machine (has been known: I encountered issues with this on an old version of QNX--it'd just randomly fall over). Having said that you might have a software problem, and the only way to know for sure is often to attach a debugger, or to get some logging in there. E.g. if you can always connect successfully, but after a time you stop getting ACKs, even after reconnect, then maybe your server is deadlocking, or getting stuck in a loop or something.
What's really useful is to set up a series of long-running tests under a variety of load conditions, from just sending the keep alive are you there?/ack requests and responses, to absolutely battering the server. This will generally give you more confidence about your software components, and can be really useful in shaking out some really weird problems which won't necessarily cause a problem with your connection, although they might result in problems with the transactions taking place. For example, I was once writing a telecoms application server that provided services such as number translation, and we'd just leave it running for days at a time. The thing was that when Saturday came round, for the whole day, it would reject every call request that came in, which amounted to millions of calls, and we had no idea why. It turned out to be because of a single typo in some date conversion code that only caused a problem on Saturdays.
Hope that helps.
I think the most important idea here is theory vs. practice.
The original theory was that the connections had no lifetimes. If you had a connection, it stayed open forever, even if there was no traffic, until an event caused it to close.
The new theory is that most OS releases have turned on the keep-alive timer. This means that connections will last forever, as long as the system on the other end responds to an occasional TCP-level exchange.
In reality, many connections will be terminated after time, with a variety of criteria and situations.
Two really good examples are: The remote client is using DHCP, the lease expires, and the IP address changes.
Another example is firewalls, which seem to be increasingly intelligent, and can identify keep-alive traffic vs. real data, and close connections based on any high level criteria, especially idle time.
How you want to implement reconnect logic depends a lot on your architecture, the working environment, and your performance goals.
It shouldn't really matter, you should design your code to automatically reconnect if that is the desired behavior.
There really is no way to tell. There is nothing inherent to TCP that would cause the connection to just drop after a certain amount of time. Someone on a reliable connection could have years of uptime, while someone on a different connection could have to reconnect every 5 minutes. There is no way to tell or even guess.
You will need some data going over the connection periodically to keep it alive - many OS's or firewalls will drop an inactive connection.
Pick a value. One drop every hour is probably fine. Ten unexpected connection drops in 5 minutes probably indicates a problem.
TCP connections will generally last about two hours without any traffic. Either end can send keep-alive packets, which are, I think, just an ACK on the last received packet. This can usually be set per socket or by default on every TCP connection.
An application level keep-alive is also possible. For a telnet style protocol like FTP, SMTP, POP or IMAP something like sending return, newline and getting back a command prompt.

Setting TIME_WAIT TCP

We're trying to tune an application that accepts messages via TCP and also uses TCP for some of its internal messaging. While load testing, we noticed that response time degrades significantly (and then stops altogether) as more simultaneous requests are made to the system. During this time, we see a lot of TCP connections in TIME_WAIT status and someone suggested lowering the TIME_WAIT environment variable from it's default 60 seconds to 30.
From what I understand, the TIME_WAIT setting essentially sets the time a TCP resource is made available to the system again after the connection is closed.
I'm not a "network guy" and know very little about these things. I need a lot of what's in that linked post, but "dumbed down" a little.
I think I understand why the TIME_WAIT value can't be set to 0, but can it safely be set to 5? What about 10? What determines a "safe" setting for this value?
Why is the default for this value 60? I'm guessing that people a lot smarter than me had good reason for selecting this as a reasonable default.
What else should I know about the potential risks and benefits of overriding this value?
A TCP connection is specified by the tuple (source IP, source port, destination IP, destination port).
The reason why there is a TIME_WAIT state following session shutdown is because there may still be live packets out in the network on their way to you (or from you which may solicit a response of some sort). If you were to re-create that same tuple and one of those packets showed up, it would be treated as a valid packet for your connection (and probably cause an error due to sequencing).
So the TIME_WAIT time is generally set to double the packets maximum age. This value is the maximum age your packets will be allowed to get to before the network discards them.
That guarantees that, before you're allowed to create a connection with the same tuple, all the packets belonging to previous incarnations of that tuple will be dead.
That generally dictates the minimum value you should use. The maximum packet age is dictated by network properties, an example being that satellite lifetimes are higher than LAN lifetimes since the packets have much further to go.
Usually, only the endpoint that issues an 'active close' should go into TIME_WAIT state. So, if possible, have your clients issue the active close which will leave the TIME_WAIT on the client and NOT on the server.
See here: http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html and http://www.isi.edu/touch/pubs/infocomm99/infocomm99-web/ for details (the later also explains why it's not always possible due to protocol design that doesn't take TIME_WAIT into consideration).
Pax is correct about the reasons for TIME_WAIT, and why you should be careful about lowering the default setting.
A better solution is to vary the port numbers used for the originating end of your sockets. Once you do this, you won't really care about time wait for individual sockets.
For listening sockets, you can use SO_REUSEADDR to allow the listening socket to bind despite the TIME_WAIT sockets sitting around.
In Windows, you can change it through the registry:
; Set the TIME_WAIT delay to 30 seconds (0x1E)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters]
"TcpTimedWaitDelay"=dword:0000001E
setting the tcp_reuse is more useful than changing time_wait, as long as you have the parameter (kernels 3.2 and above, unfortunately that disqualifies all versions of RHEL and XenServer).
Dropping the value, particularly for VPN connected users, can result in constant recreation of proxy tunnels on the outbound connection. With the default Netscaler (XenServer) config, which is lower than the default Linux config, Chrome will sometimes have to recreate the proxy tunnel up to a dozen times to retrieve one web page. Applications that don't retry, such as Maven and Eclipse P2, simply fail.
The original motive for the parameter (avoid duplication) was made redundant by a TCP RFC that specifies timestamp inclusion on all TCP requests.
I have been load testing a server application (on linux) by using a test program with 20 threads.
In 959,000 connect / close cycles I had 44,000 failed connections and many thousands of sockets in TIME_WAIT.
I set SO_LINGER to 0 before the close call and in subsequent runs of the test program had no connect failures and less than 20 sockets in TIME_WAIT.
TIME_WAIT might not be the culprit.
int listen(int sockfd, int backlog);
According to Unix Network Programming Volume1, backlog is defined to be the sum of completed connection queue and incomplete connection queue.
Let's say the backlog is 5. If you have 3 completed connections (ESTABLISHED state), and 2 incomplete connections (SYN_RCVD state), and there is another connect request with SYN. The TCP stack just ignores the SYN packet, knowing it'll be retransmitted some other time. This might be causing the degradation.
At least that's what I've been reading. ;)

Resources