How long can I expect a client/server TCP connection to last in the wild?
I want it to stay permanently connected, but things happen, so the client will have to reconnect. At what point do I say that there's a problem in the code rather than there's a problem with some external equipment?
I agree with Zan Lynx. There's no guarantee, but you can keep a connection alive almost indefinitely by sending data over it, assuming there are no connectivity or bandwidth issues.
Generally I've gone for the application-level keep-alive approach, although usually that's been because it was in the client spec, so I've had to do it. The idea is simply to send some short piece of data every minute or two, to which you expect some sort of acknowledgement.
Whether you count one failure to acknowledge as the connection having failed is up to you. Generally that's what I have done in the past, although there was one case where I had to wait for three failed responses in a row before dropping the connection, because the app at the other end was extremely flaky about responding to "are you there?" requests.
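A minimal sketch of that kind of application-level keep-alive in Python; the probe message, timeouts and failure threshold here are illustrative placeholders, not anything from an actual spec:

```python
import socket
import time

PROBE = b"are you there?\n"   # hypothetical probe; use whatever your protocol defines
MAX_FAILURES = 3              # tolerate a flaky peer, as described above

def keepalive_loop(sock, interval=60):
    """Probe every `interval` seconds; give up after MAX_FAILURES missed acks in a row."""
    sock.settimeout(10)                      # how long to wait for each acknowledgement
    failures = 0
    while failures < MAX_FAILURES:
        time.sleep(interval)
        try:
            sock.sendall(PROBE)
            if sock.recv(128):               # any response at all counts as "still alive"
                failures = 0
                continue
        except (socket.timeout, OSError):
            pass                             # timeout or socket error counts as a miss
        failures += 1
    raise ConnectionError("peer stopped acknowledging keep-alives")
```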
If the connection fails, which at some point it probably will, even with machines on the same network, then just try to re-establish it. If that fails a set number of times, you have a problem. If your connection persistently fails after it's been connected for a while, then again, you have a problem. In both cases it's most likely a network issue rather than your code, or perhaps a problem with the TCP/IP stack on your machine (it has been known: I hit issues with this on an old version of QNX that would just randomly fall over). Having said that, you might have a software problem, and the only way to know for sure is often to attach a debugger or get some logging in there. For example, if you can always connect successfully but after a time you stop getting ACKs, even after a reconnect, then maybe your server is deadlocking or getting stuck in a loop.
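And a correspondingly simple sketch of the reconnect-and-give-up logic (hypothetical host and port, arbitrary retry cap and backoff):

```python
import socket
import time

def connect_with_retries(host, port, max_attempts=5, delay=2.0):
    """Try to (re)establish the connection; treat repeated failure as 'we have a problem'."""
    for attempt in range(1, max_attempts + 1):
        try:
            return socket.create_connection((host, port), timeout=10)
        except OSError as exc:
            print(f"connect attempt {attempt}/{max_attempts} failed: {exc}")
            time.sleep(delay * attempt)      # back off a little more each time
    raise ConnectionError(f"could not reach {host}:{port} after {max_attempts} attempts")
```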
What's really useful is to set up a series of long-running tests under a variety of load conditions, from just exchanging the keep-alive "are you there?"/ack requests and responses to absolutely battering the server. This will generally give you more confidence in your software components, and can be really useful in shaking out some really weird problems which won't necessarily cause trouble with your connection, although they might cause trouble with the transactions taking place. For example, I was once writing a telecoms application server that provided services such as number translation, and we'd just leave it running for days at a time. The thing was that when Saturday came round it would, for the whole day, reject every call request that came in, which amounted to millions of calls, and we had no idea why. It turned out to be a single typo in some date-conversion code that only caused a problem on Saturdays.
Hope that helps.
I think the most important idea here is theory vs. practice.
The original theory was that the connections had no lifetimes. If you had a connection, it stayed open forever, even if there was no traffic, until an event caused it to close.
The new theory is that most OS releases now provide a keep-alive timer (typically enabled per socket). With it turned on, connections can last indefinitely, as long as the system on the other end responds to an occasional TCP-level exchange.
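For illustration, a sketch of enabling that timer on a socket in Python (the endpoint is a placeholder):

```python
import socket

sock = socket.create_connection(("server.example.com", 9000))   # hypothetical endpoint
# Ask the TCP stack to probe the peer during idle periods; with common default
# kernel settings the first probe only goes out after roughly two hours of silence.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
```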
In reality, many connections will be terminated after time, with a variety of criteria and situations.
Two really good examples: the remote client is using DHCP, the lease expires, and the IP address changes.
The other is firewalls, which seem to be increasingly intelligent, can distinguish keep-alive traffic from real data, and will close connections based on any number of higher-level criteria, especially idle time.
How you want to implement reconnect logic depends a lot on your architecture, the working environment, and your performance goals.
It shouldn't really matter; design your code to reconnect automatically if that is the desired behavior.
There really is no way to tell. There is nothing inherent to TCP that would cause the connection to just drop after a certain amount of time. Someone on a reliable connection could have years of uptime, while someone on a different connection could have to reconnect every 5 minutes. There is no way to tell or even guess.
You will need some data going over the connection periodically to keep it alive - many OSes or firewalls will drop an inactive connection.
Pick a value. One drop every hour is probably fine. Ten unexpected connection drops in 5 minutes probably indicates a problem.
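One way to make "pick a value" concrete is a small sliding-window drop counter; the limit and window below are just the example numbers from the paragraph above:

```python
import time
from collections import deque

class DropMonitor:
    """Flag a problem if more than `limit` drops occur within `window` seconds."""
    def __init__(self, limit=10, window=300):
        self.limit, self.window = limit, window
        self.drops = deque()

    def record_drop(self):
        now = time.monotonic()
        self.drops.append(now)
        # Forget drops that fell out of the window.
        while self.drops and now - self.drops[0] > self.window:
            self.drops.popleft()
        return len(self.drops) > self.limit   # True means "probably a real problem"
```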
A TCP connection with keep-alive enabled will generally tolerate about two hours without any traffic before the stack starts probing. Either end can send the keep-alive packets, which are, I think, essentially just an ACK of the last received data. This can usually be set per socket or by default on every TCP connection.
An application-level keep-alive is also possible. For a telnet-style protocol like FTP, SMTP, POP or IMAP, that can be as simple as sending a carriage return and newline and getting back a command prompt; most of these protocols also define a NOOP command for exactly this purpose.
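As a sketch of that protocol-level approach, using Python's smtplib against a hypothetical mail server:

```python
import smtplib

# Hypothetical mail server; NOOP is the protocol's built-in "are you still there?" command.
conn = smtplib.SMTP("mail.example.com", 25, timeout=10)
code, _ = conn.noop()            # a live server answers with a 250 status
assert code == 250, "server did not acknowledge the keep-alive"
conn.quit()
```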
The description of the question goes like this:
Someone recorded all the IP packets of a TCP connection between a client and a server for 30 minutes. In the record, he didn't find any packet that was ACK-only. How is this possible?
The claimed possible solution: for the whole recorded period, the server sent data to the client, which the client processed, but the client never sent any data back to the server.
I am having trouble understanding how this can be possible.
From what I can see, since the client didn't send any data to the server and there weren't any ACK-only packets in the record, the server never received an ACK from the client. Logically, I would think that with no ACKs coming in, the server would keep retransmitting. And since the server hears nothing from the client for 30 minutes, which seems like a long time to me, it would conclude that the connection is broken and tear it down (maybe even sending an ACK-only packet itself, but I am not sure about that).
Moreover, from what I know, when keepalive is used, the sender gets an ACK-only packet back from its peer.
Can anyone help me understand this?
Help would be appreciated
Perhaps more details would be helpful here. What kind of server/client? What protocol is being used and for what purpose?
Is the connection running as expected and this is just viewed as strange traffic you are trying to understand or is the connection timing out?
Some devices or pieces of software can be set to a "No ACK" mode, which means that no ACKs are sent and none are expected.
One reason for this is usually bandwidth. ACKs do consume bandwidth, and there are cases where bandwidth is at such a premium that losing packets is preferable to spending bandwidth on ACKs. That type of traffic would probably do better over UDP, but that is a completely different topic.
Another reason is that you simply don't care if packets are lost. Again, this would be better off as UDP instead of TCP, but someone working within strange constraints may be bending the rules about what type of traffic to advertise in order to get around some issue.
Hopefully this is helpful, but if it does not apply, then please put in more details about the connection so that we can better understand what may be happening.
I'm not sure if this is the correct place to ask, so forgive me if it isn't.
I'm writing computer-monitoring software that needs to connect to a server. The server may send out relatively urgent messages, such as sounding or cancelling an alarm, and the client may send out data about the computer, such as screenshots. The data that the client sends isn't critical on timing, but shouldn't be more than about two minutes late.
It is essential that the software does not require port forwarding to be set up, and it is assumed that the internet connection will almost always be through a wireless router doing NAT.
My idea is to have a TCP connection initiated from the client, and use that to transfer data. Ideally, I would have no data being sent when it is not needed, but I believe this to be impossible. Would sending the equivalent of a ping every now and again keep the connection alive, and what sort of bandwidth would it use if this program was running all the time on the computer? In addition, would it be possible to reduce the header size for these keep-alives?
Before I start designing the communication and programming, is this plan for connection flawed? Are there better alternatives?
Thanks!
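As a rough aside on the bandwidth part of the question: a keep-alive exchange is tiny. A back-of-the-envelope estimate (assuming roughly 40 bytes of IP+TCP headers plus a few bytes of payload per packet, and one probe every two minutes; link-layer overhead ignored):

```python
# Very rough estimate only.
bytes_per_exchange = 2 * (40 + 8)              # probe + reply, ~48 bytes each way
exchanges_per_day = (24 * 60) // 2             # one exchange every two minutes
print(bytes_per_exchange * exchanges_per_day)  # about 69,000 bytes (~69 KB) per day
```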
1) You do not need to send 'ping' data to keep the connection alive; the TCP stack can do that for you. The main reason to send your own 'ping' data is to detect a dropped connection on the client side - typically you only find out something has gone wrong when you try to read from or write to the socket. There may be a way to change various time-outs so you can detect this condition faster (see the sketch after this list).
2) In general, while TCP provides a stream-oriented, error-free channel, it makes no guarantees about timeliness; if you are using it over the internet it is even more unpredictable.
3) For applications such as this (I hope you are making it for ethical purposes) I would tend to use TCP, since you don't want a situation where the client receives the packet that raises an alarm but misses the one that turns it off again.
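On the "change various time-outs" point in 1), a sketch of turning on TCP keep-alive and tightening its timers so a dead connection is noticed in minutes rather than hours (Python; the socket option names shown are the Linux ones, and the endpoint is a placeholder):

```python
import socket

sock = socket.create_connection(("monitor.example.com", 443))    # hypothetical server
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)        # turn keep-alive on

# Linux-specific knobs (names differ on other platforms): start probing after 60s of
# idle, probe every 10s, and declare the peer dead after 5 unanswered probes.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```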
While reading the assignment questions in "Data Communication and Networking" by Behrouz Forouzan, I came across one asking whether using UDP for file transfer has any adverse effects, keeping the process-crash phenomenon in mind.
The given solution said that if a process A asks server X for the contents of a file and crashes soon after the request, another process B may come up on the same port on the same machine (giving it the same socket address) and send a request to the same server for a different file. If that request is lost, the server knows neither that process A crashed nor that B's request was lost, and so it sends B the contents of the file that A asked for.
Why doesn't this problem occur in a video-on-demand service like YouTube or the like?
One of the closest answers I got is this, but it doesn't seem to address my problem:
When is it appropriate to use UDP instead of TCP?
UPDATE: For people who would like to read the question as given in the book, I found an online version of the relevant part; please have a look at the 8th question in the PDF:
http://ceng334.cankaya.edu.tr/uploads/files/file/network%20sample.pdf
In theory the problem could happen but in real life? Not a chance.
Let's say a user wants to stream a video from Youtube with a browser.
Browser must crash - realistically does not happen too often.
New browser instance takes the exact same source UDP port - virtually never happens (see the sketch below).
The user decides to look at a different video - makes no sense.
While all this happens, server side does not time out - I don't think so.
This is like arguing that TCP should be used because a packet might get dropped on the wire when two computers are connected back to back with one meter Ethernet cable.
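To illustrate the "exact same source UDP port" point above: the OS picks source ports from a large ephemeral range, so an immediate repeat is already very unlikely before any of the other conditions are factored in. A quick, OS-dependent demonstration in Python:

```python
import socket

# Bind a few UDP sockets to port 0 and let the OS pick the source port each time.
# On a typical Linux box the ephemeral range spans tens of thousands of ports, so
# getting the same one back-to-back is very unlikely (exact behaviour is OS-dependent).
for _ in range(5):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", 0))
    print(s.getsockname()[1])
    s.close()
```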
Below is an Apache Bench (ab) run of 10K requests with 50 concurrent threads.
I need help understanding the results: does anything stand out that might point to something blocking and restricting the requests per second?
I'm looking at the connection time section and see 'waiting' and 'processing'. It shows the mean time for waiting is 208, the mean time to connect is 0 and processing is 208, yet the total is 208. Can someone explain this, as it doesn't make much sense to me?
Connect time is the time it took ab to establish a connection with your server. You are probably running it on the same server or within the LAN, so your connect time is 0.
Processing time is the total time the server took to process the request and send the complete response.
Wait time is the time between sending the request and receiving the first byte of the response.
Again, since you are running on the same server and the response is small, your processing time is roughly equal to your wait time.
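Putting the question's numbers into that breakdown: Total ≈ Connect + Processing = 0 + 208 ms, and Processing = Waiting + time to transfer the rest of the body ≈ 208 ms + roughly 0 ms for a small response, which is why all three figures come out the same.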
For a realistic benchmark, try running ab from multiple points near your target market to get a real idea of latency. Right now, the wait time is essentially all the information you have.
This question is getting old, but I've run into the same problem so I might as well contribute an answer.
You might benefit from disabling either Nagle's algorithm (TCP_NODELAY) on the agent side, or delayed ACK on the server side. The two can interact badly and cause an unwanted delay; as in my case, that's probably why your minimum time is exactly 200 ms.
I can't confirm, but my understanding is that the problem is cross-platform since it's part of the TCP spec. It might be just for quick connections with a small amount of data sent and received, though I've seen reports of issues for larger transfers too. Maybe somebody who knows TCP better can pitch in.
Reference:
http://en.wikipedia.org/wiki/TCP_delayed_acknowledgment#Problems
http://blogs.technet.com/b/nettracer/archive/2013/01/05/tcp-delayed-ack-combined-with-nagle-algorithm-can-badly-impact-communication-performance.aspx
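If you want to experiment with either change, here is a sketch in Python (TCP_NODELAY is portable; TCP_QUICKACK is Linux-only and is cleared again by the kernel, so in practice it is re-applied after each read; the endpoint is a placeholder):

```python
import socket

sock = socket.create_connection(("example-server.local", 8080))  # hypothetical endpoint

# Agent side: disable Nagle so small writes are sent immediately instead of
# being buffered while waiting for an ACK.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Server side (Linux only): ask for immediate ACKs on this socket. The kernel
# resets this flag, so it is typically set again after each recv().
if hasattr(socket, "TCP_QUICKACK"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
```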
I'm trying to improve the performance of a (virtual) web server with a fairly standard CentOS/Apache setup and one thing I noticed is that new connections seem to "stick" in the SYN_RECV state, sometimes for several seconds, before finally being established and handled by Apache.
My first guess was that Apache could be hitting the limit on the number of connections it's prepared to handle simultaneously. However, with keep-alive off, netstat reports only a few established connections (counting just those not involving localhost, i.e. discarding "housekeeping" connections such as those between Apache and Tomcat), whereas with keep-alive on it will happily get up to 100+ established connections - with no clear difference in the SYN_RECV behaviour either way (there are typically 10-20 connections sitting in SYN_RECV at any one time).
What are people's recommendations for investigating where the bottleneck is that's preventing the connections from being established quickly?
P.S. Follow-on question: does anybody know what a TYPICAL statistic would be for the time for a connection to be established once first "hitting" the server?
Update in case anyone else encounters this: in the end I wrote a small Java program to read /proc/net/tcp and analyse the data, and it appears that this happens for only a small proportion of connections (although that can still mean several connections in this state at any one time, because they can stay that way for a number of seconds), and it looks like an issue local to those connections. Over 90% of connections still go through in under 500 ms, and 81% in under 200 ms. So if others see this, there isn't necessarily a need to panic immediately.
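For anyone wanting to reproduce that analysis without writing a Java program, a rough Python equivalent that counts sockets currently in SYN_RECV (state code 03 in /proc/net/tcp, Linux only):

```python
# Count sockets currently in SYN_RECV by reading /proc/net/tcp.
# The fourth column ("st") holds the TCP state in hex; SYN_RECV is 03.
SYN_RECV = "03"

def count_syn_recv(path="/proc/net/tcp"):
    with open(path) as f:
        next(f)                                   # skip the header line
        return sum(1 for line in f if line.split()[3] == SYN_RECV)

if __name__ == "__main__":
    print(count_syn_recv())
```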
Try capturing a packet trace and see if SYN ACKs are being retransmitted (and the number of re-tx). This could indicate a routing issue (SYN comes in via path A and SYN-ACK goes via path B which is broken).
Also see if these connections have a specific pattern (such as originating from the same network).