I'm troubleshooting some communications issues and in the network traces I am occasionally coming across TCP sequence errors. One example I've got is:
Server to Client: Seq=3174, Len=50
Client to Server: Ack=3224
Server to Client: Seq=3224, Len=50
Client to Server: Ack=3224
Server to Client: Seq=3274, Len=10
Client to Server: Ack=3224, SLE=3274, SRE=3284
Packets 4 & 5 are recorded in the trace (which is from a router in between the client and server) at almost exactly the same time so they most likely crossed in transit.
The TCP session has got out of sync, with the client missing the last two transmissions from the server. Those two packets should have been retransmitted but they weren't; the next entry in the trace is an RST packet from the client 24 seconds after packet 6.
My question is: what could be responsible for the failure to retransmit the server data from packets 3 & 5? I would assume that retransmission happens at the operating-system level, but is there any way the application could influence it and stop it from being sent? A thread blocking, or being put to sleep, or something like that?
Only one packet has been lost from server to client - packet 3. Packet 6 contains a selective acknowledgement (SACK) for packet 5, so that got through.
In answer to your specific question, no, application-level issues shouldn't prevent TCP retransmissions.
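For what it's worth, the hole is visible directly in packet 6: the cumulative Ack says everything below 3224 arrived, and the SACK block (SLE=3274, SRE=3284) says bytes 3274..3283 arrived too, so only bytes 3224..3273 (packet 3) are outstanding. A throwaway sketch of that arithmetic in Python, assuming SACK blocks are given as (left edge, right edge) pairs with the right edge exclusive (the function name is mine):

def missing_ranges(ack, sack_blocks):
    # Return the byte ranges the receiver still needs, given the cumulative
    # ACK and a list of (left_edge, right_edge) SACK blocks.
    missing, expected = [], ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            missing.append((expected, left))   # hole below this SACK block
        expected = max(expected, right)
    return missing

print(missing_ranges(3224, [(3274, 3284)]))    # [(3224, 3274)] -> packet 3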
I am using raw sockets to communicate with a TCP server. For the purposes of my project, I need to emulate a TCP timeout.
Whenever a timeout occurs, the server retransmits the first lost packet. On receiving the ACK for this packet, the server retransmits the second packet and also sends a packet that was previously unseen (due to the F-RTO algorithm). In order to stop F-RTO, I need to send a duplicate ACK for the latter packet.
Let's say the congestion window is 20 at the time of the timeout. The server will send packet 1 and I will ACK packet 1. The server will then send packet 2 and packet 21. I will ACK packet 2 and send a duplicate ACK for packet 21 to stop F-RTO. The problem I am having is that although the client is sending 2 ACKs, for some unknown reason the server is only getting one ACK. As a result it gets stuck in F-RTO.
Wireshark shows the client sending multiple duplicate ACKs, but on the server side I can only see a single ACK. Since the second ACK is a duplicate of the first one, their fields and checksums are the same. Can someone please help me out?
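For reference, the pair of ACKs described above could be crafted like this with scapy (the question only says raw sockets, so scapy and all of the addresses, ports and numbers below are assumptions for illustration); since the duplicate carries the same ack number as the previous ACK, the two packets are byte-for-byte identical on the wire:

from scapy.all import IP, TCP, send

def ack_then_dup_ack(src, dst, sport, dport, seq, ack):
    # Bare ACK from the client's point of view; seq/ack are the client's
    # current send and expected-receive sequence numbers.
    pkt = IP(src=src, dst=dst) / TCP(sport=sport, dport=dport,
                                     flags="A", seq=seq, ack=ack)
    send(pkt, verbose=False)   # ACK acknowledging the retransmitted packet 2
    send(pkt, verbose=False)   # identical duplicate ACK triggered by out-of-order packet 21

ack_then_dup_ack("192.0.2.10", "192.0.2.20", 40000, 5001, seq=1001, ack=3001)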
I am very new to networking, so this might sound simple. Though I have tried to look here and here and here and have picked up a few basics of TCP, there are a few questions whose answers I am not certain about.
Are a request and a response part of 2 different TCP establishments? To explain:
Is a connection established, kept alive until all packets are delivered, the request sent, and the connection closed, for each request, with the same happening for its response?
or
Is a connection opened, the request sent, the connection kept alive, the response received, and then the connection closed?
Is the ACK number always 1 + the sequence number of the sent segment?
Are a request and a response part of 2 different TCP establishments?
You need just 3 packets to perform the handshake and establish a bidirectional TCP connection. So no, you do not establish separate TCP connections for the sending and receiving parts.
On the other hand, there is a shutdown() system call which allows you to shut down one direction of the bidirectional connection. See man shutdown(2). So it is possible to end up with a unidirectional connection by opening a bidirectional one and then shutting down one of the sides.
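A minimal sketch of that half-close in Python (the answer itself is language-agnostic; the host, port and request bytes are placeholders):

import socket

s = socket.create_connection(("example.com", 80))
s.sendall(b"...request bytes...")
s.shutdown(socket.SHUT_WR)        # send FIN: we are done talking
while True:
    chunk = s.recv(4096)          # but we can still listen
    if not chunk:                 # the peer's FIN shows up as EOF
        break
s.close()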
Is the ACK number always 1 + the sequence number of the sent segment?
TCP does not usually send an ACK for every received packet, and there are also selective ACKs, retransmissions, etc. A cumulative ACK acknowledges seq + the number of in-order data bytes received; the "+ 1" case applies to SYN and FIN, which each consume one sequence number, or to a segment carrying exactly one byte. So in general the answer is no, the ACK number is not always seq + 1.
On the other hand, if you are sending a small amount of data and waiting for the confirmation, and no errors or packet loss occur, most probably there will be just one segment carrying that data and a single ACK covering all of it.
Hope that helps.
(Original title: "Weird TCP connection close behavior")
I am troubleshooting the TCP connection process using Wireshark. The client opens a connection to the server (I tried two different servers) and starts receiving a long stream of data. At some point the client wants to stop and sends the server a [FIN, ACK] packet, but the server does not stop sending data; it continues until the end of its own full stream, and only then sends its own completion packet [FIN, PSH, ACK]. I figured this out by continuing to read data from the client's socket after the client had sent the FIN packet. Also, after the client sends this FIN packet, its state is FIN_WAIT, so it is waiting for a FIN response from the server...
Why do servers not stop sending data and respond to the FIN packet with an acknowledgment that has FIN set?
I would expect that after the client sends the FIN packet, the server would still send the several packets that were in flight before it received the FIN, but not the whole remainder of the long data stream!
Edit: reading this, I think the web server is stuck in the stage "CLOSE-WAIT: the server waits for the application process on its end to signal that it is ready to close" (third row), and its data-sending process "is done" only once it has flushed all its contents to the socket at its end, and this process cannot be terminated. Weird.
Edit 1: it appears my question is really a slightly different one. I need to completely terminate the connection on the client's side, so that the server stops sending data, does not go crazy about the forceful termination from the client's side, and aborts its data-sending thread so that it is ready for the next connection.
Edit 2: the environment is HTTP servers.
The client has only shut down the connection for output, not closed it. So the server is fully entitled to keep sending.
If the client had closed the connection, it would issue an RST in response to any further data received, which would stop the server from sending any more, modulo buffering.
Why do servers not stop sending data and respond to the FIN packet with an acknowledgment that has FIN set?
Why should they? The client has said it won't send another request, but that doesn't mean it isn't interested in the response to any requests it has already sent.
Most protocols, such as HTTP, specify that the server should complete the response to the current request and only then close the connection. The client's FIN is not an abnormal abort, it's just a promise not to send anything else.
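On the "completely terminate" goal from Edit 1: the usual way to make close() emit an RST (which, as noted above, does stop the server, modulo buffering) is an abortive close via SO_LINGER with a zero timeout. A minimal sketch in Python, assuming you control the client code; the endpoint is a placeholder:

import socket
import struct

s = socket.create_connection(("example.com", 80))
# ... send the request, read as much of the response as we want ...
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
             struct.pack("ii", 1, 0))   # l_onoff=1, l_linger=0
s.close()                               # sends RST instead of FIN; unsent/unread data is discarded

The server's next read or write on that socket then fails with a connection-reset error, which its data-sending code has to be prepared to handle.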
We're seeing this pattern happen a lot between two RHEL 6 boxes that are transferring data via a TCP connection. The client issues a TCP Window Full; 0.2 s later the client sends TCP Keep-Alives, to which the server responds with what look like correctly shaped responses. The client is unsatisfied by this, however, and continues sending TCP Keep-Alives until it finally closes the connection with an RST nearly 9 s later.
This is despite the RHEL boxes having the default TCP Keep-Alive configuration:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
...which says that this should only start after 2 hours of silence. Am I reading my PCAP wrong (relevant packets available on request)?
Below is a Wireshark screenshot of the pattern, with my own packet notes in the middle.
Actually, these "keep-alive" packets are not used for TCP keep-alive! They are used to probe for window size updates.
Wireshark flags them as keep-alive packets just because they look like keep-alive packets.
A TCP keep-alive packet is simply an ACK with the sequence number set to one less than the current sequence number for the connection.
(Assume that IP 10.120.67.113 is host A and 10.120.67.132 is host B.) In packet No. 249511, A acks seq 24507484. In the next packet (No. 249512), B sends seq 24507483 (24507484 - 1).
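As a concrete picture of that definition, this is roughly the shape Wireshark is matching on (scapy notation purely for illustration; the ports and the ack value are made-up placeholders, and in reality the stack generates these probes itself):

from scapy.all import IP, TCP

probe = IP(src="10.120.67.132", dst="10.120.67.113") / TCP(
    sport=34567, dport=5000, flags="A",
    seq=24507483,   # one less than the current sequence number (24507484 - 1)
    ack=1)          # whatever the sender currently expects from the peer; no payload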
Why are there so many "keep-alive" packets, and what are they used for?
A sends data to B, and B replies with a zero window size to tell A that it temporarily cannot receive any more data. To make sure that A knows when B can receive data again, A sends these "keep-alive" packets to B again and again on the persist timer, and B replies to A with its window size info (in our case, B's window size has stayed at zero).
The normal TCP exponential backoff is used when calculating the persist timer. So we can see that A sends its first "keep-alive" packet after 0.2 s, the second after 0.4 s, the third after 0.8 s, the fourth after 1.6 s...
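As a rough sketch of that doubling (illustration only; a real stack clamps the persist timer between configured minimum and maximum values):

def persist_delays(initial=0.2, cap=60.0, count=6):
    # Successive window-probe delays: exponential backoff up to a cap.
    delays, d = [], initial
    for _ in range(count):
        delays.append(d)
        d = min(d * 2, cap)
    return delays

print(persist_delays())   # [0.2, 0.4, 0.8, 1.6, 3.2, 6.4]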
This phenomenon is related to TCP flow control.
The source and destination IP addresses in the packets originating from the client do not match the destination and source IP addresses in the response packets, which indicates that there is some device between the boxes doing NAT. It is also important to understand where the packets were captured. A packet capture on the client itself will probably help in understanding the issue.
Please note that the client can generate TCP keepalives if it does not receive a data packet for two hours or more. As per RFC 1122, the client retries the keepalive if it does not receive a keepalive response from the peer, and it eventually disconnects after continuous retry failures.
NAT devices typically implement connection caches to maintain the state of ongoing connections. If the cache reaches its size limit, the NAT device drops old connections in order to service new ones. This could also lead to such a scenario.
The given packet capture indicates a high probability that packets are not reaching the client, so it will be helpful to capture packets on the client machine.
I read the trace slightly differently:
The sender sends more data than the receiver can handle and gets a zero-window response.
The sender sends window probes (not keepalives; it is way too soon for that), and the application gives up after 10 seconds with no progress and closes the connection; the reset indicates there is data pending in the TCP send buffer.
If the application uses a large block size when writing to the socket, it may have seen no progress for longer than the 10 seconds visible in the tcpdump.
If this is a straight connection (no proxies etc.), the most likely reason is that the receiving app stopped receiving (or is slower than the sender and the data transmission).
It looks to me like packet number 249522 provoked the application on 10.120.67.113 to abort the connection. All the window probes get a zero-window response from .132 (with no payload), and then .132 sends the unsolicited packet 249522 with 63 bytes (and still showing a zero window). The PSH flag suggests that these 63 bytes are the entire data written by the app on .132. Then .113, in the same millisecond, responds with an RST. I can't think of any reason why the TCP stack would send an RST immediately after receiving data (the sequence numbers are correct). In my view it is almost certain that the app on .113 decided to give up based on the 63-byte message sent by .132.
So I have a remote device using a Lantronix XPort module connecting to a VPS. They establish a TCP connection and everything is great. The server ACKs everything.
At some point the remote device stops transmitting data. 30 seconds goes by.
The device then starts sending SYN packets as if trying to establish a new connection. The device is configured to maintain a connection to the server, and it always uses the same source port. (I realize this is bad, but it is hard for me to change)
The server sees a SYN packet from the same (source ip, source port), so the server thinks the connection is ESTABLISHED. The server does not respond to the SYN packet.
Why does the server not respond with ACK as described in Figure 10 in RFC 793? ( https://www.ietf.org/rfc/rfc793.txt )
How can I get the server to kill the connection or respond with an ACK?
It could be the case that during that 30 second silence, the device is waiting for an ACK from the server, and that ACK was dropped somewhere along the line. In this case, I think it should retransmit.
Server is running Ubuntu with kernel 3.12.9-x86_64-linode37
Thank you for any help!
My first suggestion is to change the client to reuse the same connection, or to gracefully close the connection before re-opening it.
As you DO NOT have control over the client and all you can do is on the server, you can try this:
Configure keep-alive probes to be sent after 10 seconds of silence and to probe only once. If the client does not respond, the server closes the connection. By doing this, the server should be ready to accept a new connection from the device shortly after 10 seconds of silence with no response from the client. You can play with the following sysctls and arrive at optimal values.
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 1
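These sysctls only take effect on sockets that enable SO_KEEPALIVE. If you also control the server program, the same values can be set per socket instead of system-wide; a sketch in Python with the Linux option names (the helper name is mine):

import socket

def enable_keepalive(sock, idle=10, interval=10, probes=1):
    # Per-socket keep-alive, roughly matching the sysctls above (Linux option names).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # seconds of idle before the first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # failed probes before the connection is reset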
======
Also, regarding the missing ACK that you mentioned in your question, TCP takes care of those things. 30 seconds is too long a time for the first retransmission from the sender. If the client/device does not get an ACK after 30 seconds, it will/should not try to open a new connection. If you are seeing that, it is an insane TCP stack at the client. What is that device, and which OS/TCP stack is it using?
The behavior depends on the kernel version: kernel 3.12.9-x86_64 ignores a SYN packet that arrives on an ESTABLISHED connection, but on kernel 4.9.0 the server answers with an ACK, the client responds to that ACK with an RST, and then sends a new SYN.
incoming-tcp-syns-possibilities
TCP packets ignored on ESTABLISHED connection