I am working on a high-performance TCP server, and I see that the server intermittently does not process fast enough when I pump high traffic at it from a TCP client. On closer inspection, I see spikes in "delta time" on the TCP server, and I see the server sending an ACK and then, 0.8 seconds later, a PSH,ACK for the same seqno. I see this pattern multiple times in the pcap. Can experts comment on why the server sends an ACK followed by a PSH,ACK with a delay in between?
[Screenshot: TCP server pcap]
To simplify what ACK and PSH mean:
ACK will always be present; it simply informs the client of the last byte received by the server.
PSH tells the client/server to push the bytes up to the application layer (the bytes form a full message).
The usual scenario you are used to is more or less the following:
1. The OS has a buffer where it stores data received from the client.
2. As soon as a packet is received, it is added to that buffer.
3. The application calls the socket receive method and takes the data out of the buffer.
4. The application writes data back into the socket (the response).
5. The OS sends a packet with the flags PSH,ACK.
Now imagine these two scenarios:
Step 4 does not happen (the application does not write back any data, or takes too long to write it):
=> the OS acknowledges the reception with just an ACK (the packet will not carry any data); if the application later decides to send something, it will be sent with PSH,ACK.
The message/data sent by the server is too big to fit in one packet:
the first packets will not have the PSH flag and will carry only the ACK flag;
the last packet will have the flags PSH,ACK, to signal the end of the message.
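As a concrete illustration of the first scenario (a minimal sketch, not the original poster's server; the port and the 0.8 s delay are arbitrary), a server whose application layer is slow to respond produces exactly the pattern seen in the pcap: the kernel acknowledges the request almost immediately with a bare ACK, and the PSH,ACK only goes out once the application finally writes its response:

    import socket
    import time

    # Toy server: the pause between recv() and sendall() is what produces a
    # bare ACK first (sent by the kernel) and a separate PSH,ACK later
    # (sent when the application finally writes the response).
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 9000))        # placeholder port
    srv.listen(1)
    conn, addr = srv.accept()
    while True:
        data = conn.recv(4096)
        if not data:
            break
        # By this point the kernel has already ACKed the request; nothing
        # carrying the PSH flag leaves until we actually write.
        time.sleep(0.8)                # simulate slow application processing
        conn.sendall(b"response")      # this segment goes out as PSH,ACK
    conn.close()
    srv.close()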
Related
I am using raw sockets to communicate with a TCP server. For the purposes of my project, I need to emulate a TCP timeout.
Whenever a timeout occurs, the server retransmits the first lost packet. On receiving the ACK for this packet, the server retransmits the second packet and also sends a packet that was previously unsent (due to the F-RTO algorithm). In order to stop F-RTO, I need to send a duplicate ACK for the later packet.
Let's say the congestion window is 20 at the time of the timeout. The server will send packet 1, and I will ACK packet 1. The server will then send packet 2 and packet 21. I will ACK packet 2 and send a duplicate ACK for packet 21 to stop F-RTO. The problem I am having is that although the client sends 2 ACKs, for some unknown reason the server only gets one ACK. As a result it gets stuck in F-RTO.
Wireshark shows the client sending multiple duplicate ACKs, but on the server side I can only see a single ACK. Since the second ACK is a duplicate of the first, their fields and checksums are identical. Can someone please help me out?
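For reference, one way to emit hand-crafted duplicate ACKs from user space is with scapy (a sketch under the assumption that scapy is available, that the script runs with root privileges, and that the kernel's own RST responses for this flow are already suppressed, as is usual in raw-socket experiments; the addresses, ports and sequence numbers below are placeholders):

    from scapy.all import IP, TCP, send

    # Placeholder connection parameters -- substitute the real 4-tuple and the
    # seq/ack numbers observed for the segment you want to duplicate-ACK.
    SRC, DST = "10.0.0.2", "10.0.0.1"
    SPORT, DPORT = 40000, 5001
    SEQ, ACK = 1000, 24000

    dup_ack = IP(src=SRC, dst=DST) / TCP(sport=SPORT, dport=DPORT,
                                         flags="A", seq=SEQ, ack=ACK)
    # Send the same ACK twice; both copies should appear on the wire with
    # identical fields and checksums, which is expected for a duplicate ACK.
    send(dup_ack, count=2)

If the client-side capture shows both ACKs but the server-side capture shows only one, capturing on both ends at once is the quickest way to find the middlebox or offload setting that is swallowing the duplicate.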
(Original title: "Weird TCP connection close behavior")
I am troubleshooting a TCP connection using Wireshark. The client opens a connection to the server (I tried two different servers) and starts receiving a long stream of data. At some point the client wants to stop and sends the server a [FIN, ACK] packet, but the server does not stop sending data; it continues until the end of its own stream and then sends its own completion packet [FIN, PSH, ACK]. I observed this while continuing to read data from the client's socket after the client had sent the FIN packet. Also, after the client sends this FIN packet, its state is FIN_WAIT, i.e. it is waiting for a FIN response from the server...
Why do servers not stop sending data and respond to the FIN packet with an acknowledgment that has FIN set?
I would expect that after the client sends the FIN packet, the server would still send the several packets that were in flight before it received the FIN, but not the whole remainder of the long data stream!
Edit: reading this, I think the web server is stuck in the state "CLOSE-WAIT: The server waits for the application process on its end to signal that it is ready to close" (third row), and its data-sending process "is done" only when it has flushed all contents to the socket at its end, and this process cannot be terminated. Weird.
Edit 1: it appears my question is really a slightly different one. I need to completely terminate the connection on the client's side so that the server stops sending data, does not get upset about the forceful termination from the client's side, and aborts its data-sending thread so that it is ready for the next connection.
Edit 2: the environment is HTTP servers.
The client has only shut down the connection for output; it has not closed it. So the server is fully entitled to keep sending.
If the client had closed the connection, it would issue an RST in response to any further data received, which would stop the server from sending any more, modulo buffering.
Why do servers not stop sending data and respond to the FIN packet with an acknowledgment that has FIN set?
Why should they? The client has said it won't send another request, but that doesn't mean it isn't interested in the response to any requests it has already sent.
Most protocols, such as HTTP, specify that the server should complete the response to the current request and only then close the connection. This is not an abnormal abort, it's just a promise not to send anything else.
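A minimal sketch of the distinction (hypothetical client code, not taken from the question): shutdown(SHUT_WR) sends the FIN but leaves the receive side open, while close() abandons the socket entirely, so further data from the server is answered with an RST:

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("example.com", 80))      # placeholder server
    sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")

    # Half-close: send FIN but keep reading. The server is entitled to keep
    # sending until it has finished its response, so we drain it here.
    sock.shutdown(socket.SHUT_WR)
    while True:
        chunk = sock.recv(4096)
        if not chunk:                      # server's own FIN arrived
            break
    sock.close()

    # By contrast, calling sock.close() right after the request abandons the
    # receive side as well: any further data from the server is answered with
    # an RST by the client's TCP stack, which does abort the transfer.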
We're seeing this pattern a lot between two RHEL 6 boxes that are transferring data over a TCP connection. The client issues a TCP Window Full, and 0.2 s later the client sends TCP Keep-Alives, to which the server responds with what look like correctly shaped responses. The client is not satisfied by this, however, and continues sending TCP Keep-Alives until it finally closes the connection with an RST nearly 9 s later.
This is despite the RHEL boxes having the default TCP Keep-Alive configuration:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
...which says that keep-alives should only start after 2 hours of silence. Am I reading my pcap wrong (the relevant packets are available on request)?
Below is a Wireshark screenshot of the pattern, with my own packet notes in the middle.
Actually, these "keep-alive" packets are not used for TCP keep-alive at all! They are used to detect window-size updates (they are zero-window probes).
Wireshark labels them as keep-alive packets simply because they look like keep-alive packets.
A TCP keep-alive packet is simply an ACK with the sequence number set to one less than the current sequence number for the connection.
(We assume that IP 10.120.67.113 refers to host A and 10.120.67.132 refers to host B.) In packet No. 249511, A acks seq 24507484. In the next packet (No. 249512), B sends seq 24507483 (24507484 - 1).
Why are there so many "keep-alive" packets, and what are they used for?
A sends data to B, and B replies with a zero window size to tell A that it temporarily cannot receive any more data. To make sure A knows when B can receive data again, A sends "keep-alive" packets to B again and again under the persist timer, and B replies with its window size info (in our case, B's window size has remained zero).
The normal TCP exponential backoff is used when calculating the persist timer, so we can see A send its first "keep-alive" packet after 0.2 s, the second after 0.4 s, the third after 0.8 s, the fourth after 1.6 s...
This phenomenon is related to TCP flow control.
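The pattern is easy to reproduce deliberately with a receiver that simply stops reading (a rough sketch assuming a Linux host; the buffer size and port are arbitrary):

    import socket
    import time

    # Receiver that advertises a small window and then never reads: once its
    # buffer fills, it advertises a zero window, and the sender's persist timer
    # starts emitting the probes that Wireshark labels "TCP Keep-Alive".
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)  # shrink the window
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 9001))
    srv.listen(1)
    conn, _ = srv.accept()

    time.sleep(60)   # never call conn.recv(); watch the probes in a capture
    conn.close()
    srv.close()

With a sender continuously writing to this socket, a capture will show the zero-window advertisements followed by probes whose spacing backs off exponentially, as described above.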
The source and destination IP addresses in the packets originating from the client do not match the destination and source IP addresses in the response packets, which indicates that there is some device between the boxes doing NAT. It is also important to understand where the packets were captured; a capture taken on the client itself would probably help in understanding the issue.
Please note that the client can generate TCP keepalives if it does not receive a data packet for two hours or more. As per RFC 1122, the client retries the keepalive if it does not receive a keepalive response from the peer, and it eventually disconnects after repeated retry failures.
NAT devices typically implement connection caches to maintain the state of ongoing connections. If the number of cached connections reaches its limit, the NAT device drops old connections in order to service new ones. This could also lead to such a scenario.
The given packet capture indicates a high probability that packets are not reaching the client, so it will be helpful to capture packets on the client machine.
I read the trace slightly differently:
The sender sends more data than the receiver can handle and gets a zero-window response.
The sender then sends window probes (not keep-alives; it is far too soon for that), and the application gives up after 10 seconds with no progress and closes the connection. The reset indicates that there is data pending in the TCP send buffer.
If the application uses a large block size when writing to the socket, it may have seen no progress for even longer than the 10 seconds visible in the tcpdump.
If this is a straight connection (no proxies etc.), the most likely reason is that the receiving application stopped receiving (or is slower than the sender and the rate of data transmission).
It looks to me like packet number 249522 provoked the application on 10.120.67.113 to abort the connection. All the window probes get a zero-window response from .132 (with no payload), and then .132 sends the (unsolicited) packet 249522 with 63 bytes (while still advertising a 0 window). The PSH flag suggests that these 63 bytes are the entire data written by the app on .132. Then .113, within the same millisecond, responds with an RST. I can't think of any reason why the TCP stack would send an RST immediately after receiving data (the sequence numbers are correct), so in my view it is almost certain that the app on .113 decided to give up based on the 63-byte message sent by .132.
I am preparing for my university exam, and one of the questions last year was "how to make UDP multicast reliable" (like TCP: retransmission of lost packets).
I thought about something like this :
The server sends the multicast using UDP.
Every client sends an acknowledgement of the packets it received (using TCP).
If the server realizes that not everyone received the packets, it resends them via multicast, or via unicast to the particular client.
The problem is that there might be one client that regularly loses packets and forces the server to resend.
Is this a good approach?
Every client sends an acknowledgement of the packets it received (using TCP)
Sending an ACK for each packet, and using TCP to do so, does not scale to a large number of receivers. Using a NACK-based scheme is more efficient.
Each packet sent by the server should have a sequence number associated with it. As clients receive packets, they keep track of which sequence numbers they missed. If packets are missed, a NACK message can then be sent back to the server via UDP. This NACK can be formatted either as a list of sequence numbers or as a bitmap of received/not-received sequence numbers.
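A rough sketch of that receiver-side bookkeeping (the bitmap format here is an illustrative assumption; real protocols such as NORM define their own message formats):

    # Receiver-side tracking of missed sequence numbers, with a NACK built as
    # a bitmap over a range of sequence numbers.
    received = set()

    def on_packet(seq):
        received.add(seq)

    def build_nack(first_seq, last_seq):
        """Bitmap NACK: bit i is set if packet first_seq + i is missing."""
        bits = bytearray((last_seq - first_seq + 8) // 8)
        for seq in range(first_seq, last_seq + 1):
            if seq not in received:
                bits[(seq - first_seq) // 8] |= 1 << ((seq - first_seq) % 8)
        return bytes(bits)

    # Example: packets 0..9 were sent, 3 and 7 were lost in transit.
    for s in [0, 1, 2, 4, 5, 6, 8, 9]:
        on_packet(s)
    nack = build_nack(0, 9)   # bits 3 and 7 set; send this back to the server over UDP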
If the server realizes that not everyone received the packets, it resends them via multicast, or via unicast to the particular client
When the server receives a NACK, it should not immediately resend the missing packets but should wait for some period of time, typically a multiple of the GRTT (Group Round Trip Time, the largest round-trip time among the receiver set). That gives it time to accumulate NACKs from other receivers. Then the server can multicast the missing packets so that any clients that missed them can receive them.
If this scheme is being used for file transfer as opposed to streaming data, the server can alternatively send the file data in passes. The complete file is sent on the first pass, during which any NACKs that are received are accumulated and the packets that need to be resent are marked. Then, on subsequent passes, only retransmissions are sent. This has the advantage that clients with lower loss rates get the opportunity to finish receiving the file while high-loss receivers continue to receive retransmissions.
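A sketch of that server-side hold-off (the GRTT value and the NACK wire format are placeholders; in practice the GRTT is measured across the group rather than hard-coded):

    import select
    import socket
    import time

    GRTT = 0.5                   # placeholder; normally measured, not fixed
    missing = set()              # sequence numbers to retransmit this round

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9002)) # where clients send their NACKs

    deadline = time.time() + 2 * GRTT            # accumulate NACKs for a while
    while time.time() < deadline:
        ready, _, _ = select.select([sock], [], [], max(0.0, deadline - time.time()))
        if ready:
            data, addr = sock.recvfrom(2048)
            # Assume each NACK is a comma-separated list of missing seq numbers.
            missing.update(int(s) for s in data.decode().split(",") if s)

    # After the hold-off, multicast each missing packet exactly once, however
    # many clients NACKed it.
    for seq in sorted(missing):
        print("retransmit", seq)  # placeholder for the actual multicast resend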
The problem is that there might be one client that regularly loses packets and forces the server to resend.
For very-high-loss clients, the server can set a threshold for the maximum percentage of packets missed. If a client sends back NACKs in excess of that threshold one or more times (how many times is up to the server), the server can drop that client and either stop accepting its NACKs or send it a message informing it that it has been dropped.
There are a number of protocols which implement these features:
UFTP - Encrypted UDP based FTP with multicast (disclosure: author)
NORM - NACK-Oriented Reliable Multicast
PGM - Pragmatic General Multicast
UDPCast
Relevant RFCs:
RFC 4654 - TCP-Friendly Multicast Congestion Control (TFMCC): Protocol Specification
RFC 5401 - Multicast Negative-Acknowledgment (NACK) Building Blocks
RFC 5740 - NACK-Oriented Reliable Multicast (NORM) Transport Protocol
RFC 3208 - PGM Reliable Transport Protocol Specification
To make UDP reliable, you have to handle a few things yourself (i.e., implement them on top of UDP).
Connection handling: the connection between the sending and receiving processes can drop. Most reliable implementations send keep-alive messages to maintain the connection between the two ends.
Sequencing: messages need to be split into chunks before sending, and each chunk needs a sequence number.
Acknowledgement: after each message is received, an ACK message needs to be sent to the sending process. These ACK messages can also be sent over UDP; they do not have to go over TCP. The receiving process might realize that it has lost a message. In that case, it stops delivering messages from the hold-back queue (the queue that holds received messages, like a waiting room for messages) and requests a retransmission of the missing message (see the sketch after this list).
Flow control: throttle the sending of data based on the ability of the receiving process to consume it.
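For example, the sequencing and acknowledgement pieces can be sketched as a simple stop-and-wait sender over UDP (a toy illustration only; the 4-byte sequence-number header, the timeout, and the retry count are all assumptions, and a matching receiver would have to echo each sequence number back):

    import socket
    import struct

    def send_reliable(sock, dest, payload, chunk_size=1024, timeout=0.5, retries=5):
        """Split payload into chunks, prefix each with a 4-byte sequence number,
        and retransmit a chunk until the peer echoes that sequence number back."""
        sock.settimeout(timeout)
        chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
        for seq, chunk in enumerate(chunks):
            packet = struct.pack("!I", seq) + chunk
            for _ in range(retries):
                sock.sendto(packet, dest)
                try:
                    ack, _ = sock.recvfrom(16)
                    if struct.unpack("!I", ack[:4])[0] == seq:
                        break                 # chunk ACKed; move to the next one
                except socket.timeout:
                    continue                  # lost packet or lost ACK: retransmit
            else:
                raise IOError("chunk %d was never acknowledged" % seq)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_reliable(sock, ("127.0.0.1", 9003), b"x" * 5000)  # placeholder peer address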
Usually, processes are organized into groups; each group has a leader and a view of the entire group. This is called virtual synchrony.
So I have a remote device using a Lantronics XPort module connecting to a VPS. They establish a TCP connection and everything is great. The server ACKs everything.
At some point the remote device stops transmitting data. 30 seconds goes by.
The device then starts sending SYN packets as if trying to establish a new connection. The device is configured to maintain a connection to the server, and it always uses the same source port. (I realize this is bad, but it is hard for me to change)
The server sees a SYN packet from the same (source ip, source port), so the server thinks the connection is ESTABLISHED. The server does not respond to the SYN packet.
Why does the server not respond with ACK as described in Figure 10 in RFC 793? ( https://www.ietf.org/rfc/rfc793.txt )
How can I get the server to kill the connection or respond with an ACK?
It could be the case that during that 30 second silence, the device is waiting for an ACK from the server, and that ACK was dropped somewhere along the line. In this case, I think it should retransmit.
Server is running Ubuntu with kernel 3.12.9-x86_64-linode37
Thank you for any help!
My first suggestion is to change the client so that it reuses the same connection, or gracefully closes the connection before re-opening it.
Since you DO NOT have control over the client and everything you can do must happen on the server, you can try this:
Configure keep-alive to be sent after 10 seconds of silence, with only one probe. If the client does not respond, the server closes the connection. By doing this, the server should have torn down the stale connection within roughly 10 seconds of silence with no response from the client, so the new SYN can be accepted. You can play with the following sysctls and arrive at optimal values (note that the server application must also enable SO_KEEPALIVE on its sockets for these settings to take effect):
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 1
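If changing the system-wide sysctls is not an option, the same effect can be achieved per socket. A sketch for a Linux server, written in Python (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific socket options; the port is a placeholder):

    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 9004))   # placeholder port
    srv.listen(5)
    conn, addr = srv.accept()

    # Enable keep-alive on this connection only: first probe after 10 s of
    # silence, and give up (tearing down the connection) if that single probe
    # is not answered.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 1)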
======
Also, regarding the missing ACK that you mentioned in your question: TCP takes care of those things on its own. 30 seconds is far too long for the sender's first retransmission. If the client/device does not get an ACK, it should retransmit rather than try to open a new connection after 30 seconds. If you are seeing that, the TCP stack on the client is broken. What is that device, and which OS/TCP stack is it using?
The behavior depends on the kernel version: kernel 3.12.9-x86_64 ignores a SYN packet arriving on an ESTABLISHED connection, whereas on kernel 4.9.0 the server answers with an ACK, the client responds to that ACK with an RST, and then sends a new SYN.
Related: incoming-tcp-syns-possibilities; TCP packets ignored on ESTABLISHED connection