In the TCP three way handshake connection procedure does the client (the one who initiated the connection) send to the server any data payload joined with the ACK packet in the third step ?
The last ACK in the TCP handshake can already contain a payload. But, this is usually not done since the application first calls connect and then will either wait for the server to reply or send its first data. Since the kernel does not know what the application will do next it will already send out the ACK within the connect so that the server knows as fast as possible that the connection is established.
Depending on your OS it might be possible to change this behavior and send the ACK together with the first data. In Linux this can be achieved by explicitly disabling quick ack before connecting:
int off = 0;
setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &off, sizeof(off));
connect(fd,...)
Related
I am working on a high-performance TCP server, and I see the server not processing fast enough on and off when I pump high traffic using a TCP client. Upon close inspection, I see spikes in "delta time" on the TCP server. And, I see the server sending an ACK and 0.8 seconds later sending PSH,ACK for the same seqno. I am seeing this pattern multiple times in the pcap. Can experts comment on why the server is sending an ACK followed by a PSH,ACK with a delay in between?
TCP SERVER PCAP
To simplify what ACK and PSH means
ACK will always be present, it simply informs the client what was the last received byte by the server.
PSH tells the client/server to push the bytes to the application layer (the bytes forms a full message).
The usual scenario you are used to, is more or less the following:
The OS has a buffer where it stores received data from the client.
As soon as a packet is received, it is added to the buffer.
The application calls the socket receive method and takes the data out of the buffer
The application writes back data into the socket (response)
the OS sends a packet with flags PSH,ACK
Now imagine those scenarios:
step 4 does not happen (application does not write back any data, or takes too long to write it)
=> OS acknowledge the reception with just an ACK (the packet will not have any data in it), if the application decides later on to send something, it will be sent with PSH,ACK.
the message/data sent by the server is too big to fit in one packet:
the first packets will not have PSH flag, and will only have the ACK flag
the the last packet will have the flags PSH,ACK, to inform the end of the message.
We're seeing this pattern happen a lot between two RHEL 6 boxes that are transferring data via a TCP connection. The client issues a TCP Window Full, 0.2s later the client sends TCP Keep-Alives, to which the server responds with what look like correctly shaped responses. The client is unsatisfied by this however and continues sending TCP Keep-Alives until it finally closes the connection with an RST nearly 9s later.
This is despite the RHEL boxes having the default TCP Keep-Alive configuration:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
...which declares that this should only occur until 2hrs of silence. Am I reading my PCAP wrong (relevant packets available on request)?
Below is Wireshark screenshot of the pattern, with my own packet notes in the middle.
Actually, these "keep-alive" packets are not used for TCP keep-alive! They are used for window size updates detection.
Wireshark treats them as keep-alive packets just because these packets look like keep-alive packet.
A TCP keep-alive packet is simply an ACK with the sequence number set to one less than the current sequence number for the connection.
(We assume that ip 10.120.67.113 refers to host A, 10.120.67.132 refers to host B.) In packet No.249511, A acks seq 24507484. In next packet(No.249512), B send seq 24507483(24507484-1).
Why there are so many "keep-alive" packets, what are they used for?
A sends data to B, and B replies zero-window size to tell A that he temporarily can't receive data anymore. In order to assure that A knows when B can receive data again, A send "keep-alive" packet to B again and again with persistence timer, B replies to A with his window size info (In our case, B's window size has always been zero).
And the normal TCP exponential backoff is used when calculating the persist timer. So we can see that A send its first "keep-alive" packet after 0.2s, send its second packet after 0.4s, the third is sent after 0.8, the fouth is sent after 1.6s...
This phenomenon is related to TCP flow control.
The source and destination IP addresses in the packets originating from client do not match the destination and source IP addresses in the response packets, which indicates that there is some device in between the boxes doing NAT. It is also important to understand where the packets have been captured. Probably a packet capture on the client itself will help understand the issue.
Please note that the client can generate TCP keepalive if it does not receive a data packet for two hours or more. As per RFC 1122, the client retries keepalive if it does not receive a keepalive response from the peer. It eventually disconnects after continuous retry failure.
The NAT devices typically implement connection caches to maintain the state of ongoing connections. If the size of the connection reaches limit, the NAT devices drops old connections in order to service the new connections. This could also lead to such a scenario.
The given packet capture indicates that there is a high probability that packets are not reaching the client, so it will be helpful to capture packets on client machine.
I read the trace slightly differently:
Sender sends more data than receiver can handle and gets zerowindow response
Sender sends window probes (not keepalives it is way to soon for that) and the application gives up after 10 seconds with no progress and closes the connection, the reset indicates there is data pending in the TCP sendbuffer.
If the application uses a large blocksize writing to the socket it may have seen no progress for more than the 10 seconds seen in the tcpdump.
If this is a straight connection (no proxies etc.) the most likely reason is that the receiving up stop receiving (or is slower than the sender & data transmission)
It looks to me like packet number 249522 provoked the application on 10.120.67.113 to abort the connection. All the window probes get a zero window response from .132 (with no payload) and then .132 sends (unsolicited) packet 249522 with 63 bytes (and still showing 0 window). The PSH flag suggests that this 63 bytes is the entire data written by the app on .132. Then .113 in the same millisecond responds with an RST. I can't think of any reason why the TCP stack would send a RST immediately after receiving data (sequence numbers are correct). In my view it is almost certain that the app on .113 decided to give up based on the 63 byte message sent by .132.
So I have a remote device using a Lantronics XPort module connecting to a VPS. They establish a TCP connection and everything is great. The server ACKs everything.
At some point the remote device stops transmitting data. 30 seconds goes by.
The device then starts sending SYN packets as if trying to establish a new connection. The device is configured to maintain a connection to the server, and it always uses the same source port. (I realize this is bad, but it is hard for me to change)
The server sees a SYN packet from the same (source ip, source port), so the server thinks the connection is ESTABLISHED. The server does not respond to the SYN packet.
Why does the server not respond with ACK as described in Figure 10 in RFC 793? ( https://www.ietf.org/rfc/rfc793.txt )
How can I get the server to kill the connection or respond with an ACK?
It could be the case that during that 30 second silence, the device is waiting for an ACK from the server, and that ACK was dropped somewhere along the line. In this case, I think it should retransmit.
Server is running Ubuntu with kernel 3.12.9-x86_64-linode37
Thank you for any help!
My first suggestion is change the client to use the same connection or to gracefully close the connection before re-opening.
As you DO NOT have control over client and all that can do is on server, you can try this:
Configure keep-alive to be sent after 10 secs of silence and probe only once. If client does not respond, server closes the connection. By doing this, the server should be in listening mode again within 10 seconds of silence without client responding. You can play with the following sysctl's and arrive at optimal values.
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 1
======
Also, regarding missing-ack that you have mentioned in your question, TCP takes care of those things. 30 seconds is too long a time for the first re-transmission from sender. If client/device does not get an ack after 30 seconds, it will/should not try to open a new connection. If you are seeing that, it is an insane-TCP stack at the client. What is that device and which OS/TCP-stack is it using?
it is kernel version has different behavior, ignore any syn packet in kernel 3.12.9-x86_64. but server ack a ack packet, client receive the ack resent rst, and sent new syn in kernel 4.9.0.
incoming-tcp-syns-possibilities
TCP packets ignored on ESTABLISHED connection
Four-way handshake connection termination can be reduced to three-way and even two way one. Is it possible the three-way handshake connection establishment would be extended to four-way?
SYN=>
<=ACK
<=SYN
ACK=>
Given the semantics of SYN and ACK it should be possible to send SYN+ACK in different packets and those delay the handshake. E.g. client sends a SYN, server replies with an ACK to acknowledge the wish of the client for a new connection, but it does not grant the wish yet. Later the server sends a SYN and gets the matching ACK back from the client and the connection is established. But I doubt that anybody does connection establishment this way and it might be, that some OS will croak on it.
But, there is another scenario for a four-way-handshake, however with a different ordering of the packets. It could happen, if both side try to establish a connection to the other side at the same time, e.g. both send a SYN to the peer, and get an ACK back. It is described in the RFC 793 (TCP) section 3.4. But I doubt you will ever see such a handshake, because it does not fit into the typical client-server-scenario where one end is waiting for connects and the other end does the connect.
Edit: the handshake you envision exists and it is called "split handshake". See http://hackmageddon.com/2011/04/17/tcp-split-handshake-attack-explained/ . And like I expected, it is not universally supported.
Iam trying to create an iterative server based on datagram sockets (UDP).
It calls connect to the first client which it gets from the first recvfrom() call (yes I know this is no real connect).
After having served this client, I disconnect the UDP socket (calling connect with AF_UNSPEC)
Then I call recvfrom() to get the first packet from the next client.
Now the problem is, that the call of recvfrom() in the second iteration of the loop is returning 0. My clients never send empty packets, so what could be going on.
This is what Iam doing (pseudocode):
s = socket(PF_INET, SOCK_DGRAM, 0)
bind(s)
for(;;)
{
recvfrom(s, header, &client_address) // get first packet from client
connect(s,client_address) // connect to this client
serve_client(s);
connect(s, AF_UNSPEC); // disconnect, ready to serve next client
}
EDIT: I found the bug in my client accidently sending an empty packet.
Now my problem is how to make the client wait to get served instead of sending a request into nowhere (server is connected to another client and doesn't serve any other client yet).
connect() is really completely unnecessary on SOCK_DGRAM.
Calling connect does not stop you receiving packets from other hosts, nor does it stop you sending them. Just don't bother, it's not really helpful.
CORRECTION: yes, apparently it does stop you receiving packets from other hosts. But doing this in a server is a bit silly because any other clients would be locked out while you were connect()ed to one. Also you'll still need to catch "chaff" which float around. There are probably some race conditions associated with connect() on a DGRAM socket - what happens if you call connect and packets from other hosts are already in the buffer?
Also, 0 is a valid return value from recvfrom(), as empty (no data) packets are valid and can exist (indeed, people often use them). So you can't check whether something has succeeded that way.
In all likelihood, a zero byte packet was in the queue already.
Your protocol should be engineered to minimise the chance of an errant datagram being misinterpreted; for this reason I'd suggest you don't use empty datagrams, and use a magic number instead.
UDP applications MUST be capable of recognising "chaff" packets and dropping them; they will turn up sooner or later.
man connect:
...
If the initiating socket is not connection-mode, then connect()
shall set the socket’s peer address, and no connection is made.
For SOCK_DGRAM sockets, the peer address identifies where all
datagrams are sent on subsequent send() functions, and limits
the remote sender for subsequent recv() functions. If address
is a null address for the protocol, the socket’s peer address
shall be reset.
...
Just a correction in case anyone stumbples across this like I did. To disconnect connect() needs to be called with the sa_family member of sockaddr set to AF_UNSPEC. Not just passed AF_UNSPEC.