I have a question regarding TCP receive window size.
Here is an example from the application Wireshark:
client A: SYN, win=8192, ws=4 ====>
<==== client B: SYN, ACK, win=5840, ws=128
client A: ACK, win=65700 ====>
How did we obtain 65700 (from 8192 to 65700) in the three-way handshake?
And how does the ws get negotiated?
The TCP receive window size is not negotiated. It's just sent to the other host. Each host uses its own receive window size, so the other host knows how much data it may send before it has to wait for an ACK.
client A : >>Ack<< win=65700  // just the confirmation
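As for how 65700 is displayed: Wireshark shows the calculated window size, i.e. the raw 16-bit Window field multiplied by the scale factor from the SYN. The 8192 in the SYN is unrelated; the window field in a SYN is never scaled, and a host may advertise a different window once the connection is up. A minimal sketch of the computation, assuming the ACK's raw window value was 16425 (65700 / 4) and that the displayed ws=4 is the multiplier, i.e. a shift count of 2:

#include <stdio.h>

int main(void) {
    unsigned short raw_window = 16425; /* assumed raw value of the 16-bit Window field in the ACK */
    int scale_shift = 2;               /* ws=4 in Wireshark = multiplier 4 = shift count 2 */
    unsigned int effective = (unsigned int)raw_window << scale_shift;
    printf("effective window = %u\n", effective); /* prints 65700 */
    return 0;
}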
I'd like to reopen a previous issue that was incorrectly classified as a network engineering problem. After more testing, I think it's a real issue for programmers.
So, my application streams mp3 files from a server. I can't modify the server. The client reads data from the server as needed, at 160 kbit/s, and feeds it to a DAC. Let's use a file of 3.5 MB.
When the server is done sending the last byte, it closes the connection, so it sends a FIN; that seems normal practice.
The problem is that the kernel, especially on Windows, seems to buffer 1 to 3 MB of data; I assume the TCP window has fully opened.
After a few seconds, the server has sent the whole 3.5 MB and about 3 MB sit inside the kernel buffer. At this point the server has sent its FIN, which is ACKed in due time.
From the client's point of view, it keeps reading data in chunks of 20 kB and will do that for the next 3 MB / 20 kB ≈ 150 s before it sees the EOF.
Meanwhile the server is in FIN_WAIT_2 (and not TIME_WAIT as I initially wrote; thanks to Steffen for correcting me). Some OSes, Windows at least, seem to run a half-closed-socket timer that starts when they send their FIN and can be as small as 120 s, regardless of the actual TCP window size. Of course, after 120 s the server considers that it should have received the client's FIN, so it sends a RST. That RST causes all of the client's kernel buffer to be discarded, and the application fails.
As code is required, here it is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <winsock2.h>                      /* Windows build: the code uses Sleep() and closesocket() */
#pragma comment(lib, "ws2_32.lib")

int main(void) {
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    SOCKET sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    connect(sock, (const struct sockaddr*)&addr, sizeof(addr));

    const char* get = "GET /data-3 HTTP/1.0\r\n"   /* header lines end with CRLF */
                      "User-Agent: mine\r\n"
                      "Host: localhost\r\n"
                      "Connection: close\r\n"
                      "\r\n";
    int bytes = send(sock, get, (int)strlen(get), 0);
    printf("send %d\n", bytes);

    char* buf = malloc(20000);
    while (1) {
        int n = recv(sock, buf, 20000, 0);
        if (n == 0) {                          /* orderly EOF: the server's FIN reached the application */
            printf("normal eof at %d", bytes);
            break;
        }
        if (n < 0) {                           /* the server's RST surfaces here as an error */
            printf("error at %d", bytes);
            exit(1);
        }
        bytes += n;
        Sleep(n * 1000 / (160000 / 8));        /* consume at exactly 160 kbit/s = 20 kB/s */
    }
    free(buf);
    closesocket(sock);
    return 0;
}
It can be tested with any HTTP server.
I know there are solutions involving a handshake with the server before it closes the socket (but the server is just an HTTP server), but the kernel-level buffering makes this a systematic failure whenever the kernel buffers hold more data than the client can consume before the timeout expires.
The client is perfectly real-time in absorbing data. Having a larger client buffer, or no buffer at all, does not change the issue, which seems a system design flaw to me, unless there is a possibility to either control kernel buffering at the application level (not for the whole OS) or to detect the reception of a FIN at the client before recv() returns EOF. I've tried changing SO_RCVBUF, but it does not seem to influence this level of kernel buffering.
Here is a capture of one successful and one failed exchange:
success
3684 381.383533 192.168.6.15 192.168.6.194 TCP 54 [TCP Retransmission] 9000 → 52422 [FIN, ACK] Seq=9305427 Ack=54 Win=262656 Len=0
3685 381.387417 192.168.6.194 192.168.6.15 TCP 60 52422 → 9000 [ACK] Seq=54 Ack=9305428 Win=131328 Len=0
3686 381.387417 192.168.6.194 192.168.6.15 TCP 60 52422 → 9000 [FIN, ACK] Seq=54 Ack=9305428 Win=131328 Len=0
3687 381.387526 192.168.6.15 192.168.6.194 TCP 54 9000 → 52422 [ACK] Seq=9305428 Ack=55 Win=262656 Len=0
failed
5375 508.721495 192.168.6.15 192.168.6.194 TCP 54 [TCP Retransmission] 9000 → 52436 [FIN, ACK] Seq=5584802 Ack=54 Win=262656 Len=0
5376 508.724054 192.168.6.194 192.168.6.15 TCP 60 52436 → 9000 [ACK] Seq=54 Ack=5584803 Win=961024 Len=0
6039 628.728483 192.168.6.15 192.168.6.194 TCP 54 9000 → 52436 [RST, ACK] Seq=5584803 Ack=54 Win=0 Len=0
Here is what I think is the cause (thanks very much to Steffen for putting me on the right track):
an mp3 file is 3.5 MB at 160 kbit/s = 20 kB/s
the client reads it at exactly the required speed, 20 kB/s, say one recv() of 20 kB per second, no pre-buffering for simplicity
some OSes, like Windows, can have very large TCP kernel buffers (about 3 MB or more), and with a fast connection the TCP window is wide open
in a matter of seconds, the whole file is sent to the client; let's say about 3 MB end up in the kernel buffers
as far as the server is concerned, everything has been sent and acknowledged, so it does a close()
the close() sends a FIN to the client, which responds with an ACK, and the server enters the FIN_WAIT_2 state
BUT, at that point, from the client's point of view, every recv() will have plenty to read for the next 150 s before it sees the EOF!
so the client will not do a close() and thus will not send a FIN
the server is in FIN_WAIT_2 state, and according to the TCP specs it should stay like that forever
now, various OSes (Windows at least) start a timer similar to TIME_WAIT (120 s) when initiating the close(), or when receiving the ACK of their FIN, I don't know which (in fact Windows has a specific registry entry for that, AFAIK); this is to deal more aggressively with half-closed sockets
of course, after 120 s the server has not seen the client's FIN and sends a RST
that RST is received by the client, causes an error there, and all the remaining data in the TCP buffers is discarded and lost
of course, none of that happens with high-bitrate formats, as the client consumes data fast enough that the kernel TCP buffers are never idle for 120 s, and it might not happen at low bitrates when the application's buffering reads it all; it takes the bad combination of bitrate, file size and kernel buffers... hence it does not happen all the time
That's it. It can be reproduced with a few lines of code and any HTTP server. This can be debated, but I see it as a systemic OS issue. Now, the solution that seems to work is to force the client's receive buffer (SO_RCVBUF) to a lower level, so that the server has little chance of having sent all the data and of that data sitting in the client's kernel buffers for too long. Note that this can still happen if the buffer is 20 kB and the client consumes it at 1 B/s... hence I call it a systemic failure instead. I agree that some will see it as an application issue.
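For concreteness, here is a minimal sketch of that workaround against the test program above. The 40000-byte value is an illustrative guess, not a tested constant, and the call must come before connect(), because the window scale is chosen during the handshake:

int rcvbuf = 40000;  /* ~2 s of audio at 20 kB/s; illustrative value, to be tuned */
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (const char*)&rcvbuf, sizeof(rcvbuf));
/* ...then connect(sock, ...) as in the program above */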
I am doing a TCP-related experiment between two VirtualBox VMs. On the client side, I sent out a TCP SYN packet with an MSS option of 1400 bytes. However, it seems that the server (sender) ignored this option and sent out a packet with a very large payload, something like 10000+ bytes. Why wasn't the MSS option honored by the server? BTW, the server is an Nginx server.
Below is some PCAP output showing the problem. First is the SYN packet with MSS = 1400.
Second is the payload sent by the server:
As can be seen, the payload size is 11200 bytes.
BTW the MTU on the interface is 1500 bytes.
Thanks.
After discussion with Jim: the issue is LRO/GRO. Turn that off if you want to see the packets as they appear on the wire.
I am using tcpdump/Wireshark to capture TCP packets while a TCP client sends data to a TCP server. The client simply sends 4096 bytes to the server in one send() call. And I get different TCP packets on the two sides: two packets on the sender side seem to be "compacted" on the receiver side. This conflicts with how I understand the TCP protocol; I've been stuck on this issue for a few days and really need some help.
Please notice the packet lengths in the following packets:
The client (sender) sends two packets, 0xbcac (4) and 0xbcae (5), 2896 + 1200 = 4096 bytes in all.
(0xbcac) 4 14:31:33.838305 192.168.91.194 192.168.91.193 TCP 2962 59750 > 9877 [ACK] Seq=1 Ack=1 Win=14720 **Len=2896** TSval=260728 TSecr=3464603 0
(0xbcae) 5 14:31:33.838427 192.168.91.194 192.168.91.193 TCP 1266 59750 > 9877 [PSH, ACK] Seq=2897 Ack=1 Win=14720 **Len=1200** TSval=260728 TSecr=3464603 0
However, on the server (receiver) side, only one packet shows up, with ip.id=0xbcac and length = 4096 (receiver.packet.0xbcac = sender.packet.0xbcac + 0xbcae):
(0xbcac) 4 14:31:33.286296 192.168.91.194 192.168.91.193 TCP 4162 59750 > 9877 [PSH, ACK] Seq=1 Ack=1 Win=14720 **Len=4096** TSval=260728 TSecr=3464603 0
I'm aware that TCP is a stream protocol and that the data sent can be divided into packets according to the MSS (or MTU), but I guess that division happens before the packets are handed to the NIC, thus before they are captured. I'm also aware that the PSH flag in packet 0xbcae leads to writing data from the buffer to the NIC, but that cannot explain the "compacted" packet. I also tried sending 999999 bytes from the client in one send() call; the data was divided into small packets and sent, but they still mismatched the packets captured on the server side. Finally, I disabled TCP Nagle, got the same result, and ruled that out as the reason.
So my question is: is the mismatch I encountered normal? If it is, what causes it? If not (I'm using Ubuntu 12.04 and Ubuntu 13.10 in a LAN), what is the possible reason for this "compacted" packet?
Thanks in advance for any help!
two packets on the sender side seem to be "compacted" on the receiver side
It looks like a case of generic receive offload or large receive offload. Long story short, the receiving network card does some smart stuff and coalesces segments before they hit the kernel, which improves performance.
To check if this is the case you can try to disable it using:
$ ethtool -K eth0 gro off
$ ethtool -K eth0 lro off
Something complementary happens on the sending side: TCP segmentation offload or generic segmentation offload.
After disabling these, don't forget to re-enable them: they seriously improve performance.
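For completeness, assuming eth0 is the interface in question, you can list the current offload settings and turn the features back on with the same ethtool syntax:

$ ethtool -k eth0
$ ethtool -K eth0 gro on
$ ethtool -K eth0 lro on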
*TCP retransmits without connection establishment (SYN, SYN-ACK, ACK) after a reset packet*
While using an application, I observed that I got a reset (RST, ACK) packet. I know that a reset packet does not necessarily mean the connection is to be closed; it can mean to retry the connection again.
But why did the TCP connection that then retransmitted packets have no SYN, SYN-ACK and ACK?
You are mistaken. RST means 'connection reset': usually, from the point of view of the sender, the connection no longer exists, or never did. In the context of the connection handshake it is emitted by Windows platforms when the backlog queue is full, and in that context it is interpreted by Windows clients as 'retry the connection'. But in any case there is no connection, so there is no SYN or ACK.
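As a side note, here is a minimal sketch of how an RST typically surfaces to a client application on a POSIX system (ECONNRESET is the usual errno; details vary by platform, and read_or_report_reset is a hypothetical helper for illustration):

#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* hypothetical helper: read from a connected socket and report a peer reset */
int read_or_report_reset(int sock, char *buf, size_t len) {
    int n = (int)recv(sock, buf, len, 0);
    if (n < 0 && errno == ECONNRESET) {
        /* the peer sent RST: from its point of view the connection no longer exists */
        printf("connection reset by peer\n");
    }
    return n;
}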
I'm confused about the ACK and SEQ numbers in TCP packets right after the 3-way-handshake. I thought the ACK number is the next expected SEQ number.
So when I analyse a TCP connection in Wireshark it says
TCP SYN with SEQ=0
TCP SYN ACK with SEQ 0, ACK=1 (clear, server expects SEQ 1 in next packet)
TCP ACK with SEQ 1, ACK=1 (clear, sender expects SEQ 1 in next packet)
HTTP Request: TCP PSH ACK with SEQ 1, ACK=1
The last line is unclear. I would say the sender expects SEQ=2 next, so it should be ACK=2? There was already a packet with SEQ=1 from the server, why does the sender want another one?
Can someone explain this to me?
Well, in the handshake the client receives only one packet from the server: SEQ=0 and ACK=1. With this information, the server tells the client 'I am waiting for a packet with SEQ=1 now'. You got this right.
Now, in the last part of the handshake the client sends SEQ=1 and ACK=1, which basically means the same thing as from the server: 'I am waiting for your packet with SEQ=1 now'.
But: after a TCP handshake, the client will usually not wait for this packet to be ACKed, but rather send the first data packets right away (in fact, data may already be contained within the last packet of the handshake; I assume this is the case in your example, because the HTTP request has the same SEQ as the last handshake packet). So any next packet again has ACK=1. But why? It again says 'I am waiting for a packet with SEQ=1 from you'. And this completely makes sense: the last packet the client received from the server had SEQ=0 (in the handshake). Also keep in mind that client and server have independent SEQs. That means the client could send 100 packets, and as long as the server did not send one, the client would still be sending ACK=1, because the last packet it received from the server had SEQ=0.
Another Edit:
To really understand what is going on, you might want to look at an example with different initial SEQs (as I already wrote, the SEQs of server and client are independent):
Client -> SYN, SEQ=100
Client <- SYN, ACK, SEQ=700, ACK=101 <- Server
Client -> ACK = 701, SEQ=101 [50 Bytes of data]
Client -> ACK=701 [again, didn't receive anything from the server yet], SEQ=151
Acknowledgment number (32 bits) – if the ACK flag is set then the value of this field is the next sequence number that the receiver is expecting. This acknowledges receipt of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.
So they just acknowledge where they should start from.
TCP on Wikipedia
The answer to your question is actually quite simple. I can see that you have no problem understanding the three-way handshake procedure, and I'll assume that you already know that client and server count the Sequence Number separately and independently. But please notice: a pure ACK packet does not contribute to the Sequence Number:
SEG.LEN = the number of octets occupied by the data in the segment
(counting SYN and FIN)
See RFC 793
Therefore, the second packet [SYN, ACK] does increase the Sequence Number by 1, but the third packet [ACK] does not affect the Sequence Number, which is why the next packet still has Seq=1.
I'll show you a sample TCP connection in SMTP to further illustrate it:
Client -> Server: [SYN] Seq=0 Len=0
Server -> Client: [SYN, ACK] Seq=0 Ack=1 Len=0
Client -> Server: [ACK] Seq=1 Ack=1 Len=0
Server -> Client: [PSH, ACK] Seq=1 Ack=1 Len=46
Client -> Server: [ACK] Seq=1 Ack=47 Len=0
Client -> Server: [PSH, ACK] Seq=1 Ack=47 Len=24
Server -> Client: [PSH, ACK] Seq=47 Ack=25 Len=63
You can see that after the connection was made, the Client actually sent three consecutive packets with Seq=1! This is because a pure ACK packet does not increase the Sequence Number, which kind of makes sense because you're not transferring any real data.
In summary, the Sequence Number does not always increase after you send off packets. As a rule of thumb, you just need to look at the Len and check the SYN and FIN flags to determine whether the Sequence Number should increase.
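To make that rule of thumb concrete, here is a tiny sketch (my own illustration, with a hypothetical next_seq helper) that reproduces the numbers from the SMTP trace above:

#include <stdio.h>

/* Rule of thumb: the Sequence Number advances by the payload length,
   plus one if SYN is set and one if FIN is set; pure ACKs add nothing. */
unsigned int next_seq(unsigned int seq, unsigned int len, int syn, int fin) {
    return seq + len + (syn ? 1 : 0) + (fin ? 1 : 0);
}

int main(void) {
    printf("%u\n", next_seq(0, 0, 1, 0));  /* SYN with Seq=0 -> next Seq is 1 */
    printf("%u\n", next_seq(1, 0, 0, 0));  /* pure ACK with Seq=1 -> Seq stays 1 */
    printf("%u\n", next_seq(1, 46, 0, 0)); /* server's 46-byte segment, Seq=1 -> next is 47 */
    return 0;
}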