TCP MSS not enforced - tcp

I am doing some TCP related experiment between two virtualbox VMs. On the client side, I sent out a TCP syn packet with the MSS option of 1400 bytes. However, it seems that the server (sender) ignored this option and sent out a packet with very large payload, something like 10000+ bytes.Why didn't the MSS option honored by the server? BTW, the server is a Nginx server.
Below this some PCAP showing the problem. First is the SYN packet with MSS = 1400.
Second is the payload sent by the server:
As can be seen that the payload size is 11200.
BTW the MTU on the interface is 1500 bytes.
Thanks.

By discussion with Jim.The issue this LRO/GRO. Please turn that off if we want to see the packets as they appears on the wire.

Related

TCP ACK of packets in wireshark

I've noticed in wireshark that I'm able to send 4096 bytes of data to a HTTP webserver (from uploading a file) however the server only seems to be acknowledging data 1460 bytes at a time. Why is this the case?
The size of TCP segments is restricted to the MSS (Maximum Segment Size), which is basically the MTU (Maximum Transmission Unit) less the bytes comprising the IP and TCP overhead. On a typical Ethernet link, the MTU is 1500 bytes and basic IP and TCP headers comprise 20 bytes each, so the MSS is 1460 (1500 - 20 - 20).
If you're seeing packets indicated with a length field of 4096 bytes, then it almost certainly means that you're capturing on the transmitting host and Wireshark is being handed the large packet before it's segmented into 1460 byte chunks. If you were to capture at the receiving side, you would see the individual 1460 byte segments arriving and not a single, large 4096 byte packet.
For further reading, I would encourage you to read Jasper Bongertz's blog titled, "The drawbacks of local packet captures".
TCP by default uses path MTU discovery:
When system send packet to the network it set don't fragment flag (DF) in IP header
When IP router or you local machine see DF packet that should be fragmented to match MTU of the next hop link it sends feedback (RTCP fragmentation need) that contains new MTU
When system receives fragmentation needed ICMP it adjusts MSS and send data again.
This procedure is performed to reduce overall load on the network and increase probability of each packet delivery.
This is why you see 1460 packets.
Regarding to you question: the server only seems to be acknowledging data 1460 bytes at a time. Why is this the case?
TCP keep track window that defines "how many bytes of data you can send without acknowledge". Its purpose is to provide flow control mechanisms (sender can't send too much data that can't be processed) and congestion control mechanisms (sender can't send too much data to overload network). Window is defined by receiver side and may be increased during connection when TCP will estimate real channel bandwidth. So you may see one ACK that acknowledges several packets.

TCP checksum error for fragmented packets

I'm working on a server/client socket application that is using Linux TUN interface.
Server gets packets directly from TUN interface and pass them to clients and clients put received packets directly in the TUN interface.
<Server_TUN---><---Server---><---Clients---><---Client_TUN--->
Sometimes the packets from Server_TUN need to be fragmented in IP layer before transmitting to a client.
So at the server I read a packet from TUN, start fragmenting it in the IP layer and send them via socket to clients.
When the fragmentation logic was implemented, the solution did not work well.
After starting Wireshark on Client_TUN I noticed for all incoming fragmented packets I get TCP Checksum error.
At the given screenshot, frame number 154 is claimed to be reassembled in in 155.
But TCP checksum is claimed to be incorrect!
At server side, I keep tcp data intact and for the given example, while you see the reverse in Wireshark, I've split a packet with 1452 bytes (including IP header) and 30 bytes (Including IP header)
I've also checked the TCP checksum value at the server and its exactly is 0x935e and while I did not think that Checksum offloading matters for incoming packets, I checked offloading at the client and it was off.
$ sudo ethtool -k tun0 | grep ": on"
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
generic-segmentation-offload: on
generic-receive-offload: on
tx-vlan-offload: on
tx-vlan-stag-hw-insert: on
Despite that, because of the solution is not working now, I don't think its caused by offload effect.
Do you have any idea why TCP checksum could be incorrect for fragmented packets?
Hopefully I found the issue. It was my mistake. Some tcp data was missing when I was coping buffers. I was tracing on the indexes and lengths but because of the changes in data, checksum value was calculating differently in the client side.

Mechanism for determining MSS during TCP 3-Way Handshake

I'm troubleshooting a MTU/MSS issue that is causing fragmentation over a PPPoE service. Below is a packet dump of a TCP 3-Way Handshake from a different service (that is working as expected) that relates to my question.
I understand the way PMTUD works as this: by setting the Don't Fragment (DF) bit to 1 in the IP header, a router along the path to the destination that requires fragmentation of the packet sends an ICMP back to the host to adjust the MSS size accordingly. However, my understanding is that this will only happen when fragmentation occurs (packets greater than the path MTU). This suggests that PMTUD works during the data exchange phase, NOT when TCP 3-Way Handshake is negotiated (since these are small packets, 78 bytes in this case).
In the above packet capture the SYN packet sends a MSS=1460 (which is too large, due to the 8 byte overhead of PPPoE) and the SYN/ACK response from the server sends back the correct MSS=1452. What mechanism does TCP use to determine the MSS during this exchange?
Maybe, the server hasn't computed the MSS during this three-way handshake. For instance, if the system administrator has observed a lot of fragmentation, he can have set the MSS of the whole system to 1452 (with the command ip tcp adjust-mss 1452), so when you are doing the three-way handshake, the server only advertises its default MSS. Is it applicable to your case?
What you're probably seeing here is the result of what's known as MSS clamping where the network on which the server is attached to modifies the MSS in the outgoing SYN/ACK packets to signal to the sender to use a lower MSS. This is commonly done on networks that perform some form of tunnelling such as PPPoE on ADSL.

Discrepancy between MSS sent by client and MSS received by host

When the client initiates the connection with the SYN bit set, Wireshark (and TCPDump) show the MSS as being 1460. However, when the same packet is delivered to the host, Wireshark (and TCPDump) show the MSS as being 1416.
Can anybody please explain why there's a discrepancy of 44 bytes?
The image below shows the MSS received by the host. Sorry but I don't have a screenshot showing the client's initial SYN 1460 MSS.
During actual data transfer, the 1416 is used as an MSS (1404 for payload and 12 for options such as the TSVal)
My original thought was that it has something to do with Path MTU discovery, and that some space is being reserved for any additional headers that may be added on while the packet is making it's way from the sender to the destination. Am I correct in thinking so? If so, is there a way to find a breakdown of how these are being used?
After consulting the university's network admin, we concluded that that a lower MSS was being imposed by the network for load reasons.

Is MSS value fixed in SYN packet?

I wonder how MSS is set in SYN packet? Is it a fixed value in one operating system or the value could be changed in the same operating system? I know that the value is different in different operating systems. Also is the MSS value in SYN related to hardware configuration?
Thanks.
RFC 879 describes how MSS is used and specified.
In short, MSS is specified during TCP handshake via SYN packet. However, this value can later be changed by OS itself or by setting a protocol option.
You can set option TCP_MAXSEG via setsockopt.
Whilst the value of MSS in SYN and SYNACK packets are set by the initiator and responder side, respectively, a widely used practice known as MSS clamping can result in the MSS being altered by a network element on the path - this is often used to reduce the MSS of all connections going over some sort of tunnelled link. For example PPPoE is commonly used on residential broadband and requires an MTU of 1492 and corresponding IPv4 MSS of 1452 so whilst the SYN may leave your machine with an MSS of 1460 (assuming you're using Ethernet with an MTU of 1500) but once it passes the MSS clamping ISP router the MSS in SYN packet will subsequently be changed to 1452, and likewise for the incoming/responder's SYNACK packet so the connection proceeds with reduced MSS of 1452. This practice seems to be used instead of Path MTU Discovery which relies upon the use of ICMP Fragmentation Needed responses from the network as these can be lost on poorly configured networks and by certain load balancing techniques.

Resources