Maximum packet size for a TCP connection - networking

What is the maximum packet size for a TCP connection or how can I get the maximum packet size?

The absolute limitation on TCP packet size is 64K (65535 bytes), but in practicality this is far larger than the size of any packet you will see, because the lower layers (e.g. ethernet) have lower packet sizes.
The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes. Some types of networks (like Token Ring) have larger MTUs, and some types have smaller MTUs, but the values are fixed for each physical technology.

This is an excellent question and I run in to this a lot at work actually. There are a lot of "technically correct" answers such as 65k and 1500. I've done a lot of work writing network interfaces and using 65k is silly, and 1500 can also get you in to big trouble. My work goes on a lot of different hardware / platforms / routers, and to be honest the place I start is 1400 bytes. If you NEED more than 1400 you can start to inch your way up, you can probably go to 1450 and sometimes to 1480'ish? If you need more than that then of course you need to split in to 2 packets, of which there are several obvious ways of doing..
The problem is that you're talking about creating a data packet and writing it out via TCP, but of course there's header data tacked on and so forth, so you have "baggage" that puts you to 1500 or beyond.. and also a lot of hardware has lower limits.
If you "push it" you can get some really weird things going on. Truncated data, obviously, or dropped data I've seen rarely. Corrupted data also rarely but certainly does happen.

At the application level, the application uses TCP as a stream oriented protocol. TCP in turn has segments and abstracts away the details of working with unreliable IP packets.
TCP deals with segments instead of packets. Each TCP segment has a sequence number which is contained inside a TCP header.
The actual data sent in a TCP segment is variable.
There is a value for getsockopt that is supported on some OS that you can use called TCP_MAXSEG which retrieves the maximum TCP segment size (MSS). It is not supported on all OS though.
I'm not sure exactly what you're trying to do but if you want to reduce the buffer size that's used you could also look into: SO_SNDBUF and SO_RCVBUF.

There're no packets in TCP API.
There're packets in underlying protocols often, like when TCP is done over IP, which you have no interest in, because they have nothing to do with the user except for very delicate performance optimizations which you are probably not interested in (according to the question's formulation).
If you ask what is a maximum number of bytes you can send() in one API call, then this is implementation and settings dependent. You would usually call send() for chunks of up to several kilobytes, and be always ready for the system to refuse to accept it totally or partially, in which case you will have to manually manage splitting into smaller chunks to feed your data into the TCP send() API.

According to http://en.wikipedia.org/wiki/Maximum_segment_size, the default largest size for a IPV4 packet on a network is 536 octets (bytes of size 8 bits). See RFC 879

Generally, this will be dependent on the interface the connection is using. You can probably use an ioctl() to get the MTU, and if it is ethernet, you can usually get the maximum packet size by subtracting the size of the hardware header from that, which is 14 for ethernet with no VLAN.
This is only the case if the MTU is at least that large across the network. TCP may use path MTU discovery to reduce your effective MTU.
The question is, why do you care?

If you are with Linux machines, "ifconfig eth0 mtu 9000 up" is the command to set the MTU for an interface. However, I have to say, big MTU has some downsides if the network transmission is not so stable, and it may use more kernel space memories.

One solution can be to set socket option TCP_MAXSEG (http://linux.die.net/man/7/tcp) to a value that is "safe" with underlying network (e.g. set to 1400 to be safe on ethernet) and then use a large buffer in send system call.
This way there can be less system calls which are expensive.
Kernel will split the data to match MSS.
This way you can avoid truncated data and your application doesn't have to worry about small buffers.

It seems most web sites out on the internet use 1460 bytes for the value of MTU. Sometimes it's 1452 and if you are on a VPN it will drop even more for the IPSec headers.
The default window size varies quite a bit up to a max of 65535 bytes. I use http://tcpcheck.com to look at my own source IP values and to check what other Internet vendors are using.

The packet size for a TCP setting in IP protocol(Ip4). For this field(TL), 16 bits are allocated, accordingly the max size of packet is 65535 bytes: IP protocol details

Related

UDP and TCP/IP packet size in Toit

While experimenting with a UDP server that runs on esp32, I found that the size of the received packet is limited to 1500 bytes: 20 (IP header) + 8 (UDP header) + 1472 (data), (although in theory UDP as if can support packets data up to 64K). This means that in order to transfer a larger amount of data, the client must split it into several chunks and send them one after the other, and on the server side, this data will need to be restored. I think that the overhead of such a solution will be quite high. I also know that Toit provides TCP/IP connection. Naturally, the packet size is also limited in the case of TCP/IP. This is 64K (65535 bytes). Does Toit have any additional restrictions on the TCP/IP connection, or 64K value is fact also for Toit?
As described in this question/answer, it's a matter of avoiding packet fragmentation. Sending packages above this size will force the system to split them up into multiple fragments of size MTU, with each of them being individually unreliable. As memory is already very limited on embedded systems, sending large (> MTU) packages where all fragments has to arrive before it can be processed, can be very unfortunate for the overall application behavior as it can time out or go out-of-memory.
Instead the application should look at a streaming pipeline (perhaps even TCP to handle the unreliable aspects as well).
As TCP/IP is a streaming protocol, any sized "packages" can be sent, as they are automatically split into fragments of size MTU. Note that the data is received in "random"-sized packages, though the order of the bytes is fully preserved.

Should I use UDP or TCP in this case?

P2P Network:
Largest message is about 300KB. Most of the messages are smaller (5-50kb). It is perfectly OK if they do not receive the messages, as they will initiate bootstrap (re-send).
I am leaning towards UDP, and you guessed it, its a blockchain software! However, our current design is TCP.
The largest size of a UDP packet is 65,535 bytes (including an 8 byte UDP header and 20 byte IP header), so for your largest messages you would have to implement a form of "chunking" which divides the message into smaller parts (unless you are using IPv6 Jumbograms), with a application generated header which contained the ordering of the packets, and possibly data size. You also have the issue of fragmentation when you are over the MTU size (although with a reliability mechanism like you mention this is probably not such an issue).
I guess you have to ask yourself what benefits UDP would give you over your current TCP design. The main reason to use UDP is when you need a lightweight protocol with a very small network delay or you need to be able to broadcast or multicast packets over a LAN. If you dont have these needs and TCP is doing the job, why change ?

When to set “Don't fragment” flag in IP header?

There is a “Don't fragment” flag in IP header.
Could applications set this flag ?
When to set this flag and why?
If the 'DF' bit is set on packets, a router which normally would fragment a packet larger than MTU (and potentially deliver it out of order), instead will drop the packet. The router is expected to send "ICMP Fragmentation Needed" packet, allowing the sending host to account for the lower MTU on the path to the destination host. The sending side will then reduce its estimate of the connection's Path MTU (Maximum Transmission Unit) and re-send in smaller segments.This process is called PMTU-D ("Path MTU Discovery").
A fragmentation causes extra overhead on CPU processing to re-assemble packets at the other end (and to handle missing fragments).
Typically, the 'DF' bit is a configurable parameter for the IP stack. I know of ping utility with an option to set DF.
It is often useful to avoid fragmentation, since apart from CPU utilization for fragmentation and re-assembly, it may affect throughput (if lost fragments need re-transmission). For this reason, it is often desirable to know the maximum transmission unit. So the 'Path MTU discovery' is used to find this size, by simply setting the DF bit (say for a ping)
Further down in RFC 791 it says:
If the Don't Fragment flag (DF) bit is set, then internet
fragmentation of this datagram is NOT permitted, although it may be
discarded. This can be used to prohibit fragmentation in cases
where the receiving host does not have sufficient resources to
reassemble internet fragments.
So it appears what they had in mind originally was small embedded devices with the simplest possible implementation of IP, and little memory. Today, you might think of an IoT device like a smart light bulb or smoke alarm. They might not have the code or memory to reassemble fragments, and so the software communicating with them would set DF.
The only situations I can think of where you would possibly want to set this flag is:
If you are building something like a client-server application where
you don't want the other side having to deal with a fragmented
packet but rather prefer a packet loss to that.
Or if you are on a
network with a very specific set of restrictions, possibly caused by
bandwidth issues or a specific firewall behaviour.
Except for such specific circumstances you would likely never touch it.
From RFC 791:
Fragmentation of an internet datagram is necessary when it
originates in a local net that allows a large packet size and must
traverse a local net that limits packets to a smaller size to reach
its destination.
An internet datagram can be marked "don't fragment." Any internet
datagram so marked is not to be internet fragmented under any
circumstances. If internet datagram marked don't fragment cannot be
delivered to its destination without fragmenting it, it is to be
discarded instead.
Could applications set this flag ?
Yes if you write low-level enough code that you are dealing with the IP header. This part of the question is a bit broad for giving a more specific answer, you should probably figure out if you want to set it before worrying about the how.

Send packets larger than 64K in TCP

As far as we know the absolute limitation on TCP packet size is 64K (65535 bytes), and in practicality this is far larger than the size of any packet you will see, because the lower layers (e.g. ethernet) have lower packet sizes. The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes.
I want to know, Is there any any way or any tools, to send packets larger than 64k?
I want to test a device in facing with packet larger than 64k! I mean I want to see, if I send a packet larger than 64K, how it behave? Does it drop some part of it? Or something else.
So :
1- How to send this large packets? What is the proper layer for this?
2- How the receiver behave usually?
The IP packet format has only 16 bit for the size of the packet, so you will not be able to create a packet with a size larger than 64k. See http://en.wikipedia.org/wiki/IPv4#Total_Length. Since TCP uses IP as the lower layer this limit applies here too.
There is no such thing as a TCP packet. TCP data is sent and received in segments, which can be as large as you like up to the limits of the API you're using, as they can be comprised of multiple IP packets. At the receiver TCP is indistinguishable from a byte stream.
NB osi has nothing to do with this, or anything else.
TCP segments are not size-limited. The thing which imposes the limit is that IPv4 and IPv6 packets have 16 bit length fields, so a size larger than this limit is not possible to express.
However, RFC 2675 is a proposed standards for IPv6 which would expand the length field to 32 bits, allowing much larger TCP segments.
See here for a talk about why this change could help improve performance and here for a set of (experimental) patches to Linux to enable this RFC.

Ethernet device MTU setting implications in packet ingress path

What is the behavior of an ethernet device in the packet ingress path?
If the sender is sending a larger-than-MTU frame, then:
1) does the receiver's device drop it directly in hardware,
2) or accept it and send it up for the kernel's IP stack to handle it?
3) when is ICMP frag-required sent?
4) does it make a difference if the ethernet device is say on a intermediate router vs a end host?
It is impossible to answer 1), 2) definitively for all devices and networking stacks. The Ethernet standard defines an MTU of 1500 bytes so that is all you can rely on and in general you should expect frames with larger MTUs to be dropped.
However in reality it is likely in an end host that if the network interface hardware doesn't drop the oversized frame (which is typically referred to as giant) then it will make it's way up the software stack and be processed. Even though the stack may not drop the oversized frame due to it being over the MTU it may still get dropped for other reasons, e.g. due to an internal queue exhaustion.
Although the maximum MTU of an Ethernet frame has remained constant the maximum size of an Ethernet frame has grown over time to encompass features like 802.1Q VLAN single and double tags. MPLS further increases the frame size to include label stacks. This means intermediate switches are usually tolerant of frames that exceed the interface MTU by some amount. One major vendor effectively tolerates a max MTU of 2000 bytes by default in their current switches. Older switches may be less tolerant.
To get a definitive answer you'll need to do some research into the specific hardware and software that you care about.

Resources