Send packets larger than 64K in TCP - networking

As far as we know the absolute limitation on TCP packet size is 64K (65535 bytes), and in practicality this is far larger than the size of any packet you will see, because the lower layers (e.g. ethernet) have lower packet sizes. The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes.
I want to know, Is there any any way or any tools, to send packets larger than 64k?
I want to test a device in facing with packet larger than 64k! I mean I want to see, if I send a packet larger than 64K, how it behave? Does it drop some part of it? Or something else.
So :
1- How to send this large packets? What is the proper layer for this?
2- How the receiver behave usually?

The IP packet format has only 16 bit for the size of the packet, so you will not be able to create a packet with a size larger than 64k. See http://en.wikipedia.org/wiki/IPv4#Total_Length. Since TCP uses IP as the lower layer this limit applies here too.

There is no such thing as a TCP packet. TCP data is sent and received in segments, which can be as large as you like up to the limits of the API you're using, as they can be comprised of multiple IP packets. At the receiver TCP is indistinguishable from a byte stream.
NB osi has nothing to do with this, or anything else.

TCP segments are not size-limited. The thing which imposes the limit is that IPv4 and IPv6 packets have 16 bit length fields, so a size larger than this limit is not possible to express.
However, RFC 2675 is a proposed standards for IPv6 which would expand the length field to 32 bits, allowing much larger TCP segments.
See here for a talk about why this change could help improve performance and here for a set of (experimental) patches to Linux to enable this RFC.

Related

Should I use UDP or TCP in this case?

P2P Network:
Largest message is about 300KB. Most of the messages are smaller (5-50kb). It is perfectly OK if they do not receive the messages, as they will initiate bootstrap (re-send).
I am leaning towards UDP, and you guessed it, its a blockchain software! However, our current design is TCP.
The largest size of a UDP packet is 65,535 bytes (including an 8 byte UDP header and 20 byte IP header), so for your largest messages you would have to implement a form of "chunking" which divides the message into smaller parts (unless you are using IPv6 Jumbograms), with a application generated header which contained the ordering of the packets, and possibly data size. You also have the issue of fragmentation when you are over the MTU size (although with a reliability mechanism like you mention this is probably not such an issue).
I guess you have to ask yourself what benefits UDP would give you over your current TCP design. The main reason to use UDP is when you need a lightweight protocol with a very small network delay or you need to be able to broadcast or multicast packets over a LAN. If you dont have these needs and TCP is doing the job, why change ?

How does MTU retransmission work in case of UDP

As we all perfectly know, UDP does not support retransmission along with some other things.
We also aware of such thing like MTU that works basically in the following way -- when one of the network devices on the path between source and destination points does not support packet of some size, it just drops it.
In case of TCP, it's not a problem -- it already knows MSS after handshake that is always less than MTU (am I right?), so there's no possibility to send a packet with the size greater than MTU.
However, I wonder how does it work in case of UDP? As I already said, there's no retransmission in this protocol and there's no such thing like MSS. So what happens when the packet is dropped due to exceeding MTU?
Or it just works because of the MTU nature (it actually belongs to the IP layer, not the transport layer protocols like UDP or TCP)? So the IP layer reconstruct the dropped packet in smaller units and send it again?
First of all, you must distinguish between the local MTU, which is just the MTU of the local link, and the path MTU (PMTU), which is the smallest MTU of the local link. Consider the following topology:
1500 1480 1500
A -------- B -------- C -------- D
then A's local MTU is 1500, but the PMTU is just 1480.
When router B receives a packet of size 1500 which it needs to forward, and the DF bit is set, it sends an ICMP packet back to the sender with the next hop's MTU, 1480 in this case. The sender can then reduce the packet size.
In TCP, this is done transparently by the network stack. In UDP, the application needs to deal with it. There are three ways to do that:
always send packets that are small enough; 1024 is always safe over IPv6, and 512 is usually (but not always) safe over IPv4;
use a connected UDP socket, and react to an EMSGSIZE error by reducing the packet size; or
use any kind of UDP socket, request the PMTU ancillary data, and use the data provided.
Technique (3) is the most efficient. For IPv6, it is described in Section 11.3 of RFC 3542.

UDP Networking Fundamentals

I've been doing some work with C# Networking using UDP. I'm getting on fine but need the answer to a couple of fundamental questions I'm having problems testing:
Currently I'm sending data in a ~16000 byte datagrams, which according to wireshark is getting split into several 1500 byte packets (because of max packet size limits) and then reassembled at the other end.
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I've looked in a lot of places but there seems to be a lot of confusion between the differences between datagrams and the underlying packets...
Thanks for you help!
There is no chance of ending up with a fragmented datagram due to packet loss?
I believe that's true: that fragmentation and fragment reassembly is handled by the protocol layer below UDP, i.e. that it's handled by the "IP" layer, which will error if it fails to reassemble the packet-fragments into a datagram (for example, search for "fragment" in RFC 792).
http://www.pcvr.nl/tcpip/udp_user.htm#11_5 says,
"The IP layer at the destination performs the reassembly. The goal is to make fragmentation and reassembly transparent to the transport layer (TCP and UDP), which it is, except for possible performance degradation."
As you may now 16 bit UDP length field indicates that you can send a total of 65535 bytes. However, the data can be theoretically (sizeof(IP Header) + sizeof(UDP Header)) = 65535-(20+8) = 65507 bytes.
But this does not mean that all applications that are using UDP will send this amount of data as an example DNS packets limits to 512 bytes. This is because you don't get any ACK packets from server. This is one reason that packets may get lost in the network (packet transmission problems and loss). Secondly intermediate nodes may encapsulate datagrams inside of another protocol, as an example IPSEC or other protocols do that.
For UDP there is no ACK packets, so in your case if underlying application uses UDP you should not see any ACK packets. Secondly, some of the server limit their sizes to the max UDP packets depending on the application, so if you have data transfer from client to server you should see same bytes e.g 512 bytes. going and coming back in wireshark. Mostly, source makes the request and destination sends X bytes UDP datagrams back.
These links may be good for your questions:
Wireshark UDP analysis
RFC 1122 (states that 576 is the minimum maximum reassembly buffer size)
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
That is correct.
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I don't understand this question. You need to ACK each datagram regardless of its size, and you should make them < 1500 bytes so they won't get fragmented. Otherwise you may never be able to transmit any specific datagrams at all, if it repeatedly gets fragmented and a fragment repeatedly gets lost.

UDP Packet size and packet losses

I've been writing a program that uses a stop and wait protocol on top of UDP to send packets over LAN and also over WAN. I've recently been testing my program and have noticed that the packet loss rate is higher for larger packets (approaching 64k bytes). Intuitively this makes sense but what are the actual reasons for this?
UDP packets greater than the MTU size of the network that carries them will be automatically split up into multiple packets, and then reassembled by the recipient. If any of those multiple sub-packets gets dropped, then the receiver will drop the rest of them as well.
So for example if you send a 63k UDP packet, and it goes over Ethernet, it will get broken up into 47+ smaller "fragment" packets (because Ethernet's MTU is 1500 bytes, but some of those are used for UDP headers, etc, so the amount of user-data-space available in a UDP packet is smaller than that). The receiver will only "see" that UDP packet if all 47+ of those fragment-packets make it through okay. If just one of those fragment-packets gets dropped, the whole operation fails.
Well, data networks are far from reliable; packets get dropped all the time. Overloaded routers, full buffers and corrupt packets are some of the reasons. Since UDP has no flow control capabilities, it can't slow down if for example the receiving end is overloaded.
As Jeremy explained, the bigger the payload, the more packets it is going to be split into, and therefore a bigger chance of losing some of them.
UDP is used in cases where a dropped packet here in there won't affect anything or cases that you need something to get there in time or not at all. (VOIP, streaming video etc)
Its all about IP fragmentation and defragmentation. Packet more than MTU would be fragmented and has to be defragmented at the final host, there are also chances the fragments gets fragmented again on the path and which again can add the delay. sometimes if some N/W element is configured for layer 4 filtering then it defragments(not the final host) apply rules and then again frgaments and forward. Thats the reason the applicaiton which need performance always try to send data with size <= (MTU-ETHHDR-IPHDR)

Maximum packet size for a TCP connection

What is the maximum packet size for a TCP connection or how can I get the maximum packet size?
The absolute limitation on TCP packet size is 64K (65535 bytes), but in practicality this is far larger than the size of any packet you will see, because the lower layers (e.g. ethernet) have lower packet sizes.
The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes. Some types of networks (like Token Ring) have larger MTUs, and some types have smaller MTUs, but the values are fixed for each physical technology.
This is an excellent question and I run in to this a lot at work actually. There are a lot of "technically correct" answers such as 65k and 1500. I've done a lot of work writing network interfaces and using 65k is silly, and 1500 can also get you in to big trouble. My work goes on a lot of different hardware / platforms / routers, and to be honest the place I start is 1400 bytes. If you NEED more than 1400 you can start to inch your way up, you can probably go to 1450 and sometimes to 1480'ish? If you need more than that then of course you need to split in to 2 packets, of which there are several obvious ways of doing..
The problem is that you're talking about creating a data packet and writing it out via TCP, but of course there's header data tacked on and so forth, so you have "baggage" that puts you to 1500 or beyond.. and also a lot of hardware has lower limits.
If you "push it" you can get some really weird things going on. Truncated data, obviously, or dropped data I've seen rarely. Corrupted data also rarely but certainly does happen.
At the application level, the application uses TCP as a stream oriented protocol. TCP in turn has segments and abstracts away the details of working with unreliable IP packets.
TCP deals with segments instead of packets. Each TCP segment has a sequence number which is contained inside a TCP header.
The actual data sent in a TCP segment is variable.
There is a value for getsockopt that is supported on some OS that you can use called TCP_MAXSEG which retrieves the maximum TCP segment size (MSS). It is not supported on all OS though.
I'm not sure exactly what you're trying to do but if you want to reduce the buffer size that's used you could also look into: SO_SNDBUF and SO_RCVBUF.
There're no packets in TCP API.
There're packets in underlying protocols often, like when TCP is done over IP, which you have no interest in, because they have nothing to do with the user except for very delicate performance optimizations which you are probably not interested in (according to the question's formulation).
If you ask what is a maximum number of bytes you can send() in one API call, then this is implementation and settings dependent. You would usually call send() for chunks of up to several kilobytes, and be always ready for the system to refuse to accept it totally or partially, in which case you will have to manually manage splitting into smaller chunks to feed your data into the TCP send() API.
According to http://en.wikipedia.org/wiki/Maximum_segment_size, the default largest size for a IPV4 packet on a network is 536 octets (bytes of size 8 bits). See RFC 879
Generally, this will be dependent on the interface the connection is using. You can probably use an ioctl() to get the MTU, and if it is ethernet, you can usually get the maximum packet size by subtracting the size of the hardware header from that, which is 14 for ethernet with no VLAN.
This is only the case if the MTU is at least that large across the network. TCP may use path MTU discovery to reduce your effective MTU.
The question is, why do you care?
If you are with Linux machines, "ifconfig eth0 mtu 9000 up" is the command to set the MTU for an interface. However, I have to say, big MTU has some downsides if the network transmission is not so stable, and it may use more kernel space memories.
One solution can be to set socket option TCP_MAXSEG (http://linux.die.net/man/7/tcp) to a value that is "safe" with underlying network (e.g. set to 1400 to be safe on ethernet) and then use a large buffer in send system call.
This way there can be less system calls which are expensive.
Kernel will split the data to match MSS.
This way you can avoid truncated data and your application doesn't have to worry about small buffers.
It seems most web sites out on the internet use 1460 bytes for the value of MTU. Sometimes it's 1452 and if you are on a VPN it will drop even more for the IPSec headers.
The default window size varies quite a bit up to a max of 65535 bytes. I use http://tcpcheck.com to look at my own source IP values and to check what other Internet vendors are using.
The packet size for a TCP setting in IP protocol(Ip4). For this field(TL), 16 bits are allocated, accordingly the max size of packet is 65535 bytes: IP protocol details

Resources