How does MTU retransmission work in case of UDP - networking

As we all know, UDP does not support retransmission, among other things.
We are also aware of the MTU, which basically works as follows: when one of the network devices on the path between source and destination does not support packets of some size, it just drops them.
In the case of TCP, this is not a problem: after the handshake it already knows the MSS, which is always less than the MTU (am I right?), so there is no possibility of sending a packet larger than the MTU.
However, I wonder how it works in the case of UDP. As I already said, there is no retransmission in this protocol and no such thing as an MSS. So what happens when a packet is dropped for exceeding the MTU?
Or does it just work because of the nature of the MTU (it actually belongs to the IP layer, not to transport layer protocols like UDP or TCP)? Does the IP layer rebuild the dropped packet in smaller units and send it again?

First of all, you must distinguish between the local MTU, which is just the MTU of the local link, and the path MTU (PMTU), which is the smallest MTU of all the links on the path. Consider the following topology:
    1500       1480       1500
A -------- B -------- C -------- D
Here, A's local MTU is 1500, but the PMTU is just 1480.
When router B receives a packet of size 1500 which it needs to forward, and the DF bit is set, it drops the packet and sends an ICMP packet back to the sender with the next hop's MTU, 1480 in this case. The sender can then reduce the packet size.
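The relationship between the link MTUs, the PMTU and the hop that reports the problem can be sketched in a few lines. This is only a toy model of the topology above, not a network implementation; the function names `path_mtu` and `first_rejecting_hop` are made up for illustration.

```python
def path_mtu(link_mtus):
    """The path MTU is the smallest MTU among all links on the path."""
    return min(link_mtus)


def first_rejecting_hop(nodes, link_mtus, packet_size):
    """Return (node, next_hop_mtu) for the first node that cannot forward
    a DF packet of packet_size, or None if the packet fits every link.

    link_mtus[i] is the MTU of the link between nodes[i] and nodes[i + 1].
    """
    for node, mtu in zip(nodes, link_mtus):
        if packet_size > mtu:
            # This node would report the next hop's MTU back to the sender.
            return node, mtu
    return None


nodes = ["A", "B", "C", "D"]
links = [1500, 1480, 1500]  # A-B, B-C, C-D as in the topology above

print(path_mtu(links))                          # 1480
print(first_rejecting_hop(nodes, links, 1500))  # ('B', 1480)
print(first_rejecting_hop(nodes, links, 1480))  # None
```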
In TCP, this is done transparently by the network stack. In UDP, the application needs to deal with it. There are three ways to do that:
(1) always send packets that are small enough; 1024 is always safe over IPv6, and 512 is usually (but not always) safe over IPv4;
(2) use a connected UDP socket, and react to an EMSGSIZE error by reducing the packet size; or
(3) use any kind of UDP socket, request the PMTU ancillary data, and use the data provided.
Technique (3) is the most efficient. For IPv6, it is described in Section 11.3 of RFC 3542.
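As a rough illustration of technique (2), the sketch below shrinks the datagram size whenever a send fails with EMSGSIZE. It takes any `send` callable, for example the `send` method of a connected UDP socket, and assumes that an oversized datagram surfaces as an `OSError` with `errno.EMSGSIZE` (on Linux this typically requires path MTU discovery to be enabled on the socket). The name `send_shrinking`, the starting size of 1472, the halving strategy and the 512 byte floor are all illustrative assumptions, not part of the answer.

```python
import errno


def send_shrinking(send, data, chunk_size=1472, min_size=512):
    """Send data as a series of datagrams, halving chunk_size on EMSGSIZE.

    Returns the list of datagram sizes that were actually sent.
    """
    sizes = []
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        try:
            send(chunk)
        except OSError as exc:
            # Only EMSGSIZE is handled; give up once the floor is reached.
            if exc.errno != errno.EMSGSIZE or chunk_size <= min_size:
                raise
            chunk_size = max(min_size, chunk_size // 2)
            continue  # retry the same offset with a smaller chunk
        sizes.append(len(chunk))
        offset += len(chunk)
    return sizes
```

With a real socket this would be called as `send_shrinking(sock.send, payload)`. The receiver still has to cope with the data arriving as several independent datagrams.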

Related

What happens if fragmented IP datagram gets dropped (i.e. one fragment out of many)?

I am a student just beginning to dive into the network stack, so please forgive any misconceptions.
If you are using TCP as a transport layer protocol, and sending payloads that happen to be greater than the MTU lower down in the stack, the Network Layer protocol (IPv4 for example) will fragment the payload into separate IP datagrams. What happens if a single one of these "fragmented" IP datagrams is dropped?
Example
Sending 4000 byte datagram with MTU of 1500 bytes
[MF: 1 Offset: 0 Length: 1500]
[MF: 1 Offset: 185 Length: 1500] <--------- Dropped/Lost
[MF: 0 Offset: 370 Length: 1040]
Which would happen?
(1) Only IP datagram #2 gets retransmitted.
(2) The entire 4000 byte datagram is fragmented again and retransmitted in full after TCP determines that the datagram was lost.
I was originally thinking that the individual fragment would be re-transmitted, however given that this is below the reliability ensured by TCP in the transport layer, I am now thinking that the entire TCP datagram is just dropped/re-transmitted (which seems inefficient).
As a follow up, is there any reason that this "fragmentation" occurs below the transport layer (specifically below TCP)? I feel like this could be handled most of the time by ensuring that packets are broken up into a generally acceptable MTU size above/at the transport layer. Upon some research I found some information on how TCP should have nothing to do with packetizing things, which I did not quite understand.
The entire 4000 byte datagram will be fragmented again and retransmitted in full after TCP determines that the datagram was lost. That said, TCP should break the data up into chunks that fit the MTU (i.e. segments) in the first place.
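The MF, Offset and Length values in the question's example can be reproduced with a short calculation. This is a sketch that assumes a 20 byte IPv4 header without options; the function name `fragment` is made up for illustration. The offset is counted in units of 8 bytes, which is why each fragment's data size is rounded down to a multiple of 8.

```python
def fragment(total_length, mtu, header=20):
    """Return (MF, offset, length) for each IPv4 fragment of a datagram.

    total_length and length include the IP header; offset is in 8-byte units.
    """
    payload = total_length - header
    per_fragment = (mtu - header) // 8 * 8  # data bytes carried per fragment
    fragments = []
    sent = 0
    while sent < payload:
        size = min(per_fragment, payload - sent)
        more = 1 if sent + size < payload else 0
        fragments.append((more, sent // 8, size + header))
        sent += size
    return fragments


for mf, offset, length in fragment(4000, 1500):
    print(f"[MF: {mf} Offset: {offset} Length: {length}]")
# [MF: 1 Offset: 0 Length: 1500]
# [MF: 1 Offset: 185 Length: 1500]
# [MF: 0 Offset: 370 Length: 1040]
```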

Should I use UDP or TCP in this case?

P2P Network:
The largest message is about 300 KB. Most of the messages are smaller (5-50 KB). It is perfectly OK if peers do not receive the messages, as they will initiate a bootstrap (re-send).
I am leaning towards UDP, and you guessed it, it's blockchain software! However, our current design is TCP.
The largest size of a UDP packet is 65,535 bytes (including an 8 byte UDP header and a 20 byte IP header), so for your largest messages you would have to implement a form of "chunking" which divides the message into smaller parts (unless you are using IPv6 Jumbograms), with an application-generated header containing the ordering of the packets, and possibly the data size. You also have the issue of fragmentation when you are over the MTU size (although with a reliability mechanism like the one you mention this is probably not such an issue).
I guess you have to ask yourself what benefits UDP would give you over your current TCP design. The main reason to use UDP is when you need a lightweight protocol with a very small network delay, or you need to be able to broadcast or multicast packets over a LAN. If you don't have these needs and TCP is doing the job, why change?
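The chunking scheme described in this answer could look roughly like the sketch below. The header layout (message id, chunk index, chunk count), the 1200 byte datagram size and the function names are all hypothetical choices made for illustration. The reassembly helper assumes every datagram it is given belongs to the same message.

```python
import struct

# Hypothetical application header: message id, chunk index, chunk count.
HEADER = struct.Struct("!IHH")


def split_message(message_id, data, max_datagram=1200):
    """Split data into datagrams of at most max_datagram bytes, header included."""
    body = max_datagram - HEADER.size
    count = max(1, -(-len(data) // body))  # ceiling division
    if count > 0xFFFF:
        raise ValueError("message needs more chunks than the header can count")
    return [
        HEADER.pack(message_id, i, count) + data[i * body:(i + 1) * body]
        for i in range(count)
    ]


def reassemble(datagrams):
    """Return the original message, or None if any chunk is missing.

    Chunks may arrive in any order and may be duplicated.
    """
    chunks = {}
    count = None
    for datagram in datagrams:
        _message_id, index, count = HEADER.unpack_from(datagram)
        chunks[index] = datagram[HEADER.size:]
    if count is None or len(chunks) != count:
        return None
    return b"".join(chunks[i] for i in range(count))
```

A receiver that gets `None` back would fall back on the re-send mechanism mentioned in the question.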

Send packets larger than 64K in TCP

As far as we know, the absolute limit on TCP packet size is 64K (65535 bytes), and in practice this is far larger than the size of any packet you will see, because the lower layers (e.g. Ethernet) have lower packet sizes. The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes.
I want to know: is there any way, or any tool, to send packets larger than 64K?
I want to test how a device behaves when faced with a packet larger than 64K. If I send such a packet, does the device drop part of it, or does something else happen?
So:
1- How can such large packets be sent? What is the proper layer for this?
2- How does the receiver usually behave?
The IP packet format has only 16 bits for the size of the packet, so you will not be able to create a packet larger than 64K. See http://en.wikipedia.org/wiki/IPv4#Total_Length. Since TCP uses IP as the lower layer, this limit applies here too.
There is no such thing as a TCP packet. TCP data is sent and received in segments, which can be as large as you like up to the limits of the API you're using, as they can be comprised of multiple IP packets. At the receiver TCP is indistinguishable from a byte stream.
NB: OSI has nothing to do with this, or anything else.
TCP segments are not size-limited. The thing which imposes the limit is that IPv4 and IPv6 packets have 16 bit length fields, so a size larger than this limit is not possible to express.
However, RFC 2675 is a proposed standard for IPv6 which would expand the length field to 32 bits, allowing much larger TCP segments.
See here for a talk about why this change could help improve performance and here for a set of (experimental) patches to Linux to enable this RFC.
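The 16 bit limit mentioned in these answers is easy to see by trying to pack a length into an unsigned 16 bit field, which is how the IPv4 Total Length field is stored. The helper name `pack_total_length` is made up for illustration; this only demonstrates the field width, it does not build a real packet.

```python
import struct


def pack_total_length(length):
    """Pack a length into an unsigned 16-bit big-endian (network order) field."""
    return struct.pack("!H", length)


print(pack_total_length(65535))  # b'\xff\xff', the largest expressible value

try:
    pack_total_length(65536)
except struct.error as exc:
    # 65536 simply does not fit into 16 bits.
    print("cannot encode:", exc)
```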

UDP Networking Fundamentals

I've been doing some work with C# Networking using UDP. I'm getting on fine but need the answer to a couple of fundamental questions I'm having problems testing:
Currently I'm sending data in ~16000 byte datagrams, which according to Wireshark get split into several 1500 byte packets (because of max packet size limits) and then reassembled at the other end.
Am I right in understanding the datagram will be received complete at the other end OR not at all. I.e. it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I've looked in a lot of places but there seems to be a lot of confusion about the difference between datagrams and the underlying packets...
Thanks for your help!
There is no chance of ending up with a fragmented datagram due to packet loss?
I believe that's true: fragmentation and fragment reassembly are handled by the protocol layer below UDP, i.e. by the "IP" layer, which will report an error if it fails to reassemble the packet fragments into a datagram (for example, search for "fragment" in RFC 792).
http://www.pcvr.nl/tcpip/udp_user.htm#11_5 says,
"The IP layer at the destination performs the reassembly. The goal is to make fragmentation and reassembly transparent to the transport layer (TCP and UDP), which it is, except for possible performance degradation."
As you may know, the 16 bit UDP length field indicates that you can send a total of 65535 bytes. However, the data itself can theoretically be at most 65535 - (sizeof(IP Header) + sizeof(UDP Header)) = 65535 - (20 + 8) = 65507 bytes.
But this does not mean that every application using UDP will send this amount of data; for example, DNS limits its UDP packets to 512 bytes. One reason is that you don't get any ACK packets from the server, so packets may get lost in the network (through transmission problems) without the sender noticing. Another is that intermediate nodes may encapsulate datagrams inside another protocol, as IPsec and other protocols do.
For UDP there are no ACK packets, so in your case, if the underlying application uses UDP, you should not see any ACK packets. Also, some servers limit the size of their UDP packets depending on the application, so if you transfer data from client to server you should see the same number of bytes (e.g. 512 bytes) going out and coming back in Wireshark. Mostly, the source makes the request and the destination sends UDP datagrams of X bytes back.
These links may be good for your questions:
Wireshark UDP analysis
RFC 1122 (states that 576 is the minimum maximum reassembly buffer size)
Am I right in understanding the datagram will be received complete at the other end OR not at all. I.e. it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
That is correct.
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I don't understand this question. You need to ACK each datagram regardless of its size, and you should make them < 1500 bytes so they won't get fragmented. Otherwise you may never be able to transmit a specific datagram at all, if it repeatedly gets fragmented and a fragment repeatedly gets lost.

UDP Packet size and packet losses

I've been writing a program that uses a stop and wait protocol on top of UDP to send packets over LAN and also over WAN. I've recently been testing my program and have noticed that the packet loss rate is higher for larger packets (approaching 64k bytes). Intuitively this makes sense but what are the actual reasons for this?
UDP packets greater than the MTU size of the network that carries them will be automatically split up into multiple packets, and then reassembled by the recipient. If any of those multiple sub-packets gets dropped, then the receiver will drop the rest of them as well.
So for example, if you send a 63k UDP packet and it goes over Ethernet, it will get broken up into roughly 44 smaller "fragment" packets (because Ethernet's MTU is 1500 bytes, and some of those bytes are used for IP and UDP headers, so the amount of user data that fits in each packet is smaller than that). The receiver will only "see" that UDP packet if every one of those fragment packets makes it through okay. If just one of them gets dropped, the whole operation fails.
Well, data networks are far from reliable; packets get dropped all the time. Overloaded routers, full buffers and corrupt packets are some of the reasons. Since UDP has no flow control capabilities, it can't slow down if for example the receiving end is overloaded.
As Jeremy explained, the bigger the payload, the more packets it is going to be split into, and therefore a bigger chance of losing some of them.
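To put rough numbers on this, the sketch below counts the IP fragments needed for a UDP payload and estimates the chance that all of them arrive. It assumes IPv4 with a 20 byte header and no options, an 8 byte UDP header, and (a strong simplification) that each fragment is lost independently with the same probability. The function names are made up for illustration.

```python
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def fragment_count(udp_payload, mtu=1500):
    """Number of IP fragments needed to carry one UDP datagram."""
    per_fragment = (mtu - IP_HEADER) // 8 * 8  # fragment data is a multiple of 8
    total = udp_payload + UDP_HEADER           # the UDP header travels in fragment 1
    return max(1, -(-total // per_fragment))   # ceiling division


def delivery_probability(udp_payload, fragment_loss, mtu=1500):
    """Chance that every fragment arrives, given a per-fragment loss rate."""
    return (1.0 - fragment_loss) ** fragment_count(udp_payload, mtu)


print(fragment_count(1472))       # 1, fits a single Ethernet frame
print(fragment_count(63 * 1024))  # 44
print(delivery_probability(63 * 1024, 0.01))
```

Under these assumptions, a 1% per-fragment loss rate leaves a 63 KiB datagram with only about a 64% chance of arriving intact, compared with 99% for a datagram that fits in a single packet.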
UDP is used in cases where a dropped packet here and there won't affect anything, or in cases where you need something to get there in time or not at all (VoIP, streaming video, etc.).
It's all about IP fragmentation and defragmentation. A packet larger than the MTU is fragmented and has to be defragmented at the final host. There is also a chance that the fragments get fragmented again along the path, which adds further delay. Sometimes, if a network element is configured for layer 4 filtering, it (rather than the final host) defragments the packet, applies its rules, and then fragments it again and forwards it. That is why applications that need performance always try to send data of size <= (MTU - IPHDR - UDPHDR); the Ethernet header does not count against the MTU.
