UDP numbered segments? - networking

My firewall textbook says: "UDP breaks a message into numbered segments so that it can be transmitted."
My understanding was UDP had no sequence or other numbering scheme? That data was broken into packets and sent out with no ordered reconstruction on the other end, at least on this level. Am I missing something?

The book is just wrong here. The relevant section says:
User Datagram Protocol (UDP)—This protocol is similar to TCP in that it handles the addressing of a message. UDP breaks a message into numbered segments so that it can be transmitted. It then reassembles the message when it reaches the destination computer.
UDP does not include any mechanism to segment or reassemble messages; each message is sent as a single UDP datagram. If you look at the UDP "packet" (technically datagram) structure on page 108, there's no segment number or anything like that.
Mind you, segmentation can happen at other layers, either above or below UDP:
IP packets can be fragmented if they're too big for a network link's MTU (maximum transfer unit). This can happen to IP packets that contain UDP, TCP, or whatever. This is actually relevant for firewalls because creative fragmentation can sometimes be used to bypass packet filtering rules.
Some protocols that run on top of UDP also use something like numbered segments. For example, TFTP (trivial file transfer protocol) breaks files into "blocks", and transmits a block number in the header for each block. (And the receiver responds acknowledging the block number it's received -- it's like a drastically simplified version of TCP.) But this is part of the TFTP protocol, not part of UDP.
QUIC is another example of a protocol that runs over UDP and supports segmentation (and multiple connections, and...), and each packet contains a packet number. But again it's part of the QUIC protocol, not UDP.

Related

packet order in TCP packet fragmentation

In TCP/IP, we have MSS and MTU when sending and receiving packets.
MTU is an IP layer concept, which is determined by the underlying hardware. It shows the maximum data size that an IP layer packet can contain during one transmission.
MSS is a TCP layer concept, which is limited by the MTU, showing that the TCP data stream will be fragmented into MSS-size packets.
Our protocol lies on top of TCP, and each protocol will define its own packet. One example is MySQL, which defines its packet size up to 2^24-1, that is around 16M. When the big enough protocol packet comes to TCP, it will be fragmented according to MSS.
Assume that a client needs to send DATA1 and DATA2 to server. DATA2 size is bigger than MSS, and DATA2 will be fragmented into DATA2_1, DATA2_2. As the packets will be handled by the IP layer, so the time that each packet arrives at server might not be the same as that when the client sends them.
So I think the sequence of packets' arriving might be the following:
DATA1 DATA2_1, DATA2_2
DATA1, DATA2_1, DATA2_2
DATA1, DATA2_2, DATA2_1
In the first case, the server receives DATA1 and DATA2_1 in one tcp packet and then another packet contains DATA2_2 arrives.
In the second case, the server receives DATA1, DATA2_1 and DATA2_2 in three packets.
In the third case, the server first receives DATA2_2 and then DATA2_1.
My question:
Is the third case possible?
If yes, it disobeys that TCP is a stream protocol, and stream protocol should be ordered. And even this does not break the stream rule, how to handle this scenario?
If no, how TCP makes the disordered packets into its original order?
It is possible to receive that sequence over the network, however the TCP implementation will hide that detail from your application and only feed the data to you in stream order. (In fact since fragmentation happens at the IP layer it won't even be shown to the TCP layer until the second part has arrived also)
The fact that received packets have to be held in a buffer even after receiving them in some cases like this is why you will see people referring to UDP as better for lower latency applications: you can receive datagrams out of order with UDP and it's up to your application to figure out how to deal with that possibility.
Is the third case possible.
Yes, of course.
If so, it disobeys that TCP is a stream protocol ...
No it doesn't.
Your cases concern arrival of IP packets into a host. TCP being a stream protocol is about delivery of data into an application.
The packet fragments get reassembled in the correct order by the IP layer, and the packets get reassembled into segments in the correct order by TCP, and the now correctly ordered data stream is delivered to the application.

UDP - Optional Checksum

From what I have read about UDP, it has no error handling, no checking for things like sequence of data sent/recieved, no checking for duplicate packets, no checking for corrupt packets and obviously no guarantee that the packets sent are even received...
So with that in mind, why an earth is there actually an option to use checksums in UDP?? Because surely if you want to make sure the data being sent is received in the correct order (and not corrupt and so on) then you would use TCP...
UDP packets include a field for a 16 bit CRC checksum which the receiving operating system will use to check for packet corruption. If the checksum is present and fails, then the packet will be silently discarded. It is up to the application to notice that the packet disappeared and take corrective action.
UDP checksums are enabled by default on all modern operating systems. It is possible to disable UDP checksums in IPv4, either at the socket or OS level. Doing so would reduce the CPU overhead of processing each packet at both the sender and receiver. This might be desirable if, for example, the application were calculating its own checksum separately. Without any checksum, there would be no guarantee that the bytes received are the same as the bytes sent.
The task of UDP is to transport datagrams, which are "network data packets". For UDP, every data packet is a transmission of its own. If you send 3 packets, those are three independent transmissions for UDP. Whether the content of these 3 packets somehow belongs together or if these are three individual requests (think of DNS requests, where every request is sent as an own UDP packet), UDP doesn't know and doesn't care. All that UDP guarantees is that a packet is either transmitted as a whole or not at all; either the entire packet arrives or the entire packet is lost, you will never see "half of a packet" arriving. So if you just want to send a bunch of data packets, you use UDP.
The task of TCP, on the other hand, is to transport a stream of data. It's not about packets. It's about a stream of bytes somehow making it from one host to another. How this happens, e.g. how TCP is breaking the data stream into chunks and sending these chunks over the network and ensuring that no data is lost and all data is in order, is up to TCP. All that TCP guarantees is that the bytes will arrive correctly and in order at the other side, unless the TCP connection is lost, in which case the stream ends abruptly somewhere in the middle but all data, that arrived up to that point, did arrive correctly and in correct order. So despite TCP also working with packets, the transmission behaves like a stream that has no internal "data units". When sending 80 bytes over TCP, there may be one packet with 80 bytes or 10 packets with each 8 bytes or anything in between, you cannot know and you don't have to.
But just because you use UDP doesn't mean you don't care for data corruption in UDP packets. Keep in mind that corruption may not just affect your data, it may also affect the UDP header itself. If only a single bit swaps, the UDP packets may have an incorrect destination port. So they added a checksum which ensures that neither the UDP header nor the data payload has been corrupted but made it optional, so it's up to you whether you want to use it or not. If used, corrupt packets are dropped and thus behave like lost packets. If your code takes care of lost packets, it will automatically take care of corrupt packets, too.
With IPv6 though, the checksum was dropped from the IP header, which means that IP header corruptions are no longer detected. But this was seen as a small problem, as most layer 2 protocols have their own mechanism to detect corrupt data (e.g. Ethernet and WiFi already guarantee that data is not corrupted on its way through the network) and the checksums of UDP/TCP also cover some of the IP header fields, so even without layer 2 error checking, the recipient would notice if the IP addresses in the header have been corrupted along the way and drop the packet. As a consequence, the UDP checksum is no longer optional with IPv6.

UDP Networking Fundamentals

I've been doing some work with C# Networking using UDP. I'm getting on fine but need the answer to a couple of fundamental questions I'm having problems testing:
Currently I'm sending data in a ~16000 byte datagrams, which according to wireshark is getting split into several 1500 byte packets (because of max packet size limits) and then reassembled at the other end.
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I've looked in a lot of places but there seems to be a lot of confusion between the differences between datagrams and the underlying packets...
Thanks for you help!
There is no chance of ending up with a fragmented datagram due to packet loss?
I believe that's true: that fragmentation and fragment reassembly is handled by the protocol layer below UDP, i.e. that it's handled by the "IP" layer, which will error if it fails to reassemble the packet-fragments into a datagram (for example, search for "fragment" in RFC 792).
http://www.pcvr.nl/tcpip/udp_user.htm#11_5 says,
"The IP layer at the destination performs the reassembly. The goal is to make fragmentation and reassembly transparent to the transport layer (TCP and UDP), which it is, except for possible performance degradation."
As you may now 16 bit UDP length field indicates that you can send a total of 65535 bytes. However, the data can be theoretically (sizeof(IP Header) + sizeof(UDP Header)) = 65535-(20+8) = 65507 bytes.
But this does not mean that all applications that are using UDP will send this amount of data as an example DNS packets limits to 512 bytes. This is because you don't get any ACK packets from server. This is one reason that packets may get lost in the network (packet transmission problems and loss). Secondly intermediate nodes may encapsulate datagrams inside of another protocol, as an example IPSEC or other protocols do that.
For UDP there is no ACK packets, so in your case if underlying application uses UDP you should not see any ACK packets. Secondly, some of the server limit their sizes to the max UDP packets depending on the application, so if you have data transfer from client to server you should see same bytes e.g 512 bytes. going and coming back in wireshark. Mostly, source makes the request and destination sends X bytes UDP datagrams back.
These links may be good for your questions:
Wireshark UDP analysis
RFC 1122 (states that 576 is the minimum maximum reassembly buffer size)
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
That is correct.
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I don't understand this question. You need to ACK each datagram regardless of its size, and you should make them < 1500 bytes so they won't get fragmented. Otherwise you may never be able to transmit any specific datagrams at all, if it repeatedly gets fragmented and a fragment repeatedly gets lost.

Why do we say the IP protocol in TCP/IP suite is connectionless?

Why is the IP called a connectionless protocol? If so, what is the connection-oriented protocol then?
Thanks.
Update - 1 - 20:21 2010/12/26
I think, to better answer my question, it would be better to explain what "connection" actually means, both physically and logically.
Update - 2 - 9:59 AM 2/1/2013
Based on all the answers below, I come to the feeling that the 'connection' mentioned here should be considered as a set of actions/arrangements/disciplines. Thus it's more an abstract concept rather than a concrete object.
Update - 3 - 11:35 AM 6/18/2015
Here's a more physical explanation:
IP protocol is connectionless in that all packets in IP network are routed independently, they may not necessarily go through the same route, while in a virtual circuit network which is connection oriented, all packets go through the same route. This single route is what 'virtual circuit' means.
With connection, because there's only 1 route, all data packets will arrive in the same order as they are sent out.
Without connection, it is not guaranteed all data packets will arrive
in the same order as they are sent out.
Update - 4 - 9:55 AM 2016/1/20/Wed
One of the characteristics of connection-oriented is that the packet order is preserved. TCP use a sequence number to achieve that but IP has no such facility. Thus TCP is connection-oriented while IP is connection-less.
The basic idea is pretty simple: with IP (on its own -- no TCP, UDP, etc.) you're just sending a packet of data. You simply send some data onto the net with a destination address, but that's it. By itself, IP gives:
no assurance that it'll be delivered
no way to find out if it was
nothing to let the destination know to expect a packet
much of anything else
All it does is specify a minimal packet format so you can get some data from one point to another (e.g., routers know the packet format, so they can look at the destination and send the packet on its next hop).
TCP is connection oriented. Establishing a connection means that at the beginning of a TCP conversation, it does a "three way handshake" so (in particular) the destination knows that a connection with the source has been established. It keeps track of that address internally, so it can/will/does expect more packets from it, and be able to send replies to (for example) acknowledge each packet it receives. The source and destination also cooperate to serial number all the packets for the acknowledgment scheme, so each end knows whether packets it sent were received at the other end. This doesn't involve much physically, but logically it involves allocating some memory on both ends. That includes memory for metadata like the next packet serial number to use, as well as payload data for possible re-transmission until the other side acknowledges receipt of that packet.
TCP/IP means "TCP over IP".
TCP
--
IP
TCP provides the "connection-oriented" logic, ordering and control
IP provides getting packets from A to B however it can: "connectionless"
Notes:
UDP is connection less but at the same level as TCP
Other protocols such as ICMP (used by ping) can run over IP but have nothing to do with TCP
Edit:
"connection-oriented" mean established end to end connection. For example, you pick up the telephone, call someone = you have a connection.
"connection-less" means "send it, see what happens". For example, sending a letter via snail mail.a
So IP gets your packets from A to B, maybe, in any order, not always eventually. TCP sorts them out, acknowledges them, requests a resends and provides the "connection"
Connectionless means that no effort is made to set up a dedicated end-to-end connection, While Connection-Oriented means that when devices communicate, they perform handshaking to set up an end-to-end connection.
IP is an example of the Connectionless protocols , in this kind of protocols you usually send informations in one direction, from source to destination without checking to see if the destination is still there, or if it is prepared to receive the information . Connectionless protocols (Like IP and UDP) are used for example with the Video Conferencing when you don't care if some packets are lost , while you have to use a Connection-Oriented protocol (Like TCP) when you send a File because you want to insure that all the packets are sent successfully (actually we use FTP to transfer Files). Edit :
In telecommunication and computing in
general, a connection is the
successful completion of necessary
arrangements so that two or more
parties (for example, people or
programs) can communicate at a long
distance. In this usage, the term has
a strong physical (hardware)
connotation although logical
(software) elements are usually
involved as well.
The physical connection is layer 1 of
the OSI model, and is the medium
through which the data is transfered.
i.e., cables
The logical connection is layer 3 of
the OSI model, and is the network
portion. Using the Internetwork
Protocol (IP), each host is assigned a
32 bit IP address. e.g. 192.168.1.1
TCP is the connection part of TCP/IP. IP's the addressing.
Or, as an analogy, IP is the address written on the envelope, TCP is the postal system which uses the address as part of the work of getting the envelope from point A to point B.
When two hosts want to communicate using connection oriented protocol, one of them must first initiate a connection and the other must accept it. Logically a connection is made between a port in one host and other port in the other host. Software in one host must perform a connect socket operation, and the other must perform an accept socket operation. Physically the initiator host sends a SYN packet, which contains all four connection identifying numbers (source IP, source port, destination IP, destination port). The other receives it and sends SYN-ACK, the initiator sends an ACK, then the connection are established. After the connection established, then the data could be transferred, in both directions.
In the other hand, connectionless protocol means that we don't need to establish connection to send data. It means the first packet being sent from one host to another could contain data payloads. Of course for upper layer protocols such as UDP, the recipient must be ready first, (e.g.) it must perform a listen udp socket operation.
The connectionless IP became foundation for TCP in the layer above
In TCP, at minimal 2x round trip times are required to send just one packet of data. That is : a->b for SYN, b->a for SYN-ACK, a->b for ACK with DATA, b->a for ACK. For flow rate control, Nagle's algorithm is applied here.
In UDP, only 0.5 round trip times are required : a->b with DATA. But be prepared that some packets could be silently lost and there is no flow control being done. Packets could be sent in the rate that are larger than the capability of the receiving system.
In my knowledge, every layer makes a fool of the one above it. The TCP gets an HTTP message from the Application layer and breaks it into packets. Lets call them data packets. The IP gets these packets one by one from TCP and throws it towards the destination; also, it collects an incoming packet and delivers it to TCP. Now, TCP after sending a packet, waits for an acknowledgement packet from the other side. If it comes, it says the above layer, hey, I have established a connection and now we can communicate! The whole communication process goes on between the TCP layers on both the sides sending and receiving different types of packets with each other (such as data packet, acknowledgement packet, synchronization packet , blah blah packet). It uses other tricks (all packet sending) to ensure the actual data packets to be delivered in ordered as they were broken and assembled. After assembling, it transfers them to the above application layer. That fool thinks that it has got an HTTP message in an established connection but in reality, just packets are being transferred.
I just came across this question today. It was bouncing around in my head all day and didn't make any sense. IP doesn't handle transport. Why would anyone even think of IP as connectionless or connection oriented? It is technically connectionless because it offers no reliability, no guaranteed delivery. But so is my toaster. My toaster offers no guaranteed delivery, so why not call aa toaster connectionless too?
In the end, I found out it's just some stupid title that someone somewhere attached to IP and it stuck, and now everyone calls IP connectionless and has no good reason for it.
Calling IP connectionless implies there is another layer 3 protocol that is connection oriented, but as far as I know, there isn't and it is just plain stupid to specify that IP is connectionless. MAC is connectionless. LLC is connectionless. But that is useless, technically correct info.

sending multiple tcp packets in an ip packet

is it possible to send multiple tcp or udp packets on a single ip packet? are there any specifications in the protocol that do not allow this.
if it is allowed by the protocol but is generally not done by tcp/udp implementations could you point me to the relevant portion in the linux source code that proves this.
are there any implementations of tcp/udp on some os that do send multiple packets on a single ip packet. (if it is allowed).
It is not possible.
The TCP seqment header does not describe its length. The length of the TCP payload is derived from the length of the IP packet(s) minus the length of the IP and TCP headers. So only one TCP segment per IP packet.
Conversely, however, a single TCP segment can be fragmented over several IP packets by IP fragmentation.
Tcp doesn't send packets: it is a continuous stream. You send messages.
Udp, being packet based, will only send one packet at a time.
The protocol itself does not allow it. It won't break, it just won't happen.
The suggestion to use tunneling is valid, but so is the warning.
You might want to try tunneling tcp over tcp, although it's generally considered a bad idea. Depending on your needs, your mileage may vary.
You may want to take a look at the Stream Control Transmission Protocol which allows multiple data streams across a single TCP connection.
EDIT - I wasn't aware that TCP doesn't have it's own header field so there would be no way of doing this without writing a custom TCP equivalent that contains this info. SCTP may still be of use though so I'll leave that link.
TCP is a public specification, why not just read it?
RFC4164 is the roadmap document, RFC793 is TCP itself, and RFC1122 contains some errata and shows how it fits together with the rest of the (IPv4) universe.
But in short, because the TCP header (RFC793 section 3.1) does not have a length field, TCP data extends from the end of the header padding to the end of the IP packet. There is nowhere to put another data segment in the packet.
You cannot pack several TCP packets into one IP packet - that is a restriction of specification as mentioned above. TCP is the closest API which is application-oriented. Or you want to program sending of raw IP messages? Just tell us, what problem do you want to solve. Think about how you organize the delivery of the messages from one application to another, or mention that you want to hook into TCP/IP stack. What I can suggest you:
Consider packing whatever you like into UDP packet. I am not sure, how easy is to initiate routing of "unpacked" TCP packages on remote side.
Consider using PPTP or similar tunnelling protocol.

Resources