Methods for implementing UDP multicast reliable - tcp

I am preparing for my university exam, and one of last year's questions was "how to make UDP multicast reliable" (like TCP, with retransmission of lost packets).
I thought about something like this:
The server sends a multicast using UDP.
Every client sends an acknowledgement of receiving those packets (using TCP).
If the server realizes that not everyone received the packets, it resends them by multicast, or by unicast to the particular client.
The problem is that there might be one client that frequently loses packets and forces the server to resend.
Is this a good approach?

Every client sends an acknowledgement of receiving those packets (using TCP).
Sending an ACK for each packet, and using TCP to do so, does not scale to a large number of receivers. A NACK-based scheme is more efficient.
Each packet sent from the server should have a sequence number associated with it. As clients receive them, they keep track of which sequence numbers were missed. If packets are missed, a NACK message can then be sent back to the server via UDP. This NACK can be formatted as either a list of sequence numbers or a bitmap of received / not received sequence numbers.
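The receiver-side bookkeeping described above could be sketched roughly as follows. This is only an illustration: the class and method names are made up for this example, and it assumes sequence numbers start at 0 and never wrap around.

```python
class NackTracker:
    """Receiver-side gap tracking for a NACK-based scheme (hypothetical names)."""

    def __init__(self):
        self.highest_seen = -1   # highest sequence number received so far
        self.missing = set()     # sequence numbers skipped and not yet repaired

    def on_packet(self, seq):
        if seq > self.highest_seen:
            # Everything between the previous highest and this packet was skipped.
            self.missing.update(range(self.highest_seen + 1, seq))
            self.highest_seen = seq
        else:
            # A late or retransmitted packet fills a gap (or is a duplicate).
            self.missing.discard(seq)

    def nack_list(self):
        """NACK formatted as a sorted list of missing sequence numbers."""
        return sorted(self.missing)

    def nack_bitmap(self, base, length):
        """NACK formatted as a bitmap: 1 = received, 0 = missing or not yet seen."""
        return [
            1 if (base + i <= self.highest_seen and base + i not in self.missing) else 0
            for i in range(length)
        ]


tracker = NackTracker()
for seq in [0, 1, 4, 5, 2]:      # packet 3 never arrives, packet 2 arrives late
    tracker.on_packet(seq)
print(tracker.nack_list())        # [3]
print(tracker.nack_bitmap(0, 6))  # [1, 1, 1, 0, 1, 1]
```

Either representation would then be sent back to the server in a UDP datagram; the list is compact when few packets are missing, the bitmap when many are.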
If the server realizes that not everyone received the packets, it resends them by multicast, or by unicast to the particular client.
When the server receives a NACK it should not immediately resend the missing packets but wait for some period of time, typically a multiple of the GRTT (Group Round Trip Time -- the largest round trip time among the receiver set). That gives it time to accumulate NACKs from other receivers. Then the server can multicast the missing packets so any clients missing them can receive them.
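A minimal sketch of that hold-and-accumulate behaviour, assuming the GRTT is already known and that the caller passes in the current time. The class name, the `multiplier` parameter and the method names are hypothetical, not taken from any of the protocols listed below.

```python
class NackAggregator:
    """Collects NACKs for a hold period before one combined retransmission."""

    def __init__(self, grtt, multiplier=2.0):
        self.hold_time = grtt * multiplier   # assumed: wait a multiple of the GRTT
        self.pending = set()                 # union of all NACKed sequence numbers
        self.deadline = None                 # when the current hold period ends

    def on_nack(self, seqs, now):
        if self.deadline is None:
            self.deadline = now + self.hold_time   # the first NACK starts the timer
        self.pending.update(seqs)

    def due(self, now):
        """Return the sequence numbers to re-multicast, or [] if still waiting."""
        if self.deadline is None or now < self.deadline:
            return []
        to_send = sorted(self.pending)
        self.pending.clear()
        self.deadline = None
        return to_send


agg = NackAggregator(grtt=0.1)
agg.on_nack([3, 7], now=0.00)    # receiver A
agg.on_nack([7, 9], now=0.05)    # receiver B, overlapping loss
print(agg.due(now=0.10))         # [] (still inside the hold period)
print(agg.due(now=0.25))         # [3, 7, 9] (sent once, by multicast)
```

Because the pending set is a union, a packet lost by several receivers is retransmitted only once.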
If this scheme is being used for file transfer as opposed to streaming data, the server can alternately send the file data in passes. The complete file is sent on the first pass, during which any NACKs that are received are accumulated and packets that need to be resent are marked. Then on subsequent passes, only retransmissions are sent. This has the advantage that clients with lower loss rates will have the opportunity to finish receiving the file while high loss receivers can continue to receive retransmissions.
The problem is that there might be one client that frequently loses packets and forces the server to resend.
For very high loss clients, the server can set a threshold for the maximum percentage of packets missed. If a client sends back NACKs in excess of that threshold one or more times (how many times is up to the server), the server can drop that client and either not accept its NACKs or send a message to that client informing it that it was dropped.
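One way this threshold could look in code is sketched below. The threshold values, the strike count and all names are assumptions chosen for the example.

```python
class LossMonitor:
    """Drops receivers whose NACKed fraction repeatedly exceeds a threshold."""

    def __init__(self, max_loss=0.5, max_strikes=3):
        self.max_loss = max_loss         # assumed: maximum tolerated loss fraction
        self.max_strikes = max_strikes   # assumed: how many times it may be exceeded
        self.strikes = {}                # client id -> number of violations so far
        self.dropped = set()

    def on_nack(self, client, nacked, sent):
        """Record one NACK report; return True if the client is (now) dropped."""
        if client in self.dropped:
            return True                  # NACKs from dropped clients are ignored
        if sent > 0 and nacked / sent > self.max_loss:
            self.strikes[client] = self.strikes.get(client, 0) + 1
            if self.strikes[client] >= self.max_strikes:
                self.dropped.add(client)
        return client in self.dropped


monitor = LossMonitor(max_loss=0.5, max_strikes=2)
print(monitor.on_nack("a", nacked=10, sent=100))   # False
print(monitor.on_nack("b", nacked=60, sent=100))   # False (first strike)
print(monitor.on_nack("b", nacked=70, sent=100))   # True  (second strike, dropped)
```

When `on_nack` returns True the server can either ignore that client silently or send it a "you were dropped" message, as described above.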
There are a number of protocols which implement these features:
UFTP - Encrypted UDP based FTP with multicast (disclosure: author)
NORM - NACK-Oriented Reliable Multicast
PGM - Pragmatic General Multicast
UDPCast
Relevant RFCs:
RFC 4654 - TCP-Friendly Multicast Congestion Control (TFMCC): Protocol Specification
RFC 5401 - Multicast Negative-Acknowledgment (NACK) Building Blocks
RFC 5740 - NACK-Oriented Reliable Multicast (NORM) Transport Protocol
RFC 3208 - PGM Reliable Transport Protocol Specification

To make UDP reliable, you have to handle a few things (i.e., implement them yourself).
Connection handling: The connection between the sending and receiving process can drop. Most reliable implementations send keep-alive messages to maintain the connection between the two ends.
Sequencing: Messages need to be split into chunks before sending, and the chunks numbered so the receiver can detect gaps and restore the order.
Acknowledgement: After each message is received, an ACK message needs to be sent to the sending process. These ACK messages can also be sent through UDP; it doesn't have to be through TCP. The receiving process might realise that it has lost a message. In this case, it will stop delivering messages from the holdback queue (a queue that holds the received messages, like a waiting room for messages) and request a retransmission of the missing message.
Flow control: Throttle the sending of data based on the ability of the receiving process to deliver the data.
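The holdback queue mentioned under "Acknowledgement" can be sketched like this. It is a simplified illustration with made-up names, assuming sequence numbers start at 0 and do not wrap.

```python
class HoldbackQueue:
    """Holds out-of-order messages until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 0    # next sequence number the application expects
        self.held = {}       # seq -> payload, waiting for earlier messages

    def receive(self, seq, payload):
        """Return (payloads now deliverable in order, seq to re-request or None)."""
        if seq >= self.next_seq:
            self.held.setdefault(seq, payload)   # duplicates are ignored
        delivered = []
        while self.next_seq in self.held:
            delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        # If something is still held, the message just before it is missing.
        missing = self.next_seq if self.held else None
        return delivered, missing


q = HoldbackQueue()
print(q.receive(0, "a"))   # (['a'], None)
print(q.receive(2, "c"))   # ([], 1)  -> delivery stops, ask for message 1 again
print(q.receive(1, "b"))   # (['b', 'c'], None)
```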
Usually, the processes are organised into groups. Each of these groups normally has a leader and a view of the entire group. This is called virtual synchrony.

Related

UDP TCP management of server congestion

Do the TCP and UDP protocols have a way to manage saturation?
By saturation I mean network congestion: what happens if the server's buffer is full and the client sends a UDP datagram or TCP segment to the server?
Do these protocols have a way to handle this scenario, or would the data be lost?
This is a question about TCP/UDP basics. For this reason this answer is not going to be a complete TCP and UDP guide.
Network congestion at low level protocols
In case of network congestion, the data sender will usually notice it because the data sending APIs fail (e.g. the BSD functions send() and sendto()).
For example I have personal experience of TCP/IP over GPRS, in which the network problems caused data sending APIs to fail. In that case it was up to the sender to preserve its data in order to send it as soon as possible.
Congestion at receiver's side
That's what the asker actually had in mind.
Let's start from UDP. Really short answer: by its own design, data sent to congested servers will be lost. From Wikipedia,EN:
[...]It has no handshaking dialogues, and thus exposes the user's
program to any unreliability of the underlying network; there is no
guarantee of delivery, ordering, or duplicate protection[...]
Finally TCP. It has been designed to provide what is missing in UDP. From Wikipedia, EN:
TCP provides reliable, ordered, and error-checked delivery of a stream
of octets (bytes) between applications running on hosts communicating
via an IP network
How are these features achieved? I cannot provide a full TCP tutorial in this answer, but I can list three fundamental TCP traits for achieving reliability:
Packets are numbered (each packet, but we can say each byte, has a specific sequence number)
Retransmissions. The receiver sends an acknowledgement (ACK) for each packet (but we can say each byte) it receives. When an ACK does not arrive, the sender understands that the packet has not been received and can retransmit it (the number of retransmissions allowed varies according to different implementations and user settings)
Sliding window. Let's describe it in a simple way: each peer continually informs the remote peer about its window, the number of bytes it is able to receive. As soon as congestion occurs, a peer can reduce the window so that the sender will slow down until the congestion ends.
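The effect of the advertised window can be illustrated with a toy simulation. This is not how a real TCP stack is implemented; the function name, the round-based model and the parameters are assumptions made for the example.

```python
def simulate_flow_control(total_bytes, buffer_size, app_reads_per_round):
    """Return how many bytes the sender was allowed to send in each round.

    The receiver advertises the free space in its buffer as its window,
    and the sender never sends more than that window.
    """
    if app_reads_per_round <= 0:
        raise ValueError("the receiving application must read some data each round")
    buffered = 0                  # bytes waiting in the receiver's buffer
    remaining = total_bytes       # bytes the sender still has to send
    sent_per_round = []
    while remaining > 0:
        window = buffer_size - buffered          # what the receiver advertises
        chunk = min(window, remaining)           # sender stays within the window
        buffered += chunk
        remaining -= chunk
        sent_per_round.append(chunk)
        buffered -= min(buffered, app_reads_per_round)   # application drains data
    return sent_per_round


print(simulate_flow_control(10000, buffer_size=4096, app_reads_per_round=1000))
# [4096, 1000, 1000, 1000, 1000, 1000, 904]
```

Once the receiver's buffer fills up, the sender is throttled to the rate at which the receiving application consumes data, and nothing has to be dropped.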
To answer the OP's question: in case of server congestion on a TCP connection, the protocol provides retransmissions and dynamic throughput management that preserve any sent data for a reasonable amount of time.
I hope this simple description helps. It has probably raised even more questions, and in that case I suggest deepening your study at the real source:
RFC 768 (UDP)
RFC 793 (TCP)

TCP/IP protocol and fragmentation

Using the TCP/IP protocol, given a connection between a client and a server, are the packets sent by the client to the server always received in the same order they were sent?
For example, if the client sends 3 packets of data, A, B and C, will the server always receive A first followed by B and C or is it possible for the server to receive C first, followed by A and B?
At IP level, packets may arrive in any order (if they arrive). At TCP level, the data stream is guaranteed to be ordered in the same manner on both ends.
That means yes, the server will always receive A then B then C. As long as you are using TCP.
When using TCP, data is received by the destination application in the same order as it is sent by the source application.
See the following for more details:
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Data_transfer
TCP is a transmission protocol, and it transmits data by sending the data out in IP packets over the underlying IP network. TCP is responsible for ensuring the correct transmission of the data, which includes ordering the arriving packets, re-requesting missing ones and discarding duplicates.
TCP as such does not expose any notion of "packet" to the user; the fact that the data is chunked into IP packets is a detail of the "over IP" implementation. A different implementation, e.g. TCP-over-bicycle-courier, might employ an entirely different scheme.
It cannot happen that you receive data in a different order on the application side over a TCP socket.
It may happen that packets are received in a different order by the networking layer of the OS, but TCP makes it a requirement that the upper levels get data in order. It is the OS' role to ask again for unreceived fragments etc and assemble these fragments. So, you need not worry.
UDP, on the other hand, offers no such guarantee.
The server (as the physical NIC of the machine) might receive them in any order. Your OS might receive them in any order again; that will mostly (but not always) be the order of physical reception. Your application is guaranteed to receive them in the correct order; that's a property of TCP.
In general, packets will be received in the same order they are transmitted. But the network may drop or reorder packets. For example, packets may take different routes and arrive out of order. Packets may be lost or even duplicated on the network. The TCP implementation is responsible for retransmitting packets that are lost, acknowledging packets that are received, ignoring duplicated packets, all with the objective of accurately reconstructing the transmitted byte stream at the receiver.
At the application level, you send a stream of bytes and receive a stream of bytes. TCP does whatever is needed to ensure the received stream of bytes is identical to the sent stream of bytes, regardless of what happens to the packets on the network.
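To make the idea concrete, here is a much simplified sketch of what the receiving side does with segments that arrive out of order or twice. It assumes each segment is tagged with its byte offset in the stream and that segments never partially overlap; real TCP handles far more cases (timers, windows, partial overlaps). The function name is made up.

```python
def reassemble(segments):
    """Rebuild the byte stream from (offset, data) segments in arrival order."""
    stream = bytearray()
    pending = {}                       # offset -> data for segments that came early
    for offset, data in segments:
        if offset + len(data) <= len(stream):
            continue                   # duplicate of bytes already delivered
        pending[offset] = data
        # Deliver every buffered segment that lines up with the end of the stream.
        while len(stream) in pending:
            stream += pending.pop(len(stream))
    return bytes(stream)


# C arrives first, A is duplicated; the application still sees A, B, C in order.
arrived = [(6, b"C!"), (0, b"AAA"), (3, b"BBB"), (0, b"AAA")]
print(reassemble(arrived))   # b'AAABBBC!'
```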

What is the difference between UDP and TCP packets? What do you use them for?

I was configuring iptables yesterday. My colleague asked me this question, and I couldn't answer. I realized that I'm a much better developer than sysadmin and need to improve that.
So what are they? What are they for? Cons/Pros (if it's relevant).
These are like basic questions.
UDP :: User Datagram Protocol
1) No end-to-end connection between two machines (which may be in a local network or somewhere on the internet).
2) The data received at the receiver end is not a stream as in TCP but a complete block of data.
3) At the transport layer no packet order check is performed. Also, in case of any error in a received packet, the receiver will not ask the sender to resend that packet.
4) Because of the above behaviour, no sending buffers are required at the sender's end.
5) As no end-to-end connection is established and no handshaking is required, UDP is faster but less reliable than TCP. Thus it is mostly used in gaming, DNS, etc.
6) No acknowledgement needs to be sent after receiving packets.
TCP :: Transmission control Protocol
1) An end-to-end connection is maintained between two machines (which may be in a local network or somewhere on the internet).
2) The data received at the receiver end is a stream in TCP. Thus, when we do network programming for servers, we parse the header first and then, depending on the size mentioned in the header, read that many more bytes from the buffer.
3) Error checking and sequence numbering are both done. Thus, if a packet is lost or corrupted, it is retransmitted, and packets received out of order (rare) are put back in order. Also, lots of other mechanisms are involved for flow control (end-to-end flow control).
4) As connection establishment, handshaking and acknowledgement have to be done, TCP is slower in operation than UDP. (Not significantly, I believe.)
5) Lots of protocols use TCP as the underlying transport protocol: HTTP, FTP, TELNET, etc.
6) The communication procedure involves:
Server:: 1) Socket Open
2) Socket Bind
3) Socket Listen
4) Socket Accept
5) Socket Send/Recv
Client :: 1) Socket Open
2) Socket Connect
3) Socket Send/Recv
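The server and client steps above, together with the "parse the header first, then read that many bytes" idea from point 2, could look roughly like this with Python's standard socket module. The 4-byte length header is an assumption made for this sketch (any application-level framing would do), and the helper names are made up.

```python
import socket
import struct
import threading


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf += chunk
    return bytes(buf)


def send_msg(sock, payload):
    sock.sendall(struct.pack("!I", len(payload)) + payload)   # 4-byte length header


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))      # parse the header first
    return recv_exact(sock, length)                            # then read that many bytes


def serve_once(listener):
    conn, _addr = listener.accept()                 # server 4) accept
    with conn:
        send_msg(conn, recv_msg(conn).upper())      # server 5) recv, then send


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # server 1) open
    listener.bind(("127.0.0.1", 0))                 # server 2) bind (port 0 = any free port)
    listener.listen(1)                              # server 3) listen
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    # client 1) open and 2) connect are combined in create_connection
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        send_msg(client, b"hello")                  # client 3) send
        reply = recv_msg(client)                    # client 3) recv
    worker.join()
    listener.close()
    return reply


print(demo())   # b'HELLO'
```

The `recv_exact` loop is needed because TCP delivers a byte stream: a single `recv` call may return only part of a message, or the end of one message together with the start of the next.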
There are lots of other differences as well, but the above are the most common ones.
TCP is a reliable protocol which ensures that your packets reach their destination, and it is used in applications where all data must be transferred accurately between parties. TCP requires both parties to negotiate a connection before data transfer can start, and it is a resilient protocol since it will repeatedly resend a packet until that packet is received by the intended recipient.
UDP is unreliable in a sense that it allows some packets to be lost in transit. Some applications of UDP are found in movie streaming where you can actually afford to lose a frame and not jeopardize movie quality. UDP does not need binding between the two parties and is often looked at as a light alternative to TCP.
A nice table is found here: TCP vs UDP
P.R.'s answer is mostly correct, but incomplete.
TCP is a reliable, connected stream protocol. Its view of data is that of a bidirectional stream of bytes between hosts: whatever bytes you send will arrive at the other end in the same order, at least as far as the application is concerned (the OS will rearrange packets if needed).
UDP is an unconnected datagram protocol. Its view of data is that of discrete datagrams, or messages, with no guarantee that these messages actually reach their recipient, or that they arrive in the order they were sent. It does guarantee that if a message arrives, it arrives in its entirety and without modification.
This website probably offers the simplest explanation to the actual difference of UDP and TCP. From implementation point of view, see this question.
For a short answer: TCP works kind of like a registered letter, while UDP is kind of like an ordinary letter. With the latter you never know whether the recipient got the packet you sent.
There are loads of helpful comparisons
chris is right!
One fancy link dropping out of google is: http://www.skullbox.net/tcpudp.php

Why do we say the IP protocol in TCP/IP suite is connectionless?

Why is the IP called a connectionless protocol? If so, what is the connection-oriented protocol then?
Thanks.
Update - 1 - 20:21 2010/12/26
I think, to better answer my question, it would be better to explain what "connection" actually means, both physically and logically.
Update - 2 - 9:59 AM 2/1/2013
Based on all the answers below, I come to the feeling that the 'connection' mentioned here should be considered as a set of actions/arrangements/disciplines. Thus it's more an abstract concept rather than a concrete object.
Update - 3 - 11:35 AM 6/18/2015
Here's a more physical explanation:
IP protocol is connectionless in that all packets in IP network are routed independently, they may not necessarily go through the same route, while in a virtual circuit network which is connection oriented, all packets go through the same route. This single route is what 'virtual circuit' means.
With connection, because there's only 1 route, all data packets will arrive in the same order as they are sent out.
Without connection, it is not guaranteed all data packets will arrive in the same order as they are sent out.
Update - 4 - 9:55 AM 2016/1/20/Wed
One of the characteristics of a connection-oriented protocol is that the packet order is preserved. TCP uses a sequence number to achieve that, but IP has no such facility. Thus TCP is connection-oriented while IP is connectionless.
The basic idea is pretty simple: with IP (on its own -- no TCP, UDP, etc.) you're just sending a packet of data. You simply send some data onto the net with a destination address, but that's it. By itself, IP gives:
no assurance that it'll be delivered
no way to find out if it was
nothing to let the destination know to expect a packet
much of anything else
All it does is specify a minimal packet format so you can get some data from one point to another (e.g., routers know the packet format, so they can look at the destination and send the packet on its next hop).
TCP is connection oriented. Establishing a connection means that at the beginning of a TCP conversation, it does a "three way handshake" so (in particular) the destination knows that a connection with the source has been established. It keeps track of that address internally, so it can/will/does expect more packets from it, and be able to send replies to (for example) acknowledge each packet it receives. The source and destination also cooperate to serial number all the packets for the acknowledgment scheme, so each end knows whether packets it sent were received at the other end. This doesn't involve much physically, but logically it involves allocating some memory on both ends. That includes memory for metadata like the next packet serial number to use, as well as payload data for possible re-transmission until the other side acknowledges receipt of that packet.
TCP/IP means "TCP over IP".
TCP
--
IP
TCP provides the "connection-oriented" logic, ordering and control
IP provides getting packets from A to B however it can: "connectionless"
Notes:
UDP is connectionless but at the same level as TCP
Other protocols such as ICMP (used by ping) can run over IP but have nothing to do with TCP
Edit:
"connection-oriented" means an established end-to-end connection. For example, you pick up the telephone, call someone = you have a connection.
"connection-less" means "send it, see what happens". For example, sending a letter via snail mail.
So IP gets your packets from A to B, maybe, in any order, not always eventually. TCP sorts them out, acknowledges them, requests resends and provides the "connection".
Connectionless means that no effort is made to set up a dedicated end-to-end connection, while connection-oriented means that when devices communicate, they perform handshaking to set up an end-to-end connection.
IP is an example of a connectionless protocol. In this kind of protocol you usually send information in one direction, from source to destination, without checking whether the destination is still there or whether it is prepared to receive the information. Connectionless protocols (like IP and UDP) are used, for example, for video conferencing, where you don't care if some packets are lost, while you have to use a connection-oriented protocol (like TCP) when you send a file, because you want to ensure that all the packets are delivered successfully (actually we use FTP to transfer files).
Edit:
In telecommunication and computing in general, a connection is the successful completion of necessary arrangements so that two or more parties (for example, people or programs) can communicate at a long distance. In this usage, the term has a strong physical (hardware) connotation although logical (software) elements are usually involved as well.
The physical connection is layer 1 of the OSI model, and is the medium through which the data is transferred, i.e., cables.
The logical connection is layer 3 of the OSI model, and is the network portion. Using the Internet Protocol (IP), each host is assigned a 32-bit IP address, e.g. 192.168.1.1
TCP is the connection part of TCP/IP. IP's the addressing.
Or, as an analogy, IP is the address written on the envelope, TCP is the postal system which uses the address as part of the work of getting the envelope from point A to point B.
When two hosts want to communicate using a connection-oriented protocol, one of them must first initiate a connection and the other must accept it. Logically, a connection is made between a port on one host and a port on the other host. Software on one host must perform a connect socket operation, and the other must perform an accept socket operation. Physically, the initiator host sends a SYN packet, which contains all four connection-identifying numbers (source IP, source port, destination IP, destination port). The other host receives it and sends a SYN-ACK, the initiator sends an ACK, and then the connection is established. After the connection is established, data can be transferred in both directions.
On the other hand, a connectionless protocol means that we don't need to establish a connection to send data. The first packet sent from one host to another can already contain a data payload. Of course, for upper layer protocols such as UDP, the recipient must be ready first, e.g. it must have bound a UDP socket to the port and be waiting to receive.
The connectionless IP became the foundation for TCP in the layer above.
In TCP, at minimum 2 round trip times are required to send just one packet of data and have it acknowledged. That is: a->b for SYN, b->a for SYN-ACK, a->b for ACK with DATA, b->a for ACK. Nagle's algorithm may also be applied here; it batches small writes into fewer packets.
In UDP, only 0.5 round trip times are required: a->b with DATA. But be prepared that some packets could be silently lost and that no flow control is done. Packets could be sent at a rate larger than the receiving system can handle.
To my knowledge, every layer makes a fool of the one above it. TCP gets an HTTP message from the application layer and breaks it into packets. Let's call them data packets. IP gets these packets one by one from TCP and throws them towards the destination; it also collects incoming packets and delivers them to TCP. Now TCP, after sending a packet, waits for an acknowledgement packet from the other side. If it comes, it tells the layer above: hey, I have established a connection and now we can communicate! The whole communication process goes on between the TCP layers on both sides, sending and receiving different types of packets (such as data packets, acknowledgement packets, synchronization packets, blah blah packets). It uses other tricks (all of them packet sending) to ensure that the actual data packets are delivered in the order in which they were broken up, and reassembles them. After assembling, it passes them to the application layer above. That fool thinks it has got an HTTP message over an established connection, but in reality just packets are being transferred.
I just came across this question today. It was bouncing around in my head all day and didn't make any sense. IP doesn't handle transport. Why would anyone even think of IP as connectionless or connection oriented? It is technically connectionless because it offers no reliability, no guaranteed delivery. But so is my toaster. My toaster offers no guaranteed delivery, so why not call a toaster connectionless too?
In the end, I found out it's just some stupid title that someone somewhere attached to IP and it stuck, and now everyone calls IP connectionless and has no good reason for it.
Calling IP connectionless implies there is another layer 3 protocol that is connection oriented, but as far as I know, there isn't and it is just plain stupid to specify that IP is connectionless. MAC is connectionless. LLC is connectionless. But that is useless, technically correct info.

Failure scenarios for reliable UDP?

What would be a good list of failure scenarios for testing a reliable UDP layer? I have thought of the cases below:
Drop Data packets
Drop ACK, NAK Packets
Send packets out of sequence.
Drop initial handshaking packets
Drop close / shutdown packets
Duplicate packets
Please help in identifying other cases that reliable UDP needs to handle?
The list you've given sounds pretty good. Also think about:
Very delayed packets (where most packets come through fine, but one or two are delayed by several minutes);
Very delayed duplicates (where the original came through quickly, but the duplicate arrived after several minutes delay);
Silent dropping of all packets above a certain size (both unidirectional and bidirectional cases);
Highly variable delays;
Sequence number wrapping tests.
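For the sequence number wrapping case, the comparison under test usually looks something like the sketch below, in the style of serial number arithmetic (RFC 1982). The 16-bit width and the function name are assumptions for the example; use whatever width the layer under test actually has.

```python
SEQ_BITS = 16                  # assumed width of the sequence number field
SEQ_MOD = 1 << SEQ_BITS
HALF = SEQ_MOD // 2


def seq_after(a, b):
    """True if sequence number a is newer than b, allowing for wraparound."""
    return a != b and ((a - b) % SEQ_MOD) < HALF


print(seq_after(1, 0))         # True
print(seq_after(0, 65535))     # True  (0 comes right after the wrap)
print(seq_after(65535, 0))     # False
print(seq_after(5, 65530))     # True  (a small step across the wrap)
```

A naive `a > b` comparison gets the second and fourth cases wrong, which is exactly what a wrapping test should catch. Two numbers exactly half the space apart are ambiguous; this sketch treats neither as newer.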
Have you tried intentionally corrupting packets in transit?
Also, have you considered a scenario where only one-way communication is possible? In this case, the sending host thinks that the send failed, but the receiving end successfully processes the message. For instance:
host A sends a message to host B
B successfully receives message and replies with ACK
ACK gets dropped in the network
A waits for timeout and re-sends message (repeats steps 1-3)
host A exceeds retry count and thinks the send failed, but host B has in fact processed the message
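The lost-ACK scenario above can be reproduced in a small in-memory simulation, with no real sockets involved. All names here are hypothetical, and `ack_lost` is simply a script telling the fake network which ACKs to drop.

```python
class Receiver:
    """Host B: processes each message id at most once, but always re-ACKs."""

    def __init__(self):
        self.seen = set()        # message ids already processed
        self.processed = []      # payloads handed to the application

    def on_message(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.processed.append(payload)
        return ("ACK", msg_id)   # duplicates are acknowledged again, not reprocessed


def send_with_retries(receiver, msg_id, payload, ack_lost, max_retries=3):
    """Host A: ack_lost[i] says whether the ACK for attempt i is dropped."""
    for attempt in range(max_retries):
        ack = receiver.on_message(msg_id, payload)   # the message itself gets through
        if ack == ("ACK", msg_id) and not ack_lost[attempt]:
            return True          # A saw the ACK
    return False                 # A gives up and reports a failed send


b = Receiver()
ok = send_with_retries(b, msg_id=1, payload="debit 10", ack_lost=[True, True, True])
print(ok)            # False: A believes the send failed
print(b.processed)   # ['debit 10']: B processed it, and only once
```

A test for this case should check both halves: the receiver does not process the retransmitted duplicates, and the application on the sending side copes with a "failed" send that actually succeeded.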
I thought UDP is a connectionless and unreliable protocol and that it does not require any specific transport handshake between hosts. And hence there is no such thing as a reliable UDP protocol.

Resources