Checksums on TCP

Isn't TCP responsible for making sure that a stream is delivered intact over the wire, doing whatever becomes necessary as losses and so on occur during a transfer?
Doesn't it do a proper job of that?
Why do higher, application-layer protocols and their applications still perform their own checksums?

While TCP does contain its own checksum, it is only a 16-bit checksum and it is certainly possible for a multi-bit transmission error to slip by the TCP checksum mechanism. This is quite rare, but it is still possible and I have in fact seen it happen (once or twice in a couple of decades).
A robust protocol will want to use a higher-level hash function to assure integrity of transmitted data. Having said that, not many applications that transmit a small amount of data go to this trouble. Bulk transfer applications (such as a package manager or auto-update mechanism) will usually use a cryptographic hash function to increase the assurance of data integrity.
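For example, a bulk downloader can hash the whole file after the TCP transfer completes and compare it against a digest published by the server. A minimal sketch in Java (the file name and the expected digest are placeholders, and the digest must of course come from a trusted source):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    public class VerifyDownload {
        public static void main(String[] args) throws Exception {
            Path file = Path.of("package.tar.gz");   // hypothetical file just fetched over TCP
            String expected = "put-the-published-sha256-hex-digest-here"; // placeholder

            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    sha256.update(buf, 0, n);         // hash the file contents incrementally
                }
            }
            String actual = HexFormat.of().formatHex(sha256.digest()); // HexFormat needs Java 17+
            if (!actual.equalsIgnoreCase(expected)) {
                throw new IllegalStateException("Checksum mismatch: got " + actual);
            }
            System.out.println("Integrity check passed");
        }
    }

A mismatch here catches anything TCP's 16-bit checksum missed, as well as corruption introduced by buggy application code on either end.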

TCP ensures that TCP packets are delivered reliably, using checksums to trap errors introduced during transmission, and retransmitting lost or damaged packets as required. When a packet is transmitted it is retained in a retransmission queue until the peer host acknowledges receipt; if no acknowledgement is received within a certain timeout period then the packet is retransmitted. But the host won't keep retransmitting a packet forever - if a packet repeatedly fails then TCP eventually gives up and closes the connection.
Higher-level protocols assume that TCP works reliably (a fair assumption) and use their own checksums or whatever to check that the higher-level data stream arrived safely. I've written lots of buggy sockets applications that screwed up their own higher-level buffers and mangled the application data stream!
In any production-grade TCP/IP stack with a robust application I think you can be confident that the problem is that your connection is dropping out. Or you might have a buggy application, but I doubt that your fetch/wget is buggy.

How to deal with many incoming UDP packets when the server has only 1 UDP socket?

When a server has only 1 UDP socket, and many clients are sending UDP packets to it, what would be the best approach to handle all of the incoming packets?
I think this can also be a problem with TCP packets, since there's a limited thread count, which cannot cover all client TCP socket receive events.
But things are better in this situation because there's 1 TCP socket per client, and even if the network buffer is full, packet receiving is blocked until the queue has space (let me know if I'm wrong).
UDP packets, however, are discarded when the buffer is full, and there's only 1 socket, so the chances of that happening are higher.
How can I solve this problem? I've searched for a while, but I couldn't get a clear answer. Should I implement my own queueing system? Or just maximize the network buffer size?
There is no way to guarantee you won't drop UDP messages. No matter what you do, if the rate of packets being sent is too large, you will drop some, either on the receiving host or somewhere in the network.
Some things that can help include:
Implementing an internal queue for messages in your Java app, and handing them over to a thread pool to process.
Increasing the kernel's message buffering.
But neither of these can deal with the case where the average message arrival rate is higher than the receiver's ability to process them or the network capacity. This will inevitably lead to lost messages (requests).
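A minimal sketch of the first two suggestions combined: one thread does nothing but drain the socket and hand each datagram to a worker pool (whose task queue serves as the internal queue), and the kernel receive buffer is enlarged. The port number, buffer sizes, and handler are placeholders:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.util.Arrays;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class UdpReceiver {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(8); // workers; the pool's queue buffers bursts
            try (DatagramSocket socket = new DatagramSocket(9999)) { // hypothetical port
                socket.setReceiveBufferSize(4 * 1024 * 1024);        // request a larger kernel buffer
                while (true) {
                    DatagramPacket packet = new DatagramPacket(new byte[2048], 2048); // assumed max message size
                    socket.receive(packet);                          // the only blocking call on this thread
                    byte[] message = Arrays.copyOf(packet.getData(), packet.getLength());
                    pool.execute(() -> handle(message));             // process off the receive thread
                }
            }
        }

        private static void handle(byte[] message) {
            // hypothetical application-specific processing
        }
    }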
I've searched for a while, but I couldn't get a clear answer.
That is because there isn't one! Some problems are fundamentally unsolvable. For others, the best answer depends on factors that are too hard to measure or predict.
(If you want certainty ... don't use networking!)
In the TCP case, what you should do is use a (long-term) socket for each client. Depending on the number of sockets you need to support, you could either:
Dedicate a server-side thread to each socket (and client).
Use java.nio.channels.Selector and a thread pool.
You will still get problems if the rate of requests exceeds your server's ability to process them. However, the TCP connections will ensure that requests are not lost, and that the clients get some "back pressure".
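A rough sketch of the second option: a single selector thread accepts connections and reads, handing the received bytes to a worker pool. The port and the handler are placeholders, and a real server would also need per-connection framing and write handling:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SelectorServer {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(8);
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));            // hypothetical port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();                               // wait until some socket is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();  // new long-term client connection
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(4096);
                        int n = client.read(buf);
                        if (n == -1) {                           // client closed the connection
                            key.cancel();
                            client.close();
                        } else if (n > 0) {
                            buf.flip();
                            byte[] data = new byte[buf.remaining()];
                            buf.get(data);
                            pool.execute(() -> handle(data));    // process off the selector thread
                        }
                    }
                }
            }
        }

        private static void handle(byte[] request) {
            // hypothetical request handling
        }
    }

The thread-per-socket option is simpler to write and debug; the selector approach scales better when you have many mostly-idle connections.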

Why is UDP used for DNS requests and not TCP?

Why is UDP usually used for DNS requests instead of TCP?
I know that we could use TCP, but why is UDP the default protocol? Are there any reasons for that, or is it just a design choice?
UDP is the default protocol because, in most cases (and certainly when DNS was designed), an exchange is a single question and response, each part fitting into a small 512-byte packet, so there is no need to establish a long-running connection; TCP first needs a 3-way handshake before exchanging any data.
Hence in most cases UDP gives better performance, and DNS is time-sensitive.
But then of course UDP is easier to spoof than TCP, and bigger packets can be a problem.
First of all, it is important to note that TCP can also be used for DNS. In practice, most DNS servers support both UDP and TCP, though TCP is rarely used for simple DNS queries and is reserved mainly for operations like zone transfers.
The biggest advantage to using UDP is the performance boost. There are several reasons why TCP DNS queries are slower:
TCP requires a connection to be established before each request, then subsequently torn down. So if it takes 20ms for a message to travel from your computer to the server and back (a time known as RTT - Round Trip Time), then a TCP query would require 3xRTT (60ms) to be fully processed - 20ms for opening the connection, 20 more ms for the query, and another 20ms to tear it down. UDP would only require one RTT, so 20ms.
Due to TCP's connection-oriented nature, more resources are needed per connection to store and manage TCP's state. TCP requires both the client and server to have a separate socket for each and every connection.
UDP makes it easy to deploy anycast DNS servers. In anycast, several servers (possibly around the world) share a single IP address - e.g. 1.1.1.1. When you send a query to 1.1.1.1, one of these servers (probably one of the closest ones geographically) gets it. Since TCP involves multiple packets sent back and forth, reliable anycast is harder to achieve since you need to make sure that the packets always reach the same exact server. Otherwise, they might end up reaching different servers which won't know what to do with them.
Lower data overhead - a UDP header is tiny compared to the header TCP sends for every segment. Using UDP means sending fewer bytes.
Simplicity - UDP is a lot simpler than TCP. TCP is optimized for long data transfers and has a bunch of complex mechanisms such as flow control and congestion control for optimizing the rate of data flow. DNS doesn't need any of these mechanisms for simple queries since the typical amount of sent data is tiny.
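To make the single question/response exchange concrete, here is a rough sketch of an A-record lookup sent as one UDP datagram, built by hand from the DNS wire format. The resolver address and the queried name are placeholders, and a real client would also randomize the ID, retry on timeout, and fall back to TCP when the reply is truncated:

    import java.io.ByteArrayOutputStream;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    public class DnsUdpQuery {
        public static void main(String[] args) throws Exception {
            String name = "example.com";                              // hypothetical name to resolve
            InetAddress resolver = InetAddress.getByName("8.8.8.8");  // hypothetical resolver

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(new byte[] { 0x12, 0x34,                 // ID (fixed here; should be random)
                                   0x01, 0x00,                 // flags: standard query, recursion desired
                                   0x00, 0x01,                 // QDCOUNT = 1
                                   0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }); // AN/NS/ARCOUNT = 0
            for (String label : name.split("\\.")) {           // QNAME as length-prefixed labels
                out.write(label.length());
                out.write(label.getBytes(StandardCharsets.US_ASCII));
            }
            out.write(0);                                       // root label terminates the name
            out.write(new byte[] { 0x00, 0x01, 0x00, 0x01 });   // QTYPE = A, QCLASS = IN

            byte[] query = out.toByteArray();
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setSoTimeout(3000);                      // DNS clients simply retry on timeout
                socket.send(new DatagramPacket(query, query.length, resolver, 53));

                byte[] buf = new byte[512];                     // classic DNS-over-UDP message limit
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                socket.receive(reply);                          // the whole exchange is one round trip
                System.out.println("Got " + reply.getLength() + " bytes back");
            }
        }
    }

Everything fits in one datagram each way, which is exactly why the 1×RTT argument above holds.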

reading tcp packets out of order

Web games are forced to use TCP.
But with real-time constraints, TCP's head-of-line blocking behavior is absurd when you don't care about old packets.
While I'm aware that there's definitely nothing we can do on the client side, I'm wondering if there is a solution on the server side.
Indeed, on the server you get packets in order and wait miserably if misbehaving packet t+42 has been lost, even though packets t+43 and t+44 may already be sitting nicely in your receive buffer.
Since we are talking about local data, it should technically be possible to retrieve it.
So does anyone have an idea on how to perform that feat?
How to save this precious data from these pesky kernel space daemons?
TCP guarantees that the data arrives in order and re-transmits lost packets. TCP Man Page
Given this, there is only one way to achieve the results you want under your stated constraints, and that is to hack the TCP protocol at the server side (assuming you cannot control the client WebSocket behavior). The simplest approach, relatively speaking, would be to open a raw socket, implement your own minimal TCP handshake (SYN-ACK when the client SYNs), then read and write from the socket, managing your own TCP headers. Your custom implementation would need to keep track of received sequence numbers and acknowledge all of those you want the client to forget about.
You might be able to reduce effort by making this program a proxy to your original.
Example of TCP raw socket here.

Why is UDP + a software reliable ordering system faster than TCP?

Some games today use a network system that transmits messages over UDP, and ensures that the messages are reliable and ordered.
For example, RakNet is a popular game network engine. It uses only UDP for its connections, and has a whole system to ensure that packets can be reliable and ordered if you so choose.
My basic question is, what's up with that? Isn't TCP the same thing as ordered, reliable UDP? What makes it so much slower that people have to basically reinvent the wheel?
General/Specialisation
TCP is a general-purpose reliable system.
UDP + whatever is a special-purpose reliable system.
Specialized things are usually better than general-purpose things at the thing they are specialized for.
Stream / Message
TCP is stream-based
UDP is message-based
Sending discrete gaming information usually maps better to a message-based paradigm. Sending it through a stream is possible but horribly inefficient. If you want to reliably send a huge amount of data (file transfer), TCP is quite effective. That's why BitTorrent uses UDP for control messages and TCP for sending data.
We switched from reliable to unreliable in League of Legends about a year ago because of several advantages, which have since proven to be true:
1) Old information becomes irrelevant. If I send a health packet and it doesn't arrive, I don't want to have to wait for that same health packet to be resent when I know it has changed (see the sketch after this list).
2) Order is sometimes not necessary. If I'm sending different messages to different systems, it may not be necessary to get those messages in order. I don't force the client to wait for in-order messages.
3) Unreliable doesn't get backed up with messages, i.e. waiting for acknowledgements, which means you can resolve loss spikes much more quickly.
4) You can control resends more efficiently when necessary, such as repacking something that didn't send into another packet. (TCP does repack, but you can do it more efficiently with knowledge of how your program works.)
5) Flow control of messages, such as throwing away messages that are less relevant when the network suddenly spikes. The network system can choose not to resend less relevant messages when you have a loss spike. With TCP you'd still have a queue of messages trying to resend that may be lower priority.
6) Smaller packet header... don't really need to say much about that.
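A tiny sketch of point 1, assuming each update datagram carries a sequence number and a health value: the receiver keeps only the newest state and silently drops anything older, so a lost packet never has to be waited for.

    // Keep only the newest health update; stale or duplicate datagrams are ignored.
    // Assumes each update carries a monotonically increasing sequence number.
    class HealthState {
        private long latestSeq = -1;
        private int health;

        synchronized void onUpdate(long seq, int newHealth) {
            if (seq <= latestSeq) {
                return;          // old news: a newer update already arrived, so drop this one
            }
            latestSeq = seq;
            health = newHealth;
        }

        synchronized int current() {
            return health;
        }
    }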
There's much more of a difference between UDP and TCP than just reliability and sequencing:
At the heart of the matter is the fact that UDP is connectionless while TCP is connected. This simple difference leads to a host of other differences that I'm not going to be able to reasonably summarize here. You can read the analysis below for much more detail.
TCP - UDP Comparative Analysis
The answer, in my opinion, is two words: "congestion control".
TCP goes to great lengths to manage the bandwidth of the path - to use as much of it as possible, but to ensure that there is space for other applications. This is a very hard task, and inherently it is not possible to use 100% of the bandwidth 100% of the time.
With UDP, on the other hand, one can make their own protocol to send packets onto the wire as fast as they want - this makes the protocol very unfriendly to other applications, but can gain more "performance" short-term. On the other hand, if the conditions are right, this kind of protocol is quite likely to contribute to congestion collapse.
TCP is a stream-oriented protocol, whereas UDP is a message-oriented protocol. Hence TCP does more than just reliability and ordering. See this post for more details. Basically, the RakNet developers added the reliability and ordering while still keeping it as a message-oriented protocol, and so the result was more lightweight than TCP (which has to do more).
This little article is old, but it's still pretty true when it comes to games. It explains the two protocols, and the havoc these folks went through trying to develop a multiplayer internet game, "X-Wing vs Tie Fighter".
Lessons Learned (The Internet Sucks)
There is one caveat to this, though. I run/develop a multiplayer game, and I've used both. UDP was much better for my app, but a lot of people couldn't play with UDP; routers and such blocked the connections. So I changed to the "reliable" TCP. Well... reliable? I don't think so. You send a packet, no errors; you send another and it crashes (exception) in the middle of the packet. Now which packets made it? So you end up writing a reliable protocol ON TOP OF TCP, to simulate UDP - but continuously establishing a new connection when it crashes. Talk about inefficient.
UDP + Stop and wait ARQ = good
UDP + Sliding Window Protocol = better
TCP + Sliding Window Protocol with reconnection? = Worthless bulkware. (IMHO)
The other side effect is multi-threaded applications. TCP works well for a chat room type thing, since each room can be its own thread. A room can hold 60-100 people and it runs fine, as the room thread contains the sockets for each participant.
UDP on the other hand is best served (IMO) by one thread, but when you get the packet, you have to parse it to figure out who it came from (via info sent or RemoteEndPoint), then pass that data to the chatroom thread in a threadsafe manner.
Actually, you have to do the same with TCP, but only on connect.
Last point. Remember that TCP will just error out and kill the connection at any time, but you can reconnect in about .5 seconds and send the same information. Most bizarre thing I've ever worked with.
UDP has lower reliability; you can give it more reliability by making it send a message and wait for a response, and if no response comes, resend the message.
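That is essentially the stop-and-wait ARQ mentioned above. A bare-bones sender sketch, assuming a hypothetical receiver that echoes back the 4-byte sequence number of every message it gets; the peer address, port, and timeout are placeholders:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.net.SocketTimeoutException;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class StopAndWaitSender {
        public static void main(String[] args) throws Exception {
            InetAddress peer = InetAddress.getByName("127.0.0.1");  // hypothetical peer
            int port = 5000;                                        // hypothetical port
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setSoTimeout(200);                           // retransmission timeout in ms
                sendReliably(socket, peer, port, 1, "hello".getBytes(StandardCharsets.US_ASCII));
            }
        }

        static void sendReliably(DatagramSocket socket, InetAddress peer, int port,
                                 int seq, byte[] payload) throws Exception {
            byte[] msg = ByteBuffer.allocate(4 + payload.length).putInt(seq).put(payload).array();
            DatagramPacket out = new DatagramPacket(msg, msg.length, peer, port);
            byte[] ackBuf = new byte[4];
            DatagramPacket ack = new DatagramPacket(ackBuf, ackBuf.length);
            while (true) {
                socket.send(out);                                   // (re)send the message
                try {
                    socket.receive(ack);                            // wait for the echoed sequence number
                    if (ByteBuffer.wrap(ackBuf).getInt() == seq) {
                        return;                                     // acknowledged; the next message may go
                    }
                } catch (SocketTimeoutException e) {
                    // no ack in time: fall through and resend
                }
            }
        }
    }

A sliding window improves on this by keeping several messages in flight at once instead of exactly one.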

Why Does RTP use UDP instead of TCP?

I wanted to know why UDP is used in RTP rather than TCP. The major VoIP tools use only UDP, as I found when digging through some open-source VoIP software.
As DJ pointed out, TCP is about getting a reliable data stream, and will slow down transmission, and re-transmit corrupted packets, in order to achieve that.
UDP does not care about reliability of the communication, and will not slow down or re-transmit data.
If your application needs a reliable data stream, for example, to retrieve a file from a webserver, you choose TCP.
If your application doesn't care about corrupted or lost packets, and you don't need to incur the additional overhead to provide the additional reliability, you can choose UDP instead.
VOIP is not significantly improved by reliable packet transmission, and in fact, in some cases things in TCP like retransmission and exponential backoff can actually hurt VOIP quality. Therefore, UDP was a better choice.
A lot of good answers have been given, but I'd like to point one thing out explicitly:
Basically, a complete data stream is a nice thing to have for real-time audio/video, but it's not strictly necessary (as others have pointed out):
The important fact is that some data that arrives too late is worthless. What good is the missing data for a frame that should have been displayed a second ago?
If you were to use TCP (which also guarantees the correct order of all data), then you wouldn't be able to get to the more up-to-date data until the old one is transmitted correctly. This is doubly bad: you have to wait for the re-transmission of the old data and the new data (which is now delayed) will probably be just as worthless.
So RTP does some kind of best-effort transmission in that it tries to transfer all available data in time, but doesn't attempt to re-transmit data that was lost/corrupted during the transfer (*). It just goes on with life and hopes that the more important current data gets there correctly.
(*) actually I don't know the specifics of RTP. Maybe it does try to re-transmit, but if it does then it won't be as aggressive as TCP is (which won't ever accept any lost data).
The others are correct, however they don't really tell you the REAL reason why. Saua kind of hints at it, but here's a more complete answer.
Audio and Video is real-time. If you are listening to a radio, or watching TV, and the signal is interrupted, it doesn't pick up where you left off.. you're just "observing" the signal as it streams, and if you can't observe it at any given time, you lose it.
The reason is simple: delay. VOIP tries very hard to minimize the amount of delay from the time someone speaks into one end until you get it on your end, and your response back. Otherwise, as errors occurred, the amount of delay between when the person spoke and when the signal was received would continuously grow until it became useless.
Remember, each delay from a retransmission has to be replayed, and that causes further data to be delayed, then another error causes an even greater delay. The only workable solution is to simply drop any data that can't be displayed in real-time.
A 1 second delay from retransmission would mean it would now be 1 second from the time I said something until you heard it. A second 1 second delay now means it's 2 seconds from the time I say something until you hear it. This is cumulative because data is played back at the same rate at which it is spoken, and so on...
RTP could be connection oriented, but then it would have to drop (or skip) data to keep up with retransmission errors anyways, so why bother with the extra overhead?
Technically RTP packets can be interleaved over a TCP connection. There are lots of great answers given here. Two additional minor points:
RFC 4588 describes how one could use retransmission with RTP data. Most clients that receive RTP streams employ a buffer to account for jitter in the network that is typically 1-5 seconds long and which means there is time available for a retransmit to receive the desired data.
RTP traffic can be interleaved over a TCP connection. In practice when this is done, the difference between Interleaved RTP (i.e. over TCP) and RTP sent over UDP is how these two perform over a lossy network with insufficient bandwidth available for the user. The Interleaved TCP stream will end up being jerky as the player continually waits in a buffering state for packets to arrive. Depending on the player it may jump ahead to catch up. With an RTP connection you will get artifacts (smearing/tearing) in the video.
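A very rough sketch of such a jitter buffer, assuming packets have already been parsed into a sequence number and payload, and that a playback clock polls the buffer at a fixed rate (sequence-number wraparound and adaptive delay are ignored for brevity):

    import java.util.TreeMap;

    // Receive-side jitter buffer: packets are reordered by sequence number and released to the
    // decoder only once a small playout delay's worth of packets has accumulated.
    class JitterBuffer {
        private final TreeMap<Integer, byte[]> buffered = new TreeMap<>(); // seq -> payload
        private final int playoutDepth;   // how many packets of slack to keep (e.g. 1-5 seconds' worth)
        private int nextToPlay = -1;

        JitterBuffer(int playoutDepth) {
            this.playoutDepth = playoutDepth;
        }

        synchronized void onPacket(int seq, byte[] payload) {
            if (nextToPlay == -1) {
                nextToPlay = seq;         // the first packet defines where playback starts
            }
            if (seq >= nextToPlay) {
                buffered.put(seq, payload); // packets older than the playout point are simply dropped
            }
        }

        // Called by the playback clock; returns null while still buffering or when the packet is missing.
        synchronized byte[] nextFrame() {
            if (buffered.size() < playoutDepth) {
                return null;              // still building up slack for reordering/retransmits
            }
            byte[] frame = buffered.remove(nextToPlay);
            nextToPlay++;                 // advance even if the frame never arrived
            return frame;                 // caller conceals/interpolates when this is null
        }
    }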
UDP is often used for various types of realtime traffic that doesn't need strict ordering to be useful. This is because TCP enforces an ordering before passing data to an application (by default, you can get around this by setting the URG pointer, but no one seems to ever do this) and that can be highly undesirable in an environment where you'd rather get current realtime data than get old data reliably.
RTP is fairly insensitive to packet loss, so it doesn't require the reliability of TCP.
UDP has less overhead for headers so that one packet can carry more data, so the network bandwidth is utilized more efficiently.
UDP provides fast data transmission also.
So UDP is the obvious choice in cases such as this.
Besides all the other nice and correct answers, this article gives a good understanding of the differences between TCP and UDP.
The Real-time Transport Protocol is a network protocol used to deliver streaming audio and video media over the internet, thereby enabling the Voice Over Internet Protocol (VoIP).
RTP is generally used with a signaling protocol, such as SIP, which sets up connections across the network. RTP applications can use the Transmission Control Protocol (TCP), but most use the User Datagram protocol (UDP) instead because UDP allows for faster delivery of data.
UDP is used wherever data is sent that does not need to arrive exactly intact at the target, or where no stable connection is needed.
TCP is used if data needs to be received exactly, bit for bit, with no loss of bits.
For video and sound streaming, a few bits lost along the way do not affect the result in any way worth mentioning - a few failed pixels in one picture of a stream are nothing that affects a user; on DVDs the tolerated lost-bit rate is higher.
just a remark:
Each packet sent in an RTP stream is given a number one higher than its predecessor. This allows the destination to determine if any packets are missing.
If a packet is missing, the best action for the destination to take is to approximate the missing value by interpolation.
Retransmission is not a practical option, since the retransmitted packet would be too late to be useful.
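A small sketch of that gap detection, assuming the 16-bit RTP sequence number has already been extracted from each packet; the only subtlety is the wrap from 65535 back to 0:

    // Count how many packets went missing between two consecutive arrivals, using 16-bit
    // arithmetic so that 65535 -> 0 is treated as "in order" rather than a huge gap.
    class SequenceTracker {
        private int lastSeq = -1;

        // Returns the number of packets lost before this one (0 if it is the expected next packet).
        int onPacket(int seq) {
            if (lastSeq == -1) {
                lastSeq = seq;
                return 0;
            }
            int expected = (lastSeq + 1) & 0xFFFF;   // next sequence number, modulo 2^16
            int gap = (seq - expected) & 0xFFFF;     // how far past the expected packet we landed
            if (gap >= 0x8000) {
                return 0;                            // old/reordered packet; don't move the window back
            }
            lastSeq = seq;
            return gap;                              // number of packets to conceal or interpolate
        }
    }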
I'd like to add quickly to what Matt H said in response to Stobor's answer. Matt H mentioned that RTP over UDP packets can be checksum'ed so that if they are corrupted, they will get resent. This is actually an optional feature on most PBXs. In Asterisk, for example, you can enable / disable checksums on your RTP over UDP traffic in the rtp.conf configuration file with the following line:
rtpchecksums=yes ; or no if you prefer
Cheers!
