TCP vs UDP for real-time chat recommendation-engine?

TCP vs UDP for real-time chat recommendation-engine? - networking

I am building a chat application, where each keystroke presses of the user are sent to the server. At the server, a recommendation engine which is based on nlp generates recommendations based on the context of the typed message at that point of time.
For large scale deployment, which connection type would be preferable between TCP and UDP. UDP is fast but unreliable, whereas TCP, being reliable may be slow in real-time. For example:A user types the words "Hey, lets watch" and quickly clears the text-box,the recommendation of a movie should not be generated after he clears the text-box.
If the server has a recommendation, it should be guaranteed to deliver the recommendation back to the client.
The aim is to get real-time recommendation with low latency. Which type would be more preferable?

TCP and UDP are almost identical if the size of the data being sent at a time is less than the maximum payload of a single frame.
In that case UDP will be more "reliable" in terms of real-time behaviour since it is more within your hands how the data is processed. The downside of course is that you have to care of certain things yourself which TCP will give you for free.
With TCP on the other hand the TCP layer of the protocol stack can make a mess out of your real-time requirements and you don't even have a chance doing anything about it. Ever thought about re-transmitts (about +200ms transmittion time), nagle algorithm (small packets are delayed for up to 200ms), delayed TCP ACKs (may cause re-transmitts on some stacks)? And there is a lot more in stock for you if you have strickt real-tme requirements.
I'm working on a project which has a 20ms timeframe and transmitts a lot of data in that time using TCP. Even though we have a star architecture and real-time operating systems it is a hell to get this working reliable (well lots of the effects are due to one of our used Ethernet chips the smsc91c111).
Concluding there is no "best way" doing things like that since neither UDP nor TCP are real-time protocols. But since it is fairly easy to switch between them I recommend simply to test it and choose the protocol which works best.

Related

Tcp congestion avoidance at every node

Is it possible for each node connected to the same router to implement different congestion avoidance technique? Also, is it possible to completely disable congestion avoidance in a node connected to a router. Thanks.

The answer depends on your definition of "possible".
Internet hosts should adhere to internet standards. According to these documents TCP should implement congestion control algorithm that is "reno-friendly", that is can coexist with Reno (see RFC5681). So, if you need a TCP implementation, that adheres to these standards, then the answer is no.
Enforcing this, is however another issue, which as far as I know does not really have a solution. So, can you still implement whatever congestion control or no congestion control at all, and still connect to Internet, the yes.
Is it actually done? Yes As of now, Linux hosts use TCP Cubic, and Windows uses another congestion control mechanism, whose name I don't remember. They are both Reno friendly and coexist with each other, but they are different and they are different from Reno. Recentrly, Google deployed BBR, which may or may not be Reno friendly. Moreover, realtime multimedia streams (e.g., voice or video conferences) also should use some kind of congestion control, so they contribute to the variety as well.
Will router care? Not Really. Router does not care if the attached hosts implement congestion control or not. A simple router will do exactly the same thing. It will get incomming packets, then either send them, if outgoing interface is free, queue them, if the interface is currently busy transmitting other packets and it has space in a queue for this interface, or drop packets, if the queue is full. More complicated routers can utilize things like active queue management schemes, or quality of service with rate control. This will affect how router handles packets, but it won't affect the functionality of the router. It has to take misbehaving hosts into account.
What will be affected? Applications. Application performance for several flows sharing the same bottleneck will be affected, if the hosts implement different congestion control mechanisms or no congestion control at all. How, actually depends on bottleneck bandwidth, capabilities of the routers, and traffic patterns of the applications. It is not possible to say how. However, there is definitelly a possibility that network will not be able to transmit useful traffic (this is known as congestion collapse). One other important thing that will most likely be affected is fairness, which more or less quantifies how equally several flows sharing the same bottleneck will share available bandwidth. A flow that does not implement congestion control can highjack all available bandwidth. The same applies with flows that use more aggressive congestion control than TCP Reno and flows that don't. So, it is not nice, not to implement congestion control. Of course the router can actually do something about it, but it requires pretty expensive per-flow sheduling (you can search for fair-queueing or flow-queueing), and routers usually do not do this.
References:
requirements for internet hosts: RFC1122
latest congestion control algorithm specification
RFC5681.

Multiple IOT devices communicating to a server Asynchronously via TCP

I want multiple IoT devices (Say 50) communicating to a server directly asynchronously via TCP. Assume all of them have a heartbeat pulse every 30 seconds and may drop off and reconnect at variable times.
Can anyone advice me the best way to make sure no data is dropped or blocked when multiple devices are communicating simultaneously?

TCP by itself ensures no data loss during the communication between a client and a server. It does that by the use of sequence numbers and ACK messages.
Technically, before the actual data transfer happens, a TCP connection is created between the client (which can be an IoT device, or any other device) and the server. Then, the data is split into multiple packets and sent over the network through that connection. All TCP-related mechanisms like flow-control, error-detection, congestion-detection, and many others, take place once the data starts to flow.
The wiki page for TCP is a pretty good start if you want to learn more about how it works.
Apart from that, as long as your server has enough capacity to support the flow of requests coming from the devices, then everything should work (at least in theory).

I don't think you are asking the right question. There is no way to make sure that no data is dropped or blocked. Networks do not always work (that is why the word work is in network, to convince you otherwise ).
The right question is: how do I make my distributed system as available and reliable as possible? The answer involves viewing interruption and congestion as part of the normal operation, and build your software appropriately.
There is a timeless usenix/acm/? paper from the late 70s early 80s that invigorated the notion that end-to-end protocols are much more effective then over-featured middle to middle protocols; and most guarantees of middle to middle amount to best effort. If you rely upon those guarantees, you are bound to fail. Sorry, cannot find the reference right now, but it is widely cited.

Are there any fault-tolerant protocols based on unidirectional UDP?

I have encountered a situation where I need to replicate a small (<10 MB) database over unidirectional UDP. The physical ethernet cable prevents data flowing the other direction. Updates must be replicated within a few seconds with very low risk of losing data, but there is no way for the receiving side to ask for retransmission if it detects a failure, e.g. a packet loss, as the data-link is unidirectional.
A crude solution to lower the probability of lost updates would be to send each update multiple times, but that seems rather inefficient if the update frequency is high. A more sophisticated solution would be to use error-correcting codes to recover lost or corrupted packets.
Are there any implementations of such fault-tolerant unidirectional protocol?

TCP vs UDP for Microprocessor Communication

I am using TCP for communicating with an arduino (just open a socket and wait for a connection)using an ethernet shield, While watching/reading about various other project that use some sort of network interface for communication they all seem to use UDP instead of TCP for communication. What I was wondering is what would be my gain if I use UDP instead?

A UDP stack is considerably simpler than a TCP stack. You can easily write a UDP stack from scratch on your own, TCP is a bit harder, doable but harder. TCP has built in retries and other things so you dont lose reliability with UDP directly, it is what you do with it that could compare. UDP is significantly faster than TCP and is why it is or was used for video and various things back in the day. Also things like video could stand to lose a packet here and there and didnt care. For embedded UDP is quite nice for being small, fast, etc. If you are using someone elses library then UDP is likely not going to save you much on memory/flash resources, it will still be a bit faster. It is when you implement your own UDP that you save quite a bit on memory, because you can cut corners. You can do things like only implement arp and udp and nothing else (although ping is useful but somehow painful), and you can cut corners on arp/rarp depending on what you need to do with this thing. You can implement support only for the packet size you are interested in. Numbering your packets and having the requesting side send two or three of everything and responding to every request can greatly reduce the lost packet problem. Keeping the packet size very small helps both the embedded resource problem and avoids any mtu or other problems along the way. For simplicity you can even force a specific packet length.
I always ask the question the other way, what would I stand to gain by using TCP. There are times when it is useful, embedded, desktop or server though I still ask that question every time and have to justify the use of TCP over UDP otherwise I wont use it.

You gain multicasting, but lose reliability.

You gain code space, data memory, and determinism.
A fair amount of memory is required to reassemble a TCP stream, unless you want to NAK every packet that's not in-order. They are never guaranteed to come in order....
An asynchronous command-response protocol with timeouts, where all commands and responses fit into a single UDP packet, and commands are idempotent (can be applied many times and maintain the correct result) is a pretty robust protocol.

UDP vs TCP, how much faster is it? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
Improve this question
For general protocol message exchange, which can tolerate some packet loss. How much more efficient is UDP over TCP?

People say that the major thing TCP gives you is reliability. But that's not really true. The most important thing TCP gives you is congestion control: you can run 100 TCP connections across a DSL link all going at max speed, and all 100 connections will be productive, because they all "sense" the available bandwidth. Try that with 100 different UDP applications, all pushing packets as fast as they can go, and see how well things work out for you.
On a larger scale, this TCP behavior is what keeps the Internet from locking up into "congestion collapse".
Things that tend to push applications towards UDP:
Group delivery semantics: it's possible to do reliable delivery to a group of people much more efficiently than TCP's point-to-point acknowledgement.
Out-of-order delivery: in lots of applications, as long as you get all the data, you don't care what order it arrives in; you can reduce app-level latency by accepting an out-of-order block.
Unfriendliness: on a LAN party, you may not care if your web browser functions nicely as long as you're blitting updates to the network as fast as you possibly can.
But even if you care about performance, you probably don't want to go with UDP:
You're on the hook for reliability now, and a lot of the things you might do to implement reliability can end up being slower than what TCP already does.
Now you're network-unfriendly, which can cause problems in shared environments.
Most importantly, firewalls will block you.
You can potentially overcome some TCP performance and latency issues by "trunking" multiple TCP connections together; iSCSI does this to get around congestion control on local area networks, but you can also do it to create a low-latency "urgent" message channel (TCP's "URGENT" behavior is totally broken).

In some applications TCP is faster (better throughput) than UDP.
This is the case when doing lots of small writes relative to the MTU size. For example, I read an experiment in which a stream of 300 byte packets was being sent over Ethernet (1500 byte MTU) and TCP was 50% faster than UDP.
The reason is because TCP will try and buffer the data and fill a full network segment thus making more efficient use of the available bandwidth.
UDP on the other hand puts the packet on the wire immediately thus congesting the network with lots of small packets.
You probably shouldn't use UDP unless you have a very specific reason for doing so. Especially since you can give TCP the same sort of latency as UDP by disabling the Nagle algorithm (for example if you're transmitting real-time sensor data and you're not worried about congesting the network with lot's of small packets).

UDP is faster than TCP, and the simple reason is because its non-existent acknowledge packet (ACK) that permits a continuous packet stream, instead of TCP that acknowledges a set of packets, calculated by using the TCP window size and round-trip time (RTT).
For more information, I recommend the simple, but very comprehensible Skullbox explanation (TCP vs. UDP)

with loss tolerant
Do you mean "with loss tolerance" ?
Basically, UDP is not "loss tolerant". You can send 100 packets to someone, and they might only get 95 of those packets, and some might be in the wrong order.
For things like video streaming, and multiplayer gaming, where it is better to miss a packet than to delay all the other packets behind it, this is the obvious choice
For most other things though, a missing or 'rearranged' packet is critical. You'd have to write some extra code to run on top of UDP to retry if things got missed, and enforce correct order. This would add a small bit of overhead in certain places.
Thankfully, some very very smart people have done this, and they called it TCP.
Think of it this way: If a packet goes missing, would you rather just get the next packet as quickly as possible and continue (use UDP), or do you actually need that missing data (use TCP). The overhead won't matter unless you're in a really edge-case scenario.

When speaking of "what is faster" - there are at least two very different aspects: throughput and latency.
If speaking about throughput - TCP's flow control (as mentioned in other answers), is extremely important and doing anything comparable over UDP, while certainly possible, would be a Big Headache(tm). As a result - using UDP when you need throughput, rarely qualifies as a good idea (unless you want to get an unfair advantage over TCP).
However, if speaking about latencies - the whole thing is completely different. While in the absence of packet loss TCP and UDP behave extremely similar (any differences, if any, being marginal) - after the packet is lost, the whole pattern changes drastically.
After any packet loss, TCP will wait for retransmit for at least 200ms (1sec per paragraph 2.4 of RFC6298, but practical modern implementations tend to reduce it to 200ms). Moreover, with TCP, even those packets which did reach destination host - will not be delivered to your app until the missing packet is received (i.e., the whole communication is delayed by ~200ms) - BTW, this effect, known as Head-of-Line Blocking, is inherent to all reliable ordered streams, whether TCP or reliable+ordered UDP. To make things even worse - if the retransmitted packet is also lost, then we'll be speaking about delay of ~600ms (due to so-called exponential backoff, 1st retransmit is 200ms, and second one is 200*2=400ms). If our channel has 1% packet loss (which is not bad by today's standards), and we have a game with 20 updates per second - such 600ms delays will occur on average every 8 minutes. And as 600ms is more than enough to get you killed in a fast-paced game - well, it is pretty bad for gameplay. These effects are exactly why gamedevs often prefer UDP over TCP.
However, when using UDP to reduce latencies - it is important to realize that merely "using UDP" is not sufficient to get substantial latency improvement, it is all about HOW you're using UDP. In particular, while RUDP libraries usually avoid that "exponential backoff" and use shorter retransmit times - if they are used as a "reliable ordered" stream, they still have to suffer from Head-of-Line Blocking (so in case of a double packet loss, instead of that 600ms we'll get about 1.5*2*RTT - or for a pretty good 80ms RTT, it is a ~250ms delay, which is an improvement, but it is still possible to do better). On the other hand, if using techniques discussed in http://gafferongames.com/networked-physics/snapshot-compression/ and/or http://ithare.com/udp-from-mog-perspective/#low-latency-compression , it IS possible to eliminate Head-of-Line blocking entirely (so for a double-packet loss for a game with 20 updates/second, the delay will be 100ms regardless of RTT).
And as a side note - if you happen to have access only to TCP but no UDP (such as in browser, or if your client is behind one of 6-9% of ugly firewalls blocking UDP) - there seems to be a way to implement UDP-over-TCP without incurring too much latencies, see here: http://ithare.com/almost-zero-additional-latency-udp-over-tcp/ (make sure to read comments too(!)).

Which protocol performs better (in terms of throughput) - UDP or TCP - really depends on the network characteristics and the network traffic. Robert S. Barnes, for example, points out a scenario where TCP performs better (small-sized writes). Now, consider a scenario in which the network is congested and has both TCP and UDP traffic. Senders in the network that are using TCP, will sense the 'congestion' and cut down on their sending rates. However, UDP doesn't have any congestion avoidance or congestion control mechanisms, and senders using UDP would continue to pump in data at the same rate. Gradually, TCP senders would reduce their sending rates to bare minimum and if UDP senders have enough data to be sent over the network, they would hog up the majority of bandwidth available. So, in such a case, UDP senders will have greater throughput, as they get the bigger pie of the network bandwidth. In fact, this is an active research topic - How to improve TCP throughput in presence of UDP traffic. One way, that I know of, using which TCP applications can improve throughput is by opening multiple TCP connections. That way, even though, each TCP connection's throughput might be limited, the sum total of the throughput of all TCP connections may be greater than the throughput for an application using UDP.

Each TCP connection requires an initial handshake before data is transmitted. Also, the TCP header contains a lot of overhead intended for different signals and message delivery detection. For a message exchange, UDP will probably suffice if a small chance of failure is acceptable. If receipt must be verified, TCP is your best option.

I will just make things clear. TCP/UDP are two cars are that being driven on the road. suppose that traffic signs & obstacles are Errors TCP cares for traffic signs, respects everything around. Slow driving because something may happen to the car. While UDP just drives off, full speed no respect to street signs. Nothing, a mad driver. UDP doesn't have error recovery, If there's an obstacle, it will just collide with it then continue. While TCP makes sure that all packets are sent & received perfectly, No errors , so , the car just passes obstacles without colliding. I hope this is a good example for you to understand, Why UDP is preferred in gaming. Gaming needs speed. TCP is preffered in downloads, or downloaded files may be corrupted.

UDP is slightly quicker in my experience, but not by much. The choice shouldn't be made on performance but on the message content and compression techniques.
If it's a protocol with message exchange, I'd suggest that the very slight performance hit you take with TCP is more than worth it. You're given a connection between two end points that will give you everything you need. Don't try and manufacture your own reliable two-way protocol on top of UDP unless you're really, really confident in what you're undertaking.

There has been some work done to allow the programmer to have the benefits of both worlds.
SCTP
It is an independent transport layer protolol, but it can be used as a library providing additional layer over UDP. The basic unit of communication is a message (mapped to one or more UDP packets). There is congestion control built in. The protocol has knobs and twiddles to switch on
in order delivery of messages
automatic retransmission of lost messages, with user defined parameters
if any of this is needed for your particular application.
One issue with this is that the connection establishment is a complicated (and therefore slow process)
Other similar stuff
https://en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol
One more similar proprietary experimental thing
https://en.wikipedia.org/wiki/QUIC
This also tries to improve on the triple way handshake of TCP and change the congestion control to better deal with fast lines.
Update 2022: Quic and HTTP/3
QUIC (mentioned above) has been standardized through RFCs and even became the basis of HTTP/3 since the original answer was written. There are various libraries such as lucas-clemente/quic-go or microsoft/msquic or google/quiche or mozilla/neqo (web-browsers need to be implementing this).
These libraries expose to the programmer reliable TCP-like streams on top the UDP transport. RFC 9221 (An Unreliable Datagram Extension to QUIC) adds working with individual unreliable data packets.

Keep in mind that TCP usually keeps multiple messages on wire. If you want to implement this in UDP you'll have quite a lot of work if you want to do it reliably. Your solution is either going to be less reliable, less fast or an incredible amount of work. There are valid applications of UDP, but if you're asking this question yours probably is not.

If you need to quickly blast a message across the net between two IP's that haven't even talked yet, then a UDP is going to arrive at least 3 times faster, usually 5 times faster.

It is meaningless to talk about TCP or UDP without taking the network condition into account.
If the network between the two point have a very high quality, UDP is absolutely faster than TCP, but in some other case such as the GPRS network, TCP may been faster and more reliability than UDP.

The network setup is crucial for any measurements. It makes a huge difference, if you are communicating via sockets on your local machine or with the other end of the world.
Three things I want to add to the discussion:
You can find here a very good article about TCP vs. UDP in the
context of game development.
Additionally, iperf (jperf enhance iperf with a GUI) is a
very nice tool for answering your question yourself by measuring.
I implemented a benchmark in Python (see this SO question). In average of 10^6 iterations the difference for sending 8 bytes is about 1-2 microseconds for UDP.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex