How does TCP slow start increase throughput? - tcp

TCP slow start came about in a time when the Internet began experiencing "congestion collapses". The anecdotal example from Van Jacobson and Michael Karels paper goes:
During this period, the data throughput from LBL to UC Berkeley (sites separated
by 400 yards and two IMP hops) dropped from 32 Kbps to 40 bps.
The congestion problem is often described as being caused by the transition from a high-speed link to a slow-speed link, and packet build up/dropping at the buffer at this bottleneck.
What I'm trying to understand is how such a build up would cause a drop in end-to-end throughput, as opposed to simply causing superfluous activity/retransmits on the high-speed portion of the link leading into the full buffer. As an example, consider the following network:
fast slow fast
A ======== B -------- C ======== D
A and D are the endpoints and B and C are the packet buffers at a transition from a high speed to low speed network. So e.g. the link between A/B and C/D is 10Mbps, and link between B/C is 56Kbps. Now if A transmits a large (let's say theoretically infinite) message to D, what I'm trying to understand is why it would take it any longer to get through if it just hammered the TCP connection with data versus adapting to the slower link speed in the middle of the connection. I'm envisaging B as just being some thing whose buffer drains at a fixed rate of 56Kbps, regardless of how heavily its buffer is being hammered by A, and regardless of how many packets it has to discard because of a full buffer. So if A is always keeping B's buffer full (or overfull as may be the case), and B is always transmitting at it's maximum rate of 56Kbps, how would the throughput get any better by using slow-start instead?
The only thing I could think of was if the same packets D had already received were having to be retransmitted over the slow B/C link under congestion, and this was blocking new packets. But wouldn't D have typically ACK'd any packets it had received, so retransmitted packets should be mostly those which legitimately hadn't been received by D because they were dropped at B's buffer?

Remember that networks involve sharing resources between multiple computers. Very simplistically, slow start is required to avoid router buffer exhaustion by a small number of TCP sessions (in your diagram, this is most likely at points B and C)
From RFC 2001, Section 1:
Old TCPs would start a connection with the sender injecting multiple
segments into the network, up to the window size advertised by the
receiver. While this is OK when the two hosts are on the same LAN,
if there are routers and slower links between the sender and the
receiver, problems can arise. Some intermediate router must queue
the packets, and it's possible for that router to run out of space.
[2] shows how this naive approach can reduce the throughput of a TCP
connection drastically.
...
[2] V. Jacobson, "Congestion Avoidance and Control," Computer
Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
Routers must have finite buffers. The larger a speed mismatch between links is, the greater the chance of buffer exhaustion without slow start. After you have buffer exhaustion, your average TCP throughput will go down because buffering increases TCP's ability to utilize links (preventing unnecessary drops for instantaneous link saturation).
Note that RFC 2001 above has been superseded by RFC 5681; however, RFC 2001 offers a more quotable answer to your question.
From your OP...
Now if A transmits a large (let's say theoretically infinite) message to D, what I'm trying to understand is why it would take it any longer to get through if it just hammered the TCP connection with data versus adapting to the slower link speed in the middle of the connection.
First, there is no such thing as an infinite message in TCP. TCP was limited by the initial window size before slow-start came along.
So, let's say the initial TCP segment was 64KB long. If the entire TCP segment fills the router's tx buffer at B, TCP utilizes less of the link over time due to dynamics involved with packet loss, ACKs and TCP back-off. Let's look at individual situations:
B's tx_buffer < 64KB: You automatically lost time for retransmissions because A's TCP is sending faster than B can dequeue packets
B's tx_buffer >= 64KB: As long as A is the only station transmitting, no negative effects (as long as D is ACK-ing correctly); however, if there are multiple hosts transmitting on A's LAN trying to transit across the 56K link, there are probably problems because it takes 200 milliseconds to dequeue a single 1500 byte packet at 56K. If you have 44 1500-byte packets from A's 64KB initial window (44*1460=64KB; you only get 1460 bytes of TCP payload), the router has a saturated link for almost 9 seconds handling A's traffic.
The second situation is neither fair nor wise. TCP backs off when it sees any packet loss... multiple hosts sharing a single link must use slow start to keep the situation sane.
BTW, I have never seen a router with 9 seconds of buffering on an interface. No user would tolerate that kind of latency. Most routers have about 1-2 seconds max, and that was years ago at T-1 speeds. For a number of reasons, buffers today are even smaller.

Related

Estimating TCP and UDP delay between two nodes

Suppose we have 2 nodes, A and B, directly connected by Internet (we can ignore the underlyng network eg, routers, ISP etc).
We know RTT between nodes (80ms)
We know packet loss (0.1)
We know jitter (1ms)
We know bandwith, A=100/10mbps B=50/5mbps (first value is download, second is upload)
A sends a 1GB file to B by using the TCP protocol (with 64KB segment size).
How many times they need to exchange the file?
How many times it takes to do the same thing by using the UDP
protocol?
EDIT:
i guess the main difference in the calculation between UDP and TCP is that in TCP we need to wait for every packet to be sent before sending the next one. Or, in other words, we have to add in the delay calculation one RTT for every packet. Moreover, packetloss is not considered at all in UDP. I am not sure of what I'm sayng in this edit, so let me know if I'm wrong.

Proper way to calculate Link Throughput

I have read some articles online and I got a pretty good idea about the TCP and UDP in general. However, I still have some doubts which I am sure not completely clear to me.
What is the proper way to calculate throughput ?
(Can't we just divide Total number of bytes received by total time taken ?)
What is that key feature in TCP that makes it have much much higher
throughput than UDP ?
UPDATE:
I understood that TCP uses windows which is nothing but that much segments can be sent before actually waiting for Acknowledgements. But my doubt is that in UDP segments are continuously sent without even bothering about Acknowledgements. So there is no extra overheads in UDP. Then, why the throughput of TCP is much much higher than that of UDP ?
Lastly,
Is this true ?
TCP throughput = (TCP Window Size / RTT) = BDP / RTT = (Link Speed in Bytes/sec * RTT)/RTT = Link Speed in Bytes/sec
If so then TCP throughput is always equals to the Know Link speed. And since the RTTs cancels out each other, the TCP throughput does not even depends on RTT.
I have seen in some network analysis tools like iperf, passmark performance test etc. that the TCP/UDP Throughput changes with Block size.
How is throughput dependent on Block size ?
Is Block size equals TCP window or UDP datagram size ?
What is the proper way to calculate throughput?
There are multiple ways, depending on what exactly you want to measure. They all boil down to dividing some number of bits (or bytes) to some duration, as you mention; what varies is which bits you are counting or (more rarely) which moments of time you are considering for measuring the duration.
The factors you need to take into account are:
At which layer in the network stack are you measuring throughput?
If you measure at the application layer, all that matters is what useful data you transmit to the other endpoint. For example, if you are transferring a file of 6 kB, the amount of data you count when measuring throughput is 6 kB (that is 6,000 bytes, not bits, and note the multiplier of 1000, not 1024; these conventions are common in networking).
This is usually called goodput and it may be different from what is actually sent at the transport layer (as in TCP or UDP), for two reasons:
1. Overhead due to headers
Each layer in the network adds a header to the data that introduces some overhead due to its transmission time. Moreover, the transport layer breaks your data into segments; this is because the network layer (as in IPv4 or IPv6) has a maximum packet size called MTU, typically 1,500 B in Ethernet networks. This value includes the network layer header size (e.g. the IPv4 header, which is variable in length but usually 20 B long) and the transport layer header (for TCP, it is also variable in length but usually 40 B long). This leads to a maximum segment size MSS (number of data bytes, without headers, in one segment) of 1500 - 40 - 20 = 1440 bytes.
Thus if we want to send 6 kB of application-layer data, we must break it into 6 segments, 4 of 1440 bytes each and one of 240 bytes. However at the network layer we end up sending 6 packets, 4 of 1500 bytes each and one of 300 bytes, for a total of 6.3 kB.
Here I have not considered the fact that the link layer (as in Ethernet) adds its own header and possibly also a suffix, which increases the overhead further. For Ethernet this is 14 bytes for the Ethernet header, optionally 4 bytes for VLAN tag, then a CRC of 4 bytes and a gap of 12 bytes, for a total of 36 bytes per packet.
If you consider a fixed-rate link, say of 10 Mb/s, depending on what you measure you will get a different throughput. Normally you want one of these:
The goodput, i.e. application layer throughput, if what you want to measure is application performance. For this example, you divide 6 kB by the transfer duration.
The link-layer throughput, if what you want to measure is network performance. For this example, you divide 6 kB + TCP overhead + IP overhead + Ethernet overhead = 6.3 kB + 5 * 36 B = 6516 B by the transfer duration.
Retransmission overheads
The Internet is a best-effort network, meaning that the packets will be delivered if possible, but may also be dropped. Packet drops are corrected by the transport layer, in case of TCP; for UDP, there is no such mechanism, which means that either the application does not care if some parts of the data do not get delivered, or the application implements retransmission itself on top of UDP.
Retransmission reduce goodput for two reasons:
a. Some data needs to be sent again, which takes time. This introduces a delay which is inversely proportional to the rate of the slowest link in the network between the sender and the receiver (a.k.a the bottleneck link).
b. Detecting that some data was not delivered needs feedback from the receiver to the sender. Due to propagation delays (sometimes called latency; caused by the finite speed of light in the cable), feedback can only be received by the sender with some latency, which slows down the transmission even more. In most practical cases, this is the most significant contribution to the extra delay caused by the retransmission.
Clearly, if you use UDP instead of TCP and you do not care about packet loss, you will of course get better performance. But for many applications, data loss cannot be tolerated, so such a measurement is meaningless.
There are some applications that do use UDP for transferring data. One is BitTorrent, which may use either TCP or a protocol they designed called uTP, which emulates TCP on top of UDP, but aims at being more efficient with many parallel connections. Another transport protocol implemented over UDP is QUIC, which also emulates TCP and offers multiplexing multiple parallel transfers over a single connection, and forward error correction to reduce retransmissions.
I will discuss forward error correction a little since it is related to your question about throughput. A naive way of implementing it is by sending every packet twice; in case one gets lost, the other still has a chance of being received. This reduces the amount of retransmissions to half, but also halves your goodput since you send redundant data (note that the network or link layer throughput remains the same!). In some cases this is fine; especially if the latency is very large, such as on intercontinental or satellite links. Moreover, some mathematical methods exist where you don't have to send a full copy of the data; for instance for every n packets you send, you send another reduntant one which is the XOR (or some other arithmetic operation) of them; if the redundant one gets lost, it doesn't matter; if one of the n packets gets lost, you can reconstruct it based on the redundant one and the other n-1. You can thus configure the overhead introduced by forward error correction to whatever amount of bandwidth you can spare.
How you are measuring the transfer time
Is the transfer completed when the sender finished sending the last bit over the wire, or does it also include the time it takes for the last bit to travel to the receiver? Additionally, does it include the time it takes to get a confirmation from the receiver, stating that all data has been received successfully and no retransmission is neede?
It really depends on what you want to measure. Note that for large transfers, one extra round-trip-time is insignificant in most cases (unless you are communicating, for instance, with a probe on Mars).
What is that key feature in TCP that makes it have much much higher throughput than UDP?
This is not true, although a common misconception.
In addition to retransmitting data when needed, TCP will also adjust its sending rate so that it will not cause packet drops by congesting the network. The adjustment algorithm has been perfected over decades, and usually converges quickly to the maximum rate supported by the network (actually, the bottleneck link). For this reason it is usually difficult to beat TCP in throughput.
With UDP, there is no rate limiting at the sender. UDP lets the application send as much as it wants. But if you try to send more than the network can handle, some of the data will be dropped, lowering your throughput, and also making the admin of the network you are congesting very angry. This means that sending UDP traffic at high rates is impractical (unless the goal is to DoS a network).
Some media applications are using UDP but rate-limiting the transfer at the sender at a very small rate. This is typically used in VoIP applications or Internet Radio, where you require very little throughput but low latency. I suppose this is one of the reasons for the misconception that UDP is slower than TCP; that is not the case, UDP can be as fast as the network allows.
As I said before, there are protocols such as uTP or QUIC, implemented over UDP, which achieve performance similar to TCP.
Is this true ?
TCP throughput = (TCP Window Size / RTT)
Without packet loss (and retransmissions), this is correct.
TCP throughput = BDP / RTT = (Link Speed in Bytes/sec * RTT)/RTT = Link Speed in Bytes/sec
This is correct only if the window size is configured to the optimal value. BDP/RTT is the optimal (maximum possible) transfer rate in the network. Most modern operating systems should be able to auto-configure it optimally.
How is throughput dependent on Block size ? Is Block size equals TCP window or UDP datagram size?
I don't see any block size in the iperf documentation.
If you refer to the TCP window size, if it is smaller than BDP, then your throughput will be suboptimal (because you waste time waiting for ACKs instead of sending more data; if needed I can explain further). If it is equal or higher to the BDP, then you achieve optimal throughput.
It depends on how you define "Throughput". It usually can be one of the followings.
Number of bytes (or bits) sent in a fixed period of time;
Number of bytes (or bits) sent and received on the receiver end in a fixed period of time;
You can apply these definition to every layer when people talking about throughput. In application layer, 2nd definition means the bytes have really been received by the receiver end of the application. Some people refer to it as "goodput". In Transport layer, say TCP, 2nd definition means the corresponding TCP ACKs are received. To me, most of people should be only interested in the bytes are really received by the receiver end. So, 2nd definition is usually what people mean by "Throughput".
Now, once we have a clear definition of throughput (2nd definition). We can discuss how to measure the throughput correctly.
Usually, people either use TCP or UDP to measure the network throughput.
TCP: People usually measure TCP throughput only on the sender end. As for packets successfully received by the receiver end, ACKs will be sending back. So, sender itself will know how many bytes are sent and received on the receiver end. Divided this number by the measuring time, we will know the throughput.
But, there are two things need to be noticed during TCP throughput measurement:
Is sender side always full buffer during the measurement? i.e. During the measurement period, sender should always has packets to send. It is important for correct throughput measurement. e.g. if I set my measuring time to be 60 seconds, but my file has been finished transmission in 40 seconds. Then there are 20 seconds the network is actually idle. I will under-estimate the throughput.
TCP rate is regulated by its congestion window size, slow-start duration, sender window (and receiver window) size. Sub-optimal configuration of these parameters will result in under-estimated TCP throughput. Although most of the modern TCP implementation should have a quite good configuration of all of these, it is hard for a tester to 100% sure all these configurations are optimal.
Due to these limitations/risks of TCP in network throughput estimation, quite a number of researchers will use UDP for measuring network throughput.
UDP: As UDP has no ACK sending back once the packets are successfully received, people has to measure the throughput in the receiver end. Or, if the receiver end is not easily accessed, people can compare the logs on both sender and receiver sides to determine the throughput. But, this inconvenience is mitigated by some throughput measuring tools. For example, iperf has embedded sequence numbers in its customized payload, so that it can detect any loss. Also, a receiver's report will be sent to the sender to show the throughput.
As UDP by nature is just sending whatever it has to the network and not waiting for the feedback. Its throughput (remember the 2nd definition) once measured will be the actual capacity (or bandwidth) of the network.
So, usually, the throughput measured by UDP should be higher than that from TCP although the difference should be small (~5%-10%).
One biggest drawback of UDP throughput measuring is that, when using UDP one should also make sure that sender-side buffer must be full. (Otherwise, it results in under-estimated throughput as TCP). This step will be little tricky. In iperf, one can specify the sending rate by -b option. Increase -b value in different rounds of testing will converge the throughput measured. For example, in my gigabit ethernet, I first use -b 100k in the test. The throughput measured is 100Kbps. Then I perform the following iterations to converge the maximum throughput which is the capacity of my ethernet.
-b 1m --> throughput: 1Mbps
-b 10m --> throughput: 10Mbps
-b 100m --> throughput: 100Mbps
-b 200m --> throughput: 170Mbps
-b 180m --> throughput: 175Mbps (this should be quite close to the actual capacity)

How does 802.11 MAC unfairness impact TCP performance?

One of the major factors that affect TCP performance in 802.11 ad hoc networks is the unfairness in the MAC. Could someone please illustrate for me what this "unfairness" means?
In ad hoc networks, you usually are trying to do multihop routing. 802.11 CSMA/CA can manifest the "exposed terminal problem" in these situations. Consider a linear topology
... A <---> T <---> B ...
A and B are not in CSMA range. Suppose T is already transmitting a data stream to A. Now suppose a TCP stream starts getting routed through B. Because B CSMAs with A, it will essentially be locked out of the channel. The TCP connection being routed through B will eventually timeout.
Another possibility is the "hidden terminal problem". Consider the topology
A ---> X <--- B
A and B cannot CSMA. Suppose A and B each try to send a TCP stream to X. Because they cannot CSMA with each other, both win their respective channel contention rounds and transmit, only to have their frames collide at X. This can be solved to some extent with RTS/CTS. But in general, the reason for poor TCP performance in wireless environments has to do with the fact that TCP uses a dropped packet as a congestion signal, i.e., a TCP source will cut its window and there by drop its throughput. In wireless networks, dropped packets can be due to any number of transient things (e.g., collision, interference). A TCP source misinterprets these packet drops as a congestion signal and will throttle its send rate and underutilize the channel.
Another problem that can arise is due to the "capture effect". Again, consider
A ----> X <-------------B
Both A and B are in range of X, but B is farther away and thus has a lower received signal strength at X. Again, A and B cannot CSMA. In this case, X may "capture" the stronger transmitter A, i.e., its radio will decode A's frames but consider B's as noise (even though in the absence of A, X would go right ahead and decode B's frames). This sets up an unfair advantage for A if both are trying to route a TCP stream through X.
802.11's DCF also favors the last winner of the channel contention round. As a result, this gives a slight advantage to long-lived, bulk TCP transfers.
Note that all these problems affect all transport protocols, not just TCP. It's just that the way TCP is designed makes it react particularly poorly to these scenarios.

Average UDP packet loss and packet re-ordering

I'd like to garner fellow SO'ers experience with regards to the issue of UDP packet loss (or drop-out).
Initially my understanding is that given direct point to point connections where the NICs are connected via a crossover cable and ample buffer on the NICs and timely processing of said buffers, that there 'should' be no packet loss or packet ordering issues. I believe this is also the case given one good/high-end switch in between the points.
Excluding the above scenario, what is the expected average UDP packet loss over a LAN
What scenarios cause UDP packet ordering issues?
No idea on the UDP packetloss on average LANs. I assume reasonably low on modern switched networks, otherwise your LAN or endpoints are too highly loaded. :)
The re-ordering is probably easiest to achieve when routes are brought up and down; say, one of the switches in your organization is under enough load that re-organizing the tree makes sense and traffic is sent through different switches. More likely is your ISP's peers coming and going, or reaching traffic limits, and the priority of packets through them changes -- old packets were in flight on the heavy-loaded network, new packets are in flight on the lighter-loaded network, and they arrive out of order.
I too am looking for an expected average. I found that from a direct link (PC to PC) packet loss occurs very rarely, although it definitely occurs. Availability was something like 99.9% at 1 kB packets # 50 Hz.
I have seen reordering just by sending and receiving on the same network interface.
I concluded that this occurs because each packet is handled asynchronously so that there is a chance of a newly arrived packet being processed before packets received prior to the newly received one.
On my basic gigabit switched LAN I get zero packet loss at even 50,000 packets per second, with FreeBSD, Solaris or Linux.
However Windows is something quite special, I easily see packet loss on exactly the same hardware at low speeds such as 10,000 per second. This is mainly due to buffer overflow between WinSock and the NIC, if you drive the packets faster you lose more, if you space out the packets you drop less.
There is no magical number, my situation is probably worse due to Broadcom having terrible Windows drivers.
You can easily see packet ordering issues, however it is almost always only the last two packets switched. This is an artifact of how switches function.
Interestingly what you haven't mentioned in Wi-Fi, radio signals are highly subject to interference and environmental conditions.

What does LAN/traffic congestion mean?

While talking about UDP I saw/heard congestion come up a few times. What does that mean?
congestion is when you are trying to send too much data over a limited bandwidth, it cannot send the data faster than the incoming amount so additional packets are dropped.
When congestion occurs, you can see these effects:
Delay due to the queue at one end of the connection being too big, so it takes time for your packet to be transmitted.
Packet loss when new packets are simply dropped, forcing connection resets (and often causing more congestion).
Lower quality of service, protocols like TCP will do a cutback on the transmission rate, so your throughput will be lowered.
Blocking, certain networks have protocol priorities, so your UDP packets may be dropped in favor of allowing TCP traffic through.
Its like a traffic jam, imagine right after a sports game where a parking lot full of cars is trying to empty out into a small side street.
It means that network-connected devices are attempting to send more data across the network than it can handle, e.g. 20 Mbps of data across a 10 Mbps link.
In context of UDP, it's your main source of lost datagrams under ordinary circumstances.
Most LANs use some sort of a collission detection/avoidance system. A congestion typically means that the amount of data which is being transmiited on the medium is causing enough collissions to deteriorate the quality of service defined for that medium.
You may want to read up CSMA/CD at wikipedia.
As UDP packets can often be broadcasted, congestion can occur more often.
Kind regards,
For instance, Ethernet is a broadband protocol. Once a message is sent, every node receives it but ignores if the packet are not sent to them. What happens when two nodes send a packet at the same time? It will cause a collision and data loss.
So, both of the nodes will have to resend the message. To avoid more collisions, nodes are designed to wait a random number of milliseconds. Otherwise they keep going on sending messages simultaneously and packages will collide forever.

Resources