What is the rationale behind bandwidth delay product - networking

My understanding is that Bandwidth delay product refers to the maximum amount of data "in-transit" at any point in time, between two endpoints.
The thing that I don't get is why we multiply bandwidth by RTT. Bandwidth is a function of the underlying medium, such as copper wire, fibre optics, etc., while RTT is a function of how busy the intermediate nodes are, any scheduling applied at those nodes, distance, etc. RTT can change, but bandwidth can for practical purposes be considered fixed. So how does multiplying a constant value (capacity, a.k.a. bandwidth) by a fluctuating value (RTT) represent the total amount of data in transit?
Based on this, would a really, really slow link have a very large capacity? Chances are that whatever is causing the high RTT will start dropping packets.

Look at the units:
[bandwidth] = bytes / second
[round trip time] = seconds
[data volume] = bytes
[data volume] = [bandwidth] * [round trip time].
Unit-wise, it is correct. Semantically, what is bandwidth * round trip time? It's the amount of data that left the sender before the first acknowledgement was received by the sender. That is, bandwidth * round trip time = the desired window size under perfect conditions.
If the round trip time is measured from the last packet, and the sender's outbound bandwidth is perfectly stable and fully used, then this window size exactly equals the amount of data (data packets and ACKs together) in transit. If you want only one direction, divide the quantity by two.
Since the round trip time is a measured quantity, it naturally fluctuates (and gets smoothed out). The measured bandwidth could fluctuate as well, and thus the estimated total volume of data in transit fluctuates as well.
Note that the amount of data in transit can vary with the data transfer rate. If the bottleneck is wire delay, then RTT can be considered constant, and the amount of data in transit will be proportional to the speed with which it's sent to the network.
Of course, if a round trip time suddenly rises dramatically, the estimated max. amount of data in transit rises as well, but that is correct. If there is no accompanying packet loss, the sliding window needs to expand. If there is packet loss, you need to reconsider the bandwidth estimate (and the bandwidth delay product drops accordingly).
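As a concrete sketch of the formula (the 100 Mbit/s and 20 ms figures below are assumed purely for illustration):

#include <stdio.h>

int main(void) {
    /* Illustrative numbers only: a 100 Mbit/s path with a 20 ms RTT. */
    double bandwidth_bps = 100e6;   /* bits per second */
    double rtt_s         = 0.020;   /* round trip time in seconds */

    double bdp_bits  = bandwidth_bps * rtt_s;   /* bits "in flight" */
    double bdp_bytes = bdp_bits / 8.0;

    printf("BDP = %.0f bits = %.0f bytes (~%.0f KB)\n",
           bdp_bits, bdp_bytes, bdp_bytes / 1024.0);
    /* Prints: BDP = 2000000 bits = 250000 bytes (~244 KB),
       i.e. the window needed to keep this path full. */
    return 0;
}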

To add to Jan Dvorak's answer, you can think of the 'big fat pipe' as a garden hose. We are interested in how much water is in the pipe. So, we take its 'bandwidth' i.e. how fast it can deliver water, which for a hose is determined by its cross-sectional area, and multiply by its length, which corresponds to the RTT, i.e. how 'long' a drop of water takes to get from one end to the other. The result is the volume of the hose, the volume of the pipe, the amount of data 'in the pipe'.

First, BDP is a calculated value used in performance tuning to determine the upper bound of data which could be outstanding/unacknowledged. This almost always does not represent the quantity of "in-transit" data, but a target to which tuning parameters are applied. If it always represented "in-transit" data, there would be no room for performance tuning.
RTT does in fact fluctuate. This is why the expected worst-case RTT is used in the calculation. By tuning to the worst case, throughput efficiency will be at its maximum when the RTT is at its poorest. If RTT improves, we get outstanding ACKs sooner, the pipe remains full, and maximum throughput (efficiency) is maintained.
"Full pipe" is a misnomer. The goal is to keep the Tx side full, as the Rx side carries ACK packets, which are typically smaller than the transmitted packets.
RTT also aggregates asymmetrical upstream and downstream bandwidths (ADSL, satellite modem, cable modem, etc.).
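A minimal sketch of how this is used as a tuning target, assuming an illustrative 1 Gbit/s link and an 80 ms worst-case RTT (both numbers made up):

#include <stdio.h>

int main(void) {
    /* Assumed figures for illustration only. */
    double link_rate_bps = 1e9;     /* 1 Gbit/s link            */
    double worst_rtt_s   = 0.080;   /* expected worst-case RTT  */

    /* Tune the TCP window / socket buffer to at least the BDP so the
       sender never stalls waiting for ACKs at the worst-case RTT. */
    double target_window_bytes = link_rate_bps * worst_rtt_s / 8.0;

    printf("tune window/buffer to >= %.0f bytes (~%.1f MB)\n",
           target_window_bytes, target_window_bytes / (1024.0 * 1024.0));
    /* 1 Gbit/s * 80 ms = 10,000,000 bytes, roughly 9.5 MB. */
    return 0;
}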

Related

Vulnerable time in ALOHA depends on the frame transmission time (Tfr), but in CSMA it depends on frame propagation time

Q: Why does the vulnerable time in ALOHA depend on the frame transmission time (Tfr), while in CSMA it depends on the frame propagation time (Tp)?
I understand that the vulnerable time is the time during which there is a possibility of collision, but I can't find a proper explanation of transmission time versus propagation time anywhere.
Please help.
This is because of the difference in magnitude between the transmission time and the propagation time. Most propagation is done through EM waves, so for most distances the propagation delay is negligible compared to the transmission time, which is limited by the bandwidth of the channel.
In ALOHA, both of these times contribute to the vulnerable time, so the dominant factor (the transmission time) is what remains. In CSMA, the transmission-time factor is avoided, so only the propagation delay is considered; although it is very small, it must be taken into account to evaluate the efficiency of CSMA protocols.
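A quick sketch with assumed numbers (1000-bit frames, a 10 Mbit/s channel, 2 km of copper) shows the difference in magnitude:

#include <stdio.h>

int main(void) {
    /* Assumed numbers, purely to compare magnitudes. */
    double frame_bits   = 1000.0;     /* frame size              */
    double bandwidth    = 10e6;       /* 10 Mbit/s channel        */
    double distance_m   = 2000.0;     /* 2 km of cable            */
    double signal_speed = 2e8;        /* ~2/3 c in copper, m/s    */

    double tfr = frame_bits / bandwidth;     /* transmission time  */
    double tp  = distance_m / signal_speed;  /* propagation time   */

    printf("Tfr = %.1f us   (vulnerable time in pure ALOHA: 2*Tfr = %.1f us)\n",
           tfr * 1e6, 2 * tfr * 1e6);
    printf("Tp  = %.1f us   (vulnerable time in CSMA)\n", tp * 1e6);
    /* Tfr = 100 us vs Tp = 10 us: the propagation term is small, but in
       CSMA it is the only window in which a collision can still start. */
    return 0;
}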

CSMA/CD: Minimum frame size to hear all collisions?

Question from a networking class:
"In a csma/cd lan of 2 km running at 100 megabits per second, what would be the minimum frame size to hear all collisions?"
Looked all over and can't find info anywhere on how to do this. Is there a formula for this problem? Thanks for any help.
The bandwidth delay product is the amount of data in transit.
The propagation delay is the time it takes for the signal to propagate over the wire:
propagation delay = length of wire / speed of signal
Assuming copper wire, i.e. speed = (2/3) × speed of light = 2 × 10^8 m/s:
propagation delay = 2000 / (2 × 10^8) = 10 µs
The round trip time is the time taken for a message to travel from sender to receiver and back from receiver to sender:
round trip time = 2 × propagation delay = 20 µs
The minimum frame size is the bandwidth delay product over that RTT:
minimum frame size = bandwidth × RTT = 100 Mbit/s × 20 µs = 2000 bits
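The same calculation, written out as a small sketch (assuming, as above, copper wire at roughly 2/3 the speed of light):

#include <stdio.h>

int main(void) {
    double length_m     = 2000.0;   /* LAN length: 2 km                 */
    double signal_speed = 2e8;      /* ~2/3 the speed of light, in m/s  */
    double bandwidth    = 100e6;    /* 100 Mbit/s                       */

    double tp  = length_m / signal_speed;  /* one-way propagation delay */
    double rtt = 2.0 * tp;                 /* worst-case round trip     */

    /* The sender must still be transmitting when the collision signal
       returns, so the frame must last at least one RTT on the wire.   */
    double min_frame_bits = bandwidth * rtt;

    printf("Tp = %.0f us, RTT = %.0f us, minimum frame = %.0f bits\n",
           tp * 1e6, rtt * 1e6, min_frame_bits);
    /* Prints: Tp = 10 us, RTT = 20 us, minimum frame = 2000 bits */
    return 0;
}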

Does queue length really affect latency in DCTCP?

DCTCP is a variant of TCP for the data center environment. The source is here
DCTCP uses the ECN feature of commodity switches to keep the queue length of the switch buffer around a threshold K. As a result, packet loss rarely happens, because K is much smaller than the buffer's capacity, so the buffer is almost never full.
DCTCP achieves low latency for small flows while maintaining high throughput for big flows. The reason is that when the queue length exceeds the threshold K, a congestion notification is fed back to the sender. The sender computes an estimate of the extent of congestion over time and decreases its sending rate accordingly.
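(Roughly, the sender-side reaction described in the DCTCP paper looks like the sketch below; g is the paper's fixed smoothing weight, F the fraction of ECN-marked packets in the last window, and the concrete numbers are only illustrative.)

#include <stdio.h>

/* Rough sketch of the DCTCP sender reaction described above.
   alpha estimates the extent of congestion; g is a fixed weight. */
static double alpha = 0.0;
static const double g = 1.0 / 16.0;

double dctcp_update(double cwnd, double marked_fraction) {
    /* F = fraction of ECN-marked packets in the last window of data. */
    alpha = (1.0 - g) * alpha + g * marked_fraction;
    /* Cut the window in proportion to the estimated congestion: a fully
       marked window halves cwnd, a lightly marked one barely reduces it. */
    return cwnd * (1.0 - alpha / 2.0);
}

int main(void) {
    double cwnd = 100.0;                 /* illustrative, in packets */
    cwnd = dctcp_update(cwnd, 0.25);     /* 25% of packets marked    */
    printf("alpha = %.4f, new cwnd = %.1f packets\n", alpha, cwnd);
    return 0;
}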
DCTCP states that a small queue length will decrease the latency, i.e. the transmission time, of flows. I doubt that, because high latency should mainly come from packet loss leading to retransmission, and in DCTCP packet loss rarely happens.
A small queue at the switch forces senders to decrease their sending rates, so packets queue in the TX buffers of the senders instead.
A bigger queue at the switch lets senders keep higher sending rates, and instead of queueing in the senders' TX buffers, packets now queue in the switch buffer.
So I think the delay is the same with a small queue and with a big queue.
What do you think?
The buffer in the switch does not increase the capacity of the network; it only helps to avoid losing too many packets if you have a traffic burst. But TCP can deal with packet loss by sending more slowly, which is exactly what it needs to do when the network capacity is reached.
If you continuously run the network at its limit, the queue of the switch will be full or nearly full all the time, so you still lose packets when the queue is full. But you also increase the latency, because a packet needs some time to get from the end of the queue, where it arrives, to the front, where it will be forwarded. That latency in turn causes the TCP stack to react more slowly to congestion, which again increases congestion, packet loss, and so on.
So the ideal switch behaves like a network cable, i.e. it does not have any buffer at all.
You might read more about the problems caused by large buffers by searching for "bufferbloat", e.g. http://en.wikipedia.org/wiki/Bufferbloat.
And when in doubt benchmark yourself.
It depends on queue occupancy. DCTCP aims to maintain a small queue occupancy, because the authors consider queueing delay to be the main cause of long latency.
So the maximum size of the queue does not matter. With 16 Mb of maximum queue size or just 32 kb, if we can keep the queue occupancy around 8 kb or some similarly small value, the queueing delay will be the same.
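To put rough numbers on that (the 10 Gbit/s port and the occupancy figures below are assumed purely for illustration), queueing delay is simply occupancy divided by the drain rate:

#include <stdio.h>

int main(void) {
    /* Illustrative 10 Gbit/s switch port. Queueing delay depends on how
       much data actually sits in the queue, not on its maximum size. */
    double link_rate_bps = 10e9;

    double small_occupancy_bits = 8.0 * 1024 * 8;           /* ~8 KB  */
    double full_buffer_bits     = 16.0 * 1024 * 1024 * 8;   /* ~16 MB */

    printf("8 KB in the queue  -> %.1f us of queueing delay\n",
           small_occupancy_bits / link_rate_bps * 1e6);
    printf("16 MB in the queue -> %.1f ms of queueing delay\n",
           full_buffer_bits / link_rate_bps * 1e3);
    return 0;
}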
Read the HULL paper from NSDI 2012 by M. Alizadeh, the first author of DCTCP. HULL also aims to maintain a short queue occupancy.
What they say about small buffers relates to the trend of data center switches shifting from 'store and forward' buffering to 'cut-through' buffering. Just google it and you can find documents from Cisco and other related web pages.

Bandwidth estimation with multiple TCP connections

I have a client which issues parallel requests for data from a server. Each request uses a separate TCP connection. I would like to estimate the available throughput (bandwidth) based on the received data.
I know that for one TCP connection I can do so by dividing the amount of data that has been downloaded by the time it took to download it. But given that there are multiple concurrent connections, would it be correct to sum up all the data that has been downloaded by the connections and divide the sum by the duration between sending the first request and the arrival time of the last byte (i.e., the last byte of the download that finishes last)? Or am I overlooking something here?
[This is a rewrite of my previous answer, which was getting too messy]
There are two components that we want to measure in order to calculate throughput: the total number of bytes transferred, and the total amount of time it took to transfer those bytes. Once we have those two figures, we just divide the byte-count by the duration to get the throughput (in bytes-per-second).
Calculating the number of bytes transferred is trivial; just have each TCP connection tally the number of bytes it transferred, and at the end of the sequence, we add up all of the tallies into a single sum.
Calculating the amount of time it takes for a single TCP connection to do its transfer is likewise trivial: just record the time (t0) at which the TCP connection received its first byte, and the time (t1) at which it received its last byte, and that connection's duration is (t1-t0).
Calculating the amount of time it takes for the aggregate process to complete, OTOH, is not so obvious, because there is no guarantee that all of the TCP connections will start and stop at the same time, or even that their download-periods will intersect at all. For example, imagine a scenario where there are five TCP connections, and the first four of them start immediately and finish within one second, while the final TCP connection drops some packets during its handshake, and so it doesn't start downloading until 5 seconds later, and it also finishes one second after it starts. In that scenario, do we say that the aggregate download process's duration was 6 seconds, or 2 seconds, or ???
If we're willing to count the "dead time" where no downloads were active (i.e. the time between t=1 and t=5 above) as part of the aggregate-duration, then calculating the aggregate-duration is easy: Just subtract the smallest t0 value from the largest t1 value. (this would yield an aggregate duration of 6 seconds in the example above). This may not be what we want though, because a single delayed download could drastically reduce the reported bandwidth estimate.
A possibly more accurate way to do it would be say that the aggregate duration should only include time periods when at least one TCP download was active; that way the result does not include any dead time, and is thus perhaps a better reflection of the actual bandwidth of the network path.
To do that, we need to capture the start-times (t0s) and end-times (t1s) of all TCP downloads as a list of time-intervals, and then merge any overlapping time-intervals as shown in the sketch below. We can then add up the durations of the merged time-intervals to get the aggregate duration.
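A minimal sketch of that merge step, assuming each download is given as a (t0, t1) pair in seconds:

#include <stdio.h>
#include <stdlib.h>

typedef struct { double t0, t1; } Interval;   /* start/end of one download */

static int by_start(const void *a, const void *b) {
    double d = ((const Interval *)a)->t0 - ((const Interval *)b)->t0;
    return (d > 0) - (d < 0);
}

/* Returns the total time during which at least one download was active. */
double active_duration(Interval *iv, int n) {
    qsort(iv, n, sizeof(Interval), by_start);
    double total = 0.0, cur_start = iv[0].t0, cur_end = iv[0].t1;
    for (int i = 1; i < n; i++) {
        if (iv[i].t0 <= cur_end) {                 /* overlaps: extend       */
            if (iv[i].t1 > cur_end) cur_end = iv[i].t1;
        } else {                                   /* gap: close interval    */
            total += cur_end - cur_start;
            cur_start = iv[i].t0;
            cur_end   = iv[i].t1;
        }
    }
    return total + (cur_end - cur_start);
}

int main(void) {
    /* The five-connection example above: four downloads in [0,1],
       one delayed download in [5,6]. */
    Interval iv[] = { {0,1}, {0,1}, {0,1}, {0,1}, {5,6} };
    printf("aggregate active duration = %.1f s\n",
           active_duration(iv, 5));               /* prints 2.0, not 6.0 */
    return 0;
}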
You need to do a weighted average. Let B(n) be the bytes processed for connection 'n' and T(n) be the time required to process those bytes. The total throughput is:
double throughput = 0;
for (int n = 0; n < Nmax; ++n)
{
    throughput += B(n) / T(n);   /* rate of connection n: bytes / seconds */
}
throughput /= Nmax;              /* average over the Nmax connections */

Is it a misuse to use "bandwidth" to describe the speed of a network?

I often hear people talking about a network's speed in terms of "bandwidth", and I read the following definition in Computer Networks: A Systems Approach:
The bandwidth of a network is given by the number of bits that can be transmitted over the network in a certain period of time.
AFAIK, the word "bandwidth" is used to describe the width of the frequency range that can be passed over some kind of medium, while the above definition describes something more like throughput. So is it a misuse?
I have been thinking about this question for some time. I don't know where to post it. So forgive me if it is off topic.
Thanks.
Update - 1 - 9:56 AM 1/13/2011
I recall that if a signal's cycle is shorter in the time domain, its frequency band is wider in the frequency domain. So if the bit rate (digital bandwidth) is high, the signal's cycle should be quite short, and the analog bandwidth it requires will be quite wide. But a medium has its physical limit: there is a widest frequency band it allows to pass, and hence a highest bit rate it allows to transmit. From this point of view, I think the 'misuse' of bandwidth in the digital world is acceptable.
The word bandwidth has more than one definition:
Bandwidth has several related meanings:
Bandwidth (computing) or digital bandwidth: a rate of data transfer, throughput or bit rate, measured in bits per second (bps), by analogy to signal processing bandwidth
Bandwidth (signal processing) or analog bandwidth, frequency bandwidth or radio bandwidth: a measure of the width of a range of frequencies, measured in hertz
...
With both definitions having more bandwidth means that you can send more data.
In computer networking and other digital fields, the term bandwidth often refers to a data rate measured in bits per second, for example network throughput, sometimes denoted network bandwidth, data bandwidth or digital bandwidth. The reason is that according to Hartley's law, the digital data rate limit (or channel capacity) of a physical communication link is proportional to its bandwidth in hertz, sometimes denoted radio frequency (RF) bandwidth, signal bandwidth, frequency bandwidth, spectral bandwidth or analog bandwidth. For bandwidth as a computing term, less ambiguous terms are bit rate, throughput, maximum throughput, goodput or channel capacity.
(Source)
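The connection between the two senses of the word is made precise by the Shannon-Hartley theorem, C = B * log2(1 + S/N); a small sketch with assumed numbers:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Shannon-Hartley: channel capacity C = B * log2(1 + S/N).
       The numbers below are assumed purely for illustration. */
    double analog_bandwidth_hz = 1e6;     /* 1 MHz of spectrum      */
    double snr_db              = 30.0;    /* signal-to-noise ratio  */
    double snr_linear          = pow(10.0, snr_db / 10.0);

    double capacity_bps = analog_bandwidth_hz * log2(1.0 + snr_linear);

    printf("1 MHz at 30 dB SNR -> at most %.2f Mbit/s\n", capacity_bps / 1e6);
    /* More hertz (analog bandwidth) or better SNR -> more bits per second
       (digital "bandwidth"), which is why the two usages blur together. */
    return 0;
}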
Bandwidth is only one aspect of network speed. Delay is also important.
The term "bandwidth" is not a precise term, it may mean:
the clock frequency multiplied by the no-of-bits-transmitted-in-a-clock-tick - physical bandwidth,
minus bytes used for low-level error corrections, checksums (e.g. FEC in DVB),
minus bytes used by transmit protocol for addressing or other meta info (e.g. IP headers),
minus the time overhead of the handshake/transmit control (see TCP),
minus the time overhead of the administration of connection (e.g. DNS),
minus time spent on authentication (seeking user name on the host side),
minus time spent on receiving and handling the packet (e.g. an FTP server/client writes out the block of data received) - effective bandwidth, or throughput.
The best we can do is to always explain what kind of bandwidth we mean: with or without protocol overhead, etc. Also, users are often interested only in the final, bottom-line value: how long does it take to download that stuff?
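As a rough illustration of the header part of that list alone (full-size TCP/IPv4 segments over Ethernet assumed, with no options, handshakes or retransmissions counted):

#include <stdio.h>

int main(void) {
    /* Full-size TCP/IPv4 segment in one Ethernet frame, no options.
       Only header overhead is counted here; handshakes, ACKs and
       retransmissions would reduce the result further. */
    double wire_bytes    = 8 + 14 + 1500 + 4 + 12;  /* preamble + header +
                                                       payload + FCS + gap */
    double payload_bytes = 1500 - 20 - 20;          /* minus IP and TCP    */

    double link_rate_mbps = 100.0;                  /* "physical" bandwidth */
    double goodput_mbps   = link_rate_mbps * payload_bytes / wire_bytes;

    printf("goodput ~ %.1f Mbit/s of %.0f Mbit/s (%.1f%%)\n",
           goodput_mbps, link_rate_mbps, 100.0 * payload_bytes / wire_bytes);
    /* ~94.9 Mbit/s: header overhead alone costs about 5%. */
    return 0;
}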
