Bandwidth estimation with multiple TCP connections - networking

I have a client which issues parallel requests for data from a server. Each request uses a separate TCP connection. I would like to estimate the available throughput (bandwidth) based on the received data.
I know that for a single TCP connection I can do so by dividing the amount of data that has been downloaded by the time it took to download it. But given that there are multiple concurrent connections, would it be correct to sum up all the data that has been downloaded by the connections and divide the sum by the duration between sending the first request and the arrival time of the last byte (i.e., the last byte of the download that finishes last)? Or am I overlooking something here?

[This is a rewrite of my previous answer, which was getting too messy]
There are two components that we want to measure in order to calculate throughput: the total number of bytes transferred, and the total amount of time it took to transfer those bytes. Once we have those two figures, we just divide the byte-count by the duration to get the throughput (in bytes-per-second).
Calculating the number of bytes transferred is trivial; just have each TCP connection tally the number of bytes it transferred, and at the end of the sequence, we add up all of the tallies into a single sum.
Calculating the amount of time it takes for a single TCP connection to do its transfer is likewise trivial: just record the time (t0) at which the TCP connection received its first byte, and the time (t1) at which it received its last byte, and that connection's duration is (t1-t0).
Calculating the amount of time it takes for the aggregate process to complete, OTOH, is not so obvious, because there is no guarantee that all of the TCP connections will start and stop at the same time, or even that their download-periods will intersect at all. For example, imagine a scenario where there are five TCP connections, and the first four of them start immediately and finish within one second, while the final TCP connection drops some packets during its handshake, and so it doesn't start downloading until 5 seconds later, and it also finishes one second after it starts. In that scenario, do we say that the aggregate download process's duration was 6 seconds, or 2 seconds, or ???
If we're willing to count the "dead time" where no downloads were active (i.e. the time between t=1 and t=5 above) as part of the aggregate-duration, then calculating the aggregate-duration is easy: Just subtract the smallest t0 value from the largest t1 value. (this would yield an aggregate duration of 6 seconds in the example above). This may not be what we want though, because a single delayed download could drastically reduce the reported bandwidth estimate.
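As a quick sketch in C++ (assuming the per-connection start and end times have been collected into two vectors, t0s and t1s; those names are just placeholders):

#include <algorithm>
#include <vector>

// Naive aggregate duration: the wall-clock span from the earliest first byte
// to the latest last byte, dead time included.
double WallClockDuration(const std::vector<double>& t0s, const std::vector<double>& t1s)
{
    return *std::max_element(t1s.begin(), t1s.end())
         - *std::min_element(t0s.begin(), t0s.end());
}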
A possibly more accurate way to do it would be to say that the aggregate duration should only include time periods when at least one TCP download was active; that way the result does not include any dead time, and is thus perhaps a better reflection of the actual bandwidth of the network path.
To do that, we need to capture the start-times (t0s) and end-times (t1s) of all TCP downloads as a list of time-intervals, and then merge any overlapping time-intervals as shown in the sketch below. We can then add up the durations of the merged time-intervals to get the aggregate duration.
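For instance, the merge could be sketched in C++ like this (the Interval struct and function name are placeholders for illustration, not anything from a real library):

#include <algorithm>
#include <vector>

struct Interval { double t0, t1; };  // one connection's first-byte / last-byte times, in seconds

// Merge overlapping intervals and return the total time during which
// at least one download was active (dead time is excluded).
double ActiveDuration(std::vector<Interval> intervals)
{
    if (intervals.empty()) return 0.0;
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& a, const Interval& b) { return a.t0 < b.t0; });

    double total    = 0.0;
    double curStart = intervals[0].t0;
    double curEnd   = intervals[0].t1;
    for (size_t i = 1; i < intervals.size(); ++i)
    {
        if (intervals[i].t0 <= curEnd)
        {
            curEnd = std::max(curEnd, intervals[i].t1);  // overlaps: extend the merged interval
        }
        else
        {
            total += curEnd - curStart;                  // gap: close out the merged interval
            curStart = intervals[i].t0;
            curEnd   = intervals[i].t1;
        }
    }
    return total + (curEnd - curStart);
}

The aggregate throughput is then the summed byte tallies divided by whatever this returns.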

You need to do a weighted average. Let B(n) be the bytes processed for connection 'n' and T(n) be the time required to process those bytes. The total throughput is:
double throughput = 0;
for (int n = 0; n < Nmax; ++n)
{
    throughput += B(n) / T(n);   // per-connection throughput for connection n
}
throughput /= Nmax;              // average across the Nmax connections

Related

What is the difference between the delay and the jitter in the context of real time applications?

According to Wikipedia, jitter is the undesired deviation from true periodicity of an assumed periodic signal; according to a paper on QoS that I am reading, jitter is referred to as delay variation. Is there a definition of jitter in the context of real-time applications? Are there applications that are sensitive to jitter but not sensitive to delay? If, for example, a streaming application uses some kind of buffer to store packets before showing them to the user, is it possible that this application is not sensitive to delay but is sensitive to jitter?
Delay: the amount of time the data (signal) takes to reach the destination. A higher delay generally means congestion of some sort or a break in the communication link.
Jitter: the variation of the delay over time. This happens when a system is not in a deterministic state; e.g. video streaming suffers from jitter a lot because the size of the data transferred is quite large, so there is no way of saying how long it might take to transfer.
If your application is sensitive to jitter it is definitely sensitive to delay.
In the Real-time Transport Protocol (RTP, RFC 3550), the header contains a timestamp field. Its value usually comes from a monotonically incremented counter, and the frequency of the increment is the clock-rate. This clock-rate must be the same for every participant that wants to do anything with the timestamp field. The counters have different base offsets, because the start times may differ, or because they are randomized for security reasons, etc. All in all, we say the clocks are not synchronized.
To show this with an example, let snd_timestamp be the sender timestamp of the most recent packet (taken from the RTP header field) and rcv_timestamp be the receiver timestamp generated by the receiver using the same clock-rate.
The wrong conclusion is that
delay_in_timestamp_unit = rcv_timestamp - snd_timestamp
If the receiver and sender clocks have different base offsets (and they do), this does not give you the delay; it also doesn't account for the wrap-around of the 32-bit unsigned integer.
But monitoring packet delivery time is still necessary if we want a proper playout adaptation algorithm, or if we want to detect and avoid congestion.
Also note that even with synchronized clocks, delay_in_timestamp_unit might not exactly represent the pure network delay, because components at the sender or receiver side may hold on to these packets after and/or before the timestamp is added and/or examined. So if you calculate a 2-second delay between the participants, but you know your network delay is around 100 ms, then your packets suffer additional delays at the sender and/or receiver side. But that additional delay is (or at least you hope it is) roughly constant, so the only delay that changes over time is, hopefully, the network delay. So you should not say that if the packet delay exceeds 500 ms then we have congestion, because you have no idea what the actual network delay is if you use only a single pair of sender and receiver timestamps.
But the difference between the delays of two consecutive packets might give you some information about whether something is wrong in the network or not.
diff_delay = delay_t0 - delay_t1
If diff_delay equals 0, the delay is the same; if it is greater than 0, the newly arrived packet needed more time than the previous one; and if it is smaller than 0, it needed less time.
And from that relative information, based on two consecutive delays, you can draw some conclusions.
How do you determine the difference between two delays if the clocks are not synchronized?
Suppose you stored the previous timestamps in rcv_timestamp_t1 and snd_timestamp_t1:
diff_delay = (rcv_timestamp_t0 - snd_timestamp_t0) - (rcv_timestamp_t1 - snd_timestamp_t1)
but that would be a problem without knowing the base offsets of the sender and the receiver, so we reorder it:
diff_delay = (rcv_timestamp_t0 - rcv_timestamp_t1) - (snd_timestamp_t0 - snd_timestamp_t1)
Here the rcv timestamps are subtracted from each other, which eliminates the receiver's base offset, and the snd timestamps likewise; subtracting snd_diff from rcv_diff then gives you the difference between the delays of two consecutive packets, in units of the clock-rate.
Now, according to RFC3550 jitter is "An estimate of the statistical variance of the RTP data packet interarrival time".
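As a hedged sketch of both computations in C++: unsigned 32-bit subtraction is modular, so the timestamp wrap-around mentioned earlier is handled automatically, and the running update follows the RFC 3550 estimator J = J + (|D| - J)/16. The function names are mine, not from the RFC:

#include <cmath>
#include <cstdint>

// Difference of the delays of two consecutive packets, in clock-rate units.
// The sender and receiver base offsets cancel out within each subtraction.
int32_t DiffDelay(uint32_t rcv_t0, uint32_t snd_t0,
                  uint32_t rcv_t1, uint32_t snd_t1)
{
    uint32_t rcv_diff = rcv_t0 - rcv_t1;
    uint32_t snd_diff = snd_t0 - snd_t1;
    return static_cast<int32_t>(rcv_diff - snd_diff);
}

// Running interarrival-jitter estimate in the spirit of RFC 3550.
double UpdateJitter(double jitter, int32_t diff_delay)
{
    return jitter + (std::fabs(static_cast<double>(diff_delay)) - jitter) / 16.0;
}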
To finally get to the point, your question was:
"What is the difference between the delay and the jitter in the context of real time applications?"
A tiny note: real-time applications usually refer to systems processing data in the range of nanoseconds, so I think you are referring to end-to-end streaming systems.
Also, despite the several different definitions of jitter, they all use the difference between the delays of arriving packets, and thus give you information about the relative changes of the network delay, whereas delay itself is an absolute measure of the delivery time.

What is the rationale behind bandwidth delay product

My understanding is that Bandwidth delay product refers to the maximum amount of data "in-transit" at any point in time, between two endpoints.
The thing that I don't get is why we multiply bandwidth by RTT. Bandwidth is a function of the underlying medium, such as copper wire, fiber optics, etc., while RTT is a function of how busy the intermediate nodes are, any scheduling applied at those nodes, distance, etc. RTT can change, but bandwidth for practical purposes can be considered fixed. So how does multiplying a constant value (capacity, aka bandwidth) by a fluctuating value (RTT) represent the total amount of data in transit?
Based on this, would a really, really slow link have a very large capacity? Chances are the "causes" of the RTT will start dropping packets.
Look at the units:
[bandwidth] = bytes / second
[round trip time] = seconds
[data volume] = bytes
[data volume] = [bandwidth] * [round trip time].
Unit-wise, it is correct. Semantically, what is bandwidth * round trip time? It's the amount of data that left the sender before the first acknowledgement was received by the sender. That is, bandwidth * round trip time = the desired window size under perfect conditions.
If the round trip time is measured from the last packet and the sender's outbound bandwidth is perfectly stable and fully used, then the measured window size exactly calculates the number of packets (data and ACKs together) in transit. If you want only one direction, divide the quantity by two.
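As a small worked example (the link speed and RTT below are made up purely for illustration):

#include <cstdio>

int main()
{
    // A 100 Mbit/s path with a 60 ms round-trip time.
    double bandwidth = 100e6 / 8.0;          // 12,500,000 bytes per second
    double rtt       = 0.060;                // seconds
    double bdp       = bandwidth * rtt;      // bandwidth-delay product
    std::printf("BDP = %.0f bytes\n", bdp);  // 750000 bytes, roughly 732 KiB "in flight"
    return 0;
}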
Since the round trip time is a measured quantity, it naturally fluctuates (and gets smoothed out). The measured bandwidth could fluctuate as well, and thus the estimated total volume of data in transit fluctuates as well.
Note that the amount of data in transit can vary with the data transfer rate. If the bottleneck is wire delay, then RTT can be considered constant, and the amount of data in transit will be proportional to the speed with which it's sent to the network.
Of course, if a round trip time suddenly rises dramatically, the estimated max. amount of data in transit rises as well, but that is correct. If there is no accompanying packet loss, the sliding window needs to expand. If there is packet loss, you need to reconsider the bandwidth estimate (and the bandwidth delay product drops accordingly).
To add to Jan Dvorak's answer, you can think of the 'big fat pipe' as a garden hose. We are interested in how much water is in the pipe. So, we take its 'bandwidth' i.e. how fast it can deliver water, which for a hose is determined by its cross-sectional area, and multiply by its length, which corresponds to the RTT, i.e. how 'long' a drop of water takes to get from one end to the other. The result is the volume of the hose, the volume of the pipe, the amount of data 'in the pipe'.
First, BDP is a calculated value used in performance tuning to determine the upper bound of data which could be outstanding/unacknowledged. This almost always does not represent the quantity of "in-transit" data, but a target to which tuning parameters are applied. If it always represented "in-transit" data, there would be no room for performance tuning.
RTT does in fact fluctuate. This is why the expected worst-case RTT is used in calculations. By tuning to the worst case, throughput efficiency will be at its maximum when the RTT is poorest. If the RTT improves, we get outstanding ACKs sooner, the pipe remains full, and maximum throughput (efficiency) is maintained.
"Full pipe" is a bit of a misnomer. The goal is to keep the Tx side full, as the Rx side carries ACK packets, which are typically smaller than the transmitted packets.
RTT also aggregates asymmetrical upstream and downstream bandwidths (ADSL, satellite modem, cable modem, etc.).

does TCP randomly select a time from an interval to decide when a timeout has occurred?

In TCP, how is the time for a timeout to happen determined? I was told it is randomly selected from an interval that doubles after each timeout, but nothing I found on Google mentions anything about random selection; instead it says the timeout is calculated using the Smoothed Round Trip Time after the first acknowledgment is received. Does it do this for each packet, or is there some randomness to the design?
An initial value of the RTT is calculated during the TCP 3-way handshake that starts a connection. It is updated thereafter when qualifying send/acks are seen.
Most modern implementations don't use this method directly, but rather use a statistical analysis of the maximum time it should take to get an ACK, and retransmit after that interval. The "exponential backoff" (the doubling of the wait interval) happens for further retransmissions of the same data.
A connection "times out" after some number of transmissions with no ACK being received.
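For reference, the estimator most stacks base this on is described in RFC 6298 (SRTT/RTTVAR with an exponentially backed-off RTO). The sketch below follows the RFC's constants, but it is only an illustration under those assumptions, not any particular implementation:

#include <algorithm>
#include <cmath>

struct RtoEstimator
{
    double srtt   = 0.0;   // smoothed round-trip time, seconds
    double rttvar = 0.0;   // round-trip time variation, seconds
    double rto    = 1.0;   // current retransmission timeout, seconds
    bool   first  = true;

    // Call with each new RTT measurement r (from a qualifying send/ACK pair).
    void OnRttSample(double r)
    {
        if (first)
        {
            srtt   = r;
            rttvar = r / 2.0;
            first  = false;
        }
        else
        {
            rttvar = 0.75 * rttvar + 0.25 * std::fabs(srtt - r);  // beta = 1/4
            srtt   = 0.875 * srtt + 0.125 * r;                    // alpha = 1/8
        }
        rto = std::max(1.0, srtt + 4.0 * rttvar);  // RFC 6298 recommends a 1 s floor
    }

    // Exponential backoff: double the timeout for each retransmission of the same data.
    void OnTimeout() { rto *= 2.0; }
};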

Graphite: show change from previous value

I am sending Graphite the time spent in Garbage Collection (getting this from the JVM via JMX). This is a counter that increases. Is there a way to have Graphite graph the change every minute, so I can see a graph that shows time spent in GC by minute?
You should be able to turn the counter into a rate with the derivative function, then use the summarize function to bucket the result into the time frame that you're after.
&target=summarize(derivative(java.gc_time), "1min") # time spent per minute
derivative(seriesList)
This is the opposite of the integral function. This is useful for taking a running total metric and showing how many requests per minute were handled.
&target=derivative(company.server.application01.ifconfig.TXPackets)
Each time you run ifconfig, the RX and TXPackets are higher (assuming there is network traffic.)
By applying the derivative function, you can get an idea of the packets per minute sent or received, even though you’re only recording the total.
summarize(seriesList, intervalString, func='sum', alignToFrom=False)
Summarize the data into interval buckets of a certain size.
By default, the contents of each interval bucket are summed together.
This is useful for counters where each increment represents a discrete event and
retrieving a “per X” value requires summing all the events in that interval.
Source: http://graphite.readthedocs.org/en/0.9.10/functions.html

Measuring time difference between networked devices

I'm adding networked multiplayer to a game I've made. When the server sends an update packet to the client, I include a timestamp so that the client knows exactly when that information is valid. However, the server computer and the client computer might have their clocks set to different times (maybe even just a few seconds difference), so the timestamp from the server needs to be translated to the client's local time.
So, I'd like to know the best way to calculate the time difference between the server and the client. Currently, the client pings the server for a time stamp during initialization, takes note of when the request was sent and when it was answered, and guesses that the time stamp was generated roughly halfway along the journey. The client also runs 10 of these trials and takes the average.
But, the problem is that I'm getting different results over repeated runs of the program. Within each set of 10, each measurement rarely diverges by more than 400 milliseconds, which might be acceptable. But if I wait a few minutes between each run of the program, the resulting averages might disagree by as much as 2 seconds, which is not acceptable.
Is there a better way to figure out the difference between the clocks of two networked devices? Or is there at least a way to tweak my algorithm to yield more accurate results?
Details that may or may not be relevant: The devices are iPod Touches communicating over Bluetooth. I'm measuring pings to be anywhere from 50-200 milliseconds. I can't ask the users to sync up their clocks. :)
Update: With the help of the below answers, I wrote an objective-c class to handle this. I posted it on my blog: http://scooops.blogspot.com/2010/09/timesync-was-time-sink.html
I recently took a one-hour class on this and it wasn't long enough, but I'll try to boil it down to get you pointed in the right direction. Get ready for a little algebra.
Let s equal the time according to the server. Let c equal the time according to the client. Let d = s - c. d is what is added to the client's time to correct it to the server's time, and is what we need to solve for.
First we send a packet from the server to the client with a timestamp. When that packet is received at the client, it stores the difference between the given timestamp and its own clock as t1.
The client then sends a packet to the server with its own timestamp. The server sends the difference between the timestamp and its own clock back to the client as t2.
Note that t1 and t2 each combine the "travel time" t of the packet with the time difference d between the two clocks. Assuming for the moment that the travel time is the same in both directions, we now have two equations in two unknowns, which can be solved:
t1 = t - d
t2 = t + d
t1 + d = t2 - d
d = (t2 - t1)/2
The trick comes because the travel time is not always constant, as evidenced by your pings between 50 and 200 ms. It turns out to be most accurate to use the timestamps with the minimum ping time. That's because your ping time is the sum of the "bare metal" delay plus any delays spent waiting in router queues. Every once in a while, a lucky packet gets through without any queuing delays, so you use that minimum time as the most repeatable time.
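A minimal sketch of that idea in C++ (the sample structure and field names are my own, just to illustrate picking the lowest-RTT sample):

#include <limits>
#include <vector>

struct TimeSample
{
    double clientSend;     // client clock when the request was sent
    double serverStamp;    // server clock when it generated its timestamp
    double clientReceive;  // client clock when the reply arrived
};

// Estimate (server clock - client clock) using only the lowest-RTT sample,
// on the assumption that it spent the least time sitting in queues.
double EstimateOffset(const std::vector<TimeSample>& samples)
{
    double bestRtt    = std::numeric_limits<double>::max();
    double bestOffset = 0.0;
    for (const TimeSample& s : samples)
    {
        double rtt = s.clientReceive - s.clientSend;
        if (rtt < bestRtt)
        {
            bestRtt = rtt;
            // Assume the server stamped the reply roughly halfway through the round trip.
            bestOffset = s.serverStamp - (s.clientSend + s.clientReceive) / 2.0;
        }
    }
    return bestOffset;
}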
Also keep in mind that clocks run at different rates. For example, I can reset my computer at home to the millisecond and a day later it will be 8 seconds slow. That means you have to continually readjust d. You can use the slope of various values of d computed over time to calculate your drift and compensate for it in between measurements, but that's beyond the scope of an answer here.
Hope that helps point you in the right direction.
Your algorithm will not get much more accurate unless you apply some statistical methods. First of all, 10 samples is probably not sufficient. The first and simplest change would be to gather 100 transit-time samples and toss out the x longest and shortest.
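A quick sketch of that trimming step (the function name and trim count are placeholders):

#include <algorithm>
#include <vector>

// Discard the `trim` largest and `trim` smallest transit-time samples,
// then average what remains.
double TrimmedMean(std::vector<double> transitTimes, size_t trim)
{
    std::sort(transitTimes.begin(), transitTimes.end());
    double sum  = 0.0;
    size_t kept = 0;
    for (size_t i = trim; i + trim < transitTimes.size(); ++i)
    {
        sum += transitTimes[i];
        ++kept;
    }
    return kept > 0 ? sum / kept : 0.0;
}

You would then apply the same keep/reject decision to the corresponding clock-offset samples before averaging them.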
Another thing to add would be that both clients send their own timestamp in each packet. Then you can also calculate how different their clocks are and check the average difference between the clocks.
You can also look at SNTP and NTP implementations, as those protocols are designed to do exactly this.
