Does TCP scale to fast networks? - tcp

It seems the maximum TCP receive window size is 1GB (when scaling is used). So then the largest RTT that would still make it possible to fill a 100Gb pipe with one connection is 40ms (because 2 * 40E-3 * 100E9 / 8 = 1GB). That would limit that sort of communication speed to a distance IRO 10000 kilometres.
Another scaling problem seems to be that 32-bit sequence numbers don't offer protection against duplicated packets delayed by more than about 400ms (because they wrap around in that amount of time). They also limit the window size to 2GB (because they need to be split between the sender and receiver window).
Three questions:
I am aware of TCP timestamps that can help solve the problem of sequence numbers, but I would like to know if that is a feature that just happens to help but was really designed for some other purpose. Also, I don't understand what it is that timestamps achieve that could not be done simply by increasing the number of bits used for sequence numbers.
I don't understand why the maximum receive window is just 1GB as opposed to 2GB that would presumably be trivially possible with the current headers.
Finally, I would like to know if TCP already scales well enough to be used over the sort of links that are supposedly coming soon.
Many thanks.

The TCP features you're talking about were specified in RFC 1323 in the early 1990s. The limitations you're encountering are justified by discussion text in the RFC:
The sequence number appears in the middle of the TCP segment header and could not have been lengthened without an incompatible change.
Using timestamps allows for the protocol to simultaneously measure round-trip time and protect against wrapped sequence numbers. Making the sequence number bigger would not provide any information about round-trip time.
You need the timestamps in order to measure round-trip time accurately. Measuring round-trip time without timestamps is a sampling problem, and the sampling becomes unsolvable due to aliasing if you get more than 1 error per window.
A 1 GB receive window is the largest that can be kept in sync across the connection. The RFC explains it about as well as can be done:
TCP determines if a data segment is "old" or "new" by testing
whether its sequence number is within 2**31 bytes of the left edge
of the window, and if it is not, discarding the data as "old". To
insure that new data is never mistakenly considered old and vice-
versa, the left edge of the sender's window has to be at most
2**31 away from the right edge of the receiver's window.
Similarly with the sender's right edge and receiver's left edge.
Since the right and left edges of either the sender's or
receiver's window differ by the window size, and since the sender
and receiver windows can be out of phase by at most the window
size, the above constraints imply that 2 * the max window size
must be less than 2**31, or
max window < 2**30
As Jonathon mentioned earlier, these limitations are per-TCP connection. It's tough to think of a scenario where a single application could reach the limits of a single TCP connection, and tougher to think of one where the application couldn't open additional connection(s) if needed.

Related

Which TCP window update is most recent?

I was writing a TCP implementation, did all the fancy slow and fast retransmission stuff, and it all worked so I thought I was done. But then I reviewed my packet receive function (almost half of the 400 lines total code), and realized that my understanding of basic flow control is incomplete...
Suppose we have a TCP connection with a "sender" and "receiver". Suppose that the "sender" is not sending anything, and the receiver is stalling and then unstalling.
Since the "sender" is not sending anything, the "receiver" sees no ack_no delta. So the two window updates from the "receiver" look like:
ack_no = X, window = 0
ack_no = X, window = 8K
since both packets have the same ack_no, and they could be reordered in transit, how does the sender know which came first?
If the sender doesn't know which came first, then, after receiving both packets, how does it know whether it's allowed to send?
One guess is that maybe the window's upper endpoint is never allowed to decrease? Once the receiver has allocated a receive buffer and advertised it, it can never un-advertise it? In that case the window update could be reliably handled via the following code (assume no window scale, for simplicity):
// window update (https://stackoverflow.com/questions/63931135/)
int ack_delta = pkt_ack_no - c->tx_sn_ack;
c->tx_window = MAX(BE16(PKT.l4.window), c->tx_window - ack_delta);
if (c->tx_window)
Net_Notify(); // wake up transmission
But this is terrible from a receiver standpoint: it vastly increases the memory you'd need to support 10K connections reliably. Surely the protocol is smarter than that?
There is an assumption that the receive buffer never shrinks, which is intentionally undocumented to create an elite "skin in the game" club in order to limit the number of TCP implementations.
The original standard says that shrinking the window is "discouraged" but doesn't point out that it can't work reliably:
The mechanisms provided allow a TCP to advertise a large window and
to subsequently advertise a much smaller window without having
accepted that much data. This, so called "shrinking the window," is
strongly discouraged.
Even worse, the standard is actually missing the MAX operation proposed in the question, and just sets the window from the most recent packet if the acknowledgement number isn't increasing:
If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and
SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set
SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK.
Note that SND.WND is an offset from SND.UNA, that SND.WL1
records the sequence number of the last segment used to update
SND.WND, and that SND.WL2 records the acknowledgment number of
the last segment used to update SND.WND. The check here
prevents using old segments to update the window.
so it will fail to grow the window if packets having the same ack number are reordered.
Bottom line: implement something that actually works robustly, not what's in the standard.

What decides packETH's maximum packet generation speed?

I've been using packETH for a while and I have always wondered one thing.
When I set packet generation speed on Gen-b option, I realized packETH doesn't really send packets as set.
I think when I use packETH on a virtual machine, maximum speed tends to decrease.
Even if I set number of packets to send : 40000000 and set packets per second : 4000000, the operation wouldn't be finished in 10 seconds and instead I think packETH tries to send out packets as fast as possible but can't quite reach that speed and decides to send out packets slower and therefore taking longer for the operation to finish.
So, what decides packETH's maximum packet generation/transfer speed?
Does it automatically adjust the maximum speed so that the receiving server can intake all the packets correctly?
Thank you so much in advnace.
I've read about packETH and I didn't found anything related to be a multi-threaded package sender, so there should be a problem. What you want is a multithreaded package sender which can receive any amount of packages and send them in parallel. But first, let focus on packETH:
You have tried which configuration?
In the Auto mode you can choose one of five distribution modes. Except the random mode you can see different timings by choosing different mode. In the random mode the generator tries to be smart :). Beside timing you can also specify the amount of traffic for each stream. In the manual mode you select all of the parameters by hand.
Here is where I've found it: http://packeth.sourceforge.net/packeth/GUI_version.html
Related to a multithreaded sender I would suggest trafgen, let's expose some features:
This will help you at not worrying about limit
Process a number of packets and then exit. If the number of packets is 0, then this is equivalent to infinite packets resp. processing until interrupted. Otherwise, a number given as an unsigned integer will limit processing.
This will ensure paralelism
Specify the number of processes trafgen shall fork(2) off. By default trafgen will start as many processes as CPUs that are online and pin them to each, respectively. Allowed value must be within interval [1,CPUs].

What is the difference between the delay and the jitter in the context of real time applications?

According to Wikipedia Jitter is the undesired deviation from true periodicity of an assumed periodic signal, according to a papper on QoS that I am reading jitter is reffered to as delay variation. Are there any definition of the jitter in the context of real time applications? Are there applications that are sensitive to jitter but not sensitive to delay? If for example a streaming application use some kind of buffer to store packets before show them to the user, is it possible that this application is not sensitive to delay but is sensitive to jitter?
Delay: Is the amount of time data(signal) takes to reach the destination. Now a higher delay generally means congestion of some sort of breaking of the communication link.
Jitter: Is the variation of delay time. This happens when a system is not in deterministic state eg. Video Streaming suffers from jitter a lot because the size of data transferred is quite large and hence no way of saying how long it might take to transfer.
If your application is sensitive to jitter it is definitely sensitive to delay.
In Real-time Protocol (RTP, RFC3550), a header contains a timestamp field. The value of it usually comes from a monotonically incremented counter and the frequency of the increment is the clock-rate. This clock-rate must be the same all over the participant wants something with the timestamp field. The counters have different base offsets, because the start time may different or they contains it because of security reason, etc... All in all we say the clocks are not syncronized.
To show it in an example consider if we refer to snd_timestamp and rcv_timestamp the most recent packet sender timestamp from the RTP header field and receiver timestamp generated by the receiver using the same clock-rate.
The wrong conclusion is that
delay_in_timestamp_unit = rcv_timestamp - snd_timestamp
If the receiver and sender clock-rate has different base offset (and they have), this not gives you the delay, also it doesn't consider the wrap around the 32bit unsigned integer.
But monitoring the time for delivering packets is somehow necessary if we want a proper playout adaption algorithm or if we want to detect and avoid congestions.
Also note that if we have syncronized clocks delay_in_timestamp_unit might be not punctually represent the pure network delay, because of components at the sender or at the receiver side retaining these packets after and/or before the timestamp added and/or exemined. So if you calculate a 2seconds delay between the participant, but you know your network delay is around 100ms, then your packets suffer additional delays at the sender or/and at the receiver side. But that additional delay is somehow (or at least you hope that it is) constant, so the only delay changes in time is - hopefully - the network delay. So you should not say that if packet delay > 500ms then we have a congestion, because you have no idea what is the actual network delay if you use only one packet sender and receiver timestamp information.
But the difference between the delays of two consecutive packets might gives you some information about weather something wrong in the network or not.
diff_delay = delay_t0 - delay_t1
if diff_delay equals to 0 the delay is the same, if it greater than 0 the newly arrived packets needed more time then the previous one, and if it smaller than 0 it needed less time.
And from that relative information based on two consecutive delays you could say something.
How you determine the difference between two delay if the clocks are not syncronized?
Consider you stored the last timestamps in rcv_timestamp_t1 and snd_timestamp_t1
diff_delay = (rcv_timestamp_t0 - snd_timestamp_t0) - (rcv_timestamp_t1 - snd_timestamp_t1)
but that would be problem without maintaining the base offsets of the sender and the receiver, so reordering it:
diff_delay = (rcv_timestamp_t0 - rcv_timestamp_t1) - (snd_timestamp_t0 - snd_timestamp_t1)
and here you can subtract rcv timestamps from each other and it eliminates the offset rcv and snd contain, and then you can extract the rcv_diff from snd_diff and it gives you the information about the difference of the delays of two consecutive packets in the unit of the clock-rate.
Now, according to RFC3550 jitter is "An estimate of the statistical variance of the RTP data packet interarrival time".
In order to finally get to the point your question is
"What is the difference between the delay and the jitter in the context of real time applications?"
Tiny note, but real-time applications usually refer to systems processing data in a range of nanoseconds, so I think you refer to end-to-end systems.
Also despite of several altered definition of jitter, it all uses the difference of the delays of arrived packets and thus provide you information about the relative changes of the network delay, meanwhile delay itself is an absolute value of the time of delivery.

Benefit of small TCP receive window?

I am trying to learn how TCP Flow Control works when I came across the concept of receive window.
My question is, why is the TCP receive window scale-able? Are there any advantages from implementing a small receive window size?
Because as I understand it, the larger the receive window size, the higher the throughput. While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
My question is, why is the TCP receive window scale-able?
There are two questions there. Window scaling is the ability to multiply the scale by a power of 2 so you can have window sizes > 64k. However the rest of your question indicates that you are really asking why it is resizeable, to which the answer is 'so the application can choose its own receive window size'.
Are there any advantages from implementing a small receive window size?
Not really.
Because as I understand it, the larger the receive window size, the higher the throughput.
Correct, up to the bandwidth-delay product. Beyond that, increasing it has no effect.
While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
Yes, up to the bandwidth-delay product (see above).
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network),
No it doesn't. Simulations show that if packet loss gets above a few %, TCP becomes unusable.
the sender will not need to resend a lot of packets.
It doesn't happen like that. There aren't any advantages to small window sizes except lower memory occupancy.
After much reading around, I think I might just have found an answer.
Throughput is not just a function of receive window. Both small and large receive windows have their own benefits and harms.
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network), the sender will not need to resend a lot of packets.
A large receive window ensures that the sender will not be idle a most of the time as it waits for the receiver to acknowledge that a packet has been received.
The receive window needs to be adjustable to get the optimal throughput for any given network.

Compensating for jitter

I have a voice-chat service which is experiencing variations in the delay between packets. I was wondering what the proper response to this is, and how to compensate for it?
For example, should I adjust my audio buffers in some way?
Thanks
You don't say if this is an application you are developing yourself or one which you are simply using - you will obviously have more control over the former so that may be important.
Either way, it may be that your network is simply not good enough to support VoIP, in which case you really need to concentrate on improving the network or using a different one.
VoIP typically requires an end to end delay of less than 200ms (milli seconds) before the users perceive an issue.
Jitter is also important - in simple terms it is the variance in end to end packet delay. For example the delay between packet 1 and packet 2 may be 20ms but the delay between packet 2 and packet 3 may be 30 ms. Having a jitter buffer of 40ms would mean your application would wait up to 40ms between packets so would not 'lose' any of these packets.
Any packet not received within the jitter buffer window is usually ignored and hence there is a relationship between jitter and the effective packet loss value for your connection. Packet loss typically impacts users perception of voip quality also - different codes have different tolerance - a common target might be that it should be lower than 1%-5%. Packet loss concealment techniques can help if it just an intermittent problem.
Jitter buffers will either be static or dynamic (adaptive) - in either case, the bigger they get the greater the chance they will introduce delay into the call and you get back to the delay issue above. A typical jitter buffer might be between 20 and 50ms, either set statically or adapting automatically based on network conditions.
Good references for further information are:
- http://www.voiptroubleshooter.com/indepth/jittersources.html
- http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a00800945df.shtml
It is also worth trying some of the common internet connection online speed tests available as many will have specific VoIP test that will give you an idea if your local connection is good enough for VoIP (although bear in mind that these tests only indicate the conditions at the exact time you are running your test).

Resources