Which TCP window update is most recent? - tcp

I was writing a TCP implementation, did all the fancy slow and fast retransmission stuff, and it all worked so I thought I was done. But then I reviewed my packet receive function (almost half of the 400 lines total code), and realized that my understanding of basic flow control is incomplete...
Suppose we have a TCP connection with a "sender" and "receiver". Suppose that the "sender" is not sending anything, and the receiver is stalling and then unstalling.
Since the "sender" is not sending anything, the "receiver" sees no ack_no delta. So the two window updates from the "receiver" look like:
ack_no = X, window = 0
ack_no = X, window = 8K
Since both packets have the same ack_no, and they could be reordered in transit, how does the sender know which came first?
If the sender doesn't know which came first, then, after receiving both packets, how does it know whether it's allowed to send?
One guess is that maybe the window's upper endpoint is never allowed to decrease? Once the receiver has allocated a receive buffer and advertised it, it can never un-advertise it? In that case the window update could be reliably handled via the following code (assume no window scale, for simplicity):
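// window update (https://stackoverflow.com/questions/63931135/)
int ack_delta = pkt_ack_no - c->tx_sn_ack; // bytes newly acknowledged by this packet
// keep whichever is larger: the advertised window or what remains of the old one
c->tx_window = MAX(BE16(PKT.l4.window), c->tx_window - ack_delta);
if (c->tx_window)
    Net_Notify(); // wake up transmission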
But this is terrible from a receiver standpoint: it vastly increases the memory you'd need to support 10K connections reliably. Surely the protocol is smarter than that?

There is an implicit assumption that the right edge of the receive window never moves left, but the original standard does not spell it out as a requirement.
The original standard says that shrinking the window is "discouraged" but doesn't point out that it can't work reliably:
The mechanisms provided allow a TCP to advertise a large window and
to subsequently advertise a much smaller window without having
accepted that much data. This, so called "shrinking the window," is
strongly discouraged.
Even worse, the standard is actually missing the MAX operation proposed in the question, and just sets the window from the most recent packet if the acknowledgement number isn't increasing:
If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and
SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set
SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK.
Note that SND.WND is an offset from SND.UNA, that SND.WL1
records the sequence number of the last segment used to update
SND.WND, and that SND.WL2 records the acknowledgment number of
the last segment used to update SND.WND. The check here
prevents using old segments to update the window.
so the segment that arrives last wins, and the window will fail to grow if packets having the same ack number are reordered.
Bottom line: implement something that actually works robustly, not what's in the standard.
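One way to make the update order-independent is to track the right edge of the window (ack number plus window) and only ever move it forward. The following is a minimal sketch under that assumption; `struct tx_state`, `seq_lt` and `tx_window_update` are hypothetical names, the window is assumed to be already scaled to bytes, and validation of the ack number against SND.NXT is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sender-side state; field names are illustrative only. */
struct tx_state {
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
    uint32_t right_edge; /* highest sequence number the peer has allowed */
};

/* Serial-number comparison: true if a is before b modulo 2^32. */
bool seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Process the ack/window fields of one segment.
 * Returns the usable send window in bytes, measured from snd_una. */
uint32_t tx_window_update(struct tx_state *c, uint32_t ack_no, uint32_t window)
{
    assert(c != NULL);
    uint32_t edge = ack_no + window; /* wraps modulo 2^32 on purpose */

    if (seq_lt(c->snd_una, ack_no))
        c->snd_una = ack_no;         /* cumulative ack moved forward */
    if (seq_lt(c->right_edge, edge))
        c->right_edge = edge;        /* right edge only ever advances */

    return c->right_edge - c->snd_una;
}
```

With this rule the two updates from the question give the same result in either order: once the 8K update has been seen, a late window = 0 with the same ack number cannot pull the right edge back.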

Related

flow control implementation - how

I'm sending 1k data using TCP/IP (using FreeRTOS + lwIP). From the documentation I understood that the TCP/IP protocol has flow control built into the stack itself, but that this flow control depends on the network buffers. I'm not sure how this can be handled in my scenario, which is described below.
Receive data of 1k size over TCP/IP from wifi (the data rate will be 20Mb/s)
The received wifi data is put into a queue of 10k size (10 blocks, each block having a size of 1K)
From the queue, each block is taken and sent to another interface at a lower rate of 1Mb/s
So in this scenario, do I have to implement flow control manually between data from wifi <-> queue? How can I achieve this?
No, you do not have to implement flow control yourself; the TCP algorithm takes care of it internally.
Basically what happens is that when a TCP segment is received from your sender, lwIP will send back an ACK that includes the available space remaining in its buffers (the window size). Since the data is arriving faster than you can process it, the stack will eventually send back an ACK with a window size of zero. This tells the sender's stack to back off and try again later, which it will do automatically. When you get around to extracting more data from the network buffers, the stack should re-ACK the last segment it received, only this time it opens up the window to say that it can receive more data.
What you want to avoid is something called silly window syndrome, because it can have a drastic effect on your network utilisation and performance. Try to read data off the network in big chunks if you can. Avoid tight loops that fill a buffer 1 byte at a time.
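The bookkeeping described above can be sketched roughly as follows. This is not lwIP's actual code: `struct rx_buf` and the three functions are hypothetical, and window scaling is ignored so the value is clamped to the 16-bit header field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection receive buffer accounting. */
struct rx_buf {
    size_t capacity; /* total bytes reserved for this connection */
    size_t used;     /* bytes received but not yet read by the application */
};

/* Window to advertise in the next ACK: the free space, clamped to 16 bits. */
uint16_t rx_advertised_window(const struct rx_buf *b)
{
    assert(b != NULL && b->used <= b->capacity);
    size_t free_bytes = b->capacity - b->used;
    return free_bytes > UINT16_MAX ? UINT16_MAX : (uint16_t)free_bytes;
}

/* A segment carrying len payload bytes arrived and was queued. */
void rx_on_segment(struct rx_buf *b, size_t len)
{
    assert(b != NULL && len <= b->capacity - b->used);
    b->used += len;
}

/* The application consumed len bytes, which reopens the window. */
void rx_on_app_read(struct rx_buf *b, size_t len)
{
    assert(b != NULL && len <= b->used);
    b->used -= len;
}
```

With a 10K buffer and 1K segments, the advertised window reaches zero after ten unread segments and returns to 1K as soon as the application drains one block.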
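On the receiving side, the usual guard against silly window syndrome is to hold the advertised window fixed until the free space has grown by a worthwhile amount. The sketch below follows the rule suggested in RFC 1122 (grow only when the increase is at least the smaller of half the buffer and one MSS); the function name and parameter layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Receiver-side silly window avoidance.
 * free_bytes: free space in the receive buffer right now
 * advertised: window still on offer from the previous advertisement
 * mss:        maximum segment size of the connection
 * capacity:   total receive buffer size
 * Returns the window to advertise in the next ACK. */
uint32_t sws_window(uint32_t free_bytes, uint32_t advertised,
                    uint32_t mss, uint32_t capacity)
{
    assert(mss > 0 && advertised <= free_bytes && free_bytes <= capacity);

    uint32_t half = capacity / 2;
    uint32_t threshold = half < mss ? half : mss;

    if (free_bytes - advertised >= threshold)
        return free_bytes; /* the increase is worth announcing */
    return advertised;     /* keep the right edge where it was */
}
```

Reading one byte at a time therefore leaves the advertised window unchanged until a full segment (or half the buffer) has been freed.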

Does TCP scale to fast networks?

It seems the maximum TCP receive window size is 1GB (when scaling is used). So the largest one-way delay that would still make it possible to fill a 100Gb/s pipe with one connection is 40ms, i.e. an RTT of 80ms (because 2 * 40E-3 * 100E9 / 8 = 1GB). That would limit that sort of communication speed to a distance in the region of 10000 kilometres.
Another scaling problem seems to be that 32-bit sequence numbers don't offer protection against duplicated packets delayed by more than about 340ms (because at 100Gb/s the 4GB sequence space wraps around in that amount of time). They also limit the window size to 2GB (because they need to be split between the sender and receiver window).
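The two figures above are plain arithmetic and can be checked with a short sketch. The function names are hypothetical, and the link is assumed to run at full line rate with no protocol overhead.

```c
#include <assert.h>

/* Bytes that must be in flight to keep a link busy for one round trip
 * (the bandwidth-delay product). */
double bdp_bytes(double link_bps, double rtt_s)
{
    assert(link_bps > 0.0 && rtt_s >= 0.0);
    return link_bps / 8.0 * rtt_s;
}

/* Seconds until the 32-bit sequence number space (2^32 bytes) wraps
 * when sending at full line rate. */
double seq_wrap_seconds(double link_bps)
{
    assert(link_bps > 0.0);
    return 4294967296.0 * 8.0 / link_bps;
}
```

For a 100Gb/s link, `bdp_bytes(100e9, 0.080)` comes to about 1e9 bytes, matching the 1GB window when the question's 2 * 40ms is read as an 80ms round trip, and `seq_wrap_seconds(100e9)` comes to roughly 0.34 seconds.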
Three questions:
I am aware of TCP timestamps that can help solve the problem of sequence numbers, but I would like to know if that is a feature that just happens to help but was really designed for some other purpose. Also, I don't understand what it is that timestamps achieve that could not be done simply by increasing the number of bits used for sequence numbers.
I don't understand why the maximum receive window is just 1GB as opposed to 2GB that would presumably be trivially possible with the current headers.
Finally, I would like to know if TCP already scales well enough to be used over the sort of links that are supposedly coming soon.
Many thanks.
The TCP features you're talking about were specified in RFC 1323 in the early 1990s. The limitations you're encountering are justified by discussion text in the RFC:
The sequence number appears in the middle of the TCP segment header and could not have been lengthened without an incompatible change.
Using timestamps allows for the protocol to simultaneously measure round-trip time and protect against wrapped sequence numbers. Making the sequence number bigger would not provide any information about round-trip time.
You need the timestamps in order to measure round-trip time accurately. Measuring round-trip time without timestamps is a sampling problem, and the sampling becomes unsolvable due to aliasing if you get more than 1 error per window.
A 1 GB receive window is the largest that can be kept in sync across the connection. The RFC explains it about as well as can be done:
TCP determines if a data segment is "old" or "new" by testing
whether its sequence number is within 2**31 bytes of the left edge
of the window, and if it is not, discarding the data as "old". To
insure that new data is never mistakenly considered old and vice-
versa, the left edge of the sender's window has to be at most
2**31 away from the right edge of the receiver's window.
Similarly with the sender's right edge and receiver's left edge.
Since the right and left edges of either the sender's or
receiver's window differ by the window size, and since the sender
and receiver windows can be out of phase by at most the window
size, the above constraints imply that 2 * the max window size
must be less than 2**31, or
max window < 2**30
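The bound can be illustrated with a small sketch: RFC 1323 caps the window scale shift at 14, which keeps the largest scaled window just under 2**30, and the old/new test is an unsigned comparison modulo 2**32. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCP_MAX_WSCALE 14u /* largest shift count RFC 1323 permits */

/* Window in bytes represented by a 16-bit header field and a shift count. */
uint32_t scaled_window(uint16_t field, unsigned shift)
{
    assert(shift <= TCP_MAX_WSCALE);
    return (uint32_t)field << shift;
}

/* A sequence number counts as "new" when it lies less than 2^31 bytes
 * ahead of the left edge, computed modulo 2^32; otherwise it is "old". */
bool seq_is_new(uint32_t seq, uint32_t left_edge)
{
    return (uint32_t)(seq - left_edge) < 0x80000000u;
}
```

The largest value, `scaled_window(65535, 14)`, is 1073725440 bytes, which is 16384 bytes short of 2**30.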
As Jonathon mentioned earlier, these limitations are per-TCP connection. It's tough to think of a scenario where a single application could reach the limits of a single TCP connection, and tougher to think of one where the application couldn't open additional connection(s) if needed.

Congestion Control Algorithm at Receiver

Assume we are talking about a situation where many senders are sending packets to one receiver.
Usually the senders are the ones that control congestion, using a sliding window that limits the sending rate.
We have:
snd_cwnd = min(cwnd,rwnd)
Using explicit or implicit feedback from the network (routers, switches), the sender adjusts cwnd to control its sending rate.
Normally, rwnd is big enough that the sender only cares about cwnd. But if we also used rwnd to limit snd_cwnd, it would make congestion control more efficient.
rwnd is the number of packets (or bytes) that the receiver is able to receive. What I'm concerned about is the capability of the senders.
Questions:
1. How does the receiver know how many flows are sending packets to it?
2. Is there any way for the receiver to know the snd_cwnd of a sender?
This is all very confused.
The number of flows into a receiver isn't relevant to the rwnd of any specific flow. The rwnd is simply the amount of space left in the receive buffer for that flow.
The receiver has no need to know the sender's cwnd. That's the sender's problem.
Your statement that 'normally rwnd is always big enough that sender only cares about cwnd' is simply untrue. The receive window changes with every receive; it is re-advertised with every ACK; and it frequently drops to zero.
Your following statement 'if we consider rwnd, using it to limit cwnd ...' is simply a description of what already happens, as per 'snd_cwnd = min(cwnd, rwnd)'.
Or else it may constitute a completely unexplained proposal to needlessly modify TCP's flow control which has been working for 25 years, and which didn't work for several years before that: I remember several Arpanet freezes in the middle 1980s.
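The point that both limits are per flow and combined at the sender can be sketched as follows. `struct flow` and `flow_usable_window` are hypothetical names; all quantities are in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow sender state, all in bytes. */
struct flow {
    uint32_t cwnd;      /* congestion window, maintained by the sender */
    uint32_t rwnd;      /* receive window last advertised by the peer */
    uint32_t in_flight; /* bytes sent but not yet acknowledged */
};

/* Bytes the sender may still transmit on this flow right now:
 * min(cwnd, rwnd) minus what is already in flight, never negative. */
uint32_t flow_usable_window(const struct flow *f)
{
    assert(f != NULL);
    uint32_t limit = f->cwnd < f->rwnd ? f->cwnd : f->rwnd;
    return f->in_flight >= limit ? 0 : limit - f->in_flight;
}
```

Each flow carries its own rwnd, so the receiver never needs to know how many flows exist or what any sender's cwnd is.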

Benefit of small TCP receive window?

I am trying to learn how TCP Flow Control works when I came across the concept of receive window.
My question is, why is the TCP receive window scale-able? Are there any advantages from implementing a small receive window size?
Because as I understand it, the larger the receive window size, the higher the throughput. While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
"My question is, why is the TCP receive window scale-able?"
There are two questions there. Window scaling is the ability to multiply the advertised window by a power of 2 so you can have window sizes > 64k. However the rest of your question indicates that you are really asking why it is resizeable, to which the answer is 'so the application can choose its own receive window size'.
"Are there any advantages from implementing a small receive window size?"
Not really.
"Because as I understand it, the larger the receive window size, the higher the throughput."
Correct, up to the bandwidth-delay product. Beyond that, increasing it has no effect.
"While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?"
Yes, up to the bandwidth-delay product (see above).
"A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network),"
No it doesn't. Simulations show that if packet loss gets above a few %, TCP becomes unusable.
"the sender will not need to resend a lot of packets."
It doesn't happen like that. There aren't any advantages to small window sizes except lower memory occupancy.
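The bandwidth-delay argument can be put into numbers with a small sketch: a connection can deliver at most one window per round trip, and never more than the link rate. The function name is hypothetical, and loss and congestion control are ignored.

```c
#include <assert.h>

/* Upper bound on throughput in bits per second for a given receive
 * window, round-trip time and link rate. */
double max_throughput_bps(double window_bytes, double rtt_s, double link_bps)
{
    assert(window_bytes >= 0.0 && rtt_s > 0.0 && link_bps > 0.0);
    double window_limited = window_bytes * 8.0 / rtt_s;
    return window_limited < link_bps ? window_limited : link_bps;
}
```

For example, an unscaled 65535-byte window over a 0.5 second round trip caps the connection at about 1 Mb/s regardless of link speed, while any window above the bandwidth-delay product simply returns the link rate.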
After much reading around, I think I might just have found an answer.
Throughput is not just a function of receive window. Both small and large receive windows have their own benefits and harms.
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network), the sender will not need to resend a lot of packets.
A large receive window ensures that the sender will not be idle most of the time as it waits for the receiver to acknowledge that a packet has been received.
The receive window needs to be adjustable to get the optimal throughput for any given network.

SR & GBN: Out-of-window ACKs

I'm currently studying fairly basic networking, and I'm on the subject of reliable transmission. I'm using the book Computer Networking by Kurose & Ross, and two of the review questions were as follows:
With the selective-repeat/go-back-n protocol, it is possible for the
sender to receive an ACK for a packet that falls outside of its
current window?
For the SR version, my answer to the question was as follows:
Yes, if the window size is too big for the sequence number space. For
example, a receiver gets a number of packets equal to the space of the
sequence numbers. Its receive window has thus moved so that it is
expecting a new set of packets with the same sequence numbers as the
last one. The receiver now sends an ACK for each of the packets, but
all of them are lost along the way. This eventually causes the sender
to timeout for each of the previous set of packets, and retransmits
each of them. The receiver thinks that this duplicate set of packets
are really the new ones that it is expecting, and it sends ACKs for
each of them that successfully reaches the sender. The sender now
experiences a similar kind of confusion, where it thinks that the ACKs
are confirmations that each of the old packets have been received,
when they are really ACKs meant for the new, yet-to-be-sent packets.
I'm pretty sure this is correct (otherwise, please tell me!), since this kind of scenario seems to be the classic justification of why window size should be less than or equal to half the size of the sequence number space when it comes to SR protocols, but what about GBN?
Can the same kind of wraparound issue occur for it, making the answers mostly identical? If not, are there any other cases that can cause a typical GBN sender to receive an ACK outside of its window?
Regarding the latter, the only example I can think of is the following:
A GBN sender sends packets A & B in order. The receiver receives both in order, and sends one cumulative ACK covering every packet before and up to A, and then another one covering every packet before and up to B (including A). The first one is so heavily delayed that the second one arrives first to the sender, causing its window to slide beyond A & B. When the first one finally arrives, it needlessly acknowledges that everything up to A has been correctly received, when A is already outside of the sender's window.
This example seems rather harmless and unlikely in contrast to the previous one, so I doubt that it's correct (but again, correct me if I'm wrong, please!).
In the practical world, how about a duplicated ACK delayed long enough to fall out of the window?
The protocol is between the sender and the receiver, but it does not have control over how the media (network path) behaves.
The protocol would still be reliable according to design but the implementation shall be able to handle such out-of-window duplicated ACKs.
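Handling such ACKs usually comes down to a modular range check at the sender. The sketch below assumes textbook-style packet sequence numbers in a small space; `ack_in_window` and `sr_window_is_safe` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* True if ack lies in the sender window [base, base + window),
 * with all sequence numbers taken modulo seq_space. */
bool ack_in_window(unsigned ack, unsigned base, unsigned window,
                   unsigned seq_space)
{
    assert(seq_space > 0 && ack < seq_space && base < seq_space);
    assert(window <= seq_space);
    unsigned offset = (ack + seq_space - base) % seq_space;
    return offset < window;
}

/* Selective repeat stays unambiguous only when the window is at most
 * half of the sequence number space. */
bool sr_window_is_safe(unsigned window, unsigned seq_space)
{
    return window <= seq_space / 2;
}
```

A sender that drops every ACK for which `ack_in_window` is false is unaffected by late or duplicated ACKs, provided the window size respects the half-space rule so that an out-of-window number cannot alias a current one.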

Resources