I'm trying to understand how TCP works and I'm a bit surprised by the (absence of) effect of the receiver window (rwnd) on the congestion window (cwnd).
From what I've read (mainly wikipedia and RFC5681) I understand that if the slow start threshold (ssthresh) has not been reached but the transmission rate is restricted by rwnd (since it is the minimum value between rwnd and cwnd) then cwnd continues to increase during the slow start phase (and even during congestion avoidance) if there are no loss or timeout. Meaning that cwnd could potentially reach a very high value since the initial value of ssthresh is extremely big.
See the following citation to confirm my deduction :
Implementation Note: An easy mistake to make is to simply use cwnd,
rather than FlightSize, which in some implementations may
incidentally increase well beyond rwnd.
[from RFC5681 (this part of the RFC is about setting a new value for ssthresh after a loss)]
In this case wouldn't it be possible to :
keep a connection with a relatively low transmission rate (e.g. setting rwnd to 10mss in every ack) to have no loss and hence keep the connection in the slow start phase,
wait enough time to allow cwnd to be extremely big (like 10 times what the link can handle) and then
set rwnd to an even bigger value to let the transmission rate be restricted only by cwnd ?
This would lead to a massive amount of congestion on the link, especially since it will take quite a lot of time for the server to notice the loss with a timeout and reset cwnd back to its initial value... and this may have a huge impact on other connections using the same link, or at least the same bottleneck link.
I would have imagined that once rcwnd is reached, slow start algorithm stops and congestion avoidance would begin to react to any new change in the network (or an increase in rwnd).
According to https://stackoverflow.com/a/21775731/20003316, Linux implementation of TCP does not allow cwnd to increase when the sending rate is application-controlled (= sending rate is controlled by rwnd and not cwnd).
By looking more in depth into this, I've found that in fact there is an RFC handling this question : https://www.rfc-editor.org/rfc/rfc7661#page-10
When in slow start :
if the number of ACK in the current window is smaller than 0.5*cwnd, then a TCP implementation must not increase the value of cwnd.
if the number of ACK in the current window is greater or equal than 0.5*cwnd, then a TCP implementation must increase the value of cwnd as it would normally do.
When not in slow start:
if the sending rate is restricted by rwnd and not cwnd and the number of ACK in the current window is smaller than 0.5*cwnd, then a TCP implementation must not increase cwnd.
otherwise proceed as usual
Related
I received an assignment from the College where I have to implement a reliable transfer through UDP aka. TCP Over UDP (I know, reinvent the wheel since this has already been implemented on TCP) to know in deep how TCP works. Some of the requirements are: 3-Way Handshake, Congestion Control (TCP Tahoe, in particular) and Waved Hands. I think about doing this with Java or Python.
Some more specific requirements are:
After each ACK is received:
(Slow start) If CWND < SS-THRESH: CWND += 512
(Congestion Avoidance) If CWND >= SS-THRESH: CWND += (512 * 512) / CWND
After timeout, set SS-THRESH -> CWND / 2, CWND -> 512, and retransmit data after the last acknowledged byte.
I couldn't find more specific information about the TCP Tahoe implementation. But from what I understand, TCP Tahoe is based on Go-Back-N, so I found the following pseudo algorithm for sender and receiver:
My question is the Slow Start and Congestion Avoidance phase should happen right after if sendbase == nextseqnum? That is, right after confirming the receipt of an expected ACK?
My other question is about the Window Size, Go-Back-N uses a fixed window whereas TCP Tahoe uses a dynamic window. How can I calculate window size based on cwnd?
Note: your pictures are unreadable, please provide a higher resolution images
I don't think that algorithm is correct. A timer should be associated with each packet and stopped when ACK for this packet is received. Congestion control is triggered when the timer for any of the packets fires.
TCP is not exactly Go-Back-N receiver. In TCP receiver has a buffer too. This does not require any changes at the sender Go-Back-N. However, TCP is also supposed to implement flow control, in which the receiver tells the sender how much space in its buffer remains, and the sender adjusts its window accordingly.
Note, that Go-Back-N sequence number count packets, and TCP sequence numbers count bytes in the packets, you have to change your algorithm accordingly.
I would advice to get somewhat familiar with rfc793. It does not have congestion control, but it specifies how other TCP mechanics is supposed to work. Also this link has a nice illustration of TCP window and all variables associated with it.
My question is the Slow Start and Congestion Avoidance phase should happen right after if sendbase == nextseqnum? That is, right after confirming the receipt of an expected ACK?
your algorithm only does something when it receives ACK for the last packet. As I said, this is incorrect.
Regardless. Every ACK that acknowledges new packet shoult trigger window increase. You can do check this by checking if send_base was increased as the result of an ACK.
Dunno if every Tahoe implementation does this, but you may need this also. After three consequtive duplicate ACKs, i.e., ACKs that do not increase send_base you trigger congestion response.
My other question is about the Window Size, Go-Back-N uses a fixed window whereas TCP Tahoe uses a dynamic window. How can I calculate window size based on cwnd?
you make the N variable instead of constant, and assign congestion window to it.
in a real TCP with flow control you do N = min (cwnd, receiver_window).
Assume we talking about the situation of many senders sending packets to a receiver.
Often senders would be the one that control congestion by using sliding window that limits sending rate.
We have:
snd_cwnd = min(cwnd,rwnd)
Using explicit or implicit feedback information from network (router,switch), sender would control cwnd to control sending rate.
Normally, rwnd is always big enough that sender only care about cwnd. But if we consider rwnd, using it to limit snd_cwnd, it would make congestion control more efficiently.
rwnd is the number of packets (or bytes) that receiver be able to receive. What I'm concerned about is capability of senders.
Questions:
1. So how do receiver know how many flows sending packets to it?
2. Is there anyway that receiver know the snd_cwnd of sender?
This is all very confused.
The number of flows into a receiver isn't relevant to the rwnd of any specific flow. The rwnd is simply the amount of space left in the receive buffer for that flow.
The receiver has no need to know the sender's cwnd. That's the sender's problem.
Your statement that 'normally rwnd is always big enough that sender only cares about cwnd' is simply untrue. The receive window changes with every receive; it is re-advertised with every ACK; and it frequently drops to zero.
Your following statement 'if we consider rwnd, using it to limit cwnd ...' is simply a description of what already happens, as per 'snd_cwnd = min(cwnd, rwnd)'.
Or else it may constitute a completely unexplained proposal to needlessly modify TCP's flow control which has been working for 25 years, and which didn't work for several years before that: I remember several Arpanet freezes in the middle 1980s.
DCTCP is a variant of TCP for Data Center environment. The source is here
DCTCP using ECN feature in commodity switch to limit queue length of buffer in switch around the threshold K. Doing so, packet loss is rarely happen because K is much smaller than buffer's capacity so buffer isn't almost full.
DCTCP achieve low latency for small-flows while maintaining high throughput for big-flow. The reason is when queue length exceeds threshold K, a notification of congestion will be feedback to sender. At sender, a value for probability of congestion is computed over time, so sender will decrease sending rate correspondingly to the extent of congestion.
DCTCP states that small queue length will decrease the latency or the transmission time of flows. I doubted that. Because unless packet loss leading to re-transmission and so high latency. In DCTCP, packet loss rarely happens.
Small queue at switch forces senders to decrease sending rates so force packets to queue in TX buffer of senders.
Bigger queue at switch make senders have higher sending rates and packets instead queue in TX buffer of senders, it now queue in buffer of switch.
So I think that delay in both small and big queue is still the same.
What do you think?
The buffer in the switch does not increase the capacity of the network, it only helps to not loose too much packets if you have a traffic burst. But, TCP can deal with packet loss by sending slower, which is exactly what it needs to do in case the network capacity is reached.
If you continuously run the network at the limit, the queue of the switch will be full or nearly full all the time, so you still loose packets if the queue is full. But, you also increase the latency, because the packet needs some time to get from the end of the queue where it arrived to the beginning where it will be forwarded. This latency again causes the TCP stack to react slower to congestion, which again increases congestion, packet loss etc.
So the ideal switch behaves like a network cable, e.g. does not have any buffer at all.
You might read more about the problems caused by large buffers by searching for "bufferbloat", e.g. http://en.wikipedia.org/wiki/Bufferbloat.
And when in doubt benchmark yourself.
It depends on queue occupancy. DCTCP aims to maintain small queue occupancy, because the authors think that queueing delay is the reason of long latency.
So, it does not matter how maximum size of queue is. In 16Mb of maximum queue size or just 32kb of maximum queue size, if we can maintain queue occupancy always around 8kb or something small size, queueing delay will be the same.
Read a paper, HULL from NSDI 2012, of M. Alizadeh who is the first author of DCTCP. HULL also aims to maintain short queue occupancy.
What they talk about small buffer is, because trends of data center switches shift from 'store and forward' buffer to 'cut-through' buffer. Just google it, and you can find some documents from CISCO or somewhere related webpages.
I am trying to learn how TCP Flow Control works when I came across the concept of receive window.
My question is, why is the TCP receive window scale-able? Are there any advantages from implementing a small receive window size?
Because as I understand it, the larger the receive window size, the higher the throughput. While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
My question is, why is the TCP receive window scale-able?
There are two questions there. Window scaling is the ability to multiply the scale by a power of 2 so you can have window sizes > 64k. However the rest of your question indicates that you are really asking why it is resizeable, to which the answer is 'so the application can choose its own receive window size'.
Are there any advantages from implementing a small receive window size?
Not really.
Because as I understand it, the larger the receive window size, the higher the throughput.
Correct, up to the bandwidth-delay product. Beyond that, increasing it has no effect.
While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
Yes, up to the bandwidth-delay product (see above).
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network),
No it doesn't. Simulations show that if packet loss gets above a few %, TCP becomes unusable.
the sender will not need to resend a lot of packets.
It doesn't happen like that. There aren't any advantages to small window sizes except lower memory occupancy.
After much reading around, I think I might just have found an answer.
Throughput is not just a function of receive window. Both small and large receive windows have their own benefits and harms.
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network), the sender will not need to resend a lot of packets.
A large receive window ensures that the sender will not be idle a most of the time as it waits for the receiver to acknowledge that a packet has been received.
The receive window needs to be adjustable to get the optimal throughput for any given network.
the timeout interval dynamically changes depending on the network. It is generally represented by
TimeoutInterval = EstimatedRTT + 4*DevRTT
But why do we you 4*DevRTT?
Why can't it be 2*DevRTT??
You could set it to that, but you would be decreasing the amount of cushion you are giving variations in RTT by half.
If you have wide variances in RTT, which can happen in more situations than you realize, then you would be setting the timeout value relatively low.
Because this timeout controls the re-transmission of data, setting this level lower almost certainly means that the number of re-transmission will increase in certain scenarios. The concern would be that these re-transmissions are unnecessary, and possibly increase utilization of an already saturated network.