DCTCP is a variant of TCP for data center environments.
DCTCP uses the ECN feature of commodity switches to keep the queue length of the switch buffer around a threshold K. Because K is much smaller than the buffer's capacity, the buffer is rarely close to full, so packet loss rarely happens.
DCTCP achieves low latency for small flows while maintaining high throughput for big flows. The reason is that when the queue length exceeds the threshold K, a congestion notification is fed back to the sender. The sender computes, over time, an estimate of how congested the path is, and decreases its sending rate in proportion to the extent of congestion.
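The sender-side estimate described above can be sketched as follows. This is a simplified model, not a full implementation: it assumes the sender counts how many of the packets acknowledged in one window carried an ECN mark, uses the update rule alpha = (1 - g) * alpha + g * F with g = 1/16 (the value suggested in the DCTCP paper), and cuts cwnd by alpha/2 when marks were seen. Additive increase and the once-per-window timing of the cut are left out, and the per-window counts in the example are hypothetical.

```python
def update_alpha(alpha: float, marked: int, acked: int, g: float = 1 / 16) -> float:
    """Moving average of the fraction of ECN-marked packets in one window."""
    fraction = marked / acked if acked else 0.0
    return (1 - g) * alpha + g * fraction


def react_to_marks(cwnd: float, alpha: float, marked: int) -> float:
    """Reduce cwnd in proportion to the estimated extent of congestion."""
    if marked == 0:
        return cwnd  # no congestion signal in this window: no reduction
    return cwnd * (1 - alpha / 2)  # alpha == 1 halves cwnd, like standard TCP


alpha, cwnd = 0.0, 100.0
# Hypothetical (marked, acked) counts for three consecutive windows.
for marked, acked in [(0, 100), (10, 100), (100, 100)]:
    alpha = update_alpha(alpha, marked, acked)
    cwnd = react_to_marks(cwnd, alpha, marked)
    print(f"marked={marked}/{acked} alpha={alpha:.4f} cwnd={cwnd:.2f}")
```

A few marks produce only a small reduction, while persistent marking pushes alpha toward 1 and the cut toward the classic halving.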
DCTCP states that a small queue length decreases the latency, or transmission time, of flows. I doubt that. As I see it, latency only becomes high when packet loss leads to retransmission, and in DCTCP packet loss rarely happens.
A small queue at the switch forces senders to decrease their sending rates, so packets queue in the senders' TX buffers instead.
A bigger queue at the switch lets senders send at higher rates; instead of queueing in the senders' TX buffers, packets now queue in the switch buffer.
So I think the delay is the same with both small and big queues.
What do you think?
The buffer in the switch does not increase the capacity of the network; it only helps you avoid losing too many packets during a traffic burst. TCP can deal with packet loss by sending more slowly, which is exactly what it needs to do when the network capacity is reached.
If you continuously run the network at its limit, the switch queue will be full or nearly full all the time, so you still lose packets whenever the queue is full. But you also increase latency, because each packet needs time to get from the end of the queue, where it arrived, to the front, where it will be forwarded. This latency in turn makes the TCP stack react more slowly to congestion, which further increases congestion, packet loss, and so on.
So the ideal switch behaves like a network cable, i.e. it has no buffer at all.
You can read more about the problems caused by large buffers by searching for "bufferbloat", e.g. http://en.wikipedia.org/wiki/Bufferbloat.
And when in doubt, benchmark it yourself.
It depends on queue occupancy. DCTCP aims to keep queue occupancy small because the authors consider queueing delay to be the cause of long latency.
So the maximum queue size does not matter. Whether the maximum queue size is 16Mb or just 32kb, if we can keep the queue occupancy at around 8kb or some other small size, the queueing delay will be the same.
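The point that only occupancy matters can be checked with simple arithmetic: a newly arrived packet waits for the bytes already queued to drain at the link rate, and the maximum buffer size never enters the formula. The 10 Gbit/s port speed and the "completely full 16 MiB buffer" case below are assumptions chosen only for illustration.

```python
def queueing_delay_s(occupancy_bytes: int, link_rate_bps: float) -> float:
    """Time for the bytes already queued ahead of a packet to drain."""
    return occupancy_bytes * 8 / link_rate_bps


LINK_RATE_BPS = 10e9  # assumed 10 Gbit/s egress port

# Note that the buffer's maximum size is not a parameter at all.
cases = [
    ("~8 KB held near a small threshold", 8 * 1024),
    ("hypothetical 16 MiB buffer, completely full", 16 * 1024 * 1024),
]
for label, occupancy in cases:
    print(f"{label}: {queueing_delay_s(occupancy, LINK_RATE_BPS) * 1e6:.2f} us")
```

A large buffer only hurts when it is allowed to fill, which is what the ECN threshold prevents.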
Read the HULL paper (NSDI 2012) by M. Alizadeh, the first author of DCTCP. HULL also aims to keep queue occupancy short.
What they say about small buffers reflects a trend: data center switches are shifting from 'store-and-forward' to 'cut-through' switching. Just google it and you can find documents from Cisco and related web pages.
I'm trying to understand how TCP works and I'm a bit surprised by the (absence of) effect of the receiver window (rwnd) on the congestion window (cwnd).
From what I've read (mainly Wikipedia and RFC 5681), I understand that if the slow start threshold (ssthresh) has not been reached but the transmission rate is restricted by rwnd (since the sender uses the minimum of rwnd and cwnd), then cwnd keeps increasing during the slow start phase (and even during congestion avoidance) as long as there is no loss or timeout. This means cwnd could potentially reach a very high value, since the initial value of ssthresh is extremely high.
The following quotation seems to confirm my deduction:
Implementation Note: An easy mistake to make is to simply use cwnd,
rather than FlightSize, which in some implementations may
incidentally increase well beyond rwnd.
[from RFC5681 (this part of the RFC is about setting a new value for ssthresh after a loss)]
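The deduction can be illustrated with a per-RTT toy model. It assumes one ACK per segment, no loss, slow start adding one segment per ACK, and congestion avoidance adding roughly one segment per cwnd's worth of ACKs (RFC 5681's per-ACK increase summed over an RTT). The function name and parameters are illustrative, and real stacks may behave differently.

```python
def naive_slow_start(cwnd: float, rwnd: float, ssthresh: float, rtts: int) -> list[float]:
    """cwnd (in segments) after each RTT if every ACK grows it, as the question assumes."""
    history = []
    for _ in range(rtts):
        in_flight = min(cwnd, rwnd)  # the sender may only have min(cwnd, rwnd) outstanding
        if cwnd < ssthresh:
            cwnd += in_flight  # slow start: +1 segment per ACK received
        else:
            cwnd += in_flight / cwnd  # congestion avoidance: about 1/cwnd per ACK
        history.append(cwnd)
    return history


# rwnd pinned at 10 segments; ssthresh left "arbitrarily high".
print(naive_slow_start(cwnd=10, rwnd=10, ssthresh=10**9, rtts=5))
```

Because the flight is capped at 10 segments, cwnd grows linearly rather than exponentially, but it still climbs without bound and soon exceeds rwnd by a wide margin.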
In that case, wouldn't it be possible to:
keep a connection at a relatively low transmission rate (e.g. by setting rwnd to 10 MSS in every ACK) so that there is no loss and the connection stays in the slow start phase,
wait long enough for cwnd to become extremely big (say, 10 times what the link can handle), and then
set rwnd to an even bigger value so that the transmission rate is restricted only by cwnd?
This would lead to massive congestion on the link, especially since it would take the server quite a while to notice the loss through a timeout and reset cwnd to its initial value. It could also have a huge impact on other connections using the same link, or at least the same bottleneck link.
I would have imagined that once rwnd is reached, the slow start algorithm stops and congestion avoidance takes over, reacting to any new change in the network (or an increase in rwnd).
According to https://stackoverflow.com/a/21775731/20003316, the Linux TCP implementation does not allow cwnd to increase when the sending rate is application-controlled (i.e. controlled by rwnd rather than cwnd).
Looking into this in more depth, I found that there is in fact an RFC addressing this question: https://www.rfc-editor.org/rfc/rfc7661#page-10
When in slow start:
if the number of ACKs in the current window is smaller than 0.5*cwnd, a TCP implementation must not increase cwnd.
if the number of ACKs in the current window is greater than or equal to 0.5*cwnd, a TCP implementation must increase cwnd as it normally would.
When not in slow start:
if the sending rate is restricted by rwnd rather than cwnd and the number of ACKs in the current window is smaller than 0.5*cwnd, a TCP implementation must not increase cwnd.
otherwise, proceed as usual.
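The rule as summarized above fits in a small predicate. This follows the summary's wording, not RFC 7661's exact text: the RFC defines its own measured quantity and validation phases, which are not reproduced here. The function and parameter names are hypothetical.

```python
def may_increase_cwnd(acked_in_window: int, cwnd: int,
                      in_slow_start: bool, rwnd_limited: bool) -> bool:
    """Whether cwnd may grow, per the summarized rule (a paraphrase, not RFC text)."""
    window_mostly_used = acked_in_window >= 0.5 * cwnd
    if in_slow_start:
        return window_mostly_used  # slow start grows only if half the window was used
    if rwnd_limited and not window_mostly_used:
        return False  # rwnd, not cwnd, is limiting and cwnd is mostly unused
    return True  # otherwise proceed as usual


# The question's scenario: rwnd pins the flight at 10 segments while cwnd is 40.
print(may_increase_cwnd(acked_in_window=10, cwnd=40, in_slow_start=True, rwnd_limited=True))
```

Under this rule, the attack described in the question stalls as soon as cwnd exceeds twice what rwnd lets the sender use.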
Hello all, I am new to networking and a question came to mind: would a device that is physically close to another device transfer a file more quickly than a device across the globe if a P2P connection were used?
Thanks!
No, not generally.
The maximum throughput between any two nodes is limited by the slowest interconnect in their path. When acknowledgments are used (e.g. with TCP), throughput is also limited by congestion, the send/acknowledgment window size, the round-trip time (RTT) (you cannot transfer more than one full window per RTT), and packet loss.
Distance basically doesn't matter. However, over long distances a larger number of interconnects is likely to be used, which increases the chance of a weak link, congestion, or packet loss. RTT also inevitably increases, which requires a large send window (TCP window scale option).
Whether the links are copper, fiber, or wireless doesn't matter, although wireless links carry some risk of additional packet loss. Whether you use P2P or classic client-server doesn't matter either.
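The "one full window per RTT" limit can be turned into numbers. The sketch below uses an unscaled 65,535-byte window, and the RTTs (1 ms nearby, 200 ms across the globe) and the 100 Mbit/s target are assumptions for illustration; the second helper gives the window needed to sustain a rate, i.e. the bandwidth-delay product.

```python
def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound from the window: at most one full window per round trip."""
    return window_bytes * 8 / rtt_s


def required_window_bytes(rate_bps: float, rtt_s: float) -> float:
    """Window needed to keep a path busy at rate_bps (bandwidth-delay product)."""
    return rate_bps * rtt_s / 8


WINDOW = 65_535  # largest window without the window scale option
for label, rtt in [("nearby, assumed 1 ms RTT", 0.001),
                   ("across the globe, assumed 200 ms RTT", 0.2)]:
    print(f"{label}: {max_throughput_bps(WINDOW, rtt) / 1e6:.1f} Mbit/s")

print(f"window for 100 Mbit/s at 200 ms: {required_window_bytes(100e6, 0.2):,.0f} bytes")
```

With a large enough (scaled) window, the long path can reach the same rate as the short one; distance only hurts when the window is too small.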
I have a system that sends "many" (hundreds of) UDP datagrams in bursts every once in a while (say, 10 times a minute). According to nload, this averages about 222 kbit/s. The content of these datagrams is JSON. I've considered changing the system so that it waits some time (500 ms?) and combines many of the JSON objects into one datagram before sending. But I'm not sure it's worth the effort, considering the bandwidth, the protocol, and the sending frequency. Would the new approach provide any real benefits over the current one?
The short answer is that it's up to you to decide.
The long version is that it depends on your use case. Since we don't know what you're building, it's hard to say what's more important - latency? Throughput? Reliability? Something else? Let's analyze some pros and cons. Here's what I came up with:
Pros to sending larger packets:
Fewer messages means fewer system calls and less I/O against the network. That means fewer blocked/waiting threads and less time spent on interrupts.
Fewer, larger packets means less overhead for each individual packet (things like the IP/UDP headers that are sent with each packet). Therefore a higher data rate is (theoretically) achievable, although keep in mind that all of these headers (L2+IP+UDP) typically add up to no more than 60-70 bytes per packet, since the UDP header is only 8 bytes long.
Since UDP doesn't guarantee ordering, larger packets with more time between them will reduce any existing reordering.
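The per-packet overhead argument can be quantified for a burst. The sketch assumes 42 bytes of headers per datagram (Ethernet II 14 + IPv4 20 + UDP 8, ignoring preamble, FCS and inter-frame gap, which push the figure toward the 60-70 bytes mentioned above) and a hypothetical burst of 300 messages of 200 bytes each; the bytes needed to join messages into one JSON array are ignored.

```python
import math

PER_PACKET_OVERHEAD = 14 + 20 + 8  # assumed: Ethernet II + IPv4 (no options) + UDP headers


def total_bytes(n_messages: int, msg_size: int, msgs_per_datagram: int) -> int:
    """Header plus payload bytes for a burst, ignoring separators added by batching."""
    datagrams = math.ceil(n_messages / msgs_per_datagram)
    return n_messages * msg_size + datagrams * PER_PACKET_OVERHEAD


separate = total_bytes(300, 200, msgs_per_datagram=1)
batched = total_bytes(300, 200, msgs_per_datagram=7)  # 7 * 200 B = 1400 B per datagram
print(separate, batched, f"{1 - batched / separate:.1%} fewer bytes")
```

The saving is real but modest, and it shrinks as individual messages get larger.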
Cons to sending larger packets:
Rewriting existing code and making it (slightly) more complicated.
UDP is unreliable, so a loss of a single (large) packet would be more significant compared to the loss of a small packet.
Latency - some data will have to wait up to 500 ms before being sent, which adds delay between the sender and the receiver.
Fragmentation - if one of the packets you create crosses the MTU boundary (typically 1450-1500 bytes including the IP+UDP header, which is normally 28 bytes long), the IP layer would need to fragment the packet into several smaller ones. IP fragmentation is considered bad for a multitude of reasons.
Processing of larger packets might take longer.
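If you do batch, the fragmentation concern can be handled by packing objects greedily and closing a datagram before it would exceed the MTU. This sketch assumes a 1500-byte MTU, IPv4 and JSON arrays as the batch format; `pack_json` is a hypothetical helper, and each returned payload would then be passed to something like `socket.sendto`. Re-encoding the batch on every addition keeps the code simple at the cost of extra CPU.

```python
import json

MAX_PAYLOAD = 1500 - 20 - 8  # assumed 1500-byte MTU minus IPv4 and UDP headers


def _encode(batch: list) -> bytes:
    # Compact separators keep the JSON array as small as possible.
    return json.dumps(batch, separators=(",", ":")).encode("utf-8")


def pack_json(objects: list, max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Greedily pack objects into JSON-array payloads of at most max_payload bytes."""
    payloads: list[bytes] = []
    batch: list = []
    for obj in objects:
        if len(_encode(batch + [obj])) > max_payload:
            if batch:
                payloads.append(_encode(batch))  # close the current datagram
                batch = []
            if len(_encode([obj])) > max_payload:
                raise ValueError("object too large for one datagram; it would be fragmented")
        batch.append(obj)
    if batch:
        payloads.append(_encode(batch))
    return payloads


messages = [{"id": i, "value": "x" * 50} for i in range(100)]
payloads = pack_json(messages)
print(len(payloads), max(len(p) for p in payloads))
```

The receiver then decodes each datagram as a JSON array, so a lost datagram drops a whole batch, which is the reliability trade-off listed above.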
Timeliness:
The system must deliver data in a timely manner. Data delivered late are useless. In the case of video and audio, timely delivery means delivering data as they are produced, in the same order that they are produced, and without significant delay. This kind of delivery is called real-time transmission.
Jitter:
Jitter refers to the variation in the packet arrival time. It is the uneven delay in the delivery of audio or video packets. For example, let us assume that video packets are sent every 30 ms. If some of the packets arrive with 30-ms delay and others with 40-ms delay, an uneven quality in the video is the result.
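One common way to put a number on jitter is the running interarrival-jitter estimator from RFC 3550 (RTP), swapped in here as an illustration rather than taken from the text above. It compares the spacing of arrivals with the spacing of sends and smooths the difference with a gain of 1/16. Times are in milliseconds, and the send/receive timestamps mirror the 30 ms example.

```python
def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """RFC 3550-style running estimate: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: change in one-way transit time between consecutive packets
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


send = [0, 30, 60, 90, 120, 150]  # one packet every 30 ms
steady = [s + 30 for s in send]  # every packet delayed 30 ms
uneven = [s + d for s, d in zip(send, [30, 40, 30, 40, 30, 40])]  # 30 or 40 ms
print(interarrival_jitter(send, steady), interarrival_jitter(send, uneven))
```

A constant delay gives zero jitter however large it is; only the variation counts.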
Real-time applications, such as video and VoIP, can withstand a certain amount of latency (for VoIP, this is normally considered to be 250 ms) and lost data.
Late delivery really means out-of-order delivery. Having data considered lost arrive after it is useful (e.g. packet 100 arriving after packet 110) is more disruptive than losing the data, and late-arriving data must be discarded, otherwise it creates chaos.
Unidirectional real-time data can actually stand a lot of latency: think of the seven-second delay added to real-time television and radio broadcasts. If video frames are delivered out-of-order (timeliness), they must be discarded.
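The "discard late data" rule can be sketched as a filter over arrivals. It assumes each packet carries a sequence number and that playback has no buffering at all (a real jitter buffer would hold packets briefly before giving up on them); `playable` is a hypothetical helper, and the arrival order reproduces the "packet 100 arriving after packet 110" example.

```python
from collections.abc import Iterable, Iterator


def playable(packets: Iterable[tuple[int, bytes]]) -> Iterator[tuple[int, bytes]]:
    """Yield packets in arrival order, dropping any not newer than the last one played."""
    last_played = None
    for seq, payload in packets:
        if last_played is not None and seq <= last_played:
            continue  # arrived after its moment has passed: discard it
        last_played = seq
        yield seq, payload


arrivals = [(99, b"a"), (101, b"b"), (110, b"c"), (100, b"late"), (111, b"d")]
print([seq for seq, _ in playable(arrivals)])
```

Packets 102-109 are simply treated as lost, and packet 100 is dropped instead of being played out of order.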
Jitter is variance in latency. VoIP can withstand a fair amount of latency, as long as that latency is consistent, but, even with very good latency, a lot of jitter will kill VoIP. For instance, a VoIP latency of 50 ms is good, but having packets delivered with a lot of jitter, even keeping the maximum latency under 50 ms, will destroy VoIP.
I am trying to learn how TCP flow control works and came across the concept of the receive window.
My question is, why is the TCP receive window scale-able? Are there any advantages from implementing a small receive window size?
Because as I understand it, the larger the receive window size, the higher the throughput. While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
My question is, why is the TCP receive window scale-able?
There are two questions there. Window scaling is the ability to multiply the advertised window by a power of 2 so you can have window sizes > 64k. However, the rest of your question indicates that you are really asking why it is resizable, to which the answer is 'so the application can choose its own receive window size'.
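Window scaling as described in RFC 7323 amounts to a left shift: the 16-bit window field in each segment is shifted by a count negotiated once during the handshake, and the shift is capped at 14. The helper name below is illustrative.

```python
MAX_SHIFT = 14  # RFC 7323 limits the window scale shift count to 14


def effective_window(advertised: int, shift: int) -> int:
    """Window in bytes implied by the 16-bit header field and the negotiated shift."""
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("the window field is 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return advertised << shift  # i.e. advertised * 2**shift


print(effective_window(0xFFFF, 0))   # largest window without scaling
print(effective_window(0xFFFF, 14))  # largest scaled window, about 1 GiB
```

The shift sets the maximum; the receiver still chooses, and can change, how much of it to advertise, which is the resizable part.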
Are there any advantages from implementing a small receive window size?
Not really.
Because as I understand it, the larger the receive window size, the higher the throughput.
Correct, up to the bandwidth-delay product. Beyond that, increasing it has no effect.
While the smaller the receive window, the lower the throughput, since TCP will always wait until the allocated buffer is not full before sending more data. So doesn't it make sense to have the receive window at the maximum at all times to have maximum transfer rate?
Yes, up to the bandwidth-delay product (see above).
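The plateau at the bandwidth-delay product follows from taking the smaller of the link rate and one window per RTT. The 100 Mbit/s bottleneck and 50 ms RTT are assumed values, giving a BDP of 625,000 bytes; loss and slow start are ignored.

```python
def throughput_bps(rwnd_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Steady-state bound: the smaller of the link rate and one window per RTT."""
    return min(link_bps, rwnd_bytes * 8 / rtt_s)


LINK_BPS = 100e6  # assumed 100 Mbit/s bottleneck
RTT_S = 0.05      # assumed 50 ms round trip
bdp_bytes = round(LINK_BPS * RTT_S / 8)  # bandwidth-delay product in bytes

for rwnd in (64 * 1024, 256 * 1024, bdp_bytes, 4 * bdp_bytes):
    rate = throughput_bps(rwnd, RTT_S, LINK_BPS)
    print(f"rwnd={rwnd:>9,} B -> {rate / 1e6:.1f} Mbit/s")
```

Throughput rises with the window until it reaches the BDP, then stays at the link rate; a bigger window past that point only costs memory.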
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network),
No, it doesn't. Simulations show that if packet loss rises above a few percent, TCP becomes unusable.
the sender will not need to resend a lot of packets.
It doesn't happen like that. There aren't any advantages to small window sizes except lower memory occupancy.
After much reading around, I think I might just have found an answer.
Throughput is not just a function of receive window. Both small and large receive windows have their own benefits and harms.
A small receive window ensures that when a packet loss is detected (which happens frequently on high collision network), the sender will not need to resend a lot of packets.
A large receive window ensures that the sender will not be idle most of the time while it waits for the receiver to acknowledge that a packet has been received.
The receive window needs to be adjustable to get the optimal throughput for any given network.