Why is the congestion control process deployed at sender nodes?
I know one of the limitations that prevents deploying congestion control at routers is that routers would need to maintain per-flow state, which increases their burden. Putting this aside, are there any other drawbacks that limit deploying congestion control at the router side?
There are indeed congestion controls that either require support from or are deployed in routers. An example is XCP (https://www.ietf.org/proceedings/61/slides/tsvwg-5.pdf), in which routers allocate bandwidth without keeping per-flow state. Another is Data Center TCP, which uses ECN marks provided by the routers to detect the extent of congestion. These two examples are meant for networks under a single authority. In the Internet there are many authorities/actors with different goals. If we were to put the congestion control in the routers, what congestion control policy do we choose?
Imagine that you have two flows, A and B, and two routers, R1 and R2. R1 has a capacity of 100 Mbit/s, and R2 has a capacity of 10 Mbit/s. Flow A goes through only R1, while flow B goes through R1 and R2. Let us say that we share the capacity of R1 equally: A and B get 50 Mbit/s each. B goes through R2, which has only 10 Mbit/s, so it is unable to use the 50 Mbit/s given to it by R1. What should happen in this case? R1 should probably change the allocation, but how? If the routers are in different domains that do not trust each other, negotiation is out of the question. The routers do not trust the end-systems, so the end-systems cannot communicate the allocation to the routers.
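For what it's worth, the textbook answer to "how should R1 change the allocation?" is max-min fairness ("water-filling"), which in this example gives B 10 Mbit/s and A the remaining 90 Mbit/s. A toy Python sketch, purely for illustration (link and flow names are just the ones from the example above):

    def max_min_fair(capacities, paths):
        """capacities: link -> Mbit/s; paths: flow -> set of links."""
        rates, remaining, active = {}, dict(capacities), set(paths)
        while active:
            # fair share on each link among the active flows crossing it
            shares = {}
            for link, cap in remaining.items():
                users = [f for f in active if link in paths[f]]
                if users:
                    shares[link] = cap / len(users)
            bottleneck = min(shares, key=shares.get)
            share = shares[bottleneck]
            # flows through the bottleneck link are frozen at the fair share
            for f in [f for f in active if bottleneck in paths[f]]:
                rates[f] = share
                active.discard(f)
                for link in paths[f]:
                    remaining[link] -= share
        return rates

    print(max_min_fair({"R1": 100, "R2": 10},
                       {"A": {"R1"}, "B": {"R1", "R2"}}))
    # -> {'B': 10.0, 'A': 90.0}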
The main issue, as I see it, is defining a congestion control policy that is accepted by all the actors in the Internet. A strong argument for having congestion control in the end-system is the end-to-end principle. TCP congestion control is transport-layer functionality and should not be implemented in the internet layer, because not all transport-layer protocols use it (UDP, for example).
Related
Is it possible for each node connected to the same router to implement a different congestion avoidance technique? Also, is it possible to completely disable congestion avoidance in a node connected to a router? Thanks.
The answer depends on your definition of "possible".
Internet hosts should adhere to Internet standards. According to these documents, TCP should implement a congestion control algorithm that is "Reno-friendly", that is, one that can coexist with Reno (see RFC 5681). So, if you need a TCP implementation that adheres to these standards, then the answer is no.
Enforcing this is, however, another issue, which as far as I know does not really have a solution. So, if the question is whether you can implement whatever congestion control you like, or none at all, and still connect to the Internet, then yes.
Is it actually done? Yes. As of now, Linux hosts use TCP CUBIC, and Windows uses another congestion control mechanism, whose name I don't remember. They are both Reno-friendly and coexist with each other, but they are different, and they differ from Reno. Recently, Google deployed BBR, which may or may not be Reno-friendly. Moreover, real-time multimedia streams (e.g., voice or video conferences) should also use some kind of congestion control, so they contribute to the variety as well.
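On Linux you can even see this variety per socket: the kernel exposes its loaded congestion control modules, and an application may pick one with a socket option. A small sketch (Python 3.6+, Linux only; whether e.g. "bbr" is available depends on which kernel modules are loaded):

    import socket

    # Ask the kernel to use a specific congestion control for this socket.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")

    # Read back what the socket is actually using.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))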
Will the router care? Not really. A router does not care whether the attached hosts implement congestion control or not. A simple router will do exactly the same thing either way: it will take incoming packets and then either send them if the outgoing interface is free, queue them if the interface is currently busy transmitting other packets and there is space in the queue for that interface, or drop them if the queue is full. More complicated routers can employ things like active queue management schemes, or quality of service with rate control. This affects how the router handles packets, but it doesn't change the functionality of the router; it has to take misbehaving hosts into account anyway.
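The per-interface behaviour described above is often called a drop-tail queue. A minimal sketch of just the queue/drop decision (the queue limit is an arbitrary number here):

    from collections import deque

    class DropTailInterface:
        def __init__(self, queue_limit=100):   # arbitrary limit
            self.queue = deque()
            self.queue_limit = queue_limit
            self.dropped = 0

        def enqueue(self, packet):
            if len(self.queue) < self.queue_limit:
                self.queue.append(packet)      # interface busy: buffer it
            else:
                self.dropped += 1              # queue full: drop the packet

        def transmit(self):
            # called when the outgoing interface becomes free
            return self.queue.popleft() if self.queue else None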
What will be affected? Applications. Application performance for several flows sharing the same bottleneck will be affected if the hosts implement different congestion control mechanisms, or none at all. Exactly how depends on the bottleneck bandwidth, the capabilities of the routers, and the traffic patterns of the applications, so it is not possible to say in general. However, there is definitely a possibility that the network will not be able to transmit useful traffic (this is known as congestion collapse). One other important thing that will most likely be affected is fairness, which more or less quantifies how equally several flows sharing the same bottleneck share the available bandwidth. A flow that does not implement congestion control can hijack all the available bandwidth, and the same applies between flows that use a more aggressive congestion control than TCP Reno and flows that don't. So it is not nice not to implement congestion control. Of course the router can actually do something about it, but that requires fairly expensive per-flow scheduling (search for fair queueing or flow queueing), and routers usually do not do this.
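To make the fair-queueing idea concrete, here is a toy round-robin scheduler: one FIFO per flow, serviced in turn, so a non-cooperating flow can only ever hurt itself. This is a sketch of the concept, not of any production queueing discipline:

    from collections import deque

    class RoundRobinScheduler:
        def __init__(self):
            self.flows = {}          # flow id -> deque of packets
            self.order = deque()     # round-robin order of active flows

        def enqueue(self, flow_id, packet):
            if flow_id not in self.flows:
                self.flows[flow_id] = deque()
                self.order.append(flow_id)
            self.flows[flow_id].append(packet)

        def dequeue(self):
            # serve each active flow one packet at a time
            while self.order:
                flow_id = self.order.popleft()
                q = self.flows[flow_id]
                packet = q.popleft()
                if q:
                    self.order.append(flow_id)   # flow still has packets
                else:
                    del self.flows[flow_id]      # flow drained
                return packet
            return None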
References:
Requirements for Internet hosts: RFC 1122
TCP congestion control: RFC 5681
The gossip protocol is used by many distributed systems, e.g. Cassandra, to communicate with other nodes in the ring. So, does it use HTTP or TCP?
Also, what are the pros of choosing one over the other in distributed systems?
You can use any protocol you want (TCP, HTTP, DNS, etc.) to broadcast information regarding the state of the nodes in your cluster. In my opinion, you should focus on the gossip algorithm itself and not read too much into the word "protocol" in the name. At its core, it's all about broadcasting information between nodes: each node sends its own view of the cluster state to a subgroup of nodes, and the broadcast keeps going until all nodes share the same view. There are multiple ways of implementing such a broadcasting algorithm, so research it further or try your own model :) .
Here is some nice info and pseudo code about gossip model/algorithms
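If it helps, here is a toy push-gossip sketch in Python, with the transport deliberately abstracted away (per the point above); the FANOUT value and class names are just illustrative:

    import random

    FANOUT = 3  # how many peers each node pushes to per round

    class Node:
        def __init__(self, node_id):
            self.node_id = node_id
            self.state = {node_id: 0}   # node_id -> highest version seen

        def receive(self, remote_state):
            # merge: keep the newest version known for every node
            for node_id, version in remote_state.items():
                if version > self.state.get(node_id, -1):
                    self.state[node_id] = version

    def gossip_round(node, peers):
        # push our whole view to a random subset of peers; over enough
        # rounds, every node converges to the same view
        for peer in random.sample(peers, min(FANOUT, len(peers))):
            peer.receive(dict(node.state))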
HTTP and TCP are fundamentally different things as they work on different layers of the network stack:
https://en.wikipedia.org/wiki/OSI_model
If you look at the OSI model, TCP works on the transport layer (layer 4) and HTTP works on the application layer (layer 7). The two perform different jobs: the transport layer is responsible for providing the functional mechanisms for transferring data, while the application layer is built on top of the transport (and other) layers and provides things such as partner negotiation, availability, and communication syncing.
The two are not interchangeable with one another.
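One way to see the layering concretely: HTTP is just application-layer text carried over a TCP connection. A quick sketch (example.com is just a stand-in host):

    import socket

    # Open a transport-layer (TCP) connection, then speak the
    # application-layer protocol (HTTP) over it by hand.
    with socket.create_connection(("example.com", 80)) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        print(s.recv(4096).splitlines()[0])   # e.g. b'HTTP/1.1 200 OK'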
Coming, as I do, from a background of vSphere VMs with vNICs defined at creation: do GCE instances' internal and public IP network connections use a particular virtualised NIC, and if so, what speed is it: 100 Mbit/s, 1 Gbit/s, or 10 Gbit/s?
I'm not so much interested in the bandwidth from the public Internet in, but more in what kind of connection is possible between instances, given that networks can span regions.
Is it right to think of a GCE project network as a logical 100 Mbit/s, 1 Gbit/s, or 10 Gbit/s network spanning the Atlantic that I plug my instances into? Or should there be no minimum expectation, because too many variables exist: noisy neighbours and inter-region bandwidth, not to mention physical distance?
The virtual network adapter advertised in GCE conforms to the virtio-net specification (specifically virtio-net 0.9.5 with multiqueue). Within the same zone we offer up to 2Gbps/core of network throughput. The NIC itself does not advertise a specific speed. Performance between zones and between regions is subject to capacity limits and quality-of-service within Google's WAN.
The performance-relevant features advertised by our virtual NIC as of December 2015 include support for:
IPv4 TCP Transport Segmentation Offload
IPv4 TCP Large Receive Offload
IPv4 TCP/UDP Tx checksum calculation offload
IPv4 TCP/UDP Rx checksum verification offload
Event-based queue signaling / interrupt suppression.
In our testing, for best performance it is advantageous to enable all of these features. Images supplied by Google will take advantage of all the features available in the shipping kernel (that is, some images ship with older kernels for stability and may not be able to take advantage of all of these features).
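If you're on a Linux guest, you can typically inspect which of these offloads the virtio NIC currently has enabled with ethtool -k eth0, and toggle individual features with ethtool -K; exact feature names vary by driver and kernel version.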
I have seen up to 1 Gb/s between instances within the same zone, but AFAIK that is not something that is guaranteed, especially for transatlantic communication. Things might change in the future, so I'd suggest following official product announcements.
There have been a few enhancements in the years since the original question and answers were posted. In particular, the "2Gbps/core" (really, per vCPU) is still there but there is now a minimum cap of 10 Gbps for VMs with two or more vCPUs. The maximum cap is currently 32 Gbps, with 50 Gbps and 100 Gbps caps in the works.
The per-VM egress caps remain "guaranteed not to exceed" not "guaranteed to achieve."
In terms of achieving peak trans-Atlantic performance, one suggestion would be the same as for any high-latency path: ensure that your sources and destinations are tuned to allow a sufficient TCP window to achieve the throughput you desire. In particular, this formula is in effect:
Throughput <= WindowSize / RoundTripTime
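For example, at a trans-Atlantic round-trip time of, say, 100 ms, sustaining 10 Gbit/s requires a window of at least 10 Gbit/s × 0.1 s = 1 Gbit ≈ 125 MB, which is far beyond default TCP buffer sizes on most systems.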
Of course that too is a "guaranteed not to exceed" rather than a "guaranteed to achieve" thing. As was stated before "Performance between zones and between regions is subject to capacity limits and quality-of-service within Google's WAN."
We're about to design an in-house industrial network consisting basically of the following: one server connected by wire to up to 100 proprietary RF access points (basically embedded devices), each of which can be connected by radio to up to 100 embedded endpoint devices. Something like this:
Now, I'm wondering about some design decisions that we need to take and I'm sure there are plenty of similar designs out there and lots of folks with experiences of them, both good and bad. Maybe you can chime in?
All endpoint devices are independent and will communicate their own unique data to the server, and the other way around. The server therefore needs to be able to target each endpoint device individually. Each endpoint device pairs itself with 1 access point and then talks a proprietary RF protocol to it, TCP/IP is not an option there.
The server will know which endpoint device is paired with which access point, so when the server needs to talk to an individual endpoint device, the communication must go through the paired access point. Hence, the server needs to directly address the access point.
Question: Considering the limited resources available in the proprietary access point, is TCP/IP between server and access point recommended for this scenario? Or would you suggest something entirely different?
I find the diagram confusing:
If this isn't its own network and the server-to-AP link is running on your internal company network, there isn't really an option: there must be a TCP/IP stack on the AP.
If this is its own isolated network then what is the router for?
If this is, in fact, its own isolated network, then you are right, there really isn't a need for the Ethernet connectivity at all. The overhead you will see on the wireless side is huge: your zero-overhead ideal data rate is 250 kbit/s, but running ZigBee on 802.15.4 @ 2.4 GHz point-to-point, your real data throughput is usually around 20 kbit/s. A custom protocol should be able to achieve lower overhead, but this would need to be defined.
If I were designing this, I would choose a SoC for the AP that has on-board 802.15.4 and CAN (Controller Area Network). Depending on size and data rate, just get a PCI CAN card for the server and connect it up; use something like DeviceNet as your protocol layer for server-to-AP communications. This can be expanded by using CAN switches and repeaters. CAN is used all the time in industrial automation; a little googling can find you examples of tens of thousands of nodes used in some manufacturing plants.
There are small TCP/IP stacks, for example lwIP.
You didn't mention the amount of data to be communicated, or any bandwidth considerations.
A third-party TCP/IP stack targeted at the 8051 would simplify all the networking issues with connecting 100 units. You will probably still end up with a proprietary protocol that sits on top of the TCP/IP stack, but then it is just simple point-to-point communication between the server and each endpoint.
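As a rough illustration of how thin that point-to-point layer can be, here is a sketch of the server side, assuming a made-up length-prefixed framing over TCP (the AP address, port, and payload here are all hypothetical):

    import socket
    import struct

    def send_to_ap(ap_addr, payload: bytes):
        # frame: 2-byte big-endian length, then the proprietary payload
        with socket.create_connection(ap_addr) as s:
            s.sendall(struct.pack("!H", len(payload)) + payload)

    # hypothetical AP address/port and endpoint-addressed message
    send_to_ap(("10.0.0.17", 5000), b"\x01\x2aHELLO-ENDPOINT-42")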
I want to develop a simple serverless LAN chat program just for fun. How can I do this? What type of architecture should I use?
Last year I worked on a TCP/UDP client/server application project. It was simple (the server listens on a certain port/socket, the client connects to the server's port, etc.). But I have no idea how to develop a "serverless" LAN chat program. How can I do this? UDP, TCP, multicast, broadcast? Or should the program behave like both server and client?
The simplest way would be to use UDP and simply broadcast your messages all over the network.
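A minimal sketch of that idea in Python (the port number is arbitrary; every node runs both parts):

    import socket

    PORT = 50000  # arbitrary

    def send(message: str):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(message.encode(), ("255.255.255.255", PORT))

    def listen():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", PORT))
        while True:
            data, addr = s.recvfrom(4096)
            print(addr[0], data.decode(errors="replace"))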
A slightly more advanced version would be to use the broadcast only to discover other nodes in the network; a rough sketch of this follows the list below.
Every node maintains a list of known peers.
Messages are sent with TCP to all known peers.
When a node starts up, it sends out a UDP broadcast to discover other nodes.
When a node receives a discovery broadcast, it sends "itself" to the source of the broadcast, in order to make itself known. The receiving node adds the broadcaster to its own list of known peers.
When a node drops out of the network, it sends another broadcast in order to inform the remaining nodes that they should remove the dropped client from their list.
You would also have to consider handling the dropping out of nodes without them informing the rest of the network.
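Here is a rough sketch of the discovery part, under the same assumptions (arbitrary port, JSON messages invented for illustration). Chat messages themselves would then go over TCP to each peer in the set; real code would also ignore its own announcements and, as noted, handle peers that vanish silently, e.g. with timeouts:

    import json
    import socket

    DISCOVERY_PORT = 50001   # arbitrary
    peers = set()            # known peers: (ip, tcp_port) pairs

    def announce(sock, kind, my_tcp_port,
                 dest=("255.255.255.255", DISCOVERY_PORT)):
        # kind is "HELLO" on startup, "BYE" on clean shutdown
        sock.sendto(json.dumps({"kind": kind, "port": my_tcp_port}).encode(), dest)

    def discovery_loop(my_tcp_port):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.bind(("", DISCOVERY_PORT))
        announce(s, "HELLO", my_tcp_port)        # discover on startup
        while True:
            data, (ip, _) = s.recvfrom(1024)
            msg = json.loads(data)
            peer = (ip, msg["port"])
            if msg["kind"] == "HELLO" and peer not in peers:
                peers.add(peer)
                # reply directly so the broadcaster learns about us too
                announce(s, "HELLO", my_tcp_port, (ip, DISCOVERY_PORT))
            elif msg["kind"] == "BYE":
                peers.discard(peer)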
The spread toolkit may be a bit overkill for what you want, but an interesting starting point.
From the blurb:
Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees.
Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The toolkit is designed to encapsulate the challenging aspects of asynchronous networks and enable the construction of reliable and scalable distributed applications.
Spread consists of a library that user applications are linked with, a binary daemon which runs on each computer that is part of the processor group, and various utility and demonstration programs.
Some of the services and benefits provided by Spread:
Reliable and scalable messaging and group communication.
A very powerful but simple API simplifies the construction of distributed architectures.
Easy to use, deploy and maintain.
Highly scalable from one local area network to complex wide area networks.
Supports thousands of groups with different sets of members.
Enables message reliability in the presence of machine failures, process crashes and recoveries, and network partitions and merges.
Provides a range of reliability, ordering and stability guarantees for messages.
Emphasis on robustness and high performance.
Completely distributed algorithms with no central point of failure.
Apple's iChat is an example of the very product you are envisioning. It uses Bonjour (Apple's zero-configuration networking protocol) to identify peers on a LAN. You can then chat or audio/video chat with them.
I'm not entirely sure how Bonjour works inside, but I know it uses multicast. Clients "register" services on the LAN, and the Bonjour protocol allows for each host to pull up a directory of hosts for a given service (all without central management).
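On other platforms you can get a similar effect with any mDNS/DNS-SD implementation. For example, with the third-party python-zeroconf package, a sketch might look like this (the "_chat._tcp.local." service type, name, address, and port are all made up for illustration):

    import socket
    from zeroconf import ServiceBrowser, ServiceInfo, Zeroconf

    class PeerListener:
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            if info:
                print("peer appeared:", name, info.parsed_addresses(), info.port)

        def remove_service(self, zc, type_, name):
            print("peer left:", name)

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    # register ourselves so other nodes can find us...
    zc.register_service(ServiceInfo(
        "_chat._tcp.local.",
        "alice._chat._tcp.local.",
        addresses=[socket.inet_aton("192.168.1.10")],
        port=5555,
    ))
    # ...and browse for everyone else offering the same service
    browser = ServiceBrowser(zc, "_chat._tcp.local.", PeerListener())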