Just been reading up on Docker overlay networks, very cool stuff. I just can't seem to find an answer to one thing.
According to the docs:
If you install and use Docker Swarm, you get overlay networks across your manager/worker hosts automagically, and don't need to configure anything more; but...
If you simply want a (non-Swarm) overlay network across multiple hosts, you need to configure that network with an external "KV Store" (consensus server) like Consul or ZooKeeper.
I'm wondering why this is. Clearly, overlay networks require consensus amongst peers, but I'm not sure why, or even who those "peers" are.
And I'm just guessing that, with Swarm, there's some internal/under-the-hood consensus server running out of the box.
Swarm Mode uses Raft for its manager consensus, with a built-in KV store. Before swarm mode, overlay networking was possible with third-party KV stores. Overlay networking itself doesn't require consensus; it just relies on whatever the KV store says, regardless of the other nodes or even its own local state (I've found this out the hard way). The KV stores out there are typically set up with consensus for HA.
The KV store tracks IP allocations to containers running on each host (IPAM). This allows docker to only allocate a given address once, and to know which docker host it needs to communicate with when you connect to a container running on another host. This needs to be external to any one docker host, and preferably in an HA configuration (like swarm mode's consensus) so that it can continue to work even when some docker nodes are down.
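For context, the pre-swarm multi-host setup was configured on the Docker daemon itself. A rough sketch, assuming Consul on its default port (the hostname and interface name are placeholders):

    # on each docker host, point the daemon at the shared KV store (legacy pre-swarm flags)
    dockerd --cluster-store=consul://consul.example.com:8500 \
            --cluster-advertise=eth0:2376

    # an overlay network created on one host is then visible to the others
    docker network create -d overlay my-overlay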
Overlay networking between docker nodes only involves the nodes that have containers on that overlay network. So once the IP is allocated and discovered, all the communication happens only between the nodes with the relevant containers. This is easy to see with swarm mode: if you create a network and then list networks on a worker, it won't be there. Once a container on that network gets scheduled onto that worker, the network will appear. For Docker, this reduces the overhead of multi-host networking while also adding to the security of the architecture. The result looks like this graphic:
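You can see that lazy behaviour for yourself in swarm mode; something like the following, where the network and service names are just examples:

    # on a manager: create the overlay network
    docker network create -d overlay demo-net

    # on a worker: the network is not listed yet
    docker network ls --filter name=demo-net

    # on the manager: schedule a service onto that network
    docker service create --name web --replicas 2 --network demo-net nginx

    # on the worker: once a task lands here, the network shows up
    docker network ls --filter name=demo-net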
The Raft consensus itself is only needed for leader election. Once a node is selected to be the leader and enough nodes remain to have consensus, only one node is writing to the KV store and maintaining the current state. Everyone else is a follower. This animation describes it better than I ever could.
Lastly, you don't need to setup an external KV store to use overlay networking outside of swarm mode services. You can implement swarm mode, configure overlay networks with the --attachable option, and run containers outside of swarm mode on that network as you would have with an external KV store. I've used this in the past as a transition state to get containers into swarm mode, where some were running with docker-compose and others had been deployed as a swarm stack.
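A rough sketch of that transition setup, with illustrative names:

    # a one-node swarm is enough to get the built-in KV store
    docker swarm init

    # attachable overlay: usable by swarm services and plain containers alike
    docker network create -d overlay --attachable shared-net

    # a plain (non-swarm) container joins the overlay directly
    docker run -d --name legacy-app --network shared-net nginx

    # a swarm service on the same network can reach it by name
    docker service create --name new-app --network shared-net nginx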
Related
We want to use ZeroTier to connect from one cloud machine to multiple remote machines. We do not want the remote machines to access each other. What would be a good approach?
Use a single network and set rules based on tags to restrict access
Run multiple networks, each containing the cloud machine and one remote machine
Are there limits to:
the number of members in a ZeroTier network?
the number of ZeroTier networks a machine can connect to at a time (TUN interfaces, IP conflicts, or performance impact)?
I would use a single network and use rules to prevent peering between the machines. For instance, you could configure the 192.168.141.0/25 portion of the network so that its members cannot peer with each other, and allow only the defined network paths between hosts.
Just a personal rant here: you don't want to do that. Really. You're going to make a headache for yourself when you have to scale horizontally (which you will if you're successful). I would STRONGLY recommend taking an mTLS approach to service authentication instead. It's somewhat more work at the start, but a lot easier in the long run.
Long story short: I need to use networking between projects so that I can have separate billing for them.
I'd like to reach all the VMs in different projects from a single point that I will use for provisioning systems (let's call it coordinator node).
It looks like VPC Network Peering is a perfect solution for this. But unfortunately one of the existing networks is "legacy". Here's what the Google docs state about legacy networks:
About legacy networks
Note: Legacy networks are not recommended. Many newer GCP features are not supported in legacy networks.
OK, naturally the question arises: how do you migrate out of a legacy network? The documentation does not address this topic. Is it not possible?
I have a bunch of VMs, and I'd be able to shut them down one by one:
shutdown
change something
restart
Unfortunately, it does not seem possible to change the network even when the VM is down?
EDIT:
It has been suggested to recreate the VMs while keeping the same disks. I would still need a way to bridge the legacy network with the new VPC network to make the migration smooth. Any thoughts on how to do that using the GCE toolset?
One possible solution - for each VM in the legacy network:
Get VM parameters (API get method)
Delete VM without deleting PD (persistent disk)
Create VM in the new VPC network using parameters from step 1 (and existing persistent disk)
This way stop-change-start is not so different from delete-recreate-with-changes. It's possible to write a script to fully automate this (migration of a whole network). I wouldn't be surprised if someone already did that.
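Here is a hedged sketch of those steps with gcloud; the instance, zone, disk and network names are placeholders, and the flags should be double-checked against the current SDK:

    # 1. capture the VM's parameters
    gcloud compute instances describe my-vm --zone us-central1-a > my-vm.yaml

    # 2. make sure the boot disk survives deletion, then delete the VM
    gcloud compute instances set-disk-auto-delete my-vm --zone us-central1-a \
        --disk my-vm --no-auto-delete
    gcloud compute instances delete my-vm --zone us-central1-a --quiet

    # 3. recreate the VM on the new VPC network, attaching the existing disk
    gcloud compute instances create my-vm --zone us-central1-a \
        --network new-vpc --subnet new-subnet \
        --disk name=my-vm,boot=yes \
        --machine-type n1-standard-1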
UPDATE
The https://github.com/googleinterns/vm-network-migration tool automates the above process, and it also supports migration of a whole instance group, load balancer, etc. Check it out.
In the pre-Docker-1.9 days I used to have a VPN provider container which I could use as the network gateway for my other containers by passing the option --net=container:[container-name].
This was very simple but had a major limitation in that the provider container had to exist prior to starting the consumers and it could not be restarted.
The new docker networking stack seems to have dropped this provision in favour of creating networks, which does sound better, but I'm struggling to get equivalent behaviour.
Right now I have created an internal network with docker network create isolated --internal --subnet=172.32.0.0/16 and brought up two containers, one of which is attached only to the internal network and one which is attached to both the default bridge and the internal network.
Now I need to route all network traffic from the isolated container through the connected one. I've messed around with some iptables rules, but to be honest this is not my strongest area.
So my questions are simply: is my approach along the right lines? What rules need to be in place in the two containers to get this working the way --net=container did?
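For concreteness, the kind of thing I imagine being needed looks roughly like this (not verified; it assumes both containers run with NET_ADMIN, the gateway container has iptables installed, and 172.32.0.2 stands in for the gateway's address on the isolated network):

    # in the gateway container (attached to both the default bridge and the isolated network);
    # forwarding may instead need to be enabled via --sysctl net.ipv4.ip_forward=1 at run time
    sysctl -w net.ipv4.ip_forward=1
    # NAT traffic arriving from the isolated subnet out through the bridge-side interface
    iptables -t nat -A POSTROUTING -s 172.32.0.0/16 -o eth0 -j MASQUERADE

    # in the isolated container: send everything via the gateway container's
    # address on the isolated network
    ip route replace default via 172.32.0.2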
We're running Docker across two hosts, with overlay networking enabled and configured. It's version 1.12.1, with Consul as the KV store. We aren't using Swarm, largely because we didn't feel it gave us the relevant control over ensuring availability and minimising resources, but anyway.
Our setup is microservice based, and we run quite a lot of containers which get restarted fairly frequently. Our model uses nginx as a "reverse proxy" for service discovery, for various reasons, and so we start multiple containers which share a --host of "nginx-lb". This works fine, and other containers on the network can connect to nginx-lb, which resolves to a random one of those containers' IP addresses.
The problem we have is that when killing containers and creating new ones, sometimes (I don't know in exactly what circumstances this occurs), the overlay network does not remove the old container from the system, and so other containers then try to connect to the dead ones, causing problems.
The only way to then resolve this is to manually run docker network disconnect -f overlay_net [container], having already run docker network inspect overlay_net to find the errant containers.
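For reference, that manual cleanup is:

    # find the errant endpoints
    docker network inspect overlay_net

    # force-disconnect each dead container
    docker network disconnect -f overlay_net [container]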
Is there a known issue with the overlay networking not removing dead containers from the KV data, or any ideas of a fix?
Yes, it's a known issue. You can follow it here: https://github.com/docker/docker/issues/26244
I'm trying to create something like this:
The server containers each have port 8080 exposed, and accept requests from the client, but crucially, they are not allowed to communicate with each other.
The problem here is that the server containers are launched after the client container, so I can't pass container link flags to the client like I used to, since the containers it's supposed to link to don't exist yet.
I've been looking at the newer Docker networking stuff, but I can't use a bridge because I don't want server cross-communication to be possible. It also seems to me like one bridge per server doesn't scale well, and would be difficult to manage within the client container.
Is there some kind of switch-like docker construct that can do this?
It seems like you will need to create multiple bridge networks, one per server container. To simplify that, you may want to use docker-compose to specify how the networks and containers should be provisioned, and have the docker-compose tool wire it all up correctly.
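A minimal compose sketch of that layout (version 2 syntax; the image and service names are placeholders): each server sits on its own bridge network, and only the client joins all of them, so the servers cannot reach each other.

    version: "2"
    services:
      client:
        image: my-client
        networks:
          - net1
          - net2
      server1:
        image: my-server
        networks:
          - net1
      server2:
        image: my-server
        networks:
          - net2
    networks:
      net1:
      net2: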
Resources:
https://docs.docker.com/engine/userguide/networking/dockernetworks/
https://docs.docker.com/compose/
https://docs.docker.com/compose/compose-file/#version-2
One more side note: I think that exposed ports are accessible to all networks. If that's right, you may be able to set all of the server networking to none and rely on the exposed ports to reach the servers.
Hope this is relevant to your use case - I'm attempting to infer the context of your actual application from the diagram and comments. I'd recommend you go the Service Discovery route. It may involve a little bit of simple API work over a central store (say Redis, or SkyDNS), but it would make things simple in the long run.
Kubernetes, for instance, uses SkyDNS to do so with DNS. At the end of the day, any orchestration tool of your choice would most likely do something like this out of the box: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns
The idea is simple:
Use a DNS container that keeps entries of newly spawned servers
Allow the client container to query it for a list of servers, e.g. picture a DNS response with a bunch of server-<<ISO Timestamp of Server Creation>> entries
Disallow the server containers read-access to this DNS (how to manage this permission configuration without indirection, i.e. without proxying through an endpoint that allows writing into the DNS container but not reading, is going to get exotic)
Bonus Edit: I just realised you can use a simpler Redis-like setup to do this, and that DNS might just be overengineering :)
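A rough sketch of the Redis variant (the "registry" hostname, the key name, and the naming scheme are illustrative): each server registers itself on startup, and the client pulls the current list.

    # run inside each server container on startup; "registry" is the Redis container's name
    redis-cli -h registry SADD servers "server-$(date -u +%Y-%m-%dT%H:%M:%SZ):8080"

    # run from the client container to get the current set of servers
    redis-cli -h registry SMEMBERS servers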