I am trying to understand the relationship between:
eth0 on the host machine; and
docker0 bridge; and
eth0 interface on each container
It is my understanding that Docker:
Creates a docker0 bridge and then assigns it an available subnet that is not in conflict with anything running on the host; then
Docker binds docker0 to eth0 running on the host; then
Docker binds each new container it spins up to docker0, such that the container's eth0 interface connects to docker0 on the host, which in turn is connected to eth0 on the host
This way, when something external to the host tries to communicate with a container, it must send the message to a port on the host's IP, which then gets forwarded to the docker0 bridge, which then gets broadcasted to all the containers running on the host, yes?
Also, this way, when a container needs to communicate with something outside the host, it has its own IP (leased from the docker0 subnet), and so the remote caller will see the message as having come from the container's IP.
So if anything I have stated above is incorrect, please begin by clarifying for me!
Assuming I'm more or less correct, my main concerns are:
When remote services "call in" to the container, all containers are broadcast the same message, which creates a lot of traffic/noise, but could also be a security risk (where only container 1 should be the recipient of some message, but all the other containers running on the host get the message as well); and
What happens when Docker chooses identical subnets on different hosts? In this case, container 1 living on host 1 might have the same IP address as container 2 living on host 2. If container 1 needs to "call out" to some external/remote system (not living on the host), then how does that remote system differentiate between container 1 vs container 2 (both will show the same egress IP)?
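For reference, here is roughly how I have been inspecting this on my host (this assumes the default bridge network and that iproute2 is installed):

```
# The docker0 bridge and the subnet Docker picked for it
ip addr show docker0

# The vethXXX interfaces currently attached to bridges on the host
bridge link show

# Docker's own view of the default bridge network and the containers on it
docker network inspect bridge
```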
I wouldn't say your picture of Docker networking is entirely accurate, so let me clarify that part first.
Here's how it goes:
Docker uses a feature of the Linux kernel called namespaces to partition resources.
When a container starts, Docker creates a set of namespaces for that container.
This provides a layer of isolation.
One of these is the net namespace, used for managing network interfaces.
Now, a bit about network namespaces:
A net namespace lets each container have its own network resources, its own network stack:
Its own network interfaces.
Its own routing tables.
Its own iptables rules.
Its own sockets (ss, netstat).
We can move a network interface across net namespaces, so an interface created in the host's namespace can be handed over to a container's namespace.
Typically, two virtual interfaces are used, which act like a cross-over cable (a veth pair):
eth0 in the container's net namespace is paired with a virtual interface vethXXX in the host's network namespace.
All of those vethXXX interfaces are then bridged together on the host, using the docker0 bridge.
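To make the "cross-over cable" idea concrete, here is a minimal hand-rolled sketch of the same wiring with iproute2; it is not what Docker literally runs, and the names (demo, veth-host, veth-ctr, br0) and the address are placeholders:

```
# Create a network namespace to stand in for a container
ip netns add demo

# Create a veth pair: one end stays on the host, the other goes into the namespace
ip link add veth-host type veth peer name veth-ctr
ip link set veth-ctr netns demo

# Rename the namespace end to eth0 and give it an address, like Docker does
ip netns exec demo ip link set veth-ctr name eth0
ip netns exec demo ip addr add 172.18.0.2/24 dev eth0
ip netns exec demo ip link set eth0 up

# Attach the host end to a bridge (Docker attaches its veths to docker0)
ip link add br0 type bridge
ip link set veth-host master br0
ip link set veth-host up
ip link set br0 up
```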
Now, apart from namespaces, there is a second Linux kernel feature that makes containers possible: cgroups (control groups).
Control groups let us implement metering and limiting of (see the sketch after this list):
Memory
CPU
Block I/O
Network
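On the Docker CLI those cgroup controllers surface as resource flags on docker run; a rough sketch (the image and the numbers are arbitrary, and network limiting is not covered by a simple flag):

```
# Cap memory, CPU and block I/O for a container via cgroups
docker run -d \
  --memory 256m \
  --cpus 0.5 \
  --device-read-bps /dev/sda:10mb \
  nginx
```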
TL;DR
In short:
Containers are made possible by two main kernel features:
namespaces and cgroups.
Cgroups ---> limit how much you can use.
Namespaces ---> limit what you can see.
And you can't affect what you can't see.
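A quick way to see the "what you can see" part is to compare the interfaces inside a container with the host's (the container name is a placeholder, and the image needs iproute2 installed):

```
# Inside the container: usually just lo and eth0
docker exec mycontainer ip addr

# On the host: eth0, docker0, one vethXXX per running container, ...
ip addr
```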
Coming back to your question: when a packet arrives at the host and is intended for a container, it is not handed to every container on docker0. For a published port, an iptables DNAT rule on the host rewrites the packet's destination to that one container's IP and port, and the packet is then routed over the docker0 bridge to that container's veth pair and delivered only into its net namespace. (Similarly, on the way out, the container's traffic is masqueraded -- SNAT -- so it leaves the host with the host's IP as its source address.)
So, I think this answers both of your questions as well.
It's not a broadcast: other containers can't see a packet that isn't addressed to them (their net namespaces, the bridge and the NAT rules take care of that).
And since outbound traffic is NATed to the host's IP, two containers that happen to have the same private address on two different hosts still look different to a remote system: each appears to come from its own host's unique IP.
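You can see the rules that implement this on the host. The command is real; the sample entries below are only a rough, abridged sketch of what Docker typically installs for one container with a published port (addresses and ports will differ on your machine):

```
# Show Docker's NAT rules
sudo iptables -t nat -L -n

# Typical (abridged) entries:
#   MASQUERADE  all  --  172.17.0.0/16  0.0.0.0/0
#       -> outbound traffic from containers leaves with the host's IP
#   DNAT        tcp  --  0.0.0.0/0      0.0.0.0/0   tcp dpt:8080 to:172.17.0.2:80
#       -> a published port is forwarded to exactly one container
```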
P.S.:
If someone finds any of this erroneous, please let me know in the comments. I wrote this in a hurry and will update it with better-reviewed text soon.
Thank you.
Related
I am trying out Kubernetes on bare metal. As an example, I have Docker containers exposing port 2002 (this is not HTTP).
I do not need to load-balance traffic among my pods, since each new pod does its own job and does not serve the same network clients.
Is there software that will let each newly created service be reached on a new IP from my internal DHCP server, so that I can preserve my original container port?
I can create a service with NodePort and access the pod via some randomly generated port that is forwarded to my port 2002.
But I need to preserve that port 2002 when accessing my containers.
Each new service would need to be reachable on a new LAN IP, but on the same port as the containers.
Is there some network plugin (LoadBalancer?) that will forward from an IP assigned by DHCP back to this randomly generated service port, so I can reach the containers on their original ports?
In short: start a service in Kubernetes and access it at IP:2002, then start another service from the same container image and access it at another_new_IP:2002.
Ah, that happens automatically within the cluster -- each Pod has its own IP address. I know you said bare metal, but this post by Lyft may give you some insight into how you can skip or augment the SDN and surface the Pod IPs into routable address space, doing exactly what you want.
In more real terms: I haven't ever had the need to attempt such a thing, but CNI is likely flexible enough to interact with a DHCP server and pull a Pod's IP from a predetermined pool, so long as the pool is big enough to accommodate the frequency of Pod creation and termination.
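If you do go down that road, here is a minimal sketch of what such a CNI config could look like, using the standard macvlan plugin with the dhcp IPAM plugin; the parent interface, the file path, and the assumption that your kubelet picks up configs from /etc/cni/net.d are guesses about your setup, and the CNI dhcp daemon has to be running on each node for this to work:

```
# Delegate Pod IP assignment to your LAN's DHCP server via the "dhcp" IPAM plugin
cat > /etc/cni/net.d/10-lan.conf <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "lan",
  "type": "macvlan",
  "master": "eth0",
  "ipam": { "type": "dhcp" }
}
EOF
```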
Either way, I would absolutely read a blog post describing your attempt -- successful or not -- to pull this off!
On a separate note, be careful, because the word Service means something specific within Kubernetes, even though it is regrettably often used in a more generic sense (as I suspect you did). Thankfully, a Service is designed to do the exact opposite of what you want to happen, so there is little chance of confusion -- just be aware.
Been googling it for a while and can't figure out the answer: suppose I have two containers inside a pod, and one has to send the other some secrets. Should I use https or is it safe to do it over http? If I understand correctly, the traffic inside a pod is firewalled anyway, so you can't eavesdrop on the traffic from outside the pod. So... no need for https?
Containers inside a Pod communicate using the loopback network interface, localhost.
TCP packets to a localhost address get routed back at the IP layer itself.
It is implemented entirely within the operating system's networking software and passes no packets to any network interface controller. Any traffic that a computer program sends to a loopback IP address is simply and immediately passed back up the network software stack as if it had been received from another device.
So communication between containers inside a Pod over localhost never leaves the node, and cannot be hijacked or altered on the network.
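A quick way to see the shared namespace from outside (the pod and container names are made up, and the images need iproute2 installed):

```
# Both containers in the Pod share one network namespace, so they report the
# same interfaces and the same Pod IP
kubectl exec mypod -c sidecar-a -- ip addr
kubectl exec mypod -c sidecar-b -- ip addr

# A process in sidecar-a listening on 127.0.0.1:9000 is therefore reachable
# from sidecar-b at 127.0.0.1:9000, and that traffic never touches a NIC
```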
If you want to understand more, take a look at understanding-kubernetes-networking.
Hope this answers your question.
I'm a little confused as to how the following scenario works. It's a very simple setup, so I hope the explanation is simple.
I have a host with a single physical NIC. I create a single macvlan sub-interface in bridge mode off this physical NIC. Then I start up two LXD/LXC containers. Each with their own unique MAC and IP, but in the profile, I specify the same single macvlan sub-interface as each container's parent interface.
Both containers have access to the network without issue. I'm also able to SSH into each container using each container's unique IP address. This is the bit that confuses me:
How is all of this working underneath the hood? Both containers are using the single macvlan MAC/IP when accessing the external world. Isn't there going to be some sort of collision? Shouldn't this not work? Shouldn't I need one macvlan subinterface per container? Is there some sort of NAT going on here?
macvlan isn't documented much, hoping someone out there can help out.
There isn't NATing per se, since NAT happens at the IP layer -- MACs are at the link layer -- but the result is similar.
All of the MACs (the NIC's and each macvlan's) share the same physical link to the NIC. The kernel's macvlan driver then steers each incoming frame to the correct interface (virtual or not) based on its destination MAC, delivering it to one of the guests or to the host. You can think of macvlans as small virtual switches hanging off the NIC.
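For reference, this is roughly the same mechanism done by hand with iproute2; it is only a sketch, not what LXD literally runs, and the interface names, namespace name and address are placeholders:

```
# One macvlan child interface per container, all on the same physical NIC;
# each gets its own MAC, and the driver demultiplexes frames by destination MAC
ip link add link eth0 name mv0 type macvlan mode bridge
ip link add link eth0 name mv1 type macvlan mode bridge

# Each child is then moved into a container's network namespace and given
# that container's own IP
ip netns add ctr1-ns                 # stand-in for a container's namespace
ip link set mv0 netns ctr1-ns
ip netns exec ctr1-ns ip addr add 192.168.1.50/24 dev mv0
ip netns exec ctr1-ns ip link set mv0 up
```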
I want to assign IP addresses to my Docker containers at the same level as the physical host, i.e. if the IP address of the host is 192.168.1.101, I would like to give the Docker containers IP addresses of 192.168.1.102, 103, 104, etc.
Essentially I am looking for a functionality similar to bridged networking in VMWare/Virtualbox etc.
Any ideas how we can go about doing this?
Docker's default bridge network allows you to NAT your containers into the physical network.
To achieve what you want, use Pipework or, if you are on the cutting edge, try the Docker macvlan driver, which is, for now, experimental.
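With the macvlan driver, giving containers addresses on the physical subnet looks roughly like this; the subnet, gateway, parent interface and addresses are taken from the question and are assumptions about your network, and the image is arbitrary:

```
# Create a macvlan network bound to the host's physical NIC
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  -o parent=eth0 lan

# Start containers with addresses on the physical subnet
docker run -d --network lan --ip 192.168.1.102 nginx
docker run -d --network lan --ip 192.168.1.103 nginx
```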
To quote the Docker docs:
The host network adds a container on the host's network stack. You'll find the network configuration inside the container is identical to the host.
When starting the container, just pass --net=host. Check this link. You can't actually assign a static IP with that parameter, but you can give the container a hostname with --hostname, which is at least as useful as knowing the IP. You can also add more entries to /etc/hosts with --add-host.
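A minimal sketch of the host-network approach (the image is arbitrary, and it assumes port 80 is free on the host):

```
# Share the host's network stack: whatever the app listens on is bound
# directly on the host's interfaces, with no -p/port publishing involved
docker run -d --net=host nginx

# The server is now reachable on the host's own IP / hostname
curl http://localhost/
```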
I'm trying to understand what's happening under the hood to a network packet coming from the wire connected to the host machine and directed to an application inside a Docker container.
If it were a classic VM, I know that a packet arriving on the host would be transmitted by the hypervisor (say VMware, VBox etc.) to the virtual NIC of the VM and from there through the TCP/IP stack of the guest OS, finally reaching the application.
In the case of Docker, I know that a packet coming on the host machine is forwarded from the network interface of the host to the docker0 bridge, that is connected to a veth pair ending on the virtual interface eth0 inside the container. But after that? Since all Docker containers use the host kernel, is it correct to presume that the packet is processed by the TCP/IP stack of the host kernel? If so, how?
I would really like to read a detailed explanation (or if you know a resource feel free to link it) about what's really happening under the hood. I already carefully read this page, but it doesn't say everything.
Thanks in advance for your reply.
The network stack, as in "the code", is definitely not in the container, it's in the kernel of which there's only one shared by the host and all containers (you already knew this). What each container has is its own separate network namespace, which means it has its own network interfaces and routing tables.
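If you want to poke at a particular container's view from the host, one way is to enter its net namespace (the container name is a placeholder, and nsenter needs root):

```
# Find the container's main process and enter its network namespace
pid=$(docker inspect -f '{{.State.Pid}}' mycontainer)

# The container's own interfaces, routing table and iptables rules
sudo nsenter -t "$pid" -n ip addr
sudo nsenter -t "$pid" -n ip route
sudo nsenter -t "$pid" -n iptables -L -n
```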
Here's a brief article introducing the notion with some examples: http://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/
and I found this article helpful too:
http://containerops.org/2013/11/19/lxc-networking/
I hope this gives you enough pointers to dig deeper.