Multiple LXD containers on single macvlan interface - networking

I'm a little confused as to how the following scenario works. It's a very simple setup, so I hope the explanation is simple.
I have a host with a single physical NIC. I create a single macvlan sub-interface in bridge mode off this physical NIC. Then I start up two LXD/LXC containers, each with its own unique MAC and IP; in the profile, I specify the same single macvlan sub-interface as each container's parent interface.
Both containers have access to the network without issue. I'm also able to SSH into each container using each container's unique IP address. This is the bit that confuses me:
How is all of this working underneath the hood? Both containers are using the single macvlan MAC/IP when accessing the external world. Isn't there going to be some sort of collision? Shouldn't this not work? Shouldn't I need one macvlan subinterface per container? Is there some sort of NAT going on here?
macvlan isn't documented much, hoping someone out there can help out.

There isn't NAT per se, since NAT operates at the IP layer while MACs live at the link layer, but the result is similar.
All of the MACs (the physical NIC's and each macvlan's) share the same physical link to the NIC. The NIC's device driver then demultiplexes incoming frames by destination MAC to the correct interface (virtual or not), which delivers them to one of the guests or to the host. You can think of a macvlan as a virtual switch.
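As a minimal sketch of that model (the profile name, image and parent NIC below are illustrative, not from your setup), you can watch each container get its own MAC on the shared parent link:

```sh
# Hypothetical LXD setup: two containers share eth0 as their macvlan parent.
lxc profile create macvlan-demo
lxc profile device add macvlan-demo eth0 nic nictype=macvlan parent=eth0
lxc launch ubuntu:22.04 c1 --profile default --profile macvlan-demo
lxc launch ubuntu:22.04 c2 --profile default --profile macvlan-demo

# Each container's eth0 shows a distinct MAC, different from the host NIC's,
# so the driver can demultiplex frames by destination MAC -- no NAT needed.
ip link show eth0
lxc exec c1 -- ip link show eth0
lxc exec c2 -- ip link show eth0
```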

Related

Is there a way to rename network interfaces in Docker swarm?

When using Docker swarm mode and exposing ports externally, you have at least three networks: the ingress network, the bridge network and the overlay network (used for internal cluster communication). The container joins these networks through interfaces eth0 to eth2 (assigned randomly each time), and from an application's point of view it is not easy to tell which of these is the cluster network (the correct one to publish for service-discovery clients, e.g. Spring Eureka).
Is there a way to customize network interface names in some way?
Not a direct answer to your question, but one of the key selling points of swarm mode is the built-in service discovery mechanism, which in my opinion works really nicely.
More directly related: I don't think it's possible to specify the desired interface for an overlay network. However, when creating a network, it is possible to define the subnet or the IP range of the network (https://docs.docker.com/engine/reference/commandline/network_create/). You could use that to identify the interface belonging to your overlay network, by checking whether the bound IP address is part of the network you want to publish on.
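A rough sketch of that subnet-matching approach (the network name, subnet and image are assumptions for illustration):

```sh
# Pin the overlay's subnet when creating it.
docker network create -d overlay --subnet 10.200.0.0/24 cluster-net
docker service create --name app --network cluster-net myimage:latest

# Inside the container: whichever interface holds an address in
# 10.200.0.0/24 is the cluster network, whether it's eth0, eth1 or eth2.
ip -o -4 addr show | awk '$4 ~ /^10\.200\.0\./ {print $2}'
```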

Docker giving IP address at the same level as the host, similar to VM bridged networking

I want to assign IP addresses to my docker containers at the same level as the physical host, i.e. if the IP address of the host is 192.168.1.101, I would like to give the docker containers IP addresses of 192.168.1.102, 103, 104, etc.
Essentially I am looking for a functionality similar to bridged networking in VMWare/Virtualbox etc.
Any ideas how we can go about doing this?
Docker's default bridge network allows you to NAT your containers into the physical network.
To achieve what you want, use Pipework or, if you are on the cutting edge, you can try the docker macvlan driver, which is, for now, experimental.
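If you try the macvlan driver, a sketch along these lines should put containers directly on the physical LAN (the subnet, gateway and parent NIC mirror your addressing and are assumptions about your network):

```sh
# Create a macvlan network attached to the host's physical NIC.
docker network create -d macvlan \
  --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
  -o parent=eth0 pub_net

# Give a container an address right next to the host's 192.168.1.101.
docker run --rm -it --network pub_net --ip 192.168.1.102 alpine sh
```

One known macvlan caveat: the host itself cannot reach its own macvlan containers over the parent interface.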
To quote docker docs:
The host network adds a container on the host's network stack. You'll find the network configuration inside the container is identical to the host.
When starting the container, just pass --net=host. You can't actually assign a static IP when starting with that parameter, but you can give the container a hostname with --hostname, which is at least as useful as knowing the IP. You can also add more entries to /etc/hosts with --add-host.
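For instance (the hostname, host entry and image are illustrative):

```sh
# Share the host's network stack; the container gets no separate IP.
docker run -d --net=host --hostname web01 \
  --add-host backend:192.168.1.50 nginx
```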

Docker Networking

I am trying to understand the relationship between:
eth0 on the host machine; and
docker0 bridge; and
eth0 interface on each container
It is my understanding that Docker:
Creates a docker0 bridge and then assigns it an available subnet that is not in conflict with anything running on the host; then
Docker binds docker0 to eth0 running on the host; then
Docker binds each new container it spins up to docker0, such that the container's eth0 interface connects to docker0 on the host, which in turn is connected to eth0 on the host
This way, when something external to the host tries to communicate with a container, it must send the message to a port on the host's IP, which then gets forwarded to the docker0 bridge, which then gets broadcast to all the containers running on the host, yes?
Also, this way, when a container needs to communicate with something outside the host, it has its own IP (leased from the docker0 subnet), and so the remote caller will see the message as having come from the container's IP.
So if anything I have stated above is incorrect, please begin by clarifying for me!
Assuming I'm more or less correct, my main concerns are:
When remote services "call in" to the container, all containers get the same message broadcast to them, which creates a lot of traffic/noise, but it could also be a security risk (where only container 1 should be the recipient of some message, yet all the other containers running on the host get the message as well); and
What happens when Docker chooses identical subnets on different hosts? In this case, container 1 living on host 1 might have the same IP address as container 2 living on host 2. If container 1 needs to "call out" to some external/remote system (not living on the host), then how does that remote system differentiate between container 1 vs container 2 (both will show the same egress IP)?
I wouldn't say you're entirely clear on how networking in Docker works, so let me clarify that part first. Here's how it goes:
Docker uses a feature of the Linux kernel called namespaces to partition resources.
When a container starts, Docker creates a set of namespaces for that container.
This provides a layer of isolation.
One of these is the "net namespace": Used for managing network interfaces.
Now, talking a bit about network namespaces:
A net namespace lets each container have its own network resources, its own network stack:
Its own network interfaces.
Its own routing tables.
Its own iptables rules.
Its own sockets (as seen by ss or netstat).
We can move a network interface across net namespaces, so an interface created in one namespace can be handed over to another container's namespace.
Typically, two virtual interfaces are used, which act like a cross-over cable:
eth0 in the container's net namespace is paired with a virtual interface vethXXX in the host's network namespace.
➔ All of the vethXXX virtual interfaces are bridged together (using the bridge docker0).
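You can reproduce this wiring by hand with iproute2; Docker does roughly the same under the hood (the namespace, interface names and addresses here are illustrative):

```sh
# Create a namespace and a veth pair that acts as a cross-over cable.
ip netns add demo-ns
ip link add veth-host type veth peer name veth-ctr
ip link set veth-ctr netns demo-ns

# Address both ends and bring them up.
ip addr add 172.18.0.1/24 dev veth-host
ip link set veth-host up
ip netns exec demo-ns ip addr add 172.18.0.2/24 dev veth-ctr
ip netns exec demo-ns ip link set veth-ctr up
ip netns exec demo-ns ip link set lo up

# The namespace now has its own interfaces and routes; ping across the pair.
ip netns exec demo-ns ping -c1 172.18.0.1
```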
Now, apart from namespaces, there's a second feature in the Linux kernel that makes the creation of containers possible: cgroups (control groups).
Control groups let us implement metering and limiting of:
Memory
CPU
Block I/O
Network
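For example, Docker exposes these cgroup limits directly as run flags (the values and image are illustrative):

```sh
# Cap memory, CPU share and block-device read bandwidth for one container.
docker run --rm --memory=256m --cpus=0.5 \
  --device-read-bps=/dev/sda:1mb alpine true
```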
TL;DR, in short:
Containers are made possible because of 2 main features of kernel:
Namespaces and C-Groups.
Cgroups ---> limit how much you can use.
Namespaces ---> limit what you can see.
And you can't affect what you can't see.
Coming back to your question: when the host receives a packet intended for a container, the kernel's network stack processes it header by header, and Docker's port-publishing (DNAT) rules in iptables decide exactly which container should receive it; the packet is then forwarded over docker0 and the matching veth pair into that container's namespace. (Outgoing packets get the reverse treatment: their source address is NATed to the host's IP.)
So, I think this answers both of your questions as well.
It's not a broadcast at all. The bridge and the NAT rules deliver a packet only to the container it is addressed to, and namespaces prevent other containers from seeing packets that aren't theirs.
And since containers are source-NATed behind their host when they go out to the external network, a remote system sees the host's (globally distinct) IP and port rather than the container's private address, so container 1 on host 1 and container 2 on host 2 remain distinguishable even if their private IPs collide.
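You can inspect those per-container NAT rules yourself (the published port, image and container address are illustrative):

```sh
# Publish a port, then look at the DNAT rule Docker installs for it.
docker run -d -p 8080:80 nginx
iptables -t nat -nL DOCKER
# Expect a line resembling (addresses will vary):
#   DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8080 to:172.17.0.2:80
```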
P.S.:
If someone finds any of this information erroneous, please let me know in the comments. I wrote this in a hurry and will update it with better-reviewed text soon.
Thank you.

Same IP address for multiple Bluemix Docker containers

Like the title says, is it possible to run multiple Bluemix containers with the same public IP address, but with different ports exposed? (There should be no need to buy additional or waste IPv4 space.)
I'd like to run 6 differently parameterized (with environment variables) containers. The difference would be the exposed port numbers (and the inner application logic).
The only thing I need is to be able to access those ports, either with Docker configuration or other solutions, like NAT between these 6 containers and a "router".
Thank you.
This is not possible with IBM Containers.

Does a docker container have its own TCP/IP stack?

I'm trying to understand what's happening under the hood to a network packet coming from the wire connected to the host machine and directed to an application inside a Docker container.
If it were a classic VM, I know that a packet arriving on the host would be transmitted by the hypervisor (say VMware, VBox etc.) to the virtual NIC of the VM and from there through the TCP/IP stack of the guest OS, finally reaching the application.
In the case of Docker, I know that a packet coming on the host machine is forwarded from the network interface of the host to the docker0 bridge, that is connected to a veth pair ending on the virtual interface eth0 inside the container. But after that? Since all Docker containers use the host kernel, is it correct to presume that the packet is processed by the TCP/IP stack of the host kernel? If so, how?
I would really like to read a detailed explanation (or if you know a resource feel free to link it) about what's really happening under the hood. I already carefully read this page, but it doesn't say everything.
Thanks in advance for your reply.
The network stack, as in "the code", is definitely not in the container, it's in the kernel of which there's only one shared by the host and all containers (you already knew this). What each container has is its own separate network namespace, which means it has its own network interfaces and routing tables.
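One way to see this concretely from the host (the container name is an illustrative assumption):

```sh
# Find the container's init PID, then enter only its network namespace.
pid=$(docker inspect -f '{{.State.Pid}}' mycontainer)
nsenter -t "$pid" -n ip addr    # the container's own interfaces
nsenter -t "$pid" -n ip route   # its own routing table
```

Both commands run the host's ip binary against the container's namespace, which underlines that the stack code itself lives in the one shared kernel.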
Here's a brief article introducing the notion with some examples: http://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/
and I found this article helpful too:
http://containerops.org/2013/11/19/lxc-networking/
I hope this gives you enough pointers to dig deeper.
