How does docker0 bridge work internally inside the host? - networking

I am trying to understand how the bridged docker0 interface works.
When the docker daemon starts up, it creates a bridge device docker0;
When a container starts up, it creates an interface vethN and binds it to docker0.
Say we issue a ping command from inside the container to an external host:
[root@f505f022eb5b app]# ping 130.49.40.130
PING 130.49.40.130 (130.49.40.130) 56(84) bytes of data.
64 bytes from 130.49.40.130: icmp_seq=1 ttl=52 time=11.9 ms
So apparently my host eth0 is receiving this ping reply, but how does this packet get forwarded to the container? There are several questions to ask:
eth0 and docker0 are not bridged, so how does docker0 get the packets from eth0?
Even if docker0 gets the packets, how does it internally send them on to the container's veth? Does it maintain some map so it can translate packets between the different MAC addresses?
How is iptables involved here?
Cheers.

Docker is not doing anything particularly magical here, and your question is not really Docker-specific.
docker0 is just a network bridge. As soon as this bridge is created (upon starting the docker service) you can assume that a new machine (in this case in VM/docker form) has joined your network.
When pinging the docker container from the host, or vice versa, you are basically pinging another machine inside your network.
Regarding docker, unless you have created a new network interface (which I doubt, since you are pinging via eth0), you are basically pinging yourself.
If you run the container as:
docker run -i -t --rm -p 10.0.0.99:80:8080 ubuntu:16.04
You are telling docker to create a NAT rule in iptables to forward any packets going to 10.0.0.99:80 to your docker container on port 8080.
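If you want to see the rule Docker adds for such a published port, you can list the DOCKER chain in the nat table (the chain name is standard, though the exact output depends on your Docker version):
sudo iptables -t nat -nL DOCKER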
When you run the container as:
docker run -i -t --rm --net=host ubuntu:16.04
Then you are saying the docker container should share the same network stack as the host, so all packets arriving at the host are also visible to your docker container (in this mode the docker0 bridge is not involved).
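As a quick illustrative check (busybox is just an example image), a container started with --net=host sees the host's interfaces directly:
docker run --rm --net=host busybox ip addr show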

To answer your question about how a container pings an external host: this is also achieved via NAT.
If you list your iptables NAT rules using sudo iptables -t nat -L
you will likely see something similar to the below (the Docker subnet may differ):
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 172.17.0.0/16 anywhere
This is basically saying: NAT any outgoing packets originating from the docker subnet, so the outgoing packets will appear to originate from the docker host machine. When the ping replies return, the connection-tracking (NAT) table is used to determine that a docker container actually made the request, and the packet gets forwarded to that container's veth.
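If you want to inspect those tracked mappings yourself, a rough way (assuming the default 172.17.0.0/16 subnet; the first command needs the conntrack-tools package) is:
sudo conntrack -L | grep 172.17
# or, without the tool installed:
sudo grep 172.17 /proc/net/nf_conntrack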

Related

How do I connect Cloud Composer Airflow DAG to a VPN

How do I allow a Cloud Composer Airflow DAG to connect to a REST API via a VPN gateway? The cluster is connected to the corresponding VPC.
The kube-proxy is able to reach the API, yet the containers cannot.
I have SSH'd into the kube-proxy and the containers and tried a traceroute. The containers' traceroute ends at the kube-proxy. The kube-proxy has 4 hops before reaching the destination.
I have dumped the iptables rules from the kube-proxy; they do not specify anything with regard to NATing the VPC's subnet to the containers.
The VPC subnet also does not show up in the containers, which is by design.
Some reading material:
https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/
EDIT1: More info:
Let's assume the VPN connects the VPC to the remote 10.200.0.0 network.
The VPC has multiple subnets. The primary range is e.g. 10.10.0.0/20. For each Kubernetes cluster we have two more subnets, one for pods (10.16.0.0/14) and another for services (10.20.0.0/20). The gateway is 10.10.0.1.
Each pod again has its own range, where pod_1 is 10.16.0.0/14, pod_2 is 10.16.1.0/14, pod_3 is 10.16.3.0/14 and so on.
One of the kube-proxies has multiple network adapters. It resides in the 10.10.0.0 network on eth0 and has a cbr0 bridge to 10.16.0.0. The Airflow workers connect to the network through said kube-proxy via that bridge. One worker, e.g. 10.16.0.1, has only one network adapter.
The kube-proxy can reach the 10.200.0.0 network. The Airflow workers can not.
How do we get the worker to access the 10.200.0.0 network? Do we need to change the iptables of the kube-proxy?
One possible solution would be to forward the packets from the kube virtual interface to the node's real one, e.g. by adding the following rules to iptables:
iptables -A FORWARD -i cbr0 -o eth0 -d 10.200.0.0/25 -j ACCEPT
iptables -A FORWARD -i eth0 -o cbr0 -s 10.200.0.0/25 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
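Presumably the node must also have IP forwarding enabled for the FORWARD rules above to have any effect; a quick check (standard sysctl, not GKE-specific):
sysctl net.ipv4.ip_forward
# if it prints 0, enable it:
sudo sysctl -w net.ipv4.ip_forward=1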

What is the relation between docker0 and eth0?

I know that by default docker creates a virtual bridge docker0, and all container networks are linked to docker0.
As illustrated above:
container eth0 is paired with vethXXX
vethXXX is linked to docker0, just like a machine plugged into a switch
But what is the relation between docker0 and host eth0?
More specifically:
When a packet flows from the container to docker0, how does it know it should be forwarded to eth0, and then to the outside world?
When an external packet arrives at eth0, why is it forwarded to docker0 and then to the container, instead of the host processing it or dropping it?
Question 2 can be a little confusing; I will keep it there and explain it a little more:
It is a return packet for a connection initiated by the container (in question 1): since the outside world does not know the container network, the packet is sent to the host's eth0. How is it forwarded to the container? I mean, there must be some place that stores this information; how can I check it?
Thanks in advance!
After reading the answer and the official networking articles, I find the following diagram more accurate: docker0 and eth0 have no direct link; instead, they can forward packets between each other:
http://dockerone.com/uploads/article/20150527/e84946a8e9df0ac6d109c35786ac4833.png
There is no direct link between the default docker0 bridge and the host's ethernet devices. If you use the --net=host option for a container then the host's network stack will be available in the container.
When a packet flows from container to docker0, how does it know it will be forwarded to eth0, and then to the outside world?
The docker0 bridge has the .1 address of the Docker network assigned to it; this is usually something in 172.17 or 172.18.
$ ip address show dev docker0
8: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:03:47:33:c1 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
Containers are assigned a veth interface which is attached to the docker0 bridge.
$ bridge link
10: vethcece7e5 state UP #(null): <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
Containers created on the default Docker network receive the .1 address as their default route.
$ docker run busybox ip route show
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 src 172.17.0.3
Docker uses NAT MASQUERADE for outbound traffic from there and it will follow the standard outbound routing on the host, which may or may not involve eth0.
$ iptables -t nat -vnL POSTROUTING
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
iptables handles the connection tracking and return traffic.
When an external packet arrives at eth0, why is it forwarded to docker0 and then to the container, instead of the host processing it or dropping it?
If you are asking about the return path for outbound traffic from the container, see iptables above as the MASQUERADE will map the connection back through.
If you mean new inbound traffic, packets are not forwarded into a container by default. The standard way to achieve this is to set up a port mapping. Docker launches a daemon that listens on the host on port X and forwards to the container on port Y.
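For example (the port numbers and image are arbitrary), publishing a port starts that proxy and adds the corresponding DNAT rule; you can see the listener on the host with ss:
$ docker run -d -p 8080:80 nginx
$ sudo ss -ltnp | grep 8080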
I'm not sure why NAT wasn't used for inbound traffic as well. I've run into some issues trying to map large numbers of ports into containers which led to mapping real world interfaces completely into containers.
You can discover the pairing by comparing a container's eth0 iflink with the veth ifindex values on the host machine.
Get iflink from a container:
$ docker exec ID cat /sys/class/net/eth0/iflink
17253
Then find this ifindex among interfaces on the host machine:
$ grep -l 17253 /sys/class/net/veth*/ifindex
/sys/class/net/veth02455a1/ifindex
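A small helper loop (hypothetical, but built only from the commands above) to print the pairing for every running container:
for c in $(docker ps -q); do
  idx=$(docker exec "$c" cat /sys/class/net/eth0/iflink)
  veth=$(grep -l "^${idx}$" /sys/class/net/veth*/ifindex)
  echo "$c -> $(basename "$(dirname "$veth")")"
done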

Openstack VM is not accessible on LAN

I am facing an issue with accessing OpenStack VMs on the LAN.
I have set up a single-machine (192.168.2.15) OpenStack using devstack, so
all VMs are running inside this machine.
My machine (192.168.2.15) has one network card (eth0) and
I am using nova networking; I have not installed neutron.
I have assigned static IPs on eth0 of all the LAN machines (such as 192.168.2.15 and 192.168.2.16) in the /etc/network/interfaces file.
System information of the Openstack Machine is as below:
Memory usage: 19%   Swap usage: 0%
IP address for virbr0: 192.168.122.1
IP address for br100: 10.0.0.1
The following works fine:
I can access the internet from VM1 (10.0.0.2, which is an auto-assigned IP).
I can ping the LAN machine (192.168.2.16) from VM1.
The OpenStack machine (192.168.2.15) can ping VM1 (10.0.0.2).
VM1 (10.0.0.2) can ping VM2 (10.0.0.3).
But the LAN machine 192.168.2.16 is not able to ping VM1 (10.0.0.2).
So please suggest how this can be achieved. Please consider me very new to OpenStack and networking.
Thanks !!!
You need to assign a floating IP to the VMs you create if you want a host from outside the OpenStack network to connect to them. The internal IPs are only accessible from inside the OpenStack network.
See how to assign a floating IP to a VM here: http://docs.openstack.org/user-guide/content/floating_ip_allocate.html
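With nova-network on devstack, roughly something like the legacy nova CLI commands below should do it (the command names and the example VM name/address are assumptions and may differ by release):
nova floating-ip-create
# associate the allocated address with your instance (VM name and IP are examples)
nova add-floating-ip VM1 192.168.2.50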
To access the VM's floating IP from another host (that is not the devstack host) you should make sure that the devstack host is configured to forward packets. You can do this with:
sudo bash
echo 1 > /proc/sys/net/ipv4/ip_forward
echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
See more details here:
http://barakme.tumblr.com/post/70895539608/openstack-in-a-box-setting-up-devstack-havana-on-your
Adding a route on the client machine to the OpenStack VM network helped me.

Multiple IPs + bridge for KVM

I have a problem at the moment and really don't know where the mistake is. I have a root server from my ISP. This root server already has one IP included, and today I booked two more IP addresses. What I want to do now is map these two new IP addresses to two virtual machines while keeping the included IP for the root server. How do I realize this?
I thought something like:
br0 - holds the original IP of the Root-Server
br0:0 - holds first IP of first virtual Machine
br0:1 - holds second IP of second virtual Machine
But this doesn't work. Any ideas? I'm really frustrated; I worked the whole day on it with no solution.
I was also struggling with a similar scenario: I've got a server and got to the point where setting up a bridge cut me off and I had to restart to be able to reach it again. Anyway, I managed to handle it with iptables.
# create an alias for your second IP address (let's say it's 111.222.333.2, and the VM's local address is 192.168.1.2)
ifconfig eth0:1 111.222.333.2
# you should add the proper netmask if you've got a subnet
# now you should be able to ping this second address from the outside world - try it,
# that is, if you have not set up a firewall to block pings ... flush the iptables rules if you are not sure
# set up the NAT rules (network address translation: outside IP -> local IP and back, local IP -> outside IP)
# this assumes your virtual machine lives at 192.168.1.2
iptables -t nat -A PREROUTING -d 111.222.333.2 -j DNAT --to-destination 192.168.1.2
iptables -t nat -A POSTROUTING -s 192.168.1.2 -j SNAT --to-source 111.222.333.2
This helped me with a server which has multiple IP addresses and KVM virtual machines
that were originally run on the default network (forward mode=nat), so at first they only had internet access through NAT and an internal IP; this also gives them an outside-world public IP address.
You can also do these redirects on a port-by-port basis by adjusting the iptables rules to match a port, e.g. -d 111.222.333.2 -p tcp --dport 80, and adding the port to the local address in --to-destination (e.g. 192.168.1.2:80).
You may also need to turn on IP forwarding; you can check it with, for example, sysctl -a | grep forward (where you should see it enabled for your device), and optionally adjust it with the proper sysctl command, like
sysctl -w net.ipv4.ip_forward=1
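To make that setting survive a reboot, something like the following (the exact file may differ by distro):
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p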
Map br0 into VM1 and VM2 as a TAP device; inside VM1 and VM2 it shows up as an eth device.
Assign IP1 and IP2 to VM1 and VM2 respectively. With this configuration you can ping from VM1 to VM2 and from the host machine to any guest machine (VM1 or VM2).
The following link will help you set up a TAP device for a VM via a bridge; see the qemu-ifup script specified there and understand it well.
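As a rough sketch (disk image, memory size and the tap name are placeholders), attaching a guest to br0 via a tap device and such a qemu-ifup script might look like:
qemu-system-x86_64 -enable-kvm -m 1024 -hda vm1.img \
  -netdev tap,id=net0,ifname=tap0,script=/etc/qemu-ifup \
  -device virtio-net-pci,netdev=net0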

Forwarding within local network to same network

I have X-Wrt based on OpenWrt 8.09 on my router
I have a home LAN of a few computers on which I run some network servers (SVN, web, etc). For each service I set up port forwarding on my router (Linksys WRT54GL) to access it from the Internet (<my_external_ip>:<external_port> -> <some_internal_ip>:<internal_port>).
But from within my local network these resources are unreachable via that external address (so I have to reconfigure clients to use <some_internal_ip>:<internal_port> instead).
I added this line to my /etc/hosts:
<my_external_ip> localhost
So now all requests from the local network to <my_external_ip> reach my router, but the further redirection to the appropriate port does not work.
Please advise on the proper redirection.
You need to install an IP redirect for requests that leave the internal network directed at the public IP. Normally these packets get discarded. You want to reroute them, DNATting them to the destination server, but also masquerade them, so that the server, seeing that its client is on its own network, doesn't respond directly to the client with its internal IP (a reply the client, not having sent the packet there, would discard).
I found this on OpenWRT groups:
iptables -t nat -A prerouting_rule -d YOURPUBLICIP -p tcp --dport PORT -j DNAT --to YOURSERVER
iptables -A forwarding_rule -p tcp --dport PORT -d YOURSERVER -j ACCEPT
iptables -t nat -A postrouting_rule -s YOURNETWORK -p tcp --dport PORT -d YOURSERVER -j MASQUERADE
https://forum.openwrt.org/viewtopic.php?id=4030
If I remember correctly, OpenWrt allows you to define custom DNS entries. So maybe simply give proper local names to your servers (i.e. svnserver.local) and map them to specific local IPs. This way you do not even need to go through the router to access local resources from the local network.
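On OpenWrt the router's dnsmasq answers DNS for the LAN and reads the router's /etc/hosts, so a minimal sketch (the name and IP here are examples) would be:
# on the router
echo "192.168.1.10 svnserver.local" >> /etc/hosts
/etc/init.d/dnsmasq restart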
