I know that by default Docker creates a virtual bridge, docker0, and all container networks are attached to docker0.
As illustrated above:
container eth0 is paired with vethXXX
vethXXX is attached to docker0, just like a machine plugged into a switch
But what is the relation between docker0 and host eth0?
More specifically:
When a packet flows from a container to docker0, how does it know it should be forwarded to eth0, and then to the outside world?
When an external packet arrives at eth0, why is it forwarded to docker0 and then to the container, instead of being processed or dropped?
Question 2 can be a little confusing; I will keep it there and explain a little more:
It is the return packet of a connection initiated by the container (in question 1): since the outside world does not know the container network, the packet is sent to the host's eth0. How is it then forwarded to the container? I mean, there must be some place where this information is stored; how can I check it?
Thanks in advance!
After reading the answer and the official networking articles, I find the following diagram more accurate: docker0 and eth0 have no direct link; instead, they can forward packets to each other:
http://dockerone.com/uploads/article/20150527/e84946a8e9df0ac6d109c35786ac4833.png
There is no direct link between the default docker0 bridge and the host's ethernet devices. If you use the --net=host option for a container, then the host's network stack will be available in the container.
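One quick way to see the difference (a sketch, assuming the busybox image is available locally):
$ docker run --rm --net=host busybox ip addr show    # with --net=host the container sees the host's interfaces (eth0, docker0, ...)
A container on the default bridge network, by contrast, only sees its own eth0 and lo.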
When a packet flows from a container to docker0, how does it know it should be forwarded to eth0, and then to the outside world?
The docker0 bridge has the .1 address of the Docker network assigned to it; this is usually something in the 172.17 or 172.18 range.
$ ip address show dev docker0
8: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:03:47:33:c1 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
Containers are assigned a veth interface which is attached to the docker0 bridge.
$ bridge link
10: vethcece7e5 state UP #(null): <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state forwarding priority 32 cost 2
Containers created on the default Docker network receive the .1 address as their default route.
$ docker run busybox ip route show
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 src 172.17.0.3
Docker uses NAT MASQUERADE for outbound traffic from there and it will follow the standard outbound routing on the host, which may or may not involve eth0.
$ iptables -t nat -vnL POSTROUTING
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
iptables handles the connection tracking and return traffic.
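If you want to see where that state is kept, the kernel's connection-tracking table can be listed (a sketch, assuming the conntrack tool is installed):
$ sudo conntrack -L | grep 172.17    # list tracked connections involving the Docker subnet
Each entry pairs the container's original 172.17.x.x source with the masqueraded host address and port, which is how replies find their way back to the right container.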
When an external packet arrives at eth0, why is it forwarded to docker0 and then to the container, instead of being processed or dropped?
If you are asking about the return path for outbound traffic from the container, see the iptables rules above: the MASQUERADE state maps the connection back through to the container.
If you mean new inbound traffic, packets are not forwarded into a container by default. The standard way to achieve this is to set up a port mapping. Docker launches a daemon that listens on the host on port X and forwards to the container on port Y.
I'm not sure why NAT wasn't used for inbound traffic as well. I've run into some issues trying to map large numbers of ports into containers which led to mapping real world interfaces completely into containers.
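To illustrate the port-mapping path, a hedged sketch (the image and port numbers are arbitrary):
$ docker run -d -p 8080:80 nginx
$ sudo iptables -t nat -vnL DOCKER    # the chain Docker manages for published ports
On recent Docker versions this shows a DNAT rule translating host port 8080 to port 80 on the container's 172.17.x.x address, alongside a docker-proxy process listening on the host.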
You can detect the pairing by comparing a container interface's iflink with the ifindex values on the host machine.
Get iflink from a container:
$ docker exec ID cat /sys/class/net/eth0/iflink
17253
Then find this ifindex among interfaces on the host machine:
$ grep -l 17253 /sys/class/net/veth*/ifindex
/sys/class/net/veth02455a1/ifindex
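If you run several containers, a small loop prints the whole mapping (a sketch, assuming every container has an eth0):
# print "container ID -> host-side veth" for every running container
for c in $(docker ps -q); do
  idx=$(docker exec "$c" cat /sys/class/net/eth0/iflink)
  veth=$(grep -l "$idx" /sys/class/net/veth*/ifindex)
  echo "$c -> $(basename "$(dirname "$veth")")"
done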
Related
Case:
[ Subnet A , 192.168.2.0/24, Padavan firmware based internet gw ]
[ Subnet B , 192.168.1.0/24, Padavan firmware based internet gw ]
A host in subnet A (2.155) is connected via VPN (possible options: PPTP, OpenVPN, L2TP without IPsec) to subnet B, and receives an address, say 1.245/32.
In subnet B there is a host (1.10/32) which sends multicast datagrams to 224.0.0.50:9898; on the router I see them with
tcpdump -i br0 -c 10 dst host 224.0.0.50 and port 9898 and multicast
13:46:54.345369 IP 192.168.1.10.4321 > 224.0.0.50.9898: UDP, length 135
I am looking for a solution to receive/forward those multicast messages so they can be seen by hosts connected via the VPN.
On router B, which is Padavan-firmware based, I have, and am limited to, the udpxy and igmproxy utilities, if needed.
On the client host I am Debian-based and generally not limited in tools.
The datagrams carry a proprietary protocol, i.e. this is not an IPTV or video stream.
Any ideas are welcomed.
[UPD] Additional info - per discussion in comments
It is a very specific hardware device, which is not very chatty in ethernet terms (say at most 1-2 datagrams every 5 seconds), so it should certainly be forwardable. Unfortunately, it sends status updates purely via multicast. In subnet A a similar device plus its control software already exist. So I am looking for a way for datagrams sent to 224.0.0.50:9898 in subnet B to re-appear in subnet A, possibly with the help of some tool: maybe smcroute, maybe udpxy, maybe igmproxy.
As I don't like to leave resolved questions unanswered, here is the currently working solution.
In subnet B I have installed an OpenVPN server endpoint, configured as L2 (TAP).
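For reference, the relevant server-side options were roughly the following (a sketch; the addresses and the client pool are placeholders to adapt to subnet B, and tap0 is expected to be part of the router's LAN bridge):
# OpenVPN server config excerpt for an L2 (TAP) endpoint
dev tap0
server-bridge 192.168.1.1 255.255.255.0 192.168.1.240 192.168.1.250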
In subnet A, on a control host, I have installed an OpenVPN client that connects to subnet B; the assigned interface is tapz:
20: tapz: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 100
link/ether 0a:da:be:96:78:d9 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.245/24 brd 192.168.1.255 scope global noprefixroute tapz
valid_lft forever preferred_lft forever
inet6 fe80::8da:beff:fe96:78d9/64 scope link
valid_lft forever preferred_lft forever
So now on a control host I have:
multicast traffic from the local device on the physical ethernet interface enp5s0:
sudo tcpdump -i enp5s0 -c 10 dst host 224.0.0.50 and port 9898 and multicast
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp5s0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:55:05.642963 IP lumi-gateway-v3_miio56591509.4321 > 224.0.0.50.9898: UDP,
length 136
and now I also receive the multicast traffic from the remote network device on tapz:
sudo tcpdump -i tapz -c 10 dst host 224.0.0.50 and port 9898 and multicast
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tapz, link-type EN10MB (Ethernet), capture size 262144 bytes
13:53:49.141751 IP 192.168.1.10.4321 > 224.0.0.50.9898: UDP, length 135
So far this is what I was looking for: I am getting the necessary datagrams on the VPN client. OpenVPN on the remote side can also be tuned to filter which multicast traffic is forwarded.
For those who come here with the same question: once you have the necessary multicast traffic on tap0, you can create a bridge from, say, eth0 and tap0:
ip link add br0 type bridge
ip link set tap0 master br0
ip link set eth0 master br0
ip link set br0 up
Proof of concept: both multicast streams on a single interface:
sudo tcpdump -i br0 dst host 224.0.0.50 and port 9898
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:09:51.823632 IP 192.168.1.10.4321 > 224.0.0.50.9898: UDP, length 135
21:09:55.045138 IP 192.168.2.214.4321 > 224.0.0.50.9898: UDP, length 136
I have installed Docker with default settings on 3 physical machines. Docker created the interface docker0 with the default IP 172.17.0.1 in bridge mode.
I expected this network to be private. The problem is that I cannot ping 172.17.0.1, but I can arping 172.17.0.1. Why is this so? I want this network to be completely private.
➜ ~ arping -I eno1 172.17.0.1
ARPING 172.17.0.1 from 172.19.20.35 eno1
Unicast reply from 172.17.0.1 [00:19:99:16:3E:24] 0.678ms
Unicast reply from 172.17.0.1 [00:19:99:16:3E:70] 0.685ms
Unicast reply from 172.17.0.1 [70:4D:7B:3D:83:33] 0.687ms
Is it safe to run this on corporate network or should I get permission of sysadmins?
The created bridge is private to the host it is running on. Can you verify that the IP 172.17.0.1 is not part of your private/corporate network? It's possible that some other host in your network is responding.
If this is the problem, you should probably use another IP and CIDR for the docker0 bridge. Having clashes between docker internal CIDRs and private/corporate network CIDRs will result in strange and hard to debug behavior inside your containers.
Please read https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0/ for details on how to customize these settings.
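For example, the bridge can be pinned to a range that is unused in your corporate network via /etc/docker/daemon.json (the 10.200.0.0/24 range below is only an illustration); restart the docker daemon afterwards:
{
  "bip": "10.200.0.1/24"
}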
I am trying to understand how the bridged docker0 interface works.
When the docker daemon starts up, it creates a bridge device docker0;
When a container starts up, it creates an interface vethN and attaches it to docker0.
Say we issue a ping command from inside the container to an external host:
[root@f505f022eb5b app]# ping 130.49.40.130
PING 130.49.40.130 (130.49.40.130) 56(84) bytes of data.
64 bytes from 130.49.40.130: icmp_seq=1 ttl=52 time=11.9 ms
So apparently my host eth0 is receiving this ping reply, but how does this packet get forwarded to the container? There are several questions to ask:
eth0 and docker0 are not bridged, so how does docker0 get the packets from eth0?
Even if docker0 gets the packets, how does it internally send them on to veth0? Does it maintain some internal map so it can rewrite packets between different MAC addresses?
how is iptables related here?
Cheers.
Docker is not doing anything specifically magical here, and your question is not really docker dependent/related.
docker0 is just a network bridge. As soon as this bridge is created (upon starting the docker service) you can assume that a new machine (in this case in VM/docker form) has joined your network.
When pinging the docker container from host or vice versa you are basically pinging another machine inside your network.
Regarding docker, unless you have created a new network interface (which I doubt, since you are pinging eth0), you are basically pinging yourself.
If you run the container as:
docker run -i -t --rm -p 10.0.0.99:80:8080 ubuntu:16.04
You are telling docker to create a NAT rule in iptables to forward any packets going to 10.0.0.99:80 to your docker container on port 8080.
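You can check the resulting mapping afterwards (a sketch; replace <container> with the actual container name or ID):
$ docker port <container>            # prints e.g. "8080/tcp -> 10.0.0.99:80"
$ sudo iptables -t nat -nL DOCKER    # shows the corresponding DNAT rule Docker inserted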
When you run the container as:
docker run -i -t --rm --net=host ubuntu:16.04
Then you are saying the docker container should share the host's network stack, so all packets going to the host are also visible to your docker container (no docker0 bridge or NAT is involved).
To answer your question, how does a container ping an external host, this is also achieved via NAT.
If you list your Iptables / NAT rules using: sudo iptables -t nat -L
You will likely see, something similar to the below (docker subnet may be different)
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 172.17.0.0/16 anywhere
This is basically saying: NAT any outgoing packets originating from the docker subnet, so the outgoing packets will appear to originate from the docker host machine. When the ping packets return, the connection-tracking table is used to determine that a docker container actually made the request, and the packet gets forwarded to that container's veth.
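While the ping from the earlier example is running, that reverse mapping can be observed in the connection-tracking table (assuming the conntrack tool is installed):
$ sudo conntrack -L -p icmp    # list tracked ICMP "connections"
The entry pairs the container's 172.17.x.x source with the masqueraded host address, so the echo reply is translated back to the container.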
I found that most of the configuration examples are for assigning a static or private network. But I want the container to act as a separate machine so that it gets its own IP address from the DHCP server, and I want to do it through nmcli.
Thanks in advance.
If you are using docker as tagged, rather than LXC, use pipework to map the wlan interface from the host to the container
pipework eth2 $CONTAINERID 10.10.9.9/24
or alternatively let the container do the dhcp negotiation for you
pipework eth1 $CONTAINERID dhclient
This setup is based on a macvlan interface, so the same concept should work with LXC; you just won't get the easy front end.
I'm confused if this is a docker question or an LXC question.
EDIT: as per the comments, wlan interface support in a bridge depends on the wlan vendor. It may work, or it may not work at all.
In any case, you should be able to create a bridge, add your wlan0 interface to the bridge, and then have your LXC container connect to this bridge directly. Then, when you run your DHCP client in the container, it will grab it from the wlan0 interface.
Configure bridge (manually for now)
# ifconfig wlan0 up
# brctl addbr br0
# brctl addif br0 wlan0
# ifconfig br0 up
# dhclient br0
Configure the LXC container
If using traditional privileged LXC, edit the container's config file at /var/lib/lxc/$NAME/config,
and update this value to point to your new bridge.
lxc.network.link = br0
Run DHCP in container
# lxc-attach -n $NAME
# dhclient eth0
# ip a
If the output of ip a shows the desired IP, you're all set!
If you want to make the configuration persistent, you'll have to add the bridge to your /etc/network/interfaces file.
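A minimal stanza might look like this (a sketch; as noted in the edit above, whether wlan0 can be enslaved to a bridge at all depends on the wireless driver):
# /etc/network/interfaces -- relies on the bridge-utils ifupdown hooks
auto br0
iface br0 inet dhcp
    bridge_ports wlan0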
IEEE 802.11 doesn’t like multiple MAC addresses on a single client, so bridge and macvlans are not the right solution here.
Use ipvlan in L2 mode.
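A rough sketch with plain ip commands, using a network namespace to stand in for the container (the interface and namespace names are placeholders):
# create an ipvlan child of wlan0 in L2 mode and move it into a namespace
ip link set wlan0 up
ip netns add guest
ip link add link wlan0 name ipvl0 type ipvlan mode l2
ip link set ipvl0 netns guest
ip netns exec guest ip link set ipvl0 up
ip netns exec guest dhclient -v ipvl0
Note that ipvlan children share the parent interface's MAC address, so the DHCP server has to hand out leases per client identifier rather than per MAC for the container to receive its own address.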
Here is the topology: HostA(eth0) ---- (eth0)HostB
I have created a tun/tap device on HostB, say tun0 or tap0. When eth0 of HostB receives a packet from HostA, maybe an ICMPv6 packet (NS, echo request, etc.) or a UDP/TCP packet (encapsulated in an IPv6 header), I want to forward this packet from eth0 to tap0. After doing something with this packet, I also want to send a reply back to HostA, through tap0 and eth0.
I cannot find a way to do that; can someone help me or give some hints?
This is an extremely basic routing question, probably unsuitable for Stack Overflow.
You need something like this on Host B:
HostB# sysctl -w net.ipv6.conf.all.forwarding=1
HostB# ip -6 addr add 2001:db8:0:0::1/64 dev eth0
HostB# ip -6 addr add 2001:db8:0:1::1/64 dev tun0
Then on Host A:
HostA# ip -6 addr add 2001:db8:0:0::2/64 dev eth0
HostA# ip -6 route add default via 2001:db8:0:0::1 dev eth0
HostA# ping6 2001:db8:0:1::2 # <-- should work if that host exists on tun0
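If tun0 does not exist yet on Host B, a persistent device can be created first (a sketch; 'someuser' is a placeholder for whichever account runs the program that reads and writes the device):
HostB# ip tuntap add dev tun0 mode tun user someuser    # create a persistent tun device
HostB# ip link set tun0 up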