Let's say we have 2 Nodes in a cluster.
Node A has 1 replica of a pod, Node B has 2 replicas. According to this talk (YouTube video with a time tag) from Google Cloud engineers, a request that was routed to Node A might be rerouted to Node B by the iptables rules inside Node A. I have several questions regarding this behavior:
What information do the iptables rules on Node A have about replicas of a pod outside of it? How do they know where to send the traffic?
Can the iptables rules on Node B reroute this request to Node C? If so, will the return traffic go back through Node B -> Node A -> the client?
I think you might be mixing up two subsystems: service proxies and CNI. CNI comes first; it's a plugin-based system that sets up the routing rules across all your nodes so that the network appears flat, meaning a pod IP works like normal from any node. Exactly how that happens varies by plugin; Calico, for example, uses BGP between the nodes. Then there are the service proxies, usually implemented using iptables, though that is also somewhat pluggable. Those define the service IP -> endpoint IP (read: pod IP) load balancing, but the actual routing is handled by whatever your CNI plugin set up. There are a lot of special modes and cases, but that's the basic overview.
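To make that concrete, here is roughly how you can inspect what kube-proxy programs on a node. The chain names (KUBE-SERVICES, KUBE-SVC-*, KUBE-SEP-*) are real kube-proxy conventions, but the hashes, service name, pod IP and probability below are made up for illustration, and the exact rule format varies by version:

    # kube-proxy's service rules live in the nat table on every node
    sudo iptables -t nat -L KUBE-SERVICES -n | grep my-service

    # a service chain picks an endpoint chain at random (iptables-save style, illustrative):
    #   -A KUBE-SVC-ABC123 -m statistic --mode random --probability 0.5 -j KUBE-SEP-DEF456
    #   -A KUBE-SEP-DEF456 -p tcp -j DNAT --to-destination 10.244.2.7:8080
    #
    # The DNAT target can be a pod IP on another node; the routes set up by the CNI
    # plugin deliver the packet there, and conntrack un-DNATs the reply on the way back.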
Packets can move between nodes, services and pods before reaching the final destination.
All the intra-cluster routing (node-to-node, pod-to-pod, service-to-service, pod-to-service, service-to-pod, pod-to-node, node-to-pod, etc.) in Kubernetes is done by:
CNI
the load-balancing algorithm
kube-proxy
iptables
The path a packet takes in k8s also depends on where the pods ended up, which is influenced by many things: cluster load, per-node load, affinity/anti-affinity rules, nodeSelectors, taints/tolerations, autoscaling, the number of pod replicas, etc.
Intra-cluster routing is transparent to the caller, and ideally the user need not know about it unless there are networking issues to debug.
Running sudo iptables -L -n -v on any Kubernetes node shows the low-level iptables rules and chains used for packet forwarding; the Service NAT rules from kube-proxy sit in the nat table, so add -t nat to see them.
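For a higher-level view of the same information, the endpoints behind a Service (i.e. the pod IPs that those iptables rules point at, wherever the pods happen to run) are visible through the API; my-service is a hypothetical name here:

    kubectl get endpoints my-service -o wide
    kubectl get pods -o wide    # shows which node each pod IP lives on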
There are already tools out there which visualize the traffic between pods. In detail, they state the following:
Linkerd tap listens to a traffic stream for a resource.
In Weave Scope, edges indicate TCP connections between nodes.
I am now wondering how these tools get the data, because the Kubernetes API itself does not provide this information. I know that Linkerd installs a proxy next to each service, but is this the only option?
The component that monitors the traffic must be either a sidecar container in each pod or a daemon on each node. For example:
Linkerd uses a sidecar container
Weave Scope uses a DaemonSet to install an agent on each node of the cluster
A sidecar container observes traffic to/from its pod. A node daemon observes traffic to/from all the pods on the node.
In Kubernetes, each pod has its own unique IP address, so these components basically check the source and destination IP addresses of the network traffic.
In general, any traffic from/to/between pods has nothing to do with the Kubernetes API, and to monitor it, basically the same principles apply as in non-Kubernetes environments.
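As a minimal sketch of that principle, assuming you have a privileged shell on a node, you can correlate pod IPs with raw traffic yourself (my-pod and the IP are placeholders):

    # find the pod's IP and the node it runs on
    kubectl get pod my-pod -o wide

    # on that node, watch traffic to and from the pod's IP
    sudo tcpdump -i any -n host 10.244.1.23

A node agent does essentially the same thing continuously, mapping the observed IPs back to pod names via the Kubernetes API.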
You can use a sidecar proxy for this, or use prometheus-operator, which ships with Grafana dashboards; there you can monitor just about everything.
My advice is to use Istio (istio.io), which injects an Envoy proxy as a sidecar container into each pod; then you can use Prometheus to scrape metrics from these proxies and Grafana for visualisation.
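A rough sketch of that setup, assuming a recent Istio release (the commands and the metric name may differ in your version):

    # install Istio and enable sidecar injection for a namespace
    istioctl install --set profile=demo -y
    kubectl label namespace default istio-injection=enabled

    # the Envoy sidecars now export request metrics (e.g. istio_requests_total)
    # which Prometheus scrapes and Grafana can chart per source/destination workload.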
I am trying out Kubernetes on bare metal. As an example, I have Docker containers exposing port 2002 (this is not HTTP).
I do not need to load-balance traffic among my pods, since each new pod does its own job and does not serve the same network clients.
Is there software that would allow each newly created service to be reached on a new IP from our internal DHCP, so I can preserve my original container port?
I can create a Service with NodePort and access the pod through some randomly generated port that is forwarded to my port 2002.
But I need to preserve that port 2002 when accessing my containers.
Each new service would need to be accessible on a new LAN IP, but with the same port as the containers.
Is there some network plugin (a LoadBalancer?) that would forward from an IP assigned by DHCP back to this randomly generated service port, so I can access the containers on their original ports?
In other words: start a service in Kubernetes and access it at IP:2002, then start another service from the same container image and access it at another_new_IP:2002.
Ah, that happens automatically within the cluster -- each Pod has its own IP address. I know you said bare metal, but this post by Lyft may give you some insight into how you can skip or augment the SDN and surface the Pods' IPs into routable address space, doing exactly what you want.
In more concrete terms: I haven't ever had the need to attempt such a thing, but CNI is likely flexible enough to interact with a DHCP server and pull a Pod's IP from a predetermined pool, so long as the pool is big enough to accommodate the frequency of Pod creation and termination.
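For what it's worth, the reference CNI plugins do include a dhcp IPAM plugin. A very rough, untested sketch of a per-node config that would hand pods addresses from the LAN's DHCP server might look like this (the file name and eth0 as the node's LAN interface are assumptions):

    # write a macvlan + dhcp CNI config on the node
    cat <<'EOF' | sudo tee /etc/cni/net.d/10-lan.conf
    {
      "cniVersion": "0.4.0",
      "name": "lan",
      "type": "macvlan",
      "master": "eth0",
      "ipam": { "type": "dhcp" }
    }
    EOF

    # the dhcp IPAM plugin needs its helper daemon running on the node
    sudo /opt/cni/bin/dhcp daemon &

With something like this in place, each pod gets a LAN address from DHCP and its port 2002 is reachable directly, with no Service or NodePort in between.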
Either way, I would absolutely read a blog post describing your attempt -- successful or not -- to pull this off!
On a separate note, be careful, because the word Service means something specific within Kubernetes, even though it is regrettably often used in a more generic sense (as I suspect you did). Thankfully, a Service is designed to do the exact opposite of what you want to happen, so there was little chance of confusion -- just be aware.
I created a cluster using Kubernetes and I want to make requests to a server using different IPs. Does each of the nodes have a different one, so that I can parallelize the requests? Thanks
To make calls to your pods you should use Kubernetes Services, which effectively load-balance requests between your pods, so that you do not need to worry about particular pod IPs at all.
That said, each pod has its own unique IP address, but these are internal addresses: in most implementations they come from the overlay network and are, in a way, internal to the cluster (they can't be called directly from outside - which is not exactly the full truth, but close enough).
Depending on your goals, Ingress might be interesting for you.
https://kubernetes.io/docs/concepts/services-networking/service/
https://kubernetes.io/docs/concepts/services-networking/ingress/
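As a minimal sketch of the Service approach described above (the names, labels and ports are placeholders):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: my-backend
    spec:
      selector:
        app: my-backend
      ports:
      - port: 80
        targetPort: 8080
    EOF

    # inside the cluster, clients just call my-backend (or its ClusterIP);
    # requests are spread across whichever pods match the selector.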
I have a GCP project with two VPC networks (VPC₁ and VPC₂). In VPC₁ I have a few GCE instances, and in VPC₂ I have a GKE cluster.
I have established VPC Network Peering between both VPCs, and POD₁'s host node can reach VM₁ and vice versa. Now I would like to be able to reach VM₁ from within POD₁, but unfortunately I can't seem to reach it.
Is this a matter of creating the appropriate firewall rules / routes on POD₁, perhaps using its host as a router, or is there something else I need to do? How can I achieve connectivity between this pod and the GCE instance?
Network routes are only effective within their own VPC. Say a request from POD₁ reaches VM₁: VPC₁ does not know how to route the reply back to POD₁. To solve this, you just need to SNAT traffic that originates from the Pod CIDR range in VPC₂ and is heading to VPC₁.
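Roughly, the kind of rule each node then needs is the following (both CIDRs are placeholders -- substitute your cluster's Pod range and VPC₁'s subnet):

    # masquerade pod traffic headed for VPC1 so the replies route back via the node IP
    sudo iptables -t nat -A POSTROUTING -s 10.52.0.0/14 -d 10.128.0.0/20 -j MASQUERADE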
Here is a simple DaemonSet that can help inject iptables rules like that into your GKE cluster. It SNATs traffic based on custom destinations.
https://github.com/bowei/k8s-custom-iptables
Of course, the firewall rules need to be set up properly.
Or, if possible, you can create your cluster(s) as VPC-native and it will work automatically.
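If you go that route, a VPC-native (alias IP) cluster is created with something like the following; check the flag against your gcloud version:

    gcloud container clusters create my-cluster --enable-ip-alias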
I have two dedicated servers and no hardware firewall. I'd like to forward all requests that come into the primary server on port 1008 to be fulfilled by another dedicated server on the same network. I know I need to set up some kind of TCP proxy, but I first heard of IPTables yesterday. Any quick tips?
The easiest way is to use something which is just a TCP proxy. It is possible to achieve this using iptables, but not easy.
It is easy to forward requests from A intended for B to C (using DNAT) but harder to get C's responses to go back to A via B (because DNAT does not change the sender address). A would then ignore the responses as they'd be coming from C rather than B.
Essentially, the way to do it would be to set up B as C's default gateway and use forwarding; however, this places an additional point of failure in B: if B fails, C's outbound traffic (including responses to requests sent to C directly) would end up going down a black hole.
Using iptables it is possible without this routing trick, I think, but you'd need to have the same connection both SNAT'd and DNAT'd, which is tricky at best.
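For the record, the SNAT'd-and-DNAT'd variant looks roughly like this on B (192.0.2.10 stands for B and 192.0.2.20 for C, both placeholders; note that C then sees every connection as coming from B rather than from the real client):

    # on B: forward port 1008 to C and rewrite the source so replies return via B
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -A PREROUTING  -p tcp --dport 1008 -j DNAT --to-destination 192.0.2.20:1008
    iptables -t nat -A POSTROUTING -p tcp -d 192.0.2.20 --dport 1008 -j SNAT --to-source 192.0.2.10
    iptables -A FORWARD -p tcp -d 192.0.2.20 --dport 1008 -j ACCEPT
    iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT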
Normally in such situations, most people put another host (a firewall) in front of the two machines and have it make the DNAT decisions. Of course this introduces a point of failure as well, which is why in critical setups the firewall typically has a redundant backup (its configuration is synchronised, and sometimes its connection tracking table is too).
Carson is right: put a bridging firewall between servers and clients. Shorewall (iptables) can then redirect traffic to different ports and different machines.
With the firewall being a bridge, you don't have to change your network settings, but the bridge interface has to be assigned an IP address in each network that the clients and servers are in; otherwise the redirection won't work.
Caveat: the machine the connection was originally pointed at has to be online (meaning its IP address has to be in use), or else the redirect won't work.
If the redirection is meant as a means of failover for high availability, I would consider a load balancer (cluster) instead of the firewall, which leads to Linux Virtual Server (for a general approach) or to load balancer software like Apache (with mod_proxy_balancer), balance or Pound (if only HTTP requests are to be balanced). There are also hardware appliances, like those from F5, for load balancing.
First off, I would recommend you get a firewall in place. I've used Shorewall for a long time to manage iptables and it is pretty easy to configure. Second, if you use something like Shorewall there are easy guides on how to do DNAT (port forwarding).
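For reference, the Shorewall version of that DNAT is a one-line rule. This is a rough sketch assuming the default net/loc zone names and that 10.0.0.20 is the second server; check the column order against your Shorewall version:

    cat >> /etc/shorewall/rules <<'EOF'
    # ACTION   SOURCE   DEST                  PROTO   DPORT
    DNAT       net      loc:10.0.0.20:1008    tcp     1008
    EOF
    shorewall check && shorewall restart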