K3S Rancher - Debian based docker images not resolving dns properly - networking

I am facing strange problem.
What I've done:
I deployed Rancher K3S cluster and there is a problem in dns resolving with debian based images. Domains are not resolved properly - it adds suffix to it with one of ours domain.
What I've found:
Debian based image adds suffix with domain to the end. e.g. I ping google.com and its pinging google.com.example.com. (example.com is one of our domains - not specifing it because it is not important imo)
The same for curl google.com makes request to ip address of example.com . Even tried pure debian image and it is still doing the same issue.
Alpine based images works fine (ping to google.com pings google.com, nslookup shows right ip address).
Host server where k3s is installed also works fine (redhat os). Ping to google.com pings google.com.
Some additional data that can maybe help you:
CoreDNS configmap
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
hosts /etc/coredns/NodeHosts {
reload 1s
fallthrough
}
prometheus :9153
forward . 8.8.8.8
cache 30
loop
reload
loadbalance
}
Does anyone faced the same or similar problem?
Do you have some points to push me towards solving?
Thanks,
David

I faced similar issues with k3s (v.v1.19.3+k3s3) on centos 8 (not quite sure it has anything to do with the images' OS, though). k3s is a bit less plug and play that other distro like microk8s.
Use local DNS parameter
On each node, you could say that you want to use the host's resolv parameters. If k3s is managed as systemd service (which is probably the case), you could just edit /etc/systemd/system/k3s.service.env to add you system's resolv.conf
K3S_RESOLV_CONF=/etc/resolv.conf
and then restart the service
sudo systemctl status k3s
plus: the easiest solution, easily scriptable
cons: you'll need to do it on each of your nodes (from what I understand). Different resolv.conf on different systems involves that the very same deployment might not act the same way depending on the nodes used by kube
relevant documentation
Use Global DNS
Haven't tried but here is the doc

Related

Cannot see initial page from nginx in other computers in same LAN on Fedora 35

I have installed it with command "sudo yum install ngingx" and its visible from computer host using its own ip in the browser, but in other computer in the same LAN and resolving ping it doesnt work and answers a timeout error. I know there is a /etc/nginx/nginx.conf file but I didnt see any valid configuration to resolve this (or I didnt search very well).
Machine has internet and resolves ping to other machines in lan
Could somebody guide me?
I use virtualbox to run Fedora
Thank you, here I left nginx.conf enter image description here
First of all, are you using bridge mode in virtualBox? If so and this is still not working, check if Fedora has enabled the firewall by typing in a shell:
systemctl status firewalld.service
If active, check the zone where the main adapter is configured
firewall-cmd --get-active-zones
Add ports 443 and 80 to the zone related to your interface (for instance FedoraWorkstation)
firewall-cmd --zone=FedoraWorkstation --permanent --add-port=80/tcp
firewall-cmd --zone=FedoraWorkstation --permanent --add-port=443/tcp
This should do the trick
Finally the problem was that apache is installed by default in fedora workstation and main page of nginx was showing apache as current web server, so any changes made in nginx wasnt being loaded. Solution: purge apache from system and reboot. Now nginx load as the main web server and changes made are applied

How to fix Kubernetes Ingress Controller cutting off nodes from cluster

I'm having some trouble installing an Ingress Controller in my on-prem cluster (created with Kubespray, running MetalLB to create LoadBalancer.).
I tried using nginx, traefik and kong but all got the same results.
I'm installing my the nginx helm chart using the following values.yaml:
controller:
kind: DaemonSet
nodeSelector:
node-role.kubernetes.io/master: ""
image:
tag: 0.23.0
rbac:
create: true
With command:
helm install --name nginx stable/nginx-ingress --values values.yaml --namespace ingress-nginx
When I deploy the ingress controller in the cluster, a service is created (e.g. nginx-ingress-controller for nginx). This service is of the type LoadBalancer and gets an external IP.
When this external IP is assigned, the node that's linked to this external IP is lost (status Not Ready). However, when I check this node, it's still running, it's just cut off from the other
nodes, it can't even ping them (No route found). When I remove the service (not the rest of the nginx helm chart), everything works and the Ingress works. I also tried installing nginx/traefik/kong without a LoadBalancer using NodePorts or External IPs on the service, but I get the same result.
Does anyone recognize this behaviour?
Why does the ingress still work, even when I remove the nginx-ingress-controller service?
After a long search, we finally found a working solution for this problem.
As mentioned by #A_Suh, the pool of IPs that metallb uses, should contain IPs that are currently not used by one of the nodes in the cluster. By adding a new IP range that's also configured in the DHCP server, metallb can use ARP to link one of the IPs to one of the nodes.
For example in my 5 node cluster (kube11-15): When metallb gets the range 10.4.5.200/31 and allocates 10.4.5.200 for my nginx-ingress-controller, 10.4.5.200 is linked to kube12. On ARP requests for 10.4.5.200, all 5 nodes respond with kube12 and trafic will be routed to this node.

how to access local kubernetes minikube dashboard remotely

Kubernetes newbie (or rather basic networking) question:
Installed single node minikube (0.23 release) on a ubuntu box running in my lan (on IP address 192.168.0.20) with virtualbox.
minikube start command completes successfully as well
minikube start
Starting local Kubernetes v1.8.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
minikube dashboard also comes up successfully. (running on 192.168.99.100:30000)
what i want to do is access minikube dashboard from my macbook (running on 192.168.0.11) in the same LAN.
Also I want to access the same minikube dashboard from the internet.
For LAN Access:
Now from what i understand i am using virtualbox (the default vm option), i can change the networking type (to NAT with port forwarding) using vboxnet command
VBoxManage modifyvm "VM name" --natpf1 "guestssh,tcp,,2222,,22"
as listed here
In my case it will be something like this
VBoxManage modifyvm "VM name" --natpf1 "guesthttp,http,,30000,,8080"
Am i thinking along the right lines here?
Also for remotely accessing the same minikube dashboard address, i can setup a no-ip.com like service. They asked to install their utility on linux box and also setup port forwarding in the router settings which will port forward from host port to guest port. Is that about right? Am i missing something here?
I was able to get running with something as simple as:
kubectl proxy --address='0.0.0.0' --disable-filter=true
#Jeff provided the perfect answer, put more hints for newbies.
Start a proxy using #Jeff's script, as default it will open a proxy on '0.0.0.0:8001'.
kubectl proxy --address='0.0.0.0' --disable-filter=true
Visit the dashboard via the link below:
curl http://your_api_server_ip:8001/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/
More details please refer to the officially doc.
I reached this url with search keywords: minikube dashboard remote.
In my case, minikube (and its dashboard) were running remotely and I wanted to access it securely from my laptop.
[my laptop] --ssh--> [remote server with minikube]
Following gmiretti's answer, my solution was local forwarding ssh tunnel:
On minikube remote server, ran these:
minikube dashboard
kubectl proxy
And on my laptop, ran these (keep localhost as is):
ssh -L 12345:localhost:8001 myLogin#myRemoteServer
The dashboard was then available at this url on my laptop:
http://localhost:12345/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
The ssh way
Assuming that you have ssh on your ubuntu box.
First run kubectl proxy & to expose the dashboard on http://localhost:8001
Then expose the dashboard using ssh's port forwarding, executing:
ssh -R 30000:127.0.0.1:8001 $USER#192.168.0.20
Now you should access the dashboard from your macbook in your LAN pointing the browser to http://192.168.0.20:30000
To expose it from outside, just expose the port 30000 using no-ip.com, maybe change it to some standard port, like 80.
Note that isn't the simplest solution but in some places would work without having superuser rights ;) You can automate the login after restarts of the ubuntu box using a init script and setting public key for connection.
I had the same problem recently and solved it as follows:
Get your minikube VM onto the LAN by adding another network adapter in bridge network mode. For me, this was done through modifying the minikube VM in the VirtualBox UI and required VM stop/start. Not sure how this would work if you're using hyperkit. Don't muck with the default network adapters configured by minikube: minikube depends on these. https://github.com/kubernetes/minikube/issues/1471
If you haven't already, install kubectl on your mac: https://kubernetes.io/docs/tasks/tools/install-kubectl/
Add a cluster and associated config to the ~/.kube/config as below, modifying the server IP address to match your newly exposed VM IP. Names can also be modified if desired. Note that the insecure-skip-tls-verify: true is needed because the https certificate generated by minikube is only valid for the internal IP addresses of the VM.
clusters:
- cluster:
insecure-skip-tls-verify: true
server: https://192.168.0.101:8443
name: mykubevm
contexts:
- context:
cluster: mykubevm
user: kubeuser
name: mykubevm
users:
- name: kubeuser
user:
client-certificate: /Users/myname/.minikube/client.crt
client-key: /Users/myname/.minikube/client.key
Copy the ~/.minikube/client.* files referenced in the config from your linux minikube host. These are the security key files required for access.
Set your kubectl context: kubectl config set-context mykubevm. At this point, your minikube cluster should be accessible (try kubectl cluster-info).
Run kubectl proxy http://localhost:8000 to create a local proxy for access to the dashboard. Navigate to that address in your browser.
It's also possible to ssh to the minikube VM. Copy the ssh key pair from ~/.minikube/machines/minikube/id_rsa* to your .ssh directory (renaming to avoid blowing away other keys, e.g. mykubevm & mykubevm.pub). Then ssh -i ~/.ssh/mykubevm docker#<kubevm-IP>
Thanks for your valuable answers, If you have to use the kubectl proxy command unable to view permanently, using the below "Service" object in YAML file able to view remotely until you stopped it. Create a new yaml file minikube-dashboard.yaml and write the code manually, I don't recommend copy and paste it.
apiVersion : v1
kind: Service
metadata:
labels:
app: kubernetes-dashboard
name: kubernetes-dashboard-test
namespace: kube-system
spec:
ports:
- port: 80
protocol: TCP
targetPort: 9090
nodePort: 30000
selector:
app: kubernetes-dashboard
type: NodePort
Execute the command,
$ sudo kubectl apply -f minikube-dashboard.yaml
Finally, open the URL:
http://your-public-ip-address:30000/#!/persistentvolume?namespace=default
Slight variation on the approach above.
I have an http web service with NodePort 30003. I make it available on port 80 externally by running:
sudo ssh -v -i ~/.ssh/id_rsa -N -L 0.0.0.0:80:localhost:30003 ${USER}#$(hostname)
Jeff Prouty added useful answer:
I was able to get running with something as simple as:
kubectl proxy --address='0.0.0.0' --disable-filter=true
But for me it didn't worked initially.
I run this command on the CentOS 7 machine with running kubectl (local IP: 192.168.0.20).
When I tried to access dashboard from another computer (which was in LAN obviously):
http://192.168.0.20:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/
then only timeout was in my web browser.
The solution for my case is that in CentOS 7 (and probably other distros) you need to open port 8001 in your OS firewall.
So in my case I need to run in CentOS 7 terminal:
sudo firewall-cmd --zone=public --add-port=8001/tcp --permanent
sudo firewall-cmd --reload
And after that. It works! :)
Of course you need to be aware that this is not safe solution, because anybody have access to your dashbord now. But I think that for local lab testing it will be sufficient.
In other linux distros, command for opening ports in firewall can be different. Please use google for that.
Wanted to link this answer by iamnat.
https://stackoverflow.com/a/40773822
Use minikube ip to get your minikube ip on the host machine
Create the NodePort service
You should be able to access the configured NodePort id via < minikubeip >:< nodeport >
This should work on the LAN as well as long as firewalls are open, if I'm not mistaken.
Just for my learning purposes I solved this issue using nginx proxy_pass. For example if the dashboard has been bound to a port, lets say 43587. So my local url to that dashboard was
http://127.0.0.1:43587/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
Then I installed nginx and went to the out of the box config
sudo nano /etc/nginx/sites-available/default
and edited the location directive to look like this:
location / {
proxy_set_header Host "localhost";
proxy_pass http://127.0.0.1:43587;
}
then I did
sudo service nginx restart
then the dashboard was available from outside at:
http://my_server_ip/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/#/cronjob?namespace=default

kubernetes: a service is not accessible outside host

I am following the guide at http://kubernetes.io/docs/getting-started-guides/ubuntu/ to create a kubernetes cluster. Once the cluster is up, i can create pods and services using kubectl. Basically, do the following
kubectl run nginx --image=nginx --port=80
kubectl expose deployment/nginx
I see a pod and service running
# kubectl get services
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 192.168.3.1 <none> 443/TCP 2d
nginx 192.168.3.208 <none> 80/TCP 2d
When I try to access the service from the machine where the pod is running, I get back the nginx helloworld page. But if i try it another machine in the kubernetes cluster, i get a timeout.
I thought all the services are accessible anywhere in the cluster. Why could it not be working that way?
Thanks
Yes, services should be accessible anywhere in the cluster. Is your "another machine" listed in the output of kubectl get nodes? Is the node Ready? Maybe the machine wasn't configured correctly.
If you want to get the servicer anywherer in the cluster, You must use the network plug-in,such as Flannel,OpenVSwitch.
http://kubernetes.io/docs/admin/networking/#flannel
https://github.com/coreos/flannel#flannel
found out my error by comparing it with another installation where it worked. This installation was missing an iptables rule that forced everything going to the containers onto the flannel interface. So the traffic was reaching the target host on eth0 making it discard the packet. I donot know why the proxy didnt add that rule. Once i manually added it, it worked.

Hadoop Datanodes cannot find NameNode

I've set up a distributed Hadoop environment within VirtualBox: 4 virtual Ubuntu 11.10 installations, one acting as the master node, the other three as slaves. I followed this tutorial to get the single-node version up and running and then converted to the fully-distributed version. It was working just fine when I was running 11.04; however, when I upgraded to 11.10, it broke. Now all my slaves' logs show the following error message, repeated ad nauseum:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 0 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 1 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 2 time(s).
And so on. I've found other instances of this error message on the Internet (and StackOverflow) but none of the solutions have worked (tried changing the core-site.xml and mapred-site.xml entries to be the IP address rather than hostname; quadruple-checked /etc/hosts on all slaves and master; master can SSH password-less into all slaves). I even tried reverting each slave back to a single-node setup, and they would all work fine in this case (on that note, the master always works fine as both a Datanode and the Namenode).
The only symptom I've found that would seem to give a lead is that from any of the slaves, when I attempt a telnet 192.168.1.10 54310, I get Connection refused, suggesting there is some rule blocking access (which must have gone into effect when I upgraded to 11.10).
My /etc/hosts.allow has not changed, however. I tried the rule ALL: 192.168.1., but it did not change the behavior.
Oh yes, and netstat on the master clearly shows tcp ports 54310 and 54311 are listening.
Anyone have any suggestions to get the slave Datanodes to recognize the Namenode?
EDIT #1: In doing some poking around with nmap (see comments on this post), I'm thinking the issue is in my /etc/hosts files. This is what is listed for the master VM:
127.0.0.1 localhost
127.0.1.1 master
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3
For each slave VM:
127.0.0.1 localhost
127.0.1.1 slaveX
192.168.1.10 master
192.168.1.1X slaveX
Unfortunately, I'm not sure what I changed, but the NameNode is now always dying with the exception of trying to bind a port "that's already in use" (127.0.1.1:54310). I'm clearly doing something wrong with the hostnames and IP addresses, but I'm really not sure what it is. Thoughts?
I found it! By commenting out the second line of the /etc/hosts file (the one with the 127.0.1.1 entry), netstat shows the NameNode ports binding to the 192.168.1.10 address instead of the local one, and the slave VMs found it. Ahhhhhhhh. Mystery solved! Thanks for everyone's help.
This solution worked for me. i.e make sure that the name you used in property in core-site.xml and mapred-site.xml :
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<final>true</final>
</property>
i.e. master is defined in /etc/hosts as xyz.xyz.xyz.xyz master on BOTH master and slave nodes.
Then restart the namenode and check using
netstat -tuplen
and to see that it is bound to the "external" IP address
tcp 0 xyz.xyz.xyz.xyz:54310 0.0.0.0:* LISTEN 102 107203 -
and NOT local IP 192.168.x.y or 127.0.x.y
I had the same trouble. #Magsol solution worked but it should be noted that the entry that needs to be commented out is
127.0.1.1 masterxyz
on the master machine, not the 127.0.1.1 on the slave, though I did that too. Also you need to stop-all.sh and start-all.sh for hadoop, probably obvious.
Once you have restarted hadoop check the nodemaster here: http://masterxyz:50030/jobtracker.jsp
and look at the number of nodes available for jobs.
Though this response is not the solution the author is looking for, other users might land on this page thinking otherwise, so if you are using AWS for setting up your cluster, it is likely that ICMP security rules haven't been enabled in AWS Security Groups page. Look at the following: Pinging EC2 instances
The above solved the connectivity issue from data nodes to master nodes. Ensure that you can ping between each instance.
I am running a 2-nodes cluster.
192.168.0.24 master
192.168.0.26 worker2
I was facing the same problem of Retrying connect to server: master/192.168.0.24:54310 in my worker2 machine logs. But the people mentioned above encountered errors running this command - telnet 192.168.0.24 54310. However, in my case the telnet command worked fine. Then I checked my /etc/hosts file
master /etc/hosts
127.0.0.1 localhost
192.168.0.24 ubuntu
192.168.0.24 master
192.168.0.26 worker2
worker2 /etc/hosts
127.0.0.1 localhost
192.168.0.26 ubuntu
192.168.0.24 master
192.168.0.26 worker2
When I hit http://localhost:50070 on master, I saw Live nodes : 2. But when I clicked on it, I saw only one datanode which was of master's. I checked jps both on master and worker2. Datanode process was running on both the machines.
Then after several trial and errors, I realized that my master and worker2 machines had the same host name "ubuntu". I changed the worker2's hostname from "ubuntu" to "worker2" and removed the "ubuntu" entry from the worker2 machine.
Note: To change the hostname edit the /etc/hostname with sudo.
Bingo! It worked :) I was able to see two datanodes on the dfshealth UI page ( locahost:50070)
I also faced similar issue. (I am using ubuntu 17.0)
I kept only the entries of master and slaves in /etc/hosts file. (in both master and slave machines)
127.0.0.1 localhost
192.168.201.101 master
192.168.201.102 slave1
192.168.201.103 slave2
secondly, > sudo gedit /etc/hosts.allow
and add the entry : ALL:192.168.201.
thirdly, disabled the firewall using sudo ufw disable
finally, I deleted both namenode and datanode folders from all the nodes in cluster, and rerun
$HADOOP_HOME/bin> hdfs namenode -format -force
$HADOOP_HOME/sbin> ./start-dfs.sh
$HADOOP_HOME/sbin> ./start-yarn.sh
To check the health report from command line (which I would recommend)
$HADOOP_HOME/bin> hdfs dfsadmin -report
and I got all the nodes working correctly.

Resources