Kubernetes API server failing to start: TLS handshake error - networking

Out of nowhere one of our API servers has started to fail with the following error:
http: TLS handshake error from 172.23.88.213:17244: EOF
It throws this error for every single node in the cluster, thus failing to start. This started happening this morning with no changes to any infrastructure.
Things I've tried that haven't helped:
Manually restarted the weave Docker container on the master node.
Manually killed and rescheduled the api-server.
Manually restarted the Docker daemon.
Manually restarted the kubelet service.
Checked that all SSL certs are valid, which they are (see the openssl sketch after this list).
Checked inodes: thousands free.
Pinged the IP addresses of the other nodes in the cluster; all return OK with 0% packet loss.
Checked the journalctl and systemctl logs of the kubelet services; the only significant errors I see relate to the TLS handshake error.
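For reference, roughly how I checked the certs (a minimal sketch; the cert path and master address below are assumptions based on a typical kops master, so adjust them for your setup):
# Assumed locations -- point these at your actual API server cert and endpoint.
APISERVER_CERT=/srv/kubernetes/server.cert
MASTER_IP=172.23.88.10    # hypothetical master address
# Expiry dates and subject of the serving certificate.
sudo openssl x509 -in "$APISERVER_CERT" -noout -dates -subject
# Exercise the TLS handshake the way a client would and inspect what is served.
openssl s_client -connect "${MASTER_IP}:443" </dev/null 2>/dev/null | openssl x509 -noout -dates -issuer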
Cluster specs:
Cloud provider: AWS
Kubernetes version: 1.11.6
Kubelet version: 1.11.6
Kops version: 1.11
I'm at a bit of a loss as to how to debug this further.

Related

What is NGINX [notice] signal process started error message

Regarding the nginx error log, what does 2020/10/23 06:51:45 [notice] 361#361: signal process started mean?
Some more context:
I have some Raspberry Pis communicating with my Django application on a DigitalOcean Ubuntu droplet running nginx as the web server. These Raspberry Pis stopped communicating with my server, and they are physically very far from me. I can see their last communication with my server was at 2020/10/23 06:51:41, and then they stopped (seconds before the nginx error message was logged).
A user who has access to the Pis said they did not lose power and the internet is working, so they rebooted them; still nothing.
I have tried:
sudo systemctl restart nginx followed by sudo systemctl restart gunicorn
This did not resolve the issue. I can't seem to find any documentation on this error.
Have you verified the firewall/security group of your DigitalOcean instance? I suspect the ports for HTTP and HTTPS are not open. Cross-check ports 80 and 443: are they open or closed? (A quick check is sketched below.)
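For example, a quick way to cross-check from both sides (the domain below is a placeholder, and ufw is only one possible firewall on a droplet):
# On the droplet: is nginx actually bound to 80/443?
sudo ss -tlnp | grep -E ':(80|443)\b'
# Local firewall state, if ufw is in use.
sudo ufw status
# From outside the droplet (replace example.com with your server's address):
curl -sv --max-time 10 http://example.com/ -o /dev/null
curl -skv --max-time 10 https://example.com/ -o /dev/null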

Kubernetes Ingress nginx on Minikube fails

minikube v1.13.0 on Ubuntu 18.04 with Kubernetes v1.19.0 on Docker 19.03.8, using helm/helmfile ("v3.3.4"). The Ubuntu VM runs on VM-Workstation on Win10 with networking set to NAT; everything is on my home Wi-Fi network.
I am trying to use the ingress backend stable/nginx-ingress 1.36.0. I do have nginx-ingress-1.36.0.tgz in the ingress/charts folder, and I have enabled the ingress addon (minikube addons enable ingress).
Before I enabled ingress on minikube, everything would deploy successfully (no errors), but the service/LB stayed pending:
ClusterIP 10.101.41.156 <none> 8080/TCP
ingress-controller-nginx-ingress-controller LoadBalancer 10.98.157.222 <pending> 80:30050/TCP,443:32294/TCP
After I enabled ingress on minikube, I now get this connection refused error:
STDERR:
Error: UPGRADE FAILED: cannot patch "ingress-service" with kind Ingress:
Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.kube-system.svc:443/extensions/v1beta1/ingresses?timeout=30s":
dial tcp 10.105.131.220:443: connect: connection refused
I don't know what this IP 10.105.131.220 is; it looks like a private IP. It is not my minikube IP, my VM IP, or my laptop IP, and I can't ping it.
Everything still deploys fine, but the LoadBalancer still shows pending.
Update
I had missed one of the steps from the documentation:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.30.0/deploy/static/mandatory.yaml
I stopped/deleted minikube and redid everything; now the error is gone, but the LoadBalancer is still <pending>.
By default, local solutions like minikube do not provide a LoadBalancer implementation. Cloud offerings like EKS, Google Cloud, and Azure do it for you automatically by spinning up a separate load balancer in the background. That's why you see the Pending status.
Solutions:
use MetalLB on minikube
MetalLB hooks into your Kubernetes cluster, and provides a network load-balancer implementation. In short, it allows you to create Kubernetes services of type LoadBalancer in clusters that don’t run on a cloud provider, and thus cannot simply hook into paid products to provide load-balancers.
Installation:
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
namespace/metallb-system created
podsecuritypolicy.policy/speaker created
serviceaccount/controller created
serviceaccount/speaker created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
role.rbac.authorization.k8s.io/config-watcher created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/config-watcher created
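After the manifests are applied, MetalLB still needs an address pool to hand out. A minimal layer-2 sketch for the v0.8.x ConfigMap-based configuration (the address range is an assumption; pick free addresses on the same subnet as your minikube node):
# The address range below is an assumption -- choose free addresses on the
# same subnet as `minikube ip` reports.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.99.230-192.168.99.250
EOF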
use minikube tunnel
Services of type LoadBalancer can be exposed via the minikube tunnel
command. It must be run in a separate terminal window to keep the
LoadBalancer running. Ctrl-C in the terminal can be used to terminate
the process at which time the network routes will be cleaned up.
minikube tunnel runs as a process, creating a network route on the host to the service CIDR of the cluster using the cluster’s IP
address as a gateway. The tunnel command exposes the external IP
directly to any program running on the host operating system.
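For example, with the tunnel running in its own terminal, the ingress controller service from the output above should pick up an external IP:
# Terminal 1: keep this running; it creates the route (may prompt for sudo).
minikube tunnel
# Terminal 2: EXTERNAL-IP should change from <pending> to a real address.
kubectl get svc ingress-controller-nginx-ingress-controller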

Kubernetes Dashboard : Dashboard keeps cancelling connection to pod, resulting in bad gateway to user

I am using kubernetes-dashboard to view all pods, check status, log in, pass commands, etc. It works well, but there are a lot of connectivity issues related to it. I am currently running it on port 8443 and forwarding the connection from 443 to 8443 via nginx's proxy_pass. But I keep getting bad gateway, and the connection keeps dropping. It's not an nginx issue, since I get a Kubernetes error. I am using a Let's Encrypt certificate in nginx. What am I doing wrong?
Error log :
E0831 05:31:45.839693 11324 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:8443->127.0.0.1:33380: read: connection reset by peer
E0831 05:33:22.971448 11324 portforward.go:340] error creating error stream for port 8443 -> 8443: Timeout occured
These are the two errors I constantly get. I am running this command as a nohup process:
nohup kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8443:443 --address 0.0.0.0 &
And finally my nginx config:
default:
location / {
    proxy_intercept_errors off;
    proxy_pass https://localhost:8443/;
}
Thank you. :-)
Unfortunately this is an ongoing issue with Kubernetes' port forwarding. You may find it not particularly reliable when used for long-running connections. If possible, try to set up a direct connection instead. More extended discussions regarding this can be found here and here.
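One way to get a direct connection is to expose the dashboard Service itself instead of tunnelling through kubectl; a rough sketch (the NodePort approach is just one option, and the namespace/service names assume a standard dashboard install):
# Switch the dashboard Service to NodePort so it is reachable without port-forward.
kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard -p '{"spec": {"type": "NodePort"}}'
# Look up the assigned node port and a node address.
kubectl -n kubernetes-dashboard get svc kubernetes-dashboard
kubectl get nodes -o wide
# Then point nginx at it instead of the forwarded local port, e.g.
#   proxy_pass https://<node-ip>:<node-port>/;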

serverspec test fails for listening port

I have a kitchen-ansible test that runs Serverspec as a verifier. The test runs on two containers: one running Amazon Linux 1 and the other Amazon Linux 2. The Ansible code installs a Keycloak server, which listens on ports 8080 and 8443.
In the Amazon Linux 1 container, everything is fine and Serverspec reports the ports as listening.
In the Amazon Linux 2 container, the installation also finishes without errors, but Serverspec reports the ports as not listening. As I found out, Serverspec is wrong.
After logging into the container and running netstat -tulpen | grep LISTEN, it shows the ports as listening. Serverspec checks with the ss command: /bin/sh -c ss\ -tunl\ \|\ grep\ -E\ --\ :8443\\\
So I logged in to the Amazon Linux 1 container to check the output of the ss command there, and it showed no listening on either port.
So does anyone have a clue why Serverspec succeeds on Amazon Linux 1 and fails on Amazon Linux 2, even though in both containers the ss command reports no listening ports?
The root cause was that the ports aren't bound quickly enough: Serverspec starts checking before the service has completely started. Logging in to the container manually takes more time, so by then the service has started successfully and the ports are bound.
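A simple workaround is to wait for the ports before the verifier runs, e.g. with a small poll loop executed in the container first (the ~2 minute timeout is arbitrary):
# Poll until Keycloak has actually bound its ports, or give up after ~2 minutes.
for i in $(seq 1 60); do
  if ss -tnl | grep -qE ':(8080|8443)\b'; then
    echo "ports are listening"
    break
  fi
  sleep 2
done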

Creating docker repo in Artifactory with dedicated port, it says "SocketException: Permission denied"

I am running Artifactory Pro (5.3.1) and was trying to use the Docker registry functionality.
I created a Docker repository and gave it port 5001 in the "Registry Port" config.
However, there's nothing running on port 5001 ("telnet localhost 5001" refuses to connect), and the logs show this:
[http-nio-8081-exec-7] [ERROR] (o.a.s.s.SshAuthServiceImpl:210) - Failed to start SSH server
java.net.SocketException: Permission denied
at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_72-internal]
at sun.nio.ch.Net.bind(Net.java:433) ~[na:1.8.0_72-internal]
at sun.nio.ch.Net.bind(Net.java:425) ~[na:1.8.0_72-internal]
at sun.nio.ch.AsynchronousServerSocketChannelImpl.bind(AsynchronousServerSocketChannelImpl.java:162) ~[na:1.8.0_72-internal]
at org.apache.sshd.common.io.nio2.Nio2Acceptor.bind(Nio2Acceptor.java:66) ~[sshd-core-0.14.0.jar:0.14.0]
Any idea what could cause a "permission denied"? There's nothing running on that port (same error for any other port). It's on Ubuntu 14.04.
I had a misunderstanding of how the Docker registry works with Artifactory.
The Artifactory service doesn't actually open the port assigned to the repo (5001 in this case); instead, the reverse proxy listens on it and forwards requests (with the right X-Forwarded-Port header) to the "normal" Artifactory service port (e.g. 8081).
After setting up the reverse proxy for it, it worked fine.
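For completeness, the nginx side looks roughly like this; it's only a sketch (the server name, repository key docker-local, and file path are assumptions, and the real config is best generated from Artifactory's reverse proxy settings):
# Written as a sketch -- substitute your own names, paths and repo key.
# TLS is omitted for brevity; Docker clients normally require HTTPS unless
# the registry is marked insecure.
sudo tee /etc/nginx/conf.d/artifactory-docker.conf >/dev/null <<'EOF'
server {
    listen 5001;                           # the repo's "Registry Port"; nginx owns it, not Artifactory
    server_name artifactory.example.com;   # hypothetical

    location / {
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Port  5001;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Hand everything to the normal Artifactory port and the Docker repo key.
        proxy_pass http://localhost:8081/artifactory/api/docker/docker-local/;
    }
}
EOF
sudo nginx -t && sudo nginx -s reload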

Resources