I am having difficulty getting Apache Airflow to work behind a reverse proxy (Nginx) when running on Kubernetes. I installed it from the Helm stable/airflow chart and enabled creation of the Ingress resource. What I am trying to do is configure the Nginx ingress controller to route requests arriving at the public IP to the airflow-web ClusterIP service and on to the airflow-web pod.
I have attempted to follow the official documentation and several other issues that have popped up on StackOverflow 1, 2, and 3. All of the issues I've experienced relate to connecting Airflow and Nginx. I feel that I (as well as others) am not understanding the concepts needed to tie Airflow and an Nginx reverse proxy together. Is anyone able to explain the meaning of the additional configuration and why it's needed (relating to the official documentation)? With that as a basis, I think I will understand how to configure it on my Kubernetes setup.
We have a Kubernetes cluster running in Azure (AKS) with an Nginx controller as an ingress. I would like to track all incoming requests. Solutions like Prometheus with Grafana are not working, because the tracking should be highly customized.
I already found that Traefik implemented middlewares (https://doc.traefik.io/traefik/middlewares/overview/) which would be a great solution. Is there also a similar solution that I can use with Nginx?
So if I have 10 services that I need to expose to the outside world and use path-based routing to connect to different services, I can create an Nginx pod and a service of type LoadBalancer.
I can then create Nginx configurations that route to different services depending on the URL path. After exploring more, I learned about Nginx ingress, which can do the same. What is the difference between the two approaches, and which approach is better?
In both cases, you are running an Nginx reverse proxy in a Kubernetes pod inside the cluster. There is not much technical difference between them.
If you run the proxy yourself, you have complete control over the Nginx version and the configuration, which could be desirable if you have very specific needs and you are already an Nginx expert. The most significant downside is that you need to manually reconfigure and redeploy the central proxy if you add or change a service.
If you use Kubernetes ingress, the cluster maintains the Nginx configuration for you, and it regenerates it based on Ingress objects that can be deployed per-service. This is easier to maintain if you are not an Nginx expert, and you can add and remove services without touching the centralized configuration.
The Kubernetes ingress system in principle can also plug in alternate proxies; it is not limited to Nginx. However, its available configuration is somewhat limited, and some of the other proxies I've looked at recommend using their own custom Kubernetes resources instead of the standard Ingress objects.
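To make the maintenance difference concrete, here is a minimal Python sketch of the "regenerate the proxy configuration from a routing table" step that you would do by hand with a self-managed proxy and that an ingress controller does for you whenever Ingress objects change. The function name, the service names, and the ports are all hypothetical, and the rendered directives are a simplified illustration, not the configuration that any real controller emits.

```python
from typing import Dict


def render_nginx_locations(routes: Dict[str, str]) -> str:
    """Render one Nginx `location` block per path prefix.

    `routes` maps a URL path prefix to an upstream "service:port" string.
    Longer prefixes are emitted first only for readability; for plain
    prefix locations Nginx picks the longest match regardless of order.
    """
    blocks = []
    for path in sorted(routes, key=len, reverse=True):
        upstream = routes[path]
        blocks.append(
            f"location {path} {{\n"
            f"    proxy_pass http://{upstream};\n"
            f"    proxy_set_header Host $host;\n"
            f"}}"
        )
    return "\n".join(blocks)


# Hypothetical services. With a hand-run proxy you edit this map yourself.
routes = {"/api": "api-svc:8080", "/": "web-svc:80"}
config = render_nginx_locations(routes)

# Adding a service means regenerating and redeploying the whole config.
# This is the step an ingress controller automates per Ingress object.
routes["/reports"] = "reports-svc:9000"
updated = render_nginx_locations(routes)
print(updated)
```

With the self-managed approach, the `routes` table and the redeploy are your responsibility; with Ingress, each service ships its own small routing entry and the controller rebuilds the central configuration.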
I understand there are various ways to get external traffic into the cluster: Ingress, cluster IP, node port and load balancer. I am looking at Ingress in particular, and according to the documentation k8s supports AKS, EKS & Nginx controllers.
https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/
To implement Ingress, I understand that we need to configure an Ingress Controller in the cluster. My question is whether the Nginx ingress controller and proxy are part of core k8s (packaged / embedded). I might have overlooked it, but I did not find any documentation that mentions this. Any insight, or a pointer to documentation if the above is true, is highly appreciated.
The first lines of the page you linked state that no controller is started automatically with a cluster and that you must choose one yourself, depending on your requirements:
Ingress controllers are not started automatically with a cluster. Use
this page to choose the ingress controller implementation that best
fits your cluster.
Kubernetes defines Ingress, IngressClass and other ingress-related resources, but a fresh installation does not come with a default controller.
Some prepackaged Kubernetes distributions (microk8s, minikube, etc.) ship with an ingress controller, which usually needs to be enabled manually during the installation/configuration phase.
I followed this guide: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes on how to set up an Nginx Ingress with Cert Manager on Kubernetes with DigitalOcean as the cloud provider.
The tutorial worked fine, and I was able to set everything up as written. However, as the tutorial itself states, one ends up with three pods of which only one is "Running 1/1", while the other two are "Down". Judging by the comments section, many people run into this. If all the traffic gets routed to only one pod, the setup is not really scalable. Or am I missing something? Quoting from their tutorial:
Note: By default the Nginx Ingress LoadBalancer Service has
service.spec.externalTrafficPolicy set to the value Local, which
routes all load balancer traffic to nodes running Nginx Ingress Pods.
The other nodes will deliberately fail load balancer health checks so
that Ingress traffic does not get routed to them.
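The behaviour described in that note can be modelled in a few lines. The following is a toy Python sketch, not Kubernetes code: the function `healthy_nodes` and the node names are hypothetical, and it only captures the rule that under `externalTrafficPolicy: Local` a node passes the load balancer health check if it runs at least one ingress pod, whereas under `Cluster` every node passes but the client source IP is generally hidden.

```python
from typing import Dict, List


def healthy_nodes(pods_per_node: Dict[str, int], policy: str) -> List[str]:
    """Return the nodes a cloud load balancer would consider healthy.

    "Local":   a node is healthy only if it runs at least one ingress pod,
               so traffic never takes an extra hop and the client IP is kept.
    "Cluster": every node is healthy because traffic is forwarded to a pod
               on any node, at the cost of obscuring the client source IP.
    """
    if policy == "Local":
        return sorted(n for n, count in pods_per_node.items() if count > 0)
    if policy == "Cluster":
        return sorted(pods_per_node)
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


# One ingress pod on a three-node cluster (hypothetical node names).
nodes = {"node-a": 1, "node-b": 0, "node-c": 0}
local = healthy_nodes(nodes, "Local")
cluster = healthy_nodes(nodes, "Cluster")

# Running one ingress pod per node makes every node healthy under "Local".
scaled = healthy_nodes({"node-a": 1, "node-b": 1, "node-c": 1}, "Local")
print(local, cluster, scaled)
```

Under this model the "Down" entries are nodes without an ingress pod, not failed application pods. If that matches your setup, running an ingress controller pod on every node would be one way to keep both the client IP and all nodes in rotation.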
My main question is: is there a best practice that I am missing for hosting my website on Kubernetes? It seems I have to choose between scalability (having all the pods healthy and running) and getting the IP of the client.
And for whoever finds themselves in my situation, this is the reply I got from DigitalOcean Support:
Unfortunately with that Kubernetes setup it would show those other
nodes as down without additional traffic configuration. It is possible
to skip the nginx ingress part and just use a DigitalOcean load
balancer but this again does require a good deal of setup and can be
more difficult then easy.
The suggestion for a website that is both scalable and has analytics (client IPs) was to set up a droplet with Nginx and put a LoadBalancer in front of it. More specifically:
As for using a droplet this would be a normal website configuration
with Nginx as your webserver configured to serve content to your app.
You would have full access to your application and the Nginx logs on
the droplet itself. Putting a load balancer in front of this would
require additional configuration as load balancers do not pass the
x-forward header so the IP addresses of clients would not show up in
the logs by default. You would need to configured proxy protocol on
the load balancer and in your nginx configuration to be able to obtain
those IPs.
https://www.digitalocean.com/blog/load-balancers-now-support-proxy-protocol/
This is also a bit more complex unfortunately.
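For readers wondering what "proxy protocol" actually carries: in version 1 of the protocol, the load balancer prepends a single text line to each TCP connection, and the backend reads the client address from that line. Below is a minimal Python sketch of parsing such a line. The names `ProxyInfo` and `parse_proxy_v1` are hypothetical, the addresses are documentation examples, and the sketch deliberately rejects the `UNKNOWN` family and the binary version 2 format.

```python
from typing import NamedTuple, Tuple


class ProxyInfo(NamedTuple):
    family: str
    client_ip: str
    client_port: int
    dest_ip: str
    dest_port: int


def parse_proxy_v1(data: bytes) -> Tuple[ProxyInfo, bytes]:
    """Split a PROXY protocol v1 header from the payload that follows it.

    Expected header: b"PROXY TCP4 <src ip> <dst ip> <src port> <dst port>\r\n"
    """
    header, sep, rest = data.partition(b"\r\n")
    if not sep or not header.startswith(b"PROXY "):
        raise ValueError("missing PROXY protocol v1 header")
    parts = header.decode("ascii").split(" ")
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError(f"unsupported header: {header!r}")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    return ProxyInfo(family, src_ip, int(src_port), dst_ip, int(dst_port)), rest


# What a backend would see on a new connection from the load balancer.
raw = (
    b"PROXY TCP4 203.0.113.7 10.0.0.5 56324 80\r\n"
    b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
)
info, payload = parse_proxy_v1(raw)
print(info.client_ip, info.client_port)
```

Nginx does this parsing itself when the `proxy_protocol` parameter is set on a `listen` directive, and it then exposes the client address as `$proxy_protocol_addr`. Both sides have to agree: if the load balancer sends the header and the backend does not expect it (or the reverse), requests fail.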
I hope this saves someone some time.
The Problem
I need to expose a Kubernetes NodePort service externally over https.
The Setup
I've deployed Kubernetes on bare-metal and have deployed Polyaxon on the cluster via Helm
I need to access Polyaxon's dashboard via the browser, using a virtual machine that's external to the cluster
The dashboard is exposed as a NodePort service, and I'm able to connect to it over http. I am not able to connect over https, which is a hard requirement in my case.
Following an initial "buildout" period, both the cluster and the virtual machine will not have access to the broader internet. They will connect to one another and that's it.
Polyaxon supposedly supports SSL/TLS through its own configs, but there's very little documentation on this. I've made my best attempts to solve the issue that way and also bumped an issue on their GitHub, but haven't had any luck so far.
So I'm now wondering if there might be a more general Kubernetes hack that could help me here.
The Solutions
I'm looking for the simplest solution, rather than the most elegant or scalable. There are also some things that might make my situation simpler than the average user who would want https, namely:
It would be OK to support https on just one node, rather than every node
I don't need (or really want) a domain name; connecting at https://<ip_address>:<port> is not just OK but preferred
A self-signed certificate is also OK
So I'm hoping there's some way to manipulate the NodePort service directly such that https will work on the virtual machine. If that's not possible, other solutions I've considered are using an Ingress Controller or some sort of proxy, but those solutions are both a little half-baked in my mind. I'm a novice with both Kubernetes and networking ideas in general, so if you're going to propose something more complex please speak very slowly :)
Thanks a ton for your help!
An ingress controller is the standard way to expose an HTTP backend in the cluster to clients over a TLS connection.
The existing NodePort service has a ClusterIP, which can be used as the backend for an Ingress. A ClusterIP service is enough, so you can change the service type later to prevent HTTP access via nodeIP:nodePort.
The ingress controller lets you either terminate the TLS connection or pass TLS traffic through to the backend.
You can use a self-signed certificate, or use cert-manager with the Let's Encrypt service.
Note that the Nginx-ingress rewrite syntax changed in version 0.22.0, so some examples in articles may be outdated.
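As a rough illustration of the TLS-termination setup described above, the following Python sketch builds an Ingress manifest and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It assumes a current cluster serving the `networking.k8s.io/v1` Ingress API and an ingress class named `nginx`. The helper `tls_ingress`, the host name, the service name, the port, and the secret name are all hypothetical placeholders, and the TLS secret (type `kubernetes.io/tls`, holding the self-signed certificate and key) is assumed to exist already.

```python
import json
from typing import Any, Dict


def tls_ingress(name: str, host: str, service: str, port: int,
                secret: str) -> Dict[str, Any]:
    """Build an Ingress that terminates TLS and forwards plain HTTP to a
    ClusterIP service inside the cluster."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": "nginx",  # assumption: class is named "nginx"
            # The controller serves the certificate stored in `secret`.
            "tls": [{"hosts": [host], "secretName": secret}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": service,
                        "port": {"number": port},
                    }},
                }]},
            }],
        },
    }


# All values below are placeholders for illustration only.
manifest = tls_ingress(
    name="dashboard",
    host="dashboard.cluster.internal",
    service="dashboard-svc",
    port=80,
    secret="dashboard-tls",
)
print(json.dumps(manifest, indent=2))
```

The backend service keeps speaking plain HTTP; only the hop between the client and the ingress controller is encrypted, which is what "TLS termination" means here.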
Check the links:
TLS termination
TLS/HTTPS
How to get Kubernetes Ingress to terminate SSL and proxy to service?
Configure Nginx Ingress Controller for TLS termination on Kubernetes on Azure