Can't complete HTTP challenge for letsencrypt on Kubernetes - nginx

I have a k3s cluster and I'm trying to configure it to get an SSL certificate from Let's Encrypt. I have followed many guides, and I think I'm really close, but the problem is that the Challenge object in Kubernetes reports this error:
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://devstore.XXXXXXX.com/.well-known/acme-challenge/kVVHaQaaGU7kbYqnt8v7LZGaQvWs54OHEe2WwI_MOgk': Get "http://devstore.XXXXXXX.com/.well-known/acme-challenge/kVVHaQaaGU7kbYqnt8v7LZGaQvWs54OHEe2WwI_MOgk": dial tcp: lookup devstore.XXXXXXX.com on 10.43.0.10:53: no such host
It seems that cert-manager is somehow trying to resolve my public DNS name internally and failing, so the challenge does not complete. Can you help me with that? I googled it but cannot find a solution...
Thank you

It is likely that the DNS record for the domain you want the certificate for does not exist.
If it does, and you are using a split-horizon DNS config (hijacking the .com domain in your local network), make sure it points to your public IP (e.g. your home gateway).
[Edit]
Also, you have to make sure Let's Encrypt can reach your cluster over the network, so port-forward 80/443 to your cluster's node IPs.
You can get away with that because k3s defaults to the Cluster traffic policy in its load balancer.
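For reference, the relevant Kubernetes field is spec.externalTrafficPolicy on the load balancer Service; a minimal sketch (the name, namespace, selector and ports below are placeholders, not anything from the question):
# Sketch of a LoadBalancer Service using the default Cluster traffic policy,
# so traffic forwarded to any node IP still reaches the ingress pods.
apiVersion: v1
kind: Service
metadata:
  name: ingress-lb                 # placeholder name for the k3s-provided LB service
  namespace: kube-system
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # the default: any node forwards traffic to the ingress pods
  selector:
    app: ingress-controller        # placeholder selector
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443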

This can have several different causes. If you find that it is a transient issue (or possibly that you have misconfigured CoreDNS before), you might want to double-check your CoreDNS ConfigMap (in the kube-system namespace).
For example, you could remove or reduce caching, or point to different upstream DNS nameservers.
Here's a description of the issue, where switching to Google DNS and removing the cache cleared it up.
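As a rough sketch of that kind of change, a CoreDNS ConfigMap with explicit upstream resolvers and reduced caching could look like this (the Google resolvers and the 5-second cache are illustrative values, not something prescribed by the linked description):
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4   # forward to explicit upstreams instead of /etc/resolv.conf
        cache 5                     # reduce (or drop) caching while debugging
        loop
        reload
        loadbalance
    }
The reload plugin picks up the ConfigMap change after a short delay; restarting the CoreDNS pods applies it immediately.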

Thank you DarthHTTP, I finally managed to make it work! The problem was, as I mentioned in the comment, that the firewall was not correctly routing HTTP requests addressed to the public IP when they came from the private network side. I solved it by configuring an internal DNS server that resolves the name to the private IP address of the k3s node, and using that server as the DNS server for the k3s node. Eventually my HTTP web app got a valid Let's Encrypt certificate!
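For anyone who would rather keep this inside the cluster instead of running a separate DNS server, CoreDNS's hosts plugin can pin the public name to the node's private IP. A sketch, to be added inside the ".:53 { ... }" block of the Corefile shown earlier (the IP below is made up, and the hostname keeps the redaction from the question):
        hosts {
            192.168.1.50 devstore.XXXXXXX.com   # private IP of the k3s node (example value)
            fallthrough
        }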

Related

Best practice for a website hosted on Kubernetes (DigitalOcean)

I followed this guide: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes on how to set up an Nginx Ingress with Cert Manager on Kubernetes, with DigitalOcean as the cloud provider.
The tutorial worked fine; I was able to set everything up according to what was written. Though, as it states, following the tutorial one ends up with three pods, of which only one is "Running 1/1" while the other two are "Down". Also, checking the comments section, it seems this is quite a common problem: if all the traffic gets routed to only one pod, it is not really scalable. Or am I missing something? Quoting from their tutorial:
Note: By default the Nginx Ingress LoadBalancer Service has service.spec.externalTrafficPolicy set to the value Local, which routes all load balancer traffic to nodes running Nginx Ingress Pods. The other nodes will deliberately fail load balancer health checks so that Ingress traffic does not get routed to them.
Mainly my question is: is there a best practice that I am missing in order to have Kubernetes host my website? It seems I have to choose either scalability (having all the pods healthy and running) or getting the IP of the visiting client.
And for whoever ever finds themselves in my situation, this is the reply I got from DigitalOcean Support:
Unfortunately with that Kubernetes setup it would show those other nodes as down without additional traffic configuration. It is possible to skip the nginx ingress part and just use a DigitalOcean load balancer but this again does require a good deal of setup and can be more difficult then easy.
Their suggestion for having a website that is both scalable and has analytics (client IPs) was to set up a Droplet with Nginx and put a Load Balancer in front of it. More specifically:
As for using a droplet this would be a normal website configuration with Nginx as your webserver configured to serve content to your app. You would have full access to your application and the Nginx logs on the droplet itself. Putting a load balancer in front of this would require additional configuration as load balancers do not pass the x-forward header so the IP addresses of clients would not show up in the logs by default. You would need to configured proxy protocol on the load balancer and in your nginx configuration to be able to obtain those IPs.
https://www.digitalocean.com/blog/load-balancers-now-support-proxy-protocol/
This is also a bit more complex unfortunately.
Hope it saves someone some time.
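For completeness, the proxy-protocol route can also be taken without leaving Kubernetes: keep the ingress controller Service on the Cluster traffic policy (so no node shows as "Down") and enable proxy protocol on both the DigitalOcean load balancer and nginx, so client IPs still reach the logs. A sketch, where the annotation key follows DigitalOcean's cloud controller documentation and the names are placeholders; verify against the current docs before relying on it:
# Ingress controller Service: Cluster policy keeps every node passing the
# LB health check; the annotation asks the DO load balancer to use proxy protocol.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller          # placeholder name
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
---
# nginx-ingress ConfigMap: tell nginx to read the real client IP from proxy protocol.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller          # name depends on how the controller was installed
  namespace: ingress-nginx
data:
  use-proxy-protocol: "true"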

Kubernetes nginx ingress prevent response from default-backend on https

Currently the nginx ingress in Kubernetes always responds to direct IP requests (e.g. http://1.1.1.1) with the default-backend, and there doesn't appear to be a way of disabling it.
Worse, it also responds to https://1.1.1.1 in the same manner with a self-signed cert (you can override it, but obviously even if you provide a valid cert, it still won't be valid for an IP request). This is a major security vulnerability that causes any site using Kubernetes and the nginx ingress to fail PCI compliance network scans.
AND there is no way of overriding this behavior in your ingress definition.
I'm trying to figure out how, without hacks, to prevent the default-backend from responding to HTTPS on an IP request, given that there is never a case in a production environment where this would be secure, and it will always cause a PCI failure.
How does one get the nginx ingress to not respond to HTTPS on an IP request?
So I finally worked around this for PCI purposes. It isn't satisfying, but it works and the scanners don't freak out. Just set the default certificate to the same certificate you use for your site. It isn't valid for the bare IP, but because it comes from a valid signing authority, PCI scans allow it to go through.
Which is lame, but whatever.
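Concretely, the workaround amounts to passing the controller's --default-ssl-certificate flag and pointing it at the same TLS secret your site uses. A fragment of the nginx ingress controller Deployment, with the namespace/secret name being placeholders:
# Fragment of the ingress-nginx controller Deployment spec.
spec:
  template:
    spec:
      containers:
        - name: controller
          args:
            - /nginx-ingress-controller
            - --default-ssl-certificate=default/my-site-tls   # <namespace>/<secret>, placeholder values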
I tried googling and tested several ideas, but none of them worked.
I also tried reading the nginx-ingress code but didn't find anything satisfying.
I think the best thing you could do is to ask the developers.
Just open an issue on the nginx-ingress GitHub repository.

OpenShift / NGiNX reverse proxy guidance - pass to SDN-based addresses?

I am configuring a fairly complicated app for a client and am getting stuck on the reverse proxy model. As far as I understand, we should proxy_pass/uwsgi_pass to the internal endpoint addresses (172.30.0.0/16), such as
appname.project.svc.cluster.local
However, these addresses, although DNS-resolvable from within the pods that make up the app, are not reachable. The pods seem to run from the 10.200.0.0/14 SDN address range, so no route exists to them by default from within the pod.
The alternative might be to proxy_pass to the exposed routes of each service, but this seems wrong - the request would then be routed back out of the OpenShift pod space, back through the (default haproxy) router to the exposed endpoint address.
What is the correct way?
Season's greetings and thanks
To answer my own question, I wanted to mention that I just discovered the other types of SkyDNS-based names, such as:
app.project.endpoints.cluster.local
See here: Table 1. DNS Example Names
These are reachable from the pods.
Thanks for your time

Supporting SSL/TLS for a Kubernetes NodePort Service

The Problem
I need to expose a Kubernetes NodePort service externally over https.
The Setup
I've deployed Kubernetes on bare-metal and have deployed Polyaxon on the cluster via Helm
I need to access Polyaxon's dashboard via the browser, using a virtual machine that's external to the cluster
The dashboard is exposed as a NodePort service, and I'm able to connect to it over http. I am not able to connect over https, which is a hard requirement in my case.
Following an initial "buildout" period, both the cluster and the virtual machine will not have access to the broader internet. They will connect to one another and that's it.
Polyaxon supposedly supports SSL/TLS through its own configs, but there's very little documentation on this. I've made my best attempts to solve the issue that way and have also bumped an issue on their GitHub, but haven't had any luck so far.
So I'm now wondering if there might be a more general Kubernetes hack that could help me here.
The Solutions
I'm looking for the simplest solution, rather than the most elegant or scalable one. There are also some things that might make my situation simpler than that of the average user who wants HTTPS, namely:
It would be OK to support https on just one node, rather than every node
I don't need (or really want) a domain name; connecting at https://<ip_address>:<port> is not just OK but preferred
A self-signed certificate is also OK
So I'm hoping there's some way to manipulate the NodePort service directly such that https will work on the virtual machine. If that's not possible, other solutions I've considered are using an Ingress Controller or some sort of proxy, but those solutions are both a little half-baked in my mind. I'm a novice with both Kubernetes and networking ideas in general, so if you're going to propose something more complex please speak very slowly :)
Thanks a ton for your help!
An Ingress controller is the standard way to expose an HTTP backend over a TLS connection from the cluster to the client.
The existing NodePort service also has a ClusterIP, which can be used as a backend for the Ingress. A ClusterIP-type service is enough, so you can change the service type later to prevent plain HTTP access via nodeIP:nodePort.
The Ingress controller allows you to terminate the TLS connection or to pass TLS traffic through to the backend.
You can use a self-signed certificate or use cert-manager with the Let's Encrypt service.
Note that starting from version 0.22.0 the Nginx ingress rewrite syntax has changed, and some examples in the articles may be outdated.
Check the links:
TLS termination
TLS/HTTPS
How to get Kubernetes Ingress to terminate SSL and proxy to service?
Configure Nginx Ingress Controller for TLS termination on Kubernetes on Azure
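Putting that together, a minimal sketch could look like the following; the namespace, service name and port are placeholders rather than Polyaxon's actual ones, and the secret is assumed to hold a self-signed certificate generated out of band:
# TLS secret holding the (self-signed) certificate and key as base64-encoded PEM.
apiVersion: v1
kind: Secret
metadata:
  name: dashboard-tls              # placeholder
  namespace: polyaxon              # placeholder
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
---
# Ingress terminating TLS and forwarding to the existing dashboard service;
# omitting the host makes the rule a catch-all, so requests by IP are matched.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dashboard                  # placeholder
  namespace: polyaxon
spec:
  ingressClassName: nginx
  tls:
    - secretName: dashboard-tls
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: polyaxon-dashboard   # placeholder service name
                port:
                  number: 80               # placeholder port
Since the goal is to connect by bare IP (no hostname, so no SNI), the controller may additionally need this secret set as its default certificate, i.e. the --default-ssl-certificate flag sketched in the previous thread.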

Multiple certificates for HTTPS on a software NLB'd IIS7 cluster

We're currently trying to set up HTTPS with multiple certificates. We've had some limited success, but we're getting some results I can't make any sense of...
Basically we have two servers in our NLB (10.0.51.51 and 10.0.51.52) and two IPs assigned to our NLB (10.0.51.2 and 10.0.51.4), and we have IIS listening on both of these IPs with different wildcard certificates (to avoid giving out public IPs, let's say A:443 routes to 10.0.51.2:443 and B:443 routes to 10.0.51.4:443). We also have a Cisco router using port address translation to route port 443 from two external IPs to these internal NLB IPs.
The weird thing is, this works if we request A:443 or B:443, but if you go internally to 10.0.51.51:443, 10.0.51.52:443, 10.0.51.2:443 or 10.0.51.4:443 you ALWAYS get the same SSL cert. This cert was in the past assigned to *:443, but we've made sure there are no * bindings defined in IIS anymore.
When I run "netsh http show sslcert", after trimming out all the irrelevant stuff I get:
IP:port : 0.0.0.0:443
Certificate Hash : <Removed: Cert 1>
IP:port : 10.0.51.2:446
Certificate Hash : <Removed: Cert 3 - Another site>
IP:port : 10.0.51.3:446
Certificate Hash : <Removed: Cert 3 - Another site>
IP:port : 10.0.51.4:443
Certificate Hash : <Removed: Cert 2>
This tells me that the * binding is still in there, which is a bit weird, but I can't see why that would prevent the others from working (or, even more strangely, why the requests through the router would work).
It's got me wondering whether it's actually treating the requests as arriving on the machine's IP rather than the NLB IP, but unfortunately our dev environment is only a single server, which rather limits the amount of trial and error I can apply (since all I can test on is the live environment) without convincing management to buy more servers for the test environment - which is something I'm working on.
Does anyone have any idea:
Why there's a difference between internal and through the router?
Why the internal request is getting the wrong cert?
How can I remedy this so that we get the same behavior on both sides?
I ended up tracking the problem down. Leaving this as a hint for anyone else who falls in the same trap...
The problem was caused by our use of shared configuration on our IIS servers. When you set up an HTTPS binding through the UI, it appears to only actually bind it on the box you're managing (leaving the other completely unbound). Since our * binding still existed, it was catching the requests on the server we didn't configure through the UI and just let pick up the shared config.
Crazy bad luck with single-affinity NLB sent us down the garden path of suspecting the router, by sending our internal requests to one server and our external requests to the other.
We ended up finding this by running "netsh http show sslcert > certs.txt" on both servers and diff'ing the outputs.
Going forward, our plan is to no longer use the IIS UI for SSL configuration, and instead follow the steps below:
Install the certificates on each server.
Run a command-line binding of the SSL port: "netsh http add sslcert ipport=?:? certhash=? appid=?" (ip:port is easy to work out, certhash can be copied from the "certificate hash" section of the server certificates page, and appid can be copied from an existing IIS binding in the "netsh http show sslcert" output).
Edit the IIS ApplicationHost.config file directly to add the bindings without the UI being involved.
Our understanding is this will prevent a repeat of this error.
