Need help troubleshooting Istio IngressGateway HTTP ERROR 503

Need help troubleshooting Istio IngressGateway HTTP ERROR 503 - http

My Test Environment Cluster has the following configurations :
Global Mesh Policy (Installed as part of cluster setup by our org) : output of kubectl describe MeshPolicy default
Name: default
Namespace:
Labels: operator.istio.io/component=Pilot
operator.istio.io/managed=Reconcile
operator.istio.io/version=1.5.6
release=istio
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"authentication.istio.io/v1alpha1","kind":"MeshPolicy","metadata":{"annotations":{},"labels":{"operator.istio.io/component":...
API Version: authentication.istio.io/v1alpha1
Kind: MeshPolicy
Metadata:
Creation Timestamp: 2020-07-23T17:41:55Z
Generation: 1
Resource Version: 1088966
Self Link: /apis/authentication.istio.io/v1alpha1/meshpolicies/default
UID: d3a416fa-8733-4d12-9d97-b0bb4383c479
Spec:
Peers:
Mtls:
Events: <none>
The above configuration I believe enables services to receive connections in mTls mode.
DestinationRule : Output of kubectl describe DestinationRule commerce-mesh-port -n istio-system
Name: commerce-mesh-port
Namespace: istio-system
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"networking.istio.io/v1alpha3","kind":"DestinationRule","metadata":{"annotations":{},"name":"commerce-mesh-port","namespace"...
API Version: networking.istio.io/v1beta1
Kind: DestinationRule
Metadata:
Creation Timestamp: 2020-07-23T17:41:59Z
Generation: 1
Resource Version: 33879
Self Link: /apis/networking.istio.io/v1beta1/namespaces/istio-system/destinationrules/commerce-mesh-port
UID: 4ef0d49a-88d9-4b40-bb62-7879c500240a
Spec:
Host: *
Ports:
Name: commerce-mesh-port
Number: 16443
Protocol: TLS
Traffic Policy:
Tls:
Mode: ISTIO_MUTUAL
Events: <none>
Istio Ingress-Gateway :
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: finrpt-gateway
namespace: finrpt
spec:
selector:
istio: ingressgateway # use Istio's default ingress gateway
servers:
- port:
name: https
number: 443
protocol: https
tls:
mode: SIMPLE
serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
privateKey: /etc/istio/ingressgateway-certs/tls.key
hosts:
- "*"
- port:
name: http
number: 80
protocol: http
tls:
httpsRedirect: true
hosts:
- "*"
I created a secret to be used for TLS and using that to terminate the TLS traffic at the gateway (as configured in mode SIMPLE)
Next, I configured my VirtualService in the same namespace and did a URL match for HTTP :
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: finrpt-virtualservice
namespace: finrpt
spec:
hosts:
- "*"
gateways:
- finrpt-gateway
http:
- match:
- queryParams:
target:
exact: "commercialprocessor"
ignoreUriCase: true
route:
- destination:
host: finrpt-commercialprocessor
port:
number: 8118
The Service CommercialProcessor (ClusterIP) is expecting traffic on HTTP/8118.
With the above setting in place, when I browse to the External IP of my Ingress-Gateway, first I get a certificate error (expected as I am using self-signed for testing) and then on proceeding I get HTTP Error 503.
I am not able to find any useful logs in the gateway, I am wondering if the gateway is unable to communicate to my VirtualService in plaintext (TLS termination) and it is expecting https but I have put it as http?
Any help is highly appreciated, I am very new to Istio and I think I might be missing something naive here.
My expectation is : I should be able to hit the Gateway with https, gateway does the termination and forwards the unencrypted traffic to the destination configured in the VirtualService on HTTP port based on URL regex match ONLY (I have to keep URL match part constant here).

As 503 often occurs and it´s hard to find the issue I set up little troubleshooting answer, there are another questions with 503 error which I encountered for several months with answers, useful informations from istio documentation and things I would check.
Examples with 503 error:
Istio 503:s between (Public) Gateway and Service
IstIO egress gateway gives HTTP 503 error
Istio Ingress Gateway with TLS termination returning 503 service unavailable
how to terminate ssl at ingress-gateway in istio?
Accessing service using istio ingress gives 503 error when mTLS is enabled
Common cause of 503 errors from istio documentation:
https://istio.io/docs/ops/best-practices/traffic-management/#avoid-503-errors-while-reconfiguring-service-routes
https://istio.io/docs/ops/common-problems/network-issues/#503-errors-after-setting-destination-rule
https://istio.io/latest/docs/concepts/traffic-management/#working-with-your-applications
Few things I would check first:
Check services ports name, Istio can route correctly the traffic if it knows the protocol. It should be <protocol>[-<suffix>] as mentioned in istio
documentation.
Check mTLS, if there are any problems caused by mTLS, usually those problems would result in error 503.
Check if istio works, I would recommend to apply bookinfo application example and check if it works as expected.
Check if your namespace is injected with kubectl get namespace -L istio-injection
If the VirtualService using the subsets arrives before the DestinationRule where the subsets are defined, the Envoy configuration generated by Pilot would refer to non-existent upstream pools. This results in HTTP 503 errors until all configuration objects are available to Pilot.
Hope you find this useful.

Related

Different hosts (Pods in Kubernetes) responds with different certificate for the same hostname

I have a werid problem - when asking for my internal hostname, xxx.home.arpa via e.g openssl s_client -connect xxx.home.arpa:443 one (example) pod
- image: docker.io/library/node:8.17.0-slim
name: node
args:
- "86400"
command:
- sleep
is getting response with DEFAULT NGINX INGRESS CERTIFICATE.
Other pod in the same namespace for the same command is getting response with my custom certificate.
Question:
Why one pod RECEIVES different cert for the same request?
For the purpose of this problem, please assume that cert-manager and certs should be properly configured - they are working in most of the system, it's only few pods that are misbehaving
Configuration: k8s nginx ingress, calico CNI, custom coredns svc which manages DNS responses (might be important?), my own CA authority.
e:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: ca-issuer
kubernetes.io/ingress.class: nginx
creationTimestamp: "2022-03-13T06:54:17Z"
generation: 1
name: gerrit-ingress
namespace: gerrit
resourceVersion: "739842"
uid: f22034ab-0ed8-4779-b01e-2738e6f63eb7
spec:
rules:
- host: gerrit.home.arpa
http:
paths:
- backend:
service:
name: gerrit-gerrit-service
port:
number: 80
pathType: ImplementationSpecific
tls:
- hosts:
- gerrit.home.arpa
secretName: gerrit-tls
status:
loadBalancer:
ingress:
- ip: 192.168.10.2
Most of the configuration (Except DNS) is up here.

As it turns out, my initial guesses were far off - particular container had a set of tools which were both configured to not send servername (Or not support SNI at all, which was the problem), specifically yarn:1.x and openssl:1.0.x.
The problem was with SNI of course, newer openssl or curl do use -servername by default satisfying SNI requirements.
To this I've considered two solutions:
Wildcard DNS for the clients that do not support SNI, which is easier but does not feel secure
TLS termination with reverse proxy allowing me to transparently use client with SNI support, which I haven't yet tried.
I went with wildcard DNS, though I don't feel that this should be done in prod. :)

Deploy Spring MVC application using GPC: Cloud SQL, Kubernetes (Service and Ingress) and HTTP(S) Load Balancer with Google Managed Certificate

Let me explain what the deployment consists of. First of all I created a Cloud SQL db by importing some data. To connect the db to the application I used cloud-sql-proxy and so far everything works.
I created a kubernetes cluster in which there is a pod containing the Docker container of the application that I want to depoly and so far everything works ... To reach the application in https then I followed several online guides (https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#console , https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#console , etc.), all converge on using a service and an ingress kubernetes. The first one maps the 8080 of spring to the 80 while the second one creates a load balacer that exposes a frontend in https. I configured a health-check, I created a certificate (google managed) associated to a domain which maps the static ip assigned to the ingress.
Apparently everything works but as soon as you try to reach from the browser the address https://example.org/ you are correctly redirected to the login page ( http://example.org/login ) but as you can see it switches to the HTTP protocol and obviously a 404 is returned by google since http is disabled. Forcing https on the address to which it redirects you then ( https://example.org/login ) for some absurd reason adds "www" in front of the domain name ( https://www.example.org/login ). If you try not to use the domain by switching to the static IP the www problem disappears... However, every time you make a request in HTTPS it keeps changing to HTTP.
P.S. the general goal would be to have http up to the load balancer (google's internal network) and then have https between the load balancer and the client.
Can anyone help me? If it helps I post the yaml file of the deployment. Thank you very much!
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
run: my-app # Label for the Deployment
name: my-app # Name of Deployment
spec:
minReadySeconds: 60 # Number of seconds to wait after a Pod is created and its status is Ready
selector:
matchLabels:
run: my-app
template: # Pod template
metadata:
labels:
run: my-app # Labels Pods from this Deployment
spec: # Pod specification; each Pod created by this Deployment has this specification
containers:
- image: eu.gcr.io/my-app/my-app-production:latest # Application to run in Deployment's Pods
name: my-app-production # Container name
# Note: The following line is necessary only on clusters running GKE v1.11 and lower.
# For details, see https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing#align_rollouts
ports:
- containerPort: 8080
protocol: TCP
- image: gcr.io/cloudsql-docker/gce-proxy:1.17
name: cloud-sql-proxy
command:
- "/cloud_sql_proxy"
- "-instances=my-app:europe-west6:my-app-cloud-sql-instance=tcp:3306"
- "-credential_file=/secrets/service_account.json"
securityContext:
runAsNonRoot: true
volumeMounts:
- name: my-app-service-account-secret-volume
mountPath: /secrets/
readOnly: true
volumes:
- name: my-app-service-account-secret-volume
secret:
secretName: my-app-service-account-secret
terminationGracePeriodSeconds: 60 # Number of seconds to wait for connections to terminate before shutting down Pods
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
name: my-app-health-check
spec:
healthCheck:
checkIntervalSec: 60
port: 8080
type: HTTP
requestPath: /health/check
---
apiVersion: v1
kind: Service
metadata:
name: my-app-svc # Name of Service
annotations:
cloud.google.com/neg: '{"ingress": true}' # Creates a NEG after an Ingress is created
cloud.google.com/backend-config: '{"default": "my-app-health-check"}'
spec: # Service's specification
type: ClusterIP
selector:
run: my-app # Selects Pods labelled run: neg-demo-app
ports:
- port: 80 # Service's port
protocol: TCP
targetPort: 8080
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: my-app-ing
annotations:
kubernetes.io/ingress.global-static-ip-name: "my-static-ip"
ingress.gcp.kubernetes.io/pre-shared-cert: "example-org"
kubernetes.io/ingress.allow-http: "false"
spec:
backend:
serviceName: my-app-svc
servicePort: 80
tls:
- secretName: example-org
hosts:
- example.org
---

As I mention in the comment section, you can redirect HTTP to HTTPS.
Google Cloud have quite good documentation and you can find there step by step guides, including firewall configurations or tests. You can find this guide here.
I would also suggest you to read also docs like:
Traffic management overview for external HTTP(S) load balancers
Setting up traffic management for external HTTP(S) load balancers
Routing and traffic management
As alternative you could check Nginx Ingress with proper annotation (force-ssl-redirect). Some examples can be found here.

Using self-signed certificates in nginx Ingress

I'm migrating services into a kubernetes cluster on minikube, these services require a self-signed certificate on load, accessing the service via NodePort works perfectly and demands the certificate in the browser (picture below), but accessing via the ingress host (the domain is modified locally in /etc/hosts) provides me with a Kubernetes Ingress Controller Fake Certificate by Acme and skips my self-signed cert without any message.
The SSLs should be decrypted inside the app and not in the Ingress, and the tls-acme: "false" flag does not work and still gives me the fake cert
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
# decryption of tls occurs in the backend service
nginx.ingress.kubernetes.io/ssl-passthrough: "true"
nginx.ingress.kubernetes.io/tls-acme: "false"
spec:
rules:
- host: admin.domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: admin-service
port:
number: 443
when signing in it should show the following before loading:
minikube version: v1.15.1
kubectl version: 1.19
using ingress-nginx 3.18.0

The problem turned out to be a bug on Minikube, and also having to enable ssl passthrough in the nginx controller (in addition to the annotation) with the flag --enable-ssl-passthrough=true.
I was doing all my cluster testing on a Minikube cluster version v1.15.1 with kubernetes v1.19.4 where ssl passthrough failed, and after following the guidance in the ingress-nginx GitHub issue, I discovered that the issue didn't replicate in kind, so I tried deploying my app on a new AWS cluster (k8 version 1.18) and everything worked great.

Two ingress controller on same K8S cluster

I have installed the following two different ingress controllers on my DigitalOcean managed K8S cluster:
Nginx
Istio
and they have been assigned to two different IP addresses. My question is, if it is wrong to have two different ingress controllers on the same K8S cluster?
The reason, why I have done it, because nginx is for tools like harbor, argocd, etc. and istio for microservices.
I have also figured out, when both are installed alongside each other, sometimes during the deployment, the K8S suddenly goes down.
For example, I have deployed:
apiVersion: v1
kind: Service
metadata:
name: hello-kubernetes-first
namespace: dev
spec:
type: ClusterIP
ports:
- port: 80
targetPort: 8080
selector:
app: hello-kubernetes-first
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: hello-kubernetes-first
namespace: dev
spec:
replicas: 3
selector:
matchLabels:
app: hello-kubernetes-first
template:
metadata:
labels:
app: hello-kubernetes-first
spec:
containers:
- name: hello-kubernetes
image: paulbouwer/hello-kubernetes:1.7
ports:
- containerPort: 8080
env:
- name: MESSAGE
value: Hello from the first deployment!
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
annotations:
kubernetes.io/ingress.class: istio
name: helloworld-ingress
namespace: dev
spec:
rules:
- host: hello.service.databaker.io
http:
paths:
- path: /*
backend:
serviceName: hello-kubernetes-first
servicePort: 80
---
Then I've got:
Error from server (InternalError): error when creating "istio-app.yml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: dial tcp 10.245.107.175:443: i/o timeout

You have raised several points - before answering your question, let's take a step back.
K8s Ingress not recommended by Istio
It is important to note how Istio does not recommend using K8s Ingress:
Using the Istio Gateway, rather than Ingress, is recommended to make use of the full feature set that Istio offers, such as rich traffic management and security features.
Ref: https://istio.io/latest/docs/tasks/traffic-management/ingress/kubernetes-ingress/
As noted, Istio Gateway (Istio IngressGateway and EgressGateway) acts as the edge, which you can find more in https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/.
Multiple endpoints within Istio
If you need to assign one public endpoint for business requirement, and another for monitoring (such as Argo CD, Harbor as you mentioned), you can achieve that by using Istio only. There are roughly 2 approaches to this.
Create separate Istio IngressGateways - one for main traffic, and another for monitoring
Create one Istio IngressGateway, and use Gateway definition to handle multiple access patterns
Both are valid, and depending on requirements, you may need to choose one way or the other.
As to the Approach #2., it is where Istio's traffic management system shines. It is a great example of Istio's power, but the setup is slightly complex if you are new to it. So here goes an example.
Example of Approach #2
When you create Istio IngressGateway by following the default installation, it would create istio-ingressgateway like below (I overly simplified YAML definition):
apiVersion: v1
kind: Service
metadata:
labels:
app: istio-ingressgateway
istio: ingressgateway
name: istio-ingressgateway
namespace: istio-system
# ... other attributes ...
spec:
type: LoadBalancer
# ... other attributes ...
This LB Service would then be your endpoint. (I'm not familiar with DigitalOcean K8s env, but I suppose they would handle LB creation.)
Then, you can create Gateway definition like below:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: your-gateway
namespace: istio-system
spec:
selector:
app: istio-ingressgateway
istio: ingressgateway
servers:
- port:
number: 3000
name: https-your-system
protocol: HTTPS
hosts:
- "your-business-domain.com"
- "*.monitoring-domain.com"
# ... other attributes ...
You can then create 2 or more VirtualService definitions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: business-virtsvc
spec:
gateways:
- istio-ingressgateway.istio-system.svc.cluster.local
hosts:
- "your-business-domain.com"
http:
- match:
- port: 3000
route:
- destination:
host: some-business-pod
port:
number: 3000
# ... other attributes ...
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: monitoring-virtsvc
spec:
gateways:
- istio-ingressgateway.istio-system.svc.cluster.local
hosts:
- "harbor.monitoring-domain.com"
http:
- match:
- port: 3000
route:
- destination:
host: harbor-pod
port:
number: 3000
# ... other attributes ...
NOTE: The above is assuming a lot of things, such as port mapping, traffic handling, etc.. Please check out the official doc for details.
So, back to the question after long detour:
Question: [Is it] wrong to have two different ingress controllers on the same K8S cluster[?]
I believe it is OK, though this can cause an error like you are seeing, as two ingress controller fight for the K8s Ingress resource.
As mentioned above, if you are using Istio, it's better to stick with Istio IngressGateway instead of K8s Ingress. If you need K8s Ingress for some specific reason, you could use other Ingress controller for K8s Ingress, like Nginx.
As to the error you saw, it's coming from Nginx deployed webhook, that ingress-nginx-controller-admission.nginx.svc is not available. This means you have created a K8s Ingress helloworld-ingress with kubernetes.io/ingress.class: istio annotation, but Nginx webhook is interfering with K8s Ingress handling. The webhook is then failing to handle the resource, as the Pod / Svc responsible for webhook traffic is not found.
The error itself just says something is unhealthy in K8s - potentially not enough Node allocated to the cluster, and thus Pod allocation not happening. It's also good to note that Istio does require some CPU and memory footprint, which may be putting more pressure to the cluster.

Both products have distinct characteristics and solve different type of problems. So, no issue in having both installed on your cluster.
To call them Ingress Controller is not correct:
- Nginx is a well known web server
- Nginx ingress controller is an implementation of a Kubernetes Ingress controller based on Nginx (Load balancing, HTTPS termination, authentication, traffic routing , etc)
- Istio is a service mesh (well known to microservice architecture and used to address cross cutting concerns in a standard way - things like, logging, tracing, Https termination, etc - at the POD level)
Can you provide more details to what you mean by "K8S suddenly goes down". Are you talking about the cluster nodes or the PODs running inside?
Thanks.

Have you looked specifying the ingress.class (kubernetes.io/ingress.class: "nginx" ), like mentioned here? - https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/

Gitlab kubernetes integration

I have a custom kubernetes cluster on a serve with public IP and DNS pointing to it (also wildcard).
Gitlab was configured with the cluster following this guide: https://gitlab.touch4it.com/help/user/project/clusters/index#add-existing-kubernetes-cluster
However, after installing Ingress, the ingress endpoint is never detected:
I tried patching the object in k8s, like so
externalIPs: (was empty)
- 1.2.3.4
externalTrafficPolicy: local (was cluster)
I suspect that the problem is empty ingress (scroll to the end) object then calling:
# kubectl get service ingress-nginx-ingress-controller -n gitlab-managed-apps -o yaml
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2019-11-20T08:57:18Z"
labels:
app: nginx-ingress
chart: nginx-ingress-1.22.1
component: controller
heritage: Tiller
release: ingress
name: ingress-nginx-ingress-controller
namespace: gitlab-managed-apps
resourceVersion: "3940"
selfLink: /api/v1/namespaces/gitlab-managed-apps/services/ingress-nginx-ingress-controller
uid: c175afcc-0b73-11ea-91ec-5254008dd01b
spec:
clusterIP: 10.107.35.248
externalIPs:
- 1.2.3.4 # (public IP)
externalTrafficPolicy: Local
healthCheckNodePort: 30737
ports:
- name: http
nodePort: 31972
port: 80
protocol: TCP
targetPort: http
- name: https
nodePort: 31746
port: 443
protocol: TCP
targetPort: https
selector:
app: nginx-ingress
component: controller
release: ingress
sessionAffinity: None
type: LoadBalancer
status:
loadBalancer: {}
But Gitlab still cant find the ingress endpoint. I tried restarting cluster and Gitlab.
The network inspection in Gitlab always shows this response:
...
name ingress
status installed
status_reason null
version 1.22.1
external_ip null
external_hostname null
update_available false
can_uninstall false
...
Any ideas how to have a working Ingress Endpoint?
GitLab: 12.4.3 (4d477238500) k8s: 1.16.3-00

I had the exact same issue as you, and I finally figured out how to solve it.
The first to understand, is that on bare metal, you can't make it working without using MetalLB, because it calls the required Kubernetes APIs making it accepting the IP address you give to the Service of LoadBalancer type.
So first step is to deploy MetalLB to your cluster.
Then you need to have another machine, running a service like NGiNX or HAproxy or whatever can do some load balancing.
Last but not least, you have to give the Load Balancer machine IP address to MetalLB so that it can assign it to the Service.
Usually MetalLB requires a range of IP addresses, but you can also give one IP address like I did:
apiVersion: v1
kind: ConfigMap
metadata:
namespace: metallb-system
name: config
data:
config: |
address-pools:
- name: staging-public-ips
protocol: layer2
addresses:
- 1.2.3.4/32
This way, MetalLB will assign the IP address to the Service with type LoadBalancer and Gitlab will finally find the IP address.
WARNING: MetalLB will assign only once an IP address. If you need many Service with type LoadBalancer, you will need many machines running NGiNX/HAproxy and so on and add its IP address in the MetalLB addresses pool.
For your information, I've posted all the technical details to my Gitlab issue here.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex