Ingress-nginx on GKE configuration 502 bad gateway - nginx

I am trying to expose an MLflow model in a GKE cluster through ingress-nginx and a Google Cloud load balancer.
The Service configuration for the corresponding Deployment looks as follows:
apiVersion: v1
kind: Service
metadata:
  name: model-inference-service
  labels:
    app: inference
spec:
  ports:
    - port: 5555
      targetPort: 5555
  selector:
    app: inference
When forwarding this service to localhost with kubectl port-forward service/model-inference-service 5555:5555, I can successfully query the model by sending a test image to the API endpoint with a test script.
The URL the request is sent to is http://127.0.0.1:5555/invocations.
This works as intended, so I assume the Deployment running the pod that exposes the model and the corresponding ClusterIP service model-inference-service are configured correctly.
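For illustration, a minimal request against the forwarded port could look like the following. The exact payload depends on the model's input signature, so treat this as a sketch rather than the actual test script:
# Forward the ClusterIP service to localhost (as above)
kubectl port-forward service/model-inference-service 5555:5555

# In a second shell, POST a payload to the MLflow scoring endpoint.
# The JSON body here is only a placeholder; the real request sends a
# test image in whatever format the model expects.
curl -X POST http://127.0.0.1:5555/invocations \
     -H "Content-Type: application/json" \
     -d '{"data": ["<serialized test image>"]}'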
Next, I installed ingress-nginx into the cluster by running
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx
The Ingress is configured as follows (I suspect the error is somewhere in here):
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    # nginx.ingress.kubernetes.io/rewrite-target: /invocations
  name: inference-ingress
  namespace: default
  labels:
    app: inference
spec:
  rules:
    - http:
        paths:
          - path: /invocations
            backend:
              serviceName: model-inference-service
              servicePort: 5555
The ingress controller pod is running successfully:
my-release-ingress-nginx-controller-6758cc8f45-fwtw7 1/1 Running 0 3h33m
In the GCP console I can see that the load balancer was created successfully as well, and I can obtain its IP.
When I use the same test script as before to make a request to the REST API endpoint, but now with the IP of the load balancer instead of the forwarded localhost port, I get a 502 Bad Gateway error:
The URL is now http://34.90.4.0:80/invocations
Traceback (most recent call last):
File "test_inference.py", line 80, in <module>
run()
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "test_inference.py", line 76, in run
print(score_model(data_path, host, port).text)
File "test_inference.py", line 54, in score_model
status_code=response.status_code, text=response.text
Exception: Status Code 502. <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.1</center>
</body>
</html>
When accessing the same url in a browser it says:
502 Bad Gateway
nginx/1.19.1
The logs of the ingress controller state:
2020/08/26 16:06:45 [warn] 86#86: *42282 a client request body is buffered to a temporary file /tmp/client-body/0000000009, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
10.10.0.30 - - [26/Aug/2020:16:06:45 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "python-requests/2.24.0" 86151 0.738 [default-model-inference-service-5555] [] 10.52.3.7:5555, 10.52.3.7:5555, 10.52.3.7:5555 0, 0, 0 0.000, 0.001, 0.000 502, 502, 502 0d86e360427c0a81c287da4ff5e907bc
To test whether the ingress and the load balancer work in principle, I replaced the Docker image containing the real REST API I want to expose with a Docker image that returns "hello world" on port 5050 and path /. I changed the port and the path (from /invocations to /) in the service and ingress manifests shown above, and I could successfully see "hello world" when accessing the IP of the load balancer in the browser.
Does anyone see what I might have done wrong?
Thank you very much!
Best regards,
F

The configuration you have shared looks fine. There must be something in your cluster environment that is causing this behavior. Check whether pod-to-pod communication works: launch a test pod on the same node as the nginx ingress controller and curl from that pod to the target service, and see whether you hit any DNS or network issues. Also try changing the Host header when calling the service to see whether the backend is sensitive to it.
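A sketch of that check, assuming a throwaway pod with curl available (the pod name, image, and the pod IP taken from the controller logs are illustrative):
# Start a temporary pod with curl (name and image are arbitrary)
kubectl run nginx-debug --rm -it --restart=Never --image=curlimages/curl -- sh

# From inside that pod, test the service DNS name and port ...
curl -v http://model-inference-service.default.svc.cluster.local:5555/invocations

# ... and the pod IP that appears as "upstream" in the controller logs.
curl -v http://10.52.3.7:5555/invocations

# "Connection refused" here as well points at the container itself,
# e.g. it is not listening on 5555 or only binds to 127.0.0.1.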

Related

Kong/Nginx has bizarre upstream host

Kong/Nginx (I'm not sure which) changes the upstream to an unexpected IP address; I want it to stay localhost. The failing request was made by a web browser to localhost:8443/cost-recovery.
Log message:
2020/12/15 16:50:09 [error] 88#0: *522 connect() failed (111: Connection refused) while connecting to upstream, client: 172.19.0.1, server: kong, request: "GET /login?state=/cost-recovery HTTP/1.1", upstream: "https://192.168.65.2:8444/login?state=/cost-recovery", host: "localhost:8443"
I don't know where it's getting the 192.168.65.2 host, but I want it to be localhost.
I'm using the pantsel/konga container, which uses Kong version 1.2.3. Configuration was done in part via proxied API requests:
Service Request:
{"host":"host.docker.internal","created_at":1608009524,"connect_timeout":60000,"id":"647180a3-6f8c-41ae-9f71-c9fc9db40249","protocol":"http","name":"cost-recovery","read_timeout":60000,"port":8447,"path":null,"updated_at":1608009524,"retries":5,"write_timeout":60000,"tags":null}
Route Request:
{"id":"24100a1d-c679-46b7-93f3-552b055df26b","tags":null,"paths":["\/cost-recovery"],"destinations":null,"protocols":["https"],"created_at":1608009525,"snis":null,"hosts":null,"name":"cost-recovery-route","preserve_host":true,"regex_priority":0,"strip_path":false,"sources":null,"updated_at":1608009525,"https_redirect_status_code":302,"service":{"id":"647180a3-6f8c-41ae-9f71-c9fc9db40249"},"methods":["GET"]}
Plugin Request:
{"name": "access-validator", "protocols": ["https"], "route": { "id": "24100a1d-c679-46b7-93f3-552b055df26b"}, "config": {"redirect_login": true}}

Putting kibana behind nginx-ingress fails with a HTTP error

I have deployed Kibana in a Kubernetes environment. If I give it a LoadBalancer-type Service, I can access it fine. However, when I try to access it via an nginx-ingress, it fails. The configuration I use in my nginx ingress is:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: my-ingress
spec:
  rules:
    - http:
        paths:
          - backend:
              serviceName: kibana
              servicePort: {{ .Values.kibanaPort }}
            path: /kibana
I have launched my kibana with the following setting:
- name: SERVER_BASEPATH
  value: /kibana
and I am able to access Kibana fine via the LoadBalancer IP. However, when I try to access it via the Ingress, most of the calls go through fine except for a GET call to vendors.bundle.js, which fails almost consistently.
The log messages in the ingress during this call is as follows:
2019/10/25 07:31:48 [error] 430#430: *21284 upstream prematurely closed connection while sending to client, client: 10.142.0.84, server: _, request: "GET /kibana/bundles/vendors.bundle.js HTTP/2.0", upstream: "http://10.20.3.5:3000/kibana/bundles/vendors.bundle.js", host: "1.2.3.4", referrer: "https://1.2.3.4/kibana/app/kibana"
10.142.0.84 - [10.142.0.84] - - [25/Oct/2019:07:31:48 +0000] "GET /kibana/bundles/vendors.bundle.js HTTP/2.0" 200 1854133 "https://1.2.3.4/kibana/app/kibana" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36" 47 13.512 [some-service] 10.20.3.5:3000 7607326 13.513 200 506f778b25471822e62fbda2e57ccd6b
I am not sure why I get upstream prematurely closed connection while sending to client across different browsers. I have tried setting proxy-connect-timeout and proxy-read-timeout to 100 seconds, and even then it fails. I am not sure if this is due to some kind of default size or chunking limit.
It is also interesting to note that only some Kibana calls are failing, not all of them.
In the browser, I see the error message:
GET https://<ip>/kibana/bundles/vendors.bundle.js net::ERR_SPDY_PROTOCOL_ERROR 200
in the developer console.
Does anyone have an idea what config options I need to pass to my nginx-ingress to make the Kibana proxy_pass work?
I have found the cause of the error. The vendors.bundle.js file is relatively big, and since I was accessing it over a relatively slow network, the requests were terminated. I fixed this by adding the following fields to the nginx-ingress configuration:
nginx.ingress.kubernetes.io/proxy-body-size: 10m  # change this as you need
nginx.ingress.kubernetes.io/proxy-connect-timeout: "100"
nginx.ingress.kubernetes.io/proxy-send-timeout: "100"
nginx.ingress.kubernetes.io/proxy-read-timeout: "100"
nginx.ingress.kubernetes.io/proxy-buffering: "on"
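For reference, these annotations go under metadata.annotations of the Ingress; a sketch based on the manifest from the question:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: 10m
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "100"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "100"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "100"
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
spec:
  rules:
    - http:
        paths:
          - path: /kibana
            backend:
              serviceName: kibana
              servicePort: {{ .Values.kibanaPort }}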

K8s Ingress service returning 503, nothing in Pod logs

I am running an Ingress, which is supposed to connect to the containers inside my Pods. When I run describe, things look fine, e.g.:
$ kubectl describe svc solar-demo
Name: solar-demo
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"solar-demo","namespace":"default"},"spec":{"ports":[{"name":"city","port":3000...
Selector: app=solardemo
Type: ClusterIP
IP: 10.97.245.248
Port: city 3000/TCP
TargetPort: 3000/TCP
Endpoints: 172.17.0.4:3000,172.17.0.6:3000
Port: solar 3001/TCP
TargetPort: 3001/TCP
Endpoints: 172.17.0.4:3001,172.17.0.6:3001
Session Affinity: None
Events: <none>
It even correctly lists the IP addresses of my containers. However, when I try to reach the services, I get a 404 when I just ask for the root, which is fine since I do not map / to anything, and a 503 error when I try to reach the routes /solar and/or /city.
When I check the logs, it returns:
$ kubectl logs solar-demo-5845984b94-xp82l solar-svc
npm info it worked if it ends with ok
npm info using npm#5.6.0
npm info using node#v9.11.2
npm info lifecycle nf-images-test#1.0.0~prestart-solar-svc: nf-images-test#1.0.0
npm info lifecycle nf-images-test#1.0.0~start-solar-svc: nf-images-test#1.0.0
> nf-images-test#1.0.0 start-solar-svc /opt/app-root/src
> node solar-svc.js
{"level":30,"time":1530271233676,"msg":"Server listening at http://0.0.0.0:3001","pid":26,"hostname":"solar-demo-5845984b94-xp82l","v":1}
server listening on 3001
and the same thing for the other service:
$ kubectl logs solar-demo-5845984b94-xp82l api
npm info it worked if it ends with ok
npm info using npm#5.6.0
npm info using node#v9.11.2
npm info lifecycle nf-images-test#1.0.0~prestart-api: nf-images-test#1.0.0
npm info lifecycle nf-images-test#1.0.0~start-api: nf-images-test#1.0.0
> nf-images-test#1.0.0 start-api /opt/app-root/src
> node api.js
{"level":30,"time":1530271244205,"msg":"Server listening at http://0.0.0.0:3000","pid":21,"hostname":"solar-demo-5845984b94-xp82l","v":1}
server listening on 3000
I get 503s and the containers never get any requests, as if the Ingress "thought" that every Pod was down or something. What could I check?
$ curl -v http://shmukler.example.com/solar
* Trying 192.168.99.101...
* Connected to shmukler.example.com (192.168.99.101) port 80 (#0)
> GET /solar HTTP/1.1
> Host: shmukler.example.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 503 Service Temporarily Unavailable
< Server: nginx/1.13.7
< Date: Sun, 01 Jul 2018 13:49:38 GMT
< Content-Type: text/html
< Content-Length: 213
< Connection: keep-alive
<
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body bgcolor="white">
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.13.7</center>
</body>
</html>
* Connection #0 to host shmukler.example.com left intact
Please advise.
The annotations were missing this config:
nginx.org/server-snippet: "proxy_ssl_verify off;"
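A sketch of where that annotation lives, assuming the NGINX Inc. controller (which uses the nginx.org annotation prefix); the Ingress name and rules are illustrative, pieced together from the service described above:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: solar-demo-ingress          # illustrative name, not from the question
  annotations:
    nginx.org/server-snippet: "proxy_ssl_verify off;"
spec:
  rules:
    - host: shmukler.example.com
      http:
        paths:
          - path: /city
            backend:
              serviceName: solar-demo
              servicePort: 3000
          - path: /solar
            backend:
              serviceName: solar-demo
              servicePort: 3001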

Error 502 when accessing backend inside same cluster in Kubernetes

Backend: python (Django)
Frontend: angular6
I just deployed my backend and frontend to the same cluster on Google Kubernetes Engine. They are two individual services inside the same cluster. The pods in the cluster look like this:
NAME READY STATUS RESTARTS AGE
backend-f4f5df588-nbc9p 1/1 Running 0 1h
frontend-85885799d9-92z5f 1/1 Running 0 1h
And the service looks like:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
backend LoadBalancer 10.3.249.148 35.232.61.116 8000:32291/TCP 26m
frontend LoadBalancer 10.3.248.72 35.224.112.111 8081:31444/TCP 3m
kubernetes ClusterIP 10.3.240.1 <none> 443/TCP 1h
My backend just runs on the Django development server, started with the python manage.py runserver command, and everything works fine. I built the frontend and deployed it on an Nginx server. So there are two Docker images, one for Django and one for Nginx, running as two pods in the cluster.
Then there are two Ingresses, one for each of them, exposing port 80 for the frontend and 8000 for the backend, behind the nginx ingress controller's load balancer. After assigning a domain, I can visit https://abc/project as the frontend. But when I make API requests, a 502 error appears. The error message in nginx is:
38590 connect() failed (111: Connection refused) while connecting to upstream, client: 163.185.148.245, server: _, request: "GET /project/api HTTP/1.1", upstream: "http://10.0.0.30:8000/dataproject/api", host: "abc"
The upstream in the error message is the correct IP for the backend service, but it still gets a 502 error. I can curl from the nginx server to the frontend, but I cannot curl to the backend. Any help?
PS: Everything worked fine before deployment.
Fixed. The Django runserver command has to bind to 0.0.0.0 so it does not refuse outside connections:
python manage.py runserver 0.0.0.0:8000
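A sketch of what that can look like in the backend Deployment; the image name and labels are placeholders, not taken from the question:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: django
          image: gcr.io/my-project/django-backend:latest   # placeholder image
          # Bind to 0.0.0.0 so connections from outside the pod are accepted
          command: ["python", "manage.py", "runserver", "0.0.0.0:8000"]
          ports:
            - containerPort: 8000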

Nginx, Ansible, and uWSGI with Flask App, Internal Server Error

I have deployed my app on EC2 using the software in the title, following a tutorial, but I am getting an Internal Server Error.
Here is the error log from trying to reach the application in the browser:
2014/02/17 19:48:29 [error] 26513#0: *1 connect() to unix:/tmp/uwsgi.sock failed (111: Connection refused) while connecting to upstream, client: xxx.xxx.xxx.xxx, server: localhost, request: "GET / HTTP/1.1", upstream: "uwsgi://unix:/tm p/uwsgi.sock:", host: "ec2-xx-xxx-xx-xxx.compute-1.amazonaws.com"
If your Ansible playbook is based on Matt Wright's tutorial, then all you need to do is reboot after the installation. The playbook doesn't register the new program it installs with supervisor (that program is the upstream uWSGI the log refers to), so it never gets started.
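If rebooting is inconvenient, telling supervisor to pick up the newly installed program should achieve the same thing; a sketch (the exact program name depends on the playbook):
# Re-read supervisor's config files and start any newly added programs
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl status

# The uWSGI socket from the nginx error log should now exist
ls -l /tmp/uwsgi.sock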