Istio causing frequent disconnections in a gRPC stream

I am trying to set up a gRPC stream from the outside world into an Istio cluster through the Istio ingress. I am able to establish the connection, but I see a connection reset every 60 seconds.
The container logs show "rpc error: code = Unavailable desc" just before the stream breaks.
I looked into the ingress and Envoy logs; nothing much helpful, but they are attached below.
INGRESS LOGS
[2020-03-06T12:14:10.221Z] "- - -" 0 - "-" "-" 2679 2552 9993 - "-" "-" "-" "-" "10.244.0.93:5448" outbound|5448||grpc-broker.x-infra.svc.cluster.local 10.244.0.116:58094 10.244.0.116:443 10.222.2.9:37864 <xxxxxx DNS NAME xxxxxxxx> -
ENVOY LOGS
[2020-03-06T12:16:28.331Z] "- - -" 0 - "-" "-" 12021 2733 50282 - "-" "-" "-" "-" "127.0.0.1:5448" inbound|5448|tcp-broker|grpc-broker.x-infra.svc.cluster.local 127.0.0.1:56816 10.244.0.93:5448 10.244.0.116:34782 outbound_.5448_._.grpc-broker.x-infra.svc.cluster.local -
Do we need to add anything extra to make the gRPC stream work?
The cluster has mTLS enabled by default, and the source and destination pods are Deployments, not StatefulSets.

From the Istio side, to make sure that Istio itself is not shutting down the connection: this can be prevented with the idleTimeout setting in a DestinationRule.
According to the Istio documentation on idleTimeout:
The idle timeout for upstream connection pool connections. The idle timeout is defined as the period in which there are no active requests. If not set, there is no idle timeout. When the idle timeout is reached the connection will be closed. Note that request based timeouts mean that HTTP/2 PINGs will not keep the connection alive. Applies to both HTTP1.1 and HTTP2 connections.
So if you create a DestinationRule like this:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: grpc-iddletimeout-policy
spec:
  host: grpcservice.servicenamespace.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 2m
This should close any HTTP/2 connection from the Istio Envoy proxy side after it has been idle for 2 minutes, for grpcservice in the servicenamespace namespace.
Hope it helps.

Related

Haproxy in docker cannot access device's localhost via "host.docker.internal"

I need to configure HAProxy for local testing. The goal is to have 3 services:
1. HAProxy in a Docker container
2. an HTTP app listening on the device's (macOS) localhost
3. a client app sending requests through HAProxy to the listening app (number 2)
The docker-compose.yml configuration for the proxy is:
proxy_server:
  image: haproxy:2.7.0-alpine
  container_name: proxy_server
  user: root # I used this to install curl in the container
  ports:
    - '3128:80' # haproxy itself
    - '20005:20005' # configured proxy in haproxy.cfg
  restart: always
  volumes:
    - ./test/proxy_server/config:/usr/local/etc/haproxy # this maps the haproxy.cfg file into the container
  extra_hosts:
    - 'host.docker.internal:host-gateway' # allows the proxy to access device's localhost on linux
haproxy.cfg is
defaults
    timeout client 5s
    timeout connect 5s
    timeout server 5s
    timeout http-request 5s

listen reverse-proxy
    bind *:20005
    mode http
    option httplog
    log stdout format raw local0 debug
When the listening app (2) listens on localhost:56454, I can connect into the HAProxy container, install curl, and connect to the listening app via the host.docker.internal host.
/ # curl -v -I http://host.docker.internal:56454
* Trying 192.168.65.2:56454...
* Connected to host.docker.internal (192.168.65.2) port 56454 (#0)
> HEAD / HTTP/1.1
> Host: host.docker.internal:56454
> User-Agent: curl/7.86.0
> Accept: */*
This is correct.
The problem is that I am not able to send a request through the proxy to the same URL (http://host.docker.internal:56454), because the proxy logs
172.22.0.1:56636 [10/Dec/2022:13:50:22.601] reverse-proxy reverse-proxy/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 1/1/0/0/0 0/0 "POST http://host.docker.internal:56454/confirmation HTTP/1.1"
and the client gets the following response:
HTTP Status 503: <html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>
Also, the request passes correctly when I use the following docker-compose.yml configuration with the ubuntu/squid image instead of the HAProxy one:
proxy_server:
  image: ubuntu/squid
  container_name: proxy_server
  ports:
    - '3128:3128'
  restart: always
  extra_hosts:
    - 'host.docker.internal:host-gateway' # allows the proxy to access device's localhost on linux
So, I guess the problem is that haproxy somehow does not see the service on http://host.docker.internal:56454 even though the service is accessible from the container.
I've also tried ubuntu and debian versions of the haproxy image and it still does not work correctly.
Any idea how to fix it?
Edit:
Investigating <NOSRV>... No clue how to fix it yet.
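For what it's worth, <NOSRV> in that log line means HAProxy had no backend server to send the request to: the reverse-proxy listen block only binds, sets the mode, and logs, but never declares a server, and HAProxy does not act as a generic forward proxy the way Squid does. A minimal sketch of one possible direction, assuming the listening app stays on port 56454 (the port and the server name local_app are illustrative, not a verified fix):
listen reverse-proxy
    bind *:20005
    mode http
    option httplog
    log stdout format raw local0 debug
    # pin the single backend explicitly; host.docker.internal resolves thanks to extra_hosts
    server local_app host.docker.internal:56454
This only covers the reverse-proxy direction; it does not turn HAProxy into the kind of forward proxy that the ubuntu/squid setup provides.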

Why can't ingress-nginx proxy gRPC when the client uses insecure credentials?

Path: go-client --> ingress-nginx --> gRPC pod
Because all the traffic stays in our private network, we didn't buy a public certificate; instead we use a self-signed certificate. The first code snippet below worked well, but the second failed. I don't know why, and I want to know what insecure exactly means.
Code that worked well:
cert, _ := credentials.NewClientTLSFromFile("./example.pem", "example.com")
conn, err := grpc.DialContext(
    ctx,
    "example.com:443",
    grpc.WithTransportCredentials(cert),
    grpc.WithBlock(),
)
Code that received a 400 Bad Request:
conn, err := grpc.DialContext(
    ctx,
    "example.com:443",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithBlock(),
)
Nginx access log for the bad request:
"PRI * HTTP/2.0" 400
ingress yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.com
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /foo/bar
        pathType: Prefix
        backend:
          service:
            name: grpc-svc
            port:
              name: grpc-port
Package insecure provides an implementation of the credentials.TransportCredentials interface which disables transport security. More specifically, it does not perform any TLS handshaking or use any certificates.
gRPC requires that the user pass it some credentials when creating the ClientConn. If your deployment does not use any certificates and you know that it is secure (for whatever reason), then the insecure package is your friend. But if you are using self-signed certificates, they are still certificates, and a TLS handshake still needs to happen. The "PRI * HTTP/2.0" 400 in the nginx access log is the symptom: with insecure credentials the client sends the cleartext HTTP/2 connection preface straight to the ingress's TLS port, and nginx rejects the plain-text request with a 400. So, in this case, you should continue using the code you mentioned at the top of your question. Hope this helps.

Liveness and readiness probe connection refused

I keep getting this error when I try to set up liveness & readiness probes for my awx_web container:
Liveness probe failed: Get http://POD_IP:8052/: dial tcp POD_IP:8052: connect: connection refused
Liveness & readiness section in my Deployment for the awx_web container:
ports:
- name: http
  containerPort: 8052 # the port of the container awx_web
  protocol: TCP
livenessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 5
  periodSeconds: 5
If I test whether port 8052 is open from another pod in the same namespace as the pod containing the awx_web container, or from a container deployed in the same pod as awx_web, I get this (the port is open):
/ # nc -vz POD_IP 8052
POD_IP (POD_IP :8052) open
I get the same result (port 8052 is open) if I use netcat (nc) from the worker node where the pod containing the awx_web container is deployed.
For reference, I use a NodePort Service that redirects traffic to that container (awx_web):
type: NodePort
ports:
- name: http
  port: 80
  targetPort: 8052
  nodePort: 30100
I recreated your issue, and it looks like your problem is caused by too small a value of initialDelaySeconds for the liveness probe.
It takes more than 5s for the awx container to open port 8052, so you need to wait a bit longer before probing.
I have found that setting it to 15s is enough for me, but you may need some tweaking; see the sketch below.
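As a sketch, here is the probe section from the question with only the initial delay raised (15s is simply the value that worked here; keep tuning it if your environment starts more slowly):
livenessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 15   # was 5; give awx_web time to open the port
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 15
  periodSeconds: 5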
In my case this issue occurred because I had configured the backend application host as localhost. The issue was resolved when I changed the host value to 0.0.0.0 in my app properties.
Use the latest built Docker image after making this change; an illustrative sketch follows.
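Purely as an illustration (the answer does not say which framework the app uses, so the application.yml keys below are the Spring Boot names and are an assumption; 8052 is the port from the question): binding to 0.0.0.0 instead of localhost makes the app accept connections on the pod IP that the kubelet probes.
server:
  address: 0.0.0.0   # was localhost, which only accepts connections from inside the container
  port: 8052         # assumed to match the containerPort above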
Most likely your application could not start up, or crashed shortly after starting. It may be due to insufficient memory or CPU resources, or one of the AWX dependencies (such as PostgreSQL or RabbitMQ) not being set up correctly.
Did you check whether your application works correctly without the probes? I recommend doing that first, and examining the pod stats a bit to make sure it is not restarting.

oauth2-proxy authentication calls slow on kubernetes cluster with auth annotations for nginx ingress

We have secured some of our services on the K8S cluster using the approach described on this page. Concretely, we have:
nginx.ingress.kubernetes.io/auth-url: "https://oauth2.${var.hosted_zone}/oauth2/auth"
nginx.ingress.kubernetes.io/auth-signin: "https://oauth2.${var.hosted_zone}/oauth2/start?rd=/redirect/$http_host$escaped_request_uri"
set on the service to be secured, and we have followed this tutorial so that there is only one deployment of oauth2_proxy per cluster. We have two proxies set up, both with affinity to be placed on the same node as the nginx ingress.
$ kubectl get pods -o wide -A | egrep "nginx|oauth"
infra-system wer-exp-nginx-ingress-exp-controller-696f5fbd8c-bm5ld 1/1 Running 0 3h24m 10.76.11.65 ip-10-76-9-52.eu-central-1.compute.internal <none> <none>
infra-system wer-exp-nginx-ingress-exp-controller-696f5fbd8c-ldwb8 1/1 Running 0 3h24m 10.76.14.42 ip-10-76-15-164.eu-central-1.compute.internal <none> <none>
infra-system wer-exp-nginx-ingress-exp-default-backend-7d69cc6868-wttss 1/1 Running 0 3h24m 10.76.15.52 ip-10-76-15-164.eu-central-1.compute.internal <none> <none>
infra-system wer-exp-nginx-ingress-exp-default-backend-7d69cc6868-z998v 1/1 Running 0 3h24m 10.76.11.213 ip-10-76-9-52.eu-central-1.compute.internal <none> <none>
infra-system oauth2-proxy-68bf786866-vcdns 2/2 Running 0 14s 10.76.10.106 ip-10-76-9-52.eu-central-1.compute.internal <none> <none>
infra-system oauth2-proxy-68bf786866-wx62c 2/2 Running 0 14s 10.76.12.107 ip-10-76-15-164.eu-central-1.compute.internal <none> <none>
However, a simple website load usually takes around 10 seconds, compared to 2-3 seconds when the proxy annotations are not present on the secured service.
We added a proxy_cache to the auth.domain.com service, which hosts our proxy, by adding
"nginx.ingress.kubernetes.io/server-snippet": <<EOF
proxy_cache auth_cache;
proxy_cache_lock on;
proxy_ignore_headers Cache-Control;
proxy_cache_valid any 30m;
add_header X-Cache-Status $upstream_cache_status;
EOF
but this didn't improve the latency either. We still see all HTTP requests triggering a log line in our proxy. Oddly, only some of the requests take 5 seconds.
We are unsure if:
- the proxy forwards each request to the oauth provider (github) or
- caches the authentications
We use cookie authentication, therefore, in theory, the oauth2_proxy should just decrypt the cookie and then return a 200 to the nginx ingress. Since they are both on the same node it should be fast. But it's not. Any ideas?
Edit 1
I have analyzed the situation further. Visiting my auth server at https://oauth2.domain.com/auth in the browser and copying the request as a curl command, I found that:
running 10,000 queries against my oauth server from my local machine (via curl) is very fast
running 100 requests on the nginx ingress with the same curl is slow
replacing the host name in the curl with the cluster IP of the auth service makes the performance increase drastically
setting the annotation to nginx.ingress.kubernetes.io/auth-url: http://172.20.95.17/oauth2/auth (e.g. setting the host == cluster IP) makes the GUI load as expected (fast)
it doesn't matter if the curl is run on the nginx-ingress or on any other pod (e.g. a test debian), the result is the same
Edit 2
A better fix I found was to set the annotation to the following
nginx.ingress.kubernetes.io/auth-url: "http://oauth2.infra-system.svc.cluster.local/oauth2/auth"
nginx.ingress.kubernetes.io/auth-signin: "https://oauth2.domain.com/oauth2/start?rd=/redirect/$http_host$escaped_request_uri"
The auth-url is what the ingress queries with the user's cookie. The cluster-local DNS name of the oauth2 service points to the same service as the external DNS name, but without the SSL round trip, and since it's a DNS name it is stable (while the cluster IP is not).
Given that it's unlikely that someone will come up with the reason why this happens, I'll post my workaround as the answer.
A fix I found was to set the annotation to the following
nginx.ingress.kubernetes.io/auth-url: "http://oauth2.infra-system.svc.cluster.local/oauth2/auth"
nginx.ingress.kubernetes.io/auth-signin: "https://oauth2.domain.com/oauth2/start?rd=/redirect/$http_host$escaped_request_uri"
The auth-url is what the ingress queries with the user's cookie. The cluster-local DNS name of the oauth2 service points to the same service as the external DNS name, but without the SSL round trip, and since it's a DNS name it is stable (while the cluster IP is not).
In my opinion, you observe the increased latency in response time with the
nginx.ingress.kubernetes.io/auth-url: "https://oauth2.${var.hosted_zone}/oauth2/auth"
setting because the auth server URL resolves to an external endpoint (in this case the VIP of the load balancer sitting in front of the Ingress Controller).
Practically, this means the traffic leaves the cluster (so-called hairpin mode) and comes back in via the external IP of the Ingress, which then routes to the internal ClusterIP Service (adding extra hops), instead of going directly to the ClusterIP/Service DNS name and staying within the Kubernetes cluster:
nginx.ingress.kubernetes.io/auth-url: "http://oauth2.infra-system.svc.cluster.local/oauth2/auth"
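For reference, a sketch of how the two annotations from the workaround sit on the secured service's Ingress; the two auth annotations are copied from above, while the Ingress name, host, and backend are placeholders, not values from the question:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secured-app                # placeholder
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "http://oauth2.infra-system.svc.cluster.local/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://oauth2.domain.com/oauth2/start?rd=/redirect/$http_host$escaped_request_uri"
spec:
  ingressClassName: nginx
  rules:
  - host: app.domain.com           # placeholder
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: secured-app      # placeholder
            port:
              number: 80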

Putting kibana behind nginx-ingress fails with a HTTP error

I have deployed Kibana in a Kubernetes environment. If I expose it with a LoadBalancer-type Service, I can access it fine. However, when I try to access it via an nginx-ingress, it fails. The configuration that I use in my nginx ingress is:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: my-ingress
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: kibana
          servicePort: {{ .Values.kibanaPort }}
        path: /kibana
I have launched my kibana with the following setting:
- name: SERVER_BASEPATH
  value: /kibana
and I am able to access Kibana fine via the LoadBalancer IP. However, when I try to access it via the Ingress, most of the calls go through fine except for a GET call to vendors.bundle.js, which fails almost consistently.
The log messages in the ingress during this call are as follows:
2019/10/25 07:31:48 [error] 430#430: *21284 upstream prematurely closed connection while sending to client, client: 10.142.0.84, server: _, request: "GET /kibana/bundles/vendors.bundle.js HTTP/2.0", upstream: "http://10.20.3.5:3000/kibana/bundles/vendors.bundle.js", host: "1.2.3.4", referrer: "https://1.2.3.4/kibana/app/kibana"
10.142.0.84 - [10.142.0.84] - - [25/Oct/2019:07:31:48 +0000] "GET /kibana/bundles/vendors.bundle.js HTTP/2.0" 200 1854133 "https://1.2.3.4/kibana/app/kibana" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36" 47 13.512 [some-service] 10.20.3.5:3000 7607326 13.513 200 506f778b25471822e62fbda2e57ccd6b
I am not sure why I get "upstream prematurely closed connection while sending to client" across different browsers. I have tried setting proxy-connect-timeout and proxy-read-timeout to 100 seconds, and even then it fails. I am not sure if this is due to some kind of default size or chunking limit.
It is also interesting to note that only some Kibana calls are failing, not all of them.
In the browser, I see the error message:
GET https://<ip>/kibana/bundles/vendors.bundle.js net::ERR_SPDY_PROTOCOL_ERROR 200
in the developer console.
Does anyone have an idea what config options I need to pass to my nginx-ingress to make the Kibana proxy_pass work?
I have found the cause of the error. The vendors.bundle.js file is relatively big, and since I was accessing it over a relatively slow network, the requests were being terminated. I fixed this by adding the following fields to the nginx-ingress configuration:
nginx.ingress.kubernetes.io/proxy-body-size: 10m # change this as you need
nginx.ingress.kubernetes.io/proxy-connect-timeout: "100"
nginx.ingress.kubernetes.io/proxy-send-timeout: "100"
nginx.ingress.kubernetes.io/proxy-read-timeout: "100"
nginx.ingress.kubernetes.io/proxy-buffering: "on"
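Put together, these annotations slot into the metadata of the Ingress from the question like this (everything except the added annotations mirrors the manifest above):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: 10m
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "100"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "100"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "100"
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: kibana
          servicePort: {{ .Values.kibanaPort }}
        path: /kibana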
