Simple gRPC Envoy configuration

I'm trying to set up an Envoy proxy as a gRPC front end and can't get it to work, so I'm trying to reduce things to as simple a test setup as possible and build from there, but I can't get that to work either. Here's what my test setup looks like:
Python server (slightly modified gRPC example code)
# greeter_server.py
from concurrent import futures
import time

import grpc

import helloworld_pb2
import helloworld_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24

class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:8081')
    server.start()
    try:
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        server.stop(0)

if __name__ == '__main__':
    serve()
Python client (slightly modified gRPC example code)
from __future__ import print_function

import grpc

import helloworld_pb2
import helloworld_pb2_grpc

def run():
    # NOTE(gRPC Python Team): .close() is possible on a channel and should be
    # used in circumstances in which the with statement does not fit the needs
    # of the code.
    with grpc.insecure_channel('localhost:9911') as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'))
    print("Greeter client received: " + response.message)

if __name__ == '__main__':
    run()
And then my two envoy yaml files:
# envoy-hello-server.yaml
static_resources:
listeners:
- address:
socket_address:
address: 0.0.0.0
port_value: 8811
filter_chains:
- filters:
- name: envoy.http_connection_manager
typed_config:
"#type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
codec_type: auto
stat_prefix: ingress_http
access_log:
- name: envoy.file_access_log
typed_config:
"#type": type.googleapis.com/envoy.config.accesslog.v2.FileAccessLog
path: "/dev/stdout"
route_config:
name: local_route
virtual_hosts:
- name: backend
domains:
- "*"
routes:
- match:
prefix: "/"
grpc: {}
route:
cluster: hello_grpc_service
http_filters:
- name: envoy.router
typed_config: {}
clusters:
- name: hello_grpc_service
connect_timeout: 0.250s
type: strict_dns
lb_policy: round_robin
http2_protocol_options: {}
load_assignment:
cluster_name: hello_grpc_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: hello_grpc_service
port_value: 8081
admin:
access_log_path: "/tmp/envoy_hello_server.log"
address:
socket_address:
address: 0.0.0.0
port_value: 8881
and
# envoy-hello-client.yaml
static_resources:
listeners:
- address:
socket_address:
address: 0.0.0.0
port_value: 9911
filter_chains:
- filters:
- name: envoy.http_connection_manager
typed_config:
"#type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
codec_type: auto
add_user_agent: true
access_log:
- name: envoy.file_access_log
typed_config:
"#type": type.googleapis.com/envoy.config.accesslog.v2.FileAccessLog
path: "/dev/stdout"
stat_prefix: egress_http
common_http_protocol_options:
idle_timeout: 0.840s
use_remote_address: true
route_config:
name: local_route
virtual_hosts:
- name: backend
domains:
- grpc
routes:
- match:
prefix: "/"
route:
cluster: backend-proxy
http_filters:
- name: envoy.router
typed_config: {}
clusters:
- name: backend-proxy
type: logical_dns
dns_lookup_family: V4_ONLY
lb_policy: round_robin
connect_timeout: 0.250s
http_protocol_options: {}
load_assignment:
cluster_name: backend-proxy
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: hello_grpc_service
port_value: 8811
admin:
access_log_path: "/tmp/envoy_hello_client.log"
address:
socket_address:
address: 0.0.0.0
port_value: 9991
Now, what I expect this to allow is a flow like: greeter_client.py (port 9911) -> envoy (envoy-hello-client.yaml) -> envoy (envoy-hello-server.yaml) -> greeter_server.py (port 8081)
Instead, what I get is an error from the python client:
$ python3 greeter_client.py
Traceback (most recent call last):
File "greeter_client.py", line 35, in <module>
run()
File "greeter_client.py", line 30, in run
response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'))
File "/usr/lib/python3/dist-packages/grpc/_channel.py", line 533, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/lib/python3/dist-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "{"created":"@1594770575.642032812","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"","grpc_status":12}"
>
And in the envoy client log:
[2020-07-14 16:22:10.407][16935][info][main] [external/envoy/source/server/server.cc:652] starting main dispatch loop
[2020-07-14 16:23:25.441][16935][info][runtime] [external/envoy/source/common/runtime/runtime_impl.cc:524] RTDS has finished initialization
[2020-07-14 16:23:25.441][16935][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:182] cm init: all clusters initialized
[2020-07-14 16:23:25.441][16935][info][main] [external/envoy/source/server/server.cc:631] all clusters initialized. initializing init manager
[2020-07-14 16:23:25.441][16935][info][config] [external/envoy/source/server/listener_manager_impl.cc:844] all dependencies initialized. starting workers
[2020-07-14 16:23:25.441][16935][warning][main] [external/envoy/source/server/server.cc:537] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2020-07-14T23:49:35.641Z] "POST /helloworld.Greeter/SayHello HTTP/2" 200 NR 0 0 0 - "10.0.0.56" "grpc-python/1.16.1 grpc-c/6.0.0 (linux; chttp2; gao)" "aa72310a-3188-46b2-8cbf-9448b074f7ae" "localhost:9911" "-"
And nothing in the server log.
Also, weirdly, there is an almost one-second delay between when I run the Python client and when the log message shows up in the client envoy.
What am I missing to make these two scripts talk via envoy?

I know I'm a bit late; hope this helps someone. Since your gRPC server is running on the same host, you can specify the hostname as host.docker.internal (the previous docker.for.mac.localhost has been deprecated since Docker v18.03.0).
In your case, if you are running in a dockerized environment, you could do the following:
Envoy version: 1.13+
clusters:
- name: backend-proxy
type: logical_dns
dns_lookup_family: V4_ONLY
lb_policy: round_robin
connect_timeout: 0.250s
http_protocol_options: {}
load_assignment:
cluster_name: backend-proxy
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: host.docker.internal
port_value: 8811
hello_grpc_service won't resolve to an IP in a dockerized environment.
Note: you can enable Envoy's trace log level for more detailed logs.
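If everything runs in Docker, a minimal Compose sketch along these lines may help tie it together (the service name, image tag, and file paths here are my assumptions, not taken from the question); it also shows one way to turn on the trace log level via the -l flag:

# Hypothetical docker-compose sketch; adjust names and paths to your setup.
version: "3.8"
services:
  envoy-hello-client:
    image: envoyproxy/envoy:v1.18-latest
    # -l trace enables Envoy's trace-level logging for debugging
    command: ["envoy", "-c", "/etc/envoy/envoy-hello-client.yaml", "-l", "trace"]
    volumes:
      - ./envoy-hello-client.yaml:/etc/envoy/envoy-hello-client.yaml:ro
    ports:
      - "9911:9911"   # gRPC listener used by greeter_client.py
      - "9991:9991"   # admin interface
    extra_hosts:
      # On Linux this makes host.docker.internal resolve to the Docker host (Docker 20.10+);
      # Docker Desktop for Mac/Windows provides it out of the box.
      - "host.docker.internal:host-gateway"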

artifactory docker push error unknown: Method Not Allowed

We are using an Artifactory Pro license and installed Artifactory through Helm on Kubernetes.
When we create a Docker local repo (the Repository Path Method) and push a Docker image,
we get a 405 Method Not Allowed error. (docker login / pull works normally.)
########## error msg
# docker push art2.bee0dev.lge.com/docker-local/hello-world
e07ee1baac5f: Pushing [==================================================>] 14.85kB
unknown: Method Not Allowed
##########
We are using an HAProxy load balancer, which handles TLS, in front of the NGINX Ingress Controller.
(The NGINX Ingress Controller's HTTP NodePort is 31071.)
Please help us figure out how to solve the problem.
The Artifactory and HAProxy settings are as follows.
########## value.yaml
global:
joinKeySecretName: "artbee-stg-joinkey-secret"
masterKeySecretName: "artbee-stg-masterkey-secret"
storageClass: "sa-stg-netapp8300-bee-blk-nonretain"
ingress:
enabled: true
defaultBackend:
enabled: false
hosts: ["art2.bee0dev.lge.com"]
routerPath: /
artifactoryPath: /artifactory/
className: ""
annotations:
kubernetes.io/ingress.class: "nginx"
nginx.ingress.kubernetes.io/proxy-body-size: "0"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
nginx.ingress.kubernetes.io/configuration-snippet: |
proxy_pass_header Server;
proxy_set_header X-JFrog-Override-Base-Url https://art2.bee0dev.lge.com;
labels: {}
tls: []
additionalRules: []
## Artifactory license.
artifactory:
name: artifactory
replicaCount: 1
image:
registry: releases-docker.jfrog.io
repository: jfrog/artifactory-pro
# tag:
pullPolicy: IfNotPresent
labels: {}
updateStrategy:
type: RollingUpdate
migration:
enabled: false
customInitContainersBegin: |
- name: "init-mount-permission-setup"
image: "{{ .Values.initContainerImage }}"
imagePullPolicy: "{{ .Values.artifactory.image.pullPolicy }}"
securityContext:
runAsUser: 0
runAsGroup: 0
allowPrivilegeEscalation: false
capabilities:
drop:
- NET_RAW
command:
- 'bash'
- '-c'
- if [ $(ls -la /var/opt/jfrog | grep artifactory | awk -F' ' '{print $3$4}') == 'rootroot' ]; then
echo "mount permission=> root:root";
echo "change mount permission to 1030:1030 " {{ .Values.artifactory.persistence.mountPath }};
chown -R 1030:1030 {{ .Values.artifactory.persistence.mountPath }};
else
echo "already set. No change required.";
ls -la {{ .Values.artifactory.persistence.mountPath }};
fi
volumeMounts:
- mountPath: "{{ .Values.artifactory.persistence.mountPath }}"
name: artifactory-volume
database:
maxOpenConnections: 80
tomcat:
maintenanceConnector:
port: 8091
connector:
maxThreads: 200
sendReasonPhrase: false
extraConfig: 'acceptCount="100"'
customPersistentVolumeClaim: {}
license:
## licenseKey is the license key in plain text. Use either this or the license.secret setting
licenseKey: "???"
secret:
dataKey:
resources:
requests:
memory: "2Gi"
cpu: "1"
limits:
memory: "20Gi"
cpu: "8"
javaOpts:
xms: "1g"
xmx: "12g"
admin:
ip: "127.0.0.1"
username: "admin"
password: "!swiit123"
secret:
dataKey:
service:
name: artifactory
type: ClusterIP
loadBalancerSourceRanges: []
annotations: {}
persistence:
mountPath: "/var/opt/jfrog/artifactory"
enabled: true
accessMode: ReadWriteOnce
size: 100Gi
type: file-system
storageClassName: "sa-stg-netapp8300-bee-blk-nonretain"
nginx:
enabled: false
##########
########## haproxy config
frontend cto-stage-http-frontend
bind 10.185.60.75:80
bind 10.185.60.76:80
bind 10.185.60.201:80
bind 10.185.60.75:443 ssl crt /etc/haproxy/ssl/bee0dev.lge.com.pem ssl-min-ver TLSv1.2 ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
bind 10.185.60.76:443 ssl crt /etc/haproxy/ssl/bee0dev.lge.com.pem ssl-min-ver TLSv1.2 ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
bind 10.185.60.201:443 ssl crt /etc/haproxy/ssl/bee0dev.lge.com.pem ssl-min-ver TLSv1.2 ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
mode http
option forwardfor
option accept-invalid-http-request
acl k8s-cto-stage hdr_end(host) -i -f /etc/haproxy/web-ide/cto-stage
use_backend k8s-cto-stage-http if k8s-cto-stage
backend k8s-cto-stage-http
mode http
redirect scheme https if !{ ssl_fc }
option tcp-check
balance roundrobin
server lgestgbee04v 10.185.60.78:31071 check fall 3 rise 2
##########
The request doesn't seem to be landing at the correct endpoint. Please remove the semicolon from the docker command and retry.
docker push art2.bee0dev.lge.com;/docker-local/hello-world
Try executing it like below,
docker push art2.bee0dev.lge.com/docker-local/hello-world

Send the trace data of a website using Jaeger and Opentelemetry to Opensearch

I'm working on the observability part of OpenSearch, so I'm trying to collect the trace data of a WordPress website and send it to OpenSearch.
I'm collecting the trace data using the WordPress plugin DecaLog, which sends the data to the Jaeger agent; from Jaeger I'm sending the data to the OpenTelemetry Collector, then to Data Prepper, and lastly to OpenSearch.
Jaeger agent service in docker-compose :
jaeger-agent:
container_name: jaeger-agent
image: jaegertracing/jaeger-agent:latest
command: [ "--reporter.grpc.host-port=otel-collector:14250" ]
ports:
- "5775:5775/udp"
- "6831:6831/udp"
- "6832:6832/udp"
- "5778:5778/tcp"
networks:
- our-network
The "command" ligne got me this error : Err: connection error: desc = "transport: Error while dialing dial tcp: lookup otel-collector on 127.0.0.11:53: server misbehaving"","system":"grpc","grpc_log":true
So I changed otel-collector to the IP of the otel-collector container.
Otel collector and data prepper are installed using docker-compose.
data-prepper:
restart: unless-stopped
container_name: data-prepper
image: opensearchproject/data-prepper:latest
volumes:
- ./data-prepper/examples/trace_analytics_no_ssl.yml:/usr/share/data-prepper/pipelines.yaml
- ./data-prepper/examples/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml
- ./data-prepper/examples/demo/root-ca.pem:/usr/share/data-prepper/root-ca.pem
ports:
- "21890:21890"
networks:
- our-network
depends_on:
- "opensearch"
otel-collector:
container_name: otel-collector
image: otel/opentelemetry-collector:0.54.0
command: [ "--config=/etc/otel-collector-config.yml" ]
working_dir: "/project"
volumes:
- ${PWD}/:/project
- ./otel-collector-config.yml:/etc/otel-collector-config.yml
- ./data-prepper/examples/demo/demo-data-prepper.crt:/etc/demo-data-prepper.crt
ports:
- "4317:4317"
depends_on:
- data-prepper
networks:
- our-network
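One note on the earlier "lookup otel-collector ... server misbehaving" error: if the Jaeger agent and the collector are started from different docker-compose files, a network called our-network declared in each file ends up as two separate networks, and service names will not resolve across them. A hedged sketch of one way around that (assuming the network was created once with docker network create our-network) is to declare it as external in both files so every service attaches to the same network:

# Hypothetical top-level networks block for both compose files
networks:
  our-network:
    external: true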
The configuration of otel.yaml (to send data from the OpenTelemetry Collector to OpenSearch):
receivers:
jaeger:
protocols:
grpc:
exporters:
otlp/2:
endpoint: data-prepper:21890
tls:
insecure: true
insecure_skip_verify: true
logging:
service:
pipelines:
traces:
receivers: [jaeger]
exporters: [logging, otlp/2]
The configuration for the Data Prepper pipeline:
entry-pipeline:
delay: "100"
source:
otel_trace_source:
ssl: false
sink:
- pipeline:
name: "raw-pipeline"
- pipeline:
name: "service-map-pipeline"
raw-pipeline:
source:
pipeline:
name: "entry-pipeline"
prepper:
- otel_trace_raw_prepper:
sink:
- opensearch:
hosts: [ "http://localhost:9200" ]
cert: "/usr/share/data-prepper/root-ca.pem"
username: "admin"
password: "admin"
trace_analytics_raw: true
service-map-pipeline:
delay: "100"
source:
pipeline:
name: "entry-pipeline"
prepper:
- service_map_stateful:
sink:
- opensearch:
hosts: ["http://localhost:9200"]
cert: "/usr/share/data-prepper/root-ca.pem"
username: "admin"
password: "admin"
trace_analytics_service_map: true
As of now I'm getting the following errors:
Jaeger agent :
Err: connection error: desc = \"transport: Error while dialing dial tcp otel-collector-container-IP:14250: i/o timeout\"","system":"grpc","grpc_log":true}
Open telemetry collector :
2022-08-04T15:31:32.675Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-08-04T15:31:32.675Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-08-04T15:31:32.675Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "otlp/2"}
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "traces", "name": "otlp/2"}
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:86 Starting processors...
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:98 Starting receivers...
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:102 Exporter is starting... {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-08-04T15:31:32.683Z info static/strategy_store.go:203 No sampling strategies provided or URL is unavailable, using defaults {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-08-04T15:31:32.683Z info pipelines/pipelines.go:106 Exporter started. {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-08-04T15:31:32.683Z info service/collector.go:220 Starting otelcol... {"Version": "0.54.0", "NumCPU": 2}
2022-08-04T15:31:32.683Z info service/collector.go:128 Everything is ready. Begin running and processing data.
2022-08-04T15:31:32.684Z warn zapgrpc/zapgrpc.go:191 [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
"Addr": "data-prepper:21890",
"ServerName": "data-prepper:21890",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp data-prepper-container-ip:21890: connect: connection refused" {"grpc_log": true}
Data prepper :
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.amazon.dataprepper.DataPrepper]: Constructor threw exception; nested exception is java.lang.RuntimeException: No valid pipeline is available for execution, exiting
Followed by this at the end :
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-08-04T15:23:22,803 [main] INFO com.amazon.dataprepper.parser.config.DataPrepperAppConfiguration - Command line args: /usr/share/data-prepper/pipelines.yaml,/usr/share/data-prepper/data-prepper-config.yaml
2022-08-04T15:23:22,806 [main] INFO com.amazon.dataprepper.parser.config.DataPrepperArgs - Using /usr/share/data-prepper/pipelines.yaml configuration file
OpenSearch needs a separate tool to support ingestion of OpenTelemetry data. It is called Data Prepper and is part of the OpenSearch project. There is a nice getting started guide on how to set up trace analytics in OpenSearch.
Data Prepper works similarly to Fluentd or the OpenTelemetry Collector, but has proper support for OpenSearch as a data sink. It pre-processes trace data appropriately for the OpenSearch Dashboards UI tracing plugin. Data Prepper also supports the OpenTelemetry metrics format.
Are you still having issues running Data Prepper? The configuration used in this example has been updated since the latest release, and should now be up to date and working (https://github.com/opensearch-project/data-prepper/blob/main/examples/trace_analytics_no_ssl.yml)
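One more thing worth checking in the pipeline shown in the question: inside the data-prepper container, http://localhost:9200 points at the Data Prepper container itself, not at OpenSearch. Assuming the OpenSearch service is the opensearch entry referenced in depends_on, a hedged sketch of the raw-pipeline sink would be:

raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        # "opensearch" is the Compose service name, reachable on the shared Docker network
        hosts: [ "http://opensearch:9200" ]
        cert: "/usr/share/data-prepper/root-ca.pem"
        username: "admin"
        password: "admin"
        trace_analytics_raw: true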

Envoy: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"

I'm a newbie with Envoy.
I have been struggling for a week with the error below. My downstream (the service which requests some data/update) receives this response:
Status code: 503
Headers:
...
Server:"envoy"
X-Envoy-Response-Code-Details:"upstream_reset_before_response_started{connection_failure}"
X-Envoy-Response-Flags: "UF,URX"
Body: upstream connect error or disconnect/reset before headers. reset reason: connection failure
On the other side, my upstream sees a disconnection (context cancelled).
And the upstream service doesn't return 503 codes at all.
All traffic goes over HTTP/1.
My envoy.yaml:
admin:
access_log_path: /tmp/admin_access.log
address:
socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 80 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"#type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: http1
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: [ "*" ]
response_headers_to_add: # added for debugging
- header:
key: x-envoy-response-code-details
value: "%RESPONSE_CODE_DETAILS%"
- header:
key: x-envoy-response-flags
value: "%RESPONSE_FLAGS%"
routes:
- match: # consistent routing
safe_regex:
google_re2: { }
regex: SOME_STRANGE_REGEX_FOR_CONSISTENT_ROUTING
route:
cluster: consistent_cluster
hash_policy:
header:
header_name: ":path"
regex_rewrite:
pattern:
google_re2: { }
regex: SOME_STRANGE_REGEX_FOR_CONSISTENT_ROUTING
substitution: "\\1"
retry_policy: # attempt to avoid 503 errors by retries
retry_on: "connect-failure,refused-stream,unavailable,cancelled,resource-exhausted,retriable-status-codes"
retriable_status_codes: [ 503 ]
num_retries: 3
retriable_request_headers:
- name: ":method"
exact_match: "GET"
- match: { prefix: "/" } # default routing (all routes except consistent)
route:
cluster: default_cluster
retry_policy: # attempt to avoid 503 errors by retries
retry_on: "connect-failure,refused-stream,unavailable,cancelled,resource-exhausted,retriable-status-codes"
retriable_status_codes: [ 503 ]
retry_host_predicate:
- name: envoy.retry_host_predicates.previous_hosts
host_selection_retry_max_attempts: 3
http_filters:
- name: envoy.filters.http.router
clusters:
- name: consistent_cluster
connect_timeout: 0.05s
type: STRICT_DNS
dns_refresh_rate: 1s
dns_lookup_family: V4_ONLY
lb_policy: MAGLEV
health_checks:
- timeout: 1s
interval: 1s
unhealthy_threshold: 1
healthy_threshold: 1
http_health_check:
path: "/health"
load_assignment:
cluster_name: consistent_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: consistent-host
port_value: 80
- name: default_cluster
connect_timeout: 0.05s
type: STRICT_DNS
dns_refresh_rate: 1s
dns_lookup_family: V4_ONLY
lb_policy: ROUND_ROBIN
health_checks:
- timeout: 1s
interval: 1s
unhealthy_threshold: 1
healthy_threshold: 1
http_health_check:
path: "/health"
outlier_detection: # attempt to avoid 503 errors by ejecting unhealth pods
consecutive_gateway_failure: 1
load_assignment:
cluster_name: default_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: default-host
port_value: 80
I also tried to add logs:
access_log:
- name: accesslog
typed_config:
"#type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: /tmp/http_access.log
log_format:
text_format: "[%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %CONNECTION_TERMINATION_DETAILS% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\"\n"
filter:
status_code_filter:
comparison:
op: GE
value:
default_value: 500
runtime_key: access_log.access_error.status
It gave me nothing, because %CONNECTION_TERMINATION_DETAILS% is always empty ("-"), and the response flags I had already seen from the headers in the downstream responses.
I increased connect_timeout twice (0.01s -> 0.02s -> 0.05s). It didn't help at all, and other services (reached by direct routing) work fine with a 10ms connect timeout.
BTW, everything works fine for roughly the first 20 minutes after a redeploy.
Hope to hear your ideas about what this could be and where I should dig.
P.S.: I also sometimes get health check errors in the logs, but I have no idea why. And everything worked well without Envoy (no errors, no timeouts): health checking, direct requests, etc.
I experienced a similar problem when starting Envoy as a Docker container. In the end, the reason was a missing --network host option in the docker run command, which led to the clusters not being visible from within Envoy's Docker container. Maybe this helps you, too?
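If you start Envoy from a Compose file instead of docker run, the equivalent of --network host would be network_mode: host, roughly like this (service name, image tag, and config path are assumptions):

services:
  envoy:
    image: envoyproxy/envoy:v1.18-latest
    network_mode: host   # share the host's network namespace so upstream hostnames resolve as they do on the host
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml:ro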

Struggling to make gRPC-web and HTTPS work

I've got a SPA web app that uses gRPC web and envoy to proxy back to a server that speaks gRPC. This all works great, no problems.
I'm trying to make this secure using HTTPS/TLS and just keep running into issues and can't make it work.
Our setup is this:
Web client SPA app (served from a Node.js web server also running on the lahinch server; URL is https://lahinch.mycorp.com ). The web app connects to the Envoy proxy using the address "https://coxos.mycorp.COM:8090"
\
Envoy Proxy (coxos - 172.16.0.116) - listens on port 8090 and proxies to port 50251
\
\
Backend gRPC server (lahinch - 172.16.0.109) - listens on port 50251
From reading the envoy docs, the web client is downstream and the backend server is upstream.
Here is my envoy.yaml file
admin:
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: listener_0
address:
socket_address:
address: 0.0.0.0
port_value: 8090
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
access_log:
- name: envoy.access_loggers.file
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: /dev/stdout
- name: envoy.access_loggers.http_grpc
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.access_loggers.grpc.v3.HttpGrpcAccessLogConfig
common_config:
log_name: envoygrpclog
grpc_service:
envoy_grpc:
cluster_name: controlweb_backendservice
transport_api_version: V3
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains:
- '*'
routes:
- match:
prefix: /
route:
cluster: controlweb_backendservice
hash_policy:
- header:
header_name: x-session-hash
max_stream_duration:
grpc_timeout_header_max: 300s
cors:
allow_origin_string_match:
- safe_regex:
google_re2: {}
regex: .*
allow_methods: 'GET, PUT, DELETE, POST, OPTIONS'
allow_headers: >-
keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,grpc-status-details-bin,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout,access-token,x-session-hash
expose_headers: >-
grpc-status-details-bin,grpc-status,grpc-message,access-token
max_age: '1728000'
http_filters:
- name: envoy.filters.http.grpc_web
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
- name: envoy.filters.http.cors
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
- name: envoy.filters.http.router
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
# https://www.envoyproxy.io/docs/envoy/v1.17.0/api-v3/extensions/transport_sockets/tls/v3/tls.proto#extensions-transport-sockets-tls-v3-downstreamtlscontext
"#type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
tls_certificates:
- certificate_chain:
# Certificate must be PEM-encoded
filename: /etc/lahinch.pem
private_key:
filename: /etc/lahinch.key.pem
validation_context:
trusted_ca:
filename: /etc/ssl/certs/ZZZ-CA256.pem
clusters:
- name: controlweb_backendservice
type: LOGICAL_DNS
connect_timeout: 0.25s
dns_lookup_family: V4_ONLY
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: cluster_controlweb_backendservice
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: lahinch.mycorp.com
port_value: 50251
http2_protocol_options: {}
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
'@type': >-
type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
common_tls_context:
tls_certificates:
- certificate_chain:
filename: /etc/lahinch.pem
private_key:
filename: /etc/lahinch.key.pem
validation_context:
trusted_ca:
filename: /etc/ssl/certs/ZZZ-CA256.pem
Using this, I'm getting the following in the Envoy log when I try to run my web app:
[2021-04-09 22:08:33.939][17][debug][conn_handler] [source/server/connection_handler_impl.cc:501] [C2] new connection
[2021-04-09 22:08:33.945][17][debug][http] [source/common/http/conn_manager_impl.cc:254] [C2] new stream
[2021-04-09 22:08:33.945][17][debug][http] [source/common/http/conn_manager_impl.cc:886] [C2][S3055347406573314092] request headers complete (end_stream=false):
':authority', 'coxos.mycorp.com:8090'
':path', '/WanderAuth.HostService/LogIn'
':method', 'POST'
'connection', 'keep-alive'
'content-length', '124'
'accept', 'application/grpc-web-text'
'x-user-agent', 'grpc-web-javascript/0.1'
'access-token', ''
'x-grpc-web', '1'
'user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.68'
'grpc-timeout', '90000m'
'content-type', 'application/grpc-web-text'
'origin', 'https://lahinch.mycorp.com'
'sec-fetch-site', 'same-site'
'sec-fetch-mode', 'cors'
'sec-fetch-dest', 'empty'
'referer', 'https://lahinch.mycorp.com/'
'accept-encoding', 'gzip, deflate, br'
'accept-language', 'en-US,en;q=0.9'
[2021-04-09 22:08:33.946][17][debug][router] [source/common/router/router.cc:425] [C2][S3055347406573314092] cluster 'controlweb_backendservice' match for URL '/WanderAuth.HostService/LogIn'
[2021-04-09 22:08:33.946][17][debug][router] [source/common/router/router.cc:582] [C2][S3055347406573314092] router decoding headers:
':authority', 'coxos.mycorp.com:8090'
':path', '/WanderAuth.HostService/LogIn'
':method', 'POST'
':scheme', 'https'
'accept', 'application/grpc-web-text'
'x-user-agent', 'grpc-web-javascript/0.1'
'access-token', ''
'x-grpc-web', '1'
'user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.68'
'grpc-timeout', '90000m'
'content-type', 'application/grpc'
'origin', 'https://lahinch.mycorp.com'
'sec-fetch-site', 'same-site'
'sec-fetch-mode', 'cors'
'sec-fetch-dest', 'empty'
'referer', 'https://lahinch.mycorp.com/'
'accept-encoding', 'gzip, deflate, br'
'accept-language', 'en-US,en;q=0.9'
'x-forwarded-proto', 'https'
'x-request-id', 'a4a041ab-dc29-4ed7-a342-90ac03b3be3c'
'te', 'trailers'
'grpc-accept-encoding', 'identity'
'x-envoy-expected-rq-timeout-ms', '15000'
[2021-04-09 22:08:33.946][17][debug][pool] [source/common/http/conn_pool_base.cc:79] queueing stream due to no available connections
[2021-04-09 22:08:33.946][17][debug][pool] [source/common/conn_pool/conn_pool_base.cc:106] creating a new connection
[2021-04-09 22:08:33.946][17][debug][client] [source/common/http/codec_client.cc:41] [C3] connecting
[2021-04-09 22:08:33.946][17][debug][connection] [source/common/network/connection_impl.cc:860] [C3] connecting to 172.16.0.109:50251
[2021-04-09 22:08:33.946][17][debug][connection] [source/common/network/connection_impl.cc:876] [C3] connection in progress
[2021-04-09 22:08:33.946][17][debug][http2] [source/common/http/http2/codec_impl.cc:1184] [C3] updating connection-level initial window size to 268435456
[2021-04-09 22:08:33.946][17][debug][http] [source/common/http/filter_manager.cc:755] [C2][S3055347406573314092] request end stream
[2021-04-09 22:08:33.947][17][debug][connection] [source/common/network/connection_impl.cc:666] [C3] connected
[2021-04-09 22:08:33.947][17][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:224] [C3] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2021-04-09 22:08:33.948][17][debug][connection] [source/common/network/connection_impl.cc:241] [C3] closing socket: 0
[2021-04-09 22:08:33.948][17][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:224] [C3] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2021-04-09 22:08:33.948][17][debug][client] [source/common/http/codec_client.cc:99] [C3] disconnect. resetting 0 pending requests
[2021-04-09 22:08:33.948][17][debug][pool] [source/common/conn_pool/conn_pool_base.cc:343] [C3] client disconnected, failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2021-04-09 22:08:33.948][17][debug][router] [source/common/router/router.cc:1026] [C2][S3055347406573314092] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2021-04-09 22:08:33.948][17][debug][http] [source/common/http/filter_manager.cc:839] [C2][S3055347406573314092] Sending local reply with details upstream_reset_before_response_started{connection failure,TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2021-04-09 22:08:33.948][17][debug][http] [source/common/http/conn_manager_impl.cc:1484] [C2][S3055347406573314092] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '190'
'content-type', 'application/grpc-web-text+proto'
'access-control-allow-origin', 'https://lahinch.mycorp.com'
'access-control-expose-headers', 'grpc-status-details-bin,grpc-status,grpc-message,access-token'
'date', 'Fri, 09 Apr 2021 22:08:33 GMT'
'server', 'envoy'
[2021-04-09 22:08:36.139][9][debug][upstream] [source/common/upstream/logical_dns_cluster.cc:101] starting async DNS resolution for lahinch.mycorp.com
[2021-04-09 22:08:36.139][9][debug][main] [source/server/server.cc:199] flushing stats
[2021-04-09 22:08:36.141][9][debug][upstream] [source/common/upstream/logical_dns_cluster.cc:109] async DNS resolution complete for lahinch.mycorp.com
[2021-04-09 22:08:36.141][9][debug][upstream] [source/common/upstream/logical_dns_cluster.cc:155] DNS refresh rate reset for lahinch.mycorp.com, refresh rate 5000 ms
So the error appears to be this: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
I've looked up this error and it appears to be related to security and certificates. But I haven't been able to find a good answer as to what I'm doing wrong.
When it comes to the required certs, should they be the same ones used by the client (downstream), the proxy, the backend (upstream server), or some combination? I've tried using different certs for the different servers, and the same certs across the servers, and I still get the same error.
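For what it's worth, WRONG_VERSION_NUMBER on the upstream connection ([C3] above) is typically what OpenSSL reports when the peer is not speaking TLS at all. A hedged sketch, assuming the gRPC server on port 50251 listens in plaintext: keep the listener-side DownstreamTlsContext (so the browser still gets HTTPS) and drop the upstream transport_socket from the cluster, e.g.:

clusters:
  - name: controlweb_backendservice
    type: LOGICAL_DNS
    connect_timeout: 0.25s
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    # cleartext HTTP/2 (h2c) to the backend; with no transport_socket, Envoy does not attempt TLS upstream
    http2_protocol_options: {}
    load_assignment:
      cluster_name: controlweb_backendservice
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: lahinch.mycorp.com
                    port_value: 50251

If the backend really does terminate TLS on 50251, the UpstreamTlsContext stays, but the certificate/key configured there would normally be a client certificate (only needed for mTLS), and trusted_ca must be the CA that signed the backend's certificate.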

Setting up Ingress (Kubernetes)

I want to set up an Ingress, which routes traffic to my underlying Services. Unfortunately, I get an error when I deploy my ingress-controller-deployment.yaml and I don't know why... The pod with the ingress-controller crashes immediately, with the error message "CrashLoopBackOff".
As I understand it, the ingress controller has to be deployed in a Pod, and this Pod is accessed through the ingress-svc. The ingress-svc seems to work, but the Pod crashes. Once the ingress controller works I'll need an additional file that defines the routes and so on, but I don't see the point of continuing without a working and deployable ingress controller.
Pod description:
Name: ingress-controller-7749c785f-x94ll
Namespace: ingress
Node: gke-cluster-1-default-pool-8484e77d-r4wp/10.128.0.2
Start Time: Thu, 26 Apr 2018 14:25:04 +0200
Labels: k8s-app=nginx-ingress-lb
pod-template-hash=330573419
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"ingress","name":"ingress-controller-7749c785f","uid":"d8ff0a6d-494c-11e8-a840
-420...
Status: Running
IP: 10.8.0.14
Created By: ReplicaSet/ingress-controller-7749c785f
Controlled By: ReplicaSet/ingress-controller-7749c785f
Containers:
nginx-ingress-controller:
Container ID: docker://5654c7dffc44510132cba303d66ee570280f2cec235e4d4fa6ef8ad543e0c91d
Image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
Image ID: docker-pullable://quay.io/kubernetes-ingress-controller/nginx-ingress-controller@sha256:39cc6ce23e5bcdf8aa78bc28bbcfe0999e449bf99fe2e8d60984b417facc5cd4
Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--admin-backend-svc=$(POD_NAMESPACE)/admin-backend
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 26 Apr 2018 14:26:57 +0200
Finished: Thu, 26 Apr 2018 14:26:57 +0200
Ready: False
Restart Count: 4
Liveness: http-get http://:10254/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-controller-7749c785f-x94ll (v1:metadata.name)
POD_NAMESPACE: ingress (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-plbss (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-plbss:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-plbss
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Ingress-controller-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: ingress-controller
spec:
replicas: 1
revisionHistoryLimit: 3
template:
metadata:
labels:
k8s-app: nginx-ingress-lb
spec:
containers:
- args:
- /nginx-ingress-controller
- "--admin-backend-svc=$(POD_NAMESPACE)/admin-backend"
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
image: "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0"
imagePullPolicy: Always
livenessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 5
name: nginx-ingress-controller
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: ingress-svc
spec:
type: LoadBalancer
ports:
- name: http
port: 80
targetPort: http
- name: https
port: 443
targetPort: https
selector:
k8s-app: nginx-ingress-lb
The issue is the args. The args on one of mine are:
args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-http-backend
- --configmap=$(POD_NAMESPACE)/nginx-configuration
- --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
- --udp-services-configmap=$(POD_NAMESPACE)/udp-services
- --publish-service=$(POD_NAMESPACE)/ingress-nginx
- --annotations-prefix=nginx.ingress.kubernetes.io
I had also created the ConfigMaps for configuration, tcp, and udp.
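For reference, a minimal sketch of those ConfigMaps (empty ones are enough to start; the names must match the flags above, and the namespace must be the one the controller runs in, assumed here to be ingress):

# Hypothetical ConfigMaps referenced by --configmap, --tcp-services-configmap, --udp-services-configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress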
