I have a simple stream block that proxies MySQL TCP traffic to MaxScale instances. The second instance acts only as a failover:
stream {
upstream maxscale {
zone upstream_maxscale 64k;
server 10.1.0.11:3307;
server 10.1.0.12:3307 backup;
}
server {
listen 3307;
proxy_pass maxscale;
}
}
When connections are few (<30), everything works fine. But when connections are high (>40, if 40 connections can be called high...), the nginx error log keeps reporting an error that I don't know how to debug:
recv() failed (104: Connection reset by peer) while proxying and reading from upstream, client: 10.1.0.16, server: 10.1.0.15:3307, upstream: "10.1.0.11:3307", bytes from/to client:15738/64316, bytes from/to upstream:64316/15738
I've tried playing with options such as reuseport, worker_connections and so_keepalive, but with no luck.
https://nginx.org/en/docs/stream/ngx_stream_core_module.html
https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/
Could it be a problem on the MaxScale side?
Here is the MaxScale 2.4 listener and service configuration:
# Listener
[listener-rw]
type=listener
service=readwritesplit
protocol=MariaDBClient
address=10.1.0.11
port=3307
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/server.pem
ssl_key=/var/lib/maxscale/ssl/server.key
ssl_version=MAX
# Service
[readwritesplit]
type=service
router=readwritesplit
servers=sql1,sql2,sql3
user=maxscale
password=324F74A347291B3BE79956AD5F4BB2FAD65E1F9052A976722917701742729400
enable_root_user=1
max_sescmd_history=150
max_slave_connections=100%
lazy_connect=true
slave_selection_criteria=LEAST_CURRENT_OPERATIONS
optimistic_trx=true
connection_keepalive=300
master_failure_mode=fail_on_write
The MaxScale log (/var/log/maxscale/maxscale.log) most likely either explains why you receive these errors or at least helps you narrow down the problem.
If you can't find the reason from the logs alone, I would suggest opening a bug report on the MariaDB Jira under the MaxScale project.
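Alongside maxscale.log, it may help to check whether the nginx resets cluster on one upstream or simply grow with load. A minimal Python sketch, assuming the error-log lines have exactly the field layout quoted in the question (the function names are hypothetical):

```python
import re
from collections import Counter

# Assumes the exact field order of the stream-module error line quoted above.
LINE_RE = re.compile(
    r"recv\(\) failed \((?P<errno>\d+): (?P<reason>[^)]*)\).*?"
    r"client: (?P<client>[\d.]+), server: (?P<server>[\d.:]+), "
    r'upstream: "(?P<upstream>[\d.:]+)", '
    r"bytes from/to client:(?P<c_in>\d+)/(?P<c_out>\d+), "
    r"bytes from/to upstream:(?P<u_in>\d+)/(?P<u_out>\d+)"
)


def parse_reset(line):
    """Return the parsed fields of a matching recv() error line, else None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("errno", "c_in", "c_out", "u_in", "u_out"):
        fields[key] = int(fields[key])
    return fields


def resets_per_upstream(lines):
    """Count 'Connection reset by peer' (errno 104) lines per upstream address."""
    counts = Counter()
    for line in lines:
        fields = parse_reset(line)
        if fields and fields["errno"] == 104:
            counts[fields["upstream"]] += 1
    return counts
```

Passing an open error-log file to `resets_per_upstream` gives a per-upstream tally; the byte counters also show how far into a session each reset happened.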
I have a very simple Flask app deployed on GKE and exposed via a Google external load balancer, and I'm getting random 502 responses from the backend service. (I added custom headers on both the backend service and nginx to identify the source: I can see the backend service's header but not nginx's.)
The setup is:
LB -> backend-service -> NEG -> pod (nginx -> uwsgi), where the pod runs the Flask application served via uWSGI and nginx.
The scenario is to handle image uploads in a simple, secure way. The sender sends me a token with the upload request.
My Flask app:
receives the request and checks the token against another service using "requests".
If the token is valid, it processes the image and returns 200.
If the token is not valid, it stops and sends back a 401 response.
At first I suspected the 200 and 401 responses, so I changed every response to 200. After some of the expected responses, the server starts responding with 502 and keeps doing so. Some of the requests at the very beginning succeeded.
The nginx error log contains lines like this:
2023/02/08 18:22:29 [error] 10#10: *145 readv() failed (104: Connection reset by peer) while reading upstream, client: 35.191.17.139, server: _, request: "POST /api/v1/imageUpload/image HTTP/1.1", upstream: "uwsgi://127.0.0.1:21270", host: "example-host.com"
My uwsgi.ini file is as follows:
[uwsgi]
socket = 127.0.0.1:21270
master
processes = 8
threads = 1
buffer-size = 32768
stats = 127.0.0.1:21290
log-maxsize = 104857600
logdate
log-reopen
log-x-forwarded-for
uid = image_processor
gid = image_processor
need-app
chdir = /server/
wsgi-file = image_processor_application.py
callable = app
py-auto-reload = 1
pidfile = /tmp/uwsgi-imgproc-py.pid
My nginx.conf is as follows:
location ~ ^/api/ {
client_max_body_size 15M;
include uwsgi_params;
uwsgi_pass 127.0.0.1:21270;
}
Lastly, my app has a healthcheck method with a simple JSON response. It does nothing extra and simply returns, and it never fails.
Edit: the nginx access log in the pod shows the response as 401, while the client receives 502.
For anyone who faces the same issue: the problem was reading (or rather not reading) the POST data.
nginx expected the proxied app (uWSGI, in our case) to read the POST body, but in some code paths my logic returned a response without reading it.
Setting uWSGI's post-buffering option solved the issue:
post-buffering = %(16 * 1024 * 1024)
This answer led me to the solution:
https://stackoverflow.com/a/26765936/631965 ("Nginx uwsgi (104: Connection reset by peer) while reading response header from upstream")
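An application-side alternative to post-buffering, aimed at the same effect, is to consume the request body before returning early. A minimal sketch as a plain WSGI callable, assuming the token arrives in a hypothetical `X-Upload-Token` header and that `is_valid_token` is a hypothetical stand-in for the call to the other service:

```python
CHUNK = 64 * 1024  # read the body in 64 KiB pieces


def content_length(environ):
    """Parse CONTENT_LENGTH defensively; missing or bad values count as 0."""
    try:
        return max(int(environ.get("CONTENT_LENGTH") or 0), 0)
    except ValueError:
        return 0


def drain(environ):
    """Read and discard the request body so the upload isn't left half-read."""
    stream = environ["wsgi.input"]
    remaining = content_length(environ)
    while remaining > 0:
        chunk = stream.read(min(CHUNK, remaining))
        if not chunk:
            break  # client sent less than it announced
        remaining -= len(chunk)


def is_valid_token(token):
    # Hypothetical stand-in for the external token check done with "requests".
    return token == "good-token"


def application(environ, start_response):
    token = environ.get("HTTP_X_UPLOAD_TOKEN", "")  # hypothetical header name
    if not is_valid_token(token):
        drain(environ)  # consume the upload even though it is rejected
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"invalid token"]
    body = environ["wsgi.input"].read(content_length(environ))
    # ... process the image bytes in `body` here ...
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"stored %d bytes" % len(body)]
```

In Flask, the equivalent should be calling `request.get_data()` before returning the 401.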
Problem
I am getting an error message in my Kong error log reporting that the upstream server has timed out. But I know that the upstream process was just taking over a minute, and when it completes (after Kong has logged the error) it logs a Java "Broken pipe" error, implying that Kong was no longer listening for the response.
This happens when the upstream process takes longer than 60 seconds. When it takes less than 60 seconds, everything works correctly.
How can I extend Kong's timeout?
Details
Kong Version
1.1.2
Kong's Error Message (slightly edited):
2019/12/06 09:57:10 [error] 1421#0: *1377 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xyz.xyz.xyz.xyz, server: kong, request: "POST /api/...... HTTP/1.1", upstream: "http://127.0.0.1:8010/api/.....", host: "xyz.xyz.com"
Here is the error from the upstream server log (Java/Tomcat via Spring Boot):
Dec 06 09:57:23 gateway-gw001-99 java[319]: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
Dec 06 09:57:23 gateway-gw001-99 java[319]: at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ~[tomcat-embed-core-8.5.42.jar!/
Dec 06 09:57:23 gateway-gw001-99 java[319]: at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:833) ~[tomcat-embed-core-8.5.42.jar!
...
My kong.conf (slightly edited):
trusted_ips = 0.0.0.0/0
admin_listen = 0.0.0.0:8001
proxy_listen = 0.0.0.0:8080 proxy_protocol, 0.0.0.0:8443 ssl proxy_protocol
database = postgres
pg_host = 127.0.0.1
pg_port = 5432
pg_user = kong
pg_password = xyzxyzxyzxyzxyz
pg_database = kong
plugins = bundled,session
real_ip_header = proxy_protocol
A little more Context
Kong and the Upstream Server are hosted on the same Ubuntu VM
The Ubuntu VM is hosted as a Linux container (LXC) inside another Ubuntu VM.
The outer VM uses NGinX to receive public traffic and reverse proxies it to Kong. It does this using stream. This allows Kong to be my SSL demarcation point.
The Outer NGinX Stream Config:
stream {
server {
listen 80;
proxy_pass xyz.xyz.xyz.xyz:8080;
proxy_protocol on;
}
server {
listen 443;
proxy_pass xyz.xyz.xyz.xyz:8443;
proxy_protocol on;
}
}
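With `proxy_protocol on;`, the outer nginx prepends a PROXY protocol (version 1, plain text) header to each proxied connection, which is why Kong's listeners carry `proxy_protocol` and `real_ip_header = proxy_protocol`. For illustration only, a Python sketch that builds such a header; the addresses in the example are made up:

```python
import ipaddress


def proxy_v1_header(src_ip, dst_ip, src_port, dst_port):
    """Build the PROXY protocol v1 line sent ahead of the proxied TCP stream."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must be the same IP family")
    family = "TCP4" if src.version == 4 else "TCP6"
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError("port out of range: %d" % port)
    line = "PROXY %s %s %s %d %d\r\n" % (family, src, dst, src_port, dst_port)
    return line.encode("ascii")


# Original client 203.0.113.7:51234 reaching the outer VM on port 443.
print(proxy_v1_header("203.0.113.7", "10.0.3.5", 51234, 443))
```

Because this line precedes the TLS ClientHello, a Kong listener without `proxy_protocol` would fail the handshake rather than time out.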
What I've Tried
I've tried adding the following lines to kong.conf. In Kong 1.1.2 you change NGINX settings indirectly by adding a prefix to an NGINX directive and placing it in kong.conf (https://docs.konghq.com/1.1.x/configuration/#injecting-individual-nginx-directives). None of them seemed to have any effect:
nginx_http_keepalive_timeout=300s
nginx_proxy_proxy_read_timeout=300s
nginx_http_proxy_read_timeout=300s
nginx_proxy_send_timeout=300s
nginx_http_send_timeout=300s
Per the documentation, Kong version 0.10 has three properties for managing proxy connections:
upstream_connect_timeout: defines in milliseconds the timeout for establishing a connection to your upstream service.
upstream_send_timeout: defines in milliseconds a timeout between two successive write operations for transmitting a request to your upstream service.
upstream_read_timeout: defines in milliseconds a timeout between two successive read operations for receiving a request from your upstream service.
In this case, since Kong is timing out while waiting for the response from the upstream, you would need to set upstream_read_timeout.
In the Kong 1.1 documentation, the Service object includes these timeout attributes under slightly different names:
connect_timeout: The timeout in milliseconds for establishing a connection to the upstream server. Defaults to 60000.
write_timeout: The timeout in milliseconds between two successive write operations for transmitting a request to the upstream server. Defaults to 60000.
read_timeout: The timeout in milliseconds between two successive read operations for receiving a response from the upstream server. Defaults to 60000.
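In Kong 1.x these attributes are set per Service through the Admin API (listening on port 8001 per the kong.conf above). A minimal Python sketch that builds, but does not send, a request raising `read_timeout` to 300 seconds; the Service name `my-service` is hypothetical:

```python
import json
import urllib.request

ADMIN_URL = "http://127.0.0.1:8001"  # admin_listen from the kong.conf above


def read_timeout_patch(service, seconds, admin_url=ADMIN_URL):
    """Build a PATCH /services/{service} request that raises read_timeout."""
    payload = {"read_timeout": int(seconds * 1000)}  # Kong expects milliseconds
    return urllib.request.Request(
        "%s/services/%s" % (admin_url, service),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# "my-service" is a hypothetical name; use the Service your route points to.
req = read_timeout_patch("my-service", 300)
# urllib.request.urlopen(req) would send it to the Admin API.
```

`write_timeout` and `connect_timeout` can be added to the same payload if the upload side is also slow.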
If you use Kubernetes, you must add a special annotation to the Service:
konghq.com/override: {{ ingressName }}
It's not obvious, though: the value is the name of a KongIngress resource that holds the overrides.
I discovered it here: https://github.com/Kong/kubernetes-ingress-controller/issues/905#issuecomment-739927116
Example of a Service:
apiVersion: v1
kind: Service
metadata:
  name: websocket
  annotations:
    konghq.com/override: timeout-kong-ingress
spec:
  selector:
    app: websocket
  ports:
  - port: 80
    targetPort: 8010
For a detailed explanation, follow the link above.
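The Service above refers to a KongIngress named `timeout-kong-ingress`, which is not shown. A hedged sketch of what it might contain, generated from Python as JSON (which `kubectl apply -f` also accepts); it assumes the `configuration.konghq.com/v1` KongIngress `proxy` section used by the controller versions in the linked issue, and the 300-second value is an assumption:

```python
import json


def kong_ingress_timeouts(name, seconds):
    """Return a KongIngress manifest overriding upstream timeouts (in ms)."""
    ms = int(seconds * 1000)
    return {
        "apiVersion": "configuration.konghq.com/v1",
        "kind": "KongIngress",
        "metadata": {"name": name},  # must match the konghq.com/override value
        "proxy": {
            "connect_timeout": ms,
            "read_timeout": ms,
            "write_timeout": ms,
        },
    }


print(json.dumps(kong_ingress_timeouts("timeout-kong-ingress", 300), indent=2))
```

The KongIngress must live in the same namespace as the annotated Service.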
I'm trying to configure SSL-passthrough for multiple webapps using the same nginx server (nginx version: nginx/1.13.6), but when restarting the nginx server, I get an error complaining that
nginx: [emerg] "stream" directive is duplicate
The configuration I have is the following:
I have two files for the SSL passthrough, which look like this:
server1.conf:
stream {
upstream workers {
server 192.168.1.10:443;
server 192.168.1.11:443;
server 192.168.1.12:443;
}
server {
listen server1.com:8443;
proxy_pass workers;
}
}
and server2.conf:
stream {
upstream workers {
server 192.168.1.20:443;
server 192.168.1.21:443;
server 192.168.1.22:443;
}
server {
listen server2.com:8443;
proxy_pass workers;
}
}
If I remove one of the two files, then nginx starts correctly.
How can this be achieved?
Thanks,
Cristi
Streams operate at the transport layer (TCP/UDP) and do not decrypt TLS traffic, so they cannot tell apart requests hitting server1.com and server2.com unless those names point to different IPs.
This can be solved in one of the following ways:
Decrypt the traffic on nginx, then proxy-pass it to the backend processes/workers over HTTP.
Bind server1.com to a different port from server2.com.
Get an additional IP address and bind server2.com to it.
Get an additional load balancer and move server2.com there.
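The `emerg` error itself comes from the two files each opening a top-level `stream` block: nginx accepts only one, and inside it every upstream name (both files use `workers`) must be unique. A minimal Python sketch of the second option, rendering a single `stream` block with a distinct upstream and listen port per site; the `workers_<site>` names and port 8444 are illustrative assumptions:

```python
def render_stream(sites):
    """Render one nginx `stream` block from {site: (listen, [backends])}."""
    listens = [listen for listen, _ in sites.values()]
    if len(set(listens)) != len(listens):
        raise ValueError("each site needs its own listen address/port")
    out = ["stream {"]
    for name, (listen, backends) in sites.items():
        upstream = "workers_%s" % name  # unique upstream name per site
        out.append("    upstream %s {" % upstream)
        out.extend("        server %s;" % backend for backend in backends)
        out.append("    }")
        out.append("    server {")
        out.append("        listen %s;" % listen)
        out.append("        proxy_pass %s;" % upstream)
        out.append("    }")
    out.append("}")
    return "\n".join(out) + "\n"


SITES = {
    "server1": ("8443", ["192.168.1.10:443", "192.168.1.11:443", "192.168.1.12:443"]),
    "server2": ("8444", ["192.168.1.20:443", "192.168.1.21:443", "192.168.1.22:443"]),
}
print(render_stream(SITES), end="")
```

The rendered block replaces both files; only one file containing `stream { ... }` should then be included from nginx.conf.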
I am using nginx + uWSGI in front of a Flask app. In the nginx settings, the server block has server_name *.mydomain.com;, and the location block for uWSGI looks like this:
location /api/ {
include uwsgi_params;
uwsgi_pass unix:///var/uwsgi/app.sock;
.........
}
The issue is that I can access app.mydomain.com, but when I try app1.mydomain.com, the uWSGI log doesn't show any request. The nginx error log shows:
upstream timed out (110: Connection timed out) while reading response header from upstream, client: 122.166.94.231, server: *.mydomain.com, request: "GET /api/client/generic/ping HTTP/1.1", upstream: "uwsgi://unix:///var/uwsgi/app.sock", host: "app1.mydomain.com"
I have another test setup where all these settings are the same, and it works. Any pointers? When I restart uWSGI and nginx, app1.mydomain.com works until I load app.mydomain.com (the initial load of app.mydomain.com fails, but if I keep refreshing, it loads). Then app1.mydomain.com returns 504 Gateway Timeout, and the log shows "Connection timed out while reading response header from upstream".
It worked once I added single-interpreter = true to the uwsgi.ini settings.
A newly added Python library was causing the issue.
I don't know whether this will help others.
I also ran into the same issue. uWSGI has "http", "http-socket" and "socket" options. When putting uWSGI behind a full web server like Nginx, we should spawn it with "socket" so that it natively speaks the uwsgi protocol:
uwsgi --socket 127.0.0.1:3031 --wsgi-file foobar.py --master --processes 4 --threads 2 --stats 127.0.0.1:9191
More details from uwsgi documentation: https://uwsgi-docs.readthedocs.io/en/latest/WSGIquickstart.html#putting-behind-a-full-webserver
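For the command above, `foobar.py` only needs to expose a WSGI callable; uWSGI looks for one named `application` unless the `callable` option says otherwise. A minimal sketch along the lines of the uWSGI quickstart:

```python
# foobar.py: the module passed to --wsgi-file above.


def application(environ, start_response):
    """uWSGI calls this (its default callable name) once per request."""
    body = b"Hello World"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

On the nginx side this pairs with `include uwsgi_params;` and `uwsgi_pass 127.0.0.1:3031;`, which speaks the uwsgi protocol that `--socket` expects.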
Looking at the uWSGI error logs and understanding the problem helped me. The issue was not related to the nginx configuration at all: my email host had changed, and the code threw an error when calling the send-email code.
I'm wondering whether it's possible to reproduce the NGinx proxy_next_upstream behavior on an F5 BIG-IP.
As a reminder, here is how it works on NGinx:
Given a pool of upstream servers (let's call it webservers) composed of two instances:
upstream webservers {
server 192.168.1.10:8080 max_fails=1 fail_timeout=10s;
server 192.168.1.20:8080 max_fails=1 fail_timeout=10s;
}
With the following directive (proxy_next_upstream error), if the TCP connection to the first instance fails when routing a request (because the instance is down, for example), NGinx automatically forwards the request to the second instance (THE USER DOESN'T SEE ANY ERROR).
Furthermore, instance 1 is blacklisted for 10 seconds (fail_timeout=10s).
After those 10 seconds, NGinx routes one request to instance 1 (to find out whether it has come back) and makes the instance available again if that succeeds; otherwise it waits another 10 seconds before trying again.
location / {
proxy_next_upstream error;
proxy_pass http://webservers;
}
I hope I'm clear enough...
Thanks for your help.
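The behavior described above (try the next server on a connection error, then keep the failed server out of rotation for fail_timeout) can be modelled in a few lines. This is a simplified Python sketch with an injected clock, not nginx's actual round-robin code and not an F5 configuration; the `connect` callback is hypothetical:

```python
class Peer:
    """One upstream server with nginx-style max_fails / fail_timeout bookkeeping."""

    def __init__(self, addr, max_fails=1, fail_timeout=10.0):
        self.addr = addr
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.last_fail = float("-inf")  # time of the most recent failure

    def available(self, now):
        # Out of rotation only while max_fails is reached and fail_timeout hasn't elapsed.
        return self.fails < self.max_fails or now - self.last_fail >= self.fail_timeout

    def record_failure(self, now):
        if now - self.last_fail >= self.fail_timeout:
            self.fails = 0  # earlier failures fell outside the window
        self.fails += 1
        self.last_fail = now


class Pool:
    """Round-robin pool that moves on to the next peer after a connection error."""

    def __init__(self, peers):
        self.peers = peers
        self._counter = 0

    def forward(self, request, connect, now):
        start = self._counter % len(self.peers)
        self._counter += 1
        for i in range(len(self.peers)):
            peer = self.peers[(start + i) % len(self.peers)]
            if not peer.available(now):
                continue  # still inside its fail_timeout window
            try:
                response = connect(peer.addr, request)
            except ConnectionError:
                peer.record_failure(now)
                continue  # like proxy_next_upstream error: try the next peer
            peer.fails = 0  # a success brings the peer fully back
            return peer.addr, response
        raise RuntimeError("no live upstreams")  # nginx would answer 502 here
```

The F5 KB article linked in the answer is the place to look for the BIG-IP counterpart of this retry-and-mark-down logic.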
Here is something interesting: https://support.f5.com/kb/en-us/solutions/public/10000/600/sol10640.html