How to set timeout in Kong v1.1.2 - nginx

Problem
I am getting an error message in my Kong error log reporting that the upstream server has timed out. But I know that the upstream process was simply taking over a minute, and when it completes (after Kong has logged the error) it logs a Java "Broken pipe" error, implying that Kong was no longer listening for the response.
This is the behavior when the upstream process takes longer than 60 seconds. In some cases, it takes less than 60 seconds and everything works correctly.
How can I extend Kong's timeout?
Details
Kong Version
1.1.2
Kong's Error Message (slightly edited):
2019/12/06 09:57:10 [error] 1421#0: *1377 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xyz.xyz.xyz.xyz, server: kong, request: "POST /api/...... HTTP/1.1", upstream: "http://127.0.0.1:8010/api/.....", host: "xyz.xyz.com"
Here is the error from the upstream server log (Java / Tomcat via SpringBoot)
Dec 06 09:57:23 gateway-gw001-99 java[319]: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
Dec 06 09:57:23 gateway-gw001-99 java[319]: at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ~[tomcat-embed-core-8.5.42.jar!/
Dec 06 09:57:23 gateway-gw001-99 java[319]: at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:833) ~[tomcat-embed-core-8.5.42.jar!
...
My kong.conf (slightly edited)
trusted_ips = 0.0.0.0/0
admin_listen = 0.0.0.0:8001
proxy_listen = 0.0.0.0:8080 proxy_protocol, 0.0.0.0:8443 ssl proxy_protocol
database = postgres
pg_host = 127.0.0.1
pg_port = 5432
pg_user = kong
pg_password = xyzxyzxyzxyzxyz
pg_database = kong
plugins = bundled,session
real_ip_header = proxy_protocol
A little more Context
Kong and the Upstream Server are hosted on the same Ubuntu VM
The Ubuntu VM is hosted as a linux container (LXC) inside another Ubuntu VM
The outer VM uses NGinX to receive public traffic and reverse proxies it to Kong. It does this using stream. This allows Kong to be my SSL demarcation point.
The Outer NGinX Stream Config:
stream {
    server {
        listen 80;
        proxy_pass xyz.xyz.xyz.xyz:8080;
        proxy_protocol on;
    }
    server {
        listen 443;
        proxy_pass xyz.xyz.xyz.xyz:8443;
        proxy_protocol on;
    }
}
What I've Tried
I've tried adding the following lines to kong.conf. In Kong 1.1.2 you can inject individual NGINX directives by adding prefixed properties to kong.conf (https://docs.konghq.com/1.1.x/configuration/#injecting-individual-nginx-directives ). None of them seemed to have any effect:
nginx_http_keepalive_timeout=300s
nginx_proxy_proxy_read_timeout=300s
nginx_http_proxy_read_timeout=300s
nginx_proxy_send_timeout=300s
nginx_http_send_timeout=300s

Per the documentation, Kong version 0.10 has three properties that you can set for managing proxy connections:
upstream_connect_timeout: defines in milliseconds the timeout for establishing a connection to your upstream service.
upstream_send_timeout: defines in milliseconds a timeout between two successive write operations for transmitting a request to your upstream service.
upstream_read_timeout: defines in milliseconds a timeout between two successive read operations for receiving a request from your upstream service.
In this case, since Kong is timing out while waiting for the response from the upstream, you would need to set upstream_read_timeout.
In the Kong Version 1.1 documentation the Service object now includes these timeout attributes with slightly different names:
connect_timeout: The timeout in milliseconds for establishing a connection to the upstream server. Defaults to 60000.
write_timeout: The timeout in milliseconds between two successive write operations for transmitting a request to the upstream server. Defaults to 60000.
read_timeout: The timeout in milliseconds between two successive read operations for transmitting a request to the upstream server. Defaults to 60000.
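As a rough sketch of how these attributes might be changed, the snippet below builds a PATCH request for the Admin API's /services/{name} endpoint (the question's kong.conf exposes the Admin API on port 8001). The service name "my-service", the local URL, and the 300000 ms values are assumptions for illustration only. The request is built and printed, not sent.

```python
import json
import urllib.request

# Assumption: the Admin API is reachable locally on the admin_listen port (8001).
ADMIN_URL = "http://127.0.0.1:8001"


def build_timeout_request(service, read_ms=300000, write_ms=300000, connect_ms=60000):
    """Build a PATCH request that updates the timeout attributes of a Kong Service."""
    body = json.dumps({
        "connect_timeout": connect_ms,
        "write_timeout": write_ms,
        "read_timeout": read_ms,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{ADMIN_URL}/services/{service}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# "my-service" is a hypothetical service name; replace it with your own.
req = build_timeout_request("my-service")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply the change, send it with urllib.request.urlopen(req).
```

Only read_timeout matters for the "while reading response header from upstream" error; the other two are included so the three attributes can be seen side by side.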

If you use Kubernetes, you must add a special annotation to the Service:
konghq.com/override: {{ ingressName }}
This is not obvious, though.
I discovered it here: https://github.com/Kong/kubernetes-ingress-controller/issues/905#issuecomment-739927116
Example Service:
apiVersion: v1
kind: Service
metadata:
  name: websocket
  annotations:
    konghq.com/override: timeout-kong-ingress
spec:
  selector:
    app: websocket
  ports:
  - port: 80
    targetPort: 8010
For a detailed explanation, please follow the link above.
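The annotation only names a KongIngress resource; the timeouts themselves live in that resource. As a minimal sketch, assuming the KongIngress "proxy" fields connect_timeout, read_timeout and write_timeout and made-up values of 300000 ms, the following snippet prints such a manifest as JSON, which kubectl accepts as well as YAML:

```python
import json

# Hypothetical KongIngress resource. Its name must match the value of the
# konghq.com/override annotation on the Service ("timeout-kong-ingress").
kong_ingress = {
    "apiVersion": "configuration.konghq.com/v1",
    "kind": "KongIngress",
    "metadata": {"name": "timeout-kong-ingress"},
    "proxy": {
        "connect_timeout": 60000,   # milliseconds
        "read_timeout": 300000,     # assumed value, adjust to your workload
        "write_timeout": 300000,    # assumed value, adjust to your workload
    },
}

print(json.dumps(kong_ingress, indent=2))
```

The output could be piped into kubectl apply -f - in the same namespace as the Service.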

Related

GCP deployment with nginx - uwsgi - flask fails

I have a very simple Flask app that is deployed on GKE and exposed via a Google external load balancer. I am getting random 502 responses from the backend service. (I added custom headers on the backend service and on nginx to identify the source; I can see the backend service's header but not nginx's.)
The setup is:
LB -> backend-service -> neg -> pod (nginx -> uwsgi), where the pod is the application built with Flask and deployed via uwsgi and nginx.
The scenario is handling image uploads in a simple, secured way. The sender includes a token with each upload request.
My Flask app:
receives the request and checks the token against another service using "requests".
If the token is valid, it processes the image and returns 200.
If the token is not valid, it stops and sends back a 401 response.
At first I suspected the mix of 200 and 401 responses, so I changed all responses to 200. After a few successful responses, the server starts responding with 502 and keeps doing so. Only some of the requests at the very beginning succeed.
The nginx error log contains lines like this:
2023/02/08 18:22:29 [error] 10#10: *145 readv() failed (104: Connection reset by peer) while reading upstream, client: 35.191.17.139, server: _, request: "POST /api/v1/imageUpload/image HTTP/1.1", upstream: "uwsgi://127.0.0.1:21270", host: "example-host.com"
My uwsgi.ini file is as follows:
[uwsgi]
socket = 127.0.0.1:21270
master
processes = 8
threads = 1
buffer-size = 32768
stats = 127.0.0.1:21290
log-maxsize = 104857600
logdate
log-reopen
log-x-forwarded-for
uid = image_processor
gid = image_processor
need-app
chdir = /server/
wsgi-file = image_processor_application.py
callable = app
py-auto-reload = 1
pidfile = /tmp/uwsgi-imgproc-py.pid
My nginx.conf is as follows:
location ~ ^/api/ {
    client_max_body_size 15M;
    include uwsgi_params;
    uwsgi_pass 127.0.0.1:21270;
}
Lastly, my app has a health-check endpoint that simply returns a small JSON response and does nothing else. It never fails.
Edit: the nginx access log in the pod shows the response as 401, while the client receives 502.
For anyone who faces the same issue: the problem was reading (or rather not reading) the POST data.
nginx expects the proxied application, in our case uwsgi, to read the POST data. But in some cases my logic returned a response without reading it.
Setting uwsgi post-buffering solved the issue.
post-buffering = %(16 * 1024 * 1024)
This answer led me to the solution:
https://stackoverflow.com/a/26765936/631965
Nginx uwsgi (104: Connection reset by peer) while reading response header from upstream
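An application-level alternative to post-buffering is to consume the request body before returning an early response. The following is only a sketch of that idea as a plain WSGI app, not the code from the question: the X-Upload-Token header, the token_is_valid helper, and the "secret" value are hypothetical stand-ins.

```python
import io


def drain_body(environ, chunk_size=64 * 1024):
    """Read and discard the request body so no unread POST data is left behind."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    stream = environ["wsgi.input"]
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break
        remaining -= len(chunk)


def token_is_valid(token):
    # Hypothetical placeholder; the real check would call another service.
    return token == "secret"


def app(environ, start_response):
    if not token_is_valid(environ.get("HTTP_X_UPLOAD_TOKEN", "")):
        drain_body(environ)  # consume the upload before rejecting it
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"invalid token\n"]
    data = environ["wsgi.input"].read(int(environ.get("CONTENT_LENGTH") or 0))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"received {len(data)} bytes\n".encode("utf-8")]


# Simulate a rejected 1000-byte upload without starting a server.
body = io.BytesIO(b"x" * 1000)
statuses = []
app({"CONTENT_LENGTH": "1000", "wsgi.input": body}, lambda s, h: statuses.append(s))
print(statuses, "unread bytes:", len(body.read()))
```

Draining costs bandwidth for every rejected upload, which is why letting uwsgi buffer the body (as in the accepted fix) is often the simpler choice.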

Nginx load balancing Maxscale to failover

I have a simple stream block that forwards MySQL TCP traffic to MaxScale instances. The second instance acts only as a failover:
stream {
    upstream maxscale {
        zone upstream_maxscale 64k;
        server 10.1.0.11:3307;
        server 10.1.0.12:3307 backup;
    }
    server {
        listen 3307;
        proxy_pass maxscale;
    }
}
When the number of connections is low (<30), everything works fine. But when it is high (>40, if 40 connections can be called high), the nginx error log keeps reporting an error that I don't know how to debug:
recv() failed (104: Connection reset by peer) while proxying and reading from upstream, client: 10.1.0.16, server: 10.1.0.15:3307, upstream: "10.1.0.11:3307", bytes from/to client:15738/64316, bytes from/to upstream:64316/15738
I've tried playing with options like reuseport, worker_connections and so_keepalive, but with no luck.
https://nginx.org/en/docs/stream/ngx_stream_core_module.html
https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/
Could it be a problem on the MaxScale side?
Here the Maxscale 2.4 listener:
# Listener
[listener-rw]
type=listener
service=readwritesplit
protocol=MariaDBClient
address=10.1.0.11
port=3307
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/server.pem
ssl_key=/var/lib/maxscale/ssl/server.key
ssl_version=MAX
# Service
[readwritesplit]
type=service
router=readwritesplit
servers=sql1,sql2,sql3
user=maxscale
password=324F74A347291B3BE79956AD5F4BB2FAD65E1F9052A976722917701742729400
enable_root_user=1
max_sescmd_history=150
max_slave_connections=100%
lazy_connect=true
slave_selection_criteria=LEAST_CURRENT_OPERATIONS
optimistic_trx=true
connection_keepalive=300
master_failure_mode=fail_on_write
The MaxScale log (in /var/log/maxscale/maxscale.log) most likely contains either an answer as to why you receive such errors or at least will help you determine what the problem might be.
In case you can't find out the reason for this from the logs alone, I would suggest opening a bug report on the MariaDB Jira under the MaxScale project.

Nginx memcached with fallback to remote service

I can't get Nginx working with memcached module, the requirement is to query remote service, cache data in memcached and never fetch remote endpoint until backend invalidates the cache. I have 2 containers with memcached v1.4.35 and one with Nginx v1.11.10.
The configuration is the following:
upstream http_memcached {
    server 172.17.0.6:11211;
    server 172.17.0.7:11211;
}
upstream remote {
    server api.example.com:443;
    keepalive 16;
}
server {
    listen 80;
    location / {
        set $memcached_key "$uri?$args";
        memcached_pass http_memcached;
        error_page 404 502 504 = @remote;
    }
    location @remote {
        internal;
        proxy_pass https://remote;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
I tried to set memcached upstream incorrectly but I get HTTP 499 instead and warnings:
*3 upstream server temporarily disabled while connecting to upstream
It seems with described configuration Nginx can reach memcached successfully but can't write or read from it. I can write and read to memcached with telnet successfully.
Can you help me please?
My guesses on what's going on with your configuration
1. 499 codes
HTTP 499 is nginx's custom code meaning the client terminated the connection before receiving the response (http://lxr.nginx.org/source/src/http/ngx_http_request.h#0120)
It is easy to reproduce. Just run
nc -k -l 172.17.0.6 11211
and curl your resource. curl will hang; press Ctrl+C after a while and you'll see this code in your access log.
2. upstream server temporarily disabled while connecting to upstream
It means nginx didn't manage to reach your memcached server and removed it from the pool of upstreams. It suffices to shut down both memcached servers and you'll constantly see it in your error log (I see it every time with error_log ... info).
As you see these messages your assumption that nginx can freely communicate with memcached servers doesn't seem to be true.
Consider explicitly setting http://nginx.org/en/docs/http/ngx_http_memcached_module.html#memcached_bind
and use the -b option with telnet to make sure you're correctly testing memcached servers for availability via your telnet client
3. nginx can reach memcached successfully but can't write or read from it
Nginx can only read from memcached via its built-in module
(http://nginx.org/en/docs/http/ngx_http_memcached_module.html):
The ngx_http_memcached_module module is used to obtain responses from
a memcached server. The key is set in the $memcached_key variable. A
response should be put in memcached in advance by means external to
nginx.
4. overall architecture
It's not fully clear from your question how the overall schema is supposed to work.
nginx's upstream uses weighted round-robin by default.
That means only one of your memcached servers will be queried per request.
You can change that by setting memcached_next_upstream not_found, so a missing key is treated as an error and all of your servers are polled. That is probably OK for a farm of 2 servers, but it is unlikely to be what you want for 20 servers.
The same is ordinarily the case for memcached client libraries: they pick a server out of the pool according to some hashing scheme, so your key ends up on only 1 server in the pool.
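To illustrate why a key lands on only one server, here is a minimal sketch of hash-based server selection. It uses CRC32 modulo the pool size purely as an illustration; it is not the scheme of any particular client library, and real clients often use consistent hashing instead.

```python
import zlib

# The two memcached containers from the question.
SERVERS = ["172.17.0.6:11211", "172.17.0.7:11211"]


def pick_server(key, servers=SERVERS):
    """Map a key to exactly one server using CRC32 modulo the pool size."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


for key in ("/?", "/?q=1", "/api/items?page=2"):
    print(key, "->", pick_server(key))
```

A writer using such a scheme stores each key on a single server, while nginx's round-robin read may ask the other one first; memcached_next_upstream not_found papers over that mismatch by trying the remaining servers.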
5. what to do
I've managed to set up a similar configuration in 10 minutes on my local box, and it works as expected. To simplify debugging I'd get rid of the docker containers to avoid networking complications, run 2 memcached servers on different ports in single-threaded mode with the -vv option to see when requests reach them (memcached -p 11211 -U 0 -vv), and then play with tail -f and curl to see what's really happening in your case.
6. working solution
nginx config:
https and http/1.1 are not used here, but that doesn't matter
upstream http_memcached {
    server 127.0.0.1:11211;
    server 127.0.0.1:11212;
}
upstream remote {
    server 127.0.0.1:8080;
}
server {
    listen 80;
    server_name server.lan;
    access_log /var/log/nginx/server.access.log;
    error_log /var/log/nginx/server.error.log info;
    location / {
        set $memcached_key "$uri?$args";
        memcached_next_upstream not_found;
        memcached_pass http_memcached;
        error_page 404 = @remote;
    }
    location @remote {
        internal;
        access_log /var/log/nginx/server.fallback.access.log;
        proxy_pass http://remote;
        proxy_set_header Connection "";
    }
}
server.py:
this is my dummy server (python):
from random import randint
from flask import Flask

app = Flask(__name__)


# Return a random number so cached and uncached responses are easy to tell apart.
@app.route('/')
def hello_world():
    return 'Hello: {}\n'.format(randint(1, 100000))
This is how to run it (you just need to install flask first):
FLASK_APP=server.py flask run -p 8080
filling in my first memcached server:
$ telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set /? 0 900 5
cache
STORED
quit
Connection closed by foreign host.
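The same store operation can be scripted instead of typed into telnet. This is a minimal sketch that speaks the memcached text protocol over a raw socket; the function names are made up for illustration, and the host and port are the local test values used above.

```python
import socket


def build_set_command(key, value, flags=0, exptime=900):
    """Return the bytes of a memcached text-protocol 'set' command."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("utf-8")
    return header + data + b"\r\n"


def store(host, port, key, value):
    """Send one 'set' command and return the server's reply line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_set_command(key, value))
        return conn.recv(1024).decode("utf-8").strip()


# Same command as the telnet session: key "/?", flags 0, TTL 900 s, 5 bytes.
print(build_set_command("/?", "cache"))
# To store it for real (requires a running memcached):
# store("127.0.0.1", 11211, "/?", "cache")
```

The key must match what nginx computes for $memcached_key, here "$uri?$args", which is "/?" for a request to / without arguments.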
checking:
note that we get a result every time although we stored data only in the first server
$ curl http://server.lan && echo
cache
$ curl http://server.lan && echo
cache
$ curl http://server.lan && echo
cache
this one is not in the cache so we'll get a response from server.py
$ curl http://server.lan/?q=1 && echo
Hello: 32337
whole picture:
the 2 windows on the right are
memcached -p 11211 -U 0 -vv
and
memcached -p 11212 -U 0 -vv

nginx Connection timed out while reading response header from upstream

I am using nginx + uwsgi in front of a Flask app. In the nginx settings, the server block has server_name *.mydomain.com; and the location block for uwsgi looks like this:
location /api/ {
    include uwsgi_params;
    uwsgi_pass unix:///var/uwsgi/app.sock;
    .........
}
The issue is that I can access app.mydomain.com, but when I try app1.mydomain.com the uwsgi log does not show any request. The nginx error log shows:
upstream timed out (110: Connection timed out) while reading response header from upstream, client: 122.166.94.231, server: *.mydomain.com, request: "GET /api/client/generic/ping HTTP/1.1", upstream: "uwsgi://unix:///var/uwsgi/app.sock", host: "app1.mydomain.com
I have another test setup where all these settings are the same, and it works. Any pointers? When I restart uwsgi and nginx, app1.mydomain.com works until I load app.mydomain.com. The initial load of app.mydomain.com fails, but if I keep refreshing, it loads; then app1.mydomain.com returns 504 Gateway Timeout and the log shows "Connection timed out while reading response header from upstream".
It worked when I added single-interpreter = true in uwsgi.ini settings.
A newly added python library was causing the issue.
Don't know whether this will help others.
I also ran into the same issue. uWSGI has "http", "http-socket" and "socket" options. When putting uWSGI behind a full webserver like Nginx, we should spawn uWSGI to natively speak the uWSGI protocol:
uwsgi --socket 127.0.0.1:3031 --wsgi-file foobar.py --master --processes 4 --threads 2 --stats 127.0.0.1:9191
More details from uwsgi documentation: https://uwsgi-docs.readthedocs.io/en/latest/WSGIquickstart.html#putting-behind-a-full-webserver
Looking at the uwsgi error logs and understanding the actual problem is what helped me. The issue was not related to the Nginx configuration at all: my email host had changed, and the code threw an error when calling the send-email code.

Gitlab timeouts / slow on initial page loads

I am running Gitlab on Debian using the package from the Repository. Most of the time Gitlab is running very fast, but after longer idle times Gitlab is very slow or even times out (error 502). One time I also had a timeout on a remote git access (could not reproduce the issue - timeout on the internal API).
In my setup the Debian machine is behind another nginx proxy, which also serves some other services just fine. I did the gitlab-cli checks and everything seems fine.
In the error log of my reverse proxy I only see connection timeouts:
[error] 8643#0: *4139 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.1.1.10, server: gitlab.mydomain.tld, request: "GET / HTTP/1.1", upstream: "http://{SERVER-IP}:80/", host: "gitlab.mydomain.tld"
I can see some errors in my unicorn_stderr.log
E, [2016-03-30T19:40:20.183991 #783] ERROR -- : worker=1 PID:16798 timeout (61s > 60s), killing
E, [2016-03-30T19:40:20.194969 #783] ERROR -- : reaped #<Process::Status: pid 16798 SIGKILL (signal 9)> worker=1
I, [2016-03-30T19:40:20.197554 #16871] INFO -- : worker=1 spawned pid=16871
I, [2016-03-30T19:40:20.197909 #16871] INFO -- : worker=1 ready
E, [2016-03-30T20:08:42.911429 #783] ERROR -- : worker=0 PID:16866 timeout (61s > 60s), killing
E, [2016-03-30T20:08:43.191151 #783] ERROR -- : reaped #<Process::Status: pid 16866 SIGKILL (signal 9)> worker=0
I, [2016-03-30T20:08:43.758363 #18728] INFO -- : worker=0 spawned pid=18728
I, [2016-03-30T20:08:44.108244 #18728] INFO -- : worker=0 ready
What I am a bit curious about is the fact that there are no errors in the log of the nginx delivered with gitlab.
Some more system information:
#sudo gitlab-rake gitlab:env:info
System information
System: Debian 8.3
Current User: git
Using RVM: no
Ruby Version: 2.1.8p440
Gem Version: 2.5.1
Bundler Version:1.10.6
Rake Version: 10.5.0
Sidekiq Version:4.0.1
GitLab information
Version: 8.5.0
Revision: a513e09
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: http://gitlab.mydomain.tld
HTTP Clone URL: http://gitlab.mydomain.tld/some-group/some-project.git
SSH Clone URL: git@gitlab.mydomain.tld:some-group/some-project.git
Using LDAP: no
Using Omniauth: no
GitLab Shell
Version: 2.6.10
Repositories: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks/
Git: /opt/gitlab/embedded/bin/git
Edit:
My nginx config on the "external" reverse proxy looks like this:
server {
    listen 443;
    ssl on;
    server_name gitlab.mydomain.tld;
    access_log /var/log/nginx/gitlab.mydomain.tld.access.log;
    error_log /var/log/nginx/gitlab.mydomain.tld.error.log;
    ssl_certificate /etc/nginx/ssl/gitlab.mydomain.tld_unified.crt;
    ssl_certificate_key /etc/nginx/ssl/mydomain.tld.key;
    location / {
        proxy_pass http://gitlab:80;
        proxy_redirect default;
        proxy_set_header Host $http_host;
        proxy_set_header X_FORWARDED_PROTO "https";
        satisfy any;
    }
}
Edit2:
I took the suggested answer into account and also considered this source: https://github.com/gitlabhq/gitlabhq/blob/master/doc/install/requirements.md
I assigned 2GB RAM to the VM now, and also added one additional unicorn worker.
Edit3:
The problem seems to be solved by adding more memory and using 3 unicorn workers.
Jan,
I have a similar setup, although our box is dedicated to GitLab. Without knowing the specs of your server (GitLab likes memory) and the load on that box, I would suggest the following diagnostics:
Does your upstream nginx use the same parameters as the nginx configuration bundled with GitLab? They have tweaked a number of things, including timeouts.
What kind of requests result in timeouts? Some operations (like generating diffs) can take some time to render.
If you run the requests via SSH, do you also experience timeouts?
Have you checked the global logs in /var/log?
FYI: I had to enlarge my small GitLab installation to have 4GB RAM not to throw OOM errors
Now I think, I'd better go with gogs or other alternative.

Resources