I am trying to use the nginx proxy_bind directive to have upstream traffic to a gRPC server use a specific network interface. For some reason, nginx seems to be completely ignoring the directive and just using the default network interface. I tried using proxy_bind for a different server that doesn't use gRPC (it is HTTP/1.1, I believe) and that worked fine, so I am led to believe nginx is ignoring proxy_bind because of something related to the server being a gRPC server. I have confirmed that it works for the normal server but not for the gRPC server by running ss and looking for traffic originating from the IP I am trying to bind. There was such traffic, but it was only ever going to the normal server; all traffic to the gRPC server had the default local IP.
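For reference, the check was roughly this (with the same <local ip> placeholder used in the configs below):

ss -tnp | grep '<local ip>'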
This is the server config block for the gRPC server where proxy_bind is not working:
server {
    listen 8980 http2;
    # set client body size to 128M to prevent HTTP 413 errors
    client_max_body_size 128M;
    # set client buffer size to 128M to prevent writing temp files
    client_body_buffer_size 128M;
    # We have plenty of RAM, up the output_buffers
    output_buffers 256 128M;
    # Allow plenty of connections
    http2_max_concurrent_streams 100000;
    keepalive_requests 100000;
    keepalive_timeout 120s;
    # Be forgiving with grpc
    grpc_connect_timeout 240;
    grpc_read_timeout 2048;
    grpc_send_timeout 2048;
    grpc_socket_keepalive on;
    proxy_bind <local ip>;
    location / {
        proxy_set_header Host $host;
        grpc_pass grpc://my8980;
    }
}
and this is the server config block for a normal server where proxy_bind is working:
server {
    listen 4646;
    # set client body size to 64M to prevent HTTP 413 errors
    client_max_body_size 64M;
    # set client buffer size to 64M to prevent writing temp files
    client_body_buffer_size 64M;
    # We have plenty of RAM, up the output_buffers
    output_buffers 64 64M;
    # Fix Access-Control-Allow-Origin header
    proxy_hide_header Access-Control-Allow-Origin;
    add_header Access-Control-Allow-Origin $http_origin;
    proxy_bind <local ip>;
    location / {
        proxy_set_header Host $host;
        proxy_pass http://myapp1;
        proxy_buffering off;
    }
}
The grpc_pass directive belongs to ngx_http_grpc_module, while the proxy_set_header, proxy_hide_header and proxy_bind directives come from ngx_http_proxy_module. Those are two different modules. The ngx_http_grpc_module has its own grpc_set_header, grpc_hide_header and grpc_bind analogs, which should be used instead.
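As a rough sketch (reusing the <local ip> placeholder and my8980 upstream from the question, everything else unchanged), the gRPC server block would use the grpc_* analogs like this:

server {
    listen 8980 http2;
    grpc_bind <local ip>;            # analog of proxy_bind
    location / {
        grpc_set_header Host $host;  # analog of proxy_set_header
        grpc_pass grpc://my8980;
    }
}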
We are trying to build an HA Kubernetes cluster with 3 core nodes, each having a full set of vital components (etcd + APIServer + Scheduler + ControllerManager), plus an external balancer. Since etcd can form a cluster by itself, we are stuck with making the APIServers HA. What seemed an obvious task a couple of weeks ago has now become a "no way disaster"...
We decided to use nginx as the balancer for the 3 independent APIServers. All the other parts of our cluster that communicate with the APIServer (kubelets, kube-proxies, Schedulers, ControllerManagers...) are supposed to use the balancer to access it. Everything went well until we started the "destructive" tests (as I call them) with some pods running.
Here is the part of the APIServer config that deals with HA:
.. --apiserver-count=3 --endpoint-reconciler-type=lease ..
Here is our nginx.conf:
user nginx;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
worker_processes auto;
events {
    multi_accept on;
    use epoll;
    worker_connections 4096;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    #tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    gzip on;
    underscores_in_headers on;
    include /etc/nginx/conf.d/*.conf;
}
And apiservers.conf:
upstream apiserver_https {
    least_conn;
    server core1.sbcloud:6443; # max_fails=3 fail_timeout=3s;
    server core2.sbcloud:6443; # max_fails=3 fail_timeout=3s;
    server core3.sbcloud:6443; # max_fails=3 fail_timeout=3s;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 6443 ssl so_keepalive=1m:10s:3; # http2;
    ssl_certificate "/etc/nginx/certs/server.crt";
    ssl_certificate_key "/etc/nginx/certs/server.key";
    expires -1;
    proxy_cache off;
    proxy_buffering off;
    proxy_http_version 1.1;
    proxy_connect_timeout 3s;
    proxy_next_upstream error timeout invalid_header http_502; # non_idempotent # http_500 http_503 http_504;
    #proxy_next_upstream_tries 3;
    #proxy_next_upstream_timeout 3s;
    proxy_send_timeout 30m;
    proxy_read_timeout 30m;
    reset_timedout_connection on;
    location / {
        proxy_pass https://apiserver_https;
        add_header Cache-Control "no-cache";
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $http_host;
        proxy_set_header Authorization $http_authorization;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-SSL-CLIENT-CERT $ssl_client_cert;
    }
}
What came out after some tests is that Kubernetes seems to use a single long-lived connection instead of traditional open-close sessions. This is probably due to SSL. So we had to increase proxy_send_timeout and proxy_read_timeout to a ridiculous 30m (the default value for the APIServer is 1800s). If these settings are under 10m, all clients (like the Scheduler and ControllerManager) generate tons of INTERNAL_ERRORs because of broken streams.
So, for the crash test I simply put one of the APIServers down by gently switching it off. Then I restarted another one, so nginx saw that the upstream went down and switched all current connections to the last one. A couple of seconds later the restarted APIServer came back and we had 2 APIServers working. Then I took the network down on the third APIServer by running 'systemctl stop network' on that server, so it had no chance to inform Kubernetes or nginx that it was going down.
Now the cluster is totally broken! nginx seems to recognize that the upstream went down, but it will not reset the already existing connections to the upstream that is dead. I can still see them with 'ss -tnp'. If I restart the Kubernetes services, they reconnect and continue to work; the same happens if I restart nginx - new sockets show up in the ss output.
This happens only if I make the APIServer unavailable by taking the network down (preventing it from closing existing connections to nginx and informing Kubernetes that it is switching off). If I just stop it, everything works like a charm. But this is not a realistic case: a server can go down without any warning, just instantly.
What are we doing wrong? Is there a way to force nginx to drop all connections to an upstream that went down? Anything to try before we move to HAProxy or LVS and ruin a week spent kicking nginx in our attempts to make it balance instead of breaking our not-so-HA cluster?
So I have looked at all the tutorials I could find on this topic, and nothing worked.
I have a Jenkins instance on Windows 10 Pro, and a CentOS machine with nginx.
I want to use nginx as a reverse proxy for Jenkins, to have HTTPS and make it accessible from the internet.
My current configuration is:
server {
    listen 80;
    listen [::]:80;
    server_name build.test.com;
    access_log /var/log/nginx/log/build.test.com.access.log main;
    error_log /var/log/nginx/log/build.test.com.error.log;
    location ^~ /jenkins/ {
        proxy_pass http://192.X.X.X:8080/;
        proxy_redirect http://192.X.X.X:8080 http://build.test.com;
        sendfile off;
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_max_temp_file_size 0;
        # this is the maximum upload size
        client_max_body_size 10m;
        client_body_buffer_size 128k;
        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;
        proxy_temp_file_write_size 64k;
        # Required for new HTTP-based CLI
        proxy_http_version 1.1;
        proxy_request_buffering off;
        proxy_buffering off; # Required for HTTP-based CLI to work over SSL
    }
}
(I replaced the real url and IPs.)
But this gave me a 502 Bad Gateway.
With the following error:
connect() to 192.X.X.X:8080 failed (13: Permission denied) while connecting to upstream, client: 192.168.5.254, server: build.test.com, request: "GET /jenkins HTTP/1.1", upstream: "http://192.X.X.X:8080/", host: "build.test.com"
But on my local network when I try to access the server with the http://192.X.X.X:8080/ url, it works fine.
Any idea?
Thanks
Doing a little bit of research indicates that this is most likely an issue within CentOS, and more specifically SELinux. SELinux could be causing the problem in any number of places; however, this is probably a good jumping-off point: https://stackoverflow.com/a/24830777/8680186
Check the SELinux logs to figure out why it's throwing a hissy fit if the above doesn't help.
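If SELinux is indeed the cause, the usual fix for this particular "(13: Permission denied) while connecting to upstream" error is to allow the web server to make outbound network connections (a sketch, assuming a stock CentOS policy):

# see whether the boolean is currently off
getsebool httpd_can_network_connect
# allow nginx (httpd_t) to connect to upstream servers, persistently
sudo setsebool -P httpd_can_network_connect 1

The actual denials should also show up in /var/log/audit/audit.log (e.g. grep denied /var/log/audit/audit.log).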
I am using nginx (nginx/1.10.2) as a reverse proxy to back-end servers. I have websockets that I need to keep on a long-lived connection. I have the following lines in the http part of the config:
keepalive_timeout 0;
proxy_read_timeout 5d;
proxy_send_timeout 5d;
I understand the proxy_read_timeout and proxy_send_timeout lines as per the documentation. However, how does keepalive_timeout come into this? Should I set keepalive_timeout to 0 to basically have no timeout, or should I set it to a high value?
What does this actually do? I didn't find the documentation very clear on this parameter: http://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_timeout
Also, how will setting or disabling keepalive_timeout affect the other static pages that I'm loading? Is it possible to set these timeout values for just the websocket? The documentation lists them under the http module, so I wasn't sure whether I can set them within specific locations (a sketch of what I mean follows the block below):
location /websock {
    # limit connections to 10
    limit_conn addr 10;
    proxy_set_header Host $host;
    proxy_pass http://backends;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
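In other words, would something like this be valid (same upstream and limit as above, timeout values only as examples)?

location /websock {
    limit_conn addr 10;
    proxy_set_header Host $host;
    proxy_pass http://backends;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    # per-location timeouts for the websocket only
    proxy_read_timeout 5d;
    proxy_send_timeout 5d;
}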
I have Puma running as the upstream app server and Riak as my background db cluster. When I send a request that map-reduces a chunk of data for about 25K users and returns it from Riak to the app, I get an error in the Nginx log:
upstream timed out (110: Connection timed out) while reading
response header from upstream
If I query my upstream directly without nginx proxy, with the same request, I get the required data.
The Nginx timeout occurs once the proxy is put in.
**nginx.conf**
http {
    keepalive_timeout 10m;
    proxy_connect_timeout 600s;
    proxy_send_timeout 600s;
    proxy_read_timeout 600s;
    fastcgi_send_timeout 600s;
    fastcgi_read_timeout 600s;
    include /etc/nginx/sites-enabled/*.conf;
}
**virtual host conf**
upstream ss_api {
    server 127.0.0.1:3000 max_fails=0 fail_timeout=600;
}

server {
    listen 81;
    server_name xxxxx.com; # change to match your URL
    location / {
        # match the name of upstream directive which is defined above
        proxy_pass http://ss_api;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_cache cloud;
        proxy_cache_valid 200 302 60m;
        proxy_cache_valid 404 1m;
        proxy_cache_bypass $http_authorization;
        proxy_cache_bypass http://ss_api/account/;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
Nginx has a bunch of timeout directives. I don't know if I'm missing something important. Any help would be highly appreciated....
This happens because your upstream takes too long to answer the request and NGINX thinks the upstream already failed in processing the request, so it responds with an error.
Just include and increase proxy_read_timeout in the location config block.
The same thing happened to me, and I used a 1-hour timeout for an internal app at work:
proxy_read_timeout 3600;
With this, NGINX will wait for an hour (3600s) for its upstream to return something.
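For example, placed inside the location block it would look something like this (a sketch, reusing the ss_api upstream name from the question):

location / {
    proxy_pass http://ss_api;
    proxy_read_timeout 3600;
}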
You should generally refrain from just increasing the timeouts, though; I doubt your backend server's response time is the real issue here in any case.
I got around this issue by clearing the connection keep-alive flag and specifying the HTTP version, as per the answer here:
https://stackoverflow.com/a/36589120/479632
server {
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        # these two lines here
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://localhost:5000;
    }
}
Unfortunately I can't explain why this works, and I didn't manage to decipher it from the docs mentioned in the linked answer either, so if anyone has an explanation I'd be very interested to hear it.
First figure out which upstream is slow by consulting the nginx error log file, and adjust the read timeout accordingly.
In my case it was FastCGI:
2017/09/27 13:34:03 [error] 16559#16559: *14381 upstream timed out (110: Connection timed out) while reading response header from upstream, client:xxxxxxxxxxxxxxxxxxxxxxxxx", upstream: "fastcgi://unix:/var/run/php/php5.6-fpm.sock", host: "xxxxxxxxxxxxxxx", referrer: "xxxxxxxxxxxxxxxxxxxx"
So I had to adjust fastcgi_read_timeout in my server configuration:
location ~ \.php$ {
    fastcgi_read_timeout 240;
    ...
}
See: original post
In your case a little optimization of the proxy helps, or you can use the "# time out settings" below:
location / {
    # time out settings
    proxy_connect_timeout 159s;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    proxy_buffer_size 64k;
    proxy_buffers 16 32k;
    proxy_busy_buffers_size 64k;
    proxy_temp_file_write_size 64k;
    proxy_pass_header Set-Cookie;
    proxy_redirect off;
    proxy_hide_header Vary;
    proxy_set_header Accept-Encoding '';
    proxy_ignore_headers Cache-Control Expires;
    proxy_set_header Referer $http_referer;
    proxy_set_header Host $host;
    proxy_set_header Cookie $http_cookie;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
I would recommend looking at the error logs, specifically at the upstream part, which shows the specific upstream that is timing out.
Then, based on that, you can adjust proxy_read_timeout, fastcgi_read_timeout or uwsgi_read_timeout.
Also make sure your config is loaded.
More details here: Nginx upstream timed out (why and how to fix)
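To make sure the changed config is actually picked up, something along these lines should work (assuming a systemd-managed nginx):

sudo nginx -t                # validate the configuration
sudo systemctl reload nginx  # reload without dropping connections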
I think this error can happen for various reasons, but it can be specific to the module you're using. For example, I saw this when using the uwsgi module, so I had to set uwsgi_read_timeout.
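For instance, with the uwsgi module the directive sits next to uwsgi_pass (a sketch only; the socket path is just an example):

location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/app.sock;  # example socket path
    uwsgi_read_timeout 600s;
}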
As many others have pointed out here, increasing the timeout settings for NGINX can solve your issue.
However, increasing your timeout settings might not be as straightforward as many of these answers suggest. I faced this issue myself and tried to change my timeout settings in the /etc/nginx/nginx.conf file, as almost everyone in these threads suggests. This did not help me a single bit; there was no apparent change in NGINX's timeout settings. Now, many hours later, I finally managed to fix this problem.
The solution lies in this forum thread, and what it says is that you should put your timeout settings in /etc/nginx/conf.d/timeout.conf (and if this file doesn't exist, you should create it). I used the same settings as suggested in the thread:
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
Please also check the keepalive_timeout of the upstream server.
I got a similar issue: random 502s, with "Connection reset by peer" errors in the nginx logs, happening when the server was under heavy load. I eventually found it was caused by a mismatch between nginx's and the upstream's (gunicorn in my case) keepalive_timeout values. nginx was at 75s and the upstream at only a few seconds. This caused the upstream to sometimes time out and drop the connection, while nginx didn't understand why.
Raising the upstream server's value to match nginx's solved the issue.
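For example, with gunicorn the keep-alive can be raised to match nginx's 75s (a sketch; only the --keep-alive flag is the point here, the worker count and app module are just examples):

gunicorn --keep-alive 75 --workers 4 myapp.wsgi:application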
I had the same problem, and it turned out to be an everyday error in the Rails controller. I don't know why, but in production Puma runs into the error again and again, causing the message:
upstream timed out (110: Connection timed out) while reading response header from upstream
Probably because nginx tries to get the data from Puma again and again. The funny thing is that the error caused the timeout message even when I was calling a different action in the controller, so a single typo blocked the whole app.
Check your log/puma.stderr.log file to see if that is the situation.
If you're using an AWS EC2 instance running Linux like I am, you may also need to restart nginx for the changes to take effect after adding proxy_read_timeout 3600; to /etc/nginx/nginx.conf. I did: sudo systemctl restart nginx
Hopefully this helps someone:
I ran into this error and the cause was a wrong permission on the log folder for php-fpm; after changing it so php-fpm could write to it, everything was fine.
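In case it helps, the fix was along these lines (the path and user are only examples; they depend on the distribution and the pool configuration):

# let the php-fpm pool user write to its log directory (example path/user)
sudo chown -R www-data:www-data /var/log/php-fpm
sudo chmod 755 /var/log/php-fpm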
On our side it was caused by using SPDY with the proxy cache. When the cache expired, we got this error until the cache had been updated.
For the proxy upstream timeout, I tried the settings above but they didn't work.
Setting resolver_timeout worked for me, knowing it was taking 30s to produce the upstream timeout message, e.g. "me.atwibble.com could not be resolved (110: Operation timed out)".
http://nginx.org/en/docs/http/ngx_http_core_module.html#resolver_timeout
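For reference, the relevant directives look like this (the resolver address and timeout value are just examples):

resolver 8.8.8.8;      # example DNS server
resolver_timeout 10s;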
We faced this issue while saving content (a custom content type), which gave a timeout error. We fixed it by adding all the timeouts above, setting the HTTP client config to 600s, and increasing the memory for the PHP process to 3 GB.
If you are using WSL2 on Windows 10, check your version with this command:
wsl -l -v
You should see 2 under the version column.
If you don't, you need to install wsl_update_x64.
Then add a line to your location block or nginx.conf, for example:
proxy_read_timeout 900s;