Nginx Load Balancer issue - nginx

We are using Nginx as a load balancer for multiple Riak nodes. The setup worked fine for some time (a few hours) before Nginx started returning 502 Bad Gateway errors. On checking, the individual nodes seemed to be working. We found that the problem was with the nginx buffer size, so we increased it to 16k; it worked fine for one more day before we started getting 502 errors for everything.
My Nginx configuration is as follows
upstream riak {
    server 127.0.0.1:8091 weight=3;
    server 127.0.0.1:8092;
    server 127.0.0.1:8093;
    server 127.0.0.1:8094;
}
server {
    listen 8098;
    server_name 127.0.0.1:8098;
    location / {
        proxy_pass http://riak;
        proxy_buffer_size 16k;
        proxy_buffers 8 16k;
    }
}
Any help is appreciated, thank you.

Check if you are running out of fd's on the nginx box. Check with netstat whether you have too many connections in the TIME_WAIT state. If so, you will need to reduce your tcp_fin_timeout value from the default 60 seconds to something smaller.
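A quick sketch of those checks (standard Linux tooling; the 30s value is just an example):
# system-wide file descriptor usage vs. limit
cat /proc/sys/fs/file-nr
# per-process limit of the nginx master
cat /proc/$(pgrep -o nginx)/limits | grep 'open files'
# how many sockets are stuck in TIME_WAIT
netstat -ant | grep -c TIME_WAIT
# shorten the FIN timeout from the default 60s (runtime only, not persistent)
sysctl -w net.ipv4.tcp_fin_timeout=30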

Related

Nginx does not re-resolve DNS names in Docker

I am running nginx as part of the docker-compose template.
In nginx config I am referring to other services by their docker hostnames (e.g. backend, ui).
That works fine until I do that trick:
docker stop backend
docker stop ui
docker start ui
docker start backend
which makes the backend and ui containers exchange IP addresses (Docker assigns private network IPs by handing the next available IP in the CIDR to each new requester). These 4 commands imitate the rare case when both upstream containers get restarted at the same time but the nginx container does not. I also believe this should be a very common situation when running pods on Kubernetes-based clusters.
Now nginx resolves the backend host to ui's IP and ui to backend's IP.
Reloading nginx' configuration does help (nginx -s reload).
Also, if I do nslookup from within the nginx container - the IPs are always resolved correctly.
So this isolates the problem to a pure nginx issue around DNS caching.
The things I tried:
I have the resolver set under the http {} block in nginx config:
resolver 127.0.0.11 ipv6=off valid=10s;
The most common solution proposed by folks on the internet is to use variables in proxy_pass (this prevents nginx from resolving and caching DNS records at startup) - it did not make ANY difference at all:
server {
    <...>
    set $mybackend "backend:3000";
    location /backend/ {
        proxy_pass http://$mybackend;
    }
}
Tried adding resolver line into the location itself
Tried setting the variable on the http{} block level, using map:
http {
    map "" $mybackend {
        default backend:3000;
    }
    server {
        ...
    }
}
Tried to use openresty fork of nginx (https://hub.docker.com/r/openresty/openresty/) with resolver local=true
None of the solutions gave any effect at all. The DNS caches are only wiped if I reload nginx configuration inside of the container OR restart the container manually.
My current workaround is to use static docker network declared in docker-compose.yml. But this has its cons too.
Nginx version used: 1.20.0 (latest as of now)
Openresty versions used: 1.13.6.1 and 1.19.3.1 (latest as of now)
Would appreciate any thoughts
UPDATE 2021-09-08: A few months later I am back to solving this same issue and still have no luck. It really looks like a bug in nginx - I cannot make nginx re-resolve the DNS names. There seems to be no timeout on nginx's DNS cache, and none of the options listed above to introduce timeouts or trigger a DNS flush work.
UPDATE 2022-01-11: I think the problem really is in nginx. I tested my config in many ways a couple of months ago, and it looks like something else in my nginx.conf prevents the valid parameter of the resolver directive from working properly. It is either the limit_req_zone or the proxy_cache_path directive, used for request rate limiting and caching respectively. These just don't play nicely with the valid param for some reason, and I could not find any information about this anywhere in the nginx docs.
I will get back to this later to confirm my hypothesis.
Maybe it's because nginx's DNS resolver for upstream servers only works in the commercial version, nginx plus?
https://www.nginx.com/products/nginx/load-balancing/#service-discovery
TL;DR: Your Internet provider may be caching DNS records with no respect for tiny TTL values (like 1 second).
I've been trying to retest the same thing locally.
Your Docker might be using the local resolver (127.0.0.11).
Then DNS might be cached by your OS (which you can flush - that's OS-specific).
Then you might have it cached on your WiFi router (yes!).
Beyond that it goes to your ISP and is out of your control.
But nslookup is your friend: you can query each DNS server between nginx and the root DNS servers.
Something very easy to reproduce (without setting up a local DNS server):
Create a Route 53 'A' record with a TTL of 1 second and try to query the AWS DNS server for your hosted zone (it will be something like ns-239.awsdns-29.com).
Play around with the dig / nslookup commands:
nslookup
set type=a
server ns-239.awsdns-29.com
your.domain.com
It will return the IP you have set.
Change the Route 53 'A' record to some other IP.
Use dig / nslookup and make sure you see the change immediately.
Then set the resolver in nginx to the AWS DNS server (for testing purposes only).
If that works, it means DNS is cached elsewhere and this is no longer an nginx issue!
In my case it was the Sunrise WiFi router, which only saw the new IP after I restarted it (I assume things would have resolved after some longer interval).
A great help when debugging this is having nginx compiled with
--with-debug
Then in the nginx logs you can see whether a given DNS name was resolved and to what IP.
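For reference, a quick way to check whether your binary was built with that flag, and to switch the error log to debug level (the log path here is just an example):
nginx -V 2>&1 | grep -o with-debug
# then, in nginx.conf:
error_log /var/log/nginx/error.log debug;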
My whole config looks like this (here with the standard Docker resolver, which has to be set if you are using variables in proxy_pass!):
server {
    listen 0.0.0.0:8888;
    server_name nginx.my.custom.domain.in.aws;
    resolver 127.0.0.11 valid=1s;
    location / {
        proxy_ssl_server_name on;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header Host $host;
        set $backend_servers my.custom.domain.in.aws;
        proxy_pass https://$backend_servers$request_uri;
    }
}
Then you can try to test it with
curl -L http://nginx.my.custom.domain.in.aws:8888/ --resolve nginx.my.custom.domain.in.aws:8888:127.0.0.1
I was struggling with exactly the same thing (Docker Swarm), and to make it work I actually had to drop the upstream block from my configuration.
Something that works well (tested 5 minutes ago on NGINX 2.22):
location ~* /api/parameters/(.*)$ {
    resolver 127.0.0.11 ipv6=off valid=1s;
    set $bck_parameters parameters:8000;
    proxy_pass http://$bck_parameters/api/$1$is_args$args;
}
where $bck_parameters is NOT an upstream but the real server behind it.
Doing the same thing with an upstream will fail.

Nginx limit upload speed

I use Nginx as the reverse proxy.
How to limit the upload speed in Nginx?
I would like to share how to limit the upload speed of a reverse proxy in Nginx.
Limiting the download speed is a piece of cake, but not the upload.
Here is the configuration to limit the upload speed.
Find your configuration file:
/etc/nginx/nginx.conf
user www-data;
worker_processes auto;
pid /run/nginx.pid;

events {
    worker_connections 1024;
}

# 1)
# Add a stream block
# This stream is used to limit upload speed
stream {
    upstream site {
        server your.upload-api.domain1:8080;
        server your.upload-api.domain1:8080;
    }
    server {
        listen 12345;
        # 19 MiB/min = ~332k/s
        proxy_upload_rate 332k;
        proxy_pass site;
        # you can also pass the address directly without an upstream:
        # proxy_pass your.upload-api.domain1:8080;
    }
}

http {
    server {
        # 2)
        # Proxy to the stream that limits upload speed
        location = /upload {
            # It will proxy the data immediately if off
            proxy_request_buffering off;
            # It will pass to the stream,
            # then the stream passes to your.api.domain1:8080/upload?$args
            proxy_pass http://127.0.0.1:12345/upload?$args;
        }
        # You see? Limiting the download speed is easy, no stream needed
        location /download {
            keepalive_timeout 28800s;
            proxy_read_timeout 28800s;
            proxy_buffering off;
            # 75 MiB/min = ~1300 kilobytes/s
            proxy_limit_rate 1300k;
            proxy_pass http://your.api.domain1:8080;
        }
    }
}
If your Nginx doesn't support stream, you may need to add the module.
static:
$ ./configure --with-stream
$ make && sudo make install
dynamic
$ ./configure --with-stream=dynamic
https://www.nginx.com/blog/compiling-dynamic-modules-nginx-plus/
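Before recompiling, it may be worth checking whether your binary already has the stream module; many distro packages ship it as a dynamic module (the module path below is typical but varies by distro):
nginx -V 2>&1 | tr ' ' '\n' | grep stream
# if it is built as a dynamic module, load it at the very top of nginx.conf:
load_module modules/ngx_stream_module.so;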
Note:
If you have a client such as HAProxy in front, with Nginx as the server, you may see 504 timeouts in HAProxy and 499 (client closed connection) in Nginx while uploading large files with a low upload speed limit.
You should increase or add 'timeout server 605s' (or more) in HAProxy, because we want HAProxy not to close the connection while Nginx is still busy passing the upload to your server.
https://stackoverflow.com/a/44028228/10258377
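A minimal HAProxy sketch of that setting (the backend name and server address are placeholders, not from the original setup):
backend nginx_upload
    # keep the server-side connection open while nginx throttles the upload
    timeout server 605s
    server nginx1 10.0.0.10:12345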
Some references:
https://nginx.org/en/docs/stream/ngx_stream_proxy_module.html#proxy_upload_rate
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_limit_rate
You will find other ways to limit the upload speed by adding third-party modules, but they are complex and don't work as well:
https://www.nginx.com/resources/wiki/modules/
https://www.nginx.com/resources/wiki/modules/upload/
https://github.com/cfsego/limit_upload_rate
Thank me later ;)

Nginx memcached with fallback to remote service

I can't get Nginx working with the memcached module. The requirement is to query a remote service, cache the data in memcached, and never hit the remote endpoint again until the backend invalidates the cache. I have 2 containers with memcached v1.4.35 and one with Nginx v1.11.10.
The configuration is the following:
upstream http_memcached {
    server 172.17.0.6:11211;
    server 172.17.0.7:11211;
}
upstream remote {
    server api.example.com:443;
    keepalive 16;
}
server {
    listen 80;
    location / {
        set $memcached_key "$uri?$args";
        memcached_pass http_memcached;
        error_page 404 502 504 = @remote;
    }
    location @remote {
        internal;
        proxy_pass https://remote;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
As a test I pointed the memcached upstream somewhere incorrect, but then I get HTTP 499 instead, along with warnings:
*3 upstream server temporarily disabled while connecting to upstream
It seems with described configuration Nginx can reach memcached successfully but can't write or read from it. I can write and read to memcached with telnet successfully.
Can you help me please?
My guesses on what's going on with your configuration
1. 499 codes
HTTP 499 is nginx' custom code meaning the client terminated connection before receiving the response (http://lxr.nginx.org/source/src/http/ngx_http_request.h#0120)
We can easily reproduce it, just
nc -k -l 172.17.0.6 11211
and curl your resource - curl will hang for a while; press Ctrl+C and you'll see this message in your access logs
2. upstream server temporarily disabled while connecting to upstream
It means nginx didn't manage to reach your memcached server and removed it from the pool of upstreams. It suffices to shut down both memcached servers and you'd constantly see it in your error logs (I see it every time with error_log ... info).
As you see these messages your assumption that nginx can freely communicate with memcached servers doesn't seem to be true.
Consider explicitly setting memcached_bind (http://nginx.org/en/docs/http/ngx_http_memcached_module.html#memcached_bind)
and use the -b option with telnet to make sure you're testing the memcached servers for availability from the same source address nginx would use.
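For example, something along these lines (the addresses are assumptions based on the container IPs above; -b is the GNU inetutils telnet option for binding the local source address):
location / {
    set $memcached_key "$uri?$args";
    memcached_bind 172.17.0.1;
    memcached_pass http_memcached;
}
and then test availability from that same address:
telnet -b 172.17.0.1 172.17.0.6 11211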
3. nginx can reach memcached successfully but can't write or read from it
Nginx can only read from memcached via its built-in module
(http://nginx.org/en/docs/http/ngx_http_memcached_module.html):
The ngx_http_memcached_module module is used to obtain responses from
a memcached server. The key is set in the $memcached_key variable. A
response should be put in memcached in advance by means external to
nginx.
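To illustrate that last point: something outside nginx has to write values under exactly the key nginx will look up - with this config, "$uri?$args" (so "/some/uri?" when there is no query string). A minimal Python sketch, assuming the pymemcache client library (not part of the original setup):
from pymemcache.client.base import Client

# the key must match nginx's $memcached_key, i.e. "$uri?$args"
client = Client(("172.17.0.6", 11211))
client.set("/some/uri?", b'{"cached": true}', expire=900)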
4. overall architecture
It's not fully clear from your question how the overall scheme is supposed to work.
nginx's upstream block uses weighted round-robin by default.
That means each request queries just one of your memcached servers, picked in turn.
You can change this by setting memcached_next_upstream not_found, so a missing key is treated as an error and all of your servers get polled. That's probably OK for a farm of 2 servers, but unlikely to be what you want for 20 servers.
The same is ordinarily the case for memcached client libraries - they pick a server out of a pool according to some hashing scheme, so your key ends up on only 1 server out of the pool.
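If you want nginx to pick the memcached server deterministically from the key instead of round-robin, the upstream block can hash on $memcached_key - a sketch, not from the original answer, and note that nginx's hashing will generally not match whatever scheme your write-side client library uses:
upstream http_memcached {
    hash $memcached_key consistent;
    server 172.17.0.6:11211;
    server 172.17.0.7:11211;
}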
5. what to do
I've managed to set up a similar configuration in 10 minutes on my local box - it works as expected. To simplify debugging, I'd get rid of the Docker containers to avoid networking overcomplication, run 2 memcached servers on different ports in single-threaded mode with the -vv option to see when requests reach them (memcached -p 11211 -U 0 -vv), and then play with tail -f and curl to see what's really happening in your case.
6. working solution
nginx config:
https and http/1.1 are not used here but it doesn't matter
upstream http_memcached {
    server 127.0.0.1:11211;
    server 127.0.0.1:11212;
}
upstream remote {
    server 127.0.0.1:8080;
}
server {
    listen 80;
    server_name server.lan;
    access_log /var/log/nginx/server.access.log;
    error_log /var/log/nginx/server.error.log info;
    location / {
        set $memcached_key "$uri?$args";
        memcached_next_upstream not_found;
        memcached_pass http_memcached;
        error_page 404 = @remote;
    }
    location @remote {
        internal;
        access_log /var/log/nginx/server.fallback.access.log;
        proxy_pass http://remote;
        proxy_set_header Connection "";
    }
}
server.py:
this is my dummy server (python):
from random import randint
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello: {}\n'.format(randint(1, 100000))
This is how to run it (just need to install flask first)
FLASK_APP=server.py flask run -p 8080
filling in my first memcached server:
$ telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set /? 0 900 5
cache
STORED
quit
Connection closed by foreign host.
checking:
note that we get a result every time although we stored data
only in the first server
$ curl http://server.lan && echo
cache
$ curl http://server.lan && echo
cache
$ curl http://server.lan && echo
cache
this one is not in the cache so we'll get a response from server.py
$ curl http://server.lan/?q=1 && echo
Hello: 32337
The whole picture (the screenshot is not included here): the 2 windows on the right are
memcached -p 11211 -U 0 -vv
and
memcached -p 11212 -U 0 -vv

nginx returns Internal Server Error when uploading large files (several GB)

I have an Artifactory behind nginx and uploading files larger than 4 GB fails. I am fairly certain that this is nginx's fault, because if the file is uploaded from/to localhost, no problem occurs.
nginx is set up to have client_max_body_size and client_body_timeout large enough for this not to be an issue.
Still, when uploading a large file (>4 GB) via curl, it fails after about half a minute. The only error message I get is HTTP 500 Internal Server Error; nothing is written to nginx's error logs.
The problem in my case was insufficient disk space mounted on root. I have a huge disk mounted on /home, but only had about 4 GB left on /. I assume that nginx was saving incoming request bodies there, and once it filled up, the request was shut down.
The way I fixed it was to add those lines to the nginx.conf file (not all of them are necessarily required):
http {
    (...)
    client_max_body_size 100G;
    client_body_timeout 300s;
    client_body_in_file_only clean;
    client_body_buffer_size 16K;
    client_body_temp_path /home/nginx/client_body_temp;
}
The last line is the important part - there I tell nginx to keep its temporary files under /home.
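A quick way to check whether you're hitting the same problem (the temp path below is only an example - Debian-style packages use /var/lib/nginx/body, official nginx packages use /var/cache/nginx/client_temp):
df -h /
# watch the client body temp dir grow while an upload is in flight
du -sh /var/lib/nginx/body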

nginx - client_max_body_size has no effect

nginx keeps saying client intended to send too large body. Googling and RTM pointed me to client_max_body_size. I set it to 200m in nginx.conf as well as in the vhost conf and restarted Nginx a couple of times, but I'm still getting the error message.
Did I overlook something? The backend is php-fpm (post_max_size and upload_max_filesize are set accordingly).
Following the nginx documentation, you can set client_max_body_size 20m (or any value you need) in the following contexts: http, server, location.
NGINX large uploads are finally working successfully on my hosted WordPress sites (as per suggestions from nembleton & rjha94).
I thought it might be helpful for someone if I added a little clarification to their suggestions. For starters, please be certain you have included your increased upload directive in ALL THREE separate definition blocks (server, location & http). Each should have a separate line entry. The result will look something like this (where the ... reflects other lines in the definition block):
http {
    ...
    client_max_body_size 200M;
}
(in my ISPconfig 3 setup, this block is in the /etc/nginx/nginx.conf file)
server {
    ...
    client_max_body_size 200M;
}
location / {
    ...
    client_max_body_size 200M;
}
(in my ISPconfig 3 setup, these blocks are in the /etc/nginx/conf.d/default.conf file)
Also, make certain that your server's php.ini file is consistent with these NGINX settings. In my case, I changed the setting in php.ini's File_Uploads section to read:
upload_max_filesize = 200M
Note: if you are managing an ISPconfig 3 setup (my setup is on CentOS 6.3, as per The Perfect Server), you will need to manage these entries in several separate files. If your configuration is similar to one in the step-by-step setup, the NGINX conf files you need to modify are located here:
/etc/nginx/nginx.conf
/etc/nginx/conf.d/default.conf
My php.ini file was located here:
/etc/php.ini
I continued to overlook the http {} block in the nginx.conf file. Apparently, overlooking this had the effect of limiting uploading to the 1M default limit. After making the associated changes, you will also want to be sure to restart your NGINX and PHP FastCGI Process Manager (PHP-FPM) services. On the above configuration, I use the following commands:
/etc/init.d/nginx restart
/etc/init.d/php-fpm restart
As of March 2016, I ran into this issue trying to POST json over https (from python requests, not that it matters).
The trick is to put "client_max_body_size 200M;" in at least two places, http {} and server {}:
1. The http {} block
Typically in /etc/nginx/nginx.conf
2. The server {} block in your vhost.
For Debian/Ubuntu users who installed via apt-get (and other distro package managers which install nginx with vhosts by default), that's /etc/nginx/sites-available/mysite.com; for those who do not have vhosts, it's probably your nginx.conf or in the same directory as it.
3. The location / block in the same place as 2.
You can be more specific than /, but if it's not working at all, I'd recommend applying this to / and then, once it's working, being more specific.
Remember - if you have SSL, that will require you to set the above for the SSL server and location too, wherever that may be (ideally the same as 2.). I found that if your client tries to upload over http and you expect them to get 301'd to https, nginx will actually drop the connection before the redirect because the file is too large for the http server, so it has to be in both.
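In other words, something along these lines (a sketch with placeholder names, not a complete config):
server {
    listen 80;
    server_name example.com;
    # still needed here, otherwise large uploads are rejected before the 301 is sent
    client_max_body_size 200M;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name example.com;
    client_max_body_size 200M;
    # ssl_certificate / ssl_certificate_key / locations omitted
}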
Recent comments suggest that there is an issue with this on SSL with newer nginx versions, but i'm on 1.4.6 and everything is good :)
You need to apply following changes:
Update php.ini (Find right ini file from phpinfo();) and increase post_max_size and upload_max_filesize to size you want:
sed -i "s/post_max_size =.*/post_max_size = 200M/g" /etc/php5/fpm/php.ini
sed -i "s/upload_max_filesize =.*/upload_max_filesize = 200M/g" /etc/php5/fpm/php.ini```
Update NginX settings for your website and add client_max_body_size value in your location, http, or server context.
location / {
    client_max_body_size 200m;
    ...
}
Restart NginX and PHP-FPM:
service nginx restart
service php5-fpm restart
NOTE: Sometimes (in my case, almost every time) you need to kill the php-fpm processes if they don't get refreshed by the service command properly. To do that you can get the list of processes (ps -elf | grep php-fpm) and kill them one by one (kill -9 12345), or use the following command to do it for you:
ps -elf | grep php-fpm | grep -v grep | awk '{ print $4 }' | xargs kill -9
Please check whether you are setting the client_max_body_size directive inside the http {} block and not inside the location {} block. I have set it inside the http {} block and it works.
Someone correct me if this is bad, but I like to lock everything down as much as possible, and if you've only got one target for uploads (as is usually the case), then just target your changes at that one file. This works for me on the Ubuntu nginx-extras mainline 1.7+ package:
location = /upload.php {
    client_max_body_size 102M;
    fastcgi_param PHP_VALUE "upload_max_filesize=102M \n post_max_size=102M";
    (...)
}
I had a similar problem recently and found out, that client_max_body_size 0; can solve such an issue. This will set client_max_body_size to no limit. But the best practice is to improve your code, so there is no need to increase this limit.
I met the same problem, but found it had nothing to do with nginx. I am using Node.js as the backend server with nginx as a reverse proxy, and the 413 code was triggered by the Node server. Node uses koa to parse the body, and koa limits the urlencoded length:
formLimit: limit of the urlencoded body. If the body ends up being larger than this limit, a 413 error code is returned. Default is 56kb.
Setting formLimit to a bigger value solves the problem.
Assuming you have already set client_max_body_size and the various PHP settings (upload_max_filesize / post_max_size, etc.) as in the other answers, then restarted or reloaded NGINX and PHP without any result, run this:
nginx -T
This will show you any unresolved errors in your NGINX configs. In my case, I struggled with the 413 error for a whole day before I realized there were some other unresolved SSL errors in the NGINX config (wrong paths for the certs) that needed to be corrected. Once I fixed the unresolved issues reported by 'nginx -T' and reloaded NGINX - EUREKA!! That fixed it.
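Once the dump comes back clean, it is also a convenient way to see every place the directive actually ends up after all includes are resolved, for example:
nginx -t
nginx -T 2>/dev/null | grep -n client_max_body_size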
I'm setting up a dev server to play with that mirrors our outdated live one, I used The Perfect Server - Ubuntu 14.04 (nginx, BIND, MySQL, PHP, Postfix, Dovecot and ISPConfig 3)
After experiencing the same issue, I came across this post and nothing was working. I changed the value in every recommended file (nginx.conf, ispconfig.vhost, /sites-available/default, etc.)
Finally, changing client_max_body_size in my /etc/nginx/sites-available/apps.vhost and restarting nginx is what did the trick. Hopefully it helps someone else.
In case you are using Kubernetes, add the following annotations to your Ingress:
annotations:
  nginx.ingress.kubernetes.io/client-max-body-size: "5m"
  nginx.ingress.kubernetes.io/client-body-buffer-size: "8k"
  nginx.ingress.kubernetes.io/proxy-body-size: "5m"
  nginx.ingress.kubernetes.io/proxy-buffer-size: "8k"
Confirm the changes were applied:
kubectl -n <namespace> describe ingress <ingress-name>
References:
Client Body Buffer Size
Custom max body size
I had the same issue where the client_max_body_size directive was ignored.
My silly error was that I had put a file inside /etc/nginx/conf.d which did not end with .conf. Nginx will not load such files by default.
If you tried the above options with no success, and you're using IIS (iisnode) to host your node app, putting this code in web.config resolved the problem for me.
Here is the reference: https://www.inflectra.com/support/knowledgebase/kb306.aspx
Also, you can change the allowed length, because I think it's currently 2GB. Modify it to your needs:
<security>
  <requestFiltering>
    <requestLimits maxAllowedContentLength="2147483648" />
  </requestFiltering>
</security>
The following config worked for me. Notice I only set client_max_body_size 50M; once, contrary to what others are saying...
File: /etc/nginx/conf.d/sites.conf
server {
    listen 80 default_server;
    server_name portal.myserver.com;
    return 301 https://$host$request_uri;
}
server {
    resolver 127.0.0.11 valid=30s;
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;
    ssl_certificate /secret/portal.myserver.com.crt;
    ssl_certificate_key /secret/portal.myserver.com.pem;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    server_name portal.myserver.com;
    client_max_body_size 50M;
    location /fileserver/ {
        set $upstream http://fileserver:6976;
        proxy_pass $upstream;
    }
}
If you are using the Windows version of nginx, you can try killing all nginx processes and restarting them.
I encountered the same issue in my environment and resolved it with this solution.
