I have a fully working config doing loadbalancing over 2 upstream servers.
I want to use "least_conn".
When I put least_conn in, it is still doing round robin.
I can confirm that other settings like "weight" and "ip_hash" are working as
expected.
Is there some other configuration/setting that also affects whether
least_conn is honored?
This is using nginx 1.18.0
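For reference, the relevant part of my config looks roughly like this (addresses are placeholders):

upstream backend {
    least_conn;               # route each request to the server with the fewest active connections
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}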
For future reference: it was indeed working. It turns out that nginx buffers quite a lot of the response by default (roughly 1 GB between a temp file and some in-memory buffers), and the files being transferred were a little smaller than that, so each upstream connection was freed almost as soon as the transfer started and least_conn had nothing to count.
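In case it helps anyone else, these are the directives involved, with what I believe are the default values:

location / {
    proxy_pass http://backend;
    proxy_buffering on;                # buffering is on by default
    proxy_buffers 8 4k;                # in-memory buffers (default depends on platform)
    proxy_max_temp_file_size 1024m;    # ~1 GB can spill to a temp file by default
    # With buffering absorbing the whole response, the upstream connection is
    # freed almost immediately, so least_conn sees near-zero active connections.
    # Setting proxy_max_temp_file_size 0; keeps large transfers tied to the
    # upstream connection, which least_conn can then account for.
}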
Intro
Hi! First, let me say that I am a networking n00b. With that in mind:
We have a small video conferencing app, dedicated to eLearning, which does live streaming through Janus (WebRTC). We offer our clients the possibility to record their videos. We've got the live streaming part running pretty well now, but we are having a small issue with video playback.
When we are playing back the videos, as long as there aren't many large video files to stream, everything plays back correctly. However, we've had fringe cases of trouble for a while, and I was sent to investigate them.
Problem encountered
When playing back multiple large video files, I noticed that I can't get more than 4-5 large files to stream simultaneously on a single client.
When the limit is hit, there seems to be some kind of race-condition lock happening: a few videos are stuck, and a few are playing (usually 2 or 3). Then one of the playing videos will get stuck, and one of the stuck videos will start playing.
However, it doesn't affect playback on other clients. When it gets stuck, I can't even connect to the MinIO web interface from the same client, but I can from another client (i.e. another browser, or another machine). I can also stream just as much from the other client as I can from the one that is stuck.
I've been testing different configurations on a test minio server by loading the same file many times from different tabs in Chrome, which seems to recreate the problem.
Server Description
The files are hosted on cloud storage that offers > 7 Gbps bandwidth. The storage is mounted on a MinIO instance in a Kubernetes cluster, behind an NGINX Ingress Controller that serves as the single point of entry to the cluster, so it also controls traffic to the other microservices on the k8s cluster.
Each k8s node has a guaranteed bandwidth of > 250 Mbps, if that matters in this case.
The MinIO instance is mainly used to create transient sharing rights. We call the videos simply by pointing to their location using the DNS we set up for the service.
What has been investigated and tried
At first, I thought it might be a MinIO misconfiguration. However, I looked at the config files and the documentation and couldn't find anything that seemed to limit the number of connections / requests per client.
While reading, I stumbled upon the fact that Chrome doesn't allow more than 6 HTTP/1.1 connections per host and thought I had hit the jackpot. But then I went and looked, and the protocol used to fetch the files is already HTTP/2 (h2).
Then I went one level higher and looked through the configuration of the NGINX Ingress Controller. Here again, everything seems OK:
events {
    multi_accept on;              # accept as many new connections as possible at once
    worker_connections 16384;     # max simultaneous connections per worker process
    use epoll;                    # efficient event method on Linux
}
[...]
http2_max_field_size 4k;            # max size of one HPACK-compressed header field
http2_max_header_size 16k;          # max size of the whole decompressed header list
http2_max_requests 1000;            # requests served per HTTP/2 connection before it's closed
http2_max_concurrent_streams 128;   # parallel streams allowed per HTTP/2 connection
[...]
So I've been scouring the net for a good while now, and I'm getting more and more confused about what to investigate next, so I thought I'd come here and ask my very first StackOverflow question.
So, is there anything I could do with the current setup to make it so we can stream more large files simultaneously? If not, what are your thoughts and recommendations?
Edit:
I've found a workaround by searching hard enough: Increase Concurrent HTTP calls
At first I was not a fan: HTTP/2 is supposed, from my understanding, to support a lot of parallel requests. However, I think I found the crux of the problem here: https://en.wikipedia.org/wiki/Head-of-line_blocking
Further research led me to these mitigations for that problem: https://tools.ietf.org/id/draft-scharf-tcpm-reordering-00.html#rfc.section.3.2
I'll have to look into SCTP and see if it is something I'd like to implement, however. At first glance, that seems rather complicated and might not be worth the time investment.
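(If the nginx side turns out to matter here, I believe the relevant knob is the stream limit; a sketch with an illustrative value, and note that in a k8s setup this would be exposed through the Ingress Controller's ConfigMap rather than edited directly:)

http2_max_concurrent_streams 256;   # default is 128; allows more parallel streams per HTTP/2 connection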
Let's assume that Nginx is configured as a reverse proxy to serve very large files from a storage server. The cache is configured to cache everything with no limits (no max_size) for demo purposes. The server on which nginx is installed has 50 GB of disk space.
I was wondering how nginx behaves in these situations:
In case "max-size" is not specified, I understand that nginx can use all the available disk space. But when the disk is full, what is the behavior? It removes the oldest cache?
If thousands of files are cached and a 50 GB file needs to be cached. Nginx will then clean the cache of those thousands of files to make room for one big file?
Nginx receives a request for a 60 GB file. According to the configuration, it must cache it for future requests. But the disk is only 50 GB. Does it start caching the 50 GB file knowing that it will not be able to succeed? Or does it understand that this is not possible and just passes the request without caching.
Thank you
I can answer the first two:
Yes, following LRU, but for that you need min_free specified in proxy_cache_path. See the proxy_cache_path documentation:
When the size is exceeded or there is not enough free space, it removes the least recently used data.
Per 1, yes.
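For illustration, a cache definition with both knobs might look like this (path and sizes are made up; min_free requires nginx 1.19.1+):

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=bigfiles:50m
                 max_size=40g min_free=5g inactive=7d use_temp_path=off;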
Hope it helps, and I hope someone else can enlighten us on the 3rd question.
I am running my websites on an Ubuntu 14.04 server with nginx. I would like to set up a mail server (probably using SquirrelMail), which uses apache2 (presently not in use). I intend to keep the port numbers for the web server and mail server entirely different.
Can I do this? Do I have to do anything out of the ordinary (secret handshakes) to set this up, and if so, what exactly do I need to do?
Yes, it's entirely possible. Generally, the complications you will have to sort out are both servers trying to listen on the same port, and directory permission issues.
In this case you will have to take care that the Nginx master process keeps running as root (with the workers as the www-data user), because only root is allowed to bind to ports 80 and 443 (which I'm assuming you are using for your website).
If you work out the kinks though it's entirely possible and has been done many times for situations like yours.
With that said I think it would be a lot of extra work setting up an entire apache webserver just to run squirrelmail. You can configure squirrelmail to work with Nginx and it's not that difficult.
Tutorial here.
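If you go that route, a minimal SquirrelMail server block might look something like this (port, paths, and the PHP-FPM socket are assumptions for a stock Ubuntu 14.04 install):

server {
    listen 8080;                        # your chosen, non-conflicting mail port
    server_name mail.example.com;       # placeholder hostname
    root /usr/share/squirrelmail;       # default package install path (assumed)
    index index.php;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;   # PHP 5 FPM socket on 14.04 (assumed)
    }
}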
Imho that is the easiest way forward with the fewest possible ways to crash your entire server and be forced to start over. Which brings me to another point.
Before you do any work on a server, BACK UP EVERYTHING. Clone your hard drive if you can; if not, make sure you get all databases, static files, config files, etc.
It's also best practice to not work on a live production server if you can avoid it. Use a backup in the meantime while you play around with yours.
It will save you a lot of stress to work at a comfortable pace rather than work against the clock of hundreds of customers trying to view your website.
I know that doesn't apply in all cases, but it's a good practice to get into if you can to prepare you for any bigger jobs that come down the road.
2017/06/29 18:37:56 [crit] 2470#2470: ngx_slab_alloc() failed: no memory in upstream zone "backends"
2017/06/29 18:37:56 [error] 2470#2470: cannot add new server to upstream "<redacted>", memory exhausted
I am receiving a stream of critical errors in my log that indicate various upstream zones cannot be added to various upstreams because memory is exhausted.
That said, I have plenty of free memory. I'm guessing I need to increase some setting somewhere, but for the life of me, The Google doesn't seem to be able to tell me what I need to increase.
We use nginx as a reverse proxy for service discovery with our AWS ECS Docker container cluster.
Turns out the problem was that I had the zone "backends" set too small. I increased it from 64k to 256k and all is fine now. It also turns out that I should probably be using a different zone for each upstream rather than a shared one for all my upstreams.
This answer was compliments of the nginx professional support team. We're using a licensed version of nginx. Awesome support!
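For anyone hitting the same error, the shape of the fix (a sketch; addresses are placeholders) is the zone directive inside the upstream block:

upstream backends {
    zone backends 256k;       # shared-memory zone for upstream state; 64k was too small here
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# And ideally one zone per upstream instead of a shared one:
upstream api {
    zone api 256k;
    server 10.0.0.3:8080;
}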
I have a machine with only nginx installed (no Passenger) that acts as a load balancer, with the IPs of some machines in its upstream list. All the app machines have nginx with Phusion Passenger serving the main application. Now, some of the application machines are medium type while others are large type. As far as I know, the default nginx load balancing scheme is round robin. Since the load is distributed equally among the large and medium machines, the medium machines get overloaded when traffic is heavy, and the large machines' resources are wasted when it's light. I use New Relic to monitor the CPU and memory on these machines, plus a script to pull the data from New Relic, so is there any way to use this data to decide the traffic routing on the load balancer?
One way I know of is to monitor the machines, mark each one in the upstream as good or bad, replace the upstream list with the good ones, and reload nginx.conf each time without a complete restart. So, my second question: is this approach correct? In other words, does it have any drawbacks or will it cause any issues?
Third, and a more general question: is there a better way to tackle this load balancing issue?
You can use another load balancing algorithm that will distribute load more fairly: http://nginx.org/r/least_conn and/or configure weights.
Making decisions based on current CPU/memory usage isn't a good idea if your goal is faster request processing rather than prettier but meaningless numbers.
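A sketch of what that might look like (addresses and weights are placeholders):

upstream app_servers {
    least_conn;                   # send each new request to the server with the fewest active connections
    server 10.0.0.1:80 weight=2;  # large machine, gets twice the share
    server 10.0.0.2:80 weight=1;  # medium machine
}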