Nginx worker_connections sets the maximum number of simultaneous connections that can be opened by a worker process. This number includes all connections (e.g. connections with proxied servers, among others), not only connections with clients. Another consideration is that the actual number of simultaneous connections cannot exceed the current limit on the maximum number of open files. I have a few questions around this:
What should be the optimal or recommended value for this?
What are the downsides of using a high number of worker connections?
Setting lower limits can be useful when you are resource-constrained. Some connections, for example keep-alive connections, are effectively wasting your resources (even if nginx is very efficient, which it is), and aren't required for correct operation of a general-purpose server.
Having a lower resource limit will indicate to nginx that you are low on physical resources, and those available should be allocated to new connections, rather than to serve the idling keep-alive connections.
What is the recommended value? It's the default.
The default is given in the documentation:
Default: worker_connections 512;
And it can be confirmed in the source code at event/ngx_event.c, too:
#define DEFAULT_CONNECTIONS 512
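For reference, here is a minimal sketch of where these settings live. The worker_connections value is the documented default; worker_rlimit_nofile is included only because the open-file limit mentioned in the question caps the usable number of connections, and its value here is purely illustrative:

# Sketch only: default worker_connections, plus an illustrative worker_rlimit_nofile
# so that the open-file limit does not become the real cap.
worker_processes      auto;
worker_rlimit_nofile  2048;   # should be >= worker_connections (higher when proxying,
                              # since each proxied client can use two file descriptors)

events {
    worker_connections  512;  # the documented default
}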
Related
With HTTP/1.1 (RFC 2616), there used to be a recommended limit of 2 connections per domain. More recent HTTP RFCs have relaxed this limitation, but still warn clients to be conservative when opening multiple connections:
According to RFC 7230, section 6.4, "a client ought to limit the number of simultaneous open connections that it maintains to a given server".
More specifically, HTTP/2 aside, browsers these days impose a per-domain limit of 6-8 connections when using HTTP/1.1. From what I'm reading, these guidelines are intended to improve HTTP response times and avoid congestion.
Can someone help me understand what would happen to congestion and response times if many connections were opened per domain? It doesn't sound like an HTTP server problem, since the number of connections a server can handle seems like an implementation detail. The explanation above seems to say it's about TCP performance? I can't find any more precise explanation of why HTTP clients limit the number of connections per domain.
The primary reasoning for this is resources on the server side.
Imagine that you have a server running Apache with the default of 256 worker processes. Imagine that this server is hosting an index page that has 20 images on it. Now imagine that 20 clients simultaneously connect and download the index page; each of these clients closes its connection after obtaining the page.
Since each of them will now establish connections to download the images, the number of connections grows multiplicatively. Consider what happens if every client is configured to open up to ten simultaneous connections in parallel to speed up rendering of the page: that very quickly takes us to 200 simultaneous connections, approaching the 256 worker processes Apache has available (again, by default, with prefork).
For the server, resources must be balanced to be able to serve the most likely load, but the clients help with this tremendously by throttling connections. If every client felt free to establish 100+ connections to a server in parallel, we would very quickly DoS lots of hosts. :)
I'm using
red:set_keepalive(max_idle_timeout, pool_size)
(From here: https://github.com/openresty/lua-resty-redis#set_keepalive)
with Nginx and trying to determine the best values to use for max_idle_timeout and pool_size.
If my worker_connections is set to 1024, does it make sense to have a pool_size of 1024?
For max_idle_timeout, is 60000 (1 minute) too "aggressive"? Is it safer to go with a smaller value?
Thanks,
Matt
I think the Check List for Issues section of the official documentation has a good guideline for sizing your connection pool:
Basically if your NGINX handle n concurrent requests and your NGINX has m workers, then the connection pool size should be configured as n/m. For example, if your NGINX usually handles 1000 concurrent requests and you have 10 NGINX workers, then the connection pool size should be 100.
So, if you expect 1024 concurrent requests that actually connect to Redis then a good size for your pool is 1024/worker_processes. Maybe a few more to account for uneven request distribution among workers.
Your keepalive timeout should be long enough to account for the way traffic arrives. If your traffic is constant, you can lower your timeout, or stay with 60 seconds; in most cases a longer timeout won't make any noticeable difference.
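As a concrete illustration of the n/m rule, here is a minimal sketch assuming roughly 1000 concurrent requests spread across 10 workers (so pool_size = 100) and a 60-second idle timeout; the location, host, port and key name are made up:

location /cache {
    content_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(1000)  -- 1 s connect/send/read timeout

        local ok, err = red:connect("127.0.0.1", 6379)
        if not ok then
            ngx.log(ngx.ERR, "redis connect failed: ", err)
            return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
        end

        local val, err = red:get("some_key")  -- illustrative key
        -- ... use val ...

        -- Return the connection to the per-worker pool instead of closing it:
        -- idle for up to 60 s, at most 100 pooled connections per worker
        -- (roughly 1000 concurrent requests / 10 workers).
        local ok, err = red:set_keepalive(60000, 100)
        if not ok then
            ngx.log(ngx.ERR, "failed to set keepalive: ", err)
        end
    }
}

Note that the pool passed to set_keepalive is per worker process, which is why the division by the worker count matters.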
According to nginx documentation on limit_req_zone
One megabyte zone can keep about 16 thousand 64-byte states. If the zone storage is exhausted, the server will return the 503 (Service Temporarily Unavailable) error to all further requests.
I wonder how these zones get cleared. For example, if we have something like
limit_req_zone $binary_remote_addr zone=one:1m rate=1r/s;
and the number of unique users per day exceeds 16,000, does that mean the zone will overflow and other users will start getting the 503 error for the configured location? Or is there a period of user inactivity after which the memory holding that user's state is freed?
My main concern here is to set an optimal zone size without the risk of it getting exhausted under high load.
This should be verified, but as I understand it, the lifetime of zone entries is tied to the active connections.
So zone=one:1m can hold up to 16K unique IPs among the currently (simultaneously) active connections (the total number of active connections at any moment can exceed 16K, because several connections can be opened from the same IP).
So the zone size in MB should be >= (number of unique IPs with simultaneous connections) / 16K.
Note that if users share a single IP behind NAT, which is fairly common for ex-USSR providers, you will be limiting the request rate for that whole group of users, which can be very inconvenient for them; to handle this case you should scale the rate accordingly, i.e. rate = (simultaneous users behind one IP) r/s.
From https://www.nginx.com/blog/rate-limiting-nginx
If storage is exhausted when NGINX needs to add a new entry, it removes the oldest entry. If the space freed is still not enough to accommodate the new record, NGINX returns status code 503 (Service Temporarily Unavailable). Additionally, to prevent memory from being exhausted, every time NGINX creates a new entry it removes up to two entries that have not been used in the previous 60 seconds.
>16K entries a day is nothing to worry about: NGINX wipes entries that have been inactive for more than a minute.
But if the number of simultaneously active entries exceeds 16K, it gets problematic, in that entries (and state) still in use might be lost.
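Following that logic, here is a sketch of a more generously sized zone, assuming you want room for roughly 160 thousand tracked client IPs at peak (1 MB holds about 16 thousand 64-byte states, so 10 MB holds about 160 thousand); the zone name, rate and burst values are illustrative:

# Sketch: 10m of shared memory for ~160K states keyed by client IP.
limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/s;

server {
    location /login/ {
        # burst absorbs short spikes; nodelay serves them without queueing delay
        limit_req zone=perip burst=5 nodelay;
    }
}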
This is my nginx status below:
Active connections: 6119
server accepts handled requests
418584709 418584709 455575794
Reading: 439 Writing: 104 Waiting: 5576
The value of Waiting is much higher than Reading and Writing; is that normal?
Is it because keep-alive connections are open?
But if I send a large number of requests to the server, the values of Reading and Writing don't increase, so I think there must be a bottleneck in nginx or somewhere else.
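For reference, status output like the above comes from the stub_status module; here is a minimal sketch of exposing it (the listen address, location path and access rules are illustrative):

server {
    listen 127.0.0.1:8080;

    location /nginx_status {
        stub_status;       # produces the Active / Reading / Writing / Waiting counters
        allow 127.0.0.1;   # keep the status page local
        deny  all;
    }
}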
The Waiting count is Active - (Reading + Writing), i.e. connections still open and waiting for either a new request or the keepalive expiration.
You could change the keepalive default (which is 75 seconds)
keepalive_timeout 20s;
or tell the browser when it should close the connection by adding an optional second timeout, sent to the browser in the Keep-Alive header
keepalive_timeout 20s 20s;
but on this nginx page about keepalive you can see that some browsers do not care about the header (in any case, your site wouldn't gain much from this optional parameter).
Keepalive is a way to reduce the overhead of creating connections, since most of the time a user will navigate through the site, etc. (plus the multiple requests from a single page to download CSS, JavaScript, images, and so on).
It depends on your site; you could reduce the keepalive, but keep in mind that establishing connections is expensive. This is a trade-off you have to refine based on the site's statistics. You could also decrease the timeout little by little (75s -> 50s, then a week later 30s, ...) and see how the server behaves.
You don't really want to fix it, as "waiting" means keep-alive connections. They consume almost no resources (a socket plus about 2.5 MB of memory per 10,000 connections in nginx).
Are the requests short-lived? It's possible they're reading/writing and then closing in a short amount of time.
If you're genuinely interested in testing whether nginx is the bottleneck, you could set keep-alive to 0 in your nginx config:
keepalive_timeout 0;
The nginx documentation says
max_clients = worker_processes * worker_connections
but how does keepalive factor into this? I have my configuration set up with 2 worker_processes and 8192 worker_connections; that means I can theoretically handle a maximum of 16384 concurrent connections. Pushing out 16384 streams of data concurrently is ginormous, but if I have a 60s keepalive_timeout, then with each client hogging a connection for up to a minute, that number takes on a completely different meaning. Which is it?
Connected to all this is the $connection variable that can be used with the log_format directive. I defined the following log format so I could analyze the server's performance:
log_format perf '$request_time $time_local $body_bytes_sent*$gzip_ratio $connection $pipe $status $request_uri';
That $connection variable is reporting values around 11-12 million! I'm no math major, but obviously that number is way higher than worker_processes * worker_connections. So what is it supposed to represent?
In short, I'm trying to figure out how to determine a good value for worker_connection.
$connection is a counter, not the number of connections currently in use, so it is expected to grow.
Keepalive connections still occupy connection slots, so the room left for new clients is worker_processes * worker_connections minus the number of keepalive connections.
Just imagine the whole picture: the first client connects to you, gets a file, and then the browser keeps the connection open for 60 seconds. Another client connects, gets a file, and keeps its connection open too. At the end of the first minute you may have, in the worst case, all the clients that requested something from you in the last 60 seconds still keeping their connections open.
So, in the worst case, you will serve worker_processes * worker_connections / keepalive_timeout new connections per second, i.e. about 270 for your numbers. If you need more, you should allocate more connections, just for serving keepalives: see my answer in "Tuning nginx worker_process to obtain 100k hits per min".
AFAIK nginx can hold 10K inactive (keepalive) connections in about 2.5 MB of memory, so increasing worker_connections is cheap, very cheap. I think the main bottleneck here may be your OS itself.
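To make that rule of thumb concrete, here is a sketch using the asker's setup (2 workers, 60 s keepalive), sized for an assumed target of about 500 new clients per second; all of these numbers are illustrative:

# new clients per second ~= worker_processes * worker_connections / keepalive_timeout
# For ~500 new clients/s with 2 workers and a 60 s keepalive:
#   worker_connections ~= 500 * 60 / 2 = 15000
worker_processes      2;
worker_rlimit_nofile  30000;      # leave headroom above worker_connections

events {
    worker_connections  15000;
}

http {
    keepalive_timeout  60s;
}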