How many nginx buffers is too many?

Reading the nginx documentation, the proxy_buffers directive has this explanatory note:
This directive sets the number and the size of buffers, into which
will be read the answer, obtained from the proxied server. By default,
the size of one buffer is equal to the size of page. Depending on
platform this is either 4K or 8K.
The default is eight 4k or 8k buffers. Why did the authors of nginx choose eight, and not a higher number? What could go wrong if I add more buffers, or a bigger buffer size?

nginx is built to be efficient with memory and its default configurations are also light on memory usage. Nothing will go wrong if you add more buffers, but nginx will consume more RAM.
Eight buffers was probably chosen as the smallest effective count that is a power of two. Four would be too few, and sixteen would be more than nginx's defaults need.
How many buffers is “too many” depends on your performance needs, memory availability, and request concurrency. The threshold to stay under is the point at which your server has to swap memory to disk. The best answer is: as few buffers as are necessary to ensure nginx never writes buffered responses to disk (check your error logs to see whether it is).
Here are nginx configurations I use for a large PHP-FPM application on web hosts with 32 GB of RAM:
client_body_buffer_size 2m;
client_header_buffer_size 16k;
large_client_header_buffers 8 8k;
fastcgi_buffers 512 16k;
fastcgi_buffer_size 512k;
fastcgi_busy_buffers_size 512k;
These configurations were determined through some trial and error and by increasing values from nginx configuration guides around the web. The header buffers remain small because HTTP headers tend to be lightweight. The client and fastcgi buffers have been increased to deal with complex HTML pages and an XML API.
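As a rough sanity check, it helps to work out the worst-case memory these values permit per connection. The figures below are back-of-the-envelope arithmetic for the fastcgi settings above, assuming every buffer fills (in practice nginx allocates buffers only as needed, so typical usage is far lower):
# fastcgi_buffers 512 16k   -> up to 512 * 16 KB = 8 MB of response buffers per connection
# fastcgi_buffer_size 512k  -> plus 512 KB for the first part of the response
# 100 concurrently buffered responses   -> roughly 850 MB, comfortable on a 32 GB host
# 4,000 concurrently buffered responses -> roughly 33 GB, which would push the host into swap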

Related

Apache random slow image loading

I have a weird issue on my sites where some images load slowly. I have caching in place (Cloudflare caching, Brotli compression enabled); this question refers to the first "uncached" load. All of the images have been compressed as far as possible.
I'm wondering why some of the images have such a delay on the first load and if there's anything I can do to fix it.
Here's the network result from a site I didn't have cached.
As you can see, it doesn't seem to matter how large the images are. Some larger ones load faster, while some smaller ones are delayed.
Apache Global Configuration settings are as follows (default):
Start Servers 5
Minimum Spare Servers 5
Maximum Spare Servers 5
Server Limit 256
Max Request Workers 150
Max Connections Per Child 10000
Keep-Alive On
Keep-Alive Timeout 5
Timeout 300
Is there some configuration needed to allow these images to all resolve quickly? The CPU usage when loading my sites (uncached) is minuscule, it never goes over 1%.
In total, I count 17 images (all under 5kb) loading on this particular site.
I understand that Nginx/LiteSpeed would probably speed up the loading, but this question is strictly about Apache 2.4+, without either of those installed.
DigitalOcean $20 droplet (2x Intel E5-2650 v4 CPUs, 4 GB RAM, 80 GB SSD).
Apache 2.4+/CentOS/cPanel 90.
Edit: Removing the Apache cache headers and relying only on Cloudflare solved the "random delay". But the question still remains: why does the first "uncached" version take so long to load small images?

How should I configure Nginx to maximise the throughput for single Ruby application running on Passenger?

I want to benchmark Nginx+Passenger, and am wondering if there is anything that can be adjusted in the following nginx.conf to improve throughput and reduce latency. This is running on a 4-core i7 (8 hardware threads) with 16GB of main memory.
load_module /usr/lib/nginx/modules/ngx_http_passenger_module.so;

# One per CPU core:
worker_processes auto;

events {
}

http {
    include mime.types;
    default_type application/octet-stream;

    access_log off;
    sendfile on;
    keepalive_timeout 60;

    passenger_root /usr/lib/passenger;
    # 8 should be the number of CPU threads.
    passenger_max_pool_size 8;

    server {
        listen [::]:80;
        server_name passenger;
        root /srv/http/benchmark/public;

        passenger_enabled on;
        passenger_min_instances 8;
        passenger_ruby /usr/bin/ruby;
        passenger_sticky_sessions on;
    }
}
I am using wrk with multiple concurrent connections (e.g. 100).
Here are some specific issues:
Can the Nginx configuration be improved further?
Is it using HTTP/1.1 persistent connections to the Passenger application servers?
Is using a dynamic module causing any performance issues?
Do I need to do anything else to maximise the efficiency of how the integration is working?
I haven't set a passenger log file to ensure that logging IO is not a bottleneck.
Regarding the number of processes - I have 8 hardware threads, so I’ve set it to use 8 instances minimum.
Would it make sense to use threads per application server? I assume it's only relevant for IO bound workloads.
If I am pegging the processors with 8 application servers, does that indicate a sufficient amount of servers? Or should I try with, say, 16?
What is the expected performance difference between Nginx+Passenger vs Passenger Standalone?
Passenger dev here.
"Can the Nginx configuration be improved further?"
Probably; Nginx has a lot of levers, and if all you are doing is serving known payloads in a benchmark, then you can seriously improve performance with Nginx's caching, for example.
"Is it using HTTP/1.1 persistent connections to the Passenger application servers?"
No, it uses Unix domain sockets.
"Is using a dynamic module causing any performance issues?"
No; once nginx loads the library, making a function call into it is the same as any other C++ function call.
"Do I need to do anything else to maximize the efficiency of how the integration is working?"
You might want to look into Passenger's turbo caching, and/or nginx caching.
"I haven't set a passenger log file to ensure that logging IO is not a bottleneck."
Good, but turn the logging level down to 0 to avoid a bit of processing.
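For reference, this is how that might look in the http block, assuming the Passenger module's passenger_log_level directive (0 logs only critical messages, 7 is full debug):
# Keep Passenger's own logging to a minimum while benchmarking.
passenger_log_level 0;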
"Would it make sense to use threads per application server? I assume it's only relevant for IO bound workloads."
Not sure exactly what you mean, are you talking about Passenger's multithreading support or nginx's?
"If I am pegging the processors with 8 application servers, does that indicate a sufficient amount of servers?"
If you are CPU bound then adding more processes won't help.
"What is the expected performance difference between Nginx+Passenger vs Passenger Standalone?"
Not much; Passenger Standalone uses nginx internally. You might see some improvement if you use the built-in engine with Passenger Standalone, but that means you can't use caching, which is far more important.

What happens when nginx proxy_buffer_size is exceeded?

I am running a node server in AWS Elastic Beanstalk with Docker, which also uses nginx. One of my endpoints is responsible for image manipulation such as resizing etc.
My logs show a lot of ESOCKETTIMEDOUT errors, which would suggest they are caused by an invalid URL.
This is not the case, as it is fairly basic to handle that scenario, and when I open the supposedly invalid URL it loads an image just fine.
My research has so far led me to make the following changes:
Increase the timeout of the request module to 2000
Set the container uv_threadpool_size env variable to the max 128
While 1 has helped in improving response times somewhat, I don't see any improvements from 2. I have now come across the following warning in my server logs:
an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/0/12/1234567890 while reading upstream, …
This makes me think that the ESOCKETTIMEDOUT errors could be due to the proxy_buffer_size being exceeded. But, I am not sure and I'd like some opinion on this before I continue making changes based on a hunch.
So I have 2 questions:
Would the nginx proxy_buffer_size result in an error if a) the size is exceeded in cases of manipulating a large image or b) the volume of requests maxes out the buffer size?
What are the cost impacts, if any, of increasing the size (AWS memory, instance size, etc.)?
I have come across this helpful article but wanted some more opinions on whether this would even help in my scenario.
When the proxy buffers are exceeded, nginx spills the rest of the response to a temporary file, a kind of "swap", which uses your storage; if that storage is billable, your cost will increase. When you increase the buffer sizes you will use more RAM, which means you may have to pay for a larger instance, or try your luck with the current one.
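If you do decide to raise the buffers, a sketch like the one below keeps responses of up to about 1 MB in memory for the image endpoint only (the location path, upstream name, and sizes are illustrative assumptions, not recommendations):
location /images/resize {            # hypothetical path of the image-manipulation endpoint
    proxy_pass http://node_app;      # assumed upstream pointing at the Node container
    proxy_buffer_size 16k;           # buffer for the first part of the response
    proxy_buffers 64 16k;            # up to 64 * 16 KB = 1 MB kept in RAM per connection
    proxy_busy_buffers_size 32k;
    # proxy_max_temp_file_size 0;    # optional: never spill to disk, stream synchronously instead
}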
There are two things you should never make the user wait on while they are processed: e-mails and images. It can lead to timeouts or even whole-application unavailability. You can always use larger timeouts, or even more robust instances for those endpoints, but when it scales you WILL have problems.
I suggest you approach this differently: return an image placeholder response and process the images asynchronously. When they are available as versioned, resized images, you can serve them normally. There is an AWS article about doing something like this with Lambda.

Is client_body_buffer_size per-connection?

I'm not able to tell from reading the documentation whether client_body_buffer_size means per-connection or per-server (or does it depend on where the directive is set?)
I would like to create a large in-memory buffer (16m) to allow occasional large uploads to be speedy. But I want that to be a shared 16m -- if there are a lot of concurrent uploads then slowing down to disk-speed is fine.

Nginx proxy buffering - changing the buffers' number vs. size?

I was wondering and trying to figure out how these two settings:
proxy_buffers [number] [size];
may affect (improve / degrade) proxy server performance, and whether to change buffers' size, or the number, or both...?
In my particular case, we're talking about a system serving dynamically generated binary files that may vary in size (~60 - 200 kB). Nginx serves as a load balancer in front of 2 Tomcats that act as generators. I saw in Nginx's error.log that, with the default buffer settings, all proxied responses were buffered to a file, so what I found logical was to change the setting to something like this:
proxy_buffers 4 32k;
and the warning message disappeared.
What's not clear to me here is whether I should preferably set one buffer with a larger size, or several smaller buffers... E.g.:
proxy_buffers 1 128k; vs proxy_buffers 4 32k; vs proxy_buffers 8 16k;, etc...
What could be the difference, and how might it affect performance (if at all)?
First, it's a good idea to see what the documentation says about the directives:
Syntax: proxy_buffers number size;
Default: proxy_buffers 8 4k|8k;
Context: http, server, location
Sets the number and size of the buffers used for reading a response from the proxied server, for a single connection. By default, the buffer size is equal to one memory page. This is either 4K or 8K, depending on a platform.
The documentation for the proxy_buffering directive provides a bit more explanation:
When buffering is enabled, nginx receives a response from the proxied server as soon as possible, saving it into the buffers set by the proxy_buffer_size and proxy_buffers directives. If the whole response does not fit into memory, a part of it can be saved to a temporary file on the disk. …
When buffering is disabled, the response is passed to a client synchronously, immediately as it is received. …
So, what does all of that mean?
Any increase in buffer size applies per connection, so even an extra 4K per connection adds up quickly.
You may notice that the default buffer size is equal to the platform's memory page size. Long story short, choosing the "best" number may well go beyond the scope of this question and can depend on the operating system and CPU architecture.
Realistically, the difference between a larger number of smaller buffers and a smaller number of bigger buffers may depend on the memory allocator provided by the operating system, as well as on how much memory you have and how much of it you are willing to waste on allocations that never serve a good purpose.
E.g., I would not use proxy_buffers 1 1024k, because then you'd be allocating a 1 MB buffer for every buffered connection even when the content would easily fit in 4 KB; that would be wasteful (although, of course, there's also the little-known fact that memory which is allocated but never touched has been virtually free since the 1980s). There's likely a good reason that the default number of buffers was chosen to be 8 as well.
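To put rough numbers on the difference (back-of-the-envelope arithmetic with a hypothetical connection count, and assuming, as is generally the case, that nginx allocates the smaller buffers one at a time as the response grows):
# proxy_buffers 1 1024k -> one 1 MB buffer per buffered connection,
#                          even for a response that fits in 4 KB
# proxy_buffers 8 16k   -> at most 8 * 16 KB = 128 KB per connection,
#                          and a small response only touches the first buffer
# 10,000 buffered connections: ~10 GB in the first case vs. at most ~1.25 GB in the second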
Increasing the buffers at all might actually be a bit pointless if you cache the responses for these binary files with the proxy_cache directive, because Nginx will still be writing them to disk for caching, and you might as well not waste the extra memory on buffering those responses.
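For illustration, a minimal proxy_cache setup might look like the sketch below (the cache path, zone name, upstream name, and sizes are placeholders, not recommendations):
http {
    # Cache zone: 10 MB of keys in shared memory, up to 1 GB of responses on disk.
    proxy_cache_path /var/cache/nginx/binaries keys_zone=bincache:10m max_size=1g inactive=60m;

    server {
        location /generated/ {
            proxy_pass http://tomcat_backends;   # assumed upstream holding the two Tomcats
            proxy_cache bincache;
            proxy_cache_valid 200 10m;           # cache successful responses for 10 minutes
        }
    }
}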
A good operating system should already be capable of appropriately caching the data that gets written to disk, through its file-system buffer cache (the page cache). There is also a somewhat strangely-named Wikipedia article on the subject, since the "disk buffer" name was already taken for the HDD hardware article.
All in all, there's likely little need to duplicate buffering directly within Nginx. You might also take a look at varnish-cache for some additional ideas and inspiration about the subject of multi-level caching. The fact is, "good" operating systems are supposed to take care of many things that some folks mistakenly attempt to optimise through application-specific functionality.
If you don't do caching of responses, then you might as well ask yourself whether or not buffering is appropriate in the first place.
Realistically, buffering can be useful to better protect your upstream servers from the Slowloris attack vector; however, if you let your Nginx have megabyte-sized buffers, then you are essentially exposing Nginx itself to consuming an unreasonable amount of resources to service clients with malicious intent.
If the responses are too large, you might want to look into optimising things at the response level: e.g., splitting some content into individual files, compressing at the file level, or compressing on the wire with gzip via the HTTP Content-Encoding header.
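As one concrete illustration of the last point, on-the-wire compression can be enabled entirely in Nginx; the sketch below uses generic starting values (the MIME types and compression level are assumptions to adjust for the actual payloads, and gzip only helps for compressible content, not already-compressed binaries):
gzip on;
gzip_comp_level 5;          # moderate CPU cost with a reasonable ratio
gzip_min_length 1024;       # skip tiny responses where compression gains nothing
gzip_types application/json application/xml text/plain text/css application/javascript;
gzip_proxied any;           # also compress responses coming from the proxied backends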
TL;DR: this is really a pretty broad question, and there are too many variables that require non-trivial investigation to come up with the "best" answer for any given situation.

Resources