What happens when nginx proxy_buffer_size is exceeded?

I am running a node server in AWS Elastic Beanstalk with Docker, which also uses nginx. One of my endpoints is responsible for image manipulation such as resizing etc.
My logs show a lot of ESOCKETTIMEDOUT errors, which indicate it could be caused by an invalid url.
This is not the case: handling that scenario is fairly basic, and when I open the supposedly invalid url, it loads an image just fine.
My research has so far led me to make the following changes:
1. Increase the timeout of the request module to 2000.
2. Set the container's uv_threadpool_size env variable to the maximum of 128.
While 1 has helped in improving response times somewhat, I don't see any improvement from 2. I have now come across the following warning in my server logs:
an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/0/12/1234567890 while reading upstream
This makes me think that the ESOCKETTIMEDOUT errors could be due to the proxy_buffer_size being exceeded. But, I am not sure and I'd like some opinion on this before I continue making changes based on a hunch.
So I have 2 questions:
1. Would nginx's proxy_buffer_size result in an error if (a) the size is exceeded while manipulating a large image, or (b) the volume of requests maxes out the buffer size?
2. What are the cost impacts, if any, of increasing the size (AWS memory, instance size, etc.)?
I have come across this helpful article, but I wanted some more opinions on whether this would even help in my scenario.

When the buffer size is exceeded, nginx writes the rest of the response to a temporary file, which acts as a kind of "swap". That uses your storage, and if the storage is billable your cost will increase. When you increase the proxy_buffer_size value you will use more RAM, which means you may have to pay for a larger instance, or try your luck with the current one.
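If you do end up tuning it, these are the directives involved. A minimal sketch, assuming a Node app on 127.0.0.1:3000 and purely illustrative sizes (not a recommendation):

location /images/ {
    proxy_pass http://127.0.0.1:3000;    # hypothetical address of the Node app
    proxy_buffer_size       16k;         # buffer for the headers / first part of the response
    proxy_buffers           8 64k;       # per-connection buffers for the body
    proxy_busy_buffers_size 128k;
    # Cap or disable spilling to disk; with 0, once the buffers are full nginx
    # reads from the upstream only as fast as the client reads.
    proxy_max_temp_file_size 0;
}

Note that the "buffered to a temporary file" message itself is only a warning, not an error, so on its own it does not necessarily explain the ESOCKETTIMEDOUT errors.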
There are two things you should never make the user wait on while they are processed: e-mails and images. Doing so can lead to timeouts or even whole-application unavailability. You can always use larger timeouts, or even more robust instances for those endpoints, but when it scales you WILL have problems.
I suggest you approach this differently: return an image placeholder response and process the images asynchronously. Once they are available as versioned, resized images, you can serve them normally. There is an AWS article about doing something like this with Lambda.

Related

Can't stream more than 5-6 static videos simultaneously on a single client

Intro
Hi! First, let me say that I am a networking n00b. With that in mind:
We have a small video conferencing app, dedicated to eLearning, which does live streaming through janus (webRTC). We offer our clients the possibility to record their videos. We've got the live streaming part running pretty well now, but we are having a small issue with video playback.
When we are playing back the videos, as long as there aren't many large video files to stream, everything plays back correctly. However, we have had fringe cases of trouble for a while, and I was sent to investigate them.
Problem encountered
When playing back multiple large video files, I noticed that I can't get more than 4-5 large files to stream simultaneously on a single client.
When the limit is hit, there seems to be some kind of race-condition lock happening: a few videos are stuck, and a few are playing (usually 2 or 3). Then one of the playing videos will get stuck, and one of the stuck videos will start playing.
However, it doesn't affect playback on other clients. When it gets stuck, I can't even connect to the MinIO web interface from the same client, but I can from another client (i.e. another browser, or another machine). I can also stream as much from the other client as I can from the one that is stuck.
I've been testing different configurations on a test minio server by loading the same file many times from different tabs in Chrome, which seems to recreate the problem.
Server Description
The files are hosted on cloud storage that offers > 7 Gbps of bandwidth. The storage is mounted on a MinIO instance in a Kubernetes cluster, behind an NGINX Ingress Controller that serves as the single point of entry to the cluster, and so it also controls traffic to the other micro-services on the k8s cluster.
Each k8s node has a guaranteed bandwidth of > 250 Mbps, if that matters in this case.
The MinIO instance is mainly used to create transient sharing rights. We call the videos simply by pointing to their location using the DNS we set up for the service.
What has been investigated and tried
At first, I thought it might be a MinIO misconfiguration. However, I looked at the config files and the documentation and couldn't find anything that seemed to limit the number of connections / requests per client.
While reading, I stumbled upon the fact that Chrome allows no more than 6 concurrent HTTP/1.1 connections per host and thought I had hit the jackpot. But then I went and looked, and the protocol used to get the files is already HTTP/2 (h2).
Then I went one level higher and looked through the configuration of the NGINX Ingress Controller. Here again, everything seems OK:
events {
    multi_accept on;
    worker_connections 16384;
    use epoll;
}
[...]
http2_max_field_size 4k;
http2_max_header_size 16k;
http2_max_requests 1000;
http2_max_concurrent_streams 128;
[...]
So I've been scouring the net for a good while now, and I'm getting more and more confused about what to investigate next, so I thought I'd come here and ask my very first StackOverflow question.
So, is there anything I could do with the current setup to make it so we can stream more large files simultaneously? If not, what are your thoughts and recommendations?
Edit:
I've found a workaround by searching hard enough: Increase Concurrent HTTP calls.
At first I was not a fan: HTTP/2 is supposed, from my understanding, to support a lot of parallel requests. However, I think I found the crux of the problem here: https://en.wikipedia.org/wiki/Head-of-line_blocking
Further research led me to find these mitigations to that problem : https://tools.ietf.org/id/draft-scharf-tcpm-reordering-00.html#rfc.section.3.2
I'll have to look into SCTP and see if it is something I'd like to implement, however. At first glance, that seems rather complicated and might not be worth the time investment.
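For reference, if the ingress itself had been the limit, the HTTP/2 directives already shown above are the knobs one would raise. A minimal sketch with illustrative values (and note that in this case the real culprit turned out to be head-of-line blocking, not these limits):

http2_max_concurrent_streams 256;    # streams allowed per HTTP/2 connection (128 above)
http2_max_requests           2000;   # total requests allowed over one HTTP/2 connection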

Nginx cache behavior for large files and disk full

Let's assume that Nginx is configured as a reverse proxy to serve very large files from a storage server. For demo purposes, the cache is configured to cache everything with no limits (no max_size). The server on which Nginx is installed has 50 GB of disk space.
I was wondering how nginx behaves in these situations:
In case "max-size" is not specified, I understand that nginx can use all the available disk space. But when the disk is full, what is the behavior? It removes the oldest cache?
If thousands of files are cached and a 50 GB file needs to be cached. Nginx will then clean the cache of those thousands of files to make room for one big file?
Nginx receives a request for a 60 GB file. According to the configuration, it must cache it for future requests. But the disk is only 50 GB. Does it start caching the 50 GB file knowing that it will not be able to succeed? Or does it understand that this is not possible and just passes the request without caching.
Thank you
I can answer the first two:
1. Yes, following LRU, but you need min_free specified in proxy_cache_path; see the proxy_cache_path documentation:
"When the size is exceeded or there is not enough free space, it removes the least recently used data."
2. Per 1, yes.
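For completeness, a minimal proxy_cache_path sketch with the size/eviction knobs mentioned above; the paths and values are illustrative assumptions:

# max_size caps the cache; min_free (nginx 1.19.1+) keeps that much disk free.
# Inactive entries and, when space runs out, least-recently-used entries are
# removed by the cache manager process.
proxy_cache_path /var/cache/nginx/big_files
                 levels=1:2
                 keys_zone=big_files:100m
                 max_size=40g
                 min_free=5g
                 inactive=7d
                 use_temp_path=off;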
Hope it helps, and I hope someone else can enlighten us on the 3rd question.

Nginx proxy buffering - changing the buffers' number vs size?

I was wondering and trying to figure out how these two settings:
proxy_buffers [number] [size];
may affect (improve / degrade) proxy server performance, and whether to change buffers' size, or the number, or both...?
In my particular case, we're talking about a system serving dynamically generated binary files that may vary in size (~60-200 kB). Nginx serves as a load balancer in front of 2 Tomcats that act as generators. I saw in Nginx's error.log that with the default buffer size setting, all proxied responses get buffered to a file, so what I found logical was to change the setting to something like this:
proxy_buffers 4 32k;
and the warning message disappeared.
What's not clear to me here is if I should preferably set 1 buffer with the larger size, or several smaller buffers... E.g.:
proxy_buffers 1 128k; vs proxy_buffers 4 32k; vs proxy_buffers 8 16k; etc.
What could be the difference, and how it may affect performance (if at all)?
First, it's a good idea to see what the documentation says about the directives:
Syntax: proxy_buffers number size;
Default: proxy_buffers 8 4k|8k;
Context: http, server, location
Sets the number and size of the buffers used for reading a response from the proxied server, for a single connection. By default, the buffer size is equal to one memory page. This is either 4K or 8K, depending on a platform.
The documentation for the proxy_buffering directive provides a bit more explanation:
When buffering is enabled, nginx receives a response from the proxied server as soon as possible, saving it into the buffers set by the proxy_buffer_size and proxy_buffers directives. If the whole response does not fit into memory, a part of it can be saved to a temporary file on the disk. …
When buffering is disabled, the response is passed to a client synchronously, immediately as it is received. …
So, what does all of that mean?
An increase of buffer size would apply per connection, so even 4K would be quite an increase.
You may notice that the size of the buffer is by default equivalent to one platform memory page. Long story short, choosing the "best" number might well go beyond the scope of this question, and may depend on the operating system and CPU architecture.
Realistically, the difference between a bigger number of smaller buffers, or a smaller number of bigger buffers, may depend on the memory allocator provided by the operating system, as well as how much memory you have and how much memory you want to be wasted by being allocated without being used for a good purpose.
E.g., I would not use proxy_buffers 1 1024k, because then you'd be allocating a 1 MB buffer for every buffered connection; even if the content would easily fit in 4 KB, that would be wasteful (although, of course, there's also the little-known fact that unused-but-allocated memory has been virtually free since the 1980s). There's likely a good reason that the default number of buffers was chosen to be 8 as well.
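As an illustration of that trade-off, a middle-of-the-road setup for ~60-200 kB responses might look like the sketch below; the upstream addresses and the sizes are assumptions, not a recommendation:

upstream tomcat_backend {
    server 10.0.0.1:8080;    # hypothetical addresses of the two Tomcats
    server 10.0.0.2:8080;
}

location / {
    proxy_pass              http://tomcat_backend;
    proxy_buffering         on;
    proxy_buffer_size       8k;      # headers + start of the response
    proxy_buffers           16 16k;  # at most 256 kB per connection, allocated as needed
    proxy_busy_buffers_size 32k;
}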
Increasing the buffers might actually be a bit pointless if you cache the responses for these binary files with the proxy_cache directive, because Nginx will still write them to disk for caching, and you might as well not waste the extra memory on buffering those responses.
A good operating system should already be capable of appropriately caching what gets written to disk, through its file-system buffer-cache functionality. (There is also a somewhat strangely named article about it on Wikipedia, as the "disk buffer" name was already taken by the HDD hardware article.)
All in all, there's likely little need to duplicate buffering directly within Nginx. You might also take a look at varnish-cache for some additional ideas and inspiration about the subject of multi-level caching. The fact is, "good" operating systems are supposed to take care of many things that some folks mistakenly attempt to optimise through application-specific functionality.
If you don't do caching of responses, then you might as well ask yourself whether or not buffering is appropriate in the first place.
Realistically, buffering may come in useful to better protect your upstream servers from the Slowloris attack vector; however, if you do let your Nginx have megabyte-sized buffers, then you're essentially exposing Nginx itself to consuming an unreasonable amount of resources to service clients with malicious intent.
If the responses are too large, you might want to look into optimising things at the response level, e.g. splitting some content into individual files, compressing at the file level, or compressing on the wire with gzip and HTTP Content-Encoding, etc.
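If on-the-wire compression is an option here (an assumption; it only pays off if the generated binary format actually compresses well), the nginx side is just a few directives, e.g. inside the http {} block:

gzip            on;
gzip_comp_level 4;
gzip_min_length 10240;                       # don't bother with responses below ~10 kB
gzip_proxied    any;                         # also compress proxied responses
gzip_types      application/octet-stream;    # assumed content type of the generated files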
TL;DR: this is really a pretty broad question, and there are too many variables that require non-trivial investigation to come up with the "best" answer for any given situation.

Long running-time script in PHP causes NGINX server to get very busy

I'll try to be very specific on this - it won't be easy, so please try to follow.
We have a script that runs with PHP on NGINX - PHP-fpm FastCGI.
This script gets information from the user trying to access it, and runs some algorithm on real-time. It cannot be a scheduled process running in the background.
Sometimes it even takes the page between 5 and 12 seconds to load, and that's OK.
Generally, we collect data from the user, make several outgoing requests to third-party servers, collect the data, analyse it and return a response to the user.
The problem is,
There are many users running this script, and the server gets very busy, since they are all active connections on the server waiting for a response.
We have 2 servers running under 1 load balancer, and that's not enough.
Sometimes the servers have more than 1,500 active connections at a time. You can imagine how these servers respond in that timeframe.
I'm looking for a solution.
We can add more and more servers to the LB, but it just sounds absurd that it's the only solution there is.
We have gone over that script and optimized it to the maximum, I can promise you that.
There is no real solution for the long-time running of that script, since it depends on 3rd party servers that take time to respond to us on live traffic.
Is there a solution you can think of to keep this script as it is, but somehow lower the impact of these active connections on the servers' overall functioning?
Sometimes they simply stop responding.
Thank you very much for reading!
A 3-month-old question, I know, but I can't help thinking that:
if you're sure that the sum of the network work for all requests to the third-party servers, plus the corresponding processing of the responses inside your PHP script, is much lower than the limits of your hardware,
then your PHP script is likely busy-looping inefficiently until all responses come back from the third-party servers.
If I were dealing with such an issue I'd do:
Stop using your custom external C++ curl thing, as the PHP script is busy-waiting for it anyway.
Google and read up on non-busy-looping usage of PHP's curl_multi implementation.
Hope this makes sense.
My advice is to set limited timeouts for requests and to use asynchronous requests for each third-party request.
For example, say your page has to display the results of 5 third-party requests. That means that inside the script you call cURL or file_get_contents 5 times, and the script blocks on each third-party response in turn, step by step. So if you have to wait up to 10 seconds for each response, you'll wait 50 seconds in total.
User calls the script -> script waits for everything to finish -> server is loaded for 50 seconds
Now, if each third-party request is sent asynchronously, the script's load time is reduced to the longest single request delay. You'll have a few smaller scripts with shorter lifetimes, which decreases the load on the server.
User calls the script -> script is loaded -> requests are sent -> no scripts are left waiting for responses and consuming your server's resources
May the AJAX be with you! ;)
This is a very old question, but since I had a similar problem I can share my solution. Long-running scripts impact various parts of the system and cause stress on web servers (active connections), PHP-FPM and MySQL/other databases. These tend to cause a number of knock-on effects, such as other requests starting to fail.
Firstly make sure you have netdata (https://github.com/netdata/netdata) installed on the server. If you are running many instances you might find having a Grafana/Prometheus setup is worth it too.
Next make sure it can see the PHP FPM process, Mysql and Nginx. There are many many things Netdata shows, but for this problem, my key metrics were:
Connections (mysql_local.connections) - is the database full of connections
PHP-FPM Active Connections (phpfpm_local.connections) - is PHP failing to keep up
PHP-FPM Request Duration (phpfpm_local.request_duration) - is the time to process going through the roof?
Disk Utilization Time (disk_util.sda) - this shows if the disk cannot keep up (100% = bad under load)
Users Open Files (users.files)
Make sure that you have sufficient file handles (https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/), and the disk is not fully occupied. Both of these will block you from making stuff work, so make them big on the troublesome server.
Next check Nginx has enough resources in nginx.conf:
worker_processes auto;
worker_rlimit_nofile 30000;
events {
    worker_connections 768;
}
This will give you time to work out what is going wrong.
Next look at php-fpm (/etc/php/7.2/fpm/pool.d/www.conf):
set pm.max_spare_servers high, such as 100
set pm.max_requests = 500, just in case you have a script that doesn't free itself properly (a sketch of the pool file follows below)
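A minimal sketch of that pool file with the two settings above; the other values are assumptions you would size to your own traffic:

; /etc/php/7.2/fpm/pool.d/www.conf (illustrative values only)
pm                   = dynamic
pm.max_children      = 200    ; assumed ceiling for worker processes
pm.start_servers     = 20     ; assumed
pm.min_spare_servers = 10     ; assumed
pm.max_spare_servers = 100    ; keep plenty of idle workers around, as suggested above
pm.max_requests      = 500    ; recycle workers in case a script doesn't free itself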
Then watch. The problem for me was that every request blocks an incoming connection. More requests to the same script will block more connections. The machine can be operating fine, but a single slow script doing a curl hit or a slow SQL statement will take that connection for its entire duration, so 30 seconds = 1 less PHP process available to handle incoming requests. Eventually you hit 500, and you run out. If you can, increase the number of FPM processes to match the frequency of slow-script requests multiplied by the number of seconds they run for. So if the script takes 2 seconds and gets hit 2 times a second, you will need a constant 4 additional FPM worker processes chewed up doing nothing.
If you can do that, stop there; the extra effort beyond that is probably not worth it. If it still feels like it will be an issue, create a second php-fpm instance on the box and send all requests for the slow script to that new instance (a routing sketch follows the list below). This allows you to fail those requests in isolation when they run for too long. This will give you the power to do two important things:
Control the amount of resources devoted to the slow script
Ensure that all other scripts are never blocked by the slow script and (assuming the OS limits are high enough) never affected by resource limits.
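A hedged sketch of the routing on the nginx side, assuming a second pool listening on its own socket; the socket paths and the script name are hypothetical:

upstream php_main { server unix:/run/php/php7.2-fpm.sock; }
upstream php_slow { server unix:/run/php/php7.2-fpm-slow.sock; }    # hypothetical second pool

server {
    listen 80;
    root /var/www/html;

    # The known-slow script gets its own, separately limited pool.
    location = /slow-report.php {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_read_timeout 60s;    # fail it in isolation instead of piling up
        fastcgi_pass php_slow;
    }

    # Everything else keeps using the main pool.
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass php_main;
    }
}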
Hope that helps someone struggling under load!

Harvesting Dynamic HTTP Content to produce Replicating HTTP Static Content

I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry, etags....) and for the purpose of this question, please assume that I have maxed out the opportunities to reduce load.
I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
Has anyone done this? Is it a good idea? Are there other technologies that I should investigate? I can program this, but I would prefer a solution built by configuring open-source technologies.
Thanks
I've used Squid before to reduce load on dynamically-created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.
Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.
It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:
Database query load on your DB server.
Business logic load on your web/application server.
Here is a JSP-specific overview of caching options.
I have seen huge performance increases by simply caching query results. Even adding a cache with a duration of 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory cache.
Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
My experience is with ASP, but the exact same mechanisms are available on JSP pages. In my experience, with even a small amount of caching you can expect a 5-10x increase in max requests per sec.
I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.
If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.
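Since the question asks about other technologies: the same origin-side cache role can also be filled by nginx's proxy_cache. A hedged sketch with illustrative values, named here only as an alternative to Squid / mod_disk_cache:

proxy_cache_path /var/cache/nginx/site keys_zone=site:50m max_size=20g inactive=24h;

upstream j2ee_backend { server 127.0.0.1:8080; }    # hypothetical app-server address

server {
    listen 80;
    location / {
        proxy_pass        http://j2ee_backend;
        proxy_cache       site;
        proxy_cache_valid 200 301 302 24h;           # content is refreshed nightly anyway
        add_header        X-Cache-Status $upstream_cache_status;
    }
}

Either way, the key point stands: prime the origin cache once, and let the caches nearer the clients fill themselves from it.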
