I'm trying to figure out the correct tuning for nginx on an AWS server that is wholly backed by EBS. The basic issue is that when downloading a ~100MB static file, I'm seeing consistent download rates of ~60K/s. If I use scp to copy the same file from the AWS server, I'm seeing rates of ~1MB/s. (So, I'm not sure EBS even comes into play here).
Initially, I was running nginx with basically the out-of-the-box configuration (for CentOS 6.x). But in an attempt to speed things up, I've played around with various tuning parameters to no avail -- the speed has remained basically the same.
Here is the relevant fragment from my config as it stands at this moment:
location /download {
    root /var/www/yada/update;
    disable_symlinks off;
    autoindex on;
    # Transfer tuning follows
    aio on;
    directio 4m;
    output_buffers 1 128k;
}
Initially, these tuning settings were:
sendfile on;
tcp_nopush on;
tcp_nodelay on;
Note, I'm not trying to optimize for a large amount of traffic. There is likely only a single client ever downloading at any given time. The AWS server is a 'micro' instance with 617MB of memory. Regardless, the fact that scp can download at ~1MB/s leads me to believe that HTTP should be able to match or beat that throughput.
Any help is appreciated.
[Update]
Additional information. Running a 'top' command while a download is running, I get:
top - 07:37:33 up 11 days, 1:56, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 63 total, 1 running, 62 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
and 'iostat' shows:
Linux 3.2.38-5.48.amzn1.x86_64 04/03/2013 _x86_64_ (1 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 0.03 0.03 0.02 99.89
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
xvdap1 0.23 2.66 8.59 2544324 8224920
Have you considered turning sendfile on? sendfile lets nginx hand static file transmission off to the kernel's sendfile() call, so it should be faster than any other option.
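For reference, a minimal sketch of the question's location block with sendfile re-enabled (the aio/directio lines are dropped here, since directio bypasses sendfile for files above its threshold; the directives are standard nginx, but treat the combination as a starting point rather than a tuned configuration):
location /download {
    root /var/www/yada/update;
    autoindex on;

    # Let the kernel copy file data straight from the page cache to the socket
    sendfile on;
    # Send headers and the beginning of the file in full packets
    tcp_nopush on;
}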
By default, scp will be much faster than your HTTP connection. I have a suggestion for you: if you are serving static files, I prefer to use S3 with CloudFront, which makes them faster to download. It is very difficult to achieve better performance when what you are serving is a plain file transfer.
Given that things work well on the same machine, you are probably being throttled. First check your usage policy with AWS; perhaps it's in the fine print. Alternatively, try different ISPs. If they all give you ~60 kB/s, you know it's AWS.
Under Elastic Beanstalk and behind an Application Load Balancer, I have a WebSockets application on Embedded Jetty.
Platform: Java 8 running on 64bit Amazon Linux/2.10.1
The issue is that the connection is being dropped at the one-minute mark, even though I have already set the Application Load Balancer's Idle Timeout to 300 seconds (which is Jetty's default timeout).
I did some research and now think it is a timeout imposed by Nginx, so I followed the answer here.
I could not deploy with an .ebextension formatted like that; Elastic Beanstalk would tell me that the file to be replaced did not exist. I then ran into this article, so I came up with the following script:
files:
  "/etc/nginx/conf.d/01_increase_timeouts.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      keepalive_timeout 300;
      proxy_connect_timeout 300;
      proxy_send_timeout 300;
      proxy_read_timeout 300;
      send_timeout 300;

container_commands:
  nginx_reload:
    command: "sudo service nginx reload"
This way I am able to deploy now. However, WebSocket connections continue to be dropped at the one-minute mark.
Can anyone point out what I am doing wrong or what I could try next?
Please, any help would be greatly appreciated.
I have an Artifactory behind nginx and uploading files larger than 4 GB fails. I am fairly certain that this is nginx's fault, because if the file is uploaded from/to localhost, no problem occurs.
nginx is set up to have client_max_body_size and client_body_timeout large enough for this not to be an issue.
Still, when uploading a large file (>4 GB) via curl, it fails after about half a minute. The only error message I get is HTTP 500 Internal Server Error, and nothing is written to nginx's error logs.
The problem in my case was insufficient disk space on the root filesystem. I have a huge disk mounted on /home, but only had about 4 GB left on /. I assume that nginx was saving incoming request bodies there and, once the space had filled up, the request was shut down.
The way I fixed it was to add those lines to the nginx.conf file (not all of them are necessarily required):
http {
    (...)
    client_max_body_size 100G;
    client_body_timeout 300s;
    client_body_in_file_only clean;
    client_body_buffer_size 16K;
    client_body_temp_path /home/nginx/client_body_temp;
}
The last line is the important part: it tells nginx to keep its temporary request-body files under /home, where there is plenty of space.
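If you suspect the same issue, a quick check might look like this (illustrative commands; the default temp location varies by build and distribution, so the last command just reads it out of the configure arguments):
# How much space is left on / compared to /home?
df -h / /home
# Where this nginx build puts client body temp files by default
nginx -V 2>&1 | tr ' ' '\n' | grep client-body-temp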
I'm using D7 with perusio's config, so far so good.
What I want to understand is how microcache works internally and what the configuration parameters in this line mean:
fastcgi_cache_path /var/cache/nginx/microcache levels=1:2
keys_zone=microcache:5M max_size=1G inactive=2h
loader_threshold=2592000000 loader_sleep=1 loader_files=100000;
I don't know, for example, whether max_size=1G (which I assume is the maximum size of the entire cache) refers to 1G of RAM, which could be a problem since my VPS only has 1G of RAM.
loader_files? Maybe the number of files that can be cached?
loader_sleep? Is it one second for microcaching? That sounds logical, but in testing the microcache's maximum time is about 15 seconds.
As for the other parameters, I have no idea. Please help me understand this line.
Thanks.
I've been doing web programming for a while now and am quite familiar with the LAMP stack. I've decided to try playing around with the nginx/starman/dancer stack and I'm a bit confused about how, at a high level, all the pieces relate to each other. Setting up the stack doesn't seem as straightforward as setting up the LAMP stack, but that's probably because I don't really understand how the pieces relate.
I understand the role nginx is playing - a lightweight webserver/proxy - but I'm confused about how starman relates to PSGI, plack and dancer.
I would appreciate a high-level breakdown of how these pieces relate to each other and why each is necessary (or not necessary) to get the stack setup. Thanks!
I've spent the last day reading about the various components and I think I have enough of an understanding to answer my own question. Most of my answer can be found in various places on the web, but hopefully there will be some value to putting all the pieces in one place:
Nginx: The first and most obvious piece of the stack to understand is nginx. Nginx is a lightweight webserver that can act as a replacement for the ubiquitous Apache webserver. Nginx can also act as a proxy server. Its use has been growing rapidly and it currently serves about 10% of all web domains. One crucial advantage of nginx is that it is asynchronous and event-driven instead of creating a process or thread to handle each connection. In theory this means that nginx is able to handle a large number of connections without using a lot of system resources.
PSGI: PSGI is a protocol (to distinguish it from a particular implementation of the protocol, such as Plack). The main motivation for creating PSGI, as far as I can gather, is that when Apache was first created there was no native support for handling requests with scripts written in e.g., Perl. The ability to do this was tacked on to Apache using mod_cgi. To test your Perl application, you would have to run the entire webserver, as the application ran within the webserver. In contrast, PSGI provides a protocol with which a webserver can communicate with a server written in e.g. Perl. One of the benefits of this is that it's much easier to test the Perl server independently of the webserver. Another benefit is that once an application server is built, it's very easy to switch in different PSGI-compatible webservers to test which provides the best performance.
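To make the protocol-versus-implementation distinction concrete, here is a minimal PSGI application (my own illustration, not part of the original write-up):
# app.psgi - a PSGI application is just a code reference
my $app = sub {
    my $env = shift;    # per-request environment hash supplied by the server
    return [
        200,                                 # HTTP status code
        [ 'Content-Type' => 'text/plain' ],  # response headers as an array ref
        [ "Hello from PSGI\n" ],             # response body as an array ref of strings
    ];
};
Any PSGI-compatible server can run it; for example plackup app.psgi during development (plackup ships with Plack, described next) or starman app.psgi.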
Plack: This is a particular implementation of the PSGI protocol that provides the glue between a PSGI-compatible webserver and a perl application server. Plack is Perl's equivalent of Ruby's Rack.
Starman: A Perl-based webserver that is compatible with the PSGI protocol. One confusion I had was why I would want to use both Starman and Nginx at the same time, but thankfully that question was answered quite well here on Stack Overflow. The essence is that it might be better to let nginx serve static files without requiring a Perl process to do that, while also allowing the Perl application server to run on a higher port.
Dancer: A web application framework for Perl. Kind of an equivalent of Ruby on Rails. Or to be more precise, an equivalent of Sinatra for Ruby (the difference is that Sinatra is a minimalist framework, whereas Ruby on Rails is a more comprehensive web framework). As someone who dealt with PHP and hadn't really used a web framework before, I was a bit confused about how this related to the serving stack. The point of web frameworks is they abstract away common tasks that are very frequently performed in web applications, such as converting database queries into objects/data structures in the web application.
Installation (on ubuntu):
sudo apt-get install nginx
sudo apt-get install build-essential curl
sudo cpan App::cpanminus
sudo cpanm Starman
sudo cpanm Task::Plack
sudo apt-get install libdancer-perl
Getting it running:
cd
dancer -a mywebapp
sudo plackup -s Starman -p 5001 -E deployment --workers=10 -a mywebapp/bin/app.pl
Now you will have a starman server running your Dancer application on port 5001. To make nginx send traffic to the server you have to modify /etc/nginx/nginx.conf and add a rule something like this to the http section:
server {
    server_name permanentinvesting.com;
    listen 80;

    location /css/ {
        alias /home/ubuntu/mywebapp/public/css/;
        expires 30d;
        access_log off;
    }

    location / {
        proxy_pass http://localhost:5001;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
The first location rule specifies that nginx should handle static content in the /css directory by getting it from /home/ubuntu/mywebapp/public/css/. The second location rule says that traffic to the webserver on port 80 should be sent to the Starman server to handle. Now we just need to start nginx:
sudo service nginx start
Your answer is correct as far as it goes, but it would be better to set up nginx the following way:
server {
    listen 80;
    server_name foo.example.com;

    location / {
        # Serve static files directly:
        if (-f $request_filename) {
            expires 30d;
            break;
        }
        # Pass on other requests to Dancer app
        proxy_pass_header Server;
        proxy_pass http://localhost:5001/;
    }
}
This makes nginx serve all static files (JavaScript and images), not just the CSS.
This example is taken from the 2011 Perl Dancer Advent :)
From nginx wiki:
"IfIsEvil ... Directive if has problems when used in location context, in some cases it doesn't do what you expect but something completely different instead. In some cases it even segfaults. It's generally a good idea to avoid it if possible...."
A better setup is:
server {
    listen 80;
    server_name foo.example.com;

    location / {
        # Checks the existence of files and uses the first match
        try_files $uri $uri/ @dancer;
    }

    location @dancer {
        # Pass on other requests to Dancer app
        proxy_pass_header Server;
        proxy_pass http://localhost:5001/;
    }
}
Correction for the answer from s.magri:
location @dancer {
    # Pass on other requests to Dancer app
    proxy_pass_header Server;
    proxy_pass http://localhost:5001;
}
I had to remove the trailing slash in the last proxy_pass directive. My version of nginx (1.10.3) won't start up with the trailing slash, because nginx does not allow proxy_pass to carry a URI part inside a named location.
I use nginx along with fastcgi. I see a lot of the following errors in the error logs
readv() failed (104: Connection reset by peer) while reading upstream and
recv() failed (104: Connection reset by peer) while reading response header from upstream
I don't see any problems when using the application. Are these errors serious, and how can I get rid of them?
In my case, php-fpm was running in the background and slow scripts were being killed after a configured timeout. Scripts taking longer than the specified time were killed, and nginx reported a recv() or readv() error because the connection was closed by the php-fpm engine/process.
Update:
Since nginx version 1.15.3 you can fix this by setting the keepalive_requests option of your upstream to the same number as your php-fpm's pm.max_requests:
upstream name {
    ...
    keepalive_requests number;
    ...
}
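For illustration, a hypothetical upstream where both sides agree on 500 requests per connection (the upstream name, socket path, and numbers below are assumptions, not values from this answer; keepalive_requests in upstream context needs nginx 1.15.3 or later):
upstream php_backend {
    server unix:/var/run/php-fpm.sock;   # assumed php-fpm listen socket
    keepalive 8;                         # keep a few idle connections open to php-fpm
    keepalive_requests 500;              # match pm.max_requests = 500 in the php-fpm pool
}
Keepalive connections to a FastCGI upstream are only used when fastcgi_keep_conn is on, which is exactly the setting the original answer below talks about.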
Original answer:
If you are using nginx to connect to php-fpm, one possible cause can also be having nginx's fastcgi_keep_conn parameter set to on (especially if you have a low pm.max_requests setting in php-fpm):
http|server|location {
    ...
    fastcgi_keep_conn on;
    ...
}
This may cause the described error every time a child process of php-fpm restarts (due to pm.max_requests being reached) while nginx is still connected to it. To test this, set pm.max_requests to a really low number (like 1) and see if you get even more of the above errors.
The fix is quite simple - just deactivate fastcgi_keep_conn:
fastcgi_keep_conn off;
Or remove the parameter completely (since the default value is off). This does mean your nginx will reconnect to php-fpm on every request, but the performance impact is negligible if you have both nginx and php-fpm on the same machine and connect via unix socket.
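For context, a minimal sketch of such a local unix-socket setup (the socket path is an assumption and must match the listen directive of your php-fpm pool):
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    # fastcgi_keep_conn defaults to off, so every request opens a fresh connection to php-fpm
    fastcgi_pass unix:/var/run/php-fpm.sock;
}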
Regarding this error:
readv() failed (104: Connection reset by peer) while reading upstream and recv() failed (104: Connection reset by peer) while reading response header from upstream
there was one more case where I could still see it.
Quick setup overview:
CentOS 5.5
PHP with PHP-FPM 5.3.8 (compiled from scratch with some 3rd-party modules)
Nginx 1.0.5
After looking at the PHP-FPM error logs as well and enabling catch_workers_output = yes in the php-fpm pool config, I found the root cause in this case was actually the amfext module (PHP module for Flash).
There's a known bug in this module, and a fix for it, which involves altering the amf.c file.
After fixing this PHP extension issue, the error above was no longer an issue.
This is a very vague error as it can mean a few things. The key is to look at all possible logs and figure it out.
In my case, which is probably somewhat unique, I had a working nginx + php / fastcgi config. I wanted to compile a new updated version of PHP with PHP-FPM and I did so. The reason was that I was working on a live server that couldn't afford downtime. So I had to upgrade and move to PHP-FPM as seamlessly as possible.
Therefore I had 2 instances of PHP.
One talking directly to fastcgi (PHP 5.3.4) - using TCP at 127.0.0.1:9000
One configured with PHP-FPM (PHP 5.3.8) - using a Unix socket at unix:/dir/to/socket-fpm
Once I started up PHP-FPM (PHP 5.3.8) on an nginx vhost using a socket connection instead of TCP, I started getting this upstream error on any fastcgi page taking longer than x minutes, whether it was using FPM or not. Typically it was pages doing large SELECTs in MySQL that took ~2 min to load. Bad, I know, but that is because of the back-end DB design.
What I did to fix it was add this in my vhost configuration:
fastcgi_read_timeout 5m;
Now this can be added in the nginx global fastcgi settings as well. It depends on your setup. http://wiki.nginx.org/HttpFcgiModule
Answer # 2.
Interestingly enough fastcgi_read_timeout 5m; fixed one vhost for me.
However I was still getting the error in another vhost, just by running phpinfo();
What fixed this for me was copying over a default production php.ini file and adding the config I needed into it.
What I had was an old copy of my php.ini from the previous PHP install.
Once I put in the default php.ini from 'shared' and added just the extensions and config I needed, the problem was solved and I no longer had the nginx readv() and recv() failed errors.
I hope one of these two fixes helps someone.
Also, it can be a very simple problem: an infinite loop somewhere in your code, or an endless attempt to connect to an external host from your page.
Sometimes this problem happens because of a huge number of requests. By default, pm.max_requests in php5-fpm may be set to 100 or below.
To solve it, increase its value depending on your site's request load, for example to 500.
After that, you have to restart the service:
sudo service php5-fpm restart
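For reference, a sketch of where that setting lives, assuming a stock php5-fpm layout (the pool file path and the value 500 are illustrative):
; /etc/php5/fpm/pool.d/www.conf
; Respawn each worker process after it has served this many requests
pm.max_requests = 500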
Others have mentioned the fastcgi_read_timeout parameter, which is located in the nginx.conf file:
http {
    ...
    fastcgi_read_timeout 600s;
    ...
}
In addition to that, I also had to change the setting request_terminate_timeout in the file: /etc/php5/fpm/pool.d/www.conf
request_terminate_timeout = 0
Source of information (there are also a few other recommendations for changing php.ini parameters, which may be relevant in some cases): https://ma.ttias.be/nginx-and-php-fpm-upstream-timed-out-failed-110-connection-timed-out-or-reset-by-peer-while-reading/