I have a huge problem on my site. Please help me fix it.
I have a site where users can download files from other sites (e.g. one-click hosters like uploaded.net). We act as a proxy: the user generates a link and downloads the file directly; our script downloads nothing to the server. A little bit like a premium link generator, but different. AND NOT ILLEGAL.
If a user downloads a file larger than 1 GB, the download is canceled when it reaches 1 GB.
In the log files I repeatedly found the error:
"Upstream timed out (110: Connection timed out) while reading response"
I have tried raising the settings, but that didn't help.
I tried the following:
1. nginx.conf:
fastcgi_send_timeout 300s;
fastcgi_read_timeout 300s;
2. nginx host file:
fastcgi_read_timeout 300;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 256k;
3. PHP.ini:
max_execution_time = 60 (but my PHP script automatically sets it to 0)
max_input_time = 60
memory_limit = 128M
4. PHP-FPM >> www.conf
pm.max_children = 25
pm.start_servers = 2
pm.min_spare_servers = 2
pm.max_spare_servers = 12
request_terminate_timeout = 300s
But nothing helps. What can I do to fix this problem?
Server/nginx info:
Memory: 32079MB
CPU: model name: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz (8 cores)
PHP: PHP 5.5.15-1~dotdeb.1 (cli) (built: Jul 24 2014 16:44:04)
NGINX: nginx/1.2.1
nginx.conf:
worker_processes 8;
worker_connections 2048;
But I think the time settings don't matter, because the download stops at exactly 1,604,408 KB every time. If I download at 20 kB/s the download takes more time, but it still cancels at exactly 1,604,408 KB.
Thank you for any help. If you need more information, please ask.
I had a similar problem, where the download would stop at 1024 MB with the error
readv() failed (104: Connection reset by peer) while reading upstream
Adding this to the nginx.conf file helped:
fastcgi_max_temp_file_size 1024m;
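For context, a minimal sketch of where that directive can live, assuming the download is proxied through a fastcgi location (the value is illustrative; setting it to 0 disables buffering of replies to temporary files altogether, as mentioned further down in this thread):
http {
    # ...
    # Raise (or disable with 0) how much of an upstream reply nginx will
    # spool to a temporary file before passing it on to the client.
    fastcgi_max_temp_file_size 1024m;
}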
We have been developing an enterprise application for the last two years. Based on a microservice architecture, we have nine services with their respective databases and an Angular frontend on NGINX that calls the microservices. During development we deployed these services and their databases on a Hetzner cloud server with 4 GB RAM and 2 CPUs over the internal network, and everything worked seamlessly. We upload all images, PDFs, and videos to AWS S3, and it has been smooth sailing; videos of all sizes were uploaded and played without any issues.
We liked Hetzner and decided to go to production with them as well. We took the first server, installed Proxmox on it, and deployed LXC containers and our services. I tested again here, and again no problems were found.
We then took another server, deployed Proxmox, and clustered the two. This is where the problem started: we hired a network guy who configured a bridged network between the containers of both nodes. Each container pings the other fine, and telnet also connects over the internal network. The MTU set on this bridge is 1400.
Primary problem: we are NOT able to upload videos over 2 MB to S3 from this network anymore.
Other problems: these are intermittent issues noted in the logs.
NGINX:
504 Gateway Time-out errors on multiple services, e.g.: upstream timed out (110: Connection timed out) while reading response header from upstream, client: 223.235.101.169, server: abc.xyz.com, request: "GET /courses/course HTTP/1.1", upstream: "http://10.10.XX.XX:8080//courses/course/toBeApprove", host: "abc.xyz.com", referrer: "https://abc.xyz.com/"
Tomcat:
com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 7J2EHKVDWQP3367G; S3 Extended Request ID: xGGCQhESxh/Mo6ddwtGYShLIeCJYbgCRT8oGleQu/IfguEfbZpTQXG/AIzgLnG2F5YuCqk7vVE8=), S3 Extended Request ID: xGGCQhESxh/Mo6ddwtGYShLIeCJYbgCRT8oGleQu/IfguEfbZpTQXG/AIzgLnG2F5YuCqk7vVE8=
(We increased all known timeouts, both in nginx and Tomcat.)
MySQL: 2022-09-08T04:24:27.235964Z 8 [Warning] [MY-010055] [Server] IP address '10.10.XX.XX' could not be resolved: Name or service not known
Other key points to note: we allow videos of up to 100 MB to be uploaded, so the known limits are set in the nginx and Tomcat configurations.
Nginx: client_max_body_size 100m;
And Tomcat: <Connector port="8080" protocol="HTTP/1.1" maxPostSize="102400" maxHttpHeaderSize="102400" connectionTimeout="20000" redirectPort="8443" />
In our reading and trials over the last 15 days, we stopped all firewalls while debugging: ufw on the OS, the Proxmox firewall, and even the data center firewall.
This is our nginx.conf
http {
proxy_http_version 1.1;
proxy_set_header Connection "";
##
client_body_buffer_size 16K;
client_header_buffer_size 1k;
client_max_body_size 100m;
client_header_timeout 100s;
client_body_timeout 100s;
large_client_header_buffers 4 16k;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 300;
send_timeout 600;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
gzip on;
gzip_comp_level 2;
gzip_min_length 1000;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain application/x-javascript text/xml text/css application/xml;
These are our primary test/debugging trials.
**1. Testing with a small video (273 KB)**
a. Nginx log: clean, nothing related to the operation.
b. Tomcat log:
Start- CoursesServiceImpl - addCourse - Used Memory:73
add course 703
image file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile#15476ca3
image save to s3 bucket
image folder name images
buckets3 lmsdev-cloudfront/images
image s3 bucket for call
imageUrl https://lmsdev-cloudfront.s3.amazonaws.com/images/703_4_istockphoto-1097843576-612x612.jpg
video file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile#13419d27
video save to s3 bucket
video folder name videos
input Stream java.io.ByteArrayInputStream#4da82ff
buckets3 lmsdev-cloudfront/videos
video s3 bucket for call
video url https://lmsdev-cloudfront.s3.amazonaws.com/videos/703_4_giphy360p.mp4
Before Finally - CoursesServiceImpl - addCourse - Used Memory:126
After Finally- CoursesServiceImpl - addCourse - Used Memory:49
c. S3 bucket:
S3 bucket screenshot: https://i.stack.imgur.com/T7daW.png
**2. Testing with a 2 MB video (fractionally less)**
a. The progress bar keeps running for about 5 minutes, then:
b. Nginx logs:
2022/09/10 16:15:34 [error] 3698306#3698306: *24091 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 223.235.101.169, server: login.pathnam.education, request: "POST /courses/courses/course HTTP/1.1", upstream: "http://10.10.10.10:8080//courses/course", host: "login.pathnam.education", referrer: "https://login.pathnam.education/"
c. Tomcat logs:
Start- CoursesServiceImpl - addCourse - Used Memory:79
add course 704
image file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile#352d57e3
image save to s3 bucket
image folder name images
buckets3 lmsdev-cloudfront/images
image s3 bucket for call
imageUrl https://lmsdev-cloudfront.s3.amazonaws.com/images/704_4_m_Maldives_dest_landscape_l_755_1487.webp
video file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile#45bdb178
video save to s3 bucket
video folder name videos
input Stream java.io.ByteArrayInputStream#3a85dab9
And after a few minutes:
com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection timed out (Write failed)
d. S3 bucket: no entry.
We then tried to upload the same video from our test server, and it was instantly uploaded to the S3 bucket.
Reading all the posts with similar problems, most are related to php.ini configurations and thus not relevant to us.
I have solved the issue now: the MTU in the LXC container was set differently from what was configured on the virtual switch. Proxmox does not offer a way to set the MTU while creating an LXC container (you expect the bridge MTU to be used), so you can miss that.
Go to the conf file of the container; in my case it is 100:
nano /etc/pve/lxc/100.conf
Find and edit this line:
net0: name=eno1,bridge=vmbr4002,firewall=1,hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth
and add the MTU value (matching the vswitch) at the end:
name=eno1,bridge=vmbr4002,firewall=1,hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth,mtu=1400 (my value at vswitch)
Reboot the container for a permanent change.
And it all worked like a charm for me. I hope this helps someone who also uses the Proxmox interface to create containers and thus missed configuring this via the CLI (a suggested enhancement for Proxmox).
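As a quick sanity check after the reboot (a sketch — it assumes the container interface is eno1 as in the config above, and 192.168.0.3 stands in for the peer container's IP):
ip link show eno1                 # the reported mtu should now be 1400
ping -M do -s 1372 192.168.0.3    # 1372 bytes payload + 28 bytes headers = 1400, sent with "don't fragment"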
I tried to restart nginx with a command, but an error occurred.
When I run "sudo systemctl restart nginx", this happens:
Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.
When I run "systemctl status nginx.service", this happens.
Mar 30 08:55:04 ip-172-31-22-186 nginx[2624]: nginx: [emerg] "proxy_buffers" directive invalid value in /etc/nginx/sites-enabled/...:19
Mar 30 08:55:04 ip-172-31-22-186 nginx[2624]: nginx: configuration file /etc/nginx/nginx.conf test failed
In the nginx.conf file:
location / {
....
proxy_buffer_size 0M;
proxy_buffers 4 0M;
proxy_busy_buffers_size 0M;
client_max_body_size 0M;
}
Is there a problem with the configuration here?
The proxy_buffers cannot be configured like this. Based on what they are used for and how they are designed, you can NOT set a buffer of 0M. That would set a memory (page) size of 0M.
proxy_buffers
Sets the number and size of the buffers used for reading a response from the proxied server, for a single connection. By default, the buffer size is equal to one memory page. This is either 4K or 8K, depending on a platform.
The proxy buffer size is equal to one memory page. To find your current memory page size, type:
getconf PAGE_SIZE
This should return 4096 (bytes) -> 4K.
So as you can see, there is a reason why the default is 4K or 8K, depending on your platform.
We have a great blog post about proxying in general.
https://www.nginx.com/blog/performance-tuning-tips-tricks/
By turning proxy_buffering on, you can configure the proxy buffers with the directives shown in the docs:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_buffering
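As a rough sketch of a valid configuration (the sizes are only examples, not recommendations; keep each buffer at least one memory page and tune to your responses):
location / {
    proxy_buffering on;
    proxy_buffer_size 8k;           # roughly one memory page
    proxy_buffers 8 16k;            # number and size of buffers per connection
    proxy_busy_buffers_size 32k;    # must satisfy the constraints described in the proxy_buffers docs
    client_max_body_size 100m;      # 0 disables the body-size check entirely, which is rarely what you want
}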
I have an Artifactory behind nginx and uploading files larger than 4 GB fails. I am fairly certain that this is nginx's fault, because if the file is uploaded from/to localhost, no problem occurs.
nginx is set up to have client_max_body_size and client_body_timeout large enough for this not to be an issue.
Still, when uploading a large file (>4 GB) via curl, it fails after about half a minute. The only error message I get is HTTP 500 Internal Server Error, and nothing is written to nginx's error logs.
The problem in my case was insufficient disk space on the root mount. I have a huge disk mounted on /home, but only had about 4 GB left on /. I assume that nginx was saving incoming request bodies there, and once it had filled up, the request was shut down.
The way I fixed it was to add these lines to the nginx.conf file (not all of them are necessarily required):
http {
(...)
client_max_body_size 100G;
client_body_timeout 300s;
client_body_in_file_only clean;
client_body_buffer_size 16K;
client_body_temp_path /home/nginx/client_body_temp;
}
The last line is the important part - there I tell nginx to fiddle with its files in the /home space.
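To confirm this kind of problem before touching the config, it is enough to compare free space on the partitions involved (a sketch — /var/lib/nginx as the default client body temp location is only an assumption; nginx -V shows the compiled-in paths):
df -h / /home                                   # free space on the root and /home mounts
nginx -V 2>&1 | tr ' ' '\n' | grep temp-path    # compiled-in temp directories
du -sh /var/lib/nginx                           # assumed location of client_body_temp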
I work for a rather busy internet site that often gets very large spikes of traffic. During these spikes hundreds of pages per second are requested, and this produces random 502 gateway errors.
We run nginx (1.0.10) and PHP-FPM on a machine with 4x SAS 15k drives (RAID 10), a 16-core CPU, and 24 GB of DDR3 RAM. We also use the latest XCache version. The DB is located on another machine; its load is very low and it has no issues.
Under normal load everything runs perfectly: system load is below 1, and the PHP-FPM status report never really shows more than 10 active processes at a time. There is always about 10 GB of RAM still available. Under normal load the machine handles about 100 pageviews per second.
The problem arises when huge spikes of traffic arrive and hundreds of pageviews per second are requested from the machine. I notice that the FPM status report then shows up to 50 active processes, but that is still way below the 300 max children we have configured. During these spikes the nginx status report shows up to 5000 active connections instead of the normal average of 1000.
OS Info: CentOS release 5.7 (Final)
CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (16 cores)
php-fpm.conf
daemonize = yes
listen = /tmp/fpm.sock
pm = static
pm.max_children = 300
pm.max_requests = 1000
I have not set up rlimit_files, because as far as I know it should use the system default if you don't.
fastcgi_params (only the values added to the standard file):
fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 128k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
fastcgi_intercept_errors on;
fastcgi_pass unix:/tmp/fpm.sock;
nginx.conf
worker_processes 8;
worker_connections 16384;
sendfile on;
tcp_nopush on;
keepalive_timeout 4;
Nginx connects to FPM via a Unix socket.
sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 1
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.eth0.rp_filter=1
net.ipv4.conf.lo.rp_filter=1
net.ipv4.ip_conntrack_max = 100000
limits.conf
* soft nofile 65536
* hard nofile 65536
These are the results for the following commands:
ulimit -n
65536
ulimit -Sn
65536
ulimit -Hn
65536
cat /proc/sys/fs/file-max
2390143
Question: If PHP-FPM is not running out of connections, the load is still low, and there is plenty of RAM available, what bottleneck could be causing these random 502 gateway errors during high traffic?
Note: by default this machine's ulimits were 1024. Since I changed them to 65536 I have not fully rebooted the machine, as it's a production machine and that would mean too much downtime.
This should fix it...
You have:
fastcgi_buffers 4 256k;
Change it to:
fastcgi_buffers 256 16k; # 4096k total
Also set fastcgi_max_temp_file_size 0; that will disable buffering to disk if replies start to exceed your fastcgi buffers.
Unix sockets accept 128 connections by default. It is good to put this line into /etc/sysctl.conf:
net.core.somaxconn = 4096
If that doesn't help in some cases, use a normal TCP port bind instead of the socket, because a socket at 300+ concurrent connections can block new requests, forcing nginx to return 502.
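A small sketch of how to apply the sysctl change without a reboot (the value mirrors the line above):
sudo sysctl -w net.core.somaxconn=4096    # takes effect immediately; the /etc/sysctl.conf entry makes it persistent
Note that php-fpm also has a per-pool listen.backlog setting, which may need raising as well so the socket actually benefits from the larger backlog.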
@Mr. Boon
I have 8 cores and 14 GB RAM, but the system gives gateway timeouts very often.
Implementing the fix below also didn't solve the issue. Still searching for better fixes.
You have: fastcgi_buffers 4 256k;
Change it to:
fastcgi_buffers 256 16k; # 4096k total
Also set fastcgi_max_temp_file_size 0, that will disable buffering to disk if replies start to exceed your fastcgi buffers.
Thanks.
I use nginx along with fastcgi. I see a lot of the following errors in the error logs:
readv() failed (104: Connection reset by peer) while reading upstream
recv() failed (104: Connection reset by peer) while reading response header from upstream
I don't see any problem using the application. Are these errors serious, and how do I get rid of them?
I was using php-fpm in the background, and slow scripts were getting killed after a set timeout because it was configured that way. Thus, scripts taking longer than the specified time would get killed, and nginx would report a recv or readv error as the connection was closed by the php-fpm engine/process.
Update:
Since nginx version 1.15.3 you can fix this by setting the keepalive_requests option of your upstream to the same number as your php-fpm's pm.max_requests:
upstream name {
...
keepalive_requests number;
...
}
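A minimal sketch with concrete numbers (the socket path and the value 1000 are only examples borrowed from configs elsewhere in this thread; this applies when fastcgi_keep_conn is on):
upstream php_backend {
    server unix:/tmp/fpm.sock;
    keepalive 8;                  # pool of idle connections kept open to php-fpm
    keepalive_requests 1000;      # match pm.max_requests = 1000 in the php-fpm pool
}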
Original answer:
If you are using nginx to connect to php-fpm, one possible cause can also be having nginx's fastcgi_keep_conn parameter set to on (especially if you have a low pm.max_requests setting in php-fpm):
http|server|location {
...
fastcgi_keep_conn on;
...
}
This may cause the described error every time a child process of php-fpm restarts (due to pm.max_requests being reached) while nginx is still connected to it. To test this, set pm.max_requests to a really low number (like 1) and see if you get even more of the above errors.
The fix is quite simple - just deactivate fastcgi_keep_conn:
fastcgi_keep_conn off;
Or remove the parameter completely (since the default value is off). This does mean your nginx will reconnect to php-fpm on every request, but the performance impact is negligible if you have both nginx and php-fpm on the same machine and connect via unix socket.
Regarding this error:
readv() failed (104: Connection reset by peer) while reading upstream and recv() failed (104: Connection reset by peer) while reading response header from upstream
there was one more case where I could still see it.
Quick setup overview:
CentOS 5.5
PHP with PHP-FPM 5.3.8 (compiled from scratch with some 3rd-party modules)
Nginx 1.0.5
After looking at the PHP-FPM error logs as well and enabling catch_workers_output = yes in the php-fpm pool config, I found the root cause in this case was actually the amfext module (PHP module for Flash).
There's a known bug in this module, which can be corrected by altering the amf.c file.
After fixing this PHP extension issue, the error above was no longer an issue.
This is a very vague error as it can mean a few things. The key is to look at all possible logs and figure it out.
In my case, which is probably somewhat unique, I had a working nginx + PHP/fastcgi config. I wanted to compile a new, updated version of PHP with PHP-FPM, and I did so. The reason was that I was working on a live server that couldn't afford downtime, so I had to upgrade and move to PHP-FPM as seamlessly as possible.
Therefore I had two instances of PHP:
One talking directly to fastcgi, using TCP 127.0.0.1:9000 (PHP 5.3.4)
One configured with PHP-FPM, using a Unix socket, unix:/dir/to/socket-fpm (PHP 5.3.8)
Once I started up PHP-FPM (PHP 5.3.8) on an nginx vhost using a socket connection instead of TCP, I started getting this upstream error on any fastcgi page taking longer than x minutes, whether it was using FPM or not. Typically it was pages doing large SELECTs in MySQL that took ~2 minutes to load. Bad, I know, but this is because of the back-end DB design.
What I did to fix it was add this in my vhost configuration:
fastcgi_read_timeout 5m;
This can be added in the global nginx fastcgi settings as well; it depends on your setup. http://wiki.nginx.org/HttpFcgiModule
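For reference, a minimal sketch of where that line sits in a vhost (the socket path is the one from the setup described above; the rest is illustrative):
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/dir/to/socket-fpm;
    fastcgi_read_timeout 5m;      # allow slow back-end queries to run for up to 5 minutes
}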
Answer #2:
Interestingly enough, fastcgi_read_timeout 5m; fixed one vhost for me.
However, I was still getting the error in another vhost, just by running phpinfo();.
What fixed this for me was copying over a default production php.ini file and adding the config I needed to it.
What I had was an old copy of my php.ini from the previous PHP install.
Once I used the default php.ini from 'shared' and just added the extensions and config I needed, this solved my problem and I no longer had the nginx readv() and recv() failed errors.
I hope one of these two fixes helps someone.
It can also be a very simple problem: an infinite loop somewhere in your code, or an endless attempt to connect to an external host on your page.
Sometimes this problem happens because of a huge number of requests. By default, pm.max_requests in php5-fpm may be 100 or below.
To solve it, increase its value depending on your site's requests, for example to 500, as shown below.
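A sketch of the pool change (the path matches the one mentioned later in this thread; 500 is just an example value):
; /etc/php5/fpm/pool.d/www.conf
pm.max_requests = 500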
And after that you have to restart the service:
sudo service php5-fpm restart
Others have mentioned the fastcgi_read_timeout parameter, which is located in the nginx.conf file:
http {
...
fastcgi_read_timeout 600s;
...
}
In addition to that, I also had to change the setting request_terminate_timeout in the file: /etc/php5/fpm/pool.d/www.conf
request_terminate_timeout = 0
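After changing both, reload or restart the services so they pick up the new settings (a sketch using the same commands seen elsewhere in this thread):
nginx -t && sudo systemctl reload nginx
sudo service php5-fpm restart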
Source of information (there are also a few other recommendations for changing php.ini parameters, which may be relevant in some cases): https://ma.ttias.be/nginx-and-php-fpm-upstream-timed-out-failed-110-connection-timed-out-or-reset-by-peer-while-reading/