I have been dealing with this problem all day and it is driving me insane. All Google results and searches here lead to dead ends. I hope someone can work with me to provide a solution for myself and future victims. Here we go.
I am running a very popular website with over 3M page views a day. On average that is 34 page views per second, but more realistically, during peak hours, it gets to over 300 page views per second. Think of these as requests.
I am running an Ubuntu 10.04 64-bit server with 2 E5620 CPUs, 12GB RAM, and a Micron P300 6Gb/s SSD. During peak hours the CPU and memory load is moderate (20-30% CPU, and about half of the memory is in use).
The software that powers this site is: NGINX, MySQL, PHP5-FPM, PHP-APC, and Memcached. OK, now finally the meat of the post: here are my error logs. There are a bunch of these errors logged.
/var/log/php5-fpm
Jul 20 14:49:47.289895 [NOTICE] fpm is running, pid 29373
Jul 20 14:49:47.337092 [NOTICE] ready to handle connections
Jul 20 14:51:23.957504 [ERROR] [pool www] unable to retrieve process activity of one or more child(ren). Will try again later.
Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start
Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started
Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start
Jul 20 14:51:41.897178 [NOTICE] [pool www] child 29598 started
Jul 20 14:51:41.903286 [WARNING] [pool www] child 29398 exited with code 1 after 114.605761 seconds from start
Jul 20 14:51:41.903719 [NOTICE] [pool www] child 29600 started
Jul 20 14:51:41.907816 [WARNING] [pool www] child 29437 exited with code 1 after 114.601417 seconds from start
Jul 20 14:51:41.908253 [NOTICE] [pool www] child 29601 started
Jul 20 14:51:41.916002 [WARNING] [pool www] child 29513 exited with code 1 after 114.592514 seconds from start
Jul 20 14:51:41.916501 [NOTICE] [pool www] child 29602 started
Jul 20 14:51:41.916558 [WARNING] [pool www] child 29494 exited on signal 11 SIGSEGV after 114.597355 seconds from start
Jul 20 14:51:41.916873 [NOTICE] [pool www] child 29603 started
Jul 20 14:51:41.921389 [WARNING] [pool www] child 29502 exited with code 1 after 114.600405 seconds from start
/var/log/nginx/error.log
2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29578#0: *571695 readv() failed (104: Connection reset by peer) while reading upstream, client: 150.70.64.196, server: domain.com, request: "GET /page HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *571050 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.157.66, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *564892 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.161.214, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *456171 readv() failed (104: Connection reset by peer) while reading upstream, client: 93.223.33.135, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *471192 readv() failed (104: Connection reset by peer) while reading upstream, client: 74.90.33.142, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29580#0: *570132 readv() failed (104: Connection reset by peer) while reading upstream, client: 180.246.182.191, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
Finally, I want to point out that I did try disabling PHP-APC to see if it was a bug in the opcode cache, but the segfaults persisted. I also have PHP5-SUHOSIN installed and I disabled that too, but the errors keep happening.
This issue just happened to me.
PHP5-FPM was segfaulting on most of its children. In my case, we had 0 bytes available on the hard disk. A quick log shredding stopped the segfaults.
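If you want to rule out a full disk quickly, a minimal sketch like the following reports the free space on the filesystem that holds a given path. The function name `free_space_report` is only an illustration, and the path you pass in is an assumption about where your logs live.

```python
import shutil


def free_space_report(path="/"):
    """Return (free_bytes, percent_used) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.free, percent_used
```

Calling `free_space_report("/var/log")` on the affected server would show whether that filesystem is at or near 0 bytes free.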
2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
That looks like a problem with the configuration of your upstream server, or a reset by a router or the client.
Or nginx dropped the request, but running a site at three times the load you describe, I have never seen that message. The requested resource should not even be handed to a php-fpm process: it is a favicon.
As for the php-fpm messages:
The children all seem to stop after roughly 114 seconds. Is that a limit set in your php.ini file?
Segfaults in PHP often occur under high memory use. Your PHP scripts could be leaking memory and will eventually reach the memory limit. Having each php-fpm process serve fewer requests before it is recycled helps in dealing with memory leaks.
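To see whether the children really die at a common lifetime, and how many exit on a signal versus with an exit code, the php-fpm log can be summarized with a short script. This is only a sketch: the regular expression is written against the log lines quoted in this thread, and `summarize_exits` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start
#   child 29534 exited with code 1 after 114.518174 seconds from start
EXIT_RE = re.compile(
    r"child (?P<pid>\d+) exited "
    r"(?:on signal (?P<signal>\d+)|with code (?P<code>\d+))"
    r".*? after (?P<seconds>[\d.]+) seconds from start"
)


def summarize_exits(lines):
    """Group child lifetimes (in seconds) by exit reason."""
    lifetimes = defaultdict(list)
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None:
            continue  # not a child-exit line
        if match.group("signal") is not None:
            reason = "signal " + match.group("signal")
        else:
            reason = "code " + match.group("code")
        lifetimes[reason].append(float(match.group("seconds")))
    return dict(lifetimes)
```

Feeding it the lines of the php-fpm log (for example via `open(path)`) gives one list of lifetimes per exit reason. Tightly clustered lifetimes suggest a shared trigger, such as a reload or a timeout, rather than independent crashes.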
See my answer here that's related to your question (about nginx + magento and high load)
NGINX-FPM configuration settings for magento
It's not a direct answer per se, but it may help you configure your nginx + php-fpm to help eliminate the faults.
You are probably using suhosin
Disable the suhosin.ini under /etc/php5/fpm/conf.d and restart the php5-fpm service
Check the suhosin version and try to install another one.
I am getting this log entry and an Internal Server Error 500 message when using phpMyAdmin 5.1.1 with HTTP/3 enabled on an NGINX server. With HTTP/2 it works well.
2021/11/04 18:11:27 [alert] 21777#21777: *259 epoll_ctl(1, 16) failed (17: File exists), client: 37.234.***.***, server: *******, request: "POST /phpadmin/index.php?route=/config/get HTTP/3", host: "*******"
EDIT:
I installed a new nginx server with Cloudflare's http3. I used this script:
https://github.com/angristan/nginx-autoinstall/blob/master/nginx-autoinstall.sh
I installed PHP 7.4 FastCGI, then downloaded and unpacked phpMyAdmin 5.1.1.
I got these nginx error log entries after starting nginx and logging in to phpMyAdmin:
2021/11/04 18:54:32 [notice] 23537#23537: signal process started
2021/11/04 18:54:46 [alert] 23539#23539: *288 epoll_ctl(1, 16) failed (17: File exists), client: 37.234.122.188, server: harisnyauzlet.hu, request: "POST /phpadmin/index.php?route=/ HTTP/3", host: "harisnyauzlet.hu"
2021/11/04 18:55:10 [alert] 23539#23539: *288 epoll_ctl(1, 16) failed (17: File exists), client: 37.234.122.188, server: harisnyauzlet.hu, request: "POST /phpadmin/index.php?route=/ HTTP/3", host: "harisnyauzlet.hu"
2021/11/04 20:14:00 [crit] 23539#23539: *307 SSL_do_handshake() failed (SSL: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC) while SSL handshaking, client: 128.1.248.26, server: 0.0.0.0:443
2021/11/04 21:00:08 [crit] 23539#23539: *314 SSL_do_handshake() failed (SSL: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC) while SSL handshaking, client: 193.118.53.202, server: 0.0.0.0:443
2021/11/05 01:07:19 [crit] 23539#23539: *354 SSL_do_handshake() failed (SSL: error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC) while SSL handshaking, client: 128.14.134.134, server: 0.0.0.0:443
It's a known bug in Cloudflare's patch (Angristan's script uses their quiche/nginx patch). You get this error when you send a POST with a body.
But the Cloudflare team doesn’t use POST in their system, so it is not a priority for them.
Cloudflare team: It's definitely something we should fix at some point, though we don't really have an estimate right now as it's not very high priority (we don't use this option ourselves in production). Would definitely welcome patches though, so I'll leave this ticket open if anyone wants to work on this before we get to it.
This was from one year ago. More about the issue.
I have no idea how to fix this...
2019/01/14 05:15:02 [alert] 27307#27307: *9 write() to "/var/log/nginx/access.log" failed (28: No space left on device) while logging request, client: 108.162.226.175, server: titomi.cf, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock", host: "titomi.cf"
2019/01/14 05:15:22 [error] 27307#27307: *11 FastCGI sent in stderr: "PHP message: PHP Warning: mysqli_connect(): (HY000/2002): No such file or directory in /var/www/test/lib/common.lib.php on line 1443" while reading response header from upstream, client: 162.158.118.78, server: test.titomi.cf, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "test.titomi.cf"
I think this error is caused by the size of access.log.
How can I reduce the size of access.log?
You probably need to set up log rotation.
sudo vim /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    notifempty
    create 0640 www-data adm
}
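daily: rotate the logs on a daily basis.
missingok: do not report an error if a log file is missing.
rotate 14: keep the last 14 rotated logs.
compress: compress the old copies of the log files.
notifempty: do not rotate a log file that is empty.
create 0640 www-data adm: create the new log file with mode 0640, owner www-data and group adm.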
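You can run logrotate manually against that configuration file (-f forces a rotation even if one is not yet due):
sudo logrotate -f /etc/logrotate.d/nginx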
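Before rotating, it can help to confirm which files are actually eating the space. The following is a rough sketch that lists the largest files in a directory. `largest_files` is a made-up helper name, and `/var/log/nginx` is only assumed to be where your logs live.

```python
from pathlib import Path


def largest_files(directory, limit=5):
    """Return up to `limit` (path, size_in_bytes) pairs, largest first."""
    sized = [
        (entry.stat().st_size, entry)
        for entry in Path(directory).iterdir()
        if entry.is_file()  # skip subdirectories
    ]
    sized.sort(key=lambda item: item[0], reverse=True)
    return [(str(path), size) for size, path in sized[:limit]]
```

For example, `largest_files("/var/log/nginx")` would show whether access.log really is the file filling the disk, or whether something else is.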
I've installed OSticket application on my nginx server.
The web page opens only the first time. If I simply refresh the page, it gives a "connection reset by peer" upstream error.
I tried changing fastcgi_read_timeout and max_execution_time as described in https://laracasts.com/discuss/channels/forge/502-bad-gateway-with-large-file-uploads and https://www.scalescale.com/tips/nginx/configure-max_execution_time-in-php-fpm-using-nginx/#, but that didn't help.
nginx error log:
2017/08/07 22:15:08 [error] 26877#26877: *42 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.231.1.79, server: web.com, request: "GET /ticket/logo.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm/php-fpm.sock:", host: "web.com", referrer: "https://web.com/ticket/"
PHP-FPM error log:
WARNING: [pool www] child 26883 exited on signal 11 (SIGSEGV) after 4270.119171 seconds from start
I have a codebase that's running fine on HHVM.
I'm running some load tests with the Apache ab tool, but when I test with 100 concurrent requests I start to get 502 Bad Gateway errors.
I generally only get two types of logs in nginx's error log:
2015/01/10 09:42:48 [crit] 29794#0: *302 connect() to unix:/var/run/hhvm/hhvm.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.56.211, server: , request: "GET /api/v2/checkaccess HTTP/1.1", upstream: "fastcgi://unix:/var/run/hhvm/hhvm.sock:", host: "instela.com"
2015/01/10 09:42:26 [error] 29794#0: *264 connect() to unix:/var/run/hhvm/hhvm.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.56.211, server: , request: "GET /api/v2/checkaccess HTTP/1.1", upstream: "fastcgi://unix:/var/run/hhvm/hhvm.sock:", host: "instela.com"
Under high load this error occurs on all load-balanced backend servers, so it's not a server-specific problem (I hope).
Usually HHVM comes back after some time, but many times I have had to restart the daemon to bring the server back.
I am mostly using the default configuration. Here is my configuration: http://pastie.org/9823561
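To check that only these two failure types show up, and in what proportion, the nginx error log can be tallied by errno. This is only a sketch: it assumes each log entry sits on a single line in the actual file, and `tally_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "failed (<errno>: <message>)" part of an nginx error-log entry,
# e.g. "failed (111: Connection refused)".
FAILED_RE = re.compile(r"failed \((?P<errno>\d+): (?P<message>[^)]+)\)")


def tally_errors(lines):
    """Count nginx error-log entries per (errno, message) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(int(match.group("errno")), match.group("message"))] += 1
    return counts
```

Running it over the error log captured during a load test would show how often the socket was missing entirely (errno 2) versus present but refusing connections (errno 111).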
2014/03/31 23:06:50 [error] 25914#0: *765 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 173.77.251.136, server: wiki.resonant-rise.com, request: "POST /index.php?title=Chisel&action=submit HTTP/1.1", upstream: "fastcgi://127.0.0.1:9016", host: "wiki.resonant-rise.com", referrer: "http://wiki.resonant-rise.com/index.php?title=Chisel&action=edit"
2014/03/31 23:06:50 [error] 25914#0: *765 open() "/usr/share/nginx/html/50x.html" failed (2: No such file or directory), client: 173.77.251.136, server: wiki.resonant-rise.com, request: "POST /index.php?title=Chisel&action=submit HTTP/1.1", upstream: "fastcgi://127.0.0.1:9016", host: "wiki.resonant-rise.com", referrer: "http://wiki.resonant-rise.com/index.php?title=Chisel&action=edit"
I have a MediaWiki installation and an IPB installation. They both throw up errors, but this one error from MediaWiki prevents me from posting semi-large articles. I have tried a lot of the solutions out there: adding catch_workers_output = yes, adjusting pm.* settings. I am still not able to resolve this issue. I am at my wits' end trying to figure this one out.
PHP-FPM Conf
http://pastie.org/private/aczklc52ll9yv0uz5drcqg
PHP-FPM WWW.CONF
http://pastie.org/private/wod3xipxhm8ractksw7ha
NGINX VHOST for MEDIAWIKI
http://pastie.org/private/h9co8aykbdmfzk2bd5qq
If the failure depends on size of the pages, it has to do with how much work the operation causes. My wild guess would be: increase the timeout (you currently have send_timeout 60s;).
It's easy for the parse time of a very long page to go over 60 seconds, especially if you're on a low-powered server, have not tuned performance, or have enabled heavy parser extensions.
In my case, the PHP version of the project was different from the version of PHP I had been