PHP-FPM using 40% CPU for a Single Request - wordpress

(I googled and searched this forum for hours, found some topics, but none of them worked for me)
I'm using Wordpress with: Varnish + Nginx + PHP-FPM + APC + W3 Total Cache + PageSpeed.
As I'm using Varnish, the first time I call www.mysite.com it uses just 10% of CPU. Calling it a second time, the page is served from cache. The problem is when passing a request parameter in the URL.
For just 1 request (www.mysite.com?1=1), top shows:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7609 nginx 20 0 438m 41m 28m S 11.6 7.0 0:00.35 php-fpm
7606 nginx 20 0 437m 39m 26m S 10.3 6.7 0:00.31 php-fpm
After the page is fully loaded, the processes above are still active. After 2 seconds, they are replaced by another 2 php-fpm processes (below), which stay active for 3 seconds.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7665 nginx 20 0 444m 47m 28m S 20.9 7.9 0:00.69 php-fpm
7668 nginx 20 0 444m 46m 28m R 20.9 7.9 0:00.63 php-fpm
40% CPU usage just for 1 uncached request!
Strange things:
CPU usage is higher after the page has loaded
When I purge the cache (W3 and Varnish), it takes just 10% of CPU to load an uncached page
This high CPU usage only happens when passing a request parameter or in the WordPress admin
When I make 10 requests (pressing F5 10 times), the server stops serving and the php-fpm log shows:
WARNING: [pool www] server reached max_children setting (10), consider raising it
I raised that value to 20; same problem.
I'm using pm=ondemand (pm.max_children=10 and pm.max_requests=500).
Initially I was using pm=dynamic (pm.max_children=10, pm.start_servers=1, pm.min_spare_servers=1, pm.max_spare_servers=2, pm.max_requests=500) and the same problem happened.
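For reference, a minimal pm=ondemand pool section with these values looks like this (the socket path and idle timeout are illustrative, not copied from my actual file):
[www]
listen = /var/run/php-fpm/www.sock
pm = ondemand
pm.max_children = 10
pm.process_idle_timeout = 10s
pm.max_requests = 500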
Could anyone help, please? Any help would be appreciated!
PS:
APC is ON (98% Hits, 2% Misses)
Server is Amazon Micro (613MB RAM)
PHP 5.3.26 (fpm-fcgi)
Linux version 3.4.48-45.46.amzn1.x86_64 Red Hat 4.6.3-2 (I think it's based on CentOS 5)

First, reduce the stack of caches. Why use Varnish, which serves pages from memory, when you're already using W3 Total Cache, which serves from memory as well?
W3 Total Cache is CPU-intensive! It doesn't just cache items, it also compresses, minifies and merges files on the fly.
You have a total of 512MB of memory on your machine, which is not a lot, and your CPU power is less than a modern smartphone's. Memory access is extremely slow compared to a dedicated root server because of the Xen virtualization layer - that's why less is more.
Make sure W3 Total Cache is properly set up so it actually caches items, then warm up your cache and you should be fine.
Have a look at Google's nginx PageSpeed module https://github.com/pagespeed/ngx_pagespeed; it can do the same thing W3 Total Cache does, just much more efficiently, because it happens in the webserver, not in PHP.
Nginx can also serve directly from memcached http://www.kingletas.com/2012/08/full-page-cache-with-nginx-and-memcache.html (example article, might need some more investigation).
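A minimal sketch of that last idea, assuming the full page HTML is stored in memcached under a key derived from the URL (the key scheme and fallback location are assumptions, not taken from the article):
location / {
    set $memcached_key "$scheme://$host$request_uri";
    memcached_pass 127.0.0.1:11211;        # requires ngx_http_memcached_module
    default_type text/html;
    error_page 404 502 504 = @fallback;    # cache miss -> hand the request to PHP
}
location @fallback {
    try_files $uri $uri/ /index.php?$args;
}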

Problem solved!
For those who are having the same problem:
Check your Varnish configuration;
Check your WordPress plugins;
1) In my case, TTL was not configured in Varnish, so nothing was being cached.
This config worked for me:
sub vcl_fetch {
    if (!(req.url ~ "wp-(login|admin)")) {
        unset beresp.http.set-cookie;
        set beresp.ttl = 48h;
    }
}
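(This is Varnish 3 syntax. On Varnish 4 and later, the same logic would go into vcl_backend_response and look roughly like this:)
sub vcl_backend_response {
    if (bereq.url !~ "wp-(login|admin)") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 48h;
    }
}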
2) The high CPU usage AFTER the page loads was caused by a WordPress plugin called "Scroll Triggered Box".
It was doing some AJAX after the page had loaded. I disabled that plugin and the high load stopped.

There are two factors at play here:
You are using a micro instance, which has a burstable CPU profile. It can burst up to 2 ECUs, then be throttled to much less than 1 (some estimates put this at around 0.1 - 0.2 ECUs).
While you are logged in as an admin, WordPress caching plugins often bypass or reduce caching. W3 Total Cache should allow you to change this if you want caching on all the time.

Related

EC2 with wordpress - everyday running out of space (no space left on device) - can't start apache

I'm having the strangest problem for days now. I took over a company's WordPress website that was originally developed by another person - the codebase is a mess, but I was able to go over it and make sure it at least works.
The database is huge (70MB) and there are a lot of plugin dependencies on the site.
However, the site now generally works without issues, and I'm hosting it on an EC2 instance with a Bitnami stack for WordPress.
The weird thing though is that every day (for instance this morning) I check the site and it's down …
Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Additionally, a 503 Service Unavailable error was encountered while trying to use an ErrorDocument to handle the request.
When logging into the server with ssh and trying to restart apache I get this:
Failed to unmonitor apache: write /var/lib/gonit/state: no space left on device
Syntax OK
/opt/bitnami/apache2/scripts/ctl.sh : apache not running
Syntax OK
(98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
AH00015: Unable to open logs
/opt/bitnami/apache2/scripts/ctl.sh : httpd could not be started
Failed to monitor apache: write /var/lib/gonit/state: no space left on device
This is now the 3rd time in 3 days, even though I restored the server from a snapshot onto a 200GB volume (for testing purposes), and all site files including uploads only take up 5GB.
The site is running on an EC2 instance (t2.medium) with a 200GB volume now, and this morning I can't restart Apache. Yesterday evening, after restoring from the snapshot, the site worked well and normally - it's actually even fast.
I don't know where to start investigating here. What could cause the server to run out of disk space in one night?
Thanks,
Matt
Also, one of the weirdest things: I reset everything yesterday evening from an EC2 snapshot to a 200GB volume and attached it to the instance. Everything was working fine. I made some changes to the files, deleted some plugins, updated some settings.
And it seems this is all gone now. And I'm using an Elastic IP, so I couldn't have connected to the wrong instance or something.
Bitnami Engineer here. You will probably need to resize the disk of your instance, but you can investigate those issues later. These commands will show the directories holding a large number of files, and how much space each one uses:
cd /opt/bitnami
sudo find . -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
du -h -d 1
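If nothing under /opt/bitnami stands out, a system-wide pass restricted to the root filesystem usually finds the culprit quickly (this is a generic disk-usage sweep, not a Bitnami-specific command):
sudo du -xh / 2>/dev/null | sort -rh | head -n 20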
If MySQL is the service taking up the most space, you can try adding this line under the [mysqld] block of the /opt/bitnami/mysql/my.cnf configuration file:
expire_logs_days = 7
That will force MySQL to purge the server's old binary logs after 7 days. You will need to restart MySQL after that:
sudo /opt/bitnami/ctlscript.sh restart mysql
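If the binary logs have already filled the disk, you can also purge them immediately from a MySQL prompt instead of waiting for the expiry setting to kick in (this assumes binary logging is in fact what is eating the space):
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);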
More information here:
https://community.bitnami.com/t/something-taking-up-space-and-growing/64532/7
What you need to do is increase the size of the partition on the disk and the size of the file system on that partition. Even though you increased the volume size, these figures stayed unchanged. Creating another volume from a snapshot would not help either.
Check how to do it here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
Your df result shows
Filesystem     1K-blocks     Used Available Use% Mounted on
udev             2014560        0   2014560   0% /dev
tmpfs             404496     5872    398624   2% /run
/dev/xvda1      20263528 20119048    128096 100% /
tmpfs            2022464        0   2022464   0% /dev/shm
tmpfs               5120        0      5120   0% /run/lock
tmpfs            2022464        0   2022464   0% /sys/fs/cgroup
/dev/loop0         18432    18432         0 100% /snap/amazon-ssm-agent/1480
/dev/loop1         91264    91264         0 100% /snap/core/7713
/dev/loop2         12928    12928         0 100% /snap/amazon-ssm-agent/295
/dev/loop3         91264    91264         0 100% /snap/core/7917
tmpfs             404496        0    404496   0% /run/user/1000
where the root volume /dev/xvda1 has only 20GB and is at 100% usage, not 200GB as you mentioned.
When you increase the volume size while the instance is running, the change is not applied automatically. On your EC2 instance, you have to apply the change as follows:
sudo resize2fs /dev/xvda1
and check the size of the volume with df -h; you will then see that the size is 200GB.
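One caveat, following the AWS document linked above (a generic procedure, not specific to this instance): if the volume is partitioned, the partition itself has to be grown before resize2fs will see the extra space:
lsblk                         # check whether xvda1 is still smaller than xvda
sudo growpart /dev/xvda 1     # grow partition 1 to fill the enlarged volume (growpart is in the cloud-utils package)
sudo resize2fs /dev/xvda1     # then grow the ext4 filesystem
df -h                         # confirm the new size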

httpd using a high amount of memory?

On a t3.small Amazon EC2 instance, I'm running two WordPress websites. One is running WooCommerce and the other is a plain WordPress install with the Avada theme.
After editing a few pages, I notice the memory usage going up and up, and usually after about an hour the server stops responding (out of memory, possibly due to heavy swapping).
Running the following command, for example, shows:
ps aux | grep 'httpd' | awk '{print $6/1024 " MB";}'
15.1328 MB
416.688 MB
527.73 MB
372.43 MB
543.508 MB
2.45703 MB
The Apache server is running the prefork MPM with the following config:
<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxRequestWorkers 10
MaxConnectionsPerChild 1000
</IfModule>
Any hints on how to improve this would be greatly appreciated. Thank you.
Turned out my home page had a few images that were not loading via https, which caused a few connections to keep timing out, which in turn caused a high number of connections to stay open.
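As a sanity check on the prefork numbers above (a rough estimate, not a measurement from this server): with worker processes around 400-550MB resident, as in the ps output, MaxRequestWorkers 10 could need on the order of 10 × ~450MB ≈ 4.5GB, far more than the ~2GB a t3.small has, so heavy swapping is expected once enough workers are busy at once. Either the per-process footprint has to come down or MaxRequestWorkers has to be sized to the available RAM.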

Understanding Docker container resource usage

I have server running Ubuntu 16.04 with Docker 17.03.0-ce running an Nginx container. That server also has ConfigServer Security & Firewall installed. Shortly after starting the Nginx container I start receiving emails about "Excessive resource usage" with the following details:
Time: Fri Mar 24 00:06:02 2017 -0400
Account: systemd-timesync
Resource: Process Time
Exceeded: 1820 > 1800 (seconds)
Executable: /usr/sbin/nginx
Command Line: nginx: worker process
PID: 2302 (Parent PID:2077)
Killed: No
I fully understand that I can add exe:/usr/sbin/nginx to csf.pignore to stop these email alerts but I would like to understand a few things first.
Why is the "systemd-timesync" account being reported? That does not seem to have anything to do with Docker.
Why does the host machine seem to be reporting the excessive resource usage (the extended process time) when that is something running in the container?
Why are other docker containers not running Nginx not resulting in excessive resource usage emails?
I'm sure there are other questions but basically, why is this being reported the way it is being reported?
I can at least answer the first two questions:
Unlike real VMs, Docker containers are simply a collection of processes run under the host system's kernel. They just have a different view of certain system resources, including their own file hierarchy, their own PID namespace and their own /etc/passwd file. As a result, they will still show up if you run ps aux on the host machine.
The nginx container's /etc/passwd includes a user 'nginx' with UID 104 that runs the nginx worker process. However, in the host's /etc/passwd, UID 104 might belong to a completely different user, such as systemd-timesync.
As a result, if you run ps aux | grep nginx in the container, you might see
nginx 7 0.0 0.0 32152 2816 ? S 11:20 0:00 nginx: worker process
while on the host, you see
systemd-timesync 22004 0.0 0.0 32152 2816 ? S 13:20 0:00 nginx: worker process
even though both are the same process (also note the different PID namespaces; in containers, PIDs start from 1 again).
As a result, container processes will still be subject to ConfigServer's resource monitoring, but they might show up with random, or even non-existent user accounts.
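A quick way to see this mapping in practice, using the host PID from the example above (the container name is a placeholder):
cat /proc/22004/cgroup                       # on the host: the docker/<id> line shows which container owns this PID
docker exec <container> getent passwd 104    # inside the container, UID 104 resolves to nginx
getent passwd 104                            # on this host, the same UID resolves to systemd-timesync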
As to why nginx triggers the emails and other containers don't, I can only assume that nginx is the only one of your containers that crosses ConfigServer's resource thresholds.

Why is every request being processed by PHP-FPM? (even though I'm using a cache)

I'm running WordPress with: Nginx + PHP-FPM + APC + W3 Total Cache + PageSpeed.
After 3 days of researching and configuring, I finally got it configured.
Running "top" and hitting some cached pages shows:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13387 nginx 20 0 472m 11m 4664 S 12.3 2.0 0:46.55 nginx
17577 nginx 20 0 443m 47m 29m S 0.7 8.0 0:42.88 php-fpm
17591 nginx 20 0 438m 43m 29m S 0.7 7.2 0:42.59 php-fpm
1486 mysql 20 0 851m 21m 4832 S 0.3 3.7 1:24.71 mysqld
17907 nginx 20 0 438m 48m 34m S 0.3 8.1 0:36.78 php-fpm
18065 nginx 20 0 442m 47m 29m S 0.3 8.0 0:33.49 php-fpm
18543 nginx 20 0 445m 63m 42m S 0.3 10.6 0:22.94 php-fpm
21125 root 20 0 15012 1148 868 R 0.3 0.2 0:00.86 top
1 root 20 0 19356 1388 1136 S 0.0 0.2 0:00.74 init
1) Why is every request being processed by PHP-FPM? Isn't W3 Total Cache supposed to prevent the request from being processed by PHP-FPM?
I know that my pages are being cached because every page returns this at the end of the HTML:
<!-- Performance optimized by W3 Total Cache. Learn more: http://www.w3-edge.com/wordpress-plugins/
Page Caching using disk: enhanced
2) If I install Varnish in front of Nginx, will it stop requests from being processed by PHP-FPM? (Will performance increase? I'm using a Micro EC2, RAM = 613MB)
PS: The server returns the response header "Cache-Control: max-age=0, no-cache". I don't know if this influences W3's caching.
My specs:
Amazon Micro EC2
Linux version 3.4.48-45.46.amzn1.x86_64 Red Hat 4.6.3-2 (I think it's based on CentOS 5)
PHP 5.3.26 (fpm-fcgi)
I'm not aware of how W3 Total Cache works exactly, but let me state some facts.
First, at the Nginx level, any PHP page must hit the PHP engine, because that is probably what your try_files directive tells Nginx to do. If W3 Total Cache keeps some sort of HTML cache of the pages, then without changes to the Nginx config you will still hit PHP even when a cached copy exists. And if the cache is not really stored as plain HTML, then the PHP engine probably checks for the existence of the cached page and decides whether to rebuild it or serve the cached copy instead, so you definitely need PHP to run; the difference is that it won't hit the database and won't do any processing, it just serves the cached page.
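For illustration, the kind of Nginx change meant here looks roughly like this; it assumes W3 Total Cache's "Page Caching using disk: enhanced" writes pages under wp-content/cache/page_enhanced/ (the exact path and file name vary by setup and plugin version):
set $cache_uri $request_uri;
if ($request_method = POST) { set $cache_uri 'nocache'; }   # never serve cached HTML for POSTs
if ($query_string != "") { set $cache_uri 'nocache'; }      # or for query-string requests
location / {
    # serve the pre-generated HTML file if it exists, otherwise fall back to WordPress/PHP
    try_files /wp-content/cache/page_enhanced/$host/$cache_uri/_index.html $uri $uri/ /index.php?$args;
}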
Second question: Varnish. Yes, Varnish would actually be good; it would spare you the need for a cache plugin, but then you need to make sure WordPress knows when to ask Varnish to purge cached pages. The structure of the server would be user -> varnish -> nginx -> php. If Varnish has a cached page or asset (css, js, etc.), it serves it directly without passing the request to Nginx. I've tried it on a website and the response times of cached pages definitely improved; even when I did a Ctrl+F5 to request the whole page without cache, it still returned very fast, as if the page were plain HTML.
You'll still need to mess around with the Varnish config, or at least that's what I did, because it takes a little bit of learning, but so far all I did was some copying and pasting from blogs and it worked just fine for me.
I installed Varnish in front of my server, but requests were still being processed by PHP-FPM.
The problem was the missing slash at the end of the URL.
In WordPress, a page is a directory, so it responds as www.mysite.com/page1/.
The point is that when you hit www.mysite.com/page1 (without the slash), Nginx has to redirect to www.mysite.com/page1/ (with the slash), and that redirect goes through PHP-FPM.
After putting the slash at the end of all links on my site, the redirect no longer happened, and my pages were no longer processed by PHP-FPM.
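An alternative, if you can't fix every link, is to let Nginx add the trailing slash itself so the redirect never reaches PHP-FPM; a rough sketch (it assumes any URI without a dot in it is a WordPress page, which may not hold for every site):
location ~ ^([^.]*[^/])$ {
    return 301 $scheme://$host$1/;
}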

Drupal site - Memcache Connection errors

We are trying to perf tune our Drupal site.
We are using Siege to measure performance (as a Drupal visitor).
Env:
Nginx + FastCGI+ Memcache
Siege runs fine for a few seconds, and then we run into connection errors:
Example:
HTTP/1.1 200 29.18 secs: 5877 bytes ==> /
HTTP/1.1 200 29.39 secs: 5877 bytes ==> /
warning: socket: -1656235120 select timed out: Connection timed out
warning: socket: -1673020528 select timed out: Connection timed out
Using the same Siege test configuration, Nginx + FastCGI + Drupal Cache seems to work fine.
Example:
HTTP/1.1 200 1.41 secs: 5868 bytes ==> /
HTTP/1.1 200 1.40 secs: 5868 bytes ==> /
As you can see, response time is much higher with memcache, in addition to the connection errors.
Any idea what could be wrong here... and why Drupal is throwing errors with memcache under load?
Memcache runs on a separate instance, with 2GB of memory allocated to memcached.
I guess that you have run out of memcached connections. Please run a check of your memcached installation with a simple script every second, then start Siege. I guess your memcached stops responding after a while.
Test memcache php script:
<?php
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die ('Unable to connect');
$version = $memcache->getVersion();
echo 'Server version: '.$version;
?>
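To run that check once a second, and to watch the connection counters on the memcached host itself, something like this works (the script path and host:port are placeholders):
watch -n 1 php /path/to/memcache-test.php
watch -n 1 'memcached-tool 127.0.0.1:11211 stats | grep -i conn'    # memcached-tool ships with memcached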
What I guess is happening is that you have not disabled persistent connections in memcache, and they hang around in the PHP threads. Memcached can serve ~1023 of them at a time, and that might not be enough while sieging.
You might also try ab, the Apache benchmarking tool, with a close look at the -c switch. Play around with it and see how the results change with different values.
Finally, you should run a tcpdump on your memcached port (usually 11211) on the PHP machine to find out what is happening to the connections. Does Drupal start them? Does the other host respond with an RST, or does it time out?
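Something along these lines works for that (the interface choice is an assumption; adjust to your setup):
sudo tcpdump -nn -i any tcp port 11211
# append "and 'tcp[tcpflags] & tcp-rst != 0'" to the filter to see only resets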
There was a bug in the memcached PHP documentation that said connections are non-persistent by default. They are persistent by default (well, they were at the time I had the problem).
Feel free to comment on this answer; I'll read the comments and assist further if necessary.
