php-fpm not scaling as well as php-fastcgi - nginx

I'm trying to optimize a PHP site to scale under high loads.
I'm currently using Nginx, APC and also Redis as a database cache.
All that works well and scales much better than stock.
My question is in regard to php-fpm:
I load tested php-fpm vs. php-fastcgi. In theory I should use php-fpm, since it has better process handling and should also play better with APC: if I understand it right, php-fastcgi processes can't share the same APC cache and use more memory.
Now the thing is, under a heavy load test php-fastcgi performed better: it isn't faster, but it "holds" longer, whereas php-fpm started giving timeouts and errors much sooner.
Does that make any sense?
Probably I just have not configured php-fpm optimally, but I tried a variety of settings and could not match php-fastcgi under that high-volume load test scenario.
Any recommendations / comments / best practices / settings to try would be appreciated.
Thanks.
I mostly messed with the number of servers:
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 100
pm.max_requests = 5000
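For reference, pm.max_children (not shown above) is the hard cap that usually decides when a dynamic pool starts queueing requests and timing out once every child is busy. A fuller pool sketch, with purely illustrative values:

pm = dynamic
pm.max_children = 150
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 100
pm.max_requests = 5000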

Related

What happens when nginx proxy_buffer_size is exceeded?

I am running a node server in AWS Elastic Beanstalk with Docker, which also uses nginx. One of my endpoints is responsible for image manipulation such as resizing etc.
My logs show a lot of ESOCKETTIMEDOUT errors, which would suggest an invalid URL.
That is not the case: handling that scenario is fairly basic, and when I open the supposedly invalid URL, it loads an image just fine.
My research has so far led me to make the following changes:
1. Increase the timeout of the request module to 2000
2. Set the container uv_threadpool_size env variable to the maximum of 128
While 1 has helped improve response times somewhat, I don't see any improvement from 2. I have now come across the following warning in my server logs:
an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/0/12/1234567890 while reading upstream
This makes me think that the ESOCKETTIMEDOUT errors could be due to the proxy_buffer_size being exceeded. But, I am not sure and I'd like some opinion on this before I continue making changes based on a hunch.
So I have 2 questions:
Would the nginx proxy_buffer_size result in an error if a) the size is exceeded in cases of manipulating a large image or b) the volume of requests maxes out the buffer size?
What are the cost impacts, if any, of increasing the size (AWS memory, instance size, etc.)?
I have come across this helpful article but wanted more opinions on whether this would even help in my scenario.
When the proxy buffers are exceeded, nginx writes the response to a temporary file as a kind of "swap". That uses your storage, and if storage is billable your cost will increase. When you increase the buffer sizes you will use more RAM, which means you may have to pay for a larger instance, or try your luck with the current one.
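For reference, the directives involved look roughly like this; the sizes are illustrative only, and setting proxy_max_temp_file_size to 0 disables the temp-file spill entirely (nginx then reads from the upstream only as fast as the client can take the data):

proxy_buffer_size 16k;
proxy_buffers 8 32k;
proxy_busy_buffers_size 64k;
proxy_max_temp_file_size 0;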
There are two things you should never make the user wait on while you process them: e-mails and images. Doing so can lead to timeouts or even whole-application unavailability. You can always use larger timeouts, or even more robust instances for those endpoints, but when it scales you WILL have problems.
I suggest you approach this differently: return an image placeholder response and process the images asynchronously. Once they are available as versioned, resized images you can serve them normally. There is an AWS article about doing something like this with Lambda.

Gunicorn CPU usage increasing to a very high value

We are using Gunicorn with Nginx. Every time we restart Gunicorn, the CPU usage taken by Gunicorn starts increasing gradually again, going from 0.5% to around 85% over 3-4 days. On restarting Gunicorn, it comes back down to 0.5%.
Please suggest what could cause this issue and how to go about debugging and fixing it.
Check your workers configuration. A commonly recommended starting point is (2 × CPU cores) + 1 workers.
Check your application; it seems that your application is blocking or freezing threads. Add timeouts to all API calls, database queries, etc.
You can add APM software to analyze your application, for example Datadog.
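A sketch of the kind of Gunicorn settings meant here, in a gunicorn.conf.py; the numbers are illustrative, and max_requests is optional but helps if a worker leaks resources over time:

# gunicorn.conf.py
import multiprocessing
workers = multiprocessing.cpu_count() * 2 + 1  # common starting point from the Gunicorn docs
timeout = 30                                   # kill workers stuck longer than 30 s
max_requests = 1000                            # recycle each worker after N requests
max_requests_jitter = 100                      # stagger the recycling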

nginx protecting from Screaming Frog and too-greedy crawlers (so no real DDoS, but close)

We have seen several occasions where a simple Screaming Frog crawl would almost take down our server (it does not go down, but it slows almost to a halt and PHP processes go crazy). We run Magento ;)
Now we applied this Nginx ruleset: https://gist.github.com/denji/8359866
But I was wondering if there is a stricter or better way to kick out too-greedy crawlers and Screaming Frog crawl episodes. Say after 2 minutes of intense requesting we should already know that someone is running too many requests from some automated system (without blocking the Googlebot, of course).
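For context, the kind of per-IP throttling we have in mind would be something like nginx's limit_req; the zone name, rate and burst below are made-up numbers, and a known bot like Googlebot would still need to be excluded separately (e.g. via a geo or map block on the client address):

limit_req_zone $binary_remote_addr zone=crawlers:10m rate=5r/s;   # in the http block
limit_req zone=crawlers burst=20 nodelay;                         # in the server/location block
limit_req_status 429;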
Help and ideas appreciated
Seeing how a simple SEO utility scan can bring your server to a crawl, you should realize that blocking spiders isn't the real solution. Suppose you managed to block every spider in the world, or created a sophisticated ruleset to decide that this number of requests per second is legitimate and that one is not.
It would still be obvious that your server can't handle even a few simultaneous visitors. A few more visitors will bring your server down whenever your store receives more traffic.
You should address the main performance bottleneck, which is PHP.
PHP is slow. With Magento it's slower. That's it.
Imagine that every request to your Magento store causes dozens and dozens of PHP files to be scanned and parsed. This hits the CPU hard.
If you have an unoptimized PHP-FPM configuration, it will hit your RAM hard as well.
These are the things which should be done in order of priority to ease the PHP strain:
Make use of Full Page Cache
Really, it's a must with Magento. You don't lose anything, you only gain performance. Common choices are:
Lesti FPC. This is the easiest to install and configure. It works most of the time even if your theme is badly coded. The profit: your server will no longer go down and will be able to serve more visitors. It can even store its cache in Redis if you have enough RAM and are willing to configure it. It will cache, and it will cache things fast.
Varnish. It is the fastest caching server, but it's tricky to configure if you're running Magento 1.9; you will need the Turpentine Magento plugin, which is quite picky to get working if your theme is not well coded. If you're running Magento 2, it's compatible with Varnish out of the box and quite easy to configure.
Adjust PHP-FPM pool settings
Make sure that your PHP-FPM pool is configured properly. A value for pm.max_children that is too small will make for slow page requests. A value that is too high might hang your server because it will lack RAM. Set it to (50% of total RAM divided by 128MB) for starters.
Make sure to adjust pm.max_requests and set it to a sane number, e.g. 500. Leaving it set to 0 (the default) often leads to "fat" PHP processes which will eventually eat all of the RAM on the server.
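As a worked example of that rule of thumb (assuming an 8 GB server, which is just an assumption here): 50% of 8192 MB is 4096 MB, and 4096 / 128 ≈ 32, so a starting pool might look like this, with the spare-server numbers being illustrative:

pm = dynamic
pm.max_children = 32
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 16
pm.max_requests = 500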
PHP 7
If you're running Magento 2, you really should be using PHP 7 since it's twice as fast as PHP 5.5 or PHP 5.6.
The same suggestions, with configs, are in my blog post, the Magento 1.9 performance checklist.

php-fpm processes increase drastically

We've seen that sometimes, under no heavy traffic, the php-fpm processes under nginx start to increase drastically.
We normally have 35 processes and then, all of a sudden, CPU is at 100% with 160 processes running at the same time. The last time it happened was a few seconds ago; the time before that was 2 weeks ago, which is pretty weird. We do not see memory problems or anything else strange (too many accesses or so on).
Do you have any idea how we can avoid creating those processes? Or what could be the cause?
Well, FPM probably creates them to handle the traffic, and if the processes are at 100% CPU then some part of your code might be eating up the processor.
If you want to force FPM not to create more than a certain number, check the file /etc/php5/fpm/pool.d/www.conf; there you'll find pm.max_children and related settings.
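For example, a cap along these lines in /etc/php5/fpm/pool.d/www.conf would stop the pool from ballooning to 160 processes; the numbers are illustrative and should be sized to your RAM:

pm = dynamic
pm.max_children = 40
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20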

memcached and miss ratio

Drupal 6.15 and memcache running on a RHEL 5.4 server. The memcache miss percentage is 32%, which I think is high. What can be done to improve it?
Slightly expanded form of the comment below.
A cache hit ratio will depend on a number of factors, things like:
Cache size
Cache timeout
Cache clearing frequency
Traffic
Using memcached is most beneficial when you have a high number of hits on a small amount of content. That way the cache is built quickly and then used frequently, giving you a high hit ratio.
If you don't get that much traffic, cache items will go stale and will need to be re-cached.
If you have traffic going to a lot of different content, then the cache can either fill up or go stale before entries are used again.
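As a rough worked example (the counter values are invented to match the reported 32%): memcached keeps get_hits and get_misses counters, and the miss ratio is simply

miss ratio = get_misses / (get_hits + get_misses)

so 32,000 misses against 68,000 hits gives 32,000 / 100,000 = 32%, i.e. roughly one lookup in three falls through to the database.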
memcached is only something you really need to use if you are having, or anticipating, scalability issues. It is not buggy, but it does add another layer to the application stack which needs to be monitored and configured.
