HHVM "Crashing" 502 Bad Gateway Error - wordpress

My Setup
I'm running an AWS EC2 T2 Medium instance with Webmin/Virtualmin atop Nginx, Redis and HHVM (called web-server-1). It's connected to a separate AWS EC2 T2 Large instance that runs MySQL (called database-server-1). Web-server-1 hosts about 25 WordPress websites.
The Problem
On web-server-1 HHVM recently started "crashing" multiple times a day at seemingly random intervals. When HHVM stops operating correctly any websites I visit display 502 errors. The only thing that resolves it is restarting HHVM.
What I've Tried
I have New Relic installed on web-server-1, so I looked at the CPU and RAM usage at the times of the "crashes". I don't see higher-than-normal usage of those resources, which I would expect to see if there were a memory leak or a runaway script.
After looking through forums, I see that many others (https://github.com/facebook/hhvm/issues/3876) are experiencing this issue and that it was confirmed as a bug, but I'm not sure where to go from here.
Thank you all for your time and assistance, I hope this is helpful to others!
HHVM Error Log
https://gist.github.com/s3w47m88/fac1e0cbf4ae5846fbd2

Increasing hhvm.jit_a_size to 128 MB fixed the issue.
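For anyone applying the same fix, here is a minimal sketch of how the value could be worked out. It assumes the setting is given in bytes in the HHVM ini file (commonly /etc/hhvm/server.ini, but that path is an assumption about your install); the helper name is hypothetical.

```python
# Assumption: hhvm.jit_a_size takes a size in bytes in the HHVM ini file.
MIB = 1024 * 1024


def jit_a_size_line(megabytes: int) -> str:
    """Render the ini line for a JIT code cache of the given size in MiB."""
    return f"hhvm.jit_a_size = {megabytes * MIB}"


if __name__ == "__main__":
    # 128 MiB, the value reported above as fixing the issue.
    print(jit_a_size_line(128))  # hhvm.jit_a_size = 134217728
```

HHVM would need a restart after the ini file is changed for the new size to take effect.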

Related

Google cloud compute engine - Wordpress high TTFB

I am running a LAMP stack on a custom Google Cloud Compute Engine instance, primarily to host WordPress websites running WooCommerce stores.
Server specs:
RAM: 5GB, Cores: 1, Space: 30GB, OS: CentOS7, MariaDB Version: 5.5.64, PHP Version: 7.3
I'm currently seeing extreme TTFB values of 10-20 seconds and more, even with very low traffic. I have made the following optimisations to improve the timing, but they don't seem to help. The site has close to 1500 products.
WordPress caching using Hummingbird and auto-optimize (minify, GZIP compression, etc.), a custom .htaccess with Expires headers, APCu PHP cache, Cloudflare CDN, compressed images.
Tuned MariaDB memory allocation, and allocated appropriate memory to Apache and PHP as well.
Tried adding more cores and increasing the memory of the Compute Engine instance, in vain.
Disabling the theme and template has little to no effect.
All of the above optimizations have had little effect on the TTFB timings. Is this a server/network related issue on my Google Cloud Compute Engine instance?
Please check the TTFB values below, test link:
TTFB Test Results
Thanks in advance !
I think you can measure the response times. Try to measure the time spent waiting for the initial response by opening your browser, pressing "F12", selecting the "Network" tab, and then loading your website in the same window.
You will get the response times for each request made to load your website. If you click a specific request and then select its timing details, you will be able to see the TTFB, and with that you can try to catch which part is taking the most time.
I believe this is more related with your installations than with the server itself.
If you want to test your server connection, you could bypass the application side and use a trace (e.g. traceroute) or iperf to test connection times to your server from your local computer (to the external IP). The trace will only work if you have ICMP traffic allowed.
And the last thing is the same as John mentioned above: check that your server is not swapping memory, or try to monitor the CPU and memory in use while you run the TTFB test. That will give you an idea of whether the problem is with the server or with the website and its configuration.
Additionally, here are some recommendations to reduce TTFB (https://wp-rocket.me/blog/how-to-reduce-ttfb-wordpress-site/). Hopefully they help somehow with this.
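If you'd rather script the measurement than read it off the browser, something like the following could work. It is only a rough sketch using the Python standard library: it times the gap between sending a GET request and receiving the response headers, which approximates the "waiting" phase the browser reports. The function name and the User-Agent string are made up for illustration.

```python
import http.client
import time


def measure_ttfb(host, port=443, path="/", use_tls=True, timeout=30.0):
    """Return (seconds from request sent to headers received, HTTP status)."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        # Connect first so DNS, TCP and TLS setup are not counted.
        conn.connect()
        start = time.perf_counter()
        conn.request("GET", path, headers={"User-Agent": "ttfb-sketch"})
        # getresponse() returns once the status line and headers have arrived.
        response = conn.getresponse()
        elapsed = time.perf_counter() - start
        response.read()  # drain the body so the connection closes cleanly
        return elapsed, response.status
    finally:
        conn.close()


# Example call (replace the host with your own site):
#   elapsed, status = measure_ttfb("example.com")
#   print(f"{status} after {elapsed:.3f}s")
```

Running it once from your local computer and once on the server itself (against localhost, with use_tls=False and the right port) should help separate network time from time spent inside Apache/PHP/MariaDB.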

Jenkins spikes the CPU usage up to 100%

I have a Jenkins master which has 3 Docker slaves and 2 VM slaves. I have installed Jenkins as a service on Red Hat Linux. I have observed that the CPU utilization sometimes goes up to 100%, and I then have to reboot the box. When I check the processes, I can see that there is a main master Jenkins process, and several other child Jenkins processes (which are exact replicas of the master process) are hung and are causing the spike (I confirmed this through New Relic).
I have been trying to reproduce this issue, but so far without success.
Below are my queries:
I know the previous process id, can I get some logs or dumps related to it post the server restart?
Is there a better approach to troubleshoot this, so that I can narrow down the issue?
At this point I am unable to understand where these child processes are being spawned from and how I can find the culprit.
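Since process state does not survive a reboot, one option is to capture a snapshot of the child processes before restarting the box. Below is a minimal sketch of that idea, assuming a Linux `ps` that accepts `-eo pid,ppid,pcpu,etime,args`; the function names are hypothetical and the parent PID has to be supplied by you.

```python
import subprocess

# Columns: process id, parent id, CPU %, elapsed time, full command line.
PS_CMD = ["ps", "-eo", "pid,ppid,pcpu,etime,args"]


def parse_ps(output):
    """Turn the text printed by PS_CMD into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        pid, ppid, pcpu, etime, args = parts
        rows.append({
            "pid": int(pid),
            "ppid": int(ppid),
            "pcpu": float(pcpu),
            "etime": etime,
            "args": args.rstrip(),
        })
    return rows


def children_of(rows, parent_pid):
    """Direct children of parent_pid, busiest first."""
    kids = [r for r in rows if r["ppid"] == parent_pid]
    return sorted(kids, key=lambda r: r["pcpu"], reverse=True)


def snapshot(parent_pid):
    """Run ps now and return the children of parent_pid."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    return children_of(parse_ps(out), parent_pid)
```

Writing the result of `snapshot(<jenkins master pid>)` to a file every minute or so (from cron, for example) would leave a record of which children existed, how long they had been running, and what their command lines were when the spike started.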

Website down with MariaDB "too many connections" error

I am running a single high-traffic website on a high-end CentOS 7 VPS (16 vCores / 128 GB of RAM) running Plesk Onyx on a CentOS 7 / MariaDB 10.1 / PHP-FPM 5.6 setup.
Everything is usually smooth and fast, but it happened twice in a year that the website went down with the message "Too Many Connections" from MariaDB.
Being in a hurry to restore the website, I ran "service mariadb restart" without first running SHOW PROCESSLIST.
I checked the MariaDB logs and web server logs afterwards and didn't find anything useful for troubleshooting the issue.
Note that when it first happened, I raised the max_connections value to 300 in my.cnf and constantly monitored the "max_used_connections" variable. That value never went over 50, so I guessed it happened because of some DDoS attack or malicious attempt.
Questions:
Any advice on how to troubleshoot this?
How can I be alerted if the max_used_connections value is approaching the max_connections value? Is there a tool for this?
I am using external pingdom service to check website uptime but it didn't detect this kind of problem (the web response is 200 OK) and also a netdata instance on the server (https://netdata.io/) that didn't help...
Troubleshoot it by turning on the slow query log, preferably with a low value for long_query_time (such as "1"). Probably some naughty query will show up there.
Yes, do SHOW FULL PROCESSLIST next time. (Note "FULL".) Instead of restarting mysqld, look for the offending query. It will have one of the highest values in Time and it probably won't be in Sleep mode. It may be something potentially long like ALTER or a dump. Killing that one process will probably uncork the problem, and the problem will vanish in, perhaps, seconds.
Deleting a file that is "open" by a process (such as mysqld) will not help: disk space is not recycled until all processes have closed the file. Killing the process closes any open files. Some logs can be handled with FLUSH LOGS; this should be harmless, though it may not help.
If your tables are MyISAM, switching to InnoDB will avoid many cases of table locks (if that is what you are experiencing).
What is the value of innodb_buffer_pool_size? For that much RAM, about 80G is reasonable.
There might be some clues in the GLOBAL STATUS; see http://mysql.rjweb.org/doc.php/mysql_analysis#tuning for analyzing it. (Caution: It will be useless immediately after a reboot.)
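As for being alerted before the limit is reached, a small script run from cron could compare the current connection count with max_connections. The following is a rough sketch, not a finished tool: it assumes the `mysql` command-line client can log in without prompting (for example via ~/.my.cnf), and the 80% threshold and function names are arbitrary choices for illustration. Note that Max_used_connections is a high-water mark since the server started, while Threads_connected is the current count, so the sketch alerts on the latter.

```python
import subprocess

STATUS_SQL = ("SHOW GLOBAL STATUS WHERE Variable_name IN "
              "('Threads_connected', 'Max_used_connections')")
VARS_SQL = "SHOW GLOBAL VARIABLES LIKE 'max_connections'"


def parse_kv(output):
    """Parse tab-separated name/value rows as printed by `mysql -N -B`."""
    values = {}
    for line in output.splitlines():
        name, _, value = line.partition("\t")
        if value:
            values[name.strip().lower()] = value.strip()
    return values


def connection_usage(status, variables):
    """Summarise current and peak connections against the configured limit."""
    used = int(status["threads_connected"])
    peak = int(status["max_used_connections"])
    limit = int(variables["max_connections"])
    return {"used": used, "peak": peak, "limit": limit, "ratio": used / limit}


def should_alert(usage, threshold=0.8):
    """True when current connections reach the given fraction of the limit."""
    return usage["ratio"] >= threshold


def run_query(sql):
    """Run one statement through the mysql client (credentials assumed in ~/.my.cnf)."""
    cmd = ["mysql", "-N", "-B", "-e", sql]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def check():
    usage = connection_usage(parse_kv(run_query(STATUS_SQL)),
                             parse_kv(run_query(VARS_SQL)))
    return usage, should_alert(usage)
```

When `check()` reports an alert, the script could also save the output of SHOW FULL PROCESSLIST to a file, so the offending query is on record even if the server has to be restarted in a hurry.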

AWS EC2 Instance w/ WordPress keeps crashing from 25% CPU utilization spikes

I have an EC2 t2.medium instance i-0bf4623a779064e0a with a WordPress installation which keeps crashing (it can't be accessed via HTTP or SSH). It seems that whenever CPU utilization gets to about 25% or more (which I would think isn't very much), the server crashes. I have an alert set up to restart the server whenever Network Out is <= 50,000 bytes for 5 minutes, and tonight it has had to restart 10 times. It has been doing this nearly every day for weeks. Here is a screenshot of the monitoring: http://i.imgur.com/zQQ4oiy.png
What can I do to stop this crashing? Can I do some sort of server config optimization? I hope I do not need a larger instance, since I am already paying quite a bit for AWS and was previously using $10/mo shared hosting which rarely went down.

Increasing req/sec on Nginx CE

I have a simple location block in my nginx conf which echoes back the server's IP. It's deployed on a 2-core, 4 GB RAM EC2 instance. I am able to get 400 requests per second when load testing it.
I made optimizations like log buffering and raising the open file descriptor limit, following the guidelines in http://www.freshblurbs.com/blog/2015/11/28/high-load-nginx-config.html.
The peak CPU load on the node is 4-5%, and the same for memory. I am wondering how I can push it even further. Will using Docker help, or are the CPU load and memory irrelevant here because it might be running into network congestion? Will increasing the EC2 node size help?
OS: CentOS. Any help appreciated. Thanks!

Resources