Increasing req/sec on Nginx CE - nginx

I have a simple location block in my nginx config that echoes back the server's IP. It's deployed on a 2-core, 4 GB RAM EC2 instance, and I can get about 400 requests per second when load testing it.
I have made optimizations such as buffering the access logs and raising the open file descriptor limit, and followed the guidelines in http://www.freshblurbs.com/blog/2015/11/28/high-load-nginx-config.html.
Peak CPU load on the node is 4-5%, and memory usage is about the same. I am wondering how I can push it even further. Will using Docker help, or are CPU load and memory irrelevant here because it might be running into network congestion? Will increasing the EC2 instance size help?
OS: CentOS. Any help appreciated. Thanks!
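For context, the setup is roughly along these lines (a simplified sketch rather than the exact file; the echo location, buffered access log, and limits shown are illustrative values):
# Echo-back location with buffered access logging (illustrative sketch).
cat >/etc/nginx/conf.d/echo-ip.conf <<'EOF'
server {
    listen 80;
    # Buffer access log writes instead of hitting the disk per request.
    access_log /var/log/nginx/echo.access.log combined buffer=64k flush=5s;
    location /ip {
        default_type text/plain;
        return 200 "$server_addr\n";
    }
}
EOF
# In nginx.conf (main/events context), the FD and connection limits mentioned above:
#   worker_processes auto;
#   worker_rlimit_nofile 65535;
#   events { worker_connections 16384; }
nginx -t && nginx -s reload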

Related

Debugging poor I/O performance on OpenStack block device (OpenStack kolla:queen)

I have an OpenStack VM that is getting really poor performance on its root disk - less than 50MB/s writes. My setup is 10 GbE, OpenStack deployed using kolla, the Queen release, with storage on Ceph. I'm trying to follow the path through the infrastructure to identify where the performance bottleneck is, but getting lost along the way:
nova show lets me see which hypervisor (an Ubuntu 16.04 machine) the VM is running on, but once I'm on the hypervisor I don't know what to look at. Where else can I look?
Thank you!
My advice is to check the performance first between the host (hypervisor) and Ceph. If you are able to create a Ceph block device, you can map it with the rbd command, create a filesystem, and mount it. Then you can measure the device I/O performance with sysstat tools (iostat, sar), or with iotop, dstat, or vmstat.
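A minimal sketch of that workflow on the hypervisor might look like the following (the pool and image names are placeholders, and dd plus iostat is just one simple way to generate and observe write load):
# Create, map, and mount a throwaway RBD image (pool "volumes", image "perftest" are placeholders).
rbd create volumes/perftest --size 10240     # 10 GB test image
rbd map volumes/perftest                     # maps to e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /mnt/perftest
mount /dev/rbd0 /mnt/perftest
# Generate direct-I/O writes and watch device utilization while it runs.
dd if=/dev/zero of=/mnt/perftest/testfile bs=1M count=2048 oflag=direct
iostat -xm 2
# Clean up afterwards.
umount /mnt/perftest && rbd unmap /dev/rbd0 && rbd rm volumes/perftest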

Google Cloud Compute Engine - WordPress high TTFB

I am running a LAMP stack on a customized Google Cloud Compute Engine instance, primarily to host WordPress websites running WooCommerce stores.
The server specs are as follows:
RAM: 5 GB, Cores: 1, Disk: 30 GB, OS: CentOS 7, MariaDB version: 5.5.64, PHP version: 7.3
I am currently seeing extreme TTFB values of 10-20 seconds even with very low traffic. I have made the following optimisations to improve the timing, but they don't seem to help. The site has close to 1,500 products.
WordPress caching using Hummingbird and Autoptimize (minification, GZIP compression, etc.), a custom .htaccess with expires headers, the APCu PHP cache, the Cloudflare CDN, and compressed images.
Tuned MariaDB with appropriate memory allocation, and allocated appropriate memory to Apache and PHP as well.
Tried adding more cores and increasing the memory of the Compute Engine instance, in vain.
Disabling the theme and templates has little to no effect.
All of the above optimizations have had little effect on the TTFB. Is this a server/network related issue on my Google Cloud Compute Engine instance?
Please check the TTFB values at the test link below:
TTFB Test Results
Thanks in advance!
I think you can measure the response times. Try to measure the time spent waiting for the initial response by opening your browser's developer tools (F12), going to the "Network" tab, and then loading your website in the same window.
You will get the response times for each request made to load your website. If you click a specific request and then select the "Timing" tab, you will be able to see the TTFB, and with that you can try to pin down where the most time is being spent.
I believe this is more related to your installation than to the server itself.
If you want to test your server connection, you could bypass the application side and use a traceroute or iperf to test the TCP connection times to your server from your local computer (to the external IP); this will only work if you have ICMP traffic allowed.
And the last thing is the same as John mentioned above: check whether your server is swapping memory, or monitor the CPU and memory in use while you run the TTFB test; that will give you an idea of whether the problem is with the server or with the website and its configuration.
Additionally, here are some recommendations to reduce TTFB (https://wp-rocket.me/blog/how-to-reduce-ttfb-wordpress-site/). Hopefully they can help somehow with this.
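If it helps, here is a quick way to measure TTFB and check for swapping from the command line (the URL is a placeholder; curl's -w timing variables and free/vmstat are standard tools):
# Measure DNS, connect, TTFB, and total time for the site (replace the URL with yours).
curl -o /dev/null -s -w "DNS: %{time_namelookup}s  connect: %{time_connect}s  TTFB: %{time_starttransfer}s  total: %{time_total}s\n" https://your-site.example/
# On the server, check for swapping and memory pressure while the test runs.
free -m
vmstat 2 5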

How to stop a Perforce UNIX server from generating thousands of IDLE processes

I'm asking this question because we have run out of ideas on how to handle the current situation with our Perforce versioning server.
The Server
The server is hosted at Scaleway on a bare-metal machine with two SSDs under the hood (so we know it is not a hardware issue).
We are currently using the free license of Perforce to evaluate it.
P4 info yields the following:
The Problem
We are using Perforce on a UNIX server to version our Unreal Engine 4 project. Lately we discovered that the server had stockpiled 2771 processes, around 80% of which are p4d processes. We suspect these IDLE connections/processes are swamping the server and are the root of the connectivity issues we encounter at the office.
We enabled monitoring to keep an eye on RUNNING and IDLE processes:
p4 configure set monitoring=2
When we now display the monitored processes, we see IDLE ones that have been running for more than one hour:
p4 monitor show
We already tried disabling keepalive connections with
p4 configure set net.keepalive.disable=1
And we still see the following, which has been going on for a while.
The Question
Now the questions I want to ask are:
Has anybody else encountered this behaviour with a Perforce server on UNIX?
Does anybody know how we can tell the server to discard IDLE connections?
EDIT
So after some tracking we discovered that the proxy our office network sits behind is causing the problems and for some reason does not allow the connections to close. Does anyone have any clues on how to get around this issue?
Based on the monitor output it appears that these clients are opening a bunch of connections and holding them open, basically DoSing the server. You could go through and kill the PIDs on the server side, but this sounds like a bug in the client that should be raised with Perforce technical support.
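As a stopgap, one way to clear out long-idle p4d processes from the server side might be something like the following (a rough sketch: it assumes the PID is the first column and the status the second column of the p4 monitor show output, so verify the column layout on your server, and note that p4 monitor terminate requires monitoring to be enabled plus operator or super access):
# Find monitored processes whose status column reads IDLE (or I) and ask the
# server to terminate them; kill(1) on the OS pid is the blunt alternative.
p4 monitor show | awk '$2 ~ /^(I|IDLE)$/ {print $1}' | while read pid; do
    p4 monitor terminate "$pid"
done
If your p4d version supports them, the net.keepalive.idle, net.keepalive.interval, and net.keepalive.count configurables (rather than disabling keepalives outright) may also help the server notice and drop half-dead connections behind the proxy sooner.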

HHVM "Crashing" 502 Bad Gateway Error

My Setup
I'm running an AWS EC2 T2 Medium instance with Webmin/Virtualmin atop Nginx, Redis and HHVM (called web-server-1). It's connected to a separate AWS EC2 T2 Large instance that runs MySQL (called database-server-1). Web-server-1 operates about 25 WordPress websites.
The Problem
On web-server-1, HHVM recently started "crashing" multiple times a day at seemingly random intervals. When HHVM stops operating correctly, any websites I visit display 502 errors. The only thing that resolves it is restarting HHVM.
What I've Tried
I have New Relic installed on web-server-1, so I looked at the CPU and RAM usage at the times of the "crashes". I don't see higher-than-normal usage of those resources, which would have indicated a memory leak or a runaway script.
After looking through forums I see that many others (https://github.com/facebook/hhvm/issues/3876) are experiencing this issue and that it was confirmed as a bug, but I'm not sure where to go from here.
Thank you all for your time and assistance, I hope this is helpful to others!
HHVM Error Log
https://gist.github.com/s3w47m88/fac1e0cbf4ae5846fbd2
Increasing hhvm.jit_a_size to 128 MB fixed the issue.
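For anyone hitting the same 502s, the change boils down to raising HHVM's translation cache size and restarting the service. A rough sketch, assuming a typical /etc/hhvm/server.ini layout and that the setting is given in bytes (adjust the path and value for your install):
# hhvm.jit_a_size sizes the JIT translation cache; 134217728 bytes = 128 MB.
echo "hhvm.jit_a_size = 134217728" | sudo tee -a /etc/hhvm/server.ini
sudo service hhvm restart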

Network latency problems - Server stress testing

I am doing server stress testing using the Apache Benchmark tool (ab) and Apache JMeter. With 30 or more concurrent requests, the network starts to lag (every 100-200 requests).
This happens even though there is no CPU load on the server and it has plenty of free memory.
For example, 200 requests complete with 50 ms latency, then ~10 requests take over 3000 ms, and it keeps going like this.
Please note, the server does NOT run Apache or MySQL, so Apache is not the problem. Node.js was used for the stress testing and it seems to handle the load perfectly well. I tried the same experiment with Apache serving static content and got the same result with delays.
Server configuration:
- Leaseweb
- Intel Xeon X3440
- 8GB DDR3
- 1 x 100Mbps Full-Duplex
What could the problem be, and how can I monitor for weak spots? Thank you in advance.
You seem to be confident that seeing the same results with a different target server means that the target server is not the problem. If you are correct, then the two remaining possibilities are the load generator and the network. Try using two load generators (at the same network location). If you get the exact same results, then the load generators are likely not to blame. If the results change, then the bottleneck is on the load generator.
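A concrete way to run that comparison with ab might look like this (the target URL is a placeholder; -g writes per-request timing data you can compare between the two generator hosts, and sar/dstat let you watch the server side at the same time):
# Run the same command from two load-generator machines at the same network location.
ab -n 5000 -c 30 -g timings-generator1.tsv http://your-server.example/
# On the target server, watch network throughput and CPU while the tests run.
sar -n DEV 1
# or: dstat -cn 1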
