Scaling nginx with static files -- non-persistent requests kill req/s - networking

Working on a project where we need to serve a small static XML file at ~40k requests/s.
All incoming requests are sent to the server from HAProxy; however, none of the requests will be persistent.
The issue is that when benchmarking with non-persistent requests, the nginx instance caps out at 19,114 req/s. When persistent connections are enabled, performance increases by nearly an order of magnitude, to 168,867 req/s. The results are similar with G-WAN.
When benchmarking non-persistent requests, CPU usage is minimal.
What can I do to increase performance with non-persistent connections and nginx?
[root@spare01 lighttpd-weighttp-c24b505]# ./weighttp -n 1000000 -c 100 -t 16 "http://192.168.1.40/feed.txt"
finished in 52 sec, 315 millisec and 603 microsec, 19114 req/s, 5413 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 290000000 bytes total, 231000000 bytes http, 59000000 bytes data
[root@spare01 lighttpd-weighttp-c24b505]# ./weighttp -n 1000000 -c 100 -t 16 -k "http://192.168.1.40/feed.txt"
finished in 5 sec, 921 millisec and 791 microsec, 168867 req/s, 48640 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 294950245 bytes total, 235950245 bytes http, 59000000 bytes data

Your two tests are identical except for HTTP keep-alives:
./weighttp -n 1000000 -c 100 -t 16 "http://192.168.1.40/feed.txt"
./weighttp -n 1000000 -c 100 -t 16 -k "http://192.168.1.40/feed.txt"
And the one with HTTP keep-alives is nearly 10x faster:
finished in 52 sec, 19114 req/s, 5413 kbyte/s
finished in 5 sec, 168867 req/s, 48640 kbyte/s
First, HTTP keep-alives (persistent connections) make HTTP requests run faster because:
Without HTTP keep-alives, the client must establish a new CONNECTION for EACH request (this is slow because of the TCP handshake).
With HTTP keep-alives, the client can send successive requests over the SAME CONNECTION. This is faster because there is less work to do per request.
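Since HAProxy sits between the clients and nginx in your setup, one option is to keep the HAProxy-to-nginx leg persistent even when the client side is not, so nginx stops paying the handshake cost per request. A minimal sketch, assuming HAProxy 1.6 or newer (the backend and server names are placeholders, and the values are illustrative):
# nginx: allow many requests per connection
http {
    keepalive_timeout  65;
    keepalive_requests 10000;
}
# HAProxy: reuse server-side connections where safe
backend nginx_static
    http-reuse safe
    server web1 192.168.1.40:80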
Second, you say that the static file XML size is "small".
Is "small" nearer to 1 KB or 1 MB? We don't know. But that makes a huge difference in terms of available options to speedup things.
Huge files are usually served through sendfile() because it works in the kernel, freeing the usermode server from the burden of reading from disk and buffering.
Small files can use more flexible options available for application developers in usermode, but here also, file size matters (bytes and kilobytes are different animals).
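For a small static file like this, the usual nginx-side knobs are sendfile and the open-file cache, so the file's descriptor and metadata are not re-resolved on every request. A sketch with illustrative values, not tuned for your hardware:
http {
    sendfile                 on;
    tcp_nopush               on;
    tcp_nodelay              on;
    open_file_cache          max=1000 inactive=60s;
    open_file_cache_valid    30s;
    open_file_cache_min_uses 1;
}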
Third, you are using 16 threads in your test. Do you really have 16 PHYSICAL CPU cores on BOTH the client and the server machines?
If not, then you are simply slowing down the test to the point that you are no longer testing the web servers.
As you can see, many factors influence performance, and there are more at the OS level (TCP stack options, available file handles, system buffers, etc.).
To get the most out of a system, you need to examine all of those parameters and pick the best values for your particular workload.
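As a starting point for the OS-level tuning on Linux, these are common knobs to experiment with (example values to measure against, not recommendations; check each against your kernel's documentation):
# widen the usable source-port range and reuse TIME_WAIT sockets for new outgoing connections
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1
# deeper accept queue for connection bursts
sysctl -w net.core.somaxconn=4096
# more file descriptors for the benchmarking tool and the server workers
ulimit -n 65536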

Related

Why does wrk running on the host get widely varying results when benchmarking nginx in a virtual machine?

I'm doing performance testing and using wrk for the first time. My goal is to push the CPU usage of the guest's nginx threads as high as possible, but I find that the wrk results often vary widely with the same wrk parameters:
wrk -c 200 -t 12 -d 10 https://192.168.122.110:443/index.html
Several runs with everything unchanged produce the following results.
[root@localhost nicstat-1.95]# wrk -c 200 -t 12 -d 10 https://192.168.122.110:443/index.html
Running 10s test @ https://192.168.122.110:443/index.html
12 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.22ms 604.26us 24.59ms 97.86%
Req/Sec 7.25k 487.64 11.37k 96.43%
870313 requests in 10.10s, 705.50MB read
Requests/sec: 86171.61
Transfer/sec: 69.85MB
[root@localhost nicstat-1.95]# wrk -c 200 -t 12 -d 10 https://192.168.122.110:443/index.html
Running 10s test @ https://192.168.122.110:443/index.html
12 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.44ms 287.96us 41.76ms 79.43%
Req/Sec 11.07k 0.93k 16.99k 75.45%
1333786 requests in 10.10s, 1.06GB read
Requests/sec: 132059.89
Transfer/sec: 107.05MB
[root@localhost nicstat-1.95]# wrk -c 200 -t 12 -d 10 https://192.168.122.110:443/index.html
Running 10s test @ https://192.168.122.110:443/index.html
12 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.19ms 288.69us 26.82ms 80.27%
Req/Sec 13.42k 1.57k 17.62k 73.66%
1617255 requests in 10.10s, 1.28GB read
Requests/sec: 160129.08
Transfer/sec: 129.80MB
I have checked that memory is sufficient and that I/O wait time is around 0%. I use nicstat to obtain the virtual machine's NIC usage: NIC utilization reaches 100% in the third result above, while in the other two it stays below 100%.
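(For reference, sampling a single interface once per second with nicstat looks like this; the interface name is just an example.)
nicstat -i eth0 1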
I don't know why the wrk results vary widely.
Thanks.

How many websites should I host on an EC2 nano instance?

I am developing a couple of websites, but I have only paid for an EC2 nano instance on AWS. How many websites could I possibly host there, assuming each website will only get minimal traffic? Most of the websites are for personal use only.
Only one way to find out ;)
No definite answer is possible because it depends on a lot of factors.
But if traffic is really low, you will only be limited by the amount of disk space, and since the t2.nano runs on EBS storage this can be as big as you want. So you could fit a lot of websites!
The t2.nano has only 512MB of memory, so it's best to pick a not-so-memory-hungry web server such as nginx.
I run five very low-traffic websites on my t2.nano: four of them WordPress, one custom PHP. I run Nginx, PHP 5.6, and MySQL 5.6 on the same instance. Traffic is extremely light, in the region of 2000 pages a day, which averages out to roughly one page every 43 seconds; if you include static resources it's higher. CloudFlare acts as the CDN, which cuts requests for static resources significantly, but it doesn't cache pages.
I have MySQL on the instance configured to use very little memory, currently 141MB of physical RAM. Nginx takes around 10MB of RAM. I have four PHP workers, each taking 150MB of RAM, but 130MB of that is shared, so it's really about 20MB per worker after the first.
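For reference, a minimal php-fpm pool sketch matching that worker count (the pool name and file path are placeholders; pm and pm.max_children are the relevant directives):
; /etc/php-fpm.d/www.conf (excerpt)
[www]
pm = static
pm.max_children = 4
; recycle workers periodically to keep memory growth in check
pm.max_requests = 500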
Here's the output of a quick performance test on the t2.nano. Note that the Nginx page cache will be serving all of the pages.
siege -c 50 -t10s https://www.example.com -i -q -b
Lifting the server siege... done.
Transactions: 2399 hits
Availability: 100.00 %
Elapsed time: 9.60 secs
Data transferred: 14.82 MB
Response time: 0.20 secs
Transaction rate: 249.90 trans/sec ***
Throughput: 1.54 MB/sec
Concurrency: 49.42
Successful transactions: 2399
Failed transactions: 0
Longest transaction: 0.36
Shortest transaction: 0.14
Here it is with nginx page caching turned off
siege -c 5 -t10s https://www.example.com -i -q -b
Lifting the server siege... done.
Transactions: 113 hits
Availability: 100.00 %
Elapsed time: 9.99 secs
Data transferred: 0.70 MB
Response time: 0.44 secs
Transaction rate: 11.31 trans/sec ***
Throughput: 0.07 MB/sec
Concurrency: 4.95
Successful transactions: 113
Failed transactions: 0
Longest transaction: 0.70
Shortest transaction: 0.33
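The gap between the two runs above comes from nginx serving cached pages instead of handing every request to PHP. A minimal fastcgi_cache sketch (the cache path, zone name, and TTLs are illustrative, and a real WordPress setup also needs cache-bypass rules for logged-in users and POST requests):
# in the http {} block
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=WORDPRESS:10m inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";
# in the PHP location block
fastcgi_cache WORDPRESS;
fastcgi_cache_valid 200 10m;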

Play Framework always times out around the 16k-th request from ApacheBench if the keep-alive flag is not set

An activator template project was created with:
activator new rest-benchmark simple-rest-scala
cd rest-benchmark
activator clean stage
target/universal/stage/bin/rest-benchmark -Dplay.crypto.secret=1234
The template can be found here.
Then I run ApacheBench to get a rough idea of Play Framework's throughput:
ab -n 20000 -c 5 http://127.0.0.1:9000/books
which always gives a similar result: a timeout at around the 16.3k-th request:
apr_socket_recv: Operation timed out (60)
Total of 16345 requests completed
However, if I run ab with the -k (keep-alive) flag:
ab -k -n 20000 -c 5 http://127.0.0.1:9000/books
The benchmark was able to complete.
I have several questions:
Why does it always time out when keep-alive is absent? Shouldn't Play be able to handle requests regardless of this header? Or is it my OS keeping the connections open, so that no further requests can be processed?
Why around the 16,300-th request? Is it related to ulimit?
If a missing keep-alive header causes connection timeouts, what can I do in production?
Edit: Switched to Play 2.5.4.
Edit 2: Changed the app launch command as marcospereira suggested; observed the same result.

Why is my Hello World go server getting crushed by ApacheBench?

package main

import (
    "io"
    "net/http"
)

// hello writes a fixed response body for every request.
func hello(w http.ResponseWriter, r *http.Request) {
    io.WriteString(w, "Hello world!\n")
}

func main() {
    http.HandleFunc("/", hello)
    http.ListenAndServe(":8000", nil)
}
I've got a couple of incredibly basic HTTP servers, and all of them are exhibiting this problem.
$ ab -c 1000 -n 10000 http://127.0.0.1:8000/
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
apr_socket_recv: Connection refused (61)
Total of 5112 requests completed
With a smaller concurrency value, things still fall over. For me, the issue seems to show up around the 5k-6k mark usually:
$ ab -c 10 -n 10000 http://127.0.0.1:8000/
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
apr_socket_recv: Operation timed out (60)
Total of 6277 requests completed
And in fact, you can drop concurrency entirely and the problem still (sometimes) happens:
$ ab -c 1 -n 10000 http://127.0.0.1:8000/
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
apr_socket_recv: Operation timed out (60)
Total of 6278 requests completed
I can't help but wonder if I'm hitting some kind of operating system limit somewhere? How would I tell? And how would I mitigate?
In short, you're running out of ports.
The default ephemeral port range on OS X is 49152-65535, which is only 16,384 ports. Since each ab request is HTTP/1.0 (without keep-alive in your first examples), each new request takes another port.
As each port is used, it gets put into a queue where it waits out the TCP "Maximum Segment Lifetime", which is configured to be 15 seconds on OS X. So if you use more than 16,384 ports in 15 seconds, you're effectively going to get throttled by the OS on further connections. Depending on which process runs out of ports first, you will get connection errors from the server, or hangs from ab.
You can mitigate this by using an HTTP/1.1-capable load generator like wrk, or by using the keep-alive (-k) option for ab, so that connections are reused according to the tool's concurrency settings.
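You can check the relevant limits on OS X with sysctl (the defaults usually match the numbers above, but verify on your machine), and compare against a keep-alive run:
sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last   # ephemeral port range
sysctl net.inet.tcp.msl                                         # maximum segment lifetime, in milliseconds
ab -k -c 10 -n 10000 http://127.0.0.1:8000/                     # same test, reusing connections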
Now, the server code you're benchmarking does so little that the load generator is being taxed just as much as the server itself, with the local OS and network stack likely making a good contribution. If you want to benchmark an HTTP server, it's better to do some meaningful work from multiple clients not running on the same machine.

Drupal site - Memcache Connection errors

We are trying to performance-tune our Drupal site.
We are using Siege to measure performance (as a Drupal visitor).
Env:
Nginx + FastCGI + Memcache
Siege runs fine for a few seconds, and then we run into connection errors:
Example:
HTTP/1.1 200 29.18 secs: 5877 bytes ==> /
HTTP/1.1 200 29.39 secs: 5877 bytes ==> /
warning: socket: -1656235120 select timed out: Connection timed out
warning: socket: -1673020528 select timed out: Connection timed out
Using the same Siege test configuration, Nginx + FastCGI + Drupal cache seems to work fine.
Example:
HTTP/1.1 200 1.41 secs: 5868 bytes ==> /
HTTP/1.1 200 1.40 secs: 5868 bytes ==> /
As you can see, the response time is much higher with Memcache, in addition to the connection errors.
Any idea what could be wrong here... and why Drupal is throwing errors with memcache under load?
Memcache runs on a separate instance, with 2GB of memory allocated to it.
I guess that you are running out of memcached connections. Run a check of your memcached installation with a simple script every second, then start Siege. I suspect your memcached stops responding after a while.
Test memcache php script:
<?php
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die ('Unable to connect');
$version = $memcache->getVersion();
echo 'Server version: '.$version;
?>
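If you save that as, say, check_memcache.php (the filename is just an example), you can run it every second like this; the second line watches the raw connection count instead (depending on your nc variant you may need -q 1 or -w 1 so it exits after the reply):
watch -n 1 php check_memcache.php
watch -n 1 'echo stats | nc -w 1 127.0.0.1 11211 | grep curr_connections'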
What I guess is happening is that you have not disabled persistent connections in memcache, and they hang around in the PHP threads. Memcached can serve ~1023 of them at a time, and that might not be enough while Sieging.
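With the PHP Memcache extension, persistence is controlled per connection; the third argument to Memcache::addServer() is the persistent flag and defaults to true. A sketch of explicitly disabling it (not necessarily how the Drupal memcache module wires it up):
<?php
$memcache = new Memcache;
// host, port, persistent = false
$memcache->addServer('localhost', 11211, FALSE);
?>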
You might also try ab, the Apache benchmarking tool, with a close look at the -c switch. Play around with it and see how the results change for different values.
Finally, you should run a tcpdump on your memcached port (usually 11211) on the PHP machine to find out what is happening to the connections. Does Drupal open them? Does the other host respond with an RST, or does it time out?
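For example (the interface name is a placeholder for whatever the PHP machine uses to reach memcached):
tcpdump -i eth0 -nn tcp port 11211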
There was a bug in the memcached PHP API documentation that said the connections are non-persistent by default. They are persistent by default (well, they were at the time I had the problem with it).
Feel free to comment on this answer; I'll read the comments and assist further if necessary.
