I have a pretty straightforward CouchDB setup on my Mint/Debian box. My Java webapp was suffering rather long delays when querying CouchDB, so I started looking for the cause.
EDIT: The query pattern is lots of small queries on small JSON objects (roughly 300 bytes up / 1 KB down).
The Wireshark dumps look fine, showing mostly 3-5 ms request-response turnaround. JVM frame sampling showed that the socket code (client-side queries to CouchDB) is somewhat busy, but nothing remarkable. Then I profiled the same thing with ApacheBench and, oops: keep-alive introduces a steady extra 39 ms delay over non-persistent setups.
Does anyone know how to explain this? Maybe persistent connections increase the congestion window on the TCP layer and then idle out due to TCP_WAIT and the small request/response sizes, or something like that?
Should that option (TCP_WAIT) ever be switched ON for loopback TCP connections?
w#mint ~ $ uname -a
Linux mint 2.6.39-2-486 #1 Tue Jul 5 02:52:23 UTC 2011 i686 GNU/Linux
w#mint ~ $ curl http://127.0.0.1:5984/
{"couchdb":"Welcome","version":"1.1.1"}
Running with keep-alive: on average 40 ms per request.
w#mint ~ $ ab -n 1024 -c 1 -k http://127.0.0.1:5984/
>>>snip
Server Software: CouchDB/1.1.1
Server Hostname: 127.0.0.1
Server Port: 5984
Document Path: /
Document Length: 40 bytes
Concurrency Level: 1
Time taken for tests: 41.001 seconds
Complete requests: 1024
Failed requests: 0
Write errors: 0
Keep-Alive requests: 1024
Total transferred: 261120 bytes
HTML transferred: 40960 bytes
Requests per second: 24.98 [#/sec] (mean)
Time per request: 40.040 [ms] (mean)
Time per request: 40.040 [ms] (mean, across all concurrent requests)
Transfer rate: 6.22 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1 40 1.4 40 48
Waiting: 0 1 0.7 1 8
Total: 1 40 1.3 40 48
Percentage of the requests served within a certain time (ms)
50% 40
>>>snip
95% 40
98% 41
99% 44
100% 48 (longest request)
No keep-alive, and voilà: mostly 1 ms per request.
w#mint ~ $ ab -n 1024 -c 1 http://127.0.0.1:5984/
>>>snip
Time taken for tests: 1.080 seconds
Complete requests: 1024
Failed requests: 0
Write errors: 0
Total transferred: 236544 bytes
HTML transferred: 40960 bytes
Requests per second: 948.15 [#/sec] (mean)
Time per request: 1.055 [ms] (mean)
Time per request: 1.055 [ms] (mean, across all concurrent requests)
Transfer rate: 213.89 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1 1 1.0 1 11
Waiting: 1 1 0.9 1 11
Total: 1 1 1.0 1 11
Percentage of the requests served within a certain time (ms)
50% 1
>>>snip
80% 1
90% 2
95% 3
98% 5
99% 6
100% 11 (longest request)
Okay, now with keep-alive on, but also asking to close the connection via an HTTP header. Again about 1 ms per request.
w#mint ~ $ ab -n 1024 -c 1 -k -H 'Connection: close' http://127.0.0.1:5984/
>>>snip
Time taken for tests: 1.131 seconds
Complete requests: 1024
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 236544 bytes
HTML transferred: 40960 bytes
Requests per second: 905.03 [#/sec] (mean)
Time per request: 1.105 [ms] (mean)
Time per request: 1.105 [ms] (mean, across all concurrent requests)
Transfer rate: 204.16 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1 1 1.2 1 14
Waiting: 0 1 1.1 1 13
Total: 1 1 1.2 1 14
Percentage of the requests served within a certain time (ms)
50% 1
>>>snip
80% 1
90% 2
95% 3
98% 6
99% 7
100% 14 (longest request)
Yeah, this turned out to be about the TCP socket options CouchDB uses: with small keep-alive requests, the extra ~40 ms is the classic interaction between Nagle's algorithm and delayed ACKs. The following configuration (which makes CouchDB set TCP_NODELAY on its sockets) leveled all three cases off at about 1 ms per request.
[httpd]
socket_options = [{nodelay, true}]
See this for details:
http://wiki.apache.org/couchdb/Performance#Network
stream table

experiment  protocol  test    stream_size  metric           value
1           tcp       stream  64           throughput Gbps  10
1           tcp       stream  64           cpu utilization  0.5
2           tcp       stream  64           throughput Gbps  40
2           tcp       stream  64           cpu utilization  0.9
3           udp       stream  64           throughput Gbps  20
3           udp       stream  64           cpu utilization  0.5
4           udp       stream  64           throughput Gbps  60
4           udp       stream  64           cpu utilization  0.8
rr table

experiment  protocol  test                  request_size  response_size  metric                   value
5           tcp       request and response  64            64             transactions per second  10
5           tcp       request and response  64            64             cpu utilization          0.6
6           tcp       request and response  64            1024           transactions per second  8
6           tcp       request and response  64            1024           cpu utilization          0.5
7           udp       request and response  64            64             transactions per second  30
7           udp       request and response  64            64             cpu utilization          0.4
8           udp       request and response  64            1024           transactions per second  29
8           udp       request and response  64            64             cpu utilization          0.75
As it stands, the outcome of each experiment is listed in the metric column, and its value is in the value column.
I know that I can drop the test-specific columns like stream_size, request_size, and response_size, and then bind the rows to make one data frame.
Using R and tidyverse tools, how would you go about combining the two data frames into a long format, so that the combined data frame does not have the test-specific columns stream_size, request_size, and response_size?
Is there a better or more succinct way to design the schema for these experiments' data to make combining the data frames easier?
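For reference, here is one way the two tables above could be entered as tibbles; this is just a sketch, with the data frame names stream and rr chosen to match the ones used in the answer below.

library(tibble)

# stream table
stream <- tribble(
  ~experiment, ~protocol, ~test,    ~stream_size, ~metric,           ~value,
  1, "tcp", "stream", 64, "throughput Gbps", 10,
  1, "tcp", "stream", 64, "cpu utilization", 0.5,
  2, "tcp", "stream", 64, "throughput Gbps", 40,
  2, "tcp", "stream", 64, "cpu utilization", 0.9,
  3, "udp", "stream", 64, "throughput Gbps", 20,
  3, "udp", "stream", 64, "cpu utilization", 0.5,
  4, "udp", "stream", 64, "throughput Gbps", 60,
  4, "udp", "stream", 64, "cpu utilization", 0.8
)

# rr table
rr <- tribble(
  ~experiment, ~protocol, ~test,                  ~request_size, ~response_size, ~metric,                   ~value,
  5, "tcp", "request and response", 64, 64,   "transactions per second", 10,
  5, "tcp", "request and response", 64, 64,   "cpu utilization",         0.6,
  6, "tcp", "request and response", 64, 1024, "transactions per second", 8,
  6, "tcp", "request and response", 64, 1024, "cpu utilization",         0.5,
  7, "udp", "request and response", 64, 64,   "transactions per second", 30,
  7, "udp", "request and response", 64, 64,   "cpu utilization",         0.4,
  8, "udp", "request and response", 64, 1024, "transactions per second", 29,
  8, "udp", "request and response", 64, 64,   "cpu utilization",         0.75
)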
You could bind the two data frames together, then pivot just the columns that end with "size" into long form.
library(tidyverse)
bind_rows(stream, rr) %>%
  pivot_longer(ends_with("size"), names_to = "test_specific",
               values_to = "size", values_drop_na = TRUE)
Output
experiment protocol test metric value test_specific size
<int> <chr> <chr> <chr> <dbl> <chr> <int>
1 1 tcp stream throughput Gbps 10 stream_size 64
2 1 tcp stream cpu utilization 0.5 stream_size 64
3 2 tcp stream throughput Gbps 40 stream_size 64
4 2 tcp stream cpu utilization 0.9 stream_size 64
5 3 udp stream throughput Gbps 20 stream_size 64
6 3 udp stream cpu utilization 0.5 stream_size 64
7 4 udp stream throughput Gbps 60 stream_size 64
8 4 udp stream cpu utilization 0.8 stream_size 64
9 5 tcp request and response transactions per second 10 request_size 64
10 5 tcp request and response transactions per second 10 response_size 64
# … with 14 more rows
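As for the schema question: one option (just a sketch, assuming the stream and rr tibbles above and that the tidyverse is loaded) is to normalise each experiment's table to a shared long schema up front, recording the test parameters as parameter/size pairs instead of test-specific wide columns. Combining experiments is then a plain bind_rows() with no pivoting decisions left to make. The helper name to_long_schema below is purely illustrative.

# Shared schema: experiment, protocol, test, parameter, size, metric, value
to_long_schema <- function(df) {
  df %>%
    pivot_longer(ends_with("size"), names_to = "parameter", values_to = "size")
}

combined <- bind_rows(to_long_schema(stream), to_long_schema(rr))

Whether you pivot before or after binding, the point is that every experiment ends up with the same columns; doing it when the data is first recorded just avoids carrying NA-filled, test-specific columns around.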
Performance test results with Apache Bench.
Performance degrades with increasing concurrency.
The project is here:
https://github.com/ohs30359-nobuhara/nginx-php7-alpine
$ ab -n 50 -c 1 "127.0.0.1/sample.html"
Concurrency Level: 1
Time taken for tests: 0.111 seconds
Complete requests: 50
Failed requests: 0
Total transferred: 11700 bytes
HTML transferred: 550 bytes
Requests per second: 448.50 [#/sec] (mean)
Time per request: 2.230 [ms] (mean)
Time per request: 2.230 [ms] (mean, across all concurrent requests)
Transfer rate: 102.49 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 1 2 0.9 2 6
Waiting: 1 2 0.8 2 5
Total: 1 2 1.0 2 6
Percentage of the requests served within a certain time (ms)
50% 2
66% 2
75% 2
80% 2
90% 3
95% 5
98% 6
99% 6
100% 6 (longest request)
$ ab -n 50 -c 50 "127.0.0.1/sample.html"
Concurrency Level: 50
Time taken for tests: 0.034 seconds
Complete requests: 50
Failed requests: 0
Total transferred: 11700 bytes
HTML transferred: 550 bytes
Requests per second: 1480.56 [#/sec] (mean)
Time per request: 33.771 [ms] (mean)
Time per request: 0.675 [ms] (mean, across all concurrent requests)
Transfer rate: 338.33 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 4 2.1 4 8
Processing: 9 18 5.2 20 24
Waiting: 2 18 5.5 20 24
Total: 9 23 5.6 25 30
Percentage of the requests served within a certain time (ms)
50% 25
66% 26
75% 26
80% 27
90% 29
95% 29
98% 30
99% 30
100% 30 (longest request)
The HTML returned here just displays some plain text; it does not include any JS or CSS.
I wouldn't expect performance to degrade much under this kind of load, so is there a problem with my nginx settings?
$ curl -s -D - https://www.google.com/ -o /dev/null
HTTP/1.1 200 OK
Date: Thu, 29 Oct 2015 05:33:13 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: PREF=ID=1111111111111111:FF=0:TM=1446096793:LM=1446096793:V=1:S=LVeGIvKogvfq6VHi; expires=Thu, 31-Dec-2015 16:02:17 GMT; path=/; domain=.google.com
Set-Cookie: NID=72=sAIx-8ox3_AVxn6ymUjBsKzSmAXLwjNRTcV4Cj9ob1YmLkFc-lSJKvRK1kNdn1lIGruh-wH1_vctiRzKSFTG7IkJHSrVY_At_QbacsYgiI_8EOpMLe2cRIxXINj27DVpgnijGx7tKT1TCDirrunO3Bu0D4DVXz3lB0f42ZyJqOCtOJX2hprvbOOc8P8; expires=Fri, 29-Apr-2016 05:33:13 GMT; path=/; domain=.google.com; HttpOnly
Alternate-Protocol: 443:quic,p=1
Alt-Svc: quic="www.google.com:443"; p="1"; ma=600,quic=":443"; p="1"; ma=600
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked
curl gets a normal response, but Apache Bench has errors for all but one request:
$ ab -n 5 https://www.google.com/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.google.com (be patient).....done
Server Software: gws
Server Hostname: www.google.com
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Document Path: /
Document Length: 18922 bytes
Concurrency Level: 1
Time taken for tests: 1.773 seconds
Complete requests: 5
Failed requests: 4
(Connect: 0, Receive: 0, Length: 4, Exceptions: 0)
Total transferred: 99378 bytes
HTML transferred: 94606 bytes
Requests per second: 2.82 [#/sec] (mean)
Time per request: 354.578 [ms] (mean)
Time per request: 354.578 [ms] (mean, across all concurrent requests)
Transfer rate: 54.74 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 158 179 40.8 162 252
Processing: 132 176 79.0 148 316
Waiting: 81 118 80.5 83 262
Total: 292 354 119.5 310 567
Percentage of the requests served within a certain time (ms)
50% 299
66% 321
75% 321
80% 567
90% 567
95% 567
98% 567
99% 567
100% 567 (longest request)
Why does ab have errors?
Add -l to the command.
It tells ApacheBench not to expect a constant document length for every response.
This should work:
ab -l -n 5 https://www.google.com/
Output:
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.google.com (be patient).....done
Server Software: gws
Server Hostname: www.google.com
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-ECDSA-CHACHA20-POLY1305,256,256
TLS Server Name: www.google.com
Document Path: /
Document Length: Variable
Concurrency Level: 1
Time taken for tests: 0.433 seconds
Complete requests: 5
Failed requests: 0
Total transferred: 67064 bytes
HTML transferred: 62879 bytes
Requests per second: 11.55 [#/sec] (mean)
Time per request: 86.588 [ms] (mean)
Time per request: 86.588 [ms] (mean, across all concurrent requests)
Transfer rate: 151.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 20 20 1.0 20 22
Processing: 63 66 2.7 67 69
Waiting: 62 65 2.8 66 68
Total: 83 86 3.3 87 91
Percentage of the requests served within a certain time (ms)
50% 85
66% 89
75% 89
80% 91
90% 91
95% 91
98% 91
99% 91
100% 91 (longest request)
ab categorized these as length errors because it expects every response to be the same length as the first one. Google presumably returns slightly different dynamic content for its homepage on each request.
I have just built a proof of concept for an ASP.NET MVC controller that (1) generates a barcode from user input using the Barcode Rendering Framework and (2) embeds it in a PDF document using wkhtmltopdf.exe.
Before telling my client it's a working solution, I want to know it's not going to bring down their website. My main concern is long-term reliability: whether, for instance, creating and disposing the unmanaged system process for wkhtmltopdf.exe might leak something. (Peak performance and load are not expected to be much of an issue; only a few requests per minute at peak.)
So I ran a couple of tests from the Windows command line:
(1) 1,000 requests in sequence (i.e. one at a time)
for /l %i in (1,1,1000) do curl ^
"http://localhost:8003/Home/Test?text=Iteration_%i___012345&scale=2&height=50" -o output.pdf
(2) Up to 40 requests sent within 2 seconds
for /l %i in (1,1,40) do start "Curl %i" curl ^
"http://localhost:8003/Home/Test?text=Iteration_%i___012345&scale=2&height=50" -o output%i.pdf
I recorded some performance counters in perfmon before, during, and after: total processes, threads, handles, memory, and disk use, both for the machine as a whole and for the IIS process specifically.
So, my questions:
1) What would you consider acceptable evidence that the solution is at low risk of bringing down the server? Would you amend what I've done, or would you do something completely different?
2) Given my concern is reliability, I think that the 'Before' vs 'After' figures are the ones I most care about. Agree or not?
3) Looking at the Before vs After figures, the only concern I see is the 'Processes Total Handle Count'. I conclude that launching wkhtmltopdf.exe nearly a thousand times has probably not leaked anything or destabilised the machine. But I might be wrong and should run the same tests for hours or days to reduce the level of doubt. Agree or not?
(The risk level: A couple of people's jobs might be on the line if it went pear-shaped. Revenue on the site is £1,000s per hour).
My perfmon results were as follows.
700 Requests in Sequence
Counter                          Before test   During test (1-5 min)     After test
                                               Min      Ave      Max     (10 min)
System
  System Processes               95            97       100      101     95
  System Threads                 1220          1245     1264     1281    1238
  Memory Available MB            4888          4840     4850     4868    4837
  Memory % Committed             23            24       24       24      24
  Processes Total Handle Count   33255         34147    34489    34775   34029
  Processor % Processor Time     4 to 30       40       57       78      1 to 30
  Physical Disk % Disk Time      1             0        7        75      0 to 30
IIS Express
  % Processor Time               0             0        2        6       0
  Handle Count                   610           595      640      690     614
  Thread Count                   34            35       35       35      35
  Working Set                    138           139      139      139     139
  IO Data KB/sec                 0             453      491      691     0
20 Requests sent within 2 seconds followed by 40 Requests sent within 3 seconds
Counter                          Before test   During test (1-5 min)     After test
                                               Min      Ave      Max     (10 min)
System
  System Processes               95            98       137      257     96
  System Threads                 1238          1251     1425     1913    1240
  Memory Available MB            4837          4309     4694     4818    4811
  Memory % Committed             24            24       25       29      24
  Processes Total Handle Count   34029         34953    38539    52140   34800
  Processor % Processor Time     1 to 30       1        48       100     1 to 10
  Physical Disk % Disk Time      0 to 30       0        7        136     0 to 10
IIS Express
  % Processor Time               0             0        1        29      0
  Handle Count                   610           664      818      936     834
  Thread Count                   34            37       50       68      37
  Working Set                    138           139      142      157     141
  IO Data KB/sec                 0             0        186      2559    0
I have written an HTTP PUT client (using the libcurl library) to put a file onto an Apache WebDAV server, captured the packets on the server side with tcpdump, and then analyzed the dump file with tcptrace (www.tcptrace.org). Below is the result:
Host a is the client side, Host b is the server side:
a->b: b->a:
total packets: 152120 total packets: 151974
ack pkts sent: 152120 ack pkts sent: 151974
pure acks sent: 120 pure acks sent: 151854
sack pkts sent: 0 sack pkts sent: 0
dsack pkts sent: 0 dsack pkts sent: 0
max sack blks/ack: 0 max sack blks/ack: 0
unique bytes sent: 3532149672 unique bytes sent: 30420
actual data pkts: 152000 actual data pkts: 120
actual data bytes: 3532149672 actual data bytes: 30420
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data bytes: 0 rexmt data bytes: 0
zwnd probe pkts: 0 zwnd probe pkts: 0
zwnd probe bytes: 0 zwnd probe bytes: 0
outoforder pkts: 0 outoforder pkts: 0
pushed data pkts: 3341 pushed data pkts: 120
SYN/FIN pkts sent: 0/0 SYN/FIN pkts sent: 0/0
req 1323 ws/ts: N/Y req 1323 ws/ts: N/Y
urgent data pkts: 0 pkts urgent data pkts: 0 pkts
urgent data bytes: 0 bytes urgent data bytes: 0 bytes
mss requested: 0 bytes mss requested: 0 bytes
max segm size: 31856 bytes max segm size: 482 bytes
min segm size: 216 bytes min segm size: 25 bytes
avg segm size: 23237 bytes avg segm size: 253 bytes
max win adv: 125 bytes max win adv: 5402 bytes
min win adv: 125 bytes min win adv: 5402 bytes
zero win adv: 0 times zero win adv: 0 times
avg win adv: 125 bytes avg win adv: 5402 bytes
initial window: 15928 bytes initial window: 0 bytes
initial window: 1 pkts initial window: 0 pkts
ttl stream length: NA ttl stream length: NA
missed data: NA missed data: NA
truncated data: 0 bytes truncated data: 0 bytes
truncated packets: 0 pkts truncated packets: 0 pkts
data xmit time: 151.297 secs data xmit time: 150.696 secs
idletime max: 44571.3 ms idletime max: 44571.3 ms
throughput: 23345867 Bps throughput: 201 Bps
RTT samples: 151915 RTT samples: 120
RTT min: 0.0 ms RTT min: 0.1 ms
RTT max: 0.3 ms RTT max: 40.1 ms
RTT avg: 0.0 ms RTT avg: 19.9 ms
RTT stdev: 0.0 ms RTT stdev: 19.8 ms
RTT from 3WHS: 0.0 ms RTT from 3WHS: 0.0 ms
RTT full_sz smpls: 74427 RTT full_sz smpls: 60
RTT full_sz min: 0.0 ms RTT full_sz min: 39.1 ms
RTT full_sz max: 0.3 ms RTT full_sz max: 40.1 ms
RTT full_sz avg: 0.0 ms RTT full_sz avg: 39.6 ms
RTT full_sz stdev: 0.0 ms RTT full_sz stdev: 0.3 ms
post-loss acks: 0 post-loss acks: 0
segs cum acked: 89 segs cum acked: 0
duplicate acks: 0 duplicate acks: 0
triple dupacks: 0 triple dupacks: 0
max # retrans: 0 max # retrans: 0
min retr time: 0.0 ms min retr time: 0.0 ms
max retr time: 0.0 ms max retr time: 0.0 ms
avg retr time: 0.0 ms avg retr time: 0.0 ms
sdv retr time: 0.0 ms sdv retr time: 0.0 ms
According to the result above, the RTT from the client to the server is small, but from the server side to the client side it is large. Can anyone help explain this to me?
Because of this:
unique bytes sent: 3532149672 unique bytes sent: 30420
actual data pkts: 152000 actual data pkts: 120
actual data bytes: 3532149672 actual data bytes: 30420
a->b is sending a steady flow of data, which keeps the buffers full and keeps segments being pushed out and acknowledged promptly.
b->a is only sending a handful of ACKs and small responses, doing next to nothing at all, so its segments sit in buffers for a while before being acknowledged.
Also bear in mind how RTT is measured here: tcptrace takes the time from when a data segment is sent until the ACK for it comes back. Since host a is busy pushing data (and will typically delay the ACK for a lone small segment arriving from b), there is extra time before anything from b gets acknowledged; the ~40 ms full-size RTT on the b->a side is in line with a typical delayed-ACK timer.
Firstly, host b sent very little data, so it is a very small sample size. Secondly, I suspect that host a has an asymmetric Internet connection (e.g. 10M down / 1M up).