524 timeout error with multiple connections - nginx

I am testing server performance with JMeter to mimic a large number of users hitting our DigitalOcean server in a short period of time. When I set JMeter to 200 users and run it against my Laravel-based webpage, everything works fine. When I increase the number of users to 500, I start to get 524 errors.
The server CPU never goes over 10% and memory sits at 30%, so the server has enough headroom. The first couple of hundred requests process correctly, but then the 524 errors begin to appear. The failed requests show higher latency and connect times than the successful ones, as shown in this screenshot. Any clue where I should start looking for the problem?
My conf file in sites-available:
location ~ \.php$ {
    try_files $uri =404;
    fastcgi_pass unix:/var/run/php-fpm/xxxxxxxxxxxx.com.sock;
    fastcgi_index index.php;
}
My nginx.conf settings:
fastcgi_buffers 8 128k;
fastcgi_buffer_size 256k;
client_header_timeout 3000;
client_body_timeout 3000;
fastcgi_read_timeout 3000;
client_max_body_size 32m;
My /etc/sysctl.conf settings, taken from this post after previously getting 504 errors: https://www.digitalocean.com/community/questions/getting-nginx-fpm-sock-error
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same name in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
### IMPROVE SYSTEM MEMORY MANAGEMENT ###
# Increase size of file handles and inode cache
fs.file-max = 2097152
# Do less swapping
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2
### GENERAL NETWORK SECURITY OPTIONS ###
# Number of times SYNACKs are retried for a passive TCP connection
net.ipv4.tcp_synack_retries = 2
# Allowed local port range
net.ipv4.ip_local_port_range = 2000 65535
# Protect Against TCP Time-Wait
net.ipv4.tcp_rfc1337 = 1
# Decrease the default value of tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 15
# Decrease the default keepalive time for connections
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
### TUNING NETWORK PERFORMANCE ###
# Default Socket Receive Buffer
net.core.rmem_default = 31457280
# Maximum Socket Receive Buffer
net.core.rmem_max = 12582912
# Default Socket Send Buffer
net.core.wmem_default = 31457280
# Maximum Socket Send Buffer
net.core.wmem_max = 12582912
# Increase number of incoming connections
net.core.somaxconn = 65535
# Increase number of incoming connections backlog
net.core.netdev_max_backlog = 65535
# Increase the maximum amount of option memory buffers
net.core.optmem_max = 25165824
# Increase the maximum total buffer-space allocatable
# This is measured in units of pages (4096 bytes)
net.ipv4.tcp_mem = 65535 131072 262144
net.ipv4.udp_mem = 65535 131072 262144
# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384
# Increase the write-buffer-space allocatable
net.ipv4.tcp_wmem = 8192 65535 16777216
net.ipv4.udp_wmem_min = 16384
# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
Added this to /etc/security/limits.conf:
nginx soft nofile 2097152
nginx hard nofile 2097152
www-data soft nofile 2097152
www-data hard nofile 2097152
My php-fpm.d conf file settings:
pm = static
pm.max_children = 40
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 8
; number of requests each child handles before respawning. Lower this if you have memory leaks, but each respawn takes time
pm.max_requests = 50
; pm.process_idle_timeout=10
chdir = /
php_admin_value[disable_functions] = exec,passthru,shell_exec,system
php_admin_flag[allow_url_fopen] = on
php_admin_flag[log_errors] = on
php_admin_value[post_max_size] = 8M
php_admin_value[upload_max_filesize] = 8M
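For reference, a quick way to see whether the FPM pool itself is the choke point during the JMeter run (a sketch; it assumes the pool's pm.status_path is enabled and routed through nginx at /status, which is not shown in the config above):
# Poll PHP-FPM's status page once per second while the test runs.
# With pm = static and pm.max_children = 40, seeing all 40 children active
# and a growing "listen queue" means requests are stacking up behind PHP-FPM.
watch -n 1 'curl -s http://127.0.0.1/status'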

HTTP 524 errors are Cloudflare-specific; they are not generated by your own nginx installation. Cloudflare gave up waiting for a response from your backend, most likely because of the low number of FPM children available to serve the requests.
524 A Timeout Occurred
Cloudflare was able to complete a TCP connection to the origin server, but did not receive a timely HTTP response.
If you're performance-testing your own server setup, don't go through the endpoint that points to Cloudflare.
The general "backend timed out" response for HTTP servers is 504 Gateway Timeout.
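If you still want to test over the real hostname while skipping Cloudflare, one option is to pin the hostname to the origin IP (a sketch; example.com and 203.0.113.10 are placeholders for your domain and your droplet's public address):
# Force the request straight to the origin, bypassing Cloudflare's proxy.
curl --resolve example.com:443:203.0.113.10 -s -o /dev/null \
  -w '%{http_code} total=%{time_total}s\n' https://example.com/
In JMeter the equivalent is pointing the HTTP sampler at the origin IP and setting the Host header to the site's hostname.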

Related

What are the deciding factors for the number of TCP ACK packets in a TCP data transmission?

I encountered a counterintuitive phenomenon: the RTT (round-trip time) of a request from a client to a low-network-latency server (ServerL) is higher than the RTT from the same client to a high-network-latency server (ServerH). The interaction between server and client is quite simple: the client sends a large byte stream (1-2 MB) and the server accepts all the bytes and replies with a simple OK return code.
Network latency is measured by ping. Ping latency:
client <-> ServerL: 1ms
client <-> ServerH: 1.9ms
It turned out that the RTT from the client to ServerL is inflated by server acks: ServerL sends many more acks than ServerH does, and the interval between two adjacent acks, seen from the client's side, is sometimes larger than what is measured on the ServerL side.
So, what are the likely deciding factors behind the different numbers of acks sent by these two servers? The socket buffer configuration is the same on both servers, which rules it out as part of the reason. The configuration was checked with the following shell commands:
# sysctl -a | grep congestion
net.ipv4.tcp_allowed_congestion_control = reno cubic
net.ipv4.tcp_available_congestion_control = reno cubic
net.ipv4.tcp_congestion_control = cubic
# sysctl -a | egrep "rmem|wmem|adv_win|moderate"
net.core.rmem_default = 262144000
net.core.rmem_max = 16777216
net.core.wmem_default = 262144000
net.core.wmem_max = 16777216
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
vm.lowmem_reserve_ratio = 256 256 32 0 0
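One way to compare the two servers' ack behavior directly (a sketch; eth0 and 192.0.2.10 are placeholders for your capture interface and server address):
# On the client, capture ACK-bearing segments arriving from ServerL while
# the 1-2 MB upload runs, then count them; repeat against ServerH.
tcpdump -i eth0 -nn 'src host 192.0.2.10 and tcp[tcpflags] & tcp-ack != 0' -w serverL.pcap
tcpdump -nn -r serverL.pcap | wc -l
Differences in counts like this often come down to delayed-ack behavior and NIC offloads (LRO/GRO), which coalesce segments and change how many acks each stack emits.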

Decrease MariaDB query time per request

About
For any single query, the MariaDB server takes approximately 1 ms per request. As concurrency increases, the query time per request also increases, until requests time out. So far it seems only about 2k connections per second are possible for each MySQL server instance, and no amount of config tweaking seems to have any effect. Is there any way to get the query time for each client below 0.1 ms?
This is the query:
select ID from table where id=1;
If it helps, here is the MySQL configuration file:
[client]
port = 3306
socket = /home/user/mysql.sock
[mysqld]
port = 3306
bind-address=127.0.0.1
datadir=/home/user/database
log-error=/home/user/error.log
pid-file=/home/user/mysqld.pid
innodb_file_per_table=1
back_log = 2000
max_connections = 1000000
max_connect_errors = 10
table_open_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 1M
max_heap_table_size = 64M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
thread_cache_size = 8
thread_concurrency = 200
query_cache_size = 64M
query_cache_type = 1 #My settings
innodb_io_capacity = 100000
query_cache_limit = 2M
ft_min_word_len = 4
default-storage-engine = innodb
thread_stack = 240K
transaction_isolation = REPEATABLE-READ
tmp_table_size = 64M
log-bin=mysql-bin
binlog_format=mixed
slow_query_log
long_query_time = 2
server-id = 1
key_buffer_size = 32M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
innodb_buffer_pool_size = 2G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_doublewrite = 0
sync_binlog=0
skip_name_resolve
innodb_write_io_threads = 500
innodb_read_io_threads = 500
innodb_thread_concurrency = 1000
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 120
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M
[mysqlhotcopy]
interactive-timeout
[mysqld_safe]
open-files-limit = 81920
HW
2x Intel Xeon 2670, 32 GB RAM, 500 GB Samsung EVO 850 SSD
Detour
While it's true that MySQL can do more than 1 million queries per second, the test here used only 250 connected clients.
Your machine has 4 cores, correct? So, if you run more than 4 CPU-bound processes simultaneously, the CPU will be saturated. This implies that each thread will be interrupted to let other threads run. That is, latency increases.
Is your goal to shrink the time taken for the average query, that is, latency? Then more connections will not help.
Is your goal queries/second? Then, again, you will be stopped once the CPUs are saturated. That will probably happen before you get to 8 connections. After the CPU is saturated, throughput (queries/second) will level off even as you increase the number of connections. But, as I already said, latency for individual queries will increase.
If you want to push the machine, do multiple queries in each connection. Otherwise, you are only timing the connection handling. This is not a useful metric.
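For example, one way to time queries rather than connection handling (a sketch; the test schema, mytable, and credentials are placeholders) is mysqlslap with many queries per connection:
# Each client runs thousands of statements over one connection, so the
# connection-handling cost is amortized instead of dominating the timing.
mysqlslap --concurrency=8 --number-of-queries=80000 \
  --create-schema=test --query="SELECT ID FROM mytable WHERE id=1"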
If you add more servers (via Replication, Clustering, etc), you can run more queries/second. Ditto for more cores. But nothing will decrease the time taken for an individual query by much.
In the settings, max_connections = 1000000 is ludicrous, and may consume a lot of RAM. As I have said, 8 might be all that your benchmarking can handle.
Another setting... Having the Query Cache turned on is deceptive. It speeds up running the identical SELECT if the table in question has not changed. That is, the first run of a query might take 1.0ms; then all subsequent runs of the same query might take 0.1ms. That is not a very exciting finding. Execute a query twice -- this will give you all you can learn, without firing up any benchmark platform etc.
But most Production machines find the QC to be useless. This is because the data is changing, so the QC is out of date. In fact, the cost of "purging" the QC may make queries run slower!
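To see the query-cache effect described above for yourself (a sketch; mytable is a placeholder, and the query cache must be enabled as in the config above), run the same SELECT twice and watch the hit counter:
# Qcache_hits increments only when the second, identical SELECT is served
# from the cache; a write to the table invalidates the cached result.
mysql -e "SHOW STATUS LIKE 'Qcache_hits'"
mysql test -e "SELECT ID FROM mytable WHERE id=1"
mysql test -e "SELECT ID FROM mytable WHERE id=1"
mysql -e "SHOW STATUS LIKE 'Qcache_hits'"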
If you want lots of connections for reading, Replication Slaves can provide an unlimited number of connections. I used to work with a system with 23 Slaves; that gave 23x the connections. Booking.com has systems with well over 100 Slaves. That's how you can look up hotel availability so fast.
Please back up and think what your real goal is. Then we can discuss things further.

WordPress app runs very slowly on WampServer 2.5 localhost

I have made a lot of changes, but the home page always takes 5 seconds to load, and the backend is even slower.
My PC is a Core 2 Duo 2 GHz CPU with 2 GB RAM, Windows 7 32-bit, WampServer 2.5.
I have disabled cgi_module
I have changed localhost to 127.0.0.1 in System32/etc/hosts
I have updated define('DB_HOST', '127.0.0.1'); in wp-config
I have increased the memory limit in php.ini to 512M
and I restarted the server, but nothing changed
Note that I work with the WooCommerce plugin,
and I have two other apps without a CMS (native PHP, MySQL); they work fast.
UPDATE:
I have made these MySQL tuning updates:
# Set innodb_buffer_pool_size up to 50-80% of RAM, but beware of setting memory usage too high
innodb_buffer_pool_size = 1024M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 256M
innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
[isamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
[mysqld]
port=3306
explicit_defaults_for_timestamp = TRUE
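One quick check before more tuning (a sketch; the path is a placeholder for your local WordPress URL): curl's timing breakdown shows whether the 5 seconds are spent on DNS, connecting, or PHP/MySQL generating the page:
# On Windows, NUL discards the body; time_starttransfer is the
# time-to-first-byte, i.e. roughly the page-generation time.
curl -o NUL -s -w "dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n" http://127.0.0.1/wordpress/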

NGINX: Download stops after 1GB - upstream timed out (110: Connection timed out)

I have a huge problem on my site.
Please help me fix it.
I have a site where users can download files from various other sites (e.g. one-click hosters like uploaded.net). We act as a proxy: the user generates a link and downloads the file directly; our script downloads nothing to the server. A bit like a premium link generator, but different. AND NOT ILLEGAL.
If a user downloads a file larger than 1 GB, the download is canceled when it reaches 1 GB.
In the log files I repeatedly found the error
"upstream timed out (110: Connection timed out) while reading response"
I have tried raising the settings, but that didn't help.
I tried the following:
1. nginx.conf:
fastcgi_send_timeout 300s;
fastcgi_read_timeout 300s;
2. nginx host file:
fastcgi_read_timeout 300;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 256k;
3. PHP.ini:
max_execution_time = 60 (but my PHP script sets it automatically to 0)
max_input_time = 60
memory_limit = 128M
4. PHP-FPM >> www.conf
pm.max_children = 25
pm.start_servers = 2
pm.min_spare_servers = 2
pm.max_spare_servers = 12
request_terminate_timeout = 300s
But nothing helps. What can I do to fix this problem?
Server/nginx info:
Memory: 32079MB
CPU: model name: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz (8 cores)
PHP: PHP 5.5.15-1~dotdeb.1 (cli) (built: Jul 24 2014 16:44:04)
NGINX: nginx/1.2.1
nginx.conf:
worker_processes 8;
worker_connections 2048;
But I don't think the time settings matter, because the download stops at exactly 1,604,408 KB every time. If I download at 20 KB/s the download takes longer, but it still cancels at exactly 1,604,408 KB.
Thank you for any help.
If you need more information, please ask.
I had a similar problem, where the download would stop at 1024 MB with the error
readv() failed (104: Connection reset by peer) while reading upstream
Adding this to the nginx.conf file helped:
fastcgi_max_temp_file_size 1024m;
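The fixed cutoff in the question points the same way: nginx buffers the upstream response to a temporary file capped by fastcgi_max_temp_file_size (1024m by default), so a slow client can stall the upstream read once that cap is reached. A way to watch it happen (a sketch; the temp path and URL are placeholders, and the temp path varies by build):
# Watch nginx's fastcgi temp directory grow while a >1 GB download runs
# at a throttled rate, and note the byte count where it dies.
watch -n 5 'du -sh /var/lib/nginx/fastcgi'
curl -o /dev/null --limit-rate 500k -w '%{size_download} bytes\n' 'http://example.com/download?id=123'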

502 Gateway Errors under High Load (nginx/php-fpm)

I work for a rather busy internet site that often gets very large spikes of traffic. During these spikes hundreds of pages per second are requested, and this produces random 502 gateway errors.
We run nginx (1.0.10) and PHP-FPM on a machine with 4x 15k SAS drives (RAID 10), a 16-core CPU, and 24 GB of DDR3 RAM. We also use the latest XCache version. The DB is located on another machine, but that machine's load is very low and it has no issues.
Under normal load everything runs perfectly: system load is below 1, and the PHP-FPM status report never shows more than 10 active processes at a time. There is always about 10 GB of RAM still available. Under normal load the machine handles about 100 pageviews per second.
The problem arises when huge spikes of traffic arrive and hundreds of page views per second are requested from the machine. I notice that FPM's status report then shows up to 50 active processes, but that is still way below the 300 max children we have configured. During these spikes nginx status reports up to 5000 active connections instead of the normal average of 1000.
OS Info: CentOS release 5.7 (Final)
CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (16 cores)
php-fpm.conf
daemonize = yes
listen = /tmp/fpm.sock
pm = static
pm.max_children = 300
pm.max_requests = 1000
I have not set rlimit_files, because as far as I know it uses the system default if you don't.
fastcgi_params (only added values to standard file)
fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 128k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
fastcgi_intercept_errors on;
fastcgi_pass unix:/tmp/fpm.sock;
nginx.conf
worker_processes 8;
worker_connections 16384;
sendfile on;
tcp_nopush on;
keepalive_timeout 4;
Nginx connects to FPM via Unix Socket.
sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 1
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.eth0.rp_filter=1
net.ipv4.conf.lo.rp_filter=1
net.ipv4.ip_conntrack_max = 100000
limits.conf
* soft nofile 65536
* hard nofile 65536
These are the results for the following commands:
ulimit -n
65536
ulimit -Sn
65536
ulimit -Hn
65536
cat /proc/sys/fs/file-max
2390143
Question: If PHP-FPM is not running out of connections, the load is still low, and there is plenty of RAM available, what bottleneck could be causing these random 502 gateway errors during high traffic?
Note: by default this machine's ulimits were 1024; since I changed them to 65536 I have not fully rebooted the machine, as it's a production machine and a reboot would mean too much downtime.
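Given the no-reboot caveat, it's worth confirming the running daemons actually picked up the new limit (a sketch; assumes a kernel that exposes /proc/<pid>/limits, i.e. 2.6.24 or later):
# A login shell's ulimit can differ from what long-running daemons were
# started with; read the live processes' limits directly.
for pid in $(pidof nginx php-fpm); do
  grep 'Max open files' /proc/$pid/limits
done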
This should fix it...
You have:
fastcgi_buffers 4 256k;
Change it to:
fastcgi_buffers 256 16k; # 4096k total
Also set fastcgi_max_temp_file_size 0, which disables buffering to disk if replies start to exceed your fastcgi buffers.
A Unix socket accepts 128 connections by default, so it is good to put this line into /etc/sysctl.conf:
net.core.somaxconn = 4096
If that doesn't help in some cases, use a normal TCP port bind instead of a socket, because a socket at 300+ concurrent connections can block new requests, forcing nginx to return 502.
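To confirm the backlog theory during a spike (a sketch; the counters' exact wording varies by kernel):
# Overflows here mean connections are being dropped at the accept queue.
netstat -s | grep -i -E 'overflowed|listen'
# PHP-FPM also has its own pool backlog; a hypothetical matching setting:
# listen.backlog = 4096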
@Mr. Boon
I have 8 cores and 14 GB RAM, but the system still gives gateway time-outs very often. Implementing the fix above (fastcgi_buffers 256 16k; plus fastcgi_max_temp_file_size 0) also didn't solve the issue. Still searching for better fixes.
Thanks.
