We are trying to build an HA Kubernetes cluster with 3 core nodes, each having the full set of vital components: etcd + APIServer + Scheduler + ControllerManager, plus an external balancer. Since etcd can form a cluster on its own, we are stuck with making the APIServers HA. What seemed an obvious task a couple of weeks ago has now become a "no way disaster"...
We decided to use nginx as a balancer in front of 3 independent APIServers. All the remaining parts of our cluster that communicate with the APIServer (kubelets, kube-proxies, Schedulers, ControllerManagers...) are supposed to use the balancer to access it. Everything went well until we started the "destructive" tests (as I call them) with some pods running.
Here is the part of the APIServer config that deals with HA:
.. --apiserver-count=3 --endpoint-reconciler-type=lease ..
Here is our nginx.conf:
user nginx;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
worker_processes auto;
events {
multi_accept on;
use epoll;
worker_connections 4096;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
#tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
gzip on;
underscores_in_headers on;
include /etc/nginx/conf.d/*.conf;
}
And apiservers.conf:
upstream apiserver_https {
least_conn;
server core1.sbcloud:6443; # max_fails=3 fail_timeout=3s;
server core2.sbcloud:6443; # max_fails=3 fail_timeout=3s;
server core3.sbcloud:6443; # max_fails=3 fail_timeout=3s;
}
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
server {
listen 6443 ssl so_keepalive=1m:10s:3; # http2;
ssl_certificate "/etc/nginx/certs/server.crt";
ssl_certificate_key "/etc/nginx/certs/server.key";
expires -1;
proxy_cache off;
proxy_buffering off;
proxy_http_version 1.1;
proxy_connect_timeout 3s;
proxy_next_upstream error timeout invalid_header http_502; # non_idempotent # http_500 http_503 http_504;
#proxy_next_upstream_tries 3;
#proxy_next_upstream_timeout 3s;
proxy_send_timeout 30m;
proxy_read_timeout 30m;
reset_timedout_connection on;
location / {
proxy_pass https://apiserver_https;
add_header Cache-Control "no-cache";
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $http_host;
proxy_set_header Authorization $http_authorization;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-SSL-CLIENT-CERT $ssl_client_cert;
}
}
What came out after some tests is that Kubernetes seems to use a single long-living connection instead of traditional open-close sessions. This is probably due to SSL. So we had to increase proxy_send_timeout and proxy_read_timeout to a ridiculous 30m (the default value for the APIServer is 1800s). If these settings are under 10m, then all clients (like the Scheduler and ControllerManager) generate tons of INTERNAL_ERROR messages because of broken streams.
So, for the crash test I simply took one of the APIServers down by gently switching it off. Then I restarted another one, so nginx saw that the upstream went down and switched all current connections to the last one. A couple of seconds later the restarted APIServer came back, and we had 2 APIServers working. Then I took the network down on the third APIServer by running 'systemctl stop network' on that server, so it had no chance to inform Kubernetes or nginx that it was going down.
Now the cluster is totally broken! nginx seems to recognize that the upstream went down, but it will not reset the already existing connections to the upstream that is dead. I can still see them with 'ss -tnp'. If I restart the Kubernetes services, they reconnect and continue to work; the same happens if I restart nginx - new sockets show up in the ss output.
This happens only if I make the APIServer unavailable by taking the network down (preventing it from closing existing connections to nginx and informing Kubernetes that it is switching off). If I just stop it, everything works like a charm. But this is not a realistic case: a server can go down without any warning, just instantly.
What are we doing wrong? Is there a way to force nginx to drop all connections to an upstream that went down? Anything to try before we move to HAProxy or LVS and throw away a week of kicking nginx in our attempts to make it balance instead of breaking our not-so-HA cluster?
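For reference, the next variant we plan to test simply re-enables the health-check parameters that are currently commented out in apiservers.conf, in the hope that nginx marks a dead peer as failed faster (just a sketch; the values are guesses and we have not verified that this does anything to already established connections):
upstream apiserver_https {
    least_conn;
    server core1.sbcloud:6443 max_fails=3 fail_timeout=3s;
    server core2.sbcloud:6443 max_fails=3 fail_timeout=3s;
    server core3.sbcloud:6443 max_fails=3 fail_timeout=3s;
}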
Related
Trying to configure Nginx for two purposes:
Reverse proxy to redirect requests to a local Tomcat server (port 443 proxied to port 10443, listened on by Tomcat)
Mirror requests to a backend server for analysis purposes
Since we encountered very low performance using the default configuration and the mirror directive, we decided to try just the reverse proxy to check whether there is an impact on the server, and indeed it seems like nginx is capping the traffic by almost half (we are using Locust and JMeter as load tools).
Nginx version: 1.19.4
Worked through 10-tips-for-10x-application-performance & Tuning NGINX for Performance, to no avail.
The machine nginx & Tomcat run on should be strong enough (EC2 c5.4xlarge) and we don't see a lack of resources, but rather something like network capping. There is a very high count of TIME_WAIT connections (20k-40k).
From the machine perspective:
Increased net port range (1024 65300)
Lowered tcp_fin_timeout (15ms)
Increased max FD to the max
Nginx perspective (the full nginx.conf follows the list):
keepalive_requests 100000;
keepalive_timeout 1000;
worker_processes 10; (16 is the CPU count)
worker_connections 3000;
worker_rlimit_nofile 100000;
nginx.conf:
user nginx;
worker_processes 10;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
worker_rlimit_nofile 100000;
events {
worker_connections 3000;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$host" sn="$server_name" '
'rt=$request_time '
'ua="$upstream_addr" us="$upstream_status" '
'ut="$upstream_response_time" ul="$upstream_response_length" '
'cs=$upstream_cache_status' ;
keepalive_requests 100000;
keepalive_timeout 1000;
ssl_session_cache shared:SSL:10m;
sendfile on;
#tcp_nopush on;
#gzip on;
include /etc/nginx/conf.d/*.conf;
upstream local_host {
server 127.0.0.1:10443;
keepalive 128;
}
server {
listen 443 ssl;
ssl_certificate /etc/ssl/nginx/crt.pem;
ssl_certificate_key /etc/ssl/nginx/key.pem;
location / {
# mirror /mirror;
proxy_set_header Host $host;
proxy_pass https://local_host$request_uri;
}
# Mirror configuration
location = /mirror {
internal;
proxy_set_header Host test-backend-dns;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_connect_timeout 3s;
proxy_read_timeout 100ms;
proxy_send_timeout 100s;
proxy_pass https://test-backend-ip:443$request_uri;
}
}
}
We also monitor using the Amplify agent; it seems like the connection count matches the expected requests and connections, but the actual request count is low.
Amplify monitor output
Seems like a simple task for Nginx, but something is misconfigured.
Thank you for your answers
After many attempts and ways of figuring things out, we came to the conclusion that the response time from the application was higher with nginx in front.
Our assumption, and how we eventually overcame this issue, was SSL termination.
This is an expensive operation, both resource-wise and time-wise.
What we did was to have nginx (which is more than capable of handling a much higher load than what we hit it with, ~4k RPS) be responsible solely for the SSL termination, and we changed the Tomcat app configuration so that it listens for HTTP requests rather than HTTPS.
This dramatically reduced the TIME_WAIT connections that were piling up and taking important resources from the server.
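For illustration, the nginx side after the change looks roughly like this (a trimmed sketch, not our full file; port 8080 matches the Tomcat Connector shown below, and the proxy_http_version/Connection lines are our assumption for keeping upstream connections reusable):
upstream local_host {
    # Tomcat now listens for plain HTTP on 8080 (see the Connector below)
    server 127.0.0.1:8080;
    keepalive 128;
}
server {
    listen 443 ssl;
    ssl_certificate     /etc/ssl/nginx/crt.pem;
    ssl_certificate_key /etc/ssl/nginx/key.pem;
    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        # plain HTTP to the upstream; nginx only terminates SSL
        proxy_pass http://local_host;
    }
}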
Final configurations for nginx, tomcat & the kernel:
linux machine configuration:
- /proc/sys/net/ipv4/ip_local_port_range - set to 1024 65535
(allows more ports hence ---> more connections)
- sysctl net.ipv4.tcp_timestamps=1
(""..reduce performance spikes related to timestamp generation..")
- sysctl net.ipv4.tcp_tw_recycle=0
(This worked for us. Should be tested with/without tcp_tw_reuse)
- sysctl net.ipv4.tcp_tw_reuse=1
(Same as tw_recycle)
- sysctl net.ipv4.tcp_max_tw_buckets=10000
(self explanatory)
Redhat explanation for tcp_timeouts conf
Tomcat configuration:
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
maxThreads="4000"
minSpareThreads="10"
/>
<!-- A "Connector" using the shared thread pool - NO SSL -->
<Connector executor="tomcatThreadPool"
port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
acceptCount="5000"
pollerThreadCount="16"
acceptorThreadCount="16"
redirectPort="8443"
/>
Nginx specific performance params configuration (collected into a single snippet after this list):
main directive:
- worker_processes auto;
- worker_rlimit_nofile 100000;
events directive:
- worker_connections 10000; (we think can be lower)
- multi_accept on;
http directive:
- keepalive_requests 10000;
- keepalive_timeout 10s;
- access_log off;
- ssl_session_cache shared:SSL:10m;
- ssl_session_timeout 10m;
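Put into context (a sketch assembled purely from the list above, with everything else omitted), these land in nginx.conf as:
# main context
worker_processes auto;
worker_rlimit_nofile 100000;

events {
    worker_connections 10000;
    multi_accept on;
}

http {
    keepalive_requests 10000;
    keepalive_timeout 10s;
    access_log off;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    # ... remaining http settings as in the original nginx.conf ...
}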
It really helps to understand both sides of the equation: nginx and Tomcat.
We used JMX metrics to understand what is going on in Tomcat, alongside Prometheus metrics from our app.
And the Amplify agent to monitor nginx behavior.
Hope that helps anyone.
I am trying to use nginx for load balancing. I have to use ip_hash because I work with WebSockets. Following is the configuration:
#user nobody;
worker_processes 3;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
upstream my_http_servers {
ip_hash;
server 127.0.0.1:3001;
server 127.0.0.1:3004;
server 127.0.0.1:3003;
}
server {
listen 3000;
server_name localhost;
#charset koi8-r;
#access_log logs/host.access.log main;
location / {
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_pass http://my_http_servers;
# enable WebSockets
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
}
Now I have all 3 servers and nginx running locally on machine1 (IP: 192.168.10.2).
I also have a frontend application which calls this backend server. My frontend runs on 192.168.10.2:4200.
When I call http://192.168.10.2:4200 from machine1, it goes to, say, server1.
From machine2, which is connected to the same WiFi (IP: 192.168.10.23), I call http://192.168.10.2:4200, but it still goes to server1.
ip_hash is not doing the load balancing correctly. I am not sure what I am doing wrong. I understand ip_hash gives a sticky connection, so all requests from machine1 should go to server1, but shouldn't requests from machine2 go to some other server?
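To double-check which backend each request actually reaches, I can also log the upstream address (a small sketch using standard nginx variables; the log file name is arbitrary):
# added inside the existing http block
log_format upstreamlog '$remote_addr -> $upstream_addr [$time_local] "$request"';

# added inside the existing server block
access_log logs/upstream.log upstreamlog;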
Edit:
I even tried using hash $remote_addr; instead of ip_hash, but still all requests go to the same single server. This is my configuration using hash:
worker_processes 3;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
keepalive_timeout 65;
upstream my_http_servers {
hash $remote_addr;
server 127.0.0.1:3001;
server 127.0.0.1:3002;
server 127.0.0.1:3003;
}
server {
listen 3000;
server_name localhost;
location / {
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_pass http://my_http_servers;
# enable WebSockets
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
}
According to the docs:
The first three octets of the client IPv4 address, or the entire IPv6 address, are used as a hashing key.
So, as an example, all addresses of the form 192.168.1.* will be mapped to the same server.
If your server is running in your office network and both of the machines you tested are also connected to that office network, it probably won't work, because office networks are usually configured so that all devices get IP addresses with the same first three octets.
If both machines are on the same office network, they probably also share the same external IP, so they will again be mapped to the same server.
Even if you run the two machines with genuinely different external IPs whose first three octets differ, there is still roughly a 33% chance (with three upstream servers) that hashing two different IPs lands both of them on the same server.
But if you use "hash" directive instead of "ip_hash" then you can combine several request variables into hash calculation. Example:
hash '$remote_addr $cookie_zzz $http_user_agent';
When you use remote IP addresses in directive "hash" , they (IP addresses) are treated as ordinary variables and can be used for round-robin upstreaming.
hash '$remote_addr';
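For instance, applied to the upstream from the question, a combined key could look like this (purely an illustrative sketch; pick whatever variables actually differ between your clients):
upstream my_http_servers {
    hash '$remote_addr $http_user_agent';
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}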
We are connecting to a system wherein 4 ports are exposed to serve gRPC requests. We used nginx as a load balancer to forward the 4 client gRPC requests with the configuration below:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 768;
# multi_accept on;
}
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent"';
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
upstream backend {
#least_conn;
server localhost:9000 weight=1 max_conns=1;
server localhost:9001 weight=1 max_conns=1;
server localhost:9002 weight=1 max_conns=1;
server localhost:9003 weight=1 max_conns=1;
}
server {
listen 80 http2;
access_log /tmp/access.log main;
error_log /tmp/error.log error;
proxy_buffering off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header Host $http_host;
location / {
#keepalive_timeout 0;
grpc_pass grpc://backend;
grpc_pass_header userid;
grpc_pass_header transid;
}
}
}
It is observed that sometimes all 4 client requests go to all 4 ports, but sometimes (say 30% of the time) they go to only 2 or 3 ports. It seems like the default round robin is not happening in nginx as expected. We tried all the possibilities like max_conns, least_conn, and weight, but no luck.
It seems like I have encountered the same issue as in the links below:
https://serverfault.com/questions/895116/nginx-round-robin-nor-exactly-round-robin
https://stackoverflow.com/questions/40859396/how-to-test-load-balancing-in-nginx
When I was going through Quora, I found that the "fair" module in nginx might resolve this:
"The Nginx fair proxy balancer enhances the standard round-robin load balancer provided with Nginx so that it will track busy back end servers (e.g. Thin, Ebb, Mongrel) and balance the load to non-busy server processes."
https://www.quora.com/What-is-the-best-way-to-get-Nginx-to-do-smart-load-balancing
I tried using "fair" module with NGINX from source but encountered so many issues. I could not start the NGINX itself. Can anyone help with this issue?
We got the answer! We just changed "worker_processes auto;" to "worker_processes 1;" and now it is working fine.
All the requests are load balanced properly. Our feeling is that with more than a single worker, multiple workers may send requests to the same port, since each worker keeps its own balancing state.
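An alternative we did not try, but which targets exactly this per-worker state, is giving the upstream a shared memory zone so the workers share the balancing state (a sketch only; the zone size is a guess, and our max_conns/weight experiments are left out):
upstream backend {
    # shared memory zone so worker processes share the upstream's run-time state
    zone backend_zone 64k;
    server localhost:9000;
    server localhost:9001;
    server localhost:9002;
    server localhost:9003;
}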
I don't know exactly why this is happening, but it may have something to do with the browser.
I encountered the same problem when I was using the browser to send the requests. When I sent the requests from the terminal using curl, it worked fine.
Problem Statement:
I want to set up an active failover using nginx plus (I subscribed for the 30-day trial).
All requests should go to the primary server; only if that goes down (404) should the requests go to the second server. Once the primary is back up, requests should go back to the original server. Is this possible?
With the help of other threads I was able to create the following config file. I tried proxy_next_upstream with almost all the error codes I could find, but I am still not able to achieve the intended result.
I brought down the primary server manually so it returns 404. It briefly returns 503 while it is going down. But still no luck with redirecting the traffic.
Both servers are hosted on IBM Bluemix as Node.js apps. I can share more details if needed.
upstream up1 {
server up_server1;
}
upstream up2 {
server up_server2;
}
server {
listen 80;
location / {
proxy_pass http://up1;
proxy_next_upstream non_idempotent invalid_header error timeout http_500 http_502 http_504 http_403 http_404;
}
}
This is governed by another config file, which looks like the one below. Just to give more info:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
# geoip_city /etc/nginx/geoip/GeoLiteCity.dat;
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
# TCP/UDP proxy and load balancing block
#
#stream {
# Example configuration for TCP load balancing
#upstream stream_backend {
# zone tcp_servers 64k;
# server backend1.example.com:12345;
# server backend2.example.com:12345;
#}
#server {
# listen 12345;
# status_zone tcp_server;
# proxy_pass stream_backend;
#}
#}
Your problem is that you do not use the upstream up2 in any way. proxy_next_upstream means the next server in your current upstream, which is defined as "up1" in proxy_pass http://up1; it does not magically pick any other upstream.
So what you want is to delete upstream up2 and only leave:
upstream up1 {
server up_server1;
server up_server2;
}
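If you also want up_server2 to receive traffic only while up_server1 is unavailable, and traffic to move back once it recovers, the usual way to express that is the backup flag (a sketch; the max_fails/fail_timeout values are placeholders to tune):
upstream up1 {
    server up_server1 max_fails=1 fail_timeout=10s;
    server up_server2 backup;   # only used while up_server1 is considered failed
}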
I was able to resolve this issue with a small tweak, which was missed in almost all the answers posted on Stack Overflow. When you define 2 different upstreams for use with proxy_next_upstream, you need to add the second server as a second entry in your original upstream directive as well. Check out the code below.
upstream up1 {
server up_server1;
server up_server2; # this entry is important, but not sure why!
}
upstream up2 {
server up_server2;
}
I have an inner server that runs my application. This application runs on port 9001. I want people to access this application through nginx, which runs on an Ubuntu machine in a DMZ network.
I built nginx from source with the sticky and SSL modules enabled. It runs fine, but the proxy pass does not work.
The DNS name for the outer IP of the server is bd.com.tr, and I want people to see the page http://bd.com.tr/public/control.xhtml when they enter bd.com.tr. But even though nginx redirects the root request to my desired path, the application does not show up.
My nginx.conf file is:
worker_processes 4;
error_log logs/error.log;
worker_rlimit_nofile 20480;
pid logs/nginx.pid;
events {
worker_connections 1900;
}
http {
include mime.types;
default_type application/octet-stream;
server_tokens off;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
keepalive_timeout 75;
rewrite_log on;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Ssl on;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 150;
server {
listen 80;
client_max_body_size 300M;
location = / {
rewrite ^ http://bd.com.tr/public/control.xhtml redirect;
}
location /public {
proxy_pass http://BACKEND_IP:9001;
}
}
}
What might I be missing?
It was a silly problem and I found it. The conf file is correct, so you can use it if you want. The problem was that port 9001 of the BACKEND_IP was not forwarded, and thus nginx was not able to reach the inner service. After forwarding the port, it worked fine. I found the problem in error.log, so if you encounter such a problem, please check the error logs first :)