Backup nginx server returning 504 - nginx

I have 3 nginx servers set up. The backup web server and the home server both have identical ../sites-enabled and ../sites-available directories, and the third server acts as a load balancer that points to both the backup and the home server with this config:
upstream myapp1 {
    server 1.1.1.1;        # home server
    server 2.2.2.2 backup; # backup server
}

server {
    listen 80;

    location / {
        proxy_pass http://myapp1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
But I am having an issue (illustrated below) when testing whether the backup server is working: it only seems to work when the home server is up!
1. test.foo.com -> Backup Web Server
2. foo.com -> Load Balancer
3. www.foo.com -> Home server
-> means points to
When Nothing is down:
- 1 returns OK
- 2 returns OK
- 3 returns OK
When Home Server is down:
- 1 returns 504 **(SHOULD BE OK)**
- 2 returns 504 **(SHOULD BE OK)**
- 3 returns DNS error
When Load Balancer is down:
- 1 returns OK
- 2 returns DNS error
- 3 returns OK
When Backup Web Server is down:
- 1 returns DNS error
- 2 returns 200
- 3 returns 200

You seem to be confused about the terminology here:
When the load balancer is down, you'd be getting connect(2) "Connection refused" or "Operation timed out"-style errors; you would not be getting DNS errors.
Likewise, the fact that you're getting a 504 from your upstream home server means that it is NOT down; your backup server therefore never gets used, because nginx only uses the backup if the primary server is really not available.
You could potentially fix the second issue by getting the commercial version of nginx (NGINX Plus), which supports the health_check directive.
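For reference, a minimal sketch of what that could look like with NGINX Plus (the zone size and the interval/fails/passes values here are assumptions to adjust for your setup; health_check is not available in open-source nginx):

upstream myapp1 {
    zone myapp1 64k;         # health checks need the upstream in a shared memory zone
    server 1.1.1.1;          # home server
    server 2.2.2.2 backup;   # backup server
}

server {
    listen 80;

    location / {
        proxy_pass http://myapp1;
        # NGINX Plus only: actively probe the peers; mark a peer unhealthy
        # after 3 failed checks and healthy again after 2 successful ones
        health_check interval=10 fails=3 passes=2;
    }
}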
Alternatively, you could implement caching and use proxy_cache_use_stale to specify that a cached version should be returned instead. Take a look at error_page as well.
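A rough sketch of that caching route (the cache path, zone name, validity period and the fallback target are assumptions, not a tested configuration for your app):

proxy_cache_path /var/cache/nginx/myapp1 keys_zone=myapp1_cache:10m max_size=1g;

upstream myapp1 {
    server 1.1.1.1;          # home server
    server 2.2.2.2 backup;   # backup server
}

server {
    listen 80;

    location / {
        proxy_pass http://myapp1;
        proxy_cache myapp1_cache;
        proxy_cache_valid 200 10m;   # cache successful responses so stale copies exist
        # serve a stale cached copy instead of the error when the upstream
        # errors out, times out, or answers 5xx
        proxy_cache_use_stale error timeout http_502 http_503 http_504;
        # and/or hand 504s off to a location of your choosing
        error_page 504 = @fallback;
    }

    location @fallback {
        # hypothetical fallback: go straight to the backup machine
        proxy_pass http://2.2.2.2;
    }
}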

Related

Upstream server always redirects back to root path in Nginx reverse proxy configuration

I plan to run a pgAdmin4 instance behind an Nginx reverse proxy.
What I want is requests coming to http://myhost.com/pgadmin4 should be forwarded to upstream http://localhost:8087 where pgAdmin is listening.
To achieve this, I followed this nginx.conf recipe from pgAdmin's official doc. Here's the snippet:
server {
    listen 80;
    server_name myhost.com;

    location /pgadmin4/ {
        proxy_set_header X-Script-Name /pgadmin4;
        proxy_set_header Host $host;
        proxy_pass http://localhost:8087/;
        proxy_redirect off;
    }
}
Everything works fine except that after a successful login, the server sends an HTTP 301 response with the Location header set to the root path (i.e. "Location: /") and bam, the user agent is redirected to http://myhost.com/ where nothing is waiting (except the nginx default page, for now).
Retyping the URL as http://myhost.com/pgadmin4/ still works. The user agent's state, cookies and all are set, and the user can continue as normal. It's just a mild annoyance for end users to have to retype the whole URL again.
I know that I can alter upstream's HTTP redirect response by using proxy_redirect directive, but I can't figure out what the value should be.
Is what I'm trying to do achievable just by Nginx configuration? Is there any specific pgAdmin4 config that I need to change?
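Not an authoritative answer, but one thing that may be worth trying (assuming pgAdmin keeps issuing redirects relative to its own root, like the "Location: /" you are seeing) is replacing proxy_redirect off with a mapping that puts such redirects back under the prefix:

location /pgadmin4/ {
    proxy_set_header X-Script-Name /pgadmin4;
    proxy_set_header Host $host;
    proxy_pass http://localhost:8087/;
    # rewrite "Location: /..." from the upstream into "Location: /pgadmin4/..."
    proxy_redirect / /pgadmin4/;
}

One caveat: if the upstream ever emits a Location header that already starts with /pgadmin4/, this rule would double the prefix, so it only helps if the redirects are consistently root-relative.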

actioncable: subscription to channel fails in production with ssl

JRuby 9.3.6 (hence Ruby 2.6.8), Rails 6.1.6.1, in production using SSL (wss) with Devise, Puma, nginx.
Locally ActionCable runs without problems, but on the external server ActionCable establishes a WebSocket connection (nginx logs: "GET /cable HTTP/1.1" 101 210), the user gets verified correctly from the expanded session_id in the cookie, and after "Successfully upgraded to WebSocket" the browser receives a {"type":"welcome"} and pings.
The ActionCable JavaScript then sends a request to subscribe to a channel, but that doesn't succeed.
I tried a lot regarding the nginx configuration, e.g. switching to Passenger for the ActionCable location /cable in nginx, and after 5 days I even switched from calling the server side of ActionCable with a plain "new WebSocket" in JavaScript to the client-side implementation of ActionCable, as it is designed to be used (using import maps and actioncable.esm.js), but it didn't solve the main problem.
The logfiles:
production.log:
[ActionCable connect in app/channels/application_cable/connection.rb:] WebSocket error occurred: Broken pipe -
Once domain.com has been called, this error occurs every 0.2 seconds. If the page is closed, the error continues. The 0.2 seconds is the same frequency with which the browser sends the requests to subscribe to the channel. For a long time I assumed it had to do with SSL, but I can't pin down the problem. So now I assume that the "broken pipe" is a problem between the JRuby app and ActionCable on the server side. But I am not sure, and I actually don't know how to troubleshoot there.
Additionally, there is a warning in puma.stderr.log:
warning: thread "Ruby-0-Thread-27:
/home/my_app/.rbenv/versions/jruby-9.3.6.0/lib/ruby/gems/shared/gems/actioncable-6.1.6.1/lib/action_cable/connection/stream_event_loop.rb:75" terminated with exception (report_on_exception is true):
ArgumentError: mode not supported for this object: r
starting redis-cli and doing 'monitor':
1660711192.522723 [1 127.0.0.1:33630] "select" "1"
1660711192.523545 [1 127.0.0.1:33630] "client" "setname" "ActionCable-PID-199512"
publishing to MessagesChannel_1 works:
1660711192.523831 [1 127.0.0.1:33630] "publish" "messages_1" "{\"message\":\"message-text\"}"
In comparison, in the local development configuration this looks different:
1660712957.712189 [1 127.0.0.1:46954] "select" "1"
1660712957.712871 [1 127.0.0.1:46954] "client" "setname" "ActionCable-PID-18600"
1660712957.713495 [1 127.0.0.1:46954] "subscribe" "_action_cable_internal"
1660712957.716100 [1 127.0.0.1:46954] "subscribe" "messages_1"
1660712957.974486 [1 127.0.0.1:46952] "publish" "messages_3" "{\"message\":\"message-text\"}"
So what is "_action_cable_internal", and why doesn't it take place in production?
I found the code for the actioncable gem, added 'p @pubsub' at the end of the def pubsub method in gems/actioncable-6.1.6.1/lib/action_cable/server/base.rb, and compared that information with the local configuration.
Locally there is this info:
@thread=#<Thread:0x5326bff6@/home/me_the_user/.rbenv/versions/jruby-9.2.16.0/lib/ruby/gems/shared/gems/actioncable-6.1.6.1/lib/action_cable/connection/stream_event_loop.rb:75 sleep>
which corresponds to the info at the server:
@thread=#<Thread:0x5f4462e1@/home/me_the_user/.rbenv/versions/jruby-9.3.6.0/lib/ruby/gems/shared/gems/actioncable-6.1.6.1/lib/action_cable/connection/stream_event_loop.rb:75 dead>
So it looks like the "warning" was an 'error'.
Also I am not sure if the output of wscat / curl is normal or reports an error:
root@server:~# wscat -c wss://domain.tld
error: Unexpected server response: 302
That could be normal due to the missing '/cable'.
But:
root@server:~# wscat -c wss://domain.tld/cable
error: Unexpected server response: 404
root@server:~# curl -I https://domain.tld/cable
HTTP/1.1 404 Not Found
the configurations:
nginx.conf:
http {
    upstream app {
        # Path to Puma SOCK file, as defined previously
        server unix:///var/www/my_app/shared/sockets/puma.sock fail_timeout=0;
    }

    server {
        listen 443 ssl default_server;
        server_name domain.com www.domain.com;

        include snippets/ssl-my_app.com.conf;
        include snippets/ssl-params.conf;

        root /var/www/my_app/public;
        try_files $uri /index.html /index.htm;

        location /cable {
            proxy_pass http://app;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Ssl on;
            proxy_set_header X-Forwarded-Proto https;
        }

        location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            add_header X-Frame-Options SAMEORIGIN always;
            proxy_pass http://app;
        }

        rails_env production;
    }
}
cable.yml:
production:
  adapter: redis
  url: redis://127.0.0.1:6379/1
  ssl_params:
    verify_mode: <%= OpenSSL::SSL::VERIFY_NONE %>
initializer/redis.rb:
$redis = Redis.new(:host => 'domain.com/cable', :port => 6379)
routes.rb:
mount ActionCable.server => '/cable'
config/environments/production.rb:
config.force_ssl = true
config.action_cable.allowed_request_origins = [/http:\/\/*/, /https:\/\/*/]
config.action_cable.allow_same_origin_as_host = true
config.action_cable.url = "wss://domain.com/cable"
I assume that it has to do with SSL, but I am not an expert on configuration, so thank you very much for any help.
The problem was caused by hardware: I am using a so-called 'airbox' in France, which is a mobile internet access point offering WLAN.
So the WebSocket connection was always closed because the ActionCable WebSocket certificate was not "known" and mobile internet is stricter about that.
Hence I created a pseudo-websocket: if the WebSocket connection is closed immediately, the user's browser polls every 2.5 seconds to ask whether there is anything that would have been sent via the WebSocket.

502 Bad Gateway Elixir/Phoenix

I've set up a new Ubuntu 18.04 server and deployed my phoenix app but am getting a 502 error when trying to access it.
I don't yet have a domain name because I will be transferring one from another server, so just trying to connect with the IP address.
The Phoenix app is deployed and running, and I can ping it with edeliver.
Prod conf:
config :app, AppWeb.Endpoint,
  load_from_system_env: false,
  url: [host: "127.0.0.1", port: 4013],
  cache_static_manifest: "priv/static/cache_manifest.json",
  check_origin: true,
  root: ".",
  version: Mix.Project.config[:version]

config :logger, level: :info

config :phoenix, :serve_endpoints, true

import_config "prod.secret.exs"
Nginx conf:
server {
    listen 80;
    server_name _;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://127.0.0.1:4013;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Nginx Error log:
2020/05/14 22:28:23 [error] 22908#22908: *24 connect() failed (111: Connection refused) while connecting to upstream, client: ipaddress, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:4013/", host: "ipaddress"
Edit:
Last two entries of OTP logs confirming app is alive
===== ALIVE Fri May 15 07:33:19 UTC 2020
===== ALIVE Fri May 15 07:48:19 UTC 2020
Edit 2:
I have posted a Gist detailing all the steps I have taken going from a clean Ubuntu box to where I am now here: https://gist.github.com/phollyer/cb3428e6c23b11fadc5105cea1379a7c
Thanks
You have to add server: true to your configuration, like:
config :wtmitu, WtmituWeb.Endpoint,
  server: true, # <-- this line
  load_from_system_env: false,
  ...
You don't have to add it to the dev environment because mix phx.server is doing it for you.
The Doc
This has been resolved as follows:
There were two problems that required resolving.
Adding config :app, AppWeb.Endpoint, server: true to either prod.secret.exs or prod.exs was required.
I had a running process left over from mistakenly deploying staging to the same server initially. I originally logged in to the server and stopped staging with ./bin/app stop; maybe this left a process running, or maybe I somehow started the process by mistake later on. Anyway, I used ps ux to list the running processes and found that one of them listed staging in its path, so I killed all running processes related to the deployment, both staging and production, with kill -9 processId, re-deployed to production, and all is now fine.

nginx reverse proxy not detecting dropped load balancer

We have the following config for our reverse proxy:
location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $sometoken $1;
    set $some_detokener "foo";

    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Authorization "Basic $do_token_decoding";
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    proxy_redirect https://place/ https://place_with_token/$1/;
    proxy_redirect http://place/ http://place_with_token/$1/;

    resolver 10.0.0.2 valid=10s;
    set $backend https://real_storage$2;
    proxy_pass $backend;
}
Now, all of this works... until real_storage rotates a server. For example, say real_storage comes from foo.com, which is a load balancer directing to two servers: 1.1.1.1 and 1.1.1.2. Now 1.1.1.1 is removed and replaced with 1.1.1.3. However, nginx continues to try 1.1.1.1, resulting in:
epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: ..., server: ..., request: "GET ... HTTP/1.1", upstream: "https://1.1.1.1:443/...", host: "..."
Note that the upstream is the old server, shown by a previous log:
[debug] 1888#1888: *570837 connect to 1.1.1.1:443, fd:60 #570841
Is this something misconfigured on our side or the host for our real_storage?
The best I could find that sounds even close to my issue is https://mailman.nginx.org/pipermail/nginx/2013-March/038119.html ...
Further Details
We added
proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
and it still failed. I am now beginning to suspect that, since there are two ELBs involved (ours and theirs), the resolver we are using is the problem: it is Amazon-specific (per https://serverfault.com/a/929517/443939), and Amazon still sees it as valid, but it won't resolve externally (our server trying to hit theirs).
I have removed the resolver altogether from one configuration and will see where that goes. We have not been able to reproduce this using internal servers, so we must rely on waiting for the third party servers to cycle (about once per week).
I'm a bit uncertain about this resolver being the issue only because a restart of nginx will solve the problem and get the latest IP pair :/
Is it possible that I have to set the DNS variable without the https, like this?
set $backend real_storage$2;
proxy_pass https://$backend;
I know that you have to use a variable or else the re-resolve won't happen, but maybe it matters exactly which part goes into the variable; I have only ever seen it set up as above in my searches, and no reason was ever given. I'll set that up on a 2nd server and see what happens.
And for my 3rd server I am trying this comment and moving the set outside of location. Of course if anybody else has a concrete idea then I'm open to changing my testing for this go round :D
set $rootbackend https://real_storage;

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $backend $rootbackend$2;
    proxy_pass $backend;
}
Note that I still have to set $backend inside the location, though, because it uses a dynamic variable.
As was correctly noted by @cnst, using a variable in proxy_pass makes nginx resolve the address of real_storage for every request, but there are further details:
- Before version 1.1.9, nginx cached DNS answers for 5 minutes.
- Since version 1.1.9, nginx caches DNS answers for a duration equal to their TTL, and the default TTL of Amazon ELB is 60 seconds.
So it is entirely expected that after a rotation nginx keeps using the old address for some time. As per the documentation, the expiration time of the DNS cache can be overridden:
resolver 127.0.0.1 [::1]:5353 valid=10s;
or
resolver 127.0.0.1 ipv6=off valid=10s;
There's nothing special about using variables within http://nginx.org/r/proxy_pass — any variable use will make nginx involve the resolver on each request (if not found in a server group — perhaps you have a clash?), you can even get rid of $backend if you're already using $2 in there.
As to interpreting the error message — you have to figure out whether this happens because the existing connections get dropped, or whether it's because nginx is still trying to connect to the old addresses.
You might also want to look into lowering the *_timeout values within http://nginx.org/en/docs/http/ngx_http_proxy_module.html; they all appear to be set at 60s, which may be too long for your use-case:
http://nginx.org/r/proxy_connect_timeout
http://nginx.org/r/proxy_send_timeout
http://nginx.org/r/proxy_read_timeout
I'm not surprised that you're not able to reproduce this issue, because there doesn't seem to be anything wrong with your existing configuration; perhaps the problem manifested itself in an earlier revision?
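To make that concrete, here is a minimal sketch combining per-request resolution with shorter timeouts (the resolver address and host name are taken from the config above; the timeout values are assumptions, not a verified fix):

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    resolver 10.0.0.2 valid=10s;          # re-resolve the ELB name at most every 10s
    set $backend https://real_storage$2;  # the variable forces a per-request DNS lookup
    proxy_pass $backend;

    # give up on a dead peer faster than the 60s defaults
    proxy_connect_timeout 5s;
    proxy_send_timeout 15s;
    proxy_read_timeout 15s;
}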

How to add the 'upstream try' to the request which I send to the backend server

I have an nginx server which acts as a load balancer.
nginx is configured to make 3 upstream tries:
proxy_next_upstream_tries 3;
I am looking for a way to pass to the backend server the current try number of this request - i.e. first, second or last.
I believe it can be done by passing this data in a header; however, how can I configure this in nginx, and where can I take this data from?
Thanks
I sent this question to Nginx support and they provided me this explanation:
As long as you are using the proxy_next_upstream mechanism for retries, what you are trying to do is not possible. The request which is sent to next servers is completely identical to the one sent to the first server nginx tries - or, more precisely, this is the same request, created once and then sent to different upstream servers as needed.

If you want to know on the backend if it is handling the first request or it processes a retry request after an error, a working option would be to switch proxy_next_upstream off, and instead retry requests on 502/504 errors using the error_page directive. See http://nginx.org/r/error_page for examples on how to use error_page.
So, I did as they advised me:
proxy_intercept_errors on;

location / {
    proxy_pass http://example.com;
    proxy_set_header NlbRetriesCount 0;
    error_page 502 404 = @fallback;
}

location @fallback {
    proxy_pass http://example.com;
    proxy_set_header NlbRetriesCount 1;
}
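If you ever need to approximate the original proxy_next_upstream_tries 3 with this approach, the same idea can be chained; an untested sketch (note it needs recursive_error_pages, otherwise the error_page inside the fallback location is ignored):

recursive_error_pages on;    # allow error_page to fire again inside the fallback
proxy_intercept_errors on;

location / {
    proxy_pass http://example.com;
    proxy_set_header NlbRetriesCount 0;
    error_page 502 404 = @fallback;
}

location @fallback {
    proxy_pass http://example.com;
    proxy_set_header NlbRetriesCount 1;
    error_page 502 404 = @fallback2;
}

location @fallback2 {
    proxy_pass http://example.com;
    proxy_set_header NlbRetriesCount 2;
}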
