502 Bad Gateway Elixir/Phoenix - nginx

I've set up a new Ubuntu 18.04 server and deployed my phoenix app but am getting a 502 error when trying to access it.
I don't yet have a domain name because I will be transferring one from another server, so just trying to connect with the IP address.
The Phoenix app is deployed and running, and I can ping it with edeliver.
Prod conf:
config :app, AppWeb.Endpoint,
  load_from_system_env: false,
  url: [host: "127.0.0.1", port: 4013],
  cache_static_manifest: "priv/static/cache_manifest.json",
  check_origin: true,
  root: ".",
  version: Mix.Project.config[:version]

config :logger, level: :info

config :phoenix, :serve_endpoints, true

import_config "prod.secret.exs"
Nginx conf:
server {
    listen 80;
    server_name _;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://127.0.0.1:4013;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Nginx Error log:
2020/05/14 22:28:23 [error] 22908#22908: *24 connect() failed (111: Connection refused) while connecting to upstream, client: ipaddress, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:4013/", host: "ipaddress"
Edit:
Last two entries of the OTP logs confirming the app is alive:
===== ALIVE Fri May 15 07:33:19 UTC 2020
===== ALIVE Fri May 15 07:48:19 UTC 2020
Edit 2:
I have posted a Gist detailing all the steps I have taken going from a clean Ubuntu box to where I am now here: https://gist.github.com/phollyer/cb3428e6c23b11fadc5105cea1379a7c
Thanks

You have to add server: true to your configuration, like:
config :wtmitu, WtmituWeb.Endpoint,
  server: true, # <-- this line
  load_from_system_env: false,
  ...
You don't have to add it to the dev environment because mix phx.server is doing it for you.
The Doc

This has been resolved as follows:
There were two problems that required resolving.
Adding config :app, AppWeb.Endpoint, server: true to either prod.secret.exs or prod.exs was required.
I had a running process left over from mistakenly deploying staging to the same server initially. I originally logged in to the server and stopped staging with ./bin/app stop; maybe this left a process running, or maybe I somehow started one by mistake later on. Either way, I used ps ux to list the running processes and found that one of them had staging in its path, so I killed all running processes related to the deployment, both staging and production, with kill -9 processId, re-deployed to production, and all is now fine.

Related

actioncable: subscription to channel fails in production with ssl

JRuby 9.3.6 (hence Ruby 2.6.8), Rails 6.1.6.1, in production using SSL (wss) with Devise, Puma, nginx.
Locally ActionCable runs without problems, but on the external server ActionCable establishes a WebSocket connection, leading to nginx: GET /cable HTTP/1.1" 101 210, the user gets verified correctly from the expanded session_id in the cookie, and after "Successfully upgraded to WebSocket" the browser receives a {"type":"welcome"} and pings.
The ActionCable JavaScript then sends a request to subscribe to a channel, but that never succeeds.
I tried a lot regarding the configuration of nginx, e.g. switching to Passenger for the ActionCable location /cable in nginx, and after 5 days I even changed from calling the server side of ActionCable with a plain "new WebSocket" in JavaScript to the client-side implementation of ActionCable as it is designed to be used (using import maps and actioncable.esm.js), but it didn't solve the main problem.
The logfiles:
production.log:
[ActionCable connect in app/channels/application_cable/connection.rb:] WebSocket error occurred: Broken pipe -
If domain.com is called once, this error occurs every 0.2 seconds. If the page is closed, the error continues. The 0.2 seconds matches the frequency at which the browser sends the requests to subscribe to the channel. For a long time I assumed it had to do with SSL, but I couldn't catch the problem. So now I assume that the "broken pipe" is a problem between the JRuby app and ActionCable on the server side. But I am not sure, and I actually don't know how to troubleshoot there.
Additionally, there is a warning in puma.stderr.log:
warning: thread "Ruby-0-Thread-27:
/home/my_app/.rbenv/versions/jruby-9.3.6.0/lib/ruby/gems/shared/gems/actioncable-6.1.6.1/lib/action_cable/connection/stream_event_loop.rb:75" terminated with exception (report_on_exception is true):
ArgumentError: mode not supported for this object: r
Starting redis-cli and running 'monitor':
1660711192.522723 [1 127.0.0.1:33630] "select" "1"
1660711192.523545 [1 127.0.0.1:33630] "client" "setname" "ActionCable-PID-199512"
publishing to MessagesChannel_1 works:
1660711192.523831 [1 127.0.0.1:33630] "publish" "messages_1" "{\"message\":\"message-text\"}"
In comparison, the local development configuration looks different:
1660712957.712189 [1 127.0.0.1:46954] "select" "1"
1660712957.712871 [1 127.0.0.1:46954] "client" "setname" "ActionCable-PID-18600"
1660712957.713495 [1 127.0.0.1:46954] "subscribe" "_action_cable_internal"
1660712957.716100 [1 127.0.0.1:46954] "subscribe" "messages_1"
1660712957.974486 [1 127.0.0.1:46952] "publish" "messages_3" "{\"message\":\"message-text\"}"
So what is "_action_cable_internal", and why doesn't it take place in production?
I found the code for the actioncable gem, added a 'p #pubsub' at the end of the def pubsub function in gems/actioncable-6.1.6.1/lib/action_cable/server/base.rb, and compared that information with the local configuration.
Locally there is this info:
#thread=#<Thread:0x5326bff6#/home/me_the_user/.rbenv/versions/jruby-9.2.16.0/lib/ruby/gems/shared/gems/actioncable-6.1.6.1/lib/action_cable/connection/stream_event_loop.rb:75 sleep>
which corresponds to the info at the server:
#thread=#<Thread:0x5f4462e1#/home/me_the_user/.rbenv/versions/jruby-9.3.6.0/lib/ruby/gems/shared/gems/actioncable-6.1.6.1/lib/action_cable/connection/stream_event_loop.rb:75 dead>
So it looks like the "warning" was an 'error'.
Also, I am not sure whether the output of wscat / curl is normal or reports an error:
root@server:~# wscat -c wss://domain.tld
error: Unexpected server response: 302
Which could be normal due to missing '/cable'.
But:
root@server:~# wscat -c wss://domain.tld/cable
error: Unexpected server response: 404
root@server:~# curl -I https://domain.tld/cable
HTTP/1.1 404 Not Found
the configurations:
nginx.conf:
http {
    upstream app {
        # Path to Puma SOCK file, as defined previously
        server unix:///var/www/my_app/shared/sockets/puma.sock fail_timeout=0;
    }

    server {
        listen 443 ssl default_server;
        server_name domain.com www.domain.com;

        include snippets/ssl-my_app.com.conf;
        include snippets/ssl-params.conf;

        root /var/www/my_app/public;
        try_files $uri /index.html /index.htm;

        location /cable {
            proxy_pass http://app;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Ssl on;
            proxy_set_header X-Forwarded-Proto https;
        }

        location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            add_header X-Frame-Options SAMEORIGIN always;
            proxy_pass http://app;
        }

        rails_env production;
    }
}
cable.yml:
production:
  adapter: redis
  url: redis://127.0.0.1:6379/1
  ssl_params:
    verify_mode: <%= OpenSSL::SSL::VERIFY_NONE %>
initializer/redis.rb:
$redis = Redis.new(:host => 'domain.com/cable', :port => 6379)
routes.rb:
mount ActionCable.server => '/cable'
config/environments/production.rb:
config.force_ssl = true
config.action_cable.allowed_request_origins = [/http:\/\/*/, /https:\/\/*/]
config.action_cable.allow_same_origin_as_host = true
config.action_cable.url = "wss://domain.com/cable"
I assume it has to do with the SSL, but I am not an expert in these configurations, so thank you very much for any help.
The problem was caused by hardware: I am using a so-called 'airbox' in France, which is a mobile internet access point offering WLAN.
So the websocket connection was always closed because the ActionCable websocket certificate was not "known" and mobile internet is more strict.
Hence I created a pseudo-websocket: if the websocket connection is closed immediately, the user's browser asks every 2.5 seconds whether there is anything that would have been sent via the websocket.

Performance Issues with .net core api and nginx as reverse proxy

Description:
We use nginx as a reverse proxy that splits the load across multiple backend servers running a GraphQL interface using HotChocolate (.NET Core 3.1). The GraphQL interface then triggers an Elasticsearch call (using the official NEST library).
Problem:
We start the system with 4 backend servers and one nginx reverse proxy and load test it with JMeter. That is working absolutely great. It is also performing well when we kill the 4th pod.
The problem only kicks in when we have two (or one) pods left. Nginx starts to return only errors and no longer splits the load across the two remaining servers.
What we tried:
We thought that the Elasticsearch query was performing badly, which in turn could block the backend. When executing the query against Elasticsearch directly, we get much higher performance, so that should not be the problem.
Our second suspicion was that the graphql.net library was buggy, so we replaced it with HotChocolate, which had no effect at all.
When we replace the GraphQL interface with a REST API interface, it suddenly IS working?!
We played around with the nginx config but couldn't find settings that actually fixed it.
We replaced nginx with Traefik, but got the same result.
What we discovered is that as soon as we kill pod three, the number of ESTABLISHED connections on the reverse proxy suddenly doubles (without any additional incoming traffic). => Maybe it is waiting for these connections to time out and blocks any additional incoming ones?!
We very much appreciate any help.
Thank you very much.
If you want/need to know anything else, please let me know!
Update: I did some changes to the code. It's working better now, but when scaling from one pod to two we have a performance drop for about 20-30 seconds. Why is that? How can we improve it?
We get this error from nginx:
[warn] 21#21: *9717 upstream server temporarily disabled while reading response header from upstream, client: 172.20.0.1, server: dummy.prod.com, request: "POST /graphql HTTP/1.1", upstream: "http://172.20.0.5:5009/graphql", host: "dummy.prod.com:81"
nginx_1 | 2020/09/14 06:01:24 [error] 21#21: *9717 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.20.0.1, server: dummy.prod.com, request: "POST /graphql HTTP/1.1", upstream: "http://172.20.0.5:5009/graphql", host: "dummy.prod.com:81"
nginx_1 | 172.20.0.1 - - [14/Sep/2020:06:01:24 +0000] "POST /graphql HTTP/1.1" 504 167 "-" "Apache-HttpClient/4.5.12 (Java/14)" "-"
nginx config:
upstream dummy {
    server dummy0:5009 max_fails=3 fail_timeout=30s;
    server dummy1:5009 max_fails=3 fail_timeout=30s;
    server dummy3:5009 max_fails=3 fail_timeout=30s;
    server dummy2:5009 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_connect_timeout 180s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        proxy_buffering off;
        proxy_buffer_size 128k;
        proxy_buffers 4 128k;
        proxy_max_temp_file_size 1024m;
        proxy_request_buffering off;
        proxy_http_version 1.1;
        proxy_cookie_domain off;
        proxy_cookie_path off;
        # In case of errors try the next upstream server before returning an error
        proxy_next_upstream error timeout;
        proxy_next_upstream_timeout 0;
        proxy_next_upstream_tries 3;
        proxy_pass http://dummy;
    }

    server_name dummy.prod.com;
}
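Not an answer from the thread, just a configuration sketch worth trying given the observation that ESTABLISHED connections double once a pod is killed: enabling keepalive towards the upstreams lets nginx reuse a pool of backend connections instead of opening a new one per request. For context, with max_fails=3 fail_timeout=30s a backend that has been marked as failed stays disabled for 30 seconds, which may be related to the 20-30 second drop seen when scaling back up. The upstream and server names are the ones from the config above; the keepalive value is an arbitrary assumption.
upstream dummy {
    server dummy0:5009 max_fails=3 fail_timeout=30s;
    server dummy1:5009 max_fails=3 fail_timeout=30s;
    # Keep a pool of idle connections to the backends open and reuse them.
    keepalive 32;
}

server {
    listen 80;
    server_name dummy.prod.com;

    location / {
        proxy_http_version 1.1;
        # Required for upstream keepalive: clear the Connection header so
        # nginx does not send "Connection: close" to the backend.
        proxy_set_header Connection "";
        proxy_read_timeout 60s;
        proxy_next_upstream error timeout;
        proxy_pass http://dummy;
    }
}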

docker compose: rebuild of one linked container breaks nginx's upstream

I'm using docker-compose with "Docker for Mac" and I have two containers: one NGINX, one container serving a node-app on port 3000.
docker-compose.yml looks like this:
version: "2"
services:
  nginx:
    build: ./nginx
    ports:
      - "80:80"
    links:
      - api
  api:
    build: ./api
    volumes:
      - "./api:/opt/app"
In the NGINX config I say:
upstream api {
    server api:3000;
}

server {
    # ....

    location ~ ^/api/?(.*) {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_pass http://api;
        proxy_redirect off;
    }
}
Now, when I change something in the node code and rebuild the container
$ docker-compose stop api && docker-compose up -d --build --no-deps api
the container is getting rebuilt and started. The problem is that sometimes the internal IP of the container changes and NGINX doesn't know about it. Funnily enough, when I go into the NGINX container and ping api, I get the new IP address
$ ping api
PING api (172.19.0.3): 56 data bytes
64 bytes from 172.19.0.3: icmp_seq=0 ttl=64 time=0.236 ms
but NGINX logs still say
2016/10/20 14:20:53 [error] 9#9: *9 connect() failed (113: No route to host) while connecting to upstream, client: 172.19.0.1, server: localhost, request: "GET /api/test HTTP/1.1", upstream: "http://172.19.0.7:3000/api/test", host: "localhost"
where the upstream's 172.19.0.7 is still the old IP address.
PS: this doesn't happen every time I rebuild the container.
This is because Nginx caches the DNS response for upstream servers - in your workflow you're only restarting the app container, so Nginx doesn't reload and always uses its cached IP address for the api container.
When you run a new api container, as you've seen, it can have a different IP address so the cache in Nginx is not valid. The ping works because it doesn't cache Docker's DNS response.
Assuming this is just for dev and downtime isn't an issue, docker-compose restart nginx after you rebuild the app container will restart Nginx and clear the DNS cache.
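If restarting nginx gets tedious, a common workaround (not part of the answer above, and assuming the default compose network with Docker's embedded DNS at 127.0.0.11) is to drop the upstream block and proxy to a variable instead: nginx then re-resolves the name at request time rather than only at startup, so a rebuilt api container with a new IP is picked up automatically. A minimal sketch:
server {
    # Docker's embedded DNS; re-resolve cached names every 10 seconds.
    resolver 127.0.0.11 valid=10s;

    location ~ ^/api/?(.*) {
        # Using a variable forces nginx to resolve "api" per request
        # instead of once when the configuration is loaded.
        set $api_upstream http://api:3000;

        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_pass $api_upstream;
        proxy_redirect off;
    }
}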

Backup nginx server returning 504

I have 3 nginx servers set up. The backup web server and the home server both have identical ../sites-enabled and ../sites-available directories, and the third server acts as a load balancer that points to both the backup and the home server with this config:
upstream myapp1 {
    server 1.1.1.1;        #home server
    server 2.2.2.2 backup; #backup server
}

server {
    listen 80;

    location / {
        proxy_pass http://myapp1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
But I am having an issue (which is explained more graphically below) when I am testing to see whether the backup server is working, as it only seems to be working when the Home Server is on!
1. test.foo.com -> Backup Web Server
2. foo.com -> Load Balancer
3. www.foo.com -> Home server
-> means points to
When Nothing is down:
- 1 returns OK
- 2 returns OK
- 3 returns OK
When Home Server is down:
- 1 returns 504 **(SHOULD BE OK)**
- 2 returns 504 **(SHOULD BE OK)**
- 3 returns DNS error
When Load Balancer is down:
- 1 returns OK
- 2 returns DNS error
- 3 returns OK
When Backup Web Server is down:
- 1 returns DNS error
- 2 returns 200
- 3 returns 200
You seem to be confused about the terminology here:
- when the load balancer is down, you'd be getting connect(2) Connection refused or Operation timed out-style errors; you would not be getting DNS errors
- likewise, the fact that you're getting 504 from your upstream home server means that it is NOT down, so your backup server never gets used, because nginx only uses backup if the primary server is really unavailable
You could potentially fix the second issue by getting the paid version of nginx, which has support for the health_check directive.
Alternatively, you could implement caching and use proxy_cache_use_stale to specify that a cached version should be returned instead (a sketch follows below). Also take a look at error_page.
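A minimal sketch of the caching route, applied to the load-balancer config from the question (the cache path, zone name and validity times are made-up illustration values): with proxy_cache_use_stale, nginx can serve a previously cached response when the upstream errors out, times out, or answers with a 504.
proxy_cache_path /var/cache/nginx/myapp1 levels=1:2 keys_zone=myapp1_cache:10m
                 max_size=1g inactive=60m;

upstream myapp1 {
    server 1.1.1.1;        #home server
    server 2.2.2.2 backup; #backup server
}

server {
    listen 80;

    location / {
        proxy_cache myapp1_cache;
        proxy_cache_valid 200 5m;
        # Serve a stale cached copy if the upstream fails, times out,
        # or responds with a 504.
        proxy_cache_use_stale error timeout http_504;

        proxy_pass http://myapp1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}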

How to configure IPython behind nginx in a subpath?

I've got nginx running handling all SSL stuff and already proxying / to a Redmine instance and /ci to a Jenkins instance.
Now I want to serve an IPython instance on /ipython through that very same nginx.
In nginx.conf I've added:
http {
    ...

    upstream ipython_server {
        server 127.0.0.1:5001;
    }

    server {
        listen 443 ssl default_server;
        ... # all SSL related stuff and the other proxy configs (Redmine+Jenkins)

        location /ipython {
            proxy_pass http://ipython_server;
        }
    }
}
In my .ipython/profile_nbserver/ipython_notebook_config.py I've got:
c.NotebookApp.base_project_url = '/ipython/'
c.NotebookApp.base_kernel_url = '/ipython/'
c.NotebookApp.port = 5001
c.NotebookApp.trust_xheaders = True
c.NotebookApp.webapp_settings = {'static_url_prefix': '/ipython/static/'}
Pointing my browser to https://myserver/ipython gives me the usual index page of all notebooks in the directory I launched IPython.
However, when I try to open one of the existing notebooks or create a new one, I'm getting the error:
WebSocket connection failed: A WebSocket connection to could not be established. You will NOT be able to run code. Check your network connection or notebook server configuration.
I've tried the same setup with the current stable (1.2.1, via pypi) and development (Git checkout of master) version of IPython.
I also tried adjusting the nginx config according to nginx reverse proxy websockets, to no avail.
Due to an enforced policy I'm not able to allow connections to the server on other ports than 443.
Does anybody have IPython running behind an nginx?
I had the same problem. I updated nginx to the current version (1.6.0), and it seems to be working now.
Server config:
location /ipython {
    proxy_pass http://ipython_server;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Origin "";
}
See: http://nginx.org/en/docs/http/websocket.html
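For reference, the page linked above also shows a map-based variant, where the Connection header is only set to "upgrade" when the client actually requested an upgrade. Applied to the config from the question (same upstream name, SSL details elided), it would look roughly like this:
http {
    # From the nginx WebSocket guide: forward "Connection: upgrade" only
    # when the client sent an Upgrade header, otherwise "close".
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    upstream ipython_server {
        server 127.0.0.1:5001;
    }

    server {
        listen 443 ssl default_server;
        # ... SSL certificates and the other proxy locations ...

        location /ipython {
            proxy_pass http://ipython_server;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Origin "";
        }
    }
}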
