Flask-SocketIO 502 Error on AWS EC2 with [CRITICAL] Worker Timeouts - nginx

I'm deploying a Flask app on AWS EC2 behind an NGINX reverse proxy by following this guide. More specifically, NGINX proxy-passes to a Gunicorn server (see config info below).
Things have been running smoothly, and the Flask portion of the setup works great. The only problem is that, when I access pages that rely on Flask-SocketIO, the client throws a 502 (Bad Gateway) and some 400 (Bad Request) errors. This happens after successfully talking to the server for a bit, but then the next polling request(s) (e.g. https://example.com/socket.io/?EIO=3&transport=polling&t=1565901966787-3&sid=c4109ab0c4c74981b3fc0e3785fb6558) sit at pending, and after 30 seconds the Gunicorn worker throws a [CRITICAL] WORKER TIMEOUT error and reboots.
A potentially important detail: I'm using eventlet, and I've applied monkey patching.
I've tried changing ports, binding to 0.0.0.0 instead of 127.0.0.1, and a few other minor alterations, but I haven't been able to find any resources online that deal with this exact issue.
The tasks asked of the server are very light, so I'm really not sure why it's hanging like this.
NGINX config:
server {
    # listen on port 80 (http)
    listen 80;
    server_name _;

    location ~ /.well-known {
        root /home/ubuntu/letsencrypt;
    }

    location / {
        # redirect any requests to the same URL but on https
        return 301 https://$host$request_uri;
    }
}

server {
    # listen on port 443 (https)
    listen 443 ssl;
    server_name _;
    ...

    location / {
        # forward application requests to the gunicorn server
        proxy_pass http://127.0.0.1:5000;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /socket.io {
        include proxy_params;
        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_pass http://127.0.0.1:5000/socket.io;
    }
    ...
}
Launching the gunicorn server:
gunicorn -b 127.0.0.1:5000 -w 1 "app:create_app()"
Client socket declaration:
var protocol = window.location.protocol
var socket = io.connect(protocol + '//' + document.domain + ':' + location.port);
Requirements.txt:
Flask_SQLAlchemy==2.4.0
SQLAlchemy==1.3.4
Flask_Login==0.4.1
Flask_SocketIO==4.1.0
eventlet==0.25.0
Flask==1.0.3
Flask_Migrate==2.5.2
Sample client-side error messages:
POST https://example.com/socket.io/?EIO=3&transport=polling&t=1565902131372-4&sid=17c5c83a59e04ee58fe893cd598f6df1 400 (BAD REQUEST)
socket.io.min.js:1 GET https://example.com/socket.io/?EIO=3&transport=polling&t=1565902131270-3&sid=17c5c83a59e04ee58fe893cd598f6df1 400 (BAD REQUEST)
socket.io.min.js:1 GET https://example.com/socket.io/?EIO=3&transport=polling&t=1565902165300-7&sid=4d64d2cfc94f43b1bf6d47ea53f9d7bd 502 (Bad Gateway)
socket.io.min.js:2 WebSocket connection to 'wss://example.com/socket.io/?EIO=3&transport=websocket&sid=4d64d2cfc94f43b1bf6d47ea53f9d7bd' failed: WebSocket is closed before the connection is established
Sample gunicorn error messages (note: the first line is the result of a print statement):
Client has joined his/her room successfully
[2019-08-15 20:54:18 +0000] [7195] [CRITICAL] WORKER TIMEOUT (pid:7298)
[2019-08-15 20:54:18 +0000] [7298] [INFO] Worker exiting (pid: 7298)
[2019-08-15 20:54:19 +0000] [7300] [INFO] Booting worker with pid: 7300

You need to use the eventlet worker with Gunicorn. Without it, Gunicorn runs its default sync worker, which can't hold open the long-polling/WebSocket connections that Flask-SocketIO relies on, so each one eventually hits the 30-second worker timeout and the worker is killed. Since eventlet is already in your requirements, the only change needed is the worker class:
gunicorn -b 127.0.0.1:5000 -w 1 -k eventlet "app:create_app()"

Related

nginx transparent reverse proxy keepalive Error

I'm creating an nginx transparent proxy. Checking with curl succeeds; however, HTTP keep-alive does not work normally:
client ---> nginx ----> Server (GET)
client <--- nginx <---- Server (200 OK)
client ---> nginx ----> Server (GET failed)
nginx log:
failed (98: Address already in use) while connecting to upstream
nginx.conf file:
upstream proxy {
    server proxy;
}

server {
    listen 80;
    proxy_bind $remote_addr:$remote_port transparent;
    server_name myhomepage.com;

    location / {
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://proxy;
    }
}
$remote_port must be included in proxy_bind.
https://www.nginx.com/blog/ip-transparency-direct-server-return-nginx-plus-transparent-proxy/
I think this problem occurs because nginx manages the connections to the upstream in a pool; binding to $remote_addr:$remote_port gives each proxied connection its own source address and port, which avoids the "Address already in use" collision.

Nginx Proxy Pass to External APIs - 502 Bad Gateway

Issue: I have an nginx reverse proxy installed on an Ubuntu server with a private IP only. The purpose of this reverse proxy is to route incoming requests to various third-party WebSockets and REST APIs. Furthermore, to distribute the load, I have an HTTP load balancer sitting in front of the nginx proxy server.
So this is how it looks technically:
IncomingRequest --> InternalLoadBalancer(Port:80) --> NginxReverseProxyServer(80) --> ThirdPartyAPIs(Port:443) & WebSockets(443)
The problem is that nginx does not proxy correctly to the REST APIs and returns a 502 error, although it works fine for the WebSockets.
Below is my /etc/nginx/sites-available/default config file (no changes made elsewhere):
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;

    location /binance-ws/ {
        # Web Socket Connection
        ####################### THIS WORKS FINE
        proxy_pass https://stream.binance.com:9443/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
    }

    location /binance-api/ {
        # Rest API Connection
        ##################### THIS FAILS WITH 502 ERROR
        proxy_pass https://api.binance.com/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
    }
}
I have even tried adding https://api.binance.com:443/ but no luck.
The websocket connection works fine:
wscat -c ws://LOADBALANCER-DNS/binance-ws/ws/btcusdt#aggTrade
However, the below one fails:
curl http://LOADBALANCER-DNS/binance-api/api/v3/time
When I check the nginx logs for the 502 error, this is what I see:
[error] 14339#14339: *20 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 10.5.2.187, server: , request: "GET /binance-api/api/v3/time HTTP/1.1", upstream: "https://52.84.150.34:443/api/v3/time", host: "internal-prod-nginx-proxy-xxxxxx.xxxxx.elb.amazonaws.com"
This is the actual REST API call that I am trying to replicate through nginx:
curl https://api.binance.com/api/v3/time
I have gone through many similar posts but have been unable to find what I'm doing wrong. I'd appreciate your help!
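Since the error appears while nginx itself is negotiating TLS with the upstream, and the config forwards the load balancer's hostname as the Host header, one adjustment that is commonly tried in this situation is to send the API's own hostname and enable SNI on the proxied connection. Whether that resolves this particular "wrong version number" handshake failure is an assumption; the sketch below only illustrates the standard nginx directives involved:
location /binance-api/ {
    proxy_pass https://api.binance.com/;
    proxy_http_version 1.1;
    # forward the upstream's own hostname rather than the load balancer's
    proxy_set_header Host api.binance.com;
    # send SNI during the TLS handshake with the upstream
    proxy_ssl_server_name on;
    proxy_ssl_name api.binance.com;
}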

Thin timing out, Nginx cannot connect

I have been trying different configurations and cannot figure out why Thin is timing out; at least, that is what I think is happening. Thin is not available after the server has been idle for a while (e.g. overnight).
Environment:
Ubuntu 14.04 running on AWS t2.nano
Redmine: Redmine 2.6.10.stable.15251
Ruby: 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
Rails: 3.2.22.2
Thin: 1.3.1 codename Triple Espresso
Database: mysql
There is no Thin error log. The error log in Nginx shows:
2016/03/24 07:01:47 [error] 18432#0: *1376 connect() to unix:/ebs_001/redmine/run/thin/redmine.0.sock failed (111: Connection refused) while connecting to upstream, client: 184.166.153.12, server: pm.source3.com, request: "GET / HTTP/1.1", upstream: "http://unix:/ebs_001/redmine/run/thin/redmine.0.sock:/", host: "pm.source3.com"
I can restart Thin (more on this later) and connect without restarting Nginx.
When I try to stop Thin, I get the following message:
user1#ip-172-31-18-58:/var/log/nginx$ sudo service thin stop
[stop] /etc/thin1.9.1/redmine.yml ...
Stopping server on /ebs_001/redmine/run/thin/redmine.0.sock ...
Sending QUIT signal to process 18385 ...
process not found!
Sending KILL signal to process 18385 ...
/usr/lib/ruby/vendor_ruby/thin/daemonizing.rb:140:in `kill': No such process (Errno::ESRCH)
from /usr/lib/ruby/vendor_ruby/thin/daemonizing.rb:140:in `force_kill'
from /usr/lib/ruby/vendor_ruby/thin/daemonizing.rb:134:in `rescue in send_signal'
from /usr/lib/ruby/vendor_ruby/thin/daemonizing.rb:118:in `send_signal'
from /usr/lib/ruby/vendor_ruby/thin/daemonizing.rb:107:in `kill'
from /usr/lib/ruby/vendor_ruby/thin/controllers/controller.rb:93:in `block in stop'
from /usr/lib/ruby/vendor_ruby/thin/controllers/controller.rb:134:in `tail_log'
from /usr/lib/ruby/vendor_ruby/thin/controllers/controller.rb:92:in `stop'
from /usr/lib/ruby/vendor_ruby/thin/runner.rb:185:in `run_command'
from /usr/lib/ruby/vendor_ruby/thin/runner.rb:151:in `run!'
from /usr/bin/thin:6:in `<main>'
After I stop Thin, I can then start it (sudo service thin start) and connect to my Redmine project without restarting nginx.
I do not see any error logs in Redmine or Thin.
My /etc/thin/redmine.yml file:
---
user: user1
group: group1
pid: /ebs_001/redmine/run/thin/redmine.pid
timeout: 30
wait: 30
log: /ebs_001/redmine/logs/thin/redmine.log
max_conns: 1024
require: []
environment: production
max_persistent_conns: 512
servers: 1
daemonize: true
socket: /ebs_001/redmine/run/thin/redmine.sock
chdir: /ebs_001/redmine/redmine-2.6
tag: redmine
Portions of my /etc/nginx/sites-available/redmine.conf:
# Upstream Ruby process cluster for load balancing
upstream thin_cluster {
    server unix:/ebs_001/redmine/run/thin/redmine.0.sock;
    # server unix:/ebs_001/redmine/run/thin/redmine.1.sock max_fails=1 fail_timeout=15s;
    # server unix:/ebs_001/redmine/run/thin/redmine.2.sock;
    # server unix:/ebs_001/redmine/run/thin/redmine.3.sock;
}

### REDMINE - serve all pages via ssl (https)
server {
    listen 80;
    server_name pm.source3.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name pm.source3.com;

    ssl on;
    ssl_certificate /etc/nginx/ssl/redmine.crt;
    ssl_certificate_key /etc/nginx/ssl/redmine.key;

    include /etc/nginx/includes/redmine.include;
    proxy_redirect off;

    root /ebs_001/redmine/redmine-2.6;

    # An alias to your upstream app
    location @cluster {
        proxy_pass http://thin_cluster;
        # Define what a "failure" is, so it can try the next server
        proxy_next_upstream error timeout http_502 http_503;
        # If the upstream server doesn't respond within n seconds, timeout
        proxy_read_timeout 60s;
    }

    location / {
        try_files $uri/index.html $uri.html $uri @cluster;
    }
}
...
And the ../includes/redmine.include:
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
client_max_body_size 10m;
client_body_buffer_size 128k;
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 90;
proxy_buffer_size 4k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;
It is not a permission issue as I can restart Thin and connect to Redmine.
I only have about 15M of free memory. Maybe this is the issue, but if it were, I would expect Redmine to crash while I'm using it throughout the day.
Any help in figuring out the timeout is much appreciated. Lastly, I tried using a port rather than a socket and still had the same timeout issue.
The problem was the server running out of memory. I was running too many applications on an AWS t2.nano, and Redmine was the only one crashing. The lack of error logs was a hint that it was a memory issue, and the "free -h" command was a big clue.
I was running:
One Django app (running Postgres via AWS RDS)
WordPress
Redmine (running MySQL locally).
Migrated to an AWS t2.micro for more memory and all is fine.

No live upstreams while connecting to upstream, but upstream is OK

I have a really weird issue with NGINX.
My upstream.conf file contains the following upstream:
upstream files_1 {
    least_conn;
    check interval=5000 rise=3 fall=3 timeout=120 type=ssl_hello;
    server mymachine:6006;
}
In locations.conf:
location ~ "^/files(?<command>.+)/[0123]" {
rewrite ^ $command break;
proxy_pass https://files_1 ;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Server $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
In /etc/hosts:
127.0.0.1 localhost mymachine
When I do wget https://mymachine:6006/alive --no-check-certificate, I get HTTP request sent, awaiting response... 200 OK. I also verified with netstat that port 6006 is listening, and it's OK.
But when I send a request to the NGINX file server, I get the following error:
no live upstreams while connecting to upstream, client: .., request: "POST /files/save/2 HTTP/1.1, upstream: "https://files_1/save"
But the upstream is OK. What is the problem?
When you define an upstream, Nginx treats the destination server as something that can be up or down. Nginx decides whether your upstream is down based on fail_timeout (default 10s) and max_fails (default 1).
So if you have a few slow requests that time out, Nginx can decide that the server in your upstream is down; and because you only have one, the whole upstream is effectively down and Nginx reports "no live upstreams". It's better explained here:
https://docs.nginx.com/nginx/admin-guide/load-balancer/http-health-check/
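To make that concrete, here is a sketch of the question's upstream with the documented defaults written out explicitly on the server line (the values shown are nginx's defaults, not something new):
upstream files_1 {
    least_conn;
    # documented defaults made explicit: one failed attempt within 10s
    # marks this server as down for the next 10s
    server mymachine:6006 max_fails=1 fail_timeout=10s;
}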
I had a similar problem, and you can prevent this by overriding those settings.
For example:
upstream files_1 {
    least_conn;
    check interval=5000 rise=3 fall=3 timeout=120 type=ssl_hello;
    server mymachine:6006 max_fails=0;
}
I had the same error, "no live upstreams while connecting to upstream".
Mine was SSL-related: adding proxy_ssl_server_name on solved it.
location / {
    proxy_ssl_server_name on;
    proxy_pass https://my_upstream;
}

Thumbor/NGINX 502 Bad Gateway for larger images

I'm not sure if this is an issue with nginx or thumbor. I followed the instructions located here for setting up thumbor with nginx, and everything has been running smoothly for the last month. Then recently we tried to use thumbor after uploading images with larger dimensions (above 2500x2500), but I'm only greeted with a broken image icon.
If I go to my thumbor URL and pass the image location itself into the browser, I get one of two responses:
1) 500: Internal Server Error
or
2) 502: Bad Gateway
For example, if I try to pass this image:
http://www.newscenter.philips.com/pwc_nc/main/shared/assets/newscenter/2008_pressreleases/Simplicity_event_2008/hires/Red_Square1_hires.jpg
I get 502: Bad Gateway, and checking my nginx error logs turns up:
2015/05/12 10:59:16 [error] 32020#0: *32089 upstream prematurely closed connection while reading response header from upstream, client: <my-ip>, server: <my-server>, request: "GET /unsafe/450x450/smart/http://www.newscenter.philips.com/pwc_nc/main/shared/assets/newscenter/2008_pressreleases/Simplicity_event_2008/hires/Red_Square1_hires.jpg HTTP/1.1" upstream: "http://127.0.0.1:8003/unsafe/450x450/smart/http://www.newscenter.philips.com/pwc_nc/main/shared/assets/newscenter/2008_pressreleases/Simplicity_event_2008/hires/Red_Square1_hires.jpg", host: "<my-host>"
If needed, here's my thumbor.conf file for nginx:
#
# A virtual host using mix of IP-, name-, and port-based configuration
#
upstream thumbor {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;
    server_name <my-server>;
    client_max_body_size 10M;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header HOST $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_pass http://thumbor;
        proxy_redirect off;
    }
}
For images below this size, it works fine, but users will be uploading images from their phones. How can I fix this?
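"upstream prematurely closed connection" means the thumbor worker on port 8003 closed the socket before returning response headers, which usually points at the backend process itself (for example memory or processing limits on very large source images) rather than at nginx. The only nginx-side knobs that are relevant here are the proxy timeouts, in case the worker is slow rather than dying; a hedged sketch of what that would look like, with illustrative values that are assumptions rather than recommendations:
location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header HOST $http_host;
    proxy_set_header X-NginX-Proxy true;
    proxy_pass http://thumbor;
    proxy_redirect off;
    # give slow large-image operations more time before nginx gives up
    proxy_connect_timeout 10s;
    proxy_read_timeout 120s;
}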
