Flask+gevent - SSE times out with nginx+uwsgi

I'm writing a webapp, based on Flask, gevent and Redis, which makes use of Server Sent Events.
I've gone through several questions on Stack Overflow and an extensive Google search, but did not find any suitable answer that works for me, so here I am asking for the community's help.
The problem is with the production stack, nginx+uwsgi: the browser receives updates regularly (and refreshes as expected) for about 30 seconds. After that the connection times out and the browser does not receive any update anymore, until the page is reloaded manually.
Since the whole thing works perfectly on localhost with the standard Flask development server (the connection is still alive after 30 minutes of idle), I'm pretty sure the issue is in the uwsgi/nginx config. I've tried all the nginx/uwsgi settings I could think of, but nothing helps; it keeps timing out after a few seconds.
Does anybody have a clue?
Here are some code and configs.
Relevant nginx production settings:
location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/myapp.sock;
    uwsgi_param UWSGI_PYHOME /srv/www/myapp/venv;
    uwsgi_param UWSGI_CHDIR /srv/www/myapp;
    uwsgi_param UWSGI_MODULE run;
    uwsgi_param UWSGI_CALLABLE app;
    uwsgi_buffering off;
    proxy_set_header Connection '';
    proxy_http_version 1.1;
    chunked_transfer_encoding off;
    proxy_cache off;
}
uwsgi production settings
[uwsgi]
base = /srv/www/myapp
app = run
home = %(base)/venv
pythonpath = %(base)
socket = /tmp/%n.sock
gevent = 100
module = %(app)
callable = app
logto = /srv/www/myapp-logs/uwsgi_%n.log
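As an aside, the proxy_* directives in the location block above only apply to proxy_pass; when nginx talks to the app over the uwsgi protocol, the uwsgi_* counterparts are the ones that take effect (uwsgi_buffering is already set). If the read timeout ever turns out to matter, a minimal sketch with an illustrative value, not a confirmed fix:

location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/myapp.sock;

    uwsgi_buffering off;        # don't buffer the SSE stream
    uwsgi_read_timeout 3600s;   # let the idle stream stay open well past the 60s default
}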
This is the JavaScript that the template executes to subscribe to the channel (for the time being, the template just refreshes the whole page when the server pushes some data):
<script type="text/javascript">
    var eventOutputContainer = document.getElementById("event");
    var evtSrc = new EventSource("/movers/monitor");

    evtSrc.onmessage = function(e) {
        console.log(e.data);
        location.reload();
        //eventOutputContainer.innerHTML = e.data;
    };
</script>
This is the code I use to return the streamed data
from myapp import redislist
from flask import Response, Blueprint, stream_with_context

movers = Blueprint('movers', __name__, url_prefix='/movers')
r = redislist['r']

@movers.route("/monitor")
def stream_movers():
    def gen():
        pubsub = r.pubsub()
        pubsub.subscribe('movers-real-time')
        for event in pubsub.listen():
            if event['type'] == 'message':
                yield 'retry: 10000\n\ndata: %s\n\n' % event['data']
    return Response(stream_with_context(gen()), direct_passthrough=True, mimetype="text/event-stream")
And finally the app is executed like this (DEBUG is True on localhost):
from myapp import app
from gevent.wsgi import WSGIServer  # gevent.pywsgi in newer gevent releases

if __name__ == '__main__':
    DEBUG = True if app.config['DEBUG'] else False
    if DEBUG:
        app.run(debug=DEBUG, threaded=True)
        app.debug = True
        server = WSGIServer(("", 5000), app)
        server.serve_forever()
    else:
        server = WSGIServer(("", 5000), app)  # listener given as a (host, port) tuple
        server.serve_forever()

After long hours spent on nginx log files and the Firefox JS console, it turned out that the configurations shown in the question are perfectly fine.
The issue was the page reloading: that action kills and reinitializes the connection, and therefore the retry command doesn't have any effect.
After removing that instruction, the SSE updates work like a charm even after a long time of inactivity.
Now the question is why this worked on the simpler development stack :-)
EDIT
Indeed, after a few more days the connection still times out. I've made some time measurements and found that the timeout interval varies between roughly 30 seconds and several minutes of inactivity.
My conclusion is that the stack above is fine, and it's the Amazon EC2 connection that expires after some variable inactivity time, since I'm still using a micro instance.
The final fix is the following JS snippet:
evtSrc.onerror = function(e) {
    location.reload();
};
The page reloads when the connection is dropped (whatever the reason). The reloads are not expected to happen when the server-sent events are frequent.

Related

nginx - connection timed out while reading upstream

I have a Flask server with an endpoint that processes some uploaded .csv files and returns a .zip (in a JSON response, as a base64 string).
This process can take up to 90 seconds.
I've been setting it up for production using gunicorn and nginx, and I'm testing the endpoint with smaller .csv files. They get processed fine, and in a couple of seconds I get the "got blob" log. But nginx doesn't return it to the client and finally it times out. I set up a longer fail-timeout of 10 minutes, and the client WILL wait 10 minutes, then time out.
The proxy_read_timeout offered as a solution here is set to 3600s.
Also, the proxy_connect_timeout is set to 75s according to this.
As is the timeout for the gunicorn workers, according to this.
The error log says: "upstream timed out (110: Connection timed out) while reading upstream".
I also see examples of nginx receiving an OPTIONS request and, immediately after, the POST request (some CORS weirdness from the client), where nginx passes the OPTIONS request but fails to pass the POST request to gunicorn despite having received it.
Question:
What am I doing wrong here?
Many thanks
http {
    upstream flask {
        server 127.0.0.1:5050 fail_timeout=600;
    }

    # error log
    # 2022/08/18 14:49:11 [error] 1028#1028: *39 upstream timed out (110: Connection timed out) while reading upstream, ...
    # ...

    server {
        # ...
        location /api/ {
            proxy_pass http://flask/;
            proxy_read_timeout 3600;
            proxy_connect_timeout 75s;
            # ...
        }
        # ...
    }
}
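For completeness, two related timeouts that are not shown above, proxy_send_timeout and send_timeout, also default to 60s. A sketch of the location block with all four, values illustrative rather than a confirmed fix:

location /api/ {
    proxy_pass http://flask/;

    proxy_connect_timeout 75s;    # establishing the connection to gunicorn
    proxy_read_timeout    3600s;  # waiting for gunicorn to send the response
    proxy_send_timeout    3600s;  # transmitting the request body to gunicorn
    send_timeout          3600s;  # writing the response back to the client
}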
# wsgi.py
from main import app

if __name__ == '__main__':
    app.run()
# flask endpoint
@app.route("/process-csv", methods=['POST'])
def process_csv():
    def wrapped_run_func():
        return blob, export_filename
    # ...
    try:
        blob, export_filename = wrapped_run_func()
        b64_file = base64.b64encode(blob.getvalue()).decode()
        ret = jsonify(file=b64_file, filename=export_filename)
        # return Response(response=ret, status=200, mimetype="application/json")
        print("got blob")
        return ret
    except Exception as e:
        app.logger.exception(f"0: Error processing file: {export_filename}")
        return Response("Internal server error", status=500)
P.S. I'm getting this error from Stack Overflow:
"Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon."
despite having perfectly well formatted code with language syntax, so I'm sorry I had to post it ugly.
Sadly I got no response.
See the last lines for the "solution" I finally implemented.
CAUSE OF ERROR: I believe the problem is that I'm hosting the nginx server on WSL1.
I tried updating to WSL2 to see if that fixed it, but I would need to enable some kind of "nested virtualization", as WSL1 is already running on a VM.
Through conf changes I got to the point where no error is logged and gunicorn returns the file, then it just stays in the ether: nginx never gets/sends the response.
"SOLUTION":
I ended up changing the code for the client, the server and the nginx.conf file:
the server saves the resulting file and only returns the file name
the client inserts the filename into an href that then displays a link
on click a request is sent to nginx which in turn just sends the file from a static folder, leaving gunicorn alone
I guess this is the optimal way to do it anyway, though it still bugs me I couldn't (for sure) find the reason of the error
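For illustration, the nginx side of that workaround could look roughly like the block below; the /downloads/ path and the exports directory are hypothetical names, not the ones actually used:

location /downloads/ {
    # gunicorn writes the generated .zip here; nginx serves it directly,
    # so no request ever waits on the Python process
    alias /srv/myapp/exports/;
    default_type application/zip;
}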

Ingress support for websocket

I have a Jetty web app running under k8s. This web app has a websocket endpoint. The deployed service is exposed via an nginx ingress over https.
Everything works fine: I have the web app running and the websockets work (i.e. messages get pushed and received), but the websockets close with a 1006 error code, which to be honest doesn't stop my code from working but doesn't look good either.
The websocket is exposed at /notifications. In a "normal" config, i.e. not k8s, just plain software installed on a VM, I would need to add the following to nginx.conf:
location /notifications {
    proxy_pass http://XXX/notifications;
    proxy_read_timeout 3700s;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Origin '';
}
I tried doing this via the ingress:
nginx.ingress.kubernetes.io/configuration-snippet: |
  location /notifications {
    proxy_pass http://webapp:8080/notifications;
    proxy_http_version 1.1;
    proxy_set_header Upgrade "websocket";
    proxy_set_header Connection "Upgrade";
  }
But it has no effect; I checked the generated nginx.conf and there is no such block added...
Has anybody had issues like this before? Any clue on how to solve the 1006 issue?
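For what it's worth, my understanding (an assumption, not something confirmed in this thread) is that configuration-snippet injects directives into the location block ingress-nginx already generates for the ingress path, so a hand-written nested location {} may simply never be emitted. Injecting only the directives, without the surrounding location, is what is usually tried instead, along these lines:

nginx.ingress.kubernetes.io/configuration-snippet: |
  # appended inside the location block generated for this ingress path
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "Upgrade";
  proxy_read_timeout 3700s;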
1006 meaning
As per RFC-6455 1006 means Abnormal Closure:
Used to indicate that a connection was closed abnormally (that is, with no close frame being sent) when a status code is expected.
Also see CloseReason.CloseCodes (Java(TM) EE 7 Specification APIs)
There are many possible causes, either on the server or on the client.
Client errors: try the websocket.org Echo Test
To isolate and debug client-side errors, you may use the websocket.org Echo Test.
As for server errors:
Jetty
Jetty-related discussion is here: Question regarding abnormal · Issue #604 · eclipse/jetty.project. But it doesn't contain any solutions.
Race detector for golang server code
If your server is written in Go, you may try the Data Race Detector - The Go Programming Language:
Data races are among the most common and hardest to debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See The Go Memory Model for details.
Here is an example of a data race that can lead to crashes and memory corruption:
func main() {
    c := make(chan bool)
    m := make(map[string]string)
    go func() {
        m["1"] = "a" // First conflicting access.
        c <- true
    }()
    m["2"] = "b" // Second conflicting access.
    <-c
    for k, v := range m {
        fmt.Println(k, v)
    }
}
Case for PHP code
The case for PHP code is discussed here: Unclean a closed connection by close() websocket's method (1006) · Issue #236 · walkor/Workerman

Grafana 6.7 auth proxy behind nginx for automatic UI login

I have an Nginx reverse proxy in front of my Grafana server.
I'm trying to use Nginx auth_basic to automatically log the user into Grafana.
I would like to do this to be able to automatically log in an embedded iframe graph placed in another web application (not on the same network).
nginx.conf
server {
    server_name grafana.mydomain.com;
    ...

    location / {
        proxy_pass http://grafana.mydomain.com;
    }

    location /grafana/ {
        proxy_pass http://grafana.mydomain.com;
        auth_basic "Restricted grafana.mydomain.com";
        auth_basic_user_file /etc/nginx/htpasswd/grafana.mydomain.com;
        proxy_set_header X-WEBAUTH-USER $remote_user;
        proxy_set_header Authorization "";
    }
}
grafana.ini
[auth.basic]
enabled = true
[security]
allow_embedding = true
cookie_samesite = lax
root_url = https://grafana.mydomain.com/grafana/
[auth.proxy]
enabled = true
header_name = X-WEBAUTH-USER
header_property = username
auto_sign_up = true
sync_ttl = 60
enable_login_token = true
What happens with this setup is that if I go to grafana.mydomain.com, the normal login appears and everything works fine.
But if I go to grafana.mydomain.com/grafana/ after logging in with Nginx, Grafana returns this:
If I try to click on any link on the page, a lot of unauthorized errors appear and I get logged out.
I've been playing with those settings a lot:
proxy_set_header X-WEBAUTH-USER
root_url
enable_login_token
cookie_samesite
But I was unable to make things work.
The user is created inside Grafana, so I have tried to give the created user full permissions:
But I still get unauthorized errors and 404 errors
I'm not even sure this is the right path to achieve what I'm trying to do, any suggestions?
I removed the two locations and placed the authentication on the / location.
Then I switched cookie_samesite back to none and it started working as it was supposed to.
By doing this I lost the possibility to log into Grafana normally.
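Roughly, the resulting server block then looks something like the sketch below (a reconstruction of what is described above, not the exact config used):

server {
    server_name grafana.mydomain.com;
    # ...

    location / {
        auth_basic "Restricted grafana.mydomain.com";
        auth_basic_user_file /etc/nginx/htpasswd/grafana.mydomain.com;

        # hand the basic-auth user over to Grafana's auth.proxy
        proxy_set_header X-WEBAUTH-USER $remote_user;
        proxy_set_header Authorization "";

        proxy_pass http://grafana.mydomain.com;
    }
}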

Why Nginx sends requests to upstream servers sequentially

I use Nginx as a load balancer with the following config:
http {
    upstream backend {
        server 127.0.0.1:8010;
        server 127.0.0.1:8011;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
        }
    }
}
So I have 2 local servers which are Flask apps:
# app1.py
from flask import Flask, jsonify, abort, request, make_response
import time

app = Flask(__name__)  # implied by the routes below, missing from the original snippet

@app.route("/", methods=['GET'])
def root():
    time.sleep(5)
    return jsonify({"response": "Hello, world!"})

app.run(debug=False, port=8010)  # for app2.py the only diff is port=8011
When I do 4 calls simultaneously (in different tabs) to localhost:80, I need to wait 20 seconds to see "Hello, world!" in all 4 tabs (instead of the 10 I expected, since the requests should be distributed over the 2 servers and each server should take 10 seconds; instead the requests are processed sequentially, one by one). Can you explain why? And how could it be fixed?
I've played around with it a bit more and realized that this behavior is only reproducible when I open several tabs in Chromium. In my other browser (Firefox) everything works as expected. Also, if I do curl requests, everything works as expected as well.
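As a side note, one way to confirm that nginx itself is distributing the requests (an optional debugging aid) is to log which upstream served each request and how long it took:

# inside the http {} block shown above
log_format upstream_log '$remote_addr [$time_local] "$request" '
                        '-> $upstream_addr $status $request_time';

server {
    listen 80;
    access_log /var/log/nginx/upstream.log upstream_log;

    location / {
        proxy_pass http://backend;   # the upstream defined in the question
    }
}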

NGINX configuration for Rails 5 ActionCable with puma

I am using Jelastic for my development environment (not yet in production).
My application is running with Unicorn, but I discovered websockets with ActionCable and integrated it into my application.
Everything works fine locally, but when deploying to my Jelastic environment (with the default NGINX/Unicorn configuration), I get this message in my JavaScript console and I see nothing in my access log:
WebSocket connection to 'ws://dev.myapp.com:8080/' failed: WebSocket is closed before the connection is established.
I used to have this on my local environment and solved it by adding the needed ActionCable.server.config.allowed_request_origins in my config file, so I double-checked my development config for this and it is OK.
That's why I was wondering if there is something specific to the NGINX config, other than what is explained on the ActionCable git page:
bundle exec puma -p 28080 cable/config.ru
For my application, I followed everything from enter link description here, but nothing is mentioned about the NGINX configuration.
I know that websockets with ActionCable are quite new, but I hope someone will be able to give me a lead on this.
Many thanks
OK, so I finally managed to fix my issue. Here are the different steps that made this work:
1. nginx: I don't really know if this is needed, but as my application is running with Unicorn, I added this to my nginx conf:
upstream websocket {
    server 127.0.0.1:28080;
}

server {
    location /cable/ {
        proxy_pass http://websocket/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }
}
And then in my config/environments/development.rb file:
config.action_cable.url = "ws://my.app.com/cable/"
2. Allowed request origin: I then noticed that my connection was refused even though I was using ActionCable.server.config.allowed_request_origins in my config/environments/development.rb file. I wonder if this is due to the development default of http://localhost:3000, as stated in the documentation. So I added this:
ActionCable.server.config.disable_request_forgery_protection = true
I don't have a production environment yet, so I am not yet able to test how this will behave.
3. Redis password: as stated in the documentation, I was using a config/redis/cable.yml, but I was getting this error:
Error raised inside the event loop: Replies out of sync: #<RuntimeError: ERR operation not permitted>
/var/www/webroot/ROOT/public/shared/bundle/ruby/2.2.0/gems/em-hiredis-0.3.0/lib/em-hiredis/base_client.rb:130:in `block in connect'
So I understood that the way I was setting the password for my Redis server was not right.
In fact you have to do something like this:
development:
  <<: *local
  :url: redis://user:password@my.redis.com:6379
  :host: my.redis.com
  :port: 6379
And now everything works fine, and ActionCable is really impressive.
Maybe some of my issues were trivial, but I am sharing them and how I resolved them so everyone can pick up something if needed.
