I'm trying to deploy Airflow in a production environment on a server running nginx and uWSGI.
I've searched the web and found instructions on running Airflow behind a reverse proxy, but those instructions only include nginx config examples. However, due to permission restrictions, I can't change nginx.conf itself and have to solve it via uWSGI.
My folder structure is:
project_folder
|_ airflow
|  |_ airflow.cfg
|  |_ webserver_config.py
|  |_ wsgi.py
|_ env
|_ start
|_ stop
|_ uwsgi.ini
My path/to/myproject/uwsgi.ini file is configured as follows:
[uwsgi]
master = True
http-socket = 127.0.0.1:9999
virtualenv = /path/to/myproject/env/
daemonize = /path/to/myproject/uwsgi.log
pidfile = /path/to/myproject/tmp/myapp.pid
workers = 2
threads = 2
# adjust the following to point to your project
wsgi-file = /path/to/myproject/airflow/wsgi.py
touch-reload = /path/to/myproject/airflow/wsgi.py
and currently the /path/to/myproject/airflow/wsgi.py looks as follows:
def application(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'Hello World!']
I'm assuming I have to somehow call the Airflow Flask app from the wsgi.py file (perhaps also adding some reverse-proxy fix configuration, since I'm behind SSL), but I'm stuck: what do I have to configure?
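My current best guess is that wsgi.py needs to expose Airflow's own Flask app instead of the hello-world handler, roughly like the sketch below. I'm assuming the app factory lives in airflow.www.app (on older RBAC setups it may be airflow.www_rbac.app) and that Werkzeug's ProxyFix is the right way to honour the X-Forwarded-* headers nginx sets; I haven't verified either for my version:
# /path/to/myproject/airflow/wsgi.py -- rough sketch, not verified
from airflow.www.app import cached_app              # assumption: location of the app factory
from werkzeug.middleware.proxy_fix import ProxyFix

application = cached_app()
# trust the X-Forwarded-* headers from nginx so redirects and links use https
application.wsgi_app = ProxyFix(
    application.wsgi_app, x_for=1, x_proto=1, x_host=1, x_port=1, x_prefix=1
)
(I have also seen an enable_proxy_fix option in the [webserver] section of airflow.cfg, which may make the ProxyFix wrapping unnecessary.)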
Will this procedure then be identical for the workers and scheduler?
Below is the complete code simplified for the question.
ids_to_check returns a list of ids. For my testing, I used a list of 13 random strings.
#!/usr/bin/env python3
import time
from multiprocessing.dummy import Pool as ThreadPool, current_process as threadpool_process
import requests

def ids_to_check():
    some_calls()
    return id_list

def execute_task(task_id):  # named task_id so it does not shadow the id() builtin used below
    url = f"https://myserver.com/todos/{task_id}"
    json_op = s.get(url, verify=False).json()
    value = json_op['id']
    print(str(value) + '-' + str(threadpool_process()) + str(id(s)))

def main():
    pool = ThreadPool(processes=20)
    while True:
        pool.map(execute_task, ids_to_check())
        print("Let's wait for 10 seconds")
        time.sleep(10)

if __name__ == "__main__":
    s = requests.Session()
    s.headers.update({
        'Accept': 'application/json'
    })
    main()
Output:
4-<DummyProcess(Thread-2, started daemon 140209222559488)>140209446508360
5-<DummyProcess(Thread-5, started daemon 140209123481344)>140209446508360
7-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
2-<DummyProcess(Thread-11, started daemon 140208527894272)>140209446508360
None-<DummyProcess(Thread-1, started daemon 140209230952192)>140209446508360
10-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
12-<DummyProcess(Thread-7, started daemon 140209106695936)>140209446508360
8-<DummyProcess(Thread-3, started daemon 140209140266752)>140209446508360
6-<DummyProcess(Thread-12, started daemon 140208519501568)>140209446508360
3-<DummyProcess(Thread-13, started daemon 140208511108864)>140209446508360
11-<DummyProcess(Thread-10, started daemon 140208536286976)>140209446508360
9-<DummyProcess(Thread-9, started daemon 140209089910528)>140209446508360
1-<DummyProcess(Thread-8, started daemon 140209098303232)>140209446508360
Let's wait for 10 seconds
None-<DummyProcess(Thread-14, started daemon 140208502716160)>140209446508360
3-<DummyProcess(Thread-20, started daemon 140208108455680)>140209446508360
1-<DummyProcess(Thread-19, started daemon 140208116848384)>140209446508360
7-<DummyProcess(Thread-17, started daemon 140208133633792)>140209446508360
6-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
4-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
9-<DummyProcess(Thread-16, started daemon 140208485930752)>140209446508360
5-<DummyProcess(Thread-15, started daemon 140208494323456)>140209446508360
2-<DummyProcess(Thread-2, started daemon 140209222559488)>140209446508360
8-<DummyProcess(Thread-18, started daemon 140208125241088)>140209446508360
11-<DummyProcess(Thread-1, started daemon 140209230952192)>140209446508360
10-<DummyProcess(Thread-11, started daemon 140208527894272)>140209446508360
12-<DummyProcess(Thread-5, started daemon 140209123481344)>140209446508360
Let's wait for 10 seconds
None-<DummyProcess(Thread-3, started daemon 140209140266752)>140209446508360
2-<DummyProcess(Thread-10, started daemon 140208536286976)>140209446508360
1-<DummyProcess(Thread-12, started daemon 140208519501568)>140209446508360
4-<DummyProcess(Thread-9, started daemon 140209089910528)>140209446508360
5-<DummyProcess(Thread-14, started daemon 140208502716160)>140209446508360
9-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
8-<DummyProcess(Thread-16, started daemon 140208485930752)>140209446508360
7-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
3-<DummyProcess(Thread-20, started daemon 140208108455680)>140209446508360
6-<DummyProcess(Thread-8, started daemon 140209098303232)>140209446508360
12-<DummyProcess(Thread-13, started daemon 140208511108864)>140209446508360
10-<DummyProcess(Thread-7, started daemon 140209106695936)>140209446508360
11-<DummyProcess(Thread-19, started daemon 140208116848384)>140209446508360
Let's wait for 10 seconds
.
.
My observations:
Multiple connections are created (i.e., one connection per pool process), but the session object is the same throughout the execution of the code (the session object id stays the same).
The connections keep getting recycled, as seen from the ss output; I couldn't identify any particular pattern or timeout for the recycling.
The connections are not recycled if I reduce the number of processes to a smaller value (for example, 5).
I do not understand how or why the connections are being recycled, and why they are not when I reduce the process count. I have tried disabling the garbage collector (import gc; gc.disable()) and the connections are still recycled.
I would like the created connections to stay alive until they have served a maximum number of requests. I think it would work without sessions, using a keep-alive Connection header, but I am curious to know what causes these session connections to keep being recycled when the process pool is large.
I can reproduce this issue against any server, so it does not seem to be server-dependent.
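To observe when new connections are actually opened (not a fix, just how I have been looking at it), urllib3's debug logging can be turned on; it logs a line each time a new connection is started and a warning when the pool discards one:
import logging
# urllib3 then logs messages such as "Starting new HTTPS connection (...)"
# and "Connection pool is full, discarding connection: ..."
logging.basicConfig(level=logging.DEBUG)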
I solved the same issue for myself by creating a session for the worker pool and parallelizing the request execution. At first I used multiprocessing.dummy too, but I ran into the same issue as you and switched to concurrent.futures.thread.ThreadPoolExecutor. The underlying cause is that requests' HTTPAdapter keeps at most pool_maxsize connections per host (10 by default), so with 20 worker threads the surplus connections get discarded and re-opened; raising pool_maxsize to match the pool size stops the churn.
Here is my solution.
from concurrent.futures.thread import ThreadPoolExecutor
from functools import partial

from requests import Session, Response
from requests.adapters import HTTPAdapter

def thread_pool_execute(iterables, method, pool_size=30) -> list:
    """Multiprocess requests, returns list of responses."""
    session = Session()
    session.mount('https://', HTTPAdapter(pool_maxsize=pool_size))  # that's it
    session.mount('http://', HTTPAdapter(pool_maxsize=pool_size))   # that's it
    worker = partial(method, session)
    with ThreadPoolExecutor(pool_size) as pool:
        results = pool.map(worker, iterables)
    session.close()
    return list(results)

def simple_request(session, url) -> Response:
    return session.get(url)

response_list = thread_pool_execute(list_of_urls, simple_request)
I have tested this against sitemaps with 200k URLs using pool_size=150 without any problems; it is limited only by the target host's configuration.
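If you prefer to keep your original multiprocessing.dummy setup, the same idea should carry over: mount an HTTPAdapter whose pool_maxsize is at least as large as your thread count on the session you already create (a sketch against your simplified code, untested):
from requests.adapters import HTTPAdapter

s = requests.Session()
s.headers.update({'Accept': 'application/json'})
# the default pool_maxsize is 10; match it to the 20 pool threads
adapter = HTTPAdapter(pool_maxsize=20)
s.mount('https://', adapter)
s.mount('http://', adapter)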
I have had an airflow webserver -D daemon process (v1.10.7) running on a machine (CentOS 7) for a long time. Suddenly I saw that the webserver could no longer be accessed, and checking airflow-webserver.log I saw...
[airflow@airflowetl airflow]$ cat airflow-webserver.log
2020-10-23 00:57:15,648 ERROR - No response from gunicorn master within 120 seconds
2020-10-23 00:57:15,649 ERROR - Shutting down webserver
(nothing of note in airflow-webserver.err)
[airflow@airflowetl airflow]$ cat airflow-webserver.err
/home/airflow/.local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
The airflow.cfg values for the webserver section looks like...
[webserver]
# The base url of your website as airflow cannot guess what domain or
# cname you are using. This is used in automated emails that
# airflow sends to point links to the right web server
#base_url = http://localhost:8080
base_url = http://airflowetl.co.local:8080
# The ip specified when starting the web server
web_server_host = 0.0.0.0
# The port on which to run the web server
web_server_port = 8080
# Paths to the SSL certificate and key for the web server. When both are
# provided SSL will be enabled. This does not change the web server port.
web_server_ssl_cert =
web_server_ssl_key =
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
web_server_master_timeout = 120
# Number of seconds the gunicorn webserver waits before timing out on a worker
#web_server_worker_timeout = 120
web_server_worker_timeout = 300
# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1
# Number of seconds to wait before refreshing a batch of workers.
worker_refresh_interval = 30
# Secret key used to run your flask app
secret_key = my_key
# Number of workers to run the Gunicorn web server
workers = 4
# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
worker_class = sync
Ultimately I just restarted the process as a daemon again with airflow webserver -D (should I have deleted the old airflow-webserver.log and .err files first?), but I'm not sure what would make this happen, since it had been running for months before this with no problems.
Could anyone with more experience explain what could have happened after all this time and how I could prevent it in the future? Are there any issues with running DAGs, or anything else I should check for, that this temporary unexpected shutdown of the webserver may have caused?
I am experiencing the same issue, and it only started (very infrequently) after I changed the following two config parameters in the webserver section:
worker_refresh_interval = 120
workers = 2
However, my parameters are also set quite differently from yours, so I will share them here:
rbac = True
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_master_timeout = 600
web_server_worker_timeout = 600
default_ui_timezone = Europe/Amsterdam
reload_on_plugin_change = True
After comparing the two, since your values for the two parameters I changed are still at the defaults (the same as mine before I changed them), it seems that the problem comes from a combination of several parameters.
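In the meantime, a mitigation I am considering (regardless of the root cause) is to stop relying on airflow webserver -D and run the webserver under systemd instead, so the process is restarted automatically if the gunicorn master dies. Roughly something like the unit below; the paths are assumptions for a pip --user install and will need adjusting:
# /etc/systemd/system/airflow-webserver.service -- sketch, paths are assumptions
[Unit]
Description=Airflow webserver
After=network.target

[Service]
User=airflow
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/home/airflow/.local/bin/airflow webserver
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target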
I have a problem deploying a PyTorch model in production. For a demonstration, I built a simple model and a Flask app. I put everything in a Docker container (pytorch + flask + uwsgi) plus another container for nginx. Everything is running well: my app is rendered and I can navigate inside it. However, when I navigate to the URL that launches a prediction of the model, the server hangs and does not seem to compute anything.
The uWSGI is run like this:
/opt/conda/bin/uwsgi --ini /usr/src/web/uwsgi.ini
with uwsgi.ini
[uwsgi]
#application's base folder
chdir = /usr/src/web/
#python module to import
wsgi-file = /usr/src/web/wsgi.py
callable = app
#socket file's location
socket = /usr/src/web/uwsgi.sock
#permissions for the socket file
chmod-socket = 666
# Port to expose
http = :5000
# Cleanup the socket when process stops
vacuum = true
#Log directory
logto = /usr/src/web/app.log
# minimum number of workers to keep at all times
cheaper = 2
processes = 16
As said, the server hangs and I eventually get a timeout. What is strange is that when I run the Flask application directly (also inside the container) with
python /usr/src/web/manage.py runserver --host 0.0.0.0
I get my prediction in no time
I think this is related to
https://discuss.pytorch.org/t/basic-operations-do-not-work-in-1-1-0-with-uwsgi-flask/50257
Maybe try as mentioned there:
app = flask.Flask(__name__)
segmentator = None

@app.before_first_request
def load_segmentator():
    global segmentator
    segmentator = Segmentator()
where Segmentator is a class built on pytorch's nn.Module, which loads its weights in __init__.
FYI this solution worked for me with one app but not the other
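For the app where it did not help, similar threads point at the interaction between uWSGI's fork-based workers and PyTorch's thread pools. Two things that may be worth trying (assumptions on my part, not verified against this exact setup) are loading the app after the fork and capping Torch's threads:
# possible additions to uwsgi.ini -- untested suggestion
lazy-apps = true        # load the application (and the model) in each worker, after the fork
enable-threads = true   # make sure Python threads are enabled inside the workers
and/or calling torch.set_num_threads(1) in the app before running inference, so PyTorch's OpenMP thread pool does not clash with the uWSGI worker model.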
I want to run a Flask application on my Raspberry Pi 3. I have already developed the Flask app and it works fine, but that is on Flask's development server.
I want to use a production setup, so I'm using nginx as the web server and uWSGI as the application server on the Pi. Now, the Flask app uses server-sent events (SSE) to get live data from the server. When I run the app using uWSGI, it stalls. I believe it's because I'm using SSE, since I had a similar problem on the Flask dev server, but there all I did was enable threading and the problem was solved. Enabling threading on uWSGI (when running the uWSGI script) doesn't solve the issue though. Help!
This is my uWSGI .ini file.
[uwsgi]
base = /home/pi/heap
app = app
module = %(app)
home = %(base)/venv
pythonpath = %(base)
socket = /home/pi/heap/%n.sock
chmod-socket = 666
callable = app
Thank you!
Try running it on an HTTP port instead of socket mode, with processes and threads defined explicitly:
[uwsgi]
base = project_path
chdir = project_path
module = your_module_name
callable = your_app_name
enable-threads = true
master = true
processes = 5
threads = 2
http = :5000
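If you want a quick way to verify that SSE streaming works under this config before wiring up your real app, a minimal endpoint along these lines should do (the module and route names are just placeholders):
# minimal SSE test app (placeholder names), saved as your_module_name.py
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route('/stream')
def stream():
    def generate():
        # emit a timestamp every second in SSE wire format
        while True:
            yield f"data: {time.time()}\n\n"
            time.sleep(1)
    return Response(generate(), mimetype='text/event-stream')
Then start uWSGI with this ini file and request /stream with curl -N; you should see a new data: line arrive every second.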