Running airflow worker gives error: Address already in use - airflow

I am running Airflow with CeleryExecutor. I am able to run the commands airflow webserver and airflow scheduler but trying to run airflow worker gives the error: socket.error: [Errno 98] Address already in use.

In the Docker container running the Airflow server, a process was already listening on port 8793, which is the default value of the worker_log_server_port setting in airflow.cfg. I changed the port to 8795 and the command airflow worker worked.
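For reference, the setting can also be checked and changed from the shell. This is just a sketch, assuming AIRFLOW_HOME points at your Airflow directory (by default ~/airflow) and that the setting still has its default value of 8793; you can equally edit airflow.cfg by hand:
grep worker_log_server_port $AIRFLOW_HOME/airflow.cfg
# switch the default 8793 to a free port such as 8795
sed -i 's/worker_log_server_port = 8793/worker_log_server_port = 8795/' $AIRFLOW_HOME/airflow.cfg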
Or you can check which process is listening on 8793 with lsof -i:8793, and if you don't need that process, kill it with: kill $(lsof -t -i:8793). I was running an Ubuntu container in Docker, so I had to install lsof first:
apt-get update
apt-get install lsof

See if there's a serve_logs process running; if so, kill it and try again:
/usr/bin/python2 /usr/bin/airflow serve_logs
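If such a process shows up, a quick way to find and stop it (assuming pkill is available, as on most Linux distributions):
ps aux | grep "airflow serve_logs"   # confirm the leftover process
pkill -f "airflow serve_logs"        # stop it, then retry airflow worker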

I had the same problem, and Javed's answer about changing the worker_log_server_port in airflow.cfg worked for me.

Related

How to resolve "Error: No module named 'airflow.www'" while starting airflow webserver

Getting the below error while starting the Airflow webserver:
balajee@Balajees-MacBook-Air.local:~$ airflow webserver -p 8080
[2018-12-03 00:29:37,066] {__init__.py:51} INFO - Using executor SequentialExecutor
[2018-12-03 00:29:38,776] {models.py:271} INFO - Filling up the DagBag from /Users/balajee/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Error: No module named 'airflow.www'
This fixed it for me:
pip3 uninstall -y gunicorn
pip3 install gunicorn==19.4.0
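After reinstalling, it may be worth confirming which gunicorn your shell now picks up and which version it is; these are plain shell checks, nothing Airflow-specific:
which gunicorn       # should point into the environment Airflow runs from
gunicorn --version   # should report the version you just installed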
I got this problem this morning and found a strange solution; maybe it will help you. I think you may just need to change the directory you run the command from.
I installed Airflow's basic dependencies into my virtualenv directory venv with PyCharm's help, and I used PyCharm's built-in Terminal tab to work inside that venv. There I ran airflow initdb to initialize the SQLite database that stores all the logs and operations, and then, following the official tutorial, used airflow webserver to start the webserver. But today I used my Mac terminal instead, activated the virtualenv, ran airflow webserver, and hit this problem:
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================
Error: No module named 'airflow.www'
[2019-05-26 07:45:27,130] {cli.py:833} ERROR - No response from gunicorn master within 120 seconds
[2019-05-26 07:45:27,130] {cli.py:834} ERROR - Shutting down webserver
I tried @Evgeniy Sobolev's solution of reinstalling gunicorn and nothing changed, yet from my PyCharm Terminal it still ran successfully. My guess is that the directory you first init your db and run the webserver from is what matters. When I use the PyCharm Terminal to init the db and start the webserver, the working directory is by default the project root, like:
(venv) root@root:~/GitHub/FakeProject$ airflow webserver
But today I went into the subdirectory to activate the virtualenv, so the working directory was different:
root@root:~/GitHub/FakeProject/SubDir$ source venv/bin/activate
(venv) root@root:~/GitHub/FakeProject/SubDir$ airflow webserver
** Error **
Run this way, it hits Error: No module named 'airflow.www'. So I changed back out of the subdirectory, and the webserver ran successfully, just like in the PyCharm Terminal:
(venv) root@root:~/GitHub/FakeProject/SubDir$ cd ..
(venv) root@root:~/GitHub/FakeProject$ airflow webserver
** It works **
I suspect Airflow stores some metadata (perhaps a path) the first time you init your airflow db, so you cannot change the directory you run the command from afterwards.
I hope it may help somebody in the future. Just check your directory!
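One way to make the start-up independent of the current directory (an assumption based on the AIRFLOW_HOME tips elsewhere on this page, not something this answer verified) is to export AIRFLOW_HOME explicitly before running anything:
export AIRFLOW_HOME=~/airflow   # example path: wherever you ran airflow initdb
airflow webserver               # should now behave the same from any directory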
Looks like you have a problem with gunicorn.
Try executing these two commands:
sudo -H pip3 uninstall -y gunicorn
sudo -H pip3 install gunicorn
It should resolve your problem, because Airflow shows an unclear error message when the underlying issue is with gunicorn.
These are the steps that led to the problem for me:
create a separate virtualenv only for airflow (I use the Anaconda distribution)
activate this env with conda activate
install airflow: pip install apache-airflow
at this point the error No module named 'airflow.www' showed up for me
To fix it, follow these steps:
Find where gunicorn is installed with: whereis gunicorn
gunicorn should live only in your virtualenv directory: /home/yourname/anaconda3/envs/airflow_env/bin/gunicorn
If it shows up in two directories, keep only the copy in your airflow environment and remove the others.
Another way to verify whether gunicorn is in other directories is to print your PATH variable: echo $PATH. Look for gunicorn in /home/yourname/.local/bin and in other anaconda directories on the PATH, and remove all such references. Remove gunicorn from the conda base env as well: pip uninstall gunicorn.
With these steps, I think your problem will be solved.
I used the Anaconda distribution, but I think the same process works without it. I used Airflow 1.10.0 and Python 3.6.
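Putting those checks together as shell commands (a sketch; the anaconda paths above are only examples):
whereis gunicorn         # all gunicorn binaries the system knows about
which -a gunicorn        # every gunicorn on the PATH, in resolution order
echo $PATH               # look for ~/.local/bin or other anaconda dirs
pip uninstall gunicorn   # run inside any environment that should NOT provide gunicorn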
If you defined a custom home directory for Airflow, other than the default one (~/airflow), during installation:
You first need to export the custom path:
export AIRFLOW_HOME=/your/custom/path/airflow
Go to the airflow directory and then run the webserver:
airflow webserver -p 8080
Run the scheduler too:
airflow scheduler
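If you want the custom path to survive new shell sessions, append the export to your shell profile; a sketch, reusing the example path from above:
echo 'export AIRFLOW_HOME=/your/custom/path/airflow' >> ~/.bashrc
source ~/.bashrc   # or open a new terminal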
Please check whether gunicorn is already installed on the server. For me it was installed in /usr/local/bin and was taking precedence over the gunicorn version installed with Airflow. Uninstall the earlier one or fix the $PATH variable.
I solved this by starting the webserver from the airflow folder itself.
I was previously trying to start the server from the home directory, but the required modules could not be found, which may be the case here.
Late to the party but could help others who get here.
I got the same issue using the latest Airflow version, 2.5.0.
Make sure the env variable AIRFLOW_HOME is pointing to the right location.
Thanks all for sharing
I added sudo and it actually worked just fine.
I got the same error today and sudo did the trick for me.

How to use Airflow scheduler with systemd?

The docs specify instructions for the integration
What I want is that every time the scheduler stops working, it gets restarted on its own. Usually I start it manually with airflow scheduler -D, but sometimes it stops when I'm not available.
Reading the docs I'm not sure about the configs.
The GitHub contains the following files:
airflow
airflow-scheduler.service
airflow.conf
I'm running Ubuntu 16.04
Airflow is installed on:
/home/ubuntu/airflow
I have path of:
/etc/systemd
The docs say to:
Copy (or link) them to /usr/lib/systemd/system
Copy which of the files?
copy the airflow.conf to /etc/tmpfiles.d/
What is tmpfiles.d ?
What is # AIRFLOW_CONFIG= in the airflow file?
Or, in other words: could someone give a more "down to earth" guide on how to do it?
Integrating Airflow with systemd files makes watching your daemons easy, as systemd can take care of restarting a daemon on failure. It also lets the airflow webserver and scheduler start automatically on system boot.
Edit the airflow file from the systemd folder in the Airflow GitHub repo as per your current configuration, to set the environment variables AIRFLOW_CONFIG, AIRFLOW_HOME & SCHEDULER.
Copy the service files (the files with the .service extension) to /usr/lib/systemd/system on the VM.
Copy the airflow.conf file to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/. Copying airflow.conf ensures /run/airflow is created with the right owner and permissions (0755 airflow airflow). Check whether /run/airflow exists and is owned by the airflow user and airflow group; if it doesn't exist, create the /run/airflow folder with those permissions (see the commands after the enable step below).
Enable these services by issuing systemctl enable <service> on the command line, as shown below:
sudo systemctl enable airflow-webserver
sudo systemctl enable airflow-scheduler
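If /run/airflow is missing, it can also be created by hand, and the services can be started and checked with standard systemd commands (a sketch; the unit names match the service files above):
sudo mkdir -p /run/airflow
sudo chown airflow:airflow /run/airflow
sudo chmod 0755 /run/airflow
sudo systemctl start airflow-webserver airflow-scheduler
systemctl status airflow-scheduler
journalctl -u airflow-scheduler -f   # follow the scheduler's log output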
The airflow-scheduler.service file should look like the following:
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
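The unit above reads its environment from /etc/sysconfig/airflow (the EnvironmentFile= line). A minimal sketch of that file, assuming the install path from the question, /home/ubuntu/airflow; adjust the values, and the file location, to whatever your unit actually points at:
# /etc/sysconfig/airflow
AIRFLOW_CONFIG=/home/ubuntu/airflow/airflow.cfg
AIRFLOW_HOME=/home/ubuntu/airflow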
Your question is a little old, but I just discovered it because I'm currently interested in the same subject. I think the answer to your question is here:
https://medium.com/@shahbaz.ali03/run-apache-airflow-as-a-service-on-ubuntu-18-04-server-b637c03f4722

How do you keep your airflow scheduler running in AWS EC2 while exiting ssh?

Hi, I'm using Airflow and have put my airflow project on EC2. However, how does one keep the airflow scheduler running while my Mac goes to sleep or I exit ssh?
You have a few options, but none will keep it active on a sleeping laptop. On a server:
You can use --daemon to run it as a daemon: airflow scheduler --daemon
Or, maybe run in background: airflow scheduler >& log.txt &
Or, run it inside 'screen', then detach from the screen using Ctrl-a d and reattach as needed using 'screen -r'. That would work over an ssh connection; see the sketch below.
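A minimal screen flow for that last option could look like this (standard screen usage, nothing Airflow-specific):
screen -S airflow    # start a named screen session
airflow scheduler    # run the scheduler inside it
# detach with Ctrl-a d and log out; later, reattach with:
screen -r airflow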
I use nohup to keep the scheduler running and redirect the output to a log file like so:
nohup airflow scheduler >> ${AIRFLOW_HOME}/logs/scheduler.log 2>&1 &
Note: Assuming you are running the scheduler here on your EC2 instance and not on your laptop.
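To confirm the scheduler survived the end of the ssh session, you can log back in and check; the log path matches the nohup command above:
ps aux | grep "airflow scheduler"            # the process should still be listed
tail -f ${AIRFLOW_HOME}/logs/scheduler.log   # and the log should keep growing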
In case you need more details on running it as a daemon, i.e. detaching completely from the terminal and redirecting stdout and stderr, here is an example:
airflow webserver -p 8080 -D --pid /your-path/airflow-webserver.pid --stdout /your-path/airflow-webserver.out --stderr /your-path/airflow-webserver.err
airflow scheduler -D --pid /your-path/airflow-scheduler.pid --stdout /your-path/airflow-scheduler.out --stderr /your-path/airflow-scheduler.err
The most robust solution would be to register it as a service on your EC2 instance. Airflow provides systemd and upstart scripts for that (https://github.com/apache/incubator-airflow/tree/master/scripts/systemd and https://github.com/apache/incubator-airflow/tree/master/scripts/upstart).
For Amazon Linux, you'd need the upstart scripts, and for e.g. Ubuntu, you would use the systemd scripts.
Registering it as a system service is much more robust because Airflow will be started upon reboot or when it crashes. This is not the case when you use e.g. nohup like other people suggest here.

Docker "/bin/bash" could not be invoked when mounting an NFS file with -v on openstack

I'm running an Ubuntu 14.04 instance with Docker installed on OpenStack. I'm trying to mount a volume into a Docker container. I'm doing this with:
docker run -t -i -v /mnt/data/dir:/mnt/test ubuntu
Where /mnt/data/dir is an NFS shared directory. Doing this gets me:
docker:
Error response from daemon: Container command '/bin/bash' could not be invoked..
However, using a local directory instead of a mounted directory works exactly as expected.
I understand that Docker doesn't natively support an NFS-mounted file system; however, the errors I found when googling are usually not of the form I've mentioned above.
Any clue on how to proceed?
Edit: I forgot to mention that it's not just limited to /bin/bash could not be invoked. I tried running a Tomcat server and that gave me the exact same error.

How to restart the salt-minion daemon automatically after a machine restart?

I've installed salt-minion on CentOS 7 and started the minion under its own user, salt, using the command salt-minion -d.
Once the machine was restarted, salt-minion was not started automatically.
Suggest a clean solution.
On CentOS 7, run the following command to ensure salt-minion starts on boot:
systemctl enable salt-minion.service
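To also start the service right away and confirm it is running, the usual systemd companions apply:
sudo systemctl start salt-minion.service
systemctl status salt-minion.service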
