Airflow database migration job is failing - airflow

I am trying to deploy Airflow on our Kubernetes cluster in our AWS environment. We are using the official Helm chart that Airflow provides for this task.
When deploying, the migrateDatabaseJob is started, but it quickly fails with a problem that, I suppose, can only be caused by a bug in the Airflow source code:
[2022-03-25 09:27:17,567] {cli_action_loggers.py:105} WARNING - Failed to log action with (psycopg2.errors.UndefinedTable) relation "log" does not exist
LINE 1: INSERT INTO log (dttm, dag_id, task_id, event, execution_dat...
^
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (%(dttm)s, %(dag_id)s, %(task_id)s, %(event)s, %(execution_date)s, %(owner)s, %(extra)s) RETURNING log.id]
[parameters: {'dttm': datetime.datetime(2022, 3, 25, 9, 27, 17, 506252, tzinfo=Timezone('UTC')), 'dag_id': None, 'task_id': None, 'event': 'cli_upgradedb', 'execution_date': None, 'owner': 'airflow', 'extra': '{"host_name": "airflow-run-airflow-migrations-x7t6v", "full_command": "[\'/home/airflow/.local/bin/airflow\', \'db\', \'upgrade\']"}'}]
(Background on this error at: http://sqlalche.me/e/13/f405)
DB: postgresql://<db>/airflow?sslmode=prefer
[2022-03-25 09:27:18,510] {db.py:919} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
/home/airflow/.local/lib/python3.9/site-packages/azure/storage/common/_connection.py:82 SyntaxWarning: "is" with a literal. Did you mean "=="?
But it may also be the case that there is a problem with the custom dependencies we want to install into our Airflow Docker image:
FROM apache/airflow:2.2.4-python3.9
USER root
RUN apt-get update \
    && apt-get install -y build-essential unixodbc-dev libkrb5-dev
USER airflow
RUN pip install pipenv
COPY Pipfile /
RUN pipenv install --system
Can somebody point me in the right direction? I am quite lost on this issue at the moment, because it doesn't seem to be a configuration issue with the Helm chart. The connection to our external Postgres database seems to be working, so it looks like a problem in the Airflow source code itself.

For all the people who are having the same problem: the Airflow Helm chart only seems to work with the standard tags, e.g. 2.2.4, and not 2.2.4-python3.9. This means that you have to use Python 3.7 (which the standard tags use) for running your DAGs.

The selected answer is wrong. The Helm chart works just as well with the -python images (adding this here so that the answer does not mislead passers-by).
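If you do want to run a -python3.9 (or otherwise customised) image with the official chart, the usual approach is to point the chart at that image explicitly rather than relying on the default tag. A minimal values.yaml sketch, assuming the image built from the Dockerfile above has been pushed to a registry the cluster can pull from (the repository name here is made up):
images:
  airflow:
    repository: registry.example.com/custom-airflow   # hypothetical repository
    tag: 2.2.4-python3.9
    pullPolicy: IfNotPresent
defaultAirflowTag: "2.2.4-python3.9"
airflowVersion: "2.2.4"
Applied with the usual helm upgrade --install airflow apache-airflow/airflow -f values.yaml.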

Related

Getting internal exception while executing pg_dump when trying to get the init migrations from hasura

I want to get the latest migrations from my hasura endpoint to my local filesystem.
Command I'm trying to run
hasura init config --endpoint someendpoint.cloudfront.net/ --admin-secret mysecret
Output
INFO Metadata exported
INFO Creating migrations for source: default
ERRO creating migrations failed:
1 error occurred:
* applying migrations on source: default: cannot fetch schema dump: pg_dump request: 500
{
"path": "$",
"error": "internal exception while executing pg_dump",
"code": "unexpected"
}
run `hasura migrate create --from-server` from your project directory to retry
INFO directory created. execute the following commands to continue:
cd /home/kanhaya/Documents/clineage/study-container/src/conifg
hasura console
Hasura is deployed using a custom Docker image:
FROM hasura/graphql-engine:v2.3.0 as base
FROM ubuntu:focal-20220302
ENV HASURA_GRAPHQL_ENABLE_CONSOLE=true
ENV HASURA_GRAPHQL_DEV_MODE=true
# ENV HASURA_GRAPHQL_PG_CONNECTIONS=15
ENV HASURA_GRAPHQL_DATABASE_URL=somedatabaseURL
ENV HASURA_GRAPHQL_ADMIN_SECRET=mysecret
ENV EVENT_TRIGGER=google.com
ENV STUDY_CONFIG_ID=1
COPY ./Caddyfile ./Caddyfile
COPY ./install-packages.sh ./install-packages.sh
USER root
RUN ./install-packages.sh
RUN apt-get -y update \
&& apt-get install -y libpq-dev
COPY --from=base /bin/graphql-engine /bin/graphql-engine
EXPOSE 8080
CMD caddy start && graphql-engine --database-url $HASURA_GRAPHQL_DATABASE_URL serve --admin-secret $HASURA_GRAPHQL_ADMIN_SECRET
Which version of Postgres are you currently running?
There's a conversation over here with a couple of workarounds for supporting Postgres 14 in the Hasura container (it looks like a version mismatch in one of the libraries):
https://github.com/hasura/graphql-engine/issues/7676
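A common cause of "internal exception while executing pg_dump" with a hand-rolled image like the one above is that the pg_dump binary inside the container is missing or older than the Postgres server (ubuntu:focal only carries the version 12 client by default, and libpq-dev does not include pg_dump). A rough sketch of one workaround along those lines, assuming the server runs Postgres 14, is to install a matching client from the PGDG repository in the Dockerfile:
RUN apt-get update && apt-get install -y curl ca-certificates gnupg \
    && curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /usr/share/keyrings/pgdg.gpg \
    && echo "deb [signed-by=/usr/share/keyrings/pgdg.gpg] http://apt.postgresql.org/pub/repos/apt focal-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
    && apt-get update && apt-get install -y postgresql-client-14
Afterwards, pg_dump --version inside the container should report 14.x.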

Airflow2 gitSync DAG works for airflow namespace, but not alternate namespace

I'm running minikube to develop with Apache Airflow 2. I am trying to sync my DAGs from a private repo on GitLab, but have taken a few steps back just to get a basic example working. In the default "airflow" namespace it works, but when using the exact same file in a non-default namespace, it doesn't.
I have a values.yaml file which has the following section:
dags:
  gitSync:
    enabled: true
    repo: "ssh://git@github.com/apache/airflow.git"
    branch: v2-1-stable
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: "tests/dags"
    wait: 60
    containerName: git-sync
    uid: 65533
    extraVolumeMounts: []
    env: []
    resources: {}
If I run helm upgrade --install airflow apache-airflow/airflow -f values.yaml -n airflow, and then kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow, I get a whole list of DAGs as expected at http://localhost:8080.
But if I run helm upgrade --install airflow apache-airflow/airflow -f values.yaml -n mynamespace, and then kubectl port-forward svc/airflow-webserver 8080:8080 --namespace mynamespace, I get no DAGs listed at http://localhost:8080.
This post would be 10 times longer if I listed all the sites I hit trying to resolve this. What have I done wrong??
UPDATE: I created a new namespace, test01, in case some history was being held over and causing the problem. I ran helm upgrade --install airflow apache-airflow/airflow -f values.yaml -n test01. Starting the webserver and inspecting, I do not get a login screen; it goes straight to the usual web pages, still does not show the DAGs list, but this time has a notice at the top of the DAG page:
The scheduler does not appear to be running.
The DAGs list may not update, and new tasks will not be scheduled.
This is different behaviour yet again (although the same as with mynamespace insofar as showing no DAGs via gitSync), even though it seems to suggest a reason why DAGs aren't being retrieved in this case. I don't understand why a scheduler isn't running if everything was spun-up and initiated the same as before.
Curiously, helm show values apache-airflow/airflow --namespace test01 > values2.yaml gives the default dags.gitSync.enabled: false and dags.gitSync.repo: https://github.com/apache/airflow.git. I would have thought that should reflect what I upgraded/installed from values.yaml: enabled: true and the ssh repo fetch. I get no change in behaviour by editing values2.yaml to dags.gitSync.enabled: true and re-upgrading -- still the error note about the scheduler not running, and no DAGs.
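One note on that last observation: helm show values prints the chart's built-in defaults regardless of namespace, so it will always report dags.gitSync.enabled: false. To see the values a particular release was actually installed with, the check would be something like (assuming the release is named airflow):
helm get values airflow -n test01
helm get values airflow -n mynamespace
If gitSync does not show up as enabled there, the -f values.yaml override never made it into that release.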

How to resolve "Error: No module named 'airflow.www'" while starting the airflow webserver

Getting the below error while starting the Airflow webserver:
balajee@Balajees-MacBook-Air.local:~$ airflow webserver -p 8080
[2018-12-03 00:29:37,066] {__init__.py:51} INFO - Using executor SequentialExecutor
[2018-12-03 00:29:38,776] {models.py:271} INFO - Filling up the DagBag from /Users/balajee/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Error: No module named 'airflow.www'
Fixed for me
pip3 uninstall -y gunicorn
pip3 install gunicorn==19.4.0
I got this problem this morning, and I found a strange solution that may help you. I think you just need to change the directory you run the command from.
I installed Airflow's basic dependencies in my virtualenv directory venv with PyCharm's help, and I use PyCharm's built-in Terminal tab to access my venv directly. I used airflow initdb to initialise the SQLite database that stores all my logs and ops, and then, following the official tutorial, used airflow webserver to start the webserver. But today I used my Mac terminal instead, activated the virtualenv, started airflow webserver, and encountered this problem:
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================
Error: No module named 'airflow.www'
[2019-05-26 07:45:27,130] {cli.py:833} ERROR - No response from gunicorn master within 120 seconds
[2019-05-26 07:45:27,130] {cli.py:834} ERROR - Shutting down webserver
I tried @Evgeniy Sobolev's solution of reinstalling gunicorn and nothing changed, yet from my PyCharm Terminal it still runs successfully. I guess the directory in which you first init your db and run the webserver is critical. By default, when I use the PyCharm Terminal to init the db and start the webserver, that is the project root directory, like:
(venv) root@root:~/GitHub/FakeProject$ airflow webserver
But today I went into the subdirectory that holds venv to start the virtualenv, and the working directory changed!
root@root:~/GitHub/FakeProject/SubDir$ source venv/bin/activate
(venv) root@root:~/GitHub/FakeProject/SubDir$ airflow webserver
** Error **
Run this way it encounters Error: No module named 'airflow.www', so I changed out of the directory, and the webserver ran successfully just like in the PyCharm Terminal:
(venv) root@root:~/GitHub/FakeProject/SubDir$ cd ..
(venv) root@root:~/GitHub/FakeProject$ airflow webserver
** It works **
I think Airflow may store some metadata (a PATH-like setting, maybe) the first time you init your Airflow db, so you cannot change the directory you run the commands from.
I hope it may help somebody in the future. Just check your directory!
Looks like you have a problem with gunicorn.
Try to execute these two commands:
sudo -H pip3 uninstall -y gunicorn
sudo -H pip3 install gunicorn
It should resolve your problem, because Airflow shows you an unclear error message for gunicorn-related problems.
These were my steps when the problem happened:
create a separate virtualenv only for airflow (I use anaconda distribution)
activate this env with conda activate
install airflow: pip install apache-airflow
at this point the error No module named 'airflow.www' was shown to me
To fix it, follow these steps:
Find out where your gunicorn lives: whereis gunicorn
gunicorn should live only in your virtualenv directory: /home/yourname/anaconda3/envs/airflow_env/bin/gunicorn
If it shows up in two directories, keep only the one in your Airflow environment and remove it from everywhere else.
Another way to verify whether gunicorn is in other directories is to print your PATH variable: echo $PATH. Look for gunicorn in /home/yourname/.local/bin and in other Anaconda directories on the PATH, and remove all stray references. Remove gunicorn from the conda base env as well: pip uninstall gunicorn.
With these steps, I think your problem will be solved (see the command sketch below).
I used the Anaconda distribution, but I think the same process can be done without it. I used Airflow 1.10.0 and Python 3.6.
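A rough command-level sketch of those checks (the env name airflow_env and the paths are just the examples from above and will differ on your machine):
whereis gunicorn              # should list only the copy inside your airflow env
echo $PATH                    # watch out for ~/.local/bin and the conda base bin directory
conda activate base
pip uninstall gunicorn        # remove the stray copy from the base env
conda activate airflow_env    # back to the env that actually runs Airflow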
If you defined a custom home directory for Airflow (other than the default ~/airflow) during installation:
You first need to export the custom path:
export AIRFLOW_HOME=/your/custom/path/airflow
Go to the Airflow directory and then run the webserver:
airflow webserver -p 8080
Run the scheduler too:
airflow scheduler
Please check whether gunicorn is already installed on the server. For me it was installed in /usr/local/bin and was taking precedence over the gunicorn version installed with Airflow. Uninstall the earlier one or fix the $PATH variable.
I solved this by starting the webserver from the airflow folder itself.
I was previously trying to start the server from the home directory, but the required modules could not be found, which may be the case here.
Late to the party but could help others who get here.
I got the same issue using the latest Airflow version, 2.5.0.
Make sure the env variable AIRFLOW_HOME is pointing to the right location.
Thanks all for sharing
I added sudo and it actually worked just fine.
I got the same error today and sudo did the trick for me.

MariaDB is not working

I have installed MariaDB using the below command:
yum install MariaDB-server MariaDB-client -y
But when I execute the command:
service mysqld start
this error comes up:
mysqld: unrecognized service
Please let me know what I am doing wrong, or what else I have to do to make it work.
First verify that service sees MySQL, using the command:
sudo service --status-all
If you see MySQL there, run the command:
sudo service mysql start
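On distributions that use systemd (recent CentOS/RHEL and Fedora, for example), the MariaDB server package registers a mariadb unit rather than mysqld, so the equivalent commands would be roughly:
sudo systemctl status mariadb
sudo systemctl start mariadb
sudo systemctl enable mariadb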

Airflow installation issue on Windows 7

How do I install Airflow on Windows 7? I am getting the below error while installing it using pip install apache-airflow:
---------------------------------------- Command "c:\users\shrgupta5\appdata\local\programs\python\python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\SHRGUP~1\\AppData\\Local\\Temp\\pip-build-_yptw7sa\\psutil\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\SHRGUP~1\AppData\Local\Temp\pip-_cwm0nu7-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\SHRGUP~1\AppData\Local\Temp\pip-build-_yptw7sa\psutil\
I wouldn't bother trying to install Airflow on Windows; even after you install it successfully, you cannot run the airflow script due to a dependency on the Unix-only module pwd.
You can run Airflow on Windows by using the Docker setup from puckel https://github.com/puckel/docker-airflow.
Use VirtualBox and Docker Toolbox (legacy) and set up a docker-machine on your Windows computer (docker-machine create -d virtualbox --virtualbox-cpu-count "2" --virtualbox-memory "2048" default)
Make sure to fork puckel's git repo underneath c:/Users/yourusername/documents, otherwise mounting of DAGs won't work
You should now be able to spin up Airflow, e.g. by using the Celery executor setup with docker compose -f docker-compose-CeleryExecutor.yml up -d
I have set up an environment where I develop the DAGs on Windows, test them within the Docker container, and then push the Docker image to Linux for production. I have added a more detailed tutorial here.
Airflow cannot be installed on Windows within the standard command prompt.
You need to use bash and afterwards change the config:
How to run Airflow on Windows
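A very rough outline of that bash-based route, assuming a WSL Ubuntu shell (versions and paths will vary):
export AIRFLOW_HOME=~/airflow
pip install apache-airflow
airflow initdb                # 'airflow db init' on Airflow 2.x
airflow webserver -p 8080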
Download the Airflow source from PyPI:
https://pypi.org/project/airflow/#files
Unzip it and edit setup.cfg: in the install_requires section, change the psutil version to 'psutil>=5.4.7'.
Finally, run python setup.py install in the source directory.
