Cannot set up a MySQL backend for Airflow LocalExecutor - airflow

I need to run dags in parallel but do not need significant scaling, so LocalExecutor can do the job just fine. I looked through the Airflow docs and first created a MySQL database:
CREATE DATABASE airflow_db CHARACTER SET utf8;
CREATE USER <user> IDENTIFIED BY <pass>;
GRANT ALL PRIVILEGES ON airflow_db.* TO <user>;
I then modify the following parameters in the airflow.cfg file:
executor = LocalExecutor
sql_alchemy_conn = mysql+mysqlconnector://<user>:<pass>@localhost:3306/airflow_db
When I run airflow db init, I run into the following error message:
AttributeError: 'MySQLConverter' object has no attribute '_dagruntype_to_mysql'
During handling of the above exception, another exception occurred:
TypeError: Python 'dagruntype' cannot be converted to a MySQL type
Please note that nothing else in the airflow.cfg file was altered and that using the default SequentialExecutor with sqlite lets everything run just fine. Also note that I am using Airflow version 2.2.0

I found the solution to my own question. Instead of using the mysqlconnector, I used the pymysql driver:
pip install PyMySQL
The airflow.cfg parameters can then be adjusted as follows:
sql_alchemy_conn = mysql+pymysql://<user>:<pass>@localhost:3306/airflow_db
All else can stay the same.
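If you want to confirm the driver and credentials work before running airflow db init again, a quick SQLAlchemy check along these lines can help (a minimal sketch, not part of the original answer; the placeholders match the connection string above):
# Sanity-check the PyMySQL connection URL used in airflow.cfg (placeholders as above).
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://<user>:<pass>@localhost:3306/airflow_db")
with engine.connect() as conn:
    # Prints 1 if the MySQL backend is reachable with these credentials.
    print(conn.execute(text("SELECT 1")).scalar())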

Related

Missing environment variables of connections

I added a connection to S3 from the UI
and expected to see AIRFLOW_CONN_S3_CONN in my env variables according to https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html, but it didn't show up
(I used a BashOperator with the printenv command to see all env variables but didn't find any AIRFLOW_CONN_ variables). What can be the problem?
I'm using Airflow 2.2.5
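For reference, the check described above could look roughly like this (a minimal sketch, not the asker's code; the dag_id, start date, and task_id are placeholders for an Airflow 2.x setup):
# Print any AIRFLOW_CONN_* variables visible to the task's environment.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="print_conn_env", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    BashOperator(
        task_id="printenv",
        bash_command="printenv | grep AIRFLOW_CONN || echo 'no AIRFLOW_CONN_* variables found'",
    )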

Error in creating airflow connection: fails with incorrect padding issue

When I try to create an SSH connection in Airflow with a username and password, it fails with an error like the one below:
"Failed to update record. Could not create Fernet object: Incorrect padding"
You'll need to set a Fernet key in airflow.cfg or as an environment variable:
export AIRFLOW__CORE__FERNET_KEY=your_fernet_key
See the documentation for more information.
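If you do not already have a key, one way to generate one is with the cryptography package, which Airflow normally pulls in as a dependency (a minimal sketch; adjust to your environment):
# Generate a Fernet key to use as AIRFLOW__CORE__FERNET_KEY (or fernet_key in airflow.cfg).
from cryptography.fernet import Fernet

print(Fernet.generate_key().decode())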

password_auth.py mushroom cloud error on Airflow webserver open when switching backend DBs

I am trying to transition from the sqlite db to postgresql (based on the guide here: https://www.ryanmerlin.com/2019/07/apache-airflow-installation-on-ubuntu-18-04-18-10/ ) and getting a mushroom cloud error at the initial screen of the webserver UI.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
...
...
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/www/utils.py", line 93, in is_accessible
(not current_user.is_anonymous and current_user.is_superuser())
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/auth/backends/password_auth.py", line 114, in is_superuser
return hasattr(self, 'user') and self.user.is_superuser()
AttributeError: 'NoneType' object has no attribute 'is_superuser'
Looking at the webserver logs does not reveal much...
[airflow@airflowetl airflow]$ tail airflow-webserver.*
==> airflow-webserver.err <==
/home/airflow/.local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
==> airflow-webserver.log <==
==> airflow-webserver.out <==
[2019-12-18 10:20:36,553] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=72725
==> airflow-webserver.pid <==
72745
One thing that may be useful to note (since this appears to be due to some kind of password issue) is that before trying to switch to postgres, I had set a bcrypt password according to the docs (https://airflow.apache.org/docs/stable/security.html#password) with the script here:
# Create a password-authenticated Airflow user (Airflow 1.x password_auth backend)
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

# Wrap a new User model in PasswordUser so the password gets hashed
user = PasswordUser(models.User())
user.username = 'airflow'
user.email = 'myemail@co.org'
user.password = 'mypassword'

# Persist the user to the Airflow metadata database
session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()
Anyone know what could be going on here or how to debug further?
Rerunning my user/password script fixed the problem.
I assume this is connected to the change to the new postgres server from the previously used sqlite db. I guess this user/auth info is stored somewhere in the Airflow backend DB (I am rather new to Airflow, so I was not aware of these internals), and since I am switching backends, the new backend does not have it. I therefore need to rerun the script that imports the airflow package and writes a new user/password to the new backend, so that I can log in with a password (since my airflow.cfg uses auth_backend = airflow.contrib.auth.backends.password_auth).

airflow scheduler does not accept MySQL settings and starts with sqlite. Why?

The airflow webserver runs without problems.
The airflow scheduler gets this error message:
Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
airflow.cfg:
sql_alchemy_conn = mysql+pymysql://root:mypassword@localhost:3306/airflow
Have you set $AIRFLOW_HOME wherever you run the scheduler too?
It looks like the scheduler is not picking up the airflow.cfg file at all.
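One way to confirm which configuration the scheduler process actually resolves is a quick diagnostic like this, run in the same shell and as the same user as the scheduler (a minimal sketch, not from the original answer; if it prints a sqlite URL, the scheduler is reading a different airflow.cfg and $AIRFLOW_HOME is the first thing to check):
# Print the sql_alchemy_conn that Airflow resolves in this environment.
from airflow.configuration import conf

print(conf.get("core", "sql_alchemy_conn"))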

Dag Seems to be missing

I have a dag which checks for new workflows to be generated (Dynamic DAG) at a regular interval and if found, creates them. (Ref: Dynamic dags not getting added by scheduler )
The above DAG is working and the dynamic DAGs are getting created and listed in the web-server. Two issues here:
When clicking on the DAG in the web UI, it says "DAG seems to be missing"
The listed DAGs are not shown by the "airflow list_dags" command
Error:
DAG "app01_user" seems to be missing.
The same happens for all the other dynamically generated DAGs. I have compiled the Python script and found no errors.
Edit1:
I tried clearing all data and running "airflow run". It ran successfully, but no dynamically generated DAGs were added by "airflow list_dags". However, when running the "airflow list_dags" command, it loaded and executed the base DAG (which generates the dynamic DAGs), and the dynamic DAGs were also listed, as below:
[root@cmnode dags]# airflow list_dags
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
[2019-08-13 00:34:31,692] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=15, pool_recycle=1800, pid=25386
[2019-08-13 00:34:31,877] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-08-13 00:34:32,113] {__init__.py:305} INFO - Filling up the DagBag from /root/airflow/dags
/usr/lib/python2.7/site-packages/airflow/operators/bash_operator.py:70: PendingDeprecationWarning: Invalid arguments were passed to BashOperator (task_id: tst_dyn_dag). Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
*args: ()
**kwargs: {'provide_context': True}
super(BashOperator, self).__init__(*args, **kwargs)
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
app01_user
app02_user
app03_user
app04_user
testDynDags
Upon running it again, all four of the above generated DAGs disappeared and only the base DAG, "testDynDags", is displayed.
When I was getting this error, there was an exception showing up in the webserver logs. Once I resolved that error and restarted the webserver, it went through normally.
From what I can see, this is the error that is thrown when the webserver tries to parse the DAG file and there is an error. In my case it was an error importing a new operator I had added to a plugin.
Usually I check the Airflow UI; sometimes the reason for the broken DAG shows up there. If it is not there, I run the .py file of my DAG directly, and the error (the reason the DAG can't be parsed) will appear.
I never got to work on dynamic DAG generation, but I did face this issue when the DAG was not present on all nodes (scheduler, worker and webserver). In case you have an Airflow cluster, please make sure the DAG is present on all Airflow nodes.
Same error; in my case the reason was that I renamed my dag_id to uppercase, something like "import_myclientname" into "import_MYCLIENTNAME".
I am a little late to the party, but I faced the error today.
In short: try executing airflow dags report and/or airflow dags reserialize
Check out my comment here:
https://stackoverflow.com/a/73880927/4437153
I found that airflow fails to recognize a dag defined in a file that does not have from airflow import DAG in it, even if DAG is not explicitly used in that file.
For example, suppose you have two files, a.py and b.py:
# a.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
def makedag(dag_id="a"):
    with DAG(dag_id=dag_id) as dag:
        DummyOperator(task_id="nada")
    return dag

dag = makedag()
and
# b.py
from a import makedag
dag = makedag(dag_id="b")
Then airflow will only look at a.py. It won't even look at b.py at all, even to notice if there's a syntax error in it! But if you add from airflow import DAG to it and don't change anything else, it will show up.
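Under that assumption, the workaround for b.py would look something like this (a minimal sketch of the fix described above; the import is unused but lets Airflow's file-level filter pick the file up):
# b.py
from airflow import DAG  # unused, but makes Airflow consider this file a DAG file
from a import makedag

dag = makedag(dag_id="b")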
