Airflow initdb stuck on "add max tries column to task instance"

I'm using Airflow with MSSQL 2016 as the backend. I started Airflow for the first time by running airflow initdb.
It seems to be fine until it gets stuck (for more than an hour) on Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance.
I'm not sure why it takes so long, as this is the first time I'm running Airflow, so the DB is empty and no actual migration should need to happen...

If you look at the migration file you'll see that it loops over all DAGs and tasks. You probably have a lot of them. Just make Airflow think there are no DAGs in the dags folder.
Solution: set the environment variable AIRFLOW__CORE__DAGS_FOLDER to, for example, /tmp/123 (any empty directory will work), and run airflow initdb again.
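A minimal sketch of that workaround, expressed in Python rather than shell (the temp directory stands in for the /tmp/123 example; airflow must already be on PATH):

# Hedged sketch: point Airflow at an empty DAGs folder, then re-run initdb.
import os
import subprocess
import tempfile

empty_dags_folder = tempfile.mkdtemp()  # any empty directory works, e.g. /tmp/123
os.environ["AIRFLOW__CORE__DAGS_FOLDER"] = empty_dags_folder
subprocess.run(["airflow", "initdb"], check=True)  # child process inherits the env var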

Related

Unable to create dag with apache airflow

I am running Airflow 2.0, setting up an Airflow DAG for the first time, and following the quick start tutorials.
After creating and running the .py file, I don't see the DAG created; it does not show up in the list.
setting:
airflow.cfg:dags_folder = /Users/vik/src/airflow/dags
my python file is in this folder. There are no errors here.
I am able to see the example DAGs from example_dags.
I did airflow db init
airflow webserver
airflow scheduler
then I try to list the DAGs, but mine does not appear
I think I am missing something
I don't know exactly how you installed everything, but I highly recommend the Astronomer CLI for simplicity and quick setup. With that you'll be able to set up a first DAG pretty quickly. There is also a video tutorial that helps you understand how to install and set up everything.
A few things to try:
Make sure the scheduler is running (run airflow scheduler) or try restarting it.
Using the Airflow CLI, run airflow config list and make sure that the loaded config is in fact what you are expecting; check the value of dags_folder.
Try running airflow dags list from the CLI, and check the filepath and whether your DAG is shown in the results.
If there was an error parsing your DAG, and it therefore could not be loaded by the scheduler, you can find the logs in ${AIRFLOW_HOME}/logs/scheduler/${date}/your_dag_id.log
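For reference, a minimal DAG file sketch (Airflow 2.0 style; the dag id, schedule and command are arbitrary examples) that should appear in airflow dags list once it is saved inside dags_folder:

# hello_dag.py - hypothetical minimal DAG placed in the configured dags_folder.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",                 # arbitrary example id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = BashOperator(task_id="hello", bash_command="echo hello")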

Airflow dag file is in place but it's not showing up when I do airflow dags list

I placed a DAG file in the dags folder, based on a tutorial with slight modifications, but it doesn't show up in the GUI or when I run airflow dags list.
Answering my own question: Check the python file for Exceptions by running it directly. It turns out one exception in the dag's python script due to a missing import made the dag not show up in the list. I note this just in case another new user comes across this. To me the moral of the story is that dag files should often be checked by running with python directly when they are modified because there won't be an obvious error showing up otherwise; they may just disappear from the list.
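As a complement to running the file with python directly, here is a hedged sketch that surfaces the same import errors through Airflow's DagBag (the folder path is taken from the earlier question and is purely illustrative):

# check_dags.py - print which DAGs parse and any import errors Airflow recorded.
from airflow.models import DagBag

bag = DagBag(dag_folder="/Users/vik/src/airflow/dags", include_examples=False)
print("parsed dag ids:", list(bag.dags))
for filepath, error in bag.import_errors.items():
    print(f"{filepath}:\n{error}")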

Airflow 2.0 - Scheduler is unable to find serialized DAG in the serialized_dag table

I have 2 files inside dags directory - dag_1.py and dag_2.py
dag_1.py creates a static DAG and dag_2.py creates dynamic DAGs based on external json files at some location.
The static DAG (created by dag_1.py) contains a task at a later stage which generates some of these input JSON files for dag_2.py, and the dynamic DAGs are created in this manner.
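For illustration, a rough sketch of the kind of dag_2.py pattern being described (the JSON location, its fields and the operator used are assumptions, not the asker's actual code):

# dag_2.py (hypothetical): build one DAG per JSON config file found on disk.
import glob
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

for config_path in glob.glob("/data/dag_configs/*.json"):  # assumed location
    with open(config_path) as f:
        config = json.load(f)

    dag = DAG(
        dag_id=config["dag_id"],                 # e.g. "dynamic_dag_1"
        start_date=datetime(2021, 1, 1),
        schedule_interval=config.get("schedule"),
        catchup=False,
    )
    BashOperator(task_id="run", bash_command=config["command"], dag=dag)

    # Airflow discovers DAGs through module-level names, hence globals().
    globals()[config["dag_id"]] = dag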
This used to work fine with Airflow 1.x versions where DAG Serialization was not used. But with Airflow 2.0 DAG Serialization has become mandatory. Sometimes, I get the following exception in the Scheduler when dynamic DAGs are spawned -
[2021-01-02 06:17:39,493] {scheduler_job.py:1293} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
self._create_dag_runs(query.all(), session)
File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1557, in _create_dag_runs
dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
File "/global/packages/python/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/global/packages/python/lib/python3.7/site-packages/airflow/models/dagbag.py", line 171, in get_dag
self._add_dag_from_db(dag_id=dag_id, session=session)
File "/global/packages/python/lib/python3.7/site-packages/airflow/models/dagbag.py", line 227, in _add_dag_from_db
raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
airflow.exceptions.SerializedDagNotFound: DAG 'dynamic_dag_1' not found in serialized_dag table
After this the scheduler gets terminated, which is expected.
When I check the table manually after this error, I can see the DAG entry in it.
This issue is not reproducible all the time. What could be the probable cause? Is there any Airflow configuration I should try tweaking?
We had the same issue after updating in the following order:
1.10.12 -> 1.10.14
1.10.14 -> 2.0.0
I followed their guide all the way through, and we had no issues until, at some random point a few hours later, the scheduler started crashing, complaining about random DAGs not being found in the database.
Our deployment procedure involves clearing out the /opt/airflow/dags folder and doing a clean install every time (we store DAGs and supporting code in Python packages).
So every now and then on the 1.10.x versions we had cases where the scheduler parsed an empty folder and wiped the serialized DAGs from the database, but it was always able to restore the picture on the next parse.
Apparently in 2.0, as part of the effort to make the scheduler HA, they fully separated the DAG processor and the scheduler, which leads to a race condition:
if the scheduler job hits the database before the DAG processor has updated the serialized_dag table values, it finds nothing and crashes;
if luck is on your side, the above will not happen and you won't see this exception.
To get rid of this problem, I disabled scheduling of all DAGs by updating is_paused in the database, restarted the scheduler, and once it had generated the serialized DAGs, turned all DAGs back ON.
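A hedged sketch of that pause-everything step using the Airflow 2.0 ORM (run from a Python shell that can reach the metadata DB; flipping is_paused back to False afterwards, or using airflow dags pause / unpause per DAG, achieves the same thing):

# pause_all.py - mark every DAG as paused so the scheduler can rebuild serialized_dag.
from airflow.models import DagModel
from airflow.utils.session import create_session

with create_session() as session:
    session.query(DagModel).update(
        {DagModel.is_paused: True}, synchronize_session=False
    )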
I fixed this issue in https://github.com/apache/airflow/pull/13893, which will be released as part of Airflow 2.0.1.
Airflow 2.0.1 will be released next week (8 Feb 2021, most likely).
Not enough rep to comment so having to leave an answer, but:
is this a clean 2.0 install or an upgrade of your old 1.10.x instance? and
are you recycling the names?
I literally just hit this problem (I found this question googling to see who else was in the same boat).
In my case, it's an upgraded existing 1.10.x install, and although the dags were generated dynamically, the names were recycled. I was getting errors clicking on the dag in the GUI and it was killing the scheduler.
Turns Out(TM), deleting the dags entirely using the 'trashcan' button in the GUI overview and letting them regenerate fixed it (as in, the problem immediately went away and hasn't recurred in the last 30 minutes).
At a guess, it smells to me like maybe some aspect of the dynamic dags wasn't properly migrated in the db upgrade step, and wiping them out and letting them fully regenerate fixed the problem. Obviously, you lose all your history etc, but (in my case at least) that's not necessarily a big deal.
The selected answer didn't work for me (after bashing my head for a few hours).
Here's what works:
Just go to the backend database (PostgreSQL) and delete all the records regarding logs, task instances, failed tasks and ..., but don't delete the main tables (if you can't tell the difference, just remove the tables I mentioned).
Then run airflow db init again.
It seems like old data about obsolete and deleted DAGs and tasks can really turn Airflow into a mess. Delete the mess and it starts working.

Pick DAG related constants from configuration file WITHOUT restarting airflow

I am using some constants in my DAG which are being imported from another (configuration) Python file in the project directory.
Scenario
Airflow is running, and I add a new DAG. I import the schedule_interval from that configuration file, as well as another constant which I pass to a function called by the PythonOperator in my DAG.
I update the code base, so the new DAG gets added to the airflow_dag folder and the configuration file gets updated with the new constants.
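A stripped-down sketch of the layout being described (module, constant and task names are invented for illustration; import paths are Airflow 2 style):

# configuration_file.py (hypothetical)
SCHEDULE_INTERVAL = "0 6 * * *"
SOME_CONSTANT = "some-value"

# my_dag.py (hypothetical), living in the airflow_dag folder
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from configuration_file import SCHEDULE_INTERVAL, SOME_CONSTANT

def do_work():
    print(SOME_CONSTANT)

with DAG(
    dag_id="constants_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=SCHEDULE_INTERVAL,
    catchup=False,
) as dag:
    PythonOperator(task_id="do_work", python_callable=do_work)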
Problem
The schedule_interval does not work and the DAG does not get scheduled. It also throws an exception (import error) for any other constant that is imported in the DAG.
In the web UI I can see the new DAG, but I also see a red error label saying it could not find constant XYZ in configuration_file.py, while it's actually there.
This does not resolve itself no matter how long I wait.
Bad Solution
I go and restart the airflow scheduler (and webserver as well just in case), and everything starts working again.
Question
Is there a solution where I will not have to restart Airflow for it to pick up those updates?
Note: The solution proposed in the question Can airflow load dags file without restart scheduler (refreshing the DAG) did not work for me.

Airflow DAG "seems to be existing only locally. The master scheduler doesn't seem to be aware of its existence"

I use Airflow for a workflow of Spark jobs. After installation, I copied the DAG files into the DAGs folder set in airflow.cfg. I can backfill the DAG to run the BashOperators successfully. But there is always a warning like the one mentioned in the title. I didn't verify whether the scheduling is fine, but I doubt scheduling can work, as the warning says the master scheduler doesn't know of my DAG's existence. How can I eliminate this warning and get scheduling to work? Has anybody run into the same issue who can help me out?
This is usually connected to the scheduler not running or the refresh interval being too wide. There are no log entries present, so we cannot analyze from there. Also, unfortunately the actual cause might have been overlooked, because this is usually the root of the problem:
I didn't verify if the scheduling is fine.
So first you should check if both of the following services are running:
airflow webserver
and
airflow scheduler
If that doesn't help, see this post for more reference: Airflow 1.9.0 is queuing but not launching tasks
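If you also want to confirm from the metadata database that a scheduler has heartbeated recently, here is a hedged sketch (class and module names as in Airflow 2.x; older releases keep the same jobs table but the import paths differ):

# Sketch: print the state and last heartbeat of the most recent scheduler job.
from airflow.jobs.base_job import BaseJob
from airflow.utils.session import create_session

with create_session() as session:
    job = (
        session.query(BaseJob)
        .filter(BaseJob.job_type == "SchedulerJob")
        .order_by(BaseJob.latest_heartbeat.desc())
        .first()
    )
    if job is None:
        print("no scheduler job recorded")
    else:
        print(job.state, job.latest_heartbeat)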
