Let's say today is 2017-10-20. I have an existing DAG that has run successfully up to today. I need to add a task with a start_date of 2017-10-01. How can I make the scheduler trigger this task automatically for the dates from 2017-10-01 to 2017-10-20?
You can use the backfill command-line tool:
airflow backfill your_dag_id -s 2017-10-01 -e 2017-10-20 -t task_name_regex
This assumes there are already DAG runs for the dates from 2017-10-01 onward. If that's not the case, make sure the DAG's start_date is 2017-10-01 or earlier and that catchup is enabled.
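As a rough sketch of what that means in the DAG file (the dag_id and daily schedule here are assumptions, not taken from your setup):

from datetime import datetime

from airflow import DAG

# start_date must cover the dates you want runs for; catchup=True lets the
# scheduler create the missing runs up to the present.
dag = DAG(
    'your_dag_id',
    start_date=datetime(2017, 10, 1),
    schedule_interval='@daily',
    catchup=True,
)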
If you don't mind executing the whole DAG again, you can delete it from the web UI; it will reappear with status Off. If you then enable it again, it will run from the beginning, including the new tasks.
Related
The Next Run date of a DAG isn't getting updated for paused DAGs, even though catchup is set to False.
I have a DAG that is scheduled to run daily, with catchup set to False.
The DAG ran from 2022-06-01 to 2022-07-01. I paused it on 2022-07-02 and un-paused/enabled it on 2022-07-15.
Airflow then started running DAG runs for dates from 2022-07-02 onward, even with catchup set to False.
What properties should be set to make Airflow run the DAG starting from the date it was un-paused (2022-07-15 in this case)?
I also noticed that for DAGs that have never run and are in a paused state, the next run date does get updated; it is only when a DAG has run and is then paused that the next run date is not updated.
From the Airflow manual at https://airflow.apache.org/tutorial.html#testing, I found that I can run something like the following to test a specific task:
airflow test dag_id task_id
When I ran it, I only got this output:
[2018-07-10 18:29:54,346] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2018-07-10 18:29:54,367] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2018-07-10 18:29:54,477] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-07-10 18:29:54,513] {models.py:189} INFO - Filling up the DagBag from /var/lib/airflow/dags
It doesn't look like it is actually running the task. Have I misunderstood something? Or is there another way to run a DAG locally?
I copied this example call from the paragraph on the page you linked to:
# command layout: command subcommand dag_id task_id date
# testing print_date
airflow test tutorial print_date 2015-06-01
# testing sleep
airflow test tutorial sleep 2015-06-01
So just include the date as shown above and the DAG task should run as expected.
For Airflow version 2.4.0:
airflow tasks test tutorial sleep 2015-06-01
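If you are on an even newer Airflow (2.5 or later), a DAG can also be exercised locally straight from Python with dag.test(), without a scheduler. A minimal sketch, assuming a trivial DAG defined in the same file (the dag_id and task are made up for illustration):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("tutorial_local", start_date=datetime(2015, 6, 1), schedule=None) as dag:
    BashOperator(task_id="print_date", bash_command="date")

if __name__ == "__main__":
    # Runs every task in this DAG in-process, logging to the console (Airflow 2.5+).
    dag.test()

Running the file with plain python then executes the whole DAG locally.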
I have made a very simple DAG that looks like this:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

cleanup_command = "/home/ubuntu/airflow/dags/scripts/log_cleanup/log_cleanup.sh "

dag = DAG(
    'log_cleanup',
    description='DAG for deleting old logs',
    schedule_interval='10 13 * * *',
    start_date=datetime(2018, 3, 30),
    catchup=False,
)

t1 = BashOperator(task_id='cleanup_task', bash_command=cleanup_command, dag=dag)
The task finishes successfully, but despite this the DAG run remains in "running" status. Any idea what could cause this? The screenshot below shows the issue with the DAG runs remaining in the running state. The earlier runs only finished because I manually marked their status as success. [Edit: I had originally written: "The earlier runs are only finished because I manually set status to running."]
Are you sure your scheduler is running? You can start it with $ airflow scheduler and check the scheduler CLI command docs. You shouldn't have to manually set tasks to running.
Your code here seems fine. One thing you might try is restarting your scheduler.
In the Airflow metadata database, DAG run end state is disconnected from task run end state. I've seen this happen before, but usually it resolves itself on the scheduler's next loop when it realizes all of the tasks in the DAG run have reached a final state (success, failed, or skipped).
Are you running the LocalExecutor, SequentialExecutor, or something else here?
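If you're not sure which executor is configured, one way to check from Python is to read the same configuration the scheduler uses (this assumes the script runs against the same AIRFLOW_HOME/airflow.cfg):

from airflow.configuration import conf

# Prints the executor class configured under [core] executor in airflow.cfg.
print(conf.get("core", "executor"))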
I'm running Airflow and attempting to iterate from the command line on a task we're building.
When running the Airflow webserver, everything works as expected. But when I run airflow backfill dag task '2017-08-12', Airflow returns:
[2017-08-15 02:52:55,639] {__init__.py:57} INFO - Using executor LocalExecutor
[2017-08-15 02:52:56,144] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
2017-08-15 02:52:59,055 - airflow.jobs.BackfillJob - INFO - Backfill done. Exiting
...and doesn't actually run the DAG.
When I use airflow test or airflow run (i.e. commands that run a task rather than a DAG), it works as expected.
Am I making a basic mistake? What can I do to debug from here?
Thanks
Have you already run that DAG on that date range? If so, you will need to clear the DAG runs first and then backfill, based on what Maxime mentioned here: https://groups.google.com/forum/#!topic/airbnb_airflow/gMY-sc0QVh0
If a DAG has an @monthly schedule and you try to run it with a start date mid-month, it will merely print Backfill done. Exiting. and not run anything. A schedule such as '30 5 * * *' can likewise prevent a backfill from the command line.
(Updated to reflect better information, and this discussion)
Two possible reasons:
The execution date specified via the -e option is outside of the DAG's [start_date, end_date) range.
Even if the execution date is between those dates, keep in mind that if your DAG has schedule_interval=None then it won't backfill iteratively: it will only run for a single date (the one given as --start_date, or --end_date if the former is omitted).
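A minimal sketch of a DAG definition that backfills as expected, with the matching command shown as a comment (the dag_id, task, and dates are assumptions for illustration):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# A concrete schedule_interval (not None) lets backfill create one run per
# interval between the -s and -e dates.
dag = DAG(
    'example_backfill',
    start_date=datetime(2017, 8, 1),
    schedule_interval='@daily',
)

task = BashOperator(task_id='task', bash_command='echo "running"', dag=dag)

# Then, from the command line (Airflow 1.x syntax, as in the question):
# airflow backfill example_backfill -s 2017-08-12 -e 2017-08-13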
I would like to create a workflow that starts after past executions have finished, but does not depend on their success status; that is, DAG runs would be scheduled sequentially, without any dependency on past statuses.
i.e.:
Executions, in order:
2017-03-09 15:00:00 success
2017-03-09 16:00:00 failed
2017-03-09 17:00:00 success
2017-03-09 18:00:00 success
How can I do this with Airflow?
(I want the same behavior for backfill.)
To run tasks irrespective of failed previous tasks in a given DAG:
set the trigger_rule for each operator to dummy or all_done.
To run DAG runs irrespective of previous DAG run failures:
set depends_on_past=False for the DAG's tasks (typically via default_args); see the sketch after this list.
Explore more options in the trigger_rule section of the Concepts page of the Airflow documentation:
http://airflow.incubator.apache.org/concepts.html
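A rough sketch of what those two settings look like in a DAG file (the dag_id, schedule, and task names are assumptions for illustration):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'sequential_no_past_dependency',
    start_date=datetime(2017, 3, 9),
    schedule_interval='@hourly',
    # depends_on_past=False means a task instance does not wait for its own
    # previous run to have succeeded; applied to every task via default_args.
    default_args={'depends_on_past': False},
)

first = BashOperator(task_id='first', bash_command='exit 1', dag=dag)
second = BashOperator(
    task_id='second',
    bash_command='echo "runs even if first failed"',
    # all_done: run once all upstream tasks have finished, whatever their state.
    trigger_rule='all_done',
    dag=dag,
)

first >> second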