I would like to create a workflow that starts after past executions have finished, but does not depend on their success status. In other words, DAG runs would be scheduled sequentially, without any dependency on past statuses.
i.e., executions in order:
2017-03-09 15:00:00 success
2017-03-09 16:00:00 failed
2017-03-09 17:00:00 success
2017-03-09 18:00:00 success
How can I do this using Airflow?
(I want the same behaviour for backfills.)
To run tasks irrespective of failed previous tasks within a given DAG:
set the trigger_rule of each operator to dummy or all_done.
To run DAG runs irrespective of previous DAG run failures:
set depends_on_past=False for each task (typically via the DAG's default_args).
Explore more options in the trigger_rule section on the Concepts page of the Airflow documentation:
http://airflow.incubator.apache.org/concepts.html
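For illustration, here is a minimal sketch combining both settings; the DAG id, task ids, and bash commands are hypothetical, and the imports assume an Airflow 1.x-era layout. depends_on_past=False lets each run ignore earlier runs' outcomes, while max_active_runs=1 keeps runs executing one at a time, in order:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'depends_on_past': False,  # do not wait for the previous run to succeed
}

dag = DAG(
    'sequential_but_independent',  # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2017, 3, 9),
    schedule_interval='@hourly',
    max_active_runs=1,  # one DAG run at a time, so runs stay sequential
)

t1 = BashOperator(task_id='extract', bash_command='echo extract', dag=dag)
t2 = BashOperator(
    task_id='load',
    bash_command='echo load',
    trigger_rule='all_done',  # run once upstream finishes, success or not
    dag=dag,
)
t1 >> t2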
The next run date isn't getting updated for paused DAGs, even though catchup is set to False.
I have a DAG that is scheduled to run daily, and catchup is set to False.
The DAG ran from 2022-06-01 to 2022-07-01. I paused it on 2022-07-02 and un-paused/enabled it on 2022-07-15.
Airflow then started running the DAG for dates starting from 2022-07-02, even with catchup set to False.
What properties should be set to make Airflow run the DAG starting from the date it was un-paused (2022-07-15 in this case)?
I also noticed that for DAGs that have never run and are in a paused state, the next run date does get updated; it is only when a DAG has run and is then paused that the next run date stops updating.
I am using Airflow 1.9.0 with the LocalExecutor.
I have a subdag containing two long-running tasks. The structure of the subdag is:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator

# parent_dag_name, start, schedule, default_args and dag are defined
# elsewhere in the file
def create_subdag(name_suffix, default_args):
    dag_name = '{}.{}'.format(parent_dag_name, name_suffix)
    subdag = DAG(dag_name, start_date=start, schedule_interval=schedule,
                 default_args=default_args)
    t1 = BashOperator(
        task_id='print_date',
        bash_command='some_long_running_cmd_1',
        dag=subdag)
    t2 = BashOperator(
        task_id='sleep',
        bash_command='some_long_running_cmd_2',
        dag=subdag)
    return subdag

sub_dag_1 = SubDagOperator(
    subdag=create_subdag('subdag1', default_args),
    task_id='subdag1',
    dag=dag)
I would like to be able to re-run task t2 when it fails even if task t1 is still running. Normally, clearing the status of a failed task causes it to get re-scheduled, even if other tasks in the dag are running. However, clearing the status of task t2 does not get it re-scheduled. Furthermore, clearing the status of sub_dag_1 while it is still running seems to get the scheduler into a hung state where the DAG never transitions out of running even after t2 completes, but t1 is never rescheduled for execution.
Is there a way to re-run a task in the subdag immediately without waiting for the other tasks to complete?
I have the following possible solutions:
1. In your subdag, try clearing t2 first and then manually re-running t2 independently (see the CLI sketch after this list).
2. Usually, once sub_dag_1 has finished (whether it ended up in a success or a failure state), clearing t2 will automatically re-trigger t2 to run.
3. If you still want t2 to re-run while t1 is still running and sub_dag_1 is running too, mark sub_dag_1 as success (non-recursively), then repeat step 2.
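For reference, clearing t2 from the command line might look like the following sketch; the parent DAG id and dates are placeholders, and in Airflow 1.9 the subdag's tasks live under the DAG id parent_dag_id.subdag1:

# clear only the failed 'sleep' task inside the subdag for one execution date
airflow clear parent_dag_id.subdag1 -t sleep -s 2018-01-01 -e 2018-01-01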
Hopefully this will help you. BTW, based on my experience, SubDagOperator is not a very good operator to use in real-world applications: it adds more issues (e.g., subdag deadlocks, Celery workers becoming slow to pick up subdag tasks, etc.) than the problems it solves.
From the Airflow manual at https://airflow.apache.org/tutorial.html#testing, I found that I can run something like the following to test a specific task:
airflow test dag_id task_id
When I did, I only got this message:
[2018-07-10 18:29:54,346] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2018-07-10 18:29:54,367] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2018-07-10 18:29:54,477] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-07-10 18:29:54,513] {models.py:189} INFO - Filling up the DagBag from /var/lib/airflow/dags
It doesn't look like it is actually running the task. Am I misunderstanding something? Or is there another way to run a DAG locally?
I copied this example call from the paragraph on the page you linked to:
# command layout: command subcommand dag_id task_id date
# testing print_date
airflow test tutorial print_date 2015-06-01
# testing sleep
airflow test tutorial sleep 2015-06-01
So just include the date as shown above and the DAG task should run as expected.
For Airflow version 2.4.0:
airflow tasks test tutorial sleep 2015-06-01
Let's say today is 2017-10-20. I have an existing DAG which has been successful up until today. I need to add a task with a start_date of 2017-10-01. How can I make the scheduler trigger the task from 2017-10-01 to 2017-10-20 automatically?
You can use the backfill command line tool.
airflow backfill your_dag_id -s 2017-10-01 -e 2017-10-20 -t task_name_regex
This assumes there are already DAG runs for dates beginning from 2017-10-01. If that's not the case, make sure the DAG's start date is 2017-10-01 or earlier and that catchup is enabled.
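A minimal sketch of what enabling catchup looks like on the DAG object; the DAG id here is hypothetical:

from datetime import datetime
from airflow import DAG

# catchup=True makes the scheduler backfill all missed schedule intervals
# between start_date and now
dag = DAG('my_dag',
          start_date=datetime(2017, 10, 1),
          schedule_interval='@daily',
          catchup=True)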
If you don't mind executing the whole DAG again, you can delete it from the Web UI and it will appear again with status Off. If you enable it again, it will run from the beginning, including the new tasks.
I'm running Airflow and attempting to iterate on some tasks we're building from the command line.
When running the airflow webserver, everything works as expected. But when I run airflow backfill dag task '2017-08-12', Airflow returns:
[2017-08-15 02:52:55,639] {__init__.py:57} INFO - Using executor LocalExecutor
[2017-08-15 02:52:56,144] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
2017-08-15 02:52:59,055 - airflow.jobs.BackfillJob - INFO - Backfill done. Exiting
...and doesn't actually run the DAG.
When using airflow test or airflow run (i.e., commands involving running a task rather than a DAG), it works as expected.
Am I making a basic mistake? What can I do to debug from here?
Thanks
Have you already run those DAGs on that date range? You will need to clear the DAG first, then backfill. Based on what Maxime mentioned here: https://groups.google.com/forum/#!topic/airbnb_airflow/gMY-sc0QVh0
If a task has an @monthly schedule and you try to run it with a start_date mid-month, it will merely state Backfill done. Exiting. A schedule such as '30 5 * * *' likewise prevents backfill from the command line when the requested dates don't line up with the schedule.
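A typical clear-then-backfill sequence might look like this sketch, assuming your DAG id is literally dag as in the command above:

# wipe any existing task instances for the date, then backfill it again
airflow clear dag -s 2017-08-12 -e 2017-08-12
airflow backfill dag -s 2017-08-12 -e 2017-08-12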
(Updated to reflect better information, and this discussion)
Two possible reasons:
The execution date specified via the -e option is outside the DAG's [start_date, end_date) range.
Even if the execution date is between those dates, keep in mind that if your DAG has schedule_interval=None then it won't backfill iteratively: it will only run for a single date (specified as --start_date, or --end_date if the first is omitted).
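For example (a sketch with a hypothetical DAG id), with schedule_interval=None the following would run only for 2017-08-10, not for the whole range:

airflow backfill my_dag -s 2017-08-10 -e 2017-08-14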