I want to set up a DAG, and there are a few cases I would like to address while creating it.
The next run of the DAG should be skipped if the current run is still executing or has failed. I am using catchup=False and max_active_runs=1 for this; do I also need wait_for_downstream?
Should it be skipped, or simply not scheduled to run? Adding depends_on_past: True to your DAG args might be what you need, depending on your requirements.
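For example, a minimal sketch combining those arguments (the DAG id and task are placeholders, not from your setup):
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='my_dag',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
    catchup=False,  # don't backfill missed intervals
    max_active_runs=1,  # never two runs in parallel
    default_args={'depends_on_past': True},  # each task waits for its instance in the previous run to succeed
) as dag:
    BashOperator(task_id='do_work', bash_command='echo work')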
I have read that Airflow's catchup feature applies to task instances that do not yet have a state, i.e. the scheduler will pick up any execution dates where the DAG has not yet run (starting from the given start_date). Is this correct, and if so, does this mean catchup does not apply to failed DAG runs?
I am looking for a way to backfill any execution dates that failed, rather than those that never ran at all.
Take a look at the backfill command options. You could use --rerun-failed-tasks:
if set, the backfill will auto-rerun all the failed tasks for the backfill date range instead of throwing exceptions
Default: False
or --reset-dagruns:
if set, the backfill will delete existing backfill-related DAG runs and start anew with fresh, running DAG runs
Default: False
Also, keep in mind this:
If reset_dag_run option is used, backfill will first prompt users whether airflow should clear all the previous dag_run and task_instances within the backfill date range. If rerun_failed_tasks is used, backfill will auto re-run the previous failed task instances within the backfill date range.
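As a concrete illustration (the DAG id and dates are placeholders), the Airflow 2 CLI invocation might look like this:
airflow dags backfill \
    --start-date 2021-01-01 \
    --end-date 2021-01-07 \
    --rerun-failed-tasks \
    my_dag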
Before doing anything like the above, my suggestion is that you try it first with some dummy DAG or similar.
Is there any way to start the DAG automatically, without triggering it manually, once it is available in the DagBag, considering is_paused_upon_creation=False is set?
Please try creating your DAG with schedule_interval='@once'. It should work.
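A minimal sketch of what that could look like (DAG id and task are placeholders):
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='run_once_on_creation',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval='@once',  # the scheduler triggers exactly one run
    is_paused_upon_creation=False,  # DAG is unpaused as soon as it lands in the DagBag
) as dag:
    BashOperator(task_id='hello', bash_command='echo hello')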
I have multiple DAGs that run on different cadences: some weekly, some daily, etc. I want to set things up such that while dag-a is running, dag-b waits until it completes. Likewise, if dag-b is running, dag-a waits until dag-b completes, and so on. Is there a way to do this in Airflow out of the box?
What you are looking for is probably the ExternalTaskSensor.
Airflow's Cross-DAG Dependencies description is also pretty useful.
If you are using this, there is also the Airflow DAG dependencies plugin, which can be pretty useful for visualizing those dependencies.
You could use a sensor operator to sense DAG runs or a task within a DAG run; ExternalTaskSensor is the best bet. Be careful how you set the execution_delta you pass. In general, the idea is to specify when the sensor should be able to find the external DAG run.
E.g., if the main DAG is scheduled at 4:00 UTC, and the sensor is a task in that DAG like below:
from datetime import timedelta
from airflow.sensors.external_task import ExternalTaskSensor

ExternalTaskSensor(
    dag=dag,
    task_id='dag_sensor_{}'.format(key),
    external_dag_id=key,
    external_task_id=None,  # wait for the whole DAG run, not a single task
    execution_delta=timedelta(days=1),  # offset between the two DAGs' execution dates
    mode='reschedule',  # free the worker slot between pokes
    check_existence=True,  # fail fast if the external DAG doesn't exist
)
then the other DAG that should get sensed must also be triggering a run at 4:00 UTC. That one-day difference is set to offset the gap between the sensed DAG's execution date and the current one.
I have a DAG with an ExternalTaskSensor in it. I set execution_delta correctly and everything works perfectly, unless I want to run that DAG manually. My ExternalTaskSensor stays in the running state and, after the timeout interval, fails with the exception airflow.exceptions.AirflowSensorTimeout: Snap. Time is OUT.
I know one way to run it: after manually triggering the DAG, set every ExternalTaskSensor in the DAG to state success in the web interface.
But is there a better way to run it manually, without setting every ExternalTaskSensor in the DAG to success?
Is there a way to run an Airflow DAG in a loop?
When trying to create a cycle (connecting the last component to the upstream of the first one), I got "Cycle detected in DAG. Faulty task: ..."
Generally, I have a short flow of 3 BashOperator components which I want to run continually (without passing any input/output from the last component to the first).
Thanks!
You should be able to use the TriggerDagRunOperator to rerun the DAG after the last task is finished. Just put it after the last operator and make it trigger the same DAG.
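A minimal sketch of that idea (DAG id and task ids are placeholders):
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id='looping_dag',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered manually the first time, then by itself
    catchup=False,
) as dag:
    step_1 = BashOperator(task_id='step_1', bash_command='echo 1')
    step_2 = BashOperator(task_id='step_2', bash_command='echo 2')
    step_3 = BashOperator(task_id='step_3', bash_command='echo 3')

    # Re-trigger this same DAG once the last step has finished
    restart = TriggerDagRunOperator(
        task_id='restart_dag',
        trigger_dag_id='looping_dag',  # points back at this DAG
    )

    step_1 >> step_2 >> step_3 >> restart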