Is there any way to start the DAG automatically, without manually triggering it, once it is available in the DagBag, given that is_paused_upon_creation=False is set?
Try creating your DAG with schedule_interval='@once'. It should work.
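For illustration, a minimal sketch of such a DAG (the dag_id and task are placeholders; on Airflow 1.10 the DummyOperator import path is airflow.operators.dummy_operator):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id='run_once_on_discovery',  # hypothetical name
    schedule_interval='@once',       # a single run, created as soon as the DAG is parsed
    start_date=datetime(2021, 1, 1),
    is_paused_upon_creation=False,   # the DAG is unpaused from the start
) as dag:
    DummyOperator(task_id='start')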
Recently, we have been getting some errors in Airflow where certain DAGs will not run any tasks but are marked as complete.
We had set the start_date using days_ago from Airflow's utilities:
from airflow.utils.dates import days_ago
From: https://forum.astronomer.io/t/dag-run-marked-as-success-but-no-tasks-even-started/1423
If you see dag runs that are marked as success but don’t have any task runs, this means the dag runs’ execution_date was earlier than the dag’s start_date.
This is most commonly seen when the start_date is set to some dynamic value, e.g. airflow.utils.dates.days_ago(0). This creates the opportunity for the execution date of a delayed dag execution to be before what the dag now thinks is its start_date. This can even happen in a cyclic pattern, where a few dagruns will work, and then at the beginning of every day a dagrun will experience this problem.
The simplest way to avoid this problem is to never use a dynamic start_date. It is always better to specify a static start_date. If you are concerned about accidentally triggering multiple runs of the same DAG, just set catchup=False.
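For example, a minimal sketch with a static start_date (the dag_id is a placeholder):

from datetime import datetime

from airflow import DAG

with DAG(
    dag_id='static_start_date_example',  # hypothetical name
    start_date=datetime(2021, 1, 1),     # static value instead of days_ago(...)
    schedule_interval='@daily',
    catchup=False,                       # do not backfill runs between start_date and now
) as dag:
    pass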
There is an open ticket for this issue in the Airflow project: https://github.com/apache/airflow/issues/17977
I am trying to manage Airflow DAGs (create, execute, etc.) via a Java backend. Currently, after creating a DAG and placing it in Airflow's dags folder, my backend constantly tries to run it. But it can't run it until it's picked up by the Airflow scheduler, which can take quite some time when there are many DAGs. I am wondering if there are any events that Airflow emits which I can tap into to check for new DAGs processed by the scheduler, and then trigger the execute command from my backend. Or is there a way or configuration where Airflow will automatically start a DAG once it processes it, rather than us triggering it?
Is there a way or configuration where Airflow will automatically start a DAG once it processes it, rather than us triggering it?
Yes, one of the parameters that you can define is is_paused_upon_creation.
If you set your DAG as:
from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id='tutorial',
    default_args=default_args,  # assumes default_args is defined elsewhere
    description='A simple tutorial DAG',
    schedule_interval='@daily',
    start_date=datetime(2020, 12, 28),
    is_paused_upon_creation=False,
)
The DAG will start as soon as it is picked up by the scheduler (assuming the conditions to run it are met).
I am wondering if there are any events that Airflow emits which I can tap into to check for new DAGs processed by the scheduler
In Airflow >= 2.0.0 you can use the REST API's list DAGs endpoint to get all DAGs that are in the DagBag.
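For example, a sketch against the stable REST API (the host and credentials are placeholders, and basic auth assumes the corresponding auth_backend is enabled):

import requests

response = requests.get(
    'http://localhost:8080/api/v1/dags',  # list DAGs endpoint
    auth=('admin', 'admin'),              # placeholder credentials
)
response.raise_for_status()
for dag in response.json()['dags']:
    print(dag['dag_id'], 'is_paused:', dag['is_paused'])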
In any Airflow version you can use this code to list the dag_ids:
from airflow.models import DagBag

# dag_ids is a property, not a method
print(DagBag().dag_ids)
I want to set up a DAG, and there are a few cases that I would like to address while creating it.
The next run of the DAG should be skipped if the current run is still executing or has failed. I am using catchup=False and max_active_runs=1 for this; do I also need to use wait_for_downstream?
Should it be skipped, or simply not scheduled to run? An argument of depends_on_past=True in your DAG's default_args might be what you need, depending on your requirements.
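A sketch of that combination (the dag_id and task are placeholders; on Airflow 1.10 the DummyOperator import path is airflow.operators.dummy_operator):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id='no_overlap_example',             # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
    catchup=False,
    max_active_runs=1,                       # at most one active run at a time
    default_args={'depends_on_past': True},  # tasks wait for their previous run's instance to succeed
) as dag:
    DummyOperator(task_id='start')

With depends_on_past=True, a task instance is not scheduled until the same task succeeded in the previous dag run, so a failed run blocks the following ones rather than being skipped.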
I have a DAG with an ExternalTaskSensor in it. I set execution_delta correctly and everything works perfectly, unless I want to run that DAG manually. The ExternalTaskSensor stays in the running state and, after the timeout interval, fails with the exception airflow.exceptions.AirflowSensorTimeout: Snap. Time is OUT.
I know one way to run it: after manually triggering the DAG, set every ExternalTaskSensor in the DAG to the success state in the web interface.
But is there a better way to run it manually without setting every ExternalTaskSensor in the DAG to success?
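For reference, a minimal sketch of the setup being described (the IDs and delta are placeholders; on Airflow 1.10 the import path is airflow.sensors.external_task_sensor). The sensor polls for the external task at execution_date - execution_delta, and a manually triggered run's execution_date usually matches no such external run, which is why the sensor times out:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id='downstream_dag',                 # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
) as dag:
    wait_for_upstream = ExternalTaskSensor(
        task_id='wait_for_upstream',
        external_dag_id='upstream_dag',      # hypothetical DAG id
        external_task_id='final_task',       # hypothetical task id
        execution_delta=timedelta(hours=1),  # placeholder offset between the two schedules
        timeout=600,                         # seconds before AirflowSensorTimeout
    )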
Is there a possibility to prevent the scheduler from triggering a DAG as long as there is still a running instance of the same DAG?
Thanks!
Found the answer in the docs. Passing the max_active_runs parameter while constructing the DAG object does the trick:
DAG(max_active_runs=1)
https://airflow.apache.org/docs/stable/_api/airflow/models/dag/index.html
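A minimal sketch of how this fits into a DAG definition (the dag_id and schedule are placeholders):

from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id='serialized_runs_example',  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval='@hourly',
    max_active_runs=1,                 # the scheduler will not start a new run while one is still active
)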