Airflow manually run DAG with ExternalTaskSensor - airflow

I have DAG with ExternalTaskSensor in it. I correctly set execution_delta and all work perfectly unless I want to run that DAG manually. My ExternalTaskSensor has state running and after timeout interval it's failed with exception airflow.exceptions.AirflowSensorTimeout: Snap. Time is OUT.
I know way to run it - after manually triggered DAG set every ExternalTaskSensor in DAG to state success in web interface.
But is the better way to run it manually without to set every ExternalTaskSensor in DAG to success ?

Related

Airflow Dag run marked as success although the tasks didn't run

Recently, we have been getting some errors on airflow where certain dags will not run any tasks but are being marked as complete.
We had the start_date using days_ago from airflow.
from airflow.utils.dates import days_ago
From: https://forum.astronomer.io/t/dag-run-marked-as-success-but-no-tasks-even-started/1423
If you see dag runs that are marked as success but don’t have any task runs, this means the dag runs’ execution_date was earlier than the dag’s start_date.
This is most commonly seen when the start_date is set to some dynamic value e.g. airflow.utils.dates.days_ago(0). This creates the opportunity for the execution date of a delayed dag execution to be before what the dag now thinks is it’s start_date. This can even happen in a cyclic pattern, where a few dagruns will work, and then at the beginning of every day a dagrun will experience this problem.
This simplest way to avoid this problem is the never use dynamic start_date. It is always better to specify a static start_date. If you are concerned about accidentally triggering multiple runs of the same dag, just set catchup=False.
There is an open ticket in Airflow project with this issue: https://github.com/apache/airflow/issues/17977

Datadog airflow alert when a dag is paused

We have encountered a scenario recently where someone mistakenly turned off a production dag, and we want to get alert whenever a dag is paused using datadog.
I have checked https://docs.datadoghq.com/integrations/airflow/?tab=host
But have not got any metric for dag to check if it is paused or not.
I can run a custom script in datadog as well.
One of the method is that I exec into postgres pod and get the list of active dags:
select * from dag where is_paused=true;
Or is there any other way I can get the unpaused dag list and also when new dag is added what is the best way to handle it.
I want the alert whenever a unpaused dag is paused.
If you are on Airflow 2 you can use the REST API to query for state of the DAG.
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag
There is "is_paused" field.
And of you are not Airflow 2, you should be. Airflow 1.10 is end-of-life and will not receive any fixes (including critical security fixes) so you should upgrade as soon as you can.

Start Airflow dag automatically once it is available in dagbag

Is there anyway to start the dag automatically without manually triggering dag once it is available in dagbag, considering is_paused_upon_creation=false is set.
Please, try to create your DAG with schedule_interval='#once'. It should works.

Airflow - pause/unpause individual dagruns of the same DAG running in parallel

We are currently evaluating airflow for a project. I was wondering if there is a way to stop/start individual dagruns while running a DAG multiple times in parallel. Pause/unpause on dag_id seems to pause/unpause all the dagruns under a dag. Instead we want to pause individual dagruns (or tasks within them). Let me know if this is achievable in airflow.
If its not possible, here are other alternatives I am thinking of, let me know your opinion on these
Change task state. – Change all tasks under a dagrun to Mark Failed or Success. That way that particular dagrun is stopped on its tracks without affecting other dagruns.
Airflow sensor to pull this information from s3 or http or sql or somewhere to pause current dagrun. And have a task to check on s3 everytime if this dagrun needs to be stopped (not other dagruns).
subdags. - Can we pause/unpause subdags. That way for each parallel user's request we want to do we issue a subdag and we can pause userAs subdag without impacting other user’s subdags.
There's nothing "baked" into Airflow to support this but you could (ab)use the state of the DagRun by changing it to "failed" to pause and then back to "running" to resume; you won't be able to blanket unpause but for testing it should be workable.

Is there a way to to setup airflow dags such that dag-a won't run if dag-b is still running and vice versa?

I have multiple dags that run on different cadence: some weekly, some daily etc. I want it to setup such that while dag-a is running, dag-b should wait until it is completed. Also, if dag-b is running dag-a should wait until dag-b completes, etc. Is there a way to do this in airflow out of the box?
What you are looking for is probably the ExternalTaskSensor
Airflow's Cross-DAG Dependencies description is also pretty useful.
If you are using this, there is also the Airflow DAG dependencies plugin, which can be pretty useful for visualizating those dependencies.
You could use the sensor operator to sense the dag runs or a task in a dag run. External task sensor is the best bet. Be careful how you set the timedelta passed. In general, the idea is to specify the when should the sensor be able to find the dag run.
Eg:
If the main dag is scheduled at 4 UTC, and a task sensor is a task in the dag like below
ExternalTaskSensor(
dag=dag,
task_id='dag_sensor_{}'.format(key),
external_dag_id=key,
timedelta=timedelta(days=1),
external_task_id=None,
mode='reschedule',
check_existence=True
)
Then the other dag that should get sensed must be triggering a run at 4.00UTC. That one day difference is set to offset the difference of execution date and current date

Resources