I'm trying to implement a way to get notified when my DAG fails.
I tried to use the email_on_failure and a webhook method ( https://code.mendhak.com/Airflow-MS-Teams-Operator/ ).
But for both of them, I got a notification for every task that failed.
Is there a way to get notified only if the whole DAG fails?
I really appreciate any help you can provide.
You can set on_failure_callback at the operator level or at the DAG level.
On the DAG - a function to be called when a DagRun of this DAG fails.
On the operator - a function to be called when a task instance of this task fails.
In your case you need to set on_failure_callback in your DAG object:
dag = DAG(
    dag_id=dag_id,
    on_failure_callback=func_to_execute,
)
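For illustration, here is a minimal sketch of what that could look like, assuming a placeholder DAG id and callback body (the DAG-level on_failure_callback receives a context dict for the failed run):

from airflow import DAG
from datetime import datetime

def func_to_execute(context):
    # Hypothetical callback: called once when the DagRun fails.
    dag_run = context.get("dag_run")
    print(f"DAG {dag_run.dag_id} failed, run_id={dag_run.run_id}")
    # send a single Teams/Slack/email notification here

dag = DAG(
    dag_id="my_dag",                      # placeholder
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    on_failure_callback=func_to_execute,
)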
I followed the docs and created a Slack notification function.
It does work and I get notifications in the channel, but the task name and the log link point to another task, not to the one that failed.
It gets the context of the upstream failed task, but not the failed task itself.
I tried with different operators and hooks, but got the same result.
If anyone could help, I would really appreciate it.
Thank you!
The goal of the on_failure_callback argument at the DAG level is to run the callback once when the DagRun fails. Since it is a DagRun-level callback, we provide the context of the DagRun, which is identical between the task instances, so it doesn't matter which task instance's context is passed (I think we provide the context of the last task defined in the DAG, regardless of its state).
If you want to run the callback for each failed task instance, remove the on_failure_callback argument from the DAG and add it to the default args instead: default_args=dict(on_failure_callback=task_fail_slack_alert).
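A minimal sketch of that change, assuming task_fail_slack_alert is the Slack callback you already wrote (the DAG id and tasks are placeholders):

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

# task_fail_slack_alert is your existing Slack notification function
default_args = dict(on_failure_callback=task_fail_slack_alert)

with DAG(
    dag_id="my_dag",                      # placeholder
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,            # applied to every task, not to the DagRun
) as dag:
    # Each task that fails now fires the callback with its own context.
    t1 = BashOperator(task_id="t1", bash_command="some_command")
    t2 = BashOperator(task_id="t2", bash_command="another_command")
    t1 >> t2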
I have a question about the TriggerDagRunOperator, specifically the wait_for_completion parameter.
Before moving to Airflow 2.2, we used this operator to trigger another DAG and an ExternalTaskSensor to wait for its completion.
In Airflow 2.2 there is a new parameter called wait_for_completion that, if set to True, makes the task complete only when the triggered DAG has completed.
This is great, but I was wondering whether the worker will be released between pokes or not. I know that the ExternalTaskSensor has a reschedule mode that you can use when the expected wait is longer than a minute, which releases the worker slot between pokes - but I don't see it in the documentation anymore.
My question is whether the wait_for_completion parameter causes the operator to release the worker between pokes. From looking at the code I don't think that is the case, so I just want to verify.
If it isn’t releasing the worker and the triggered DAG is bound to take more than 1m to finish, what should be the best approach here?
We are using Airflow 2.2 on MWAA, so I guess deferrable operators are not an option (if they would even be a solution in this case).
When using wait_for_completion=True in TriggerDagRunOperator, the worker slot will not be released as long as the operator is running. You can see that in the operator implementation: the operator uses time.sleep(self.poke_interval) between checks.
As you pointed out, there are two ways to achieve the goal of verifying that the triggered DAG completed:
Option 1: In DAG A, using TriggerDagRunOperator followed by an ExternalTaskSensor
Option 2: Using TriggerDagRunOperator with wait_for_completion=True
However, aside from the resource issue you mentioned, the two options are not really equivalent.
In option 1, if the triggered DAG fails then the ExternalTaskSensor will fail.
In option 2, consider:
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

my_op = TriggerDagRunOperator(
    task_id='task',
    trigger_dag_id="dag_b",
    ...,
    wait_for_completion=True,
    retries=2,
)
If dag_b fails, then TriggerDagRunOperator will retry, which will trigger another DagRun of dag_b.
Both options are valid. You need to decide which behavior is suitable for your use case.
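For reference, a rough sketch of option 1 that also avoids holding a worker slot, by running the sensor in reschedule mode (DAG/task ids and intervals are placeholders, and it assumes the triggered run's logical date is aligned by passing the parent's execution_date to the trigger):

from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.sensors.external_task import ExternalTaskSensor

# Fire-and-forget trigger; align dag_b's logical date with this DAG's run
# so the sensor below can find it (execution_date is a templated field).
trigger = TriggerDagRunOperator(
    task_id="trigger_dag_b",
    trigger_dag_id="dag_b",
    execution_date="{{ execution_date }}",
)

# Wait for the whole dag_b run; reschedule mode frees the worker slot between pokes.
wait = ExternalTaskSensor(
    task_id="wait_for_dag_b",
    external_dag_id="dag_b",
    external_task_id=None,          # None waits for the entire DAG run
    mode="reschedule",
    poke_interval=300,
    timeout=60 * 60,
)

trigger >> wait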
In Airflow, we are currently using {{ prev_execution_date_success }} at the DAG level to execute queries.
I was wondering if it is possible to have it per task (i.e., retrieving the last successful execution date of a particular task rather than of the whole DAG).
Thanks for your help
From the current DAG run you can access the task instance and look up the previous task instance in the success state.
from airflow.models.taskinstance import TaskInstance
from airflow.utils.state import State

ti = TaskInstance(task_id=your_task_id,
                  dag_id=your_dag_id,
                  execution_date=execution_date)
prev_task_success_state = ti.get_previous_ti(state=State.SUCCESS)
Note that get_previous_ti returns a TaskInstance object, so you can access anything related to that task instance.
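In practice the easiest place to do this is from inside a running task, where the current TaskInstance is already available in the context; a minimal sketch, with placeholder task and callable names:

from airflow.operators.python import PythonOperator
from airflow.utils.state import State

def use_prev_success(**context):
    ti = context["ti"]                                   # current TaskInstance
    prev_ti = ti.get_previous_ti(state=State.SUCCESS)    # previous successful run of this task
    if prev_ti:
        print(f"Last successful execution date for this task: {prev_ti.execution_date}")

# Goes inside your DAG definition.
query_task = PythonOperator(
    task_id="run_query",
    python_callable=use_prev_success,
)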
I want to set up a DAG, and there are a few cases that I would like to address while creating it.
The next run of the DAG should be skipped if the current run is still executing or has failed. I'm using catchup=False and max_active_runs=1 for this; do I also need to use wait_for_downstream?
Should it be skipped, or simply not scheduled to run? Setting depends_on_past=True in your DAG's default args might be what you need, depending on your requirements; see the sketch below.
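A rough sketch of how those settings fit together (the DAG id and schedule are placeholders; depends_on_past is a task-level setting passed through default_args):

from airflow import DAG
from datetime import datetime

with DAG(
    dag_id="my_dag",                  # placeholder
    start_date=datetime(2022, 1, 1),
    schedule_interval="@hourly",
    catchup=False,                    # don't backfill missed intervals
    max_active_runs=1,                # only one DagRun at a time
    default_args={
        "depends_on_past": True,      # a task only runs if its previous run succeeded
    },
) as dag:
    ...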
I have a DAG with an ExternalTaskSensor in it. I set execution_delta correctly and everything works perfectly unless I want to run the DAG manually. The ExternalTaskSensor stays in the running state and after the timeout interval it fails with the exception airflow.exceptions.AirflowSensorTimeout: Snap. Time is OUT.
I know one way to run it: after manually triggering the DAG, set every ExternalTaskSensor in the DAG to the success state in the web interface.
But is there a better way to run it manually without setting every ExternalTaskSensor in the DAG to success?