Aggregate failure notifications into a single email for a DAG in Airflow

So I have a DAG with lots of tasks, and some of them do fail during the day. I'm wondering if there is a way to aggregate all the failure notifications into a single email for the DAG in Airflow.

Two main ways to go about this:
1. Have a task at the end of your DAG with a callback that retrieves all failed task instances in that DAG run and sends an email with the info:
from airflow.utils.email import send_email

def custom_callback(context):
    # The DagRun of the current run is available in the callback context.
    dag_run = context.get("dag_run")
    # Collect every task instance of this run that ended in the "failed" state.
    task_instances = dag_run.get_task_instances(state="failed")
    msg = f"These task instances failed: {task_instances}"
    subject = f"Failed tasks for DAG run {dag_run}"
    send_email(to="<your email>", subject=subject, html_content=msg)
(adapted from source)
2. Call the Airflow REST API to list the task instances with state "failed" for a specific DAG or DAG run; a sketch follows below.
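A minimal sketch of that second option with the stable REST API (assumptions: Airflow 2 with the API and basic auth enabled; the base URL, credentials, DAG id and run id below are placeholders):
import requests

BASE_URL = "http://localhost:8080/api/v1"  # placeholder webserver URL

def failed_task_ids(dag_id, dag_run_id):
    # List the task instances of one DagRun, filtered to the "failed" state.
    resp = requests.get(
        f"{BASE_URL}/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances",
        params={"state": "failed"},
        auth=("airflow", "airflow"),  # placeholder credentials
    )
    resp.raise_for_status()
    return [ti["task_id"] for ti in resp.json()["task_instances"]]
You can then feed that list into whatever notification mechanism you prefer (for example the send_email call shown above).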

Related

Airflow - Notification on failure of the whole dag

I'm trying to implement a way to get notified when my dag fails.
I tried to use the email_on_failure and a webhook method ( https://code.mendhak.com/Airflow-MS-Teams-Operator/ ).
But for both of them, I got a notification for every task that failed.
Is there a way to get notified only if the whole dag doesn't work?
I really appreciate any help you can provide.
You can choose to set on_failure_callback on operator level or on DAG level.
On DAG - a function to be called when a DagRun of this dag fails.
On Operator - a function to be called when a task instance of this task fails.
In your case you need to set on_failure_callback in your DAG object:
dag = DAG(
    dag_id=dag_id,
    on_failure_callback=func_to_execute,
)
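For completeness, a hypothetical func_to_execute (my sketch, not from the answer); the DAG-level callback receives the usual context dict and runs once per failed DagRun rather than once per failed task:
def func_to_execute(context):
    # Called a single time when the DagRun as a whole ends up failed.
    dag_run = context.get("dag_run")
    print(f"DAG {dag_run.dag_id} failed for run {dag_run.run_id}")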

Setting multiple DAG dependency in Airflow

I have multiple Ingestion DAGs -> 'DAG IngD1', 'DAG IngD2', 'DAG IngD3' , and so on which ingest data for individual tables.
After the ingestion DAGs are completed successfully, I want to run a single transformation DAG -> 'DAG Tran'. Which means that the DAG 'DAG Tran' should be triggered only when all the ingestion Dags 'DAG IngD1', 'DAG IngD2' and 'DAG IngD3' have successfully finished.
To achieve this, if I use the ExternalTaskSensor operator, the external_dag_id parameter is a string and not a list. Does that mean I need three ExternalTaskSensor operators in my 'DAG Tran', one for each ingestion DAG? Is my understanding correct, or is there an easier way?
I'm currently facing the same DAG dependency management problem.
My solution is to set up a mediator DAG that uses TriggerDagRunOperator tasks to express the dependency between DAGs.
# create a mediator DAG whose only job is to express the cross-DAG dependency
from datetime import datetime
from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(dag_id="mediator_dag", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    trigger_dag_a = TriggerDagRunOperator(task_id="trigger_dag_a", trigger_dag_id="a")
    trigger_dag_b = TriggerDagRunOperator(task_id="trigger_dag_b", trigger_dag_id="b")
    trigger_dag_c = TriggerDagRunOperator(task_id="trigger_dag_c", trigger_dag_id="c")
    # task flow: trigger DAG a first, then DAGs b and c
    trigger_dag_a >> [trigger_dag_b, trigger_dag_c]
The article "Cross-DAG dependencies in Apache Airflow" might also help you!

Airflow: How to only send email alerts when it is not a consecutive failure?

I have an Airflow DAG that executes 10 tasks (exporting different data from the same source) in parallel, every 15 minutes. I've also enabled email_on_failure to get notified of failures.
Once every month or so, the tasks start failing for a couple of hours because the data source is not available, causing Airflow to generate hundreds of emails (10 emails every 15 min.) until the raw data source is available again.
Is there a better way to avoid being spammed with emails while consecutive runs keep failing?
For example, is it possible to only send an email on failure when it is the first run that starts failing (i.e. the previous run was successful)?
To customise the logic in callbacks you can use on_failure_callback and define a Python function to call on failure/success. In this function you can access the task instance.
A property of this task instance is try_number, which you can check before sending an alert. An example could be:
from airflow.operators.bash import BashOperator

def task_fail_email_alert(context):
    # try_number of the task instance that just failed
    try_number = context["ti"].try_number
    if try_number == 1:
        pass  # send alert
    else:
        pass  # do nothing

some_operator = BashOperator(
    task_id="some_operator",
    bash_command="""
    echo "something"
    """,
    on_failure_callback=task_fail_email_alert,
    dag=dag,
)
You can then implement the code that sends the email in this function, rather than using the built-in email_on_failure. The EmailOperator is available via from airflow.operators.email import EmailOperator.
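For example, one way to send the alert from inside the callback is to instantiate EmailOperator and call its execute() method directly, expanding the "# send alert" placeholder above (a sketch; the address and wording are placeholders, and executing the operator inline is my choice, not something prescribed here):
from airflow.operators.email import EmailOperator

def task_fail_email_alert(context):
    ti = context["ti"]
    if ti.try_number == 1:
        # Build a one-off EmailOperator and run it in place instead of scheduling it.
        EmailOperator(
            task_id="notify_first_failure",      # required by the operator, never scheduled
            to="you@example.com",                # placeholder address
            subject=f"Task {ti.task_id} failed on its first try",
            html_content="The task failed on its first attempt; see the Airflow logs.",
        ).execute(context=context)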
Given that your tasks run concurrently and one or multiple failures could occur, I would suggest treating the dispatch of failure messages as you would a shared resource.
You need to implement a lock that is "dagrun-aware", i.e. one that knows about the DagRun.
You can back this lock with an in-memory database like Redis, an object store like S3, a file on a shared filesystem, or a database. How you choose to implement this is up to you.
In your on_failure_callback implementation, you must acquire that lock. If the acquisition succeeds, carry on and dispatch the email; otherwise, pass.
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


class OnlyOnceLock:
    def __init__(self, run_id):
        self.run_id = run_id

    def acquire(self):
        # Returns False if run_id already exists in a backing store.
        # S3 example
        hook = S3Hook()
        key = self.run_id
        bucket_name = 'coordinated-email-alerts'
        try:
            hook.head_object(key, bucket_name)
            return False
        except:
            # This is the first time lock is acquired
            hook.load_string('fakie', key, bucket_name)
            return True

    def __enter__(self):
        return self.acquire()

    def __exit__(self, exc_type, exc_val, exc_tb):
        pass


def on_failure_callback(context):
    error = context['exception']
    task = context['task']
    run_id = context['run_id']
    ti = context['ti']
    with OnlyOnceLock(run_id) as lock:
        if lock:
            ti.email_alert(error, task)
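One way to wire this up (a sketch; the DAG id, schedule and task layout are placeholders): put the callback in default_args so all 10 export tasks share it, and the S3-backed lock then lets only the first failure of a given run send the email.
from datetime import datetime
from airflow import DAG

default_args = {"on_failure_callback": on_failure_callback}

with DAG(
    dag_id="exports",
    start_date=datetime(2022, 1, 1),
    schedule_interval="*/15 * * * *",
    default_args=default_args,
) as dag:
    ...  # define the 10 export tasks here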

Is it possible to retrieve the last successful task execution date during a task execution on airflow?

On Airflow, we are currently using {{ prev_execution_date_success }} at the DAG level to execute queries.
I was wondering if it is possible to have it per task (i.e. retrieving the last successful execution date of a particular task rather than of the whole DAG).
Thanks for your help
From the current DAG run you can access the task instance and look up the previous task instance in the success state.
from airflow.models.taskinstance import TaskInstance
from airflow.utils.state import State

ti = TaskInstance(task_id=your_task_id,
                  dag_id=your_dag_id,
                  execution_date=execution_date)
prev_task_success_state = ti.get_previous_ti(state=State.SUCCESS)
Note that get_previous_ti returns a TaskInstance object, so you can access anything related to that task instance.
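If you are already inside a running task (e.g. a PythonOperator callable), a simpler route is to take the task instance from the context instead of constructing one yourself; a small sketch under that assumption:
from airflow.utils.state import State

def my_callable(**context):
    # The TaskInstance of the current try is passed in via the context.
    prev_ti = context["ti"].get_previous_ti(state=State.SUCCESS)
    if prev_ti is not None:
        # e.g. the execution date of this task's last successful run
        print(prev_ti.execution_date)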

Airflow - mark a specific task_id of given dag_id and run_id as success or failure

Can I externally (e.g. via an HTTP request) mark a specific task_id associated with a dag_id and run_id as success/failure?
My task is a long-running task on an external system, and I don't want my task to poll the system to find the status, since we can have several thousand tasks running at the same time.
Ideally I want my task to:
make an HTTP request to start my external job
go to sleep
once the job is finished, have it (the external system, or the post-build action of my job) inform Airflow that the task is done (identified by task_id, dag_id and run_id)
Thanks
You can solve this by sending SQL queries directly into Airflow's metadata DB:
UPDATE task_instance
SET state = 'success',
try_number = 0
WHERE
task_id = 'YOUR-TASK-ID'
AND
dag_id = 'YOUR-DAG-ID'
AND
execution_date = '2019-06-27T16:56:17.789842+00:00';
Notes:
The execution_date filter is crucial: Airflow identifies DagRuns by execution_date, not really by their run_id. This means you really need to get your DagRun's execution/run date to make it work.
The try_number = 0 part is added because sometimes Airflow will reset the task back to failed if it notices that try_number is already at its limit (max_tries).
You can see it in Airflow's source code here: https://github.com/apache/airflow/blob/750cb7a1a08a71b63af4ea787ae29a99cfe0a8d9/airflow/models/dagrun.py#L203
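If you prefer to run that UPDATE from code rather than a SQL console, here is a minimal sketch with plain SQLAlchemy, assuming direct access to the metadata DB and the older schema used above (where task_instance still has an execution_date column); the connection URI and identifiers are placeholders:
from sqlalchemy import create_engine, text

# placeholder URI for the Airflow metadata database
engine = create_engine("postgresql+psycopg2://airflow:airflow@localhost/airflow")

with engine.begin() as conn:
    conn.execute(
        text(
            "UPDATE task_instance SET state = 'success', try_number = 0 "
            "WHERE task_id = :task_id AND dag_id = :dag_id "
            "AND execution_date = :execution_date"
        ),
        {
            "task_id": "YOUR-TASK-ID",
            "dag_id": "YOUR-DAG-ID",
            "execution_date": "2019-06-27T16:56:17.789842+00:00",
        },
    )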
Airflow doesn't yet have a REST endpoint for this. However, you have a couple of options:
- Use the Airflow command line utilities to mark the job as success, e.g. from Python using Popen (a sketch follows below).
- Directly update the Airflow DB table task_instance.
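A sketch of the command-line option via Popen (the --mark-success flag is my assumption; check airflow tasks run --help on your version, and note that older releases spell it airflow run --mark_success):
from subprocess import PIPE, Popen

cmd = [
    "airflow", "tasks", "run",
    "YOUR-DAG-ID", "YOUR-TASK-ID", "2019-06-27T16:56:17.789842+00:00",
    "--mark-success",  # assumed flag: mark the task as succeeded without running it
]
proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
out, err = proc.communicate()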
