Is it possible to set up Nagios alerts for Airflow DAGs?
If a DAG fails, I need to alert the respective groups.
You can add an "on_failure_callback" to any task, which will call an arbitrary failure-handling function. In that function you can then send an error call to Nagios.
For example:
import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(dag_id="failure_handling",
          start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')

def handle_failure(context):
    # first get useful fields to send to nagios/elsewhere
    dag_id = context['dag'].dag_id
    ds = context['ds']
    task_id = context['ti'].task_id
    # instead of logging these out, you can send them to Nagios or somewhere else
    logging.info("dag_id={}, ds={}, task_id={}".format(dag_id, ds, task_id))

def task_that_fails(**kwargs):
    raise Exception("failing test")

task_to_fail = PythonOperator(
    task_id='task_to_fail',
    python_callable=task_that_fails,
    provide_context=True,
    on_failure_callback=handle_failure,
    dag=dag)
If you run a test on this:
airflow test failure_handling task_to_fail 2018-08-10
You get the following in your log output:
INFO - dag_id=failure_handling, ds=2018-08-10, task_id=task_to_fail
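To actually push the failure into Nagios, one common route is an NSCA passive check submitted from the callback. The sketch below is only illustrative: the Nagios host name, the service naming scheme, and the send_nsca config path are assumptions about your monitoring setup, not anything Airflow provides.

import subprocess

NAGIOS_HOST = "nagios.example.com"           # assumed NSCA server
SEND_NSCA_CFG = "/etc/nagios/send_nsca.cfg"  # assumed send_nsca client config

def notify_nagios(context):
    dag_id = context['dag'].dag_id
    task_id = context['ti'].task_id
    # send_nsca expects tab-separated fields: host, service, return code, plugin output
    payload = "airflow\t{}.{}\t2\tTask failed on {}\n".format(
        dag_id, task_id, context['ds'])
    subprocess.run(
        ["send_nsca", "-H", NAGIOS_HOST, "-c", SEND_NSCA_CFG],
        input=payload.encode(), check=True)

You could then register notify_nagios (or call it from handle_failure) as the task's on_failure_callback.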
I'm working with Airflow 2.1.4 and looking to find the status of the prior task run (Task Run, not Task Instance and not Dag Run).
I.e., DAG MorningWorkflow runs at 9:00am, and task ConditionalTask is in that DAG. There is some precondition logic that will throw an AirflowSkipException in a number of situations (including the timeframe of day and other context-specific information), to reduce the likelihood of collisions with independent processes.
If ConditionalTask fails, we can fix the issue, clear the failed run, and re-run it without running the entire DAG. However, the skip logic reruns and will often now skip it, even though the original conditions were non-skipping.
So, I want to update the precondition logic to never skip if this task instance ran previously and failed. I can determine whether the task instance ran previously using TaskInstance.try_number or TaskInstance.prev_attempted_tries, but this doesn't tell me whether it actually tried to run originally or whether it skipped (i.e., if we clear the entire DagRun to rerun the whole workflow, we would want it to still skip).
An alternative would be to determine whether the first attempted run was skipped or not.
@Kevin Crouse
In order to answer your question, we can take advantage of DagRun (from airflow.models import DagRun).
To provide you with a complete answer, I have created two functions to assist you in resolving similar quandaries in the future.
How to return the overall state/success of a specific dag_id passed as a function arg?
def get_last_dag_run_status(dag_id):
    """ Returns the status of the last dag run for the given dag_id

    1. Utilise the find method of the DagRun class
    2. Step 1 returns a list, so we sort it by the last execution date
    3. I have returned 2 examples for you to see a) the state, b) the last execution date; you can explore this further by just returning last_dag_run[0]

    Args:
        dag_id (str): The dag_id to check
    Returns:
        List - The status of the last dag run for the given dag_id
        List - The last execution date of the dag run for the given dag_id
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return [last_dag_run[0].state, last_dag_run[0].execution_date]
How to return the status of a specific task_id, within a specific dag_id?
def get_task_status(dag_id, task_id):
    """ Returns the status of a task from the last dag run for the given dag_id

    1. The code is very similar to the above function; I use it as the foundation for many similar problems/solutions
    2. The key difference is that in the return statement, we can directly access .get_task_instance, passing our desired task_id, and read its state

    Args:
        dag_id (str): The dag_id to check
        task_id (str): The task_id to check
    Returns:
        str - The status of the task in the last dag run for the given dag_id
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return last_dag_run[0].get_task_instance(task_id).state
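As a quick illustration, these can also be called from an ordinary Python script, assuming the script runs where the Airflow metadata database is reachable; the dag_id and task_id values below are placeholders, not names from your environment:

# get_last_dag_run_status returns [state, execution_date]
state, last_execution_date = get_last_dag_run_status('my_example_dag')
# get_task_status returns a single state string, e.g. 'success'
task_state = get_task_status('my_example_dag', 'my_example_task')
print(state, last_execution_date, task_state)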
I hope this helps you in your journey to resolve your issues.
For posterity, here is a complete dummy DAG to demonstrate the two functions working.
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import PythonOperator
from airflow.models import DagRun
from datetime import datetime

def get_last_dag_run_status(dag_id):
    """ Returns the status of the last dag run for the given dag_id

    Args:
        dag_id (str): The dag_id to check
    Returns:
        List - The status of the last dag run for the given dag_id
        List - The last execution date of the dag run for the given dag_id
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return [last_dag_run[0].state, last_dag_run[0].execution_date]

def get_task_status(dag_id, task_id):
    """ Returns the status of a task from the last dag run for the given dag_id

    Args:
        dag_id (str): The dag_id to check
        task_id (str): The task_id to check
    Returns:
        str - The status of the task in the last dag run for the given dag_id
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return last_dag_run[0].get_task_instance(task_id).state

with DAG(
    'stack_overflow_ans_1',
    tags=['SO'],
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
    is_paused_upon_creation=False
) as dag:

    t1 = DummyOperator(
        task_id='start'
    )

    t2 = PythonOperator(
        task_id='get_last_dag_run_status',
        python_callable=get_last_dag_run_status,
        op_args=['YOUR_DAG_NAME'],
        do_xcom_push=False
    )

    t3 = PythonOperator(
        task_id='get_task_status',
        python_callable=get_task_status,
        op_args=['YOUR_DAG_NAME', 'YOUR_DAG_TASK_WITHIN_THE_DAG'],
        do_xcom_push=False
    )

    t4 = DummyOperator(
        task_id='end'
    )

    t1 >> t2 >> t3 >> t4
I want to get the status of a task from an external DAG. I have the same tasks running in 2 different DAGs based on some conditions. So, I want to check the status of this task in DAG2 from DAG1. If the task status is 'running' in DAG2, then I will skip this task in DAG1.
I tried using:
dag_runs = DagRun.find(dag_id=dag_id, execution_date=exec_dt)
for dag_run in dag_runs:
    dag_run.state
I couldn't figure out if we can get task status using DagRun.
If I use TaskDependencySensor, the DAG will have to wait until it finds the allowed_states of the task.
Is there a way to get the current status of a task in another DAG?
I used the code below to get the status of a task from another DAG:
from airflow.api.common.experimental.get_task_instance import get_task_instance

def get_dag_state(execution_date, **kwargs):
    ti = get_task_instance('dag_id', 'task_id', execution_date)
    task_status = ti.current_state()
    return task_status

dag_status = BranchPythonOperator(
    task_id='dag_status',
    python_callable=get_dag_state,
    provide_context=True,
    dag=dag
)
More details can be found here
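The experimental API module used above has been deprecated in newer Airflow releases. If you would rather not depend on it, the same information can be read through DagRun, along the lines of the earlier answer. A minimal sketch, where the dag_id and task_id you pass in are whatever the other DAG uses:

from airflow.models import DagRun

def get_external_task_state(dag_id, task_id, execution_date):
    # Look up the other DAG's run for this execution date and read the
    # state of the task instance inside it (returns None if not found).
    dag_runs = DagRun.find(dag_id=dag_id, execution_date=execution_date)
    if not dag_runs:
        return None
    task_instance = dag_runs[0].get_task_instance(task_id)
    return task_instance.state if task_instance else None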
I have a list that I loop over to create the tasks. The list is static as far as size.
for counter, account_id in enumerate(ACCOUNT_LIST):
    task_id = f"bash_task_{counter}"
    if account_id:
        trigger_task = BashOperator(
            task_id=task_id,
            bash_command="echo hello there",
            dag=dag)
    else:
        trigger_task = BashOperator(
            task_id=task_id,
            bash_command="echo hello there",
            dag=dag)
        trigger_task.status = SKIPPED  # is there a way to somehow set the status of this to skipped instead of having a branch operator?
    trigger_task
I tried this manually, but I cannot get the tasks marked as skipped:
start = DummyOperator(task_id='start')
task1 = DummyOperator(task_id='task_1')
task2 = DummyOperator(task_id='task_2')
task3 = DummyOperator(task_id='task_3')
task4 = DummyOperator(task_id='task_4')
start >> task1
start >> task2
try:
    start >> task3
    raise AirflowSkipException
except AirflowSkipException as ase:
    log.error('Task Skipped for task3')

try:
    start >> task4
    raise AirflowSkipException
except AirflowSkipException as ase:
    log.error('Task Skipped for task4')
Yes, you need to raise AirflowSkipException there:
from airflow.exceptions import AirflowSkipException
raise AirflowSkipException
For more information, see the source code.
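Note that this has to happen at task run time, inside the operator's callable, not at DAG-definition time as in the try/except attempt above; raising it while the DAG file is being parsed has no effect on task state. A minimal sketch of a callable that skips itself (the function and argument names are only illustrative):

from airflow.exceptions import AirflowSkipException

def maybe_skip(account_id, **kwargs):
    if not account_id:
        # raising here marks this task instance as 'skipped' in the UI
        raise AirflowSkipException("no account_id, skipping")
    print("processing account {}".format(account_id))

A PythonOperator wrapping this callable will show up as skipped whenever account_id is falsy.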
Have a fixed number of tasks to execute per DAG. This is really fine, and it also amounts to planning how many parallel tasks, at most, your system should handle without degrading downstream systems. Also, having a fixed number of tasks makes them visible in the web UI and gives you an indication of whether they were executed or skipped.
In the code below, I initialize the list with None items and then update the list with values based on the data returned from the DB. In the python_callable function, I check whether the account_id is None; if so, I raise an AirflowSkipException, otherwise I execute the function. In the UI, the tasks are visible and indicate whether they were executed or skipped (skipped meaning there is no account_id).
from airflow.exceptions import AirflowSkipException
from airflow.operators.python_operator import PythonOperator

def execute(account_id):
    if account_id:
        print(f'************Executing task for account_id:{account_id}')
    else:
        raise AirflowSkipException

def create_task(task_id, account_id):
    return PythonOperator(task_id=task_id,
                          python_callable=execute,
                          op_args=[account_id])

list_from_dbhook = [1, 2, 3]  # dummy list. Get records using a DB Hook

# Need to have some fixed size. Need to allocate fixed resources or # of tasks.
# Having this fixed number of tasks makes the tasks visible in the UI instead of being purely dynamic
record_size_limit = 5

ACCOUNT_LIST = [None] * record_size_limit
for index, account_id_val in enumerate(list_from_dbhook):
    ACCOUNT_LIST[index] = account_id_val

for idx, acct_id in enumerate(ACCOUNT_LIST):
    task = create_task(f"task_{idx}", acct_id)
    task
Could someone please give me a step-by-step manual on how to connect Apache Airflow to a Slack workspace?
I created a webhook for my channel; what should I do with it next?
Kind regards
Create a Slack Token from
https://api.slack.com/custom-integrations/legacy-tokens
Use the SlackAPIPostOperator operator in your DAG as below:
SlackAPIPostOperator(
    task_id='failure',
    token='YOUR_TOKEN',
    text=text_message,
    channel=SLACK_CHANNEL,
    username=SLACK_USER)
The above is the simplest way you can use Airflow to send messages to Slack.
However, if you want to configure Airflow to send messages to Slack on task failures, create a function and add on_failure_callback to your tasks with the name of the created slack function. An example is below:
def slack_failed_task(contextDictionary, **kwargs):
    failed_alert = SlackAPIPostOperator(
        task_id='slack_failed',
        channel="#datalabs",
        token="...",
        text=':red_circle: DAG Failed',
        owner='_owner')
    return failed_alert.execute()

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
Using a Slack webhook (works only for Airflow >= 1.10.0):
If you want to use a Slack webhook, use SlackWebhookOperator in a similar manner:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/slack_webhook_operator.py#L25
Try the new SlackWebhookOperator, which is available in Airflow version >= 1.10.0:
import os

from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator

slack_msg = "Hi Wssup?"

slack_test = SlackWebhookOperator(
    task_id='slack_test',
    http_conn_id='slack_connection',
    webhook_token='/1234/abcd',
    message=slack_msg,
    channel='#airflow_updates',
    username='airflow_' + os.environ['ENVIRONMENT'],
    icon_emoji=None,
    link_names=False,
    dag=dag)
Note: Make sure you have slack_connection added in your Airflow connections as
host=https://hooks.slack.com/services/
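If you prefer to create that connection programmatically rather than through the UI (Admin -> Connections), one way is via the metadata session. This is just a sketch; the conn_id must match whatever you pass as http_conn_id:

from airflow import settings
from airflow.models import Connection

# Register the Slack webhook connection in the Airflow metadata DB.
# 'slack_connection' is the conn_id referenced by http_conn_id above.
conn = Connection(
    conn_id='slack_connection',
    conn_type='http',
    host='https://hooks.slack.com/services/')

session = settings.Session()
session.add(conn)
session.commit()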
The full example with SlackWebhookOperator usage, as in @kaxil's answer:
def slack_failed_task(task_name):
    failed_alert = SlackWebhookOperator(
        task_id='slack_failed_alert',
        http_conn_id='slack_connection',
        webhook_token=Variable.get("slackWebhookToken", default_var=""),
        message='#here DAG Failed {}'.format(task_name),
        channel='#epm-marketing-dev',
        username='Airflow_{}'.format(ENVIRONMENT_SUFFIX),
        icon_emoji=':red_circle:',
        link_names=True,
    )
    return failed_alert.execute()

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
As @Deep Nirmal noted: make sure you have slack_connection added in your Airflow connections as
host=https://hooks.slack.com/services/
Problem: I've been trying to find a way to get tasks from a DAG that have no downstream tasks following them.
Why I need it: I'm building an "on success" notification for DAGs. Airflow DAGs have an on_success_callback argument, but the problem with that is that it gets triggered after every task success instead of just once for the DAG. I've seen other people approach this problem by creating a notification task and appending it to the end. The problem I have with this approach is that many DAGs we're using have multiple ends, and some are auto-generated.
Making sure that all ends are caught manually is tedious.
I've spent hours digging for a way to access data I need to build this.
Sample DAG setup:
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 7, 29)}

dag = DAG(
    'append_to_end',
    description='append a task to all tasks without downstream',
    default_args=default_args,
    schedule_interval='* * * * *',
    catchup=False)
task_1 = DummyOperator(dag=dag, task_id='task_1')
task_2 = DummyOperator(dag=dag, task_id='task_2')
task_3 = DummyOperator(dag=dag, task_id='task_3')
task_1 >> task_2
task_1 >> task_3
This produces the following DAG:
What I want to achieve is an automated way to include a new task in a DAG that connects to all ends, like in the image below.
I know it's an old post, but I've had a similar need as the above posted.
You can add a condition to your return statement that filters out your "final_task" id, so that it won't be included in the get_leaf_tasks return, something like:
def get_leaf_tasks(dag):
    return [task for task_id, task in dag.task_dict.items()
            if len(task.downstream_list) == 0 and task_id != 'final_task']
Additionally, you can change this part:
for task in leaf_tasks:
    task >> final_task
to:
get_leaf_tasks(dag) >> final_task
Since it already gives you a list of tasks, the bitwise operator ">>" will handle the whole list for you.
What I've got so far is the code below:
def get_leaf_tasks(dag):
    return [task for task_id, task in dag.task_dict.items()
            if len(task.downstream_list) == 0]

leaf_tasks = get_leaf_tasks(dag)
final_task = DummyOperator(dag=dag, task_id='final_task')

for task in leaf_tasks:
    task >> final_task
It produces the result I want, but what I don't like about this solution is that get_leaf_tasks must be executed before final_task is created, or it will be included in the leaf_tasks list and I'll have to find ways to exclude it.
I could wrap the assignment in another function:
def append_to_end(dag, task):
    leaf_tasks = get_leaf_tasks(dag)
    dag.add_task(task)
    for leaf_task in leaf_tasks:
        leaf_task >> task

final_task = DummyOperator(task_id='final_task')
append_to_end(dag, final_task)
This is not ideal either, as the caller must ensure they've created final_task without a DAG assigned to it.
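One way around that last concern is to let the helper create the closing task itself, so the caller never handles a DAG-less operator. A sketch under that assumption, reusing the get_leaf_tasks defined above (the function and task names here are just illustrative, not part of Airflow):

from airflow.operators.dummy_operator import DummyOperator

def append_final_task(dag, task_id='final_task'):
    # Capture the current leaves *before* adding the new task,
    # so the new task cannot show up in its own upstream list.
    leaf_tasks = get_leaf_tasks(dag)
    final_task = DummyOperator(dag=dag, task_id=task_id)
    for leaf_task in leaf_tasks:
        leaf_task >> final_task
    return final_task

# Usage: call once, after all other tasks have been added to the DAG.
append_final_task(dag, task_id='notify_success')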