airflow trigger_rule using ONE_FAILED cause dag failure - airflow

what i wanted to achieve is to create a task where will send notification if any-one of the task under the dag is failed. I am applying trigger rule to the task where:
batch11 = BashOperator(
task_id='Error_Buzz',
trigger_rule=TriggerRule.ONE_FAILED,
bash_command='python /home/admin/pythonwork/home/codes/notifications/dagLevel_Notification.py') ,
dag=dag,
catchup = False
)
batch>>batch11
batch1>>batch11
The problem for now is when there no other task failed, the batch11 task will not execute due to trigger_rule, which is what i wanted, but it will result the dag failure since the default trigger_rule for dag is ALL_SUCCESS. Is there a way to end the loop hole to make the dag runs successfully ?
screenshot of outcome :

We do something similar in our Airflow Deployment. The idea is to notify slack when a task in a dag fails. You can set a dag level configuration on_failure_callback as documented https://airflow.apache.org/code.html#airflow.models.BaseOperator
on_failure_callback (callable) – a function to be called when a task
instance of this task fails. a context dictionary is passed as a
single parameter to this function. Context contains references to
related objects to the task instance and is documented under the
macros section of the API.
Here is an example of how I use it. if any of the task fails or succeeds airflow calls notify function and I can get notification wherever I want.
import sys
import os
from datetime import datetime, timedelta
from airflow.operators.python_operator import PythonOperator
from airflow.models import DAG
from airflow.utils.dates import days_ago
from util.airflow_utils import AirflowUtils
schedule = timedelta(minutes=5)
args = {
'owner': 'user',
'start_date': days_ago(1),
'depends_on_past': False,
'on_failure_callback': AirflowUtils.notify_job_failure,
'on_success_callback': AirflowUtils.notify_job_success
}
dag = DAG(
dag_id='demo_dag',
schedule_interval=schedule, default_args=args)
def task1():
return 'Whatever you return gets printed in the logs!'
def task2():
return 'cont'
task1 = PythonOperator(task_id='task1',
python_callable=task1,
dag=dag)
task2 = PythonOperator(task_id='task2',
python_callable=task1,
dag=dag)
task1 >> task2

Related

Airflow triggering the "on_failure_callback" when the "dagrun_timeout" is exceeded

Currently working on setting up alerts for long running tasks in Airflow. To cancel/fail the airflow dag I've put "dagrun_timeout" in the default_args, and it does what I need, fails/errors the dag when its been running for too long (usually stuck). The only problem is that the function in "on_failure_callback" doesn't get called when the dagrun_timeout is exceeded, because the "on_failure_callback" is on the task level (I think) while the dagrun_timeout is on the dag level.
How can I execute the "on_failure_callback" when the dagrun_timeout is exceeded, or how can I specify a function to be called when a dag fails? Or should I re-think my approach?
Try setting on_failure_callback during DAG declaration:
with DAG(
dag_id="failure_callback_example",
on_failure_callback=_on_dag_run_fail,
...
) as dag:
...
The explanation is that on_failure_callback defined in default_args will get passed only to the Tasks being created and not to the DAG object.
Here is an example to try this behaviour:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.models import TaskInstance
from airflow.operators.bash import BashOperator
def _on_dag_run_fail(context):
print("***DAG failed!! do something***")
print(f"The DAG failed because: {context['reason']}")
print(context)
def _alarm(context):
print("** Alarm Alarm!! **")
task_instance: TaskInstance = context.get("task_instance")
print(f"Task Instance: {task_instance} failed!")
default_args = {
"owner": "mi_empresa",
"email_on_failure": False,
"on_failure_callback": _alarm,
}
with DAG(
dag_id="failure_callback_example",
start_date=datetime(2021, 9, 7),
schedule_interval=None,
default_args=default_args,
catchup=False,
on_failure_callback=_on_dag_run_fail,
dagrun_timeout=timedelta(seconds=45),
) as dag:
delayed = BashOperator(
task_id="delayed",
bash_command='echo "waiting..";sleep 60; echo "Done!!"',
)
will_fail = BashOperator(
task_id="will_fail",
bash_command="exit 1",
# on_failure_callback=_alarm,
)
delayed >> will_fail
You can find the logs of the callbacks execution in the Scheduler logs AIRFLOW_HOME/logs/scheduler/date/failure_callback_example :
[2021-09-24 13:12:34,285] {logging_mixin.py:104} INFO - [2021-09-24 13:12:34,285] {dag.py:862} INFO - Executing dag callback function: <function _on_dag_run_fail at 0x7f83102e8670>
[2021-09-24 13:12:34,336] {logging_mixin.py:104} INFO - ***DAG failed!! do something***
[2021-09-24 13:12:34,345] {logging_mixin.py:104} INFO - The DAG failed because: timed_out
Edit:
Within the context dict the key reason is passed in order to specify the cause of the DAG run failure. Some values are: 'reason': 'timed_out' or 'reason': 'task_failure' . This could be use to perfom specific behaviour in the callback based on the reason of the DAG Run failure.

Is it possible to have a pipeline in Airflow that does not tie to any schedule?

I need to have pipeline that will be executed either manually or programmatically, is possible with Airflow? Looks like right now each workflow MUST be tied to a schedule.
Just set the schedule_interval to None when you create the DAG:
dag = DAG('workflow_name',
template_searchpath='path',
schedule_interval=None,
default_args=default_args)
From the Airflow Manual:
Each DAG may or may not have a schedule, which informs how DAG Runs
are created. schedule_interval is defined as a DAG arguments, and
receives preferably a cron expression as a str, or a
datetime.timedelta object.
The manual then goes on to list some cron 'presets' one of which is None.
Yes, this can be achieved by passing None to schedule_interval in default_args.
Check this documation on DAG run.
For example:
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2015, 12, 1),
'email': ['airflow#example.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
'schedule_interval': None, # Check this line
}
In Airflow, every DAG is required to have a start date and schedule interval*, for example hourly:
import datetime
dag = DAG(
dag_id='my_dag',
schedule_interval=datetime.timedelta(hours=1),
start_date=datetime(2018, 5, 23),
)
(Without a schedule how would it know when to run?)
Alternatively to a cron schedule, you can set the schedule to #once to only run once.
*One exception: You can omit the schedule for externally triggered DAGs because Airflow will not schedule them itself.
However, that said, if you omit the schedule, then you need to trigger the DAG externally somehow. If you want to be able to call a DAG programmatically, for instance, as a result of a separate condition occurring in another DAG, you can do that with the TriggerDagRunOperator. You might also hear this idea called externally triggered DAGs.
Here's a usage example from the Airflow Example DAGs:
File 1 - example_trigger_controller_dag.py:
"""This example illustrates the use of the TriggerDagRunOperator. There are 2
entities at work in this scenario:
1. The Controller DAG - the DAG that conditionally executes the trigger
2. The Target DAG - DAG being triggered (in example_trigger_target_dag.py)
This example illustrates the following features :
1. A TriggerDagRunOperator that takes:
a. A python callable that decides whether or not to trigger the Target DAG
b. An optional params dict passed to the python callable to help in
evaluating whether or not to trigger the Target DAG
c. The id (name) of the Target DAG
d. The python callable can add contextual info to the DagRun created by
way of adding a Pickleable payload (e.g. dictionary of primitives). This
state is then made available to the TargetDag
2. A Target DAG : c.f. example_trigger_target_dag.py
"""
from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from datetime import datetime
import pprint
pp = pprint.PrettyPrinter(indent=4)
def conditionally_trigger(context, dag_run_obj):
"""This function decides whether or not to Trigger the remote DAG"""
c_p = context['params']['condition_param']
print("Controller DAG : conditionally_trigger = {}".format(c_p))
if context['params']['condition_param']:
dag_run_obj.payload = {'message': context['params']['message']}
pp.pprint(dag_run_obj.payload)
return dag_run_obj
# Define the DAG
dag = DAG(dag_id='example_trigger_controller_dag',
default_args={"owner": "airflow",
"start_date": datetime.utcnow()},
schedule_interval='#once')
# Define the single task in this controller example DAG
trigger = TriggerDagRunOperator(task_id='test_trigger_dagrun',
trigger_dag_id="example_trigger_target_dag",
python_callable=conditionally_trigger,
params={'condition_param': True,
'message': 'Hello World'},
dag=dag)
File 2 - example_trigger_target_dag.py:
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.models import DAG
from datetime import datetime
import pprint
pp = pprint.PrettyPrinter(indent=4)
# This example illustrates the use of the TriggerDagRunOperator. There are 2
# entities at work in this scenario:
# 1. The Controller DAG - the DAG that conditionally executes the trigger
# (in example_trigger_controller.py)
# 2. The Target DAG - DAG being triggered
#
# This example illustrates the following features :
# 1. A TriggerDagRunOperator that takes:
# a. A python callable that decides whether or not to trigger the Target DAG
# b. An optional params dict passed to the python callable to help in
# evaluating whether or not to trigger the Target DAG
# c. The id (name) of the Target DAG
# d. The python callable can add contextual info to the DagRun created by
# way of adding a Pickleable payload (e.g. dictionary of primitives). This
# state is then made available to the TargetDag
# 2. A Target DAG : c.f. example_trigger_target_dag.py
args = {
'start_date': datetime.utcnow(),
'owner': 'airflow',
}
dag = DAG(
dag_id='example_trigger_target_dag',
default_args=args,
schedule_interval=None)
def run_this_func(ds, **kwargs):
print("Remotely received value of {} for key=message".
format(kwargs['dag_run'].conf['message']))
run_this = PythonOperator(
task_id='run_this',
provide_context=True,
python_callable=run_this_func,
dag=dag)
# You can also access the DagRun object in templates
bash_task = BashOperator(
task_id="bash_task",
bash_command='echo "Here is the message: '
'{{ dag_run.conf["message"] if dag_run else "" }}" ',
dag=dag)

apache airflow - Cannot load the dag bag to handle failure

I have created a on_failure_callback function(refering Airflow default on_failure_callback) to handle task's failure.
It works well when there is only one task in a DAG, however, if there are 2 more tasks, a task is randomly failed since the operator is null, it can resume later by manully . In airflow-scheduler.out the log is:
[2018-05-08 14:24:21,237] {models.py:1595} ERROR - Executor reports
task instance %s finished (%s) although the task says its %s. Was the
task killed externally? NoneType [2018-05-08 14:24:21,238]
{jobs.py:1435} ERROR - Cannot load the dag bag to handle failure for
. Setting task to FAILED without
callbacks or retries. Do you have enough resources?
The DAG code is:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta
import airflow
from devops.util import WechatUtil
from devops.util import JiraUtil
def on_failure_callback(context):
ti = context['task_instance']
log_url = ti.log_url
owner = ti.task.owner
ti_str = str(context['task_instance'])
wechat_msg = "%s - Owner:%s"%(ti_str,owner)
WeChatUtil.notify_team(wechat_msg)
jira_desc = "Please check log from url %s"%(log_url)
JiraUtil.create_incident("DW",ti_str,jira_desc,owner)
args = {
'queue': 'default',
'start_date': airflow.utils.dates.days_ago(1),
'retry_delay': timedelta(minutes=1),
'on_failure_callback': on_failure_callback,
'owner': 'user1',
}
dag = DAG(dag_id='test_dependence1',default_args=args,schedule_interval='10 16 * * *')
load_crm_goods = BashOperator(
task_id='crm_goods_job',
bash_command='date',
dag=dag)
load_crm_memeber = BashOperator(
task_id='crm_member_job',
bash_command='date',
dag=dag)
load_crm_order = BashOperator(
task_id='crm_order_job',
bash_command='date',
dag=dag)
load_crm_eur_invt = BashOperator(
task_id='crm_eur_invt_job',
bash_command='date',
dag=dag)
crm_member_cohort_analysis = BashOperator(
task_id='crm_member_cohort_analysis_job',
bash_command='date',
dag=dag)
crm_member_cohort_analysis.set_upstream(load_crm_goods)
crm_member_cohort_analysis.set_upstream(load_crm_memeber)
crm_member_cohort_analysis.set_upstream(load_crm_order)
crm_member_cohort_analysis.set_upstream(load_crm_eur_invt)
crm_member_kpi_daily = BashOperator(
task_id='crm_member_kpi_daily_job',
bash_command='date',
dag=dag)
crm_member_kpi_daily.set_upstream(crm_member_cohort_analysis)
I had tried to update the airflow.cfg by adding the default memory from 512 to even 4096, but no luck. Would anyone have any advice ?
Ialso try to updated my JiraUtil and WechatUtil as following, encoutering the same error
WechatUtil:
import requests
class WechatUtil:
#staticmethod
def notify_trendy_user(user_ldap_id, message):
return None
#staticmethod
def notify_bigdata_team(message):
return None
JiraUtil:
import json
import requests
class JiraUtil:
#staticmethod
def execute_jql(jql):
return None
#staticmethod
def create_incident(projectKey, summary, desc, assignee=None):
return None
(I'm shooting tracer bullets a bit here, so bear with me if this answer doesn't get it right on the first try.)
The null operator issue with multiple task instances is weird... it would help approaching troubleshooting this if you could boil the current code down to a MCVE e.g., 1–2 operators and excluding the JiraUtil and WechatUtil parts if they're not related to the callback failure.
Here are 2 ideas:
1. Can you try changing the line that fetches the task instance out of the context to see if this makes a difference?
Before:
def on_failure_callback(context):
ti = context['task_instance']
...
After:
def on_failure_callback(context):
ti = context['ti']
...
I saw this usage in the Airflow repo (https://github.com/apache/incubator-airflow/blob/c1d583f91a0b4185f760a64acbeae86739479cdb/airflow/contrib/hooks/qubole_check_hook.py#L88). It's possible it can be accessed both ways.
2. Can you try adding provide_context=True on the operators either as a kwarg or in default_args?

How to add task at run time if task 1 is failed

I want to execute task 2 if task 1 is success if task 1 fails i want to run task 3 and want to assign another flow if required.
Basically i want to run conditional tasks in airflow without ssh operators.
from airflow import DAG
from airflow.operators import PythonOperator,BranchPythonOperator
from airflow.operators import BashOperator
from datetime import datetime, timedelta
from airflow.models import Variable
def t2_error_task(context):
instance = context['task_instance']
if instance.task_id == "performExtract":
print ("Please implement something over this")
task_3 = PythonOperator(
task_id='performJoin1',
python_callable=performJoin1, # maybe main?
dag = dag
)
dag.add_task(task_3)
with DAG(
'manageWorkFlow',
catchup=False,
default_args={
'owner': 'Mannu',
'start_date': datetime(2018, 4, 13),
'schedule_interval':None,
'depends_on_past': False,
},
) as dag:
task_1 = PythonOperator(
task_id='performExtract',
python_callable=performExtract,
on_failure_callback=t2_error_task,
depends_on_past=True
)
task_2 = PythonOperator(
task_id='printSchemas',
depends_on_past=True,
python_callable=printSchemaAll, # maybe main?
)
task_2.set_upstream(task_1)
Adding tasks dynamically based on execution-time statuses is not something Airflow supports. In order to get the desired behaviour, you should add task_3 to your dag but change its trigger_rule to all_failed. In this case, the task will get marked as skipped when task_1 succeeds, but it will get executed when it fails.

Triggering A SubDag

EDITED
I have edited this question by considering the inputs from #tobi6
I copied the subdag operator from Airflow source code
Source code: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/subdag_operator.py
I modified a few things in the execute method. The changes were made to trigger the SubDag and wait until the SubDag completes execution. The trigger is working great but the tasks are not being executed (DAG is in the running/Green state while the tasks are in the null/White state).
Please refer below for the changes I made:
from airflow.exceptions import AirflowException
from airflow.models import BaseOperator, Pool
from airflow.utils.decorators import apply_defaults
from airflow.utils.db import provide_session
from airflow.utils.state import State
from airflow.executors import GetDefaultExecutor
from time import sleep
import logging
from datetime import datetime
class SubDagOperator(BaseOperator):
template_fields = tuple()
ui_color = '#555'
ui_fgcolor = '#fff'
#provide_session
#apply_defaults
def __init__(
self,
subdag,
executor=GetDefaultExecutor(),
*args, **kwargs):
"""
Yo dawg. This runs a sub dag. By convention, a sub dag's dag_id
should be prefixed by its parent and a dot. As in `parent.child`.
:param subdag: the DAG object to run as a subdag of the current DAG.
:type subdag: airflow.DAG
:param dag: the parent DAG
:type subdag: airflow.DAG
"""
import airflow.models
dag = kwargs.get('dag') or airflow.models._CONTEXT_MANAGER_DAG
if not dag:
raise AirflowException('Please pass in the `dag` param or call '
'within a DAG context manager')
session = kwargs.pop('session')
super(SubDagOperator, self).__init__(*args, **kwargs)
# validate subdag name
if dag.dag_id + '.' + kwargs['task_id'] != subdag.dag_id:
raise AirflowException(
"The subdag's dag_id should have the form "
"'{{parent_dag_id}}.{{this_task_id}}'. Expected "
"'{d}.{t}'; received '{rcvd}'.".format(
d=dag.dag_id, t=kwargs['task_id'], rcvd=subdag.dag_id))
# validate that subdag operator and subdag tasks don't have a
# pool conflict
if self.pool:
conflicts = [t for t in subdag.tasks if t.pool == self.pool]
if conflicts:
# only query for pool conflicts if one may exist
pool = (
session
.query(Pool)
.filter(Pool.slots == 1)
.filter(Pool.pool == self.pool)
.first()
)
if pool and any(t.pool == self.pool for t in subdag.tasks):
raise AirflowException(
'SubDagOperator {sd} and subdag task{plural} {t} both '
'use pool {p}, but the pool only has 1 slot. The '
'subdag tasks will never run.'.format(
sd=self.task_id,
plural=len(conflicts) > 1,
t=', '.join(t.task_id for t in conflicts),
p=self.pool
)
)
self.subdag = subdag
self.executor = executor
def execute(self, context):
dag_run = self.subdag.create_dagrun(
conf=context['dag_run'].conf,
state=State.RUNNING,
execution_date=context['execution_date'],
run_id='trig__' + str(datetime.utcnow()),
external_trigger=True
)
while True:
if dag_run.get_state() == State.FAILED or dag_run.get_state() == State.SUCCESS:
break
else:
sleep(10)
continue
Below is the code that shows how I'm using the same
from airflow import DAG
from operators.sd_operator import SubDagOperator # My SubDag Operator
from airflow.operators.python_operator import PythonOperator
import logging
from datetime import datetime
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2017, 7, 17),
'email': ['airflow#example.com'],
'email_on_failure': False,
'email_on_retry': False,
}
def print_dag_details(**kwargs):
logging.info(str(kwargs['dag_run'].conf))
with DAG('example_dag', schedule_interval=None, catchup=False, default_args=default_args) as dag:
task_1 = SubDagOperator(
subdag=sub_dag_func('example_dag', 'sub_dag_1'),
task_id='sub_dag_1'
)
task_2 = SubDagOperator(
subdag=sub_dag_func('example_dag', 'sub_dag_2'),
task_id='sub_dag_2',
)
print_kwargs = PythonOperator(
task_id='print_kwargs',
python_callable=print_dag_details,
provide_context=True
)
print_kwargs >> task_1 >> task_2
Any information you provide would be helpful. Thanks in advance.
It is a bit hard to understand your question without context.
"I copied the subdag operator and modified a few things in the execute method."
From where was this copied?
"The trigger is working great ..."
How does this look like?
There are a few things I saw in the code:
It might be helpful to add assigned fields to the function call of sub_dag_func, e.g. sub_dag_func(subdag='parent_dag'...).
In the binary shift definition, used to set upstream / downstream there are tasks defined I cannot find in the DAG (df_job_1, df_job_2). This might be connected to SubDAGs (haven't looked into them yet).
The name of the sub dag seems inconsistent with the comment in the code saying By convention, a sub dag's dag_id should be prefixed by its parent and a dot but it is sub_dag_1, sub_dag_2

Resources