How to get reason for failure using Slack in Airflow 2.0 - airflow

How can I get the reason for the failure of an operator without going into the logs? I want to post the reason as a notification through Slack.
Thanks,
Xi

I can think of one way of doing this, as below.
Set up error notifications -> https://www.astronomer.io/guides/error-notifications-in-airflow/
Also create a Slack email alias for the channel or DM: https://slack.com/help/articles/206819278-Send-emails-to-Slack
The other way is using the Slack API from Airflow: https://medium.com/datareply/integrating-slack-alerts-in-airflow-c9dcd155105
See the above for SlackAPIPostOperator.
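A minimal sketch of the first approach, assuming the channel's Slack email alias and a working SMTP setup in airflow.cfg (the alias address below is a placeholder): Airflow's built-in failure emails are pointed at the alias, so every task failure lands in the channel.

from datetime import datetime

from airflow import DAG

default_args = {
    # Placeholder address: the email alias generated by the Slack channel's email integration
    'email': ['your-channel-alias@yourworkspace.slack.com'],
    'email_on_failure': True,
    'email_on_retry': False,
}

dag = DAG(
    dag_id='emails_failures_to_slack',
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
)

This only works if SMTP is configured so Airflow can actually send mail; the callback approach described next gives more control over the message content.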

exception = context.get('exception') is what gives the exact reason for the failure.
Example of on_failure_callback using Slack:
from airflow.hooks.base import BaseHook
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator
from airflow.providers.amazon.aws.sensors.emr_step import EmrStepSensor

SLACK_CONN_ID = 'slack'

def task_fail_slack_alert(context):
    # The incoming webhook token is stored in the password field of the 'slack' connection
    slack_webhook_token = BaseHook.get_connection(SLACK_CONN_ID).password
    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Dag*: {dag}
        *Execution Time*: {exec_date}
        *Log Url*: {log_url}
        *Error*: {exception}
        """.format(
        task=context.get('task_instance').task_id,
        dag=context.get('task_instance').dag_id,
        exec_date=context.get('execution_date'),
        log_url=context.get('task_instance').log_url,
        exception=context.get('exception'),
    )
    failed_alert = SlackWebhookOperator(
        task_id='slack_test',
        http_conn_id=SLACK_CONN_ID,
        webhook_token=slack_webhook_token,
        message=slack_msg,
        username='airflow',
        dag=dag,
    )
    return failed_alert.execute(context=context)

step_checker = EmrStepSensor(
    task_id='watch_step',
    job_flow_id="{{ task_instance.xcom_pull('create_job_flow', key='return_value') }}",
    step_id="{{ task_instance.xcom_pull(task_ids='add_steps', key='return_value')[0] }}",
    aws_conn_id='aws_default',
    on_failure_callback=task_fail_slack_alert,
)

Related

Pass information between two SimpleHttpOperators with xcom_pull()

I am fairly new to Airflow and I am currently trying to pass information between my SimpleHttpOperators.
This is where the data is retrieved:
request_city_information = SimpleHttpOperator(
    http_conn_id='overpass',
    task_id='basic_city_information',
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method='POST',
    data=f'[out:json]; node[name={name_city}][capital]; out center;',
    response_filter=lambda response: response.json()['elements'][0],
    dag=dag,
)
And then I want to use the response from this in the following operator:
request_city_attractions = SimpleHttpOperator(
    http_conn_id='overpass',
    task_id='city_attractions',
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method='POST',
    data=f"[out:json];(nwr[tourism='attraction'][wikidata](around:{search_radius},"
         f"{request_city_information.xcom_pull(context='ti')['lat']}"
         f",10););out body;>;out skel qt;",
    dag=dag,
)
As you can see, I tried to access the response via request_city_information.xcom_pull(context='ti'). However, my context seems to be wrong here.
As my data is already written into XCom, I take it that I don't need xcom_push='True', as suggested here.
There seem to have been changes to XCom since Airflow 2.x, as many of the suggested solutions I found do not work for me.
I believe there is a major gap in my thought process; I just don't know where.
I would appreciate any references to examples or help!
Thanks in advance
I have now solved it with a completely different approach; if anyone knows how to make the first one work, I would be happy to get an explanation for that.
Here is my solution:
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    'city_info',
    default_args=default_args,
    description='xcom test',
    schedule_interval=None,
) as dag:

    # TODO: Tasks with conn_id
    def get_city_information(**kwargs):
        payload = f'[out:json]; node[name={name_city}][capital]; out center;'
        # TODO: Request as a Connection
        r = requests.post('https://overpass-api.de/api/interpreter', data=payload)
        ti = kwargs['ti']
        ti.xcom_push('basic_city_information', r.json())

    get_city_information_task = PythonOperator(
        task_id='get_city_information_task',
        python_callable=get_city_information,
    )

    def get_city_attractions(**kwargs):
        ti = kwargs['ti']
        city_information = ti.xcom_pull(task_ids='get_city_information_task', key='basic_city_information')
        payload = f"[out:json];(nwr[tourism='attraction'][wikidata](around:{search_radius}" \
                  f",{city_information['elements'][0]['lat']},{city_information['elements'][0]['lon']}" \
                  f"););out body;>;out skel qt;"
        r = requests.post('https://overpass-api.de/api/interpreter', data=payload)
        # TODO: JSON as an object
        ti.xcom_push('city_attractions', r.json())

    get_city_attractions_task = PythonOperator(
        task_id='get_city_attractions_task',
        python_callable=get_city_attractions,
    )

    get_city_information_task >> get_city_attractions_task
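For reference, the first approach usually fails because xcom_pull is called on the operator object at DAG-parse time, where no task instance context exists yet. A hedged sketch of how it is typically written in Airflow 2.x instead, pulling the upstream response through a Jinja template in SimpleHttpOperator's templated data field (names reused from the question, not tested against an actual Overpass setup):

# Sketch: the upstream response is stored as the 'return_value' XCom of
# task 'basic_city_information' (via response_filter), so it can be
# referenced with Jinja inside the templated `data` field instead of
# calling xcom_pull on the operator object.
request_city_attractions = SimpleHttpOperator(
    http_conn_id='overpass',
    task_id='city_attractions',
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method='POST',
    data="[out:json];(nwr[tourism='attraction'][wikidata]"
         "(around:" + str(search_radius) + ","
         "{{ ti.xcom_pull(task_ids='basic_city_information')['lat'] }},"
         "{{ ti.xcom_pull(task_ids='basic_city_information')['lon'] }}"
         "););out body;>;out skel qt;",
    dag=dag,
)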

KubernetesPodOperator xcom_push key/values not available to subsequent task with xcom_pull

Here is an example of the KubernetesPodOperator I am trying --
set_tag = KubernetesPodOperator(
    namespace='default',
    task_id='set-tag',
    name='set-tag',
    image='ubuntu:18.04',
    xcom_push=True,
    cmds=["/bin/sh", "-c"],
    arguments=['''mkdir /airflow &&
                  mkdir /airflow/xcom &&
                  echo '{"test_key":"test_value"}' > /airflow/xcom/return.json
               '''],
)
In the next downstream PythonOperator, I am trying to fetch this tag as follows -
def print_tag(**kwargs):
    ti = kwargs['ti']
    print(ti.xcom_pull(task_ids='set-tag', key='test_key'))

get_tag = PythonOperator(
    task_id='get-tag',
    dag=dag,
    python_callable=print_tag,
    provide_context=True,
)
I am using 'airflow test' to first run task 'set-tag' and then run 'get-tag' hoping to see the 'test_value' printed. But the printed value appears as 'None'.
Any pointers are much appreciated.
Thanks in advance.
At the moment, the KubernetesPodOperator argument for XCom push is named do_xcom_push, not xcom_push.
Source code
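A hedged sketch of the corrected operator and pull, assuming do_xcom_push is available in the installed provider version. Note that the JSON written to /airflow/xcom/return.json comes back as the task's return_value XCom, so the downstream task pulls the whole dict and indexes into it:

set_tag = KubernetesPodOperator(
    namespace='default',
    task_id='set-tag',
    name='set-tag',
    image='ubuntu:18.04',
    do_xcom_push=True,  # correct argument name
    cmds=["/bin/sh", "-c"],
    arguments=['''mkdir -p /airflow/xcom &&
                  echo '{"test_key":"test_value"}' > /airflow/xcom/return.json'''],
)

def print_tag(**kwargs):
    ti = kwargs['ti']
    # Without an explicit key, xcom_pull returns the return_value XCom (the whole dict)
    tag = ti.xcom_pull(task_ids='set-tag')
    print(tag['test_key'] if tag else None)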

Airflow : Not receiving an Error message in Email, whenever the DAG/TASK is failed with on_failure_callback

Airflow version 1.10.3
Below is the module code that is called by on_failure_callback.
I have used reason = context.get("exception"), but I get None in the email when the job fails instead of the actual error message.
Output in the email:
Reason for Failure: None
alert_email.py
import logging

from airflow.utils.email import send_email
from airflow.models import Variable

logger = logging.getLogger(__name__)


def failure_alert(context, config=None):
    config = {} if config is None else config
    email = config.get('email_id')
    task_id = context.get('task_instance').task_id
    dag_id = context.get("dag").dag_id
    execution_time = context.get("execution_date")
    reason = context.get("exception")
    dag_failure_html_body = f"""<html>
    <header><title>The below DAG has failed!</title></header>
    <body>
    <b>DAG Name</b>: {dag_id}<br/>
    <b>Task Id</b>: {task_id}<br/>
    <b>Execution Time (UTC)</b>: {execution_time}<br/>
    <b>Reason for Failure</b>: {reason}<br/>
    </body>
    </html>
    """
    try:
        send_email(
            to=email,
            subject=f"Airflow alert: <DagInstance: {dag_id} - {execution_time} [failed]",
            html_content=dag_failure_html_body,
        )
    except Exception as e:
        logger.error(f'Error in sending email to address {email}: {e}', exc_info=True)
The issue is with Airflow version 1.10.3. We will be upgrading to Airflow 1.10.10.
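Until the upgrade, a small guard at least makes the email say why the reason is missing rather than printing None; this is only a defensive sketch, not a fix for the missing context key:

# Defensive sketch: fall back to a readable placeholder when the callback
# context does not carry the exception (as on this Airflow version)
reason = context.get("exception") or "Unknown (exception not available in callback context)"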

How to integrate Apache Airflow with Slack?

Could someone please give me a step-by-step manual on how to connect Apache Airflow to a Slack workspace?
I created a webhook for my channel; what should I do with it next?
Kind regards
Create a Slack Token from
https://api.slack.com/custom-integrations/legacy-tokens
Use the SlackAPIPostOperator in your DAG as below:
SlackAPIPostOperator(
    task_id='failure',
    token='YOUR_TOKEN',
    text=text_message,
    channel=SLACK_CHANNEL,
    username=SLACK_USER,
)
The above is the simplest way you can use Airflow to send messages to Slack.
However, if you want to configure Airflow to send messages to Slack on task failures, create a function and add on_failure_callback to your tasks with the name of the created Slack function. An example is below:
def slack_failed_task(contextDictionary, **kwargs):
    failed_alert = SlackAPIPostOperator(
        task_id='slack_failed',
        channel="#datalabs",
        token="...",
        text=':red_circle: DAG Failed',
        owner='_owner',
    )
    return failed_alert.execute()

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
Using SlackWebHook (Works only for Airflow >= 1.10.0):
If you want to use SlackWebHook use SlackWebhookOperator in a similar manner:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/slack_webhook_operator.py#L25
Try the new SlackWebhookOperator, which is available in Airflow version >= 1.10.0:
from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator

slack_msg = "Hi Wssup?"

slack_test = SlackWebhookOperator(
    task_id='slack_test',
    http_conn_id='slack_connection',
    webhook_token='/1234/abcd',
    message=slack_msg,
    channel='#airflow_updates',
    username='airflow_' + os.environ['ENVIRONMENT'],
    icon_emoji=None,
    link_names=False,
    dag=dag)
Note: Make sure you have slack_connection added in your Airflow connections as
host=https://hooks.slack.com/services/
The full example with SlackWebhookOperator usage, as in @kaxil's answer:
def slack_failed_task(context):
    failed_alert = SlackWebhookOperator(
        task_id='slack_failed_alert',
        http_conn_id='slack_connection',
        webhook_token=Variable.get("slackWebhookToken", default_var=""),
        message='#here DAG Failed {}'.format(context.get('task_instance').task_id),
        channel='#epm-marketing-dev',
        username='Airflow_{}'.format(ENVIRONMENT_SUFFIX),
        icon_emoji=':red_circle:',
        link_names=True,
    )
    return failed_alert.execute(context=context)

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
As @Deep Nirmal noted: make sure you have slack_connection added in your Airflow connections as
host=https://hooks.slack.com/services/

How to setup Nagios Alerts for Apache Airflow Dags

Is it possible to setup Nagios alerts for airflow dags?
In case the dag is failed, I need to alert the respective groups.
You can add an "on_failure_callback" to any task which will call an arbitrary failure handling function. In that function you can then send an error call to Nagios.
For example:
import logging

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(dag_id="failure_handling",
          schedule_interval='@daily')

def handle_failure(context):
    # first get useful fields to send to nagios/elsewhere
    dag_id = context['dag'].dag_id
    ds = context['ds']
    task_id = context['ti'].task_id
    # instead of printing these out - you can send these to somewhere else
    logging.info("dag_id={}, ds={}, task_id={}".format(dag_id, ds, task_id))

def task_that_fails(**kwargs):
    raise Exception("failing test")

task_to_fail = PythonOperator(
    task_id='task_to_fail',
    python_callable=task_that_fails,
    provide_context=True,
    on_failure_callback=handle_failure,
    dag=dag)
If you run a test on this:
airflow test failure_handling task_to_fail 2018-08-10
You get the following in your log output:
INFO - dag_id=failure_handling, ds=2018-08-10, task_id=task_to_fail
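To actually forward these fields to Nagios instead of logging them, handle_failure could submit a passive check result over HTTP. The sketch below is only illustrative: the NRDP endpoint URL, token, and payload layout are assumptions about a typical Nagios NRDP setup and must be adapted to your installation.

import json
import requests

def send_failure_to_nagios(dag_id, task_id, ds):
    # Hypothetical NRDP endpoint and token -- replace with your Nagios server's values
    nrdp_url = "https://nagios.example.com/nrdp/"
    check_result = {
        "checkresults": [{
            "checkresult": {"type": "service"},
            "hostname": "airflow",
            "servicename": "{}.{}".format(dag_id, task_id),
            "state": 2,  # CRITICAL
            "output": "Airflow task failed on {}".format(ds),
        }]
    }
    # Assumed NRDP-style parameters; consult your Nagios NRDP documentation
    requests.post(nrdp_url, data={"token": "YOUR_TOKEN",
                                  "cmd": "submitcheck",
                                  "json": json.dumps(check_result)})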
