Airflow: Slack Alert: Post the error message

Requirement: how to get the error message into the Slack message.
Airflow version: 2.2.4
The Slack alert below only says that the task failed, but not why it failed:
from airflow.hooks.base import BaseHook
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

def slack_alert(context):
    SLACK_CONN_ID = 'slack_conn_id'
    # The webhook token is stored as the password of the Slack connection
    slack_webhook_token = BaseHook.get_connection(SLACK_CONN_ID).password
    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Dag*: {dag}
        *Execution Time*: {exec_date}
        *Log Url*: {log_url}
        """.format(
        task=context.get("task_instance").task_id,
        dag=context.get("task_instance").dag_id,
        exec_date=context.get("logical_date"),
        log_url=context.get("task_instance").log_url.replace(
            "http://localhost:8080", "https://airflow.yourdomain.com"
        ),
    )
    slack_notification = SlackWebhookOperator(
        task_id="slack_notification",
        http_conn_id=SLACK_CONN_ID,
        webhook_token=slack_webhook_token,
        message=slack_msg,
        username='airflow')
    return slack_notification.execute(context=context)

Figured out the answer: it's `context.get("exception")` (or, in DAG-level callbacks, `context.get("reason")`).
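For example (a minimal sketch of the change only; the rest of the callback above stays the same), the message template can carry the exception directly:

    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Error*: {exception}
        """.format(
        task=context.get("task_instance").task_id,
        exception=context.get("exception"),  # the exception that failed the task
    )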

Related

Got "Future <Future pending> attached to a different loop" when using eventhub aio in FastAPI (Python)

I am using Python 3.9, azure-eventhub 5.10.1, and azure-eventhub-checkpointstoreblob-aio. The following code throws the exception regularly (we also have lots of successful cases that send the message with no error), but I also get the runtime errors below in the logs. Wondering what I did wrong here. Thanks.
async def send_to_eventhub(self, producer, event_list, timestamp_event_received):
    try:
        async with producer:
            event_data_batch = await producer.create_batch()
            for (occupancy_status, hardware_id) in event_list:
                # set message properties for space report
                message_body = {
                    ...
                }
                message = EventData(json.dumps(message_body))
                message.properties = {
                    ...
                }
                # Send message to the eventhub
                logger.info("Sending message %s, %s", message, message.properties)
                event_data_batch.add(message)
            await producer.send_batch(event_data_batch)
            logger.info(
                "Message successfully sent %s, %s", message, message.properties
            )
    except (
        EventDataError,
        EventDataSendError,
        OperationTimeoutError,
        OwnershipLostError,
        RuntimeError,
    ) as event_ex:
        logger.error(
            "eventhub Sending Error: Error occurred "
            "sending message for hardware id %s %s %s",
            hardware_id,
            event_ex,
            traceback.format_exc(),
        )
This function is called from the following FastAPI handler:
@app.post(...)
async def handle_report(
    ...
):
    ...
    try:
        if len(incoming_data) > 0:
            event_list = []
            for sensor_data in incoming_data:
                data = sensor_data["data"]
                occupancy_status = json.loads(data)["value"]
                hardware_id = sensor_data["properties"]["propertyList"][0]["value"]
                event_list.append((occupancy_status, hardware_id))
            await eventhub_helper.send_to_eventhub(
                producer, event_list, received_timestamp
            )
    ...
And the exception says:
eventhub Sending Error: Error occurred sending message for hardware id TSPR04ESH11000268 Task <Task pending name='Task-544711411' coro=<RequestResponseCycle.run_asgi() running at /opt/pysetup/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py:375> cb=[set.discard()]> got Future <Future pending> attached to a different loop
Traceback (most recent call last):
  File "/app/eventhub_helper.py", line 94, in send_to_eventhub
    logger.info(
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/azure/eventhub/aio/_producer_client_async.py", line 218, in __aexit__
    await self.close()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/azure/eventhub/aio/_producer_client_async.py", line 811, in close
    async with self._lock:
  File "/usr/local/lib/python3.9/asyncio/locks.py", line 14, in __aenter__
    await self.acquire()
  File "/usr/local/lib/python3.9/asyncio/locks.py", line 120, in acquire
    await fut
RuntimeError: Task <Task pending name='Task-544711411' coro=<RequestResponseCycle.run_asgi() running at /opt/pysetup/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py:375> cb=[set.discard()]> got Future <Future pending> attached to a different loop
I tried to reproduce this error, but it was hard because the call usually goes through with no error. I wonder if I did not consider concurrency enough. I did notice that `event_data_batch.add(message)` can raise an error if the batch is full, but I don't think that could cause a RuntimeError, and I know the messages we send are small.
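For what it's worth, the traceback shows the failure inside `producer.close()`, where the client's internal asyncio lock is awaited; that error typically means the producer was created on one event loop and used from another. A minimal sketch of creating the client inside the running coroutine (the connection string and hub name are assumed placeholders, not from the original code):

    import json

    from azure.eventhub import EventData
    from azure.eventhub.aio import EventHubProducerClient

    # Assumed placeholders; the real values would come from configuration
    CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."
    EVENTHUB_NAME = "<eventhub-name>"

    async def send_to_eventhub(event_list):
        # Creating the client inside the coroutine binds its internal locks
        # and futures to the event loop that is actually running the request,
        # rather than to a loop from import time or another worker.
        producer = EventHubProducerClient.from_connection_string(
            conn_str=CONN_STR, eventhub_name=EVENTHUB_NAME
        )
        async with producer:
            batch = await producer.create_batch()
            for occupancy_status, hardware_id in event_list:
                batch.add(EventData(json.dumps(
                    {"status": occupancy_status, "hardware_id": hardware_id}
                )))
            await producer.send_batch(batch)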

How to get the reason for failure using Slack in Airflow 2.0

How do I get the reason for the failure of an operator without going into the logs? I want to post the reason as a notification through Slack.
Thanks,
Xi
I can think of one way of doing this, as below:
Set up error notifications -> https://www.astronomer.io/guides/error-notifications-in-airflow/
Also create a Slack email alias for DMs: https://slack.com/help/articles/206819278-Send-emails-to-Slack
Another way is using the Slack API from Airflow: https://medium.com/datareply/integrating-slack-alerts-in-airflow-c9dcd155105 (see SlackAPIPostOperator there).
`context.get('exception')` is what gives the exact reason for the failure.
Example of on_failure_callback using Slack:
step_checker = EmrStepSensor(
    task_id='watch_step',
    job_flow_id="{{ task_instance.xcom_pull('create_job_flow', key='return_value') }}",
    step_id="{{ task_instance.xcom_pull(task_ids='add_steps', key='return_value')[0] }}",
    aws_conn_id='aws_default',
    on_failure_callback=task_fail_slack_alert,
)
from airflow.hooks.base import BaseHook
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

def task_fail_slack_alert(context):
    SLACK_CONN_ID = 'slack'
    slack_webhook_token = BaseHook.get_connection(SLACK_CONN_ID).password
    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Dag*: {dag}
        *Execution Time*: {exec_date}
        *Log Url*: {log_url}
        *Error*: {exception}
        """.format(
        task=context.get('task_instance').task_id,
        dag=context.get('task_instance').dag_id,
        exec_date=context.get('execution_date'),
        log_url=context.get('task_instance').log_url,
        exception=context.get('exception'),
    )
    failed_alert = SlackWebhookOperator(
        task_id='slack_test',
        http_conn_id='slack',
        webhook_token=slack_webhook_token,
        message=slack_msg,
        username='airflow',
        dag=dag)
    return failed_alert.execute(context=context)
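If you want the alert for every task rather than a single sensor, the same callback can be registered once via default_args (a minimal sketch with a hypothetical DAG, assuming task_fail_slack_alert is importable):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_failure_alerts",  # hypothetical DAG id
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
        default_args={"on_failure_callback": task_fail_slack_alert},
    ) as dag:
        # Any failing task in this DAG now triggers the Slack alert
        BashOperator(task_id="will_fail", bash_command="exit 1")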

Airflow Error callback "on_failure_callback" is not executing all the lines in the function

I have a problem with the usage of on_failure_callback. I have defined my error callback function to perform two HTTP POST requests, with a logging.error() message between the two. I notice that only one is getting executed. Is there any delay or something that I am missing here?
Please help.
def custom_failure_function(context):
    logging.error("These task instances ahhh")
    to_json = json.loads(t_teams)
    var1 = json.dumps(to_json)
    print(var1)
    r = requests.post('https://myteamschannel/teams', data=var1, verify=False)
    logging.error("hello")
    runID = 'OPERATION_CONTEXT .OCV8.TEST2 alarm_object 193'
    headers = {'Content-Type': 'text/xml'}
    alarmRequest = '<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:oper=\"http://172.19.146.147:7180/TeMIP_WS/services/OPERATION_CONTEXT-alarm_object\"><soapenv:Header xmlns:wsa=\"http://www.w3.org/2005/08/addressing\"><wsu:Timestamp xmlns:wsu=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd\"><wsu:Created>2014-05-22T11:57:38.267Z</wsu:Created><wsu:Expires>2014-05-22T12:02:38.000Z</wsu:Expires></wsu:Timestamp><wsse:Security xmlns:wsse=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd\" soapenv:mustUnderstand=\"1\"><wsse:UsernameToken xmlns:wsu=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd\"><wsse:Username>girws</wsse:Username><wsse:Password Type=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText\">Temip</wsse:Password></wsse:UsernameToken></wsse:Security></soapenv:Header> <soapenv:Body> <oper:Set_Request xmlns:oper=\"http://172.19.146.147:7180/TeMIP_WS/services/OPERATION_CONTEXT-alarm_object\"><EntitySpec><Natural> '+ runID + '</Natural></EntitySpec><Arguments> <Attribute_Values><Filtering_Type>' + 'AUTOFAIL' + '</Filtering_Type></Attribute_Values></Arguments></oper:Set_Request> </soapenv:Body> </soapenv:Envelope>'
    # note: the keyword is headers=, not header= (the typo raises a TypeError)
    r = requests.post('http://myerrorappli:7180/TeMIP_WS/services/OPERATION_CONTEXT-alarm_object',
                      headers=headers, data=alarmRequest, verify=True)
    logging.error("FAILED TASK")
    logging.error("============================================")
My Airflow logs are below. They stop at the "hello" message and never print "FAILED TASK".
*** Reading local file: /data/airflow//logs/MOE_TEST_DAG/TeamsTest/2021-10-02T08:24:14.821970+00:00/3.log
[2021-10-02 10:24:36,535] {MOE_TEST.py:132} ERROR - These task instances ahhh
[2021-10-02 10:24:36,987] {MOE_TEST.py:138} ERROR - hello
From your description, it's more likely that there is an issue with requests.post(); try adding a timeout to the request:
def custom_failure_function(context):
    ...
    try:
        # headers= (not header=) and a 5-second timeout so the call cannot hang
        r = requests.post('http://myerrorappli:7180/TeMIP_WS/services/OPERATION_CONTEXT-alarm_object',
                          headers=headers, data=alarmRequest, verify=True, timeout=5)
    except requests.Timeout:
        logging.error("request timeout")
    except requests.ConnectionError:
        logging.error("request connection error")
    logging.error("FAILED TASK")
    logging.error("============================================")

Failed when parsing body as json - calling the Rasa API with requests.post()

I'm writing a function to call the Rasa API for intent prediction. Here is my code:
import requests

def run_test():
    url = "http://localhost:5005/model/parse"
    obj = {"text": "What is your name?"}
    response = requests.post(url, data=obj)
    print(response.json())
I also start the Rasa server with this command: `rasa run -m models --enable-api --cors "*" --debug`
In the terminal where I executed run_test(), I got this result:
{'version': '2.7.1', 'status': 'failure', 'message': 'An unexpected error occurred. Error: Failed when parsing body as json', 'reason': 'ParsingError', 'details': {}, 'help': None, 'code': 500}
Can anybody help me solve this problem? Thank you very much!
I found the solution:
You only need to json.dumps() the object, because a Python dict is not the same thing as a JSON string.
import json
import requests

def run_test():
    url = "http://localhost:5005/model/parse"
    obj = {"text": "What is your name?"}
    response = requests.post(url, data=json.dumps(obj))
    print(response.json())
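Equivalently, requests can serialize the dict itself via the json= keyword argument, which also sets the Content-Type: application/json header; this is standard requests behavior, not Rasa-specific:

    response = requests.post(url, json=obj)  # same as data=json.dumps(obj), plus the JSON header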

Airflow: not receiving an error message in the email when the DAG/task fails with on_failure_callback

Airflow version: 1.10.3
Below is the module code that is called by on_failure_callback.
I have used reason = context.get("exception"), but when the job fails I get None in the email instead of the error message.
Output in the email:
Reason for Failure: None
alert_email.py
import logging

from airflow.utils.email import send_email
from airflow.models import Variable

logger = logging.getLogger(__name__)

def failure_alert(context, config=None):
    config = {} if config is None else config
    email = config.get('email_id')
    task_id = context.get('task_instance').task_id
    dag_id = context.get("dag").dag_id
    execution_time = context.get("execution_date")
    reason = context.get("exception")
    dag_failure_html_body = f"""<html>
    <header><title>The below DAG has failed!</title></header>
    <body>
    <b>DAG Name</b>: {dag_id}<br/>
    <b>Task Id</b>: {task_id}<br/>
    <b>Execution Time (UTC)</b>: {execution_time}<br/>
    <b>Reason for Failure</b>: {reason}<br/>
    </body>
    </html>
    """
    try:
        send_email(
            to=email,
            subject=f"Airflow alert: <DagInstance: {dag_id} - {execution_time} [failed]",
            html_content=dag_failure_html_body,
        )
    except Exception as e:
        logger.error(f'Error in sending email to address {email}: {e}', exc_info=True)
The issue turned out to be the Airflow version (1.10.3); we will be upgrading to Airflow 1.10.10.
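Until the upgrade, a defensive fallback (a hypothetical tweak, not from the original post) keeps the email informative even when the context key is missing:

    # Hypothetical fallback: report a placeholder when "exception" is absent
    reason = context.get("exception") or "unknown (exception not available in this Airflow version)"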
