How to integrate Apache Airflow with Slack?

Could someone please give me a step-by-step guide on how to connect Apache Airflow to a Slack workspace?
I created a webhook for my channel; what should I do with it next?
Kind regards

Create a Slack Token from
https://api.slack.com/custom-integrations/legacy-tokens
Use the SlackAPIPostOperator in your DAG as below:
SlackAPIPostOperator(
    task_id='failure',
    token='YOUR_TOKEN',
    text=text_message,
    channel=SLACK_CHANNEL,
    username=SLACK_USER)
The above is the simplest way you can use Airflow to send messages to Slack.
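A side note on the token: rather than hardcoding it in the DAG file, you can keep it in an Airflow Variable and read it when the DAG is parsed. A minimal sketch, assuming an Airflow 1.x import path and a Variable named slack_token that you create yourself under Admin -> Variables:

from airflow.models import Variable
from airflow.operators.slack_operator import SlackAPIPostOperator

# "slack_token" is an example Variable name, not something Airflow creates for you
slack_token = Variable.get("slack_token")

notify_slack = SlackAPIPostOperator(
    task_id='notify_slack',
    token=slack_token,
    text=':white_check_mark: DAG succeeded',
    channel='#datalabs',
    username='airflow')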
However, if you want to configure Airflow to send messages to Slack on task failures, create a function and add on_failure_callback to your tasks with the name of the created slack function. An example is below:
def slack_failed_task(context):
    failed_alert = SlackAPIPostOperator(
        task_id='slack_failed',
        channel="#datalabs",
        token="...",
        text=':red_circle: DAG Failed',
        owner='_owner')
    return failed_alert.execute()

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
Using a Slack webhook (works only for Airflow >= 1.10.0):
If you want to use a Slack webhook, use the SlackWebhookOperator in a similar manner:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/slack_webhook_operator.py#L25

Try the new SlackWebhookOperator, which is available in Airflow version >= 1.10.0:
import os
from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator

slack_msg = "Hi Wssup?"

slack_test = SlackWebhookOperator(
    task_id='slack_test',
    http_conn_id='slack_connection',
    webhook_token='/1234/abcd',
    message=slack_msg,
    channel='#airflow_updates',
    username='airflow_' + os.environ['ENVIRONMENT'],
    icon_emoji=None,
    link_names=False,
    dag=dag)
Note: Make sure you have a slack_connection entry added in your Airflow connections, with
host=https://hooks.slack.com/services/
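If you prefer the CLI to the Admin UI, the connection can be created with something like the following (Airflow 1.10 flag style; the webhook token is a placeholder, and Airflow 2 uses "airflow connections add" instead, so check airflow connections --help for your version):

airflow connections --add \
    --conn_id slack_connection \
    --conn_type http \
    --conn_host https://hooks.slack.com/services/ \
    --conn_password <your/webhook/token>

Storing the token part in the connection password keeps it out of your DAG code; one of the answers further down reads it back with BaseHook.get_connection(...).password.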

The full example with SlackWebhookOperator usage, as in @kaxil's answer:
def slack_failed_task(context):
    task_name = context.get('task_instance').task_id
    failed_alert = SlackWebhookOperator(
        task_id='slack_failed_alert',
        http_conn_id='slack_connection',
        webhook_token=Variable.get("slackWebhookToken", default_var=""),
        message='@here DAG Failed {}'.format(task_name),
        channel='#epm-marketing-dev',
        username='Airflow_{}'.format(ENVIRONMENT_SUFFIX),
        icon_emoji=':red_circle:',
        link_names=True,
    )
    return failed_alert.execute(context=context)
task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
As @Deep Nirmal noted: make sure you have slack_connection added in your Airflow connections, with
host=https://hooks.slack.com/services/

Related

Why does a PythonOperator callable not need to accept parameters in Airflow?

I do not understand how callables (functions called as specified by PythonOperator) in Airflow should have their parameter list set. I have seen them with no parameters, with named params, or with **kwargs. It seems I can always add "ti" or **allargs as parameters, and ti seems to be used for task instance info, or ds for the execution date. But my callables apparently do not NEED params: they can simply be "def function():". If I wrote a regular Python function func() instead of func(**kwargs), it would fail at runtime when called unless no params were passed. Airflow always seems to pass ti, so how can the callable's function signature not require it?

The example below is from a training site where the _process_data func gets ti, but _extract_bitcoin_price() does not. I was thinking that is because of the XCom push, but ti seems to ALWAYS be available, so how can "def somefunc()" ever work? I tried looking at the PythonOperator source code, but I am unclear how this works or what the best practices are for including parameters in a callable. Thanks!
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime
import json
from typing import Dict
import requests
import logging

API = "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd&include_market_cap=true&include_24hr_vol=true&include_24hr_change=true&include_last_updated_at=true"

def _extract_bitcoin_price():
    return requests.get(API).json()['bitcoin']

def _process_data(ti):
    response = ti.xcom_pull(task_ids='extract_bitcoin_price')
    logging.info(response)
    processed_data = {'usd': response['usd'], 'change': response['usd_24h_change']}
    ti.xcom_push(key='processed_data', value=processed_data)

def _store_data(ti):
    data = ti.xcom_pull(task_ids='process_data', key='processed_data')
    logging.info(f"Store: {data['usd']} with change {data['change']}")

with DAG('classic_dag', schedule_interval='@daily', start_date=datetime(2021, 12, 1), catchup=False) as dag:
    extract_bitcoin_price = PythonOperator(
        task_id='extract_bitcoin_price',
        python_callable=_extract_bitcoin_price
    )
    process_data = PythonOperator(
        task_id='process_data',
        python_callable=_process_data
    )
    store_data = PythonOperator(
        task_id='store_data',
        python_callable=_store_data
    )
    extract_bitcoin_price >> process_data >> store_data
I tried callables with no params, somefunc(), expecting an error saying too many params were passed, but it succeeded. Adding somefunc(ti) also works! How can both work?
I think what you are missing is that Airflow can pass the context of the task to the Python callable (as you can see, one of the context entries is ti). These are additional useful parameters that Airflow provides, and you can use them in your task.
In older Airflow versions the user had to set provide_context=True for that to work:
process_data = PythonOperator(
    ...,
    provide_context=True
)
Since Airflow >= 2.0 there is no need to use provide_context; Airflow handles it under the hood.
When you see Python callable signatures like:
def func(ti, **kwargs):
    ...
This means that the ti is "unpacked" from the kwargs. You can also do:
def func(**kwargs):
    ti = kwargs['ti']
EDIT:
I think what you are missing is that while you write:
def func():
    ...

store_data = PythonOperator(
    task_id='task',
    python_callable=func
)
Airflow does more than just call func. The code being executed is the execute() function of PythonOperator, and that function calls the Python callable you provided with args and kwargs.
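Very roughly, the behaviour can be pictured like the sketch below. This is a simplified illustration of the idea only, not the actual PythonOperator source; in real Airflow 2 the filtering is done by a helper that inspects your callable's signature:

import inspect

def execute_simplified(python_callable, context):
    # Build the kwargs for the callable from the task context,
    # keeping only what the signature can actually accept.
    sig = inspect.signature(python_callable)
    accepts_var_kwargs = any(
        p.kind == inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
    )
    if accepts_var_kwargs:
        kwargs = dict(context)            # func(**kwargs) receives everything
    else:
        kwargs = {k: v for k, v in context.items() if k in sig.parameters}
    return python_callable(**kwargs)      # func() is simply called with nothing

That is why both def somefunc() and def somefunc(ti) work: the arguments are fitted to whatever signature you declared.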

How to get the reason for failure using Slack in Airflow 2.0

How can I get the reason for the failure of an operator without going into the logs? I want to post the reason as a notification through Slack.
Thanks,
Xi
I can think of one way of doing this, as below:
Set up error notifications -> https://www.astronomer.io/guides/error-notifications-in-airflow/
Also create a Slack email alias for DMs: https://slack.com/help/articles/206819278-Send-emails-to-Slack
The other way is using the Slack API from Airflow: https://medium.com/datareply/integrating-slack-alerts-in-airflow-c9dcd155105
Check the above for SlackAPIPostOperator.
context.get('exception') is what gives the exact reason for the failure.
Example of an on_failure_callback using Slack (the callback function has to be defined before the operator that references it):

def task_fail_slack_alert(context):
    SLACK_CONN_ID = 'slack'
    slack_webhook_token = BaseHook.get_connection(SLACK_CONN_ID).password
    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Dag*: {dag}
        *Execution Time*: {exec_date}
        *Log Url*: {log_url}
        *Error*: {exception}
        """.format(
        task=context.get('task_instance').task_id,
        dag=context.get('task_instance').dag_id,
        exec_date=context.get('execution_date'),
        log_url=context.get('task_instance').log_url,
        exception=context.get('exception')
    )
    failed_alert = SlackWebhookOperator(
        task_id='slack_test',
        http_conn_id='slack',
        webhook_token=slack_webhook_token,
        message=slack_msg,
        username='airflow',
        dag=dag)
    return failed_alert.execute(context=context)

step_checker = EmrStepSensor(
    task_id='watch_step',
    job_flow_id="{{ task_instance.xcom_pull('create_job_flow', key='return_value') }}",
    step_id="{{ task_instance.xcom_pull(task_ids='add_steps', key='return_value')[0] }}",
    aws_conn_id='aws_default',
    on_failure_callback=task_fail_slack_alert)
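If you want the same alert on every task in the DAG rather than on a single sensor, a common pattern is to attach the callback through default_args, so every operator inherits it. A short sketch, reusing the task_fail_slack_alert function above (the DAG id and dates are placeholders):

from datetime import datetime
from airflow import DAG

default_args = {
    'owner': 'airflow',
    # every task created with these default_args inherits the callback
    'on_failure_callback': task_fail_slack_alert,
}

dag = DAG(
    dag_id='emr_pipeline',
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily')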

How to run an external DAG as part of my DAG?

I'm new to Airflow and I'm trying to run an external DAG (developed and owned by another team) as part of my DAG flow.
I was looking at SubDagOperator, but it seems that it enforces the name of the subdag to be {parent_dag_id}.{this_task_id}, which I cannot do as the child DAG is owned by a different team.
here is my code sample:
parent_dag = DAG(
    dag_id='parent_dag', default_args=args,
    schedule_interval=None)

external_dag = SubDagOperator(
    subdag=another_teams_dag,
    task_id='external_dag',
    dag=parent_dag,
    trigger_rule=TriggerRule.ALL_DONE
)
and the other team's dag is defined like this:
another_teams_dag = DAG(
    dag_id='another_teams_dag', default_args=args,
    schedule_interval=None)
but I'm getting this error:
The subdag's dag_id should have the form
'{parent_dag_id}.{this_task_id}'. Expected 'parent_dag.external_dag';
received 'another_teams_dag'.
Any ideas?
What am I missing?
Use TriggerDagRunOperator
More info: https://airflow.apache.org/code.html#airflow.operators.dagrun_operator.TriggerDagRunOperator
Example:
Dag that triggers: https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/example_trigger_controller_dag.py
Dag that is triggered: https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/example_trigger_target_dag.py
For your case, you can use something like:
trigger = TriggerDagRunOperator(
    task_id='external_dag',
    trigger_dag_id="another_teams_dag",
    dag=dag)
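Note that TriggerDagRunOperator only fires the other DAG; in older Airflow versions it does not wait for that run to finish. If you are on Airflow 2.x and your downstream tasks need the other team's DAG to complete first, a sketch like the following should work (the import path and the wait_for_completion/poke_interval arguments assume Airflow 2):

from airflow.operators.trigger_dagrun import TriggerDagRunOperator

trigger = TriggerDagRunOperator(
    task_id='external_dag',
    trigger_dag_id='another_teams_dag',
    wait_for_completion=True,   # block until the triggered run reaches a terminal state
    poke_interval=60,           # seconds between checks on the triggered run
    dag=dag)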

How to setup Nagios Alerts for Apache Airflow Dags

Is it possible to set up Nagios alerts for Airflow DAGs?
In case a DAG fails, I need to alert the respective groups.
You can add an "on_failure_callback" to any task which will call an arbitrary failure handling function. In that function you can then send an error call to Nagios.
For example:
import logging
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(dag_id="failure_handling",
          schedule_interval='@daily')

def handle_failure(context):
    # first get useful fields to send to nagios/elsewhere
    dag_id = context['dag'].dag_id
    ds = context['ds']
    task_id = context['ti'].task_id
    # instead of printing these out - you can send these to somewhere else
    logging.info("dag_id={}, ds={}, task_id={}".format(dag_id, ds, task_id))

def task_that_fails(**kwargs):
    raise Exception("failing test")

task_to_fail = PythonOperator(
    task_id='task_to_fail',
    python_callable=task_that_fails,
    provide_context=True,
    on_failure_callback=handle_failure,
    dag=dag)
If you run a test on this:
airflow test failure_handling task_to_fail 2018-08-10
You get the following in your log output:
INFO - dag_id=failure_handling, ds=2018-08-10, task_id=task_to_fail
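To actually get the failure into Nagios, handle_failure could, for example, submit a passive check result. A rough sketch, assuming a local Nagios instance whose external command file lives at /usr/local/nagios/var/rw/nagios.cmd and an already-defined host/service pair (all names and paths here are placeholders for your own setup):

import time

def send_nagios_passive_check(host, service, return_code, output,
                              cmd_file='/usr/local/nagios/var/rw/nagios.cmd'):
    # Standard Nagios external command format:
    # [timestamp] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<return_code>;<output>
    line = '[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n'.format(
        int(time.time()), host, service, return_code, output)
    with open(cmd_file, 'w') as f:
        f.write(line)

# inside handle_failure(context) you could then call:
# send_nagios_passive_check('airflow-host', 'failure_handling', 2,
#                           'task {} failed'.format(context['ti'].task_id))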

apache airflow - Cannot load the dag bag to handle failure

I have created an on_failure_callback function (referring to Airflow's default on_failure_callback) to handle task failures.
It works well when there is only one task in a DAG. However, if there are two or more tasks, a task randomly fails since the operator is null; it can be resumed manually later. In airflow-scheduler.out the log is:
[2018-05-08 14:24:21,237] {models.py:1595} ERROR - Executor reports
task instance %s finished (%s) although the task says its %s. Was the
task killed externally? NoneType [2018-05-08 14:24:21,238]
{jobs.py:1435} ERROR - Cannot load the dag bag to handle failure for
. Setting task to FAILED without
callbacks or retries. Do you have enough resources?
The DAG code is:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta
import airflow
from devops.util import WechatUtil
from devops.util import JiraUtil

def on_failure_callback(context):
    ti = context['task_instance']
    log_url = ti.log_url
    owner = ti.task.owner
    ti_str = str(context['task_instance'])
    wechat_msg = "%s - Owner:%s" % (ti_str, owner)
    WeChatUtil.notify_team(wechat_msg)
    jira_desc = "Please check log from url %s" % (log_url)
    JiraUtil.create_incident("DW", ti_str, jira_desc, owner)

args = {
    'queue': 'default',
    'start_date': airflow.utils.dates.days_ago(1),
    'retry_delay': timedelta(minutes=1),
    'on_failure_callback': on_failure_callback,
    'owner': 'user1',
}

dag = DAG(dag_id='test_dependence1', default_args=args, schedule_interval='10 16 * * *')

load_crm_goods = BashOperator(
    task_id='crm_goods_job',
    bash_command='date',
    dag=dag)

load_crm_memeber = BashOperator(
    task_id='crm_member_job',
    bash_command='date',
    dag=dag)

load_crm_order = BashOperator(
    task_id='crm_order_job',
    bash_command='date',
    dag=dag)

load_crm_eur_invt = BashOperator(
    task_id='crm_eur_invt_job',
    bash_command='date',
    dag=dag)

crm_member_cohort_analysis = BashOperator(
    task_id='crm_member_cohort_analysis_job',
    bash_command='date',
    dag=dag)

crm_member_cohort_analysis.set_upstream(load_crm_goods)
crm_member_cohort_analysis.set_upstream(load_crm_memeber)
crm_member_cohort_analysis.set_upstream(load_crm_order)
crm_member_cohort_analysis.set_upstream(load_crm_eur_invt)

crm_member_kpi_daily = BashOperator(
    task_id='crm_member_kpi_daily_job',
    bash_command='date',
    dag=dag)

crm_member_kpi_daily.set_upstream(crm_member_cohort_analysis)
I had tried updating airflow.cfg by increasing the default memory from 512 even to 4096, but no luck. Would anyone have any advice?
I also tried updating my JiraUtil and WechatUtil as follows, encountering the same error.
WechatUtil:
import requests

class WechatUtil:
    @staticmethod
    def notify_trendy_user(user_ldap_id, message):
        return None

    @staticmethod
    def notify_bigdata_team(message):
        return None
JiraUtil:
import json
import requests

class JiraUtil:
    @staticmethod
    def execute_jql(jql):
        return None

    @staticmethod
    def create_incident(projectKey, summary, desc, assignee=None):
        return None
(I'm shooting tracer bullets a bit here, so bear with me if this answer doesn't get it right on the first try.)
The null operator issue with multiple task instances is weird... it would help with troubleshooting if you could boil the current code down to an MCVE, e.g., 1–2 operators, excluding the JiraUtil and WechatUtil parts if they're not related to the callback failure.
Here are 2 ideas:
1. Can you try changing the line that fetches the task instance out of the context to see if this makes a difference?
Before:
def on_failure_callback(context):
    ti = context['task_instance']
    ...
After:
def on_failure_callback(context):
    ti = context['ti']
    ...
I saw this usage in the Airflow repo (https://github.com/apache/incubator-airflow/blob/c1d583f91a0b4185f760a64acbeae86739479cdb/airflow/contrib/hooks/qubole_check_hook.py#L88). It's possible it can be accessed both ways.
2. Can you try adding provide_context=True on the operators, either as a kwarg or in default_args? For example, see the sketch below.
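A quick sketch of both options (provide_context is a PythonOperator argument, so the kwarg form applies to Python tasks; the task and callable names below are placeholders):

# as a kwarg on the operator itself
my_task = PythonOperator(
    task_id='my_task',
    python_callable=my_callable,
    provide_context=True,
    on_failure_callback=on_failure_callback,
    dag=dag)

# or once for the whole DAG via default_args
args = {
    'owner': 'user1',
    'on_failure_callback': on_failure_callback,
    'provide_context': True,
}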
