Pass information between two SimpleHttpOperators with xcom_pull() - Airflow

I am fairly new to Airflow and I am currently trying to pass information between my SimpleHttpOperators.
This is where the data is retrieved:
request_city_information = SimpleHttpOperator(
    http_conn_id='overpass',
    task_id='basic_city_information',
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method='POST',
    data=f'[out:json]; node[name={name_city}][capital]; out center;',
    response_filter=lambda response: response.json()['elements'][0],
    dag=dag,
)
And then I want to use the response from this in the following operator:
request_city_attractions = SimpleHttpOperator(
    http_conn_id='overpass',
    task_id='city_attractions',
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method='POST',
    data=f"[out:json];(nwr[tourism='attraction'][wikidata](around:{search_radius},"
         f"{request_city_information.xcom_pull(context='ti')['lat']}"
         f",10););out body;>;out skel qt;",
    dag=dag,
)
As you can see I tried to access the response via request_city_information.xcom_pull(context='ti'). However, my context seems to be wrong here.
As my data is already written into XCom, I take it that I don't need xcom_push=True, as suggested here.
There seem to have been changes to XCom since Airflow 2.x, as many of the suggested solutions I found do not work for me.
I believe there is a major gap in my thought process, I just don't know where.
I would appreciate any references to examples or help!
Thanks in advance

I have now solved it with a completely different approach; if you know how the first one works, I would be happy for an explanation.
Here is my solution:
with DAG(
    'city_info',
    default_args=default_args,
    description='xcom test',
    schedule_interval=None,
) as dag:

    # TODO: Tasks with conn_id
    def get_city_information(**kwargs):
        payload = f'[out:json]; node[name={name_city}][capital]; out center;'
        # TODO: Make the request via a connection
        r = requests.post('https://overpass-api.de/api/interpreter', data=payload)
        ti = kwargs['ti']
        ti.xcom_push('basic_city_information', r.json())

    get_city_information_task = PythonOperator(
        task_id='get_city_information_task',
        python_callable=get_city_information
    )

    def get_city_attractions(**kwargs):
        ti = kwargs['ti']
        city_information = ti.xcom_pull(task_ids='get_city_information_task', key='basic_city_information')
        payload = f"[out:json];(nwr[tourism='attraction'][wikidata](around:{search_radius}" \
                  f",{city_information['elements'][0]['lat']},{city_information['elements'][0]['lon']}" \
                  f"););out body;>;out skel qt;"
        r = requests.post('https://overpass-api.de/api/interpreter', data=payload)
        # TODO: Parse the JSON into an object
        ti.xcom_push('city_attractions', r.json())

    get_city_attractions_task = PythonOperator(
        task_id='get_city_attractions_task',
        python_callable=get_city_attractions
    )

    get_city_information_task >> get_city_attractions_task

Related

Airflow - Druid Operator is not getting host

I have a question about the Druid Operator. I see that this test is successful, but I get this error:
File "/home/airflow/.local/lib/python3.7/site-packages/requests/sessions.py", line 792, in get_adapter
raise InvalidSchema(f"No connection adapters were found for {url!r}")
My DAG looks like this:
DRUID_CONN_ID = "druid_ingest_conn_id"

ingestion = DruidOperator(
    task_id='ingestion',
    druid_ingest_conn_id=DRUID_CONN_ID,
    json_index_file='ingestion.json'
)
I also tried changing the DAG, but I get the same error.
As another step, I changed the operator type as below, but then I get a different error:
ingestion_2 = SimpleHttpOperator(
    task_id='test_task',
    method='POST',
    http_conn_id=DRUID_CONN_ID,
    endpoint='/druid/indexer/v1/task',
    data=json.dumps(read_file),
    dag=dag,
    do_xcom_push=True,
    headers={
        'Content-Type': 'application/json'
    },
    response_check=lambda response: response.json()['Status'] == 200,
)
{"error":"Missing type id when trying to resolve subtype of [simple type, class org.apache.druid.indexing.common.task.Task]: missing type id property 'type'\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 1]"}
Finally, I tried giving an HTTP connection to the Druid Operator, but I get an error like this:
raise AirflowException(f'Did not get 200 when submitting the Druid job to {url}')
So I am confused and need some help. Thanks for any answers.
P.S.: We use Airflow version 2.3.3.

Why does a PythonOperator callable not need to accept parameters in Airflow?

I do not understand how callables (functions called as specified by PythonOperator) in Airflow should have their parameter list set. I have seen them with no parameters, with named params, or with **kwargs. I can always add "ti" or **allargs as parameters, it seems, and ti seems to be used for task instance info, or ds for execution date. But my callables apparently do not NEED params. They can simply be "def function():". If I wrote a regular Python function func() instead of func(**kwargs), it would fail at runtime when called unless no params were passed. Airflow always seems to pass ti, so how can the callable function signature not require it? Example below from a training site, where the _process_data func gets the ti, but _extract_bitcoin_price() does not. I was thinking that is because of the XCom push, but ti seems to ALWAYS be available, so how can "def somefunc()" ever work? I tried looking at the PythonOperator source code, but I am unclear how this works or what the best practices are for including parameters in a callable. Thanks!!
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime
import json
from typing import Dict
import requests
import logging

API = "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd&include_market_cap=true&include_24hr_vol=true&include_24hr_change=true&include_last_updated_at=true"

def _extract_bitcoin_price():
    return requests.get(API).json()['bitcoin']

def _process_data(ti):
    response = ti.xcom_pull(task_ids='extract_bitcoin_price')
    logging.info(response)
    processed_data = {'usd': response['usd'], 'change': response['usd_24h_change']}
    ti.xcom_push(key='processed_data', value=processed_data)

def _store_data(ti):
    data = ti.xcom_pull(task_ids='process_data', key='processed_data')
    logging.info(f"Store: {data['usd']} with change {data['change']}")

with DAG('classic_dag', schedule_interval='@daily', start_date=datetime(2021, 12, 1), catchup=False) as dag:

    extract_bitcoin_price = PythonOperator(
        task_id='extract_bitcoin_price',
        python_callable=_extract_bitcoin_price
    )

    process_data = PythonOperator(
        task_id='process_data',
        python_callable=_process_data
    )

    store_data = PythonOperator(
        task_id='store_data',
        python_callable=_store_data
    )

    extract_bitcoin_price >> process_data >> store_data
I tried callables with no params, somefunc(), expecting an error saying too many params were passed, but it succeeded. Adding somefunc(ti) also works! How can both work?
I think what you are missing is that Airflow can pass the context of the task to the Python callable (as you can see, one of them is ti). These are additional useful parameters that Airflow provides, and you can use them in your task.
In older Airflow versions the user had to set provide_context=True for that to work:
process_data = PythonOperator(
    ...,
    provide_context=True
)
Since Airflow>=2.0 there is no need to use provide_context. Airflow handles it under the hood.
When you see Python callable signatures like:
def func(ti, **kwargs):
    ...
This means that the ti is "unpacked" from the kwargs. You can also do:
def func(**kwargs):
    ti = kwargs['ti']
EDIT:
I think what you are missing is that while you write:
def func():
    ...

store_data = PythonOperator(
    task_id='task',
    python_callable=func
)
Airflow does more than just call func. The code being executed is the execute() function of PythonOperator, and this function calls the Python callable you provided with args and kwargs.
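Very roughly, the idea looks like this (a simplified sketch for illustration, not the actual Airflow source): the operator inspects the callable's signature and forwards only the context entries (ti, ds, ...) that the function actually accepts, which is why both def somefunc() and def somefunc(ti) work:

import inspect

def call_like_pythonoperator(python_callable, context, op_args=(), op_kwargs=None):
    # Illustration only: forward just the context keys the callable can accept.
    op_kwargs = dict(op_kwargs or {})
    sig = inspect.signature(python_callable)
    accepts_var_kwargs = any(
        p.kind == inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
    )
    if accepts_var_kwargs:
        # def func(**kwargs): receives the whole context
        op_kwargs.update(context)
    else:
        # def func(ti): receives only the names in its signature;
        # def func(): receives nothing extra
        op_kwargs.update({k: v for k, v in context.items() if k in sig.parameters})
    return python_callable(*op_args, **op_kwargs)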

How to get the reason for failure using Slack in Airflow 2.0

How can I get the reason for the failure of an operator without going into the logs? I want to post the reason as a notification through Slack.
Thanks,
Xi
I can think of a few ways of doing this, as below.
Set error notifications -> https://www.astronomer.io/guides/error-notifications-in-airflow/
Also create a Slack email alias for DMs: https://slack.com/help/articles/206819278-Send-emails-to-Slack
Another way is using the Slack API from Airflow: https://medium.com/datareply/integrating-slack-alerts-in-airflow-c9dcd155105
See the above for SlackAPIPostOperator.
exception=context.get('exception') is what gives the exact reason for the failure.
Example of an on_failure_callback using Slack:
step_checker = EmrStepSensor(
    task_id='watch_step',
    job_flow_id="{{ task_instance.xcom_pull('create_job_flow', key='return_value') }}",
    step_id="{{ task_instance.xcom_pull(task_ids='add_steps', key='return_value')[0] }}",
    aws_conn_id='aws_default',
    on_failure_callback=task_fail_slack_alert,
)

def task_fail_slack_alert(context):
    SLACK_CONN_ID = 'slack'
    slack_webhook_token = BaseHook.get_connection(SLACK_CONN_ID).password
    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Dag*: {dag}
        *Execution Time*: {exec_date}
        *Log Url*: {log_url}
        *Error*: {exception}
        """.format(
        task=context.get('task_instance').task_id,
        dag=context.get('task_instance').dag_id,
        exec_date=context.get('execution_date'),
        log_url=context.get('task_instance').log_url,
        exception=context.get('exception')
    )
    failed_alert = SlackWebhookOperator(
        task_id='slack_test',
        http_conn_id='slack',
        webhook_token=slack_webhook_token,
        message=slack_msg,
        username='airflow',
        dag=dag)
    return failed_alert.execute(context=context)

KubernetesPodOperator xcom_push key/values not available to subsequent task with xcom_pull

Here is an example of the KubernetesPodOperator I am trying --
set_tag = KubernetesPodOperator(
    namespace='default',
    task_id='set-tag',
    name='set-tag',
    image='ubuntu:18.04',
    xcom_push=True,
    cmds=["/bin/sh", "-c"],
    arguments=['''mkdir /airflow &&
                  mkdir /airflow/xcom &&
                  echo '{"test_key":"test_value"}' > /airflow/xcom/return.json
               ''']
)
In the next downstream PythonOperator, I am trying to fetch this tag as follows -
def print_tag(**kwargs):
    ti = kwargs['ti']
    print(ti.xcom_pull(task_ids='set-tag', key='test_key'))

get_tag = PythonOperator(
    task_id='get-tag',
    dag=dag,
    python_callable=print_tag,
    provide_context=True
)
I am using 'airflow test' to first run task 'set-tag' and then run 'get-tag' hoping to see the 'test_value' printed. But the printed value appears as 'None'.
Any pointers are much appreciated.
Thanks in advance.
At the moment, the name of the KubernetesPodOperator argument for XCom push is do_xcom_push, not xcom_push.
Source code
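A sketch of the fix based on that (plus an assumption on my part about how the operator's XCom sidecar works: the contents of /airflow/xcom/return.json are pushed as the task's return value, so they are pulled without a key rather than with key='test_key'):

set_tag = KubernetesPodOperator(
    namespace='default',
    task_id='set-tag',
    name='set-tag',
    image='ubuntu:18.04',
    do_xcom_push=True,  # was xcom_push=True
    cmds=["/bin/sh", "-c"],
    arguments=['''mkdir /airflow &&
                  mkdir /airflow/xcom &&
                  echo '{"test_key":"test_value"}' > /airflow/xcom/return.json
               ''']
)

def print_tag(**kwargs):
    ti = kwargs['ti']
    result = ti.xcom_pull(task_ids='set-tag')  # the whole return.json dict
    print(result['test_key'])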

How to integrate Apache Airflow with Slack?

Could someone please give me a step-by-step manual on how to connect Apache Airflow to a Slack workspace?
I created a webhook for my channel; what should I do with it next?
Kind regards
Create a Slack Token from
https://api.slack.com/custom-integrations/legacy-tokens
Use the SlackAPIPostOperator in your DAG as below:
SlackAPIPostOperator(
    task_id='failure',
    token='YOUR_TOKEN',
    text=text_message,
    channel=SLACK_CHANNEL,
    username=SLACK_USER
)
The above is the simplest way you can use Airflow to send messages to Slack.
However, if you want to configure Airflow to send messages to Slack on task failures, create a function and add on_failure_callback to your tasks with the name of the created slack function. An example is below:
def slack_failed_task(contextDictionary, **kwargs):
    failed_alert = SlackAPIPostOperator(
        task_id='slack_failed',
        channel="#datalabs",
        token="...",
        text=':red_circle: DAG Failed',
        owner='_owner',
    )
    return failed_alert.execute()

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
Using SlackWebHook (Works only for Airflow >= 1.10.0):
If you want to use SlackWebHook use SlackWebhookOperator in a similar manner:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/slack_webhook_operator.py#L25
Try the new SlackWebhookOperator, which is available in Airflow version >= 1.10.0:
from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator

slack_msg = "Hi Wssup?"

slack_test = SlackWebhookOperator(
    task_id='slack_test',
    http_conn_id='slack_connection',
    webhook_token='/1234/abcd',
    message=slack_msg,
    channel='#airflow_updates',
    username='airflow_' + os.environ['ENVIRONMENT'],
    icon_emoji=None,
    link_names=False,
    dag=dag)
Note: Make sure you have slack_connection added in your Airflow connections as
host=https://hooks.slack.com/services/
The full example with SlackWebhookOperator usage, as in @kaxil's answer:
def slack_failed_task(task_name):
    failed_alert = SlackWebhookOperator(
        task_id='slack_failed_alert',
        http_conn_id='slack_connection',
        webhook_token=Variable.get("slackWebhookToken", default_var=""),
        message='#here DAG Failed {}'.format(task_name),
        channel='#epm-marketing-dev',
        username='Airflow_{}'.format(ENVIRONMENT_SUFFIX),
        icon_emoji=':red_circle:',
        link_names=True,
    )
    return failed_alert.execute

task_with_failed_slack_alerts = PythonOperator(
    task_id='task0',
    python_callable=<file to execute>,
    on_failure_callback=slack_failed_task,
    provide_context=True,
    dag=dag)
As @Deep Nirmal noted: make sure you have slack_connection added in your Airflow connections as
host=https://hooks.slack.com/services/
