Unexpected Jinja Template Behaviour in Custom Airflow Operator

I have made a custom sensor in Airflow which inherits from BashSensor.
Sensor:
class MySensor(BashSensor):
    def __init__(self, time, **kwargs):  # {{ ts }} is passed as time in the DAG
        self.time = time
        cmd = f"java some-other-stuff {self.time}"  # rendered/correct value for self.time
        super().__init__(**kwargs, bash_command=cmd)

    def poke(self, context):
        status = super().poke(context)  # returns True or False
        if status:
            print(self.time)  # {{ ts }} is printed instead of the rendered value
        else:
            print("trying again")
        return status
When I look at the Rendered tab for the task in the DAG, I see that bash_command has the correct rendered value ({{ ts }} is passed as time).
The problem is that whenever poke is called and returns True, I see {{ ts }} in the print output instead of the rendered value.
I expect self.time to hold the rendered value (some timestamp), not {{ ts }}, when I print it in the poke function.

Neither cmd nor time is a templated field in your code, so the Jinja engine does not handle them. The reason you see the command being templated is that in the super call you do:
bash_command=cmd
and bash_command is a templated field of BashSensor.
So while the command is rendered to the correct string as expected, the individual components that created it do not contain the rendered value.
To explain in a bit more detail: time = "{{ ts }}" will always stay as this string; it will never be rendered.
When you do cmd = f"java some-other-stuff {self.time}" it becomes:
"java some-other-stuff {{ ts }}"
This string is assigned to bash_command, which is a templated field, so when the task is executed the value of {{ ts }} is rendered.
To solve your issue, you can simply add the parameter you want templated to the template_fields sequence:
class MySensor(BashSensor):
...
template_fields: Sequence[str] = tuple({'time'} | set(BashSensor.template_fields))
...
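For completeness, a minimal sketch of the whole sensor with time templated (a sketch only, assuming an Airflow 2 import path; the names are taken from your code):
from typing import Sequence

from airflow.sensors.bash import BashSensor  # assumed Airflow 2 import path


class MySensor(BashSensor):
    # time is now rendered by Jinja, alongside the inherited bash_command
    template_fields: Sequence[str] = tuple({'time'} | set(BashSensor.template_fields))

    def __init__(self, time, **kwargs):
        self.time = time
        super().__init__(**kwargs, bash_command=f"java some-other-stuff {time}")

    def poke(self, context):
        status = super().poke(context)
        if status:
            print(self.time)  # now prints the rendered timestamp
        else:
            print("trying again")
        return status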

Related

Problem pushing dict parameters from XCom to PapermillOperator in Airflow

I am trying to pass a parameter containing a dict from Airflow's xcom_pull to PapermillOperator like this:
send_to_jupyter_operator = PapermillOperator(
    task_id='send_to_jupyter',
    input_nb="./dags/notebooks/input_test.ipynb",
    output_nb="./dags/notebooks/{{ execution_date }}-result.ipynb",
    parameters={"table_list": "{{ ti.xcom_pull(dag_id='select_data_from_table', task_ids='select_data', key='table_result_dict') }}"},
)
The task with task_id='select_data' is a PythonOperator which pushes a dict to XCom.
ti.xcom_pull(dag_id='select_data_from_table', task_ids='select_data', key='table_result_dict') returns a dict of dicts (keys are dimension names; values are dicts whose keys are attribute names and whose values are lists of values).
But with this syntax the Jupyter notebook receives a string, not a dict, like:
table_list = "{'key1': {'attr1': []}}"
Are there any tips to solve this problem?
I have already tried to use:
parameters={"table_list": {{ ti.xcom_pull(dag_id='select_data_from_table', task_ids='select_data', key='table_result_dict') }} } - in this case Python doesn't know what 'ti' actually is.
parameters={"table_list": {{ context['ti'].xcom_pull(dag_id='select_data_from_table', task_ids='select_data', key='table_result_dict') }} } - in this case Python doesn't know what 'context' actually is.
I resolved the problem another way.
Just add this to your Jupyter notebook:
list = json.loads(input_list.replace("\'",'"').replace('None', 'null'))
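In context, the notebook cell could look roughly like this (a sketch only; it assumes the injected parameter is the table_list string from the operator call above and gives the result a name that does not shadow a builtin):
import json

table_list = "{}"  # placeholder; Papermill overwrites this with the rendered XCom string

# turn the stringified Python dict into valid JSON, then parse it back into a dict
table_dict = json.loads(table_list.replace("'", '"').replace('None', 'null'))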

How can I set execution date in HttpSensor (Airflow)?

I am trying to pass the execution date to the HttpSensor operator.
is_api_available = HttpSensor(
    task_id='is_api_available',
    http_conn_id='data_available',
    endpoint='api/3/action/date={{ I want to set the date here }}'
)
I can get the execution date parameter in a PythonOperator like this:
print("my start date : ", kwargs['execution_date'])
That works, but how can I get it in other operators?
Thanks in advance.
You can use the Jinja template variable {{ ds }}; it formats the execution date as YYYY-MM-DD.
For more macros, see https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html
is_api_available = HttpSensor(
    task_id='is_api_available',
    http_conn_id='data_available',
    endpoint='api/3/action/date={{ ds }}')
The endpoint then renders as, for example:
api/3/action/date=2022-06-25
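The same macros work in any parameter listed in an operator's template_fields, so the pattern carries over to other operators. For example, a small sketch with BashOperator (assuming the standard Airflow 2 import path, placed inside your DAG definition):
from airflow.operators.bash import BashOperator  # assumed Airflow 2 import path

print_date = BashOperator(
    task_id='print_date',
    # bash_command is a templated field, so the macros are rendered at runtime
    bash_command='echo "date: {{ ds }}, full timestamp: {{ ts }}"',
)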

Flask - Flask-Moment gives an error: AttributeError: 'str' object has no attribute 'strftime'

OK, so I want to change a datetime value from a MySQL database into a desired format. My code goes:
file: app.py
from flask import Flask, render_template, flash, redirect, url_for, session, request, logging
from datetime import datetime
from flask_moment import Moment
from flaskext.mysql import MySQL          # assumed: Flask-MySQL, which provides get_db()
from pymysql.cursors import DictCursor    # assumed: PyMySQL's DictCursor

app = Flask(__name__)
moment = Moment(app)
mysql = MySQL(cursorclass=DictCursor)

@app.route('/users')
def users():
    # create cursor
    cur = mysql.get_db().cursor()
    # execute query
    cur.execute('SELECT * FROM users')
    # get results
    users = cur.fetchall()
    return render_template('users.html', title='User Accounts', users=users)
Then on my users.html, I display the datetime value in a table:
file: users.html
<tbody>
  {% for row in users %}
  <tr>
    <td>{{ row.id }}</td>
    <td>{{ row.username }}</td>
    <td>{{ row.fullname }}</td>
    <td>{{ moment(row.created_at).format('LLL') }}</td>  {# this causes an error #}
  </tr>
  {% endfor %}
</tbody>
But when I put in the following code for the datetime, it works:
<td>{{ moment().format('LLL') }}</td>  {# this is working #}
So in short:
# this is not working
# causes an "AttributeError: 'str' object has no attribute 'strftime'" error
moment(row.created_at).format('LLL')

# this is working
# but I can't base the output on the value from the MySQL database
moment().format('LLL')

# by the way, this is also not working
# and it causes the same error
row.created_at.strftime('%M %d, %Y')
What I need to know is how to format the datetime value in the Flask template, and Flask-Moment seems to be the only way.
EDIT:
You can also try using:
{{ row.created_at.strftime('%m %d, %Y') }}
or a custom Jinja datetime filter (Jinja does not ship one by default):
{{ row.created_at|datetime }}
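A minimal sketch of registering such a filter on the Flask app (the names and the MySQL format string are assumptions and must match what your database actually returns):
from datetime import datetime

@app.template_filter('datetime')
def format_datetime(value, fmt='%m %d, %Y'):
    # parse MySQL datetime strings first, then format them for display
    if isinstance(value, str):
        value = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    return value.strftime(fmt)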
ORIGINAL ANSWER:
It appears you need to convert from MySQL DateTime to Python datetime. Without seeing more of your code, maybe something like this would work, though there is probably a more efficient way:
# get results
users = cur.fetchall()
for user in users:
    # with DictCursor the rows are dicts; the format string has to match what MySQL returns
    user['created_at'] = datetime.strptime(user['created_at'], '%Y-%m-%d %H:%M:%S')
return render_template('users.html', title='User Accounts', users=users)
You are using strftime, but Python is interpreting the MySQL datetime as a string, not a datetime object, so you first have to parse it with strptime.
Moment.js couldn't parse it because it is a str. Use datetime to parse the string before passing it to the moment() call, e.g.
from datetime import datetime
row_created_at = datetime.strptime(row_created_at, '%Y-%m-%d %H:%M:%S')  # format must match the MySQL string
Then pass it to the moment() call like so:
{{ moment(row_created_at).format("LLL") }}
That should do the trick.

How to render values from Xcom with MySqlToGoogleCloudStorageOperator

I have the following code:
import_orders_op = MySqlToGoogleCloudStorageOperator(
    task_id='import_orders',
    mysql_conn_id='mysql_con',
    google_cloud_storage_conn_id='gcp_con',
    sql='SELECT * FROM orders where orders_id>{0};'.format(LAST_IMPORTED_ORDER_ID),
    bucket=GCS_BUCKET_ID,
    filename=file_name,
    dag=dag)
I want to change the query to:
sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1};'.format(LAST_IMPORTED_ORDER_ID, ...)
The value for {1} is generated by an operator in the task before this one and is pushed with XCom.
How can I read the value here?
It should be something with xcom_pull, but what is the proper way to do it? Can I render this sql parameter inside the operator?
I tried to do this:
import_orders_op = MySqlToGoogleCloudStorageOperator(
    task_id='import_orders',
    mysql_conn_id='mysql_con',
    google_cloud_storage_conn_id='gcp_con',
    sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1}'.format(LAST_IMPORTED_ORDER_ID, {{ task_instance.xcom_pull(task_ids=['get_max_order_id'], key='result_status') }}),
    bucket=GCS_BUCKET_ID,
    filename=file_name,
    dag=dag)
It gives:
Broken DAG: name 'task_instance' is not defined
In your DAG file you aren't in an active DAG-run context, so there is no existing task instance to use the way you have written it.
You can only pull the value when the operator is running, not while you're setting it up (that latter context is executed in a loop by the scheduler and would run thousands of times a day, even if the DAG were weekly or disabled). But what you wrote is actually really close to something that would have worked, so maybe you had already considered this contextual point.
Let's write it as a template:
# YOUR EXAMPLE, FORMATTED A BIT MORE 80-COLS STYLE
…
sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1}'.format(
    LAST_IMPORTED_ORDER_ID,
    {{ task_instance.xcom_pull(
        task_ids=['get_max_order_id'], key='result_status') }}),
…

# SHOULD HAVE BEEN AT LEAST (I hope you can spot the difference):
…
sql='SELECT * FROM orders where orders_id>{0} and orders_id<{1}'.format(
    LAST_IMPORTED_ORDER_ID,
    "{{ task_instance.xcom_pull("
    "task_ids=['get_max_order_id'], key='result_status') }}"),
…

# AND COULD HAVE BEEN MORE CLEARLY READABLE AS:
…
sql='''
    SELECT *
    FROM orders
    WHERE orders_id > {{ params.last_imported_id }}
      AND orders_id < {{ ti.xcom_pull('get_max_order_id') }}
''',
params={'last_imported_id': LAST_IMPORTED_ORDER_ID},
…
And I know that you're populating LAST_IMPORTED_ORDER_ID from an Airflow Variable. You could skip doing that in the DAG file and instead change {{ params.last_imported_id }} to {{ var.value.last_imported_order_id }}, or whatever you named the Airflow Variable you were setting.
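Putting the last variant together, the operator call could look like this (a sketch only, reusing the names from your snippet):
import_orders_op = MySqlToGoogleCloudStorageOperator(
    task_id='import_orders',
    mysql_conn_id='mysql_con',
    google_cloud_storage_conn_id='gcp_con',
    # sql is a templated field, so the Jinja below is rendered at run time
    sql='''
        SELECT *
        FROM orders
        WHERE orders_id > {{ params.last_imported_id }}
          AND orders_id < {{ ti.xcom_pull('get_max_order_id') }}
    ''',
    params={'last_imported_id': LAST_IMPORTED_ORDER_ID},
    bucket=GCS_BUCKET_ID,
    filename=file_name,
    dag=dag)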

How can I get `ds` in SlackAPIPostOperator?

I want to run a Python script which needs a datetime param and post its output to Slack, but I don't know how to get the Airflow template variable ds.
Let's say I have the code below:
def make_txt():
    # get ds
    ds = get_ds()
    ds = ds * 3 + 4 / 5  # do something with ds
    return ds

slack_task = SlackAPIPostOperator(
    text=make_txt(),
    token='xoxp-xxxxxxx',
)
Because I will run with airflow backfill dag_id -s 2016-10-01, the ds (here 2016-10-01) should be passed into the Slack text.
I tried writing the Python script's output to a file, then reading it and passing it to the Slack text directly, but I don't think that's a perfect solution.
The text field of SlackAPIPostOperator is templated, so if you add {{ ds }} somewhere within the text, it will be substituted by Jinja when the task runs.
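For example, a small sketch (task_id, channel and the message text are illustrative placeholders; the token is the one from your snippet):
slack_task = SlackAPIPostOperator(
    task_id='post_to_slack',
    # {{ ds }} is rendered by Jinja into the execution date, e.g. 2016-10-01
    text="Report for {{ ds }}: the job finished successfully.",
    token='xoxp-xxxxxxx',
    channel='#reports',
)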
