Airflow Jinja Rendered Template

I've been able to successfully render Jinja templates using render_template, the method on BaseOperator.
My question is: what does it take to get the rendered strings to show up in the UI under the Rendered / Rendered Template tab?
I'm referring to the Rendered Template tab on the task instance page in the UI.
Any help or guidance here would be appreciated.

If you are using templated fields in an operator, the strings created from those templated fields will be shown there. For example, with a BashOperator:
example_task = BashOperator(
    task_id='task_example_task',
    bash_command='mycommand --date {{ task_instance.execution_date }}',
    dag=dag,
)
then the bash command gets parsed through the template engine (since it contains a Jinja expression), and you can later see the result of this parsing in the web UI tab you mentioned.
The fields must be templated, though. Which fields are templated can be seen in the operator's template_fields attribute. For BashOperator (see the code here: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/bash_operator.py) this is:
template_fields = ('bash_command', 'env')
Other fields in the BashOperator will not be parsed.
You can use macro commands (see here https://airflow.apache.org/code.html#macros) or information from xcom (see here https://airflow.apache.org/concepts.html?highlight=xcom#xcoms) in templated fields.
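If you write your own operator and want one of its arguments rendered (and therefore visible in the Rendered Template tab), you can list that argument in template_fields yourself. A minimal sketch for Airflow 2.x; the operator and field names here are made up for illustration:
from airflow.models import BaseOperator

class MyEchoOperator(BaseOperator):
    # Any attribute named here is rendered by Jinja before execute()
    # and shows up in the Rendered Template tab.
    template_fields = ('message',)

    def __init__(self, message, **kwargs):
        super().__init__(**kwargs)
        self.message = message

    def execute(self, context):
        # By the time execute() runs, self.message holds the rendered string.
        print(self.message)

echo_task = MyEchoOperator(
    task_id='echo_task',
    message='run date is {{ ds }}',
    dag=dag,
)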

Related

Vertex AI Airflow Operators don't render XCom pulls (specifically CreateBatchPredictionJobOperator)

I am trying to run a batch predict job task using the Vertex AI Airflow Operator CreateBatchPredictionJobOperator. This requires pulling a model id from XCom which was pushed by a previous custom container training job. However, CreateBatchPredictionJobOperator doesn't seem to render Xcom pulls as expected.
I am running Airflow 2.3.0 on my local machine.
My code looks something like this:
batch_job_task = CreateBatchPredictionJobOperator(
    gcp_conn_id="gcp_connection",
    task_id="batch_job_task",
    job_display_name=JOB_DISPLAY_NAME,
    model_name="{{ ti.xcom_pull(key='model_conf')['model_id'] }}",
    predictions_format="bigquery",
    bigquery_source=BIGQUERY_SOURCE,
    region=REGION,
    project_id=PROJECT_ID,
    machine_type="n1-standard-2",
    bigquery_destination_prefix=BIGQUERY_DESTINATION_PREFIX,
)
This results in a value error when the task runs:
ValueError: Resource {{ ti.xcom_pull(key='model_conf')['model_id'] }} is not a valid resource id.
The expected behaviour would be to pull that variable by key and render it as a string.
I can also confirm that I am able to see the model id (and other info) in XCom by navigating there in the UI. I attempted using the same syntax with xcom_pull with a PythonOperator and it works.
def print_xcom_value(value):
    print("VALUE:", value)

print_xcom_value_by_key = PythonOperator(
    task_id="print_xcom_value_by_key",
    python_callable=print_xcom_value,
    op_kwargs={"value": "{{ ti.xcom_pull(key='model_conf')['model_id'] }}"},
    provide_context=True,
)
> [2022-12-15, 13:11:19 UTC] {logging_mixin.py:115} INFO - VALUE: 3673414612827265024
CreateBatchPredictionJobOperator does not accept provide_context as an argument. I assumed it would render XCom pulls by default, since XCom pulls are used with CreateBatchPredictionJobOperator in an example in the Airflow docs (link here).
Is there any way I can provide context to this Vertex AI operator so it pulls from XCom storage?
Is something wrong with my syntax that I am not seeing? Is there anything I am misunderstanding in the docs?
UPDATE:
One thing that confuses me is that model_name is a templated field according to the Airflow docs (link here), but the field is not rendering the XCom template.
Did you set render_template_as_native_obj=True in your DAG definition?
What version of apache-airflow-providers-google do you use?
====
From OP:
Your answer was a step in the right direction.
The solution was to upgrade apache-airflow-providers-google to the latest version (at the moment, this is 8.6.0). I'm not able to pinpoint exactly where in the changelog this fix is mentioned.
Setting render_template_as_native_obj=True did not help with this issue, since it rendered the id pulled from XCom as an int, and I found no proper way to convert it to a str when it was passed to CreateBatchPredictionJobOperator in the model_name arg.
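For reference, upgrading the provider is just a pip install with a version floor; pinning against the 8.6.0 mentioned above is an assumption to check for your setup:
pip install --upgrade "apache-airflow-providers-google>=8.6.0"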

airflow configure mail template

I'm trying to create a mail template for my DAGs in Airflow. I'm struggling to find the documentation; I tried this but it's poor.
I'm trying to find out which variables I can use in my template. For example, for the subject I want to access the DAG name (not {{ ti }}, which contains other information). For the body, I want to choose which part gets sent: in my case, the real exception is shown as a warning in the Airflow log, so I want to send that instead of {{ exception_html }}.
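Not a definitive answer, but a sketch of the usual Airflow 2.x approach: point the [email] section of airflow.cfg at your own Jinja template files. When custom templates are configured, the task-instance context is available alongside the failure fields, so variables like {{ dag.dag_id }}, {{ task_instance.task_id }} and {{ exception }} should be usable. The file paths below and the exact set of available variables are assumptions to verify against your Airflow version:
# airflow.cfg (paths are hypothetical)
[email]
subject_template = /opt/airflow/config/email_subject.j2
html_content_template = /opt/airflow/config/email_body.j2

# email_subject.j2
DAG {{ dag.dag_id }} failed on task {{ task_instance.task_id }}

# email_body.j2
<p>Task {{ task_instance.task_id }} (try {{ try_number }} of {{ max_tries }}) failed.</p>
<p>Exception: {{ exception }}</p>
<p>Log: <a href="{{ task_instance.log_url }}">{{ task_instance.log_url }}</a></p>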

Can Airflow Macros be used with the CloudSqlInstanceExportOperator?

We're using Airflow to schedule daily database exports using the CloudSqlInstanceExportOperator. This doesn't appear to work with Airflow Macros. We are trying to export 1 day of data using the execution date macro or {{ ds }} in the where clause. It's important to use the macro because we want our DAG to backfill.
The sample code is made up of two parts. First we define the export context:
export_body = {
    "exportContext": {
        "fileType": "csv",
        "uri": "gs://" + GCP_BUCKET + "/data.csv",
        "databases": ["database"],
        "csvExportOptions": {
            "selectQuery": """
                select * from table
                where datetime BETWEEN "{{ ds }} 00:00:00"
                AND "{{ ds }} 23:59:59"
            """
        }
    }
}
Next, pass the export context to the task:
cloudsql_export_task = CloudSqlInstanceExportOperator(
    project_id=PROJECT_ID,
    body=export_body,
    instance='instance',
    task_id='cloudsql_export_task',
    dag=dag)
The task runs and gets marked as a success; however, the Google Cloud Storage file created has no data in it. When we hard-code the date, the query works as expected. As a result, we know the problem is caused by the macro value not being populated.
Any suggestions would be appreciated, either how to fix this task or an alternative way to achieve the same objective (note: the query is large and uses too much memory for MySqlToGoogleCloudStorageOperator to work).
Make sure the operator includes body in template_fields.
You can also use Jinja templating with nested fields, as long as these nested fields are marked as templated in the structure they belong to: fields registered in the template_fields property will be submitted to template substitution.
More info about templating: https://airflow.readthedocs.io/en/stable/concepts.html#jinja-templating
You can extend the operator like the following:
class CloudSqlInstanceExportTemplatedOperator(CloudSqlInstanceExportOperator):
    template_fields = CloudSqlInstanceExportOperator.template_fields + ('body',)
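With that subclass in place, the task from the question can be created the same way, and the {{ ds }} macros inside body will now be rendered (a sketch reusing the question's names):
cloudsql_export_task = CloudSqlInstanceExportTemplatedOperator(
    project_id=PROJECT_ID,
    body=export_body,
    instance='instance',
    task_id='cloudsql_export_task',
    dag=dag)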
shankshera's answer is correct; however, you are using a deprecated operator. In the updated version there is no need for the suggested modification.
The CloudSqlInstanceExportOperator was renamed to CloudSQLExportInstanceOperator and moved to providers.
For Airflow <2.0 you will need to install the backport providers:
pip install apache-airflow-backport-providers-google
For Airflow >=2.0 you will need to install providers:
pip install apache-airflow-providers-google
Then you can import the operator as:
from airflow.providers.google.cloud.operators.cloud_sql import CloudSQLExportInstanceOperator
Since the operator already has body listed in its template_fields, you are good to go.
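For example, with the same export_body defined in the question, the task would look like this (a sketch, not tested):
cloudsql_export_task = CloudSQLExportInstanceOperator(
    project_id=PROJECT_ID,
    body=export_body,
    instance='instance',
    task_id='cloudsql_export_task',
    dag=dag)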

dynamic task id names in Airflow

I have a DAG with one DataflowTemplateOperator that can deal with different json files. When I trigger the DAG I pass some parameters via {{ dag_run.conf['param1'] }} and it works fine.
The issue I have is trying to rename the task_id based on param1.
i.e. task_id="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
it complains that only alphanumeric characters are allowed,
or
task_id="df_operator_read_object_json_file_{}".format(dag_run.conf['param1']),
it does not recognise dag_run, in addition to the alphanumeric issue.
The whole idea behind this is that when I see at the dataflow jobs console and job has failed I know who the offender is based on param1. Dataflow Job names are based on task_id like this:
df-operator-read-object-json-file-8b9eecec
and what I need is this:
df-operator-read-object-param1-json-file-8b9eecec
Any ideas if this is possible?
There is no need to generate a new operator per file.
DataflowTemplatedJobStartOperator has job_name parameter which is also templated so can be used with Jinja.
I didn't test it but this should work:
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator
op = DataflowTemplatedJobStartOperator(
    task_id="df_operator_read_object_json_file",
    job_name="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
    template='gs://dataflow-templates/your_template',
    location='europe-west3',
)

How to pass a value from Xcom to another operator?

DockerOperator has a parameter xcom_push which, when set, pushes the output of the Docker container to XCom:
t1 = DockerOperator(
    task_id='run-hello-world-container',
    image='hello-world',
    xcom_push=True,
    xcom_all=True,
    dag=dag)
In the admin interface under Xcom, I can see these values with key return_value. However, how can I access them in the DAG?
If I try:
t1_email_output = EmailOperator(
    task_id='t1_email_output',
    to='user@example.com',
    subject='Airflow sent you an email!',
    html_content={{ ti.xcom_pull(task_ids='return_value') }},
    dag=dag)
I get Broken DAG: [PATH] name 'ti' is not defined.
If I try:
t1_email_output = EmailOperator(
    task_id='t1_email_output',
    to='user@example.com',
    subject='Airflow sent you an email!',
    html_content=t1.xcom_pull(task_ids='return_value'),
    dag=dag)
I get Broken DAG: [PATH] xcom_pull() missing 1 required positional argument: 'context'.
You need to pass the task id from which you are pulling the XCom, not the variable name.
In your example it would be
{{ ti.xcom_pull('run-hello-world-container') }}
Also, in the second snippet it should be ti instead of t1:
html_content=ti.xcom_pull('run-hello-world-container'),
I found the problem: it turns out I was missing quotes and my parameter was also wrong:
t1_email_output = EmailOperator(
    task_id='t1_email_output',
    to='user@example.com',
    subject='Airflow sent you an email!',
    html_content="{{ ti.xcom_pull(key='return_value') }}",
    dag=dag)
This sends an email with the Docker container's output, as I expect.
I think what is happening is that the {{ }} syntax gets processed as a Jinja template by Airflow when the DAG is run, but not when it is loaded. So if I don't put quotes around it, Airflow gets Python exceptions when it tries to detect and load the DAG, because the template hasn't been rendered yet. But if the quotes are added, the templated expression is treated as a string and ignored by the Python interpreter while the DAG is being loaded. When the EmailOperator is actually triggered during a DAG run, the template is rendered into actual references to the relevant data.
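For completeness, combining both fixes (quoting the template and pulling by task id rather than relying on the key alone) would look something like this sketch:
t1_email_output = EmailOperator(
    task_id='t1_email_output',
    to='user@example.com',
    subject='Airflow sent you an email!',
    html_content="{{ ti.xcom_pull(task_ids='run-hello-world-container') }}",
    dag=dag)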
