How to get xcom as a PostgresOperator parameter? - airflow

I created an XCom and I would like to use the result as a PostgresOperator parameter. I tried this:
my_task = PostgresOperator(
    task_id='my_task',
    postgres_conn_id=config.get(env, 'redshift_conn'),
    sql="my_task.sql",
    params={
        'my_parameter': {{ int(ti.xcom_pull(task_ids='previous_task')) }}
    },
    dag=dag
)

You need to use templating when accessing XCom values within an operator.
my_task = PostgresOperator(
    task_id='my_task',
    postgres_conn_id=config.get(env, 'redshift_conn'),
    sql="my_task.sql",
    params={
        'my_parameter': "{{ ti.xcom_pull(task_ids='previous_task') }}"
    },
    dag=dag
)
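As a side note (not part of the original answer): since the sql field of PostgresOperator is itself templated, the XCom value can also be pulled directly in the query or in my_task.sql. A minimal sketch with an inline query; the table and column names are placeholders:

my_task = PostgresOperator(
    task_id='my_task',
    postgres_conn_id=config.get(env, 'redshift_conn'),
    # sql is rendered with Jinja, so the XCom value is substituted
    # into the query when the task runs
    sql="SELECT * FROM some_table WHERE some_id = {{ ti.xcom_pull(task_ids='previous_task') }}",
    dag=dag
)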

Related

How to set airflow `http_conn_id` with a param?

Running Airflow 2.2.2
I would like to parametrize the http_conn_id using the DAG input parameters as such:
with DAG(params={'api': 'my-api-id'}) as dag:
    post_op = SimpleHttpOperator(
        task_id='post_op',
        endpoint='custom-end-point',
        http_conn_id='{{ params.api }}',  # <- this doesn't get filled correctly
        dag=dag)
Where my-api-id is set in the Airflow Connections.
However, when executing, the operator evaluates http_conn_id as '{{ params.api }}'.
I'm suspecting this is not possible - or is an anti-pattern?
Airflow operators do not render all of their fields; they render only the fields listed in the template_fields attribute. For SimpleHttpOperator, you have only these fields:
template_fields: Sequence[str] = (
    'endpoint',
    'data',
    'headers',
)
To get around the problem, you can create a new class which extends the official operator and just adds the extra fields you want to render:
from datetime import datetime
from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

class MyHttpOperator(SimpleHttpOperator):
    template_fields = (
        *SimpleHttpOperator.template_fields,
        'http_conn_id',
    )

with DAG(
    dag_id='http_dag',
    start_date=datetime.today(),
    params={'api': 'my-api-id'}
) as dag:
    post_op = MyHttpOperator(
        task_id='post_op',
        endpoint='custom-end-point',
        http_conn_id='{{ params.api }}',
        dag=dag
    )
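Note that the template is only resolved when the task actually runs, so the rendered value ('my-api-id' in this example) must exist as a Connection id at execution time.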

Airflow 2.1.0 passing variable to another DAG using TriggerDagRunOperator

We're using Airflow 2.1.0 and want to trigger a DAG and pass a variable to it (an S3 file name) using TriggerDagRunOperator.
I've found examples of this and can pass a static JSON to the next DAG using conf:
@task()
def trigger_target_dag_task(context):
    TriggerDagRunOperator(
        task_id="trigger_target_dag",
        trigger_dag_id="target_dag",
        conf={"file_name": "test.txt"}
    ).execute(context)
However, I cannot find current examples where the conf is dynamically created without using python_callable - this seems close:
Airflow 2.0.0+ - Pass a Dynamically Generated Dictionary to DAG Triggered by TriggerDagRunOperator
https://github.com/apache/airflow/pull/6317#issuecomment-859556243
Is this possible?
Updated question:
This method did not work when I used:
@task()
def trigger_dag_task(context):
    TriggerDagRunOperator(
        task_id="trigger_dag_task",
        trigger_dag_id="target_dag",
        conf={"payload": "{{ ti.xcom_pull(task_ids='extract_rss') }}"},
    ).execute(context)
The target_dag received the conf as a string:
{logging_mixin.py:104} INFO - Remotely received value of {{ ti.xcom_pull(task_ids='extract_rss') }}
Conf is a templated field, so you can use Jinja to pass in any variable. Consider this example, based on the official TriggerDagRunOperator example.
If the variable (object_name) is within your scope, you could do:
Controller DAG:
dag = DAG(
    dag_id="example_trigger_controller_dag",
    default_args={"owner": "airflow"},
    start_date=days_ago(2),
    schedule_interval="@once",
    tags=['example'],
)

object_name = "my-object-s3-aws"

trigger = TriggerDagRunOperator(
    task_id="test_trigger_dagrun",
    trigger_dag_id="example_trigger_target_dag",
    conf={"s3_object": object_name},
    dag=dag,
)
Target DAG:
dag = DAG(
    dag_id="example_trigger_target_dag",
    default_args={"owner": "airflow"},
    start_date=days_ago(2),
    schedule_interval=None,
    tags=['example'],
)

def run_this_func(**context):
    print("Remotely received value of {} for key=message".format(
        context["dag_run"].conf["s3_object"]))

run_this = PythonOperator(
    task_id="run_this", python_callable=run_this_func, dag=dag)

bash_task = BashOperator(
    task_id="bash_task",
    bash_command='echo "Here is the message: $message"',
    env={'message': '{{ dag_run.conf["s3_object"] if dag_run else "" }}'},
    dag=dag,
)
If the variable is stored as an Airflow Variable you could retrieve it like this:
conf={"s3_object": "{{var.json.s3_object}}"}
If it were an XCom from a previous task, you could do:
conf={"s3_object": "{{ ti.xcom_pull(task_ids='previous_task_id', key='return_value') }}"
Let me know if that worked for you!
Edit:
This is a working example, tested in version 2.0.1, using xcom_pull in conf param:
Controller DAG:
from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

def _do_something():
    return "my-object-s3-aws"

dag = DAG(
    dag_id="example_trigger_controller_dag",
    default_args={"owner": "airflow"},
    start_date=days_ago(2),
    schedule_interval="@once",
    tags=['example'],
)

task_1 = PythonOperator(task_id='previous_task_id',
                        python_callable=_do_something)

trigger = TriggerDagRunOperator(
    task_id="test_trigger_dagrun",
    trigger_dag_id="example_trigger_target_dag",
    conf={
        "s3_object":
            "{{ ti.xcom_pull(task_ids='previous_task_id', key='return_value') }}"},
    dag=dag,
)

task_1 >> trigger
Target DAG:
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

dag = DAG(
    dag_id="example_trigger_target_dag",
    default_args={"owner": "airflow"},
    start_date=days_ago(2),
    schedule_interval=None,
    tags=['example'],
)

def run_this_func(**context):
    print("Remotely received value of {} ".format(
        context["dag_run"].conf["s3_object"]))

run_this = PythonOperator(
    task_id="run_this", python_callable=run_this_func, dag=dag)

bash_task = BashOperator(
    task_id="bash_task",
    bash_command='echo "Here is the message: $s3_object"',
    env={'s3_object': '{{ dag_run.conf["s3_object"] if dag_run else "" }}'},
    dag=dag,
)
Logs from run_this task:
[2021-07-15 19:24:11,410] {logging_mixin.py:104} INFO - Remotely received value of my-object-s3-aws

Airflow: Jinja template not being rendered in SubDag PythonOperator

I am passing a set of parameters to the SubDagOperator function, which then calls a PythonOperator, but the Jinja templates or macros are not getting rendered:
file_check = SubDagOperator(
    task_id=SUBDAG_TASK_ID,
    subdag=load_sub_dag(
        dag_id='%s.%s' % (DAG_ID, SUBDAG_TASK_ID),
        params={
            'countries': ["SG"],
            'date': '{{ execution_date.strftime("%Y%m%d") }}',
            'poke_interval': 60,
            'timeout': 60 * 5
        },
        start_date=default_args['start_date'],
        email=default_args['email'],
        schedule_interval=None,
    ),
    dag=dag
)
Now, in load_sub_dag, a PythonOperator is called as follows:
def load_sub_dag(dag_id, start_date, email, params, schedule_interval):
    dag = DAG(
        dag_id=dag_id,
        schedule_interval=schedule_interval,
        start_date=start_date
    )
    start = PythonOperator(
        task_id="start",
        python_callable=get_start_time,
        provide_context=True,
        dag=dag
    )
    file_paths = source_detail['path'].replace('$date', params['date'])
    file_paths = [file_paths.replace("$cc", country) for country in params['countries']]
    for file_path in file_paths:
        i += 1
        check_files = PythonOperator(
            task_id="success_file_check_{}_{}".format(source, i),
            python_callable=check_success_file,
            op_kwargs={"file_path": file_path, "params": params,
                       "success_file_name": success_file_name,
                       "hourly": hourly},
            provide_context=True,
            retries=0,
            dag=dag
        )
        start >> check_files
    return dag
Now, as far as I know, the Jinja template should get rendered in the op_kwargs section of the check_files PythonOperator, but that is not happening; instead I get the same raw string in the final file name.
Also, when I look at the task details, I see the file name as u'/something/dt={{ execution_date.strftime("%Y%m%d") }}'.
Airflow version: 1.10.2, CeleryExecutor.

How to use BigQueryOperator with execution_date?

This is my code:
EXEC_TIMESTAMP = "{{ execution_date.strftime('%Y-%m-%d %H:%M') }}"

query = """
select ... where date_purchased between TIMESTAMP_TRUNC(cast ( {{ params.run_timestamp }} as TIMESTAMP), HOUR, 'UTC') ...
"""

generate_op = BigQueryOperator(
    bql=query,
    destination_dataset_table=table_name,
    task_id='generate',
    bigquery_conn_id=CONNECTION_ID,
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    create_disposition='CREATE_IF_NEEDED',
    query_params={'run_timestamp': EXEC_TIMESTAMP},
    dag=dag)
This should work but it doesn't.
The render tab shows me:
between TIMESTAMP_TRUNC(cast ( as TIMESTAMP), HOUR, 'UTC')
The date is missing. It's being rendered into nothing.
How can I fix this? There is no provide_context=True for this operator. I don't know what to do.
Luis, the query_params are not the params you can refer to in the templating context. They are not added to it. And since params is empty, your {{ params.run_timestamp }} is either "" or None. If you changed that to params={'run_timestamp': …}, it would still have a problem, because params values are not themselves templated. So when you use the templated field bql to include {{ params.run_timestamp }}, you get exactly what's in params, {'run_timestamp': …str…}, filled in WITHOUT any recursive expansion of that value. You would get the literal string {{ execution_date.strftime('%Y-%m-%d %H:%M') }}.
Let me try re-writing this for you (but I may have got the parens around cast incorrectly, not sure):
generate_op = BigQueryOperator(
    sql="""
        select ...
        where date_purchased between
        TIMESTAMP_TRUNC(cast('{{ execution_date.strftime('%Y-%m-%d %H:%M') }}' as TIMESTAMP), HOUR, 'UTC')
        ...
    """,
    destination_dataset_table=table_name,
    task_id='generate',
    bigquery_conn_id=CONNECTION_ID,
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    create_disposition='CREATE_IF_NEEDED',
    dag=dag,
)
You can see that the bql and sql fields are templated. However, the bql field is deprecated and has been removed in later versions.
The issue is that you are using query_params, which is not a templated field, as @dlamblin mentioned.
Use the following code, which references execution_date directly inside bql:
import airflow
from airflow.models import DAG, Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
import os

CONNECTION_ID = Variable.get("Your_Connection")

args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 12, 27, 11, 15),
    'retries': 4,
    'retry_delay': timedelta(minutes=10)
}

dag = DAG(
    dag_id='My_Test_DAG',
    default_args=args,
    schedule_interval='15 * * * *',
    max_active_runs=1,
    catchup=False,
)
query = """select customers_email_address as email,
from mytable
where
and date_purchased = TIMESTAMP_SUB(TIMESTAMP_TRUNC(cast ({{ execution_date.strftime('%Y-%m-%d %H:%M') }} as TIMESTAMP), HOUR, 'UTC'), INTERVAL 1 HOUR) """
create_orders_temp_table_op = BigQueryOperator(
    bql=query,
    destination_dataset_table='some table',
    task_id='create_orders_temp_table',
    bigquery_conn_id=CONNECTION_ID,
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    create_disposition='CREATE_IF_NEEDED',
    dag=dag)
start_task_op = DummyOperator(task_id='start_task', dag=dag)
start_task_op >> create_orders_temp_table_op

How to parse json string in airflow template

Is it possible to parse JSON string inside an airflow template?
I have an HttpSensor which monitors a job via a REST API, but the job id is in the response of the upstream task, which has xcom_push set to True.
I would like to do something like the following; however, this code gives the error jinja2.exceptions.UndefinedError: 'json' is undefined:
t1 = SimpleHttpOperator(
    http_conn_id="s1",
    task_id="job",
    endpoint="some_url",
    method='POST',
    data=json.dumps({"foo": "bar"}),
    xcom_push=True,
    dag=dag,
)

t2 = HttpSensor(
    http_conn_id="s1",
    task_id="finish_job",
    endpoint="job/{{ json.loads(ti.xcom_pull(\"job\")).jobId }}",
    response_check=lambda response: True if response.json().state == "complete" else False,
    poke_interval=5,
    dag=dag
)

t2.set_upstream(t1)
You can add a custom Jinja filter to your DAG with the parameter user_defined_filters to parse the json.
a dictionary of filters that will be exposed in your jinja templates. For example, passing dict(hello=lambda name: 'Hello %s' % name) to this argument allows you to {{ 'world' | hello }} in all jinja templates related to this DAG.
dag = DAG(
    ...
    user_defined_filters={'fromjson': lambda s: json.loads(s)},
)

t1 = SimpleHttpOperator(
    task_id='job',
    xcom_push=True,
    ...
)

t2 = HttpSensor(
    endpoint='job/{{ (ti.xcom_pull("job") | fromjson)["jobId"] }}',
    ...
)
However, it may be cleaner to just write your own custom JsonHttpOperator plugin (or add a flag to SimpleHttpOperator) that parses the JSON before returning, so that you can directly reference {{ ti.xcom_pull("job")["jobId"] }} in the template.
class JsonHttpOperator(SimpleHttpOperator):
    def execute(self, context):
        text = super(JsonHttpOperator, self).execute(context)
        return json.loads(text)
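For illustration, here is a sketch of how the two tasks from the question might then look, reusing the imports and dag from the question's snippet. JsonHttpOperator is the custom class defined just above; the endpoint and response_check mirror the question's code, and since the pushed XCom is now a parsed dict, no json.loads is needed in the template:
t1 = JsonHttpOperator(
    http_conn_id="s1",
    task_id="job",
    endpoint="some_url",
    method='POST',
    data=json.dumps({"foo": "bar"}),
    xcom_push=True,
    dag=dag,
)
t2 = HttpSensor(
    http_conn_id="s1",
    task_id="finish_job",
    # the pulled XCom is already a dict, so index into it directly
    endpoint='job/{{ ti.xcom_pull("job")["jobId"] }}',
    response_check=lambda response: response.json()["state"] == "complete",
    poke_interval=5,
    dag=dag,
)
t2.set_upstream(t1)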
Alternatively, it is also possible to add the json module to the template via user_defined_macros, and json will then be available for use inside the template. However, it is probably a better idea to create a plugin like Daniel said.
dag = DAG(
    'dagname',
    default_args=default_args,
    schedule_interval="@once",
    user_defined_macros={
        'json': json
    }
)
then
finish_job = HttpSensor(
    task_id="finish_job",
    endpoint="kue/job/{{ json.loads(ti.xcom_pull('job'))['jobId'] }}",
    response_check=lambda response: True if response.json()['state'] == "complete" else False,
    poke_interval=5,
    dag=dag
)
