Airflow template_fields added but variables like {{ ds }} are not working

I want to pass Airflow variables to a SQL query template file (sql/test.sql) like this:
select 'test', '{{ params.test_ds }}', '{{ test_dt }}' from test_table;
I created an operator that inherits from PostgresOperator:
from airflow.operators.postgres_operator import PostgresOperator
from airflow.utils.decorators import apply_defaults

class EtlOperator(PostgresOperator):
    template_fields = ('sql', 'test_dt', 'params')
    template_ext = PostgresOperator.template_ext

    @apply_defaults
    def __init__(self, test_dt, params, *args, **kwargs):
        super(EtlOperator, self).__init__(*args, **kwargs)
        self.test_dt = test_dt
        self.params = params

    def execute(self, context):
        super(EtlOperator, self).execute(context)
I created this task:
test_task00 = EtlOperator(
    task_id='test_task00',
    postgres_conn_id='redshift',
    sql='sql/test.sql',
    params={
        'test_ds': '{{ ds }}'
    },
    database='default',
    test_dt='{{ execution_date }}',
    provide_context=True,  # tried without it too
    dag=dag
)
However, even though params and test_dt are listed in template_fields, the variables are still not rendered in the SQL; the result looks like this:
INFO - Executing: select 'test', '{{ ds }}', '' from test_table;
Is there anything wrong in my configuration?

Related

Airflow custom operator variables

I need to pass Airflow connection settings (AWS, Postgres) to a Docker container as environment variables.
I'm trying to do this using a custom operator and BaseHook:
from airflow.hooks.base_hook import BaseHook
from airflow.operators.docker_operator import DockerOperator
from airflow.utils.decorators import apply_defaults

class S3ToPostgresDockerOperator(DockerOperator):
    @apply_defaults
    def __init__(self, aws_conn_id='aws_default', postgres_conn_id='postgres_default', **kwargs):
        super(S3ToPostgresDockerOperator, self).__init__(**kwargs)
        self.aws_conn = BaseHook.get_connection(aws_conn_id)
        self.pg_conn = BaseHook.get_connection(postgres_conn_id)
Is it possible to do something like that, or if not, how should I do it?
java_unpack_csv = S3ToPostgresDockerOperator(
    ...
    environment={
        'AWS_ACCESS_KEY': '{{ ??? }}',
        'AWS_SECRET_KEY': '{{ ??? }}'
    }
)
You can build up the environment kwarg passed to the DockerOperator constructor.
For example,
class S3ToPostgresDockerOperator(DockerOperator):
    @apply_defaults
    def __init__(self, aws_conn_id='aws_default', postgres_conn_id='postgres_default', **kwargs):
        self.aws_conn = BaseHook.get_connection(aws_conn_id)
        self.pg_conn = BaseHook.get_connection(postgres_conn_id)
        credentials = self.aws_conn.get_credentials()
        # merge the credentials into whatever environment the caller supplied
        kwargs['environment'] = dict(
            kwargs.pop('environment', {}),
            AWS_ACCESS_KEY=credentials.access_key,
            AWS_SECRET_KEY=credentials.secret_key,
            PG_DATABASE_URI=self.pg_conn.get_uri()
        )
        super(S3ToPostgresDockerOperator, self).__init__(**kwargs)
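With the credentials injected in the constructor, the task definition no longer needs Jinja for them. A minimal sketch of the instantiation (the image name and task_id are placeholders, not from the original question):
java_unpack_csv = S3ToPostgresDockerOperator(
    task_id='java_unpack_csv',
    image='my-etl-image:latest',          # placeholder image name
    aws_conn_id='aws_default',
    postgres_conn_id='postgres_default',
    # anything passed here is merged with the injected AWS/Postgres variables
    environment={'EXTRA_FLAG': '1'},
    dag=dag,
)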

Airflow: Jinja template not being rendered in subdag pythonOperator

I am passing a set of parameters to a SubDagOperator, which then calls a PythonOperator, but the Jinja templates and macros are not getting rendered:
file_check = SubDagOperator(
    task_id=SUBDAG_TASK_ID,
    subdag=load_sub_dag(
        dag_id='%s.%s' % (DAG_ID, SUBDAG_TASK_ID),
        params={
            'countries': ["SG"],
            'date': '{{ execution_date.strftime("%Y%m%d") }}',
            'poke_interval': 60,
            'timeout': 60 * 5
        },
        start_date=default_args['start_date'],
        email=default_args['email'],
        schedule_interval=None,
    ),
    dag=dag
)
Now, inside load_sub_dag, I call a PythonOperator like this:
def load_sub_dag(dag_id, start_date, email, params, schedule_interval):
    dag = DAG(
        dag_id=dag_id,
        schedule_interval=schedule_interval,
        start_date=start_date
    )
    start = PythonOperator(
        task_id="start",
        python_callable=get_start_time,
        provide_context=True,
        dag=dag
    )
    file_paths = source_detail['path'].replace('$date', params['date'])
    file_paths = [file_paths.replace("$cc", country) for country in params['countries']]
    for file_path in file_paths:
        i += 1
        check_files = PythonOperator(
            task_id="success_file_check_{}_{}".format(source, i),
            python_callable=check_success_file,
            op_kwargs={"file_path": file_path, "params": params,
                       "success_file_name": success_file_name,
                       "hourly": hourly},
            provide_context=True,
            retries=0,
            dag=dag
        )
        start >> check_files
    return dag
Now, as far as I know, the Jinja template should get rendered in the op_kwargs section of the check_files PythonOperator, but that is not happening; instead I get the same raw string in the final file name.
Also, when I look at the task details, I see the file name as u'/something/dt={{ execution_date.strftime("%Y%m%d") }}'.
Airflow version: 1.10.2, CeleryExecutor.

Airflow is taking jinja template as string

In Airflow I'm trying to use a Jinja template, but the problem is that it is not getting parsed and is instead treated as a string. Please see my code:
from datetime import datetime
from airflow.operators.python_operator import PythonOperator
from airflow.models import DAG

def test_method(dag, network_id, schema_name):
    print("Schema_name in test_method", schema_name)
    third_task = PythonOperator(
        task_id='first_task_' + network_id,
        provide_context=True,
        python_callable=print_context2,
        dag=dag)
    return third_task

dag = DAG('testing_xcoms_pull', description='Testing Xcoms',
          schedule_interval='0 12 * * *',
          start_date=datetime.today(),
          catchup=False)

def print_context(ds, **kwargs):
    return 'Returning from print_context'

def print_context2(ds, **kwargs):
    return 'Returning from print_context2'

def get_schema(ds, **kwargs):
    # Returning schema name based on network_id
    schema_name = "my_schema"
    return schema_name

first_task = PythonOperator(
    task_id='first_task',
    provide_context=True,
    python_callable=print_context,
    dag=dag)

second_task = PythonOperator(
    task_id='second_task',
    provide_context=True,
    python_callable=get_schema,
    dag=dag)

network_id = '{{ dag_run.conf["network_id"] }}'

first_task >> second_task >> test_method(
    dag=dag,
    network_id=network_id,
    schema_name='{{ ti.xcom_pull("second_task") }}')
The DAG creation is failing because '{{ dag_run.conf["network_id"] }}' is treated as a string by Airflow. Can anyone help me with the problem in my code?
Airflow operators have a class attribute called template_fields. It is usually declared at the top of the operator class; check out any of the operators in the GitHub code base.
If the field you are trying to pass Jinja template syntax into is not in the template_fields list, the Jinja syntax will appear as a plain string.
A DAG object, and its definition code, isn't parsed within the context of an execution; it's parsed with regard to the environment available to it when it is loaded by Python.
The network_id variable, which you use to define the task_id in your function, isn't templated prior to execution; it can't be, since no execution is active yet. Even with templating, you still need a valid, static, non-templated task_id value to instantiate a DAG object.
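To make that concrete, here is a minimal sketch of how template_fields is declared in a custom operator (the operator and field names are made up for illustration):
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class MyTemplatedOperator(BaseOperator):
    # any attribute named here is rendered with Jinja just before execute() runs
    template_fields = ('my_field',)

    @apply_defaults
    def __init__(self, my_field, *args, **kwargs):
        super(MyTemplatedOperator, self).__init__(*args, **kwargs)
        self.my_field = my_field

    def execute(self, context):
        # by this point '{{ ds }}' and similar expressions have been replaced with real values
        self.log.info('my_field rendered to: %s', self.my_field)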

How to parse json string in airflow template

Is it possible to parse a JSON string inside an Airflow template?
I have an HttpSensor that monitors a job via a REST API, but the job id is in the response of the upstream task, which has xcom_push set to True.
I would like to do something like the following; however, this code gives the error jinja2.exceptions.UndefinedError: 'json' is undefined:
t1 = SimpleHttpOperator(
    http_conn_id="s1",
    task_id="job",
    endpoint="some_url",
    method='POST',
    data=json.dumps({"foo": "bar"}),
    xcom_push=True,
    dag=dag,
)
t2 = HttpSensor(
    http_conn_id="s1",
    task_id="finish_job",
    endpoint="job/{{ json.loads(ti.xcom_pull(\"job\")).jobId }}",
    response_check=lambda response: True if response.json()['state'] == "complete" else False,
    poke_interval=5,
    dag=dag
)
t2.set_upstream(t1)
You can add a custom Jinja filter to your DAG with the parameter user_defined_filters to parse the JSON. The parameter takes "a dictionary of filters that will be exposed in your jinja templates. For example, passing dict(hello=lambda name: 'Hello %s' % name) to this argument allows you to {{ 'world' | hello }} in all jinja templates related to this DAG."
dag = DAG(
    ...
    user_defined_filters={'fromjson': lambda s: json.loads(s)},
)

t1 = SimpleHttpOperator(
    task_id='job',
    xcom_push=True,
    ...
)

t2 = HttpSensor(
    endpoint='job/{{ (ti.xcom_pull("job") | fromjson)["jobId"] }}',
    ...
)
However, it may be cleaner to just write your own custom JsonHttpOperator plugin (or add a flag to SimpleHttpOperator) that parses the JSON before returning, so that you can directly reference {{ ti.xcom_pull("job")["jobId"] }} in the template.
class JsonHttpOperator(SimpleHttpOperator):
    def execute(self, context):
        text = super(JsonHttpOperator, self).execute(context)
        return json.loads(text)
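With such an operator in place, the sensor template becomes simpler. A sketch of how the two tasks from the question might look (assuming the JsonHttpOperator above is importable in the DAG file):
t1 = JsonHttpOperator(
    http_conn_id="s1",
    task_id="job",
    endpoint="some_url",
    method='POST',
    data=json.dumps({"foo": "bar"}),
    xcom_push=True,
    dag=dag,
)
t2 = HttpSensor(
    http_conn_id="s1",
    task_id="finish_job",
    # the pulled XCom is already a dict, so no json.loads is needed
    endpoint='job/{{ ti.xcom_pull("job")["jobId"] }}',
    response_check=lambda response: response.json()['state'] == "complete",
    poke_interval=5,
    dag=dag,
)
t2.set_upstream(t1)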
Alternatively, it is also possible to add the json module to the template context with user_defined_macros, and then json will be available for use inside the template. However, it is probably a better idea to create a plugin like Daniel said.
dag = DAG(
    'dagname',
    default_args=default_args,
    schedule_interval="@once",
    user_defined_macros={
        'json': json
    }
)
then
finish_job = HttpSensor(
    task_id="finish_job",
    endpoint="kue/job/{{ json.loads(ti.xcom_pull('job'))['jobId'] }}",
    response_check=lambda response: True if response.json()['state'] == "complete" else False,
    poke_interval=5,
    dag=dag
)

Airflow - how to make EmailOperator html_content dynamic?

I'm looking for a method that will allow the content of the emails sent by a given EmailOperator task to be set dynamically. Ideally I would like to make the email contents dependent on the results of an xcom call, preferably through the html_content argument.
alert = EmailOperator(
    task_id=alertTaskID,
    to='please@dontreply.com',
    subject='Airflow processing report',
    html_content='raw content #2',
    dag=dag
)
I notice that the Airflow docs say that xcom calls can be embedded in templates. Perhaps there is a way to formulate an xcom pull using a template on a specified task ID and then pass the result in as html_content? Thanks.
Use PythonOperator + send_email instead:
from airflow.operators.python_operator import PythonOperator
from airflow.utils.email import send_email

def email_callback(**kwargs):
    with open('/path/to.html') as f:
        content = f.read()
    send_email(
        to=[
            # emails
        ],
        subject='subject',
        html_content=content,
    )
email_task = PythonOperator(
    task_id='task_id',
    python_callable=email_callback,
    provide_context=True,
    dag=dag,
)
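Since the question wants the content to depend on an XCom, the callback can pull it from the context before sending. A minimal sketch, assuming an upstream task with task_id 'some_task' (a placeholder) has pushed the value:
def email_callback(**kwargs):
    # provide_context=True makes the task instance available as 'ti'
    result = kwargs['ti'].xcom_pull(task_ids='some_task')
    send_email(
        to=['someone@example.com'],
        subject='Airflow processing report',
        html_content='Result of upstream task: {}'.format(result),
    )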
For those looking for an exact example of using a Jinja template with EmailOperator, here is one:
from airflow.operators.email_operator import EmailOperator
from datetime import timedelta, datetime

email_task = EmailOperator(
    to='some@email.com',
    task_id='email_task',
    subject='Templated Subject: start_date {{ ds }}',
    params={'content1': 'random'},
    html_content="Templated Content: content1 - {{ params.content1 }} task_key - {{ task_instance_key_str }} test_mode - {{ test_mode }} task_owner - {{ task.owner }} hostname - {{ ti.hostname }}",
    dag=dag)
You can test run the above code snippet using
airflow test dag_name email_task 2017-05-10
Might as well answer this myself. It turns out to be fairly straightforward using the template + XCom route. This code snippet works in the context of an already defined DAG. It uses BashOperator instead of EmailOperator because that is easier to test.
def pushparam(param, ds, **kwargs):
    kwargs['ti'].xcom_push(key='specificKey', value=param)
    return

loadxcom = PythonOperator(
    task_id='loadxcom',
    python_callable=pushparam,
    provide_context=True,
    op_args=['your_message_here'],
    dag=dag)

template2 = """
echo "{{ params.my_param }}"
echo "{{ task_instance.xcom_pull(task_ids='loadxcom', key='specificKey') }}"
"""

t5 = BashOperator(
    task_id='tt2',
    bash_command=template2,
    params={'my_param': 'PARAMETER1'},
    dag=dag)
This can be tested on the command line using something like this:
airflow test dag_name loadxcom 2015-12-31
airflow test dag_name tt2 2015-12-31
I will eventually test with EmailOperator and add something here if it doesn't work...
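For reference, a sketch of what the EmailOperator version might look like (untested here; html_content is a templated field, so the same xcom_pull expression should render):
t6 = EmailOperator(
    task_id='email_alert',
    to='please@dontreply.com',
    subject='Airflow processing report',
    html_content=(
        'Parameter: {{ params.my_param }}<br>'
        'XCom value: {{ task_instance.xcom_pull(task_ids="loadxcom", key="specificKey") }}'
    ),
    params={'my_param': 'PARAMETER1'},
    dag=dag)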
