I have a DAG with three bash tasks that is scheduled to run every day.
I would like to access a unique ID of the DAG run (maybe the PID) in all the bash scripts.
Is there any way to do this?
I am looking for functionality similar to Oozie, where WORKFLOW_ID can be accessed in the workflow XML or Java code.
Can somebody point me to the Airflow documentation on how to use built-in and custom variables in an Airflow DAG?
Many Thanks
Pari
An object's attributes can be accessed with dot notation in Jinja2 (see https://airflow.apache.org/code.html#macros). In this case, it would simply be:
{{ dag.dag_id }}
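If you need an identifier that is unique per run rather than per DAG, Airflow also exposes the run_id macro, which is closer to Oozie's WORKFLOW_ID than a PID would be. Here is a minimal sketch of how such a template resolves; the context values are illustrative, and Jinja rendering is simulated with plain string substitution rather than Airflow's real engine:

```python
# Template as you would write it in a BashOperator's bash_command:
bash_command = "echo dag={{ dag.dag_id }} run={{ run_id }}"

# Illustrative stand-in for the template context Airflow supplies at runtime:
context = {
    "dag.dag_id": "my_daily_dag",
    "run_id": "scheduled__2021-06-01T00:00:00+00:00",
}

# Simulate Jinja rendering with naive substitution:
rendered = bash_command
for key, value in context.items():
    rendered = rendered.replace("{{ %s }}" % key, value)

print(rendered)
# echo dag=my_daily_dag run=scheduled__2021-06-01T00:00:00+00:00
```

The run_id string is unique per DAG run, so it can serve the same role as Oozie's WORKFLOW_ID in your bash scripts.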
I made use of the fact that the Python object for the DAG prints out the name of the current DAG, so I just use Jinja2 to strip the wrapper from it:
{{ dag | replace( '<DAG: ', '' ) | replace( '>', '' ) }}
Bit of a hack, but it works. Therefore:
clear_upstream = BashOperator(
    task_id='clear_upstream',
    trigger_rule='all_failed',
    bash_command="""
        echo airflow clear -t upstream_task -c -d -s {{ ts }} -e {{ ts }} {{ dag | replace( '<DAG: ', '' ) | replace( '>', '' ) }}
    """
)
In one of my Airflow tasks, I want to pass the current month, current hour and last hour (in UTC) as variables.
I know the macros are there, but Airflow is running with an IST timestamp. How can I get the values in UTC? Any sample code?
execution_date is a Pendulum object, so you can use in_tz():
{{ execution_date.in_tz('UTC') }}
You can then format the pattern and extract the string you need. For example, to get the month:
op = BashOperator(
    task_id="example",
    bash_command="echo the month extracted from {{ execution_date.in_tz('UTC') }}"
                 " is {{ execution_date.in_tz('UTC').strftime('%m') }}"
)
Here is an example of the KubernetesPodOperator I am trying --
set_tag = KubernetesPodOperator(
    namespace='default',
    task_id='set-tag',
    name='set-tag',
    image='ubuntu:18.04',
    xcom_push=True,
    cmds=["/bin/sh", "-c"],
    arguments=['''mkdir /airflow &&
        mkdir /airflow/xcom &&
        echo '{"test_key":"test_value"}' > /airflow/xcom/return.json
        ''']
)
In the next downstream PythonOperator, I am trying to fetch this tag as follows -
def print_tag(**kwargs):
    ti = kwargs['ti']
    print(ti.xcom_pull(task_ids='set-tag', key='test_key'))

get_tag = PythonOperator(
    task_id='get-tag',
    dag=dag,
    python_callable=print_tag,
    provide_context=True
)
I am using 'airflow test' to first run the 'set-tag' task and then run 'get-tag', hoping to see 'test_value' printed. But the printed value appears as 'None'.
Any pointers are much appreciated.
Thanks in advance.
At the moment, the name of the KubernetesPodOperator argument for XCom push is do_xcom_push, not xcom_push (see the source code).
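As a side note on why the pull returns None even once pushing works: as far as I know, the JSON written to /airflow/xcom/return.json is stored as a whole under XCom's default key (return_value), not under the JSON's own keys. The snippet below is a simplified stand-in for that lookup, not Airflow's real API, just to illustrate the key semantics:

```python
import json

# Contents the pod wrote to /airflow/xcom/return.json:
sidecar_file = '{"test_key":"test_value"}'

# Simplified model of the XCom store: the parsed JSON lands under the
# default key 'return_value' for the pushing task.
xcom_store = {("set-tag", "return_value"): json.loads(sidecar_file)}

def xcom_pull(task_id, key="return_value"):
    # Mimics TaskInstance.xcom_pull: unknown keys return None.
    return xcom_store.get((task_id, key))

print(xcom_pull("set-tag"))                  # the whole dict
print(xcom_pull("set-tag", key="test_key"))  # None, as observed
```

So in print_tag, pulling with task_ids='set-tag' and no key (and indexing the returned dict) is the way to reach 'test_value'.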
I have a DAG that is triggered externally with some additional parameters say 'name'.
Sample code:
with airflow.DAG(
        'my_dag_name',
        default_args=default_args,
        # Not scheduled, trigger only
        schedule_interval=None) as dag:
    start = bash_operator.BashOperator(
        task_id='start',
        bash_command='echo Hello.')
    some_operation = MyOperator(
        task_id='my_task',
        name='{{ dag_run.conf["name"] }}')
    goodbye = bash_operator.BashOperator(
        task_id='end',
        bash_command='echo Goodbye.')
    start >> some_operation >> goodbye
Now if I use {{ dag_run.conf["name"] }} directly in the echo of a BashOperator, it works. Another way to read the parameter is a PythonOperator, where I can read conf via kwargs['dag_run'].conf['name'].
However, what I really want is to have the name beforehand, so that I can pass it while constructing MyOperator.
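Template values only exist at runtime (the conf is supplied when the run is triggered), so they cannot be available at construction time. The usual approach, assuming MyOperator is your own class, is to declare the field in template_fields so the operator receives the rendered value by the time execute() runs. The sketch below mocks Airflow's rendering step with a tiny substitute; the class and helper are illustrative, not Airflow's real internals:

```python
class MyOperator:
    # Fields listed here are rendered with Jinja before execute() runs.
    template_fields = ("name",)

    def __init__(self, task_id, name):
        self.task_id = task_id
        self.name = name

    def execute(self, context):
        return self.name  # by now the template has been resolved

# Toy stand-in for Airflow's rendering of template_fields:
def render_template_fields(op, context):
    for field in op.template_fields:
        value = getattr(op, field)
        if value == '{{ dag_run.conf["name"] }}':
            setattr(op, field, context["dag_run"]["conf"]["name"])

op = MyOperator(task_id="my_task", name='{{ dag_run.conf["name"] }}')
render_template_fields(op, {"dag_run": {"conf": {"name": "pari"}}})
print(op.execute({}))  # pari
```

In a real Airflow operator you would subclass BaseOperator and only add the template_fields class attribute; the scheduler performs the rendering for you.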
In Airflow I am creating branches with different operators in a for loop; my code looks like this:

for table in ['messages', 'conversations']:
    Operator1 with operator1.task_id = 'operator1_{}'.format(table)
    Operator1 does kwargs['ti'].xcom_push(key='file_name', value='y')
    Operator2 is a BashOperator that needs to run:
        bash_command = "echo {{ ti.xcom_pull(task_ids='operator1_{}', key='file_name') }}".format(table)
    Operator1 >> Operator2
But in the UI the commands are rendered like this:
echo { ti.xcom_pull(task_ids='operator1_messages', key='file_name') }
echo { ti.xcom_pull(task_ids='operator1_conversations', key='file_name') }
How should I write the bash_command to have Airflow interpret correctly the template?
If I write directly
bash_command = "echo {{ ti.xcom_pull(task_ids='operator1_messages', key='file_name') }}"
it works but I want to create this command from a for loop.
Thanks!
It's doing this because the .format(table) call treats the doubled braces in your bash command as escaped literals and collapses them to single { and }. You may be able to fix this with string concatenation instead:
bash_command = "echo {{ ti.xcom_pull(task_ids='operator1_" + table + "', key='file_name') }}"
Whether this is the best way to do it is probably another question.
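Another option, if you prefer to keep .format(), is to escape the Jinja braces by doubling them again: str.format collapses '{{{{' to '{{' and leaves the template intact for Airflow to render later. A quick check of what the string ends up as:

```python
table = "messages"

# Quadruple braces survive .format() as the double braces Jinja expects:
bash_command = (
    "echo {{{{ ti.xcom_pull(task_ids='operator1_{}', key='file_name') }}}}"
    .format(table)
)

print(bash_command)
# echo {{ ti.xcom_pull(task_ids='operator1_messages', key='file_name') }}
```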
I'm looking for a method that will allow the content of the emails sent by a given EmailOperator task to be set dynamically. Ideally I would like to make the email contents dependent on the results of an xcom call, preferably through the html_content argument.
alert = EmailOperator(
    task_id=alertTaskID,
    to='please@dontreply.com',
    subject='Airflow processing report',
    html_content='raw content #2',
    dag=dag
)
I notice that the Airflow docs say that XCom calls can be embedded in templates. Perhaps there is a way to formulate an XCom pull using a template on a specified task ID, then pass the result in as html_content? Thanks
Use PythonOperator + send_email instead:
from airflow.operators import PythonOperator
from airflow.utils.email import send_email

def email_callback(**kwargs):
    with open('/path/to.html') as f:
        content = f.read()
    send_email(
        to=[
            # emails
        ],
        subject='subject',
        html_content=content,
    )

email_task = PythonOperator(
    task_id='task_id',
    python_callable=email_callback,
    provide_context=True,
    dag=dag,
)
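To tie this back to the XCom requirement in the question, the callback can pull the body from an upstream task instead of reading a file. The sketch below stubs out send_email and the TaskInstance so it can run standalone; the task id 'build_report', the key 'report', and the addresses are illustrative:

```python
sent = {}

def send_email(to, subject, html_content):
    # Stand-in for airflow.utils.email.send_email; just records the call.
    sent.update(to=to, subject=subject, html_content=html_content)

class FakeTI:
    # Stand-in for the real TaskInstance passed in the context.
    def xcom_pull(self, task_ids, key):
        return "<p>42 rows processed</p>"

def email_callback(**kwargs):
    # Pull the dynamically built content from an upstream task's XCom:
    body = kwargs["ti"].xcom_pull(task_ids="build_report", key="report")
    send_email(
        to=["ops@example.com"],
        subject="Airflow processing report",
        html_content=body,
    )

email_callback(ti=FakeTI())
print(sent["html_content"])  # <p>42 rows processed</p>
```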
For those looking for an exact example of using a Jinja template with EmailOperator, here is one:
from airflow.operators.email_operator import EmailOperator
from datetime import timedelta, datetime

email_task = EmailOperator(
    to='some@email.com',
    task_id='email_task',
    subject='Templated Subject: start_date {{ ds }}',
    params={'content1': 'random'},
    html_content="Templated Content: content1 - {{ params.content1 }} "
                 "task_key - {{ task_instance_key_str }} "
                 "test_mode - {{ test_mode }} "
                 "task_owner - {{ task.owner }} "
                 "hostname - {{ ti.hostname }}",
    dag=dag)
You can test run the above code snippet using
airflow test dag_name email_task 2017-05-10
Might as well answer this myself. It turns out to be fairly straightforward using the template + XCom route. This code snippet works in the context of an already defined DAG. It uses BashOperator instead of EmailOperator because it's easier to test.
def pushparam(param, ds, **kwargs):
    kwargs['ti'].xcom_push(key='specificKey', value=param)
    return

loadxcom = PythonOperator(
    task_id='loadxcom',
    python_callable=pushparam,
    provide_context=True,
    op_args=['your_message_here'],
    dag=dag)

template2 = """
echo "{{ params.my_param }}"
echo "{{ task_instance.xcom_pull(task_ids='loadxcom', key='specificKey') }}"
"""

t5 = BashOperator(
    task_id='tt2',
    bash_command=template2,
    params={'my_param': 'PARAMETER1'},
    dag=dag)
It can be tested on the command line using something like this:
airflow test dag_name loadxcom 2015-12-31
airflow test dag_name tt2 2015-12-31
I will eventually test with EmailOperator and add something here if it doesn't work...