How to access dag run config inside dag code - airflow

I am using a DAG (s3_sensor_dag) to trigger another DAG (advanced_dag) and I pass the tag_names configurations to the triggered DAG (advanced_dag) using the conf argument. It looks something like this:
s3_sensor_dag.py:
trigger_advanced_dag = TriggerDagRunOperator(
    task_id="trigger_advanced_dag",
    trigger_dag_id="advanced_dag",
    wait_for_completion=True,
    conf={"tag_names": "{{ task_instance.xcom_pull(key='tag_names', task_ids='get_file_paths') }}"},
)
In the advanced_dag, I am trying to access the dag_conf (tag_names) like this:
advanced_dag.py:
with DAG(
    dag_id="advanced_dag",
    start_date=datetime(2020, 12, 23),
    schedule_interval=None,
    is_paused_upon_creation=False,
    catchup=False,
    dagrun_timeout=timedelta(minutes=60),
) as dag:
    dag_parser = DagParser(
        home=HOME,
        env=env,
        global_cli_flags=GLOBAL_CLI_FLAGS,
        tag=dag_run.conf["tag_names"]
    )
But I get an error stating that dag_run does not exist. From Accessing configuration parameters passed to Airflow through CLI, I realized that this is a runtime variable.
So I tried a solution mentioned in a comment there, which uses dag.get_dagrun(execution_date=dag.latest_execution_date).conf, something like this:
dag_parser = DagParser(
    home=HOME,
    env=env,
    global_cli_flags=GLOBAL_CLI_FLAGS,
    tag=dag.get_dagrun(execution_date=dag.latest_execution_date).conf['tag_names']
)
But it looks like it didn't fetch the value either.
I was able to solve this issue by using Airflow Variables, but I wanted to know whether there is a way to use dag_conf (which obviously only gets data at runtime) inside the DAG code and read the value.
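Note that dag_run.conf only exists once a run has actually been created, so it cannot be read at DAG-parse time; it has to be consumed either through a Jinja template in a templated operator field or inside a task callable at execution time. A minimal sketch of the second approach, reusing the names from the post above (the parse_tags task is illustrative; imports assume Airflow 2.x):

from airflow.operators.python import PythonOperator

def parse_tags(**context):
    # dag_run (and therefore conf) is only available at execution time, via the task context
    tag_names = context["dag_run"].conf.get("tag_names")
    DagParser(
        home=HOME,
        env=env,
        global_cli_flags=GLOBAL_CLI_FLAGS,
        tag=tag_names,
    )

parse_tags_task = PythonOperator(
    task_id="parse_tags",
    python_callable=parse_tags,
    dag=dag,
)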

Related

how to pass default values for run time input variable in airflow for scheduled execution

I came across an issue while running a DAG in Airflow. My code works in two scenarios but fails in one.
Below are my scenarios:
Manual trigger with input - Running fine
Manual trigger without input - Running fine
Scheduled run - Failing
Below is my code:
def decide_the_flow(**kwargs):
    cleanup = kwargs['dag_run'].conf.get('cleanup', 'N')
    print("IP is:", cleanup)
    return cleanup
I am getting the error below:
cleanup=kwargs['dag_run'].conf.get('cleanup','N')
AttributeError: 'NoneType' object has no attribute 'get'
I tried to define default variables like this:
default_dag_args = {
    'start_date': days_ago(0),
    'params': {
        "cleanup": "N"
    },
    'retries': 0
}
but it doesn't work.
I am using BranchPythonOperator to call this function.
Scheduling: (screenshot omitted)
Can anyone please guide me here? What am I missing?
As a workaround, I am using the code below:
try:
    cleanup = kwargs['dag_run'].conf.get('cleanup', 'N')
except:
    cleanup = "N"
You can access the parameters through the params dict in the context, because Airflow fills this dict with the DAG-level defaults and then overrides them with whatever is passed in dag_run.conf:
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator


def decide_the_flow(**kwargs):
    cleanup = kwargs['params']["cleanup"]
    print(f"IP is : {cleanup}")
    return cleanup


with DAG(
    dag_id='airflow_params',
    start_date=datetime(2022, 8, 25),
    schedule_interval="* * * * *",
    params={
        "cleanup": "N",
    },
    catchup=False
) as dag:
    branch_task = BranchPythonOperator(
        task_id='test_param',
        python_callable=decide_the_flow
    )

    task_n = EmptyOperator(task_id="N")
    task_m = EmptyOperator(task_id="M")

    branch_task >> [task_n, task_m]
I just tested it in scheduled and manual (with and without conf) runs; it works fine.

How to add data to Airflow DAG Context? [duplicate]

This question already has answers here:
Pass other arguments to on_failure_callback
(2 answers)
Closed 11 months ago.
We are working with Airflow. We have something like 1000+ DAGs.
To manage DAG errors, we use the same on_error_callback function to trigger alerts.
Each DAG is supposed to have context information, which could be expressed as constants, that I would like to share with the alerting stack.
Currently, I am only able to send the dag_id, which I retrieve from the context via context['ti'].dag_id, and possibly the conf (parameters).
Is there a way to add other data (constants) to the context when declaring/creating the DAG?
Thanks.
You can pass params into the context.
dag = DAG(
    dag_id='example_dag',
    default_args=default_args,
    params={
        "param1": "value1",
        "param2": "value2"
    }
)
These are available in the context:
# example task that quickly outputs the context
start = PythonOperator(
    task_id='start',
    python_callable=lambda **context: print(context)
)
# when the context is printed, you will see something like
# 'params': {'param1': 'value1', 'param2': 'value2'}
or via Jinja templates:
{{ params.param1 }}
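Since the original goal is to surface such constants in an error callback, note that the same params dict is part of the context handed to on_failure_callback. A minimal sketch, where the "team" param and the send_alert helper are hypothetical:

def notify_on_failure(context):
    # the task context passed to the callback includes 'params'
    dag_id = context['ti'].dag_id
    team = context['params'].get('team', 'unknown')
    send_alert(f"DAG {dag_id} (team: {team}) failed")  # send_alert is a hypothetical helper

dag = DAG(
    dag_id='example_dag',
    default_args={'on_failure_callback': notify_on_failure},
    params={"team": "data-platform"},
)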
You can also define variables within a DAG via user_defined_macros and access them with {{ }} references.
Example (with a PROJECT_ID variable):
with models.DAG(
    ...
    user_defined_macros={"PROJECT_ID": PROJECT_ID}
) as dag:
    projectId = "{{ PROJECT_ID }}"

How to see the templated values in airflow?

When I do something like:
some_value = "{{ dag_run.get_task_instance('start').start_date }}"
print(f"some interpolated value: {some_value}")
I see this in the airflow logs:
some interpolated value: {{ dag_run.get_task_instance('start').start_date }}
but not the actual value itself. How can I easily see what the value is?
Everything in the DAG task run comes through as kwargs (before 1.10.12 you needed to add provide_context=True, but all context is provided automatically from version 2 onwards).
To get something out of kwargs, do something like this in your python callable:
run_id = kwargs['run_id']
print(f'run_id = {run_id}')
Additional info:
To get the kwargs out, add them to your callable, so:
def my_func(**kwargs):
    run_id = kwargs['run_id']
    print(f'run_id = {run_id}')
And just call this from your DAG task like:
my_task = PythonOperator(
    task_id='my_task',
    dag=dag,
    python_callable=my_func
)
I'm afraid I'm not sure what your current code structure is, because you haven't provided more info.
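One more note: the Jinja string from the question is only rendered when it passes through a templated operator field. As a sketch (task and variable names are illustrative), you can route it through op_kwargs of PythonOperator, which is templated, and print the rendered value inside the callable:

def show_value(some_value, **kwargs):
    # by the time the task runs, the template has been rendered to a real value
    print(f"some interpolated value: {some_value}")

show_templated = PythonOperator(
    task_id='show_templated',
    python_callable=show_value,
    op_kwargs={'some_value': "{{ dag_run.get_task_instance('start').start_date }}"},
    dag=dag,
)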

How to run an external DAG as part of my DAG?

I'm new to Airflow and I'm trying to run an external DAG (developed and owned by another team), as part of my DAG flow.
I was looking at SubDagOperator, but it seems that for some reason it enforces the name of the subdag to be '{parent_dag_id}.{this_task_id}', which I cannot do, as the child DAG is owned by a different team.
Here is my code sample:
parent_dag = DAG(
    dag_id='parent_dag', default_args=args,
    schedule_interval=None)

external_dag = SubDagOperator(
    subdag=another_teams_dag,
    task_id='external_dag',
    dag=parent_dag,
    trigger_rule=TriggerRule.ALL_DONE
)
and the other team's dag is defined like this:
another_teams_dag = DAG(
    dag_id='another_teams_dag', default_args=args,
    schedule_interval=None)
but I'm getting this error:
The subdag's dag_id should have the form
'{parent_dag_id}.{this_task_id}'. Expected 'parent_dag.external_dag';
received 'another_teams_dag'.
Any ideas?
What am I missing?
Use TriggerDagRunOperator
More info: https://airflow.apache.org/code.html#airflow.operators.dagrun_operator.TriggerDagRunOperator
Example:
Dag that triggers: https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/example_trigger_controller_dag.py
Dag that is triggered: https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/example_trigger_target_dag.py
For your case, you can use something like:
trigger = TriggerDagRunOperator(
    task_id='external_dag',
    trigger_dag_id="another_teams_dag",
    dag=dag
)
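If the parent DAG should block until the other team's DAG finishes (as SubDagOperator would), newer Airflow versions also accept wait_for_completion; a sketch assuming Airflow 2.1+:

trigger = TriggerDagRunOperator(
    task_id='external_dag',
    trigger_dag_id="another_teams_dag",
    wait_for_completion=True,  # block until the triggered DAG run finishes
    poke_interval=60,          # seconds between status checks
    dag=dag,
)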

Apache airflow macro to get last dag run execution time

I thought the macro prev_execution_date listed here would get me the execution date of the last DAG run, but looking at the source code it seems to only get the last date based on the DAG schedule.
prev_execution_date = task.dag.previous_schedule(self.execution_date)
Is there any way via macros to get the execution date of the DAG when it doesn't run on a schedule?
Yes, you can define your own custom macro for this as follows:
# custom macro function
def get_last_dag_run(dag):
    last_dag_run = dag.get_last_dagrun()
    if last_dag_run is None:
        return "no prev run"
    else:
        return last_dag_run.execution_date.strftime("%Y-%m-%d")

# add macro in user_defined_macros in dag definition
dag = DAG(
    dag_id="my_test_dag",
    schedule_interval='@daily',
    user_defined_macros={
        'last_dag_run_execution_date': get_last_dag_run
    }
)

# example of using it in practice
print_vals = BashOperator(
    task_id='print_vals',
    bash_command='echo {{ last_dag_run_execution_date(dag) }}',
    dag=dag
)
Note that dag.get_last_dagrun() is just one of the many functions available on the DAG object. Here's where I found it: https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/models.py#L3396
You can also tweak the formatting of the string for the date format, and what you want output if there is no previous run.
You can make your own custom macro function that uses the Airflow models to search the metadata database.
def get_last_dag_run(dag_id):
    # TODO: search the metadata DB
    return xxx

dag = DAG(
    'example',
    schedule_interval='0 1 * * *',
    user_defined_macros={
        'last_dag_run_execution_date': get_last_dag_run,
    }
)
Then use the macro key, last_dag_run_execution_date, in your template.
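As a sketch of what that metadata-database lookup could look like (not from the original answer; it uses the DagRun model and the Airflow 2.x provide_session import):

from airflow.models import DagRun
from airflow.utils.session import provide_session

@provide_session
def get_last_dag_run(dag_id, session=None):
    # most recent run of the given DAG, regardless of schedule
    last_run = (
        session.query(DagRun)
        .filter(DagRun.dag_id == dag_id)
        .order_by(DagRun.execution_date.desc())
        .first()
    )
    return last_run.execution_date if last_run else "no prev run"

It could then be referenced in a template as {{ last_dag_run_execution_date(dag.dag_id) }}.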
