Is it possible to pass parameters to Airflow's jobs through the UI?
AFAIK, the 'params' argument of a DAG is defined in Python code, so it can't be changed at runtime.
Depending on what you're trying to do, you might be able to leverage Airflow Variables. These can be defined or edited in the UI under the Admin tab. Then your DAG code can read the value of the variable and pass the value to the DAG(s) it creates.
Note, however, that although Variables let you decouple values from code, all runs of a DAG will read the same value for the variable. If you want runs to be passed different values, your best bet is probably to use Airflow's templating macros and tell runs apart with the run_id macro or similar.
Two ways to change your DAG behavior:
1. Use Airflow Variables, as mentioned by Bryan in his answer.
2. Use Airflow JSON conf to pass JSON data to a single DAG run. JSON can be passed either from the UI:
   - manual trigger from the tree view
   - create a new DAG run from Browse > DAG Runs > create new record
   or from the CLI:
airflow trigger_dag 'MY_DAG' -r 'test-run-1' --conf '{"exec_date":"2021-09-14"}'
Within the DAG, this JSON can be accessed using Jinja templates or through the context parameter of the operator's callable.
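# Airflow 1.x import paths (matching provide_context=True); `dag` is assumed to be defined elsewhere
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

def do_some_task(**context):
    # Read the value passed via --conf for this DAG run
    print(context['dag_run'].conf['exec_date'])

task1 = PythonOperator(
    task_id='task1_id',
    provide_context=True,
    python_callable=do_some_task,
    dag=dag,
)

# Access in templates
task2 = BashOperator(
    task_id="task2_id",
    bash_command="echo {{ dag_run.conf['exec_date'] }}",
    dag=dag,
)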
Note that the JSON conf will not be present during scheduled runs. The best use case for JSON conf is to override the default DAG behavior. Set meaningful defaults in the DAG code so that scheduled runs work without any JSON conf.
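A minimal sketch of one way to set such defaults, assuming the task logic lives in a plain Python callable: overlay the run's conf on a dictionary of defaults. DEFAULT_CONF and resolve_conf are hypothetical names, not Airflow APIs, and the SimpleNamespace objects only stand in for a real dag_run so the snippet runs without Airflow installed.

```python
from types import SimpleNamespace

# Hypothetical defaults used when a run carries no JSON conf.
DEFAULT_CONF = {"exec_date": "1970-01-01"}


def resolve_conf(dag_run, defaults=DEFAULT_CONF):
    """Return the defaults overlaid with the run's conf, if any."""
    # conf may be None or empty on scheduled runs
    conf = getattr(dag_run, "conf", None) or {}
    merged = dict(defaults)
    merged.update(conf)
    return merged


def do_some_task(**context):
    conf = resolve_conf(context.get("dag_run"))
    print(conf["exec_date"])
    return conf["exec_date"]


# Stand-ins for a scheduled run (no conf) and a manual run (with conf).
scheduled = SimpleNamespace(conf=None)
manual = SimpleNamespace(conf={"exec_date": "2021-09-14"})
```

Calling do_some_task(dag_run=scheduled) falls back to the default, while do_some_task(dag_run=manual) uses the value supplied at trigger time.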
Related
When the DAG is triggered manually, there are multiple ways to pass the config. It can be done from the UI, via the Airflow CLI using the --conf argument, or using the REST API.
But when a DAG is scheduled using a cron expression, the DAG always fails because the tasks in the DAG are expecting the values from conf.
Is there a DAG-level configuration which can be used to set "default" values for conf values (WITHOUT doing a null check in the Python code itself and hardcoding a default value)?
The reason I do not want to do this null check in the code itself is that I want the conf keys and default values to be exposed via an Airflow API if possible.
I am trying to write a test case where I:
1. instantiate a collection of (Python)Operators (patching some of their dependencies with unittest.mock.patch)
2. arrange those Operators in a DAG
3. run that DAG
4. make assertions about the calls to various mocked downstreams
I see from here that running a DAG is not so simple as calling dag.run - I should instantiate a local_client and call trigger_dag on that. However, the resultant code constructs its own DagBag, and does not accept any parameter that allows me to pass in my manually-constructed DAG - so I cannot see how to run this DAG with local_client.
I see a couple of options here:
- I could declare my testing DAG in a separate folder, as specified by DagModel.get_current(dag_id).fileloc, so that my DAG will get picked up by trigger_dag and so run. This seems pretty indirect, though, and I doubt that I'd be able to cleanly reference the injected mocks from my test code.
- I could directly call api.common.experimental.trigger_dag._trigger_dag, which has a dag_bag argument. Both the experimental in the name and the underscore-prefixed function name suggest that this would be A Bad Idea.
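A lighter-weight alternative, sketched here under the assumption that each task's logic lives in a plain Python callable: skip the scheduler entirely, call the callables by hand in dependency order, and assert on the mocks. This does not exercise the DAG wiring itself. Downstream, extract, load and run_in_order are hypothetical names used only for illustration.

```python
from unittest import mock


class Downstream:
    """Hypothetical stand-in for an external dependency the tasks call."""

    def send(self, payload):
        raise RuntimeError("should be mocked in tests")


downstream = Downstream()


def extract(**context):
    return {"rows": 3}


def load(**context):
    # Mimic receiving the upstream result, then call the downstream.
    payload = context["upstream_result"]
    downstream.send(payload)
    return payload["rows"]


def run_in_order():
    """Call the callables in dependency order, passing results along by hand."""
    result = extract()
    return load(upstream_result=result)


# Patch the dependency, run the chain, and assert on the recorded call.
with mock.patch.object(downstream, "send") as fake_send:
    rows = run_in_order()
    fake_send.assert_called_once_with({"rows": 3})
```

The same callables can then be handed to PythonOperators in the real DAG, so the test covers the task logic while leaving scheduling to Airflow.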
I am trying to pass a Jinja template to the DAG constructor, since Airflow best practice suggests not calling Variables outside the execute method.
Below is the code snippet.
dag = DAG(
    dag_id=dag_id,
    schedule_interval=schedule_interval,
    dagrun_timeout=timedelta(hours=max_dagrun),
    template_searchpath='{{var.value.sql_path}}',
)
But it's failing to parse this.
Any suggestions on how to pass these types of variables? They are passed to DAGs, not to the Airflow operators.
Thanks in advance.
I have a branch task that relies on an XCom set by its direct upstream. The upstream task IDs are generated via a loop, such as task_1, task_2..task_n.
So something like this:
task_n >> branch >> [task_a, task_b]
Is there a way for a branch to access an XCom set by its direct upstream? I know I could use op_kwargs and pass the task ID to the branch. I just wanted to see if there was a more Airflow-native way to do it.
The BranchPythonOperator should be created with provide_context=True, and the Python callable for it can look something like this:
def branch_callable(task_instance, task, **kwargs):
    upstream_ids = task.upstream_task_ids  # an iterable
    xcoms = task_instance.xcom_pull(task_ids=upstream_ids)
    # process the xcoms of the direct upstream tasks
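To make the "process the xcoms" step concrete, here is a hedged sketch that completes the callable with a return value. The branching rule (follow task_a if any upstream pushed a truthy value) is purely an assumption for illustration, and FakeTaskInstance is a hypothetical stand-in that only mimics xcom_pull so the snippet runs without Airflow.

```python
from types import SimpleNamespace


def branch_callable(task_instance, task, **kwargs):
    # upstream_task_ids may be a set, so sort it for a stable order
    upstream_ids = sorted(task.upstream_task_ids)
    xcoms = task_instance.xcom_pull(task_ids=upstream_ids)
    # Hypothetical rule: follow task_a if any upstream pushed a truthy value
    return "task_a" if any(xcoms) else "task_b"


class FakeTaskInstance:
    """Minimal stand-in that mimics xcom_pull for a list of task ids."""

    def __init__(self, store):
        self.store = store

    def xcom_pull(self, task_ids):
        return [self.store.get(task_id) for task_id in task_ids]


# A branch whose only direct upstream is task_3, which pushed True.
fake_task = SimpleNamespace(upstream_task_ids={"task_3"})
choice = branch_callable(FakeTaskInstance({"task_3": True}), fake_task)
print(choice)
```

Because the callable reads the upstream IDs from the task itself, the loop that generates task_1..task_n does not need to pass any IDs through op_kwargs.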
I read the API reference and couldn't find anything on it. Is that possible?
Currently, there is no feature that does this out of the box, but you can write some custom code in your DAG to get around it. For example, use a PythonOperator (or a MySQL operator if your metadata DB is MySQL) to get the status of the last X runs for the DAG.
Then use a BranchPythonOperator to check whether the number is more than X; if it is, use a BashOperator to run the airflow pause CLI command for the DAG.
You can also make this a 2-step task by moving the PythonOperator logic into the BranchPythonOperator. This is just an idea; you can use different logic.
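The decision step of that idea could look roughly like the sketch below. It assumes the earlier task has already collected the recent run states as strings such as "failed" and "success", and that the number being compared is the count of failed runs; choose_next_task and the task IDs pause_dag and continue are hypothetical names.

```python
def choose_next_task(recent_states, max_failures,
                     pause_task_id="pause_dag", continue_task_id="continue"):
    """Return the task id a branch should follow, given recent run states."""
    # Count how many of the recent runs ended in the "failed" state.
    failures = sum(1 for state in recent_states if state == "failed")
    return pause_task_id if failures > max_failures else continue_task_id


# Three failures against a threshold of two selects the pause branch.
print(choose_next_task(["failed", "failed", "success", "failed"], max_failures=2))
```

A BranchPythonOperator callable could return this value directly, with the pause_dag task being the BashOperator that pauses the DAG.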