When a DAG is triggered manually, there are multiple ways to pass the config: from the UI, via the Airflow CLI using the --conf argument, or through the REST API.
But when the DAG is scheduled with a cron expression, it always fails because its tasks expect values from conf.
Is there a DAG-level configuration that can be used to set "default" values for the conf keys (WITHOUT doing a null check in the Python code itself and hardcoding a default value)?
The reason I do not want to do this null check in the code is that I want the conf keys and their default values to be exposed via an Airflow API if possible.
Related
I have a DAG that triggers an external DAG using TriggerDagRunOperator. The triggering DAG queries a database and, based on a type ID, triggers the associated external DAG along with the parameters that DAG needs. I would like to pass a pool name as part of these parameters, and I am wondering whether I can create the pool in the (external) DAG if it does not exist.
I have an Airflow DAG that is triggered externally via the CLI.
I have a requirement to change the order of execution of tasks based on a Boolean parameter that I would be getting from the CLI.
How do I achieve this?
I understand dag_run.conf can only be used in a template field of an operator.
Thanks in advance.
You cannot change task dependencies with a runtime parameter.
However, you can pass a runtime parameter (with dag_run.conf) and have tasks executed or skipped according to its value. For that you need to place operators in your workflow that can handle this logic, for example ShortCircuitOperator or BranchPythonOperator.
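As a rough illustration of the branching approach, here is a minimal sketch of a callable that could be handed to a BranchPythonOperator. The conf key run_extra_step and the task ids extra_step and skip_extra_step are hypothetical names chosen for this example; the operator wiring itself is not shown.

```python
from types import SimpleNamespace


def choose_branch(conf):
    """Return the task_id to follow, based on a Boolean flag in dag_run.conf."""
    # conf may be None or empty on scheduled runs, so fall back to False.
    # Assumes the flag arrives as a JSON boolean (true/false), not a string.
    run_extra = bool((conf or {}).get("run_extra_step", False))
    return "extra_step" if run_extra else "skip_extra_step"


def branch_callable(**context):
    # Airflow passes the DagRun in the task context; its .conf attribute
    # holds the JSON payload given at trigger time.
    return choose_branch(context["dag_run"].conf)


# Stand-in for the context Airflow would provide (for local experimentation only).
fake_context = {"dag_run": SimpleNamespace(conf={"run_extra_step": True})}
print(branch_callable(**fake_context))
```

The task ids returned by the callable must match tasks placed directly downstream of the branching operator; the branch that is not returned gets skipped.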
I read the API reference and couldn't find anything on it. Is that possible?
Currently, there is no feature that does this out of the box, but you can write some custom code in your DAG to get around it. For example, use a PythonOperator (or a MySQL operator if your metadata DB is MySQL) to get the status of the last X runs of the DAG.
Then use a BranchPythonOperator to check whether the number of failures exceeds your threshold; if it does, use a BashOperator to run the CLI pause command (airflow pause <dag_id> in Airflow 1.x, airflow dags pause <dag_id> in Airflow 2.x).
You can also reduce this to two tasks by moving the PythonOperator logic into the BranchPythonOperator. This is just an idea; you can use different logic.
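The decision step of that idea could look roughly like the sketch below. It only covers the pure logic; how the recent run states are fetched (metadata DB query, REST call, and so on) is left out, and the task ids pause_dag and continue_dag are hypothetical.

```python
def count_failures(recent_states):
    """Count how many of the given DAG run states are 'failed'."""
    return sum(1 for state in recent_states if state == "failed")


def choose_next_task(recent_states, max_failures=3):
    """Return the task_id a BranchPythonOperator should follow.

    recent_states: list of state strings for the last X runs,
                   e.g. ["success", "failed", "failed"].
    max_failures:  pause only when the failure count is strictly greater.
    """
    if count_failures(recent_states) > max_failures:
        return "pause_dag"      # hypothetical BashOperator running the pause command
    return "continue_dag"       # hypothetical task that continues the normal flow


print(choose_next_task(["failed", "failed", "success", "failed", "failed"]))
```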
I would like to kick off dags on a remote webserver. These dags require arguments in order to make sense. Locally, I use a command like this:
airflow trigger_dag dag_id --conf '{"parameter":"~/path" }'
The problem is that this assumes I'm running locally. How can I trigger a DAG on a remote Airflow server with arguments? I realize I could use the UI to hit the play button, but as far as I am aware that doesn't allow you to pass arguments.
Example url:
http://localhost:8080/api/experimental/dags/<dag_id>/dag_runs
Example POST payload (application/json):
{"conf":"{\"client\":\"popsicle\"}"}
Note that the embedded conf object must be a string, not an object.
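A minimal sketch of building that request from Python with only the standard library is shown below. The function names, the base URL and the dag id are assumptions made for the example, and any authentication your server requires is not handled. Note the double json.dumps, which produces the string-valued conf this experimental endpoint expects (the newer stable REST API in Airflow 2.x uses a different URL and accepts conf as a JSON object).

```python
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf):
    """Build a POST request for the experimental dag_runs endpoint."""
    url = f"{base_url}/api/experimental/dags/{dag_id}/dag_runs"
    # The inner json.dumps turns conf into a string; the outer one encodes
    # the whole payload, so "conf" ends up as a JSON string, not an object.
    body = json.dumps({"conf": json.dumps(conf)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_dag(base_url, dag_id, conf):
    """Send the request and return the raw response body as text."""
    request = build_trigger_request(base_url, dag_id, conf)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Inspect the payload without contacting any server.
req = build_trigger_request("http://localhost:8080", "my_dag", {"client": "popsicle"})
print(req.full_url)
print(req.data.decode("utf-8"))
```

Calling trigger_dag("http://localhost:8080", "my_dag", {"client": "popsicle"}) would actually send the request, so it only makes sense against a reachable webserver.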
Is it possible to pass parameters to Airflow's jobs through UI?
AFAIK, 'params' argument in DAG is defined in python code, therefore it can't be changed at runtime.
Depending on what you're trying to do, you might be able to leverage Airflow Variables. These can be defined or edited in the UI under the Admin tab. Then your DAG code can read the value of the variable and pass the value to the DAG(s) it creates.
Note, however, that although Variables let you decouple values from code, all runs of a DAG will read the same value for the variable. If you want runs to be passed different values, your best bet is probably to use Airflow templating macros and differentiate between runs with the run_id macro or similar.
Two ways to change your DAG behavior:
Use Airflow Variables, as mentioned by Bryan in his answer.
Use Airflow JSON Conf to pass JSON data to a single DAG run. JSON can be passed either from
UI - manual trigger from tree view
UI - create new DAG run from browse > DAG runs > create new record
or from
CLI
airflow trigger_dag 'MY_DAG' -r 'test-run-1' --conf '{"exec_date":"2021-09-14"}'
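Within the DAG, this JSON can be accessed using Jinja templates or, in an operator's callable, through the context parameter.
# Airflow 1.x import paths; in Airflow 2.x use
# airflow.operators.bash and airflow.operators.python instead.
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# `dag` is assumed to be a DAG object defined earlier in the file.

def do_some_task(**context):
    # dag_run.conf holds the JSON passed at trigger time
    print(context['dag_run'].conf['exec_date'])

task1 = PythonOperator(
    task_id='task1_id',
    provide_context=True,  # only needed on Airflow 1.x
    python_callable=do_some_task,
    dag=dag,
)

# access in templates
task2 = BashOperator(
    task_id="task2_id",
    # echo the value; the bare template would be executed as a command
    bash_command="echo {{ dag_run.conf['exec_date'] }}",
    dag=dag,
)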
Note that the JSON conf will not be present during scheduled runs. The best use case for JSON conf is to override the default DAG behavior. Hence set meaningful defaults in the DAG code so that during scheduled runs JSON conf is not used.
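One way to set such defaults is to merge the trigger-time conf over a dictionary of default values inside the task callable. The sketch below is only an illustration: DEFAULT_CONF, its contents and the helper name resolve_conf are hypothetical, and a real DAG would call the helper with context['dag_run'].conf.

```python
# Hypothetical defaults used when no JSON conf is supplied (scheduled runs).
DEFAULT_CONF = {"exec_date": "2021-09-14"}


def resolve_conf(dag_run_conf, defaults=DEFAULT_CONF):
    """Return the defaults overridden by any keys present in dag_run_conf."""
    merged = dict(defaults)              # copy so the defaults stay untouched
    merged.update(dag_run_conf or {})    # conf may be None or {} on scheduled runs
    return merged


def do_some_task(**context):
    conf = resolve_conf(context["dag_run"].conf)
    print(conf["exec_date"])


# Scheduled run: no conf, so the default is used.
print(resolve_conf(None))
# Manual run: the value passed with --conf wins.
print(resolve_conf({"exec_date": "2022-01-01"}))
```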