Airflow: set DAG run note

How do I use the note shown in the DAG runs panel of the UI?
I'd like to fill it programmatically, for example changing its content depending on the params passed to the DAG run.
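Notes on DAG runs were introduced in Airflow 2.5, and the stable REST API exposes an endpoint for setting them. A minimal sketch (assumes Airflow 2.5+; the host, credentials, dag_id, and run_id are illustrative):

import requests

# PATCH /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/setNote updates the note
# shown in the DAG runs panel.
resp = requests.patch(
    "http://localhost:8080/api/v1/dags/my_dag/dagRuns/manual__2024-01-01T00:00:00+00:00/setNote",
    json={"note": "Triggered with params: ..."},
    auth=("admin", "admin"),  # illustrative basic-auth credentials
)
resp.raise_for_status()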

Related

How can I pass a variable to a single Airflow task instance in the UI

I run Airflow on Kubernetes, so I don't want a solution involving CLI commands; ideally everything should be doable via the GUI.
I have a task and want to inject a variable into its command only when triggering it manually. I can achieve this with Airflow Variables, but then the user has to create and afterwards reset the variable.
With Variables it might look like:
from airflow.models import Variable

# Falls back to False when the Variable has not been set.
flag = Variable.get("NAME_OF_VARIABLE", False)
append_args = "--injected-argument" if flag == "True" else ""
Or you could use jinja templating.
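For example, the flag could be read directly in a templated field (a sketch; the BashOperator import path and var.value.get assume Airflow 2.x, and the command is illustrative):

from airflow.operators.bash import BashOperator

run_command = BashOperator(
    task_id="run_command",
    # var.value.get falls back to the default when the Variable is unset.
    bash_command=(
        "mycommand"
        " {{ '--injected-argument' if var.value.get('NAME_OF_VARIABLE', 'False') == 'True' else '' }}"
    ),
)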
Is there a way to inject variables one off to the task without the CLI?
There's no way to pass a value to one single task in Airflow, but you can trigger a DAG and provide a JSON object for that one single DAG run.
The JSON object is accessible when templating as {{ dag_run.conf }}.
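For example, a manually triggered run can be given conf like {"injected_argument": true}, which a templated field then reads (a sketch; the import path assumes Airflow 2.x, where dag_run.conf defaults to an empty dict on scheduled runs):

from airflow.operators.bash import BashOperator

use_conf = BashOperator(
    task_id="use_conf",
    # .get() keeps scheduled runs (with empty conf) safe.
    bash_command="mycommand {{ '--injected-argument' if dag_run.conf.get('injected_argument') else '' }}",
)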

Airflow: Set dag to not be automatically scheduled

When I create a new dag, I have to go into the UI and click on the 'schedule' toggle to turn scheduling off. How can I do this without needing to use the UI? Is there an option in the DAG constructor itself?
In other words: how do I set those toggles to 'Off' from my DAG file?
There is no way to set a DAG as disabled within a DAG file. You can mimic the behavior by temporarily setting the DAG's schedule_interval to None. You can also set the Airflow configuration value dags_are_paused_at_creation to True if you want all new DAGs to be off by default. You'll then need to turn new DAGs on manually in the UI when they are ready to be scheduled.
You can set is_paused_upon_creation=True:
DAG(
    dag_id=dag_id,
    schedule_interval='@once',
    # ... other DAG arguments ...
    is_paused_upon_creation=True,
)
There's no way to set this within the DAG file, but if you're trying to enable or disable a large number of DAGs you can run an UPDATE statement against your Airflow metadata database: UPDATE dag SET is_paused = TRUE;

Finding out what triggered a task run programmatically

Is there a way to programmatically determine what triggered the current task run of the PythonOperator from inside of the operator?
I want to differentiate between the task runs triggered on schedule, those catching up, and those triggered by the backfill CLI command.
The template context contains two variables, dag_run and run_id, that you can use to determine whether the run was scheduled, a backfill, or externally triggered.
def python_target(**context):
    # Set when the run was created by the backfill CLI command.
    is_backfill = context["dag_run"].is_backfill
    # Set when the run was triggered externally rather than by the scheduler.
    is_external = context["dag_run"].external_trigger
    # A scheduled run that is not catching up has the latest execution date.
    is_latest = context["execution_date"] == context["dag"].latest_execution_date
    # More code...
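A sketch of how this callable might be wired into a DAG (the import path and automatic context passing assume Airflow 2.x; the dag object is assumed to be defined elsewhere in the file):

from airflow.operators.python import PythonOperator

check_trigger = PythonOperator(
    task_id="check_trigger",
    python_callable=python_target,  # the callable shown above
    dag=dag,
)

The run_id in the context also encodes the trigger type as a prefix, such as scheduled__ or manual__.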

How to parameterize DAGs in Airflow from the UI?

Context: I've defined an Airflow DAG which performs an operation, compute_metrics, on some data for an entity based on a parameter called org. Underneath, something like myapi.compute_metrics(org) is called. This flow will mostly be run on an ad hoc basis.
Problem: I'd like to be able to select the org to run the flow against when I manually trigger the DAG from the airflow UI.
The most straightforward solution I can think of is to generate n different DAGs, one for each org. The DAGs would have ids like: compute_metrics_1, compute_metrics_2, etc... and then when I need to trigger compute metrics for a single org, I can pick the DAG for that org. This doesn't scale as I add orgs and as I add more types of computation.
I've done some research and it seems that I can create a Flask blueprint for Airflow which, to my understanding, extends the UI. In this extended UI I can add input components, like a text box, for picking an org, and then pass that as conf to a DagRun manually created by the blueprint. Is that correct? I'm imagining I could write something like:
from datetime import datetime

from airflow import settings
from airflow.models import DagRun
from airflow.utils.state import State

session = settings.Session()
execution_date = datetime.now()
run_id = 'external_trigger_' + execution_date.isoformat()
trigger = DagRun(
    dag_id='general_compute_metrics_needs_org_id',
    run_id=run_id,
    state=State.RUNNING,
    execution_date=execution_date,
    external_trigger=True,
    conf=org_ui_component.text,  # pass the org id from a component in the blueprint
)
session.add(trigger)
session.commit()  # I don't know if this would actually be scheduled by the scheduler
Is my idea sound? Is there a better way to achieve what I want?
Regarding "a flask blueprint for airflow, which to my understanding, extends the UI": the blueprint extends the API. If you want some UI for it, you'll need to serve a template view. The most feature-complete way of achieving this is developing your own Airflow Plugin.
If you want to manually create DagRuns, you can use Airflow's own trigger_dag function as a reference. For simplicity, though, I'd trigger the DAG via the API.
And specifically about your problem, I would have a single compute_metrics DAG that reads the org from an Airflow Variable. Variables are global and can be set dynamically. You can prefix the variable name with something like the DagRun id to make it unique and therefore safe under concurrent runs.
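A minimal sketch of that Variable approach (myapi is the hypothetical module from the question; the variable naming scheme is illustrative):

from airflow.models import Variable

import myapi  # hypothetical module from the question

def compute_metrics(**context):
    # One Variable per run, keyed by run_id, keeps concurrent runs independent.
    org = Variable.get("org_%s" % context["run_id"], default_var=None)
    if org is not None:
        myapi.compute_metrics(org)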

Airflow: Dynamic SubDag creation

I have a use case where I have a list of clients. Clients can be added to or removed from the list, and they can have different start dates and different initial parameters.
I want to use Airflow to backfill all data for each client based on their initial start date, and to rerun if something fails. I am thinking about creating a SubDag for each client. Will this address my problem?
How can I dynamically create SubDags based on the client_id?
You can definitely create DAG objects dynamically:
from airflow import DAG

def make_client_dag(parent_dag, client):
    # SubDAG ids must follow the parent_dag_id.task_id naming convention.
    return DAG(
        '%s.client_%s' % (parent_dag.dag_id, client.name),
        start_date=client.start_date,
    )
You could then use that method in a SubDagOperator from your main dag:
from airflow.operators.subdag_operator import SubDagOperator

for client in clients:
    SubDagOperator(
        task_id='client_%s' % client.name,
        dag=main_dag,
        subdag=make_client_dag(main_dag, client),
    )
This will create a subdag specific to each member of the collection clients, and each will run for the next invocation of the main dag. I'm not sure if you'll get the backfill behavior you want.
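For reference, the clients collection the loop above assumes could be as simple as (illustrative names and dates):

from collections import namedtuple
from datetime import datetime

Client = namedtuple("Client", ["name", "start_date"])
clients = [
    Client(name="acme", start_date=datetime(2017, 1, 1)),
    Client(name="globex", start_date=datetime(2017, 6, 1)),
]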
