How to combine Airflow operators?

Almost all of my DAGs share a subset of repeating Operators. Because of my use case, it works out really well to create new wrapper Operators that combine multiple Operators in order to reduce boilerplate.
My question is how to go about combining them?
An example would be to query a database followed by sending a Slack message.
postgres_operator_task >> slack_operator_task -> query_then_slack_operator_task

Check out the idea of SubDAGs.
Resources:
SubDAGs - Airflow Documentation
Using SubDAGs in Airflow - Astronomer
Example SubDAG
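A minimal sketch of the SubDAG approach for the query-then-Slack example above. The function, connection id, and channel names are illustrative, and the import paths follow the Airflow 1.x-era style used in the linked resources; adjust for your version.

```python
# Hypothetical SubDAG factory that bundles the two repeating tasks:
# run a Postgres query, then post a Slack message.
from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator
from airflow.operators.slack_operator import SlackAPIPostOperator


def query_then_slack_subdag(parent_dag_id, child_task_id, default_args, sql):
    """Return a DAG usable as a SubDAG: run `sql`, then notify Slack."""
    dag = DAG(
        dag_id=f"{parent_dag_id}.{child_task_id}",  # SubDAG naming convention
        default_args=default_args,
        schedule_interval=None,
    )
    query = PostgresOperator(
        task_id="run_query",
        postgres_conn_id="my_postgres",  # assumed connection id
        sql=sql,
        dag=dag,
    )
    notify = SlackAPIPostOperator(
        task_id="notify_slack",
        channel="#data",                 # assumed channel
        text="Query finished.",
        dag=dag,
    )
    query >> notify
    return dag
```

In the parent DAG you would then wrap the factory's result in a `SubDagOperator` (from `airflow.operators.subdag_operator`), passing `subdag=query_then_slack_subdag(...)`, so the pair appears as a single task wherever it repeats.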

Related

Importing a hardcoded code logic into several airflow dag scripts

There is a hardcoded SQL code logic (a combination of joins across multiple tables plus filter conditions) which I want to import into several Airflow DAG scripts, so that if any change needs to be made to that SQL logic, I can make it in a single location. This hardcoded SQL logic will be used as a parameter value in an operator that appears in several DAG scripts. What is the best approach for a viable solution?
Thanks in advance!
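One common way to do this is to keep the SQL in a single shared module on the scheduler's `PYTHONPATH` (for example `dags/common/queries.py`) and import it from every DAG script. A minimal sketch, with made-up table and column names:

```python
# Shared module holding the hardcoded SQL logic in one place.
# Every DAG script imports from here, so edits happen in a single location.
JOINED_FILTERED_SQL = """
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'COMPLETE'
"""


def get_joined_filtered_sql(status="COMPLETE"):
    """Return the shared SQL, optionally swapping in a different status filter."""
    return JOINED_FILTERED_SQL.replace("'COMPLETE'", f"'{status}'")
```

In each DAG script you would then do `from common.queries import get_joined_filtered_sql` and pass the result to the operator's `sql` parameter. An alternative is to keep the query in a `.sql` file and point the DAG's `template_searchpath` at its directory, letting the operator template it in.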

Limitation of tags in an Airflow DAG

I'm looking for the best practice of tags in Airflow.
Is there any limitation on the size of a tag name?
How many tags are reasonable for an Airflow DAG?
what's a good tag example vs using the naming convention of the DAG's name? For example, which one is better: all Ads teams' dags are tagged with "Ads" or named as ads_XXX_XXX?
Thanks
The max length of a tag is 100 characters.
As far as a tagging strategy goes, it's entirely up to you; not using tags is also a valid option. There is no right or wrong when it comes to naming conventions and tags. Use whatever makes the most sense to you.
Tags are used to help filter DAGs so you do not have to rely on the name. You could filter by team, by task type (a "downloader" DAG vs. a "loading data" DAG, etc.), or any other arbitrary grouping you think would be useful.
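For illustration, tags are just a list on the DAG object; the DAG id and tag values below are made up:

```python
# Tagging a DAG so it can be filtered in the Airflow UI without
# encoding the team into the DAG name itself.
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="ads_daily_report",       # name can stay descriptive of the job
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    tags=["ads", "reporting"],       # each tag is limited to 100 characters
) as dag:
    ...
```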

Does Apache Airflow have a rule editor?

Does Apache Airflow have a rule editor where we can define multiple rules like "(x>y) && (z==a)" and later integrate these rules into the workflow steps, similar to the Drools workbench editor?
No. Depending on what you need to implement, you can use Variables to store some data; they mainly hold strings or a JSON structure.
Using those, you can define some kind of structure that is read inside the DAG. You can even create different tasks dynamically from the value of a Variable.
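A sketch of that pattern, assuming a Variable named `processing_rules` holding a JSON structure (both the Variable name and the JSON shape are made up):

```python
# Build tasks dynamically from a JSON structure stored in an Airflow Variable,
# e.g. {"tables": ["orders", "customers"]} set via the UI or CLI.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

# Read at DAG-parse time; fall back to an empty rule set if unset.
rules = Variable.get(
    "processing_rules",
    default_var={"tables": []},
    deserialize_json=True,
)

with DAG(
    dag_id="dynamic_from_variable",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    for table in rules.get("tables", []):
        PythonOperator(
            task_id=f"process_{table}",
            # Bind the loop variable as a default argument.
            python_callable=lambda t=table: print(f"processing {t}"),
        )
```

Changing the Variable's JSON then changes which tasks exist the next time the scheduler parses the file, with no code upload needed.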

Can an Airflow DAG schedule be altered in The Airflow UI?

I'm new to Apache Airflow. So far, I have been modifying the schedule_interval and re-uploading the Python script each time I want to change the execution time.
Can I change a DAG's schedule without uploading a new Python script?
Thanks
There is an Airflow Plugin which allows for (visual) Dag-Generation and Modification, see here. It seems to be outdated and not very actively developed, though.
The general idea of Airflow is, roughly speaking, ETL-as-code, including benefits like code versioning, so you need to be aware of the problems arising from redefining such a central aspect as the schedule from the UI. For example, if you could edit the schedule in the UI (which would not alter the code itself), what would be the state of your DAG? However, it's certainly not impossible, and Airflow's design allows for such modifications.
tl;dr: One could of course customize the UI (see above, e.g. using Airflow plugins), and in fact your requirement is very understandable, especially to account for non-technical users who can't upload or modify code.
Another, probably easier option might be to use Variables in Airflow, i.e. pull the schedule (a cron-like schedule string such as 1 * * * *, or a preset like @daily) from an Airflow Variable; such Variables can be altered in the UI, so this might work out for you.
The Variable option doesn't work, at least in Airflow 2.0.1. The message:
Broken DAG: [/opt/airflow/dags/ADB-365/big-kahuna-365-adb-incremental.py] Invalid Cron expression: Exactly 5 or 6 columns has to be specified for iteratorexpression.
appears in the main view.
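For reference, a sketch of the Variable-based schedule (the Variable name is an assumption). The Variable is read at DAG-parse time, and its value must be a clean 5-field cron string or a preset such as @daily; the "Invalid Cron expression" error above typically means the stored value isn't valid cron (for example daily without the @, or stray quotes/whitespace), so normalizing the value may help.

```python
# Pull the schedule from an Airflow Variable so it can be edited in the UI.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable

# strip() guards against stray whitespace in the stored value, one possible
# cause of "Invalid Cron expression" at parse time.
schedule = Variable.get("my_dag_schedule", default_var="@daily").strip()

with DAG(
    dag_id="schedule_from_variable",
    start_date=datetime(2021, 1, 1),
    schedule_interval=schedule,
) as dag:
    ...
```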

In airflow, is there a good way to call another dag's task?

I've got dag_prime and dag_tertiary.
dag_prime: Scans through a directory and intends to call dag_tertiary on each one. Currently a PythonOperator.
dag_tertiary: Scans through the directory passed to it and does (possibly time-intensive) calculations on the contents thereof.
I can call dag_tertiary via a system call from the Python operator, but I feel like there has to be a better way. I'd also like to consider queuing the dag_tertiary calls, if there's a simple way to do that. Is there a better way than using system calls?
Thanks!
Use TriggerDagRunOperator from airflow.operators.trigger_dagrun for calling one DAG from another.
The details can be found in the TriggerDagRunOperator Airflow documentation.
The following post gives a good example of using this operator:
https://www.linkedin.com/pulse/airflow-lesson-1-triggerdagrunoperator-siddharth-anand
Use TriggerDagRunOperator from airflow.operators.dagrun_operator and pass the other DAG's name to the trigger_dag_id parameter.
See the updated TriggerDagRunOperator Airflow documentation.
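A sketch of dag_prime triggering dag_tertiary once per discovered directory. The directory list and DAG ids are illustrative; the import path shown is the Airflow 2.x one (in 1.10 the operator lived in airflow.operators.dagrun_operator). Triggered runs are queued and picked up by the scheduler, which also addresses the queuing wish above.

```python
# dag_prime: fan out one dag_tertiary run per directory via TriggerDagRunOperator.
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Normally produced by the directory scan; hardcoded here for illustration.
DIRECTORIES = ["/data/incoming/a", "/data/incoming/b"]

with DAG(
    dag_id="dag_prime",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    for i, path in enumerate(DIRECTORIES):
        TriggerDagRunOperator(
            task_id=f"trigger_tertiary_{i}",
            trigger_dag_id="dag_tertiary",
            conf={"directory": path},  # dag_tertiary reads this from dag_run.conf
        )
```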
