Importing hardcoded SQL logic into several Airflow DAG scripts - airflow

There is a hardcoded piece of SQL logic (a combination of joins across multiple tables plus filter conditions) that I want to import into several Airflow DAG scripts, so that if the SQL logic ever needs to change, I can make the change in a single location. This hardcoded SQL logic will be used as a parameter value in an operator that appears in several DAG scripts. What is the best approach for a viable solution?
Thanks in Advance!!
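One common approach (a sketch, not from the original thread; the module, table, and connection names below are illustrative): keep the SQL in a shared Python module inside the DAGs folder and import it from every DAG file, so the logic lives in exactly one place.

```python
# common/sql_logic.py -- shared module alongside the DAGs (illustrative path and names)
# Edit the SQL once here; every DAG that imports it picks up the change.
REPORTING_SQL = """
SELECT o.order_id, c.customer_name, o.amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'COMPLETE'
  AND o.order_date >= '{{ ds }}'
"""
```

```python
# one_of_the_dags.py -- any DAG file can then pass the shared SQL as an operator parameter
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

from common.sql_logic import REPORTING_SQL

with DAG("reporting_dag", start_date=datetime(2021, 1, 1), schedule_interval="@daily") as dag:
    run_sql = PostgresOperator(
        task_id="run_reporting_sql",
        postgres_conn_id="my_postgres",  # assumed connection id
        sql=REPORTING_SQL,
    )
```

An alternative with the same effect is to keep the logic in a .sql file and point the DAG's template_searchpath at its directory, since the sql parameter of the SQL operators is template-aware.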

Related

Does Apache Airflow have a rule editor?

Does Apache Airflow have a rule editor where we can define multiple rules like "(x>y) && (z==a)" and later integrate those rules into workflow steps, similar to the Drools Workbench editor?
No. Depending on what you need to implement, you can use Variables to store some data; they mainly hold strings or a JSON structure.
Using those, you can define some kind of structure that is read inside the DAG. You can even create different tasks dynamically from the value of a Variable, as in the sketch below.
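A minimal sketch of that idea, assuming a JSON Variable named tables_to_process (the Variable name and the tasks are illustrative):

```python
# Build tasks dynamically from a JSON Airflow Variable.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator

# Assumes an Airflow Variable "tables_to_process" holding e.g. ["orders", "customers"]
tables = Variable.get("tables_to_process", default_var=[], deserialize_json=True)

with DAG("dynamic_from_variable", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    for table in tables:
        BashOperator(
            task_id=f"process_{table}",
            bash_command=f"echo processing {table}",
        )
```

Keep in mind that a module-level Variable.get runs on every DAG parse, which adds a little scheduler load.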

How to add Flyway to an existing database without a DDL script?

I currently have a Kotlin-Exposed project that I would like to add Flyway to. The problem I am having is that most documentation and answers online indicate that the best way to add Flyway to an existing schema is to have the first script be a data definition script. That would usually work, but since I'm dynamically generating my SQL with an ORM, it doesn't really make sense here. Are there ways around this?
I really just want to use Flyway to add/delete persistent data that I will always need in certain tables. I don't want to insert it at the ORM level, because if the application is run multiple times, it would insert the data each time it's run (as opposed to Flyway, where it will just migrate the database to the newest constructed state).
I think another way to word this question is: "Can I use Flyway for static data only, and not schema?"
Yes, you can.
Some info:
You are not required to have a first script containing the data definition / "baseline" of the schema. You can simply skip that.
When running Flyway against a non-empty database for the first time, you will still need to run the baseline command. In your case this will simply indicate to Flyway that it can assume the baseline schema is present and that it's safe to run migrations. (Your baseline schema was deployed by the ORM instead of a baseline script -- that's totally fine, Flyway won't check/doesn't care.)
You could write the scripts that insert static data so that they are idempotent (e.g. with a guard clause) and don't insert the data twice. That way they would also be safe to run at the ORM level if you chose to.
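For illustration, such a guard clause in a versioned migration could look like this (the table, columns, and file name are made up; the syntax is PostgreSQL-flavoured, and e.g. Oracle would need FROM dual):

```sql
-- V2__add_static_reference_data.sql (illustrative file name)
-- The NOT EXISTS guard makes the insert idempotent: re-running it is a no-op.
INSERT INTO currencies (code, name)
SELECT 'EUR', 'Euro'
WHERE NOT EXISTS (SELECT 1 FROM currencies WHERE code = 'EUR');
```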

Is there any way to execute repeatable Flyway scripts first?

We have been using Flyway for years to maintain our DB scripts, and it does a wonderful job.
However, there is one situation where I am not really happy; perhaps someone out there has a solution:
In order to reduce the number of scripts required (and to keep an overview of where our procedures are defined), I'd like to implement our functions/procedures in one script. Every time a procedure changes (or a new one is developed), this script would be updated. Repeatable scripts sound perfect for this purpose, but unfortunately they are not.
The drawback is that a new procedure cannot be referenced by versioned (non-repeatable) scripts: repeatable scripts are executed last, so the procedure does not yet exist when the versioned script runs.
I hoped I could control this by specifying different locations (e.g. loc_first containing the repeatables I want executed first, loc_normal for the standard scripts and the repeatables to be executed last).
Unfortunately, the order of locations has no impact on execution order.
What's the proper way to deal with this situation? Right now I have to define the corresponding procedures in non-repeatable scripts, but that's exactly what I'd like to avoid.
I found a workaround on my own: I use Flyway directly from Maven (the same would work with the API, of course), and each stage of my Maven build has its own profile (specifying the URL etc.).
Now I create two profiles for every stage, so I have e.g. dev and devProcs.
The difference between the two Maven profiles is that the "[stage]Procs" profile operates on a different location (containing only the repeatable scripts that maintain procedures). I then need to execute Flyway twice: first with [stage]Procs, then with [stage].
To me this looks a bit messy, but at least I can maintain my procedures in a repeatable script this way.
According to the Flyway docs, repeatable migrations always execute after versioned migrations.
But I guess you can use Flyway callbacks. It looks like the beforeMigrate.sql callback is exactly what you need.
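Flyway picks callback scripts up by file name from the configured locations, so a sketch could look like this (the procedure body is made up; the syntax shown is PostgreSQL's):

```sql
-- beforeMigrate.sql -- lives in a Flyway location and runs before every migrate
-- (Re)defining procedures here means versioned migrations in the same run can call them.
CREATE OR REPLACE PROCEDURE recalc_totals()
LANGUAGE SQL
AS $$
    UPDATE order_totals SET amount = 0 WHERE amount IS NULL;
$$;
```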

Can an Airflow DAG schedule be altered in The Airflow UI?

I'm new to Apache Airflow. Each time I want to change the execution time, I have been modifying the schedule_interval and replacing the Python script.
Can I change the DAG schedule without uploading a new python script?
Thanks
There is an Airflow plugin which allows for (visual) DAG generation and modification, though it seems to be outdated and not very actively developed.
The general idea behind Airflow is, roughly speaking, ETL-as-code, with benefits like code versioning, so you need to be aware of the problems that arise from redefining something as central as the schedule from the UI. For example, if you could edit the schedule in the UI without altering the code itself, what would the state of your DAG be? Still, it's certainly not impossible, and Airflow's design allows for such modifications.
tl;dr: One could of course customize the UI (see above, e.g. using Airflow plugins), and your requirement is very understandable, especially to accommodate non-technical users who can't upload or modify code.
Another, probably easier, option is to use Airflow Variables: pull the cron-like schedule string (1 * * * *, @daily, etc.) from an Airflow Variable. Such Variables can be altered in the UI, so this might work out for you.
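A minimal sketch of the Variable approach (the Variable name is illustrative; note the caveats in the comments and in the follow-up below):

```python
# Read the schedule from an Airflow Variable so it can be edited in the UI.
# Caveats: a module-level Variable.get runs on every DAG parse, and an invalid
# value in the Variable will break the DAG at parse time.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator

with DAG(
    "ui_scheduled_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval=Variable.get("ui_dag_schedule", default_var="@daily"),
) as dag:
    BashOperator(task_id="do_work", bash_command="echo working")
```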
The Variable option doesn't work, at least in Airflow 2.0.1
The message:
Broken DAG: [/opt/airflow/dags/ADB-365/big-kahuna-365-adb-incremental.py] Invalid Cron expression: Exactly 5 or 6 columns has to be specified for iteratorexpression.
appears in the main view.

In Airflow, is there a good way to call another DAG's task?

I've got dag_prime and dag_tertiary.
dag_prime: Scans through a directory and intends to call dag_tertiary on each entry. Currently a PythonOperator.
dag_tertiary: Scans through the directory passed to it and does (possibly time-intensive) calculations on the contents thereof.
I can call dag_tertiary with a system call from the PythonOperator, but I feel like there's got to be a better way. I'd also like to consider queuing the dag_tertiary calls, if there's a simple way to do that. Is there a better approach than using system calls?
Thanks!
Use airflow.operators.trigger_dagrun for calling one DAG from another.
The details can be found in the Airflow documentation for the trigger_dagrun operator.
The following post gives a good example of using this operator:
https://www.linkedin.com/pulse/airflow-lesson-1-triggerdagrunoperator-siddharth-anand
Use TriggerDagRunOperator from airflow.operators.dagrun_operator and pass the other DAG's name to the trigger_dag_id parameter.
Follow the updated Airflow documentation for the operator.
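A minimal sketch using the Airflow 2 import path, airflow.operators.trigger_dagrun (the DAG names come from the question above; the conf payload is illustrative):

```python
# dag_prime triggers dag_tertiary, passing a payload the target can read via dag_run.conf
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG("dag_prime", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    trigger = TriggerDagRunOperator(
        task_id="trigger_dag_tertiary",
        trigger_dag_id="dag_tertiary",
        conf={"directory": "/data/incoming"},
    )
```

As for queuing: TriggerDagRunOperator fires runs that the scheduler then queues like any others, and the target DAG's max_active_runs controls how many execute concurrently.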
