How to run an Airflow DAG on a given date? - airflow

When we give the date 22-09-2022 to the DAG, the DAG runs on 21-09-2022. Is there any module we can add to the DAG code so that the DAG runs only on the given date?

Related

airflow DAG triggering for time consuming runs

I am completely new to Airflow and am trying to grasp the concepts of scheduling and default args.
I have a scenario where I would like to schedule my DAG hourly to do some data transfer task between a source and a database. What I am trying to understand is this: let's say one of my DAG runs was triggered at 00:00. If it takes more than an hour for this run to complete all of its tasks (say 1 hour 30 min), does that mean the DAG run that was supposed to be triggered at 01:00 will NOT get triggered, but the DAG run from 02:00 will?
Yes.
To avoid this, you need catchup=True on the DAG object.
Reference: https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html
(Search for "Catchup".)
The Airflow scheduler monitors all DAGs and tasks in Airflow. Default arguments (default_args) supply default parameters to every task in a DAG.
The first DAG run is created based on start_date, and subsequent runs follow sequentially according to schedule_interval. The scheduler doesn't trigger a run until its period has ended. For your requirement, you can set catchup=True on the DAG so that a run is created for each completed interval, and the scheduler will execute them sequentially. With catchup enabled, the scheduler starts a DAG run for every data interval since the last one that has not yet been run.
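The rule described above (a run only becomes eligible once its interval has ended) can be sketched in plain Python without Airflow installed. This is a simplified model, not Airflow's implementation: the helper name completed_intervals and the sample dates are assumptions made up for illustration. It lists every interval that has fully ended by a given time, which is roughly the set of runs catchup=True would create.

```python
from datetime import datetime, timedelta


def completed_intervals(start_date, interval, now):
    """Return (interval_start, interval_end) pairs that have fully ended by `now`.

    Simplified model (an assumption, not Airflow code): a run covering
    [start, end) only becomes eligible at `end`.
    """
    runs = []
    start = start_date
    while start + interval <= now:
        runs.append((start, start + interval))
        start += interval
    return runs


# Hypothetical hourly DAG starting at midnight, inspected at 02:30.
runs = completed_intervals(
    start_date=datetime(2022, 9, 22, 0, 0),
    interval=timedelta(hours=1),
    now=datetime(2022, 9, 22, 2, 30),
)
for start, end in runs:
    print(f"interval {start:%H:%M}-{end:%H:%M} becomes eligible at {end:%H:%M}")
```

In this model the 02:00-03:00 interval is not listed, because it has not ended yet at 02:30.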

Define maintenance windows in airflow

I am trying to define periods of time where the scheduler will not mark tasks as ready.
For example, I have a DAG with a lengthy backfill period. The backfill will run throughout the day until the DAG is caught up. However, I do not want any executions of the DAG to run between midnight and 2 am.
Is it possible to achieve this through configuration?

Why is the Airflow scheduler not triggering my DAG?

I was trying to schedule an Airflow DAG so that, given a start date, it is triggered after every scheduled interval. This is what my DAG definition looks like:
with DAG('Some Dag',
         start_date=datetime(year=2020, month=12, day=23),
         schedule_interval=timedelta(days=1, hours=5, minutes=0),
         catchup=False) as dag:
Doesn't it mean that the DAG will run on 2020/12/24 at 05:00:00 UTC? But in reality, it isn't triggering at that moment.
I know I can easily achieve this via a Cron expression. But I want to do it without a Cron expression.
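The date arithmetic the question relies on can be checked with the standard library alone. This is only a sketch of that arithmetic, assuming the simple rule that a run fires when its interval ends and that start_date is in UTC; it does not model catchup=False or any other real scheduler behaviour, and the variable names first_interval_end and second_interval_end are illustrative.

```python
from datetime import datetime, timedelta

# Values copied from the DAG definition in the question.
start_date = datetime(year=2020, month=12, day=23)
schedule_interval = timedelta(days=1, hours=5, minutes=0)

# Assumption: the first interval is [start_date, start_date + schedule_interval)
# and a run would fire when that interval ends.
first_interval_end = start_date + schedule_interval
second_interval_end = first_interval_end + schedule_interval

print(first_interval_end.isoformat())
print(second_interval_end.isoformat())
```

Because the interval is 29 hours rather than 24, each subsequent end time drifts five hours later in the day, which a cron expression would not do.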

how to get airflow DAG execution date using command line?

In order to get the dag_state, I run the following CLI command:
airflow dag_state example_bash_operator '12-12T16:04:46.960661+00:00'
The trouble is - I have to explicitly pass the exact date-time (i.e. execution_date) to this command.
When I run airflow list_dags, I only get a listing of DAGs, not their execution dates.
Is there a way to obtain the exact date time (i.e. -> '12-12T16:04:46.960661+00:00')
for a given dag, using command line CLI?
There's a conceptual issue here. Dags are objects that have schedules, not execution dates. When the schedule is due, DagRuns are created for that Dag with the appropriate execution_date.
So you can ask for the state of a DagRun using the CLI and providing the execution_date, because execution dates (almost uniquely) map to a specific DagRun. Almost uniquely because in practice you can trigger two DagRuns with the same execution_date, but that's an unusual scenario.
But if you ask for the execution_date of a Dag, what do you really want to know? The execution_date of the most recently created DagRun? The list of execution_dates for the currently running DagRuns?
You can check the airflow list_dag_runs dag_id CLI command and see if you can filter its output to your needs.

Retrieve scheduled time for catchup=True in airflow

I have a DAG that is scheduled to call a script daily, passing the current date, so I pass {{ ds }} to get the execution date.
For days when my DAG doesn't run, I have catchup=True.
So the DAG needs to pass the scheduled date, not the execution date, so that the activity for the day on which the DAG was unable to run is still completed.
How can I do this?
As far as I understand your scenario, the execution_date is exactly what you need.
The name might be a bit misleading, but it is in fact filled with the scheduled timestamp.
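To illustrate the point, here is a small plain-Python sketch (no Airflow required) of the YYYY-MM-DD strings that {{ ds }} would carry for a series of daily catch-up runs. The helper name ds_values and the sample dates are assumptions for illustration only; the idea is that each catch-up run receives the date it was scheduled for, not the date on which it finally executes.

```python
from datetime import date, timedelta


def ds_values(first_missed, last_missed):
    """Yield the YYYY-MM-DD string for each daily run from first_missed to last_missed."""
    day = first_missed
    while day <= last_missed:
        yield day.isoformat()
        day += timedelta(days=1)


# Hypothetical outage: the daily runs for 20, 21 and 22 September were missed
# and are all executed later by catchup.
missed = list(ds_values(date(2022, 9, 20), date(2022, 9, 22)))
print(missed)
```

Each string in the list corresponds to one catch-up run, so the script called by the task sees the missed day's date rather than today's.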
