Airflow DAG scheduling for end of month

I want to run a schedule on Airflow (v1.9.0).
My DAG needs to run at the end of every month, but I don't know how to write the schedule.
my_dag = DAG(dag_id=DAG_ID,
             catchup=False,
             default_args=default_args,
             schedule_interval='30 0 31 * *',
             start_date=datetime(2019, 7, 1))
But this won't work in a month where there's no 31st, right?
How can I write a schedule_interval to run at every end of the month?

You can do this by putting L in the day-of-month position (the third field) of your schedule_interval cron expression.
schedule_interval='59 23 L * *' # 23:59 on the last day of the month
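If the cron parser in a given installation turns out not to accept L, one possible fallback is to schedule the DAG daily and skip every day that is not the last of its month. The sketch below shows only the date check, in plain Python; the name is_last_day_of_month is hypothetical, and wiring it into a DAG (for example through a ShortCircuitOperator) is assumed rather than shown.

```python
import calendar
from datetime import date


def is_last_day_of_month(day: date) -> bool:
    """Return True when `day` is the final calendar day of its month."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, days_in_month = calendar.monthrange(day.year, day.month)
    return day.day == days_in_month


print(is_last_day_of_month(date(2019, 7, 31)))  # True
print(is_last_day_of_month(date(2019, 9, 30)))  # True
print(is_last_day_of_month(date(2019, 9, 29)))  # False
```

Because the check uses the calendar rather than a fixed day number, it also handles February in leap years.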

Related

DAG run as per timezone

I want to run my DAG on New York time. The data arrives on New York time, so the initial runs fail and the last runs' data is skipped: UTC is 4 hours ahead, so the UTC day ends 4 hours before the New York day does.
I have done
timezone = pendulum.timezone("America/New_York")
default_args = {
    'depends_on_past': False,
    'start_date': pendulum.datetime(2022, 5, 13, tzinfo=timezone),
    ...
}
test_dag = DAG(
    'test_data_import_tz', default_args=default_args,
    schedule_interval="0 * * * *", catchup=False)
The DAG code works with:
yesterday_report_date = context.get('yesterday_ds_nodash')
report_date = context.get('ds_nodash')
But the DAG runs as per UTC only, and the initial run fails because the data is not present yet.
The DAG run times are:
2022-05-13T00:00:00+00:00
2022-05-13T01:00:00+00:00
and so on
whereas I want this to start from
2022-05-13T04:00:00+00:00
2022-05-13T05:00:00+00:00
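The arithmetic behind the desired start times can be checked with a short sketch. It assumes New York is on daylight time (UTC-4) on that date and uses a fixed offset in place of a real timezone database; the names NEW_YORK_EDT and to_utc are hypothetical. It illustrates the offset only and is not a fix for the DAG.

```python
from datetime import datetime, timedelta, timezone

# Assumption: New York observes daylight time (UTC-4) in May 2022.
NEW_YORK_EDT = timezone(timedelta(hours=-4), name="EDT")


def to_utc(local_dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local_dt.astimezone(timezone.utc)


# Midnight in New York and the hourly run after it, expressed in UTC.
first_run = datetime(2022, 5, 13, 0, 0, tzinfo=NEW_YORK_EDT)
print(to_utc(first_run).isoformat())                       # 2022-05-13T04:00:00+00:00
print(to_utc(first_run + timedelta(hours=1)).isoformat())  # 2022-05-13T05:00:00+00:00
```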

Run Airflow DAG 2 times per day

How can I schedule my DAG 2 times per day with such settings?
I tried
'start_date': datetime(2022, 4, 6)
with
schedule_interval='0 1,4 * * *'
So if today is 2022-04-08 at about 5 pm, we should see 4 (or 6?) runs, but I see 5 (the last one today at 01:00 UTC). How does this work?
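The count of 5 can be reproduced with a small sketch, assuming that Airflow creates the run for a schedule point only once the following schedule point has passed, and that catchup is enabled so earlier runs are backfilled. The helper names are hypothetical and the cron expression is hard-coded as "minute 0 of hours 1 and 4".

```python
from datetime import datetime, timedelta


def schedule_points(start, end, hours=(1, 4)):
    """Return every 'minute 0 of the given hours' instant within [start, end]."""
    points = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= end:
        for hour in hours:
            point = day + timedelta(hours=hour)
            if start <= point <= end:
                points.append(point)
        day += timedelta(days=1)
    return points


def execution_dates(start, now, hours=(1, 4)):
    """Schedule points whose interval has already ended at `now`."""
    # The latest schedule point has no successor yet, so its run has not started.
    return schedule_points(start, now, hours)[:-1]


runs = execution_dates(datetime(2022, 4, 6), datetime(2022, 4, 8, 17))
print(len(runs))  # 5
print(runs[-1])   # 2022-04-08 01:00:00
```

Under these assumptions there are six schedule points up to 5 pm on 2022-04-08, but the run for the 04:00 point on that day waits until 01:00 the next day.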

How to run airflow dag on working days between 9am to 4pm in every 10 minutes

I have a DAG that needs to be scheduled to run on working days (Mon to Fri) between 9 AM and 4 PM, every 10 minutes. How do I do this in Airflow?
Set your DAG with the cron expression */10 9-16 * * 1-5, which means "at every 10th minute past every hour from 9 through 16 on every day-of-week from Monday through Friday". Note that the hour range 9-16 includes the 16:xx runs, so the last run of the day is at 16:50. See crontab guru.
dag = DAG(
    dag_id='my_dag',
    schedule_interval='*/10 9-16 * * 1-5',
    start_date=datetime(2021, 1, 1),
)
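To see which times that expression selects, here is a rough plain-Python check that mirrors its three constrained fields. It is a hand-written matcher for this one expression, not a cron parser, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def matches_schedule(dt: datetime) -> bool:
    """True when dt matches the cron expression */10 9-16 * * 1-5."""
    return (
        dt.minute % 10 == 0      # */10: minutes 0, 10, ..., 50
        and 9 <= dt.hour <= 16   # 9-16: hours 9 through 16 inclusive
        and dt.weekday() < 5     # 1-5: Monday (0) through Friday (4)
    )


def runs_on(day: datetime) -> list:
    """List every minute of `day` that matches the schedule."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = (midnight + timedelta(minutes=m) for m in range(24 * 60))
    return [moment for moment in minutes if matches_schedule(moment)]


monday = runs_on(datetime(2021, 1, 4))  # 2021-01-04 is a Monday
print(len(monday))                      # 48
print(monday[0].time(), monday[-1].time())  # 09:00:00 16:50:00
```

If the runs must stop at 16:00 sharp, the hour range would need adjusting, since 9-16 covers the whole 16:xx hour.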

Airflow start Dag on yesterday

When I deploy a new DAG on Airflow, let's say I deploy it today (28 April).
I have the cron expression 0 3 * * *, so I expect the first run on 29 April at 3 am. However, I get a run as soon as I deploy, with this run id: 2021-04-27, 03:00:00.
Dag code:
DAG(
    dag_id="namexx",
    schedule_interval='0 3 * * *',
    max_active_runs=1,
    is_paused_upon_creation=False,
    dagrun_timeout=timedelta(hours=1),
    catchup=False,
    default_args={
        "start_date": datetime(2021, 1, 1),
        "retries": 0,
        "retry_delay": timedelta(minutes=1)
    }
)
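Any idea why that is?
This is expected.
Airflow schedules a DAG run at the end of its interval, and labels the run with the start of that interval.
With start_date 2021-01-01 and a daily interval, the most recent complete interval (27 April 03:00 to 28 April 03:00) has already ended when you deploy, so even with catchup=False a run for it is triggered as soon as the DAG is deployed.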
See also previous answer 1, answer 2 on this subject
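As a rough illustration of that rule, the sketch below computes the label of the first run for a daily schedule at a fixed hour. It assumes the DAG was deployed at 10:00 on 28 April (the question gives only the date) and that only the latest completed interval is run; first_execution_date is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta


def first_execution_date(deployed_at: datetime, hour: int = 3) -> datetime:
    """Start of the latest daily interval that has fully elapsed at deployed_at."""
    # Most recent schedule point at or before the deployment time.
    latest_tick = deployed_at.replace(hour=hour, minute=0, second=0, microsecond=0)
    if latest_tick > deployed_at:
        latest_tick -= timedelta(days=1)
    # The interval that ended at that point started one day earlier.
    return latest_tick - timedelta(days=1)


print(first_execution_date(datetime(2021, 4, 28, 10, 0)))  # 2021-04-27 03:00:00
```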

Execute the task on Airflow on 12th day of the month and 2 days before the last day of the month

I need to execute the same Airflow task on the 12th day of the month and 2 days before the last day of the month.
I tried using macros and execution_date as well, but I'm not sure how to proceed further. Could you please help with this?
from datetime import timedelta

def check_trigger(execution_date, day_offset, **kwargs):
    # Shift the execution date back by day_offset days.
    target_date = execution_date - timedelta(days=day_offset)
    return target_date
I would approach it as below, where twelfth_or_two_before is a Python function that simply checks the date and returns the task_id of the appropriate downstream task. (That way, if the business needs ever change and you need to run the actual task on different or additional days, you just modify that function.)
with DAG( ... ) as dag:
    right_days = BranchPythonOperator(
        task_id="start",
        python_callable=twelfth_or_two_before,  # pass the function itself, not a string
    )
    do_nothing = DummyOperator(task_id="do_nothing")
    actual_task = ____Operator( ... )  # This is the Operator that does actual work
    right_days >> [do_nothing, actual_task]
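The answer does not show twelfth_or_two_before itself; one possible shape for it is sketched below. It assumes the callable receives execution_date as a keyword argument from the task context, that the working task's task_id is "actual_task", and that "2 days before the last day" means the last day number minus two.

```python
import calendar
from datetime import datetime


def twelfth_or_two_before(execution_date: datetime, **kwargs) -> str:
    """Return the task_id to follow for the given execution date."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, days_in_month = calendar.monthrange(execution_date.year, execution_date.month)
    target_days = {12, days_in_month - 2}
    # Assumed task ids: "actual_task" for the real work, "do_nothing" otherwise.
    return "actual_task" if execution_date.day in target_days else "do_nothing"


print(twelfth_or_two_before(execution_date=datetime(2021, 4, 12)))  # actual_task
print(twelfth_or_two_before(execution_date=datetime(2021, 4, 28)))  # actual_task
print(twelfth_or_two_before(execution_date=datetime(2021, 4, 30)))  # do_nothing
```

Keep in mind that execution_date marks the start of the schedule interval, so on a daily schedule the check may need shifting by one day to line up with the calendar date on which the task actually runs.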
