I have the following DAG set up:
    import datetime

    from airflow import models

    default_dag_args = {
        'start_date': datetime.datetime(2021, 6, 25, 0, 0),
        'email': 'foobar#foobar.com',
        'email_on_failure': True,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': datetime.timedelta(minutes=30)
    }

    with models.DAG(
            'foobar',
            schedule_interval="30 5,7,9 * * *",
            default_args=default_dag_args,
            catchup=False) as dag:
The behaviour that I want to have is that the DAG will execute at 5:30, 7:30 and 9:30 UTC every day. The behaviour that I'm seeing is that the 5:30 run executes at 7:30 UTC, the 7:30 run executes at 9:30 and the 9:30 run executes at 5:30 the next day.
I think I kind of have a vague idea of why this is happening based on the docs - 9:30 marks the end of the schedule period and so the 9:30 run executes at the beginning of the next period. I can't figure out how to get the behaviour I want though. The DAG doesn't have any reference to the schedule time in the code, it just needs to run at 5:30, 7:30 and 9:30 and the 'run time' as Airflow considers it doesn't matter.
Is there any way to get a DAG to run at absolute times? If not, what schedule can I set to get the behaviour I desire?
Airflow is not a cron job scheduler. Airflow takes start_date, adds schedule_interval, and executes the job at the end of each interval. The reason behind this is explained in this answer.
In your case:
start_date=datetime(2021, 6, 25) with schedule_interval = "30 5,7,9 * * *" gives:
1st run with execution_date 2021-06-25 5:30 will start running on 2021-06-25 7:30
2nd run with execution_date 2021-06-25 7:30 will start running on 2021-06-25 9:30
3rd run with execution_date 2021-06-25 9:30 will start running on 2021-06-26 5:30
4th run with execution_date 2021-06-26 5:30 will start running on 2021-06-26 7:30
5th run with execution_date 2021-06-26 7:30 will start running on 2021-06-26 9:30
6th run with execution_date 2021-06-26 9:30 will start running on 2021-06-27 5:30
7th run with execution_date 2021-06-27 5:30 will start running on 2021-06-27 7:30
8th run with execution_date 2021-06-27 7:30 will start running on 2021-06-27 9:30
9th run with execution_date 2021-06-27 9:30 will start running on 2021-06-28 5:30
and so on...
Note that you still get 3 runs per day (except on the first date), as you expect; it is just a matter of understanding how scheduling works. If you want 3 runs on the first date as well, change your start_date to datetime(2021, 6, 24, 9, 30). The execution_date is a logical date. If needed, you can reference the relevant dates within your DAG code using macros.
For example, the 6th run has execution_date 2021-06-26 9:30, and for that run the macros give you:
prev_execution_date is 2021-06-26 7:30
next_execution_date is 2021-06-27 5:30
Note: your code has catchup=False, so the exact dates I wrote here won't be the same, but that affects only the first run. The following runs will follow the same logic.
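To make the table above concrete, here is a minimal sketch in plain Python (no Airflow import) that reproduces it. It assumes the idealised rule described in this answer: a run whose execution_date is one cron tick starts at the following tick. The names cron_ticks and simulate_runs are made up for this illustration, and the effect of catchup=False on the first run is ignored.

```python
import datetime as dt

# Daily cron ticks for "30 5,7,9 * * *" (UTC).
TICKS = [dt.time(5, 30), dt.time(7, 30), dt.time(9, 30)]


def cron_ticks(start, count):
    """Return the first `count` cron ticks at or after `start`."""
    ticks = []
    day = start.date()
    while len(ticks) < count:
        for t in TICKS:
            tick = dt.datetime.combine(day, t)
            if tick >= start and len(ticks) < count:
                ticks.append(tick)
        day += dt.timedelta(days=1)
    return ticks


def simulate_runs(start, runs):
    """Pair each logical date with the tick at which that run starts."""
    ticks = cron_ticks(start, runs + 1)
    # The run covering [ticks[i], ticks[i + 1]) starts once that interval ends.
    return list(zip(ticks[:-1], ticks[1:]))


for execution_date, started_at in simulate_runs(dt.datetime(2021, 6, 25), 6):
    print(f"execution_date {execution_date}  starts running {started_at}")
```

Calling simulate_runs(dt.datetime(2021, 6, 24, 9, 30), 3) shows the effect of the earlier start_date: the three runs then start at 5:30, 7:30 and 9:30 on 2021-06-25.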
Is there any way to branch Airflow schedules?
E.g.
If the date is between 1-7, branch X should execute.
If the date is between 8-end_of_month, branch Y should execute.
i.e.
       Task_1
         |
       Task_2
        /  \
     X_1    Y_1
      |      |
     X_2    Y_2
This would avoid creating a near-duplicate version of the DAG and save maintenance cost.
This scenario can further be extended to regular days vs monthend, weekday vs weekends etc..
If the tasks live in one DAG, you can read execution_date inside the Python function and choose which task to run based on it. E.g.:
    def _branch_operator_func(**kwargs):
        execution_date = kwargs['execution_date']
        if 1 <= execution_date.day <= 7:
            return 'X_1'
        elif (...)
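A complete version of that idea might look like the sketch below. The elided branch is filled in with an assumption taken from the question (day 8 through the end of the month goes to Y_1), and the function name choose_branch is hypothetical. It is written in plain Python so it runs without Airflow; the intent is that it would be used as the python_callable of a BranchPythonOperator placed after Task_2.

```python
import datetime as dt


def choose_branch(execution_date, **_context):
    """Return the task_id of the branch to follow for this logical date."""
    if 1 <= execution_date.day <= 7:
        return "X_1"
    # Assumption from the question: day 8 through end of month goes to branch Y.
    return "Y_1"


print(choose_branch(dt.datetime(2022, 3, 5)))
print(choose_branch(dt.datetime(2022, 3, 8)))
```

The same pattern extends to the other cases mentioned in the question, for example by testing execution_date.weekday() for weekday versus weekend.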
We are currently working with Airflow. For daily DAGs we set the start date to the date of the first desired run minus one day.
If a daily DAG (00 10 * * *) should have its first run on 20 March 2022, we provide the start date as
    'start_date': datetime(2022, 3, 19)
If the DAG has to run on a weekly or monthly basis, can anyone please suggest how to set the start date?
If the DAG should first run on 19 March 2022 and then every Saturday (cron expression: 00 10 * * 6), what start date should I provide?
Is it something like
    'start_date': datetime(2022, 3, 12)
or
    'start_date': datetime(2022, 3, 11)?
Is there any way to derive the start date from the frequency with some Python code?
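One possible way to derive it in code is sketched below. It assumes the scheduling rule described earlier on this page (a run starts at the end of its interval), so the start date is simply the desired first run minus one schedule period. The helper start_date_for and its frequency names are made up for this illustration, and the monthly case assumes the same day of the month exists in the previous month.

```python
import datetime as dt

# Fixed-length schedule periods.
PERIODS = {"daily": dt.timedelta(days=1), "weekly": dt.timedelta(weeks=1)}


def start_date_for(first_run, frequency):
    """Return a start_date one schedule period before the desired first run."""
    if frequency in PERIODS:
        return first_run - PERIODS[frequency]
    if frequency == "monthly":
        # Same day and time in the previous month (assumes that day exists there).
        if first_run.month > 1:
            return first_run.replace(month=first_run.month - 1)
        return first_run.replace(year=first_run.year - 1, month=12)
    raise ValueError(f"unsupported frequency: {frequency}")


# First weekly run wanted on Saturday 19 March 2022 at 10:00.
print(start_date_for(dt.datetime(2022, 3, 19, 10, 0), "weekly"))
```

For the Saturday example this gives 12 March 2022 at 10:00. Under the same assumption, datetime(2022, 3, 12) at midnight should behave the same way, because the first cron tick at or after it is still 12 March at 10:00.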
Is it possible to create a DAG in which a task runs multiple iterations, every 10 minutes, within a time frame?
We have two tasks: t1 and t2.
t1 should run 20 times a day with a gap of 5 minutes between runs, and once the 20 runs are complete it should trigger task 2 (t2).
I tried creating two different DAGs and it worked, but is there any way to do it in a single DAG?
Any suggestions, please.
task_1 (run 20 times, with a 5-minute gap between runs, and then) >> task_2
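What if you try something like this (this is not executable code; SomeOperator is a placeholder):
    previous = None
    for i in range(1, 21):  # 20 iterations of task1
        task1 = SomeOperator(task_id=f"task1_execution_{i}")
        sleep = BashOperator(task_id=f"sleep_{i}", bash_command="sleep 300")  # 5-minute gap
        if previous is not None:
            previous >> task1
        task1 >> sleep
        previous = sleep
    task2 = SomeOperator(task_id="task2_execution")
    previous >> task2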
I want a job to execute every 15 minutes from 10:30 AM to 8:30 PM.
I tried 30/15 10-20 * * *. But it ignores the times 11:00 AM, 11:15 AM, 12:00 PM, 12:15 PM, 1:00 PM, 1:15 PM etc.
I would like to know the proper cron string for the above expression.
# Run command every fifteen minutes between 10:30 and 20:30.
0,15,30,45 11-19 * * * …command…
30,45 10 * * * …command…
0,15,30 20 * * * …command…
The first line deals with the whole hours from 11:00 to 19:45. The second line handles 10:30 and 10:45. The third line handles 20:00, 20:15, 20:30 (assuming you want it to run at 20:30 too — if you don't the fix is obvious).
It may not be beautiful, but it will get the job done. Just make sure the …command… sequence is simple enough that the repetition is obvious.
You can probably replace the first component of the first line with 0/15 in your cron. You might use 30/15 for the second one, but it doesn't seem any clearer than what's written anyway. I'm not sure there's a good way to replace the other line.
One other possibility is to refactor it so that the hour range for each quarter hour is specified:
0 11-20 * * * …command…
15 11-20 * * * …command…
30 10-20 * * * …command…
45 10-19 * * * …command…
You could optionally combine the first two of those.
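As a sanity check, the short Python sketch below expands both sets of crontab lines into (hour, minute) pairs and compares them with the wanted schedule. It only models the minute and hour fields, which is all these lines use; the helper name expand is made up for this check.

```python
def expand(minutes, hours):
    """Return the set of (hour, minute) pairs matched by one crontab line."""
    return {(h, m) for h in hours for m in minutes}


# The three-line version: 11-19 on every quarter hour, plus the two edge hours.
three_line = (
    expand([0, 15, 30, 45], range(11, 20))
    | expand([30, 45], [10])
    | expand([0, 15, 30], [20])
)

# The refactored version: one line per quarter hour.
per_quarter = (
    expand([0], range(11, 21))
    | expand([15], range(11, 21))
    | expand([30], range(10, 21))
    | expand([45], range(10, 20))
)

# Every 15 minutes from 10:30 to 20:30 inclusive, as (hour, minute).
wanted = {divmod(t, 60) for t in range(10 * 60 + 30, 20 * 60 + 30 + 1, 15)}

print(three_line == wanted, per_quarter == wanted, len(wanted))
```

Both versions match the same 41 times, so either can be used.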
I want to schedule a job using autosys R11. I use start_time to specify the job start time and also use start_mins to specify the interval.
But now, I want to schedule a job from 1:00 PM to 5:00 PM at a regular interval (10 mins) so that the job will run at 1:00, 1:10, 1:20, 1:30, ..., 2:00, 2:10, 2:20, 2:30 etc. How do I specify the end time (5:00 PM)?
Could anyone help me with this?
Thanks,
Veena
Use run_window: "13:00 - 17:00" together with start_mins: 0, 10, 20, 30, 40, 50