Airflow server execution date - airflow

Monthly runs are not getting scheduled, and I think the absence of the year in the execution date could be the reason.

The year is absent because Airflow doesn't show the current year; this is by design. As for your image, it shows all dates for daily execution.


Run a airflow dag on 2nd and 5th working day every month

I want to run a DAG on 2nd and 5th working day of every month.
eg1: Suppose the 1st of the month falls on a Friday. Then the 2nd working day of the month falls on the 4th (Monday) and the 5th working day falls on the 7th (Thursday).
eg2: Suppose the 1st of the month falls on a Wednesday. Then the 2nd working day falls on the 2nd, but the 5th working day falls on the 7th (Tuesday).
eg3: Suppose the 1st of the month falls on a Sunday. Then the 2nd working day falls on the 3rd (Tuesday) and the 5th working day falls on the 6th (Friday).
So, how do I schedule the DAG in Airflow for such scenarios?
#airflow #DAG #schedule
I am looking for scheduling logic or code
Could you provide the code of your DAG?
It depends on which operators you are using or willing to use.
Also, keep bank holidays in mind. Are you sure it is okay to run your Airflow job on the 2nd day even if it is a bank holiday?
You can schedule your DAG daily and use a PythonOperator to validate whether the current date fits your restrictions. I would push this value to XCom and then read and process it in your DAG definition.
Another option would be to use a BashOperator at the beginning of the flow and fail it if the current date violates your logic. This will prevent the rest of the dependent tasks from executing.
Airflow uses cron-style schedule definitions, so you must put this logic inside your code: cron can only run tasks on a fixed schedule and cannot do any calculations.
You can use custom timetables: https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html
For a similar use case we implemented a branching operator with logic to run the selected tasks only when it is a specific workday of the month (I think we were using the workday package to identify specific holidays), and this DAG ran daily. But the DAG had to complete some other tasks in all cases.
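The daily-schedule-plus-check approach suggested above can be sketched as a pure helper that computes which working day of its month a given date is. This is only a sketch under weekday-only logic: bank holidays and the operator wiring are left out, and the function names are illustrative, not from any library.

```python
from datetime import date

def nth_working_day(d: date) -> int:
    """Return which working day (Mon-Fri) of its month d is,
    or 0 if d itself falls on a weekend."""
    if d.weekday() >= 5:  # Saturday or Sunday
        return 0
    return sum(
        1
        for day in range(1, d.day + 1)
        if date(d.year, d.month, day).weekday() < 5
    )

def should_run(d: date) -> bool:
    """Only continue on the 2nd or 5th working day of the month."""
    return nth_working_day(d) in (2, 5)
```

In a daily-scheduled DAG, `should_run` could serve as the `python_callable` of a ShortCircuitOperator so that downstream tasks are skipped on every other day; a real implementation should also consult a holiday calendar, as the comments above suggest.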

What is the difference between the ODATE and the RDATE? Control-M

Does somebody know the difference between the ODATE and the RDATE? The manual says:
ODATE --> Original scheduling date of the job.
RDATE --> Installation current working date.
But that is not helpful for me.
Thanks a lot.
The ODATE is the date the job is originally scheduled for, and you need to take two things into account: (i) when your New Day Processing runs (e.g. if it is 06:00 each day, then the ODATE runs from 6am to 6am and is the same for that whole period); (ii) you don't necessarily have to order jobs in with the date set to the current date. For example, today you might order in jobs where some are scheduled for the last day of the month due to batch processing considerations.
%%$RDATE or %%RDATE (one value gives the year as yyyy, the other as yy) resolves to the current system date. Whatever your server says the local date is right now, regardless of Control-M settings, RDATE will give you that. This is often used when your job could run on different days but you really need today's date in the processing.
Of course, if your New Day Processing runs at midnight, you always clear all jobs away during NDP, and you never order jobs into the future, then RDATE and ODATE will be essentially the same.
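The distinction above can be illustrated with a rough sketch. This is purely illustrative (Control-M computes ODATE internally; the 06:00 New Day Processing time and the function names are just taken from the example in the answer):

```python
from datetime import datetime, timedelta, time, date

def odate_for(now: datetime, ndp: time = time(6, 0)) -> date:
    """Before New Day Processing has run, the ODATE is still the
    previous calendar date; after it, the current one."""
    if now.time() < ndp:
        return (now - timedelta(days=1)).date()
    return now.date()

def rdate_for(now: datetime) -> date:
    """RDATE simply resolves to the server's current local date."""
    return now.date()
```

So at 05:30 the two dates differ (ODATE is still yesterday), while at 07:00 they agree, matching the "6am-to-6am" description above.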

Airflow - Incorrect Last Run

I just ran an Airflow DAG. When I look at the Last Run date, it displays the run before the last one. What catches my attention is that when I hover over the "i" icon, it shows the correct date. Is there any way to solve this? It sounds like nonsense, but I end up using this date for QA of my data.
This is probably because your airflow job has catchup=True enabled and a start_date in the past, so it is back-filling.
The Start Date is the real-time date of the last run, whereas the Last Run is the execution date of the airflow job. For example, if I am back-filling a time partitioned table with data from 2016-01-01 to present, the Start Date will be the current date but the Last Run date will be 2016-01-01.
Please include your DAG file/code in the future.
Edit: If you don't have catchup=True enabled, and the discrepancy is approximately one day (like in the picture you sent), then that is just due to the behaviour of the scheduler. From the docs, "The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period."
If you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
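That scheduling rule amounts to a one-line calculation (a sketch; the function name is mine, not Airflow's):

```python
from datetime import datetime, timedelta

def trigger_time(execution_date: datetime,
                 schedule_interval: timedelta) -> datetime:
    """A run stamped with execution_date is actually triggered one
    schedule_interval later, at the end of the period it covers."""
    return execution_date + schedule_interval
```

This is exactly the one-day discrepancy described above: the run stamped 2016-01-01 fires around 2016-01-02T00:00.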

what do execution_date and backfill means in airflow

I'm new to Airflow and I'm trying to understand what execution_date means in the Airflow context. I've read the tutorial page from Airflow's documentation, which states:
The date specified in this context is an execution_date, which simulates the scheduler running your task or dag at a specific date + time:
I tried to run a task from the tutorial using following command.
airflow test tutorial print_date 2015-06-01
I expected it to print the execution_date, but the task prints the actual date on my local system, like this:
[2018-05-26 20:36:13,880] {bash_operator.py:101} INFO - Sat May 26 20:36:13 IST 2018
I thought the scheduler would be simulated at the given time, so I'm confused about the execution_date parameter. Can anyone help me understand this? Thanks.
It's printing the current time in your log because it was actually executed at this time.
The execution date is a DAG run parameter. Tasks can use it to have a date reference different from when the task will actually be executed.
Example: say you're interested in storing currency rates once per day. You want to get rates since 2010. You'll have a task in your DAG to call an API which will return the currency rate for a day. You can create a DAG with a start date of 2010-01-01 with a schedule of once per day. Even if you create it now, in 2018, it will run for every day since the start date and thanks to the execution date you'll have the correct data.
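The backfill in that example can be pictured as Airflow generating one run per execution date between the start date and today. This is a simplified sketch (the function name is illustrative; in real Airflow the scheduler produces these runs when catchup is enabled):

```python
from datetime import date, timedelta

def backfill_dates(start: date, end: date) -> list[date]:
    """Execution dates a daily DAG with catchup enabled would run for."""
    out = []
    d = start
    while d <= end:
        out.append(d)
        d += timedelta(days=1)
    return out
```

Each run would then call the rates API with its own execution date (e.g. via the `{{ ds }}` template), so the run for 2010-01-05 fetches the rates for 2010-01-05 even though it actually executes in 2018.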

Periodic Calendar value shifting after mid year

So I have defined a periodic calendar with the following periods:
Period A: to factor in start of every month
Period B: to factor in end of every month
Calendar
Now I am trying to schedule the job on every day marked as period A, i.e. the start of every month, with the following settings:
Scheduling Setting
Schedule
My Problem
From July onward the schedule is shifted earlier by 1 day. Any ideas why?
Also, can someone point me to detailed documentation on periodic calendars, with example values?
Thanks.
I was not able to reproduce the specific issue you are having. I created the same periodic calendar (Period A = start of month, Period B = end of month) and my job schedule came out as expected.
Calendar
Job
Schedule
Solution
There is no need to use a periodic calendar for such a simple scheduling requirement. Using a periodic calendar incurs maintenance overhead, as you need to ensure it is updated every year.
Instead of using a periodic calendar you could simply configure the job as below. This will result in a job that will always execute on the first day of each month, no maintenance required.
