I'm looking for a way to run an AutoSys job every minute of the day, except on Sunday between 11:00 and 12:00. Is there a way to accomplish this without creating 3 jobs (one that runs Mon-Sat, one that runs Sunday until 11 am, and one that runs Sunday after 12)?
You can do this with two jobs: one that runs Monday-Saturday, and one that runs only on Sunday, for which you specify a run_window of 12:00-11:00.
I want to run a DAG on the 2nd and 5th working day of every month.
E.g. 1: Suppose the 1st day of the month falls on a Friday. In that case the 2nd working day of the month falls on the 4th (a Monday) and the 5th working day falls on the 7th.
E.g. 2: Suppose the 1st day of the month falls on a Wednesday. In that case the 2nd working day falls on the 2nd, but the 5th working day falls on the 7th (a Tuesday).
E.g. 3: Suppose the 1st day of the month falls on a Sunday. In that case the 2nd working day falls on the 3rd and the 5th working day falls on the 6th (a Friday).
So, how do I schedule the DAG in Airflow for such scenarios?
#airflow #DAG #schedule
I am looking for the scheduling logic or code.
Could you provide the code of your DAG?
It depends on which operators you are using or are willing to use.
Also, you might want to keep bank holidays in mind. Are you sure it is okay to run your Airflow job on the 2nd day even if it is a bank holiday?
You can schedule your DAG daily and use a PythonOperator to validate whether the current date fits your restrictions. I would push this value to XCom and then read and process it in your DAG definition.
Another option is to use a BashOperator at the beginning of the flow and fail it if the current date violates your logic. This will prevent the rest of the dependent tasks from executing.
Airflow uses cron expressions for its schedules, so you must define this logic inside the code: cron can only trigger tasks on a fixed schedule and cannot do any calculations. A sketch of the validate-and-stop idea is below.
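A minimal sketch of that approach, using a ShortCircuitOperator (the idiomatic way to skip downstream tasks when a check fails), assuming Monday-Friday working days and ignoring bank holidays; the dag_id and task ids are made up:

```python
from datetime import date, datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import ShortCircuitOperator


def is_2nd_or_5th_workday(ds: str) -> bool:
    """Return True only on the 2nd or 5th working day (Mon-Fri) of the month."""
    today = date.fromisoformat(ds)
    # Count Mon-Fri days from the 1st of the month up to and including today.
    rank = sum(
        1 for day in range(1, today.day + 1)
        if date(today.year, today.month, day).weekday() < 5  # Mon=0 .. Fri=4
    )
    return today.weekday() < 5 and rank in (2, 5)


with DAG(
    dag_id="second_and_fifth_workday",  # hypothetical id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",         # run daily, short-circuit on other days
    catchup=False,
) as dag:
    # Airflow 2 injects context variables such as `ds` (the logical date as
    # YYYY-MM-DD) into matching parameters of the callable.
    check = ShortCircuitOperator(
        task_id="check_workday",
        python_callable=is_2nd_or_5th_workday,
    )
    work = BashOperator(task_id="do_work", bash_command="echo running")
    check >> work
```

When the callable returns False, the ShortCircuitOperator marks all downstream tasks as skipped, so the DAG still gets a run every day but only does real work on the 2nd and 5th working day.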
You can use custom timetables (available since Airflow 2.2): https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html. A condensed sketch follows.
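A condensed sketch of such a timetable, assuming Monday-Friday working days and no holiday calendar. It trims the catchup/latest handling and serialization hooks shown in the linked docs, and the class and plugin names are made up:

```python
from calendar import monthrange

from pendulum import UTC, DateTime

from airflow.plugins_manager import AirflowPlugin
from airflow.timetables.base import DagRunInfo, DataInterval, TimeRestriction, Timetable


def nth_workdays(year: int, month: int, ranks=(2, 5)):
    """Dates of the 2nd and 5th Mon-Fri days of the given month."""
    hits, rank = [], 0
    for day in range(1, monthrange(year, month)[1] + 1):
        d = DateTime(year, month, day, tzinfo=UTC)
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            rank += 1
            if rank in ranks:
                hits.append(d)
    return hits


class NthWorkdayTimetable(Timetable):
    def infer_manual_data_interval(self, *, run_after: DateTime) -> DataInterval:
        return DataInterval.exact(run_after)

    def next_dagrun_info(self, *, last_automated_data_interval, restriction: TimeRestriction):
        after = (last_automated_data_interval.end
                 if last_automated_data_interval
                 else restriction.earliest or DateTime.now(UTC))
        probe = after
        for _ in range(24):  # scan at most two years ahead
            for d in nth_workdays(probe.year, probe.month):
                if d > after:
                    return DagRunInfo.interval(start=d, end=d.add(days=1))
            probe = probe.add(months=1)
        return None  # nothing found; the DAG will not be scheduled


class NthWorkdayPlugin(AirflowPlugin):
    name = "nth_workday_timetable"
    timetables = [NthWorkdayTimetable]
```

You would then pass timetable=NthWorkdayTimetable() to the DAG instead of a schedule_interval. The advantage over the daily short-circuit is that the UI only shows runs on the days the DAG actually works.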
For a similar use case we implemented a branching operator with logic to run the selected tasks only on a specific workday of the month (I think we used the workday package to identify specific holidays), and this DAG ran daily. But the DAG had to complete some other tasks in all cases. A sketch of the branching pattern is below.
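A minimal sketch of that branching pattern, assuming Monday-Friday working days and no holiday calendar; all ids are made up:

```python
from datetime import date, datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # Airflow 2.3+; DummyOperator before that
from airflow.operators.python import BranchPythonOperator


def choose_branch(ds: str):
    """On the 2nd/5th working day run both branches, otherwise only the usual tasks."""
    today = date.fromisoformat(ds)
    rank = sum(
        1 for day in range(1, today.day + 1)
        if date(today.year, today.month, day).weekday() < 5  # Mon=0 .. Fri=4
    )
    if today.weekday() < 5 and rank in (2, 5):
        return ["monthly_tasks", "always_tasks"]  # a branch callable may return a list
    return "always_tasks"


with DAG(
    dag_id="branch_on_workday",  # hypothetical id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    branch = BranchPythonOperator(task_id="choose_branch", python_callable=choose_branch)
    monthly = EmptyOperator(task_id="monthly_tasks")  # only on the 2nd/5th working day
    always = EmptyOperator(task_id="always_tasks")    # completes in all cases
    branch >> [monthly, always]
```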
I just ran an Airflow DAG. When I look at the Last Run date in the UI, it displays the run before the last one, yet when I hover over the "i" icon it shows the correct date. Is there any way to solve this? It sounds like nonsense, but I end up using this date for QA of my data.
This is probably because your Airflow DAG has catchup=True enabled and a start_date in the past, so it is back-filling.
The Start Date is the real-time date of the last run, whereas the Last Run is the execution date of the airflow job. For example, if I am back-filling a time partitioned table with data from 2016-01-01 to present, the Start Date will be the current date but the Last Run date will be 2016-01-01.
Please include your DAG file/code in the future.
Edit: If you don't have catchup=True enabled, and the discrepancy is approximately one day (like in the picture you sent), then that is just due to the behaviour of the scheduler. From the docs: "The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period."
If you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
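For reference, a minimal sketch of a DAG with the settings discussed above; the dag_id and command are made up:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="no_backfill_example",     # hypothetical id
    start_date=datetime(2016, 1, 1),  # far in the past
    schedule_interval="@daily",
    catchup=False,  # don't back-fill 2016-01-01..today; schedule only the latest interval
) as dag:
    BashOperator(task_id="noop", bash_command="echo ok")
```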
I have a job which runs every day, 4 times an hour (total 48 times).
I want to prevent it from running only between 7 AM and 9 AM on Sunday. How can I achieve this without creating two different jobs? I want to accomplish it in a single job. Please help.
Mention all the times the job should run in start_times, like below
*replace ".." with the missing times
start_times: "00:00, 00:15, .., 06:30, 06:45, 09:15, 09:30, .., 23:45"
I suggest creating an extended calendar defined as every Sunday 07:00 till 09:00.
Then, in the job definition, assign this calendar to the job attribute exclude_calendar along with your run_calendar.
I have a Control-M job that runs every 20 min. Everything works great during that day's run. The issue is when we auto-order the next day's jobs: if the current day's jobs are still running, we get 2 copies of the job running at the same time.
Is there a way to not start the new job if the previous day's job is executing?
The job starts every 20 minutes, but how long does it run? Set the end of the "submit between" window a few minutes before the new-day build.
Let's say your new day builds at 0400. Since the job is intended to run every 20 minutes, you can have it run as late as 0340. Set the "to" time in Activity Window to 0340 and the job won't autosubmit after that time. The new day will build at 0400 and the new version of the job will start then - 20 minutes after the previous start.
You can also add a control resource to the job to prevent two of them from running at the same time. I don't know another way to do it. That's not a can of worms I'd open unless the activity period settings just won't work the way you want them to.
If you're referring to the last day's job execution bleeding into the current day and causing resource contention, your best bet (as Rob pointed out) is to define a resource with a max count of 1 to be required by the job, so the next day's job instance cannot start until the previous day's job completes and releases the resource. Alternatively, you can have the job post a condition for order date + 1, and have the order date condition also be an In condition for the job.
I am trying to order my jobs into the plan until the jobs succeed, but today's ordered jobs should run only after the previous day's jobs for that particular CTM table have run. I have added Keep Active for 10 days for each job.
Thanks in advance.
You can try adding an IN-CONDITION on the same job with PREV date, so that today's job will run only when yesterday's job has completed its run successfully.
If you want to run today's job even when yesterday's job fails, you can add a DO-STEP on failure to create a condition which today's job can use to start running.