Can I set up conditional scheduled tasks in R?

I have 2 questions about scheduling a task:
1. I would like to schedule a task to run at 12pm on the first business day of every month. Is it possible to do so?
2. If the first run doesn't produce a file, a second task should be scheduled to run at the end of that day.

Related

How to get dag start and end time and dag run duration in airflow?

I am trying to get the DAG start time and end time to calculate the duration/elapsed time and show it in the Airflow UI.
I tried with Python datetime, but it looks like Airflow already records these things. I want to know if there is any way to leverage that.
I don't want to get the details from the database because it will complicate things. I want to keep it simple.
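One way to leverage Airflow's own records, sketched below under the assumption of Airflow 2.x: each task receives the DagRun object in its execution context, and its start_date (and, once the run has finished, end_date) are the timestamps Airflow itself stores and shows in the UI. The dag_id and task_id here are illustrative.

from datetime import datetime, timezone

from airflow import DAG
from airflow.operators.python import PythonOperator


def report_elapsed(**context):
    dag_run = context["dag_run"]   # Airflow's own record of this run
    start = dag_run.start_date     # recorded when the run started
    # end_date is only populated once the whole run has finished, so while the
    # run is still in progress compute the elapsed time up to now instead
    elapsed = datetime.now(timezone.utc) - start
    print(f"DAG run started at {start}; elapsed so far: {elapsed}")


with DAG(
    dag_id="run_duration_example",        # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="report_elapsed", python_callable=report_elapsed)

For runs that have already completed, dag_run.end_date - dag_run.start_date should give the duration Airflow recorded, without querying the metadata database directly.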

run 2 scripts in same DAG with different schedule

Let's say you have 2 scripts: Daily_summary.py and Weekly_summary.py.
You could create 2 separate DAGs with daily and weekly schedules, but is it possible to solve this with 1 DAG?
I've tried a daily schedule, and simply putting this at the bottom (simplified):
if datetime.today().strftime('%A') == 'Sunday':
    SSHOperator(run weekly_summary.py)
But the problem is that if it is still running on Sunday at midnight, Airflow will terminate this task, since the operator no longer exists on Monday.
If I could somehow get the execution day's day of the week, that would solve it, but with Jinja templating, '{{ds}}' is not actually a 'yyyy-mm-dd' text value, so I cannot convert it to a date with the datetime package. It only becomes a date string somehow AFTER the Airflow script gets executed.
You should dynamically generate two DAGs, but you can reuse the same code for that. This is the power of Airflow: it is Python code, so you can easily use the same code to generate the same DAG "structure" with two different DAG ids and two different schedules.
See this nice article from Astronomer with some examples: https://www.astronomer.io/guides/dynamically-generating-dags
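A minimal sketch of that pattern, assuming Airflow 2.x; BashOperator stands in for the SSHOperator from the question, and the DAG ids, schedules and script names are illustrative:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator


def build_summary_dag(dag_id, schedule, script):
    # The same code builds both DAGs; only the id, schedule and script differ.
    with DAG(
        dag_id=dag_id,
        schedule_interval=schedule,
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        BashOperator(task_id="run_summary", bash_command=f"python {script}")
    return dag


# Register both DAGs in the module's global namespace so the scheduler finds them.
for dag_id, schedule, script in [
    ("daily_summary", "@daily", "Daily_summary.py"),
    ("weekly_summary", "@weekly", "Weekly_summary.py"),
]:
    globals()[dag_id] = build_summary_dag(dag_id, schedule, script)

Because the weekly task lives in its own DAG, it is always defined, so it is not dropped if it is still running after Sunday midnight.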

Airflow: Tasks in a DAG with different intervals, or a subdag that runs with a different frequency than the parent DAG

We are using Airflow as our workflow manager and scheduler.
Requirements:
1. We have an ETL pipeline in which data arrives hourly in different files and needs to be processed once the data has arrived.
2. Data for every hour has a cut-off limit within which it can get updated, and once updated the data needs to be reprocessed.
To solve the first, we can use a file sensor with hourly macros to look for the file and start processing once the data is available (a sketch of this approach follows below).
For the second requirement we were thinking of using some kind of subdag/task which can run with a different frequency until the cut-off time and reprocess if there is any update in the data.
But in Airflow we couldn't find anything like that which could run a task/subdag with a different frequency.
How can we achieve this?
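For the first requirement, a minimal sketch of the file-sensor idea, assuming Airflow 2.x; the DAG id, connection id, file path pattern and processing command are all illustrative assumptions:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor


with DAG(
    dag_id="hourly_file_etl",                  # illustrative
    schedule_interval="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # filepath is a templated field, so the run's timestamp macro can be used
    # to build the expected file name for that hour (path pattern is illustrative)
    wait_for_file = FileSensor(
        task_id="wait_for_hourly_file",
        fs_conn_id="fs_default",
        filepath="/data/incoming/{{ ts_nodash }}.csv",
        poke_interval=300,       # re-check every 5 minutes
        timeout=60 * 60,         # give up after an hour
    )

    process = BashOperator(
        task_id="process_hourly_file",
        bash_command="python process_hourly.py {{ ts_nodash }}",
    )

    wait_for_file >> process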

Trigger N future downstream tasks in Apache Airflow

My goal is to write a job which incrementally loads data from a source to a destination table, and then calculates a 7-day moving average on that data. My initial instinct is to split this into two tasks, one which loads a single day of data into a table, and a second which calculates a 7-day moving average and loads the result into another table.
How can I configure airflow to trigger the next 7 task instances of the downstream task calculate_7d_moving_average automatically when a single task instance of load_day_of_data is re-run?

Control-M scheduling changes

What changes can we make in Control-M job scheduling to minimize charges, given that we are charged based on the number of jobs ordered per day in the active schedule?
This is costing us a lot.
If some of your jobs are commands and share broad characteristics (node ID, user, no alerts), then use conditional operators: linking commands with a semicolon means each command is executed once, while linking with && means the second command is only executed if the first runs successfully.
As Alex has mentioned, this is a broad area and would be very difficult to answer to the point, but below are a few tips that can be considered.
1. Check if the same script is being run by different jobs. These can be combined with the help of the scheduling tab.
2. File watcher jobs. If there is a requirement to check for an incoming file and then trigger a specific job to process it, this normally makes up 2 jobs: Job1 - file watcher, Job2 - file processing. The same function can be achieved with AFT jobs, which combine these two functions in one.
3. Low-priority jobs, where alerting is not required, can be moved to Unix/shell scripts if jobs are costly.
4. If Job2 runs after Job1 and Job2 has only one IN CONDITION, i.e. from Job1, then rather than having two jobs, the script of Job2 can be called from the script of Job1, so we are ultimately doing two functions in Job1. If the script of Job2 fails, Job1 will not get a success return code, and you can get the details from the log.
5. Keep the archiving functionality in the scripts; there is no need to bring it into the Control-M jobs unless it is very important. And rather than doing it every day for the past 6 months, it would be better to schedule it once a week or once every two weeks.
6. Sort the jobs so that the regular jobs are in one table and the ad-hoc jobs (which run only on request) are in another. Keep 'UserDaily' only for the jobs that are regular. Not keeping 'UserDaily' for the ad-hoc jobs means they will not be called into the EM daily, so you see only the jobs that run daily and not the ones that may or may not run.
Hope this helps.
You can use the ctmudly command to order only the user daily you want.
You can try using crontab in Unix to schedule non-priority jobs which don't need manual intervention or observation.
You can avoid the FW (file watcher) jobs by including file-checker logic in the shell script which executes the actual process.
