autosys - get the list of jobs that are scheduled in a particular time through unix command - autosys

I am working on an automation that lists the jobs that did not start running even though their scheduled time has passed. I am going to build the list in 2-hour windows.
Now my question is how to get the list of jobs that are scheduled in a particular time period on a particular day.
For example, the list of jobs scheduled between 08:00 and 10:00 am on 22-03-2018.
I want to execute the command in unix.

Depending on how your Linux system is set up, you can look in:
/var/spool/cron/* (user crontabs)
/etc/crontab (system-wide crontab)
Also, many distros have:
/etc/cron.d/* (these configurations have the same syntax as /etc/crontab)
/etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, /etc/cron.monthly
These are simply directories that contain executables that are run hourly, daily, weekly or monthly, per their directory name.
On top of that, you can have at jobs (check /var/spool/at/), anacron (/etc/anacrontab and /var/spool/anacron/) and probably others I'm forgetting.
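Once the crontab lines have been collected from those locations, a small script can work out which entries fire inside a given window. The following is only a rough sketch: it handles plain numeric fields (`*`, lists, ranges, steps) and skips names such as `mon` and shortcuts such as `@daily`. The function names and the sample entries are made up for illustration. Note that /etc/crontab and /etc/cron.d files carry an extra user column, which this sketch would leave at the front of the command text.

```python
from datetime import datetime, timedelta

def expand(field, lo, hi):
    """Expand one numeric cron field into the set of values it matches."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = int(rng)
            end = hi if step else start  # "5/10" is treated here as "from 5, every 10"
        values.update(range(start, end + 1, int(step) if step else 1))
    return values

def runs_at(fields, t):
    """True if the five cron time fields match the minute given by t."""
    minute, hour, dom, month, dow = fields
    if t.minute not in expand(minute, 0, 59):
        return False
    if t.hour not in expand(hour, 0, 23):
        return False
    if t.month not in expand(month, 1, 12):
        return False
    dom_ok = t.day in expand(dom, 1, 31)
    # cron counts Sunday as 0 (and 7); isoweekday() % 7 gives the same numbering
    dow_ok = t.isoweekday() % 7 in {d % 7 for d in expand(dow, 0, 7)}
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok  # cron ORs the two day fields when both are restricted
    return dom_ok and dow_ok

def jobs_in_window(lines, start, end):
    """Return (first_run, command) for each entry that fires in [start, end)."""
    found = []
    for line in lines:
        parts = line.split(None, 5)
        if line.lstrip().startswith("#") or len(parts) < 6:
            continue  # comments, blank lines, VAR=value, @daily-style entries
        t = start.replace(second=0, microsecond=0)
        while t < end:
            try:
                hit = t >= start and runs_at(parts[:5], t)
            except ValueError:
                break  # names such as "mon" or "jan" are not handled here
            if hit:
                found.append((t, parts[5].strip()))
                break
            t += timedelta(minutes=1)
    return found

# Hypothetical entries, checked against the 08:00 - 10:00 window from the question
sample = [
    "# nightly housekeeping",
    "MAILTO=root",
    "30 8 * * * /opt/a.sh",
    "0 11 * * * /opt/b.sh",
    "*/15 9 * * 1-5 /opt/c.sh",
    "0 9 * * 0 /opt/d.sh",
]
due = jobs_in_window(sample, datetime(2018, 3, 22, 8, 0), datetime(2018, 3, 22, 10, 0))
```

Stepping minute by minute is slow for long windows, but for a 2-hour gap it is only 120 checks per entry and keeps the logic easy to follow.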

In Autosys there is no easy way to do this with the native Autosys commands.
However, you can get all of this information from the database. It is in the WCC DB, in the table dbo.MON_JOB. A query to get you started:
SELECT [NAME]
,[NEXT_TIME]
FROM [CAAutosys11_3_5_WCC_PRD].[dbo].[MON_JOB]
WHERE [NEXT_TIME] > '1970'
ORDER BY [NEXT_TIME] ASC
Let me know if you need more clarification.
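To narrow the query to a 2-hour window, the `WHERE` clause can take a lower and an upper bound. The sketch below shows the idea against an in-memory SQLite table that merely stands in for `MON_JOB`. It assumes `NEXT_TIME` holds a sortable date-time value, which is not confirmed here; the real column type and format in the WCC DB may differ, and the job names are invented.

```python
import sqlite3

def jobs_due_between(conn, start, end):
    """Return (name, next_time) rows whose NEXT_TIME falls in [start, end)."""
    cur = conn.execute(
        "SELECT NAME, NEXT_TIME FROM MON_JOB "
        "WHERE NEXT_TIME >= ? AND NEXT_TIME < ? "
        "ORDER BY NEXT_TIME ASC",
        (start, end),
    )
    return cur.fetchall()

# Stand-in table with hypothetical rows; replace with a connection to the real DB
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MON_JOB (NAME TEXT, NEXT_TIME TEXT)")
conn.executemany(
    "INSERT INTO MON_JOB VALUES (?, ?)",
    [
        ("job_a", "2018-03-22 07:30:00"),
        ("job_b", "2018-03-22 08:15:00"),
        ("job_c", "2018-03-22 09:45:00"),
        ("job_d", "2018-03-22 10:00:00"),
    ],
)
due = jobs_due_between(conn, "2018-03-22 08:00:00", "2018-03-22 10:00:00")
```

Using a half-open window (start included, end excluded) means consecutive 2-hour runs never report the same job twice.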

Related

run 2 scripts in same DAG with different schedule

Let's say you have 2 scripts: Daily_summary.py and Weekly_summary.py.
You could create 2 separate DAGs with daily and weekly schedules, but is it possible to solve this with 1 DAG?
I've tried a daily schedule, and simply putting this at the bottom (simplified):
if datetime.today().strftime('%A') == 'Sunday':
    SSHOperator(run weekly_summary.py)
But the problem is that if the task is still running at midnight on Sunday, Airflow will terminate it, since the operator no longer exists in the DAG on Monday.
If I could somehow get the day of the week of the execution date, that would solve it. But with Jinja templating, '{{ds}}' is not actually a 'yyyy-mm-dd' string at that point, so I cannot convert it to a date with the datetime package. It only becomes a date string AFTER the Airflow script gets executed.
You should dynamically generate two DAGs, but you can reuse the same code for both. This is the power of Airflow: it is Python code, so you can easily use the same code to generate the same DAG "structure" with two different dag ids and two different schedules.
See this nice article from Astronomer with some examples: https://www.astronomer.io/guides/dynamically-generating-dags
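If a single DAG is still preferred, the day-of-week check the question asks for can be made at run time, when `ds` has already been rendered to a 'yyyy-mm-dd' string, instead of at parse time with `datetime.today()`. A minimal sketch of such a check follows. The function name is made up, and the wiring into Airflow is not shown: one option, as an assumption, would be to use it as the callable of a ShortCircuitOperator placed upstream of the weekly task.

```python
from datetime import datetime

def is_weekly_run(ds: str) -> bool:
    """True if the run date, given as a 'YYYY-MM-DD' string, is a Sunday."""
    # weekday() numbers Monday as 0, so Sunday is 6
    return datetime.strptime(ds, "%Y-%m-%d").weekday() == 6

sunday = is_weekly_run("2018-03-25")
thursday = is_weekly_run("2018-03-22")
```

Because the weekly task then always exists in the DAG and is only skipped on other days, it does not disappear from the DAG definition when the calendar day changes.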

Airflow : Tasks in a dag with different intervals, or a subdag running at a different frequency than the parent dag

We are using Airflow as workflow manager and scheduler.
Requirements
1- We have an ETL pipeline in which data arrives hourly in different files and needs to be processed once it has arrived.
2- Data for every hour has a cut-off limit until which it can be updated, and once updated, the data needs to be reprocessed.
To solve the first, we can use a file sensor with hourly macros to look for the file and start processing once the data is available.
For the second requirement, we were thinking of using some kind of subdag/task that can run at a different frequency until the cut-off time and process the data if there is any update.
But in Airflow we couldn't find anything that can run a task/subdag at a different frequency.
How can we achieve this?

Schedule an Oozie workflow at two different frequencies

I have an Oozie job that processes data incrementally. Going forward, I would like to run this job on an hourly basis to prepare the results as soon as possible. But to backfill old data, it would be faster to run sequential jobs processing a week's worth of data at a time.
Is it possible to have a single coordinator.xml file that allows for both of these modes, and simply choose between them based on a flag specified ad-hoc when the job is scheduled?
In the parameters of the <coordinator-app> tag in coordinator.xml, there is a single frequency, which suggests that this is not possible, at least not in a natural way.
I don't think there is an easy way to do different frequencies within a coordinator. Based on your description you would not need the weekly job after the backfill happens.
I imagine you'd have to change the parameterization of the workflow as well to process more or less data.
On the other hand, you could just start the coordinator in the past with the frequency you'd like and tweak parameters like concurrency, throttle and execution in the app definition so Oozie can chew through the backlog by executing the workflow in parallel.
My eventual solution was to create the workflow at a given frequency (say, daily), and then create a second "backfill" workflow with a different frequency (weekly or monthly) that calls the original workflow as a sub-workflow.

Is Airflow a good fit for DAG that doesn’t care about execution date/time?

The API in Airflow seems to suggest it is built around backfilling, catching up and scheduling runs at regular intervals.
I have an ETL that extracts data on S3 with the versions of the previous node (where the data comes from) in the DAG. For example, here are the nodes of the DAG:
ImageNet-mono
ImageNet-removed-red
ImageNet-mono-scaled-to-100x100
ImageNet-removed-red-scaled-to-100x100
where ImageNet-mono is the previous node of ImageNet-mono-scaled-to-100x100 and
where ImageNet-removed-red is the previous node of ImageNet-removed-red-scaled-to-100x100
Both of them go through transformation of scaled-to-100x100 pipeline but producing different data since the input is different.
As you can see, no date is involved. Is Airflow a good fit?
EDIT
Currently, the graph is simple enough to be managed manually, with fewer than 10 nodes. They won't run at regular intervals. Instead, as soon as someone updates the code for a node, I have to run the downstream nodes manually one by one: python GetImageNet.py removed-red, then python scale.py 100 100 ImageNet-removed-red, and then python scale.py 100 100 ImageNet-mono. I am looking for a way to manage the graph so that one click triggers the run.
I think it's fine to use Airflow as long as you find it useful to use the DAG representation. If your DAG does not need to be executed on a regular schedule, you can set the schedule to None instead of a crontab. You can then trigger your DAG via the API or manually via the web interface.
If you want to run specific tasks you can trigger your DAG and mark tasks as success or clear them using the web interface.
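To make concrete what the DAG representation gives you, here is a plain-Python sketch (not Airflow code) that writes down the four nodes from the question and computes which nodes have to be re-run, and in what order, when one of them changes. It uses the standard-library `graphlib` module (Python 3.9+); the `rerun_order` helper is made up for illustration. Airflow does this bookkeeping for you once the tasks and their dependencies are declared.

```python
from graphlib import TopologicalSorter

# node -> set of upstream nodes it is built from (taken from the question)
UPSTREAM = {
    "ImageNet-mono": set(),
    "ImageNet-removed-red": set(),
    "ImageNet-mono-scaled-to-100x100": {"ImageNet-mono"},
    "ImageNet-removed-red-scaled-to-100x100": {"ImageNet-removed-red"},
}

def rerun_order(changed):
    """Return `changed` plus everything downstream of it, in a valid run order."""
    affected = {changed}
    grew = True
    while grew:  # keep adding nodes that depend on an already-affected node
        grew = False
        for node, parents in UPSTREAM.items():
            if node not in affected and parents & affected:
                affected.add(node)
                grew = True
    # static_order() yields every node after all of its upstream nodes
    return [n for n in TopologicalSorter(UPSTREAM).static_order() if n in affected]

order = rerun_order("ImageNet-removed-red")
```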

Control-M scheduling changes

What changes can we make to Control-M job scheduling to minimize charges, if we are charged on the basis of the number of jobs ordered per day in the active schedule?
This is costing us a lot.
If some of your jobs are commands and share broad characteristics (nodeid, user, no alerts) then use conditional operators. E.g. linking commands with a semicolon will mean that each command is executed once. Linking with && will mean that the second command is only executed if the first runs successfully.
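The difference between the two operators can be seen with a quick experiment. The sketch below runs two command lines through the shell from Python and captures the exit code and output. The `run_shell` helper is made up for illustration, and a POSIX shell is assumed.

```python
import subprocess

def run_shell(command):
    """Run a command line through the shell; return (exit code, stdout)."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.returncode, done.stdout.strip()

# `false` always fails; see what happens to the command that follows it
sequential = run_shell("false; echo second")    # second command still runs
conditional = run_shell("false && echo second")  # second command is skipped
```

With `;` the exit code reported is that of the last command, so a failure of the first command is hidden from the scheduler. With `&&` the failure is passed on as the exit code of the combined command line.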
As Alex has mentioned, this is a broad area and would be very difficult to answer to the point. But below are a few tips that can be considered.
1. Check if the same script is being run by different jobs. These can be combined with the help of the scheduling tab.
2. File watcher jobs. If there is a requirement to check for an incoming file and then trigger a specific job to process the file [this makes up 2 jobs: Job1 - File watcher, Job2 - File Processing], this can be achieved by using AFT jobs. AFT jobs combine these two functions in one.
3. Low-priority jobs, where alerting is not required, can be moved to unix/shell scripts if jobs are costly.
4. If Job2 follows Job1 and Job2 has only one IN CONDITION, i.e. from Job1, then rather than having two jobs, the script of Job2 can be called from the script of Job1. So, ultimately, we are doing two functions in Job1. Also, if the script of Job2 fails, then Job1 will not get a success return code, and you can get the details from the log.
5. Keep the archiving functionality in the scripts; there is no need to bring it into the Control-M jobs unless it is very important. And rather than doing it every day for the past 6 months, it would be better to schedule it once a week or once every two weeks.
6. Sort the jobs in such a way that the regular jobs are in one table and the adhoc jobs (which are run only on request) are in another. Keep the 'UserDaily' only for the jobs that are regular. Not setting the 'UserDaily' for the adhoc jobs means they will not be called into the EM daily, so you see only the jobs that run daily and not the ones that might or might not run daily.
Hope this helps.
You can use the ctmudly command to order only the user daily you want.
You can try using crontab in unix to schedule non-priority jobs that don't need manual intervention or observation.
You can avoid the FW jobs by including file checker logic in the shell script that executes the actual process.
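The file checker logic can be as simple as a polling loop with a timeout. Here is a rough sketch of the idea, written in Python rather than shell. The function name, the timeout and the polling interval are all assumptions for illustration.

```python
import os
import time

def wait_for_file(path, timeout_s, poll_s=1.0):
    """Poll until `path` exists; return False if it has not appeared within timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The processing step would then run only when the function returns True, and the script would exit with a non-zero code otherwise, so that the single job still fails visibly when the file never arrives.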

Resources