Scheduling Dependent Jobs in Control-M

I have a flow that needs to be triggered automatically every Tuesday.
It contains 4 jobs, each dependent on the previous one:
JobA<-JobB<-JobC<-JobD
I have put all 4 jobs in one SMART folder.
Now I want to schedule this flow so that as soon as one job completes, the next job is triggered.
How can I achieve this?

Control-M has a design element called "Conditions" that defines the flow of a process. Conditions determine which Jobs are predecessors and which are successors through their Input and Output Conditions: the Output Condition of one Job serves as the Input Condition of the next.
Keep in mind that a Condition in Control-M is a prerequisite for the execution of a Job and is composed of the following elements:
A name
A date
A sign
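
To make the idea concrete, here is a minimal Python sketch of how conditions chain jobs together. It is only a toy model and not a Control-M API: the `Condition`, `Job`, `chain` and `run_flow` names, the `-ENDED-OK` naming pattern and the `"ODAT"` date value are assumptions made for illustration, and it assumes JobA is the first job in the chain.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Condition:
    name: str  # hypothetical naming pattern, e.g. "JobA-ENDED-OK"
    date: str  # the order date the condition belongs to


@dataclass
class Job:
    name: str
    in_conditions: list = field(default_factory=list)   # all must exist before the job runs
    out_conditions: list = field(default_factory=list)  # (sign, Condition): "+" adds, "-" deletes


def chain(names, date="ODAT"):
    """Build a linear flow where each job waits for the condition of the previous one."""
    jobs, previous = [], None
    for name in names:
        done = Condition(f"{name}-ENDED-OK", date)
        ins = [previous] if previous else []
        outs = [("+", done)]
        if previous:
            outs.append(("-", previous))  # clean up the condition that was consumed
        jobs.append(Job(name, ins, outs))
        previous = done
    return jobs


def run_flow(jobs):
    """Run every job whose input conditions are present until nothing else can start."""
    pool, executed, pending = set(), [], list(jobs)
    progressed = True
    while pending and progressed:
        progressed = False
        for job in list(pending):
            if all(cond in pool for cond in job.in_conditions):
                executed.append(job.name)
                for sign, cond in job.out_conditions:
                    if sign == "+":
                        pool.add(cond)
                    else:
                        pool.discard(cond)
                pending.remove(job)
                progressed = True
    return executed


print(run_flow(chain(["JobA", "JobB", "JobC", "JobD"])))
```

Each job starts as soon as the condition added by its predecessor appears, which is the "next job triggers when the previous one completes" behaviour asked about in the question.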


Schedule an Oozie workflow at two different frequencies

I have an Oozie job that processes data incrementally. Going forward, I would like to run this job on an hourly basis to prepare the results as soon as possible. But to backfill old data, it would be faster to run sequential jobs processing a week's worth of data at a time.
Is it possible to have a single coordinator.xml file that allows for both of these modes, and simply choose between them based on a flag specified ad-hoc when the job is scheduled?
In the parameters of the <coordinator-app> tag in coordinator.xml, there is a single frequency, which suggests that this is not possible, at least not in a natural way.
I don't think there is an easy way to do different frequencies within a coordinator. Based on your description you would not need the weekly job after the backfill happens.
I imagine you'd have to change the parameterization of the workflow as well to process more or less data.
On the other hand, you could just start the coordinator in the past with the frequency you'd like and tweak parameters like concurrency, throttle and execution in the app definition so Oozie can chew through the backlog by executing the workflow in parallel.
My eventual solution was to create the workflow at a given frequency (say, daily), and then create a second "backfill" workflow with a different frequency (weekly or monthly) that calls the original workflow as a sub-workflow.
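
The two modes differ mainly in how much data each run covers. As a rough illustration (plain Python, not Oozie configuration), the hypothetical helper `processing_windows` below splits a time range into the windows each run would process, so the same workflow logic could be driven hourly for normal operation or weekly for a backfill.

```python
from datetime import datetime, timedelta


def processing_windows(start, end, step):
    """Yield (window_start, window_end) pairs covering [start, end) in step-sized chunks."""
    current = start
    while current < end:
        window_end = min(current + step, end)  # the last window may be shorter
        yield current, window_end
        current = window_end


# Backfill mode: one run per week of old data.
backfill = list(processing_windows(datetime(2020, 1, 1), datetime(2020, 1, 15), timedelta(weeks=1)))

# Regular mode: one run per hour.
hourly = list(processing_windows(datetime(2020, 1, 15), datetime(2020, 1, 16), timedelta(hours=1)))

print(len(backfill), len(hourly))
```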

Is Airflow a good fit for a DAG that doesn’t care about execution date/time?

The API in Airflow seems to suggest it is built around backfilling, catching up and running on a regular schedule.
I have an ETL that extracts data to S3, where each node is versioned by the version of the previous node in the DAG (the node its data comes from). For example, here are the nodes of the DAG:
ImageNet-mono
ImageNet-removed-red
ImageNet-mono-scaled-to-100x100
ImageNet-removed-red-scaled-to-100x100
where ImageNet-mono is the previous node of ImageNet-mono-scaled-to-100x100 and
where ImageNet-removed-red is the previous node of ImageNet-removed-red-scaled-to-100x100
Both of them go through the same scaled-to-100x100 transformation pipeline but produce different data, since the input is different.
As you can see, no date is involved. Is Airflow a good fit?
EDIT
Currently the graph has fewer than 10 nodes and is simple enough to manage manually. The nodes won't run at regular intervals. Instead, as soon as someone updates the code for a node, I have to run the downstream nodes manually one by one: python GetImageNet.py removed-red, then python scale.py 100 100 ImageNet-removed-red, and then python scale.py 100 100 ImageNet-mono. I am looking for a way to manage the graph and trigger the whole run with one click.
I think it's fine to use Airflow as long as you find it useful to use the DAG representation. If your DAG does not need to be executed on a regular schedule, you can set the schedule to None instead of a crontab. You can then trigger your DAG via the API or manually via the web interface.
If you want to run specific tasks you can trigger your DAG and mark tasks as success or clear them using the web interface.
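
For a graph this small, the "rerun everything downstream of the node that changed" step can also be sketched with the standard library alone, independent of Airflow. The snippet below is only an illustration: the `DEPENDS_ON` mapping is written by hand from the node names in the question, `downstream_of` is a hypothetical helper, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# node -> set of nodes it depends on (its "previous" nodes)
DEPENDS_ON = {
    "ImageNet-mono": set(),
    "ImageNet-removed-red": set(),
    "ImageNet-mono-scaled-to-100x100": {"ImageNet-mono"},
    "ImageNet-removed-red-scaled-to-100x100": {"ImageNet-removed-red"},
}


def downstream_of(changed, depends_on):
    """Return `changed` plus everything that depends on it, in a valid run order."""
    affected = {changed}
    grew = True
    while grew:  # keep expanding until no new dependents are found
        grew = False
        for node, parents in depends_on.items():
            if node not in affected and parents & affected:
                affected.add(node)
                grew = True
    # static_order() lists every node after all of its dependencies
    order = TopologicalSorter(depends_on).static_order()
    return [node for node in order if node in affected]


print(downstream_of("ImageNet-removed-red", DEPENDS_ON))
# ['ImageNet-removed-red', 'ImageNet-removed-red-scaled-to-100x100']
```

Each name in the returned list could then be mapped to the command that rebuilds that node.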

Control-M: is it possible to continue running if the first job fails?

I have several jobs that will run in sequence. Is it possible to create a dependency between them based only on completion, without requiring that the prior job completes successfully?
If a job fails, it should remain red, and the flow should move on to the next job and continue running.
It is mandatory that these jobs run in sequence and not in parallel.
As Mark outlined, you can simply create an On-Do action within the parent job that adds a condition when the job ends Not OK. The parent job will still go red, and the successor job will kick off.
Yes. On the Actions tab, create an On/Do step that says the job should add its output condition when it ends Not OK. This way the next job will run (in sequence) regardless of what happens to the predecessor job.
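
The effect of that On/Do step can be illustrated with a small Python simulation. This is a toy model rather than anything Control-M provides: `run_sequence`, the `-ENDED` condition name and the `add_condition_on_not_ok` switch are all made up for the example.

```python
def run_sequence(jobs, add_condition_on_not_ok=True):
    """jobs: list of (name, callable); each callable returns True for OK, False for Not OK."""
    statuses = {}
    conditions = set()
    previous = None
    for name, action in jobs:
        predecessor, previous = previous, name
        # A job only starts once the condition of its predecessor exists.
        if predecessor is not None and f"{predecessor}-ENDED" not in conditions:
            statuses[name] = "WAITING"
            continue
        ok = bool(action())
        statuses[name] = "OK" if ok else "NOT OK"  # a failed job still shows as failed
        # Normal out condition on OK; the On/Do step adds the same condition on Not OK.
        if ok or add_condition_on_not_ok:
            conditions.add(f"{name}-ENDED")
    return statuses


jobs = [("Job1", lambda: True), ("Job2", lambda: False), ("Job3", lambda: True)]

print(run_sequence(jobs))
# {'Job1': 'OK', 'Job2': 'NOT OK', 'Job3': 'OK'}
print(run_sequence(jobs, add_condition_on_not_ok=False))
# {'Job1': 'OK', 'Job2': 'NOT OK', 'Job3': 'WAITING'}
```

With the extra action, Job2 stays failed but Job3 still runs after it; without it, Job3 waits indefinitely.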

Control-M scheduling changes

What changes can we make to our Control-M job scheduling to minimize charges, given that we are charged based on the number of jobs ordered per day in the active schedule?
This is costing us a lot.
If some of your jobs are commands and share broad characteristics (nodeid, user, no alerts), then combine them into one job using shell operators. E.g. linking commands with a semicolon means that each command is executed in turn, whether or not the previous one succeeded. Linking with && means that the second command is only executed if the first runs successfully.
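
The difference between the two operators is easy to check. The sketch below runs both forms through `sh` from Python; it assumes a POSIX shell is available, and `run_shell` is just a helper name chosen for this example.

```python
import subprocess


def run_shell(command_line):
    """Run a command line through sh and return (exit code, stripped stdout)."""
    result = subprocess.run(["sh", "-c", command_line], capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


# ';' runs the second command even though the first one failed.
print(run_shell("false; echo second"))    # (0, 'second')

# '&&' skips the second command because the first one failed.
print(run_shell("false && echo second"))  # (1, '')
```

Note that with a semicolon the job's exit code is that of the last command, so an earlier failure is not reported to the scheduler.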
As Alex has mentioned, this is a broad area, and it is very difficult to answer to the point. But below are a few tips that can be considered.
1. Check if the same script is being run by different jobs. These can be combined into one job with the help of the scheduling tab.
2. File watcher jobs. Suppose there is a requirement to check for an incoming file and then trigger a specific job to process it. [This makes up 2 jobs: Job1 - File watcher, Job2 - File processing] This can be achieved with AFT jobs, which combine these two functions in one.
3. Low priority jobs, where alerting is not required, can be moved to unix/shell scripts if jobs are costly.
4. If Job2 follows Job1, and Job2 has only 1 IN CONDITION, i.e. the one from Job1, then rather than having two jobs, the script of Job2 can be called from the script of Job1. So, ultimately, we are doing two functions in Job1. Also, if the Job2 script fails, Job1 will not get a success return code, and you can get the details from the log.
5. Keep the archiving functionality in the scripts; there is no need to bring it into Control-M jobs unless it is very important. And rather than doing it every day for the past 6 months, it would be better to schedule it once a week or once every two weeks.
6. Sort the jobs so that the regular jobs are in one table and the ad hoc jobs (which run only on request) are in another. Keep the 'UserDaily' only for the regular jobs. Without a 'UserDaily', the ad hoc jobs are not brought into the EM every day, so you see only the jobs that run daily and not the ones that might or might not run.
Hope this helps.
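
Tip 4 can be sketched as a small wrapper that runs both steps inside a single job and passes the first failure back to the scheduler as the exit code. This is an illustrative Python version under assumed names: `run_steps` is a hypothetical helper, and the two inline commands stand in for the real Job1 and Job2 scripts.

```python
import subprocess
import sys


def run_steps(commands):
    """Run commands in order; stop at the first failure and return its exit code."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            # The message ends up in the job log; the code becomes the job's return code.
            print(f"step failed: {command!r} (exit {result.returncode})", file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    steps = [
        [sys.executable, "-c", "print('former Job1 work')"],  # placeholder for the Job1 script
        [sys.executable, "-c", "print('former Job2 work')"],  # placeholder for the Job2 script
    ]
    sys.exit(run_steps(steps))
```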
You can use the ctmudly command to order only the user daily you want.
You can try using crontab in unix to schedule non-priority jobs which don't need manual intervention or observation.
You can avoid the file watcher (FW) jobs by including the file-checking logic in the shell script that executes the actual process.
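
A minimal sketch of that file-checking logic, written in Python rather than shell to keep the examples in one language. The helper names, the polling interval and the timeout are assumptions chosen for illustration.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout_seconds=60.0, poll_seconds=1.0):
    """Poll until `path` exists as a file; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if Path(path).is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


def process_when_ready(path, process, **wait_options):
    """Wait for the file, then hand it to `process`; fail loudly if it never arrives."""
    if not wait_for_file(path, **wait_options):
        raise TimeoutError(f"{path} did not arrive in time")
    return process(path)
```

A single job running `process_when_ready("/path/to/incoming.csv", handler)` (both arguments are placeholders) then covers what previously took a file watcher job plus a processing job.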

To run batch jobs one after the other

I am submitting jobs to the batch process one after the other.
How do I make sure that the second batch job runs only when the first one has finished?
Right now both jobs execute simultaneously, which I don't want to happen.
There are two options. You can do this through code, or just via manual setup. The manual method is fairly easy: go to (Basic>Inquiries>Batch Job), create a new batch job and save it. Then click "View Tasks" and create a new task, which will be your first batch task. Choose your class, description, batch group, etc., then save. Click "parameters" to set up the parameters.
After that, you can set up your dependent task. Make sure both tasks have descriptions. Add your second batch task and save. Then, in the lower left corner, click on the task that should have a condition, add a row there and set up your conditions so that the task won't start until the other has completed.
Via X++ code, you would create a BatchHeader where you set up basically the same thing we just did manually. You use .addDependency to make one task dependent on the completion of the other. This walkthrough will get you started with a job to create the batch header, and you'll just have to play around to get the dependency working.
