Difference between ON ICE and ON HOLD jobs in Autosys

What is the difference between putting a job on hold and putting it on ice?

There are two notable differences between ON_HOLD and ON_ICE jobs, which dictate when to use each. First, when an ON_HOLD job is taken off hold, it runs if its starting conditions are satisfied, while an ON_ICE job will not run after being put OFF_ICE even if its starting conditions are met; it will only run the next time its starting condition occurs. For example, if a job that starts your Java services at 3:00 A.M. was ON_HOLD, it will run as soon as you put it OFF_HOLD and its starting condition is met, while if it was ON_ICE, it will only run the next day.

Second, and most importantly, the two states differ in how dependent jobs are treated. All dependent jobs of an ON_ICE job execute as though the ON_ICE job had succeeded, while dependent jobs of an ON_HOLD job will not run until that job is taken off hold. In the case of box jobs, suppose you have four jobs inside a box job, and the 3rd job depends on the success of the 2nd job, which was put ON_ICE; when the box job starts, both the 1st job and the 3rd job will start immediately, because ON_ICE causes dependent jobs to run right away. (The sendevent calls that toggle these states are sketched after the summary.)
Summary:
1. Dependent jobs of an ON_HOLD job don't run, but dependent jobs of an ON_ICE job run as though it succeeded.
2. An ON_ICE job doesn't run when it is put OFF_ICE, even if its starting condition is met, while an ON_HOLD job runs as soon as you put it OFF_HOLD and its starting conditions are met (you can change its state to INACTIVE if this is undesirable).
3. Dependent jobs of an ON_ICE job inside a box job will run immediately once the box job is started.
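For reference, here is how these states are toggled with the Autosys sendevent utility; a minimal sketch, wrapped in Python for consistency with the later examples (the job name is hypothetical):

    import subprocess

    def send_event(event, job):
        # JOB_ON_ICE / JOB_OFF_ICE / JOB_ON_HOLD / JOB_OFF_HOLD are the
        # standard Autosys events for toggling these states.
        subprocess.run(["sendevent", "-E", event, "-J", job], check=True)

    job = "start_java_services"  # hypothetical job name
    send_event("JOB_ON_HOLD", job)   # dependents wait until the job is off hold
    send_event("JOB_OFF_HOLD", job)  # job runs as soon as its conditions are met
    send_event("JOB_ON_ICE", job)    # dependents run as though the job succeeded
    send_event("JOB_OFF_ICE", job)   # job waits for its condition to recur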
Read more: http://javarevisited.blogspot.com/2013/08/difference-between-on-hold-and-on-ice-jobs-autosys-interview-question.html

Related

Kill job if execution time less than 2 min

I have a chain with several jobs, and sometimes a certain job that normally takes about 2 hours finishes in less than 2 minutes.
What I would like to do is to kill this job if it ends in less than 2 minutes so that the chain won't proceed.
Is this possible?
Thanks
Well you don't really want to kill anything, do you? If you do, see BMC's note (including video) on using ctmkilljob.
In this case your next job is dependent on 2 things, the predecessor job being ok and the duration of the predecessor job. Add another input condition to your next job (in addition to the existing condition) to represent the >2 minutes duration.
On the job that needs to run for more than 2 minutes, add a Notify when the Exectime exceeds 2 mins (or 60 mins or whatever you decide is the threshold) and get it to shout to an entry in your shout destination table.
On the Control-M Server, create a new program entry in your shout destination table and create a small script referenced by the shout. The script should use the ctmcontb utility to create a new ODAT condition that your next job is waiting on. If you have a look at the BMC help note for ctmkilljob (and just substitute in ctmcontb) then you'll see how to do this.
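A minimal sketch of such a script in Python, assuming ctmcontb is on the PATH (the condition name is hypothetical, and the odate handling is an assumption; check ctmcontb's usage on your version for the exact argument order):

    import subprocess
    from datetime import date

    # Hypothetical condition name that the next job waits on, in addition
    # to its existing predecessor-ok condition.
    CONDITION = "PRED-RAN-LONG-ENOUGH"

    # Assumption: the condition is dated with today's order date in MMDD
    # form; match whatever odate convention your jobs actually use.
    odat = date.today().strftime("%m%d")

    # ctmcontb -ADD creates the condition that releases the waiting job.
    subprocess.run(["ctmcontb", "-ADD", CONDITION, odat], check=True)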

Airflow: Concurrency Depth first, rather than breadth first?

In Airflow, the default configuration seems to queue up tasks in parallel across days, from one day to the next.
However, if I spin this process up across, say, two years, then the Airflow DAG will churn through the preliminary tasks first, across all days, rather than taking, say, 4 days from start to finish concurrently.
How do I toggle Airflow to execute tasks according to a depth-first paradigm rather than a breadth-first paradigm?
I have come across a similar situation. I used the following trick to achieve that depth-first behaviour.
Assign all tasks of your DAG to a single pool (with a limited number of slots, say 20-30)
Set weight_rule='upstream' on all of the above tasks
Explanation
The upstream weight_rule reverses the prioritization of tasks based on their position across the breadth of the workflow, so that all downstream tasks have a higher priority than upstream tasks.
This ensures that whatever branches are launched will run to completion before the next branch is picked up, thereby achieving the depth-first behaviour; a sketch follows below.
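Here is a minimal sketch of that setup, assuming Airflow 2.x and a pre-created pool (the pool name, slot count, and task ids are hypothetical):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Assumes a pool named "depth_first_pool" was created under
    # Admin -> Pools with a limited number of slots (say 20).
    default_args = {
        "pool": "depth_first_pool",
        "weight_rule": "upstream",  # downstream tasks get higher priority
    }

    with DAG(
        dag_id="depth_first_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=True,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extract")
        transform = BashOperator(task_id="transform", bash_command="echo transform")
        load = BashOperator(task_id="load", bash_command="echo load")
        extract >> transform >> load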
Alternatively, try tuning the parallelism and max_active_runs parameters in your airflow.cfg and the concurrency parameter on your DAGs.

How to run Airflow DAG for specific number of times?

How can I run an Airflow DAG a specified number of times?
I tried using TriggerDagRunOperator; this operator works for me.
In the callable function we can check states and decide whether to continue or not.
However, the current count and states need to be maintained.
Using the above approach, I am able to repeat a DAG run (see the sketch below).
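Here is a rough sketch of that self-retriggering pattern, assuming Airflow 2.x (the dag id, task ids, and run limit are hypothetical; it also assumes conf is a templated field of TriggerDagRunOperator, which holds in recent 2.x releases):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import ShortCircuitOperator
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    MAX_RUNS = 10  # hypothetical repeat count

    def _should_continue(**context):
        # The current iteration count travels in dag_run.conf.
        count = int((context["dag_run"].conf or {}).get("count", 1))
        return count < MAX_RUNS  # False skips the re-trigger task

    with DAG(
        dag_id="self_repeating_dag",      # hypothetical name
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,           # triggered manually / by itself
        catchup=False,
    ) as dag:
        work = BashOperator(task_id="work", bash_command="echo doing work")
        check = ShortCircuitOperator(
            task_id="check_count", python_callable=_should_continue
        )
        retrigger = TriggerDagRunOperator(
            task_id="retrigger_self",
            trigger_dag_id="self_repeating_dag",
            conf={"count": "{{ (dag_run.conf.get('count', 1) | int) + 1 }}"},
        )
        work >> check >> retrigger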
I need an expert opinion: is there any other, more profound way to run an Airflow DAG X number of times?
Thanks.
I'm afraid that Airflow is ENTIRELY about time-based scheduling.
You can set a schedule to None and then use the API to trigger runs, but you'd be doing that externally, and thus maintaining the counts and states that determine when and why to trigger externally.
When you say that your DAG may have 5 tasks which you want to run 10 times and a run takes 2 hours and you cannot schedule it based on time, this is confusing. We have no idea what the significance of 2 hours is to you, or why it must be 10 runs, nor why you cannot schedule it to run those 5 tasks once a day. With a simple daily schedule it would run once a day at approximately the same time, and it won't matter that it takes a little longer than 2 hours on any given day. Right?
You could set the start_date to 11 days ago (a fixed date though, don't set it dynamically), and the end_date to today (also fixed) and then add a daily schedule_interval and a max_active_runs of 1 and you'll get exactly 10 runs and it'll run them back to back without overlapping while changing the execution_date accordingly, then stop. Or you could just use airflow backfill with a None scheduled DAG and a range of execution datetimes.
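For example, a minimal sketch of that fixed-window approach (the dag id and dates are hypothetical, and Airflow's execution_date semantics make the exact run count worth verifying on your version):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="exactly_n_runs",            # hypothetical name
        start_date=datetime(2021, 6, 1),    # fixed; don't set dynamically
        end_date=datetime(2021, 6, 11),     # fixed
        schedule_interval="@daily",
        max_active_runs=1,                  # back to back, no overlap
        catchup=True,
    ) as dag:
        BashOperator(task_id="task", bash_command="echo run")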
Do you mean that you want it to run every 2 hours continuously, but sometimes it will be running longer and you don't want it to overlap runs? Well, you definitely can schedule it to run every 2 hours (0 0/2 * * *) and set the max_active_runs to 1, so that if the prior run hasn't finished the next run will wait then kick off when the prior one has completed. See the last bullet in https://airflow.apache.org/faq.html#why-isn-t-my-task-getting-scheduled.
If you want your DAG to run exactly every 2 hours on the dot [give or take some scheduler lag, yes that's a thing] and to leave the prior run going, that's mostly the default behavior, but you could add depends_on_past to some of the important tasks that themselves shouldn't be run concurrently (like creating, inserting to, or dropping a temp table), or use a pool with a single slot.
There isn't any feature to kill the prior run if your next schedule is ready to start. It might be possible to skip the current run if the prior one hasn't completed yet, but I forget how that's done exactly.
That's basically most of your options there. Also, you could create manual dag_runs for an unscheduled DAG, creating 10 at a time whenever you feel like it (using the UI or CLI instead of the API, but the API might be easier).
Do any of these suggestions address your concerns? Because it's not clear why you want a fixed number of runs, how frequently, or with what schedule and conditions, it's difficult to provide specific recommendations.
This functionality isn't natively supported by Airflow, but by exploiting the meta-db we can cook up this functionality ourselves:
Write a custom operator / Python operator.
Before running the actual computation, check whether 'n' runs of the task (TaskInstance table) already exist in the meta-db (refer to task_command.py for help).
If they do, just skip the task (raise AirflowSkipException); a sketch follows after the note below.
This excellent article can be used for inspiration: Use apache airflow to run task exactly once
Note
The downside of this approach is that it assumes historical runs of the task (TaskInstances) are preserved forever (and correctly).
In practice though, I've often found task_instances to be missing (we have catchup set to False).
Furthermore, on large Airflow deployments, one might need to set up routine cleanup of the meta-db, which would make this approach unworkable.
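A minimal sketch of that check, assuming Airflow 2.x, to be used as the python_callable of a PythonOperator (the run limit is hypothetical):

    from airflow.exceptions import AirflowSkipException
    from airflow.models import TaskInstance
    from airflow.utils.session import provide_session
    from airflow.utils.state import State

    MAX_SUCCESSFUL_RUNS = 10  # hypothetical limit

    @provide_session
    def run_at_most_n_times(session=None, **context):
        ti = context["ti"]
        # Count prior successful TaskInstances of this task in the meta-db.
        n_done = (
            session.query(TaskInstance)
            .filter(
                TaskInstance.dag_id == ti.dag_id,
                TaskInstance.task_id == ti.task_id,
                TaskInstance.state == State.SUCCESS,
            )
            .count()
        )
        if n_done >= MAX_SUCCESSFUL_RUNS:
            raise AirflowSkipException(f"already ran {n_done} times")
        # ... the actual computation goes here ...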

dependent job not starting due to s condition

Job B
Condition s(Job A, 9.00)
Start time: 9:15
Job A has run within the past 9 hours. Sometimes Job A starts running again just before the scheduled start time of Job B (9:15). Because of this, Job B cannot start, as Job A is RU (running) instead of SU (success).
What kind of condition on Job B would solve this issue?
Does jobB really need to depend on the success of jobA? If yes, then all you need to do is wait for jobA to complete, and jobB will start automatically after that. If not, you can simply remove jobA from the run condition of jobB and keep the start time of jobB as 9:15. Also, do your jobs reside in the same box? Does jobA run multiple times a day?

Autosys Job Queue

I'm trying to set[1] an Autosys jobs configuration that will have a "funnel" job-queue behavior, or, as I call it, a "waterdrops" pattern: each job executing in sequence after a given time interval, with a local job failure not cascading into sequence failure.
[1] (ask for it to be set up, actually, as I do not control the Autosys machine)
Constraints
I have an (arbitrary) N jobs (all executing on success of job A)
For this discussion, lets say three (B1, B2, B3)
Real production numbers might go upward of 100 jobs.
All these jobs won't be created at the same time, so adding a new job should be as painless as possible.
None of those should execute simultaneously.
This is not actually a direct problem for our machine,
but it has a side effect on a remote client machine: the jobs include file transfers, which are listened for by triggers on the client machine, and it doesn't handle simultaneous transfers well.
Adapting the client-machine behavior is, unfortunately, not possible.
Failure of job is meaningless to other jobs.
There should be a regular delay between each job.
This is a soft requirement in that, our jobs being batch scripts, we can always append or prepend a sleep command.
I'd rather, however, have a more elegant solution, especially if the delay is centralized: a parameter that could be set to greater values should the need arise.
State of my research
Legend
A(s): Success status of job
A(d): Done status of job
Solution 1: Unfailing sequence
This is the current "we should pick this solution" solution.
A(s) --(delay D)--> B1(d) --(delay D)--> B2(d) --(delay D)--> B3 ...
Pros:
Less bookkeeping than Solution 2
Cons:
Bookkeeping of the (current) tailing job
The sequence doesn't survive a job being put ON_HOLD (ON_ICE is fine).
Solution 2: Stairway parallelism
A(s) ==(delay D)==> B1
A(s) ==(delay D x2)==> B2
A(s) ==(delay D x3)==> B3
...
Pros:
Jobs can be put ON_HOLD without incident.
Cons:
Bookkeeping to know "who runs when" (and what delay to give the next job)
N jobs scheduled at the same time
An underlying race condition is created
Increased risk of job executions overlapping, especially if small delays accumulate
Solution 3: The Miracle Box?
I have read a bit about job boxes, but the specific details elude me.
           --------------
A(s) ====> | B1, B2, B3 |
           --------------
Can we limit the number of concurrent executions of jobs in a box (i.e. a box-local max_load, if I understand that parameter)?
Pros:
Adding jobs would be painless
Little to no bookkeeping (the box name, to add new jobs - and it's constant)
Jobs can be put ON_HOLD without incident (unless I'm mistaken)
Cons:
I'm half-convinced it can't be done (but that's why I'm asking you :) )
... any other problem I have failed to foresee
My questions to SO
Is Solution 3 a possibility, and if yes, what are the specific commands and parameters for implementing it?
Am I correct in favoring Solution 1 over Solution 2 otherwise[2]?
An alternative solution fitting the constraints is of course more than welcome!
Thanks in advance,
Best regards
PS: By the way, is all of this a giant race-condition manager for the remote machine's failing behavior?
Yes, it is.
[2] I'm aware this skirts a bit toward the "subjective" part of the question-rejection rules, but I'm asking it in regard to the solution(s)' correctness toward my (arguably) objective constraints.
I would suggest you do the following:
Put all the jobs (B1, B2, B3) in a box job B.
Create another job (say M1) that runs on success of A. This job will call a shell/Perl script (say forcejobs.sh).
The shell script will get a list of all the jobs in B and start a loop with a sleep interval of the delay period. Inside the loop it will force-start the jobs one by one, separated by the delay period.
So the outline of the script would be:
get all the jobs in B
for each job:
    force start the job
    sleep for the delay interval
At the end of the loop, when all jobs have been started, you can use a polling loop and keep checking the status of the jobs. Once all jobs are SU/FA or whatever, you can end the script, send the result to yourself/stdout, and finish job M1. A Python sketch of this follows below.
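Here is a rough Python equivalent of that script, using the standard sendevent and autorep utilities (the job names, delay, and status parsing are hypothetical placeholders):

    import subprocess
    import time

    JOBS = ["B1", "B2", "B3"]  # ideally fetched from box B, not hardcoded
    DELAY_SECONDS = 60         # the centralized delay parameter
    TERMINAL = {"SU", "FA", "TE"}  # success, failure, terminated

    def force_start(job):
        # FORCE_STARTJOB starts the job regardless of its starting conditions.
        subprocess.run(["sendevent", "-E", "FORCE_STARTJOB", "-J", job], check=True)

    def job_is_done(job):
        # Naive check: look for a terminal status token on the job's line of
        # autorep output; the exact column layout varies between versions.
        out = subprocess.run(
            ["autorep", "-J", job], capture_output=True, text=True, check=True
        ).stdout
        for line in out.splitlines():
            if line.startswith(job):
                return any(tok in TERMINAL for tok in line.split())
        return False

    # Kick the jobs off one by one, spaced by the delay.
    for job in JOBS:
        force_start(job)
        time.sleep(DELAY_SECONDS)

    # Poll until every job reaches a terminal state, then report and exit.
    while not all(job_is_done(j) for j in JOBS):
        time.sleep(30)
    print("all jobs finished")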
