There are 5 jobs inside a Main Box job, like this:
MainBox
Job1 -> Job2 -> Job3 -> Job4 -> Job5
Job2 is dependent on Job1, Job3 is dependent on Job2, and so on.
The dependency is implemented with the condition attribute. For instance, Job2 has condition: success(Job1).
On starting the MainBox, the jobs run in sequence. Suppose Job3 fails. How do I restart the jobs inside MainBox from the failed Job3?
If I manually force start Job3, it runs, but the dependent Job4 does not start after Job3 succeeds.
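For reference, the setup described above would look roughly like this in JIL (machine and command values are placeholders; Job3 to Job5 follow the same pattern as Job2):
insert_job: MainBox
job_type: b

insert_job: Job1
job_type: c
box_name: MainBox
machine: some_host
command: /path/to/job1.sh

insert_job: Job2
job_type: c
box_name: MainBox
machine: some_host
command: /path/to/job2.sh
condition: success(Job1)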
Actually, in the setup you have mentioned, if any of the jobs fails and "box_terminator: y" is set for the jobs, the Box will fail and the downstream jobs will go to the INACTIVE state. In that case you will have to complete the downstream manually by force starting each job, as the conditions will not work while the Box itself is failed.
To achieve what you need, make a small tweak to the jobs: set "box_terminator: n". Then, on any job's failure, the Box will stay in the RUNNING state and the downstream jobs will stay in the ACTIVATED state, waiting for the failed job to succeed. You can then either mark the failed job as SUCCESS to skip its run, or force start it to completion; since the Box is still running, the dependencies will work for each job within it.
But be sure to include "term_run_time: value_in_minutes" for the Box, so that the Box fails after some time if the failed job is left unattended; otherwise the Box will run indefinitely.
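A sketch of that tweak in JIL, plus the sendevent calls for handling the failed job (the 120-minute value is only an example; repeat the box_terminator change for Job3, Job4 and Job5):
update_job: Job2
box_terminator: n

update_job: MainBox
term_run_time: 120

sendevent -E CHANGE_STATUS -s SUCCESS -J Job3
sendevent -E FORCE_STARTJOB -J Job3
Use the first sendevent to mark the failed job as SUCCESS and skip its run, or the second to force start it to completion.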
I think this should do.
We can achieve your requirement through the steps below. It's an alternative method (a sendevent sketch follows the steps):
Step 1: Put job1 and job2, which have already run and completed successfully, ON_HOLD.
Step 2: Terminate the box job if it is in the running state.
Step 3: Start the box job.
Step 4: Mark job1 and job2 to the SUCCESS state from the ON_HOLD state.
Step 5: Check the status of the box job; you will see that the job sequence starts running from job3.
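One possible mapping of these steps to sendevent commands (box and job names as used in this thread; FORCE_STARTJOB is assumed for step 3):
sendevent -E JOB_ON_HOLD -J job1
sendevent -E JOB_ON_HOLD -J job2
sendevent -E KILLJOB -J MainBox
sendevent -E FORCE_STARTJOB -J MainBox
sendevent -E CHANGE_STATUS -s SUCCESS -J job1
sendevent -E CHANGE_STATUS -s SUCCESS -J job2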
Regards,
Kaliraja.
For you to be able to restart the failed job without impacting the successful job(s), and for the remaining ACTIVE/INACTIVE jobs (which depend on the failed job) to run automatically, you need to use the ON_ICE feature of AutoSys. ON_ICE is treated just like a SUCCESS status: if your job is ON_ICE, AutoSys will read it as SUCCESS and it will not be triggered again. Hence, since the dependency of job3 is the SU (success) of job2, your job3 will run automatically once you've force started your main BOX (skipping job1 and job2, considering they are already in ON_ICE status). Below are the steps you can perform; a sendevent sketch follows them.
If your main box is still in RU status, force STOP your main box.
ON_ICE your job1 and job2.
FORCE START your main box.
Once the box has been force started, your job3 will run automatically.
Once your job3 has turned to SU, your job4 will run immediately.
Once job4 has completed its run, your main box should turn to SU, since your job1 and job2 are in ON_ICE state and your job3 and job4 are in SU state.
You can now set job1 and job2 OFF_ICE so that in the next scheduled run of the main box, these jobs will be kicked off.
Note: The start and end run times of job1 and job2 should not change. The start and end run times of your main box, job3, and job4 should show the updated time.
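A sketch of the sendevent commands for the steps above (KILLJOB is assumed here as the "force STOP" event):
# stop the box if it is still running
sendevent -E KILLJOB -J MainBox
# put the already-successful jobs on ice
sendevent -E JOB_ON_ICE -J job1
sendevent -E JOB_ON_ICE -J job2
# restart the box; job3 runs automatically, then job4
sendevent -E FORCE_STARTJOB -J MainBox
# after the box completes, take the jobs off ice for the next scheduled run
sendevent -E JOB_OFF_ICE -J job1
sendevent -E JOB_OFF_ICE -J job2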
I have an AutoSys box which runs at 7 AM and 8 AM daily. I understand that if the 7 AM execution does not finish by 8 AM, my box will not be triggered at 8 AM again.
I need to send an email from AutoSys to the business stating that the previous job is still running, which is why the next execution is not happening at 8 AM. How can I do that? Please try to give the JIL for it.
Thanks in advance
It all depends on how you have licensed the CA AutoSys scheduler for your project. In our firm, we have a configuration set up wherein if an AutoSys job fails/terminates/exceeds its max run time/etc., it creates a high-severity incident.
Coming to your query, there's no straightforward way that I am aware of. Normally we do not schedule the same job at frequent intervals unless it is a reporting job.
Still, if you want the job to generate some sort of alert when the first execution runs past the start time of the second execution, you can add max_run_alarm or term_run_time to your JIL as required and set it to 60 minutes, since that's the interval between the two executions.
Here is what I have added to the JIL for a data loader job:
max_run_alarm: 150
alarm_if_fail: y
alarm_if_terminated: y
box_terminator: y
term_run_time: 780
Which parameter you use will depend on your requirement. If you want to kill the first job, use the term_run_time parameter so that the second execution "might" start, or use the max_run_alarm parameter if you just want to send a max-run alarm notification.
I use both! Hope this was useful.
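Applied to the 7 AM / 8 AM box from the question above, a minimal JIL sketch might look like this (the box name is a placeholder; 60 minutes matches the interval between the two start times):
insert_job: daily_box
job_type: b
date_conditions: 1
days_of_week: all
start_times: "07:00, 08:00"
max_run_alarm: 60
alarm_if_fail: y
term_run_time: 60
Here max_run_alarm raises a max-run alarm once the 7 AM run passes 60 minutes, and term_run_time kills the box at that point so the 8 AM run can start. Whether the alarm actually produces an email depends on how alarms are wired to notifications in your environment.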
I have a task that's scheduled to run hourly; however, it's not being triggered. When I look at the Task Instance Details it says:
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
- This task instance already ran and had its state changed manually (e.g. cleared in the UI)
If this task instance does not start soon please contact your Airflow administrator for assistance.
If I clear the task in the UI, I am able to execute it through the terminal, but it does not run when scheduled.
Why do I have to manually clear it after every run?
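If you want to rule out the configuration limits listed in that message, the corresponding keys live in the [core] section of airflow.cfg (Airflow 1.x names; the message's max_active_dag_runs_per_dag corresponds to the max_active_runs_per_dag key, and the values below are just the usual defaults):
[core]
# total task instances the scheduler may run at once, across all DAGs
parallelism = 32
# concurrent task instances allowed per DAG
dag_concurrency = 16
# concurrent DAG runs allowed per DAG
max_active_runs_per_dag = 16
# slots for tasks that do not belong to a pool (older Airflow versions only)
non_pooled_task_slot_count = 128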
I am trying to understand AutoSys job behavior. Suppose I have Job A that runs every 15 minutes. If for some reason Job A takes more than 15 minutes, will another instance of it run, or will it wait for the job to finish before running another instance?
In my experience, if the previous run is still running, another instance will not start when the next scheduled time comes. The job next runs when the previous run has finished and the next scheduled time arrives.
Another user also experienced this according to this answer.
I did not find any AutoSys documentation that officially confirms what happens in this situation, but I guess the best way to find out is to test it on your AutoSys instance.
I have experienced this first hand and can confirm that there won't be two instances in the mentioned scenario. The job will wait on the previous run to complete and will immediately kick off the next instance if the time condition is met before the previous completes.
But this will be the case only when the job is in the running state; if the job is in any other state, it will kick off based on the given start_time condition.
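For reference, a 15-minute schedule like the one described here is typically defined with start_mins in the JIL (machine and command are placeholders):
insert_job: JobA
job_type: c
machine: some_host
command: /path/to/job_a.sh
date_conditions: 1
days_of_week: all
start_mins: 0, 15, 30, 45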
An Oozie coordinator we own was killed for operational reasons about a week ago. The cluster is now back up and running and ready for business. Can we revive it somehow so it keeps its run history and backfills all the missing runs, or do we have to schedule a brand new one?
oozie job -resume xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C doesn't error out, but it also doesn't change the status of the coordinator back to RUNNING.
Have you tried out the killed -> ignored -> running transition? Based on the docs it should be possible.
It's a two-step process: the first one is based on -ignore, the second one is -change.
I've never tried to do this though :)
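A sketch of the two commands (the Oozie URL and coordinator ID are placeholders; check the Oozie CLI docs for the exact -change parameters your version supports):
# step 1: move the KILLED coordinator to IGNORED
oozie job -oozie http://<oozie-host>:11000/oozie -ignore <coordinator-id>
# step 2: change the coordinator (for example its end time) so it can transition back to RUNNING
oozie job -oozie http://<oozie-host>:11000/oozie -change <coordinator-id> -value endtime=<new-end-time>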
What is the behavior of a job if I give both a dependency and a schedule on the same job?
Suppose a job J2 is dependent on J1 via s(J1)|f(J1) and is scheduled at 18:00.
If job J1 is force started at 15:00 and succeeds at 15:05, what will happen to job J2: will it start at 15:05, or will it wait till 18:00?
Thanks in advance,
Suresh Reddy M
J2 will wait until 18:00 to become "Active", then will check the dependency on s(J1). If J1 is in SUCCESS at that time, J2 will run.
The run time of a job (or a box) is always based on both the condition (dependency) and the schedule set on it. If either of the two is not yet met, the job will not run. So in your question, J2 will not run until 18:00.
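To make that concrete, the J2 definition from the question would look roughly like this in JIL (machine and command are placeholders):
insert_job: J2
job_type: c
machine: some_host
command: /path/to/j2.sh
condition: s(J1) | f(J1)
date_conditions: 1
days_of_week: all
start_times: "18:00"
Both the condition and start_times must be satisfied, so even if s(J1) is already true at 15:05, J2 will not run until 18:00.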