How to make an Oozie schedule with holidays excluded

We have source files that arrive in HDFS every day except on holidays.
Our Oozie coordinator watches for these files in order to start every day. I do not want Oozie to run on the defined holidays. How can I do that? The coordinator should not time out if it is a holiday.

One possible solution is to run the job regularly and skip the actual actions on holidays via a switch case in a decision node. Start the workflow with a java action that checks whether today is a holiday, propagate that value to the decision node, and have the decision node determine whether the required actions run (Oozie supports propagating values from one action to another within a workflow). For each of the two branches, emit a different confirmation message, e.g. 'Today is a holiday, required actions skipped' versus 'Not a holiday, job succeeded'. A sketch is shown below.
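A minimal workflow sketch of that approach, assuming a hypothetical HolidayCheck class that writes an isHoliday=true|false property to the file named by oozie.action.output.properties (the class name, property name, and placeholder shell actions are assumptions, not a definitive implementation):

<workflow-app name="holiday-aware-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="check-holiday"/>

    <!-- Java action that decides whether today is a holiday and captures its output -->
    <action name="check-holiday">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.HolidayCheck</main-class>
            <capture-output/>
        </java>
        <ok to="holiday-decision"/>
        <error to="fail"/>
    </action>

    <!-- Decision node reads the captured value and routes the workflow -->
    <decision name="holiday-decision">
        <switch>
            <case to="skip-message">${wf:actionData('check-holiday')['isHoliday'] eq 'true'}</case>
            <default to="required-actions"/>
        </switch>
    </decision>

    <!-- Holiday branch: just confirm that the real work was skipped -->
    <action name="skip-message">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>Today is a holiday, required actions skipped</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Normal branch: replace this placeholder with the real processing action(s) -->
    <action name="required-actions">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>Not a holiday, job succeeded</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>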

Related

How to send an email if an Autosys box is still running from the previous scheduled run at the time of the next scheduled run

I have an Autosys box which runs at 7AM and 8AM daily. I understand that if the 7AM execution does not finish by 8AM, my box will not be triggered again at 8AM.
I need to send an email from Autosys to the business stating that the previous job is still running, which is why the next execution is not happening at 8AM. How can I do that? Please try to give the JIL for it.
Thanks in advance.
It all depends on how you have licensed the CA Autosys scheduler for your project. In our firm, we have a configuration set up wherein if an Autosys job fails/terminates/max-runs/etc., it creates a high-severity incident.
Coming to your query, there is no straightforward way that I am aware of. First, we normally do not schedule the same job at frequent intervals unless it is a reporting job.
Still, if you want the job to generate some sort of alert when the first execution overruns the start time of the second execution, you can add max_run_alarm or term_run_time to your JIL as per your requirement and set it to 60 minutes, since that is the interval between the two executions.
Here is what I have added to the JIL for a data-loader job:
max_run_alarm: 150
alarm_if_fail: y
alarm_if_terminated: y
box_terminator: y
term_run_time: 780
Which parameter you use will depend on your requirement. If you want to kill the first job, use the term_run_time parameter so that the second execution "might" start, or use the max_run_alarm parameter if you just want to send a MAX ALARM notification.
I use both! Hope this was useful.

Set SLA for Oozie job in WAITING status

I have a coordinator with a data dependency on a Parquet directory, partitioned by date, and it runs every day in the morning. If the file isn't available for that day, the workflow goes into WAITING status.
Now I want to use the Oozie SLA feature to alert on this condition. Unfortunately, I am not able to get this set up using the standard Oozie SLA feature. The feature works if my job is in RUNNING status but takes too long to complete, not if the workflow is sitting in WAITING status. The Oozie documentation doesn't have any references to this, so I'd appreciate any advice on how to set it up.
https://oozie.apache.org/docs/4.1.0/DG_SLAMonitoring.html
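For reference, the workflow-level SLA block described in the documentation linked above looks roughly like this (the action body, thresholds, and contact address are placeholders; should-start, should-end and max-duration are offsets from the nominal time):

<workflow-app name="sla-demo-wf"
              xmlns="uri:oozie:workflow:0.5"
              xmlns:sla="uri:oozie:sla:0.2">
    <start to="process"/>
    <action name="process">
        <fs>
            <!-- placeholder action; the real processing would go here -->
            <mkdir path="${nameNode}/tmp/sla-demo"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
    <sla:info>
        <sla:nominal-time>${nominal_time}</sla:nominal-time>
        <sla:should-start>${10 * MINUTES}</sla:should-start>
        <sla:should-end>${30 * MINUTES}</sla:should-end>
        <sla:max-duration>${30 * MINUTES}</sla:max-duration>
        <sla:alert-events>start_miss,end_miss,duration_miss</sla:alert-events>
        <sla:alert-contact>joe@example.com</sla:alert-contact>
    </sla:info>
</workflow-app>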

How to reschedule a coordinator job in OOZIE without restarting the job?

When I change the start time of a coordinator job in job.properties in Oozie, the job does not pick up the changed time; instead it keeps running at the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the changed time (the 7th minute); it still runs at the 8th minute of every hour.
Can you let me know how I can make the job pick up the updated properties (the changed timing) without restarting or killing the job?
You can't really change the timing of the coordinator via any method provided by Oozie (v3.3.2). When you submit a job, its properties are stored in the database, whereas the actual workflow definition lives in HDFS.
Every time the coordinator runs, the workflow must be present at the path specified in the properties at submission time, but the properties file itself is not needed. What I mean is that the properties file does not come into the picture after the job has been submitted.
One hack is to update the time directly in the database using an SQL query, but I am not sure about the implications; the property might become inconsistent across the database.
You have to kill the job and resubmit a new one.
Note: Oozie does provide a way to change the concurrency, end time and pause time, as described in the official docs.
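For example, those three attributes can be changed on a running coordinator with the -change option of the Oozie CLI (the job id and values below are placeholders):

oozie job -oozie http://localhost:11000/oozie -change <coord_job_id> -value endtime=2014-12-31T05:00Z\;concurrency=2\;pausetime=2014-12-01T05:00Z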

Oozie: run at some time or at some frequency, whichever comes first

The benefit of coordinating by an absolute time is that (insofar as the jobs take a consistent amount of time) the output will be ready for others at a known time (e.g. update a dashboard during the night for people to see in the morning).
The benefit of coordinating by a relative frequency is that, if Oozie (or its server) is down, no jobs are skipped (e.g. a daily job might run 2 hours late, but not 22 hours late).
How can I do something like:
start="2009-01-01T21:00Z"
frequency="${coord:days(1)}"
run-if-skipped="true"
i.e. when all is well, jobs run daily at 9pm. If something happens to Oozie (e.g. the server is rolled) between 8pm and 10pm, then once Oozie comes back up at 10pm, the job should run at 10pm, and then tomorrow at 9pm as normal.
https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases
I'm not sure that I fully understand the question.
If the server is down and you restart your coordinator, it will start from the coordinator start time.
Alternatively, you can make your job run every hour and check whether the output folder already exists; if it does, stop. Use a decision control node for that, as sketched below.
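A minimal sketch of that check, assuming the workflow receives the day's output directory as a full HDFS URI in an outputDir parameter (the node names here are placeholders):

<decision name="already-done">
    <switch>
        <!-- fs:exists() is Oozie's HDFS EL function; skip the run if today's output already exists -->
        <case to="end">${fs:exists(outputDir)}</case>
        <default to="do-work"/>
    </switch>
</decision>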

Strict coordinator job ordering on Oozie

I have a coordinator on Oozie that runs a series of tasks, each of which depends on the output of the last.
Each task outputs a dated folder and looks for the output of its predecessor using
${coord:latest(0)}
This all worked fine on my dev cluster when nothing else was running; every 5 minutes Oozie would queue up another job, and within those 5 minutes the previous job had run, so when the new job was set up it would see the directory it needed.
I run into problems on the production cluster: the jobs get submitted but are put in a queue and don't run for a while, yet every 5 minutes Oozie still queues up another one. In its initialization stage each new job is assigned its 'previous' folder, which hasn't been created yet because its predecessor hasn't run, so the 'latest' function gives it the same input as the previous job. I then end up with 10 jobs all taking the same input...
What I need is a way of strictly preventing the next job in a coordinator sequence from even being created until its predecessor has finished running.
Is there a way this can be done?
Thanks for reading
This is the exact use case that Oozie coordinators were designed to solve. Oozie will wait for all data dependencies before launching.
Please look at the following configuration in your coordinator.xml:
<datasets>
    <dataset name="my_data" frequency="${coord:days(1)}" initial-instance="2013-01-27T00:00Z">
        <uri-template>YOUR_DATA/${YEAR}${MONTH}${DAY}</uri-template>
    </dataset>
    ...
</datasets>
<input-events>
    <data-in name="my_data" dataset="my_data">
        <instance>${coord:current(-1)}</instance>
    </data-in>
</input-events>
<output-events>
    <data-out name="my_data" dataset="my_data">
        <instance>${coord:current(0)}</instance>
    </data-out>
</output-events>
the "coord:current(-1)" in input-events means the previous output. It will interpret the dataset URI teamplate to "yesterday", and Oozie will check whether the data exist in HDFS by checking a success flag, which by default is an empty file named "_SUCCESS", right under the output directory. Oozie will keep waiting this flag before launching the current workflow.
By the way, you can also set
<coordinator-app name="my_coordinator" frequency="${coord:days(1)}" start="${start_time}" end="${end_time}" ...>
to define the start time and end time of the coordinator job, so you can catch up on backlog data.
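Putting these pieces together, a minimal coordinator sketch might look like the following (the paths, the workflowAppPath property, and the inputDir/outputDir property names are assumptions); coord:dataIn and coord:dataOut hand the concrete HDFS URIs for this run to the workflow:

<coordinator-app name="my_coordinator" frequency="${coord:days(1)}"
                 start="${start_time}" end="${end_time}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="my_data" frequency="${coord:days(1)}"
                 initial-instance="2013-01-27T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/YOUR_DATA/${YEAR}${MONTH}${DAY}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <!-- wait for yesterday's output (its _SUCCESS flag) before running today -->
        <data-in name="input" dataset="my_data">
            <instance>${coord:current(-1)}</instance>
        </data-in>
    </input-events>
    <output-events>
        <data-out name="output" dataset="my_data">
            <instance>${coord:current(0)}</instance>
        </data-out>
    </output-events>
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
            <configuration>
                <!-- pass the resolved instance URIs into the workflow -->
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
                <property>
                    <name>outputDir</name>
                    <value>${coord:dataOut('output')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>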
