I have a coordinator on oozie that runs a series of tasks, each of which depends on the output of the last.
Each task outputs a dated folder and looks for the output of its predecessor using
${coord:latest(0)}
This all worked fine on my dev cluster when nothing else was running: every 5 minutes oozie would queue up another job, and within those 5 minutes the previous job had finished, so when the new job was set up it would see the directory it needed.
I run into problems on the production cluster. The jobs get submitted but sit in a queue and don't run for a while, yet oozie still queues up another one every 5 minutes. In its initialization stage each new job is assigned its 'previous' folder, but that folder hasn't been created yet because its predecessor hasn't run, so the 'latest' function gives it the same input as the previous job. I then end up with 10 jobs all taking the same input...
What I need is a way of strictly preventing the next job in a coordinator sequence from even being created until its predecessor has finished running.
Is there a way this can be done?
Thanks for reading
This is the exact use case that Oozie was designed to solve. Oozie waits for all data dependencies to be satisfied before launching an action.
Please try to understand the following configs in your coordinator.xml
<datasets>
<dataset name="my_data" frequency="${coord:days(1)}" initial-instance="2013-01-27T00:00Z">
<uri-template>YOUR_DATA/${YEAR}${MONTH}${DAY}</uri-template>
</dataset>
...
</datasets>
<input-events>
<data-in name="my_data" dataset="my_data">
<instance>${coord:current(-1)}</instance>
</data-in>
</input-events>
<output-events>
<data-out name="my_data" dataset="my_data">
<instance>${coord:current(0)}</instance>
</data-out>
</output-events>
The "coord:current(-1)" in input-events refers to the previous output. Oozie resolves the dataset URI template to "yesterday" and checks whether the data exists in HDFS by looking for a success flag, which by default is an empty file named "_SUCCESS" directly under the output directory. Oozie keeps waiting for this flag before launching the current workflow.
btw, you can also set
<coordinator-app name="my_coordinator" frequency="${coord:days(1)}" start="${start_time}" end="${end_time}" ...>
to define start time and end time of a coordinator job, so you can catch up backlog data.
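Putting the pieces above together, a fuller coordinator.xml might look like the sketch below. This is only an illustration under assumptions, not a tested configuration: the names previous_output, current_output, inputDir, outputDir and workflow_app_path are hypothetical, the schema version and timezone are guesses that should be matched to your Oozie installation, and the explicit done-flag element simply spells out the default described above. Setting concurrency to 1 in the controls block is an extra guard so that only one action runs at a time.

```xml
<coordinator-app name="my_coordinator" frequency="${coord:days(1)}"
                 start="${start_time}" end="${end_time}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <controls>
    <!-- Assumption: run at most one action at a time, oldest first -->
    <concurrency>1</concurrency>
    <execution>FIFO</execution>
  </controls>
  <datasets>
    <dataset name="my_data" frequency="${coord:days(1)}"
             initial-instance="2013-01-27T00:00Z" timezone="UTC">
      <uri-template>YOUR_DATA/${YEAR}${MONTH}${DAY}</uri-template>
      <!-- Explicitly states the default flag file Oozie waits for -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- Hypothetical event name; the action waits until this instance exists -->
    <data-in name="previous_output" dataset="my_data">
      <instance>${coord:current(-1)}</instance>
    </data-in>
  </input-events>
  <output-events>
    <data-out name="current_output" dataset="my_data">
      <instance>${coord:current(0)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <!-- Hypothetical property holding the HDFS path of the workflow app -->
      <app-path>${workflow_app_path}</app-path>
      <configuration>
        <!-- Pass the resolved directories to the workflow -->
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('previous_output')}</value>
        </property>
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('current_output')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

With this shape the workflow reads its input from ${inputDir} instead of calling coord:latest, so each action is tied to one specific predecessor instance rather than to whatever happens to be newest when the action is materialized.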
Related
I have a Coordinator, which has a data dependency on a parquet directory, partitioned by date. And it runs every day in the morning. If the file isn't available for that day, the workflow goes into "WAITING" status.
Now I want to use the Oozie SLA feature to alert on this condition. Unfortunately I have not been able to set this up with the standard Oozie SLA feature. The feature works if my job is in "RUNNING" status but takes too long to complete, but not if the workflow is in "WAITING" status. The Oozie documentation doesn't have any references to this, so I would appreciate any advice on how to set it up.
https://oozie.apache.org/docs/4.1.0/DG_SLAMonitoring.html
I am trying to understand how AutoSys jobs behave. Suppose I have Job A that runs every 15 minutes. If for some reason Job A takes more than 15 minutes, will another instance of it run, or will it wait for the job to finish before running another instance?
In my experience, if the previous job run is still running, another instance will not run if the next scheduled time comes. The next time the job runs is when the previous run is finished and the next scheduled time comes.
Another user also experienced this according to this answer.
I did not find any AutoSys documentation that officially confirms what happens in this situation, but I guess the best way to find out is to test it on your AutoSys instance.
I have experienced this first hand and can confirm that there won't be two instances in the mentioned scenario. The job will wait on the previous run to complete and will immediately kick off the next instance if the time condition is met before the previous completes.
But this is the case only when the job is in the running state. If the job is in any other state, it will kick off based on the given start_time condition.
An oozie coordinator we own was killed for operational reasons about a week ago. The cluster is now back up and running and ready for business. Can we revive it somehow so it will keep its run history and backfill all missing runs, or do we have to schedule a brand new one?
oozie job -resume xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C doesn't error out, but it also doesn't change the status of the coordinator back to RUNNING.
Have you tried out the killed -> ignored -> running transition? Based on the docs it should be possible.
It's a two-step process: the first step uses -ignore, the second uses -change.
I've never tried to do this though :)
I am scheduling an Oozie MapReduce job to run every 15 minutes. I wonder what happens if each job takes longer than that interval. Will it result in a job backlog? Or will Oozie create a new task / thread / fork for the new job while the previous one is still running?
Oozie won't run the next job before the previous one is over. If the first job takes more than 15 minutes to execute, the next one will run later than its scheduled time. So scheduled time and actual running time may differ in Oozie.
EDIT:
Anyway, the described behaviour is only the default and can be changed. You can set the concurrency property in the controls block to more than 1, and the next job will then run even if the first one is still running. Check my answer on a similar question
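As a rough sketch of what that setting looks like, the fragment below raises concurrency to 2 for a 15-minute coordinator. The values start_time, end_time and workflow_app_path are hypothetical properties, and the schema version and timezone are assumptions to adapt to your installation.

```xml
<coordinator-app name="my_coordinator" frequency="${coord:minutes(15)}"
                 start="${start_time}" end="${end_time}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <controls>
    <!-- Allow up to two actions of this coordinator to run at once -->
    <concurrency>2</concurrency>
    <!-- Order in which queued actions are picked up -->
    <execution>FIFO</execution>
  </controls>
  <action>
    <workflow>
      <!-- Hypothetical property holding the HDFS path of the workflow app -->
      <app-path>${workflow_app_path}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Leaving concurrency at 1 keeps the behaviour described earlier in this answer, where a late action delays the ones behind it instead of overlapping with them.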
When I changed the start time of a coordinator job in job.properties in oozie, the job did not pick up the changed time; instead it keeps running at the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the changed time (the 07th minute); it still runs at the 08th minute of every hour.
Can you please let me know how I can make the job pick up the updated properties (changed timing) without restarting or killing the job?
You can't really change the timing of the coordinator via any method provided by Oozie (v3.3.2). When you submit a job, the contents of the properties file are stored in the database, whereas the actual workflow lives in HDFS.
Every time the coordinator executes, the workflow must be present at the path specified in the properties at job submission, but the properties file itself is no longer needed. In other words, the properties file does not come into the picture after the job has been submitted.
One hack is to update the time directly in the database with an SQL query, but I am not sure about the implications. The property might become inconsistent across the database.
You have to kill the job and resubmit a new one.
Note: oozie provides a way to change the concurrency, endtime and pausetime, as specified in the official docs.