How can we get the status of Oozie jobs that run daily? We have many jobs running under Oozie coordinators, and currently we monitor them through the Hue/Oozie browser.
Is there any way to get a single log file that contains the coordinator name/workflow name along with the date and status? Can we write a program or script to achieve this?
You can use the commands below and put them into a script that runs daily via cron.
oozie jobs -oozie http://localhost:11000/oozie -filter status=RUNNING -len 2
oozie jobs -oozie http://localhost:11000/oozie -filter startCreatedTime=2016-06-28T00:00Z\;endCreatedTime=2016-06-28T10:00Z -len 2
Basically you are using Oozie's jobs API with the -filter option to get information about workflows/coordinators/bundles. The -filter option supports a couple of criteria, such as status, startCreatedTime, and name.
By default it returns workflow records; if you want coordinator or bundle information, pass the -jobtype parameter with the value coord or bundle.
Let me know if you need anything more specific. The Oozie documentation is a little outdated for this feature.
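To produce the single daily log file asked about above, here is a minimal cron-able sketch built on these commands; the server URL, the log path, and the -len value are assumptions to adjust for your setup.

#!/bin/bash
# Sketch: dump coordinator and workflow statuses into one dated log file.
# OOZIE_URL and LOG_FILE are placeholders; adjust for your environment.
OOZIE_URL="http://localhost:11000/oozie"
LOG_FILE="oozie_status_$(date +%Y%m%d).log"
{
  echo "=== Coordinators $(date) ==="
  oozie jobs -oozie "$OOZIE_URL" -jobtype coordinator -len 1000
  echo "=== Workflows $(date) ==="
  oozie jobs -oozie "$OOZIE_URL" -len 1000
} >> "$LOG_FILE"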
Command to get the status of all running Oozie coordinators:
oozie jobs -jobtype coordinator -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get the status of all running Oozie workflows:
oozie jobs -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get the status of all workflows for a specific coordinator ID:
oozie job -info COORDINATOR_ID_HERE
Based on these queries you can write the scripts you need to get what you want.
Terms explained:
oozie: the Oozie command-line client
job/jobs: the API being called (job for a single job, jobs for listings)
-len: the number of workflows/coordinators to display
-oozie: parameter specifying the Oozie URL
-filter: parameter specifying a list of filters
Complete documentation: https://oozie.apache.org/docs/3.1.3-incubating/DG_CommandLineTool.html
The command below worked for me.
oozie jobs -oozie http://xx.xxx.xx.xx:11000/oozie -jobtype wf -len 300 | grep 2016-07-01 > OozieJobsStatus_20160701.txt
However, we still need to parse the file.
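To avoid parsing by hand every time, here is a rough sketch that reduces such a dump to "id / name / status" lines; the awk field positions assume the default listing layout and may need adjusting for your Oozie version.

#!/bin/bash
# Sketch: strip headers/separators from an `oozie jobs` dump and keep
# the job ID (column 1), app name (column 2) and status (column 3).
# Column positions are assumptions; app names containing spaces will
# shift the status column, so verify against your own output.
grep -v '^---' OozieJobsStatus_20160701.txt \
  | awk 'NR > 1 && NF >= 3 { print $1, $2, $3 }'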
In the example below, if the shell script shell_script.sh submits a job to the cluster, is it possible to make Snakemake aware of that cluster job's completion? That is, first, file a should be created by shell_script.sh, which submits its own job to the cluster, and then, once this cluster job is completed, file b should be created.
For simplicity, let's assume that Snakemake is run locally, meaning that the only cluster job originates from shell_script.sh and not from Snakemake.
localrules: that_job

rule all:
    input:
        "output_from_shell_script.txt",
        "file_after_cluster_job.txt"

rule that_job:
    output:
        a = "output_from_shell_script.txt",
        b = "file_after_cluster_job.txt"
    shell:
        """
        shell_script.sh {output.a}
        touch {output.b}
        """
PS: at the moment I am using the sleep command to give the job waiting time before it is "completed". But this is an awful workaround, as it could give rise to several problems.
Snakemake can manage this for you with the --cluster argument on the command line.
You can supply a template for the jobs to be executed on the cluster.
As an example, here is how I use snakemake on a SGE managed cluster:
The template that encapsulates the jobs, which I called sge.sh:
#$ -S /bin/bash
#$ -cwd
#$ -V
{exec_job}
Then, directly on the login node, I run:
snakemake -rp --cluster "qsub -e ./logs/ -o ./logs/" -j 20 --jobscript sge.sh --latency-wait 30
--cluster specifies the command used to submit jobs to the queuing system
--jobscript is the template in which jobs will be encapsulated
--latency-wait is important if the file system takes a bit of time to write files. Your job might end and return before the outputs of the rules are actually visible on the filesystem, which would cause an error
Note that in the Snakefile you can mark rules that should not be executed on the cluster nodes with the keyword localrules:
Otherwise, depending on your queuing system, there are options to wait for jobs sent to the cluster to finish (see the sketch after this list):
SGE:
Wait for set of qsub jobs to complete
SLURM:
How to hold up a script until a slurm job (start with srun) is completely finished?
LSF:
https://superuser.com/questions/46312/wait-for-one-or-all-lsf-jobs-to-complete
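For the SGE case, here is a minimal sketch of what shell_script.sh itself could do to block until its own submitted job completes, using qsub's synchronous mode; cluster_payload.sh and the argument handling are hypothetical placeholders.

#!/bin/bash
# Sketch for SGE: -sync y makes qsub block until the job finishes, so
# file a is only created once the cluster job has completed.
# cluster_payload.sh is a hypothetical placeholder for the real job.
qsub -sync y cluster_payload.sh
touch "$1"   # the {output.a} path passed in by the Snakemake rule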
I'm very new to AutoSys jobs, and I have the following commands stored in a single JIL file; let's call it test.jil.
insert_job: job_A
command: echo 'mock'
description : mock job A
sendevent -E JOB_ON_ICE -J job_A
I'm trying to run jil < test.jil, but it doesn't recognize sendevent. How can I get it working?
In a JIL file we can write commands like insert_job, delete_job, and update_job, but sendevent is a different command that must be sent to the AutoSys event processor, not defined in JIL.
So you can create a separate executable file containing that sendevent command and run it from the CLI (see the sketch below).
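In other words, a minimal split might look like this; the file and job names mirror the question and are otherwise placeholders.

#!/bin/bash
# Sketch: load the job definition through jil, then fire the event
# separately with sendevent once the job exists.
jil < test.jil                      # test.jil now contains only the insert_job lines
sendevent -E JOB_ON_ICE -J job_A    # run as a separate CLI command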
Actually, there was a change in one of the recent service packs: in your JIL you can now specify status:
insert_job: test_job2
command:dir
machine:localhost
status:on_ice
The valid values are:
FAILURE, INACTIVE, ON_HOLD, ON_ICE, ON_NOEXEC, SUCCESS, or TERMINATED.
I want to know how a failed Oozie job is re-executed.
Does it run from the start or from the point of failure?
I have an MR task and a Pig task to be executed.
If the Pig job fails, does the rerun start again from the MR job?
How do I re-run failed Oozie jobs?
It will restart from the failed action.
oozie job -oozie <oozie_url> -rerun <job_id> -config <job.properties>
Add the following property to job.properties if it is not already present. The -config parameter is required only if you are updating the job.properties file.
oozie.wf.rerun.failnodes=true
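Putting it together, a minimal rerun sketch; the server URL and the workflow ID are hypothetical placeholders.

#!/bin/bash
# Sketch: rerun only the failed actions of a workflow.
# The URL and workflow ID are placeholders; use your own values.
# Add the property only if job.properties does not already contain it.
echo 'oozie.wf.rerun.failnodes=true' >> job.properties
oozie job -oozie http://localhost:11000/oozie \
  -rerun 0000123-160701000000000-oozie-oozi-W \
  -config job.properties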
I have tried
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config ./job.properties -kill *
...to no effect. I have done a few Google searches and checked Oozie's documentation, and there does not appear to be a command for this.
Would anyone know of a way to accomplish this?
It seems that recent versions of Oozie (tested on 4.2) have made this a lot easier.
Here is a one-liner that I now use to kill all jobs that I created.
oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis -jobtype bundle && oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis -jobtype coordinator && oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis
First it kills all bundles, then all coordinators, and finally all workflows. Note that I set a filter on my own username, as it appears to be mandatory to have a filter set.
Update:
As mentioned in the comments by @Nutle:
Worth noting that (on 4.3 and win7 x64) the suggested command returned
a syntax error, solved by enclosing the filter terms in quotes, i.e.
oozie jobs <...> -kill -filter "user=dennis"
To my knowledge there is no such command.
Try a shell script that lists the jobs (sadly, workflow jobs, coordinators, and bundles have to be listed separately), then greps away the captions and fancy formatting, cuts out the job IDs, and kills them one by one; a sketch follows below.
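A rough sketch of that approach, assuming the default listing format where the job ID is the first column, and borrowing user=dennis from the answer above; URL, user, and the ID pattern are assumptions.

#!/bin/bash
# Sketch: list each job type, keep only lines whose first column looks
# like an Oozie job ID, and kill the jobs one by one.
# Verify the ID pattern against your own `oozie jobs` output.
OOZIE_URL="http://localhost:11000/oozie"
for TYPE in bundle coordinator wf; do
  oozie jobs -oozie "$OOZIE_URL" -jobtype "$TYPE" -filter user=dennis -len 1000 \
    | awk '$1 ~ /oozie/ { print $1 }' \
    | while read -r ID; do
        oozie job -oozie "$OOZIE_URL" -kill "$ID"
      done
done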
Maybe you can use Python code like the below. It lists all running coordinator jobs in Oozie and then kills them iteratively. You can modify it to target other statuses as well. Define your oozieURL parameter (like http://localhost:11000/oozie).
import os, commands

def killAllRunningJobs(oozieURL):
    # List RUNNING coordinator jobs and keep only the job IDs (first column).
    output = commands.getoutput(
        "oozie jobs -oozie " + oozieURL +
        " -jobtype coordinator | grep -i RUNNING | awk '{print $1}'")
    runningJobs = output.split()
    print "Current running coordinator jobs: " + str(runningJobs)
    for job in runningJobs:
        os.system("oozie job -oozie " + oozieURL + " -kill " + job)
oozie jobs -oozie <oozieURL> -filter status=RUNNING -kill
Is there an Oozie command line that I can use, or some other way, to figure out the start and end times of Oozie jobs submitted through a coordinator?
Yes, you can see the start and end time of a coordinator job:
$ oozie job -oozie http://localhost:8080/oozie -info 14-20090525161321-oozie-joe
For more information on retrieving details about workflow, coordinator, and bundle jobs, see the Oozie command-line tool documentation: https://oozie.apache.org/docs/3.1.3-incubating/DG_CommandLineTool.html
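For example, a quick sketch: first inspect the coordinator to find the workflow IDs of its actions, then inspect an individual workflow for its start and end times; the URL and both job IDs are hypothetical placeholders.

#!/bin/bash
# Sketch: -info on the coordinator lists its actions and their workflow IDs;
# -info on a workflow shows its Started and Ended timestamps.
# Both IDs below are placeholders.
oozie job -oozie http://localhost:11000/oozie -info 0000123-160701000000000-oozie-oozi-C
oozie job -oozie http://localhost:11000/oozie -info 0000124-160701000000000-oozie-oozi-W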