I have tried
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config ./job.properties -kill *
...to no effect. I have done a few Google searches and checked Oozie's documentation, and there does not appear to be a command for this.
Would any one know of a way to accomplish this?
It seems that the recent versions of oozie (tested on 4.2) have made this a lot easier.
Here is a oneliner that I now use to kill all jobs that I created.
oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis -jobtype bundle & oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis -jobtype coordinator & oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis
First it kills all bundles, then it kills all coordinators and finally all workflows. Note that I set a filter to my own username, as it appears to be mandatory to have a filter set.
Update:
As mentioned in the comments by #Nutle:
Worth noting that (on 4.3 and win7 x64) the suggested command returned
a syntax error, solved by enclosing the filter terms in quotes, i.e.
oozie jobs <...> -kill -filter "user=dennis"
To my knowledge there is no such command.
Try a shell script that lists the jobs (sadly, workflow jobs, coordinators and bundles should be listed separately), then greps the captions and fancy formatting out, cuts the job id and kills them one by one.
Maybe you can use below type of code in python . It lists all running coordinator jobs in oozie and then kill them iteratively. You can modify if you want any another statuses as well for killing jobs . Define your oozieURL parameter (like http://localhost:11000/oozie)
import os , commands
def killAllRunningJobs(oozieURL):
runningJobs = commands.getoutput("oozie jobs -oozie " + oozieURL + " -jobtype coordinator | grep -i RUNNING | awk -F \" \" '{print $1} " )
print "Current Running Co-ordinator Jobs : " + runningJobs
for jobs in runningJobs:
os.system("oozie job -oozie " + oozieURL + " -kill " + jobs)
oozie jobs -oozie oozieURL -filter STATUS=RUNNING -kill
Related
I am executing tpt jobs in script 24/7 from powershell with this command
cmd /c tbuild.exe -f $tptfile -v $configFile -j $jobname >> $logFile
When I check TPT folder for logs it adds job sequence number at the end of my jobname variable so I cannot work with that job. Is there any way how to disable adding job sequence number, or how to get exact job name after execution?
In below example, if shell script shell_script.sh sends a job to cluster, is it possible to have snakemake aware of that cluster job's completion? That is, first, file a should be created by shell_script.sh which sends its own job to the cluster, and then once this cluster job is completed, file b should be created.
For simplicity, let's assume that snakemake is run locally meaning that the only cluster job originating is from shell_script.sh and not by snakemake .
localrules: that_job
rule all:
input:
"output_from_shell_script.txt",
"file_after_cluster_job.txt"
rule that_job:
output:
a = "output_from_shell_script.txt",
b = "file_after_cluster_job.txt"
shell:
"""
shell_script.sh {output.a}
touch {output.b}
"""
PS - At the moment, I am using sleep command to give it a waiting time before the job is "completed". But this is an awful workaround as this could give rise to several problems.
Snakemake can manage this for you with the --cluster argument on the command line.
You can supply a template for the jobs to be executed on the cluster.
As an example, here is how I use snakemake on a SGE managed cluster:
template which will encapsulate the jobs which I called sge.sh:
#$ -S /bin/bash
#$ -cwd
#$ -V
{exec_job}
then I use directly on the login node:
snakemake -rp --cluster "qsub -e ./logs/ -o ./logs/" -j 20 --jobscript sge.sh --latency-wait 30
--cluster will tell which queuing system to use
--jobscript is the template in which jobs will be encapsulated
--latency-wait is important if the file system takes a bit of time to write the files. You job might end and return before the output of the rules are actually visible to the filesystem which will cause an error
Note that you can specify rules not to be executed on the nodes in the Snakefile with the keyword localrules:
Otherwise, depending on your queuing system, some options exist to wait for job sent to cluster to finish:
SGE:
Wait for set of qsub jobs to complete
SLURM:
How to hold up a script until a slurm job (start with srun) is completely finished?
LSF:
https://superuser.com/questions/46312/wait-for-one-or-all-lsf-jobs-to-complete
How can we get status of Oozie jobs running daily? We have many jobs running in Oozie coordinator and currently we are monitoring through Hue/Oozie browser.
Is there any way we can get a single log file which contains coordinator name/workflow name with date and status? Can we write any program or script to achieve this?
You use the below command and put it into a script to run it daily/cron.
oozie jobs -oozie http://localhost:11000/oozie -filter status=RUNNING -len 2
oozie jobs -oozie http://localhost:11000/oozie -filter startCreatedTime=2016-06-28T00:00Z\;endcreatedtime=2016-06-28T10:00Z -len 2
Basically you are using oozie's jobs api and -filter command to get the information about workflow/coordinator/bundle. The -filter command supports couple of options to get the data based on status/startCreatedTime/name.
By default, it will bring the workflow record information, if you want to get the coordinator/bindle information. You can use the -jobtype parameter and value as coord/bundle.
Let me know if you need anything specifically. The oozie doc is little outdated for this feature.
Command to get status of all running oozie coordinators
oozie jobs -jobtype coordinator -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get status of all running oozie workflows
oozie jobs -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get status of all workflows for a specific coordinator ID
oozie job -info COORDINATOR_ID_HERE
Based on these queries you can write required scripts to get what you want.
Terms explanation:
oozie : Command to initiate oozie
job/jobs : API
len: No. of oozie workflows/coordinators to display
-oozie: Param to specify oozie url
-filter : Param to specify list of filters.
Complete documentation https://oozie.apache.org/docs/3.1.3-incubating/DG_CommandLineTool.html
Below command worked for me.
oozie jobs -oozie http://xx.xxx.xx.xx:11000/oozie -jobtype wf -len 300 | grep 2016-07-01 > OozieJobsStatus_20160701.txt
However we need to parse the file.
How to kill all filtered oozie jobs?
For example I want to kill all oozie jobs, with next condition:
oozie jobs -filter name=calculationx
This should work (note that there are some field missing, like Oozie host & ect'):
for jobId in `oozie jobs -filter name=calculationx | grep RUNNING | cut -d" " -f1`
do
echo "Killing job ${jobId}"
job -kill ${jobId}
done
What is the difference between a job and a process in Unix ? Can you please give an example ?
Jobs are processes which are started by a shell. The shell keeps track of these in a job table. The jobs command shows a list of active background processes. They get a jobspec number which is not the pid of the process. Commands like fg use the jobspec id.
In the spirit of Jürgen Hötzel's example:
find $HOME | sort &
[1] 15317
$ jobs
[1]+ Running find $HOME | sort &
$ fg
find $HOME | sort
C-c C-z
[1]+ Stopped find $HOME | sort
$ bg 1
[1]+ find $HOME | sort &
Try the examples yourself and look at the man pages.
A Process Group can be considered as a Job. For example you create a background process group in shell:
$ find $HOME|sort &
[1] 2668
And you can see two processes as members of the new process group:
$ ps -p 2668 -o cmd,pgrp
CMD PGRP
sort 2667
$ ps -p "$(pgrep -d , -g 2667)" -o cmd,pgrp
CMD PGRP
find /home/juergen 2667
sort 2667
You can can also kill the whole process group/job:
$ pkill -g 2667
http://en.wikipedia.org/wiki/Job_control_%28Unix%29:
Processes under the influence of a job control facility are referred to as jobs.
http://en.wikipedia.org/wiki/Job_control_%28Unix%29
Jobs are one or more processes that are grouped together as a 'job', where job is a UNIX shell concept.
Jobs are one or more processes that are grouped together as a 'job', where job is a UNIX shell concept. A job consists of multiple processes running in series or parallel. while
A process is a program under execution. job is when you want to know about processes started from the current shell.
A job consists of multiple processes running in series or parallel. A process is a program under execution.
job is when you want to know about processes started from the current shell.
process is when you want to know about a process running from any shell or computer.
I think a job is a scheduled process or set of processes, a job always has the notion of schedule, otherwise we could call it a process.