Kill all filtered oozie jobs - oozie

How do I kill all filtered Oozie jobs?
For example, I want to kill all Oozie jobs matching the following filter:
oozie jobs -filter name=calculationx

This should work (note that some fields are missing, like the Oozie host, etc.):
for jobId in `oozie jobs -filter name=calculationx | grep RUNNING | cut -d" " -f1`
do
    echo "Killing job ${jobId}"
    oozie job -kill ${jobId}
done
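A more complete sketch with the missing pieces filled in; the endpoint below is an assumption, so swap in your own Oozie host:
OOZIE_URL="http://localhost:11000/oozie"   # assumption: adjust to your server
for jobId in $(oozie jobs -oozie "${OOZIE_URL}" -filter name=calculationx | grep RUNNING | cut -d" " -f1)
do
    echo "Killing job ${jobId}"
    oozie job -oozie "${OOZIE_URL}" -kill "${jobId}"
done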

Related

get teradata tpt job name after script execution

I am executing TPT jobs in a script 24/7 from PowerShell with this command:
cmd /c tbuild.exe -f $tptfile -v $configFile -j $jobname >> $logFile
When I check the TPT folder for logs, it adds a job sequence number at the end of my jobname variable, so I cannot work with that job. Is there any way to disable adding the job sequence number, or to get the exact job name after execution?

Report of oozie jobs

How can we get the status of Oozie jobs running daily? We have many jobs running in the Oozie coordinator and currently we monitor them through the Hue Oozie browser.
Is there any way to get a single log file which contains the coordinator name/workflow name with date and status? Can we write a program or script to achieve this?
You can use the commands below and put them into a script to run daily via cron.
oozie jobs -oozie http://localhost:11000/oozie -filter status=RUNNING -len 2
oozie jobs -oozie http://localhost:11000/oozie -filter startcreatedtime=2016-06-28T00:00Z\;endcreatedtime=2016-06-28T10:00Z -len 2
Basically you are using Oozie's jobs API and the -filter option to get information about workflows/coordinators/bundles. The -filter option supports a couple of fields for selecting data based on status/startcreatedtime/name.
By default it returns workflow records; if you want coordinator/bundle information, use the -jobtype parameter with the value coord or bundle.
Let me know if you need anything specific. The Oozie doc is a little outdated for this feature.
Command to get status of all running oozie coordinators
oozie jobs -jobtype coordinator -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get status of all running oozie workflows
oozie jobs -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get status of all workflows for a specific coordinator ID
oozie job -info COORDINATOR_ID_HERE
Based on these queries you can write required scripts to get what you want.
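For instance, a minimal daily-report sketch, assuming the same http://localhost:11000/oozie endpoint used above (the output path is a placeholder):
#!/bin/bash
# Dump running coordinators and workflows into one dated report file;
# schedule this from cron once a day.
OOZIE_URL="http://localhost:11000/oozie"          # assumption: your Oozie endpoint
REPORT="/tmp/oozie_status_$(date +%Y%m%d).log"    # placeholder path

echo "=== Running coordinators ===" >  "${REPORT}"
oozie jobs -oozie "${OOZIE_URL}" -jobtype coordinator -filter status=RUNNING -len 1000 >> "${REPORT}"
echo "=== Running workflows ===" >> "${REPORT}"
oozie jobs -oozie "${OOZIE_URL}" -filter status=RUNNING -len 1000 >> "${REPORT}"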
Terms explanation:
oozie : command to invoke the Oozie CLI
job/jobs : the API being called
-len : number of Oozie workflows/coordinators to display
-oozie : param to specify the Oozie URL
-filter : param to specify a list of filters
Complete documentation: https://oozie.apache.org/docs/3.1.3-incubating/DG_CommandLineTool.html
The command below worked for me.
oozie jobs -oozie http://xx.xxx.xx.xx:11000/oozie -jobtype wf -len 300 | grep 2016-07-01 > OozieJobsStatus_20160701.txt
However, we still need to parse the file.
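A rough parsing sketch; the column positions are assumptions and will shift if app names contain spaces, so verify them against a sample line from your Oozie version:
# Workflow IDs start with digits (e.g. 0000123-...); print the ID plus the
# next two columns -- treat $2/$3 as assumptions about your output layout
grep -E "^[0-9]+-" OozieJobsStatus_20160701.txt | awk '{print $1, $2, $3}'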

Is there a way to direct Oozie to kill all jobs?

I have tried
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config ./job.properties -kill *
...to no effect. I have done a few Google searches and checked Oozie's documentation, and there does not appear to be a command for this.
Would anyone know of a way to accomplish this?
It seems that recent versions of Oozie (tested on 4.2) have made this a lot easier.
Here is a one-liner that I now use to kill all jobs that I created.
oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis -jobtype bundle && oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis -jobtype coordinator && oozie jobs -oozie http://myserver:11000/oozie -kill -filter user=dennis
First it kills all bundles, then it kills all coordinators and finally all workflows. Note that I set a filter on my own username, as it appears to be mandatory to have a filter set.
Update:
As mentioned in the comments by @Nutle:
Worth noting that (on 4.3 and win7 x64) the suggested command returned
a syntax error, solved by enclosing the filter terms in quotes, i.e.
oozie jobs <...> -kill -filter "user=dennis"
To my knowledge there is no such command.
Try a shell script that lists the jobs (sadly, workflow jobs, coordinators and bundles have to be listed separately), then greps the captions and fancy formatting out, cuts out the job ID and kills the jobs one by one.
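A sketch of that approach; the endpoint is an assumption, and the grep on RUNNING mirrors the listing commands above:
OOZIE_URL="http://localhost:11000/oozie"   # assumption: your Oozie endpoint
for jobtype in bundle coordinator wf
do
    # list running jobs of this type, cut out the job ID, kill one by one
    for jobId in $(oozie jobs -oozie "${OOZIE_URL}" -jobtype "${jobtype}" -filter status=RUNNING -len 1000 | grep RUNNING | cut -d" " -f1)
    do
        oozie job -oozie "${OOZIE_URL}" -kill "${jobId}"
    done
done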
Maybe you can use the Python code below. It lists all running coordinator jobs in Oozie and then kills them iteratively. You can adapt it for other statuses as well. Define your oozieURL parameter (like http://localhost:11000/oozie):
import os, commands  # commands is the Python 2 module for capturing shell output

def killAllRunningJobs(oozieURL):
    # one coordinator job ID per line
    runningJobs = commands.getoutput("oozie jobs -oozie " + oozieURL + " -jobtype coordinator | grep -i RUNNING | awk '{print $1}'").splitlines()
    print "Current running coordinator jobs: " + str(runningJobs)
    for jobId in runningJobs:
        os.system("oozie job -oozie " + oozieURL + " -kill " + jobId)
oozie jobs -oozie ${oozieURL} -filter status=RUNNING -kill

Manage RabbitMQ consumers

I am planning to write a log processing application using RabbitMQ, Symfony2 and the RabbitMqBundle.
The tool I am working on has to be highly available and must process millions of entries per day, so it's important that the consumers are always up and running (short breaks are fine), otherwise my queue might overflow after a while.
Are there best practices for managing the consumers (written in PHP) and starting/restarting them in case of an error, etc.?
Thanks
I use this bash script to make sure that all required consumers are running on imagepush.to:
#!/bin/bash
NB_TASKS=1
SYMFONY_ENV="prod"
TEXT[0]="app/console rabbitmq:consumer primary"
TEXT[1]="app/console rabbitmq:consumer secondary"

for text in "${TEXT[@]}"
do
    # count how many instances of this consumer are already running
    NB_LAUNCHED=$(ps ax | grep "$text" | grep -v grep | wc -l)
    TASK="/usr/bin/env php ${text} --env=${SYMFONY_ENV}"
    # top up to NB_TASKS instances
    for (( i=${NB_LAUNCHED}; i<${NB_TASKS}; i++ ))
    do
        echo "$(date +%c) - Launching a new consumer"
        nohup $TASK &
    done
done
If I remember correctly, I took the base from the KnpBundles code.
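A usage sketch: run the script from cron every minute so crashed consumers come back quickly (the script and log paths are placeholders):
* * * * * /path/to/check_consumers.sh >> /var/log/consumer_watchdog.log 2>&1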
Stop consumer
To stop your consumer with name my_consumer use
kill `ps aux | grep 'rabbitmq:consumer my_consumer' | grep -v grep | awk '{print $2}'`
ps aux | grep 'rabbitmq:consumer my_consumer' - will find all running processes of the consumer
grep -v grep - will exclude your own search process
awk '{print $2}' - get only process id from the row
kill - will terminate all found processes
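A shorter equivalent, assuming pkill is available on your system (pkill -f matches against the full command line):
pkill -f 'rabbitmq:consumer my_consumer'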
Start consumer
To start your consumer with name my_consumer use
nohup /usr/bin/env php app/console rabbitmq:consumer my_consumer --env=prod &
I have a lot of consumers in the project and it became hard to restart them after each deploy, so I started using Capistrano + the Symfony plugin to deploy my project. I wrote a few custom tasks to start/stop/restart the consumers based on a YAML config. The tasks are based on the commands above.

Increase the performance of the code by reducing the number of ssh calls

This function takes a huge amount of time to determine the status of a process, because every time it has to ssh into a machine and find the status of one process.
I only have four machines and around 50+ processes to monitor; the details are listed in configDaemonDetails.txt,
like:
abc@sn123|Daemon_1|processname_1
abc@sn123|Daemon_2|processname_2
efg@sn321|Daemon_3|processname_3
How can I reduce the time by sshing into each machine only once and collecting the information for all of its processes defined in the txt file?
CheckProcessStatus ()
{
    echo " ***** Checking Process Status ***** "
    echo "========================================================="
    IFS='|'
    grep -v "^#" configDaemonDetails.txt | while read MachineDetail Daemon ProcessName
    do
        # one ssh per config entry -- this is the expensive part
        Status=`ssh -f -T ${MachineDetail} ps -ef | egrep -v "grep|less|vi|more" | grep "$ProcessName"`
        # keep only the first matching line and pull out the start date/time
        RunTime=`echo "$Status" | head -n 1 | awk '{print $5" "$6}'`
        if [ -z "$Status" ]
        then
            echo "The Process is DOWN $Daemon | $ProcessName "
        else
            echo "The Process $Daemon | $ProcessName is up since $RunTime"
        fi
    done
    echo "-----------------------------------------------------"
}
Thanks :)
Can't you just fetch the entire ps -ef output at once, and then parse it appropriately? I suspect that is what you are asking, and maybe all you want is an example of how to do that? If that is the case, say so and I'll flesh out an example.
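A minimal sketch of that approach: ssh once per machine, cache the full ps -ef output, then check every configured process against the cache. The config format matches the question; everything else follows the original function:
CheckProcessStatusBatch ()
{
    # unique machines from the config (field 1, comment lines skipped)
    grep -v "^#" configDaemonDetails.txt | cut -d'|' -f1 | sort -u | while read Machine
    do
        # one ssh per machine: cache the complete process listing
        PsOutput=$(ssh -T "${Machine}" ps -ef 2>/dev/null)
        # check every daemon configured for this machine against the cache
        grep -v "^#" configDaemonDetails.txt | grep "^${Machine}|" | while IFS='|' read -r M Daemon ProcessName
        do
            Status=$(echo "$PsOutput" | egrep -v "grep|less|vi|more" | grep "$ProcessName")
            if [ -z "$Status" ]
            then
                echo "The Process is DOWN $Daemon | $ProcessName"
            else
                RunTime=$(echo "$Status" | head -n 1 | awk '{print $5" "$6}')
                echo "The Process $Daemon | $ProcessName is up since $RunTime"
            fi
        done
    done
}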
SSH is a bit of an overkill for getting the status of a process; I'd suggest using SNMP instead.
E.g., you can get a process list like this:
snmpwalk -v2c -cPASSWORD HOST 1.3.6.1.2.1.25.4.2.1
Take a look at this Nagios plugin that does process checks, and look in the code for the actual SNMP OIDs.
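For example, the hrSWRunName column (1.3.6.1.2.1.25.4.2.1.2) holds the process names, so a hedged check for a single process could look like this (PASSWORD/HOST are the same placeholders as above):
# count how many running processes carry the monitored name
snmpwalk -v2c -cPASSWORD HOST 1.3.6.1.2.1.25.4.2.1.2 | grep -c "processname_1"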
