How to re-run a failed Oozie job - oozie

I want to know how failed Oozie jobs are re-executed.
Do they run from the start, or from the failed point?
I have an MR task and a Pig task to be executed.
If the Pig job fails, does the re-run start again from the MR job?
How do I re-run failed Oozie jobs?

It will start from the failed action.
oozie job -oozie <oozie_url> -rerun <job_id> -config <job.properties>
Add the following property to job.properties if it is not already present. The -config parameter is required only if you are updating the job.properties file.
oozie.wf.rerun.failnodes=true
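
For example, assuming the Oozie server runs on the default port 11000, and using a placeholder workflow ID:

oozie job -oozie http://localhost:11000/oozie -rerun 0000123-200101010000001-oozie-oozi-W -config job.properties

With oozie.wf.rerun.failnodes=true in job.properties, only the failed Pig action is re-executed; the already-succeeded MR action is skipped.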

Related

Restart an Autosys job via another Autosys job when it is terminated

I am setting up an Autosys job to restart another job when the main job is terminated.
insert_job: Job_Restarter
job_type: cmd
condition: t(main_job)
box_name: my_test_box
permission: gx,ge
command: sendevent -E FORCE_STARTJOB -J main_job
When the main job is terminated, the restart job runs but fails, and I get an error code of 1. I know this is a generic error code, but does anyone have an idea of what I am doing wrong?
Edit:
Did some digging: "sendevent" is not recognized as a command. Is there another way to restart the job through another job?
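
If the agent cannot find sendevent, the Autosys client environment is probably not loaded in the shell it spawns. A minimal wrapper sketch, assuming a typical install layout (both paths below are assumptions; substitute your site's Autosys environment profile):

#!/bin/sh
# Load the Autosys client environment so sendevent is on PATH (profile path is an assumption)
. /opt/CA/WorkloadAutomationAE/autouser.ENV/autosys.sh.machine_name
sendevent -E FORCE_STARTJOB -J main_job

Then point the job's command attribute at this wrapper script instead of calling sendevent directly.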

Error while running a Slurm job that uses Intel MPI through crontab

I am trying to run WRF (real.exe, wrf.exe) through crontab using compute nodes, but the compute nodes are not able to run the Slurm job. I think there is some issue with the MPI library when it runs through the cron environment.
I tried to replicate the terminal's environment variables when running the crontab job.
The log files generated when running the job from the terminal and from crontab are attached as with_terminal and with_crontab respectively in the link.
https://drive.google.com/drive/folders/1YE9OchSB8alpZSdRl-8uIbBPm6lI-0DQ
The error while running the job from crontab is as follows:
#################################
compute-dy-c5n18xlarge-1
#################################
Processes 4
[mpiexec#compute-dy-c5n18xlarge-1] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on compute-dy-c5n18xlarge-1 (pid 23070, exit code 65280)
[mpiexec#compute-dy-c5n18xlarge-1] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec#compute-dy-c5n18xlarge-1] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec#compute-dy-c5n18xlarge-1] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:772): error waiting for event
[mpiexec#compute-dy-c5n18xlarge-1] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1938): error setting up the boostrap proxies
Thanks for looking into the issue.
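
Cron runs jobs with a minimal environment, so the Intel MPI variables set by your login shell are not available. One common workaround is a wrapper script that loads the MPI environment before submitting; a sketch, assuming an oneAPI install (the setvars.sh path is an assumption; older Intel MPI installs source mpivars.sh instead):

#!/bin/bash
# Load the Intel MPI environment that cron does not inherit (path is an assumption)
source /opt/intel/oneapi/setvars.sh > /dev/null
cd /path/to/wrf/run        # placeholder run directory
sbatch run_wrf.sbatch      # placeholder Slurm batch script

Call this wrapper from the crontab entry instead of invoking the job directly.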

How to re-run all failed tasks in Apache Airflow?

I have an Apache Airflow DAG with tens of thousands of tasks, and after a run a handful of them failed.
I fixed the bug that caused some tasks to fail, and I would like to re-run ONLY THE FAILED TASKS.
This SO post suggests using the GUI to "clear" the failed tasks:
How to restart a failed task on Airflow
This approach works if you only have a handful of failed tasks.
I am wondering if we can bypass the GUI and do it programmatically, through the command line, with something like:
airflow_clear_failed_tasks dag_id execution_data
Use the following command to clear only failed tasks:
airflow clear [-s START_DATE] [-e END_DATE] --only_failed dag_id
Documentation: https://airflow.readthedocs.io/en/stable/cli.html#clear
The command to clear only failed tasks was updated. It is now (Airflow 2.0 as of March 2021):
airflow tasks clear [-s START_DATE] [-e END_DATE] --only-failed dag_id
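
For example, to clear only the failed tasks of a hypothetical DAG my_dag over a one-day window (add -y to skip the confirmation prompt):

airflow tasks clear -s 2021-03-01 -e 2021-03-02 --only-failed -y my_dag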

Report of oozie jobs

How can we get the status of Oozie jobs running daily? We have many jobs running in Oozie coordinators, and currently we monitor them through the Hue Oozie browser.
Is there any way to get a single log file that contains the coordinator/workflow name with the date and status? Can we write a program or script to achieve this?
You can use the commands below and put them into a script that runs daily via cron.
oozie jobs -oozie http://localhost:11000/oozie -filter status=RUNNING -len 2
oozie jobs -oozie http://localhost:11000/oozie -filter startcreatedtime=2016-06-28T00:00Z\;endcreatedtime=2016-06-28T10:00Z -len 2
Basically, you are using Oozie's jobs API with the -filter option to get information about workflows/coordinators/bundles. The -filter option supports several keys for selecting data, such as status, startcreatedtime, and name.
By default it returns workflow records; if you want coordinator or bundle information, use the -jobtype parameter with the value coord or bundle.
Let me know if you need anything specific. The Oozie documentation is a little outdated for this feature.
Command to get status of all running oozie coordinators
oozie jobs -jobtype coordinator -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get status of all running oozie workflows
oozie jobs -filter status=RUNNING -len 1000 -oozie http://localhost:11000/oozie
Command to get status of all workflows for a specific coordinator ID
oozie job -info COORDINATOR_ID_HERE
Based on these queries, you can write the scripts you need to get what you want.
Terms explanation:
oozie: the Oozie CLI client
job/jobs: the API being called
-len: number of Oozie workflows/coordinators to display
-oozie: parameter specifying the Oozie URL
-filter: parameter specifying a list of filters
Complete documentation https://oozie.apache.org/docs/3.1.3-incubating/DG_CommandLineTool.html
The command below worked for me:
oozie jobs -oozie http://xx.xxx.xx.xx:11000/oozie -jobtype wf -len 300 | grep 2016-07-01 > OozieJobsStatus_20160701.txt
However, you still need to parse the output file.
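
A minimal sketch of a daily report script that could be scheduled via cron (the Oozie URL and output path are assumptions; adjust them to your environment):

#!/bin/bash
# Dump the status of running coordinators and workflows into a dated report file
OOZIE_URL=http://localhost:11000/oozie   # assumption: change to your Oozie server
OUT=/var/log/oozie-reports/oozie_status_$(date +%Y%m%d).txt
mkdir -p "$(dirname "$OUT")"
oozie jobs -oozie "$OOZIE_URL" -jobtype coordinator -filter status=RUNNING -len 1000 > "$OUT"
oozie jobs -oozie "$OOZIE_URL" -filter status=RUNNING -len 1000 >> "$OUT"

A crontab entry such as 0 6 * * * /path/to/oozie_daily_report.sh would then produce one file per day.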

how to find the total time taken by a job in oozie

Is there an Oozie command line I can use, or some other way, to figure out the start and end time of Oozie jobs submitted through a coordinator?
Yes, you can see the start and end time of a coordinator job:
$ oozie job -oozie http://localhost:8080/oozie -info 14-20090525161321-oozie-joe
For more info on workflow, coordinator, and bundle jobs, see the Oozie command-line documentation linked above.
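
To get the total time taken, you can pull the Started and Ended fields out of the -info output; a rough sketch (the field labels below match the usual oozie job -info layout, but verify them against your Oozie version, and substitute a real workflow ID):

oozie job -oozie http://localhost:11000/oozie -info WORKFLOW_ID_HERE | grep -E '^(Started|Ended)'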
