Is there a way to revive a coordinator? - oozie

An Oozie coordinator we own was killed for operational reasons about a week ago. The cluster is now back up and running and ready for business. Can we revive the coordinator somehow so that it keeps its run history and backfills all the missing runs, or do we have to schedule a brand new one?
oozie job -resume xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C doesn't error out, but it also doesn't change the status of the coordinator back to RUNNING.

Have you tried out the killed -> ignored -> running transition? Based on the docs it should be possible.
It's a two-step process: the first step uses -ignore, the second uses -change.
I've never tried to do this though :)
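The two steps could be scripted roughly as follows. This is a minimal sketch that has not been run against a live server. It assumes an Oozie release whose CLI supports -ignore and -change with -value status=RUNNING (check `oozie help job` on your version). The helper name revive_commands and the server URL are made up for illustration, and the job ID is the placeholder from the question. It only builds and prints the command lines, so nothing is executed.

```python
import shlex


def revive_commands(coord_id, oozie_url):
    """Build the two CLI calls for the KILLED -> IGNORED -> RUNNING path.

    Assumption: the installed Oozie CLI accepts -ignore and
    -change ... -value status=RUNNING; verify this for your version.
    """
    base = ["oozie", "job", "-oozie", oozie_url]
    return [
        base + ["-ignore", coord_id],                              # KILLED -> IGNORED
        base + ["-change", coord_id, "-value", "status=RUNNING"],  # IGNORED -> RUNNING
    ]


if __name__ == "__main__":
    # Hypothetical server URL; the job ID is the placeholder from the question.
    cmds = revive_commands("xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C",
                           "http://localhost:11000/oozie")
    for cmd in cmds:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything against a production coordinator.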

Related

Airflow Dependencies Blocking Task From Getting Scheduled

I have an Airflow instance that had been running with no problems for 2 months until Sunday. There was a blackout in a system on which my Airflow tasks depend, and some tasks were queued for 2 days. After that we decided it was better to mark all the tasks for that day as failed and just lose that data.
Nevertheless, all the new tasks now get triggered at the proper time but are never set to any state (neither queued nor running). I checked the logs and I see this output:
Dependencies Blocking Task From Getting Scheduled
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
The scheduler is down or under heavy load
The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
This task instance already ran and had its state changed manually (e.g. cleared in the UI)
I get the impression that the third point is the reason why it is not working.
The scheduler and the webserver were both working. I restarted the scheduler anyway and still get the same outcome. I also deleted the data in the MySQL database for one job, and it is still not running.
I also saw a couple of posts saying that a task does not run when depends_on_past is set to true and the previous run failed, because the next one will then never be executed. I checked this too, and it is not my case.
Any input would be really appreciated.
Any ideas? Thanks
While debugging a similar issue I found this setting: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE (see http://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#max-dagruns-per-loop-to-schedule). Checking the Airflow code, it seems that the scheduler queries for DAG runs to examine (that is, runs whose task instances it will consider running), and this query is limited to that number of rows (20 by default). So if you have more than 20 DAG runs that are blocked in some way (in our case because task instances were in the up-for-retry state), the scheduler won't consider other DAG runs even though they could run fine.
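To make the starvation effect concrete, here is a toy model of it. It is not Airflow code: the DagRun class, the fixed ordering of the runs, and both function names are assumptions made up for illustration, and the real scheduler's query and ordering are more involved. It only shows how a row limit can hide healthy runs that sit behind blocked ones.

```python
from dataclasses import dataclass


@dataclass
class DagRun:
    run_id: str
    blocked: bool  # True if none of its task instances can be scheduled yet


def runs_examined(dag_runs, max_per_loop=20):
    """Toy model: each loop looks only at the first `max_per_loop` runs."""
    return dag_runs[:max_per_loop]


def schedulable(dag_runs, max_per_loop=20):
    """Return the run IDs that one scheduler loop could actually advance."""
    return [r.run_id for r in runs_examined(dag_runs, max_per_loop) if not r.blocked]


if __name__ == "__main__":
    # 25 blocked runs sit ahead of 5 healthy ones.
    runs = [DagRun(f"blocked_{i}", True) for i in range(25)]
    runs += [DagRun(f"ok_{i}", False) for i in range(5)]
    print(schedulable(runs))                   # limit of 20: healthy runs are never reached
    print(schedulable(runs, max_per_loop=50))  # larger limit: healthy runs are reached
```

In this model, raising the limit or unblocking the stuck runs both let the healthy runs through, which matches the two remedies the answer implies.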

Triggering an Airflow DAG from terminal always keep running state

I am trying to use airflow trigger_dag dag_id to trigger my DAG, but it just shows the running state and doesn't do anything more.
I have searched through many questions, but everyone just says the DAG is paused. The problem is that my DAG is unpaused, yet it still stays in the running state.
Note: I can use one DAG to trigger another one in the web UI, but it doesn't work from the command line.
I have had the same issue many times. The state of the task is not running and not queued either; it is stuck after a 'clear'. Sometimes I found the task going to the Shutdown state before getting stuck. After a long time the instance fails, yet the task status still shows as white. I have solved it in several ways. I can't give the reason or an exact solution, but try one of these:
Try the trigger dag command again with the same execution date and time instead of using the clear option.
Try a backfill; it will run only the unsuccessful instances.
Or try a different time within the same interval; this creates another, fresh instance that does not have the issue.

Autosys Can a job run multiple instance at the same time

I am trying to understand AutoSys jobs. Suppose I have Job A that runs every 15 minutes. If for some reason Job A takes more than 15 minutes, will another instance of it run, or will it wait for the job to finish before running another instance?
In my experience, if the previous run is still running when the next scheduled time comes, another instance will not start. The job next runs once the previous run has finished and the next scheduled time comes.
Another user also experienced this according to this answer.
I did not find any AutoSys documentation that officially confirms what happens in this situation, but I guess the best way to find out is to test it on your AutoSys instance.
I have experienced this first hand and can confirm that there won't be two instances in the scenario described. The job waits for the previous run to complete and immediately kicks off the next instance if the time condition was met before the previous run completed.
But this is the case only when the job is in the running state. If the job is in any other state, it kicks off based on the given start_time condition.

Oozie job taking longer than scheduled interval

I am scheduling an Oozie MapReduce job to run every 15 minutes. What happens if each job takes longer than that interval? Will it result in a job backlog? Or will Oozie create a new task / thread / fork for the new job while the previous one is still running?
Oozie won't run the next job before the previous one is over. If the first job takes more than 15 minutes to execute, the next one will run after its scheduled time. So the scheduled time and the actual running time may differ in Oozie.
EDIT:
The behaviour described above is only the default and can be changed. You can set the concurrency property in the controls block to more than 1, and the next job will then run even if the first one is still running. Check my answer on a similar question.
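The effect of the concurrency setting can be illustrated with a small simulation. This is a toy model of the behaviour described in this answer, not of Oozie internals: the function name, the first-in-first-out start order, and the fixed runtime per action are all assumptions made for illustration. Times are in minutes.

```python
import heapq


def actual_start_times(n_actions, frequency, runtime, concurrency=1):
    """Toy model of coordinator actions run FIFO with a concurrency cap.

    Action i becomes due at i * frequency and starts as soon as it is due
    and fewer than `concurrency` earlier actions are still running.
    """
    running = []  # min-heap of end times of actions still running
    starts = []
    for i in range(n_actions):
        start = i * frequency  # nominal (scheduled) time
        # Drop actions that have already finished by the nominal time.
        while running and running[0] <= start:
            heapq.heappop(running)
        # All slots busy: wait for the earliest running action to finish.
        if len(running) >= concurrency:
            start = heapq.heappop(running)
        starts.append(start)
        heapq.heappush(running, start + runtime)
    return starts


if __name__ == "__main__":
    # A 20-minute job scheduled every 15 minutes.
    print(actual_start_times(4, frequency=15, runtime=20))                 # runs drift later
    print(actual_start_times(4, frequency=15, runtime=20, concurrency=2))  # runs start on time
```

With concurrency 1 each run starts when the previous one ends, so the delay grows with every run. With concurrency 2 two runs may overlap and each one starts at its scheduled time.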

How to reschedule a coordinator job in OOZIE without restarting the job?

When I changed the start time of a coordinator job in job.properties in Oozie, the job did not pick up the changed time; instead it keeps running at the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the changed time (the 07th minute); it is still running at the 08th minute of every hour.
Could you let me know how I can make the job pick up the updated properties (the changed timing) without restarting or killing the job?
You can't really change the timing of the coordinator through any method Oozie (v3.3.2) provides. When you submit a job, the contents of the properties file are stored in the database, whereas the actual workflow lives in HDFS.
Every time the coordinator executes, the workflow must be present at the path specified in the properties at submission time, but the properties file itself is no longer needed. In other words, the properties file does not come into the picture after the job has been submitted.
One hack is to update the time directly in the database with a SQL query, but I am not sure about the implications. The property might become inconsistent across the database.
You have to kill the job and submit a new one.
Note: Oozie provides a way to change the concurrency, end time and pause time, as specified in the official docs.
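For the settings that can be changed on a live coordinator, the command line can be assembled as below. This is a minimal sketch under assumptions: the helper names change_value and change_command are made up, the job ID in the example is a placeholder, and the key names and the ';' separator follow my reading of the CLI docs for -change, so verify them against your Oozie version. The start time is deliberately not supported here, matching the answer above.

```python
def change_value(endtime=None, concurrency=None, pausetime=None):
    """Join the supplied settings into the string passed to -value."""
    settings = {"endtime": endtime, "concurrency": concurrency, "pausetime": pausetime}
    return ";".join(f"{key}={val}" for key, val in settings.items() if val is not None)


def change_command(coord_id, **settings):
    """Build the argument list for `oozie job -change` (not executed here).

    Without an -oozie option the CLI is assumed to use the OOZIE_URL
    environment variable for the server address.
    """
    value = change_value(**settings)
    if not value:
        raise ValueError("nothing to change")
    return ["oozie", "job", "-change", coord_id, "-value", value]


if __name__ == "__main__":
    # Placeholder job ID and example values, for illustration only.
    print(change_command("xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C",
                         endtime="2030-01-01T00:00Z", concurrency=2))
```

Passing the result as an argument list (for example to subprocess.run) avoids having to escape the ';' separator for the shell.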

Resources