What happens when you start an Autosys job that is already running? Is the start ignored or is another instance of the job created and executed in parallel with the already-running job?
I know for a fact that issuing a FORCE_STARTJOB against a running job results in an error saying that the job is already running and that FORCE_STARTJOB has no effect, but I'm not sure how a STARTJOB is handled when the job is already running.
I found out that an already-running job ignores any FORCE_STARTJOB or STARTJOB events.
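For reference, both start events are sent with the sendevent utility, along these lines (my_job is a placeholder job name):

$ sendevent -E STARTJOB -J my_job        # starts the job if its starting conditions are met
$ sendevent -E FORCE_STARTJOB -J my_job  # starts the job regardless of its starting conditions

In both cases the event is ignored if my_job is already in the RUNNING state.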
I work in a team that uses one of the big cloud providers to host the stuff that we do. Every morning before I come into work a scheduled job stands up a development environment within that cloud, and every evening another scheduled job tears it all down again. That development environment includes an instance of Apache Airflow, and another thing the morning job does is run an Airflow DAG which contains a single task.
I have an intermittent problem with that DAG: the DAG run is created, but occasionally the task instance for that one task fails to get scheduled. It happened again this morning when I looked at the task instance details.
In this case:
the scheduler is running and is definitely not under heavy load (nothing else is running)
as far as I'm aware it has not already run
I have an easy way of fixing this: I restart the Airflow scheduler (which, because we have set up Airflow to run as a Linux service, involves ssh'ing onto the VM on which Airflow is installed and issuing systemctl restart airflow-scheduler). Immediately after doing this the task instance begins to execute.
As I said, this problem is intermittent and I cannot determine the root cause; some mornings everything works fine, other mornings it gets stuck like this. This morning it is stuck.
I have read Why isn't my task getting scheduled? and one thing there that caught my attention was:
Is your start_date set properly? The Airflow scheduler triggers the task soon after the start_date + schedule_interval is passed.
I have just had a look at the task and its start_date is None.
The schedule_interval of the DAG is None because we don't schedule this DAG; we manually trigger it (which is what my morning job does).
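(For what it's worth, the manual trigger is effectively the Airflow 1.x CLI call below; the DAG id is a placeholder for ours.)

$ airflow trigger_dag dev_environment_dag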
So the task doesn't have a start_date and the schedule_interval of the DAG is None, which sort of explains why it's not running, but it doesn't explain why some days it does run and some days it does not.
I have just gone and restarted the scheduler service (as explained above) and the task is now running. Taking a look at the details of the task instance again, it has now gained a start_date.
I'm not clear on why restarting the scheduler causes the task instance to start running. Can anyone suggest what might be the cause? I admit I don't have a great understanding of start_date.
UPDATE 2020-04-21: A colleague brought to my attention a bug that sounds similar (though it may not be the same): AIRFLOW-1641 - Task gets stuck in queued state. That issue was fixed in Airflow 1.9; we are currently using Airflow 1.8.1 but will soon be upgrading to Airflow 1.10.
You are correct, restarting the scheduler shouldn't change the start date of the DAG. I'm wondering if you have a small logic bug in the job that initially creates the Airflow instance and DAG. It sounds like everything would work fine if your DAG had a start date to begin with; then you wouldn't need to dive into why restarting the scheduler gets it to work.
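To illustrate the suggestion, here is a minimal sketch of how a manually-triggered DAG could declare its start_date up front; the DAG id, task and command are made-up placeholders, not the asker's actual code:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    "owner": "airflow",
    # Per the FAQ quoted in the question, the scheduler wants start_date
    # (+ schedule_interval) to have passed before it will run a task.
    "start_date": datetime(2020, 1, 1),
}

# schedule_interval=None: the DAG is never scheduled, only triggered manually.
dag = DAG("dev_environment_dag", default_args=default_args, schedule_interval=None)

# Placeholder for the single task the DAG contains.
run_task = BashOperator(task_id="run_task", bash_command="echo hello", dag=dag)

With schedule_interval=None the scheduler never creates runs on its own, so the manual trigger from the morning job remains the only way the DAG runs.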
I am trying to understand AutoSys jobs. Suppose I have Job A that runs every 15 minutes. If for some reason Job A takes more than 15 minutes, will another instance of it run, or will it wait for the job to finish before running another instance?
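For context, I mean a job defined in JIL roughly like this (the job name, command and machine are placeholders, and exact attributes vary by AutoSys version):

insert_job: job_a
job_type: cmd
command: /path/to/script.sh
machine: some_host
date_conditions: 1
days_of_week: all
start_mins: 00,15,30,45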
In my experience, if the previous run is still running when the next scheduled time comes, another instance will not start. The job runs again only once the previous run has finished and the next scheduled time arrives.
Another user also experienced this according to this answer.
I did not find any AutoSys documentation that officially confirms what happens in this situation, but I guess the best way to find out is to test it on your AutoSys instance.
I have experienced this first-hand and can confirm that there won't be two instances in the mentioned scenario. The job will wait for the previous run to complete and will immediately kick off the next instance if the time condition was met while the previous run was still going.
But this is the case only when the job is in the RUNNING state; if the job is in any other state it will kick off based on the given start time condition.
An Oozie coordinator we own was killed for operational reasons about a week ago. The cluster is now back up and running and ready for business. Can we revive it somehow so it keeps its run history and backfills all the missing runs, or do we have to schedule a brand new one?
oozie job -resume xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C doesn't error out, but it also doesn't change the status of the coordinator back to RUNNING.
Have you tried the KILLED -> IGNORED -> RUNNING transition? Based on the docs it should be possible.
It's a two-step process: the first step is -ignore, the second is -change.
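If the docs are right, the two steps would look roughly like this (I have not verified the -change syntax, so treat the status value as a guess to check against the docs):

$ oozie job -ignore xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C
$ oozie job -change xxxxxxx-xxxxxxxxxxxxxxx-oozie-oozi-C -value status=RUNNING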
I've never tried to do this though :)
We are using DBMS_PARALLEL_EXECUTE in Oracle 11g R2 as follows:
a scheduled job runs a procedure which creates and runs a parallel task using the DBMS_PARALLEL_EXECUTE package. Sometimes the scheduled job hangs and needs to be restarted. My question is: how do I properly kill the executing parallel task?
Calling the DBMS_PARALLEL_EXECUTE.DROP_TASK or DBMS_PARALLEL_EXECUTE.STOP_TASK procedures does not help - I can still see the sessions of the task processes (it creates as many new processes as the parallel_level parameter of DBMS_PARALLEL_EXECUTE.RUN_TASK). The same goes for removing the scheduled job (dbms_job.remove) and killing the job session - the task sessions are still there.
Found a solution myself. I had been concentrating on the dbms_job package and the related view dba_jobs_running, but dbms_job is a deprecated package. Oracle (11g) uses the dbms_scheduler package to run the parallel task's jobs. They are visible in dba_scheduler_running_jobs and can be stopped with dbms_scheduler.stop_job. This also stops the upper-level job and all related sessions, and the parallel task gets the status CRASHED.
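In practice that amounted to something like the following; the job name here is only illustrative, the real names show up in the view:

-- find the scheduler jobs created for the parallel task
SELECT job_name FROM dba_scheduler_running_jobs;

-- stop one of them; force => TRUE also terminates the session running it
BEGIN
  DBMS_SCHEDULER.STOP_JOB(job_name => 'TASK$_123_1', force => TRUE);
END;
/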
Again, I'm stuck on Gearman. I was implementing the ulabox GearmanBundle, which works nicely, but there are two things which I don't understand yet.
How do I start a worker?
In the documentation, I should first execute a worker and then start the client code.
https://github.com/ulabox/GearmanBundle/blob/master/README.md
Open the first console and run:
$ php app/console gearman:worker:execute --worker=AcmeDemoBundle:AcmeWorker
Now open another console and run:
$ php app/console gearman:client:execute --client=UlaboxGearmanBundle:GearmanClient:hello_world --worker=AcmeDemoBundle:AcmeWorker --params="{\"foo\": \"bar\" }"
So, if I don't start the worker manually, the job won't get done by itself. If I start the worker, everything is fine. But it seems a bit strange to have to start it manually, even though an iteration count of x is set so that the worker kills itself after that number of jobs.
So please, can anyone help me out of this :((((
Heeeeeelp :) lol
Thanks in advance and kind regards
Phil
Yes, to run tasks in the background, not only the Gearman server needs to be running but also the workers.
So you have the Gearman server running, waiting for commands (e.g. send an email).
Additionally, you have workers waiting.
When Gearman sees a new command it looks for the first free worker and passes the command to it.
The worker then processes the command and, after finishing, reports back to the Gearman server that it is done and ready to process a new command.
The more workers you have, the faster the commands in the queue are processed.
You can use "supervisor" to keep the workers running automatically.
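For example, a supervisor program entry roughly like this keeps one worker alive and restarts it whenever it exits (the program name, worker and log path are just examples):

[program:acme-gearman-worker]
command=php app/console gearman:worker:execute --worker=AcmeDemoBundle:AcmeWorker --no-interaction
numprocs=1
autostart=true
autorestart=true
stdout_logfile=/var/log/gearman-worker.log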
Below you can find a few links with more information:
http://www.daredevel.com/php-jobs-with-gearman-and-supervisor/
http://www.masnun.com/2011/11/02/gearman-php-and-supervisor-processing-background-jobs-with-sanity.html
Running Gearman Workers in the Background