I have an Oozie workflow that consists of 4 sub-workflows. I need to configure Oozie in such a way that:
the parent workflow runs every 4 hours
sub-workflow 1 and sub-workflow 2 run every 4 hours
sub-workflow 3 runs only every 12 hours
sub-workflow 4 runs only once a day
I could not find any property in the Oozie coordinator or workflow configuration that would let me schedule the workflow as described above. What would be the best approach to achieve this?
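One way to reason about this schedule, sketched in Python rather than Oozie XML: trigger the parent every 4 hours and let it decide from the nominal start hour which sub-workflows are due. The function name and the assumption that runs land at 00:00, 04:00, 08:00 and so on are hypothetical; in Oozie a similar check might live in a decision node that receives the coordinator's nominal time.

```python
from typing import List


def due_subworkflows(nominal_hour: int) -> List[str]:
    """Return the sub-workflows due for a parent run at nominal_hour (0-23)."""
    if nominal_hour % 4 != 0:
        raise ValueError("parent is assumed to run only at hours divisible by 4")
    due = ["sub-workflow 1", "sub-workflow 2"]  # every 4 hours
    if nominal_hour % 12 == 0:
        due.append("sub-workflow 3")  # every 12 hours (00:00 and 12:00)
    if nominal_hour == 0:
        due.append("sub-workflow 4")  # once a day (00:00 only)
    return due


for hour in range(0, 24, 4):
    print(f"{hour:02d}:00 -> {due_subworkflows(hour)}")
```

An alternative under the same assumptions would be three separate coordinators (4-hour, 12-hour and daily), each calling only the sub-workflows it owns.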
Related
I am using the Firebase Pub/Sub scheduler in Node.js.
I have a scheduled function that runs every minute. If the task inside it takes 10 minutes to complete and another invocation starts before those 10 minutes are up, what happens to the instance that is already running?
Cloud function instances are independent of each other. The previous one will keep running and processing what it was.
Cloud Functions can run for a maximum of 9 minutes (10 minutes for 2nd gen), so check whether that limit is suitable for your use case.
Another way to phrase what @Dharmaraj said: the scheduler just publishes a message every minute. Each job triggered by such a message then runs for its 10 minutes, independently of the others.
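The overlap can be illustrated with a small simulation. This is a sketch in Python using threads as stand-ins for function instances, not Firebase code; `handle_tick`, `run_scheduler` and the shortened timings are all hypothetical.

```python
import threading
import time
from typing import List, Tuple


def handle_tick(tick: int, work_seconds: float, log: List[Tuple[str, int]]) -> None:
    """Stand-in for one function instance handling one scheduler message."""
    log.append(("start", tick))
    time.sleep(work_seconds)  # the long-running task
    log.append(("end", tick))


def run_scheduler(ticks: int, interval_seconds: float, work_seconds: float) -> List[Tuple[str, int]]:
    """Fire one independent handler per tick without waiting for earlier ones."""
    log: List[Tuple[str, int]] = []
    threads = []
    for tick in range(ticks):
        t = threading.Thread(target=handle_tick, args=(tick, work_seconds, log))
        t.start()
        threads.append(t)
        time.sleep(interval_seconds)  # the next message arrives on schedule
    for t in threads:
        t.join()
    return log


# The work takes far longer than the interval, so all handlers overlap.
events = run_scheduler(ticks=3, interval_seconds=0.01, work_seconds=0.2)
print(events)
```

Every handler starts on schedule and none is cancelled by a later one, so any shared state the task touches has to tolerate concurrent runs.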
I have just started to explore Apache Airflow.
Is there any way to run a job that inspects the running DAGs and moves waiting tasks out of them into a new DAG that it creates for that purpose?
For example: DAG A has four tasks, and the 4th one has been waiting for 7 hours to start. The goal is to create a new DAG and move that task into it automatically.
Scenario: we have around 40 VMs, and each job's duration varies from run to run. For example, Task A may take 2 hours today but 12 hours tomorrow in the same DAG. What I need is to move a task to another DAG once its waiting time exceeds a certain threshold, so that it can run immediately on another VM.
The main benefit is to keep every task's waiting time as low as possible by building DAGs dynamically.
Currently I have a task that runs every 5 minutes.
What I want is for that task to rerun each time it completes, after a 1-minute delay.
What I have in mind is to create two tasks, task A and task B. Task B would run after task A completes and vice versa, but I am not sure how to set that up.
I have found a workaround for my situation: I run task A and then task B in a loop, with a delay in between.
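A minimal sketch of that workaround in plain Python, assuming the tasks are ordinary callables. The `alternate` helper and its injectable `sleep` parameter are hypothetical names, and the fake sleep below only records the waits so the example finishes immediately.

```python
import time
from typing import Callable, List


def alternate(
    task_a: Callable[[], None],
    task_b: Callable[[], None],
    delay_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run task A, wait, run task B, wait, and repeat for the given rounds."""
    for _ in range(rounds):
        task_a()
        sleep(delay_seconds)  # the delay starts only after A has completed
        task_b()
        sleep(delay_seconds)  # likewise after B


order: List[str] = []
alternate(
    task_a=lambda: order.append("A"),
    task_b=lambda: order.append("B"),
    delay_seconds=60,
    rounds=2,
    sleep=lambda s: order.append(f"wait {s}s"),  # record instead of sleeping
)
print(order)
```

Because the delay is measured from completion rather than from a fixed clock tick, a slow run simply pushes the next one back instead of overlapping with it.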
I am scheduling an Oozie MapReduce job to run every 15 minutes. What happens if a job takes longer than that interval? Will it result in a job backlog, or will Oozie create a new task / thread / fork for the new job while the previous one is still running?
By default, Oozie won't run the next job before the previous one is over. If the first job takes more than 15 minutes to execute, the next one will start later than its scheduled time. So the scheduled time and the actual start time may differ in Oozie.
EDIT:
Note that the behaviour described above is only the default and can be changed. You can set the concurrency property in the controls block to a value greater than 1, and the next job will then start even if the first one is still running. Check my answer on a similar question.
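The effect of the concurrency setting on start times can be worked through with a simplified model. This Python sketch is not Oozie's scheduler: it assumes every job takes the same time, jobs start in scheduled order, and nothing times out. `actual_start_times` is a hypothetical helper.

```python
import heapq
from typing import List


def actual_start_times(period: int, duration: int, runs: int, concurrency: int = 1) -> List[int]:
    """Return when each run starts if at most `concurrency` runs may overlap."""
    finish_times: List[int] = []  # min-heap of finish times for occupied slots
    starts = []
    for i in range(runs):
        nominal = i * period  # the scheduled time of run i
        if len(finish_times) < concurrency:
            start = nominal  # a slot has never been used, so start on schedule
        else:
            # Wait for the slot that frees up first, if it is still busy.
            start = max(nominal, heapq.heappop(finish_times))
        heapq.heappush(finish_times, start + duration)
        starts.append(start)
    return starts


# Jobs scheduled every 15 minutes that each take 20 minutes:
print(actual_start_times(period=15, duration=20, runs=4, concurrency=1))  # [0, 20, 40, 60]
print(actual_start_times(period=15, duration=20, runs=4, concurrency=2))  # [0, 15, 30, 45]
```

In this model, a concurrency of 1 makes the backlog grow by 5 minutes per run, while a concurrency of 2 lets every run start at its scheduled time.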
My requirement is to create a job in Informatica that runs every 15 minutes and checks a status column in the abc table. If the status is "Approved", the job should exit and kick off the rest of the jobs.
If the status is not "Approved", it should do nothing and run again after 15 minutes. This process continues until the status is "Approved".
So in either case the check itself runs every 15 minutes.
I have implemented the same requirement in Unix using loops and conditional statements, but I am not sure how to achieve it in Informatica. Could you please help me with this?
Regards,
Karthik
I would try adding a scheduler that runs every 15 minutes. The best way that I've found to "loop" sessions in Informatica is:
run the session once and check whether it failed using conditional links
if it did fail, run a timer task for some amount of time (a minute, an hour, whatever)
then try the same session again by pasting a copy of the session after the timer task, and repeat a few times as necessary.
So if you added a scheduler into the mix, you could set the scheduler to run the workflow every 15 minutes and have the timer tasks halt the workflow for 4 or 5 minutes each. Then you could use the SESSSTARTTIME function in some pre/post-session task to determine when the scheduler will fire again and simply abort the workflow before that time.
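For comparison, the polling logic from the question can be sketched outside Informatica. This is plain Python, not Informatica configuration; `wait_for_approval`, `get_status` and `max_checks` are hypothetical names, and the status source is faked with an iterator instead of a query against the abc table.

```python
import time
from typing import Callable, Optional


def wait_for_approval(
    get_status: Callable[[], str],
    interval_seconds: float = 15 * 60,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it returns "Approved" or max_checks is used up."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if get_status() == "Approved":
            return True  # the caller can now kick off the downstream jobs
        checks += 1
        sleep(interval_seconds)  # not approved yet: wait for the next check
    return False


# Fake status source: approved on the third check. Sleeping is stubbed out.
statuses = iter(["Pending", "Pending", "Approved"])
approved = wait_for_approval(lambda: next(statuses), sleep=lambda s: None)
print(approved)
```

The optional `max_checks` cap plays the same role as aborting the workflow before the scheduler fires again: it keeps one polling run from overlapping with the next.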