I have an Autosys box that contains a couple of jobs.
For example, MY_BOX is the name of the box and it has the jobs JOB_1 and JOB_2.
I would like to configure the box so that it runs continuously, i.e., as soon as JOB_2 completes (success or failure), MY_BOX should start running again. Could you please tell me how to configure this?
I tried setting the "run condition" for MY_BOX to SUCCESS(JOB_2); however, the box does not start after the completion of JOB_2.
I'm not exactly sure how to make MY_BOX run immediately after the success of JOB_2, but you could set MY_BOX's run interval to just about (or slightly more than) its average run time. For example, if MY_BOX runs for about 10 minutes, have it run every ten or eleven minutes. Or try setting its condition to SUCCESS(MY_BOX).
You can also give MY_BOX the condition done(MY_BOX), since you want it to rerun regardless of the success or failure of JOB_2. If JOB_2 fails, then MY_BOX also fails, and the condition success(MY_BOX) will not make MY_BOX run again. With done(MY_BOX), the box starts running again immediately, irrespective of its own success or failure.
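As a rough JIL sketch, the change might look like the following (a configuration sketch, not tested; verify the attribute names against your Autosys version):

```
/* Hedged sketch: make MY_BOX rerun whenever its previous run
   finishes, regardless of success or failure */
update_job: MY_BOX
condition: done(MY_BOX)
```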
I am struggling to create (and to find examples of) what I understand as a "background timer" in Robot Framework.
My scenario is:
I have a test case to execute, and at a certain point I give one input to my system. I then need to keep testing other functionality along the way, but 30 minutes after that initial input I must check whether the expected outcome is happening or not.
For example:
Run Keyword And Continue On Failure    My_Keyword
# This keyword should trigger the system timer.
Run Keyword And Continue On Failure    Another_Keyword
# This one does not relate to the timer but must also be executed.
Run Keyword And Continue On Failure    My_Third_Keyword
# Same for this one.
Run Keyword And Continue On Failure    Check_Timer_Result
# This one checks whether the inputs from the first keyword are actually having an effect.
Since I know the keyword Sleep will pause the execution and wait, I have been thinking about using some logic plus the BuiltIn Get Time keyword, but I wonder whether this is the most efficient way.
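One non-blocking alternative to Sleep is to record a start timestamp when the first keyword runs, and only compare elapsed time in the final check. The helper below is a plain-Python sketch of that logic (the class and method names are my own invention); Robot Framework could import such a class as a custom library:

```python
import time


class ElapsedTimer:
    """Records a start time and reports whether a deadline has passed,
    without ever blocking test execution."""

    def __init__(self):
        self._start = None

    def start(self):
        # Call this from the keyword that gives the initial input.
        self._start = time.monotonic()

    def has_elapsed(self, seconds):
        # Call this from the final check keyword
        # (e.g. 30 min = 1800 seconds after the input).
        if self._start is None:
            raise RuntimeError("Timer was never started")
        return time.monotonic() - self._start >= seconds
```

With this as a library, the first keyword would call Start, the intermediate keywords run normally, and Check_Timer_Result would wait until Has Elapsed 1800 returns true before asserting the outcome.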
I have an Airflow process that runs every Sunday at 12:00 am. Is there a way to trigger this process at exactly the same (absolute) time every week, regardless of the previous run's duration or outcome? I see that the start time of the process keeps creeping forward, to the point that after a couple of weeks it is now triggered a full 16 hours later than the scheduled time. How do I make it start at exactly the same time regardless of the previous run's outcome, or of whether it was previously triggered manually (cron-like behaviour)?
Add the depends_on_past argument to your DAG's default_args; a False value ensures that new DAG runs are created every interval without depending on the previous run's status:
'depends_on_past': False
It might not be necessary, but I recommend restarting your scheduler after making this change.
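A minimal sketch of how this could sit in a DAG file (the owner value, DAG name, and cron expression are placeholders; 0 0 * * 0 would match Sunday midnight):

```python
# Sketch of default_args for the DAG constructor; only
# depends_on_past matters here, the rest is illustrative.
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,  # don't wait on the previous run's status
}

# In the actual DAG file this dict would be passed along with the
# schedule, roughly like:
# dag = DAG('weekly_job', default_args=default_args,
#           schedule_interval='0 0 * * 0', catchup=False)
```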
How to run airflow dag for specified number of times?
I tried using TriggerDagRunOperator, and this operator works for me.
In the callable function we can check states and decide whether or not to continue.
However, the current count and states need to be maintained.
Using the above approach I am able to repeat DAG runs.
I would like an expert opinion: is there any more principled way to run an Airflow DAG X number of times?
Thanks.
I'm afraid that Airflow is ENTIRELY about time-based scheduling.
You can set a schedule to None and then use the API to trigger runs, but you'd be doing that externally, and thus maintaining the counts and states that determine when and why to trigger externally.
When you say that your DAG may have 5 tasks which you want to run 10 times and a run takes 2 hours and you cannot schedule it based on time, this is confusing. We have no idea what the significance of 2 hours is to you, or why it must be 10 runs, nor why you cannot schedule it to run those 5 tasks once a day. With a simple daily schedule it would run once a day at approximately the same time, and it won't matter that it takes a little longer than 2 hours on any given day. Right?
You could set the start_date to 11 days ago (a fixed date, though; don't set it dynamically) and the end_date to today (also fixed), then add a daily schedule_interval and a max_active_runs of 1. You'll get exactly 10 runs, executed back to back without overlapping, with the execution_date advancing accordingly, and then it will stop. Or you could just use airflow backfill with a None-scheduled DAG and a range of execution datetimes.
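As a sketch, the DAG-level settings described above might look like this (the dates are illustrative fixed placeholders; in a real DAG file they would be passed to the DAG constructor rather than held in a dict):

```python
from datetime import datetime

# Sketch of the settings for a fixed, bounded run window.
dag_settings = {
    'start_date': datetime(2019, 1, 1),   # fixed date, 11 days before end_date
    'end_date': datetime(2019, 1, 12),    # also fixed, never dynamic
    'schedule_interval': '@daily',
    'max_active_runs': 1,                 # runs go back to back, never overlap
}
```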
Do you mean that you want it to run every 2 hours continuously, but sometimes it runs longer and you don't want overlapping runs? Well, you can definitely schedule it to run every 2 hours (0 */2 * * *) and set max_active_runs to 1, so that if the prior run hasn't finished, the next run waits and then kicks off when the prior one has completed. See the last bullet in https://airflow.apache.org/faq.html#why-isn-t-my-task-getting-scheduled.
If you want your DAG to run exactly every 2 hours on the dot [give or take some scheduler lag, yes that's a thing] and to leave the prior run going, that's mostly the default behavior, but you could add depends_on_past to some of the important tasks that themselves shouldn't be run concurrently (like creating, inserting to, or dropping a temp table), or use a pool with a single slot.
There isn't any feature to kill the prior run if your next schedule is ready to start. It might be possible to skip the current run if the prior one hasn't completed yet, but I forget how that's done exactly.
That's basically most of your options there. Also, you could create manual dag_runs for an unscheduled DAG, creating 10 at a time whenever you feel like it (using the UI or CLI instead of the API, though the API might be easier).
Do any of these suggestions address your concerns? Because it's not clear why you want a fixed number of runs, how frequently, or with what schedule and conditions, it's difficult to provide specific recommendations.
This functionality isn't natively supported by Airflow, but by exploiting the meta-db we can cook up this functionality ourselves:
We can write a custom operator / Python operator that,
before running the actual computation, checks whether 'n' runs of the task already exist in the meta-db (TaskInstance table; refer to task_command.py for guidance),
and if they do, simply skips the task (raise AirflowSkipException).
This excellent article can be used for inspiration: Use apache airflow to run task exactly once
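The skip decision itself reduces to a small pure function; the surrounding operator (not shown) would obtain run_count by counting this task's rows in the TaskInstance table. The function name and signature here are my own:

```python
def should_skip(run_count, max_runs):
    """Return True when the task already has max_runs recorded runs.

    In the custom operator, run_count would come from the meta-db's
    TaskInstance table; on True, the operator would raise
    AirflowSkipException instead of doing the real work."""
    return run_count >= max_runs
```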
Note
The downside of this approach is that it assumes historical runs of the task (TaskInstances) will be preserved forever (and correctly).
In practice, though, I've often found task_instances to be missing (we have catchup set to False).
Furthermore, on large Airflow deployments one might need to set up routine cleanup of the meta-db, which would make this approach impossible.
I have a box with 10 jobs. The box is scheduled to run every day, once an hour, say at 9, 10, 11, and so on. There are no conditions on any of the jobs within the box, nor on the box itself.
When the last job fails, say at 9:30, the box becomes failed.
When one of the middle jobs fails, say at 9:30, the box remains in the running state.
Now when the next run time comes, i.e. 10:
In the case of a last-job failure, the box restarts at 10.
In the case of a mid-job failure, the box doesn't restart.
In the case of a last-job failure, I want the box not to restart at 10, because we want the whole box to complete, or to wait for someone to fix the last job.
How can I do that? Is there a way to put a condition on the box so that it starts only when the last run of the box was a success? Would the condition success(box_name) be appropriate? Please help, and let me know if the issue is not clear.
In that case, create an 11th job in the main box with job_type: b and sleep_interval: 60,
and change the success condition of the main box to success(job11).
Make job11 dependent on job10, so that job11 runs after success(job10).
If job10 fails, fix the failure and then trigger job11 so the main box can succeed.
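A rough JIL sketch of this setup follows (a configuration sketch, not tested; MAIN_BOX is a placeholder for your box name, and the job_type and sleep_interval values are taken from the answer above, so double-check them against your Autosys version):

```
/* Hedged sketch: 11th job gating the box's success */
insert_job: job11
box_name: MAIN_BOX
job_type: b
sleep_interval: 60
condition: success(job10)

update_job: MAIN_BOX
box_success: success(job11)
```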
I'm working with a fairly large dataset, and any time I perform an operation, if I forget to include the semicolon at the end of the statement it takes several minutes, because all the data is printed to the console window as it goes. How do I halt execution of the current statement?
I've tried using Ctrl-C as in MATLAB, as well as the Abort and Interrupt options in the Control menu, none of which seem to work. Is this a bug, or am I missing something? I'm running Windows 8 64-bit, in case that helps.
To halt execution, press Ctrl-C in the Scilab console window. This will display:
Type 'resume' or 'abort' to return to standard level prompt.
-1->
Entering abort abandons the computation completely; entering resume resumes it.
I did not check it, but this tutorial gives the Esc key as an option:
"To abort a command midway without executing it, press the Esc key on the keyboard."