Schedule Java batch job with interval from last end date - ejb

I wrote my job using JSR-352 and deployed it on WildFly. How can I schedule a job so that each execution starts some delay after the previous one ends, as in the timeline below, where = is execution time and - is delay time:
===============--=====--========--
Note: at most one execution of the job may run at any time.

The JBeret EJB scheduler supports repeating job executions, either at a fixed interval or with a certain delay after the start of the previous job execution. A delay after the end of a job execution is currently not supported. If your job execution duration is relatively predictable, you can approximate this with either an interval or a delay after the start of a job execution.
To achieve this kind of job scheduling with finer control, you can try the following:
Schedule a single-action job schedule.
Configure a job listener in job.xml to watch for the end of the above job execution, and schedule the next single-action job execution with a short initial delay.
Specifically, the job listener's afterJob() method should be able to look up or inject TimerSchedulerBean, which is a local singleton EJB, and invoke its org.jberet.schedule.TimerSchedulerBean#schedule method. The job listener is responsible for creating an instance of org.jberet.schedule.JobScheduleConfig and passing it when calling the EJB business method. The job listener should already have all the information needed to create the JobScheduleConfig.
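The effect of rescheduling from the listener can be illustrated with a small, language-neutral sketch. It is written in Python purely as an illustration and does not use any JBeret API; the function name fixed_delay_starts and the sample durations are hypothetical. It computes when each run would start if the next run is always scheduled a fixed delay after the previous one ends, which reproduces the timeline in the question (15, 5 and 8 units of execution, each followed by 2 units of delay).

```python
from datetime import datetime, timedelta


def fixed_delay_starts(first_start, durations, delay):
    """Return (start, end) pairs where each run starts `delay` after the previous run ends."""
    runs = []
    start = first_start
    for duration in durations:
        end = start + duration
        runs.append((start, end))
        # This mirrors what the job listener's afterJob() would do:
        # schedule the next single-action run `delay` after this run's end.
        start = end + delay
    return runs


# Hypothetical durations matching the timeline in the question.
runs = fixed_delay_starts(
    datetime(2024, 1, 1, 0, 0),
    [timedelta(minutes=15), timedelta(minutes=5), timedelta(minutes=8)],
    timedelta(minutes=2),
)
for start, end in runs:
    print(start.time(), "->", end.time())
```

Because the next start is derived from the actual end time, a long-running execution simply pushes later runs back, and two executions never overlap.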

Related

How to set the execution date same as the trigger time?

I'm just learning Apache Airflow. I understand that the execution date is not the same as the actual time a DAG run is triggered.
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
Let's repeat that: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
So, for a daily job, cron jobs run at the start of the day, while Airflow jobs run at the end of the day.
Is there any way to set the execution date to be the same as the trigger time?
You generally structure your tasks so that a date is passed to the job via kwargs (for idempotency, etc.).
Airflow provides macros (https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html) that expose both data_interval_start and data_interval_end.
I believe you're looking for data_interval_end, which for a scheduled run is the end of the period it covers and therefore roughly the time at which the run is actually triggered.
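The relationship between these values can be sketched with plain datetime arithmetic. This is only an illustration of the interval logic described above, not Airflow code; the function name daily_run_times is hypothetical, and it assumes a simple fixed schedule interval where the logical date equals the start of the data interval.

```python
from datetime import datetime, timedelta


def daily_run_times(logical_date, schedule_interval=timedelta(days=1)):
    """Model the data interval of a scheduled run with a fixed interval."""
    return {
        "logical_date": logical_date,
        # Assumption: the logical date marks the start of the covered period.
        "data_interval_start": logical_date,
        # The run is triggered shortly after the covered period has ended.
        "data_interval_end": logical_date + schedule_interval,
    }


info = daily_run_times(datetime(2016, 1, 1))
print(info["data_interval_end"])
```

For the run stamped 2016-01-01, the end of the interval is 2016-01-02 00:00, which matches the quoted statement that the run is triggered soon after 2016-01-01T23:59.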

Airflow: Trigger DAG to run even when cross-dependent DAG is not finished yet

I am in the process of setting up cross-dependent DAGs using the Airflow documentation. I have a particular use case where my DAG B requires that DAG A runs first. However, if DAG A is delayed long enough, DAG B should still run. So I'm essentially looking for a way to wire an OR operation between two sensors.
Say DAG B needs to run daily by 5PM then this is how I would do it in code:
    while True:
        current_time = get_current_time()
        # Run DAG B once DAG A has completed or the 5 PM cutoff has passed.
        if dag_a_completed() or current_time > five_pm:
            run_dag_b()
            break
This is much simpler to do in code; however, I'm not seeing how it is done with Airflow.
Interesting problem. Here's how I think it can be accomplished:
You need an ExternalTaskSensor at the beginning of your DAG-B, as described in the Cross-DAG Dependencies guide, to hold off the execution of DAG-B until DAG-A completes.
Here you must also set the timeout param so that the sensor fails after a certain maximum time.
Then, in the first actual task of DAG-B (the one that comes immediately after the ExternalTaskSensor), set trigger_rule=TriggerRule.ALL_DONE to ensure that the actual processing of your DAG-B starts irrespective of whether DAG-A completes within the stipulated time or not. In other words:
execution of DAG-B will be held off until DAG-A completes, but only for a maximum delta duration.
If DAG-A completes within this duration, then DAG-B will begin executing immediately after that.
But if DAG-A fails to complete within this duration, DAG-B will begin executing anyway once this duration has passed.
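The resulting control flow can be sketched in plain Python. This is not Airflow code and uses no Airflow API; wait_for_upstream and run_dag_b are hypothetical names that only model what the sensor with a timeout and the ALL_DONE trigger rule achieve together: wait for DAG-A up to a maximum duration, then proceed either way.

```python
import time


def wait_for_upstream(is_done, timeout, poke_interval=0.01):
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    Returns True if the upstream finished in time, False on timeout
    (the analogue of the sensor task failing).
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


def run_dag_b(is_dag_a_done, timeout):
    upstream_ok = wait_for_upstream(is_dag_a_done, timeout)
    # Analogue of ALL_DONE: the first real task runs whether the
    # sensor succeeded or timed out.
    return "ran after DAG A" if upstream_ok else "ran after timeout"


print(run_dag_b(lambda: True, timeout=0.05))
print(run_dag_b(lambda: False, timeout=0.05))
```

The timeout plays the role of the 5 PM cutoff from the question: it bounds how long DAG-B is willing to wait before it starts regardless.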

Airflow - Specify time of the day for execution timeout parameter

My DAG is scheduled to run daily at 7 AM. Can I specify a time of day for the execution timeout parameter instead of a duration?
For example, I want to specify 12 PM so that the job fails if it is still running at 12 PM.
No such param is present in BaseOperator or DAG.
You'll have to build it. Here are some hints on how you could go about it (I'm not certain this would work):
Write a custom sensor by subclassing TimeSensor (not to be confused with TimeDeltaSensor) that kills the DAG upon failure.
You'll have to override the execute() method.
For killing the DAG run, you can look into the _mark_dagrun_state_as_failed() method.
Add that custom sensor task, configured with the desired timeout datetime, as one of the starting tasks (tasks that don't have an upstream task) of your DAG.
If you need to time out specific task(s) instead of the entire DAG:
you can write another custom time sensor that marks a specific task as failed upon timing out.
You can use the _mark_task_instance_state() method for it.
You can wire up this custom time sensor in parallel with that task (so that both the task and its sensor launch together).
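The check at the heart of such a custom sensor is a comparison against a wall-clock deadline rather than a duration. A minimal sketch of that comparison in plain Python follows. It is not Airflow code; the helper names deadline_for, is_past_deadline and time_left are hypothetical, and it assumes naive datetimes in a single time zone with the deadline falling on the same calendar day as the run's start.

```python
from datetime import datetime, time, timedelta


def deadline_for(run_start, deadline_time):
    """Wall-clock deadline on the same calendar day as the run's start."""
    return datetime.combine(run_start.date(), deadline_time)


def is_past_deadline(now, run_start, deadline_time):
    """True once the clock reaches the deadline; a sensor would fail the run here."""
    return now >= deadline_for(run_start, deadline_time)


def time_left(now, run_start, deadline_time):
    """Remaining duration until the deadline, never negative."""
    return max(deadline_for(run_start, deadline_time) - now, timedelta(0))


run_start = datetime(2024, 1, 1, 7, 0)  # DAG starts at 7 AM
noon = time(12, 0)                      # job must not run past 12 PM
print(is_past_deadline(datetime(2024, 1, 1, 11, 59), run_start, noon))
print(time_left(run_start, run_start, noon))
```

The same arithmetic also shows that a fixed 7 AM start with a 12 PM deadline is equivalent to a five-hour duration, which only holds as long as the run really starts on schedule.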

Azkaban: Failure due to too many blocked jobs in pipeline mode

Azkaban's pipeline mode execution results in failed jobs after a sufficient number of jobs have accumulated in the backlog. This happens if executions of a flow take more time than its scheduling frequency.
Is there a way to make Azkaban launch a new instance only when the previous one has ended, while at the same time not skipping any instance? We cannot skip instances because the time param passed to each one helps in selecting the duration of data that has to be processed.

Oozie job taking longer than scheduled interval

I am scheduling an Oozie MapReduce job to run every 15 minutes. I wonder what would happen if each job takes longer than that interval. Will it result in a job backlog? Or will Oozie create a new task / thread / fork for the new job while the previous one is still running?
Oozie won't run the next job before the previous one is over. If the first job takes more than 15 minutes to execute, the next one will run later than its scheduled time. So the scheduled time and the actual running time may differ in Oozie.
EDIT:
The behaviour described above is only the default and can be changed. You can set the concurrency property in the controls block to more than 1, and the next job will then run even if the first one is still running. See my answer to a similar question.
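The default behaviour can be illustrated with a simplified model in plain Python. It is not Oozie's actual scheduler and ignores settings such as timeouts; the function name serial_start_times and the sample durations are hypothetical. With one job allowed at a time, each job starts at its scheduled (nominal) time or when the previous job ends, whichever is later.

```python
from datetime import datetime, timedelta


def serial_start_times(first_nominal, frequency, durations):
    """Return (nominal, actual_start) pairs when only one job may run at a time."""
    results = []
    previous_end = None
    for i, duration in enumerate(durations):
        nominal = first_nominal + i * frequency
        # A job cannot start before its nominal time or before the previous job ends.
        start = nominal if previous_end is None else max(nominal, previous_end)
        previous_end = start + duration
        results.append((nominal, start))
    return results


# Hypothetical example: jobs every 15 minutes, the first two take 20 minutes each.
schedule = serial_start_times(
    datetime(2024, 1, 1, 0, 0),
    timedelta(minutes=15),
    [timedelta(minutes=20), timedelta(minutes=20), timedelta(minutes=5)],
)
for nominal, start in schedule:
    print("nominal", nominal.time(), "started", start.time())
```

In this model the delay accumulates: the second job starts 5 minutes late and the third 10 minutes late, so a job that consistently overruns the interval produces a growing backlog.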
