Restarting failed sub-jobs - autosys

I have a set of 50 sub-jobs under a single box in Autosys. A few of them failed. I want to restart only the failed sub-jobs after all jobs have been triggered at least once.
How can I use n_retrys in this scenario?

n_retrys takes effect only when a job fails: it tells AutoSys to retry the failed job, based on the configured retry count and the configured wait in seconds between restarts.

Related

Quartz.NET Manually invoked recurring job

I have an ASP.NET Core application and I'm using Quartz.NET for a recurring job that runs my TestMethod() every 60 seconds, and it works fine.
I sometimes need to run this job manually. How can I force Quartz.NET to make the next recurring call 60 seconds after the manual run?
This is what I need to accomplish:
00:01:00 -> Automatic run of TestMethod();
00:02:00 -> Automatic run of TestMethod();
00:02:10 -> Manual run of TestMethod();
00:03:10 -> Automatic run of TestMethod(); (Note: 60 seconds after last run)
...
Is this perhaps possible with the Hangfire library?
I don't think you can do that with Quartz or Hangfire. Schedule-based jobs are usually not affected by manual runs. Note also that schedule-based jobs do not take execution time into account, which can lead to a situation where one job is still in progress (from the previous run) while another is started by the schedule.
Instead of using simple recurring jobs, you can use the following pattern:
- Schedule a single execution of your job.
- Re-schedule the job at the end of each execution.
This avoids the situation described above and also lets you schedule the next run relative to a manual trigger.
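The pattern above can be sketched independently of any scheduling library. The following is a minimal, language-neutral simulation in Python, not Quartz.NET or Hangfire code: the class SelfReschedulingJob, its methods, and the simulate helper are hypothetical names introduced for illustration, time is a simulated clock in seconds, and the work itself is assumed to take no time.

```python
from typing import Callable, List, Optional, Tuple

INTERVAL = 60  # seconds between the end of one run and the next scheduled run


class SelfReschedulingJob:
    """Hypothetical one-shot scheduler: every run schedules exactly one follow-up."""

    def __init__(self, work: Callable[[int], None], interval: int = INTERVAL) -> None:
        self.work = work
        self.interval = interval
        self.next_due: Optional[int] = None  # the single pending scheduled run
        self.log: List[Tuple[int, str]] = []

    def schedule_first(self, at: int) -> None:
        self.next_due = at

    def _run(self, now: int, kind: str) -> None:
        self.work(now)
        self.log.append((now, kind))
        # Re-schedule at the end of the execution; this replaces any pending run.
        self.next_due = now + self.interval

    def tick(self, now: int) -> None:
        """Called by the simulated clock; runs the job if it is due."""
        if self.next_due is not None and now >= self.next_due:
            self._run(now, "automatic")

    def run_manually(self, now: int) -> None:
        """A manual run also resets the countdown to the next automatic run."""
        self._run(now, "manual")


def simulate(manual_at: int, end: int, step: int = 10) -> List[Tuple[int, str]]:
    job = SelfReschedulingJob(work=lambda now: None)
    job.schedule_first(at=60)
    for now in range(0, end + 1, step):
        if now == manual_at:
            job.run_manually(now)
        else:
            job.tick(now)
    return job.log


if __name__ == "__main__":
    # Mirrors the timeline in the question: runs at 60 s, 120 s, manual at 130 s, then 190 s.
    print(simulate(manual_at=130, end=200))
```

Because only one pending run ever exists, a long execution cannot overlap with the next scheduled one, and a manual run simply moves the pending run to 60 seconds after itself.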

task must be cleared before being run

I have a task that's scheduled to run hourly; however, it's not being triggered. When I look at the Task Instance Details it says:
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
- This task instance already ran and had its state changed manually (e.g. cleared in the UI)
If this task instance does not start soon please contact your Airflow administrator for assistance.
If I clear the task in the UI I am able to execute it through terminal but it does not run when scheduled.
Why do I have to manually clear it after every run?

Autosys Can a job run multiple instances at the same time

I am trying to understand how Autosys jobs work. Suppose I have Job A, which runs every 15 minutes. If for some reason Job A takes more than 15 minutes, will another instance of it start, or will it wait for the running job to finish before starting another instance?
In my experience, if the previous run is still in progress, another instance will not start when the next scheduled time comes. The job next runs once the previous run has finished and the next scheduled time arrives.
Another user also experienced this according to this answer.
I did not find any AutoSys documentation that officially confirms what happens in this situation, but I guess the best way to find out is to test it on your AutoSys instance.
I have experienced this first-hand and can confirm that there won't be two instances in this scenario. The job waits for the previous run to complete and immediately kicks off the next instance if the time condition was met before the previous run completed.
This is the case only when the job is in the running state; if the job is in any other state, it kicks off based on the given start_time condition.

How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution

I am working on an ASP.NET MVC 5 web application, and I am facing a problem using Hangfire to run long-running background jobs. The problem is that if a job's execution exceeds 30 minutes, Hangfire automatically initiates another job, so I end up with two identical jobs running at the same time.
My setup is as follows:
- ASP.NET MVC 5
- IIS 8
- Hangfire 1.4.6
- Windows Server 2012
I have defined a Hangfire recurring job to run at 17:00 each day. The background job mainly scans our network for servers and VMs and updates the DB, and the recurring job sends an email after completing its execution.
The recurring job used to work well when its execution took less than 30 minutes. But today, as our system has grown, the recurring job completed after 40 minutes instead of the usual 22-25 minutes, and I received two emails instead of one (about 30 minutes apart). I then re-ran the job manually and noted that the problem is as follows:
"When the recurring job reaches 30 minutes of continuous execution, a new instance of the recurring job starts, so I have two instances running at the same time instead of one, which is why I received two emails."
If the recurring job takes less than 30 minutes (for example, 29 minutes), I do not face any problem, but if its execution exceeds 30 minutes then, for one reason or another, Hangfire initiates a new job.
When I access the Hangfire dashboard during the execution of the job, I can see only one active job, yet when I monitor our DB with SQL Profiler I can see two jobs accessing the DB. This happens 30 minutes after the start of the recurring job (at 17:30 in our case), which is why I received two emails: two recurring jobs were running in the background instead of one.
Can anyone advise on how I can stop Hangfire from automatically initiating a new recurring job when the current job's execution exceeds 30 minutes?
Thanks
Did you look at InvisibilityTimeout setting from the Hangfire docs?
Default SQL Server job storage implementation uses a regular table as
a job queue. To be sure that a job will not be lost in case of
unexpected process termination, it is deleted only from a queue only
upon a successful completion.
To make it invisible from other workers, the UPDATE statement with
OUTPUT clause is used to fetch a queued job and update the FetchedAt
value (that signals for other workers that it was fetched) in an
atomic way. Other workers see the fetched timestamp and ignore a job.
But to handle the process termination, they will ignore a job only
during a specified amount of time (defaults to 30 minutes).
Although this mechanism ensures that every job will be processed,
sometimes it may cause either long retry latency or lead to multiple
job execution. Consider the following scenario:
- Worker A fetched a job (runs for a hour) and started it at 12:00.
- Worker B fetched the same job at 12:30, because the default invisibility timeout was expired.
- Worker C (did not fetch) the same job at 13:00, because (it will be deleted after successful performance.)
If you are using cancellation tokens, it will be set for Worker A at
12:30, and at 13:00 for Worker B. This may lead to the fact that your
long-running job will never be executed. If you aren’t using
cancellation tokens, it will be concurrently executed by Worker A and
Worker B (since 12:30), but Worker C will not fetch it, because it
will be deleted after successful performance.
So, if you have long-running jobs, it is better to configure the
invisibility timeout interval:
var options = new SqlServerStorageOptions
{
    InvisibilityTimeout = TimeSpan.FromMinutes(30) // default value
};
GlobalConfiguration.Configuration.UseSqlServerStorage("<name or connection string>", options);
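To make the Worker A / B / C scenario from the quoted documentation concrete, here is a small simulation in Python. It is only a sketch of the mechanism as described in the quote, not Hangfire's actual implementation: QueuedJob, try_fetch, and simulate are hypothetical names, time is counted in minutes from 12:00, and the job is assumed to be deleted from the queue as soon as its first run finishes.

```python
from dataclasses import dataclass
from typing import List, Optional

INVISIBILITY_TIMEOUT = 30  # minutes; mirrors the default mentioned in the quote


@dataclass
class QueuedJob:
    fetched_at: Optional[int] = None  # minute at which a worker last fetched the job
    deleted: bool = False             # set once a run has completed successfully


def try_fetch(job: QueuedJob, now: int, timeout: int = INVISIBILITY_TIMEOUT) -> bool:
    """Return True if a worker polling at minute `now` may take the job."""
    if job.deleted:
        return False
    if job.fetched_at is not None and now - job.fetched_at < timeout:
        return False  # still invisible to other workers
    job.fetched_at = now  # timeout expired (or never fetched): take it
    return True


def simulate(duration: int, poll_times: List[int],
             timeout: int = INVISIBILITY_TIMEOUT) -> List[int]:
    """Return the minutes at which some worker started executing the job."""
    job = QueuedJob()
    starts: List[int] = []
    for now in sorted(poll_times):
        # Assumption: the job leaves the queue once its first run has finished.
        if starts and now >= starts[0] + duration:
            job.deleted = True
        if try_fetch(job, now, timeout):
            starts.append(now)
    return starts


if __name__ == "__main__":
    # A one-hour job, workers polling at 12:00, 12:30 and 13:00.
    print(simulate(duration=60, poll_times=[0, 30, 60]))               # [0, 30]
    print(simulate(duration=60, poll_times=[0, 30, 60], timeout=120))  # [0]
```

With the 30-minute default the job is started twice (at 12:00 and 12:30); with a timeout longer than the job's duration it is started only once, which is the effect of raising the invisibility timeout.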
As of Hangfire 1.5 this option is now Obsolete. Jobs that are being worked on are invisible to other workers.
Say goodbye to confusing invisibility timeout with unexpected
background job retries after 30 minutes (by default) when using SQL
Server. New Hangfire.SqlServer implementation uses plain old
transactions to fetch background jobs and hide them from other
workers.
Even after ungraceful shutdown, the job will be available for other
workers instantly, without any delays.
I had trouble finding documentation on how to do this properly for a PostgreSQL database; every example I saw used SQL Server. I found that the invisibility timeout is a property of the PostgreSqlStorageOptions object, here: https://github.com/frankhommers/Hangfire.PostgreSql/blob/master/src/Hangfire.PostgreSql/PostgreSqlStorageOptions.cs#L36. Through trial and error I figured out that UsePostgreSqlStorage has an overload that accepts this object. For .NET Core 2.0, when you set up the Hangfire PostgreSQL DB in the ConfigureServices method of the Startup class, add this (the default timeout is 30 minutes):
services.AddHangfire(config =>
    config.UsePostgreSqlStorage(
        Configuration.GetConnectionString("Hangfire1ConnectionString"),
        new PostgreSqlStorageOptions
        {
            InvisibilityTimeout = TimeSpan.FromMinutes(720)
        }));
I had this problem when using Hangfire.MemoryStorage as the storage provider. With memory storage you need to set FetchNextJobTimeout in MemoryStorageOptions; otherwise, by default, jobs time out after 30 minutes and a new job is executed.
var options = new MemoryStorageOptions
{
    FetchNextJobTimeout = TimeSpan.FromDays(1)
};
GlobalConfiguration.Configuration.UseMemoryStorage(options);
I would just like to point out that, even though the following is stated:
As of Hangfire 1.5 this option is now Obsolete. Jobs that are being worked on are invisible to other workers.
Say goodbye to confusing invisibility timeout with unexpected background job retries after 30 minutes (by default) when using SQL Server. New Hangfire.SqlServer implementation uses plain old transactions to fetch background jobs and hide them from other workers.
Even after ungraceful shutdown, the job will be available for other workers instantly, without any delays.
It seems that for many people using MySQL, PostgreSQL, MongoDB, InvisibilityTimeout is still the way to go: https://github.com/HangfireIO/Hangfire/issues/1197

how to create a wait job in informatica

My requirement is to create a job in Informatica that runs every 15 minutes and checks a status column in the abc table. If the status is “Approved”, it exits and kicks off the rest of the jobs.
If the status is not approved, it does nothing and runs again after 15 minutes. This process continues until we have an approved status.
So, whichever of the two scenarios occurs, this process runs every 15 minutes.
I have implemented the same requirement in Unix using loops and conditional statements, but I am not sure how this can be achieved in Informatica. Could you please help me with this?
Regards,
Karthik
I would try adding a scheduler that runs every 15 minutes. The best way I've found to "loop" sessions in Informatica is:
- Run the session once and check whether it failed using conditional links.
- If it did fail, run a timer task for some amount of time (a minute, an hour, whatever).
- Then try to run the same session again by copying and pasting the session up ahead of the timer task, and repeat a few times as necessary.
If you added a scheduler into the mix, you could set the scheduler to run the workflow every 15 minutes and have the timer tasks halt the workflow for 4 or 5 minutes each. You could then use the SESSSTARTTIME function in a pre- or post-session task to determine when the scheduler will fire again and simply abort the workflow before that time.
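For comparison, the loop-and-condition logic that the question describes (check the status, stop when it is "Approved", otherwise wait 15 minutes and check again) can be sketched outside Informatica. This is a plain Python illustration, not Informatica configuration: wait_for_approval, get_status, and max_checks are hypothetical names, and how the status is actually read from the abc table is left to the caller.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 15 * 60  # check every 15 minutes, as in the question


def wait_for_approval(
    get_status: Callable[[], str],
    interval_seconds: int = POLL_INTERVAL_SECONDS,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Approved".

    Returns True once approved, or False if max_checks is given and is
    exhausted first. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    checks = 0
    while True:
        checks += 1
        if get_status() == "Approved":
            return True  # the caller can now kick off the downstream jobs
        if max_checks is not None and checks >= max_checks:
            return False
        sleep(interval_seconds)


if __name__ == "__main__":
    # Stand-in for a query against the status column; approved on the third check.
    statuses = iter(["Pending", "Pending", "Approved"])
    waited = []
    approved = wait_for_approval(lambda: next(statuses), sleep=waited.append)
    print(approved, waited)  # True [900, 900]
```

The optional max_checks bound plays a role similar to aborting the workflow before the scheduler fires again: it keeps one polling run from overlapping with the next.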
