Spring @Scheduler Overlap - spring-scheduled

I have two methods that run periodically with the Spring scheduler: one is annotated with @Scheduled(fixedRate=300000) to run every 5 minutes, and the other with @Scheduled(cron="0 0 2 * * ?") to run daily at 2 am. We are not using a taskScheduler with a thread pool, so only one thread is used and the jobs never overlap.
What I observed is that when the 5-minute job takes a long time (say more than 30 minutes), it prevents the other @Scheduled job from running. Suppose the 5-minute job started at 1:45 am and took 45 minutes to process; the cron job that was supposed to start at 2 am couldn't start because the thread was busy with the 5-minute job. Is there a setting that makes the 2 am job start as soon as the thread is released? Please help us understand the behavior of threads in such cases.

By default, the Spring scheduler uses a single thread for job execution, so a long-running job blocks every other scheduled job until it finishes. You can configure a scheduler with more than one thread and a queue, so that even if all threads are busy, a job is added to the queue and gets picked up as soon as a thread is free.
The following example creates a simple ScheduledThreadPool with 5 threads.
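import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.SchedulingConfigurer;
import org.springframework.scheduling.config.ScheduledTaskRegistrar;

// Named SchedulingConfiguration so the class does not clash with the @Configuration annotation.
@EnableScheduling
@Configuration
public class SchedulingConfiguration implements SchedulingConfigurer {
    @Override
    public void configureTasks(ScheduledTaskRegistrar scheduledTaskRegistrar) {
        scheduledTaskRegistrar.setScheduler(taskExecutor());
    }

    // Pool of 5 threads shared by all @Scheduled methods; shut down with the context.
    @Bean(destroyMethod="shutdown")
    public Executor taskExecutor() {
        return Executors.newScheduledThreadPool(5);
    }
}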
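To see why the pool size matters, here is a minimal sketch that uses only the JDK's ScheduledExecutorService (the same kind of executor the configuration above hands to Spring), not Spring itself. The class name SchedulerOverlapDemo, the method measureLateness, and the millisecond values are made up for illustration; the 400 ms "long job" stands in for the 45-minute run and the job due after 100 ms stands in for the 2 am cron.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerOverlapDemo {

    // Starts a long job immediately and a short job due after shortJobDelayMillis,
    // then returns roughly how many milliseconds late the short job started.
    static long measureLateness(int poolSize, long longJobMillis, long shortJobDelayMillis)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch shortJobStarted = new CountDownLatch(1);
        long[] startedAt = new long[1];
        long begin = System.nanoTime();
        try {
            // Long job: occupies one thread for longJobMillis.
            scheduler.schedule(() -> sleepQuietly(longJobMillis), 0, TimeUnit.MILLISECONDS);
            // Short job: becomes due while the long job is still running.
            scheduler.schedule(() -> {
                startedAt[0] = System.nanoTime();
                shortJobStarted.countDown();
            }, shortJobDelayMillis, TimeUnit.MILLISECONDS);
            shortJobStarted.await();
        } finally {
            scheduler.shutdownNow();
        }
        long actualDelay = TimeUnit.NANOSECONDS.toMillis(startedAt[0] - begin);
        return actualDelay - shortJobDelayMillis;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag; shutdownNow() interrupts a still-running long job.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:  short job late by ~" + measureLateness(1, 400, 100) + " ms");
        System.out.println("2 threads: short job late by ~" + measureLateness(2, 400, 100) + " ms");
    }
}
```

With a single thread, the short job should only start once the long job has released the thread; with two threads it should start close to its due time. Note that a larger pool also means two scheduled methods can now run at the same time, so any state they share needs to be safe for concurrent access.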

Related

JobRunr - Trying to run multiple recurring jobs with Spring Boot

I am using JobRunr to run my background jobs, and I let users set up recurring jobs through an endpoint like the one below:
@PostMapping("/schedule-recurring")
public String scheduleRecurring(@RequestBody ExecutionJob executionJob) {
    return BackgroundJob.scheduleRecurrently(executionJob.getId(), executionJob.getCronExpression(),
            () -> jobService.executeSomeJob(executionJob, JobContext.Null));
}
These jobs could run for 5 or 10 minutes, or they might sometimes take up to 4 hours, depending on how many records there are to process. Right now I have only one background job server, since I am building a POC. In the future, however, we plan to scale to one instance per customer, each with one or more background job servers depending on the client license.
My issue is that I have two recurring jobs scheduled 1 hour apart. The first recurring job runs for more than 1 hour, and the second job is never executed: it is not triggered because no Background Job Worker is available to pick it up. I am thinking of adding a check so that the job triggers itself only if a Background Job Worker is available. But is there a better approach, where the schedule-recurring method itself queues the job if no Background Job Worker is available?
Thanks in advance.

Shutdown kafka consumer after processing messages

I am using @KafkaListener(topics = "${topic}") to consume messages from a topic in a Spring Boot application, and I need this to run periodically. The spring-kafka version is 2.2.4.RELEASE.
One way to achieve this could have been batching every 6 hours using fetch.max.wait.ms, but 6 hours seems too long for this configuration.
Hence, I am looking for a way to shut down the application after processing and restart it every 6 hours.
Another way is something like the code below, but it does not guarantee that the application has finished processing within the sleep time (30 seconds in the example below).
public class Application {
    public static void main(String[] args) throws InterruptedException {
        ConfigurableApplicationContext run = SpringApplication.run(Application.class, args);
        Thread.sleep(30000);
        run.close();
    }
}
What is a graceful way to shut down the consumer, so that shutdown happens only after it has processed the batch of messages?
See this answer.
Shut down the application when all of the container instances go idle.

Failure to achieve proper high concurrency for ASP.NET

I was originally trying to create an HTTP endpoint that would remain open for a long time (until a remote service executes and finishes, then return the result to the original caller), and I hit some concurrency issues: the endpoint would only execute a small number of requests concurrently (10 or so, whereas I'd expect hundreds if not more).
I then narrowed my code down to a test endpoint that merely returns after a number of milliseconds given via the URL. This method should, in theory, give maximum concurrency, but that happens neither when running under IIS on a Windows 10 desktop PC nor when running on Windows Server 2012.
This is the test Web API endpoint:
[Route("throughput/raw")]
[HttpGet]
public async Task<IHttpActionResult> TestThroughput(int delay = 0)
{
    await Task.Delay(delay);
    return Ok();
}
And this is a simple test app:
class Program
{
    static readonly HttpClient HttpClient = new HttpClient();
    static readonly ConcurrentBag<long> Stats = new ConcurrentBag<long>();
    private static Process _currentProcess;
    private static string url = "http://local.api/test/throughput/raw?delay=0";

    static void Main()
    {
        // Warm up
        var dummy = HttpClient.GetAsync(url).Result;
        Console.WriteLine("Warm up finished.");
        Thread.Sleep(500);
        // Get current process for later
        _currentProcess = Process.GetCurrentProcess();
        for (var i = 1; i <= 100; i++)
        {
            Thread t = new Thread(Proc);
            t.Start();
        }
        Console.ReadKey();
        Console.WriteLine($"Total requests: {Stats.Count}\r\nAverage time: {Stats.Average()}ms");
        Console.ReadKey();
    }

    static async void Proc()
    {
        Stopwatch sw = Stopwatch.StartNew();
        sw.Start();
        await HttpClient.GetAsync(url);
        sw.Stop();
        Stats.Add(sw.ElapsedMilliseconds);
        Console.WriteLine($"Thread finished at {sw.ElapsedMilliseconds}ms. Total threads running: {_currentProcess.Threads.Count}");
    }
}
The results I get are these:
Warm up finished.
Thread finished at 118ms. Total threads running: 32
Thread finished at 114ms. Total threads running: 32
Thread finished at 130ms. Total threads running: 32
Thread finished at 110ms. Total threads running: 32
Thread finished at 115ms. Total threads running: 32
Thread finished at 117ms. Total threads running: 32
Thread finished at 119ms. Total threads running: 32
Thread finished at 112ms. Total threads running: 32
Thread finished at 163ms. Total threads running: 32
Thread finished at 134ms. Total threads running: 32
...
...
Some more
...
...
Thread finished at 4511ms. Total threads running: 32
Thread finished at 4504ms. Total threads running: 32
Thread finished at 4500ms. Total threads running: 32
Thread finished at 4507ms. Total threads running: 32
Thread finished at 4504ms. Total threads running: 32
Thread finished at 4515ms. Total threads running: 32
Thread finished at 4502ms. Total threads running: 32
Thread finished at 4528ms. Total threads running: 32
Thread finished at 4538ms. Total threads running: 32
Thread finished at 4535ms. Total threads running: 32
So:
I'm not sure why there are only 32 threads running (I assume it's related to the number of cores on my machine, although sometimes the number is 34, and in any case I think it should be much higher).
The main issue I'm trying to tackle: the running time goes up as more calls are created, whereas I'd expect it to remain relatively constant.
What am I missing here? I'd expect an ASP.NET site (API in this case but it doesn't matter), running on a Windows Server (so no artificial concurrency limit is applied) to handle all these concurrent requests just fine and not increase the response time. I believe the response time is increased because threads are capped on the server side so subsequent HTTP calls wait for their turn. I'd also expect more than 32/34 threads running on the client (test) application.
I also tried to tweak machine.config without much success but I think that even the default should give much more throughput.
HTTP Client
The number of simultaneous HttpClient connections is limited by your ServicePointManager. If you believe this article, the default is 2. TWO!! So your requests are getting queued. You can increase the number by setting the DefaultConnectionLimit.
Threads
Edit of the OP: although factually true for thread pools, my question did not involve a usage of the thread pool. I'm leaving this here though for any future reference (with usages slightly different than the one demonstrated in the question) and with respect to the person who gave this answer.
There is a maximum number of threads in your default thread pool. The default is not preset; it depends on the amount of memory available and other factors, and is apparently 32 on your machine. See this article, which states:
Beginning with the .NET Framework 4, the default size of the thread pool for a process depends on several factors, such as the size of the virtual address space. A process can call the GetMaxThreads method to determine the number of threads.
You can, of course, change it.
John's answer addresses setting the default connection limit. Additionally, don't use blocking threads at all; that way you won't need to care about the size of the thread pool. Your tester is I/O bound, not CPU bound. Your Proc already returns immediately, so just call it without a new thread. Change its return type to Task so you can tell when its deferred portion is done.
Then Main will go something like this:
public static async Task Main() {
    await HttpClient.GetAsync(url);
    await Task.Delay(500); // Wait for warm up.
    await Task.WhenAll(Enumerable.Range(0, 100).Select(_ => Proc()));
    // Print results here.
}

How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution

I am working on an ASP.NET MVC 5 web application, and I am facing a problem using Hangfire to run long-running background jobs. The problem is that if a job's execution exceeds 30 minutes, Hangfire automatically initiates another job, so I end up with two identical jobs running at the same time.
Now I have the following:-
Asp.net mvc-5
IIS-8
Hangfire 1.4.6
Windows server 2012
I have defined a Hangfire recurring job to run at 17:00 each day. The background job mainly scans our network for servers and VMs and updates the DB, and the recurring job sends an email after it completes.
The recurring job used to work well when its execution took less than 30 minutes. But today, as our system has grown, the recurring job completed after 40 minutes instead of the 22-25 minutes it used to take, and I received two emails instead of one (around 30 minutes apart). I re-ran the job manually and noted that the problem is as follows:
"When the recurring job reaches 30 minutes of continuous execution, a new instance of the recurring job starts, so I have two instances running at the same time instead of one, which is why I received two emails."
If the recurring job takes less than 30 minutes (for example 29 minutes), I do not face any problem, but if its execution exceeds 30 minutes then, for one reason or another, Hangfire initiates a new job.
When I access the Hangfire dashboard during the execution of the job, I can see only one active job, yet when I monitor our DB with SQL Profiler I can see two jobs accessing it. This happens 30 minutes after the recurring job begins (at 17:30 in our case), which is why I received two emails: two recurring jobs were running in the background instead of one.
So can anyone advise on this, please? How can I prevent Hangfire from automatically initiating a new recurring job if the current execution exceeds 30 minutes?
Thanks
Did you look at InvisibilityTimeout setting from the Hangfire docs?
Default SQL Server job storage implementation uses a regular table as
a job queue. To be sure that a job will not be lost in case of
unexpected process termination, it is deleted only from a queue only
upon a successful completion.
To make it invisible from other workers, the UPDATE statement with
OUTPUT clause is used to fetch a queued job and update the FetchedAt
value (that signals for other workers that it was fetched) in an
atomic way. Other workers see the fetched timestamp and ignore a job.
But to handle the process termination, they will ignore a job only
during a specified amount of time (defaults to 30 minutes).
Although this mechanism ensures that every job will be processed,
sometimes it may cause either long retry latency or lead to multiple
job execution. Consider the following scenario:
Worker A fetched a job (runs for a hour) and started it at 12:00.
Worker B fetched the same job at 12:30, because the default invisibility timeout was expired.
Worker C (did not fetch) the same job at 13:00, because (it
will be deleted after successful performance.)
If you are using cancellation tokens, it will be set for Worker A at
12:30, and at 13:00 for Worker B. This may lead to the fact that your
long-running job will never be executed. If you aren’t using
cancellation tokens, it will be concurrently executed by WorkerA and
Worker B (since 12:30), but Worker C will not fetch it, because it
will be deleted after successful performance.
So, if you have long-running jobs, it is better to configure the
invisibility timeout interval:
var options = new SqlServerStorageOptions
{
    InvisibilityTimeout = TimeSpan.FromMinutes(30) // default value
};
GlobalConfiguration.Configuration.UseSqlServerStorage("<name or connection string>", options);
As of Hangfire 1.5 this option is now Obsolete. Jobs that are being worked on are invisible to other workers.
Say goodbye to confusing invisibility timeout with unexpected
background job retries after 30 minutes (by default) when using SQL
Server. New Hangfire.SqlServer implementation uses plain old
transactions to fetch background jobs and hide them from other
workers.
Even after ungraceful shutdown, the job will be available for other
workers instantly, without any delays.
I was having trouble finding documentation on how to do this properly for a PostgreSQL database; every example I saw used SQL Server. I found that the invisibility timeout is a property of the PostgreSqlStorageOptions object, here: https://github.com/frankhommers/Hangfire.PostgreSql/blob/master/src/Hangfire.PostgreSql/PostgreSqlStorageOptions.cs#L36. Luckily, through trial and error I figured out that UsePostgreSqlStorage has an overload that accepts this object. For .NET Core 2.0, when you set up the Hangfire PostgreSQL DB in the ConfigureServices method of the Startup class, add this (the default timeout is 30 minutes):
services.AddHangfire(config =>
    config.UsePostgreSqlStorage(Configuration.GetConnectionString("Hangfire1ConnectionString"), new PostgreSqlStorageOptions {
        InvisibilityTimeout = TimeSpan.FromMinutes(720)
    }));
I had this problem when using Hangfire.MemoryStorage as the storage provider. With memory storage you need to set FetchNextJobTimeout in MemoryStorageOptions; otherwise, by default, jobs time out after 30 minutes and a new job is executed.
var options = new MemoryStorageOptions
{
    FetchNextJobTimeout = TimeSpan.FromDays(1)
};
GlobalConfiguration.Configuration.UseMemoryStorage(options);
I would just like to point out that, even though the following is stated:
As of Hangfire 1.5 this option is now Obsolete. Jobs that are being worked on are invisible to other workers.
Say goodbye to confusing invisibility timeout with unexpected background job retries after 30 minutes (by default) when using SQL Server. New Hangfire.SqlServer implementation uses plain old transactions to fetch background jobs and hide them from other workers.
Even after ungraceful shutdown, the job will be available for other workers instantly, without any delays.
It seems that for many people using MySQL, PostgreSQL, MongoDB, InvisibilityTimeout is still the way to go: https://github.com/HangfireIO/Hangfire/issues/1197

how to avoid any timeout during a long running method execution

I am working on an ASP.NET MVC 5 web application deployed on IIS 8, and I have a method in my application that performs a long-running task: it scans our network for servers and VMs and updates our database with the scan results. The method might take 30-40 minutes to complete in the production environment. I am using a scheduling tool named Hangfire, which will call this method twice a day.
Here is the job definition inside the startup.cs file, which will call the method at 8:01 am and 8:01 pm:
public void Configuration(IAppBuilder app)
{
    var options = new SqlServerStorageOptions
    {
        PrepareSchemaIfNecessary = false
    };
    GlobalConfiguration.Configuration.UseSqlServerStorage("scanservice", options);
    // Cron fields: minute, hour, day of month, month, day of week (08:01 and 20:01 daily).
    RecurringJob.AddOrUpdate(() => ss.Scan(), "01 8,20 * * *");
}
and here is the method which is being called twice a day by the schedule tool:-
public void Scan()
{
    Service ss = new Service();
    ss.NetworkScan().Wait();
}
Finally the method which do the real scan is (i only provide a high level description of what the method will do):-
public async Task<ScanResult> NetworkScan()
{
    // retrieve the server info from the DB
    // loop over all servers & then execute some power shell commands to scan the network & retrieve the info for each server one by one...
    // after the shell command completed for each server, i will update the related server info inside the DB
}
I have run some tests on our test environment and everything worked well; the scan took around 25 seconds for 2 test servers. But now we are planning to move the application to production, where we have more than 120 servers to scan, so I estimate the method will take around 30-40 minutes to complete there. My question is: how can I make sure that this execution never times out and that the NetworkScan() method runs to completion?
Instead of worrying about your task timing out, perhaps you could start a new task for each server. That way each task will be very short-lived, and any exception caused by scanning a single server will not affect the others. Additionally, if your application is restarted in IIS, any scans that were not yet completed will be resumed; with all scans happening in one sequential task, this is not possible. You will likely also see the total time to scan your entire network plummet, as most of the time is probably spent waiting on remote servers.
public void Scan()
{
    Service ss = new Service();
    foreach (var server in ss.GetServers())
    {
        BackgroundJob.Enqueue<Service>(s => s.ServerScan(server));
    }
}
Now your scheduled task will simply enqueue one new task for each server.
