I am seeing continuous rebalancing in my application. The application consumes in batch mode, and these are the configuration properties that have been added:
myapp.consumer.group.id=cg-id-local
myapp.changefeed.topic=test_topic
myapp.auto.offset.reset=latest
myapp.enable.auto.commit=false
myapp.max.poll.interval.ms=300000
myapp.max.poll.records=20000
myapp.idle.time.between.polls=240000
myapp.concurrency=10
Container factory:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory(poSummaryCGID));
    factory.setConcurrency(poSummNoOfConsumers);  // 10, one consumer per partition
    factory.setBatchListener(true);               // deliver records to the listener in batches
    factory.setAckDiscarded(true);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
    factory.getContainerProperties().setIdleBetweenPolls(idleTimeBetweenPolls);
    return factory;
}
I have a few questions here:
I have set the maximum record count per poll to 20000 (with a 4-minute idle time between polls), and we have 10 partitions in the topic. Since I set the concurrency to 10, 10 consumers will be up and running, each listening to one partition. My question here is: will the record count be split across all the consumers, so that each consumer handles 2000 records per poll?
max.poll.interval.ms has been set to 5 minutes. I am sure each consumer can process its 2000 records (if my understanding above is correct) within the poll interval of 4 minutes, which is less than the max.poll.interval.ms upper bound. But I am not sure why rebalancing is happening. Are there any other configuration properties I need to set?
Help would be greatly appreciated!!
Tried with these configurations:

Attempt 1:
myapp.max.poll.interval.ms=600000
myapp.max.poll.records=2000
myapp.idle.time.between.polls=360000

Attempt 2:
myapp.max.poll.interval.ms=300000
myapp.max.poll.records=2000
myapp.idle.time.between.polls=300000

Attempt 3:
myapp.max.poll.interval.ms=300000
myapp.max.poll.records=2000
myapp.idle.time.between.polls=180000
EDIT / FIX:
We should always ensure that
myapp.max.poll.interval.ms > (myapp.idle.time.between.polls + the time needed to process myapp.max.poll.records).
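With the original settings, that explains the rebalancing: an idle time of 240000 ms leaves only 300000 - 240000 = 60000 ms (one minute) of processing budget per poll, so any batch that takes longer than a minute to process makes the broker consider the consumer dead and triggers a rebalance.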
No. max.poll.records is per consumer, not per topic or container.
If you have concurrency=10 and 10 partitions, you should reduce max.poll.records to 2000 so that each consumer gets a maximum of 2000 records per poll.
The container will automatically reduce the idle between polls so that max.poll.interval.ms won't be exceeded, but you should be conservative with these properties (max.poll.records and max.poll.interval.ms), such that it will never be possible to exceed the interval.
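For reference, a minimal sketch of what the consumerFactory referenced above might look like with the corrected values (the bootstrap servers value is illustrative, not from the original post):
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

public ConsumerFactory<String, String> consumerFactory(String groupId) {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 2000); // per consumer, as explained above
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
}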
I’ve inherited a system that uses Hangfire with SQL Server job storage. Usually when a job is scheduled to run immediately, we notice it takes a few seconds before it’s triggered.
Looking at SQL Profiler when running in my dev environment, the SQL run against the Hangfire DB looks like this:
exec sp_executesql N'delete top (1) JQ
output DELETED.Id, DELETED.JobId, DELETED.Queue
from [HangFire].JobQueue JQ with (readpast, updlock, rowlock, forceseek)
where Queue in (@queues1) and (FetchedAt is null or FetchedAt < DATEADD(second, @timeout, GETUTCDATE()))',N'@queues1 nvarchar(4000),@timeout float',@queues1=N'MYQUEUENAME_master',@timeout=-1800
-- Exactly the same SQL as above is executed about 6 times/second for about 3-4 seconds,
-- then nothing for about 2 seconds, then:
exec sp_getapplock @Resource=N'HangFire:recurring-jobs:lock',@DbPrincipal=N'public',@LockMode=N'Exclusive',@LockOwner=N'Session',@LockTimeout=5000
exec sp_getapplock @Resource=N'HangFire:locks:schedulepoller',@DbPrincipal=N'public',@LockMode=N'Exclusive',@LockOwner=N'Session',@LockTimeout=5000
exec sp_executesql N'select top (@count) Value from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key and Score between @from and @to order by Score',N'@count int,@key nvarchar(4000),@from float,@to float',@count=1000,@key=N'recurring-jobs',@from=0,@to=1596053348
exec sp_executesql N'select top (@count) Value from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key and Score between @from and @to order by Score',N'@count int,@key nvarchar(4000),@from float,@to float',@count=1000,@key=N'schedule',@from=0,@to=1596053348
exec sp_releaseapplock @Resource=N'HangFire:recurring-jobs:lock',@LockOwner=N'Session'
exec sp_releaseapplock @Resource=N'HangFire:locks:schedulepoller',@LockOwner=N'Session'
-- Then nothing is executed for about 8-10 seconds, then:
exec sp_executesql N'update [HangFire].Server set LastHeartbeat = @now where Id = @id',N'@now datetime,@id nvarchar(4000)',@now='2020-07-29 20:09:19.097',@id=N'ps12345:19764:fe362d1a-5ee4-4d97-b70d-134fdfab2b87'
-- Then about 500ms-2s later I get
exec sp_executesql N'delete top (1) JQ ... -- i.e. Same as first query
The update LastHeartbeat query is only there every second time (from just a brief inspection; maybe that’s not exactly right).
It looks like there are at least three threads running the DELETE query against JQ, since I can see several RPC:Starting entries before the RPC:Completed, suggesting they’re being executed in parallel rather than sequentially.
I don’t know if that’s normal, but it seems weird, as I thought we had just one ‘consumer’ of the jobs.
I only have one Queue in my dev environment, although in live we’d have 20-50 I’d guess.
Any suggestions on where I should look for the configuration that’s causing:
a) the 8-10s pause between checking for jobs
b) the number of threads that are checking for jobs - it seems like I have too many
After writing this I realised we were using an old version, so I upgraded from 1.5.x to 1.7.12, upgraded the database, and changed the startup config to this:
app.UseHangfireDashboard();
GlobalConfiguration.Configuration
    .UseSqlServerStorage(connstring, new SqlServerStorageOptions
    {
        CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero,
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        UseRecommendedIsolationLevel = true,
        PrepareSchemaIfNecessary = true, // Default value: true
        EnableHeavyMigrations = true     // Default value: false
    })
    .UseAutofacActivator(_container);
JobActivator.Current = new AutofacJobActivator(_container);
but if anything the problem is now worse. Or the same but faster: 20 calls to delete top (1) JQ... now happen within about 1s, then the other queries run, then there's a 15s wait, and then it starts all over again.
To be clear, the main problem is that if any job is added during that 15s delay, it takes the remainder of those 15s before my job is executed. A second problem, I think, is that it's hitting SQL Server more than needed: 20 times in a second is a bit much, for my needs at least.
(Cross-posted to hangfire forums)
If you don't set QueuePollInterval, then Hangfire with SQL Server storage defaults to polling every 15s. So the first thing to do if you have this problem is to set QueuePollInterval to something smaller, e.g. 1s.
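For example (a minimal sketch; connstring stands for your connection string):
GlobalConfiguration.Configuration
    .UseSqlServerStorage(connstring, new SqlServerStorageOptions
    {
        QueuePollInterval = TimeSpan.FromSeconds(1)
    });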
But in my case, even when I set that, it wasn't having any effect. The reason was that app.UseHangfireServer() was being called before GlobalConfiguration.Configuration.UseSqlServerStorage() with the SqlServerStorageOptions.
When you call app.UseHangfireServer() it uses the current value of JobStorage.Current. My code had set that:
var storage = new SqlServerStorage(connstring);
JobStorage.Current = storage;
then later called
app.UseHangfireServer()
then later called
GlobalConfiguration.Configuration
    .UseSqlServerStorage(connstring, new SqlServerStorageOptions
    {
        CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero,
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        UseRecommendedIsolationLevel = true,
        PrepareSchemaIfNecessary = true,
        EnableHeavyMigrations = true
    })
Reordering the code so that the SqlServerStorageOptions are applied before app.UseHangfireServer() means they actually take effect.
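Put together, the fixed ordering looks roughly like this (a sketch based on the snippets above; connstring and _container come from the original setup):
GlobalConfiguration.Configuration
    .UseSqlServerStorage(connstring, new SqlServerStorageOptions
    {
        QueuePollInterval = TimeSpan.FromSeconds(1),
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        UseRecommendedIsolationLevel = true
    })
    .UseAutofacActivator(_container);
JobActivator.Current = new AutofacJobActivator(_container);

app.UseHangfireServer();   // now picks up the storage configured above
app.UseHangfireDashboard();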
I would suggest checking the Hangfire BackgroundJobServerOptions to see what polling interval you have set up there. This defines how long the Hangfire server waits before checking whether there are any jobs in the queue to execute.
From the documentation (Hangfire Docs):
Hangfire Server periodically checks the schedule to enqueue scheduled jobs to their queues, allowing workers to execute them. By default, check interval is equal to 15 seconds, but you can change it by setting the SchedulePollingInterval property on the options you pass to the BackgroundJobServer constructor:
var options = new BackgroundJobServerOptions
{
    SchedulePollingInterval = TimeSpan.FromMinutes(1)
};
var server = new BackgroundJobServer(options);
I am using Python to download some data from Bloomberg. It works most of the time, but sometimes it pops up a timeout error, and after that the responses and requests no longer match.
The code I use in the for loop is as follows:
result_IVM = con.bdh(option_name, 'IVOL_MID', date_string, date_string, longdata=True)
volatility = result_IVM['value'].values[0]
When I set up the connection, I used following code:
con = pdblp.BCon(debug=True, port=8194, timeout=5000)
If I increase the timeout parameter (currently 5,000), will it help with this issue?
I'd suggest increasing the timeout to 5000 or even 10000 and then testing a few times. The default value of timeout is 500 milliseconds, which is small!
The TIMEOUT event is triggered by blpapi when no Event arrives within the specified number of milliseconds.
The author of pdblp defines timeout as:
timeout: int
    Number of milliseconds before timeout occurs when parsing response.
    See blp.Session.nextEvent() for more information.
Ref: https://github.com/matthewgilbert/pdblp/blob/master/pdblp/pdblp.py
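A minimal sketch of bumping the timeout, following the same pattern as in the question (option_name and date_string are assumed to be defined as in your loop):
import pdblp

# 10000 ms instead of the 500 ms default; 8194 is the standard local Bloomberg API port
con = pdblp.BCon(debug=False, port=8194, timeout=10000)
con.start()
result_IVM = con.bdh(option_name, 'IVOL_MID', date_string, date_string, longdata=True)
volatility = result_IVM['value'].values[0]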
In a MariaDB table with the TokuDB engine, I am encountering the error below, either on a delete statement while there is a background insert load, or vice versa.
Lock wait timeout exceeded; try restarting transaction
Does TokuDB use a setting that can be updated to determine how long it waits before it times out a statement?
I couldn't find the answer in the TokuDB documents. The MariaDB variable is still at its default value ('lock_wait_timeout' = 31536000), but my timeouts are coming back in quite a bit less than a year. The timeouts occur during a load test, and I haven't spotted a time value in the error, but it feels like a few seconds, or minutes at most, before the timeout is thrown.
Thanks,
Brent
TokuDB has its own timeout variable, tokudb_lock_timeout, it is measured in milliseconds and has the default value 4000 (4 seconds), which fits your observations. It can be modified both on the session and global levels, and can also be configured in the .cnf file.
Remember that when you set a global value for a variable which has both scopes, it only affects future sessions (connections), but not the existing ones.
-- for the current session
SET SESSION tokudb_lock_timeout = 60000;
-- for future sessions
SET GLOBAL tokudb_lock_timeout = 60000;
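To verify the value your current session actually sees (standard MariaDB syntax):

SHOW SESSION VARIABLES LIKE 'tokudb_lock_timeout';

And to persist the change across restarts, add tokudb_lock_timeout = 60000 under the [mysqld] section of the .cnf file, as mentioned above.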
I need to upload a plugin in WordPress, but each time it tells me:
Downloading update from http://wordpress.org/wordpress-3.7.1-new-bundled.zip…
Fatal error: Maximum execution time of 30 seconds exceeded in C:\xampp\htdocs\ubuy\wp-includes\class-http.php on line 1152
How can I increase the time limit?
I ended up editing the ...\wp-includes\class-http.php file.
Around line 1250 (depending on your version), look for the line that reads:
$theResponse = curl_exec( $handle );
and change it to read:
$timelimit = ini_get('max_execution_time'); // remember the current limit
set_time_limit(900);                        // allow up to 15 minutes for the download
$theResponse = curl_exec( $handle );
set_time_limit(max($timelimit, 30));        // restore the previous limit (at least 30s)
This stores the current timeout in a variable, sets the new timeout to 900 seconds (15 minutes, which should be plenty for most connections), then performs the request, and finally resets the timeout to whatever it was before we began (with a floor of the 30-second default).
That has worked for me in the past.
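If you'd rather not edit a core file at all, raising PHP's global limit can achieve the same thing; for example, in XAMPP's php.ini (the 300 is just an illustrative value):

max_execution_time = 300

Restart Apache afterwards so the new limit takes effect.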
NB: v3.8.2 uses a different asynchronous method to perform installs, and if you're on Windows you might need to set some additional security permissions etc. to get it all working properly.
Cheers.