Rebalance issue with spring kafka max.poll.interval.ms, max.poll.records and idleTimeBetweenPolls

I am seeing continuous rebalancing in my application. The application consumes in batch mode, and these are the configuration properties that have been added:
myapp.consumer.group.id=cg-id-local
myapp.changefeed.topic=test_topic
myapp.auto.offset.reset=latest
myapp.enable.auto.commit=false
myapp.max.poll.interval.ms=300000
myapp.max.poll.records=20000
myapp.idle.time.between.polls=240000
myapp.concurrency=10
Container factory:
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory(poSummaryCGID));
factory.setConcurrency(poSummNoOfConsumers); // 10 consumers, one per partition
factory.setBatchListener(true); // deliver records to the listener as a batch
factory.setAckDiscarded(true);
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
factory.getContainerProperties().setIdleBetweenPolls(idleTimeBetweenPolls); // 240000 ms
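For reference, a minimal sketch of how the myapp.* properties above are typically fed into the consumer factory; the body of consumerFactory(...) is not shown in the question, so the wiring below (including the broker address) is an assumption:

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

public ConsumerFactory<String, String> consumerFactory(String groupId) {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: broker address
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");          // myapp.auto.offset.reset
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);            // myapp.enable.auto.commit
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);        // myapp.max.poll.interval.ms
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 20_000);             // myapp.max.poll.records
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
}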
I have a few questions here:
I have set the maximum record count per poll to 20000 (with a 4-minute idle time between polls), and the topic has 10 partitions. Since I set the concurrency to 10, 10 consumers will be up and running, each listening to 1 partition. My question: will the record count be split across all the consumers, so that each consumer handles 2000 records per poll?
max.poll.interval.ms has been set to 5 minutes. I am sure the consumer can process 2000 records (if my understanding above is correct) within the 4-minute poll interval, which is below the max.poll.interval.ms upper bound. But I am not sure why the rebalancing is happening. Are there any other configuration properties I need to set?
Help would be greatly appreciated!!
Tried with these configurations:

Attempt 1:
myapp.max.poll.interval.ms=600000
myapp.max.poll.records=2000
myapp.idle.time.between.polls=360000

Attempt 2:
myapp.max.poll.interval.ms=300000
myapp.max.poll.records=2000
myapp.idle.time.between.polls=300000

Attempt 3:
myapp.max.poll.interval.ms=300000
myapp.max.poll.records=2000
myapp.idle.time.between.polls=180000
EDIT / FIX:
We should always ensure that
myapp.max.poll.interval.ms >
(myapp.idle.time.between.polls + the time needed to process myapp.max.poll.records records).
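A worked example with illustrative numbers (the processing time is an assumption, not from the question): if processing a 2000-record batch takes about 60 seconds, then with myapp.max.poll.interval.ms=300000 the idle time must stay below 300000 - 60000 = 240000 ms. Attempt 3 above (idle=180000) leaves 120 seconds of headroom, while Attempt 2 (idle=300000) leaves none, so any processing at all pushes the next poll() past the interval.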

No. max.poll.records is per consumer, not per topic or container.
If you have concurrency=10 and 10 partitions, you should reduce max.poll.records to 2000 so that each consumer gets at most 2000 records per poll.
The container will automatically reduce the idle between polls so that max.poll.interval.ms won't be exceeded, but you should be conservative with these properties (max.poll.records and max.poll.interval.ms) so that it is never possible to exceed the interval.
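Expressed as a sanity check in code, the rule from the EDIT above looks like this; the worst-case processing time is an assumed, measured number, not something from the question:

// All values in milliseconds.
long maxPollIntervalMs = 300_000;          // max.poll.interval.ms
long idleBetweenPollsMs = 180_000;         // container idleBetweenPolls
long worstCaseBatchProcessingMs = 100_000; // assumption: measure this for a full 2000-record batch
// A rebalance is triggered when the next poll() does not happen within max.poll.interval.ms,
// so idle time plus processing time must always stay below it:
if (idleBetweenPollsMs + worstCaseBatchProcessingMs >= maxPollIntervalMs) {
    throw new IllegalStateException(
        "idleBetweenPolls + batch processing time must stay below max.poll.interval.ms");
}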

Related

How to create Asynchronous Camel-kafka consumer?

I have a route in Camel that consumes from Kafka. It consumes and produces at a TPS of 2000 while the incoming rate is 18000 TPS, so the consumer topic has consumer lag. If I set max.poll.records=500 I can achieve 2000 TPS. If I set the producer option requestRequiredAcks=0 I can achieve 4000 TPS, but still with consumer lag.
We know that a Camel route is complete when from->to is complete. A consumer that consumes from 2 partitions with a consumer count of 2 is busy until the route completes.
Is there a way to make the camel-kafka consumer asynchronous? Any code example?
from("kafka:{{consumer.topic}}?brokers={{kafka_dev.host}}"
+ "&maxPollRecords={{consumer.maxPollRecords}}" + "&consumersCount=2"
+ "&seekTo=latest" + "&groupId={{consumer.group}}" + "&keyDeserializer="
+ KEYDESERIALIZER + "&valueDeserializer=" + VALUEDESERIALIZER + SSL).doTry()
.routeId("route1")
.process(new CamelProcessor())
.to("kafka:{{producer.topic}}?brokers={{kafka_dev.host}}" +"&requestRequiredAcks=1" )
.doCatch(Exception.class));
Also, we have observed that introducing threads in this route re-reads messages that were already processed and sent. Is this what this camel-kafka link is describing: https://stackoverflow.com/questions/56716812/how-to-commit-offsets-thread-safe-using-camel-kafka
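The linked question deals with committing offsets manually; a minimal sketch of that approach for Camel 2.x, assuming the kafka endpoint is configured with allowManualCommit=true (and auto commit disabled). The processor below is illustrative wiring, not the asker's code:

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.component.kafka.KafkaConstants;
import org.apache.camel.component.kafka.KafkaManualCommit;

public class ManualCommitProcessor implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        // ... do the actual work here ...
        // Commit the offset only after processing succeeded; requires
        // allowManualCommit=true on the kafka consumer endpoint.
        KafkaManualCommit manual =
                exchange.getIn().getHeader(KafkaConstants.MANUAL_COMMIT, KafkaManualCommit.class);
        if (manual != null) {
            manual.commitSync();
        }
    }
}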

Connection timeout in virtuoso, even after changing the MaxQueryExecutionTime

I changed the Virtuoso 6.1 configuration in order to avoid the Timeout constraint.
Here is the important part of the virtuoso.ini:
MaxQueryCostEstimationTime = 40000 ; in seconds
MaxQueryExecutionTime = 60000 ; in seconds
However, it still times out for complex queries.
Did I miss something?
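One thing worth double-checking, as an assumption about the typical Virtuoso 6.1 setup rather than something stated in the question: for SPARQL queries these parameters must live in the [SPARQL] section of virtuoso.ini, and Virtuoso must be restarted for ini changes to take effect. Also note that a "connection timeout" can be raised by the HTTP client long before Virtuoso's own query timeout fires.

[SPARQL]
MaxQueryCostEstimationTime = 40000 ; in seconds
MaxQueryExecutionTime = 60000 ; in seconds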

Gatling - how to boost performance

We are using Gatling with a very simple scenario: reading URLs from a CSV file and invoking them.
We get a throughput of ~18K requests/sec.
Any ideas on how to push this number up?
We tried adding the Keep-Alive header to avoid the overhead of opening/closing connections, but it doesn't help.
Here's our code:
class MySimulation extends Simulation {
  val httpProtocol = http
    .baseURL("http://localhost:9090/")

  val csvFeeder = csv("uniq_urls_500.csv").random

  val scn = scenario("MySimulation")
    .feed(csvFeeder)
    .repeat(10000) {
      exec(http("request_0")
        .get("?loc=${Url}")
        .header("Keep-Alive", "1500000")
      )
    }

  setUp(scn.inject(
    rampUsers(100) over (5 seconds)
  )).protocols(httpProtocol)
}
Increase the number of users in your test scenario from 100 to a higher number to increase the load on your server.
Ensure the box you are running the Gatling test from can handle that much.
If the box is struggling, you can run Gatling from multiple boxes.
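For example (illustrative numbers only), changing the injection profile to rampUsers(1000) over (30 seconds) gives ten times as many concurrent virtual users; since each user runs the repeat(10000) loop independently, the offered load scales with the user count until the client box saturates on CPU, sockets, or file descriptors.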

Units of Work and Backout in DataPower

I have set the configuration as below:
Units of Work: 1
Automatic Backout: on
Backout Threshold: 3
Backout Queue Name: a queue name is given.
So according to these settings, since the threshold value is 3, in case of failure there should be 4 transactions in the probe?
Can you please confirm?
Thanks,
Vathsa
No, only one, as it is the same transaction in DataPower but three transport retries.

linkedin-j fetching all network updates

I am using linkedin-j to fetch LinkedIn data:
Set<NetworkUpdateType> set = new HashSet<NetworkUpdateType>();
set.add(NetworkUpdateType.SHARED_ITEM);
Network network = client.getNetworkUpdates(set);
and it returns only 10 of my own network updates. How can I get all the public updates (not necessarily from my own network connections) using these LinkedIn APIs?
Try it like this:
Set<NetworkUpdateType> set = new HashSet<NetworkUpdateType>();
set.add(NetworkUpdateType.SHARED_ITEM);
Network network = client.getNetworkUpdates(set, 1, 50);
Here the first argument is the start index and the second is the count, so you will get 50 results:
Total: 102
Start: 1
Count: 50
You can keep paginating until you reach Total. :)
You can get your own updates and the updates of your first-degree connections using the LinkedIn APIs, but you cannot get all updates across all people.
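A minimal pagination sketch along the lines of the answer above; the accessor names on the response model (getUpdates(), getTotal(), getUpdateList(), getUpdateKey()) follow linkedin-j's schema classes but should be treated as assumptions:

import java.util.EnumSet;
import java.util.Set;
import com.google.code.linkedinapi.client.LinkedInApiClient;
import com.google.code.linkedinapi.client.enumeration.NetworkUpdateType;
import com.google.code.linkedinapi.schema.Network;
import com.google.code.linkedinapi.schema.Update;

// Hypothetical helper: page through all visible network updates, 50 at a time.
void fetchAllUpdates(LinkedInApiClient client) {
    Set<NetworkUpdateType> types = EnumSet.of(NetworkUpdateType.SHARED_ITEM);
    int start = 1;      // the answer above shows Start starting at 1
    final int count = 50;
    long total;
    do {
        Network network = client.getNetworkUpdates(types, start, count);
        total = network.getUpdates().getTotal(); // e.g. 102, as in the output above
        for (Update update : network.getUpdates().getUpdateList()) {
            System.out.println(update.getUpdateKey()); // assumption: Update exposes getUpdateKey()
        }
        start += count;
    } while (start <= total);
}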
