Reactor Kafka: Consumption is stopped with out of order commit enabled (maxDeferredCommits>0) - spring-kafka

We use reactiveKafkaConsumerTemplate to receive messages, then acknowledge the offset after processing each message. We enabled out-of-order commits (maxDeferredCommits=250) and noticed that the consumer pauses indefinitely in certain situations.
The pattern of events is:
1. There may be a network glitch or Kafka server maintenance; a RetriableCommitFailedException is triggered.
2. The consumer pauses with “Paused - commits are retrying”.
3. The consumer logs “Resumed” and “Emitting records”, but there are no more “Async committing” logs. (We could not identify any exception during message acknowledgement.)
4. After some “Emitting records” logs, the consumer pauses with “Paused - too many deferred commits”.
5. There are no more “ConsumerEventLoop” logs.
6. A rebalance fixes the issue. (We have 3 consumers deployed on 3 hosts; removing 1 host fixes the issue.)
reactor-kafka-1.3.13.jar
logging:
  level:
    reactor:
      kafka:
        receiver: DEBUG
maxDeferredCommits: 250
ConsumerConfig
auto.commit.interval.ms = 1000
auto.offset.reset = earliest
connections.max.idle.ms = 540000
enable.auto.commit = false
heartbeat.interval.ms = 1000
max.poll.interval.ms = 300000
max.poll.records = 500
request.timeout.ms = 30000
session.timeout.ms = 10000
Logs:
11/24/22 6:50:06.386 AM DEBUG r.k.r.internals.ConsumerEventLoop Async committing: {
test-0=OffsetAndMetadata{offset=12206778, leaderEpoch=null, metadata=''},
test-1=OffsetAndMetadata{offset=12253822, leaderEpoch=null, metadata=''}
test-2=OffsetAndMetadata{offset=12257066, leaderEpoch=null, metadata=''}
test-3=OffsetAndMetadata{offset=12265134, leaderEpoch=null, metadata=''}}
No more “Async committing” after this
11/24/22 6:50:06.451 AM WARN r.k.r.internals.ConsumerEventLoop Commit failed with org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets. Caused by: org.apache.kafka.common.errors.DisconnectException: null
11/24/22 6:50:06.452 AM WARN r.k.r.internals.ConsumerEventLoop Commit failed with exceptionorg.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets., retries remaining 99
…
11/24/22 6:50:06.452 AM WARN r.k.r.internals.Commit failed with exceptionorg.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets., retries remaining 93
11/24/22 6:50:06.486 DEBUG r.k.r.internals.ConsumerEventLoop -Paused - commits are retrying
11/24/22 6:50:06.987 DEBUG r.k.r.internals.ConsumerEventLoop -Resumed
11/24/22 6:50:07.387 DEBUG r.k.r.internals.ConsumerEventLoop -Emitting 1 records, requested now 1
11/24/22 6:50:07.387 DEBUG r.k.r.internals.ConsumerEventLoop -onRequest.toAdd 1, paused false
…
11/24/22 6:51:05.248 DEBUG r.k.r.internals.ConsumerEventLoop -Paused - too many deferred commits
11/24/22 6:51:05.248 DEBUG r.k.r.internals.ConsumerEventLoop -Consumer woken
No more “ConsumerEventLoop” log after this until rebalance
Code detail:
consumeMessage() {
    ReceiverOptions basicReceiverOptions = ReceiverOptions.create(consumerProperties)
        .maxDeferredCommits(250)
        .commitInterval(Duration.ofMillis(commitInterval))
        .subscription(topics);
    reactiveKafkaConsumerTemplate = new ReactiveKafkaConsumerTemplate<>(basicReceiverOptions);
    return reactiveKafkaConsumerTemplate
        .receive()
        .publishOn(Schedulers.boundedElastic())
        // delay each record by 500 ms; 10 is the flatMap concurrency
        .flatMap(x -> Mono.just(x)
            .delayElement(Duration.ofMillis(500)), 10)
        .flatMap(receiverRecord ->
            // process the record
            messageServiceImpl.process(receiverRecord)
                .doFinally(x -> {
                    // ack offset
                    log.info("MessageConsumer ACK offset={} ", receiverRecord.offset());
                    receiverRecord.receiverOffset().acknowledge();
                })
                .subscribeOn(Schedulers.boundedElastic())
        )
        .....
}

Looks like it might be a bug; please open an issue on GitHub.
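
To picture why the log in the question ends with “Paused - too many deferred commits”, here is a minimal toy model of out-of-order commit bookkeeping. It is a sketch under stated assumptions, not reactor-kafka's implementation: the class DeferredCommitModel and all of its methods are hypothetical names invented for illustration, and it does not diagnose the reported stall. It only shows that when one low offset is never committed, later acknowledgements pile up behind the gap until the limit is reached.

```java
import java.util.TreeSet;

// Toy model (hypothetical, NOT reactor-kafka code): tracks acknowledged offsets
// for one partition and only lets the commit position advance over a gap-free run.
final class DeferredCommitModel {
    private final int maxDeferred;
    private long nextToCommit; // lowest offset that has not been acknowledged yet
    private final TreeSet<Long> deferred = new TreeSet<>(); // acknowledged, but stuck behind a gap

    DeferredCommitModel(long firstOffset, int maxDeferred) {
        this.nextToCommit = firstOffset;
        this.maxDeferred = maxDeferred;
    }

    void acknowledge(long offset) {
        if (offset < nextToCommit) {
            return; // already covered by the commit position
        }
        deferred.add(offset);
        // Advance over every contiguous acknowledged offset.
        while (deferred.remove(nextToCommit)) {
            nextToCommit++;
        }
    }

    long committableOffset() { return nextToCommit; }

    int deferredCount() { return deferred.size(); }

    // Assumption of this model: consumption pauses once the limit is reached.
    boolean paused() { return deferred.size() >= maxDeferred; }

    public static void main(String[] args) {
        DeferredCommitModel model = new DeferredCommitModel(0L, 3);
        model.acknowledge(1);
        model.acknowledge(2);
        model.acknowledge(3); // offset 0 is still missing
        System.out.println("deferred=" + model.deferredCount() + " paused=" + model.paused());
        model.acknowledge(0); // the gap closes, everything becomes committable
        System.out.println("committable=" + model.committableOffset() + " paused=" + model.paused());
    }
}
```

In this model the pause only clears when the missing offset is finally acknowledged. That matches the symptom that nothing moves until something resets the state, which in the question was a rebalance.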

Related

Reactor Kafka how to gracefully shutdown?

We use reactiveKafkaConsumerTemplate to receive messages and then process them. We are trying to shut the application down gracefully:
1. Stop polling for new messages.
2. Wait for the current message to finish processing.
3. Shut down the application.
We tried reactiveKafkaConsumerTemplate.pause, but it does not stop the consumer from polling new messages when there is a back-pressure pause. We also tried dispose, but it does not wait for message processing to complete. How can we achieve a graceful shutdown? Thanks.
log
r.k.r.internals.ConsumerEventLoop - Paused - back pressure
ReactiveKafkaConsumerTemplate pause topic=test-topic, partition=5
r.k.r.internals.ConsumerEventLoop - Paused - Consumer woken
r.k.r.internals.ConsumerEventLoop - Paused - Resume
reactiveKafkaConsumerTemplate
    .receiveAutoAck()
    .publishOn(Schedulers.boundedElastic())
    .flatMap(x -> Mono.just(x)
        .delayElement(Duration.ofMillis(300)), 5)
    .flatMap(message -> Mono.just(message)
        .flatMap(processMessageImp::processMessage)
        .onErrorResume(t -> Mono.empty())
    );
public void pauseKafkaMessageConsumer() {
    reactiveKafkaConsumerTemplate.assignment()
        .doOnNext(tp -> log.info("ReactiveKafkaConsumerTemplate pause topic={}, partition={}",
            tp.topic(), tp.partition()))
        .flatMap(topicParts -> reactiveKafkaConsumerTemplate.pause(topicParts))
        .subscribe();
}
@PreDestroy
public void onExit() {
    pauseKafkaMessageConsumer();
    try {
        // give in-flight messages a fixed 5 seconds to finish
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        log.error("onExit Error while PreDestroy ");
        // restore the interrupt flag for callers further up the stack
        Thread.currentThread().interrupt();
    }
}
With 1.3.13, the consumer paused in the back-pressure case, but it resumed after a rebalance. Is that the expected behavior?
12/8/22 10:53:42.559 PM DEBUG r.k.r.internals.ConsumerEventLoop Emitting 1 records, requested now 1
12/8/22 10:53:42.559 PM DEBUG r.k.r.internals.ConsumerEventLoop - onRequest.toAdd 1, paused false
12/8/22 10:53:45.966 PM DEBUG r.k.r.internals.ConsumerEventLoop Async committing: {test-topic-9=OffsetAndMetadata{offset=10934, leaderEpoch=null, metadata=''}}
12/8/22 10:53:46.258 PM reactiveKafkaConsumerTemplate pause
12/8/22 10:54:06.185 PM DEBUG r.k.r.internals.ConsumerEventLoop onPartitionsRevoked [test-topic-9, test-topic-8]
12/8/22 10:54:07.289 PM DEBUG r.k.r.internals.ConsumerEventLoop onPartitionsAssigned [test-topic-9, test-topic-8]
12/8/22 10:54:07.505 PM DEBUG r.k.r.internals.ConsumerEventLoop Emitting 12 records, requested now 1
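
The three shutdown steps listed in the question (stop taking new work, wait for in-flight work, then exit) can be sketched with plain JDK primitives. This is a minimal sketch, not a Reactor Kafka API: the class DrainGate and its methods tryStart, finish and shutdown are hypothetical names. The assumption is that the pipeline would call tryStart before processing a record and finish when processing ends (for example from a doFinally callback), and that the @PreDestroy hook would call shutdown instead of sleeping for a fixed time. The sketch does not stop the Kafka poll itself; it only stops admitting records into processing.

```java
import java.time.Duration;

// Hypothetical helper (not part of Reactor Kafka): counts in-flight work and
// lets a shutdown hook wait, with a timeout, until that work has drained.
final class DrainGate {
    private boolean accepting = true;
    private int inFlight = 0;

    // Call before processing a record; returns false once shutdown has begun.
    synchronized boolean tryStart() {
        if (!accepting) {
            return false;
        }
        inFlight++;
        return true;
    }

    // Call exactly once for every successful tryStart, when processing ends.
    synchronized void finish() {
        inFlight--;
        if (inFlight == 0) {
            notifyAll(); // wake a thread blocked in shutdown()
        }
    }

    // Stops admitting work, then waits until in-flight work is done or the timeout expires.
    synchronized boolean shutdown(Duration timeout) {
        accepting = false;
        long deadline = System.nanoTime() + timeout.toNanos();
        while (inFlight > 0) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // timed out with work still running
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DrainGate gate = new DrainGate();
        gate.tryStart(); // one record is being processed
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100); // simulated processing time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.finish();
        });
        worker.start();
        System.out.println("drained=" + gate.shutdown(Duration.ofSeconds(2)));
        System.out.println("accepts new work=" + gate.tryStart());
    }
}
```

Compared with a fixed Thread.sleep(5000), waiting on a counter returns as soon as the last record finishes and reports whether the timeout was hit.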

ansible async module returning task cannot be completed within limited time though it completes well in time

I have written this playbook in ansible 2.9.9
sample.yml
---
- name: async
  hosts: [test-servers]
  tasks:
    - name: sleep for 30 seconds
      command: sleep 30
      async: 50
      poll: 5
    - name: side by side task
      command: touch /tmp/sideTask
But running ansible-playbook sample.yml returns:
PLAY [async] *******************************************************************
TASK [Gathering Facts] *********************************************************
ok: [target1]
TASK [sleep for 30 seconds] ****************************************************
fatal: [target1]: FAILED! => {
"changed": false
}
MSG:
async task did not complete within the requested time - 50s
msg:
async task did not complete within the requested time - 50s
PLAY RECAP *********************************************************************
target1 : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

java.util.concurrent.TimeoutException: Request timeout after 60000 ms when using camel-ahc

I have a simple web service, ws1, exposed through netty, that just sets the body to "hello world". I want to call this web service asynchronously using camel-ahc.
To do that, I have a main Camel context that calls ws1 every 6 seconds. However, after ws1 is called in another thread, control does not return to the main Camel context thread. It seems the camel-ahc component is not working, and after 60 seconds a request timeout exception occurs.
In my pom I have added:
camel-ahc
camel-reactive-streams
<camelContext trace="true" id="mainCamelContext" xmlns="http://camel.apache.org/schema/blueprint" >
    <route id="ahc-route-first-api">
        <from uri="timer://webinar?period=6000"/>
        <log message="this is body: ${body}"/>
        <to uri="ahc:http://192.168.100.232:9999/ws1"/>
        <log message="this is body after call: ${body}"/>
    </route>
</camelContext>
When I install the bundle in Fuse:
10:35:18.914 INFO [Camel (mainCamelContext) thread #316 -
timer://webinar] this is body: 10:35:18.914 INFO [Camel
(mainCamelContext) thread #316 - timer://webinar]
ID-localhost-localdomain-1552973873885-38-116 >>>
(ahc-route-first-api) log[this is body: ${body}] -->
ahc://http://192.168.100.232:9999/api?throwExceptionOnFailure=false
<<< Pattern:InOnly,
Headers:{breadcrumbId=ID-localhost-localdomain-1552973873885-38-116,
firedTime=Sat Apr 06 10:35:18 IRDT 2019}, BodyType:null, Body:[Body is
null] 10:35:19.202 WARN [AsyncHttpClient-timer-87-1] Error processing
exchange. Exchange[ID-localhost-localdomain-1552973873885-38-114].
Caused by: [java.util.concurrent.TimeoutException - Request timeout to
192.168.100.232/192.168.100.232:9999 after 60000 ms] java.util.concurrent.TimeoutException: Request timeout to
192.168.100.232/192.168.100.232:9999 after 60000 ms at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
[1990:wrap_file__home_ossl_.m2_repository_org_asynchttpclient_async-http-client_2.4.3_async-http-client-2.4.3.jar_Export-Package_org.asynchttpclient.__version_2.4.3:0.0.0]
at
org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
[1990:wrap_file__home_ossl_.m2_repository_org_asynchttpclient_async-http-client_2.4.3_async-http-client-2.4.3.jar_Export-Package_org.asynchttpclient.__version_2.4.3:0.0.0]
at
io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
[654:io.netty.common:4.1.16.Final-redhat-2] at
io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
[654:io.netty.common:4.1.16.Final-redhat-2] at
io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
[654:io.netty.common:4.1.16.Final-redhat-2] at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
[654:io.netty.common:4.1.16.Final-redhat-2] at
java.lang.Thread.run(Thread.java:748) [?:?] 10:35:19.203 ERROR
[AsyncHttpClient-timer-87-1] Failed delivery for (MessageId:
ID-localhost-localdomain-1552973873885-38-117 on ExchangeId:
ID-localhost-localdomain-1552973873885-38-114). Exhausted after
delivery attempt: 1 caught: java.util.concurrent.TimeoutException:
Request timeout to 192.168.100.232/192.168.100.232:9999 after 60000 ms

Exception 504 when registering the consumer

I've been working with Symfony 2.7 and the RabbitMQBundle to handle some long processes asynchronously.
After facing the issue where the MySQL connection dies after a few minutes, I discovered rabbitmq-cli-consumer, a small app in Go that takes care of consuming the queue, and gives its content to a command.
In my case, I use it with this command: ./rabbitmq-cli-consumer -c configuration-stock.conf --include -V -e 'php app/console amqp:consume:stock --env=prod -vvv', with this configuration file:
[rabbitmq]
host = HOST
username = USERNAME
password = PASSWORD
vhost=/VHOST
port=PORT
queue=stock
compression=Off
[exchange]
name=exports
type=direct
durable=On
[queuesettings]
routingkey=stock
messagettl=10000
deadLetterExchange=exports.dl
deadLetterroutingkey=stock
priority=10
To handle errors, I intend to use RabbitMQ's x-dead-letter-exchange and x-dead-letter-routing-key configuration, so that a message can be retried later (in case something went temporarily wrong).
My issue is that, when I define my queues in RabbitMQBundle's configuration, rabbitmq-cli-consumer is unable to consume the queue, throwing this error:
2018/04/23 11:35:54 Connecting RabbitMQ...
2018/04/23 11:35:54 Connected.
2018/04/23 11:35:54 Opening channel...
2018/04/23 11:35:54 Done.
2018/04/23 11:35:54 Setting QoS...
2018/04/23 11:35:54 Succeeded setting QoS.
2018/04/23 11:35:54 Declaring queue "stock"...
2018/04/23 11:35:54 Registering consumer...
2018/04/23 11:35:54 failed to register a consumer: Exception (504) Reason: "channel/connection is not open"
Here is the configuration I use for RabbitMQBundle:
old_sound_rabbit_mq:
    producers:
        exports:
            connection: default
            exchange_options:
                name: 'exports'
                type: direct
        exports_dl:
            connection: default
            exchange_options:
                name: 'exports.dl'
                type: direct
    consumers:
        stock_dead_letter:
            connection: default
            exchange_options:
                name: exports.dl
                type: direct
            queue_options:
                name: stock.dl
                routing_keys:
                    - stock
                arguments:
                    x-dead-letter-exchange: ['S', 'exports']
                    x-dead-letter-routing-key: ['S', 'stock']
                    x-message-ttl: ['I', 60000]
            callback: amqp.consumers.exports.stock
    multiple_consumers:
        exports:
            connection: default
            exchange_options:
                name: 'exports'
                type: direct
            queues:
                stock:
                    name: stock
                    callback: amqp.consumers.exports.stock
                    routing_keys:
                        - stock
                    arguments:
                        x-dead-letter-exchange: ['S', 'exports.dl']
                        x-dead-letter-routing-key: ['S', 'stock']
Has anyone encountered something similar? If so, how did you solve it?
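
The retry loop that the dead-letter arguments in the bundle configuration are meant to produce can be traced with a small model. This is only a sketch of the intended routing, not RabbitMQ client code: the class DeadLetterTrace, its nested QueueSpec and the trace method are hypothetical names. It assumes direct exchanges (exact routing-key match) and assumes every message that reaches a queue is eventually dead-lettered, either by rejection or by TTL expiry. It does not address the 504 error itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not RabbitMQ API): follows a message through dead-letter hops.
final class DeadLetterTrace {
    // A queue name plus the dead-letter target configured on it (null if none).
    static final class QueueSpec {
        final String name;
        final String dlExchange;
        final String dlRoutingKey;

        QueueSpec(String name, String dlExchange, String dlRoutingKey) {
            this.name = name;
            this.dlExchange = dlExchange;
            this.dlRoutingKey = dlRoutingKey;
        }
    }

    // Key is "exchange/routingKey"; a direct exchange needs an exact match.
    private final Map<String, QueueSpec> bindings = new HashMap<>();

    void bind(String exchange, String routingKey, QueueSpec queue) {
        bindings.put(exchange + "/" + routingKey, queue);
    }

    // Returns the queues visited, stopping after maxHops entries or at a dead end.
    List<String> trace(String exchange, String routingKey, int maxHops) {
        List<String> visited = new ArrayList<>();
        QueueSpec queue = bindings.get(exchange + "/" + routingKey);
        while (queue != null && visited.size() < maxHops) {
            visited.add(queue.name);
            if (queue.dlExchange == null) {
                break; // no dead-letter target: the message would be dropped here
            }
            queue = bindings.get(queue.dlExchange + "/" + queue.dlRoutingKey);
        }
        return visited;
    }

    public static void main(String[] args) {
        DeadLetterTrace model = new DeadLetterTrace();
        // Values copied from the bundle configuration in the question.
        model.bind("exports", "stock", new QueueSpec("stock", "exports.dl", "stock"));
        model.bind("exports.dl", "stock", new QueueSpec("stock.dl", "exports", "stock"));
        System.out.println(model.trace("exports", "stock", 3)); // prints [stock, stock.dl, stock]
    }
}
```

Under these assumptions, a failed message leaves stock, waits in stock.dl until its TTL expires, and is then routed back to stock for another attempt.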

FIRAnalytics No network. Upload task will not be scheduled

+(void)load
{
    [super load];
    [self aspect_hookSelector:@selector(viewWillAppear:) withOptions:0 usingBlock:^(id<AspectInfo> info, BOOL animated) {
        HDFAppLog(@"**************==");
        NSString *currentPageName = [[info instance] hdf_className]; // page name, e.g. HDFSearchHospitalViewController
        // Firebase Analytics
        [FIRAnalytics logEventWithName:@"page" parameters:@{
            @"pageName": currentPageName
        }];
        // Google Analytics
        id<GAITracker> tracker = [GAI sharedInstance].defaultTracker; // get the default tracker
        [tracker set:kGAIScreenName value:currentPageName];
        [tracker send:[[GAIDictionaryBuilder createScreenView] build]];
    } error:NULL];
}
@end
I use FIRAnalytics like this,
but it prints the error below:
FIRAnalytics/DEBUG> No network. Upload task will not be scheduled
and these:
2016-10-10 15:01:58.038 newPatient[8480:] FIRAnalytics/DEBUG> Do not schedule an upload task. Task already exists
2016-10-10 15:02:07.134 newPatient[8480:] FIRAnalytics/DEBUG> Network status has changed. Code, status: 1, Disconnected
2016-10-10 15:02:07.136 newPatient[8480:] FIRAnalytics/ERROR> Encounter network error. Code, error: -1003, Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSUnderlyingError=0x7fbf305dcd30 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "(null)" UserInfo={_kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12}}, NSErrorFailingURLStringKey=https://app-measurement.com/config/app/1:442821079824:ios:88cc404211cdcfea?platform=ios&app_instance_id=1419B4CCA10A4607861CEDB35CB95174&gmp_version=3403, NSErrorFailingURLKey=https://app-measurement.com/config/app/1:442821079824:ios:88cc404211cdcfea?platform=ios&app_instance_id=1419B4CCA10A4607861CEDB35CB95174&gmp_version=3403, _kCFStreamErrorDomainKey=12, _kCFStreamErrorCodeKey=8, NSLocalizedDescription=A server with the specified hostname could not be found.}
2016-10-10 15:02:07.138 newPatient[8480:] FIRAnalytics/DEBUG> Fetched configuration. Status code: 0
2016-10-10 15:02:07.138 newPatient[8480:] FIRAnalytics/DEBUG> Unable to get the configuration from server. Network request failed. Code, Error: 0, Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSUnderlyingError=0x7fbf305dcd30 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "(null)" UserInfo={_kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12}}, NSErrorFailingURLStringKey=https://app-measurement.com/config/app/1:442821079824:ios:88cc404211cdcfea?platform=ios&app_instance_id=1419B4CCA10A4607861CEDB35CB95174&gmp_version=3403, NSErrorFailingURLKey=https://app-measurement.com/config/app/1:442821079824:ios:88cc404211cdcfea?platform=ios&app_instance_id=1419B4CCA10A4607861CEDB35CB95174&gmp_version=3403, _kCFStreamErrorDomainKey=12, _kCFStreamErrorCodeKey=8, NSLocalizedDescription=A server with the specified hostname could not be found.}
2016-10-10 15:02:07.139 newPatient[8480:] FIRAnalytics/DEBUG> Network fetch failed. Will retry later. Code, error: 0, Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSUnderlyingError=0x7fbf305dcd30 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "(null)" UserInfo={_kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12}}, NSErrorFailingURLStringKey=https://app-measurement.com/config/app/1:442821079824:ios:88cc404211cdcfea?platform=ios&app_instance_id=1419B4CCA10A4607861CEDB35CB95174&gmp_version=3403, NSErrorFailingURLKey=https://app-measurement.com/config/app/1:442821079824:ios:88cc404211cdcfea?platform=ios&app_instance_id=1419B4CCA10A4607861CEDB35CB95174&gmp_version=3403, _kCFStreamErrorDomainKey=12, _kCFStreamErrorCodeKey=8, NSLocalizedDescription=A server with the specified hostname could not be found.}
2016-10-10 15:02:07.139 newPatient[8480:] FIRAnalytics/DEBUG> No network. Upload task will not be scheduled
2016-10-10 15:02:07.139 newPatient[8480:] FIRAnalytics/DEBUG> Canceling active timer
2016-10-10 15:02:27.958 newPatient[8480:13764850] Firebase/Network/ERROR> Encounter network error. Code, error: -1001, Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={NSErrorFailingURLStringKey=https://play.googleapis.com/log, NSErrorFailingURLKey=https://play.googleapis.com/log, _kCFStreamErrorDomainKey=4, _kCFStreamErrorCodeKey=-2103, NSLocalizedDescription=The request timed out.}
2016-10-10 15:02:27.961 newPatient[8480] [Firebase/Core][I-COR000020] Error posting to Clearcut: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={NSErrorFailingURLStringKey=https://play.googleapis.com/log, NSErrorFailingURLKey=https://play.googleapis.com/log, _kCFStreamErrorDomainKey=4, _kCFStreamErrorCodeKey=-2103, NSLocalizedDescription=The request timed out.}, with Status Code: 0
debug logs below:
2016-10-10 11:38:58.152 newPatient[7428:] FIRAnalytics/DEBUG> Debug mode is enabled. Marking event as debug and real-time. Event name, parameters: page, {
"_dbg" = 1;
"_o" = app;
"_r" = 1;
pageName = HDFPhDoctorIntroduceViewController;
}
Either there is no network connection, or your network is too flaky to send data to the server. If there is no network, the upload task is not scheduled. Depending on your location, network traffic may also be filtered, which likewise prevents the upload. I think this is normal behavior.
