Spring Kafka goes into infinite retry attempts
Version: Spring-Kafka-2.2.4.RELEASE
Listener setting
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setRetryTemplate(getRetryTemplate());
    factory.setConcurrency(concurrency);
    return factory;
}
The factory uses a RetryTemplate with a SimpleRetryPolicy to set the max attempts.
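getRetryTemplate() is not shown here; a minimal sketch of what it presumably looks like (the max attempts and back-off values below are assumptions, not taken from the actual configuration):

// Minimal sketch of the retry template wiring (values are assumptions)
private RetryTemplate getRetryTemplate() {
    RetryTemplate retryTemplate = new RetryTemplate();

    // SimpleRetryPolicy caps the number of attempts (assumed 3 here)
    SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
    retryPolicy.setMaxAttempts(3);
    retryTemplate.setRetryPolicy(retryPolicy);

    // Fixed delay between attempts (assumed 1 second)
    FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
    backOffPolicy.setBackOffPeriod(1000L);
    retryTemplate.setBackOffPolicy(backOffPolicy);

    return retryTemplate;
}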
Kafka Listener
@KafkaListener(topics = "topic")
public void myListener(ConsumerRecord<String, String> record) {
    // Code here
}
A strange issue is happening on the server, which runs multiple instances. Locally, whenever myListener throws an exception, the message is retried up to the max attempts and then ACK'ed fine.
But on the server, the following happens:
Node A: Consume #1 -> Retry #1 -> Retry #1 -> Consume #2 -> Retry #2 -> Retry #2 -> Consume #3 ....
Node B:...................................... Consume #1 -> Retry #1 -> Retry #1 -> Consume #2 ....
And so on, without any ACK. After 500 messages (e.g. Consume #500), node A resets back to Consume #1. I'm not sure whether the number 500 is related to max.poll.records.
I still can't reproduce the issue on my local setup, where it works fine (ACK after maxAttempts).
Is there any explanation/solution for this problem? Currently the workaround is simply to catch the exception and bypass the retry mechanism, so the message can be ACK'ed and the offset moves forward, as sketched below.
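For reference, the workaround looks roughly like this (a sketch only; process() and log stand in for the real business code):

// Sketch of the current workaround: swallow the exception so the retry
// machinery never triggers and the offset/ACK moves forward.
@KafkaListener(topics = "topic")
public void myListener(ConsumerRecord<String, String> record) {
    try {
        process(record); // hypothetical business logic
    } catch (Exception ex) {
        log.error("Processing failed, skipping offset {}", record.offset(), ex);
    }
}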
Thanks!
Related
I am currently building a simple example with Spring Kafka.
If an exception occurs in the service layer, I want to retry, publish the record to the dead letter queue, and then commit the original offset.
However, while the dead letter queue is populated properly, the original message remains in Kafka because the commit is not processed.
My code is as follows.
KafkaConfig.java
...
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setCommonErrorHandler(kafkaListenerErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    return factory;
}

private CommonErrorHandler kafkaListenerErrorHandler() {
    DefaultErrorHandler defaultErrorHandler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(template, DEAD_TOPIC_DESTINATION_RESOLVER),
            new FixedBackOff(1000, 3));
    defaultErrorHandler.setCommitRecovered(true);
    defaultErrorHandler.setAckAfterHandle(true);
    defaultErrorHandler.setResetStateOnRecoveryFailure(false);
    return defaultErrorHandler;
}
...
KafkaListener.java
...
@KafkaListener(topics = TOPIC_NAME, containerFactory = "kafkaListenerContainerFactory", groupId = "stock-adjustment-0")
public void subscribe(final String message, Acknowledgment ack) throws IOException {
    log.info(String.format("Message Received : [%s]", message));

    StockAdjustment stockAdjustment = StockAdjustment.deserializeJSON(message);
    if (stockService.isAlreadyProcessedOrderId(stockAdjustment.getOrderId())) {
        log.info(String.format("AlreadyProcessedOrderId : [%s]", stockAdjustment.getOrderId()));
    } else {
        if (stockAdjustment.getAdjustmentType().equals("REDUCE")) {
            stockService.decreaseStock(stockAdjustment);
        }
    }

    ack.acknowledge(); // <<< does not work!
}
...
StockService.java
...
if (stockAdjustment.getQty() > stock.getAvailableStockQty()) {
    throw new RuntimeException(String.format("Stock decreased Request [decreasedQty: %s][availableQty : %s]", stockAdjustment.getQty(), stock.getAvailableStockQty()));
}
...
When a RuntimeException occurs in the service layer as above, the record is published to the DLT through the CommonErrorHandler configured in the Kafka settings.
However, after the DLT record is published, the original message remains in Kafka, so I need a solution for that.
I looked into it and found that my configuration is processed through SeekUtils.seekOrRecover(); if the record is still not processed after the maximum number of attempts, an exception is thrown and the original offset is rolled back without a commit being processed.
According to the documentation, the AfterRollbackProcessor seems to handle the rollback on failure by default, but I don't know how to write the code so that the offset is committed even when processing fails.
EDITED
The above code and settings work normally.
I thought consumer lag would occur, but when I stepped through the actual logic (SeekUtils.seekOrRecover()) and checked the offset commit and the lag, I confirmed that it works normally.
I think it was caused by my mistake.
Records are never removed from the topic (until they expire); only the consumer's committed offset is updated.
Use kafka-consumer-groups.sh to describe the group to see the committed offset for the failed record that was sent to the DLT.
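If a programmatic check is preferred over the CLI, a sketch along these lines (my own, using Kafka's AdminClient rather than the command the answer recommends; the bootstrap address is an assumption) prints the committed offsets for the group:

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommittedOffsetCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("stock-adjustment-0") // group id from the listener above
                    .partitionsToOffsetAndMetadata()
                    .get();
            // the committed offset should be past the failed record that went to the DLT
            committed.forEach((tp, om) ->
                    System.out.printf("%s -> committed offset %d%n", tp, om.offset()));
        }
    }
}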
I am trying out reactor-kafka for consuming messages. Everything else works fine, but I want to add retry(2) for failing messages. spring-kafka already retries a failed record 3 times by default; I want to achieve the same using reactor-kafka.
I am using spring-kafka as a wrapper for reactor-kafka. Below is my consumer template:
reactiveKafkaConsumerTemplate
        .receiveAutoAck()
        .map(ConsumerRecord::value)
        .flatMap(this::consumeWithRetry)
        .onErrorContinue((error, value) -> log.error("something bad happened while consuming : {}", error.getMessage()))
        .retryWhen(Retry.backoff(30, Duration.of(10, ChronoUnit.SECONDS)))
        .subscribe();
Let us say the consume method is as follows:
public Mono<Void> consume(MessageRecord message) {
    return Mono.error(new RuntimeException("test retry")); // sample error scenario
}
I am using the following logic to retry the consume method on failure.
public Mono<Void> consumeWithRetry(MessageRecord message) {
    return consume(message)
            .retry(2);
}
I want to retry consuming the message if the current consumer record fails with an exception. I have tried wrapping the consume method with another retry(3), but that does not serve the purpose. The final retryWhen is only there to retry the subscription on Kafka rebalances.
@simon-baslé @gary-russell
Previously while retrying I was using the below approach:
public Mono<Void> consumeWithRetry(MessageRecord message) {
    return consume(message)
            .retry(2);
}
But it was not retrying. After adding Mono.defer, the code below works and adds the required retries.
public Mono<Void> consumeWithRetry(MessageRecord message) {
    return Mono.defer(() -> consume(message))
            .retry(2);
}
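To illustrate why the defer matters (this check is my own, not from the original post): Mono.defer re-invokes the supplier on every retry subscription, so any work consume() does eagerly before returning its Mono is actually repeated.

import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Mono;

public class DeferRetryDemo {

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();

        // Mono.defer runs the supplier again on every (re)subscription,
        // so the side effect below is re-executed for each retry.
        Mono<Void> consume = Mono.defer(() -> {
            attempts.incrementAndGet();
            return Mono.<Void>error(new RuntimeException("test retry"));
        });

        consume.retry(2)
                .onErrorResume(e -> Mono.<Void>empty()) // swallow the final failure
                .block();

        System.out.println(attempts.get()); // 3 = initial attempt + 2 retries
    }
}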
For a Spring Boot app I am using a RedisTemplate injected into a service bean to do simple gets/sets, against an AWS ElastiCache cluster (cluster mode enabled). I started the app, sent a few requests through, and the performance is slow: the ElastiCache metrics show new connections equal to the number of requests, and each call takes roughly 100 ms. The long latency and the new-connection metrics indicate that the native connection inside the LettuceConnection is not being retained.

I am only using Spring Data to manage the connection setup. Specifically, I don't want to connect in a @Configuration class and have the app fail on startup if there is a connection issue; it's a critical app that needs to start even if the cache is unavailable at that time. And I don't want to write my own code to synchronize access to the single native connection across threads.

Any ideas why the native connection would not be reused? Here's my config:
private ClusterClientOptions clusterClientOptions() {
    //@formatter:off
    return ClusterClientOptions.builder()
            .socketOptions(SocketOptions
                    .builder()
                    .connectTimeout(Duration.ofMillis(properties.getConnectionTimeoutMs()))
                    .build())
            .requestQueueSize(properties.getRequestQueueSize())
            .topologyRefreshOptions(ClusterTopologyRefreshOptions
                    .builder()
                    .enablePeriodicRefresh(properties.isPeriodicRefresh())
                    .build())
            .build();
    //@formatter:on
}

private LettuceClientConfiguration lettuceClientConfiguration() {
    //@formatter:off
    return LettuceClientConfiguration
            .builder()
            .clientOptions(clusterClientOptions())
            .commandTimeout(Duration.ofMillis(properties.getCommandTimeoutMs()))
            .useSsl()
            .build();
    //@formatter:on
}

private LettuceConnectionFactory serviceContextLettuceConnectionFactory() {
    RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration();
    clusterConfig.clusterNode(properties.getCacheEndpoint(), properties.getCachePort());
    clusterConfig.setPassword(RedisPassword.of(properties.getCachePassword()));
    LettuceConnectionFactory lettuceConnectionFactory =
            new LettuceConnectionFactory(clusterConfig, lettuceClientConfiguration());
    lettuceConnectionFactory.setShareNativeConnection(true);
    lettuceConnectionFactory.afterPropertiesSet();
    return lettuceConnectionFactory;
}

private RedisTemplate<String, String> redisTemplate() {
    RedisTemplate<String, String> template = new RedisTemplate<>();
    template.setConnectionFactory(serviceContextLettuceConnectionFactory());
    template.afterPropertiesSet();
    return template;
}
The template is injected into a singleton service class which calls template.opsForValue().get(key), etc. It works, but it is slow and always creates new connections.
Solved: there was @RefreshScope on this @Configuration class (not shown), which caused different behavior than I expected. I'm still not 100% sure of the details, but it seemed to be recreating the factory and template on each request, causing the new connections. I removed that annotation and it is now re-using the native connection as expected.
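For anyone hitting the same symptom, the working shape is roughly the following sketch (my own reconstruction, reusing the question's lettuceClientConfiguration() and an assumed properties holder): the factory and template become plain singleton beans and the class carries no @RefreshScope.

@Configuration // note: no @RefreshScope here
public class CacheConfig {

    private final CacheProperties properties; // hypothetical properties holder

    public CacheConfig(CacheProperties properties) {
        this.properties = properties;
    }

    @Bean
    public LettuceConnectionFactory serviceContextLettuceConnectionFactory() {
        RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration();
        clusterConfig.clusterNode(properties.getCacheEndpoint(), properties.getCachePort());
        clusterConfig.setPassword(RedisPassword.of(properties.getCachePassword()));
        // shareNativeConnection is true by default, so the single native
        // connection is created lazily and then reused across threads
        return new LettuceConnectionFactory(clusterConfig, lettuceClientConfiguration());
    }

    @Bean
    public RedisTemplate<String, String> redisTemplate(LettuceConnectionFactory factory) {
        RedisTemplate<String, String> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        return template; // afterPropertiesSet() is called by the container for @Bean methods
    }
}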
We are using spring-kafka 2.2.8.RELEASE. I have a specific situation where I need help. I have 4 topics: topic, retryTopic, successTopic and errorTopic. If processing on topic fails, the record should be redirected to retryTopic, where the 3 retry attempts will be made. If those attempts fail, it must be redirected to errorTopic. On success on either topic or retryTopic, it should be redirected to successTopic. This is already implemented, based on the question "How to retry with spring kafka version 2.2".
But now I have a new situation, where I need to call the retryTopic listener from inside the topic listener based on a business-logic error, without an exception being thrown (it already calls retryTopic when an exception is thrown, and that behavior must be kept). I also need to know which retry attempt number retryTopic is being called on, as a parameter of the listener below.
@KafkaListener(id = "so60172304.2", topics = "retryTopic")
public void listen2(String in) {
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.execute(new RetryCallback<Void, RuntimeException>() {

        @Override
        public Void doWithRetry(RetryContext retryContext) throws RuntimeException {
            // Can I get the retry count here? It didn't work
            Integer count = RetrySynchronizationManager.getContext().getRetryCount();
            return null;
        }
    });
}
There is no reason you can't call one listener from another (but you won't get retries unless you call it using a RetryTemplate in the first method).
If you use a RetryTemplate configured on the container factory to do the retries (rather than adding a BackOff to the SeekToCurrentErrorHandler in versions 2.3.x and higher), you can obtain the retry count (starting at zero) like this...
@KafkaListener(id = "so60172304.2", topics = "retryTopic")
public void listen2(String in) {
    int retryCount = RetrySynchronizationManager.getContext().getRetryCount();
    ...
}
getContext() will return null if you call this directly from the first method (unless you wrap the call in a RetryTemplate.execute()).
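In other words, the direct call from the first listener needs to be wrapped, roughly like this (my own sketch; retryTemplate is assumed to be an injected RetryTemplate, and the listener id is made up):

@KafkaListener(id = "so60172304.1", topics = "topic")
public void listen1(String in) {
    // business-logic failure path: invoke the retryTopic listener directly,
    // inside execute(), so RetrySynchronizationManager.getContext() is populated
    retryTemplate.execute(context -> {
        listen2(in); // plain method call, no Kafka round trip
        return null;
    });
}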
In 2.5.x a delivery attempt header will be available (optionally), even if you use the SeekToCurrentErrorHandler with a BackOff instead of a RetryTemplate in the container factory.
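For completeness, on 2.5.x that header can be enabled and read roughly like this (a sketch against the newer API, not the 2.2.x version used in the question):

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // ask the container to populate the kafka_deliveryAttempt header
    factory.getContainerProperties().setDeliveryAttemptHeader(true);
    return factory;
}

@KafkaListener(id = "so60172304.2", topics = "retryTopic")
public void listen2(ConsumerRecord<String, String> record) {
    // the header value is a 4-byte big-endian int, starting at 1
    int deliveryAttempt = ByteBuffer
            .wrap(record.headers().lastHeader(KafkaHeaders.DELIVERY_ATTEMPT).value())
            .getInt();
}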
Hi, I am migrating my application from JRun to WebSphere, and it has asynchronous messaging functionality using a message listener.
public void startMessageListener(final String queueName,
        final MessageListener listener) throws Exception {
    QueueSession queueSession = createNewQueueSession();
    Queue queue = (Queue) queueContext.lookup(queueName);
    QueueReceiver queueReceiver = queueSession.createReceiver(queue);
    queueReceiver.setMessageListener(listener);
    // LOG Forging fix - Fortify Scan TTP 345546 - START
    log.debug("started queue " + queueName);
    // LOG Forging fix - Fortify Scan TTP 345546 - END
}
When I use the same code in WebSphere, it throws javax.jms.IllegalStateException: Method setMessageListener not permitted.
That is because the JMS spec does not allow you to use this feature in a JEE container.
Please help me make it work with as few code changes as possible.
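One possible low-change approach (my suggestion, not part of the original answer) is to let Spring's DefaultMessageListenerContainer drive the existing MessageListener; it polls with synchronous receive() calls internally, so it avoids the setMessageListener() call that the JEE container forbids. A sketch, assuming Spring JMS is on the classpath:

// Sketch: org.springframework.jms.listener.DefaultMessageListenerContainer
// reuses the existing MessageListener unchanged and is safe in a JEE container.
public DefaultMessageListenerContainer startMessageListener(ConnectionFactory connectionFactory,
        Destination queue, MessageListener listener) {
    DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
    container.setConnectionFactory(connectionFactory);
    container.setDestination(queue);
    container.setMessageListener(listener);
    container.afterPropertiesSet(); // normally invoked by the Spring container
    container.start();
    return container;
}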