spring kafka error handling and retries - spring-kafka

Hi, we are using Spring Kafka 1.3.3 and our app is a consume-process-publish pipeline.
How can we handle retries and seek-backs when something fails in the pipeline at the produce phase? For example: the app consumes messages, processes them, and publishes them to another topic asynchronously. If there is an error in publishing:
How can I retry publishing the failed message?
If sending the message fails even after retries, how can I seek my consumer back to the previous offset? By then, my consumer position will be somewhere ahead in the log.
How can I acknowledge the message in the producer callback when the message is successfully produced?

It's easier with newer releases because you have direct access to the Consumer, but with 1.3.x you can implement ConsumerSeekAware - see the documentation. You have to perform the seek on the listener thread because the Consumer is not thread-safe.
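As a minimal sketch of that pattern against the 1.3.x API (the topic name and the process() method are hypothetical, not from the answer), the callback registered for each consumer thread can be kept in a ThreadLocal and used from the listener method to rewind the partition when publishing ultimately fails:

import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.ConsumerSeekAware;

public class SeekingListener implements ConsumerSeekAware {

    private final ThreadLocal<ConsumerSeekCallback> seekCallback = new ThreadLocal<>();

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        this.seekCallback.set(callback); // called on each consumer thread at startup
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // no-op
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // no-op
    }

    @KafkaListener(topics = "in-topic")
    public void listen(ConsumerRecord<String, String> record) {
        try {
            process(record); // consume -> process -> publish (with producer-side retries)
        }
        catch (RuntimeException e) {
            // Rewind so this record is redelivered on the next poll; rethrow so
            // the offset is not committed.
            this.seekCallback.get().seek(record.topic(), record.partition(), record.offset());
            throw e;
        }
    }

    private void process(ConsumerRecord<String, String> record) { /* hypothetical pipeline */ }
}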

Related

spring-kafka consumer throws deserialization exception when producer produces messages

I am new to spring-kafka. Basically I have a Spring Boot app with Spring Kafka (spring-kafka 2.9.0 and spring-boot 2.6.4). As long as I run the program as only a producer or only a consumer, I don't run into any issues.
But if I run the same program to produce messages to topic-A and listen to messages coming from topic-B at the same time, then I run into deserialization errors in the producer (which is confusing) while sending messages to topic-A. The producer and consumer have their own configs; the producer serializes one POJO and the consumer deserializes a different POJO, but I am failing to understand why the consumer is invoked while the producer is producing messages.
Can someone please help me understand what I am doing wrong?
My apologies; on further investigation I found that the issue was not with spring-kafka. It's a symptom of another issue: I am using a connector to read and write messages to a database. When the producer publishes a message, the sink connector publishes messages to topic-B. Since the Kafka consumer is listening to topic-B and is not configured to deserialize the newly published messages, it runs into exceptions. This has nothing to do with Spring Kafka.

How can we pause Kafka consumer polling/processing records when there is an exception because of downstream system

I'm using Spring Boot 2.1.7.RELEASE and spring-kafka 2.2.8.RELEASE. I'm using the @KafkaListener annotation to create a consumer, with all default settings for the consumer.
Now, in my consumer, the processing logic includes a DB call, and I'm sending the record to a DLT if there is an error/exception during processing.
With this setup, if the DB is down for a few minutes for some reason, I want to pause/stop my consumer from consuming more records; otherwise it keeps consuming messages, hitting the DB exception, and eventually filling up my DLT, which I don't want to happen unless the DB is back (based on some health check).
Now I have a few questions:
Does spring-kafka provide an option to trigger infinite retries based on the exception type (in this case a DB exception, but I want to add a few more exception types based on my consumer logic)?
Does spring-kafka provide an option to trigger message consumption based on a condition?
There is a ContainerStoppingErrorHandler, but it stops the container for all exceptions.
You would need to create a custom error handler that stops (or pauses) the container after a specific failure, along with some mechanism to restart (or resume) the container.
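For illustration, here is a minimal sketch against the 2.2 error-handler API (the class name and the SQLException test are assumptions, not from the answer): stop the container only for DB-style failures and delegate everything else to the default seek-to-current behavior.

import java.util.List;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.listener.ContainerAwareErrorHandler;
import org.springframework.kafka.listener.ContainerStoppingErrorHandler;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;

public class DbAwareErrorHandler implements ContainerAwareErrorHandler {

    private final ContainerStoppingErrorHandler stopHandler = new ContainerStoppingErrorHandler();

    private final SeekToCurrentErrorHandler seekHandler = new SeekToCurrentErrorHandler();

    @Override
    public void handle(Exception thrownException, List<ConsumerRecord<?, ?>> records,
            Consumer<?, ?> consumer, MessageListenerContainer container) {

        if (thrownException.getCause() instanceof java.sql.SQLException) { // assumed "DB down" signal
            // Stop the container; restart it later (e.g. from a health check) via
            // KafkaListenerEndpointRegistry.getListenerContainer(id).start().
            this.stopHandler.handle(thrownException, records, consumer, container);
        }
        else {
            // Other failures: re-seek so the unprocessed records are redelivered.
            this.seekHandler.handle(thrownException, records, consumer, container);
        }
    }
}

The handler would then be registered on the listener container factory with factory.setErrorHandler(new DbAwareErrorHandler()).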

Get Failed Messages with KafkaListener

I am using the latest version of spring-kafka with @KafkaListener and a BatchListener. In the method that listens to the list of messages, I want to call acknowledge only if the batch of records is processed. But the Spring framework does not send those messages again until I restart the application. So I used the stop() and start() methods on KafkaListenerEndpointRegistry when the records were not processed, but I feel that's not a good way of solving the problem. Is there a better way of handling this?
See the documentation for the SeekToCurrentBatchErrorHandler.
The SeekToCurrentBatchErrorHandler seeks each partition back to the first record of the batch in that partition, so the whole batch is replayed. This error handler does not support recovery because the framework cannot know which message in the batch is failing.
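For illustration, a minimal wiring sketch (bean, topic, and method names are assumptions; SeekToCurrentBatchErrorHandler was later superseded by DefaultErrorHandler in 2.8): set the handler on the container factory and throw from the listener on failure instead of stopping and starting the container.

import java.util.List;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.SeekToCurrentBatchErrorHandler;
import org.springframework.kafka.support.Acknowledgment;

@Configuration
public class BatchRetryConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setBatchListener(true);
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        // On failure, seek every partition in the batch back so the whole batch is redelivered.
        factory.setBatchErrorHandler(new SeekToCurrentBatchErrorHandler());
        return factory;
    }

    @KafkaListener(topics = "batch-topic")
    public void listen(List<String> batch, Acknowledgment ack) {
        process(batch);    // throw on failure; the error handler replays the batch
        ack.acknowledge(); // acknowledge only after the whole batch succeeds
    }

    private void process(List<String> batch) { /* hypothetical processing */ }
}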

How to handle errors and retries in spring-kafka

This is a question related to :
https://github.com/spring-projects/spring-kafka/issues/575
I'm using spring-kafka 1.3.7 and transactions in a read-process-write cycle.
For this purpose, I should use a KafkaTransactionManager (KTM) on the Spring Kafka container to enable a transaction around the whole listener process, with automatic handling of the transactional id based on the partition for zombie fencing (a 1.3.7 change).
If I understand issue #575 correctly, I cannot use a RetryTemplate in a container when using a transaction manager.
How am I supposed to handle errors and retries in such a case?
Is the default behavior with transactions infinite retries? That seems really dangerous; an unexpected exception might simply block the whole process in production.
The upcoming 2.2 release adds recovery to the DefaultAfterRollbackProcessor - so you can stop retrying after some number of attempts.
Docs Here, PR here.
It also provides an optional mechanism to send the failed record to a dead-letter topic.
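As a rough sketch against the 2.2 API (the wiring method and the attempt count are illustrative; later releases changed these constructors to take a BackOff):

import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultAfterRollbackProcessor;

public class AfterRollbackConfig {

    public void configure(ConcurrentKafkaListenerContainerFactory<String, String> factory,
            KafkaTemplate<Object, Object> template) {
        // After the transaction rolls back, the record is redelivered; after 3
        // failed deliveries the recoverer publishes it to <topic>.DLT and the
        // consumer moves past it instead of retrying forever.
        factory.setAfterRollbackProcessor(new DefaultAfterRollbackProcessor<>(
                new DeadLetterPublishingRecoverer(template), 3));
    }
}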
If you can't move to 2.2 (release candidate due at the end of this week, with GA in October), you can provide a custom AfterRollbackProcessor with similar functionality.
EDIT
Or, you could add code to your listener (or its listener-level error handler) to keep track of how many times the same record has been delivered, and handle the error there yourself.
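A minimal sketch of that bookkeeping (the attempt limit, topic, and process() method are hypothetical): count deliveries per offset and give up after a few attempts instead of rolling back forever.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;

public class TrackingListener {

    private final Map<String, Integer> attempts = new ConcurrentHashMap<>();

    @KafkaListener(topics = "in-topic")
    public void listen(ConsumerRecord<String, String> record) {
        String key = record.topic() + "-" + record.partition() + "-" + record.offset();
        int delivery = this.attempts.merge(key, 1, Integer::sum);
        try {
            process(record); // hypothetical processing
            this.attempts.remove(key);
        }
        catch (RuntimeException e) {
            if (delivery < 3) {
                throw e; // roll the transaction back; the record will be redelivered
            }
            this.attempts.remove(key);
            // Give up: log or park the record and return so the transaction commits past it.
        }
    }

    private void process(ConsumerRecord<String, String> record) { /* ... */ }
}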

Difference between resuming a suspended message or suspended service instance

We encounter a bunch of suspended service instances (around 100). We also notice more than 100 (related) suspended messages (mostly with NACKs).
What, then, is the difference between resuming a suspended service instance and resuming a suspended message?
Service instances process messages.
BizTalk breaks up services into Service Classes, such as Routing Failure, Isolated Adapter, and Messaging. Those services are distinct from the messages, though the messages are associated with a service. When something fails in BizTalk, typically both a message and a service instance show up in the BizTalk Administration Console as being suspended. If you view the details of the service, then you'll see that it contains a tab with the message(s).
In this context, the message is a property of the service instance. The service was trying to do something with the message and failed. Thus it makes sense to resume the action (the service instance), which will make use of the data (the message). It doesn't make sense to try to do something like resume a NACK (a message); instead, you should resume the service instance. The NACK can help you find out what went wrong, but if it doesn't go away after you resolve the problem and resume the service instance, it can typically be safely cleared out.
