Spring Kafka - "ErrorHandler threw an exception" and lost some records - spring-kafka

I have a consumer polling 2 records at a time, i.e.:
@Bean
ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> config = Map.of(
            BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
            GROUP_ID_CONFIG, "my-consumers",
            AUTO_OFFSET_RESET_CONFIG, "earliest",
            MAX_POLL_RECORDS_CONFIG, 2);
    return new DefaultKafkaConsumerFactory<>(config, new StringDeserializer(), new StringDeserializer());
}
and an ErrorHandler which can itself fail while handling a faulty record:
class MyListenerErrorHandler implements ContainerAwareErrorHandler {

    @Override
    public void handle(Exception thrownException,
                       List<ConsumerRecord<?, ?>> records,
                       Consumer<?, ?> consumer,
                       MessageListenerContainer container) {
        simulateBugInErrorHandling(records.get(0));
        skipFailedRecord(); // seek offset+1, which never happens
    }

    private void simulateBugInErrorHandling(ConsumerRecord<?, ?> record) {
        throw new NullPointerException(
                "DB transaction failed when saving info about failure on offset = " + record.offset());
    }
}
Then the following scenario is possible:
Topic gets 3 records
Consumer polls 2 records at a time
MessageListener fails to process the first record due to faulty payload
ErrorHandler fails to process the failure and itself throws an exception, e.g. due to some temporary issue
Third record gets processed
Second record is never processed (never enters MessageListener)
How to ensure no record is left unprocessed when ErrorHandler throws an exception with above scenario?
My goal is to achieve stateful retry logic with delays, but for brevity I omitted code responsible for tracking failed records and delaying retry.
I'd expect that after ErrorHandler throws an exception, skipping an entire batch of records should not happen. But it does.
Is it correct behavior?
Should I rather deal with commits manually than use the Spring/Kafka defaults?
Should I use a different ErrorHandler or handle method? (I need access to the Container to pause() it for the delayed retry logic; I cannot use Thread.sleep().)
Somehow related issue: https://github.com/spring-projects/spring-kafka/issues/1265
Full code: https://github.com/ptomaszek/spring-kafka-error-handler

The consumer has to be re-positioned (using seeks) in order to re-fetch the records after the failed one.
Use a DefaultErrorHandler (2.8.x and later) or a SeekToCurrentErrorHandler with earlier versions.
You can add retry options and a recoverer to deal with the failed record; by default it is just logged.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#default-eh
https://docs.spring.io/spring-kafka/docs/2.7.x/reference/html/#seek-to-current
You need to do the seeks first (or in a finally block), before any exceptions can be thrown; the container does not commit the offset if the error handler throws an exception.
Kafka maintains 2 offsets - the current committed offset and the current position (set to the committed offset when the consumer starts). The next poll always returns the records after those returned by the last poll, unless a seek is performed.
The default error handlers catch any exceptions thrown by the recoverer and make sure that the current (and subsequent) records will be returned by the next poll. See SeekUtils.doSeeks().
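For illustration, a minimal sketch (assuming spring-kafka 2.8+; the bean name, the KafkaTemplate wiring and the back-off values are examples, not taken from the question) of a DefaultErrorHandler with retries and a dead-letter recoverer:
@Bean
DefaultErrorHandler errorHandler(KafkaTemplate<String, String> template) {
    // after the back-off below is exhausted, publish the failed record to <topic>.DLT
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template);
    // 2 retries, 1 second apart (3 delivery attempts in total), then recover
    return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 2L));
}
Set it on the listener container factory with setCommonErrorHandler(...) (recent Boot versions also pick up a single CommonErrorHandler bean automatically); if the recoverer itself throws, the handler re-seeks, so the failed record is redelivered rather than skipped.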

Related

How to immediately stop processing new messages when inside a message handler?

I have a Rebus bus set up with a single worker and max parallelism of 1 that processes messages "sequentially". In case a handler fails, or for specific business reasons, I'd like the bus instance to immediately stop processing messages.
I tried using the Rebus.Event package to detect the exception in the AfterMessageHandled handler and set the number of workers to 0, but it seems other messages are processed before it can actually succeed in stopping the single worker instance.
Where in the event processing pipeline could I do
bus.Advanced.Workers.SetNumberOfWorkers(0); in order to prevent further message processing?
I also tried setting the number of workers to 0 inside a catch block in the handler itself, but it doesn't seem like the right place to do it, since SetNumberOfWorkers(0) waits for handlers to complete before returning and the caller is the handler... Looks like some kind of deadlock to me.
Thank you
This particular situation is a little bit of a dilemma, because – as you've correctly observed – SetNumberOfWorkers is a blocking function, which will wait until the desired number of threads has been reached.
In your case, since you're setting it to zero, it means your message handler needs to finish before the number of threads has reached zero... and then: 💣 ☠🔒
I'm sorry to say this, because I bet your desire to do this is because you're in a pickle somehow – but generally, I must say that wanting to process messages sequentially and in order with message queues is begging for trouble, because there are so many things that can lead to messages being reordered.
But, I think you can solve your problem by installing a transport decorator, which will bypass the real transport when toggled. If the decorator then returns null from the Receive method, it will trigger Rebus' built-in back-off strategy and start chilling (i.e. it will increase the waiting time between polling the transport).
Check this out – first, let's create a simple, thread-safe toggle:
public class MessageHandlingToggle
{
    public volatile bool ProcessMessages = true;
}
(which you'll probably want to wrap up and make pretty somehow, but this should do for now)
and then we'll register it as a singleton in the container (assuming Microsoft DI here):
services.AddSingleton(new MessageHandlingToggle());
We'll use the ProcessMessages flag to signal whether message processing should be enabled.
Now, when you configure Rebus, you decorate the transport and give the decorator access to the toggle instance in the container:
services.AddRebus((configure, provider) =>
    configure
        .Transport(t => {
            t.Use(...);

            // install transport decorator here
            t.Decorate(c => {
                var transport = c.Get<ITransport>();
                var toggle = provider.GetRequiredService<MessageHandlingToggle>();
                return new MessageHandlingToggleTransportDecorator(transport, toggle);
            });
        })
        .(...)
);
So, now you'll just need to build the decorator:
public class MessageHandlingToggleTransportDecorator : ITransport
{
    static readonly Task<TransportMessage> NoMessage = Task.FromResult<TransportMessage>(null);

    readonly ITransport _transport;
    readonly MessageHandlingToggle _toggle;

    public MessageHandlingToggleTransportDecorator(ITransport transport, MessageHandlingToggle toggle)
    {
        _transport = transport;
        _toggle = toggle;
    }

    public string Address => _transport.Address;

    public void CreateQueue(string address) => _transport.CreateQueue(address);

    public Task Send(string destinationAddress, TransportMessage message, ITransactionContext context)
        => _transport.Send(destinationAddress, message, context);

    public Task<TransportMessage> Receive(ITransactionContext context, CancellationToken cancellationToken)
        => _toggle.ProcessMessages
            ? _transport.Receive(context, cancellationToken)
            : NoMessage;
}
As you can see, it'll just return null when ProcessMessages == false. Only thing left is to decide when to resume processing messages again, pull MessageHandlingToggle from the container somehow (probably by having it injected), and then flick the bool back to true.
I hope this can work for you, or at least gives you some inspiration for how you can solve your problem. 🙂

Deserialisation error and logging the partition, topic and offset

I am handling deserialisation errors using the ErrorHandlingDeserializer set on my DefaultKafkaConsumerFactory.
I have coded a custom function:
try (ErrorHandlingDeserializer<MyEvent> errorHandlingDeserializer = new ErrorHandlingDeserializer<>(theRealDeserialiser)) {
    errorHandlingDeserializer.setFailedDeserializationFunction(myCustomFunction::apply);
    return new DefaultKafkaConsumerFactory<>(getConsumerProperties(), consumerKeyDeserializer, errorHandlingDeserializer);
}
My custom function does some processing and publishes to a poison pill topic and returns null.
When a deserialisation error occurs, I would like to log the topic, partition and offset. The only way I can think of doing this is to stop returning null in the function and return a new sub type of MyEvent. My KafkaListener could then interrogate the new sub type.
I have a @KafkaListener component, which listens for the ConsumerRecord as follows:
@KafkaListener(....)
public void onMessage(ConsumerRecord<String, MyEvent> record) {
    ...
    ...
    // if record.value instance of MyNewSubType
    // I have access to the topic, partition and offset here, so I could log it here
    // I'd have to check that the instance of MyEvent is actually my sub type representing a failed record.
}
Is this the way to do it? I know null has a special meaning in Kafka.
The downside of this sub-type approach is that I'd have to create a subtype for every type using the ErrorHandlingDeserializer.
Don't use a function; instead, the thrown DeserializationException is passed directly to the container's error handler.
The SeekToCurrentErrorHandler considers these exceptions to be fatal and won't retry them; it passes the record straight to the recoverer.
There is a provided DeadLetterPublishingRecoverer which sends the failed record to a dead-letter topic.
See https://docs.spring.io/spring-kafka/docs/current/reference/html/#annotation-error-handling
and
https://docs.spring.io/spring-kafka/docs/current/reference/html/#dead-letters
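For illustration, a minimal sketch (assumptions: spring-kafka 2.3+, a KafkaOperations/KafkaTemplate able to publish the raw failed value, and an illustrative bean name) of the handler-plus-recoverer combination described above:
@Bean
SeekToCurrentErrorHandler errorHandler(KafkaOperations<Object, Object> template) {
    // DeserializationExceptions are treated as fatal and are not retried; the recoverer
    // publishes the failed record to <topic>.DLT with headers carrying the original
    // topic, partition, offset and the exception, so they can be logged from there
    return new SeekToCurrentErrorHandler(new DeadLetterPublishingRecoverer(template));
}
Wire this handler into the listener container factory (or, with Boot, simply expose it as a bean), and there is no need for the sub-type workaround.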

Spring Kafka Manual Immediate Acknowledgement along with SeekToCurrentErrorHandler

I am referring to this answer:
https://stackoverflow.com/questions/56728833/seektocurrenterrorhandler-deadletterpublishingrecoverer-is-not-handling-deseria
Can we add manual immediate acknowledgement like below:
#KafkaListener(id = "so56728833", topics = "so56728833")
public void listen(Foo in, Acknowledgment ack {
System.out.println(in);
if (in.getBar().equals("baz")) {
throw new IllegalStateException("Test retries");
}
ack.acknowledge();
}
I want this because of the following scenario:
Let's say I have processed 100 messages; now, while processing the next 10 records, my consumer goes down after processing 4 of them. In this case, a rebalance will be triggered and these 4 messages will be processed again because I have not committed their offsets.
Please help.
Yes, you can use manual immediate here - you can also use AckMode.RECORD and the container will automatically commit each offset after the record has been processed.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets
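For reference, a minimal sketch (assuming a manually configured factory on a recent spring-kafka version; the Foo type mirrors the question, the bean wiring is illustrative) of switching the container to AckMode.RECORD:
@Bean
ConcurrentKafkaListenerContainerFactory<String, Foo> kafkaListenerContainerFactory(
        ConsumerFactory<String, Foo> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, Foo> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // commit each offset as soon as the listener returns normally for that record
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.RECORD);
    return factory;
}
With Spring Boot this can also be set via spring.kafka.listener.ack-mode=record; with MANUAL_IMMEDIATE you keep the Acknowledgment parameter and call ack.acknowledge() yourself, as in the listener above.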

When using @StreamListener, are customizations to the KafkaListenerContainerFactory reflected in the generated KafkaMessageListenerContainer?

I am using spring-cloud-stream with the Kafka binder to consume messages from Kafka. The application basically consumes messages from Kafka and updates a database.
There are scenarios when the DB is down (which might last for hours) or some other temporary technical issue occurs. Since in these scenarios there is no point in retrying a message a limited number of times and then moving it to the DLQ, I am trying to achieve an infinite number of retries when certain types of exceptions occur (e.g. DBHostNotAvaialableException).
To achieve this I tried 2 approaches (and faced issues in both):
In the first approach, I tried setting an error handler on the container properties while configuring the ConcurrentKafkaListenerContainerFactory bean, but the error handler is not getting triggered at all. While debugging the flow I realized that the KafkaMessageListenerContainers that are created have a null errorHandler field, hence they use the default LoggingErrorHandler. Below is my container factory bean configuration -
the @StreamListener method for this approach is the same as in the second approach, except for the seek on the consumer.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object>
        kafkaListenerContainerFactory(ConsumerFactory<String, Object> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory);
    factory.getContainerProperties().setAckOnError(false);
    ContainerProperties containerProperties = factory.getContainerProperties();
    // even tried a custom implementation of RemainingRecordsErrorHandler, but the call never went into the implementation
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}
Am I missing something while configuring the factory bean, or is this bean only relevant for @KafkaListener and not @StreamListener?
In the second approach I tried to achieve it using manual acknowledgement and seek: inside a @StreamListener method I get the Acknowledgment and Consumer from the headers; when a retryable exception is received, I do a certain number of retries using a RetryTemplate, and when those are exhausted I trigger a consumer.seek(). Example code below -
@StreamListener(MySink.INPUT)
public void processInput(Message<String> msg) {
    MessageHeaders msgHeaders = msg.getHeaders();
    Acknowledgment ack = msgHeaders.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
    Consumer<?, ?> consumer = msgHeaders.get(KafkaHeaders.CONSUMER, Consumer.class);
    Integer partition = msgHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class);
    String topicName = msgHeaders.get(KafkaHeaders.RECEIVED_TOPIC, String.class);
    Long offset = msgHeaders.get(KafkaHeaders.OFFSET, Long.class);

    try {
        retryTemplate.execute(context -> {
            // this is a sample service call to update the database, which might throw retryable exceptions like DBHostNotAvaialableException
            consumeMessage(msg.getPayload());
            return null;
        });
    }
    catch (DBHostNotAvaialableException ex) {
        // once retries as per the retry template are exhausted, do a seek
        consumer.seek(new TopicPartition(topicName, partition), offset);
    }
    catch (Exception ex) {
        // if some other exception, just log and put in the DLQ based on the enableDlq property
        logger.warn("some other business exception hence putting in dlq ");
        throw ex;
    }

    if (ack != null) {
        ack.acknowledge();
    }
}
Problem with this approach: since I am doing consumer.seek(), there might still be pending records from the last poll; those might be processed and committed if the DB comes up during that period (hence out of order). Is there a way to discard those records when a seek is performed?
PS - we are currently on Spring Boot 2.0.3.RELEASE with the Finchley.RELEASE Spring Cloud dependencies (hence we cannot use features like negative acknowledgement either, and an upgrade is not possible at this moment).
Spring Cloud Stream does not use a container factory. I already explained that to you in this answer.
Version 2.1 introduced the ListenerContainerCustomizer and if you add a bean of that type it will be called after the container is created.
Spring Boot 2.0 went end-of-life over a year ago and is no longer supported.
The answer I referred you to shows how you can use reflection to add an error handler.
Doing the seek in the listener will only work if you have max.poll.records=1.
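For reference, on Spring Cloud Stream 2.1+ (with spring-kafka 2.2+) such a customizer bean might look roughly like this; the SeekToCurrentErrorHandler used here is just an example of what could be applied, not the only option:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> containerCustomizer() {
    // the binder calls this for every listener container it creates
    return (container, destinationName, group) ->
            container.setErrorHandler(new SeekToCurrentErrorHandler());
}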

EJB 3.0 - Sequence of transactionally independent EJB calls with CMT

There is an MDB and a sequence of stateless EJBs to do some work with a message (WAS 7.0, Java EE 5, EJB 3.0, JPA).
Sequence (using CMT):
MDB accepts message
MDB persists entity with the message details
MDB calls EJB1, passing the entity's ID,
EJB1 does its piece of work with the message; depending on whether EJB1 succeeds or not, it calls
EJB2 or EJB5 passing the ID
EJB2 does its piece of ...
and so on till the last EJB (running time is several minutes).
All of this happens in one transaction: so if something is thrown in EJB4, everything that happened before in this transaction will be rolled back.
I've tried to use REQUIRES_NEW for all the subsequent calls, but it seems that changes made in previous calls are not visible to subsequent calls.
Also, the transaction becomes too long and sometimes times out.
I'd like to have separate, independent transactions for:
A receiving and persisting message
B processing in EJB1
C processing in EJB2
....
so that if execution of EJB2 fails, the message should remain in the DB and the result of EJB1's execution should be persisted as well.
So the main question is: using CMT, is it possible to have short and independent transactions for the sequence?
Some more questions:
Can the transaction originated in the MDB be committed independently of the results of the call to EJB1? Moreover, committed before the call to EJB1?
Can changes on an entity made inside the MDB be visible inside the call to an EJB1 method with the REQUIRES_NEW attribute?
Is there a way except BTM or WorkManager to achieve the goal?
You shouldn't have any issues with REQUIRES_NEW.
If you had this:
@EJB(..)
EJB1 ejb1;

@EJB(..)
EJB2 ejb2;

public void onMessage(Message message) {
    Thing thing = getThingFromMessage(message);
    persistThingStuff(thing);

    ejb1.doThingStuffWithRequiresNew(thing);
    ejb2.doThingStuffWithRequiresNew(thing);
}
That should Just Work(tm) with one caveat.
If ejb2 throws an exception, ejb1's work will be committed, but persistThingStuff will rollback.
But if you do something like:
public void onMessage(Message message) {
    Thing thing = getThingFromMessage(message);
    persistThingStuff(thing);

    ejb1.doThingStuffWithRequiresNew(thing);

    try {
        ejb2.doThingStuffWithRequiresNew(thing);
    } catch (Throwable t) {
        youBetterLogThis();
    }
}
That should prevent any exceptions from blowing out the work of the MDB, however if EJB2 is long running, any work done by the MDB is still pending and in an open transaction, waiting for it.
To get around a lot of these things, we have a utility EJB function:
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public <T extends Runnable> T runInTransaction(T runner) {
    runner.run();
    return runner;
}
Then we don't necessarily have to annotate specific methods for this.
@EJB(...)
UtilEJB utilEjb;

public void onMessage(Message message) {
    final Thing thing = getThingFromMessage(message);

    utilEjb.runInTransaction(new Runnable() {
        public void run() {
            persistThingStuff(thing);
        }
    });

    ejb1.doThingStuffWithRequiresNew(thing);

    try {
        ejb2.doThingStuffWithRequiresNew(thing);
    } catch (Throwable t) {
        youBetterLogThis();
    }
}
This will commit the MDB work immediately, even before EJB1.
