Deserialisation error and logging the partition, topic and offset - spring-kafka

I am handling deserialisation errors using the ErrorHandlingDeserializer set on my DefaultKafkaConsumerFactory.
I have coded a custom function:
try (ErrorHandlingDeserializer<MyEvent> errorHandlingDeserializer = new ErrorHandlingDeserializer<>(theRealDeserialiser)) {
    errorHandlingDeserializer.setFailedDeserializationFunction(myCustomFunction::apply);
    return new DefaultKafkaConsumerFactory<>(getConsumerProperties(), consumerKeyDeserializer, errorHandlingDeserializer);
}
My custom function does some processing and publishes to a poison pill topic and returns null.
When a deserialisation error occurs, I would like to log the topic, partition and offset. The only way I can think of doing this is to stop returning null in the function and return a new sub type of MyEvent. My KafkaListener could then interrogate the new sub type.
I have a @KafkaListener component, which listens for the ConsumerRecord as follows:
@KafkaListener(....)
public void onMessage(ConsumerRecord<String, MyEvent> record) {
    ...
    ...
    // if record.value() is an instance of MyNewSubType
    // I have access to the topic, partition and offset here, so I could log it here
    // I'd have to check that the instance of MyEvent is actually my sub type representing a failed record.
}
Is this the way to do it? I know null has a special meaning in Kafka.
The downside of this sub type approach is that I'd have to create a sub type for every type that uses the ErrorHandlingDeserializer.

Don't use a function; instead, the thrown DeserializationException is passed directly to the container's ErrorHandler.
The SeekToCurrentErrorHandler considers these exceptions to be fatal and won't retry them; it passes the record to the recoverer.
There is a provided DeadLetterPublishingRecoverer which sends the record to a dead-letter topic.
See https://docs.spring.io/spring-kafka/docs/current/reference/html/#annotation-error-handling
and
https://docs.spring.io/spring-kafka/docs/current/reference/html/#dead-letters

Spring Kafka - "ErrorHandler threw an exception" and lost some records

Given a consumer that polls 2 records at a time, i.e.:
@Bean
ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> config = Map.of(
            BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
            GROUP_ID_CONFIG, "my-consumers",
            AUTO_OFFSET_RESET_CONFIG, "earliest",
            MAX_POLL_RECORDS_CONFIG, 2);
    return new DefaultKafkaConsumerFactory<>(config, new StringDeserializer(), new StringDeserializer());
}
and an ErrorHandler that can itself fail while handling a faulty record:
class MyListenerErrorHandler implements ContainerAwareErrorHandler {
    @Override
    public void handle(Exception thrownException,
                       List<ConsumerRecord<?, ?>> records,
                       Consumer<?, ?> consumer,
                       MessageListenerContainer container) {
        simulateBugInErrorHandling(records.get(0));
        skipFailedRecord(); // seek offset+1, which never happens
    }
    private void simulateBugInErrorHandling(ConsumerRecord<?, ?> record) {
        throw new NullPointerException(
                "DB transaction failed when saving info about failure on offset = " + record.offset());
    }
}
Then the following scenario is possible:
1. Topic gets 3 records
2. Consumer polls 2 records at a time
3. MessageListener fails to process the first record due to faulty payload
4. ErrorHandler fails to process the failure and itself throws an exception, e.g. due to some temporary issue
5. Third record gets processed
6. Second record is never processed (never enters MessageListener)
How can I ensure that no record is left unprocessed when the ErrorHandler throws an exception in the above scenario?
My goal is to achieve stateful retry logic with delays, but for brevity I omitted the code responsible for tracking failed records and delaying the retry.
I'd expect that after the ErrorHandler throws an exception, the rest of the batch would not be skipped. But it is.
Is this correct behavior?
Should I deal with commits manually rather than use the Spring/Kafka defaults?
Should I use a different ErrorHandler or handle method? (I need access to the container to call pause() for delayed retry logic; I cannot use Thread.sleep().)
Somewhat related issue: https://github.com/spring-projects/spring-kafka/issues/1265
Full code: https://github.com/ptomaszek/spring-kafka-error-handler
The consumer has to be re-positioned (using seeks) in order to re-fetch the records after the failed one.
Use a DefaultErrorHandler (2.8.x and later) or a SeekToCurrentErrorHandler with earlier versions.
You can add retry options and a recoverer to deal with the failed record; by default it is just logged.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#default-eh
https://docs.spring.io/spring-kafka/docs/2.7.x/reference/html/#seek-to-current
You need to do the seeks first (or in a finally block), before any exceptions can be thrown; the container does not commit the offset if the error handler throws an exception.
Kafka maintains 2 offsets: the current committed offset and the current position (set to the committed offset when the consumer starts). The next poll always returns the next record after the last poll, unless a seek is performed.
The default error handlers catch any exceptions thrown by the recoverer and make sure that the current (and subsequent) records will be returned by the next poll. See SeekUtils.doSeeks().
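
To make the position-versus-seek point concrete, here is a minimal sketch in plain Java. It is a toy model, not the Kafka client or spring-kafka API: ToyConsumer, SeekDemo and their methods are hypothetical names invented for this illustration, and the "container" is just a loop that abandons the rest of a batch once a record fails. It replays the 3-record scenario from the question, with the error handler always throwing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy single-partition consumer (hypothetical; not the Kafka Consumer API).
class ToyConsumer {
    private final List<String> log;
    private int position = 0; // offset that the next poll will start from

    ToyConsumer(List<String> log) { this.log = log; }

    // Returns up to max offsets starting at the current position and advances it.
    List<Integer> poll(int max) {
        List<Integer> offsets = new ArrayList<>();
        while (offsets.size() < max && position < log.size()) {
            offsets.add(position++);
        }
        return offsets;
    }

    String valueAt(int offset) { return log.get(offset); }

    void seek(int offset) { position = offset; }
}

class SeekDemo {
    // Returns the offsets that reached the listener, in delivery order.
    static List<Integer> run(boolean seekInFinally) {
        ToyConsumer consumer = new ToyConsumer(List.of("faulty", "ok", "ok"));
        List<Integer> seenByListener = new ArrayList<>();
        List<Integer> batch;
        while (!(batch = consumer.poll(2)).isEmpty()) {
            for (int offset : batch) {
                seenByListener.add(offset);
                if (consumer.valueAt(offset).equals("faulty")) {
                    try {
                        handleError(consumer, offset, seekInFinally);
                    } catch (RuntimeException e) {
                        // the toy container only notes that the handler failed
                    }
                    break; // the rest of this batch is not delivered
                }
            }
        }
        return seenByListener;
    }

    // Error handler that always fails; the seek survives only if it is in finally.
    static void handleError(ToyConsumer consumer, int failedOffset, boolean seekInFinally) {
        try {
            throw new IllegalStateException("DB transaction failed for offset " + failedOffset);
        } finally {
            if (seekInFinally) {
                consumer.seek(failedOffset + 1); // skip only the failed record
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In this model run(false) yields [0, 2] (offset 1 never reaches the listener, as in the question), while run(true) yields [0, 1, 2] because the seek in the finally block repositions the consumer before the exception propagates.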

How to immediately stop processing new messages when inside a message handler?

I have a Rebus bus set up with a single worker and a max parallelism of 1 that processes messages "sequentially". If a handler fails, or for a specific business reason, I'd like the bus instance to immediately stop processing messages.
I tried using the Rebus.Events package to detect the exception in the AfterMessageHandled handler and set the number of workers to 0, but it seems other messages are processed before it can actually succeed in stopping the single worker instance.
Where in the event processing pipeline could I do
bus.Advanced.Workers.SetNumberOfWorkers(0); in order to prevent further message processing?
I also tried setting the number of workers to 0 inside a catch block in the handler itself, but it doesn't seem like the right place to do it, since SetNumberOfWorkers(0) waits for handlers to complete before returning and the caller is the handler... Looks like some kind of deadlock to me.
Thank you
This particular situation is a little bit of a dilemma, because – as you've correctly observed – SetNumberOfWorkers is a blocking function, which will wait until the desired number of threads has been reached.
In your case, since you're setting it to zero, it means your message handler needs to finish before the number of threads has reached zero... and then: 💣 ☠🔒
I'm sorry to say this, because I bet your desire to do this is because you're in a pickle somehow – but generally, I must say that wanting to process messages sequentially and in order with message queues is begging for trouble, because there are so many things that can lead to messages being reordered.
But, I think you can solve your problem by installing a transport decorator, which will bypass the real transport when toggled. If the decorator then returns null from the Receive method, it will trigger Rebus' built-in back-off strategy and start chilling (i.e. it will increase the waiting time between polling the transport).
Check this out – first, let's create a simple, thread-safe toggle:
public class MessageHandlingToggle
{
    public volatile bool ProcessMessages = true;
}
(which you'll probably want to wrap up and make pretty somehow, but this should do for now)
and then we'll register it as a singleton in the container (assuming Microsoft DI here):
services.AddSingleton(new MessageHandlingToggle());
We'll use the ProcessMessages flag to signal whether message processing should be enabled.
Now, when you configure Rebus, you decorate the transport and give the decorator access to the toggle instance in the container:
services.AddRebus((configure, provider) =>
    configure
        .Transport(t => {
            t.Use(...);
            // install transport decorator here
            t.Decorate(c => {
                var transport = c.Get<ITransport>();
                var toggle = provider.GetRequiredService<MessageHandlingToggle>();
                return new MessageHandlingToggleTransportDecorator(transport, toggle);
            });
        })
        .(...)
);
So, now you'll just need to build the decorator:
using System.Threading;
using System.Threading.Tasks;
using Rebus.Messages;
using Rebus.Transport;

public class MessageHandlingToggleTransportDecorator : ITransport
{
    // The type argument is required: it cannot be inferred from a null literal.
    static readonly Task<TransportMessage> NoMessage = Task.FromResult<TransportMessage>(null);
    readonly ITransport _transport;
    readonly MessageHandlingToggle _toggle;
    public MessageHandlingToggleTransportDecorator(ITransport transport, MessageHandlingToggle toggle)
    {
        _transport = transport;
        _toggle = toggle;
    }
    public string Address => _transport.Address;
    public void CreateQueue(string address) => _transport.CreateQueue(address);
    public Task Send(string destinationAddress, TransportMessage message, ITransactionContext context)
        => _transport.Send(destinationAddress, message, context);
    // Bypass the real transport while the toggle is off, so Rebus sees "no message".
    public Task<TransportMessage> Receive(ITransactionContext context, CancellationToken cancellationToken)
        => _toggle.ProcessMessages
            ? _transport.Receive(context, cancellationToken)
            : NoMessage;
}
As you can see, it'll just return null when ProcessMessages == false. Only thing left is to decide when to resume processing messages again, pull MessageHandlingToggle from the container somehow (probably by having it injected), and then flick the bool back to true.
I hope this can work for you, or at least give you some inspiration on how you can solve your problem. 🙂
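
For readers who want to try the toggle idea outside of Rebus, here is a language-neutral sketch written in Java. It only illustrates the decorator pattern described above; Transport, QueueTransport and ToggleTransport are hypothetical names, not Rebus types, and the real ITransport interface has more members. The assumption carried over from the answer is that a receive call returning null means "no message available".

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a transport; receive() returns null when idle.
interface Transport {
    String receive();
}

// Simple in-memory transport used only for this illustration.
class QueueTransport implements Transport {
    private final Queue<String> queue = new ArrayDeque<>();

    void enqueue(String message) { queue.add(message); }

    @Override
    public String receive() { return queue.poll(); } // null if the queue is empty
}

// Decorator that bypasses the real transport while the toggle is off.
class ToggleTransport implements Transport {
    private final Transport inner;
    private final AtomicBoolean processMessages;

    ToggleTransport(Transport inner, AtomicBoolean processMessages) {
        this.inner = inner;
        this.processMessages = processMessages;
    }

    @Override
    public String receive() {
        // Messages stay in the inner transport until the toggle is switched back on.
        return processMessages.get() ? inner.receive() : null;
    }
}

class ToggleDemo {
    public static void main(String[] args) {
        QueueTransport queue = new QueueTransport();
        queue.enqueue("a");
        queue.enqueue("b");
        AtomicBoolean toggle = new AtomicBoolean(true);
        Transport transport = new ToggleTransport(queue, toggle);

        System.out.println(transport.receive()); // toggle on: delivers "a"
        toggle.set(false);                       // e.g. set from a failing handler
        System.out.println(transport.receive()); // toggle off: null, "b" stays queued
        toggle.set(true);
        System.out.println(transport.receive()); // toggle on again: delivers "b"
    }
}
```

The key design point is that the handler only flips a flag and returns, so nothing has to block waiting for the handler to finish.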

When using @StreamListener, are customizations to KafkaListenerContainerFactory reflected in the generated KafkaMessageListenerContainer?

I am using spring-cloud-stream with the Kafka binder to consume messages from Kafka. The application basically consumes messages from Kafka and updates a database.
There are scenarios when the DB is down (which might last for hours) or there are other temporary technical issues. Since in these scenarios there is no point in retrying a message for a limited amount of time and then moving it to the DLQ, I am trying to achieve an infinite number of retries when we get certain types of exceptions (e.g. DBHostNotAvaialableException).
In order to achieve this I tried 2 approaches (and am facing issues with both).
In the first approach, I tried setting an error handler on the container properties while configuring the ConcurrentKafkaListenerContainerFactory bean, but the error handler is not triggered at all. While debugging the flow I realized that the KafkaMessageListenerContainer instances that are created have a null errorHandler field, hence they use the default LoggingErrorHandler.
The @StreamListener method for this approach is the same as in the 2nd approach, except for the seek on the consumer. My container factory bean configuration is below:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object>
kafkaListenerContainerFactory(ConsumerFactory<String, Object> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory);
    factory.getContainerProperties().setAckOnError(false);
    ContainerProperties containerProperties = factory.getContainerProperties();
    // even tried a custom implementation of RemainingRecordsErrorHandler but the call never went into the implementation
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}
Am I missing something while configuring the factory bean, or is this bean only relevant for @KafkaListener and not @StreamListener?
The second alternative was to try to achieve it using manual acknowledgement and seek. Inside a @StreamListener method I get the Acknowledgment and Consumer from the headers; if a retryable exception is received, I do a certain number of retries using a RetryTemplate, and when those are exhausted I trigger a consumer.seek(). Example code below:
@StreamListener(MySink.INPUT)
public void processInput(Message<String> msg) {
    MessageHeaders msgHeaders = msg.getHeaders();
    Acknowledgment ack = msgHeaders.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
    Consumer<?,?> consumer = msgHeaders.get(KafkaHeaders.CONSUMER, Consumer.class);
    Integer partition = msgHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class);
    String topicName = msgHeaders.get(KafkaHeaders.RECEIVED_TOPIC, String.class);
    Long offset = msgHeaders.get(KafkaHeaders.OFFSET, Long.class);
    try {
        retryTemplate.execute(
            context -> {
                // this is a sample service call to update database which might throw retryable exceptions like DBHostNotAvaialableException
                consumeMessage(msg.getPayload());
                return null;
            }
        );
    }
    catch (DBHostNotAvaialableException ex) {
        // once retries as per retrytemplate are exhausted do a seek
        consumer.seek(new TopicPartition(topicName, partition), offset);
    }
    catch (Exception ex) {
        // if some other exception just log and put in dlq based on enableDlq property
        logger.warn("some other business exception hence putting in dlq ");
        throw ex;
    }
    if (ack != null) {
        ack.acknowledge();
    }
}
Problem with this approach: since I am doing consumer.seek() while there might be pending records from the last poll, those might be processed and committed if the DB comes up during that period (hence out of order). Is there a way to clear those records when a seek is performed?
PS: we are currently on the 2.0.3.RELEASE version of Spring Boot and Finchley.RELEASE of the Spring Cloud dependencies (hence we cannot use features like negative acknowledgement either, and an upgrade is not possible at this moment).
Spring Cloud Stream does not use a container factory. I already explained that to you in this answer.
Version 2.1 introduced the ListenerContainerCustomizer and if you add a bean of that type it will be called after the container is created.
Spring Boot 2.0 went end-of-life over a year ago and is no longer supported.
The answer I referred you to shows how you can use reflection to add an error handler.
Doing the seek in the listener will only work if you have max.poll.records=1.
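
The max.poll.records point can be illustrated with a small plain-Java sketch. This is a toy model, not the Kafka client API: ListenerSeekDemo is a hypothetical name, the consumer position is just an int, and the "database" is assumed to be down for the first delivery only. The assumption it demonstrates is the one from the question: a seek changes where the next poll starts, but records already fetched by the last poll are still handed to the listener.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerSeekDemo {
    // Returns the offsets that were successfully processed, in processing order.
    static List<Integer> run(int maxPollRecords) {
        int logSize = 2;      // the partition holds offsets 0 and 1
        int position = 0;     // toy stand-in for the consumer position
        boolean dbUp = false; // the "database" is down for the first attempt only
        List<Integer> processed = new ArrayList<>();

        while (position < logSize) {
            // poll: fetch up to maxPollRecords offsets and advance the position
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < maxPollRecords && position < logSize) {
                batch.add(position++);
            }
            // records already fetched are delivered regardless of any seek
            for (int offset : batch) {
                if (!dbUp) {
                    position = offset; // the listener seeks back to the failed record
                    dbUp = true;       // the database recovers before the next delivery
                } else {
                    processed.add(offset);
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(2));
        System.out.println(run(1));
    }
}
```

With two records per poll the model processes offsets in the order [1, 0, 1], so offset 1 is handled before the retried offset 0 and then handled again. With one record per poll the order is [0, 1].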

Calling FluentMigrator methods inside Execute.WithConnection action

Calling FluentMigrator's builder methods while inside the action that I pass to Execute.WithConnection causes a null reference exception to be thrown.
What I am trying to do is select some data so that I may manipulate it in C#, as that is easier than manipulating it in T-SQL, and use the result of my C# operations to update the data or insert new data (to be more specific, I need to pick one query string parameter out of a stored URL string and insert it somewhere else).
The only way I see to select data within a migration is to use Execute.WithConnection and retrieve the data myself (FluentMigrator provides no helpers for selecting data), but if I try to use any fluent migrator expression in the action I pass to Execute.WithConnection a null reference exception is thrown.
Here is a boiled down version of my code:
[Migration(1)]
public class MyMigration : Migration
{
    public override void Up()
    {
        Execute.WithConnection(CustomDml);
    }
    public void CustomDml(IDbConnection conn, IDbTransaction tran)
    {
        var db = new NPoco.Database(conn).SetTransaction(tran); // NPoco is a micro-ORM, a fork of PetaPoco
        var records = db.Fetch<Record>("-- some sql"); // this is immediately evaluated, no reader is left open
        foreach (var r in records) {
            var newValue = Manipulate(r.OriginalValue);
            Insert.IntoTable("NewRecords").Row(new { OriginalValueId = r.Id, NewValue = newValue }); // <-- this line causes the exception
        }
    }
    public override void Down() {}
}
The line that calls Insert.IntoTable causes a null reference exception to be thrown from line 36 of FluentMigrator\Builders\Insert\InsertExpressionRoot.cs. It appears that the _context variable may be null at this point, but I do not understand why. (When testing Create.Table, for example, it occurs on line 49 of FluentMigrator\Builders\Create\CreateExpressionRoot.cs.)
Any help would be appreciated. Perhaps there is disagreement on whether DML is appropriate in a migration, and I am open to suggestions, but this scenario has come up twice this week alone. For now I am simply performing the insert using my micro-ORM within the action rather than FluentMigrator and that does work, but it seems like what I am trying to do should work.
When using the Execute.WithConnection expression all you get is the db connection and the transaction.
Using Execute.WithConnection creates a PerformDBOperationExpression expression. When processing the expression, a processor calls the Operation property (an example is in the SqlServerProcessor), and the processor does not have a reference to the MigrationContext. But even if it did have access to the MigrationContext, by the time FluentMigrator has reached the processing stage it is already too late. You would be trying to process expressions within an expression, and at the moment FluentMigrator is not built to handle that type of nesting.
An alternative would be to make the connection string available in the migration context, see this issue: https://github.com/schambers/fluentmigrator/issues/240
Would that be a better approach?

Code Analysis warning CA2000: Call Dispose on object 'new ContainerControlledLifetimeManager()'

I'm getting a code analysis warning on some of my unit tests:
WidgetManagerTests.cs (40): CA2000 : Microsoft.Reliability : In method 'WidgetManagerTests.TestInitialize()', call System.IDisposable.Dispose on object 'new ContainerControlledLifetimeManager()' before all references to it are out of scope.
I'm using Unity and Moq, this is the offending line:
var loggingServiceMock = new Mock<ILoggingService>();
this.unityContainer.RegisterInstance<ILoggingService>(loggingServiceMock.Object, new ContainerControlledLifetimeManager());
The CA2000 implementation is very sensitive to cases where an exception might be thrown before a disposable instance is "handed off" to another method. In this case, even though the container will eventually take care of cleaning up the lifetime manager if no exceptions occur during registration, it's possible for an exception to occur either before the RegisterInstance call or within the call but before the container adds the lifetime manager to its own internal state.
To address this possibility, you could use code like the following (although I probably wouldn't bother with this myself unless the disposition did something significant):
var loggingServiceMock = new Mock<ILoggingService>();
var lifetimeManager = new ContainerControlledLifetimeManager();
try
{
    this.unityContainer.RegisterInstance<ILoggingService>(loggingServiceMock.Object, lifetimeManager);
}
catch
{
    lifetimeManager.Dispose();
    throw;
}
