How to avoid hitting all retryable topics for fatal-by-default exceptions? - spring-kafka

My team is writing a service that leverages the retryable topics mechanism offered by Spring Kafka (version 2.8.2). Here is a subset of the configuration:
@Bean
public ConsumerFactory<String, UploadMessage> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(
            this.springProperties.buildConsumerProperties(),
            new StringDeserializer(),
            new ErrorHandlingDeserializer<>(new KafkaMessageDeserializer()));
}
@Bean
public RetryTopicConfiguration retryTopicConfiguration(KafkaTemplate<String, Object> kafkaTemplate) {
    final var retry = this.applicationProperties.retry();
    return RetryTopicConfigurationBuilder.newInstance()
            .doNotAutoCreateRetryTopics()
            .suffixTopicsWithIndexValues()
            .maxAttempts(retry.attempts())
            .exponentialBackoff(retry.initialDelay(), retry.multiplier(), retry.maxDelay())
            .dltHandlerMethod(DeadLetterTopicProcessor.ENDPOINT_HANDLER_METHOD)
            .create(kafkaTemplate);
}
KafkaMessageDeserializer is a custom deserialiser that decodes protobuf-encoded messages and may throw a SerializationException in case of a failure. This exception is correctly captured and transformed into a DeserializationException by Spring Kafka. What I find a bit confusing is that the intercepted poison pill message then hits all of the retry topics before eventually reaching the dead letter one. Obviously it fails with exactly the same error at every step.
I know that RetryTopicConfigurationBuilder::notRetryOn may be used to skip the retry attempts for particular exception types, but what if I want to use exactly the same list of exceptions as in ExceptionClassifier::configureDefaultClassifier? Is there a way to programmatically access this information without basically duplicating the code?
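For reference, the workaround I am trying to avoid would look roughly like this (a sketch; the list below is hand-copied from what ExceptionClassifier.configureDefaultClassifier marks as fatal in 2.8.x, so it can silently drift out of date):
// Manual duplication of the fatal-by-default exception types.
// Keeping this list in sync with ExceptionClassifier is the exact burden in question.
return RetryTopicConfigurationBuilder.newInstance()
        .notRetryOn(List.of(
                DeserializationException.class,
                MessageConversionException.class,
                ConversionException.class,
                MethodArgumentResolutionException.class,
                NoSuchMethodException.class,
                ClassCastException.class))
        // ... rest of the configuration as above ...
        .create(kafkaTemplate);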

That is a good suggestion; it probably should be the default behavior (or at least an option).
Please open a feature request on GitHub.
There is a somewhat related discussion here: https://github.com/spring-projects/spring-kafka/discussions/2101

Related

KafkaMessageListenerContainer how to do nack for specific error

I am using KafkaMessageListenerContainer (with KafkaAdapter).
How can I "nack" offsets in case of a specific error, so the next poll() will pick them up again?
properties.setAckMode(ContainerProperties.AckMode.BATCH);
final KafkaMessageListenerContainer<String, String> kafkaContainer =
        new KafkaMessageListenerContainer<>(consumerFactory, properties);
kafkaContainer.setCommonErrorHandler(new CommonErrorHandler() {
    @Override
    public void handleBatch(Exception thrownException, ConsumerRecords<?, ?> data, Consumer<?, ?> consumer,
            MessageListenerContainer container, Runnable invokeListener) {
        CommonErrorHandler.super.handleBatch(thrownException, data, consumer, container, invokeListener);
    }
});
Inside handleBatch I am detecting the exception; for that specific exception I would like to nack.
I tried throwing a RuntimeException from there.
I am using Spring Boot 2.7.
Use the DefaultErrorHandler - it does exactly that (the whole batch is retried according to the back off). You can classify which exceptions are retryable or not.
If you throw a BatchListenerFailedException you can specify exactly which record in the batch had the failure and only retry it (and the following records).
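A minimal sketch of that approach (the listener body, record type, and process() call are illustrative, not from the question):
// Replace the CommonErrorHandler override with a DefaultErrorHandler.
// FixedBackOff(1000L, 3L) (1 s between attempts, 3 retries) is an illustrative choice;
// addNotRetryableExceptions() classifies exceptions that should fail immediately.
DefaultErrorHandler errorHandler = new DefaultErrorHandler(new FixedBackOff(1000L, 3L));
errorHandler.addNotRetryableExceptions(IllegalStateException.class);
kafkaContainer.setCommonErrorHandler(errorHandler);

// In the batch listener, point the error handler at the exact record that failed:
public void onMessage(List<ConsumerRecord<String, String>> records) {
    for (ConsumerRecord<String, String> record : records) {
        try {
            process(record); // hypothetical business logic
        } catch (SomeSpecificException e) {
            // The container seeks so this record (and those after it) are redelivered.
            throw new BatchListenerFailedException("failed to process record", e, record);
        }
    }
}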
EDIT
If any other type of exception is thrown, the DefaultErrorHandler falls back to a FallbackBatchErrorHandler, which calls ErrorHandlingUtils.retryBatch(); that pauses the consumer and redelivers the whole batch without seeking and re-polling (the polls within the retry loop return no records because the consumer is paused).
See the documentation. https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh
This is required, because there is no guarantee that the batch will be fetched in the same order after a seek.
This is because we need to know the state of the batch (how many times we have retried). We can't do that if the batch keeps changing; hence the algorithm I described above.
To retry indefinitely you can, for example, use a FixedBackOff with Long.MAX_VALUE in the maxAttempts property. Or use an ExponentialBackOff with no termination.
Just be sure that the largest back off (and time to process a batch) is significantly less than max.poll.interval.ms to avoid a rebalance.
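For example, a sketch of the indefinite variant (the 5-second interval is an arbitrary illustration; keep it well under max.poll.interval.ms as noted above):
// Retry the failed batch indefinitely, 5 s apart.
kafkaContainer.setCommonErrorHandler(new DefaultErrorHandler(new FixedBackOff(5000L, Long.MAX_VALUE)));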

transactional behavior in spring-kafka

I have read the spring-kafka and Kafka documentation back and forth, and I still cannot find a way to achieve proper transactional behavior with error recovery. I believe this is not a trivial question, so please read to the end. I think this whole question revolves around finding a way to reposition over a failing record, or to ack from within an error handler. But maybe there are better ways, I don't know.
So records are flowing in, and some of them are invalid. What I would like to have as a minimal solution (on top of which I will then fix several problems you probably see as well) is:
1) We cannot afford the luxury of stopping production over some trivial mishap, like one or a few invalid records. So if there is an invalid record in a Kafka topic, I would like to log it, or resend it to a different queue, but then proceed with processing the following records.
2) There are permanent and temporary failures. A permanent failure is a record that cannot be deserialized, or a record failing data validation. In that case, I'd like to skip the invalid record, as discussed in 1). A temporary failure might be some specific exception or state, for example database connection errors or network issues. In that case, we do not want to skip the failing record; we want to retry after some delay.
The subject of this question is ONLY implementing the skip/don't-skip behavior.
Let's say this is our starting point:
private Map<String, Object> createKafkaConsumerFactoryProperties(String bootstrapServers, String groupId, Class<?> valueDeserializerClass) {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializerClass);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    return props;
}

@Bean(name = "SomeFactory")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        @Value("${…}") String bootstrapServers,
        @Value("${…}") String groupId) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    ConsumerFactory<String, String> consumerFactory = new DefaultKafkaConsumerFactory<>(
            createKafkaConsumerFactoryProperties(bootstrapServers, groupId, AvroDeserializer.class),
            new StringDeserializer(),
            new AvroDeserializer(SomeClass.class));
    factory.setConsumerFactory(consumerFactory);
    // factory.setConcurrency(2);
    // factory.setBatchListener(true);
    return factory;
}
and we have a listener like:
@KafkaListener(topics = "${…}", containerFactory = "SomeFactory")
public void receive(@Valid List<SomeClass> messageList) { /* logic */ }
Now, here is how this behaves, if I understand correctly:
When the listener gets a message, i.e. roughly when we reach the inside of the receive method, the Kafka message will already have been acked; if the receive method throws an exception, the next poll will return the following record. Because the ack happened, and we do not have an error handler defined, the logging error handler will kick in. This is not necessarily what we want. We can use SeekToCurrentErrorHandler to reprocess the message. Or one can specify a TransactionManager, and if an exception 'leaks' from the listener, repositioning will also happen. If someone knows of a performance comparison of these two approaches, please tell me.
When a message cannot be deserialized, the deserializer will fail, the message will not be acked, and the same record will be polled again. This is a sort of "poison pill", since Kafka will spin on this message indefinitely. We do have retry.backoff.ms to at least slow it down, but I can't see any maximum number of retries or anything similar. So the best we can do is stop/pause the container in this situation, which is way too harsh. Btw, I'm new to kafka/spring-kafka, and I have not seen it mentioned anywhere how to manually reposition an offset from outside the application; meaning: OK, the listener is down, but now what? Another option would be for the deserializer not to fail, but to return something instead. But what?? KafkaNull: great, but then our listener will fail with a ClassCastException, because it expects SomeClass. We could return some artificial value of SomeClass, which is again horrible, because it is not the data we actually received. It is also architecturally incorrect.
Or we can use a repositioning error handler, which would be great, if only we knew how to do that. I need to seek to the next record. But while the documentation says that the ErrorHandler should communicate which record caused the failure, it seems that it fails to do so. So even in a non-batch listener I have a list of records (1 failed + a bunch of unprocessed ones), and I have no idea where to set the offset.
So what is the solution to this madness?
Well, the best I can come up with right now is pretty ugly: do not fail in the deserializer (bad), do not accept a specific type in the listener (bad), filter out KafkaNulls manually (bad), and finally trigger bean validation manually (bad). Is there a better way? Thanks for the explanation; I'd be grateful for any hint or direction on how to achieve this.
See the documentation for the upcoming 2.2 release (due tomorrow).
The DefaultAfterRollbackProcessor (when using transactions) and SeekToCurrentErrorHandler (when not using transactions) can now recover (skip) records that keep failing, and will do so after 10 failures, by default. They can be configured to publish failed records to a dead-letter topic.
Also see the Error Handling Deserializer which catches deserialization problems and passes them to the container so they can be sent to the error handler.
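A rough sketch of how those pieces fit together (hedged: class names and constructor signatures shifted across later 2.x releases, so this follows a later API rather than the exact 2.2 one; the back-off values are illustrative):
// Wrap the real deserializer so poison pills reach the error handler
// instead of being re-polled forever.
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, AvroDeserializer.class);

// After ten failed deliveries, publish the record to a dead-letter topic and skip it.
DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(kafkaTemplate);
factory.setErrorHandler(new SeekToCurrentErrorHandler(recoverer, new FixedBackOff(1000L, 9L)));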

In Disassembler pipeline component - Send only last message out from GetNext() method

I have a requirement where I will be receiving a batch of records. I have to disassemble it and insert the data into the DB, which I have completed. But I don't want any message to come out of the pipeline except the last, custom-made message.
I have extended FFDasm and called Disassemble(); then we have GetNext(), which sends every debatched message out, and they fail as there are no subscribers. I want to send nothing out from GetNext() until the last message.
Please help if anyone have already implemented this requirement. Thanks!
If you want to send only one message out of GetNext, you have to call the base Disassemble from your Disassemble method and drain all the messages (you can enqueue these messages to manage them in GetNext), as follows:
public new void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    try
    {
        base.Disassemble(pContext, pInMsg);

        IBaseMessage message = base.GetNext(pContext);
        while (message != null)
        {
            // Only store one message
            if (this.messagesCount == 0)
            {
                // _messages is a Queue<IBaseMessage>
                this._messages.Enqueue(message);
                this.messagesCount++;
            }
            message = base.GetNext(pContext);
        }
    }
    catch (Exception ex)
    {
        // Manage errors
    }
}
Then in the GetNext method, you have the queue and you can return whatever you want:
public new IBaseMessage GetNext(IPipelineContext pContext)
{
    // Return the stored message once, then null to signal completion
    return _messages.Count > 0 ? _messages.Dequeue() : null;
}
The recommended approach is to publish the messages after the disassemble stage to the BizTalk MessageBox DB and use a DB adapter to insert them into the database. Publishing messages to the MessageBox and using an adapter gives you more options for design and performance, and decouples your DB insert from the receive logic. Also, if in the future you want to reuse the same messages for something else, you would be able to do so.
Even then, if for any reason you have to insert from the pipeline component, then do the following:
Please note that the GetNext() method of the disassembler interface is not invoked until the Disassemble() method is complete. Based on this, you can use the following approach, assuming you have encapsulated FFDASM within your own custom component:
Insert all the disassembled messages in the Disassemble method itself, and enqueue only the last message to a Queue class variable. In GetNext(), return the dequeued message; when the queue is empty, return null. You can optimize the DB insert by inserting multiple rows at a time, saving them in batches depending on volume. Please note this approach may encounter performance issues depending on the size of the file and the number of rows being inserted into the DB.
I am calling DBInsert SP from GetNext()
Oh...so...sorry to say, but you're doing it wrong and actually creating a bunch of problems doing this. :(
This is a very basic scenario to cover with BizTalk Server. All you need is:
A Pipeline Component to Promote BTS.InterchangeID
A Sequential Convoy Orchestration Correlating on BTS.InterchangeID and using Ordered Delivery.
In the Orchestration, call the SP, transform to SOAP, call the SOAP endpoint, whatever you need.
As you process the Messages, check for BTS.LastInterchangeMessage, then perform your close-out logic.
To be 100% clear, there are no practical 'performance' issues here. By guessing about 'performance', you've actually created the problem you were trying to solve, and created a bunch of support issues for later on, sorry again. :( There is no reason not to use an Orchestration.
As noted, 25K records isn't a lot. Be sure to have the Receive Location and Orchestration in different Hosts.

How can I handle exceptions in WebAPI 2 methods?

In my WebAPI 2.1 application I am processing inserts like this:
[Route("Post")]
public async Task<IHttpActionResult> Post([FromBody]City city)
{
if (!ModelState.IsValid)
{
return BadRequest(ModelState);
}
try
{
db.Exams.Add(city);
await db.SaveChangesAsync(User, DateTime.UtcNow);
}
catch (Exception ex)
{
var e = ex;
return BadRequest("Error: City not created");
}
return Ok(city);
}
I use something similar for updates and deletes. I think I should do something more meaningful with the exceptions, but I am not sure where to start. I looked at ELMAH, but it says it deals with unhandled exceptions. Does this mean I should not be catching exceptions like this?
Can someone give me some pointers as to whether what I have is okay, and also whether I should be logging exceptions, and how?
What you are doing is not "bad"; it's just a bit verbose and won't scale well if you have many try/catch blocks all over your code. When an exception is raised, you decide what to do, so returning a bad request response is fine if it's really a bad request. There are many things you can return, depending on what went wrong. But you have to handle what to do when exceptions are thrown all over your code, and maintaining this logic scattered across the codebase can quickly become a problem.
It's better to utilise ASP.NET Web API's exception handling framework. For example, take a look at these articles:
http://www.asp.net/web-api/overview/web-api-routing-and-actions/exception-handling
http://www.asp.net/web-api/overview/web-api-routing-and-actions/web-api-global-error-handling
The idea is to centralise your logic in a global exception handler and to use that as the only place in your code where you have to worry about this. The rest of your code will be throwing exceptions and everything will be coming through your exception handler/filter/etc., depending on what you decide.
For example, I have created my own exception types (e.g. CustomExceptionA, CustomExceptionB, etc.) and when I throw an exception of a type I know exactly how to handle it in one place and perform a certain bit of logic. If I want to change the way I handle a particular exception type, then there's only one place I have to make a change and the rest of the code will be unaffected.
The second article link above also includes a global exception logger to log such exceptions.

Basic SignalR IMessageBus implementation

I have a service application that works pretty much like a SignalR backplane, so I thought it would be a good idea to create my own IMessageBus implementation to talk to that backend, rather than rolling out my own thing. The problem is that I cannot find much information about this contract. Although I have been taking a look at the code (which looks very good), I'm struggling to understand some concepts.
public interface IMessageBus
{
    Task Publish(Message message);
    IDisposable Subscribe(ISubscriber subscriber, string cursor, Func<MessageResult, object, Task<bool>> callback, int maxMessages, object state);
}
Task Publish(Message message);
This one is easy, basically it must send a message to the backend. I am not worried about this one, because my app is unidirectional from server to client.
IDisposable Subscribe(ISubscriber subscriber, string cursor, Func<MessageResult, object, Task<bool>> callback, int maxMessages, object state);
return: despite being declared as IDisposable, I have seen it always return a Subscription object; why IDisposable?
subscriber identifies a connection. That connection can subscribe or unsubscribe to groups.
cursor: is the last received message id.
callback: when is this callback executed?
state: what is this exactly?
Can somebody explain to me how this method works?
I would recommend inheriting from ScaleoutMessageBus (https://msdn.microsoft.com/en-us/library/microsoft.aspnet.signalr.messaging.scaleoutmessagebus(v=vs.111).aspx).
It provides an abstraction and encapsulates all the subscription management, so it is possible to focus on the backplane implementation.
You can also take a look at the Redis-based implementation (https://github.com/SignalR/SignalR/blob/master/src/Microsoft.AspNet.SignalR.Redis/RedisMessageBus.cs) just as an example.
If it is of interest, SignalR is open source, so you can look at the ScaleoutMessageBus implementation as well (https://github.com/SignalR/SignalR/blob/master/src/Microsoft.AspNet.SignalR.Core/Messaging/ScaleoutMessageBus.cs).
Hope that helps.
