I have read the spring-kafka/Kafka documentation back and forth, and still cannot find a way to do proper transactional behavior with error recovery. I believe this is not a trivial question, so please read to the end. I believe this whole question revolves around finding a way to reposition over a failing record, or a way to ack in an error handler. But maybe there are better ways, I don't know.
So records are flowing in, and some of them are invalid. What I would like to have as a minimal solution (in which I will then fix several problems you probably see as well) is:
1) We cannot afford the luxury of stopping production in case of some trivial mishap, like one or a few invalid records. So if there is an invalid record in a Kafka topic, I would like to log it, or resend it to a different queue, but then proceed with processing the following records.
2) There are permanent and temporary failures. A permanent failure is a record that cannot be deserialized, or a record failing data validation. A temporary failure might be some specific exception or state, for example database connection errors, network issues etc. In that case, we do not want to skip the failing record; we want to retry after some delay.
The subject of this question is ONLY implementing the skip/don't-skip behavior.
Let's say this is our starting point:
private Map<String, Object> createKafkaConsumerFactoryProperties(String bootstrapServers, String groupId, Class<?> valueDeserializerClass) {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializerClass);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    return props;
}

@Bean(name = "SomeFactory")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        @Value("${…}") String bootstrapServers,
        @Value("${…}") String groupId) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    ConsumerFactory<String, String> consumerFactory = new DefaultKafkaConsumerFactory<>(
            createKafkaConsumerFactoryProperties(bootstrapServers, groupId, AvroDeserializer.class),
            new StringDeserializer(),
            new AvroDeserializer(SomeClass.class));
    factory.setConsumerFactory(consumerFactory);
    // factory.setConcurrency(2);
    // factory.setBatchListener(true);
    return factory;
}
and we have a listener like:
@KafkaListener(topics = "${…}", containerFactory = "SomeFactory")
public void receive(@Valid List<SomeClass> messageList) { /* logic */ }
Now, how does this behave, if I understand correctly:
When the listener gets a message, i.e. when we reach the inside of the receive method, the Kafka message will already be acked, and if the receive method throws an exception, the next poll will return the following record. Because the ack happened and we have no error handler defined, the logging error handler will kick in. This is not necessarily what we want. We can use a SeekToCurrentErrorHandler to reprocess the message. Or one can specify a TransactionManager, and if an exception 'leaks' from the listener, repositioning will also happen. If someone knows a performance comparison of these two approaches, please tell me.
When a message cannot be deserialized, the deserializer will fail, the message will not be acked and the same record will be polled again. This is some sort of "poison packet", since Kafka will spin on this message indefinitely. We do have retry.backoff.ms to at least slow it down, but I can't see any max number of retries or anything similar. So the best thing we can do is to stop/pause the container in this situation, which is way too harsh. Btw, I'm new to kafka/spring-kafka, and I did not see anywhere a mention of how to manually reposition the offset from outside the application; meaning OK, the listener is down, but now what? Another solution would be not to fail in the deserializer and return something instead. But what? KafkaNull, great, but then our listener will fail with a ClassCastException on SomeClass. We could send some artificial value of SomeClass, which is again horrible, because this is not the data we actually received. It is also architecturally incorrect.
Or we can use a repositioning error handler, which would be great, if we knew how to do it. I need to seek to the next record. But while the documentation says that the ErrorHandler should communicate which record caused the failure, it seems that it fails to do so. So even in a non-batch listener I have a list of records (1 failed + a bunch of unprocessed) and have no idea where to set the offset to.
So what is the solution to this madness?
Well, the best I can come up with right now is pretty ugly: do not fail in the deserializer (bad), do not accept a specific type in the listener (bad), filter out KafkaNulls manually (bad) and finally trigger bean validation manually (bad). Is there a better way? Thanks for any explanation; I'd be grateful for every hint or direction on how to achieve this.
See the documentation for the upcoming 2.2 release (due tomorrow).
The DefaultAfterRollbackProcessor (when using transactions) and SeekToCurrentErrorHandler (when not using transactions) can now recover (skip) records that keep failing, and will do so after 10 failures, by default. They can be configured to publish failed records to a dead-letter topic.
Also see the Error Handling Deserializer which catches deserialization problems and passes them to the container so they can be sent to the error handler.
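For illustration, a minimal sketch of how that wiring might look against the later 2.x style API (the bean wiring, the back off values and the choice of a dead-letter recoverer here are my assumptions, not something from the question):

import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

// Retry a failed record twice (3 deliveries total), then publish it to
// "<topic>.DLT" and skip it, so the following records keep flowing.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, SomeClass> recoveringFactory(
        ConsumerFactory<String, SomeClass> consumerFactory,
        KafkaTemplate<Object, Object> template) {
    ConcurrentKafkaListenerContainerFactory<String, SomeClass> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setErrorHandler(new SeekToCurrentErrorHandler(
            new DeadLetterPublishingRecoverer(template),
            new FixedBackOff(1000L, 2)));
    return factory;
}

// And, to turn deserialization failures into exceptions the error handler can
// recover from, delegate through the ErrorHandlingDeserializer:
// props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
// props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, AvroDeserializer.class);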
Related
I am using KafkaMessageListenerContainer with (KafkaAdapter).
How can I "nack" offsets in case of specific error, so the next poll() will take them again?
properties.setAckMode(ContainerProperties.AckMode.BATCH);
final KafkaMessageListenerContainer<String, String> kafkaContainer =
        new KafkaMessageListenerContainer<>(consumerFactory, properties);
kafkaContainer.setCommonErrorHandler(new CommonErrorHandler() {
    @Override
    public void handleBatch(Exception thrownException, ConsumerRecords<?, ?> data,
            Consumer<?, ?> consumer, MessageListenerContainer container, Runnable invokeListener) {
        CommonErrorHandler.super.handleBatch(thrownException, data, consumer, container, invokeListener);
    }
});
Inside handleBatch I am detecting the exception; for that specific exception I would like to do a nack.
I tried throwing a RuntimeException from there.
Using Spring Boot 2.7.
Use the DefaultErrorHandler - it does exactly that (the whole batch is retried according to the back off). You can classify which exceptions are retryable or not.
If you throw a BatchListenerFailedException you can specify exactly which record in the batch had the failure and only retry it (and the following records).
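As a rough sketch of the listener side (the topic name, factory and the exception type caught are made up for illustration):

// A batch listener that tells the DefaultErrorHandler exactly which record
// failed; only that record and the ones after it will be retried.
@KafkaListener(topics = "someTopic", containerFactory = "someBatchFactory")
public void receive(List<ConsumerRecord<String, String>> records) {
    for (int i = 0; i < records.size(); i++) {
        try {
            process(records.get(i)); // hypothetical business logic
        }
        catch (SomeTransientException e) {
            // The index pinpoints the failing record within the batch.
            throw new BatchListenerFailedException("record " + i + " failed", e, i);
        }
    }
}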
EDIT
If any other type of exception is thrown, the DefaultErrorHandler falls back to a FallbackBatchErrorHandler, which calls ErrorHandlingUtils.retryBatch(); this pauses the consumer and redelivers the whole batch without seeking and re-polling (the polls within the loop return no records because the consumer is paused).
See the documentation. https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh
This is required, because there is no guarantee that the batch will be fetched in the same order after a seek.
This is because we need to know the state of the batch (how many times we have retried). We can't do that if the batch keeps changing; hence the algorithm I described above.
To retry indefinitely you can, for example, use a FixedBackOff with Long.MAX_VALUE in the maxAttempts property. Or use an ExponentialBackOff with no termination.
Just be sure that the largest back off (and time to process a batch) is significantly less than max.poll.interval.ms to avoid a rebalance.
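For example (a sketch; the intervals here are arbitrary):

// Effectively retry forever: a FixedBackOff with Long.MAX_VALUE attempts...
DefaultErrorHandler handler = new DefaultErrorHandler(
        new FixedBackOff(2000L, Long.MAX_VALUE));

// ...or an ExponentialBackOff, which has no termination by default.
ExponentialBackOff backOff = new ExponentialBackOff(1000L, 2.0);
backOff.setMaxInterval(30_000L); // keep the cap well below max.poll.interval.ms
DefaultErrorHandler handler2 = new DefaultErrorHandler(backOff);

// then, on the container from the question:
// kafkaContainer.setCommonErrorHandler(handler);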
My team is writing a service that leverages the retryable topics mechanism offered by Spring Kafka (version 2.8.2). Here is a subset of the configuration:
@Bean
public ConsumerFactory<String, UploadMessage> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(
            this.springProperties.buildConsumerProperties(),
            new StringDeserializer(),
            new ErrorHandlingDeserializer<>(new KafkaMessageDeserializer()));
}

@Bean
public RetryTopicConfiguration retryTopicConfiguration(KafkaTemplate<String, Object> kafkaTemplate) {
    final var retry = this.applicationProperties.retry();
    return RetryTopicConfigurationBuilder.newInstance()
            .doNotAutoCreateRetryTopics()
            .suffixTopicsWithIndexValues()
            .maxAttempts(retry.attempts())
            .exponentialBackoff(retry.initialDelay(), retry.multiplier(), retry.maxDelay())
            .dltHandlerMethod(DeadLetterTopicProcessor.ENDPOINT_HANDLER_METHOD)
            .create(kafkaTemplate);
}
KafkaMessageDeserializer is a custom deserialiser that decodes protobuf-encoded messages and may throw a SerializationException in case of a failure. This exception is correctly captured and transformed into a DeserializationException by Spring Kafka. What I find a bit confusing is that the intercepted poison pill message then hits all of the retry topics before eventually reaching the dead letter one. Obviously it fails with exactly the same error at every step.
I know that RetryTopicConfigurationBuilder::notRetryOn may be used to skip the retry attempts for particular exception types, but what if I want to use exactly the same list of exceptions as in ExceptionClassifier::configureDefaultClassifier? Is there a way to programmatically access this information without basically duplicating the code?
That is a good suggestion; it probably should be the default behavior (or at least optionally).
Please open a feature request on GitHub.
There is a somewhat related discussion here: https://github.com/spring-projects/spring-kafka/discussions/2101
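In the meantime, the closest workaround I can see is listing those exceptions by hand (a sketch; this set only mirrors part of what the default classifier treats as fatal, and may be incomplete or drift out of date):

// Hypothetical: manually declare the usual fatal exceptions as non-retryable,
// duplicating (part of) the framework's default classification.
return RetryTopicConfigurationBuilder.newInstance()
        .notRetryOn(List.of(
                DeserializationException.class,
                ConversionException.class,
                MethodArgumentResolutionException.class,
                ClassCastException.class))
        // ... remaining configuration as in the question ...
        .create(kafkaTemplate);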
I have come across a requirement where I want Axon to wait until all events in the event bus fired against a particular command have finished their execution. Let me brief the scenario:
I have a RestController which fires the below command to create an application entity:
@RestController
class MyController {

    // org.axonframework.commandhandling.gateway.CommandGateway
    private final CommandGateway commandGateway;

    MyController(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @PostMapping("/create")
    @ResponseBody
    public String create() {
        commandGateway.sendAndWait(new CreateApplicationCommand());
        System.out.println("in myController:: after sending CreateApplicationCommand");
        return "created";
    }
}
This command is handled in the aggregate. The aggregate class is annotated with org.axonframework.spring.stereotype.Aggregate:
@Aggregate
class MyAggregate {

    private String id;   // assumed fields, updated by the event sourcing handler
    private String name;

    @CommandHandler // org.axonframework.commandhandling.CommandHandler
    private MyAggregate(CreateApplicationCommand command) {
        org.axonframework.modelling.command.AggregateLifecycle.apply(new AppCreatedEvent());
        System.out.println("in MyAggregate:: after firing AppCreatedEvent");
    }

    @EventSourcingHandler // org.axonframework.eventsourcing.EventSourcingHandler
    private void on(AppCreatedEvent appCreatedEvent) {
        // Updates the state of the aggregate
        this.id = appCreatedEvent.getId();
        this.name = appCreatedEvent.getName();
        System.out.println("in MyAggregate:: after updating state");
    }
}
The AppCreatedEvent is handled in 2 places:
In the aggregate itself, as we can see above.
In the projection class, as below:
@EventHandler // org.axonframework.eventhandling.EventHandler
void on(AppCreatedEvent appCreatedEvent) {
    // persists into database
    System.out.println("in Projection:: after saving into database");
}
The problem here is that after handling the event in the first place (i.e., inside the aggregate), the call returns to myController.
That is, the output here is:
in MyAggregate:: after firing AppCreatedEvent
in MyAggregate:: after updating state
in myController:: after sending CreateApplicationCommand
in Projection:: after saving into database
The output which I want is:
in MyAggregate:: after firing AppCreatedEvent
in MyAggregate:: after updating state
in Projection:: after saving into database
in myController:: after sending CreateApplicationCommand
In simple words, I want Axon to wait until all events triggered against a particular command are executed completely, and then return to the class which triggered the command.
After searching on the forum, I got to know that all sendAndWait does is wait until the handling of the command and the publication of the events is finalized. I then tried the Reactor Extension as well, using the code below, but got the same results: org.axonframework.extensions.reactor.commandhandling.gateway.ReactorCommandGateway.send(new CreateApplicationCommand()).block();
Can someone please help me out?
Thanks in advance.
What would be best in your situation, @rohit, is to embrace the fact that you are using an eventually consistent solution here. Thus, command handling is entirely separate from event handling, making the query models you create eventually consistent with the command model (your aggregates). Therefore, you wouldn't necessarily wait for the events exactly, but react when the query model is present.
Embracing this comes down to building your application such that "yeah, I know my response might not be up to date now, but it might be somewhere in the near future." It is thus recommended to subscribe to the result you are interested in after or before the fact you have dispatched a command.
For example, you could see this as using WebSockets with the STOMP protocol, or you could tap into Project Reactor and use the Flux result type to receive the results as they go.
From your description, I assume you or your business have decided that the UI component should react in the (old-fashioned) synchronous way. There's nothing wrong with that, but it will bite your *ss when it comes to using something inherently eventually consistent like CQRS. You can, however, spoof the fact you are synchronous in your front-end, if you will.
To achieve this, I would recommend using Axon's Subscription Query to subscribe to the query model you know will be updated by the command you will send.
In pseudo-code, that would look a little bit like this:
public Result mySynchronousCall(String identifier) {
    // Subscribe to the updates to come
    SubscriptionQueryResult<Result, Result> result = queryGateway.subscriptionQuery(...);
    // Issue command to update
    commandGateway.send(...);
    // Wait on the Flux for the first update, then close the subscription
    return result.updates()
                 .next()
                 .map(...)
                 .timeout(...)
                 .doFinally(it -> result.close())
                 .block();
}
You could see this being done in this sample WebFluxRest class, by the way.
Note that you are essentially closing the door to the front-end to tap into the asynchronous goodness by doing this. It'll work and allow you to wait for the result to be there as soon as it is there, but you'll lose some flexibility.
I have a requirement where I will be receiving a batch of records. I have to disassemble and insert the data into a DB, which I have completed. But I don't want any message to come out of the pipeline except the last, custom-made message.
I have extended FFDasm and called Disassemble(); then we have GetNext(), which returns every debatched message, and they are failing as there are no subscribers for them. I want to send nothing out from GetNext() until the last message.
Please help if anyone has already implemented this requirement. Thanks!
If you want to send only one message from GetNext, you have to override Disassemble, call the base Disassemble, and drain all the messages there (you can enqueue those messages to manage them in GetNext), e.g.:
private readonly Queue<IBaseMessage> _messages = new Queue<IBaseMessage>();
private int messagesCount = 0;

public new void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    try
    {
        base.Disassemble(pContext, pInMsg);
        IBaseMessage message = base.GetNext(pContext);
        while (message != null)
        {
            // Only store one message
            if (this.messagesCount == 0)
            {
                // _messages is a Queue<IBaseMessage>
                this._messages.Enqueue(message);
                this.messagesCount++;
            }
            message = base.GetNext(pContext);
        }
    }
    catch (Exception ex)
    {
        // Manage errors
    }
}
Then, in the GetNext method, you have the queue and you can return whatever you want:
public new IBaseMessage GetNext(IPipelineContext pContext)
{
    // Return null when the queue is empty to signal the end of the interchange
    return _messages.Count > 0 ? _messages.Dequeue() : null;
}
The recommended approach is to publish messages after the disassemble stage to the BizTalk MessageBox DB and use a DB adapter to insert them into the database. Publishing messages to the MessageBox and using an adapter will give you more options for design/performance and will decouple your DB insert from the receive logic. Also, if in future you want to reuse the same message for something else, you would be able to do so.
Even then, if for any reason you have to insert from the pipeline component, do the following:
Please note, the GetNext() method of the IDisassembler interface is not invoked until the Disassemble() method is complete. Based on this, you can use the following approach, assuming you have encapsulated FFDASM within your own custom component:
Insert all disassembled messages in the Disassemble method itself and enqueue only the last message into a Queue class variable. In GetNext(), return the dequeued message; when the Queue is empty, return null. You can optimize the DB insert by inserting multiple rows at a time and saving them in batches, depending on volume. Please note this approach may encounter performance issues depending on the size of the file and the number of rows being inserted into the DB.
I am calling DBInsert SP from GetNext()
Oh...so...sorry to say, but you're doing it wrong and actually creating a bunch of problems doing this. :(
This is a very basic scenario to cover with BizTalk Server. All you need is:
A Pipeline Component to Promote BTS.InterchangeID
A Sequential Convoy Orchestration Correlating on BTS.InterchangeID and using Ordered Delivery.
In the Orchestration, call the SP, transform to SOAP, call the SOAP endpoint, whatever you need.
As you process the Messages, check for BTS.LastInterchangeMessage, then perform your close-out logic.
To be 100% clear, there are no practical 'performance' issues here. By guessing about 'performance' you've actually created the problem you were thinking to solve, and created a bunch of support issues for later on, sorry again. :( There is no reason to not use an Orchestration.
As noted, 25K records isn't a lot. Be sure to have the Receive Location and Orchestration in different Hosts.
I have a Kafka consumer implemented as:
@KafkaListener(topics = "...", groupId = "....")
public void doProcessing(@Payload String data, @Headers Map<String, Object> headers) {
    // read one of the headers to get a unique id pertaining to this message
    // and set that header value in the MDC
    String messageUniqueIdentifier = (String) headers.get("myRequestIdentifierKey");
    MDC.put("myRequestIdentifierKey", messageUniqueIdentifier);
    log.info("logging just to see if the unique identifier comes in the logs or not");
    // do some processing
}
Is this a safe approach? Is it always guaranteed that the same thread will service one message in the consumer?
It's not clear what you are asking. If you have concurrency there will be more than one thread but each message will be processed on one thread (as long as your listener doesn't hand off to another thread).
So, as long as you set the MDC in the listener each time and don't hand off to another thread; it will work.
If you have only one thread, the same thread will be used for every message (unless the container is stopped and restarted, in which case it will get a new thread).
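Since container threads are pooled and reused, a common pattern (a sketch, not something from the answer above) is to clear the MDC in a finally block so a reused thread never carries a stale identifier:

@KafkaListener(topics = "...", groupId = "....")
public void doProcessing(@Payload String data, @Headers Map<String, Object> headers) {
    try {
        MDC.put("myRequestIdentifierKey", (String) headers.get("myRequestIdentifierKey"));
        // ... processing; all logging on this thread carries the identifier ...
    }
    finally {
        // Always clean up, so the next message on this thread starts fresh
        MDC.remove("myRequestIdentifierKey");
    }
}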