KafkaMessageListenerContainer how to do nack for specific error - spring-kafka

I am using a KafkaMessageListenerContainer (with KafkaAdapter).
How can I "nack" offsets in case of a specific error, so that the next poll() will take them again?
properties.setAckMode(ContainerProperties.AckMode.BATCH);
final KafkaMessageListenerContainer<String, String> kafkaContainer =
        new KafkaMessageListenerContainer<>(consumerFactory, properties);
kafkaContainer.setCommonErrorHandler(new CommonErrorHandler() {
    @Override
    public void handleBatch(Exception thrownException, ConsumerRecords<?, ?> data,
            Consumer<?, ?> consumer, MessageListenerContainer container,
            Runnable invokeListener) {
        CommonErrorHandler.super.handleBatch(thrownException, data, consumer,
                container, invokeListener);
    }
});
Inside handleBatch I detect the exception; for that specific exception I would like to nack.
I tried throwing a RuntimeException from there.
I am using Spring Boot 2.7.

Use the DefaultErrorHandler - it does exactly that (the whole batch is retried according to the back off). You can classify which exceptions are retryable or not.
If you throw a BatchListenerFailedException you can specify exactly which record in the batch had the failure and only retry it (and the following records).
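For example, a minimal sketch building on the container from the question (process(...) and MyTransientException, a runtime exception, are hypothetical; DefaultErrorHandler and FixedBackOff are the spring-kafka 2.8.x APIs that Spring Boot 2.7 manages):

// Replace the CommonErrorHandler override with a DefaultErrorHandler:
// the batch is retried every second, at most 9 retries, and exception types
// can be classified as not retryable.
DefaultErrorHandler errorHandler = new DefaultErrorHandler(new FixedBackOff(1000L, 9L));
errorHandler.addNotRetryableExceptions(IllegalArgumentException.class);
kafkaContainer.setCommonErrorHandler(errorHandler);

// In the batch listener, throw BatchListenerFailedException to retry only the
// failing record (and the ones after it); records before index i are committed.
properties.setMessageListener((BatchMessageListener<String, String>) records -> {
    for (int i = 0; i < records.size(); i++) {
        try {
            process(records.get(i));          // hypothetical per-record logic
        } catch (MyTransientException e) {    // hypothetical runtime exception
            throw new BatchListenerFailedException("record failed", e, i);
        }
    }
});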
EDIT
If any other type of exception is thrown, the DefaultErrorHandler falls back to using a FallbackBatchErrorHandler, which calls ErrorHandlingUtils.retryBatch(), which pauses the consumer and redelivers the whole batch without seeking and re-polling (the polls within the loop return no records because the consumer is paused).
See the documentation. https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh
This is required, because there is no guarantee that the batch will be fetched in the same order after a seek.
This is because we need to know the state of the batch (how many times we have retried). We can't do that if the batch keeps changing; hence the algorithm I described above.
To retry indefinitely you can, for example, use a FixedBackOff with Long.MAX_VALUE in the maxAttempts property. Or use an ExponentialBackOff with no termination.
Just be sure that the largest back off (and time to process a batch) is significantly less than max.poll.interval.ms to avoid a rebalance.
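For indefinite retry, the same handler can be given an effectively unbounded back off, e.g. (a sketch):

// Retries forever, 5 seconds apart; keep the delay plus batch processing time
// well below max.poll.interval.ms, as noted above.
new DefaultErrorHandler(new FixedBackOff(5000L, Long.MAX_VALUE));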

Related

How to avoid hitting all retryable topics for fatal-by-default exceptions?

My team is writing a service that leverages the retryable topics mechanism offered by Spring Kafka (version 2.8.2). Here is a subset of the configuration:
@Bean
public ConsumerFactory<String, UploadMessage> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(
            this.springProperties.buildConsumerProperties(),
            new StringDeserializer(),
            new ErrorHandlingDeserializer<>(new KafkaMessageDeserializer()));
}

@Bean
public RetryTopicConfiguration retryTopicConfiguration(KafkaTemplate<String, Object> kafkaTemplate) {
    final var retry = this.applicationProperties.retry();
    return RetryTopicConfigurationBuilder.newInstance()
            .doNotAutoCreateRetryTopics()
            .suffixTopicsWithIndexValues()
            .maxAttempts(retry.attempts())
            .exponentialBackoff(retry.initialDelay(), retry.multiplier(), retry.maxDelay())
            .dltHandlerMethod(DeadLetterTopicProcessor.ENDPOINT_HANDLER_METHOD)
            .create(kafkaTemplate);
}
KafkaMessageDeserializer is a custom deserialiser that decodes protobuf-encoded messages and may throw a SerializationException in case of a failure. This exception is correctly captured and transformed into a DeserializationException by Spring Kafka. What I find a bit confusing is that the intercepted poison pill message then hits all of the retry topics before eventually reaching the dead letter one. Obviously it fails with exactly the same error at every step.
I know that RetryTopicConfigurationBuilder::notRetryOn may be used to skip the retry attempts for particular exception types, but what if I want to use exactly the same list of exceptions as in ExceptionClassifier::configureDefaultClassifier? Is there a way to programmatically access this information without basically duplicating the code?
That is a good suggestion; it probably should be the default behavior (or at least an option).
Please open a feature request on GitHub.
There is a somewhat related discussion here: https://github.com/spring-projects/spring-kafka/discussions/2101
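In the meantime, a workaround is to duplicate the default classification by hand via notRetryOn. A hedged sketch; the list below mirrors the exceptions documented as fatal by default in spring-kafka 2.8, which is exactly the duplication the question hopes to avoid:

@Bean
public RetryTopicConfiguration retryTopicConfiguration(KafkaTemplate<String, Object> kafkaTemplate) {
    return RetryTopicConfigurationBuilder.newInstance()
            // Route these straight to the DLT instead of walking every retry topic.
            .notRetryOn(List.of(
                    DeserializationException.class,
                    MessageConversionException.class,
                    ConversionException.class,
                    MethodArgumentResolutionException.class,
                    NoSuchMethodException.class,
                    ClassCastException.class))
            .dltHandlerMethod(DeadLetterTopicProcessor.ENDPOINT_HANDLER_METHOD)
            .create(kafkaTemplate);
}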

How to pause a specific kafka consumer thread when concurrency is set to more than 1?

I am using spring-kafka 2.2.8, setting concurrency to 2 as shown below, and trying to understand how to pause a consumer thread/instance when a particular condition is met.
@KafkaListener(id = "myConsumerId", topics = "myTopic", concurrency = "2")
public void listen(String in) {
    System.out.println(in);
}
Now I have two questions:
Would my consumer spawn two different polling threads to poll the records?
If I'm setting an id on the consumer as shown above, how can I pause a specific consumer thread (with concurrency set to more than 1)?
Please suggest.
Use the KafkaListenerEndpointRegistry.getListenerContainer(id) method to get a reference to the container.
Cast it to a ConcurrentMessageListenerContainer and call getContainers() to get a list of the child KafkaMessageListenerContainers; you can then pause/resume them individually.
You can determine which topics/partitions each one has using getAssignedPartitions().
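A sketch of that sequence, using the listener id myConsumerId from the question (the injected registry field and the pauseFor method name are illustrative):

@Autowired
private KafkaListenerEndpointRegistry registry;

public void pauseFor(TopicPartition partition) {
    // Look up the parent container by the @KafkaListener id.
    ConcurrentMessageListenerContainer<?, ?> parent =
            (ConcurrentMessageListenerContainer<?, ?>) registry.getListenerContainer("myConsumerId");
    // Each child container owns one consumer thread; pause only the matching one.
    for (MessageListenerContainer child : parent.getContainers()) {
        Collection<TopicPartition> assigned = child.getAssignedPartitions();
        if (assigned != null && assigned.contains(partition)) {
            child.pause();   // later, call child.resume() to continue
        }
    }
}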

transactional behavior in spring-kafka

I have read the spring-kafka/kafka documentation back and forth and still cannot find a way to get proper transactional behavior with error recovery. I believe this is not a trivial question, so please read until the end. I believe the whole question revolves around finding a way to reposition over the failing record, or a way to ack in the error handler. But maybe there are better ways, I don't know.
So records are flowing in, and some of them are invalid. As a minimal solution I would like the following (and I will then fix the several problems you probably see as well):
1) we cannot afford the luxury of stopping production in case of some trivial mishap, like one or a few invalid records. Thus if there is an invalid record in the kafka topic, I would like to log it, or resend it to a different queue, but then proceed with processing the following records.
2) there are permanent and temporary failures. A permanent failure is a record that cannot be deserialized, or a record failing data validation. In that case, I'd like to skip the invalid record, as discussed in 1). A temporary failure might be some specific exception or state, for example database connection errors, network issues etc. In that case, we do not want to skip the failing record; we want to retry after some delay.
The subject of this question is ONLY implementing the skip/don't-skip behavior.
Let's say this is our starting point:
private Map<String, Object> createKafkaConsumerFactoryProperties(String bootstrapServers, String groupId, Class<?> valueDeserializerClass) {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializerClass);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    return props;
}

@Bean(name = "SomeFactory")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        @Value("${…}") String bootstrapServers,
        @Value("${…}") String groupId) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    ConsumerFactory<String, String> consumerFactory = new DefaultKafkaConsumerFactory<>(
            createKafkaConsumerFactoryProperties(bootstrapServers, groupId, AvroDeserializer.class),
            new StringDeserializer(),
            new AvroDeserializer(SomeClass.class));
    factory.setConsumerFactory(consumerFactory);
    // factory.setConcurrency(2);
    // factory.setBatchListener(true);
    return factory;
}
and we have a listener like:
@KafkaListener(topics = "${…}", containerFactory = "SomeFactory")
public void receive(@Valid List<SomeClass> messageList) { /* logic */ }
Now, if I understand correctly, this behaves as follows:
when the listener gets a message, i.e. when we reach the inside of the receive method, the kafka message will already be acked, and if the receive method throws an exception, the next poll will return the following record. Because the ack happened and we have no error handler defined, the logging error handler will kick in. This is not necessarily what we want. We can use a SeekToCurrentErrorHandler to reprocess the message. Or one can specify a TransactionManager, and if an exception 'leaks' from the listener, repositioning will also happen. If someone knows of a performance comparison of these two approaches, please tell me.
when a message cannot be deserialized, the deserializer will fail, the message will not be acked, and the same record will be polled again. This is a sort of "poison packet", since kafka will spin on this message indefinitely. We do have retry.backoff.ms to at least slow it down, but I can't see any max number of retries or similar. So the best we can do is stop/pause the container in this situation, which is way too harsh. Btw, I'm new to kafka/spring-kafka, and I did not see mentioned anywhere how to manually reposition the offset from outside of the application; meaning, OK, the listener is down, but now what? Another solution would be to not fail in the deserializer and return something instead. But what?? KafkaNull? Great, but then our listener will fail with a SomeClass ClassCastException. We could send some artificial value of SomeClass, which is again horrible, because it is not the data we actually got. It is also architecturally incorrect.
or we can use a repositioning error handler, which would be great, if we knew how to do that. I need to seek to the next record. But while the documentation says the ErrorHandler should communicate which record caused the failure, it seems that it fails to do so. So even in a non-batch listener I have a list of records (1 failed + a bunch of unprocessed) and have no idea where to set the offset to.
So what is the solution to this madness?
Well, the best I can come up with right now is pretty ugly: do not fail in the deserializer (bad), do not accept a specific type in the listener (bad), filter out KafkaNulls manually (bad) and finally trigger bean validation manually (bad). Is there a better way? I'd be grateful for every hint or direction on how to achieve this.
See the documentation for the upcoming 2.2 release (due tomorrow).
The DefaultAfterRollbackProcessor (when using transactions) and SeekToCurrentErrorHandler (when not using transactions) can now recover (skip) records that keep failing, and will do so after 10 failures, by default. They can be configured to publish failed records to a dead-letter topic.
Also see the Error Handling Deserializer which catches deserialization problems and passes them to the container so they can be sent to the error handler.
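A hedged sketch of that combination using the 2.2-era API names (kafkaTemplate is an assumed KafkaTemplate<Object, Object> bean; note that configuring the delegate deserializer by class, as here, assumes it has a no-arg constructor, unlike the AvroDeserializer(SomeClass.class) above):

// In createKafkaConsumerFactoryProperties(...): wrap the value deserializer so
// deserialization failures reach the error handler instead of poisoning the poll loop.
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
props.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, AvroDeserializer.class.getName());

// On the container factory: skip a record after 3 failed deliveries,
// publishing it to a <topic>.DLT dead-letter topic first.
factory.setErrorHandler(new SeekToCurrentErrorHandler(
        new DeadLetterPublishingRecoverer(kafkaTemplate), 3));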

In Disassembler pipeline component - Send only last message out from GetNext() method

I have a requirement where I will be receiving a batch of records. I have to disassemble them and insert the data into the DB, which I have completed. But I don't want any message to come out of the pipeline except one last custom-made message.
I have extended FFDasm and called Disassemble(); then GetNext() returns every debatched message, and they are failing as there are no subscribers. I want to send nothing out from GetNext() until the last message.
Please help if anyone has already implemented this requirement. Thanks!
If you want to send only one message out of GetNext, you have to call the base Disassemble from your Disassemble method and drain all the messages (you can enqueue these messages so you can manage them in GetNext), as:
public new void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    try
    {
        base.Disassemble(pContext, pInMsg);
        IBaseMessage message = base.GetNext(pContext);
        while (message != null)
        {
            // Only store one message
            if (this.messagesCount == 0)
            {
                // _messages is a Queue<IBaseMessage>
                this._messages.Enqueue(message);
                this.messagesCount++;
            }
            message = base.GetNext(pContext);
        }
    }
    catch (Exception ex)
    {
        // Manage errors
    }
}
Then in the GetNext method you have the queue, and you can return whatever you want:
public new IBaseMessage GetNext(IPipelineContext pContext)
{
    // Return null when the queue is empty to signal there are no more messages.
    return _messages.Count > 0 ? _messages.Dequeue() : null;
}
The recommended approach is to publish the messages after the disassemble stage to the BizTalk message box db and use a db adapter to insert them into the database. Publishing messages to the message box and using an adapter will give you more options on design/performance and will decouple your DB insert from the receive logic. Also, if in future you want to reuse the same message for something else, you would be able to do so.
Even then, if for any reason you have to insert from the pipeline component, do the following:
Please note, the GetNext() method of the IDisassemblerComponent interface is not invoked until the Disassemble() method is complete. Based on this, you can use the following approach, assuming you have encapsulated FFDASM within your own custom component:
Insert all the disassembled messages in the Disassemble method itself, and enqueue only the last message to a Queue class variable. In GetNext(), return the dequeued message; when the queue is empty, return null. You can optimize the DB insert by inserting multiple rows at a time, saving them in batches depending on volume. Please note this approach may encounter performance issues depending on the size of the file and the number of rows being inserted into the db.
I am calling DBInsert SP from GetNext()
Oh...so...sorry to say, but you're doing it wrong and actually creating a bunch of problems doing this. :(
This is a very basic scenario to cover with BizTalk Server. All you need is:
A Pipeline Component to Promote BTS.InterchangeID
A Sequential Convoy Orchestration Correlating on BTS.InterchangeID and using Ordered Delivery.
In the Orchestration, call the SP, transform to SOAP, call the SOAP endpoint, whatever you need.
As you process the Messages, check for BTS.LastInterchangeMessage, then perform your close out logic.
To be 100% clear, there are no practical 'performance' issues here. By guessing about 'performance' you've actually created the problem you were trying to solve, and created a bunch of support issues for later on, sorry again. :( There is no reason not to use an Orchestration.
As noted, 25K records isn't a lot. Be sure to have the Receive Location and Orchestration in different Hosts.

Submit a task to an ExecutorService using a ScheduledExecutorService

I'm developing a JavaFX application for reading data from a serial device and showing a notification when a new device is connected to the computer.
I have a task, DeviceDetectorTask, which scans all the ports and raises an event when a new device is connected. This task must be submitted every 3 seconds.
When a device is detected, the user can press a button to read all the data contained in it. This is performed by another task, ReadDeviceTask. At this point, and while the ReadDeviceTask is running, scan operations must not be performed (I cannot read and scan one port at the same time). So only one of the two tasks can be running at a time.
My current solution is:
public class DeviceTaskQueue {

    private ExecutorService executorService = Executors.newSingleThreadExecutor();

    public void submit(Runnable task) {
        executorService.submit(task);
    }
}

public class ScanScheduler {

    private ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        AddScanTask task = new AddScanTask();
        executor.scheduleAtFixedRate(task, 0, 3, TimeUnit.SECONDS);
    }
}

public class AddScanTask implements Runnable {

    @Autowired
    DeviceTaskQueue deviceTaskQueue;

    @Override
    public void run() {
        deviceTaskQueue.submit(new DeviceDetectorTask());
    }
}

public class ViewController {

    @Autowired
    DeviceTaskQueue deviceTaskQueue;

    @FXML
    private void readDataFromDevice() {
        deviceTaskQueue.submit(new ReadDeviceTask());
    }
}
My question is: is it ok to add a task to the ExecutorService from the task AddScanTask which has been scheduled by the ScheduledExecutorService?
Yes, an Executor May Post a Task to Another Executor
To answer the simple question in your last line:
is it ok to add a task to the ExecutorService from the task AddScanTask which has been scheduled by the ScheduledExecutorService?
Yes. Certainly you can submit a Callable/Runnable from any other code. That the submitting code happens to be running from another executor is irrelevant, as code run from an executor is still “normal” Java code, just running on a different thread.
That is the whole point of the executor, to handle the juggling of threads in a manner convenient to you the programmer. Making multi-threaded coding easier and less error-prone is why these classes were added to Java. See the extremely helpful book, Java Concurrency in Practice by Brian Goetz et al. And see other writings by Goetz.
In your case you have two executors each with their own thread, each executing a series of submitted tasks. One has tasks submitted automatically (timed) while the other has tasks submitted manually (arbitrarily). Each executes on their own thread independent of one another. With multiple cores they may execute simultaneously.
Therein lies the bigger problem: In your scenario you don't want them to be independent. You want the reading tasks to block the scanning tasks.
Bigger Problem
The problem you present is that a regularly occurring activity (scanning) must halt when an arbitrary event (reading) happens. That means the two activities must coordinate with one another. The question is how to coordinate.
Semaphores
When the arbitrary event is happening, it should raise a flag. The recurring activity, when it runs, should always check for that flag. If raised, wait until the flag lowers before proceeding with the scan. The ScheduledExecutorService is designed for this, tolerating a task that runs longer than the scheduled period. If one execution of the task runs long, the SES does not start the next execution until the current one finishes, so it does not pile up a backlog of concurrent executions. That is just the behavior you want.
Vice versa, if the recurring activity is executing, it should raise a flag. The arbitrary event’s first to-do item is to check for that flag. If raised, wait until lowered. Then proceed, first raising its own flag and then proceeding with the task at hand (reading).
Perhaps your scenario should be designed with a single flag rather than scanner and reader each having their own. I would have to think about it more and probably know more about your scenario.
The technical term for such flags is semaphore.
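For illustration, here is the flag idea in code with java.util.concurrent.Semaphore (a sketch that assumes both tasks' code can be edited, which, as noted below, is not the case here):

Semaphore permit = new Semaphore(1);          // one shared permit for both activities

Runnable scanTask = () -> {
    try {
        permit.acquire();                     // blocks while a read holds the permit
        try {
            // ... scan the ports ...
        } finally {
            permit.release();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // restore the interrupt flag
    }
};
// ReadDeviceTask would wrap its work in the same acquire/release pattern.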
Unfortunately your comment says you cannot alter the scanner’s source code. So you cannot implement the semaphores and coordinate the activities. So I am stuck, cannot see a solution.
Hack
Given your frozen code, one hack solution, which I do not recommend, is that the regularly occurring activity (the scanning) not actually do the work but instead post a scanning task on another thread (another executor). That other executor would also be the same executor used to post the arbitrary activity (the reading). So there is one single queue of to-do items, a mix of scanning and reading jobs, submitted to a single-thread executor. The single-thread means they get done one at a time in sequence of their submission.
I do not like this hack because if any of the to-do items takes a long while you will begin to accumulate a backlog. That could be a mess.
By the way, no need for the DeviceTaskQueue in your example code. Just call the instance of the ExecutorService directly to submit a task. That is the job of an ExecutorService, and wrapping it adds no value that I can see.
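For completeness, the hack looks roughly like this (DeviceDetectorTask and ReadDeviceTask are the classes from the question; button is an illustrative JavaFX control):

// One worker thread executes both kinds of job, one at a time, in submission order.
ExecutorService worker = Executors.newSingleThreadExecutor();
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

// The scheduled task only submits work; the worker does the actual scanning.
scheduler.scheduleAtFixedRate(
        () -> worker.submit(new DeviceDetectorTask()), 0, 3, TimeUnit.SECONDS);

// The button handler enqueues a read job on the same single-thread worker.
button.setOnAction(event -> worker.submit(new ReadDeviceTask()));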