How to pause a specific kafka consumer thread when concurrency is set to more than 1? - spring-kafka

I am using spring-kafka 2.2.8 with concurrency set to 2, as shown below, and I am trying to understand how to pause a consumer thread/instance when a particular condition is met.
@KafkaListener(id = "myConsumerId", topics = "myTopic", concurrency = "2")
public void listen(String in) {
    System.out.println(in);
}
Now I have two questions:
Would my consumer spawn two different polling threads to poll for records?
If I'm setting an id on the consumer as shown above, how can I pause a specific consumer thread (with concurrency set to more than 1)?
Please suggest.

Use the KafkaListenerEndpointRegistry.getListenerContainer(id) method to get a reference to the container.
Cast it to a ConcurrentMessageListenerContainer and call getContainers() to get a list of the child KafkaMessageListenerContainers; you can then pause/resume them individually.
You can determine which topics/partitions each one has using getAssignedPartitions().
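For example, a minimal sketch (the ConsumerPauser bean and pauseChildForPartition() are hypothetical names for illustration; "myConsumerId" is the listener id from the question):

import java.util.Collection;
import org.apache.kafka.common.TopicPartition;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.stereotype.Component;

@Component
public class ConsumerPauser {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    // Pause only the child container that is assigned the given partition.
    public void pauseChildForPartition(int partition) {
        ConcurrentMessageListenerContainer<?, ?> container =
                (ConcurrentMessageListenerContainer<?, ?>) registry.getListenerContainer("myConsumerId");
        for (KafkaMessageListenerContainer<?, ?> child : container.getContainers()) {
            Collection<TopicPartition> assigned = child.getAssignedPartitions();
            if (assigned != null && assigned.stream().anyMatch(tp -> tp.partition() == partition)) {
                child.pause(); // resume later with child.resume()
            }
        }
    }
}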

Related

KafkaMessageListenerContainer how to do nack for specific error

I am using a KafkaMessageListenerContainer (with KafkaAdapter).
How can I "nack" offsets in the case of a specific error, so that the next poll() will take them again?
properties.setAckMode(ContainerProperties.AckMode.BATCH);
final KafkaMessageListenerContainer<String, String> kafkaContainer =
        new KafkaMessageListenerContainer<>(consumerFactory, properties);
kafkaContainer.setCommonErrorHandler(new CommonErrorHandler() {
    @Override
    public void handleBatch(Exception thrownException, ConsumerRecords<?, ?> data,
            Consumer<?, ?> consumer, MessageListenerContainer container, Runnable invokeListener) {
        CommonErrorHandler.super.handleBatch(thrownException, data, consumer, container, invokeListener);
    }
});
Inside handleBatch I am detecting the exception; for that specific exception I would like to nack. I tried throwing a RuntimeException from there.
Using Spring Boot 2.7.
Use the DefaultErrorHandler - it does exactly that (the whole batch is retried according to the back off). You can classify which exceptions are retryable or not.
If you throw a BatchListenerFailedException you can specify exactly which record in the batch had the failure and only retry it (and the following records).
EDIT
If any other type of exception is thrown, the DefaultErrorHandler falls back to a FallbackBatchErrorHandler, which calls ErrorHandlingUtils.retryBatch(); that pauses the consumer and redelivers the whole batch without seeking and re-polling (the polls within the retry loop return no records because the consumer is paused).
See the documentation: https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh
This is required, because there is no guarantee that the batch will be fetched in the same order after a seek.
This is because we need to know the state of the batch (how many times we have retried). We can't do that if the batch keeps changing; hence the algorithm I described above.
To retry indefinitely you can, for example, use a FixedBackOff with Long.MAX_VALUE in the maxAttempts property. Or use an ExponentialBackOff with no termination.
Just be sure that the largest back off (and time to process a batch) is significantly less than max.poll.interval.ms to avoid a rebalance.
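For example, a minimal sketch against the question's kafkaContainer (MyFatalException is a hypothetical exception type, used only to illustrate classification):

import org.springframework.kafka.listener.BatchListenerFailedException;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

// Retry a failing batch 3 times, 1 second apart; use FixedBackOff.UNLIMITED_ATTEMPTS
// (Long.MAX_VALUE) instead of 3L to retry indefinitely.
DefaultErrorHandler errorHandler = new DefaultErrorHandler(new FixedBackOff(1000L, 3L));
errorHandler.addNotRetryableExceptions(MyFatalException.class); // hypothetical: never retried
kafkaContainer.setCommonErrorHandler(errorHandler);

Then, inside the batch listener, pinpoint the failed record so that only it and the records after it are retried:

// The error handler commits the offsets of the records before the failed one
// and retries from the failure onwards.
throw new BatchListenerFailedException("processing failed", failedRecord);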

TransactionId prefix for producer-only and read-process-write - ProducerFencedException

Background: We have been getting ProducerFencedExceptions in our producer-only transactions and want to introduce uniqueness to our prefix to prevent this issue.
In this discussion, Gary mentions that in the case of read-process-write, the prefix must be the same in all instances and after each restart.
How to choose Kafka transaction id for several applications, hosted in Kubernetes?
While digging into this issue, I came to the realisation that we are sharing the same prefixId for both producer-only and read-process-write.
In our TopicPublisher class wrapping kafkaTemplate, we already have publish() and publishInTransaction() methods for the read-process-write and producer-only use cases, respectively.
I am thinking of having two sets of kafkaTemplates/TransactionManagers/ProducerFactories: one with a fixed prefixId to be used by publish(), and one with a unique prefix to be used by publishInTransaction().
My question is:
Does the prefix for producer-only need to be the same after a pod is restarted? Can we just append some UUID or k8s podId? Someone mentioned there may be delays with aborting transactions.
Is there a clean way to detect whether the TopicPublisher is being called from a KafkaListener, so we can have just one publish method that uses the correct kafkaTemplate as needed?
Actually, there is no issue using the same transactionIdPrefix, at least with recent versions.
The factory gets a txIdPrefix.
For read-process-write, we create (and cache) a producer with transactionalId:
private String zombieFenceTxIdSuffix(String topic, int partition) {
    return this.consumerGroupId + "." + topic + "." + partition;
}
which is suffixed onto the prefix.
For producer-only transactions, we create (and cache) a producer with the prefix and a simple numeric suffix.
In the upcoming 2.3 release, there is also an option to assign a producer to a thread so the same thread always uses the same transactional.id.
I believe it needs to be the same, unless you don't mind waiting for transaction.timeout.ms (default 1 minute).
The maximum amount of time in ms that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction. If this value is larger than the transaction.max.timeout.ms setting in the broker, the request will fail with an InvalidTransactionTimeout error.
This is what we do in spring-integration-kafka
if (this.transactional
        && TransactionSynchronizationManager.getResource(this.kafkaTemplate.getProducerFactory()) == null) {
    sendFuture = this.kafkaTemplate.executeInTransaction(t -> {
        return t.send(producerRecord);
    });
}
else {
    sendFuture = this.kafkaTemplate.send(producerRecord);
}
You can also use String suffix = TransactionSupport.getTransactionIdSuffix();, which is what the factory uses when it is asked for a producer; if it is null, you are not running on a transactional consumer thread.
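So a single publish() method could branch on that suffix, for example (a sketch; producerOnlyTemplate and listenerTemplate are hypothetical names for the two KafkaTemplates proposed in the question):

public void publish(ProducerRecord<String, String> record) {
    if (TransactionSupport.getTransactionIdSuffix() == null) {
        // not on a transactional consumer thread: run a producer-only (local) transaction
        this.producerOnlyTemplate.executeInTransaction(t -> t.send(record));
    }
    else {
        // read-process-write: participate in the consumer-initiated transaction
        this.listenerTemplate.send(record);
    }
}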

How to get specified message from Azure Service Bus Topic and then delete it from Topic?

I’m writing functionality for receiving messages from an Azure Service Bus Topic and deleting a specified message from the Topic. Before deleting the message, I need to send it to another Topic.
static async Task ProcessMessagesAsync(Message message, CancellationToken token)
{
    // Process the message.
    Console.WriteLine($"Received message: WorkOrderNumber:{message.MessageId} SequenceNumber:{message.SystemProperties.SequenceNumber} Body:{Encoding.UTF8.GetString(message.Body)}");
    Console.WriteLine("Enter the WorkOrder Number you want to delete:");
    string workOrderNumber = Console.ReadLine();
    if (message.MessageId == workOrderNumber)
    {
        // TODO: Post message into other topic (Priority), then delete from this current topic.
        var status = await SendMessageToBus(message);
        if (status == true)
        {
            await normalSubscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
            Console.WriteLine($"Successfully deleted your message from Topic:{NormalTopicName}-WorkOrderNumber:" + message.MessageId);
        }
        else
        {
            Console.WriteLine($"Failed to send message to PriorityTopic:{PriorityTopicName}-WorkOrderNumber:" + message.MessageId);
        }
    }
    else
    {
        Console.WriteLine($"Failed to delete your message from Topic:{NormalTopicName}-WorkOrderNumber:" + workOrderNumber);
        // Complete the message so that it is not received again.
        // This can be done only if the subscriptionClient is created in ReceiveMode.PeekLock mode (which is the default).
        await normalSubscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
        // Note: Use the cancellationToken passed as necessary to determine if the subscriptionClient has already been closed.
        // If subscriptionClient has already been closed, you can choose to not call CompleteAsync() or AbandonAsync() etc.
        // to avoid unnecessary exceptions.
    }
}
My issues with this approach are:
It's not scalable; what if the message is the 50th in the collection? We'd have to iterate through the first 49 and complete each of them.
It's a long-running process.
To avoid these problems, I want to fetch the specified message from the queue by index or sequence number and then delete it from the topic.
So, can anyone suggest how to resolve this problem?
So if I understand your questions and comments correctly, you are trying to do something like this:
1. Incoming messages come into either a standard topic or a priority topic.
2. Some process checks messages in the standard topic and "moves" them to the priority topic based on some criteria, by deleting them from the standard topic and adding them to the priority topic.
3. Messages are processed as normal.
As Sean noted, step 2 simply won't work. Service Bus is a first-in-first-out-ish system where a consumer simply picks up the next available message. You can sort through a queue by pulling out all the messages and abandoning/completing them based on specific criteria, but scaling is a problem. In addition, you can think of each topic subscription as its own separate queue: removing a message from one subscription does not remove it from any of the other subscriptions.
Instead of trying to pull everything out of the topics and then putting back the ones you want to keep, I would suggest adding a sorting queue in front of the two topics. If you don't need to sort the high-priority messages, you could put the sorting process in front of the standard-priority topic only.
This is how the process would work:
1. Incoming messages are added to a sorting queue. Note that this is a single queue, not a topic; at this point in the process we want to ensure there is only one copy of each message.
2. A sorting process moves messages from the sorting queue into either the standard or the priority topic, as appropriate. Using something like Azure Functions, you can scale this process fairly easily. A sketch follows this list.
3. Messages are processed from the topics as normal.
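As referenced above, a minimal sketch of the sorting process (step 2), written in Java with the azure-messaging-servicebus SDK to match the Java used elsewhere on this page; the queue/topic names, the connection-string environment variable, and the isHighPriority() rule are all assumptions:

import com.azure.messaging.servicebus.*;

public class SortingProcess {

    public static void main(String[] args) {
        ServiceBusClientBuilder builder = new ServiceBusClientBuilder()
                .connectionString(System.getenv("SERVICEBUS_CONNECTION")); // assumed env var
        ServiceBusReceiverClient sorter = builder.receiver().queueName("sorting-queue").buildClient();
        ServiceBusSenderClient standard = builder.sender().topicName("standard-topic").buildClient();
        ServiceBusSenderClient priority = builder.sender().topicName("priority-topic").buildClient();

        // Forward each message from the sorting queue to the appropriate topic.
        for (ServiceBusReceivedMessage msg : sorter.receiveMessages(10)) {
            ServiceBusSenderClient target = isHighPriority(msg) ? priority : standard;
            target.sendMessage(new ServiceBusMessage(msg.getBody()));
            sorter.complete(msg); // remove from the sorting queue only after forwarding
        }
    }

    // Hypothetical routing rule, e.g. based on an application property set by the sender.
    private static boolean isHighPriority(ServiceBusReceivedMessage msg) {
        return Boolean.TRUE.equals(msg.getApplicationProperties().get("highPriority"));
    }
}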

Spring Kafka and MDC

I have a Kafka consumer implemented as :
@KafkaListener(topics = "...", groupId = "...")
public void doProcessing(@Payload String data, @Headers Map<String, Object> headers) {
    // read one of the headers to get the unique id for this message
    // and set that header value in the MDC
    String messageUniqueIdentifier = (String) headers.get("myRequestIdentifierKey");
    MDC.put("myRequestIdentifierKey", messageUniqueIdentifier);
    log.info("logging just to see if the unique identifier comes in the logs or not");
    // do some processing
}
Is this a safe approach? Is it always guaranteed that the same thread will service one message in the consumer?
It's not clear what you are asking. If you have concurrency, there will be more than one thread, but each message will be processed on a single thread (as long as your listener doesn't hand off to another thread).
So, as long as you set the MDC in the listener each time and don't hand off to another thread, it will work.
If you have only one thread, the same thread will be used for every message (unless the container is stopped and restarted, in which case it will get a new thread).
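One caveat worth adding: container threads are reused across messages, so it is safer to clear the MDC entry when the listener exits; otherwise a later message without the header would inherit the previous message's identifier. A sketch, using the header key from the question:

@KafkaListener(topics = "...", groupId = "...")
public void doProcessing(@Payload String data, @Headers Map<String, Object> headers) {
    MDC.put("myRequestIdentifierKey", (String) headers.get("myRequestIdentifierKey"));
    try {
        // process the message; all logging on this thread carries the identifier
    }
    finally {
        MDC.remove("myRequestIdentifierKey"); // don't leak the id into the next message's logs
    }
}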

Submit a task to an ExecutorService using a ScheduledExecutorService

I'm developing a JavaFX application that reads data from a serial device and shows a notification when a new device is connected to the computer.
I have a task, DeviceDetectorTask, which scans all the ports and fires an event when a new device is connected. This task must be submitted every 3 seconds.
When a device is detected, the user can press a button to read all the data it contains. This is performed by another task, ReadDeviceTask. While ReadDeviceTask is running, scan operations must not be performed (I cannot read and scan one port at the same time), so only one of the two tasks can be running at a time.
My actual solution is:
public class DeviceTaskQueue {

    private ExecutorService executorService = Executors.newSingleThreadExecutor();

    public void submit(Runnable task) {
        executorService.submit(task);
    }
}

public class ScanScheduler {

    private ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        AddScanTask task = new AddScanTask();
        executor.scheduleAtFixedRate(task, 0, 3, TimeUnit.SECONDS);
    }
}

public class AddScanTask implements Runnable {

    @Autowired
    DeviceTaskQueue deviceTaskQueue;

    @Override
    public void run() {
        deviceTaskQueue.submit(new DeviceDetectorTask());
    }
}

public class ViewController {

    @Autowired
    DeviceTaskQueue deviceTaskQueue;

    @FXML
    private void readDataFromDevice() {
        deviceTaskQueue.submit(new ReadDeviceTask());
    }
}
My question is: is it ok to add a task to the ExecutorService from the task AddScanTask which has been scheduled by the ScheduledExecutorService?
Yes, an Executor May Post a Task to Another Executor
To answer your simple question in last line:
is it ok to add a task to the ExecutorService from the task AddScanTask which has been scheduled by the ScheduledExecutorService?
Yes. Certainly you can submit a Callable/Runnable from any other code. That the submitting code happens to be running from another executor is irrelevant, as code run from an executor is still “normal” Java code, just running on a different thread.
That is the whole point of the executor, to handle the juggling of threads in a manner convenient to you the programmer. Making multi-threaded coding easier and less error-prone is why these classes were added to Java. See the extremely helpful book, Java Concurrency in Practice by Brian Goetz et al. And see other writings by Goetz.
In your case you have two executors each with their own thread, each executing a series of submitted tasks. One has tasks submitted automatically (timed) while the other has tasks submitted manually (arbitrarily). Each executes on their own thread independent of one another. With multiple cores they may execute simultaneously.
Therein lies the bigger problem: In your scenario you don't want them to be independent. You want the reading tasks to block the scanning tasks.
Bigger Problem
The problem you present is that a regularly occurring activity (scanning) must halt when an arbitrary event (reading) happens. That means the two activities must coordinate with one another. The question is how to coordinate.
Semaphores
When the arbitrary event is happening, it should raise a flag. The recurring activity, when it runs, should always check for that flag. If raised, wait until the flag lowers before proceeding with the scan. The ScheduledExecutorService is designed for this, tolerating a task that may run longer than the scheduled period. If one execution of the task runs long, the SES does not start the next execution until the current one finishes, so it does not pile up a backlog of executions. That is just the behavior you want.
Vice versa, if the recurring activity is executing, it should raise a flag. The arbitrary event's first to-do item is to check for that flag. If raised, wait until it is lowered. Then proceed: first raise its own flag, then proceed with the task at hand (reading).
Perhaps your scenario should be designed with a single flag rather than scanner and reader each having their own. I would have to think about it more and probably know more about your scenario.
The technical term for such flags is semaphore.
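For illustration, a minimal sketch of the single-flag variant (the class and method names are hypothetical): both activities acquire the same permit, so a scan and a read can never overlap.

import java.util.concurrent.Semaphore;

public class PortGate {

    private final Semaphore permit = new Semaphore(1);

    public void scan() throws InterruptedException {
        permit.acquire(); // blocks while a read (or another scan) holds the port
        try {
            // ... scan the ports ...
        } finally {
            permit.release();
        }
    }

    public void read() throws InterruptedException {
        permit.acquire(); // blocks until any in-flight scan finishes
        try {
            // ... read the device data ...
        } finally {
            permit.release();
        }
    }
}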
Unfortunately your comment says you cannot alter the scanner’s source code. So you cannot implement the semaphores and coordinate the activities. So I am stuck, cannot see a solution.
Hack
Given your frozen code, one hack solution, which I do not recommend, is for the regularly occurring activity (the scanning) not to actually do the work but instead to post a scanning task on another thread (another executor). That other executor would also be the same executor used to post the arbitrary activity (the reading). So there is one single queue of to-do items, a mix of scanning and reading jobs, submitted to a single-thread executor. The single thread means they get done one at a time, in the sequence of their submission.
I do not like this hack because if any of the to-do items takes a long while you will begin to accumulate a backlog. That could be a mess.
By the way, no need for the DeviceTaskQueue in your example code. Just call the instance of the ExecutorService directly to submit a task. That is the job of an ExecutorService, and wrapping it adds no value that I can see.
