How to retry in Kafka with separate retry topics and exponential backoff - spring-kafka

I want to implement a retry architecture with backoffs in Spring Kafka (Kafka version 2.3.6).
Requirement.
An event is first published to main-topic.
If processing fails, the event is pushed to retry-topic-2s after 2s. This retry topic has its own processing code, separate from the main topic's (and consequently a different listener function).
If processing fails in retry-topic-2s, the event is pushed to retry-topic-6s after 6s. This retry topic also has its own processing code, separate from the previous retry topic's.
Finally, if processing fails again, the event is pushed to the DLT (dead-letter-topic).
Problem.
How can I push events into a custom topic with a delay (without using Thread.sleep)?
I have tried a ContainerCustomizer that sends messages to retry-topic-2s directly, without any delay, and applies the retry backoffs in that topic. But this does not satisfy the requirement.
private ContainerCustomizer<String, String, ConcurrentMessageListenerContainer<String, String>> customizer(KafkaTemplate<String, String> template) {
    return container -> {
        ExponentialBackOff exponentialBackOff = new ExponentialBackOff(retryFixedBackoff, retryMultiplier);
        exponentialBackOff.setMaxElapsedTime(retryMaxElapsedTime);
        if (container.getContainerProperties().getTopics()[0].equals(callbackRetryTopic)) {
            container.setErrorHandler(new SeekToCurrentErrorHandler(
                    new DeadLetterPublishingRecoverer(template,
                            (cr, ex) -> new TopicPartition(callbackDeadLetterTopic, cr.partition())),
                    exponentialBackOff));
        }
    };
}
Can someone help me with this use case?

This is non-trivial to build yourself, and it is now supported natively by all supported versions of Spring for Apache Kafka. There is a lot of code involved in supporting this feature.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retry-topic
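The routing the question asks for can be sketched without any Kafka dependency. This is only an illustration of the topic chain and the delay arithmetic, not how Spring for Apache Kafka implements its retry-topic feature. RetryChain, Hop, nextAfterFailure and remainingDelayMillis are hypothetical names; the topic names and delays are taken from the question.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical routing table for the topic chain described in the question. */
final class RetryChain {
    /** One hop: the topic to publish to and how long to wait before processing there. */
    static final class Hop {
        final String topic;
        final long delayMillis;

        Hop(String topic, long delayMillis) {
            this.topic = topic;
            this.delayMillis = delayMillis;
        }
    }

    // Topic names and delays come from the question; list order encodes the chain.
    private static final List<Hop> HOPS = List.of(
            new Hop("main-topic", 0L),
            new Hop("retry-topic-2s", 2_000L),
            new Hop("retry-topic-6s", 6_000L),
            new Hop("dead-letter-topic", 0L));

    /** Returns the hop that a record failing on currentTopic should move to, if any. */
    static Optional<Hop> nextAfterFailure(String currentTopic) {
        for (int i = 0; i < HOPS.size() - 1; i++) {
            if (HOPS.get(i).topic.equals(currentTopic)) {
                return Optional.of(HOPS.get(i + 1));
            }
        }
        return Optional.empty(); // unknown topic, or already in the dead-letter topic
    }

    /** Milliseconds still to wait before a record published at publishedAtMillis is due. */
    static long remainingDelayMillis(long publishedAtMillis, long delayMillis, long nowMillis) {
        return Math.max(0L, publishedAtMillis + delayMillis - nowMillis);
    }
}
```

A retry-topic listener could, under these assumptions, call remainingDelayMillis with the record timestamp to decide whether a record is due yet, rather than blocking the consumer thread in Thread.sleep.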


ASP.NET Core multithreaded background threads

Using ASP.NET Core .NET 5. Running on Windows.
Users upload large workbooks that need to be converted to a different format. Each conversion process is CPU intensive and takes around a minute to complete.
The idea is to use a pattern where the requests are queued in a background queue and then processed by background tasks.
So, I followed this Microsoft article
The queuing part worked well but the issue was that workbooks were executing sequentially in the background:
private async Task BackgroundProcessing(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        var workItem =
            await TaskQueue.DequeueAsync(stoppingToken);
        try
        {
            await workItem(stoppingToken);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Error occurred executing {WorkItem}.", nameof(workItem));
        }
    }
}
If I queued 10 workbooks, workbook 2 wouldn't start until workbook 1 was done, workbook 3 wouldn't start until workbook 2 was done, and so on.
So, I modified the code to run tasks without await and hid the warning with the discard operator (please note workItem is now Action, not Task):
while (!stoppingToken.IsCancellationRequested)
{
    var workItem = await TaskQueue.DequeueAsync(stoppingToken);
    _ = Task.Factory.StartNew(() =>
    {
        try
        {
            workItem(stoppingToken);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error occurred executing {WorkItem}.", nameof(workItem));
        }
    }, TaskCreationOptions.LongRunning);
}
That works -- I get all workbooks starting processing around the same time, and then they complete around the same time too. But, I am not sure if doing this is dangerous and can lead to bugs, crashes, etc.
Is the second version a workable solution, or will it lead to some disaster in the future? Is there a better way to implement parallel workloads on the background threads in ASP.NET?
Thanks.
Using an external queue has some advantages over in-memory queueing. In particular, the queue messages are stored in a reliable external store with features around retries, multiple consumers, etc. If your app crashes, the queue item remains and can be tried again.
In Azure, you can use several services including Azure Storage Queues and Service Bus. I like Service Bus because it uses push-based behavior to avoid the need for a polling loop in your code. Either way, you can create an instance of IHostedService that will watch the queue and process the work items in a separate thread with configurable parallelization.
Look for examples of using Service Bus within ASP.NET Core, for example:
https://damienbod.com/2019/04/23/using-azure-service-bus-queues-with-asp-net-core-services/
"The idea is to use a pattern where the requests are queued in a background queue and then processed by background tasks."
The proper solution for request-extrinsic code is to use a durable queue with a separate backend processor. Any in-memory solution will lose that work any time the application is shut down (e.g., during a rolling upgrade).
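The "configurable parallelization" mentioned above can be sketched as a worker that caps how many items run at once, instead of starting one unbounded task per item. The sketch is in Java rather than C# purely for illustration; BoundedWorkers and runAll are hypothetical names, and the in-memory job list stands in for whatever queue the hosted service would watch.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedWorkers {
    /** Runs every job on at most maxParallel threads; returns the peak concurrency observed. */
    static int runAll(List<Runnable> jobs, int maxParallel) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (Runnable job : jobs) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    job.run();
                } catch (RuntimeException ex) {
                    // A failing job must not take the worker down; log it and carry on.
                    System.err.println("job failed: " + ex);
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new jobs, let the queued ones finish
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

For CPU-bound conversions, a cap close to the number of available cores is a reasonable starting point, since running all ten workbooks at once only makes them compete for the same cores.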

Cosmos Change Feed - errors, exceptions and service failure scenarios

All,
I am using the Change Feed Processor Library. I want to know the best way to handle service failures, as well as exceptions and errors, in the ProcessChangesAsync method. Below are the scenarios I am referring to.
1) Service failure - The service hosting the processor library crashed in the middle of some operation. How do I restart processing from the same document (the document being processed at the time of failure)? Is there any built-in mechanism by which the change feed will start with the last failed documents? E.g., let's assume the current batch has 10 docs. 5 are processed successfully and then the service breaks because of a network failure or some other reason. Will my process start with the 6th document once the service is restarted? How can I achieve this?
2) Exceptions and errors - Any error in the ProcessChangesAsync method can be handled with a try/catch at the global level, but how do I persist the failed records and make them available for the next batch? Again, I am looking for any available built-in mechanism in the change feed processor.
1) The Processor Library, by default, checkpoints after a successful run of ProcessChangesAsync. In the latest library version, you can customize the Checkpointer to do manual checkpoints in case you need it. If for some reason the processor shuts down before checkpointing, then it will resume processing from the last successful checkpoint stored in the Leases collection. In your case, it will start with the first document of the batch again, so you will never lose a change but you could experience double processing (this is an "at least once" model).
2) There is no built-in mechanism that you can leverage; handling exceptions within ProcessChangesAsync is your responsibility. You could not only add a global try/catch but, if you are looping over the documents, also add a try/catch inside the loop to handle a failing document (maybe send it to a queue for later analysis/post-processing) without losing the batch. If you require logging for those errors (I'm assuming that's what you mean by persisting errors?), then the latest version is compatible with LibLog, so plugging in your own custom logging is as simple as:
using Microsoft.Azure.Documents.ChangeFeedProcessor.Logging;

var hostName = "SampleHost";
var tracelogProvider = new TraceLogProvider(); // You can use any provider supported by LibLog
using (tracelogProvider.OpenNestedContext(hostName))
{
    LogProvider.SetCurrentLogProvider(tracelogProvider);
    // After this, create IChangeFeedProcessor instance and start/stop it.
}
Source
Extra info for the comments
To avoid exceptions halting the batch or causing a batch to be reprocessed, you can have handling like this:
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    try
    {
        foreach (var document in documents)
        {
            try
            {
                // Do your work for the document
            }
            catch (Exception ex)
            {
                // Something happened with the current document: handle it, send it to a queue / another storage to analyze, log it.
                // This catch will make the loop continue with the next document.
            }
        }
    }
    catch (Exception ex)
    {
        // Something unhandled happened, log it and avoid throwing it again so the next batch is processed
    }
}
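The "at least once" behaviour from point 1 can be illustrated with a small simulation. It is written in plain Java and does not use the Change Feed Processor Library; CheckpointDemo, its fields and the crashAtDoc parameter are hypothetical names chosen for the sketch. The checkpoint only advances after a whole batch succeeds, so a crash on document 6 means documents 1 to 5 are handled again after restart.

```java
import java.util.ArrayList;
import java.util.List;

final class CheckpointDemo {
    private int checkpoint = 0;                       // index of the first unprocessed document
    final List<Integer> handled = new ArrayList<>();  // every document the handler has seen

    /** Processes documents from the checkpoint on; the checkpoint moves only after a full batch. */
    void run(List<Integer> docs, int batchSize, int crashAtDoc) {
        while (checkpoint < docs.size()) {
            int end = Math.min(checkpoint + batchSize, docs.size());
            for (int doc : docs.subList(checkpoint, end)) {
                if (doc == crashAtDoc) {
                    // Simulates the service dying mid-batch, before any checkpoint is written.
                    throw new IllegalStateException("simulated crash at document " + doc);
                }
                handled.add(doc);
            }
            checkpoint = end; // checkpoint only after the batch completed successfully
        }
    }
}
```

Calling run once with crashAtDoc set to 6 and then again with a value that never matches leaves handled containing documents 1 to 5 followed by 1 to 10, which is why the per-document work should be safe to repeat.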

How to track async transactions in AppDynamics 4.4

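Consider the code below:
import java.util.concurrent.ExecutorService;

public class Job {
    private final ExecutorService executorService;

    public Job(ExecutorService executorService) {
        this.executorService = executorService;
    }

    public void process() {
        executorService.submit(() -> {
            // do something slow
        });
    }
}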
I could use an AppDynamics "Java POJO" rule to create a business transaction that tracks all calls to the Job.process() method. But the measured response time didn't reflect the real cost of the async work started through java.util.concurrent.ExecutorService. This exact problem is also described in the AppDynamics document End-to-End Latency Performance:
The return of control stops the clock on the transaction in terms of measuring response time, but meanwhile the logical processing for the transaction continues.
The same AppDynamics document offers a solution to this issue, but the instructions it provides are not very clear to me.
Could anyone give a more concrete guide on how to configure AppD to track async calls like the one shown above?
It seems that you should be able to define a custom Asynchronous Transaction Demarcator as described in: https://docs.appdynamics.com/display/PRO44/Asynchronous+Transaction+Demarcators
which points to the last method of the Runnable that you pass to the Executor. Then, according to the documentation, all you need to do is attach the Demarcator to your Business Transaction and it will collect the asynchronous call.
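The gap between the two measurements can be reproduced in plain Java, which may help when checking whether a demarcator is picking up the right end point. This sketch does not use any AppDynamics API; AsyncTimingDemo and measure are hypothetical names, and the sleep stands in for "do something slow".

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class AsyncTimingDemo {
    /** Returns {millis until submit() returned, millis until the async task finished}. */
    static long[] measure(long taskMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            Future<?> done = executor.submit(() -> {
                try {
                    Thread.sleep(taskMillis); // stands in for "do something slow"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            long returned = System.nanoTime(); // control is back with the caller here
            done.get();                        // the logical work only ends here
            long finished = System.nanoTime();
            return new long[] {
                TimeUnit.NANOSECONDS.toMillis(returned - start),
                TimeUnit.NANOSECONDS.toMillis(finished - start)
            };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

The first value corresponds to what a rule on process() alone would see; the second is the end-to-end latency, which ends at the last method of the submitted Runnable.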

How is the batch asynchronous approach with the Intuit SDK any different from the batch synchronous approach?

I have looked at the documentation for both synchronous and asynchronous approaches for the QuickBooks Online API V3. They both allow the creation of a data object and the adding of requests to a batch operation followed by the execution of the batch. Both sets of documentation state:
"Batch items are executed sequentially in the order specified in the request..."
This confuses me because I don't understand how asynchronous processing is allowed if the batch process executes each batch operation sequentially.
The documentation for asynchronous processing states at the top:
"To asynchronously access multiple data objects in a single request..."
I don't understand how this can occur if batch operations are executed sequentially within a batch process request.
Would someone kindly clarify?
In an async call (from the devkit), the calling thread doesn't wait for the response from the service. You can associate a handler that will take care of the response.
For example:
public void asyncAddAccount() throws FMSException, Exception {
    Account accountIn = accountHelper.getBankAccountFields();
    try {
        // The call returns immediately; the handler runs when the response arrives.
        service.addAsync(accountIn, new CallbackHandler() {
            @Override
            public void execute(CallbackMessage callbackMessage) {
                callbackMessageResult = callbackMessage;
                lock_add.countDown();
            }
        });
    } catch (FMSException e) {
        Assert.assertTrue(false, e.getMessage());
    }
    // Block the test thread until the callback has fired.
    lock_add.await();
    Account accountOut = (Account) callbackMessageResult.getEntity();
    Assert.assertNotNull(accountOut);
    accountHelper.verifyAccountFields(accountIn, accountOut);
}
The server always executes the requests sequentially.
In a batch, if you specify multiple operations, the server will execute them sequentially (top-down).
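The distinction between an asynchronous caller and sequential server-side execution can be modelled in a few lines. This is a stand-in, not the Intuit SDK: SequentialBatchDemo and executeBatchAsync are hypothetical names, and a single-threaded executor plays the role of the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SequentialBatchDemo {
    // A single-threaded executor stands in for the server: it runs batch items one at a time.
    private final ExecutorService server = Executors.newSingleThreadExecutor();

    /** Submits the batch and returns immediately; the future completes with the execution order. */
    CompletableFuture<List<String>> executeBatchAsync(List<String> operations) {
        return CompletableFuture.supplyAsync(() -> {
            List<String> executed = new ArrayList<>();
            for (String op : operations) {
                executed.add(op); // items run top-down, in request order
            }
            return executed;
        }, server);
    }

    void close() {
        server.shutdown();
    }
}
```

The caller gets a future back straight away and is free to do other work, while the operations inside the batch still run one after another in the order given. "Asynchronous" describes the calling thread, not the order of execution on the server.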
Thanks

ActiveMQ Override scheduled message

I am trying to implement a delayed queue with overriding of messages using ActiveMQ.
Each message is scheduled to be delivered with a delay of x (say 60 seconds).
If the same message is received again in the meantime, it should override the previous message.
So even if I receive 10 messages within, say, x seconds, only one message should be processed.
Is there a clean way to accomplish this?
The question has two parts that need to be addressed separately:
Can a message be delayed in ActiveMQ?
Yes - see Delay and Schedule Message Delivery. You need to set <broker ... schedulerSupport="true"> in your ActiveMQ config, and set the AMQ_SCHEDULED_DELAY property of the JMS message to the delay you want in milliseconds (60000 for the 60 seconds in your example).
Is there any way to prevent the same message being consumed more than once?
Yes, but that's an application concern rather than an ActiveMQ one. It's often referred to as de-duplication or idempotent consumption. The simplest way, if you only have one consumer, is to keep track of messages received in a map, and check that map whenever you receive a message. If the message has been seen, discard it.
For more complex use cases where you have multiple consumers on different machines, or you want that state to survive application restart, you will need to keep a table of messages seen in a database, and query it each time.
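A minimal sketch of the single-consumer case, assuming each message carries some key that identifies "the same message". IdempotentConsumer, onMessage and messageKey are hypothetical names, and the JMS listener wiring is left out. The set is held in memory, so it neither survives a restart nor shrinks on its own.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class IdempotentConsumer {
    // Keys of messages already processed; thread-safe so several listener threads can share it.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Consumer<String> delegate;

    IdempotentConsumer(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    /** Processes payload only the first time messageKey is seen; returns whether it was processed. */
    boolean onMessage(String messageKey, String payload) {
        if (!seen.add(messageKey)) {
            return false; // duplicate: discard
        }
        delegate.accept(payload);
        return true;
    }
}
```

Note that this keeps the first copy of a message and drops later ones. If the latest copy must win, as the question's "override" suggests, the stored state would need to hold the newest payload per key instead.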
Please vote this answer up if it helps, as it encourages people to help you out.
Also, according to the isSchedulerSupport method of the ActiveMQ BrokerService class, you need to configure persistence (or a job scheduler store) to be able to use the scheduler functionality.
public boolean isSchedulerSupport() {
    return this.schedulerSupport && (isPersistent() || jobSchedulerStore != null);
}
You can configure the ActiveMQ broker to enable schedulerSupport with the following entry in the activemq.xml file, located in the conf directory of your ActiveMQ home directory.
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.data}" schedulerSupport="true">
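You can override the BrokerService in your configuration:
import org.apache.activemq.broker.BrokerService;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.annotation.EnableJms;

@Configuration
@EnableJms
public class JMSConfiguration {
    @Bean
    public BrokerService brokerService() throws Exception {
        BrokerService brokerService = new BrokerService();
        // Enables delayed/scheduled delivery on the embedded broker.
        brokerService.setSchedulerSupport(true);
        return brokerService;
    }
}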
