Out-of-the-box capabilities for Spring-Kafka consumer to avoid duplicate message processing - spring-kafka

I stumbled over Handling duplicate messages using the Idempotent consumer pattern:
Similar, but slightly different, is the Transactional Inbox pattern, which acknowledges the Kafka message receipt after a transactional INSERT into a messages table (no business transaction) has concluded successfully, and uses a background poller to detect new rows in this table and subsequently trigger the real business logic (i.e. the message listener).
Now I wonder: is there some Spring magic that lets me just provide a special DataSource config to track all received messages and discard duplicate message deliveries?
Otherwise, the application itself would need to take care of acking the Kafka message receipt, message state changes, data cleanup of the event table, retry after failure, and probably a lot of other difficult things that I have not yet thought about.

The framework does not provide this out of the box (there is no general solution that would work for everyone), but you can implement it via a filter, to avoid putting this logic in your listener.
https://docs.spring.io/spring-kafka/docs/2.7.9/reference/html/#filtering-messages
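The linked filtering mechanism lets you plug a `RecordFilterStrategy` into the listener container factory. The dedup bookkeeping itself is up to you; below is a minimal, framework-free sketch of that piece, assuming an in-memory set of seen message ids (the class name and wiring are illustrative; a production version would back this with the DataSource-tracked table the question asks about, plus cleanup/TTL).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal in-memory idempotency check (illustrative, not a Spring API).
// In production the "seen" set would be a DB table with periodic cleanup.
class DedupFilter {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true when this id was already processed and the record should be discarded. */
    boolean shouldFilter(String messageId) {
        return !seen.add(messageId); // add() returns false if the id was already present
    }
}
```

With Spring Kafka, this could then be hooked up via something like `factory.setRecordFilterStrategy(rec -> dedup.shouldFilter(rec.key().toString()))`, so duplicates never reach the `@KafkaListener` method.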

Related

Handling defunct deferred (timeout) messages?

I am new to Rebus and am trying to get up to speed with some patterns we currently use in Azure Logic Apps. The current target implementation would use Azure Service Bus with Saga storage preferably in Cosmos DB (still investigating that sample implementation). Maybe even use Rebus Mongo DB with Cosmos DB using the Mongo DB API (not sure if that is possible though).
One major use case we have is an event/timeout pattern, and after doing some reading of samples/forums/Stack Overflow this is not uncommon. The tricky part is that our Sagas would behave more as a Finite State Machine vs. a Directed Acyclic Graph. This mainly happens because dates are externally changed and therefore timeouts for events change.
The Defer() method does not return a timeout identifier, which we assume is an implementation restriction (Azure Service Bus returns a long). Since we must ignore timeouts that had been scheduled for an event which has now shifted in time, we see a way of having those timeouts "ignored" (since they cannot be cancelled) as follows:
Use a Dictionary<string, Guid> in our own SagaData-derived base class, where the key is some derivative of the timeout message type, and the Guid is the identifier given to the timeout message when it was created. I don't believe this needs to be a concurrent dictionary but that is why I am here...
On receipt of the event message, remove the corresponding timeout message type key from the above dictionary;
On receipt of the timeout message:
Ignore it if its timeout message type key is not present or the Guid does not match the dictionary value; else
Process it. We could also remove the dictionary key at this point.
When event rescheduling occurs, simply add the timeout message type/Guid dictionary entry, or update the Guid with the new timeout message Guid.
Is this on the right track, or is there a more 'correct' way of handling defunct timeout (deferred) messages?
You are on the right track 🙂
I don't believe this needs to be a concurrent dictionary but that is why I am here...
Rebus lets your saga handler work on its own copy of the saga data (using optimistic concurrency), so you're free to model the saga data as if it is only being accessed by one handler at a time.
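The bookkeeping steps from the question can be sketched in plain code. The following is a Java stand-in for the proposed `Dictionary<string, Guid>` on the saga data (names are illustrative, not Rebus APIs); per the answer above, a plain map suffices because each handler works on its own copy of the saga data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the timeout-ignore bookkeeping described above.
class TimeoutBook {
    private final Map<String, UUID> pendingTimeouts = new HashMap<>();

    /** A timeout was deferred: remember its id, replacing any stale entry (step on rescheduling). */
    void scheduled(String timeoutType, UUID timeoutId) {
        pendingTimeouts.put(timeoutType, timeoutId);
    }

    /** The real event arrived: any still-pending timeout for it is now defunct. */
    void eventReceived(String timeoutType) {
        pendingTimeouts.remove(timeoutType);
    }

    /** A timeout message arrived: process it only if it is the one still expected. */
    boolean shouldProcess(String timeoutType, UUID timeoutId) {
        boolean current = timeoutId.equals(pendingTimeouts.get(timeoutType));
        if (current) {
            pendingTimeouts.remove(timeoutType); // clean up once handled
        }
        return current;
    }
}
```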

DB Transaction and Integrations Events dispatch - how to make it atomic?

I'm designing a system with multiple bounded contexts (microservices). I will have 2 kinds of events.
Domain Events, which happen "in memory" within a single transaction (sync)
Integration Events, which are used between bounded contexts (async)
My problem is how to make sure that, once the transaction is committed (at which point I'm sure all Domain Events were processed successfully), the Integration Events are successful as well.
When my transaction is committed, normally I will dispatch the Integration Events (e.g. to the queue), but there is a possibility that this queue is down as well, so the just-committed transaction has to be "reverted". How?
The only solution that comes to my mind is to store the Integration Events in the same DB, within the same transaction, and then process the Integration Event records and push them to the queue - this would be something like using the current DB as a "pre-queue" before pushing to the real queue (however, I read that using the DB for this is an anti-pattern).
Is there any pattern (reliable approach) to make sure that both the transaction commit and pushing the message to the queue happen as one atomic operation?
EDIT
After reading https://devblogs.microsoft.com/cesardelatorre/domain-events-vs-integration-events-in-domain-driven-design-and-microservices-architectures/ , the author actually suggests the approach of "pre-queue" in same DB (he calls it “ready to publish the event”).
Check out the transactional outbox pattern.
This pattern does create a pre-queue. But the nice part is that pushing messages from the pre-queue to the real queue is fully decoupled: a middleman called a message relay reads your transaction log (or polls the outbox table) and pushes your events to the real queue. Since sending the messages and your domain events are now fully decoupled, you can do all your domain events in a single transaction.
And make sure that all your services are idempotent (same result despite duplicate calls). The transactional outbox pattern does guarantee that messages are published, but in the case where the message relay fails just after publishing (before acknowledging), it would publish the same event again.
Idempotent services are also necessary in other scenarios, as the event bus (the real queue) could have the same issue: the event bus propagates an event, the service acknowledges it, then a network error occurs, and since the acknowledgment never reaches the event bus, the same event is sent again.
Actually, idempotence alone could solve the whole issue. After the domain-event computation completes (a single transaction), if publishing the message fails the service can simply throw an error without rolling back. Since the event is not acknowledged, the event bus will send the same event again. And since the service is idempotent, the same database transaction will not happen twice; it will basically overwrite, or better, (should) skip straight to message publishing and acknowledging.
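The outbox-plus-idempotent-consumer flow above can be sketched without any framework. The following toy model (all names illustrative) collapses the business table, outbox table, relay, and consumer into one class; in a real system the two writes in `handleCommand` would share one DB transaction, and the relay and consumer would be separate processes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Framework-free sketch of the transactional outbox + idempotent consumer.
class OutboxDemo {
    static final class Event {
        final String id, payload;
        Event(String id, String payload) { this.id = id; this.payload = payload; }
    }

    final List<String> businessRows = new ArrayList<>();    // stands in for the business table
    final Queue<Event> outbox = new ArrayDeque<>();         // stands in for the outbox table
    final Set<String> processedEventIds = new HashSet<>();  // consumer-side dedup store
    final List<String> consumed = new ArrayList<>();

    // Business change and outgoing event are "committed" together (one DB tx in reality).
    void handleCommand(String row, Event event) {
        businessRows.add(row);
        outbox.add(event);
    }

    // The relay drains the outbox. If it crashes before acknowledging,
    // the rows stay put and will be published again (at-least-once).
    void relayOnce(boolean crashBeforeAck) {
        List<Event> batch = new ArrayList<>(outbox);
        if (!crashBeforeAck) {
            outbox.clear();
        }
        batch.forEach(this::deliver);
    }

    // The consumer is idempotent: a redelivered event id is simply skipped.
    void deliver(Event e) {
        if (processedEventIds.add(e.id)) {
            consumed.add(e.payload);
        }
    }
}
```

Running the relay twice (once "crashing" before the ack) delivers the event twice, yet the consumer processes it exactly once, which is the whole point of pairing the outbox with idempotent handlers.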

Axon4 - Re-queue failed messages

In the scenario below, what would be the behavior of Axon?
The Command Bus receives the command
It creates an event
However, the messaging infrastructure is down (say, Kafka)
Does Axon have a re-queuing capability for events, or any other alternative to handle this scenario?
If you're using Axon, you know it differentiates between Command, Event and Query messages. I'd suggest being specific in your question about which message type you want to retry.
However, I am going to assume it's about events, as you mention Kafka.
If this is the case, I'd highly recommend reading the reference guide on the matter, as it states how you can decouple Kafka publication from actual event storage in Axon.
Simply put, use a TrackingEventProcessor as the means to publish events to Kafka, as this ensures a dedicated thread is used for publication instead of the same thread that stores the event. Additionally, a TrackingEventProcessor can be replayed, and can thus "re-process" events.

Not able to do async dispatch to consumer and understand how "prefetch limit" is relevant

My understanding was that the default behavior of ActiveMQ is to dispatch messages asynchronously to consumers, but when I tried to test it by doing a Thread.sleep(60000); in my MessageListener#onMessage(), the broker was not able to send queued messages until it received the acknowledgment for the previously dispatched message.
So I then tried to explicitly set the async flag, just in case, using ((ActiveMQConnectionFactory)connectionFactory).setDispatchAsync(true); as mentioned here, but saw the same behavior.
Is there a way in which I can make sure that my ActiveMQ broker doesn't get blocked if one of the consumers is taking a long time? Please note that I know and have read about "slow consumers", but this is not what I want; I want a truly async dispatch, where the broker sends the message and doesn't wait for any acknowledgement/response.
EDIT:
I just read about what-is-the-prefetch-limit-for and I am wondering: if the broker is sending messages to the consumer synchronously, then what's the point of the "prefetch limit"?
With the default configuration, ActiveMQ uses a dispatch thread per Queue - you can set the optimizedDispatch property on the destination policy entry - see Configuring Queues.
set the optimizedDispatch="true" in activemq.xml
optimizedDispatch :
Default Value : false
Description : Don't use a separate thread for dispatching from a Queue.
Note that by doing a Thread.sleep(60000); in MessageListener#onMessage() with a single consumer, the dispatcher for that consumer cannot send further messages.
UPDATE
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">" optimizedDispatch="true"/>
    </policyEntries>
  </policyMap>
</destinationPolicy>
queue=">" means all queues
EDIT by OP (hagrawal): To help future visitors catch the concept quickly, I am putting the core concept below in a nutshell; please feel free to read all the comments below to know more. Many thanks to #HassenBennour for clarifying all this.
If there are 2 consumers connected and messages are being produced, the broker will dispatch messages to those consumers round-robin. But suppose no consumer is connected and the broker has 4 messages enqueued: when a consumer connects with a prefetch limit of 3, the broker will deliver 3 messages to that consumer and then wait. If some other consumer gets connected in the meantime, the broker will immediately deliver the 4th message to it; otherwise it will wait for the acknowledgment of the 1st message before delivering the 4th message to the same consumer.
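The prefetch behavior summarized above can be modeled in a few lines. This is a toy sketch (not ActiveMQ internals; all names are made up): the broker pushes at most `prefetch` unacknowledged messages to each consumer, and an ack frees a slot for the next dispatch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of prefetch-limited dispatch, per the nutshell above.
class PrefetchDemo {
    static class Consumer {
        final int prefetch;
        final List<String> inFlight = new ArrayList<>(); // delivered, not yet acked
        Consumer(int prefetch) { this.prefetch = prefetch; }
    }

    final Queue<String> pending = new ArrayDeque<>();
    final List<Consumer> consumers = new ArrayList<>();

    void enqueue(String msg) { pending.add(msg); dispatch(); }
    void connect(Consumer c)  { consumers.add(c); dispatch(); }

    void ack(Consumer c) {                 // acking frees a prefetch slot
        if (!c.inFlight.isEmpty()) c.inFlight.remove(0);
        dispatch();
    }

    // Fill each consumer up to its prefetch limit with pending messages.
    private void dispatch() {
        for (Consumer c : consumers) {
            while (!pending.isEmpty() && c.inFlight.size() < c.prefetch) {
                c.inFlight.add(pending.poll());
            }
        }
    }
}
```

With 4 messages enqueued and one consumer at prefetch 3, the consumer holds 3 in-flight messages and one stays pending; a newly connected consumer immediately receives the 4th, exactly as described in the summary.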

Rebus Publish Exception Handling

Let's assume Rebus could not publish a message to RabbitMQ or some other queue; what is the best practice for handling this exception?
I stopped the RabbitMQ service and Rebus threw an AggregateException. I can manually catch this exception in a try-catch block, but is there a better solution for catching exceptions when such situations happen?
First off: If you get an exception when initially sending/publishing a message (e.g. while handling a web request), there's nothing you can do, really. Sorry ;)
You should probably log - thoroughly - all the information you can, and then be sure to set up logging so that the information ends up in a file or in some other persistent log. And then you should have some kind of notification or a process in place that ensures that someone will at some point look at the log.
You should probably have this kind of logging in place, regardless of the type of work you do.
Depending on how important your information is, you could also set up some kind of retry mechanism (although you should be careful that you do not consume threads and too much memory while retrying). Also, since your web application should be able to be recycled at any time, you probably should not rely (too much) on retries.
You can do some things, though, in order to minimize the risk of ending up in a situation where you can't send/publish.
I can recommend that you use some kind of high-availability transport, like MSMQ (because it has local outgoing queues), RabbitMQ (with a shovel on each machine), or Azure Service Bus or Azure Storage Queues if you're in Azure.
Moreover - if you were using MSMQ, and you want to publish an event - I would recommend that you await bus.Send(theEvent) first, and then when you handle the message, you await bus.Publish(theEvent). This is because Rebus (with the MSMQ transport) needs to do a lookup in the subscription storage in order to get all subscribers for the given event. This is not a problem with RabbitMQ though, because Rebus will use Rabbit's topics to do pub/sub and will be just as safe as doing an ordinary send.
When you're sending/publishing from within a Rebus message handler, there is of course no problem, since the receive operation will be rolled back, and eventually the incoming message will end up in an error queue.
I hope that cast some light on the situation :)
