Spring Kafka message processing telemetry

I'm trying to create some kind of "Kafka message processing graph" - which service is consuming which topics and what messages - with some additional metadata (processing duration, whether it was processed OK or whether it ended with an exception, ...).
I could create some interceptor that would be invoked before each message processing, but in the interceptor I don't know whether there is a handler for this type of event, nor do I know whether the message was later processed OK or ended up in the error handler.
For checking whether there is a handler, I suppose there is some registry I could peek into(?), but is there also some way of wrapping message processing (like filters in Spring MVC) so I can capture the processing duration and the end result?

Micrometer timers have been supported since 2.3 (for successes and failures).
https://docs.spring.io/spring-kafka/docs/current/reference/html/#micrometer
You can also add an AOP around advice to your listener beans.
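Here is a minimal sketch of that AOP approach, assuming Micrometer and Spring AOP are on the classpath; the aspect name and the kafka.listener.duration meter are made up for illustration:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class ListenerTimingAspect {

    private final MeterRegistry registry;

    public ListenerTimingAspect(MeterRegistry registry) {
        this.registry = registry;
    }

    // Wrap every @KafkaListener method, timing it and tagging the outcome.
    @Around("@annotation(org.springframework.kafka.annotation.KafkaListener)")
    public Object time(ProceedingJoinPoint pjp) throws Throwable {
        Timer.Sample sample = Timer.start(this.registry);
        String result = "success";
        try {
            return pjp.proceed();
        }
        catch (Throwable t) {
            result = "failure";
            throw t;
        }
        finally {
            sample.stop(Timer.builder("kafka.listener.duration")
                    .tag("listener", pjp.getSignature().toShortString())
                    .tag("result", result)
                    .register(this.registry));
        }
    }
}
```

With this in place, each listener invocation shows up as a timer tagged with the listener method and whether it succeeded or failed.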

Related

How to cancel a deferred Rebus message?

I am considering using deferred messages as a delay/timeout alerting mechanism within my Saga. I would do this by sending a deferred message at the same time as sending a message to a handler to do some time-consuming work.
If that handler fails to publish a response within the deferred timespan, the Saga is alerted to the timeout/delay via the deferred message's arrival. This is different from a handler failure, as the handler is still running, just more slowly than expected.
The issue comes if everything runs as expected: it's possible that the Saga will complete all of its steps, and you'd find many deferred messages waiting to be delivered to a Saga that no longer exists. Is there a way to clean up the deferred messages you know are no longer required?
Perhaps there is a nicer way of implementing this functionality in Rebus?
Once sent, deferred messages cannot be cancelled.
But Rebus happens to ignore messages that cannot be correlated with a saga instance, provided the saga handler does not allow that particular message type to initiate a new saga, so if the saga instance is gone, the message will simply be swallowed.
That's the difference between using IHandleMessages<CanBeIgnored> and IAmInitiatedBy<CannotBeIgnored> on your saga. 🙂

Tracking Event Processor - Retry Exceptions originating from Event Handlers

I am trying to configure an ErrorHandler for a TrackingEventProcessor so that any exception in my @EventHandler annotated methods will be retried.
I am currently using Axon Framework 4 and am looking for how to achieve this.
I'd recommend setting the PropagatingErrorHandler as the ListenerInvocationErrorHandler in this case.
Know that there are two error handling levels within a given Event Processor, namely:
The ListenerInvocationErrorHandler, catching the exceptions thrown from within your @EventHandler annotated methods. This defaults to a LoggingErrorHandler instance, which logs the exception.
The ErrorHandler, catching the transaction exceptions of the given EventProcessor. This defaults to a PropagatingErrorHandler, which rethrows the exceptions.
The TrackingEventProcessor (TEP) EventProcessor implementation is the one which will retry events with a certain (unconfigurable) back-off, if the configured ErrorHandler throws an exception.
If you want to retry on every exception (which might be a debatable approach anyhow), you'd thus want to enter the retry scheme of the TEP. To achieve this, you should simply configure the ListenerInvocationErrorHandler to also be a PropagatingErrorHandler, as in the sketch below. I'd also recommend reading the Reference Guide on the matter, to get a better idea of how to configure this.
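A minimal configuration sketch, assuming Axon 4's EventProcessingConfigurer API; the processing group name "my-processing-group" is made up:

```java
import org.axonframework.config.EventProcessingConfigurer;
import org.axonframework.eventhandling.PropagatingErrorHandler;

public class ProcessorErrorConfig {

    public void configure(EventProcessingConfigurer configurer) {
        // Rethrow @EventHandler exceptions so they reach the processor-level
        // ErrorHandler, which by default propagates them and thereby triggers
        // the TEP's retry/back-off cycle.
        configurer.registerListenerInvocationErrorHandler(
                "my-processing-group",
                configuration -> PropagatingErrorHandler.instance());
    }
}
```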

Looking for non-blocking spring kafka ErrorHandler

After using the SeekToCurrentErrorHandler, I am looking for a non-blocking Kafka ErrorHandler. Because of some unstable subsystems we need to set high retry intervals, such as 5 minutes or more, which would block our processing.
My idea is to use the topic itself to re-queue failing messages, but with two additional header values, kafka_try-counter and kafka_try-timestamp.
Based on the SeekToCurrentErrorHandler and the DeadLetterPublishingRecoverer, I implemented a draft of a RePublishingErrorHandler and a RePublishingRecoverer.
The RePublishingRecoverer updates the Kafka headers and produces the message to the same topic.
The RePublishingErrorHandler checks the header values and, if kafka_try-counter exceeds max-attempts, calls another ConsumerRecordRecoverer such as the DLT or logging recoverer.
The kafka_try-timestamp is used to determine the wait time of a message. If it comes back too fast, it should be re-queued without incrementing the try counter.
The expectation of this approach is to get a non-blocking listener.
Because I am new to the spring-kafka implementation, and to Kafka itself, I'm not sure whether this approach is OK.
I am also somewhat stuck in the implementation of that concept.
"My idea is to use the topic itself to re-queue failing messages."
That won't work; you would have to publish it to another topic and have a (delaying) consumer on that topic, perhaps polling at some interval rather than using a message-driven consumer. Then have that consumer publish it back to the original topic.
All of this assumes that strict ordering within a partition is not a requirement for you.
It's easy enough to subclass the DeadLetterPublishingRecoverer and override the createProducerRecord() method. Call super() and then add your headers.
Set the BackOff in the SeekToCurrentErrorHandler to have a zero back off and 0 retries to immediately publish to the DLT.
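A rough sketch of that subclass, written against a spring-kafka 2.5-era API (the exact createProducerRecord() signature varies across 2.x versions); the kafka_try-counter and kafka_try-timestamp headers come from the question, everything else is illustrative:

```java
import java.nio.ByteBuffer;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;

public class RePublishingRecoverer extends DeadLetterPublishingRecoverer {

    public RePublishingRecoverer(KafkaOperations<Object, Object> template) {
        // Republish to the same topic/partition instead of a dead-letter topic.
        super(template, (rec, ex) -> new TopicPartition(rec.topic(), rec.partition()));
    }

    @Override
    protected ProducerRecord<Object, Object> createProducerRecord(ConsumerRecord<?, ?> record,
            TopicPartition topicPartition, Headers headers, byte[] key, byte[] value) {

        // Let super() build the record, then stamp the retry metadata on it.
        ProducerRecord<Object, Object> out =
                super.createProducerRecord(record, topicPartition, headers, key, value);
        out.headers().add("kafka_try-counter",
                ByteBuffer.allocate(Integer.BYTES).putInt(nextAttempt(headers)).array());
        out.headers().add("kafka_try-timestamp",
                ByteBuffer.allocate(Long.BYTES).putLong(System.currentTimeMillis()).array());
        return out;
    }

    private int nextAttempt(Headers headers) {
        // lastHeader() returns the most recently added counter, or null on the first failure.
        Header previous = headers.lastHeader("kafka_try-counter");
        return previous == null ? 1 : ByteBuffer.wrap(previous.value()).getInt() + 1;
    }
}
```

Wiring this into new SeekToCurrentErrorHandler(recoverer, new FixedBackOff(0L, 0L)) then republishes the failed record immediately, with no blocking back-off on the consumer thread.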

Axon4 - Re-queue failed messages

In the below scenario, what would be the behavior of Axon?
The Command Bus receives the command.
It creates an event.
However, the messaging infra is down (say, Kafka).
Does Axon have re-queuing capability for events, or any other alternative to handle this scenario?
If you're using Axon, you know it differentiates between Command, Event and Query messages. I'd suggest being specific in your question about which message type you want to retry.
However, I am going to assume it's about events, as you're mentioning Kafka.
If this is the case, I'd highly recommend reading the reference guide on the matter, as it states how you can uncouple Kafka publication from actual event storage in Axon.
Simply put, use a TrackingEventProcessor as the means to publish events on Kafka, as this ensures a dedicated thread is used for publication instead of the same thread that stores the event. Additionally, the TrackingEventProcessor can be replayed and can thus "re-process" events.
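A hedged sketch of that setup, assuming the Kafka-publishing event handlers are assigned to a processing group named "kafka-publisher" (the name is illustrative; with the Axon Kafka extension the group name is fixed when you register its publisher):

```java
import org.axonframework.config.EventProcessingConfigurer;

public class KafkaProcessorConfig {

    public void configure(EventProcessingConfigurer configurer) {
        // Run the Kafka-publishing handlers on a TrackingEventProcessor: they
        // get a dedicated thread, track their own token, and can be replayed,
        // so a Kafka outage does not affect event storage and missed events
        // can be re-processed once Kafka is back up.
        configurer.registerTrackingEventProcessor("kafka-publisher");
    }
}
```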

How to handle errors and retries in spring-kafka

This is a question related to:
https://github.com/spring-projects/spring-kafka/issues/575
I'm using spring-kafka 1.3.7 and transactions in a read-process-write cycle.
For this purpose, I should use a KTM (KafkaTransactionManager) on the Spring Kafka container to enable transactions on the whole listener process, with automatic handling of the transaction id based on the partition for zombie fencing (1.3.7 changes).
If I understand issue #575 correctly, I cannot use a RetryTemplate in a container when using a transaction manager.
How am I supposed to handle errors and retries in such a case?
Is the default behavior with transactions infinite retries? That seems really dangerous; an unexpected exception might simply block the whole process in production.
The upcoming 2.2 release adds recovery to the DefaultAfterRollbackProcessor - so you can stop retrying after some number of attempts.
Docs Here, PR here.
It also provides an optional mechanism to send the failed record to a dead-letter topic.
If you can't move to 2.2 (release candidate due at the end of this week, with GA in October), you can provide a custom AfterRollbackProcessor with similar functionality.
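A hedged configuration sketch against that 2.2 API (later versions replace the int maxFailures with a BackOff): give up after 3 failed deliveries and publish the record to a dead-letter topic.

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultAfterRollbackProcessor;

public class AfterRollbackConfig {

    public void configure(AbstractMessageListenerContainer<String, String> container,
            KafkaTemplate<Object, Object> template) {
        // After the transaction rolls back 3 times for the same record, stop
        // re-seeking and publish the record to the dead-letter topic instead.
        container.setAfterRollbackProcessor(
                new DefaultAfterRollbackProcessor<String, String>(
                        new DeadLetterPublishingRecoverer(template), 3));
    }
}
```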
EDIT
Or, you could add code to your listener (or its listener-level error handler) to keep track of how many times the same record has been delivered, and handle the error there; a sketch follows.
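A loose sketch of that idea for 1.3.x, where nothing is built in: count deliveries of the same record in the listener itself and divert the record after a few attempts so the transaction can finally commit. All names and the recovery action are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.listener.MessageListener;

public class DeliveryCountingListener implements MessageListener<String, String> {

    private static final int MAX_ATTEMPTS = 3;

    private final Map<String, Integer> attempts = new ConcurrentHashMap<>();

    @Override
    public void onMessage(ConsumerRecord<String, String> record) {
        String key = record.topic() + "-" + record.partition() + "@" + record.offset();
        int attempt = this.attempts.merge(key, 1, Integer::sum);
        try {
            process(record);
            this.attempts.remove(key);
        }
        catch (RuntimeException ex) {
            if (attempt >= MAX_ATTEMPTS) {
                // Swallow the exception so the transaction commits and the
                // offset advances past the poison record.
                this.attempts.remove(key);
                recover(record, ex); // e.g. log it or send it to a dead-letter topic
            }
            else {
                throw ex; // roll back the transaction; the record is redelivered
            }
        }
    }

    private void process(ConsumerRecord<String, String> record) {
        // the read-process-write business logic goes here
    }

    private void recover(ConsumerRecord<String, String> record, RuntimeException ex) {
        // last-resort handling after MAX_ATTEMPTS failed deliveries
    }
}
```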
