Performance comparison of RetryTemplate vs ErrorHandler - spring-kafka

I have two ways of recovering from KafkaListener errors.
Option 1: use a RetryTemplate inside the KafkaListener method:
@KafkaListener(topics = "topic1")
public void handle(Command command) {
    retryTemplate.execute(ctx -> {
        processCommand(command);
        return null;
    }, ctx -> {
        // Retries exhausted, execute the recoverer logic
        recover(command);
        return null;
    });
}
Option 2: set an ErrorHandler on the MessageListenerContainer via a ContainerCustomizer:
@Component
public class ContainerCustomizer {

    public ContainerCustomizer(CustomConcurrentContainerListenerFactory factory) {
        factory.setContainerCustomizer(container -> {
            container.setErrorHandler(new SeekToCurrentErrorHandler((ConsumerRecord<?, ?> record, Exception e) -> {
                // logic for recoverer after retries exhausted
                recover(convertRecord(record));
            }, new ExponentialBackOffWithMaxRetries(2)));
        });
    }
}
When it comes to performance and blocking the consumer thread, how do these two options compare? Is it true that with RetryTemplate.execute the retries are handled on a separate thread, while with container.setErrorHandler they block the main consumer thread?

Both will block the consumer thread - otherwise you'd continue processing records and Kafka's ordering guarantees would be lost.
Also, both approaches are deprecated in favor of DefaultErrorHandler, which is an evolution of SeekToCurrentErrorHandler.
The difference between the two is, with Spring Retry, all invocations will be retried in memory, so you should make sure the aggregate backoff won't exceed the max.poll.interval.ms or the broker will think your server is dead and will perform a rebalance.
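To make that concrete, here is a minimal sketch (hypothetical values, not taken from the question) where the worst-case in-memory blocking is easy to reason about: five attempts with backoffs of 1 s, 2 s, 4 s and 8 s block the consumer thread for about 15 s plus five executions of the listener logic, and that total must stay well below max.poll.interval.ms (5 minutes by default).
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class RetryTemplateConfig {

    // Hypothetical values: 5 attempts with backoffs of 1s, 2s, 4s and 8s between them.
    // Worst case the consumer thread is blocked for ~15s of backoff plus 5 executions
    // of the listener logic before the recoverer runs.
    public static RetryTemplate retryTemplate() {
        ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
        backOff.setInitialInterval(1_000L);
        backOff.setMultiplier(2.0);
        backOff.setMaxInterval(10_000L);

        RetryTemplate template = new RetryTemplate();
        template.setBackOffPolicy(backOff);
        template.setRetryPolicy(new SimpleRetryPolicy(5));
        return template;
    }
}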
SeekToCurrentErrorHandler, as well as DefaultErrorHandler, will perform a new seek to the broker for each retry, so you just need to make sure the biggest delay plus execution time does not exceed max.poll.interval.ms.
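For reference, a minimal sketch of the same recovery wired through the non-deprecated DefaultErrorHandler (assuming spring-kafka 2.8+ and reusing the recover/convertRecord methods from the question; the backoff is a placeholder):
factory.setContainerCustomizer(container ->
        container.setCommonErrorHandler(new DefaultErrorHandler((record, e) -> {
            // recoverer logic after retries are exhausted
            recover(convertRecord(record));
        }, new ExponentialBackOffWithMaxRetries(2))));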
If you need non-blocking retries, take a look at the non-blocking retries feature.
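That feature is built on the @RetryableTopic annotation; a minimal sketch (assuming spring-kafka 2.7+ and a KafkaTemplate bean for publishing to the auto-created retry topics; the payload type, topic and backoff values are placeholders):
import org.springframework.kafka.annotation.DltHandler;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.RetryableTopic;
import org.springframework.retry.annotation.Backoff;
import org.springframework.stereotype.Component;

@Component
public class RetryableListener {

    // Failed records are forwarded to auto-created retry topics and redelivered there,
    // so the main consumer thread is not blocked between attempts.
    @RetryableTopic(backoff = @Backoff(delay = 1000, multiplier = 2.0))
    @KafkaListener(topics = "topic1")
    public void handle(String command) {
        processCommand(command);
    }

    // Invoked once all retry attempts are exhausted.
    @DltHandler
    public void handleDlt(String command) {
        recover(command);
    }

    private void processCommand(String command) { /* ... */ }

    private void recover(String command) { /* ... */ }
}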

Related

when to use RecoveryCallback vs KafkaListenerErrorHandler

I'm trying to understand when I should use org.springframework.retry.RecoveryCallback and when org.springframework.kafka.listener.KafkaListenerErrorHandler.
As of today, I'm using a class (implementing org.springframework.retry.RecoveryCallback) to log the error message and send the message to the DLT, and it's working. For sending the message to the DLT I'm using Spring's KafkaTemplate, and then I came across KafkaListenerErrorHandler and DeadLetterPublishingRecoverer. Can you please suggest how I should use KafkaListenerErrorHandler and DeadLetterPublishingRecoverer? Can they replace the RecoveryCallback?
Here is my current kafkaListenerContainerFactory code
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(primaryConsumerFactory());
    factory.setRetryTemplate(retryTemplate());
    factory.setRecoveryCallback(recoveryCallback);
    factory.getContainerProperties().setAckMode(AckMode.RECORD);
    factory.setConcurrency(1);
    factory.getContainerProperties().setMissingTopicsFatal(false);
    return factory;
}
If it's working as you want now, why change it?
There are several layers and you can choose which one to do the error handling, depending on your needs.
KafkaListenerErrorHandler would be invoked for each delivery attempt within the retry, so you typically won't use it with retry.
Retry RecoveryCallback is invoked after retries are exhausted (or immediately, if you have classified an exception as not retryable).
ErrorHandler is in the container and is invoked if any listener throws an exception, not just @KafkaListeners.
With recent versions of the framework you can completely replace listener level retry with a SeekToCurrentErrorHandler configured with a DeadLetterPublishingRecoverer and a BackOff.
The DeadLetterPublishingRecoverer is intended for use in a container error handler since it needs the raw ConsumerRecord<?, ?>.
The KafkaListenerErrorHandler only has access to the spring-messaging Message<?> that is converted from the ConsumerRecord<?, ?>.
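For example, a minimal sketch of that replacement (assuming spring-kafka 2.3+ and an injectable KafkaTemplate; the backoff values are placeholders) - the retry template and recovery callback go away, and the container error handler does both the retries and the dead-lettering:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory(
        KafkaTemplate<Object, Object> kafkaTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(primaryConsumerFactory());
    // No setRetryTemplate()/setRecoveryCallback() any more: the error handler redelivers
    // the record 3 times, 1 second apart, then publishes it to <topic>.DLT.
    factory.setErrorHandler(new SeekToCurrentErrorHandler(
            new DeadLetterPublishingRecoverer(kafkaTemplate), new FixedBackOff(1000L, 3L)));
    return factory;
}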
To add on to the excellent context from @GaryRussell, this is what I am currently using.
I am handling any errors (i.e., exceptions) like this:
factory.setErrorHandler(new SeekToCurrentErrorHandler(
new DeadLetterPublishingRecoverer(kafkaTemplate), new FixedBackOff(0L, 0L)));
And to print this error, I have a listener on the .DLT topic that logs the exception stack trace stored in the header, like so:
@KafkaListener(id = "MY_ID", topics = MY_TOPIC + ".DLT")
public void listenDlt(ConsumerRecord<String, SomeClassName> consumerRecord,
        @Header(KafkaHeaders.DLT_EXCEPTION_STACKTRACE) String exceptionStackTrace) {
    logger.error(exceptionStackTrace);
}
Note: I am using logger.error because I am redirecting all error messages to an error log file that is being monitored.
BONUS:
If you set the following:
logging.level.org.springframework.kafka=DEBUG
You will see this in your console/log:
xxx [org.springframework.kafka.KafkaListenerEndpointContainer#7-2-C-1] DEBUG o.s.k.listener.SeekToCurrentErrorHandler - Skipping seek of: ConsumerRecord xxx
xxx [kafka-producer-network-thread | producer-3] DEBUG o.s.k.l.DeadLetterPublishingRecoverer - Successful dead-letter publication: SendResult xxx
If you have a better way to log, I would appreciate your comment.
Thanks!
Cheers

Test both Handle and Handle<IFailed> methods with 1st level and 2nd Level Retry with Rebus

How do I debug both Handle methods below?
I set breakpoints on both Handle methods in Visual Studio and sent a message to the Subscriber1 queue, but neither method is hit under VS.
public class SomeHandler : IHandleMessages<string>, IHandleMessages<IFailed<string>>
{
    readonly IBus _bus;

    public SomeHandler(IBus bus)
    {
        _bus = bus;
    }

    public async Task Handle(string message)
    {
        // do stuff that can fail here...
    }

    public async Task Handle(IFailed<string> failedMessage)
    {
        await _bus.Advanced.TransportMessage.Defer(TimeSpan.FromSeconds(30));
    }
}
Below is the message sent to Subscriber1.
I tried sending the message without either rbs2-msg-id or rbs2-msg-type; neither variant triggers the Handle method above.
{
    "body": "Test",
    //other fields
    "properties": {
        "rbs2-intent": "pub",
        "rbs2-msg-id": "cd57d735-3989-45b5-8a3c-e457fa61dc94",
        "rbs2-return-address": "publisher",
        "rbs2-senttime": "2019-05-27T15:07:25.1770000+01:00",
        "rbs2-sender-address": "publisher",
        "rbs2-msg-type": "System.String, mscorlib",
        "rbs2-corr-id": "cd57d735-3989-45b5-8a3c-e457fa61dc94",
        "rbs2-corr-seq": "0",
        "rbs2-content-type": "application/json;charset=utf-8"
    },
    //other fields
}
Update 1
If an exception is thrown within Handle(string message), the method is retried according to the 1st-level retry count. This is what we need.
However, Handle(IFailed<string> failedMessage) is not invoked; how can I debug Handle(IFailed<string> failedMessage) like above?
One note: when an exception is thrown within Handle(string message), IErrorHandler is NOT called, and AddTransportMessageForwarder is not called either - is this correct?
Could you try and check your logs?
When you remove the rbs2-msg-id header, Rebus will immediately move the message to the dead-letter queue, simply refusing to handle it. This is because a message without a message ID cannot be tracked by Rebus' error tracker.
If you remove the rbs2-msg-type header, the serializer will most likely throw an error and refuse to deserialize the incoming message.
In both cases, the error will be output to the logger.
And in both cases, the message body (string in this case, but it could be any message type) cannot be constructed from the incoming byte[], so Rebus cannot dispatch the message as either string or IFailed<string>.
Edit after Update 1:
If your 2nd level retries do not kick in, it's most likely because you haven't enabled them:
Configure.With(...)
    .(...)
    .Options(o => o.SimpleRetryStrategy(secondLevelRetriesEnabled: true))
    .Start();
IErrorHandler is called when it is time to move the message to the dead-letter queue. You should see it being called when all delivery attempts have failed (5 normal delivery attempts + 5 2nd-level delivery attempts).
If you've configured things the way I suggested, AddTransportMessageForwarder is used in another bus instance, which receives messages from the error queue. When IErrorHandler has been called and the failed message has been forwarded to the error queue, your transport message forwarder should be called.

Rebus backoff and Polly support

I have questions on the backoff policy below:
1 Can it be used for both the Publisher (enqueue messages) and the Subscriber (dequeue messages)?
2 Is Rebus' backoff policy the same as Polly's retry? The description below mentions idle time, which confuses me a bit.
//
// Summary:
// Configures the timespans to wait when backing off polling the transport during
// idle times. backoffTimes must be a sequence of timespans, which indicates the
// time to wait for each second elapsed being idle. When the idle time exceeds the
// number of timespans, the last timespan will be used.
public static void SetBackoffTimes(this OptionsConfigurer configurer, params TimeSpan[] backoffTimes);
Configure.With(...)
    .(...)
    .Options(o => {
        o.SetBackoffTimes(
            TimeSpan.FromMilliseconds(100),
            TimeSpan.FromMilliseconds(200),
            TimeSpan.FromSeconds(1)
        );
    })
    .Start();
3 Does Rebus support the Polly extension? For example, exponential back-off plus some jitter like the sample at the bottom:
Random jitterer = new Random();
Policy
    .Handle<HttpResponseException>() // etc
    .WaitAndRetry(5, // exponential back-off plus some jitter
        retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))
            + TimeSpan.FromMilliseconds(jitterer.Next(0, 100))
    );
https://learn.microsoft.com/en-us/dotnet/standard/microservices-architecture/implement-resilient-applications/implement-http-call-retries-exponential-backoff-polly
4 I cannot find ISyncBackoffStrategy in the latest Rebus NuGet package. Is it deprecated?
Configure.With(...)
    .(...)
    .Options(o => {
        o.Register<ISyncBackoffStrategy>(c => {
            var strategy = new MyOwnBackoffStrategy();
            return strategy;
        });
    })
    .Start();
https://github.com/rebus-org/Rebus/wiki/Back-off-strategy
Rebus' backoff strategy is used only by a message consumer, so it can relax while there is less work to do.
It's not about RETRY, it's more about being easy on the CPU in quiet times.
The term "idle time" in the documentation simply means "time, where no messages have been received". So, as long as there's messages in the queue, Rebus will process messages as fast as you can handle them, but if suddenly the queue is empty, then it will gradually poll less and less frequently.
You can implement your own backoff strategy by implementing IBackoffStrategy (that's what it's called now, I've updated the wiki accordingly).

Exactly-once delivery: is it possible through spring-cloud-stream-binder-kafka or spring-kafka, and which one to use?

I am trying to achieve exactly once delivery using spring-cloud-stream-binder-kafka in a spring boot application.
The versions I am using are:
spring-cloud-stream-binder-kafka-core-1.2.1.RELEASE
spring-cloud-stream-binder-kafka-1.2.1.RELEASE
spring-cloud-stream-codec-1.2.2.RELEASE
spring-kafka-1.1.6.RELEASE
spring-integration-kafka-2.1.0.RELEASE
spring-integration-core-4.3.10.RELEASE
zookeeper-3.4.8
Kafka version : 0.10.1.1
This is my configuration (cloud-config):
spring:
  autoconfigure:
    exclude: org.springframework.cloud.netflix.metrics.servo.ServoMetricsAutoConfiguration
  kafka:
    consumer:
      enable-auto-commit: false
  cloud:
    stream:
      kafka:
        binder:
          brokers: "${BROKER_HOST:xyz-aws.local:9092}"
          headers:
            - X-B3-TraceId
            - X-B3-SpanId
            - X-B3-Sampled
            - X-B3-ParentSpanId
            - X-Span-Name
            - X-Process-Id
          zkNodes: "${ZOOKEEPER_HOST:120.211.316.261:2181,120.211.317.252:2181}"
        bindings:
          feed_platform_events_input:
            consumer:
              autoCommitOffset: false
      binders:
        xyzkafka:
          type: kafka
      bindings:
        feed_platform_events_input:
          binder: xyzkafka
          destination: platform-events
          group: br-platform-events
I have two main classes:
FeedSink Interface:
package au.com.xyz.proxy.interfaces;

import org.springframework.cloud.stream.annotation.Input;
import org.springframework.messaging.MessageChannel;

public interface FeedSink {

    String FEED_PLATFORM_EVENTS_INPUT = "feed_platform_events_input";

    @Input(FeedSink.FEED_PLATFORM_EVENTS_INPUT)
    MessageChannel feedlatformEventsInput();
}
EventConsumer
package au.com.xyz.proxy.consumer;

@Slf4j
@EnableBinding(FeedSink.class)
public class EventConsumer {

    public static final String SUCCESS_MESSAGE =
            "SEND-SUCCESS : Successfully sent message to platform.";
    public static final String FAULT_MESSAGE = "SOAP-FAULT Code: {}, Description: {}";
    public static final String CONNECT_ERROR_MESSAGE = "CONNECT-ERROR Error Details: {}";
    public static final String EMPTY_NOTIFICATION_ERROR_MESSAGE =
            "EMPTY-NOTIFICATION-ERROR Empty Event Received from platform";

    @Autowired
    private CapPointService service;

    /**
     * Method associated with the stream to process messages.
     */
    @StreamListener(FeedSink.FEED_PLATFORM_EVENTS_INPUT)
    public void message(final @Payload EventNotification eventNotification,
            final @Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment acknowledgment) {
        String caseMilestone = "UNKNOWN";
        if (!ObjectUtils.isEmpty(eventNotification)) {
            SysMessage sysMessage = processPayload(eventNotification);
            caseMilestone = sysMessage.getCaseMilestone();
            try {
                ClientResponse response = service.sendPayload(sysMessage);
                if (response.hasFault()) {
                    Fault faultDetails = response.getFaultDetails();
                    log.error(FAULT_MESSAGE, faultDetails.getCode(), faultDetails.getDescription());
                } else {
                    log.info(SUCCESS_MESSAGE);
                }
                acknowledgment.acknowledge();
            } catch (Exception e) {
                log.error(CONNECT_ERROR_MESSAGE, e.getMessage());
            }
        } else {
            log.error(EMPTY_NOTIFICATION_ERROR_MESSAGE);
            acknowledgment.acknowledge();
        }
    }

    private SysMessage processPayload(final EventNotification eventNotification) {
        Gson gson = new Gson();
        String jsonString = gson.toJson(eventNotification.getData());
        log.info("Consumed message for platform events with payload : {} ", jsonString);
        SysMessage sysMessage = gson.fromJson(jsonString, SysMessage.class);
        return sysMessage;
    }
}
I have set the auto-commit property to false for both the Kafka client and the Spring container.
As you can see in the EventConsumer class, I call Acknowledgment.acknowledge() only when service.sendPayload succeeds and no exception is thrown, because I want the container to move the offset and poll for the next records.
What I have observed is:
Scenario 1 - an exception is thrown and no new messages are published to Kafka. There is no retry of the processing and there seems to be no activity, even after the underlying issue (downstream server unavailability) is resolved. Is there a way to retry the processing n times and then give up? Note that this is about retrying the processing / re-polling from the last committed offset, not about the Kafka instance being unavailable.
If I restart the service (EC2 instance), processing resumes from the offset of the last successful acknowledge.
Scenario 2 - an exception is thrown and then a subsequent message is pushed to Kafka. I see the new message being processed and the offset moving, which means I lost the message that was not acknowledged. So the question is: given that I handle the Acknowledgment, how do I make the consumer read from the last commit rather than just the latest message? I assume there is a poll happening internally that does not take into account (or does not know about) the last message not being acknowledged. I don't think there are multiple threads reading from Kafka, and I don't know how the @Input and @StreamListener annotations are controlled. I assume the thread is controlled by the consumer.concurrency property, which defaults to 1.
So I have done research and found a lot of links but unfortunately none of them answers my specific questions.
I looked at (https://github.com/spring-cloud/spring-cloud-stream/issues/575)
which has a comment from Marius (https://stackoverflow.com/users/809122/marius-bogoevici):
Do note that Kafka does not provide individual message acking, which
means that acknowledgment translates into updating the latest consumed
offset to the offset of the acked message (per topic/partition). That
means that if you're acking messages from the same topic partition out
of order, a message can 'ack' all the messages before it.
I am not sure ordering is the issue here, since there is only one thread.
Apologies for the long post, but I wanted to provide enough information. The main thing is that I am trying to avoid losing messages when consuming from Kafka, and I am trying to see whether spring-cloud-stream-binder-kafka can do the job or whether I have to look at alternatives.
Update 6th July 2018
I saw this post: https://github.com/spring-projects/spring-kafka/issues/431
Is this a better approach to my problem? I can try the latest version of spring-kafka.
#KafkaListener(id = "qux", topics = "annotated4", containerFactory = "kafkaManualAckListenerContainerFactory",
containerGroup = "quxGroup")
public void listen4(#Payload String foo, Acknowledgment ack, Consumer<?, ?> consumer) {
Will this help in controlling the offset so that it is set to the last successfully processed record? How can I do that from the listen method? consumer.seekToEnd(); - and then how will the listen method reset to get that record?
Does putting the Consumer in the signature provide support to get a handle to the consumer? Or do I need to do anything more?
Should I use Acknowledgment or consumer.commitSync()?
What is the significance of containerFactory? Do I have to define it as a bean?
Do I need @EnableKafka and @Configuration for the above approach to work? Bearing in mind the application is a Spring Boot application.
By adding Consumer to the listen method, do I avoid having to implement the ConsumerAware interface?
Last but not least: is it possible to provide an example of the above approach, if it is feasible? (A rough sketch of what I mean follows below.)
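For reference, a minimal sketch of such a manual-ack setup (the class names, topic and factory bean are assumptions based on the snippet from the issue, not a confirmed configuration):
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Configuration
@EnableKafka
class ManualAckConfig {

    @Bean
    ConcurrentKafkaListenerContainerFactory<String, String> kafkaManualAckListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // MANUAL_IMMEDIATE: the offset is committed only when the listener calls acknowledge().
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        return factory;
    }
}

@Component
class ManualAckListener {

    @KafkaListener(id = "qux", topics = "annotated4",
            containerFactory = "kafkaManualAckListenerContainerFactory")
    void listen(String foo, Acknowledgment ack) {
        process(foo);      // if this throws, acknowledge() is never reached and the offset is not committed
        ack.acknowledge(); // commit only after successful processing
    }

    private void process(String foo) { /* ... */ }
}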
Update 12 July 2018
Thanks Gary (https://stackoverflow.com/users/1240763/gary-russell) for providing the tip of using maxAttempts. I have used that approach. And I am able to achieve exactly once delivery and preserve the order of the message.
My updated cloud-config:
spring:
  autoconfigure:
    exclude: org.springframework.cloud.netflix.metrics.servo.ServoMetricsAutoConfiguration
  kafka:
    consumer:
      enable-auto-commit: false
  cloud:
    stream:
      kafka:
        binder:
          brokers: "${BROKER_HOST:xyz-aws.local:9092}"
          headers:
            - X-B3-TraceId
            - X-B3-SpanId
            - X-B3-Sampled
            - X-B3-ParentSpanId
            - X-Span-Name
            - X-Process-Id
          zkNodes: "${ZOOKEEPER_HOST:120.211.316.261:2181,120.211.317.252:2181}"
        bindings:
          feed_platform_events_input:
            consumer:
              autoCommitOffset: false
      binders:
        xyzkafka:
          type: kafka
      bindings:
        feed_platform_events_input:
          binder: xyzkafka
          destination: platform-events
          group: br-platform-events
          consumer:
            maxAttempts: 2147483647
            backOffInitialInterval: 1000
            backOffMaxInterval: 300000
            backOffMultiplier: 2.0
The EventConsumer remains the same as my initial implementation, except that it now rethrows the exception so that the container knows the processing has failed. If you just catch it, the container has no way of knowing that the message processing failed; acknowledgment.acknowledge() only controls the offset commit. In order for the retry to happen you must throw the exception. Don't forget to set the Kafka client auto-commit property and the Spring (binding-level) autoCommitOffset property to false. That's it.
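Concretely, a minimal sketch of that change against the message() method shown earlier (wrapping in a RuntimeException is just one way to rethrow):
try {
    ClientResponse response = service.sendPayload(sysMessage);
    if (response.hasFault()) {
        Fault faultDetails = response.getFaultDetails();
        log.error(FAULT_MESSAGE, faultDetails.getCode(), faultDetails.getDescription());
    } else {
        log.info(SUCCESS_MESSAGE);
    }
    acknowledgment.acknowledge();
} catch (Exception e) {
    log.error(CONNECT_ERROR_MESSAGE, e.getMessage());
    // Rethrow so the binder retries the delivery according to maxAttempts / backOff* settings;
    // swallowing the exception here would silently drop the message.
    throw new RuntimeException(e);
}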
As explained by Marius, Kafka only maintains an offset in the log. If you process the next message and update the offset, the failed message is lost.
You can send the failed message to a dead-letter topic (set enableDlq to true).
Recent versions of Spring Kafka (2.1.x) have special error handlers: ContainerStoppingErrorHandler, which stops the container when an exception occurs, and SeekToCurrentErrorHandler, which causes the failed message to be redelivered.

What if only send without recv in my Thrift client?

I'm implementing a Thrift client in order to connect to a built-in Scribe server.
Everything works fine if I use the standard Log method, like this:
public boolean log(List<LogEntry> messages) {
    boolean ret = false;
    PooledClient client = borrowClient();
    try {
        if ((client != null) && (client.getClient() != null)) {
            ResultCode result = client.getClient().Log(messages);
            ret = (result != null && result.equals(ResultCode.OK));
            returnClient(client);
        }
    } catch (Exception ex) {
        logger.error(LogUtil.stackTrace(ex));
        invalidClient(client);
    }
    return ret;
}
However, when I use send_Log instead:
public void send_Log(List<LogEntry> messages) {
    PooledClient client = borrowClient();
    try {
        if ((client != null) && (client.getClient() != null)) {
            client.getClient().send_Log(messages);
            returnClient(client);
        }
    } catch (Exception ex) {
        logger.error(LogUtil.stackTrace(ex));
        invalidClient(client);
    }
}
It actually causes some problems:
1. The total number of network connections to port 1463 (the default port for a Scribe server) keeps increasing, and they are always in the CLOSE_WAIT state.
2. My application gets stuck without throwing any error; I think it may be an issue with the network connection.
what if send without recv
As this is clearly TCP, the sender will block (in blocking mode), or incur EAGAIN/EWOULDBLOCK in non-blocking mode. EDIT: It is now clear that you want to send without receiving the reply. You can do that by just sending and then closing the socket, but that may cause the peer to incur ECONNRESET, which may upset it. You should really implement the application protocol correctly.
1/ The total number of network connections to port 1463 (the default port for a Scribe server) keeps increasing, and they are always in the CLOSE_WAIT state.
Lots of ports in CLOSE_WAIT state indicates a socket leak on the part of the local application.
2/ My application gets stuck without throwing any error. I think it may be an issue with the network connection.
It is an issue with sending and not receiving.
Since you labelled this as a Thrift related question, the answer is oneway.
service foo {
    oneway void FireAndForget(1: some args)
}
The oneway keyword does exactly what the name suggests. You get a client implementation that only sends and does not wait for anything to be returned from the server. This rule also includes exceptions. Hence a oneway method must always be void and can't throw any exceptions.
However, when I use send_Log instead ...
client.getClient().send_Log(messages);
Neither of the Thrift-generated send_Xxx and recv_Xxx methods is meant to be public. That's why they are usually either private or protected methods. They should not be called directly, unless you are sure that you know what you are doing (and very obviously the latter is not the case here).
And since the real question is about performance: why don't you just delegate the call(s) to a secondary thread? That way the I/O will not block the UI.
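For example, a minimal sketch of that hand-off (ScribeLog is a hypothetical wrapper around the blocking log(...) method shown in the question; LogEntry is the Thrift-generated type used above, and the executor sizing is an assumption):
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncScribeLogger {

    // Hypothetical functional wrapper around the blocking log(...) method shown above.
    public interface ScribeLog {
        boolean log(List<LogEntry> messages);
    }

    // A single worker keeps the log calls ordered; size it to your needs.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    private final ScribeLog delegate;

    public AsyncScribeLogger(ScribeLog delegate) {
        this.delegate = delegate;
    }

    // Returns immediately; the blocking Log()/recv happens on the worker thread.
    public void logAsync(List<LogEntry> messages) {
        executor.submit(() -> delegate.log(messages));
    }

    public void shutdown() {
        executor.shutdown();
    }
}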
