How to retry failed ConsumerRecord in reactor-kafka - spring-kafka

I am trying out reactor-kafka for consuming messages. Everything else works fine, but I want to add a retry(2) for failing messages. spring-kafka already retries a failed record 3 times by default; I want to achieve the same using reactor-kafka.
I am using spring-kafka as a wrapper for reactor-kafka. Below is my consumer template:
reactiveKafkaConsumerTemplate
.receiveAutoAck()
.map(ConsumerRecord::value)
.flatMap(this::consumeWithRetry)
.onErrorContinue((error, value)->log.error("something bad happened while consuming : {}", error.getMessage()))
.retryWhen(Retry.backoff(30, Duration.of(10, ChronoUnit.SECONDS)))
.subscribe();
Let us assume the consume method is as follows:
public Mono<Void> consume(MessageRecord message) {
    return Mono.error(new RuntimeException("test retry")); // sample error scenario
}
I am using the following logic to retry the consume method on failure.
public Mono<Void> consumeWithRetry(MessageRecord message){
return consume(message)
.retry(2);
}
I want to retry consuming the message if the current consumer record fails with an exception. I have tried to wrap the consume method with another retry(3), but that does not serve the purpose. The final retryWhen is only there to retry the Kafka subscription on rebalances.
@simon-baslé @gary-russell

Previously while retrying I was using the below approach:
public Mono<Void> consumeWithRetry(MessageRecord message){
return consume(message)
.retry(2);
}
But it was not retrying. retry(2) resubscribes to the Mono returned by consume(message), but any work consume does before building that Mono is not re-executed on resubscription. Wrapping the call in Mono.defer() postpones calling consume(message) until subscription time, so each retry invokes it again; after adding it, the code below retries as required.
public Mono<Void> consumeWithRetry(MessageRecord message){
return Mono.defer(()->consume(message))
.retry(2);
}
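As a follow-up, here is a minimal sketch (assuming the same reactiveKafkaConsumerTemplate and consume() shown above) of how the deferred per-record retry can be combined with a backoff and error isolation, so a poisoned record is dropped once its retries are exhausted instead of terminating the whole pipeline:
reactiveKafkaConsumerTemplate
        .receiveAutoAck()
        .map(ConsumerRecord::value)
        .concatMap(message -> Mono.defer(() -> consume(message))
                // retry only this record: 2 extra attempts with exponential backoff
                .retryWhen(Retry.backoff(2, Duration.ofSeconds(1)))
                // once the retries are exhausted, log and drop the record
                .onErrorResume(error -> {
                    log.error("giving up on record after retries", error);
                    return Mono.empty();
                }))
        .subscribe();
concatMap is used instead of flatMap here only to keep per-partition ordering while retrying; flatMap works the same way if ordering does not matter.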

Related

How to manually commit the offset of a record already sent to the DLT through CommonErrorHandler

I am building a simple example with Spring Kafka.
If an exception occurs in the service layer, I want to retry, publish the record to the dead-letter topic, and then commit the original offset.
The dead-letter topic is populated correctly, but the original message seems to remain in Kafka because the commit is not processed.
My code is as follows.
KafkaConfig.java
...
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setCommonErrorHandler(kafkaListenerErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    return factory;
}

private CommonErrorHandler kafkaListenerErrorHandler() {
    DefaultErrorHandler defaultErrorHandler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(template, DEAD_TOPIC_DESTINATION_RESOLVER),
            new FixedBackOff(1000, 3));
    defaultErrorHandler.setCommitRecovered(true);
    defaultErrorHandler.setAckAfterHandle(true);
    defaultErrorHandler.setResetStateOnRecoveryFailure(false);
    return defaultErrorHandler;
}
...
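(The DEAD_TOPIC_DESTINATION_RESOLVER constant is not shown in the question. For DeadLetterPublishingRecoverer it is a BiFunction<ConsumerRecord<?, ?>, Exception, TopicPartition>; a purely hypothetical example, where the ".DLT" suffix and same-partition routing are assumptions, might look like this:)
// Hypothetical destination resolver; the ".DLT" suffix and same-partition routing are assumptions
private static final BiFunction<ConsumerRecord<?, ?>, Exception, TopicPartition> DEAD_TOPIC_DESTINATION_RESOLVER =
        (record, exception) -> new TopicPartition(record.topic() + ".DLT", record.partition());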
KafkaListener.java
...
@KafkaListener(topics = TOPIC_NAME, containerFactory = "kafkaListenerContainerFactory", groupId = "stock-adjustment-0")
public void subscribe(final String message, Acknowledgment ack) throws IOException {
    log.info(String.format("Message Received : [%s]", message));
    StockAdjustment stockAdjustment = StockAdjustment.deserializeJSON(message);
    if (stockService.isAlreadyProcessedOrderId(stockAdjustment.getOrderId())) {
        log.info(String.format("AlreadyProcessedOrderId : [%s]", stockAdjustment.getOrderId()));
    } else {
        if (stockAdjustment.getAdjustmentType().equals("REDUCE")) {
            stockService.decreaseStock(stockAdjustment);
        }
    }
    ack.acknowledge(); // <<< does not work!
}
...
StockService.java
...
if (stockAdjustment.getQty() > stock.getAvailableStockQty()) {
    throw new RuntimeException(String.format("Stock decreased Request [decreasedQty: %s][availableQty : %s]", stockAdjustment.getQty(), stock.getAvailableStockQty()));
}
...
When a RuntimeException occurs in the service layer as above, the record is published to the DLT through the CommonErrorHandler according to the Kafka settings.
However, after the record is published to the DLT, the original message remains in Kafka, so I need a solution.
I looked into it and found that my configuration is handled through SeekUtils.seekOrRecover(): if the record still fails after the maximum number of attempts, an exception is thrown and the consumer seeks back to the original offset without committing it.
According to the documentation, the AfterRollbackProcessor seems to handle the rollback with the default settings, but I don't know how to write the code so that the offset is committed even when processing fails.
EDITED
The above code and settings actually work correctly.
I expected consumer lag to build up, but when I stepped through the actual logic (SeekUtils.seekOrRecover()) and checked the committed offset and the lag, I confirmed that everything works as expected.
The problem was a mistake on my part.
Records are never removed from the topic (until they expire); only the consumer's committed offset is updated.
Use kafka-consumer-groups.sh to describe the group to see the committed offset for the failed record that was sent to the DLT.
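For example, assuming a local broker and the stock-adjustment-0 group from the listener above:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group stock-adjustment-0
The CURRENT-OFFSET column is the committed offset; if it has moved past the failed record, the commit succeeded even though the record itself stays on the original topic until retention removes it.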

gRPC client failing with "CANCELLED: io.grpc.Context was cancelled without error"

I have a gRPC server written in C++ and a client written in Java.
Everything was working fine using a blocking stub. Then I decided that I wanted to change one of the calls to be asynchronous, so I created an additional stub in my client; this one is created with newStub(channel) as opposed to newBlockingStub(channel). I didn't make any changes on the server side. This is a simple unary RPC call.
So I changed
Empty response = blockingStub.callMethod(request);
to
asyncStub.callMethod(request, new StreamObserver<Empty>() {
    @Override
    public void onNext(Empty response) {
        logInfo("asyncStub.callMethod.onNext");
    }

    @Override
    public void onError(Throwable throwable) {
        logError("asyncStub.callMethod.onError " + throwable.getMessage());
    }

    @Override
    public void onCompleted() {
        logInfo("asyncStub.callMethod.onCompleted");
    }
});
Ever since then, onError is called when I use this RPC (most of the time) and the error it gives is "CANCELLED: io.grpc.Context was cancelled without error". I read about forking Context objects when making an RPC call from within an RPC call, but that's not the case here. Also, the Context seems to be a server-side object, so I don't see how it relates to the client. Is this a server-side error propagating back to the client? On the server side everything seems to complete successfully, so I'm at a loss as to why this is happening. Inserting a 1 ms sleep after calling asyncStub.callMethod seems to make this issue go away, but that defeats the purpose. Any and all help in understanding this would be greatly appreciated.
Some notes:
The processing time on the server side is around 1 microsecond
Until now, the round trip time for the blocking call was several hundred microseconds (This is the time I'm trying to cut down, as this is essentially a void function, so I don't need to wait for a response)
This method is called multiple times in a row; previously each call waited for the previous one to finish, whereas now they just fire off one after the other.
Some snippets from the proto file:
service EventHandler {
    rpc callMethod(Msg) returns (Empty) {}
}

message Msg {
    uint64 fieldA = 1;
    int32 fieldB = 2;
    string fieldC = 3;
    string fieldD = 4;
}

message Empty {
}
So it turns out that I was wrong. The context object is used by the client too.
The solution was to do the following:
Context newContext = Context.current().fork();
Context origContext = newContext.attach();
try {
    // Call async RPC here
} finally {
    newContext.detach(origContext);
}
Hopefully this can help someone else in the future.
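For reference, a minimal sketch combining the fork/attach/detach pattern with the async call from the question (same asyncStub, request and callbacks as above, abbreviated here):
Context newContext = Context.current().fork();
Context origContext = newContext.attach();
try {
    // run the call in a forked Context so it is not affected by cancellation of the current Context
    asyncStub.callMethod(request, new StreamObserver<Empty>() {
        @Override public void onNext(Empty response) { }
        @Override public void onError(Throwable t) { logError("callMethod failed: " + t.getMessage()); }
        @Override public void onCompleted() { }
    });
} finally {
    newContext.detach(origContext);
}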

Call the retry listener from inside another listener while keeping the whole retry logic

We are using spring-kafka-2.2.8.RELEASE. I have a specific situation where I need help. I have four topics: topic, retryTopic, successTopic and errorTopic. If processing of a record from topic fails, it should be redirected to retryTopic, where three retry attempts are made. If those attempts fail, it must be redirected to errorTopic. On success on either topic or retryTopic, the record should be redirected to successTopic. This is already implemented based on the question How to retry with spring kafka version 2.2.
But now I have a new situation where I need to call the retryTopic listener from inside the topic listener based on a business logic error, without an exception being thrown (it already calls the retryTopic listener when an exception is thrown, and that behavior must remain). I also need to know which retry attempt number the retryTopic listener is on, as a parameter of the listener below.
@KafkaListener(id = "so60172304.2", topics = "retryTopic")
public void listen2(String in) {
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.execute(new RetryCallback<Void, RuntimeException>() {
        @Override
        public Void doWithRetry(RetryContext retryContext) throws RuntimeException {
            // Can I get the retry count here? It didn't work
            Integer count = RetrySynchronizationManager.getContext().getRetryCount();
            return this.doWithRetry(retryContext);
        }
    });
}
There is no reason you can't call one listener from another (but you won't get retries unless you call it using a RetryTemplate in the first method).
If you use a RetryTemplate configured on the container factory to do the retries (rather than adding a BackOff to the SeekToCurrentErrorHandler in versions 2.3.x and higher), you can obtain the retry count (starting at zero) like this...
@KafkaListener(id = "so60172304.2", topics = "retryTopic")
public void listen2(String in) {
    int retryCount = RetrySynchronizationManager.getContext().getRetryCount();
    ...
}
getContext() will return null if you call this directly from the first method (unless you wrap the call in a RetryTemplate.execute()).
In 2.5.x a delivery attempt header will be available (optionally) even if using the SeekToCurrentErrorHandler with a BackOff instead of using a RetryTemplate in the container factory.
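For context, a sketch of how the retry count becomes available: in 2.2.x a RetryTemplate can be set on the container factory (assuming the usual spring-kafka and spring-retry types; the bean method name below is made up):
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> retryKafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // three attempts per record; listeners using this factory can then read the
    // current attempt via RetrySynchronizationManager.getContext().getRetryCount()
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));
    factory.setRetryTemplate(retryTemplate);
    return factory;
}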

Should I log errors in a controller or in a service?

I have a Symfony controller which basically checks whether the required parameters are present in the request, then passes these parameters to a service. The service uses Guzzle to call an API, does some things with the result and then passes it back to the controller in order to render a JSON response.
I have a beginner question about error handling: if the API I call with Guzzle returns an error, what is the best solution?
Solution 1: Should I log the error using the Logger service injected into my own service and return an error to my controller so it can be displayed?
Solution 2: Should I throw an exception in the service, catch it in my controller and use $this->get("Logger") in the controller to log the error to the log files?
It would be better if your core logic itself lived in a service rather than in your controller.
That way, you can use a try-catch block inside the service where you call the other service, and your controller stays clean and neat: you just call the service without catching any exception.
// AppBundle/src/Controller/MainController.php
public function mainAction()
{
    // ...
    $result = $this->get('my_service')->getResult($parameters);
    if (!$result) {
        // show an error message, pass it to another service, ignore it or whatever you like
    }
}

// AppBundle/src/Service/MyService.php
public function getResult($parameters)
{
    try {
        $apiResult = $this->apiService->get($parameters);
    } catch (ApiException $e) {
        $this->logger->error('My error message');
        $apiResult = null;
    }
    return $apiResult;
}
Consider also Solution 3: throw an exception in the service and catch it in a custom exception listener, where you can log it and take further action (like replacing the Response object, etc.).

Set default timeout in all asynchronous requests

I'm using @Suspended AsyncResponse response in my requests and starting threads to process them. When processing finishes, I try to resume the response, but RESTEasy has already marked the request as done because the request thread finished and no timeout was set on the response. If I set a timeout it works fine, but then I would need to set the timeout in every asynchronous request I implement. Is there any way to set the timeout globally for all my suspended AsyncResponses?
Unfortunately, the JAX-RS 2.0 specification, the RESTEasy documentation and the Jersey documentation don't mention anything about setting a default timeout for the AsyncResponse.
The Jersey documentation mentions the following:
By default, there is no timeout defined on the suspended AsyncResponse instance. A custom timeout and timeout event handler may be defined using setTimeoutHandler(TimeoutHandler) and setTimeout(long, TimeUnit) methods. The setTimeoutHandler(TimeoutHandler) method defines the handler that will be invoked when timeout is reached. The handler resumes the response with the response code 503 (from Response.Status.SERVICE_UNAVAILABLE). A timeout interval can be also defined without specifying a custom timeout handler (using just the setTimeout(long, TimeUnit) method).
So, the solution won't be different from the solution you are already using:
@GET
public void longRunningOperation(@Suspended final AsyncResponse asyncResponse) {

    // Register a timeout handler
    asyncResponse.setTimeoutHandler(new TimeoutHandler() {
        @Override
        public void handleTimeout(AsyncResponse asyncResponse) {
            asyncResponse.resume(Response.status(SERVICE_UNAVAILABLE)
                    .entity("Operation timed out. Please try again.").build());
        }
    });

    // Set timeout
    asyncResponse.setTimeout(15, SECONDS);

    // Execute long running operation in new thread
    executor.execute(new Runnable() {
        @Override
        public void run() {
            executeLongRunningOp();
            asyncResponse.resume("Hello async world!");
        }
    });
}
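One pragmatic workaround is to centralize the timeout in a small helper that every async resource method calls. This is only a sketch (the AsyncDefaults class and its method name are made up, not part of JAX-RS or RESTEasy):
// Hypothetical helper: applies a project-wide default timeout to any AsyncResponse
public final class AsyncDefaults {

    private AsyncDefaults() {
    }

    public static void applyDefaults(AsyncResponse asyncResponse) {
        asyncResponse.setTimeoutHandler(ar -> ar.resume(
                Response.status(Response.Status.SERVICE_UNAVAILABLE)
                        .entity("Operation timed out. Please try again.").build()));
        asyncResponse.setTimeout(15, TimeUnit.SECONDS);
    }
}
Each resource method then calls AsyncDefaults.applyDefaults(asyncResponse) before handing off to the executor, so the timeout value is chosen once instead of being repeated per endpoint.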
