I ran into a rebalance issue in my Kafka consumer and was able to solve it by setting a correct value for max.poll.interval.ms.
But now I have noticed that some transactions are missing.
So I want to see in the logs which offset values were not processed.
How can I see these logs in Spring Kafka?
I am using Spring Kafka version 2.2.2.RELEASE.
If you are using @KafkaListener, you can add @Header(KafkaHeaders.OFFSET) long offset as a parameter to the listener method.
Also, you can set the commitLogLevel container property to INFO and the container will log all commits at that level.
https://docs.spring.io/spring-kafka/docs/2.3.3.RELEASE/api/index.html?org/springframework/kafka/listener/ConsumerProperties.html
(ContainerProperties is a subclass of ConsumerProperties in 2.3).
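For example, a minimal sketch (listener id, topic and bean names are placeholders, not from the question); the listener logs the partition/offset of each record, and the factory raises the commit log level:

@Component
public class OffsetLoggingListener {

    private static final Logger log = LoggerFactory.getLogger(OffsetLoggingListener.class);

    @KafkaListener(id = "offsetLogger", topics = "myTopic")
    public void listen(String payload,
            @Header(KafkaHeaders.OFFSET) long offset,
            @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition) {
        // log which partition/offset is being processed so gaps can be spotted
        log.info("Processing partition {} offset {}: {}", partition, offset, payload);
    }
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // log every offset commit at INFO instead of the default DEBUG
    factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.INFO);
    return factory;
}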
I am working on a Spring Kafka implementation and my use case is to consume messages from a Kafka topic as a batch (using a batch listener). When I consume the list of messages, I iterate over them and call a REST endpoint for message enrichment. If the REST API fails with a runtime exception, I retry using Spring Retry. I want to stop the container after the retries are exhausted, so I am planning to use the ContainerStoppingErrorHandler to achieve this. Does the ContainerStoppingErrorHandler commit the previously successful messages? Say we receive 10 messages and the enrichment call succeeds for messages 1-4 but fails for message 5; when we restart the container, will I get all 10 again or will I receive messages 5-10?
Or is there another way to achieve the above use case? I have looked into all the error handler types in Spring Kafka and need input on how to achieve this requirement.
You will get them all again.
You can use the DefaultErrorHandler (with a custom recoverer) and throw a BatchListenerFailedException to indicate which record in the batch failed.
The error handler will commit the offsets up to that record and call the recoverer with the failed record; in your custom recoverer you can stop the container (use the same logic as the container stopping error handler).
In versions before 2.8, this same functionality is provided by the RecoveringBatchErrorHandler.
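A minimal sketch of that approach using the 2.8+ APIs (the listener id, topic name, two-retry back-off and the enrich() call are placeholder assumptions, not from the question):

@KafkaListener(id = "batchListener", topics = "myTopic", batch = "true")
public void listen(List<ConsumerRecord<String, String>> records) {
    for (int i = 0; i < records.size(); i++) {
        try {
            enrich(records.get(i)); // REST enrichment call (placeholder)
        }
        catch (RuntimeException e) {
            // tells the error handler which record in the batch failed; the offsets
            // of the records before it will be committed
            throw new BatchListenerFailedException("Enrichment failed", e, i);
        }
    }
}

@Bean
public DefaultErrorHandler errorHandler(KafkaListenerEndpointRegistry registry) {
    // after the back-off is exhausted, the recoverer stops the container on a
    // separate thread (the same idea as ContainerStoppingErrorHandler);
    // register this handler on the container factory with setCommonErrorHandler()
    return new DefaultErrorHandler((record, exception) ->
            new SimpleAsyncTaskExecutor().execute(
                    () -> registry.getListenerContainer("batchListener").stop()),
            new FixedBackOff(0L, 2L));
}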
While upgrading our Spring Kafka to 2.7.8, we are getting an error on the setAckOnError(false) method, as it is now deprecated. Is there still a way to set acknowledgement to false for errors? Is there another method that lets me disable acknowledgement for errors?
P.S.: I am new to Kafka; any help is appreciated!
That property was found to have a (very small) timing hole in that a record could be ack'd before the error handler handles it; if the app dies at that time, the record could be "lost".
It was replaced by a new property on the error handlers, ackAfterHandle, which is true by default; i.e. the record's offset is only committed if the error handler "handles" the error.
Records are now never ack'd if the error handler (such as the SeekToCurrentErrorHandler) throws an exception (after it repositions the partitions).
There is no extra configuration needed any more.
See Spring Kafka AckOnError for more details.
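For illustration, a minimal 2.7.x factory sketch (bean names are placeholders); the point is that the removed setAckOnError(false) call simply has no replacement:

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // previously: factory.getContainerProperties().setAckOnError(false); // deprecated in 2.7
    // nothing replaces it: the SeekToCurrentErrorHandler repositions the partitions and
    // re-throws, so a failed record's offset is never committed
    factory.setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}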
I am using Spring Kafka 1.5.8 and Kafka's manual ack mode. There is only a simple ErrorHandler interface, whose handle method takes the exception and the ConsumerRecord as parameters, but how can I advance the offset to the next record when an exception is thrown because the JSON cannot be deserialized? I need to skip the message, otherwise the consumer gets stuck there.
Spring Kafka 2.0 introduced ConsumerAwareErrorHandler; is upgrading the Spring version the only way?
Yes. You need to upgrade. The current version is 2.4.0 or 2.3.4.
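For reference, after upgrading, a sketch of the seek-past-the-record technique that ConsumerAwareErrorHandler enables (this assumes the failed ConsumerRecord is available to the handler; for deserialization failures you may additionally need an error-handling deserializer so that a record reaches the handler at all):

public class SkipRecordErrorHandler implements ConsumerAwareErrorHandler {

    @Override
    public void handle(Exception thrownException, ConsumerRecord<?, ?> record,
            Consumer<?, ?> consumer) {
        // seek past the failed record so the next poll() starts at the following offset
        consumer.seek(new TopicPartition(record.topic(), record.partition()),
                record.offset() + 1);
    }
}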
I am using a ConcurrentMessageListenerContainer with auto commit set to false to consume messages from a topic and write them to a database. If the database is down, I need to stop the container from processing the current poll's records and not issue the next poll(). I have implemented a DataSourceHealthIndicator, and once the database status is UP again I want to restart the container to process the remaining records.
Any suggestion on how I can stop processing the remaining records and stop the container? I have tried consumer.close(), but it does not stop the processing and keeps throwing "consumer is already closed".
Call container.stop(). If you are using @KafkaListener methods, call stop() on the listener container registry, which will stop all registered containers.
EDIT
Error handlers for automatically stopping the container have been available since the 2.1 release; at the time of writing the current version is 2.2.3.
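A minimal sketch of the registry approach (the listener id is a placeholder; wire the stop/start calls to your DataSourceHealthIndicator checks):

@Component
public class KafkaContainerControl {

    private final KafkaListenerEndpointRegistry registry;

    public KafkaContainerControl(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    public void pauseConsumption() {
        // stop a single container by its @KafkaListener id...
        registry.getListenerContainer("dbWriterListener").stop();
        // ...or stop all registered containers with registry.stop()
    }

    public void resumeConsumption() {
        registry.getListenerContainer("dbWriterListener").start();
    }
}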
I have written a very simple Flink streaming job which takes data from Kafka using a FlinkKafkaConsumer082.
protected DataStream<String> getKafkaStream(StreamExecutionEnvironment env, String topic) {
    Properties result = new Properties();
    result.put("bootstrap.servers", getBrokerUrl());
    result.put("zookeeper.connect", getZookeeperUrl());
    result.put("group.id", getGroup());
    return env.addSource(
            new FlinkKafkaConsumer082<>(topic, new SimpleStringSchema(), result));
}
This works very well, and whenever I put something into the Kafka topic, it is received and processed by my Flink job. Then I wanted to see what happens if the Flink job isn't online for some reason, so I shut down the Flink job and kept sending messages to Kafka. When I started the Flink job again, I expected it to process the messages that had been sent in the meantime.
However, I got this message:
No prior offsets found for some partitions in topic collector.Customer. Fetched the following start offsets [FetchPartition {partition=0, offset=25}]
So it basically ignored all messages that arrived since the last shutdown of the Flink job and just started to read at the end of the queue. From the documentation of FlinkKafkaConsumer082 I gathered that it automatically takes care of synchronizing the processed offsets with the Kafka broker. However, that doesn't seem to be the case.
I am using a single-node Kafka installation (the one that comes with the Kafka distribution) with a single-node ZooKeeper installation (also the one bundled with the Kafka distribution).
I suspect it is some kind of misconfiguration or something like that, but I really don't know where to start looking. Has anyone else had this issue and maybe solved it?
I found the reason. You need to explicitly enable checkpointing in the StreamExecutionEnvironment to make the Kafka connector write the processed offsets to ZooKeeper. If you don't enable it, the Kafka connector will not write the last read offset, and it will therefore not be able to resume from there when the collecting job is restarted. So be sure to write:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(); // <-- this is the important part
Anatoly's suggestion for changing the initial offset is probably still a good idea, in case checkpointing fails for some reason.
https://kafka.apache.org/08/configuration.html
Set auto.offset.reset to smallest (by default it's largest):

auto.offset.reset: What to do when there is no initial offset in ZooKeeper or if an offset is out of range:
smallest: automatically reset the offset to the smallest offset
largest: automatically reset the offset to the largest offset
anything else: throw an exception to the consumer.

If this is set to largest, the consumer may lose some messages when the number of partitions, for the topics it subscribes to, changes on the broker. To prevent data loss during partition addition, set auto.offset.reset to smallest.
Also make sure getGroup() returns the same group id after the restart.
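For example, in the consumer properties from the question above (the property name follows the 0.8 high-level consumer configuration quoted above):

Properties result = new Properties();
result.put("bootstrap.servers", getBrokerUrl());
result.put("zookeeper.connect", getZookeeperUrl());
result.put("group.id", getGroup()); // must stay the same across restarts
// start from the earliest available offset when no offset is stored in ZooKeeper
result.put("auto.offset.reset", "smallest");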