I am using a ConcurrentMessageListenerContainer with auto-commit set to false to consume messages from a topic and write them to a database. If the database is down, I need to stop processing the current poll's records and prevent the next poll(). I have implemented a DataSourceHealthIndicator, and once the database status returns to UP I want to restart the container to process the remaining records.
Any suggestions on how to stop processing the remaining records and stop the container? I tried consumer.close(), but it doesn't stop processing and keeps throwing "consumer is already closed".
Call container.stop(). If you are using @KafkaListener, call stop() on the listener container registry, which will stop all registered containers.
EDIT
Error handlers for automatically stopping the container have been available since the 2.1 release; at the time of writing the current version is 2.2.3.
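A minimal sketch of the registry approach, assuming a Spring Boot application with @EnableKafka; the class name and the wiring to the health indicator are illustrative, not from the original post:

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class ContainerLifecycle {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    // Invoke this when the DataSourceHealthIndicator reports DOWN
    public void stopAll() {
        registry.stop(); // stops every registered @KafkaListener container
    }

    // Invoke this when the health indicator reports UP again
    public void startAll() {
        registry.start(); // polling resumes; uncommitted records are redelivered
    }
}
```

Because offsets for the in-flight records were never committed, restarting the container redelivers them, which matches the "process remaining records" requirement.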
Related
I am working on a Spring Kafka implementation. My use case is to consume messages from a Kafka topic in batches (using a batch listener). When I consume the list of messages, I iterate over it and call a REST endpoint to enrich each message. If the REST API fails with a runtime exception, retry logic implemented with Spring Retry kicks in; once the retries are exhausted, I want to stop the container, so I am planning to use the KafkaContainerStoppingErrorHandler. Does the KafkaContainerStoppingErrorHandler commit the previously successful messages? Say we receive 10 messages, and the enrichment call succeeds for messages 1-4 but fails for message 5: when we restart the container, will I get all 10 again, or only messages 5-10?
Or is there another way to achieve this use case? I have looked into all the error handler types in Spring Kafka and need input on how to achieve this requirement.
You will get them all again.
You can use the DefaultErrorHandler (with a custom recoverer) and throw a BatchListenerFailedException to indicate which record in the batch failed.
The error handler will commit the offsets up to that record and call the recoverer with the failed record; in your custom recoverer you can stop the container (use the same logic as the container stopping error handler).
In versions before 2.8, this same functionality is provided by the RecoveringBatchErrorHandler.
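A sketch of the approach described above, assuming Spring Kafka 2.8+; the topic, group, and method names are illustrative. The listener throws BatchListenerFailedException with the index of the failed record; the DefaultErrorHandler commits the offsets of the records before it and passes the failed record to the custom recoverer, which stops the container:

```java
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.BatchListenerFailedException;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

public class BatchEnrichmentConfig {

    @Bean
    public DefaultErrorHandler errorHandler(KafkaListenerEndpointRegistry registry) {
        // Custom recoverer: instead of logging and skipping, stop the container
        // (the same effect as the container-stopping error handler).
        return new DefaultErrorHandler(
                (record, ex) -> registry.stop(),
                new FixedBackOff(0L, 0L)); // recover immediately, no in-handler retries
    }

    @KafkaListener(id = "enricher", topics = "enrich-topic", batch = "true")
    public void listen(List<ConsumerRecord<String, String>> records) {
        for (int i = 0; i < records.size(); i++) {
            try {
                enrich(records.get(i)); // REST call, wrapped in Spring Retry elsewhere
            } catch (RuntimeException ex) {
                // Tells the error handler which record failed; offsets for
                // records before index i are committed before recovery runs.
                throw new BatchListenerFailedException("enrichment failed", i);
            }
        }
    }

    private void enrich(ConsumerRecord<String, String> record) { /* ... */ }
}
```

With this in place, a restart after the failure on record 5 redelivers only records 5-10.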
The Hazelcast cluster runs in an application deployed on Kubernetes. I can't see any traces of partitioning or other problems in the logs. At some point, this exception starts appearing in the logs:
hz.dazzling_morse.partition-operation.thread-1 com.hazelcast.logging.StandardLoggerFactory$StandardLogger: app-name, , , , , - [172.30.67.142]:5701 [app-name] [4.1.5] Executor is shut down.
java.util.concurrent.RejectedExecutionException: Executor is shut down.
at com.hazelcast.scheduledexecutor.impl.operations.AbstractSchedulerOperation.checkNotShutdown(AbstractSchedulerOperation.java:73)
at com.hazelcast.scheduledexecutor.impl.operations.AbstractSchedulerOperation.getContainer(AbstractSchedulerOperation.java:65)
at com.hazelcast.scheduledexecutor.impl.operations.SyncBackupStateOperation.run(SyncBackupStateOperation.java:39)
at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184)
at com.hazelcast.spi.impl.operationexecutor.OperationRunner.runDirect(OperationRunner.java:150)
at com.hazelcast.spi.impl.operationservice.impl.operations.Backup.run(Backup.java:174)
at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:256)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:237)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:452)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:166)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:136)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
I can't see any particular operation failing prior to that. I do run some scheduled operations myself, but they execute inside try-catch blocks and never throw.
The consequence is that whenever a node in the cluster restarts no data is replicated to the new node, which eventually renders the entire cluster useless - all data that's supposed to be cached and replicated among nodes disappears.
What could be the cause? How can I get more details about why the executor Hazelcast uses shuts down?
Based on other conversations...
Your Runnable / Callable should implement HazelcastInstanceAware.
Don't pass the HazelcastInstance or IExecutorService as a non-transient argument... as the instance where the runnable is submitted will be different from the one where it runs.
See this.
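A minimal sketch of the recommended pattern; the class name and map key are illustrative. The task is serialized and sent to another member, so the HazelcastInstance field is transient and is injected on the executing member via HazelcastInstanceAware, rather than captured from the submitting member:

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;

public class CacheRefreshTask
        implements Callable<Boolean>, Serializable, HazelcastInstanceAware {

    // transient: never serialized along with the task
    private transient HazelcastInstance hazelcast;

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        // Hazelcast calls this on the member that actually runs the task
        this.hazelcast = hazelcastInstance;
    }

    @Override
    public Boolean call() {
        hazelcast.getMap("cache").put("refreshedAt", System.currentTimeMillis());
        return Boolean.TRUE;
    }
}
```

Holding a non-transient HazelcastInstance (or IExecutorService) in the task is what ties it to the wrong member's lifecycle and can surface as executor shutdown errors.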
When I click the Start button to run the workflow, I run into the following situation: the task stays in the "successfully submitted" status forever. How can I solve this problem?
1. First, check via jps whether the WorkerServer process exists, or check directly from the service monitoring page whether a worker service is registered in ZooKeeper.
2. If the WorkerServer service is normal, check whether the MasterServer put the task into the ZooKeeper queue; look at the MasterServer log and the ZooKeeper queue to see whether the task is stuck there.
3. If neither of the above shows a problem, check whether a worker group is specified for the task while none of the machines in that group are online.
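Illustrative commands for the checks above; the ZooKeeper address and node layout vary by DolphinScheduler version and configuration:

```shell
# Step 1: are the worker and master JVMs running on this host?
jps | grep WorkerServer
jps | grep MasterServer

# Steps 1-2: inspect what is registered/queued in ZooKeeper
# (/dolphinscheduler is the default root node; adjust if customized)
zkCli.sh -server localhost:2181 ls /dolphinscheduler
```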
We receive a large number of requests in a queue, and we enqueue WorkManager work as each request arrives. How does ExistingWorkPolicy.REPLACE work?
The documentation says:
If there is existing pending (uncompleted) work with the same unique name, cancel and delete it.
Will it also kill an existing worker that is in the middle of running? We really do not want a running worker to be stopped mid-way; it is fine for work to be replaced while it is still enqueued, but not while it is running. Can we use the REPLACE option here?
https://developer.android.com/reference/androidx/work/ExistingWorkPolicy
As explained in WorkManager's guide and quoted in your question, when you enqueue a new unique WorkRequest using REPLACE as the existing-work policy, it will stop a previous worker that is currently running.
What happens to your worker really depends on how you implemented it (Worker, CoroutineWorker, or another ListenableWorker subclass) and how you handle stoppages and cancellations.
This means that your Worker needs to finish and clean up "cooperatively":
In the case of unique work, you explicitly enqueued a new WorkRequest with an ExistingWorkPolicy of REPLACE. The old WorkRequest is immediately considered terminated.
Under these conditions, your worker will receive a call to ListenableWorker.onStopped(). You should perform cleanup and cooperatively finish your worker in case the OS decides to shut down your app.
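A sketch of cooperative stoppage handling using the Java Worker API; the class name and the chunked work loop are illustrative:

```java
import android.content.Context;
import androidx.annotation.NonNull;
import androidx.work.Worker;
import androidx.work.WorkerParameters;

public class UploadWorker extends Worker {

    public UploadWorker(@NonNull Context context, @NonNull WorkerParameters params) {
        super(context, params);
    }

    @NonNull
    @Override
    public Result doWork() {
        for (int chunk = 0; chunk < 100; chunk++) {
            if (isStopped()) {
                // A REPLACE (or any cancellation) stopped this run:
                // bail out promptly instead of finishing the loop.
                return Result.failure();
            }
            processChunk(chunk);
        }
        return Result.success();
    }

    @Override
    public void onStopped() {
        // Called when the request is cancelled or replaced;
        // release resources (close streams, delete temp files) here.
    }

    private void processChunk(int chunk) { /* ... */ }
}
```

If stopping a running worker is unacceptable, consider checking WorkManager's state for the unique name before enqueuing, or using ExistingWorkPolicy.KEEP or APPEND instead of REPLACE.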
I ran into a rebalance issue with my Kafka consumer and was able to solve it by setting a correct value for max.poll.interval.ms.
But now I have seen some transactions go missing.
So I want to see in the logs which offsets were not processed.
How can I see such logs in Spring Kafka?
I am using Spring Kafka version 2.2.2.RELEASE.
If you are using @KafkaListener, you can add @Header(KafkaHeaders.OFFSET) long offset as a parameter to the listener method.
Also, you can set the commitLogLevel property on ContainerProperties to INFO and the container will log all commits at that level.
https://docs.spring.io/spring-kafka/docs/2.3.3.RELEASE/api/index.html?org/springframework/kafka/listener/ConsumerProperties.html
(ContainerProperties is a subclass of ConsumerProperties in 2.3.)
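A sketch combining both suggestions; the topic, group id, and class names are illustrative. The listener logs the offset (and partition) of each record it receives, and the comment shows where the commit log level is raised:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;

public class OffsetLoggingListener {

    @KafkaListener(topics = "transactions", groupId = "txn-group")
    public void listen(String payload,
                       @Header(KafkaHeaders.OFFSET) long offset,
                       @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition) {
        // Log where each record came from, so gaps are visible afterwards
        System.out.printf("partition %d offset %d: %s%n", partition, offset, payload);
    }
}

// In the listener container factory configuration, commits can be logged too:
//   factory.getContainerProperties()
//          .setCommitLogLevel(LogIfLevelEnabled.Level.INFO);
```

Comparing the received offsets against the committed offsets in the log should show which records were skipped after the rebalance.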