I have a question, mostly to make sure I understand this correctly so I can implement it the right way. The requirement is to stop the ConcurrentMessageListenerContainer by invoking the stop method on the container, which out of the box iterates over the KafkaMessageListenerContainers (based on the concurrency defined) and invokes stop for each consumer thread.
Just FYI, I am on 1.3.5 and cannot migrate to 2.x because I am on Spring Boot 1.5.x.
Configuration:
Let's say I have a topic with 5 partitions and concurrency defined as 5 as well. I'm using a BatchListener with a batch record count of 100 for each poll.
Question:
When we invoke stop on the container, it appears that internally it sets running to false for each KafkaMessageListenerContainer and calls wakeup on the listenerConsumer:
setRunning(false);
this.listenerConsumer.consumer.wakeup();
During testing, what I have observed by invoking stop on the container from a separate thread is the following:
1) It stops the listenerConsumer, and that definitely works. No more polling happens after calling stop.
2) If any listenerConsumer has already polled 100 records and is in the middle of processing them, it completes that execution before stopping.
Is #2 by design, i.e. does invoking container stop only send a wakeup to prevent the next poll? I don't see any handling of the following in KafkaMessageListenerContainer.run():
catch (WakeupException e) {
//Ignore, we're stopping
}
One more thing: even in Spring Kafka 2.1, the ContainerStoppingBatchErrorHandler calls the same container stop, so I guess it's more a question of my understanding of how to handle this scenario.
To conclude, in case the above is too much detail: I want to terminate the listener thread when stop is invoked from a separate thread. I use manual offset commits, so replaying the batch is fine.
Hi Gary,
As you suggested, I can have a consumer-aware listener, but my question is specifically about the listener being stopped through the container. Once the container invokes stop, the listener thread (a BatchListener in this case) is supposed to be interrupted mid-execution. I know the entire poll of records has already been received by the listener, and the question is not about losing offsets since the ack is at batch level.
It's not clear what your issue is; the container stops when the listener returns after processing the batch.
If you want to manage offsets at the record level, don't use a BatchMessageListener; use a record-level MessageListener instead.
With the former, the whole batch of records returned by the poll is considered a unit of work; offsets for the entire batch are committed, or not.
Even with a record-level listener, the container will not stop until the current batch of records has been sent to the listener; in that case, what you should do depends on the acknowledge mode. With manual acks, simply ignore the records that are received after the stop. If the container manages the acks, throw an exception for the records you want to discard (with ackOnError=false; the default is true).
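For what it's worth, here is a minimal sketch of that manual-ack approach: a record-level listener that ignores (and does not ack) anything delivered after stop() was called. The listener id "stopAware", the topic name, and the use of KafkaListenerEndpointRegistry to check the container state are my own illustrative choices; it assumes AckMode.MANUAL.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class StopAwareListener {

    private final KafkaListenerEndpointRegistry registry;

    public StopAwareListener(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    // "stopAware" and "myTopic" are placeholder names; the container is assumed
    // to be configured with AckMode.MANUAL
    @KafkaListener(id = "stopAware", topics = "myTopic")
    public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
        MessageListenerContainer container = this.registry.getListenerContainer("stopAware");
        if (container != null && !container.isRunning()) {
            // stop() has been called; the remainder of the current poll is still delivered,
            // so simply ignore it and don't ack - the records are redelivered on restart
            return;
        }
        // ... process the record ...
        ack.acknowledge();
    }
}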
It's unfortunate that you can't move to a more recent release.
With 2.1 there is much more flexibility. For example, the SeekToCurrentErrorHandler and SeekToCurrentBatchErrorHandler are provided.
The SeekToCurrentErrorHandler extends RemainingRecordsErrorHandler, which means the remaining records in the poll are sent to the error handler instead of the listener. When the container detects a RemainingRecordsErrorHandler, the listener won't get the remaining records in the batch and the handler can decide what to do - stop the container, perform seeks, etc.
So you no longer need to stop the container when a bad record is received.
This is all designed for handling errors/stopping the container due to some data error.
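As a rough sketch of wiring that handler into a @KafkaListener container factory: this assumes the 2.1-style configuration through ContainerProperties (later releases expose the setter on the factory itself), and the bean names and type parameters are illustrative.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;

@Configuration
public class KafkaErrorHandlingConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // don't commit the offset of the failed record...
        factory.getContainerProperties().setAckOnError(false);
        // ...and re-seek the unprocessed records so they are redelivered on the next poll
        // (exact setter location differs between 2.x releases)
        factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
        return factory;
    }
}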
There currently is no stopNow() option on the containers - that is, immediately stop calling the listener after the current record is processed and discard any remaining records so they are re-sent on the next start. Currently, you have to discard them yourself as I described above.
We could consider adding such an option but it would be a 2.2 feature at the earliest.
We only provide bug fixes in 1.3.x and, then, only if there is no work around.
It's open source, so you can always fork it yourself, and make contributions back to the framework.
I'm trying to verify that Kafka listeners are closed gracefully when I shut down my application (either gracefully or in some more aggressive way). Can somebody point me to the place where this is handled? I was looking at the destroy and close methods, but I couldn't locate where the Kafka client deregistration actually happens.
What happens if deregistration takes too much time? Or if some other Spring bean's shutdown hook takes too much time?
Also, what does this actually mean for events currently being processed? Can they commit offsets during shutdown if they happen to finish their work?
I'm working with spring-kafka version 1.3.5.
The current 1.3.x version is 1.3.8; you should upgrade. 1.3.9 will be released early January.
When you stop() a listener container, the container stops in an orderly fashion and processes any records that were previously returned by a poll() (but no further polls are made). However, you should ensure your listener processes any remaining records within the shutdownTimeout (default 10 seconds, but configurable). The stop() operation will block for that time, waiting for the container(s) to stop.
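A sketch of raising that timeout on a manually built container; the topic name, the 30-second value, and the no-op listener are placeholders, and note that the ContainerProperties package differs between versions (org.springframework.kafka.listener.config in 1.3.x, moved under org.springframework.kafka.listener in later 2.x releases):

import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;
import org.springframework.kafka.listener.config.ContainerProperties;

public class ContainerFactoryHelper {

    // Sketch: give stop() more time to wait for in-flight records
    public static ConcurrentMessageListenerContainer<String, String> buildContainer(
            ConsumerFactory<String, String> consumerFactory) {

        ContainerProperties props = new ContainerProperties("myTopic"); // placeholder topic
        props.setMessageListener((MessageListener<String, String>) record -> {
            // ... process the record ...
        });
        props.setShutdownTimeout(30_000); // default is 10000 ms

        ConcurrentMessageListenerContainer<String, String> container =
                new ConcurrentMessageListenerContainer<>(consumerFactory, props);
        container.setConcurrency(5);
        return container;
    }
}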
With more modern versions (2.2.x, currently 2.2.2) you can use an ApplicationListener or #EventListener to consume ContainerStoppedEvent.
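For example, a minimal sketch of such an @EventListener (spring-kafka 2.2.x); the bean name and the logging are illustrative:

import org.springframework.context.event.EventListener;
import org.springframework.kafka.event.ContainerStoppedEvent;
import org.springframework.stereotype.Component;

@Component
public class ContainerStopLogger {

    @EventListener
    public void onContainerStopped(ContainerStoppedEvent event) {
        // event.getSource() is the container that has fully stopped
        System.out.println("Container stopped: " + event.getSource());
    }
}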
This is a question related to :
https://github.com/spring-projects/spring-kafka/issues/575
I'm using spring-kafka 1.3.7 and transactions in a read-process-write cycle.
For this purpose, I should use a KTM (KafkaTransactionManager) on the Spring Kafka container to enable a transaction around the whole listener process, with automatic handling of the transactional id based on the partition for zombie fencing (1.3.7 changes).
If I understand issue #575 correctly, I cannot use a RetryTemplate in a container when using a transaction manager.
How am I supposed to handle errors and retries in such a case?
Is the default behavior with transactions infinite retries? That seems really dangerous; an unexpected exception might simply block the whole process in production.
The upcoming 2.2 release adds recovery to the DefaultAfterRollbackProcessor - so you can stop retrying after some number of attempts.
Docs Here, PR here.
It also provides an optional mechanism to send the failed record to a dead-letter topic.
If you can't move to 2.2 (the release candidate is due at the end of this week, with GA in October), you can provide a custom AfterRollbackProcessor with similar functionality.
EDIT
Or, you could add code to your listener (or its listener-level error handler) to keep track of how many times the same record has been delivered, and handle the error there yourself.
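A rough sketch of that last suggestion, counting deliveries in the listener itself; the key format, the limit of 3 attempts, and the "swallow after the limit" policy are arbitrary choices, not framework behavior:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class DeliveryCountingListener {

    private static final int MAX_ATTEMPTS = 3; // arbitrary limit

    private final Map<String, Integer> attempts = new ConcurrentHashMap<>();

    @KafkaListener(topics = "myTopic") // placeholder topic
    public void listen(ConsumerRecord<String, String> record) {
        String key = record.topic() + "-" + record.partition() + "@" + record.offset();
        int count = this.attempts.merge(key, 1, Integer::sum);
        try {
            process(record);
            this.attempts.remove(key);
        }
        catch (RuntimeException e) {
            if (count >= MAX_ATTEMPTS) {
                // give up: log or publish to a dead-letter topic yourself, then swallow
                // the exception so the transaction commits and the offset advances
                this.attempts.remove(key);
            }
            else {
                throw e; // roll back; the record will be redelivered
            }
        }
    }

    private void process(ConsumerRecord<String, String> record) {
        // real processing goes here
    }
}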
How does this.unblock work in Meteor?
The docs say:
Call inside a method invocation. Allow subsequent method from this client to begin running in a new fiber.
On the server, methods from a given client run one at a time. The N+1th invocation from a client won't start until the Nth invocation returns. However, you can change this by calling this.unblock. This will allow the N+1th invocation to start running in a new fiber.
How can new code start running in a new fiber if Node runs in a single thread? Does it only unblock when we get to an I/O request, but no unblock would happen if we were running a long computation?
Fibers are an abstraction layer on top of Node's Event Loop. They change how we write code to interact with the Event Loop, but they do not change how Node works. Meteor, among other things, is sort of an API to Fibers.
Each client request in Meteor creates a new fiber. Meteor methods called by the same client will, by default, queue up behind each other. This is likely the default behavior because of the assumption that you want Mongo to be up to date for all clients before continuing execution. However, if your clients do not need to work with the latest, up-to-date globals or data, you can use this.unblock() to put each of these client requests into Node's Event Loop without waiting for the previous one to complete. However, we are still constrained by Node's Event Loop.
So this.unblock() works by allowing all client requests to that method to enter the Event Loop (non-blocking, callback-based execution). However, as Node is still a single-threaded application, CPU-intensive operations will block the callbacks in the Event Loop. That is why Node is not a good choice for CPU-intensive work, and that doesn't change with Meteor or Meteor's interaction with Fibers and the Event Loop.
A simple analogy: the Event Loop, or our single Node thread, is a highway. Each car on the highway is a complex event-driven function that will eventually exit the highway when its callbacks complete. Fibers allow us to more easily control who gets on the highway and when. Meteor methods allow a single car on the highway at a time by default, but when properly using this.unblock() you allow multiple cars on the highway. However, a CPU-intensive operation on any fiber will cause a traffic jam; I/O and network will not.
Kind of an open question that I run into once in a while: if you have a stateful or stateless EJB, or possibly a direct servlet process, that may, with the wrong parameters, start running long on a production system, how could you effectively add a manual 'kill switch' for an administrator to kill that specific thread/process?
You can't, or at least you shouldn't, interfere with application server threads directly, so a "kill switch" looks decidedly inappropriate to me in a Java EE environment.
I do, however, understand the problem you have, and would rather suggest taking an asynchronous approach where you split your job into smaller work units.
I did that using EJB Timers and was happy with the result: an initial timer is created for the first work unit. When the application server executes the timer, it then registers a second one that corresponds to the 2nd work unit, and so on. Information can be passed from one work unit to the next because EJB Timers support the storage of custom information. Also, timer execution and registration are transactional, which works well with a database. You can even shut down and restart the application server with this approach. Before each work unit ran, we checked in the database whether the job had been canceled in the meantime.
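Roughly, the pattern looks like the sketch below (using EJB 3.1 timer APIs); the WorkUnit payload, the cancellation check, and the unit count are hypothetical stand-ins for whatever your job actually needs.

import java.io.Serializable;

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;

@Stateless
public class ChunkedJobBean {

    @Resource
    private TimerService timerService;

    // Kick off the job: schedule the first work unit
    public void startJob(String jobId) {
        schedule(new WorkUnit(jobId, 0));
    }

    @Timeout
    public void runWorkUnit(Timer timer) {
        WorkUnit unit = (WorkUnit) timer.getInfo();
        if (jobCancelled(unit.jobId)) {
            return; // an administrator cancelled the job in the meantime
        }
        // ... process this chunk transactionally ...
        if (unit.index + 1 < totalUnits(unit.jobId)) {
            schedule(new WorkUnit(unit.jobId, unit.index + 1)); // chain the next unit
        }
    }

    private void schedule(WorkUnit unit) {
        // run "immediately"; the state travels in the (persistent) timer info
        this.timerService.createSingleActionTimer(0, new TimerConfig(unit, true));
    }

    private boolean jobCancelled(String jobId) {
        return false; // hypothetical: query a job-control table here
    }

    private int totalUnits(String jobId) {
        return 10; // hypothetical: derive from the job definition
    }

    public static class WorkUnit implements Serializable {
        final String jobId;
        final int index;
        WorkUnit(String jobId, int index) {
            this.jobId = jobId;
            this.index = index;
        }
    }
}

Because each unit runs in its own transaction and the cancellation flag lives in the database, an administrator can stop the job between units without touching server threads.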
I have an ASP.NET application that needs to have some work performed by another machine. To do this, I leave a message on a queue visible to both machines. When the work is done, a message is left on a second queue.
I need the ASP.Net application to check the second queue periodically to see if any of the tasks are complete.
Where is the best place to put such a loop? Global.asax?
I remember reading somewhere that you can get a function called after an interval. Would that be suitable?
I've found two acceptable approaches to running periodic tasks in ASP.NET:
Spawn a thread during Application_Start in Global.asax that, in a while loop, (1) does the work and (2) sleeps for an interval.
Again in Application_Start, insert a dummy item into the ASP.NET cache that expires after a certain interval, and give that cache item a removal callback to be invoked when it expires. In that callback, do the work and re-insert the cache item the same way.
In both approaches, you need to make sure the loop keeps working even if an error occurs. You can place recovery code in Session_Start and BeginRequest to check whether your thread or cache item is still there, and renew it if something has happened to it.
I assume that this is done on a regular basis, and that some other process puts the items on the queue?
If that is the case, you might put something in Global.asax that, on application start, creates a separate thread that simply monitors the queue; you could use a timer to have that thread sleep for X seconds, then check for results.