autoStartup for @StreamListener - spring-kafka

Unlike @KafkaListener, @StreamListener does not appear to support the autoStartup parameter. Is there a way to achieve the same behavior for @StreamListener? Here's my use case:
I have a generic Spring application that can listen to any Kafka topic and write to its corresponding table in my database. For some topics the volume is low, so processing one message at a time with very low latency is fine. For high-volume topics, the code should receive a microbatch of messages and write them to the database less frequently using JDBC batching. Ideally the listener definitions would look something like this:
// low volume listener
@StreamListener(target = Sink.INPUT, autoStartup="${application.singleMessageListenerEnabled}")
public void handleSingleMessage(@Payload GenericRecord message) ...
// high volume listener
@StreamListener(target = Sink.INPUT, autoStartup="${application.multipleMessageListenerEnabled}")
public void handleMultipleMessages(@Payload List<GenericRecord> messageList) ...
For a low-volume topic, I would set application.singleMessageListenerEnabled to true and application.multipleMessageListenerEnabled to false, and vice versa for a high-volume topic. Thus, only one of the listeners would be actively listening for messages and the other not actively listening.
Is there a way to achieve this with @StreamListener?

First, please consider migrating to the functional programming model; the refactoring should take only minutes. We have all but deprecated the annotation-based programming model.
If you do, what you're trying to accomplish becomes very easy:
@SpringBootApplication
public class SimpleStreamApplication {
    public static void main(String[] args) throws Exception {
        // pass args so that --spring.cloud.function.definition=... reaches the Environment
        SpringApplication.run(SimpleStreamApplication.class, args);
    }
    @Bean
    public Consumer<GenericRecord> singleRecordConsumer() {...}
    @Bean
    public Consumer<List<GenericRecord>> multipleRecordConsumer() {...}
}
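Then, when starting the application, pass --spring.cloud.function.definition=singleRecordConsumer for the single-message case or --spring.cloud.function.definition=multipleRecordConsumer for the batch case; this determines which listener is activated. (The batch consumer also needs batch mode enabled on its binding, e.g. spring.cloud.stream.bindings.multipleRecordConsumer-in-0.consumer.batch-mode=true.)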
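A dependency-free toy model of what the property changes, under stated assumptions: it is not Spring Cloud Stream's implementation, String stands in for GenericRecord, and the `deliver` method is hypothetical. Only the consumer named by the definition is fed; the single-record consumer is invoked once per message, while the batch consumer receives the whole polled list in one call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FunctionSelectionSketch {

    // Observable side effects of each consumer (String stands in for GenericRecord).
    public static final List<String> singleLog = new ArrayList<>();
    public static final List<Integer> batchSizes = new ArrayList<>();

    // Stand-ins for the two @Bean methods in the answer.
    static final Consumer<String> singleRecordConsumer = singleLog::add;
    static final Consumer<List<String>> multipleRecordConsumer = batch -> batchSizes.add(batch.size());

    // Hypothetical: delivers polled records only to the function named by "definition",
    // mimicking --spring.cloud.function.definition=<name>.
    public static void deliver(String definition, List<String> polled) {
        if (definition.equals("singleRecordConsumer")) {
            polled.forEach(singleRecordConsumer); // one invocation per record
        } else if (definition.equals("multipleRecordConsumer")) {
            multipleRecordConsumer.accept(polled); // one invocation per batch
        } else {
            throw new IllegalArgumentException("Unknown function: " + definition);
        }
    }

    public static void main(String[] args) {
        deliver(args.length > 0 ? args[0] : "singleRecordConsumer", List.of("r1", "r2", "r3"));
        System.out.println("single: " + singleLog + ", batch sizes: " + batchSizes);
    }
}
```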

Related

Starting multiple Kafka listeners in Spring Kafka?

One of our dev teams is doing something I've never seen before.
First they're defining an abstract class for their consumers.
public abstract class KafkaConsumerListener {
    protected void processMessage(String xmlString) {
    }
}
Then they use 10 classes like the one below to create 10 individual consumers.
@Component
public class <YouNameIt>Consumer extends KafkaConsumerListener {
    private static final String <YouNameIt> = "<YouNameIt>";
    @KafkaListener(topics = "${my-configuration.topicname}",
            groupId = "${my-configuration.topicname.group-id}",
            containerFactory = <YouNameIt>)
    public void listenToStuff(@Payload String message) {
        processMessage(message);
    }
}
So with this they're trying to start 10 Kafka listeners (one class/object per listener). Each listener should have its own consumer group (with its own name) and consume from its own topic.
They seem to use different ConcurrentKafkaListenerContainerFactories, each with a @Bean annotation, so they can assign a different groupId to each container factory.
Is something like that supported by Spring Kafka?
It seemed to work until a few days ago, but now one consumer group gets stuck every time. It starts, reads a few records and then hangs, and the consumer lag keeps growing.
Any ideas?
Yes, it is supported, but it's not necessary to create multiple factories just to change the group id; the groupId property on the annotation overrides the factory property.
A problem like the one you describe is most likely caused by the consumer thread being "stuck" somewhere in user code; take a thread dump to see what the thread is doing.
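The JDK's own tools are the usual way to get a thread dump (jstack <pid> or jcmd <pid> Thread.print). If attaching to the process is awkward, a sketch like the following captures similar information from inside the application through the standard ThreadMXBean; look for the Kafka consumer thread and check where it is blocked. The class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    // Builds a jstack-like text dump of every live thread in this JVM.
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Request lock information only where the JVM supports it.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            sb.append('\n');
            // Print the full stack; ThreadInfo.toString() would truncate it to a few frames.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```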

Which design pattern does this bug violate, and what is it called?

There is a singleton service class ItemsDataSource: IItemsDataSource injected into many other classes (business domain classes).
There are many of these business domain classes; they run asynchronously and call methods on that ItemsDataSource service.
public interface IItemsDataSource
{
    Task<IEnumerable<string>> GetItemsAsync();
    void SetSourceConfiguration(JToken src);
}
public class ItemsDataSource : IItemsDataSource
{
    private JToken m_configuration;
    public Task<IEnumerable<string>> GetItemsAsync()
    {
        // Use m_configuration to fetch some items
    }
    public void SetSourceConfiguration(JToken config)
    {
        m_configuration = config;
    }
}
When multiple classes that are using this service are running asynchronously (let's say on 2 threads T1 and T2), this is sometimes happening:
T1 calls SetSourceConfiguration(config1) then starts running GetItemsAsync() asynchronously.
T2 calls SetSourceConfiguration(config2) (m_configuration is now set to config2) before T1 has finished running GetItemsAsync(). As a result, T1 uses config2 instead of config1 and unexpected behavior occurs.
The questions:
1- I think the optimal fix is removing SetSourceConfiguration and passing the JToken config directly as a parameter into GetItemsAsync, or locking the code in the business classes. Or is there a better solution?
2- Which design pattern violation caused this bug? That way it could be avoided in the first place.
3- What is the "technical" term for this bug? Methods with side effects, design pattern violation, etc.?
1- I think the optimal fix is removing SetSourceConfiguration and passing the JToken config directly as a parameter into GetItemsAsync, or locking the code in the business classes. Or is there a better solution?
If your ItemsDataSource service were not a singleton, you could remove SetSourceConfiguration and pass the JToken config directly as a parameter into GetItemsAsync. However, it is a singleton service, so the way to go, IMHO, is to lock the critical section.
The code would look like this, if you are using C#.
private readonly object myLock = new object();
public void Foo()
{
    lock (myLock)
    {
        // critical section of code here
    }
}
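The lock only helps if the critical section spans both the SetSourceConfiguration call and the read that depends on it; locking each method separately still lets another thread swap the configuration in between. A Java translation of that idea, assuming a synchronous stand-in for GetItemsAsync and String in place of JToken; getItemsWith is a hypothetical caller-side helper:

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockedItemsDataSource {

    private final ReentrantLock lock = new ReentrantLock();
    private String configuration; // shared mutable state (String stands in for JToken)

    public void setSourceConfiguration(String config) {
        configuration = config;
    }

    // Synchronous stand-in for GetItemsAsync: reads whatever configuration is current.
    public List<String> getItems() {
        return List.of("item-from-" + configuration);
    }

    // Hypothetical helper: the lock spans BOTH calls, so no other thread
    // can replace the configuration between the set and the read.
    public List<String> getItemsWith(String config) {
        lock.lock();
        try {
            setSourceConfiguration(config);
            return getItems();
        } finally {
            lock.unlock();
        }
    }
}
```

The cost is that all callers are serialized for the full duration of the fetch.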
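Read more about the lock statement in the C# documentation. Note that C# does not allow an await inside a lock block, so if the critical section has to await GetItemsAsync, an async-compatible primitive such as SemaphoreSlim (awaited via WaitAsync) is the usual substitute.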
2- Which design pattern violation caused this bug? That way it could be avoided in the first place.
It is not a pattern violation; it is a race condition. A race condition occurs when two or more threads can access shared data and try to change it at the same time.
3- What is the "technical" term for this bug? Methods with side effects, design pattern violation, etc.?
It is a race condition.
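The first fix proposed in the question avoids the race altogether: if the configuration is passed with each call instead of being stored in a field, the service holds no shared mutable state, so it can remain a singleton without any lock. A Java sketch of that shape, with String standing in for JToken and CompletableFuture standing in for Task:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Stateless variant: no field is written per call, so concurrent callers cannot interfere.
public class StatelessItemsDataSource {

    public CompletableFuture<List<String>> getItemsAsync(String config) {
        // The configuration travels with the call; each async task captures its own value.
        return CompletableFuture.supplyAsync(() -> List.of("item-from-" + config));
    }
}
```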

Axon Sagas duplicates events in event store when replaying events to new DB

We have an Axon application that stores new orders. For each order state change (OrderStateChangedEvent) it schedules a couple of tasks. The tasks are triggered and processed by yet another Saga (TaskSaga, out of scope of this question).
When I delete the projection database but leave the event store, then run the application again, the events are replayed (which is correct), but the tasks are duplicated.
I suppose this is because the OrderStateChangedEvent triggers a new set of ScheduleTaskCommands each time.
Since I'm new to Axon, I can't figure out how to avoid this duplication.
Event store running on AxonServer
Spring boot application autoconfigures the axon stuff
The projection database contains the projection tables and the Axon tables:
token_entry
saga_entry
association_value_entry
I suppose all the events are replayed because, by recreating the database, the Axon tables are gone (hence there is no record of the last applied event).
Am I missing something?
Should the token_entry/saga_entry/association_value_entry tables be part of the projection database on each application node?
I thought that the event store could be replayed onto a new application node's DB at any time without changing the event history, so I can run as many nodes as I wish. Or that I can remove the projection DB at any time and run the application, which causes the events to be projected into the fresh DB again. Is this not true?
In general, my problem is that one event produces a command that leads to new (duplicated) events being produced. Should I avoid this "chaining" of events to avoid duplication?
THANKS!
Axon configuration:
@Configuration
public class AxonConfig {
    @Bean
    public EventSourcingRepository<ApplicationAggregate> applicationEventSourcingRepository(EventStore eventStore) {
        return EventSourcingRepository.builder(ApplicationAggregate.class)
                .eventStore(eventStore)
                .build();
    }
    @Bean
    public SagaStore sagaStore(EntityManager entityManager) {
        return JpaSagaStore.builder().entityManagerProvider(new SimpleEntityManagerProvider(entityManager)).build();
    }
}
CreateOrderCommand is received by the Order aggregate (the fromCommand method just maps the command 1:1 to an event):
@CommandHandler
public OrderAggregate(CreateOrderCommand cmd) {
    apply(OrderCreatedEvent.fromCommand(cmd))
            .andThenApply(() -> OrderStateChangedEvent.builder()
                    .applicationId(cmd.getOrderId())
                    .newState(OrderState.NEW)
                    .build());
}
The Order aggregate sets the properties:
@EventSourcingHandler
protected void on(OrderCreatedEvent event) {
    id = event.getOrderId();
    // ... additional properties set
}
@EventSourcingHandler
protected void on(OrderStateChangedEvent cmd) {
    this.state = cmd.getNewState();
}
OrderStateChangedEvent is handled by a Saga that schedules a couple of tasks for the order in that particular state:
private Map<String, TaskStatus> tasks = new HashMap<>();
private OrderState orderState;
@StartSaga
@SagaEventHandler(associationProperty = "orderId")
public void on(OrderStateChangedEvent event) {
    orderState = event.getNewState();
    List<OrderStateAwareTaskDefinition> tasksByState = taskService.getTasksByState(orderState);
    if (tasksByState.isEmpty()) {
        finishSaga(event.getOrderId());
    }
    tasksByState.stream()
            .map(task -> ScheduleTaskCommand.builder()
                    .orderId(event.getOrderId())
                    .taskId(IdentifierFactory.getInstance().generateIdentifier())
                    .targetState(orderState)
                    .taskName(task.getTaskName())
                    .build())
            .peek(command -> tasks.put(command.getTaskId(), SCHEDULED))
            .forEach(command -> commandGateway.send(command));
}
I think I can help you in this situation.
This happens because the TrackingToken used by the TrackingEventProcessor, which supplies all the events to your Saga instances, is initialized to the beginning of the event stream. As a result, the TrackingEventProcessor starts from the beginning of time, so all your commands are dispatched a second time.
There are a couple of things you could do to resolve this.
1. You could, instead of wiping the entire database, only wipe the projection tables and leave the token table intact.
2. You could configure the initialTrackingToken of a TrackingEventProcessor to start at the head of the event stream instead of the tail.
Option 1 would work out fine, but requires some coordination from the operations side. Option 2 leaves it in the hands of a developer, which is potentially a little safer.
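To make the cause concrete, here is a toy model that is deliberately not Axon's API: the event store is a list, and a token is the index of the next event a processor will handle. A processor starting from a tail token (the situation once token_entry is gone) re-handles every stored event, so the saga dispatches its ScheduleTaskCommands again; one starting from a head token only sees events appended later.

```java
import java.util.List;

// Toy model, not Axon's API: a token is the index of the next event to handle.
public class TokenStartSketch {

    // Tail token: the very first stored event (what a processor gets after token_entry is wiped).
    public static int tailToken() {
        return 0;
    }

    // Head token: just past the last stored event, so stored history is skipped.
    public static int headToken(List<String> eventStore) {
        return eventStore.size();
    }

    // Events a newly started processor would hand to its saga, starting at the token.
    public static List<String> eventsFrom(List<String> eventStore, int token) {
        return eventStore.subList(token, eventStore.size());
    }

    public static void main(String[] args) {
        List<String> store = List.of("OrderCreatedEvent", "OrderStateChangedEvent(NEW)");
        System.out.println("from tail: " + eventsFrom(store, tailToken()));
        System.out.println("from head: " + eventsFrom(store, headToken(store)));
    }
}
```

As with andInitialTrackingToken, the initial token only matters when no token is stored yet; a processor that already has a saved token resumes from it.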
To adjust the token to start at the head, you can instantiate a TrackingEventProcessor with a TrackingEventProcessorConfiguration:
EventProcessingConfigurer configurer;
TrackingEventProcessorConfiguration trackingProcessorConfig =
        TrackingEventProcessorConfiguration.forSingleThreadedProcessing()
                .andInitialTrackingToken(StreamableMessageSource::createHeadToken);
configurer.registerTrackingEventProcessor("{class-name-of-saga}Processor",
        Configuration::eventStore,
        c -> trackingProcessorConfig);
You would thus create the desired configuration for your Saga and call the andInitialTrackingToken() function, ensuring that a head token is created if no token is present.
I hope this helps you out Tomáš!
Steven's solution works like a charm, but only for Sagas. For those who want the same effect in a classic @EventHandler (to skip executions on replay), there is a way. First, find out the name of your tracking event processor. I found it in the Axon Dashboard (port 8024 on a running Axon Server); usually it is the location of the component with the @EventHandler annotation (the package name, to be precise). Then add the configuration as Steven indicated in his answer:
@Autowired
public void customConfig(EventProcessingConfigurer configurer) {
    // This prevents some events from being replayed in @EventHandler
    var trackingProcessorConfig = TrackingEventProcessorConfiguration
            .forSingleThreadedProcessing()
            .andInitialTrackingToken(StreamableMessageSource::createHeadToken);
    configurer.registerTrackingEventProcessor("com.domain.notreplayable",
            org.axonframework.config.Configuration::eventStore,
            c -> trackingProcessorConfig);
}

Spring Data JPA - Java 8 Stream Support & Transactional Best Practices

I have a pretty standard MVC setup with Spring Data JPA Repositories for my DAO layer, a Service layer that handles Transactional concerns and implements business logic, and a view layer that has some lovely REST-based JSON endpoints.
My question is around wholesale adoption of Java 8 Streams into this lovely architecture: If all of my DAOs return Streams, my Services return those same Streams (but do the Transactional work), and my Views act on and process those Streams, then by the time my Views begin working on the Model objects inside my Streams, the transaction created by the Service layer will have been closed. If the underlying data store hasn't yet materialized all of my model objects (it is a Stream after all, as lazy as possible) then my Views will get errors trying to access new results outside of a transaction. Previously this wasn't a problem because I would fully materialize results into a List - but now we're in the brave new world of Streams.
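The failure mode described above can be reproduced without Spring or JPA. In this toy model, a static flag stands in for the transaction and the lazy map step stands in for the JPA cursor loading an entity; all names are hypothetical. Because a Stream pipeline does no work until a terminal operation runs, nothing is loaded while the "transaction" is open, and the view-side terminal operation fails after it has closed.

```java
import java.util.stream.Stream;

public class LazyStreamPitfall {

    // Stand-in for "a transaction is currently open".
    public static boolean transactionOpen = false;

    // Stand-in for a DAO method returning a lazy Stream backed by a cursor.
    public static Stream<String> findAll() {
        return Stream.of(1, 2, 3).map(id -> {
            if (!transactionOpen) {
                throw new IllegalStateException("No transaction: cannot load entity " + id);
            }
            return "model-" + id;
        });
    }

    // Stand-in for a @Transactional service method that returns the Stream unconsumed.
    public static Stream<String> serviceGetAll() {
        transactionOpen = true;
        try {
            return findAll(); // nothing has been loaded yet; the pipeline is lazy
        } finally {
            transactionOpen = false; // the "transaction" ends when the method returns
        }
    }

    public static void main(String[] args) {
        try {
            // The "view" consumes the stream only after the transaction has closed.
            serviceGetAll().forEach(System.out::println);
        } catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```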
So, what is the best way to handle this? Fully materialize the results inside of the Service layer as a List and hand them back? Have the View layer hand the Service layer a completion block so further processing can be done inside of a transaction?
Thanks for the help!
In thinking through this, I decided to try the completion block solution I mentioned in my question. All of my service methods now have as their final parameter a results transformer that takes the Stream of Model objects and transforms it into whatever resulting type is needed/requested by the View layer. I'm pleased to report it works like a charm and has some nice side-effects.
Here's my Service base class:
public class ReadOnlyServiceImpl<MODEL extends AbstractSyncableEntity, DAO extends AbstractSyncableDAO<MODEL>> implements ReadOnlyService<MODEL> {
    @Autowired
    protected DAO entityDAO;
    protected <S> S resultsTransformer(Supplier<Stream<MODEL>> resultsSupplier, Function<Stream<MODEL>, S> resultsTransform) {
        try (Stream<MODEL> results = resultsSupplier.get()) {
            return resultsTransform.apply(results);
        }
    }
    @Override
    @Transactional(readOnly = true)
    public <S> S getAll(Function<Stream<MODEL>, S> resultsTransform) {
        return resultsTransformer(entityDAO::findAll, resultsTransform);
    }
}
The resultsTransformer method here is a gentle reminder for subclasses to not forget about the try-with-resources pattern.
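The try-with-resources matters because a repository Stream may hold an open cursor that is only released when the Stream is closed. A dependency-free version of the same helper, with an onClose handler standing in for that cursor (findAll and the closed flag are illustrative), shows that the transform runs first and the stream is closed as soon as it returns:

```java
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class ResultsTransformerSketch {

    // Set by the close handler; stands in for "the cursor was released".
    public static boolean closed = false;

    // Same shape as resultsTransformer above: the transform runs inside try-with-resources,
    // so the stream is closed right after the transform returns (or throws).
    public static <T, S> S resultsTransformer(Supplier<Stream<T>> resultsSupplier,
                                              Function<Stream<T>, S> resultsTransform) {
        try (Stream<T> results = resultsSupplier.get()) {
            return resultsTransform.apply(results);
        }
    }

    // Stand-in for a repository method whose Stream must be closed after use.
    public static Stream<Integer> findAll() {
        return Stream.of(1, 2, 3).onClose(() -> closed = true);
    }
}
```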
And here is an example Controller calling in to the service base class:
public abstract class AbstractReadOnlyController<MODEL extends AbstractSyncableEntity,
        DTO extends AbstractSyncableDTOV2,
        SERVICE extends ReadOnlyService<MODEL>>
{
    @Autowired
    protected SERVICE entityService;
    protected Function<MODEL, DTO> modelToDTO;
    protected AbstractReadOnlyController(Function<MODEL, DTO> modelToDTO) {
        this.modelToDTO = modelToDTO;
    }
    protected List<DTO> modelStreamToDTOList(Stream<MODEL> s) {
        return s.map(modelToDTO).collect(Collectors.toList());
    }
    // Read All
    protected List<DTO> getAll(Optional<String> lastUpdate)
    {
        if (!lastUpdate.isPresent()) {
            return entityService.getAll(this::modelStreamToDTOList);
        } else {
            Date since = new TimeUtility(lastUpdate.get()).getTime();
            return entityService.getAllUpdatedSince(since, this::modelStreamToDTOList);
        }
    }
}
I think it's a pretty neat use of generics to have the Controllers dictate the return type of the Services via Java 8 lambdas. While it's strange for me to see the Controller directly returning the result of a Service call, I do appreciate how tight and expressive this code is.
I'd say this is a net positive for attempting a wholesale switch to Java 8 Streams. Hopefully this helps someone with a similar question down the road.

Using Twitter4J's UserStreamListener with EJB

Looking around StackOverflow, I see this answer to a similar problem - according to the Twitter4J documentation, TwitterStream#addListener takes a callback function. I have naively written my class as follows:
@Stateless
@LocalBean
public class TwitterListenerThread implements Runnable {
    private TwitterStream twitterStream;
    public TwitterListenerThread(){}
    @EJB private TwitterDispatcher dispatcher;
    @Override
    public void run() {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setJSONStoreEnabled(true)
                .setOAuthConsumerKey(Properties.getProperty("twitter_OAuthConsumerKey"))
                .setOAuthConsumerSecret(Properties.getProperty("twitter_OAuthConsumerSecret"))
                .setOAuthAccessToken(Properties.getProperty("twitter_OAuthAccessToken"))
                .setOAuthAccessTokenSecret(Properties.getProperty("twitter_OAuthAccessTokenSecret"));
        twitterStream = new TwitterStreamFactory(cb.build()).getInstance();
        UserStreamListener listener = new UserStreamListener() {
            @Override
            public void onStatus(Status status) {
                dispatcher.dispatch(status);
            }
            // Standard code
        };
        twitterStream.addListener(listener);
        // Listen for all user activity
        String user = Properties.getProperty("twitter-userid");
        String[] users = {user};
        twitterStream.user(users);
    }
}
Now, on my colleague's PC this soon fails with "attempt to invoke when container is undeployed" on the dispatcher.dispatch(status); line. I understand the reason to be that the Twitter4J threading model does not play well with the Java EE EJB model, but I cannot work out what to do based on the linked answer: how would I use a Message-Driven Bean to listen to the Twitter stream?
After a little thinking, I worked out that the solution offered was to write a separate application that used just Java SE code to feed, using non-annotated code, a JMS queue with tweets, and then in my main application use a Message-Driven Bean to listen to the queue.
However, I was not satisfied with that work-around, so I searched a little more, and found Issue TFJ-285, Allow for alternative implementations of Dispatcher classes:
Now it is possible to introduce your own dispatcher implementation.
It can be Quartz based, it can be MDB based, and it can be EJB-timer based.
By default, Twitter4J still uses traditional and transient thread based dispatcher.
Implement a class implementing twitter4j.internal.async.Dispatcher interface
put the class in the classpath
set -Dtwitter4j.async.dispatcherImpl to locate your dispatcher implementation
The default implementation is on GitHub; in it, one could replace the:
private final ExecutorService executorService;
with a:
private final ManagedExecutorService executorService;
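Swapping the field is easier if the dispatcher receives its executor from outside instead of creating it. This is a sketch under assumptions: the invokeLater and shutdown method names are modelled on what a dispatcher needs and should be checked against the actual twitter4j Dispatcher interface, and ExecutorBackedDispatcher is a hypothetical name. Since ManagedExecutorService extends ExecutorService, a container-provided instance could be passed in (how to obtain it, by injection or a JNDI lookup, depends on how twitter4j instantiates the class, which this sketch does not cover). Because a managed executor's lifecycle belongs to the container, the sketch only shuts down executors it owns.

```java
import java.util.concurrent.ExecutorService;

// Hypothetical dispatcher: delegates to an executor supplied by the caller instead of
// creating its own threads. Method names are assumptions, not the verified twitter4j API.
public class ExecutorBackedDispatcher {

    private final ExecutorService executorService;
    // Container-managed executors (e.g. ManagedExecutorService) must not be shut down by the app.
    private final boolean ownsExecutor;

    public ExecutorBackedDispatcher(ExecutorService executorService, boolean ownsExecutor) {
        this.executorService = executorService;
        this.ownsExecutor = ownsExecutor;
    }

    // Runs the callback (e.g. a listener's onStatus) on the supplied executor.
    public void invokeLater(Runnable task) {
        executorService.execute(task);
    }

    public void shutdown() {
        if (ownsExecutor) {
            executorService.shutdown();
        }
    }
}
```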
And, in theory, Bob's your uncle. If I ever get this working, I shall post the code here.
