Axon Framework: implementing IntervalRetryScheduler only for some commands

I have a Saga and the Saga sends Commands to different microservices on specific Events.
Some of the microservices may be down more often than others, so I want to configure a CommandGateway with a RetryScheduler and implement my own IntervalRetryScheduler, so that I can retry on every RuntimeException but only for some Axon commands (the question "Why does the RetryScheduler in Axon Framework not retry after a NoHandlerForCommandException?" was a big help).
Everything works as expected. My only concern is whether any issues arise from the fact that some commands will be sent with the default CommandGateway and some with my custom CommandGateway that has the custom retry built in.
For now, I would not use the custom CommandGateway for commands that need no retry.
I've gone with the distinct CommandGateway beans approach:
@Bean
public CommandGateway commandGateway() {
    Configurer configurer = DefaultConfigurer.defaultConfiguration();
    CommandBus commandBus = configurer.buildConfiguration().commandBus();
    CommandGateway commandGateway = DefaultCommandGateway.builder().commandBus(commandBus).build();
    return commandGateway;
}

@Bean
public CommandGateway commandGatewayWithRetry() {
    Configurer configurer = DefaultConfigurer.defaultConfiguration();
    CommandBus commandBus = configurer.buildConfiguration().commandBus();
    ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1);
    RetryScheduler rs = IntervalRetrySchedulerImpl.builder()
            .retryExecutor(scheduledExecutorService)
            .maxRetryCount(5)
            .retryInterval(1000)
            .build();
    CommandGateway commandGateway = DefaultCommandGateway.builder().commandBus(commandBus).retryScheduler(rs).build();
    return commandGateway;
}

There are a couple of angles you can take from here on out.
If you are set on using the RetryScheduler/CommandGateway idea, you can do either of the following.
Configure distinct CommandGateway beans, either with or without the RetryScheduler.
Be specific about the type of exception thrown from the Sagas for retries, so that your RetryScheduler can decide whether or not to retry.
Build a Custom Command Gateway (as described here). From there on out, you can be as specific as you want when it comes to how certain commands behave.
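To make the "retry only for some commands" decision concrete, here is a minimal, framework-free sketch of the check a custom RetryScheduler would have to make. Nothing here is Axon API: SelectiveRetryPolicy, shouldRetry, ReserveStockCommand and SendEmailCommand are hypothetical names used only for illustration, and the assumption is that your scheduler implementation would call something like this before scheduling the next attempt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, framework-free retry decision; this is NOT part of Axon's API. */
final class SelectiveRetryPolicy {
    private final Set<Class<?>> retryableCommands;
    private final int maxRetryCount;

    SelectiveRetryPolicy(int maxRetryCount, Class<?>... retryableCommands) {
        this.maxRetryCount = maxRetryCount;
        this.retryableCommands = new HashSet<>(Arrays.asList(retryableCommands));
    }

    /** Retry only RuntimeExceptions, only for the listed command types, and only below the limit. */
    boolean shouldRetry(Object command, Throwable failure, int attemptsSoFar) {
        return failure instanceof RuntimeException
                && retryableCommands.contains(command.getClass())
                && attemptsSoFar < maxRetryCount;
    }
}

// Hypothetical command types, used only for illustration.
final class ReserveStockCommand {}
final class SendEmailCommand {}
```

With a policy created as new SelectiveRetryPolicy(5, ReserveStockCommand.class), a failed ReserveStockCommand would be retried up to five times, while a failed SendEmailCommand would not be retried at all. A check like this could let a single gateway serve both kinds of commands instead of two separate beans.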
However, I think the solution suggested in this Axon Usergroup post would be more worthwhile to follow in your situation.
To summarize the suggested approach, the idea is to schedule the retry in the Saga itself by using the Deadline mechanism provided by Axon.
That way, you can just let the command fail if the microservice is unavailable (which I assume is the problem you are trying to solve) and have the Saga itself retry the operation after a certain timeout.
Hope this helps you out!

Related

Can I use CompletableFuture.runAsync inside a spring kafka batch listener?

considering my question:
@KafkaListener(..)
public void receive(
        List<ConsumerRecord<String, String>> records,
        Acknowledgment ack) {
    // fire-and-forget: one async task per record
    records.forEach(r -> CompletableFuture.runAsync(() -> ConsumerService.process(r)));
    ack.acknowledge();
}
What are the pitfalls? Is this good code?
My process method reposts the record to Kafka if it fails, so I can commit whether or not I get an error.
You run the risk of losing messages because you are committing the offsets before the async tasks complete (if there is a failure such as a server crash or power failure).
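One way to reduce that risk is to wait for every async task before acknowledging. The following is a minimal sketch in plain Java, not Spring Kafka code: AckAfterCompletion and processThenAck are hypothetical names, and the ack parameter stands in for a call such as ack.acknowledge() in the listener.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical helper: run one async task per record and acknowledge only after all succeed. */
final class AckAfterCompletion {
    static <R> void processThenAck(List<R> records, Consumer<R> process, Runnable ack) {
        CompletableFuture<?>[] tasks = records.stream()
                .map(r -> CompletableFuture.runAsync(() -> process.accept(r)))
                .toArray(CompletableFuture[]::new);
        // Blocks the calling thread until every task is done.
        // If any task failed, join() throws CompletionException and ack is never run.
        CompletableFuture.allOf(tasks).join();
        ack.run();
    }
}
```

The records in a batch are still processed in parallel, but the offsets are committed only once all of them have completed. Keep in mind that the listener thread is blocked while waiting, so the whole batch has to finish within the consumer's max.poll.interval.ms.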

Sending a message to a queue on different Host in Rebus

In my setup I have two RabbitMQ servers that are used by different applications employing Rebus ESB. What I would like to know is if I can map a message to a queue on a different Host the way I can with MassTransit.
I also would like to know if I can send messages in a batch mode the same way with MassTransit.
Thanks In Advance.
"In my setup I have two RabbitMQ servers that are used by different applications employing Rebus ESB. What I would like to know is if I can map a message to a queue on a different Host the way I can with MassTransit."
I am not sure how this works with MassTransit, but I'm pretty sure it's not readily possible with Rebus.
With Rebus, you're encouraged to treat this as you would any other integration scenario, where you'd put an ICanSendToOtherSystem in your IoC container, which just happens to be implemented by CanSendToOtherSystemUsingRebus. Your CanSendToOtherSystemUsingRebus class would probably look somewhat like this:
public class CanSendToOtherSystemUsingRebus : ICanSendToOtherSystem, IDisposable
{
    readonly IBus _bus;

    public CanSendToOtherSystemUsingRebus(string connectionString)
    {
        _bus = Configure.With(new BuiltinHandlerActivator())
            .Transport(t => t.UseRabbitMqAsOneWayClient(connectionString))
            .Start();
    }

    public Task Send(object message) => _bus.Send(message);

    public void Dispose() => _bus.Dispose();
}
(i.e. just something that wraps a one-way client that can connect to that other RabbitMQ host, registered as a SINGLETON in the container)
"I also would like to know if I can send messages in a batch mode the same way with MassTransit. Thanks In Advance."
Don't know how this works with MassTransit, but with Rebus, you can give the transport more convenient circumstances for optimizing the send operation(s) by using scopes:
using var scope = new RebusTransactionScope();

foreach (var message in lotsOfMessages)
{
    // scope is automagically detected
    await bus.Send(message);
}

await scope.CompleteAsync();
which will improve the rate with which you can send/publish with most transports. Just remember that the scope results in queuing up messages in memory before actually sending them, so you'll probably not want to send millions of messages in each batch.
I hope that answered your questions 🙂

Kafka Listeners stop reading from topics after a few hours

An app I have been working on has started causing issues in our staging and production environments, seemingly because the Kafka listeners stop reading from their assigned topics a few hours after the app starts.
The app runs in a Cloud Foundry environment and has 13 @KafkaListener methods, each reading from multiple topics that match its pattern. Each listener reads from the same number of topics, since every user of the app gets their own topic for each of the 13 listeners, named according to the pattern. Topics have 3 partitions. Auto-scaling is also used, with a minimum of 2 instances of the app running at the same time. One of the topics is under heavier load than the others, receiving between 1 and 200 messages each second. The processing time for each message is short, as we receive batches and the processing only writes the batch to a DB.
The current issue is, as stated, that it works for a while after starting and then the listeners suddenly stop picking up messages, with no apparent error or warning in the logs. A temporary endpoint was created where KafkaListenerEndpointRegistry is used to look at the listener containers, and all of them seem to be running and have proper partitions assigned. Doing a .stop() and .start() on the containers leads to one additional batch of messages being processed, and then nothing else.
The following are the configs used:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(kafkaConfig.getConfiguration());
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setBatchListener(true);
    factory.setConcurrency(3);
    factory.getContainerProperties().setPollTimeout(5000);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
    return factory;
}
The kafkaConfig sets the following settings:
PARTITION_ASSIGNMENT_STRATEGY_CONFIG: RoundRobinAssignor
MAX_POLL_INTERVAL_MS_CONFIG: 60000
MAX_POLL_RECORDS_CONFIG: 10
MAX_PARTITION_FETCH_BYTES_CONFIG: Integer.MAX_VALUE
ENABLE_AUTO_COMMIT_CONFIG: false
METADATA_MAX_AGE_CONFIG: 15000
REQUEST_TIMEOUT_MS_CONFIG: 30000
HEARTBEAT_INTERVAL_MS_CONFIG: 15000
SESSION_TIMEOUT_MS_CONFIG: 60000
Additionally, each listener is in its own class and has the listen method as follows:
@KafkaListener(id="<patternName>-container", topicPattern = "<patternName>.*", groupId = "<patternName>Group")
public void listen(@Payload List<String> payloads,
        @Header(KafkaHeaders.RECEIVED_TOPIC) String topics,
        Acknowledgment acknowledgment) {
    //processPayload...
    acknowledgment.acknowledge();
}
The spring-kafka version is 2.7.4.
Is there anything in this config that could be causing the issue? I have recently tried multiple changes with no success: changing these config settings around, moving the @KafkaListener annotation to class level, restarting the listener containers when they stop reading, and even processing all messages asynchronously and acknowledging them the moment they are picked up by the listener method. There were no error or warning logs, and I wasn't able to see anything helpful in the debug logs because of the number of messages sent each second. We also have another app running the same settings in the same environments, but with only 3 listeners (different topic patterns), where this issue does not occur. It is under a similar load, as the messages received by those 3 listeners are output to the topic that causes the heavy load on the app with the problem.
I would very much appreciate any help or pointers to what else I can do, since this issue is blocking us heavily in our production. Let me know if I missed something that could help.
Thank you.
Most problems like this are due to the listener thread being stuck in user code someplace; take a thread dump when this happens to see what the threads are doing.
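If attaching an external tool such as jstack is awkward in your environment, a thread dump can also be produced from inside the JVM. The following is a minimal sketch using only the JDK; ThreadDumps and dumpAllThreads are hypothetical names, and the assumption is that you would expose the returned text through something like the temporary diagnostic endpoint mentioned in the question.

```java
import java.util.Map;

/** Hypothetical helper that renders the stack traces of all live threads as text. */
final class ThreadDumps {
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Header line: thread name and current state (RUNNABLE, BLOCKED, WAITING, ...)
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

In the output, look for the listener container threads and check whether their top frames are in your own processing code (for example, waiting on a DB call or a lock) rather than in the consumer's poll.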

How to track async transactions with AppDynamics 4.4

Consider the code below:
public class Job {
    private final ExecutorService executorService;

    public Job(ExecutorService executorService) {
        this.executorService = executorService;
    }

    public void process() {
        executorService.submit(() -> {
            // do something slow
        });
    }
}
I could use an AppDynamics "Java POJO" rule to create a business transaction that tracks all calls to the Job.process() method. But the measured response time doesn't reflect the real cost of the work done on the async thread started by java.util.concurrent.ExecutorService. This exact problem is also described in the AppDynamics document End-to-End Latency Performance:
The return of control stops the clock on the transaction in terms of measuring response time, but meanwhile the logical processing for the transaction continues.
The same AppDynamics document tries to give a solution to this issue, but the instructions it provides are not very clear to me.
Could anyone give a more practical guide on how to configure AppD to track async calls like the one shown above?
It seems that you should be able to define your own Asynchronous Transaction Demarcator, as described in: https://docs.appdynamics.com/display/PRO44/Asynchronous+Transaction+Demarcators
The demarcator should point to the last method of the Runnable that you pass to the Executor. Then, according to the documentation, all you need to do is attach the demarcator to your Business Transaction and it will collect the asynchronous call.

ActiveMQ Override scheduled message

I am trying to implement a delayed queue with overriding of messages using ActiveMQ.
Each message is scheduled to be delivered with a delay of x (say 60 seconds).
If the same message is received again in the meantime, it should override the previous message.
So even if I receive, say, 10 messages within x seconds, only one message should be processed.
Is there a clean way to accomplish this?
The question has two parts that need to be addressed separately:
Can a message be delayed in ActiveMQ?
Yes - see Delay and Schedule Message Delivery. You need to set <broker ... schedulerSupport="true"> in your ActiveMQ config, and set the AMQ_SCHEDULED_DELAY property on the JMS message to the delay in milliseconds (60000 for the 60-second delay in your example).
Is there any way to prevent the same message being consumed more than once?
Yes, but that's an application concern rather than an ActiveMQ one. It's often referred to as de-duplication or idempotent consumption. The simplest way, if you only have one consumer, is to keep track of the messages received in a map, and check that map whenever you receive a message. If it has been seen, discard it.
For more complex use cases where you have multiple consumers on different machines, or you want that state to survive application restart, you will need to keep a table of messages seen in a database, and query it each time.
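For the single-consumer case, the in-memory tracking could look roughly like the sketch below. SeenMessages and markIfFirst are hypothetical names, and the sketch assumes that each message carries some application-level id that is identical for duplicates.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory tracker of message ids that have already been processed. */
final class SeenMessages {
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();

    /** Returns true the first time an id is offered and false for every repeat. */
    boolean markIfFirst(String messageId) {
        return seenIds.add(messageId);
    }
}
```

A consumer would call markIfFirst(id) on each delivery and discard the message when it returns false. Note that this keeps the first copy rather than the latest one, that the set grows without bound unless old ids are evicted, and that the state is lost on restart, which is where the database-backed approach comes in.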
Please vote this answer up if it helps, as it encourages people to help you out.
Also, according to the following method from the ActiveMQ BrokerService class, you need to configure persistence (or a job scheduler store) to be able to use the scheduler functionality.
public boolean isSchedulerSupport() {
    return this.schedulerSupport && (isPersistent() || jobSchedulerStore != null);
}
You can configure the ActiveMQ broker to enable "schedulerSupport" with the following entry in the activemq.xml file located in the conf directory of your ActiveMQ home directory.
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.data}" schedulerSupport="true">
You can also override the BrokerService in your configuration:
@Configuration
@EnableJms
public class JMSConfiguration {
    @Bean
    public BrokerService brokerService() throws Exception {
        BrokerService brokerService = new BrokerService();
        brokerService.setSchedulerSupport(true);
        return brokerService;
    }
}

Resources