How to immediately stop processing new messages when inside a message handler? - rebus

I have a Rebus bus setup with a single worker and max parallelism of 1 that processes messages "sequentialy". In case an handler fails, or for specific business reason, I'd like the bus instance to immediately stop processing messages.
I tried using the Rebus.Event package to detect the exception in the AfterMessageHandled handler and set the number of workers to 0, but it seems other messages are processed before it can actually succeed in stoping the single worker instance.
Where in the event processing pipeline could I do
bus.Advanced.Workers.SetNumberOfWorkers(0); in order to prevent further message processing?
I also tried setting the number of workers to 0 inside a catch block in the handler itself, but it doesn't seem like the right place to do it since SetNumberOfWorkers(0) waits for handlers to complete before returning and the caller is the handler... Looks like a some kind of a deadlock to me.
Thank you

This particular situation is a little bit of a dilemma, because – as you've correctly observed – SetNumberOfWorkers is a blocking function, which will wait until the desired number of threads has been reached.
In your case, since you're setting it to zero, it means your message handler needs to finish before the number of threads has reached zero... and then: 💣 ☠🔒
I'm sorry to say this, because I bet your desire to do this is because you're in a pickle somehow – but generally, I must say that wanting to process messages sequentually and in order with message queues is begging for trouble, because there are so many things that can lead to messages being reordered.
But, I think you can solve your problem by installing a transport decorator, which will bypass the real transport when toggled. If the decorator then returns null from the Receive method, it will trigger Rebus' built-in back-off strategy and start chilling (i.e. it will increase the waiting time between polling the transport).
Check this out – first, let's create a simple, thread-safe toggle:
public class MessageHandlingToggle
{
public volatile bool ProcessMessages = true;
}
(which you'll probably want to wrap up and make pretty somehow, but this should do for now)
and then we'll register it as a singleton in the container (assuming Microsoft DI here):
services.AddSingleton(new MessageHandlingToggle());
We'll use the ProcessMessages flag to signal whether message processing should be enabled.
Now, when you configure Rebus, you decorate the transport and give the decorator access to the toggle instance in the container:
services.AddRebus((configure, provider) =>
configure
.Transport(t => {
t.Use(...);
// install transport decorator here
t.Decorate(c => {
var transport = c.Get<ITransport>();
var toggle = provider.GetRequiredService<MessageHandlingToggle>();
return new MessageHandlingToggleTransportDecorator(transport, toggle);
})
})
.(...)
);
So, now you'll just need to build the decorator:
public class MessageHandlingToggleTransportDecorator : ITransport
{
static readonly Task<TransportMessage> NoMessage = Task.FromResult(null);
readonly ITransport _transport;
readonly MessageHandlingToggle _toggle;
public MessageHandlingToggleTransportDecorator(ITransport transport, MessageHandlingToggle toggle)
{
_transport = transport;
_toggle = toggle;
}
public string Address => _transport.Address;
public void CreateQueue(string address) => _transport.CreateQueue(address);
public Task Send(string destinationAddress, TransportMessage message, ITransactionContext context)
=> _transport.Send(destinationAddress, message, context);
public Task<TransportMessage> Receive(ITransactionContext context, CancellationToken cancellationToken)
=> _toggle.ProcessMessages
? _transport.Receive(context, cancellationToken)
: NoMessage;
}
As you can see, it'll just return null when ProcessMessages == false. Only thing left is to decide when to resume processing messages again, pull MessageHandlingToggle from the container somehow (probably by having it injected), and then flick the bool back to true.
I hope can work for you, or at least give you some inspiration on how you can solve your problem. 🙂

Related

Spring Kafka - "ErrorHandler threw an exception" and lost some records

Having Consumer polling 2 records at a time, i.e.:
#Bean
ConsumerFactory<String, String> consumerFactory() {
Map<String, Object> config = Map.of(
BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
GROUP_ID_CONFIG, "my-consumers",
AUTO_OFFSET_RESET_CONFIG, "earliest",
MAX_POLL_RECORDS_CONFIG, 2);
return new DefaultKafkaConsumerFactory<>(config, new StringDeserializer(), new StringDeserializer());
}
and ErrorHandler which can fail handling faulty record:
class MyListenerErrorHandler implements ContainerAwareErrorHandler {
#Override
public void handle(Exception thrownException,
List<ConsumerRecord<?, ?>> records,
Consumer<?, ?> consumer,
MessageListenerContainer container) {
simulateBugInErrorHandling(records.get(0));
skipFailedRecord(); // seek offset+1, which never happens
}
private void simulateBugInErrorHandling(ConsumerRecord<?, ?> record) {
throw new NullPointerException(
"DB transaction failed when saving info about failure on offset = " + record.offset());
}
}
Then such scenario is possible:
Topic gets 3 records
Consumer polls 2 records at a time
MessageListener fails to process the first record due to faulty payload
ErrorHandler fails to process the failure and itself throws an exception, e.g. due to some temporary issue
Third record gets processed
Second record is never processed (never enters MessageListener)
How to ensure no record is left unprocessed when ErrorHandler throws an exception with above scenario?
My goal is to achieve stateful retry logic with delays, but for brevity I omitted code responsible for tracking failed records and delaying retry.
I'd expect that after ErrorHandler throws an exception, skipping an entire batch of records should not happen. But it does.
Is it correct behavior?
Should I rather deal with commits manually that use Spring/Kafka defaults?
Should I use different ErrorHandler or handle method? (I need an access to Container to make a pause() for delayed retry logic; cannot use Thread.sleep())
Somehow related issue: https://github.com/spring-projects/spring-kafka/issues/1265
Full code: https://github.com/ptomaszek/spring-kafka-error-handler
The consumer has to be re-positioned (using seeks) in order to re-fetch the records after the failed one.
Use a DefaultErrorHandler (2.8.x and later) or a SeekToCurrentErrorHandler with earlier versions.
You can add retry options and a recoverer to deal with the failed record; by default it is just logged.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#default-eh
https://docs.spring.io/spring-kafka/docs/2.7.x/reference/html/#seek-to-current
You need to do the seeks first (or in a finally block), before any exceptions can be thrown; the container does not commit the offset if the error handler throws an exception.
Kafka maintains 2 offsets - the current committed offset and the current position (set to the committed offset when the consumer starts). The next poll always returns the next record after the last poll. unless a seek is performed.
The default error handlers catch any exceptions thrown by the recoverer and makes sure that the current (and subsequent) records will be returned by the next poll. See SeekUtils.doSeeks().

when using #StreamListener, customization to KafkaListenerContainerFactory are getting reflected in generated KafkaMessageListenerContainer?

I am using spring-cloud-stream with kafka binder to consume message from kafka . The application is basically consuming messages from kafka and updating a database.
There are scenarios when DB is down (which might last for hours) or some other temporary technical issues. Since in these scenarios there is no point in retrying a message for a limited amount of time and then move it to DLQ , i am trying to achieve infinite number of retries when we are getting certain type of exceptions (e.g. DBHostNotAvaialableException)
In order to achieve this i tried 2 approaches (facing issues in both the approaches) -
In this approach, Tried setting an errorhandler on container properties while configuring ConcurrentKafkaListenerContainerFactory bean but the error handler is not getting triggered at all. While debugging the flow i realized in the KafkaMessageListenerContainer that are created have the errorHandler field is null hence they use the default LoggingErrorHandler. Below are my container factory bean configurations -
the #StreamListener method for this approach is the same as 2nd approach except for the seek on consumer.
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Object>
kafkaListenerContainerFactory(ConsumerFactory<String, Object> kafkaConsumerFactory) {
ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory();
factory.setConsumerFactory(kafkaConsumerFactory);
factory.getContainerProperties().setAckOnError(false);
ContainerProperties containerProperties = factory.getContainerProperties();
// even tried a custom implementation of RemainingRecordsErrorHandler but call never went in to the implementation
factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
return factory;
}
Am i missing something while configuring factory bean or this bean is only relevant for #KafkaListener and not #StreamListener??
The second alternative was trying to achieve it using manual acknowledgement and seek, Inside a #StreamListener method getting Acknowledgment and Consumer from headers, in case a retryable exception is received, I do certain number of retries using retrytemplate and when those are exhausted I trigger a consumer.seek() . Example code below -
#StreamListener(MySink.INPUT)
public void processInput(Message<String> msg) {
MessageHeaders msgHeaders = msg.getHeaders();
Acknowledgment ack = msgHeaders.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
Consumer<?,?> consumer = msgHeaders.get(KafkaHeaders.CONSUMER, Consumer.class);
Integer partition = msgHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class);
String topicName = msgHeaders.get(KafkaHeaders.RECEIVED_TOPIC, String.class);
Long offset = msgHeaders.get(KafkaHeaders.OFFSET, Long.class);
try {
retryTemplate.execute(
context -> {
// this is a sample service call to update database which might throw retryable exceptions like DBHostNotAvaialableException
consumeMessage(msg.getPayload());
return null;
}
);
}
catch (DBHostNotAvaialableException ex) {
// once retries as per retrytemplate are exhausted do a seek
consumer.seek(new TopicPartition(topicName, partition), offset);
}
catch (Exception ex) {
// if some other exception just log and put in dlq based on enableDlq property
logger.warn("some other business exception hence putting in dlq ");
throw ex;
}
if (ack != null) {
ack.acknowledge();
}
}
Problem with this approach - since I am doing consumer.seek() while there might be pending records from last poll those might be processed and committed if DB comes up during that period(hence out of order). Is there a way to clear those records while a seek is performed?
PS - we are currently in 2.0.3.RELEASE version of spring boot and Finchley.RELEASE or spring cloud dependencies (hence cannot use features like negative acknowledgement either and upgrade is not possible at this moment).
Spring Cloud Stream does not use a container factory. I already explained that to you in this answer.
Version 2.1 introduced the ListenerContainerCustomizer and if you add a bean of that type it will be called after the container is created.
Spring Boot 2.0 went end-of-life over a year ago and is no longer supported.
The answer I referred you shows how you can use reflection to add an error handler.
Doing the seek in the listener will only work if you have max.poll.records=1.

Cancelling a LoadAsync operation after a timeout period and recalling LoadAsync afterwards throws exception

I am working with SerialDevice on C++/winrt and need to listen for data coming over the port. I can successfully work with SerialDevice when data is streaming over the port but if nothing is read the DataReader.LoadAsync() function hangs even though I set timeouts through SerialDevice.ReadTimeout() and SerialDevice.WriteTimeout(). So to cancel the operation I am using IAsyncOperation's wait_for() operation which times out after a provided interval and I call IAsyncOperation's Cancel() and Close(). The problem is I can no longer make another call to DataReader.LoadAsync() without getting a take_ownership_from_abi exception. How can I properly cancel a DataReader.LoadAsync() call to allow subsequent calls to LoadAsync() on the same object?
To work around this, I tried setting the timeouts of SerialDevice but it didn't affect the DataRead.LoadAsync() calls. I also tried using create_task with a cancellation token which also didn't allow for an additional call to LoadAsync(). It took a lot of searching to find this article by Kenny Kerr:
https://kennykerr.ca/2019/06/10/cppwinrt-async-timeouts-made-easy/
where he describes the use of the IAsyncOperation's wait_for function.
Here is the initialization of the SerialDevice and DataReader:
DeviceInformation deviceInfo = devices.GetAt(0);
m_serialDevice = SerialDevice::FromIdAsync(deviceInfo.Id()).get();
m_serialDevice.BaudRate(nBaudRate);
m_serialDevice.DataBits(8);
m_serialDevice.StopBits(SerialStopBitCount::One);
m_serialDevice.Parity(SerialParity::None);
m_serialDevice.ReadTimeout(m_ts);
m_serialDevice.WriteTimeout(m_ts);
m_dataWriter = DataWriter(m_serialDevice.OutputStream());
m_dataReader = DataReader(m_serialDevice.InputStream());
Here is the LoadAsync call:
AsyncStatus iainfo;
auto async = m_dataReader.LoadAsync(STREAM_SIZE);
try {
auto iainfo = async.wait_for(m_ts);
}
catch (...) {};
if (iainfo != AsyncStatus::Completed)
{
async.Cancel();
async.Close();
return 0;
}
else
{
nBytesRead = async.get();
async.Close();
}
So in the case that the AsyncStatus is not Completed, the IAsyncOperation Cancel() and Close() are called which according to the documentation should cancel the Async call but now on subsequent LoadAsync calls I get a take_ownership_from_abi exception.
Anyone have a clue what I'm doing wrong? Why do the SerialDevice timeouts not work in the first place? Is there a better way to cancel the Async call that would allow for further calls without re-initializing DataReader? Generally, it feels like there is very little activity in the C++/winrt space and the documentation is severely lacking (didn't even find the wait_for method until about a day of trying other stuff and randomly searching for clues through different posts) - is there something I'm missing or is this really the case?
Thanks!
Cause: When the wait time is over, the async object is in the AsyncStatus::Started state. It means that the async object is still running.
Solution: When you use close() method, you could use Sleep(m_nTO) let asynchronous operation have enough time to close. Refer the following code.
if (iainfo != AsyncStatus::Completed)
{
m_nReadCount++;
//Sleep(m_nTO);
async.Cancel();
async.Close();
Sleep(m_nTO);
return 0;
}

Solution for asynchronous notification upon future completion in GridGain needed

We are evaluating Grid Gain 6.5.5 at the moment as a potential solution for distribution of compute jobs over a grid.
The problem we are facing at the moment is a lack of a suitable asynchronous notification mechanism that will notify the sender asynchronously upon job completion (or future completion).
The prototype architecture is relatively simple and the core issue is presented in the pseudo code below (the full code cannot be published due to an NDA). *** Important - the code represents only the "problem", the possible solution in question is described in the text at the bottom together with the question.
//will be used as an entry point to the grid for each client that will submit jobs to the grid
public class GridClient{
//client node for submission that will be reused
private static Grid gNode = GridGain.start("config xml file goes here");
//provides the functionality of submitting multiple jobs to the grid for calculation
public int sendJobs2Grid(GridJob[] jobs){
Collection<GridCallable<GridJobOutput>> calls = new ArrayList<>();
for (final GridJob job : jobs) {
calls.add(new GridCallable<GridJobOutput>() {
#Override public GridJobOutput call() throws Exception {
GridJobOutput result = job.process();
return result;
}
});
}
GridFuture<Collection<GridJobOutput>> fut = this.gNode.compute().call(calls);
fut.listenAsync(new GridInClosure<GridFuture<Collection<GridJobOutput>>>(){
#Override public void apply(GridFuture<Collection<GridJobOutput>> jobsOutputCollection) {
Collection<GridJobOutput> jobsOutput;
try {
jobsOutput = jobsOutputCollection.get();
for(GridJobOutput currResult: jobsOutput){
//do something with the current job output BUT CANNOT call jobFinished(GridJobOutput out) method
//of sendJobs2Grid class here
}
} catch (GridException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});
return calls.size();
}
//This function should be invoked asynchronously when the GridFuture is
//will invoke some processing/aggregation of the result for each submitted job
public void jobFinished(GridJobOutput out) {}
}
}
//represents a job type that is to be submitted to the grid
public class GridJob{
public GridJobOutput process(){}
}
Description:
The idea is that a GridClient instance will be used to in order to submit a list/array of jobs to the grid, notify the sender how many jobs were submitted and when the jobs are finished (asynchronously) is will perform some processing of the results. For the results processing part the "GridClient.jobFinished(GridJobOutput out)" method should be invoked.
Now getting to question at hand, we are aware of the GridInClosure interface that can be used with "GridFuture.listenAsync(GridInClosure> lsnr)"
in order to register a future listener.
The problem (if my understanding is correct) is that it is a good and pretty straightforward solution in case the result of the future is to be "processed" by code that is within the scope of the given GridInClosure. In our case we need to use the "GridClient.jobFinished(GridJobOutput out)" which is out of the scope.
Due to the fact that GridInClosure has a single argument R and it has to be of the same type as of GridFuture result it seems impossible to use this approach in a straightforward manner.
If I got it right till now then in order to use "GridFuture.listenAsync(..)" aproach the following has to be done:
GridClient will have to implement an interface granting access to the "jobFinished(..)" method let's name it GridJobFinishedListener.
GridJob will have to be "wrapped" in new class in order to have an additional property of GridJobFinishedListener type.
GridJobOutput will have to be "wrapped" in new class in order to have an addtional property of GridJobFinishedListener type.
When the GridJob will be done in addition to the "standard" result GridJobOutput will contain the corresponding GridJobFinishedListener reference.
Given the above modifications now GridInClosure can be used now and in the apply(GridJobOutput) method it will be possible to call the GridClient.jobFinished(GridJobOutput out) method through the GridJobFinishedListener interface.
So if till now I got it all right it seems a bit clumsy work around so I hope I have missed something and there is a much better way to handle this relatively simple case of asynchronous call back.
Looking forward to any helpful feedback, thanks a lot in advance.
Your code looks correct and I don't see any problems in calling jobFinished method from the future listener closure. You declared it as an anonymous class which always has a reference to the external class (GridClient in your case), therefore you have access to all variables and methods of GridClient instance.

How to rewrite synchronous controller to be asynchronous in Play?

I'm using Play framework 2.2 for one of my upcoming web application. I have implemented my controllers in synchronous pattern, with several blocking calls (mainly, database).
For example,
Synchronous version:
public static Result index(){
User user = db.getUser(email); // blocking
User anotherUser = db.getUser(emailTwo); // blocking
...
user.sendEmail(); // call to a webservice, blocking.
return ok();
}
So, while optimising the code, decided to make use of Asynchronous programming support of Play. Gone through the documentation, but the idea is still vague to me, as I'm confused about how to properly convert the above synchronous block of code to Async.
So, I came up with below code:
Asynchronous version:
public static Promise<Result> index(){
return Promise.promise(
new Function0<Result>(){
public Result apply(){
User user = db.getUser(email); // blocking
User anotherUser = db.getUser(emailTwo); // blocking
...
user.sendEmail(); // call to a webservice, blocking.
return ok();
}
}
);
}
So, I just wrapped the entire control logic inside a promise block.
Is my approach correct?
Should I convert each and every blocking request inside the controller, as Asynchronous, or wrapping several blocking calls inside single Async block is enough?
The play framework is asynchronous by nature and it allows the creation of fully non-blocking code. But in order to be non-blocking - with all its benefits - you can't just wrap your blocking code and expect magic to happen...
In an ideal scenario, your complete application is written in a non-blocking manner. If this is not possible (for whatever reason), you might want to abstract your blocking code in Akka actors or behind async interfaces which return scala.concurrent.Future's. This way you can execute your blocking code (simultaneously) in a dedicated Execution Context, without impacting other actions. After all, having all your actions share the same ExecutionContext means they share the same Thread pool. So an Action that blocks Threads might drastically impact other Actions doing pure CPU while having CPU not fully utilized!
In your case, you probably want to start at the lowest level. It looks like the database calls are blocking so start by refactoring these first. You either need to find an asynchronous driver for whatever database you are using or if there is only a blocking driver available, you should wrap them in a future to execute using a DB-specific execution context (with a ThreadPool that's the same size as the DB ConnectionPool).
Another advantage of abstracting the DB calls behind an async interface is that, if at some point in the future, you switch to a non-blocking driver, you can just change the implementation of your interface without having to change your controllers!
In your re-active controller, you can then handle these futures and work with them (when they complete). You can find more about working with Futures here
Here's a simplified example of your controller method doing non-blocking calls, and then combining the results in your view, while sending an email asynchronous:
public static Promise<Result> index(){
scala.concurrent.Future<User> user = db.getUser(email); // non-blocking
scala.concurrent.Future<User> anotherUser = db.getUser(emailTwo); // non-blocking
List<scala.concurrent.Future<User>> listOfUserFutures = new ArrayList<>();
listOfUserFutures.add(user);
listOfUserFutures.add(anotherUser);
final ExecutionContext dbExecutionContext = Akka.system().dispatchers().lookup("dbExecutionContext");
scala.concurrent.Future<Iterable<User>> futureListOfUsers = akka.dispatch.Futures.sequence(listOfUserFutures, dbExecutionContext);
final ExecutionContext mailExecutionContext = Akka.system().dispatchers().lookup("mailExecutionContext");
user.andThen(new OnComplete<User>() {
public void onComplete(Throwable failure, User user) {
user.sendEmail(); // call to a webservice, non-blocking.
}
}, mailExecutionContext);
return Promise.wrap(futureListOfUsers.flatMap(new Mapper<Iterable<User>, Future<Result>>() {
public Future<Result> apply(final Iterable<User> users) {
return Futures.future(new Callable<Result>() {
public Result call() {
return ok(...);
}
}, Akka.system().dispatcher());
}
}, ec));
}
It you don't have anything to not block on then there may not be a reason to make your controller async. Here is a good blog about this from one of the creators of Play: http://sadache.tumblr.com/post/42351000773/async-reactive-nonblocking-threads-futures-executioncont

Resources