Redis streams - free stuck messages in a consumer group without claiming - .net-core

Let's say there are messages in a Redis consumer group that have not been processed for N seconds. I am trying to understand whether it's possible to free them and put them back for other members of the consumer group to see. I don't want to claim or process these stuck messages. I just want to make them accessible to other active members of the consumer group. Is this possible?
From what I have understood from the documentation, the options are XAUTOCLAIM or a combination of XPENDING and XCLAIM, and neither of these meets my requirements.
Essentially, I am trying to create a standalone process that can act as a monitor and make those messages visible to active consumers in the consumer group. I am planning to use this standalone process to do the same for multiple consumer groups (around 30), so I don't want it to take any other actions.
Please suggest how this can be designed.
Thanks!

Pending messages are removed from the Redis PEL (pending entries list) only when they are acknowledged. This is by design: it lets each individual consumer take part in message redistribution, so the process scales, and it avoids the single point of failure that a single monitoring process like the one you described would introduce.
So, in short, what you are looking for can't be done. I would suggest using XAUTOCLAIM, or XPENDING with XCLAIM, in your consumer processes instead.
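As a rough illustration of the XAUTOCLAIM suggestion above, here is a minimal Python sketch of a helper that an active consumer could run alongside its normal read loop. The function name `reclaim_stale` and its parameters are hypothetical. `client` is assumed to behave like a redis-py client whose `xautoclaim` returns the next cursor followed by the claimed entries. The question is about .NET, so treat this as a sketch of the flow rather than the code you would ship.

```python
def reclaim_stale(client, stream, group, consumer, min_idle_ms, batch=100):
    """Claim entries idle for at least min_idle_ms on behalf of `consumer`.

    Assumption: client.xautoclaim(name, groupname, consumername,
    min_idle_time=..., start_id=..., count=...) returns a sequence whose
    first item is the next cursor and whose second item is a list of
    (entry_id, fields) pairs, as redis-py does.
    """
    claimed = []
    cursor = "0-0"  # start scanning the pending entries list from the beginning
    while True:
        reply = client.xautoclaim(
            stream, group, consumer,
            min_idle_time=min_idle_ms, start_id=cursor, count=batch,
        )
        next_id, entries = reply[0], reply[1]
        claimed.extend(entries)
        if isinstance(next_id, bytes):
            next_id = next_id.decode()
        if next_id == "0-0":  # Redis reports 0-0 once the scan is complete
            return claimed
        cursor = next_id
```

Each entry returned here now belongs to the calling consumer, so that consumer would process it and then acknowledge it with XACK. That is the redistribution the answer describes: the claiming consumer does the work, not a separate monitor.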

Related

Event-sourcing: when (and not) should I use Message Queue?

I am building a project from scratch using event-sourcing with Java and Cassandra.
My apps will be based on microservices, and in some use cases information will be processed asynchronously. I was wondering what part a message queue (such as RabbitMQ, ActiveMQ Artemis, Kafka, etc.) would play in improving the technology stack in this environment, and whether I understand the scenarios where I shouldn't use one.
I would start with separating messaging infrastructure like RabbitMQ from event streaming/storing/processing like Kafka. These are two different things made for two (or more) different purposes.
Concerning the event sourcing, you have to have a place where you must store events. This storage must be append-only and support fast reads of unstructured data based on an identity. One example of such persistence is the EventStore.
Event sourcing goes together with CQRS, which means you have to project your changes (events) to another store, which you can query. Projection is where events get processed to change the domain object state in that store. It is important to understand that using messaging infrastructure for projections is generally a bad idea. This is due to the nature of messaging and the two-phase commit issue.
If you look at how events get persisted, you can see that they get saved to the store as one transaction. If you then need to publish events, this will be another transaction. Since you are dealing with two different pieces of infrastructure, things can get broken.
The messaging issue as such is that messages are usually guaranteed to be delivered "at least once" and the order of messages is usually not guaranteed. Also, when your message consumer fails and NACKs the message, it will be redelivered but usually a bit later, again breaking the sequence.
The ordering and duplication concerns, however, do not apply to event streaming servers like Kafka. Also, the EventStore will guarantee once-only, in-order event delivery if you use a catch-up subscription.
In my experience, messages are used to send commands and to implement event-driven architecture to connect independent services in a reactive way. Event stores, on the other hand, are used to persist events, and only events that get there are then projected to the query store and also published to the message bus.
Make sure you are clear on the distinction between send(command) and publish(event). Udi Dahan touches on that topic in his essay on busses and brokers.
In most cases where you are event sourcing, you do not want to be reconstructing state from published events. If you need state, then query the technical authority/book of record for the history, and reconstruct the state from the history.
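To make "reconstruct the state from the history" concrete, here is a small Python sketch. Everything in it is illustrative: the in-memory store stands in for a real event store, and the `Deposited` and `Withdrawn` event kinds are made-up names. The point is only that state is a fold over the stream's events as read from the book of record, not something assembled from whatever arrived over a bus.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    stream_id: str
    kind: str    # hypothetical event names: "Deposited", "Withdrawn"
    amount: int


class InMemoryEventStore:
    """Append-only log keyed by stream identity (stand-in for a real event store)."""

    def __init__(self):
        self._streams = {}

    def append(self, event):
        self._streams.setdefault(event.stream_id, []).append(event)

    def history(self, stream_id):
        # Return a copy so callers cannot mutate the stored log.
        return list(self._streams.get(stream_id, []))


def apply(balance, event):
    """Pure transition function: previous state + event -> next state."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    return balance  # unknown event kinds leave the state unchanged


def current_balance(store, stream_id):
    # State is rebuilt by folding the full history, starting from zero.
    return reduce(apply, store.history(stream_id), 0)
```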
On the other hand, event driven activity off of a message queue should be fine. When a single event (plus the subscriber's state) has everything you need, then running off of the bus is fine.
In some cases, you might do both. For example, if you were updating cached views, you'd subscribe to various BobChanged events to know when your cached data was stale; to rebuild a stale view, you would reload a representation of the history and transform it into an updated view.
In the world of event-sourcing applications, message queues usually allow you to implement a publish-subscribe style of communication between producers and consumers. They also usually help you with delivery guarantees: which messages were delivered to which subscribers and which ones were not.
But they don't store all messages indefinitely. You need to have an event store to do any kind of event sourcing.
The question is not 'to queue or not to queue'; it is more like:
Can this thing store a huge volume of events indefinitely?
Does it have publish-subscribe capabilities?
Does it provide at-least-once delivery guarantees?
So, you should use something like Kafka or EventStore to have all that out of the box. Alternatively, you can combine an event store with a message queue manually, but this is going to be more involved.

Kafka consumer synchronization behavior

I am currently exploring Kafka as a beginner for a simple problem.
There will be one producer pushing messages to one topic, but there will be n consumers (Spark applications) that massage the data from Kafka and insert it into a database (each consumer inserts into a different table).
Is there a possibility that the consumers will go out of sync (for example, some of the consumers go down for quite some time), so that one or more consumers will not process a message and insert it into their table?
Assume the code is always correct and no exception will arise when massaging the data. It is important that every message is processed only once.
My question is: does Kafka handle this part for us, or do we have to write some other code to make sure this does not happen?
You can group consumers (see the group.id config); grouped consumers split the topic's partitions among themselves. Once a consumer drops, another consumer from the group will take over the partitions read by the dropped one.
However, there may be some problems: when a consumer reads a partition, it commits its offset back to Kafka. If the consumer drops after it has processed the received data but before it commits the offset, the consumer that takes over will start reading from the last committed offset and process that data again. Fortunately, you can control how offsets are committed (see the consumer settings enable.auto.commit, auto.offset.reset, etc.).
The Kafka and Spark Streaming guide provides some explanations and possible strategies for managing offsets.
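The commit-timing problem described above can be seen in a toy simulation. The Python sketch below does not use any Kafka client API. `Partition` and `consume` are hypothetical stand-ins that model only a record log, a committed offset, and a consumer that commits after processing each record. It shows why a crash between processing and committing leads to a record being handled twice, and why an insert keyed by offset (an upsert) hides the duplicate.

```python
class Partition:
    """Toy stand-in for one partition plus the group's committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # next offset a consumer of this group resumes from


def consume(partition, sink, crash_before_commit_at=None):
    """Process records from the committed offset, committing after each one.

    If crash_before_commit_at is given, simulate a crash right after that
    offset was processed but before its commit. Returns False on a crash.
    """
    offset = partition.committed
    while offset < len(partition.records):
        sink(offset, partition.records[offset])  # e.g. insert into a table
        if offset == crash_before_commit_at:
            return False  # crashed: the commit for this offset never happened
        partition.committed = offset + 1
        offset += 1
    return True


# Naive sink: appends blindly, so the replayed record shows up twice.
rows = []
p = Partition(["a", "b", "c"])
consume(p, lambda off, rec: rows.append(rec), crash_before_commit_at=1)
consume(p, lambda off, rec: rows.append(rec))  # a second consumer takes over

# Idempotent sink: keyed by offset, so the replay overwrites the same row.
table = {}
p2 = Partition(["a", "b", "c"])
consume(p2, table.__setitem__, crash_before_commit_at=1)
consume(p2, table.__setitem__)
```

With this commit-after-processing order the delivery is at-least-once, so "processed only once" has to come from making the database write idempotent or from storing the offset in the same transaction as the data.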
By design, Kafka decouples the producer and the consumer. Consumers will read as fast as they can, and producers can produce as fast as they can.
Consumers can be organized into "consumer groups". You can set things up so that multiple consumers read as part of a single group, or so that an individual consumer reads from its own group.
If you have one consumer per group, you should (depending on your acknowledgement strategy) be able to ensure each message is read only once per consumer.
Otherwise, if you want multiple consumers reading in a single group, the same applies, but each message is read once by one of the n consumers.

Ensure In Process Records are Unique ActiveMQ

I'm working on a system where clients enter data into a program and the save action posts a message to ActiveMQ for more time-intensive processing.
We are running into rare occasions where a record is updated by a client twice in a row and the consumers on that ActiveMQ queue process the two updates at the same time. I'm looking for a way to ensure that messages containing records with the same identity are processed in order and only one at a time. To be clear, if records with IDs 1, 1, and 2 (in that order) are sent to ActiveMQ, the first 1 would process, then 2 (if the first 1 was still in process), and finally the second 1.
Another requirement (due to volume) is that the consumer be multi-threaded, so there may be 16 threads accessing that queue. This would have to be taken into consideration.
So if you have multiple threads reading that queue and you want the solution to stay close to ActiveMQ, you have to think about how scaling interacts with ordering.
If you have multiple consumers, they may operate at different speeds, and you can never be sure which consumer finishes before the other. The only way to guarantee order on a single queue is to have a single consumer (you can still achieve high availability by using exclusive consumers).
You can, however, segment the load in other ways. How depends a lot on your application. If you can create, say, 16 "worker" queues (or whatever your maximum consumer count would be) and distribute load to these queues while guaranteeing that requests from a single user always go to the same "worker" queue, message order will be preserved per user.
If you have no good way to divide users into groups, simply take the userID mod MAX_CONSUMER_THREADS as a simple solution.
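The modulo routing described above can be sketched without a broker. In the Python example below, in-process queues stand in for the ActiveMQ "worker" queues, and `run_partitioned` is a hypothetical helper name. Each queue has exactly one worker thread, so all messages for a given ID are handled by the same thread in arrival order, while different IDs still run in parallel. Integer IDs are assumed; for string IDs you would need a hash that is stable across processes.

```python
import queue
import threading

MAX_CONSUMER_THREADS = 4  # the question mentions 16; kept small for the sketch


def run_partitioned(messages, handle, workers=MAX_CONSUMER_THREADS):
    """Route each (record_id, payload) to worker record_id % workers."""
    queues = [queue.Queue() for _ in range(workers)]

    def worker(q):
        # One thread per queue: messages in a queue are handled sequentially.
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                return
            handle(*item)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for record_id, payload in messages:
        # Same ID always maps to the same queue, preserving per-ID order.
        queues[record_id % workers].put((record_id, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```

The trade-off is that one hot ID can back up its queue while other workers sit idle, which is the price of keeping that ID's updates strictly sequential.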
There may be better ways to deal with this problem in the consumer logic itself, such as keeping track of a sequence number and postponing updates that arrive out of order (a scheduled delay can be used for that).
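A minimal Python sketch of the sequence-number idea follows. It assumes the producer stamps every update for a record with a per-record sequence number starting at 1, which the question does not state. Instead of the scheduled redelivery the answer mentions, it simply holds early arrivals in memory until their predecessors show up. `SequencedApplier` is a hypothetical name.

```python
class SequencedApplier:
    """Apply updates per record in sequence order, holding back early arrivals."""

    def __init__(self, apply):
        self._apply = apply  # callable(record_id, seq, payload)
        self._next = {}      # record_id -> next expected sequence number
        self._held = {}      # record_id -> {seq: payload} waiting for predecessors

    def receive(self, record_id, seq, payload):
        expected = self._next.get(record_id, 1)
        if seq < expected:
            return  # duplicate or stale update: already applied, ignore it
        held = self._held.setdefault(record_id, {})
        held[seq] = payload
        # Drain every update that is now contiguous with what was applied.
        while expected in held:
            self._apply(record_id, expected, held.pop(expected))
            expected += 1
        self._next[record_id] = expected
```

This sketch is not thread-safe. With 16 consumer threads it would need a lock per record, or it could be combined with per-ID routing so that each record is only ever touched by one thread.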

Event Driven Architecture - Service Contract Design

I'm having difficulty conceptualising a requirement I have into something that will fit into our nascent SOA/EDA.
We have a component I'll call the Data Downloader. This is a facade for an external data provider that has both high latency and a cost associated with every request. I want to take this component and create a re-usable service out of it with a clear contract definition. It is up to me to decide how that contract should work, however its responsibilities are two-fold:
Maintain the parameter list (called a Download Definition) for an upcoming scheduled download
Manage the technical details of the communication to the external service
Basically, it manages the 'how' of the communication. The 'what' and the 'when' are the responsibilities of two other components:
The 'what' is managed by 'Clients' who are responsible for
determining the parameters for the download.
The 'when' is managed by a dedicated scheduling component. Because of the cost associated with the downloads we'd like to batch the requests intraday.
Hopefully this sequence diagram explains the responsibilities of the services:
Because each of the responsibilities are split out in three different components, we get all sorts of potential race conditions with async messaging. For instance when the Scheduler tells the Downloader to do its work, because the 'Append to Download Definition' command is asynchronous, there is no guarantee that the pending requests from Client A have actually been serviced. But this all screams high-coupling to me; why should the Scheduler necessarily know about any 'prerequisite' client requests that need to have been actioned before it can invoke a download?
Some potential solutions we've toyed with:
Make the 'Append to Download Definition' command a blocking request/response operation. But this then breaks the performance and scalability benefits of having an EDA.
Build something in the Downloader to ensure that it only runs when there are no pending commands in its incoming request queue. But that then introduces a dependency on the underlying messaging infrastructure which I don't like either.
Makes me think I'm thinking about this problem in a completely backward way. Or is this just a classic case of someone trying to fit a synchronous RPC requirement into an async event-driven architecture?
The thing I like most about EDA and SOA is that it almost completely eliminates the notion of race conditions. As long as your events carry some association key (e.g. downloadId), the problem you describe can be addressed with several solutions of different complexities, depending on your needs. I'm not sure I totally understand the described use case, but I will try my best.
Off the top of my head:
DataDownloader maintains a list of received Download Definitions and a list of triggered downloads. When a definition is received, it is checked against the triggers list to see whether the associated download has already been triggered; if it has, the download is executed. When a TriggerDownloadCommand is received, the definitions list is checked for a definition with the associated downloadId; if one is found, the download is executed.
For more complex situations, consider using the Saga pattern, which is implemented by some third-party messaging infrastructures. With some simple configuration, it will handle both messages and initiate the actual download when the required condition is satisfied. This is more appropriate for distributed systems, where an in-memory collection is out of the question.
You can also configure your scheduler (or the trigger command handler) to retry when an error is signaled (e.g. by an exception), in order to avoid that race condition, and ultimately give up after a specified timeout.
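The first option in the list above (two lists correlated by downloadId) might look roughly like the Python sketch below. The class and method names are hypothetical. It keeps everything in memory and assumes a single definition per downloadId, whereas the question describes definitions that are appended to over time. The idea is that whichever message arrives second is the one that starts the download, so arrival order no longer matters.

```python
class DataDownloader:
    """Start a download once both its definition and its trigger have arrived."""

    def __init__(self, download):
        self._download = download  # callable(download_id, definition)
        self._definitions = {}     # download_id -> definition awaiting a trigger
        self._triggers = set()     # download_ids triggered before their definition

    def on_definition(self, download_id, definition):
        if download_id in self._triggers:
            # The trigger got here first; the definition completes the pair.
            self._triggers.discard(download_id)
            self._download(download_id, definition)
        else:
            self._definitions[download_id] = definition

    def on_trigger(self, download_id):
        if download_id in self._definitions:
            # The definition got here first; the trigger completes the pair.
            self._download(download_id, self._definitions.pop(download_id))
        else:
            self._triggers.add(download_id)
```

A saga implementation does the same bookkeeping, but persists the two pieces of state so that it survives restarts and works across several instances.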
Does this help?

NServiceBus, when are too many messages used?

When considering a service in NServiceBus at what point do you start questioning how many messages handled by a service is too much and start to break these into a new service?
Consider the following: I have a sales service which can currently be broken into a few distinct business components, these are sales order validation, sales order processing, purchase order validation and purchase order processing.
There are currently about 20 message handlers and 2 sagas used within this service. My concern is that during high-volume traffic from my website, an initial spike can push the number of messages into the hundreds. Considering that the messages need to be processed in the order they are taken off the queue, this can cause a delay for the last in the queue (depending on what processing each message does).
When separating concerns within a service into smaller business components, I find this makes things a little easier. Sure, it's a logical separation, but it seems to provide a layer of clarity and understanding. To me it seems an easier option than creating new services, where in the end the more services I have, the more maintenance I need to do.
Does anyone have any similar concerns to this?
I think you have actually answered your own question :)
As soon as the message volume reaches a point where the lag becomes an issue you could look to instance your endpoint. You do not necessarily need to reduce the number of handlers. You could simply install the service a number of times and have specific message types sent to the relevant endpoint by mapping.
So it becomes a matter of a simple instance installation and some config changes. So you can then either split messages on sending so that messages from a particular source end up on a particular endpoint (maybe priority) or on message type.
I happened to do the same thing on a previous project (not using NServiceBus though) where we needed document conversion messages coming from the UI to be processed ASAP. We simply installed the conversion service again with its own set of queues and changed the UI configuration to send the conversion messages to the new endpoint. The background conversion messages were still going to the previous endpoint. So here the source determined the separation.
