Best consumption model for multiple services consuming Kafka topics - .NET Core

We have multiple containerized background services (workers) that consume multiple Kafka topics, and we need to maintain the chronological order and the integrity of the data. What is the best way to consume them: one consumer per topic, or one consumer subscribed to multiple topics?

consume multiple Kafka topics... maintain the chronological order
Kafka only guarantees ordering within a single partition, so this simply isn't possible from a consumer client, regardless of how many of them you have. At least, not without actively sorting the data into an in-memory data structure as you consume it (i.e. in a single process, not parallelized or distributed).
You could write your data to a database first (using Kafka Connect, ideally, instead of your own .NET services), then write your apps to query the database, sorting by timestamp, instead of reading from Kafka directly.
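To make the "sort as you consume" caveat concrete, here is a minimal in-memory sketch that merges records from several topics into one timestamp-ordered stream with `heapq.merge`. It assumes each input stream is already ordered by timestamp (true per partition in Kafka, not per topic), and the `Record` type and sample data are hypothetical stand-ins for consumed records. Everything happens in one process, which is exactly the non-distributed limitation mentioned above.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a consumed Kafka record."""
    topic: str
    timestamp: int  # e.g. epoch milliseconds
    value: str


def merge_by_timestamp(streams: List[Iterable[Record]]) -> Iterator[Record]:
    """Merge streams that are each already sorted by timestamp into a
    single timestamp-ordered stream, in memory, in one process."""
    return heapq.merge(*streams, key=lambda r: r.timestamp)


orders = [Record("orders", 1, "o1"), Record("orders", 4, "o2")]
payments = [Record("payments", 2, "p1"), Record("payments", 3, "p2")]

merged = list(merge_by_timestamp([orders, payments]))
print([r.value for r in merged])  # ['o1', 'p1', 'p2', 'o2']
```

With live, unbounded consumers, `heapq.merge` has to wait for the next record from every input before it can emit anything, so the slowest topic gates the whole stream; a real implementation would need some buffering window or watermark, which this sketch does not attempt.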

Redis streams - free stuck messages in a consumer group without claiming

Let's say there are messages in a Redis consumer group that have not been processed for N seconds. I am trying to understand whether it's possible to free them and put them back for other members of the consumer group to see. I don't want to claim or process these stuck messages; I just want to make them accessible to other active members of the consumer group. Is this possible?
From what I have understood from the documentation, the options are XAUTOCLAIM, or a combination of XPENDING and XCLAIM, and neither of these meets my requirements.
Essentially, I am trying to create a standalone process that acts as a monitor and makes those messages visible to the active consumers in the consumer group, and I plan to use this standalone process to do the same for multiple consumer groups (around 30). So I don't want this standalone process to take any other actions.
Please suggest how this can be designed.
Thanks!
Pending messages are removed from the Redis PEL (Pending Entries List) only when they are acknowledged. This is by design: it lets the message re-distribution process scale out to each individual consumer and avoids the single point of failure of a dedicated monitoring process like the one you described.
So, in short, what you are looking for can't be done, and I would suggest using XAUTOCLAIM, or XPENDING together with XCLAIM, inside your consumer processes instead.
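As a rough illustration of why reclaiming belongs in the consumers, here is a toy in-memory model (not a Redis client) of one consumer group's Pending Entries List. The `PendingEntriesList` class and its methods are hypothetical; they mimic the documented semantics in simplified form: an entry leaves the PEL only on acknowledgement, and an XAUTOCLAIM-style call transfers entries idle for at least a minimum time to the calling consumer, resetting their idle time and incrementing their delivery count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PendingEntry:
    owner: str           # consumer that currently owns the entry
    delivered_at: float  # time of the last delivery, in seconds
    delivery_count: int


class PendingEntriesList:
    """Toy model of a Redis consumer group's PEL (not a Redis client)."""

    def __init__(self) -> None:
        self.entries: Dict[str, PendingEntry] = {}

    def deliver(self, entry_id: str, consumer: str, now: float) -> None:
        # Like XREADGROUP with '>': the entry becomes pending for `consumer`.
        self.entries[entry_id] = PendingEntry(consumer, now, 1)

    def ack(self, entry_id: str) -> None:
        # Like XACK: the only way an entry leaves the PEL.
        self.entries.pop(entry_id, None)

    def autoclaim(self, consumer: str, min_idle: float, now: float) -> List[str]:
        # Roughly what XAUTOCLAIM does: take ownership of entries idle for
        # at least `min_idle`, resetting idle time and bumping delivery count.
        claimed = []
        for entry_id, e in self.entries.items():  # delivery order
            if now - e.delivered_at >= min_idle:
                e.owner = consumer
                e.delivered_at = now
                e.delivery_count += 1
                claimed.append(entry_id)
        return claimed


pel = PendingEntriesList()
pel.deliver("1-0", "worker-a", now=0.0)
pel.deliver("2-0", "worker-a", now=8.0)
# worker-a stalls; worker-b later reclaims anything idle for >= 5 s.
claimed = pel.autoclaim("worker-b", min_idle=5.0, now=10.0)
print(claimed)  # ['1-0']
```

Because every live consumer can run this reclaim step itself, there is no separate monitor process whose failure would leave messages stranded.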

What is the best way to generate Axon sequence numbers when working with multiple microservices?

I am new to the Axon Framework, and we are using Axon 3.3.3 with MongoDB as the event store.
We would like to know the best option for generating aggregate IDs across microservices, as we see a problem when loading events from the event store.
Example: we have an order service and a product service.
The order service generates aggregate ID 101 of type OrderAggregate, and it is stored in the event store.
If the product service also generates ID 101 of type ProductAggregate,
then how can we load a particular microservice's events from the event store?
I generally recommend not using sequential numbers. Besides the fact that it is a process that is hard to scale, you tend to easily bump into duplicates, and the scope of the sequential numbers is generally on the entity-type level.
Instead, consider using UUIDs (using UUID.randomUUID()) for your aggregates. They can be generated by the sender of commands, allowing the identifier of an aggregate to be used to consistently route messages, including the creation command, to the same machine. This allows you to configure caching on the handling side and be sure that any updates are sent to the same node where the aggregate was created.
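The answer's suggestion is in Java (`UUID.randomUUID()`); as a sketch of the same idea, the snippet below has the command sender generate a random (version 4) UUID and derives a target node from it, so every command for that aggregate lands on the same node. The node names and the simple hash-modulo routing are assumptions for illustration only; Axon's own distributed command routing is more involved and is not reproduced here.

```python
import hashlib
import uuid
from typing import List

NODES: List[str] = ["node-1", "node-2", "node-3"]  # hypothetical cluster members


def new_aggregate_id() -> str:
    """Generated by the command sender; unique across services."""
    return str(uuid.uuid4())


def route(aggregate_id: str, nodes: List[str]) -> str:
    """Pick a node deterministically from the aggregate id.

    Uses a stable hash (not Python's per-process salted hash()) so every
    sender maps the same id to the same node.
    """
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


order_id = new_aggregate_id()
product_id = new_aggregate_id()
print(order_id != product_id)                          # True: no shared "101"
print(route(order_id, NODES) == route(order_id, NODES))  # True
```

Plain modulo routing reshuffles most aggregates whenever the node list changes; consistent hashing is the usual way to limit that, at the cost of a little more code.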

Integration testing a DynamoDB client which uses eventually consistent reads?

Situation:
A web service with an API to read records from DynamoDB. It uses eventually consistent reads (the GetItem default mode).
An integration test consisting of two steps:
create test data in DynamoDB
call the service to verify that it is returning the expected result
I worry that this test is bound to be fragile due to eventual consistency of the data.
If I attempt to verify the data immediately after writing, using GetItem with ConsistentRead=true, it only guarantees that the data has been written to the majority of DB copies, but not all, so the service under test still has a chance to read from a non-updated copy in the next step.
Is there a way to ensure that the data has been written to all DynamoDB copies before proceeding?
The data usually reaches all geographically distributed replicas within a second.
My suggestion: after inserting the data into the DynamoDB table, wait a couple of seconds (in Java terms, sleep for a few seconds) before calling the web service. That should produce the desired result.
Eventually Consistent Reads (Default) – the eventual consistency option maximizes your read throughput. However, an eventually consistent read might not reflect the results of a recently completed write. Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data.
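A common alternative to a fixed sleep (not part of the answer above) is to poll until the service returns the expected result or a timeout expires: the test stays fast when replication is quick and still tolerates the occasional slow propagation. In this sketch, `call_service` is a hypothetical stand-in for the web-service call, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], Optional[T]],
    expected: T,
    timeout_s: float = 5.0,
    interval_s: float = 0.2,
) -> T:
    """Call `fetch` until it returns `expected` or `timeout_s` elapses.

    Raises AssertionError with the last observed value on timeout.
    """
    deadline = time.monotonic() + timeout_s
    last = fetch()
    while last != expected:
        if time.monotonic() >= deadline:
            raise AssertionError(f"expected {expected!r}, last saw {last!r}")
        time.sleep(interval_s)
        last = fetch()
    return last


# Hypothetical stand-in for the web service: stale for the first two calls.
responses = iter([None, None, {"id": "42", "name": "test"}])


def call_service():
    return next(responses)


print(wait_for(call_service, {"id": "42", "name": "test"}, interval_s=0.01))
```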

Kafka consumer synchronization behavior

I am currently exploring Kafka as a beginner for a simple problem.
There will be one producer pushing messages to one topic, but there will be n consumers (Spark applications) that massage the data from Kafka and insert it into a database (each consumer inserts into a different table).
Is there a possibility that consumers will go out of sync (e.g. some consumer goes down for quite some time), so that one or more consumers will not process a message and insert it into their table?
Assume the code is always correct and no exception will arise when massaging the data. It is important that every message is processed only once.
My question is: does Kafka handle this for us, or do we have to write additional code to make sure this does not happen?
You can group consumers (see the group.id config), and grouped consumers split the topic's partitions among themselves. Once a consumer drops, other consumers from the group take over the partitions it was reading.
However, there may be some problems: a consumer commits the offsets it has read back to Kafka, and if it drops after processing the received data but before committing the offset, the other consumers will resume from the last committed offset and process that data again. Fortunately, you can control how offsets are committed (see the consumer settings enable.auto.commit, auto.offset.reset, etc.).
The Spark Streaming + Kafka integration guide provides some explanations and possible strategies for managing offsets.
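To make the commit-ordering problem concrete, here is a toy simulation (no Kafka client involved) of one partition log and its committed offset. The consumer processes a record and commits only afterwards; if it crashes in between, the next consumer in the group resumes from the last committed offset and processes that record again. The `Partition` class, `consume` function, and `crash_after` parameter are hypothetical. As an assumption beyond the original answer, this is why "processed only once" in practice usually also needs idempotent writes on the database side (for example, upserts keyed by message ID).

```python
from typing import List, Optional


class Partition:
    """Toy partition: an append-only log plus the group's committed offset."""

    def __init__(self, records: List[str]) -> None:
        self.log = records
        self.committed = 0  # next offset the group should read


def consume(p: Partition, sink: List[str], crash_after: Optional[int] = None) -> None:
    """Process records from the committed offset, committing after each one.

    If `crash_after` equals an offset, simulate a crash after processing
    that record but before committing it.
    """
    for offset in range(p.committed, len(p.log)):
        sink.append(p.log[offset])  # e.g. insert into the table
        if offset == crash_after:
            return                  # crashed: this commit never happens
        p.committed = offset + 1    # commit *after* processing


partition = Partition(["m0", "m1", "m2"])
table: List[str] = []
consume(partition, table, crash_after=1)  # consumer A dies after writing m1
consume(partition, table)                 # consumer B takes over the partition
print(table)  # ['m0', 'm1', 'm1', 'm2']
```

Committing before processing flips the trade-off: a crash then skips the record instead of repeating it.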
By design, Kafka decouples the producer and the consumer: consumers read as fast as they can, and producers produce as fast as they can.
Consumers can be organized into "consumer groups": you can set things up so that multiple consumers share a single group, or so that an individual consumer reads in its own group.
If you have one consumer per group, then (depending on your acknowledgement strategy) you should be able to ensure each message is read only once per consumer.
Otherwise, if you want multiple consumers reading within a single group, the same applies, but each message is read once by only one of the n consumers.
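The group semantics described above can be sketched as a toy assignment function: within one group, each partition goes to exactly one consumer, while every group independently receives all partitions. The round-robin assignment is a simplification of Kafka's pluggable partition assignors, and the group and consumer names are hypothetical. For the question's setup, giving each table-writing application its own group means each one sees every message.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin the partitions across the consumers of ONE group."""
    result: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        result[consumers[i % len(consumers)]].append(p)
    return result


partitions = [0, 1, 2, 3]
groups = {
    "table-a-writer": ["a1", "a2"],  # two consumers share this group's work
    "table-b-writer": ["b1"],        # one consumer reads everything alone
}

for group, members in groups.items():
    print(group, assign(partitions, members))
# table-a-writer {'a1': [0, 2], 'a2': [1, 3]}
# table-b-writer {'b1': [0, 1, 2, 3]}
```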

Can I rely on Riak as the master datastore in e-commerce?

The Riak documentation often has examples of how you could model your e-commerce datastore in a certain way. But here it is written:
In a production Riak cluster being hit by lots and lots of concurrent writes, value conflicts are inevitable, and Riak Data Types are not perfect, particularly in that they do not guarantee strong consistency and in that you cannot specify the rules yourself.
From http://docs.basho.com/riak/latest/theory/concepts/crdts/#Riak-Data-Types-Under-the-Hood, last paragraph.
So, is it safe enough to use Riak as the primary datastore in an e-commerce app, or is it better to use another database with stronger consistency?
Riak out of the box
In my opinion out of the box Riak is not safe enough to use as the primary datastore in an e-commerce app. This is because of the eventual consistency nature of Riak (and a lot of the NoSQL solutions).
According to the CAP theorem, distributed datastores (Riak being one of them) can guarantee at most 2 of:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Riak specifically errs on the side of Availability and Partition tolerance by providing eventual consistency of the data held in its datastore.
What Riak can do for an e-commerce app
Using Riak out of the box, it would be a good source for the content about the items being sold in your e-commerce app (content that is generally written once and read lots is a great use case for Riak), however maintaining:
the count of how many items are left
the money in a user's account
needs to be handled carefully in a distributed datastore.
Implementing consistency in an eventually consistent datastore
There are several methods you can use, including:
Implement a serialization method when writing updates to values that need to be consistent (i.e. route them through a single, controlled service that guarantees it will only update a single item sequentially); this would need to be done outside of Riak, in your API layer
Change the replication properties of your consistent buckets so that you can 'guarantee' you never retrieve out-of-date data
At the bucket level, you can choose how many copies of data you want to store in your cluster (N, or n_val), how many copies you wish to read from at one time (R, or r), and how many copies must be written to be considered a success (W, or w).
The above method is similar to using the strong consistency model available in the latest versions of Riak.
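The usual rule of thumb behind tuning N, R and W is that when R + W > N, every read quorum overlaps every write quorum in at least one replica, so a read contacts at least one copy of the latest acknowledged write. The sketch below only checks that arithmetic by brute force; it deliberately ignores failure handling such as sloppy quorums and hinted handoff, which Riak uses by default and which can weaken the guarantee in practice.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every R-replica read set intersect every
    W-replica write set among N replicas?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (3, 2, 1)]:
    print(n, r, w, quorums_always_overlap(n, r, w), r + w > n)
# 3 2 2 True True
# 3 1 3 True True
# 3 1 1 False False
# 3 2 1 False False
```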
Important note: in all of these data store systems (distributed or not), you will in general:
Read the current data
Make a decision based on the current value
Change the data (decrement the Item count)
If all three of the above actions cannot be done atomically (either by locking, or by failing the third step if the value was changed by something else in the meantime), an e-commerce app is open to abuse. This issue also exists in traditional SQL storage solutions (which is why SQL has transactions).
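One way to make the read/decide/change cycle safe, sketched here as an assumption rather than as Riak's API, is optimistic concurrency: each value carries a version, and a write succeeds only if the version is unchanged since the read; otherwise the caller re-reads and retries. The `VersionedStore` class and `try_buy` function are hypothetical; the lock only makes each individual store operation atomic, just as a real store would. The example decrements a stock count from 20 threads without overselling.

```python
import threading
from typing import Dict, Tuple


class VersionedStore:
    """Hypothetical in-memory store with a compare-and-set write."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}  # key -> (value, version)
        self._lock = threading.Lock()

    def get(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._data[key]

    def put(self, key: str, value: int) -> None:
        with self._lock:
            _, version = self._data.get(key, (0, 0))
            self._data[key] = (value, version + 1)

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        # Fails (returns False) if someone else wrote since our read.
        with self._lock:
            _, version = self._data[key]
            if version != expected_version:
                return False
            self._data[key] = (value, version + 1)
            return True


def try_buy(store: VersionedStore, key: str) -> bool:
    """Read, decide, change; retry the whole cycle if the write loses a race."""
    while True:
        stock, version = store.get(key)                      # 1. read
        if stock <= 0:                                       # 2. decide
            return False
        if store.compare_and_set(key, version, stock - 1):   # 3. change
            return True


store = VersionedStore()
store.put("item-42", 5)
results = []
threads = [
    threading.Thread(target=lambda: results.append(try_buy(store, "item-42")))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True), store.get("item-42")[0])  # 5 0
```

Locking the item for the whole cycle would also work; the versioned approach avoids holding a lock while the application decides, at the cost of occasional retries under contention.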

Resources