How can Cosmos DB guarantee availability similar to eventual consistency for a strongly consistent account?

Cosmos DB documentation seems to suggest that if we configure our strongly consistent Cosmos DB account with >= 3 regions, we get availability similar to eventual consistency (per the SLAs).
But according to the CAP theorem, how can this be the case? Suppose we have 3 regions, and a network partition isolates 1 read region from the remaining two (1 write region and 1 read region). If a write request comes to the write region, there are two options:
Fail the request
Commit the write to the write region and the reachable read region; the isolated region cannot be reached.
If Cosmos DB goes with option 2, then a read request arriving at the isolated region will return stale data (because Cosmos DB uses a local quorum for reads), which violates the consistency it guarantees.
Therefore, Cosmos DB must fail the write request in the face of network partitions.

This is accomplished by the use of a dynamic quorum over the regions when using 3+ regions. When one of the secondary read regions is impacted by a network partition, the service will remove it from the quorum, allowing writes to commit and replicate to the other online region for an RTO of 0.

The primary region periodically gets health signals from all regions that have been created for the account. It also keeps track of the commits that all regions have caught up to. Until the read region that was previously unavailable has caught up to commits it missed out on, it is not marked online. Once fully caught up, it starts accepting new writes from the primary region at steady state and is simultaneously marked available for serving read traffic.
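To make that description concrete, here is a rough illustrative sketch of a dynamic write quorum (this is not Cosmos DB's actual replication code; the types, the LSN counter, and the two-region minimum are invented for illustration): a partitioned region is dropped from the quorum so writes keep committing, and it is re-admitted only after it has caught up.

using System.Collections.Generic;
using System.Linq;

// Illustrative only: a toy model of a dynamic write quorum, not Cosmos DB internals.
class Region
{
    public string Name;
    public bool Reachable = true;   // health signal seen by the primary region
    public long CommittedLsn;       // highest commit this region has applied
}

class DynamicQuorum
{
    private readonly List<Region> _regions;
    private readonly HashSet<Region> _quorum;
    private long _lsn;

    public DynamicQuorum(IEnumerable<Region> regions)
    {
        _regions = regions.ToList();
        _quorum = new HashSet<Region>(_regions);
    }

    // Periodic health check: drop unreachable regions from the quorum, and
    // re-admit a region only once it has caught up to the latest commit.
    public void Reevaluate()
    {
        foreach (var r in _regions)
        {
            if (!r.Reachable) _quorum.Remove(r);
            else if (!_quorum.Contains(r) && r.CommittedLsn == _lsn) _quorum.Add(r);
        }
    }

    // A write commits only when every region still in the quorum has applied it,
    // so a read served by any in-quorum region can never be stale.
    public bool TryWrite()
    {
        Reevaluate();
        if (_quorum.Count < 2) return false;   // no safe quorum left: fail the write
        _lsn++;
        foreach (var r in _quorum) r.CommittedLsn = _lsn;
        return true;
    }
}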

Related

DynamoDB: Time frame to avoid stale read

I am writing to DynamoDB using AWS Lambda and reading data from DynamoDB using the AWS Console. But I have seen instances of stale reads, with the latest records not showing up when I pull data within a few minutes of the write. What is a safe time interval for a data pull that would ensure the latest data is available on read? Would 30 minutes be a safe interval?
The below is from the AWS site; I just want to understand how recent "recent" is here: "When you read data from a DynamoDB table, the response might not reflect the results of a recently completed write operation. The response might include some stale data"
Regards,
Dbeings
If you must have a strongly consistent read, you can specify that in your read statement. That way the client will always read from the leader storage node for that partition.
In order for DynamoDB to acknowledge a write, the write must be durable on the leader storage node for that partition and one other storage node for the partition.
If you do an eventually consistent read (which is the default), there is roughly a 1-in-3 chance that the read comes from the node that was not part of the write acknowledgment, and an even smaller chance that the item has not yet been updated on that third storage node.
So, if you need a strongly consistent read, ask for one and you'll get the newest version of that item. There is no real performance degradation for doing a strongly consistent read.
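For reference, a minimal sketch of asking for a strongly consistent read with the AWS SDK for .NET (the table name and key are made up for the example):

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class StrongReadExample
{
    static async Task Main()
    {
        var client = new AmazonDynamoDBClient();

        var response = await client.GetItemAsync(new GetItemRequest
        {
            TableName = "Orders",                        // hypothetical table
            Key = new Dictionary<string, AttributeValue>
            {
                ["OrderId"] = new AttributeValue { S = "12345" }
            },
            ConsistentRead = true                        // read is served from the leader node
        });

        // response.Item now reflects all writes acknowledged before the read.
    }
}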

CosmosDB Change Feed in multiple regions with Multi-Master configuration

The system I'm working on has multiple environments, each running in separate Azure regions. Our CosmosDB is replicated to these regions and multi-region writes are enabled. We're using the default consistency model (Session).
We have Azure Functions that use the Cosmos DB trigger deployed in all three regions. Currently these use the same lease prefix, which means that only one function processes changes at any given time. I know that we can set each region to have a different lease prefix to enable concurrent processing, but I'd like to solidify my understanding before taking this step.
My question is about the behaviour of the change feed with regard to replication in this scenario. According to this link https://github.com/MicrosoftDocs/azure-docs/issues/42248#issuecomment-552207409 data is first converged on the primary region and then the change feed is updated.
Other resources I've read seem to suggest that each region has its own change feed which is updated upon replication. Also, the previous link recommends only running a change feed processor in the primary region in multi-master.
In an ideal world, I'd like change feed processors in each region to handle local writes quickly. These functions will make updates to Cosmos DB, and I also want to avoid issues with replication. My question is: what is the actual behaviour in a multi-master configuration (and, by extension, the correct architecture)? Is it "safe" to use per-region change feed processors, or should we use a single processor in the primary region?
You cannot have per-region Change Feed Processors that only process the local changes, because the Change Feed in each region contains the local writes plus the replicated writes from every other region.
Technically you can use a single Change Feed Processor deployment connecting to one of the regions to process events on all the regions.
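If you go the single-deployment route, a minimal sketch with the .NET SDK v3 change feed processor might look like the following (database, container, and processor names are placeholders; the Azure Functions trigger is built on the same mechanism):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class ChangeFeedSample
{
    static async Task Main()
    {
        // Connect through one region; the change feed read here still contains
        // local writes plus writes replicated from the other regions.
        var client = new CosmosClient("<connection-string>", new CosmosClientOptions
        {
            ApplicationRegion = Regions.WestEurope   // placeholder region
        });

        Container monitored = client.GetContainer("db", "items");
        Container leases = client.GetContainer("db", "leases");

        ChangeFeedProcessor processor = monitored
            .GetChangeFeedProcessorBuilder<dynamic>("single-processor", HandleChangesAsync)
            .WithInstanceName(Environment.MachineName)
            .WithLeaseContainer(leases)
            .Build();

        await processor.StartAsync();
    }

    static Task HandleChangesAsync(IReadOnlyCollection<dynamic> changes, CancellationToken ct)
    {
        // Process each change once, regardless of which region it was written in.
        return Task.CompletedTask;
    }
}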

Does time to live cause race conditions in DynamoDB global tables?

I'm about to migrate an existing DynamoDB database from a table in one AWS region to a global table in four regions. Here's the catch: several services depend on the time-to-live (TTL) functionality in the current database, and TTL needs to be enabled in the new global table. It seems that problems could potentially arise from enabling TTL on multiple table replicas. If TTL runs in one region, there is now a race condition between the event stream from that region and the TTL deletions in the other regions. If another replica manages to delete its own record before the event stream from the other replica arrives, what will happen? Does the second deletion attempt fail gracefully because the record is already gone? Perhaps AWS has already worked out this issue, but I can't find any documentation or forum questions on this topic.
One potential workaround would be to enable TTL in one region and let the stream from that replica manage TTL in the other replicas, but I don't really like that solution, since a regional outage could halt TTL deletions completely. One of the primary reasons for having global tables is to protect against regional outages.
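On the application side, at least, a delete for a record that is already gone can be made to fail gracefully with a conditional write. A minimal sketch with the AWS SDK for .NET (table name, key, and helper method are invented for illustration): the delete only succeeds if the item still exists, and a conditional-check failure is treated as "already deleted".

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class TtlDeleteExample
{
    static async Task DeleteIfStillPresentAsync(IAmazonDynamoDB client, string id)
    {
        try
        {
            await client.DeleteItemAsync(new DeleteItemRequest
            {
                TableName = "Sessions",                            // hypothetical table
                Key = new Dictionary<string, AttributeValue>
                {
                    ["Id"] = new AttributeValue { S = id }
                },
                // Only delete if the item still exists; otherwise the call throws
                // instead of silently "succeeding" a second time.
                ConditionExpression = "attribute_exists(Id)"
            });
        }
        catch (ConditionalCheckFailedException)
        {
            // The item was already removed (e.g. by TTL or a replicated delete):
            // treat this as a graceful no-op.
        }
    }
}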

CosmosDB ChangeFeedProcessor: Can I use 2 separate ChangeFeedProcessor hosts for processing the same unpartitioned feed?

I currently have 2 separate microservices monitoring the same unpartitioned CosmosDB collection (let's call it MasterCollection).
I only ever want one microservice to process MasterCollection's change feed at any given time - the rationale for having 2 separate hosts monitor the same feed basically boils down to redundancy.
This is what I'm doing in code (note that only the first hostName param differs - every other param is identical):
microservice #1:
ChangeFeedEventHost host = new ChangeFeedEventHost("east", monitoredCollInfo, leaseCollInfo, feedOptions, feedHostOptions);
microservice #2:
ChangeFeedEventHost host = new ChangeFeedEventHost("west", monitoredCollInfo, leaseCollInfo, feedOptions, feedHostOptions);
My testing seems to indicate that this works (only one of them processes the changes), but I was wondering if this is a good practice?
The Change Feed Processor library has a load balancing mechanism that will share the load between them as explained here.
Sharing the load means that they will distribute the Leases among themselves. Each Lease represents a Partition Key Range in the collection.
In your scenario, where you are creating 2 hosts for the same monitored collection using the same lease collection, the effect is that they will share the load: each will hold half of the leases and process changes only for those Partition Key Ranges. So half of the partitions will be processed by the host named west and half by the one named east. If the collection is single-partition, one of them will process all the changes while the other sits doing nothing.
If what you want is for both to process all the changes independently, you have multiple options:
Use a different lease collection for each Host.
Use the LeasePrefix option in ChangeFeedHostOptions, which can be set when the Host is created. That will let you share the lease collection between both hosts, but they will track the feed independently. Just keep in mind that the RU usage in the lease collection will rise depending on the amount of activity your main collection has.
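For the second option, a sketch of what that might look like (it reuses the monitoredCollInfo, leaseCollInfo and feedOptions variables from the question; the prefix values are arbitrary):

// Each host gets its own lease prefix, so they share the lease collection but
// track the change feed independently and each processes every change.
var eastOptions = new ChangeFeedHostOptions { LeasePrefix = "east-" };
var westOptions = new ChangeFeedHostOptions { LeasePrefix = "west-" };
var eastHost = new ChangeFeedEventHost("east", monitoredCollInfo, leaseCollInfo, feedOptions, eastOptions);
var westHost = new ChangeFeedEventHost("west", monitoredCollInfo, leaseCollInfo, feedOptions, westOptions);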

Can I rely on Riak as a master datastore in e-commerce?

The Riak documentation often has examples suggesting that you could model your e-commerce datastore in a certain way. But here it is written:
In a production Riak cluster being hit by lots and lots of concurrent writes, value conflicts are inevitable, and Riak Data Types are not perfect, particularly in that they do not guarantee strong consistency and in that you cannot specify the rules yourself.
From http://docs.basho.com/riak/latest/theory/concepts/crdts/#Riak-Data-Types-Under-the-Hood, last paragraph.
So, is it safe enough to use Riak as the primary datastore in an e-commerce app, or is it better to use another database with stronger consistency?
Riak out of the box
In my opinion, out of the box Riak is not safe enough to use as the primary datastore in an e-commerce app. This is because of the eventually consistent nature of Riak (and of a lot of NoSQL solutions).
Under the CAP theorem, distributed datastores (Riak being one of them) can guarantee at most 2 of:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Riak specifically errs on the side of availability and partition tolerance by providing eventual consistency for the data held in its datastore.
What Riak can do for an e-commerce app
Used out of the box, Riak would be a good store for the content about the items being sold in your e-commerce app (content that is generally written once and read a lot is a great use case for Riak); however, things like:
the count of how many items are left
the money in a user's account
need to be handled carefully in a distributed datastore.
Implementing consistency in an eventually consistent datastore
There are several methods you can use, including:
Implement a serialization method when writing updates to values that need to be consistent (i.e., go through a single, controlled service that guarantees it will only update a single item sequentially); this would need to be done outside of Riak, in your API layer
Change the replication properties of your consistent buckets so that you can 'guarantee' you never retrieve out-of-date data
At the bucket level, you can choose how many copies of data you want to store in your cluster (N, or n_val), how many copies you wish to read from at one time (R, or r), and how many copies must be written to be considered a success (W, or w).
The above method is similar to using the strong consistency model available in the latest versions of Riak.
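As a rough illustration of why tuning those values helps (this is the standard quorum-overlap argument, not Riak-specific code): if the read and write quorums overlap, i.e. R + W > N, every read consults at least one replica that took part in the most recent successful write.

// Quorum-overlap check: a write acknowledged by W of N replicas and a read that
// consults R of them must intersect whenever R + W > N.
static bool ReadSeesLatestWrite(int n, int r, int w) => r + w > n;
// Example: N = 3, W = 2, R = 2 -> 2 + 2 > 3, so any 2 replicas you read from must
// include at least 1 of the 2 replicas that acknowledged the latest write.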
Important note: in all of these datastore systems (distributed or not), you will in general:
Read the current data
Make a decision based on the current value
Change the data (decrement the Item count)
If these three actions cannot be done atomically (either by locking, or by failing the 3rd step if the value was changed by something else in the meantime), an e-commerce app is open to abuse. This issue exists in traditional SQL storage solutions too (which is why you have SQL transactions).
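To make that hazard concrete, here is a small language-level sketch (plain C#, not Riak code; the inventory counter and lock are invented for illustration) showing why the read/decide/change sequence must either be serialized or fail when the value changed underneath you:

class Inventory
{
    private readonly object _lock = new object();
    private int _itemsLeft = 1;

    // Unsafe: two concurrent buyers can both read 1, both decide to buy,
    // and both decrement, selling an item that no longer exists.
    public bool TryBuyUnsafe()
    {
        if (_itemsLeft <= 0) return false;  // 1. read  2. decide
        _itemsLeft--;                       // 3. change
        return true;
    }

    // Safe: the whole read/decide/change sequence is made atomic by serializing it,
    // which is what a transaction or a single controlled service gives you.
    public bool TryBuySafe()
    {
        lock (_lock)
        {
            if (_itemsLeft <= 0) return false;
            _itemsLeft--;
            return true;
        }
    }
}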
