CosmosDB Change Feed in multiple regions with Multi-Master configuration - azure-cosmosdb

The system I'm working on has multiple environments, each running in a separate Azure region. Our CosmosDB is replicated to these regions and multi-region writes are enabled. We're using the default consistency model (Session).
We have Azure Functions that use the Cosmos DB trigger deployed in all three regions. Currently these use the same lease prefix, which means that only one function processes changes at any given time. I know that we can give each region a different lease prefix to enable concurrent processing, but I'd like to solidify my understanding before taking this step.
My question concerns the behaviour of the change feed with regard to replication in this scenario. According to this link https://github.com/MicrosoftDocs/azure-docs/issues/42248#issuecomment-552207409, data is first converged on the primary region and then the change feed is updated.
Other resources I've read seem to suggest that each region has its own change feed which is updated upon replication. The previous link also recommends only running a change feed processor in the primary region when using multi-master.
In an ideal world, I'd like change feed processors in each region to handle local writes quickly. These functions will make updates to CosmosDB, and I also want to avoid issues with replication. My question is: what is the actual behaviour in a multi-master configuration (and, by extension, the correct architecture)? Is it "safe" to use per-region change feed processors, or should we use a single processor in the primary region?

You cannot have per-region Change Feed Processors that only process the local changes, because the Change Feed in each region contains the local writes plus the replicated writes from the other regions.
Technically you can use a single Change Feed Processor deployment connecting to one of the regions to process the events from all regions.
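For illustration, if you do give each regional deployment its own lease prefix, each region's Functions will independently see every change (local and replicated), not just the local writes. A minimal sketch with the 3.x Cosmos DB trigger extension, where the database, collection, connection-setting, prefix, and region names are assumptions:

using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RegionalChangeFeedFunction
{
    // LeaseCollectionPrefix separates this deployment's checkpoints from the other regions',
    // so each deployment processes the full change feed independently.
    // PreferredLocations steers reads towards the nearby replica.
    [FunctionName("ProcessChanges")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "MyDatabase",
            collectionName: "MyCollection",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            LeaseCollectionPrefix = "eastus-",
            PreferredLocations = "East US")]
        IReadOnlyList<Document> changes,
        ILogger log)
    {
        foreach (Document doc in changes)
        {
            log.LogInformation("Change detected for document {Id}", doc.Id);
        }
    }
}

Because every deployment sees all writes rather than only the local ones, deciding whether and how each region de-duplicates work is the architectural choice the answer above is pointing at.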

Related

How can Cosmos DB guarantee availability similar to eventual consistency for a strongly consistent account?

Cosmos DB documentation seems to suggest that if we configure our strongly consistent Cosmos DB account with >= 3 regions, we get availability similar to eventual consistency (per the SLAs).
But according to the CAP theorem, how can this be the case? Suppose we have 3 regions, and there is a network partition isolating 1 read region from the remaining two (1 write and 1 read region). If a write request comes to the write region, there are two options:
Fail the request
Commit the write to the write region and the reachable read region. The region outside the partition cannot be reached.
If Cosmos DB goes with option 2, and a read request then comes to the region that could not be reached, then because Cosmos DB uses a local quorum, it will return stale data, which violates the consistency it guarantees.
Therefore, Cosmos DB must fail the write request in the face of network partitions.
This is accomplished by the use of a dynamic quorum over the regions when using 3+ regions. When one of the secondary read regions is impacted by a network partition, the service will remove it from the quorum, allowing writes to commit and replicate to the other online region for an RTO of 0.
The primary region periodically gets health signals from all regions that have been created for the account. It also keeps track of the commits that all regions have caught up to. Until the read region that was previously unavailable has caught up to commits it missed out on, it is not marked online. Once fully caught up, it starts accepting new writes from the primary region at steady state and is simultaneously marked available for serving read traffic.

How does azure cosmosdb change feed internal communication work?

Hi, I am wondering how the internal mechanism of subscribing to an Azure Cosmos DB change feed actually works, specifically if you are using azure-cosmosdb-js from Node. Is there some sort of long-polling mechanism that checks a change feed table, or are events pushed to the subscriber using web sockets?
Are there any limits on the number of subscriptions that you can have to any partition keys change feed?
Imagine the change feed as nothing other than an event source that keeps track of document changes.
All the actual change feed consuming logic is abstracted into the SDKs. The server just offers the change feed as something the SDK can consume, and that is what the change feed processor libraries use to operate.
We don't know much about the Change Feed Processor SDKs mainly because they are not open source. (Edit: Thanks to Matias for pointing out that they are actually open source now). However, from extensive personal usage I can tell you the following.
The Change Feed Processor will need a collection to store some documents. Those documents are just checkpoints for the processor to keep track of the consumption. Each main document in those lease collections corresponds to a physical partition. Each physical partition will be polled by the processor at a set interval. You can control that with the FeedPollDelay setting, which "gets or sets the delay in between polling a partition for new changes on the feed, after all current changes are drained."
The library is also capable of spreading the leases if multiple processors are running against a single collection. If a service fails, the running services will pick up the lease. Due to polling and delays you might end up reprocessing already processed documents. You can also choose to set the CheckpointFrequency of the change feed processor.
In terms of "subscriptions", you can have as many as you want. Keep in mind however that the change feed processor is writing the lease documents. They are smaller than 1kb so you will be paying the minimum charge of 10 RUs per change. However, if you end up with more than 40 physical partitions you might have to raise the throughput from the minimum 400 RU/s.

CosmosDB ChangeFeedProcessor: Can I use 2 separate ChangeFeedProcessor hosts for processing the same unpartitioned feed?

I currently have 2 separate microservices monitoring the same unpartitioned CosmosDB collection (let's call it MasterCollection).
I only ever want one microservice to process MasterCollection's change feed at any given time - the rationale for having 2 separate hosts monitor the same feed basically boils down to redundancy.
This is what I'm doing in code (note that only the first hostName param differs - every other param is identical):
microservice #1:
ChangeFeedEventHost host = new ChangeFeedEventHost("east", monitoredCollInfo, leaseCollInfo, feedOptions, feedHostOptions);
microservice #2:
ChangeFeedEventHost host = new ChangeFeedEventHost("west", monitoredCollInfo, leaseCollInfo, feedOptions, feedHostOptions);
My testing seems to indicate that this works (only one of them processes the changes), but I was wondering if this is a good practice?
The Change Feed Processor library has a load balancing mechanism that will share the load between them as explained here.
Sharing the load means that they will distribute the Leases among themselves. Each Lease represents a Partition Key Range in the collection.
In your scenario, where you are creating 2 hosts for the same monitored collection using the same lease collection, the effect is that they will share the load: each will hold half of the leases and process changes only for those Partition Key Ranges. So half of the partitions will be processed by the host named west and half by the one named east. If the collection is single-partition, one of them will process all the changes while the other sits doing nothing.
If what you want is for both to process all the changes independently, you have multiple options:
Use a different lease collection for each Host.
Use the LeasePrefix option in the ChangeFeedHostOptions that can be set on Host creation. That will let you share the lease collection between both hosts, but they will track independently, as sketched below. Just keep in mind that the RU usage in the lease collection will rise depending on the amount of activity in your main collection.
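As a rough sketch of that second option, reusing the question's variable names (the prefix strings are arbitrary), each host now processes all changes independently because it keeps its own set of checkpoints in the shared lease collection:

// Host in the "east" deployment: its leases are written with the "east-" prefix.
ChangeFeedEventHost eastHost = new ChangeFeedEventHost(
    "east", monitoredCollInfo, leaseCollInfo, feedOptions,
    new ChangeFeedHostOptions { LeasePrefix = "east-" });

// Host in the "west" deployment: same lease collection, different prefix,
// so it tracks its own independent progress through the feed.
ChangeFeedEventHost westHost = new ChangeFeedEventHost(
    "west", monitoredCollInfo, leaseCollInfo, feedOptions,
    new ChangeFeedHostOptions { LeasePrefix = "west-" });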

Can I rely on Riak as a master datastore in e-commerce?

In riak documentation, there are often examples that you could model your e-commerce datastore in certain way. But here is written:
In a production Riak cluster being hit by lots and lots of concurrent writes, value conflicts are inevitable, and Riak Data Types are not perfect, particularly in that they do not guarantee strong consistency and in that you cannot specify the rules yourself.
From http://docs.basho.com/riak/latest/theory/concepts/crdts/#Riak-Data-Types-Under-the-Hood, last paragraph.
So, is it safe enough to use Riak as the primary datastore in an e-commerce app, or is it better to use another database with stronger consistency?
Riak out of the box
In my opinion, out of the box Riak is not safe enough to use as the primary datastore in an e-commerce app. This is because of the eventually consistent nature of Riak (and of many NoSQL solutions).
In the CAP theorem, distributed datastores (Riak being one of them) can guarantee at most 2 of:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Riak specifically errs on the side of Availability and Partition tolerance by having eventual consistency of the data held in its datastore.
What Riak can do for an e-commerce app
Used out of the box, Riak would be a good source for the content about the items being sold in your e-commerce app (content that is generally written once and read many times is a great use case for Riak). However, maintaining:
the count of how many items are left
the money in a user's account
needs to be handled carefully in a distributed datastore.
Implementing consistency in an eventually consistent datastore
There are several methods you can use, they include:
Implement a serialization method when writing updates to values that need to be consistent (i.e. go through a single/controlled service that guarantees it will only update a single item sequentially); this would need to be done outside of Riak, in your API layer
Change the replication properties of your consistent buckets so that you can 'guarantee' you never retrieve out-of-date data (see the sketch after this list):
At the bucket level, you can choose how many copies of data you want to store in your cluster (N, or n_val), how many copies you wish to read from at one time (R, or r), and how many copies must be written to be considered a success (W, or w).
The above method is similar to using the strong consistency model available in the latest versions of Riak.
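As a sketch of the second method above, bucket-level replication properties can be set through Riak's HTTP API. The host, port, bucket name, and chosen values below are placeholders, not a recommendation:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class RiakBucketPropsExample
{
    static async Task Main()
    {
        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:8098") };

        // n_val: copies stored; r: copies that must answer a read; w: copies that must acknowledge a write.
        var props = "{\"props\":{\"n_val\":3,\"r\":2,\"w\":2}}";
        var response = await http.PutAsync(
            "/buckets/orders/props",
            new StringContent(props, Encoding.UTF8, "application/json"));

        Console.WriteLine(response.StatusCode); // Riak returns 204 No Content on success
    }
}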
Important note: In all of these data store systems (distributed or not) you will in general:
Read the current data
Make a decision based on the current value
Change the data (decrement the Item count)
If all three of the above actions cannot be done in an atomic way (either by locking, or by failing the third step if the value was changed by something else in the meantime), an e-commerce app is open to abuse. This issue exists in traditional SQL storage solutions as well (which is why you have SQL transactions).
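To make the read/decide/write hazard concrete, here is a small self-contained C# illustration that uses a compare-and-swap retry loop in place of whatever conditional update or transaction the real datastore would provide (the inventory value and names are made up):

using System;
using System.Threading;

class InventoryExample
{
    static int itemCount = 5; // current stock, shared between concurrent callers

    // Returns true if one unit was reserved, false if the item has sold out.
    static bool TryReserveItem()
    {
        while (true)
        {
            int current = Volatile.Read(ref itemCount);   // 1. read the current data
            if (current <= 0) return false;               // 2. decide based on the current value
            // 3. change the data, but only if nothing else changed it in between; otherwise retry
            if (Interlocked.CompareExchange(ref itemCount, current - 1, current) == current)
                return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryReserveItem()); // True
        Console.WriteLine(itemCount);        // 4
    }
}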

I'm building a salesforce.com type of CRM - what is the right database architecture?

I have developed a CRM for my company. Next, I would like to take that system and make it available for others to use in a hosted format, very much like salesforce.com. The question is what type of database structure I would use. I see two options:
Option 1. Each time a company signs up, I clone the master database for them.
The disadvantage of this is that I could end up with thousands of databases. That's a lot of databases to back up every night. My CRM uses cron jobs for maintenance; those jobs would have to run on all databases.
Each time I upgrade the system with a new feature, and need to add a new column to the database, I will have to add that column to thousands of databases.
Option 2. Use only one database.
At the beginning of EVERY table add "CompanyID". In EVERY SQL statement add "and companyid={companyid}".
The advantage of this method is the simplicity of only one database. Just one database to backup each night. Just one database to update when needed.
The disadvantage is: what if I get 1,000 companies signing up, and each wants to store data on 100,000 leads? That's 100,000,000 rows in the leads table, which worries me.
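For illustration, option 2 usually comes down to parameterising every query by the tenant column. A minimal sketch, where the table, column, and connection details are hypothetical:

using System.Data.SqlClient;

class LeadRepository
{
    private readonly string connectionString;

    public LeadRepository(string connectionString) => this.connectionString = connectionString;

    // Every statement is scoped by CompanyID so tenants never see each other's rows.
    public int CountLeads(int companyId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT COUNT(*) FROM Leads WHERE CompanyID = @companyId", conn))
        {
            cmd.Parameters.AddWithValue("@companyId", companyId);
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}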
Does anyone know how the online hosted CRMs like salesforce.com do this?
Thanks
Wouldn't you clone a table structure for each new database ID, with all sheets archived in the master base? Each indexed client clone is hash-verified to access its specific sheet, run through a host machine at the front end of the master system, which directs requests in the primary role. Internal access is batched to read/write slave systems in groups. Obviously, set RAID configurations to copy in real time and on a schedule, and balance request load against system resources. That way you separate the security flaws from the UI and the connection to the retention schema. It seems like simplified structures and reduced policy requests cut down the request overhead in query processing. Or, simply, a man-in-the-middle approach from the inside out.
1) Splitting your data into smaller groups in a safe, unthinking way (such as one database per grouping) is almost always best if you want to scale. In this case, unless for some reason you want to query between companies, keeping them in separate databases is best.
2) If you are updating all of the databases by hand, you are doing something wrong if you want to scale. You'd want to automate the process.
3) Ultimately, salesforce.com uses this as the basis of their DB infrastructure:
http://blog.database.com/blog/2011/08/30/database-com-is-open-for-business/
