Best way to handle multi-container transactional operations in Cosmos DB NoSQL? - azure-cosmosdb

Currently I am trying to design an application where we have a CosmosDB account representing a group of customers with:
One container is used as an overall metadata store that contains all customers.
Other containers contain data specific to one customer, with data partitioned according to different categories of customer history, etc.
When we onboard a new customer (which will not happen often, and only once per customer) we'd like to make sure that we create a row in the overall customer metadata and then provision the customer-specific container, rolling the whole operation back if either step fails. (In the future we'd like to be able to remove customers as well.)
Unfortunately, Cosmos DB NoSQL only supports transactions within a single logical partition of one container, and does not support multi-container transactions. Our own POC indicates the MongoDB API does support this, but unfortunately MongoDB does not fit our use case, as we need support for Azure Functions.

The heart of the problem here isn't whether Cosmos DB supports distributed transactions; it's that you can't enlist an Azure control plane action (in this case, creating a container resource) in a transaction at all.
Since you're building in the cloud, my recommendation would be to employ the outbox pattern to manage the provisioning state for your customers. There's an easy-to-understand example here you can read.
Given you are building a multi-tenant application on Cosmos DB and using containers as your tenant boundary, please note that the maximum number of databases and/or containers in an account is 500. Please see Service Quotas for more information.
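As a rough illustration, here is a minimal sketch of that outbox-style flow, assuming the @azure/cosmos SDK; the database name, a customerMetadata container partitioned on /id, and the status field are all illustrative, not from the question:

```typescript
// Sketch of outbox-style provisioning, assuming the @azure/cosmos SDK.
// Container names and status values are illustrative.
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING!);
const db = client.database("customers");
const metadata = db.container("customerMetadata"); // assumed partitioned on /id

async function onboardCustomer(customerId: string): Promise<void> {
  // 1. Record intent first: the metadata row is the source of truth.
  const { resource: record } = await metadata.items.create({
    id: customerId,
    status: "PROVISIONING",
  });

  try {
    // 2. Control-plane action: create the customer-specific container.
    await db.containers.createIfNotExists({
      id: `customer-${customerId}`,
      partitionKey: { paths: ["/category"] },
    });

    // 3. Mark success.
    await metadata.item(customerId, customerId).replace({
      ...record,
      status: "ACTIVE",
    });
  } catch (err) {
    // Compensating action instead of a rollback: flag the record so a
    // background sweeper can retry the creation or delete the row.
    await metadata.item(customerId, customerId).replace({
      ...record,
      status: "FAILED",
    });
    throw err;
  }
}
```

The key design choice is that the metadata row records intent before the control plane call, so a failed run leaves a FAILED row for a sweeper to retry or clean up, rather than silently half-created state.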

Related

Where to use CosmosDB?

CosmosDB's global distribution feature gives faster data responses. This is useful for mobile applications that access CosmosDB directly, with users spread across the globe.
However, I am using an ASP.NET web application hosted in Azure, so my application-to-database communication will always be over a fixed distance.
Can I benefit from CosmosDb in this case?
This is for an Azure-hosted ASP.NET application.
You can utilize CosmosDB when you know NoSQL concepts and your code is written for them, when your reads and writes follow different paths, when you are planning to build microservices, or when you have other projects that depend on or communicate with your web app and use the same database.
There are some points you need to take into account before choosing CosmosDB as the database.
Pricing model! CosmosDB is not a cheap database, and the pricing model is based on provisioned throughput. Requests that exceed the provisioned throughput are rejected (rate-limited) by the database, so first make sure you completely understand how this works; see the retry sketch after this answer.
Like other document-based databases, if you want to keep a graph of objects in a document, you should consider how to handle concurrent updates to the documents (if that is the case in your app). Make sure you know the difference between document-based and relational databases well.
But regarding the benefits:
It has great integration support with other PaaS services in Azure
It scales very well if you have a good partitioning strategy
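On the throughput point above: requests over the provisioned RU/s come back as HTTP 429 with a suggested retry delay. A minimal sketch of an explicit fallback, assuming the @azure/cosmos SDK (which already retries a few times internally), might look like this:

```typescript
// Minimal sketch of handling throughput rejections (HTTP 429),
// assuming the @azure/cosmos SDK. Parameters are illustrative.
import { Container } from "@azure/cosmos";

async function createWithBackoff(
  container: Container,
  doc: Record<string, unknown>,
  maxAttempts = 5
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await container.items.create(doc);
      return;
    } catch (err: any) {
      // 429 = request rate too large: we exceeded provisioned RU/s.
      if (err.code !== 429 || attempt === maxAttempts) throw err;
      const waitMs = err.retryAfterInMs ?? 1000 * attempt;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```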

Firebase Scalability - More than 100k user at the same time and across multiple region

I know scalability is not an issue in Firebase, which supports up to 100k simultaneous connections (in general).
Based on pricing documentation:
You can create multiple database instances to go beyond the 100K concurrent limit. See Pricing FAQ for more information.
Question 1: What if more than 200k users are using the same database simultaneously? Would the other half of the users be unable to query or connect, or would their requests be placed in a queue?
(As a Firebase plan subscriber, I would like to know how Firebase deals with this problem, to ensure the quality of the service provided to our customers is always top-notch.)
App globalisation is common nowadays, and many companies run servers across multiple regions to provide better and more stable performance; online games, for example, require low latency.
As of now, the Firebase user is required to set the default location when creating the project, and it is non-editable afterwards. Issues have even arisen where users realised they deployed their app to the wrong region and had no clue how to change it.
This represents the country/region of your organisation/company. Your selection also sets the appropriate currency for your revenue reporting. The selected country does not determine the location of your data for Firebase features. Google may process and store Customer Data anywhere Google or its agents maintain facilities.
Question 2: Does (or will) Firebase provide a solution tailored to this practice of having our database in multiple regions, with a headquarters region and multiple other regions sharing all the databases, functions, and auth?
(For now, to have multiple server locations we have to create different projects, and syncing users and data between them is a problem.)
Hope the language does not offend, cheers!
It seems like your question (or at least your assumptions) is based on the Firebase Realtime Database, so I'll answer for that below.
Q1) You can create multiple databases in a single project, each of which allows 100K connections, so you can scale beyond 200K connections. All of these are hosted in the same region, though, so you can't use each database for a separate region.
Q2) For a database solution that handles multiple regions, I'd recommend looking at Cloud Firestore. Also see: Cloud Firestore - selecting region to store data?

Can I add listeners to Multiple Databases in Firebase?

Regarding the recent announcement of multi-database support within a Firebase project: can we add listeners to multiple databases, or should we connect to at most one database at a time?
For example, let's say that I have created two databases, DB-1 and DB-2. I want to add a listener for changes in node-A in DB-1 and another listener on node-B in DB-2. Is this possible? I've read the documentation, but it's a bit contradictory:
Each app instance only connects to one database at any given moment.
...
If each client needs to connect to multiple databases during a session, you can reduce the number of simultaneous connections to each database instance by connecting to each database instance for only as long as is necessary.
You can certainly connect to multiple databases at the same time, according to the documentation. There may be cases where you want to reduce the number of active connections your app is making, especially if you have a lot of shards, each with a lot of activity; the advice stands for those cases, if that applies to you.
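For concreteness, a minimal sketch of the DB-1/DB-2 scenario using the modular Firebase JS SDK; the database URLs, node names, and config are placeholders:

```typescript
// Sketch of listening to two Realtime Database instances at once,
// using the modular Firebase JS SDK. URLs are placeholders.
import { initializeApp } from "firebase/app";
import { getDatabase, ref, onValue } from "firebase/database";

const app = initializeApp({ /* your project config */ });

// getDatabase accepts an optional instance URL, so each database
// instance gets its own handle from the same app.
const db1 = getDatabase(app, "https://db-1.firebaseio.com");
const db2 = getDatabase(app, "https://db-2.firebaseio.com");

// Independent listeners, one per database instance.
onValue(ref(db1, "node-A"), (snapshot) => {
  console.log("DB-1 node-A changed:", snapshot.val());
});
onValue(ref(db2, "node-B"), (snapshot) => {
  console.log("DB-2 node-B changed:", snapshot.val());
});
```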

Sharding at application level

I am designing a multi-tenant system and am considering sharding by tenant at the application layer instead of the database layer.
Hypothetically, the way this should work is that for each incoming request, a router process consults a global collection of tenants containing the primary attributes needed to determine the tenant for the request, as well as a virtual shard id. This virtual shard id is then mapped to an actual shard.
The actual shard contains both the application code and all of the data for this tenant. These shards would be LNMP (Linux, Nginx, MySQL/MongoDB, PHP) servers.
The router process should act as a proxy. It should be able to run some code to determine the target shard for an incoming request, based on the collection stored in some local db or files. To scale this better, I am considering making the shards themselves act as routers too, so that each runs a reverse proxy that forwards the request to the appropriate shard. Maybe the nginx instance running on a shard can also act as that reverse proxy, but how would it execute the application logic needed to match the request with the appropriate shard? A sketch of the lookup I have in mind follows.
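A minimal sketch of the two-level lookup described above (tenant attribute to virtual shard, virtual shard to physical shard); all names and map contents are illustrative, and in practice the maps would be backed by a local db or file:

```typescript
// Sketch of the two-level lookup: tenant -> virtual shard -> physical
// shard. The indirection means a virtual shard can be remapped to a
// new server without tenants noticing.
interface ShardTarget {
  host: string;
  port: number;
}

// Tenant attribute (e.g. Host header or API key) -> virtual shard id.
const tenantToVirtualShard = new Map<string, number>([
  ["tenant-a.example.com", 0],
  ["tenant-b.example.com", 1],
]);

// Virtual shard id -> physical shard. Moving a shard is a change here.
const virtualToPhysical: ShardTarget[] = [
  { host: "shard1.internal", port: 8080 },
  { host: "shard2.internal", port: 8080 },
];

function resolveShard(hostHeader: string): ShardTarget {
  const virtualId = tenantToVirtualShard.get(hostHeader);
  if (virtualId === undefined) {
    throw new Error(`Unknown tenant: ${hostHeader}`);
  }
  return virtualToPhysical[virtualId];
}

// The router (or a shard acting as router) proxies to this target.
console.log(resolveShard("tenant-a.example.com")); // shard1.internal
```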
I will appreciate any ideas and suggestions for this router implementation.
Thanks
Another option would be to use a product such as dbShards. dbShards is the only sharding product that shards at the application level. This way you can use any RDBMS (Postgres, MySQL, etc.) and still be able to shard your database without having to put some kind of proxy in between. A lot of the other sharding products rely on a proxy to point transactions to the correct shard, but dbShards knows where to go without having to "ask" anyone else.
Unless you expect your tenants to generate approximately equal data volume, sharding by tenant will not be very efficient.
As to application level sharding in general, let me share my own experience:
Version 1 of our high-volume SaaS product sharded at the application level. You will find that resharding as you grow is a major headache if you shard against a SQL-type solution at the application level, unless you write significant tooling to automate the process.
We switched to MongoDB (after considering multiple alternatives including Cassandra) in no small part because of all of the built-in support for resharding / rebalancing as data grows.
If your application does not need the relational capabilities of MySQL, I would suggest concentrating your efforts on MongoDB (since you have already identified that as a possible data platform) if you expect more than modest data growth. Allow MongoDB to handle the data sharding.

SQL Azure Sharding and Social Networking Apps

Sharding on SQL Azure is one of the top recommended options for getting past the 50 GB database size limit it currently has. A key strategy in sharding is to group related records, called atomic units, together in a single shard, so that the application only needs to query a single SQL Azure instance to retrieve the data.
However, in applications such as social networking apps, grouping an atomic unit into a single shard is not trivial, due to the interconnectivity of entities and records. What would be a recommended approach in such a scenario?
Also, in a sharded DB, what primary keys should be used for the tables: BIGINT or GUID? I currently use BIGINT identity columns, but if the data were ever merged this would be a problem due to conflicts between the values in different shards. I have heard some people recommend GUIDs (UniqueIdentifier), but I'm wary of how this could affect performance: clustering on-premise SQL Server tables on random UniqueIdentifier columns fragments indexes badly, and I wonder how SQL Azure behaves if I were to employ a UniqueIdentifier column. One shard-aware alternative is sketched below.
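As a hedged illustration (not from the original question), one common way to keep BIGINT keys unique across shards is to pack a shard id into the high bits of the key; the 16/48 bit split below is purely illustrative:

```typescript
// Compose a 64-bit key from a shard id and a per-shard sequence, so
// keys stay unique across shards and remain index-friendly (unlike
// random GUIDs). The 16/48 split is an assumption, not a rule.
function makeShardedId(shardId: bigint, localSeq: bigint): bigint {
  if (shardId >= 1n << 16n) throw new Error("shard id out of range");
  if (localSeq >= 1n << 48n) throw new Error("sequence out of range");
  return (shardId << 48n) | localSeq; // high bits: shard, low: seq
}

function shardOf(id: bigint): bigint {
  return id >> 48n; // recover the shard that minted the key
}

// Ids minted by different shards can be merged without conflicts.
const idA = makeShardedId(1n, 42n);
const idB = makeShardedId(2n, 42n);
console.log(idA !== idB, shardOf(idA), shardOf(idB)); // true 1n 2n
```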
For a social networking app, I'd personally forgo using SQL and instead leverage a NoSQL solution such as MongoDB or Azure Table Storage. These non-normalized but inexpensive systems allow you to create multiple entity datasets that are customized to your various indexing needs.
So instead of having something like...
User1 -< relationshiptable -< User2
You'd instead have tables like
Users
User1's Friends
User2's Friends
If Users 1 and 2 are friends with each other, then you'd have two entries to define that relationship, not one. But it makes retrieving a list of a specific user's friends trivial. It also opens you up to executing tasks in parallel by searching multiple index tables at a time.
This approach scales extremely well, but it does require that you invest more time in how the relationships are maintained. Admittedly, this is a simplified example; things get much more complex when you start discussing tasks like searching across your entire user base.
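A minimal sketch of that denormalized friendship pattern, assuming the @azure/data-tables SDK; the table name and key layout are illustrative:

```typescript
// Sketch of the two-entry friendship pattern on Azure Table Storage,
// assuming the @azure/data-tables SDK. Names are illustrative.
import { TableClient, odata } from "@azure/data-tables";

const friends = TableClient.fromConnectionString(
  process.env.STORAGE_CONNECTION_STRING!,
  "Friends"
);

// Befriending writes TWO entities, one per direction, so each user's
// friend list lives entirely in that user's own partition.
async function befriend(userA: string, userB: string): Promise<void> {
  await friends.createEntity({ partitionKey: userA, rowKey: userB });
  await friends.createEntity({ partitionKey: userB, rowKey: userA });
}

// Listing a user's friends is a single-partition query: cheap & fast.
async function listFriends(user: string): Promise<string[]> {
  const result: string[] = [];
  const entities = friends.listEntities({
    queryOptions: { filter: odata`PartitionKey eq ${user}` },
  });
  for await (const entity of entities) {
    result.push(entity.rowKey as string);
  }
  return result;
}
```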
