I can't find anything about performance considerations when choosing a multitenant architecture with Cloud Datastore.
Is there any performance gain from having a namespace per tenant versus having all tenants share the same namespace, given that all properties are indexed anyway?
Are the entities within the same namespace colocated?
Is there any gain when the amount of data grows large?
We don't consider the namespace when sharding across regions. The app ID (s~app_name) is the entire prefix used to determine splitting and replication, and app ID data is kept within the geography indicated by the ~ prefix; in this case, America (s~). All entities and namespaces under that app ID will have similar performance characteristics.
No gain as the Datastore grows.
tl;dr Namespaces are a mechanism to simplify your multi-tenant management and are a feature of the SDK. They also help with selective export. They of course allow scoping of the queries (for data segregation between tenants), but composite indexes are shared among all namespaces.
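To make the query-scoping point concrete, here is a toy in-memory model, not the Datastore API. `ToyDatastore`, `put` and `query` are hypothetical names; the sketch only shows that the same entity ID can exist in two namespaces and that a query never sees another tenant's entities. It says nothing about physical placement, which per the answer is governed by the app ID, not the namespace.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyDatastore:
    """Hypothetical in-memory stand-in for a namespaced entity store."""

    # (namespace, kind) -> {entity_id: properties}
    _data: dict = field(default_factory=lambda: defaultdict(dict))

    def put(self, namespace: str, kind: str, entity_id: str, props: dict) -> None:
        self._data[(namespace, kind)][entity_id] = dict(props)

    def query(self, namespace: str, kind: str, **filters) -> list:
        # A query is scoped to one namespace: other tenants' rows are invisible.
        rows = self._data.get((namespace, kind), {})
        return sorted(
            eid
            for eid, props in rows.items()
            if all(props.get(k) == v for k, v in filters.items())
        )


store = ToyDatastore()
store.put("tenant-a", "Invoice", "1", {"status": "open"})
store.put("tenant-b", "Invoice", "1", {"status": "open"})  # same ID, other tenant
store.put("tenant-a", "Invoice", "2", {"status": "paid"})
print(store.query("tenant-a", "Invoice", status="open"))
```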
Currently I am trying to design an application where we have a CosmosDB account representing a group of customers with:
One container is used as an overall metadata store that contains all customers.
Other containers will hold data specific to one customer, partitioned according to different categories such as customer history.
When we onboard a new customer (which will happen rarely, and only once per customer), we'd like to create a row in the overall customer metadata container, then provision the customer-specific container, and roll back the whole operation if either step fails. (In the future we'd like to remove customers as well.)
Unfortunately, the Cosmos DB NoSQL API only supports transactions within a single container and a single logical partition; it does not support multi-container transactions. Our own POC indicates that the MongoDB API does support this, but MongoDB does not fit our use case because we need support for Azure Functions.
The heart of the problem here isn't whether Cosmos DB supports distributed transactions. The core problem is you can't enlist an Azure Control Plane action (in this case, creating a container resource) into a transaction.
Since you're building in the cloud, my recommendation would be to use the outbox pattern to manage your customers' provisioning state. There's an easy-to-understand example here you can read.
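A minimal sketch of the outbox idea, assuming an in-memory stand-in for one logical partition of the metadata container. `MetadataPartition`, `process_outbox` and the `create_container` callback are hypothetical names, not Cosmos DB SDK calls. The customer row and its outbox message are written together (in Cosmos DB both could go into one transactional batch because they share a logical partition); a separate relay performs the control-plane action and only marks the message done when it succeeds, so failures are retried instead of rolled back.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MetadataPartition:
    """Hypothetical stand-in for one logical partition of the metadata container."""

    customers: dict = field(default_factory=dict)
    outbox: dict = field(default_factory=dict)

    def onboard(self, customer_id: str) -> None:
        # Both writes belong to the same partition, so they can be committed
        # atomically; the container itself is NOT created here.
        self.customers[customer_id] = {"status": "provisioning"}
        self.outbox[str(uuid.uuid4())] = {
            "type": "create_container",
            "customer_id": customer_id,
            "done": False,
        }


def process_outbox(part: MetadataPartition, create_container: Callable[[str], None]) -> None:
    """Relay: perform pending control-plane actions, marking each message done on success."""
    for msg in part.outbox.values():
        if msg["done"]:
            continue
        cid = msg["customer_id"]
        try:
            create_container(cid)  # must be idempotent: it may run more than once
        except Exception:
            continue  # leave the message pending; a later run retries it
        msg["done"] = True
        part.customers[cid]["status"] = "active"
```

The relay would typically run on a timer or a change-feed trigger; removing a customer later can follow the same shape with a "delete_container" message.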
Given you are building a multi-tenant application for Cosmos DB and using containers as your tenant boundary, please note that the maximum number of databases and/or containers in an account is 500. Please see Service Quotas for more information.
Cosmos DB has a useful global distribution feature that gives faster access to data. This is helpful for mobile applications that access Cosmos DB directly when users are spread across the globe.
However, I am using an ASP.NET web application hosted in Azure, so the distance between my application and the database will always be fixed.
Can I benefit from CosmosDb in this case?
This is for an Azure-hosted ASP.NET application.
You can use Cosmos DB if you understand NoSQL concepts and your code is written with them in mind, if you have different implementations for read and write processes, if you are planning to build microservices, or if other projects that depend on or communicate with your web app project use the same database.
There are some points you need to take into account before choosing CosmosDB as the database.
Pricing model! Cosmos DB is not a cheap database, and its pricing model is based on provisioned throughput. Requests that exceed the provisioned throughput are rejected by the database (rate-limited with HTTP 429). So first make sure you fully understand how this works.
Like other document databases, if you want to keep a graph of objects in a single document, you should consider how to handle concurrent updates to that document (if that applies to your app). Make sure you understand the difference between document databases and relational databases.
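The official Cosmos DB SDKs already retry rate-limited requests, but the underlying idea is worth seeing, because sustained throttling means retries cost latency and you may need more throughput. A minimal sketch, assuming a hypothetical `RateLimited` exception that carries the server's retry-after hint (when present); it is not an SDK type.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error for an HTTP 429 response, with an optional retry-after hint."""

    def __init__(self, retry_after_s: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after_s = retry_after_s


def with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on RateLimited; re-raise after max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise
            if err.retry_after_s is not None:
                delay = err.retry_after_s  # honour the server's hint
            else:
                # exponential backoff with jitter
                delay = base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise RuntimeError("unreachable")
```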
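One common way to handle those concurrent updates is optimistic concurrency: read the document with its ETag, modify it, and replace it only if the ETag still matches (Cosmos DB supports this through If-Match and answers 412 Precondition Failed on a mismatch). The sketch below models that with an in-memory `ToyContainer`; the class and `update_with_retry` are hypothetical illustrations, not SDK calls.

```python
import copy
import uuid
from dataclasses import dataclass, field


class PreconditionFailed(Exception):
    """The caller's ETag no longer matches (HTTP 412 in Cosmos DB)."""


@dataclass
class ToyContainer:
    # doc id -> (etag, document)
    _docs: dict = field(default_factory=dict)

    def create(self, doc_id, doc):
        self._docs[doc_id] = (str(uuid.uuid4()), copy.deepcopy(doc))

    def read(self, doc_id):
        etag, doc = self._docs[doc_id]
        return etag, copy.deepcopy(doc)  # callers get their own copy

    def replace(self, doc_id, doc, if_match):
        current_etag, _ = self._docs[doc_id]
        if current_etag != if_match:
            raise PreconditionFailed(doc_id)
        self._docs[doc_id] = (str(uuid.uuid4()), copy.deepcopy(doc))  # new ETag


def update_with_retry(container, doc_id, mutate, max_attempts=5):
    """Read-modify-write loop: on a lost race, re-read and reapply the change."""
    for _ in range(max_attempts):
        etag, doc = container.read(doc_id)
        mutate(doc)
        try:
            container.replace(doc_id, doc, if_match=etag)
            return doc
        except PreconditionFailed:
            continue
    raise PreconditionFailed(doc_id)
```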
But regarding the benefits:
It has great integration support with other PaaS services in Azure.
It scales very well if you have a good partitioning strategy
I have a project using BizTalk Server 2013 R2 in which there are lots of policies stored in the Business Rule Composer, and they are really hard to manage and find.
According to the business, the policies can be grouped into categories such as Contract Policies, Cost Policies, etc.
The Business Rule Composer has no categorization mechanism such as folders.
Questions
Is there any mechanism that I can use to facilitate managing and finding policies?
If there is no such mechanism, is there a way to have multiple Rule Store databases so I can separate policies by database?
A policy is itself a logical grouping of similar rules. All you can do is use a naming convention for policies that fits your needs.
There is only one Rule Store per BizTalk Group, so that is not an option.
Depending on how many policies you have, you could create your own UI to manage these policies. BizTalk provides the Microsoft.RuleEngine API, which you can use to manage them.
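If you adopt a naming convention, a custom UI can rebuild the missing "folders" by grouping on the name prefix. A minimal sketch, assuming a hypothetical `Category_PolicyName` convention with `_` as separator; fetching the names through Microsoft.RuleEngine is not shown, and `group_policies` is an illustrative helper, not part of BizTalk.

```python
from collections import defaultdict


def group_policies(policy_names, separator="_"):
    """Group policy names by category prefix (assumed convention: 'Category_Name')."""
    groups = defaultdict(list)
    for name in policy_names:
        category, sep, _ = name.partition(separator)
        # Names without the separator don't follow the convention yet.
        groups[category if sep else "Uncategorized"].append(name)
    return {cat: sorted(names) for cat, names in sorted(groups.items())}


print(group_policies(["Contract_Renewal", "Cost_Overtime", "Contract_Approval", "Legacy"]))
```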
I am designing a multi-tenant system and am considering sharding by tenant at the application layer level instead of database.
Hypothetically, it should work like this: for each incoming request, a router process consults a global collection of tenants containing the primary attributes needed to determine the tenant for that request, as well as its virtual shard ID. The virtual shard ID is then mapped to an actual shard.
The actual shard contains both the application code and all of the data for its tenants. These shards would be LNMP (Linux, Nginx, MySQL/MongoDB, PHP) servers.
The router process should act as a proxy. It should be able to run some code to determine the target shard for an incoming request based on the collection stored in a local database or in files. To scale this better, I am considering making the shards themselves act as routers, so that each can run a reverse proxy that forwards the request to the appropriate shard. Maybe the nginx instance running on each shard could also act as that reverse proxy. But how would it execute the application logic needed to match the request with the appropriate shard?
I would appreciate any ideas and suggestions for implementing this router.
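The tenant → virtual shard → physical shard lookup described in the question can be sketched as below. Assumptions, all hypothetical: the tenant is identified by the request's host header, there are 1024 virtual shards, and a virtual shard is chosen by a stable hash of the tenant ID. The point of the indirection is that rebalancing only edits the virtual-to-physical map; a tenant's virtual shard never changes. How nginx calls this logic (for example by proxying to a small routing service) is outside the sketch.

```python
import hashlib
from dataclasses import dataclass, field

NUM_VIRTUAL_SHARDS = 1024  # assumption: fixed, much larger than the server count


def virtual_shard_for(tenant_id: str) -> int:
    # Stable hash (not Python's per-process randomized hash()) so all routers agree.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SHARDS


def build_shard_map(backends: list) -> dict:
    """Spread virtual shards round-robin over the physical backends."""
    return {v: backends[v % len(backends)] for v in range(NUM_VIRTUAL_SHARDS)}


@dataclass
class Router:
    tenants_by_host: dict  # host header -> tenant ID (the "global collection")
    shard_map: dict = field(default_factory=dict)  # virtual shard -> backend address

    def route(self, host: str) -> str:
        tenant_id = self.tenants_by_host[host]
        return self.shard_map[virtual_shard_for(tenant_id)]

    def move_virtual_shard(self, vshard: int, new_backend: str) -> None:
        # Rebalancing: only the map changes (after the data has been copied).
        self.shard_map[vshard] = new_backend
```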
Another option would be to use a product such as dbShards. dbShards is the only sharding product that shards at the application level. This way you can use any RDBMS (Postgres, MySQL, etc.) and still shard your database without having to put a proxy in between. Many other sharding products rely on a proxy to direct transactions to the correct shard, but dbShards knows where to go without having to "ask" anyone else.
Great product: dbShards.
Unless you expect your tenants to generate approximately equal data volume, sharding by tenant will not be very efficient.
As to application level sharding in general, let me share my own experience:
Version 1 of our high-volume SaaS product sharded at the application level. You will find that resharding as you grow is a major headache if you shard a SQL-type database at the application level, or else you will have to write significant tooling to automate the process.
We switched to MongoDB (after considering multiple alternatives including Cassandra) in no small part because of all of the built-in support for resharding / rebalancing as data grows.
If your application does not need the relational capabilities of MySQL, I would suggest concentrating your efforts on MongoDB (since you have already identified that as a possible data platform) if you expect more than modest data growth. Allow MongoDB to handle the data sharding.
Sharding on SQL Azure is one of the top recommended options for getting past its current 50 GB database size limit. A key strategy in sharding is to group related records, called atomic units, together in a single shard, so that the application only needs to query a single SQL Azure instance to retrieve the data.
However, in applications such as social networking apps, grouping an atomic unit into a single shard is not trivial, because of the interconnectedness of entities and records. What would be a recommended approach for such a scenario?
Also, in a sharded database, what primary keys should be used for the tables: BIGINT or GUID? I currently use BIGINT identity columns, but if the data ever had to be merged, this would be a problem because values in different shards would conflict. I have heard some people recommend GUIDs (UniqueIdentifier), but I'm wary of how this could affect performance. On on-premises SQL Server, indexing UniqueIdentifier columns performs poorly when random GUIDs fragment a clustered index, and I wonder how SQL Azure handles this if I were to use a UniqueIdentifier column.
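A third option, not raised in the original thread, is a Snowflake-style 64-bit key that embeds the shard ID. IDs from different shards can never collide, so merging data is safe, they still fit a signed BIGINT, and because the high bits are a timestamp, new rows append roughly in order instead of scattering like random GUIDs. A minimal sketch; the bit layout (41 bits of milliseconds, 12 bits of shard, 10 bits of sequence) and the 2020-01-01 epoch are assumptions.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumption: custom epoch 2020-01-01T00:00:00Z
SHARD_BITS = 12  # up to 4096 shards
SEQ_BITS = 10  # up to 1024 IDs per millisecond per shard


class ShardIdGenerator:
    """63-bit IDs: [41 bits ms since EPOCH_MS][12 bits shard][10 bits sequence]."""

    def __init__(self, shard_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock() - EPOCH_MS
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock() - EPOCH_MS
            else:
                self._seq = 0
            self._last_ms = now
            return (now << (SHARD_BITS + SEQ_BITS)) | (self.shard_id << SEQ_BITS) | self._seq


def shard_of(id_: int) -> int:
    """Recover the shard an ID was generated on."""
    return (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```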
For a social networking app, I'd personally forgo SQL and instead use a NoSQL solution such as MongoDB or Azure Table Storage. These denormalized but inexpensive systems let you create multiple entity datasets customized to your various indexing needs.
So instead of having something like...
User1 -< relationshiptable -< User2
You'd instead have tables like
Users
User1's Friends
User2's Friends
If Users 1 and 2 are friends, you'd have two entries defining that relationship, not one. But it makes retrieving a specific user's friends trivial. It also lets you run tasks in parallel by searching multiple index tables at a time.
This approach scales extremely well, but it does require you to invest more time in how the relationships are maintained. Admittedly, this is a simplified example. Things get much more complex when you start discussing tasks like searching across your entire user base.
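The per-user friend tables can be modelled in a few lines. This is an in-memory sketch with hypothetical names (`FriendIndex`, `mutual_friends`), not MongoDB or Table Storage code: each friendship costs two writes, reading one user's friends is a single-key lookup, and two lists can be fetched in parallel and intersected.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FriendIndex:
    """One friend 'table' per user; each friendship is stored in both directions."""

    def __init__(self):
        self._friends = defaultdict(set)  # user ID -> set of friend IDs

    def add_friendship(self, a, b):
        # Two writes instead of one relationship row.
        self._friends[a].add(b)
        self._friends[b].add(a)

    def remove_friendship(self, a, b):
        # Both copies must be maintained together.
        self._friends[a].discard(b)
        self._friends[b].discard(a)

    def friends_of(self, user):
        # Single-key lookup, no join.
        return sorted(self._friends.get(user, ()))


def mutual_friends(index, a, b):
    """Fetch both users' lists in parallel, then intersect them."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa, fb = pool.map(index.friends_of, [a, b])
    return sorted(set(fa) & set(fb))
```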