Cosmos DB replication cost - azure-cosmosdb

I want to set up a Cosmos DB account with a single write region and multiple read regions. I also want to use autoscaling RU provisioning, so I only pay for what I use (above the floor).
Now, if there is zero load, I expect the RU cost to be 400 RU/s multiplied by the region count (since 400 RU/s is the cost floor for autoscaling).
If I perform a write charged at a specific RU cost that I can see in the response, is that only counted once (against the write region), and then the replication only incurs extra costs for egress and storage? Or will the RU cost be multiplied by the region count behind the scenes?
Similarly for reads, is that RU cost only counted once (against the read region), or is it multiplied by the region count?
Under Metrics (Classic), I see that Avg Throughput/s (RU/s) only changes in the write region when writing, but I'm not sure if this reflects the actual charge.
I felt that this was not answered clearly in: In Cosmos DB, how does geo-replication impact RU consumption of writes?

The throughput that you have configured for various Azure Cosmos databases and containers will be reserved in each of the Azure regions associated with your Azure Cosmos database account. If the sum of provisioned throughput (RU/sec) configured across all the databases and containers within your Azure Cosmos database account (provisioned per hour) is T and the number of Azure regions associated with your database account is N, then the total provisioned throughput for a given hour for your Azure Cosmos database account is equal to T x N RU/sec.
Provisioned throughput (single write region) costs $0.008/hour per 100 RU/sec, and provisioned throughput with multiple writable regions (multi-region writes config) costs $0.016/hour per 100 RU/sec.
Source: Understand your Azure Cosmos DB bill
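To make the T x N rule and those rates concrete, here is a minimal Python sketch of the hourly provisioned-throughput cost. The $0.008 rate comes from the pricing quoted above and the 1.5x autoscale billing multiplier from the answers later in this thread; the 3-region layout and the 400 RU/s autoscale floor are illustrative assumptions matching the question, not figures from an actual deployment.

```python
# Minimal sketch of the T x N billing rule quoted above.
# Assumptions: $0.008/hour per 100 RU/s (single write region), autoscale billed
# at 1.5x the manual rate, 1 write region + 2 read regions, 400 RU/s floor.

MANUAL_RATE_PER_100_RU_HOUR = 0.008   # $/hour per 100 RU/s, single write region
AUTOSCALE_MULTIPLIER = 1.5            # autoscale RU/s billed at 1.5x the manual rate

def hourly_cost(provisioned_ru_s: float, region_count: int, autoscale: bool = True) -> float:
    """Hourly cost when `provisioned_ru_s` is reserved in each of `region_count` regions."""
    total_ru_s = provisioned_ru_s * region_count          # T x N: reserved in every region
    rate = MANUAL_RATE_PER_100_RU_HOUR * (AUTOSCALE_MULTIPLIER if autoscale else 1.0)
    return total_ru_s / 100 * rate

# Zero-load hour at the 400 RU/s autoscale floor across 3 regions:
print(f"${hourly_cost(400, region_count=3):.4f}/hour")    # 1,200 RU/s billed -> $0.1440
```

In other words, with provisioned or autoscale throughput you are billed for the reserved (or scaled-to) RU/s per region per hour; the per-request charge reported in responses is consumption against that reservation - in every region for writes, only in the serving region for reads, as the answers below explain.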

Can replication cause request throttling?

I have a following use case:
we have single write region Azure Cosmos
the db will be replicated to other Azure regions (e.g. 5 additional Azure regions treated as read replicas)
we have a daily ETL job that cannot interrupt users querying the database. Because of that, we rate limit in the application layer the requests we make to Cosmos - e.g. we consume only 5K RU/s out of the 10K RU/s provisioned (strictly speaking, we provision 1K RU/s with the autoscale setting). Thanks to that, while the ETL job is running we consume 50% of the available RUs.
Question:
is it possible that during replication we will hit 100% RU utilization in one of the read replicas because Cosmos will try to replicate everything as fast as possible?
It depends on (1) whether the ETL is reading from Cosmos DB as a source or writing to Cosmos DB as a target and (2) what the aggregate workload (ETL + app) looks like.
I'll explain -
The best way to think about RUs is as a proxy metric for the physical system resources (CPU, memory, IOPS) it takes to serve a request.
Writes must be applied to all regions - and therefore consume RUs (CPU/memory/IOPS) in each of the replicated regions. Given an example 3-region setup consisting of West US + East US + North Europe, writing a record results in RU consumption in West US, East US, and North Europe.
Reads can be served out of a single region independently of the others. Given the same 3-region setup, reading a record in West US has no impact on East US or North Europe.
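A minimal sketch of that accounting, using hypothetical per-operation charges (10 RU for a write, 3 RU for a point read); the real charge for each request is returned by the service in the x-ms-request-charge response header:

```python
# Hypothetical per-operation charges; actual charges come back with each response.
regions = ["West US", "East US", "North Europe"]
write_charge_ru = 10.0   # assumed charge for one write
read_charge_ru = 3.0     # assumed charge for one point read

# A write is applied (and consumes RUs) in every region.
write_consumption = {region: write_charge_ru for region in regions}

# A read only consumes RUs in the region that served it.
read_consumption = {region: (read_charge_ru if region == "West US" else 0.0) for region in regions}

print(write_consumption)  # {'West US': 10.0, 'East US': 10.0, 'North Europe': 10.0}
print(read_consumption)   # {'West US': 3.0, 'East US': 0.0, 'North Europe': 0.0}
```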
As you suggested -
Rate-limiting the ETL job is a good choice. Depending on what your ETL tool is, a few of them have easier-to-use client rate-limiting configuration options (notably, Azure Data Factory data flows and the Spark connector for Cosmos DB's Core SQL API have a "write throughput budget" concept); alternatively, you can scale down the ETL job itself so that the job is the natural bottleneck.
Configuring autoscale maximums to have sufficient headroom for [RU/sec needed for the rate-limited ETL] + [upper bound for expected RU/sec needed for the application] is a good call as well, while noting that Cosmos DB's autoscale comes with a 10x scaling factor (e.g. configuring a 20K RU/sec maximum on Cosmos DB autoscale results in automatic scaling between 2K and 20K RU/sec).
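A rough sketch of that headroom calculation, with entirely hypothetical numbers for the rate-limited ETL budget and the application peak:

```python
# All figures are hypothetical; plug in your own measured values.
etl_budget_ru_s = 5_000        # rate-limited ETL consumption
app_peak_ru_s = 8_000          # upper bound for expected application traffic

required_peak = etl_budget_ru_s + app_peak_ru_s           # 13,000 RU/s of headroom needed
# Autoscale maximums are typically configured in 1,000 RU/s increments
# and scale over a 10x range (floor = 10% of the max).
autoscale_max = -(-required_peak // 1_000) * 1_000        # round up -> 13,000 RU/s
autoscale_min = autoscale_max // 10                       # 1,300 RU/s floor

print(f"Configure autoscale max = {autoscale_max} RU/s "
      f"(scales between {autoscale_min} and {autoscale_max} RU/s)")
```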
One side note worth mentioning: depending on the use case for the ETL job - if this is a classic ETL from OLTP => OLAP - it may be worthwhile to look at Cosmos DB's analytical storage + Synapse Link feature set as an easier out-of-the-box solution.

Calculating RU charge on Cosmos DB

When you are doing your RU calculations for Cosmos DB, do you need to be calculating the max values of reads, inserts, updates and deletes or the average number per second?
The reason I ask is that the average number of documents read per second (in our current MongoDB) is around 5,500, but the maximum number of documents read in one second over my sampling period was 965,880.
I have looked through all of Microsoft's documentation on costing Cosmos DB and there is no clear guidance on whether the figure for RU throughput should be the average or the max.
As you said, there's no MS document on 'average or max' for setting throughput. In my opinion both the average and the max are meaningful, but we also need to look at the most common situation. For example, suppose the load is usually around 5,800 reads per second, often around 4,500 per second, with a minimum of 3,000 and a maximum of 9,000. 1 RU corresponds roughly to a 1 KB document read, so if we set the max as the provisioned throughput it is expensive and wasteful, while if we set the average the system may often be 'in debt', as the other answer said. That's why I say we also need to consider the 'most common' situation.
By the way, MS provides a web-based tool to help estimate the request unit requirements for typical operations. If the admin doesn't yet know the real behaviour of the database, this doc may also help. In short, it says that if you're building a new or small application, you can start at the minimum RU/s to avoid over-provisioning at the beginning; after running the application for some time, you can use Azure Monitor to determine whether your traffic pattern calls for a change.
To avoid throttling you need to provision for the MAX throughput you need in RUs. It also depends on how frequently you hit the max RUs. There are basically three ways to provision RUs: provisioned (standard/manual) throughput, autoscale, and serverless (preview).
If you provision standard (manual) RU/s at the entry point of 400 RU/s, you won't be able to consume above 400 RU/s, unless you manually change the throughput. You'll be billed for 400 RU/s at the standard (manual) provisioned throughput rate, per hour.
If you provision autoscale throughput at the entry point of a max RU/s of 4000 RU/s, the resource will scale between 400 and 4000 RU/s. Since the autoscale billing rate per RU/s is 1.5x the standard (manual) rate, for hours where the system has scaled down to the minimum of 400 RU/s your bill will be higher than if you had provisioned 400 RU/s manually. However, with autoscale, if your application traffic spikes at any time, you can consume up to 4000 RU/s with no user action required. In general, you should weigh the benefit of being able to consume up to the max RU/s at any time against the 1.5x rate of autoscale.
For small applications with low expected traffic, you can also consider the serverless capacity mode, which bills you only for the RUs you actually consume.
Use the Azure Cosmos DB capacity calculator to estimate your throughput requirements.
Should definitely go through this and related pages of documentation- https://learn.microsoft.com/en-us/azure/cosmos-db/request-units
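To put the 1.5x trade-off in numbers, here is a minimal sketch comparing an idle hour on manual 400 RU/s with an idle hour and a fully scaled hour on autoscale at a 4,000 RU/s max, assuming the single-write-region list price of $0.008 per 100 RU/s per hour quoted earlier in this thread:

```python
MANUAL_RATE = 0.008                 # $/hour per 100 RU/s, single write region (assumed list price)
AUTOSCALE_RATE = MANUAL_RATE * 1.5  # autoscale is billed at 1.5x the manual rate

manual_idle_hour = 400 / 100 * MANUAL_RATE            # billed at the fixed 400 RU/s
autoscale_idle_hour = 400 / 100 * AUTOSCALE_RATE      # scaled down to its 400 RU/s floor
autoscale_busy_hour = 4_000 / 100 * AUTOSCALE_RATE    # scaled up to the 4,000 RU/s max

print(f"manual 400 RU/s, idle hour:        ${manual_idle_hour:.4f}")    # $0.0320
print(f"autoscale 4K max, idle hour:       ${autoscale_idle_hour:.4f}") # $0.0480
print(f"autoscale 4K max, fully used hour: ${autoscale_busy_hour:.4f}") # $0.4800
```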

Why does Cosmos DB return 429 for a portion of requests despite not exceeding my manual set throughput

My Cosmos DB is using Shared Throughput across several containers. I have manually scaled up my Cosmos DB to 70,000 RU/s and I am currently running a large number of requests.
Looking in Azure I can see that a portion of my requests are being throttled (returning 429).
To give an idea of the numbers: around 25K requests return 200 and around 5K return 429.
When I follow the warning in the Azure portal that says my collection is exceeding provisioned throughput, it shows that the average throughput is 6.78K RU/s.
I don't understand why, when I have 70,000 RU/s, my requests are being throttled while the average throughput is supposedly only 6,780 RU/s.
No other containers are being read or written to, all these requests are made against just one container.
As all of these requests run a stored procedure, they all have a partition key supplied.
The most likely reason is you have a hot partition that is reaching its allocated throughput before the other partitions are.
For a horizontally scalable database, throughput is allocated across physical partitions (computers) and data is partitioned using a partition key that basically acts as an address to route it to a specific computer to be stored.
Assume I have a collection with three partitions (1, 2, 3) and 30K RU/s. Each of those partitions gets 10K RU/s allocated to it. If I then run an operation that does a ton of work against partition 2 and consumes all of its 10K, I'm going to get rate limited (429) even if I don't touch partition 1 or 3.
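A minimal sketch of that per-partition accounting, reusing the hypothetical 3-partition / 30K RU/s collection from the example above:

```python
# Hypothetical per-partition budget: 30K RU/s spread evenly over 3 physical partitions.
total_ru_s = 30_000
partitions = ["p1", "p2", "p3"]
per_partition_budget = total_ru_s / len(partitions)   # 10,000 RU/s each

# A skewed workload: almost everything lands on p2.
consumed = {"p1": 500.0, "p2": 11_000.0, "p3": 800.0}

for p in partitions:
    status = "429 (throttled)" if consumed[p] > per_partition_budget else "OK"
    print(f"{p}: consumed {consumed[p]:>8.0f} RU/s of {per_partition_budget:.0f} -> {status}")

# Total consumption (12,300 RU/s) is well under 30K RU/s, yet p2 is still rate limited.
```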
To avoid this you need to pick a partition key that BOTH distributes data as evenly as possible during writes and, ideally, can also be used to answer queries within one or a small (bounded) number of partitions, avoiding "fan out" queries that have to hit every partition.
Now, for small collections none of this matters because all of your data sits on a single physical partition. However, as the collection grows larger, this causes issues that will prevent the database from scaling fully.
You can learn more here

Cosmos db high RUs for writes

In Cosmos DB, how are RUs increased for writes? To my understanding, the replicas are only for reads and only one node is used for writes. What is the model for write distribution in Cosmos DB when the load is high?
In Cosmos DB, data is stored by its partition key. In high write-volume scenarios, a key to success is ensuring your partition key has high enough cardinality to give you enough throughput to meet your performance needs for a given amount of RU/s.
Beyond scalability within a single geographic region, using multi-master (multi-region writes) is what provides geographic scale for writes and lower write latency.
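One common way to get that cardinality for write-heavy workloads is a synthetic partition key that appends a bounded random suffix to a natural key. This is a general pattern, not something stated in the answer above, and every name in the sketch is hypothetical:

```python
import random

# Hypothetical: spread writes for a single tenant over N synthetic key values
# so one "hot" tenant doesn't concentrate all writes on a single logical partition.
SUFFIX_COUNT = 16

def synthetic_partition_key(tenant_id: str) -> str:
    """Build a partition key like 'tenant42-7'; readers must fan out over all 16 suffixes."""
    return f"{tenant_id}-{random.randint(0, SUFFIX_COUNT - 1)}"

doc = {
    "id": "order-0001",
    "tenantId": "tenant42",
    "pk": synthetic_partition_key("tenant42"),   # used as the container's partition key
    "total": 129.95,
}
print(doc["pk"])
```

The trade-off is that point reads and queries for a single tenant now have to fan out over all 16 suffix values, so this only pays off when the workload is genuinely write-dominated.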

How DynamoDB provisions throughput of reads independently of writes

Amazon DynamoDB allows the customer to provision the throughput of reads and writes independently. I have read the Amazon Dynamo paper about the system that preceded DynamoDB and read about how Cassandra and Riak implemented these ideas.
I understand how it is possible to increase the throughput of these systems by adding nodes to the cluster which then divides the hash keyspace of tables across more nodes, thereby allowing greater throughput as long as access is relatively random across hash keys. But in systems like Cassandra and Riak this adds throughput to both reads and writes at the same time.
How is DynamoDB architected differently such that it is able to scale reads and writes independently? Or is it not, and Amazon is simply charging for them independently even though they essentially have to allocate enough nodes to cover the greater of the two?
You are correct that adding nodes to a cluster should increase the amount of available throughput but that would be on a cluster basis, not a table basis. The DynamoDB cluster is a shared resource across many tables across many accounts. It's like an EC2 node: you are paying for a virtual machine but that virtual machine is hosted on a real machine that is shared among several EC2 virtual machines and depending on the instance type, you get a certain amount of memory, CPU, network IO, etc.
What you are paying for when you pay for throughput is IO, and reads and writes can be throttled independently. Paying for more throughput does not cause Amazon to partition your table across more nodes. The only thing that causes a table to be partitioned more is if the size of your table grows to the point where more partitions are needed to store its data. The maximum size of a partition, from what I have gathered talking to DynamoDB engineers, is based on the size of the SSDs of the nodes in the cluster.
The trick with provisioned throughput is that it is divided among the partitions. So if you have a hot partition, you can get throttling and ProvisionedThroughputExceededExceptions even if your total requests aren't exceeding the total read or write throughput. This is contrary to what your question asks: you would expect that if your table is divided among more partitions/nodes you'd get more throughput, but in reality it is the opposite unless you scale your throughput with the size of your table.
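A minimal sketch of that division under the classic behaviour the answer describes (provisioned throughput split evenly across partitions, read and write capacity tracked separately); the partition count and request figures are made up for illustration:

```python
# Hypothetical provisioned table split across partitions the "classic" way.
provisioned_rcu = 1_000          # read capacity units for the whole table
provisioned_wcu = 500            # write capacity units for the whole table
partition_count = 4              # grows with table size, not with provisioned throughput

rcu_per_partition = provisioned_rcu / partition_count   # 250 RCU each
wcu_per_partition = provisioned_wcu / partition_count   # 125 WCU each
print(f"Per-partition budgets: {rcu_per_partition:.0f} RCU, {wcu_per_partition:.0f} WCU")

# A hot key pushes one partition past its share even though the table total is fine.
reads_on_hot_partition = 400
if reads_on_hot_partition > rcu_per_partition:
    print("ProvisionedThroughputExceededException on the hot partition "
          f"({reads_on_hot_partition} > {rcu_per_partition:.0f} RCU), "
          f"despite table-level usage of {reads_on_hot_partition}/{provisioned_rcu} RCU")
```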
