I created a couple of CosmosDB collections with the minimum 400 RU/s during my Azure trial.
I was able to scale them up and down on demand, typically between 400 and 5000 RU/s.
I filled one of my collections with lots of test data (I'm currently at approx. 50GB in there), evenly split across 8 partitions (according to the "metrics" view in the portal).
I'm not able to scale the collection down to 400 RU/s anymore. The new minimum is shown as 800 RU/s:
Screenshot from my portal
I suspect that it has something to do with the number of partitions but I wasn't able to find anything about this in the documentation.
This is confusing; my understanding was that the RU/s could be scaled down to 400 at any time.
My goal is to scale down the RU/s as much as possible and I was hoping to be able to get back to 400 RU/s.
When you have collection-level throughput provisioned, the minimum amount of RUs you can allocate is equal to 100 * the number of physical partitions, because the minimum number of RUs per physical partition is 100.
400 is the default minimum because partitioned collections come out of the box with 4 physical partitions.
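A quick back-of-envelope check of that rule in plain Python (the 8-partition figure is taken from the question's metrics view; the constants are the ones stated above):

```python
MIN_RU_PER_PHYSICAL_PARTITION = 100  # minimum RUs per physical partition, per the rule above
MIN_RU_PER_CONTAINER = 400           # absolute floor for a partitioned collection

def minimum_throughput(physical_partitions: int) -> int:
    """Lowest RU/s the portal will let you scale down to, per the rule above."""
    return max(MIN_RU_PER_CONTAINER,
               MIN_RU_PER_PHYSICAL_PARTITION * physical_partitions)

print(minimum_throughput(4))  # 400 -> the out-of-the-box minimum
print(minimum_throughput(8))  # 800 -> matches the 8-partition collection in the question
```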
We are using a Cosmos dataset as a sink for a data flow in our pipeline. The Cosmos database is configured to use 400 RUs during normal operations, but we upscale it during the pipeline run. The upscaling works flawlessly.
The pipeline consumes 100% of the provisioned throughput, as is expected. We would like to limit this to about 80%, so that our customers don't experience delays and timeout exceptions. According to the documentation, the "Write throughput budget" setting in the Cosmos sink is supposed to be "An integer that represents the RUs you want to allocate for this Data Flow write operation, out of the total throughput allocated to the collection". Unless I am mistaken, this means you can set a limit on how many RUs the pipeline is allowed to consume.
However, no matter what value we use for "Write throughput budget", the pipeline always consumes ~10% of the total provisioned throughput. We have tested with a wide range of values, and the result is always the same. If we do not set a value, 100% of the RUs are consumed, but ~10% is always used whether we set the value to 1, 500, 1000, or even 1200 (of a total of 1000).
Does anyone know if this is a bug with the ADF Cosmos sink, or have I misunderstood what this setting is supposed to be? Is there any other way of capping how many Cosmos RUs an ADF pipeline is allowed to use?
EDIT:
This is definitely related to data size. Setting provisioned throughput to 10000 RUs and write throughput budget to 7500 uses ~85% of total RUs when we test with 300 000 documents. Using the same settings, but 10 000 000 documents, we see a consistent ~10% RU usage for the pipeline run.
The solution to our problem was to set "Write throughput budget" to a much higher value than the provisioned throughput. Data size and the number of partitions used in the pipeline definitely have an effect on what settings you should use. For reference, we had ~10 000 000 documents of 455 bytes each. With throughput set to 10 000 and the write throughput budget set to 60 000, ADF used on average ~90% of the provisioned throughput.
I recommend trial and error for your specific case, and don't be afraid to set the write budget to a much higher value than you think is necessary.
It will depend on the number of partitions. Check how many partitions you have at the sink.
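One way to reason about the behaviour above (purely a working hypothesis consistent with the observed numbers, not documented ADF behaviour): if the budget is effectively divided across the partitions the data flow writes with, a budget close to the provisioned throughput leaves each writer only a small slice. The partition count below is an assumed figure for illustration.

```python
def per_partition_budget(write_throughput_budget: int, sink_partitions: int) -> float:
    # Hypothesis only: assume the budget is split evenly across the sink's partitions.
    return write_throughput_budget / sink_partitions

partitions = 8  # assumed number of partitions at the sink, for illustration

# Budget close to the provisioned 10 000 RU/s -> each writer limits itself to a small slice.
print(per_partition_budget(7_500, partitions))   # 937.5 RU/s per writer
# Budget well above provisioned (as in the edit above) -> each writer has enough headroom
# to keep the container close to fully utilised.
print(per_partition_budget(60_000, partitions))  # 7500.0 RU/s per writer
```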
Understanding so far..
Logical partitions are mapped to physical partitions and we have no control over the number of physical partitions. One physical partition can contain multiple logical partitions.
I also understand that provisioned RUs are divided equally among physical partitions.
The question..
Say I have a 500 RU limit, 1 million distinct partition key values and 50GB of data. That's 1 million logical partitions.
Will the container's logical partitions be grouped on a small pool of physical partitions that are reserved exclusively for our use? E.g. among 5 physical partitions, so each partition has 100 RUs?
Or will each logical partition end up being stored somewhere random on physical partitions shared with other Cosmos users? Thus my 500 RUs is actually 500 divided by a really, really high number of physical partitions (at most 1 million), with queries likely to fail as the per-physical partition RU limit is exceeded?
My understanding is that it's the former, but I want to validate this at the planning stage!
The minimum RU has some relationship with the size of your data. Recall that 400 is the lowest possible RU for any container, but for 50 GB worth of data your minimum RU is above that; in fact it is over 2000.
Let's say it is 5000. That 5000 RU is distributed over all your physical partitions, correct, and in your case one physical partition will hold more than one logical partition. As for where exactly those partitions are physically stored - that is not published, so it is unknown.
What is known is that the performance and availability SLA is the same. Hope this helps.
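To make the "distributed over all your physical partitions" point concrete, a tiny sketch (the partition count is an assumed figure for illustration; the real number is managed by the service and not exposed directly):

```python
def ru_per_physical_partition(total_ru: int, physical_partitions: int) -> float:
    # Provisioned throughput is split evenly across physical partitions.
    return total_ru / physical_partitions

# e.g. 5000 RU/s spread over an assumed 5 physical partitions:
print(ru_per_physical_partition(5000, 5))  # 1000.0 RU/s per partition
# A query that lands entirely on one hot logical partition can only use
# its physical partition's share (1000 RU/s here), not the full 5000.
```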
The DynamoDB documentation describes how table partitioning works in principle, but it's very light on specifics (i.e. numbers). Exactly how, and when, does DynamoDB table partitioning take place?
I found this presentation by Rick Houlihan (Principal Solutions Architect, DynamoDB) from the AWS Loft San Francisco on 20th January 2016.
The presentation is also on YouTube.
This slide provides the important detail on how/when table partitioning occurs:
And below I have generalised the equations so you can plug in your own values.
Partitions by capacity = (RCUs/3000) + (WCUs/1000)
Partitions by size = TableSizeInGB/10
Total Partitions = Take the largest of your Partitions by capacity and Partitions by size. Round this up to an integer.
In summary, a partition can serve a maximum of 3000 RCUs and 1000 WCUs, and hold at most 10 GB of data. Once partitions are created, RCUs, WCUs and data are spread evenly across them.
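Plugging the slide's equation into a few lines of Python (the 3000/1000/10 figures are the historical values quoted above, not something DynamoDB exposes through an API):

```python
import math

def dynamodb_partitions(rcus: int, wcus: int, table_size_gb: float) -> int:
    """Partition estimate from the slide: whichever of capacity or size demands more."""
    by_capacity = rcus / 3000 + wcus / 1000
    by_size = table_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# e.g. a table with 5000 RCUs, 2000 WCUs and 25 GB of data:
print(dynamodb_partitions(5000, 2000, 25))  # max(1.67 + 2.0, 2.5) = 3.67 -> 4 partitions
```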
Note that, to the best of my knowledge, once you have created partitions, lowering RCUs, WCUs and removing data will not result in the removal of partitions. I don't currently have a reference for this.
Regarding the "removal of partitions" point Stu mentioned.
You don't directly control the number of partitions, and once partitions are created they cannot be deleted => this behaviour can cause performance issues that are often unexpected.
Consider a Table that has 500 WCU assigned and, for this example, 15 GB of data stored in it. This means we have hit the data-size cap (10 GB per partition), so we currently have 2 partitions between which the RCUs and WCUs are split (each partition can use 250 WCU).
Soon there will be an enormous increase in users (let's say Black Friday) who need to write data to the Table. So what you would do is increase the WCUs to 10000 to handle the load, right? Well, what happens behind the scenes is that DynamoDB hits another cap - WCU capacity per partition (max 1000) - so it creates 10 partitions across which the data is spread by the hashing function of our Table.
Once Black Friday is over, you decide to decrease the WCU back to 500 to save cost. What happens is that even though you decreased the WCU, the number of partitions does not decrease => now those 500 WCU are SPLIT between 10 partitions (so effectively every partition can only use 50 WCU).
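Putting numbers on that scenario with the partition formula from the earlier answer (a rough sketch of the heuristic, not an exact model of DynamoDB's internals):

```python
import math

def partitions(rcus: int, wcus: int, size_gb: float) -> int:
    # Same heuristic as above: capacity-based vs. size-based, whichever is larger.
    return math.ceil(max(rcus / 3000 + wcus / 1000, size_gb / 10))

before = partitions(0, 500, 15)           # 2 partitions -> 250 WCU each
black_friday = partitions(0, 10_000, 15)  # 10 partitions -> 1000 WCU each
after = black_friday                      # partitions are not merged back after scaling down
print(500 / after)                        # 50.0 WCU per partition once WCU is back at 500
```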
You see the problem? This is often forgotten and can bite you if you don't plan properly for how the data will be used in your application.
TLDR: Always understand how your data will be used and plan your database design properly.
We are using Cosmos DB SQL API and here's a collection XYZ with:
Size: Unlimited
Throughput: 50000 RU/s
PartitionKey: Hashed
We are inserting 200,000 records, each ~2.1 KB in size and having the same value for the partition key column. To our knowledge, all docs with the same partition key value are stored in the same logical partition, and a logical partition should not exceed the 10 GB limit, whether we are on a fixed or unlimited-size collection.
Clearly our total data is not even 0.5 GB. However, in the metrics blade of Azure Cosmos DB (in portal), it says:
Collection XYZ has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (10000 RU/s per partition).
This does not match with what we have studied so far from the MSFT docs. Are we missing something? Why are these 5 partitions created?
When using the Unlimited collection size, by default you will be provisioned 5 physical partition key ranges. This number can change, but as of May 2018, 5 is the default. You can think of each physical partition as a "server", so your data will be spread amongst 5 physical "servers". As your data size grows, your data will automatically be re-distributed across more physical partitions. That's why getting the partition key right up front in your design is so important.
The problem in your scenario of having the same partition key (PK) for all 200k records is that you will have hot spots. You have 5 physical "servers", but only one will ever be used; the other 4 will sit idle, and the result is that you get less performance for the same price point. You're paying for 50k RU/s but will only ever be able to use 10k RU/s.
Change your PK to something that is more uniformly distributed. How, of course, depends on how you read the data; if you give more detail about the docs you're storing, we may be able to give a recommendation. If you're simply doing point lookups (calling ReadDocumentAsync() by each Document ID), then you can safely partition on the ID field of the document. This will spread all 200k of your docs across all 5 physical partitions, and your 50k RU/s throughput will be maximized. Once you do this, you will probably see that you can reduce the RU setting to something much lower and save a ton of money. With only 200k records of ~2.1 KB each, you could probably go as low as 2500 RU/s (1/20th of the cost you're paying now).
*Server is in quotes because each physical partition is actually a collection of many servers that are load-balanced for high availability and also throughput (depending on your consistency level).
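If you do go the "partition on the document ID" route, a minimal sketch with the azure-cosmos Python SDK might look like the following (account URL, key, database and container names are placeholders, and this is just one way to wire it up):

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")
database = client.get_database_client("mydb")            # placeholder database name
container = database.create_container_if_not_exists(
    id="XYZ",
    partition_key=PartitionKey(path="/id"),               # partition on the document id
)

# Writes now hash across all physical partitions instead of piling onto one.
container.upsert_item({"id": "order-42", "amount": 19.99})

# Point lookups stay cheap because the id doubles as the partition key.
doc = container.read_item(item="order-42", partition_key="order-42")
print(doc["amount"])
```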
From "How does partitioning work":
In brief, here's how partitioning works in Azure Cosmos DB:
You provision a set of Azure Cosmos DB containers with T RU/s (requests per second) throughput.
Behind the scenes, Azure Cosmos DB provisions physical partitions needed to serve T requests per second. If T is higher than the maximum throughput per physical partition t, then Azure Cosmos DB provisions N = T/t physical partitions. The value of maximum throughput per partition (t) is configured by Azure Cosmos DB; this value is assigned based on total provisioned throughput and the hardware configuration used.
.. and more importantly:
When you provision throughput higher than t*N, Azure Cosmos DB splits one or more of your physical partitions to support the higher throughput.
So, it seems your requested RU throughput of 50k is higher than that t mentioned above. Considering the numbers, it seems t is ~10k RU/s.
Regarding the actual value of t, CosmosDB team member Aravind Krishna R. has said in another SO post:
[---] the reason this value is not explicitly mentioned is because it will be changed (increased) as the Azure Cosmos DB team changes hardware, or rolls out hardware upgrades. The intent is to show that there is always a limit per partition (machine), and that partition keys will be distributed across these partitions.
You can discover the current value by saturating the writes for a single partition key at maximum throughput.
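Applying the N = T/t rule from the quoted docs to the numbers in this question (t here is just the ~10k RU/s estimate from above, not a published constant):

```python
import math

def physical_partitions(provisioned_ru: int, max_ru_per_partition: int) -> int:
    # N = T / t, rounded up to a whole number of partitions.
    return math.ceil(provisioned_ru / max_ru_per_partition)

T = 50_000   # provisioned throughput from the question
t = 10_000   # estimated maximum throughput per physical partition (assumption)
print(physical_partitions(T, t))  # 5 -> matches the "5 partition key ranges" in the portal
```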
I have a requirement where I need to initialise my DynamoDB table with a large volume of data (say around 1M records in 15 minutes), so I'll have to provision the WCU at 10k. After that my load is ~1k per second, so I'll decrease the WCU from 10k back to 1k. Are there any performance drawbacks or issues in decreasing the WCU?
Thanks
In general, assuming the write requests don't exceed the provisioned write capacity units (you haven't mentioned the item size), there should not be any performance issue.
If at any point you anticipate traffic growth that may exceed your provisioned throughput, you can simply update your provisioned throughput values via the AWS Management Console or Amazon DynamoDB APIs. You can also reduce the provisioned throughput value for a table as demand decreases. Amazon DynamoDB will remain available while scaling its throughput level up or down.
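For the API part, scaling a provisioned table up before the bulk load and back down afterwards can be done with a couple of boto3 calls (the table name is a placeholder; the capacity numbers are the ones from the question, and error handling is omitted):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def set_capacity(table_name: str, rcu: int, wcu: int) -> None:
    # Update provisioned throughput; the table stays available while it scales.
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    )
    # Wait until the table is ACTIVE again before continuing.
    dynamodb.get_waiter("table_exists").wait(TableName=table_name)

set_capacity("my-table", rcu=1000, wcu=10_000)  # before the 15-minute initial load
# ... run the bulk load ...
set_capacity("my-table", rcu=1000, wcu=1_000)   # back down for steady-state traffic
```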
Consider this scenario:
Assume the item size is 1.5KB in size.
First, you would determine the number of write capacity units required per item, rounding up to the nearest whole number, as shown following:
1.5 KB / 1 KB = 1.5 --> 2
The result is two write capacity units per item. Now, you multiply this by the number of writes per second (i.e. 1K per second).
2 write capacity units per item × 1K writes per second = 2K write capacity units
In this scenario, if you have provisioned only 1K WCU, DynamoDB would throw error code 400 on your extra requests.
If your application performs more reads/second or writes/second than your table’s provisioned throughput capacity allows, requests above your provisioned capacity will be throttled and you will receive 400 error codes. For instance, if you had asked for 1,000 write capacity units and try to do 1,500 writes/second of 1 KB items, DynamoDB will only allow 1,000 writes/second to go through and you will receive error code 400 on your extra requests. You should use CloudWatch to monitor your request rate to ensure that you always have enough provisioned throughput to achieve the request rate that you need.
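The sizing arithmetic from the scenario above in a few lines of Python, so you can plug in your own item size and write rate:

```python
import math

def required_wcu(item_size_kb: float, writes_per_second: int) -> int:
    # One WCU covers a write of up to 1 KB; round the item size up to a whole KB.
    return math.ceil(item_size_kb) * writes_per_second

print(required_wcu(1.5, 1000))  # 2000 WCU -> with only 1000 provisioned, extra writes get HTTP 400
```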
Yes, there is a potential impact.
Once you write at a high TPS, more partitions get created, and their number cannot be reduced later on.
If that number ends up higher than what the application eventually needs to run well, it can cause problems, because the lower provisioned capacity is then split across all of those partitions.
Read more about DynamoDB partitions here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html