Why is Cosmos DB throttling when the RU limit has not been crossed - azure-cosmosdb

From my understanding, throttling should only happen when consumption exceeds the provisioned RUs. But from what I see below, this collection has not even crossed the RU threshold (actual value 2457), yet I am getting HTTP 429.
In another collection I can see that the RU threshold has been crossed, yet there is no throttling.

I guess "Max consumed RU/s per partion key range" is not aggregated over 1 minute. So you have more than one request in that exact throttled minute and it sums up with the request for 2457.

Related

DynamoDB ConsumedWriteCapacityUnits vs Consumed CloudWatch Metrics for 1 second Period

I am confused by this live chart: ConsumedWriteCapacityUnits is exceeding the provisioned units, while "consumed" is way below. Do I have a real problem or not?
This only seems to show up with the one-second period, not with the one-minute period.
(Screenshots: one-second period vs one-minute period)
Your period is wrong for the metrics. DynamoDB emits metrics at the following periods:
ConsumedCapacity: 1 min
ProvisionedCapacity: 5 min
For ConsumedCapacity you should divide the metric (the Sum statistic) by the period, but only use a period of at least 1 minute.
Exceeding provisioned capacity for short periods of time is fine, as burst capacity will allow you to do so. But if you exceed it for long periods it will lead to throttling.
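As an illustration of reading the metric at a supported period, here is a rough boto3 sketch (the table name is a placeholder): pull ConsumedWriteCapacityUnits as a Sum over 60-second periods and divide by 60 to get the average WCU consumed per second, which is the number to compare against the provisioned capacity.

```python
# Rough sketch with boto3; "my-table" is a placeholder table name.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "my-table"}],
    StartTime=end - timedelta(hours=1),
    EndTime=end,
    Period=60,               # ConsumedCapacity is emitted at 1-minute granularity
    Statistics=["Sum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    # Sum over the 60 s period divided by 60 = average WCU/s for that minute.
    print(point["Timestamp"], point["Sum"] / 60.0)
```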

Azure Data Factory "write throughput budget" setting in Cosmos sink always caps RU usage to ~10%

We are using a Cosmos dataset as a sink for a data flow in our pipeline. The Cosmos database is configured to use 400 RUs during normal operations, but we upscale it during the pipeline run. The upscaling works flawlessly.
The pipeline consumes 100% of the provisioned throughput, as expected. We would like to limit this to about 80%, so that our customers don't experience delays and timeout exceptions. According to the documentation, the "Write throughput budget" setting in the Cosmos sink is supposed to be "An integer that represents the RUs you want to allocate for this Data Flow write operation, out of the total throughput allocated to the collection". Unless I am mistaken, this means that you can set a limit to how many RUs the pipeline is allowed to consume.
However, no matter what value we use for "Write throughput budget", the pipeline always consumes ~10% of the total provisioned throughput. We have tested with a wide range of values, and the result is always the same. If we do not set a value, 100% of RUs are consumed, but ~10% is always used whether we set the value to 1, 500, 1000, or even 1200 (out of a total of 1000).
Does anyone know if this is a bug with the ADF Cosmos sink, or have I misunderstood what this setting is supposed to be? Is there any other way of capping how many Cosmos RUs an ADF pipeline is allowed to use?
EDIT:
This is definitely related to data size. Setting provisioned throughput to 10000 RUs and write throughput budget to 7500 uses ~85% of total RUs when we test with 300 000 documents. Using the same settings, but 10 000 000 documents, we see a consistent ~10% RU usage for the pipeline run.
The solution to our problem was to set "write throughput budget" to a much higher value than the provisioned throughput. Data size and the number of partitions used in the pipeline definitely have an effect on which settings you should use. For reference, we had ~10 000 000 documents of 455 bytes each. With throughput set to 10 000 and write throughput budget set to 60 000, ADF used on average ~90% of the provisioned throughput.
I recommend trial and error for your specific case, and to not be afraid to set the write budget to a much higher value than you think is necessary.
It will depend on the number of partitions. Check how many partitions you have at the sink.
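Side note on the scale-up/scale-down around the pipeline run mentioned in the question: a minimal sketch of doing that programmatically with the Python azure-cosmos SDK is below (account, key, names and RU values are placeholders). If the container uses shared database-level throughput, call replace_throughput on the database object instead of the container.

```python
# Minimal sketch (Python azure-cosmos SDK); account, key, names and RU values
# are placeholders. Scale up before the heavy write job, scale back down after.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")
container = client.get_database_client("mydb").get_container_client("mycoll")

def run_with_scaled_throughput(run_pipeline, high_ru=10000, normal_ru=400):
    """Temporarily raise provisioned RU/s for a bulk write, then restore it."""
    container.replace_throughput(high_ru)
    try:
        run_pipeline()  # e.g. trigger the ADF pipeline / data flow here
    finally:
        container.replace_throughput(normal_ru)
```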

Scaling a CosmosDB collection - the minimum has increased recently

I created a couple of CosmosDB collections with the minimum 400 RU/s during my Azure trial.
I was able to scale them up and down on demand, typically between 400 and 5000 RU/s.
I filled one of my collections with lots of test data (I'm currently at approx. 50GB in there), evenly split across 8 partitions (according to the "metrics" view in the portal).
I'm not able to scale the collection down to 400 RU/s anymore. The new minimum is shown as 800 RU/s:
Screenshot from my portal
I suspect that it has something to do with the number of partitions but I wasn't able to find anything about this in the documentation.
This is confusing; my understanding was that the RU/s could be scaled down to 400 at any time.
My goal is to scale down the RU/s as much as possible and I was hoping to be able to get back to 400 RU/s.
When you have collection-level throughput provisioned, the minimum amount of RUs you can allocate is equal to 100 * the number of physical partitions, because the minimum number of RUs per physical partition is 100.
400 is the default minimum because partitioned collections come out of the box with 4 physical partitions.
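That matches the situation described above: with 8 physical partitions, 8 × 100 RU/s = 800 RU/s, which is exactly the new minimum the portal is showing.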

Billing in CosmosDb when increasing RU for 5 minutes

How would billing work in Azure CosmosDb if I use the SDK to increase the throughput for a small amount of time, like 5 minutes?
Will I be charged one hour of the max RU or just a fraction of the hour?
Indeed, Cosmos DB charges you for the highest provisioned throughput within each hour. It is also cycle based, so if you increase at 01:58 and decrease at 02:03 (the billing cycle boundary may not fall exactly on the clock hour), you could be charged for 2 hours at the higher rate.
Reserved RUs/second (per 100 RUs, 400 RUs minimum): £0.006/hour
"You're billed the flat rate for each hour the container or database exists, regardless of usage or if the container or database is active for less than an hour. For example, if you create a container or database and delete it 5 minutes later, your bill will reflect 1 hour."
More info here: https://azure.microsoft.com/en-us/pricing/details/cosmos-db/
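A worked example using the rate listed above: scaling from 400 to 10,000 RU/s at 01:58 and back down at 02:03 makes 10,000 RU/s the highest provisioned throughput in both hourly cycles, so both hours are billed at 100 × £0.006 = £0.60, i.e. £1.20 in total, instead of the 2 × £0.024 = £0.048 you would pay for two hours at 400 RU/s.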

Impact of Decrease of dynamodb WCU

I have a requirement where I need to initialise my DynamoDB table with a large volume of data, say around 1M items in 15 minutes, so I will have to provision around 10k WCU. After that my load is ~1k writes per second, so I will decrease the WCU from 10k to 1k. Is there any performance drawback or issue in decreasing WCU?
Thanks
In general, assuming each write request doesn't exceed the provisioned write capacity units (you haven't mentioned the item size), there should not be any performance issue.
If at any point you anticipate traffic growth that may exceed your
provisioned throughput, you can simply update your provisioned
throughput values via the AWS Management Console or Amazon DynamoDB
APIs. You can also reduce the provisioned throughput value for a table
as demand decreases. Amazon DynamoDB will remain available while
scaling its throughput level up or down.
Consider this scenario:
Assume the item size is 1.5 KB.
First, you would determine the number of write capacity units required per item, rounding up to the nearest whole number, as shown following:
1.5 KB / 1 KB = 1.5 --> 2
The result is two write capacity units per item. Now, you multiply this by the number of writes per second (i.e. 1K per second).
2 write capacity units per item × 1K writes per second = 2K write capacity units
In this scenario, with only 1K write capacity units provisioned, DynamoDB would throw error code 400 (throttling) on your extra requests.
If your application performs more reads/second or writes/second than
your table’s provisioned throughput capacity allows, requests above
your provisioned capacity will be throttled and you will receive 400
error codes. For instance, if you had asked for 1,000 write capacity
units and try to do 1,500 writes/second of 1 KB items, DynamoDB will
only allow 1,000 writes/second to go through and you will receive
error code 400 on your extra requests. You should use CloudWatch to
monitor your request rate to ensure that you always have enough
provisioned throughput to achieve the request rate that you need.
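For the mechanics of the change itself, here is a rough boto3 sketch (table name and capacity values are placeholders). Keep in mind that DynamoDB limits how many times per day you can decrease a table's provisioned throughput, so plan the scale-down rather than toggling it repeatedly.

```python
# Rough sketch with boto3; table name and capacity values are placeholders.
import boto3

dynamodb = boto3.client("dynamodb")

def set_capacity(table_name, read_units, write_units):
    """Update a table's provisioned throughput and wait until it is ACTIVE again."""
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    )
    dynamodb.get_waiter("table_exists").wait(TableName=table_name)

set_capacity("my-table", read_units=1000, write_units=10000)  # before the bulk load
# ... run the initial 1M-item load ...
set_capacity("my-table", read_units=1000, write_units=1000)   # steady-state traffic
```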
Yes, there is a potential impact.
Writing at a high TPS causes more partitions to be created, and that partition count cannot be reduced later on.
If the number of partitions ends up higher than what the application eventually needs, the provisioned capacity is spread across all of them, which can cause problems.
Read more about DynamoDB partitions here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html
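As a rough illustration (the exact partition-allocation behaviour is internal to DynamoDB and has changed over time): provisioning 10,000 WCU can leave you with on the order of ten physical partitions. After scaling back down to 1,000 WCU those partitions remain, the capacity is split between them, and a hot partition key can then be throttled well below the table's total provisioned throughput.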
