OK, so my understanding of read units is that a read costs 1 read unit per item, unless the item exceeds 4 KB, in which case read units = ceiling(item size in KB / 4).
However, when I submit a Scan asking for 80 items (provisioned throughput is 100), the response returns a ConsumedCapacity of either 2.5 or 3 read units. This is frustrating because 97% of the provisioned capacity is not being used. Any idea why this might be the case?
What is your item size for the 80 items? Looking at the documentation here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ProvisionedThroughputIntro.html
You can use the Query and Scan operations in DynamoDB to retrieve
multiple consecutive items from a table in a single request. With
these operations, DynamoDB uses the cumulative size of the processed
items to calculate provisioned throughput. For example, if a Query
operation retrieves 100 items that are 1 KB each, the read capacity
calculation is not (100 × 4 KB) = 100 read capacity units, as if those
items were retrieved individually using GetItem or BatchGetItem.
Instead, the total would be only 25 read capacity units ((100 * 1024
bytes) = 100 KB, which is then divided by 4 KB).
So if your items are small, that would explain why Scan is not consuming as much capacity as you expected. Also note that Scan uses eventually consistent reads by default, which consume half the read capacity units.
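For example, if the 80 items were roughly 250 bytes each (an assumption, since the question does not give the item size), the observed 2.5 RCU falls straight out of the arithmetic. A minimal Python sketch of the calculation:

```python
import math

def scan_rcu(total_bytes_scanned, consistent_read=False):
    """Estimate RCUs for a Scan/Query: round the *cumulative* size of the
    processed items up to the next 4 KB, then halve for eventually
    consistent reads."""
    rcu = math.ceil(total_bytes_scanned / 4096)
    return rcu if consistent_read else rcu / 2

# 80 items of ~250 bytes each -> ~20 KB scanned (assumed item size)
print(scan_rcu(80 * 250))          # 2.5 RCU (eventually consistent)
print(scan_rcu(80 * 250, True))    # 5 RCU (strongly consistent)
```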
Related
The AWS documentation states that:
"For provisioned mode tables, you specify throughput capacity in terms of read capacity units (RCUs) and write capacity units (WCUs):
One read capacity unit represents **one strongly consistent read per second**, or two eventually consistent reads per second, for an item up to 4 KB in size."
But what counts as one read? If I loop through different partitions to read from DynamoDB, will each loop iteration count as one read? Thank you.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
For GetItem and BatchGetItem operations, which read an individual item, the size of the entire item is used to calculate the number of RCUs (read capacity units) used, even if you only ask to read specific parts of the item. As you quoted, this size is then rounded up to a multiple of 4 KB: if the item is 3.9 KB you'll pay one RCU for a strongly consistent read (ConsistentRead=true), and two RCUs for a 4.1 KB item. Again, as you quoted, if you ask for an eventually consistent read (ConsistentRead=false) the number of RCUs is halved.
For transactions (TransactGetItems), the number of RCUs is double what it would have been with strongly consistent reads.
For scans - Scan or Query - the cost is calculated the same way as for reading a single item, except for one piece of good news: the rounding up happens on the total size read, not on each individual item. This is very important for small items. For example, consider items of 100 bytes each. Reading one individually costs you one RCU even though it's only 100 bytes, not 4 KB. But if you Query a partition that has 40 of these items, the total size of those 40 items is 4,000 bytes, so you pay just one RCU to read all 40 items, not 40 RCUs. If the size of the entire partition is 4 MB, you'll pay 1,024 RCUs when ConsistentRead=true, or 512 RCUs when ConsistentRead=false, to read the entire partition, regardless of how many items the partition contains.
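To make the difference concrete, here is a small Python sketch using the 100-byte example from above (strongly consistent numbers; halve them for eventually consistent reads):

```python
import math

ITEM_BYTES = 100          # example item size from the answer above
NUM_ITEMS = 40
UNIT = 4096               # 4 KB per read capacity unit

# GetItem/BatchGetItem: each item is rounded up to 4 KB individually
per_item_rcu = NUM_ITEMS * math.ceil(ITEM_BYTES / UNIT)      # 40 RCU

# Query/Scan: the cumulative size of the processed items is rounded up once
cumulative_rcu = math.ceil(NUM_ITEMS * ITEM_BYTES / UNIT)    # 1 RCU

print(per_item_rcu, cumulative_rcu)   # 40 1
```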
I am currently doing a batch load to DynamoDB and dividing our data items into batch units:
According to the limits documentation:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
Some of the limits are:
There are more than 25 requests in the batch.
Any individual item in a batch exceeds 400 KB.
The total request size exceeds 16 MB.
The big unknown for me is how, with 25 items of at most 400 KB each, the payload could exceed 16 MB. Accounting for table names of less than 255 bytes, etc., I don't understand the limit. Or am I missing something simple?
Thanks.
The 16 MB limit is actually the total size of the request, not just of the raw item data. Consider a batch of many small items: the serialized DynamoDB request map that wraps them can be larger than the combined size of the items themselves.
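If you are building the batches yourself, a common approach is simply to chunk items into groups of 25 before calling BatchWriteItem. A boto3 sketch under assumed names (the table name "my-table" and the "pk"/"payload" attributes are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "my-table"   # placeholder table name

def batch_write(items):
    """Write items in chunks of 25, the BatchWriteItem maximum.
    Unprocessed items are retried (a real job should back off between retries)."""
    for i in range(0, len(items), 25):
        chunk = items[i:i + 25]
        request = {TABLE_NAME: [{"PutRequest": {"Item": item}} for item in chunk]}
        response = dynamodb.batch_write_item(RequestItems=request)
        unprocessed = response.get("UnprocessedItems", {})
        while unprocessed:
            response = dynamodb.batch_write_item(RequestItems=unprocessed)
            unprocessed = response.get("UnprocessedItems", {})

# Items use the low-level attribute-value format, e.g.:
batch_write([{"pk": {"S": f"item-{n}"}, "payload": {"S": "x" * 100}}
             for n in range(100)])
```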
When I scan my data, there is a 1 MB limit per segment. Is it possible to retrieve all of the data across segments in one pass using an asynchronous parallel Scan in DynamoDB?
When doing a parallel scan, each Scan API call will return at most 1MB of data. So, if you increase the number of segments enough, you will be able to get all of your data in one trip. According to the documentation, the maximum number of segments is 1 million. That means that as long as you have enough provisioned read throughput on your table, and the size of your table is less than 976GB, reading the entire table in one round trip is possible. Each 1MB page will incur 128 RCU if ConsistentRead=false. Therefore, if each partition supports up to 3000 Reads per second, each partition can support reading up to 23 segments in parallel. Dividing 1 million segments by 23 segments per partition yields 43479 partitions required to support a simultaneous read of 976GB.
To create a table with 43479 partitions or more, find the next largest power of 2. In this case, the next largest power of 2 to 43479 is 2^16=65536. Provision a table with 65536*750 = 49152000 WPS and 49152000 RPS to create it with 65536 partitions. The instantaneous RCU consumption of this parallel scan will be 128 * 1000000 = 128 million RCU, so you would need to re-provision your table at 128 million RPS before performing the parallel scan.
For any of this to work, you will need to request increases to your account-level and table level provisioned capacity limits. Otherwise, in us-east-1, default table provisioning limits are 40k rps and 40k wps per table. If you provision a table at 40k reads per second, you can read a maximum of 312 segments in parallel, or 312 MB.
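A parallel scan is expressed through the Segment and TotalSegments parameters of the Scan API. A minimal boto3 sketch (the table name and segment count are placeholders; it follows pagination within each segment but does not handle throttling or retries):

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "my-table"    # placeholder
TOTAL_SEGMENTS = 8         # placeholder; tune to table size and provisioned RCU

def scan_segment(segment):
    """Scan one segment, following LastEvaluatedKey until the segment is exhausted."""
    items, start_key = [], None
    while True:
        kwargs = {
            "TableName": TABLE_NAME,
            "Segment": segment,
            "TotalSegments": TOTAL_SEGMENTS,
        }
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        page = dynamodb.scan(**kwargs)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items

# Run all segments concurrently and flatten the results.
with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    all_items = [item
                 for segment_items in pool.map(scan_segment, range(TOTAL_SEGMENTS))
                 for item in segment_items]
```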
How do I calculate RCU and WCU given a read throughput of 32 GB/s and a write throughput of 16 GB/s?
DynamoDB provisioned throughput is based on the size of each item and the number of items being read or written per second:
In DynamoDB, you specify provisioned throughput requirements in terms of capacity units. Use the following guidelines to determine your provisioned throughput:
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for items up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
One write capacity unit represents one write per second for items up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
Therefore, when determining your desired capacity, you need to know how many items you wish to read and write per second, and the size of those items.
Rather than seeking a particular GB/s, you should be seeking a given number of items that you wish to read/write per second. That is the functionality that your application would require to meet operational performance.
There are also some DynamoDB limits that would apply, but these can be changed upon request:
US East (N. Virginia) Region:
Per table – 40,000 read capacity units and 40,000 write capacity units
Per account – 80,000 read capacity units and 80,000 write capacity units
All Other Regions:
Per table – 10,000 read capacity units and 10,000 write capacity units
Per account – 20,000 read capacity units and 20,000 write capacity units
At 40,000 read capacity units x 4KB x 2 (eventually consistent) = 320MB/s
If my calculations are correct, your requirements are 100x this amount, so it would appear that DynamoDB is not an appropriate solution for such high throughputs.
Are your speeds correct?
Then comes the question of how you are generating so much data per second. A full-duplex 10GFC fiber runs at 2550MB/s, so you would need multiple fiber connections to transmit such data if it is going into/out of the AWS cloud.
Even 10Gb Ethernet only provides 10Gbit/s, so transferring 32GB would require 28 seconds -- and that's to transmit one second of data!
Bottom line: Your data requirements are super high. Are you sure they are realistic?
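For completeness, the arithmetic above can be sketched in a few lines of Python (the 4 KB item size is an assumption; the real answer depends on your actual item size and read consistency):

```python
import math

READ_BYTES_PER_SEC = 32 * 1024**3    # 32 GB/s from the question
WRITE_BYTES_PER_SEC = 16 * 1024**3   # 16 GB/s from the question
ITEM_BYTES = 4 * 1024                # assumed item size

reads_per_sec = READ_BYTES_PER_SEC / ITEM_BYTES
writes_per_sec = WRITE_BYTES_PER_SEC / ITEM_BYTES

# Eventually consistent reads: 1 RCU covers two 4 KB reads per second
rcu = math.ceil(reads_per_sec * math.ceil(ITEM_BYTES / 4096) / 2)
# Writes: 1 WCU covers one 1 KB write per second
wcu = math.ceil(writes_per_sec * math.ceil(ITEM_BYTES / 1024))

print(rcu, wcu)   # ~4.2 million RCU and ~16.8 million WCU
```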
If you click on the Capacity tab of your DynamoDB table, there is a capacity calculator link next to Estimated cost. You can use that to determine the read and write capacity units along with the estimated cost.
Read capacity units depend on the type of read you need (strongly consistent or eventually consistent), the item size, and the throughput you desire.
Write capacity units are determined by throughput and item size only.
For calculating item size, you can refer to this.
I have a requirement where I need to initialise my DynamoDB table with a large volume of data, say around 1M items in 15 minutes, so I'll have to provision around 10K WCU. After that my load is ~1K writes per second, so I'll decrease the WCU from 10K to 1K. Is there any performance drawback or issue in decreasing the WCU?
Thanks
In general, assuming your write requests don't exceed the provisioned write capacity units (you have not mentioned the item size), there should not be any performance issue.
If at any point you anticipate traffic growth that may exceed your
provisioned throughput, you can simply update your provisioned
throughput values via the AWS Management Console or Amazon DynamoDB
APIs. You can also reduce the provisioned throughput value for a table
as demand decreases. Amazon DynamoDB will remain available while
scaling its throughput level up or down.
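Updating provisioned throughput is a single API call. A minimal boto3 sketch (the table name and capacity values are placeholders; note that AWS limits how many times per day you can decrease a table's capacity):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Drop write capacity from 10,000 back to 1,000 once the bulk load is done.
dynamodb.update_table(
    TableName="my-table",                      # placeholder table name
    ProvisionedThroughput={
        "ReadCapacityUnits": 1000,
        "WriteCapacityUnits": 1000,
    },
)

# The update is asynchronous; the table stays available while it transitions
# from UPDATING back to ACTIVE.
```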
Consider this scenario:
Assume the item size is 1.5KB in size.
First, you would determine the number of write capacity units required per item, rounding up to the nearest whole number, as shown following:
1.5 KB / 1 KB = 1.5 --> 2
The result is two write capacity units per item. Now, you multiply this by the number of writes per second (i.e. 1K per second).
2 write capacity units per item × 1K writes per second = 2K write capacity units
In this scenario, DynamoDB would return error code 400 on your extra requests, because the workload needs 2K write capacity units but only 1K is provisioned.
If your application performs more reads/second or writes/second than
your table’s provisioned throughput capacity allows, requests above
your provisioned capacity will be throttled and you will receive 400
error codes. For instance, if you had asked for 1,000 write capacity
units and try to do 1,500 writes/second of 1 KB items, DynamoDB will
only allow 1,000 writes/second to go through and you will receive
error code 400 on your extra requests. You should use CloudWatch to
monitor your request rate to ensure that you always have enough
provisioned throughput to achieve the request rate that you need.
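The scenario above can be checked with a couple of lines of Python (the item size and write rate are the example values from the answer):

```python
import math

ITEM_KB = 1.5          # example item size from the scenario above
WRITES_PER_SEC = 1000  # ~1K writes per second

wcu_per_item = math.ceil(ITEM_KB / 1.0)          # 1.5 KB -> 2 WCU per write
wcu_required = wcu_per_item * WRITES_PER_SEC     # 2,000 WCU

provisioned_wcu = 1000
print(wcu_required, wcu_required > provisioned_wcu)  # 2000 True -> expect throttling
```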
Yes, there is a potential impact.
Once you write at a high TPS, more partitions get created, and the number of partitions is not reduced later on.
If that number ends up higher than what the application eventually needs, it can cause problems, because the table's throughput is split across all of those partitions.
Read more about DynamoDB partitions here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html
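As a rough back-of-the-envelope check (based on the partition sizing guidance that used to appear in the DynamoDB documentation; actual partitioning is managed by the service and may differ), you can estimate how many partitions a bulk-load provisioning level might create:

```python
import math

def estimated_partitions(rcu, wcu, table_size_gb):
    """Rough estimate only: partitions driven by throughput (~3,000 RCU or
    ~1,000 WCU each) or by storage (~10 GB each), whichever is larger."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    return max(by_throughput, by_size, 1)

# Bulk-loading at 10,000 WCU can create on the order of ten partitions...
print(estimated_partitions(rcu=1000, wcu=10000, table_size_gb=5))   # ~11
# ...which remain even after the table is scaled back down to 1,000 WCU.
print(estimated_partitions(rcu=1000, wcu=1000, table_size_gb=5))    # ~2
```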