The DynamoDB documentation describes how table partitioning works in principle, but it's very light on specifics (i.e. numbers). Exactly how, and when, does DynamoDB table partitioning take place?
I found this presentation produced by Rick Houlihan (Principal Solutions Architect, DynamoDB) from the AWS Loft San Francisco on 20th January 2016.
The presentation is also on YouTube.
This slide provides the important detail on how/when table partitioning occurs:
And below I have generalised the equation you can plug your own values into.
Partitions by capacity = (RCUs/3000) + (WCUs/1000)
Partitions by size = TableSizeInGB/10
Total Partitions = Take the largest of your Partitions by capacity and Partitions by size. Round this up to an integer.
In summary, a partition can contain a maximum of 3000 RCUs, 1000 WCUs and 10 GB of data. Once partitions are created, RCUs, WCUs and data are spread evenly across them.
Note that, to the best of my knowledge, once you have created partitions, lowering RCUs, WCUs and removing data will not result in the removal of partitions. I don't currently have a reference for this.
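The formulas above can be plugged into a few lines of code. This is my own sketch of the calculation (not an AWS tool), using the limits from the slide:

```python
import math

def dynamodb_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    """Estimate the partition count from the slide's formulas:
    a partition holds at most 3000 RCUs, 1000 WCUs, and 10 GB."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_size = table_size_gb / 10
    # Take the larger of the two and round up to an integer.
    return math.ceil(max(by_capacity, by_size))

print(dynamodb_partitions(5000, 500, 8))   # capacity-bound: 3 partitions
print(dynamodb_partitions(1000, 500, 62))  # size-bound: 7 partitions
```

With 5000 RCUs and 500 WCUs the capacity term is 5000/3000 + 500/1000 ≈ 2.17, which dominates the size term of 0.8 and rounds up to 3.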
Regarding the "removal of partitions" point Stu mentioned.
You don't directly control the number of partitions, and once the partitions are created they cannot be deleted. This behaviour can cause performance issues that are often unexpected.
Consider a table that has 500 WCUs assigned. For this example, assume it stores 15 GB of data. This means we have hit the data size cap (10 GB per partition), so we currently have 2 partitions between which the RCUs and WCUs are split (each partition can use 250 WCUs).
Soon there will be an enormous increase in traffic (let's say Black Friday) from users that need to write data to the table. So what you would do is increase the WCUs to 10,000 to handle the load, right? Well, what happens behind the scenes is that DynamoDB hits another cap - the WCU capacity per partition (max 1000) - so it creates 10 partitions, between which the data is spread by the hashing function on our table.
Once Black Friday is over, you decide to decrease the WCUs back to 500 to save cost. What happens is that even though you decreased the WCUs, the number of partitions will not decrease. Now you have to SPLIT those 500 WCUs between 10 partitions (so effectively every partition can only use 50 WCUs).
You see the problem? This is often forgotten and can bite you if you don't plan properly for how the data will be used in your application.
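The Black Friday scenario above can be checked with a little arithmetic. This is a hypothetical walkthrough using the partition formulas cited earlier in this thread:

```python
import math

def partitions(rcu: int, wcu: int, size_gb: float) -> int:
    """Partition estimate: max 3000 RCUs, 1000 WCUs, 10 GB per partition."""
    return math.ceil(max(rcu / 3000 + wcu / 1000, size_gb / 10))

before = partitions(0, 500, 15)   # size-bound: 2 partitions
print(500 / before)               # 250.0 WCUs per partition

peak = partitions(0, 10_000, 15)  # WCU-bound: 10 partitions
print(10_000 / peak)              # 1000.0 WCUs per partition

# Partitions are never merged, so scaling back down to 500 WCUs
# still leaves 10 partitions:
print(500 / peak)                 # 50.0 WCUs per partition
```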
TLDR: Always understand how your data will be used and plan your database design properly.
Related
In AWS documentation, it stated that
"For provisioned mode tables, you specify throughput capacity in terms of read capacity units (RCUs) and write capacity units (WCUs):
One read capacity unit represents **one strongly consistent read per second**, or two eventually consistent reads per second, for an item up to 4 KB in size."
But what counts as one read? If I loop through different partitions to read from DynamoDB, will each loop count as one read? Thank you.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
For a GetItem or BatchGetItem operation, which reads an individual item, the size of the entire item is used to calculate the amount of RCUs (read capacity units) consumed, even if you only ask to read specific parts of the item. As you quoted, this size is then rounded up to a multiple of 4 KB: if the item is 3.9 KB you'll pay one RCU for a strongly-consistent read (ConsistentRead=true), and two RCUs for a 4.1 KB item. Again, as you quoted, if you asked for an eventually-consistent read (ConsistentRead=false) the number of RCUs would be halved.
For transactions (TransactGetItems) the number of RCUs is double what it would be for strongly-consistent reads.
For scans - Scan or Query - the cost is calculated the same way as reading a single item, except for one piece of good news: the rounding up happens on the entire size read, not on each individual item. This is very important for small items - for example, consider items of 100 bytes each. Reading each one individually costs you one RCU even though it's only 100 bytes, not 4 KB. But if you Query a partition that has 40 of these items, the total size of these 40 items is 4000 bytes, so you pay just one RCU to read all 40 items - not 40 RCUs. If the entire partition is 4 MB long, you'll pay 1024 RCUs when ConsistentRead=true, or 512 RCUs when ConsistentRead=false, to read the entire partition - regardless of how many items it contains.
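The per-item versus per-request rounding described above is easy to express in code. A small sketch of the arithmetic (my own illustration of the billing rules, not an AWS API):

```python
import math

RCU_UNIT = 4 * 1024  # RCUs are charged per 4 KB, rounded up

def get_item_rcu(item_size_bytes: int, consistent: bool = True) -> float:
    """GetItem/BatchGetItem: the rounding happens per item."""
    units = math.ceil(item_size_bytes / RCU_UNIT)
    return units if consistent else units / 2

def query_rcu(total_bytes_read: int, consistent: bool = True) -> float:
    """Query/Scan: the rounding happens on the total size read."""
    units = math.ceil(total_bytes_read / RCU_UNIT)
    return units if consistent else units / 2

print(get_item_rcu(100))           # 1 RCU even for a 100-byte item
print(query_rcu(40 * 100))         # 40 items x 100 bytes = 4000 bytes -> 1 RCU
print(query_rcu(4 * 1024 * 1024))  # a 4 MB partition -> 1024 RCUs
print(query_rcu(4 * 1024 * 1024, consistent=False))  # 512.0
```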
The documentation on the partitionKeyPath for Cosmos DB only points to large data and scaling. But what about small data that changes frequently? For example, a container with a TTL of a few seconds. Is the frequent creation and removal of logical partitions an overhead?
Should I use a static partition key value in this case for best performance?
Or should I use /id, because this is irrelevant if everything is in one physical partition?
TLDR: Use as granular an LP key as possible; the document id will do the job.
There are a couple of factors which affect the performance and results you get from your logical partition (LP) selection. When assessing your partitioning strategy you should bear in mind some limitations on Logical and Physical Partition (PP) sizing.
LP limitation:
Max 20 GB of documents
PP limitations:
Max 10k RU per physical partition
Max 50 GB of documents
Going beyond the PP limits will cause a partition split - the skewed PP will be replaced and its data split equally between two newly provisioned PPs. This affects the max RU per PP, as max throughput is calculated as [provisioned throughput]/[number of PPs].
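To make the split's effect on throughput concrete, here is a hypothetical example (the RU numbers are made up for illustration) of the [provisioned throughput]/[number of PPs] calculation:

```python
# Cosmos DB spreads provisioned RU/s evenly across physical partitions,
# so a partition split lowers the per-partition maximum.
provisioned_ru = 12_000

pps_before = 2  # before the skewed PP splits
pps_after = 3   # one PP replaced by two new ones

print(provisioned_ru / pps_before)  # 6000.0 RU/s max per PP
print(provisioned_ru / pps_after)   # 4000.0 RU/s max per PP
```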
I definitely wouldn't suggest using a static LP key. Smaller logical partitions mean more maintainable and predictable performance for your container.
Very specific and unique data consumption patterns may benefit from larger LPs, but only if you're trying to micro-optimize queries for better performance and the majority of queries you run will filter data by the LP key. Moreover, even in this scenario there is a high risk of a major drawback: hot partitions and partition data skew for containers/DBs more than 50 GB in size.
Newbie to DDB here. I've been using a DDB table for a year now. Recently, I made improvements by compressing the payload using gzip (and representing it as a binary in DDB) and storing the new data in another newly created beta table. Overall compression was 3x. I expected the read latency (GetItem) to improve as well, as it's less data to be transported over the wire. However, I'm seeing that the read latency has increased from ~50 ms p99.9 to ~114 ms p99.9. I'm not sure how that happened, and was wondering if, because of the compression, I now have a lot more rows per partition (which I think is defined as <= 10 GB). I now have 3-4x more rows per partition. So I'm wondering: once DynamoDB determines the right partition for a partition key, how does it find the correct item within the partition? My gut feel is that this shouldn't lead to an increase in latency, as a simplified representation of the partition could be a giant hashmap, so it would just be a simple lookup. I'd appreciate any help here.
My DDB schema:
partition-key - user-id,dataset-name
range-key - update-timestamp
payload - used to be string, now is compressed/binary.
In my GetItem requests, I specify both partition key and range key.
According to your description, your change included two unrelated parts: you compressed the payload, and increased the number of items per partition. The first change - the compression - probably has little effect on the p99 latency (it could have a more noticeable effect on the mean latency - which, according to Little's Law, is related to throughput if your client has fixed concurrency - but I'd expect it to lower, not increase).
Some guesses as to what might have increased the p99 latency:
More items per partition means that DynamoDB (which uses a B-tree) needs to do more disk reads to find a specific item. Since each disk access has rare delays caused by queueing, this adds to the tail latency.
You said that the change caused each partition to hold more items, I guess this means you now have fewer partitions. If you have too few of them, you can start getting unbalanced load on the different DynamoDB partitions, and more contention and latency for specific "hot" partitions.
I don't know how you measure your latency. Your client now needs (I guess) to uncompress the returned result, so maybe it is now busier, adding queueing delays in the client? Can you lower your client's concurrency (how many client threads run in parallel) and see if the high tail latency is an artifact of the server design, or the client's design?
I have a bunch of documents. Right now only about 100,000, but I could potentially have millions. These documents are about 15 KB each.
Right now the way I'm calculating the partition key is to take the Id field from Sql, which is set to autoincrement by 1, and dividing that number by 1000. I think this is not a good idea.
Sometimes I have to hit the CosmosDB very hard with parallel writes. When I do this, the documents usually have very closely grouped SQL Ids. For example, like this:
12000
12004
12009
12045
12080
12090
12102
As you can see, all of these documents would be written at the same time to the same partition because they would all have a partition key of 12. And from the documentation I've read, this is not good. I should be spreading my writes across partitions.
I'm considering changing this so that the partition key is the SQL Id divided by 10,000, plus the last digit, assuming that the group of Ids being written at the same time is randomly distributed (which it pretty much is).
So like this:
(12045 / 10000).ToString() + (12045 % 10).ToString()
This means, given my list above, the partition keys would be:
12000: 10
12004: 14
12009: 19
12045: 15
12080: 10
12090: 10
12102: 12
Instead of writing all 7 to a single partition, this will write all 7 to partitions 10, 12, 14, 15, and 19 (5 total). Will this result in faster write times? What are the effects on read time? Am I doing this right?
Also, is it better to have the first part of the key be the Id / 1000 or Id / 1000000? In other words, is it better to have lots of small partitions or should I aim to fill up the 10 GB limit of single partitions?
You should aim at evenly distributing load between your partitions. 10 GB is the limit; you shouldn't aim to hit that limit (because that would mean you won't be able to add documents to the partition anymore).
Creating a synthetic partition key is a valid way to distribute your documents evenly between partitions. It's up to you to find/invent a key that fits your load pattern.
You could simply take the last digit of your Id, thus nicely spreading the documents over exactly 10 partitions.
Regarding your comment on max partitions: the value of the partition key is hashed, and THAT hash determines the physical partition. So when your partition key has 1,000 possible values, it does not mean you have 1,000 partitions.
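To make the comparison concrete, here is a small sketch (my own illustration) of both the questioner's divide-by-10,000-plus-last-digit scheme and the simpler last-digit key suggested above:

```python
def synthetic_key(sql_id: int) -> str:
    """The questioner's scheme: id // 10,000, with the id's last digit appended."""
    return str(sql_id // 10_000) + str(sql_id % 10)

def last_digit_key(sql_id: int) -> str:
    """The simpler alternative: just the last digit, giving 10 possible keys."""
    return str(sql_id % 10)

ids = [12000, 12004, 12009, 12045, 12080, 12090, 12102]

print({i: synthetic_key(i) for i in ids})
# {12000: '10', 12004: '14', 12009: '19', 12045: '15',
#  12080: '10', 12090: '10', 12102: '12'}

print(sorted({last_digit_key(i) for i in ids}))
# ['0', '2', '4', '5', '9'] -> the 7 writes land on 5 distinct keys
```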
I have a question...
If I have 1000 items with the same partition key in a table, and I make a query for that partition key with a limit of 10, does it consume read capacity units for 1000 items or for just the 10 items?
Please clear up my doubt.
I couldn't find the exact point in the DynamoDB documentation, but from my experience it consumes capacity only for the returned limit, which is 10 (not 1000).
You can also quickly verify this yourself; as the documentation notes, "you can specify the ReturnConsumedCapacity parameter in a Query request to obtain this information."
The Limit option limits the number of results returned. The capacity consumed depends on the size of the items and on how many of them are accessed (I say accessed because, if you have filters in place, items that get filtered out are still read, so more capacity may be consumed than the returned items alone would suggest).
The reason I mention this is that, for queries, each 4 KB of data read is equivalent to 1 read capacity unit.
Why is this important? Because if your items are small, then for each capacity unit consumed you could return multiple items.
For example, if each item is 200 bytes in size, you could be returning up to 20 items for each capacity unit.
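The 200-byte example above works out like this (a quick sketch of the arithmetic, assuming strongly consistent reads and the 4 KB unit quoted earlier):

```python
# With small items, one Query RCU can cover many items, because the
# 4 KB rounding applies to the total bytes read, not to each item.
item_size = 200           # bytes per item
rcu_bytes = 4 * 1024      # one read capacity unit covers 4 KB

items_per_rcu = rcu_bytes // item_size
print(items_per_rcu)      # 20 items per capacity unit
```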
According to the AWS documentation:
The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off.
It seems to me that this means it will not consume capacity units for all the items with the same partition key. In your example, the consumed capacity units will be for your 10 items.
However, since I did not test it, I cannot be sure; that is simply how I understand the documentation.