Choosing A PartitionKey in Azure Cosmos DB - azure-cosmosdb

I have a bunch of documents. Right now only about 100,000. But I could potentially have millions. These documents are each about 15KB each.
Right now the way I'm calculating the partition key is to take the Id field from Sql, which is set to autoincrement by 1, and dividing that number by 1000. I think this is not a good idea.
Sometimes I have to hit the CosmosDB very hard with parallel writes. When I do this, the documents usually have very closely grouped SQL Ids. For example, like this:
12000
12004
12009
12045
12080
12090
12102
As you can see, all of these documents would be written at the same time to the same partition because they would all have a partition key of 12. And from the documentation I've read, this is not good. I should be spreading my writes across partitions.
I'm considering changing this so that the PartitionKey is the Sql Id divided by 10,000 plus the last digit. Assuming that the group of Ids being written at the same time are randomly distributed (which they pretty much are).
So like this:
(12045 / 10000).ToString() + (12045 % 10).ToString()
This means, given my list above, the partition keys would be:
12000: 10
12004: 14
12009: 19
12045: 15
12080: 10
12090: 10
12102: 12
Instead of writing all 7 to a single partition, this will write all 7 to partitions 10, 12, 14, 15, and 19 (5 total). Will this result in faster write times? What are the effects on read time? Am I doing this right?
Also, is it better to have the first part of the key be the Id / 1000 or Id / 1000000? In other words, is it better to have lots of small partitions or should I aim to fill up the 10 GB limit of single partitions?

you should aim at evenly distributing load between your partitions. 10gb is the limit,you shouldn't aim to hit that limit (because that would mean you wont be able to add documents to the partition anymore).
Creating a synthetic partition key is a valid way to distribute your documents evenly between partitions. Its up to you to find\invent a key that would fit your load pattern.

You could simply take the last digit of your Id, thus nicely spreading the documents over exactly 10 partitions.
In regards to your comment on max partitions: the value of the partitionKey is hashed and THAT hash determines the physical partitions. So when your partitionKey has 1.000 possible values, it does not mean you have 1.000 partitions.

Related

Which partitionKeyPath should be used for frequently changed small data in Cosmos DB?

The documentation to the partitionKeyPath for the Cosmos DB only point to large data and scaling. But what is with small data which frequently changed. For example with a container with a TTL of some seconds. Is the frequently creating and removing of logical partitions an overhead?
Should I use a static partition key value in this case for best performance?
Or should I use the /id because this irrelevant if all is in one physical partition?
TLDR: Use as granular LP key as possible. document id will do the job.
There are couple factors which affect performance and results you get from logical partition (LP) selection. When assessing your partitioning strategy you should bear in mind some limitations on the Logical and Physical Partition (PP) sizing.
LP limitation:
Max 20GB documents
PP limitations:
Max 10k RU per one physical partition
Max 50GB documents
Going beyond the PP limits will cause partition split - skewed PP will be replaced and data split equally between two newly provisioned PPs. It has an effect on max RU per PP as max throughput is calculated based on [provisioned throughput]/[number of PPs]
I definitely wouldn't suggest using static LP key. Smaller logical partitions - more maintainable and predictable performance of your container.
Very specific and unique data consumption patterns may benefit from larger LPs but only if you're trying to micro-optimize queries for better performance and majority of queries you will be running will filter data by LP key. Moreover even for this scenario there is a high risk of a major drawback - hot partitions and partition data skew for containers/DBs with more than 50GB in size.

What will happen to my sqlite database ten years from now in terms of capacity and query speed

I have created a database with one single table (check the code bellow). I plan to insert 10 rows per minute, which is about 52 million rows in ten years from now.
My question is, what can I expect in terms of database capacity and how long it will take to execute select query. Of course, I know you can not provide me an absolute values, but if you can give me any tips on change/speed rates, traps etc. I would be very glad.
I need to tell you, there will be 10 different observations (this is why I will insert ten rows per minute).
create table if not exists my_table (
date_observation default current_timestamp,
observation_name text,
value_1 real(20),
value_1_name text,
value_2 real(20),
value_2_name text,
value_3 real(20),
value_3_name text);
Database capacity exceeds known storage device capacity as per Limits In SQLite.
The more pertinent paragraphs are :-
Maximum Number Of Rows In A Table
The theoretical maximum number of rows in a table is 2^64
(18446744073709551616 or about 1.8e+19). This limit is unreachable
since the maximum database size of 140 terabytes will be reached
first. A 140 terabytes database can hold no more than approximately
1e+13 rows, and then only if there are no indices and if each row
contains very little data.
Maximum Database Size
Every database consists of one or more "pages". Within a single
database, every page is the same size, but different database can have
page sizes that are powers of two between 512 and 65536, inclusive.
The maximum size of a database file is 2147483646 pages. At the
maximum page size of 65536 bytes, this translates into a maximum
database size of approximately 1.4e+14 bytes (140 terabytes, or 128
tebibytes, or 140,000 gigabytes or 128,000 gibibytes).
This particular upper bound is untested since the developers do not
have access to hardware capable of reaching this limit. However, tests
do verify that SQLite behaves correctly and sanely when a database
reaches the maximum file size of the underlying filesystem (which is
usually much less than the maximum theoretical database size) and when
a database is unable to grow due to disk space exhaustion.
Speed determination has many aspects and is thus not a simple how fast will it go, like a car. The file system, the memory, optimisation are all factors that need to be taken into consideration. As such the answer is the same as the length of the anecdotal piece of string.
Note 18446744073709551616 is if you utilise negative numbers otherwise the more frequently mentioned number of 9223372036854775807 is the limit (i.e a 64 bit signed integer)
To utilise negative rowid numbers and therefore the higher range you have to insert at least 1 negative value explicitly into a rowid or alias thereof as per If no negative ROWID values are inserted explicitly, then automatically generated ROWID values will always be greater than zero.

How does DynamoDB partition tables?

The DynamoDB documentation describes how table partitioning works in principle, but its very light on specifics (i.e. numbers). Exactly how, and when, does DynamoDB table partitioning take place?
I found this presentation produced by Rick Houlihan (Principal Solutions Architect DynamoDB) from AWS Loft San Franciso on 20th January 2016.
The presention is also on Youtube.
This slide provides the important detail on how/when table partitioning occurs:
And below I have generalised the equation you can plug your own values into.
Partitions by capacity = (RCUs/3000) + (WCUs/1000)
Partitions by size = TableSizeInGB/10
Total Partitions = Take the largest of your Partitions by capacity and Partitions by size. Round this up to an integer.
In summary a partition can contain a maximum of 3000 RCUs, 1000 WCUs and 10GB of data. Once partitions are created, RCUs, WCUs and data are spread evenly across them.
Note that, to the best of my knowledge, once you have created partitions, lowering RCUs, WCUs and removing data will not result in the removal of partitions. I don't currently have a reference for this.
Regarding the "removal of partitions" point Stu mentioned.
You don't directly control the number of partitions and once the partitions are created they cannot be deleted => this behaviour can cause performance issues which are many times not expected.
Consider you have a Table which has 500WCU assigned. For this example consider you have 15GB of data stored in this Table. This means we reached a data size cap (10GB per partition) thus we currently have 2 partitions between which the RCUs and WCUs are split (each partition can use 250WCU).
Soon there will be an enormous increase (let's say Black Friday) of users that needs to write the data to the Table. So what would you do is to increase the WCUs to 10000, to handle the load, right? Well, what happens behind the scenes is that DynamoDB has reached another cap - WCU capacity per partition (max 1000) - so it creates 10 partitions between which the data are spread by the hashing function in our Table.
Once the Black Friday is over - you decide to decrease the WCU back to 500 to save the cost. What will happen is that even though you decreased the WCU, the number of partitions will not decrease => now you have to SPLIT those 500 WCU between 10 partitions (so effectively every partition can only use 50WCU).
You see the problem? This is often forgotten and can bite you if you are not planning properly how the data will be used in your application.
TLDR: Always understand how your data will be used and plan your database design properly.

DocumentDB read latency when queried across partitions

I created 2 empty documentDB collections: 1) with single partition and 2) with multi-partition. Next inserted a single row on both these collections and ran a scan (select * from c). I found that the single partition took up ~2RUs whereas multi-partition took about ~50RUs. It's not just the RU's, but the read latency was about 20x slower with multi-partition. So is it that multi-partition always has high read latency when queried across partitions?
You can get the same latency for multi-partition collections as single-partition collections. Let's take the example of scans:
If you have non-empty collections, then the performance
will be the same as data is read from one of the partitions. Data is read from the first partition, and paginated across partitions in order.
If you use the MaxDegreeOfParallelism option, you'll get the same low
latencies. Note that query execution is serial by default, in order
to optimize for queries with larger datasets. If you use the
parallelism option, the query will have the same low latency
If you scan with a filter on partition key = value, then you'll get the
same performance even without the parallelism.
It is true that there is a small RU overhead for each partition touched during query (~2 RU per partition for query parsing). Note that this doesn't increase with query size, i.e., even if your query returned e.g. 1000 documents, the query will be 1000 + P*2 RUs for partitioned collections in place of 1000 RUs. You can eliminate this overhead of course by including a filter on partition key.

ratio between unique hash key and range key in dynamo db

Is it a problem if I choose my hash key and range key so that the number of unique hash keys is very low (maximum: 1000), while there are many more unique range keys?
Does the ratio between the number of unique hash and range keys affect the performance of retrieval of information?
It should not be a problem to have few hash keys with many range keys for each if:
The number of hash keys is not too low
Your access is randomly spread across the hash keys
You don't need to scale to extreme levels
According to the AWS Developer Guidelines for Working with Tables:
Provisioned throughput is dependent on the primary key selection, and
the workload patterns on individual items. When storing data, DynamoDB
divides a table's items into multiple partitions, and distributes the
data primarily based on the hash key element. The provisioned
throughput associated with a table is also divided evenly among the
partitions, with no sharing of provisioned throughput across
partitions.
Essentially, each hash key resides on a single node (i.e. server). Actually, it is redundantly stored to prevent data loss, but that can be ignored for this discussion. When you provision throughput you are indirectly determining the number of nodes to spread the hash keys across. However, no matter how much throughput you provision, it is limited for a single hash key by what a single node can handle.
To explain my three caveats:
1. The number of hash keys is not too low
You mention a max of 1000 hash keys, but the concern is what the minimum is. If for example there were only 10 hash keys then you would quickly reach the throughput limit for each key and would not actually realize the provisioned throughput.
2. Your access is randomly spread across the hash keys
It doesn't matter how many hash keys you have if there are a small number of keys that are "hot". That is if you are frequently reading or writing to only a small subset of the hash keys then you will reach the throughput limit of the nodes those keys are stored on.
3. You don't need to scale to extreme levels
Even assuming you have 1000 distinct hash keys and your access is randomly spread across them, if you need to scale to extreme levels you will eventually reach a point where each hash key is on a separate node. That is, if you provision enough throughput that each hash key is allocated to a separate node (i.e. you have 1000+ nodes), then any throughput provisioned beyond that level will not be realized because you will reach the limit of each node for each key.
The ratio of range keys to hash keys should have little to no affect on get, scan and query performance.
It is my understanding that the range keys for each hash key are efficiently stored in some kind of index that will scale well. However, remember that all the rows for a given hash key are stored together on the same node, so you can reach a point where there is too much data for a given hash key. The AWS Limits in DynamoDB states:
For a table with local secondary indexes, there is a limit on item
collection sizes: For every distinct hash key value, the total sizes
of all table and index items cannot exceed 10 GB. Depending on your
item sizes, this may constrain the number of range keys per hash
value.
As far as I know, this doesn't matter. The load distribution depends on the "frequency" of access and not on the "possible combinations". If your access is uniformly distributed across the 1000 keys you are taking about, then it is OK - This means the probability of fetching by key1 should me similar to probability of fetching key10 or key100. Internally I guess they would be bucketing your 1000 keys into say 3 groups and each of these groups "might" be served by 3 machines. You need to ensure that your access is nearly uniform so that all 3 machines get uniform load share.

Resources