I'm new to CosmosDB and I need to store there some simple key/value pairs.
Is it mandatory to have a value of a collection a json format? If not, what can I mention as partitionKey?
Is it mandatory to have a value of a collection a json format?
The short answer is yes, Cosmos DB is no-sql database which contains key-value structure document. The document has a flexible schema, de-normalized data, it can have mixed type of data such as a string value, number, array or an object. More details,please refer to this guide.
If not, what can I mention as partitionKey?
Per my experience, when you design the partition key, you need to consider many factors.There is a nice paragraph listed here for your reference about choosing pk. My suggestion is that you could pick the most frequently queried column as pk.
Related
I have a DynamoDB table:
How shoul I filter entried in DB table where all keys are: access.role = "ADMIN"?
You would be best served by setting up an Global Index (GSI). You set the Partition Key equal to that attribute, and the Sort Key equal to some other attribute that you can guarantee will be unique. Then you use your SDK of choice or the Query option in the console, select the index, and query for partion_key = ADMIN
However. Be aware. Index's are a complete replication of the table. Dynamo is very good at this and relatively fast at doing so, but there is still the possibility that your index will be out of sync with the actual data. If you are not making the call against the index very often you are pretty much fine. If you are calling it very often, then you should restructure your table.
Dynamo is not an SQL. When setting up a dynamo schema you have to consider how you will access your data. your Access Patterns. You should design your data with your Partition Key as the data you will have when looking up (Ie: i always will have a user ID number) and your sort keys as the individual documents related to that PK (ie: a user has a document that is his profile data, a document that is his profile picture url, a document that is a list of his friends user numbers, a document that is ... ect)
Then you use Indexs for things like your question that you wont be doing very often.
First of all, I have table structure like this,
Users:{
UserId
Name
Email
SubTable1:[{
Column-111
Column-112
},
{
Column-121
Column-122
}]
SubTable2:[{
Column-211
Column-212
},
{
Column-221
Column-222
}]
}
As I am new to DynamoDB, so I have couple of questions regarding this as follows:
1. Can I create structure like this?
2. Can we set primary key for subtables?
3. Luckily, I found DynamoDB helper class to do some operations into my DB.
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But, don't know how to fetch only perticular subtable
4. Can we fetch only specific columns from my main table? Also need suggestion for subtables
Note: I am using .net core c# language to communicate with DynamoDB.
Can I create structure like this?
Yes
Can we set primary key for subtables?
No, hash key can be set on top level scalar attributes only (String, Number etc.)
Luckily, I found DynamoDB helper class to do some operations into my DB.
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But, don't know how to fetch only perticular subtable
When you say subtables, I assume that you are referring to Array datatype in the above sample table. In order to fetch the data from DynamoDB table, you need hash key to use Query API. If you don't have hash key, you can use Scan API which scans the entire table. The Scan API is a costly operation.
GSI (Global Secondary Index) can be created to avoid scan operation. However, it can be created on scalar attributes only. GSI can't be created on Array attribute.
Other option is to redesign the table accordingly to match your Query Access Pattern.
Can we fetch only specific columns from my main table? Also need suggestion for subtables
Yes, you can fetch specific columns using ProjectionExpression. This way you get only the required attributes in the result set
I wanted to store a set of strings which are not available in any other table in the schema. For example,
I have a table with country_id, country_name and animals
The column animals need to be a Collection (Set or List) and the animals in the list are not available in any other table of the schema. They are just plain java strings. I went through the link here. But it only mentions the ways to store a collection of objects that are available as a column in another table. Any help would be much appreciated. Thanks in advance. Cheers!
Just use a JDO AttributeConverter, and the Collection field can be converted into a String for example (stored in a single column). As per the docs
I have a JSON document with two properties deviceIdentity, version.
Partition Key for my collection is deviceIdentity.
My JSON documents comes with different versions I want to keep all versions of this document.
Like:
deviceIdentity1, v1
deviceIdentity1, v2
Two documents should be there.
Problem is since my PK is deviceIdentity, it is always updating the existing record even though I have defined a unique key constraint on deviceIdentity, version.
enter image description here
Any pointers will be of help!
I believe you are confusing partition key with primary key.
Partition key determines how data is scaled horizontally. This should not be unique as otherwise any read except exact document lookup would require to scan all partitions, which would be innefficient. In your case deviceIdentity may be a suitable candidate - all versions of the same device would fall to same partition.
Primary key is your document identity (the field id). As you already noticed, there can be only 1 document with given id. The id field MUST be unique per document you want to store. In your case, you could use a combination values like "deviceIdentity1, v2" as the identity. Or, you could use technical unique id, like a guid.
Also, note that by Unique keys in Azure Cosmos DB:
By creating a unique key policy when a container is created, you ensure the uniqueness of one or more values per partition key.
Meaning if your partition key is deviceIdentity then you don't have to duplicate the deviceIdentity in unique constraint part. Constraint on /version would suffice to ensure that every single partition/device has at most one document per version.
Thanks for all the answers.
The problem was we have an old legacy system where “id “was an already heavily used property but it did not have unique values.
So whenever a document comes with a different version it was updating as “id” in cosmos has predefined meaning that is UPSERT of any arriving document is done on unique id value, in our case id is never unique.
Solution we found.
Whenever a document comes we process it in an azure function and swap “id” column with the value of unique “deviceidentity” value and save it, as structure of JSON cannot be changed as stated by our client and while reading these documents we have exposed an API which does the swapping again and sends the document to the requesting client as it is.
I have set the partition key of one of my Cosmos DBs to /partition.
For example: We have a Chat document that contains a list of Subscribers, then we have ChatMessages that contain a text, a reference to the author and some other properties. Both documents have a partition property that contains the type 'chat' and the chats id.
Chat example:
{
"id" : "955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
"name" : "SO questions",
"isChat" : true,
"partition" : "chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
"subscribers" : [
...
]
}
We then have Message documents like this:
{
"id" : "4d1c7b8c-bf89-47e0-83e1-a8cf0d71ce5a",
"authorId" : "some guid",
"isMessage" : true,
"partition" : "chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
"text" : "What should I do?"
}
It is now very convenient to return all messages for a specific chat, I just need to query all documents of the partition chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c with the property isMessage = true. All good...
But if I now want to query my db for a specific message by id, I usually just know the id, but not the partition and therefor have to run a slow crosspartition query. Which then led me to the question if I should not add the partitionKey to the message id so I can split the id when querying the db for a faster lookup. I saw that the _rid property of a document looks like a combination of the id of a db and the id of the collection and then a document specific id. What I mean by this is (simplified):
Chat.Id = "abc"
Chat.Partition = "chat_abc" //[type]_[chatId]
Message.Id = "chat_abc|123" //[Chat.Partition]|[Message.Id]
Message.Partition = chat_abc //[Chat.Partition]
Lets assume that I now want to get the Message document by the id, I just split the id by the | symbol and then query the document with the 1st part of the id as partition and the full id as the key.
Does that make sense? Are there better ways to do this? Should I just always also pass the partitionKey of a document along, not just it's id? Should I just use the _rid properties instead?
Any experience is highly appreciated!
UPDATE
I have found the following answer here:
Some applications encode partition key as part of the ID, e.g.
partition key would be customer ID, and ID = "customer_id.order_id",
so you can extract the partition key from the ID value.
I have further asked the cosmos team by email if this is a recommended pattern and post an answer, in case I get any.
Yes, your proposal to extract partition key from id (via a convention like a prefix/delimiter) makes sense. This is common among applications that have a single key and want to refactor it to use Cosmos DB from a different storage system.
If you're building your application from scratch, you should consider wiring the composite key (partition key + item key ("id")) through your API/application.
First, if you know your data (and index) size) will remain within the 10gb limit and you RU/sec limit is ok, then a fixed partition-less collection will bypass this problem. Probably OP has knowlingly made the decision that partitioning is required, but it is an important consideration to note for generalization purposes. If possible, KISS ;)
If partitioning is a must, then AFAIK you cannot avoid crosspartition split and its overhead unless you know the partition key.
Imho the OP suggestion of merging the duplicated partition key into id field is a rather ugly solution, because:
Name id implies it is unique key, partition key is not part of it or necessary for this key and its uniqueness. Anyone using this key upstream would incur the forced excess cost of longer key, blocked from using the simpler Guid type, etc.
It will become a mess should your partitioning key change in future.
The internal structure of merged id would not be intuitive without documentation - it's parts are not named and even if they look like to have a pattern new devs would not know for sure without finding external documentation to reliably understand what's going on.
Your data model does not require this duplication on semantic level, it would be for your application querying comfort and hence such hacks should belong to your application code, not data model. Such leaking concerns should be avoided if possible.
Data duplication within document would unnecessarily increase document size, bandwidth, etc. (may or may not be notable, depending on scale and usage). in-document duplication is necessary at times, but imho not necessarily in this case.
A better design would be to ensure the partition key is always present in logic context and could be passed along to lookups. If you don't have it available, then maybe you should refactor you application code (not data design) to explicitly pass around the chatId along with id where needed. That is WITHOUT merging them together into some opaque string format.
Also, I don't see a good way to use _rid for this as if I remember correctly, it did not contain any internal reference to a partition or partition key.
Disclaimer: I don't have any access or deep insight into internal CosmosDB index design or _rid logic on partitioned collections. I may have misunderstood how it works.