I am thinking of creating a reporting tool for a Cosmos DB instance in our system. I want to discover all the partition keys and the number of items stored for each. I think I should be using pkranges but do not seem to be able to find any examples of how this should work. Any suggestions?
You can try to get the list of pkranges using this API.
As of now, there is no API that returns the list of partition keys. There was a feature request at Azure Cosmos DB Improvement Ideas, but it was declined.
So, you can try "SELECT DISTINCT c.partitionKey FROM c" to get the list of distinct partition key values (replace partitionKey with your container's partition key property).
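To also get the number of items per partition key, which the question asks for, one option is to tally the values client-side. The following is a minimal sketch, assuming the partition key values have already been fetched (one entry per item, for example from "SELECT VALUE c.partitionKey FROM c") into a list of non-null strings; the class and method names are hypothetical and no Cosmos DB SDK calls are shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionKeyTally {
    // Counts how many items carry each partition key value.
    // Assumption: partitionKeyValues holds one non-null entry per item,
    // already fetched from the container by whatever client you use.
    static Map<String, Long> countPerKey(List<String> partitionKeyValues) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys give stable report output
        for (String key : partitionKeyValues) {
            counts.merge(key, 1L, Long::sum); // start at 1, otherwise add 1
        }
        return counts;
    }
}
```

Fetching every item's key costs RUs in proportion to the container size. A server-side aggregate such as "SELECT c.partitionKey, COUNT(1) AS itemCount FROM c GROUP BY c.partitionKey" may be cheaper, but check that it fits your container and SDK version.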
If we are partitioning a container in the Cosmos DB SQL API, is it OK to have a partition key that is unique in each document? That is, each document in the container would have its own logical partition, and each logical partition would hold only one document. We only need to query on the unique key, so only one partition/document will be hit. Is there still any downside to such modeling in terms of performance or storage?
If you are using Cosmos DB SQL API as a key/value store and only reading using ReadItemAsync() there is no downside to doing this.
I am new to Cosmos DB. I chose Cosmos DB (Core SQL) and created a database with two containers, say EmployeeContainer and DepartmentContainer. Now I want to query these two containers and fetch employee details with the associated department details. I am stuck at this point and need help.
Below is the structure of my containers.
EmployeeContainer : ID, Name, DepartmentID
DepartmentContainer: ID, Name
Thanks in advance.
Cosmos DB is not a relational database. You do not store different entities in different containers if they are queried together. They are either embedded in other entities or stored as separate rows using a shared partition key with other entities in the same container.
Before you get too far with Cosmos you need to understand how to model and partition data to ensure the best possible performance. I strongly recommend you read the docs on partitioning and specifically read these docs below.
Data modeling in Cosmos DB
Partitioning in Cosmos DB
How to model and partition data - a real world example
And watch Data Modeling in Cosmos DB - What every relational developer should know
It completely depends on the type of data you are trying to model. Generally, it comes down to relationships. 1:1 or 1:few relationships are often best modeled by embedding the related items, especially where they are queried or updated together. 1:many or many:many relationships are usually better modeled by referencing, where the related items are queried or updated independently.
For great talks on these issues check out https://www.gotcosmos.com/conf/ondemand
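The difference between embedding and referencing can be sketched with the Employee/Department example from the question. This is only an illustration of the two document shapes in plain Java; the class names, field names, and the in-memory lookup map are hypothetical and stand in for whatever serialization and reads your application uses.

```java
import java.util.Map;

final class ModelingSketch {
    static final class Department {
        final String id;
        final String name;
        Department(String id, String name) { this.id = id; this.name = name; }
    }

    // Embedding: the department data is nested inside the employee document,
    // so a single read returns both (suits 1:1 or 1:few, read together).
    static final class EmployeeEmbedded {
        final String id;
        final String name;
        final Department department;
        EmployeeEmbedded(String id, String name, Department department) {
            this.id = id; this.name = name; this.department = department;
        }
    }

    // Referencing: the employee stores only the department's id, so the
    // department can change independently but needs a second lookup.
    static final class EmployeeReferencing {
        final String id;
        final String name;
        final String departmentId;
        EmployeeReferencing(String id, String name, String departmentId) {
            this.id = id; this.name = name; this.departmentId = departmentId;
        }
    }

    // The second lookup that referencing requires (here an in-memory map
    // stands in for a separate read against the data store).
    static Department resolve(EmployeeReferencing employee, Map<String, Department> departmentsById) {
        return departmentsById.get(employee.departmentId);
    }
}
```

With embedding, renaming a department means updating every employee document that carries a copy; with referencing, it is one update but every employee read that needs the name pays for an extra lookup.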
You can use a subquery.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#mimic-join-with-external-reference-data
But this may consume a lot of RUs, and only inner joins are supported for now.
We are using Cosmos DB for our data storage, and there is a case where I have to do a cross-partition query because I don't know the specific partition key. But I will know a part of it.
To elaborate, my partition key is a combination of multiple strings, let's say A-B,
and let's say I only know A but not B. So is there any way to do a wildcard search on the partition key?
Would that optimize the query, or is it not possible? Depending on that, I will consider whether to put A in the partition key at all.
Based on my research and the Partitioning in Azure Cosmos DB documentation, nothing indicates that Cosmos DB partition keys support wildcard searching. Only the indexing policy supports wildcards in property paths: https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#including-and-excluding-property-paths
So, for your situation, since you don't know B, I'd suggest using A alone as the partition key. Besides, you could vote up this thread: https://github.com/PlagueHO/CosmosDB/issues/153
I'm trying to execute a very large query that needs to return millions of records, so I want to partition the query and use multiple machines to process the results.
My logical partition key would be a UUID of a document, so that will not be very helpful for me to allocate different parts to each worker node. Can I get the physical partition ID and execute my query only within a particular physical partition?
Here's what I have tried:
FeedOptions feedOptions = new FeedOptions();
feedOptions.setEnableCrossPartitionQuery(false);
feedOptions.setPartitionKeyRangeIdInternal("0");
client.queryDocuments(collectionPath, "SELECT * FROM e where e.docType = 'address'", feedOptions)
    .flatMapIterable(FeedResponse::getResults);
But changing the partitionKeyRangeId doesn't seem to change the results at all.
Please advise.
To my knowledge, a query can't be restricted to a particular physical partition so far. I couldn't find any parameters related to physical partitions in the Cosmos DB REST API. The PartitionKeyRangeId you mentioned in your code is used in change feed requests.
Based on the statement in the official doc, we can't manage physical partitions in Cosmos DB:
Azure Cosmos DB will automatically scale the number of physical
partitions based on your workload. So you shouldn’t corelate your
database design based on the number of physical partitions instead you
should make sure to choose the right partition key which determines
the logical partitions.
However, the Cosmos DB team is open to feedback, so you could submit a request asking for further assistance if you do have such requirements related to physical partitions.
Hope it helps you.
Updated answer:
There are many ways to improve performance when processing large volumes of data; here is some personal advice.
1. Consider choosing a partition key that is more appropriate than the UUID, which can greatly improve performance.
2. Use the page size to limit the number of items returned per query, then parallelize querying and processing with multiple threads.
3. Increase the provisioned RUs to improve throughput.
For more ideas, please refer to this doc.
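Point 2 above (limit the page size, then process pages in parallel) can be sketched roughly as follows. This is a plain-Java illustration under stated assumptions: the results are already available as an in-memory list, the class and method names are hypothetical, and no Cosmos DB SDK calls are shown. In a real client the pages would come from the SDK's paged query results instead of `toPages`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PagedParallelSketch {
    // Splits items into consecutive pages of at most pageSize elements.
    static <T> List<List<T>> toPages(List<T> items, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int start = 0; start < items.size(); start += pageSize) {
            int end = Math.min(start + pageSize, items.size());
            pages.add(new ArrayList<>(items.subList(start, end)));
        }
        return pages;
    }

    // Runs worker on every page using a fixed-size thread pool and returns
    // the per-page results in the same order as the input pages.
    static <T, R> List<R> processPages(List<List<T>> pages, int threads,
                                       Function<List<T>, R> worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> page : pages) {
                Callable<R> task = () -> worker.apply(page);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that page is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a page", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a page worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count and page size are tuning knobs: larger pages mean fewer round trips, while more threads only help until the provisioned RUs or the worker machine become the bottleneck.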
How do we achieve functionality similar to the SQL DISTINCT keyword in Amazon DynamoDB?
DynamoDB does not support this kind of functionality, but you can achieve it in other ways (on the client side, or with a Lambda on the DynamoDB stream that updates another table with the distinct values).
You can find a good answer here: Retrieve distinct values from the hash key - DynamoDB
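The client-side option mentioned above can be sketched as below. This is a simplified illustration that assumes the items have already been read (for example by a scan) and flattened into string maps; the class name, method name, and attribute names are hypothetical, and no AWS SDK calls are shown.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DistinctSketch {
    // Returns the distinct values of one attribute, in first-seen order.
    // Assumption: items were already fetched and flattened to String maps.
    static Set<String> distinctValues(List<Map<String, String>> items, String attribute) {
        Set<String> seen = new LinkedHashSet<>();
        for (Map<String, String> item : items) {
            String value = item.get(attribute);
            if (value != null) { // skip items that lack the attribute
                seen.add(value);
            }
        }
        return seen;
    }
}
```

Note that this still reads every item, so for large tables the stream-plus-Lambda approach that maintains a separate table of distinct values is likely to be cheaper over time.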