Would a cross-partition query in Cosmos DB be helped by knowing one part of the partition key?

We are using Cosmos DB for our data storage, and there is a case where I have to do a cross-partition query because I don't know the exact partition key. But I will know part of it.
To elaborate, my partition key is a combination of multiple strings, let's say A-B,
and let's say I only know A but not B. Is there any way to do a wildcard search on the partition key?
Would that optimize the query, or is it not possible? Depending on that, I will decide whether to put A in the partition key at all.

Based on my research and the Partitioning in Azure Cosmos DB documentation, nothing mentions that the Cosmos DB partition key supports wildcard searching. Only the indexing policy supports wildcard paths: https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#including-and-excluding-property-paths
So, in your situation, since you don't know B, I'd suggest considering setting the partition key to A alone. Besides, you could vote up this thread: https://github.com/PlagueHO/CosmosDB/issues/153
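A rough way to see why knowing only A doesn't help: as far as I can tell, Cosmos DB places each logical partition by hashing the whole partition key value, so "A-1" and "A-2" are unrelated as far as placement goes, and a prefix filter such as STARTSWITH(c.pk, "A-") cannot be routed to a single partition. The sketch below is a toy model, not the SDK: `physical_partition` uses SHA-256 as a stand-in for the service's real hash, and the partition count of 8 and the field names `pk_ab` / `pk_a` are hypothetical.

```python
import hashlib

PHYSICAL_PARTITIONS = 8  # hypothetical number of physical partitions


def physical_partition(pk_value: str) -> int:
    """Map a partition key value to a physical partition (stand-in hash, not Cosmos DB's)."""
    digest = hashlib.sha256(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def partitions_touched(items: list[dict], pk_field: str, known_a: str) -> set[int]:
    """Physical partitions that hold at least one item whose 'a' part equals known_a."""
    return {physical_partition(item[pk_field]) for item in items if item["a"] == known_a}


# Each item carries both candidate partition keys: "A-B" (pk_ab) and just "A" (pk_a).
items = [
    {"a": a, "b": str(b), "pk_ab": f"{a}-{b}", "pk_a": a}
    for a in ("A1", "A2")
    for b in range(100)
]

# Partition key = "A-B": the matches for A1 are scattered across partitions.
print(len(partitions_touched(items, "pk_ab", "A1")))
# Partition key = "A": every match for A1 lives in one logical partition.
print(len(partitions_touched(items, "pk_a", "A1")))  # 1
```

Keying on A alone keeps every document for a given A in one logical partition, so the query becomes a single-partition query. The trade-off is that a single logical partition has a size cap (20 GB at the time of writing), so this only works if the data per A stays bounded.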

Related

Multi partition key search operation in DynamoDB

Is there some operation of the Scan API or the Query API that allows a lookup on a table with a composite key (pk/sk) that varies only in the pk, in order to optimize the Scan operation on the table?
Let me introduce a use case:
Suppose I have a partition key defined by the ID of a project, and within each project I have a huge number of records (sk).
Now I need to answer the query "return all projects". I don't have a partition key, so I have to perform a scan.
I know that I could create a GSI that solves this problem, but let's assume that this is not an option.
Is there any way to perform a scan that "hops" between each pk, ignoring the elements of the sks?
In other words, I want to collect the first record of each partition key.
DynamoDB is a NoSQL database, as you already know. It is optimized for lookups, and practices you may be used to from SQL databases or other (low-scale) databases are not always available in DynamoDB.
The idea of a partition key is to store records that share the same partition key together, sorted by the sort key. The flip side is that records with different partition keys are stored in other locations. It is not one long list (or tree) of records that you can scan over.
When you design your schema in a NoSQL database, you need to consider the access patterns for that data. If you need a list of all the projects, you need to maintain an index that allows it.
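To make the "maintain an index yourself" advice concrete, here is a toy in-memory model (not boto3) of a table keyed by project ID and record ID. Every write also puts a small index item under a hypothetical reserved partition key `PROJECT_INDEX`, so "return all projects" becomes a Query on one item collection instead of a Scan over every record. The class and helper names are illustrative assumptions.

```python
from collections import defaultdict

INDEX_PK = "PROJECT_INDEX"  # hypothetical partition key value reserved for the index


class ToyTable:
    """In-memory stand-in for a DynamoDB table with a (pk, sk) composite key."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # pk -> {sk: item}

    def put_item(self, pk, sk, item):
        self.partitions[pk][sk] = item  # same (pk, sk) overwrites, like PutItem

    def query(self, pk):
        """Reads a single item collection, returned in sort-key order."""
        collection = self.partitions.get(pk, {})
        return [collection[sk] for sk in sorted(collection)]

    def scan(self):
        """Reads every item in every partition; there is no way to hop between pks."""
        return [item for coll in self.partitions.values() for item in coll.values()]


def add_record(table, project_id, record_id, data):
    table.put_item(project_id, record_id, data)
    # Maintain the "all projects" index on every write; repeated puts are idempotent.
    table.put_item(INDEX_PK, project_id, {"project_id": project_id})


def list_projects(table):
    return [item["project_id"] for item in table.query(INDEX_PK)]


table = ToyTable()
for project in ("p1", "p2", "p3"):
    for n in range(1000):
        add_record(table, project, f"r{n:04d}", {"value": n})

print(len(table.scan()))     # 3003: every record plus the 3 index items
print(list_projects(table))  # ['p1', 'p2', 'p3']
```

In a real table the index item would simply be another PutItem alongside the record write. Because all index items share one partition key value, this suits a modest, mostly-read project list; a very large or write-heavy list would concentrate traffic on that single key.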

Is it recommended to have a large number of logical partitions in Cosmos DB?

If we partition a container in the Cosmos DB SQL API, is it OK to have a partition key that is unique in each document? That is, each document in the container will have its own logical partition and each logical partition will hold only one document. We only need to query on the unique key, so only one partition/document will be hit. Is there still any downside to such modelling related to performance or storage?
If you are using the Cosmos DB SQL API as a key/value store and only reading using ReadItemAsync(), there is no downside to doing this.

Query Cosmos DB for all partitions

I am thinking of creating a reporting tool for a Cosmos DB instance in our system. I want to discover all the partition keys and the number of items stored for each. I think I should be using pkranges, but I can't find any examples of how this should work. Any suggestions?
You can try to get the list of pkranges using this API. Note that pkranges describe the physical partition key ranges, not the individual partition key values.
As of now, there is no API that returns the list of partition keys. There was a feature request at Azure Cosmos DB Improvement Ideas, but it was declined.
So you can try "SELECT DISTINCT c.partitionKey FROM c" (a cross-partition query) to get the list of distinct partition keys.
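Since the question also asks for the number of items per key, one option is to fetch only the key field with a cross-partition query and tally the results client-side. The sketch assumes the partition key path is /partitionKey (as in the query above). In the azure-cosmos Python SDK the rows would typically come from `container.query_items(query=..., enable_cross_partition_query=True)`, but the function accepts any iterable of dicts, so it can be tried without an account; the sample rows are made up.

```python
from collections import Counter
from typing import Iterable, Mapping

# Cross-partition query that returns only the partition key of each item.
# Assumes the container's partition key path is /partitionKey.
KEY_QUERY = "SELECT c.partitionKey FROM c"


def count_per_partition_key(rows: Iterable[Mapping], field: str = "partitionKey") -> Counter:
    """Tally how many items share each partition key value.

    Rows missing the field are counted under None.
    """
    return Counter(row.get(field) for row in rows)


# With the real SDK this would be something like:
#   rows = container.query_items(query=KEY_QUERY, enable_cross_partition_query=True)
rows = [{"partitionKey": "tenant-a"}, {"partitionKey": "tenant-b"}, {"partitionKey": "tenant-a"}]
counts = count_per_partition_key(rows)
print(counts["tenant-a"], counts["tenant-b"])  # 2 1
```

The Cosmos DB query language also supports GROUP BY with COUNT, so `SELECT c.partitionKey, COUNT(1) AS n FROM c GROUP BY c.partitionKey` may be worth trying to push the counting to the server. Either way the query reads every item, so expect the RU cost of a full scan.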

How to create a good primary key in DynamoDB

I have an application on AWS using DynamoDB, with users sending messages to each other. I am not familiar with AWS and I am lacking best-practice knowledge.
My application has now started to get slow at retrieving messages for a user, because I have more and more data in my database.
I think it is because of my primary key, and I wonder what a good primary key would be in this case.
Currently I am using a random GUID as the primary key.
To retrieve all messages for a user, I am doing a scan operation.
I would like to use a composite value based on the username as the primary key, but I wonder if it will be better. For instance, if I need to retrieve the number of messages for a user and increment it, creating the primary key will probably make the request even slower.
What would be a good primary key here?
Thanks!
It will be better, since it appears you often query based on the user ID. Scans are expensive and should be avoided where possible. AWS has a great article on best practices for choosing a partition key (primary key). The key takeaway is the following:
You should evaluate various approaches based on your data ingestion and access pattern, then choose the most appropriate key with the least probability of hitting throttling issues.
Using a GUID for the partition/primary key is a waste if you never query the data by it. Since the Query operation (as opposed to Scan) requires the partition key value (plus an optional condition on the sort key), you want to choose a value that you often use to retrieve the data and that also has sufficient cardinality to distribute your data across a reasonable number of partitions.
What other access patterns do you have in your application? From what you've mentioned so far, the user ID seems to be a reasonable choice.
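A toy in-memory model (not boto3) of the suggested layout: partition key = recipient user ID, sort key = timestamp plus a random suffix. The key layout and the class and method names are assumptions for illustration. It shows that a per-user Query touches only that user's items and comes back in time order, while a Scan reads everything, and that no per-user counter is needed to build a unique key.

```python
import uuid
from collections import defaultdict


class ToyMessageTable:
    """In-memory stand-in for a table with pk = recipient, sk = "<timestamp>#<uuid>"."""

    def __init__(self):
        self.by_user = defaultdict(dict)  # pk -> {sk: item}

    def put_message(self, recipient, sender, body, timestamp):
        # ISO-8601 timestamps in one format sort correctly as strings; the uuid
        # suffix keeps two messages with the same timestamp from overwriting each other.
        sk = f"{timestamp}#{uuid.uuid4()}"
        self.by_user[recipient][sk] = {"from": sender, "body": body}
        return sk

    def query_messages(self, user_id):
        """Like Query: reads only this user's items, in sort-key (time) order."""
        collection = self.by_user.get(user_id, {})
        return [collection[sk] for sk in sorted(collection)]

    def scan_messages(self, user_id):
        """Like Scan with a filter: reads every item; returns (matches, items_read)."""
        matches, items_read = [], 0
        for pk, collection in self.by_user.items():
            for item in collection.values():
                items_read += 1
                if pk == user_id:
                    matches.append(item)
        return matches, items_read


table = ToyMessageTable()
table.put_message("alice", "bob", "hi", "2024-01-01T10:00:00Z")
table.put_message("alice", "carol", "hey", "2024-01-01T09:00:00Z")
table.put_message("bob", "alice", "yo", "2024-01-01T11:00:00Z")

print([m["body"] for m in table.query_messages("alice")])  # ['hey', 'hi']
matches, items_read = table.scan_messages("alice")
print(len(matches), items_read)  # 2 3
```

For the message count, a real Query can be issued with Select set to COUNT so only the count is returned rather than the items; it still reads the matching items, but only within that user's partition.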

Maintain unique value for DynamoDB partition key

I'm new to "DynamoDB" and want to know the best practice for maintaining unique partition key values when adding records to a table.
In my experience with SQL, primary keys are normally maintained by the system with identity columns or via a trigger. I've searched through various forums and the "AWS" documentation, but did not find any specifics. Do you manually check whether a partition key value already exists, or am I missing something obvious?
In DynamoDB, querying flexibility is limited compared to SQL. So the schema, as well as the partition key and sort key, should be designed to make the most common and important queries as fast as possible. You can find some general best practices here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/
If you can provide more context on the use case you are trying to use DynamoDB for, you should get more pointed answers.
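On the uniqueness question itself: DynamoDB has no identity column, so a common approach is to generate the key client-side (for example a UUID) and write with a condition such as `attribute_not_exists(pk)`, which makes PutItem fail with a ConditionalCheckFailedException instead of silently overwriting. The sketch below imitates that behaviour in memory; `ToyTable`, `put_if_absent`, and `insert_with_generated_key` are hypothetical names, not boto3 APIs, and `pk` is an assumed attribute name.

```python
import uuid


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyTable:
    """In-memory table keyed by the attribute 'pk'."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, item):
        # Mimics PutItem with ConditionExpression="attribute_not_exists(pk)":
        # refuse to overwrite an existing item with the same key.
        if item["pk"] in self.items:
            raise ConditionalCheckFailed(item["pk"])
        self.items[item["pk"]] = dict(item)


def insert_with_generated_key(table, attrs, max_attempts=3):
    """Insert attrs under a fresh UUID key, retrying on the (very unlikely) collision."""
    for _ in range(max_attempts):
        item = {**attrs, "pk": str(uuid.uuid4())}
        try:
            table.put_if_absent(item)
            return item["pk"]
        except ConditionalCheckFailed:
            continue
    raise RuntimeError("could not generate a unique key")


table = ToyTable()
first = insert_with_generated_key(table, {"name": "widget"})
second = insert_with_generated_key(table, {"name": "gadget"})
print(first != second)  # True
```

The condition is what turns the write into an insert-only operation: the existence check and the write happen atomically on the server, which a separate GetItem followed by PutItem would not guarantee.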
