Query Cosmos DB without enabling cross-partition query - azure-cosmosdb

When querying Cosmos DB, there is an option to set enableCrossPartitionQuery to true.
I am wondering what happens if I do not set it. Which partition will be used for the query?
Thanks

If your collection is partitioned, then query, update, and delete operations need a partition key to be set.
If you don't set one, you may see an error indicating that the partition key is missing.
In this situation, if you don't want to set a partition key, or you don't know which partition the data belongs to, you can set enableCrossPartitionQuery = true to avoid the error. Setting enableCrossPartitionQuery = true means the request will scan all the partitions to filter the data. Of course, query performance is bound to suffer.
BTW, if your data size is small, I think the impact may be small. However, if the data size is large, I suggest you try your best to avoid setting this property.
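To make the difference concrete, here is a toy in-memory model of a partitioned store. It is only a sketch: the class ToyPartitionedStore and its methods are hypothetical and have nothing to do with the real Cosmos DB engine or SDK. It simply counts how many partitions a query has to visit with and without a partition key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model only: a hypothetical in-memory store, not Cosmos DB's real engine. */
public class ToyPartitionedStore {
    // Partition key value -> documents stored in that logical partition.
    private final Map<String, List<String>> partitions = new HashMap<>();
    // Number of partitions the most recent query had to look at.
    private int partitionsTouched;

    public void insert(String partitionKey, String document) {
        partitions.computeIfAbsent(partitionKey, k -> new ArrayList<>()).add(document);
    }

    /** Query scoped to one partition key: only that partition is read. */
    public List<String> query(String partitionKey, Predicate<String> filter) {
        partitionsTouched = 1;
        List<String> result = new ArrayList<>();
        for (String doc : partitions.getOrDefault(partitionKey, Collections.emptyList())) {
            if (filter.test(doc)) {
                result.add(doc);
            }
        }
        return result;
    }

    /** Query without a partition key: every partition is visited (fan-out). */
    public List<String> queryCrossPartition(Predicate<String> filter) {
        partitionsTouched = partitions.size();
        List<String> result = new ArrayList<>();
        for (List<String> docs : partitions.values()) {
            for (String doc : docs) {
                if (filter.test(doc)) {
                    result.add(doc);
                }
            }
        }
        return result;
    }

    public int partitionsTouched() {
        return partitionsTouched;
    }

    public static void main(String[] args) {
        ToyPartitionedStore store = new ToyPartitionedStore();
        store.insert("A", "a1");
        store.insert("B", "b1");
        store.insert("C", "c1");

        store.query("A", doc -> true);
        System.out.println("Partitions touched with key: " + store.partitionsTouched());

        store.queryCrossPartition(doc -> true);
        System.out.println("Partitions touched without key: " + store.partitionsTouched());
    }
}
```

In this model the work of a cross-partition query grows with the number of partitions, which is the intuition behind providing the partition key whenever you know it.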
I tested the sample project https://github.com/Azure-Samples/azure-cosmos-db-sql-api-nodejs-getting-started.git and it indeed doesn't require a partition key even when the container is partitioned.
However, based on the statements in the Cosmos DB REST API:
I tested the Java SDK and it requires the partition key when I query a partitioned container. Anyway, my point is that if you hit the error that indicates a missing partition key, you can try adding the property enableCrossPartitionQuery = true to resolve it. In most cases, I still suggest providing the partition key for the sake of query performance.

Related

How to filter DynamoDb by object property value

I have a DynamoDB table:
How should I filter the entries in the DB table to those where access.role = "ADMIN"?
You would be best served by setting up a Global Secondary Index (GSI). You set the Partition Key equal to that attribute, and the Sort Key equal to some other attribute that you can guarantee will be unique. Then you use your SDK of choice or the Query option in the console, select the index, and query for partition_key = ADMIN
However, be aware: indexes are a complete replication of the table. Dynamo is very good at this and relatively fast at doing so, but there is still the possibility that your index will be out of sync with the actual data. If you are not making the call against the index very often, you are pretty much fine. If you are calling it very often, then you should restructure your table.
DynamoDB is not a SQL database. When setting up a Dynamo schema you have to consider how you will access your data, i.e. your access patterns. You should design your data with your Partition Key as the data you will have when looking up (e.g. I will always have a user ID number) and your Sort Keys as the individual documents related to that PK (e.g. a user has a document that is his profile data, a document that is his profile picture URL, a document that is a list of his friends' user numbers, a document that is ... etc.)
Then you use indexes for things like your question that you won't be doing very often.

Is it possible to check if a logical partition exist in Azure Cosmos DB?

Is there any "partition key not found" exception when we query with a partition key via QueryRequestOptions? Or are there any other ways I can be notified that the logical partition does not exist in a query?
Is there any Partition key not found exception when we query with partitionkey via QueryRequestOptions?
Assuming you are talking about the value of the partition key, as far as I know there is no such thing in Cosmos DB.
or is there any other ways I can be notified that the logical partition does not exist in a query?
One possible way to find out is to query your container with the partition key value in the query and try to fetch at most one document. If you don't get any documents back (i.e. you get an empty result set), that would mean the logical partition does not exist in the container.

Why Not Always Use EnableCrossPartitionQuery

If my cosmos DB has multiple partitions is there any reason to NOT set EnableCrossPartitionQuery to true?
I know it is necessary if running a query that could hit multiple partitions. But what if the query uses a valid partition key and definitely will only hit one partition, is there any performance loss or increased cost because I set that flag to true?
But what if the query uses a valid partition key and definitely will only hit one partition, is there any performance loss or increased cost because I set that flag to true?
As far as I know, you need to set the partition key for a partitioned collection, and the cost will not change even if you also set EnableCrossPartitionQuery to true, because the request only scans the specific partition you already set. I ran a sample test to verify it.
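// Legacy Java SDK (com.microsoft.azure.documentdb); `client` is presumably an existing DocumentClient.
FeedOptions feedOptions = new FeedOptions();
// Scope the query to the single logical partition "A".
PartitionKey partitionKey = new PartitionKey("A");
feedOptions.setPartitionKey(partitionKey);
// Also enable cross-partition querying to see whether the charge changes.
feedOptions.setEnableCrossPartitionQuery(true);
FeedResponse<Document> queryResults = client.queryDocuments(
        "/dbs/db/colls/part",
        "SELECT * FROM c",
        feedOptions);
System.out.println("Running SQL query...");
for (Document document : queryResults.getQueryIterable()) {
    System.out.println(String.format("\tRead %s", document));
}
// Print the request charge (RUs) reported for the query.
System.out.println(queryResults.getRequestCharge());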
I don't think you need to struggle with this problem. The EnableCrossPartitionQuery option only needs to be used if the query against a partitioned collection is not scoped to a single partition key value. If you know the specific partition key, there is no need to set EnableCrossPartitionQuery.

Encode PartitionKey into Document Id?

I have set the partition key of one of my Cosmos DBs to /partition.
For example: we have a Chat document that contains a list of Subscribers, then we have ChatMessages that contain a text, a reference to the author and some other properties. Both documents have a partition property that contains the type 'chat' and the chat's id.
Chat example:
{
  "id" : "955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
  "name" : "SO questions",
  "isChat" : true,
  "partition" : "chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
  "subscribers" : [
    ...
  ]
}
We then have Message documents like this:
{
  "id" : "4d1c7b8c-bf89-47e0-83e1-a8cf0d71ce5a",
  "authorId" : "some guid",
  "isMessage" : true,
  "partition" : "chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
  "text" : "What should I do?"
}
It is now very convenient to return all messages for a specific chat: I just need to query all documents of the partition chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c with the property isMessage = true. All good...
But if I now want to query my DB for a specific message by id, I usually know only the id, not the partition, and therefore have to run a slow cross-partition query. That led me to the question of whether I should add the partition key to the message id, so I can split the id when querying the DB for a faster lookup. I saw that the _rid property of a document looks like a combination of the id of the DB, the id of the collection, and a document-specific id. What I mean by this is (simplified):
Chat.Id = "abc"
Chat.Partition = "chat_abc" //[type]_[chatId]
Message.Id = "chat_abc|123" //[Chat.Partition]|[Message.Id]
Message.Partition = "chat_abc" //[Chat.Partition]
Let's assume that I now want to get the Message document by its id: I just split the id at the | symbol and then query the document with the first part of the id as the partition and the full id as the key.
Does that make sense? Are there better ways to do this? Should I just always pass the partition key of a document along, not just its id? Should I just use the _rid properties instead?
Any experience is highly appreciated!
UPDATE
I have found the following answer here:
Some applications encode partition key as part of the ID, e.g. partition key would be customer ID, and ID = "customer_id.order_id", so you can extract the partition key from the ID value.
I have also asked the Cosmos team by email whether this is a recommended pattern, and will post an answer if I get one.
Yes, your proposal to extract partition key from id (via a convention like a prefix/delimiter) makes sense. This is common among applications that have a single key and want to refactor it to use Cosmos DB from a different storage system.
If you're building your application from scratch, you should consider wiring the composite key (partition key + item key ("id")) through your API/application.
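As a rough illustration of the prefix/delimiter convention discussed above, the following sketch builds and splits ids of the form "chat_abc|123". The CompositeId class, its method names, and the choice to reject a | inside the partition key are assumptions made for this example; they are not part of any Cosmos DB SDK.

```java
import java.util.Objects;

/** Hypothetical helper: builds and parses ids of the form "<partitionKey>|<localId>". */
public final class CompositeId {
    private static final char DELIMITER = '|';

    private final String partitionKey;
    private final String localId;

    private CompositeId(String partitionKey, String localId) {
        this.partitionKey = partitionKey;
        this.localId = localId;
    }

    /** Combines a partition key and a local id; the key must not contain the delimiter. */
    public static CompositeId of(String partitionKey, String localId) {
        Objects.requireNonNull(partitionKey, "partitionKey");
        Objects.requireNonNull(localId, "localId");
        if (partitionKey.isEmpty() || localId.isEmpty() || partitionKey.indexOf(DELIMITER) >= 0) {
            throw new IllegalArgumentException("Invalid parts for a composite id");
        }
        return new CompositeId(partitionKey, localId);
    }

    /** Splits a full id at the first delimiter into partition key and local id. */
    public static CompositeId parse(String fullId) {
        Objects.requireNonNull(fullId, "fullId");
        int idx = fullId.indexOf(DELIMITER);
        if (idx <= 0 || idx == fullId.length() - 1) {
            throw new IllegalArgumentException("Not a composite id: " + fullId);
        }
        return new CompositeId(fullId.substring(0, idx), fullId.substring(idx + 1));
    }

    public String partitionKey() {
        return partitionKey;
    }

    public String localId() {
        return localId;
    }

    /** The value that would be stored in the document's "id" property. */
    public String fullId() {
        return partitionKey + DELIMITER + localId;
    }

    public static void main(String[] args) {
        CompositeId id = CompositeId.parse("chat_abc|123");
        System.out.println("partition key: " + id.partitionKey());
        System.out.println("document id:   " + id.fullId());
    }
}
```

With such a helper, a lookup by id could pass partitionKey() as the partition key and fullId() as the document id, so the request stays within a single partition.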
First, if you know your data (and index) size will remain within the 10 GB limit and your RU/sec limit is OK, then a fixed, partition-less collection will bypass this problem. The OP has probably made a deliberate decision that partitioning is required, but it is an important consideration to note for generalization purposes. If possible, KISS ;)
If partitioning is a must, then AFAIK you cannot avoid a cross-partition query and its overhead unless you know the partition key.
IMHO the OP's suggestion of merging the duplicated partition key into the id field is a rather ugly solution, because:
The name id implies it is a unique key; the partition key is not part of it, nor necessary for this key and its uniqueness. Anyone using this key upstream would incur the forced excess cost of a longer key, be blocked from using the simpler Guid type, etc.
It will become a mess should your partition key change in the future.
The internal structure of the merged id would not be intuitive without documentation: its parts are not named, and even if they appear to follow a pattern, new devs would not know for sure what's going on without finding external documentation.
Your data model does not require this duplication on a semantic level. It would exist only for your application's querying comfort, and hence such hacks belong in your application code, not in the data model. Such leaking concerns should be avoided if possible.
Data duplication within a document would unnecessarily increase document size, bandwidth, etc. (this may or may not be notable, depending on scale and usage). In-document duplication is necessary at times, but IMHO not necessarily in this case.
A better design would be to ensure the partition key is always present in the logic context and can be passed along to lookups. If you don't have it available, then maybe you should refactor your application code (not the data design) to explicitly pass the chatId around along with the id where needed. That is, WITHOUT merging them together into some opaque string format.
Also, I don't see a good way to use _rid for this as if I remember correctly, it did not contain any internal reference to a partition or partition key.
Disclaimer: I don't have any access or deep insight into internal CosmosDB index design or _rid logic on partitioned collections. I may have misunderstood how it works.

Is partition key needed in queries even though JSON is indexed

I'm planning on using Cosmos DB (DocumentDB) and I'm trying to understand how queries, indexing and partitions relate to each other.
"How to partition and scale in Azure Cosmos DB" talks about the partition key, and other documentation indicates that partition key + id = unique id for the document. But then "SQL Query and SQL syntax in Azure Cosmos DB" says it provides automatic indexing of JSON documents without requiring an explicit schema or the creation of secondary indexes.
I understand that the partition key is important for scalability and for how data is stored. But when it comes to searching, is the partition key kind of like an extra filter/WHERE clause? All the documents are indexed, so I can execute a query like:
SELECT *
FROM Families
WHERE Families.address.state = "NY"
Should I still specify the partition key, or somehow indicate that cross-partition queries are allowed, when using this SQL query syntax?
Your first link gives the answer for this:
For partitioned collections, you can use PartitionKey to run the query against a single partition (though Cosmos DB can automatically extract this from the query text), and EnableCrossPartitionQuery to run queries that may need to be run against multiple partitions.
So, yes, you either need to specify a WHERE clause that makes the query run against a single partition, or set EnableCrossPartitionQuery to true in the query options.
You don't have to do that anymore: EnableCrossPartitionQuery is set to true by default nowadays. This means Cosmos won't complain if you omit the partition key from your query.
More info here.
You don't need to specify a partition key for the query. Recent versions enable cross-partition queries by default.

Resources