Azure DocumentDB ORDER BY not working with partitioned collection

We created a partitioned collection in Azure DocumentDB with MyId as the partition key and populated it with data, but any query with ORDER BY fails.
Sample query:
SELECT *
FROM Families f
JOIN c IN f.children
WHERE f.MyId = 123
ORDER BY f.address.city ASC
Please suggest how to resolve this.

DocumentDB supports cross-partition ORDER BY only as a preview feature. You need to email askdocdb@microsoft.com to get access to it.
DocumentDB supports cross-partition ORDER BY as of SDK 1.9.0.
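Conceptually, a cross-partition ORDER BY can be served by letting each partition return its own results already sorted and then merging those streams on the client. The Python sketch below is a toy model of that idea, not the SDK's actual implementation; the sample documents and the cross_partition_order_by helper are hypothetical.

```python
import heapq

def cross_partition_order_by(partitions, key):
    """Merge per-partition results into one globally ordered list.

    Each partition sorts its own documents; a k-way merge then
    combines the sorted streams without re-sorting everything.
    """
    sorted_streams = [sorted(docs, key=key) for docs in partitions]
    return list(heapq.merge(*sorted_streams, key=key))

# Hypothetical documents spread over two partitions.
partition_a = [{"MyId": 123, "city": "Seattle"}, {"MyId": 123, "city": "Austin"}]
partition_b = [{"MyId": 456, "city": "Boston"}]

ordered = cross_partition_order_by([partition_a, partition_b], key=lambda d: d["city"])
cities = [d["city"] for d in ordered]
```

In this model the client only needs the head of each partition's stream at any time, which is why the merge can be done incrementally.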

Related

Fastest way to fetch multiple documents from Azure Cosmos DB

I have a SQL API Cosmos DB collection whose partition key is /id, so each document's id is also its partition key value.
Given a list of IDs, I need to fetch all those documents. When using the .NET SDK (v3.25), which of the following Container class methods is recommended for the lowest latency?
In parallel, use ReadItemAsync to read all documents.
Use ReadManyItemsAsync to read all the documents.
Use GetItemQueryIterator with a SQL query of the form SELECT * FROM c where c.id in ('id-1', 'id-2', ...).
If you want to retrieve a large group of individual items, the most efficient way is to use ReadManyItemsAsync() rather than invoking ReadItemAsync() many times, whether sequentially or in parallel.

Are Azure CosmosDB indexes split by partition

I am sending some IoT events into Azure Cosmos DB. I am partitioning by device id and I always query by device id. I want to know whether the automatically created indexes are separated by partition key. Specifically, if I run a query like
SELECT TOP 5 ... FROM events WHERE deviceId = X ORDER BY timeStamp DESC
Will it use the automatically created index on timeStamp, and if so, is it effective? Basically, I am asking whether there are separate indexes on timeStamp for each partition key (deviceId in my case); otherwise the index would be relatively useless, because the range would contain a lot of irrelevant data from other devices. If this were SQL Server, I would create an index on deviceId followed by timeStamp, but I am not sure how Cosmos DB works by default.
Indexes sit within the partition, so yes.
For this query you should also create a composite index with DESC sort order on timeStamp for the best performance.
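A composite index for this query could look like the sketch below, written as a Python dict that serializes to the indexing-policy JSON. The paths /deviceId and /timeStamp come from the question; treating them as top-level properties is an assumption. For the query to use such an index, the ORDER BY would probably need to list both properties in the same order as the index, for example ORDER BY c.deviceId ASC, c.timeStamp DESC.

```python
import json

# Sketch of an indexing policy with one composite index.
# Paths are assumed from the question (deviceId, timeStamp).
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "compositeIndexes": [
        [
            {"path": "/deviceId", "order": "ascending"},
            {"path": "/timeStamp", "order": "descending"},
        ]
    ],
}

# JSON text that could be pasted into the container's indexing policy.
policy_json = json.dumps(indexing_policy, indent=2)
```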

Cosmos DB - Why must the partition key be specified when making point queries using the SQL API SDK?

Why must the partition key be specified when making point queries against a partitioned collection using the SQL API SDK?
Is there a way to execute a point query against a partitioned collection using the SQL API SDK without specifying "PartitionKey", given that it is a point query?
Working example:
// - db_id is the ID property of the Database
// - coll_id is the ID property of the DocumentCollection
// - doc_id is the ID property of the Document you wish to read.
var docUri = UriFactory.CreateDocumentUri("db_id", "coll_id", "doc_id");
await docClient.ReadDocumentAsync(docUri, new RequestOptions { PartitionKey = new PartitionKey(actualId) });
Non-working example:
var docUri = UriFactory.CreateDocumentUri("db_id", "coll_id", "doc_id");
await docClient.ReadDocumentAsync(docUri);
The call without a partition key fails with the error message: "PartitionKey value must be supplied for this operation."
Why does the partition key need to be specified for a point query against a partitioned collection?
The partition key is mandatory: when you create a partitioned collection, you are asked to provide it. It is used for sharding: it defines the logical partitions of your data and gives Cosmos DB a natural boundary for distributing data across partitions. So the requirement is by design.
So whenever you read a document from Cosmos DB with the SDK, as in the example above, it is necessary to pass the PartitionKey.
The PartitionKey is needed because Cosmos DB does not parse the query until it has located the data.
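The routing role of the partition key can be illustrated with a toy model. The Python sketch below is not how Cosmos DB is implemented (its hash function and partition layout differ); the ToyCollection class and its methods are hypothetical. It only shows why a point read needs the partition key: the key decides which partition holds the document, and an id is unique only in combination with it.

```python
import hashlib

class ToyCollection:
    """Toy partitioned store: a document lives in the partition
    chosen by hashing its partition key value."""

    def __init__(self, partition_count):
        self.partitions = [dict() for _ in range(partition_count)]

    def _route(self, partition_key):
        # Stable hash of the partition key value -> partition index.
        digest = hashlib.sha256(str(partition_key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def upsert(self, partition_key, doc_id, doc):
        # Documents are identified by (partition key, id) together.
        self.partitions[self._route(partition_key)][(partition_key, doc_id)] = doc

    def read(self, doc_id, partition_key=None):
        # Without the key there is no way to pick a partition.
        if partition_key is None:
            raise ValueError("PartitionKey value must be supplied for this operation.")
        return self.partitions[self._route(partition_key)].get((partition_key, doc_id))

store = ToyCollection(partition_count=4)
store.upsert(partition_key="device-1", doc_id="doc_id", doc={"temp": 21})
found = store.read("doc_id", partition_key="device-1")
```

Serving a read without the key would mean probing every partition, which is the fan-out that a point read is meant to avoid.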

High request-charge for very simple query in Azure Cosmos DB (SQL API)

In Azure Cosmos DB (SQL API) the following query charges 9356.66 RUs:
SELECT * FROM Core c WHERE c.id = @id -- @id is a GUID
In contrast, the following more complex query charges only 6.84 RUs:
SELECT TOP 10 * FROM Core c WHERE c.type = "Agent"
The documents in both examples are pretty small, with only a handful of attributes. The document collection does not use any custom indexing policy and contains 105685 documents.
To me this sounds as if there is no properly working index on the "id" field.
How is this possible and how can this be fixed?
Updates:
Without the TOP keyword the second query charges 3516.35 RUs and returns 100000 records.
The partition key is "/partition" and its values are 0 or 1 (evenly distributed).
If you have a partitioned collection, you need to specify the partition key to make the request as efficient as possible. Cross-partition queries are expensive (and slower) in Cosmos DB, because the partitions' data can be stored in different places.
Try the following:
SELECT * FROM Core c WHERE c.id = @id AND c.partition = @partition
Or specify the partition key in the feed options if you're using the Cosmos DB SDK.
Let me know if this helps.
I assume the solution is the same as posted here:
Azure DocumentDB Query by Id is very slow
I will close my own question once I am able to verify this with Microsoft Support.

Is partition key needed in queries even though JSON is indexed

I'm planning on using Cosmos DB (DocumentDB) and I'm trying to understand how queries, indexing, and partitions relate to each other.
"How to partition and scale in Azure Cosmos DB" talks about the partition key, and other documentation indicates that partition key + id = unique id for the document. But then "SQL query and SQL syntax in Azure Cosmos DB" says it provides automatic indexing of JSON documents without requiring an explicit schema or the creation of secondary indexes.
I understand that the partition key is important for scalability and for how data is stored. But when it comes to searching, is the partition key just an extra filter/WHERE clause? All the documents are indexed, so I can execute a query like:
SELECT *
FROM Families
WHERE Families.address.state = "NY"
Should I still specify the partition key, or somehow indicate that cross-partition queries are allowed, when using this SQL query syntax?
Your first link gives the answer for this:
For partitioned collections, you can use PartitionKey to run the query against a single partition (though Cosmos DB can automatically extract this from the query text), and EnableCrossPartitionQuery to run queries that may need to be run against multiple partitions.
So, yes, you either need to specify a WHERE clause that makes the query run against a single partition, or set EnableCrossPartitionQuery to true in the query options.
You don't have to do that anymore: EnableCrossPartitionQuery is set to true by default nowadays. This means Cosmos DB won't complain if you omit the partition key from your query.
More info here.
You don't need to specify a partition key in the query. Recent versions enable cross-partition queries by default.
