I have a SQL API Cosmos DB collection whose partition key path is /id, so each document's partition key value is the same as its id.
Given a list of IDs, I need to fetch all of those documents. Using the .NET SDK (v3.25), which of the following Container methods is recommended for the lowest latency?
In parallel, use ReadItemAsync to read all documents.
Use ReadManyItemsAsync to read all the documents.
Use GetItemQueryIterator with a SQL query of the form SELECT * FROM c where c.id in ('id-1', 'id-2', ...).
If you want to retrieve a large group of individual items, the most efficient way is to use ReadManyItemsAsync() rather than invoking ReadItemAsync() many times in parallel.
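Because the partition key path is /id, every (id, partition key) pair passed to ReadManyItemsAsync is simply the id repeated. In .NET the method takes an IReadOnlyList<(string id, PartitionKey partitionKey)>; the plain-Java sketch below only mirrors that input shape with a hypothetical ItemIdentity record (not an SDK type), and also drops duplicate ids so no document is requested twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ReadManyInput {
    // Hypothetical stand-in for the SDK's (id, partition key) tuple; not a real SDK type.
    record ItemIdentity(String id, String partitionKey) {}

    // With partition key path /id, each document's partition key value equals its id.
    static List<ItemIdentity> toIdentities(List<String> ids) {
        List<ItemIdentity> result = new ArrayList<>();
        // LinkedHashSet drops duplicate ids but keeps the caller's order.
        for (String id : new LinkedHashSet<>(ids)) {
            result.add(new ItemIdentity(id, id));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toIdentities(List.of("id-1", "id-2", "id-1")));
    }
}
```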
I am sending IoT events into Azure Cosmos DB. I am partitioning by device id and I always query by device id. I want to know whether the automatically created indexes are separated by partition key. Specifically, if I run a query like
SELECT TOP 5 ... FROM events WHERE deviceId = X ORDER BY timeStamp DESC
Will it use the automatically created index on timeStamp, and if so, is it effective? Basically, I am asking whether there are separate indexes on timeStamp for each partition key (deviceId in my case), because otherwise the index would be relatively useless: its range would contain a lot of irrelevant data from other devices. In SQL Server I would create an index on deviceId followed by timeStamp, but I am not sure how Cosmos DB works by default.
Indexes sit within the partition, so yes.
For this query you should also create a composite index (deviceId, then timeStamp with DESC sort order) for the best performance.
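As a rough mental model only (not how Cosmos DB physically lays out its index), a partition-scoped ordering on timeStamp means TOP 5 ... ORDER BY timeStamp DESC walks just the newest entries for one device and never scans other devices' readings. The sketch below models this with one TreeMap per deviceId; the Event record and class names are hypothetical, and it assumes timeStamp values are unique per device.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DeviceIndexModel {
    // Hypothetical event shape: one reading from one device.
    record Event(String deviceId, long timeStamp, String payload) {}

    // deviceId -> (timeStamp -> event): an ordering by timeStamp scoped to one device.
    // Assumes timeStamp is unique per device, so TreeMap keys do not collide.
    private final Map<String, NavigableMap<Long, Event>> index = new HashMap<>();

    void insert(Event e) {
        index.computeIfAbsent(e.deviceId(), k -> new TreeMap<>()).put(e.timeStamp(), e);
    }

    // Models: SELECT TOP n * FROM events WHERE deviceId = X ORDER BY timeStamp DESC
    List<Event> topNewest(String deviceId, int n) {
        List<Event> result = new ArrayList<>();
        NavigableMap<Long, Event> perDevice = index.get(deviceId);
        if (perDevice == null) {
            return result;
        }
        // descendingMap() walks newest first; stop after n entries,
        // never touching other devices' data.
        for (Event e : perDevice.descendingMap().values()) {
            if (result.size() == n) {
                break;
            }
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        DeviceIndexModel m = new DeviceIndexModel();
        for (long t = 1; t <= 10; t++) {
            m.insert(new Event("dev-A", t, "a" + t));
            m.insert(new Event("dev-B", t, "b" + t));
        }
        System.out.println(m.topNewest("dev-A", 5));
    }
}
```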
In Azure Cosmos DB (SQL API) the following query charges 9356.66 RUs:
SELECT * FROM Core c WHERE c.id = @id -- @id is a GUID
In contrast, the following more complex query charges only 6.84 RUs:
SELECT TOP 10 * FROM Core c WHERE c.type = "Agent"
The documents in both examples are small, with a handful of attributes each. The collection does not use a custom indexing policy and contains 105685 documents.
To me this sounds as if there is no properly working index on the "id" field in place.
How is this possible and how can this be fixed?
Updates:
Without the TOP keyword, the second query charges 3516.35 RUs and returns 100000 records.
The partition key is "/partition" and its values are 0 or 1 (evenly distributed).
If you have a partitioned collection, you need to specify the partition key to make the request as efficient as possible. Cross-partition queries are much more expensive (and slower) in Cosmos DB, because data for different partitions can be stored in different places.
Try the following:
SELECT * FROM Core c WHERE c.id = @id AND c.partition = @partition
Or specify the partition key in the feed options if you're using the Cosmos DB SDK.
Let me know if this helps.
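To see why the partition key matters for cost, here is a toy model (hypothetical names, not the SDK) of the two Core partitions: a lookup that carries the partition key contacts one partition, while one without it has to ask every partition. It also reflects that id is only unique within a partition key value, so a fan-out query can legitimately match more than one document.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionRouting {
    // Hypothetical document shape: an id plus the value of /partition (0 or 1).
    record Doc(String id, int partition) {}

    // partition key value -> documents stored under it
    private final Map<Integer, List<Doc>> partitions;
    int partitionsContacted = 0;

    PartitionRouting(Map<Integer, List<Doc>> partitions) {
        this.partitions = partitions;
    }

    // Models SELECT * FROM Core c WHERE c.id = @id [AND c.partition = @partition].
    // A null partitionKey means the query cannot be routed and must fan out.
    List<Doc> findById(String id, Integer partitionKey) {
        Collection<Integer> targets =
            partitionKey != null ? List.of(partitionKey) : partitions.keySet();
        List<Doc> hits = new ArrayList<>();
        for (Integer p : targets) {
            partitionsContacted++; // every contacted partition adds latency and RUs
            for (Doc d : partitions.getOrDefault(p, List.of())) {
                if (d.id().equals(id)) {
                    hits.add(d);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, List<Doc>> data = new HashMap<>();
        data.put(0, List.of(new Doc("a", 0)));
        data.put(1, List.of(new Doc("b", 1)));
        PartitionRouting routed = new PartitionRouting(data);
        routed.findById("b", 1);
        PartitionRouting fanOut = new PartitionRouting(data);
        fanOut.findById("b", null);
        System.out.println(routed.partitionsContacted + " vs " + fanOut.partitionsContacted);
    }
}
```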
I assume the solution is the same as posted here:
Azure DocumentDB Query by Id is very slow
I will close my own question once I am able to verify this with Microsoft Support.
Team,
I have a DynamoDB table with a hash key (userid) and a sort key (age). Let's say we want to retrieve, for each hash key (userid), the item with the smallest age. What would the query and filter expression be for the DynamoDB query?
Thanks!
I don't think you can do it in a single query. You would need to do a full table scan. If you have a list of hash keys somewhere, then you can do N queries (in parallel) instead.
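A minimal sketch of the "N queries in parallel" idea, assuming you already know the user ids: each task stands in for one Query on a single hash key, reading items in ascending sort-key order and keeping only the first one (the smallest age). The in-memory table and names are hypothetical; against DynamoDB each task would be a real Query call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelMinQueries {
    // table: userId (hash key) -> that user's items ordered by age (sort key).
    // The table is only read here, so concurrent reads are safe.
    static Map<String, Integer> smallestAgePerUser(
            Map<String, NavigableMap<Integer, String>> table, List<String> userIds) {
        Map<String, Integer> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // One task per known hash key, standing in for one Query each.
            CompletableFuture<?>[] queries = userIds.stream()
                .map(userId -> CompletableFuture.runAsync(() -> {
                    NavigableMap<Integer, String> items = table.get(userId);
                    // Ascending sort-key order with a limit of 1: the first key is the smallest age.
                    if (items != null && !items.isEmpty()) {
                        result.put(userId, items.firstKey());
                    }
                }, pool))
                .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(queries).join(); // wait for every query to finish
        } finally {
            pool.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, NavigableMap<Integer, String>> table = new HashMap<>();
        table.put("u1", new TreeMap<>(Map.of(30, "a", 25, "b")));
        table.put("u2", new TreeMap<>(Map.of(41, "c")));
        System.out.println(smallestAgePerUser(table, List.of("u1", "u2", "u3")));
    }
}
```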
[Update] Here is another possible approach:
Maintain a second table that has just a hash key (userID). For each user, this table contains the record with the smallest age. To achieve that, every time you update the main table, also update the second one if the new age is less than the age currently stored there; a conditional update handles this. The update can be done either by the application itself or by an AWS Lambda function listening to the DynamoDB stream. If you then need the smallest age for each user, you still do a full table scan, but of the second table, which only contains relevant records, so it will be optimal.
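The conditional update in the second table boils down to "write only if there is no item yet, or if the new age is smaller than the stored one". In DynamoDB that would be an UpdateItem with a condition such as attribute_not_exists(MinAge) OR MinAge > :age (attribute names here are hypothetical), which DynamoDB evaluates atomically. A plain map has no single call for that exact condition, so the sketch below reads the current value and then writes with putIfAbsent/replace, which only succeed if nothing changed in between, retrying otherwise.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MinAgeTable {
    // Second table: userId -> smallest age seen so far (hypothetical MinAge attribute).
    private final ConcurrentMap<String, Integer> minAges = new ConcurrentHashMap<>();

    // Mirrors "create if missing OR overwrite only if the new age is smaller".
    // Returns true if the stored minimum changed.
    boolean recordAge(String userId, int age) {
        while (true) {
            Integer current = minAges.get(userId);
            if (current == null) {
                // Condition: no item for this user yet.
                if (minAges.putIfAbsent(userId, age) == null) {
                    return true;
                }
            } else if (age < current) {
                // Condition: stored value is still what we read, and it is larger.
                if (minAges.replace(userId, current, age)) {
                    return true;
                }
            } else {
                // Condition fails: the stored minimum is already smaller or equal.
                return false;
            }
            // Another writer changed the entry between our read and write; re-read.
        }
    }

    Integer minAge(String userId) {
        return minAges.get(userId);
    }

    public static void main(String[] args) {
        MinAgeTable t = new MinAgeTable();
        t.recordAge("u1", 30);
        t.recordAge("u1", 25);
        t.recordAge("u1", 40);
        System.out.println(t.minAge("u1"));
    }
}
```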
There are two ways to achieve that:
If you don't need this data in real time, you can export your data into other AWS systems, such as EMR or Redshift, and perform complex analytics queries there. With this you can write SQL expressions using joins and GROUP BY operators.
You can even perform EMR Hive queries on DynamoDB data, but they perform scans, so it's not very cost efficient.
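Once the data sits in a SQL-capable system, the "smallest age per user" question is a plain SELECT userId, MIN(age) ... GROUP BY userId. The same aggregation over exported rows, sketched with Java streams (the Row record is a hypothetical stand-in for one exported item):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MinAgeReport {
    // Hypothetical exported row: one item from the DynamoDB table.
    record Row(String userId, int age) {}

    // Equivalent of: SELECT userId, MIN(age) FROM exported_table GROUP BY userId
    static Map<String, Integer> minAgePerUser(List<Row> rows) {
        return rows.stream()
            // Key by userId; when a user appears again, keep the smaller age.
            .collect(Collectors.toMap(Row::userId, Row::age, Integer::min));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("u1", 30), new Row("u1", 25), new Row("u2", 41));
        System.out.println(minAgePerUser(rows));
    }
}
```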
Another option is to use DynamoDB Streams. You can maintain a separate table that stores:
Table: MinAges
UserId - primary key
MinAge - regular numeric attribute
On every insert/update/delete in the original table, you can query the minimum age for the affected user and store it in the MinAges table.
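A sketch of the stream-consumer logic, under simplifying assumptions: both tables are in-memory maps, and each stream record only says which user changed (INSERT/MODIFY/REMOVE match the DynamoDB Streams event names, but the StreamRecord type is hypothetical). Recomputing the minimum for the affected user on every event, rather than comparing against the stored value, is what lets this approach handle deletes correctly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MinAgesStreamHandler {
    enum EventType { INSERT, MODIFY, REMOVE }

    // Hypothetical, simplified stream record: which user's item changed and how.
    record StreamRecord(EventType type, String userId, int age) {}

    // Main table: userId (hash key) -> items keyed by age (sort key).
    final Map<String, NavigableMap<Integer, String>> mainTable = new HashMap<>();
    // MinAges table: UserId -> MinAge.
    final Map<String, Integer> minAges = new HashMap<>();

    // Applies the change to the main table, as the original write would.
    void write(StreamRecord r) {
        NavigableMap<Integer, String> items =
            mainTable.computeIfAbsent(r.userId(), k -> new TreeMap<>());
        if (r.type() == EventType.REMOVE) {
            items.remove(r.age());
        } else {
            items.put(r.age(), "item for " + r.userId());
        }
    }

    // What the stream consumer (e.g. a Lambda function) does for each record:
    // re-query the minimum age for the affected user and store it in MinAges.
    void onStreamRecord(StreamRecord r) {
        NavigableMap<Integer, String> items = mainTable.get(r.userId());
        if (items == null || items.isEmpty()) {
            minAges.remove(r.userId()); // the user's last item was deleted
        } else {
            minAges.put(r.userId(), items.firstKey()); // smallest sort key = smallest age
        }
    }

    // The write lands in the main table, then the stream delivers the record.
    void apply(StreamRecord r) {
        write(r);
        onStreamRecord(r);
    }

    public static void main(String[] args) {
        MinAgesStreamHandler h = new MinAgesStreamHandler();
        h.apply(new StreamRecord(EventType.INSERT, "u1", 30));
        h.apply(new StreamRecord(EventType.INSERT, "u1", 25));
        h.apply(new StreamRecord(EventType.REMOVE, "u1", 25));
        System.out.println(h.minAges);
    }
}
```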
Another option is to write something like this:
storeNewAge(userId, newAge)
def smallestAge = getSmallestAgeFor(userId)
storeSmallestAge(userId, smallestAge)
But since DynamoDB did not have native transaction support at the time, it's dangerous to run code like that: you may end up with inconsistent data. You can use the DynamoDB transactions library, but those transactions are expensive, whereas with streams you get consistent data at a very low price.
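To make the danger concrete, the sketch below replays one unlucky interleaving of two writers running the three-step code above, deterministically on a single thread. storeNewAge, getSmallestAgeFor and storeSmallestAge come from that pseudocode; the in-memory maps behind them are hypothetical. Writer A reads the minimum before writer B inserts a smaller age, then overwrites B's correct result with its stale value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class LostUpdateDemo {
    // Main table: ages stored per user. MinAges table: the derived minimum.
    final Map<String, TreeSet<Integer>> ages = new HashMap<>();
    final Map<String, Integer> minAges = new HashMap<>();

    void storeNewAge(String userId, int age) {
        ages.computeIfAbsent(userId, k -> new TreeSet<>()).add(age);
    }

    int getSmallestAgeFor(String userId) {
        return ages.get(userId).first();
    }

    void storeSmallestAge(String userId, int smallest) {
        minAges.put(userId, smallest);
    }

    // Replays one interleaving of writers A and B, step by step.
    static LostUpdateDemo replay() {
        LostUpdateDemo db = new LostUpdateDemo();
        db.storeNewAge("u1", 20);                 // A: store new age 20
        int seenByA = db.getSmallestAgeFor("u1"); // A: reads minimum 20
        db.storeNewAge("u1", 10);                 // B: store new age 10
        int seenByB = db.getSmallestAgeFor("u1"); // B: reads minimum 10
        db.storeSmallestAge("u1", seenByB);       // B: MinAges = 10 (correct)
        db.storeSmallestAge("u1", seenByA);       // A: MinAges = 20 (stale, lost update)
        return db;
    }

    public static void main(String[] args) {
        LostUpdateDemo db = replay();
        System.out.println("stored=" + db.minAges.get("u1")
            + " actual=" + db.getSmallestAgeFor("u1"));
    }
}
```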
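For a single hash key you can do it using ScanIndexForward: query that user's items in ascending sort-key order and limit the result to one item, which is the smallest age.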
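import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBQueryExpression;
import com.amazonaws.services.dynamodbv2.datamodeling.QueryResultPage;
// Assumes YourEntity is your @DynamoDBTable class and mapper is a DynamoDBMapper
YourEntity requestEntity = new YourEntity();
requestEntity.setHashKey(hashkey);
DynamoDBQueryExpression<YourEntity> queryExpression = new DynamoDBQueryExpression<YourEntity>()
.withHashKeyValues(requestEntity)
.withConsistentRead(false);
queryExpression.setIndexName(IndexName); // only if you are querying a secondary index
queryExpression.setScanIndexForward(true); // ascending sort key: smallest age first
queryExpression.setLimit(1);
// queryPage returns a single page, so only the first (smallest) item is fetched
QueryResultPage<YourEntity> page = mapper.queryPage(YourEntity.class, queryExpression);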
I'm planning on using Cosmos DB (DocumentDB) and I'm trying to understand how queries, indexing and partitions relate to each other.
"How to partition and scale in Azure Cosmos DB" talks about the partition key, and other documentation indicates that partition key + id = unique id for the document. But then "SQL Query and SQL syntax in Azure Cosmos DB" says it provides automatic indexing of JSON documents without requiring an explicit schema or the creation of secondary indexes.
I understand that the partition key is important for scalability and for how data is stored. But when it comes to searching, is the partition key just a kind of extra filter/WHERE clause? All the documents are indexed, so I can execute a query like:
SELECT *
FROM Families
WHERE Families.address.state = "NY"
Should I still specify the partition key, or somehow indicate that cross-partition queries are allowed, when using this SQL query syntax?
Your first link gives the answer for this:
For partitioned collections, you can use PartitionKey to run the query against a single partition (though Cosmos DB can automatically extract this from the query text), and EnableCrossPartitionQuery to run queries that may need to be run against multiple partitions.
So yes: you either need a WHERE clause on the partition key, which makes the query run against a single partition, or you need to set EnableCrossPartitionQuery to true in the query options.
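The rule can be read as a small routing decision. The sketch below is a simplified, hypothetical model (WHERE filters reduced to a map of equality conditions, names invented for illustration): a query pinned to the partition key, either explicitly or through its WHERE clause, goes to one partition; anything else must fan out, which the SDK only allowed when EnableCrossPartitionQuery was true.

```java
import java.util.Map;

public class QueryRouting {
    enum Route { SINGLE_PARTITION, CROSS_PARTITION }

    // filters: property path -> equality value from the WHERE clause (simplified).
    // partitionKeyPath: the property the collection is partitioned on (hypothetical).
    static Route route(Map<String, Object> filters, String partitionKeyPath,
                       Object explicitPartitionKey, boolean enableCrossPartitionQuery) {
        // An explicit key, or an equality filter on the key, pins the query to one partition.
        if (explicitPartitionKey != null || filters.containsKey(partitionKeyPath)) {
            return Route.SINGLE_PARTITION;
        }
        if (!enableCrossPartitionQuery) {
            // Models the older behavior: fan-out rejected unless the option was set.
            throw new IllegalStateException("Cross-partition query required but not enabled");
        }
        return Route.CROSS_PARTITION;
    }

    public static void main(String[] args) {
        // WHERE Families.address.state = "NY" on a collection partitioned by address.state
        System.out.println(route(Map.of("address.state", "NY"), "address.state", null, false));
        // Same filter, but the collection is partitioned by id
        System.out.println(route(Map.of("address.state", "NY"), "id", null, true));
    }
}
```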
You don't have to do that anymore: EnableCrossPartitionQuery is set to true by default nowadays. This means Cosmos DB won't complain if you omit the partition key from your query.
You don't need to specify a partition key in the query. Recent versions enable cross-partition queries by default.
We have created a partitioned collection in Azure DocumentDB with MyID as the partition key and populated it with data, but when we try to run a query with ORDER BY, it fails.
Sample query:
SELECT *
FROM Families f
JOIN c IN f.children
WHERE f.MyId = 123
ORDER BY f.address.city ASC
Any suggestions would be appreciated.
DocumentDB supports cross-partition ORDER BY only as a preview feature. You need to email askdocdb@microsoft.com to get access to it.
DocumentDB supports cross-partition ORDER BY as of SDK 1.9.0.
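Conceptually, a cross-partition ORDER BY has each partition return its results already sorted and then merges those sorted streams. As a mental model of that merge step only (not the SDK's actual code), here is a k-way merge with a PriorityQueue over per-partition lists of city names, each assumed to be sorted ascending already:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CrossPartitionOrderBy {
    // Cursor into one partition's sorted result list.
    private record Cursor(List<String> results, int position) {
        String head() {
            return results.get(position);
        }
    }

    // Merges per-partition results (each sorted ascending) into one ascending list.
    static List<String> mergeSorted(List<List<String>> perPartition) {
        PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>(Comparator.comparing(Cursor::head));
        for (List<String> partition : perPartition) {
            if (!partition.isEmpty()) {
                heap.add(new Cursor(partition, 0));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // partition whose next row sorts first
            merged.add(c.head());
            // Advance that partition's cursor and re-queue it if it has more rows.
            if (c.position() + 1 < c.results().size()) {
                heap.add(new Cursor(c.results(), c.position() + 1));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(mergeSorted(List.of(
            List.of("Austin", "Denver"),
            List.of("Boston", "Chicago", "Seattle"),
            List.of())));
    }
}
```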