1) I have a Cosmos DB collection with about 500k documents and which is Partitioned by a property "SITEID". In the Query Request Options only one partition key value can be passed. In my case I have queries where the SITEID in (1,2,3,4) needs to be executed where SiteID is the partition key.
For example, my SP is as follows:
SELECT * FROM c WHERE c.SITEID IN
("SiteId1","SiteId2","SiteId3","SiteId4","SiteId5")
AND c.STATUS IN ("Status1","Status2","Status3","Status4")
I am Calling the above SP using the below SQL API code.
await client.ExecuteStoredProcedureAsync<string>(UriFactory.CreateStoredProcedureUri("DBName", "CollectionName", "Sample"),new RequestOptions { PartitionKey = new PartitionKey("SiteId1") })
In the above SQL API Code, PartitionKey property only supports a Single value. Where I need to pass several partition values. Is there any other options to do this?
2) "EnableCrossPartitionQuery" property is only availbale in the FeedOptions but not in the Request Options class. Client.ExecuteStoredProcedureAsync only supports the RequestOptions parameter not FeedOptions. Now I need to execute a Stored Procedure at once and across all partitions. Is there any other options to pass EnableCrossPartitionQuery in ExecuteStoredProcedureAsync method.
E.g)
client.CreateDocumentQuery<Doc>(UriFactory.CreateDocumentCollectionUri("DBName", "CollectionName"), "select * from c", new FeedOptions { EnableCrossPartitionQuery = true }).ToList()
await client.ExecuteStoredProcedureAsync<string>(UriFactory.CreateStoredProcedureUri("DBName", "CollectionName", "Sample"),new RequestOptions { PartitionKey = new PartitionKey("WGC") })
Stored procedures can only be executed against a single partition. There is nothing you can do about that.
They are not considered a query that returns a feed but a request that could return a response of any type. That's they they don't used the FeedOptions but rather the RequestOptions.
You can still execute your query as a normal document query and set the EnableCrossPartitionQuery to true. Cosmos should recognise the partition key in the query and should limit the requests to the specific partition key values.
I say should because this answer suggests that this is the case but there are some comments that say otherwise. I would suggest you check your metrics regarding the amount of collection hits.
Related
How can I get items count for a particular partition key using .net core preferably using Object Persistence Interface or Document Interfaces?
Since I do not see any docs any where, currently I get the number of items count by retrieve all the item and get its count, but it is very expensive to do the reads.
What is the best practices for such item count request? Thank you.
dynamodb is mostly a document oriented key-value db; so its not optimized for functionality of the common relation db functions (like item count).
to minimize the data that is transmitted and to improve speed you may want to do the following:
Create Lambda Function that returns Item Count
To avoid transmitting data outside of AWS; which is slow and expensive.
query options
use only keys in your projection-expression,
reducing the data that is transmitted from db
max page-size, reducing number of calls needed
Stream Option
Streams could also be used for keeping counts; e.g. as described in
https://medium.com/signiant-engineering/real-time-aggregation-with-dynamodb-streams-f93547cfb244
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-aggregation.html
Related SO Question
Complexity of finding total records count with partition key in nosql dynamodb table?
I just realized that using low level interface in QueryRequest one can set Select = "COUNT" then when calling QueryAsync() orQuery() will return the count only as a integer only. Please refer to code sample below.
private static QueryRequest getStockRecordCountQueryRequest(string tickerSymbol, string prefix)
{
string partitionName = ":v_PartitionKeyName";
string sortKeyPrefix = ":v_sortKeyPrefix";
var request = new QueryRequest
{
TableName = Constants.TableName,
ReturnConsumedCapacity = ReturnConsumedCapacity.TOTAL,
Select = "COUNT",
KeyConditionExpression = $"{Constants.PartitionKeyName} = {partitionName} and begins_with({Constants.SortKeyName},{sortKeyPrefix})",
ExpressionAttributeValues = new Dictionary<string, AttributeValue>
{
{ $"{partitionName}", new AttributeValue {
S = tickerSymbol
}},
{ $"{sortKeyPrefix}", new AttributeValue {
S = prefix
}}
},
// Optional parameter.
ConsistentRead = false,
ExclusiveStartKey = null,
};
return request;
}
but I would like to point out that this still will consumed the same read units as retrieving all the item and get its count by yourself. but since it is only returning the count as an integer, it is a lot more efficient then transmitting the entire items list cross the wire.
I think using DynamoDB Streams in a more proper way to get the counts for large project. It is just a lot more complicated to implement.
I want to find out details about a Gremlim query - so I set the PopulateQueryMetrics property of the FeedOptions argument to true.
But the FeedResponse object I get back doesn't have the QueryMetrics property populated.
var queryString = $"g.addV('{d.type}').property('id', '{d.Id}')";
var query = client.CreateGremlinQuery<dynamic>(graphCollection, queryString,
new FeedOptions {
PopulateQueryMetrics = true
});
while (query.HasMoreResults)
{
FeedResponse<dynamic> response = await query.ExecuteNextAsync();
//response.QueryMetrics is null
}
Am I missing something?
According to your description, I created my Azure Cosmos DB account with Gremlin (graph) API, and I could encounter the same issue as you mentioned. I found a tutorial Monitoring and debugging with metrics in Azure Cosmos DB and read the Debugging why queries are running slow section as follows:
In the SQL API SDKs, Azure Cosmos DB provides query execution statistics.
IDocumentQuery<dynamic> query = client.CreateDocumentQuery(
UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName),
“SELECT * FROM c WHERE c.city = ‘Seattle’”,
new FeedOptions
{
PopulateQueryMetrics = true,
MaxItemCount = -1,
MaxDegreeOfParallelism = -1,
EnableCrossPartitionQuery = true
}).AsDocumentQuery();
FeedResponse<dynamic> result = await query.ExecuteNextAsync();
// Returns metrics by partition key range Id
IReadOnlyDictionary<string, QueryMetrics> metrics = result.QueryMetrics;
Then, I queried my Cosmos DB Gremlin (graph) account via the SQL API above, I retrieved the QueryMetrics as follows:
Note: I also checked that you could specify the SQL expression like this SELECT * FROM c where c.id='thomas' and c.label='person'. For adding new Vertex, I do not know how to construct the SQL expression. Moreover, the CreateDocumentAsync method does not support the FeedOptions parameter.
Per my understanding, the PopulateQueryMetrics setting may only work when using the SQL API. You could add your feedback here.
I am looking to get the count of all documents in a chosen partition. The following code however will return the count of all documents in the collection and costs 0 RU.
var collectionLink = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
string command = "SELECT VALUE COUNT(1) FROM Collection c";
FeedOptions feedOptions = new FeedOptions()
{
PartitionKey = new PartitionKey(BuildPartitionKey(contextName, domainName)),
EnableCrossPartitionQuery = false
};
var count = client.CreateDocumentQuery<int>(collectionLink, command, feedOptions)
.ToList()
.First();
adding a WHERE c.partition = 'blah' clause to the query will work, but costs 3.71 RUs with 11 documents in the collection.
Why would the above code snippet return the Count of the whole Collection and is there a better solution to for getting the count of all documents in a chosen partition?
If the query includes a filter against the partition key, like SELECT
* FROM c WHERE c.city = "Seattle", it is routed to a single partition. If the query does not have a filter on partition key, then it is
executed in all partitions, and results are merged client side.
You could check the logical steps the SDK performs from this official doc when we issue a query to Azure Cosmos DB.
If the query is an aggregation like COUNT, the counts from individual
partitions are summed to produce the overall count.
So when you just use SELECT VALUE COUNT(1) FROM Collection c, it is executed in all partitions and results are merged client side.
If you want to get the count of all documents in a chosen partition, you just add the where c.partition = 'XX' filter.
Hope it helps you.
I believe this is actually a bug since I am having the same problem with the partition key set in both the query and the FeedOptions.
A similar issue has been reported here:
https://github.com/Azure/azure-cosmos-dotnet-v2/issues/543
And Microsoft's response makes it sound like it is an SDK issue that is x64-specific.
Imagine the following function which is querying a GlobalSecondaryIndex and associated Range Key in order to find a limited number of results:
#Override
public List<Statement> getAllStatementsOlderThan(String userId, String startingDate, int limit) {
if(StringUtils.isNullOrEmpty(startingDate)) {
startingDate = UTC.now().toString();
}
LOG.info("Attempting to find all Statements older than ({})", startingDate);
Map<String, AttributeValue> eav = Maps.newHashMap();
eav.put(":userId", new AttributeValue().withS(userId));
eav.put(":receivedDate", new AttributeValue().withS(startingDate));
DynamoDBQueryExpression<Statement> queryExpression = new DynamoDBQueryExpression<Statement>()
.withKeyConditionExpression("userId = :userId and receivedDate < :receivedDate").withExpressionAttributeValues(eav)
.withIndexName("userId-index")
.withConsistentRead(false);
if(limit > 0) {
queryExpression.setLimit(limit);
}
List<Statement> statementResults = mapper.query(Statement.class, queryExpression);
LOG.info("Successfully retrieved ({}) values", statementResults.size());
return statementResults;
}
List<Statement> results = statementRepository.getAllStatementsOlderThan(userId, UTC.now().toString(), 5);
assertThat(results.size()).isEqualTo(5); // NEVER passes
The limit isn't respected whenever I query against the database. I always get back all results that match my search criteria so if I set the startingDate to now then I get every item in the database since they're all older than now.
You should use queryPage function instead of query.
From DynamoDBQueryExpression.setLimit documentation:
Sets the maximum number of items to retrieve in each service request
to DynamoDB.
Note that when calling DynamoDBMapper.query, multiple
requests are made to DynamoDB if needed to retrieve the entire result
set. Setting this will limit the number of items retrieved by each
request, NOT the total number of results that will be retrieved. Use
DynamoDBMapper.queryPage to retrieve a single page of items from
DynamoDB.
As they've rightly answered the setLimit or withLimit functions limit the number of records fetched only in each particular request and internally multiple requests take place to fetch the results.
If you want to limit the number of records fetched in all the requests then you might want to use "Scan".
Example for the same can be found here
How to query range key programmatically in DynamoDB, I am using .Net AWSSDK ,I am able to query on Hash key with below code :
GetItemRequest request = new GetItemRequest
{
TableName = tableName
};
request.Key = new Dictionary<string,AttributeValue>();
request.Key.Add("ID",new AttributeValue { S = PKValue });
GetItemResponse response = client.GetItem(request);
Please suggest,
Thanks in advance.
There are two kinds of primary key in DynamoDB: Hash-only or Hash-Range.
In the above code I guess your table is Hash-only and you use the hash key to retrieve an element with hashkey equals to PKValue.
If your table is in H-R schema and you want to retrieve a specific element with a hashKey and rangeKey, you can reuse the above code and in addition, add the {"RangeKey", new AttributeValue } into your your request.KEY
On the other hand, query means a different thing in DynamoDB. Query will return you a list of rows sorted in some order.