Until now, our Cosmos DB account used non-partitioned collections. We recently moved our app data to partitioned collections to overcome the 10 GB cap on a single partition.
We noticed a few things right after introducing partitions:
The ResourceResponse.ContentLocation property returns null. (With non-partitioned collections it usually returns the collection path, such as "dbs/developmentdb/colls/accountmodel".)
For "GetAll" query (provided same data maintained in both partitioned and non partitioned collections)
RUs went up (from 400RUs to 750RUs)
Slower response time
The code we use is included below for reference. I would appreciate any suggestions for reducing RUs and improving response time. Or are these simply overheads of moving to a partitioned collection?
Sample code used:
var docClient = await _documentClient;
var docDb = await _documentDatabase;
var docCollection = await _documentCollection;
var queryFeed = new FeedOptions()
{
MaxItemCount = -1,
MaxDegreeOfParallelism = -1,
EnableCrossPartitionQuery = true
};
var documentCollectionUri = UriFactory.CreateDocumentCollectionUri(docDb.Id, docCollection.Id);
IDocumentQuery<T> query = docClient.CreateDocumentQuery<T>(documentCollectionUri, queryFeed).AsDocumentQuery();
while (query.HasMoreResults)
{
var page = await query.ExecuteNextAsync<T>();
result.AddRange(page);
_rULogHelper.LogFromFeedResponse(page, docDb.Id, docCollection.Id, DBOperationType.GET.ToString()); //custom logging related code
}
I read an article about IAsyncEnumerable, specifically about using it with a Cosmos DB data source:
public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
var container = GetContainer(containerName);
using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery);
while (iterator.HasMoreResults)
{
foreach (var item in await iterator.ReadNextAsync())
{
yield return item;
}
}
}
I am wondering how Cosmos DB handles this compared to paging, let's say 100 documents at a time. We have had some "429 - Request rate too large" errors in the past and I don't want to cause new ones.
So, how will this affect server load and performance?
From the server's perspective, I don't see a big difference between the client streaming the items (and doing some quick checks on each) and the old way of looping with while (iterator.HasMoreResults) and collecting all the items in a list.
The SDK retrieves documents in batches. You can adjust the batch size through QueryRequestOptions by changing MaxItemCount (which defaults to 100 if not set). It has no option to throttle RU usage, though: the only built-in behavior is to hit the 429 error and let the SDK's retry mechanism try again a little later. Depending on how generously you configure the retries, it will retry often and long enough to get a proper response.
If you want to limit RU usage, for example because multiple processes share the same Cosmos DB account and you don't want them to run into 429 errors, you have to write that logic yourself.
An example of how something like that could look:
var qry = container
.GetItemLinqQueryable<Item>(requestOptions: new() { MaxItemCount = 2000 })
.ToFeedIterator();
var results = new List<Item>();
var stopwatch = new Stopwatch();
var targetRuMsRate = 200d / 1000; //target 200RU/s
var previousElapsed = 0L;
var delay = 0;
stopwatch.Start();
var totalCharge = 0d;
while (qry.HasMoreResults)
{
if (delay > 0)
{
await Task.Delay(delay);
}
previousElapsed = stopwatch.ElapsedMilliseconds;
var response = await qry.ReadNextAsync();
var charge = response.RequestCharge;
var elapsed = stopwatch.ElapsedMilliseconds;
var delta = elapsed - previousElapsed;
delay = (int) ((charge - targetRuMsRate * delta) / targetRuMsRate);
foreach (var item in response)
{
results.Add(item);
}
}
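To make the delay calculation in the loop above easier to follow, here is a minimal sketch of just the arithmetic, pulled out into a helper. `ThrottleMath` and `DelayMs` are hypothetical names used for illustration and are not part of the SDK; the figures in the note below are made-up inputs, not measurements.

```csharp
using System;

public static class ThrottleMath
{
    // chargeRu:      RUs reported for the last page (response.RequestCharge).
    // elapsedMs:     how long that request took, in milliseconds.
    // targetRuPerMs: the RU budget per millisecond (e.g. 250 RU/s = 0.25 RU/ms).
    public static int DelayMs(double chargeRu, long elapsedMs, double targetRuPerMs)
    {
        // RUs spent beyond what the budget already allowed while the request ran.
        double excessRu = chargeRu - targetRuPerMs * elapsedMs;

        // Time the budget needs to catch up with the excess; never negative.
        return Math.Max(0, (int)(excessRu / targetRuPerMs));
    }
}
```

With a hypothetical budget of 0.25 RU/ms, a page that cost 50 RU and took 100 ms yields `ThrottleMath.DelayMs(50, 100, 0.25) == 100`, so the loop would wait 100 ms before the next page. A page that cost less than the budget allowed yields 0, which is why the loop only waits when the delay is positive.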
Edit:
Internally the SDK calls the underlying Cosmos DB REST API. Once your code reaches iterator.ReadNextAsync(), it calls the query documents method in the background. If you dig into the source code or intercept the message sent to HttpClient, you can see that the resulting request lacks the x-ms-max-item-count header, which determines the number of documents it will try to retrieve (unless you have specified a MaxItemCount yourself). According to the Microsoft Docs, it defaults to 100 if not set:
Query requests support pagination through the x-ms-max-item-count and x-ms-continuation request headers. The x-ms-max-item-count header specifies the maximum number of values that can be returned by the query execution. This can be between 1 and 1000, and is configured with a default of 100.
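As a rough illustration of what the page size means for a large result set, the sketch below computes the minimum number of round trips for a given maximum page size. `PagingMath` and `MinRoundTrips` are hypothetical names, not SDK members, and the result is only a lower bound: a page may contain fewer items than the maximum, so the real number of requests can be higher.

```csharp
using System;

public static class PagingMath
{
    // Smallest possible number of page requests needed to read
    // totalDocuments when each page holds at most maxItemCount items.
    public static int MinRoundTrips(int totalDocuments, int maxItemCount)
    {
        if (totalDocuments < 0)
            throw new ArgumentOutOfRangeException(nameof(totalDocuments));
        if (maxItemCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxItemCount));

        // Integer ceiling division.
        return (totalDocuments + maxItemCount - 1) / maxItemCount;
    }
}
```

For example, `PagingMath.MinRoundTrips(25000, 100)` is 250, while `PagingMath.MinRoundTrips(25000, 1000)` is 25. The helper does not model the special value -1, which leaves the page size up to the service.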
I have a query that returns 25000 documents from Cosmos DB.
Currently I do it like:
FeedOptions feedOptions = new FeedOptions();
feedOptions.MaxItemCount = -1;
feedOptions.EnableCrossPartitionQuery = true;
var result = new List<ProductDocument>();
var documentQuery = _client.CreateDocumentQuery<ProductDocument>(UriFactory.CreateDocumentCollectionUri(_databaseName, _collectionName), feedOptions)
.Where(p => p.CompanyId == companyId);
However, this fetches the documents in batches of around 375, so it takes a while.
What I would like to do is fetch 10 pages at a time (so I would get around 3750 documents each time). I don't mind increasing the RUs to support this, but I can't find a way to achieve it.
Is it possible? Or is there another way to return a lot of documents efficiently?
I found some samples stating that the following should perform bulk inserts:
var options = new CosmosClientOptions() { AllowBulkExecution = true, MaxRetryAttemptsOnRateLimitedRequests = 1000 };
Client = new CosmosClient(ConnStr, options);
public async Task AddVesselsFromJSON(List<JObject> vessels)
{
List<Task> concurrentTasks = new List<Task>();
foreach (var vessel in vessels)
{
concurrentTasks.Add(VesselContainer.UpsertItemAsync(vessel));
}
await Task.WhenAll(concurrentTasks);
}
I am running the code on an Azure Function (App Service plan) with 10 instances. However, I can see it only does around 4 inserts per second. With SQL bulk insert I can do thousands per second. The code above does not seem to be bulk inserting. Have I missed something?
Check your Cosmos DB Scale settings. I ran into the same issue. When you change from manual to autoscale, the default max RUs are set to 4000. Change it to an appropriate number based on your scenario. You can use https://cosmos.azure.com/capacitycalculator/
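To see how the provisioned throughput caps the insert rate, here is a back-of-the-envelope sketch. `ThroughputMath` and `MaxWritesPerSecond` are hypothetical names, and the per-write cost used in the note below is an assumed figure: the real charge depends on document size and indexing and can be read from the RequestCharge of each write response.

```csharp
using System;

public static class ThroughputMath
{
    // Upper bound on writes per second for a given RU/s budget,
    // assuming every write costs the same number of RUs.
    public static double MaxWritesPerSecond(double provisionedRuPerSecond, double ruPerWrite)
    {
        if (ruPerWrite <= 0)
            throw new ArgumentOutOfRangeException(nameof(ruPerWrite));

        return provisionedRuPerSecond / ruPerWrite;
    }
}
```

For example, with 4000 RU/s and an assumed cost of 10 RU per upsert, `ThroughputMath.MaxWritesPerSecond(4000, 10)` gives 400 writes per second at best. If the observed rate is far below that bound, the bottleneck is likely somewhere other than the RU limit.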
I get an error when I try to use BulkExecutor to update one of the properties in Cosmos DB. The error message is:
"Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index"
Important point: I don't have a partition key defined on my collection.
Here is my code:
SetUpdateOperation<string> player1NameUpdateOperation = new SetUpdateOperation<string>("Player1Name", name);
var updateOperations = new List<UpdateOperation>();
updateOperations.Add(player1NameUpdateOperation);
var updateItems = new List<UpdateItem>();
foreach (var match in list)
{
string id = match.id;
updateItems.Add(new UpdateItem(id, null, updateOperations));
}
var executor = new Microsoft.Azure.CosmosDB.BulkExecutor.BulkExecutor(_client, _collection);
await executor.InitializeAsync();
var executeResult = await executor.BulkUpdateAsync(updateItems);
var count = executeResult.NumberOfDocumentsUpdated;
What am I missing?
If I run the bulk executor on a collection without a partition key, I get the same error. If I run it on a collection that does have one and I specify it, the bulk executor works fine.
I'm pretty sure the bulk executor API just doesn't support this right now. As a workaround, use the normal Cosmos DB API to update the documents.
I want to find out details about a Gremlin query, so I set the PopulateQueryMetrics property of the FeedOptions argument to true.
But the FeedResponse object I get back doesn't have the QueryMetrics property populated.
var queryString = $"g.addV('{d.type}').property('id', '{d.Id}')";
var query = client.CreateGremlinQuery<dynamic>(graphCollection, queryString,
new FeedOptions {
PopulateQueryMetrics = true
});
while (query.HasMoreResults)
{
FeedResponse<dynamic> response = await query.ExecuteNextAsync();
//response.QueryMetrics is null
}
Am I missing something?
Following your description, I created an Azure Cosmos DB account with the Gremlin (graph) API and could reproduce the issue you mentioned. I found the tutorial "Monitoring and debugging with metrics in Azure Cosmos DB"; its section "Debugging why queries are running slow" says the following:
In the SQL API SDKs, Azure Cosmos DB provides query execution statistics.
IDocumentQuery<dynamic> query = client.CreateDocumentQuery(
UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName),
"SELECT * FROM c WHERE c.city = 'Seattle'",
new FeedOptions
{
PopulateQueryMetrics = true,
MaxItemCount = -1,
MaxDegreeOfParallelism = -1,
EnableCrossPartitionQuery = true
}).AsDocumentQuery();
FeedResponse<dynamic> result = await query.ExecuteNextAsync();
// Returns metrics by partition key range Id
IReadOnlyDictionary<string, QueryMetrics> metrics = result.QueryMetrics;
I then queried my Cosmos DB Gremlin (graph) account via the SQL API as above and was able to retrieve the QueryMetrics.
Note: I also checked that you can specify a SQL expression such as SELECT * FROM c where c.id='thomas' and c.label='person'. I do not know how to construct a SQL expression for adding a new vertex. Moreover, the CreateDocumentAsync method does not accept a FeedOptions parameter.
Per my understanding, the PopulateQueryMetrics setting may only work when using the SQL API. You could add your feedback here.