I have a Cosmos DB collection with an id field and a partition key, ManagerName.
I run these two queries:
SELECT * FROM c
where c.id = '76e24380-71cb-45d5-807a-ce2374f57624' and c.ManagerName ='Darin Jast2'
SELECT * FROM c
where c.id = '76e24380-71cb-45d5-807a-ce2374f57624'
In Data Explorer the RU charges look strange: the first query costs 3.070 RUs and the second costs 2.9 RUs, almost every time I run them.
This surprises me because, from what I have read, including the partition key in the WHERE clause makes the query run against a single partition.
Stranger still, when I run
SELECT * FROM c
where c.ManagerName ='Darin Jast2'
I also get 2.9 RUs. In fact, filtering on any single field gives the same number. The charge seems to depend on the number of WHERE conditions rather than on whether the partition key is present.
Can someone explain what is going on here and why I am getting these results? Does this have something to do with indexing, the size of the collection, or the number of partitions?
All the resources I found on Cosmos DB say you should include the partition key in your query and, where possible, run single-partition queries.
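One possible reading of these numbers, which nothing in this thread confirms: if the collection is small enough to sit on a single physical partition, a query without the partition key fans out to exactly one partition, the same one a partition-key query would hit, so the two would cost about the same. Below is a toy model of that routing. The hash, the modulo placement, and the function names are illustrative assumptions, not how Cosmos DB actually places data.

```python
import hashlib


def physical_partition(pk_value, n_physical):
    """Toy placement: hash the partition key value onto one of n partitions.

    Assumption for illustration only; Cosmos DB uses its own hashing scheme.
    """
    digest = hashlib.md5(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_physical


def partitions_visited(pk_value, n_physical):
    """Return the partitions a query touches in this toy model.

    pk_value=None models a query with no partition key filter.
    """
    if pk_value is None:
        return list(range(n_physical))  # fan out to every partition
    return [physical_partition(pk_value, n_physical)]


# With one physical partition, both kinds of query touch the same partition.
print(partitions_visited("Darin Jast2", 1), partitions_visited(None, 1))
# With several, only the query without the key fans out.
print(len(partitions_visited("Darin Jast2", 4)), len(partitions_visited(None, 4)))
```

If this reading applies, the benefit of the partition key filter would only show up once the collection spans more than one physical partition.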
I need to run a query to find all documents with duplicated e-mails.
SELECT * FROM (SELECT c.Email, COUNT(1) as cnt FROM c GROUP BY c.Email) a WHERE a.cnt > 1
When I run it in Data Explorer in the Azure Portal it returns 4 results, but that is not the complete list of duplicated emails. I already know of one duplicated email that is missing from the output: it is returned when I narrow the query (where email = 'x'). There are about 70 duplicated emails in the collection.
Throughput is currently set to autoscale with a maximum of 6000 RU/s, and the collection has about 4 million documents. While the query runs I see an increased number of 429 responses on this collection.
Query Statistics shows that all documents are retrieved from the collection, but output is only 4 (should be around 70).
The query used 277324 RUs and took 71 seconds, which averages about 3906 RU/s, so it shouldn't be throttled.
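For reference, the arithmetic behind that average can be written out as below, using only the figures quoted in this question. Note that an average over 71 seconds says nothing about individual seconds: a query can stay under the limit on average and still exceed it in bursts, and any other traffic on the collection draws from the same budget.

```python
total_rus = 277_324      # request charge reported for the query
duration_s = 71          # elapsed time in seconds
max_rus_per_s = 6_000    # autoscale maximum configured on the collection

average_rus_per_s = total_rus / duration_s
headroom = max_rus_per_s - average_rus_per_s

print(f"average: {average_rus_per_s:.0f} RU/s")
print(f"headroom below the maximum: {headroom:.0f} RU/s")
```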
Why does Cosmos DB return only a limited set of results for this query?
What can I do to get all duplicates?
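One fallback, sketched here as an assumption rather than a confirmed fix, is to do the grouping on the client: read only the Email property of every document (for example with SELECT c.Email FROM c, paging through all results) and count occurrences locally. The function name and the sample data below are made up for illustration, and loading the documents is left out. This does not explain why the GROUP BY query returns only 4 rows.

```python
from collections import Counter


def duplicated_emails(docs):
    """Return {email: count} for every email that appears more than once.

    `docs` is any iterable of dicts with an "Email" key, for example the
    rows streamed back from a projection query. Comparison is exact
    (case-sensitive), like the GROUP BY in the question.
    """
    counts = Counter(
        doc["Email"] for doc in docs if doc.get("Email") is not None
    )
    return {email: n for email, n in counts.items() if n > 1}


# Made-up sample rows standing in for the query results.
sample = [
    {"Email": "a@example.com"},
    {"Email": "b@example.com"},
    {"Email": "a@example.com"},
    {"Email": None},
]
print(duplicated_emails(sample))
```

Only the distinct email values are held in memory, so this should stay manageable for a few million documents, though the full read still costs RUs.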
Considering the following query:
SELECT TOP 1 * FROM c
WHERE c.Type = 'Case'
AND c.Entity.SomeField = @someValue
AND c.Entity.CreatedTimeUtc > @someTime
ORDER BY c.Entity.CreatedTimeUtc DESC
Until recently, when I ran this query, the number of documents it processed (RetrievedDocumentCount in the query metrics) equaled the number of documents satisfying the first two conditions, regardless of the CreatedTimeUtc filter or the TOP 1.
Only after I added a composite index on (Type DESC, Entity.SomeField DESC, Entity.CreatedTimeUtc DESC) and added all three fields to the ORDER BY clause did the retrieved document count drop to the number of documents satisfying all 3 conditions (still not the single document I expected, but better).
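For readers who want to reproduce that setup, here is a sketch of what the composite index and the matching ORDER BY clause might look like. The paths come from the query above; the variable names are illustrative, and the fragment would need to be merged into the container's existing indexing policy rather than used on its own.

```python
import json

# One composite index: the order of the paths and their sort directions
# must line up with the ORDER BY clause for the index to be usable.
composite_index = [
    {"path": "/Type", "order": "descending"},
    {"path": "/Entity/SomeField", "order": "descending"},
    {"path": "/Entity/CreatedTimeUtc", "order": "descending"},
]

# Fragment of an indexing policy; other keys of the policy are omitted.
indexing_policy_fragment = {"compositeIndexes": [composite_index]}

# Derive the ORDER BY clause that lists the same fields in the same order.
order_by = "ORDER BY " + ", ".join(
    "c." + entry["path"].strip("/").replace("/", ".") + " DESC"
    for entry in composite_index
)

print(json.dumps(indexing_policy_fragment, indent=2))
print(order_by)
```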
Then, a few days ago, we noticed in our dev environment that the composite index is no longer needed: the retrieved document count dropped to just one document (the number in the TOP, as expected), and the RU consumption dropped significantly.
My question: is this a new improvement/fix in Cosmos DB? I couldn't find any announcement or documentation on this matter.
If so, is the rollout complete or still in progress? We have several production instances in different regions.
Thanks
There have not been any recent changes to our query engine that would explain why this query is suddenly less expensive.
The only explanation is that fewer results match the filter than before, which allowed our query engine to apply an optimization that it could not have applied with a larger result set.
Thanks.
I'm confused about partition keys in Cosmos DB. I have a database/container with about 4000 small records. When I run a SQL statement that filters on my partition key, the RU charge and the duration are larger than when I filter on another property.
Can someone explain this?
In this sample, the partition key of the container is /partitionKey.
I tried this statement:
SELECT * FROM c where c.partitionKey = 'userSettings' And c.deleted =false
Request Charge 50 RUs
Document load time 2.15 ms
and then this
SELECT * FROM c where c.cosmosEntityName = 'userSettings' And c.deleted =false
Request Charge 5 RUs
Document load time 0.38 ms
I expected exactly the opposite results.
This question is very specific to the topology of your collection (which Azure support can help with), but generally speaking there are two cases where the latter query, on a non-partition-key property, can cost fewer RUs than the query on the partition key property:
1. If the query on the non-partition-key property is incomplete, the RUs may appear lower, but you still need to read results from the other partitions to confirm there are no more results. You would have to click "More Results" in Data Explorer until it is grayed out.
2. For this specific query, where c.partitionKey = 'userSettings' And c.deleted =false, you should compare RUs with and without a composite index on /partitionKey/? and /deleted/? (https://learn.microsoft.com/azure/cosmos-db/how-to-manage-indexing-policy#composite-indexing-policy-examples). In some cases you will get lower RUs with the composite index than with the default of /*, which only indexes the two properties individually, potentially close to ~5 RUs.
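To make the first case concrete: a fair comparison sums the request charge over every page of results, not just the first one. The sketch below assumes each page is available as a dict with "items" and "request_charge" keys. That shape, the function name, and the numbers are made up for illustration and are not taken from Data Explorer or the SDK.

```python
def total_charge(pages):
    """Sum the request charge and the item count over all result pages."""
    total_rus = 0.0
    total_items = 0
    for page in pages:
        total_rus += page["request_charge"]
        total_items += len(page["items"])
    return total_rus, total_items


# Made-up pages: the middle one returns no items but still costs RUs,
# as a partition with no matching documents might.
pages = [
    {"items": ["doc1", "doc2"], "request_charge": 2.5},
    {"items": [], "request_charge": 3.0},
    {"items": ["doc3", "doc4", "doc5"], "request_charge": 4.25},
]
print(total_charge(pages))
```

Comparing only the first page of one query against the full total of the other would make the first look cheaper than it really is.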
In Azure Cosmos DB (SQL API) the following query charges 9356.66 RUs:
SELECT * FROM Core c WHERE c.id = @id -- @id is a GUID
In contrast, the following more complex query charges only 6.84 RUs:
SELECT TOP 10 * FROM Core c WHERE c.type = "Agent"
The documents in both examples are pretty small, with only a handful of attributes. The collection does not use a custom indexing policy and contains 105685 documents.
To me this suggests that there is no properly working index on the "id" field.
How is this possible and how can this be fixed?
Updates:
Without the TOP keyword the second query charges 3516.35 RUs and returns 100000 records.
The partition key is "/partition" and its values are 0 or 1 (evenly distributed).
If you have a partitioned collection, you need to specify the partition key if you want the request to run as efficiently as possible. Cross-partition queries are really expensive (and slower) in Cosmos DB, because the partitions' data can be stored in different places.
Try the following:
SELECT * FROM Core c WHERE c.id = @id AND c.partition = @partition
Or specify the partition key in the feed options if you're using the Cosmos DB SDK.
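As a rough sketch of the SDK route in Python: the azure-cosmos container client accepts a partition_key argument on query_items, which keeps the query on one partition. This assumes that #id in the thread stands for a query parameter, which the SDK writes as @id. The find_by_id helper and the FakeContainer stand-in are hypothetical. The stand-in exists only so the sketch runs without an account; in real code you would pass a container client obtained from the SDK.

```python
def find_by_id(container, doc_id, partition):
    """Query one document by id, scoped to a single partition key value."""
    items = container.query_items(
        query="SELECT * FROM Core c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": doc_id}],
        partition_key=partition,  # keeps the query on one partition
    )
    return list(items)


class FakeContainer:
    """Stand-in so the sketch runs locally (not part of the SDK)."""

    def __init__(self, docs):
        self.docs = docs

    def query_items(self, query, parameters, partition_key):
        wanted = {p["name"]: p["value"] for p in parameters}["@id"]
        return (
            d for d in self.docs
            if d["id"] == wanted and d["partition"] == partition_key
        )


docs = [{"id": "a", "partition": 0}, {"id": "b", "partition": 1}]
container = FakeContainer(docs)
print(find_by_id(container, "b", 1))
```

When both the id and the partition key value are known, the SDK's read_item(item=..., partition_key=...) point read is usually worth trying as well, since it avoids the query engine altogether.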
Let me know if this helps.
I assume the solution is the same as posted here:
Azure DocumentDB Query by Id is very slow
I will close my own question once I am able to verify this with Microsoft Support.
I need to run a query that joins 5 large tables on user_id and filters on proc_date.
I plan to partition on proc_date and range-partition on user_id (5 range partitions) to improve query performance. I also keep a primary index on proc_date and user_id.
But how can I run the query for just one user_id partition at a time? I want to restrict the query so that it joins only the first user_id partition of every table.
The reason is that once the query completes for the first partition, I can send its output to the next process. While the next process is running, I can run the query for the second partition.
Could anyone please suggest a way to achieve this?
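To illustrate the flow I have in mind (this is only the orchestration, not the SQL that restricts a query to one partition): query partition 1, hand its output to the next process, and query partition 2 while that process is still running. In the sketch below, run_query and next_process are hypothetical placeholders for the real partition-restricted query and the downstream step.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(partitions, run_query, next_process):
    """Query partitions in order, overlapping each query with the
    downstream processing of the previous partition's output."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as downstream:
        pending = None
        for p in partitions:
            rows = run_query(p)  # runs while the previous output is processed
            if pending is not None:
                results.append(pending.result())  # wait for the previous step
            pending = downstream.submit(next_process, p, rows)
        if pending is not None:
            results.append(pending.result())
    return results


# Placeholder implementations so the sketch runs on its own.
def run_query(partition):
    return [partition * 10, partition * 10 + 1]


def next_process(partition, rows):
    return (partition, sum(rows))


print(run_pipeline(range(1, 6), run_query, next_process))
```

A single downstream worker keeps the outputs in partition order. Whether the overlap actually saves time depends on how long each step takes.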