Huge differences in performance between Order By ASC vs DESC - azure-cosmosdb

I am recording a huge difference between ORDER BY ASC and DESC in CosmosDB SQL API. ASC is almost 200x less expensive in RUs than DESC.
Here is output from my testing tool:
INFO ------------------- QUERY -----------------
SELECT TOP 100 * FROM root
WHERE
root.projectId = '783af8f2-8e2f-0083-5d86-2f60f34e11b4'
AND NOT ARRAY_CONTAINS(root.translatedLanguages, '0d42a87f-4d68-417b-99a9-a228cb63edce')
ORDER BY root._srtDue DESC
INFO ------------------- ROUND 1 -----------------
INFO Request Charge: 2532.53
INFO Count: 100
INFO Metrics: {
"TotalTime": "00:00:01.7036800",
"RetrievedDocumentCount": 38238,
"RetrievedDocumentSize": 236459696,
"OutputDocumentCount": 100,
"IndexHitRatio": 0.0,
"QueryPreparationTimes": {
"CompileTime": "00:00:00.0001900",
"LogicalPlanBuildTime": "00:00:00.0000700",
"PhysicalPlanBuildTime": "00:00:00.0000600",
"QueryOptimizationTime": "00:00:00.0000100"
},
"QueryEngineTimes": {
"IndexLookupTime": "00:00:00.0298500",
"DocumentLoadTime": "00:00:01.4093599",
"WriteOutputTime": "00:00:00.0001300",
"RuntimeExecutionTimes": {
"TotalTime": "00:00:00.2636001",
"SystemFunctionExecutionTime": "00:00:00.0132800",
"UserDefinedFunctionExecutionTime": "00:00:00"
}
},
"Retries": 0
}
vs
INFO ------------------- QUERY -----------------
SELECT TOP 100 * FROM root
WHERE
root.projectId = '783af8f2-8e2f-0083-5d86-2f60f34e11b4'
AND NOT ARRAY_CONTAINS(root.translatedLanguages, '0d42a87f-4d68-417b-99a9-a228cb63edce')
ORDER BY root._srtDue ASC
INFO ------------------- ROUND 1 -----------------
INFO Request Charge: 14.22
INFO Count: 100
INFO Metrics: {
"TotalTime": "00:00:00.0047500",
"RetrievedDocumentCount": 131,
"RetrievedDocumentSize": 187130,
"OutputDocumentCount": 100,
"IndexHitRatio": 0.75572519083969469,
"QueryPreparationTimes": {
"CompileTime": "00:00:00.0001300",
"LogicalPlanBuildTime": "00:00:00.0000700",
"PhysicalPlanBuildTime": "00:00:00.0000600",
"QueryOptimizationTime": "00:00:00.0000100"
},
"QueryEngineTimes": {
"IndexLookupTime": "00:00:00.0010400",
"DocumentLoadTime": "00:00:00.0020299",
"WriteOutputTime": "00:00:00.0002200",
"RuntimeExecutionTimes": {
"TotalTime": "00:00:00.0008301",
"SystemFunctionExecutionTime": "00:00:00.0000500",
"UserDefinedFunctionExecutionTime": "00:00:00"
}
},
"Retries": 0
}
I have not found how exactly is IndexHitRatio computed and how is Cosmos DB execution planned, but it looks to me, that in this particular case, it runs predicates against documents in specified order direction, and documents fulfilling those predicates are at the end of that sorting order so it has to read lots of documents, 38K, to get those TOP 100 output documents.
We believe we have all properly indexed:
path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
...
"path": "/translatedLanguages/[]/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
Unfortunately, this performance is not acceptable for our use case and if we don't solve it somehow we would be forced to change database engine.
Is there a way to tweak this execution plan somehow to improve performance?

Related

Cosmos DB Set Index Precision

Get a question related to setting precision using cosmos db sdk .NET v3.
Based on the documentation it says
This indexing policy is equivalent to the one below which manually sets kind, dataType, and precision to their default values. These properties are no longer necessary to explicitly set and you should omit them from your indexing policy entirely (as shown in above example). If you try to set these properties, they'll be automatically removed from your indexing policy.
{
"indexingMode": "consistent",
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}
],
"excludedPaths": [
{
"path": "/path/to/single/excluded/property/?"
},
{
"path": "/path/to/root/of/multiple/excluded/properties/*"
}
]
}
Does it mean any settings on precision will be ignored, or just when precision = -1.
Also is there any way to set precision in .NET SDK V3? (I cannot find any api for it)

Document DB and SQL operators

I am trying to run simple SQL query in Azure Document DB, here is how document look like:
As you can see I store coordinates as double. Now I attempt to run simple query just to test it SELECT * FROM locations WHERE locations.Latitude.CoordinateStart <= 50.123456 and this is not failing but it return 0 results:
For a little bit I thought well maybe I am wrong because I cannot use such a long decimal due to limitations, but if I change those to integer (multiply by 100 values), my coordinate would be 33644729 and my query would look for <= 50123456. In this case I am still not getting any results in query, getting 0. What is missing here?
EDIT:
Indexing policy looks like this
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
]
}
Those are default settings, I haven't touched them upon collection creation.
Try this query:
SELECT * FROM locations WHERE locations.Latitude[0].CoordinateStart <= 50.123456

How can I select a filtered child document collection when querying a top level document in cosmos db

I'm trying to filter the child documents returned when querying a parent document using the sql api in cosmos db.
For example given this document:
{
"customerName": "Wallace",
"customerReference": 666777,
"orders": [
{
"date": "20181105T00:00:00",
"amount": 118.84,
"description": "Laptop Battery"
},
{
"date": "20181105T00:00:00",
"amount": 81.27,
"description": "Toner"
},
{
"date": "20181105T00:00:00",
"amount": 55.12,
"description": "Business Cards"
},
{
"date": "20181105T00:00:00",
"amount": 281.00,
"description": "Espresso Machine"
}]
}
I would like to query the customer to retrieve the name, reference and orders over 100.00 to produce a results like this
[{
"customerName": "Wallace",
"customerReference": 666777,
"orders": [
{
"date": "20181105T00:00:00",
"amount": 118.84,
"description": "Laptop Battery"
},
{
"date": "20181105T00:00:00",
"amount": 281.00,
"description": "Espresso Machine"
}]
}]
the query I have so far is as follows
SELECT c.customerName, c.customerReference, c.orders
from c
where c.customerReference = 666777
and c.orders.amount > 100
this returns an empty set
[]
and if you remove "and c.orders.amount > 100" it matches the document and returns all orders.
To reproduce this issue I simply set up a new database, added a new collection and copied the json example in as the only document. The index policy is left as the default which I've copied below.
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": []
}
Cosmos DB doesn't support the deep filtering in the way I attempted in my original query.
To achieve the results described you need to use a subquery using a combination of ARRAY and VALUE as follows:
SELECT
c.customerName,
c.customerReference,
ARRAY(SELECT Value ord from ord in c.orders WHERE ord.amount > 100) orders
from c
where c.customerReference = 666777
note the use of 'ord' - 'order' is a reserved word.
The query then produces the correct result - eg
[{
"customerName": "Wallace",
"customerReference": 666777,
"orders": [
{
"date": "20181105T00:00:00",
"amount": 118.84,
"description": "Laptop Battery"
},
{
"date": "20181105T00:00:00",
"amount": 281.00,
"description": "Espresso Machine"
}
]
}]

Exclude Path in Azure Cosmos DB

What is the correct JSON to exclude certain keys from an input json to be not indexed by Azure CosmosDB. We are using the CosmosDB in mongodb mode. Was planning to change the index configuration on the Azure Portal after creating the collection.
Sample Input Json being
{
"name": "test",
"age": 1,
"location": "l1",
"height":5.7
}
If I were to include name and age in the index and remove location and height from the index, what does the includedPaths and excludedPaths look like.
Finally got it to work with the below spec:-
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [{
"path": "/*",
"indexes": [{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Hash",
"dataType": "String",
"precision": 3
}
]
}],
"excludedPaths": [{
"path": "/\"location\"/?"
},
{
"path": "/\"height\"/?"
}
]
}
It looks like underlying implementation has changed and at the time of writing the documentation does not cover changing indexPolicy in MongoDB flavoured CosmosBD. Because the documents are really stored in a wired way where all the keys start from root $v and all scalar fields are stored as documents, containing value and type information. So your document will be stored something like:
{
'_etag': '"2f00T0da-0000-0d00-0000-4cd987940000"',
'id': 'SDSDFASDFASFAFASDFASDF',
'_self': 'dbs/sMsxAA==/colls/dVsxAI8MBXI=/docs/sMsxAI8MBXIKAAAAAAAAAA==/',
'_rid': 'sMsxAI8MBXIKAAAAAAAAAA==',
'$t': 3,
'_attachments': 'attachments/',
'$v': {
'_id': {
'$t': 7,
'$v': "\\Ù\x87\x14\x01\x15['\x18m\x01ú"
},
'name': {
'$t': 2,
'$v': 'test'
},
'age': {
'$t': 1,
'$v': 1
},
...
},
'_ts': 1557759892
}
Therefore the indexingPolicy paths need to include the root $v and use /* (objects) instead of /? (scalars).
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number"
},
{
"kind": "Hash",
"dataType": "String"
}
]
}
],
"excludedPaths": [
{"path": "/\"$v\"/\"location\"/*"},
{"path": "/\"$v\"/\"height\"/*"}
]
}
PS:
Also mongo api can be used to drop all the default indexes and create specific indexes as required
https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb-indexing

Google Cloud Datastore very slow after trying to get more than 100 000 entities

I'm trying to get the large amount of entities with runQuery request.
I have an entity kind Task, which contains:
integer id
integer TaskGroupId
non-indexed string field (~200B)
The request body:
{
"query": {
"kind": [
{
"name": "Task"
}
],
"filter": {
"propertyFilter": {
"property": {
"name": "TaskGroupId"
},
"value": {
"integerValue": "501"
},
"op": "EQUAL"
}
}
},
"partitionId": {
"namespaceId": "local"
}
}
I have ~2 000 000 entities of this type. If I try to run the request which should return about 100 000 entities it will take ~4 minutes to execute.
Is it appropriate performance or am I doing something wrong?
Is there any way to speed up this request?

Resources