Cosmos DB Set Index Precision - azure-cosmosdb

I have a question about setting index precision with the Cosmos DB .NET SDK v3.
The documentation says:
This indexing policy is equivalent to the one below which manually sets kind, dataType, and precision to their default values. These properties are no longer necessary to explicitly set and you should omit them from your indexing policy entirely (as shown in above example). If you try to set these properties, they'll be automatically removed from your indexing policy.
{
  "indexingMode": "consistent",
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/path/to/single/excluded/property/?"
    },
    {
      "path": "/path/to/root/of/multiple/excluded/properties/*"
    }
  ]
}
Does this mean that any precision setting will be ignored, or only precision = -1?
Also, is there any way to set precision in the .NET SDK v3? (I cannot find an API for it.)
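Read literally, the quoted passage talks about kind, dataType, and precision in general and does not single out precision = -1. As a rough illustration of the equivalence it describes, here is a minimal Python sketch that drops the per-path "indexes" arrays from a policy document. It only transforms a local dictionary and does not call the service; the function name strip_legacy_index_settings is made up for this example.

```python
import copy
import json


def strip_legacy_index_settings(policy: dict) -> dict:
    """Return a copy of the policy without the per-path "indexes" arrays.

    Hypothetical helper: it mirrors the documented statement that kind,
    dataType and precision are removed from the policy, nothing more.
    """
    cleaned = copy.deepcopy(policy)
    for included in cleaned.get("includedPaths", []):
        included.pop("indexes", None)
    return cleaned


# The policy quoted in the question, as a Python dictionary.
legacy_policy = {
    "indexingMode": "consistent",
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Range", "dataType": "String", "precision": -1},
            ],
        }
    ],
    "excludedPaths": [
        {"path": "/path/to/single/excluded/property/?"},
        {"path": "/path/to/root/of/multiple/excluded/properties/*"},
    ],
}

modern_policy = strip_legacy_index_settings(legacy_policy)
print(json.dumps(modern_policy, indent=2))
```

The included path that remains is just {"path": "/*"}, and the excluded paths are left untouched.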

Related

Updating CosmosDb indexing policy through ARM templates

I am trying to use ARM templates to update the indexing policy for a Cosmos container. I tried two methods. The first was to declare the indexing policy while declaring the container in the ARM template:
{
  "apiVersion": "[variables('cosmosDbApiVersion')]",
  "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases/containers",
  "dependsOn": [ /* resourceId */ ],
  "name": "/* containerName */",
  "properties": {
    "resource": {
      "id": "/* id */",
      "partitionKey": {
        "paths": [
          "/partitionKey"
        ],
        "kind": "Hash"
      },
      "indexes": [
        {
          "indexingMode": "consistent",
          "automatic": true,
          "includedPaths": [
            {
              "path": "/*",
              "indexes": [
                {
                  "kind": "Range",
                  "dataType": "Number",
                  "precision": -1
                },
                {
                  "kind": "Hash",
                  "dataType": "String",
                  "precision": 3
                }
              ]
            }
          ]
        }
      ],
      "defaultTtl": "[variables('defaultTtlValueToEnableTtl')]"
    }
  }
},
The second was to use ARM to deploy the container settings, like this:
{
  "apiVersion": "[variables('cosmosDbApiVersion')]",
  "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases/containers/settings",
  "name": "[/* name */",
  "dependsOn": [ " /* container name */" ],
  "properties": {
    "resource": {
      "throughput": "/* some throughput */",
      "indexes": [
        {
          "indexingMode": "consistent",
          "automatic": true,
          "includedPaths": [
            {
              "path": "/*",
              "indexes": [
                {
                  "kind": "Range",
                  "dataType": "Number",
                  "precision": -1
                },
                {
                  "kind": "Hash",
                  "dataType": "String",
                  "precision": 3
                }
              ]
            }
          ]
        }
      ]
    }
  }
},
Neither technique fails deployment, but the indexing policy does not change.
I would appreciate some help.
This is the example from the template reference. It looks slightly different from what you are doing: the policy sits under an "indexingPolicy" object rather than an "indexes" array.
"resource": {
  "id": "string",
  "indexingPolicy": {
    "automatic": "boolean",
    "indexingMode": "string",
    "includedPaths": [
      {
        "path": "string",
        "indexes": [
          {
            "dataType": "string",
            "precision": "integer",
            "kind": "string"
          }
        ]
      }
    ],
    "excludedPaths": [
      {
        "path": "string"
      }
    ],
    "spatialIndexes": [
      {
        "path": "string",
        "types": [
          "string"
        ]
      }
    ]
  },
  xxx
}
https://learn.microsoft.com/en-us/azure/templates/microsoft.documentdb/2019-08-01/databaseaccounts/sqldatabases/containers
The Cosmos resource provider now ignores range and hash index types for new containers, and for containers created within the past year or so. ARM does not validate the index policy, which is why the template deploys successfully.
The hash index was deprecated for these newer containers because the range index in the new indexer performs better than the hash index did, so the hash index was no longer necessary.
To create or modify an index policy, refer to the article below. It has multiple examples of index policies, from very simple ones to more complex policies that include composite indexes, spatial indexes, and unique keys.
https://learn.microsoft.com/en-us/azure/cosmos-db/manage-sql-with-resource-manager#create-resource
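To make the difference concrete, here is a small Python sketch that builds only the "resource" part of a container definition, with the policy nested under "indexingPolicy" as in the template reference quoted above. The function name, the container id, and the excluded path are placeholders chosen for illustration, not values from the question, and the sketch does not deploy anything.

```python
import json


def build_container_resource(container_id: str, partition_key_path: str) -> dict:
    """Build the "resource" object of a container definition.

    Hypothetical helper. The indexing policy goes under "indexingPolicy"
    (an object), not under an "indexes" array, and carries no
    kind / dataType / precision entries.
    """
    return {
        "id": container_id,
        "partitionKey": {"paths": [partition_key_path], "kind": "Hash"},
        "indexingPolicy": {
            "indexingMode": "consistent",
            "automatic": True,
            "includedPaths": [{"path": "/*"}],
            # Example excluded path only; adjust to your own documents.
            "excludedPaths": [{"path": '/"_etag"/?'}],
        },
    }


resource = build_container_resource("myContainer", "/partitionKey")
print(json.dumps({"resource": resource}, indent=2))
```

The printed JSON can be compared field by field with the reference shape before it is pasted into a template.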

Huge differences in performance between Order By ASC vs DESC

I am seeing a huge difference between ORDER BY ASC and DESC in the Cosmos DB SQL API. ASC is almost 200x less expensive in RUs than DESC.
Here is output from my testing tool:
INFO ------------------- QUERY -----------------
SELECT TOP 100 * FROM root
WHERE
root.projectId = '783af8f2-8e2f-0083-5d86-2f60f34e11b4'
AND NOT ARRAY_CONTAINS(root.translatedLanguages, '0d42a87f-4d68-417b-99a9-a228cb63edce')
ORDER BY root._srtDue DESC
INFO ------------------- ROUND 1 -----------------
INFO Request Charge: 2532.53
INFO Count: 100
INFO Metrics: {
"TotalTime": "00:00:01.7036800",
"RetrievedDocumentCount": 38238,
"RetrievedDocumentSize": 236459696,
"OutputDocumentCount": 100,
"IndexHitRatio": 0.0,
"QueryPreparationTimes": {
"CompileTime": "00:00:00.0001900",
"LogicalPlanBuildTime": "00:00:00.0000700",
"PhysicalPlanBuildTime": "00:00:00.0000600",
"QueryOptimizationTime": "00:00:00.0000100"
},
"QueryEngineTimes": {
"IndexLookupTime": "00:00:00.0298500",
"DocumentLoadTime": "00:00:01.4093599",
"WriteOutputTime": "00:00:00.0001300",
"RuntimeExecutionTimes": {
"TotalTime": "00:00:00.2636001",
"SystemFunctionExecutionTime": "00:00:00.0132800",
"UserDefinedFunctionExecutionTime": "00:00:00"
}
},
"Retries": 0
}
vs
INFO ------------------- QUERY -----------------
SELECT TOP 100 * FROM root
WHERE
root.projectId = '783af8f2-8e2f-0083-5d86-2f60f34e11b4'
AND NOT ARRAY_CONTAINS(root.translatedLanguages, '0d42a87f-4d68-417b-99a9-a228cb63edce')
ORDER BY root._srtDue ASC
INFO ------------------- ROUND 1 -----------------
INFO Request Charge: 14.22
INFO Count: 100
INFO Metrics: {
"TotalTime": "00:00:00.0047500",
"RetrievedDocumentCount": 131,
"RetrievedDocumentSize": 187130,
"OutputDocumentCount": 100,
"IndexHitRatio": 0.75572519083969469,
"QueryPreparationTimes": {
"CompileTime": "00:00:00.0001300",
"LogicalPlanBuildTime": "00:00:00.0000700",
"PhysicalPlanBuildTime": "00:00:00.0000600",
"QueryOptimizationTime": "00:00:00.0000100"
},
"QueryEngineTimes": {
"IndexLookupTime": "00:00:00.0010400",
"DocumentLoadTime": "00:00:00.0020299",
"WriteOutputTime": "00:00:00.0002200",
"RuntimeExecutionTimes": {
"TotalTime": "00:00:00.0008301",
"SystemFunctionExecutionTime": "00:00:00.0000500",
"UserDefinedFunctionExecutionTime": "00:00:00"
}
},
"Retries": 0
}
I have not found out exactly how IndexHitRatio is computed or how Cosmos DB plans query execution. It looks to me as though, in this particular case, it evaluates the predicates against documents in the specified sort direction, and the documents that satisfy the predicates sit at the end of that order, so it has to read a large number of documents (38K) to produce the TOP 100 output documents.
We believe we have everything properly indexed:
"path": "/*",
"indexes": [
  {
    "kind": "Range",
    "dataType": "Number",
    "precision": -1
  },
  {
    "kind": "Range",
    "dataType": "String",
    "precision": -1
  }
]
...
"path": "/translatedLanguages/[]/?",
"indexes": [
  {
    "kind": "Range",
    "dataType": "String",
    "precision": -1
  },
  {
    "kind": "Range",
    "dataType": "Number",
    "precision": -1
  }
]
Unfortunately, this performance is not acceptable for our use case, and if we cannot solve it we will be forced to change database engines.
Is there a way to tweak this execution plan somehow to improve performance?
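For reference, the two runs can be put side by side with a few lines of Python. This is only arithmetic on the numbers reported in the metrics above; it says nothing about how the engine plans the query.

```python
# Figures copied from the two metric dumps above.
desc_run = {"request_charge": 2532.53, "retrieved": 38238, "output": 100}
asc_run = {"request_charge": 14.22, "retrieved": 131, "output": 100}

# How much more the DESC query costs and reads compared with ASC.
ru_ratio = desc_run["request_charge"] / asc_run["request_charge"]
retrieved_ratio = desc_run["retrieved"] / asc_run["retrieved"]

# Share of retrieved documents that ended up in the result.
desc_yield = desc_run["output"] / desc_run["retrieved"]
asc_yield = asc_run["output"] / asc_run["retrieved"]

print(f"RU ratio (DESC / ASC): {ru_ratio:.1f}")
print(f"Retrieved documents ratio (DESC / ASC): {retrieved_ratio:.1f}")
print(f"Output / retrieved, DESC: {desc_yield:.2%}")
print(f"Output / retrieved, ASC: {asc_yield:.2%}")
```

The DESC run costs roughly 178 times as many RUs and loads roughly 292 times as many documents for the same 100 results, which fits the observation that most of the cost is in DocumentLoadTime.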

Document DB and SQL operators

I am trying to run a simple SQL query in Azure DocumentDB. Here is what the document looks like:
I store the coordinates as doubles. To test, I run a simple query, SELECT * FROM locations WHERE locations.Latitude.CoordinateStart <= 50.123456, which does not fail but returns 0 results.
At first I thought I might be hitting a limitation on long decimals, but if I change the values to integers (multiplying them by 1,000,000), my coordinate becomes 33644729 and my query looks for <= 50123456. In that case I still get 0 results. What am I missing?
EDIT:
Indexing policy looks like this
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        },
        {
          "kind": "Spatial",
          "dataType": "Point"
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/\"_etag\"/?"
    }
  ]
}
Those are the default settings; I did not change them when creating the collection.
Try this query:
SELECT * FROM locations WHERE locations.Latitude[0].CoordinateStart <= 50.123456
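The suggested query implies that Latitude is an array of objects rather than a single object. The original document is not reproduced here, so that shape is an assumption. Under that assumption, the following plain-Python sketch (not the Cosmos DB query engine) illustrates why a property lookup directly on the array finds nothing, while indexing into the array first finds the value. UNDEFINED, navigate, and matches are names invented for this example.

```python
from typing import Any

# Stand-in for a path that does not resolve to any value.
UNDEFINED = object()


def navigate(value: Any, *steps: Any) -> Any:
    """Follow property names (str) and array indexes (int) through a document."""
    for step in steps:
        if isinstance(step, int) and isinstance(value, list) and 0 <= step < len(value):
            value = value[step]
        elif isinstance(step, str) and isinstance(value, dict) and step in value:
            value = value[step]
        else:
            # A property name applied to an array, or a missing key, resolves to nothing.
            return UNDEFINED
    return value


def matches(value: Any, limit: float) -> bool:
    """A comparison against an unresolved path never matches."""
    return isinstance(value, (int, float)) and value <= limit


# Hypothetical document shape: Latitude is an array with one object in it.
location = {"Latitude": [{"CoordinateStart": 33.644729}]}

without_index = matches(navigate(location, "Latitude", "CoordinateStart"), 50.123456)
with_index = matches(navigate(location, "Latitude", 0, "CoordinateStart"), 50.123456)

print(without_index)  # False: the path stops at the array
print(with_index)     # True: element 0 is reached first
```

If Latitude can hold more than one element, Latitude[0] only checks the first one.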

Exclude Path in Azure Cosmos DB

What is the correct JSON to keep certain keys of an input document from being indexed by Azure Cosmos DB? We are using Cosmos DB in MongoDB mode, and I was planning to change the index configuration in the Azure Portal after creating the collection.
Sample input JSON:
{
  "name": "test",
  "age": 1,
  "location": "l1",
  "height": 5.7
}
If I want to include name and age in the index and remove location and height from it, what should includedPaths and excludedPaths look like?
Finally got it to work with the spec below:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/\"location\"/?"
    },
    {
      "path": "/\"height\"/?"
    }
  ]
}
It looks like the underlying implementation has changed, and at the time of writing the documentation does not cover changing the indexing policy in MongoDB-flavoured Cosmos DB. The documents are actually stored in a weird way: all the keys start from a root $v, and every scalar field is stored as a document containing value and type information. So your document will be stored as something like:
{
  '_etag': '"2f00T0da-0000-0d00-0000-4cd987940000"',
  'id': 'SDSDFASDFASFAFASDFASDF',
  '_self': 'dbs/sMsxAA==/colls/dVsxAI8MBXI=/docs/sMsxAI8MBXIKAAAAAAAAAA==/',
  '_rid': 'sMsxAI8MBXIKAAAAAAAAAA==',
  '$t': 3,
  '_attachments': 'attachments/',
  '$v': {
    '_id': {
      '$t': 7,
      '$v': "\\Ù\x87\x14\x01\x15['\x18m\x01ú"
    },
    'name': {
      '$t': 2,
      '$v': 'test'
    },
    'age': {
      '$t': 1,
      '$v': 1
    },
    ...
  },
  '_ts': 1557759892
}
Therefore the indexingPolicy paths need to include the root $v and use /* (objects) instead of /? (scalars).
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number"
        },
        {
          "kind": "Hash",
          "dataType": "String"
        }
      ]
    }
  ],
  "excludedPaths": [
    {"path": "/\"$v\"/\"location\"/*"},
    {"path": "/\"$v\"/\"height\"/*"}
  ]
}
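Writing those escaped paths by hand is error-prone, so here is a small Python sketch that generates the excludedPaths entries for top-level fields using the same "/$v/<field>/*" pattern as the policy above. The helper names are made up for this example, and the sketch assumes the $v wrapping described in this answer still applies to your account.

```python
import json
from typing import Dict, List


def mongo_excluded_path(field: str) -> Dict[str, str]:
    """Build one excludedPaths entry for a top-level field stored under the root $v."""
    return {"path": f'/"$v"/"{field}"/*'}


def mongo_excluded_paths(fields: List[str]) -> List[Dict[str, str]]:
    """Build the excludedPaths array for several top-level fields."""
    return [mongo_excluded_path(field) for field in fields]


excluded = mongo_excluded_paths(["location", "height"])

# json.dumps takes care of escaping the inner double quotes.
print(json.dumps({"excludedPaths": excluded}, indent=2))
```

The printed paths match the two entries in the policy above, including the backslash-escaped quotes.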
PS:
The MongoDB API can also be used to drop all the default indexes and create specific indexes as required:
https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb-indexing

Swagger 2 - modelling arrays to ensure empty collections rather than null fields

We've done quite a bit of swagger modelling, but one thing I am struggling with is how we could define that all our arrays will never be null, but will be an empty collection if we have no data.
Is there a way of explicitly defining this in Swagger?
Thanks
To define a model with a mandatory array, you can use the required property:
{
  "type": "object",
  "required": [
    "nonNullableArray"
  ],
  "properties": {
    "nonNullableArray": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "nullableArray": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}
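To see what this model accepts and rejects, here is a minimal hand-rolled check in Python. It is not a Swagger or JSON Schema validator and covers only "required" and "type": "array"; the validate function is invented for this example. It treats null as "not an array", which I believe matches Swagger 2.0 (it has no null type), but it is worth confirming against the tooling you actually use.

```python
from typing import Any, Dict, List

# The model from the answer above.
schema: Dict[str, Any] = {
    "type": "object",
    "required": ["nonNullableArray"],
    "properties": {
        "nonNullableArray": {"type": "array", "items": {"type": "string"}},
        "nullableArray": {"type": "array", "items": {"type": "string"}},
    },
}


def validate(instance: Dict[str, Any], model: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the instance passes."""
    errors: List[str] = []
    # "required" only checks that the property is present.
    for name in model.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    # "type": "array" checks that a present value is a list (None is not).
    for name, prop in model.get("properties", {}).items():
        if name in instance and prop.get("type") == "array":
            if not isinstance(instance[name], list):
                errors.append(f"{name} must be an array")
    return errors


print(validate({"nonNullableArray": []}, schema))    # passes: empty array is fine
print(validate({}, schema))                          # fails: property missing
print(validate({"nonNullableArray": None}, schema))  # fails: null is not an array
```

An empty array passes, while an omitted property or a null value is reported, which is the behaviour the question asks for on the mandatory field. The optional nullableArray may still be left out entirely.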

Resources