Cosmos SQL query to flatten all nested properties with a given name - azure-cosmosdb

We persist Cosmos documents containing a tree of nodes of unbounded depth: children have arrays of children, which in turn have arrays of children, and so on.
We need to query the document and flatten all properties of a given name (queryRef) from all levels in the tree.
This is the persisted document:
{
id: "1",
children: [
{
id: "2",
queryRef: {ref: "a"},
children: [
{
id: "3",
queryRef: {ref: "b"},
children: [
{
id: "5",
queryRef: {ref: "c"},
children: [
...
]
}
]
},
...
]
}
]
}
This is the desired response:
{
id: "1",
queryRefs: [
{
ref: "a",
},
{
ref: "b",
},
{
ref: "c",
}
]
}
With a query along the lines of:
SELECT c.id, c.children[...].queryRef FROM c

While Cosmos DB's query engine is designed to work on hierarchical data, it cannot recursively traverse arrays nested to an unknown depth and then project the results out flat as you've described. You need to tell the query engine the structure in which to look and what to project back out.
Another issue is that the structure you've defined does not follow the general rules for when to embed and when to reference. Unbounded arrays should be modeled as individual documents rather than embedded within a single document. Doing so provides multiple benefits:
It keeps the latency and cost of operations on the data consistent as the amount of data grows. As the size of a document grows, the cost of operations on it grows in a non-linear fashion.
Inserting these children as individual documents allows you to query the data using generally simple queries.
It avoids the risk of your document hitting the 2 MB size limit for a document in Cosmos DB.
You certainly can have tree-like structures within your documents, and embedding one-to-few relationships within a single document is fine. But whether the query is simple or needs to traverse the structure, you need a defined model that lets you identify individual objects, both to project them and to apply filter predicates to them.
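If the embedded tree has to stay as it is, one option is to read the whole document and flatten it in application code. The following is a minimal sketch of that idea, not something the query engine does for you. It assumes the document has already been loaded into a plain Python dict, and the helper names `collect_query_refs` and `flatten_document` are hypothetical.

```python
def collect_query_refs(root, key="queryRef"):
    """Collect every `key` value in the tree, in pre-order.

    An explicit stack is used instead of recursion so that very deep
    trees do not hit Python's recursion limit.
    """
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if key in node:
            found.append(node[key])
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(node.get("children", [])))
    return found


def flatten_document(doc):
    """Build the desired response shape from one persisted document."""
    return {"id": doc["id"], "queryRefs": collect_query_refs(doc)}


# Sample data mirroring the document in the question (elided parts left empty).
document = {
    "id": "1",
    "children": [
        {"id": "2", "queryRef": {"ref": "a"}, "children": [
            {"id": "3", "queryRef": {"ref": "b"}, "children": [
                {"id": "5", "queryRef": {"ref": "c"}, "children": []},
            ]},
        ]},
    ],
}

flattened = flatten_document(document)
print(flattened)
```

This still pays the cost of reading the full document, so it does not remove the modeling concerns described above.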

Related

Impact of Mongo API projection on Cosmos Request Unit charge

According to the Microsoft documentation, the RU charge is affected by document size. Is that the size of the stored document or of the retrieved document? I have a document with a lot of entries at a nested level. If I retrieve only a property at level 1, will that reduce the RU charge?
For example, the document is shown below. Assume the associations array has more than 15000 entries.
{
"name": "hi",
"data":"demo",
"associations": [
{
"name": "assoc1"
},
{
"name": "assoc2"
},
{
"name": "assoc3"
},
{
"name": "assoc4"
},
{
"name": "assoc5"
}
]
}
Will there be a difference in RU between the two Mongo queries below, assuming the document size is 500 KB?
Query without projection:
db.getCollection("demo").find({"name":"hi"})
Query with projection:
db.getCollection("demo").find( {"name":"hi"} , {"data":true} )
I noticed a change in RU between these two queries, but I didn't see this mentioned in the documentation I searched.
If the query engine needs to traverse a large document to project results, then it will consume more RU/s than when it doesn't.
The bigger issue, I think, is a document with an array of more than 15K items. Unbounded or very large arrays are generally not a good pattern for Cosmos DB, especially if they have asymmetric update patterns, because updates will replace the entire document.

Filtering Firebase Firestore document's array of maps

I have a field named cartons in a document which is an array of maps.
Each document in the collection looks like this:
document-1
{ lotNo: 'alg-100',
cartons:[
{cartonNo:01, trackingNo:'a'},
{cartonNo:02, trackingNo:'b'},
{cartonNo:03, trackingNo:'c'},
{cartonNo:04, trackingNo:'a'}
]
}
document-2
{ lotNo: 'alg-101',
cartons:[
{cartonNo:01, trackingNo:'a'},
{cartonNo:02, trackingNo:'b'},
{cartonNo:03, trackingNo:'c'},
{cartonNo:04, trackingNo:'a'}
]
}
What I need is only the carton objects where trackingNo='a' from each document. As far as I know, I can't get partial data from a Firestore document (you either get the full document or not). So to get the carton objects with the same trackingNo from different documents, I am assuming I have to fetch both documents and then filter the data client-side. Or is there a better way? What would be the best solution for getting only the cartons that share a trackingNo (as an array) from different documents, without changing the data structure, since my app relies heavily on this particular structure?
Unfortunately you can't index based on properties inside of an array. In the Realtime Database, properties of an array could be indexed as cartons.<index>.trackingNo (e.g. cartons.0.trackingNo), but if you queried this, you would only get documents that contain the requested tracking number as their first carton entry. To get all results, you would need to query again for each subsequent index - cartons.1.trackingNo, cartons.2.trackingNo, and so on.
If the data is as simple as you have shown, the best option would be to tweak your data structure slightly so that you also store a list of tracking numbers in the given lot. This will allow you to perform array-contains queries on a trackingNos property.
{
lotNo: 'alg-101',
cartons: [
{ cartonNo: 01, trackingNo: 'a' },
{ cartonNo: 02, trackingNo: 'b' },
{ cartonNo: 03, trackingNo: 'c' },
{ cartonNo: 04, trackingNo: 'a' }
],
trackingNos: [
'a',
'b',
'c'
]
}
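A minimal sketch of the two pieces of client-side logic this implies: deriving the extra trackingNos list when a lot is written, and picking out the matching cartons once the array-contains query has returned the full documents. It works on plain dicts and makes no Firestore calls; the function names `with_tracking_nos` and `cartons_with_tracking_no` are hypothetical.

```python
def with_tracking_nos(lot):
    """Return a copy of the lot with a de-duplicated trackingNos list.

    dict.fromkeys keeps the first occurrence of each value, in order.
    """
    tracking_nos = list(dict.fromkeys(c["trackingNo"] for c in lot["cartons"]))
    return {**lot, "trackingNos": tracking_nos}


def cartons_with_tracking_no(lot, tracking_no):
    """Client-side filter: only the cartons carrying the given tracking number."""
    return [c for c in lot["cartons"] if c["trackingNo"] == tracking_no]


lot = {
    "lotNo": "alg-101",
    "cartons": [
        {"cartonNo": 1, "trackingNo": "a"},
        {"cartonNo": 2, "trackingNo": "b"},
        {"cartonNo": 3, "trackingNo": "c"},
        {"cartonNo": 4, "trackingNo": "a"},
    ],
}

stored = with_tracking_nos(lot)
matches = cartons_with_tracking_no(stored, "a")
print(stored["trackingNos"], matches)
```

The trackingNos list has to be kept in sync whenever cartons are added or removed, which is the price of this denormalization.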
If your data has been simplified for posting here, you might be better off restructuring your database so that each carton is a document in a subcollection of the lot document called cartons (i.e. .../lots/alg-101/cartons/01), and then using a collection group query across those subcollections.

Composite index for optional field in Cosmos

I have a collection in Cosmos DB which contains documents of different types (and schemas):
{
"partKey": "...",
"type": "type1",
"data": {
"field1": 123,
"field2": "sdfsdf"
}
}
{
"partKey": "...",
"type": "type2",
"data": {
"field3": ["123", "456", "789"]
}
}
I'm trying to create a composite index [/type, /data/field3/[]/?], but faced an issue:
The indexing path '\\/data\\/field3\\/[]\\/?' could not be accepted, failed near position '15'. Please ensure that the path is a valid path. Common errors include invalid characters or absence of quotes around labels
We don't support wildcards for composite indexes in Cosmos DB. Here is a composite index sample as reference.
We will update our docs to make this clearer. I looked them over and we don't currently document this.
Thanks.
In composite indexes, you just need to specify the paths that you want to index, rather than the values, so for your example:
"compositeIndexes":[
[
{
"path":"/type",
"order":"ascending"
},
{
"path":"/data/field3",
"order":"descending"
}
]
]
Just specify the order type you need for your queries (I've just used these ones as an example).
For documents that have different properties underneath your data property, I believe you will have to add a separate composite index for each use case you need, since composite indexes don't support wildcards. So you would need to add one for each path:
/data/field1, /data/field2, and so on.
Hope this helps.
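Because each path has to be spelled out, the policy fragment can be generated rather than written by hand. The sketch below only builds the JSON shown above as a Python dict; it does not call any Cosmos DB SDK. The field list, the sort orders and the helper name `composite_index` are assumptions for illustration, and the wildcard guard simply mirrors the restriction described in the answers above.

```python
import json

# Tokens treated as wildcards by this sketch (an assumption based on the
# error in the question and the answers above).
WILDCARD_TOKENS = ("*", "?", "[]")


def composite_index(type_path, field_path, order="ascending"):
    """Build one composite index entry pairing the type path with one field path."""
    for path in (type_path, field_path):
        if any(token in path for token in WILDCARD_TOKENS):
            raise ValueError(f"wildcards are not accepted in composite paths: {path}")
    return [
        {"path": type_path, "order": "ascending"},
        {"path": field_path, "order": order},
    ]


# Hypothetical list of the properties under /data that queries sort on.
data_fields = ["field1", "field2", "field3"]

policy_fragment = {
    "compositeIndexes": [
        composite_index("/type", f"/data/{name}") for name in data_fields
    ]
}

print(json.dumps(policy_fragment, indent=2))
```

The resulting fragment would still need to be merged into the container's full indexing policy.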

DocumentDB adding ORDER BY clause uses excessive RUs

I have a partitioned collection with about 400k documents in a particular partition. Ideally this would be more distributed, but I need to deal with all the documents in the same partition for transaction considerations. I have a query which includes the partition key and the document id, which returns quickly with 2.58 RUs of usage.
This query is dynamic and could potentially be constructed with an IN clause to search for multiple document ids. As such, I added an ORDER BY to ensure the results were in a consistent order. Adding the clause, however, caused the RUs to skyrocket to almost 6000! Given that the WHERE clause should be filtering the results down to a handful before sorting, I was surprised by these results. It almost seems like it's applying the ORDER BY before the WHERE clause, which can't be right. Is there something under the covers with the ORDER BY clause that would explain this behavior?
Example document:
{
"DocumentType": "InventoryRecord", (PartitionKey, String)
"id": "7867f600-c011-85c0-80f2-c44d1cf09f36", (DocDB assigned GUID, stored as string)
"ItemNumber": "123345", (String)
"ItemName": "Item1" (String)
}
With a Query looking like this:
SELECT * FROM c where c.DocumentType = 'InventoryRecord' and c.id = '7867f600-c011-85c0-80f2-c44d1cf09f36' order by c.ItemNumber
You should at least put a range index on ItemNumber. This should ensure the ordering works as expected. The addition to your indexing policy would look like this:
{
"path": "/ItemNumber/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}

Advanced multiple search query in Firebase

This is my Firebase database inside "/articles", which has loads of articles inside. A user can (using his/her own article) list other articles that meet certain conditions. For an article to pass the query test, it has to be in a category that the user's article has listed inside "tradableCategories", and THAT article also needs to have the user's article's category within its own "tradableCategories".
Here’s the database structure:
"articles": {
"article1": {
"title": "Car",
"category": "vehicles",
"owner": "user1",
"tradableCategories": {
"furnishings": true,
"other": true,
"vehicles": true
},
"category_tradableCategories": {
"vehicles_furnishings": true,
"vehicles_other": true,
"vehicles_vehicles": true
}
},
"article2": {
"title": "Bike",
"category": "vehicles",
"owner": "user2",
"tradableCategories": {
"furnishings": true,
"other": true,
"vehicles": true
},
"category_tradableCategories": {
"vehicles_furnishings": true,
"vehicles_other": true,
"vehicles_vehicles": true
}
},
"article3": {
"title": "Couch",
"category": "furnishings",
"owner": "user2",
"tradableCategories": {
"furnishings": true,
"other": true,
"vehicles": true
},
"category_tradableCategories": {
"furnishings_furnishings": true,
"furnishings_other": true,
"furnishings_vehicles": true
}
},
...
}
user1 owns article1 and wants to find articles that are within furnishings, other and vehicles. The articles that match must in turn be looking for article1's category. The query can be done easily using SQL:
SELECT *
FROM articles
WHERE category = 'vehicles' /* This is article1's category */
AND find_in_set(category, :tradableCategories) /* :tradableCategories is a stringified, comma-separated set of article1's tradableCategories: "furnishings,other,vehicles" */
AND NOT owner = 'user1'
As you've seen in the database structure, I have included another object called "category_tradableCategories". I've seen various answers here on Stack Overflow that explain how to search for items using two conditions combined into one. This could have worked, but it means that I have to run 3 Firebase queries, since I cannot combine three (or more) different categories within tradableCategories.
I am afraid this is too complicated for Firebase, but if there is any efficient solution to this I’d like some help. Thank you!
In relational databases you often first define your data model to match with the data you want to store and then write queries for the use-cases of your app. In NoSQL databases you typically use the inverse logic: you make a list of your app's use-cases and then define your data model to match those.
If Firebase's API doesn't directly support the query you want to build, you'll typically have to change/augment your data model to allow that query. This will lead to storing more data and more complex updates, but the advantage is that you have faster and simpler read operations.
So in your scenario: you want a list of articles in one of three categories that is not owned by the current user. The most direct mapping of that requirement would be to literally store that list:
user_articles
$uid
categories_1_2_3
articlekey1: true
articlekey2: true
This would make the query trivial: ref.child("user_articles").child(currentUser.uid).child(categories).on("child_added"....
Now this may be taking the denormalization and duplication a bit too far. We'd need a separate list for each user/category combination. So an article in 3 categories with 10 users would end up in 60 lists.
More likely you'll want to keep these articles-per-categories in a single list across all users. For example:
articles_by_category_with_owner
category_1
articlekey1: uid1
articlekey2: uid2
articlekey3: uid1
category_2
articlekey1: uid1
articlekey2: uid2
category_3
articlekey1: uid1
articlekey3: uid1
Now you can get all article keys with category_1 with ref.child("articles_by_category_with_owner").child(category).on("child_added"... and then do the "not owned by the current user" filtering client-side.
In the above list I've also removed the multiple-categories. That does mean that you'll need to read a node for each category. But this is actually not as slow as you may expect, since Firebase pipelines these requests (see link below).
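A minimal sketch of that approach on plain dicts, with no Firebase SDK calls: build the per-category list from the articles, read one node per category, and drop the current user's own articles client-side. Keying the list by each article's own "category" field is an assumption, and the function names `build_articles_by_category` and `candidate_keys` are hypothetical.

```python
def build_articles_by_category(articles):
    """Build {category: {articleKey: ownerUid}} from the /articles node."""
    index = {}
    for key, article in articles.items():
        index.setdefault(article["category"], {})[key] = article["owner"]
    return index


def candidate_keys(index, categories, current_uid):
    """Read one node per category, then filter out the user's own articles."""
    keys = []
    for category in categories:
        for key, owner in index.get(category, {}).items():
            if owner != current_uid:
                keys.append(key)
    return keys


# Trimmed-down version of the articles from the question.
articles = {
    "article1": {"title": "Car", "category": "vehicles", "owner": "user1"},
    "article2": {"title": "Bike", "category": "vehicles", "owner": "user2"},
    "article3": {"title": "Couch", "category": "furnishings", "owner": "user2"},
}

index = build_articles_by_category(articles)
result = candidate_keys(index, ["furnishings", "other", "vehicles"], "user1")
print(result)
```

In a real app the index would be written alongside each article update rather than rebuilt on read; the rebuild here only keeps the sketch self-contained.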
Further recommended reading/viewing:
NoSQL data modeling
Firebase for SQL developers
Questions/answers from this list
Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
Query based on multiple where clauses in firebase
