Impact of Mongo API projection on Cosmos DB request unit charge - azure-cosmosdb

I read in the Microsoft documentation that the RU charge depends on document size. Is that the size of the stored document or of the retrieved document? I have a document with a lot of entries at a nested level. If I retrieve only a property at level 1, will that reduce the RU charge?
For example, consider the document below, where the associations array has more than 15,000 entries:
{
"name": "hi",
"data":"demo",
"associations": [
{
"name": "assoc1"
},
{
"name": "assoc2"
},
{
"name": "assoc3"
},
{
"name": "assoc4"
},
{
"name": "assoc5"
}
]
}
Will there be a difference in RU between the two Mongo queries below, given that the document size is 500 KB?
Query without projection:
db.getCollection("demo").find({"name":"hi"})
Query with projection:
db.getCollection("demo").find( {"name":"hi"} , {"data":true} )
I noticed a difference in RU between these two queries, but I couldn't find this mentioned in the documentation.

If the query engine needs to traverse a large document to project results, the query will consume more RUs than one that doesn't.
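To make the size difference concrete, here is a minimal sketch that builds a document shaped like the one in the question (the `_id` value is an assumption) and compares the JSON-serialized size of the full document with the result of the inclusion projection `{"data": true}`. JSON size is only a rough stand-in for the BSON the Mongo API stores, and the sketch does not model how Cosmos DB computes RUs; it only shows how much less data the projected query sends back.

```python
import json
from typing import Any, Dict


def build_demo_document(n_associations: int) -> Dict[str, Any]:
    """Hypothetical document shaped like the one in the question."""
    return {
        "_id": "demo-1",  # assumed id, not from the question
        "name": "hi",
        "data": "demo",
        "associations": [{"name": f"assoc{i}"} for i in range(1, n_associations + 1)],
    }


def project(doc: Dict[str, Any], fields: Dict[str, bool]) -> Dict[str, Any]:
    """Mimic a MongoDB inclusion projection on top-level fields.
    As in MongoDB, _id is kept unless it is explicitly set to False."""
    keep = {name for name, include in fields.items() if include}
    if fields.get("_id", True):
        keep.add("_id")
    return {k: v for k, v in doc.items() if k in keep}


def payload_bytes(doc: Dict[str, Any]) -> int:
    """UTF-8 size of the JSON-serialized document (a rough size proxy)."""
    return len(json.dumps(doc).encode("utf-8"))


doc = build_demo_document(15000)
full = payload_bytes(doc)
projected = payload_bytes(project(doc, {"data": True}))
print(f"full: {full} bytes, projected: {projected} bytes")
```

The server still has to load the stored document to evaluate the filter and the projection, so a smaller response does not mean the read itself is free.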
The bigger issue, I think, is a document with an array of more than 15K items. Unbounded or very large arrays are generally not a good pattern for Cosmos DB, especially if they have asymmetric update patterns, because every update replaces the entire document.
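One way to act on this advice, sketched under assumptions: move each association into its own document that references the parent. The `_id` values, the `parentId` field and the id scheme below are hypothetical. With the Mongo API, a parent's associations could then be fetched with something like `db.getCollection("associations").find({"parentId": "demo-1"})` (collection name also hypothetical), and `parentId` may be a reasonable shard key candidate so a parent's entries stay together.

```python
from typing import Any, Dict, List, Tuple


def split_associations(
    parent: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Move the embedded 'associations' array into one document per entry.

    Each child gets a hypothetical 'parentId' field referencing the parent,
    so entries can be read, inserted or updated individually instead of
    rewriting one very large document."""
    parent_doc = {k: v for k, v in parent.items() if k != "associations"}
    children = [
        # Hypothetical id scheme: "<parent _id>:assoc:<position>".
        {"_id": f"{parent['_id']}:assoc:{i}", "parentId": parent["_id"], **entry}
        for i, entry in enumerate(parent.get("associations", []))
    ]
    return parent_doc, children


parent, children = split_associations({
    "_id": "demo-1",
    "name": "hi",
    "data": "demo",
    "associations": [{"name": "assoc1"}, {"name": "assoc2"}],
})
print(parent)
print(children)
```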

Related

Query DynamoDB list items with an IN clause

I have a DynamoDB table whose items have the structure below:
{
"url": "some-url1",
"dependencies": [
"dependency-1",
"dependency-2",
"dependency-3",
"dependency-4"
],
"status": "active"
}
{
"url": "some-url2",
"dependencies": [
"dependency-2"
],
"status": "inactive"
}
{
"url": "some-url3",
"dependencies": [
"dependency-1"
],
"status": "active"
}
Here, url is defined as the partition key and there is no sort key.
The query I need to run must find all records with a specific dependency and status.
For example: find all records whose dependencies list contains dependency-1 and whose status is active.
For the records above, the first and third records should be returned.
Do I need to create a GSI on dependencies, or is this something that cannot be done in DynamoDB?
You cannot create a GSI on a nested value. You can, however, create a GSI on status, but you need to be careful: it has low cardinality, meaning you could limit your write throughput to 1,000 writes per second (the per-partition limit) if all of the items written to the table have the same status. Of course, if you never intend to scale that high, it's not an issue.
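If you do go with a GSI on status, a query against it can still narrow by dependency with a FilterExpression using DynamoDB's `contains` function, which works on list attributes. A minimal sketch that only builds the keyword arguments for the low-level boto3 client's `query` call; the table name and the index name `status-index` are hypothetical. `status` is referenced through an ExpressionAttributeNames placeholder because it is a DynamoDB reserved word. The filter runs after items are read, so the query consumes read capacity for every item with that status, not just the ones returned.

```python
from typing import Any, Dict


def build_status_index_query(table_name: str, index_name: str,
                             status: str, dependency: str) -> Dict[str, Any]:
    """Keyword arguments for boto3's low-level client.query() against a GSI
    whose partition key is `status` (table and index names are hypothetical)."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Key condition: select a single status partition of the GSI.
        "KeyConditionExpression": "#status = :status",
        # Filter: applied after the read; keeps items whose list contains :dep.
        "FilterExpression": "contains(dependencies, :dep)",
        # 'status' is a reserved word, so it goes through a placeholder.
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {
            ":status": {"S": status},
            ":dep": {"S": dependency},
        },
    }


params = build_status_index_query("my-table", "status-index", "active", "dependency-1")
# With AWS credentials configured, this could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
print(params)
```

Results are paginated the same way as a Scan (follow `LastEvaluatedKey`).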
Your other option is to use a Scan where you read your entire data set and use a FilterExpression to filter based on dependency and status.
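A minimal sketch of the Scan route, assuming a boto3-style low-level client is passed in: it sends `contains(dependencies, :dep) AND #status = :status` as the FilterExpression and follows `LastEvaluatedKey` until the whole table has been read. `FakeScanClient` is a hypothetical stand-in used only to exercise the loop locally; it emulates just this one filter and uses a made-up page key format.

```python
from typing import Any, Dict, Iterator, List


def scan_by_dependency(client: Any, table_name: str, dependency: str,
                       status: str) -> Iterator[Dict[str, Any]]:
    """Scan the whole table and let DynamoDB apply a FilterExpression.

    `client` is expected to behave like boto3.client("dynamodb"). Read
    capacity is consumed for every item scanned, not only matching ones."""
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "FilterExpression": "contains(dependencies, :dep) AND #status = :status",
        "ExpressionAttributeNames": {"#status": "status"},  # reserved word
        "ExpressionAttributeValues": {
            ":dep": {"S": dependency},
            ":status": {"S": status},
        },
    }
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no more pages to read
            break
        kwargs["ExclusiveStartKey"] = last_key


class FakeScanClient:
    """Local stand-in for testing only: emulates this one filter and returns
    results `page_size` items at a time, like a paginated Scan."""

    def __init__(self, items: List[Dict[str, Any]], page_size: int = 2):
        self.items = items
        self.page_size = page_size

    def scan(self, **kwargs: Any) -> Dict[str, Any]:
        values = kwargs["ExpressionAttributeValues"]
        start = kwargs.get("ExclusiveStartKey", {}).get("offset", 0)
        end = start + self.page_size
        matched = [
            item for item in self.items[start:end]
            if values[":dep"] in item["dependencies"]["L"]
            and item["status"] == values[":status"]
        ]
        page: Dict[str, Any] = {"Items": matched}
        if end < len(self.items):
            page["LastEvaluatedKey"] = {"offset": end}  # fake key format
        return page


# The three items from the question, in low-level attribute-value format.
SAMPLE_ITEMS = [
    {"url": {"S": "some-url1"}, "status": {"S": "active"},
     "dependencies": {"L": [{"S": f"dependency-{n}"} for n in range(1, 5)]}},
    {"url": {"S": "some-url2"}, "status": {"S": "inactive"},
     "dependencies": {"L": [{"S": "dependency-2"}]}},
    {"url": {"S": "some-url3"}, "status": {"S": "active"},
     "dependencies": {"L": [{"S": "dependency-1"}]}},
]

matches = [item["url"]["S"] for item in scan_by_dependency(
    FakeScanClient(SAMPLE_ITEMS), "my-table", "dependency-1", "active")]
print(matches)
```

Against a real table you would pass `boto3.client("dynamodb")` instead of the fake client.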
Depending on the SDK you use, you can find example operations here:
https://github.com/aws-samples/aws-dynamodb-examples/tree/master/DynamoDB-SDK-Examples

Cosmos SQL query to flatten all nested properties with a given name

We persist Cosmos documents containing a tree of nodes that can have unbounded depth: children have arrays of children, which have arrays of children, and so on.
We need to query the document and flatten all properties of a given name (queryRef) from all levels in the tree.
This is the persisted document:
{
id: "1",
children: [
{
id: "2",
queryRef: {ref: "a"},
children: [
{
id: "3",
queryRef: {ref: "b"},
children: [
{
id: "5",
queryRef: {ref: "c"},
children: [
...
]
}
]
},
...
]
}
]
}
This is the desired response:
{
id: "1",
queryRefs: [
{
ref: "a",
},
{
ref: "b",
},
{
ref: "c",
}
]
}
With a query along the lines of:
SELECT c.id, c.children[...].queryRef FROM c
While Cosmos DB's query engine is designed to work on hierarchical data, it does not provide the capability to write queries that recursively traverse arrays within objects of unknown depth and then project the results out as a flat list, as you've defined. You need to tell the query engine the structure in which to look and what to project back out.
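Since the query engine can't do this recursion, one workaround (a client-side technique, not a Cosmos DB query feature) is to read the document and flatten it in application code. A minimal sketch, assuming the document has already been fetched as a Python dict with the `id`/`children`/`queryRef` shape from the question; it uses an explicit stack instead of recursion so an unbounded depth cannot hit Python's recursion limit. It still reads the whole document, so the size concerns in this answer still apply.

```python
from typing import Any, Dict, List


def collect_query_refs(root: Dict[str, Any]) -> List[Any]:
    """Collect every 'queryRef' in the tree, at any depth, in pre-order
    (document order). Uses an explicit stack rather than recursion."""
    refs: List[Any] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if "queryRef" in node:
            refs.append(node["queryRef"])
        # Push children in reverse so the first child is processed first.
        stack.extend(reversed(node.get("children", [])))
    return refs


def flatten(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Shape the result like the desired response in the question."""
    return {"id": doc["id"], "queryRefs": collect_query_refs(doc)}


example = {
    "id": "1",
    "children": [{
        "id": "2", "queryRef": {"ref": "a"},
        "children": [{
            "id": "3", "queryRef": {"ref": "b"},
            "children": [{"id": "5", "queryRef": {"ref": "c"}, "children": []}],
        }],
    }],
}
print(flatten(example))
```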
Another issue here is that the structure you've defined does not follow the general rules for when to embed versus when to reference. Unbounded arrays should be modeled as individual documents rather than embedded within a single document. Doing so provides multiple benefits.
It ensures that as the amount of data grows, operations on that data keep consistent latency and remain efficient. As a document grows, the cost of operations on it grows non-linearly.
Inserting these children as individual documents lets you query the data with generally simple queries.
It avoids the risk of your document hitting the 2 MB document size limit in Cosmos DB.
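A minimal sketch of what "children as individual documents" could look like, assuming each node becomes its own document carrying a hypothetical `rootId` (a candidate partition key) and `parentId`, with the embedded `children` array dropped. With that shape, a query along the lines of `SELECT VALUE c.queryRef FROM c WHERE c.rootId = '1' AND IS_DEFINED(c.queryRef)` could return the flat list of refs without any recursion.

```python
from typing import Any, Dict, List, Optional, Tuple


def tree_to_node_documents(root: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a nested tree into one document per node, in pre-order.

    Each document keeps its own fields, drops 'children', and gains the
    hypothetical fields 'rootId' and 'parentId' (None for the root)."""
    docs: List[Dict[str, Any]] = []
    stack: List[Tuple[Dict[str, Any], Optional[str]]] = [(root, None)]
    while stack:
        node, parent_id = stack.pop()
        doc = {k: v for k, v in node.items() if k != "children"}
        doc["rootId"] = root["id"]
        doc["parentId"] = parent_id
        docs.append(doc)
        # Reverse so siblings are emitted in their original order.
        for child in reversed(node.get("children", [])):
            stack.append((child, node["id"]))
    return docs


example_tree = {
    "id": "1",
    "children": [{
        "id": "2", "queryRef": {"ref": "a"},
        "children": [{
            "id": "3", "queryRef": {"ref": "b"},
            "children": [{"id": "5", "queryRef": {"ref": "c"}, "children": []}],
        }],
    }],
}
for d in tree_to_node_documents(example_tree):
    print(d)
```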
You certainly can have tree-like structures within your documents, and one-to-few relationships are definitely fine to embed within a single document. But whether the query is simple or more complex and needs to traverse the structure, you need a defined model that lets you identify individual objects, both to project them and to apply filter predicates to the data.

Firestore - Can you query fields in nested documents?

I currently have a data structure like this in Firebase Cloud Firestore:
Database
+ ProductInventories (collection)
  + productId1 (document)
    + variantName (collection)
      + [auto-ID] (document)
        + location: "Paris"
        + count: 1334
How would I make a structuredQuery in a POST request to get the count for location "Paris"?
Intuitively, this would be a POST to https://firestore.googleapis.com/v1/projects/projectName/databases/(default)/documents/ProductInventories/productId1:runQuery with the following JSON:
{
"structuredQuery": {
"from": [
{
"collectionId": "variantName",
"allDescendants": true
}
],
"where": {
"fieldFilter": {
"field": {
"fieldPath": "location"
},
"op": "EQUAL",
"value": {
"stringValue": "Paris"
}
}
}
}
}
With this I get the error "collection group queries are only allowed at the root parent", which means I need to send the POST to https://firestore.googleapis.com/v1/projects/projectName/databases/(default)/documents:runQuery instead. However, this means I'll need to create a collection group index exemption for each variant (variantName) of each productId.
It seems I would be better off using the location as the document name under the variantName collection, so I could read the count directly without making a query. But it seems to me that the point of NoSQL was that I could be less careful about how I structure the data, so I'm wondering if there's a way to make the query work with the current data structure.
Using collection names that are not known ahead of time is usually an anti-pattern in Firestore, and what you're running into is one of the reasons why: to create a collection group query across documents in multiple collections, you need to be able to define an index on the collection name, and that requires knowing those names at some point during development.
As usual with NoSQL databases, you can modify or augment your data structure to allow the use case. For example, if you create a single subcollection for all variants, you can give that collection a fixed name and search for Paris and the $variantName in there. This collection can either replace your current $variantName collections or be added alongside them.
If you know the name of the variant collection, have you tried querying it directly, like this?
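A minimal sketch of the request body for the fixed-name approach, assuming all variants live in subcollections called `variants` and each variant document stores its name in a `variantName` field (both names are hypothetical). It would be POSTed to the root `.../documents:runQuery` endpoint from the question; because the collection ID is now fixed, any index Firestore asks for only has to be defined once for `variants`.

```python
import json
from typing import Any, Dict


def equals(field: str, value: str) -> Dict[str, Any]:
    """A structuredQuery fieldFilter for field == string value."""
    return {
        "fieldFilter": {
            "field": {"fieldPath": field},
            "op": "EQUAL",
            "value": {"stringValue": value},
        }
    }


def build_variant_query(location: str, variant_name: str,
                        collection_id: str = "variants") -> Dict[str, Any]:
    """runQuery body: a collection group query over the fixed-name
    subcollection, filtered on location AND the hypothetical variantName field."""
    return {
        "structuredQuery": {
            "from": [{"collectionId": collection_id, "allDescendants": True}],
            "where": {
                "compositeFilter": {
                    "op": "AND",
                    "filters": [
                        equals("location", location),
                        equals("variantName", variant_name),
                    ],
                }
            },
        }
    }


body = build_variant_query("Paris", "red")
print(json.dumps(body, indent=2))
```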
fb.firestore().collection('ProductInventories')
  .doc('productId1')
  .collection('variantName')
  .where('location', '==', 'Paris')
  .get()
  .then(res => {
    // res is a QuerySnapshot: iterate its docs and read each document's data
    res.docs.forEach((product, i) => {
      console.log('item ' + i + ': ' + product.data().count);
    });
  });

DocumentDB adding ORDER BY clause uses excessive RUs

I have a partitioned collection with about 400k documents in a particular partition. Ideally this would be more distributed, but I need to deal with all the documents in the same partition for transaction considerations. I have a query which includes the partition key and the document id, which returns quickly with 2.58 RUs of usage.
This query is dynamic and could potentially be constructed with an IN clause to search for multiple document ids. So I added an ORDER BY to ensure the results come back in a consistent order. Adding the clause, however, caused the RU charge to skyrocket to almost 6,000! Given that the WHERE clause should filter the results down to a handful before sorting, I was surprised by this. It almost seems like the ORDER BY is applied before the WHERE clause, which can't be right. Is there something under the covers with the ORDER BY clause that would explain this behavior?
Example document:
{
"DocumentType": "InventoryRecord", (PartitionKey, String)
"id": "7867f600-c011-85c0-80f2-c44d1cf09f36", (DocDB-assigned GUID, stored as a string)
"ItemNumber": "123345", (String)
"ItemName": "Item1" (String)
}
With a query like this:
SELECT * FROM c where c.DocumentType = 'InventoryRecord' and c.id = '7867f600-c011-85c0-80f2-c44d1cf09f36' order by c.ItemNumber
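The dynamic IN variant the question describes can be built with query parameters rather than string concatenation. A minimal sketch, assuming one `@idN` parameter per id (the parameter names and the function are hypothetical); the resulting pair could be passed to the azure-cosmos v4 `container.query_items(query=..., parameters=..., partition_key="InventoryRecord")` call.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_inventory_query(doc_type: str,
                          ids: Sequence[str]) -> Tuple[str, List[Dict[str, Any]]]:
    """Parameterized version of the question's query with an IN clause.
    Parameter names (@docType, @id0, @id1, ...) are hypothetical."""
    if not ids:
        raise ValueError("at least one id is required")
    id_params = [f"@id{i}" for i in range(len(ids))]
    query = (
        "SELECT * FROM c WHERE c.DocumentType = @docType "
        f"AND c.id IN ({', '.join(id_params)}) ORDER BY c.ItemNumber"
    )
    parameters: List[Dict[str, Any]] = [{"name": "@docType", "value": doc_type}]
    parameters += [{"name": p, "value": v} for p, v in zip(id_params, ids)]
    return query, parameters


query, parameters = build_inventory_query(
    "InventoryRecord", ["7867f600-c011-85c0-80f2-c44d1cf09f36"])
print(query)
print(parameters)
```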
You should at least put a range index on ItemNumber. This should ensure the ordering works as expected. The addition to your indexing policy would look like this:
{
"path": "/ItemNumber/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}
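For context, a sketch of where that fragment might sit in a complete legacy-schema indexing policy, written as the Python dict an SDK would accept (for example as `indexing_policy=` when creating a container with azure-cosmos v4). The `"/*"` entry is an assumption standing in for whatever your existing policy already indexes. Newer accounts range-index every path by default, so this mainly matters for collections still on the old policy.

```python
import json
from typing import Any, Dict


def indexing_policy_with_item_number_range() -> Dict[str, Any]:
    """Legacy-schema indexing policy adding a Range index on the string
    property ItemNumber, which ORDER BY c.ItemNumber can then use."""
    item_number = {
        # The fragment from the answer, unchanged.
        "path": "/ItemNumber/?",
        "indexes": [{"kind": "Range", "dataType": "String", "precision": -1}],
    }
    everything_else = {
        # Assumption: stand-in for whatever your existing policy indexes.
        "path": "/*",
        "indexes": [
            {"kind": "Range", "dataType": "Number", "precision": -1},
            {"kind": "Hash", "dataType": "String", "precision": 3},
        ],
    }
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # The more specific /ItemNumber/? path takes precedence over "/*".
        "includedPaths": [item_number, everything_else],
        "excludedPaths": [],
    }


print(json.dumps(indexing_policy_with_item_number_range(), indent=2))
```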

Advanced multiple search query in Firebase

This is my Firebase database under "/articles", which contains many articles. Using their own article, a user can list other articles that meet certain conditions. To pass the query test, an article has to be in a category that the user's article lists in "tradableCategories", and that article in turn needs to have the user's article's category in its own "tradableCategories".
Here’s the database structure:
"articles": {
"article1": {
"title": "Car",
"category": "vehicles",
"owner": "user1",
"tradableCategories": {
"furnishings": true,
"other": true,
"vehicles": true
},
"category_tradableCategories": {
"vehicles_furnishings": true,
"vehicles_other": true,
"vehicles_vehicles": true
}
},
"article2": {
"title": "Bike",
"category": "vehicles",
"owner": "user2",
"tradableCategories": {
"furnishings": true,
"other": true,
"vehicles": true
},
"category_tradableCategories": {
"vehicles_furnishings": true,
"vehicles_other": true,
"vehicles_vehicles": true
}
},
"article3": {
"title": "Couch",
"category": "furnishings",
"owner": "user2",
"tradableCategories": {
"furnishings": true,
"other": true,
"vehicles": true
},
"category_tradableCategories": {
"furnishings_furnishings": true,
"furnishings_other": true,
"furnishings_vehicles": true
}
},
...
}
user1 owns article1 and wants to find articles in furnishings, other or vehicles. The matching articles must in turn list article1's category in their own tradableCategories. In SQL this query would be easy:
SELECT *
FROM articles
WHERE find_in_set('vehicles', tradableCategories) /* 'vehicles' is article1's category; the candidate must accept it */
AND find_in_set(category, :tradableCategories) /* :tradableCategories is a stringified, comma-separated set of article1's tradableCategories: "furnishings,other,vehicles" */
AND NOT owner = 'user1'
As you can see in the database structure, I have included another object called "category_tradableCategories". I've seen various answers here on Stack Overflow that explain how to search for items using two conditions combined into one. That could have worked, but it means I would have to run 3 Firebase queries, since I cannot combine three (or more) different categories within tradableCategories into one query.
I am afraid this is too complicated for Firebase, but if there is any efficient solution to this I’d like some help. Thank you!
In relational databases you often first define your data model to match with the data you want to store and then write queries for the use-cases of your app. In NoSQL databases you typically use the inverse logic: you make a list of your app's use-cases and then define your data model to match those.
If Firebase's API doesn't directly support the query you want to build, you'll typically have to change/augment your data model to allow that query. This will lead to storing more data and more complex updates, but the advantage is that you have faster and simpler read operations.
So in your scenario: you want a list of articles in one of three categories that are not owned by the current user. The most direct mapping of that requirement would be to literally store that list:
user_articles
  $uid
    categories_1_2_3
      articlekey1: true
      articlekey2: true
This would make the query trivial: ref.child("user_articles").child(currentUser.uid).child(categories).on("child_added"....
Now this may be taking the denormalization and duplication a bit too far. We'd need a separate list for each user/category combination. So an article in 3 categories with 10 users would end up in 60 lists.
More likely you'll want to keep these articles-per-categories in a single list across all users. For example:
articles_by_category_with_owner
  category_1
    articlekey1: uid1
    articlekey2: uid2
    articlekey3: uid1
  category_2
    articlekey1: uid1
    articlekey2: uid2
  category_3
    articlekey1: uid1
    articlekey3: uid1
Now you can get all article keys with category_1 with ref.child("articles_by_category_with_owner").child(category).on("child_added"... and then do the "not owned by the current user" filtering client-side.
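The client-side step can be sketched in plain Python, assuming each category node has already been read (in the app, one `ref.child("articles_by_category_with_owner").child(category)` read per category) and the results collected into a dict shaped like the listing above. The function name is hypothetical; it drops the current user's articles and deduplicates keys that appear under several categories.

```python
from typing import Dict, Iterable, List


def tradable_article_keys(index: Dict[str, Dict[str, str]],
                          categories: Iterable[str],
                          current_uid: str) -> List[str]:
    """Combine the per-category reads and drop the current user's articles.

    `index` mirrors articles_by_category_with_owner:
    category -> {articleKey: ownerUid}. Keys are deduplicated and returned
    in first-seen order."""
    seen = set()
    result: List[str] = []
    for category in categories:
        for article_key, owner_uid in index.get(category, {}).items():
            if owner_uid != current_uid and article_key not in seen:
                seen.add(article_key)
                result.append(article_key)
    return result


INDEX = {
    "category_1": {"articlekey1": "uid1", "articlekey2": "uid2", "articlekey3": "uid1"},
    "category_2": {"articlekey1": "uid1", "articlekey2": "uid2"},
    "category_3": {"articlekey1": "uid1", "articlekey3": "uid1"},
}
print(tradable_article_keys(INDEX, ["category_1", "category_2", "category_3"], "uid1"))
```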
In the above list I've also removed the multiple-categories. That does mean that you'll need to read a node for each category. But this is actually not as slow as you may expect, since Firebase pipelines these requests (see link below).
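The flip side of this denormalization is the "more complex updates" mentioned earlier: every article write also has to update its index entries. A minimal sketch that builds a single multi-location update (a dict of path to value) which could be passed to the Realtime Database `update()` on the root reference so all locations change together. Which categories an article is indexed under depends on the query you want to serve, so it is passed in explicitly here; that parameter, like the function name, is hypothetical.

```python
from typing import Any, Dict, Iterable


def article_fanout_update(article_key: str, article: Dict[str, Any],
                          index_categories: Iterable[str]) -> Dict[str, Any]:
    """Build one multi-location update: the article itself plus its
    owner entry under each category node of articles_by_category_with_owner."""
    updates: Dict[str, Any] = {f"articles/{article_key}": article}
    for category in index_categories:
        path = f"articles_by_category_with_owner/{category}/{article_key}"
        updates[path] = article["owner"]
    return updates


updates = article_fanout_update(
    "article1",
    {"title": "Car", "category": "vehicles", "owner": "user1"},
    ["furnishings", "other", "vehicles"],
)
for path, value in updates.items():
    print(path, "->", value)
```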
Further recommended reading/viewing:
NoSQL data modeling
Firebase for SQL developers
Questions/answers from this list
Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
Query based on multiple where clauses in firebase
