Recursively query for all linked documents in CosmosDB

We've built a document schema in which each job document is uniquely identified by an id property and links to its parent through a ParentJobId property.
For example:
{
"Type": "Request",
"StateName": "Success",
"id": "4365b7ec-6eee-468a-94f6-ab65d6434611",
"ParentJobId": null
},
{
"Type": "Machine",
"StateName": "ChildJobFailed",
"id": "27040208-add5-97e4-6bd2-d991de73c9b5",
"ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"
},
{
"Type": "Application",
"StateName": "Error",
"id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f",
"ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"
}
How can I query for all documents that are related to the root parent job?

There is no way in CosmosDB to do that in a single query. You could, of course, recursively walk the tree with multiple round trips. You could even do it in one round trip by calling a stored procedure that issues the multiple requests itself.
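A minimal sketch of the multiple-round-trip walk, assuming a hypothetical fetch_children helper that stands in for one query per parent (against CosmosDB this would be something like a filter on c.ParentJobId). The in-memory JOBS list is only there so the example runs on its own:

```python
from collections import deque

# In-memory stand-in for the container; in practice each call to
# fetch_children would be one query (one round trip) against the database.
JOBS = [
    {"id": "4365b7ec-6eee-468a-94f6-ab65d6434611", "Type": "Request",
     "ParentJobId": None},
    {"id": "27040208-add5-97e4-6bd2-d991de73c9b5", "Type": "Machine",
     "ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"},
    {"id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f", "Type": "Application",
     "ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"},
]


def fetch_children(parent_id):
    """Hypothetical helper: return the direct children of one job."""
    return [job for job in JOBS if job["ParentJobId"] == parent_id]


def collect_descendants(root_id):
    """Walk the tree breadth-first, one fetch_children call per visited job."""
    found = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in fetch_children(current):
            if child["id"] not in seen:  # guard against accidental cycles
                seen.add(child["id"])
                found.append(child)
                queue.append(child["id"])
    return found


descendants = collect_descendants("4365b7ec-6eee-468a-94f6-ab65d6434611")
print([job["Type"] for job in descendants])
```

The number of round trips grows with the number of jobs in the tree, which is the cost the rest of this answer tries to avoid.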
However, I've found that the best way to model hierarchies (trees) for fast retrieval in NoSQL databases is as an array containing a materialized path. Look at this example:
documents = [
{id: 'A', hierarchy: [1, 2, 3]},
{id: 'B', hierarchy: [1, 2, 4]},
{id: 'C', hierarchy: [5]},
{id: 'D', hierarchy: [1, 6]},
]
"A" is "in" Project 3 whose parent is Project 2, whose parent is Project 1. "B" is "in" Project 4 whose parent is Project 2 which still has Project 1 as its parent. Project 5 is another root Project like Project 1; and "D" is "in" Project 6 which is a child of project 1.
Now send in a query like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 1)
It will return documents A, B, and D. Try:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 2)
It will return only documents A and B.
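To make the behaviour of those two queries concrete, here is an in-memory Python sketch of the same membership test. It does not talk to CosmosDB; the function name documents_under is made up for illustration:

```python
documents = [
    {"id": "A", "hierarchy": [1, 2, 3]},
    {"id": "B", "hierarchy": [1, 2, 4]},
    {"id": "C", "hierarchy": [5]},
    {"id": "D", "hierarchy": [1, 6]},
]


def documents_under(project, docs):
    """Return ids of documents whose materialized path contains the project."""
    return [doc["id"] for doc in docs if project in doc["hierarchy"]]


print(documents_under(1, documents))
print(documents_under(2, documents))
```

The trade-off is on the write side: when a node moves to a different parent, the hierarchy array of every document below it has to be rewritten.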
One word of caution, though: I don't know how performant this approach is in DocumentDB, which I don't think allows indexes on array fields. Maybe one of the DocumentDB product managers who monitor Stack Overflow can chime in on this.
This approach is commonly used with NoSQL databases like CouchDB and MongoDB (combining materialized path and array of ancestors) and even SQL databases supporting array types like Postgres.

Related

How to filter elements by node key in JSONPath?

What I want
Apply a JSONPath expression to a given JSON response to match specific elements by comparing their node keys with a value.
Input
{
"data": {
"ticket": {
"1": "foo",
"2": "bar",
"3": "baz"
}
}
}
Output (expected)
"3": "baz"
Case description
I want to apply a JSONPath expression that filters ticket elements whose key is greater than "2", so in this case it should match only the third ticket, "baz".
Ticket keys are always integers in my data.
Code area
This matches all node keys aka ticket keys
$.data.ticket.*~
This is a basic example of filtering
$..book[?(@.price<10)] // -> filter all books cheaper than 10
I am trying somehow to combine them in order to achieve the desired result
Where I test it
https://jsonpath.com/
References
https://goessner.net/articles/JsonPath/

It is possible with jsonpath-plus. The site https://jsonpath.com/ uses the jsonpath-plus library internally.
It has some convenient additions or elaborations not provided in the original JSONPath spec.
Use @property to compare against the key itself.
$.data.ticket[?(@property > 2)]
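If no JSONPath library with this extension is available, the same filter is easy to express directly. The following is a plain-Python sketch of what the expression selects, not a JSONPath implementation; the function name tickets_with_key_above is made up for illustration:

```python
response = {
    "data": {
        "ticket": {
            "1": "foo",
            "2": "bar",
            "3": "baz",
        }
    }
}


def tickets_with_key_above(tickets, threshold):
    """Keep entries whose key, read as an integer, exceeds the threshold."""
    return {key: value for key, value in tickets.items() if int(key) > threshold}


matched = tickets_with_key_above(response["data"]["ticket"], 2)
print(matched)
```

This relies on the question's statement that ticket keys are always integers; a non-numeric key would make int(key) raise a ValueError.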

Document stores (e.g. Firebase) - smaller documents or more updates?

I am learning Firebase after many years of using SQL RDBMSs. This is definitely a challenge.
Say I have a collection of objects. Each object can belong to any number of categories. Categories have user-editable labels (e.g. the user may rename a label after the fact).
SQL RDBMS
So, in RDBMS I would have:
Object table -> { object_id, ... }
Category table -> { category_id, label, ... }
ObjectCategory -> { object_id, category_id }
I see the following options to implement this in Firebase:
1. Objects collection with category label arrays in objects:
/user/objects -> [{ object_id, categories: [ 'category_label1', 'category_label2' ] }, ... ]
Seems yucky. Renaming/deleting a category will mean updating all the objects.
2. Objects referring categories by id
/user/objects -> [{ object_id, categories: [ 'category_id1', 'category_id2' ] }, ... ]
/user/categories -> [{category_id, label, is_deleted: false}, ...]
This seems more reasonable and maintainable. Except sometimes (I think pretty rarely) there will be 2 queries.
3. Collection of object and object categories
/user/objects -> [{object_id1, ...}, {object_id2, ...}]
/user/object_id1/labels -> [{categorylabel1}, {categorylabel2}]
This is largely comparable to option 1 but requires less churn on object documents and makes updates smaller. Renaming/deleting a category becomes a pain.
So, what is the recommended approach?

In Gremlin, how can I group pairs of elements by a property from one of them?

After some traversal I select the elements I'm interested in through select(). How can I group by one of the properties of one specific element?
What I did:
g.V() // ... some traversal happens here where I obtain a and b
select('a','b').by(valueMap('Name', 'Description', 'Label'))
Right now this gets me all the data I'm interested in, something like:
[
{
"a": { "Name": "A name" ... },
"b": { "Name": "other name" ... },
}
...
]
But I know that b.Name repeats among different pairs of a and b, so I would like to group all the a elements under their common b element. I think this should be easy to do, but so far I've been unable to do it.

It's probably better to rewrite the whole traversal, but since you kept it as a secret, here's how you would do the post-grouping:
g.V()...
select('a','b').
by(valueMap('Name', 'Description', 'Label')).
group().
by(select('b')).
by(select('a').fold())
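For readers less familiar with group().by().by(), here is a plain-Python sketch of the same reshaping applied to the selected pairs after they have been fetched. The sample pairs are made up, and the sketch keys on b's Name (as the question asks) rather than on the whole b map:

```python
from collections import defaultdict

# Hypothetical result of select('a','b'), in the shape shown in the question.
pairs = [
    {"a": {"Name": "first a"}, "b": {"Name": "shared b"}},
    {"a": {"Name": "second a"}, "b": {"Name": "shared b"}},
    {"a": {"Name": "third a"}, "b": {"Name": "other b"}},
]


def group_a_by_b_name(selected_pairs):
    """Collect every a element under the Name of the b it was paired with."""
    grouped = defaultdict(list)
    for pair in selected_pairs:
        grouped[pair["b"]["Name"]].append(pair["a"])
    return dict(grouped)


grouped = group_a_by_b_name(pairs)
print(grouped)
```

Doing the grouping inside the traversal, as in the Gremlin above, avoids shipping the duplicated b maps to the client.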

firestore array-like data structure indexing limitation

The Firestore documentation contains the following description:
Indexing limits - A single document can have only 20,000 properties in
order to use Cloud Firestore built-in indexes. If your array-like data
structure grows to tens of thousands of members, you may run into this
limit.
https://cloud.google.com/firestore/docs/solutions/arrays
I want to know how to interpret this description.
Which of the following two patterns hits the limit?
<pattern 1: categories in one document above 20,000>
doc1
- id:111
- categories: {aaaa:true, aaab:false, aaac:true, aaad: false, aaae:true, aaaf:true, aaag:true, aaah:true, aaai:true, aaaj:true, ,,,,,,,,,,, }
The other pattern:
<pattern 2: each document has only a few categories, but across the collection the number of categories exceeds 20,000>
doc_1
- id:111
categories{aaaa:true, aaab:false, aaac:true, only several element}
doc_2
- id:111
categories{aaad:true, aaae:false, aaaf:true, only several element}
doc_3
- id:111
categories{aaag:true, aaah:false, aaai:true, only several element}
I believe that pattern 1 reaches the limit, but does pattern 2 reach it as well?

The limit is on the total number of properties, so it's possible that both patterns could hit the 20,000 limit.
Here are some examples of counting properties that may help:
This document has three properties: a, b, and b.c
{
a: "foo",
b: {
c: "bar"
}
}
This document has four properties: a, b, b.c, d
{
a: "foo",
b: {
c: "bar",
},
d: ["quz", "qaz"]
}
And this document has four as well:
{
a: "foo",
b: {
c: "bar",
},
d: ["quz", "qaz", "apple", "banana"]
}
This document has five:
{
a: "foo",
b: {
c: "bar",
},
d: ["quz", "qaz"],
e: ["apple", "banana"]
}
So it's not about the length of any single array or how deeply nested things are, it's about the total number of queryable values.
EDIT 03/05/18: I was wrong before when I said that array members counted separately against the index. They do not, that was something we had in place when Firestore was in Alpha that never applied in a public release.
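The counting rule used in the four-property and five-property examples above can be sketched as a small recursive function. This is only an illustration of that rule as the answer states it (every map field counts once, nested maps are descended into, array members add nothing); it is not a Firestore API, and the name count_properties is made up:

```python
def count_properties(value):
    """Count field paths in a document represented as nested dicts."""
    if not isinstance(value, dict):
        return 0  # scalars and arrays contribute no extra properties
    total = 0
    for child in value.values():
        total += 1                        # the field itself
        total += count_properties(child)  # fields nested inside a map
    return total


short_array = {"a": "foo", "b": {"c": "bar"}, "d": ["quz", "qaz"]}
long_array = {"a": "foo", "b": {"c": "bar"},
              "d": ["quz", "qaz", "apple", "banana"]}
two_arrays = {"a": "foo", "b": {"c": "bar"},
              "d": ["quz", "qaz"], "e": ["apple", "banana"]}

print(count_properties(short_array))
print(count_properties(long_array))
print(count_properties(two_arrays))
```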

DynamoDB nested attribute querying support

Does Amazon DynamoDB scan operation allow you to query on nested attributes of type Array or Object? For example,
{
Id: 206,
Title: "20-Bicycle 206",
Description: "206 description",
RelatedItems: [
341,
472,
649
],
Pictures: {
FrontView: "123",
RearView: "456",
SideView: "789"
}
}
Can I query on RelatedItems[2] or Pictures.RearView attributes?

Yes, you can use a Filter Expression, which is just like Condition Expression. The section that talks about the functions that you can use in these types of expressions mentions the following:
"For a nested attribute, you must provide its full path; for more information, see Document Paths."
The Document Paths reference has examples on how to reference nested attributes in DynamoDB data types like List (what you are calling an array) and Map (what you are calling an object). Check out that reference for examples on how to do so:
MyList[0]
AnotherList[12]
ThisList[5][11]
MyMap.nestedField
MyMap.nestedField.deeplyNestedField
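To show how such paths address the item from the question, here is a small Python sketch that resolves a document path against a plain dict. It only mimics the path syntax locally and does not call DynamoDB; the resolve function and its regular expression are made up for illustration:

```python
import re

# A path is a sequence of names (separated by dots) and [n] list indexes.
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")

item = {
    "Id": 206,
    "Title": "20-Bicycle 206",
    "Description": "206 description",
    "RelatedItems": [341, 472, 649],
    "Pictures": {"FrontView": "123", "RearView": "456", "SideView": "789"},
}


def resolve(document, path):
    """Follow a path such as 'Pictures.RearView' or 'RelatedItems[2]'."""
    current = document
    for name, index in TOKEN.findall(path):
        if name:
            current = current[name]        # map (object) lookup
        else:
            current = current[int(index)]  # list (array) lookup
    return current


print(resolve(item, "RelatedItems[2]"))
print(resolve(item, "Pictures.RearView"))
```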

Please note that in DynamoDB, query and scan are quite different (scan is a much costlier operation). So while you can filter on both, as pointed out by @coffeeplease, you can only query/index on:
The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Other data types, including documents and sets, are not allowed (ref).

Yes, you can, by passing a list or a value.
data = table.scan(FilterExpression=Attr('RelatedItems').contains([1, 2, 3]) & Attr('Pictures.RearView').eq('1'))

Yes, you can query on nested attributes of type array or object using scan or query.
Reference for Python boto3:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/dynamodb.html#querying-and-scanning
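Example: Suppose you want to find records for which RearView > 500 and the second item of RelatedItems > 200. You can do the following:
from boto3.dynamodb.conditions import Attr
# RelatedItems holds numbers, so compare against a number; RearView is stored as a string.
data = table.scan(
FilterExpression=Attr('RelatedItems[1]').gt(200) & Attr('Pictures.RearView').gt('500'))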
