Document stores (e.g. Firebase): smaller documents or more updates?

I am learning Firebase after many years of using SQL RDBMSs. This is definitely a challenge.
Say I have a collection of objects. Each object can belong to any number of categories. Categories have user-editable labels (e.g. the user may rename a label after the fact).
SQL RDBMS
So, in an RDBMS I would have:
Object table -> { object_id, ... }
Category table -> { category_id, label, ... }
ObjectCategory -> { object_id, category_id }
I see the following options to implement this in Firebase:
1. Objects collection with category label arrays in objects:
/user/objects -> [{ object_id, categories: [ 'category_label1', 'category_label2' ] }, ... ]
Seems yucky. Renaming/deleting a category will mean updating all the objects.
2. Objects referring to categories by id
/user/objects -> [{ object_id, categories: [ 'category_id1', 'category_id2' ] }, ... ]
/user/categories -> [{category_id, label, is_deleted: false}, ...]
This seems more reasonable and maintainable, except that sometimes (pretty rarely, I think) it will take 2 queries.
3. Collection of objects and object categories
/user/objects -> [{object_id1, ...}, {object_id2, ...}]
/user/object_id1/labels -> [{categorylabel1}, {categorylabel2}]
This is largely comparable to option 1 but requires less churn on object documents and makes updates smaller. Renaming/deleting a category becomes a pain.
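Option 2 is essentially the RDBMS join moved to the client. A minimal sketch, assuming plain Python dicts as hypothetical stand-ins for the two collections (no Firebase SDK calls), of why renames and deletes stay cheap while reads need a second lookup:

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for /user/categories and /user/objects
# under option 2; no Firebase SDK is involved.
@dataclass
class Category:
    label: str
    is_deleted: bool = False


@dataclass
class StoredObject:
    category_ids: list[str] = field(default_factory=list)


categories: dict[str, Category] = {
    "category_id1": Category("work"),
    "category_id2": Category("home"),
}
objects: dict[str, StoredObject] = {
    "object_id1": StoredObject(["category_id1", "category_id2"]),
    "object_id2": StoredObject(["category_id2"]),
}


def rename_category(category_id: str, new_label: str) -> None:
    # A rename is one write to one category document; no object changes.
    categories[category_id].label = new_label


def delete_category(category_id: str) -> None:
    # Soft delete through the is_deleted flag, again a single write.
    categories[category_id].is_deleted = True


def labels_for(object_id: str) -> list[str]:
    # The extra read: resolve the object's category ids to current labels,
    # skipping soft-deleted categories.
    ids = objects[object_id].category_ids
    return [categories[c].label for c in ids if not categories[c].is_deleted]
```

For example, after `rename_category("category_id2", "house")`, `labels_for("object_id1")` returns `["work", "house"]` without any object document having been rewritten.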
So, what is the recommended approach?

Related

Declare two fields of a struct as mutually exclusive in CueLang?

I want to ensure that my users only set one of two fields:
rotations:
  - type: weekly
    time_restrictions:
      # Allow only ONE of the following fields:
      weekday_time_of_day: {...}
      time_of_day: [...]
I came across the OneOf pattern on Cuetorials, but it only seems to help when enforcing a schema while writing CUE files.
#OneOfTimeRestrictions: {time_of_day: [...string]} | {weekday_time_of_day: {...string}}
rotations: [{
    type:         *"weekly" | "daily"
    restrictions: #OneOfTimeRestrictions | {} // won't work, naturally, because nothing is "chosen"
}]
(The values of the mutually exclusive fields are actually additional, more complex structs rather than strings, in case that matters, but I've omitted them for the sake of a shorter example.)
However, I'm trying to vet YAML instead.
The problem is that when defining this:
#OneOfTimeRestrictions:
rotations: [{
    type:         *"weekly" | "daily"
    restrictions: {time_of_day: [...string]} | {weekday_time_of_day: {...string}}
}]
Both fields are accepted, even when both are given at the same time.
Pointers?

Firestore how to query field containing any of the values in the array

Say my Firestore DB contains a collection of documents, each with a field that contains a large array of numbers. For example:
{
arr: [11,13,24,16,37,50]
},
{
arr: [12,34,55,56]
},
{
arr: [12,16,27,59]
}
How can I make a query that returns all the documents where the 'arr' field contains any of the values in a certain array?
For example, if I query with [16,13] I get the first and third documents only (first one contains both 16 and 13, third one contains 16).
Please note that both the 'arr' array and the array in my query can contain a large number of values (> 1000), so I can't use 'array-contains-any'.
Is it possible to do that?
Can I structure my DB differently in order to achieve that goal?
As you know, array-contains-any supports up to 10 comparison values. If you have a large array of over 1000 values, you might be better off restructuring your collection into something like this:
Example:
arr:
    arr_name:
        title: "arr_title"
        arr_1: {
            "10": true,
            "11": true,
            "12": true
        }
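Example query (Swift). Note that chained whereField clauses are combined with AND, so this matches only documents whose arr_1 map contains both 10 and 12:
let db = Firestore.firestore()
db.collection("arr_name")
    .whereField("arr_1.10", isEqualTo: true)
    .whereField("arr_1.12", isEqualTo: true)
    .getDocuments { snapshot, error in
        // Use snapshot?.documents, or handle error.
    }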
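For the "contains any of" semantics the question asks for, one option with this map layout is to issue one equality query per value and merge the matching document ids on the client. A minimal in-memory sketch of that idea; `to_map_doc`, `where_equal` and `contains_any` are hypothetical helpers that only simulate queries and make no Firestore SDK calls:

```python
# Hypothetical in-memory simulation of the map-of-booleans layout above;
# no Firestore SDK calls are made.
def to_map_doc(values: list[int]) -> dict:
    # Each array member becomes a key of the "arr_1" map, so membership can be
    # tested with an equality filter on the field path "arr_1.<value>".
    return {"arr_1": {str(v): True for v in values}}


def where_equal(docs: dict[str, dict], field_path: str, value) -> set[str]:
    # Stand-in for a single equality query on a dotted field path.
    matched = set()
    for doc_id, doc in docs.items():
        node = doc
        for part in field_path.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if node is not None and node == value:
            matched.add(doc_id)
    return matched


def contains_any(docs: dict[str, dict], values: list[int]) -> set[str]:
    # "Any of" semantics: one equality query per value, union of the results.
    result: set[str] = set()
    for v in values:
        result |= where_equal(docs, f"arr_1.{v}", True)
    return result


docs = {
    "doc1": to_map_doc([11, 13, 24, 16, 37, 50]),
    "doc2": to_map_doc([12, 34, 55, 56]),
    "doc3": to_map_doc([12, 16, 27, 59]),
}
```

With the three example arrays from the question, `contains_any(docs, [16, 13])` returns `{"doc1", "doc3"}`. The trade-off is one query per value, so with more than 1000 query values this likely means many round trips and reads.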

firestore array-like data structure indexing limitation

The Firestore documentation contains the following description:
Indexing limits - A single document can have only 20,000 properties in order to use Cloud Firestore built-in indexes. If your array-like data structure grows to tens of thousands of members, you may run into this limit.
https://cloud.google.com/firestore/docs/solutions/arrays
How should I interpret this description?
Which of the two patterns below hits the limit?
<pattern 1: more than 20,000 categories in one document>
doc1
- id:111
- categories: {aaaa:true, aaab:false, aaac:true, aaad: false, aaae:true, aaaf:true, aaag:true, aaah:true, aaai:true, aaaj:true, ... }
Another pattern:
<pattern 2: only a few categories per document, but more than 20,000 categories across the collection of documents>
doc_1
- id:111
- categories: {aaaa:true, aaab:false, aaac:true, ... only a few elements}
doc_2
- id:111
- categories: {aaad:true, aaae:false, aaaf:true, ... only a few elements}
doc_3
- id:111
- categories: {aaag:true, aaah:false, aaai:true, ... only a few elements}
I believe pattern 1 reaches the limit, but does pattern 2 reach it as well?
The limit is on the total number of properties, so it's possible that both patterns could hit the 20,000 limit.
Here are some examples of counting properties that may help:
This document has three properties: a, b, and b.c
{
a: "foo",
b: {
c: "bar"
}
}
This document has four properties: a, b, b.c, d
{
a: "foo",
b: {
c: "bar",
},
d: ["quz", "qaz"]
}
And this document has four as well:
{
a: "foo",
b: {
c: "bar",
},
d: ["quz", "qaz", "apple", "banana"]
}
This document has five:
{
a: "foo",
b: {
c: "bar",
},
d: ["quz", "qaz"],
e: ["apple", "banana"]
}
So it's not about the length of any single array or how deeply nested things are; it's about the total number of queryable values.
EDIT 03/05/18: I was wrong before when I said that array members counted separately against the index. They do not; that was something we had in place while Firestore was in alpha, and it never applied in a public release.
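A minimal sketch of the counting rule as the examples above apply it (a reading of this answer, not Firestore's actual indexing code): each field counts once, a map additionally counts every field nested inside it, and an array counts as a single property however many members it has. It reproduces the counts of four, four and five given for the last three examples.

```python
def count_properties(doc: dict) -> int:
    # Every field counts once; a map also contributes its nested fields;
    # an array is not expanded, so it counts once regardless of length.
    total = 0
    for value in doc.values():
        total += 1
        if isinstance(value, dict):
            total += count_properties(value)
    return total
```

Applied to a single pattern 1 document with an id plus 20,000 category keys, it returns 20,002, which is over the per-document limit quoted in the question.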

Recursively query for all linked documents in CosmosDB

We've built a document schema where job documents are uniquely identified by an id property and represent their link to a parent by a ParentJobId property.
For example:
{
"Type": "Request",
"StateName": "Success",
"id": "4365b7ec-6eee-468a-94f6-ab65d6434611",
"ParentJobId": null
},
{
"Type": "Machine",
"StateName": "ChildJobFailed",
"id": "27040208-add5-97e4-6bd2-d991de73c9b5",
"ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"
},
{
"Type": "Application",
"StateName": "Error",
"id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f",
"ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"
}
How can I query for all documents that are related to the root parent job?
There is no way in CosmosDB to do that in a single query. You could, of course, recursively walk the tree with multiple round trips. You could even do it in a single round trip by calling a stored procedure you wrote that issues the multiple requests.
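The multiple-round-trip walk can be sketched as a breadth-first search. This is a minimal in-memory sketch: `JOBS` mirrors the three documents above, and `query_children` is a hypothetical stand-in for one query per parent; it is not the Cosmos DB SDK.

```python
from collections import deque

# Hypothetical in-memory copy of the job documents from the question.
JOBS = [
    {"Type": "Request", "id": "4365b7ec-6eee-468a-94f6-ab65d6434611",
     "ParentJobId": None},
    {"Type": "Machine", "id": "27040208-add5-97e4-6bd2-d991de73c9b5",
     "ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"},
    {"Type": "Application", "id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f",
     "ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"},
]


def query_children(parent_id: str) -> list[dict]:
    # Stands in for one round trip, e.g.
    # SELECT * FROM c WHERE c.ParentJobId = @parentId
    return [job for job in JOBS if job["ParentJobId"] == parent_id]


def all_descendants(root_id: str) -> list[dict]:
    # Breadth-first walk: one query per job in the subtree, root included.
    found: list[dict] = []
    queue = deque([root_id])
    while queue:
        for child in query_children(queue.popleft()):
            found.append(child)
            queue.append(child["id"])
    return found
```

Every job in the subtree costs one query, so a deep or wide tree means many round trips; that cost is what the materialized-path approach avoids.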
However, I've found that the best way to model hierarchies (trees) for fast retrieval in NoSQL databases is as an array containing a materialized path. Look at this example:
documents = [
{id: 'A', hierarchy: [1, 2, 3]},
{id: 'B', hierarchy: [1, 2, 4]},
{id: 'C', hierarchy: [5]},
{id: 'D', hierarchy: [1, 6]},
]
"A" is "in" Project 3, whose parent is Project 2, whose parent is Project 1. "B" is "in" Project 4, whose parent is Project 2, which still has Project 1 as its parent. Project 5 is another root project like Project 1, and "D" is "in" Project 6, which is a child of Project 1.
Now send in a query like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 1)
It will return documents A, B, and D. Try:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 2)
It will return only documents A and B.
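The same idea in a minimal in-memory sketch (plain Python, not the Cosmos DB SDK): `in_subtree` mimics the `ARRAY_CONTAINS` filter over the example documents, and `child_path` is a hypothetical write-time helper, not part of the answer, showing that a new node's path is its parent's path plus its own id.

```python
documents = [
    {"id": "A", "hierarchy": [1, 2, 3]},
    {"id": "B", "hierarchy": [1, 2, 4]},
    {"id": "C", "hierarchy": [5]},
    {"id": "D", "hierarchy": [1, 6]},
]


def in_subtree(docs: list[dict], project: int) -> list[str]:
    # Equivalent of: SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, project)
    return [d["id"] for d in docs if project in d["hierarchy"]]


def child_path(parent_path: list[int], child_project: int) -> list[int]:
    # The path is computed once when a node is written, so reads never
    # need to walk the tree.
    return parent_path + [child_project]
```

Here `in_subtree(documents, 1)` returns `["A", "B", "D"]` and `in_subtree(documents, 2)` returns `["A", "B"]`, matching the two queries above.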
One word of caution, though: I don't know how performant this approach is in DocumentDB, which I don't think allows indexes on array fields. Maybe one of the DocumentDB product managers who monitor Stack Overflow can chime in on this.
This approach is commonly used with NoSQL databases like CouchDB and MongoDB (combining materialized path and array of ancestors) and even SQL databases supporting array types like Postgres.

We get data with ORDER BY ASC but NOT BY DESC

We are seeing several odd behaviors.
For example:
a) We are unable to order by _ts; we get empty results:
SELECT * FROM data ORDER BY data._ts DESC
b) With ORDER BY ASC we get results (more than 100), but with ORDER BY DESC we get zero results, which makes no sense to us :(
Assuming that c is an integer, this is the behavior we are seeing:
SELECT * FROM data ORDER BY data.c ASC = RESULTS
SELECT * FROM data ORDER BY data.c DESC = zero results
c) We have a UDF that does a case-insensitive contains, but it does not work in all cases. The JS function was tested outside DocumentDB and works there, so we don't understand why:
SELECT * FROM data r where udf.TEST(r.c, "AS") = RESULTS
SELECT * FROM data r where udf.TEST(r.c, "health") = zero results (even though I can find that value by filtering on another field)
Thanks a lot!
jamesjara and I synced offline... posting our discussion here for everyone else's benefit :)
1) Query response limits and continuation tokens
There are limits on how long a query will execute on DocumentDB. These limits include the query's resource consumption (you can ballpark this w/ the amount of provisioned RU/sec * 5 sec + an undisclosed buffer), response size (1 MB), and timeout (5 sec).
If these limits are hit, then a partial set of results may be returned. The work done by the query execution is preserved by passing the state back in the form of a continuation token (x-ms-continuation in the HTTP response header). You can resume the execution of the query by passing the continuation token in a follow-up query. The Client SDKs make this interaction easier by automatically paging through results via toList() or toArray() (depending on the SDK flavor).
It's possible to get an empty page in the result. This can happen when the resource consumption limit is reached before the query engine finds the first result (e.g. when scanning through a collection to look for a few documents in a large dataset).
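A minimal sketch of the paging loop the SDKs perform for you, assuming a hypothetical `fetch_page` callable that takes a continuation token (or None for the first page) and returns the page's items plus the next token, with None meaning the query has finished. It is not a real SDK API; the point is that an empty page must not end the loop while a token keeps coming back.

```python
from typing import Callable, Optional

# (items on this page, continuation token or None when finished)
Page = tuple[list[dict], Optional[str]]


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> list[dict]:
    results: list[dict] = []
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        # An empty page is not the end of the results; only a missing
        # continuation token is.
        results.extend(items)
        if token is None:
            return results
```

Stopping at the first empty page instead would reproduce the "zero results" symptom from the question even though matching documents exist.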
2) ORDER BY and Index Policies
In order to use ORDER BY or range comparisons (<, >, etc.) within your queries, you should specify an index policy that contains a Range index with the maximum precision (precision = -1) over the JSON properties used to sort with. This allows the query engine to take advantage of an index.
Otherwise, you can force a scan by specifying the x-ms-documentdb-query-enable-scan HTTP request header w/ the value set to true. In the client SDKs, this is exposed via the FeedOptions object.
Suggested Index Policy for ORDER BY:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
},
{
"path": "/_ts/?",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
}
],
"excludedPaths": []
}
3) UDFs and indexing
UDFs are not able to take advantage of indexes and will result in a scan. Therefore, it is advisable to include additional filters in your query's WHERE clause to reduce the number of documents to be scanned.
