Firestore array-like data structure indexing limitation - Firebase

The Firestore documentation includes the following description:
Indexing limits - A single document can have only 20,000 properties in
order to use Cloud Firestore built-in indexes. If your array-like data
structure grows to tens of thousands of members, you may run into this
limit.
https://cloud.google.com/firestore/docs/solutions/arrays
How should I interpret this description?
Which of the two patterns below would hit the limit?
<pattern 1: a single document with more than 20,000 categories>
doc1
- id:111
- categories: {aaaa:true, aaab:false, aaac:true, aaad: false, aaae:true, aaaf:true, aaag:true, aaah:true, aaai:true, aaaj:true, ... }
Another pattern:
<pattern 2: each document has only a few categories, but across the whole collection the number of categories exceeds 20,000>
doc_1
- id:111
- categories: {aaaa:true, aaab:false, aaac:true, ... only a few elements}
doc_2
- id:111
- categories: {aaad:true, aaae:false, aaaf:true, ... only a few elements}
doc_3
- id:111
- categories: {aaag:true, aaah:false, aaai:true, ... only a few elements}
I believe pattern 1 reaches the limit, but does pattern 2?

The limit is on the total number of properties, so it's possible that both patterns could hit the 20,000 limit.
Here are some examples of counting properties that may help:
This document has three properties: a, b, and b.c
{
  a: "foo",
  b: {
    c: "bar"
  }
}
This document has four properties: a, b, b.c, d
{
  a: "foo",
  b: {
    c: "bar",
  },
  d: ["quz", "qaz"]
}
And this document has four as well:
{
  a: "foo",
  b: {
    c: "bar",
  },
  d: ["quz", "qaz", "apple", "banana"]
}
This document has five:
{
  a: "foo",
  b: {
    c: "bar",
  },
  d: ["quz", "qaz"],
  e: ["apple", "banana"]
}
So it's not about the length of any single array or how deeply things are nested; it's about the total number of queryable fields.
EDIT 03/05/18: I was wrong earlier when I said that array members counted separately against the index. They do not; that was something we had in place while Firestore was in alpha, and it never applied in a public release.
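To make the counting rule concrete, here is a small Python sketch that counts properties the way the examples above do: each field counts once, a map additionally contributes one entry per nested field, and an array counts once no matter how many members it holds. This illustrates the counting described in this answer; it is not Firestore's own accounting code, and `count_properties` and the sample pattern documents are hypothetical.

```python
from typing import Any


def count_properties(doc: dict[str, Any]) -> int:
    """Count properties using the rules illustrated above (an assumption, not Firestore code)."""
    total = 0
    for value in doc.values():
        total += 1  # the field itself: a, b, d, ...
        if isinstance(value, dict):
            total += count_properties(value)  # nested fields such as b.c
        # arrays add nothing extra: ["quz", "qaz"] counts once via its field
    return total


# The four- and five-property examples from the answer.
print(count_properties({"a": "foo", "b": {"c": "bar"}, "d": ["quz", "qaz"]}))  # 4
print(count_properties({"a": "foo", "b": {"c": "bar"},
                        "d": ["quz", "qaz"], "e": ["apple", "banana"]}))  # 5

# Per-document counts for the two patterns in the question (hypothetical data).
pattern_1 = {"id": 111, "categories": {f"cat{i}": True for i in range(20_000)}}
pattern_2_doc = {"id": 111, "categories": {"aaaa": True, "aaab": False, "aaac": True}}
print(count_properties(pattern_1))      # 20002
print(count_properties(pattern_2_doc))  # 5
```

These are per-document counts, which is how the quoted documentation phrases the limit ("A single document can have only 20,000 properties").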

Related

Declare two fields of a struct as mutually exclusive in CueLang?

I want to ensure that my users only set one of two fields:
rotations:
  - type: weekly
    time_restrictions:
      # Allow only ONE of the following fields:
      weekday_time_of_day: {...}
      time_of_day: [...]
I came across the OneOf pattern on Cuetorials, but it only seems to help when enforcing a schema while writing CUE files.
#OneOfTimeRestrictions: {time_of_day: [...string]} | {weekday_time_of_day: {...string}}
rotations: [{
    type: *"weekly" | "daily"
    restrictions: #OneOfTimeRestrictions | {} // won't work, naturally, because nothing is "chosen"
}]
(The values of the mutually exclusive fields are actually additional, more complex structs, not strings, in case that matters; for the sake of a shorter example I've omitted them.)
However, I'm trying to vet YAML files instead.
The problem is that when defining this:
#OneOfTimeRestrictions:
rotations: [{
    type: *"weekly" | "daily"
    restrictions: {time_of_day: [...string]} | {weekday_time_of_day: {...string}}
}]
Both fields are accepted, even when they are given at the same time.
Pointers?

Document stores (e.g. Firebase) - smaller documents or more updates?

I am learning Firebase after many years of using SQL RDBMSs. This is definitely a challenge.
Say I have a collection of objects. Each object can belong to any number of categories. Categories have user-editable labels (e.g. the user may rename a label after the fact).
SQL RDBMS
So, in RDBMS I would have:
Object table -> { object_id, ... }
Category table -> { category_id, label, ... }
ObjectCategory -> { object_id, category_id }
I see the following options to implement this in Firebase:
1. Objects collection with category label arrays in objects:
/user/objects -> [{ object_id, categories: [ 'category_label1', 'category_label2' ] }, ... ]
Seems yucky. Renaming/deleting a category will mean updating all the objects.
2. Objects referring categories by id
/user/objects -> [{ object_id, categories: [ 'category_id1', 'category_id2' ] }, ... ]
/user/categories -> [{category_id, label, is_deleted: false}, ...]
This seems more reasonable and maintainable, except that sometimes (I think pretty rarely) it will take 2 queries.
3. Collection of object and object categories
/user/objects -> [{object_id1, ...}, {object_id2, ...}]
/user/object_id1/labels -> [{categorylabel1}, {categorylabel2}]
This is largely comparable to option 1 but requires less churn on object documents and makes updates smaller. Renaming/deleting a category becomes a pain.
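Option 2 can be sketched in memory to see why renames stay cheap. The plain Python objects below stand in for the /user/objects and /user/categories collections; there are no Firestore calls, and `Category`, `Obj` and `labels_for` are hypothetical names. Resolving labels is the second lookup the question mentions, while renaming or soft-deleting a category touches only the category record.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    category_id: str
    label: str
    is_deleted: bool = False


@dataclass
class Obj:
    object_id: str
    categories: list[str] = field(default_factory=list)  # category ids, not labels


def labels_for(obj: Obj, categories: dict[str, Category]) -> list[str]:
    """Second lookup: resolve ids to current labels, skipping deleted categories."""
    return [
        categories[cid].label
        for cid in obj.categories
        if cid in categories and not categories[cid].is_deleted
    ]


categories = {"c1": Category("c1", "Work"), "c2": Category("c2", "Home")}
obj = Obj("o1", ["c1", "c2"])
print(labels_for(obj, categories))  # ['Work', 'Home']

categories["c1"].label = "Office"   # rename: one category record changes
categories["c2"].is_deleted = True  # soft delete: objects are not rewritten
print(labels_for(obj, categories))  # ['Office']
```

The trade-off is the one the question already names: reads may need a second lookup, but object documents never change when a label does.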
So, what is the recommended approach?

Firestore how to query field containing any of the values in the array

Say my Firestore DB contains a collection of documents, each with a field that contains a large array of numbers. For Example:
{
arr: [11,13,24,16,37,50]
},
{
arr: [12,34,55,56]
},
{
arr: [12,16,27,59]
}
How can I make a query that returns all the documents where the 'arr' field contains any of the values in a certain array?
For example, if I query with [16,13] I get the first and third documents only (first one contains both 16 and 13, third one contains 16).
Please note that both the 'arr' array and the array in my query can contain a large number of values (> 1000), so I can't use 'array-contains-any'.
Is it possible to do that?
Can I structure my DB differently in order to achieve that goal?
As you know, array-contains-any supports up to 10 comparison values. If you have a large array of over 1000 values, you might be better off restructuring your collection into something like this:
Example:
arr:
  arr_name:
    title: "arr_title"
    arr_1: {
      "10": true,
      "11": true,
      "12": true
    }
Example:
db.collection("arr_name")
.whereField("arr_1.10", isEqualTo: true)
.whereField("arr_1.12", isEqualTo: true)
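The map-of-booleans layout can be modeled in plain Python to check what each query returns. One point worth keeping in mind: chained `whereField` filters, as in the Swift example, are combined with AND, so that query returns documents containing both 10 and 12. Getting "any of" semantics from this layout would take one equality query per value with the results merged on the client; that is an assumption about how you might use the layout, not something the answer states. The helper names below are hypothetical and no Firestore SDK is used.

```python
def to_membership_map(values: list[int]) -> dict[str, bool]:
    """Turn arr: [11, 13] into {"11": True, "13": True}; field names are strings."""
    return {str(v): True for v in values}


def matches_all(doc_map: dict[str, bool], wanted: list[int]) -> bool:
    """Chained equality filters on arr_1.<value>: every value must be present (AND)."""
    return all(doc_map.get(str(v)) is True for v in wanted)


def matches_any(doc_map: dict[str, bool], wanted: list[int]) -> bool:
    """Any-of semantics: at least one value present, e.g. by merging the results
    of one equality query per value on the client (assumed approach)."""
    return any(doc_map.get(str(v)) is True for v in wanted)


# The three documents from the question.
docs = {
    "doc1": [11, 13, 24, 16, 37, 50],
    "doc2": [12, 34, 55, 56],
    "doc3": [12, 16, 27, 59],
}
maps = {name: to_membership_map(arr) for name, arr in docs.items()}

print([n for n, m in maps.items() if matches_any(m, [16, 13])])  # ['doc1', 'doc3']
print([n for n, m in maps.items() if matches_all(m, [16, 13])])  # ['doc1']
```

With query arrays of 1000+ values, the any-of approach means that many separate queries, so it is worth measuring before committing to it.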

Recursively query for all linked documents in CosmosDB

We've built a document schema where job documents are uniquely identified by an id property and link to their parent through a ParentJobId property.
For example:
{
  "Type": "Request",
  "StateName": "Success",
  "id": "4365b7ec-6eee-468a-94f6-ab65d6434611",
  "ParentJobId": null
},
{
  "Type": "Machine",
  "StateName": "ChildJobFailed",
  "id": "27040208-add5-97e4-6bd2-d991de73c9b5",
  "ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"
},
{
  "Type": "Application",
  "StateName": "Error",
  "id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f",
  "ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"
}
How can I query for all documents that are related to the root parent job?
There is no way in CosmosDB to do that in a single query. You could, of course, recursively walk the tree with multiple round trips. You could even do it in one round trip by calling a stored procedure you wrote that makes the multiple requests.
However, I've found that the best way to model hierarchies (trees) for fast retrieval in NoSQL databases is as an array containing a materialized path. Look at this example:
documents = [
  {id: 'A', hierarchy: [1, 2, 3]},
  {id: 'B', hierarchy: [1, 2, 4]},
  {id: 'C', hierarchy: [5]},
  {id: 'D', hierarchy: [1, 6]},
]
"A" is "in" Project 3, whose parent is Project 2, whose parent is Project 1. "B" is "in" Project 4, whose parent is Project 2, which still has Project 1 as its parent. Project 5 is another root project like Project 1, and "D" is "in" Project 6, which is a child of Project 1.
Now send in a query like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 1)
It will return documents A, B, and D. Try:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 2)
It will return only documents A and B.
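Here is a small Python sketch of the materialized-path idea applied to the question's jobs. `materialize_path` walks the ParentJobId links up to the root (assuming there are no cycles) to build the hierarchy array you would store on each document, and `ids_where_hierarchy_contains` mimics the `ARRAY_CONTAINS` filter in memory. The short ids R, M and A are hypothetical stand-ins for the GUIDs; unlike the Project example, the path here includes the job's own id so that the root job is returned too.

```python
from typing import Optional


def materialize_path(jobs_by_id: dict[str, dict], job_id: str) -> list[str]:
    """Walk ParentJobId links up to the root; return ids ordered root -> job.
    Assumes the parent links form a tree (no cycles, every parent exists)."""
    path: list[str] = []
    current: Optional[str] = job_id
    while current is not None:
        path.append(current)
        current = jobs_by_id[current]["ParentJobId"]
    return list(reversed(path))


def ids_where_hierarchy_contains(docs: list[dict], value) -> list[str]:
    """In-memory equivalent of SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, value)."""
    return [d["id"] for d in docs if value in d["hierarchy"]]


# Hypothetical short ids for the Request -> Machine -> Application chain.
jobs = {
    "R": {"Type": "Request", "ParentJobId": None},
    "M": {"Type": "Machine", "ParentJobId": "R"},
    "A": {"Type": "Application", "ParentJobId": "M"},
}
job_docs = [{"id": j, "hierarchy": materialize_path(jobs, j)} for j in jobs]
print(materialize_path(jobs, "A"))                 # ['R', 'M', 'A']
print(ids_where_hierarchy_contains(job_docs, "R"))  # ['R', 'M', 'A']

# The answer's Project example.
documents = [
    {"id": "A", "hierarchy": [1, 2, 3]},
    {"id": "B", "hierarchy": [1, 2, 4]},
    {"id": "C", "hierarchy": [5]},
    {"id": "D", "hierarchy": [1, 6]},
]
print(ids_where_hierarchy_contains(documents, 1))  # ['A', 'B', 'D']
print(ids_where_hierarchy_contains(documents, 2))  # ['A', 'B']
```

The path is computed once at write time, which is what lets a single query replace the recursive walk at read time.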
One word of caution, though: I don't know how performant this approach is in DocumentDB, which I don't think allows indexes on array fields. Maybe one of the DocumentDB product managers who monitor Stack Overflow can chime in on this.
This approach is commonly used with NoSQL databases like CouchDB and MongoDB (combining materialized path and array of ancestors) and even SQL databases supporting array types like Postgres.

Using Parameters in Neo4j Relationship Queries

I'm struggling to work around a small limitation of Neo4j in that I am unable to use a parameter in the Relationship section of a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately, because r is a collection of relationships rather than a single relationship, this fails with the following error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1Name), an ending point (n2Name), and a single path to travel along (relType). I need a list of nodes to come out of the query (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns the nodes along the path:
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, it returns the nodes Point5, Point6, Point7 and Point8.
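For intuition about what this query computes, here is a plain-Python breadth-first search that finds one shortest directed path using only relationships of a single type and returns the nodes along it, much like `nodes(p)`. It is not how Neo4j evaluates the query, and the edge list is a hypothetical stand-in for the console graph, which is not reproduced here.

```python
from collections import deque
from typing import Optional


def shortest_path_of_type(
    edges: list[tuple[str, str, str]], start: str, end: str, rel_type: str
) -> Optional[list[str]]:
    """BFS over directed (source, type, target) edges, following only rel_type.
    Returns the nodes from start to end on one shortest path, or None."""
    adjacency: dict[str, list[str]] = {}
    for src, typ, dst in edges:
        if typ == rel_type:
            adjacency.setdefault(src, []).append(dst)

    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            step: Optional[str] = node
            while step is not None:  # walk back to the start
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:  # first visit is the shortest in BFS
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical graph: a PATH_TWO chain plus a PATH_ONE shortcut.
edges = [
    ("Point5", "PATH_TWO", "Point6"),
    ("Point6", "PATH_TWO", "Point7"),
    ("Point7", "PATH_TWO", "Point8"),
    ("Point5", "PATH_ONE", "Point8"),
]
print(shortest_path_of_type(edges, "Point5", "Point8", "PATH_TWO"))
# ['Point5', 'Point6', 'Point7', 'Point8']
print(shortest_path_of_type(edges, "Point5", "Point8", "PATH_ONE"))
# ['Point5', 'Point8']
```

Filtering the edges by type before searching mirrors the `WHERE ALL (r IN rels(p) ...)` condition: the shortcut is ignored when only PATH_TWO relationships are allowed.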
I suspect that in your case, using a common relationship type name to connect your Point nodes would be more efficient.
If having Path1, Path2, ... is for knowing the distance between two points, you can easily get the distance by asking for the length of the path, as in this query against your console link:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[]->(n2))
RETURN length(p)
If you need to return only paths of a given relationship length, you can do it without shortestPath by specifying a fixed depth:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT 1
As you can see, specifying the relationship type is not mandatory: you can omit it, or add the :NEXT type if you have other relationship types in your graph.
If you need to match on the type, e.g. for the path from point 5 to point 8 in your console link where the path may only use PATH_TWO relationships, you can do this:
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE ALL(rel IN r WHERE type(rel) = 'PATH_TWO')
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED the Path1, Path2 style, a short explanation of why could help us find a more appropriate query.
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p
