How to move a subcollection in Cloud Firestore - firebase

We are planning to implement a virtual filesystem using Google Firestore.
The idea of subcollections is nice because it allows us to model our data in terms of a folder hierarchy, like so: /folders/folderA/entities/folderB/entities/fileX
Much like an actual filesystem, we'd like to handle cross-folder moves, such as moving nested subfolder folderB from parent folderA to parent folderC. Indeed, the folders we want to move will often contain their own subcollections of files and folders, nested an arbitrary K levels deep.
This comment suggests that moving a document will not automatically move its associated subcollections. Similarly, deleting a document does not delete its underlying subcollections, leaving them as orphans. It seems like the only way to move a folder (and its entities) from one parent to another would be a recursive clone + delete strategy, which may be difficult to do reliably and transactionally if the folder contains a very large number of sub-entities.
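A minimal sketch of the recursive clone + delete strategy, using a plain Python dict keyed by full document path as a stand-in for Firestore. The `store` dict and the `move_subtree` helper are hypothetical and not part of any Firebase SDK; the point is only that every document under the old path has to be rewritten, so the cost grows with the size of the subtree.

```python
def move_subtree(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Clone every document at or under old_prefix to new_prefix, then delete the originals.

    `store` maps full document paths to document data (an in-memory stand-in,
    not a real Firestore client). Returns the number of documents moved.
    """
    if new_prefix == old_prefix or new_prefix.startswith(old_prefix + "/"):
        raise ValueError("cannot move a folder into itself or its own descendant")

    # Match the folder document and everything below it, but not siblings
    # that merely share a name prefix (e.g. folderB vs folderB2).
    to_move = [
        path for path in store
        if path == old_prefix or path.startswith(old_prefix + "/")
    ]
    # Clone phase: write each document under the new parent path.
    for path in to_move:
        store[new_prefix + path[len(old_prefix):]] = store[path]
    # Delete phase: remove the originals only after all copies exist.
    for path in to_move:
        del store[path]
    return len(to_move)


store = {
    "folders/folderA": {"name": "A"},
    "folders/folderA/entities/folderB": {"name": "B"},
    "folders/folderA/entities/folderB/entities/fileX": {"name": "X"},
    "folders/folderC": {"name": "C"},
}
moved = move_subtree(
    store,
    "folders/folderA/entities/folderB",
    "folders/folderC/entities/folderB",
)
```

Against a real database, the clone and delete phases for a large subtree would probably have to be split across several write batches, so the move as a whole would not be atomic; that is the reliability concern raised above.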
The alternative is to abandon subcollections and store all folders at the root instead, using a document field like parent_id to point to other docs within the flat collection. In theory this shouldn't affect query speed, thanks to Firestore's aggressive indexing, but we've been unable to confirm that locally: as the total number of documents in the DB increases, querying via subcollections is vastly faster than querying a single top-level collection. A reproducible repo is available here. Note that the repo uses a local emulator instance rather than an actual Firestore server.
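For comparison, a sketch of the flat parent_id layout, again with an in-memory dict standing in for the collection. The helper names (`children_of`, `is_descendant`, `move`) are made up for illustration. A move becomes a single field update on one document, at the price of walking parent pointers to rule out cycles.

```python
def children_of(docs: dict, parent_id):
    """Return ids of documents whose parent_id matches (stand-in for an equality query)."""
    return sorted(doc_id for doc_id, doc in docs.items() if doc["parent_id"] == parent_id)


def is_descendant(docs: dict, node_id: str, ancestor_id: str) -> bool:
    """Walk parent pointers upward from node_id looking for ancestor_id."""
    current = docs[node_id]["parent_id"]
    while current is not None:
        if current == ancestor_id:
            return True
        current = docs[current]["parent_id"]
    return False


def move(docs: dict, node_id: str, new_parent_id: str) -> None:
    """Re-parent a node by updating one field; its descendants are untouched."""
    if new_parent_id == node_id or is_descendant(docs, new_parent_id, node_id):
        raise ValueError("cannot move a folder into itself or its own descendant")
    docs[node_id]["parent_id"] = new_parent_id


docs = {
    "folderA": {"parent_id": None},
    "folderB": {"parent_id": "folderA"},
    "folderC": {"parent_id": None},
    "fileX": {"parent_id": "folderB"},
}
move(docs, "folderB", "folderC")
```

After the move, fileX still points at folderB, so the whole subtree follows without any of its documents being rewritten.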
Any advice would be very helpful!

Related

Firebase free account limitations using Firestore

Based on this other question and on this pricing list, I have the following questions:
What's the point of using collections when we have a limitation for reads, writes and deletes per document?
I have a collection with 2 different collections inside, would I increase everything x3?
Would it be better to move everything to the first collection as a single document?
The Firestore price for reading ONE document depends neither on the collection (or sub-collection) that contains the document nor on the sub-collection(s) that the document contains.
As you can read in the SO answer/question you refer to, "Firestore queries are always 'shallow'", meaning that when you read a document, you pay for the document read but you don't pay at all for the documents that are in its sub-collection(s).
It's worth noting that the concept of sub-collection can be a bit "misleading".
Let's take an example: Imagine a doc1 document under the col1 collection
col1/doc1/
and another one subDoc1 under the subCol1 (sub-)collection
col1/doc1/subCol1/subDoc1
Actually, from a technical perspective, these two collections (col1 & subCol1) are not related to each other at all. They share part of their path and nothing else. One side effect of this is that if you delete a document, its sub-collection(s) still exist.
So, to answer your questions:
"I have a collection with 2 different collections inside, would I increase everything x3?"
It depends on exactly what you read. If you only read documents from the first (parent) collection, you only pay for those document reads. You only pay for the documents contained in the two sub-collections if you build two extra queries to read them. Again, you should consider these three (sub-)collections as totally independent: you pay for each document you read in each of them.
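The shallow-read behaviour can be modelled with a toy store that counts one read per document returned. `ReadCounter` is a hypothetical class for illustration only, and it ignores real billing details such as the minimum charge for a query that returns no results.

```python
class ReadCounter:
    """Toy store that counts one billed read per document returned."""

    def __init__(self, docs: dict):
        self.docs = docs  # full document path -> document data
        self.reads = 0

    def get_collection(self, collection_path: str) -> dict:
        """Return only the documents directly inside collection_path (shallow)."""
        prefix = collection_path + "/"
        result = {
            path: data
            for path, data in self.docs.items()
            # Keep direct children only: nothing after the document id.
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        }
        self.reads += len(result)
        return result


db = ReadCounter({
    "col1/doc1": {},
    "col1/doc2": {},
    "col1/doc1/subCol1/a": {},
    "col1/doc1/subCol1/b": {},
    "col1/doc1/subCol1/c": {},
    "col1/doc1/subCol2/x": {},
    "col1/doc1/subCol2/y": {},
})

db.get_collection("col1")
reads_parent_only = db.reads   # only the two parent documents are counted

db.get_collection("col1/doc1/subCol1")
db.get_collection("col1/doc1/subCol2")
reads_all_three = db.reads     # the sub-collection documents are counted only now
```

Reading the parent collection alone counts two reads here; the five sub-collection documents are counted only when the two extra queries run.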
"Would it be better moving everything to the first collection as a single document?"
It really depends on your data model and on the queries you plan to execute. It is entirely possible to "move everything in a single document", but you should be aware of some limitations, in particular the maximum size of a document, which is 1 MiB.
Also, if your data model contains some complex hierarchical data it may be much easier to organize this data using sub-collections within documents instead of using nested objects or arrays in one document. For example, querying documents through data contained in Arrays has some limitations.
Again, there isn't a "one single truth": it all depends on your specific case. Note that, in the NoSQL world, your data model should be mainly designed in the light of the queries you plan to execute, without hesitating to denormalize data.

Firestore subcollection vs array

First off, I know how Firestore works and have spent a lot of time evaluating different approaches for a good structure. Still, I am considering the following scenario:
There is a database of known recipes. Users can add recipes, but they have to be confirmed to be real recipes and not just variations. So every user can choose recipes from the user-generated list of recipes to state that they know how to cook them (or add new ones).
Now I want users to share their list of recipes with others, but this is where I am not sure how this can best be accomplished using Firestore. The catch is that I want to show all the recipes at once and don't want to paginate them.
I am currently evaluating two possibilities:
Subcollections
Whenever a user shares his list, the user looking at said list will have to load the entire list of the recipes which can result in a high amount of document reads (I suppose realistically ~50, in very rare cases maybe 1000).
Pros:
More natural structure
Easier to maintain (e.g. deleting a recipe, checking if a specific one exists)
Easier to add fields (e.g. timeOfCreation, comment, personalRating, ...)
Cons:
Can result in a high number of reads in the long run
Arrays
I could save every known recipe (the id and an imageURL) inside the user's document (or as a single subdocument "KnownRecipes") within an array. This array could be in form of
recipesKnown: [{rid: "293ndwa", imageURL: "image1.com", timeAdded: 8371201332},
{rid: "9012831", imageURL: "image1.com", timeAdded: 8371201871},
{rid: "jd812da", imageURL: "image1.com", timeAdded: 8371201118},
...
]
Pros:
I only need one document read whenever someone wants to see another user's list
Reading a user's list is probably faster
Cons:
It's hard to update a specific recipe (e.g. someone wants to change the imageURL: I need to change the list locally and send the entire document as an update to the server - since I cannot just change a single element in the array)
When a user decides to have around 1000 recipes (this will maybe never happen, but it could), Firestore's 1 MiB document size limit could be reached. A possible workaround would be to create a separate document and split the array across the two documents.
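Whether 1000 entries would actually approach the limit can be estimated with a rough size calculator. The sizing rules below (UTF-8 length + 1 for strings and field names, 8 bytes for numbers, maps and arrays as the sum of their contents) are my recollection of Firestore's storage size documentation and should be treated as an approximation; the document name and per-document overhead are ignored.

```python
MAX_DOC_BYTES = 1_048_576  # 1 MiB


def string_size(s: str) -> int:
    # Assumed rule: number of UTF-8 bytes plus 1.
    return len(s.encode("utf-8")) + 1


def value_size(value) -> int:
    """Approximate stored size of a field value under the assumed rules."""
    if value is None or isinstance(value, bool):
        return 1
    if isinstance(value, (int, float)):
        return 8
    if isinstance(value, str):
        return string_size(value)
    if isinstance(value, dict):
        # Maps: field names plus field values.
        return sum(string_size(k) + value_size(v) for k, v in value.items())
    if isinstance(value, list):
        return sum(value_size(v) for v in value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# One entry shaped like the array example in the question.
entry = {"rid": "293ndwa", "imageURL": "image1.com", "timeAdded": 8371201332}
entry_bytes = value_size(entry)

# Field name plus 1000 such entries.
estimate = string_size("recipesKnown") + value_size([entry] * 1000)
fits = estimate < MAX_DOC_BYTES
```

Under these assumptions, 1000 entries of this shape come to roughly 50 KB, far below 1 MiB. Realistic image URLs are much longer than "image1.com", so the per-entry size would be higher in practice, but there is still considerable headroom.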
For me, the idea with Subcollections seems to be the more "clean" solution to this problem, but maybe I am missing some arguments on why one of those solutions would be superior over the other.
My most common queries are as follows (ordered descending by importance):
Which recipes can a user cook
Add a recipe a user can cook to the user's list
Who can cook a specific recipe (there is a Recipe -> Cooks subcollection)
Update an existing recipe a user can cook
The answer to your question depends on the level of scalability you want to achieve.
If by design the amount of sub-data you want to store is limited and very low, you should use arrays, since you reduce the number of document reads, which means lower costs.
If your sub-data is supposed to increase "unlimitedly" over time, you should use sub-collections.
If you're building a database which is not supposed to scale in any direction (Proof of concept, very small business, etc.) just go with what you feel more comfortable with.
I'm researching the same question...
One question is whether the data held in the document will ever exceed 1 MB, which is the limit for a single document. From a bit of research, 1 MB holds a huge amount of plain text. Still, if the data could grow far beyond that, it would eventually break. So if you are thinking really big, go with sub-collections.
If we follow Firebase's own logic for structuring data, the answer would also be sub-collections.
Still, I guess the major point is how much data gets pulled. If you load the user document, you pull that whole MB of data with it. With a sub-collection, that data is not loaded along with the user, and even when you do load it you can still lazy-load it.
I guess for the kind of setup you describe, the answer is sub-collections.
Having a key is an additional pro/con of a collection:
a key can help avoid duplicates, but this requires deciding what counts as a duplicate (a definition that might change);
an array's keyless behavior can be emulated in a collection via auto-generated ids.
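The point about keys can be illustrated with a dict standing in for a sub-collection. Both helpers are hypothetical: the first uses the recipe id as the document id, so duplicates are rejected; the second generates an id, which mimics an array's willingness to hold the same entry twice.

```python
import uuid


def add_with_key(collection: dict, recipe_id: str, data: dict) -> bool:
    """Use the recipe id as the document id; adding the same recipe twice is a no-op."""
    if recipe_id in collection:
        return False
    collection[recipe_id] = data
    return True


def add_with_auto_id(collection: dict, data: dict) -> str:
    """Store under a generated id; nothing prevents the same recipe appearing twice."""
    doc_id = uuid.uuid4().hex
    collection[doc_id] = data
    return doc_id


keyed = {}
first = add_with_key(keyed, "293ndwa", {"imageURL": "image1.com"})
second = add_with_key(keyed, "293ndwa", {"imageURL": "image1.com"})

auto = {}
add_with_auto_id(auto, {"rid": "293ndwa"})
add_with_auto_id(auto, {"rid": "293ndwa"})
```

Here "duplicate" means "same recipe id"; a different definition would call for a different choice of key.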
P.S. @Thomas's list of pros/cons in the question has been quite helpful.

Firestore negative impacts for orphan subcollections

The Firestore docs say:
When you delete a document that has associated subcollections, the subcollections are not deleted. They are still accessible by reference. For example, there may be a document referenced by db.collection('coll').doc('doc').collection('subcoll').doc('subdoc') even though the document referenced by db.collection('coll').doc('doc') no longer exists.
Since deleting collections is not recommended from a web client, it seems the easiest thing to do in that case would be to delete the document and leave the orphaned subcollection.
Are there any negative impacts to leaving lots of orphaned subcollections in your database?
Yes, there are negative impacts of leaving orphaned collections in your database.
First, that is wasted space. You'll just end up exceeding your quotas or budgets faster.
Secondly, bad queries: suppose you want to filter all documents across different sub-collections by a condition on one of their properties. That query would also match documents in your orphaned sub-collections, which is inefficient in terms of both performance and cost.
Moreover, I am not even sure how that would impact any index you want to create. In general, leaving orphaned entries in a nested document database is a bad idea, especially if it is not at the leaf level of your structure.
Easiest thing to do - Create a cloud function that cleans up all your orphaned sub collections. You can always call that from the client in turn.
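The core check such a cleanup job would need can be sketched over a plain set of document paths. `find_orphans` is a hypothetical helper; how the paths are enumerated and deleted in a real Cloud Function is left open here.

```python
def find_orphans(paths: set) -> list:
    """Return document paths that have at least one missing ancestor document."""
    orphans = []
    for path in paths:
        parts = path.split("/")
        # Document paths alternate collection/document segments, so ancestor
        # documents are the prefixes of length 2, 4, ..., len(parts) - 2.
        for end in range(2, len(parts), 2):
            if "/".join(parts[:end]) not in paths:
                orphans.append(path)
                break
    return sorted(orphans)


paths = {
    "coll/doc2",
    "coll/doc2/subcoll/subdoc",   # parent coll/doc2 exists
    "coll/doc/subcoll/subdoc",    # parent coll/doc was deleted
}
orphans = find_orphans(paths)
```

In this example only coll/doc/subcoll/subdoc is reported, because coll/doc no longer exists while the sub-document still does.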

Using firebase tree structure to represent a "document outline" structure directly

How good/stupid would it be to use Firebase tree structure to directly represent a user-facing tree structure, like a "document outline" in "word processors"?
As opposed to e.g. doing an SQL-join parent-child type of relationship and then building the tree via a projection (which would probably be slow).
I know that there is a limit of 32 levels of nesting ( https://www.firebase.com/docs/web/guide/understanding-data.html ), which should be enough, as I cannot imagine a sane user wanting to do as many levels of nesting for a textual tree-outline...
Although maybe I need to divide 32 by two, because of each node needing to have sub nodes for its children and metadata, right?
I know that once a tree node is accessed via Firebase API, then all sub-nodes need to be fetched, which could be a performance problem if the user has a lot of data, but in the end I think this would not be a problem, since the data would mostly be a user-entered plaintext (short).
A performance problem could arise if the user pastes some very long chunks of text copied from somewhere (e.g. tens of kilobytes). But then I could separate those "TLOB-s" via a kind of "symlink" in firebase and fetch them on-demand from a different node, right? Same should apply for separating images and other heavy objects, right?
Although in a prototype and early stages, this should probably be ignored, for the sake of simplicity...
I could probably put in place a generic approach to "symlinking", to overcome the 32 levels limitation and the need to fetch all sub-nodes at once, right? Is there some best-practices approach for that (e.g. syntax for a firebase node which would symbolise a link to another node) ?
I have extracted the "symlinking" idea to a separate question: Firebase "symlink" to another node .
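One possible shape for such a "symlink" convention, sketched with nested Python dicts standing in for the tree. The `_link` key name and the `resolve` helper are invented for illustration and are not a Firebase feature; the idea is that a node holding only a path is fetched from that other location on demand.

```python
def get_node(root: dict, path: str) -> dict:
    """Look up a node by a slash-separated path from the root."""
    node = root
    for key in path.split("/"):
        node = node[key]
    return node


def resolve(root: dict, node: dict, max_hops: int = 8) -> dict:
    """Follow "_link" entries until a node without a link is reached."""
    hops = 0
    while "_link" in node:
        if hops >= max_hops:
            raise RuntimeError("too many link hops (possible cycle)")
        node = get_node(root, node["_link"])
        hops += 1
    return node


root = {
    "outlines": {
        "doc1": {
            "title": "Intro",
            "children": {
                # Heavy content lives elsewhere and is referenced by path.
                "c1": {"_link": "blobs/b1"},
            },
        },
    },
    "blobs": {"b1": {"text": "very long pasted text"}},
}
target = resolve(root, root["outlines"]["doc1"]["children"]["c1"])
```

The hop limit is a guard against links that point at each other; the same mechanism could be used to continue an outline in a separate subtree once the nesting gets deep.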
I could probably partition the topmost nodes into some kinds of projects/categories to prevent having to fetch absolutely everything the user has ever had...
Is my reasoning/approach correct?
Is there any consideration that I did not think of, e.g. innate limits on data size or performance or e.g. security rules?
Would I be better served by other technologies like Couchbase/Pouchbase ?
Further details: this is for a hybrid mobile app with some emphasis also on web access and offline access. I hope to do most of the logic in JavaScript. The UI part of the question is here: HTML tree for hybrid mobile app .

DocumentDb and how to create folder?

I am new to DocumentDB and am trying to determine the best way to store documents. We are uploading documents every 15 minutes and I need to keep them as easily separated by upload as possible. At first glance, I thought I could have a database and a collection for each upload. Then I discovered you can only have 3 collections per database. This leaves me with either adding a naming convention or trying to use folders and paths. According to the same source (http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/), we are limited to 100 paths per collection. This leaves folders. I have been looking, but I haven't found anything concrete on creating folders within a collection. The object API doesn't have an obvious add/create method.
Is this possible? If so, are we limited to how many (assuming I stay within the allowed collection/database size)?
You could define a sequential naming convention and add a range index to the collection's indexing policy. Then, if you need to retrieve a range of documents, the query can use that index and take advantage of DocumentDB's indexing capabilities efficiently.
As a recommendation, examine the request charge response header on the requests you fire off during your tests. This lets you gauge how efficient your setup is (how heavily it taxes the database, which translates into your cost for the service).
Sorry about the comment. What we ended up doing was just dumping everything into one collection. The Azure DocumentDB query language (which is SQL-like) seems robust enough to handle detailed queries, though I am not sure what the efficiency will be like once we have a ton of documents in there.
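One way such a naming convention could look, sketched without any database: ids that sort lexicographically by upload time, so that a time window becomes a contiguous range. The id format and helper names are assumptions for illustration; a sorted Python list with binary search stands in for the range index.

```python
from bisect import bisect_left
from datetime import datetime


def make_id(upload_time: datetime, seq: int) -> str:
    """Build an id that sorts lexicographically by upload time, then sequence number."""
    return f"{upload_time:%Y%m%dT%H%M}-{seq:06d}"


def ids_in_range(sorted_ids: list, start: datetime, end: datetime) -> list:
    """Return ids whose upload time falls in the half-open window [start, end)."""
    lo = bisect_left(sorted_ids, f"{start:%Y%m%dT%H%M}")
    hi = bisect_left(sorted_ids, f"{end:%Y%m%dT%H%M}")
    return sorted_ids[lo:hi]


ids = sorted([
    make_id(datetime(2015, 1, 1, 0, 0), 0),
    make_id(datetime(2015, 1, 1, 0, 0), 1),
    make_id(datetime(2015, 1, 1, 0, 15), 0),
    make_id(datetime(2015, 1, 1, 0, 30), 0),
])
second_upload = ids_in_range(
    ids, datetime(2015, 1, 1, 0, 15), datetime(2015, 1, 1, 0, 30)
)
```

Zero-padded, fixed-width components are what make string order match chronological order; the 15-minute upload cadence from the question maps directly onto the window boundaries.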
