Firestore negative impacts for orphan subcollections

The firestore docs say:
When you delete a document that has associated subcollections, the subcollections are not deleted. They are still accessible by reference. For example, there may be a document referenced by db.collection('coll').doc('doc').collection('subcoll').doc('subdoc') even though the document referenced by db.collection('coll').doc('doc') no longer exists.
Since deleting collections is not recommended from a web client, it seems the easiest thing to do in that case would be to delete the document and leave the orphaned subcollection.
Are there any negative impacts to leaving lots of orphaned subcollections in your database?

Yes, there are negative impacts to leaving orphaned subcollections in your database.
First, that is space you've already wasted. You'll just end up exceeding your quotas or budgets faster.
Second, inefficient queries: suppose you want to filter documents by a condition on a property of documents in different subcollections. The results would include the documents in your orphaned subcollections too. This is inefficient in terms of both performance and cost.
Moreover, I am not even sure how that would impact any index you want to create. In general, leaving orphaned entries in a nested document database is a bad idea, especially if it is not at the leaf level of your structure.
The easiest thing to do is to create a Cloud Function that cleans up all your orphaned subcollections. You can then call that function from the client.
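As a rough illustration of such a cleanup, here is a minimal sketch of a recursive delete that could run in a trusted server environment such as a Cloud Function. It assumes the reference passed in behaves like a DocumentReference from the google-cloud-firestore Python client (collections(), list_documents() and delete()). The function name delete_document_tree is hypothetical, and the sketch issues one delete call per document, so it is not tuned for very large trees.

```python
def delete_document_tree(doc_ref):
    """Delete doc_ref and every document nested beneath it.

    Returns the number of delete calls issued.

    Assumption: doc_ref behaves like a google-cloud-firestore
    DocumentReference, i.e. .collections() yields its subcollections,
    each subcollection's .list_documents() yields document references,
    and .delete() removes a single document.
    """
    deleted = 0
    for subcollection in doc_ref.collections():
        for child_ref in subcollection.list_documents():
            # Recurse first so nested subcollections are not left orphaned.
            deleted += delete_document_tree(child_ref)
    doc_ref.delete()
    return deleted + 1
```

Calling it with the reference of the document you want to remove deletes the descendants first and the document itself last, so an interrupted run does not leave orphans behind an already-deleted parent.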

Does Firebase remove collection indexes when a collection becomes empty?

When a collection is deleted from Cloud Firestore, its indexes are deleted along with it. I presume that when a collection goes from one or more documents to zero documents that its indexes are preserved. However, in the Cloud Firestore UI, when a collection goes from one document to zero the collection disappears from the root collections tree. Again, I presume this is an artifact of the Cloud Firestore UI, but it got me wondering whether something more happens when a collection becomes empty (as opposed to the collection being deleted outright).
Can you please help clarify what happens (if anything) when a collection goes from one or more documents to zero in Cloud Firestore? Do I need to be worried about losing any indexes when this occurs?
I'm neither Googler nor Firebaser, BUT...
Firestore indexes documents, not collections - the collection paths are an organizing principle more than physical entities. The "collections" are part of the path to documents, and it's the paths and the document fields that end up indexed.
Case in point: you can actually delete a collection while child documents remain, and they will still be indexed with the collection name/ID as part of their path - you'll see this in the console with the collection (and any interstitial document) names italicized.
When a collection goes from 1 to 0 documents, all that happens is that the document is gone, and nothing else. The UI sees no reason to display a collection when there is nothing to show.
Collections don't really "exist". They are just ways to organize documents for the purpose of making queries. What you see in the console is just there to help you visualize the contents of the database. Collections will apparently spring into "existence" when a document is first created, and just as quickly disappear when there are none. They do not work like directories in a filesystem.
An index is just a way of telling Firestore that you have special query needs for documents in a certain named collection or collection group. The index simply enables the query against the documents in the collection or collection group that you name. The index works without requiring any documents to index, and it will continue working no matter how many documents exist.
Some great answers by LeadDreamer and Doug already, but one more thing you seem to be curious about: deleting all documents from a collection does not affect the index definitions for that collection. So if you later add documents to the collection again, the same index definitions will still apply.
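The idea in the answers above, that collections are only path segments of documents, can be pictured with a toy in-memory model. This is purely illustrative and is not how Firestore is implemented; the PathStore class and its methods are made up for this sketch.

```python
class PathStore:
    """Toy store keyed by full document path, e.g. 'coll/doc/subcoll/subdoc'."""

    def __init__(self):
        self.docs = {}

    def set(self, path, data):
        self.docs[path] = data

    def delete(self, path):
        # Only the exact path is removed; nested paths are untouched.
        self.docs.pop(path, None)

    def root_collections(self):
        # A collection is "visible" only while some document path starts with it.
        return sorted({path.split("/")[0] for path in self.docs})


store = PathStore()
store.set("coll/doc", {"name": "parent"})
store.set("coll/doc/subcoll/subdoc", {"name": "child"})
store.set("other/only", {"n": 1})

store.delete("other/only")  # last document gone, so "other" disappears
store.delete("coll/doc")    # parent gone, nested document path remains

print(store.root_collections())                 # ['coll']
print("coll/doc/subcoll/subdoc" in store.docs)  # True
```

In this model nothing ever creates or deletes a collection. It appears with its first document path and vanishes with its last, which matches the behaviour described for the console.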

Firebase free account limitations using Firestore

Based on this other question and on this pricing list, I have the following question:
What's the point of using collections when we have a limitation for reads, writes and deletes per document?
I have a collection with 2 different collections inside, would I increase everything x3?
Would it be better for moving everything to the first collection as a single document?
The Firestore price for reading ONE document depends neither on the collection (or sub-collection) containing the document nor on the sub-collection(s) contained by the document.
As you can read in the SO answer/question you refer to, "Firestore queries are always 'shallow'", meaning that when you read a document, you pay for the document read but you don't pay at all for the documents that are in its sub-collection(s).
It's worth noting that the concept of sub-collection can be a bit "misleading".
Let's take an example: Imagine a doc1 document under the col1 collection
col1/doc1/
and another one subDoc1 under the subCol1 (sub-)collection
col1/doc1/subCol1/subDoc1
Actually, from a technical perspective, these two collections (col1 & subCol1) are not related to each other at all. They just share a part of their path but nothing else. One side effect of this is that if you delete a document, its sub-collection(s) still exist.
So, to answer your questions:
"I have a collection with 2 different collections inside, would I increase everything x3?"
It depends on what you exactly read. If you only read documents from the first (parent) collection, you will only pay for these document reads. You will only pay for the documents contained in the two sub-collections if you build two extra queries to read the documents in these 2 sub-collections. Again, you just have to consider these three (sub-)collections as totally independent and therefore you pay for each document you read in each of those collections.
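To make the "shallow" read accounting concrete, here is a toy model that counts one read per document returned by a collection read. It is only a sketch of the counting rule described above, not of Firestore's billing system; CountingStore and get_collection are hypothetical names, and the paths reuse the col1/subCol1 example.

```python
class CountingStore:
    """Toy store that counts one read per document returned."""

    def __init__(self, docs):
        self.docs = dict(docs)  # full document path -> data
        self.reads = 0

    def get_collection(self, collection_path):
        # Direct children have exactly one more path segment than the
        # collection itself; deeper documents are never included.
        depth = collection_path.count("/") + 2
        prefix = collection_path + "/"
        result = {
            path: data
            for path, data in self.docs.items()
            if path.startswith(prefix) and path.count("/") + 1 == depth
        }
        self.reads += len(result)
        return result


store = CountingStore({
    "col1/doc1": {},
    "col1/doc2": {},
    "col1/doc1/subCol1/subDoc1": {},
    "col1/doc1/subCol1/subDoc2": {},
    "col1/doc1/subCol2/subDocA": {},
})

store.get_collection("col1")
print(store.reads)  # 2: only the parent documents were read

store.get_collection("col1/doc1/subCol1")
store.get_collection("col1/doc1/subCol2")
print(store.reads)  # 5: the sub-collection documents cost extra only when queried
```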
"Would it be better moving everything to the first collection as a single document?"
It really depends on your data model and on the queries you plan to execute. It is totally possible to "move everything in a single document", but you should be aware of some limitations, in particular the maximum size of a document, which is 1 MiB.
Also, if your data model contains some complex hierarchical data it may be much easier to organize this data using sub-collections within documents instead of using nested objects or arrays in one document. For example, querying documents through data contained in Arrays has some limitations.
Again, there isn't a "one single truth": it all depends on your specific case. Note that, in the NoSQL world, your data model should be mainly designed in the light of the queries you plan to execute, without hesitating to denormalize data.

How to move a subcollection on CloudFirestore

We are planning to implement a virtual filesystem using Google Firestore.
The idea of subcollections is nice because it allows us to model our data in terms of a folder hierarchy, like so: /folders/folderA/entities/folderB/entities/fileX
Much like an actual filesystem, we'd like to handle cross-folder moves, such as moving nested subfolder folderB from parent folderA to parent folderC. Indeed, it will often be the case that the folders we want to move themselves contain their own subcollections of files and folders an arbitrary K levels deep.
This comment suggests that moving a document will not automagically move its associated subcollections. Similarly, deleting a document will forego deleting its underlying subcollections, leaving them as orphans. It seems like the only way to move a folder (and its entities) from one parent to another would be through a recursive clone + delete strategy, which may be difficult to accomplish reliably and transactionally if its sub-entities are massive.
The alternative is to abandon using subcollections and store all folders at the root instead, using a document field like parent_id to point to other docs within the flat collection. This shouldn't impact querying speeds due to Firestore's aggressive indexing, but we've been unable to reproduce this claim locally; i.e., querying via subcollections is vastly more performant as the total number of documents in the DB increases, versus storing everything at the top level. A reproducible repo is available here. Note that the repo uses a local emulator instance, as opposed to an actual Firestore server.
Any advice would be very helpful!
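The difference between the two layouts described in the question can be sketched with plain dictionaries standing in for the database. This is a toy comparison of how many documents a move touches in each model, not Firestore code; both function names and the sample IDs are made up for illustration.

```python
def move_subtree_by_path(docs, old_prefix, new_prefix):
    """Nested-path model: every descendant path is rewritten (clone + delete)."""
    affected = [
        path for path in docs
        if path == old_prefix or path.startswith(old_prefix + "/")
    ]
    for path in affected:
        docs[new_prefix + path[len(old_prefix):]] = docs.pop(path)
    return len(affected)


def move_by_parent_id(docs, doc_id, new_parent_id):
    """Flat model: only the moved document's parent_id field changes."""
    docs[doc_id]["parent_id"] = new_parent_id
    return 1


nested = {
    "folders/folderA/entities/folderB": {},
    "folders/folderA/entities/folderB/entities/fileX": {},
    "folders/folderC": {},
}
touched_nested = move_subtree_by_path(
    nested,
    "folders/folderA/entities/folderB",
    "folders/folderC/entities/folderB",
)

flat = {
    "folderA": {"parent_id": None},
    "folderB": {"parent_id": "folderA"},
    "folderC": {"parent_id": None},
    "fileX": {"parent_id": "folderB"},
}
touched_flat = move_by_parent_id(flat, "folderB", "folderC")

print(touched_nested, touched_flat)  # 2 1
```

In the nested model the number of writes grows with the size of the moved subtree, while in the flat model it stays at one regardless of how many descendants the folder has. The sketch says nothing about query speed, which is the separate concern raised in the question.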

Creating Empty Documents in Firestore

I understand that empty documents within collections are removed automatically by the system in Firestore. However, I have a situation now where the name of the document serves a purpose. I have a collection named usernames, and within this, many documents with the ID being the username. For instance, usernames/bob_marley is what I might see in the database. The problem here is that, since the documents do not have any fields in them, they get removed automatically thereby defeating the purpose of the set-up. How should I be structuring my database in these cases?
Thank you
The easiest thing to do is simply not allow the document to ever become empty. Keep one property in it with (for example) "exists = true" and make sure it never gets removed. Use a security rule to prevent this, if you're concerned about users accidentally doing this to themselves.
Another thing to do is re-evaluate what exactly you're trying to do with an empty document in the system, and if it's worthwhile to think about how to structure your data in a way that best meets the queries you want to perform.

Advantages of firestore sub-collections

The firestore docs don't have an in depth discussion of the tradeoffs involved in using sub-collections vs top-level collections, but do point out that they are less flexible and less 'scalable'. Given that you sacrifice flexibility in setting up your data in sub-collections, there must be some definite plus sides besides a mentally satisfying structure.
For example how does the time for a firestore query on a single key across a large collection compare with getting all items from a much smaller collection?
Say we want to query a large collection 'People' for all people in a family unit. Alternatively, partition the data by family in the first place into family units.
People -> person: {family: 'Smith'}
versus
Families -> family: {name:'Smith'} -> People -> person
I would expect the latter to be more efficient, but is this correct? Are there any big-O estimates for each?
Any other advantages of sub-collections (eg for transactions)?
I've got some key points about subcollections that you need to be aware of when modeling your database.
1 – Subcollections give you a more structured database.
2 – Queries are indexed by default: query performance is proportional to the size of your result set, not your data set. So the size of your collection does not matter; performance depends on the size of your result set.
3 – Each document has a max size of 1 MiB. For instance, if you have an array of orders in your customer document, it might be a good idea to create a subcollection of orders for each customer, because you cannot foresee how many orders a customer will have. By doing this you don't need to worry about the max size of your document.
4 – Pricing: Firestore charges you for document reads, writes and deletes. Therefore, when you create many subcollections instead of using arrays in the documents, you will need to perform more reads, writes and deletes, thus increasing your bill.
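Point 2 can be illustrated with a sorted list standing in for an index. This is a simplified sketch of why an indexed equality query scales with the number of matches rather than with the collection size. It is not Firestore's actual index implementation, and build_index and query_equal are hypothetical helpers.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, field):
    """Return parallel sorted lists (keys, ids) for one field."""
    entries = sorted((data[field], doc_id) for doc_id, data in docs.items())
    keys = [value for value, _ in entries]
    ids = [doc_id for _, doc_id in entries]
    return keys, ids


def query_equal(index, value):
    """Return the IDs of documents whose indexed field equals value.

    Two binary searches find the matching run in the sorted keys; only that
    run is copied out, so the work after the lookup grows with the number
    of matches, not with the total number of documents.
    """
    keys, ids = index
    start = bisect_left(keys, value)
    end = bisect_right(keys, value)
    return ids[start:end]


people = {
    "p1": {"family": "Smith"},
    "p2": {"family": "Jones"},
    "p3": {"family": "Smith"},
    "p4": {"family": "Nguyen"},
}
family_index = build_index(people, "family")
print(query_equal(family_index, "Smith"))  # ['p1', 'p3']
```

Adding many more non-Smith people to the collection would lengthen the index, but the slice returned for "Smith" would stay the same size.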
To answer the original question about efficiency:
Querying all people with the family 'Smith' from the top-level People collection really is not any slower than asking for all the people in the 'Smith' family sub-collection.
This is explained in the How to Structure Your Data episode of the Get to Know Cloud Firestore video series.
There are some trade-offs between top-level collections and sub-collections to be aware of. Depending on the specific queries you intend to use, you may need to create composite indexes to query top-level collections or collection group indexes to query sub-collections. Both of these index types count toward Firestore's index limits (200 at the time of writing).
These trade-offs are discussed in detail near the bottom of the Understanding Collection Group Queries blog post and in Maps, Arrays and Subcollections, Oh My! episode of the Get to Know Cloud Firestore video series.
I've linked to the relevant parts of both videos.
I was wondering about the same thing. The documentation mainly talks about arrays vs sub-collections. My conclusion is that there are no clear advantages of using a sub-collection over a top-level collection. Sub collections had some clear technical limitations before, but I think those are removed with the recent introduction of collection group queries.
Here are some advantages of both approaches:
Sub collection:
Your database "feels" more structured as you will have fewer top-level collections listed.
No need to store a reference/foreign key/id of the parent document, as it is implied by the database structure. You can get to the parent via the sub collection document ref.
Top-level collection:
Documents are easier to delete. With sub collections, you need to make sure to delete all sub collection documents along with the parent document. The client SDKs have no single call for this, so you might need to roll your own helper functions.
Having the parent id directly in each (sub) document might make it easier to process query results, depending on the application.
Todd answered this in a Firebase YouTube video:
1) There's a limit to how many documents you can create per minute in a single collection if the documents have an always-increasing value (like a timestamp).
2) Very large collections don't do as well from a performance standpoint when you're offline. But they are generally good options to consider.
