Advantages of firestore sub-collections - collections

The Firestore docs don't have an in-depth discussion of the tradeoffs involved in using sub-collections vs top-level collections, but do point out that sub-collections are less flexible and less 'scalable'. Given that you sacrifice flexibility by setting up your data in sub-collections, there must be some definite plus sides besides a mentally satisfying structure.
For example, how does the time for a Firestore query on a single key across a large collection compare with getting all items from a much smaller collection?
Say we want to query a large collection 'People' for all people in a family unit. Alternatively, we could partition the data by family in the first place.
People -> person: {family: 'Smith'}
versus
Families -> family: {name:'Smith'} -> People -> person
I would expect the latter to be more efficient, but is this correct? Are there any big-O estimates for each?
Any other advantages of sub-collections (eg for transactions)?

I've got some key points about subcollections that you need to be aware of when modeling your database.
1 – Subcollections give you a more structured database.
2 – Queries are indexed by default: query performance is proportional to the size of your result set, not your data set. So the size of your collection does not matter; performance depends only on the size of your result set.
3 – Each document has a max size of 1MB. For instance, if you have an array of orders in your customer document, it might be a good idea to create a subcollection of orders to each customer because you cannot foresee how many orders a customer will have. By doing this you don’t need to worry about the max size of your document.
4 – Pricing: Firestore charges you for document reads, writes and deletes. Therefore, when you create many subcollections instead of using arrays in the documents, you will need to perform more reads, writes and deletes, thus increasing your bill.
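
To make point 2 concrete, here is a toy model of an indexed lookup. It is only a sketch of the general idea (a sorted index plus a binary search), not Firestore's actual implementation; the names build_index and query_family are made up for illustration.

```python
from bisect import bisect_left

def build_index(people):
    """Return sorted (family, doc_id) entries, standing in for a single-field index."""
    return sorted((person["family"], doc_id) for doc_id, person in people.items())

def query_family(index, family):
    """Binary-search to the first matching entry, then scan only the matches."""
    i = bisect_left(index, (family,))  # O(log n) jump to the first candidate
    matches = []
    while i < len(index) and index[i][0] == family:
        matches.append(index[i][1])  # work here grows with the result set only
        i += 1
    return matches

people = {
    "p1": {"family": "Smith"},
    "p2": {"family": "Jones"},
    "p3": {"family": "Smith"},
}
index = build_index(people)
smiths = query_family(index, "Smith")
```

In this model the scan touches only the Smith entries, however many other families the index holds, which is the sense in which cost follows the result set rather than the collection size.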

To answer the original question about efficiency:
Querying all people with the family 'Smith' from the top-level people collection really is not any slower than asking for all the people in the 'Smith' family sub-collection.
This is explained in the How to Structure Your Data episode of the Get to Know Cloud Firestore video series.
There are some trade-offs between top-level collections and sub-collections to be aware of. Depending on the specific queries you intend to use you may need to create composite indexes to query top-level collections or collection group indexes to query sub-collections. Both these index types count towards the 200 index exemptions limit.
These trade-offs are discussed in detail near the bottom of the Understanding Collection Group Queries blog post and in Maps, Arrays and Subcollections, Oh My! episode of the Get to Know Cloud Firestore video series.
I've linked to the relevant parts of both videos.

I was wondering about the same thing. The documentation mainly talks about arrays vs sub-collections. My conclusion is that there are no clear advantages of using a sub-collection over a top-level collection. Sub collections had some clear technical limitations before, but I think those are removed with the recent introduction of collection group queries.
Here are some advantages of both approaches:
Sub collection:
Your database "feels" more structured as you will have less top-level collections listed.
No need to store a reference/foreign key/id of the parent document, as it is implied by the database structure. You can get to the parent via the sub collection document ref.
Top-level collection:
Documents are easier to delete. Using sub collections you need to make sure to first delete all sub collection documents before you delete the parent document. There is no API for this so you might need to roll your own helper functions.
Having the parent id directly in each (sub) document might make it easier to process query results, depending on the application.
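
As an illustration of the "roll your own helper" point, here is a minimal sketch of a recursive delete. It is duck-typed: it assumes document references expose collections() and delete(), and collection references expose list_documents(), which is how I understand the Python client (google-cloud-firestore) to work. Treat those method names as an assumption and check them against your SDK.

```python
def delete_document_recursive(doc_ref):
    """Delete all documents in all subcollections of doc_ref, then doc_ref itself.

    Assumes doc_ref has collections() and delete(), and that each collection
    has list_documents(). Returns the number of documents deleted.
    """
    deleted = 0
    for collection in doc_ref.collections():
        for child in collection.list_documents():
            # Children can have their own subcollections, so recurse first.
            deleted += delete_document_recursive(child)
    doc_ref.delete()
    return deleted + 1
```

Note that this issues one delete per document, so each one is billed, and the whole operation is not atomic: if it fails halfway, some descendants will already be gone.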

Todd answered this in a Firebase YouTube video:
1) There's a limit to how many documents you can create per minute in a single collection if the documents have an always-increasing value (like a timestamp)
2) Very large collections don't do as well from a performance standpoint when you're offline. But they are generally good options to consider.

Related

Should I create a duplicate collection/document for each use-case? (Firebase/Firestore)

I'm trying to build an ecommerce app with firebase on the backend. I have a collection of 1000+ products, each of which is stored as a separate document, which have product specific info such as price, title etc.
document:{
title: 'Some Title',
price: '$99.99',
genres: ['Horror', 'Action']
}
So in my app I need to display these products in many places, such as product carousels(similar to a bookshelf with arrow buttons at the ends), and also in a search results page.
At any given page, I assume that I will need to display at least 50 products, either as search results, or multiple carousels. I understand that I can use queries to get this data from firebase. But since each document I retrieve counts as (at least)one firestore read, I assume that a typical user session would run into 100+ reads, if not thousands.
It seems a little inefficient to me that I need to read multiple documents to get this data, when I could just store all that data in a single array, as its own document. That would mean I get charged for one document read, not 50, per page.
Is this how it is expected to be done? Should I create a new document containing the data I need for each specific use case?
P.S. I'm pretty new to backend dev, let alone firebase.
TL;DR Yes, you should create a new document with the needed data for each specific use case, but it’s not recommended to make it as a document with nested objects like arrays with 1000+ elements.
From a technical point of view, Cloud Firestore is optimized for storing large collections of small documents.
Depending on the use case, you can select the most appropriate Cloud Firestore data structure.
For example, the 10 best-selling books of the month can be a single document with nested complex objects like arrays or maps. This structure is useful for use cases with a small or predefined number of elements, but if your data expands over time into larger or growing lists, the document also grows, which can lead to slower document retrieval times.
With thousands of records or more, a better choice can be to structure your data as subcollections. That is, you can create collections within documents when you have data that might expand over time, with the main advantage that, as your lists grow, the size of the parent document doesn't change.
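
A minimal sketch of the "one document per use case" idea for a small, bounded list: copy only the fields a carousel needs into one summary document and refuse to build it if it looks too big. The function name, the field choice and the limit of 50 are hypothetical, and the JSON length is only a rough proxy for how Firestore measures document size.

```python
import json

MAX_DOC_BYTES = 1_048_576  # 1 MiB document limit

def build_carousel_doc(products, fields=("title", "price"), limit=50):
    """Copy a few fields from each product into one summary document."""
    items = [{f: product[f] for f in fields} for product in products[:limit]]
    doc = {"items": items, "count": len(items)}
    # Rough size check only: Firestore computes size differently from JSON.
    approx_bytes = len(json.dumps(doc).encode("utf-8"))
    if approx_bytes > MAX_DOC_BYTES:
        raise ValueError(f"summary document too large: ~{approx_bytes} bytes")
    return doc

products = [
    {"title": f"Book {i}", "price": "$9.99", "genres": ["Horror"]}
    for i in range(60)
]
carousel = build_carousel_doc(products)
```

Reading this one document costs one read instead of 50, at the price of keeping the copy in sync whenever a product's title or price changes.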
Cloud Firestore also has several features to help you manage queries that return a large number of results:
Cursors, which allow you to resume a long-running query.
Page tokens, which help you paginate the query results.
Limits, which specify how many results to retrieve.
Offsets, which allow you to skip a fixed number of documents.
There are no additional costs for using cursors, page tokens, and limits. In fact, these features can help you save money by reading only the documents that you actually need.
As a best practice, do not use offsets. Instead, use cursors. Using an offset only avoids returning the skipped documents to your application, but these documents are still retrieved internally. The skipped documents affect the latency of the query, and your application is billed for the read operations required to retrieve them.
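
The billing difference can be sketched with a little arithmetic. This is a simplified model that assumes every page is full and counts one read per document retrieved; the function names are made up for illustration.

```python
def billed_reads_offset(page_size, page_number):
    """Offset paging: skipped documents are still retrieved and billed."""
    skipped = page_size * page_number  # page_number is 0-based
    return skipped + page_size

def billed_reads_cursor(page_size, page_number):
    """Cursor paging: the query resumes after the last document already seen."""
    return page_size

def total_reads(billed_fn, page_size, pages):
    """Total billed reads for a user who walks through `pages` pages in order."""
    return sum(billed_fn(page_size, n) for n in range(pages))

offset_total = total_reads(billed_reads_offset, page_size=50, pages=10)
cursor_total = total_reads(billed_reads_cursor, page_size=50, pages=10)
```

Under these assumptions, paging through 10 pages of 50 products costs 2,750 reads with offsets versus 500 with cursors, and the gap widens with every further page.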

Google Firestore - Efficiently fetch a single document, perform a point query within a subcollection

Assume I am designing a new Firestore database. Assume I like the idea of a hierarchical design and, as a contrived example, each Year has a sequence of child Weeks of which each has Days.
What's the most performance efficient way to retrieve a single document for today? i.e. 2021-W51-Thursday
Answers are permitted to include changes to the model, e.g. "denormalizing" the day model such that it includes year, week and dayName fields (and querying them).
Otherwise a simple document reference may be the fastest way, like:
DocumentReference ref = db
.Collection("years").Document("2021")
.Collection("weeks").Document("51")
.Collection("days").Document("Thursday");
Thanks.
Any query that identifies a single document to fetch is equally performant to any other query that does the same at the scale that Firestore operates. The organization of collections or documents does not matter at all at scale. You might see some fluctuations in performance at small scale, depending on your data set, but that's not how Firestore is optimized to work.
All collections and all subcollections each have at least one index on the ID of the document that works the same way, independent of each other collection and index. If you can identify a unique document using its path:
/db/XXXX/weeks/YY/days/ZZZZ
Then it scales the same as a document stored using a more flat structure:
/db/XXXXYYZZZZ
It makes no difference at scale, since indexes on collections scale to an infinite number of documents with no theoretical upper limit. That's the magic of Firestore: if the system allows the query, then it will always perform well. You don't worry about scaling and performance at all. The indexes are automatically sharded across computing resources to optimize performance (vs. cost).
All of the above is true for fields of a document (instead of a document ID). You can think of a document ID as a field of a document that must be unique within a collection. Each field has its own index by default, and it scales massively.
With NoSQL databases like Firestore, you should structure your data in such a way that eases your queries, as long as those queries can be supported by indexes that operate at scale. This stands in contrast with SQL databases, which are optimized for query flexibility rather than massive scalability.
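
For the Year/Week/Day example, both layouts reduce to building a document path from a date. A small sketch, assuming ISO week numbering and a hypothetical top-level collection named days for the flat variant:

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def nested_path(d):
    """Hierarchical layout: years/{year}/weeks/{week}/days/{dayName}."""
    year, week, weekday = d.isocalendar()  # ISO year, ISO week, 1 = Monday
    return f"years/{year}/weeks/{week}/days/{DAY_NAMES[weekday - 1]}"

def flat_path(d):
    """Flat layout: one collection ('days' is an assumed name), composite ID."""
    year, week, weekday = d.isocalendar()
    return f"days/{year}-W{week:02d}-{DAY_NAMES[weekday - 1]}"

today = date(2021, 12, 23)
nested = nested_path(today)  # years/2021/weeks/51/days/Thursday
flat = flat_path(today)      # days/2021-W51-Thursday
```

Either string identifies exactly one document, so per the answer above the choice is about which is easier to query and maintain rather than lookup speed. One caveat: near New Year the ISO year can differ from the calendar year.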

Firebase free account limitations using Firestore

Based on this other question and on this pricing list, I have the following question:
What's the point of using collections when we have a limitation for reads, writes and deletes per document?
I have a collection with 2 different collections inside, would I increase everything x3?
Would it be better for moving everything to the first collection as a single document?
The Firestore price for reading ONE document depends neither on the collection (or sub-collection) containing the document nor on the sub-collection(s) contained by the document.
As you can read in the SO answer/question you refer to, "Firestore queries are always 'shallow'", meaning that when you read a document, you pay for the document read but you don't pay at all for the documents that are in its sub-collection(s).
It's worth noting that the concept of sub-collection can be a bit "misleading".
Let's take an example: Imagine a doc1 document under the col1 collection
col1/doc1/
and another one subDoc1 under the subCol1 (sub-)collection
col1/doc1/subCol1/subDoc1
Actually, from a technical perspective, these two collections (col1 & subCol1) are not related to each other at all. They just share a part of their path but nothing else. One side effect of this is that if you delete a document, its sub-collection(s) still exist.
So, to answer your questions:
I have a collection with 2 different collections inside, would I increase everything x3?
It depends on what you exactly read. If you only read documents from the first (parent) collection, you will only pay for these document reads. You will only pay for the documents contained in the two sub-collections if you build two extra queries to read the documents in these 2 sub-collections. Again, you just have to consider these three (sub-)collections as totally independent and therefore you pay for each document you read in each of those collections.
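
A toy model of this "shallow" billing, where each collection path is an independent bucket and you pay one read per document returned from the buckets you actually query. The paths and document IDs below are invented for illustration, and the model ignores details such as charges for empty queries.

```python
def billed_reads(database, collections_read):
    """Count one read per document returned from each collection queried."""
    return sum(len(database[path]) for path in collections_read)

# Each collection path maps to the IDs of the documents it directly contains.
database = {
    "col1": ["doc1", "doc2"],
    "col1/doc1/subCol1": ["subDoc1", "subDoc2", "subDoc3"],
    "col1/doc1/subCol2": ["subDocA"],
}

parent_only = billed_reads(database, ["col1"])
everything = billed_reads(database, list(database))
```

Reading only col1 costs 2 reads here; the 4 sub-collection documents are billed only if you run the two extra queries.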
Would it be better moving everything to the first collection as a single document
It really depends on your data model and on the queries you plan to execute. It is totally possible to "move everything in a single document", but you should take care of some limitations, in particular, the maximum size for a document which is 1 MiB.
Also, if your data model contains some complex hierarchical data it may be much easier to organize this data using sub-collections within documents instead of using nested objects or arrays in one document. For example, querying documents through data contained in Arrays has some limitations.
Again, there isn't a "one single truth": it all depends on your specific case. Note that, in the NoSQL world, your data model should be mainly designed in the light of the queries you plan to execute, without hesitating to denormalize data.

Firestore Collection Write Rate

The article about Best practices for Cloud Firestore states that we should keep the rate of write operations for an individual collection under 1,000 operations/second.
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
What if I have a root-level collection (e.g. "messages") which expects to have more than 1,000 write operations/second?
A limit of 1,000 operations/second is quite generous, but if you find yourself needing more than that, you should consider changing your database schema so that writes are spread across multiple collections. A single collection of messages, to which every user can add messages, doesn't sound like a good way to go, since you can reach that limit very quickly. In this case you should split that collection into multiple collections. A possible schema is the one I explained in the following video:
https://www.youtube.com/watch?v=u3KwKQddPoo
At the end of that video, there is a collection named messages which in turn contains a roomId document. This document contains a subcollection named roomMessages, which holds every message from a chat room as a document. In this case each room writes to its own collection, so there is little chance of reaching that limit.
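
To see why the split helps, here is a small sketch that counts writes per collection for one second of traffic under both schemas. The path follows the messages/{roomId}/roomMessages layout described above; the traffic numbers are invented.

```python
from collections import Counter

def room_messages_path(room_id):
    """Collection path for one chat room's messages."""
    return f"messages/{room_id}/roomMessages"

def writes_per_collection(room_ids, split_by_room):
    """room_ids holds one entry per message written during a given second."""
    if split_by_room:
        return Counter(room_messages_path(room_id) for room_id in room_ids)
    # Single shared collection: every write lands in the same place.
    return Counter({"messages": len(room_ids)})

# Hypothetical traffic: 1,500 messages in one second across three rooms.
traffic = ["roomA"] * 600 + ["roomB"] * 500 + ["roomC"] * 400
busiest_flat = max(writes_per_collection(traffic, split_by_room=False).values())
busiest_split = max(writes_per_collection(traffic, split_by_room=True).values())
```

With one shared collection the busiest collection takes all 1,500 writes; split by room, the busiest one takes 600, so the per-collection rate is bounded by the busiest room rather than by total traffic.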
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
But also remember, Firestore can look up a collection at level 1 as quickly as it can at level 100, so you don't need to worry about that.
The limit of 1,000 ops/sec per collection only applies to realtime updates, so as long as you don't have a snapshot listener this should be okay.
I asked the question on the Cloud Firestore Google Groups
The limit is 10,000 writes per second if no other limits apply first:
https://firebase.google.com/docs/firestore/quotas#writes_and_transactions
Also just keep in mind the best practices for scaling cloud firestore

Is it ok to store a user id as the key of a field in a Firestore document?

Firestore charges for the amount of indexes used. If I have a structure where there is a massive list of ratings different users gave, and have the key as the user Id and the value as the rating, will that take up too many auto created indexes? Is there a good structure around this.
For example, in the collection 'ratings', I shard individual ratings that each user gives into different documents using a complex sharding mechanism I made that fills a document up to the max document size of around 20k, then starts filling up another document. say I have 5 documents, each filled with 20k fields. One of those docs would look like this:
uid1: 3.3
uid2: 5
uid3: 1.234
...
Is there another structure I should be using to store loads of individual 'fields' in Firestore? I don't want to use loads of documents for each rating either as that is too expensive. Arrays aren't big enough to store loads of ratings either.
Arrays aren't big enough to store loads of ratings either
The problem isn't about the arrays, the problem is that the documents have limits. So there are some limits when it comes to how much data you can put into a document. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB total of data in a single document. When we are talking about storing text, you can store quite a lot, but as your array gets bigger, be careful about this limitation.
According to the official documentation regarding modelling data in Cloud Firestore:
Cloud Firestore is optimized for storing large collections of small documents.
So trying to shard a collection by filling up documents one by one, is not such a good idea.
If you are trying to add ratings from multiple users to a single document, in other words storing a large amount of data in a single document that can be updated by lots of users, there is another limitation that you need to take care of: you are limited to 1 write per second on every document. So if lots of users are all trying to write data to the same document at once, you might start to see some of these writes fail. So, be careful about this limitation too.
My recommendation is to store those ratings in an array, if you think that the size of the document will be within the 1 MiB limitation, otherwise use a collection of tags for each object separately.
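
To judge whether a map of uid -> rating will fit, you can estimate the document size up front. The sketch below follows the storage size rules as I understand them from the Firestore documentation (strings cost their UTF-8 length plus 1 byte, numbers 8 bytes, plus fixed overheads of 16 bytes for the document name and 32 bytes per document). Treat the constants as assumptions and verify them against the current docs.

```python
MAX_DOC_BYTES = 1_048_576  # 1 MiB document limit

def string_size(s):
    """Assumed rule: UTF-8 encoded length plus 1 byte."""
    return len(s.encode("utf-8")) + 1

def estimate_ratings_doc_size(doc_path, ratings):
    """Estimate the size of a document whose fields map uid -> numeric rating."""
    # Document name: each path segment as a string, plus 16 bytes (assumed).
    name_size = sum(string_size(part) for part in doc_path.split("/")) + 16
    # Each field: field name as a string, plus 8 bytes for the number (assumed).
    fields_size = sum(string_size(uid) + 8 for uid in ratings)
    return name_size + fields_size + 32  # 32 bytes per-document overhead (assumed)

def fits_in_one_document(doc_path, ratings):
    return estimate_ratings_doc_size(doc_path, ratings) <= MAX_DOC_BYTES

small = estimate_ratings_doc_size("ratings/doc1", {"uid1": 3.3, "uid2": 5})
many = {f"{i:028d}": 4.0 for i in range(40_000)}  # 28-character IDs
too_big = not fits_in_one_document("ratings/doc1", many)
```

Under these assumptions a rating keyed by a 28-character ID costs 37 bytes, so one document holds roughly 28,000 ratings before hitting the limit, and that is before the 1 write per second constraint comes into play.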
