Should I create a duplicate collection/document for each use-case? (Firebase/Firestore)

I'm trying to build an ecommerce app with Firebase on the backend. I have a collection of 1000+ products, each stored as a separate document holding product-specific info such as price, title, etc.
document:{
title: 'Some Title',
price: '$99.99',
genres: ['Horror', 'Action']
}
So in my app I need to display these products in many places, such as product carousels (similar to a bookshelf with arrow buttons at the ends) and a search results page.
On any given page, I assume that I will need to display at least 50 products, either as search results or in multiple carousels. I understand that I can use queries to get this data from Firebase. But since each document I retrieve counts as (at least) one Firestore read, I assume that a typical user session would run into 100+ reads, if not thousands.
It seems a little inefficient to me that I need to read multiple documents to get this data, when I could just store all that data in a single array, as its own document. That would mean I get charged for one document read, not 50, per page.
Is this how it is expected to be done? Should I create a new document containing the data I need for each specific use case?
P.S. I'm pretty new to backend dev, let alone firebase.

TL;DR Yes, you should create a new document with the needed data for each specific use case, but it’s not recommended to make it as a document with nested objects like arrays with 1000+ elements.
From a technical point of view, Cloud Firestore is optimized for storing large collections of small documents.
Depending on the use case, you can select the most appropriate Cloud Firestore data structure.
For example, the 10 best-selling books of the month can be a document with nested complex objects like arrays or maps. This structure is useful for use cases with a small or predefined number of elements but, as the documentation notes, if your data expands over time into larger or growing lists, the document also grows, which can lead to slower document retrieval times.
For more than a thousand records, a better choice can be to structure your data as subcollections. That is, you can create collections within documents when you have data that might expand over time, with the main advantage that, as your lists grow, the size of the parent document doesn't change.
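To put the trade-off above in numbers, here is a minimal Python sketch that models it in memory. It does not call the Firestore SDK; `build_summary_docs`, the `items` field and the chunk size of 10 are hypothetical names and values chosen for illustration. It packs only the fields a carousel needs into a few small summary documents, so a page costs one read per summary document rather than one per product.

```python
from typing import Any, Dict, List

Product = Dict[str, Any]


def build_summary_docs(products: List[Product], fields: List[str],
                       items_per_doc: int) -> List[Dict[str, Any]]:
    """Pack the listed fields of each product into small summary documents.

    Each returned dict stands in for one Firestore document; reading it
    would be billed as a single document read.
    """
    if items_per_doc < 1:
        raise ValueError("items_per_doc must be at least 1")
    # Keep only the fields the page actually displays.
    trimmed = [{name: p[name] for name in fields} for p in products]
    return [
        {"items": trimmed[start:start + items_per_doc]}
        for start in range(0, len(trimmed), items_per_doc)
    ]


# Hypothetical catalogue: the 50 products shown on one page.
products = [
    {"title": f"Product {i}", "price": "$99.99", "genres": ["Horror", "Action"]}
    for i in range(50)
]

reads_one_doc_per_product = len(products)
summary_docs = build_summary_docs(products, ["title", "price"], items_per_doc=10)
reads_with_summaries = len(summary_docs)

print(reads_one_doc_per_product, reads_with_summaries)
```

The price of the cheaper read path is that every summary document containing a product has to be rewritten when that product changes, so this fits data that is read far more often than it is written.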
Cloud Firestore also has several features to help you manage queries that return a large number of results:
Cursors, which allow you to resume a long-running query.
Page tokens, which help you paginate the query results.
Limits, which specify how many results to retrieve.
Offsets, which allow you to skip a fixed number of documents.
There are no additional costs for using cursors, page tokens, and limits. In fact, these features can help you save money by reading only the documents that you actually need.
As a best practice, do not use offsets. Instead, use cursors. Using an offset only avoids returning the skipped documents to your application, but these documents are still retrieved internally. The skipped documents affect the latency of the query, and your application is billed for the read operations required to retrieve them.
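The billing difference between offsets and cursors described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the two functions are hypothetical stand-ins rather than Firestore SDK calls, the list is assumed to be pre-sorted by `title`, and the billing rule encoded here is simply the one stated above (skipped documents are still billed).

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


def page_with_offset(docs: List[Doc], offset: int,
                     limit: int) -> Tuple[List[Doc], int]:
    """Return one page plus the reads billed: skipped docs are billed too."""
    page = docs[offset:offset + limit]
    return page, offset + len(page)


def page_with_cursor(docs: List[Doc], start_after: Optional[str],
                     limit: int, key: str = "title") -> Tuple[List[Doc], int]:
    """Return the page that follows `start_after`; only returned docs are billed.

    Assumes `docs` is already sorted by `key`, as an ordered query would be.
    """
    if start_after is None:
        remaining = docs
    else:
        remaining = [d for d in docs if d[key] > start_after]
    page = remaining[:limit]
    return page, len(page)


docs = [{"title": f"product-{i:03d}"} for i in range(100)]

# Third page of 10 results, fetched both ways.
offset_page, offset_reads = page_with_offset(docs, offset=20, limit=10)
last_seen = docs[19]["title"]  # last document of the previous page
cursor_page, cursor_reads = page_with_cursor(docs, start_after=last_seen, limit=10)

print(offset_reads, cursor_reads)
```

Both calls return the same ten documents, but in this model the offset version is billed for the 20 skipped documents as well, and the gap widens with every further page.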

Related

Firebase free account limitations using Firestore

Based on this other question and on this pricing list, I have the following questions:
What's the point of using collections when we have a limitation for reads, writes and deletes per document?
I have a collection with 2 different collections inside, would I increase everything x3?
Would it be better for moving everything to the first collection as a single document?
The Firestore pricing for reading ONE document is neither a function of the collection (or sub-collection) containing the document nor a function of the sub-collection(s) contained by the document.
As you can read in the SO answer/question you refer to, "Firestore queries are always 'shallow'", meaning that when you read a document, you pay for the document read but you don't pay at all for the documents that are in its sub-collection(s).
It's worth noting that the concept of sub-collection can be a bit "misleading".
Let's take an example: Imagine a doc1 document under the col1 collection
col1/doc1/
and another one subDoc1 under the subCol1 (sub-)collection
col1/doc1/subCol1/subDoc1
Actually, from a technical perspective, these two collections (col1 & subCol1) are not related to each other at all. They just share a part of their path but nothing else. One side effect of this is that if you delete a document, its sub-collection(s) still exist.
So, to answer your questions:
I have a collection with 2 different collections inside, would I increase everything x3?
It depends on what you exactly read. If you only read documents from the first (parent) collection, you will only pay for these document reads. You will only pay for the documents contained in the two sub-collections if you build two extra queries to read the documents in these 2 sub-collections. Again, you just have to consider these three (sub-)collections as totally independent and therefore you pay for each document you read in each of those collections.
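A small in-memory model may help picture why the three (sub-)collections are billed independently. This is a sketch, not the Firestore SDK: the `store` dictionary keyed by full document path and the `docs_in_collection` helper are hypothetical, and they only mimic the "shallow" behaviour described above.

```python
from typing import Any, Dict

Store = Dict[str, Dict[str, Any]]  # full document path -> document data


def docs_in_collection(store: Store, collection_path: str) -> Store:
    """Return only the documents that sit directly in `collection_path`."""
    prefix = collection_path.strip("/").split("/")
    return {
        path: data
        for path, data in store.items()
        if path.split("/")[:-1] == prefix
    }


store: Store = {
    "col1/doc1": {"name": "parent"},
    "col1/doc2": {"name": "sibling"},
    "col1/doc1/subCol1/subDoc1": {"name": "child 1"},
    "col1/doc1/subCol1/subDoc2": {"name": "child 2"},
}

# Reading col1 touches two documents; the sub-collection is a separate query.
parent_reads = len(docs_in_collection(store, "col1"))
sub_reads = len(docs_in_collection(store, "col1/doc1/subCol1"))

# Deleting the parent document leaves the sub-collection documents in place.
del store["col1/doc1"]
orphans = docs_in_collection(store, "col1/doc1/subCol1")

print(parent_reads, sub_reads, len(orphans))
```

In this model you would pay for the sub-collection documents only if you ran the second query, which matches the answer above.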
Would it be better moving everything to the first collection as a single document?
It really depends on your data model and on the queries you plan to execute. It is totally possible to "move everything in a single document", but you should be aware of some limitations, in particular the maximum size for a document, which is 1 MiB.
Also, if your data model contains some complex hierarchical data it may be much easier to organize this data using sub-collections within documents instead of using nested objects or arrays in one document. For example, querying documents through data contained in Arrays has some limitations.
Again, there isn't a "one single truth": it all depends on your specific case. Note that, in the NoSQL world, your data model should be mainly designed in the light of the queries you plan to execute, without hesitating to denormalize data.

Determining number of Firebase reads for nested sub-collection

I have a mobile solution (iOS) that uses Firebase to help sync data between a user's devices. What I have works and keeps clients in sync as I wanted. However, from testing, my reads are a bit out of control for larger data sets and I need to do some optimization. To that end, I wanted to make sure that my understanding of how reads are counted is correct (I am still a newbie at Firebase).
My data is structured like this:
It's a bit nested, I agree, but for all the use cases it seems to be the best way to minimize redundancy, e.g. there are relationships between Cats and Dogs and Birds, but I only store one copy of each, not multiple. In addition, each user's data is segregated from other users' data and I need the ability to version the data. Put that all together with the requirement to alternate collections and documents, and you get what you see.
Based on this structure, I can create queries like this:
Firestore.firestore().collection("userid1").document("data").collection("version0").document("Cats").collection("data").whereField("modifiedDate", isGreaterThanOrEqualTo: someDoubleValue).getDocuments(completion: completionCallback)
This gets me the data I need and seems to only return the number of items I think it should. However, am I correct in saying that if there are 100 Cat type documents (Cat1...Cat100), but only 3 of them have a modifiedDate that is greater than my query parameter, when the data is returned to me, I will only be "charged" for 3 reads? Or have I done something completely silly here and am getting charged for all 100 even though I only get 3 documents back in the callback?
The billing doesn't work any differently for subcollections than it does for top-level collections. You are only billed for the documents transferred, not the entire set of documents in the collection (unless you do request every document).
Cloud Firestore scales massively, and it's expected that you might have a massive number of documents in a collection. Billing a read for each and every document in a collection for each query against that collection would be insanely expensive.
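The Cat1...Cat100 case from the question can be modelled in a few lines. This is a sketch only: `query_modified_since` is a hypothetical in-memory stand-in for the `whereField` query, not an SDK call, and the minimum charge of one read for a query with no results is my understanding of the pricing documentation rather than something stated in this thread.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def query_modified_since(docs: List[Doc],
                         threshold: float) -> Tuple[List[Doc], int]:
    """Return the matching documents and the reads billed for them.

    Only returned documents are billed. The floor of one read for an
    empty result is an assumption taken from the pricing documentation.
    """
    matches = [d for d in docs if d["modifiedDate"] >= threshold]
    billed_reads = max(1, len(matches))
    return matches, billed_reads


# 100 documents whose modifiedDate runs from 1.0 to 100.0.
cats = [{"id": f"Cat{i}", "modifiedDate": float(i)} for i in range(1, 101)]

matches, billed_reads = query_modified_since(cats, threshold=98.0)
print([d["id"] for d in matches], billed_reads)
```

Here three documents match and three reads are billed, even though the collection holds 100 documents.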

Firestore subcollection vs array

First off, I know how Firestore works and have spent a lot of time evaluating different approaches for a good structure. Still, I am considering the following scenario:
There is a database of known recipes. Users can add recipes, but they have to be confirmed to be real recipes and not just variations. So every user can choose recipes from the user-generated list of recipes to state that they know how to cook them (or add new ones).
Now I want users to share their list of recipes with others, but this is where I am not sure how this can best be accomplished using Firestore. The trick is that I want to show all the recipes at once, and don't want to paginate them.
I am currently evaluating two possibilities:
Subcollections
Whenever a user shares his list, the user looking at said list will have to load the entire list of the recipes which can result in a high amount of document reads (I suppose realistically ~50, in very rare cases maybe 1000).
Pros:
More natural structure
Easier to maintain (e.g. deleting a recipe, checking if a specific one exists)
Easier to add fields (e.g. timeOfCreation, comment, personalRating, ...)
Cons:
Can result in a high amount of reads in the long run
Arrays
I could save every known recipe (the id and an imageURL) inside the user's document (or as a single subdocument "KnownRecipes") within an array. This array could be in form of
recipesKnown: [{rid: 293ndwa, imageURL: image1.com, timeAdded: 8371201332},
{rid: 9012831, imageURL: image1.com, timeAdded: 8371201871},
{rid: jd812da, imageURL: image1.com, timeAdded: 8371201118},
...
]
Pros:
I only need one document read whenever someone wants to see another user's list
Reading a user's list is probably faster
Cons:
It's hard to update a specific recipe (e.g. someone wants to change the imageURL: I need to change the list locally and send the entire document as an update to the server - since I cannot just change a single element in the array)
When a user decides to have around 1000 recipes (this will maybe never happen, but it could), Firestore's 1 MiB document size limit could be reached. A possible workaround would be to create a separate document and split the array across the two documents.
For me, the idea with Subcollections seems to be the more "clean" solution to this problem, but maybe I am missing some arguments on why one of those solutions would be superior over the other.
My most common queries are as follows (ordered descending by importance):
Which recipes can a user cook
Add a recipe a user can cook to the user's list
Who can cook a specific recipe (there is a Recipe -> Cooks subcollection)
Update an existing recipe a user can cook
The answer to your question depends on the level of scalability you want to achieve.
If by design the amount of sub-data you want to store is limited and very low, you should use arrays, since you reduce the number of document reads, which means lower costs.
If your sub-data is supposed to increase "unlimitedly" over time, you should use sub-collections.
If you're building a database which is not supposed to scale in any direction (Proof of concept, very small business, etc.) just go with what you feel more comfortable with.
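As a middle ground between one big array and one document per recipe, the workaround mentioned in the question (splitting the array across documents) can be sketched in memory as follows. None of this is Firestore API: `split_known_recipes`, `set_image_url`, the `part` field and the limit of 400 entries per document are hypothetical choices made for illustration.

```python
from typing import Any, Dict, List, Optional

Recipe = Dict[str, Any]
PartDoc = Dict[str, Any]


def split_known_recipes(recipes: List[Recipe], max_per_doc: int) -> List[PartDoc]:
    """Spread one long array over several documents of bounded length."""
    if max_per_doc < 1:
        raise ValueError("max_per_doc must be at least 1")
    return [
        {"part": part, "recipesKnown": recipes[start:start + max_per_doc]}
        for part, start in enumerate(range(0, len(recipes), max_per_doc))
    ]


def set_image_url(parts: List[PartDoc], rid: str, image_url: str) -> Optional[int]:
    """Change one entry and return the index of the part that must be rewritten.

    Returns None when no part contains `rid`.
    """
    for doc in parts:
        for entry in doc["recipesKnown"]:
            if entry["rid"] == rid:
                entry["imageURL"] = image_url
                return doc["part"]
    return None


recipes = [
    {"rid": f"r{i}", "imageURL": "image1.com", "timeAdded": 8371201332 + i}
    for i in range(1000)
]
parts = split_known_recipes(recipes, max_per_doc=400)

reads_for_full_list = len(parts)  # one read per part document
rewritten_part = set_image_url(parts, "r950", "image2.com")

print(reads_for_full_list, rewritten_part)
```

Loading the full list costs one read per part instead of one per recipe, and an update rewrites a single part rather than the whole list, at the cost of extra bookkeeping compared with a plain sub-collection.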
I'm researching the same question...
One of the questions is whether the data held in the document will ever go past 1 MB, which is the limit for a document. Researching a bit on how much plain text fits in 1 MB: it's a hell of a lot. Still, if the data were to grow far beyond that, it would break in the end. So if you are thinking really big, go with sub-collections.
If we had to use the Firebase element logic the answer would be sub-collections.
Still, I guess the major point is the data pulled. If you read the user document, you will directly be pulling out that MB of data. With a sub-collection it won't load and, even if you do load it, you can still lazy-load.
I guess for the kind of setup you are describing, the answer is sub-collections.
key is an additional collection's con/pro
key could help to avoid duplicates; but this requires thinking of what is duplicate's definition (which might change);
array's no-key behavior could be emulated via auto-id.
p.s. @Thomas's list of pros/cons in the question has been quite helpful.

Is it ok to store a user id as the key of a field in a Firestore document?

Firestore charges for the amount of indexes used. If I have a structure where there is a massive list of ratings that different users gave, with the user ID as the key and the rating as the value, will that take up too many auto-created indexes? Is there a good structure around this?
For example, in the collection 'ratings', I shard the individual ratings that each user gives into different documents using a complex sharding mechanism I made that fills a document up to the maximum of around 20k fields, then starts filling up another document. Say I have 5 documents, each filled with 20k fields. One of those docs would look like this:
uid1: 3.3
uid2: 5
uid3: 1.234
...
Is there another structure I should be using to store loads of individual 'fields' in Firestore? I don't want to use loads of documents for each rating either as that is too expensive. Arrays aren't big enough to store loads of ratings either.
Arrays aren't big enough to store loads of ratings either
The problem isn't about the arrays, the problem is that the documents have limits. So there are some limits when it comes to how much data you can put into a document. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB of data in total in a single document. When we are talking about storing text, you can store quite a lot, but as your array gets bigger, be careful about this limitation.
According to the official documentation regarding modelling data in Cloud Firestore:
Cloud Firestore is optimized for storing large collections of small documents.
So trying to shard a collection by filling up documents one by one, is not such a good idea.
If you are trying to add ratings from multiple users to a single document, in other words if you are trying to store a large amount of data in a single document that can be updated by lots of users, there is another limitation that you need to take care of: you are limited to 1 write per second on every document. So if you have a situation in which a lot of users are all trying to write data to the same document at once, you might start to see some of these writes fail. So be careful about this limitation too.
My recommendation is to store those ratings in an array if you think that the size of the document will stay within the 1 MiB limitation; otherwise, use a collection of ratings for each object separately.
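To judge whether a set of ratings is likely to stay within the 1 MiB limit, a rough size estimate can help. The sketch below is an approximation under stated assumptions: the per-type byte counts follow my reading of Firestore's storage size rules (UTF-8 length plus 1 for strings and field names, 8 bytes for numbers, 1 byte for booleans and null), it ignores the document name and the fixed per-document overhead, and `estimated_size` is a hypothetical helper, not an SDK function.

```python
from typing import Any

MAX_DOCUMENT_BYTES = 1_048_576  # 1 MiB, the limit quoted above


def estimated_size(value: Any) -> int:
    """Approximate stored size in bytes of a field value (assumed rules)."""
    if value is None or isinstance(value, bool):
        return 1
    if isinstance(value, (int, float)):
        return 8
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 1
    if isinstance(value, list):
        return sum(estimated_size(item) for item in value)
    if isinstance(value, dict):
        # Each field costs its name (UTF-8 bytes + 1) plus its value.
        return sum(
            len(name.encode("utf-8")) + 1 + estimated_size(item)
            for name, item in value.items()
        )
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def fits_in_one_document(fields: dict) -> bool:
    """True if the estimate stays under the 1 MiB document limit."""
    return estimated_size(fields) <= MAX_DOCUMENT_BYTES


# Hypothetical ratings keyed by 28-character user IDs.
ratings_20k = {f"{i:028d}": 3.3 for i in range(20_000)}
ratings_30k = {f"{i:028d}": 3.3 for i in range(30_000)}

print(fits_in_one_document(ratings_20k), fits_in_one_document(ratings_30k))
```

With these assumptions each rating costs about 37 bytes, so a single document runs out of room somewhere between 20,000 and 30,000 ratings, which is the point where a collection becomes the safer choice.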

Work around firestore document size limit?

I need to store a large number of fields, like for a star rating system, but firestore only allows 20,000 fields per document. Is there a known way around this? Right now I am going to 'shard' the fields in multiple documents, and keep the size of each document in a documentSizeTracker document that I use to determine which document to shard to (and add to the counter with a transaction). Is this the correct approach? Any problems with this?
Sharding certainly could work. It's hard to say without knowing exactly what kind of data you'll need from your document, and when, but that's certainly a reasonable option. You could also consider having a parent "summary" doc that contains fields you might want to search on and then split all of your data into several documents inside a subcollection of that parent.
One important nuance here: the limit isn't 20,000 fields, but 20,000 indexed fields. So if you're storing a bunch of data inside your document, but you know that you're not going to be searching on all of them, another alternative is to mark some of your fields as unindexed (which you can now do in the Firebase console in the "Exemptions" section).
If you're dealing with thousands of fields, though, you probably won't want to exempt them all one at a time, so a better alternative might be to place your data as a map inside a container field (named something like "allOfMyData"), then just mark that one field as unindexed. That will automatically remove all indexes from any fields contained inside that map.
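The two ideas above (several shard documents, each holding its data inside one container field) can be combined in a short sketch. Note the swapped-in technique: instead of the size-tracker document described in the question, this picks a shard by hashing the user ID, which is my own simplification and not something the answer proposes. `shard_for`, `build_shards`, the `shardN` IDs and the shard count of 5 are all hypothetical; only the field name `allOfMyData` comes from the answer.

```python
import hashlib
from typing import Dict

ShardDoc = Dict[str, Dict[str, float]]


def shard_for(uid: str, shard_count: int) -> int:
    """Pick a shard deterministically from the user ID (no counter document)."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(ratings: Dict[str, float], shard_count: int) -> Dict[str, ShardDoc]:
    """Group ratings into shard documents, each with one container field.

    Every shard document has a single top-level map, `allOfMyData`, so one
    index exemption on that field would cover all ratings inside it.
    """
    shards: Dict[str, ShardDoc] = {
        f"shard{n}": {"allOfMyData": {}} for n in range(shard_count)
    }
    for uid, rating in ratings.items():
        shards[f"shard{shard_for(uid, shard_count)}"]["allOfMyData"][uid] = rating
    return shards


ratings = {f"uid{i}": float(i % 5) + 1.0 for i in range(1000)}
shards = build_shards(ratings, shard_count=5)

total = sum(len(doc["allOfMyData"]) for doc in shards.values())
print(len(shards), total)
```

One caveat with hashing: changing `shard_count` later moves most user IDs to a different shard, so the count should be chosen with growth in mind or paired with a migration step.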
Actually, I ran into similar problem with the read and write issues with Firebase. So, here is my conclusion:
# if something small needs to be written & read very often, then use Firebase Realtime Database
Firebase Realtime Database allows fast writes, but limits concurrent users to 100,000
Firebase Firestore allows a maximum of 1 write per second per document
It's very expensive to read a document that only contains a rating for example in Firestore
# if something (larger) needs to be read very often with writes usually more than 1 second in between then use Firestore
Firestore allows up to 1,000,000 concurrent users at current Beta release (they might make it more)
It's cheaper to read a large document (under the 1 MiB limit) in Firestore than in Firebase Realtime Database
# If your model doesn't fit into these two choices, then you should modify your model and split them into 2 models:
1 very small model to store in Firebase Realtime Database (ratings for example)
1 larger model to store in Firestore
Note: You could use both Firebase Realtime Database and Firebase Firestore in the same project. Don't forget to take into account the billing differences between the two databases and their different limits. I believe it's best to combine them and use the good side of each instead of trying to force solutions into one of them.
Note 2: I really didn't like the sharding idea suggested as a solution and workaround for Firestore.

Resources