Context:
I am creating a kind-of-wiki page in Angular. The wiki page would probably not get bigger than 5000 articles in total.
I want the most efficient approach in terms of page load, but I think I am too new to this to foresee the consequences of choosing one option over the other. Of course, I would also like to follow conventions.
Problem:
I have a collection of articles in firestore which I want to categorize. An article should belong to one category. A category can belong to one category (as a sub category).
Now, which data model is preferred, and why?
Every article having an attribute (of reference datatype) referring to the category document?
The category documents having an array of references to the articles?
Firestore-root
|
--- categories (collection)
|
--- categoryId (document)
|
--- name: "Science" //Simple property
|
--- articles ['articleId1', 'articleId2', etc.. (long array)]
Something completely different?
Every article having an attribute (of reference datatype) referring to the category document?
This structure will help you only if an article can belong to a single category.
Firestore-root
|
--- articles (collection)
|
--- articleId (document)
|
--- category: "Science" //Simple property
There is no need to use a reference to a category document. Besides that, to filter your articles, you can simply use a where-equals call.
The category documents having an array of references to the articles?
This structure will help you if an article can belong to one or more categories.
Firestore-root
|
--- articles (collection)
|
--- articleId (document)
|
--- categories: ["Science", "Economy"] //Property of type array
Now to filter your articles, you can simply use a where array-contains call.
Something completely different?
Both solutions are widely used when structuring a Cloud Firestore database, but you should choose which one is more appropriate to the use-case of your app.
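To make the difference between the two article models concrete, here is a minimal in-memory sketch. It does not call the Firestore SDK; the interfaces and function names are made up for illustration, and the two functions only mimic what an equality filter and an array-contains filter would return.

```typescript
// In-memory stand-ins for article documents; all names are illustrative.
interface SingleCategoryArticle {
  id: string;
  category: string; // exactly one category per article
}

interface MultiCategoryArticle {
  id: string;
  categories: string[]; // one or more categories per article
}

// Mimics an equality filter on the "category" field.
function whereCategoryEquals(
  articles: SingleCategoryArticle[],
  category: string
): SingleCategoryArticle[] {
  return articles.filter((article) => article.category === category);
}

// Mimics an array-contains filter on the "categories" field.
function whereCategoriesContain(
  articles: MultiCategoryArticle[],
  category: string
): MultiCategoryArticle[] {
  return articles.filter(
    (article) => article.categories.indexOf(category) !== -1
  );
}

const singleCategoryArticles: SingleCategoryArticle[] = [
  { id: "a1", category: "Science" },
  { id: "a2", category: "Economy" },
];

const multiCategoryArticles: MultiCategoryArticle[] = [
  { id: "a1", categories: ["Science", "Economy"] },
  { id: "a2", categories: ["Economy"] },
];
```

With this sample data, filtering the first list by "Science" matches only a1, while filtering the second list by "Economy" matches both articles.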
Related
I am trying to learn about firestore and its data structure.
I have read that queries are shallow, so if I run a query I don't get the data from the subcollection.
I have a shopping application with a shoppingCart collection. Each shoppingCart has a name field and a lists collection inside it, and those lists have products with a name.
If I want to get just the shoppingCart name in one query and the lists inside this shoppingCart in another query, should I structure my data as:
shoppingCart
name
Lists
listId1:
Products:
productId1:
name
and then do one query to get the shoppingCart collection and another query to get the lists subcollection?
Or should I duplicate data and do a collection with just the field:
shoppingCart:
name
and then another shoppingCart with the subcollection lists
shoppingCart:
Lists
listId1:
Products:
productId1:
name
I know it depends a lot on the application, but I am a bit confused: I read the documentation and it doesn't say anything against this, yet the Realtime Database documentation says to avoid nested lists. I know they are not the same, but after reading this I don't know whether nested subcollections are a good practice.
Thank you in advance.
When it comes to storing shopping cart products in Firestore, there are three ways in which you can do this.
You can create a shopping cart document that holds product IDs or references that point to product documents. In this case, to display the content of the shopping cart, you have to make separate calls to get the product details. Since a shopping cart most likely belongs to a user, the schema should look like this:
Firestore-root
|
--- users (collection)
|
--- $uid (document)
|
--- shoppingCart (collection)
|
--- content (document)
|
--- ["productId", "productId"] (array)
The benefit is that if you're using a real-time listener, if an admin changes a price or if a product becomes unavailable, you'll be notified instantly.
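As a rough illustration of this first option, the sketch below keeps only product IDs in the cart and resolves them against a product catalog at display time. It is an in-memory stand-in, not SDK code; Product, CartContent, resolveCartItems and cartTotal are hypothetical names.

```typescript
// Hypothetical shapes; field names are illustrative, not taken from a real schema.
interface Product {
  id: string;
  name: string;
  price: number;
  available: boolean;
}

interface CartContent {
  productIds: string[]; // the cart stores only IDs, never product data
}

// Stand-in for "one separate call per product": each ID is looked up in the
// catalog, and IDs that no longer resolve to a product are skipped.
function resolveCartItems(
  cart: CartContent,
  catalog: Record<string, Product>
): Product[] {
  const items: Product[] = [];
  for (const id of cart.productIds) {
    const product: Product | undefined = catalog[id];
    if (product !== undefined) {
      items.push(product);
    }
  }
  return items;
}

// Total of the items that are still available, at their current prices.
function cartTotal(items: Product[]): number {
  let total = 0;
  for (const item of items) {
    if (item.available) {
      total += item.price;
    }
  }
  return total;
}
```

Because the cart never copies product fields, a price change in the catalog shows up the next time the cart is resolved, which is the trade-off against the extra lookups.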
The second option would be to create a shopping cart collection where you can store as documents all items in the cart. This means that when a user clicks the "add to cart" button, then you should copy the product object inside this collection:
Firestore-root
|
--- users (collection)
|
--- $uid (document)
|
--- shoppingCart (collection)
|
--- $productId (document)
|
--- //product fields.
The third option would be to create a single document that can hold all items, as long as the size of the document doesn't exceed 1 MiB, which is the maximum limit:
Firestore-root
|
--- users (collection)
|
--- $uid (document)
|
--- shoppingCart (collection)
|
--- content (document)
|
--- items (array)
|
--- 0
|
--- //product fields.
If you're interested, I recently started a series of articles called:
How to create an Android app using Firebase?
Here is the part responsible for the Firestore schema.
Edit:
In my case the shoppingCart is shared between multiple users.
In that case, it doesn't make sense to add the shopping cart under the User document. It can be a top-level collection or a stand-alone document in a collection of your choice.
so if I add a product to the cart I want the rest of the users to be notified instantly.
In that case, you should create an array of UIDs inside the shopping cart document. In this way, you'll know which users to send a notification to.
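A minimal sketch of that idea, assuming the shared cart document carries a memberUids array (a hypothetical field name). It only works out who should be notified; how the notification is actually delivered is left out.

```typescript
// Hypothetical shape of a shared cart document.
interface SharedCart {
  id: string;
  memberUids: string[]; // UIDs of every user who shares this cart
}

// Everyone who shares the cart except the user who made the change.
function usersToNotify(cart: SharedCart, actingUid: string): string[] {
  return cart.memberUids.filter((uid) => uid !== actingUid);
}

// Returns a copy of the cart with the UID added, avoiding duplicates.
function addMember(cart: SharedCart, uid: string): SharedCart {
  if (cart.memberUids.indexOf(uid) !== -1) {
    return cart;
  }
  return { id: cart.id, memberUids: cart.memberUids.concat([uid]) };
}
```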
can I do that like in the second option, making that if a new product is added or deleted I notify the users, or should I do it like in the first option?
It's up to you to decide which option to choose. If one of the above solutions solves the problem, then you can go ahead with it.
if I want to notify the users in real-time, should I duplicate documents like in Realtime Database or should I not duplicate documents?
You should duplicate the data only when needed. That's why I provided three solutions so you can choose which one fits better.
Coming from an RDBMS background, I am trying to understand NoSQL databases and am planning a simple project that includes topics, posts & comments.
Topics have posts & posts have comments
I have found the following guide that suggests using the following top-level collections:
1. A users collection
2. A posts collection
3. A user-posts collection
4. A posts-comments collection
https://firebaseopensource.com/projects/firebase/quickstart-android/database/readme/
I fail to understand the benefits of (3) above, as surely we can simply filter (2) by user; even (3) would still need to be filtered.
What is the logic of having comments as a top-level collection as opposed to having comments as a subcollection under posts? Is this not the better way to store hierarchical data?
In the NoSQL world, we are structuring a database according to the queries that we want to perform.
What is the logic of having comments as a top-level collection as opposed to having comments as a subcollection under posts?
None is better than the other. However, there are some differences:
What are the benefits of using a root collection in Firestore vs. a subcollection?
Is this not the better way to store hierarchical data?
There is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. We always choose to create a structure for our database that satisfies our queries. So in your case, I would create a schema that looks like this:
Firestore-root
|
--- users (collection)
| |
| --- $uid (document)
| |
| --- //user fields.
|
--- posts (collection)
|
--- $postId (document)
|
--- uid: "veryLongUid"
|
--- //post fields.
|
--- comments (sub-collection)
|
--- $commentId (document)
|
--- uid: "veryLongUid"
|
--- //comment fields.
Using this schema you can:
Get all users.
Get all posts in the database.
Get all posts that correspond to only a particular user.
Get all comments of all posts, of all users in the database. Requires a collection group query.
Get all comments of all posts that correspond to a particular user. Requires a collection group query.
Get all comments of all users that correspond to a particular post.
Get all comments of a particular user that correspond to a particular post.
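The queries listed above can be sketched in memory like this. It is an illustration under stated assumptions, not SDK code: PostDoc and CommentDoc are hypothetical types, and the postId field stands in for the parent post that Firestore would encode in the path posts/$postId/comments/$commentId.

```typescript
// Hypothetical in-memory shapes for the schema above.
interface PostDoc {
  id: string;
  uid: string; // author of the post
}

interface CommentDoc {
  id: string;
  postId: string; // in Firestore this is implied by the document path
  uid: string; // author of the comment
}

// All posts that correspond to a particular user.
function postsOfUser(posts: PostDoc[], uid: string): PostDoc[] {
  return posts.filter((post) => post.uid === uid);
}

// All comments of all users on a particular post (one subcollection).
function commentsOfPost(comments: CommentDoc[], postId: string): CommentDoc[] {
  return comments.filter((comment) => comment.postId === postId);
}

// All comments of a particular user on a particular post.
function commentsOfUserOnPost(
  comments: CommentDoc[],
  postId: string,
  uid: string
): CommentDoc[] {
  return commentsOfPost(comments, postId).filter(
    (comment) => comment.uid === uid
  );
}

// All comments of a particular user across every post; in Firestore this
// is the case that needs a collection group query.
function commentsOfUser(comments: CommentDoc[], uid: string): CommentDoc[] {
  return comments.filter((comment) => comment.uid === uid);
}
```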
Am I missing something?
If you think that all the comments of a post might fit within the 1 MiB maximum document size, then you should consider adding all comments into an array. If not, I highly recommend you read the following approach:
How to reduce Firestore costs?
Where I have explained how we can store up to billions of comments and replies in Firestore.
I have a subcollection for each doc in the users collection of my app. This subcollection stores docs that are related to the user, however they could just as well be saved to a master collection, each doc with an associated userId.
I chose this structure as it seemed the most obvious at the time but I can imagine it will make things harder down the road if I need to do database maintenance. E.g. If I wanted to clean up those docs, I would have to query each user and then each users docs, whereas if I had a master collection I could just query all docs.
That led me to question what the point of subcollections is at all, if you can just associate those docs with an ID. Is it solely there so that you can expand if your doc becomes close to the 1MB limit?
Edit: October, 29th 2021:
To be clear about the following sentence that exists in the docs:
If you don't query based on the field with sequential values.
A timestamp cannot be considered consecutive. However, it can still be considered sequential. The same rules apply to alphabetical values (Customer1, Customer2, Customer3, ...), or to pretty much anything that can be treated as a predictably generated value.
Such sequential data in the Firestore indexes is most likely written in physical proximity on the storage media, hence that limitation.
That being said, please note that Firestore uses a mechanism to map documents to their corresponding locations. This means that if the values are not randomly distributed, the write operations will not be distributed evenly over the locations. That's the reason why that limitation exists.
Also note that there is a physical limit on how much data you can write to such a location in a specific amount of time. Predictable keys/values will most likely end up in the same location, which is actually bad, so there are more chances of reaching the limit.
Edit: July, 16th 2021:
Since this answer sounds a little old, I will try to add a few more advantages of using subcollections that I found over time:
Subcollections will always give you a more structured database schema, as you can always refer to a subcollection that is related only to a specific document. So you can nest only data that is related to a particular document.
As mentioned before, the maximum depth of a subcollection is 100. An important feature here is that a Firestore Query is as fast at level 1 as it is at level 100, so there should be no concerns regarding depth. This feature is tested.
Queries in subcollections are indexed by default, as in the case of top-level collections.
In terms of speed, it doesn't really matter if you Query a top-level collection, a subcollection, or a collection group, the speed will always be the same, as long as the Query returns the same number of documents. This is happening because the Query performance depends on the number of documents you request and not on the number of documents you search. So querying a subcollection has the same effect as querying a top-level collection, no downsides at all.
When storing documents in a subcollection, please note that there is no need to store the ID of the parent document as a field, as it is by default part of the reference. This means that you can store less data in the documents that exist in the subcollection. More importantly, if you had saved the same data in a top-level collection and needed to create a Query with two whereEqualTo() calls + an orderBy() call, then an index would be required.
In terms of security, subcollections allow inheritance of security rules, which is useful because we can write less and less code to secure the database.
That's all for the moment; if I find other benefits, I'll update the answer.
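To illustrate the point about not storing the parent ID as a field: a Firestore document path alternates collection and document segments, so the parent document ID can be read from the path itself. The helper below is a hypothetical sketch that works on a plain path string and is not part of any SDK.

```typescript
// Returns the ID of the document that owns the subcollection a document
// lives in, or null when the path is not a subcollection document path.
// Example path shape: questions/{questionId}/tags/{tagId}
function parentDocumentId(documentPath: string): string | null {
  const segments = documentPath.split("/");
  // Document paths have an even number of segments
  // (collection/document pairs); a subcollection document has at least 4.
  if (segments.length < 4 || segments.length % 2 !== 0) {
    return null;
  }
  // Segments from the end: [..., parentDocId, subcollection, docId]
  return segments[segments.length - 3];
}
```

For a tag stored at questions/LongQuestionIdOne/tags/tagIdOne, the helper returns LongQuestionIdOne, so the tag document does not need its own questionId field.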
Let's take an example for that. Let's assume we have a database schema for a quiz app that looks like this:
Firestore-root
|
--- questions (collections)
|
--- questionId (document)
|
--- questionId: "LongQuestionIdOne"
|
--- title: "Question Title"
|
--- tags (collections)
|
--- tagIdOne (document)
| |
| --- tagId: "yR8iLzdBdylFkSzg1k4K"
| |
| --- tagName: "History"
| |
| --- //Other tag properties
|
--- tagIdTwo (document)
|
--- tagId: "tUjKPoq2dylFkSzg9cFg"
|
--- tagName: "Geography"
|
--- //Other tag properties
Here, tags is a subcollection within the questionId document. Now let's create the tags collection as a top-level collection like this:
Firestore-root
|
--- questions (collections)
| |
| --- questionId (document)
| |
| --- questionId: "LongQuestionIdOne"
| |
| --- title: "Question Title"
|
--- tags (collections)
|
--- tagIdOne (document)
| |
| --- tagId: "yR8iLzdBdylFkSzg1k4K"
| |
| --- tagName: "History"
| |
| --- questionId: "LongQuestionIdOne"
| |
| --- //Other tag properties
|
--- tagIdTwo (document)
|
--- tagId: "tUjKPoq2dylFkSzg9cFg"
|
--- tagName: "Geography"
|
--- questionId: "LongQuestionIdTwo"
|
--- //Other tag properties
The differences between these two approaches are:
If you want to query the database to get all tags of a particular question, using the first schema it's very easy because only a CollectionReference is needed (questions -> questionId -> tags). To achieve the same thing using the second schema, instead of a CollectionReference, a Query is needed, which means that you need to query the entire tags collection to get only the tags that correspond to a single question.
Using the first schema, everything is more organised. Besides that, Firestore allows subcollections to be nested up to 100 levels deep, so you can take advantage of that.
As #RenaudTarnec also mentioned in his comment, queries in Cloud Firestore are shallow: they only get documents from the collection that the query is run against. There is no way to get documents from a top-level collection and other collections or subcollections in a single query. Firestore doesn't support queries across different collections in one go. A single query may only use properties of documents in a single collection. So there is no way you can get all the tags of all the questions using the first schema.
This technique is called database flattening and is quite a common practice when it comes to Firebase. Still, use it only if it is needed. So in your case, if you only need to display the tags of a single question, use the first schema. If you want to display all the tags of all questions, the second schema is recommended.
Is it solely there so that you can expand if your doc becomes close to the 1MB limit?
If you have a subcollection of objects within a document, please note that the size of the subcollection does not count toward that 1 MiB limit. Only the data that is stored in the properties of the document is counted.
Edit Oct 01 2019:
According to #ShahoodulHassan's comment:
So there is no way you can get all the tags of all the questions using the first schema?
Actually, now there is: we can get all tags of all questions with the use of a Firestore collection group query. One thing to note is that all the subcollections must have the same name, for instance tags.
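Conceptually, a collection group query selects every document whose immediate parent collection has the given name, wherever it sits in the hierarchy. The sketch below mimics that selection over plain path strings; StoredDoc and collectionGroup are hypothetical names for this illustration and are not the SDK call.

```typescript
// Hypothetical in-memory document: a full path plus some data.
interface StoredDoc {
  path: string; // e.g. "questions/q1/tags/t1"
  data: Record<string, string>;
}

// Keeps every document whose immediate parent collection is `collectionId`,
// regardless of how deeply that collection is nested.
function collectionGroup(docs: StoredDoc[], collectionId: string): StoredDoc[] {
  return docs.filter((doc) => {
    const segments = doc.path.split("/");
    const isDocumentPath = segments.length >= 2 && segments.length % 2 === 0;
    return isDocumentPath && segments[segments.length - 2] === collectionId;
  });
}

const sampleDocs: StoredDoc[] = [
  { path: "questions/q1", data: { title: "Question Title" } },
  { path: "questions/q1/tags/t1", data: { tagName: "History" } },
  { path: "questions/q2/tags/t2", data: { tagName: "Geography" } },
  { path: "tags/t3", data: { tagName: "Science" } },
];
```

With this sample data, asking for the group tags returns t1, t2 and also the top-level t3, since every collection with that name is part of the group. That is why the naming of subcollections matters.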
The single biggest advantage of sub-collections that I've found is that they have their own rate limit for writes because each sub-collection has its own index (assuming you don't have a collection group index). This probably isn't a concern for small applications but for medium/large scale apps it could be very important.
Imagine a chat application where each chat has a series of messages. You'll want to index messages by timestamp to show them in chronological order. The Firestore write limit for sequential values is 500/second, which is definitely within reach of a medium-sized app (especially if you consider the possibility of a rogue user scripting messages -- which is not currently easy to prevent with Security Rules)
// root collection
/messages {
chatId: string
timeSent: timestamp // the entire app would be limited to 500/second
}
// sub-collection
/chat/{chatId}/messages {
timeSent: timestamp // each chat could safely write up to 500/second
}
Surprised this hasn't been mentioned before, but sub-collections can (in some cases) help bypass the orderBy limitations:
You can't order your query by a field included in an equality (==) or in clause.
Suppose you want to get a user's 10 most recent logins:
Top-Level:
//Adding .orderBy('unixCreated', 'desc') after .where('==') here would need a composite index
USER_LOGINS.where('userId', '==', uid).limit(10)
Sub-Collection:
//With a subcollection we can order and limit properly
USERS.doc(uid).collection('LOGINS').orderBy('unixCreated', 'desc').limit(10);
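For reference, this is the result the subcollection query is expected to produce, written as a plain in-memory sketch. LoginDoc and mostRecentLogins are hypothetical names; only the unixCreated field comes from the snippet above.

```typescript
// Hypothetical login document; unixCreated is a Unix timestamp.
interface LoginDoc {
  unixCreated: number;
}

// Equivalent of orderBy('unixCreated', 'desc').limit(n) on one user's
// logins. The input array is copied first so it is not reordered.
function mostRecentLogins(logins: LoginDoc[], limit: number): LoginDoc[] {
  return logins
    .slice()
    .sort((a, b) => b.unixCreated - a.unixCreated)
    .slice(0, limit);
}
```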
Subcollections are also helpful in setting up security rules. Suppose you are building a chat app and have a user collection with a replies subcollection. You want other users to be able to add to the replies collection but want to give the user full rights to the user collection. If you have replies as an array of maps/objects in user collection, it severely limits the rules you can write against the user collection for the collection owner and other users to be able to add to the collection. Whereas, having it as its own subcollection makes writing security rules waaaaay easier.
I wanted to ask for advice on data structuring best practices for Cloud Firestore for the following scenario.
There's a booking/appointment app. Hotels rent out rooms. Each hotel has multiple rooms. Clients can search the rooms of all hotels by availability on specific days.
What is the best way to structure the availability data in Firestore so I could create a view of all available rooms throughout all hotels.
I thought of creating a separate collection where I would put all the reservations, each referencing the room ID and the date of the reservation. However, it seems like I won't be able to search for available slots this way since Firestore can't perform 'not equals' queries.
So I thought I would create an array field for each room containing all the available dates as timestamps. This creates another problem: even though I can use an 'array_contains' query, users can't check availability for more than one day this way, since 'array_contains' can only be used once per query.
What would be the most efficient way to structure the data in this case?
Thank you!
What is the best way to structure the availability data in Firestore so I could create a view of all available rooms throughout all hotels.
A possible database structure that can help you achieve what you want, might be this:
Firestore-root
|
--- hotels (collection)
| |
| --- hotelId (document)
| |
| --- //Hotel properties
|
|
--- rooms (collection)
| |
| --- hotelId (document)
| |
| --- hotelRooms (collection)
| |
| --- roomId (document)
| |
| --- available: true
| |
| --- hotel: "hotelId"
| |
| --- //Other room properties
|
|
--- availabeRooms (collection)
|
--- roomId (document)
|
--- available: true
|
--- hotel: "hotelId"
|
--- //Other room properties
As you can probably see, I have duplicated some data in order to achieve what you want. This practice is called denormalization and is a common practice when it comes to Firebase. For a better understanding, I recommend you watch this video, Denormalization is normal with the Firebase Database. It's for the Firebase Realtime Database, but the same principles apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing that you need to keep in mind. In the same way you are adding data, you need to maintain it. In other words, if you want to update/delete an item, you need to do it in every place that it exists.
Using this database schema, you can simply query the database to get all available rooms from all hotels by attaching a listener on the availabeRooms reference and getting all room objects. If you want to get the details of the hotel that a particular room is a part of, you need to make an extra call to get the hotel details. Within the room object I have stored only a reference to the hotel object, which is, as you can see, the hotelId. You can also store the entire hotel object, but before making a decision, I recommend you be aware of some details that can be found in my answer from this post.
Furthermore, if a room becomes unavailable, simply change the value of the available property that exists under rooms -> hotelId -> hotelRooms -> roomId to false and remove the corresponding room from the availabeRooms collection. That's it!
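The maintenance step described above can be sketched as a single function that touches both places where a room is stored. This is an in-memory illustration, not SDK code: HotelStore and its two fields are hypothetical stand-ins for the nested rooms collection and the duplicated collection of available rooms.

```typescript
interface Room {
  id: string;
  hotel: string; // the hotelId this room belongs to
  available: boolean;
}

// Stand-in for the two places where a room lives in the schema above.
interface HotelStore {
  rooms: Record<string, Record<string, Room>>; // hotelId -> roomId -> room
  availableRooms: Record<string, Room>; // duplicated copy, keyed by roomId
}

// Updates the room under its hotel and keeps the duplicated copy in sync.
function setRoomAvailability(
  store: HotelStore,
  hotelId: string,
  roomId: string,
  available: boolean
): void {
  const hotelRooms = store.rooms[hotelId];
  const room: Room | undefined = hotelRooms ? hotelRooms[roomId] : undefined;
  if (room === undefined) {
    throw new Error("Unknown room: " + hotelId + "/" + roomId);
  }
  room.available = available;
  if (available) {
    store.availableRooms[roomId] = { id: room.id, hotel: room.hotel, available: true };
  } else {
    delete store.availableRooms[roomId];
  }
}

// True when the room is marked available under its hotel.
function isRoomAvailable(store: HotelStore, hotelId: string, roomId: string): boolean {
  return store.rooms[hotelId][roomId].available;
}

// True when the duplicated collection currently lists the room.
function isListedAsAvailable(store: HotelStore, roomId: string): boolean {
  return Object.prototype.hasOwnProperty.call(store.availableRooms, roomId);
}
```

In a real app the two writes would ideally be committed together so that the two copies cannot drift apart; the sketch only shows which places need to change.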
P.S. If you want to get all the available rooms within a single hotel, just attach a listener on rooms -> hotelId -> hotelRooms and get all available rooms using a query that should look like this:
Query query = db.collection("rooms").document(hotelId)
.collection("hotelRooms").whereEqualTo("available", true);
Edit:
According to your comment regarding the date of the reservation, you should create a calendar of reservations for each room separately. Then simply create a Cloud Function that is triggered by a cron job. This function can check the availability of each room daily. If the room is available, set the available property to true; otherwise, set it to false and remove the room from the availabeRooms collection.
There is a firestore collection that stores recipes with a list of ingredients. We need to find recipes that contain at least one of the ingredients. How to implement it? This is possible with the firestore?
--- recipe1
|
--- ingredients: ["salt", "pepper", "sugar"]
--- recipe2
|
--- ingredients: ["pepper"]
--- recipe3
|
--- ingredients: ["salt", "pepper"]
How to choose a recipe in which there is either "pepper" OR "salt"?
This is possible with the firestore?
Yes it is.
We need to find recipes that contain at least one of the ingredients. How to implement it?
In order to implement this feature, you should use arrays. Please see below a database schema, that can help you achieve this:
Firestore-root
|
--- recipes (collection)
|
--- recipeId (document)
|
--- ingredients: ["salt", "pepper"] (array)
|
--- //other recipe details
In order to find all recipes that contain one of the ingredients, you should use a query that looks like this:
FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
Query query = rootRef.collection("recipes").whereArrayContains("ingredients", "salt");
This is for Android but in the same way you can achieve this for other programming languages. See here more details.
But be aware of one thing: you cannot find all recipes that contain more than one specific ingredient without making significant changes to your database. Firestore does not allow chained whereArrayContains() calls, so a single query can only match on one ingredient.
Edit: According to your comment:
How to choose a recipe in which there is either "pepper" OR "salt"?
You need to know that there is no OR clause in Firestore. According to my answer from this post, you should create two separate queries and merge the results client side.
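The "two queries, merged client side" approach can be sketched as follows. The two filter calls stand in for two separate whereArrayContains() queries, and mergeById removes the recipes that both queries return. This runs in memory only; Recipe, whereIngredientsContain and mergeById are hypothetical names.

```typescript
interface Recipe {
  id: string;
  ingredients: string[];
}

// Stand-in for one array-contains query against the recipes collection.
function whereIngredientsContain(recipes: Recipe[], ingredient: string): Recipe[] {
  return recipes.filter((recipe) => recipe.ingredients.indexOf(ingredient) !== -1);
}

// Merges two result lists, keeping the first occurrence of each recipe ID.
function mergeById(first: Recipe[], second: Recipe[]): Recipe[] {
  const seen: Record<string, boolean> = {};
  const merged: Recipe[] = [];
  for (const recipe of first.concat(second)) {
    if (!Object.prototype.hasOwnProperty.call(seen, recipe.id)) {
      seen[recipe.id] = true;
      merged.push(recipe);
    }
  }
  return merged;
}

const allRecipes: Recipe[] = [
  { id: "recipe1", ingredients: ["salt", "pepper", "sugar"] },
  { id: "recipe2", ingredients: ["pepper"] },
  { id: "recipe3", ingredients: ["salt", "pepper"] },
  { id: "recipe4", ingredients: ["sugar"] },
];

// "salt" OR "pepper": run both queries, then merge without duplicates.
const saltOrPepper = mergeById(
  whereIngredientsContain(allRecipes, "salt"),
  whereIngredientsContain(allRecipes, "pepper")
);
```

Here saltOrPepper holds recipe1, recipe3 and recipe2, each exactly once, and recipe4 is left out because it contains neither ingredient.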
Edit2:
100,000 recipes to combine on the client.
If this is the use-case of your app then you should query this way. Firestore scales massively.
But this is a fairly simple query!
It is in terms of SQL databases but this is how NoSQL databases work.
If you need to exclude recipes with the ingredient "sugar"?
According to the official documentation, besides the fact that there is no OR clause, there is also another query limitation:
Queries with a != clause. In this case, you should split the query into a greater-than query and a less-than query.
So this is the way in which you can solve this exclude or not equal situation.
Your link also does not solve the problem.
My link cannot solve a problem that cannot be solved in the way you want, it just indicates the constraints that Firestore official documentation provides. In that link, I also provide a workaround in the way the docs recommend. In this case, you should make your own attempt given the information in my answer and that post and ask another question if something else comes up.
The result - recipes with salt AND pepper. But not salt OR pepper!
No, if you create two separate queries and merge the results client side, you'll have the desired result. I've tested it and it works fine.
I know that there is no "OR". I therefore ask - is there a solution?
Yes, there is: the solution I have provided above, which is the simplest one. If you are not happy with that, you might also consider changing your entire database structure so it is organized according to a reverse lookup. Your structure should look like this:
Firestore-root
|
--- salt_pepper (collection)
| |
| --- recipeId (document)
| | |
| | --- //recipe with salt
| |
| --- recipeId (document)
| |
| --- //recipe with pepper
|
--- salt_sugar (collection)
|
--- recipeId (document)
|
--- //recipe details
As you can see, this is another schema for your database. The first collection will provide you with all recipes with salt OR pepper.
This practice is called denormalization and is a common practice when it comes to Firebase. For a better understanding, I recommend you watch this video, Denormalization is normal with the Firebase Database. It is for the Firebase Realtime Database, but the same principles apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing that you need to keep in mind. In the same way you are adding data, you need to maintain it. In other words, if you want to update/delete an item, you need to do it in every place that it exists.
I also recommend you take a look at my answer from this post to see the pros and cons of the technique above.
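One possible way to fill such reverse-lookup collections is to precompute, for each ingredient pair you care about, the recipes that contain either ingredient. The sketch below does this in memory; the function name, the RecipeDoc type and the "first_second" naming of the collections are assumptions made for the illustration.

```typescript
interface RecipeDoc {
  id: string;
  ingredients: string[];
}

// Builds one "collection" per ingredient pair, named "<first>_<second>",
// holding every recipe that contains at least one of the two ingredients.
function buildPairCollections(
  recipes: RecipeDoc[],
  pairs: [string, string][]
): Record<string, RecipeDoc[]> {
  const collections: Record<string, RecipeDoc[]> = {};
  for (const pair of pairs) {
    const first = pair[0];
    const second = pair[1];
    collections[first + "_" + second] = recipes.filter(
      (recipe) =>
        recipe.ingredients.indexOf(first) !== -1 ||
        recipe.ingredients.indexOf(second) !== -1
    );
  }
  return collections;
}
```

Reading the salt OR pepper recipes then becomes a plain read of one precomputed collection, at the cost of rewriting the affected collections whenever a recipe changes.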