I am trying to migrate a few plugins from Datastore (DS) to Firestore (FS), and I have a question about the data structure. In Datastore I use ancestors,
so the top-level Kind is Users and every other Kind has a Users entity as its ancestor. E.g. Kind Products has ancestor Key(Users,'UUID').
In Firestore, would the structure look like this:
1. Users (collection):
   {userID: ...
    ...
    and so on},
   {...},
   ... list of users
2. Products (collection)
   .User-1 (doc)
      Subcollection {... list of product docs belonging to User1}
   .User-2 (doc)
      Subcollection {... list of product docs belonging to User2}
with Users and Products both as top-level collections,
or
would this structure be better:
+ Users (collection)
  * user_1 (document)
    - name: "Blah"
    - last: "Blah"
    + Product (subcollection)
      * product_1 (document)
        - title: "blah...."
        - vendor: "blah..."
        + Product_variants (subcollection)
          * product_1 (document)
            - name: "..."
            - price: "..."
          * product_2 (document)
            - name: "..."
            - price: "..."
      * product_2 (document)
        - title: "blah...."
        - vendor: "blah..."
        + Product_variants (subcollection)
          * product_1 (document)
            - name: "..."
            - price: "..."
          * product_2 (document)
            - name: "..."
            - price: "..."
Is there a better way to handle this structure? I am also concerned about updates: which structure would be simpler from that perspective? I am trying to understand the tradeoffs between updates and queries. For example, if I have users with more than 100K products and I receive events on updates, deletes, and so on, is there a downside to that structure?
Update: As of May, 2019, Cloud Firestore now supports collection group queries.
You can now structure the data either way and still be able to query across users.
Original Answer
If I'm understanding correctly, you're asking about the trade-offs of having flat collections vs subcollections.
As far as updates are concerned, there aren't any material differences. One thing to look out for is if you have fields that cluster around a single value. For example, with flat collections if products had an update-time field then by default you'd be limited to 500 updates/second across all users. With products nested within users you're limited to 500 updates/second per user. However, with flattened collections you can work around this by disabling the default single-field index on update-time and creating a composite index on (user, update-time). Once you do that, these are equivalent.
The real difference comes down to which queries are possible. In Firestore as it exists today, you can only query within a subcollection tree. So for example, if you wanted to search for products by a specific title or vendor, you'd only be able to search within a single user.
If you flatten the collections such that products is a top-level collection, you can query across users.
Note that collection group queries are a feature we're developing that will remove this restriction. Once that's launched you'll be able to structure the data either way and still be able to query across users.
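To make the difference in query scope concrete, here is a minimal in-memory sketch. It does not use the Firestore SDK; the DOCS dictionary, the sample product data, and both helper functions are hypothetical stand-ins that only model which document paths each kind of query can see.

```python
# In-memory stand-in for Firestore documents, keyed by their full path.
# This is not the Firestore SDK; it only models which documents each
# kind of query is able to match.
DOCS = {
    "Users/user_1/Product/p1": {"title": "Lamp", "vendor": "Acme"},
    "Users/user_1/Product/p2": {"title": "Desk", "vendor": "Globex"},
    "Users/user_2/Product/p3": {"title": "Chair", "vendor": "Acme"},
}


def query_subcollection(parent_path, collection_id, field, value):
    """Match documents directly under one parent's subcollection."""
    prefix = f"{parent_path}/{collection_id}/"
    return sorted(
        path
        for path, data in DOCS.items()
        if path.startswith(prefix)
        and "/" not in path[len(prefix):]
        and data.get(field) == value
    )


def query_collection_group(collection_id, field, value):
    """Match documents in every collection with this ID, under any parent."""
    return sorted(
        path
        for path, data in DOCS.items()
        if path.split("/")[-2] == collection_id and data.get(field) == value
    )


# Scoped to one user: only that user's products are visible.
one_user = query_subcollection("Users/user_1", "Product", "vendor", "Acme")
# Collection-group style: products of all users are visible.
all_users = query_collection_group("Product", "vendor", "Acme")
```

With nested data, the first call is what a plain subcollection query can reach, while the second models what a collection group query adds.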
Related
I am trying to learn about firestore and its data structure.
I have read that queries are shallow, so if I run a query I don't get the data from the subcollection.
I have a shopping application with a collection shoppingCart. Each shoppingCart has a field name and a nested collection lists, and those lists contain products with a name.
If I want to get just the shoppingCart name in one query and the lists inside this shoppingCart in another query, should I structure my data as:
shoppingCart
  name
  Lists
    listId1:
      Products:
        productId1:
          name
and then run one query to get the shoppingCart collection and another query to get the lists subcollection?
Or should I duplicate data and create a collection with just the field:
shoppingCart:
  name
and then another shoppingCart with the subcollection lists
shoppingCart:
  Lists
    listId1:
      Products:
        productId1:
          name
I know it depends a lot on the application, but I am a bit confused: the documentation doesn't say anything against this, yet the Realtime Database documentation says to avoid nested lists. I know they are not the same, but after reading this I don't know whether nested subcollections are good practice.
Thank you in advance.
When it comes to storing shopping cart products in Firestore, there are three ways in which you can do this.
You can create a shopping cart document that holds product IDs or references that point to product documents. In this case, to display the content of the shopping cart, you have to make separate calls to fetch the product details. Since a shopping cart most likely belongs to a user, the schema should look like this:
Firestore-root
  |
  --- users (collection)
       |
       --- $uid (document)
            |
            --- shoppingCart (collection)
                 |
                 --- content (document)
                      |
                      --- ["productId", "productId"] (array)
The benefit is that if you're using a real-time listener, if an admin changes a price or if a product becomes unavailable, you'll be notified instantly.
The second option would be to create a shopping cart collection where you can store as documents all items in the cart. This means that when a user clicks the "add to cart" button, then you should copy the product object inside this collection:
Firestore-root
  |
  --- users (collection)
       |
       --- $uid (document)
            |
            --- shoppingCart (collection)
                 |
                 --- $productId (document)
                      |
                      --- //product fields.
The third option would be to create a single document that can hold all items, as long as the size of the document doesn't exceed 1 MiB, which is the maximum limit:
Firestore-root
  |
  --- users (collection)
       |
       --- $uid (document)
            |
            --- shoppingCart (collection)
                 |
                 --- content (document)
                      |
                      --- items (array)
                           |
                           --- 0
                               |
                               --- //product fields.
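When choosing between the second and third option, a rough size check can help. The sketch below is only an approximation: it uses the JSON-encoded length as a proxy for the stored document size, which is an assumption and not how Firestore actually measures documents. The function names, the sample carts, and the 80% headroom factor are hypothetical.

```python
import json

MAX_DOC_BYTES = 1_048_576  # 1 MiB, the maximum document size mentioned above


def approx_size_bytes(doc):
    """Rough proxy for document size: length of the compact JSON encoding."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def fits_in_single_document(items, headroom=0.8):
    """Return True if the whole cart stays well below the 1 MiB limit."""
    return approx_size_bytes({"items": items}) <= MAX_DOC_BYTES * headroom


# A typical cart: a handful of small product objects.
small_cart = [
    {"productId": f"p{i}", "name": "Running Shoe", "price": 199}
    for i in range(20)
]
# An extreme cart: many items, each with a longer description.
large_cart = [
    {"productId": f"p{i}", "description": "x" * 100, "price": 199}
    for i in range(20_000)
]

small_fits = fits_in_single_document(small_cart)
large_fits = fits_in_single_document(large_cart)
```

If the estimate comes close to the limit, storing one document per cart item (the second option) is the safer shape.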
If you're interested, I recently started a series of articles called:
How to create an Android app using Firebase?
Here is the part responsible for the Firestore schema.
Edit:
In my case the shoppingCart is shared between multiple users.
In that case, it doesn't make sense to add the shopping cart under the User document. It can be a top-level collection or a stand-alone document in a collection of your choice.
so if I add a product to the cart I want the rest of the users to be notified instantly.
In that case, you should create inside the shopping cart document, an array of UIDs. In this way, you'll be able to know to which users to send a notification.
can I do that like in the second option, making that if a new product is added or deleted I notify the users, or should I do it like in the first option?
It's up to you to decide which option to choose. If one of the above solutions, solves the problem, then you can go ahead with it.
if I want to notify the users in real-time, should I duplicate documents like in Realtime Database or should I not duplicate documents?
You should duplicate the data only when needed. That's why I provided three solutions so you can choose which one fits better.
Coming from an RDBMS background, I am trying to understand NoSQL databases and planning a simple project that includes topics, posts & comments.
Topics have posts & posts have comments
I have found the following guide that suggests using the following top-level collections:
1. A users collection
2. A posts collection
3. A user-posts collection
4. A posts-comments collection
https://firebaseopensource.com/projects/firebase/quickstart-android/database/readme/
I fail to understand the benefits of (3) above as surely we can simply filter (2) based on the user, even 3 would still need to be filtered.
What is the logic of having comments as a top-level collection as opposed to having comments as a subcollection under posts? Is this not the better way to store hierarchical data?
In the NoSQL world, we are structuring a database according to the queries that we want to perform.
What is the logic of having comments as a top-level collection as opposed to having comments as a subcollection under posts?
None is better than the other. However, there are some differences:
What are the benefits of using a root collection in Firestore vs. a subcollection?
Is this not the better way to store hierarchical data?
There is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. We always choose to create a structure for our database that satisfies our queries. So in your case, I would create a schema that looks like this:
Firestore-root
  |
  --- users (collection)
  |    |
  |    --- $uid (document)
  |         |
  |         --- //user fields.
  |
  --- posts (collection)
       |
       --- $postId (document)
            |
            --- uid: "veryLongUid"
            |
            --- //post fields.
            |
            --- comments (sub-collection)
                 |
                 --- $commentId (document)
                      |
                      --- uid: "veryLongUid"
                      |
                      --- //comment fields.
Using this schema you can:
Get all users.
Get all posts in the database.
Get all posts that correspond to only a particular user.
Get all comments of all posts, of all users in the database. Requires a collection group query.
Get all comments of all posts that correspond to a particular user. Requires a collection group query.
Get all comments of all users that correspond to a particular post.
Get all comments of a particular user that correspond to a particular post.
Am I missing something?
If you think that all the comments of a post might fit within the 1 MiB maximum document size, then you should consider adding all comments into an array. If not, I highly recommend you read the following approach:
How to reduce Firestore costs?
There I have explained how we can store up to billions of comments and replies in Firestore.
Viewed the Firestore docs + Google's I/O 2019 webinar, but I'm still not clear about the right data modeling for my particular use case.
App lets pro service providers register and publish one or more of their services in pre-defined categories (Stay, Sports, Wellness...) and at pre-defined price points (50$, 75$, 100$...).
Users on the homepage filter down first with a price point slider (see wireframe), e.g. 199€, then optionally by selecting the category, e.g. all 'Sports' (at 199€), and the location (e.g. all sports at 199€ in the UK). These steps are optional because users can also build their list with a button as soon as the price is selected. The same 'build list' button appears after the category selection and after the location selection. So 3 depths of filtering are possible.
What would be the ideal data structure, given that I want to avoid thousands of reads each time there's filtering.
Three root-level collections (service providers, price points, service categories?) with their relevant documents? I understand and accept denormalization for the purpose of my filtering.
Here's the wireframe for a better understanding of the filtering:
App lets pro service providers register and publish one or more of their services in pre-defined categories (Stay, Sports, Wellness...) and at pre-defined price points (50$, 75$, 100$...).
Since you're having pre-defined categories, prices, and locations, then the simplest solution for modeling such a database would be to have a single collection of products:
Firestore-root
  |
  --- products (collection)
       |
       --- $productId (document)
            |
            --- name: "Running Shoe"
            |
            --- category: "Sport"
            |
            --- price: 199
            |
            --- location: "Europe"
            |
            --- country: "France"
In this way, you can simply perform all queries that you need. Since you didn't specify a programming language, I'll write the queries in Java, but you can simply convert them into any other programming language. So for example, you can query all products with a particular price:
FirebaseFirestore db = FirebaseFirestore.getInstance();
Query queryByPrice = db.collection("products").whereEqualTo("price", 199);
If you need to query by price, category and location, then you have to chain multiple whereEqualTo() methods:
Query queryByPriceCategoryLocation = db.collection("products")
        .whereEqualTo("price", 199)
        .whereEqualTo("category", "Sport")
        .whereEqualTo("location", "Europe");
If, however, you need to order the results, ascending or descending, don't forget to also create an index.
What would be the ideal data structure, given that I want to avoid thousands of reads each time there's filtering.
If you don't need to have all the results at once, then you have to implement pagination. If you need to know the number of products that exist in the sports category ahead of time, that is not possible without performing a query and counting the available products. I have written an article regarding this topic called:
How to count the number of documents in a Firestore collection?
Another feasible solution would be to create a single document that contains all those numbers. In other words, exactly what you're displaying to the users, everything that exists in those screenshots. In this way, you'll only have to pay for a single read operation. Only when the users click on a particular category should you perform the actual search.
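The single summary document described above could be built roughly as follows. This is a sketch under assumptions: the field names byPrice and byPriceAndCategory, the key format, and the sample products are hypothetical, and the code only builds the dictionary that would be stored, without writing it to Firestore.

```python
from collections import Counter

# Sample product documents, shaped like the products collection above.
products = [
    {"name": "Running Shoe", "category": "Sport", "price": 199},
    {"name": "Tennis Lesson", "category": "Sport", "price": 199},
    {"name": "Cabin Night", "category": "Stay", "price": 199},
    {"name": "Massage", "category": "Wellness", "price": 75},
]


def build_summary(product_docs):
    """Aggregate the counts the home screen needs into one document."""
    by_price = Counter()
    by_price_and_category = Counter()
    for product in product_docs:
        price_key = str(product["price"])
        by_price[price_key] += 1
        by_price_and_category[f"{price_key}_{product['category']}"] += 1
    return {
        "byPrice": dict(by_price),
        "byPriceAndCategory": dict(by_price_and_category),
    }


summary = build_summary(products)
```

Reading this one document gives every count the filter screen displays; the trade-off is that it has to be updated whenever a product is added, removed, or changes price or category.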
I understand and accept denormalization for the purpose of my filtering.
In this case, there is no need to denormalize the data. For more info regarding this kind of operation, please check my answer below:
What is denormalization in Firebase Cloud Firestore?
I have a Firestore structure with an "organizations" collection and a "users" collection.
When a user creates an account via Auth, I'd like to create a new "Organization" and add him to this organization. That means having a "Create" right.
The problem is that, by doing so, the user can create multiple Organizations and be in them.
The other issue I'm facing is regarding the changes. When that user will change their information (name, email, etc), it will also update their line at the "users" collection, but that also means they will be able to change the "organization" reference and point it to another one, which is bad.
So I wonder what is the proper way to do so, and/or if I'm doing it wrong.
That technique is called denormalization and it's a common practice when it comes to top NoSQL databases.
As I understand from your question, you want to add users to be part of the organization. In that case, there is no need to duplicate the data. I would use a structure that looks like this:
Firestore-root
  |
  --- users (collection)
  |    |
  |    --- $uid (document)
  |         |
  |         --- organizations: [$orgId, $orgId, $orgId] (array)
  |
  --- organizations (collection)
       |
       --- $orgId (document)
            |
            --- users: [$uid, $uid, $uid] (array)
In which "organizations" is an array that holds organizations IDs, and "users" is an array that holds user IDs.
Since we usually are structuring a Firestore database according to the queries that we want to perform, the above schema will help you query all the organizations a user is a part of or all users that are a part of an organization. This means that if you want to display user data, you have to perform a new Firestore database call.
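Because the membership is stored on both sides, every change has to touch both arrays. The following in-memory sketch shows the idea; the dictionaries stand in for the two collections, and add_member is a hypothetical helper. In Firestore, the two writes would typically be grouped in a batched write or a transaction so they succeed or fail together.

```python
# In-memory stand-ins for the users and organizations collections.
users = {
    "u1": {"organizations": []},
    "u2": {"organizations": []},
}
organizations = {
    "o1": {"users": []},
}


def add_member(uid, org_id):
    """Record the membership on both documents, skipping duplicates."""
    if org_id not in users[uid]["organizations"]:
        users[uid]["organizations"].append(org_id)
    if uid not in organizations[org_id]["users"]:
        organizations[org_id]["users"].append(uid)


add_member("u1", "o1")
add_member("u1", "o1")  # repeated call leaves both arrays unchanged
add_member("u2", "o1")

orgs_of_u1 = users["u1"]["organizations"]
members_of_o1 = organizations["o1"]["users"]
```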
I'm looking for a proper way to structure Firestore database to handle multiple version histories of documents inside a single collection.
For example: I have a collection named offers which have multiple documents which correspond to multiple offers. For each of these documents, I'd like to have history of changes, something like changes on Google Docs.
Since documents support only adding fields directly or nesting another collection, here's a structure I had in mind:
collections: offers
  - documents: offer1, (offer2, offer3, ...)
    - fields populated with latest version of the offer content
    - nested collection named history
      - nested documents for each version (v1, v2, v3), which in turn have fields specifying the state of each field in that version.
This seems a bit overly complicated since I have the latest state and then a nested collection for history. Can this somehow be done in a flat structure where the latest item in an array is the latest state, or something similar?
Also, history state is generated on a button click, so I don't need every possible change saved in a history, just snapshots when user saves it.
I'd like to use Firebase as my DB for this, as I need it some other things, so I'm not looking into different solutions for now.
Thanks!
EDIT: According to the Alex's answer, here's my another take on this.
Firestore-root
  |
  --- offers (collection)
       |
       --- offerID (document)
       |    (with fields populated)
       |    |
       |    --- history (collection) //last edited timestamp
       |         |
       |         --- historyId
       |         --- historyId
       |
       --- offerID (document)
            (with fields populated with latest changes)
            |
            --- history (collection) //last edited timestamp
                 |
                 --- historyId
                 --- historyId
This way I can query the whole offers collection and get an array of offers together with their latest status, since it's on the same level as the collection itself. Then if I need specific content from a history state, I can query the history collection of a specific offer and get its history states. Does this make sense?
I'm not sure about denormalization as this seems like it solves my problem and avoids complication.
Once more, requirements are:
- being able to fetch all offers with latest state (works)
- being able to load specific history state (works)
Every time I update the history collection with a new state, I just overwrite the fields directly in the offerID document with the same, latest state.
Am I missing something?
In my opinion, your above schema might work but you'll need to do some extra database calls, since Firestore queries are shallow. This means that Firestore queries can only get items from the collection that the query is run against. Firestore doesn't support queries across different collections. So there is no way in which you can get one document and the corresponding history versions that are hosted beneath a collection of that document in a single query.
A possible database structure that I can think of, would be to use a single collection like this:
Firestore-root
  |
  --- offerId (collection)
       |
       --- offerHistoryId (document)
       |    |
       |    --- //Offer details
       |
       --- offerHistoryId (document)
            |
            --- //Offer details
If you want to display all history versions of an offer, a single query is required. So you just need to attach a listener on the offerId collection and get all offer objects (documents) in a single go.
However, if you only want to get the last version of an offer, then you should add under each offer object a timestamp property and query the database according to it descending. At the end just make a limit(1) call and that's it!
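The "order by timestamp descending, then limit(1)" step can be sketched in memory like this. The history list, its field names, and latest_version are hypothetical; the function returns a list (possibly empty), mirroring how a limited query returns zero or one documents.

```python
from datetime import datetime, timezone

# In-memory stand-in for the documents of one offerId collection.
history = [
    {"id": "v1", "date": datetime(2019, 1, 5, tzinfo=timezone.utc), "title": "Draft"},
    {"id": "v3", "date": datetime(2019, 3, 9, tzinfo=timezone.utc), "title": "Final"},
    {"id": "v2", "date": datetime(2019, 2, 1, tzinfo=timezone.utc), "title": "Review"},
]


def latest_version(history_docs):
    """Order by timestamp descending and keep only the first result."""
    ordered = sorted(history_docs, key=lambda doc: doc["date"], reverse=True)
    return ordered[:1]


newest = latest_version(history)
nothing = latest_version([])
```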
Edit:
According to your comment:
I need to get a list of all offers with their latest data
In this case you need to create a new collection named offers which will hold all the latest versions of your offers. Your new collection should look like this:
Firestore-root
  |
  --- offers (collection)
       |
       --- offerHistoryId (document)
       |    |
       |    --- date: //last edited timestamp
       |    |
       |    --- //Offer details
       |
       --- offerHistoryId (document)
            |
            --- date: //last edited timestamp
            |
            --- //Offer details
This practice is called denormalization and is a common practice when it comes to Firebase. If you are new to NoSQL databases, I recommend you watch the video "Denormalization is normal with the Firebase Database" for a better understanding. It is for the Firebase Realtime Database, but the same rules apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing you need to keep in mind. In the same way you are adding data, you need to maintain it. In other words, if you want to update/delete an item, you need to do it in every place that it exists.
In your particular case, when you want to create an offer you need to add it in two places, once in your offerId collection and once in your offers collection. Once a new history version of an offer is created, there is only one more operation that you need to do. As before, add the offerHistoryId document in your offerId collection, add the same object in your offers collection, but in this case you need to remove the older version of the offer from the offers collection.
I can think of it like this. Each offers document will have offerHistoryID as number.
You can have a separate root collection for versioned documents of offers(say offers_transactions).
Now write an update trigger cloud function on offers document which will have both after and before values of the document.
Before doing the doc update, you can write the before values into the offers_transactions along with timestamp and latest historyID.
Increment the offerHistoryID by 1 for that offer and update the doc with new values.
Now you can query the root collection offers_transactions for historic transactions based on your filters. This way you can keep your root collection offers cleaner.
Thoughts?
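The steps above can be sketched in memory as follows. This is not a Cloud Function: update_offer is a hypothetical helper that plays the role of the update trigger, and the record field names (offerId, historyID, timestamp, before) are assumptions chosen for illustration.

```python
from datetime import datetime, timezone

# In-memory stand-ins for the offers and offers_transactions collections.
offers = {"offer1": {"title": "Draft", "offerHistoryID": 1}}
offers_transactions = []


def update_offer(offer_id, new_values, now=None):
    """Archive the 'before' values, then apply the update with a new history ID."""
    now = now or datetime.now(timezone.utc)
    before = dict(offers[offer_id])
    # Write the previous state into the history collection first.
    offers_transactions.append(
        {
            "offerId": offer_id,
            "historyID": before["offerHistoryID"],
            "timestamp": now,
            "before": before,
        }
    )
    # Then increment the history ID and store the new values.
    offers[offer_id] = {
        **before,
        **new_values,
        "offerHistoryID": before["offerHistoryID"] + 1,
    }


update_offer(
    "offer1",
    {"title": "Final"},
    now=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
```

Filtering offers_transactions by offerId then gives the version history of one offer, while the offers collection only ever holds the latest state.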
Here's a solution my team uses to leverage Google Cloud Functions to add every collection update to a dedicated "history" collection in Firestore (no command line necessary):
1. Identify the path of the document to watch: COLLECTION-NAME/{documentID} (or define a specific document to watch).
2. Create a new Cloud Function (1st gen, because at the time of writing 2nd gen didn't support Firestore triggers yet).
3. Set the trigger to any Firestore "write" event watching the document path from Step 1.
4. In the Cloud Function's inline code editor, select the language of your choice (I'll use Python), and include google-cloud-firestore==2.6.0 in your requirements.txt file (or whatever the latest version is).
5. Finally, define your Cloud Function's code (be sure to import Firestore correctly!)
from google.cloud import firestore

# create the client once so that warm invocations can reuse it
db = firestore.Client(project="YOUR-PROJECT-ID")


def hello_firestore(event, context):
    """Triggered by a write to the watched Firestore document path."""
    resource_string = context.resource
    # print out the resource string that triggered the function
    print(f"Function triggered by change to: {resource_string}.")
    # now print out the entire event object
    print(str(event))
    # add the raw event payload to the 'history' collection
    db.collection("history").add(event)