Firebase Firestore database structure

I'm building an app using Flutter and Firebase and was wondering what the best Firestore database structure is.
I want the ability for users to post messages and then search by both the content of the post and the poster's username.
Does it make sense to create one collection for users with each document storing username and other info and a separate collection for the posts with each document containing the post and the username of the poster?
In the unlikely event where the number of posts exceeds a million or more, is there an additional cost of querying this kind of massive collection?
Would it make more sense to store each user's posts as a sub-collection under their user document? I believe this would require additional read operations to access each document's sub-collection. Would this be cheaper or more expensive if I end up getting a lot of traffic?

is there an additional cost of querying this kind of massive collection?
The cost and performance of reading from Firestore are purely based on the amount of data (number of documents and their size) you retrieve, and not in any way on the number of documents in the collection.
But what is limited in Firestore is the number of writes you can do to data that is "close to each other". That intentionally vague definition means that it's typically better for write scalability to spread the data over separate subcollections, if the data naturally lends itself to that (such as in your case).
To get a great introduction to Firestore, and to data modeling trade-offs, watch Getting to know Cloud Firestore.

Related

Is Firestore (NoSQL) a good choice for social media apps?

We are building a social media web app using firebase and use firestore to store users and their posts.
When a user likes a post, we save it in posts/{postID}/likedBy/{userID} and also update totalLikes in the post document.
Let's say our app has 1 million daily users, and they all are liking viral posts very frequently.
Now, firebase says that a document cannot handle more than one write per second. However, we've seen that we can update the document several times per second, but they still don't recommend it.
My question is, what is the best way to store total post likes in firestore, if there's any. Or, should we use some other services?
EDIT: As the answer below suggests, Firestore's distributed counters are made for exactly this case.
Also, I want to query only those posts which are not liked by a user.
One way to query this is to have each document in the posts collection contain a map of all the users who liked it, and then run a query where the map doesn't contain the current userID. This approach isn't good because it limits the number of likes a post can get, as a document in Firestore cannot exceed 1 MB.
Another way would be to save the liked posts in the user's document. However, with this we'll not only lose the ability to fetch just those posts that the user hasn't liked, it will also limit the number of posts a user can like.
A third way would be to store the users who liked the post in a sub-collection, which also loses the query functionality. The same applies to storing the posts liked by a user in a sub-collection.
Now, either I don't have enough knowledge of Firestore (or, actually, of any NoSQL database), or I'm thinking right and NoSQL just isn't made for social media apps.
Let's say our app has 1 million daily users, and they all are liking
viral posts very frequently.
Now, firebase says that a document cannot handle more than one write
per second.
My question is, what is the best way to store total post likes in
firestore, if there's any. Or, should we use some other services?
This is the exact scenario for which Firebase recommends using distributed counters.
With distributed counters, "each counter is a document with a subcollection of shards, and the value of the counter is the sum of the value of the shards."
"Write throughput increases linearly with the number of shards, so a distributed counter with 10 shards can handle 10x as many writes as a traditional counter." (traditional counter = counter in one document)
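The shard idea quoted above can be sketched with a small in-memory model. This is not the Firebase SDK: the class name ShardedCounter and its methods are hypothetical, and each list slot only stands in for one shard document in the counter's subcollection.

```python
import random


class ShardedCounter:
    """In-memory model of a distributed counter (hypothetical, not an SDK class).

    Each entry in `shards` stands in for one shard document.
    """

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("a counter needs at least one shard")
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Pick a random shard so that concurrent writes are spread over
        # several documents instead of all hitting the same one.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self) -> int:
        # The counter value is the sum of all shard values.
        return sum(self.shards)


likes = ShardedCounter(num_shards=10)
for _ in range(1000):
    likes.increment()
print(likes.total())  # 1000
```

The trade-off is that reading the total now touches every shard, so more shards means cheaper writes but more documents to read for the count.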

Are Firestore Collections Physically Isolated from Each Other?

I am considering storing multiple tenants in a single Firebase Firestore database. There will only be one collection per tenant and a few shared collections. Some will have more data than others. Some tenants may have a few million records while others may end up with a few billion. I want to confirm that the size of data in one collection will not impact the performance or storage of another collection in the same database.
I couldn't find much in the documentation about how the data is physically stored. Is all the data in Firestore stored in a single blob/file? If so, this could be a problem when there are hundreds of tenants with billions of records each. In an ideal world, each collection would be a physically separate file, and the server orchestration would separate the collections onto multiple servers so that a single server is not sharing the load between a very heavy tenant, and a very light tenant. This scenario would mean that a heavy tenant would slow down a light tenant.
My basic question is: can a single Firestore database infinitely scale up in size assuming that no single collection is bigger than a few billion records?
I know that there are two types of databases: native and datastore. Which of these seems more appropriate, and is the answer to my question different depending on which of these I select?
If the answer is that Firestore cannot scale infinitely in this way, what is the alternative approach? Should I be using Bigtable instead? Cassandra? Or, is there another way to physically divide my Firestore database other than collections?
Some tenants may have a few million records while others may end up with a few billion. I want to confirm that the size of data in one collection will not impact the performance or storage of another collection in the same database.
The performance in Firestore isn't related to the number of documents that exist in a collection. In terms of speed, it doesn't matter if you perform a query on:
A top-level (root-level) collection.
A sub-collection, which basically represents a collection that is nested under a document.
A collection group, which actually means querying collections and sub-collections that exist across the entire database.
The speed will always be the same, as long as the query returns the same number of documents. This is happening because the query performance depends on the number of documents you request and not on the number of documents you search. So it doesn't really matter if you query a collection with 1 MILLION documents or even 1 BILLION documents, the time for getting the same results will be the same.
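As an analogy only (this thread does not describe Firestore's internals), a lookup against a sorted index shows why the work can follow the size of the result rather than the size of the collection. The function range_query and the two sample lists are illustrative assumptions.

```python
import bisect


def range_query(sorted_index: list[int], low: int, high: int) -> list[int]:
    """Return all values in [low, high] from an already sorted index."""
    # Binary search finds both ends of the range, so the scan only covers
    # the matching entries, not the whole index.
    start = bisect.bisect_left(sorted_index, low)
    end = bisect.bisect_right(sorted_index, high)
    return sorted_index[start:end]


small_index = list(range(100))
large_index = list(range(1_000_000))

# Both queries return the same five entries, whatever the index size.
print(range_query(small_index, 10, 14))  # [10, 11, 12, 13, 14]
print(range_query(large_index, 10, 14))  # [10, 11, 12, 13, 14]
```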
I couldn't find much in the documentation about how the data is physically stored. Is all the data in Firestore stored in a single blob/file? If so, this could be a problem when there are hundreds of tenants with billions of records each.
In Cloud Firestore, the unit of storage is the document. Documents live in collections, which are simply containers for documents. Please note that Firestore is optimized for storing large collections of small documents. And when I say large, I mean extremely large. So when you perform a query against a collection of 1 MILLION documents, the speed depends on the number of results you return and it does not depend on the number of the documents in which you search, or on the number of documents that exist in other collections in which you aren't performing a search.
Can a single Firestore database infinitely scale up in size assuming that no single collection is bigger than a few billion records?
With the Firebase Realtime Database you had to scale by using multiple databases; in Firestore this practice is not necessary. However, there are some techniques that are explained really well in the official docs:
Building scalable applications with Firestore
If the answer is that Firestore cannot scale infinitely in this way, what is the alternative approach?
It can definitely scale massively.
See the Firestore best practices and security rules.
You may conceptualize Firestore as one service shared by all of Google's customers. Just as Google attempts to ensure that one customer's impact on the service (the so-called "noisy neighbor") does not affect others, you don't want to be a noisy neighbor to yourself.
You need to consider more than just performance.
Security. For example, see security rules as a mechanism that you may be able to use to help enforce segregation of your tenants' data. You will want to understand fully how to keep different customers' data separated securely. Your customers will want to understand what measures you're employing to ensure their data is kept separate too.
Multitenancy. Google Cloud Platform has no intrinsic (platform-wide) multitenant capabilities and, often, a way to manifest tenancy has been to use different Google Projects for different customers. This is because Projects provide a well-defined security perimeter. You may want to investigate whether (some subset of your customers) would benefit from being one customer, one project.
Quota. Another important consideration is quota. Every Cloud Platform method is constrained by some quota. You will want to be careful in ensuring that quota is distributed fairly across customers so that some customers don't consume all the quota denying other customers access to the service.

What is the best way to get multiple specific data from collections in firestore?

Is there a better way to get multiple specific documents from a collection in Firestore?
Let's say I have this collection:
--Feeds (collection)
  --feedA (doc)
    --comments (collection)
      --commentA (doc)
        users_in_conversation: [abcdefg, hijklmn, ...] //Field contains list of all users in conversation
Then I need to retrieve the user data (name and avatar) from the Users collection. Currently I run one query per user, but that will be slow when there are many people in the conversation.
What's the best way to retrieve specific users?
Thanks!
Retrieving the additional names is actually a lot faster than most developers expect, as the requests can often be pipelined over a single HTTP/2 connection. But if you're noticing performance problems, edit your question to show the code you use, the data you have, and the performance you're getting.
A common way to reduce the need to load additional documents is by duplicating data. For example, if you store the name and avatar of the user in each comment document, you won't need to look up the user profile every time you read a comment.
If you come from a background in relational databases, this sort of data duplication may be very unexpected. But it's actually quite common in NoSQL databases.
You will of course then have to consider how to deal with updates to the user profile, for which I recommend reading: How to write denormalized data in Firebase. While this is for Firebase's other database, the same concepts apply to Firestore. I also in general recommend watching Getting to know Cloud Firestore.
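A minimal in-memory sketch of that duplication and the update it implies. Plain dicts stand in for the Users and comments collections; the field names authorName and authorAvatar and both helper functions are assumptions for illustration, not part of any Firebase API.

```python
# Plain dicts stand in for Firestore collections (hypothetical sample data).
users = {"abcdefg": {"name": "Ann", "avatar": "ann.png"}}
comments: dict[str, dict] = {}


def add_comment(comment_id: str, user_id: str, text: str) -> None:
    """Store the comment together with a copy of the author's name and avatar."""
    author = users[user_id]
    comments[comment_id] = {
        "text": text,
        "userId": user_id,
        "authorName": author["name"],      # duplicated from the user profile
        "authorAvatar": author["avatar"],  # duplicated from the user profile
    }


def rename_user(user_id: str, new_name: str) -> int:
    """Update the profile and every duplicated copy; return how many comments changed."""
    users[user_id]["name"] = new_name
    updated = 0
    for comment in comments.values():
        if comment["userId"] == user_id:
            comment["authorName"] = new_name
            updated += 1
    return updated


add_comment("commentA", "abcdefg", "Nice post")
add_comment("commentB", "abcdefg", "Agreed")
print(rename_user("abcdefg", "Anna"))  # 2
```

Reading a comment no longer needs a second lookup, at the price of one extra write per duplicated copy whenever the profile changes.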
I have tried several solutions, but I think this one is the best for this case:
When a user posts a comment, add the feed/post ID to an array field named discussions in the user document.
When a user loads a feed/post, get all users whose discussions array contains its ID (using array-contains).
It's efficient and needs fewer operations.
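The two steps above can be sketched in memory as follows. The dict stands in for the Users collection, the membership test stands in for an array-contains query, and the sample users and both function names are hypothetical.

```python
# Plain dict standing in for the Users collection (hypothetical sample data).
users = {
    "abcdefg": {"name": "Ann", "avatar": "ann.png", "discussions": ["feedA"]},
    "hijklmn": {"name": "Bob", "avatar": "bob.png", "discussions": ["feedA", "feedB"]},
    "opqrstu": {"name": "Cy", "avatar": "cy.png", "discussions": []},
}


def record_comment(user_id: str, feed_id: str) -> None:
    """Step 1: remember on the user document that they commented on this feed."""
    discussions = users[user_id].setdefault("discussions", [])
    if feed_id not in discussions:  # keep the array free of duplicates
        discussions.append(feed_id)


def users_in_discussion(feed_id: str) -> dict[str, dict]:
    """Step 2: one filter over users replaces one lookup per user."""
    return {
        user_id: {"name": user["name"], "avatar": user["avatar"]}
        for user_id, user in users.items()
        if feed_id in user.get("discussions", [])  # stands in for array-contains
    }


record_comment("opqrstu", "feedB")
print(sorted(users_in_discussion("feedB")))  # ['hijklmn', 'opqrstu']
```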

Firestore Collection Write Rate

The article about Best practices for Cloud Firestore states that we should keep the rate of write operations for an individual collection under 1,000 operations/second.
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
What if I have a root-level collection (e.g. "messages") which expects to have more than 1,000 write operations/second?
The limit of 1,000 operations/second is fairly high, but if you find yourself in a situation where you need more than that, you should consider changing your database schema to spread writes across multiple collections. A single collection of messages to which every user can add messages doesn't sound like a good way to go, since you can reach that limit very soon. In that case, you should split that collection into multiple collections. A possible schema is the one I explain in the following video:
https://www.youtube.com/watch?v=u3KwKQddPoo
At the end of that video, there is a collection named messages, which in turn contains a roomId document. This document contains a subcollection named roomMessages, which holds all the messages from a chat room as documents. In this case, it is very unlikely that you will reach that limit.
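A small sketch of how that schema spreads writes. The collection names messages and roomMessages come from the answer; the helper functions and the sample room and message IDs are assumptions for illustration.

```python
from collections import Counter


def room_message_path(room_id: str, message_id: str) -> str:
    """Build the document path messages/{roomId}/roomMessages/{messageId}."""
    return f"messages/{room_id}/roomMessages/{message_id}"


def writes_per_collection(doc_paths: list[str]) -> Counter:
    """Count writes per parent collection (the path without its last segment)."""
    return Counter(path.rsplit("/", 1)[0] for path in doc_paths)


# Six writes spread over three rooms (hypothetical IDs).
paths = [
    room_message_path(room_id, f"m{n}")
    for room_id in ("room1", "room2", "room3")
    for n in range(2)
]
for collection, count in sorted(writes_per_collection(paths).items()):
    print(collection, count)
```

With a single root-level messages collection all six writes would land in one collection; here each roomMessages subcollection receives only the writes of its own room.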
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
But also remember, Firestore can look up a collection at level 1 as quickly as it can at level 100, so you don't need to worry about that.
The limit of 1,000 ops/sec per collection only applies to realtime updates, so as long as you don't have a snapshot listener this should be okay.
I asked the question on the Cloud Firestore Google Groups
The limit is 10,000 writes per second if no other limits apply first:
https://firebase.google.com/docs/firestore/quotas#writes_and_transactions
Also, just keep in mind the best practices for scaling Cloud Firestore.

Is It Possible to Have a Slow Query In Cloud Firestore?

I have read in the documentation that the amount of time for retrieving data will be the same for querying a collection of 6 documents and a collection of 60M.
So is it safe to save all of the data of a specific kind (like users) under the same collection? Will I never have to split them into separate collections for getting better performance?
It is definitely possible to have slow-performing queries on Firestore, but the performance will not be related to the number of documents in the collection that you're querying. A common cause of slow reads is for example having documents that contain way more data than the application needs, which means that it takes more time to download that data to the client than is necessary for the use-case.
In your example: it is indeed normal to store all user profiles in a single collection. Querying 6 users out of that collection will always take the same amount of time, even if your app grows to millions or hundreds of millions of users.
