Firestore Collection Write Rate - firebase

The article about Best practices for Cloud Firestore states that we should keep the rate of write operations for an individual collection under 1,000 operations/second.
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
What if I have a root-level collection (e.g. "messages") which expects to have more than 1,000 write operations/second?

If you think at that limitation of 1,000 operations/second it's pretty much but if you find your self in a situation in which you need more than that, then you should consider changing your database schema to allow writes on multiple collections. So you should multiply the number of collections. Having a single collection of messages, in which every user can add messages doesn't sound as a good way to go since you can reach that limitation very soon. In this case you should split that collection into multiple other collections. A possible schema might be the one I have explained in the following video:
https://www.youtube.com/watch?v=u3KwKQddPoo
See, at the end of that video, there is collection named messages which in term contains a roomId document. This document contains a subcollection named roomMessages which contains as documents all messages from a chat room. In this case, there are no chances you can reach that limitation.
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
But also rememeber, Firestore can as quickly look up a collection at level 1 as it can at level 100, so you don't need to worry about that.

The limit of 1,000 ops/sec per collection only apply to realtime update, so as long as you don't have a snapshot listener this should be okay.
I asked the question on the Cloud Firestore Google Groups
The limit is 10,000 writes per second if no other limits apply first:
https://firebase.google.com/docs/firestore/quotas#writes_and_transactions
Also just keep in mind the best practices for scaling cloud firestore

Related

Are Firestore Collections Physically Isolated from Each Other?

I am considering storing multiple tenants in a single Firebase Firestore database. There will only be one collection per tenant and a few shared collections. Some will have more data than others. Some tenants may have a few million records while others may end up with a few billion. I want to confirm that the size of data in one collection will not impact the performance or storage of another collection in the same database.
I couldn't find much in the documentation about how the data is physically stored. Is all the data in Firestore stored in a single blob/file? If so, this could be a problem when there are hundreds of tenants with billions of records each. In an ideal world, each collection would be a physically separate file, and the server orchestration would separate the collections onto multiple servers so that a single server is not sharing the load between a very heavy tenant, and a very light tenant. This scenario would mean that a heavy tenant would slow down a light tenant.
My basic question is: can a single Firestore database infinitely scale up in size assuming that no single collection is bigger than a few billion records?
I know that there are two types of databases: native and datastore. Which of these seems more appropriate, and is the answer to my question different depending on which of these I select?
If the answer is that Firestore cannot scale infinitely in this way, what is the alternative approach? Should I be using Bigtable instead? Cassandra? Or, is there another way to physically divide my Firestore database other than collections?
Some tenants may have a few million records while others may end up with a few billion. I want to confirm that the size of data in one collection will not impact the performance or storage of another collection in the same database.
The performance in Firestore isn't related to the number of documents that exist in a collection. In terms of speed, it doesn't matter if you perform a query on:
A top-level (root-level) collection.
A sub-collection, which basically represents a collection that is nested under a document.
A collection group, which actually means querying collections and sub-collections that exist across the entire database.
The speed will always be the same, as long as the query returns the same number of documents. This is happening because the query performance depends on the number of documents you request and not on the number of documents you search. So it doesn't really matter if you query a collection with 1 MILLION documents or even 1 BILLION documents, the time for getting the same results will be the same.
I couldn't find much in the documentation about how the data is physically stored. Is all the data in Firestore stored in a single blob/file? If so, this could be a problem when there are hundreds of tenants with billions of records each.
In Cloud Firestore, the unit of storage is the document. Documents live in collections, which are simply containers for documents. Please note that Firestore is optimized for storing large collections of small documents. And when I say large, I mean extremely large. So when you perform a query against a collection of 1 MILLION documents, the speed depends on the number of results you return and it does not depend on the number of the documents in which you search, or on the number of documents that exist in other collections in which you aren't performing a search.
Can a single Firestore database infinitely scale up in size assuming that no single collection is bigger than a few billion records?
While when using the Firebase Realtime Database you had to scale using multiple databases, in Firestore this practice is not necessary. However, the are some techniques that are really good explained in the official docs:
Building scalable applications with Firestore
If the answer is that Firestore cannot scale infinitely in this way, what is the alternative approach?
I can definitely massively scale.
See the Firestore best practices and security rules.
You may conceptualize Firestore as being one service being shared by all of Google's customers. Just as Google's attempts to ensure that one customer's (so-called "noisy neighbor") impact on the service does not affect others, you don't want to be a noisy neighbor to yourself.
You need to consider more than just performance.
Security. E.g.see security rules as a mechanism that you may be able to use to help enforce segregation of your tenants' data. You will want to understand fully how to keep different customers' data separated securely. Your customers will want to understand what measures you're employing to ensure their data is keep separate too.
Multitenancy. Google Cloud Platform has no intrinsic (platform-wide) multitenant capabilities and, often, a way to manifest tenancy has been to use different Google Projects for different customers. This is because Projects provide a well-defined security perimeter. You may want to investigate whether (some subset of your customers) would benefit from being one customer, one project.
Quota. Another important consideration is quota. Every Cloud Platform method is constrained by some quota. You will want to be careful in ensuring that quota is distributed fairly across customers so that some customers don't consume all the quota denying other customers access to the service.

Firebase Firestore database structure

I'm building an app using flutter and firebase and was wondering what the best firestore database structure.
I want the ability for users to post messages and then search by both the content of the post and the posters username.
Does it make sense to create one collection for users with each document storing username and other info and a separate collection for the posts with each document containing the post and the username of the poster?
In the unlikely event where the number of posts exceeds a million or more, is there an additional cost of querying this kind of massive collection?
Would it make more sense to store each user's posts as a sub-collection under their user document? I believe this would require additional read operations to access each document's sub-collection. Would this be cheaper or more expensive if I end up getting a lot of traffic?
is there an additional cost of querying this kind of massive collection?
The cost and performance of reading from Firestore are purely based on the amount of data (number of documents and their size) you retrieve, and not in any way on the number of documents in the collection.
But what is limited in Firestore is the number of writes you can do to data that is "close to each other". That intentionally vague definition means that it's typically better for write scalability to spread the data over separate subcollections, if the data naturally lends itself to that (such as in your case).
To get a great introduction to Firestore, and to data modeling trade-offs, watch Getting to know Cloud Firestore.

Structuring Firestore data for efficient reads

Assume I have an application that shows a list of restaurants & 1000 restaurants to show.
My first impression would be to create a collection of restaurants and each individual restaurant would be a document inside this collection.
The concern with the above approach is that for each user Cloud Firestore would register 1000 reads.
My question is if there is a better way of storing the restaurants to decrease the number of reads?
My first impression would be to create a collection of restaurants and each individual restaurant would be a document inside this collection.
Yes, that's the right way to do it.
The concern with the above approach is that for each user firestore would register 1000 reads
You'll be charged with 1000 read operations only if you read all documents at once. But this is not the right way to do it, you need to limit the data that you get. On how you can achieve this, please check the official documentation regarding order and limit data in Cloud Firestore.
Another most apropiate approach is to load the data in smaller chunks. This practice is called pagination and can be used very simply in Cloud Firestore using startAt() or startAfter() methods.
For Android, this is a recommended way in which you can paginate queries by combining query cursors with the limit() method. I also recommend you take a look at this video for a better understanding.
My question is if there is a better way of storing the restaurants to decrease the number of reads?
And to answer your question, the problem is not about how you store the data is about how you read it.

Work around firestore document size limit?

I need to store a large number of fields, like for a star rating system, but firestore only allows 20,000 fields per document. Is there a known way around this? Right now I am going to 'shard' the fields in multiple documents, and keep the size of each document in a documentSizeTracker document that I use to determine which document to shard to (and add to the counter with a transaction). Is this the correct approach? Any problems with this?
Sharding certainly could work. It's hard to say without knowing exactly what kind of data you'll need from your document, and when, but that's certainly a reasonable option. You could also consider having a parent "summary" doc that contains fields you might want to search on and then split all of your data into several documents inside a subcollection of that parent.
One important nuance here: the limit isn't 20,000 fields, but 20,000 indexed fields. So if you're storing a bunch of data inside your document, but you know that you're not going to be searching on all of them, another alternative is to mark some of your fields as unindexed (which you can now do in the Firebase console in the "Exemptions" section).
If you're dealing with thousands of fields, though, you probably won't want to exempt them all one at a time, so a better alternative might be to place your data as a map inside a container field (named something like "allOfMyData"), then just mark that one field as unindexed. That will automatically remove all indexes from any fields contained inside that map.
Actually, I ran into similar problem with the read and write issues with Firebase. So, here is my conclusion:
# if something small needs to be written & read very often, then use Firebase Realtime Database
Firebase Realtime database allows fast writes, but limits concurrent users to 100,000
Firebase Firestore allows a maximum of 1 write per second per document
It's very expensive to read a document that only contains a rating for example in Firestore
# if something (larger) needs to be read very often with writes usually more than 1 second in between then use Firestore
Firestore allows up to 1,000,000 concurrent users at current Beta release (they might make it more)
It's cheaper to read a large document (less than 1 MiB limit) in Firestore than Firebase Realtime database
# If your model doesn't fit into these two choices, then you should modify your model and split them into 2 models:
1 very small model to store in Firebase Real Database (ratings for example)
1 larger model to store in Firestore
Note: You could use both Firebase Realtime database and Firebase Firestore in the same project. Don't forget to take into account the billing differences between both databases. and their different limits. I believe, it's best to combine them and use the good side of each instead of trying to force solutions into one of them.
Note 2: I really didn't like the shard-ing idea in Firestore suggested solution and work around

Advantages of firestore sub-collections

The firestore docs don't have an in depth discussion of the tradeoffs involved in using sub-collections vs top-level collections, but do point out that they are less flexible and less 'scalable'. Given that you sacrifice flexibility in setting up your data in sub-collections, there must be some definite plus sides besides a mentally satisfying structure.
For example how does the time for a firestore query on a single key across a large collection compare with getting all items from a much smaller collection?
Say we want to query a large collection 'People' for all people in a family unit. Alternatively, partition the data by family in the first place into family units.
People -> person: {family: 'Smith'}
versus
Families -> family: {name:'Smith'} -> People -> person
I would expect the latter to be more efficient, but is this correct? Are the any big-O estimates for each?
Any other advantages of sub-collections (eg for transactions)?
I’ ve got some key points about subcollections that you need to be aware of when modeling your database.
1 – Subcollections give you a more structured database.
2 - Queries are indexed by default: Query performance is proportional to the size of your result set, not your data set. So does not matter the size of your collection, the performance depends on the size of your result set.
3 – Each document has a max size of 1MB. For instance, if you have an array of orders in your customer document, it might be a good idea to create a subcollection of orders to each customer because you cannot foresee how many orders a customer will have. By doing this you don’t need to worry about the max size of your document.
4 – Pricing: Firestore charges you for document reads, writes and deletes. Therefore, when you create many subcollections instead of using arrays in the documents, you will need to perform more read, writes and deletes, thus increasing your bill.
To answer the original question about efficiency:
Querying all people with the family 'Smith' from the people top-level collections really is not any slower than asking for all the people in the 'Smith' family sub-collection.
This is explained in the How to Structure Your Data episode of the Get to Know Cloud Firestore video series.
There are some trade-offs between top-level collections and sub-collections to be aware of. Depending on the specific queries you intend to use you may need to create composite indexes to query top-level collections or collection group indexes to query sub-collections. Both these index types count towards the 200 index exemptions limit.
These trade-offs are discussed in detail near the bottom of the Understanding Collection Group Queries blog post and in Maps, Arrays and Subcollections, Oh My! episode of the Get to Know Cloud Firestore video series.
I've linked to the relevant parts of both videos.
I was wondering about the same thing. The documentation mainly talks about arrays vs sub-collections. My conclusion is that there are no clear advantages of using a sub-collection over a top-level collection. Sub collections had some clear technical limitations before, but I think those are removed with the recent introduction of collection group queries.
Here are some advantages of both approaches:
Sub collection:
Your database "feels" more structured as you will have less top-level collections listed.
No need to store a reference/foreign key/id of the parent document, as it is implied by the database structure. You can get to the parent via the sub collection document ref.
Top-level collection:
Documents are easier to delete. Using sub collections you need to make sure to first delete all sub collection documents before you delete the parent document. There is no API for this so you might need to roll your own helper functions.
Having the parent id directly in each (sub) document might make it easier to process query results, depending on the application.
Todd answered this in firebase youtube video
1) There's a limit to how many documents you can create per minute in
a single collection if the documents have an always-increasing value
(like a timestamp)
2) Very large collections don't do as well from a
performance standpoint when you're offline. But they are generally
good options to consider.

Resources