Structuring Firestore data for efficient reads - firebase

Assume I have an application that shows a list of restaurants & 1000 restaurants to show.
My first impression would be to create a collection of restaurants and each individual restaurant would be a document inside this collection.
The concern with the above approach is that for each user Cloud Firestore would register 1000 reads.
My question is if there is a better way of storing the restaurants to decrease the number of reads?

My first impression would be to create a collection of restaurants and each individual restaurant would be a document inside this collection.
Yes, that's the right way to do it.
The concern with the above approach is that for each user firestore would register 1000 reads
You'll be charged with 1000 read operations only if you read all documents at once. But this is not the right way to do it, you need to limit the data that you get. On how you can achieve this, please check the official documentation regarding order and limit data in Cloud Firestore.
Another most apropiate approach is to load the data in smaller chunks. This practice is called pagination and can be used very simply in Cloud Firestore using startAt() or startAfter() methods.
For Android, this is a recommended way in which you can paginate queries by combining query cursors with the limit() method. I also recommend you take a look at this video for a better understanding.
My question is if there is a better way of storing the restaurants to decrease the number of reads?
And to answer your question, the problem is not about how you store the data is about how you read it.

Related

Firestore data model for events planning app

I am new to Firestore and building an event planning app but I am unsure what the best way to structure the data is taking into account the speed of queries and Firestore costs based on reads etc. In both options I can think of, I have a users collection and an events collection
Option 1:
In the users collection, each user has an array of eventIds for events they are hosting and also events they are attending. Then I query the events collection for those eventIds of that user so I can list the appropriate events to the user
Option 2:
For each event in the events collection, there is a hostId and an array of attendeeIds. So I would query the events collection for events where the hostID === user.id and where attendeeIds.includes(user.id)
I am trying to figure out which is best from a performance and a costs perspective taking into account there could be thousands of events to iterate through. Is it better to search events collections by an eventId as it will stop iterating when all events are found or is that slow since it will be searching for one eventId at a time? Maybe there is a better way to do this than I haven't mentioned above. Would really appreciate the feedback.
In addition to #Dharmaraj answer, please note that none of the solutions is better than the other in terms of performance. In Firestore, the query performance depends on the number of documents you request (read) and not on the number of documents you are searching. It doesn't really matter if you search 10 documents in a collection of 100 documents or in a collection that contains 100 million documents, the response time will always be the same.
From a billing perspective, yes, the first solution will imply an additional document to read, since you first need to actually read the user document. However, reading the array and getting all the corresponding events will also be very fast.
Please bear in mind, that in the NoSQL world, we are always structuring a database according to the queries that we intend to perform. So if a query returns the documents that you're interested in, and produces the fewest reads, then that's the solution you should go ahead with. Also remember, that you'll always have to pay a number of reads that is equal to the number of documents the query returns.
Regarding security, both solutions can be secured relatively easily. Now it's up to you to decide which one works better for your use case.
I would recommend going with option 2 because it might save you some reads:
You won't have to query the user's document in the first place and then run another query like where(documentId(), "in", [...userEvents]) or fetch each of them individually if you have many.
When trying to write security rules, you can directly check if an event belongs to the user trying to update the event by resource.data.hostId == request.auth.uid.
When using the first option, you'll have to query the user's document in security rules to check if this eventID is present in that events array (that may cost you another read). Checkout the documentation for more information on billing.

Is Firestore (NoSQL) a good choice for social media apps?

We are building a social media web app using firebase and use firestore to store users and their posts.
When a user likes a post, we save it in posts/{postID}/likedBy/{userID} and also update totalLikes in the post document.
Let's say our app has 1 million daily users, and they all are liking viral posts very frequently.
Now, firebase says that a document cannot handle more than one write per second. However, we've seen that we can update the document several times per second, but they still don't recommend it.
My question is, what is the best way to store total post likes in firestore, if there's any. Or, should we use some other services?
EDIT: Firestore's distributed counters are made for exactly as suggested by the answer below.
Also, I want to query only those posts which are not liked by a user.
The way I can query this is if our documents inside posts collection contains Map of all the users who liked it, and then run a query where the map doesn't contain current userID. This approach isn't good because it limits the number of likes a post can get as the document size in firestore cannot exceed 1mb.
Another way can be to save the liked posts in the user's document, however by this, we'll not only loose the functionality to just fetch those posts which are not liked by user, it'll also limit the number of posts a user can like.
Third way can be to store the users who liked the post in a sub-collection, which will also loose the query functionality. Similar case would be with storing posts liked by a user in a sub-collection.
Now, either I've not enough knowledge of firestore(actually any other NoSQL database), or I'm thinking right but it's just that NoSQL isn't made for social media apps.
Let's say our app has 1 million daily users, and they all are liking
viral posts very frequently.
Now, firebase says that a document cannot handle more than one write
per second.
My question is, what is the best way to store total post likes in
firestore, if there's any. Or, should we use some other services?
This is the exact scenario for which Firebase recommends to use some distributed counters.
With distributed counterS, "each counter is a document with a subcollection of shards, and the value of the counter is the sum of the value of the shards."
"Write throughput increases linearly with the number of shards, so a distributed counter with 10 shards can handle 10x as many writes as a traditional counter." (traditional counter = counter in one document)

Firestore Collection Write Rate

The article about Best practices for Cloud Firestore states that we should keep the rate of write operations for an individual collection under 1,000 operations/second.
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
What if I have a root-level collection (e.g. "messages") which expects to have more than 1,000 write operations/second?
If you think at that limitation of 1,000 operations/second it's pretty much but if you find your self in a situation in which you need more than that, then you should consider changing your database schema to allow writes on multiple collections. So you should multiply the number of collections. Having a single collection of messages, in which every user can add messages doesn't sound as a good way to go since you can reach that limitation very soon. In this case you should split that collection into multiple other collections. A possible schema might be the one I have explained in the following video:
https://www.youtube.com/watch?v=u3KwKQddPoo
See, at the end of that video, there is collection named messages which in term contains a roomId document. This document contains a subcollection named roomMessages which contains as documents all messages from a chat room. In this case, there are no chances you can reach that limitation.
But at the same time, the Firebase team says in Choose a data structure that root-level collections "offer the most flexibility and scalability".
But also rememeber, Firestore can as quickly look up a collection at level 1 as it can at level 100, so you don't need to worry about that.
The limit of 1,000 ops/sec per collection only apply to realtime update, so as long as you don't have a snapshot listener this should be okay.
I asked the question on the Cloud Firestore Google Groups
The limit is 10,000 writes per second if no other limits apply first:
https://firebase.google.com/docs/firestore/quotas#writes_and_transactions
Also just keep in mind the best practices for scaling cloud firestore

Work around firestore document size limit?

I need to store a large number of fields, like for a star rating system, but firestore only allows 20,000 fields per document. Is there a known way around this? Right now I am going to 'shard' the fields in multiple documents, and keep the size of each document in a documentSizeTracker document that I use to determine which document to shard to (and add to the counter with a transaction). Is this the correct approach? Any problems with this?
Sharding certainly could work. It's hard to say without knowing exactly what kind of data you'll need from your document, and when, but that's certainly a reasonable option. You could also consider having a parent "summary" doc that contains fields you might want to search on and then split all of your data into several documents inside a subcollection of that parent.
One important nuance here: the limit isn't 20,000 fields, but 20,000 indexed fields. So if you're storing a bunch of data inside your document, but you know that you're not going to be searching on all of them, another alternative is to mark some of your fields as unindexed (which you can now do in the Firebase console in the "Exemptions" section).
If you're dealing with thousands of fields, though, you probably won't want to exempt them all one at a time, so a better alternative might be to place your data as a map inside a container field (named something like "allOfMyData"), then just mark that one field as unindexed. That will automatically remove all indexes from any fields contained inside that map.
Actually, I ran into similar problem with the read and write issues with Firebase. So, here is my conclusion:
# if something small needs to be written & read very often, then use Firebase Realtime Database
Firebase Realtime database allows fast writes, but limits concurrent users to 100,000
Firebase Firestore allows a maximum of 1 write per second per document
It's very expensive to read a document that only contains a rating for example in Firestore
# if something (larger) needs to be read very often with writes usually more than 1 second in between then use Firestore
Firestore allows up to 1,000,000 concurrent users at current Beta release (they might make it more)
It's cheaper to read a large document (less than 1 MiB limit) in Firestore than Firebase Realtime database
# If your model doesn't fit into these two choices, then you should modify your model and split them into 2 models:
1 very small model to store in Firebase Real Database (ratings for example)
1 larger model to store in Firestore
Note: You could use both Firebase Realtime database and Firebase Firestore in the same project. Don't forget to take into account the billing differences between both databases. and their different limits. I believe, it's best to combine them and use the good side of each instead of trying to force solutions into one of them.
Note 2: I really didn't like the shard-ing idea in Firestore suggested solution and work around

Complicated data structuring in firebase/firestore

I need an optimal way to store a lot of individual fields in firestore. Here is the problem:
I get json data from some api. it contains a list of users. I need to tell if those users are active, ie have been online in the past n days.
I cannot query each user in the list from the api against firestore, because there could be hundreds of thousands of users in that list, and therefore hundreds of thousands of queries and reads, which is way too expensive.
There is no way to use a list as a map for querying as far as I know in firestore, so that's not an option.
What I initially did was have a cloud function go through and find all the active users maybe once every hour, and place them in firebase realtime database in the structure:
activeUsers{
uid1: true
uid2: true
uid2: true
etc...
}
and every time I need to check which users are active, I get all fields under activeUsers (which is constrained to a maximum of 100,000 fields, approx 3~5 mb.
Now i was going to use that as my final mechanism, but I just realised that firebase charges for amount of bandwidth used, not number of reads. Therefore it could get very expensive doing this over and over whenever a user makes this request. And I cannot query every single result from firebase database as, while it does not charge per read (i think), it would be very slow to carry out hundreds of thousands of queries.
Now I have decided to use cloud firestore as my final hope, since it charges for number of reads and writes primarily as opposed to data downloaded and uploaded. I am going to use cloud functions again to check every hour the active users, and I'm going to try to figure out the best way to store that data within a few documents. I was thinking 10,000 fields per document with all the active users, then when a user needs to get the active users, they get all the documents (would be
10 if there are 100,000 total active users) and maps those client side to filter the active users.
So I really have 2 questions. 1, If I do it this way, what is the best way to store that data in firestore, is it the way I suggested? And 2, is there an all around better way to be performing this check of active users against the list returned from the api? Have I got it all wrong?
You could use firebase storage to store all the users in a text file, then download that text file every time?
Well this is three years old, but I'll answer here.
What you have done is not efficient and not a good approach. What I would do is as follows:
Make a separate collection, for all active users.
and store all the active users unique field such as ID there.
Then query that collection. Update that collection when needed.

Resources