I need an optimal way to store a lot of individual fields in Firestore. Here is the problem:
I get JSON data from an API. It contains a list of users. I need to tell whether those users are active, i.e. have been online in the past n days.
I cannot query Firestore for each user in the list from the API, because there could be hundreds of thousands of users in that list, and therefore hundreds of thousands of queries and reads, which is way too expensive.
As far as I know, there is no way in Firestore to use a list as a map for querying, so that's not an option.
What I initially did was have a Cloud Function go through and find all the active users maybe once every hour, and place them in Firebase Realtime Database in the structure:
activeUsers{
uid1: true
uid2: true
uid3: true
etc...
}
and every time I need to check which users are active, I get all fields under activeUsers (which is constrained to a maximum of 100,000 fields, approx. 3 to 5 MB).
Now I was going to use that as my final mechanism, but I just realised that Firebase Realtime Database charges for the amount of bandwidth used, not the number of reads. Therefore it could get very expensive doing this over and over whenever a user makes this request. And I cannot query every single result from the Realtime Database because, while it does not charge per read (I think), it would be very slow to carry out hundreds of thousands of queries.
Now I have decided to use Cloud Firestore as my final hope, since it charges primarily for the number of reads and writes as opposed to data downloaded and uploaded. I am going to use Cloud Functions again to check the active users every hour, and I'm going to try to figure out the best way to store that data within a few documents. I was thinking 10,000 fields per document with all the active users, then when a user needs to get the active users, they get all the documents (would be 10 if there are 100,000 total active users) and map those client side to filter the active users.
So I really have 2 questions. 1, If I do it this way, what is the best way to store that data in firestore, is it the way I suggested? And 2, is there an all around better way to be performing this check of active users against the list returned from the api? Have I got it all wrong?
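For reference, the layout described in the question (a handful of documents, each holding up to 10,000 uid fields, merged on the client) could look roughly like the sketch below. It is plain JavaScript with no Firestore calls. The helper names `chunkActiveUsers` and `filterActive` are made up for illustration, and each "document" is just an object of the form `{ uid: true }`.

```javascript
// Hypothetical helpers; nothing here talks to Firestore.

// Split the list of active uids into document-sized maps ({ uid: true, ... }).
function chunkActiveUsers(uids, perDoc = 10000) {
  const docs = [];
  for (let i = 0; i < uids.length; i += perDoc) {
    const fields = {};
    for (const uid of uids.slice(i, i + perDoc)) {
      fields[uid] = true;
    }
    docs.push(fields);
  }
  return docs;
}

// Client side: merge the downloaded documents into one Set,
// then keep only the API users that appear in it.
function filterActive(apiUserIds, docs) {
  const active = new Set();
  for (const fields of docs) {
    for (const uid of Object.keys(fields)) {
      active.add(uid);
    }
  }
  return apiUserIds.filter((uid) => active.has(uid));
}
```

With this shape, each check costs one read per document (10 reads for 100,000 active users at 10,000 per document), but every client still downloads the full set of uids.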
You could use firebase storage to store all the users in a text file, then download that text file every time?
Well this is three years old, but I'll answer here.
What you have done is not efficient and not a good approach. What I would do is as follows:
Make a separate collection for all active users,
and store each active user's unique field, such as the ID, there.
Then query that collection. Update that collection when needed.
I am new to Firestore and building an event planning app but I am unsure what the best way to structure the data is taking into account the speed of queries and Firestore costs based on reads etc. In both options I can think of, I have a users collection and an events collection
Option 1:
In the users collection, each user has an array of eventIds for events they are hosting and also events they are attending. Then I query the events collection for those eventIds of that user so I can list the appropriate events to the user
Option 2:
For each event in the events collection, there is a hostId and an array of attendeeIds. So I would query the events collection for events where hostId === user.id and where attendeeIds.includes(user.id).
I am trying to figure out which is best from a performance and a costs perspective, taking into account there could be thousands of events to iterate through. Is it better to search the events collection by eventId, as it will stop iterating when all events are found, or is that slow since it will be searching for one eventId at a time? Maybe there is a better way to do this that I haven't mentioned above. Would really appreciate the feedback.
In addition to @Dharmaraj's answer, please note that neither solution is better than the other in terms of performance. In Firestore, query performance depends on the number of documents you request (read) and not on the number of documents you are searching. It doesn't really matter if you search for 10 documents in a collection of 100 documents or in a collection that contains 100 million documents, the response time will always be the same.
From a billing perspective, yes, the first solution will imply an additional document to read, since you first need to actually read the user document. However, reading the array and getting all the corresponding events will also be very fast.
Please bear in mind, that in the NoSQL world, we are always structuring a database according to the queries that we intend to perform. So if a query returns the documents that you're interested in, and produces the fewest reads, then that's the solution you should go ahead with. Also remember, that you'll always have to pay a number of reads that is equal to the number of documents the query returns.
Regarding security, both solutions can be secured relatively easily. Now it's up to you to decide which one works better for your use case.
I would recommend going with option 2 because it might save you some reads:
You won't have to query the user's document in the first place and then run another query like where(documentId(), "in", [...userEvents]) or fetch each of them individually if you have many.
When trying to write security rules, you can directly check if an event belongs to the user trying to update the event by resource.data.hostId == request.auth.uid.
When using the first option, you'll have to query the user's document in security rules to check if this eventId is present in that events array (that may cost you another read). Check out the documentation for more information on billing.
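A rough sketch of the two option 2 queries, in case it helps. It assumes `db` is a Firestore instance exposing the namespaced API (`collection().where().get()`, as in the Admin SDK or the v8 web SDK) and uses the collection and field names from the question (`events`, `hostId`, `attendeeIds`). The function name `fetchUserEvents` is made up.

```javascript
// Assumption: `db` offers the namespaced Firestore API (collection/where/get).
async function fetchUserEvents(db, userId) {
  const events = db.collection('events');

  // One query for events the user hosts, one for events they attend.
  const [hosted, attending] = await Promise.all([
    events.where('hostId', '==', userId).get(),
    events.where('attendeeIds', 'array-contains', userId).get(),
  ]);

  // Merge by document id so an event the user both hosts and attends appears once.
  const byId = new Map();
  for (const snapshot of [hosted, attending]) {
    snapshot.forEach((doc) => {
      byId.set(doc.id, { id: doc.id, ...doc.data() });
    });
  }
  return Array.from(byId.values());
}
```

Each document returned by either query is billed as a read, so an event that matches both queries is read twice even though it is shown once.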
We are building a social media web app using firebase and use firestore to store users and their posts.
When a user likes a post, we save it in posts/{postID}/likedBy/{userID} and also update totalLikes in the post document.
Let's say our app has 1 million daily users, and they all are liking viral posts very frequently.
Now, firebase says that a document cannot handle more than one write per second. However, we've seen that we can update the document several times per second, but they still don't recommend it.
My question is, what is the best way to store total post likes in firestore, if there's any. Or, should we use some other services?
EDIT: Firestore's distributed counters are made for exactly this, as suggested by the answer below.
Also, I want to query only those posts which are not liked by a user.
The way I can query this is if our documents inside posts collection contains Map of all the users who liked it, and then run a query where the map doesn't contain current userID. This approach isn't good because it limits the number of likes a post can get as the document size in firestore cannot exceed 1mb.
Another way would be to save the liked posts in the user's document. However, with this we'd not only lose the ability to fetch just those posts which are not liked by the user, it would also limit the number of posts a user can like.
A third way would be to store the users who liked the post in a sub-collection, which also loses the query functionality. The same applies to storing the posts liked by a user in a sub-collection.
Now, either I've not enough knowledge of firestore(actually any other NoSQL database), or I'm thinking right but it's just that NoSQL isn't made for social media apps.
Let's say our app has 1 million daily users, and they all are liking viral posts very frequently.
Now, firebase says that a document cannot handle more than one write per second.
My question is, what is the best way to store total post likes in firestore, if there's any. Or, should we use some other services?
This is the exact scenario for which Firebase recommends to use some distributed counters.
With distributed counters, "each counter is a document with a subcollection of shards, and the value of the counter is the sum of the value of the shards."
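To make the idea concrete, here is a small in-memory model of a sharded counter. It is only an illustration of the technique, not Firestore code: each array slot stands in for one shard document, and the function names are made up. Picking the shard at random is an assumption about how writes get spread out.

```javascript
// In-memory model only: each array slot stands in for one shard document.
function createCounter(numShards) {
  return new Array(numShards).fill(0);
}

// A write touches a single, randomly chosen shard,
// so concurrent likes rarely hit the same document.
function increment(shards, random = Math.random) {
  const index = Math.floor(random() * shards.length);
  shards[index] += 1;
}

// Reading the counter means summing every shard.
function total(shards) {
  return shards.reduce((sum, value) => sum + value, 0);
}
```

The trade-off is on the read side: getting the total means reading every shard, so more shards buy write throughput at the price of more reads per count.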
"Write throughput increases linearly with the number of shards, so a distributed counter with 10 shards can handle 10x as many writes as a traditional counter." (traditional counter = counter in one document)
On Firestore I have a social app that stores each user as a document, and queries based on users within a certain distance.
If a user launched the app and had 1,000 users within 50 miles for example, would I be charged for 1000 reads for downloading all nearby profiles? That seems like it would be hyper expensive if I got charged that much every time a user queried nearby users. Is there a cheaper way to do this?
As far as I know, if your query returns 1 document, you'll be charged 1 read. If your query returns 1000 documents, you'll be charged 1000 reads.
I'm not sure what your app looks like, but I'd restructure the fetching process. For instance, I wouldn't fetch all 1000 users at once.
Instead, fetching a fresh batch of 10 or 20 nearby users whenever a person wants to see new users seems much better to me.
Hope this helps you.
Note: You aren't charged for documents in a collection that your query doesn't return.
Have a look at Managing large result sets, which helps you manage queries that return a large number of results.
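A minimal sketch of fetching in batches with a query cursor. It assumes `orderedQuery` is a Firestore query from the namespaced API that already has an `orderBy` applied (which field you order nearby users by is up to your data model and is not shown here). The function name `fetchPage` is made up.

```javascript
// Assumption: `orderedQuery` is a namespaced-API Firestore query with an orderBy already applied.
async function fetchPage(orderedQuery, pageSize, lastDoc = null) {
  let query = orderedQuery;
  if (lastDoc) {
    // Continue right after the last document of the previous page.
    query = query.startAfter(lastDoc);
  }
  const snapshot = await query.limit(pageSize).get();
  const docs = snapshot.docs;
  return {
    users: docs.map((doc) => ({ id: doc.id, ...doc.data() })),
    // Pass this back in to get the next page; null means nothing was returned.
    lastDoc: docs.length > 0 ? docs[docs.length - 1] : null,
  };
}
```

Each call is billed only for the documents on that page, so showing 20 profiles costs about 20 reads rather than 1000.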
You can use Realtime Database as an alternative. It seems cheaper than Firestore. No document read. 10 GB is free and it means 200 million chat messages.
I use the Blaze plan and I only pay for Firestore reads. I plan to migrate some tables to the old Realtime Database. I have 10,000+ users. I just show a calendar and dining menu to them from Firestore. I don't want to pay for such simple things.
Before creating a new app I wanna make sure I get the pricing model correct.
For example in a phonebook app, I have a collection called userList that has a list of users which are individual documents.
I have 50k users on my list, which means I have 50k documents in my collection.
If I were to get the userList collection it will read all 50k documents.
Firestore's free quota allows 50k document reads per day. Does that mean 50k document reads in total or 50k document reads per document?
As in the example of my phonebook app if it is 50k document reads in total I will run out of the free limit in just one get call.
If you actually have to pull an entire collection of 50k documents, the question you likely should be asking is how to properly structure a Firestore Database.
More than likely you need to filter these documents based on some criteria within them by using the query WHERE clause. Having each client device hold 50k documents locally sounds like poor database planning and possibly a security risk.
Each returned document from your query counts as 1 read. If there are no matches to your query, 1 read is charged. If there are 50k matches, there are 50k reads charged.
For example, you can retrieve the logged in user's document and be charged 1 read with something like:
db.collection('userList').where('uid', '==', clientUID).get()
Note: As of 10/2018 Firestore charges 6 cents (USD) per 100k reads after the first 50k per day.
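As a quick sanity check on the arithmetic, here is a tiny estimator using the figures from the note above (50k free reads per day, then 6 cents per 100k reads, as of 10/2018). The function name is made up and the defaults are only those quoted figures, so check current pricing before relying on it.

```javascript
// Defaults are the figures quoted in the note (as of 10/2018), not current pricing.
function estimateDailyReadCostUsd(readsPerDay, freeReadsPerDay = 50000, usdPer100kReads = 0.06) {
  // Only reads beyond the free daily quota are billed.
  const billableReads = Math.max(0, readsPerDay - freeReadsPerDay);
  return (billableReads / 100000) * usdPer100kReads;
}
```

So a single get of the whole 50k-document collection uses up the free daily quota, and every further full fetch that day would be billed at roughly 3 cents under those figures.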
The free quota is for your entire project. So you're allowed 50,000 document reads per day across the entire project.
Reading 50K user profile documents will indeed use that free quota in one go.
Reading large numbers of documents is in general something you should try to prevent when using NoSQL databases.
The client apps that access Firestore should only read data that they're going to immediately show to the user. And there's no way you'll fit 50K users on a screen.
So more likely you have a case where you're aggregating over the user collection. E.g. things like:
Count the number of users
Count the number of users named Frank
Calculate the average length of the user names
NoSQL databases are usually more limited in their query capabilities than traditional relational databases, because they focus on ensuring read-scalability. You'll frequently do extra work when something is written to the database, if in exchange you can get better performance when reading from the database.
For better performance you'll want to store these aggregation values in the database, and then update them whenever a user profile is written. So you'll have a "userCount", a document with "userCount for each unique username", and an "averageUsernameLength".
For an example of how to run such aggregation queries, see: https://firebase.google.com/docs/firestore/solutions/aggregation. For lower write volumes, you can also consider using Cloud Functions to update the counters.
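To illustrate the "extra work on write" idea, here is a rough in-memory sketch of keeping those three aggregates up to date as users are added. The aggregate shape and function names are made up, and in a real app this update would happen in a transaction or a Cloud Function rather than in plain objects.

```javascript
// Hypothetical aggregate shape: { userCount, totalNameLength, countsByName }.
function emptyAggregates() {
  return { userCount: 0, totalNameLength: 0, countsByName: new Map() };
}

// Called whenever a new user profile is written; returns the updated aggregates.
function addUser(aggregates, username) {
  const countsByName = new Map(aggregates.countsByName);
  countsByName.set(username, (countsByName.get(username) || 0) + 1);
  return {
    userCount: aggregates.userCount + 1,
    totalNameLength: aggregates.totalNameLength + username.length,
    countsByName,
  };
}

// The average is derived from two stored totals, so no user documents are read.
function averageUsernameLength(aggregates) {
  return aggregates.userCount === 0 ? 0 : aggregates.totalNameLength / aggregates.userCount;
}
```

Reading the aggregates then costs a read or two, no matter how many user documents exist.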
Don't fetch all users in one go. You can limit your query to return a limited number of users, and load more as the user scrolls. Since no one is going to scroll through 50k users, this saves a lot of cost. It is similar to how a RecyclerView saves memory.
I am developing an iOS app with Firebase Realtime Database. The app will potentially have billions of posts with a number of images and data that needs to be retrieved based on the people a specific user follows (something like Instagram).
I understand that the best practice in Firebase is to structure data as flat as possible, which would mean having a "Posts" node with potentially billions of entries, which I would then filter by a kind of 'posted_by' parameter. This raises two questions:
1) Will I be able to retrieve said posts with a query that returns posts by any of the users I follow? (By passing something like an array of the users I follow)
2) Will Firebase be effective enough to loop through potentially billions of posts to find the ones that match my criteria, or is there otherwise a better way to structure data so as to make the app as optimal as possible?
Thanks in advance for the answers.
Billions of entries are no problem.
You should check if Firebase is the most cost efficient solution if you have huge volume of data.
1) Firebase can do that, but you probably don't want the user to wait for all entries (when there are a lot for a single user), but instead request them "page" by "page" and only request more pages on demand when the user scrolls up/down.
2) If you ensure you have an index on the user id, then it doesn't have to go through each one individually. Searching by index is efficient.