I'm working on a product that displays the results of running races. Races could have thousands of participants. So, in the days after a medium-sized event, there might be 3000 non-authenticated users wanting to browse 3000 results.
Although not every visitor will view all the results, the maximum damage at 3000 * 3000 would be 9,000,000 reads which at $.06 (Google cloud pricing) would cost $540,000 (Update: I'm a dummy, I missed the "per 100,000 documents" part, so this would only be $5.40).
Obviously, I wouldn't deliver all 3000 results for each visit - there would be paging and limits. Though, there's something inherently scary about the possibility of those costs.
Questions:
Is firebase simply the wrong technology for this type of product?
Is firebase not really intended for non-authenticated apps? Obviously DDOS becomes a concern for public access and there's no real protection in FB for this.
Every post I've read on these topics assumes developers are building apps for authenticated users.
9,000,000 reads which at $.06 (Google cloud pricing) would cost $540,000
The Firestore pricing of $0.06 is per 100,000 document reads, so 9 million document reads cost 90 * $0.06 = $5.40.
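The arithmetic can be checked with a short sketch. The rate is the $0.06 per 100,000 reads quoted in the question; the function and variable names are made up for illustration, and actual pricing varies by region and over time.

```python
# Assumed rate: USD per 100,000 document reads, as quoted in the question.
READ_PRICE_PER_100K = 0.06


def read_cost_usd(document_reads: int, price_per_100k: float = READ_PRICE_PER_100K) -> float:
    """Return the cost in USD of a given number of document reads."""
    return document_reads / 100_000 * price_per_100k


visitors = 3000
results_per_visitor = 3000
worst_case_reads = visitors * results_per_visitor  # 9,000,000 reads

print(f"{worst_case_reads:,} reads cost ${read_cost_usd(worst_case_reads):.2f}")
# prints: 9,000,000 reads cost $5.40
```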
Aside from that: you should model your data in a way that ensures you read the data that the user actually sees. For example, if all users will read the entirety of all 3,000 documents, consider using a data bundle to distribute that to them.
Realistically though, it is more likely that each user will read just a subset of the documents, and probably not all 3,000 of them. So consider whether you can combine the part that they'll read into a more cost-efficient structure. For example, if these were news articles, you could store the headline and intro paragraph of the first 100 articles in a single document, and just read that document (let's call it the frontpage) into each client when it starts.
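A minimal sketch of building such a frontpage document, using plain Python dictionaries instead of the Firestore SDK. The field names (`id`, `headline`, `intro`, `body`) and the `build_frontpage` helper are assumptions for illustration; a real implementation would also need to keep the combined document under Firestore's document size limit.

```python
from typing import TypedDict


class Article(TypedDict):
    id: str
    headline: str
    intro: str
    body: str


def build_frontpage(articles: list[Article], limit: int = 100) -> dict:
    """Combine the headline and intro of the first `limit` articles into one document."""
    return {
        "items": [
            {"id": a["id"], "headline": a["headline"], "intro": a["intro"]}
            for a in articles[:limit]
        ]
    }


# Hypothetical sample data: 250 articles, each normally stored as its own document.
articles: list[Article] = [
    {"id": f"a{i}", "headline": f"Headline {i}", "intro": f"Intro {i}", "body": "..."}
    for i in range(250)
]

frontpage = build_frontpage(articles)
# A client showing the front page now reads 1 document instead of 100.
print(len(frontpage["items"]))  # 100
```

The full article bodies stay in their own documents and are only read when a user opens an article.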
There are many more ways to model the data, depending on the use-cases of your app. To learn more on how to think about such data modeling, I recommend reading NoSQL data modeling and watching the excellent Get to know Cloud Firestore video series.
In my application there will be users (tens/hundreds). Each user will have many documents of the same type(typeA). I will need to read all these documents for a current user. I plan to use the following option:
root collection: typeACollection
|
nested collections for users: user1Collection, user2Collection, user3Collection ....
|
all documents for a specific user
An alternative to this solution is to create a separate root collection for each user and store documents of this type in it. But I do not like this solution - there will be a "not clear" structure.
user1typeACollection, user2typeACollection, user3typeACollection ....
In your opinion, which of the options is preferable (performance/price): the first or the second?
There is no singular best structure here, it all depends on the use-cases of your app.
The performance of a read operation in Firestore purely depends on the amount of data retrieved, and not on the size of the database. So it makes no difference if you read 20 user documents from a collection of 100 documents in total, or if there are 100 million documents in there - the performance will be the same.
What does make a marginal difference is the number of API calls you need to make. So loading 20 user documents with 20 calls will be slower than loading them with 1 call. But if you use a single collection group query to load the documents from multiple collections of the same name, that's the same performance again, as you're loading 20 documents with a single API call.
The cost is also going to be the same, as you pay for the number of documents read and the bandwidth consumed by those documents, which is the same in these scenarios.
I highly recommend watching the Getting to know Cloud Firestore video series to learn more about data modeling considerations and pricing when using Firestore.
I understand that Firestore charges based on read / write operations.
But I notice that Firestore reads from the server on every app launch, which will cause a big read count if many users open the app frequently.
Q1 Can I limit reads from the server to the user's first login, so that after that each app launch only reads the documents that were updated?
For example, take a chat app group:
100 users
100 messages
100 app launches / user / day
Will that become 1,000,000 reads per day?
That is ridiculously high.
Q2 Reads are counted per document, regardless of whether the document is in a root collection or a subcollection, right?
For example, if I read from a root collection that contains 10 subcollections, each of them having 10 documents, this will result in 100 reads, am I right?
Thanks.
That’s correct, Cloud Firestore cares less about the amount of downloaded data and more about the number of performed operations.
As Cloud Firestore’s pricing depends on the number of reads, writes, and deletes that you perform, it means that if you had 100 users communicating within one chat room, each of the users would get an update once someone sends a message in that chat, therefore, increasing the number of read operations.
Since the number of read operations would be very much affected by the number of people in the same chatroom, Cloud Firestore suits best (price-wise) for a person-to-person chat app.
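To put rough numbers on this, here is a small sketch comparing two read patterns for the group from the question (100 users, 100 messages, 100 launches per user per day). It is a simplified model, not a billing calculator: the incremental case assumes the client keeps already-fetched messages locally and only queries for messages newer than the last one it has seen, so each user reads each message once. Both function names are hypothetical.

```python
def reads_full_reload(users: int, messages: int, launches_per_user: int) -> int:
    """Every app launch re-reads every message from the server."""
    return users * messages * launches_per_user


def reads_incremental(users: int, messages: int) -> int:
    """Each user reads each message only once (assumed local cache + 'newer than' query)."""
    return users * messages


print(reads_full_reload(100, 100, 100))  # 1000000
print(reads_incremental(100, 100))       # 10000
```

Even in the incremental case the read count still grows with the number of people in the room, which is the point made above.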
However, you could structure your app to have more chat rooms in order to decrease the volume of reads. Here you can see how to store different chat rooms, while the following link will guide you to the best practices on how to optimize your Cloud Firestore realtime updates.
Please keep in mind that Cloud Firestore itself does not have any rate limiting by default. However, Google Cloud Platform has configurable billing alerts that apply to your entire project.
You can also limit the billing to $25/month by using the Flame plan, and if there is anything unclear in your bill, you can always contact Firebase support for help.
Regarding your second question, a read occurs any time a client gets data from a document. Remember, only the documents that are retrieved are counted - Cloud Firestore does searching through indexes, not the documents themselves.
By using subcollections, you can still retrieve data from a single document, which will count only as 1 read, or you can use a collection group query that will retrieve all the documents within the subcollection, counting into multiple reads depending on the amount of documents (in the example you put, it would be 10x10 = 100).
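The counting rule can be illustrated with a toy model in which subcollections are just lists of documents. The structure and names below are made up for the example; the only thing being modeled is that each returned document counts as one read.

```python
# Hypothetical layout: 10 subcollections with 10 documents each.
subcollections = {
    f"sub{s}": [{"id": f"doc{s}_{d}"} for d in range(10)]
    for s in range(10)
}


def reads_for_single_document() -> int:
    """Fetching one document by its path is one read."""
    return 1


def reads_for_collection_group(subs: dict[str, list[dict]]) -> int:
    """A query across all subcollections is charged per document returned."""
    return sum(len(docs) for docs in subs.values())


print(reads_for_single_document())                 # 1
print(reads_for_collection_group(subcollections))  # 100
```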
I'm working on some posting forum projects and trying to figure out the ideal Firestore database structure.
I read that documents have a max size of 1 MB, but what are the pros and cons of maxing out the storage space of each document by having multiple posts stored in a document rather than using a single document for each post?
I think it would be cheaper. Assuming that the app would make use of all the data in a document, the bandwidth costs would be the same but rather than multiple reads, I would be charged for only one document. Does this make sense?
Would it also be faster?
You can likely store many posts in a single document, and depending on your application, there may be good reasons for doing so. Just keep a few things in mind:
Firestore always reads complete documents. So if you store 100 posts in a single 1MB document, to only display 10 of those posts, you may have reduced the read operations by 10x, but you've increased the bandwidth consumption by 10x. And your mobile users will likely also pay for that bandwidth.
Implementing your own sharding strategy is not always hard, but it's seldom related to application functionality.
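The trade-off in the first point can be sketched with the numbers from that example. The sizes are assumptions: a packed document of about 1 MB holding 100 equally sized posts, of which the screen shows 10.

```python
DOC_SIZE_BYTES = 1_000_000  # assumed size of one packed document (about 1 MB)
POSTS_PER_DOC = 100
POSTS_SHOWN = 10

# Assumes all posts are the same size: 10,000 bytes each.
post_size = DOC_SIZE_BYTES // POSTS_PER_DOC

# One document per post: read only the 10 posts that are displayed.
separate = {"reads": POSTS_SHOWN, "bytes": POSTS_SHOWN * post_size}
# All posts packed into one document: one read, but the whole document is downloaded.
packed = {"reads": 1, "bytes": DOC_SIZE_BYTES}

print(separate["reads"] / packed["reads"])  # 10.0 -> 10x fewer reads when packed
print(packed["bytes"] / separate["bytes"])  # 10.0 -> 10x more bandwidth when packed
```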
My guidelines when modeling data in any NoSQL database are:
model application screens in your database
I tend to model the data in my database after the screens that I have in my application. So if you typically show a list of headlines of recent articles when a user starts the app, I might actually create a document that contains just the headlines of recent articles. That way the app only has to read a single document with just the headlines, instead of having to read each individual post. This reduces not only the number of documents the app needs to read, but also the bandwidth it consumes.
don't be afraid to duplicate data
This goes hand-in-hand with the previous guideline, and is very normal across all NoSQL databases, but goes against the core of what many of us have learned from relational databases. It is sometimes also referred to as denormalizing, as it counters the database normalization of relational database models.
Continuing the example from before: you'll probably have a separate document for each post, just to make sure that each post has its own single point of definition. But you'll store parts of that post in many other places, such as in the document-of-recent-headlines that we had before. This means that we'll have to duplicate the data for each new post into that document, and possibly multiple other places. This process is known as fan-out, and there are some common strategies for updating this denormalized data.
I find that this duplication leads to no concerns, as long as it is clear what the main point of definition for each entity is. So in our example: if there ever is a difference between the headline of a post in the post-document itself, and the document-of-recent-headlines, I know that I should update the document-of-recent-headlines, since the post-document itself is my point-of-definition for the post.
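A rough sketch of this fan-out and repair idea, using an in-memory dictionary in place of the database. The document paths (`posts/...`, `screens/recent_headlines`) and both helper functions are hypothetical; in a real app the two writes would typically go into a batched write or a Cloud Function.

```python
# In-memory stand-in for the database: document path -> document data.
store: dict[str, dict] = {}


def publish_post(post_id: str, headline: str, body: str, max_headlines: int = 100) -> None:
    """Write the post (point of definition) and fan its headline out to the screen document."""
    store[f"posts/{post_id}"] = {"headline": headline, "body": body}
    recent = store.setdefault("screens/recent_headlines", {"items": []})
    recent["items"].insert(0, {"id": post_id, "headline": headline})
    del recent["items"][max_headlines:]  # keep only the newest headlines


def repair_headlines() -> None:
    """If the copies ever differ, overwrite the duplicate from the post document."""
    for item in store["screens/recent_headlines"]["items"]:
        item["headline"] = store[f"posts/{item['id']}"]["headline"]


publish_post("p1", "First post", "...")
publish_post("p2", "Second post", "...")
print([item["id"] for item in store["screens/recent_headlines"]["items"]])  # ['p2', 'p1']
```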
The result of all this is that I often see my database as part actual data storage, part prerendered fragments of application screens. As long as the points of definition are clear, that works quite well and allows me to define data models that scale efficiently both for users of the applications that consume the data and for the cost to operate them.
To learn more about NoSQL data modeling:
NoSQL data modeling
Getting to know Cloud Firestore, which contains many more examples of these prerendered application screens.
I need an optimal way to store a lot of individual fields in firestore. Here is the problem:
I get JSON data from some API. It contains a list of users. I need to tell if those users are active, i.e. have been online in the past n days.
I cannot query each user in the list from the api against firestore, because there could be hundreds of thousands of users in that list, and therefore hundreds of thousands of queries and reads, which is way too expensive.
There is no way to use a list as a map for querying as far as I know in firestore, so that's not an option.
What I initially did was have a cloud function go through and find all the active users maybe once every hour, and place them in firebase realtime database in the structure:
activeUsers{
uid1: true
uid2: true
uid3: true
etc...
}
and every time I need to check which users are active, I get all fields under activeUsers (which is constrained to a maximum of 100,000 fields, approx 3~5 MB).
Now I was going to use that as my final mechanism, but I just realised that Firebase Realtime Database charges for the amount of bandwidth used, not the number of reads. Therefore it could get very expensive doing this over and over whenever a user makes this request. And I cannot query every single result from the Realtime Database because, while it does not charge per read (I think), it would be very slow to carry out hundreds of thousands of queries.
Now I have decided to use Cloud Firestore as my final hope, since it charges primarily for the number of reads and writes as opposed to data downloaded and uploaded. I am going to use Cloud Functions again to check the active users every hour, and I'm going to try to figure out the best way to store that data within a few documents. I was thinking of 10,000 fields per document with all the active users; then, when a user needs to get the active users, they get all the documents (that would be 10 if there are 100,000 total active users) and map those client side to filter the active users.
So I really have 2 questions. 1, If I do it this way, what is the best way to store that data in firestore, is it the way I suggested? And 2, is there an all around better way to be performing this check of active users against the list returned from the api? Have I got it all wrong?
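The scheme described in the question can be sketched as follows, with plain dictionaries standing in for Firestore documents. The function names are made up, and whether 10,000 fields actually fit within a single document depends on the UID length and Firestore's document limits, which this sketch does not check.

```python
def shard_active_users(uids: list[str], fields_per_doc: int = 10_000) -> list[dict[str, bool]]:
    """Split active UIDs into documents of at most `fields_per_doc` fields each."""
    return [
        {uid: True for uid in uids[start:start + fields_per_doc]}
        for start in range(0, len(uids), fields_per_doc)
    ]


def filter_active(api_uids: list[str], shards: list[dict[str, bool]]) -> list[str]:
    """Client side: merge all shard documents into one set and filter the API list."""
    active: set[str] = set().union(*shards)
    return [uid for uid in api_uids if uid in active]


active_uids = [f"uid{i}" for i in range(100_000)]
shards = shard_active_users(active_uids)
print(len(shards))  # 10 documents, so 10 reads per check

print(filter_active(["uid5", "uid100000", "uid99999"], shards))  # ['uid5', 'uid99999']
```

Note that each check still downloads every shard, so the bandwidth per check is about the same as in the Realtime Database version; only the number of billed reads is small.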
You could use firebase storage to store all the users in a text file, then download that text file every time?
Well this is three years old, but I'll answer here.
What you have done is not an efficient approach. What I would do is as follows:
Make a separate collection for all active users,
and store a unique field for each active user, such as their ID, there.
Then query that collection, and update it when needed.
On Firestore I have a social app that stores each user as a document, and queries based on users within a certain distance.
If a user launched the app and had 1,000 users within 50 miles for example, would I be charged for 1000 reads for downloading all nearby profiles? That seems like it would be hyper expensive if I got charged that much every time a user queried nearby users. Is there a cheaper way to do this?
As far as I know, if your query returns 1 document, you'll be charged 1 read. If your query returns 1000 documents, you'll be charged 1000 reads.
I'm not sure what your app looks like, but I'd restructure the fetching process. For instance, I wouldn't fetch all 1,000 users at once.
Instead, fetching a fresh batch of 10 or 20 nearby users whenever a person wants to see new users seems much better to me.
Hope this helps you.
Note: Be aware that your queries won't get any extra charges for having supplementary documents in a collection that are unread.
Have a look at Managing large result sets, which helps you manage queries that return a large number of results.
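The batching idea can be sketched without the SDK as a limit plus cursor over a list that is already sorted by distance. `fetch_page` and the sample data are hypothetical stand-ins for a paginated query; the point is only that reads are counted per document returned, so two pages of 20 cost 40 reads rather than 1,000.

```python
from typing import Optional


def fetch_page(sorted_users: list[dict], page_size: int, start_after: Optional[str] = None) -> list[dict]:
    """Mimic a limit + cursor query over users already sorted by distance."""
    start = 0
    if start_after is not None:
        # Continue right after the last user of the previous page.
        start = next(i for i, u in enumerate(sorted_users) if u["id"] == start_after) + 1
    return sorted_users[start:start + page_size]


# Hypothetical data: 1,000 nearby users, sorted by distance.
nearby = [{"id": f"u{i}", "miles": i * 0.05} for i in range(1000)]

reads = 0
page = fetch_page(nearby, 20)
reads += len(page)  # first screen of results
page = fetch_page(nearby, 20, start_after=page[-1]["id"])
reads += len(page)  # user asked to see more

print(page[0]["id"])  # u20
print(reads)          # 40
```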
You can use Realtime Database as an alternative. It seems cheaper than Firestore. No document read. 10 GB is free and it means 200 million chat messages.
I use the Blaze plan and I only pay for Firestore reads. I plan to migrate some tables to the old Realtime Database. I have 10,000+ users. I just show a calendar and dining menu to them from Firestore. I don't want to pay for such simple things.