How to read data only from cache in Firestore? - firebase

I read some answers here on Stack Overflow where it is said that every time we get documents from Firestore, the SDK always tries to get the online version of the documents, even if no documents were changed. This results in more billed reads, which in my opinion is unnecessary, since nothing has changed.
What I want to achieve
Let's say I have a collection of 5 documents. When the user opens the app for the first time, I want to pay 5 reads. However, when the user opens the app for the second time, I only want to pay reads for the documents that have changed. If nothing has changed, I don't want to pay any reads; I just want to read the data from the cache. Is this possible?

The key phrase in your question is:
If nothing is changed, I don't want to pay any reads,
To determine whether anything has changed in the documents in your cache, the Firestore server needs to read those documents, and hence you will pay for those reads.
The only way to work around this is to take control of filtering the changed documents yourself.
For example, if you include a lastModified field in each of your documents, you can use it to retrieve only the new or modified documents from the server, and then run your other read operations against the local cache by specifying source options.
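The following is a minimal, self-contained model of the lastModified approach. It is not the Firestore SDK. Two plain maps stand in for the server and the local cache, so the billing logic can be seen in isolation. In the Android SDK, the real counterparts would be a query such as `whereGreaterThan("lastModified", lastSync).get()` for the server step and `get(Source.CACHE)` for the cache step. The class and field names are hypothetical. The model also applies the minimum charge of one read per query from the Firestore pricing docs. Because of that rule, even a sync that finds nothing changed costs one read, not zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the lastModified approach; this is NOT the Firestore SDK.
// "server" stands in for the Firestore backend and "cache" for the client's local cache.
class IncrementalSync {
    // docId -> lastModified timestamp (e.g. millis) on the simulated server
    final Map<String, Long> server = new HashMap<>();
    // docId -> lastModified timestamp of the copy held in the simulated local cache
    final Map<String, Long> cache = new HashMap<>();
    // Highest lastModified value seen so far; MIN_VALUE means "never synced"
    long lastSync = Long.MIN_VALUE;
    long billedReads = 0;

    // Models a server query like whereGreaterThan("lastModified", lastSync).get():
    // one read per returned document, with a minimum of one read per query.
    int syncChanged() {
        List<String> changed = new ArrayList<>();
        long newest = lastSync;
        for (Map.Entry<String, Long> e : server.entrySet()) {
            if (e.getValue() > lastSync) {
                changed.add(e.getKey());
                newest = Math.max(newest, e.getValue());
            }
        }
        billedReads += Math.max(1, changed.size());
        for (String id : changed) {
            cache.put(id, server.get(id)); // refresh the local copy
        }
        lastSync = newest;
        return changed.size();
    }

    // Models get(Source.CACHE): answered locally, so no reads are billed.
    int readAllFromCache() {
        return cache.size();
    }
}
```

With 5 documents, the first sync bills 5 reads and an unchanged second sync bills 1. After one document changes, the next sync bills 1 more. Deleted documents are not detected by this scheme and would need extra handling, for example a hypothetical "deleted" flag field.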

Related

Firebase Firestore Read Costs - Clarification

I am using Firestore for an e-commerce app. I have a collection of products; each product is a document with a "title" field and a "search_keywords" field. The "search_keywords" field stores an array. For example, if title="apple", then the "search_keywords" field stores the following array: ["a","ap","app","appl","apple"]. When the user starts typing "apple" in the search box, I want to show all products whose "search_keywords" contain "a". Then, when they type the "p", I want to show all products whose search keywords contain "ap", and so on. Here is the snippet of code that gets called each time an additional letter is typed:
firebaseFireStore.collection("Produce").whereArrayContains("search_keywords", toSearch).get()
In every case, the documents returned by each successive call (after an additional letter is typed) would be a subset of what the previous call returned. It would just be a smaller list of documents that were already read by the previous query. My question is this: since the documents retrieved by a successive query are a subset of those retrieved by a prior query, would I be charged reads based on how many documents each successive query returns? Or would Firestore read them from the cache, since the successive result set is a subset of a prior result set?
This question has been on my mind for a while, and every time I search for it, I can't find a clear answer. Based on my research, the following two Stack Overflow posts involve similar questions, and the quotes below are relevant. However, they seem to contradict each other. Alex Mamo says "it will always read the online version of the documents...[when online]", while Doug Stevenson says "if the local persistence is enabled on your client (it is by default) and the documents haven't been updated in the server...[it will get them from the cache]". I would appreciate any clarification on this if anyone knows the answer. Thanks.
"If the OP has offline persistence enabled, which is by default in Cloud Firestore, then he will be able to read the cache only while offline. When the OP has internet connectivity, it will always read the online version of the documents." –
Alex Mamo (https://stackoverflow.com/a/69320068/14556386)
"According to this answer by Doug Stevenson, the reads are only charged when performed upon the server, not your local cache. That is if the local persistence is enabled on your client (it is by default) and the documents haven't been updated in the server."
(https://stackoverflow.com/a/61381656/14556386)
EDIT: In addition, for each product document retrieved by the Firestore search, I download its corresponding image file from Firebase Storage. Would I be charged for downloading that file on successive attempts? Or would it recognize that I had previously downloaded that image and fetch it from the cache automatically?
First of all, storing ["a", "ap", "app", "appl", "apple"] in an array and performing a whereArrayContains() query doesn't sound like a feasible idea. Why? Imagine you have a really big online shop with 100k products, of which 5k start with "a". Are you willing to pay 5k reads every time a user types "a"? That's a very costly feature.
Most likely, you should only return the corresponding documents once the user has typed, for example, two or even three characters. That will reduce costs enormously. Alternatively, you might consider using the solution I have explained in the following article:
How to filter Firestore data cheaper?
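A minimal sketch of the "wait for two or three characters" suggestion, assuming the keyword array is built on the client or in a backend job. The helper names are hypothetical and not part of any Firebase API. It builds the "search_keywords" prefixes starting at a minimum length, and it only lets a query run once the typed text reaches that length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names are illustrative, not from any Firebase API.
class SearchKeywords {
    // Builds the prefix array for a "search_keywords" field, starting at minLength
    // characters so that one-letter (very broad) queries can't match anything.
    static List<String> prefixes(String title, int minLength) {
        String normalized = title.trim().toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (int end = Math.max(1, minLength); end <= normalized.length(); end++) {
            out.add(normalized.substring(0, end));
        }
        return out;
    }

    // Client-side guard: only run the whereArrayContains() query once enough is typed.
    static boolean shouldQuery(String typed, int minLength) {
        return typed.trim().length() >= minLength;
    }
}
```

With a minimum length of 3, "apple" yields ["app", "appl", "apple"], and typing "a" or "ap" triggers no query and no reads. A title shorter than the minimum gets an empty array, so such products would need separate handling.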
Let's go forward.
For example, in every case, the documents that would be returned on each successive call where an additional letter was typed would be a subset of what was returned in the previous call, it would just be a smaller list of documents.
Yes, that's correct.
My question is since the documents retrieved on a successive query are a subset of those retrieved in a prior query, would I be charged reads based on how many documents each successive query returns?
Yes. You'll always be charged a number of reads equal to the number of documents returned by your query. It doesn't matter whether the query was previously performed or not. Every time you perform a new query, you'll be charged a number of reads equal to the number of documents you get.
For example, let's assume you perform this query:
.whereArrayContains("search_keywords", "a")
And you get 100 documents, and right after that you perform:
.whereArrayContains("search_keywords", "ap")
And you get only 30 documents, you'll have to pay 130 reads, not just 100. So it doesn't matter that the documents returned by the second query are a subset of the documents returned by the first query.
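The arithmetic above can be written as a tiny model. It assumes, as this answer states, that every get() is served by the server. Each call is billed one read per returned document. The model also applies the minimum of one read per query from the Firestore pricing docs. The class name is hypothetical.

```java
// Hypothetical billing model for one-time get() calls, assuming each call is served
// by the server: one read per returned document, minimum one read per query.
class GetBilling {
    static long readsFor(int returnedDocs) {
        return Math.max(1, returnedDocs);
    }

    // Total reads for a sequence of queries, e.g. one per typed letter.
    static long total(int... resultSizes) {
        long sum = 0;
        for (int n : resultSizes) {
            sum += readsFor(n);
        }
        return sum;
    }
}
```

Here `total(100, 30)` gives 130. A third keystroke that matches nothing would still add 1, for 131.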
Or would Firestore have them in the cache and read them from there since the successive result set is a subset of a prior result set.
No, it won't. It will read those documents from the cache only if the user loses internet connectivity. Otherwise, it will always read the online versions of the documents that exist on the Firebase servers. The cached version of the documents is used only when the user is offline. I have also written an article on this topic called:
How to drastically reduce the number of reads when no documents are changed in Firestore?
In Doug's answer:
Am I charged with read operations everytime the location is changed?
He clearly says:
You are charged for the number of documents read on the server every time you call get().
So every time you call get(), you pay one read for each document that is returned.
He also says:
If local persistence is enabled in your client (it is by default), then the documents may come from the cache if the documents are also not changed on the server.
That applies when you are listening for real-time updates. According to the docs:
When you listen to the results of a query, you are charged for a read each time a document in the result set is added or updated. You are also charged for a read when a document is removed from the result set because the document has changed.
And I would add that if nothing has changed, you don't have to pay anything. However, according to the same docs:
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
So while the listener stays active, documents that haven't changed are served from the cache and aren't billed again. Bear in mind that a get() operation is different from listening for real-time updates.
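To make the contrast between get() and a listener concrete, here is a simplified cost model. It assumes a non-empty result set and, for get(), that the local cache is not used, as this answer states. The 30-minute rule comes from the docs quoted above. The method names are hypothetical.

```java
// Simplified, hypothetical cost model contrasting repeated get() calls with one listener.
// Assumes a non-empty result set and that get() is always served by the server.
class ReadCostComparison {
    static final long RECONNECT_THRESHOLD_MINUTES = 30;

    // Each get() bills every returned document again.
    static long repeatedGets(int resultSize, int calls) {
        return (long) resultSize * calls;
    }

    // A listener bills the initial result set once, then one read per document
    // added, updated, or removed from the result set; after a disconnect longer
    // than 30 minutes it is billed like a brand-new query.
    static long listener(int resultSize, int changedDocs, long disconnectedMinutes) {
        long reads = (long) resultSize + changedDocs;
        if (disconnectedMinutes > RECONNECT_THRESHOLD_MINUTES) {
            reads += resultSize;
        }
        return reads;
    }
}
```

For a 52-document result, three get() calls cost 156 reads. One listener that sees two changes costs 54. A 45-minute disconnect adds another full 52.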
For each product document retrieved by the Firestore search, I download its corresponding image file from Firebase Storage. Would I be charged for downloading that file on successive attempts? Or would it recognize that I had previously downloaded that image and fetch it from the cache automatically?
You'll be charged every time you download the image, unless you use a library that caches the images for you. For Android, there is a library called Glide:
Glide is a fast and efficient open-source media management and image loading framework for Android that wraps media decoding, memory and disk caching, and resource pooling into a simple and easy-to-use interface.
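The core idea behind such a library can be sketched as an in-memory cache keyed by URL. This is not Glide's API. Glide also handles disk caching, decoding, and eviction. The downloader function stands in for a hypothetical Firebase Storage download, and the counter shows that repeated requests for the same URL don't hit the network again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of URL-keyed image caching; NOT Glide's API.
// The cache is unbounded; a real one would evict (e.g. LRU) and persist to disk.
class ImageCache {
    private final Map<String, byte[]> memory = new HashMap<>();
    private final Function<String, byte[]> downloader; // stand-in for a Storage download
    int networkDownloads = 0;

    ImageCache(Function<String, byte[]> downloader) {
        this.downloader = downloader;
    }

    byte[] load(String url) {
        byte[] cached = memory.get(url);
        if (cached != null) {
            return cached; // served from memory: no new download
        }
        networkDownloads++;
        byte[] bytes = downloader.apply(url);
        memory.put(url, bytes);
        return bytes;
    }
}
```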

How to check if a document exists with a given id in firestore, without costing money?

I have a scenario where I have the user's phone number, and I want to check whether the user is already registered in my app. To do this, I have a collection in Firestore in which each user's contact number is stored as a document ID. Whenever the user opens the app and enters their mobile number, the app sends a request for that specific document using
final snapShot = await Firestore.instance.collection('rCust').document(_phoneNumberController.text).get();
My database structure is as follows
Because of this, my Firestore billing is spiking really fast. With just 4 or 5 queries, my number of reads jumped from 75 to 293. It would be great if anyone could guide me on how to do this efficiently.
If you want to know if a document definitely exists on the server, it will always cost you a document read. There is currently no way to avoid this cost. It's the cost of accessing the massively scalable index that allows you to find 1 document among potentially billions.
You could try to query your local cache first, which doesn't cost anything. You do this by passing GetOptions(source: Source.cache) to get(). If you are willing to assume that presence in the local cache always means the document exists on the server, that will save you one document read. However, if the document has been deleted on the server, the local cache query will give an incorrect result. You will still have to query the server to know for sure.
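A minimal model of the cache-first check and its trade-off. It is not the FlutterFire or Android API. Two sets stand in for the server and the local cache, and the phone numbers in the usage are made up. A cache hit costs nothing, but it can be stale. A miss falls back to the server and always costs one read.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a cache-first existence check; NOT the FlutterFire or Android API.
class ExistenceCheck {
    final Set<String> serverDocs = new HashSet<>(); // document IDs on the simulated server
    final Set<String> localCache = new HashSet<>(); // IDs already present in the local cache
    long billedReads = 0;

    // A cache hit is trusted without a read (it may be stale if the document was deleted);
    // a miss falls back to the server, which always costs one document read.
    boolean existsCacheFirst(String docId) {
        if (localCache.contains(docId)) {
            return true;
        }
        billedReads++;
        boolean exists = serverDocs.contains(docId);
        if (exists) {
            localCache.add(docId);
        }
        return exists;
    }
}
```

The second check for the same number is free. After the document is deleted on the server, the cache still reports it as existing, which is exactly the incorrect result described above.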
To check whether a document exists, you can use the .exists property of the DocumentSnapshot. In your case:
if (snapShot.exists) {
  // the user is already registered
}
From that query, you are selecting a single document, not a collection.
Because we can't see the rest of your code, I am assuming that your Firestore usage is not actually spiking because of your query, but because you are viewing your documents in the Firebase web console. Viewing data in the console also incurs billing, and it lists documents 300 at a time.
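With the Android (Java) SDK, you can check it like this, inside the OnCompleteListener of get():
DocumentSnapshot snapShot = task.getResult();
if (snapShot.exists()) {
    // ...
}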
If you don't want to overwrite the whole document with set() each time you send the phone number, but only update that number, use update("fieldToUpdate", value) on the document instead of set(value).

Does accessing the Firebase Firestore database dashboard count as a read operation?

I am now in the development phase of the project. Currently, the project only uses one Android app as the frontend. The queries from Android use limit and pagination, but the total number of documents read is way above the expected number.
I am trying to figure out why the number of document reads is so big, even though there is only one user (me). I am scared the project will not be feasible if the number of reads is this big. That's why I need to understand Firestore's read behaviour.
When I access the Firestore dashboard and select a collection, as in the image below, it shows a blue loading indicator and then all the available documents. Currently, the event collection has 52 documents. I have accessed all the documents in the event collection like this several times for debugging purposes.
So whenever I tap that event collection, I assume it counts as 52 read operations. Do the read operations come not only from the Android device but also from the dashboard? Is that why the number of reads is so big? Am I right?
If that's the case...
Say I have 100000 documents in the event collection. Whenever I tap that collection, will I perform 100000 read operations as well? Is there a way to limit these dashboard reads?
So the read operations come not only from the Android device but also from the dashboard? That's why the number of reads is so big. Am I right?
Yes, you are right.
Say I have 100000 documents in the event collection. Whenever I tap that collection, will I perform 100000 read operations as well?
No, you'll be charged only for the documents on the first page. The console has a pagination mechanism implemented specifically for that, so you won't be charged for all the documents in your collection.
Is there a way to limit these dashboard reads?
The limitation already exists, but be aware that the more you scroll down, the more documents are loaded, which means more read operations are charged.
One thing to bear in mind about the Firebase console is that it reflects changes to visible documents in real time, and each one of those changes also costs you a read. So, if you leave the console open while documents are changing, you will accumulate reads over time, even if you aren't actively using the console. This is a common source of unexpected reads.
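As a rough back-of-the-envelope model of console reads: loaded pages cost one read per document, capped at the collection size. Each real-time change to a visible document while the console stays open costs one more read. The page size is an assumed parameter here, not the console's documented value, and the class is hypothetical.

```java
// Hypothetical arithmetic model of console reads. pageSize is an assumed parameter,
// not the console's documented page size.
class ConsoleReads {
    static long reads(long collectionSize, int pagesLoaded, int pageSize, long changesWhileOpen) {
        long loaded = Math.min(collectionSize, (long) pagesLoaded * pageSize);
        return loaded + changesWhileOpen; // each real-time change to a visible doc costs a read
    }
}
```

Under an assumed page size of 50, opening a 100000-document collection once costs 50 reads, not 100000. Scrolling through three pages while 10 visible documents change costs 160.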

Firestore pricing

There are several questions about this topic, but I can't find one that answers my question. As described here, there is no clear explanation of whether the minimum charge applies only to query.get() or to real-time listeners as well. Quoted:
There is a minimum charge of one document read for each query that you perform, even if the query returns no results.
I am asking this question, even though the answer may seem obvious to some, because of the phrase *for each query that you perform* in that statement. It could mean a one-time trigger, e.g. with the get() method.
Scenario: 10 users are listening to changes in a collection with queries, i.e. query.addSnapshotListener(). A change then occurs in one document that matches the query filter of only two users. Are the other eight also charged one read?
Database used: Firestore
In this scenario I would say no, the other eight would not be counted as reads because the documents they are listening to have not been updated or have not been added/removed from that collection based on their filters (query params). The reads aren't based on changes to the collection but rather changes to the stream of documents you are specifically listening to. Because that 1 document change was not part of the documents that the other 8 users were listening to then there is no new read for them. However, if that 1 document change led to that document now matching the query filters of those other 8, then yes there would be 8 new reads for those users. Hope that makes sense.
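The reasoning above can be modeled with one filter per listener. The predicates are hypothetical stand-ins for each user's query filter, not a Firestore API. When a single document changes, a listener pays one read only if that document is in its result set before or after the change. Every other listener pays nothing.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: each Predicate stands in for one user's query filter.
class ListenerFanout {
    // Reads billed across all listeners when ONE document changes from `before` to `after`
    // (null before = created, null after = deleted). A listener pays one read if the
    // document is in its result set before or after the change; others pay nothing.
    static <T> long readsForChange(List<Predicate<T>> filters, T before, T after) {
        long reads = 0;
        for (Predicate<T> filter : filters) {
            boolean wasIn = before != null && filter.test(before);
            boolean isIn = after != null && filter.test(after);
            if (wasIn || isIn) {
                reads++;
            }
        }
        return reads;
    }
}
```

With 2 users filtering on "Paris" and 8 on "Rome", an update to a Paris document costs 2 reads in total. Moving that document from Paris to Rome costs 10: 2 for the removals and 8 for the additions.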
It's also worth noting that having offline persistence enabled via the SDK, together with Firestore's caching, maximizes the efficiency of limiting reads. So does using a singleton Observable that multiple instances in your app subscribe to, as opposed to opening multiple streams of the same query throughout your app. This doesn't really apply to this question directly, but since it's in the same vein, it's worth noting.
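A minimal sketch of the shared-listener idea, assuming plain Java callbacks rather than a specific reactive library. One underlying query listener is opened for the first in-app subscriber and detached when the last one leaves. Later subscribers get the latest snapshot replayed without opening another stream. The counters stand in for `addSnapshotListener()` and `ListenerRegistration.remove()`. The class itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: many in-app subscribers share ONE underlying query listener.
// openCount/closeCount stand in for addSnapshotListener() and ListenerRegistration.remove().
class SharedQuery<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    private T latest; // last snapshot, replayed to late subscribers
    int openCount = 0;
    int closeCount = 0;

    void subscribe(Consumer<T> subscriber) {
        if (subscribers.isEmpty()) {
            openCount++; // first subscriber: open the single real listener
        }
        subscribers.add(subscriber);
        if (latest != null) {
            subscriber.accept(latest); // late subscriber: no new query needed
        }
    }

    void unsubscribe(Consumer<T> subscriber) {
        if (subscribers.remove(subscriber) && subscribers.isEmpty()) {
            closeCount++; // last subscriber gone: detach the real listener
            latest = null;
        }
    }

    // Called by the single real listener whenever a new snapshot arrives.
    void onSnapshot(T snapshot) {
        latest = snapshot;
        for (Consumer<T> s : new ArrayList<>(subscribers)) {
            s.accept(snapshot);
        }
    }
}
```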

Complicated data structuring in firebase/firestore

I need an optimal way to store a lot of individual fields in firestore. Here is the problem:
I get JSON data from an API. It contains a list of users. I need to tell whether those users are active, i.e. have been online in the past n days.
I cannot query each user in the list from the API against Firestore, because there could be hundreds of thousands of users in that list. That would mean hundreds of thousands of queries and reads, which is far too expensive.
As far as I know, there is no way in Firestore to use a list as a map for querying, so that's not an option.
What I initially did was have a Cloud Function find all the active users, maybe once every hour, and place them in the Firebase Realtime Database with the structure:
activeUsers {
  uid1: true
  uid2: true
  uid3: true
  etc...
}
Every time I need to check which users are active, I get all the fields under activeUsers (which is capped at a maximum of 100,000 fields, approx. 3 to 5 MB).
I was going to use that as my final mechanism, but I just realised that the Realtime Database charges for the amount of bandwidth used, not the number of reads. Therefore, it could get very expensive to do this over and over whenever a user makes this request. And I cannot query every single result from the Realtime Database: while it does not charge per read (I think), it would be very slow to carry out hundreds of thousands of queries.
Now I have decided to use Cloud Firestore as my final hope, since it charges primarily for the number of reads and writes rather than for data downloaded and uploaded. I am going to use Cloud Functions again to check the active users every hour, and I'm going to try to figure out the best way to store that data within a few documents. I was thinking of 10,000 fields per document containing all the active users. Then, when a user needs the active users, the app gets all the documents (10 if there are 100,000 active users in total) and maps them client side to filter the active users.
So I really have two questions. First, if I do it this way, what is the best way to store that data in Firestore? Is it the way I suggested? Second, is there an all-around better way to check the active users against the list returned from the API? Have I got it all wrong?
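The sharding scheme described in the question can be sketched as follows, assuming each shard map would be written as one Firestore document of `uid: true` fields. Reading all shards then costs one read per shard. The class and method names are hypothetical. Keep in mind that a Firestore document is limited to 1 MiB, so the number of IDs per document has to fit within that limit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the sharding scheme described above; names are illustrative.
class ActiveUserShards {
    // Splits active user IDs into maps of at most fieldsPerDoc entries (uid -> true);
    // each map would be one document, so reading them all costs shards.size() reads.
    static List<Map<String, Boolean>> shard(List<String> activeUids, int fieldsPerDoc) {
        if (fieldsPerDoc <= 0) {
            throw new IllegalArgumentException("fieldsPerDoc must be positive");
        }
        List<Map<String, Boolean>> docs = new ArrayList<>();
        Map<String, Boolean> current = null;
        for (String uid : activeUids) {
            if (current == null || current.size() == fieldsPerDoc) {
                current = new LinkedHashMap<>();
                docs.add(current);
            }
            current.put(uid, true);
        }
        return docs;
    }

    // Client side: keep only the API users that appear in any shard.
    static List<String> filterActive(List<String> apiUsers, List<Map<String, Boolean>> shards) {
        Set<String> active = new HashSet<>();
        for (Map<String, Boolean> doc : shards) {
            active.addAll(doc.keySet());
        }
        List<String> result = new ArrayList<>();
        for (String user : apiUsers) {
            if (active.contains(user)) {
                result.add(user);
            }
        }
        return result;
    }
}
```

With 10,000 fields per document, 25,000 active users produce 3 shards (3 reads), and 100,000 produce 10.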
You could use Firebase Storage to store all the users in a text file, and then download that text file every time.
Well, this is three years old, but I'll answer here.
What you have done is not efficient and not a good approach. What I would do is as follows:
Make a separate collection for all active users, and store each active user's unique field, such as the ID, there.
Then query that collection, and update it when needed.
