In the MV2 extension, I attach a Firestore onSnapshot listener in the persistent background page. As I understand it: 1. Firestore downloads all documents when the listener is first attached, and afterwards 2. it only downloads documents when they change. Since this listener persists for several hours, the total number of Firestore document reads (and hence the cost) is low.
But in the MV3 extension, the service worker (which houses the Firestore listener) is destroyed after five minutes. Therefore, the onSnapshot listener will be destroyed and re-attached several times in just a few hours. On every re-attachment, the listener could potentially re-download all of the user data. So, if the listener is destroyed and re-attached five times, an MV3 extension incurs five times as many document reads (and hence five times the cost) as an MV2 extension.
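The worst-case arithmetic above can be written out as a small TypeScript model. It is a simplification, not Firestore's billing engine: it assumes every fresh attach bills one read per matching document and every delivered change bills one read. `estimatedReads`, its fields, and the sample numbers are hypothetical.

```typescript
// Simplified, hypothetical model of the read counts described in the question.
// Assumption: each fresh attach bills every document in the result set once,
// and each document change delivered to an attached listener bills one read.

interface ListenerUsage {
  totalDocs: number;   // documents matched by the query
  changedDocs: number; // document changes delivered while listening
  attachments: number; // how many times the listener was attached from scratch
}

function estimatedReads(u: ListenerUsage): number {
  // Every fresh attachment re-downloads the full result set.
  const initialReads = u.totalDocs * u.attachments;
  // Changes are billed once each, regardless of how often we reattached.
  return initialReads + u.changedDocs;
}

// MV2: one persistent background page, so a single attachment.
const mv2 = estimatedReads({ totalDocs: 1000, changedDocs: 20, attachments: 1 });
// MV3 worst case: the service worker is torn down, so the listener is attached five times.
const mv3 = estimatedReads({ totalDocs: 1000, changedDocs: 20, attachments: 5 });

console.log(mv2, mv3); // 1020 5020
```

Under these assumptions the total is dominated by `totalDocs * attachments`, which is why the number of reattachments matters far more than the number of changes.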
I'd like to understand:
Does using IndexedDB persistence significantly reduce the document read count, even when the service worker is restarted?
If we are not using IDB persistence, how is billing done? For example, the billing docs state that:
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
If I re-attach the listener within 15-20 minutes, does it again incur document reads on all user data?
I tried writing a small example myself and monitoring the results in Cloud Console Monitoring to measure "Firestore Instance - Document Reads", but I was not able to get clear results from it.
Note: For the purpose of discussion, I will avoid the workaround for making service workers persistent and focus on the worst case, assuming that this workaround does not work.
Does using IndexedDB persistence significantly reduce the document read count, even when the service worker is restarted?
Yes. Upon a reconnect within the 30-minute interval, most documents can be read from the disk cache and won't have to be read from the server.
In case we are not using IDB persistence, how is the billing done? ... If I re-attach the listener within 15-20 minutes, does it again incur document reads on all user data?
If there is no data in the disk cache and no existing listener on the data, the documents will have to be read from the server, and you will be charged for each document read there.
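The two answers above, combined with the 30-minute rule from the billing docs, can be summarized as a small decision function. This is a hedged TypeScript sketch of the described behavior, not Firestore's internals: `readsOnReattach` and its fields are hypothetical, and it assumes that a resumable reconnect bills only the documents that changed.

```typescript
// Hypothetical model of the answer above plus the 30-minute rule from the
// billing docs. Not Firestore's implementation; names are illustrative only.
const RESUME_WINDOW_MINUTES = 30;

interface ReattachInput {
  hasDiskCache: boolean;       // did IndexedDB persistence survive the worker restart?
  minutesDisconnected: number; // time since the previous listener went away
  totalDocs: number;           // documents matched by the query
  changedDocs: number;         // documents changed on the server in the meantime
}

// Estimated reads billed when the listener is attached again.
function readsOnReattach(input: ReattachInput): number {
  const canResume =
    input.hasDiskCache && input.minutesDisconnected <= RESUME_WINDOW_MINUTES;
  // Assumption: a resumable reconnect downloads (and bills) only the changed
  // documents; the rest are served from the disk cache.
  if (canResume) return input.changedDocs;
  // No usable cache, or disconnected too long: billed like a brand-new query.
  return input.totalDocs;
}

const restartAfter10Min = { minutesDisconnected: 10, totalDocs: 1000, changedDocs: 3 };
console.log(readsOnReattach({ ...restartAfter10Min, hasDiskCache: true }));  // 3
console.log(readsOnReattach({ ...restartAfter10Min, hasDiskCache: false })); // 1000
```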
Related
There are several questions related to Firestore cost, but I couldn't find one that clarifies my question.
I have two cases, and I'd like to know the estimated cost, i.e. the document read count, in each case.
Let's assume that I have a one-page app which shows 10 users. Opening the app attaches a listener to the userList collection and listens to 10 documents from that collection; closing the app detaches the listener from Firestore.
Case 1:
No document is updated; I open and close the app, then open it again within 30 minutes. What would the document read count be? 10, 20, or something else?
Case 2:
I open and close the app, one document is updated, and then I open the app again within 30 minutes. What would the document read count be? 11, 21, or something else?
It depends on what you mean by "close the app".
If you have a listener that gets cut off due to loss of network connectivity, but the app process is still running, the listener will automatically reattach when the network comes back. If the network comes back within 30 minutes, you are not charged for the whole query again, only for the documents that changed. If the network comes back after 30 minutes, you are charged for a new query.
If you have a listener that gets cut off because the app process was terminated by the OS, and later gets reattached when the app is launched again, you will be charged for another query.
If the app is simply backgrounded but not terminated, and the listener is still active in the background, nothing changes: you keep paying for document updates while the listener remains attached, until the app process eventually loses network access and is terminated completely.
If your code removes a listener and adds it again, you will be charged for a new query.
You will have to figure out which of these situations apply. The SDK doesn't track the user's intent. It just tracks the behavior of the network, and is affected by the state of the process, as managed by the OS. The user's action of "closing an app" could involve any number of details which are not immediately obvious.
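Since the answer depends on which situation applies, this TypeScript sketch encodes the four cases and applies them to the question's two scenarios. It is a hypothetical model, not SDK code. The scenario names and `readsAfterClose` are invented, and the model assumes that a reconnect within 30 minutes bills only the documents that changed. Under the "listener removed" reading of "close the app", both cases come out at 20 reads.

```typescript
// Hypothetical encoding of the situations listed in the answer above.
type CloseScenario =
  | "network-lost"       // process keeps running; the SDK reattaches the listener itself
  | "backgrounded"       // listener stays attached while the app is in the background
  | "process-terminated" // the OS killed the app; the listener is added again on relaunch
  | "listener-removed";  // app code unsubscribed and later added the listener again

const RESUME_WINDOW_MINUTES = 30;

// Reads billed between "closing" the app and having results on screen again.
function readsAfterClose(
  scenario: CloseScenario,
  minutesAway: number,
  docsInQuery: number,
  changedWhileAway: number,
): number {
  if (scenario === "backgrounded") {
    // The listener never went away, so each change was billed as it arrived.
    return changedWhileAway;
  }
  if (scenario === "network-lost" && minutesAway <= RESUME_WINDOW_MINUTES) {
    // Assumption: the existing query resumes and only changed documents are read.
    return changedWhileAway;
  }
  // Terminated process, removed listener, or a network outage over 30 minutes:
  // billed as a brand-new query over every matching document.
  return docsInQuery;
}

const firstOpen = 10; // initial query over the 10 user documents
console.log(firstOpen + readsAfterClose("listener-removed", 15, 10, 0)); // 20 (Case 1)
console.log(firstOpen + readsAfterClose("listener-removed", 15, 10, 1)); // 20 (Case 2)
console.log(firstOpen + readsAfterClose("network-lost", 15, 10, 1));     // 11
```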
I didn't find any solution to avoid reading data from the server when using get(). However, I might have found one, but it's not clear to me whether it will work. I found that when using the real-time feature, the client is continuously updated as the data changes. So, as I understand it, if nothing changes on the server, no reads are charged, right?
However, I read that the listener should be removed, and I understand why. What I cannot understand is: if I close the app (the listener is removed) and open it again the next day, am I charged again for the data that was cached the day before?
I'm really confused because I also read that:
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
Aren't removing the listener and going offline the exact same thing?
I found that when using the real-time feature, the client will continuously update as the data changes. So per my understanding, if nothing is changed on the server, no reads charged, right?
Every query that reaches the server will incur reads for documents returned by the query. Whenever a document is returned from the server, it costs a read. If you have a listener on a set of query results where only one document changes while the listener is active, it costs one read, because only one document must come from the server, and the rest are already in memory. They stay in memory until the listener is removed.
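The accounting described in this answer can be sketched as a hypothetical in-memory class: the first snapshot bills one read per document, each later change bills one read, and removing the listener drops the results from memory. `ListenerSession` and its members are illustrative names, not Firestore SDK APIs.

```typescript
// Hypothetical in-memory model of one snapshot listener, following the answer above.
class ListenerSession {
  private readonly cache = new Map<string, number>(); // docId -> version held in memory
  private reads = 0;
  private active = true;

  constructor(initialDocs: string[]) {
    for (const id of initialDocs) {
      this.cache.set(id, 1);
      this.reads += 1; // every document in the first snapshot comes from the server
    }
  }

  // A document changed on the server while the listener may or may not be attached.
  onServerChange(docId: string): void {
    if (!this.active) return; // nobody is listening, so nothing is delivered or billed
    this.cache.set(docId, (this.cache.get(docId) ?? 0) + 1);
    this.reads += 1; // only the changed document is sent; the rest stay in memory
  }

  // Removing the listener means we are done with the results: memory is released.
  remove(): void {
    this.active = false;
    this.cache.clear();
  }

  get billedReads(): number {
    return this.reads;
  }

  get cachedDocs(): number {
    return this.cache.size;
  }
}

const session = new ListenerSession(["a", "b", "c", "d", "e"]);
session.onServerChange("c");
console.log(session.billedReads); // 6
session.remove();
session.onServerChange("a");
console.log(session.billedReads, session.cachedDocs); // 6 0
```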
if I close the app (listener is removed) and I open the app the second day, am I charged again for the data that was cached a day before?
Yes. Whenever the results come from the server, you will be billed for those reads. The cache is not used to satisfy query results when using the server as a source.
Aren't removing the listener and going offline the exact same thing?
They are not the same thing. Removing a listener says that you're completely done with the results of the query. Going offline temporarily and coming back online just resumes the existing query.
I have a large collection of one million documents in Firebase that I treat as a stack, where the first element gets read and removed. My main problem is that over a thousand connections are trying to access the collection, and some of them receive the same document. To prevent duplicate results, I've resorted to using a mutex, as described in the post below:
Cloud Firestore document locking
I use a mutex to lock each document before removing it from the collection. I use transactions to ensure that the mutex owner is not overwritten by other connections and to check that the document has not already been removed.
The problem with this solution is that, as we scale up, more connections fight over each mutex lock. Each connection spends a long time retrying until it successfully locks a document. Avoiding long retries would allow faster response times and fewer reads.
In summary, a connection tries to retrieve a document. It retrieves the document but fails to create a lock because another incoming connection has just locked it. So it looks for another document and fails again. It keeps retrying until it beats another connection to locking a document.
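To see why each losing connection has to retry, here is a hypothetical in-memory model of the lock step. It stands in for a Firestore transaction with an explicit version check: client-side Firestore transactions fail and retry when a document they read changed before commit, which is what the `version` comparison imitates. `QueueStore`, `tryLock`, and `removeIfOwner` are illustrative names only.

```typescript
// Hypothetical in-memory stand-in for the lock-then-remove pattern described above.
interface QueueDoc {
  id: string;
  owner: string | null; // mutex owner, null when unlocked
  version: number;      // bumped on every write
}

class QueueStore {
  readonly docs = new Map<string, QueueDoc>();

  add(id: string): void {
    this.docs.set(id, { id, owner: null, version: 0 });
  }

  // Snapshot of a document, like the read at the start of a transaction.
  read(id: string): QueueDoc | undefined {
    const d = this.docs.get(id);
    return d ? { ...d } : undefined;
  }

  // Commit succeeds only if nobody wrote the document since `seen` was read.
  tryLock(seen: QueueDoc, owner: string): boolean {
    const current = this.docs.get(seen.id);
    if (!current) return false;                         // already removed
    if (current.version !== seen.version) return false; // someone else wrote first
    if (current.owner !== null) return false;           // already locked
    this.docs.set(seen.id, { ...current, owner, version: current.version + 1 });
    return true;
  }

  // Only the lock owner may remove the document from the queue.
  removeIfOwner(id: string, owner: string): boolean {
    const current = this.docs.get(id);
    if (!current || current.owner !== owner) return false;
    this.docs.delete(id);
    return true;
  }
}

const store = new QueueStore();
store.add("job-1");
const seenByA = store.read("job-1")!;
const seenByB = store.read("job-1")!;
console.log(store.tryLock(seenByA, "client-A"));        // true
console.log(store.tryLock(seenByB, "client-B"));        // false: B must retry elsewhere
console.log(store.removeIfOwner("job-1", "client-A")); // true
```

Every connection that loses the race has spent a read and must start over, so the wasted work grows with the number of connections aiming at the same document.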
Is it possible to increase throughput and keep read costs low as I scale up?
Yeah, I doubt such a mutex is going to help your throughput.
How important is it that documents are processed in the exact order that they are in the queue? If it is not crucial, you could consider having each client request the first N documents, and then picking one at random to lock-and-process. That would improve your throughput up to N times.
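A hypothetical TypeScript simulation of that suggestion: in one round, every client picks a document index among the first `n` and only the first pick of each index succeeds, so the number of distinct picks is the number of successful locks. With `n = 1` (everyone goes for the head of the stack) only one client wins per round; larger `n` lets up to `n` clients win. The helpers `lcg` and `winnersPerRound` are illustrative, and the seeded generator is only there to make runs reproducible.

```typescript
// Tiny seeded linear congruential generator; fine for a toy simulation only.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296; // in [0, 1)
  };
}

// One round of contention: each of `clients` workers reads the first `n` queue
// entries and tries to lock one at random. Each entry can be locked by only one
// worker, so the count of distinct picks is the number of winners this round.
function winnersPerRound(clients: number, n: number, rand: () => number): number {
  const locked = new Set<number>();
  for (let c = 0; c < clients; c++) {
    const pick = Math.floor(rand() * n); // n === 1 reproduces "always take the head"
    locked.add(pick); // first worker to pick an index wins it; later ones must retry
  }
  return locked.size;
}

const rand = lcg(42);
console.log(winnersPerRound(1000, 1, rand));  // 1: everyone fights over the head
console.log(winnersPerRound(1000, 50, rand)); // at most 50 winners in this round
```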
Firestore offers 50,000 document read operations per day as part of its free quota.
However, in my application, the client fetches a collection containing price data that is created over time. Starting from a specific timestamp, the client can read up to 1,000 documents, each representing one timestamp with its price information.
This means that if the client refreshes the web browser 50 times, my quota is exhausted immediately. And that is just for a single client.
That is what happened, and I got this error:
Error: 8 RESOURCE_EXHAUSTED: Quota exceeded
The price data is static: once written, it is not supposed to change.
Is there a solution for this issue, or should I consider a database other than Firestore?
The error message indicates that you've exhausted the quota that is available. On the free plan the quota is 50,000 document reads per day, so you've read that number of documents already.
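The exhaustion follows directly from the numbers in the question. A small TypeScript sketch of that arithmetic, assuming every page load reads the same number of documents and ignoring any local cache (both simplifications; `loadsPerDay` and `loadsPerClient` are hypothetical helpers):

```typescript
// The daily free quota is shared by everything that reads from the project.
const FREE_DAILY_READS = 50000; // free-plan daily document reads, as stated above

// How many full page loads fit in the quota if each load reads `docsPerLoad` documents.
function loadsPerDay(docsPerLoad: number, quota: number = FREE_DAILY_READS): number {
  return Math.floor(quota / docsPerLoad);
}

// The same budget split evenly across `clients` users of the project.
function loadsPerClient(docsPerLoad: number, clients: number): number {
  return Math.floor(loadsPerDay(docsPerLoad) / clients);
}

console.log(loadsPerDay(1000));        // 50: the single-client case in the question
console.log(loadsPerClient(1000, 10)); // 5 each if ten users share the project
console.log(loadsPerDay(10));          // 5000 if each load read only 10 documents
```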
Possible solutions:
Upgrade to a paid plan, which has a much higher quota.
Wait until tomorrow to continue, since the quota resets every day.
Try in another free project, since each project has its own quota.
If you have a dataset that will never (or rarely) change, why not write it as a JSON object in the app itself? You could put it in a separate .js file and import it to build your table.
Alternatively, is there a reason your users would ever navigate through all 1,000 records? You can simulate a full table while limiting each call to, say, 10 documents, and then paginate to fetch more results if needed.
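A minimal TypeScript sketch of this suggestion, assuming the static price history can be exported once at build time. `PricePoint`, `PRICES`, `pricesSince`, and the sample values are all hypothetical; the point is that the lookup runs in memory and involves no Firestore reads.

```typescript
// Hypothetical shape of the bundled price data; in a real app this would live in
// its own module (for example a generated prices.ts) and be imported where needed.
interface PricePoint {
  timestamp: number; // Unix epoch milliseconds
  price: number;
}

// Placeholder values standing in for the real, never-changing data set.
const PRICES: readonly PricePoint[] = [
  { timestamp: 1700000000000, price: 101.5 },
  { timestamp: 1700000060000, price: 101.7 },
  { timestamp: 1700000120000, price: 101.6 },
];

// Everything from `from` onward, read from memory instead of from Firestore.
function pricesSince(from: number, prices: readonly PricePoint[] = PRICES): PricePoint[] {
  return prices.filter((p) => p.timestamp >= from);
}

console.log(pricesSince(1700000060000).length); // 2
```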
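In the Firestore web SDK, this kind of paging is usually built from `orderBy`, `startAfter`, and `limit` query constraints. The TypeScript sketch below does not call the SDK; it is a hypothetical in-memory model of cursor-based paging over the price documents, estimating reads as one per returned document plus Firestore's minimum of one read for a query that returns nothing. `fetchPage` and the placeholder data are illustrative.

```typescript
interface PriceDoc {
  timestamp: number;
  price: number;
}

interface Page {
  docs: PriceDoc[];
  cursor: number | null; // timestamp of the last document, or null if the page is empty
  reads: number;         // estimated billed reads for this page query
}

// One page: ordered by timestamp, starting after the cursor, limited to pageSize.
function fetchPage(all: readonly PriceDoc[], pageSize: number, after: number | null): Page {
  const docs = all
    .filter((d) => after === null || d.timestamp > after)
    .sort((x, y) => x.timestamp - y.timestamp)
    .slice(0, pageSize);
  const last = docs.length > 0 ? docs[docs.length - 1] : undefined;
  // One read per returned document, with a minimum of one read per query.
  return { docs, cursor: last ? last.timestamp : null, reads: Math.max(1, docs.length) };
}

// 1,000 placeholder documents; the user views two pages of 10.
const allPrices: PriceDoc[] = Array.from({ length: 1000 }, (_, i) => ({ timestamp: i, price: 100 + i }));
const page1 = fetchPage(allPrices, 10, null);
const page2 = fetchPage(allPrices, 10, page1.cursor);
console.log(page1.reads + page2.reads); // 20, versus 1000 for loading everything
```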
If the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
Does this still apply if persistence is enabled?
Situation 1: App is offline for over 30 minutes. Persistence is enabled and reads data from cache. Does reading documents from cache count as read operations?
Situation 2: App is online but no added/modified/deleted operations occur. Persistence is enabled and all data exists in cache. Does opening my app after 30 minutes cause read operations if no new data has been added/modified/deleted?
In both cases, if some read operation is satisfied only by the local cache, it is not billed.
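The rule in this answer can be sketched as a hypothetical function that bills only what the server returns. In the Firestore web SDK, the source can be forced with `getDocsFromCache` or `getDocsFromServer`; the function below only models the billing outcome and is not those APIs. It also includes Firestore's minimum of one read for a server query that returns no documents. `readQuery` and its parameters are illustrative.

```typescript
// Hypothetical model: results served only from the local cache are not billed;
// results served by the backend bill one read per returned document.
type Source = "cache" | "server";

interface ReadResult {
  docIds: string[];
  billedReads: number;
}

function readQuery(matchingIds: string[], cachedIds: Set<string>, source: Source): ReadResult {
  if (source === "cache") {
    // Satisfied only by the local (persisted) cache: nothing is billed.
    const docIds = matchingIds.filter((id) => cachedIds.has(id));
    return { docIds, billedReads: 0 };
  }
  // Served by the server: every returned document is billed, with a
  // minimum of one read per query even when nothing matches.
  return { docIds: [...matchingIds], billedReads: Math.max(1, matchingIds.length) };
}

const cached = new Set(["u1", "u2", "u3"]);
console.log(readQuery(["u1", "u2", "u3"], cached, "cache").billedReads);  // 0
console.log(readQuery(["u1", "u2", "u3"], cached, "server").billedReads); // 3
```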
The part of the documentation you quoted about listeners applies specifically to the total results of a query that could return multiple documents over time. Note that a query listener can generate updates for new or changed documents indefinitely. But if your query listener is disconnected for more than 30 minutes, you are billed for the entire query again; it does not pick up where it may have left off with partial or in-progress results.