Firestore listener mechanism efficiency - firebase

If I understand correctly, in the initialization phase of addSnapshotListener you get a list of all the documents (even if it is 500 trillion documents) from the QuerySnapshot if you call the getDocuments function.
Then, every time a document in the collection is modified or added, the QuerySnapshot gives you either all the documents that changed, via the getDocumentChanges function, or all the existing documents, via getDocuments.
That means both at the initialization stage and after every change, I always get a list of all the documents. Is that logical? Assuming I have 500 trillion documents under the same collection (just for the sake of exaggeration), will I get all of them on every change and on every initialization of the app?
Is that really the case?
Or is there some kind of lazy instantiation going on?
Because if so, whenever I want to query the whole collection, won't I always get the whole list up front no matter what?

The QuerySnapshot always contains all documents that match the query (or collection reference). Even when an update touches only a subset of the documents matched by the query, the QuerySnapshot still contains all of the documents; it is only in the communication between the SDK and the backend servers that Firestore synchronizes just the modified documents. If you only want to process the changes, you can process just the deltas between snapshots.
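For example, here is a minimal sketch using the JavaScript SDK (the collection name is made up): snapshot.docs would hand you the full result set every time, while docChanges() exposes only the per-snapshot deltas.

db.collection('cities').onSnapshot((snapshot) => {
  // The first snapshot lists every matching document as 'added';
  // later snapshots only contain the documents that actually changed.
  snapshot.docChanges().forEach((change) => {
    if (change.type === 'added') console.log('New:', change.doc.data());
    if (change.type === 'modified') console.log('Modified:', change.doc.data());
    if (change.type === 'removed') console.log('Removed:', change.doc.id);
  });
});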

Related

Firebase Firestore, Delete Collection with a Callable Cloud Function

If you look at https://firebase.google.com/docs/firestore/solutions/delete-collections you can see the following:
Consistency - the code above deletes documents one at a time. If you
query while there is an ongoing delete operation, your results may
reflect a partially complete state where only some targeted documents
are deleted. There is also no guarantee that the delete operations
will succeed or fail uniformly, so be prepared to handle cases of
partial deletion.
So how do I handle this correctly?
Does it mean "prevent users from accessing this collection while deletion is in progress"?
Or does it mean "if the deletion is interrupted partway through, call the function again from the point of failure until the collection is completely deleted"?
It's suggesting that you should check for failures, and retry until there are no documents remaining (or at least until you are satisfied with the result).
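As a rough sketch with the Node.js Admin SDK (the helper name and batch size are my own invention): delete in small batches and keep re-querying until nothing remains, so that a run that fails partway can simply be invoked again.

// Safe to re-run after a partial failure, because each pass
// re-queries for whatever documents are still left.
async function deleteCollection(db, collectionPath, batchSize = 100) {
  const collectionRef = db.collection(collectionPath);
  while (true) {
    const snapshot = await collectionRef.limit(batchSize).get();
    if (snapshot.empty) return; // nothing left: deletion is complete
    const batch = db.batch();
    snapshot.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit(); // if this throws, just call the function again
  }
}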

Cost of reads when matches in a query have been updated in Firestore

Folks,
Hope someone knows the answer to this Firebase costing question.
Imagine I am listening to a query stream. The query returns 10 Firestore documents. One of the matching documents gets updated, so the listener callback is triggered again with the 10 matching results. 9 documents in the result set have not changed. Have I just incurred another 10 reads from Firebase from a costing perspective?
Only the documents that change are read from the server after the first callback to the listener. Everything else comes from a cache in memory (and does not cost a read), as long as the listener is still attached to the query. You can find out which documents changed by looking at the QuerySnapshot: it carries the list of documents that were actually added, changed, or removed from the result set (docChanges() in the web SDK).
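To see this in action, here is a small sketch against the JavaScript SDK (the collection name is invented). After the update in the question, the snapshot still holds 10 results, but only one entry shows up in the deltas, and only that one read is billed.

db.collection('scores').limit(10).onSnapshot((snapshot) => {
  // First callback: 10 matches, 10 'added' changes (10 billed reads).
  // After one edit: still 10 matches, but only 1 'modified' change.
  console.log('total matches:', snapshot.size);
  console.log('changed since last snapshot:', snapshot.docChanges().length);
});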

Structuring a firestore database to filter by what is not in the array?

I am building a collection that will contain over a million documents. Each document will contain one token and a history table. When a process retrieves a token, it stores the process id in the history table inside the document so the token can never be used again by the same process. The tokens are reusable by different processes. I want each process to pull a document/token and never be able to pull that same document/token again.
My approach is to store a history table in each document with the processes that have used the token. That is why I need to query for what is not in the array.
Firestore does not have a condition where you can search for what is not in an array. How would I perform a query like the one below, where array-does-not-contain is a placeholder for searching an array in which 'process-001' is not present?
db.collection('tokens')
  .where('history', 'array-does-not-contain', 'process-001')
  .limit(1)
  .get();
Below is how I'm planning to structure my collection.
My actual problem:
I have multiple processes running and I only want each process to pull documents from Firebase that it has never seen before. The Firebase collection will be over a million documents and growing.
Firestore is not very well suited for queries that need to look for things that don't exist. The problem is that its indexes are only meant to tell you whether things do exist. The universe of strings that don't exist would be impossible to efficiently quantify for indexing.
The only way to make this happen is to know the names of all the processes ahead of time and create values for them in the index. You would do this with a map-type object, not an array:
- token: "1234"
- history: {
    "process-001": false,
    "process-002": false,
    "process-003": false
  }
This document can be queried to find out if "history.process-001" has a value of false, then updated to true when the process uses it. But again, without all the process names known ahead of time and populated in each document, the query is not possible.
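As an illustration, a sketch in the JavaScript SDK using the field names from the structure above (the helper function is hypothetical): dot notation lets you query and update a single key inside the map.

// Find one token that 'process-001' has not used yet, then mark it
// as used so the same process can never pull it again.
async function claimToken(db) {
  const snap = await db.collection('tokens')
    .where('history.process-001', '==', false)
    .limit(1)
    .get();
  if (snap.empty) return null; // no unused token left for this process
  await snap.docs[0].ref.update({ 'history.process-001': true });
  return snap.docs[0];
}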
See also:
Firestore get documents where value not in array?
How to query Cloud Firestore for non-existing keys of documents

StreamBuilder Controls and Firestore Pricing

I have a two-part question. After reading through the Firestore pricing page, I see that you are charged based on the number of documents you read and write, the operations you perform such as deletes, and some other factors. I am using a StreamBuilder that continually refreshes itself whenever the list is scrolled, whether there is new data in the database or not. Right now the builder is fetching from a collection with very little data, but that collection can grow to be much bigger. With that being said, my questions are:
Each time the StreamBuilder refreshes itself to show new data, is it actually fetching all the documents again from the Firestore collection, or is it only looking for changes and updating those? If it is fetching the documents again, does Firestore count this as downloading multiple documents each time it refreshes, or does it count them only once, with updates to any fetched document counted separately?
If it fetches all the documents over and over again every 2 seconds or even less, as in the current behavior, is there a way to limit this, say to every 30 seconds or to whenever a RefreshIndicator is used, so as to avoid multiple unnecessary reads? I tried using a StreamController, but the stream still refreshes every time the list is touched, or every second.
Well, I guess it depends a bit on your code. There are methods to listen to Firestore changes constantly.
However, if you use the most common queries, this should not be the case. Here is my reasoning, according to my understanding:
StreamBuilder: the builder function is triggered every time data hits the sink of a stream.
The sink is the input channel of a stream; streams immediately emit any data that is put into the sink.
Firestore: if you execute a Firestore "query", it reads document by document and returns each one as it is read. Once all documents are read, the connection is closed.
If you now assign the Firestore query as the stream for your builder (example below), the builder is triggered each time a document is read, and inside the builder you typically build a widget to display.
Once the Firestore query has read all documents, no new data is pushed into the sink, so the builder is no longer triggered. The query is then complete and no longer listens for changes, as the connection has been closed.
Therefore the documents are usually only read once during the lifetime of a StreamBuilder.
StreamBuilder<QuerySnapshot>(
  stream: Firestore.instance.collection('your collection').snapshots(),
  builder: (BuildContext context, AsyncSnapshot<QuerySnapshot> snapshot) {
    // Your own code to handle the data; the builder must return a widget.
    if (!snapshot.hasData) return CircularProgressIndicator();
    return Text('${snapshot.data.documents.length} documents');
  },
)
I recently built an app where I read tasks from Firestore and process the documents via a StreamBuilder. An easy way to test how often a document is read is to simply print the document to your console in the builder section.
What I observed is that documents are only read once, as long as the widget tree in which the StreamBuilder resides is not rebuilt.
So to answer your question:
My understanding is that if the StreamBuilder refreshes or is initialized again, it triggers the query again and re-reads the data. According to the Firestore documentation, each read of a document counts towards your limits and costs. So I would say yes, it counts for all documents included in your query.
I am not sure how you are constantly refreshing or re-initializing the StreamBuilder, so I can't give you a clear answer there. If you just use code similar to the above once during the build of the widget tree, then each document should be read only once.
Without some more details, I cannot provide more information.
Each time the StreamBuilder refreshes, it will query Firestore for the documents in the collection again. Firestore counts a read operation for each document retrieved from the server. However, as long as the listener stays attached, unchanged documents are served from the local cache rather than re-read from the server, so you are not billed again for documents that have not changed.
Firestore provides real-time updates through the use of listeners. When you listen to a Firestore collection using a StreamBuilder, Firestore automatically sends updates whenever there are changes to the documents in the collection. This means that an attached listener only fetches the documents that have changed, rather than the entire collection. However, if the listener is torn down and re-created (for example, because the widget tree rebuilds and constructs a new stream), the entire result set is read again.
Each such re-creation of the stream issues a new query to Firestore, and the initial result set counts as one read per document, so re-creating the stream frequently can lead to a large number of read operations. There is one mitigation: according to the Firestore pricing documentation, if a listener disconnects and reconnects within 30 minutes, you are not billed as if you had issued a brand-new query.
To limit the number of reads, you can use a caching mechanism to keep the data on the client side and avoid unnecessary network requests, and you can control how often new data is fetched by using a timer or a debounce mechanism so the stream is not re-created too frequently.

Can we not query collections inside transactions?

Looking at https://firebase.google.com/docs/reference/js/firebase.firestore.Transaction I see four methods: delete, set, get, update.
I was about to construct a lovely little collection query and pass it to .get, but I see the docs say that .get "Reads the document referenced by the provided DocumentReference."
It appears this means we cannot get a collection, or query a collection, with a Transaction object.
I could query those with the query's .get() method instead of the transaction's .get() method, but if the collection changes out from under me, the transaction will end up in an inconsistent state without retrying.
It seems I am hitting a wall here. Is my understanding correct? Can we not access collections inside a transaction in a consistent way?
Your understanding is correct. When using the web and mobile SDKs, you have to identify the individual documents that you would like to ensure will not change before your transaction is complete. If those documents come from a collection query ahead of time, fine. But think for a moment about how non-scalable it would be if you had to track every document in a (very large) collection in order to complete your transaction.
However, with the backend SDKs you can perform a query inside a transaction and effectively transact on all the documents returned by the query, up to the limit on the number of documents in a transaction (500).
You can run queries (not just fetch single documents) in a transaction's get() method, but only in server execution. So if you really need to do that (say, for maintaining the consistency of denormalized data), you can put that code in a Cloud Function and make use of server-side transactions.
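For instance, a sketch with the Node.js Admin SDK (the collection, field, and status values are invented): here transaction.get() accepts a whole Query rather than a single DocumentReference.

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function claimPendingOrders() {
  await db.runTransaction(async (transaction) => {
    // Server-side only: get() can take a Query, and the transaction
    // retries if any returned document changes before it commits.
    const snapshot = await transaction.get(
      db.collection('orders').where('status', '==', 'pending').limit(10)
    );
    snapshot.docs.forEach((doc) => {
      transaction.update(doc.ref, { status: 'processing' });
    });
  });
}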
