I have a subcollection with documents. If any of them are added or removed, I need to trigger a calculation to derive an overall count (amongst other things) and store that in the parent document.
In order to listen for the document changes, I have a Firestore onWrite background function. From this function, I would like to trigger the calculation via Pub/Sub. However, the system will sometimes update many subcollection documents at once. If I delete 100 documents, I do not want the calculation to be triggered 100 times; that would be a real waste of resources.
So I'm wondering, is there already some sort of mechanism in place that would batch these triggers or the pubsub topic publishing, or do I need to do something specific to make this happen?
If there are other ways to better solve this problem I'm open to suggestions of course. I could possibly even introduce a Redis store if that helps.
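For reference, a minimal sketch of the setup described above (Node.js, firebase-functions v1; the topic name 'recalculate-parent' and the collection paths are assumptions, not part of the question):

```js
const functions = require('firebase-functions');
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// Fires once per changed subcollection document, so deleting 100 documents
// currently publishes 100 messages.
exports.onItemWrite = functions.firestore
  .document('parents/{parentId}/items/{itemId}')
  .onWrite((change, context) =>
    pubsub.topic('recalculate-parent').publishMessage({
      json: { parentId: context.params.parentId },
    }));
```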
I am using a collection in Firebase Firestore to log some activities but I don't want this log collection to grow forever. Is there a way to set a limit to the number of documents in a collection or a size limit for the whole collection or get a notification if it passes a limit?
OR is there a way to automatically delete old documents in a collection just by settings and not writing some cron job or scheduled function?
Alternatively, what options are there to create a rotational logging system for client activities in Firebase?
I don't want this log collection to grow forever.
Why not? There are no downsides. In Firestore, query performance depends on the number of documents you request, not on the number of documents in the collection you search. So it doesn't really matter whether you fetch 10 documents from a collection of 100 documents or from a collection of 100 million documents; the response time will be the same. As you can see, the number of documents within a collection is irrelevant.
Is there a way to set a limit to the number of documents in a collection or a size limit for the whole collection or get a notification if it passes a limit?
There is no built-in mechanism for that. However, you can build such a mechanism yourself in a very simple way: keep a counter document whose numeric value you increment or decrement each time a document is added to or deleted from the collection. Once you hit the limit, you can restrict the addition of new documents to that collection.
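A minimal sketch of that counter, assuming the Node.js Admin SDK, a logs collection, and a bookkeeping document at counters/logs (all of these names are placeholders):

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

const counterRef = () => admin.firestore().doc('counters/logs');

// Keep a running count of documents in the 'logs' collection.
exports.onLogCreated = functions.firestore
  .document('logs/{logId}')
  .onCreate(() =>
    counterRef().set(
      { count: admin.firestore.FieldValue.increment(1) },
      { merge: true }
    ));

exports.onLogDeleted = functions.firestore
  .document('logs/{logId}')
  .onDelete(() =>
    counterRef().set(
      { count: admin.firestore.FieldValue.increment(-1) },
      { merge: true }
    ));
```

Security rules (or your own write path) can then consult this counter before allowing new log documents to be created.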
OR is there a way to automatically delete old documents in a collection just by settings and not writing some cron job or scheduled function?
There is also no automatic operation that can help you achieve that. You can either use the solution above and, once you hit limit + 1, delete the oldest document, or use a Cloud Function for Firebase to achieve the same thing. I cannot see any reason why you should use a cron job. You could use Cloud Scheduler to perform an operation at a specific time, but as I understand it, you want this to happen automatically when you hit the limit.
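A minimal sketch of the delete-the-oldest step, assuming each log document carries a createdAt timestamp field (an assumption, not something stated in the question):

```js
const admin = require('firebase-admin');

// Remove the oldest log entry once the counter exceeds the limit.
async function trimOldestLog() {
  const oldest = await admin.firestore()
    .collection('logs')
    .orderBy('createdAt', 'asc')
    .limit(1)
    .get();
  if (!oldest.empty) {
    await oldest.docs[0].ref.delete();
  }
}
```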
Alternatively, what options are there to create a rotational logging system for client activities in Firebase?
If you still don't want to have larger collections, maybe you can export the data into a file and add that file to Cloud Storage for Firebase.
If every document in a collection is a user resource that is limited, how can you ensure the user does not go over their assigned limit?
My first thought was to take advantage of the Firestore triggers to avoid building a real backend, but the triggers sometimes fire more than once even if the input data has not changed. I was comparing the new doc to the old doc and taking action if certain keys did not match, but if GCP fires the same function twice I get double the result; in this case, counts get incremented or decremented twice.
The Firestore docs state:
Events are delivered at least once, but a single event may result in multiple function invocations. Avoid depending on exactly-once mechanics, and write idempotent functions.
So in my situation the only solution I can think of is saving the event IDs somewhere and making sure they have not already been handled. Or, even worse, doing a read on each call to count the current docs and adjust accordingly (increasing read costs).
What's a smart way to approach this?
If reinvocations (which, while possible, are quite uncommon) are a concern for your use case, you could indeed store the ID of the invocation event, or something less frequent, like (depending on the use case) the source document ID.
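A minimal idempotency sketch along those lines, assuming a bookkeeping collection named functionInvocations and a hypothetical resources subcollection (both placeholders): DocumentReference.create() fails if the marker document already exists, so a re-delivered event is skipped.

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.countResource = functions.firestore
  .document('users/{userId}/resources/{resourceId}')
  .onCreate(async (snap, context) => {
    const db = admin.firestore();
    const marker = db.doc(`functionInvocations/${context.eventId}`);
    try {
      // create() throws ALREADY_EXISTS if this event was handled before.
      await marker.create({
        processedAt: admin.firestore.FieldValue.serverTimestamp(),
      });
    } catch (err) {
      console.log(`Event ${context.eventId} already handled, skipping.`);
      return;
    }
    await db.doc(`users/${context.params.userId}`).update({
      resourceCount: admin.firestore.FieldValue.increment(1),
    });
  });
```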
I know what you are probably thinking, "why does it matter? Don't try to over-complicate it just to optimize pricing". In my case, I need to.
I have a collection with millions of records in Firestore, and each document gets updated quite often. Every time one gets updated, I need to do some data cleaning (and more), so I have a function triggered by onUpdate that does that. The function receives two parameters: the document before the update and the document after the update.
My question is:
Because the document is passed as an argument, does that count as a database read?
The event generated by Cloud Firestore and sent to Cloud Functions should not count as an extra read beyond what was done by the client to initially trigger that event.
I need to delete very large collections in Firestore.
Initially I used client-side batch deletes, but when the documentation changed and started to discourage that with the comments
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a cloud function as recommended in the docs. The cloud function gets triggered when a document is deleted and then deletes all documents in a subcollection as proposed in the above link in the section on "NODE.JS".
The problem I am running into now is that the cloud function seems to be able to manage around 300 deletes per second. With the maximum runtime of a cloud function of 9 minutes, I can manage up to about 162,000 deletes this way. But the collection I want to delete currently holds 237,560 documents, which makes the cloud function time out about halfway through.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have too much work to perform in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in subsequent invocations after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing each one the arguments it needs to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents whose IDs begin with 'a', another for 'b', and so on, querying for them and then deleting them.
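A minimal sketch of that fan-out, assuming a hypothetical Pub/Sub topic 'delete-shard' and sharding on the first character of the document ID (both assumptions):

```js
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// Master: publish one message per ID prefix so each worker handles one shard.
async function fanOutDeletes(path) {
  const prefixes = 'abcdefghijklmnopqrstuvwxyz0123456789'.split('');
  await Promise.all(prefixes.map((prefix) =>
    pubsub.topic('delete-shard').publishMessage({ json: { path, prefix } })));
}

// Worker: delete documents whose IDs fall in this shard's range, in chunks of
// 500 (the limit for a single batched write).
async function deleteShard(path, prefix) {
  const db = admin.firestore();
  for (;;) {
    const snap = await db.collection(path)
      .orderBy(admin.firestore.FieldPath.documentId())
      .startAt(prefix)
      .endAt(prefix + '\uf8ff')
      .limit(500)
      .get();
    if (snap.empty) return;
    const batch = db.batch();
    snap.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
  }
}
```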
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the prior one stopped.
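A minimal sketch of the continuation approach, assuming a hypothetical Pub/Sub topic 'delete-collection' whose message carries the collection path: the function deletes batches until it nears the timeout, then re-publishes the same message so a fresh invocation carries on.

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');
admin.initializeApp();

const BATCH_SIZE = 300;
const TIME_BUDGET_MS = 8 * 60 * 1000; // stop well before the 9-minute limit

exports.deleteCollection = functions
  .runWith({ timeoutSeconds: 540 })
  .pubsub.topic('delete-collection')
  .onPublish(async (message) => {
    const { path } = message.json; // e.g. 'parents/abc/items'
    const db = admin.firestore();
    const start = Date.now();

    while (Date.now() - start < TIME_BUDGET_MS) {
      const snap = await db.collection(path).limit(BATCH_SIZE).get();
      if (snap.empty) return; // nothing left, we're done

      const batch = db.batch();
      snap.docs.forEach((doc) => batch.delete(doc.ref));
      await batch.commit();
    }

    // Out of time: hand the remainder off to the next invocation.
    await new PubSub().topic('delete-collection').publishMessage({ json: { path } });
  });
```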
You should be able to use one of these strategies to limit the amount of work done per function, but the implementation details are entirely up to you to work out.
If, for some reason, neither of these strategies is viable, you will have to manage your own server (perhaps via App Engine), and message it (via pubsub) to perform a single unit of long-running work in response to a Cloud Function.
What is the most performant way of using onSnapshot to listen for changes to a large group of documents in Cloud Firestore?
(There would probably be a max of around 50-75 documents being returned by the query, paged with a 'load more' button which uses query.endBefore)
Would it be better to listen for changes to the entire query, or to query once and attach onSnapshot listeners to each document returned?
Note the documents aren't likely to change THAT often, but I still need to be able to listen for when they do.
You're better off listening to changes to the entire query.
Cloud Firestore is pretty efficient when it comes to sending changes over the network. So if you were to create a snapshot listener for a group of 75 documents and one of them changes, Cloud Firestore will only send your device that one changed document. (You'll still receive the full set of data in your snapshot listener, which is generally what you want. Firebase does the work of merging the new data into your cached data.)
Setting up 75 different snapshot listeners, one for each document, doesn't really save you anything in terms of network or battery costs. (In fact, it is probably less efficient on the client.) But it does make life a lot more difficult for you, and it also means you'll miss events like new documents being added to your collection.
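A minimal sketch of the single query listener, using the web (v8-style) SDK; the collection name 'items', the ordering field, and the page size are assumptions:

```js
const db = firebase.firestore();
const query = db.collection('items').orderBy('createdAt', 'desc').limit(75);

// One listener for the whole page: Firestore only transmits the documents
// that actually changed, but each snapshot contains the full merged result.
const unsubscribe = query.onSnapshot((snapshot) => {
  snapshot.docChanges().forEach((change) => {
    console.log(change.type, change.doc.id);
  });
});

// Call unsubscribe() when the view goes away to stop listening.
```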