Trigger function on batch create with firebase

In my app, I have two ways of creating users.
One is a singular add, which triggers a Cloud Function onCreate to send an email and run some other logic.
The other is a batch write, which ultimately triggers the same function for each added document.
My question is: how can I trigger a different function when users are added by a batch?
I looked through the Firebase documentation and it doesn't seem to offer this feature. Am I wrong?
This would greatly help reduce the number of reads, and I could bulk-send emails to the added users instead of sending them one by one.

There is only one Cloud Functions trigger for document creation.
What you can do is have two different functions with the same trigger and differentiate between the two creation methods in code.
This can be done by adding two more fields to each document:
creation_method
batch
With creation_method you can evaluate its value on each document to decide whether the execution continues or stops at that point.
batch can be set on batch-created documents to identify the whole batch.
For creation_method I recommend three different values:
singular
batch_normal
batch_final
For batch, just store a batchId.
In the function for singular creation, verify that creation_method is singular, and that's it.
In the batch function, make it continue only on the batch_final value and fetch all the documents that share the same batchId.
This approach will not reduce the reads, since reads are billed per document read; unless you depend on additional documents, the number of reads will stay the same.
As a workaround, if you want to reduce what you are billed for reads, you can switch to the Realtime Database: the triggers you mentioned exist there as well, and it has the advantage that reads are not billed.
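A minimal sketch of this idea, assuming a users collection and the field names suggested above (adjust paths and field names to your schema):
const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp();

// Handles users added one at a time.
exports.onSingleUserCreate = functions.firestore
  .document("users/{userId}")
  .onCreate(async (snap) => {
    const user = snap.data();
    if (user.creation_method !== "singular") return null; // ignore batch-created docs
    // ... send the single welcome email and run the other logic here
  });

// Handles users added by batch; only the document marked batch_final does the work.
exports.onBatchUserCreate = functions.firestore
  .document("users/{userId}")
  .onCreate(async (snap) => {
    const user = snap.data();
    if (user.creation_method !== "batch_final") return null; // wait for the last doc of the batch
    const batchDocs = await admin.firestore()
      .collection("users")
      .where("batch", "==", user.batch)
      .get();
    // ... bulk-send emails to batchDocs.docs here
  });
Note that the batch_final document still triggers both functions once; the early returns just keep the unwanted invocation cheap.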

Related

Firebase realtime database limit for delete operations

I'm a Firebase user recently diving into RTDB, and I just found a docs page explaining the write limit for a single database instance, quoted below:
The limit on write operations per second on a single database. While not a hard limit, if you sustain more than 1,000 writes per second, your write activity may be rate-limited.
In Firestore's security rules, for example, a delete operation falls into the category of write operations, and I guess the same concept applies to other Firebase services. So I want to know exactly whether a delete operation counts toward the write limit for an RTDB instance.
FYI, I'm planning to use the latest Node.js Admin SDK with Cloud Functions to run a huge number of deletes, using this link's method for a huge number of different paths.
So, if the delete op counts as an RTDB write operation, it seems like a critical mistake to deploy this function, even if only a few users are likely to trigger it concurrently. Even a few concurrent invocations would soon max out the per-second write limit, considering how quickly the Firebase Admin SDK can iterate through those operations.
Since I have to specify the ID (key) of the path for each removal (so that no nested data gets deleted unintentionally), simply deleting the parent path is not applicable here, and would actually be really dangerous.
If the delete op is not subject to the write limit, then I also want to know whether there is truly no limit at all on delete operations in RTDB! I hope this question reaches the Firebase gurus in the community. Comments are welcome and appreciated. Thank you in advance!
A delete operation does count as a write operation. If you run 20K delete operations, i.e. 20K separate .remove() calls fired simultaneously with Promise.all(), each one is counted as a separate operation and you'll be rate-limited. The delete requests above the limit will simply take longer to succeed.
Instead, if you are using a Cloud Function, you can create a single object containing all the paths to be deleted and use update() to remove all those nodes in a single write operation. Let's say you have a root node users, each user node has a points node, and you want to remove it from all users.
const remObject = {
  "user_id_1/points": null,
  "user_id_2/points": null
}
await admin.database().ref("users").update(remObject)
Although you would need to know the IDs of all users, this removes the points node from every user in a single operation, so you won't be rate-limited. Another benefit is that the multi-path update is atomic, so all those nodes are guaranteed to be deleted, unlike individual requests, some of which may fail.
If you run a separate `remove()` operation for each user, as shown below, it will count as N writes, where N is the number of operations.
const userIDs = []
const removeRequests = userIDs.map(u => admin.database().ref(`users/${u}/points`).remove())
await Promise.all(removeRequests)
// userIDs.length writes which will count towards that rate limit
I ran some test functions with the above code and, no surprise, both adding and removing 20K nodes using distinct operations with Promise.all() took over 40 seconds, while a single update operation with one object took just 3.
Do note that the single-update method may be limited by the "Size of a single write request to the database" quota, which is 16 MB for the SDKs and 256 MB for the REST API. In such cases you may have to break the object into smaller parts and use multiple update() operations.
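A rough sketch of that chunking, assuming the relative paths to delete are already collected in an array (the chunk size is just an illustrative number):
// `admin` is the initialized firebase-admin SDK, as in the snippets above.
const paths = ["user_id_1/points", "user_id_2/points" /* ... */];
const CHUNK_SIZE = 5000; // illustrative; pick a size that keeps each request under the limit

for (let i = 0; i < paths.length; i += CHUNK_SIZE) {
  const remObject = {};
  for (const p of paths.slice(i, i + CHUNK_SIZE)) {
    remObject[p] = null; // null deletes the node at that path
  }
  // One billed write (and one request) per chunk.
  await admin.database().ref("users").update(remObject);
}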

Do you need to create a separate collection/document for reading an aggregated document with a Firebase Cloud Function?

I want to use a Cloud Function to produce an aggregated document containing all the data I need for the first page of my app. The aggregated document will be updated each time a document is added or updated in a Firestore collection A.
In order to do so, I have to create a separate collection B containing a single document (the aggregated doc produced by the Cloud Function), which the app will fetch when it starts, right? Hence, my Cloud Function will be updating that single document in collection B? Am I correct in my understanding of how using a Cloud Function to aggregate data works? Thank you very much.
This is indeed a totally valid approach. Note that you may want to use a transaction in the Cloud Function if there is a risk that the source data is updated by several users in parallel.
You don't give any detail on what is in collection A (identical docs or different docs?) or on what is aggregated (numbers, headlines, ...), but you should try to avoid reading all the docs in collection A each time the Cloud Function is triggered, as this may generate unnecessary extra cost. If the first page of your app aggregates some figures, you may want to use counters.
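For example, if the aggregation is just a counter, a hedged sketch could look like this (the collection and field names here are placeholders, not from the question):
const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp();

// Keeps a single aggregated document in collection B in sync with additions to collection A.
exports.updateAggregate = functions.firestore
  .document("collectionA/{docId}")
  .onCreate((snap) => {
    // FieldValue.increment avoids re-reading all of collection A on every trigger.
    return admin.firestore().doc("collectionB/summary").set(
      { totalDocs: admin.firestore.FieldValue.increment(1) },
      { merge: true }
    );
  });
The app then only reads collectionB/summary on startup, which costs a single document read.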

Prevent more than 1 write a second to a Firestore document when using a counter with cloud function

Background:
I have a Firestore database with a users collection. Each user is a document which contains a contacts collection. Each document in that collection is a single contact.
Since Firestore does not have a "count" feature for all documents, and since I don't want to read all contacts just to count how many a user has, I trigger Cloud Functions when a contact is added or deleted to increment or decrement numberOfContacts in the user document. To make the function idempotent, it has to do multiple reads and writes to avoid incrementing the counter more than once if it is called more than once for the same document. This means I need a separate collection of eventIDs I have already handled so I don't process duplicates, which in turn requires running another function once a month to go through each user and delete all such documents (a lot of reads and some writes).
Issue
Now the challenge is that the user can import his/her contacts. So if a user imports 10,000 contacts, this function will get fired 10,000 times in quick succession.
How do I prevent that?
Current approach:
Right now I am adding a field to the contact document that indicates the addition was part of an import. This tells the cloud function not to increment.
I perform the operation from the client 499 contacts at a time in a transaction, with the count increment as the 500th write. That way the count stays consistent if something fails halfway.
Is this really the best way? It seems so complicated just to have a count of contacts available. I end up doing multiple reads and writes each time a single contact changes, plus I have to run a cleanup function every month.
I keep thinking there's gotta be a simpler way.
For those who are curious, it seems the approach I am taking is the best one.
I add a field to the contact document that indicates the addition was part of an import (bulkAdd = true). This tells the cloud function not to increment.
I have another cloud function add the contacts 200 at a time (I also set a FieldValue server timestamp, which counts as another write, so it's 400 writes). I do this in a batch, and the 401st write in the batch is the counter increment. That way I can bulk-import contacts without bombarding a single document with writes.
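A condensed sketch of what one of those batches might look like, assuming the users/{userId}/contacts layout described above (names are illustrative and `db` is admin.firestore()):
// Assumes firebase-admin is required as `admin` and initialized; called from the importing Cloud Function.
async function importChunk(db, userId, contacts) {
  // `contacts` holds at most ~200 entries so the batch stays well under the operation limit.
  const batch = db.batch();
  const userRef = db.collection("users").doc(userId);
  for (const c of contacts) {
    batch.set(userRef.collection("contacts").doc(), {
      ...c,
      bulkAdd: true, // tells the per-contact onCreate trigger not to increment
      createdAt: admin.firestore.FieldValue.serverTimestamp(),
    });
  }
  // Final write in the same batch bumps the counter once for the whole chunk.
  batch.update(userRef, {
    numberOfContacts: admin.firestore.FieldValue.increment(contacts.length),
  });
  return batch.commit();
}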
Problem with increments
There are duplicate-safe operations like FieldValue.arrayUnion() & FieldValue.arrayRemove(). I wrote a bit about that approach here: Firebase function document.create and user.create triggers firing multiple times
With this approach, your user document contains a special array field of contact IDs. Once a contact is added to the subcollection and your function is triggered, the contact's ID can be written to this field. If the function is triggered two or more times for one contact, only one instance of the ID ends up in the master user doc. The actual count can then be fetched on the client, or by one more function triggered on updates to the user doc. This is a bit simpler than keeping eventIDs.
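A minimal sketch of that arrayUnion idea, assuming the contacts subcollection layout from the question (the contactIds field name is made up for illustration):
// `functions` and `admin` set up as usual for Cloud Functions.
exports.onContactCreate = functions.firestore
  .document("users/{userId}/contacts/{contactId}")
  .onCreate((snap, context) => {
    // arrayUnion is idempotent: a re-delivered event adds the same ID only once.
    return admin.firestore().doc(`users/${context.params.userId}`).update({
      contactIds: admin.firestore.FieldValue.arrayUnion(context.params.contactId),
    });
  });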
Problem with importing 10k+ contacts
This part is a bit philosophical.
If I understand correctly, the problem is that a user performs 10k writes. Then these 10k writes trigger 10k functions, which perform an additional 10k writes to the master doc (and the same number of reads if they use the eventIDs document)?
You can create a special subcollection just for importing multiple contacts into your DB. Instead of writing 10k docs, the client would create one big document with 10k contact entries, which triggers a Cloud Function. That function would read it all and make the necessary 10k contact writes plus one write to the master doc with all the arrayUnions. You would just need to think about how to prevent writes from the 10k invoked functions (by adding a special metadata field like your bulkAdd).
This is just an opinion.

What is the most cost-efficient method of making document writes/reads from Firestore?

Firebase's Cloud Firestore gives you limits on the number of document writes and reads (and deletes). For example, the Spark plan (free) allows 50K reads and 20K writes a day. Estimating how many writes and reads you will make is obviously important when developing an app, as you will want to know the potential costs incurred.
Part of this estimation is knowing exactly what counts as a document read/write. This part is somewhat unclear from searching online.
One document can contain many different fields, so if an app is designed such that user actions during a session require the fields within a single document to be updated, would it be cost-efficient to update all the fields in one single document write at the end of the session, rather than writing the document every single time the user wants to update one field?
Similarly, would it not make sense to read the document once at the start of a session, getting the values of all fields, rather than reading them when each is needed?
I appreciate that this method would lead to the user seeing slightly out-of-date field values, and admittedly the database not being updated immediately, but if such things aren't too much of a concern to you, couldn't it reduce your reads/writes by a large factor?
This all depends on what counts as a document write/read (does writing 20 fields within the same document in one go count as 20 writes?).
The cost of a write operation has no bearing on the number of fields you write. It is purely based on the number of times you call update() or set() on a document reference, whether independently, in a transaction, or in a batch.
If you choose to write N fields using N separate updates, you will be charged N writes. If you choose to write those N fields using 1 update, you will be charged 1 write.
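For example (web SDK, with an illustrative document path and fields), both snippets below store the same data, but the first is billed as three writes and the second as one:
// Inside an async function; `firebase` is the initialized web SDK.
const ref = firebase.firestore().doc("users/alice");

// Three separate update() calls: billed as 3 writes.
await ref.update({ name: "Alice" });
await ref.update({ age: 30 });
await ref.update({ city: "Zurich" });

// One update() call touching three fields: billed as 1 write.
await ref.update({ name: "Alice", age: 30, city: "Zurich" });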

Deleting very large collections in Firestore

I need to delete very large collections in Firestore.
Initially I used client-side batch deletes, but when the documentation changed and started to discourage that with the comments
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a Cloud Function, as recommended in the docs. The Cloud Function gets triggered when a document is deleted and then deletes all documents in a subcollection, as proposed in the above link in the "Node.js" section.
The problem I am running into now is that the Cloud Function seems to manage around 300 deletes per second. With the maximum Cloud Function runtime of 9 minutes, I can manage up to 162,000 deletes this way. But the collection I want to delete currently holds 237,560 documents, which makes the Cloud Function time out about halfway through.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have more work than can be performed in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in a subsequent invocation after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time and have your master function kick off subordinate functions (probably via pubsub), passing each one the arguments it needs to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents whose IDs begin with 'a', another for 'b', and so on, by querying for them and then deleting them.
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the previous one stopped.
You should be able to use one of these strategies to limit the amount of work done per function, but the implementation details are entirely up to you to work out.
If, for some reason, neither of these strategies is viable, you will have to manage your own server (perhaps via App Engine) and message it (via pubsub) to perform a single unit of long-running work in response to a Cloud Function.
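If it helps, here is a very rough sketch of the continuation idea, assuming a Pub/Sub-triggered function that deletes one chunk per invocation and then re-publishes a message to continue (the topic name, chunk size, and message shape are all assumptions, not an official recipe):
const functions = require("firebase-functions");
const admin = require("firebase-admin");
const { PubSub } = require("@google-cloud/pubsub");
admin.initializeApp();
const pubsub = new PubSub();

exports.deleteCollectionChunk = functions.pubsub
  .topic("delete-collection")
  .onPublish(async (message) => {
    const { path } = message.json; // e.g. the subcollection path to wipe
    const snap = await admin.firestore().collection(path).limit(500).get();
    if (snap.empty) return null; // nothing left, we are done

    const batch = admin.firestore().batch();
    snap.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();

    // Hand the remaining work to a fresh invocation instead of running until timeout.
    return pubsub.topic("delete-collection").publishMessage({ json: { path } });
  });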
