Firebase realtime database limit for delete operations - firebase

I'm a firebase user who recently started diving into rtdb, and I just found docs explaining the write limit for a single db instance, quoted below:
The limit on write operations per second on a single database. While not a hard limit, if you sustain more than 1,000 writes per second, your write activity may be rate-limited.
In firestore's security rules, for example, a delete operation falls under the write category, and I guess the same concept applies to other firebase services. So I want to know exactly whether delete operations are subject to the write limit for an rtdb instance.
FYI, I'm planning to use the latest node js admin sdk with cloud functions to run a huge number of deletes, using this link's method across a huge number of different paths.
So, if delete ops are subject to the rtdb write limit, it seems like a critical mistake to deploy this function even if only a few users are likely to trigger it concurrently. Even a few concurrent invocations would quickly max out the per-second write limit, considering that the firebase admin sdk iterates those ops really quickly.
Since I have to specify the id (key) of the path for each removal (so that no nested data gets deleted unintentionally), simply deleting the parent path is not applicable to this situation, and would even be really dangerous..
If delete ops are not subject to the write limit, then I also want to know whether there is truly no limit at all on delete operations for rtdb!! Hope this question reaches the firebase gurus in the community! Comments are welcome and appreciated! Thank you in advance [:

A delete operation does count as a write operation. If you run 20K delete operations, i.e. 20K separate .remove() calls fired simultaneously using Promise.all(), each one is counted as a separate write and you'll be rate-limited. The delete requests over the limit will simply take longer to succeed.
Instead, if you are using a Cloud Function, you can build a single object containing all the paths to be deleted and pass it to update() to remove all those nodes in a single write operation. Say you have a root node users, each user node has a points node, and you want to remove points from all the users.
const remObject = {
  "user_id_1/points": null, // setting a path to null deletes that node
  "user_id_2/points": null
}
await admin.database().ref("users").update(remObject)
Although you would need to know the IDs of all the users, this removes the points node from every user in a single operation, so you won't be rate-limited. Another benefit is that all those nodes are guaranteed to be deleted together, unlike individual requests where some of them may fail.
If you run a separate `remove()` operation for each user, as shown below, it will count as N writes, where N is the number of operations.
const userIDs = [] // IDs of the users whose points node should be removed
const removeRequests = userIDs.map(u => admin.database().ref(`users/${u}/points`).remove())
await Promise.all(removeRequests)
// userIDs.length writes, all of which count towards the rate limit
I ran some test functions with the code above and, no surprise, both adding and removing 20K nodes using distinct operations with Promise.all() took over 40 seconds, while a single update operation with one object took just 3.
Do note that the single-update method may be limited by the "Size of a single write request to the database" limit, which is 16 MB for the SDKs and 256 MB for the REST API. In such cases, you may have to break the object down into smaller parts and use multiple update() operations, for example as sketched below.
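Here is a minimal sketch of that chunking approach, assuming a hypothetical list of user IDs whose points nodes should be removed (the chunk size of 5,000 is an arbitrary example, not a documented limit):
const admin = require("firebase-admin")
admin.initializeApp()

async function removePointsInChunks(userIds, chunkSize = 5000) {
  for (let i = 0; i < userIds.length; i += chunkSize) {
    const chunk = userIds.slice(i, i + chunkSize)
    const remObject = {}
    chunk.forEach(id => {
      remObject[`${id}/points`] = null // null deletes the node at that path
    })
    // One write operation per chunk instead of one per user
    await admin.database().ref("users").update(remObject)
  }
}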

Related

Trigger function on batch create with firebase

In my app, I have two ways of creating users.
One is a singular add, which triggers a cloud function onCreate to send an email and do some other logic.
The other is by batch, which ultimately triggers the same function for each added document.
The question is: how can I trigger a different function when users are added by a batch?
I looked into firebase documentation and it doesn't seem to have this feature. Am I wrong ?
This would greatly help reduce the number of reads, and I could bulk-send emails to the added users instead of sending them one by one.
There is only one Cloud Functions trigger for document creation.
What you can do is have two different functions with the same trigger and differentiate between the two creation methods in code.
This can be something like adding two more values to each document:
creation_method
batchId
With creation_method you can check its value on each document to decide whether execution continues or finishes at that point.
batchId can be used on batch-created documents to identify the whole batch.
For creation_method I recommend three different values:
singular
batch_normal
batch_final
For batchId, just use an identifier shared by the whole batch.
For the singular-creation function, verify that the value is singular and that's it.
For the batch function, make it continue only when the status is batch_final, then fetch all the documents that have the same batchId, for example as in the sketch below.
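Here is a minimal sketch of that idea, assuming hypothetical creation_method and batchId fields written by the client when it creates the documents (the field names and the users collection are illustrative):
const functions = require("firebase-functions")
const admin = require("firebase-admin")
admin.initializeApp()

exports.onUserCreated = functions.firestore
  .document("users/{userId}")
  .onCreate(async (snap, context) => {
    const data = snap.data()

    if (data.creation_method === "singular") {
      // Single add: send one email, run the existing per-user logic
      return
    }

    if (data.creation_method === "batch_final") {
      // Last document of the batch: fetch everything with the same batchId
      // and handle the whole batch at once (e.g. bulk-send the emails)
      const batchDocs = await admin.firestore()
        .collection("users")
        .where("batchId", "==", data.batchId)
        .get()
      batchDocs.forEach(doc => {
        // bulk handling here
      })
    }
    // batch_normal documents are ignored; batch_final picks them up
  })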
This approach will not reduce the reads, though: reads are billed per document read, so unless you depend on additional documents the number of reads will stay the same.
As a workaround, if you want to reduce what you are billed for reads, you can switch to the Realtime Database: the triggers you mentioned also exist there, and it has the advantage that it doesn't bill for reads.
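If you do move to the Realtime Database, the equivalent trigger looks very similar; a minimal sketch, assuming a hypothetical /users path:
const functions = require("firebase-functions")

exports.onRtdbUserCreated = functions.database
  .ref("/users/{userId}")
  .onCreate((snapshot, context) => {
    const user = snapshot.val()
    // Same creation_method differentiation as the Firestore version,
    // but reading the new node's value does not incur billed document reads
    return null
  })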

Firestore count number of documents

If I have a Firestore document in the following structure:
In my web app, I would like to display the number of followers. If I just do a get() of the whole followers sub-collection, that will be costly in terms of read operations. I thought about the following solution:
Having a counter document with a counter field that is incremented by a cloud function every time a document is created inside the followers collection. But there is the limit of one write per second per document for that counter. The idea of having a followers collection with one document per follower is to avoid the one-write-per-second limit (thanks to Doug Stevenson's blog: The top 10 things to know about Firestore when choosing a database for your app).
The only workaround I can think of is to use the distributed counter extension. But from what I've read so far, the counter only works with the front-end SDKs. Would I be able to use the extension in a cloud function or in a Node.js backend to increase the followers counter?
The "one write per document per second" is a guideline and not a hard rule, so I'd highly recommend not immediately getting hung up on that.
Then again, if you think you'll consistently need to count more than can be kept in a single document, your options are:
Keep a distributed counter, as shown in the documentation on distributed counters.
Keep the counter somewhere else. For example, I typically keep counters in Realtime Database, which has much higher write throughput (but lower read concurrency per shard).
But from what I've read so far, the counter only works with the front-end SDKs.
That's not true. The extension works for any query made to Firestore.
Would I be able to use the extension in a cloud function or in a node.js backend to increase the followers counter?
The extension works by monitoring documents added to and removed from the collection. It doesn't matter where the change comes from. You will still be able to use the computed counter from any code that's capable of querying the counter documents.
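If you do decide to keep the counter in the Realtime Database (the option mentioned above), a minimal sketch could look like this, assuming a hypothetical followers subcollection under each user document and a hypothetical followerCounts path in RTDB:
const functions = require("firebase-functions")
const admin = require("firebase-admin")
admin.initializeApp()

exports.onFollowerAdded = functions.firestore
  .document("users/{userId}/followers/{followerId}")
  .onCreate((snap, context) => {
    // ServerValue.increment applies an atomic server-side increment,
    // and RTDB tolerates a much higher write rate on a single location
    return admin.database()
      .ref(`followerCounts/${context.params.userId}`)
      .set(admin.database.ServerValue.increment(1))
  })
A matching onDelete trigger with increment(-1) would keep the count accurate when followers are removed.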

How to avoid Firestore document write limit when implementing an Aggregate Query?

I need to keep track of the number of photos I have in a Photos collection. So I want to implement an Aggregate Query as detailed in the linked article.
My plan is to have a Cloud Function that runs whenever a Photo document is created or deleted, and then increment or decrement the aggregate counter as needed.
This will work, but I worry about running into the 1 write/document/second limit. Say that a user adds 10 images in a single import action. That is 10 executions of the Cloud Function in more-or-less the same time, and thus 10 writes to the Aggregate Query document more-or-less at the same time.
Looking around I have seen several mentions (like here) that the 1 write/doc/sec limit is for sustained periods of constant load, not short bursts. That sounds reassuring, but it isn't really reassuring enough to convince an employer that your choice of DB is a safe and secure option if all you have to go on is that 'some guy said it was OK on Google Groups'. Are there any official sources stating that short write bursts are OK, and if so, how is a 'short burst' defined?
Or are there other ways to maintain an Aggregate Query result document without also subjecting all the aggregated documents to a very restrictive 1 write / second limitation across all the aggregated documents?
If you think that you'll see a sustained write rate of more than once per second, consider dividing the aggregation up in shards. In this scenario you have N aggregation docs, and each client/function picks one at random to write to. Then when a client needs the aggregate, it reads all these subdocuments and adds them up client-side. This approach is quite well explained in the Firebase documentation on distributed counters, and is also the approach used in the distributed counter Firebase Extension.
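A minimal sketch of that sharded approach, loosely following the pattern in the distributed counters documentation; the counters/photos path and NUM_SHARDS value are illustrative assumptions:
const admin = require("firebase-admin")
admin.initializeApp()
const db = admin.firestore()

const NUM_SHARDS = 10

// Called from the Cloud Function that runs on photo create/delete:
// pick a random shard and increment (or decrement) it
async function adjustPhotoCount(delta) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS)
  await db.doc(`counters/photos/shards/${shardId}`).set(
    { count: admin.firestore.FieldValue.increment(delta) },
    { merge: true }
  )
}

// Read all the shards and sum them to get the total
async function getPhotoCount() {
  const snapshot = await db.collection("counters/photos/shards").get()
  return snapshot.docs.reduce((sum, doc) => sum + (doc.data().count || 0), 0)
}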

What is the most cost-efficient method of making document writes/reads from Firestore?

Firebase's Cloud Firestore gives you limits on the number of document writes and reads (and deletes). For example, the spark plan (free) allows 50K reads and 20k writes a day. Estimating how many writes and reads is obviously important when developing an app, as you will want to know the potential costs incurred.
Part of this estimation is knowing exactly what counts as a document read/write. This part is somewhat unclear from searching online.
One document can contain many different fields, so if an app is designed such that a user's actions during a session only require fields within a single document to be updated, would it be more cost-efficient to update all the fields in one single document write at the end of the session, rather than writing the document every single time the user updates one field?
Similarly, would it not make sense to read the document once at the start of a session, getting the values of all fields, rather than reading them when each is needed?
I appreciate that this method means the user may see slightly out-of-date field values, and that the database isn't updated immediately, but if such things aren't too much of a concern to you, couldn't such a method reduce your reads/writes by a large factor?
This all depends on what counts as a document write/read (does writing 20 fields within the same document in one go count as 20 writes?).
The cost of a write operation has no bearing on the number of fields you write. It's purely based on the number of times you call update() or set() on a document reference, whether independently, in a transaction, or in a batch.
If you choose to write N fields using N separate updates, you will be charged N writes. If you choose to write N fields using 1 update, you will be charged 1 write.
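A small sketch contrasting the two cases; the document path and field names are made up for illustration:
const admin = require("firebase-admin")
admin.initializeApp()
const docRef = admin.firestore().doc("sessions/abc123")

async function demo() {
  // Three fields in one update(): billed as a single document write
  await docRef.update({
    score: 42,
    lastLevel: "forest",
    updatedAt: admin.firestore.FieldValue.serverTimestamp(),
  })

  // The same fields written separately: billed as three document writes
  await docRef.update({ score: 42 })
  await docRef.update({ lastLevel: "forest" })
  await docRef.update({ updatedAt: admin.firestore.FieldValue.serverTimestamp() })
}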

Deleting very large collections in Firestore

I need to delete very large collections in Firestore.
Initially I used client-side batch deletes, but then the documentation changed and started to discourage that with the comments
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a cloud function as recommended in the docs. The cloud function gets triggered when a document is deleted and then deletes all documents in a subcollection as proposed in the above link in the section on "NODE.JS".
The problem I am running into now is that the cloud function seems to manage around 300 deletes per second. With the maximum cloud function runtime of 9 minutes, I can manage up to 162,000 deletes this way. But the collection I want to delete currently holds 237,560 documents, which makes the cloud function time out about halfway through.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have too much work to perform in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in a subsequent invocation after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing them the arguments they need to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents that begin with 'a', another for 'b', and so on, by querying for them and then deleting them.
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the prior one stopped.
You should be able to use one of these strategies to limit the amount of work done per function, but the implementation details are entirely up to you to work out.
If, for some reason, neither of these strategies are viable, you will have to manage your own server (perhaps via App Engine), and message (via pubsub) it to perform a single unit of long-running work in response to a Cloud Function.
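As an illustration of the continuation strategy, here is a hedged sketch using a hypothetical Pub/Sub topic named delete-collection that the function republishes to when there is still work left; the batch size of 500 matches Firestore's batched-write limit:
const functions = require("firebase-functions")
const admin = require("firebase-admin")
const { PubSub } = require("@google-cloud/pubsub")
admin.initializeApp()

exports.deleteCollectionChunk = functions.pubsub
  .topic("delete-collection")
  .onPublish(async (message) => {
    const { path } = message.json // e.g. { path: "photos/abc/comments" }
    const db = admin.firestore()

    // Delete up to 10 batches of 500 documents in this invocation
    for (let i = 0; i < 10; i++) {
      const snap = await db.collection(path).limit(500).get()
      if (snap.empty) return // nothing left, we are done

      const batch = db.batch()
      snap.docs.forEach(doc => batch.delete(doc.ref))
      await batch.commit()
    }

    // Still more to delete: publish again so a fresh invocation continues
    await new PubSub().topic("delete-collection").publishMessage({ json: { path } })
  })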