Most efficient structure for cloud functions in Firebase

Is there a performance difference between the following two options for Realtime Database-triggered functions?
One cloud function that listens to all subnodes and decides what to execute based on the path.
An entirely separate cloud function for each subnode.
This assumes the total number of function executions stays equal.

If multiple events happen at the same time, this may be a problem (from https://cloud.google.com/functions/docs/concepts/exec):
Cloud Functions may start multiple function instances to scale your function up to meet the current load. These instances run in parallel, which results in having more than one parallel function execution. However, each function instance handles only one concurrent request at a time. This means while your code is processing one request, there is no possibility of a second request being routed to the same function instance, and the original request can use the full amount of resources (CPU and memory) that you requested.
Adding to this, the logic of separate cloud functions should be a lot simpler than one monolithic function that checks for each trigger.
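A minimal sketch of both structures, assuming a hypothetical /rooms node with messages and members subnodes (handleMessage and handleMember stand in for whatever work each path needs):

const functions = require('firebase-functions');

// Option 1: one function on a wildcard path, branching on the subnode name.
exports.onRoomWrite = functions.database
  .ref('/rooms/{roomId}/{subnode}/{itemId}')
  .onWrite((change, context) => {
    switch (context.params.subnode) {
      case 'messages': return handleMessage(change, context);
      case 'members': return handleMember(change, context);
      default: return null;
    }
  });

// Option 2: an entirely separate, smaller function per subnode.
exports.onMessageWrite = functions.database
  .ref('/rooms/{roomId}/messages/{messageId}')
  .onWrite(handleMessage);

exports.onMemberWrite = functions.database
  .ref('/rooms/{roomId}/members/{memberId}')
  .onWrite(handleMember);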

Related

Firestore : Maintaining the count of a collection. Trigger function vs transaction

Let's say I have a collection called persons and another collection called cities with a field population. When a Person is created in a City, I would like to increment the population field in the corresponding city.
I have two options.
Create an onCreate trigger function. Find the city document and increment it using FieldValue.increment(1).
Create an HTTPS callable cloud function to create the person. The cloud function executes a transaction in which the person is created and the population is incremented.
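A minimal sketch of the first option, assuming a hypothetical cityId field on each person document:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.onPersonCreate = functions.firestore
  .document('persons/{personId}')
  .onCreate((snap) => {
    const cityId = snap.get('cityId'); // hypothetical field linking the person to a city
    return admin.firestore().doc(`cities/${cityId}`)
      .update({ population: admin.firestore.FieldValue.increment(1) });
  });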
The first one is simpler and I am using it right now. But, I am wondering if there could be cases where the onCreate is not called due to some glitch...
I am thinking of moving to the second option. I am wondering if there are any disadvantages. Does HTTPS callable function cost more?
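And a minimal sketch of the second option, reusing the setup from the sketch above:

exports.createPerson = functions.https.onCall(async (data) => {
  const db = admin.firestore();
  const personRef = db.collection('persons').doc();
  const cityRef = db.doc(`cities/${data.cityId}`);
  // Person creation and population increment succeed or fail together.
  await db.runTransaction(async (tx) => {
    tx.set(personRef, { name: data.name, cityId: data.cityId });
    tx.update(cityRef, { population: admin.firestore.FieldValue.increment(1) });
  });
  return { id: personRef.id };
});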
The only problem I see with the HTTPS callables is that if something fails, you need to handle it on the client side. That would be (at least for me) a little too much logic for the client.
What I can recommend, after almost 4 years of experience with exactly that problem, is a solution with a virtual queue. I had a long discussion on that topic here and even with the Firebase people at the last in-person Google I/O and Firebase Summit.
Our problem was that there were such glitches, and even when the trigger fired, the changes and transactions sometimes failed due to too many concurrent requests. After trying every official recommendation, like sharded counters, we ended up creating a virtual queue: each onCreate adds an entry to a Firestore or RTDB list/collection, and another function (run either by cron or by another trigger; that doesn't matter) handles each entry in the queue one by one, re-triggering itself for each entry to avoid timeouts and memory limits. We made sure a single function invocation was enough to handle one calculation.
This method was the only bulletproof one that could handle thousands of new entries per second without issues. The only downside is that it takes more time than a plain trigger, because entries are processed one by one. If your calculations are small, you can process them in batches (that is how we started out).
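A minimal sketch of such a virtual queue, assuming a hypothetical queue collection and a processEntry() helper; a real implementation would also need error handling and retry logic:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Each onCreate only enqueues work; it never does the heavy calculation itself.
exports.enqueuePerson = functions.firestore
  .document('persons/{personId}')
  .onCreate((snap, context) => {
    return admin.firestore().collection('queue').add({
      personId: context.params.personId,
      createdAt: admin.firestore.FieldValue.serverTimestamp(),
    });
  });

// A scheduled worker drains the queue one entry at a time.
exports.drainQueue = functions.pubsub
  .schedule('every 5 minutes')
  .onRun(async () => {
    const db = admin.firestore();
    const batch = await db.collection('queue')
      .orderBy('createdAt')
      .limit(10) // small batches keep each run well under the timeout
      .get();
    for (const doc of batch.docs) {
      await processEntry(doc.data()); // hypothetical handler doing the actual work
      await doc.ref.delete();
    }
  });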

How to multiply Cloud Functions timeout time?

I know that Firebase/Google Cloud Functions can have their timeout increased up to 9 minutes, but I have a Pub/Sub function in which one request needs around 20-30 seconds to complete (document conversion).
async function handleMessage() {
// code...
// const convertedDoc = await convertDocument()
// ... do something with convertedDoc
}
With the 9-minute maximum timeout, that gives me up to 18 documents I can process.
My question is: if, after the 15th document conversion, I call the Pub/Sub function again while the previous invocation is finishing, will the timeout timer start over for the new function? Of course, I would need to pass all the data from the previous one, but is that a way to do it? Something like a recursive Pub/Sub of sorts?
If you don't mind, I would suggest a slightly different approach.
Let's divide the whole process into 2 steps.
The first step: a cloud function which "collects" all documents (their id, reference, or whatever metadata uniquely distinguishes one from the others) into a list, and then sends a message per document into a Pub/Sub topic. Each message contains a unique identifier/handle/hash of the document, so it can be fetched/processed later.
The Pub/Sub topic triggers (push) the second cloud function. This cloud function is deployed with a maximum-instances argument of a few dozen (or hundred), depending on the context and requirements. Thus, many cloud function instances execute in parallel, but each instance is triggered by a message with a unique document id.
The cloud function performs the processing you described, which presumably takes 20 or 30 seconds. As many cloud functions execute in parallel, the overall processing time can be less than if everything were done sequentially.
In addition, you might like to keep the state of the process in a Firestore database, using the document id as the Firestore record id. Thus each record reflects the processing of one particular document. By doing that, any possible duplication can be eliminated, and a self-healing process can be organised.
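A minimal sketch of that fan-out, assuming a hypothetical doc-conversion topic, a docs collection, and a convertDocument() helper:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');
admin.initializeApp();
const pubsub = new PubSub();

// Step 1: collect document ids and publish one Pub/Sub message per document.
exports.collectDocs = functions.pubsub
  .schedule('every 24 hours')
  .onRun(async () => {
    const snapshot = await admin.firestore().collection('docs').get();
    await Promise.all(snapshot.docs.map((doc) =>
      pubsub.topic('doc-conversion')
        .publish(Buffer.from(JSON.stringify({ id: doc.id })))));
  });

// Step 2: one conversion per message, with many instances allowed in parallel.
exports.convertOne = functions
  .runWith({ maxInstances: 50, timeoutSeconds: 120 })
  .pubsub.topic('doc-conversion')
  .onPublish(async (message) => {
    const { id } = message.json;
    await convertDocument(id); // hypothetical 20-30 second conversion
    // Optionally record completion in Firestore under the same id,
    // so duplicates can be detected and skipped.
  });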

Can I use Google CloudFunctions for reliable application purposes?

I remember reading an article which explained that Cloud Functions are not guaranteed to be executed, and especially not in the right order. I can't find any sources on this anymore.
Is this still recent information?
I am aware that the start of a function can take a couple seconds, especially when cold starting the function.
Could I reliably increment a number each time a document is created in a specific Firestore collection without getting my numbers mixed up? I know this is done often but I've never seen information on whether or not it is safe to do.
Following up on question one, are there red flags when using Cloud Functions for payment backend services?
Can I be sure that Cloud Functions are executed in the order that they were triggered i.e. are they queued or executed in parallel?
Could I reliably increment a number each time a document is created in a specific Firestore collection without getting my numbers mixed up?
You can certainly write code to do that. You will need to keep track of a running count of documents in another document, and use a transaction to keep it up to date.
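A minimal sketch of that transaction, assuming a hypothetical counters/persons document holding the running count:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.countPersons = functions.firestore
  .document('persons/{personId}')
  .onCreate(() => {
    const counterRef = admin.firestore().doc('counters/persons');
    return admin.firestore().runTransaction(async (tx) => {
      const snap = await tx.get(counterRef);
      const count = snap.exists ? snap.data().count : 0;
      tx.set(counterRef, { count: count + 1 });
    });
  });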
I don't recommend doing this. It's kind of an anti-pattern in Firestore to impose sequentially increasing numbers on documents in a collection. If you want time-based ordering, you should consider using a timestamp instead.
Can I be sure that Cloud Functions are executed in the order that they were triggered i.e. are they queued or executed in parallel?
Cloud Functions provides absolutely no guarantee that function invocations will happen in any particular order. They are asynchronous and can execute in parallel on multiple server instances, depending on the load applied to the function.
I strongly suggest reading through the documentation to understand the execution environment provided by Cloud Functions.

How to run multiple Firestore Functions Sequentially?

We have 20 functions that must run every day. Each of these functions does something different based on inputs from the previous function.
We tried calling all the functions in one function, but it hits the timeout error as these 20 functions take more than 9 minutes to execute.
How can we trigger these multiple functions sequentially, or avoid timeout error for one function that executes each of these functions?
There is no configuration or easy way to get this done. You will have to set up a fair amount of code and infrastructure to get this done.
The most straightforward solution involves chaining together calls using pubsub type functions. You can send a message to a pubsub topic that will trigger the next function to run. The payload of the message to send can be the parameters that the function should use to determine how it should operate. If the payload is too big, or some more complex sources of data are required to make that decision, you can use a database to store intermediate data that the next function can query and use.
Since we don't have any more specific details about how your functions actually work, nothing more specific can be said. If you run into problems with a specific detail of this scheme, please post again describing specifically what you're trying to do and what's not working the way you expect.
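A minimal sketch of that chaining, assuming hypothetical step-N topics and doStepN() helpers:

const functions = require('firebase-functions');
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

// Each step does its work, then publishes its result as the next step's input.
exports.step1 = functions.pubsub.topic('step-1').onPublish(async (message) => {
  const result = await doStep1(message.json); // hypothetical work for this step
  await pubsub.topic('step-2').publish(Buffer.from(JSON.stringify(result)));
});

exports.step2 = functions.pubsub.topic('step-2').onPublish(async (message) => {
  const result = await doStep2(message.json);
  await pubsub.topic('step-3').publish(Buffer.from(JSON.stringify(result)));
});
// ... and so on through step 20.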
There is a variant of Doug's solution. At the end of the function, instead of publishing a message to Pub/Sub, simply write a specific log entry (for example "end").
Then go to Stackdriver Logging, search for this specific log trace (turn on advanced filters), and configure a sink to a Pub/Sub topic for this log entry. Thereby, every time the log is detected, a Pub/Sub message is published with the log content.
Finally, plug your next function into this Pub/Sub topic.
If you need to pass values from one function to another, you can simply add these values to the log trace at the end of the function and parse them at the beginning of the next one.
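One way this could look, assuming the sink's filter matches the marker text and routes to a hypothetical step-2-from-logs topic; a log sink delivers the whole LogEntry as JSON in the Pub/Sub message, so the values ride along in its textPayload:

const functions = require('firebase-functions');

// End of the first function: log a marker carrying the values the next step needs.
exports.step1 = functions.pubsub.topic('step-1').onPublish(async (message) => {
  // ... do the work ...
  console.log(`end ${JSON.stringify({ processed: 42 })}`); // the sink filters on 'end'
});

// Next function, plugged into the sink's topic, parses the values back out.
exports.step2 = functions.pubsub.topic('step-2-from-logs').onPublish((message) => {
  const entry = message.json;           // the exported LogEntry
  const text = entry.textPayload || ''; // e.g. 'end {"processed":42}'
  const values = JSON.parse(text.replace('end ', ''));
  // ... use values ...
});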
Chaining functions is not an easy thing to do. Things are coming; maybe Google Cloud Next will announce new products to help you with this task.
If you simply want the functions to execute in order, and you don't need to pass the result of one directly to the next, you could wrap them in a scheduled function (docs) that spaces them out with enough time for each to run.
Sketch below with 3-minute spacing:
exports.myScheduler = functions.pubsub
  .schedule('every 3 minutes from 22:00 to 23:00')
  .onRun((context) => {
    const time = new Date().toTimeString().slice(0, 5); // check the time, e.g. '22:03'
    if (time === '22:00') return func1of20();
    else if (time === '22:03') return func2of20();
    // etc. through func20of20()
  });
If you do need to pass the results of each function to the next, func1 could store its result in a DB entry, then func2 starts by reading that result, and ends by overwriting it with its own so func3 can read it when fired 3 minutes later, etc. Though perhaps in this case the other solutions are more tailored to your needs.
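A minimal sketch of that hand-off, assuming a hypothetical pipeline/state document and a doStep2() helper:

const admin = require('firebase-admin');
admin.initializeApp();

// Each funcN reads the previous result, does its work, and overwrites the state.
async function func2of20() {
  const stateRef = admin.firestore().doc('pipeline/state');
  const prev = (await stateRef.get()).data(); // result left by func1of20
  const result = await doStep2(prev);         // hypothetical work for step 2
  await stateRef.set(result);                 // func3of20 reads this 3 minutes later
}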

Deleting very large collections in Firestore

I need to delete very large collections in Firestore.
Initially I used client-side batch deletes, but then the documentation changed and started to discourage that with the comments
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a cloud function as recommended in the docs. The cloud function gets triggered when a document is deleted and then deletes all documents in a subcollection as proposed in the above link in the section on "NODE.JS".
The problem I am running into now is that the cloud function seems to manage around 300 deletes per second. With the maximum cloud function runtime of 9 minutes, I can manage up to 162,000 deletes this way. But the collection I want to delete currently holds 237,560 documents, which makes the cloud function time out about halfway through.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have too much work to perform in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in subsequent invocations after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing each the arguments it needs to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents that begin with 'a', another for 'b', etc., by querying for them, then deleting them.
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the prior one stopped.
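A minimal sketch of the continuation approach, assuming a hypothetical delete-collection topic carrying the collection path:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');
admin.initializeApp();
const pubsub = new PubSub();

const BATCH = 300;
const DEADLINE_MS = 8 * 60 * 1000; // stop well before the 9-minute timeout

exports.deleteCollection = functions
  .runWith({ timeoutSeconds: 540 })
  .pubsub.topic('delete-collection')
  .onPublish(async (message) => {
    const { path } = message.json;
    const db = admin.firestore();
    const start = Date.now();
    while (Date.now() - start < DEADLINE_MS) {
      const snap = await db.collection(path).limit(BATCH).get();
      if (snap.empty) return; // done, no continuation needed
      const batch = db.batch();
      snap.docs.forEach((doc) => batch.delete(doc.ref));
      await batch.commit();
    }
    // Out of time: hand the remainder to a fresh invocation.
    await pubsub.topic('delete-collection')
      .publish(Buffer.from(JSON.stringify({ path })));
  });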
You should be able to use one of these strategies to limit the amount of work done per function, but the implementation details are entirely up to you to work out.
If, for some reason, neither of these strategies is viable, you will have to manage your own server (perhaps via App Engine), and message it (via pubsub) to perform a single unit of long-running work in response to a Cloud Function.
