Cloud Firestore triggers run multiple times - firebase

In the cloud functions documentation for Firestore triggers, it mentions the following limitation:
"Events are delivered at least once, but a single event may result in multiple function invocations. Avoid depending on exactly-once mechanics, and write idempotent functions."
I am writing a system that checks the length of an array, and when it reaches a certain length (100), it is cleared, and some processing is done on the data. I think this is what it means by an "exactly-once" mechanic.
My question is this: is checking

if (change.after.data().array.length === change.before.data().array.length) {
  return;
}

a sufficient way to prevent multiple executions?

Is checking the following a sufficient way to prevent multiple executions?
if (change.after.data().array.length === change.before.data().array.length) { return; }
The answer is no. The Cloud Function could be run multiple times for the case where the array length was 99 and is now 100 (that is, for the single event mentioned in the doc you referred to).
There is a blog article on Building idempotent functions which explains that a common way to make a Cloud Function idempotent is "to use the event ID which remains unchanged across function retries for the same event".
You could use this method, by saving the event ID when you do the "data processing" you mention in your question, and in case there is another execution, check if the ID was already saved.
Concretely, you can create a Firestore document with the Cloud Function's event ID as its document ID, and in your Cloud Function check, within a transaction, whether this document exists. If it does not exist, you can proceed: the function was never executed for this event ID.
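A minimal sketch of that check, assuming a hypothetical processedEvents collection that exists only for deduplication:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.onArrayChange = functions.firestore
  .document('lists/{listId}')
  .onUpdate(async (change, context) => {
    const db = admin.firestore();
    // One marker document per handled event, keyed by the event ID.
    const marker = db.collection('processedEvents').doc(context.eventId);
    const alreadyHandled = await db.runTransaction(async (tx) => {
      const snap = await tx.get(marker);
      if (snap.exists) return true; // an earlier invocation got here first
      tx.set(marker, { handledAt: admin.firestore.FieldValue.serverTimestamp() });
      return false;
    });
    if (alreadyHandled) return;
    // ...safe to run the data processing once per event here
  });

In a real system you would also want to expire old marker documents, for example with a TTL policy, so the collection does not grow without bound.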
A simpler solution for your case could be to ensure that, if the Cloud Function is executed multiple times, it results in exactly the same situation in terms of data.
In other words, if the result of the "data processing" is exactly the same when the Cloud Function runs several times for array.length == 100, then your function is idempotent ("the operation results remain unchanged when an operation is applied more than once").
Of course this highly depends on the business logic of your "data processing". If, for example, it involves the time of execution, obviously the end result could be different between two executions.

Related

Using Firestore Triggers to Manage User Document Count

If every document in a collection is a user resource that is limited, how can you ensure the user does not go over their assigned limit?
My first thought was to take advantage of the Firestore triggers to avoid building a real backend, but the triggers sometimes fire more than once even if the input data has not changed. I was comparing the new doc to the old doc and taking action if certain keys did not match, but if GCP fires the same function twice, I get double the result; in this case, incrementing or decrementing counts.
The Firestore docs state:
Events are delivered at least once, but a single event may result in multiple function invocations. Avoid depending on exactly-once mechanics, and write idempotent functions.
So in my situation, the only solution I can think of is saving the event IDs somewhere and ensuring they did not fire already. Or, even worse, doing a read on each call to count the current docs and adjusting the count accordingly (increasing read costs).
Whats a smart way to approach this?
If reinvocations (which, while possible, are quite uncommon) are a concern for your use case, you could indeed store the ID of the invocation event, or something less frequent, such as (depending on the use case) the source document ID.
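As an illustration of that event-ID approach for a counter, here is a hedged sketch; handledEvents, resourceCount, and the document paths are all invented for the example. It leans on the fact that DocumentReference.create() fails if the document already exists:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.countUserResource = functions.firestore
  .document('users/{uid}/resources/{resourceId}')
  .onCreate(async (snap, context) => {
    const db = admin.firestore();
    try {
      // create() throws ALREADY_EXISTS when a duplicate invocation arrives.
      await db.collection('handledEvents').doc(context.eventId).create({ at: Date.now() });
    } catch (e) {
      return; // this event was already counted
    }
    await db.doc(`users/${context.params.uid}`).update({
      resourceCount: admin.firestore.FieldValue.increment(1),
    });
  });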

How to multiply Cloud Functions timeout time?

I know that Firebase/Google Cloud Functions can have their timeout increased up to 9 minutes, but I have a Pub/Sub function in which a single request needs around 20-30 seconds to complete (document conversion).
async function processBatch() {
  // code...
  // const convertedDoc = await convertDocument()
  // ... do something with convertedDoc
}
With the 9-minute maximum timeout, that gives me up to 18 documents I can process per invocation.
My question is: if, after the 15th document conversion, I called the Pub/Sub function again while finishing the previous invocation, would the timeout timer start over for the new function? Of course, I would need to pass all the data from the previous one, but is that a way to do it? Something like a recursive Pub/Sub of sorts?
If you don't mind, I would suggest a slightly different approach.
Let's divide the whole process into 2 steps.
The first step is a cloud function which "collects" all documents (meaning their IDs, references, or whatever metadata uniquely distinguishes one from the others) into a list, and then sends one message per document to a Pub/Sub topic. Each message contains a unique identifier/handle/hash of the document, so it can be fetched and processed later.
The Pub/Sub topic triggers (push) the second cloud function. This cloud function is deployed with a maximum-instances argument of a few dozen (or hundred), depending on the context and requirements. Thus, many cloud function instances are executed in parallel, but each instance is triggered by a message with a unique document ID.
Each instance performs the processing you described, which presumably takes 20 or 30 seconds. As many cloud functions run in parallel, the overall processing time can be much less than if everything were done sequentially.
In addition, you might like to keep the state of the process in a Firestore database, using the document ID as the Firestore record ID. Thus each record reflects the processing of one particular document. By doing that, any possible duplication can be eliminated, and a self-healing process can be organised.
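A sketch of this two-step fan-out; the topic name, collection names, and the maxInstances value are all placeholders to adapt:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');
admin.initializeApp();
const pubsub = new PubSub();

// Step 1: collect the documents and publish one message per document.
exports.collectDocs = functions.pubsub.schedule('every 24 hours').onRun(async () => {
  const snapshot = await admin.firestore().collection('docsToConvert').get();
  await Promise.all(snapshot.docs.map((doc) =>
    pubsub.topic('doc-conversion').publishMessage({ json: { docId: doc.id } })
  ));
});

// Step 2: each message triggers one instance that converts a single document.
exports.convertOne = functions
  .runWith({ maxInstances: 50 }) // tune the parallelism to your quotas
  .pubsub.topic('doc-conversion')
  .onPublish(async (message) => {
    const { docId } = message.json;
    // const convertedDoc = await convertDocument(docId); // your 20-30 s step
    // Record progress keyed by docId, so duplicate deliveries are harmless.
    await admin.firestore().doc(`conversions/${docId}`).set({ done: true });
  });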

Trigger function on batch create with firebase

In my app, I have two ways of creating users.
One is a singular add, which triggers a cloud function onCreate to send an email and do some other logic.
The other one is by batch, which ultimately triggers the same function for each added document.
My question is: how can I trigger a different function when users are added by a batch?
I looked into the Firebase documentation and it doesn't seem to have this feature. Am I wrong?
This would greatly help reduce the number of reads, and I could bulk-send emails to added users instead of sending them one by one.
There is only one Cloud Functions trigger for document creation.
What you can do is have two different functions with the same trigger and differentiate between both creation methods in code.
This can be done by adding two more values to each document:

creation_method
batchId

With creation_method, you can evaluate its value on each document to decide whether the execution continues or finishes at that point.
batchId can be used on batch-created documents to identify the whole batch.
For creation_method, I recommend three different values:

singular
batch_normal
batch_final

For the singular-creation function, verify that creation_method is singular, and that's it.
For the batch function, make it continue only when creation_method is batch_final, then get all the documents that have the same batchId, as in the sketch below.
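A rough sketch of those two functions, assuming the writer stamps each user document with the creation_method and batchId fields described above (sendWelcomeEmail and sendBulkEmails are hypothetical helpers):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.onSingleUserCreated = functions.firestore
  .document('users/{uid}')
  .onCreate(async (snap) => {
    if (snap.data().creation_method !== 'singular') return;
    // await sendWelcomeEmail(snap.data()); // hypothetical helper
  });

exports.onBatchUsersCreated = functions.firestore
  .document('users/{uid}')
  .onCreate(async (snap) => {
    const { creation_method, batchId } = snap.data();
    // batch_normal documents finish here; only the final one continues.
    if (creation_method !== 'batch_final') return;
    const batch = await admin.firestore()
      .collection('users')
      .where('batchId', '==', batchId)
      .get();
    // await sendBulkEmails(batch.docs.map((d) => d.data())); // hypothetical helper
  });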
Note that this approach will not reduce the reads: reads are billed per document read, so unless you depend on additional documents, the number of reads will be the same.
As a workaround, if you want to reduce the amount you are billed for reads, you can switch to the Realtime Database: the triggers you mentioned also exist there, and it has the advantage that it doesn't bill for reads.

Firebase Firestore, Delete Collection with a Callable Cloud Function

If you look here: https://firebase.google.com/docs/firestore/solutions/delete-collections
you can see the following:
Consistency - the code above deletes documents one at a time. If you query while there is an ongoing delete operation, your results may reflect a partially complete state where only some targeted documents are deleted. There is also no guarantee that the delete operations will succeed or fail uniformly, so be prepared to handle cases of partial deletion.
So how do I handle this correctly?
Does it mean "prevent users from accessing this collection while deletion is in progress"?
Or "if the operation is interrupted partway, call the function again from the failed part to carry the deletion through to completion"?
So how do I handle this correctly?
It's suggesting that you should check for failures, and retry until there are no documents remaining (or at least until you are satisfied with the result).
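One hedged way to do that from a callable Cloud Function is to loop, deleting in fixed-size batches until a query comes back empty; the 250-document page size is arbitrary (batched writes accept up to 500 operations):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.deleteCollection = functions.https.onCall(async (data, context) => {
  // In real code, validate context.auth and whitelist the allowed paths.
  const db = admin.firestore();
  const ref = db.collection(data.path);
  let deleted = 0;
  for (;;) {
    const snap = await ref.limit(250).get();
    if (snap.empty) break; // nothing left: the deletion is complete
    const batch = db.batch();
    snap.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
    deleted += snap.size;
  }
  return { deleted };
});

If an invocation fails or times out partway through, the client can simply call it again; the loop picks up whatever documents remain.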

How to run multiple Firestore Functions Sequentially?

We have 20 functions that must run every day. Each of these functions does something different based on inputs from the previous function.
We tried calling all the functions in one function, but it hits the timeout error as these 20 functions take more than 9 minutes to execute.
How can we trigger these multiple functions sequentially, or avoid timeout error for one function that executes each of these functions?
There is no configuration or easy way to get this done. You will have to set up a fair amount of code and infrastructure to get this done.
The most straightforward solution involves chaining together calls using pubsub type functions. You can send a message to a pubsub topic that will trigger the next function to run. The payload of the message to send can be the parameters that the function should use to determine how it should operate. If the payload is too big, or some more complex sources of data are required to make that decision, you can use a database to store intermediate data that the next function can query and use.
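A sketch of that chaining; the topic names and payload shape are invented for illustration:

const functions = require('firebase-functions');
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

exports.step1 = functions.pubsub.topic('step-1').onPublish(async (message) => {
  const input = message.json; // parameters for this step
  const result = { value: 42 /* ...output of step 1... */ };
  // Trigger the next function by publishing its input to the next topic.
  await pubsub.topic('step-2').publishMessage({ json: result });
});

exports.step2 = functions.pubsub.topic('step-2').onPublish(async (message) => {
  const input = message.json; // the previous step's result
  // ...one function per step, each staying under the 9-minute limit
});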
Since we don't have any more specific details about how your functions actually work, nothing more specific can be said. If you run into problems with a specific detail of this scheme, please post again describing specifically what you're trying to do and what's not working the way you expect.
There is a variant of Doug's solution. At the end of the function, instead of publishing a message to Pub/Sub, simply write a specific log entry (for example " end").
Then go to Stackdriver Logging, search for this specific log trace (turn on advanced filters), and configure a sink to a Pub/Sub topic for this log entry. Thereby, every time the log is detected, a Pub/Sub message is published with the log content.
Finally, plug your next function into this Pub/Sub topic.
If you need to pass values from one function to another, you can simply add these values to the log trace at the end of the first function and parse them at the beginning of the next one.
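A sketch of the two ends of that pipe, assuming the sink routes matching entries to a topic named fn-chain; a log sink delivers the full LogEntry as JSON in the Pub/Sub message, with plain console.log output landing in its textPayload field:

const functions = require('firebase-functions');

// At the end of the first function: emit the marker log carrying the values.
console.log(` end ${JSON.stringify({ userId: 42 })}`);

// The next function, subscribed to the sink's topic.
exports.nextStep = functions.pubsub.topic('fn-chain').onPublish((message) => {
  const logEntry = message.json; // the sink delivers the whole LogEntry
  const values = JSON.parse(logEntry.textPayload.replace(' end ', ''));
  // ...continue processing with values
});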
Chaining functions is not an easy thing to do. Things are coming; maybe Google Cloud Next will announce new products to help you with this task.
If you simply want the functions to execute in order, and you don't need to pass the result of one directly to the next, you could wrap them in a scheduled function (docs) that spaces them out with enough time for each to run.
Sketch below with 3 minute spacing:
exports.myScheduler = functions.pubsub
  .schedule('every 3 minutes from 22:00 to 23:00')
  .onRun((context) => {
    // Check the time; server clocks are UTC, so adjust for your schedule's timezone.
    const time = new Date().toTimeString().slice(0, 5); // e.g. '22:03'
    if (time === '22:00') func1of20();
    else if (time === '22:03') func2of20();
    // etc. through func20of20()
  });
If you do need to pass the results of each function to the next, func1 could store its result in a DB entry, then func2 starts by reading that result and ends by overwriting it with its own, so func3 can read it when fired 3 minutes later, and so on. Though perhaps in this case, the other solutions are more tailored to your needs.
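A minimal sketch of that hand-off through Firestore, with the document path and field names chosen just for illustration:

const admin = require('firebase-admin');
admin.initializeApp();

async function func2of20() {
  const ref = admin.firestore().doc('pipeline/latestResult');
  // Read what the previous function left behind...
  const previous = (await ref.get()).data();
  const result = { step: 2, value: previous.value /* ...computed from previous... */ };
  // ...and overwrite it for the next function in the chain.
  await ref.set(result);
}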
