How to run multiple Firestore Functions Sequentially? - firebase

We have 20 functions that must run every day. Each of these functions does something different based on the output of the previous function.
We tried calling all of them from a single function, but it hits the timeout because together these 20 functions take more than 9 minutes to execute.
How can we trigger these functions sequentially, or avoid the timeout in a single function that calls each of them?

There is no configuration setting or easy way to do this. You will have to set up a fair amount of code and infrastructure yourself.
The most straightforward solution is to chain the calls using Pub/Sub-triggered functions. When a function finishes, it sends a message to a Pub/Sub topic, and that message triggers the next function. The message payload can carry the parameters the next function needs to decide how to operate. If the payload is too large, or the decision depends on more complex data, you can store intermediate data in a database that the next function queries.
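A minimal sketch of that chain, assuming each step only needs the previous step's output. The in-memory `topic` array stands in for a real Pub/Sub topic: in a deployed setup, each function would publish the returned payload to the next topic instead of pushing it onto an array. `steps`, `runStep`, `runChain`, and the `{ step, data }` payload shape are hypothetical.

```javascript
// Hypothetical step functions: each receives the previous step's output
// and returns its own output. Real steps would do I/O instead.
const steps = [
  (input) => input + 1,
  (input) => input * 2,
  (input) => input - 3,
];

// In-memory stand-in for a Pub/Sub topic: a list of pending messages.
const topic = [];

// One "function invocation": handle a single message, then publish the
// next message instead of calling the next step directly. Each real
// invocation gets its own timeout, so no single call runs all steps.
function runStep(message) {
  const { step, data } = message;
  const output = steps[step](data);
  if (step + 1 < steps.length) {
    topic.push({ step: step + 1, data: output });
    return null;
  }
  return output; // last step: the chain is done
}

// Deliver messages one at a time, the way the topic would trigger functions.
function runChain(initialData) {
  topic.push({ step: 0, data: initialData });
  let result = null;
  while (topic.length > 0) {
    result = runStep(topic.shift());
  }
  return result;
}

console.log(runChain(5)); // 9, i.e. (5 + 1) * 2 - 3
```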
Since we don't have more specific details about how your functions work, nothing more specific can be said. If you run into problems with a particular detail of this scheme, please post a new question describing specifically what you're trying to do and what isn't working the way you expect.

There is a variant of Doug's solution. At the end of the function, instead of publishing a message to Pub/Sub, simply write a specific log line (for example "end").
Then, in Stackdriver Logging, search for this specific log entry (turn on advanced filters) and configure a sink that exports matching entries to a Pub/Sub topic. From then on, every time such an entry is logged, a Pub/Sub message containing the log content is published.
Finally, trigger your next function from this Pub/Sub topic.
If you need to pass values from one function to the next, you can simply add them to the log line at the end of the function and parse them at the beginning of the next one.
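A sketch of that log-based handoff, assuming the sink delivers each matching entry to Pub/Sub as a JSON `LogEntry` (base64-encoded in the message data) and that a plain-text log line ends up in the entry's `textPayload`. The `"end "` prefix, `buildEndLog`, and `parseSinkMessage` are hypothetical; the prefix also keeps the line from being valid JSON, so it should stay a text payload rather than being parsed into structured fields.

```javascript
const END_MARKER = "end ";

// At the end of function N: the log line starts with the marker the sink
// filter matches, followed by the values function N+1 needs.
function buildEndLog(values) {
  return END_MARKER + JSON.stringify(values);
}

// At the start of function N+1: decode the Pub/Sub message data
// (assumed base64 JSON LogEntry) and recover the values from textPayload.
function parseSinkMessage(base64Data) {
  const entry = JSON.parse(Buffer.from(base64Data, "base64").toString("utf8"));
  const text = entry.textPayload || "";
  if (!text.startsWith(END_MARKER)) {
    throw new Error("Not an end-of-function log entry");
  }
  return JSON.parse(text.slice(END_MARKER.length));
}

// Simulated round trip: function N logs the line (console.log(line) in a
// real function), the sink wraps it in a LogEntry, Pub/Sub base64-encodes it.
const line = buildEndLog({ batchId: "2024-01-01", processed: 42 });
const fakeEntry = { textPayload: line, severity: "INFO" };
const data = Buffer.from(JSON.stringify(fakeEntry)).toString("base64");
console.log(parseSinkMessage(data)); // { batchId: '2024-01-01', processed: 42 }
```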
Chaining functions is not an easy thing to do. New tools may be coming; perhaps Google Cloud Next will announce products that help with this task.

If you simply want the functions to execute in order, and you don't need to pass the result of one directly to the next, you could wrap them in a scheduled function (docs) that spaces them out with enough time for each to run.
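Here is a sketch with 3-minute spacing:
```javascript
const functions = require('firebase-functions');

exports.myScheduler = functions.pubsub
  .schedule('every 3 minutes from 22:00 to 23:00')
  .onRun(async (context) => {
    // Format the event time as "HH:MM" in the schedule's default time zone
    const time = new Date(context.timestamp).toLocaleTimeString('en-GB', {
      hour: '2-digit', minute: '2-digit', timeZone: 'America/Los_Angeles',
    });
    if (time === '22:00') return func1of20();
    else if (time === '22:03') return func2of20();
    // etc. through func20of20()
    return null;
  });
```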
If you do need to pass each function's result to the next, func1 could store its result in a database entry; func2 then starts by reading that result and ends by overwriting it with its own, so func3 can read it when it fires 3 minutes later, and so on. In that case, though, the other solutions may be better suited to your needs.
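Instead of a 20-branch `if`/`else` chain, the scheduler can map the fired time to an index into an array of steps. The sketch below also shows the result handoff described above, with a plain `Map` standing in for the database entry. The 3-minute spacing and 22:00 start come from the answer; `slotIndex`, `runSlot`, the `"lastResult"` key, and the step functions are hypothetical.

```javascript
const START_MINUTES = 22 * 60; // 22:00
const SPACING = 3; // minutes between steps

// Map an "HH:MM" string to a step index, or -1 if it is not a slot.
function slotIndex(time, stepCount) {
  const [h, m] = time.split(":").map(Number);
  const offset = h * 60 + m - START_MINUTES;
  if (offset < 0 || offset % SPACING !== 0) return -1;
  const index = offset / SPACING;
  return index < stepCount ? index : -1;
}

// Run the step for this slot: read the previous result from the store,
// then overwrite it with this step's result for the next slot to read.
function runSlot(time, steps, store) {
  const index = slotIndex(time, steps.length);
  if (index === -1) return false;
  const previous = store.get("lastResult");
  store.set("lastResult", steps[index](previous));
  return true;
}

// 20 hypothetical steps that each append their own number to the result.
const steps = Array.from({ length: 20 }, (_, i) => (prev) => [...(prev || []), i + 1]);
const store = new Map();
for (let t = START_MINUTES; t <= START_MINUTES + 60; t += SPACING) {
  const hh = String(Math.floor(t / 60)).padStart(2, "0");
  const mm = String(t % 60).padStart(2, "0");
  runSlot(`${hh}:${mm}`, steps, store); // 23:00 falls past the last step
}
console.log(store.get("lastResult").length); // 20
```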

Related

Cloud Firestore triggers run multiple times

In the cloud functions documentation for Firestore triggers, it mentions the following limitation:
"Events are delivered at least once, but a single event may result in multiple function invocations. Avoid depending on exactly-once mechanics, and write idempotent functions."
I am writing a system that checks the length of an array, and when it reaches a certain length (100), it is cleared, and some processing is done on the data. I think this is what it means by an "exactly-once" mechanic.
My question is: is the following check a sufficient way to prevent multiple executions?
```javascript
if (change.after.data().array.length === change.before.data().array.length) {
  return;
}
```
Is checking the following a sufficient way to prevent multiple executions?
if (change.after.data().array.length === change.before.data().array.length) { return; }
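The answer is no. The Cloud Function could run multiple times for the single change in which the array length went from 99 to 100 (this is the "single event" mentioned in the doc you referred to). Every duplicate invocation receives the same before and after snapshots, so your length check passes in each of them.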
There is a blog article, "Building idempotent functions", which explains that a common way to make a Cloud Function idempotent is "to use the event ID which remains unchanged across function retries for the same event".
You could use this method by saving the event ID when you do the "data processing" you mention in your question, and, on any further execution, checking whether that ID was already saved.
Concretely, you can create a Firestore document whose document ID is the Cloud Function event ID. In your Cloud Function, check in a transaction whether this document exists, and create it if it does not. If it did not exist, you can proceed: the function was never executed for this event ID.
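A sketch of that check-and-record step. In a real first-generation Firestore trigger, the ID would come from `context.eventId` and the check would run inside `db.runTransaction`; here a `Set` stands in for the collection of processed-event documents. Because the check, the work, and the insert run synchronously in one function, nothing can interleave between them, which is the guarantee the transaction provides in Firestore. `processOnce`, `processedEvents`, and `clearAndProcess` are hypothetical names.

```javascript
// Stand-in for a "processedEvents" collection keyed by event ID.
const processedEvents = new Set();

// Run `work` at most once per event ID. In Firestore, the existence check
// and the marker write would happen inside one transaction.
function processOnce(eventId, work) {
  if (processedEvents.has(eventId)) {
    return { skipped: true };
  }
  // If work() throws, no marker is written, so a retry can run it again.
  const result = work();
  processedEvents.add(eventId);
  return { skipped: false, result };
}

let runs = 0;
const clearAndProcess = () => ++runs;

// The same event delivered twice (at-least-once delivery), then a new event.
console.log(processOnce("evt-1", clearAndProcess)); // { skipped: false, result: 1 }
console.log(processOnce("evt-1", clearAndProcess)); // { skipped: true }
console.log(processOnce("evt-2", clearAndProcess)); // { skipped: false, result: 2 }
```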
A simpler solution for your case could be to ensure that running the Cloud Function multiple times leaves the data in exactly the same state.
In other words, if the result of the "data processing" is exactly the same when the Cloud Function runs several times for array.length == 100, then your function is idempotent ("the operation results remain unchanged when an operation is applied more than once").
Of course, this depends heavily on the business logic of your "data processing". If, for example, it involves the time of execution, the end result could obviously differ between two executions.
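To illustrate the difference, the sketch below compares an update that overwrites a summary computed only from the input with one that increments a stored total. Replaying the same event leaves the first unchanged but double-counts in the second. The summary shape and both functions are hypothetical.

```javascript
// Idempotent: the new state depends only on the input array.
function overwriteSummary(state, items) {
  return { ...state, total: items.reduce((sum, x) => sum + x, 0), count: items.length };
}

// Not idempotent: the new state also depends on the previous state.
function incrementSummary(state, items) {
  return { ...state, total: (state.total || 0) + items.reduce((sum, x) => sum + x, 0) };
}

const items = [1, 2, 3];
let a = {};
let b = {};
// Deliver the same event twice.
for (let i = 0; i < 2; i++) {
  a = overwriteSummary(a, items);
  b = incrementSummary(b, items);
}
console.log(a.total, b.total); // 6 12
```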

How to multiply Cloud Functions timeout time?

I know that the Firebase/Google Cloud Functions timeout can be increased up to 9 minutes, but I have a Pub/Sub function in which a single request takes around 20-30 seconds to complete (document conversion).
```javascript
async function processDocuments() {
  // code...
  // const convertedDoc = await convertDocument()
  // ... do something with convertedDoc
}
```
With a 9-minute maximum timeout, that gives me at most 18 documents per invocation (540 s / 30 s).
My question is: if, after the 15th document conversion, I trigger the Pub/Sub function again while the previous invocation finishes, will the timeout timer start over for the new invocation? Of course, I would need to pass all the data from the previous invocation, but is that a way to do it? Something like a recursive Pub/Sub of sorts?
If you don't mind, I would suggest a slightly different approach.
Let's divide the whole process into 2 steps.
The first step is a cloud function that "collects" all documents (I mean their IDs, references, or metadata that uniquely distinguish one from the others) into a list, and then sends one message per document to a Pub/Sub topic. Each message contains a unique identifier/handle/hash of the document, so it can be fetched and processed later.
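A minimal sketch of the first step and of how the second function reads one message, assuming documents are identified by string IDs. `buildMessages`, `handleMessage`, and the `{ docId }` JSON payload are hypothetical; a deployed function would publish each payload to the topic rather than return an array.

```javascript
// Turn a list of document IDs into one small message per document.
// Each message carries only the handle, not the document itself.
function buildMessages(docIds) {
  const unique = [...new Set(docIds)]; // avoid publishing the same document twice
  return unique.map((docId) => JSON.stringify({ docId }));
}

// The second function receives one message and handles one document.
function handleMessage(message, convert) {
  const { docId } = JSON.parse(message);
  return convert(docId);
}

const messages = buildMessages(["a", "b", "a", "c"]);
console.log(messages.length); // 3
console.log(messages.map((m) => handleMessage(m, (id) => `converted:${id}`)));
// [ 'converted:a', 'converted:b', 'converted:c' ]
```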
The Pub/Sub topic triggers (push) the second cloud function. This cloud function is deployed with a maximum instances setting of a few dozen (or a few hundred), depending on the context and requirements. Thus, many cloud function instances run in parallel, but each instance is triggered by a message with a unique document ID.
The cloud function performs the processing you described, which presumably takes 20 or 30 seconds. Because many instances run in parallel, the overall processing time can be shorter than if everything were done sequentially.
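A rough back-of-the-envelope model of why this helps, assuming every conversion takes the same 30 seconds and instances start instantly (cold starts and quotas are ignored). For a hypothetical batch of 100 documents, sequential processing needs 3000 seconds, far beyond the 540-second limit, while 50 parallel instances finish in two rounds and each invocation only needs 30 seconds. `wallClockSeconds` is a hypothetical helper.

```javascript
// Idealized wall-clock time for `docs` documents, each taking
// `secondsPerDoc`, with at most `maxInstances` running at once.
function wallClockSeconds(docs, secondsPerDoc, maxInstances) {
  const rounds = Math.ceil(docs / maxInstances);
  return rounds * secondsPerDoc;
}

const TIMEOUT = 9 * 60; // 540 s per invocation
console.log(wallClockSeconds(100, 30, 1)); // 3000, far over TIMEOUT
console.log(wallClockSeconds(100, 30, 50)); // 60
// Each invocation handles one document, so it only needs 30 s.
console.log(30 < TIMEOUT); // true
```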
In addition, you might want to keep the state of each process in a Firestore database, using the document ID as the Firestore record ID. Each record then reflects the processing of one particular document. By doing that, duplicate processing can be eliminated, and a self-healing process can be organised.
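A sketch of that per-document state, with a `Map` keyed by document ID standing in for the Firestore collection. The `"pending"`/`"done"` states and the helper names are hypothetical. A duplicate delivery of the same message finds `"done"` and returns without converting again, and a periodic sweep can re-publish anything still `"pending"`.

```javascript
// Stand-in for a Firestore collection: document ID -> processing state.
const state = new Map();

// Step 1 records every collected document as pending.
function markPending(docIds) {
  for (const id of docIds) if (!state.has(id)) state.set(id, "pending");
}

// Worker: skip documents that are already done (duplicate deliveries).
function handleOnce(docId, convert) {
  if (state.get(docId) === "done") return false;
  convert(docId);
  state.set(docId, "done");
  return true;
}

// Self-healing sweep: anything still pending can be re-published.
function findPending() {
  return [...state].filter(([, s]) => s === "pending").map(([id]) => id);
}

let conversions = 0;
markPending(["a", "b", "c"]);
handleOnce("a", () => conversions++);
handleOnce("a", () => conversions++); // duplicate message, skipped
handleOnce("b", () => conversions++);
console.log(conversions, findPending()); // 2 [ 'c' ]
```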

Trigger function on batch create with firebase

In my app, I have two ways of creating users.
One is a single add, which triggers an onCreate cloud function that sends an email and does some other logic.
The other one is a batch, which ultimately triggers the same function for each added document.
The question is: how can I trigger a different function when users are added in a batch?
I looked into the Firebase documentation and it doesn't seem to have this feature. Am I wrong?
This would greatly help reduce the number of reads, and I could bulk-send emails to the added users instead of sending them one by one.
There is only one Cloud Functions trigger for document creation.
What you can do is have two different functions with the same trigger and differentiate between the two creation methods in code.
This can be done by adding two more fields to the document:
creation_method
batch
With creation_method, each function checks the value on every document to decide whether execution continues or stops at that point.
batch identifies all the documents created in the same batch.
For creation_method I recommend three different values:
singular
batch_normal
batch_final
For batch, just store a batch ID.
The function for singular creation verifies that the value is singular, and that's it.
The batch function only continues on the batch_final value, and then gets all the documents that have the same batch ID.
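A sketch of that scheme, using the `creation_method` and `batch` fields described above. `markBatch`, `singularShouldRun`, `batchShouldRun`, and `collectBatch` are hypothetical helpers: in real code, `markBatch` would prepare the documents for a Firestore batched write, and the batch function would query the collection for documents with the same `batch` value instead of filtering an array. Because a batched write commits all its documents atomically, the `batch_final` function should be able to read every sibling by the time it runs.

```javascript
// Tag every user in a batch; only the last one is "batch_final".
function markBatch(users, batchId) {
  return users.map((user, i) => ({
    ...user,
    creation_method: i === users.length - 1 ? "batch_final" : "batch_normal",
    batch: batchId,
  }));
}

// onCreate handler #1: only continues for single adds.
function singularShouldRun(doc) {
  return doc.creation_method === "singular";
}

// onCreate handler #2: only the final document of a batch continues...
function batchShouldRun(doc) {
  return doc.creation_method === "batch_final";
}

// ...and then collects every document with the same batch ID.
function collectBatch(allDocs, batchId) {
  return allDocs.filter((d) => d.batch === batchId);
}

const docs = [
  { email: "solo@example.com", creation_method: "singular" },
  ...markBatch([{ email: "a@example.com" }, { email: "b@example.com" }], "batch-1"),
];
const final = docs.find(batchShouldRun);
console.log(docs.filter(singularShouldRun).length); // 1
console.log(collectBatch(docs, final.batch).map((d) => d.email));
// [ 'a@example.com', 'b@example.com' ]
```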
This approach will not reduce the reads, because reads are billed per document read; unless you depend on additional documents, the number of reads will be the same.
As a workaround, if you want to reduce what you are billed for reads, you can switch to Realtime Database: the triggers you mentioned also exist there, and it has the advantage of not billing per read (it bills for stored and downloaded data instead).

Firebase, how to implement scheduler?

Each document I store in Firestore contains a specific time in the future, and at that time an event should occur in the user's app.
The first approach I found was the Cloud Functions Pub/Sub scheduler. However, I could not use it because its schedule is fixed.
The second approach was Cloud Functions + Cloud Tasks. I followed this article:
https://medium.com/firebase-developers/how-to-schedule-a-cloud-function-to-run-in-the-future-in-order-to-build-a-firestore-document-ttl-754f9bf3214a
This did exactly what I wanted, but Cloud Tasks has a fatal drawback: a task can only be scheduled up to 30 days in the future. In other words, events more than 30 days away cannot be handled this way.
I want these events to be stored long term, and I want the solution to handle heavy traffic reasonably smoothly.
I'm using Flutter/Firebase. How can I implement the requirements above?
Thank you for reading, and happy new year!
In the function that runs on document creation, you could check whether the task is due in more than 30 days and, if so, store it somewhere else (for example, in another document). Then have another process that checks whether a stored task is now within the 30-day range and, if so, handles it the same way as newly created ones. This second process could run every week or every two weeks.
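A sketch of that split, with in-memory `queue` and `parked` arrays standing in for Cloud Tasks and the holding collection; `schedule`, `sweep`, and the `{ id, dueAt }` event shape are hypothetical. As long as the sweep runs more often than every 30 days (weekly is plenty), every parked event is moved into the queue before it becomes due.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const HORIZON_MS = 30 * DAY_MS; // Cloud Tasks scheduling limit

const queue = []; // stand-in for Cloud Tasks
const parked = []; // stand-in for a "pending events" collection

// Called when a document is created.
function schedule(event, now) {
  if (event.dueAt - now <= HORIZON_MS) queue.push(event);
  else parked.push(event);
}

// Run periodically (e.g. weekly): move events that entered the horizon.
function sweep(now) {
  for (let i = parked.length - 1; i >= 0; i--) {
    if (parked[i].dueAt - now <= HORIZON_MS) {
      queue.push(parked[i]);
      parked.splice(i, 1);
    }
  }
}

const now = 0;
schedule({ id: "soon", dueAt: now + 3 * DAY_MS }, now);
schedule({ id: "later", dueAt: now + 45 * DAY_MS }, now);
console.log(queue.length, parked.length); // 1 1
sweep(now + 21 * DAY_MS); // "later" is now 24 days away
console.log(queue.length, parked.length); // 2 0
```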

Is there a way to define a trigger that runs reliably at a datetime specified as a field in the updated/created object?

The question
Is it possible (and if so, how) to have a specific trigger (probably a serverless function) called at the time stored in an object's timestamp field x whenever that object is created or updated?
My specific context
In my specific case, the object can be seen as a task. When the task is created, I want a serverless function to try to complete it; if it doesn't succeed, the function updates the record with the partial results and sets field x to the time of the next attempt.
The attempts are not at a fixed interval. For example, a task may require 10 successive attempts roughly every 30 seconds, but then it may need to wait 8 hours.
There currently is no way to (re)trigger a Cloud Function on a node after a certain timespan.
The closest you can get is to regularly run a scheduled cron job over the list of tasks. For more on that, see this sample in the functions-samples repo, this blog post by Abe, and this video where Jen explains them.
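The cron approach boils down to a periodic pass over the tasks that are due. The sketch below assumes each task stores its next attempt time in the numeric field `x` (milliseconds since the epoch) and that `attempt` returns either `{ done: true }` or a new `nextAttemptAt`; in Firestore, the due-task filter would be a `where("x", "<=", now)` query inside a scheduled function. `cronTick`, `attempt`, and the task shape are hypothetical. Note that attempts can only happen on cron ticks, so the timing is only as precise as the cron interval.

```javascript
// One cron tick: find tasks whose next attempt time has passed,
// run an attempt, and either finish the task or reschedule it.
function cronTick(tasks, now, attempt) {
  for (const task of tasks) {
    if (task.done || task.x > now) continue;
    const outcome = attempt(task);
    if (outcome.done) task.done = true;
    else task.x = outcome.nextAttemptAt;
  }
}

const tasks = [
  { id: "t1", x: 0, tries: 0 },
  { id: "t2", x: 60000, tries: 0 },
];
// Hypothetical attempt: succeeds on the second try, else retries in 30 s.
const attempt = (task) => {
  task.tries++;
  return task.tries >= 2 ? { done: true } : { nextAttemptAt: task.x + 30000 };
};

cronTick(tasks, 0, attempt); // t1 tried once, rescheduled to 30000
cronTick(tasks, 30000, attempt); // t1 succeeds; t2 is not due yet
console.log(tasks.map((t) => [t.id, !!t.done, t.x]));
// [ [ 't1', true, 30000 ], [ 't2', false, 60000 ] ]
```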
I admit I've never liked this cron-job approach, since you have to query the list to find the items to process. A while ago, I wrote a more efficient solution that runs a priority queue in a Node process. My code was a bit messy, so I'm not quite ready to share it, but it wasn't much (under 100 lines). So if the cron-trigger approach doesn't work for you, I recommend investigating that direction.
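For illustration only, here is one way such a priority queue could look; this is not the answer author's unpublished code. It is a binary min-heap ordered by each task's next attempt time `x`, so a long-running process only ever looks at the earliest task and can sleep until `heap.peek().x` instead of querying the whole list. `TaskHeap` and `popDue` are hypothetical names.

```javascript
// Min-heap of tasks ordered by next attempt time (field `x`).
class TaskHeap {
  constructor() { this.items = []; }
  get size() { return this.items.length; }
  peek() { return this.items[0]; }

  push(task) {
    const a = this.items;
    a.push(task);
    let i = a.length - 1;
    // Sift up while the parent is due later than this task.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (a[parent].x <= a[i].x) break;
      [a[parent], a[i]] = [a[i], a[parent]];
      i = parent;
    }
  }

  pop() {
    const a = this.items;
    const top = a[0];
    const last = a.pop();
    if (a.length > 0) {
      a[0] = last;
      let i = 0;
      // Sift down until both children are due no earlier than this task.
      for (;;) {
        const l = 2 * i + 1, r = l + 1;
        let smallest = i;
        if (l < a.length && a[l].x < a[smallest].x) smallest = l;
        if (r < a.length && a[r].x < a[smallest].x) smallest = r;
        if (smallest === i) break;
        [a[smallest], a[i]] = [a[i], a[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}

// Pop every task that is due at `now`; the process would then sleep
// until heap.peek().x before checking again.
function popDue(heap, now) {
  const due = [];
  while (heap.size > 0 && heap.peek().x <= now) due.push(heap.pop());
  return due;
}

const heap = new TaskHeap();
for (const [id, x] of [["c", 300], ["a", 100], ["d", 8 * 3600e3], ["b", 200]]) heap.push({ id, x });
console.log(popDue(heap, 250).map((t) => t.id)); // [ 'a', 'b' ]
console.log(heap.peek().id); // c
```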

Resources