Firebase Functions: What scheduling algorithm is used when retrying events? (exponential backoff, etc)

When Firebase Function events are retried, what replay algorithm is used? Or how often can I expect events to be retried?

Cloud Functions does not offer a guaranteed behavior on this. The documentation merely states:
When you enable retries on a background function, Cloud Functions will retry a failed function invocation until it completes successfully, or the retry window (by default, 7 days) expires.
And that's all you're given. I would take this to mean that the system is free to adjust the retry frequency as it sees fit, based on internal configuration and current conditions.

Related

Is it possible to reduce the retry period of 7 days of a firebase function to a lesser value?

This documentation page describes how to enable retries for asynchronous Firebase functions. It mentions that the maximum retry period is 7 days.
Cloud Functions guarantees at-least-once execution of an event-driven
function for each event emitted by an event source. However, by
default, if a function invocation terminates with an error, the
function will not be invoked again, and the event will be dropped.
When you enable retries on an event-driven function, Cloud Functions
will retry a failed function invocation until it completes
successfully or the retry window expires (by default, after 7 days).
Is there a way to reduce the retry period to a few minutes, from the default value of 7 days?
Posting my comment as an answer:
"Unfortunately, the default Firebase Functions retry period of 7 days cannot be shortened to a few minutes. The longest possible retry period is specified by Google Cloud Functions and is 7 days. As a workaround, you could create a new function that is activated by a timer. This timer-triggered function can monitor the state of the original function and, if necessary, retry it at predetermined intervals."
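A related workaround is to enforce your own, shorter window inside the function itself: check the event's age at the top of the handler and return normally once it exceeds your cutoff, which ends retries early (Cloud Functions only retries while the function throws or rejects). This is a sketch; `isEventTooOld` and the 5-minute cutoff are illustrative assumptions, not Firebase APIs.

```javascript
// Sketch: end retries early by dropping events older than a chosen cutoff.
// Cloud Functions keeps retrying only while the function throws or rejects,
// so returning normally for a stale event effectively shortens the 7-day
// retry window. The cutoff value is an illustrative assumption.
function isEventTooOld(eventTimestamp, maxAgeMs, now = Date.now()) {
  return now - Date.parse(eventTimestamp) > maxAgeMs;
}

// Usage sketch inside a background function:
// exports.handler = functions.database.ref('/path').onWrite((change, context) => {
//   if (isEventTooOld(context.timestamp, 5 * 60 * 1000)) {
//     console.log('Event too old, ending retries');
//     return null; // resolve successfully so the event is not retried again
//   }
//   // ... real work ...
// });
```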

Root cause and retry of "The request was aborted because there was no available instance." error in Cloud Functions

Over time, we see sometimes bursts of errors in our Cloud Functions - "The request was aborted because there was no available instance." with HTTP response 500, which indicates Cloud Functions intrinsically cannot manage the rate of traffic.
This happens for Cloud Functions triggered by changes on Firestore, RTDB, PubSub and even scheduled functions.
According to the troubleshooting guide, this can happen due to sudden increase of traffic, long cold-starts or long request processing.
We also understand that it's good practice to use an exponential backoff retry mechanism where it's important that the Cloud Function executes.
We know it's not a max-instance issue as we didn't set one for these functions, and also the error is 500 and not 429.
Questions:
Can we identify the underlying root cause, e.g. is it a cold start? Is it a long-running function that causes it?
When functions fail due to cold-start time, does the cold start include only the time it takes to provision the instance and put the code there, or also the initial execution of the runtime environment (e.g. node index.js), which also runs the code in the global scope?
Cloud Functions have a retry-on-failure configuration. Does it also cover the "no available instance" case we experienced?
This error can be caused by one of the following:
A huge sudden increase in traffic.
A long cold start time.
A long request processing time.
Transient factors attributed to the Cloud Run service.
As mentioned in this GitHub issue, Cloud Run does not mark request logs with information about whether they caused a cold start. However, Stackdriver, a suite of monitoring tools (Stackdriver Logging, Stackdriver Error Reporting, Stackdriver Monitoring), helps you understand what is going on in your Cloud Functions; it has built-in tools for logging, error reporting, and monitoring. Apart from Stackdriver, you can view execution times, execution counts and memory usage in the GCP console. You can refer to Stackdriver Logging, Stackdriver Trace for Cloud Functions, and Error Reporting.
A cold start includes the time it takes to provision the instance as well as the initial execution of the runtime environment. I think the retry-on-failure configuration does not cover the "no available instance" case.
I have found a GitHub thread and an issue tracker entry raised for a similar issue, which is still open. If you are still facing the issue, you can follow that issue for future updates and add your concerns there.
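Where it's important that a call eventually succeeds, the exponential-backoff retry mentioned above can be sketched on the caller side like this. The helper names and the base/max delay values are illustrative assumptions, not values from the Cloud Functions docs.

```javascript
// Sketch: caller-side retry with exponential backoff and jitter for transient
// failures such as "no available instance" (HTTP 500).
function backoffDelayMs(attempt, baseMs = 200, maxMs = 30000) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt); // 200, 400, 800, ...
  return Math.floor(exp / 2 + Math.random() * (exp / 2)); // "equal jitter"
}

async function callWithRetry(fn, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Jitter spreads retries out so that a burst of failing clients does not hammer the service again at the same instant.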

Firebase Cloud Function: How many concurrent invocations can a single HTTP trigger type function have at any given time?

From the doc, in the "Rate Limits" section, calls are restricted to 16 invocations per 100 seconds. But I am not sure if that applies to onRequest/onCall HTTP triggers. That limit seems to be related to CLI deployment or "testing via Firebase Console" (whatever that means), not calls from the client mobile SDK. If the restriction is legitimate, it seems too limiting for something that is advertised to scale to potentially "millions/billions". I have a use case where 500 or so mobile users will call an HTTP endpoint that performs a mixture of reads/writes to Firestore plus some processing at a moment's notice.
The "rate limits" table here mentions 16 invocations per 100 seconds, but that table actually refers to usage limits on the Cloud Functions API used to list/deploy cloud functions in a Firebase project, not invocation limits on the actual Cloud Functions, which are much more generous.
The rate limits for background functions, such as Firebase event handlers (onCreate, onUpdate, etc.) or Pub/Sub scheduled functions, are 3,000 concurrent invocations (for a function that takes 100 seconds to execute).
The rate limits for HTTP onCall functions are unlimited, at the moment. They simply scale up to accommodate higher traffic.
Background functions have additional limits, as explained below; these limits do not apply to HTTP functions.
See the Cloud Functions quotas page for the full limits table.

setTimeout() on realtime database trigger in Google cloud functions

I have a Firebase realtime database structure that looks like this:
rooms/
  room_id/
    user_1/
      name: 'hey',
      connected: true
connected is a Boolean indicating whether the user is connected, and will be set to false using the onDisconnect() handler that Firebase provides.
Now my question is: if I trigger a cloud function every time the connected property of a user changes, can I run a setTimeout() for 45 seconds? If the connected property is still false at the end of the setTimeout() (for which I read that particular connected value from the DB), then I delete the node of the user (like the user_1 node above).
Will this setTimeout() pose a problem if there are many triggers fired simultaneously?
In short, Cloud Functions have a maximum time they can run.
If your timeout fires its callback after that time limit has expired, the function will already have been terminated.
Consider using a cron job, a very simple and efficient way to run scheduled code in Cloud Functions.
If you use setTimeout() to delay the execution of a function for 45 seconds, that's probably not enough time to cause a problem. The default timeout for a function is 1 minute, and if you exceed that, the function will be forced to terminate immediately. If you are concerned about this, you should simply increase the timeout.
Bear in mind that you are paying for the entire time that a function executes, so even if you pause the function, you will be billed for that time. If you want a more efficient way of delaying some work for later, consider using Cloud Tasks to schedule that.
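One way to avoid holding the function open with setTimeout() is a scheduled sweep: the disconnect trigger stamps a disconnect time, and a cron-triggered function later deletes users still disconnected past the grace period. This is only a sketch; the disconnectedAt field, the function names, and the 45-second grace period are assumptions for illustration, not Firebase APIs.

```javascript
// Sketch of a scheduled-sweep alternative to setTimeout(): a trigger stamps
// disconnectedAt when connected flips to false, and a cron-triggered function
// deletes users that are still disconnected past the grace period.
const GRACE_MS = 45 * 1000;

function usersToDelete(users, now = Date.now()) {
  return Object.entries(users)
    .filter(([, u]) => !u.connected && now - u.disconnectedAt > GRACE_MS)
    .map(([id]) => id);
}

// Usage sketch inside a scheduled function:
// exports.sweep = functions.pubsub.schedule('every 1 minutes').onRun(async () => {
//   const snap = await admin.database().ref('rooms/room_id').once('value');
//   for (const id of usersToDelete(snap.val() || {})) {
//     await admin.database().ref(`rooms/room_id/${id}`).remove();
//   }
// });
```

Each sweep invocation is short-lived, so you pay for a few seconds per minute instead of 45 seconds per disconnect.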
If I understand correctly, your functionality is intended to monitor, from the cloud function, the users that are connected (and stay connected) to the Firebase Realtime Database. Is that correct?
For monitoring the Firebase Realtime Database, GCP provides tools to monitor DB performance and usage.
Or do you simply want to keep the connection alive?
If the requests to the Firebase Realtime DB are RESTful requests like GET and PUT, the connection is only kept open per request; with a high request volume, this still costs more.
Normally, we suggest clients use the native SDKs for the app's platform instead of the REST API. The SDKs maintain open connections, reducing the SSL encryption costs and database load that can add up with the REST API.
However, if you do use the REST API, consider using HTTP keep-alive to maintain an open connection, or use server-sent events with keep-alive set, which can reduce costs from SSL handshakes.

Is it ok to use setTimeout in Cloud Functions?

I'm wondering if it's okay to use setTimeout in Firebase Cloud Functions? I mean it's kinda working for me locally, but it has a very weird behavior: Unpredictable execution of the timeout functions.
Example: I set a timeout with a duration of 5 minutes, so after 5 minutes my callback should execute. Most of the time it does that correctly, but sometimes the callback executes much later than 5 minutes.
But it's only doing so on my local computer. Is this behavior also happening when I'm deploying my functions to firebase?
Cloud Functions have a maximum time they can run, which is documented in time limits. If your timeout makes its callback after that time limit expired, the function will likely already have been terminated. The way expiration happens may be different between the local emulator and the hosted environments.
In general I'd recommend against any setTimeout of more than a few seconds. In Cloud Functions you're being billed for as long as your function is active. If you have a setTimeout of a few minutes, you're being billed for all that time, even when all your code is doing is waiting for a clock to expire. It's likely more (cost) efficient to see if the service you're waiting for has the ability to call a webhook, or to use a cron job to check whether it has completed.
