In my task queue I have set max attempts = 1, but my Cloud Function is still being called multiple times (around 50). Cloud Run is returning status 200, yet multiple attempts still happen. Does anyone have an idea of what I am doing wrong?
Currently, I'm developing an app with Firebase (Firestore, Firebase Functions, Storage, etc.).
I have one Pub/Sub job that runs monthly (this job iterates over thousands of items, and each item takes 1-3 seconds to process).
Therefore, the Firebase function times out at 540 seconds (the maximum limit for Firebase Functions).
From my research, I need to move to App Engine (or something like that) or split the work into multiple jobs (but I'm not sure how to split it).
Could you share how you solved this problem?
Thank you.
The timeout cannot be increased. You can switch to Cloud Functions Gen 2, which has a 1-hour max timeout instead of 9 minutes. If that is insufficient, it would be best to use something like Compute Engine.
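As a minimal sketch of what that could look like, assuming the Node.js firebase-functions v2 SDK (the function name monthlyJob is just an example, and the 60-minute ceiling applies to HTTP-triggered 2nd-gen functions; event-triggered ones are still capped much lower):

const { onRequest } = require("firebase-functions/v2/https");

// Hypothetical 2nd-gen HTTP function with the timeout raised to 1 hour.
exports.monthlyJob = onRequest({ timeoutSeconds: 3600 }, async (req, res) => {
  // ...iterate over the items here...
  res.status(200).send("done");
});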
I need to actively receive crash notifications for Firebase Functions.
Is there any way to set up Slack webhooks to receive a message when Firebase Functions throw an Error, functions crash, or something like that?
I would love to receive issue messages based on velocity, e.g. Firebase Functions crashing 50 times a day.
Thank you so much.
First you have to create a log-based (counter) metric that counts occurrences of the specific error, and second, you create an alerting policy with a Slack notification channel.
Let's start by finding the logs that appear when the function throws an error. Since I didn't have a function that would crash, I used logs indicating that it had started.
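For the actual error case, a Logging filter along these lines (assuming a 1st-gen function; [FUNCTION-NAME] is a placeholder) should surface the relevant entries:

resource.type="cloud_function"
resource.labels.function_name="[FUNCTION-NAME]"
severity>=ERROR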
Next you have to create a log-based metric. Ignore the next screen and go to Monitoring > Alerting. Click on "Create new policy", find your metric, and set the "Rolling window" to whatever time period you need. For testing I used 1 minute. Then set the "Rolling window function" to "mean".
Now configure when the alert has to be triggered - I chose over 3 (within the 1-minute window).
On the next screen you select the notification channel. In the case of Slack, it has to be configured first under "Notification Channels".
You can save the policy now.
After a few minutes I gathered enough data to generate two incidents:
And here's some alerting-related documentation that may help you understand how to use these policies.
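If your functions don't already emit a distinctive entry when something goes wrong, one option - sketched here against the Node.js firebase-functions SDK, with a made-up function name - is to log the failure explicitly at ERROR severity so the log-based metric above has something to count:

const functions = require("firebase-functions");

exports.myFunction = functions.https.onRequest(async (req, res) => {
  try {
    // ...actual work...
    res.status(200).send("ok");
  } catch (err) {
    // Written at ERROR severity, so the log-based metric filter can match it.
    functions.logger.error("myFunction failed", err);
    res.status(500).send("error");
  }
});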
I am trying to call an API every minute for ski-lift status and check for changes. I am going to store whether the lift is open or closed in Firebase (Realtime Database), read that value to see if the value from the API is different, and only update/write to that node when it's a different value. Then I can set up a Cloud Function that listens for database changes and sends push notifications to the list of FCM tokens for that channel. I am not sure if this is the most efficient way, but I was going to set up scheduled functions to call the third-party API.
I have been using these docs:
https://firebase.google.com/docs/functions/schedule-functions
I was planning to do something like this:
exports.scheduledFunction = functions.pubsub.schedule('every 5 minutes').onRun(async (context) => {
  // Call my API in here and update the database if the snapshot that comes back is different.
  return null;
});
I was wondering how I would run it only between set times - say 8am-6pm EST. I am struggling to find anything about run times. Should I just run the function every minute and then pause and resume by checking the time? In that case, how does it know to keep checking the time while it is paused?
Firebase scheduled functions use Cloud Scheduler to implement the schedule. It accepts cron style time specifiers to indicate when a job should be run. The full spec for that can be found here. You will have to use ranges of numbers to indicate the valid times and frequency of the schedule. For example, you might use "8-18" in the hour field to limit the hours of execution.
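A rough sketch of what that might look like for your case (cron fields are minute, hour, day-of-month, month, day-of-week; America/New_York is assumed for EST/EDT):

exports.scheduledFunction = functions.pubsub
  .schedule('* 8-18 * * *')        // every minute, during hours 08:00-18:59
  .timeZone('America/New_York')    // interpret those hours in Eastern time
  .onRun(async (context) => {
    // Call the lift-status API and write to the database only if the value changed.
    return null;
  });

Note that "8-18" covers the whole 18:00-18:59 hour, so use "8-17" if the last run should happen before 6pm.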
I want to monitor, using Stackdriver, that a pod running as a Kubernetes cron job twice a day is working correctly.
To do this, I want to log a start message and an end message in the pod, and I want to create an alerting metric in Stackdriver so that if these messages are not received for 24 hours, an email is sent.
Is it possible to do this kind of alerting in Stackdriver?
There are several ways of accomplishing this.
In order to generate the event, I think the easiest way is to build a log-based metric around the cron job itself. If you are running a kind: CronJob, you can use the Metrics Explorer to find Resource type: GKE Container, Metric: Log entries, and then filter by container_name (which will be your CronJob's spec.containers.name).
You could also create a log-based metric on something like
logName="projects/[PROJECT-ID]/logs/[CONTAINER-NAME]"
...and maybe add a string to the spec.containers.args section to make filtering easier.
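For example, assuming the k8s_container resource type and a made-up marker string like "cron-finished" printed by the job, the filter for the metric could look something like:

resource.type="k8s_container"
resource.labels.container_name="[CONTAINER-NAME]"
textPayload:"cron-finished"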
You could also publish to a pub/sub topic and do your alerting on publish message operations.
Once you decide on the metric, you just need to alert if Any time series is absent[1] for 13 hours. Add a notification channel type=email[2], and you will receive an alert whenever the cron does not run at least once a day.
[1] https://cloud.google.com/monitoring/alerts/concepts-indepth#condition-types
[2] https://cloud.google.com/monitoring/support/notification-options#email
In short, we sometimes see that a small number of Cloud Bigtable queries fail repeatedly (tens or even hundreds of times in a row) with the error rpc error: code = 13 desc = "server closed the stream without sending trailers" until (usually) the query finally works.
In detail, our setup is as follows:
We are running a collection (< 10) of Go services on Google Compute Engine. Each service leases tasks from a pair of pull task queues. Each task contains the ID of a Bigtable row. The task handler executes the following query:
// Read a single row, keeping only the most recent cell in each column of the given family.
row, err := tbl.ReadRow(ctx, <my-row-id>,
    bigtable.RowFilter(bigtable.ChainFilters(
        bigtable.FamilyFilter(<my-column-family>),
        bigtable.LatestNFilter(1))))
If the query fails, the task handler simply returns. Since we lease tasks with a lease time between 10 and 15 minutes, a little while later the lease on that task will expire, it will be leased again, and we'll retry. The tasks have a max retry count of 1000, so they can be retried many times over a long period. In a small number of cases, a particular task will fail with the gRPC error above. The task will typically fail with this same error every time it runs, for hours or days on end, before (seemingly out of the blue) eventually succeeding (or the task runs out of retries and dies).
Since this often takes so long, it seems unrelated to server load. For example right now on a Sunday morning, these servers are very lightly loaded, and yet I see plenty of these errors when I tail the logs. From this answer, I had originally thought that this might be due to trying to query for a large amount of data, perhaps near the max limit that cloud bigtable will support. However I now see that this is not the case; I can find many examples where tasks that have failed many times finally succeed and report only a small amount of data (e.g. <1 MB) was retrieved.
What else should I be looking at here?
edit: From further testing I now know that this is completely machine (client) independent. If I tail the log on one of the task leasing machines, wait for a "server closed the stream without sending trailers" error, and then try a one-off ReadRow query to the same rowId from another, unrelated, totally unused machine, I get the same error repeatedly.
This error is typically caused by having more than 256MB of data in your reply.
However, there is currently a bug in our server-side error handling code that allows some invalid characters in HTTP/2 trailers, which is not allowed by the spec. This means that some error messages containing invalid characters will surface as this kind of error. This should be fixed early next year.