I remember reading an article explaining that Cloud Functions are not guaranteed to execute, and especially not in the order in which they were triggered. I can't find any sources on this anymore.
Is this information still current?
I am aware that the start of a function can take a couple of seconds, especially when cold starting the function.
Could I reliably increment a number each time a document is created in a specific Firestore collection without getting my numbers mixed up? I know this is done often but I've never seen information on whether or not it is safe to do.
Following up on question one, are there red flags when using Cloud Functions for payment backend services?
Can I be sure that Cloud Functions are executed in the order that they were triggered i.e. are they queued or executed in parallel?
Could I reliably increment a number each time a document is created in a specific Firestore collection without getting my numbers mixed up?
You can certainly write code to do that. You will need to keep track of a running count of documents in another document, and use a transaction to keep it up to date.
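For illustration, here is a minimal sketch of that pattern, assuming a hypothetical `posts` collection and a counter document at `counters/posts` (both names are my invention, not from the question):

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Increment a running count whenever a document is created in "posts".
// The transaction retries on contention, so concurrent creates won't
// clobber each other's updates.
exports.countPostCreated = functions.firestore
  .document('posts/{postId}')
  .onCreate(() => {
    const counterRef = admin.firestore().doc('counters/posts');
    return admin.firestore().runTransaction(async (tx) => {
      const snap = await tx.get(counterRef);
      const current = snap.exists ? snap.data().count : 0;
      tx.set(counterRef, { count: current + 1 });
    });
  });
```

Note that the counter document becomes a write bottleneck: every create in the collection contends on it.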
I don't recommend doing this. It's kind of an anti-pattern in Firestore to impose sequentially increasing numbers on documents in a collection. If you want time-based ordering, you should consider using a timestamp instead.
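If ordering is all you need, a sketch of the timestamp alternative (again with hypothetical names) might look like:

```js
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function demo() {
  // Store a server-side timestamp instead of a sequence number...
  await db.collection('posts').add({
    text: 'hello',
    createdAt: admin.firestore.FieldValue.serverTimestamp(),
  });

  // ...and order queries by it when reading.
  const ordered = await db.collection('posts').orderBy('createdAt').get();
  ordered.docs.forEach((doc) => console.log(doc.data().text));
}

demo();
```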
Can I be sure that Cloud Functions are executed in the order that they were triggered i.e. are they queued or executed in parallel?
Cloud Functions provides absolutely no guarantee that function invocations will happen in any particular order. They are asynchronous and can execute in parallel on multiple server instances, depending on the load applied to the function.
I strongly suggest reading through the documentation to understand the execution environment provided by Cloud Functions.
Let's say there are 2 Firebase writes that go together, such as "liking a post" and "adding my uid to the list of people that liked the post."
Is it better to chain together the actions in my client code (adding the second write in the completion block of the first)?
Or is it better to use a cloud function that does the second write that is triggered on the first?
I'm asking from a cost standpoint, and also: how susceptible is client code to tampering? Would it be easy for a malicious client to perform the first write but skip the second, especially in web applications?
The idea behind using cloud functions in this case would be to take as much processing as possible out of your app, not only for cost and security, but also to make the app faster and more efficient.
The second option is definitely more secure, since the Cloud Function would be an event-triggered function, making it "uninvocable" by malicious users.
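As a rough sketch of that second option, assuming a data layout like `posts/{postId}/likes/{uid}` (my assumption, not from the question):

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// When a client writes a "like" document, this background function
// performs the second write (bumping the post's like counter).
// Clients cannot invoke it directly, and they cannot skip it.
exports.onLikeCreated = functions.firestore
  .document('posts/{postId}/likes/{uid}')
  .onCreate((snap, context) => {
    return admin.firestore()
      .doc(`posts/${context.params.postId}`)
      .update({ likeCount: admin.firestore.FieldValue.increment(1) });
  });
```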
As for costs: in Firestore, either option results in 2 writes, but the second option also incurs compute-time charges for your Cloud Function; you can get more details here. There is a free tier for Cloud Functions, but as your app scales this cost may become significant, so that's something to consider.
Apparently some of the onCreate/onDelete events our cloud function is triggered by are received more than once!
We've observed a single event arriving as many as 3 times, a few seconds apart, spread across instances of the cloud function. Is this normal behavior, or is there something we've been doing wrong?
Have a look at the content of this "Firebase Google Group" post, which I paste below. It was written in August 2018 but is still fully valid at the time of writing this "copy/paste answer".
Cloud Functions typically guarantees that your functions are run "at least once", meaning that it's very possible (but typically rare) that an event may get delivered to your function more than once. To deal with this, your functions should be "idempotent", meaning that the receipt of the same event multiple times should result in no further changes to whatever it is you need to update. This can be kind of challenging, but it's one of the properties of serverless systems that needs to be dealt with, if it's problematic for your app.
https://cloud.google.com/functions/docs/bestpractices/tips#write_idempotent_functions
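One common way to approximate idempotency is to key a "processed" marker off the event ID that Cloud Functions delivers with each invocation. A minimal sketch follows; the `eventMarkers` collection name is hypothetical, and for strict guarantees you would fold the check and the work into a single transaction:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Uses the delivery's event ID as a document key, so a redelivered
// event finds its marker and exits without repeating the side effect.
exports.onUserCreated = functions.firestore
  .document('users/{userId}')
  .onCreate(async (snap, context) => {
    const markerRef = admin.firestore().doc(`eventMarkers/${context.eventId}`);
    const marker = await markerRef.get();
    if (marker.exists) {
      console.log(`Duplicate event ${context.eventId}, skipping`);
      return;
    }

    // ... perform the non-idempotent work here (e.g. send a welcome email) ...

    await markerRef.set({
      processedAt: admin.firestore.FieldValue.serverTimestamp(),
    });
  });
```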
I need to delete very large collections in Firestore.
Initially I used client-side batch deletes, but the documentation changed and started to discourage that with the comments:
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a cloud function as recommended in the docs. The cloud function gets triggered when a document is deleted and then deletes all documents in a subcollection as proposed in the above link in the section on "NODE.JS".
The problem that I am running into now is that the cloud function seems to be able to manage around 300 deletes per second. With the maximum runtime of a cloud function of 9 minutes, I can manage up to 162,000 deletes this way. But the collection I want to delete currently holds 237,560 documents, which makes the cloud function time out about halfway through.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have too much work to perform in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in subsequent invocations after the first. This is not trivial, and you will have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing each one the arguments it needs to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents that begin with 'a', another for 'b', and so on, each querying for its documents and then deleting them.
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the prior one stopped.
You should be able to use one of these strategies to limit the amount of work done per function invocation, but the implementation details are entirely up to you to work out.
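As a rough sketch of the continuation approach, assuming a Pub/Sub topic named `delete-collection` (a made-up name) and messages carrying the collection path:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');
admin.initializeApp();

const BATCH_SIZE = 500; // Firestore batches cap at 500 writes

// Continuation approach: delete in batches, and when the time budget is
// nearly spent, republish the same message so a fresh invocation resumes.
exports.deleteCollection = functions.pubsub
  .topic('delete-collection')
  .onPublish(async (message) => {
    const { path } = message.json; // e.g. { "path": "users/abc/orders" }
    const deadline = Date.now() + 8 * 60 * 1000; // stay under the 9-minute cap

    while (Date.now() < deadline) {
      const snap = await admin.firestore().collection(path)
        .limit(BATCH_SIZE).get();
      if (snap.empty) {
        return; // nothing left: the collection is gone
      }
      const batch = admin.firestore().batch();
      snap.docs.forEach((doc) => batch.delete(doc.ref));
      await batch.commit();
    }

    // Time budget exhausted: hand the remainder to a new invocation.
    // (publishMessage requires a recent @google-cloud/pubsub release.)
    await new PubSub().topic('delete-collection')
      .publishMessage({ json: { path } });
  });
```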
If, for some reason, neither of these strategies is viable, you will have to manage your own server (perhaps via App Engine), and message it (via pubsub) to perform a single unit of long-running work in response to a Cloud Function.
I tried converting a couple of my trigger functions to https.onCall, calling them from the client after the relevant promise resolves, and so far they work really well and run faster than the triggers.
What's the catch? Are they affected by cold starts too?
If not, then unless you need a cron job or your app's language isn't supported, why should anyone use a trigger function at all?
All Cloud Functions are affected by cold starts. This is how all serverless function architectures work. In order to scale down to zero (so you pay nothing if you use nothing), all server instances must be able to be decommissioned. Cold start up cost is paid when a new server instance is allocated, so going from zero to one will cost you one cold start.
You haven't defined what a "trigger function" is, so I'll assume you mean a "background function" which triggers in response to events that occur within your project.
Background functions are absolutely required when you want to have some work performed in response to those changes when you can't trust the client to perform that work directly. This is important to maintain data consistency, and also to prevent having to duplicate logic among all your different clients that are all doing the same thing. This also allows you to ship new features and bug fixes without having to ship new client code, which can be difficult and time-consuming.
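To make the contrast concrete, here is a hedged sketch of the same side effect written both ways (the audit-log example is invented for illustration):

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Callable: runs only when (and if) the client decides to call it.
exports.logPurchase = functions.https.onCall((data, context) => {
  return admin.firestore().collection('auditLog').add({
    uid: context.auth ? context.auth.uid : null,
    item: data.item,
  });
});

// Background: runs on every matching write, whether or not the client
// cooperates, so the follow-up work can't be silently skipped.
exports.onPurchaseCreated = functions.firestore
  .document('purchases/{purchaseId}')
  .onCreate((snap) => {
    return admin.firestore().collection('auditLog').add({
      purchase: snap.data(),
    });
  });
```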
I want to create an expense tracker and one of the things I want to find out is how much did I spend in each month per category.
How should I do this in Firestore/Datastore?
1. Pull down the required data and do the aggregation locally? Seems very slow.
2. Perform the aggregation every time a transaction is created/updated and save the result in a table? But this may result in many function invocations, which could be costly.
Is there a better way? It seems like option 2 is currently the best, but I wonder if there's any way I can reduce costs.
I should note that I don't need the aggregated data in real time, so is there a way to debounce the cloud function execution? At times I batch-insert a bunch of transactions; is there a way to disable the functions for certain writes and call them manually after the batch has finished, for example?
The two approaches you describe are indeed the most common.
The best approach mostly depends on the number of transactions you have. If you have few transactions, then it may be totally fine to do the aggregation on each client. But as you get more transactions, the overhead of downloading the data will become prohibitive and you're more likely to want to keep a running total in the database.
I'd normally recommend keeping the total up to date with any transaction. You can even do that with client-side code, by using transactions (to prevent multiple users overwriting each other's updates) and server-side security rules (to prevent malicious actors from writing an aggregate that doesn't match its transaction).
If you want to aggregate in batches, you'll want to run code periodically, either in a server you control, or in Cloud Functions.
There is nothing built into Cloud Functions to debounce document writes. You could probably keep a debounce counter in Firestore, but that would then be reading/writing a document on each transaction.
It seems more reasonable to run a function on a timer, as described in this blog post and shown in this video. But you'll need to make sure your data structure in that case allows the code to detect which transactions it needs to aggregate.
One way to do this is to ensure the transactions can be ordered in some way, e.g. by giving them a timestamp, and having your aggregation code keep track (likely in the database) of the last timestamp it has already aggregated. Then, whenever the aggregator runs, it performs the steps below (sketched in code after the list):
1. reads the current aggregated value
2. queries the database for transactions that have been added since it last ran
3. loops over those transactions, updating the aggregated value
4. writes the aggregated value and the last timestamp back to the database in a transaction (to ensure either both are written, or neither is written)
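Here is a sketch of those four steps as a scheduled function, using the newer `functions.pubsub.schedule()` API. It keeps a single running total for brevity (per-category/per-month totals would just key the aggregate document accordingly), and the collection and field names are assumptions:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.aggregateSpending = functions.pubsub
  .schedule('every 60 minutes')
  .onRun(() => {
    const db = admin.firestore();
    const aggRef = db.doc('aggregates/spending');

    return db.runTransaction(async (tx) => {
      // 1. Read the current aggregated value and the checkpoint.
      const aggSnap = await tx.get(aggRef);
      const agg = aggSnap.exists
        ? aggSnap.data()
        : { total: 0, lastTimestamp: new Date(0) };

      // 2. Query for transactions added since the last run.
      const added = await tx.get(
        db.collection('transactions')
          .where('createdAt', '>', agg.lastTimestamp)
          .orderBy('createdAt')
      );
      if (added.empty) return;

      // 3. Loop over them, updating the running total.
      let { total, lastTimestamp } = agg;
      added.docs.forEach((doc) => {
        total += doc.data().amount;
        lastTimestamp = doc.data().createdAt;
      });

      // 4. Write the total and the checkpoint atomically, so either
      //    both are updated or neither is.
      tx.set(aggRef, { total, lastTimestamp });
    });
  });
```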