Using Firestore Triggers to Manage User Document Count - firebase

If every document in a collection is a user resource that is limited, how can you ensure the user does not go over their assigned limit?
My first thought was to take advantage of Firestore triggers to avoid building a real backend, but the triggers sometimes fire more than once even if the input data has not changed. I was comparing the new doc to the old doc and taking action if certain keys did not match, but if GCP fires the same function twice I get double the result, in this case incrementing or decrementing counts twice.
The Firestore docs state:
Events are delivered at least once, but a single event may result in multiple function invocations. Avoid depending on exactly-once mechanics, and write idempotent functions.
So in my situation the only solution I can think of is saving the event IDs somewhere and checking that they have not already been processed. Or, even worse, doing a read on each call to count the current docs and adjust the count accordingly (increasing read costs).
What's a smart way to approach this?

If reinvocations (which, while possible, are quite uncommon) are a concern for your use case, you could indeed store the ID of the invocation event, or something that changes less often, like (depending on the use case) the source document ID.
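As a rough illustration of the event ID idea, here is a minimal in-memory sketch. It does not call the Firebase SDK: `CounterState`, `createCounterState` and `applyOnce` are hypothetical names, and the plain object stands in for a counter document plus a collection of processed event IDs. In a real function the check and the counter update would presumably need to happen in one transaction so a redelivery cannot slip in between them.

```typescript
// In-memory stand-in for a counter document plus a record of handled events.
// (Hypothetical model; in Firestore both would be stored documents.)
interface CounterState {
  count: number;
  processedEventIds: Record<string, true>;
}

function createCounterState(): CounterState {
  return { count: 0, processedEventIds: {} };
}

// Applies a delta (+1 on create, -1 on delete) at most once per event ID.
// Returns true if the delta was applied, false if the event was a redelivery.
function applyOnce(state: CounterState, eventId: string, delta: number): boolean {
  if (state.processedEventIds[eventId] === true) {
    return false; // duplicate delivery: leave the count untouched
  }
  state.processedEventIds[eventId] = true;
  state.count += delta;
  return true;
}

const userCounter = createCounterState();
applyOnce(userCounter, "evt-1", 1);
applyOnce(userCounter, "evt-1", 1); // same event delivered a second time
applyOnce(userCounter, "evt-2", 1);
// userCounter.count is 2, not 3
```

The trade-off is the one raised in the question: the processed IDs have to be stored somewhere and cleaned up eventually.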

Related

Firestore : Maintaining the count of a collection. Trigger function vs transaction

Let's say I have a collection called persons and another collection called cities with a field population. When a Person is created in a City, I would like to increment the population field in the corresponding city.
I have two options.
1. Create an onCreate trigger function. Find the city document and increment using FieldValue.increment(1).
2. Create an HTTPS callable cloud function to create the person. The cloud function executes a transaction in which the person is created and the population is incremented.
The first one is simpler and I am using it right now. But, I am wondering if there could be cases where the onCreate is not called due to some glitch...
I am thinking of moving to the second option. I am wondering if there are any disadvantages. Does HTTPS callable function cost more?
The only problem I see with the HTTPS callables would be that if something fails you would need to handle that on your client side. That would be (at least for me) a little bit too much logic for the client side.
What I can recommend after almost 4 years of experience with exactly that problem is a solution with a virtual queue. I had a long discussion on that topic here, and even with the Firebase people at the last in-person Google I/O and Firebase Summit.
Our problem was that those glitches did exist, even if they happened only sometimes, and that the changes and transactions failed due to too many requests. After trying every official recommendation, like sharded counters, we ended up creating a virtual queue where each onCreate just adds an entry to a Firestore or Realtime Database list/collection, and another function, run either by cron or by another trigger (that doesn't matter), handles the entries in the queue one by one, starting again for each of them to avoid timeouts and memory limits. We made sure that a single handler/calculation is small enough for one function invocation to handle.
This method was the only bulletproof one that could handle thousands of new entries per second without issues. The only downside is that it takes more time than a usual trigger because each entry is calculated one by one. If your calculations are small, you could do them in batches (that is how we started, too).
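To make the queue idea more concrete, here is a small in-memory sketch using the persons/cities example from the question. It is only a model under assumptions: `QueueEntry`, `QueueState`, `enqueue`, `processBatch` and `drain` are hypothetical names, the array stands in for the queue collection, and each call to `processBatch` stands in for one invocation of the worker function.

```typescript
// One queued change, as the onCreate trigger would record it.
interface QueueEntry {
  id: string;
  cityId: string;
  delta: number;
}

// In-memory stand-in for the queue collection and the city counters.
interface QueueState {
  pending: QueueEntry[];
  population: Record<string, number>;
}

// What the trigger does: append an entry only, no counter write.
function enqueue(state: QueueState, entry: QueueEntry): void {
  state.pending.push(entry);
}

// What one worker invocation does: take up to batchSize entries from the
// front of the queue, apply them, and report how many were handled.
function processBatch(state: QueueState, batchSize: number): number {
  const batch = state.pending.splice(0, batchSize);
  for (const entry of batch) {
    const current = state.population[entry.cityId] || 0;
    state.population[entry.cityId] = current + entry.delta;
  }
  return batch.length;
}

// Repeated invocations (e.g. by a schedule) until the queue is empty.
// Returns the number of invocations that actually handled entries.
function drain(state: QueueState, batchSize: number): number {
  let runs = 0;
  while (processBatch(state, batchSize) > 0) {
    runs++;
  }
  return runs;
}

const queue: QueueState = { pending: [], population: {} };
for (let i = 0; i < 5; i++) {
  enqueue(queue, { id: "person-" + i, cityId: "berlin", delta: 1 });
}
const runsNeeded = drain(queue, 2);
// runsNeeded is 3 and queue.population["berlin"] is 5
```

Because only the worker writes to the counter, the write rate on the city document is bounded by how often the worker runs, not by how fast persons are created. A batch size of 1 matches the one-by-one processing described above.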

How to initiate a single calculation from batched document onWrite triggers

I have a subcollection with documents. If any of them are added or removed, I need to trigger a calculation to derive an overall count (amongst other things) and store that in the parent document.
In order to listen for the document changes, I have a firestore onWrite background function. From this function, I would like to trigger the calculation via pubsub. However, it will happen that the system updates many subcollection documents at once. If I delete 100 documents, I do not want the calculation to be triggered 100 times. That would be a real waste of resources.
So I'm wondering, is there already some sort of mechanism in place that would batch these triggers or the pubsub topic publishing, or do I need to do something specific to make this happen?
If there are other ways to better solve this problem I'm open to suggestions of course. I could possibly even introduce a Redis store if that helps.

Downside of using transactions in google firestore

I'm developing a Flutter App and I'm using the Firebase services. I'd like to stick only to using transactions as I prefer consistency over simplicity.
await Firestore.instance.collection('user').document(id).updateData({'name': 'new name'});
await Firestore.instance.runTransaction((transaction) async {
  transaction.update(Firestore.instance.collection('user').document(id), {'name': 'new name'});
});
Are there any (major) downsides to transactions? For example, are they more expensive (Firebase billing, not computationally)? After all there might be changes to the data on the Firestore database which will result in up to 5 retries.
For reference: https://firebase.google.com/docs/firestore/manage-data/transactions
"You can also make atomic changes to data using transactions. While
this is a bit heavy-handed for incrementing a vote total, it is the
right approach for more complex changes."
https://codelabs.developers.google.com/codelabs/flutter-firebase/#10
With the specific code samples you're showing, there is little advantage to using a transaction. If your document update makes a static change to a document, without regard to its existing data, a transaction doesn't make sense. The transaction you're proposing is actually just a slower version of the update, since it has to round-trip with the server twice in order to make the change. A plain update just uses a single round trip.
For example, if you want to append data to a string, two clients might overwrite each other's changes, depending on when they each read the document. Using a transaction, you can be sure that each append is going to take effect, no matter when the append was executed, since the transaction will be retried with updated data in the face of concurrency.
Typically, you should strive to get your work done without transactions if possible. For example, prefer to use FieldValue.increment() outside of a transaction instead of manually incrementing within a transaction.
Transactions are intended to be used when you have changes to make to a document (or, typically, multiple documents) that must take the current values of its fields into account before making the final write. This prevents two clients from clobbering each other's changes when they should actually work in tandem.
Please read more about transactions in the documentation to better understand how they work. It is not quite like SQL transactions.
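The string-append case can be illustrated with a simplified model of read, compute, commit-if-unchanged, retry. This is only a sketch, not Firestore's actual implementation: `VersionedDoc`, the local `runTransaction` and the `beforeCommit` hook are hypothetical, the hook simulates a second client writing between the read and the commit, and the limit of 5 attempts is taken from the figure mentioned in the question.

```typescript
// A document with a version counter, used to detect concurrent changes.
interface VersionedDoc {
  value: string;
  version: number;
}

// Simplified transaction loop: read, compute, and commit only if the
// document has not changed since the read; otherwise retry.
// Returns the number of attempts that were needed.
function runTransaction(
  doc: VersionedDoc,
  update: (current: string) => string,
  beforeCommit: (attempt: number) => void,
  maxAttempts: number
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const readVersion = doc.version;     // read
    const newValue = update(doc.value);  // compute from what was read
    beforeCommit(attempt);               // a concurrent writer may interfere here
    if (doc.version === readVersion) {   // commit only if nothing changed
      doc.value = newValue;
      doc.version++;
      return attempt;
    }
  }
  throw new Error("transaction failed after " + maxAttempts + " attempts");
}

const nameDoc: VersionedDoc = { value: "a", version: 0 };
// Another client appends "b" during the first attempt only.
const attemptsUsed = runTransaction(
  nameDoc,
  (current) => current + "c",
  (attempt) => {
    if (attempt === 1) {
      nameDoc.value += "b";
      nameDoc.version++;
    }
  },
  5
);
// nameDoc.value is "abc": both appends survive, and attemptsUsed is 2
```

A plain write of `current + "c"` computed from the stale read would have produced "ac" and lost the other client's "b". For a static change like setting a name, there is nothing to recompute, which is why the transaction adds only latency there.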
Are there any (major) downsides to transactions?
I don't know any downsides.
For example, are they more expensive (Firebase billing, not computationally)?
No, a transaction costs the same as any other write operation. For example, if you create a transaction to increase a counter, you'll be charged for only one write operation.
I'm not sure I understand your last question completely but if a transaction fails, Cloud Firestore retries the transaction for sure.

Prevent more than 1 write a second to a Firestore document when using a counter with cloud function

Background:
I have a Firestore database with a users collection. Each user is a document which contains a contacts collection. Each document in that collection is a single contact.
Since Firestore does not have a "count" feature for all documents, and since I don't want to read all contacts to count how many contacts a user has, I trigger cloud functions when a contact is added or deleted which increments or decrements numberOfContacts in the user document. In order to make the function idempotent, it has to do multiple reads and writes to prevent incrementing the counter more than once if it's called more than once for the same document. This means that I need to have a different collection of eventIDs that I've already handled so I don't duplicate it. This requires me to run another function once a month to go through each user deleting all such documents (which is a lot of reads and some writes).
Issue
Now the challenge is that the user can import his/her contacts. So if a user imports 10,000 contacts, this function will get fired 10,000 times in quick succession.
How do I prevent that?
Current approach:
Right now I am adding a field in the contact document that indicates that the addition was part of an import. This gets the cloud function to not increment.
I perform the operation from the client 499 contacts at a time in a transaction, which also increments the count as the 500th write. That way the count stays consistent if something failed halfway.
Is this really the best way? It seems so complicated to just have a count of contacts available. I end up doing multiple reads and writes each time a single contact changes plus I have to run a cleanup function every month.
I keep thinking there's gotta be a simpler way.
For those who are curious, it seems like the approach I am taking is the best approach.
I add a field in the contact document that indicates that the addition was part of an import (bulkAdd = true). This gets the cloud function to not increment.
I have another cloud function add the contacts 200 at a time (I use FieldValue.serverTimestamp() and that counts as another write, so it's 400 writes). I do this in a batch, and the 401st write in the batch is the count increment. That way I can bulk import contacts without having to bombard a single document with writes.
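The arithmetic behind that batching can be sketched as follows. This only plans the batches in memory and does not write anything: `BatchPlan` and `planImportBatches` are hypothetical names, and the 2 writes per contact reflect the timestamp accounting described above.

```typescript
// One planned batch: the contacts it creates, the single counter
// increment it carries, and the total number of writes it contains.
interface BatchPlan {
  contactIds: string[];
  counterIncrement: number;
  totalWrites: number;
}

// Splits an import into batches of at most contactsPerBatch contacts.
// Each batch ends with exactly one extra write: the counter increment
// for the contacts in that batch.
function planImportBatches(
  contactIds: string[],
  contactsPerBatch: number,
  writesPerContact: number
): BatchPlan[] {
  if (contactsPerBatch < 1) {
    throw new Error("contactsPerBatch must be at least 1");
  }
  const plans: BatchPlan[] = [];
  for (let start = 0; start < contactIds.length; start += contactsPerBatch) {
    const chunk = contactIds.slice(start, start + contactsPerBatch);
    plans.push({
      contactIds: chunk,
      counterIncrement: chunk.length,
      totalWrites: chunk.length * writesPerContact + 1,
    });
  }
  return plans;
}

const importIds: string[] = [];
for (let i = 0; i < 450; i++) {
  importIds.push("contact-" + i);
}
const importPlans = planImportBatches(importIds, 200, 2);
// 3 batches with 200, 200 and 50 contacts, i.e. 401, 401 and 101 writes
```

With this layout a 10,000 contact import touches the user document 50 times instead of 10,000 times, and each increment commits together with the contacts it counts.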
Problem with increments
There are duplicate-safe operations like FieldValue.arrayUnion() & FieldValue.arrayRemove(). I wrote a bit about that approach here: Firebase function document.create and user.create triggers firing multiple times
With this approach, your user document contains a special array field with contact IDs. Once the contact is added to a subcollection and your function is triggered, the contact's ID can be written to this field. If the function is triggered two or more times for one contact, there will be only one instance of it written into the master user doc. The actual size can then be fetched on the client or with one more function triggered on the user doc update. This is a bit simpler than having eventIDs.
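Why a repeated trigger is harmless here can be shown with a small in-memory model of the set-like behaviour. The two functions below are local stand-ins written for this sketch, not the SDK's FieldValue.arrayUnion() and FieldValue.arrayRemove(); they only mimic the duplicate-safe semantics on a plain string array.

```typescript
// Adds each element only if it is not already present.
function arrayUnion(existing: string[], toAdd: string[]): string[] {
  const result = existing.slice();
  for (const item of toAdd) {
    if (result.indexOf(item) === -1) {
      result.push(item);
    }
  }
  return result;
}

// Removes every instance of each given element.
function arrayRemove(existing: string[], toRemove: string[]): string[] {
  return existing.filter((item) => toRemove.indexOf(item) === -1);
}

let userContactIds: string[] = [];
userContactIds = arrayUnion(userContactIds, ["c1"]);
userContactIds = arrayUnion(userContactIds, ["c1"]); // trigger fired twice
userContactIds = arrayUnion(userContactIds, ["c2"]);
// userContactIds.length is 2, so the derived count is unaffected by the repeat

userContactIds = arrayRemove(userContactIds, ["c1"]);
userContactIds = arrayRemove(userContactIds, ["c1"]); // delete trigger fired twice
// userContactIds is ["c2"]
```

Applying the same operation twice gives the same result as applying it once, which is exactly the idempotence that a raw increment lacks.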
Problem with importing 10k+ contacts
This is a bit philosophical.
If I understand correctly, the problem is that a user performs 10k writes. Then these 10k writes trigger 10k functions, which perform an additional 10k writes to the master doc (and the same number of reads if they use the eventIDs document)?
You can make a special subcollection just for importing multiple contacts to your DB. Instead of writing 10k docs to the DB, the client would create one big document with 10k contact fields, which triggers a cloud function. That function would read it all and make the necessary 10k contact writes plus 1 write to the master doc with all the arrayUnions. You would just need to think about how to prevent those 10k contact writes from invoking the counting function (for example, by adding a special metadata field like your bulkAdd).
This is just an opinion.

Do Firestore Function Triggers count as reads?

I know what you are probably thinking, "why does it matter? Don't try to over-complicate it just to optimize pricing". In my case, I need to.
I have a collection with millions of records in Firestore, and each document gets updated quite often. Every time one gets updated, I need to do some data cleaning (and more). So I have a function triggered by onUpdate that does that. The function receives two parameters: the document before the update and the document after the update.
My question is:
Because the document is being passed as an argument, does that count as a database read?
The event generated by Cloud Firestore to send to Cloud Functions should not count as an extra read beyond what was done by the client to initially trigger that event.
