I'm working on a Firebase Cloud Function that updates some aggregate information on some documents in my DB. It's a very simple function and is simply adding 1 to a total # of documents count. Much like the example function found in the Firestore documentation.
I just noticed that when creating a single new document, the function was invoked twice. See below screenshot and note the logged document ID (iDup09btyVNr5fHl6vif) is repeated twice:
After a bit of digging around I found this SO post that says the following:
Delivery of function invocations is not currently guaranteed. As the Cloud Firestore and Cloud Functions integration improves, we plan to guarantee "at least once" delivery. However, this may not always be the case during beta. This may also result in multiple invocations for a single event, so for the highest quality functions ensure that the functions are written to be idempotent.
(From Firestore documentation: Limitations and guarantees)
Which leads me to a problem with their documentation. Cloud Functions as mentioned above are meant to be idempotent (In other words, data they alter should be the same whether the function runs once or runs multiple times). However the example function I linked to earlier (to my eyes) is not idempotent:
exports.aggregateRatings = firestore
.document('restaurants/{restId}/ratings/{ratingId}')
.onWrite(event => {
// Get value of the newly added rating
var ratingVal = event.data.get('rating');
// Get a reference to the restaurant
var restRef = db.collection('restaurants').document(event.params.restId);
// Update aggregations in a transaction
return db.transaction(transaction => {
return transaction.get(restRef).then(restDoc => {
// Compute new number of ratings
var newNumRatings = restDoc.data('numRatings') + 1;
// Compute new average rating
var oldRatingTotal = restDoc.data('avgRating') * restDoc.data('numRatings');
var newAvgRating = (oldRatingTotal + ratingVal) / newNumRatings;
// Update restaurant info
return transaction.update(restRef, {
avgRating: newAvgRating,
numRatings: newNumRatings
});
});
});
});
If the function runs once, the aggregate data is increased as if one rating is added, but if it runs again on the same rating it will increase the aggregate data as if there were two ratings added.
Unless I'm misunderstanding the concept of idempotence, this seems to be a problem.
Does anyone have any ideas of how to increase / decrease aggregate data in Cloud Firestore via Cloud Functions in a way that is idempotent?
(And of course doesn't involve querying every single document the aggregate data is regarding)
Bonus points: Does anyone know if functions will still need to be idempotent after Cloud Firestore is out of beta?
The Cloud Functions documentation gives some guidance on how to make retryable background functions idempotent. The bullet point you're most likely to be interested in here is:
Impose a transactional check outside the function, independent of the code. For example, persist state somewhere recording that a given event ID has already been processed.
The event parameter passed to your function has an eventId property on it that is unique, but will be the same when an even it retried. You should use this value to determine if an action taken by an event has already occurred, so you know to skip the action the second time, if necessary.
As for how exactly to check if an event ID has already been processed by your function, there's a lot of ways to do it, and that's up to you.
You can always opt out of making your function idempotent if you think it's simply not worthwhile, or it's OK to possibly have incorrect counts in some (probably rare) cases.
Related
I'm trying to keep track of the number of documents in collections and the number of users in my Firebase project. I set up some .create triggers to update a stats document using increment, but sometimes the .create functions trigger multiple times for a single creation event. This happens with both Firestore documents and new users. Any ideas?
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const firestore = require('#google-cloud/firestore')
admin.initializeApp();
const db = admin.firestore()
/* for counting documents created */
exports.countDoc = functions.firestore
.document('collection/{docId}')
.onCreate((change, context) => {
const docId = context.params.docId
db.doc('stats/doc').update({
'docsCreated': firestore.FieldValue.increment(1)
})
return true;
});
/* for counting users created */
exports.countUsers = functions.auth.user().onCreate((user) => {
db.doc('stats/doc').update({
'usersCreated': firestore.FieldValue.increment(1)
})
return true;
});
Thanks!
There is some advice on how to achieve your functions' idempotency.
There are FieldValue.arrayUnion() & FieldValue.arrayRemove() functions which safely remove and add elements to an array, without duplicates or errors if the element being deleted is nonexistent.
You can make array fields in your documents called 'users' and 'docs' and add there data with FieldValue.arrayUnion() by triggered functions. With that approach you can retrieve the actual sizes on the client side by getting users & docs fields and calling .size() on it.
You should expect that a background trigger could possibly be executed multiple times per event. This should be very rare, but not impossible. It's part of the guarantee that Cloud Functions gives you for "at-least-once execution". Since the internal infrastructure is entirely asynchronous with respect to the execution of your code on a dedicated server instance, that infrastructure might not receive the signal that your function finished successfully. In that case, it triggers the function again in order to ensure delivery.
It's recommended that you write your function to be idempotent in order to handle this situation, if it's important for your app. This is not always a very simple thing to implement correctly, and could also add a lot of weight to your code. There are also many ways to do this for different sorts of scenarios. But the choice is yours.
Read more about it in the documentation for execution guarantees.
I'm trying to update the same document which triggered an onUpdate cloud function, with a read value from the same collection.
This is in a kind of chat app made in Flutter, where the previous response to an inquiry is replicated to the document now being updated, for easier showing in the app.
The code does work, however when a user quickly responds to two separate inquiries, they both read the same latest response thus setting the same previousResponse. This must be down to the asynchronous nature of flutter and/or the cloud function, but I can't figure out where to await or if there's a better way to make the function, so it is never triggering the onUpdate for the same user, until a previous trigger is finished.
Last part also sound a bit like a bad idea.
So far I tried sticking the read/update in a transaction, however that only seems to work for the single function call, and not when they're asynchronous.
Also figured I could fix it, by reading the previous response in a transaction on the client, however firebase doesn't allow reading from a collection in a transaction, when not using the server API.
async function setPreviousResponseToInquiry(
senderUid: string,
recipientUid: string,
inquiryId: string) {
return admin.firestore().collection('/inquiries')
.where('recipientUid', '==', recipientUid)
.where('senderUid', '==', senderUid)
.where('responded', '==', true)
.orderBy('modified', 'desc')
.limit(2)
.get().then(snapshot => {
if (!snapshot.empty &&
snapshot.docs.length >= 2) {
return admin.firestore()
.doc(`/inquiries/${inquiryId}`)
.get().then(snap => {
return snap.ref.update({
previousResponse: snapshot.docs[1].data().response
})
})
}
})
}
I see three possible solutions:
Use a transaction on the server, which ensures that the update you write must be based on the version of the data you read. If the value you write depends on the data that trigger the Cloud Function, you may need to re-read that data as part of the transaction.
Don't use Cloud Functions, but run all updates from the client. This allows you to use transactions to prevent the race condition.
If it's no possible to use a transaction, you may have to include a custom version number in both the upstream data (the data that triggers the write), and the fanned out data that you're updating. You can then use security rules to ensure that the downstream data can only be written if its version matches the current upstream data.
I'd consider/try them in the above order, as they gradually get more involved.
When writing event based cloud functions for firebase firestore it's common to update fields in the affected document, for example:
When a document of users collection is updated a function will trigger, let's say we want to determine the user info state and we have a completeInfo: boolean property, the function will have to perform another update so that the trigger will fire again, if we don't use a flag like needsUpdate: boolean to determine if excecuting the function we will have an infinite loop.
Is there any other way to approach this behavior? Or the situation is a consequence of how the database is designed? How could we avoid ending up in such scenario?
I have a few common approaches to Cloud Functions that transform the data:
Write the transformed data to a different document than the one that triggers the Cloud Function. This is by far the easier approach, since there is no additional code needed - and thus I can't make any mistakes in it. It also means there is no additional trigger, so you're not paying for that extra invocation.
Use granular triggers to ensure my Cloud Function only gets called when it needs to actually do some work. For example, many of my functions only need to run when the document gets created, so by using an onCreate trigger I ensure my code only gets run once, even if it then ends up updating the newly created document.
Write the transformed data into the existing document. In that case I make sure to have the checks for whether the transformation is needed in place before I write the actual code for the transformation. I prefer to not add flag fields, but use the existing data for this check.
A recent example is where I update an amount in a document, which then needs to be fanned out to all users:
exports.fanoutAmount = functions.firestore.document('users/{uid}').onWrite((change, context) => {
let old_amount = change.before && change.before.data() && change.before.data().amount ? change.before.data().amount : 0;
let new_amount = change.after.data().amount;
if (old_amount !== new_amount) {
// TODO: fan out to all documents in the collection
}
});
You need to take care to avoid writing a function that triggers itself infinitely. This is not something that Cloud Functions can do for you. Typically you do this by checking within your function if the work was previously done for the document that was modified in a previous invocation. There are several ways to do this, and you will have to implement something that meets your specific use case.
I would take this approach from an execution time perspective, this means that the function for each document will be run twice. Each time when the document is triggered, a field lastUpdate would be there with a timestamp and the function only updates the document if the time is older than my time - eg 10 seconds.
My use case is that I want to keep aggregating my firebase user count in the database for quick and easy access. For that, I have a cloud function listening on user.onCreate and it simply increments a field in a document using the atomic FieldValue.increment.
Here is the code:
exports.createProfile = functions.auth.user().onCreate(async user => {
return Promise.all([
addProfileToDatabase(),
function() {
db.collection('someCollection').doc(docId).update({
count: admin.firestore.FieldValue.increment(1)
)}
}
])
})
Issue: the count in the database becomes more than the number of authenticated users shown in the Authentication tab of Firebase. I regularly reset it to the correct number and then it slowly increases again.
I have read about the write throttling on a document, but that should instead result in lower count if at all. But why is that the count in the database always overshoots the actual count?
Without seeing your code, the only thing I can imagine is that your function isn't idempotent. It's possible that functions may be invoked more than once per triggering event. This would be an explanation why your count exceeds the expectation.
Read more about Cloud Functions idempotency in the documentation and also this video.
In general, if I want to be sure what happens when several threads make concurrent updates to the same item in DynamoDB, I should use conditional updates (i.e.,"optimistic locking"). I know that. But I was wondering if there is any other case when I can be sure that concurrent updates to the same item survive.
For example, in Cassandra, making concurrent updates to different attributes of the same item is fine, and both updates will eventually be available to read. Is the same true in DynamoDB? Or is it possible that only one of these updates survive?
A very similar question is what happens if I add, concurrently, two different values to a set or list in the same item. Am I guaranteed that I'll eventually see both values when I read this set or list, or is it possible that one of the additions will mask out the other during some sort of DynamoDB "conflict resolution" protocol?
I see a version of my second question was already asked here in the past Are DynamoDB "set" values CDRTs?, but the answer refered to a not-very-clear FAQ entry which doesn't exist any more. What's I would most like to see as an answer to my question is an official DynamoDB documentation that says how DynamoDB handles concurrent updates when neither "conditional updates" nor "transactions" are involved, and in particular what happens in the above two examples. Absent such official documentation, does anyone have any real-world experience with such concurrent updates?
I just had the same question and came across this thread. Given that there was no answer I decided to test it myself.
The answer, as far as I can observe is that as long as you are updating different attributes it will eventually succeed. It does take a little bit longer the more updates I push to the item so they appear to be written in sequence rather than in parallel.
I also tried updating a single List attribute in parallel and this expectedly fail, the resulting list once all queries had completed was broken and only had some of the entries pushed to it.
The test I ran was pretty rudimentary and I might be missing something but I believe the conclusion to be correct.
For completeness, here is the script I used, nodejs.
const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();
const key = process.argv[2];
const num = process.argv[3];
run().then(() => {
console.log('Done');
});
async function run() {
const p = [];
for (let i = 0; i < num; i++) {
p.push(ddb.update({
TableName: 'concurrency-test',
Key: {x: key},
UpdateExpression: 'SET #k = :v',
ExpressionAttributeValues: {
':v': `test-${i}`
},
ExpressionAttributeNames: {
'#k': `k${i}`
}
}).promise());
}
await Promise.all(p);
const response = await ddb.get({TableName: 'concurrency-test', Key: {x: key}}).promise();
const item = response.Item;
console.log('keys', Object.keys(item).length);
}
Run like so:
node index.js {key} {number}
node index.js myKey 10
Timings:
10 updates: ~1.5s
100 updates: ~2s
1000 updates: ~10-20s (fluctuated a lot)
Worth noting is that the metrics show a lot of throttled events but these are handled internally by the nodejs sdk using exponential backoff so once the dust settled everything was written as expected.
Your post contains quite a lot of questions.
There's a note in DynamoDB's manual:
All write requests are applied in the order in which they were received.
I assume that the clients send the requests in the order they were passed through a call.
That should resolve the question whether there are any guarantees. If you update different properties of an item in several requests updating only those properties, it should end up in an expected state (the 'sum' of the distinct changes).
If you, on the other hand, update the whole object, the last one will win.
DynamoDB has #DynamoDbVersion which you can use for optimistic locking to manage concurent writes of whole objects.
For scenarios like auctions, parallel tick counts (such as "likes"), DynamoDB offers AtomicCounters.
If you update a list, that depends on if you use the DynamoDB's list type (L), or if it is just a property and the client translates the lists into a String (S). So if you read a property, change it, and write, and do that in parallel, the result will be subject to eventual consistency - what you will read may not be the latest write. Applied to lists, and several times, you'll end up with some of the elements added, and some not (or, better said, added but then overwritten).