Is it OK to perform thousands of read and delete Firestore operations in one Cloud Function? - firebase

I have an events parent collection with an Attendee subcollection that records all users who will attend the event, like the image below. The Attendee subcollection contains users data.
I also have a users parent collection with an attendedEvents subcollection that records all events the user will attend, like the image below. The attendedEvents subcollection contains events data.
I use denormalization, so the events data is duplicated in the attendedEvents subcollection like this.
I then made a cron job using a Cloud Function. Its task is to evaluate whether an event has passed (expired). If it has, the function should:
update the event's isActive field from true to false
read all Attendee documents of all expired events, get all the attendeeIDs, and then delete the event's data from the attendedEvents subcollection of each user in the users collection.
As you can see, the second task of my cron job may need to read around 50,000 - 100,000 documents and also delete around 50,000 - 100,000 documents in the worst case (peak).
So my question is: is it OK to perform thousands of read and delete operations in one Cloud Function like this?
I am worried there is a limitation that I don't know about. Is there something I have not considered? Is there a better approach?
Here is my cloud function code:
exports.cronDeactivatingExpiredEvents = functions.https.onRequest(async (request,response) => {
  const now = new Date()
  const oneMonthAgo = moment().subtract(1,"month").toDate()
  try {
    const expiredEventsSnapshot = await eventRef
      .where("isActive","==",true)
      .where("hasBeenApproved","==",true)
      .where("dateTimeStart",">",oneMonthAgo)
      .where("dateTimeStart","<",now)
      .get()
    const eventDocumentsFromFirestore = expiredEventsSnapshot.docs
    const updateEventPromises = []
    eventDocumentsFromFirestore.forEach(eventSnapshot => {
      const event = eventSnapshot.data()
      const p = admin.firestore()
        .doc(`events/${event.eventID}`)
        .update({isActive: false})
      updateEventPromises.push(p)
    })
    // 1. update isActive to be false in firestore document
    await Promise.all(updateEventPromises)
    console.log(`Successfully deactivating ${expiredEventsSnapshot.size} expired events in Firestore`)
    // getting all attendeeIDs.
    // this may need to read around 50.000 documents
    const eventAttendeeSnapshot = await db.collection("events").doc(eventID).collection("Attendee").get()
    const attendeeDocuments = eventAttendeeSnapshot.docs
    const attendeeIDs = []
    attendeeDocuments.forEach( attendeeSnapshot => {
      const attendee = attendeeSnapshot.data()
      attendeeIDs.push(attendee.uid)
    })
    // 3. then delete expired event in users subcollection.
    // this may need to delete 50.000 documents
    const deletePromises = []
    attendeeIDs.forEach( attendeeID => {
      const p = db.collection("users").doc(attendeeID).collection("attendedEvents").doc(eventID).delete()
      deletePromises.push(p)
    })
    await Promise.all(deletePromises)
    console.log(`successfully delete all events data in user subcollection`)
    response.status(200).send(`Successfully deactivating ${expiredEventsSnapshot.size} expired events and delete events data in attendee subcollection`)
  } catch (error) {
    response.status(500).send(error)
  }
})

You have to pay attention to a few things here.
1) There are some limits on the Cloud Functions side. One quota you might hit, depending on how you use the data you're reading, is Outbound Socket Data, which is 10GB/100 seconds excluding HTTP response data. If you hit this quota, you can request an increase by going to IAM & admin >> Quotas >> Edit Quotas and selecting Cloud Function API (Outgoing socket traffic for the region you want).
There is also the maximum function duration of 540 seconds. I believe what you have described should not take that long. If it does, and you are committing a batch delete, the deletion will still be done even if your function fails because it exceeded the duration.
2) On the Firestore side, you have some limits too. Here you can read about some best practices when dealing with Read/Write operations and High read, write, and delete rates. Depending on the structure and the type of your data, you might encounter issues such as connection errors if you try to delete lexicographically close documents at a high rate.
Also keep in mind the more generic Firestore quotas on the number of Read/Write operations for each payment plan.
In any case, even with the best calculations there is always room for error. So my advice would be to test a scenario with the highest peak you are expecting. If you hit any quotas you can request a quota increase, or if you hit any hard limits you can contact Google Cloud Platform Support, providing specific details on your project and use case.
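The batch delete mentioned in this answer could look roughly like the following. This is only a minimal sketch: it assumes `db` is an initialized `admin.firestore()` instance and `refs` is an array of DocumentReferences, the helper names `chunk` and `deleteInBatches` are hypothetical, and the default of 500 writes per batch is the limit Firestore documented for a long time (check the current quota pages before relying on it).

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Delete the given DocumentReferences in sequential batched writes.
// Assumption: `db` is an initialized admin.firestore() instance.
async function deleteInBatches(db, refs, batchSize = 500) {
  let deleted = 0;
  for (const group of chunk(refs, batchSize)) {
    const batch = db.batch();
    group.forEach((ref) => batch.delete(ref));
    await batch.commit(); // wait for this batch before starting the next one
    deleted += group.length;
  }
  return deleted;
}
```

Compared with pushing 50,000 individual delete() promises into Promise.all(), committing batches one after another keeps the number of in-flight requests bounded, at the price of a longer total run time.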

Related

How to keep data consistent when sync documents with cloud firestore triggers?

I have a document MAIN which holds an array of objects. On every change to this array I want to update related documents. So in the example below, if I add something to the targets of MAIN, I want to grab all documents where the field "parent" holds a reference to MAIN and then update their targets accordingly. I wanted to do this with Cloud Functions so the client does not have to take care of updating all related documents itself. But as stated in the docs, Cloud Functions triggers do not guarantee order. So if, for example, a user adds a new object to the targets of MAIN and then removes it, the trigger might receive the remove event before the add event, and the RELATED documents would be left with inconsistent data. As stated by Doug Stevenson in this Stack Overflow post, this could happen even when using transactions. Am I right so far?
const MAIN = {
  ID: "MAIN",
  targets: [{name: "House"}, {name: "Car"}]
}
const RELATED_1 = {
  parent: "MAIN",
  targets: [{name: "House"}, {name: "Car"}]
}
const RELATED_2 = {
  parent: "MAIN",
  targets: [{name: "House"}, {name: "Car"}]
}
If yes, I was thinking about adding a server timestamp to MAIN whenever I modify the document. I would use this timestamp to update RELATED documents only if their timestamp is smaller than the parent's. If it is, I update the array and set the timestamp to that of the parent.
const MAIN = {
  ID: "MAIN",
  targets: [{name: "House"}, {name: "Car"}],
  modifiedAt: "11.04.2022 10:25:33:233"
}
const RELATED_1 = {
  parent: "MAIN",
  lastSync: "11.04.2022 10:25:33:233",
  targets: [{name: "House"}, {name: "Car"}]
}
const RELATED_2 = {
  parent: "MAIN",
  lastSync: "11.04.2022 10:25:33:233",
  targets: [{name: "House"}, {name: "Car"}]
}
Would that work? Or how could one sync denormalized data with cloud functions and keep data consistent? Is this even possible?
Using the Cloud Firestore client libraries, you can group multiple operations into a single transaction. Transactions are useful when you want to update a field's value based on its current value, or the value of some other field.
A transaction consists of any number of get() operations followed by any number of write operations such as set(), update(), or delete(). In the case of a concurrent edit, Cloud Firestore runs the entire transaction again. For example, if a transaction reads documents and another client modifies any of those documents, Cloud Firestore retries the transaction. This feature ensures that the transaction runs on up-to-date and consistent data.
Transactions are a way to always ensure a write occurs with the latest information available on the server. Transactions never partially apply writes & all writes execute at the end of a successful transaction.
For a transaction to succeed, the documents retrieved by its read operations must remain unmodified by operations outside the transaction. If another operation attempts to change one of those documents, that operation enters a state of data contention with the transaction.
A transaction works differently than a regular update. It goes like this:
Firestore runs the transaction.
You get the document ready to update whatever property you want to update.
Firestore checks if the document has changed. If not, you’re good, and your update goes through.
If the document has changed, let’s say a new update happened before yours did, then Firestore gets you the new version of the document and repeats the process until it finally updates the document.
The following example from firebase documentation shows how to create and run a transaction:
// Initialize document
const cityRef = db.collection('cities').doc('SF');
await cityRef.set({
  name: 'San Francisco',
  state: 'CA',
  country: 'USA',
  capital: false,
  population: 860000
});
try {
  await db.runTransaction(async (t) => {
    const doc = await t.get(cityRef);
    // Add one person to the city population.
    // Note: this could be done without a transaction
    // by updating the population using FieldValue.increment()
    const newPopulation = doc.data().population + 1;
    t.update(cityRef, {population: newPopulation});
  });
  console.log('Transaction success!');
} catch (e) {
  console.log('Transaction failure:', e);
}
For more information, refer to how do transactions work, updating data, and fastest way to perform bulk data creation.
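Applied to the question, the timestamp idea and a transaction could be combined along these lines. This is a sketch under assumptions, not a verified fix for out-of-order triggers: `db` is taken to be an initialized `admin.firestore()` instance, `modifiedAt` a Firestore Timestamp copied from MAIN, the RELATED document is assumed to exist, and `isNewer` and `syncRelated` are hypothetical names.

```javascript
// True when the incoming change is more recent than what was last synced.
// A missing lastSync (null/undefined) counts as "never synced".
function isNewer(incomingMillis, lastSyncMillis) {
  return lastSyncMillis == null || incomingMillis > lastSyncMillis;
}

// Copy `targets` into one RELATED document, but only if this event carries
// a newer MAIN.modifiedAt than the one already applied to that document.
// Assumption: `db` is an initialized admin.firestore() instance.
function syncRelated(db, relatedPath, targets, modifiedAt) {
  const ref = db.doc(relatedPath);
  return db.runTransaction(async (t) => {
    const snap = await t.get(ref);
    const lastSync = snap.get('lastSync'); // Firestore Timestamp or undefined
    const lastMillis = lastSync ? lastSync.toMillis() : null;
    if (!isNewer(modifiedAt.toMillis(), lastMillis)) {
      return false; // stale event delivered late: skip it
    }
    t.update(ref, { targets: targets, lastSync: modifiedAt });
    return true;
  });
}
```

Because the read and the conditional write happen inside one transaction, two triggers racing on the same RELATED document cannot both pass the check against the same old lastSync value.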

Is it possible to know that onSnapshot connection couldn't be established because of no/bad internet connection? (WEB SDK) [duplicate]

I want to check a user's online status. In Realtime Database I used to check this with the help of onDisconnect(), but now I've shifted to Firestore and can't find any similar method there.
According to this onDisconnect:
The onDisconnect class is most commonly used to manage presence in applications where it is useful to detect how many clients are connected and when other clients disconnect.
To be able to use presence in Firestore, you need to connect Firestore with Realtime Database (there is no other way).
Please check this for more info:
https://firebase.google.com/docs/firestore/solutions/presence
NOTE: This solution is not especially efficient
Off the top of my head (read: I haven't thought through the caveats), you could do something like this:
const fiveMinutes = 300000 // five minutes, or whatever makes sense for your app
// "maintain connection"
setInterval(() => {
  userPresenceDoc.set({ online: new Date().getTime() })
}, fiveMinutes)
then, on each client...
userPresenceDoc.onSnapshot(doc => {
  const fiveMinutesAgo = new Date().getTime() - fiveMinutes
  const isOnline = doc.data().online > fiveMinutesAgo
  setUserPresence(isOnline)
})
You'd probably want the code checking for presence to use an interval a little more than the interval used by the code maintaining the connection to account for network lag, etc.
A NOTE ABOUT COST
So, obviously, there could be significant lag between when someone disconnects and when that disconnection is recognized by other clients. You can decrease that lag time by increasing the frequency of writes to Firestore, and thus increasing your costs. Running the numbers, I came out with the following costs for different intervals, assuming a single client connection running continuously for a month:
Interval Cost/User/Month
----------------------------
10m $0.007776
5m $0.015552
1m $0.07776
10s $0.46656
5s $0.93312
1s $4.6656
Going with an interval of one second is pretty pricy, at $46,656 a month for a system with 10,000 users who leave the app open all month long. An interval of 10 minutes with the same number of users would only cost $77.76 a month. A more reasonable interval of one minute, 10,000 users, and only four hours of app usage per day per user, rings in at $129.60 / month.
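The figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes a write price of $0.18 per 100,000 documents and a 30-day month; both values are inferred from the table rather than stated in the answer, and `monthlyHeartbeatCost` is a hypothetical helper name.

```javascript
// Assumed price, inferred from the table: $0.18 per 100,000 document writes.
const PRICE_PER_WRITE = 0.18 / 100000;

// Cost in dollars of writing one heartbeat every `intervalSeconds`.
function monthlyHeartbeatCost(intervalSeconds, options = {}) {
  const { users = 1, hoursPerDay = 24, daysPerMonth = 30 } = options;
  const writesPerUser = ((hoursPerDay * 3600) / intervalSeconds) * daysPerMonth;
  return writesPerUser * users * PRICE_PER_WRITE;
}

// 10-minute interval, one user online all month
console.log(monthlyHeartbeatCost(600).toFixed(6)); // 0.007776
// 1-minute interval, 10,000 users, four hours of usage per day
console.log(monthlyHeartbeatCost(60, { users: 10000, hoursPerDay: 4 }).toFixed(2)); // 129.60
```

Only the heartbeat writes are counted here; reads by listening clients would come on top.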
There is no equivalent. The Firestore SDK currently doesn't have presence management like the Realtime Database SDK.
Instead, you might want to use the Realtime Database onDisconnect() in conjunction with Cloud Functions to kick off some work when the client disconnects from RTDB. You would be assuming that your app probably also lost its connection to Firestore at the same time.
Try this, although the method is somewhat hackish because we cannot use onDisconnect in Firestore. As far as I know, Realtime Database uses secure WebSocket technology, which is why it has onDisconnect.
But you can use a Realtime Database trigger implemented in Cloud Functions to update the Firestore data:
functions.database.ref('users/{userId}').onUpdate()
Somewhere on the client side:
firebase.database()
  .ref('.info/connected')
  .on('value', async (snap) => {
    if (snap.val() === true) {
      // Update the online status in RTDB
      await firebase.database()
        .ref(`users/${credentials.user.uid}/`)
        .set({
          online_status: true,
          start_online: firebase.database.ServerValue.TIMESTAMP
        });
      // OnDisconnect
      firebase.database()
        .ref(`users/${credentials.user.uid}/`)
        .onDisconnect()
        .set({
          online_status: false,
          last_online: firebase.database.ServerValue.TIMESTAMP
        });
    }
  });
In Cloud Functions (this will be triggered on update):
export const onUserOnlineStatusChanged = functions.database.ref('users/{userId}').onUpdate((event: functions.Change<functions.database.DataSnapshot>, context: functions.EventContext) => {
  return event.after.ref.once('value')
    .then((dataSnapshot) => dataSnapshot.val()) // Get the latest value from the Firebase Realtime database
    .then((value: any) => {
      // Update the value from RTDB to Firestore
      console.log('value.online_status', value.online_status);
      if (value.online_status == true) {
        // Add code if necessary (when the online_status is true)
        // Set the value to the firestore; the promise is returned so the function waits for the write
        return admin.firestore()
          .collection('users_info')
          .doc(context.params.userId) // Get document by the userId / Or use .where
          .set({
            online_status: value.online_status,
            updated_at: new Date()
          }, {
            mergeFields: [
              'online_status',
              'updated_at'
            ]
          });
      } else if (value.online_status == false) {
        // Add code if necessary (when the online_status is false)
        // Set the value to the firestore; the promise is returned so the function waits for the write
        return admin.firestore()
          .collection('users_info')
          .doc(context.params.userId) // Get document by the userId / Or use .where
          .set({
            online_status: value.online_status,
            updated_at: new Date()
          }, {
            mergeFields: [
              'online_status',
              'updated_at'
            ]
          });
      }
    });
});
It takes about 1 or 2 seconds for the update to go from the Cloud Function to Firestore.
There is no direct way to do this, but this trick helped me achieve a disconnect listener.
window.addEventListener("beforeunload", async function (e) {
  e.preventDefault();
  await firestoreRef.doc("doc-ref").update({ online: false });
});

minimize time operation in firebase/firestore

I am building a React Native app with Firebase & Firestore.
What I want to do is, when a user opens the app, insert/update their status to 'online' (a kind of presence system), and when the user closes the app, set their status to 'offline'.
I did it with firebase.database.onDisconnect(), and it works fine.
This is the function:
async signupAnonymous() {
  const user = await firebase.auth().signInAnonymouslyAndRetrieveData();
  this.uid = firebase.auth().currentUser.uid
  this.userStatusDatabaseRef = firebase.database().ref(`UserStatus/${this.uid}`);
  this.userStatusFirestoreRef = firebase.firestore().doc(`UserStatus/${this.uid}`);
  firebase.database().ref('.info/connected').on('value', async connected => {
    if (connected.val() === false) {
      // this.userStatusFirestoreRef.set({ state: 'offline', last_changed: firebase.firestore.FieldValue.serverTimestamp()},{merge:true});
      return;
    }
    await firebase.database().ref(`UserStatus/${this.uid}`).onDisconnect().set({ state: 'offline', last_changed: firebase.firestore.FieldValue.serverTimestamp() },{merge:true});
    this.userStatusDatabaseRef.set({ state: 'online', last_changed: firebase.firestore.FieldValue.serverTimestamp() },{merge:true});
    // this.userStatusFirestoreRef.set({ state: 'online',last_changed: firebase.firestore.FieldValue.serverTimestamp() },{merge:true});
  });
}
After that, I added a trigger to insert the data into Firestore (because I want to work with Firestore). This is the function (it works fine, BUT it takes 3-4 sec):
module.exports.onUserStatusChanged = functions.database
  .ref('/UserStatus/{uid}').onUpdate((change,context) => {
    const eventStatus = change.after.val();
    const userStatusFirestoreRef = firestore.doc(`UserStatus/${context.params.uid}`);
    return change.after.ref.once("value").then((statusSnapshot) => {
      return statusSnapshot.val();
    }).then((status) => {
      console.log(status, eventStatus);
      if (status.last_changed > eventStatus.last_changed) return status;
      eventStatus.last_changed = new Date(eventStatus.last_changed);
      //return userStatusFirestoreRef.set(eventStatus);
      return userStatusFirestoreRef.set(eventStatus,{merge:true});
    });
  });
Then I want to calculate the number of online users in the app, so I added a trigger that runs when new data is written to the Firestore node and counts the online users with a query (it works fine but takes 4-7 sec):
module.exports.countOnlineUsers = functions.firestore.document('/UserStatus/{uid}').onWrite((change,context) => {
  console.log('userStatus')
  const userOnlineCounterRef = firestore.doc('Counters/onlineUsersCounter');
  const docRef = firestore.collection('UserStatus').where('state','==','online').get().then(e=>{
    let count = e.size;
    console.log('count',count)
    return userOnlineCounterRef.update({count})
  })
  return Promise.resolve({success:'added'})
})
Then, in my React Native app, I get the count of online users:
this.unsubscribe = firebase.firestore().doc(`Counters/onlineUsersCounter`).onSnapshot(doc=>{
  console.log('count',doc.data().count)
})
All the operations take about 12 sec. That's too much for me; it's an online app.
my firebase structure
What am I doing wrong? Maybe there is an unnecessary function or something?
My final goals:
minimize operation time.
get the online users count (with a listener, so each change updates in the app)
update user status.
If there is another way to do that, I would love to know.
Cloud Functions can go into a 'cold start' mode, where they take some time to boot up. This is the only reason I can think of that it would take that long. Stack Overflow: Firebase Cloud Functions Is Very Slow
But your Cloud Function only needs to write to Firestore on log out, to catch the case where your user closes the app. You can write to it directly on log in from your client with auth().onAuthStateChanged().
You could also just always read who is logged in or out directly from the Realtime Database and use Firestore for the rest of your data.
You can rearrange your data so that instead of a 'UserStatus' collection you have an 'OnlineUsers' collection containing only online users, kept in sync by deleting the documents on log out. Then it won't take a query operation to get them. The query's impact on your performance is likely minimal, but this would perform better with a large number of users.
The documentation also has a guide that may be useful: Firebase Docs: Build Presence in Cloud Firestore
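One more thing that may be worth checking in the countOnlineUsers trigger from the question: it returns Promise.resolve(...) immediately, before the query and the counter update have finished. Background functions are expected to return a promise that settles only when their asynchronous work is done, so the sketch below returns the whole chain. Whether this accounts for the observed delay is an assumption; `recountOnlineUsers` is a hypothetical helper name and `firestore` is assumed to be an initialized `admin.firestore()` instance.

```javascript
// Count the documents with state == 'online' and store the result in the
// counter document. The promise chain is returned to the caller.
// Assumption: `firestore` is an initialized admin.firestore() instance.
function recountOnlineUsers(firestore) {
  const counterRef = firestore.doc('Counters/onlineUsersCounter');
  return firestore
    .collection('UserStatus')
    .where('state', '==', 'online')
    .get()
    .then((snapshot) => counterRef.update({ count: snapshot.size }));
}

// Inside the trigger, the chain itself would be returned, for example:
// module.exports.countOnlineUsers = functions.firestore
//   .document('/UserStatus/{uid}')
//   .onWrite(() => recountOnlineUsers(firestore));
```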

Writing 4 returns for one trigger, that each return is to different docRef

I want to write a Firestore function that, on creation of a new document, updates a couple of different docs.
For example, for statistics: when a new sports session doc is added to the sessions collection, it will update the docs yearlyStats, quartlyStats, monthlyStats and dailyStats.
So the question is: how do I write 4 returns for one trigger, where each return is for a different docRef?
Do I need to write 4 separate functions with the same trigger, or can I do it all in one function?
If you are updating statistics based on new documents that are created, you may be better to use transactions. This way, you will ensure that 2 concurrent document creations don't both update the statistics documents at the same time. You can have a transaction read the value from the new document and then update several documents.
If you simply want to write several documents at the same time, from within a Cloud Function, take a look at using Batched Writes.
The documentation for both options can be found here, Transactions and Batched Writes.
With both options, be aware that you can only update a single document at a rate of once per second. If you are processing large numbers of documents, then you may be better to pipe your new document data into Cloud Dataflow (via PubSub from your Cloud Function), then pass regular updates back to Cloud Firestore. If that's your use case, then this video will be useful... Data Pipelines with Firebase and Google Cloud
Code sample using transaction and getAll
This requires the Node SDK 0.12.0 or higher (Admin SDK >= 5.9.1)
const firestore = firebase.firestore();
let firstDocRef = firestore.doc('myCollection/document1');
let secondDocRef = firestore.doc('myCollection/document2');
return firestore.runTransaction(t => {
  return t.getAll(firstDocRef, secondDocRef).then(querySnapshot => {
    // Return just the data and map it to firstDoc and secondDoc (personal hack)
    querySnapshot = querySnapshot.map(doc => doc.data());
    let [firstDocData, secondDocData] = querySnapshot;
    // Increment the counters
    let firstUpdate = {myCounter: firstDocData.myCounter + 1};
    let secondUpdate = {myCounter: secondDocData.myCounter + 1};
    // Write the new data back to Cloud Firestore
    t.update(firstDocRef, firstUpdate);
    t.update(secondDocRef, secondUpdate);
  });
})
.then(() => {
  console.log('Transaction completed successfully');
})
.catch(err => {
  console.error(err);
});
You do that by combining the promises from the four writes into a call to Promise.all() and returning that from your function.
Have a look at the MDN documentation for Promise.all(), or at some of the previous questions where Promise.all() was used.
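A rough sketch of that Promise.all() approach for the four statistics documents is shown below. The layout is an assumption made for illustration: each of yearlyStats, quartlyStats, monthlyStats and dailyStats is treated as a collection holding one document per period, `db` is an initialized `admin.firestore()` instance, and `statsDocIds` and `updateAllStats` are hypothetical helper names.

```javascript
// Build one document id per statistics period for a given date (UTC).
function statsDocIds(date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  const day = String(date.getUTCDate()).padStart(2, '0');
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1;
  return {
    yearlyStats: `${year}`,
    quartlyStats: `${year}-Q${quarter}`,
    monthlyStats: `${year}-${month}`,
    dailyStats: `${year}-${month}-${day}`
  };
}

// Start all four writes and return a single promise that settles when every
// write has finished. Assumption: `db` is an admin.firestore() instance.
function updateAllStats(db, date, update) {
  const writes = Object.entries(statsDocIds(date)).map(([collection, id]) =>
    db.collection(collection).doc(id).set(update, { merge: true })
  );
  return Promise.all(writes);
}

// Inside the onCreate trigger, the combined promise would be returned:
// return updateAllStats(admin.firestore(), new Date(),
//   { sessions: admin.firestore.FieldValue.increment(1) });
```

The four writes here are independent of each other; if they must succeed or fail together, a batched write or a transaction as described in the other answer is the better fit.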
