How to avoid redundantly downloading data when using switchMap and inner observables in RxFire? - firebase

I have some RxFire code that listens to a Firestore collection query (representing channels) and, for each of the results, listens to a Realtime Database ref for documents (representing messages in that channel).
The problem I'm running into is that the Realtime Database documents are re-downloaded every time the Firestore query changes, even if they're for a path/reference that hasn't changed.
Here's some pseudo-code:
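import { collection } from 'rxfire/firestore';
import { list } from 'rxfire/database';
import { combineLatest } from 'rxjs';
import { switchMap } from 'rxjs/operators';

// channelsQuery and getMessagesRef are defined elsewhere
collection(channelsQuery).pipe(
  // Emits full array of channels whenever the query changes
  switchMap(channels => {
    return combineLatest(
      channels.map(channel =>
        // Emits the full set of messages for a given channel
        list(getMessagesRef(channel)),
      ),
    );
  })
)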
Imagine the following scenario:
Query initially emits 3 Firestore channel documents
Observables are created for corresponding Realtime Database refs for those 3 channels, which emit their message documents
A new Firestore document is added that matches the original query, which now emits 4 channel documents
The previous observables for Realtime Database are destroyed, and new ones are created for the now 4 channels, re-downloading and emitting all the data it already had for the previous 3.
Obviously this is not ideal as it causes a lot of redundant reads on the Realtime Database. What's the best practice in this case? Keep in mind that when a channel is removed, I would like to destroy the corresponding observable, which switchMap already does.
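One possible direction, sketched here in plain JavaScript rather than as an RxJS operator: instead of tearing down every inner listener on each emission, diff the channel IDs and only start or stop the listeners whose keys actually changed. `createKeyedListeners` and its `subscribe` callback are hypothetical names for this illustration; in the RxFire code above, `subscribe` would presumably wrap something like `list(getMessagesRef(channel)).subscribe(...)` and return a function that unsubscribes.

```javascript
// Hypothetical helper: keeps one listener per key and only touches the
// listeners whose keys were added or removed between updates.
// `subscribe(key)` is an assumed callback that starts a listener and
// returns a function that stops it.
function createKeyedListeners(subscribe) {
  const active = new Map(); // key -> unsubscribe function

  return {
    // Call with the full list of keys each time the outer query emits.
    update(keys) {
      const next = new Set(keys);

      // Stop listeners for keys that disappeared.
      for (const [key, unsubscribe] of active) {
        if (!next.has(key)) {
          unsubscribe();
          active.delete(key);
        }
      }

      // Start listeners only for keys not seen before.
      for (const key of next) {
        if (!active.has(key)) {
          active.set(key, subscribe(key));
        }
      }
    },

    // Stop everything, e.g. when the outer subscription ends.
    clear() {
      for (const unsubscribe of active.values()) unsubscribe();
      active.clear();
    },

    // Keys that currently have a live listener.
    keys() {
      return [...active.keys()];
    },
  };
}

// Usage with stand-in listeners that only record what happened.
const log = [];
const listeners = createKeyedListeners((channelId) => {
  log.push(`open ${channelId}`);
  return () => log.push(`close ${channelId}`);
});

listeners.update(["a", "b", "c"]);      // opens a, b, c
listeners.update(["a", "b", "c", "d"]); // opens only d
listeners.update(["a", "c", "d"]);      // closes only b
```

With this shape, adding a fourth channel starts one new listener and leaves the existing three alone, while removing a channel still stops its listener.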

Related

How to shard data Realtime Database for chat app?

I am building a chat app and want to use Realtime Database.
I expect my database to reach the quota of 200k simultaneous connections.
So I have read the documentation about scaling and sharding the data.
However, I don't understand how to handle this for a chat app.
Let's say I have a groups reference that contains the IDs of the users inside each group, and the messages for this group.
If I want to scale, I need to create a new DB instance and start writing groups there too, as the first DB may have more than 200k simultaneous connections.
That means users may belong to groups in multiple databases, which already seems weird and not such a good idea.
So I would like to know:
How can I shard the groups reference?
How can I (or even should I) make users connect to multiple DBs according to the groups they belong to?
It seems to be a very complicated way to do things... Am I not understanding this correctly?
I'm sure there are plenty of ways to shard a database but here's how I've done it. This involves selecting a shard while creating a new chat. For this answer, let's assume there are 4 users: U1, U2, U3 and U4, and 2 shards (excluding the default): shard1 and shard2.
Whenever a user creates a new chat, select a shard and create a new node for that chat. You should store the list of each user's chats somewhere else, along with the shard ID. The default database instance seems to be great for that, but Firestore works too. So an object containing information about a chat will look something like:
{
  chatID: "c40f15af19a94b6f84117747337b9f7a",
  createdBy: "U1",
  users: ["U1", "U2", "U3"],
  shardId: "shard2"
}
Now you have a list of chatIDs along with their shards, so just connect your listeners. Again, it depends on what the expected behavior is. In my case I just had to listen to the data selected by the user (i.e. the active chat).
Try to divide chats evenly across all shards. You could pick the shard with the fewest active chats (you will have to store the number of chats created per shard somewhere else, such as the default shard), or something like round robin may be useful. At the same time, take the user creating the chat into account.
Incrementing the count of chats in a shard whenever a new chat is created may be a good way to keep track of this.
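As a rough sketch of the "fewest chats" selection described above: the counters here live in a plain object purely for illustration, and `pickLeastLoadedShard` and `assignShard` are hypothetical helper names, not part of any Firebase API. In practice the counts would be stored in the default database instance or Firestore, and the increment would need an atomic update (for example a transaction), which this sketch ignores.

```javascript
// Hypothetical in-memory counters: shard ID -> number of chats created there.
const chatCounts = { shard1: 12, shard2: 7, shard3: 9 };

// Return the shard ID with the fewest chats (the first one wins on ties),
// or null if no shards are configured.
function pickLeastLoadedShard(counts) {
  let best = null;
  for (const [shardId, count] of Object.entries(counts)) {
    if (best === null || count < counts[best]) best = shardId;
  }
  return best;
}

// Pick a shard for a new chat and bump its counter.
function assignShard(counts) {
  const shardId = pickLeastLoadedShard(counts);
  if (shardId === null) throw new Error("no shards configured");
  counts[shardId] += 1;
  return shardId;
}

const first = assignShard(chatCounts);  // "shard2" (7 is the lowest count)
const second = assignShard(chatCounts); // "shard2" again (8 is still the lowest)
```

The returned shard ID is what would be saved as `shardId` on the chat's info object.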
At the end I think it's just about how you are dividing your chats in shards and there are many algorithms you can use. Having a list of user's chats containing the shard name seems to be an easy way to do so as above. I personally prefer Firestore to store list of chats so it's easier to query based on creator of chat, chats where a user U2 is a part and so on.
Creating new chats using a Cloud Function (or your servers) is preferred so no one can just spam a single database shard by reverse engineering the app.
This way all your messages will be stored in Realtime Database, while the basic information about the chats is in Firestore (not necessary, but it makes the chats easier to query). When a user opens the chat app, load the chats they are part of.
Here's some sample code:
const db = firebase.firestore()
// loading user's chats
const chatsSnapshot = await db.collection("chats").where("members", "array-contains", "myUID").get()
// chatsSnapshot.docs holds the matching document snapshots
const chatsInfo = chatsSnapshot.docs.map((c) => ({...c.data(), id: c.id}))
// Realtime DB shards
const shards = {
  shard1: firebase.database(app1),
  shard2: firebase.database(app2),
  shard3: firebase.database(app3)
}
// Run a loop on chatsInfo and render chats to your app
for (const chat of chatsInfo) {
  // Limit to first N messages if necessary
  const chatRef = shards[chat.shardId].ref(chat.id);
  chatRef.on('value', (snapshot) => {
    const data = snapshot.val();
    // Render messages
  });
}
You don't need to load all the chats as I've shown above. Load messages only for the chat that is active.

Do Observables re-query ALL data when Firestore fields update?

Let's say I do a query against a Firestore collection over a date range or something. If I get an observable to the set of documents and iterate through it to build up a local collection, will it re-read all the data from Firestore every time there is a change in Firestore? Say this observable is from a where clause that contains 500 documents and I iterate through doing something:
this.firestoreObservable$.subscribe(documents => {
  documents.forEach(async doc => {
    // do something
  })
})
If one field on one document changes in Firestore, will that count as another 500 document reads? If so (ouch!), what would the recommended best practice be to keep from spending so many reads?
Thanks.
No. If only one document changes, then it will cost only one read. The entire set of documents is cached in memory as long as the query is actively listening to updates, and the SDK will deliver you the cached results in addition to whatever actually changed.
If the query ends and a new one starts up, then you will be charged for the full set of results again.
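To build up a local collection without reprocessing all 500 documents on every emission, one option is to apply only the changes. The sketch below assumes change objects shaped like those returned by `QuerySnapshot.docChanges()` in the plain Firestore SDK (`{ type, doc }`, with `type` being "added", "modified" or "removed"). Whether your observable exposes them depends on the wrapper library you use, and `applyChanges` and `fakeDoc` are hypothetical names used only for illustration.

```javascript
// Apply a list of changes to a local Map keyed by document ID.
// Each change is assumed to look like { type, doc }, where doc has
// an `id` property and a `data()` method.
function applyChanges(local, changes) {
  for (const change of changes) {
    if (change.type === "removed") {
      local.delete(change.doc.id);
    } else {
      // "added" and "modified" both store the latest data.
      local.set(change.doc.id, change.doc.data());
    }
  }
  return local;
}

// Stand-in for a Firestore document snapshot, for illustration only.
const fakeDoc = (id, data) => ({ id, data: () => data });

const localDocs = new Map();

// First snapshot: every document arrives as "added".
applyChanges(localDocs, [
  { type: "added", doc: fakeDoc("a", { n: 1 }) },
  { type: "added", doc: fakeDoc("b", { n: 2 }) },
]);

// Later snapshot: only "a" changed and "b" was removed.
applyChanges(localDocs, [
  { type: "modified", doc: fakeDoc("a", { n: 5 }) },
  { type: "removed", doc: fakeDoc("b", { n: 2 }) },
]);
```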

Is data passed in Firebase Function for database trigger counted towards outgoing bandwidth?

I am planning to add two Firebase Functions to keep track of the number of elements at a location. One function is triggered when an element is added to the location and increments the counter. The other is triggered when an element is deleted and decrements the counter. An element could contain a considerable amount of data. The functions are provided with the data at the location for which the function is triggered. I wanted to check whether that data is counted towards outgoing bandwidth from the Realtime Database.
For example:
exports.incrementCounter = functions.database
  .ref('/tasks/{taskId}')
  .onCreate((taskSnapshot, context) => {
    // Increment a counter
  })
exports.decrementCounter = functions.database
  .ref('/tasks/{taskId}')
  // Runs when a task is deleted
  .onDelete((taskSnapshot, context) => {
    // Decrement a counter
  })
Is the data present in the taskSnapshot counted towards the outgoing bandwidth for the Firebase Realtime Database?
Ideally, if there were a way to get the number of children at a given location in Firebase Realtime Database, I could just schedule a function to run every 5 minutes, count the number of children at the location and report that, but that functionality is not available at the moment.
Cloud Functions triggers that respond to Realtime Database updates do not bill egress from the database in the form of a snapshot delivered to the function.
If you have further questions about billing, please contact Firebase support directly. https://support.google.com/firebase/contact/support

Firestore, fetch only those documents from a collection which are not present in client cache

I am implementing a one-to-one chat app using firestore in which there is a collection named chat such that each document of a collection is a different thread.
When the user opens the app, the screen should display all threads/conversations of that user including those which have new messages (just like in whatsapp). Obviously one method is to fetch all documents from the chat collection which are associated with this user.
However it seems a very costly operation, as the user might have only few updated threads (threads with new messages), but I have to fetch all the threads.
Is there an optimized and less costly method of doing the same where only those threads are fetched which have new messages or more precisely threads which are not present in the user's device cache (either newly created or modified threads).
Each document in the chat collection has these fields:
senderID: (id of the user who initiated the thread/conversation)
receiverID: (id of the other user in the conversation)
messages: [],
lastMsgTime: (timestamp of last message in this thread)
Currently to load all threads of a certain user, I am applying the following query:
const userID = firebase.auth().currentUser.uid
// Field names are case-sensitive and must match the document fields listed above
firebase.firestore().collection('chat').where('senderID', '==', userID)
firebase.firestore().collection('chat').where('receiverID', '==', userID)
and finally I am merging the docs returned by these two queries in an array to render in a flatlist.
In order to know whether a specific thread/document has been updated, the server will have to read that document, which is the charged operation that you're trying to avoid.
The only common way around this is to have the client track when it was last online, and then do a query for documents that were modified since that time. But if you want to show both existing and new documents, this would have to be a separate query, which means that it'd end up in a separate area of the cache. So in that case you'll have to set up your own offline storage on top of Firestore's, which is more work than I'm typically willing to do.
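To give an idea of what the "own offline storage" part involves, here is a minimal sketch of the merge step only. It assumes you already have the threads from your own local store and the result of a query for threads modified since the last sync (something like `where('lastMsgTime', '>', lastSync)`, which is an assumption, not tested against your data). `mergeThreads` is a hypothetical helper, and timestamps are plain numbers here for simplicity.

```javascript
// Merge threads from a local cache with threads modified since the last sync.
// Both inputs are arrays of plain objects with `id` and `lastMsgTime`
// (plain numbers here, an assumption for this sketch).
function mergeThreads(cached, updated) {
  const byId = new Map(cached.map((t) => [t.id, t]));
  for (const t of updated) byId.set(t.id, t); // the newer copy wins
  // Most recently active thread first.
  return [...byId.values()].sort((a, b) => b.lastMsgTime - a.lastMsgTime);
}

// Threads previously saved on the device.
const cachedThreads = [
  { id: "t1", lastMsgTime: 100 },
  { id: "t2", lastMsgTime: 200 },
];

// Stand-in for the result of the "modified since last sync" query.
const updatedThreads = [
  { id: "t1", lastMsgTime: 300 },
  { id: "t3", lastMsgTime: 250 },
];

const merged = mergeThreads(cachedThreads, updatedThreads);

// Remember the newest timestamp seen, to use as the cutoff for the next query.
const lastSync = Math.max(...merged.map((t) => t.lastMsgTime));
```

Persisting `merged` and `lastSync` between sessions, and handling deleted threads, is the extra work mentioned above.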

Firebase - Firestore - how many time will I read documents

With the new Firestore from Firebase, I discovered that I have poor knowledge with Observables.
My problem is the following:
I get some data with db.collection('room').
If I don't listen to the observable with a subscription, do I fetch the document? (I think so).
For every change in my collection "room", is it considered as a "new document read" by Firestore?
If I have duplicated Observables which return db.collection('room') in my app, will I have X calls to the Firestore database or just one?
Thanks!
If I don't listen to the observable with a subscription, do I fetch the document? (I think so).
When you call var ref = db.collection('room'), ref is not really an observable it is a reference to the 'room' collection. Creating this reference does not perform any data reads (from network or disk).
When you call ref.get() or ref.onSnapshot() then you are fetching the documents from the server.
For every change in my collection "room", is it considered as a "new document read" by Firestore?
If you are listening to the whole collection (no where() or .orderBy() clauses) and you have an active onSnapshot() listener then yes, you will be charged for a document read operation each time a new document is added, changed, or deleted in the collection.
If I have duplicated Observables which return db.collection('room') in my app, will I have X calls to the Firestore database or just one?
If you are listening to the same Cloud Firestore data in two places you will only make one call to the server and be charged for the read operations one time. There's no cost/performance penalty to attaching multiple listeners to one reference.
