How to shard Realtime Database data for a chat app? - firebase

I am building a chat app and want to use Realtime Database.
I expect my database to reach the limit of 200k simultaneous connections.
So I have read the documentation about scaling and sharding the data.
However, I don't understand how to handle this for a chat app.
Let's say I have a groups reference that contains the IDs of the users in each group, and the messages for that group.
If I want to scale, I need to create a new DB instance and start writing groups there too, as the first DB may have more than 200k simultaneous connections.
That means users may belong to groups in multiple databases, which already seems weird and not such a good idea.
So I would like to know:
How can I shard the groups reference?
How can I (or even should I) make users connect to multiple DBs according to the groups they belong to?
It seems to be a very complicated way to do things... Am I not understanding this correctly?

I'm sure there are plenty of ways to shard a database, but here's how I've done it. This involves selecting a shard when creating a new chat. For this answer, let's assume there are 4 users (U1, U2, U3 and U4) and 2 shards excluding the default: shard1 and shard2.
Whenever a user creates a new chat, select a shard and create a new node for that chat there. You should store the list of each user's chats somewhere else, along with the shard ID. The default database instance works well for that, but Firestore works too. An object containing the information of a chat will look something like:
{
chatID: "c40f15af19a94b6f84117747337b9f7a",
createdBy: "U1",
users: ["U1", "U2", "U3"],
shardId: "shard2"
}
Now you have a list of chatIDs along with their shards, so just connect your listeners. Again, it depends on what the expected behavior is. In my case I only had to listen to the data selected by the user (i.e. the active chat).
Try to divide chats evenly across all shards. You could pick the shard with the fewest active chats (you will have to store the number of chats created per shard somewhere else, such as the default shard), or something like round robin may be useful. At the same time, take the user creating the chat into account.
Incrementing the count of chats in a shard whenever a new chat is created may be a good way to do this.
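As a rough illustration of the "fewest chats" idea, here is a minimal sketch in plain JavaScript. The function names and the shape of the counts object are assumptions made for this example, not part of any Firebase API; the counts are assumed to be stored somewhere central, such as the default database instance.

```javascript
// Hypothetical helper: choose the shard that currently holds the fewest chats.
// `chatCounts` maps shard IDs to the number of chats created on each shard.
function pickLeastLoadedShard(chatCounts) {
  const entries = Object.entries(chatCounts);
  if (entries.length === 0) {
    throw new Error("No shards configured");
  }
  // Keep the entry with the smallest count; ties go to the first one listed.
  const [shardId] = entries.reduce((best, current) =>
    current[1] < best[1] ? current : best
  );
  return shardId;
}

// Pick a shard and return updated counts so the next pick sees the new load.
function assignChatToShard(chatCounts) {
  const shardId = pickLeastLoadedShard(chatCounts);
  return {
    shardId,
    chatCounts: { ...chatCounts, [shardId]: chatCounts[shardId] + 1 },
  };
}

const result = assignChatToShard({ shard1: 12, shard2: 7 });
console.log(result.shardId); // "shard2"
```

In a real deployment the counter update would need to be atomic, for instance by doing it inside a Realtime Database transaction on the server side.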
In the end, I think it's just about how you divide your chats across shards, and there are many algorithms you can use. Keeping a list of each user's chats that includes the shard name, as above, seems to be an easy way to do so. I personally prefer Firestore for storing the list of chats, so it's easier to query by the creator of a chat, by chats that user U2 is part of, and so on.
Creating new chats through a Cloud Function (or your own servers) is preferred, so no one can spam a single database shard by reverse engineering the app.
This way all your messages are stored in Realtime Database, but the basic information about the chats is in Firestore (not necessary, but it makes chats easier to query). When a user opens the chat app, load the chats they are part of:
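Here's a sample that loads those chat documents from Firestore and then attaches Realtime Database listeners:
const db = firebase.firestore()
// Load the chats the current user is part of.
// "users" is the array field from the sample chat object above; "myUID" stands for the signed-in user's UID.
const chatsSnapshot = await db.collection("chats").where("users", "array-contains", "myUID").get()
// chatsSnapshot is a QuerySnapshot, so the documents are in .docs
const chatsInfo = chatsSnapshot.docs.map((c) => ({...c.data(), id: c.id}))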
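// Realtime DB shards (app1, app2 and app3 are assumed to be Firebase app instances,
// each initialized with the databaseURL of its shard)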
const shards = {
shard1: firebase.database(app1),
shard2: firebase.database(app2),
shard3: firebase.database(app3)
}
// Run a loop on chatsInfo and render chats to your app
for (const chat of chatsInfo) {
// Limit to first N messages if necessary
const chatRef = shards[chat.shardId].ref(chat.id);
chatRef.on('value', (snapshot) => {
const data = snapshot.val();
// Render messages
});
}
You don't need to load all the chats as I've shown above. Load messages only for the chat that is active.
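To illustrate listening only to the active chat, here is a minimal sketch. It assumes a `shards` map of database instances like the one above and chat objects with `id` and `shardId`; `createActiveChatListener` and `onMessages` are hypothetical names used only for this example.

```javascript
// Hypothetical helper that keeps a single Realtime Database listener alive:
// the one for the chat the user is currently viewing.
function createActiveChatListener(shards, onMessages) {
  let activeRef = null;
  let activeCallback = null;

  function detach() {
    if (activeRef) {
      // Remove only the callback this helper attached.
      activeRef.off("value", activeCallback);
      activeRef = null;
      activeCallback = null;
    }
  }

  function open(chat) {
    detach(); // stop listening to the previously active chat
    activeRef = shards[chat.shardId].ref(chat.id);
    activeCallback = (snapshot) => onMessages(chat.id, snapshot.val());
    activeRef.on("value", activeCallback);
  }

  return { open, close: detach };
}
```

Calling `open(chat)` when the user selects a chat, and `close()` when they leave the chat screen, keeps at most one message listener attached at a time.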

Related

Nested snapshot listeners

I use Google Firestore for my iOS app built in Swift/SwiftUI and would like to add snapshot listeners to my app.
I want to list all documents in the debts collection in realtime by using snapshot listeners. Every document in this collection has a subcollection debtors, which I want to get in realtime for each debts document as well. Each document in debtors has a field userId, which refers to a DocumentID in the users collection, which I would also love to have a realtime connection to (for example, when a user changes their name I would love to see it instantly in the debt entity inside the list). This means I must initialize 2 more snapshot listeners for each document in the debts collection. I'm concerned that this is too many open connections once I have around 100 debts in the list. I can't come up with any idea apart from doing just one-time fetches.
Has any of you ever dealt with this kind of nested snapshot listeners? Do I have a reason to worry?
This is my Firestore db:
Debts
  document
    - description
    - ...
    - debtors (subcollection)
      - userId
      - amount
      - ...
Users
  document
    - name
    - profileImage
    - email
I uploaded this gist where you can see how I operate with Firestore right now.
https://gist.github.com/michalpuchmertl/6a205a66643c664c46681dc237e0fb5d
If you want to read all debtors documents anywhere in the database with a given value for userId, you can use a collection group query to do so.
In Swift that'd look like:
db.collectionGroup("debtors").whereField("userId", isEqualTo: "uidOfTheUser").getDocuments { (snapshot, error) in
// ...
}
This will read from any collection named debtors. You'll have to add the index for this yourself, and set up the proper security rules. Both of those are documented in the link I included above.

Firestore, fetch only those documents from a collection which are not present in client cache

I am implementing a one-to-one chat app using Firestore, in which there is a collection named chat such that each document of the collection is a different thread.
When the user opens the app, the screen should display all threads/conversations of that user, including those which have new messages (just like in WhatsApp). Obviously one method is to fetch all documents from the chat collection which are associated with this user.
However, this seems a very costly operation, as the user might have only a few updated threads (threads with new messages), but I have to fetch all the threads.
Is there an optimized and less costly way of doing this, where only those threads are fetched which have new messages, or more precisely, threads which are not present in the user's device cache (either newly created or modified threads)?
Each document in the chat collection has these fields:
senderID: (id of the user who initiated the thread/conversation)
receiverID: (id of the other user in the conversation)
messages: [],
lastMsgTime: (timestamp of last message in this thread)
Currently to load all threads of a certain user, I am applying the following query:
const userID = firebase.auth().currentUser.uid
firebase.firestore().collection('chat').where('senderId', '==', userID)
firebase.firestore().collection('chat').where('receiverId', '==', userID)
Finally, I merge the docs returned by these two queries into an array to render in a FlatList.
In order to know whether a specific thread/document has been updated, the server will have to read that document, which is the charged operation that you're trying to avoid.
The only common way around this is to have the client track when it was last online, and then do a query for documents that were modified since that time. But if you want to show both existing and new documents, this would have to be a separate query, which means that it'd end up in a separate area of the cache. So in that case you'll have to set up your own offline storage on top of Firestore's, which is more work than I'm typically willing to do.
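A minimal sketch of that "modified since last online" approach could look like this. It assumes the namespaced JavaScript SDK, the `senderID`, `receiverID` and `lastMsgTime` field names from the question's field list, a `lastMsgTime` stored as a number (milliseconds), and a `lastSync` value plus a local thread store that the app persists on its own. The function names are made up for this example.

```javascript
// Fetch only the threads that changed since the client's last sync.
async function fetchUpdatedThreads(db, userID, lastSync) {
  const chats = db.collection("chat");
  // Two queries, one per role. Combining an equality filter with a range filter
  // on another field typically requires a composite index.
  const [sent, received] = await Promise.all([
    chats.where("senderID", "==", userID).where("lastMsgTime", ">", lastSync).get(),
    chats.where("receiverID", "==", userID).where("lastMsgTime", ">", lastSync).get(),
  ]);
  return [...sent.docs, ...received.docs].map((doc) => ({ id: doc.id, ...doc.data() }));
}

// Merge fresh threads into the app's own local store (assumed here to be a plain
// array that the app saves itself), newest thread first.
function mergeThreads(stored, updated) {
  const byId = new Map(stored.map((t) => [t.id, t]));
  for (const thread of updated) {
    byId.set(thread.id, thread);
  }
  return [...byId.values()].sort((a, b) => b.lastMsgTime - a.lastMsgTime);
}
```

Only the documents matching the range filter are read, but the app now owns the job of saving the merged list and the last sync time, which is the extra work mentioned above.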

Going over read-quota in firebase firestore

I'm trying to figure out if there's a reasonable way of doing this:
My problem:
Exceeding my daily quota for reads in firestore pretty fast.
My database and what I do:
My database looks like this (simplified):
sessions: { // collection
sessionId: { // document
users: { // collection
userId: { // document
id: string
items: { // collection
itemId: trackObject
}
}
}
}
}
Now I want to retrieve from one session, all users and their items. Most sessions have 2-3 users but some users have around 3000 items. I basically want to retrieve an array like this:
[
{
userId,
items: [
...items
],
},
...users
]
How I go about it currently:
So I get all users:
const usersRef = db.collection(`sessions/${sessionId}/users`);
const userSnapshots = await usersRef.get();
const userDocs = userSnapshots.docs;
Then for each user I retrieve their items:
(I use a for-loop which can be discussed but anyhow)
const user = userDocs[i].data();
// A CollectionReference has no collection() method, so go through the user's document
const itemsRef = usersRef.doc(user.id).collection("items");
const itemSnapshots = await itemsRef.get();
const items = itemSnapshots.docs;
Finally I retrieve the actual items through a map:
user.items = items.map(doc => doc.data());
return user;
My theory:
So it looks like if I do this on a session where a user has 3000 items, the code will perform 3000 read operations on Firestore. After just 17 runs I eat up my quota of 50000 read operations a day.
This reasoning is somewhat based on this answer.
My question:
Is there any other way of doing this? Like getting all tracks in one read-call? Should I see if I can fit all the items into an array-key in the user-object instead of storing as a collection? Is the free version of firestore simply not designed for this many documents being retrieved in one go?
If you're trying to reduce the number of document reads, you'll need to reduce the number of documents that you need to read to implement your use-case.
For example, it is fairly unlikely that a user of your app will want to read the details of all 3000 items. So you might want to limit how many items you initially read, and load the additional items only on demand.
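As a sketch of loading items on demand, the following pages through a user's items with a query cursor. It assumes the namespaced JavaScript SDK and an `itemsRef` CollectionReference; the `createdAt` ordering field and the `loadItemsPage` name are hypothetical.

```javascript
// Read items one page at a time instead of all of them at once.
async function loadItemsPage(itemsRef, pageSize, lastDoc = null) {
  let query = itemsRef.orderBy("createdAt", "desc");
  if (lastDoc) {
    // Continue right after the last document of the previous page.
    query = query.startAfter(lastDoc);
  }
  const snapshot = await query.limit(pageSize).get();
  const docs = snapshot.docs;
  return {
    items: docs.map((doc) => ({ id: doc.id, ...doc.data() })),
    // Pass this back in to request the next page; null when the page came back empty.
    lastDoc: docs.length > 0 ? docs[docs.length - 1] : null,
  };
}
```

With a page size of 30, showing the first screen of items reads about 30 documents rather than 3000.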
Also consider if each item needs to be its own document, or whether you could combine all items of a user into a single document. For example, if you never query the individual items, there is no need to store them as separate documents.
Another thing to consider is whether you can combine common items into a single document. An example of this is, even if you keep the items in a separate subcollection, to keep the names and ids of the most recent 30 items for a user in the user's document. This allows you to easily show a user and their 30 most recent items. Doing this you're essentially pre-rendering those 30 items of each user, significantly reducing the number of documents you need to read.
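To make the "most recent 30 items" idea concrete, here is a small sketch of maintaining such a list in plain JavaScript. The `{ id, name }` summary shape and the function name are assumptions for this example; the returned array is what you would write back to the user's document whenever an item is added.

```javascript
// Keep a short summary of the newest items, meant to live on the user document.
const RECENT_LIMIT = 30;

function addToRecentItems(recentItems, item, limit = RECENT_LIMIT) {
  const summary = { id: item.id, name: item.name };
  // Newest first; drop any older entry for the same item, then trim to the limit.
  const withoutDuplicate = recentItems.filter((entry) => entry.id !== item.id);
  return [summary, ...withoutDuplicate].slice(0, limit);
}

const recent = addToRecentItems([{ id: "a", name: "First" }], { id: "b", name: "Second", size: 42 });
console.log(recent.map((entry) => entry.id)); // [ 'b', 'a' ]
```

Reading the user document then gives you the user plus their latest items in a single document read.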
To learn more on data modeling considerations, see:
Cloud Firestore Payments
Going over read-quota in firebase firestore
the video series Getting to know Cloud Firestore, specifically What is a NoSQL Database? How is Cloud Firestore structured? and How to Structure Your Data
this article on NoSQL data modeling

How to avoid redundantly downloading data when using switchMap and inner observables in RxFire?

I have some RxFire code that listens to a Firestore collection query (representing channels) and, for each of the results, listens to a Realtime Database ref for documents (representing messages in that channel).
The problem I'm running into is that the Realtime Database documents are re-downloaded every time the Firestore query changes, even if they're for a path/reference that hasn't changed.
Here's some pseudo-code:
collection(channelsQuery).pipe(
// Emits full array of channels whenever the query changes
switchMap(channels => {
return combineLatest(
channels.map(channel =>
// Emits the full set of messages for a given channel
list(getMessagesRef(channel)),
),
);
})
)
Imagine the following scenario:
Query initially emits 3 Firestore channel documents
Observables are created for corresponding Realtime Database refs for those 3 channels, which emit their message documents
A new Firestore document is added that matches the original query, which now emits 4 channel documents
The previous observables for Realtime Database are destroyed, and new ones are created for the now 4 channels, re-downloading and emitting all the data it already had for the previous 3.
Obviously this is not ideal as it causes a lot of redundant reads on the Realtime Database. What's the best practice in this case? Keep in mind that when a channel is removed, I would like to destroy the corresponding observable, which switchMap already does.

Custom Authentication in Google Firebase

I have a question regarding authentication using Google Firebase.
For an app, I want to build an authentication similar to the one Slack uses: first, the user provides the input as to which group they want to log in to. If there exists a group with the same name as provided in the input, the user is then taken to a login/signup screen.
I've thought about storing users in the realtime database as follows, but I think there must be a better way to do this (since I don't think I can use the firebase authentication in this case):
groups: {
"some_group_name": {
"users": [
"user1": {
.. user 1 information
},
"user2": {
.. user 2 information
}
],
"group_details": {
"name": ..,
"someGroupDetail": ..
}
},
"some_other_group_name": {
...
}
}
I haven't realized if there is an obvious answer yet, so I'm open to suggestions. How would you suggest I tackle this?
Thanks
PS: I'm building the application using Nativescript and Angular, and (so far) there is no server or database involved other than Firebase.
Another suggestion that might work is to use Firebase Auth custom claims. That way, you only need to store the group ID and group name in your Realtime Database, without having to change the database each time a user is added or removed.
This is one way you can do it:
Store the database exactly like you have it, with its group ID and name.
In your backend script (I recommend a Cloud Function), each time a user registers, add custom claims to the user specifying which group the user belongs to.
Every time the user authenticates, retrieve the group ID from the custom claims. And there you have it!
Note: be careful not to put too much information in your custom claims, as the payload cannot exceed 1000 bytes.
Read more about it here: https://firebase.google.com/docs/auth/admin/custom-claims
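A rough sketch of the backend step might look like the following. It assumes `auth` is the Admin SDK auth service (what `admin.auth()` returns); the `groupId` claim name and both function names are hypothetical choices for this example.

```javascript
// Build the claims and check them against the 1000-byte limit before sending.
const MAX_CLAIMS_BYTES = 1000;

function buildGroupClaims(groupId) {
  const claims = { groupId };
  // Measure the serialized claims payload.
  const size = Buffer.byteLength(JSON.stringify(claims), "utf8");
  if (size > MAX_CLAIMS_BYTES) {
    throw new Error(`Custom claims are ${size} bytes, over the ${MAX_CLAIMS_BYTES}-byte limit`);
  }
  return claims;
}

async function assignUserToGroup(auth, uid, groupId) {
  const claims = buildGroupClaims(groupId);
  // setCustomUserClaims replaces the user's existing custom claims.
  await auth.setCustomUserClaims(uid, claims);
  return claims;
}
```

On the client, the claims should show up on the user's ID token once it has been refreshed, for example through `getIdTokenResult()` and its `claims` property.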
I would suggest implementing root-level collections, that is, creating collections at the root level of your database to organize disparate data sets.
Advantages: As your lists grow, the size of the parent document doesn't change. You also get full query capabilities on subcollections.
Possible use case: In the same chat app, for example, you might create collections of users or messages within chat room documents.
This is based on the "Choose a data structure" guide in the Cloud Firestore documentation (I know you are using Realtime Database, but the structuring principles are similar since both are NoSQL databases).
For your case:
Make 2 collections: Users and Groups.
Users: each user's info is stored as a document.
Groups: here comes the tricky part. You can either store all group subcollections under 1 document or split them into multiple documents (based on your preference).
In the group subcollection, you can store the group info as well as the assigned users, kept as an array. Then, whenever a user accesses a group, check the assigned users first; if the user is in the list, allow access (assuming users can view all groups).
You do the thinking now
