Firebase chat - removing old messages - firebase

I have created a chat with just 1 room, private messages, moderation and everything, and it all works great!
While I was testing the chat, I realised that every message ever typed in the chat got saved, and if a lot of people were using the chat it would very quickly take up quite a lot of space in your Firebase.
To give you an example of what I am looking for, let me show you how I handle private messages:
When John sends a private message to Bob, the message will be added to both John's and Bob's lists of private messages, like this:
/private/John <-- Message appended here
/private/Bob <-- Message appended here
This is an example of how the Firebase would look with 2 messages in the chat and 2 private messages:
{
"chat" : {
"516993ddeea4f" : {
"string" : "Some message here",
"time" : "Sat 13th - 18:20:29",
"username" : "test",
},
"516993e241d1c" : {
"string" : "another message here",
"time" : "Sat 13th - 18:20:34",
"username" : "Test2",
}
},
"private" : {
"test" : {
"516993f1dff57" : {
"string" : "Test PM reply!",
"time" : "Sat 13th - 18:20:49",
"username" : "Test2",
},
"516993eb4ec59" : {
"string" : "Test private message!",
"time" : "Sat 13th - 18:20:43",
"username" : "test",
}
},
"Test2" : {
"516993f26dbe4" : {
"string" : "Test PM reply!",
"time" : "Sat 13th - 18:20:50",
"username" : "Test2",
},
"516993eab8a55" : {
"string" : "Test private message!",
"time" : "Sat 13th - 18:20:42",
"username" : "test",
}
}
}
}
The same goes the other way around. Now if Bob were to disconnect, Bob's list of private messages gets removed, but John is still able to see his conversation with Bob because he has a copy of all the messages in his own list. If John then disconnects after Bob, the Firebase would be cleaned and their conversation would no longer be stored.
Is there a way to achieve something like this with the General chat?
Pushing a message to all the users who are using the chat does not seem like a good solution (?). Or is it possible to somehow make the Firebase only keep the latest 100 messages for example?
I hope it makes sense!
Kind regards
PS: Big thanks to the Firebase team for all the help so far, I really appreciate it.

There are a few different ways to tackle this, some more complicated to implement than others. The simplest solution would be to have each user only read the latest 100 messages:
var messagesRef = new Firebase("https://<example>.firebaseio.com/message").limit(100);
messagesRef.on("child_added", function(snap) {
// render new chat message.
});
messagesRef.on("child_removed", function(snap) {
// optionally remove this chat message from DOM
// if you only want last 100 messages displayed.
});
Your messages list will still contain all messages, but it won't affect performance since every client is only asking for the last 100 messages. In addition, if you want to clean up old data, it is best to set the priority of each message to the timestamp of when the message was sent. Then you can remove all the messages older than 2 days with:
var timestamp = new Date();
timestamp.setDate(timestamp.getDate()-2);
// endAt() compares against the stored priority, so pass the numeric
// millisecond timestamp rather than the Date object itself.
messagesRef.endAt(timestamp.getTime()).on("child_added", function(snap) {
snap.ref().remove();
});
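The same cleanup can also be prepared as one batched write. This is only a minimal sketch on a plain object, not a call into the Firebase SDK: `buildExpiryUpdate` is a made-up helper name, and `sentAt` is an assumed numeric millisecond field (the question stores a formatted string in `time`). The returned map relies on the fact that writing null to a child in `update()` removes it.

```javascript
// Hypothetical helper: collect the keys of messages older than maxAgeMs
// and map each one to null, which Firebase treats as a delete in update().
function buildExpiryUpdate(messages, nowMs, maxAgeMs) {
  const cutoff = nowMs - maxAgeMs;
  const updates = {};
  for (const key of Object.keys(messages)) {
    // `sentAt` is an assumed numeric timestamp field (milliseconds).
    if (messages[key].sentAt <= cutoff) {
      updates[key] = null;
    }
  }
  return updates;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const now = Date.UTC(2013, 3, 15);
const sampleMessages = {
  "516993ddeea4f": { string: "Some message here", sentAt: now - 3 * DAY_MS },
  "516993e241d1c": { string: "another message here", sentAt: now - 60 * 1000 }
};

// Only the three-day-old message ends up in the map.
const expiryUpdate = buildExpiryUpdate(sampleMessages, now, 2 * DAY_MS);
console.log(expiryUpdate);
```

The map could then be handed to a single `update()` on the messages location, so the old messages disappear in one write instead of one `remove()` per message.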
Hope this helps!

Related

How to make consistent delete in Firebase database when the data lies in multiple paths in a fan out way? [duplicate]

Firebase recommends fanning out data to different nodes and paths, as in the example below from a Firebase sample:
{
"post-comments" : {
"PostId1" : {
"CommentID1" : {
"author" : "User1",
"text" : "Comment1!",
"uid" : "UserId1"
}
}
},
"posts" : {
"PostId1" : {
"author" : "user1",
"body" : "Firebase Mobile platform",
"starCount" : 1,
"stars" : {
"UserId1" : true
},
"title" : "About firebase",
"uid" : "UserId1"
}
},
"user-posts" : {
"UserId1" : {
"PostId1" : {
"author" : "user1",
"body" : "Firebase Mobile platform",
"starCount" : 1,
"stars" : {
"UserId1" : true
},
"title" : "About firebase",
"uid" : "UserId1"
}
}
},
"users" : {
"UserId1" : {
"email" : "user1#gmail.com",
"username" : "user1"
}
}
}
With multi-path updates we can atomically update all the paths for a post. However, if we want to delete a blog post in a schema like the one above, how can we do it atomically? There is no multi-path delete, I guess. If the client loses its network connection while deleting, only some of the paths would be deleted!
There may also be a requirement that, when a user is deleted, the stars they gave are removed and every post they starred is unstarred. This is difficult because there is no direct tracking of which posts a user has starred. Do we need to fan out the starring of posts as well, for example with a user-stars node? Then, when deleting a user, we would know all of that user's activity and could act on it. Is there a better way of handling this?
"user-stars":{
"UserId1":{
"PostID1":true
}
}
In both cases, deleting the data from multiple paths atomically or consistently (either all or nothing) seemingly is not possible.
In that case the only option appears to be putting the delete command in a Firebase queue, which resolves the task only once everything has been deleted. That is an eventually consistent option, which should be fine, but it is expensive because it requires a server. Is there a better way?
You can implement a multi-path delete, by writing a value of null to the paths.
So:
var updates = {
"user-posts/UserId1/PostId1": null,
"post-comments/PostId1": null,
"posts/PostId1": null
}
ref.update(updates);
I had already answered this before: Firebase -- Bulk delete child nodes
It's also quite explicitly mentioned in the documentation on deleting data:
You can also delete by specifying null as the value for another write operation such as set() or update(). You can use this technique with update() to delete multiple children in a single API call.
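To tie this back to the schema in the question, here is a small sketch of how such an update map could be assembled. It only builds a plain object; `buildPostDeletion` is a hypothetical helper, and the `user-stars` paths assume the extra index proposed in the question has actually been added.

```javascript
// Hypothetical helper: build one update map that removes a post from
// every location it was fanned out to. A null value means "delete".
function buildPostDeletion(postId, authorUid, starredByUids) {
  const updates = {};
  updates["posts/" + postId] = null;
  updates["post-comments/" + postId] = null;
  updates["user-posts/" + authorUid + "/" + postId] = null;
  // `user-stars` is the index suggested in the question, not part of
  // the original sample schema.
  for (const uid of starredByUids) {
    updates["user-stars/" + uid + "/" + postId] = null;
  }
  return updates;
}

// Values taken from the sample data in the question.
const post = { uid: "UserId1", stars: { UserId1: true } };
const deletion = buildPostDeletion("PostId1", post.uid, Object.keys(post.stars));
console.log(deletion);
```

Passing `deletion` to `ref.update(...)` on the database root would then apply all of the removals in a single write.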

Firebase data structure for chat

I am trying to build an app for employers to chat with employees.
So I have employers, employees and messages.
I did it with $firebaseArray and childs:
recipient > sender > messages
I want to add sender data such as profile images and the last message, but I'm not sure how to do it.
- employer_1
- employee_1
- SKLDJLKDksdklJS
- content: "Hello"
- timestamp: 129081021
Is this the right way to do it or is there a better way? Thanks.
You might have trouble showing only the latest or newest messages, since any read of employer_1 -> employee_1 will load all messages.
An alternative might be a structure like this:
{
"users":{
"employer_1":{
"profile-image":"<url>",
"last-message":"SKLDJLKDksdklJS",
...
},
"employee_1":{
"profile-image":"<url>",
"user-chat-list":{
"employer_1":{
"last-message":"SKLDJLKDksdklJS",
"message-list":{
"SKLDJLKDksdklJS" : 129081021,
"ASDCJLKDksdklJS" : 129081021
}
}
}
}
},
"messages":{
"SKLDJLKDksdklJS":{
"content": "Hello",
"sender":"employer_1",
"timestamp": 129081021
}
}
}
With this structure, you won't need to fetch the content of every message just to show a chat list.
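As a rough illustration of why that helps, the sketch below builds a chat list from a plain object shaped like the structure above. It is not Firebase code: `buildChatList` is a made-up name, and in a real app each lookup would be a separate read or listener rather than an in-memory access.

```javascript
// In-memory copy of the proposed structure (values from the example above).
const db = {
  users: {
    employer_1: { "profile-image": "<url>" },
    employee_1: {
      "profile-image": "<url>",
      "user-chat-list": {
        employer_1: {
          "last-message": "SKLDJLKDksdklJS",
          "message-list": { SKLDJLKDksdklJS: 129081021, ASDCJLKDksdklJS: 129081021 }
        }
      }
    }
  },
  messages: {
    SKLDJLKDksdklJS: { content: "Hello", sender: "employer_1", timestamp: 129081021 }
  }
};

// Hypothetical helper: one entry per conversation partner, touching only
// the single message referenced by "last-message".
function buildChatList(data, userId) {
  const chats = data.users[userId]["user-chat-list"] || {};
  return Object.keys(chats).map(function (partnerId) {
    const last = data.messages[chats[partnerId]["last-message"]];
    return {
      partner: partnerId,
      image: data.users[partnerId]["profile-image"],
      preview: last ? last.content : ""
    };
  });
}

const chatList = buildChatList(db, "employee_1");
console.log(chatList);
```

Only one message body is looked up per conversation, however long the `message-list` grows.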

Decent data structure for Firebase messaging?

I'm trying to get started with Firebase and I just want to make sure that this data structure is optimized for Firebase.
The conversation object/tree/whatever looks like this:
conversations: {
"-JRHTHaKuITFIhnj02kE": {
user_one_id: "054bd9ea-5e05-442b-a03d-4ff8e763030b",
user_two_id: "0b1b89b7-2580-4d39-ae6e-22ba6773e004",
user_one_name: "Christina",
user_two_name: "Conor",
user_one_typing: false,
user_two_typing: false,
last_message_text: "Hey girl, what are you doing?",
last_message_type: "TEXT",
last_message_date: 0
}
}
and the messages object looks like so:
messages: {
"-JRHTHaKuITFIhnj02kE": {
conversation: "-JRHTHaKuITFIhnj02kE",
sender: "054bd9ea-5e05-442b-a03d-4ff8e763030b",
message: "Hey girl, what are you doing?",
message_type: "TEXT",
message_date: 0
}
}
Is storing the name relative to the user in the conversation object needed, or can I easily look up the name of the user by the user's UID on the fly? Other than the name question, is this good? I don't want to get started with a really bad data structure.
Note: Yes, I know the UIDs for the conversation and the message are the same; I got tired of making up variables.
I usually model the data that I need to show in a single screen in a single location in the database. That makes it possible to retrieve that data with a single read/listener.
Following that train of thought, it makes sense to keep the user name in the conversation node. In fact, I usually keep the username in each message node too. The latter prevents the need for a lookup, although in this case I might be expanding the data model a bit far for the sake of keeping the code as simple as possible.
For the naming of the chat: if this is a fairly standard chat app, then users may expect to have a persistent 1:1 chat with each other, so that every time you and I chat, we end up in the same room. A good approach for accomplishing that in the data model can be found in this answer: Best way to manage Chat channels in Firebase
I don't think you structured it right. You should bear in mind a complete "what if" analysis.
I would recommend structuring it this way (I made it up for fun and haven't tested its performance under heavy traffic, but you can always denormalize to increase performance when needed):
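The linked answer isn't reproduced here, but one common way to get a persistent 1:1 room is to derive the room key from the two user IDs in a fixed order. The sketch below assumes that approach; `roomIdFor` and the `_` separator are illustrative choices, not something taken from the question.

```javascript
// Hypothetical helper: the same pair of users always maps to the same
// key, regardless of who starts the conversation.
function roomIdFor(uidA, uidB) {
  return [uidA, uidB].sort().join("_");
}

// User IDs from the conversation object in the question.
const userOne = "054bd9ea-5e05-442b-a03d-4ff8e763030b";
const userTwo = "0b1b89b7-2580-4d39-ae6e-22ba6773e004";

const roomId = roomIdFor(userTwo, userOne);
console.log(roomId === roomIdFor(userOne, userTwo)); // true
```

With a key like this, the conversation node could be addressed directly instead of being looked up through a generated push ID.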
{
"conversation-messages" : {
"-JpntMPN_iPC3pKDUX9Z" : {
"-Jpnjg_7eom7pMG6LDe1" : {
"message" : "hey! Who are you?",
"timestamp" : 1432165992987,
"type" : "text",
"userId" : "user:-Jpnjcdp6YXM0auS1BAT"
},
"-JpnjibdwWpf1k-zS3SD" : {
"message" : "Arya Stark. You?",
"timestamp" : 1432166001453,
"type" : "text",
"userId" : "user:-OuJffgdYY0jshTFD"
},
"-JpnkqRjkz5oT9sTrKYU" : {
"message" : "no one. a man has no name.",
"timestamp" : 1432166295571,
"type" : "text",
"userId" : "user:-Jpnjcdp6YXM0auS1BAT"
}
}
},
"conversations-metadata" : { // to show the conversation list from all users for each user
"-JpntMPN_iPC3pKDUX9Z" : {
"id": "-JpntMPN_iPC3pKDUX9Z",
"date":995043959933,
"lastMsg": "no one. a man has no name.",
"messages_id": "-JpntMPN_iPC3pKDUX9Z"
}
},
"users" : {
"user:-Jpnjcdp6YXM0auS1BAT" : {
"id" : "user:-Jpnjcdp6YXM0auS1BAT",
"name" : "many-faced foo",
"ProfileImg" : "....",
"conversations":{
"user:-Yabba_Dabba_Doo" : {
"conversation_id": "-JpntMPN_iPC3pKDUX9Z",
"read" : false
}
}
},
"user:-Yabba_Dabba_Doo" : {
"id" : "user:-Yabba_Dabba_Doo",
"name" : "Arya Stark",
"ProfileImg" : "....",
"conversations":{
"user:-Jpnjcdp6YXM0auS1BAT" : {
"conversation_id": "-JpntMPN_iPC3pKDUX9Z",
"read" : true
}
}
}
}
}

Firebase: structuring data via per-user copies? Risk of data corruption?

Implementing an Android+Web(Angular)+Firebase app, which has a many-to-many relationship: User <-> Widget (Widgets can be shared to multiple users).
Considerations:
List all the Widgets that a User has.
A User can only see the Widgets which are shared to him/her.
Be able to see all Users to whom a given Widget is shared.
A single Widget can be owned/administered by multiple Users with equal rights (modify Widget and change to whom it is shared). Similar to how Google Drive does sharing to specific users.
One of the approaches to implement fetching (join-style), would be to go with this advice: https://www.firebase.com/docs/android/guide/structuring-data.html ("Joining Flattened Data") via multiple listeners.
However, I have doubts about this approach, because I have discovered that data loading would be worryingly slow (at least on Android). I asked about it in another question: Firebase Android: slow "join" using many listeners, seems to contradict documentation.
So, this question is about another approach: per-user copies of all Widgets that a user has. As used in the Firebase+Udacity tutorial "ShoppingList++" ( https://www.firebase.com/blog/2015-12-07-udacity-course-firebase-essentials.html ).
Their structure looks like this:
In particular this part - userLists:
"userLists" : {
"abc#gmail,com" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"listName" : "Test List 1 Rename 2",
"owner" : "xyz#gmail,com",
"timestampCreated" : {
"timestamp" : 1456950573084
},
"timestampLastChanged" : {
"timestamp" : 1457044229747
},
"timestampLastChangedReverse" : {
"timestamp" : -1457044229747
}
}
},
"xyz#gmail,com" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"listName" : "Test List 1 Rename 2",
"owner" : "xyz#gmail,com",
"timestampCreated" : {
"timestamp" : 1456950573084
},
"timestampLastChanged" : {
"timestamp" : 1457044229747
},
"timestampLastChangedReverse" : {
"timestamp" : -1457044229747
}
},
"-KByb0imU7hFzWTK4eoM" : {
"listName" : "List2",
"owner" : "xyz#gmail,com",
"timestampCreated" : {
"timestamp" : 1457044332539
},
"timestampLastChanged" : {
"timestamp" : 1457044332539
},
"timestampLastChangedReverse" : {
"timestamp" : -1457044332539
}
}
}
},
As you can see, the info for shopping list "Test List 1 Rename 2" appears in two places (for 2 users).
And here is the rest for completeness:
{
"ownerMappings" : {
"-KBt0MDWbvXFwNvZJXTj" : "xyz#gmail,com",
"-KByb0imU7hFzWTK4eoM" : "xyz#gmail,com"
},
"sharedWith" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"abc#gmail,com" : {
"email" : "abc#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Agenda TEST",
"timestampJoined" : {
"timestamp" : 1456950523145
}
}
}
},
"shoppingListItems" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"-KBt0heZh-YDWIZNV7xs" : {
"bought" : false,
"itemName" : "item",
"owner" : "xyz#gmail,com"
}
}
},
"uidMappings" : {
"google:112894577549422030859" : "abc#gmail,com",
"google:117151367009479509658" : "xyz#gmail,com"
},
"userFriends" : {
"xyz#gmail,com" : {
"abc#gmail,com" : {
"email" : "abc#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Agenda TEST",
"timestampJoined" : {
"timestamp" : 1456950523145
}
}
}
},
"users" : {
"abc#gmail,com" : {
"email" : "abc#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Agenda TEST",
"timestampJoined" : {
"timestamp" : 1456950523145
}
},
"xyz#gmail,com" : {
"email" : "xyz#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Karol Depka",
"timestampJoined" : {
"timestamp" : 1456952940258
}
}
}
}
However, before I jump into implementing a similar structure in my app, I would like to clarify a few doubts.
Here are my interrelated questions:
1. In their ShoppingList++ app, they only permit a single "owner" - assigned in the ownerMappings node. Thus no-one else can rename the shopping list. I would like to have multiple "owners"/admins, with equal rights. Would such a keep-copies-per-user structure still work for multiple owner/admin users, without risking data corruption/"desynchronization" or "pranks"?
2. Could data corruption arise in scenarios like this: User1 goes offline, renames Widget1 to Widget1Prim. While User1 is offline, User2 shares Widget1 to User3 (User3's copy would not yet be aware of the rename). User1 goes online and sends the info about the rename of Widget1 (only to his own and User2's copies, of which the client code was aware at the time of the rename - not updating User3's copy). Now, in a naive implementation, User3 would have the old name, while the others would have the new name. This would probably be rare, but still worrying a bit.
3. Could/should the data corruption scenario in point "2." be resolved via having some process (e.g. on AppEngine) listening to changes and ensuring proper propagation to all user copies?
4. And/or could/should the data corruption scenario in point "2." be resolved via implementing a redundant listening to both changes of sharing and renaming, and propagating the changes to per-user copies, to handle the special case? Most of the time this would not be necessary, so it could result in performance/bandwidth penalty and complicated code. Is it worth it?
5. Going forward, once we have multiple versions deployed "in the wild", wouldn't it become unwieldy to evolve the schema, given how much of the data-handling responsibility lies with the code in the clients? For example if we add a new relationship, that the older client versions don't yet know about, doesn't it seem fragile? Then, back to the server-side syncer-ensurerer process on e.g. AppEngine (described in question "3.") ?
6. Would it seem like a good idea, to also have a "master reference copy" of every Widget / shopping-list, so as to give good "source of truth" for any syncer-ensurerer type of operations that would update per-user copies?
7. Any special considerations/traps/blockers regarding rules.json / rules.bolt permissions for data structured in such a (redundant) way ?
PS: I know about atomic multi-path updates via updateChildren() - would definitely use them.
Any other hints/observations welcome. TIA.
I suggest having only one copy of a widget for the entire system. It would have an origin user ID and a set of users that have access to it. The widget tree can hold user permissions and change history. Any time a change is made, a branch is added to the tree. Branches can then be "promoted" to the "master", kind of like in Git. This would guarantee data integrity because past versions are never changed or deleted. It would also simplify your fetches... I think :)
{
users:[
bob:{
widgets:[
xxx:{
widgetKey: xyz,
permissions: *,
lastEdit...
}
]
}
...
]
widgets:[
xyz:{
masterKey:abc,
data: {...},
owner: bob,
},
...
]
widgetHistory:[
xyz:[
v1:{
data:{...},
},
v2,
v3
]
123:[
...
],
...
]
}
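To make the "add a branch, then promote it" idea a little more concrete, here is a minimal in-memory sketch. The function names, the `v1`/`v2` key scheme and the `name` field are assumptions made up for illustration; nothing here talks to Firebase.

```javascript
// Hypothetical in-memory model of the widgets / widgetHistory nodes.
const store = { widgets: {}, widgetHistory: {} };

// Append a new version. Existing versions are never modified.
function addVersion(db, widgetKey, data) {
  const history = db.widgetHistory[widgetKey] || (db.widgetHistory[widgetKey] = {});
  const versionKey = "v" + (Object.keys(history).length + 1);
  // Store a frozen copy so later edits cannot change past versions.
  history[versionKey] = { data: Object.freeze(Object.assign({}, data)) };
  return versionKey;
}

// Point the widget's master at one of its stored versions.
function promote(db, widgetKey, versionKey) {
  const version = (db.widgetHistory[widgetKey] || {})[versionKey];
  if (!version) {
    throw new Error("Unknown version " + versionKey + " for " + widgetKey);
  }
  const current = db.widgets[widgetKey] || {};
  db.widgets[widgetKey] = Object.assign({}, current, {
    masterKey: versionKey,
    data: version.data
  });
}

// A rename becomes a new version instead of an edit to per-user copies.
promote(store, "xyz", addVersion(store, "xyz", { name: "Widget1" }));
promote(store, "xyz", addVersion(store, "xyz", { name: "Widget1Prim" }));

console.log(store.widgets.xyz.masterKey);           // "v2"
console.log(store.widgetHistory.xyz.v1.data.name);  // "Widget1"
```

Because every user reads the same `widgets/xyz` node, a user who gains access after the rename sees the promoted name without any extra propagation step.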

Datastructure for multi-cast type of message broadcasting

We are thinking about migrating from Pusher to Firebase. We are having trouble thinking about how Pusher channels would be represented in Firebase.
In Pusher we have a channel per user. So a user might be in a user-1 channel, another might be in a user-2 channel.
Then our backend/server would send a message to both these users via Pusher.trigger(message, ['user-1', 'user-2']).
I think this would usually be done like this:
{
web_page_1: {
user_1: {
messages: [{}, {}, ..],
},
user_2: {
messages: [{}, {}, ..],
},
...
},
web_page_2: {
user_2: {
messages: [{}, {}, ..],
},
user_3: {
messages: [{}, {}, ..],
}
},
....
}
Here the problem is: User 1 and User 2 for the same page might have a lot of messages in common. Is there a way to reduce this duplication? These messages can get rather large, so sending and storing them per user can get expensive. Also, User 1 should not be able to read the messages of User 2.
It would be nice to do something like this:
{
web_page_1: {
message_1: {
user_ids: [1,2,3],
content: {},
},
message_2: {
user_ids: [3,4,5],
content: {},
},
...
},
web_page_2: {
message_1: {
user_ids: [1,2,3],
content: {},
},
message_2: {
user_ids: [3,4,5],
content: {},
}
},
....
}
But then, how would the security policy be applied so that a message can only be read by the user_ids specified in it?
Any pointers would be really appreciated.
If multi-cast is your use-case and the messages get large, I would indeed split the messages from the users and add message-references to the users like you show.
Root
Users
provider:344923
Name: Akshay Rawat
Messages
1: true
2: true
3: true
provider:209103
Name: Frank van Puffelen
Messages
1: true
Messages
1: It's a beautiful day
2: The sun is shining
3: I feel good, I feel good
4: And nothing's gonna stop me now
In the above data you can see that you and I are users. The provider:... is our uid, but can be anything that allows you to identify the current user. You've received messages 1, 2 and 3, while I have only received message 1. Neither of us has received message 4.
I took the Web_page level out to simplify things a bit. If you really need that level, you can add it back. The basic approach will remain the same.
Your security rules can then use these message-references to see if the user can read a specific message:
{
"rules": {
"Messages": {
"$message_id": {
".read": "root.child('Users/'+auth.uid+'/Messages').hasChild($message_id)"
}
}
}
}
This rule defines the security for any child under Messages (identified by $message_id). We grant read access if the $message_id is referenced as a message for the current user (auth.uid).
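If it helps to reason about the rule, the check it performs can be mimicked in plain JavaScript against the sample data. This is only an illustration: `canRead` is a made-up function, and the real rule is evaluated on Firebase's servers, not in client code.

```javascript
// The sample tree from above as a plain object.
const root = {
  Users: {
    "provider:344923": { Name: "Akshay Rawat", Messages: { 1: true, 2: true, 3: true } },
    "provider:209103": { Name: "Frank van Puffelen", Messages: { 1: true } }
  },
  Messages: {
    1: "It's a beautiful day",
    2: "The sun is shining",
    3: "I feel good, I feel good",
    4: "And nothing's gonna stop me now"
  }
};

// Hypothetical stand-in for the rule:
// root.child('Users/' + auth.uid + '/Messages').hasChild($message_id)
function canRead(tree, uid, messageId) {
  const user = tree.Users[uid];
  const refs = user && user.Messages;
  return Boolean(refs && Object.prototype.hasOwnProperty.call(refs, messageId));
}

console.log(canRead(root, "provider:344923", "3")); // true
console.log(canRead(root, "provider:209103", "3")); // false
console.log(canRead(root, "provider:209103", "4")); // false
```

A message body is stored once under Messages, and each recipient only carries a small `true` marker, which is what keeps the duplication down.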

Resources