I have been reading a lot about NoSQL databases lately. I get that the rule of thumb is to structure the data based on the view (of course, it depends on the use case).
Now, let's say that we have a social app and the user has a profile but he also creates posts and we have to store them in the database.
So, I see some developers choose to do that like so:
Posts
-----UserID
-----------PostID
-----------------username: John
-----------------profileImage: https://...
-----------------posted_photo: https://...
This totally fits the structure based on the view. We'd go into posts and our userID and we could get all the data that our view needs. Now my question is: what happens when the user has made 100K posts and decides to change his profile photo, for example? All of his posts so far contain his old photo, so now we have to write a method that cycles through 100K posts (or all of his posts in general) and updates his photo. Two hours later he decides "Nah, I don't like this photo, I'll change it back" and we have to make another 100K writes.
How is that (denormalized data) OK? Sure, it's easier, it's flat, but then we have to make a ridiculous number of writes to change a single profile photo. What's the way to handle this, really?
I've done this by storing the user's data in one place and setting just the userID as a post attribute.
posts:
  userID:
    postID:
      userID: 'user1',
      attachedImageURL: 'http:..',
      message: 'hey',
      reblogID: 'post4',
      type: 'audio|poll|quote'
users:
  user1:
    name: 'john',
    profileImage: 'http..'
It requires one more query to Firebase to retrieve user's profile data but it's a good way to solve this. It really depends on how you want to use those data.
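A minimal in-memory sketch of that lookup, using plain Python dicts as stand-ins for the two nodes. The post IDs, the example.com URLs and the `render_post` helper are made up for illustration, and the posts are kept flat for brevity; this is not Firebase SDK code.

```python
# Hypothetical stand-ins for the "users" and "posts" nodes.
users = {"user1": {"name": "john", "profileImage": "http://example.com/old.png"}}
posts = {
    "post1": {"userID": "user1", "message": "hey"},
    "post2": {"userID": "user1", "message": "hello again"},
}


def render_post(post_id):
    """Join a post with its author's current profile at read time."""
    post = posts[post_id]
    author = users[post["userID"]]  # the one extra lookup per post
    return {
        "message": post["message"],
        "name": author["name"],
        "profileImage": author["profileImage"],
    }


# Changing the photo is a single write, no matter how many posts exist.
users["user1"]["profileImage"] = "http://example.com/new.png"
rendered = [render_post(post_id) for post_id in posts]
```

Every rendered post picks up the new photo because the profile is stored exactly once.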
I'm building a simple application where posts appear on a user's home page and he can like or unlike them.
At first, when there were no users in the application, I made a simple boolean field "liked", as shown in the figure, to determine whether a post is liked or not. However, once I started working with users, I found it hard to come up with the right structure for the likes field.
In Firestore, I added a field named "likedBy" for each post, which contains a map with a key of each user's id and a boolean to determine if the user liked the post or not.
I don't know if this structure is suitable or not, and if not, is there a better way to reach my goal?
The likedBy field is enough to cover most use cases.
Just store in likedBy an array of all users who liked the post by user ID. When a user likes - add the user ID to the array and remove it upon unlike.
That means you can also remove the likes and liked field as you can get it from reading the likedBy array.
I would recommend going one step further and saving more than the user ID. You could also save the user's displayName and photoUrl. That way, you don't need to read the documents of every user who liked a post (reads cost money) just to display their name and/or avatar.
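As a rough sketch of the array approach, here is the like/unlike logic on a plain dict standing in for a post document. The helper names are hypothetical; in Firestore itself the add and remove steps would normally be done with the arrayUnion and arrayRemove field operations rather than by rewriting the list on the client.

```python
def toggle_like(post, user_id):
    """Add user_id to likedBy if absent, remove it if present."""
    liked_by = post.setdefault("likedBy", [])
    if user_id in liked_by:
        liked_by.remove(user_id)
    else:
        liked_by.append(user_id)
    return post


def like_count(post):
    """The count is derived from the array, so no separate 'likes' field is needed."""
    return len(post.get("likedBy", []))


def is_liked_by(post, user_id):
    """Replaces the old boolean 'liked' field for a given user."""
    return user_id in post.get("likedBy", [])
```

For example, after `toggle_like(post, "u1")` the post counts one like, and calling it again with the same ID removes it.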
I am building a social media database schema in which I have users, followers, tags and posts. To conform to the Firebase model I have flattened the structure as suggested in the Firebase documentation, as seen below. The issue I am struggling with is this: when a user selects a tag and sees a bunch of posts from the TagsPosts table, all related by that tag, I would like to show the posts created by the current user's followers first.
In SQL this would be done with an inline query checking the user's followers against the posts returned for a specific tag.
However, in Firebase I am not sure how to do this without downloading all the posts under the tagID node in TagsPosts and checking each post's creator against the Followers node for the current userID. This operation could easily get out of hand with hundreds of posts among hundreds of users. I've tried modeling this on the answer to "How do I check if a firebase database value exists?" and the article "From SQL to Firebase — How to structure the DB for a social network app". Am I structuring the data poorly? How do I fix this? Thank you so much.
Users-
  -userID1
    -misc. userData
  -userID2
    -misc. userData
Followers-
  -userID1
    -userIDOfFollower1
    -userIDOfFollower2
Following-
  -userID1
    -userIDOfFollower1
    -userIDOfFollower2
Posts-
  -postID1
    -userIDFromCreator
    -misc. PostData
Tags-
  -tagID1
    -misc. TagData
TagsUsers
  -tagID1
    -userID1
    -userID2
TagsPosts
  -tagID1
    -postID1
    -postID2
Edits-Thank you Frank
In our storyboard flow we plan to have the user see a wall of tags, determined by a constantly updating popularity score based on properties of the tag and on where we predict the user may have an interest. The user will then select a tag and see posts related to that tag. Among those posts, I would like to show the ones from the user's followers before those of everyone else whose posts fall under the selected tag.
I have considered two possibilities. The first optimizes for reads: every time one of a user's followers posts to a tag, I would record the postID under a special FollowersTags node, keyed by that user and the tagID. That would mean hundreds of writes for each post created, directly proportional to the number of followers the poster has.
*creates a list of posts to a specific tag made by followers
FollowersTags
-userID1_tagID1(composite key)
-postID1
-postID2
-postID3
-postID4
-userID1_tagID2
-postID1
-postID2
-postID3
-postID4
Or I could optimize for writes, as in the structure above, which leaves us with our current predicament: performing hundreds of lookups, directly proportional to the number of posts in a tag.
Is there any way around these two options? If not, which of the two is the better approach?
Unfortunately I would not be able to predict the posts displayed to the user before they select a tag.
In the Firebase Realtime Database, I typically model the data in the database to what I show on the screen. So if you have a "wall" of recent, relevant posts for each user, consider modeling precisely that in your database: a list of recent, relevant posts (or post IDs) for each user.
UserWalls
userID1
"timestamp_or_push_id": "postId1"
"timestamp_or_push_id": "postId2"
userID2
"timestamp_or_push_id": "postId1"
"timestamp_or_push_id": "postId3"
While the problem of determining what to show remains the same, with this database model it's now a write-time problem, instead of a read-time problem.
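To make the write-time idea concrete, here is a small in-memory sketch. The `followers` data, the `publish` and `read_wall` helpers, and the use of integer timestamps as keys are all assumptions for illustration; in the Realtime Database the keys would more likely be push IDs and the writes a multi-location update.

```python
from collections import defaultdict

# Hypothetical data: who follows each author.
followers = {"author1": ["userID1", "userID2"]}

# UserWalls: userID -> {timestamp_or_push_id: postId}
user_walls = defaultdict(dict)


def publish(author_id, post_id, timestamp):
    """Write time: copy the post ID onto the wall of every follower."""
    for follower_id in followers.get(author_id, []):
        user_walls[follower_id][timestamp] = post_id


def read_wall(user_id, limit=10):
    """Read time: one cheap lookup, newest entries first."""
    wall = user_walls.get(user_id, {})
    newest_keys = sorted(wall, reverse=True)[:limit]
    return [wall[key] for key in newest_keys]
```

Calling `publish("author1", "postId1", 1)` and then `publish("author1", "postId2", 2)` leaves `read_wall("userID1")` returning the two post IDs, newest first, without touching any other node.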
I am building an iOS app that is using Cloud Firestore (not Firebase realtime database) as a backend/database.
Google is trying to push new projects towards Cloud Firestore, and to be honest, developers with new projects should opt-in for Firestore (better querying, easier to scale, etc..).
My issue is the same that any relational database developer has when switching to a no-SQL database: data modeling
I have a very simple scenario, that I will first explain how I would configure it using MySQL:
I want to show a list of posts in a table view, and when the user clicks on one post to expand and show more details for that post (let say the user who wrote it). Sounds easy.
In a relational database world, I would create 2 tables: one named "posts" and one named "users". Inside the "posts" table I would have a foreign key indicating the user. Problem solved.
Poor Barry, never had the time to write a post :(
Using this approach, I can easily achieve what I described, and also, if a user updates his/her details, you will only have to change it in one place and you are done.
Let's now switch to Firestore. I like to think of an RDBMS's table names as Firestore's collections and the content/structure of the table as the documents.
In my mind I have 2 possible solutions:
Solution 1:
Follow the same logic as the RDBMS: inside the posts collection, each document should have a key named "userId" and the value should be the documentId of that user. Then by fetching the posts you will know the user. Querying the database a second time will fetch all user related details.
Solution 2:
Data duplication: Each post should have a map (nested object) with a key named "user" and containing any user values you want. By doing this the user data will be attached to every post it writes.
Coming from the normalization realm of RDBMS this sounds scary, but a lot of no-SQL documents encourage duplication(?).
Is this a valid approach?
What happens when a user needs to update his/her email address? How easily you make sure that the email is updated in all places?
The only benefit I see in the second solution is that you can fetch both post and user data in one call.
Is there any other solution for this simple yet very common scenario?
ps: go easy on me, first time no-sql dev.
Thanks in advance.
Use solution 1. Guidance on nesting vs not nesting will depend on the N-to-M relationship of those entities (for example, is it 1 to many, many to many?).
If you believe you will never access an entity without accessing its 'parent', nesting may be appropriate. In Firestore (or document-based NoSQL databases in general), you should decide whether to nest that entity directly in the document or in a subcollection based on the expected size of the nested entity. For example, messages in a chat should be a subcollection, as in total they may exceed the maximum document size.
Mongo, a leading noSQL db, provides some guides here
Firestore also provided docs
Hope this helps
@christostsang I would suggest a combination of option 1 and option 2. I like to duplicate data for the view layer and also reference the user_id as you suggested.
For example, you will usually show the created_by or author_name along with a post. Rather than paying additional money and cycles for the user query, you could store both the user_id and the user_name in the post document.
One model you could use is an object/map in Firestore. Here is an example model for you to consider:
posts = {
  id: xxx,
  title: xxx,
  body: xxx,
  likes: 4,
  user: {refId: xxx123, name: "John Doe"}
}
users = {
  id: xxx,
  name: xxx,
  email: xxx,
}
Now when you retrieve the posts document(s) you also have the user/author name included. This makes things easy on a postList page, where you might show posts from many different users/authors, without needing to query each user to retrieve their name. When a user clicks on a post and you want to show additional user/author information, such as their email, you can perform the query for that one user on the postView page. FYI: you will need to consider changes that users make to their name and whether you will update all posts to reflect the name change.
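The name-change caveat can be sketched like this, with plain dicts standing in for the two collections. The IDs, the example.com address and the `rename_user` helper are invented for the example; with Firestore the loop would be a query on user.refId followed by batched updates.

```python
# Hypothetical stand-ins for the "users" and "posts" collections.
users = {"xxx123": {"name": "John Doe", "email": "john@example.com"}}
posts = {
    "p1": {"title": "First", "user": {"refId": "xxx123", "name": "John Doe"}},
    "p2": {"title": "Second", "user": {"refId": "xxx123", "name": "John Doe"}},
    "p3": {"title": "Other", "user": {"refId": "yyy456", "name": "Jane Roe"}},
}


def rename_user(user_id, new_name):
    """Update the canonical name, then every duplicated copy on that user's posts.

    Returns the number of post documents that had to be rewritten.
    """
    users[user_id]["name"] = new_name
    rewritten = 0
    for post in posts.values():
        if post["user"]["refId"] == user_id:
            post["user"]["name"] = new_name
            rewritten += 1
    return rewritten
```

The return value is the price of the duplication: one extra write per post the user has authored, which is why it pays to duplicate only fields that rarely change.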
I am looking to create a social-media feed using Firebase. My data is structured like this:
users: {
  uid: {
    ... // details
  }
}
friends: {
  uid: {
    friends: { // sub collection
      fuid: {
        ... // details
      }
    }
  }
}
posts: {
  postId: {
    postedBy: uid
    ... // details
  }
}
Now I am trying to get the posts from all friends of the user, limit it to the most recent 10 posts, and then create a scrolling directive that queries the next set of 10 posts so that the user doesn't have to query and load posts^N for friends^N on the page load. But I'm not really sure how to query firebase in an effective manner like this, for the user's friends and then their posts.
I have the scrolling directive working, taken from Jeff Delaney's Infinite Scrolling Lesson on AngularFirebase.com. But it only handles the posts (boats in the tutorial) collection as a whole, without selectively querying within that collection (to check if the user is a friend).
The only solution I could think of was to query all of the posts by the user's friends, store them in an array, and then chunk-load the results into the DOM based on the last batch of posts that was loaded. This seems like it could be really inefficient in the long run if the user has hundreds of friends with hundreds of posts each.
If I understand correctly, you are duplicating the post for each user in the poster's friend list, right? I don't think that is a good idea if your app scales. At the time of writing, 100k document writes cost $0.18, so:
Imagine that a user of your app has 1,000 friends. When he posts anything, you make 1,000 writes to the database. Now imagine that you have 1,000 active users like him. You have just made 1,000,000 writes and paid $1.80.
Now even worse: you probably have, on each post, duplicated fields for the user's displayName and profileImageUrl. Imagine that this user has 500 posts in his history and has just changed his profile picture. You will have to update one of the fields for each post in each of his 1,000 friends' feeds, right? You will be doing 1,000 × 500 = 500,000 writes just to update the profileImageUrl! And what if the user doesn't like the photo? He tries 3 more photos, and within 10 minutes you have made 2,000,000 writes to the database. This means you will be charged $3.60. It may not seem like much, but note that we're talking about 1 single user at a single moment. With 1,000 users changing their profile picture 4 times in the same day, you are paying $3,600.00.
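The arithmetic above can be checked with a few lines. The price constant is simply the $0.18 per 100k writes figure quoted in this answer, not a current price list, and the variable names are made up.

```python
PRICE_PER_100K_WRITES = 0.18  # USD, the figure quoted in this answer


def write_cost(writes):
    """Cost in USD for a given number of document writes."""
    return writes / 100_000 * PRICE_PER_100K_WRITES


# 1,000 active users, each posting once to 1,000 friends' feeds.
fan_out_writes = 1_000 * 1_000

# One profile picture change: 500 posts copied into 1,000 friends' feeds.
photo_change_writes = 1_000 * 500

# The same user changes the picture 4 times in total.
four_changes_writes = 4 * photo_change_writes

# 1,000 such users doing this on the same day.
bad_day_writes = 1_000 * four_changes_writes
```

Rounded to cents, `write_cost` gives 1.80 for the fan-out, 3.60 for the four picture changes and 3,600.00 for the bad day, matching the figures in the text.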
Take a look at this article: https://proandroiddev.com/working-with-firestore-building-a-simple-database-model-79a5ce2692cb#7709
I ended up solving this issue by leveraging Firebase Functions. I have two collections, one is called Posts and the other is called Feeds. When a user adds a post, it gets added to the Posts collection. When this happens, it triggers a Firebase Function, which then grabs the posting user's UID.
Once it has the UID, it queries another collection called Friends/UID/Friends and grabs the UIDs of all of that user's friends.
Once it has the UIDs, it creates a batched add (split up in case the user has more than 500 friends) and then adds the post to each friend's Feeds/UID/Posts collection.
I chose this route for a number of reasons:
Firebase does not allow you to query with array lists (the user's friends).
I did not want to filter out posts from non-friends.
I did not want to download excessive data to the user's device.
I had to paginate the results in order from newest to oldest.
With the above solution, I am now able to query the Feeds/UID/Posts/ collection in a way that returns the next 10 results every time, without performance or data issues. The only limitation I have not been able to get around completely is that it takes a few seconds to add the post to the user's personal feed, as the Function needs time to spin up. But this can be mitigated by increasing the memory allocation for that particular function.
I also do the above listed for posts that are edited and or deleted.
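A rough sketch of the fan-out step, in plain Python rather than an actual Cloud Function. The `feeds` dict stands in for the Feeds/UID/Posts collections, and the helper names and the default batch size of 500 are assumptions taken from the description above.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fan_out_post(post, friend_uids, feeds, batch_size=500):
    """Copy `post` into every friend's feed, one batch of writes at a time.

    Returns the number of batches that were needed.
    """
    batches = 0
    for batch in chunked(friend_uids, batch_size):
        for uid in batch:  # in Firestore, each batch would be committed together
            feeds.setdefault(uid, {})[post["id"]] = post
        batches += 1
    return batches
```

With 1,200 friends this commits three batches (500, 500 and 200 writes), after which each friend's feed can be read and paginated on its own.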
I think I have a solution for Firestore social feed queries. I'm not sure if it works, but here it is:
A Friends collection keeps each user's list of friend UUIDs as an array in a document, one document per user. So when the user logs in, a cloud function first gets the friends list with "one read", right? All the friend IDs are in one document. We also put a lastChecked timestamp in this document: every time we fetch the friends array, we record the date.
Now a cloud function can check the friends' posts. As I understand it, the latest IN queries allow an array of up to 10 UUIDs, so if the user has 100 friends the query will finish in ten rounds. Now we have something to serve.
Instead of serving the posts directly, we create a feed collection for every user. We put all the collected data into documents, sliced by day. Let's pretend we already have older posts in this usersfeed collection (one document per day). We stored a last-checked time on our friends document, so we query from the last checked date up to now. This way we only fetch unseen posts and slice them by day (if they span more than one day, of course).
While this happens in the cloud function, we have already served the previous feed document. And when the collection gets a new document, the Firestore listener picks it up and adds it, right? If the user scrolls down, we get the previous day's document. So every document will hold the data of more than one post, as a map or array.
This saves a lot of reads, I guess.
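The "ten rounds" step could look roughly like this. It is an untested idea sketched on in-memory lists: the `author` and `created` field names and the helpers are hypothetical, and the limit of 10 IDs per IN query is simply the figure assumed above.

```python
def chunks_of(ids, size=10):
    """Split the friends array into groups small enough for one IN query."""
    return [ids[start:start + size] for start in range(0, len(ids), size)]


def fetch_new_posts(friend_ids, posts, last_checked):
    """Run one simulated IN query per group, keeping only unseen posts.

    Returns (posts sorted newest first, number of query rounds).
    """
    found = []
    rounds = 0
    for group in chunks_of(friend_ids):
        allowed = set(group)
        found.extend(
            post for post in posts
            if post["author"] in allowed and post["created"] > last_checked
        )
        rounds += 1
    return sorted(found, key=lambda post: post["created"], reverse=True), rounds
```

For a user with 100 friends this performs 10 rounds, and only posts newer than the last-checked timestamp are merged into the feed.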
In any social network you can follow a person, a post or anything else, and everything you follow is displayed on your wall. I want to implement the same feature in ASP.NET MVC, but I have a problem designing the tables so that I can query everything a user is following. These are the tables I designed:
[User(id,name,email,password)]
[Following(id,personId,followingId,source)]
[Post(id,title,description,authorId)]
So when a user follows another user, a new record is inserted into the Following table with followingId set to the userId and source set to the "User" table. Following a post works the same way, with followingId set to the postId and source set to the "Post" table.
The problem comes when fetching the data for what you are following: the query has to join many tables to return the result if the user follows more kinds of things than a Post and another User (such as a Tag, a Topic...). This will hurt performance and the time it takes to return data to the user.
Do you have any ideas about this? I would very much appreciate hearing your solution, thanks a lot!
Your database design is flawed: a single "link" table with a string identifying where the "followed thing" resides is hard to query effectively.
Instead, you need one link table per kind of thing linked. So in your simplified example you might have:
[User(id,name,email,password)]
[Post(id,title,description,authorId)]
[UserFollowingUser(id,userId,followedUserId)]
[UserFollowPost(id,userId,postId)]
With this design, getting all users following a post, all posts followed by a user, all users following a particular user, or all users followed by a particular user is easy as pie.
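As a sketch of how simple those queries become, here is the link-table design in an in-memory SQLite database. It is simplified for illustration: the email and password columns and the surrogate id on the link tables are dropped, the sample rows are invented, and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Post (id INTEGER PRIMARY KEY, title TEXT,
                   authorId INTEGER REFERENCES User(id));
-- One link table per kind of followed thing.
CREATE TABLE UserFollowingUser (userId INTEGER REFERENCES User(id),
                                followedUserId INTEGER REFERENCES User(id),
                                PRIMARY KEY (userId, followedUserId));
CREATE TABLE UserFollowPost (userId INTEGER REFERENCES User(id),
                             postId INTEGER REFERENCES Post(id),
                             PRIMARY KEY (userId, postId));
INSERT INTO User VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO Post VALUES (10, 'hello', 2), (11, 'world', 3);
INSERT INTO UserFollowingUser VALUES (1, 2), (3, 2);
INSERT INTO UserFollowPost VALUES (1, 11), (2, 11);
""")


def posts_followed_by(user_id):
    """Titles of all posts the given user follows: a single join."""
    rows = conn.execute(
        "SELECT p.title FROM Post p "
        "JOIN UserFollowPost f ON f.postId = p.id "
        "WHERE f.userId = ? ORDER BY p.id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers_of(user_id):
    """Names of all users following the given user: a single join."""
    rows = conn.execute(
        "SELECT u.name FROM User u "
        "JOIN UserFollowingUser f ON f.userId = u.id "
        "WHERE f.followedUserId = ? ORDER BY u.id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Each question is answered by one join against one link table, and adding a new followable thing (a Tag, a Topic) means adding one more link table rather than another branch on a source string.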