I just read the Firebase blog post titled "Denormalizing Your Data Is Normal", and I have a request for clarification.
I was with it until the Considerations paragraph. Specifically, the following:
"Modification of comments is easy: just set the value of the comment under /comments to the new content. For deletion, simply delete the comment from /comments — and whenever you come across a comment ID elsewhere in your code that doesn’t exist in /comments, you can assume it was deleted and proceed normally"
For modifications, why don't I have to modify the duplicate comments stored under /links and /users?
For deletions, am I correct that once I delete a comment, every read path needs logic to cross-check /comments in case the comment was deleted?
Thanks!
The structure detailed in the blog post does not store duplicate comments. We store each comment once under /comments, then store only the names (IDs) of those comments under /links and /users. These function as pointers to the actual comment data.
Consider the example structure from the post...
{
  users: {
    user1: {
      name: "Alice",
      comments: {
        comment1: true
      }
    },
  },
  comments: {
    comment1: {
      body: "This is awesome!",
      author: "user1"
    }
  }
}
Note that the actual comment data is only stored once.
If we modify /comments/comment1, we don't need to update anything else because we only store the name of the comment under /links and /users, not the actual comment contents.
If we were to remove /comments/comment1, that would remove the only existence of the comment data. However, we still have these "dangling" references to comment1 under /users/user1/comments.
Imagine we delete /comments/comment1. When we then try to load Alice's comments, we can look and see that comment1 doesn't exist anymore. Our application can react accordingly by either a) deleting the reference or b) ignoring the reference and not trying to display the deleted comment.
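The two options can be sketched in plain Python against an in-memory copy of the tree. This is only an illustration, not Firebase SDK code: the `load_user_comments` helper, the `prune` flag, and the extra `comment2` reference are assumptions made up for the example.

```python
# In-memory stand-in for the example tree. comment2 is a hypothetical comment
# that was already deleted from /comments, leaving a dangling reference.
db = {
    "users": {
        "user1": {
            "name": "Alice",
            "comments": {"comment1": True, "comment2": True},
        },
    },
    "comments": {
        "comment1": {"body": "This is awesome!", "author": "user1"},
    },
}


def load_user_comments(db, user_id, prune=False):
    """Return the comments a user's references point to, skipping deleted ones.

    prune=True also removes dangling references (option a);
    prune=False just ignores them (option b).
    """
    refs = db["users"][user_id].get("comments", {})
    found = {}
    for comment_id in list(refs):
        comment = db["comments"].get(comment_id)
        if comment is None:
            # The comment no longer exists under /comments: treat it as deleted.
            if prune:
                del refs[comment_id]
            continue
        found[comment_id] = comment
    return found
```

Calling `load_user_comments(db, "user1")` returns only `comment1`; passing `prune=True` additionally drops the `comment2` reference from Alice's node.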
Situation
I have the following Firestore setup
/posts/{id}
/posts/{id}/comments/{id}
/users/{id}/followers/{userId}
A user profile can be either public or private. All users can see posts by public users, but only users who follow a private user can see that user's posts, i.e. they are in the owner's followers collection.
Current Solution
The post doc looks like this:
owner_account_visibility: public || private
ownerId: uid
The comment doc looks the same:
owner_account_visibility: public || private
ownerId: uid
My rules look like this
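match /events/{eventId} {
  allow read: if isValid();
  match /eventComments/{commentId} {
    allow read: if isValid();
  }
}
function isValid() {
  // exists() needs a full document path, so this assumes the function is
  // declared inside match /databases/{database}/documents.
  return resource.data.owner_account_visibility == "public"
    || exists(/databases/$(database)/documents/users/$(resource.data.ownerId)/followers/$(request.auth.uid));
}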
Problem
I see problems/questions with this solution:
Problem: A user may create many posts, which in turn may have lots of comments. This means that if a user updates their account visibility, a cloud function has to update possibly thousands of post and comment documents.
Problem: A user may load many private posts and comments, and each one of those costs a database read, which can get very expensive as the user scrolls their feed.
Question: In the isValid() function, there are two conditions separated by an OR operator (||). Does this mean that if the first condition returns true (resource.data.owner_account_visibility == "public"), the function will not check the second condition (exists(/users/$(resource.data.ownerId)/followers/$(request.auth.uid))), saving me a database read? If this isn't the case, then I will waste a lot of reads when a user loads tons of comments from a post even though it is public.
Does anyone have a proposed solution to this problem? Any help would be appreciated :)
I solved this myself. In short, instead of letting a user set their account's visibility, I let them set each post's visibility. This is simply because that is the functionality I want in my app. Now I can use resource.data.post_visibility == "public", which avoids having to update every post if a user changes their account's visibility. If the first condition is false, I do what I did in the current solution from the question (exists(/users/$(resource.data.ownerId)/followers/$(request.auth.uid))). Also, comments and replies to a post are open to all authenticated users even if the post is set to private, since comments aren't necessarily the post owner's own content or sensitive information.
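As a rough illustration of the per-post check described above, here is the same decision written as a plain Python function. It only mirrors the logic and makes no claim about how Firestore evaluates or bills rules. The `can_read_post` name and the `followers_of` mapping are assumptions for the sketch; `post_visibility` and `ownerId` are the field names from the post.

```python
def can_read_post(post, viewer_id, followers_of):
    """Decide whether viewer_id may read a post.

    post: dict with "post_visibility" and "ownerId" fields.
    followers_of: dict mapping an owner id to the set of that owner's follower ids
                  (a stand-in for /users/{id}/followers).
    """
    # Public posts are readable by everyone; no follower lookup is needed.
    if post["post_visibility"] == "public":
        return True
    # Private posts are readable only by followers of the owner.
    return viewer_id in followers_of.get(post["ownerId"], set())


followers_of = {"alice": {"bob"}}
public_post = {"post_visibility": "public", "ownerId": "alice"}
private_post = {"post_visibility": "private", "ownerId": "alice"}
```

With this data, `can_read_post(private_post, "bob", followers_of)` is true because bob follows alice, while the same call for a non-follower is false.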
I'm building an app with a social network component using Firebase. Currently, if a user likes a post, I create a node in the user document called likes and add the post ID to it. For example:
users: {
  k9EdVpyRJ2R2: {
    likes: {
      E36F50C: true
    }
  }
}
If the post gets deleted, should I just handle the deleted post ID on the client side when I read the like IDs? Or is there a better way to trim the data (or even restructure it, since the app is not live yet)?
You'd typically remove the likes at the same time as you remove the post. To efficiently determine what likes to remove, you should keep a link from each post to its likes (in addition to the user-to-likes mapping you already have).
With that you can use a single multi-location update to remove the post and all likes. See this blog post for examples: https://firebase.googleblog.com/2015/10/client-side-fan-out-for-data-consistency_73.html
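A minimal sketch of what such a multi-location update could look like, using plain Python dicts instead of a real database. In the Realtime Database, writing null to a path deletes it, which `None` stands in for here. The `post_likes` node (the post-to-likes link), the helper names, and the sample data other than the IDs from the question are assumptions for illustration.

```python
def build_post_delete_update(post_id, liker_ids):
    """Build one multi-location update that removes a post and all of its likes.

    A value of None stands for "delete this path".
    """
    update = {
        f"posts/{post_id}": None,
        f"post_likes/{post_id}": None,  # hypothetical post-to-likes index
    }
    for uid in liker_ids:
        update[f"users/{uid}/likes/{post_id}"] = None
    return update


def apply_update(tree, update):
    """Apply the update to a nested dict, as a stand-in for the database."""
    for path, value in update.items():
        *parents, leaf = path.split("/")
        node = tree
        for key in parents:
            node = node.setdefault(key, {})
        if value is None:
            node.pop(leaf, None)
        else:
            node[leaf] = value


tree = {
    "posts": {"E36F50C": {"title": "hello"}, "OTHER": {"title": "kept"}},
    "post_likes": {"E36F50C": {"k9EdVpyRJ2R2": True}},
    "users": {"k9EdVpyRJ2R2": {"likes": {"E36F50C": True, "OTHER": True}}},
}

# Look up who liked the post via the post-to-likes index, then delete everything at once.
likers = list(tree["post_likes"]["E36F50C"])
apply_update(tree, build_post_delete_update("E36F50C", likers))
```

The in-memory `apply_update` does not model atomicity; in a real database the point of a single multi-location update is that all paths change together.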
I'm new to Firebase. I have structured my data using indices to maintain relationships, as described in the articles below:
https://www.firebase.com/docs/web/guide/structuring-data.html
https://www.firebase.com/blog/2013-04-12-denormalizing-is-normal.html
I just want to check my understanding of retrieval in Firebase.
As mentioned in the links above:
{
  links: {
    link1: {
      title: "Example",
      href: "http://example.org",
      submitted: "user1",
      comments: {
        comment1: true
      }
    }
  }
}
When I access link1, the response contains the link1 data as well as comments: {comment1: true}. Instead of the actual text of comment1, accessing link1 gives only the comment's ID, i.e. comment1. Does this mean that when I access link1, I get the IDs of the comments belonging to that link, and I then have to retrieve the comments manually by requesting Firebase again with the IDs received in the link1 response? Please clarify. :)
Yes, your understanding is correct: you flatten the data in Firebase, then retrieve the comments using the IDs from the previous read.
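The two-step read can be sketched with an in-memory tree, where each call to `read` stands in for one request to Firebase. The `read` and `load_link_with_comments` helpers and the contents of `/comments/comment1` are assumptions for illustration, not SDK calls.

```python
db = {
    "links": {
        "link1": {
            "title": "Example",
            "href": "http://example.org",
            "submitted": "user1",
            "comments": {"comment1": True},
        }
    },
    # Assumed comment data; the question only shows the /links node.
    "comments": {"comment1": {"body": "This is awesome!", "author": "user1"}},
}


def read(db, path):
    """Stand-in for a single database read; returns None if the path is missing."""
    node = db
    for key in path.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def load_link_with_comments(db, link_id):
    """First read the link, then read each comment whose ID the link lists."""
    link = read(db, f"/links/{link_id}")
    if link is None:
        return None
    comment_ids = link.get("comments", {})
    comments = {cid: read(db, f"/comments/{cid}") for cid in comment_ids}
    # Return a copy of the link with the IDs replaced by the comment data.
    return {**link, "comments": comments}
```

`load_link_with_comments(db, "link1")` returns the link with `comments` mapped from each ID to the comment's contents, which is the manual join the question describes.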
How would I structure my data in firebase to retrieve all posts that the current user has not commented on. I am very new to nosql and I can't seem to get my head out of a SQL way of structuring it.
This is my attempt at it:
Posts: {
  someUniqueId: {
    user: userid,
    content: "blah"
  }
}
Comments: {
  someCommentUniqueId: {
    comment: "ola",
    post: someUniqueId,
    user: userid
  }
}
Now if the above is correct, I have absolutely no idea how I would query this. Is it even possible in NOSQL?
Firebase does not have a mechanism to query for the absence of a value. See the question "is it possible query data that are not equal to the specified condition?"
In NoSQL you often end up modeling data for the queries that you need. So if you want to know which posts each user still can comment on, model that information in your JSON tree:
CommentablePosts_per_User
  $uid
    $postid: true
This type of structure is often called an index, since it allows you to efficiently look up the relevant $postid values for a given user. The process of extracting such indexes from the data is often called denormalization. For a (somewhat older) overview of this technique, see this Firebase blog post on denormalization.
I recommend this article as a good introduction to NoSQL data modeling.
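One way such an index could be kept up to date is sketched below with plain Python dicts: add the post to every user's entry when it is created, and remove it from a user's entry when that user comments. The `Users` node and the three helper functions are assumptions for the example; `Posts`, `Comments`, and `CommentablePosts_per_User` follow the structures above.

```python
def add_post(db, post_id, user_id, content):
    """Store the post and mark it as commentable for every known user."""
    db["Posts"][post_id] = {"user": user_id, "content": content}
    for uid in db["Users"]:
        db["CommentablePosts_per_User"].setdefault(uid, {})[post_id] = True


def add_comment(db, comment_id, post_id, user_id, text):
    """Store the comment and drop the post from the commenter's index entry."""
    db["Comments"][comment_id] = {"comment": text, "post": post_id, "user": user_id}
    db["CommentablePosts_per_User"].get(user_id, {}).pop(post_id, None)


def uncommented_posts(db, user_id):
    """Look up the posts a user has not commented on, straight from the index."""
    return sorted(db["CommentablePosts_per_User"].get(user_id, {}))


db = {
    "Users": {"u1": True, "u2": True},  # hypothetical list of users
    "Posts": {},
    "Comments": {},
    "CommentablePosts_per_User": {},
}
add_post(db, "p1", "u1", "blah")
add_post(db, "p2", "u2", "more")
add_comment(db, "c1", "p1", "u2", "ola")
```

After these calls, `uncommented_posts(db, "u2")` lists only `p2`, while `u1` still has both posts. The cost is one extra write per user for every new post, which is the usual trade-off of modeling data for the query.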
If I may suggest a couple of options:
Posts:
  someUniquePostId:
    user_id_0: false
    user_id_1: true
      comment: "dude, awesome post"
    user_id_2: false
    user_id_3: true
      comment: "wicked!"
Drive space is cheap, so storing all the user IDs within the post would allow you to easily select which posts user_id_0 has not commented on by querying for user_id_0: false.
Alternatively you could flip the logic
Posts:
  post_id_0:
    user_id_1: "dude, awesome post"
    user_id_3: "wicked"
  post_id_1:
    user_id_0: "meh"
    user_id_2: "sup?"
Users:
  user_id_0:
    no_posts:
      post_id_0: true
  user_id_1:
    no_posts:
      post_id_1: true
This would enable you to query which posts each user has not posted to: in this case, user_id_0 has not posted to post_id_0 and user_id_1 has not posted to post_id_1
Of course, depending on the situation, you can also lean on client logic to get the data you need. For example, if you only care about which posts a user didn't comment on yesterday, you could query for yesterday's posts by their value and then compare in code to see whether the user's user_id is a child of each post. Avoid this if the dataset is large.
Generally, the CRUD URL patterns for a model look like this (suppose the model is Post):
new: /posts/new (GET)
create: /posts/ (POST)
edit: /posts/edit (GET)
update: /posts/1 (PUT)
get: /posts/1 (GET)
However, suppose there is a nested model, Comment, and the association between Post and Comment is one-to-many.
What should the CRUD URL patterns look like for comments?
new: /posts/1/comments/new or /comments/new
create:?
edit:?
update:?
.......
What is the best practice?
Update:
It seems that the url for comment should be like this:
Get one comment for one post: /posts/1/comments/1
create: /posts/1/comments
update: /posts/1/comments/1
delete: /posts/1/comments/1
Now I am confused about the update and delete operations, which both use /posts/1/comments/1.
Since the comment ID is already specified, is the /posts/1 part of the URL necessary?
I think the key is whether a comment is "contained" by the post resource. Remember that RESTful URLs should be permalinks, so under all of your scenarios the endpoint for a specific comment must not change. It sounds like the comment is contained, so the URL pattern can nest the comment within the post. If that's not the case (e.g. a comment could move to another post, which would change a nested URL), then you want a flatter structure with /comment/{id} URLs referenced by the post resource.
The key is that if it's a RESTful "Hypermedia API", then, like the web, it constantly links to the nested or referenced resources. It doesn't rely on the client understanding the REST pattern or having special knowledge of which endpoint holds the referenced or contained resource.
http://blog.steveklabnik.com/posts/2012-02-23-rest-is-over
If a 'comment' is the resource(s) under a 'post' resource:
([httpVerb] /url)
get a post:
[get] /posts/{id}
body has a couple options - either it contains the full deep comments array
(depends on how much data, chat pattern)
{
id:xxx,
title:my post,
comments: [...]
}
... or it just contains the post resource with a url reference to the comments e.g.
{
id: xxx,
title: my post,
commentsUrl: /posts/xxx/comments
}
could also have an option like this (or other options to control depth):
[get] /posts/{id}?deep=true
get a collection of comments within a post:
[get] /posts/{id}/comments
returns 200 and an array of comments in the response body
create a comment for a post:
[post] /posts/{id}/comments
body contains json object to create
returns a 201 created
edit a comment under post:
[patch|post] /posts/{id}/comments/{id}
body contains json object with subset of fields/data to update
returns a 200
replace a comment:
[put] /posts/{id}/comments/{id}
body contains json object to *replace*
returns a 200
If you have tons of comments per post, you could also consider a paging pattern:
{
  id: xxx,
  title: myPost,
  pages: 6,
  commentsUrl: /posts/xxx/comments/pages/1
}
then:
/posts/{id}/comments/pages/{pageNo}
{
  nextPage: /posts/xxx/comments/pages/2,
  pages: 7,
  comments:
    [ { ...,...,}]
}
Each page would reference the next page, the page count, and an array of comments for that page. If you went with a paging pattern, then each comment in the array would have a reference URL to the individual comment.
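A small sketch of how a server might assemble one such page body, assuming the /posts/{id}/comments/pages/{pageNo} pattern. The `comments_page` function, the `page_size` parameter, the per-comment `url` field, and the sample comments are all assumptions for illustration.

```python
import math


def comments_page(post_id, comments, page_no, page_size=2):
    """Build the response body for one page of a post's comments.

    comments: list of dicts, each with at least an "id" key.
    """
    pages = max(1, math.ceil(len(comments) / page_size))
    if not 1 <= page_no <= pages:
        raise ValueError(f"page {page_no} out of range 1..{pages}")
    base = f"/posts/{post_id}/comments"
    start = (page_no - 1) * page_size
    # Each comment carries a reference URL to its own resource.
    items = [{**c, "url": f"{base}/{c['id']}"} for c in comments[start:start + page_size]]
    body = {"pages": pages, "comments": items}
    if page_no < pages:
        # Only link to a next page when one exists.
        body["nextPage"] = f"{base}/pages/{page_no + 1}"
    return body


sample = [{"id": n, "text": f"comment {n}"} for n in range(1, 6)]
first = comments_page("xxx", sample, 1)
last = comments_page("xxx", sample, 3)
```

Here `first` links to page 2 through `nextPage`, while `last` has no `nextPage` key, so a client can follow links until they run out without knowing the URL scheme.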
If you find yourself putting an action in the url, you're probably doing something wrong. Good read: http://blog.steveklabnik.com/posts/2011-07-03-nobody-understands-rest-or-http
You want to use the canonical URL for delete/update, which would be the complete one. More importantly, you shouldn't expose the primary key values from your database as IDs in the public API that is your RESTful URL space. These values could change in the future because of a database update, a restore from backup, or anything else, depending on the vendor. And you don't want an infrastructure change making all your URLs invalid if you can help it.
If you're using /posts/{postnum}/comments/{commentnum}, you can number the public comment ids starting from 1 for every post, making for shorter, nicer URLs.
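A minimal sketch of per-post public comment numbers that are kept separate from internal database keys. The `PublicCommentIds` class and its in-memory storage are assumptions for illustration; a real service would persist this mapping.

```python
class PublicCommentIds:
    """Map (post number, comment number) pairs to internal database keys."""

    def __init__(self):
        # post number -> list of internal keys, in creation order
        self._by_post = {}

    def assign(self, post_num, internal_key):
        """Register a new comment and return its public number, starting at 1 per post."""
        keys = self._by_post.setdefault(post_num, [])
        keys.append(internal_key)
        return len(keys)

    def resolve(self, post_num, comment_num):
        """Translate /posts/{postnum}/comments/{commentnum} to the internal key, or None."""
        keys = self._by_post.get(post_num, [])
        if not 1 <= comment_num <= len(keys):
            return None
        return keys[comment_num - 1]
```

Because numbers are assigned by position and entries are never removed from the list, a public URL keeps pointing at the same comment even if the internal key scheme changes, as long as the mapping is migrated along with the data.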