How would I structure my data in Firebase to retrieve all posts that the current user has not commented on? I am very new to NoSQL and I can't seem to get my head out of a SQL way of structuring it.
This is my attempt at it:
Posts: {
  someUniqueId: {
    user: userid,
    content: "blah"
  }
}
Comments: {
  someCommentUniqueId: {
    comment: "ola",
    post: someUniqueId,
    user: userid
  }
}
Now, if the above is correct, I have absolutely no idea how I would query this. Is it even possible in NoSQL?
Firebase does not have a mechanism to query for the absence of a value. See the related question "is it possible query data that are not equal to the specified condition?"
In NoSQL you often end up modeling data for the queries that you need. So if you want to know which posts each user can still comment on, model that information in your JSON tree:
CommentablePosts_per_User
  $uid
    $postid: true
This type of structure is often called an index, since it allows you to efficiently look up the relevant $postid values for a given user. The process of extracting such indexes from the data is often called denormalization. For a (somewhat older) overview of this technique, see this Firebase blog post on denormalization.
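As a minimal sketch of how such an index might be kept in sync, here is an in-memory model in which a plain object stands in for the JSON tree. The helper names and the assumption that every user can comment on every post until they do are hypothetical; against a real database, each helper would roughly correspond to one multi-path write.

```javascript
// A plain object standing in for the Firebase JSON tree (assumption for illustration).
const tree = { Posts: {}, Comments: {}, CommentablePosts_per_User: {} };

// Hypothetical helper: a new post becomes commentable for every known user.
function addPost(postId, authorUid, content, allUids) {
  tree.Posts[postId] = { user: authorUid, content };
  for (const uid of allUids) {
    if (!tree.CommentablePosts_per_User[uid]) tree.CommentablePosts_per_User[uid] = {};
    tree.CommentablePosts_per_User[uid][postId] = true;
  }
}

// Hypothetical helper: writing a comment also removes the post from the
// commenter's index, so the index and the comments stay consistent.
function addComment(commentId, postId, uid, text) {
  tree.Comments[commentId] = { comment: text, post: postId, user: uid };
  const index = tree.CommentablePosts_per_User[uid];
  if (index) delete index[postId];
}

// Reading the index is a direct lookup under the user's ID; no "not equal" query is needed.
function postsNotCommentedOn(uid) {
  return Object.keys(tree.CommentablePosts_per_User[uid] || {});
}
```

For example, after two posts are added for users "alice" and "bob" and bob comments on the first one, postsNotCommentedOn("bob") returns only the second post's ID.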
I recommend this article as a good introduction to NoSQL data modeling.
If I may suggest a couple of options:
Posts:
  someUniquePostId:
    user_id_0: false
    user_id_1: true
      comment: "dude, awesome post"
    user_id_2: false
    user_id_3: true
      comment: "wicked!"
Drive space is cheap, so storing all the user IDs within each post would let you easily select the posts user_id_0 has not commented on by querying for user_id_0: false.
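With the Realtime Database SDK, that query would be an orderByChild("user_id_0") combined with equalTo(false) on /Posts. The sketch below only models the same filter in memory so the logic can be checked without a database. It assumes each flag is stored as a plain boolean (the comment text kept elsewhere), which simplifies the layout above, and the second post is a hypothetical sample.

```javascript
// In-memory stand-in for /Posts: each post maps user IDs to a
// "has commented" flag (simplified: comment text assumed stored elsewhere).
const posts = {
  someUniquePostId: { user_id_0: false, user_id_1: true, user_id_2: false, user_id_3: true },
  anotherPostId: { user_id_0: true, user_id_1: false, user_id_2: true, user_id_3: true },
};

// Mirrors orderByChild(uid).equalTo(false): keep posts whose flag for uid is false.
function postsNotCommentedOnBy(uid, allPosts) {
  return Object.entries(allPosts)
    .filter(([, flags]) => flags[uid] === false)
    .map(([postId]) => postId);
}
```

Because each user's query orders by a different child key, the server would need an .indexOn entry per user ID; without one, the SDK falls back to downloading the data and filtering on the client.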
Alternatively, you could flip the logic:
Posts:
  post_id_0:
    user_id_1: "dude, awesome post"
    user_id_3: "wicked"
  post_id_1:
    user_id_0: "meh"
    user_id_2: "sup?"
Users:
  user_id_0:
    no_posts:
      post_id_0: true
  user_id_1:
    no_posts:
      post_id_1: true
This would let you look up which posts each user has not commented on: in this case, user_id_0 has not commented on post_id_0 and user_id_1 has not commented on post_id_1.
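When a user later comments on a post, both sides of this structure have to change together. A minimal sketch, assuming the paths above: build one multi-path update object that writes the comment and deletes the no_posts entry by setting it to null. With the Firebase JS SDK, such an object would be passed to update() on the root reference so both writes succeed or fail together; here a hypothetical applyUpdate helper applies it to a plain object instead.

```javascript
// Builds a multi-path update: keys are slash-separated paths, null deletes.
function buildCommentUpdate(postId, uid, text) {
  return {
    [`Posts/${postId}/${uid}`]: text,
    [`Users/${uid}/no_posts/${postId}`]: null,
  };
}

// Hypothetical helper that applies such an update to an in-memory tree:
// missing parents are created for writes, and null removes the key.
function applyUpdate(tree, updates) {
  for (const [path, value] of Object.entries(updates)) {
    const parts = path.split("/");
    const last = parts.pop();
    let node = tree;
    for (const part of parts) {
      if (node[part] === undefined || node[part] === null) {
        if (value === null) { node = null; break; } // nothing to delete
        node[part] = {};
      }
      node = node[part];
    }
    if (node === null) continue;
    if (value === null) delete node[last];
    else node[last] = value;
  }
  return tree;
}
```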
Of course, depending on the situation, you can also lean on client logic to get the data you need. For example, if you only care about which posts a user didn't comment on yesterday, you could query yesterday's posts, read them with a .value event, and check in code whether the user's ID is a child of each post. Obviously, avoid this if the dataset is large.
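A sketch of the in-code comparison, assuming each post carries a createdAt timestamp in milliseconds (a hypothetical field not present in the structures above) and that the posts have already been read. It keeps the posts from the given window that have no child for the user's ID.

```javascript
// posts: postId -> post, where commenters' IDs appear as child keys and
// createdAt (hypothetical) is a millisecond timestamp.
function uncommentedInWindow(posts, uid, startMs, endMs) {
  return Object.entries(posts)
    .filter(([, post]) => post.createdAt >= startMs && post.createdAt < endMs)
    .filter(([, post]) => !Object.prototype.hasOwnProperty.call(post, uid))
    .map(([postId]) => postId);
}

// Hypothetical helper: [start, end) of the previous local calendar day.
function yesterdayRange(now = new Date()) {
  const end = new Date(now.getFullYear(), now.getMonth(), now.getDate()).getTime();
  const start = new Date(now.getFullYear(), now.getMonth(), now.getDate() - 1).getTime();
  return [start, end];
}
```

Usage would look like uncommentedInWindow(posts, "user_id_0", ...yesterdayRange()); every post in the window is downloaded, which is why this only suits small datasets.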
Situation
I have the following Firestore setup:
/posts/{id}
/posts/{id}/comments/{id}
/users/{id}/followers/{userId}
A user profile can be either public or private. All users can see posts by public users, but only users who follow a private user can see that user's posts, i.e. users who are in the owner's followers collection.
Current Solution
The post doc looks like this:
owner_account_visibility: public || private
ownerId: uid
The comment doc looks the same:
owner_account_visibility: public || private
ownerId: uid
My rules look like this:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    function isValid() {
      return resource.data.owner_account_visibility == "public"
        || exists(/databases/$(database)/documents/users/$(resource.data.ownerId)/followers/$(request.auth.uid));
    }
    match /posts/{postId} {
      allow read: if isValid();
      match /comments/{commentId} {
        allow read: if isValid();
      }
    }
  }
}
Problem
I see problems/questions with this solution:
Problem: A user may create many posts, which in turn may have lots of comments. This means that if a user updates their account visibility, a Cloud Function has to update possibly thousands of post and comment documents.
Problem: A user may load many private posts and comments, and each one of those costs a database read, which can get very expensive as the user scrolls their feed.
Question: In the isValid() function, there are two conditions separated by an OR operator (||). Does this mean that if the first condition returns true (resource.data.owner_account_visibility == "public"), the function will not check the second condition (exists(/users/$(resource.data.ownerId)/followers/request.auth.uid)), saving me a database read? If this isn't the case, then I will waste a loooot of reads when a user loads tons of comments from a post even though it is public...
Does anyone have a proposed solution to this problem? Any help would be appreciated :)
I solved this myself. In short, instead of letting a user set their account's visibility, I let them set each post's visibility, simply because that is the functionality I want in my app. Now I can use resource.data.post_visibility == "public", which avoids having to update every post when a user changes their account's visibility. If that first condition is false, I fall back to the check from the current solution in the question (exists(/users/$(resource.data.ownerId)/followers/request.auth.uid)). Also, comments and replies to a post are open to all authenticated users even when the post is private, since comments aren't necessarily the post owner's own content or sensitive information.
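The resulting access logic can be modeled as plain functions. This is a JavaScript model of the decision, not Firestore rules syntax: isFollower is a hypothetical callback standing in for the exists() lookup, and the counter only makes visible that, with JavaScript's short-circuiting ||, the lookup runs only for non-public posts.

```javascript
// Public posts are readable by anyone; private posts only by the owner's followers.
// isFollower(ownerId, uid) is a hypothetical stand-in for the exists() lookup.
function canReadPost(post, uid, isFollower) {
  return post.post_visibility === "public" || isFollower(post.ownerId, uid);
}

// Comments and replies: readable by any authenticated user.
function canReadComment(uid) {
  return uid != null;
}

// Hypothetical follower store (ownerId -> Set of follower uids) that counts lookups.
function makeFollowerLookup(followersByOwner) {
  let lookups = 0;
  const isFollower = (ownerId, uid) => {
    lookups += 1; // stands in for the document read the question worries about
    return (followersByOwner[ownerId] || new Set()).has(uid);
  };
  return { isFollower, count: () => lookups };
}
```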
Say that I have the nodes user and item, and a node user_items used to join them.
Typically one would (as advised in the official documentation and videos) use a structure like this:
"user_items": {
  "$userKey": {
    "$itemKey1": true,
    "$itemKey2": true,
    "$itemKey3": true
  }
}
I would like to use the following structure instead:
"user_items": {
  "$userKey": {
    "$itemKey1": 1494912826601,
    "$itemKey2": 1494912826602,
    "$itemKey3": 1494912826603
  }
}
with the values being timestamps, so that I can order the items by creation date while also being able to tell when each one was added. It seems like a killing-two-birds-with-one-stone situation. Or is it?
Any downsides to this approach?
EDIT: I'm also using this approach for boolean fields such as approved_at, seen_at, etc., instead of using two fields like:
"some_message": {
  "is_seen": true,
  "seen_timestamp": 1494912826602
}
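A minimal sketch of how such a field can be read and written, assuming an absent seen_at means "not seen" and a number is the time it was seen; the message shape and helper names are hypothetical. The boolean is derived rather than stored, so the two pieces of information cannot disagree.

```javascript
// A message is "seen" exactly when it has a numeric seen_at timestamp.
function isSeen(message) {
  return typeof message.seen_at === "number";
}

// Marking as seen records the time once; an existing timestamp is kept.
// Returns a new object rather than mutating the input.
function markSeen(message, nowMs = Date.now()) {
  return isSeen(message) ? message : { ...message, seen_at: nowMs };
}
```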
You can model your database any way you want, as long as you follow Firebase's data-structuring guidelines. The most important one is to keep the data as flat as possible, and by that measure your database is structured correctly. There is no perfect structure, but depending on your needs, any of the following can be considered good practice:
1. "$itemKey1": true,
2. "$itemName1": true,
3. "$itemKey1": 1494912826601,
4. "$itemName1": 1494912826601,
What does "$itemKey1": 1494912826601 mean? Because a timestamp is set, the item was added to your database and is linked to that specific user, which in other words also means true. So it is not a bad approach.
Hope it helps.
Great minds must think alike, because I do the exact same thing :) In my case, the "items" are posts that the user has upvoted. I use the timestamps with orderBy(), along with limitToLast(50) to get the "last 50 posts that the user has upvoted". And from there they can load more. I see no downsides to doing this.
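With the Realtime Database SDK, ordering children by their own values is done with orderByValue(), and limitToLast(n) keeps the n highest values, i.e. the most recent timestamps. The sketch below models that ordering in memory; the beforeTs cursor for "load more" is a hypothetical parameter, and ties are broken by a plain string comparison of keys as a simplification.

```javascript
// Models orderByValue() + limitToLast(n) on a node like user_items/$userKey:
// returns the n keys with the largest timestamps, in ascending order.
function lastByTimestamp(items, n, beforeTs = Infinity) {
  const sorted = Object.entries(items)
    .filter(([, ts]) => ts < beforeTs) // optional cursor for paging back
    .sort((a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return sorted.slice(Math.max(0, sorted.length - n)).map(([key]) => key);
}
```

To load more, pass the oldest timestamp already shown as beforeTs.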
I need help with a scenario where we do multi-path updates to fanned-out data. If a new path is added somewhere between the moment we calculate the set of paths and the moment we perform the update, the data at that newly added path would be inconsistent.
For example, below is the data for blog posts. The posts can be tagged with multiple terms like "tag1" and "tag2". In order to find how many posts are tagged with a specific tag, I can fan out the post data to the tags path as well:
/posts/postid1: {"Title": "Title 1", "body": "About Firebase", "tags": {"tag1": true, "tag2": true}}
/tags/tag1/postid1: {"Title": "Title 1", "body": "About Firebase"}
/tags/tag2/postid1: {"Title": "Title 1", "body": "About Firebase"}
Now consider two concurrent updates:
1a) User1 wants to modify the title of postid1 and builds the following multi-path update:
/posts/postid1/Title: "Title 1 modified"
/tags/tag1/postid1/Title: "Title 1 modified"
/tags/tag2/postid1/Title: "Title 1 modified"
1b) At the same time, User2 wants to add tag3 to postid1 and builds the following multi-path update:
/posts/postid1/tags: {"tag1": true, "tag2": true, "tag3": true}
/tags/tag3/postid1: {"Title": "Title 1", "body": "About Firebase"}
So both updates can apparently succeed one after the other, leaving /tags/tag3/postid1 out of sync because it still has the old title.
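To make the race concrete, here is a small in-memory replay of the two updates in the order 1a then 1b, with a plain object standing in for the database and a hypothetical applyUpdate helper that writes each path. Because 1b's payload for /tags/tag3/postid1 was built from the title User2 read before 1a landed, the new path ends up with the old title even though each update applied cleanly.

```javascript
// Hypothetical helper: writes each "/a/b/c" path of a multi-path update.
function applyUpdate(tree, updates) {
  for (const [path, value] of Object.entries(updates)) {
    const parts = path.split("/").filter(Boolean);
    const last = parts.pop();
    let node = tree;
    for (const part of parts) node = node[part] || (node[part] = {});
    node[last] = value;
  }
}

const db = {
  posts: { postid1: { Title: "Title 1", body: "About Firebase", tags: { tag1: true, tag2: true } } },
  tags: {
    tag1: { postid1: { Title: "Title 1", body: "About Firebase" } },
    tag2: { postid1: { Title: "Title 1", body: "About Firebase" } },
  },
};

// 1a: built from User1's read, which saw only tag1 and tag2.
const update1a = {
  "/posts/postid1/Title": "Title 1 modified",
  "/tags/tag1/postid1/Title": "Title 1 modified",
  "/tags/tag2/postid1/Title": "Title 1 modified",
};
// 1b: built from User2's read, which saw the old title.
const update1b = {
  "/posts/postid1/tags": { tag1: true, tag2: true, tag3: true },
  "/tags/tag3/postid1": { Title: "Title 1", body: "About Firebase" },
};

applyUpdate(db, update1a);
applyUpdate(db, update1b);
console.log(db.posts.postid1.Title);     // Title 1 modified
console.log(db.tags.tag3.postid1.Title); // Title 1  (stale)
```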
I can think of security rules to handle this, but I'm not sure whether they are correct or will work.
For example, we can have updatedAt and lastUpdated fields and check that we are updating the same version of the post that we read:
"posts": {
  "$postid": {
    ".write": true,
    ".read": true,
    ".validate": "newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (!data.exists() || data.child('updatedAt').val() === newData.child('lastUpdated').val())"
  }
}
For tags, we don't want to repeat that check; instead, we can check that /tags/$tag/$postid/updatedAt is the same as /posts/$postid/updatedAt.
"tags": {
  "$tag": {
    "$postid": {
      ".write": true,
      ".read": true,
      ".validate": "newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && newData.child('updatedAt').val() === root.child('posts').child($postid).child('updatedAt').val()"
    }
  }
}
This way, /posts/$postid has concurrency control built in, and users can only write over the version they read.
/posts/$postid also becomes the source of truth, and all the other fan-out paths check that their updatedAt field matches that primary path.
Will this bring consistency, or are there still problems? Could it hurt performance at scale?
Are multi-path updates and rules atomic together? In other words, are the rules evaluated separately, in isolation, for concurrent multi-path updates like 1a and 1b above?
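The version check in the proposed /posts/$postid .validate rule can be modeled as a pure function to see what it accepts and rejects. This is only a JavaScript model of the rule's condition with the field names used in that rule (updatedAt, lastUpdated); it says nothing about how the Realtime Database evaluates rules for concurrent multi-path updates, which is the open question.

```javascript
// Models the .validate condition: required children must be present, and the
// write is accepted when no post exists yet or when the writer's lastUpdated
// matches the stored updatedAt (i.e. they write over the version they read).
function postWriteValid(data, newData) {
  const required = ["userId", "updatedAt", "lastUpdated", "Title"];
  if (!required.every((k) => k in newData)) return false;
  return data == null || data.updatedAt === newData.lastUpdated;
}
```

If User1 and User2 both read version 100 and User1's write (updatedAt 200) lands first, User2's write, still carrying lastUpdated 100, is rejected against the stored version 200.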
Unfortunately, Firebase does not provide any guarantees or mechanisms that give the level of determinism you're looking for. I have had the best luck front-ending such updates with an API stack (GCF and Lambda are both very easy, serverless ways of doing this). The updates can be made in that layer, and even serialized if absolutely necessary. But there isn't a safe way to do this in Firebase itself.
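A rough sketch of the serialization idea, assuming all writes for a post go through a single server process: a per-post promise chain runs jobs one at a time, and each job computes its fan-out paths from the data as it is when the job runs. The in-memory db, the queue, and the setTitle/addTag helpers are hypothetical; real code would read and write the database, and with several function instances an in-process queue alone would not serialize anything.

```javascript
// Hypothetical in-memory store standing in for the database.
const db = {
  posts: { postid1: { Title: "Title 1", body: "About Firebase", tags: { tag1: true, tag2: true } } },
  tags: {
    tag1: { postid1: { Title: "Title 1", body: "About Firebase" } },
    tag2: { postid1: { Title: "Title 1", body: "About Firebase" } },
  },
};

// Per-key promise chains: jobs for the same post run strictly one after another.
const queues = new Map();
function serialize(key, job) {
  const tail = queues.get(key) || Promise.resolve();
  const next = tail.then(job);
  queues.set(key, next.catch(() => {})); // keep the chain alive if a job fails
  return next;
}

// Each job reads the post's current tags when it runs, so the fan-out
// paths always match the latest data.
function setTitle(postId, title) {
  return serialize(postId, async () => {
    const post = db.posts[postId];
    post.Title = title;
    for (const tag of Object.keys(post.tags)) db.tags[tag][postId].Title = title;
  });
}

function addTag(postId, tag) {
  return serialize(postId, async () => {
    const post = db.posts[postId];
    post.tags[tag] = true;
    db.tags[tag] = db.tags[tag] || {};
    db.tags[tag][postId] = { Title: post.Title, body: post.body };
  });
}
```

Replaying 1a and 1b through these helpers leaves every tag path with the modified title, whichever of the two is queued first.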
There are numerous "hack" options you could apply. You could, for example, have a simple lock mechanism using a dedicated collection for tracking write locks. Clients could post to a lock collection, then verify that their key was the only member of that collection, before performing a write. But I hope you'll agree with me that such cooperative systems have too many potential edge cases, potential security issues, and so on. In Firebase, it is best to design such that this component is not a requirement in the first place.
Whenever I encounter code snippets on the web, I see something like:
Meteor.subscribe('posts', 'bob-smith');
The client can then display all posts of "bob-smith".
The subscription returns several documents.
What I need, in contrast, is a single-document subscription in order to show an article's body field. I would like to filter by (article) id:
Meteor.subscribe('articles', articleId);
But I got suspicious when I searched the web for similar examples: I cannot find even one single-document subscription example.
What is the reason for that? Why does nobody use single-document subscriptions?
Oh but people do!
This is not against any best practice that I know of.
For example, here is a code sample from the github repository of Telescope where you can see a publication for retrieving a single user based on his or her id.
Here is another one for retrieving a single post, and here is the subscription for it.
It is actually sane to subscribe only to the data that you need at a given moment in your app. If you are writing a single post page, you should make a single post publication/subscription for it, such as:
Meteor.publish('singleArticle', function (articleId) {
  return Articles.find({_id: articleId});
});
// Then, from an iron-router route for example:
Meteor.subscribe('singleArticle', this.params.articleId);
A common pattern that uses a single-document subscription is a parameterized route, e.g. /posts/:_id; you'll see these in many iron:router answers here.
I just read the Firebase blog post titled "Denormalizing Your Data Is Normal", and I have a request for clarification.
I was with it until the Considerations paragraph. Specifically, the following:
"Modification of comments is easy: just set the value of the comment under /comments to the new content. For deletion, simply delete the comment from /comments — and whenever you come across a comment ID elsewhere in your code that doesn’t exist in /comments, you can assume it was deleted and proceed normally"
For modifications, why don't I have to modify the duplicate comments stored under /links and /users?
For deletions, am I correct in my understanding that once I delete a comment I have to have logic in all my read logic to cross-check /comments in case it was deleted?
Thanks!
The structure detailed in the blog post does not store duplicate comments. We store comments once under /comments then store the name of those comments under /links and /users. These function as pointers to the actual comment data.
Consider the example structure from the post...
{
  users: {
    user1: {
      name: "Alice",
      comments: {
        comment1: true
      }
    },
  },
  comments: {
    comment1: {
      body: "This is awesome!",
      author: "user1"
    }
  }
}
Note that the actual comment data is only stored once.
If we modify /comments/comment1, we don't need to update anything else because we only store the name of the comment under /links and /users, not the actual comment contents.
If we were to remove /comments/comment1, that would remove the only existence of the comment data. However, we still have these "dangling" references to comment1 under /users/user1/comments.
Imagine we delete /comments/comment1. When we try to load Alice's comments, we can see that comment1 doesn't exist anymore. Our application can then react accordingly, either by a) deleting the reference or b) ignoring the reference and not trying to display the deleted comment.
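A minimal sketch of both reactions, using the example structure from the post with a plain object standing in for the database: it resolves a user's comment pointers, returns the comments that still exist, and, when the hypothetical cleanup flag is set, also removes dangling references. The function name and flag are illustrative only.

```javascript
// Resolves /users/$uid/comments pointers against /comments.
// Missing targets are treated as deleted: skipped (option b), and also
// removed from the user's pointers when cleanup is true (option a).
function loadUserComments(db, uid, cleanup = false) {
  const refs = (db.users[uid] && db.users[uid].comments) || {};
  const result = [];
  for (const commentId of Object.keys(refs)) {
    const comment = (db.comments || {})[commentId];
    if (comment === undefined) {
      if (cleanup) delete refs[commentId];
      continue;
    }
    result.push({ id: commentId, ...comment });
  }
  return result;
}
```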