Should I use AutoValue to store aggregated values of a collection?

Should I use AutoValue to store aggregated values of a collection? - meteor

I have a Comments collection and a Page collection. Comments belong to pages. Users can upvote the comments, and I want to display the aggregated sum of all the votes of the comments belonging to a page. What would be a good way to do this?
I was thinking of keeping the sum as an AutoValue inside the page collection. Would there be a way to occasionally trigger a recalculation of the AutoValue? I don't need the sum to be updated realtime, once every 5 minutes would suffice.
Or is this a bad idea? Would it be better to use a ReactiveVar in the template to do the calculation, or something else?
Edit: There's not much special about the setup, really. Simply a comment collection with a numeric 'votes' attribute and a pages collection with a numeric autovalue 'score' that should count the votes.
The pages:
Collections.Pages = new Mongo.Collection("pages");
var PageSchema = new SimpleSchema({
name: {
type: String,
min: 1
},
score: {
type: Number,
autoValue: function (doc) {
var maxValue = 1;
Collections.Comments.find({ pageId: doc.pageId }).map(function(mapDoc){
maxValue += mapDoc.votes;
});
return maxValue;
}
},
The comments:
Collections.Comments = new Mongo.Collection("comments");
var CommentSchema = new SimpleSchema({
pageId: {
type: String
},
name: {
type: String,
optional: true
},
votes: {
type: Number,
label: 'Total Votes',
defaultValue: 0
},

Maybe an alternative approach to periodic/timed recalculations might be to simply recalculate the value in one collection in response to a change in the other collection. You said you don't need realtime, but I don't imagine you'd mind if it was realtime.
I had a similar challenge and used the Meteor Collection Hooks package (see https://github.com/matb33/meteor-collection-hooks).
Example:
Collection.comments.after.update(function(userId, doc) {
// make update to aggregated value in Collections.pages
});

i did something similar: i had News items with Comments, and i wanted to track the number of comments per news item w/o having to publish all the Comments.
i chose to give News a commentCount field. i had methods for adding and removing comments, and as part of that processing, i looked up the associated News item and incremented or decremented its count.
what you're finding with your schema solution is that there's no clear way to trigger the autoValue. (it's an interesting use of autoValue, btw, i'll have to keep that in mind for future use).
so i think you're left with these choices:
create upvote/downvote methods for the votes. in the method handler, do the calculations for total votes and store the updated value along with post. this is similar to what i did with News/Comments.
as David suggested, use collection hooks to do something similar to #1. though i do use collection hooks, it's usually when i don't have a clear hook into what i want to do, it's more of a catchall, or processing driven off something i don't totally control.
take care of it in the publish. when you publish the Page, also look up the vote count and simply add dynamically to the publish object. Note that this won't republish the Page when the votes change, so you would lose that reactivity; you did indicate that you were ok with periodic updates.
getting that updated would be a little tricky, because you would have to force the publisher to run again. e.g. through unsubscribing and resubscribing.
of those 3, based on what i understand of your problem, i like them in the order presented. #3 feels the least viable, but i mention it in case it fits in w/ something else you're doing.

Related

Template level subscription, is running a lot of time... Should I use?

I'm doing my meteor app and it has 1 Collection: Students
In Server I made a Publish that receives 3 params: query, limit and skip; to avoid client to subscribe all data and just show the top 10.
I have also 3 Paths:
student/list -> Bring top 10, based on search input and pagination (using find);
student/:id -> Show the student (using findOne)
student/:id/edit -> Edit the student (using findOne)
Each Template subscribe to the Students collection, but every time the user change between this paths, my Template re-render and re-subscribe.
Should I make just one subscribe, and make the find based on this "global" subscription?
I see a lot of people talking about Template level subscription, but I don't know if it is the better choice.
And about making query on server to publish and not send all data, I saw people talking too, to avoid data traffic...
In this case, when I have just 1 Collection, is better making an "global" subscription?

You're following a normal pattern although it's a bit hard to tell without the code. If there many students then you don't really want to publish them all, only what is really necessary for the current route. What you should do is figure out why your pub-sub is slow. Is it the find() on the server? Do you have very large student objects? (In which case you will probably want to limit what fields are returned). Is the search you're running hitting mongo indexes?
Your publication for a list view can have different fields than for a individual document view, for example:
Meteor.publish('studentList',function(){
let fields = { field1: 1, field2: 1 }; // only include two fields
return Students.find({},fields);
});
Meteor.publish('oneStudent',function(_id){
return Students.find(_id); // here all fields will be included
});

Reactively show number of unread comments in a thread?

I'm making a forum type app with Threads and Comments within a Thread. I'm trying to figure out how to show the total number of unread comments within a thread to each user.
I considered publishing all the Comments for every Thread, but this seems like excessive data to be publishing to the client when all I want is a single number showing the unread Comments. But if I start adding metadata to the Thread collection (such as numComments, numCommentsUnread...), this adds extra moving parts to the app (i.e. I have to track every time a different user adds a Comment to a Thread, etc...).
What are some of the best practices for dealing with this?

I would recommend using the Publish-Counts package (https://github.com/percolatestudio/publish-counts) if all you need is the count. If you need the actual related comments take a look at the meteor-composite-publish (https://github.com/englue/meteor-publish-composite) package.

This sounds like a database design problem.
You will have to keep a collection of UserThreads, which tracks when the last time the user checked the thread. It has the userId, the threadId, and the lastViewed date(or whatever sensible alternatives you might use).
IF the user has never checked the thread then do not have an object in the UserThreads then the unread count would be the comment count.
WHEN the user views the thread for the first time, create a UserThread object for him.
UPDATE the lastViewed on the UserThread whenever he views the thread.
The UnreadCommentCount will be calculated reactively. It is the sum of comments on the thread where the comment's createdAt is newer than the lastViewed on the UserThread. This can be a template helper function that is executed in the view on an as needed basis. For example, when listing Threads in a subforum view, then it would only calculate for the Threads being viewed in that list at that time.
Alternatively, you could keep an unreadCommentCount attribute on the UserThread. Every time a comment is posted to the thread, then you would iterate through that Thread's UserThreads, updating the unreadCommentCount. When the user later visits that thread, you then reset the unreadCommentCount to zero and updated the lastViewed. The user would then subscribe to a publication of his own UserThreads, which would update reactively.
It seems that in building a forum type site that UserThread object would be indispensable for tracking how a User interacts with Threads. If he had viewed it, ignored it, has commented in it, wants to subscribe to it but has not commented yet, etc.

Based on #datacarl answer, you can modify your thread publication to integrate additional data, such as a count of your unread comments. Here is how you can achieve it, using Cursor.observe().
var self = this;
// Modify the document we are sending to the client.
function filter(doc) {
var length = doc.item.length;
// White list the fields you want to publish.
var docToPublish = _.pick(doc, [
'someOtherField'
]);
// Add your custom fields.
docToPublish.itemLength = length;
return docToPublish;
}
var handle = myCollection.find({}, {fields: {item:1, someOtherField:1}})
// Use observe since it gives us the the old and new document when something is changing.
// If this becomes a performance issue then consider using observeChanges,
// but its usually a lot simpler to use observe in cases like this.
.observe({
added: function(doc) {
self.added("myCollection", doc._id, filter(doc));
},
changed: function(newDocument, oldDocument)
// When the item count is changing, send update to client.
if (newDocument.item.length !== oldDocument.item.length)
self.changed("myCollection", newDocument._id, filter(newDocument));
},
removed: function(doc) {
self.removed("myCollection", doc._id);
});
self.ready();
self.onStop(function () {
handle.stop();
});
I guess you can adapt this example to your case. You can remove the white list part if you need to. The count part will be covered using a request such as post.find({"unread":true, "thread_id": doc._id}).count()
Another way to achieve that is to use collection hooks. Each time you insert a comment, you hook on after the insert and you update a dedicated field "unread comments count" in your related thread document. Each time, the user read a post, you update the value.

Should I be further denormalizing? [duplicate]

I've read the Firebase docs on Stucturing Data. Data storage is cheap, but the user's time is not. We should optimize for get operations, and write in multiple places.
So then I might store a list node and a list-index node, with some duplicated data between the two, at very least the list name.
I'm using ES6 and promises in my javascript app to handle the async flow, mainly of fetching a ref key from firebase after the first data push.
let addIndexPromise = new Promise( (resolve, reject) => {
let newRef = ref.child('list-index').push(newItem);
resolve( newRef.key()); // ignore reject() for brevity
});
addIndexPromise.then( key => {
ref.child('list').child(key).set(newItem);
});
How do I make sure the data stays in sync in all places, knowing my app runs only on the client?
For sanity check, I set a setTimeout in my promise and shut my browser before it resolved, and indeed my database was no longer consistent, with an extra index saved without a corresponding list.
Any advice?

Great question. I know of three approaches to this, which I'll list below.
I'll take a slightly different example for this, mostly because it allows me to use more concrete terms in the explanation.
Say we have a chat application, where we store two entities: messages and users. In the screen where we show the messages, we also show the name of the user. So to minimize the number of reads, we store the name of the user with each chat message too.
users
so:209103
name: "Frank van Puffelen"
location: "San Francisco, CA"
questionCount: 12
so:3648524
name: "legolandbridge"
location: "London, Prague, Barcelona"
questionCount: 4
messages
-Jabhsay3487
message: "How to write denormalized data in Firebase"
user: so:3648524
username: "legolandbridge"
-Jabhsay3591
message: "Great question."
user: so:209103
username: "Frank van Puffelen"
-Jabhsay3595
message: "I know of three approaches, which I'll list below."
user: so:209103
username: "Frank van Puffelen"
So we store the primary copy of the user's profile in the users node. In the message we store the uid (so:209103 and so:3648524) so that we can look up the user. But we also store the user's name in the messages, so that we don't have to look this up for each user when we want to display a list of messages.
So now what happens when I go to the Profile page on the chat service and change my name from "Frank van Puffelen" to just "puf".
Transactional update
Performing a transactional update is the one that probably pops to mind of most developers initially. We always want the username in messages to match the name in the corresponding profile.
Using multipath writes (added on 20150925)
Since Firebase 2.3 (for JavaScript) and 2.4 (for Android and iOS), you can achieve atomic updates quite easily by using a single multi-path update:
function renameUser(ref, uid, name) {
var updates = {}; // all paths to be updated and their new values
updates['users/'+uid+'/name'] = name;
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.once('value', function(snapshot) {
snapshot.forEach(function(messageSnapshot) {
updates['messages/'+messageSnapshot.key()+'/username'] = name;
})
ref.update(updates);
});
}
This will send a single update command to Firebase that updates the user's name in their profile and in each message.
Previous atomic approach
So when the user change's the name in their profile:
var ref = new Firebase('https://mychat.firebaseio.com/');
var uid = "so:209103";
var nameInProfileRef = ref.child('users').child(uid).child('name');
nameInProfileRef.transaction(function(currentName) {
return "puf";
}, function(error, committed, snapshot) {
if (error) {
console.log('Transaction failed abnormally!', error);
} else if (!committed) {
console.log('Transaction aborted by our code.');
} else {
console.log('Name updated in profile, now update it in the messages');
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.on('child_added', function(messageSnapshot) {
messageSnapshot.ref().update({ username: "puf" });
});
}
console.log("Wilma's data: ", snapshot.val());
}, false /* don't apply the change locally */);
Pretty involved and the astute reader will notice that I cheat in the handling of the messages. First cheat is that I never call off for the listener, but I also don't use a transaction.
If we want to securely do this type of operation from the client, we'd need:
security rules that ensure the names in both places match. But the rules need to allow enough flexibility for them to temporarily be different while we're changing the name. So this turns into a pretty painful two-phase commit scheme.
change all username fields for messages by so:209103 to null (some magic value)
change the name of user so:209103 to 'puf'
change the username in every message by so:209103 that is null to puf.
that query requires an and of two conditions, which Firebase queries don't support. So we'll end up with an extra property uid_plus_name (with value so:209103_puf) that we can query on.
client-side code that handles all these transitions transactionally.
This type of approach makes my head hurt. And usually that means that I'm doing something wrong. But even if it's the right approach, with a head that hurts I'm way more likely to make coding mistakes. So I prefer to look for a simpler solution.
Eventual consistency
Update (20150925): Firebase released a feature to allow atomic writes to multiple paths. This works similar to approach below, but with a single command. See the updated section above to read how this works.
The second approach depends on splitting the user action ("I want to change my name to 'puf'") from the implications of that action ("We need to update the name in profile so:209103 and in every message that has user = so:209103).
I'd handle the rename in a script that we run on a server. The main method would be something like this:
function renameUser(ref, uid, name) {
ref.child('users').child(uid).update({ name: name });
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.once('value', function(snapshot) {
snapshot.forEach(function(messageSnapshot) {
messageSnapshot.update({ username: name });
})
});
}
Once again I take a few shortcuts here, such as using once('value' (which is in general a bad idea for optimal performance with Firebase). But overall the approach is simpler, at the cost of not having all data completely updated at the same time. But eventually the messages will all be updated to match the new value.
Not caring
The third approach is the simplest of all: in many cases you don't really have to update the duplicated data at all. In the example we've used here, you could say that each message recorded the name as I used it at that time. I didn't change my name until just now, so it makes sense that older messages show the name I used at that time. This applies in many cases where the secondary data is transactional in nature. It doesn't apply everywhere of course, but where it applies "not caring" is the simplest approach of all.
Summary
While the above are just broad descriptions of how you could solve this problem and they are definitely not complete, I find that each time I need to fan out duplicate data it comes back to one of these basic approaches.

To add to Franks great reply, I implemented the eventual consistency approach with a set of Firebase Cloud Functions. The functions get triggered whenever a primary value (eg. users name) gets changed, and then propagate the changes to the denormalized fields.
It is not as fast as a transaction, but for many cases it does not need to be.

Meteor Collection advanced selector

I have a Project collection and a Task collection.
Each project has a user_id field, this holds the owner of the project.
Each task has a project_id field. So the structure is something like this:
User 1
Project 1
Task 1
Task 2
Project 2
Task 3
User 2
Project 3
Task 4
Task 5
For security purposes I only want to publish the projects belonging to a certain logged in user. For the project itself that's quite easy:
Meteor.publish('projects', function(){
return Projects.find({user_id: this.userId});
});
But how do I do this in a clean way for the Task collection? And why does the Collection.Allow doesn't have a 'view' option?
Something like:
Tasks.allow({
view: function (userId, doc) {
return Projects.findOne(doc.project_id).user_id == userId;
}
});
would be nice, is there a reason it's not there?

First, some recommended reading:
Reactive joins in meteor
A similar question on SO
Joins in meteor are currently tricky. It's easy to just join the collections in a publish function, but it isn't always straightforward to make them reactive (run again when things change).
Non-Reactive Options
You could publish both collections at the same time with:
Meteor.publish('projectsAndTasks', function() {
var projectsCursor = Projects.find({user_id: this.userId});
var projectIds = projectsCursor.map(function(p) { return p._id });
return [
projectsCursor,
Tasks.find({project_id: {$in: projectIds}});
];
});
The potential problem is that if tasks were added to a new project, they would not be published (see "The Naive Approach" from the first article above). Depending on how your application starts and stops its subscriptions, this may not matter. If you find that it does, keep reading.
Reactive Options
A simple option is just to denormalize the data. If you also added user_id to your tasks, then no joins are necessary, and the publish function looks like:
Meteor.publish('projectsAndTasks', function() {
var projectsCursor = Projects.find({user_id: this.userId});
var tasksCursor = Tasks.find({user_id: this.userId});
return [projectsCursor, tasksCursor];
});
If that doesn't appeal to you and you are using iron-router, you can do a client-side join in your routes (see "Joining On The Client" from the first article above). It's a bit slower because you need a second round trip but it's clean in that no data needs to be modified and no external packages need to be added.
Finally, you can do a reactive join on the server, either manually using observeChanges (not recommended), or by using a package. I have used publish-with-relations in the past, but it has some issues as pointed out in the articles). For a more complete list of package options, you can see this thread.
Not being a core developer on meteor, I don't have a precise answer for why allow/deny doesn't have a "read" option, but I'll take an educated guess. Depending on how the allow/deny function was written, the publisher would potentially have to run an expensive callback for every single document or partial update. The allow/deny callbacks are easy to tolerate when a single document is being modified, but if you suddenly need to publish several hundred documents and each one needs to be separately evaluated before being transmitted, I don't think that would be practical. I'm pretty sure that's why publishers can act alone as the arbiter of document read authorization.

You can do this for the tasks:
Meteor.publish('tasks', function(){
var projects = Projects.find({user_id: this.userId}, {fields: {_id: 1}});
var projectIdList = projects.map(function(project) { return project._id;});
return Tasks.find({project_id: {$in: projectIdList}});
});
First we get all the projects belonging to the user. We will only need the _id field so we filter the other fields
Then we map the _id's of the projects to a new array.
Then we publish a tasks.find that includes all the project ids in the mapped array.
The allow construction you mentionend is by my knowledge only ment to be used with updates and inserts

Limiting Children of Object in Firebase

I am looking to get back my whole object, but limit one of my children objects.
For example, say you take a chat app like firebase does and you do "rooms".
So you might have
rooms: {
mainroom:{
name: something,
otherAttrs: mfasfd,
messages: {
0: {
message: something
},
1: {
message: something else
}
}
}
I may have 300 messages in that mainroom, but I want to limit it to 30 say. This example is basic, but in my actual application my objects are very related so I don't want to denormalize any further.
I could do a mainroom call, and then do another child call off of that, but I am wondering if I would get dinged twice. in the initial call it would load all messages anyways, and then I would load 30 of them with the child call. Was just hoping someone would have a better recommendation.

Start by reading up about denormalization. This is a concept which is enforced in SQL by table structures, but also important in NoSQL, although you're given enough rope to tangle yourself up and have a bad day.
So the first step is to split messages into its own path:
URL/rooms
URL/messages
Now you can grab your meta data and messages separately, and call limit to set the number loaded:
var fbRef = new Firebase(URL);
var roomRef = fbRef.child('rooms/'+roomId);
var chatRef = fbRef.child('messages/'+roomId).limit(30);
In case you're not convinced that these should be split up, you're going to run into this same issue when you want to create a dropdown containing a list of room names (you have to load all your messages in the current data structure, just to get the room names).
For great justice, split meta data and detailed records into their own paths. Otherwise, all your base are belong to bandwidth.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex