For a record set of 10,000 rows, subscribing to just a window of a few hundred rows (via start row and stop row) takes over 10 seconds to show "ready". Should all records be removed before moving the window? Why is Meteor slow?
Firstly, make sure you are only publishing the keys you really need and not those you don't. For example:
Meteor.publish('queryData', function (queryId, startRow, stopRow) {
  return queryData.find({ ... query ...}, { fields: { name: 1, description: 1 } });
});
This is especially important if your documents are big.
Secondly, look at the websocket traffic in your browser's inspector to see how much data you are actually sending over in the publication.
Thirdly, make sure your collection is indexed on the keys you are searching on so that you are not doing collection scans.
Meteor is generally pretty fast but simple mistakes can make it feel very slow.
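The publication shown above receives startRow and stopRow but never uses them, so the whole result set may be sent. A minimal sketch of turning the window into find options follows. The helper name windowOptions, the sort key, and the convention that startRow is inclusive and stopRow is exclusive are all assumptions, not part of the original code.

```javascript
// Hypothetical helper: convert a row window into Mongo-style find options.
// Assumes startRow is inclusive and stopRow is exclusive.
function windowOptions(startRow, stopRow, fields) {
  if (!(startRow >= 0) || !(stopRow > startRow)) {
    // An empty window would give limit 0, which MongoDB treats as "no limit".
    throw new RangeError('expected 0 <= startRow < stopRow');
  }
  var skip = Math.floor(startRow);
  return {
    fields: fields,                      // publish only the keys the client needs
    sort: { _id: 1 },                    // a stable order so skip/limit hit the same rows
    skip: skip,
    limit: Math.floor(stopRow) - skip
  };
}

var options = windowOptions(200, 500, { name: 1, description: 1 });
console.log(options.skip, options.limit);
```

Inside the publish function, the result could be passed as the second argument to queryData.find(...). An index on the sort key keeps the skip from turning into a full scan of unsorted data.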
Firebase has an interesting feature/nuisance where when you listen on a data ref, you get all the data that was ever added to that ref. So, for example, when you listen on 'child_added', you get a replay of all the children that were added to that ref from the beginning of time. We are writing a commenting system with a dataset that looks something like this:
/comments
/sites
/sites/articles
/users
Sites have many articles and articles have many comments and users have many comments.
We want to be able to track all the comments a user makes, so we feel it is wise to put comments in a separate ref rather than partition them by the articles they belong to. We have a backend listener that needs to do things on new comments as they arrive (increment their child counts, adjust a user's stats etc.). My concern is that, after a while, it will take this listener a long time to start up if it has to process a replay of every comment ever made.
I thought about possibly storing comments only in articles and storing references to each comment's siteId/articleId/commentId in the user table so we could still find all the comments for a given user, but this complicates the backend, as it would then probably need to have a separate listener for each site or even each article, which could make it difficult to manage so many listeners.
Imagine if one of these articles is on a very high-traffic site with tens of thousands of articles and thousands of comments per article. Is the scaling answer to somehow keep track of the traffic levels of every site and set up and partition them in a way that they are assigned to different worker processes? And what about the question of startup time and how long it takes to replay all data every time we load up our workers?
Adding on to Frank's answer, here are a couple other possibilities.
Use a queue strategy
Since the workers really expect to process one-time events, give them one-time events that they can pull from a queue and delete once they finish processing. This resolves the multiple-worker scenario elegantly and ensures nothing is ever missed because a server was offline.
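As a rough illustration of the claim, process, delete cycle, here is an in-memory sketch. It is a stand-in only: createQueue, drain, and the job shape are hypothetical names, and with a real Firebase list the claim step would need to be done atomically (for example with a transaction) so two workers cannot take the same item.

```javascript
// In-memory stand-in for a queue location; all names here are hypothetical.
function createQueue() {
  var jobs = [];   // pending jobs in arrival order
  var nextId = 0;
  return {
    // Add a one-time event for some worker to pick up.
    push: function (event) {
      var job = { id: 'job' + nextId++, event: event, claimedBy: null };
      jobs.push(job);
      return job.id;
    },
    // Hand the oldest unclaimed job to a worker, or null if none is left.
    claim: function (workerId) {
      for (var i = 0; i < jobs.length; i++) {
        if (jobs[i].claimedBy === null) {
          jobs[i].claimedBy = workerId;
          return jobs[i];
        }
      }
      return null;
    },
    // Delete a job once its worker has finished with it.
    complete: function (jobId) {
      jobs = jobs.filter(function (job) { return job.id !== jobId; });
    },
    size: function () { return jobs.length; }
  };
}

// Worker loop: claim, process, delete, until nothing unclaimed remains.
function drain(queue, workerId, handler) {
  var processed = 0;
  var job;
  while ((job = queue.claim(workerId)) !== null) {
    handler(job.event);
    queue.complete(job.id);
    processed++;
  }
  return processed;
}
```

Because a job is only deleted after its handler returns, a worker that goes offline mid-job leaves the item in the queue rather than losing it.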
Utilize a timestamp to reduce backlog
A simple strategy for avoiding backlog during reboot/startup of the workers is to add a timestamp to all of the events and then do something like the following:
var startTime = Date.now() - 3600 * 1000; // an hour ago (Date.now() is in milliseconds)
pathRef.orderByChild('timestamp').startAt( startTime );
Keep track of the last id processed
This only works well with push ids, since formats that do not sort naturally by key will likely become out of order at some point in the future.
When processing records, have your worker keep track of the last record it added by writing that value into Firebase. Then one can use orderByKey().startAt( lastKeyProcessed ) to avoid the backlog. Annoyingly, we then have to discard the first key. However, this is an efficient query, does not cost data storage for an index, and is quick to implement.
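The "discard the first key" step can be sketched as below. This is plain JavaScript operating on an array of keys rather than a live query; keysToProcess is a hypothetical helper, and the sample keys are made up to look like push ids.

```javascript
// Hypothetical helper: keys are assumed to be sorted ascending, the way an
// orderByKey() query returns them. startAt() is inclusive, so the key that was
// already processed comes back first and has to be dropped.
function keysToProcess(sortedKeys, lastKeyProcessed) {
  if (lastKeyProcessed === null || lastKeyProcessed === undefined) {
    return sortedKeys.slice(); // nothing processed yet, take everything
  }
  var fromLast = sortedKeys.filter(function (key) {
    return key >= lastKeyProcessed;
  });
  if (fromLast[0] === lastKeyProcessed) {
    fromLast.shift(); // discard the record handled before the restart
  }
  return fromLast;
}

console.log(keysToProcess(['-Ka1', '-Ka2', '-Ka3'], '-Ka2'));
```

Checking that the first key really equals lastKeyProcessed before dropping it covers the case where that record was deleted in the meantime.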
If you only need to process new comments once, you can put them in a separate list, e.g. newComments vs. comments (the ones that have been processed). Then, when you're done processing, move them from newComments to comments.
Alternatively you can keep all comments in a single list like you have today and add a field (e.g. "isNew") to it that you set to true initially. Then you can filter with orderByChild('isNew').equalTo(true) and update({ isNew: false }) once you're done with processing.
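For the first option, the move from newComments to comments can be expressed as a single multi-location update, so the comment never exists in both lists or in neither. A minimal sketch, assuming the two lists sit side by side under the same parent; buildMoveUpdate is a hypothetical helper name.

```javascript
// Hypothetical helper: build one update object that moves a comment.
// In a multi-location update, writing null to a path deletes it.
function buildMoveUpdate(commentId, commentData) {
  var updates = {};
  updates['newComments/' + commentId] = null;      // remove from the unprocessed list
  updates['comments/' + commentId] = commentData;  // add to the processed list
  return updates;
}

var updates = buildMoveUpdate('abc123', { text: 'Nice article', userId: 'u1' });
console.log(Object.keys(updates));
```

The resulting object would be passed to update() on the ref that is the common parent of both lists.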
I've been struggling with the following issue in Meteor + iron router:
I have a page (route) that has a subscription to a Mongo collection.
On that page, I have some logic that relies on a cursor querying the collection and also uses an observeChanges handler (namely, I'm running a search on that collection).
The problem is that the collection is preserved on the client across route changes, which causes two unwanted effects:
1) The collection isn't necessarily needed outside that route, meaning I'm wasting client RAM (the collection, or even a subset of it, is likely to be quite big).
2) Whenever I go back to that route, I want to start off with an empty subset for the observeChanges handler to work properly.
Any advice on how to clear the mirrored collection? (Using the Collection._collection.remove({}) hack is bad practice and doesn't even solve the problem.)
Thanks!
Solved this by storing the subscription handles and using them to unsubscribe from the data (i.e. subscription_handle.stop()) in template.destroyed().
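A small sketch of the bookkeeping this involves is shown below. createSubscriptionTracker is a hypothetical helper; the only thing it relies on is that each handle has a stop() method, which is what Meteor.subscribe returns.

```javascript
// Hypothetical helper: remember subscription handles so they can all be
// stopped when the template or route goes away.
function createSubscriptionTracker() {
  var handles = [];
  return {
    // Store a handle and pass it through, so calls can be wrapped inline.
    track: function (handle) {
      handles.push(handle);
      return handle;
    },
    // Stop every tracked subscription and forget the handles.
    stopAll: function () {
      var stopped = handles.length;
      handles.forEach(function (handle) { handle.stop(); });
      handles = [];
      return stopped;
    }
  };
}

// Intended use (the publication name is made up):
//   created:   tracker.track(Meteor.subscribe('searchResults'));
//   destroyed: tracker.stopAll();
```

Once the last subscription that published a document is stopped, Meteor removes that document from the client-side collection, which also gives the observeChanges handler an empty starting point on the next visit.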
I have a Meteor app with the typical pattern: publish on the server, subscribe on the client.
Reactivity is great, but now I need the client to synchronize its local minimongo (that is, fetch new values from the server) only every 30 seconds or so.
Is there a way to do so? In other words, I must be able to delay synchronization for n seconds and then repeat it every n seconds.
The only pattern that comes to mind right now is a very dirty one: use another helper for the layout that only updates every n seconds. That doesn't save me any traffic, though, because synchronization will happen anyway; it only makes the UI look as if it were not synchronized in real time.
It seems you don't necessarily want to stop and restart the subscription itself (that would get difficult, as Meteor will think there is no data and will reactively remove everything).
Really you just want to prevent the UI from updating as often. One way to do that is the following, which will change the local cursor query to be temporarily reactive (allowing the DOM to update) every 5 seconds, and then non-reactive right away:
# client.coffee
Meteor.setInterval ->
  Session.set('reactive', true)
  Session.set('reactive', false)
, 5000
Template.test.helpers
  docs: -> Collection.find {}, {reactive: Session.get('reactive')}
This would be my initial approach just to demo the concept, and it seems pretty hacky; it works in a tiny app but I haven't tested it in anything big. I've never seen this kind of thing being used in a real app, but understand why you might want it.
Another approach is to add an updateTimeStamp field to each document. Then you can publish all documents up to a specific timestamp and advance that timestamp every 30 seconds, which ensures you do not receive documents every time they are added or changed.
The biggest difficulty would be managing the time difference between the client and the server.
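Meteor.publish("allPosts", function (until) {
  return Posts.find({ updateTimeStamp: { $lte: until } });
});
And on the client:
var handle;
Meteor.setInterval(function () {
  var old = handle; // each subscribe call creates a new subscription, so stop the previous one
  handle = Meteor.subscribe("allPosts", new Date());
  if (old) old.stop();
}, 30000);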
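One way to handle the client/server clock difference mentioned above is to estimate the offset once and apply it to the cutoff sent with each subscription. This is only a sketch: it assumes a hypothetical server method that returns the server's Date.now(), and both function names are made up.

```javascript
// Estimate how far the server clock is ahead of the client clock, in ms.
// clientSentMs / clientReceivedMs are client clock readings taken just before
// and just after asking the server for its current time (serverNowMs).
function estimateOffset(clientSentMs, serverNowMs, clientReceivedMs) {
  // Assume the server read its clock roughly halfway through the round trip.
  var midpoint = (clientSentMs + clientReceivedMs) / 2;
  return serverNowMs - midpoint;
}

// Turn a client clock reading into a cutoff expressed in server time.
function serverAlignedCutoff(clientNowMs, offsetMs) {
  return new Date(clientNowMs + offsetMs);
}

var offset = estimateOffset(1000, 6050, 1100);
console.log(offset, serverAlignedCutoff(2000, offset).getTime());
```

The subscription would then pass serverAlignedCutoff(Date.now(), offset) instead of new Date(). The estimate is only as good as the assumption that network latency is symmetric.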
I need to manage the acquisition of many records per hour, about 1,000,000. Every second I need to get the last inserted value for every primary key. It works quite well with sharding. I was thinking of trying a capped collection that holds only the last record for every primary key. To do this, I currently make two separate inserts. Is there a way in MongoDB to create some kind of trigger that propagates an insert into one collection to another collection?
MongoDB does not have any support for triggers or similar behavior.
The only way to do this is to make it happen in your code. So the code that writes the first entry should also write the second.
People have definitely requested triggers. If they are necessary for your solution, please cast a vote on the feature request.
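Doing the second write in application code, as described above, could look roughly like this. It is a sketch under assumptions: history and latest stand for two MongoDB driver collections, and the function name and the reading fields (key, value, at) are hypothetical.

```javascript
// Hypothetical data-access function: every write to the history collection is
// mirrored into a "latest value per key" collection by the application itself.
function recordReading(history, latest, reading) {
  return Promise.all([
    // Full log of everything that was acquired (copy, so the caller's object is untouched).
    history.insertOne(Object.assign({}, reading)),
    // One document per key, overwritten with the newest value.
    latest.updateOne(
      { _id: reading.key },
      { $set: { value: reading.value, at: reading.at } },
      { upsert: true }
    )
  ]);
}
```

Reading the last value for every key then becomes a plain find on the small collection. Note that the two writes are independent, so a failure between them can leave the collections briefly out of step.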
I disagree with "triggers is needed". MongoDB was created to be very fast and to provide only basic functionality; that is the strength of this solution.
I think the best thing here is to create the triggers inside your application, as part of the data access layer.
I'm using ASP.net and an SQL database. I have a blog-like system where a number of comments are made against a post, and I want to display the number of those comments next to the post. To get that number I could either hold it in the post record and add/subtract when a comment is added or deleted, or I could use SQL to calculate the number of comments with a query each time a user hits the page. The latter seems to be a bad idea, as it's going to hit my SQL database harder; however, holding the number against the record feels like it could be error-prone. What do you think is best coding practice in this case?
Always start with a normalized database (your second option). Only denormalize if you have an absolute necessity for performance reasons. Designing it in the denormalized way (which is error-prone as you guessed) is premature optimization. With proper indexes it should be fine calculating the number on the fly.
I think the SQL statement should be fine. The other is duplication of data you already have. A count query should be quick.
Don't optimize prematurely. Use the simple solution and pagefault in optimizations only when they're needed.
I would query the database each time you want the information. I would revisit it later if you find that performance is lacking (optimize later). For the traffic most blog type applications will get, that should be sufficient.
Perhaps get the count back as part of the main thread query so as to limit the number of hits on the actual DB from the webserver. But I would always query the actual count and not try and keep it in a field, data will eventually get out of sync as that is reality.
To increase performance, you could keep a flag in the main table to indicate if the item has any comments but only use this as a 'hint' as to whether or not to perform an additional query to count and retrieve comments at a later time.
Imagine a photo gallery that returns 50 photos to rotate through. Each photo could have its own comments.
The initial page load would return a list of photos plus a flag indicating if a photo has comments.
When a photo is displayed, if the comments flag is set to True, your app would make an ajax request to count and fetch the comments for that photo.
If only 3 out of the 50 photos have comments, you just saved yourself 47 additional requests!
This does denormalize the data, but on a limited level.
Creating hints can really help improve performance for very busy sites.
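The photo gallery example can be sketched on the client side as follows. The hasComments field and the function name are hypothetical, and the sample data is constructed so that 3 of the 50 photos are flagged.

```javascript
// Hypothetical client-side check: only photos whose hint flag is set are
// worth a follow-up request to count and fetch comments.
function photosNeedingCommentFetch(photos) {
  return photos.filter(function (photo) {
    return photo.hasComments === true;
  });
}

// Build 50 sample photos; ids 0, 20 and 40 are flagged as having comments.
var photos = [];
for (var i = 0; i < 50; i++) {
  photos.push({ id: i, hasComments: i % 20 === 0 });
}

var toFetch = photosNeedingCommentFetch(photos);
var requestsSaved = photos.length - toFetch.length;
console.log(toFetch.length, requestsSaved);
```

Since the flag is only a hint, the follow-up request should still treat the counted result as the source of truth.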
Depending on how your data model looks: don't add the total post count to the main thread record, as it is error-prone. You should calculate the comment count when needed, based on the thread ID, IMHO.
Caching the pages and updating that cache as comments are added/removed would be a good option, along with the SQL count query, if you are that worried about the number of queries hitting the DB.
I usually use an indexed view for this kind of thing. This allows you to denormalize the data for quick retrieval, but there is no way for it to get out of sync. Folks will also not be confused and think the view is the master of the data. I have mostly used the standard sku of SS2K5, so I have to specify the (noexpand) hint to get it to actually use the index on the view (enterprise will do it automatically). So for standard sku, I always create a wrapper view that everyone hits so I know the hint is always in place.
Coding this on the web page, so hopefully no syntax errors ;)
create view postCount__
with schemabinding  -- required before a view can be indexed
as
    select
        threadId
        ,postCount = count_big(*)
    from dbo.thread  -- schemabinding needs two-part names; the dbo schema is assumed
    group by threadId
go
create unique clustered index postCount__xpk_threadid on postCount__(threadId)
go
create view postCount
as
    select
        threadId
        ,postCount = cast(postCount as int)
    from postCount__ with (noexpand)
go
So I use a nomenclature on the actual indexed view to let everyone know not to query it directly. Instead they look for the associated wrapper view that enforces the noexpand hint. Using an indexed view forces you to do count_big, so I often cast down to int in the wrapper view to be able to keep our asp.net code lazily using 32 bit ints. It would be better to omit the cast, but it hasn't been of any significant impact for me.
EDIT - I can tell you that forum software always denormalizes the post count to the thread table. It kills the DB to continually count the post count on every page view if you have an active forum. I love that mssql has indexed views so you can define the denormalization declaratively rather than maintain it yourself.