Performance of large collection in Meteor 1.0.X - meteor

There has been a LOT of development in the Meteor world, and as such it's getting hard to find answers that work for current versions due to the plethora of answers you find for old, outdated versions.
I have an app that has a LOT of data in a particular collection. By lots I mean somewhere between 10k-100k, and very potentially a lot more. Essentially it's log data, and I need to display the results in a table with no pagination (like a tail). In researching ways to optimize large collections I keep running into things like this that seem to be for older versions of Meteor.
So, as I see it my options are:
Use the fast-render plugin to display the page prior to the subscription (at least this is my understanding of how it works).
Use some sort of progressive publish function, where it loads a limited, more relevant set of data first, then progressively loads the remaining data by expanding the window/limit (not sure if this would cause a heavier load on the server, though; a rough sketch of what I mean follows this list). There seems to have been a "progressive publish" package, but it doesn't appear to be under active development any longer.
Optimize the lookups via indexing (How do you specify that when creating the collection???)
Profiling and optimizing the template further (not sure how).
Some other method I haven't thought of yet...
Some combination of all-the-above.
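To make option 2 concrete, the rough client-side sketch I have in mind looks something like this (the 'logTail' publication name and the step size are just placeholders I made up):

// Client: start with a small window, then widen it over time.
var LIMIT_STEP = 500;
Session.setDefault('logLimit', LIMIT_STEP);

Tracker.autorun(function () {
  // Re-runs (and re-subscribes) whenever the limit grows.
  Meteor.subscribe('logTail', Session.get('logLimit'));
});

// Called once the current window has loaded (or from a timer) to pull the next chunk.
function loadMoreLogs() {
  Session.set('logLimit', Session.get('logLimit') + LIMIT_STEP);
}

I don't know whether repeatedly re-running the subscription like this ends up cheaper or more expensive for the server than publishing everything once, which is part of what I'm asking.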
What is the proper approach by which to publish and render lots of data in this way?

I'm going to assume that "optimize" means reduced query time.
Always start with the biggest bang for your buck.
Unless you're publishing the entire collection, or querying only on _id, you'll want to create an index using _ensureIndex. You can get more info on this on the MongoDB website or by searching other questions. http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
Second, limit the fields to just the info you need, e.g. {fields: {a: 1, b: 1}}; there's a short sketch below. http://docs.meteor.com/#/full/fieldspecifiers
Third, don't sort.
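To illustrate the first two points together, here is a minimal server-side sketch (the Logs collection, the 'logTail' publication, the field names, and the 24-hour filter are assumptions for the example, not taken from the question):

if (Meteor.isServer) {
  Meteor.startup(function () {
    // Index the field the publication filters on so lookups don't scan the whole collection.
    Logs._ensureIndex({ createdAt: -1 });
  });

  Meteor.publish('logTail', function (limit) {
    check(limit, Number);
    return Logs.find(
      { createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) } }, // e.g. only recent entries
      {
        // Ship only the fields the table actually renders, and cap the document count.
        fields: { message: 1, level: 1, createdAt: 1 },
        limit: Math.min(limit, 5000)
      }
    );
  });
}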
If this still isn't good enough, make another question with schema & query details & the desired UI so we can better understand the reactivity and why you can't use some form of pagination.

Related

Static data storage on server-side

Why is some data on the server side still stored in DBC files rather than in the SQL DB? In particular, spells (spells.dbc). What for?
We have a lot of bugs in spells and it's very hard to understand what's wrong with a spell, and it's even harder to find the spell itself...
Spells, talents, achievements, etc. are mostly found in DBC files because that is the way Blizzard did it back in the day. It's true that in 2019 this is a pretty outdated way to work. Databases are getting stronger and more versatile, and having hard-coded data is proving hard to work with. Hell, the DBCs aren't really that heavy anyway, and the only reason we haven't made this change yet is that it's a task that takes a bit of time and is monotonous to do.
We are aware that Trinity core has already made this change but they have far more contributors than we do if that serves as an excuse!
Nonetheless, this is already in our to-do list if you check the issue tracker at the main repository.
While it's true that we can't really edit the DBC files themselves (we would lose all the changes whenever the files are re-extracted or lost), we can modify spells in a C++ file called SpellMgr.
There we have a function called SpellMgr::LoadDbcDataCorrections().
The main problem with making this change is that we have to modify the core to support it, and the function above already contains a lot of corrections. It would need intense testing to make sure nothing gets screwed up in the process.
There, by altering bits you can remove or add certain properties to the desired spells instead of touching the hard-coded DBC files.
If you want an example, in this link, I have changed an Archimonde spell to have no cast time.
NOTE:
In this line, the comment about damage can be misleading, but that's because I made a mistake and I haven't finished this pull request yet as of 18/04/2019.
The work has been started, notably by Kaev. I think at least 3 DBCs are now useless server-side (but probably still needed client-side; they are called DataBaseClient for a reason), like item.dbc.
Also, the original philosophy (for ALL cores, not just AC) was that we would not touch DBC because we don't do custom modifications, so there was no interest in having them server side.
But we wanted to change this and started to make them available directly in the DB, if you wish to help with that, it would be nice!
Why?
Because when emulation started, DBC fields were 90% unknown. So developers created a parser for them that just required a few code changes to support new fields as soon as their functionality was discovered.
Now that we've discovered 90% of the required DBC fields and we've also created some great DBC<->SQL conversion tools, it's just a matter of "effort".
The SQL conversion is useful to avoid using client data on the server (you can totally overwrite it if you don't want to go against the EULA) or simply to extend/customize it.
Here is the issue about the DBC->SQL conversion: https://github.com/azerothcore/azerothcore-wotlk/issues/584

Querying the Cache using Linq in asp.net

Search is the most used feature on our website, and the search query is the most CPU-intensive, complex and frequent query that executes on our db, causing heavy CPU usage on the db server. To reduce the load on the db we have been looking at various caching strategies. For now, we intend to use the ASP.NET Cache.
The idea is to have an in-memory db of the most frequently/recently created/accessed objects in the cache and then query the in-memory db using linq to come up with search results. My initial thought was to Cache a List of the Users and then query or modify this List using linq. But given the complexities of multiple threads accessing or trying to modify List I was looking at other options.
Which is when I thought that instead of caching a List, I could cache the individual User objects with their Id as the key and try to query the Cache. At http://msdn.microsoft.com/en-us/library/system.web.caching.cache.aspx I see that the Cache has an extension method AsQueryable, but I am not sure what this means. The Cache is a key-value store, so with AsQueryable will I be able to query the keys and get a set of User objects, or will I be able to query the User objects and get my desired result?
Before you start this you really need to have some measurability in place around it -- there is no way to figure out if your changes help or hurt without having some good, solid data to make that judgement on. Performance, especially performance at scale isn't something you can think or guess through. You have to know your way through it.
As for your solution, I think you might well make the problem worse, or at least create another problem here. Your database server is theoretically designed to handle arbitrary user queries across vast information sets efficiently. Linq is awesome but it is not really meant to be an ad-hoc search engine -- it doesn't have the sorts of indexing capabilities one really expects from search engines. Just because it can expose things as an IQueryable doesn't mean you should treat it that way. And even if you've got a way to efficiently search the cache, you've got another problem to get past -- how do you identify what is most frequently used? And how do you manage the ASP.NET cache to not start ejecting things when it gets low on memory?
You would probably be better served here by:
Starting with some good old fashioned database tuning -- why are your queries so slow and expensive? Are you missing an index somewhere?
Looking at caching the results page output, especially if your search URLs are GET-able as that is pretty easy to manage. This is a great short term solution if the site is melting.
Looking at building the search bits properly. Using LIKE %whatever% is not a proper search. Full-text indexes in your database are a good start. Something like Lucene.NET is probably better.
No, I cannot use AsQueryable to query the User objects and get the desired result I was looking for. So for now I will be using a static List, though I know I will have to change this sooner rather than later.

Best Practices for Passing Data Between Pages

The Problem
In the stack that we re-use between projects, we are putting a little bit too much data in the session for passing data between pages. This was good in theory because it prevents tampering, replay attacks, and so on, but it creates as many problems as it solves.
Session loss itself is an issue, although it's mostly handled by implementing Session State Server (or by using SQL Server). More importantly, it's tricky to make the back button work correctly, and it's also extra work to create a situation where a user can, say, open the same screen in three tabs to work on different records.
And that's just the tip of the iceberg.
There are workarounds for most of these issues, but as I grind away, all this friction gives me the feeling that passing data between pages using session is the wrong direction.
What I really want to do here is come up with a best practice that my shop can use all the time for passing data between pages, and then, for new apps, replace key parts of our stack that currently rely on Session.
It would also be nice if the final solution did not result in mountains of boilerplate plumbing code.
Proposed Solutions
Session
As mentioned above, leaning heavily on Session seems like a good idea, but it breaks the back button and causes some other problems.
There may be ways to get around all the problems, but it seems like a lot of extra work.
One thing that's very nice about using session is the fact that tampering is just not an issue. Compared to passing everything via the unencrypted QueryString, you end up writing much less guard code.
Cross-Page Posting
In truth I've barely considered this option. I have a problem with how tightly coupled it makes the pages -- if I start doing PreviousPage.FindControl("SomeTextBox"), that seems like a maintenance problem if I ever want to get to this page from another page that maybe does not have a control called SomeTextBox.
It seems limited in other ways as well. Maybe I want to get to the page via a link, for instance.
QueryString
I'm currently leaning towards this strategy, like in the olden days. But I probably want my QueryString to be encrypted to make it harder to tamper with, and I would like to handle the problem of replay attacks as well.
On 4 Guys from Rolla, there's an article about this.
However, it should be possible to create an HttpModule that takes care of all this and removes all the encryption sausage-making from the page. Sure enough, Mads Kristensen has an article where he released one. However, the comments make it sound like it has problems with extremely common scenarios.
Other Options
Of course this is not an exhaustive look at the options, but rather the main options I'm considering. This link contains a more complete list. The ones I didn't mention, such as Cookies and the Cache, are not appropriate for the purpose of passing data between pages.
In Closing...
So, how are you handling the problem of passing data between pages? What hidden gotchas did you have to work around, and are there any pre-existing tools around this that solve them all flawlessly? Do you feel like you've got a solution that you're completely happy with?
Thanks in advance!
Update: Just in case I'm not being clear enough, by 'passing data between pages' I'm talking about, for instance, passing a CustomerID key from a CustomerSearch.aspx page to Customers.aspx, where the Customer will be opened and editing can occur.
First, the problems with which you are dealing relate to handling state in a state-less environment. The struggles you are having are not new and it is probably one of the things that makes web development harder than windows development or the development of an executable.
With respect to web development, you have five choices, as far as I'm aware, for handling user-specific state which can all be used in combination with each other. You will find that no one solution works for everything. Instead, you need to determine when to use each solution:
Query string - Query strings are good for passing pointers to data (e.g. primary key values) or state values. Query strings by themselves should not be assumed to be secure even if encrypted because of replay. In addition, some browsers have a limit on the length of the url. However, query strings have some advantages such as that they can be bookmarked and emailed to people and are inherently stateless if not used with anything else.
Cookies - Cookies are good for storing very tiny amounts of information for a particular user. The problem is that cookies also have a size limitation, after which the data is simply truncated, so you have to be careful with putting custom data in a cookie. In addition, users can kill cookies or stop their use (although that would prevent use of standard Session as well). Similar to query strings, cookies are better, IMO, for pointers to data than for the data itself unless the data is tiny.
Form data - Form data can carry quite a bit of information, however at the cost of post times and in some cases reload times. ASP.NET's ViewState uses hidden form variables to maintain information. Passing data between pages using something like ViewState has the advantage of working more nicely with the back button, but can easily create ginormous pages which slow down the experience for the user. In general, the ASP.NET model does not work on cross-page posting (although it is possible) but instead works on posts back to the same page and from there navigating to the next page.
Session - Session is good for information that relates to a process with which the user is progressing or for general settings. You can store quite a bit of information into session at the cost of server memory or load times from the databases. Conceptually, Session works by loading the entire wad of data for the user all at once either from memory or from a state server. That means that if you have a very large set of data you probably do not want to put it into session. Session can create some back button problems which must be weighed against what the user is actually trying to accomplish. In general you will find that the back button can be the bane of the web developer.
Database - The last solution (which again can be used in combination with others) is that you store the information in the database in its appropriate schema with a column that indicates the state of the item. For example, if you were handling the creation of an order, you could store the order in the Order table with a "state" column that determines whether it was a real order or not. You would store the order identifier in the query string or session. The web site would continue to write data into the table to update the various parts and child items until eventually the user is able to declare that they are done and the order's state is marked as being a real order. This can complicate reports and queries in that they all need to differentiate "real" items from ones that are in process.
One of the items mentioned in your later link was Application Cache. I wouldn't consider this to be user-specific since it is application wide. (It can obviously be shoe-horned into being user-specific but I wouldn't recommend that either). I've never played with storing data in the HttpContext outside of passing it to a handler or module but I'd be skeptical that it was any different than the above mentioned solutions.
In general, there is no one solution to rule them all. The best approach is to assume on each page that the user could have navigated to that page from anywhere (as opposed to assuming they got there by using a link on another page). If you do that, back button issues become easier to handle (although still a pain). In my development, I use the first four extensively and on occasion resort to the last solution when the need calls for it.
Alright, so I want to preface my answer with this; Thomas clearly has the most accurate and comprehensive answer so far for people starting fresh. This answer isn't in the same vein at all. My answer is coming from a "business developer's" standpoint. As we all know too well; sometimes it's just not feasible to spend money re-writing something that already exists and "works"... at least not all in one shot. Sometimes it's best to implement a solution which will let you migrate to a better alternative over time.
The only thing I'd say Thomas is missing is client-side JavaScript state. Where I work we've found customers are coming to expect "Web 2.0"-type applications more and more. We've also found these sorts of applications typically result in much higher user satisfaction. With a little practice, and the help of some really great JavaScript libraries like jQuery (we've even started using GWT and found it to be AWESOME), communicating with JSON-based REST services implemented in WCF can be trivial. This approach also provides a very nice way to start moving towards an SOA-based architecture and a clean separation of UI and business logic.
But I digress.
It sounds to me as though you already have an application, and you've already stretched the limits of ASP.NET's built-in session state management. So... here's my suggestion (assuming you've already tried ASP.NET's out-of-process session management, which scales significantly better than the in-process/on-box session management, and it sounds like you have because you mentioned it): NCache.
NCache provides you with a "drop-in" replacement for ASP.NET's session management options. It's super easy to implement, and could "band-aid" your application more than well enough to get you through - without any significant investment in refactoring your existing codebase immediately.
You can use the extra time and money to start reducing your technical debt by focusing new development on things with immediate business-value - using a new approach (such as any of the alternatives offered in the other answers, or mine).
Just my thoughts.
Several months later, I thought I would update this question with the technique I ended up going with, since it has worked out so well.
After playing with more involved session state handling (which resulted in a lot of broken back buttons and so on) I ended up rolling my own code to handle encrypted QueryStrings. It's been a huge win -- all of my problem scenarios (back button, multiple tabs open at the same time, lost session state, etc) are solved and the complexity is minimal since the usage is very familiar.
This is still not a magic bullet for everything but I think it's good for about 90% of the scenarios you run into.
Details
I built a class called CorePage that inherits from Page. It has methods called SecureRequest and SecureRedirect.
So you might call:
SecureRedirect(String.Format("Orders.aspx?ClientID={0}&OrderID={1}", ClientID, OrderID))
CorePage parses out the QueryString and encrypts it into a QueryString variable called CoreSecure. So the actual request looks like this:
Orders.aspx?CoreSecure=1IHXaPzUCYrdmWPkkkuThEes%2fIs4l6grKaznFGAeDDI%3d
If available, the currently logged in UserID is added to the encryption key, so replay attacks are not as much of a problem.
From there, you can call:
X = SecureRequest("ClientID")
Conclusion
Everything works seamlessly, using familiar syntax.
Over the last several months I've also adapted this code to work with edge cases, such as hyperlinks that trigger a download - sometimes you need to generate a hyperlink on the client that has a secure QueryString. That works really well.
Let me know if you would like to see this code and I will put it up somewhere.
One last thought: it's weird to accept my own answer over some of the very thoughtful posts other people put on here, but this really does seem to be the ultimate answer to my problem. Thanks to everyone who helped get me there.
After going through all the above scenarios and answers and this link (Data passing methods), my final advice would be:
COOKIES for:
ENCRYPT[userId's]
ENCRYPT[productId]
ENCRYPT[xyzIds..]
ENCRYPT[etc..]
DATABASE for:
datasets BY COOKIE ID
datatables BY COOKIE ID
all other large chunks BY COOKIE ID
My advice also depends on the statistics below and the details at this link (Data passing methods):
I would never do this. I have never had any issues storing all session data in the database, loading it based on the user's cookie. It's a session as far as anything is concerned, but I maintain control over it. Don't give up control of your session data to your web server...
With a little work, you can support sub sessions, and allow multi-tasking in different tabs/windows.
As a starting point, I find that critical data elements, such as a Customer ID, are best put into the query string for processing. You can easily track/filter bad data coming off of these elements, and it also allows for some integration with e-mail or other related sites/applications.
In a previous application, the only way to view an employee or a request record involving them was to log into the application, do a search for the employee or do a search for recent records to find the record in question. This became problematic and a big time sink when somebody from a related department needed to do a simple view on records for auditing purposes.
In the rewrite, I made both the employee Ids and request Ids available through basic URLs of "ViewEmployee.aspx?Id=XXX" and "ViewRequest.aspx?Id=XXX". The application was set up to A) filter out bad Ids and B) authenticate and authorize the user before allowing them onto these pages. What this allowed the primary application users to do was to send simple e-mails to the auditors with a URL in the e-mail. When they were in a big hurry, during their bulk processing time, they were able to simply click down a list of URLs and do the appropriate processing.
Other session-related data, such as modification dates and maintaining the "state" of the user's interaction with the application, gets a little more complex, but hopefully this provides a starting point for you.

Bugzilla: How to get an rss feed for bug comments?

I can see where to get an rss feed for the BUG LIST, however I would like to get rss updates for modifications to current bugs if possible.
This is quite high up when searching via Google for it, so I'm adding a bit of advertisement here:
As Bugzilla still doesn't support this I wrote a small web service supporting exactly this. You can find its source code here and a running instance here.
What you're asking for is the subject of this enhancement bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=256718
but no one seems to be working on it.
My first guess is that the way to do it is to add a template somewhere like template/en/default/bug/show.atom.tmpl with whatever you need. Put it in custom or an extension as needed.
If you're interested in working on it or helping someone with it, visit channel #mozwebtools on irc.mozilla.org.
Not a perfect solution, but with the resolution of bug #255606, Bugzilla now allows listing all bugs by running a search with no criteria, and you can then get the results of the search in Atom format using the link at the bottom of the list.
From the release notes for 4.2:
Configuration: A new parameter search_allow_no_criteria has been added (default: on) which allows admins to forbid queries with no criteria. This is particularly useful for large installations with several tens of thousands of bugs where returning all bugs doesn't make sense and would have a performance impact on the database.

ajaxified auto suggest

I have to build a search module with an Auto Suggest feature in ASP.NET.
The search criterion is Training Name, and there is a table in the database that stores trainings. The table could hold as many as 30,000 trainings, so I have to be very careful in selecting an approach, keeping performance in mind.
There could be about 3000 users logged into the system simultaneously. When a user starts typing a training name, the system should autosuggest.
The approaches that came to my mind are as follows:
Cache object - There would be a database hit after the user types 3 characters (e.g. "saf"), and the system would search the activity table for all trainings starting with "saf" and cache them. Subsequent requests would go through this cache.
But the problem with this approach would be that if there are 3000 concurrent users using the system and they all search for different combinations of 3 different letters, the cache would just blow up.
Client side caching - Did not think much on this. The only drawback I see here is that we might have to purge the temporary internet folder periodically (a rough sketch of what I'm picturing follows this list).
Using Session - I thought to rule this out completely as I thought it would hit performance.
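For the client-side caching idea, something along these lines is what I'm picturing (using jQuery just for brevity; the /AutoSuggestHandler.ashx URL and the 'q' parameter are placeholders): keep an in-memory map of prefix -> results so repeated keystrokes for the same prefix never hit the server again.

var suggestionCache = {}; // prefix -> array of training names already fetched

function getSuggestions(prefix, callback) {
    if (suggestionCache[prefix]) {
        callback(suggestionCache[prefix]); // served from memory, no server round-trip
        return;
    }
    $.getJSON('/AutoSuggestHandler.ashx', { q: prefix }, function (results) {
        suggestionCache[prefix] = results; // remember for the rest of the page's lifetime
        callback(results);
    });
}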
Can you please suggest the best or any other different approach I can take here. I am looking for all information/ideas that you have on this.
Thank you so much
Deepa.
My favourite jQuery plug-in for doing that (if you intend to use jQuery) is Flexbox.
It has a really impressive list of features.
You could use the jQuery Auto Complete plugin, which has caching features built in.
$(document).ready(function () {
    // Attach the autocomplete plugin to the search box and point it at a generic handler.
    $(".landingpage").autocomplete('/AutoSuggestHandler.ashx', {
        minChars: 1,     // start suggesting after the first character
        matchSubset: 1,  // reuse cached results when the user keeps typing
        autoFill: false,
        delay: 10,
        scroll: false
    }).result(OnResultSelected);
});
Furthermore, you could specify output caching on the generic handler to accommodate caching across users.
I think your first approach will work.
Make sure there is an index on the field - you probably won't need to index the whole field. This should give the database a decent boost. You may need to look at full-text indexing depending on how your search works, or even use an external library like Lucene for the index if performance is an issue.
Cache the object, or even the resulting xml/json from the queries to improve performance.
You should also set the http headers so that browsers cache the xml/json as well.
Your posting really contains two questions:
How can I get autocomplete on my webpage?
I am concerned about performance due to a large number of queries hitting my database at the same time.
My answers...
1: We've found the ASP.NET AJAX AutoComplete Extender works well on all modern browsers, provides a slick user experience and is pretty easy to implement.
In your web application you need to create a web service that has a method with a specific signature (covered in the documentation linked to above).
2: Have you proven that you actually have a performance bottleneck with this part of your project? I'd recommend setting up a test harness and hitting your database with a large number of autocomplete queries to see how much it can take. Be wary of premature optimization.
