Search is the most used feature on our website, and the search query is the most CPU-intensive, complex and frequent query that executes on our db, causing heavy CPU usage on the db server. To reduce the load on the db we have been looking at various caching strategies. For now, we intend to use the ASP.NET Cache.
The idea is to keep an in-memory db of the most frequently/recently created/accessed objects in the cache and then query that in-memory db using LINQ to come up with search results. My initial thought was to cache a List of the Users and then query or modify this List using LINQ, but given the complexities of multiple threads accessing or trying to modify the same List, I was looking at other options.
Which is when I thought that, instead of caching a List, I could cache the individual User objects with their Ids as the keys and try to query the Cache. At http://msdn.microsoft.com/en-us/library/system.web.caching.cache.aspx I see that the Cache has an AsQueryable extension method, but I am not sure what this means. The Cache is a key/value store, so with AsQueryable will I be able to query the keys and get a set of User objects, or will I be able to query the User objects themselves and get my desired result?
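For reference, this is roughly what querying the Cache directly with LINQ would look like. The User class and its properties here are placeholders for illustration only; the non-generic Cache enumerates as DictionaryEntry items, so the values have to be cast back out:

using System.Collections;
using System.Linq;
using System.Web;

// Hypothetical entity, for illustration only.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CacheSearchSketch
{
    // Enumerating the non-generic Cache yields DictionaryEntry items,
    // so the cached values are filtered back down to User objects.
    public static IQueryable<User> CachedUsers()
    {
        return HttpContext.Current.Cache
            .Cast<DictionaryEntry>()
            .Select(e => e.Value)
            .OfType<User>()
            .AsQueryable();
    }

    public static IQueryable<User> Search(string term)
    {
        // Plain LINQ to Objects: every call walks all cached entries;
        // there is no index behind this query.
        return CachedUsers().Where(u => u.Name.Contains(term));
    }
}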
Before you start this you really need to have some measurability in place around it -- there is no way to figure out whether your changes help or hurt without good, solid data to make that judgement on. Performance, especially performance at scale, isn't something you can think or guess your way through. You have to measure your way through it.
As for your solution, I think you might well make the problem worse, or at least create another problem here. Your database server is designed precisely to handle arbitrary user queries across vast information sets efficiently. LINQ is awesome, but it is not really meant to be an ad-hoc search engine -- it doesn't have the sort of indexing capabilities one really expects from search engines. Just because something can be exposed as an IQueryable doesn't mean you should treat it that way. And even if you've got a way to efficiently search the cache, you've got other problems to get past -- how do you identify what is most frequently used? And how do you keep the ASP.NET cache from ejecting things when it gets low on memory?
You would probably be better served here by:
Starting with some good old-fashioned database tuning -- why are your queries so slow and expensive? Are you missing an index somewhere?
Looking at caching the results page output, especially if your search URLs are GET-able, as that is pretty easy to manage. This is a great short-term solution if the site is melting (see the sketch after this list).
Looking at building the search bits properly. Using LIKE '%whatever%' is not a proper search. Full-text indexes in your database are a good start; something like lucene.net is probably better.
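As a rough illustration of the output-caching suggestion above, here is what programmatic output caching on a WebForms results page might look like; the page class and the "q" query-string parameter are assumptions:

using System;
using System.Web;
using System.Web.UI;

public partial class SearchResults : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Equivalent to an OutputCache directive: keep the rendered page
        // in the server cache for five minutes, one cached copy per value
        // of the (assumed) "q" query-string parameter.
        Response.Cache.SetCacheability(HttpCacheability.Server);
        Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(5));
        Response.Cache.SetValidUntilExpires(true);
        Response.Cache.VaryByParams["q"] = true;

        // ... run the search and bind the results as usual ...
    }
}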
No, it turns out I cannot use AsQueryable to query the User objects and get the desired result I was looking for. So for the time being I will be using a static List, though I know I will have to change that sooner rather than later.
There has been a LOT of development in the Meteor world, and as such it's getting hard to find answers that work for current versions, given the plethora of answers out there for old, outdated versions.
I have an app that has a LOT of data in a particular collection. By a lot I mean somewhere between 10k and 100k documents, and potentially far more. Essentially it's log data, and I need to display the results in a table with no pagination (like a tail). In researching ways to optimize large collections I keep running into things like this that seem to be for older versions of Meteor.
So, as I see it my options are:
Use the fast-render plugin to display the page prior to the subscription (at least this is my understanding of how it works).
Use some sort of progressive publish function, where it loads a limited, more relevant slice of the data first, then progressively loads the rest by expanding the window/limit (not sure if this would cause a heavier load on the server, though). There seems to have been a "progressive publish" plugin, but it no longer appears to be under active development.
Optimize the lookups via indexing (how do you specify that when creating the collection?)
Profile and optimize the template further (not sure how).
Some other method I haven't thought of yet...
Some combination of all of the above.
What is the proper approach by which to publish and render lots of data in this way?
I'm going to assume that "optimize" means reduced query time.
Always start with the biggest bang for your buck.
Unless you're publishing the entire collection, or querying only on the _id, you will want to create an index using _ensureIndex. Get more info on this on the MongoDB website or by searching other questions. http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
Second, limit the fields to just the info you need, e.g. {fields: {a:1, b:1}}. http://docs.meteor.com/#/full/fieldspecifiers
Third, don't sort.
If this still isn't good enough, make another question with schema & query details & the desired UI so we can better understand the reactivity and why you can't use some form of pagination.
I was just learning how to use the LINQ/EDM combination to retrieve and update data in a simple user-thread-and-comment webapp, as part of evaluating it.
When I turn on SQL Profiler, I rarely see a SQL statement executed by my app.
I'm starting to really like how well it keeps things cached, because as soon as I add new data, it magically updates itself while I'm blinking.
But is that something I should be scared of?
My concern is what happens when I use this to build a webapp with some real traffic (whatever hit count makes this approach start to matter).
Should I keep a single context object at app-level, so that different sessions can benefit from each other's cache entries?
Or should I do the create-and-release on each page submission?
I know this sounds like an open-ended question, but I really do have one concrete question: how does the Entity Framework cache its data when I query it with LINQ?
On the ObjectContext question: you should use a lifetime of one per page-request cycle, or smaller. It's designed for a unit of work, not for the application lifetime. Search SO for "ObjectContext lifetime" or "DataContext lifetime" and you'll see this is a common question.
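A minimal sketch of that per-unit-of-work lifetime, assuming a generated ObjectContext called MyEntities and a ForumThread entity (both names are hypothetical placeholders):

using System.Linq;

public class ForumThreadRepository
{
    // One context per unit of work: create it, use it, dispose it.
    // Never keep a single context alive for the whole application.
    public ForumThread GetThread(int id)
    {
        using (var context = new MyEntities()) // MyEntities is hypothetical
        {
            return context.ForumThreads.SingleOrDefault(t => t.Id == id);
        }
    }
}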
In our application, many pages include an "update", and when we update a table we also update unnecessary columns that don't change.
I want to know whether there is a way to avoid these unnecessary column updates. We use stored procedures in .NET 2003. In the following link I found a solution, but it is not for stored procedures.
http://blogs.msdn.com/alexj/archive/2009/04/25/tip-15-how-to-avoid-loading-unnecessary-properties.aspx
Thanks
You can really only accomplish this with a good ORM tool that generates the update query for you. It will typically look at what changed and generate the query for only the columns that changed.
If you're using a stored procedure then all of the column values get sent over to the database anyway when you call it, so there is nothing to save there. The SP will probably just execute a run-of-the-mill UPDATE statement, and the RDBMS takes over from there. It won't physically change the data on disk if it's not different; it's smart enough for that.
So my answer, in short: don't worry about it. It's not really a big deal, it would require drastic changes to get what you want, and you won't even see performance benefits.
When I was working at a financial software company, performance was vital. Some tables had hundreds of columns, and the update statements were costly. We created our own ORM layer (in java) which included an object cache. When we generated the update statement, we compared the current values of every field to the values as they were on load and only updated the changed fields.
Our db was SQL Server. I do not remember the exact performance improvement, but it was substantial and worth the investment. We also did bulk inserts and updates where possible.
I believe that Hibernate and the other big ORMs all do this sort of thing for you, if you do not want to write one yourself.
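For illustration, here is a rough C# sketch of the changed-columns-only idea described above (their layer was in Java; the names and helpers here are hypothetical, and the table/column names are assumed to come from trusted metadata, not user input):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class DirtyColumnUpdater
{
    // original: column values as loaded; current: column values now.
    // Builds an UPDATE that touches only the columns that actually changed.
    // (Null/DBNull handling omitted for brevity.)
    public static SqlCommand BuildUpdate(
        string table, string keyColumn, object keyValue,
        IDictionary<string, object> original,
        IDictionary<string, object> current)
    {
        var changed = current.Where(kv => !Equals(kv.Value, original[kv.Key]))
                             .ToList();
        if (changed.Count == 0)
            return null; // nothing changed, skip the round trip entirely

        var setClause = string.Join(", ",
            changed.Select(kv => kv.Key + " = @" + kv.Key));
        var cmd = new SqlCommand(
            "UPDATE " + table + " SET " + setClause +
            " WHERE " + keyColumn + " = @key");
        foreach (var kv in changed)
            cmd.Parameters.AddWithValue("@" + kv.Key, kv.Value);
        cmd.Parameters.AddWithValue("@key", keyValue);
        return cmd;
    }
}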
I'm creating an ASP.NET website that displays large amounts of data. The data is served to me through a data access layer. From the data I'm getting, I'm building up large DataTables and then displaying these using either GridViews or dynamically created web controls.
The problem I'm finding is that the website is slow when a lot of data is passed to it. I've read that data readers are the way to go, but I can't use a DataReader directly against the SQL table because I have to go through the data access layer.
I also don't have the option of partially filling the DataTable, as I need to apply a lot of sorting to the data to display what I need.
Any suggestions for ways to speed up the DataTables, or perhaps for something else that's more efficient?
Since you are “.. building up large data tables and then displaying these using either gridview's or dynamically created web controls”, the network can be a bottleneck. See the answers to the similar SO question that may be helpful.
Are you absolutely certain that the bottleneck is in the Web Application?
The first thing I would do is take the SQL query I'd guess is the longest-running one on a slow page and see how it runs in the query browser.
If it's slow, work on optimizing that.
Pulling 'large' amounts of data into a web application and doing sorting/filtering there is always going to be slow, depending on your definition of 'large'. If you can apply any sorting/filtering on the database server before you pull the data into your web application, that should speed things up. You say you don't have that option, but sorting is something database servers are made for; are you sure you can't make this work?
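If the data access layer can expose an IQueryable<T> (for example from LINQ to SQL or Entity Framework), the sorting and paging can be pushed down to the database so only one page of rows ever reaches the web tier. A sketch, with a hypothetical Order entity:

using System;
using System.Collections.Generic;
using System.Linq;

public class Order // hypothetical example entity
{
    public int Id { get; set; }
    public DateTime CreatedOn { get; set; }
}

public static class OrderQueries
{
    // OrderBy/Skip/Take are translated to SQL and executed on the
    // database server; only pageSize rows are materialised here.
    public static IList<Order> GetPage(IQueryable<Order> orders,
                                       int pageIndex, int pageSize)
    {
        return orders
            .OrderByDescending(o => o.CreatedOn)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}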
You can use a distributed cache to cache your data, such as memcached (http://www.danga.com/memcached/) or Microsoft's Velocity distributed cache (http://www.microsoft.com/downloads/details.aspx?FamilyId=B24C3708-EEFF-4055-A867-19B5851E7CD2&displaylang=en).
The first thing you will want to do is pinpoint exactly which part of the process is slow. It might not be where you think it is. Do code profiling, or time different parts of the code, to determine exactly how much time each piece consumes during a request. In our case we found that the data layer (executing readers, populating object models) was really fast (with a couple of exceptions that were taken care of by indexes in the database), while some JavaScript on the client was really slow.
So, start with measuring, then decide where to optimize.
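As a simple sketch of that kind of measurement, you can wrap suspect pieces of a request in a Stopwatch and log the result:

using System;
using System.Diagnostics;

public static class Timing
{
    // Wrap a piece of work and log how long it took, so you can see
    // whether the time goes to the data layer, rendering, or elsewhere.
    public static T Measure<T>(string label, Func<T> work)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            return work();
        }
        finally
        {
            sw.Stop();
            Trace.WriteLine(label + ": " + sw.ElapsedMilliseconds + " ms");
        }
    }
}

Usage would be something like var table = Timing.Measure("LoadOrders", () => dataLayer.GetOrders()); where dataLayer and GetOrders stand in for whatever your own code calls.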
I have a search module with an Auto Suggest feature to build in ASP.NET.
The search criterion is Training Name, and there is a table in the database that stores trainings. It could hold as many as 30,000 trainings, so I have to be very careful in selecting the approach, keeping performance in mind.
There could be about 3,000 users logged into the system simultaneously. When a user starts typing a training name, the system should auto-suggest.
The approaches that came to my mind were as follows:
Cache object - There would be a database hit after the user types 3 characters (e.g. "saf"); the system would search the activity table for all trainings starting with "saf" and cache them. Subsequent requests would go through this cache.
But the problem with this approach is that if there are 3,000 concurrent users and they all search for different combinations of three letters, the cache would just blow up.
Client-side caching - I did not think much about this. The only drawback I see here is that we might have to purge the Temporary Internet Files folder periodically.
Using Session - I ruled this out completely, as I thought it would hurt performance.
Can you please suggest the best approach, or any other approach I could take here? I am looking for all the information/ideas you have on this.
Thank you so much
Deepa.
My favourite jQuery plug-in for this (if you intend to use jQuery) is Flexbox.
It has a really impressive list of features.
You could use the jQuery Auto Complete plugin, which has caching features built in.
$(document).ready(function()
{
    $(".landingpage").autocomplete('/AutoSuggestHandler.ashx',
    {
        minChars: 1,
        matchSubset: 1,
        autoFill: false,
        delay: 10,
        scroll: false
    }).result(OnResultSelected);
});
Furthermore, you could specify output caching on the generic handler, to accommodate the need for caching across users.
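As a sketch only, the handler behind '/AutoSuggestHandler.ashx' could combine a server-side cache keyed on the typed prefix with shared output-cache headers. The "q" parameter name and the lookup method are assumptions:

using System;
using System.Web;
using System.Web.Caching;

public class AutoSuggestHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string prefix = (context.Request.QueryString["q"] ?? "").Trim();

        // Serve repeated prefixes from the server-side cache so each
        // combination of letters hits the database only once.
        string cacheKey = "suggest:" + prefix.ToLowerInvariant();
        string payload = context.Cache[cacheKey] as string;
        if (payload == null)
        {
            payload = LookUpTrainings(prefix); // hypothetical data-access call
            context.Cache.Insert(cacheKey, payload, null,
                DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);
        }

        // Let downstream caches share the response across users.
        context.Response.Cache.SetCacheability(HttpCacheability.Public);
        context.Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(10));
        context.Response.Cache.VaryByParams["q"] = true;

        context.Response.ContentType = "text/plain";
        context.Response.Write(payload);
    }

    public bool IsReusable { get { return true; } }

    private static string LookUpTrainings(string prefix)
    {
        // Placeholder: query the trainings table for names starting
        // with the prefix and format one suggestion per line.
        return string.Empty;
    }
}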
I think your first approach will work.
Make sure there is an index on the field - you probably won't need to index the whole field. This should give the database a decent boost. You may need to look at full-text indexing depending on how your search works, or even use an external library like Lucene for the index if performance is an issue.
Cache the objects, or even the resulting XML/JSON from the queries, to improve performance.
You should also set the HTTP headers so that browsers cache the XML/JSON as well.
Your posting really contains two questions:
How can I get autocomplete on my webpage?
I am concerned about performance due to a large number of queries hitting my database at the same time.
My answers...
1: We've found the ASP.NET AJAX AutoComplete Extender works well on all modern browsers, provides a slick user experience and is pretty easy to implement.
In your web application you need to create a web service that has a method with a specific signature (covered in the documentation linked to above).
2: Have you proven that you actually have a performance bottleneck in this part of your project? I'd recommend setting up a test harness and hitting your database with a large number of autocomplete queries to see how much it can take. Be wary of premature optimization.
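A crude harness for that kind of test might just fire a batch of concurrent requests at the suggestion endpoint and time them; the URL, the "q" parameter and the prefixes below are assumptions:

using System;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;

public static class AutoSuggestLoadTest
{
    public static void Main()
    {
        string[] prefixes = { "saf", "net", "pro", "man", "dev" };
        var sw = Stopwatch.StartNew();

        // 500 concurrent requests cycling through a few sample prefixes.
        Parallel.For(0, 500, i =>
        {
            string url = "http://localhost/AutoSuggestHandler.ashx?q="
                         + prefixes[i % prefixes.Length];
            using (var client = new WebClient())
            {
                client.DownloadString(url);
            }
        });

        sw.Stop();
        Console.WriteLine("500 requests in {0} ms", sw.ElapsedMilliseconds);
    }
}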