I'm generating pages based on an SQL query.
This is the query:
CREATE PROCEDURE sp_searchUsersByFirstLetter
@searchQuery nvarchar(1)
AS
BEGIN
SET NOCOUNT ON;
SELECT UserName
FROM Users JOIN aspnet_Users asp ON Users.UserId = asp.UserId
WHERE (LoweredUserName LIKE @searchQuery + '%')
END
I can call this procedure for each letter in the alphabet, and get all the users that start with that letter. Then, I put these users into a list on one of my pages.
My question is this: would it be better to cache the list of users on my web server, rather than query the database each time? Like this:
HttpRuntime.Cache.Insert("users", listOfUsersReturnedFromQuery, null, DateTime.Now.AddHours(1), System.Web.Caching.Cache.NoSlidingExpiration);
It's OK for me if the list of users is an hour out of date. Will this be more efficient than querying the database each time?
Using a cache is best reserved for situations where your query meets the following constraints:
The data is not time critical, i.e. make sure a cache hit won't break your application by causing your code to miss a recent update of the data.
The data isn't sequenced, i.e. A, B, C, D, E are cached, F is inserted by another user, your user inserts G and hits the cache, resulting in ABCDEG instead of ABCDEFG.
The data doesn't change much.
The data is queried and re-used frequently.
Size isn't really a factor unless it's going to really tax your RAM.
I have found that one of the best tables to cache is a settings table, where the data is practically static, gets queried on nearly every page request, and changes don't have to be immediate.
The best thing for you to do would be to measure which queries are run most often, then pick out those that tax the database server most heavily. Out of those, cache anything you can afford to. You should also take a look at tweaking maximum cached object ages. If you're performing a query 100 times a second, you can cut that rate by 99% simply by caching the result for 1 second, which negates the update delay problem for most practical situations.
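To make the "cache it for 1 second" arithmetic concrete, here is a minimal sketch in Python (used only as a neutral illustration; the same idea maps onto HttpRuntime.Cache with an absolute expiration). The TtlCache class, the simulate helper and the fake loader are all hypothetical names, and the clock is simulated so the numbers are deterministic.

```python
class TtlCache:
    """Holds one loader result and reloads it once ttl_ms has elapsed."""

    def __init__(self, loader, ttl_ms, clock):
        self._loader = loader
        self._ttl_ms = ttl_ms
        self._clock = clock          # callable returning the current time in ms
        self._value = None
        self._expires_at = None      # None means "nothing cached yet"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()            # the only place the "database" is hit
            self._expires_at = now + self._ttl_ms
        return self._value


def simulate(requests_per_second, seconds, ttl_ms):
    """Replay evenly spaced requests and count how often the loader runs."""
    calls = {"count": 0}
    now = {"ms": 0}

    def loader():
        calls["count"] += 1
        return ["adam", "alice"]     # stand-in for the query result

    cache = TtlCache(loader, ttl_ms, clock=lambda: now["ms"])
    total = requests_per_second * seconds
    for i in range(total):
        now["ms"] = i * 1000 // requests_per_second
        cache.get()
    return calls["count"], total


# 100 requests per second for 10 seconds, cached for 1 second.
db_hits, requests = simulate(100, 10, 1000)
print(db_hits, requests)  # 10 1000
```

Under these assumptions 1000 requests cause 10 loads, which is the 99% reduction mentioned above, and no reader ever sees data more than a second old.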
If you have several servers, in-memory caching isn't so good, because it will take memory on each server and in each w3wp process of every server.
It will also be hard to keep the data consistent.
I would advise choosing from:
basic output cache (assuming you are using MVC, this is zero effort and a good improvement)
DB cache using a smaller pre-calculated table that maps each input string to its 10 possible results
It really depends. Do you bottleneck at your database server (I would hope that answer is no)? If you are hitting the database 26 times, that is nothing compared to what typically happens. You should be considering caching data in a Dataset or some other offline model if you are hitting the database hundreds of thousands of times.
So I would say, no. You should be fine with your round trips to the database.
But there is no replacement for testing. That'll tell you for sure.
Considering that each DB call is always expensive in terms of network and DB load, I would prefer to avoid such extra operations and cache the items even if they are requested only a few times per hour.
I see only one case against it: when the list of users consumes many megabytes of memory.
Well, caching the data and reading it back is fastest, but it also depends on the data size. If there is a large amount of data, it will cause performance issues.
So it mostly depends on your requirements.
I would suggest using paging, or a mixed mode: load half of the users into the cache, then load the rest when required.
Related
I want to implement a views counter like most forums, Youtube and several others have. So every time a user reads an article, that is stored and remembered. I also want to know who looked at the article.
My question is: How do you implement this efficiently? What is the best practice?
One way would be to call a stored procedure for every view, but that would result in a lot of unneeded calls to the database.
Another way would be to store this to some global application object, and then store in DB every 5 minutes or so (and can you even do that in a good way?)
What's the best way to do this?
Database operations are surprisingly cheap and really are not worth worrying about. In the event that a DB operation was even marginally expensive then you can always delegate the blocking operation to a new thread thus freeing-up your page-generation thread (you can trivially do this for UPDATE and INSERT operations that return nothing from the database - they are inconsequential).
Sprocs aren't really in-fashion right now - the performance advantage they might have had from pre-computed execution plans is almost eliminated because modern servers cache plans from all previous queries, and for trivial SELECT, INSERT, and UPDATEs you begin to suffer from increased code complexity. There's nothing wrong with inline SQL commands now.
Anyway, back on-topic and in summary: your assumptions are wrong. There is nothing wrong with running UPDATE Pages SET ViewCount = ViewCount + 1 WHERE PageId = @pageId on every page-view. There is also nothing wrong with doing this either: INSERT INTO UserPageviews (UserId, PageId, DateTime) VALUES ( @userId, @pageId, NOW() ). Both operations are very cheap and will execute in under 2-3 milliseconds on even an old and aged database server.
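As a rough sketch of those two statements running on every page view, here is a small Python example against an in-memory SQLite database (an assumption made purely so the example is runnable; the original context is presumably SQL Server). The table and column names follow the statements above, except that the timestamp column is called ViewedAt here, and record_view is a hypothetical helper.

```python
import sqlite3


def create_schema(conn):
    # Narrow tables, as assumed for this sketch.
    conn.executescript("""
        CREATE TABLE Pages (
            PageId    INTEGER PRIMARY KEY,
            ViewCount INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE UserPageviews (
            UserId   INTEGER NOT NULL,
            PageId   INTEGER NOT NULL,
            ViewedAt TEXT    NOT NULL
        );
    """)


def record_view(conn, user_id, page_id):
    """Bump the counter and log who viewed the page, in one transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "UPDATE Pages SET ViewCount = ViewCount + 1 WHERE PageId = ?",
            (page_id,),
        )
        conn.execute(
            "INSERT INTO UserPageviews (UserId, PageId, ViewedAt) "
            "VALUES (?, ?, datetime('now'))",
            (user_id, page_id),
        )


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO Pages (PageId) VALUES (1)")

for user_id in (10, 11, 10):
    record_view(conn, user_id, 1)

view_count = conn.execute(
    "SELECT ViewCount FROM Pages WHERE PageId = 1").fetchone()[0]
logged_rows = conn.execute(
    "SELECT COUNT(*) FROM UserPageviews").fetchone()[0]
print(view_count, logged_rows)  # 3 3
```

Parameters are bound rather than concatenated into the SQL text, which is the inline-SQL equivalent of the @pageId and @userId parameters.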
Another way would be to store this to some global application object,
and then store in DB every 5 minutes or so (and can you even do that
in a good way?)
This method is very prone to data loss unless you use a durable queueing mechanism (like MSMQ). Unless you anticipate massive traffic, I wouldn't even think about this approach.
Writes of this nature are inexpensive and hundreds of operations per second are not a big deal. I recently built a comment/rating framework that achieves throughput of 3000+ complete transactions per second just on my local all-in-one workstation. This included processing the request, validation, and creating multiple records within a transaction.
As a note, you should take steps to ensure that your statistics data isn't vulnerable to artificial inflation/manipulation. This part of the process will probably be more complex than the view tracking itself. For example, a user should not be able to sit and hold down the F5 key and inflate the number of views on their video. Nor should these values be manipulable by HTTP (e.g. creating a small script to send an AJAX request over and over).
This suggests that each INSERT would be preceded by a SELECT to ensure that the same user ID or IP hadn't already been recorded in some period of time. Of course, this isn't foolproof (unless you invest a great deal of effort), but it errs on the side of conservatism which is usually a good approach.
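A minimal sketch of that SELECT-before-INSERT check, again in Python with an in-memory SQLite database standing in for the real one. The 30-minute window, the integer timestamps and the record_view_once helper are assumptions for illustration, not part of the original answer.

```python
import sqlite3

WINDOW_SECONDS = 30 * 60  # assumed de-duplication window


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserPageviews ("
        "UserId INTEGER NOT NULL, PageId INTEGER NOT NULL, "
        "ViewedAt INTEGER NOT NULL)"
    )
    # The index keeps the "seen recently?" lookup cheap.
    conn.execute(
        "CREATE INDEX IX_UserPageviews "
        "ON UserPageviews (UserId, PageId, ViewedAt)"
    )
    return conn


def record_view_once(conn, user_id, page_id, now):
    """Insert a view unless this user already viewed the page within the window.

    `now` is a Unix timestamp in seconds, passed in so the sketch is deterministic.
    Returns True if the view was counted.
    """
    with conn:
        recent = conn.execute(
            "SELECT 1 FROM UserPageviews "
            "WHERE UserId = ? AND PageId = ? AND ViewedAt > ? LIMIT 1",
            (user_id, page_id, now - WINDOW_SECONDS),
        ).fetchone()
        if recent is not None:
            return False  # repeat view (e.g. F5): ignore it
        conn.execute(
            "INSERT INTO UserPageviews (UserId, PageId, ViewedAt) VALUES (?, ?, ?)",
            (user_id, page_id, now),
        )
        return True


conn = open_db()
results = [
    record_view_once(conn, 7, 1, now=1000),                   # first view
    record_view_once(conn, 7, 1, now=1005),                   # refresh 5 seconds later
    record_view_once(conn, 8, 1, now=1005),                   # a different user
    record_view_once(conn, 7, 1, now=1000 + WINDOW_SECONDS),  # window has elapsed
]
print(results)  # [True, False, True, True]
```

With several web servers writing at once, two requests could both pass the SELECT before either inserts, so a real implementation would probably need a unique constraint or stricter locking; this sketch ignores that.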
One way would be to call a stored procedure for every view, but that
would result in a lot of unneeded calls to the database.
I regularly have to remind myself (and other developers) to not fear the database. People (me included) sometimes go to great lengths to avoid a few simple database calls. Keep your tables narrow and well-indexed, and operations like this are faster than you might think.
Say I need to populate 4 or 5 dropdowns w/ items from a database. Each drop down will have < 15 items in it. These items almost never change.
Now I could query the DB each time the page is accessed or I could grab the values from a custom class that would check to see if they already exist in ASP.Net's cache and only if they don't query the DB to update the cache.
It's trivial for me to write but I'm unsure if the performance would be better or not. I think it would be (although not likely anything huge).
What do you think?
When dealing with performance issues you should always:
Do things the simplest way first (avoid premature optimisation)
Performance test your code with set performance goals (e.g. 200ms response time under load of N concurrent users)
Then, IF your code doesn't perform then profile your code to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world performance change will be.
Having said that then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database), however it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are etc...
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
That's what System.Web.Helpers.WebCache class exists for.
IO is usually more expensive than memory operations (by orders of magnitude). Especially if your database is in another machine, then you would even be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize in the end when you have really identified it as a performance bottleneck by measuring.
Quick answer to your question:
Use the built in .Net cache.
Additional points to ponder over..
Preferably, retrieve all master data in a single database retrieval (think stored procedure and dataset): though, I do not advocate the use of stored procs in all scenarios.
As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database
Also, as your drop down values do not change very often; do remember to keep a long expiry duration
Finally, based on your page design you could also look at Fragment Caching (partial page caching: user controls) which could give you bigger benefits since now you neither access the data cache nor the database.
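The first three points (one retrieval for all master data, check the cache before the database, long expiry) could look roughly like the following Python sketch. LookupCache, load_all_dropdowns and the sample values are hypothetical; in ASP.NET the equivalent would sit on top of the built-in cache rather than a hand-rolled class.

```python
import time


class LookupCache:
    """Cache-aside holder for rarely changing lookup lists."""

    def __init__(self, load_all, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self._load_all = load_all    # fetches every dropdown in one round trip
        self._ttl = ttl_seconds      # long expiry: the data almost never changes
        self._clock = clock
        self._data = None
        self._expires_at = 0.0

    def get(self, name):
        now = self._clock()
        if self._data is None or now >= self._expires_at:
            self._data = self._load_all()          # cache miss: go to the database
            self._expires_at = now + self._ttl
        return self._data[name]


stats = {"round_trips": 0}


def load_all_dropdowns():
    """Stand-in for a single query/proc returning all dropdown lists."""
    stats["round_trips"] += 1
    return {
        "countries": ["Canada", "Mexico", "United States"],
        "titles": ["Dr", "Mr", "Ms"],
    }


cache = LookupCache(load_all_dropdowns)
countries = cache.get("countries")
titles = cache.get("titles")
countries_again = cache.get("countries")
print(stats["round_trips"])  # 1
```

Three lookups for two different dropdowns cost a single database round trip here; whether that matters in practice is exactly what the measuring advice above is for.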
Performance:
Again, the performance depends more on the application's load than on your direct round trips for fetching the master data. Put simply, as Thomas suggested, use the cache class!
My ASP.NET application uses some sequences to generate table primary keys. The DB administrators have set the cache size to 20. The application is now under test and a few records are added daily (say 4 for each user test session).
I've found that new test session records always use new cache portions, as if the previous day's cached numbers had expired, losing tens of keys every day. I'd like to understand whether it's due to some mistake I might have made in my application (disposing of table adapters or whatever) or whether it's the usual behaviour. Are there programming best practices to take into account when handling Oracle sequences?
Since the application will not have to bear a heavy workload (say 20-40 new records a day), I was thinking it might make sense to set a smaller cache size, or none at all.
Does resizing the sequence cache imply a reset of the current index?
Thank you in advance for any hints.
The answer from Justin Cave in this thread might be interesting for you:
http://forums.oracle.com/forums/thread.jspa?threadID=640623
In a nutshell: if the sequence is not accessed frequently enough but you have a lot of "traffic" in the library cache, then the sequence might be aged out and removed from the cache. In that case the pre-allocated values are lost.
If that happens to you very frequently, it seems that your sequence is not used very often.
I guess that reducing the cache size (or completely disabling it) will not have a noticeable impact on performance in your case (also taking your statement of 20-40 new records a day into account).
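To see why a cache of 20 combined with a handful of inserts per day produces the pattern described in the question, here is a toy model in Python. It is not how Oracle implements sequences internally; ToySequence, age_out and simulate_days are hypothetical names, and NOCACHE is approximated as a cache size of 1.

```python
class ToySequence:
    """Toy model: values are handed out from an in-memory block of cache_size."""

    def __init__(self, cache_size):
        self.cache_size = max(1, cache_size)
        self._next_block_start = 1   # first value not yet reserved by any block
        self._cached = []            # values reserved in memory but not yet used

    def nextval(self):
        if not self._cached:
            start = self._next_block_start
            self._cached = list(range(start, start + self.cache_size))
            self._next_block_start = start + self.cache_size
        return self._cached.pop(0)

    def age_out(self):
        """The cached block is dropped; its unused values are gone for good."""
        self._cached = []


def simulate_days(cache_size, days, inserts_per_day):
    """Insert a few rows per day, with the sequence aged out overnight."""
    seq = ToySequence(cache_size)
    keys = []
    for _ in range(days):
        keys.extend(seq.nextval() for _ in range(inserts_per_day))
        seq.age_out()
    return keys


keys_cache_20 = simulate_days(cache_size=20, days=3, inserts_per_day=4)
keys_nocache = simulate_days(cache_size=1, days=3, inserts_per_day=4)
print(keys_cache_20)  # [1, 2, 3, 4, 21, 22, 23, 24, 41, 42, 43, 44]
print(keys_nocache)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

In this model each overnight age-out throws away 16 unused values, which matches the "tens of keys lost every day" observation; with the cache effectively disabled the gaps from ageing disappear, although gaps from other causes, such as rolled-back inserts, are not modelled here.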
Oracle Sequences are not gap-free. Reducing the Cache size will reduce the gaps... but you will still have gaps.
The sequence is not associated with the table by the database, but by your code (via nextval on the insert, through a trigger, SQL, or a package API). On that note, you may use the same sequence across multiple tables (it is not like SQL Server's identity, which is tied to a column/table).
Thus changing the sequence will have no impact on the indexes.
You would just need to make sure that if you drop the sequence and recreate it, you 'reseed' it to the current value + 1 (e.g. create sequence seqer start with 125 nocache;)
However:
"If your application requires a gap-free set of numbers, then you cannot use Oracle sequences. You must serialize activities in the database using your own developed code."
But be forewarned, you may increase disk I/O and possibly transaction locking if you choose not to use sequences:
"The sequence generator is useful in multiuser environments for generating unique numbers without the overhead of disk I/O or transaction locking."
to reiterate a_horse_with_no_name's comments, what is the issue with gaps in the id?
Edit
also have a look at the caching logic you should use located here:
http://download.oracle.com/docs/cd/E11882_01/server.112/e17120/views002.htm#i1007824
If you are using the sequence for PKs and not to enforce some application logic then you shouldn't worry about gaps. However, if there is some application logic tied to sequential sequence values, you will have holes if you use sequence caching and do not have a busy system. Sequence cache values can be aged out of the library cache.
You say that your system is not very busy, in this case alter your sequence to no cache. You are in a position of taking a negligible performance hit to fix a logic issue so you might as well.
As people mentioned: gaps shouldn't be a problem, so if you require no gaps you are doing something wrong. (But I don't think this is what you want.)
Reducing the cache should reduce the number of gaps, and decrease the performance of the sequence, especially with concurrent access to it (which shouldn't be a problem in your use case).
Changing the sequence using the alter sequence statement (http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_2011.htm) should not reset the current/next val of the sequence.
I have a method in my BLL that interacts with the database and retrieves data based on the defined criteria.
The returned data is a collection of FAQ objects which is defined as follows:
FAQID,
FAQContent,
AnswerContent
I would like to cache the returned data to minimize the DB interaction.
Now, based on the user selected option, I have to return either of the below:
ShowAll: all data.
ShowAnsweredOnly: faqList.Where(f => f.AnswerContent != null)
ShowUnansweredOnly: faqList.Where(f => f.AnswerContent == null)
My Question:
Should I only cache all data returned from DB (e.g. FAQ_ALL) and filter other faqList modes from cache (= interacting with DB just once and filter the data from the cache item for the other two modes)? Or should I have 3 cache items: FAQ_ALL, FAQ_ANSWERED and FAQ_UNANSWERED (=interacting with database for each mode [3 times]) and return the cache item for each mode?
I'd be pleased if anyone tells me about pros/cons of each approach.
Food for thought.
How many records are you caching, how big are the tables?
How much mid-tier resources can be reserved for caching?
How many items of each type exist?
How fast will filtering on the client side be?
How often does the data change?
how often is it changed by the same application instance?
how often is it changed by other applications or server side jobs?
What is your cache invalidation policy?
What happens if you return stale data?
Can you/Should you leverage active cache invalidation, like SqlDependency or LinqToCache?
If the dataset is large then filtering on the client side will be slow and you'll need to cache two separate results (no need for a third if ALL is the union of the other two). If the data changes often then caching will return stale items frequently unless proactive cache invalidation is in place. Active cache invalidation is achievable in the mid-tier if you control all the update paths and there is only one mid-tier application instance, but becomes really hard if either of those prerequisites is not satisfied.
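The "two cached results, ALL as their union" idea can be sketched as follows, in Python for illustration only. The Faq class mirrors the fields from the question; build_cache, get_faqs and the cache keys FAQ_ANSWERED and FAQ_UNANSWERED are assumptions, and a plain dict stands in for the real cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Faq:
    faq_id: int
    faq_content: str
    answer_content: Optional[str]   # None means "not answered yet"


def build_cache(rows):
    """Split a single database result into the two cached partitions."""
    return {
        "FAQ_ANSWERED": [f for f in rows if f.answer_content is not None],
        "FAQ_UNANSWERED": [f for f in rows if f.answer_content is None],
    }


def get_faqs(cache, mode):
    """Serve all three display modes from the two cached partitions."""
    if mode == "ShowAnsweredOnly":
        return cache["FAQ_ANSWERED"]
    if mode == "ShowUnansweredOnly":
        return cache["FAQ_UNANSWERED"]
    if mode == "ShowAll":
        # ALL is just the union, so it needs no cache entry of its own.
        merged = cache["FAQ_ANSWERED"] + cache["FAQ_UNANSWERED"]
        return sorted(merged, key=lambda f: f.faq_id)
    raise ValueError("unknown mode: " + mode)


rows_from_db = [
    Faq(1, "How do I reset my password?", "Use the reset link."),
    Faq(2, "Can I export my data?", None),
    Faq(3, "Is there an API?", "Yes."),
]
cache = build_cache(rows_from_db)

answered_ids = [f.faq_id for f in get_faqs(cache, "ShowAnsweredOnly")]
unanswered_ids = [f.faq_id for f in get_faqs(cache, "ShowUnansweredOnly")]
all_ids = [f.faq_id for f in get_faqs(cache, "ShowAll")]
print(answered_ids, unanswered_ids, all_ids)  # [1, 3] [2] [1, 2, 3]
```

This still needs only one database round trip to fill both partitions; the trade-off against caching a single FAQ_ALL list and filtering on each request is mostly about how large the list is.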
It basically depends how volatile the data is, how much of it there is, and how often it's accessed.
For example, if the answered data didn't change much then you'd be safe caching that for a while; but if the unanswered data changed a lot (and more often) then your caching needs might be different. If this was the case it's unlikely that caching it as one dataset will be the best option.
It's not all bad though - if the discrepancy isn't too huge then you might be OK caching the lot.
The other point to think about is how the data is related. If the FAQ items toggle between answered and unanswered then it'd make sense to cache the base data as one - otherwise the items would be split where you wanted it together.
Alternatively, work with the data in-memory and treat the database as an add-on...
What do I mean? Well, typically the user will hit "save" this will invoke code which saves to the DB; when the next user comes along they will invoke a call which gets the data out of the DB. In terms of design the DB is a first class citizen, everything has to go through it before anyone else gets a look in. The alternative is to base the design around data which is held in-memory (by the BLL) and then saved (perhaps asynchronously) to the DB. This removes the DB as a bottleneck but gives you a new set of problems - like what happens if the database connection goes down or the server dies with data only in-memory?
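A very small sketch of that "memory first, database later" design, in Python with an in-memory SQLite database as the backing store. WriteBehindStore, its methods and the Items table are hypothetical, and the flush is called by hand here rather than from a background worker.

```python
import sqlite3
from collections import deque


class WriteBehindStore:
    """Reads and writes hit memory; the database is updated later in batches."""

    def __init__(self, conn):
        self._conn = conn
        self._items = {}          # authoritative in-memory copy
        self._pending = deque()   # writes not yet persisted

    def save(self, key, value):
        self._items[key] = value
        self._pending.append((key, value))

    def get(self, key):
        return self._items.get(key)

    def pending_count(self):
        return len(self._pending)

    def flush(self):
        """Persist queued writes in order; returns how many were written."""
        batch = list(self._pending)
        with self._conn:  # one transaction for the whole batch
            self._conn.executemany(
                "INSERT OR REPLACE INTO Items (ItemKey, ItemValue) VALUES (?, ?)",
                batch,
            )
        # Forget the writes only after the transaction has committed.
        for _ in batch:
            self._pending.popleft()
        return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ItemKey TEXT PRIMARY KEY, ItemValue TEXT NOT NULL)")

store = WriteBehindStore(conn)
store.save("faq:1", "draft answer")
store.save("faq:1", "final answer")

value_in_memory = store.get("faq:1")
rows_before_flush = conn.execute("SELECT COUNT(*) FROM Items").fetchone()[0]
flushed = store.flush()
value_in_db = conn.execute(
    "SELECT ItemValue FROM Items WHERE ItemKey = 'faq:1'").fetchone()[0]
print(value_in_memory, rows_before_flush, flushed, value_in_db)
```

Between save and flush the data exists only in memory, which is exactly the risk raised above: if the process dies at that point, the pending writes are lost unless the queue itself is made durable.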
Pros and Cons
Getting all the data in one call might be faster (by making fewer calls).
Getting all the data at once if it's related makes sense.
Granularity: data that is related and has a similar "cachability" can be cached together, otherwise you might want to keep them in separate cache partitions.
Context
My current project is a large-ish public site (2 million pageviews per day) running a mixture of ASP classic and ASP.NET with a SQL Server 2005 back-end. We're heavy on reads, with occasional writes and virtually no updates/deletes. Our pages typically concern a single 'master' object with a stack of dependent (detail) objects.
I like the idea of returning all the data required for a page in a single proc (and absolutely no unnecessary data). True, this requires a dedicated proc for such pages, but some pages receive double-digit percentages of our overall site traffic so it's worth the time/maintenance hit. We typically only consume multiple recordsets from our .NET code, using System.Data.SqlClient.SqlDataReader and its NextResult method. Oh, yeah, I'm not doing any updates/inserts in these procs either (except to table variables).
The question
SQL Server (2005) procs which return multiple recordsets are working well (in prod) for us so far but I am a little worried that multi-recordset procs are my new favourite hammer that I'm hitting every problem (nail) with. Are there any multi-recordset SQL Server proc gotchas I should know about? Anything that's going to make me wish I hadn't used them? Specifically anything about it affecting connection pooling, memory utilization etc.
Here are a few gotchas for multiple-recordset stored procs:
They make it more difficult to reuse code. If you're doing several queries, odds are you'd be able to reuse one of those queries on another page.
They make it more difficult to unit test. Every time you make a change to one of the queries, you have to test all of the results. If something changed, you have to dig through to see which query failed the unit test.
They make it more difficult to tune performance later. If another DBA comes in behind you to help performance improve, they have to do more slicing and dicing to figure out where the problems are coming from. Then, combine this with the code reuse problem - if they optimize one query, that query might be used in several different stored procs, and then they have to go fix all of them - which makes for more unit testing again.
They make error handling much more difficult. Four of the queries in the stored proc might succeed, and the fifth fails. You have to plan for that.
They can increase locking problems and incur load in TempDB. If your stored procs are designed in a way that need repeatable reads, then the more queries you stuff into a stored proc, the longer it's going to take to run, and the longer it's going to take to return those results back to your app server. That increased time means higher contention for locks, and the more SQL Server has to store in TempDB for row versioning. You mentioned that you're heavy on reads, so this particular issue shouldn't be too bad for you, but you want to be aware of it before you reuse this hammer on a write-intensive app.
I think multi-recordset stored procedures are great in some cases, and it sounds like yours may be one of them.
The bigger your site gets (more traffic), the more that 'extra' bit of performance is going to matter. If you can combine 2, 3 or 4 calls to the database (and possibly new connections) into one, you could be cutting down your database hits by 4, 6 or 8 million per day, which is substantial.
I use them sparingly, but when I have, I have never had a problem.
I would recommend having one stored procedure invoke several inner stored procedures that each return one result set.
create proc foo
as
execute foobar --returns one result
execute barfoo --returns one result
execute bar --returns one result
That way, when requirements change and you only need the 3rd and 5th result sets, you have an easy way to invoke them without adding new stored procedures and regenerating your data access layer. My current app returns all reference tables (e.g. the US states table) whether I want them or not. Worst is when you need a reference table and the only access is via a stored procedure that also runs an expensive query as one of its six result sets.