I have method in my BLL that interacts with the database and retrieves data based on the defined criteria.
The returned data is a collection of FAQ objects which is defined as follows:
FAQID,
FAQContent,
AnswerContent
I would like to cache the returned data to minimize the DB interaction.
Now, based on the user selected option, I have to return either of the below:
ShowAll: all data.
ShowAnsweredOnly: faqList.Where(f => f.AnswerContent != null)
ShowUnansweredOnly: faqList.Where(f => f.AnswerContent == null)
My Question:
Should I cache only the full result set returned from the DB (e.g. FAQ_ALL) and derive the other faqList modes by filtering that cache item (i.e. interact with the DB just once and filter the cached data for the other two modes)? Or should I keep three cache items: FAQ_ALL, FAQ_ANSWERED and FAQ_UNANSWERED (i.e. interact with the database once for each mode, three times) and return the matching cache item for each mode?
I'd be pleased if anyone tells me about pros/cons of each approach.
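For concreteness, here is a minimal sketch of the first option (cache everything once, filter per mode). It assumes ASP.NET's HttpRuntime.Cache, a hypothetical LoadAllFaqsFromDb() call in the BLL, and an arbitrary 10-minute expiry, so treat it as an illustration rather than the final design:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Caching;

public enum FaqMode { ShowAll, ShowAnsweredOnly, ShowUnansweredOnly }

public class Faq
{
    public int FAQID { get; set; }
    public string FAQContent { get; set; }
    public string AnswerContent { get; set; }
}

public static class FaqService
{
    // One cache entry (FAQ_ALL); the three modes are just in-memory filters over it.
    public static IEnumerable<Faq> GetFaqs(FaqMode mode)
    {
        var all = HttpRuntime.Cache["FAQ_ALL"] as List<Faq>;
        if (all == null)
        {
            all = LoadAllFaqsFromDb();                          // hypothetical DAL call
            HttpRuntime.Cache.Insert("FAQ_ALL", all, null,
                DateTime.UtcNow.AddMinutes(10),                 // expiry tuned to how volatile the data is
                Cache.NoSlidingExpiration);
        }

        switch (mode)
        {
            case FaqMode.ShowAnsweredOnly:   return all.Where(f => f.AnswerContent != null);
            case FaqMode.ShowUnansweredOnly: return all.Where(f => f.AnswerContent == null);
            default:                         return all;
        }
    }

    private static List<Faq> LoadAllFaqsFromDb()
    {
        // BLL/DAL query goes here
        return new List<Faq>();
    }
}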
Food for thought.
How many records are you caching, how big are the tables?
How much mid-tier resources can be reserved for caching?
How much data of each type exists?
How fast will filtering on the client side be?
How often does the data change?
How often is it changed by the same application instance?
How often is it changed by other applications or server-side jobs?
What is your cache invalidation policy?
What happens if you return stale data?
Can you/Should you leverage active cache invalidation, like SqlDependency or LinqToCache?
If the dataset is large then filtering on the client side will be slow and you'll need to cache two separate results (no need for a third if ALL is the union of the other two). If the data changes often then caching will frequently return stale items unless proactive cache invalidation is in place. Active cache invalidation is achievable in the mid-tier if you control all the update paths and there is only one mid-tier application instance, but it becomes really hard if either of those prerequisites is not satisfied.
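If active invalidation is on the table, a hedged sketch using SqlCacheDependency with query notifications (SQL Server 2005+; the connection string handling, the dbo.FAQ table and the ReadFaqs helper are assumptions) could look like this:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Web;
using System.Web.Caching;

public static class FaqCacheLoader
{
    // Call SqlDependency.Start(connectionString) once at application start-up.
    public static void RefreshFaqCache(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT FAQID, FAQContent, AnswerContent FROM dbo.FAQ", conn))
        {
            var dependency = new SqlCacheDependency(cmd);       // must be attached before the command executes
            conn.Open();
            var faqs = ReadFaqs(cmd.ExecuteReader());           // hypothetical materialization into a list
            HttpRuntime.Cache.Insert("FAQ_ALL", faqs, dependency);
        }
    }

    private static List<object> ReadFaqs(SqlDataReader reader)
    {
        var result = new List<object>();
        while (reader.Read()) { /* map columns to a FAQ object */ }
        return result;
    }
}

Query notifications have their own requirements (Service Broker enabled, explicit column lists, two-part table names), so treat this as a starting point rather than a drop-in solution.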
It basically depends how volatile the data is, how much of it there is, and how often it's accessed.
For example, if the answered data didn't change much then you'd be safe caching that for a while; but if the unanswered data changed a lot (and more often) then your caching needs might be different. If this was the case it's unlikely that caching it as one dataset will be the best option.
It's not all bad though - if the discrepancy isn't too huge then you might be OK caching the lot.
The other point to think about is how the data is related. If the FAQ items toggle between answered and unanswered then it'd make sense to cache the base data as one - otherwise the items would be split where you wanted them together.
Alternatively, work with the data in-memory and treat the database as an add-on...
What do I mean? Well, typically the user will hit "save" this will invoke code which saves to the DB; when the next user comes along they will invoke a call which gets the data out of the DB. In terms of design the DB is a first class citizen, everything has to go through it before anyone else gets a look in. The alternative is to base the design around data which is held in-memory (by the BLL) and then saved (perhaps asynchronously) to the DB. This removes the DB as a bottleneck but gives you a new set of problems - like what happens if the database connection goes down or the server dies with data only in-memory?
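As a very rough sketch of that in-memory-first idea (reusing the Faq type sketched earlier; PersistToDb is a placeholder, and note it offers none of the durability the paragraph warns about):

using System.Collections.Concurrent;
using System.Threading.Tasks;

public class InMemoryFaqStore
{
    private readonly ConcurrentDictionary<int, Faq> _items = new ConcurrentDictionary<int, Faq>();

    public void Save(Faq faq)
    {
        _items[faq.FAQID] = faq;            // readers see the new data immediately
        Task.Run(() => PersistToDb(faq));   // the DB write happens off the request path
    }

    public Faq Get(int id)
    {
        Faq faq;
        return _items.TryGetValue(id, out faq) ? faq : null;
    }

    private void PersistToDb(Faq faq)
    {
        // ADO.NET / ORM write; if this fails or the process dies first, the data is lost
    }
}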
Pros and Cons
Getting all the data in one call might be faster (by making fewer calls).
Getting all the data at once if it's related makes sense.
Granularity: data that is related and has a similar "cachability" can be cached together, otherwise you might want to keep them in separate cache partitions.
This is my first post/question so please let me know if/how I can improve it. I found similar questions but nothing quite covered this.
When you store to InProc session you're just storing a reference to the data. So, if I have a public property foo, and I store it in Session("foo") = foo, then I haven't really taken up any additional memory (aside from the 32/64 bits used by the pointer)?
In my case, we are currently reloading foo on every page of our website, so if I were to instead store it in session, then it should take the same amount of space, but not need to be reloaded on every page. I'm seeing a lot of people say not to store large objects in session, but if that large object already exists, what difference does it make to have a pointer to it? Of course I would remove the object from session the moment it was no longer needed.
The data we are trying to store is an object specific to the user's current work, but not user data. As an analogy, say the user is a car dealer, and he is looking at all the data for a particular customer. We have multiple pages for this customer, and we want to keep all the customer info loaded on each page. All the customer data is stored in a single XML data column in a SQL table, which we parse on every page.
We have tried binary serialization instead of parsing xml, so we could store with session in state server mode, but we found the performance to actually be worse.
We are running on a single web server.
First off, no. When you store something in session state, all the data required to hold that object is consumed by the web site's process(es). Just because .NET treats variables as references doesn't mean it actually uses less memory than a non-GC language; it just means that copying a variable around is done efficiently without explicit reference operators or pointers.
Your question is a bit vague, but you have a few options for persisting data:
1) Send the data to the client as JSON and store it on the browser if it should be per-user and is needed more on the client side than the server side. You can then send pieces of the data with different requests if you need to (put it in hidden fields if you have to use ASPX web forms).
2) Store it in the session state if it is a small bit of per user data.
3) Store it in the ASP.NET cache if it is large and common to all users, see here (https://msdn.microsoft.com/en-us/library/6hbbsfk6.aspx).
4) If it is large, user-specific, and used primarily on the server, then you have more of a performance problem. See if you can separate the user-specific parts from the static parts. If you do that and it's still large, then a database may not be a bad solution. If you are already making DB calls in your application, then looking this data up on every request won't add much overhead, and you won't have to regenerate it from scratch (only do this if the data takes considerable time to generate, since a DB call could be slower than just regenerating the data itself). I recommend writing some sort of middleware (HttpModule or OwinMiddleware) that uses whatever user identity you use for auth to look up the data and then set it on the HttpContext.Current.Items collection; a sketch of such a module follows below. This way the data is usable for the entire request and you can add logic in the middleware to decide when to set it.
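A hedged sketch, assuming an IHttpModule hooked on PostAuthenticateRequest; the Items key, cache strategy and LoadWorkData helper are placeholders:

using System;
using System.Web;

public class UserWorkDataModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        // After authentication we know who the user is, so load (or fetch from cache)
        // their work data and stash it for the rest of the request.
        app.PostAuthenticateRequest += (sender, e) =>
        {
            var ctx = ((HttpApplication)sender).Context;
            if (ctx.User != null && ctx.User.Identity.IsAuthenticated)
            {
                ctx.Items["UserWorkData"] = LoadWorkData(ctx.User.Identity.Name);
            }
        };
    }

    public void Dispose() { }

    private object LoadWorkData(string userName)
    {
        // e.g. check HttpRuntime.Cache keyed by userName, fall back to the DB/XML parse
        return null;
    }
}

Anything later in the pipeline (pages, controllers) can then read HttpContext.Current.Items["UserWorkData"] without re-parsing the XML. You'd also register the module in web.config.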
I would think that having a large chunk of user-specific data would be a red flag as user data should just be a list of what the user can/can't do and what their preferences are.
If this is static data then it's super simple. The application cache is what you want. The only complication would be if you have multiple servers that need synced data.
Say I need to populate 4 or 5 dropdowns w/ items from a database. Each drop down will have < 15 items in it. These items almost never change.
Now I could query the DB each time the page is accessed or I could grab the values from a custom class that would check to see if they already exist in ASP.Net's cache and only if they don't query the DB to update the cache.
It's trivial for me to write, but I'm unsure whether the performance would actually be better. I think it would be (although likely nothing huge).
What do you think?
When dealing with performance issues you should always:
Do things the simplest way first (avoid premature optimisation)
Performance test your code against set performance goals (e.g. 200 ms response time under a load of N concurrent users)
Then, IF your code doesn't perform, profile it to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world change will be.
Having said that then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database), however it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are etc...
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
That's what the System.Web.Helpers.WebCache class exists for.
IO is usually more expensive than memory operations (by orders of magnitude). Especially if your database is in another machine, then you would even be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize in the end when you have really identified it as a performance bottleneck by measuring.
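For the dropdown case the cache-or-load helper is only a few lines. Here is a sketch using WebCache, where the cache key, the 60-minute duration and LoadStatesFromDb are placeholders:

using System.Collections.Generic;
using System.Web.Helpers;

public static class LookupData
{
    public static IList<string> GetStates()
    {
        // Try the ASP.NET cache first; fall back to the DB and cache the result.
        var states = WebCache.Get("ddl_states") as IList<string>;
        if (states == null)
        {
            states = LoadStatesFromDb();               // hypothetical DAL call
            WebCache.Set("ddl_states", states, 60, false);   // 60 minutes, absolute expiry
        }
        return states;
    }

    private static IList<string> LoadStatesFromDb()
    {
        return new List<string>();   // query the lookup table here
    }
}

WebCache is essentially a thin wrapper over the ASP.NET cache, so HttpRuntime.Cache would work equally well here.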
Quick answer to your question:
Use the built in .Net cache.
Additional points to ponder over:
Preferably, retrieve all master data in a single database round trip (think stored procedure and dataset), though I do not advocate the use of stored procs in all scenarios.
As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database
Also, as your drop down values do not change very often; do remember to keep a long expiry duration
Finally, based on your page design you could also look at Fragment Caching (partial page caching: user controls) which could give you bigger benefits since now you neither access the data cache nor the database.
Performance:
Again, performance depends more on the application's overall load than on these direct round trips for fetching the master data. Put simply, as Thomas suggested, use the cache class!
Background
I am writing a survey that is going to a large audience. It contains 15 questions and there are five possible answers to each question along with potential comments.
The user can cycle through all 15 questions answering them in any order and is allowed to leave the survey at any point and return to answer the remaining questions.
Once an answer has been attempted on all 15 questions a submit button appears which allows them to submit the questions as final answers. Until that stage all answers are required to be retrievable whenever the user loads the survey page up.
The requirement is that the user only sees one question on a page and 'Previous' and 'Next' buttons allow the user to scroll through the questions.
Requirement
I could request the question each time the user clicks a button, save the current response, and so on, but that would be a large number of hits to a database that is already heavily used. I don't have time to procure a new server etc., so I have to make do with what I have. Is there any way I can cache the questions and/or responses on the user's machine? Obviously I need the response data to be secure and known only to the user, so I feel a little bit stuck as to the best way of doing this. Any pointers?
I am prepared to offer a bounty of 100 points on this question if it means I get some good quality discussion and feedback going.
Unless there's a reason for using a database, you could always store the results in flat files on the server itself. It doesn't sound like the data you're storing is relational in any way. Worst comes to worst, you could always insert them back into a relational db as a batch job every night.
Another option would be the application cache. However, if your web server suddenly crashes on you, you risk losing information from there.
You could also store the values in the user's cookies.
Based on my personal experience (serving thousands of short survey pages per second) I suspect your fears are unfounded. Among other reasons, the DBMS will cache such small amounts of data far more efficiently than you can.
I've tested this, loading the questions and answers into an Application-scope collection at start up, and serving them from memory after that - often it made no difference at all.
Your alternative is to send everything at once to the browser, and write it as a javascript application, storing the data in (encrypted) cookies and only hitting the database when the whole thing is done. This is tedious but not difficult.
You have three requirements that need to be balanced:
users must be able to return to their survey at any time
answers entered by users must be saved with the least possible chance of data loss
need to minimize database hits
Any solution that involves caching answers in a volatile place (cookies, session, etc) will increase the risk of data loss. The final solution depends on how you rank the three requirements in importance. If the db issue is at the top, then you will either need to risk data loss, or spend a lot of extra time coding a solution using some temporary storage scheme (like Kevin's flat file idea).
A couple of folks suggested that you may be optimizing prematurely. I suggest you consider that idea first - maybe this whole thing is moot.
However, assuming that your db situation is a real problem, I think your best balance of requirements will be a system that saves answers to the db immediately (to prevent data loss) but carefully manages when you actually have to hit the db.
When the app starts up (or when the first user requests the survey) load the survey and its questions into application cache. If any of the questions have a pick list of possible answers, load these also. You will only have to hit the db once during the application lifetime (or your cache duration) to load survey data.
When a user starts their survey, run a single query to load any existing answers (in case they are a returning user) into an object in session - it could be as simple as a List<string>. (If you can somehow identify a new user without having to hit the db, then you can skip this step for new users.)
Use the session answer object along with the survey question object in app cache to populate each page without hitting the db again.
When the user submits an answer, compare it to the session answer object to see if it has changed (they may just be clicking 'next' on a page with a previously entered answer). If the answer is new, or has changed, then save it to the db and to the session answer object.
When the user leaves the survey, you don't need to do anything - everything is saved already.
With this scheme, you hit the db once to load the survey, once for each user when they start (or restart) the survey, and once for each new or modified answer. Probably not as much of a reduction as you were hoping for, but it gives you the best data protection.
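A sketch of the answer-saving step above, using a dictionary keyed by question id for the session object so the change check is easy; the session key, CurrentUserId and SaveAnswerToDb are assumptions:

using System.Collections.Generic;
using System.Web.UI;

public class SurveyPage : Page
{
    protected void SubmitAnswer(int questionId, string answer)
    {
        var answers = Session["SurveyAnswers"] as Dictionary<int, string>
                      ?? new Dictionary<int, string>();

        string existing;
        bool unchanged = answers.TryGetValue(questionId, out existing) && existing == answer;
        if (!unchanged)
        {
            SaveAnswerToDb(CurrentUserId, questionId, answer);   // hypothetical DAL call
            answers[questionId] = answer;                        // keep the session copy in sync
            Session["SurveyAnswers"] = answers;
        }
    }

    private int CurrentUserId { get { return 0; } }              // resolve from auth in real code
    private void SaveAnswerToDb(int userId, int questionId, string answer) { }
}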
If the database trips are a problem, you can cache the data on the web server (or wherever your application resides), but it sounds like each answer needs to be recorded as the user goes to the next question.
If the questions and possible answers are identical for everyone, I would definitely cache them in the application layer - this can be stored in the Application object. In any case, you could certainly optimize the database calls to return the results as efficiently as possible - i.e. multiple result sets or a joined result set from a single stored proc. If you don't mind multiple copies for each session (or if there is variation), you can store it in the Session object. Storing it on the client (i.e. a cookie) is not really secure and is kind of pointless from a web server-client bandwidth-saving perspective.
This sounds a lot like premature optimization to me, though.
Your scenario is a perfect candidate for the Predictive Fetch pattern. I would suggest that you cache all your questions. When the user signs in, use the pattern to fetch the first 5 answers (if they have given any answers) and, based on their navigation (where their current question is), get the information from the Response object or from the DB.
HTH
Not sure of the languages etc you are using, but most have an application cache. I would store the questions there, and retrieve them from the database and store them when they are not in the cache (when the application recycles).
As for the answers, are the users logging in somehow? Is it feasible to save answers in a cookie until all questions are answered?
Edit:
If cookies aren't reliable enough, you could store (in the application cache) a list of queries (inserts/updates) to be executed; they would not be executed until a query limit was reached or under certain conditions (i.e. execute the query list when a user requests answers that are in the list, execute the list when the application recycles, etc.).
Pretty crude, but you get the idea:
if ((function == "get question" && userQuestionIsInQueue) || function == "finish survey")
{
    // Flush the queued inserts/updates before serving this request
    // (in real code, wrap Application access in Application.Lock()/UnLock()).
    ExecuteQueryList((List<string>)Application["querylist"]);
    Application["querylist"] = null;
    // ...continue as normal
}

if (function == "submit answer")
{
    var queryList = Application["querylist"] as List<string> ?? new List<string>();
    queryList.Add(newAnswerQuery);
    Application["querylist"] = queryList;
}
You'd also need to flush the query list when the application recycles; I believe you can hook that in global.asax (e.g. Application_End).
Edit 2:
I would also combine all database work for a request into one trip: if you did have to execute the query list and then fetch the answers for the user, do both in the same transaction and save a round trip. Common practice when optimizing.
This is a classic problem to do with maintaining state between pages in a browser-based system. I'm also assuming that we want this data to persist even if the user logs out and comes back later. Here are the options:
With a high availability server we can keep a single collection of 15 answers in memory (not session) for this user (probably not a good idea and not easily load balanced)
We denormalise the 15 answers into 1 row of a sql table
We persist the data on the client using a cookie or localStorage (IE8).
My feeling is that the first two options are probably not what you are looking for, so let's explore the last option.
You could quite simply store the answers in a cookie. There is a small chance that this could get lost, and the user may log in from another machine, but this may be an acceptable risk. With the latest browsers that support HTML5 (including IE8, afaik) you get the benefit of localStorage, which is not as easily deleted as a cookie. You could fall back to cookies if it isn't available.
Cookies can be encrypted if required.
I would like to suggest the new HTML5 feature called DOM Storage, though since only newer browsers support it, using it could be a problem at this point.
With DOM Storage, you can store data in the user's browser. Since it can store up to 5 MB per domain in Mozilla Firefox, Google Chrome, and Opera, and 10 MB per storage area in Internet Explorer, you can store answers and question ids in DOM Storage.
With DOM Storage you can reduce not only database hits but server hits in general.
Since working with cookies can be a hassle and they can only store about 4 KB, the easiest way now is to store key-value information in DOM Storage.
You can store key-value information per session as well as locally. When the session ends, the session-based info is wiped from the browser, but if you store local-based values, the key-value pairs will remain for a while even after the user closes the tab.
Example Code:
<p>
You have viewed this page
<span id="count">an untold number of</span>
time(s).
</p>
<script>
var storage = window.localStorage;
if (!storage.pageLoadCount) storage.pageLoadCount = 0;
storage.pageLoadCount = parseInt(storage.pageLoadCount, 10) + 1;
document.getElementById('count').innerHTML = storage.pageLoadCount;
</script>
You can learn more about DOM Storage from the links below :
https://developer.mozilla.org/en/DOM/Storage
http://en.wikipedia.org/wiki/Web_Storage
http://msdn.microsoft.com/en-us/library/cc197062%28VS.85%29.aspx
do you mean...a cookie?
I'm having an internal debate about where I should handle checking for changes to data and can't decide on the practice that makes the most sense:
Handling IsChanged in the GUI - This requires persistence of data between page load and posting of data which is potentially a lot of bandwidth/page delivery overhead. This isn't so bad in a win forms application, but in a website, this could start having a major financial impact for bandwidth costs.
Handling it in the DAL - This requires multiple calls to the database to check if any data has changed prior to saving it. This potentially means an extra needless call to the database potentially causing scalability issues by needless database queries.
Handling it in a Save() stored proc - This would require the stored proc to potentially make an extra needless call to the table to check, but would save the extra call from the DAL to the database. This could potentially scale better than having the DAL handle it, but my gut says this can be done better.
Handling it in a trigger - This would require using a trigger (which I'm emotionally averse to, I tend to avoid triggers except where absolutely necessary).
Don't handle IsChanged functionality at all - It not only becomes hard to handle the "LastUpdated" date, but saving data unnecessarily to the database seems like a bad practice for scalability in itself.
So each approach has its drawbacks and I'm at a loss as to which is the best of this bad bunch. Does anyone have any more scalable ideas for handling data persistence for the specific purpose of seeing if anything has changed?
Architecture: SQL Server 2005, ASP.NET MVC, IIS7, High scalability requirements for non-specific global audience.
Okay, here's another solution - I've not thought through all the repercussions, but it could work I think:
Think about the GetHashCode() comparison functionality:
At page load time, you calculate the hash code for your page data. You store the hashcode in the page data or viewstate/session if that's what you prefer.
At data post (postback) you calculate the hash code of the data that was posted and compare it to the original hash. If it's different, the user changed something and you can save it back to the database.
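A rough sketch of that idea, using a SHA-256 hash over the editable field values rather than GetHashCode itself (which isn't stable across processes); the field list and the ViewState key are assumptions:

using System;
using System.Security.Cryptography;
using System.Text;

public static class ChangeDetection
{
    // Hash the editable field values in a fixed order; compare the load-time and post-time hashes.
    public static string ComputeSnapshotHash(params string[] fieldValues)
    {
        string joined = string.Join("\u001F", fieldValues);   // separator stops "ab"+"c" colliding with "a"+"bc"
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(joined));
            return Convert.ToBase64String(hash);
        }
    }
}

// On page load:
//   ViewState["snapshot"] = ChangeDetection.ComputeSnapshotHash(name, email, notes);
// On postback:
//   if (ChangeDetection.ComputeSnapshotHash(name, email, notes) != (string)ViewState["snapshot"])
//       SaveToDatabase();   // hypothetical save call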
Pros
You don't have to store all your original data in the page load which cuts down on bandwidth/page delivery overhead.
You don't have to have your DAL do multiple calls to the database to determine if something's changed.
The record will only be updated in the database if something has changed, so your LastUpdated date stays correct.
Cons
You still have to load from the database any original data that wasn't stored in the "viewstate" but is necessary to save a valid record.
A change to one field will change the hash, but you won't know which field changed unless you fetch the original data from the database to compare. On a side note, perhaps you don't need to: if you have to update any of the fields the timestamp changes anyway, and overwriting a field that hasn't changed has, for all intents and purposes, no effect.
You can't completely rule out the chance of collisions, but they would be rare. It comes down to whether the repercussion of a collision is acceptable or not.
Either/Or
If you store the hash in the session, then that saves bandwidth but increases server resource usage, so there is a potential scalability issue to consider in either case.
Unknowns
Is the overhead of updating a single column any different from that of updating multiple/all columns in a record? I don't know what that performance overhead is.
I handle it in the DAL - it has the original values in it so no need to go to the database.
For each entity in your system, introduce an additional Version field. With this field you will be able to check for changes at the database level.
Since you have a web application, and scalability usually matters for web applications, I would suggest avoiding IsChanged logic at the UI level. The LastUpdated date can be set at the database level during the Save operation.
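As a hedged illustration of pushing the check down to the database, an UPDATE guarded by an assumed Version (rowversion) column might look like this; the table, columns and connection handling are placeholders:

using System.Data.SqlClient;

public static class CustomerRepository
{
    // Assumes conn is already open; returns false if the row changed since it was read.
    public static bool SaveCustomer(SqlConnection conn, int id, string name, byte[] originalVersion)
    {
        using (var cmd = new SqlCommand(
            "UPDATE dbo.Customer " +
            "SET Name = @Name, LastUpdated = GETUTCDATE() " +
            "WHERE CustomerId = @Id AND Version = @OriginalVersion", conn))
        {
            cmd.Parameters.AddWithValue("@Name", name);
            cmd.Parameters.AddWithValue("@Id", id);
            cmd.Parameters.AddWithValue("@OriginalVersion", originalVersion);
            return cmd.ExecuteNonQuery() == 1;   // 0 rows affected = the row was modified underneath the user
        }
    }
}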
What is preferable, keeping a dataset in session or filling the dataset on each postback?
That would depend on many factors. It is usually best not to keep too many items in session memory if your session is inproc or on a state server because it is less scalable.
If your session resides on the database, it might be better to just requery and repopulate the dataset unless the query is costly to execute.
Don't use the session!!! If the user opens a second tab with a different request the session will be reused and the results will not be the ones that he/she expects. You can use a combination of ViewState and Session but still measure how much you can handle without any sort of caching before you resort to caching.
It depends how heavily you will do that and where your session data is stored on the server.
Keeping the datasets in session surely affects memory usage. All of the session variables are stored in memory, or in SQL server if you use SQL Server state. You can also offload the load to a different machine using the State Server mode.
The serialization/deserialization aspect: if you use insanely large datasets it could hurt server-side performance badly.
On the other hand, having very large viewstates on pages can be a real performance killer. So I would keep datasets in session or in the cache, and look for an out-of-proc way to store that session data.
Since I want as few database operations as possible, I will keep the data in session, but I'm not sure what would be the best way to do it. How do I handle concurrency? Also, since the data is shared, when two users use the page simultaneously, how can I make sure each of them has their own set of data in the session?
I usually keep it in session if it is not too big and/or the db is far and slow. If the data is big, in general, it's better to reload.
Sometimes I use the Cache: I load from the DB the first time and put the data in the cache. During postback I check the cache, and if it is empty I reload.
That way the server manages the cache by itself.
The trouble with the session is that if it's in-proc it could disappear, which isn't great; if it's state server then you have to serialize; and if it's SQL state you're doing a round trip anyway!
If the result set is large, do custom paging so that you are only returning a small subset of the total results.
Then, if you think more than one user will see this result set, put each page in the cache as the user pages, making sure that the cache is renewed when the data changes or after a while of it not being accessed.
If you don't want to make many round trips and you think you've got the memory, then bung the whole dataset in the cache, but be careful it doesn't melt the web server.
Using the cache means users don't need their own set of data; rather, they use the global set.
As far as concurrency goes, when you load up the insert/edit page you need to populate it with fresh data and renew the cache after the add/update.
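A rough sketch of the per-page cache idea (the key format, the 20-minute sliding window and LoadPageFromDb are assumptions):

using System;
using System.Data;
using System.Web;
using System.Web.Caching;

public static class ResultPager
{
    public static DataTable GetResultsPage(int pageIndex, int pageSize)
    {
        string key = "results_page_" + pageIndex + "_" + pageSize;
        var page = HttpRuntime.Cache[key] as DataTable;
        if (page == null)
        {
            page = LoadPageFromDb(pageIndex, pageSize);   // custom paging query (e.g. ROW_NUMBER)
            HttpRuntime.Cache.Insert(key, page, null,
                Cache.NoAbsoluteExpiration,
                TimeSpan.FromMinutes(20));                // evict pages that stop being accessed
        }
        return page;
    }

    private static DataTable LoadPageFromDb(int pageIndex, int pageSize)
    {
        return new DataTable();   // run the paged query here
    }
}

When the data changes after an add/update, you would remove or refresh the affected keys.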
I'm a big believer in decoupling and I rarely, if ever, see the need to throw a full dataset out to the user interface.
You should really only pass objects to the UI that need to be used, so unless you're showing a big diagram or some data structure that needs to display relationships between data, it's not worth the cost.
Smaller subsets of data, when applicable, are far more efficient. Is your application actually using all the features of a dataset in the UI? If not, then the best way to proceed is to pass out only the data you're displaying.
If you're binding data to a control and sorting/paging etc. through it, you can implement a lot of the interfaces which enable this in a much smaller piece of code.
On that note, I'd keep data which is largely static (i.e. it doesn't update that often) in the cache. So you need to look at how often the data is updated before you can really make a decision on this.
Lastly, I'll say this again: I see the need to use a dataset in the UI very, very rarely. It's a very heavy data object, and the cost/benefit of examining your data requirements versus ease of implementation could far outweigh the need to cache a dataset.
IMHO, datasets are rather bad to use if you're worried about performance or memory utilisation.
Your question doesn't hint at how the data is used. Is it reference data? Is the user actively updating it? Is more than one user meant to have update access at any one time?
Without any better information than you provided, go with the accepted wisdom that says keeping large amounts of data in session is a way to guarantee that your application will not scale and will require hefty resources to serve a handful of people.
There are usually better ways to manage large datasets without resorting to loading all the data in memory.
Failing that, if your application's data needs are truly monstrous, then consider a heavy client with a web service back end. It may be better suited than a web page.
As other answers have noted, the answer "depends". However I will assume that you are talking about data that is shared between users. In that case you should not store the dataset in the user's session, but repopulate on postback.
In addition, you should populate a cache near the data access layer of your application to increase interactive performance and reduce load on the database. The design of your cache will again depend on the type of data (read-only vs. read/write vs. read-mostly).
This approach decouples the user session (a UI-layer concept) from data management, and provides the added benefit of supporting shared use of cached data (in cases where data is common across users).
Neither - don't "keep" anything! Display only what a user needs to see and build an efficient paging scheme to see rows backwards or forwards.
rp
Keep in session. Have option to repopulate if you need to, for example if the session memory is wiped.