What is preferable, keeping a dataset in session or filling the dataset on each postback?
That would depend on many factors. It is usually best not to keep too many items in session memory if your session is inproc or on a state server because it is less scalable.
If your session resides on the database, it might be better to just requery and repopulate the dataset unless the query is costly to execute.
Don't use the session!!! If the user opens a second tab with a different request the session will be reused and the results will not be the ones that he/she expects. You can use a combination of ViewState and Session but still measure how much you can handle without any sort of caching before you resort to caching.
It depends how heavily you will do that and where your session data is stored on the server.
Keeping the datasets in session surely affects memory usage. All of the session variables are stored in memory, or in SQL server if you use SQL Server state. You can also offload the load to a different machine using the State Server mode.
The serialization/deserialization aspect. If you use insanely large datasets it could influence server seide performance badly.
On the other hand having very large viewstates on pages can be a real performance killer. So I would keep datasets in session or in the cache and look for an "outproc" way to store that session data.
Since I want as few database operations as possible, I will keep the data in session, but I'm not sure what would be the best way to do it. How do I handle concurrency and also since the data is shared when two users simultaneously use the page how can I make sure each of them have their own set of data in the session?
I usually keep it in session if it is not too big and/or the db is far and slow. If the data is big, in general, it's better to reload.
Sometimes I use the Cache, I load from Db the first time and put it in the cache. During postback I check the chache and if it is empty I reload.
So the server manage the cache by itself.
the trouble with the session is that if it's in proc it could disappear which isn't great, if it's state server then you have to serialize and if it's sql sate your're doing a round trip anyway!
if the result set is large do custom paging so that you are only returning a small subset of the total results.
then if you think more than one user will see this result set put each page in the cache as the user pages making sure that the cache is renewed when the data changes or after a while of it not being accessed.
if you don't want to make many round trips and you think you've got the memory then bung the whole dataset in the cache but be careful it doesn't melt the web server.
using the cache means users don't need their own set of data rather they use the global set.
as far as concurrency goes when you load up the insert/ edit page you need to populate it with fresh data and renew the cache after the add/update.
I'm a big believer in decoupling and i rarely, if ever, see the need to throw a full dataset out to the user interface.
You should really only pass objects to the UI which needs to be used so unless you're showing a big diagram or some sort of data structure which needs to display relationships between data it's not worth the cost.
Smaller subsets of data, when applicable, is far more efficient. Is your application actually using all features within a dataset on the UI? if not, then the best way to proceed would be to only pass the data out that you're displaying.
If you're binding data to a control and sorting/paging etc through it, you can implement a lot of the interfaces which enables the dataset to support this, in a lot smaller piece of code.
On that note, i'd keep data, which is largely static (e.g. it doesn't update that often) in the cache. So you need to look at how often the data is updated before you can really make a decision for this.
Lastly, i'll say this again, i see the need to utilise a dataset in the UI very, very rarely..it's a very heavy data object and the cost benefits of looking at your data requirements, versus ease of implementation, could far outweigh the need to cache a dataset.
IMHO, datasets are rather bad to use if you're worried about performance or memory utilisation.
Your answer doesn't hint at the use of the data. Is it reference data? Is the user actively updating it? Is more than one user meant to have update access at any one time?
Without any better information than you provided, go with the accepted wisdom that says keeping large amounts of data in session is a way to guarantee that your application will not scale and will require hefty resources to serve a handful of people.
There's are usually better ways to manage large datasets without resorting to loading all the data in-memory.
Failing that, if your application data needs are truly monsterous, then consider a heavy client with a web service back-end. It may be better suited than making a web page.
As other answers have noted, the answer "depends". However I will assume that you are talking about data that is shared between users. In that case you should not store the dataset in the user's session, but repopulate on postback.
In addition, you should populate a cache near the data access layer of your application to increase interactive performance and reduce load on the database. The design of your cache will again depend on the type of data (read-only vs. read/write vs. read-mostly).
This approach decouples the user session (a UI-layer concept) from data management, and provides the added benefit of supporting shared use of cached data (in cases where data is common across users).
Neither--don't "keep" anything! Display only what a user needs to see and build an efficient paging scheme to see rows backwards or fowards.
rp
Keep in session. Have option to repopulate if you need to, for example if the session memory is wiped.
Related
Say I need to populate 4 or 5 dropdowns w/ items from a database. Each drop down will have < 15 items in it. These items almost never change.
Now I could query the DB each time the page is accessed or I could grab the values from a custom class that would check to see if they already exist in ASP.Net's cache and only if they don't query the DB to update the cache.
It's trivial for me to write but I'm unsure if the performace would be better or not. I think it would be (although not likely anything huge).
What do you think?
When dealing with performance issues you should always:
Do things the simplest way first (avoid premature optimisation)
Performance test your code with set performance goals (e.g. 200ms response time under load of N concurent users)
Then, IF your code doesn't perform then profile your code to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world performance change will be.
Having said that then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database), however it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are etc...
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
That's what System.Web.Helpers.WebCache class exists for.
IO is usually more expensive than memory operations (by orders of magnitude). Especially if your database is in another machine, then you would even be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize in the end when you have really identified it as a performance bottleneck by measuring.
Quick answer to your question:
Use the built in .Net cache.
Additional points to ponder over..
Preferably, retrieve all master data in a single database retrieval (think stored procedure and dataset): though, I do not advocate the used of stored procs in all scenarios.
As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database
Also, as your drop down values do not change very often; do remember to keep a long expiry duration
Finally, based on your page design you could also look at Fragment Caching (partial page caching: user controls) which could give you bigger benefits since now you neither access the data cache nor the database.
Performance:
Again, the performance depends more on the application's load as compared to your direct round trips for fetching the master data. Put simply, As Thomas suggested use the cache class!
So there seems not be any pretty answer to the question of how pass data between multiple pages. After having done a little homework here's why (or at least what I've gleaned):
ViewState variables don't persist across pages.
Session variables are volatile and must be used sparingly.
Cookies have potential safety issues and take time and must be kept small.
Storing vars in the URL has limits to the amount of data and can be unsafe.
Storing vars temporarily in a db is a real pita because you add one table per object that might be potentially passed to another page.
So far it is looking like I will be using hidden fields to pass a keyid and unique id to the next page and then retrieve the data from the db. What are your thoughts on all of this? What is the best way to go about doing any of it? I am early in the development of this app, so making changes now is preferred.
edit: I am anticipating a lot of users using this application at any one time, does that affect whether or not I should be using SQL Server based Session?
If you want to persist state, yes store it in the database since you don't have to worry about an expiration. The Session is similar except you have to worry about the Session expiring. In both cases concurrent calls that write similar data to the same area could cause problems and need to be accounted for.
Session is good when you don't have to worry about multiple web servers or timeout issues. The database gives you more scalability but at a cost of doing lots of db read/writes and you have to consider clean up.
Personally I would try to use the following decision tree:
Is the data simple, short and not private -> query string
Is the data less simple but only needs to exist for a short time -> session
Will the data be needed across multiple area and be persistent for long period of time -> database
Of course there is more to it that this but that should give you a basic outline of considerations since you are just starting out. Keep it simple. Don't try to over engineer a solution if a simple query string will suffice. You can always over engineer late as long as you have kept it simple to start.
I think context is important here, e.g. what are you trying to pass between pages and why?
If you are dealing with complex, multi-part forms, then you can implement the form in a single page, simply showing or hiding relevant element. Use usercontrols and custom controls as much as possible to facilitate isolation and reusability. This makes life a lot easier across the board.
Anything that is user-generated is almost certainly going to end up in a database anyway - so #5 does not seem relevant. That is you shouldn't have to store data "temporarily" in a database- what data would need to be persisted between pages that isn't part of your application.
Anything else would seem to be session related and not that much data.
I could add some more thoughts if I knew what specifically you were dealing with.
Oh - "cookies have potential safety issues and take time" - you're going to use cookies, unless you don't want to be able to identify return visitors. Any potential safety issues would only be a result of bad implementation, and certainly passing data in hidden fields is no better. And you really don't want to get into writing an ASP.NET app that is designed around pages posting to forms other than itself. That's just a headache for many reasons and I can't think of a benefit of doing this as part of basic application design.
Session variables should work fine for your needs.
I would go with StateServer or SQLServer Session state mode. Using mode InProc is the fastest, but it has some issues (including all user sessions getting dropped when a new binary is pushed, web.config changes, etc). Sessions are volatile, but you can control the volatility in several ways. Sessions require cookies unless they are configured as cookieless (which I highly recommend you stay away from), but I think that is a reasonable requirement.
Also, you can create a struct or serializable class from which you create objects that you can store in a session variable. The struct or class will allow you to keep all of your data in one place - you only have one session variable to worry about.
There is going to be advantages and disadvantages for any method, it's all about finding the best method. I hope this helps.
All methods have their pros and cons. It would all depend on the scenario you are working in.
Session variables work quite well if used within reason. InProc sessions in traffic heavy sites can quickly drain your resources but you can always switch to SQL Server based session that does most of the DB work for you.
I have method in my BLL that interacts with the database and retrieves data based on the defined criteria.
The returned data is a collection of FAQ objects which is defined as follows:
FAQID,
FAQContent,
AnswerContent
I would like to cache the returned data to minimize the DB interaction.
Now, based on the user selected option, I have to return either of the below:
ShowAll: all data.
ShowAnsweredOnly: faqList.Where(Answercontent != null)
ShowUnansweredOnly: faqList.Where(AnswerContent != null)
My Question:
Should I only cache all data returned from DB (e.g. FAQ_ALL) and filter other faqList modes from cache (= interacting with DB just once and filter the data from the cache item for the other two modes)? Or should I have 3 cache items: FAQ_ALL, FAQ_ANSWERED and FAQ_UNANSWERED (=interacting with database for each mode [3 times]) and return the cache item for each mode?
I'd be pleased if anyone tells me about pros/cons of each approach.
Food for thought.
How many records are you caching, how big are the tables?
How much mid-tier resources can be reserved for caching?
How many of each type data exists?
How fast filtering on the client side will be?
How often does the data change?
how often is it changed by the same application instance?
how often is it changed by other applications or server side jobs?
What is your cache invalidation policy?
What happens if you return stale data?
Can you/Should you leverage active cache invalidation, like SqlDependency or LinqToCache?
If the dataset is large then filtering on the client side will be slow and you'll need to cache two separate results (no need for a third if ALL is the union of the other two). If the data changes often then caching will return stale items frequently w/o a proactive cache invalidation in place. Active cache invalidation is achievable in the mid-tier if you control all the updates paths and there is only one mid-tier instance application, but becomes near really hard if one of those prerequisites is not satisfied.
It basically depends how volatile the data is, how much of it there is, and how often it's accessed.
For example, if the answered data didn't change much then you'd be safe caching that for a while; but if the unanswered data changed a lot (and more often) then your caching needs might be different. If this was the case it's unlikely that caching it as one dataset will be the best option.
It's not all bad though - if the discrepancy isn't too huge then you might be ok cachnig the lot.
The other point to think about is how the data is related. If the FAQ items toggle between answered and unanswered then it'd make sense to cache the base data as one - otherwise the items would be split where you wanted it together.
Alternatively, work with the data in-memory and treat the database as an add-on...
What do I mean? Well, typically the user will hit "save" this will invoke code which saves to the DB; when the next user comes along they will invoke a call which gets the data out of the DB. In terms of design the DB is a first class citizen, everything has to go through it before anyone else gets a look in. The alternative is to base the design around data which is held in-memory (by the BLL) and then saved (perhaps asynchronously) to the DB. This removes the DB as a bottleneck but gives you a new set of problems - like what happens if the database connection goes down or the server dies with data only in-memory?
Pros and Cons
Getting all the data in one call might be faster (by making less calls).
Getting all the data at once if it's related makes sense.
Granularity: data that is related and has a similar "cachability" can be cached together, otherwise you might want to keep them in separate cache partitions.
I'm doing a survey builder, and I think to store the survey in session with a unique guid key until the user creates it fully and saves it I'm thinking it is going to be an array of 100~200 objects (8 properties class)
That sounds like a fair use of Session.
Whether your data is too large depends on quite a few things, such as your web server's memory. The best thing to do is test the performance using Session. If you find your data is too heavy for Session have a look at ASP.NET Profile.
That doesn't sound like that much data unless we are talking about reams of text for each answer. I would not worry about it unless I was working on a web site expected to have thousands of these open at any given point in time.
IMHO I think the data should be stored in something other than the Session.
Session objects can disappear for a myriad of reasons. Would your users be annoyed if their answers are not persisted and needs to start afresh.
Remember to write the data to a persistent store (DB, files, etc) as soon as possible unless the users don't mind starting over.
I agree with ggonsalv. I would store the data somewhere just in case the session is lost. I have been to sites where I fill stuff out and then it looses it near the end. It's not fun to start over.
We use ViewState to store data that is being edited by users. However, ViewState just got too large so we really want something that is faster.
We considered session, but it has to be manually collected when users travel among pages.
Does anybody has suggestions?
Update:
Actually I am using Asp.net. They reason we do not want to use session is:
1. We don't need to carry our data among pages.
2. When developer put something into session, he has to remember to delete it if it is no longer useful. Otherwise, the session will get bigger and bigger. This is kind of trival.
You say you want the data stored server side and its should be automatically available?
You could trick the viewstate into storing its data in session rather than a hidden field by using this technique:
http://aspadvice.com/blogs/robertb/archive/2005/11/16/13835.aspx
You might find this article interesting as well which shows another technique to store your viewstate server side:
http://msdn.microsoft.com/en-us/magazine/cc188774.aspx#S6
Despite the initial complexity of getting this set up I think it would be the best solution because then you don't have to change your code throughout, it can still use ViewState as normal without realising this is now saved on the server.
You can look into Conversations.
http://evolutionarygoo.com/blog/?p=68
http://code.google.com/p/google-guice/issues/detail?id=5
You can use the database as a alternate to the session store. This would scale in terms of the size of the data stored, and if you use an appropriate caching strategy you can reduce the overhead of retrieving the data a great deal.