I have a dataset that contains around 500 records with 7 columns in each record. I would like to use objectdatasource and caching. I am sure that the data doesn't change. But there will be lots of users accessing the data. My question is whether it would be a good idea to cache the data that has 500 records. is it optimal or not? I think the objectdatasource caches the data per user. so I am wondering whether the data with this size would have performance issues if I cache the data.
Thanks,
sridhar.
The answer really depends on the size of the records, not just the number of them. In any event, ASP.NET's caching algorithm uses a Least Recently Used (LRU) algorithm to evict out items that have not been used recently to make room for additional items. Thus, unless your site is either extremely busy or your result set extremely large, you should be fine to cache the result set and let ASP.NET's cache handle the details of keeping the most used data in the cache and expiring the least used data.
You might also consider using Page level Output Caching if nothing else on the page is user-specific, as this would likely give you slightly more performance for minimal additional memory usage. Other areas to investigate are the control, SqlDependencies, and of course if the data is read-only, be sure you are disabling ViewState on whatever control you're using to display the data.
Related
What is considering good practice using datatables in an asp.net applications?
I need to make multiple queries everytime the user clicks a control. Is it better to go directly to the sql server table or load that data in a datatable and use LINQ to get the data. In this case the table has 10 columns and a 3000+ rows.
That's really a fairly complex question (without a whole lot of detail here). At the highest level, you're trying to balance the optimization of holding data in memory vs. factors like concurrency and memory utilization. I'd bet if you did a little reading on caching strategies, you'd start to get a sense for how you can weigh these tradeoffs.
DataTable is okay, using SqlDataAdapter is slow compared to SqlDataReader. I like to read data into my own custom structures for easy retrieval.
10 columns * 3000 rows is very small and you'd be fine keeping that in memory if it was important data. if you assume 1k per cell, that is only 30k, tiny, and if you have a lot of traffic to the page it will be faster to much faster depending on the speed of the query that retrieves the data from the database.
One thing to keep in mind is you will probably need to think about refreshing your data from time to time, or managing changes to the data. ASP has a Cache object that you can use for this purpose, it allows you set expiration times in various ways.
If the data is subject to change very often and from many different sources, it can be complicated to manage concurrency of changes. When I use a caching strategy I try to use it on non critical data that isn't subject to constant change. This isn't to say it's impossible to cache data that changes a lot, it's just more complicated.
Use data cache, depending on your application size and complexity you will decide on a distributed system or not:
http://www.25hoursaday.com/weblog/CommentView.aspx?guid=3109dc37-49f8-4249-baf1-56d4c6158321
http://www.infoq.com/news/2007/07/memcached
Say I need to populate 4 or 5 dropdowns w/ items from a database. Each drop down will have < 15 items in it. These items almost never change.
Now I could query the DB each time the page is accessed or I could grab the values from a custom class that would check to see if they already exist in ASP.Net's cache and only if they don't query the DB to update the cache.
It's trivial for me to write but I'm unsure if the performace would be better or not. I think it would be (although not likely anything huge).
What do you think?
When dealing with performance issues you should always:
Do things the simplest way first (avoid premature optimisation)
Performance test your code with set performance goals (e.g. 200ms response time under load of N concurent users)
Then, IF your code doesn't perform then profile your code to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world performance change will be.
Having said that then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database), however it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are etc...
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
That's what System.Web.Helpers.WebCache class exists for.
IO is usually more expensive than memory operations (by orders of magnitude). Especially if your database is in another machine, then you would even be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize in the end when you have really identified it as a performance bottleneck by measuring.
Quick answer to your question:
Use the built in .Net cache.
Additional points to ponder over..
Preferably, retrieve all master data in a single database retrieval (think stored procedure and dataset): though, I do not advocate the used of stored procs in all scenarios.
As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database
Also, as your drop down values do not change very often; do remember to keep a long expiry duration
Finally, based on your page design you could also look at Fragment Caching (partial page caching: user controls) which could give you bigger benefits since now you neither access the data cache nor the database.
Performance:
Again, the performance depends more on the application's load as compared to your direct round trips for fetching the master data. Put simply, As Thomas suggested use the cache class!
I have an application that holds data referencing 300,000 customers. When a user did a search the result was often bigger than our MaxRequestlength would allow, we have dealt with this in two ways: We have increased our MaxRequestLength to 102400 (KB) and required the user to supply two letters of the first Name and two letters of the last name, to limit the sheer # of customer records returned. This keeps us from exceeding the MaxRequestLength limit.
I was just wondering if anyone had any insight in to whether this was a particularly good approach, whether there is a limit to how big MaxRequestLength could be or should be, and what other options might be useful in this situation.
Most web applications I have seen deal with this by returning a paginated list, and displaying only the first page of results.
In modern implementations using ORM's, "Skip" and "Take" operators are used to retrieve only those records which are required for a given page.
So any given request is no longer than the number of records on one page.
I would recommend paging the results instead of displaying everything. I would also suggest adding multiple search fields allowing your users to filter their results even further. This will allow your user to find what they are looking for faster.
As you can guess from my comment, I think MaxRequestLength only restricts the size of the request (-> the amount of data sent from the client/browser to the server).
If you are exceeding this limit, then this probably means that you have a huge ViewState which is sent with every response. ViewState is stored in a hidden field on the page and is sent back to the server with every PostBack (and that's where the MaxRequestLength setting could come into play). You can easily check this by looking at the source of your page in the web browser and looking for a hidden INPUT element with the name "__VIEWSTATE" and a large string-value.
If this is the case, the you should try to reduce the size of the ViewState, e.g. by
setting ViewState="false" on your controls (GridView or whatever) and re-binding the control on every PostBack (this is the recommended approach)
storing the ViewState on the server side
compressing the ViewState
If your requirements allow it, I would suggest implementing server-side paging. That way you only send one page worth of records over the wire rather than the entire record set.
300,000 records is a completely unusable result set from a human perspective.
As others have said, page the results to something like the top 50 or 100 records. Let them sort it and provide a way to narrow the search criteria.
For perspective, look at google. They default to 10 records per page. Part of the reason for this is that people would rather provide more criteria than go spelunking through a large result set.
What is preferable, keeping a dataset in session or filling the dataset on each postback?
That would depend on many factors. It is usually best not to keep too many items in session memory if your session is inproc or on a state server because it is less scalable.
If your session resides on the database, it might be better to just requery and repopulate the dataset unless the query is costly to execute.
Don't use the session!!! If the user opens a second tab with a different request the session will be reused and the results will not be the ones that he/she expects. You can use a combination of ViewState and Session but still measure how much you can handle without any sort of caching before you resort to caching.
It depends how heavily you will do that and where your session data is stored on the server.
Keeping the datasets in session surely affects memory usage. All of the session variables are stored in memory, or in SQL server if you use SQL Server state. You can also offload the load to a different machine using the State Server mode.
The serialization/deserialization aspect. If you use insanely large datasets it could influence server seide performance badly.
On the other hand having very large viewstates on pages can be a real performance killer. So I would keep datasets in session or in the cache and look for an "outproc" way to store that session data.
Since I want as few database operations as possible, I will keep the data in session, but I'm not sure what would be the best way to do it. How do I handle concurrency and also since the data is shared when two users simultaneously use the page how can I make sure each of them have their own set of data in the session?
I usually keep it in session if it is not too big and/or the db is far and slow. If the data is big, in general, it's better to reload.
Sometimes I use the Cache, I load from Db the first time and put it in the cache. During postback I check the chache and if it is empty I reload.
So the server manage the cache by itself.
the trouble with the session is that if it's in proc it could disappear which isn't great, if it's state server then you have to serialize and if it's sql sate your're doing a round trip anyway!
if the result set is large do custom paging so that you are only returning a small subset of the total results.
then if you think more than one user will see this result set put each page in the cache as the user pages making sure that the cache is renewed when the data changes or after a while of it not being accessed.
if you don't want to make many round trips and you think you've got the memory then bung the whole dataset in the cache but be careful it doesn't melt the web server.
using the cache means users don't need their own set of data rather they use the global set.
as far as concurrency goes when you load up the insert/ edit page you need to populate it with fresh data and renew the cache after the add/update.
I'm a big believer in decoupling and i rarely, if ever, see the need to throw a full dataset out to the user interface.
You should really only pass objects to the UI which needs to be used so unless you're showing a big diagram or some sort of data structure which needs to display relationships between data it's not worth the cost.
Smaller subsets of data, when applicable, is far more efficient. Is your application actually using all features within a dataset on the UI? if not, then the best way to proceed would be to only pass the data out that you're displaying.
If you're binding data to a control and sorting/paging etc through it, you can implement a lot of the interfaces which enables the dataset to support this, in a lot smaller piece of code.
On that note, i'd keep data, which is largely static (e.g. it doesn't update that often) in the cache. So you need to look at how often the data is updated before you can really make a decision for this.
Lastly, i'll say this again, i see the need to utilise a dataset in the UI very, very rarely..it's a very heavy data object and the cost benefits of looking at your data requirements, versus ease of implementation, could far outweigh the need to cache a dataset.
IMHO, datasets are rather bad to use if you're worried about performance or memory utilisation.
Your answer doesn't hint at the use of the data. Is it reference data? Is the user actively updating it? Is more than one user meant to have update access at any one time?
Without any better information than you provided, go with the accepted wisdom that says keeping large amounts of data in session is a way to guarantee that your application will not scale and will require hefty resources to serve a handful of people.
There's are usually better ways to manage large datasets without resorting to loading all the data in-memory.
Failing that, if your application data needs are truly monsterous, then consider a heavy client with a web service back-end. It may be better suited than making a web page.
As other answers have noted, the answer "depends". However I will assume that you are talking about data that is shared between users. In that case you should not store the dataset in the user's session, but repopulate on postback.
In addition, you should populate a cache near the data access layer of your application to increase interactive performance and reduce load on the database. The design of your cache will again depend on the type of data (read-only vs. read/write vs. read-mostly).
This approach decouples the user session (a UI-layer concept) from data management, and provides the added benefit of supporting shared use of cached data (in cases where data is common across users).
Neither--don't "keep" anything! Display only what a user needs to see and build an efficient paging scheme to see rows backwards or fowards.
rp
Keep in session. Have option to repopulate if you need to, for example if the session memory is wiped.
I have an objectdatasource that will return a potentially large collection (up to 200,000 records) that are bound and paged in a gridview. I am using default paging and caching on the objectdatasource. The data being returned is only updated weekly so stale data is not an issue. The paging in this solution was also faster than when I created a solution using custom paging.
My questions are: Is caching a record set this large acceptable to you? If not, why? Are there any performance counters that you use to see the impact on memory that your cached data is creating?
Thanks!
to answer your questions:
1) Yes caching a large data set is ok. Particulary is generating the data set is more expensive then caching it. Also since this is fairly static data that makes this a good candidate.
2) As for performance counters that sort of depends on the caching mechanism you use. If your use Enterprise Librarie's caching block for example it has counters built in. As for general counters watch the memory counters, working set, private bytes etc...