Caching large amounts of data - asp.net

I have been reading that lots of people use Redis or another key-value store/NoSQL solution as a distributed cache for their website.
Maybe I'm not understanding completely, but it seems a solution like this only works for shared data. For example, if I have a website that requires a user to log in, and the queries they generate return data specific to only that user (in my case, banking/asset information) that can't be cached for all users, this type of solution doesn't work.
Unfortunately, the database is shared across all our applications, and when it gets bogged down, the website gets bogged down as well. Since each user has gigabytes of information, I obviously can't cache all of it, and each web page queries completely different information.
Is there some caching strategy that I can employ for this type of scenario?

A distributed cache like Velocity doesn't require that the data it stores be limited to "shared" data. But you do have to read the data from your DB and store it in the cache, which takes time.
A few alternatives:
Partition your data, so it's spread out among several DB servers
Add as much RAM as you can to each DB server, to allow SQL Server to cache what it can
There are many variations on the partitioning theme....
Is your web app load balanced? There are caching options at the web tier as well -- the ASP.NET object cache is a good place to start.
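As a rough sketch of what per-user caching at the web tier can look like, here's the read-through pattern against the ASP.NET object cache. The AccountSummary type, the query helper, and the five-minute lifetime are placeholders, not anything from your app:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class UserDataCache
    {
        // Read-through cache: keying the entry by user makes
        // per-user data safe to cache.
        public static AccountSummary GetAccountSummary(int userId)
        {
            string key = "AccountSummary:" + userId;
            var summary = HttpRuntime.Cache[key] as AccountSummary;
            if (summary == null)
            {
                summary = LoadAccountSummaryFromDb(userId); // the expensive query
                HttpRuntime.Cache.Insert(
                    key, summary, null,
                    DateTime.UtcNow.AddMinutes(5),          // short absolute expiry
                    Cache.NoSlidingExpiration);
            }
            return summary;
        }

        private static AccountSummary LoadAccountSummaryFromDb(int userId)
        {
            /* ... run the real query here ... */
            return new AccountSummary();
        }
    }

    public class AccountSummary { /* fields for the user's banking/asset data */ }

The first request per user still pays for the DB round trip; the win is that repeat requests within the expiry window don't.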

It's possible that your web clients are requesting the same data more than once (for a given user). So caching could give a benefit in that case.
But before you go implementing a huge caching solution, you really need to look at the queries that are particularly slow or executed a huge number of times and see if you can optimize them in any way.
Then look at upgrading your DB machine.

I read a nice article about the performance issues MySpace had during their period of huge growth.
You can find the article here.
One quote from the article that stands out:
The addition of the cache servers is "something we should have done from the beginning, but we were growing too fast and didn't have time to sit down and do it," Benedetto adds.
If the problem is in your database server, think about partitioning your data and making use of a database farm to spread the load. Also think about SSDs! They can really speed up your database access.

Depending on how dynamic your data is, you could consider using Fragment Caching. This caches the rendered HTML of the page rather than the data, so if the volume of data is prohibitive to cache, this might work for you.
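For example, in Web Forms, fragment caching is just an OutputCache directive on a user control; the control name, duration, and parameter below are only illustrative:

    <%-- ProductSummary.ascx: only this control's rendered HTML is cached,
         not the whole page and not the underlying data. --%>
    <%@ Control Language="C#" %>
    <%@ OutputCache Duration="120" VaryByParam="accountId" %>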


NoSQL and AppFabric with Azure

I have an ASP.NET application that I'm moving to Azure. In the application, there's a query that joins 9 tables to produce a user record. Each record is then serialized to JSON and sent back and forth with the client. To increase query performance, the first time the 9-join query runs and the record is serialized to JSON, the resulting string is saved to a table called JsonUserCache. The table only has 2 columns: JsonUserRecordID (which is unique) and JsonRecord. Each time a user record is requested by the client, the JsonUserCache table is queried first, to avoid having to run the query with the 9 joins. When the user logs off, the records he created in the JsonUserCache are deleted.
The JsonUserCache table lives in SQL Server. I could simply leave everything as is, but I'm wondering if there's a better way. I'm thinking about creating a simple dictionary to store the key/values and putting that dictionary in AppFabric. I'm also considering a NoSQL provider, if there's an option for Azure, or should I just stick to a dictionary in AppFabric? Or is there another alternative?
Thanks for your suggestions.
"There are only two hard problems in Computer Science: cache invalidation and naming things."
Phil Karlton
You are clearly talking about a cache, and as a general principle you should not persist cached data (in SQL or anywhere else), because you then have the problem of expiring the cache and doing the deletes (as you currently do). If you insist on storing your result somewhere and don't mind the clearing up afterwards, then look at putting it in an Azure blob - it is easily accessible from the browser and doesn't require that the request be handled by your own application.
To implement it as a traditional cache, look at these options.
Use out-of-the-box ASP.NET caching, where you cache in memory on the web role. This means that your join will be re-run on every instance the user lands on, but depending on the number of instances and the duration of the average session, this may be the simplest to implement.
Use AppFabric Cache. This is an extra API to learn and has additional costs which may get quite high if you have lots of unique visitors.
Use a specialised distributed cache such as Memcached. This has the added cost/hassle of having to run it all yourself, but gives you lots of flexibility in the long run.
Edit: All are RAM-based. Using ASP.NET caching is simpler to implement, and retrieving data from the cache is faster because it is on the same machine - BUT it requires the cache to be populated for each instance of the web role (i.e. it is not distributed). AppFabric caching is distributed, but also a bit slower (network latency), and, depending what you mean by scalable, AppFabric caching currently behaves a bit erratically at scale - so make sure you run tests. If you want scalable, feature-rich distributed caching, and it is a big part of your application, go and put in Memcached.
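As a minimal sketch of the first option (in-memory ASP.NET caching on the web role), assuming a BuildUserRecordJson helper that wraps your 9-join query and serialization; the key format and 20-minute sliding window are just placeholders:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class UserRecordCache
    {
        public static string GetUserRecord(string jsonUserRecordId)
        {
            string key = "JsonUser:" + jsonUserRecordId;
            var json = HttpRuntime.Cache[key] as string;
            if (json == null)
            {
                // Cache miss: run the expensive 9-join query and serialize.
                json = BuildUserRecordJson(jsonUserRecordId);

                // A sliding expiration plays roughly the role of your
                // delete-on-logoff, with no cleanup code of your own.
                HttpRuntime.Cache.Insert(key, json, null,
                    Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(20));
            }
            return json;
        }

        private static string BuildUserRecordJson(string id)
        {
            /* ... the 9-join query + JSON serialization ... */
            return "{}";
        }
    }

Note this cache lives per instance, so each web role instance pays for its own first miss.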

.NET Scenario-Based Opinion

I am facing a situation where I have to handle a very heavy traffic load while keeping performance high at the same time. Here is my scenario; please read it and advise me with your valuable opinion.
I am going to have three-way communication between my server, my client, and a visitor. When a visitor visits my client's website, he will be detected and sent to an intermediate Rule Engine that performs some tasks and outputs a filtered list of the different visitors on my server. On the other side, I have a client who will access those lists. My initial idea was to have a Web Service on my server that acts as the Rule Engine and outputs the resulting lists on an ASPX page. But this seems inefficient, because there will be huge traffic coming in and the clients will be continuously requesting data from those lists, so it will be a performance overhead. Kindly suggest what approach I should take so that no deadlocks happen and things work smoothly. I also considered writing to and fetching from an XML file, but that's also not a very good approach in my case.
NOTE: Please remember that no DB will be involved initially; all work will stay outside the DB.
Wow, storing data efficiently without a database will be tricky. What you can possibly consider is the following:
Store the visitor data in an object list of some sort and keep it in the application cache on the server.
Periodically flush this list (say, after 100 items) to a file on the server - possibly storing it as XML for ease of access (you can associate a schema with it as well, to make sure you always get the structure you need). You can perform this file-writing asynchronously, so as to avoid keeping the thread locked while writing the file.
The Web Service sounds like a good idea - make it feed off the XML file. Possibly consider breaking the XML file up into several files as well. You can even cache the contents of this file separately, so the service feeds off the cached data for added performance benefits...
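A rough sketch of the first two points might look like this; the Visitor type, the 100-item threshold, and the file path are all placeholders:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;
    using System.Web.Hosting;
    using System.Xml.Serialization;

    public static class VisitorBuffer
    {
        private static readonly List<Visitor> _buffer = new List<Visitor>();
        private static readonly object _lock = new object();

        public static void Add(Visitor v)
        {
            List<Visitor> toFlush = null;
            lock (_lock)
            {
                _buffer.Add(v);
                if (_buffer.Count >= 100)          // flush threshold
                {
                    toFlush = new List<Visitor>(_buffer);
                    _buffer.Clear();
                }
            }
            if (toFlush != null)
            {
                // Write on a worker thread so the request isn't blocked.
                ThreadPool.QueueUserWorkItem(_ => WriteXml(toFlush));
            }
        }

        private static void WriteXml(List<Visitor> visitors)
        {
            // Overwrites for brevity; a real version would append or roll files.
            string path = HostingEnvironment.MapPath("~/App_Data/visitors.xml");
            var serializer = new XmlSerializer(typeof(List<Visitor>));
            using (var stream = File.Create(path))
                serializer.Serialize(stream, visitors);
        }
    }

    public class Visitor
    {
        public string Id;
        public DateTime SeenAt;
    }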

Using static data in ASP.NET vs. database calls?

We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
Best may be to keep, alongside the DB info, a timestamp or datetime recording when it was last updated. That way the "is my cache still fresh" check is typically very lightweight: just fetch that "latest update" value and compare it with the latest update on which you rebuilt the local cache. Kind of like an HTTP "if modified since" caching strategy, except you'd be implementing most of it DB-client-side ;-).
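Sketched out, that check might look like the following; the TaxRates table, LastUpdated column, and LoadAllRates helper are made up for illustration (and this sketch ignores thread safety):

    using System;
    using System.Data.SqlClient;

    public class FreshnessCheckedCache
    {
        private DateTime _lastUpdate = DateTime.MinValue;
        private object _cachedRates;

        public object GetRates(string connectionString)
        {
            // Cheap query: fetch only the latest update stamp, not the data.
            DateTime latest;
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT MAX(LastUpdated) FROM TaxRates", conn))
            {
                conn.Open();
                latest = (DateTime)cmd.ExecuteScalar();
            }

            // Rebuild the local cache only when the DB says it's stale.
            if (_cachedRates == null || latest > _lastUpdate)
            {
                _cachedRates = LoadAllRates(connectionString); // the expensive query
                _lastUpdate = latest;
            }
            return _cachedRates;
        }

        private object LoadAllRates(string cs) { /* ... */ return new object(); }
    }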
If you decide to cache the data (vs. making a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides expiration functionality, handles multiple concurrent requests, and can even invalidate entries automatically using the query-notification features of SQL Server 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
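As a sketch of that, here's a SqlCacheDependency built from the command itself (the query-notification route, which requires SQL Server 2005+ and a SqlDependency.Start call at app startup); the TaxRates table and column names are invented:

    using System;
    using System.Data.SqlClient;
    using System.Web;
    using System.Web.Caching;

    public static class TaxRateProvider
    {
        public static object GetTaxRates(string connectionString)
        {
            var rates = HttpRuntime.Cache["TaxRates"];
            if (rates == null)
            {
                using (var conn = new SqlConnection(connectionString))
                using (var cmd = new SqlCommand(
                    // Notification rules: explicit columns, two-part table name.
                    "SELECT Rate, Bracket FROM dbo.TaxRates", conn))
                {
                    // Must be attached before the command executes.
                    var dependency = new SqlCacheDependency(cmd);
                    conn.Open();
                    object data = ReadRates(cmd);

                    // The entry is evicted automatically when the table changes.
                    HttpRuntime.Cache.Insert("TaxRates", data, dependency);
                    rates = data;
                }
            }
            return rates;
        }

        private static object ReadRates(SqlCommand cmd)
        {
            /* ... ExecuteReader and materialize the rows ... */
            return new object();
        }
    }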
With caching, a DBMS is plenty efficient with static data anyway, especially with only 5 MB of it.
True, but the point here is to avoid the database roundtrip at all.
ASP.NET Cache is the right tool for this job.
You didn't state how you will find the matching data for a user. If it is as simple as looking up a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging, or worse, searching, then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and LINQ makes things easy too, but it is still not SQL.
(Try to group by 2 columns.)
Sometimes it is a good approach to have the DB return only the keys of a result set, and use the Cache to fill in the complete objects.
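A sketch of that last idea: SQL does the filtering and paging and returns only IDs, while the cache supplies the full rows; GetEmployeeIdsForPage and LoadEmployee are hypothetical stand-ins for your own queries:

    using System.Collections.Generic;
    using System.Web;

    public static class EmployeeRepository
    {
        // SQL handles WHERE/ORDER BY/paging; the cache supplies full rows.
        public static List<Employee> GetPage(int pageIndex)
        {
            var result = new List<Employee>();
            foreach (int id in GetEmployeeIdsForPage(pageIndex))  // cheap, key-only query
            {
                string key = "Employee:" + id;
                var employee = HttpRuntime.Cache[key] as Employee;
                if (employee == null)
                {
                    employee = LoadEmployee(id);                  // miss: load one row
                    HttpRuntime.Cache.Insert(key, employee);
                }
                result.Add(employee);
            }
            return result;
        }

        private static IEnumerable<int> GetEmployeeIdsForPage(int page) { /* ... */ yield break; }
        private static Employee LoadEmployee(int id) { /* ... */ return new Employee(); }
    }

    public class Employee { /* ... */ }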
Think: Premature Optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be left with an "unusual design pattern".
Even with default caching, a DBMS is plenty efficient with static data anyway, especially with only 5 MB of it. And the DBMS partitioning you're describing is often described as an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.

xml parsing / querying performance question for asp.net

I have to port a small Windows Forms application (a product configurator) to an ASP.NET app which will be used on a large company's website. Demand should be moderate, because it's for a specialized product line.
I don't have access to a database and using XML is a requirement from their web developers.
There are roughly 30 different products with roughly 300 different possible configurations stored in the XML files, plus linked questions/answers that lead to a product recommendation. There are also some production options. The app is available in 6 languages.
How would you solve the "data access" layer, if you can call it that? I thought of reading/deserializing the XML files into their objects, storing them in ASP.NET's cache if they're not there already, and then reading from the cache on subsequent requests. But that would mean all the objects live in memory all day and night.
Is that even necessary, or smart, performance-wise? As I said before, the app is not that big and the XML files are not that large. Could I just create some Repository class that reads the XML files whenever an object is requested (i.e. "product details" or "next question") and returns it that way, to drive memory consumption down?
The whole approach seems to be tied to a single server. First consider whether this is appropriate: you mentioned a "large company's website", and that raises a red flag for me. If you need the site to scale, you will end up with more than a single server, which rules out a simple local file.
If you are constrained to using that, analyze which data is more appropriate to keep in cache (it does not change often, it's long-lived, the same info is requested several times). Try to keep the cached stuff separated from the non-cached, which will reduce the amount of info in the more dynamic files. If you expect large amounts of information, consider splitting the files in whatever way is appropriate to your domain.
I use the Cache whenever I can. I cache objects upon their first request. If memory is a concern, I set an expiration policy. And whether it is or not, when memory runs short, the framework will unload cache entries anyway.
Since it is per application and not per user, it makes sense to have it, especially if the relative footprint is small.
If you have to expand to multiple servers later, you can access the same file over the network or modify the DA layer to retrieve data by any other means (services, DB, etc.). The caching code will stay the same and performance will be virtually unaffected.
If you set a file dependency, the cached objects will always stay current.
I'm for it.
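A sketch of that approach: deserialize the file once, and cache the result with a file dependency so that editing the XML evicts the stale entry automatically; the file name and Product type here are illustrative:

    using System.Collections.Generic;
    using System.IO;
    using System.Web;
    using System.Web.Caching;
    using System.Web.Hosting;
    using System.Xml.Serialization;

    public static class ProductRepository
    {
        public static List<Product> GetProducts()
        {
            var products = HttpRuntime.Cache["Products"] as List<Product>;
            if (products == null)
            {
                string path = HostingEnvironment.MapPath("~/App_Data/products.xml");
                var serializer = new XmlSerializer(typeof(List<Product>));
                using (var stream = File.OpenRead(path))
                    products = (List<Product>)serializer.Deserialize(stream);

                // File dependency: saving a new products.xml drops this entry,
                // so the next request re-reads the fresh file.
                HttpRuntime.Cache.Insert("Products", products,
                    new CacheDependency(path));
            }
            return products;
        }
    }

    public class Product
    {
        public string Name;
        public decimal Price;
    }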
Using the cache, and setting an appropriate expiration policy as advised by others is a sound approach. I'd suggest you look at using LINQ to XML as the basis for your data access code as it is so much easier to use than traditional methods of querying XML. You can find a decent introduction here.
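For instance, a LINQ to XML query over such a file might look like this; the element and attribute names are invented:

    using System;
    using System.Linq;
    using System.Xml.Linq;

    class ProductQuery
    {
        static void Main()
        {
            var doc = XDocument.Load("products.xml");

            // All product names in one category, read straight from the XML.
            var names = from p in doc.Descendants("product")
                        where (string)p.Attribute("category") == "valves"
                        select (string)p.Element("name");

            foreach (var name in names)
                Console.WriteLine(name);
        }
    }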

What to put in a session variable

I recently came across an ASP.NET 1.1 web application that put a whole heap of stuff in the session variable - including all the DB data objects and even the DB connection object. It ends up being huge. When the web session times out (four hours after the user has finished using the application), sometimes their database transactions get rolled back. I'm assuming this is because the DB connection is not being closed properly when IIS kills the session.
Anyway, my question is: what should be in the session variable? Clearly some things need to be in there. The user selects which plan they want to edit on the main screen, so the plan id goes into the session variable. Is it better to try and reduce the load on the DB by storing all the details about the user (and their manager etc.) and the plan they are editing in the session variable, or should I try to minimise the stuff in the session variable and query the DB for everything I need in the Page_Load event?
This is pretty hard to answer because it's so application-specific, but here are a few guidelines I use:
Put as little as possible in the session.
User-specific selections that should only last during a given visit are a good choice
Often, variables that need to be accessible from multiple pages throughout the user's visit to your site (to avoid passing them from page to page) are also good to put in the session.
From what little you've said about your application, I'd probably select your data from the db and try to find ways to minimize the impact of those queries instead of loading down the session.
Do not put database connection information in the session.
As far as caching, I'd avoid using the session for caching if possible -- you'll run into issues where someone else changes the data a user is using, plus you can't share the cached data between users. Use the ASP.NET Cache, or some other caching utility (like Memcached or Velocity).
As far as what should go in the session, anything that applies to all browser windows a user has open to your site (login, security settings, etc.) should be in the session. Things like what object is being viewed/edited should really be GET/POST variables passed around between the screens so a user can use multiple browser windows to work with your application (unless you'd like to prevent that).
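To make that split concrete, a page might look roughly like this; PlanEdit.aspx and the variable names are examples, not anything from the question:

    using System;
    using System.Web.UI;

    public partial class PlanEdit : Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Shared across every window of this user's visit -> session.
            string userName = (string)Session["UserName"];

            // Identifies what this particular window is editing -> query string,
            // so two browser windows can edit two plans side by side:
            // PlanEdit.aspx?planId=42
            int planId = int.Parse(Request.QueryString["planId"]);

            // ... load the plan for planId and render ...
        }
    }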
DO NOT put UI objects in session.
Beyond that, I'd say it varies. Too much in session can slow you down if you aren't using in-process session state, because you'll be serializing a lot, plus paying the cost of the provider. Cache and Session should be used sparingly and carefully. Don't put something in session just because you can or because it's convenient. Sit down and analyze whether it makes sense.
Ideally, the session in ASP should store the least amount of data that you can get away with. Storing a reference to any object that is holding system resources open (particularly a database connection) is a definite scalability killer. Also, storing uncommitted data in a session variable is just a bad idea in most cases. Overall it sounds like the current implementation is abusively using session objects to try and simulate a stateful application in a supposedly stateless environment.
Although it is much maligned, the ASP.NET model of managing state automatically through hidden fields should really eliminate the majority of the need to keep anything in session variables.
My rule of thumb is that the more scalable (in terms of users/hits) that the app needs to be, the less you can get away with using session state. There is, however, a trade-off. For web applications where the user is repeatedly accessing the same data and typically has a fairly long session per use of the site, some caching (if necessary in session objects) can actually help scalability by reducing the load on the DB server. The idea here is that it is much cheaper and less complex to farm the presentation layer than the back-end DB. Of course, with all things, this advice should be taken in moderation and doesn't apply in all situations, but for a fairly simple in-house CRUD app, it should serve you well.
A very similar question was asked regarding PHP sessions earlier. Basically, Sessions are a great place to store user-specific data that you need to access across several page loads. Sessions are NOT a great place to store database connection references; you'd be better to use some sort of connection pooling software or open/close your connection on each page load. As far as caching data in the session, this depends on how session data is being stored, how much security you need, and whether or not the data is specific to the user. A better bet would be to use something else for caching data.
Storing navigation cues in sessions is tricky. The same user can have multiple windows open, and then changes get propagated in a confusing manner. DB connections should definitely not be stored: ASP.NET maintains the connection pool for you, so there's no need to resort to your own sorcery. If you need to cache stuff for short periods and the data set is relatively small, look into ViewState as a possible option (at the cost of adding more bulk to the page).
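If you do reach for ViewState for small, page-scoped state, it's just a dictionary on the page; the SearchResults page and CurrentPage property here are invented for illustration:

    using System;
    using System.Web.UI;

    public partial class SearchResults : Page
    {
        // Survives postbacks of *this* page only, and travels in the
        // page's hidden field rather than in server memory.
        private int CurrentPage
        {
            get { return (int?)ViewState["CurrentPage"] ?? 0; }
            set { ViewState["CurrentPage"] = value; }
        }

        protected void NextPage_Click(object sender, EventArgs e)
        {
            CurrentPage = CurrentPage + 1;
            // ... rebind the results grid for the new page ...
        }
    }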
Data that relates to only one user, e.g. a username or a user ID. At most, an object representing the user. Sometimes URL-related data (like where to take somebody next) or an error-message stack is useful to push into the session.
If you want to share stuff potentially between different users, use the Application store or the Cache. They're far superior.
Stephen,
Do you work for a company that starts with "I", that has a website that starts with "BC"? That sounds exactly like what I did when I first started developing in .net (and was young and stupid) -- I crammed everything I could think of in session and application. Needless to say, that was double-plus ungood.
In general, eschew session as much as possible. Certainly, non-serializable objects shouldn't be stored there (database connections and such), but even big, serializable objects shouldn't be either. You just don't want the overhead.
I would always keep very little information in session. Sessions use server memory, which is expensive. Saving too many values in session increases the load on the server, and eventually the performance of the site will go down. When you use load-balanced servers, usage of session can run into problems. So what I do is use minimal or no session, use cookies if the information is not very critical, make more use of hidden fields, and use database-backed sessions.
