We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
Best may be to keep, alongside the DB info, a timestamp or datetime recording when it was last updated. That way, the check for "is my cache still fresh" is typically very lightweight: just fetch that "latest update" value and compare it with the one your local cache was last rebuilt from. It's rather like an HTTP "if modified since" caching strategy, except you'd be implementing most of it DB-client-side;-).
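For concreteness, here is a minimal C# sketch of that check. The TableVersions table, its columns, and the loader method are hypothetical stand-ins for whatever audit scheme you use, and the statics would need locking in real code:

    using System;
    using System.Data.SqlClient;

    public static class TaxRateCache
    {
        private static object _cachedRates;   // cached copy of the static tables
        private static DateTime _builtFrom;   // "latest update" value the cache was built from

        public static object GetRates(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                // Cheap freshness probe: one scalar read instead of
                // re-running the full queries thousands of times per session.
                var cmd = new SqlCommand(
                    "SELECT LastUpdated FROM TableVersions WHERE TableName = 'TaxRates'",
                    conn);
                var lastUpdated = (DateTime)cmd.ExecuteScalar();

                if (_cachedRates == null || lastUpdated > _builtFrom)
                {
                    _cachedRates = LoadRates(conn); // full reload only when stale
                    _builtFrom = lastUpdated;
                }
            }
            return _cachedRates;
        }

        private static object LoadRates(SqlConnection conn)
        {
            // ... run the real SELECTs against the tax rate tables ...
            return new object();
        }
    }

In practice you'd also run the probe itself only every N minutes rather than on every call.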
If you decide to cache the data (vs. make a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides expiry functionality, handles multiple concurrent requests, and can even invalidate the cache automatically using the query notification features of SQL Server 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
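A minimal sketch of that approach, using the table-polling form of SqlCacheDependency. "TaxRatesDb" is a hypothetical entry you'd configure under <sqlCacheDependency> in web.config, and LoadTaxRates stands in for the existing data access code:

    using System;
    using System.Data;
    using System.Web;
    using System.Web.Caching;

    public static class StaticTableCache
    {
        public static DataTable GetTaxRates()
        {
            var rates = (DataTable)HttpRuntime.Cache["TaxRates"];
            if (rates == null)
            {
                rates = LoadTaxRates(); // your existing database call
                HttpRuntime.Cache.Insert(
                    "TaxRates",
                    rates,
                    new SqlCacheDependency("TaxRatesDb", "TaxRates"), // evicted on table change
                    DateTime.UtcNow.AddHours(24),  // fallback absolute expiry
                    Cache.NoSlidingExpiration);
            }
            return rates;
        }

        private static DataTable LoadTaxRates()
        {
            // ... query the tax rate table ...
            return new DataTable();
        }
    }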
With caching, a DBMS is plenty efficient with static data anyway, especially with only 5MB of it.
True, but the point here is to avoid the database roundtrip at all.
ASP.NET Cache is the right tool for this job.
You didn't state how you will find the matching data for a user. If it is as simple as looking up a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging, or worse, searching, then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and LINQ makes things easy too, but it is still not SQL.
(Try to group by 2 columns -- see the sketch below.)
Sometimes a good approach is to have the DB return only the keys of a result set and use the cache to fill in the complete rows.
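For instance, grouping a cached in-memory list by two columns in LINQ is doable, but less natural than the SQL GROUP BY it replaces. cachedOrders and its properties here are made up for illustration:

    var totals = cachedOrders
        .GroupBy(o => new { o.CustomerId, o.Year }) // anonymous type as composite key
        .Select(g => new
        {
            g.Key.CustomerId,
            g.Key.Year,
            Total = g.Sum(o => o.Amount)
        });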
Think: Premature Optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be left with an "unusual design pattern".
With even default caching, a DBMS is plenty efficient with static data anyway, especially with only 5MB of it. And the DBMS partitioning you're describing is often considered an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.
I have an ASP.NET application that I'm moving to Azure. In the application, there's a query that joins 9 tables to produce a user record. Each record is then serialized to JSON and sent back and forth with the client. To increase query performance, the first time the 9-table query runs and the record is serialized to JSON, the resulting string is saved to a table called JsonUserCache. The table only has 2 columns: JsonUserRecordID (which is unique) and JsonRecord. Each time a user record is requested by the client, the JsonUserCache table is queried first to avoid the query with the 9 joins. When the user logs off, the records he created in the JsonUserCache are deleted.
The JsonUserCache table lives in SQL Server. I could simply leave everything as is, but I'm wondering if there's a better way. I'm thinking about creating a simple dictionary that'll store the key/values and putting that dictionary in AppFabric. I'm also considering a NoSQL provider, and wondering whether there's a good option for Azure or whether I should just stick to a dictionary in AppFabric. Or is there another alternative?
Thanks for your suggestions.
"There are only two hard problems in Computer Science: cache invalidation and naming things."
Phil Karlton
You are clearly talking about a cache, and as a general principle you should not persist any cached data (in SQL or anywhere else), as you then have the problem of expiring the cache and having to do the deletes (as you currently do). If you insist on storing your result somewhere and don't mind the clearing up afterwards, then look at putting it in an Azure blob -- this is easily accessible from the browser and doesn't require that the request be handled by your own application.
To implement it as a traditional cache, look at these options.
Use out of the box ASP.NET caching, where you cache in memory on the web role. This means that your join will be re-run on every instance that the user goes to, but depending on the number of instances and the duration of the average session, it may be the simplest to implement.
Use AppFabric Cache. This is an extra API to learn and has additional costs which may get quite high if you have lots of unique visitors.
Use a specialised distributed cache such as Memcached. This has the added cost/hassle of having to run it all yourself, but gives you lots of flexibility in the long run.
Edit: All are RAM based. Using ASP.NET caching is simpler to implement and is faster to retrieve the data from cache because it is on the same machine -- BUT it requires the cache to be populated for each instance of the web role (i.e. it is not distributed). AppFabric caching is distributed but is also a bit slower (network latency) and, depending on what you mean by scalable, AppFabric caching currently behaves a bit erratically at scale -- so make sure you run tests. If you want scalable, feature-rich distributed caching, and it is a big part of your application, go and put in Memcached.
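As a rough sketch of the first option, the JsonUserCache table collapses into something like this, where RunNineTableQuery is a stand-in for the existing join-and-serialize step:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class JsonUserRecordCache
    {
        public static string GetJsonRecord(string userId)
        {
            string key = "JsonUser:" + userId;
            var json = (string)HttpRuntime.Cache[key];
            if (json == null)
            {
                json = RunNineTableQuery(userId); // the expensive 9-table join + JSON
                // A sliding expiry roughly matching session length replaces the
                // explicit delete-on-logoff: idle entries just fall out.
                HttpRuntime.Cache.Insert(key, json, null,
                    Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(30));
            }
            return json;
        }

        private static string RunNineTableQuery(string userId)
        {
            // ... existing query + serialization ...
            return "{}";
        }
    }

The trade-off, as noted above, is that each web role instance builds and holds its own copy.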
I have been reading that lots of people use Redis or another key-value store/NoSQL solution as a distributed cache for their website.
Maybe I'm not understanding completely, but it seems a solution like this only works for shared data. For example, if I have a website that requires a user to log in, and the queries they generate return data specific to only that user (in my case, banking/asset information) that can't be cached for all users, then this type of solution doesn't work.
Unfortunately, the database is shared across all our applications, and when it gets bogged down, the website gets bogged down as well. Since each user has gigabytes of information, I obviously can't cache all of it, and each web page queries completely different information.
Is there some caching strategy that I can employ for this type of scenario?
A distributed cache like Velocity doesn't require that the data it stores be limited to "shared" data. But you do have to read the data from your DB and store it in the cache, which takes time.
A few alternatives:
Partition your data, so it's spread out among several DB servers
Add as much RAM as you can to each DB server, to allow SQL Server to cache what it can
There are many variations to the partitioning theme....
Is your web app load balanced? There are caching options at the web tier as well -- the ASP.NET object cache is a good place to start.
It's possible that your web clients are requesting the same data more than once (for a given user). So caching could give a benefit in that case.
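The trick is simply to scope the cache key to the user. A sketch, assuming a generic distributed-cache client with Get/Put methods (the client API and all names here are illustrative):

    // Per-user data cached under a user-scoped key: nothing is shared
    // between users, and a distributed cache is still perfectly happy.
    string key = "account-summary:" + userId;
    var summary = (AccountSummary)cache.Get(key);
    if (summary == null)
    {
        summary = LoadAccountSummary(userId);             // the slow DB query
        cache.Put(key, summary, TimeSpan.FromMinutes(5)); // short TTL keeps balances fresh
    }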
But before you go implementing a huge caching solution, you really need to look at the queries that are particularly slow or executed a huge number of times and see if you can optimize them in any way.
Then look at upgrading your DB machine.
I read a nice article about the performance issues MySpace had when they were growing extremely fast.
You can find the article here.
One quote from the article that stands out:
The addition of the cache servers is "something we should have done from the beginning, but we were growing too fast and didn't have time to sit down and do it," Benedetto adds.
If the problem is in your database server, think about partitioning your data and making use of a database farm to spread the load. Also think about SSDs! They can really speed up your database access code.
Depending on how dynamic your data is, you could consider using fragment caching. This caches the HTML of the page rather than the data, so if the volume of data is prohibitive to cache, this might work for you.
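For example, a single directive at the top of a user control (.ascx) caches its rendered HTML for a minute:

    <%-- Fragment caching: the control's output is cached, not the data behind it. --%>
    <%@ OutputCache Duration="60" VaryByParam="None" %>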
I've done a search on this subject already, and have found the same data over and over-- a review of the three different types of sessions. (InProc, Sql, StateServer) However, my question is of a different nature.
Specifically, what is the advantages/disadvantages of using the built in .NET session in the first place?
Here is why I am asking: a fellow .NET developer has told me to NEVER use the built-in Microsoft Session. Not at all. Not even to create a custom Session State Provider. His reasoning is that having Session turned on in IIS makes all of your requests happen synchronously, and he says that enabling session degrades the performance of a web server.
His solution to this is to create a session yourself-- a class that stores all values you need and is serialized in and out of the database. He advises that you store the unique ID to reference this in a cookie or a querystring variable. In our environment, using a DB to store the sessions is a requirement because all the pages we make are on web farms, and we use Oracle-- so I agree with that part.
Does using the built in Session degrade performance more than a home-built Session? Are there any security concerns with this?
So to sum it all up, what are the advantages/disadvantages?
Thanks to all who answer!
My experience has been that the session is a good means of managing state when you use it appropriately. However, it's often misused, which causes the "never ever use the session" sentiment shared by many developers.
I and many other developers have run into major performance issues when we mistakenly used the session to store large amounts of data from a database, so as to "save a trip." This is bad. Storing 2000 user records per session will bring the web server to its knees when more than a couple of users use the application. Session should not be used as a database cache.
Storing an integer, however, per session is perfectly acceptable. Small amounts of data representing how the current user is using your application (think shopping cart) is a good use of session state.
To me, it's really all about managing state. If done correctly, then session can be one of many good ways to manage state. It should be decided at the beginning how to manage state, though. Most often, we've run into trouble when someone decides to just "throw something in the session".
I found this article to be really helpful when using out-of-process modes, and it contains some tips that I would have never thought of on my own. For example, rather than marking a class as serializable, storing its primitive datatype members in separate session variables, and then recreating the object can improve performance.
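As a sketch of that tip (the cart class and key names are made up): instead of pushing a [Serializable] object graph into out-of-process session, store the primitive members individually and rebuild the object on the way out:

    using System.Web.SessionState;

    public class CartSummary
    {
        public int ItemCount;
        public decimal Total;
    }

    public static class CartSessionHelper
    {
        public static void Save(HttpSessionState session, CartSummary cart)
        {
            // Primitives round-trip through out-of-process session far more
            // cheaply than a serialized object graph.
            session["Cart.ItemCount"] = cart.ItemCount;
            session["Cart.Total"] = cart.Total;
        }

        public static CartSummary Load(HttpSessionState session)
        {
            if (session["Cart.ItemCount"] == null) return null;
            return new CartSummary
            {
                ItemCount = (int)session["Cart.ItemCount"],
                Total = (decimal)session["Cart.Total"]
            };
        }
    }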
Firstly, your colleague is implementing his own DB-backed session management system. I do not see what advantage this has over using the built-in session state stored in a database (MS SQL is the default; there is no reason not to use Oracle instead).
Is his solution better than the built-in one? Unlikely. It's way more work for you, for a start. Here's a simple illustration of why. Let's say you use cookies to store your ID: how do you cope with a user who turns off cookies? If you are using ASP.NET's session state there's no problem, as it will fall back to using the query string. With your colleague's idea you have to roll your own.
There is a very valid question as to whether you should have session state at all. If you can design your application not to need any session state, you will have a much easier time scaling and testing. Obviously you may have application state which needs to live beyond a session anyway (the simple case being user names and passwords), but you have to store that data regardless of whether you have session state.
The MS implementation of Session State is not evil in and of itself... it is how some developers use it. As mentioned above, using the built-in session state provider means that you don't have to reinvent the security, aging, and concurrency issues. Just don't start jamming lots of garbage in the session because you're too lazy to figure out a better way to manage state and page transitions. Session doesn't scale really well... if each user on your site stuffs a bunch of objects in the session, and those objects take up a tiny bit of the finite memory available to your app, you'll run into problems sooner rather than later as your app grows in popularity. Use session in the manner for which it was designed: a token to represent that a user is still "using" your site. When you start to venture beyond that, either because of ignorance or laziness, you're bound to get burned.
You should be judicious in your use of Session, since multiple requests to the same Session object will usually be queued: see "Concurrent requests and session state" http://msdn.microsoft.com/en-us/library/ms178581.aspx.
Note that you can set EnableSessionState to ReadOnly to allow concurrent read access to session state.
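For example, in the @ Page directive of any page that reads but never writes session state:

    <%-- Read-only session access: these requests are not queued behind the
         exclusive session lock. --%>
    <%@ Page Language="C#" EnableSessionState="ReadOnly" %>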
This queuing is a good thing, as it means developers can use Session without being concerned about synchronization.
I would not agree with your colleague's recommendation to "never" use Session and I certainly wouldn't consider rolling my own.
First, a browser will only make two requests to a given hostname at a given time, and for the most part those requests are for static content (JS files, CSS, etc.). So the serializing of requests for dynamic content isn't nearly the issue one might think. Also, I think this may be confused with Classic ASP, where pages that use Session are definitely serialized; I don't believe this is the case with ASP.NET.
With ASP.NET session state (SQL mode, state server, or custom) you have an implementation that is standard and consistent throughout an application. If you don't need to share session information with anything else, this is your best bet. If you need to share information with other application environments (PHP, Swing/Java, Classic ASP, etc.), rolling your own may be worth considering.
Another point in favor of the built-in methodology is that a lot of developer attention has gone into its performance and design, compared to anything you would roll yourself, even with a custom provider.
Are there any security concerns with this?
If you roll your own you'll have to handle Session Fixation and Hijacking attacks, whereas using the built-in Session I think they are handled for you (but I could be wrong).
The home-made session you have described does nothing different from the "SQL" mode of .NET sessions, and in my experience I don't think sessions degrade your performance in any way. Building your own session manager will require putting in several other plumbing tasks as well: security, flushing it out, etc.
The advantage of the built-in sessions is that they're easy to use, with all this plumbing already taken care of. With "SQL" mode you can persist the session data in a database, thus allowing you to run your app on web farms without any issues.
We designed a B2B ecommerce app for a Fortune 57 company which processes over 100k transactions a day, and we used sessions [SQL mode] quite extensively without any problems whatsoever.
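For reference, the built-in "SQL" mode is just a web.config setting. A minimal sketch (the connection string is illustrative, and the ASPState database must first be created with aspnet_regsql.exe -ssadd):

    <!-- Session state persisted to SQL Server, safe for a web farm. -->
    <system.web>
      <sessionState mode="SQLServer"
                    sqlConnectionString="Data Source=dbserver;Integrated Security=SSPI"
                    timeout="20" />
    </system.web>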
Correct me if I am wrong:
The primary advantage of storing Session state in a db, e.g., SQL Server, is that you are not consuming memory resources, but instead storing it to disk in a db.
The disadvantage is that you take an IO hit to retrieve this info from the database each time you need it (or maybe SQL Server even does some magic caching of the data for you based on recently executed queries?).
In any event, the price of an IO to retrieve the session info from a DB on each trip to the web server seems like a safer strategy for sites that encounter a lot of traffic.
I want to cache custom data in an ASP.NET application. I am putting lots of data into it, such as List<object> collections and other objects.
Is there a best practice for this? If I use static data and the w3wp.exe worker process dies or gets recycled, the cache will need to be filled again.
The database is also getting updated by other applications, so a thread would be needed to make sure the cache holds the latest data.
Update 1:
Just found this, which probably helps me:
http://www.codeproject.com/KB/web-cache/cachemanagementinaspnet.aspx
Update 2:
I am using DotNetNuke as the application ( :( ). I have enabled persistent caching and now the whole application feels sluggish.
For example, a MultiView takes about 3 seconds to swap views.
Update 3:
Strategies for Caching on the Web?
Linked to this, I am using the DotNetNuke caching method, which in turn uses the ASP.NET Cache object; it also has file-based caching.
I have a helper:
    ' The arguments mirror ASP.NET Cache.Insert: a key namespaced by label,
    ' the object itself, no dependency, no absolute or sliding expiration,
    ' and NotRemovable priority so the scavenger should leave it alone.
    CachingProvider.Instance().Add( _
        (label & "|") & key, _
        newObject, _
        Nothing, _
        Cache.NoAbsoluteExpiration, _
        Cache.NoSlidingExpiration, _
        CacheItemPriority.NotRemovable, _
        Nothing)
That call adds the objects to the cache; is this correct? I want to keep them cached as long as possible. I have a thread which runs every X minutes to update the cache, but I have noticed the cache is getting emptied (I check for a "CacheFilled" marker object in the cache).
As a test I've told the worker process not to recycle, etc., but it still seems to clear out the cache. I have also changed the DotNetNuke caching setting from "heavy" to "light", but I think that one is for module caching.
You are looking for either out of process caching or a distributed caching system of some sort, based upon your requirements. I recommend distributed caching, because it is very scalable and is dedicated to caching. Someone else had recommended Velocity, which we have been evaluating and thoroughly enjoying. We have written several caching providers that we can interchange while we are evaluating different distributed caching systems without having to rebuild. This will come in handy when we are load testing the various systems as part of the final evaluation.
In the past, our legacy application has been a random assortment of cached items. There have been DataTables, DataViews, Hashtables, Arrays, etc., and there was no logic to what was used at any given time. We have started to move to just caching collections of our domain objects (which are POCOs). Using generic collections is nice, because we know that everything is stored the same way. It is very simple to run LINQ operations on them, and if we need a specialized "view" to be stored, the system is efficient enough that we can store a specific collection of objects.
We also have put an abstraction layer in place that pretty much brokers calls between either the DAL or the caching model. Calls through this layer will check for a cache miss or cache hit. If there is a hit, it will return from the cache. If there is a miss, and the call should be cached, it will attempt to cache the data after retrieving it. The immediate benefit of this system is that in the event of a hardware or software failure on the machines dedicated to caching, we are still able to retrieve data from the database without having a true outage. Of course, the site will perform slower in this case.
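A stripped-down sketch of that broker layer; the ICache interface stands in for whichever caching client is plugged in:

    using System;

    public interface ICache
    {
        object Get(string key);
        void Put(string key, object value);
    }

    public class CachingBroker
    {
        private readonly ICache _cache;
        public CachingBroker(ICache cache) { _cache = cache; }

        public T Get<T>(string key, Func<T> loadFromDal) where T : class
        {
            T item = null;
            try { item = _cache.Get(key) as T; }       // cache hit?
            catch (Exception) { /* cache tier down: fall through to the DAL */ }

            if (item == null)
            {
                item = loadFromDal();                   // miss: load from the database
                try { _cache.Put(key, item); }          // best-effort re-populate
                catch (Exception) { /* site just runs slower, no outage */ }
            }
            return item;
        }
    }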
Another thing to consider, in regards to distributed caching systems, is that since they are out of process, you can have multiple applications use the same cache. There are some interesting possibilities there, involving sharing data between applications, real-time manipulation of data, etc.
Also have a look at the MS Enterprise Library Caching Application Block, which allows you to write custom expiration policies, custom stores, etc.
http://msdn.microsoft.com/en-us/library/cc309502.aspx
You can also check out "Velocity", which is available at
http://code.msdn.microsoft.com/velocity
This will be useful if you wish to scale your application across servers...
There are lots of articles about the Cache object in ASP.NET and how to make it use SqlDependencies and other types of cache expirations. No need to write your own. And using the Cache is recommended over session or any of the other collections people used to cram lots of data into.
Cache and Session can lead to sluggish behaviour, but sometimes they're the right solutions: the rule of right tool for right job applies.
Personally I've often created collections in pseudo-static singletons for the kind of role you describe (typically to avoid I/O overheads, e.g. by storing a compiled XslTransform), but it's very important to keep in mind that that kind of cache is fragile. Design for it to (a) file-watch or otherwise monitor what it's supposed to cache where appropriate, and (b) recreate/populate itself with use -- it should expect to get flushed frequently.
Essentially I recommend it as a performance crutch, but don't rely on it for anything requiring real persistence.
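As an illustration of both (a) and (b), a sketch of the compiled-transform case, using the ASP.NET Cache rather than a true static -- the file dependency covers the file-watching, and recreate-on-miss covers the flushing:

    using System;
    using System.Web;
    using System.Web.Caching;
    using System.Xml.Xsl;

    public static class TransformCache
    {
        public static XslCompiledTransform Get(string virtualPath)
        {
            string key = "xslt:" + virtualPath;
            var xslt = (XslCompiledTransform)HttpRuntime.Cache[key];
            if (xslt == null) // flushed or never built: recreate on demand
            {
                string path = HttpContext.Current.Server.MapPath(virtualPath);
                xslt = new XslCompiledTransform();
                xslt.Load(path);
                // The file dependency evicts the cached copy when the .xslt changes.
                HttpRuntime.Cache.Insert(key, xslt, new CacheDependency(path));
            }
            return xslt;
        }
    }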
I recently came across an ASP.NET 1.1 web application that put a whole heap of stuff in the session variable -- including all the DB data objects and even the DB connection object. It ends up being huge. When the web session times out (four hours after the user has finished using the application), sometimes their database transactions get rolled back. I'm assuming this is because the DB connection is not being closed properly when IIS kills the session.
Anyway, my question is what should be in the session variable? Clearly some things need to be in there. The user selects which plan they want to edit on the main screen, so the plan id goes into the session variable. Is it better to try and reduce the load on the DB by storing all the details about the user (and their manager etc.) and the plan they are editing in the session variable or should I try to minimise the stuff in the session variable and query the DB for everything I need in the Page_Load event?
This is pretty hard to answer because it's so application-specific, but here are a few guidelines I use:
Put as little as possible in the session.
User-specific selections that should only last during a given visit are a good choice.
Often, variables that need to be accessible to multiple pages throughout the user's visit to your site (to avoid passing them from page to page) are also good to put in the session.
From what little you've said about your application, I'd probably select your data from the db and try to find ways to minimize the impact of those queries instead of loading down the session.
Do not put database connection information in the session.
As far as caching, I'd avoid using the session for caching if possible -- you'll run into issues where someone else changes the data a user is using, plus you can't share the cached data between users. Use the ASP.NET Cache, or some other caching utility (like Memcached or Velocity).
As far as what should go in the session, anything that applies to all browser windows a user has open to your site (login, security settings, etc.) should be in the session. Things like what object is being viewed/edited should really be GET/POST variables passed around between the screens so a user can use multiple browser windows to work with your application (unless you'd like to prevent that).
DO NOT put UI objects in session.
Beyond that, I'd say it varies. Too much in session can slow you down if you aren't using in-process session, because you are going to be serializing a lot, plus the speed of the provider. Cache and Session should be used sparingly and carefully. Don't just put things in session because you can or because it's convenient. Sit down and analyze whether it makes sense.
Ideally, the session in ASP should store the least amount of data that you can get away with. Storing a reference to any object that is holding system resources open (particularly a database connection) is a definite scalability killer. Also, storing uncommitted data in a session variable is just a bad idea in most cases. Overall it sounds like the current implementation is abusively using session objects to try and simulate a stateful application in a supposedly stateless environment.
Although it is much maligned, the ASP.NET model of managing state automatically through hidden fields should really eliminate the majority of the need to keep anything in session variables.
My rule of thumb is that the more scalable (in terms of users/hits) that the app needs to be, the less you can get away with using session state. There is, however, a trade-off. For web applications where the user is repeatedly accessing the same data and typically has a fairly long session per use of the site, some caching (if necessary in session objects) can actually help scalability by reducing the load on the DB server. The idea here is that it is much cheaper and less complex to farm the presentation layer than the back-end DB. Of course, with all things, this advice should be taken in moderation and doesn't apply in all situations, but for a fairly simple in-house CRUD app, it should serve you well.
A very similar question was asked regarding PHP sessions earlier. Basically, sessions are a great place to store user-specific data that you need to access across several page loads. Sessions are NOT a great place to store database connection references; you'd be better off using some sort of connection pooling software, or opening and closing your connection on each page load. As far as caching data in the session goes, this depends on how session data is being stored, how much security you need, and whether or not the data is specific to the user. A better bet would be to use something else for caching data.
Storing navigation cues in sessions is tricky. The same user can have multiple windows open, and then changes get propagated in a confusing manner. DB connections should definitely not be stored; ASP.NET maintains the connection pool for you, no need to resort to your own sorcery. If you need to cache stuff for short periods and the data set size is relatively small, look into ViewState as a possible option (at the cost of loading more bulk onto the page size).
A: Data that is only relative to one user, i.e. a username or a user ID. At most an object representing a user. Sometimes URL-relative data (like where to take somebody) or an error message stack are useful to push into the session.
If you want to share stuff potentially between different users, use the Application store or the Cache. They're far superior.
Stephen,
Do you work for a company that starts with "I", that has a website that starts with "BC"? That sounds exactly like what I did when I first started developing in .net (and was young and stupid) -- I crammed everything I could think of in session and application. Needless to say, that was double-plus ungood.
In general, eschew session as much as possible. Certainly, non-serializable objects shouldn't be stored there (database connections and such), but even big, serializable objects shouldn't be either. You just don't want the overhead.
I would always keep very little information in session. Sessions use server memory, which is expensive; saving too many values in session increases the load on the server, and eventually the performance of the site will go down. When you use load-balanced servers, usage of session can run into problems. So what I do is use minimal or no sessions, use cookies if the information is not very critical, use hidden fields more, and use database-backed sessions.