I have an asp.net web site with 10-25k visitors a day (peaks of over 60k before holidays). Pages/visit is also high, since it's a content site.
I have a few specific pages which generate about 60% of the traffic. These pages are a bit complex and are DB heavy (sql server 2008 r2 backend).
I was wondering if it's worth "caching" a static version of these pages (I hear this is possible) and only re-render them when something changes (about once in 48hs).
Does this sound like a good idea? Where would be the best place to implement this?
(asp.net, iis, db)
Update: Looks like a good option for me is outputcache with SqlDependency. I see a reference to some kind of SQL server notification for invalidating the cache, but I only see talk of SQL server 2005. Has this option been deprecated by Microsoft? Any new way to handle this?
Caching is a broad term that can happen at a number of different points. The optimum solution may be a combination of some or all.
For example, you can add page, or output caching as described here, which caches output on the web server, which I think is what you were referring to.
In addition, you can cache the data in memory using something like memcached, so that your data is more available to the web server as it builds the page, but you need to look at cache hit rate to know for sure that you are caching the right data.
Also, although slightly off the topic of improving db heavy pages, you can cache static resources that change infrequently like images, css and include files using a content delivery network. Any CDN will almost certainly have a higher bandwidth and a cheaper data plan than your own connection because of the economies of scale, so the more of your content you can serve from there the better, in general.
Your first question was "I was wondering if it's worth "caching" a static version of these pages". I guess the answer to that depends on whether there is a performance problem at the moment, and where the cause of that problem is. If the pages are being served quickly and reliably, then quite possibly it's not worth implementing caching. If there is a performance problem, then where is it? Is it in db read time, or is it in the time spent building the page once the data has been returned?
I don't have much experience in caching, but this is what I would try to do:
I would look at your stats and run some profiles, see which are the most heavily visited pages that run the most expensive SQL queries. Pick one or two of the most expensive pages.
If the page is pseudo static, that is, no data on it such as your logged in username, no comments, etc etc, you can cache the entire page. You can set a relatively long cache as well, anything from 1 min to a few hours.
If the page has some dynamic real time content on it, such as comments, you can identify the static controls and cache those individually. Don't put a page wide cache on.
Good luck, sounds like a cache could improve performance.
Caching may or may not help. For example, if a site has low traffic and if the caching is enabled, the server processes to create the cache before serving the request. And because the traffic is low, there can be enough delay between successive requests. So the cached version may even expire and the server again creates a new cached version. This process makes the response even slower than normal.
Read more: Caching - the good, the bad.
I have myself experienced this issue.
If the traffic is good, caching may help you have better load times.
Cheers
Aditya
Related
I am just getting in to the more intricate parts of web development. This may not be in the best place. However, when is it best to get load balancing for a web project? I understand that it depends on good design/bad design as to how many users you can get to visit a site without it REALLY effecting the performance. However, I am planning to code a new project that could potentially have a lot of users and I wondered if I should be thinking off the bat about load balancing. Opinions welcome; thanks in advance!
I should not also that the project most likely will be asp.net (webforms or mvc not yet decided) with backend of mongodb or pgsql(again still deciding).
Load balancing can also be a form of high availability. What if your web server goes down? It can take a long time to replace it.
Generally, when you need to think about throughput you are already rich because you have an enormous amount of users.
Stackoverflow is serving 10m unique users a month with a few servers (6 or so). Think about how many requests per day you had if you were constantly generating 10 HTTP responses per second for 8 hot hours: 10*3600*8=288000 page impressions per day. You won't have that many users soon.
And if you do, you optimize your app to 20 requests per second and CPU core which means you get 80 requests per second on a commodity server. That is a lot.
Adding a load balancer later is usually easy. LBs can tag each user with a cookie so they get pinned to one particular target. You app will not notice the difference. Usually.
Is this for an e-commerce site? If so, then the real question to ask is "for every hour that the site is down, how much money are you losing?" If that number is substantial, then I would make load balancing a priority.
One of the more-important architecture decisions that I have seen affect this, is the use of session variables. You need to be able to provide a seamless experience if your user ends-up on different servers during their visit. Session variables won't transfer from server to server, so I would avoid using them.
I support a solution like this at work. We run four (used to be eight) .NET e-commerce websites on three Windows 2k8 servers (backed by two primary/secondary SQL Server 2008 databases), taking somewhere around 1300 (combined) orders per day. Each site is load-balanced, and kept "in the farm" by a keep-alive. The nice thing about this, is that we can take one server down for maintenance without the users really noticing anything. When we bring it back, we re-enable our replication service and our changes get pushed out to the other two servers fairly quickly.
So yes, I would recommend giving a solution like that some thought.
The parameters here that may affect the one the other and slow down the performance are.
Bandwidth
Processing
Synchronize
Have to do with how many user you have, together with the media you won to serve.
So if you have to serve a lot of video/files to deliver, you need many servers to deliver it. Let say that you do not have, what is the next think that need to check, the users and the processing.
From my experience what is slow down the processing is the locking of the session. So one big step to speed up the processing is to make a total custom session handling and your page will no lock the one the other and you can handle with out issue too many users.
Now for next step let say that you have a database that keep all the data, to gain from a load balance and many computers the trick is to make local cache of what you going to show.
So the idea is to actually avoid too much locking that make the users wait the one the other, and the second idea is to have a local cache on each different computer that is made dynamic from the main database data.
ref:
Web app blocked while processing another web app on sharing same session
Replacing ASP.Net's session entirely
call aspx page to return an image randomly slow
Always online
One more parameter is that you can make a solution that can handle the case of one server for all, and all for one :) style, where you can actually use more servers for backup reason. So if one server go off for any reason (eg for update and restart), the the rest can still work and serve.
As you said, it depends if/when load balancing should be introduced. It depends on performance and how many users you want to serve. LB also improves reliability of your app - it will not stop when one system goes crashing down. If you can see your project growing to be really big and serve lots of users I would sugest to design your application to be able to be upgraded to LB, so do not do anything non-standard. Try to steer away of home-made solutions and always follow good practice. If later on you really need LB it should not be required to change your app.
UPDATE
You may need to think ahead but not at a cost of complicating your application too much. Do not go paranoid and prepare everything to work lightning fast 'just in case'. For example, do not worry about sessions - session management can be easily moved to SQL Server at any time and this is the way to go with LB. Caching will also help if you hit some bottlenecks in the future but you do not need to implement it straight away - good design (stable interfaces), separation and decoupling will allow for the cache to be added later on. So again - stick to good practices, do not close doors but also do not open all of them straight away.
You may find this article interesting.
I have been reading that lots of people use Redis or another key-value store/NoSQL solution as a distributed cache for their website.
Maybe I'm not understanding completely, but it seems a solution like this only works for shared data. For example, if I have a website that requires a user to log-in and the queries they generate return data specific to only that user (in my case, banking/asset information) that can't be cached for all users, this type of solution doesn't work.
Unfortunately, the database is shared across all our applications and when it get bogged down, the website gets bogged down as well. Since each user has gigabytes of information, I obviously can't cache all of that and each web page queries completely different information.
Is there some caching strategy that I can employ for this type of scenario?
A distributed cache like Velocity doesn't require that the data it stores be limited to "shared" data. But you do have to read the data from your DB and store it in the cache, which takes time.
A few alternatives:
Partition your data, so it's spread out among several DB servers
Add as much RAM as you can to each DB server, to allow SQL Server to cache what it can
There are many variations to the partitioning theme....
Is your web app load balanced? There are caching options at the web tier as well -- the ASP.NET object cache is a good place to start.
It's possible that your web clients are requesting the same data more than once (for a given user). So caching could give a benefit in that case.
But before you go implementing a huge caching solution, you really need to look at the queries that are particularly slow or executed a huge number of times and see if you can optimize them in any way.
Then look at upgrading your DB machine.
I read a nice article about the performance issues that MySpace had when they had a huge growth.
You can find the article here.
One quote from the article that stands out:
The addition of the cache servers is "something we should have done
from the beginning, but we were growing too fast and didn't have time
to sit down and do it," Benedetto adds
If the problem is in your database server think about partitioning your data and making use of a database farm to spread the load. Also think about SSD's! They can really speed up your database access code.
Depending how dynamic your data is you could consider using Fragment Caching. This will cache the HTML of the page rather than the data so if the volume of data is prohibtive to cache then this might work for you
I have a asp.net web site for our company and handles about 1000 - 2000 users every day. Now the site will have about 4-5000 users every day. We are putting it to two servers and put them in the hardware load balanced environment.
I am wondering if there is anything else I should do from the ASP.net web site perspective to handle the larger users.
Thanks.
Some things I'd take into consideration..
Session state management - are you going to do it out-of-process? If so, make sure everything being stored in Session is serializable.
Do you have a large number (or any? some may argue) update panels being used or many standard server-side postbacks? If so, try to convert what you can to simple AJAX requests and marshal raw/JSON data back and forth. This will minimize on the number of full page life cycles and data traffic on the server.
On the front-end/UI side, try to leverage CSS sprites, so that you go to the server for the images once and never again.
For database connectivity, make sure you leverage connection pooling.
You may also want to consider js and css minification.
Additionally, these pages has some good tips:
http://msdn.microsoft.com/en-us/magazine/cc163854.aspx (a bit outdated, but still somewhat relevant)
http://developer.yahoo.com/performance/rules.html
First of all you should profile your application against bottleneck - if there is any place in your code which makes your application slow then adding new servers won't help. There are many profilers - I recommend JetBrains Dot Trace (there is a free trial for couple of days).
Second thing is OutputCache - the shortest explanation is "store html that is sent to the users, not recreate it every time. There is a huge number of articles about OutputCache so I don't think you need any link here.
If the traffic is even bigger you can think about using some solution for caching your responses around the world (read e.g. about Akamai) but I don't suppose you will need it with couple thousands of visitors daily.
To make my extranet web application even faster/more scalable I think of using some of the caching mechanisms. For some of the pages we'll use HTML caching, please ignore that technique for this question.
E.g.: at some point in time 2500 managers will simultaneously login on our application
(most of them with the same Account/Project)
I think of storing an Account-cachekey and Project-cachekey into the user's Session and use that to get the item from the Cache.
I could have simply stored the Account into the session, but that would result in 2500 of the same Accounts in memory.
Is there a better solution to this or does it make sense :)?
Generally adding items to session is seen as having a negative impact on scalability. Depending on your technology, you may have a problem scaling out to more than 1 server when using session variables (eg in classic asp).
Having said that, if performance is your top priority you could cache data in both session and application variables. I have always thought that its not worth the hassle for a dataset of this size becuase sql server will almost certainly have this data cached in memory and all you are saving is a network round trip.
Lastly, look at code and hardware optimisation first for performance enhancements. Migrating to managed/compiled code, reducing the size of your html, optimising your images, minifying JavaScript, and of course the html caching you mentioned previously - these are all things I would consider first.
My first time really getting into caching with .NET so wanted to run a couple of scenarios by you.
Question 1: Many expensive objects
I've got some small objects (simple int/string properties) which are pretty expensive to instantiate. These are user statistic objects which each user may have 1 - 10 of. Is it good or bad practice to fill up the cache with these fellas?
Question 2: Few cheap regularly used objects
Also got a few objects (again small) which are used many times on every page load. Is the cache designed to be accessed so regularly?
Fanks!
stackoverflow: Cracking question suggestion tool btw.
1) I would cache them. You can always set CacheItemPriority.Low if you are worrying about the cache 'filling up'
2) Yes the cache is designed to be accessed regularly. It can lead to huge performance improvements.
The answer to both of your questions is to cache them aggressively if you can.
Objects that are expensive to instantiate yet relatively static (that is unchanging) throughout the application's life ought to be cached. Even relatively inexpensive objects should be cached if you can improve performance by doing so.
You may find yourself running into problems when you need to invalidate the cache when any of these objects become stale or obsolete. Cache invalidation can be a difficult problem especially in a multi-server environment.
I don't think there is any problem with hitting the cache too frequently...
Overall asp.net Caching is fairly intelligent int terms of deciding what to keep and generally managing space. As a rule though I wouldn't depend on the cache to store information, only use it as an alternative to hitting disk or DB. User objects may be better served by session state.
https://web.archive.org/web/20211020111559/https://aspnet.4guysfromrolla.com/articles/100902-1.aspx
is a great article explaining the built in capabilities of .net caching.
Let's address the question in your title - when caching is too much.
It's too much if you are putting so much in the cache that it is pushing other things away. If the web sites on the server in total is using more memory than there is physical memory, they will push each other out into the virtual memory that is stored on disk. That will practically mean that you are caching some of the objects on disk instead of in memory, which is a lot slower.
It's too much if you are putting so many objects in the cache that they push each other out, so that you rarely get to use any of the objects in the cache before they go away.
So, generally you can cache a lot before you reach the limit where there is no point in putting anything more in the cache.
When determine what's most benificial to cache, consider where the bottle necks are. If you for example have a database server with a lot more capacity than the web server, caching the database results doesn't save so much resources. Getting data from the database takes time, but it doesn't use much resources on the web server while waiting for it, so it will not affect the throughput much.