I've read lots of material on how to do ASP.Net caching but little on the optimal duration that pages should be cached for.
Let's say that I have a popular site with 50,000 pages. The content does not change frequently, so I could cache pages for up to an hour if I wanted. The server has 16 GB of RAM, but database connections are limited.
How long should pages be cached for?
My thinking is that if I set the cache duration too high (let's say 60 minutes), I will fill up memory with a fraction of the total content, which will continually be shuffled in and out of memory.
Furthermore, let's say that 10% of the pages are responsible for 90% of traffic. If the popular pages are hit every second, and the unpopular ones every hour, then a 60 second cache would only keep the load-intensive content cached without sacrificing freshness.
Should numerous but rarely-accessed content be cached at all?
I don't think you'll ever find a published formula or strict guideline on optimizing your cache. Every situation is going to be radically different, and even with just the number of variables you're talking about here, it's impossible to quantify.
It'll be a combination of experience, guessing, monitoring and incremental adjustments to find your caching sweet spot for any given application.
It sounds like you've already got a handle on what might happen, so I would start with caching the 10% that represent your high traffic, and optimize from there. You may or may not find any performance gains further down the road with your less-used content, but put your effort into the major optimizations first.
One option would be to utilize some sort of disk caching, rather than memory caching, at least for infrequently visited pages. But in general, the point of caching is to support a request that is likely to be made again in a short time. So I would certainly give preference to the frequently requested pages.
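To put rough numbers on the trade-off from the question, here is a back-of-the-envelope sketch. It assumes a deliberately simple model that is not from the original posts: each page is requested at a perfectly regular interval, and a cache miss stores the page for a fixed time-to-live. The function names are made up for illustration.

```python
import math


def ttl_hit_ratio(request_interval_s, ttl_s):
    """Cache hit ratio for one page requested at a fixed interval.

    Model: a miss stores the page for ttl_s seconds; a request arriving at or
    after expiry is a miss again.
    """
    requests_per_cycle = max(1, math.ceil(ttl_s / request_interval_s))
    return (requests_per_cycle - 1) / requests_per_cycle


def db_queries_per_second(pages, request_interval_s, ttl_s):
    """Database queries per second caused by cache misses for a group of pages."""
    miss_ratio = 1 - ttl_hit_ratio(request_interval_s, ttl_s)
    return pages * miss_ratio / request_interval_s


# Figures from the question: 50,000 pages, 10% popular, 60-second cache.
popular = db_queries_per_second(5_000, request_interval_s=1, ttl_s=60)
unpopular = db_queries_per_second(45_000, request_interval_s=3600, ttl_s=60)
print(f"popular pages:   {popular:.1f} queries/s (uncached: 5000.0)")
print(f"unpopular pages: {unpopular:.1f} queries/s (uncached: 12.5)")
```

Under this model the 60-second cache cuts the load from the popular pages from 5,000 to roughly 83 queries per second, while the rarely visited pages gain nothing from being cached. Real traffic is bursty, so treat this as a starting estimate to check against monitoring, not as a formula.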
I am designing a little server application with a local data store, and sqlite3 seems to be the way to manage the persistent data. But I am worried about malicious users who know the internal logic and might trick the server into creating (and subsequently deleting) lots of records, in a way where a few valid records remain in each data page. The database size might explode quite soon.
The documentation and recommendations like https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/ imply that even auto_vacuum=incremental would not help me in this scenario, because it is only effective for released pages, not for used pages with internal gaps (i.e. fragmentation).
Is there a good way to tell sqlite to consolidate such data on-the-fly?
The VACUUM operation is not an option because of its long-lived global DB lock.
SQLite will merge an almost empty page with its neighbors automatically to help reduce fragmentation like you describe.
From an email from D Richard Hipp on the SQLite mailing list:
Once a sufficient number of rows are removed from a page, and the free space on that page gets to be a substantial fraction of the total space for the page, then the page is merged with adjacent pages, freeing up a whole page for reuse. But as doing this reorganization is expensive, it is deferred until a lot of free space accumulates on the page. (The exact thresholds for when a rebalance occurs are written down some place, but they do not come immediately to my mind, as the whole mechanism just works and we haven't touched it in about 15 years.)
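If you want to watch what a create-then-delete workload actually does to your file, a small measurement script can help. This is only a sketch using Python's sqlite3 module; the table name, row size and row counts are made up, and it only reads the page counters SQLite exposes through PRAGMA page_count and PRAGMA freelist_count. It does not prove anything about the exact merge thresholds mentioned in the email.

```python
import os
import sqlite3
import tempfile


def page_stats(conn):
    """Return (total pages in the file, pages on the free list)."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    freelist_count = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return page_count, freelist_count


def measure_delete_workload(rows=20000, keep_every=100):
    """Insert many rows, delete all but every keep_every-th, report page stats."""
    with tempfile.TemporaryDirectory() as folder:
        conn = sqlite3.connect(os.path.join(folder, "demo.db"))
        # Make the default explicit so freed pages stay visible on the free list.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO records (payload) VALUES (?)",
            (("x" * 200,) for _ in range(rows)),
        )
        conn.commit()
        before = page_stats(conn)
        # Leave a thin scattering of surviving rows, as in the attack scenario.
        conn.execute("DELETE FROM records WHERE id % ? != 0", (keep_every,))
        conn.commit()
        after = page_stats(conn)
        conn.close()
    return before, after


before, after = measure_delete_workload()
print("before delete (pages, free pages):", before)
print("after delete  (pages, free pages):", after)
```

Pages on the free list are reused by later inserts, so a growing freelist_count with a stable page_count means the file is not exploding even though it is not shrinking.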
I'm trying to develop a web game. In that game I have a toolbar, which is shown on almost all pages. The toolbar contains user info such as:
- coins (cash)
- level
- name
- group name
- experience points
- skills
- notifications
and more...
Some of them change rarely, but some (like experience, skills, notifications or coins) can change quite often.
So, I don't want to query the database each time someone reloads a page. I'm thinking of caching this data in the server's memory (updating it each time something changes).
Is there any known, good approach? Or should I forget it and make an extra 2-3 SQL queries on each page load?
"Frequently changed" pretty much excludes caching. I think there's an obsession with optimization that makes people want to cache everything, but you have a database for a reason, and it's perfectly acceptable to query it on occasion.
What you can do, though, is cache portions. Something like Level will typically not change that often, as it takes a while to go from one to the next. That could be cached, and it's easy enough to just bust the cache when the level changes so that it is forced to reload. For other things, like Coins, a user should always have an up-to-date and accurate count. You'll have to make these determinations, obviously.
Long and short, though, if the user should always have an accurate count or display of the thing, then you simply can't cache it. Just go ahead and do the queries you need to do and move on.
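As a rough illustration of caching the slow-changing portion and busting it on change, here is a minimal in-memory sketch. The class, the loader function and the fake database are all hypothetical stand-ins, and a real web application would also need to think about multiple server processes and memory limits.

```python
class ToolbarCache:
    """Caches slow-changing toolbar fields per user; busts them on change."""

    def __init__(self, load_profile):
        # load_profile is a hypothetical callable: user_id -> dict of fields.
        self._load_profile = load_profile
        self._profiles = {}

    def get(self, user_id):
        # Only hit the database when nothing is cached for this user.
        if user_id not in self._profiles:
            self._profiles[user_id] = self._load_profile(user_id)
        return self._profiles[user_id]

    def invalidate(self, user_id):
        # Call this from the code path that changes the cached fields.
        self._profiles.pop(user_id, None)


# Stand-in for the database, with a counter to show how often it is queried.
fake_db = {42: {"name": "alice", "level": 3}}
stats = {"queries": 0}


def load_profile(user_id):
    stats["queries"] += 1
    return dict(fake_db[user_id])


cache = ToolbarCache(load_profile)
cache.get(42)
cache.get(42)             # second call is served from memory
fake_db[42]["level"] = 4  # the level changes...
cache.invalidate(42)      # ...so the write path busts the cache
```

Fields such as coins would simply not go through this cache and would be read from the database on each request.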
Is this a single-user game or a multi-user game? If it's a multi-user game, then you're going to need some kind of player state that's held for all your currently active players, and that's a perfect place for this information. If it's a single-user game, then you probably want to keep as little state on the server as possible, so you're more likely to load it from disk or use a distributed caching system.
The problem here is that we can't really tell you what to do because all of it depends on way too many factors... you're just going to have to figure this out as you go along.
My suggestion, though: don't prematurely assume you need to optimize something. Just do it however you think you can, and if you need better performance, then worry about that later.
Say I need to populate 4 or 5 dropdowns w/ items from a database. Each drop down will have < 15 items in it. These items almost never change.
Now I could query the DB each time the page is accessed or I could grab the values from a custom class that would check to see if they already exist in ASP.Net's cache and only if they don't query the DB to update the cache.
It's trivial for me to write, but I'm unsure whether the performance would be better or not. I think it would be (although not likely anything huge).
What do you think?
When dealing with performance issues you should always:
- Do things the simplest way first (avoid premature optimisation).
- Performance test your code with set performance goals (e.g. 200ms response time under a load of N concurrent users).
- Then, IF your code doesn't perform, profile it to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world performance change will be.
Having said that then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database), however it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are etc...
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
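A before-and-after comparison against a stated goal can be as small as the sketch below. It is only an illustration: the helper names are made up, it times a single callable in-process, and it is not a substitute for a proper load test with concurrent users.

```python
import statistics
import time


def p95_seconds(operation, samples=200):
    """Run operation repeatedly and return its 95th-percentile duration."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(timings, n=20)[18]


def meets_goal(operation, goal_seconds, samples=200):
    """True if the 95th-percentile duration is within the performance goal."""
    return p95_seconds(operation, samples) <= goal_seconds
```

Running `meets_goal` against the uncached and the cached version of the same page-building code gives you a number to compare instead of a hunch.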
That's what the System.Web.Helpers.WebCache class exists for.
I/O is usually more expensive than memory operations (by orders of magnitude). Especially if your database is on another machine, you would also be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize in the end when you have really identified it as a performance bottleneck by measuring.
Quick answer to your question:
Use the built in .Net cache.
Additional points to ponder over..
- Preferably, retrieve all master data in a single database retrieval (think stored procedure and dataset), though I do not advocate the use of stored procs in all scenarios.
- As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database.
- Also, as your drop down values do not change very often, remember to keep a long expiry duration.
- Finally, based on your page design, you could also look at Fragment Caching (partial page caching: user controls), which could give you bigger benefits since you then access neither the data cache nor the database.
Performance:
Again, the performance depends more on the application's load than on your direct round trips for fetching the master data. Put simply, as Thomas suggested, use the cache class!
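The "check the cache first, query the database only on a miss, keep a long expiry" pattern looks roughly like this. It is a language-neutral sketch in Python, not the ASP.NET cache API; `ExpiringCache`, `load_countries` and the key name are made up for illustration.

```python
import time


class ExpiringCache:
    """Tiny get-or-load cache with a per-entry expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader, ttl_seconds):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no database round trip
        value = loader()     # miss or expired: load once and remember it
        self._items[key] = (value, now + ttl_seconds)
        return value


def load_countries():
    # Hypothetical stand-in for the real database query.
    return ["Canada", "Mexico", "United States"]


cache = ExpiringCache()
# Dropdown values almost never change, so a long expiry (one day here) is fine.
countries = cache.get_or_load("dropdown:countries", load_countries, ttl_seconds=24 * 3600)
```

In ASP.NET the built-in cache plays the role of `ExpiringCache`; the data access layer only needs the equivalent of `get_or_load`.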
A web application we wrote intended for one customer is going to be product-ized and sold to dozens of companies, and we will be doing the hosting.
I could use some guidance about the pros and cons of rolling out a separate instance for each customer versus going with a single (or very small number of) multi-tenant instances.
At first, as we ramp up, I will have to roll out a separate instance of the application for each new customer (they will come online one at a time) because it's the only immediate option. I imagine this won't scale very well as far as maintenance goes - rolling out changes will become very tedious and possibly error-prone once there are more than 4 or 5 instances out there. Unless we automate that somehow.
Also, the single-instance philosophy seems like it might lead to a bunch of forks if people need customizations. And it would be nice to avoid that.
So what has your experience been with this?
Bonus question #1: What's the performance difference between 10 SQL Servers with 2m records each versus one huge one with 20m? Let's say they are all in one table and we're mainly doing inserts and selects on single records. Sometimes the selects are on an indexed varchar(12) or date field.
Bonus Question #2: I imagine that to avoid forking, we would have to make the customizations configurable, or build a plug-in architecture. However, that might increase the cost of doing customizations, and I don't want to be one of those shops that takes a week to resize a textbox, and I don't want to over-invest in infrastructure. Any thoughts on that?
Scale Details
Each customer will have a decent amount of data -- up to a few million records.
There will be a very small number of concurrent users, only a few per customer, plus a handful of internal reps on our end.
It's unclear whether each customer will require customizations, but I would say some of them probably will, and maybe some of those changes will be things that other customers will not want to see.
when faced with a similar challenge, here's what we did:
1. we have one code base with multiple sql servers. we do maintain multiple iis servers with copies of the same code base. we are free to move clients around from sql server to sql server to maximize performance.
2. if a customer has the $ for it, we will install them on their own server and maintain a separate iis server for them. this accommodates the largest customers, for whom paying much more money every month (10 fold more money) is acceptable. we do not, however, give them a separate code base. if they need a mod, we make it visible on a per client basis (see #3)
3. custom programming usually results in a configurable option. even the people who pay us to have their own server get the same version of the code. sometimes it's as simple as a clause in the code that says "if the customer = "ourbigcustomer" then turn on this option". yes, that's kludgy hard-coding, but if the customer has enough money, that is fine with me.
4. i didn't quite get from your question whether you wanted to mix different customers' data into one big database .. our rule is we never do that (never ever). it is one of the wisest choices we ever made. it makes data manipulation much less risky and restores of data easier.
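One way to keep a single code base while still switching features per customer is a table of per-tenant overrides on top of shared defaults. The sketch below is only an illustration; the option names are invented, and in practice the overrides would more likely live in a database or config file than in code.

```python
# Defaults every customer gets (option names are hypothetical).
DEFAULT_OPTIONS = {
    "export_to_pdf": False,
    "rows_per_page": 50,
}

# Per-customer overrides replace hard-coded "if customer == ..." clauses.
TENANT_OVERRIDES = {
    "ourbigcustomer": {"export_to_pdf": True},
}


def options_for(tenant):
    """Merge the shared defaults with any overrides for this tenant."""
    return {**DEFAULT_OPTIONS, **TENANT_OVERRIDES.get(tenant, {})}


if options_for("ourbigcustomer")["export_to_pdf"]:
    print("PDF export is enabled for ourbigcustomer")
```

The application code then asks `options_for(tenant)` instead of naming any customer directly, so the same build can be deployed for everyone.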
I don't see a good reason for either of your two options. I think the real answer lies somewhere in the middle: having multiple instances, each hosting multiple clients.
This adds another layer of automation processing, but it means you can keep the hosting cheap (you won't need to go out and buy a Cray any time soon) and (hopefully) this sort of mentality means you could do failover backups fairly easily.
But let's not get ahead of ourselves... We're talking about a webapp, right? Get your database(s) and aspnet on different machines. Cluster your databases and you'll have a much happier time playing around with various front-end scenarios. You'll also be able to upscale whichever area runs out of puff first.
By the sounds of it, you'll end up with one clustered database over half if not a full dozen database machines and only a couple of front-end boxes.
As for customisations, you've nailed it. You either provide a completely database-hosted set of editable templates or you have to customise whole instances. I'm all for the first. It's a lot of work (without much in return) but it's well worth it as you should only need to change the core code when (you will!) you do upgrades. Hunting through a hundred customers' custom instances to make sure they upgrade safely will kill a developer! Templates are the answer. At the very very least, you could allow custom CSS without much pain (but they'd need somebody who knew their stuff).
Edit: I've seen a couple of posts going for the all-in-one method. Splitting the instances over multiple machines insulates you from a couple of things:
- If you introduce a bug not caught in testing, only a few clients are affected at once.
- Hardware fails. Having one mega-server fall over will annoy a lot of people at once. Having a failover mega-server is massively expensive. Having a spare failover box per three or four running servers is much cheaper and annoys fewer people.
- Performance can be balanced between boxes on a client-by-client basis, so you can put a few light-use clients with a heavy client, or just fill a box with a few medium-use clients, etc.
- On the same idea, usage spikes or other slowdowns only affect clients on the same box. Of course this doesn't mean the same for the database, but you can split that up into a cluster of clusters when you get there.
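Balancing clients across boxes can start out as a simple greedy placement: sort clients by expected load and always put the next one on the least-loaded box. This is a sketch with made-up load figures, and the "load" number itself is an assumption you would have to derive from your own monitoring.

```python
import heapq


def assign_clients(client_loads, box_count):
    """Greedy placement: heaviest client first, onto the least-loaded box."""
    heap = [(0.0, index) for index in range(box_count)]  # (total load, box index)
    boxes = [[] for _ in range(box_count)]
    by_load = sorted(client_loads.items(), key=lambda item: item[1], reverse=True)
    for name, load in by_load:
        total, index = heapq.heappop(heap)
        boxes[index].append(name)
        heapq.heappush(heap, (total + load, index))
    return boxes


loads = {"heavy": 10, "medium1": 5, "medium2": 5, "light1": 1, "light2": 1}
print(assign_clients(loads, box_count=2))
# → [['heavy', 'light1'], ['medium1', 'medium2', 'light2']]
```

Here the heavy client shares a box with one light client while the medium clients fill the other, which matches the mix described above.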
The big advantage of individual instances will be scaling out as each customer's demand increases. For example, if you're running on a single server and one customer suddenly needs more performance, you're stuffed. But if they're all individual, then moving that customer to a shiny new server is relatively easy.
The big disadvantage will be in managing the instances individually (regardless of whether they're all running on the same server or not).
Regardless, you should only ever have one instance of the codebase, and customisation should all be controlled through plugins and configuration. The front end should naturally be separate from content. Although the cost of making a change may be higher, the benefit in terms of features you can offer your other customers (which will just be customisations you've been asked to do) will pay off, I'm sure. Which is to say nothing of how much easier it'll be to manage a single codebase, as opposed to several.
I would strongly advise going with the single instance hosted by your company. This has the following advantages:
- You have physical access to all code and databases to make changes and updates.
- You control the quality of the hardware it is running on.
- When you fix a bug in common code, you have fixed it once for all customers.
- You can refactor the application design to better support customer-specific code and avoid forking.
- As the number of customers grows, you can scale up and scale out your servers to meet performance/responsiveness requirements.
- Your application code and databases cannot be tampered with by "inquisitive" customers.
I would have to say it is almost more important where your application is running as opposed to how many separate instances there are of it.
Sure, maintaining multiple separate instances is not ideal due to the support/maintenance overhead, but if these apps are all on servers you control, life is much easier than needing remote or physical access to different customers' networks and servers.
Joel Spolsky also talks about exactly this on StackOverflow podcast 67.
One thing Joel has learned from selling Fogbugz: software designed to be installed on a server in-house at a customer’s site, under full control of that customer, is almost never worth the hassle
20 million records, relatively speaking, is not a huge SQL Server database. A single well-provisioned SQL Server could handle this size comfortably. More important, however, is the number of concurrent accesses to the database. You say that there will be only a few users per customer, so this is unlikely to hit you until the level of concurrency grows.
All of the above are good points, but you are missing two key questions. What price point is the service offered at, and how many customers (order of magnitude) will you ultimately have to support (i.e. market size)? In 3 years will you have a maximum of 10 customers each of which will pay you $500,000 per year, or 500 customers each paying you $10,000 per year? For a small set of high-paying premium customers, the advantages of individual deployments are clear, whereas lower prices and larger customer bases demand a shared solution (a la Oli's comment). Or go with a cloud platform, although I've only read the hype and tinkered rather than deployed that in the field.
Bonus Question 1: table layout, indexing, number of reads / writes, efficiency and complexity of stored procedures (you are using procs or at least prepared statements, right?) all matter a heck of a lot more than the number of physical records in the database, up to a point. Beyond that you will likely find yourself needing to either provide individual SQL Server instances for each customer or for a pool of customers, once again depending on some of the questions I raised above.
Bonus Question 2: Putting the time into your design for templating and a plugin architecture is essential in this situation and you need to do it sooner rather than later. Once you're in the grind of customizing code for paying customers you will likely not have the time to do it right. This point cannot be stressed enough. Templates and admin tools that give you quick and deep access to data-driven changes in your product will save you a lot of time down the road. As your company / group expands you can then add less technical staff that can be "product experts" who can perform 90% of customizations and maintenance, freeing up your core to continue development or move on to other projects. Finally, don't neglect your data tier in this planning process. Having a core data tier of (almost) immutable stored procs and tables is very important, with custom tables and stored procs clearly demarcated using a good naming convention.
Good luck, feel free to provide more details if you'd like more specific suggestions.
Based on some of the advice received here, we did end up implementing a monolithic multi-tenant version of our application.
I'm glad we did. By the time it was done, we had 3 or 4 forks of the code base (mainly custom skins and things we didn't have n-level support for, but also some actual features), and it was only getting crazier.
We got the multi-tenant version up and successfully folded everything in. There ended up being a lot to think about and a lot to keep track of, but our customers never even knew they had been moved to a new system.
I will say that the actual customer migration was a bit of a bear. I thought at first that we would be able to do it by hand in the backend, but I ended up having to write some fairly involved scripts to get the job done. There were just too many identity columns, and it's not like you can just turn off constraints temporarily when you're importing into a live production system.
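For anyone facing the same kind of migration, the core of the identity-column problem is remapping old keys to new ones and fixing up the foreign keys that point at them. The sketch below is a generic illustration of that idea, not our actual scripts; the row layout, the column names (`id`, `parent_id`, `tenant_id`) and the id ranges are all made up.

```python
from itertools import count


def remap_rows(parents, children, parent_ids, child_ids, tenant_id):
    """Give rows from one tenant's database new ids in the shared database.

    parents/children are lists of dicts; parent_ids/child_ids are iterators
    that yield ids unused in the target tables.
    """
    id_map = {}
    new_parents = []
    for row in parents:
        new_id = next(parent_ids)
        id_map[row["id"]] = new_id  # remember old id -> new id
        new_parents.append({**row, "id": new_id, "tenant_id": tenant_id})

    new_children = [
        {
            **row,
            "id": next(child_ids),
            "parent_id": id_map[row["parent_id"]],  # follow the remapped key
            "tenant_id": tenant_id,
        }
        for row in children
    ]
    return new_parents, new_children


parents = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
children = [{"id": 1, "parent_id": 2}]
new_parents, new_children = remap_rows(
    parents, children, count(1001), count(5001), tenant_id=7
)
```

Because every row gets its final id before it is inserted, the import can run with constraints left on, parents first and children second.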
I have a dataset that contains around 500 records with 7 columns in each record. I would like to use ObjectDataSource and caching. I am sure that the data doesn't change, but there will be lots of users accessing it. My question is whether it would be a good idea to cache data that has 500 records. Is it optimal or not? I think the ObjectDataSource caches the data per user, so I am wondering whether data of this size would cause performance issues if I cache it.
Thanks,
sridhar.
The answer really depends on the size of the records, not just the number of them. In any event, ASP.NET's cache uses a Least Recently Used (LRU) policy to evict items that have not been used recently to make room for additional items. Thus, unless your site is extremely busy or your result set is extremely large, you should be fine to cache the result set and let ASP.NET's cache handle the details of keeping the most used data in the cache and expiring the least used data.
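To make the LRU idea concrete, here is a minimal sketch of least-recently-used eviction. It only illustrates the general technique; it is not ASP.NET's implementation, which also weighs things like item priority and memory pressure. The class name and the capacity are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

A result set that is read on nearly every request behaves like "a" here: it keeps getting touched, so it stays in the cache while rarely used entries fall out.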
You might also consider using Page level Output Caching if nothing else on the page is user-specific, as this would likely give you slightly more performance for minimal additional memory usage. Other areas to investigate are the control, SqlDependencies, and of course if the data is read-only, be sure you are disabling ViewState on whatever control you're using to display the data.