I’m after some thoughts on how people go about calculating database load for the purposes of capacity planning. I haven’t put this on Server Fault because the question is related to measuring just the application rather than defining the infrastructure. In this case, it’s someone else’s job to worry about that bit!
I’m aware there are a huge number of variables here but I’m interested in how others go about getting a sense of rough order of magnitude. This is simply a costing exercise early in a project lifecycle before any specific design has been created so not a lot of info to go on at this stage.
The question I’ve had put forward from the infrastructure folks is “how many simultaneous users”. Let’s not debate the rationale of seeking only this one figure; it’s just what’s been asked for in this case!
This is a web front end with a SQL Server backend and a fairly fixed, easily quantifiable audience. The way I see it, nailing this down to actual simultaneous requests, even very roughly, comes down to increasingly granular units of measurement:
Total audience
Simultaneous sessions
Simultaneous requests
Simultaneous DB queries
This doesn’t account for factors such as web app caching, partial page requests, record volume, etc., and there’s some creative license needed to define the frequency of requests per user, the number of DB hits and their execution time, but it seems like a reasonable starting point. I’m also conscious of the need to scale for peak load, but that’s something that can be plugged into the simultaneous sessions figure if required.
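As a rough illustration of that funnel, here is a back-of-envelope sketch (Python, purely for the arithmetic; every number is a placeholder assumption to be replaced with your own estimates):

    # Back-of-envelope funnel from total audience down to in-flight DB queries.
    total_audience      = 2000
    concurrent_fraction = 0.10     # share of the audience with an active session at peak
    req_per_session_s   = 1 / 30   # one page request every 30 s per active session
    page_time_s         = 2        # average server time spent producing a page
    queries_per_request = 5        # DB hits per page
    query_time_s        = 0.05     # average query execution time

    sessions     = total_audience * concurrent_fraction        # 200 simultaneous sessions
    request_rate = sessions * req_per_session_s                 # ~6.7 page requests/s arriving
    in_flight_requests = request_rate * page_time_s             # ~13 simultaneous requests
    in_flight_queries  = request_rate * queries_per_request * query_time_s  # ~1.7 simultaneous DB queries
    print(sessions, round(in_flight_requests, 1), round(in_flight_queries, 1))

The in-flight figures just apply Little's law (concurrency = arrival rate x time in system), which is about as much rigour as an exercise this early can justify.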
This is admittedly very basic and I’m sure there’s more comprehensive guidance out there. If anyone can share their approach to this exercise or point me towards other resources that might make the process a little less ad hoc, that would be great!
I will try, but obviously without knowing the details it is quite difficult to give precise advice.
First of all, the infrastructure guys might have asked this question from a licensing perspective (SQL Server can be licensed per user or per CPU).
Now back to your question. "Total audience" is important if you can predict or work out this number. It gives you the worst-case scenario where all users hit the database at once (e.g. 9am when everyone logs in).
If you store session information you would probably have at least 2 connections per user (1 session + 1 main DB). But this number can be (sometimes noticeably) reduced by connection pooling (depends on how you connect to the database).
Use a worst-case scenario: 50 system connections + 2 * number of users.
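Putting that rule of thumb into numbers (a sketch with placeholder figures; connection pooling will usually bring the real figure down considerably):

    users = 500
    system_connections = 50
    worst_case_connections = system_connections + 2 * users   # 1,050 connections here
    print(worst_case_connections)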
Simultaneous requests/queries depend on the nature of the application. Need more details.
More simultaneous requests (to your front end) will not necessarily translate to more requests on the back end.
Having said all of that, for costing purposes you need to focus on the bigger picture.
A SQL Server license (if my memory serves me right) will cost ~128K AUD (dual Xeon). Hot/warm standby? Double the cost.
Disk storage: how much storage will you need? Disks are relatively cheap, but if you are going to use a SAN the cost might become noticeable. Also, the more disks, the better from a performance perspective.
I am just getting into the more intricate parts of web development, so this may not be the best place to ask. However, when is it best to add load balancing to a web project? I understand that good design versus bad design affects how many users can visit a site without really hurting performance. However, I am planning to code a new project that could potentially have a lot of users, and I wondered if I should be thinking about load balancing right off the bat. Opinions welcome; thanks in advance!
I should also note that the project will most likely be ASP.NET (WebForms or MVC, not yet decided) with a backend of MongoDB or PostgreSQL (again, still deciding).
Load balancing can also be a form of high availability. What if your web server goes down? It can take a long time to replace it.
Generally, by the time you need to think about throughput you are already rich, because you have an enormous number of users.
Stack Overflow serves 10m unique users a month with a few servers (6 or so). Think about how many requests per day you would have if you were constantly generating 10 HTTP responses per second for 8 hot hours: 10*3600*8 = 288,000 page impressions per day. You won't have that many users any time soon.
And if you do, you optimize your app to 20 requests per second per CPU core, which means you get around 80 requests per second on a commodity four-core server. That is a lot.
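The same arithmetic as a quick sketch (Python, with the figures above treated as assumptions):

    peak_rps   = 10                 # sustained HTTP responses per second
    hot_hours  = 8                  # busy hours per day
    daily_pages = peak_rps * 3600 * hot_hours    # 288,000 page impressions/day

    rps_per_core = 20               # tuned app throughput per CPU core
    cores        = 4                # a commodity box
    server_capacity = rps_per_core * cores       # 80 requests/s per server
    print(daily_pages, server_capacity)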
Adding a load balancer later is usually easy. Load balancers can tag each user with a cookie so they get pinned to one particular target, and your app will not notice the difference. Usually.
Is this for an e-commerce site? If so, then the real question to ask is "for every hour that the site is down, how much money are you losing?" If that number is substantial, then I would make load balancing a priority.
One of the more important architecture decisions that I have seen affect this is the use of session variables. You need to be able to provide a seamless experience if your user ends up on different servers during their visit. Session variables won't transfer from server to server, so I would avoid using them.
I support a solution like this at work. We run four (used to be eight) .NET e-commerce websites on three Windows 2k8 servers (backed by two primary/secondary SQL Server 2008 databases), taking somewhere around 1300 (combined) orders per day. Each site is load-balanced, and kept "in the farm" by a keep-alive. The nice thing about this, is that we can take one server down for maintenance without the users really noticing anything. When we bring it back, we re-enable our replication service and our changes get pushed out to the other two servers fairly quickly.
So yes, I would recommend giving a solution like that some thought.
The parameters here that can affect one another and slow down performance are:
Bandwidth
Processing
Synchronization
These have to do with how many users you have, together with the media you want to serve.
If you have to serve a lot of video/files, you need many servers to deliver them. Let's say you do not; the next things to check are the users and the processing.
From my experience, what slows down the processing is the locking of the session. So one big step to speed up processing is to implement fully custom session handling, so that your pages do not block one another and you can handle a large number of users without issue.
Now for the next step: let's say you have a database that keeps all the data. To gain from a load balancer and many machines, the trick is to keep a local cache of what you are going to show.
So the first idea is to avoid excessive locking that makes users wait on one another, and the second is to have a local cache on each machine that is refreshed dynamically from the main database data.
ref:
Web app blocked while processing another web app on sharing same session
Replacing ASP.Net's session entirely
call aspx page to return an image randomly slow
Always online
One more consideration: you can build a solution in a "one for all, and all for one" :) style, where the extra servers also act as backups. If one server goes down for any reason (e.g. an update and restart), the rest can still work and serve.
As you said, it depends if/when load balancing should be introduced; that depends on performance and how many users you want to serve. An LB also improves the reliability of your app: it will not stop when one system goes crashing down. If you can see your project growing to be really big and serving lots of users, I would suggest designing your application so it can be upgraded to use an LB, so do not do anything non-standard. Try to steer away from home-made solutions and always follow good practice. If you really do need an LB later on, you should not have to change your app.
UPDATE
You may need to think ahead but not at a cost of complicating your application too much. Do not go paranoid and prepare everything to work lightning fast 'just in case'. For example, do not worry about sessions - session management can be easily moved to SQL Server at any time and this is the way to go with LB. Caching will also help if you hit some bottlenecks in the future but you do not need to implement it straight away - good design (stable interfaces), separation and decoupling will allow for the cache to be added later on. So again - stick to good practices, do not close doors but also do not open all of them straight away.
You may find this article interesting.
Background:
Enterprise application - very well written for its time (2004).
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows users to go through various wizards, for lack of a better term; all of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL Server database very frequently, because we allow users to pause/resume their application. Often the XML that comprises the wizard state grows very large, I'm talking 5-8 MB of data, and we noticed that when we had a sudden influx of simultaneous users we started receiving occasional timeouts against the database, because a lot of what the wizard state is comprised of is keeping track of collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For argument's sake, because of the complexity of the application, let's say that adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front-end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong: it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the XML bloat that we persist frequently to the database. In the long run, reducing the size of the XML that we pass back and forth is probably the right idea, but will adding additional web servers truly have no effect? In terms of simultaneous users, I just think it should help.
Any responses/thoughts are appreciated; proof that more web servers would help would be pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
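A minimal sketch of that idea (Python purely for illustration; the application itself is .NET, where GZipStream would be the rough equivalent): compress the wizard-state XML before it is persisted and decompress it on read. Verbose XML of this kind usually compresses very well, which cuts both the bytes sent to the database and the write size.

    import gzip

    def compress_state(xml_text: str) -> bytes:
        # Store the returned bytes in the existing image/blob column.
        return gzip.compress(xml_text.encode("utf-8"))

    def decompress_state(blob: bytes) -> str:
        return gzip.decompress(blob).decode("utf-8")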
If the bottleneck is the database, then more web servers will not help you a lot.
The problem may be not only the size of the data, but also the number of concurrent requests to the same table; the number of writes will be the big problem. If your XML write is in a transaction with other queries, you could try breaking the XML write out of that transaction to reduce the locking time on the XML table.
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
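A minimal cache-aside sketch of those two suggestions (Python for illustration; the db accessors are hypothetical): read from a local cache first, fall back to the database on a miss, and skip the write entirely when the serialized state has not changed since the last save.

    import hashlib

    cache = {}  # wizard_id -> (state_blob, digest)

    def load_state(wizard_id, db):
        if wizard_id in cache:
            return cache[wizard_id][0]
        blob = db.read_state(wizard_id)               # hypothetical DB accessor
        cache[wizard_id] = (blob, hashlib.sha1(blob).hexdigest())
        return blob

    def save_state(wizard_id, blob, db):
        digest = hashlib.sha1(blob).hexdigest()
        if wizard_id in cache and cache[wizard_id][1] == digest:
            return                                    # unchanged: skip the DB round trip
        db.write_state(wizard_id, blob)               # hypothetical DB accessor
        cache[wizard_id] = (blob, digest)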
No one seems to have suggested this: what about replacing the XML serialization of your wizard state with JSON serialization?
Not only should this give you a minor boost in the performance of the serialization itself, since both the DataContractSerializer (faster) and Newtonsoft's Json.NET (fastest) outperform the XML serializers in .NET; it should also easily reduce the size of your serialized object graph by 50% or more (depending on the mix of properties versus large strings in the XML).
This should dramatically lower the IO inflicted upon SQL Server, and it should also limit the scope of changes required to your application (assuming it's well designed and works through common calls for serialization/deserialization).
If you choose to go this route, also invest time comparing BSON vs. JSON, as it is likely that the binary encoding will offer even more space savings (and a further IO reduction) given the size of your object graphs.
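As a rough, generic illustration of the size argument (Python with a made-up record shape, not the actual .NET serializers): the same data rendered as element-per-field XML versus JSON, where the repeated closing tags account for much of the difference.

    import json
    import xml.etree.ElementTree as ET

    items = [{"id": i, "name": f"item-{i}", "qty": i % 7} for i in range(1000)]

    root = ET.Element("items")
    for it in items:
        e = ET.SubElement(root, "item")
        for k, v in it.items():
            ET.SubElement(e, k).text = str(v)
    xml_size = len(ET.tostring(root))

    json_size = len(json.dumps(items).encode("utf-8"))
    print(xml_size, json_size)   # JSON comes out substantially smaller here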
I'm not a .NET expert, but maybe using binary serialization would increase throughput. Make sure the XML isn't stored as text (fairly obvious, but thought I'd mention it). Also, relational databases are best at storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on: without understanding the resource constraint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts affecting running statements are a symptom and are never imposed by SQL Server; they're an artifact of your driver configuration, or of a firewall or similar device between app and DB imposing them (unless you're talking about timeouts for new connections, in which case you have a host in serious distress under load).
Given that your symptom is database timeouts, you need to start there. If they're indicative of long-running statements that result in a socket timeout, use SQL Server Profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement-tuning related; it probably boils down to resource limitations: CPU, memory, or disk IO capacity.
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that, given a database performance issue, more application servers are likely to worsen the problem, as you increase the amount of concurrency that might otherwise be kept in check by connection-pool, request-processing or other limits.
I'm developing an asp.net web application targeted to business customers. Can anyone provide some guidelines on how I can determine the number of users my application can support?
Also, the application uses session variables, so it's currently limited to one web server until that changes.
You can use a session state server running on a second box, or SQL Server-backed sessions, to get around the single-box issue.
As for the question at hand, there is no real way to determine that besides getting your hands on production hardware, setting up the app and running load tests until you can figure out where she breaks. This won't necessarily give you the real number as you have to make assumptions about what the users are doing and it is pretty much impossible to simulate the effects of the network cloud in test environments.
Only you/your team can determine the exact numbers that can be supported.
Your key here is having a deep understanding of your problem domain and a distinct separation of the processing layers.
The separation allows you to isolate the bottlenecks, and tune the performance of the lowest performance factor much more easily, then move on to the next layer/performance limitation.
Do not make assumptions, as you will find impact unrelated to your assumptions that might surprise you.
Design to scale
Design to have separate "layers", for performance-tuning reasons as well as your own sanity. It is also a better design principle and, directly, one of the reasons development is segmented.
Test: designing "pass/fail" tests of layers against a design specification is only one facet of testing. Your question is answered by the performance impact of the technology, architecture and tools you choose to use in your application. Plan to make changes to each part of your application to address performance issues.
Gather performance metrics from each "layer", and tune each layer as you discover the nature of its performance challenges. Plan for, and work out how to quantify, the performance measurements of each layer.
You WILL at some juncture have to make a compromise between performance and “cool/wow” factors. Each will impact your ability to market your solution, and you must then determine which will have the greatest impact.
This is one of the PAIN factors I use to measure quality in designs (Plan All Incremental Needs), which I have discussed elsewhere and in blogs.
Personally, I will often make decisions on design based on performance, but your marketing strategy might differ.
So you know, session state can be made to work with multiple web servers very simply by moving it out to a SQL Server database.
A quick how to is available here
As for your original question, I would look into load testing. Hopefully there will be other posters who know more about that. I would focus on page views, as opposed to users.
Measure the resources (CPU, memory, disk, bandwidth) needed for typical actions within your application. Divide available resources by resources needed for a representative user "session" and you have a rough number.
Until you have a good set of real data, you'll have to make guesses about typical usage habits and the resource requirements. That's about all you can do for a 1st pass at estimating capacity.
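A first-pass sketch of that division (Python; all numbers are placeholder guesses): whichever resource runs out first sets the capacity.

    per_session = {"cpu_pct": 0.5, "memory_mb": 20, "disk_iops": 3, "bandwidth_kbps": 40}
    available   = {"cpu_pct": 80,  "memory_mb": 4096, "disk_iops": 600, "bandwidth_kbps": 50000}

    concurrent_sessions = min(available[r] / per_session[r] for r in per_session)
    print(int(concurrent_sessions))   # the tightest resource (CPU here) sets the limit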
A good load balancer can ensure that a user will return to the same server.
A client of mine has a website and they need to determine how 'scalable' the site currently is. What I mean by this is the number of users browsing around the site concurrently.
It's a custom e-commerce app in .NET, not written by myself, and the code is... well, let's just say, a bit dubious.
A much bigger company is looking to buy them / throw funding their way but they need some form of metrics to show how much load it can take before it falls apart. This big company has the ability to 'turn on the taps' to a huge user base - and obviously doesn't want to do that if the site is going to fall over with a sneeze of traffic.
What is a good metric to provide here? And how can I obtain it?
Edit: Question revised
I always use Apache's "ab" (ApacheBench) tool.
Run it from a different machine, preferably a BSD or Linux machine with no firewall rules that will limit the performance of the tool; otherwise the results might not be reliable. If you use a Windows machine, make sure it isn't one that limits the number of active TCP connections.
When using "ab", the number you're looking for is "Requests per second". Experiment with the concurrency switch to see how many concurrent users you can handle before you start getting a lot of errors, or before the requests per second start dropping rapidly.
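For example (hypothetical URL and numbers; -n is the total number of requests and -c the concurrency level):

    ab -n 1000 -c 50 http://www.example.com/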
When you notice the web server is having serious issues, restart it and let it rest for a while before continuing the test.
You'd be better off with a hosted load test, as this might give you more insight into real-world scenarios (something like http://www.scl.com/software-quality/hosted-load-test, although I have no experience with them).
Furthermore: scalability is, as far as I know, not how many concurrent users can be served, but how easy it is to serve more as the site grows bigger (by adding extra servers, etc.). How easily can the website scale up? Does the codebase allow an unlimited number of servers? And so on.
Well, I suppose it'll depend on what the client cares about.
Do they care about how many users can access the site at once? Report on that by running simultaneous requests from another server until it dies, then give them the number.
Do they care about something else?
For me, when someone says they want it to 'scale', it really means they have no idea what they want. So try and talk to them, and get specific details of what, exactly, they want to see 'scaling', and then, once you find the areas to analyse, you can do so trivially, and attempt to improve them.
I'm working on a forum-based website that also supports on-site messaging (i.e. users can send private messages to other users). What I'm trying to do is notify a member when they have new messages, for example by displaying the inbox link in bold along with the number of messages, e.g. Inbox (3).
I'm a little confused about how this can be implemented for a website running on a server farm. Querying the database with every request seems like overkill to me, so that's out of the question; probably a shared cache should be used for this. I tend to think this is a common feature of many sites, including many large ones (running on server farms), so I wonder how they implement it. Any ideas are appreciated.
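A minimal sketch of the shared-cache idea described above (Python for illustration; the cache client and DB helpers are hypothetical): keep an unread-count per user in a cache shared by all web servers, bump it when a message is sent, and only fall back to the database when the cache has no entry.

    def unread_count(user_id, cache, db):
        count = cache.get(f"unread:{user_id}")
        if count is None:
            count = db.count_unread(user_id)          # hypothetical query helper
            cache.set(f"unread:{user_id}", count)
        return count

    def on_message_sent(recipient_id, cache):
        # Increment if present; otherwise the next page view repopulates it.
        if cache.get(f"unread:{recipient_id}") is not None:
            cache.incr(f"unread:{recipient_id}")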
SO caches the questions; however, every postback re-queries your reputation. This can be seen by writing a couple of good answers quickly, then refreshing the front page.
The questions will only change every minute or so, but you can watch your rep go up each time.
Waleed, I recommend you read the articles on high scalability. They have specific case studies on the architectures of various mega scale web applications. (See the side bar on the right side of the main page.)
The general consensus these days is that RDBMS usage in this type of application is a bottleneck. It is also probably safe to say that most highly scalable web applications sacrifice consistency to achieve availability.
This series should be informative of various views on the topic. A word on scalability is highly cited.
In all this, keep in mind that these folks are dealing with Flickr-, Amazon- and Twitter-scale issues and architectures. The solutions are somewhat radical departures from the (previously accepted) norms, and unless your forum application is the next Big Thing, you may wish to first test out the conventional approach to determine whether it can handle the load.