Live Data Web Application Design - asp.net

I'm about to begin designing the architecture of a personal project that has the following characteristics:
Essentially a "game" containing several concurrent users based on a sport.
Matches in this sport are simulated on a regular basis and their results stored in a database.
Users can view the details of a simulated match "live" when it is occurring as well as see results after they have occurred.
I developed a similar web application with a much smaller scope as the previous iteration of this project. In that case, however, I chose to go with SQLite as my DB provider since I also had a redistributable desktop application that could be used to manually simulate matches (and in fact that ran as a standalone simulator outside of the web application). My constraints have now shifted to be only a web application, so I don't have to worry about this additional level of complexity.
My main problem with my previous implementation was handling concurrent requests. I made the mistake of using one database (which was represented by a single file on disk) to power both the simulation aspect (which ran in a separate process on the server) and the web application. Hence, when users were accessing the website concurrently with a live simulation happening, there were all sorts of database access issues since it was getting locked by one process. I fixed this by implementing a cross-process mutex on database operations but this drastically slowed down the performance of the website.
The tools I will be using are:
ASP.NET for the web application.
SQL Server 2008 R2 for the database... probably with an NHibernate layer for object relational mapping.
My question is, how do I design this so I will achieve optimal efficiency as well as concurrent access? Obviously shifting to an actual DB server from a file will have it's positives, but do I need to have two redundant servers--one for the simulation process and one for the web server process?
Any suggestions would be appreciated!
Thanks.

You should be fine doing both on the same database. Concurrent access is what modern database engines are designed for. Concurrent reads are usually no problem at all; concurrent writes lock the minimum possible amount of data (a table, or even just a number of rows), not the entire database.
A few things you should keep in mind though:
Use transactions wisely. On the one hand, a transaction is an important tool in making sure your database is always consistent - in short, a transaction either happens completely, or not at all. On the other hand, two concurrent transactions can cause deadlocks, and those buggers can be extremely hard to debug.
Normalize, and use constraints to protect your data integrity. Enforcing foreign keys can save the day, even though it often leads to more cumbersome administration.
Minimize the amount of time spent on data access: don't keep connections around when you don't need them, make absolutely sure you're not leaking any connections, don't fetch data you know don't need, do as much data-related processing (especially things that can be solved using joins, subqueries, groupings, views, etc.) in SQL instead of in code

Related

ASP.NET Application In Multiple Datacenters - Best Architecture?

I've traditionally followed the 'one app, one server' architecture for most of the ASP.NET/SQL Server applications I've worked on. I mean that loosely, having used multiple servers with a load balancer, etc. But they have all been in the same datacenter.
However, recently, a requirement has come up to scope an application which will support users in the US, China, and Russia. Performance will be fairly critical, so what is the most sensible way to architect such an application so it performs well in all these areas?
The options I've come up with are:
Use one single data center (ie: don't host in multiple places around the world). Deliver static content over a CDN, but database and ASP.NET site will be hosted in one place (eg: US). This seems like performance may still be an issue though.
Use multiple data centers, and have multiple versions of the application. Eg: ru.myapp.com, us.myapp.com, ch.myapp.com with their own code/databases/etc. This will work, but things like reporting, management, etc would need to be done in each application, which seems like the least efficient approach.
Use a different architecture - but I'm not familiar with alternatives. Is it possible to architect in such as way that you have one single application and database that works across multiple data centers (like a load balanced environment, but on a larger scale).
Does anyone have any experience in the best way to handle this?
Your front end servers can work across data centers same as they work within a single data center. There are some differences though - Load balancer does not usually work cross colo. Do you would have to use geo DNS to route people to nearest data center and then use a load balancer within that data center.
The main issue that comes up is use of shared resources, such as DB or a web service such as authentication web service. If you really need a single DB, then one architecture is to have a single master, but multiple read slaves spread across different data centers. The reads are then NOT paying any penalty for going cross colo. The writes do have to go cross colo and thus pay the latency penalty. This works for most sites where writes are much less numerous than reads and where write performance can be 1-2 seconds slower than read and still be counted as acceptable. e.g. take a movie ticket booking site. The reads are overwhelmingly more than writes.
The cross colo performance can be dramatically improved through the following choices
1. Minimize the number of round trips. e.g. do all writes over a single transaction rather than doing multiple writes through multiple calls to DB. i.e. use batch queries, stored procedures, batch remote call etc.
2. Use optimistic write/eventual consistency if possible. e.g. say you are recording the time a user logged in. You can very well make it asynchronous, where the time is eventually recorded. Though there are scenarios where the eventual consistency is not acceptable.

Cross-server In-memory data (as variable) per user or global (for all users)

My question is regarding aggregated data for fast access across several servers on Amazon EC2. In an ASP.NET application, I would probably store that data in Application["somevar"] variable so it can be accessed quickly (in memory) by all users.
The problem starts when I want that aggregated data to be gathered and its value equal on all servers. If I chose to deploy two servers, the user might be transmitting data to different servers every time (the servers are under a load balancer or ElasticBean), and if for example I count the number of times the user asked for the page, each server's Application var will have different value
For example:
Server 1:
Application["counter1"] = 120
Server 2:
Application["counter1"] = 130
What I want is a variable that be the same on all servers. The reason I want the data in Application-like variable is that I want that data in memory for fast access, then I might write that data to the database.
What I want to know is how can I achieve this. I though about using Amazon ElasticCache so even if I have 10 server under the load balancer, I can access the ElasticCache variable via API and it doesn't matter from which server I access the memcache variable, it will get/set the same variable, and therefore I can achieve my goal in keeping a cross-server global variable.
I wanted to know if it's a good practice and wherever there is a better way to implement such feature.
I am developing my application in ASP.NET C# and with MySQL. Also take into consideration that some of the aggregated data should be written to the database, and I do that to prevent a lot of writes at the same time, and write data after it reaches 20 writes for example and then the data will be written to the database.
Just to clear up a few things. First lets make sure that we understand how to use ElasticCache. The API for ElasticCache doesn't give us any CRUD operations on the cache cluster, the API from Amazon is strictly for managing the servers and configuration. You will need to use a memcached library for .NET to connect to the cluster. Using a cache like memcached is a good solution for you're first problem. It will easily and quickly store simple application variables in a distributed environment. Using a cache is generally a good practice even with smaller applications.
I'm not sure how many users you have or how many you expect to have but one thing I've learned in my years programming is that over optimization is usually a bad idea. Over optimization is when you start to optimize you're code before it's really necessary. Take you're proposed optimization for example. We know that making 1 write on the database is quicker than making 20 writes, generally speaking of course. However, unless your database is the bottleneck in your application to implement such a feature you introduce a significant amount of complexity for no immediate benefit. If a memcached cluster server crashes, which it will, then the data waiting to be written to the database is lost. If you really do have a lot of users then you have to start thinking about concurrency and locks on the memcached items.
Without knowing more about your application i can't make any real recommendations except to say that make sure your optimization are required before you spend time increasing the complexity of your application for nothing.

Adding more hardware v/s refactoring code under a time crunch

Background:
Enterprise application - very will written for its time in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows user to go through various wizards for lack of a better term, all of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL server database very frequently because we allow users to pause/resume their application. Often in these wizards, the XML that comprises the wizard state grows very large, I'm talking 5-8 MB of data, and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database, because a lot of what the wizard state is comprised of, is keeping track of collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For arguments sake, because of the complexity of the application, lets say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong and it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the xml bloat that we persist frequently to the database. Probably in the long run, reducing the size of the xml that we pass back and forth is the right idea, but will adding additional web servers truly have no effect, I just think in terms of simultaneous users, it should help.
Any responses thoughts are appreciated, proof that more web servers would help would be pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
If the bottleneck is the database, then more web services will not help you a lot.
The problem may be that the problem is not only the size of the data, but the number of concurrent request to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries you may try to break out the XML write from that transaction to reduce locking time of the XML table.
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
No one seems to have suggest this, what about replacing your XML serialization of your wizard with JsonSerialization.
Not only should this give you a minor boost in performance in the serialization itself since both the DataContractSerializer (faster) and Newtonsoft Json.NET (fastest) out perform the XML serializers in .NET. This should easily reduce the size of your object graph by upwards of 50% or more (depending on number of properties vs large strings in the XML).
This should dramatically lower the IO that is inflicted upon Sql server. This should also limit the amount of scope required to alter your application significantly (assuming it's well designed and works through common calls for serialization/deserialization).
If you choose to go this route also invest time comparing BSON vs JSON as I think it would be likely that the binary encoded one will offer even more space savings (and further IO reduction) due to the size of your object graphs.
I'm not a .NET expert but maybe using a binary serialization would increase throughput. Making sure that the XML isn't stored as text (fairly obvious but thought I'd mention it). Also relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on, without understanding the resource constaint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom, and are never imposed by SQL Server, they're an artifact of your driver configuration or a firewall or similar device between app and db imposing them (unless you're talking about timeouts for new connections, then you have a host in serious distress under load).
Given your symptom is database timeouts, you need to start there. If they're indicative of long running statements that result in a socket timeout, use SQL Server profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement tuning related, it probably boils down to resource limitations CPU, memory or disk IO capacity
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that given a database performance issue, more applications servers is likely to worsen the problem as you increase the amount of concurrency, that might otherwise be kept in check by connection pool, request processing or other limits.

How many is too many databases on SQL Server?

I am working with an application where we store our client data in separate SQL databases for each client. So far this has worked great, there was even a case where some bad code selected the wrong customer ids from the database and since the only data in the database belonged to that client, the damage was not as bad as it could have been. My concerns are about the number of databases you realistically have on an SQL Server.
Is there any additional overhead for each new database you create? We we eventually hit a wall where we have just to many databases on one server? The SQL Server specs say you can have something like 32000 databases but is that possible, does anyone have a large number of database on one server and what are the problems you encounter.
Thanks,
Frank
The upper limits are
disk space
memory
maintenance
Examples:
Rebuilding indexes for 32k databases? When?
If 10% of 32k databases each has a active set of 100MB data in memory at one time, you're already at 320GB target server memory
knowing what DB you're connected too
...
The effective limit depends on load, usage, database size etc.
Edit: And bandwidth as Wyatt Barnett mentioned.. I forgot about network, the bottleneck everyone forgets about...
The biggest problem with all the multiple databases is keeping them all in synch as you make schema changes. As far as realistic number of databases you can have and have the system work well, as usual it depends. It depends on how powerful the server is and how large the databases are. Likely you would want to have multiple servers at some point not just because it will be faster for your clients but because it will put fewer clients at risk at one time if something happens to the server. At what point that is, only your company can decide. Certainly if you start getting a lot of time-outs another server might be indicated (or fixing your poor queries might also do it). Big clients will often pay a premium to be on a separate server, so consider that in your pricing. We had one client so paranoid about their data we had to have a separate server that was not even co-located with the other servers. They paid big bucks for that as we had to rent extra space.
ISPs routinely have one database server that is shared by hundreds or thousands of databases.
Architecturally, this is the right call in general. You've seen the first huge advantage--oftentimes, damage can be limited to a single client and you have near zero risk of a client getting into another client's data. But you are missing the other big advantage--you don't have to keep all the clients on the same database server. When you do get big enough that your server is suffering, you can offload clients onto another box entirely with minimal effort.
I'd also bet you'll run out of bandwidth to manage the databases before your server runs out of steam to handle more databases . . .
What you are really asking about is Scalability; Though, ideally setting up 32,000 Databases on one Server is probably not advantageous it is possible (though, not recommended).
Read - http://www.sql-server-performance.com/articles/clustering/massive_scalability_p1.aspx
I know this is an old thread but it's the same structure we've had in place for the past 2 years and current run 1768 databases over 3 servers.
We have the following setup (not included mirrors and so on):
2 web farm servers and 4 content servers
SQL instance just for a master database of customers, which is queried when they access their webpage by the ID to get the server/instance and database name which their data resides on. This is then stored in the authentication ticket.
3 SQL servers to host customer databases on with load spread on creation based on current total number of learners that exist within all databases on each server (quickly calculated by license number field in master database).
On each SQL Server there is a smaller master database setup which contains shared static data that is used by all clients, therefore allowing smaller client databases and quicker updating of the content.
The biggest thing as mentioned above is keeping the database structures synchronises! For this I ended up programming a small .NET windows form that looks up all customers in the master database and you paste code in to execute and it'll loop through getting the database location and executing the SQL you past.
Creating new customers also caused some issues for us, so I ended up programming a management system for our sale people and it create a new database based on a backup of a inactive "blank" database, therefore we have the latest DB without need to re-script the entire database creation script. It then inserts the customer details inside the master database with location of where the database was created and migrates any old data from an old version of our software. All this is done on a separate instance before moving, therefore reducing any SQL locks.
We are now moving to a single database for our next version of the software as database redundancy is near impossible with so many databases! This is a huge thing to consider as SQL creates a couple of waiting tasks which mirror your data per database, once you start multiplying the databases it gets out of hand and the system almost solely is tasked with synchronising and can lock up due to the shear number of threads. See page 30 of Microsoft document below:
SQLCAT's Guide to High Availability Disaster Recovery.pdf
I do however have doubts about moving to a single database, due to some concerns as mentioned above, such as constantly checking in every single procedure that the current customer has access to only their data and also things along the lines of one little issue will now affect every single database, such as table indexing and so on. Also at the minute our customer are spaced over 3 servers, but the single database will mean yes we have redundancy, but if the error was within the database rather than server going down, then that's every single customer down, not just 1 customer database.
All in all, it depends what you're doing and if you are wanting the redundancy; for me, the redundancy is now key and everything else in a perfect world shouldn't happen (such as error which causes errors within the database for everyone). We only started off expecting a hundred or so to move to the system from the old self hosted software and that quickly turned into 200,500,1000,1500... We now have over 750,000 users use our system each year and in August/September we have over 15,000 concurrent users online (expecting to hit 20,000 this year).
Hope that this is of help to someone along the line :-)
Regards
Liam

Multi-Database Transactional System & ASP.NET MVC

So I have a challenge to build a site that people online can use to interact with organizations.: Asp.NET MVC Customer Application
One of the requirements is financial processing and accounting.
I'm very comfortable using SQL Transactions and stored procedures to do this; i.e. CreateCustomer also creates an entity, and an account record. We have a stored procedure to do this, that does a begin transaction, creates some setup records we need, then does a commit. I'm not seeing a good way to do this with an ORM, and after reading some great blog articles I'm starting to wonder if I'm going down the wrong path.
Part of the complexity here is the data itself:
I'm querying x databases (one per existing customer) to get some of my data, though my app has its own data store as well. I need to query the x databases, run stored procedures on the x databases, and also to my own datastore.
I'm not seeing strong support for things like stored procedures and thereby transactions, though it does seem to be present.
Maybe I'm just trying to make my app a nail here, cause the MVC hammer is sooo shiny. I'm plenty comfortable with raw ADO.NET of course, but I'm in love with the expressive feel to writing Linq code in C# and I'd rather not give up on it.
Down to the question:
Is this a bad idea? Should I try to use Linq / Entity Framework, or something like nHibernate... and stick with the ORM pattern or should I trash it and use raw ADO.NET data access?
Edit: a note on scale; from a queries per second standpoint this app is not "huge". But, from a data complexity perspective, it does need to query against 50+ databases (all identical, or close to it) to read data from an external application and publish data back to that application. ORM feels right when dealing with "my" data store, but feels very wrong for accessing the data from the external application.
From a certain size (number of databases) up, you have to change the paradigm. Are you at that size?
When you deploy what ultimately is a distributed application and yet try to controll it as an ordinary local application you are going to run into a set of fundamental issues around availability, scalability and correctness. If you use concepts like 'distributed transactions', 'linked servers' and 'ORM', your are down the wrong path. True distributed applications will use terms like 'message', 'queue' and and 'service'. Terms like Linq, EF, nHibernate are all fine and good, but none will bring you anything extra from what a simple Transact-SQL SELECT statement brings. In other words, if a SELECT solves your issues, then the client side various ORM will work. If not, they won't add any miraculos value.
I recommend you go over the slides on the SQLCAT: High Performance Distributed Applications in Real World Deployments which explain how a site like MySpace manages to read and write into a store of nearly 500 servers and thousands of databases.
Ultimately what you need to internalize is this: one database can have 95% availability (uptime and acceptable service response time). A system consiting of 10 databases with 95% availability has 59% availability. And a system of 100 databases each with 99.5% availability has 60% availability. 1000 databases with 99.95% availability (5 min downtime per week) have 60% availability. And this is for an ideal situation. In reality there is always a snowball effect caused by resource consumption (eg. threads blocked on trying to access an unavailable or slow resource) that makes things far worse.
This means that one cannot write a large distributed system relying on synchronous, tightly coupled operatiosn and transactions. Is simply impossible. You always rely on asynchronous operations (usually messaging and queues), which is something completely different from your run-of-the-mill database application.
use TransactionScope object available in System.Transaction.
What I have chosen is to use Entity Framework to allow access to the application's main data store, and create a custom DAL for access to external application data and for access to stored procedures within the application.
Here's hoping Entity Framework 4.0 fixes the issue. For now, I'm using the concept listed here.
http://social.msdn.microsoft.com/forums/en-US/adodotnetentityframework/thread/44a0a7c2-7c1b-43bc-98e0-4d072b94b2ab/

Resources