Cross-server In-memory data (as variable) per user or global (for all users) - asp.net

My question is regarding aggregated data for fast access across several servers on Amazon EC2. In an ASP.NET application, I would probably store that data in Application["somevar"] variable so it can be accessed quickly (in memory) by all users.
The problem starts when I want that aggregated data to be gathered with the same value on all servers. If I choose to deploy two servers, a user's requests may reach a different server each time (the servers sit behind a load balancer or Elastic Beanstalk), so if, for example, I count the number of times the user requested the page, each server's Application variable will hold a different value.
For example:
Server 1:
Application["counter1"] = 120
Server 2:
Application["counter1"] = 130
What I want is a variable that is the same on all servers. The reason I want the data in an Application-like variable is that I want it in memory for fast access; later I might write that data to the database.
What I want to know is how I can achieve this. I thought about using Amazon ElastiCache, so that even if I have 10 servers under the load balancer, I can access the ElastiCache variable via the API. It would not matter which server accesses the memcache variable, because every server would get/set the same variable, and so I would achieve my goal of keeping a cross-server global variable.
I wanted to know whether this is good practice and whether there is a better way to implement such a feature.
I am developing my application in ASP.NET C# with MySQL. Also take into consideration that some of the aggregated data should be written to the database. I buffer it to avoid a lot of writes at the same time: the data is written to the database only after it accumulates, for example, 20 pending writes.

Just to clear up a few things. First, let's make sure that we understand how to use ElastiCache. The ElastiCache API doesn't give us any CRUD operations on the cache cluster; the API from Amazon is strictly for managing the servers and configuration. You will need to use a memcached library for .NET to connect to the cluster. Using a cache like memcached is a good solution for your first problem. It will easily and quickly store simple application variables in a distributed environment. Using a cache is generally good practice even with smaller applications.
I'm not sure how many users you have or how many you expect to have, but one thing I've learned in my years of programming is that over-optimization is usually a bad idea. Over-optimization is when you start to optimize your code before it's really necessary. Take your proposed optimization, for example. We know that making 1 write to the database is quicker than making 20 writes, generally speaking of course. However, unless your database is the bottleneck in your application, implementing such a feature introduces a significant amount of complexity for no immediate benefit. If a memcached cluster server crashes, which it will, then the data waiting to be written to the database is lost. If you really do have a lot of users, then you have to start thinking about concurrency and locks on the memcached items.
Without knowing more about your application I can't make any real recommendations, except to say: make sure your optimizations are required before you spend time increasing the complexity of your application for nothing.
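The buffering idea from the question, and the failure mode described above, can be sketched as follows. This is only an illustration in Python, not a memcached client: `SharedCounterStore` is a hypothetical in-process stand-in for the cache cluster (its `incr` imitates the atomic increment command that memcached provides), `BufferedCounter` is an invented helper name, and `persisted` stands in for the MySQL table. The threshold of 20 comes from the question.

```python
import threading


class SharedCounterStore:
    """Hypothetical stand-in for a memcached cluster shared by all web servers."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # memcached offers an atomic increment; the lock imitates that here.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + delta
            return self._values[key]


class BufferedCounter:
    """Counts in the shared store and persists once per `threshold` hits."""

    def __init__(self, store, key, flush_to_db, threshold=20):
        self.store = store
        self.key = key
        self.flush_to_db = flush_to_db
        self.threshold = threshold

    def hit(self):
        total = self.store.incr(self.key)
        # Exactly one caller sees each multiple of the threshold, so only
        # that caller writes to the database.
        if total % self.threshold == 0:
            self.flush_to_db(self.key, total)
        return total


persisted = {}                      # stand-in for the database table
store = SharedCounterStore()
# Two "servers" share the same store, so they see the same counter.
server1 = BufferedCounter(store, "counter1", persisted.__setitem__)
server2 = BufferedCounter(store, "counter1", persisted.__setitem__)

for _ in range(25):
    server1.hit()
for _ in range(20):
    server2.hit()
```

After these 45 hits the database stand-in holds 40, and the remaining 5 exist only in the cache. That remainder is exactly what is lost if the cache node crashes before the next flush.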

Related

ASP.NET Application In Multiple Datacenters - Best Architecture?

I've traditionally followed the 'one app, one server' architecture for most of the ASP.NET/SQL Server applications I've worked on. I mean that loosely, having used multiple servers with a load balancer, etc. But they have all been in the same datacenter.
However, recently, a requirement has come up to scope an application which will support users in the US, China, and Russia. Performance will be fairly critical, so what is the most sensible way to architect such an application so it performs well in all these areas?
The options I've come up with are:
Use one single data center (ie: don't host in multiple places around the world). Deliver static content over a CDN, but database and ASP.NET site will be hosted in one place (eg: US). This seems like performance may still be an issue though.
Use multiple data centers, and have multiple versions of the application. Eg: ru.myapp.com, us.myapp.com, ch.myapp.com with their own code/databases/etc. This will work, but things like reporting, management, etc would need to be done in each application, which seems like the least efficient approach.
Use a different architecture - but I'm not familiar with alternatives. Is it possible to architect in such as way that you have one single application and database that works across multiple data centers (like a load balanced environment, but on a larger scale).
Does anyone have any experience in the best way to handle this?
Your front end servers can work across data centers the same way they work within a single data center. There are some differences though: a load balancer does not usually work cross colo, so you would have to use geo DNS to route people to the nearest data center and then use a load balancer within that data center.
The main issue that comes up is use of shared resources, such as DB or a web service such as authentication web service. If you really need a single DB, then one architecture is to have a single master, but multiple read slaves spread across different data centers. The reads are then NOT paying any penalty for going cross colo. The writes do have to go cross colo and thus pay the latency penalty. This works for most sites where writes are much less numerous than reads and where write performance can be 1-2 seconds slower than read and still be counted as acceptable. e.g. take a movie ticket booking site. The reads are overwhelmingly more than writes.
The cross colo performance can be dramatically improved through the following choices
1. Minimize the number of round trips. e.g. do all writes over a single transaction rather than doing multiple writes through multiple calls to DB. i.e. use batch queries, stored procedures, batch remote call etc.
2. Use optimistic write/eventual consistency if possible. e.g. say you are recording the time a user logged in. You can very well make it asynchronous, where the time is eventually recorded. Though there are scenarios where the eventual consistency is not acceptable.
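To see why the number of round trips dominates, a back-of-the-envelope calculation helps. The figures below are assumptions chosen for illustration (a 150 ms cross-data-center round trip, 10 statements per user action, 20 ms of total work on the database), not measurements, and `total_latency_ms` is a made-up helper.

```python
def total_latency_ms(round_trips, rtt_ms, server_time_ms=0.0):
    """Time spent waiting on the network plus work done on the database server."""
    return round_trips * rtt_ms + server_time_ms


RTT_MS = 150.0    # assumed cross-data-center round trip
WRITES = 10       # assumed number of statements in one user action
WORK_MS = 20.0    # assumed total execution time on the database

separate_calls = total_latency_ms(WRITES, RTT_MS, WORK_MS)  # one call per write
single_batch = total_latency_ms(1, RTT_MS, WORK_MS)         # one batched call

print(f"separate calls: {separate_calls:.0f} ms")
print(f"single batch:   {single_batch:.0f} ms")
```

Under these assumptions the same work takes 1520 ms as ten separate calls and 170 ms as one batch, which is why a stored procedure or batch query matters far more across data centers than inside one.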

Caching large amounts of data

I have been reading that lots of people use Redis or another key-value store/NoSQL solution as a distributed cache for their website.
Maybe I'm not understanding completely, but it seems a solution like this only works for shared data. For example, if I have a website that requires a user to log-in and the queries they generate return data specific to only that user (in my case, banking/asset information) that can't be cached for all users, this type of solution doesn't work.
Unfortunately, the database is shared across all our applications, and when it gets bogged down, the website gets bogged down as well. Since each user has gigabytes of information, I obviously can't cache all of that, and each web page queries completely different information.
Is there some caching strategy that I can employ for this type of scenario?
A distributed cache like Velocity doesn't require that the data it stores be limited to "shared" data. But you do have to read the data from your DB and store it in the cache, which takes time.
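A minimal sketch of what per-user caching looks like, assuming a cache-aside pattern: the user id is part of the cache key, so nothing is shared between users, and the first request still pays for the database read. `TtlCache` is a hypothetical in-memory stand-in for a distributed cache client, and `load_balances_from_db` is a placeholder for the real query.

```python
import time


class TtlCache:
    """Hypothetical stand-in for a distributed cache client (get/set with expiry)."""

    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


db_calls = []


def load_balances_from_db(user_id):
    # Placeholder for the real (slow) query against the shared database.
    db_calls.append(user_id)
    return {"user_id": user_id, "balance": 100 * user_id}


def get_balances(cache, user_id, ttl_seconds=60):
    key = f"balances:{user_id}"   # the user id keeps the entry private to that user
    cached = cache.get(key)
    if cached is not None:
        return cached             # cache hit: no database round trip
    fresh = load_balances_from_db(user_id)
    cache.set(key, fresh, ttl_seconds)
    return fresh


cache = TtlCache()
get_balances(cache, 1)   # miss: reads the database
get_balances(cache, 1)   # hit: served from the cache
get_balances(cache, 2)   # different user, different key: reads the database
```

Only the data a user actually requests is cached, and it expires after the time-to-live, so the gigabytes per user never need to fit in the cache at once.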
A few alternatives:
Partition your data, so it's spread out among several DB servers
Add as much RAM as you can to each DB server, to allow SQL Server to cache what it can
There are many variations to the partitioning theme....
Is your web app load balanced? There are caching options at the web tier as well -- the ASP.NET object cache is a good place to start.
It's possible that your web clients are requesting the same data more than once (for a given user). So caching could give a benefit in that case.
But before you go implementing a huge caching solution, you really need to look at the queries that are particularly slow or executed a huge number of times and see if you can optimize them in any way.
Then look at upgrading your DB machine.
I read a nice article about the performance issues that MySpace had when they had a huge growth.
You can find the article here.
One quote from the article that stands out:
The addition of the cache servers is "something we should have done from the beginning, but we were growing too fast and didn't have time to sit down and do it," Benedetto adds.
If the problem is in your database server, think about partitioning your data and making use of a database farm to spread the load. Also think about SSDs! They can really speed up your database access.
Depending on how dynamic your data is, you could consider using Fragment Caching. This caches the HTML of the page rather than the data, so if the volume of data is prohibitive to cache, this might work for you.

Live Data Web Application Design

I'm about to begin designing the architecture of a personal project that has the following characteristics:
Essentially a "game" containing several concurrent users based on a sport.
Matches in this sport are simulated on a regular basis and their results stored in a database.
Users can view the details of a simulated match "live" when it is occurring as well as see results after they have occurred.
I developed a similar web application with a much smaller scope as the previous iteration of this project. In that case, however, I chose to go with SQLite as my DB provider since I also had a redistributable desktop application that could be used to manually simulate matches (and in fact that ran as a standalone simulator outside of the web application). My constraints have now shifted to be only a web application, so I don't have to worry about this additional level of complexity.
My main problem with my previous implementation was handling concurrent requests. I made the mistake of using one database (which was represented by a single file on disk) to power both the simulation aspect (which ran in a separate process on the server) and the web application. Hence, when users were accessing the website concurrently with a live simulation happening, there were all sorts of database access issues since it was getting locked by one process. I fixed this by implementing a cross-process mutex on database operations but this drastically slowed down the performance of the website.
The tools I will be using are:
ASP.NET for the web application.
SQL Server 2008 R2 for the database... probably with an NHibernate layer for object relational mapping.
My question is, how do I design this so I will achieve optimal efficiency as well as concurrent access? Obviously shifting to an actual DB server from a file will have its positives, but do I need to have two redundant servers, one for the simulation process and one for the web server process?
Any suggestions would be appreciated!
Thanks.
You should be fine doing both on the same database. Concurrent access is what modern database engines are designed for. Concurrent reads are usually no problem at all; concurrent writes lock the minimum possible amount of data (a table, or even just a number of rows), not the entire database.
A few things you should keep in mind though:
Use transactions wisely. On the one hand, a transaction is an important tool in making sure your database is always consistent - in short, a transaction either happens completely, or not at all. On the other hand, two concurrent transactions can cause deadlocks, and those buggers can be extremely hard to debug.
Normalize, and use constraints to protect your data integrity. Enforcing foreign keys can save the day, even though it often leads to more cumbersome administration.
Minimize the amount of time spent on data access: don't keep connections around when you don't need them, make absolutely sure you're not leaking any connections, don't fetch data you know you don't need, and do as much data-related processing as possible (especially things that can be solved using joins, subqueries, groupings, views, etc.) in SQL instead of in code.
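The "completely, or not at all" property of a transaction can be demonstrated in a few lines. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in for SQL Server; the `matches` and `standings` tables and the `record_match` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER PRIMARY KEY, home TEXT, away TEXT)")
conn.execute("CREATE TABLE standings (team TEXT PRIMARY KEY, played INTEGER NOT NULL)")
conn.executemany("INSERT INTO standings VALUES (?, 0)", [("Lions",), ("Tigers",)])
conn.commit()


def record_match(conn, home, away, fail_midway=False):
    # `with conn` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO matches (home, away) VALUES (?, ?)", (home, away))
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE standings SET played = played + 1 WHERE team IN (?, ?)",
            (home, away),
        )


def counts(conn):
    matches = conn.execute("SELECT COUNT(*) FROM matches").fetchone()[0]
    played = conn.execute("SELECT SUM(played) FROM standings").fetchone()[0]
    return matches, played


try:
    record_match(conn, "Lions", "Tigers", fail_midway=True)
except RuntimeError:
    pass
after_failure = counts(conn)   # the half-finished insert was rolled back

record_match(conn, "Lions", "Tigers")
after_success = counts(conn)   # both writes become visible together
```

Keeping the transaction this short, two statements and nothing else, also follows the advice above about holding locks and connections for as little time as possible.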

Static variable across multiple requests

In order to improve speed of chat application, I am remembering last message id in static variable (actually, Dictionary).
However, it seems that every thread has its own copy, because users do not get updates in production (a single-server environment).
private static Dictionary<long, MemoryChatRoom> _chatRooms = new Dictionary<long, MemoryChatRoom>();
No ThreadStaticAttribute is used...
What is a fast way to share a few ints across all application processes?
update
I know that the web must be stateless. However, for every rule there is an exception. Currently all data is stored in MS SQL, and in this particular case a small piece of shared memory will increase performance dramatically and avoid pointless SQL requests.
I have not used static for years, so I even missed the moment when there started to be multiple instances in the same application.
So, the question is: what is the simplest way to share in-memory objects between processes? For now, my workaround is remoting, but it takes a lot of extra code and I am not 100% sure of the stability of this approach.
I'm assuming you're new to web programming. One of the key differences in a web application to a regular console or Windows forms application is that it is stateless. This means that every page request is basically initialised from scratch. You're using the database to maintain state, but as you're discovering this is fairly slow. Fortunately you have other options.
If you want to remember something frequently accessed on a per-user basis (say, their username) then you could use session. I recommend reading up on session state here. Be careful, however, not to abuse the session object -- since each user has his or her own copy of session, it can easily use a lot of RAM and cause you more performance problems than your database ever was.
If you want to cache information that's relevant across all users of your apps, ASP.NET provides a framework for data caching. The simplest way to use this is like a dictionary, eg:
Cache["item"] = "Some cached data";
I recommend reading in detail about the various options for caching in ASP.NET here.
Overall, though, I recommend you do NOT bother with caching until you are more comfortable with web programming. As with any type of globally shared data, it can cause unpredictable issues which are difficult to diagnose if misused.
So far, there is no easy way to communicate between processes. (And maybe this is a good thing for isolation and scaling.) For example, this is mentioned explicitly here: ASP.Net static objects
When you really need a web application/service to remember some state in memory, and NOT IN THE DATABASE, you have the following options:
1. Set the maximum worker process count to 1. This requires moving this piece of code into a separate web application. If you host it on a separate subdomain, you will run into cross-origin restrictions when accessing it from JS.
2. Remoting/WCF: you can host the critical data in a remoting application and access it from the web application.
3. Store the data in every process and synchronize changes via memcached. Memcached does not hold the actual data, because transferring it took too long; it holds only the last-changed date for each collection.
With #3 I am able to achieve more than 100 pages per second from a single server.
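Option #3 can be sketched roughly as follows. This is an assumption about how such a scheme might look, not the poster's code: each process keeps its own copy of the data and consults the shared store only for a small change stamp. `SharedStamps` is a hypothetical stand-in for memcached, and it uses an incrementing version number where the description above uses a last-changed date.

```python
class SharedStamps:
    """Hypothetical stand-in for memcached: one small version stamp per collection."""

    def __init__(self):
        self._stamps = {}

    def get(self, name):
        return self._stamps.get(name, 0)

    def bump(self, name):
        self._stamps[name] = self.get(name) + 1
        return self._stamps[name]


class ProcessLocalRooms:
    """Per-process copy of the chat data, reloaded only when the stamp changes."""

    def __init__(self, stamps, load_from_db):
        self._stamps = stamps
        self._load_from_db = load_from_db
        self._seen_stamp = None
        self._last_message_ids = {}
        self.reloads = 0

    def last_message_id(self, room_id):
        current = self._stamps.get("chat_rooms")           # cheap: one small value
        if current != self._seen_stamp:
            self._last_message_ids = self._load_from_db()  # expensive, but rare
            self._seen_stamp = current
            self.reloads += 1
        return self._last_message_ids.get(room_id)


database = {7: 100}                      # room id -> last message id
stamps = SharedStamps()
worker_a = ProcessLocalRooms(stamps, lambda: dict(database))
worker_b = ProcessLocalRooms(stamps, lambda: dict(database))

worker_a.last_message_id(7)              # first call loads the data
worker_a.last_message_id(7)              # stamp unchanged: served from memory

database[7] = 101                        # some process writes a new message...
stamps.bump("chat_rooms")                # ...and announces the change

latest_a = worker_a.last_message_id(7)
latest_b = worker_b.last_message_id(7)
```

Every request still costs one cache lookup, but the payload is a single small value, and the full data is re-read only after a writer has announced a change.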

Multi-Database Transactional System & ASP.NET MVC

So I have a challenge: to build a site that people online can use to interact with organizations, an ASP.NET MVC customer application.
One of the requirements is financial processing and accounting.
I'm very comfortable using SQL Transactions and stored procedures to do this; i.e. CreateCustomer also creates an entity, and an account record. We have a stored procedure to do this, that does a begin transaction, creates some setup records we need, then does a commit. I'm not seeing a good way to do this with an ORM, and after reading some great blog articles I'm starting to wonder if I'm going down the wrong path.
Part of the complexity here is the data itself:
I'm querying x databases (one per existing customer) to get some of my data, though my app has its own data store as well. I need to query the x databases, run stored procedures on the x databases, and also to my own datastore.
I'm not seeing strong support for things like stored procedures and thereby transactions, though it does seem to be present.
Maybe I'm just trying to make my app a nail here, cause the MVC hammer is sooo shiny. I'm plenty comfortable with raw ADO.NET of course, but I'm in love with the expressive feel to writing Linq code in C# and I'd rather not give up on it.
Down to the question:
Is this a bad idea? Should I try to use Linq / Entity Framework, or something like nHibernate... and stick with the ORM pattern or should I trash it and use raw ADO.NET data access?
Edit: a note on scale; from a queries per second standpoint this app is not "huge". But, from a data complexity perspective, it does need to query against 50+ databases (all identical, or close to it) to read data from an external application and publish data back to that application. ORM feels right when dealing with "my" data store, but feels very wrong for accessing the data from the external application.
From a certain size (number of databases) up, you have to change the paradigm. Are you at that size?
When you deploy what ultimately is a distributed application and yet try to control it as an ordinary local application, you are going to run into a set of fundamental issues around availability, scalability and correctness. If you use concepts like 'distributed transactions', 'linked servers' and 'ORM', you are down the wrong path. True distributed applications will use terms like 'message', 'queue' and 'service'. Linq, EF and nHibernate are all fine and good, but none will bring you anything beyond what a simple Transact-SQL SELECT statement brings. In other words, if a SELECT solves your issues, then the various client-side ORMs will work. If not, they won't add any miraculous value.
I recommend you go over the slides on the SQLCAT: High Performance Distributed Applications in Real World Deployments which explain how a site like MySpace manages to read and write into a store of nearly 500 servers and thousands of databases.
Ultimately what you need to internalize is this: one database can have 95% availability (uptime and acceptable service response time). If an operation needs every database to be available at once, the individual availabilities multiply. A system consisting of 10 databases, each with 95% availability, has about 60% availability. A system of 100 databases, each with 99.5% availability, also has about 60% availability, and so does a system of 1000 databases with 99.95% availability each (5 minutes of downtime per week). And this is for an ideal situation. In reality there is always a snowball effect caused by resource consumption (e.g. threads blocked on trying to access an unavailable or slow resource) that makes things far worse.
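The arithmetic behind these figures is a single exponentiation, assuming the databases fail independently and that a request needs all of them at once. `system_availability` is an illustrative helper name.

```python
def system_availability(per_database, database_count):
    """Availability of a request that needs every database to be up at once."""
    return per_database ** database_count


cases = [(0.95, 10), (0.995, 100), (0.9995, 1000)]
for availability, count in cases:
    combined = system_availability(availability, count)
    print(f"{count:>4} databases at {availability:.2%} each -> {combined:.1%} overall")
```

All three cases come out at roughly 60%, which shows that raising per-database availability by an order of magnitude only compensates for an order of magnitude more databases.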
This means that one cannot write a large distributed system relying on synchronous, tightly coupled operations and transactions. It is simply impossible. You always rely on asynchronous operations (usually messaging and queues), which is something completely different from your run-of-the-mill database application.
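A minimal sketch of the asynchronous style, assuming a simple in-memory outbox: the caller queues a message and moves on, and delivery is retried later if the target database is unavailable. `Outbox` and `FlakyTarget` are hypothetical names for illustration; a real system would use a durable queue rather than process memory.

```python
from collections import deque


class FlakyTarget:
    """Stand-in for one external customer database that may be down."""

    def __init__(self):
        self.up = False
        self.received = []

    def send(self, message):
        if not self.up:
            raise ConnectionError("customer database unreachable")
        self.received.append(message)


class Outbox:
    """Holds messages until the remote side accepts them."""

    def __init__(self, send):
        self._send = send          # callable that raises when the target is down
        self._pending = deque()

    def publish(self, message):
        # The caller returns immediately; delivery happens later.
        self._pending.append(message)

    def drain(self):
        """Try to deliver queued messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break              # target unavailable: keep the message, retry later
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)


target = FlakyTarget()
outbox = Outbox(target.send)
outbox.publish({"customer": 17, "event": "invoice_created"})
outbox.publish({"customer": 17, "event": "payment_posted"})

first_attempt = outbox.drain()    # target down: nothing delivered, nothing lost
target.up = True
second_attempt = outbox.drain()   # target back: both messages go out in order
```

The user-facing request never waits on the external database, so one unavailable database delays its own messages instead of failing the whole operation.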
Use the TransactionScope class, available in the System.Transactions namespace.
What I have chosen is to use Entity Framework to allow access to the application's main data store, and create a custom DAL for access to external application data and for access to stored procedures within the application.
Here's hoping Entity Framework 4.0 fixes the issue. For now, I'm using the concept listed here.
http://social.msdn.microsoft.com/forums/en-US/adodotnetentityframework/thread/44a0a7c2-7c1b-43bc-98e0-4d072b94b2ab/

Resources