Pros/cons of sharding using "app+db nodes" vs. separately sharding the db and load-balancing the app servers - scaling

We are preparing to scale the API side of an API-heavy web application. My (technically savvy) client proposes a rather unconventional approach to this: instead of balancing the load to several app servers, which would talk to a sharded database, he wants us to:
“shard the app servers”, putting both app server code and db on each physical server, so that the app server only connects to its own db shard;
have the app servers talk to each other when they need to access other shards (instead of talking to another shard's DB directly);
have the API client pick an app shard itself (on the client side, based on some stable hash) and talk directly to it.
The underlying reasoning is that this is the most natural thing to do it, and that this would allow us to move to a multisite distributed system in the future.
(The stack is PHP + Node.js on MySQL, although at this point a transition to MongoDB is considered too.)
Now, I don't see huge problems with it off the shelf. It might get somewhat cumbersome to code these server-to-server interactions, but then it will surely have its own benefits. Basically I'm at a loss on whether this is a good idea or not.
What pros and cons come to your mind? I'm looking for technical issues and advantages here. Thanks!

This is just plain bad for many reasons.
The API client should not know which app shard to talk to. This will limit you in ways you probably can't foresee now, but may/will become a problem in the future. The API client should play dumb so you can route requests appropriately if an app server dies, changes, gets sharded again etc.
What happens if your app code or database architecture is slow? (Not both at the same time, just one). Now you have a db shard slowing down an app shard.
Your db+app shards will need to keep both app code+memory and db code+memory in RAM. This means the CPUs will spend more time swapping code and memory in and out to perform both sets of tasks.
I'm finding it hard to put down in words, but this type of architecture screams 'bad coupling' and 'no separation of concerns' (probably not the right terminology but I hope you understand what I mean). You are putting two distinctly different types of applications (app server and database) onto one box. The management nightmare of updating them and routing around failed instances will be very difficult.
I hate to argue my point this way, but a lot of very smart people have dealt with these problems before and I've never heard of this type of architecture. There's probably a reason for it. Not to mention there's a lot of technology and resources out there that can help you handle traditional sharding and load balancing of app and database servers. If you go with your client's suggested architecture you're on your own.

Related

Cloud database for Azure multi-tenant application?

I am starting to port one old desktop single tenant application into the cloud and wish to hear what would be your recommendation about the databases for my cloud-based multi-tenant application?
My basic requirement is simple:
For each tenant, its data is separate to any other tenants' data. I can easily backup, restore, export the data for one single tenant without affecting other tenants.
I don't really want to care about multi-tenancy in the business logic code. It should look like a single tenant application behind the security layer, no tenant ID pass around etc.
Easy to query using some mature technology like LINQ.
Availability and scalability, of course, easy to set up replicas, fail-over and scaling up and down etc.
I have gone through some investigations about multi-tenant application development. I have noticed SQL databases from Azure and AWS are both very expensive(the cost for just SQL database instance is close to the license fee of the original application), so I definitely can't use separate SQL database instances for tenants.
Now I'm reading this book Developing Multi-tenant Applications for the Cloud, 3rd Edition, and it uses Azure Storage Service to implement multi-tenancy. I haven't finished the book yet, it seems you still have to handle the multi-tenancy by yourself and the sample code is already out of date.
I have seen lots of SO questions compare Azure Table Storage with MongoDB. The MongoDB is very new to me, not sure whether it could be easily used to fulfill my requirements?
And I have seen RavenDB as well, it does support multi-tenancy out of box. But I didn't see some good sample code about how to use it in Azure app development.
Hope to hear some good advices from awesome SO guys.
I would better opt with RavenDB on top of MongoDB. Even Raven is a new comer in to the game, it supports most of the features which traditional SQL supports.
Also to make up a decisions the volume of data you are dealing is a also a key decision pointer. Also the amount of traffic you are expecting.
Also keep in mind that operational costs and development efforts. HA and DR scenarios can be problematic when you use Raven or Mongo because of the fact that you need to host them. But when it comes to Azure Storage, it by defaults protects you to a maximum extent by maintaining 3 copies of information.
So I would suggest you to carefully make the trade offs and opt wisely based on your business needs, cost optimization, development and operational effort.
Having a single instance of your application for each tenant is a very expensive way to implement an application, however I realise that if an application was developed with a single tenant in mind, then the costs of changing over can be high.
First can we start out with why you have a desktop application connecting to a database at another location. The latency can really slow down an application. Ideally you would want a locally installed database and have it sync with the cloud DB, or add in appropriate caching into your application.
However the DB would still need to differentiate the clients.
Why do you need this to go to a cloud database? Is it for backup purposes, not installing a DB locally on a clients machine, accessing the same data from many machines or something else?
Unless your application is extremely large, I would recommend rewriting it for multi-tenant to one SQL Azure database. The architecture chosen at the beginning of the project doesn't suit your requirements now. As you expand you will run into further issues.

Can I move my asp.net application to the cloud?

Our company is thinking about moving to the cloud. Would we still be able to meet all our current requirements (below). We want to be able to easily scale in the future without high costs.
5 ASP.net 4.0 websites running (using sql databases, see below)
SQL Server 2008 Express (8 databases on this)
2 Scheduler services running (send nightly reports via email e.g. new orders in db)
MongoDB and Memcached are also installed on server
Currently the websites are on a separate server from the database server for security reasons.
We were thinking about Windows Azure and Amazon Web Services (AWS) as providers, which would best fit our requirements?
Are there any other factors we need to consider?
Re: SQL Databases: on Windows Azure this would map to SQL Azure. Costs start at $5/month for up to a 100 MB instance - and goes all the way up to 150 GB - and goes beyond that with Federations.
Re: 5 ASP.net 4.0 websites running: these map naturally into Windows Azure Web Roles. The "small" instance is $0.12/hour/instance, and you'll usually want two instances (to avoid single point of failure for a few scenarios). Depending on your load, you may be able to put all 5 sites on the same instances. If you have very low usage sites, consider the $0.05/hour/instance "extra small" instance.
Re: Currently the websites are on a seperate server from the database server for security reasons: of course this is also doable.
Re: 2 Scheduler services running: Running Windows Services is no problem.
Re: send nightly reports via email e.g. new orders in db: No problem doing, though is not baked into Windows Azure directly, but there are many simple ways to do this (even for free, such as via SendGrid).
Re: We want to be able to easily scale in the future without high costs: you will need to do the math regarding your actual costs, but Windows Azure can surely scale.
Re: MongoDB and Memcache are also installed on server: These can both be run on Azure. Check out https://github.com/mongodb/mongo for MongoDB. Also, the Azure Caching service is also avail (managed for you).
Re: We were thinking about Azure and Amazon as providers, which would best fit our requirements: These are functionally very similar (in capability and cost), with a few noteworthy differences.
Windows Azure is Platform as a Service meaning that you don't need to worry about Virtual Machines, but rather Applications. In other words, you upload your (basically) Zipped app package to the cloud for execution. With Amazon, you will be dealing with the Virtual Machine yourself. In Azure, you get a copy of Windows Server 2008 which is managed for you, but you can also do admin things to it if you need to. This is far less of an advantage if your app is an old messy install that isn't really clean (though may not be a good high-value cloud candidate anyway).
Windows Azure has an emulator that works great - F5 right from visual studio to work with storage system and VMs and more popular features.
Re: Are there any other factors we need to consider: Yes. With any cloud application, you need to be prepared to deal with scaling out (not up), dealing with transient retries (you may need to retry an operation to a cloud service - any cloud service). The benefits of this are much better (and more cost-effective) scalability and higher reliability (when you run across nodes, you don't have a single point of failure). Be sure to understand when/where storage on a VM is persistent vs. ephemeral. There are more considerations, but these are primary ones.
You may want to check out the Windows Azure Pricing calculator.
Good luck! And welcome to the cloud.
with the exception of the scaling question, and the 2 physical servers, you can move this functionality into a hosted environment and you will technically be in "the cloud". This could be a dedicated or VPS (Virtual Private Server), or even a shared server if you are small.
Those can allow for growth over time...you just need to upgrade what you have with the provider.
You also could use a colo-server with a hosting provider, which basically means you put your hardware in an hosting provider rack, and use their electricity and bandwidth. They charge based on bandwidth usage.
Since you are using SQL Express, remember that each database is limited to 8gb. So that will limit your growth at some point. That would entail an upgrade from Express to regular SQL if you don't want to re-engineer anything.
Have you considered AppHarbour? It has Memcached, MongoDB, SQL Server and so on, and is quicker to deploy to than Azure. I like Azure, but there is quite a learning curve and I have found the connection to SQL Azure to be pretty bad - which means re-engineering your DAL to use something like the SQL Transient Failure Library = a bit of a faff for existing projects.
AppHarbour does not have blob storage - so if you are uploading files you will need to use Azure Blob Storage or Amazon S3 or some equivalent as well.
Hope this helps.
Not an expert but being that Asp.net is a Microsoft product it should be easier to migrate to azure, although from what I have heard AWS shouldn't be difficult. Another thing you may want to consider is cost. Last time I checked AWS is significantly less costly unless you already pay for MSDN subscriptions.
All the requirements you sum up are not any issue to deploy in Windows Azure. You can find a lot of information on the internet on how to do this.
Keep in mind, if you want to deploy your services to windows azure, you'll need to do some code review of your applications to fix session state, output cache and so forth on your web applications.
Since you want to scale them out and they are sitting behind a non-sticky round-robin load balancer, you will run into issues with your session state if it is saved on the machine itself. You'll need to part session state to SQL Azure or to the Windows Azure table storage for example.
Installing MongoB and Memcache in Azure is not an issue, you'll find a lot of information on how to do it, but it'll require some to set up your role and the scripting
codingoutloud has given a very detailed answer. I would add two very key considerations to think about when moving any application to Azure (or, indeed, many other cloud providers).
Local state
With normal Azure, they reserve the right to shut down any one instance of a role at any time in order to move or upgrade it. This means you always need at least two instances of any one role and they will be transparently load balanced. If your current websites are currently running on individual servers then they may rely on session state or files in local directories etc. Now, there are ways around this (like putting session state in SQL, using the cookie provider for temp data, using a shared drive for files etc) or, indeed, bypassing a lot of the benefits of Azure and using their "virtual server" concepts which means you don't get the scale benefits etc.
But, sites that rely heavily on local state may be challenging to move to the cloud.
Time Zones
All Azure servers run on UTC time. If you are used to running on dedicated servers serving users from a single time zone then chances are that you use things like DateTime.Now() which won't really correspond to what the user wants.
I don't see any of the above as limitations of Azure, I find them very useful in forcing you to build global and scalable solutions from the start. However, when porting an existing application, the above may be quite a challenge to adapt to, even though there are workarounds.
As also mentioned elsewhere, there is a learning curve to Azure and somehow the documentation - plentiful as it is - just doesn't quite seem to help for some reason. Once you "get it", though, I find Azure really nice and there are a bunch of subtle features that will help you build scalable solutions, like the whole queuing infrastructure, the blob storage and the table storage. In some ways the learning is hampered by having too much choice.
Good luck!

SQLite use it for websites, but not for client/server apps?

After reading this question and the suggested link explaining when is more appropriate to use SQLite vs another DB it's still unclear to me one simple thing, and I hope someone could clarify it.
They say:
Situations Where SQLite Works Well
Websites
SQLite usually will work
great as the database engine for low
to medium traffic websites...
...
Situations Where Another RDBMS May
Work Better
Client/Server
Applications...
If you have many
client programs accessing a common
database over a network...
Isn't a website also a client/server app?
I mean I don't understand, a website is exactly a situation where I have many client programs (users with their web browsres) concurrently accessing a common DB via one server application.
Just to keep it simple: at the end of the day, is it possible for instance to use this SQLite for an ecommerce site or an online catalog or a CMS site with about 1000 products/pages?
The users' web browsers don't directly access the database; the web application does. And normally the request/response cycle for each page the user views will be very fast, usually lasting a fraction of a second.
IIRC, a transaction in SQLite locks the whole database file, meaning that if a web app request requires a blocking transaction, all traffic will effectively be serialized. This is fine, for a low-to-medium traffic website, because many requests per second can still be handled.
In a client-server database application, however, multiple users may need to keep connections open for longer periods of time, and may also need to perform transactions. This is far less of a problem for bigger RDBMS systems because locking can be performed in a more fine-grained way.
SQLite can allow multiple client reads but only single client write. See: https://www.sqlite.org/faq.html
Client/server is when multiple clients do simultaneous writes to the database, such as order entry where there are multiple users simultanously inserting and updating information, or a multi-user blog where there are multiple simultaneous editors.
A website, in the case of read-only, is not client/server but rather simply a server with multiple requests. In many cases, a website is heavily cached and the database is not even accessed, or rarely.
In the case of a slightly used ecommerce website, say a few simultaneous shoppers, this could be supported by SQLite, or by MySQL. Somewhere there is a line where performance is better for a highly-concurrent database as opposed to SQLite.
Note that the number of products/pages is not a great way to determine the requirement for MySQL over SQLite, rather it is the number of concurrent users, and at what point their concurrent behavior experiences slowness due to waiting for locks to clear.
A website isn't necessarily a client server application in the context of use.
I think when they say website, they mean that the web application will directly manage the database. That is, the database file will live within the web site and will not be access via any other means. (A single point of access, put simply)
In contrast, a client/server app may have the web site accessing the data store as well as another web site, SOAP client or even a smart client. IN this context, you have multiple clients access one database (server). This is where the web site would become (yet another) client.
Another aspect to consider when constrasting the two, is what is the percentage of writes compared to reads. I think SQLite will perform happiply when there is little writing going on compared to the amount of reads. SQLite, I understand, doesn't do well in a multiple write scenario. It's intended for a single (handful?) process to be manipulating it.
I mainly only use SQLite on embedded applications. (iOS, Android). For larger, more complex websites (like your describing) I would use something like mySQL.

Moving an Asp.net application to a load-balanced environment

I have an application which works well on a single server. My customer now wants to move it to a load-balanced environment. Which things are likely to bite me when doing this?
Currently I know of
Session state, and
Machine key.
Both of these are described here, for example, so I'm looking for things additional to this.
These similar questions, but the first addresses load balancing in general, and I'm looking for a migration guide, and the second addresses a specific problem.
One thing that you might experience is an increased load on your database server. If you implement any serverside caching of data, you will be used to your site hitting the database once for a given dataset, caching the data and then not going to the database again until the cache times out. This is a good strategy for commonly accessed data.
In a load-balanced environment, subsequent requests might go to different servers and the database will be hit twice for the same data. Or more if you have more than 2 servers. This is not bad in itself, but you would be well advised to keep an eye on it. If the database is queueing, the benefits of running a webfarm might be negated. As always, benchmark, profile and run tests.
One way around this is to have sticky-sessions. This is a router-based approach. once a user session is established, all requests from that user are routed to the same server. This has its draw backs, most notably, a potential decrease in the efficiency of the load-balancing, not to mention the problems when you lose a server. Furthermore, it will only help when you are caching mostly user-specific data, such as paged search results, which are only cached for a short time.
Another solution is to have a distributed, in memory cache, such as memcached, Velocity or NCache. There are others as well, but these are the ones that I have worked with.
Another thing to look out for, which is unrelated to the above is: How you deal with file uploads from your users. Many sites allow files to be uploaded by users. If such files have not previously been saved to a central store, such as a database or a common file share, they will need to be.
Look at this article which describes some tips regarding preparing asp.net application for load balancing.

How to upgrade my asp.net app to support more users?

When an asp.net website has about 1,000 active users, it works good.
How should I do if the website has about 100,000 active users?
How to upgrade my asp.net app to support a larger number of users?
Changing the webApp's architecture?
Or buying more web servers?
I just wonder in the real-world, how do other people build an asp.net website supporting millions of users? What's the app architecture of a website to support that?
Any suggestion will be welcome.
First, make sure you're with a first rate hosting provider.
Second, download a performance profiler (I always suggest Red Gate Performance Profiler) and profile your app. Find the bottlenecks and eliminate them. Repeat until you get your desired performance metric.
If your application is querying a database or other web services, try to use asynchronous methods. Using asynch methods will free up the web server to handle a lot more client requests while it is waiting for a response from the database server or web service.
You say it "works good" at the moment. It's impossible to know what the point at which this may change will be wihtout knowing a whole lot more about the nature of your traffic, current set up, what else runs on the server, etc ,etc. It could be that it continues to "work good" with a million users as it is.
When you need to make changes (and slowly reducing performance will alert you), that's whne you need to worry. And then, as Justin says, knowing the potential bottelnecks will give you pointers as to what solution you need.
Buying more servers is one strategy. So is changing the architecture. The easiest and cost effective is throwing more servers at it. It does depend a little bit on the current application architecture, but nothing that can't be easily overcome.
What I suggest, is to load test your application. See what happens as you increase the active users. Who knows it might handle 100k active users, maybe it won't but at least you will know the tipping point.
In regards to what you should do, that really depends on your business needs. If your company has the $$ and this is a core product, then it makes sense to architect a robust application. If it's not, maybe throwing hardware at the problem is good enough.
It would also help if you could define an active user. Is it someone who is visiting your site and has a session? Is it 100k concurrent requests to the server...?
In terms of hardware scaling: Scaling Up or Scaling out
Software scaling - Profile your app

Resources