Assets Management in a clustered environment - networking

I have a content management system running on a web server that, among other things, allows users to upload assets such as images and files to the server.
The problem I have is that there will be two servers running behind a load balancer, and I am trying to find an efficient way to handle asset management.
My questions are:
Will the assets be uploaded to the same server every time? Or is there a chance that the images/files will end up on server1 or server2 depending on the load?
How do I serve the images if I don't know which server they end up on? Will I have to keep the directories of these assets (images/files) synchronized between the two servers?
Thanks,

Synchronization is a tough problem to crack. You can do ad-hoc synchronization using CouchDB, but that requires a good knowledge of the low-level issues. In practice, therefore, you need to choose a write master.
DRBD
You could look at DRBD :D Use one server as the write master and the other as the slave. Then you can serve content from both. This approach works very well for database pairs.
Note: separating your code and URLs into write-master and serve-only paths will be annoying.
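If you do go the write-master route, the routing rule itself is small. A minimal sketch in Python, assuming hypothetical host names `server1` (the write master) and `server2`, and assuming that only the HTTP methods listed below modify assets; reads are spread over both servers in turn:

```python
import itertools
from typing import Iterator

# Hypothetical host names: substitute your own servers.
WRITE_MASTER = "server1"
ALL_SERVERS = ["server1", "server2"]

# HTTP methods that change assets and therefore must reach the write master.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

# Reads are handed to every server in turn (simple round robin).
_read_cycle: Iterator[str] = itertools.cycle(ALL_SERVERS)


def pick_server(method: str) -> str:
    """Return the server that should handle a request with this HTTP method."""
    if method.upper() in WRITE_METHODS:
        return WRITE_MASTER
    return next(_read_cycle)


if __name__ == "__main__":
    print(pick_server("POST"))  # server1 (uploads always go to the master)
    print(pick_server("GET"))   # server1
    print(pick_server("GET"))   # server2
```

This only works if the slave really does receive every asset written on the master; the routing rule does nothing to keep them in sync.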
CouchDB
You could use CouchDB, but I think that might be overkill. It is designed for large amounts of data and high levels of fault tolerance.
NFS
You could export the asset directory on the write master as an NFS share and mount it on the other computer. In this case, however, the load would not always be balanced: the second server only takes load off the first for files it has cached. You could instead use a third computer as an NFS server, which would allow you to scale to more web servers.
A central NFS server might just be your best solution, as you can do without a write master: every front-end server can perform writes. This is the approach I would use unless I were thinking of going past the petabyte range :P
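With a shared mount, the main thing to get right on the front ends is never serving a half-written upload. A common pattern is to write to a temporary file in the same directory and then rename it into place. A minimal Python sketch, assuming the share is mounted at the hypothetical path `/mnt/assets` on every server and that uploads are saved under plain file names. On POSIX file systems a rename within one directory replaces the target in a single step; NFS normally preserves that, but it is worth testing on your own setup.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical mount point of the shared NFS export on every front-end server.
ASSET_ROOT = Path("/mnt/assets")


def save_asset(name: str, data: bytes, root: Path = ASSET_ROOT) -> Path:
    """Write an uploaded asset so other servers never see a half-written file."""
    # Only plain file names are accepted, so uploads cannot escape the root.
    if not name or name in {".", ".."} or Path(name).name != name:
        raise ValueError(f"not a plain file name: {name!r}")
    root.mkdir(parents=True, exist_ok=True)
    target = root / name
    # Write to a temporary file in the same directory first...
    fd, tmp_name = tempfile.mkstemp(dir=root, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # ...then rename it over the final name in one step.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```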


Planning server infrastructure when hosting duplicated web-product over multiple servers

We have a web-application product that we sell to companies that is hosted at our servers.
The product contains a couple of web applications, Windows services and a SQL Server database.
Right now we have only one client that uses our product. We have two servers: one for the web apps and services and the other for the DB.
In order to add the product for another client, we have to 'duplicate' all the apps and the DB and run them separately.
As we expand, some companies will require more server power than others, so I need to plan the server infrastructure.
Having two servers for each client sounds ridiculous. Hosting costs will be huge. What will happen when I have 10 clients? And some servers will probably need more power than others, leaving some servers using 30% of their capacity while others use 70%.
One thing I really care about is separating each product's DB so that, in case of a server compromise, only one DB will be at risk.
So... I thought about virtual machines...
Does that sound right?
Do I need two super servers to hold the virtual machine instances (one for web and the other for DB)?
What about load balancing, etc.?
Will it require more maintenance time only because I use virtual machines?
Are there any hardware recommendations?
Any help will be appreciated
Many thanks
Virtual machines are definitely the safest way to separate clients and will give you the flexibility to allocate a specific percentage of resources to specific clients.
However, using separate processes on the same physical machine will perform better (but not always significantly) and will allow more dynamic use of resources (i.e., if one spikes, it will use the resources it needs). This setup will not allow you to control the resource allocation nearly as easily though. You'll also have to build your own monitoring tools to see and analyze what processes (clients) are using what resources (piggyback on perfmon).
Using separate processes is also dangerous if your application wasn't designed for it. Anywhere the application caches data on the file system, or accesses anything besides memory and the database, needs to be thoroughly scrubbed to make sure data from different clients is not commingled or shared.
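One concrete form of that scrubbing is making sure every file access is confined to the current client's directory. A minimal Python sketch (3.9+ for `Path.is_relative_to`), assuming a hypothetical layout where each client has its own folder under `/srv/app-data`, named by an alphanumeric client ID:

```python
from pathlib import Path

# Hypothetical layout: each client gets its own directory under this root.
TENANT_ROOT = Path("/srv/app-data")


def tenant_path(tenant_id: str, relative: str, root: Path = TENANT_ROOT) -> Path:
    """Resolve a file path for one client, refusing anything outside its folder."""
    if not tenant_id.isalnum():
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    base = (root / tenant_id).resolve()
    candidate = (base / relative).resolve()
    # Reject "../other-client/..." and absolute paths that escape the base.
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{relative!r} escapes {tenant_id}'s directory")
    return candidate
```

Routing every cache or upload path through one function like this gives you a single place to audit, instead of scattered string concatenation.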
Separate virtual machines are more work to manage: each one is pretty much its own computer, so you have to manage all the VMs plus the physical machine.
You may also want to consider hosting in a more dynamic environment such as Amazon AWS or Microsoft Azure, which lets you scale up and down more easily than a VM at a traditional host.

How do I set up the Boost module and rsync in order to make mirroring a breeze?

I'm looking for a way to set up a server so that the static caches created by the Boost module can easily be mirrored to several other servers.
You COULD use rsync to do this, but it is brittle and liable to break. You would be better off using either:
a single shared network filesystem
or, my recommended solution, a clustered distributed filesystem such as GlusterFS. This is what is generally used on web server clusters to distribute web apps across nodes automagically.
Here are some ideas...
If you want to avoid getting stabbed in the back by your hosting provider, wouldn't it be better to use a solution that doesn't rely on the hosting provider?
My choice would be to use a third-party DNS provider that supports round robin [ http://en.wikipedia.org/wiki/Round_robin_DNS ] (or your own DNS server configured for round robin), which you can also use for automatic load balancing.
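With round robin you publish several A records for your domain, and successive lookups get them in rotating order, so visitors are spread across your servers. Plain round robin does not check whether a server is up; to stop sending visitors to a dead server, pick a provider that also monitors your servers and drops those that are down from its answers.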
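To make the distinction concrete, here is a toy Python model (not a real DNS server; the class name and the documentation-range addresses 192.0.2.x are illustrative assumptions) of one name with three A records. `resolve` does what plain round robin does: it hands back every record, rotating the order on each lookup. `resolve_healthy` adds the health check that some providers offer on top; plain round robin on its own does not skip servers that are down.

```python
from collections import deque
from typing import Callable, List


class RoundRobinZone:
    """Toy model of one DNS name with several A records."""

    def __init__(self, addresses: List[str]) -> None:
        self._records = deque(addresses)

    def resolve(self) -> List[str]:
        """Plain round robin: return every record, rotating the order each time."""
        answer = list(self._records)
        self._records.rotate(-1)
        return answer

    def resolve_healthy(self, is_up: Callable[[str], bool]) -> List[str]:
        """What a health-checking provider adds: drop servers that fail the check."""
        return [addr for addr in self.resolve() if is_up(addr)]
```

Clients usually connect to the first address in the answer, which is why rotating the order spreads the load.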
For the static caches I think you could use rsync, but that involves your hosting provider. A better way (though probably not resource-efficient) might be to keep a clone of your Drupal installation on each server, sync the databases using MySQL replication (mirroring), and use cron to build the Boost static cache on each one. Then you would not depend on any single server, because each of them would have the whole site, and round robin would point your domain at the working servers.

ASP.NET load balancing

I am new to ASP.NET and I am trying to understand what it means to do "load balancing" for a web site. The website will be used by multiple users and will use several resources (database, web service, ...).
If anyone could help me understand the concept of load balancing for an ASP.NET web site, I would really appreciate it.
Thanks.
One load-balancing-related issue you may want to be aware of at development time: where you store your session state. This MSDN article gives a good overview of your options.
If you implement your ASP.NET system using "out-of-process" or "sql-server-mode" session state management, that will give you some additional flexibility later, if you decide to introduce a load balancer to your deployed system:
Your load balancer needn't handle session affinity. As one poster mentioned above, all modern load balancers handle it anyway, so this is a minor consideration in any case.
Web gardens (a sort of load balancer implemented by IIS on a single server) REQUIRE the use of "out-of-process" or "sql-server-mode" session state management. So if your system is already configured that way, you'll be one step closer to being able to use web gardens.
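To see why the session state mode matters once there is more than one server (or worker process), here is a small Python model. It is not ASP.NET code: a plain dict stands in for wherever sessions live, so a per-server dict plays the role of in-process state and a single shared dict plays the role of the state server or SQL Server. Class and method names are hypothetical.

```python
from typing import Dict, MutableMapping, Optional

SessionStore = MutableMapping[str, Dict[str, str]]


class WebServer:
    """One front-end server; `sessions` is wherever it keeps session state."""

    def __init__(self, sessions: SessionStore) -> None:
        self.sessions = sessions

    def set_value(self, session_id: str, key: str, value: str) -> None:
        self.sessions.setdefault(session_id, {})[key] = value

    def get_value(self, session_id: str, key: str) -> Optional[str]:
        return self.sessions.get(session_id, {}).get(key)


# In-process ("InProc"-style): each server has its own dict.
web1, web2 = WebServer({}), WebServer({})
web1.set_value("abc123", "cart", "3 items")
print(web2.get_value("abc123", "cart"))  # None: web2 never saw the session

# Out-of-process: both servers talk to the same store.
shared: SessionStore = {}
web1, web2 = WebServer(shared), WebServer(shared)
web1.set_value("abc123", "cart", "3 items")
print(web2.get_value("abc123", "cart"))  # 3 items
```

With in-process state, a request that the balancer sends to the "other" server simply finds no session; that is the problem sticky sessions or an out-of-process store solve.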
What is it?
Load balancing simply refers to distributing a workload between two or more computers. As a concept, it's not unique to ASP.NET. Although having separate machines for your database and web server could be called "load balancing", the term more commonly refers to using multiple machines to serve a single role, such as having multiple web servers.
Should you worry about it? Probably not. Do you already have a performance problem? Are your database and web server on their own machines? If you do find that your server resources are strained, it would probably be easier to scale up (a more powerful single machine) than out (load balancing). These days, a dedicated box can handle a LOT of traffic if your code is decent.
Load balancing is not specific to ASP.NET; it is a technique for distributing server load across two or more machines rather than running everything on one machine. Unless you will have many thousands (millions?) of users, you probably do not need to worry about it.
Check the Wikipedia article for more information.
Load balancing is not specific to any one technology stack, be it ASP.NET, JSP, etc. To load balance is to spread the incoming requests to a web site over more than one server. This is typically done with a software or hardware load balancer, which sits in front of two or more web servers and delegates the incoming traffic. The technique is not limited to web servers, though.
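How the balancer picks a server is configurable on most products; round robin and "least connections" are two common strategies. A minimal Python sketch of least connections, with hypothetical server names; a real balancer would also track health and time out stuck requests:

```python
from typing import Dict, List


class LeastConnectionsBalancer:
    """Send each new request to the back-end server with the fewest open requests."""

    def __init__(self, servers: List[str]) -> None:
        self.active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        # min() returns the first server in list order when counts are tied.
        server = min(self.active, key=lambda s: self.active[s])
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when the request handled by `server` has finished."""
        self.active[server] -= 1
```

Compared with plain round robin, this adapts when some requests (say, large reports) take much longer than others.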
Enjoy!
I've never used it, but one option is IIS Application Request Routing.
IIS Application Request Routing (ARR) 2.0 enables Web server administrators, hosting providers, and Content Delivery Networks (CDNs) to increase Web application scalability and reliability through rule-based routing, client and host name affinity, load balancing of HTTP server requests, and distributed disk caching.
In a typical web server/database scenario, the database is almost always the first to load up its machine, because storing and querying data requires more resources. Before you even start looking at load balancing your web server, you need to think about how to load balance the database.
Spreading one database across multiple servers is a lot harder than load balancing a web server. One of the techniques that can be used is sharding (or horizontal partitioning). This is where some records are stored on one server and other records on another server. For example, records with IDs 1-900000 are on server 1 and records from 900001 upward are on server 2.
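The range-based split in that example boils down to a lookup against sorted boundaries. A minimal Python sketch, assuming hypothetical shard names and using `bisect` from the standard library; adding a third range is just another entry in each list:

```python
import bisect

# Inclusive upper bound of the ID range held by every shard except the last,
# in ascending order. Mirrors the example: IDs up to 900000 on server 1.
SHARD_UPPER_BOUNDS = [900_000]
SHARD_NAMES = ["server1", "server2"]  # hypothetical names, one more than bounds


def shard_for(record_id: int) -> str:
    """Return the shard that stores the record with this ID."""
    # bisect_left keeps an ID equal to a bound on the lower shard.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, record_id)
    return SHARD_NAMES[index]


if __name__ == "__main__":
    print(shard_for(1))        # server1
    print(shard_for(900_000))  # server1
    print(shard_for(900_001))  # server2
```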
In comparison to DB load balancing, spreading the load across multiple ASP.NET servers is not overly complicated. Most of the session issues can be easily mitigated by using out-of-process session state and/or never talking to Application.Cache directly. Data load balancing, on the other hand, is hard and requires a lot of planning and trial and error. In most cases, talking to a load-balanced DB requires using an ORM that supports it (e.g. NHibernate) or your own data access layer. The reason is that you need to separate establishing a connection from the code that uses the database, so that the decision about which DB to talk to is made in one place.
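A minimal sketch of that "decide in one place" idea, in Python rather than .NET: two in-memory SQLite databases stand in for two database servers, and only `_connection_for` knows how IDs map to databases. The class name, the table, and the 900000 split point (taken from the sharding example above) are illustrative assumptions, not NHibernate's or any other ORM's API.

```python
import sqlite3
from typing import Dict, Optional


class ShardedCustomerStore:
    """Data access layer: the only place that decides which database to use."""

    SPLIT_AT = 900_000  # IDs up to this value live on "db1" (hypothetical)

    def __init__(self) -> None:
        # Two in-memory SQLite databases stand in for two database servers.
        self._shards: Dict[str, sqlite3.Connection] = {
            "db1": sqlite3.connect(":memory:"),
            "db2": sqlite3.connect(":memory:"),
        }
        for conn in self._shards.values():
            conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

    def _connection_for(self, customer_id: int) -> sqlite3.Connection:
        # The routing rule lives here and nowhere else.
        key = "db1" if customer_id <= self.SPLIT_AT else "db2"
        return self._shards[key]

    def add(self, customer_id: int, name: str) -> None:
        conn = self._connection_for(customer_id)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name)
            )

    def get(self, customer_id: int) -> Optional[str]:
        row = self._connection_for(customer_id).execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row is not None else None

    def count_on(self, shard: str) -> int:
        """Number of customers stored on one shard (for inspection only)."""
        return self._shards[shard].execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Code that calls `add` and `get` never sees a connection, so changing the split later touches one method.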
One solution is to save the session into SQL Server with a stored procedure; to read the session, call the 'SessionCheck' stored procedure.
I'd add that it really isn't something to worry about. By the time you need a load balancer, you can probably afford one of the neato newfangled ones with sticky sessions so you don't even have to deal with the session boogeyman.
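Sticky sessions just mean the balancer keeps sending a given session to the same server. Many balancers do this with a cookie or the client IP; another approach is to hash the session ID. A minimal Python sketch of the hashing approach, assuming hypothetical back-end names. It uses SHA-256 rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and two balancer processes would disagree:

```python
import hashlib
from typing import List

SERVERS = ["web1", "web2", "web3"]  # hypothetical back-end names


def sticky_server(session_id: str, servers: List[str] = SERVERS) -> str:
    """Map a session ID to the same server every time, in every process."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

One trade-off: with plain modulo hashing, adding or removing a server moves most sessions to a different server; consistent hashing reduces that.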

Using EC2 Load Balancing with an Existing WordPress Blog

I currently have a virtual dedicated server through Media Temple that I use to run several high-traffic WordPress blogs. They tend to receive sudden StumbleUpon traffic surges that (I'm assuming) cause the server CPU to run at 100% and slow everything down. I'm currently using WP-Super-Cache, S3, and CloudFront for most static files, but high traffic is still overloading the CPU.
From what I'm reading, it seems like I might want to use EC2 to help the existing server when traffic spikes occur. Since I'm currently using the top tier of virtual dedicated servers on Media Temple, I'd like to avoid jumping to a dedicated server if possible. I get the sense that AWS might help boost the existing server's power. How would I go about doing this?
I apologize if I'm using any of these terms incorrectly -- I'm relatively amateur when it comes to server administration. If this isn't the best way to improve performance, what is the recommended course of action?
The first thing I would do is move your database server to another Media Temple VPS. After that, look to see which one is hitting 100% CPU. If it's the web server, you can create a second instance, and use a proxy to balance the load. If it's the database, you may be able to create some indexes.
Alternatively, setting up a Squid caching server in front of your web server can take off a lot of load from anonymous users. This is the approach Wikipedia takes, as the page doesn't need to be re-rendered for each user.
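The core idea, serving one stored copy of a page to everyone who is not logged in, can be modelled in a few lines. This is a Python illustration of the caching rule, not Squid configuration; the class name and the 60-second lifetime are assumptions, and it relies on WordPress naming its login cookie with the `wordpress_logged_in` prefix:

```python
import time
from typing import Callable, Dict, Tuple


class AnonymousPageCache:
    """Serve cached HTML to visitors without a login cookie; render for everyone else."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # path -> (stored_at, html)

    def get(self, path: str, cookies: Dict[str, str]) -> str:
        # Logged-in users may see personalised pages, so never cache for them.
        if any(name.startswith("wordpress_logged_in") for name in cookies):
            return self._render(path)
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no PHP/MySQL work at all
        html = self._render(path)
        self._entries[path] = (now, html)
        return html
```

During a traffic spike almost every visitor is anonymous, so nearly all requests become cache hits and the expensive render runs about once per page per minute.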
In either case, there isn't an easy way to spin up extra capacity on EC2 unless your site is on EC2 to begin with.
There are just three types of instance you can have; beyond that they can't give you any more "server power". You will need to do some load balancing. There are software load balancers, such as HAProxy and Nginx, which are not bad; if you don't want to deal with that, you can use DNS round robin after setting up the high-load blogs on different machines.
You should be able to scale them; scaling is the beauty of AWS.

What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day we robocopy the contents of the share on the primary server over to the failover. This scenario ensures we have a secondary machine with fairly current data on it, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front-end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay for what you use, scalable and has high availability.
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When there are a variety of different application types sharing information via the medium of a central database, storing file content directly into the database would generally be a good idea. But it seems you only have one type in your system design - a web application. If it is just the web servers that ever need to access the files, and no other application interfacing with the database, storage in the file system rather than the database is still a better approach in general. Of course it really depends on the intricate requirements of your system.
If you do not see DFS as a viable approach, you may wish to consider failover clustering of your file server tier, whereby your files are stored on external shared storage (not an expensive SAN, which I believe is overkill for your case since DFS is already out of your reach) connected to both the active and the passive file server. If the active file server goes down, the passive one can take over and continue reading from and writing to the shared storage. The Windows Server 2008 clustering disk driver has been improved over Windows Server 2003 for this scenario (as per article), and it requires a storage solution that supports SCSI-3 Persistent Reservation (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability, reliability and extensibility. SAN is the ultimate storage solution. SAN is a giant box running hundreds of disks inside it. It has many disk controllers, many data channels, many cache memories. You have ultimate flexibility on RAID configuration, adding as many disks you like in a RAID, sharing disks in multiple RAID configurations and so on. SAN has faster disk controllers, more parallel processing power and more disk cache memory than regular controllers that you put inside a server. So, you get better disk throughput when you use SAN over local disks. You can increase and decrease volumes on-the-fly, while your app is running and using the volume. SAN can automatically mirror disks and upon disk failure, it automatically brings up the mirrors disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (Robocopy) from your post. But the files I'm saving are not unique and can be recreated automatically if they are lost for some reason, so absolute fault tolerance is not necessary in my case.
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions as we can manage that on the database.
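For illustration, storing and reading a file as a BLOB looks like this. The sketch uses Python's built-in `sqlite3` as a stand-in for SQL Server (in SQL Server, `varbinary(max)` is the recommended replacement for the deprecated `image` type); the table and function names are hypothetical:

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"
        " content BLOB NOT NULL)"
    )


def store_file(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    # `with conn` commits on success and rolls back if the insert fails.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO files (name, content) VALUES (?, ?)",
            (name, data),
        )


def load_file(conn: sqlite3.Connection, name: str) -> Optional[bytes]:
    row = conn.execute("SELECT content FROM files WHERE name = ?", (name,)).fetchone()
    return row[0] if row is not None else None
```

Because the file bytes live in the same database as everything else, one database backup (for SQLite, `Connection.backup`) captures them together, which is the disaster-recovery benefit mentioned above.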
Windows Server has a File Replication Service that I would not recommend. We have used it for some time and it has caused a lot of headaches.
DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become unsynchronized at times, which requires you to break the link and re-sync; to be honest, that is quite painful.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.
