What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by storing them on only one server. We distribute the app's load across multiple front-end web servers, which means we can't simply store files locally on one web server.
Our current setup points at a share on a primary database/file server. Throughout the day we robocopy the contents of that share over to the failover server. This scenario ensures we have a secondary machine with fairly current data, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front-end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth).
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database, so this seems promising. Anyone have any experience with this? (See the read sketch after this question.)
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
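For anyone weighing the FILESTREAM option, here is a minimal read sketch, assuming a hypothetical Files table with a FileData FILESTREAM column and an Id key; FILESTREAM access has to happen inside a transaction:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    class FilestreamDemo
    {
        // Reads a FILESTREAM BLOB via its NTFS path instead of the normal TDS stream.
        static byte[] ReadFile(string connStr, Guid fileId)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    var cmd = new SqlCommand(
                        "SELECT FileData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                        "FROM Files WHERE Id = @id", conn, tx);
                    cmd.Parameters.AddWithValue("@id", fileId);

                    string path;
                    byte[] txContext;
                    using (var rdr = cmd.ExecuteReader())
                    {
                        if (!rdr.Read()) { return null; }
                        path = rdr.GetString(0);          // NTFS path of the BLOB
                        txContext = (byte[])rdr[1];       // required transaction token
                    }

                    // SqlFileStream streams the bytes straight off the file system,
                    // which is why large files avoid the usual in-row BLOB penalty.
                    using (var fs = new SqlFileStream(path, txContext, FileAccess.Read))
                    using (var ms = new MemoryStream())
                    {
                        fs.CopyTo(ms);
                        tx.Commit();
                        return ms.ToArray();
                    }
                }
            }
        }
    }

Since the database is mirrored, the FILESTREAM data rides along with your existing failover story, which is the appeal of this option.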

Consider a cloud solution like AWS S3. It's pay-for-what-you-use, scalable, and highly available.
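A minimal sketch of that approach with the AWS SDK for .NET (the bucket name and keys are placeholders): internal users' uploads go to S3, and external users get a short-lived pre-signed URL, so no single web server ever holds the file.

    using System;
    using Amazon.S3;
    using Amazon.S3.Model;

    class S3FileDelivery
    {
        // Credentials come from the environment/instance profile.
        static readonly IAmazonS3 Client = new AmazonS3Client();

        // Upload: any front-end server can accept the file and push it to S3.
        static void Upload(string localPath, string key)
        {
            Client.PutObjectAsync(new PutObjectRequest
            {
                BucketName = "my-delivery-bucket", // placeholder
                Key = key,
                FilePath = localPath
            }).Wait();
        }

        // Download: hand external users a pre-signed URL instead of proxying bytes.
        static string DownloadUrl(string key)
        {
            return Client.GetPreSignedURL(new GetPreSignedUrlRequest
            {
                BucketName = "my-delivery-bucket",
                Key = key,
                Expires = DateTime.UtcNow.AddMinutes(15)
            });
        }
    }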

You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...

When a variety of application types share information via a central database, storing file content directly in the database is generally a good idea. But it seems you have only one type in your system design: a web application. If only the web servers ever need to access the files, and no other application interfaces with the database, storing them in the file system rather than the database is still the better approach in general. Of course, it really depends on the detailed requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider failover clustering of your file server tier, whereby your files are stored on external shared storage (not an expensive SAN, which I believe is overkill for your case since you've already ruled out DFS) connected to active and passive file servers. If the active file server goes down, the passive one can take over and continue reads/writes to the shared storage. The Windows Server 2008 clustering disk driver is improved over Windows Server 2003 for this scenario (per the linked article), though it requires storage that supports SCSI-3 persistent reservation (PR) commands.

I agree with Omar Al Zabir on high availability web sites:
Do: Use a Storage Area Network (SAN).
Why: Performance, scalability, reliability and extensibility. SAN is the ultimate storage solution. SAN is a giant box running hundreds of disks inside it. It has many disk controllers, many data channels, many cache memories. You have ultimate flexibility on RAID configuration, adding as many disks as you like in a RAID, sharing disks in multiple RAID configurations and so on. SAN has faster disk controllers, more parallel processing power and more disk cache memory than regular controllers that you put inside a server. So, you get better disk throughput when you use SAN over local disks. You can increase and decrease volumes on-the-fly, while your app is running and using the volume. SAN can automatically mirror disks and, upon disk failure, it automatically brings up the mirror disks and reconfigures the RAID.
The full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (robocopy) from your post. But the files I'm saving are not unique and can be recreated automatically if they die for some reason, so absolute fault tolerance is not necessary in my case.

I suppose it depends on the download volume you expect to see. I am storing files in a SQL Server 2005 image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions, since we can manage them at the database level.
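For reference, a minimal sketch of that pattern, assuming a hypothetical StoredFiles table (an image column and varbinary(max) look the same from ADO.NET):

    using System;
    using System.Data.SqlClient;
    using System.IO;

    class DbFileStore
    {
        // Insert a file's bytes into the BLOB column.
        static void Save(string connStr, string name, string localPath)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                var cmd = new SqlCommand(
                    "INSERT INTO StoredFiles (Name, Content) VALUES (@name, @content)", conn);
                cmd.Parameters.AddWithValue("@name", name);
                cmd.Parameters.AddWithValue("@content", File.ReadAllBytes(localPath));
                cmd.ExecuteNonQuery();
            }
        }

        // Pull the bytes back out; any load-balanced server can do this.
        public static byte[] Load(string connStr, string name)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                var cmd = new SqlCommand(
                    "SELECT Content FROM StoredFiles WHERE Name = @name", conn);
                cmd.Parameters.AddWithValue("@name", name);
                return (byte[])cmd.ExecuteScalar();
            }
        }
    }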
Windows Server has a File Replication Service that I would not recommend. We used it for some time and it caused a lot of headaches.

DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become unsynchronized at times, which requires you to break the link and re-sync, which is quite painful, to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.
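A rough way to run that test, reusing the hypothetical DbFileStore helper sketched earlier against a UNC share holding the same file:

    using System;
    using System.Diagnostics;
    using System.IO;

    class StorageBenchmark
    {
        // Compare wall-clock time of repeated reads from the two back ends.
        static void Main()
        {
            const int iterations = 100;

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                // Hypothetical helper from the earlier sketch.
                byte[] fromDb = DbFileStore.Load(
                    "Server=...;Database=...;Integrated Security=true", "sample.pdf");
            }
            Console.WriteLine("DB BLOB:   {0} ms", sw.ElapsedMilliseconds);

            sw.Restart();
            for (int i = 0; i < iterations; i++)
            {
                byte[] fromShare = File.ReadAllBytes(@"\\fileserver\share\sample.pdf");
            }
            Console.WriteLine("UNC share: {0} ms", sw.ElapsedMilliseconds);
        }
    }

Run each scenario cold and warm, and with file sizes that match your real workload; the OS file cache will flatter whichever back end you test second.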

Related

Are Google Cloud Disks OK to use with SQLite?

Google Cloud disks are network disks that behave like local disks. SQLite expects a local disk so that locking and transactions work correctly.
A. Is it safe to use Google Cloud disks for SQLite?
B. Do they support the right locking mechanisms? How is this done over the network?
C. How do disk IOPS and throughput relate to SQLite performance? If I have a 1 GB SQLite file with queries that take 40 ms to complete locally, how many IOPS would that use? Which disk type should I choose (standard, balanced, SSD)?
Thanks.
Related
https://cloud.google.com/compute/docs/disks#pdspecs
Persistent disks are durable network storage devices that your instances can access like physical disks
https://www.sqlite.org/draft/useovernet.html
the SQLite library is not tested in across-a-network scenarios, nor is that reasonably possible. Hence, use of a remote database is done at the user's risk.
Yeah, the article you referenced essentially stipulates that because reads and writes are "simplified" at the OS level, they can behave unpredictably, resulting in "lost in translation" issues when going local-network-remote.
They also point out that it may well work totally fine in testing, and perhaps in production for a time, but there are known side effects which are hard to detect and mitigate, so it's a slight gamble.
Again, the implementation they describe is not Google Cloud Disk, but simply a generic remote networked arrangement.
My point is more that Google Cloud Disk may be more "virtual" than purely network-attached storage... to my mind that is where to look, and evaluate it from there.
Check out this thread for some additional insight into the issues: https://serverfault.com/questions/823532/sqlite-on-google-cloud-persistent-disk
Additionally, I was looking around and I found this thread, where one poster suggests using SQLite as a read-only asset and deploying updates in a far more controlled process.
https://news.ycombinator.com/item?id=26441125
The persistent disk acts like a normal disk in your VM and is only accessible to one VM at a time.
So it's safe to use; you won't lose any data.
For the performance part, you just have to test it for your specific workload. If you have plenty of spare RAM and your database is read-heavy and seldom writes, the whole database will be cached by the OS (Linux) disk cache, so it will be crazy fast, even on HDD storage.
But if you are low on spare RAM, the database won't be in the OS cache, and writes are always synced to disk, which causes lots of I/O operations.
In that case, use the highest-performing disk you can afford.
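If write-driven I/O turns out to be the bottleneck, SQLite's journal and sync settings are the first knobs to try. A minimal sketch with Microsoft.Data.Sqlite (this trades a little durability for far fewer synced writes, which may or may not be acceptable for your workload):

    using Microsoft.Data.Sqlite;

    class SqliteTuning
    {
        static void Main()
        {
            using (var conn = new SqliteConnection("Data Source=app.db"))
            {
                conn.Open();
                var cmd = conn.CreateCommand();
                // WAL appends to a log instead of rewriting pages through a
                // rollback journal, which usually means fewer fsyncs per transaction.
                cmd.CommandText = "PRAGMA journal_mode=WAL;";
                cmd.ExecuteNonQuery();
                // NORMAL syncs less often than the default FULL; in WAL mode a crash
                // can lose the most recent transactions but won't corrupt the file.
                cmd.CommandText = "PRAGMA synchronous=NORMAL;";
                cmd.ExecuteNonQuery();
            }
        }
    }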

Local SQLite vs Remote MongoDB

I'm designing a new web project and, after studying some options aimed at scalability, I came up with two database solutions:
Local SQLite files carefully designed in a scalable fashion (one new database file per X users, since writes depend on user content, with no cross-user data dependence; see the sketch after this question);
Remote MongoDB server (like Mongolab), as my host server doesn't serve MongoDB.
I don't trust the MySQL server at my current shared host, as it goes down very frequently (and I had problems with MySQL on another host, too). For the same reason I'm not going to use Postgres.
Pros of SQLite:
It's local, so it should be faster (I'll take care to use indexes and transactions properly);
I don't need to worry about TCP sniffing, as the Mongo wire protocol is not encrypted;
I don't need to worry about server outage, as SQLite is serverless.
Pros of MongoDB:
It's more easily scalable;
I don't need to worry on splitting databases, as scalability seems natural;
I don't need to worry about schema changes, as Mongo is schemaless and SQLite doesn't fully support ALTER TABLE (especially considering changing many production files, etc.).
I want help to make a decision (and maybe consider a third option). Which one is better as write and read operations grow?
I'm going to use Ruby.
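For what it's worth, a minimal sketch of the "one database file per X users" idea (shown in C# purely for illustration; the shard size and file naming are made up): derive the shard from the user id so every request computes the same file path.

    using Microsoft.Data.Sqlite;

    class UserShards
    {
        const int UsersPerShard = 1000; // hypothetical shard size

        // Deterministic mapping: user 12345 -> users_0012.db
        static string DbPathFor(long userId)
        {
            long shard = userId / UsersPerShard;
            return string.Format("users_{0:D4}.db", shard);
        }

        static SqliteConnection OpenFor(long userId)
        {
            var conn = new SqliteConnection("Data Source=" + DbPathFor(userId));
            conn.Open();
            return conn;
        }
    }

The catch, as the answer below explains, is that those files tie each user to whichever machine stores them.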
One major risk of the SQLite approach is that as your requirements to scale increase, you will not be able to (easily) deploy on multiple application servers. You may be able to partition your users into separate servers, but if that server were to go down, you would have some subset of users who could not access their data.
Using MongoDB (or any other centralized service) alleviates this problem, as your web servers are stateless -- they can be added or removed at any time to accommodate web load without having to worry about what data lives where.

Planning server infrastructure when hosting duplicated web-product over multiple servers

We have a web-application product that we sell to companies and host on our servers.
The product contains a couple of web applications, Windows services, and a SQL Server DB.
Right now we have only one client using our product. We have two servers: one for the web apps and services, the other for the DB.
In order to add the product for another client, we have to duplicate all the apps and the DB and run them separately.
As we start expanding, and since some companies will require more server power than others, I need to plan the server infrastructure.
Having two servers for each client sounds ridiculous; hosting costs would be huge. What will happen when I have 10 clients? And probably some servers will take more load than others, leaving some using 30% of their capacity while others use 70%.
One thing I really care about is separating the DB for each client, so that in case of a server compromise only one DB is at risk.
So... I thought about Virtual Machines...
Does that sound right?
Do I need two super servers to hold virtual machine instances? (one for web and other for db?)
What about load balancing, etc.?
Will it require more maintenance time just because I use virtual machines?
Are there any hardware recommendations?
Any help will be appreciated
Many thanks
Virtual machines are definitely the safest way to separate clients, and they give you the flexibility to allocate a specific percentage of resources to specific clients.
However, using separate processes on the same physical machine will perform better (though not always significantly) and will allow more dynamic use of resources (i.e., if one client spikes, it can take the resources it needs). This setup will not let you control resource allocation nearly as easily, though. You'll also have to build your own monitoring tools to see and analyze which processes (clients) are using which resources (piggyback on perfmon).
Using separate processes is also dangerous if your application wasn't designed for it. Anywhere the application caches data on the file system, or accesses anything besides memory and the database, needs to be thoroughly scrubbed to make sure data from different clients is not co-mingled or shared.
Separate virtual machines are more work to manage: each one is pretty much its own computer. So you have to manage all the VMs plus the physical machine.
You may also want to consider hosting in a more dynamic environment like Amazon AWS or Microsoft Azure, which will let you scale up and down more easily than a VM at a traditional host.

Any real experiences of sharing a sqlite db on a network drive?

I have a client who is interested in hiring my company to build a small, custom, multi-user contact database/CRM. They don't care about the technology I use, as long as the product is hosted inside their organization (no "cloud" hosting). The problem is that their IT department refuses to host any application developed by an outside company on their servers; additionally, they will not allow any server not serviced by them inside their network.
The only means of sharing data that IT would allow is a windows network share...
I was thinking of building the application as a fat client in Adobe AIR, with all users accessing a shared SQLite database, but then I read a lot of negative things about this approach.
So I'm asking you: are there people out there who have actually tried this?
What are your experiences?
You can use an MS-Access 2007+ (accdb) file.
Of course there are many database engines with far more features and much better SQL syntax, but if you are looking for a file-based database system that can be accessed simultaneously by multiple processes on a shared Windows drive, then an accdb file is as good as you're going to get, I think.
Also note that another popular embedded database, SQL Server Compact Edition, cannot be used on shared drives (at least not by multiple processes from different machines).
References:
Share Access Database on a Network Drive:
http://office.microsoft.com/en-us/access-help/ways-to-share-an-access-database-HA010279159.aspx#BM3
SQL Server CE Cannot be used on a shared drive:
SQLCE 4 - EF4.1 Internal error: Cannot open the shared memory region
The way SQLite locks databases means you have to be careful if there's a chance multiple sources will try to access the database at the same time. You either have to develop a waiting method, a timeout, or something similar.
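A minimal sketch of that timeout idea with Microsoft.Data.Sqlite: set SQLite's busy timeout so a locked database is retried for a few seconds, and treat a final SQLITE_BUSY as a cue to back off. Note this only softens contention; it does nothing about the unreliable file locking over SMB shares that the SQLite docs warn about.

    using System;
    using System.Threading;
    using Microsoft.Data.Sqlite;

    class SharedDbWriter
    {
        static void Write(string dbPath, string sql)
        {
            using (var conn = new SqliteConnection("Data Source=" + dbPath))
            {
                conn.Open();
                var pragma = conn.CreateCommand();
                // Wait up to 5 s for another process's lock to clear before erroring.
                pragma.CommandText = "PRAGMA busy_timeout = 5000;";
                pragma.ExecuteNonQuery();

                for (int attempt = 0; ; attempt++)
                {
                    try
                    {
                        var cmd = conn.CreateCommand();
                        cmd.CommandText = sql;
                        cmd.ExecuteNonQuery();
                        return;
                    }
                    catch (SqliteException e) when (e.SqliteErrorCode == 5 && attempt < 3)
                    {
                        // SQLITE_BUSY even after the timeout: back off and retry.
                        Thread.Sleep(200 * (attempt + 1));
                    }
                }
            }
        }
    }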

Assets Management in a clustered environment

I have a content management system running on a web server that, among other things, allows the user to upload assets like images and files to the server.
The problem I have is that there will be two servers running behind a load balancer, and I am trying to find an efficient way to handle asset management.
The questions I have are:
Will the assets be uploaded to one server every time? Or is there a chance that the images/files will end up on server1 or server2 depending on the load?
How do I serve the images if I don't know which server they ended up on? Will I have to keep the asset directories (images/files) synchronized between the two servers?
Thanks,
Synchronization is a tough problem to crack. You can do ad-hoc synchronization using CouchDB, but that requires good knowledge of the low-level issues. Otherwise you need to choose a write master.
DRBD
You could look at DRBD :D Use one server as the write master and the other as the slave. Then you can serve content from both. This approach works great for database pairs.
Note: separating your code and URLs for write-master and serve-only roles will be annoying.
CouchDB
You could use CouchDB, but I think that might be overkill. It's for LARGE amounts of data and high levels of fault tolerance.
NFS
You could export the asset directory on the write master as an NFS drive and import it on the other computer. But in this case it wouldn't be load-balanced in all cases, i.e., only when the files are cached by the slave. You could use a third computer as an NFS server, which would let you scale to more web servers.
A central NFS server might just be your best solution, since you can then do without a write master entirely: every front-end server can perform writes. This is the approach I would use unless I were thinking of going past the petabyte range :P
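To make that concrete for the question above: with a central NFS/UNC share, both front ends write and read through the same path, so there is nothing to synchronize. A minimal sketch (the share path is a placeholder):

    using System.IO;

    class SharedAssetStore
    {
        // Both load-balanced servers point at the same export/share.
        const string AssetRoot = @"\\fileserver\assets"; // placeholder UNC path

        // Whichever server receives the upload, it lands in the shared store.
        static string Save(Stream upload, string fileName)
        {
            string path = Path.Combine(AssetRoot, fileName);
            using (var dest = File.Create(path))
            {
                upload.CopyTo(dest);
            }
            return path; // either server can now serve this file
        }

        static Stream Open(string fileName)
        {
            return File.OpenRead(Path.Combine(AssetRoot, fileName));
        }
    }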
