How do I design the file storage issue? - asp.net

I am working on an application that creates video files and stores them in a folder on the C:\ drive. I expect the number of these files to grow, and at some point we will run out of disk space on our VPS. When the time comes to upgrade, we plan either to move the files to one of the cloud providers or to have our existing provider add another disk (say, a D:\ drive).
Either way, I want to design the app now so that moving the files to a different location later would not be an issue and would be transparent to the end user.
The code that creates these files supports two output modes:
myObj.SetOutputToDisk(<path to store>); or
myObj.SetOutputToMemoryStream(ms);
If we go with a cloud architecture, I assume we might end up with one of the following combinations:
Cloud Files + Existing VPS or
Cloud Files + Cloud Windows Server
Given the unknowns at this time, how would I go about designing this?

Serve the files up from a subdomain. Say: media.yourdomain.com.
That way, you can trivially repoint DNS records to the new storage provider at some point in the future.
Also, I'd recommend storing the media files on a separate physical disk from the OS disk. So have a D:\ drive and store the media there.
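To keep the eventual repoint transparent in the application code as well, you can build media URLs from a single configured base host instead of hard-coding the domain. A minimal sketch, assuming a MediaBaseUrl app setting (the setting name is my own):
using System;
using System.Configuration;

public static class MediaUrl
{
    // web.config: <add key="MediaBaseUrl" value="https://media.yourdomain.com/" />
    // (assumed setting name; change the value once when the storage moves)
    private static readonly Uri BaseUri =
        new Uri(ConfigurationManager.AppSettings["MediaBaseUrl"]);

    // Turns a stored relative path like "videos/2015/clip.mp4" into an absolute URL.
    public static string For(string relativePath)
    {
        return new Uri(BaseUri, relativePath).ToString();
    }
}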

You might want to look at the Managed Extensibility Framework as a way of adding extensions to your app for new storage methods without the need to rebuild the whole thing.
You also need some way to record the storage location and method used for each file; I'd expect some kind of database store that you could migrate to the cloud later if required.
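To illustrate, you could hide the two output modes the question mentions behind a small storage interface and record which implementation wrote each file. This is only a rough sketch; the IMediaStorage name and LocalDiskStorage class are my own invention, not from any particular library:
using System.IO;

// Abstraction the rest of the app codes against; swap implementations
// (local disk, second drive, cloud blob) without touching the callers.
public interface IMediaStorage
{
    // Returns a storage key (here a path) that you persist in the database
    // together with the storage method used.
    string Save(string fileName, Stream content);
}

public class LocalDiskStorage : IMediaStorage
{
    private readonly string _rootFolder;

    public LocalDiskStorage(string rootFolder)
    {
        _rootFolder = rootFolder; // e.g. @"C:\Media" today, @"D:\Media" later
    }

    public string Save(string fileName, Stream content)
    {
        string fullPath = Path.Combine(_rootFolder, fileName);
        using (FileStream target = File.Create(fullPath))
        {
            content.CopyTo(target);
        }
        return fullPath;
    }
}
The video library would render into a MemoryStream (via SetOutputToMemoryStream) and hand that stream to Save; a cloud-backed implementation of the same interface would return, say, a blob URI instead of a local path, and that key is what the database records.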
Your question is very vague; you haven't put much work in yourself, and as such you are unlikely to get the level of detail you are hoping for in the answers. At least try to implement the system and then ask specific questions around issues that you are having problems with.

Related

How to sync data of multiple Azure Web Site instances

I have an Azure website running on several instances in Basic compute mode. I want to synchronize local data (just a few small numbers like number of online users and total app users, not whole tables or files) between the instances. How can I achieve this? I've seen Refreshing Data on all Azure Instances but it didn't help much. My current naive approach is writing these values to my database, keeping local changes in my application code, and syncing the actual data from the database every few minutes, which is OK for my application. But I'm looking for a more elegant way of achieving sync between instances. Is this possible? If yes, how?
There is no API that supports syncing state between instances. However, here are a few ideas you might consider.
Local File System
The web site instances share a common file system. So, you could just write data to a .txt file in your App_Data folder and all the instances will use that same file. This would at least eliminate the dependency on a database for such simple data.
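A minimal sketch of that approach, assuming a counters.txt file under App_Data (the file name and the single-value format are my own):
using System.IO;
using System.Web.Hosting;

public static class SharedCounters
{
    // All instances of the site share the same App_Data folder,
    // so a write from one instance is visible to the others.
    private static readonly string FilePath =
        HostingEnvironment.MapPath("~/App_Data/counters.txt");

    public static void WriteOnlineUsers(int count)
    {
        // Concurrent writers are possible; for a few small numbers a
        // simple retry on IOException would be enough.
        File.WriteAllText(FilePath, count.ToString());
    }

    public static int ReadOnlineUsers()
    {
        return File.Exists(FilePath) ? int.Parse(File.ReadAllText(FilePath)) : 0;
    }
}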
Caching
Another option is to write the data to a cache. Your options for this would be to use the Managed Cache service or the new Redis Cache (in preview).
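For the Redis option, here is a rough sketch using the StackExchange.Redis client; the connection string, key name, and class shape are assumptions for illustration only:
using StackExchange.Redis;

public class InstanceStats
{
    // Share one multiplexer for the lifetime of the application.
    private static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect(
            "your-cache.redis.cache.windows.net:6380,ssl=true,password=<access key>");

    public void PublishOnlineUsers(int count)
    {
        IDatabase db = Redis.GetDatabase();
        db.StringSet("stats:onlineUsers", count);
    }

    public int ReadOnlineUsers()
    {
        RedisValue value = Redis.GetDatabase().StringGet("stats:onlineUsers");
        return value.HasValue ? (int)value : 0;
    }
}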
Storage Tables
This may not be any better than your current database-backed solution, but it is cheap and very efficient.
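If you do try table storage, something along these lines should work, assuming the classic WindowsAzure.Storage SDK (the table name, partition key, and property name are my own choices):
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class StatsTable
{
    private readonly CloudTable _table;

    public StatsTable(string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        _table = account.CreateCloudTableClient().GetTableReference("InstanceStats");
        _table.CreateIfNotExists();
    }

    // Each instance upserts its own row; readers scan the "stats" partition.
    public void Publish(string instanceId, int onlineUsers)
    {
        var entity = new DynamicTableEntity("stats", instanceId);
        entity.Properties["OnlineUsers"] = new EntityProperty(onlineUsers);
        _table.Execute(TableOperation.InsertOrReplace(entity));
    }
}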

Multiple deployers single Content Delivery database (Broker DB)

In my publishing scenario, we have multiple deployers pushing content to both the file system and the database (Broker). Pages and binaries are put on the file system, everything else in the Broker. We have one of the deployers putting the content into the database. Is this the recommended best practice?
If the storage configurations in all deployers also put the content into the database, how does Tridion handle this? Could this cause duplicate entries, locking failures, etc.?
I'm afraid at the time of writing I don't have access to an environment to test how this would work.
SDL best practice is to have a one-to-one relationship between a deployer and a publication; that means as long as two deployers do not publish the same content (from the same publication), they will not collide, provided that, on the file system, there is separation between the deployed sites, e.g. www/pub1 and www/pub2.
Your explanation of your scenario needs some additional information to make it complete, but it sounds most likely that there are multiple Broker databases (albeit hosted on a single database server). This is the most common setup when dealing with multiple file systems on web servers combined with a single database server.
I personally do not like this setup, as I think it would be better to host the file system content in a shared location and share a single DB. Or better still, deploy everything to the database and use something like DD4T/CWA.
I have seen (and even recommended based on customer limitations) similar configurations where you have multiple deployers configured as destinations of a given target.
Only one of the deployers can write to the database for the same transaction; otherwise you'll have concurrency issues. So one deployer writes to the database, while all the others write to the file system.
All brokers/web applications are configured to read from the database.
This solves the issue of deploying to multiple servers and/or data centers where using a shared file system (the preferred approach) is not feasible, be it for cost or any other reason.
In short - not a best practice, but it is known to work.
Julian's and Nuno's approaches cover most of the common scenarios. Indeed a single database is a single point of failure, but in many installations, you are expected to run multiple schemas on the same database server, so you still have a single point of failure even if you have multiple "Broker DBs".
Another alternative to consider is totally independent delivery nodes. This might even mean running a database server on your presentation box. These days it's all virtual anyway, so you could run separate small database servers. (Licensing costs would be an important constraint.)
Each delivery server has its own database and file system. Depending on how many you want, you might not want to set up multiple destinations/deployers, so you deploy to one and use file system replication and database log shipping to mirror the content to the rest.
Of course, you could configure two deployment systems (or three) for redundancy, assuming you can manage all the clustering etc.
OK - to come clean - I've never built one like this, but I'm fairly sure elements of this kind of design will become more common as virtualisation increases and licensing models evolve to support it. (Maybe we have to wait for Tridion to support an open source database!)

How to scale a document storage system?

I maintain a web application (ASP.NET/IIS7/SQL2K8/Win2K8) that needs to access documents, actually hundreds of thousands of documents and growing. Currently, they are all on a Windows 2K8 Server fileshare, accessed by UNC path (SMB). The files are in a single flat directory and I'm trying to plan how best to improve this solution. I don't want to use the SQL Filestream attribute, as it would be a significant effort to migrate everything into it and would really lock us in to SQL Server. I also need to find a way to replicate the data for disaster recovery, so perhaps a solution can help with that too.
Options could be:
- Segment files into multiple directories?
  - Application would add metadata for which directory it's on (or segment by other means)
- Segment files into separate servers? (virtualize)
  - Backup becomes more complicated.
  - Application would add metadata for which server it's on
- NAS Storage
- SAN Storage
- Put a service (WCF) in front of the files and have the app talk to the service
  - bonus of being reusable across many applications
Assuming I'm going to store on the file system and not in the database (I've read those discussions here), which would be the more scalable solution?
You've got a couple of issues:
- managing a large volume of (static?) files
- preparing for backups and disaster recovery of said files
I'll throw this out there even though I'm not a fan of the answer: you might poke around with the free SharePoint 2010 Foundation that's included with Server 2K8. If you're having issues finding the documents you need (whether by search, taxonomy via tagging, or other metadata), or with document expiration, and you don't want to buy a full-blown document management system, this might be a solution. Of course it introduces new problems...
If your only desire is to have these files available to spit out on the web, then the file store you're using now really is the simplest solution. For DR/redundancy purposes, I'd look at a) running them on a RAID/SAN of some sort and b) auto-syncing them with the cloud (either Azure or Amazon). For b) you can get apps that make the cloud appear as a mapped drive and then use rsync-type software to keep the cloud copy up to date.
If you want to build something new and cool, you might think about moving the entire file archive into the cloud, writing a table in a DB to manage the file name, old location, and new cloud location, and adding a redirector that can provide the access tokens to requestors.
3 different approaches... your choice.
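To make the first option from the question (segmenting files into multiple directories) concrete, a deterministic scheme such as hashing the file name avoids having to look up which folder a file landed in. A rough sketch - the two-level, two-hex-character layout is an arbitrary choice of mine:
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class ShardedPath
{
    // Maps "invoice-4711.pdf" to something like <root>\a3\f0\invoice-4711.pdf,
    // spreading the documents over up to 65,536 subdirectories.
    public static string For(string rootShare, string fileName)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(fileName));
            string directory = Path.Combine(
                rootShare, hash[0].ToString("x2"), hash[1].ToString("x2"));
            Directory.CreateDirectory(directory); // no-op if it already exists
            return Path.Combine(directory, fileName);
        }
    }
}
Because the subdirectory is derived from the file name itself, no extra metadata is needed to find a document later, and the same scheme keeps working if the root moves to a NAS/SAN share or a cloud-backed drive.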

Store pictures as files or in the database like MSSQL for a web app?

I'm building an ASP .NET web solution that will include a lot of pictures and hopefully a fair amount of traffic. I really do want to achieve good performance.
Should I save the pictures in the database or on the file system? And regardless of the answer, I'm more interested in why you would choose a specific way.
Store the pictures on the file system and picture locations in the database.
Why? Because...
You will be able to serve the pictures as static files.
No database access or application code will be required to fetch the pictures.
The images could be served from a different server to improve performance.
It will reduce the database bottleneck.
The database ultimately stores its data on the file system.
Images can be easily cached when stored on the file system.
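A minimal sketch of that pattern - save the uploaded picture to disk and store only its relative path in the database (the Pictures table and RelativePath column are invented for the example):
using System.Data.SqlClient;
using System.IO;
using System.Web;

public class PictureStore
{
    private readonly string _imagesRoot;        // e.g. a folder served as static content
    private readonly string _connectionString;

    public PictureStore(string imagesRoot, string connectionString)
    {
        _imagesRoot = imagesRoot;
        _connectionString = connectionString;
    }

    public string Save(HttpPostedFile upload)
    {
        string fileName = Path.GetFileName(upload.FileName);
        upload.SaveAs(Path.Combine(_imagesRoot, fileName));

        // Only the location goes into the database; the bytes stay on disk.
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "INSERT INTO Pictures (RelativePath) VALUES (@path)", connection))
        {
            command.Parameters.AddWithValue("@path", fileName);
            connection.Open();
            command.ExecuteNonQuery();
        }
        return fileName;
    }
}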
In my recently developed projects, I stored images (and all kinds of binary documents) as image columns in database tables.
The advantage of having files stored in the database is obviously that you do not end up with unreferenced files on the hard disk when a record is deleted; synchronization between the database (the metadata) and the hard disk (the file storage) is not built-in and would otherwise have to be programmed manually.
Using today's technology, I suggest you store images in SQL Server 2008 FILESTREAM columns (at least that's what I am going to do with my next project), since they combine the advantage of storing data in the database AND having large binaries in separate files (at least according to the advertising ;) )
The adage has always been "Files in the filesystem, file metadata in the database"
Better to store files as files. Different databases handle BLOB data differently, so if you have to migrate your back end you might get into trouble.
When serving the images, an <img src=...> pointing to a file that already exists on the server is likely to be quicker than generating a temporary file from the database field and pointing the <img> tag to that.
I found this answer from googling your question and reading the comments at http://databases.aspfaq.com/database/should-i-store-images-in-the-database-or-the-filesystem.html
I usually like to have binary files in the database because:
data integrity: no unreferenced files, no path in the DB without an associated file
data consistency: take a database dump and that's all - no "oh, I forgot to tar.gz that data directory."
Storing images in the database adds a DB overhead to serve single images and makes it hard to offload to alternate storage (S3, Akamai) if you grow to that level. Storing them in the database makes it much easier to move your app to a different server since it's only the DB that needs to move now.
Storing images on the disk makes it easy to offload to alternate storage, makes images static elements so you don't have to mess about with HTTP headers in your web app to make the images cacheable. The downside is if you ever move your app to a different server you need to remember to move the images too; something that's easily forgotten.
For web based applications, you're going to get better performance out of using the file system for storing your images. Doing so will allow you to easily implement caching of the images at multiple levels within your application. There are some advantages to storing images in a database, but most of the time those advantages come with client based applications.
Just to add a bit more to the already good answers so far: you can still get the benefits of caching at both the web level and the database level if you go the route of keeping your images in the database.
On the database side, I think you can achieve this through how you store the images in relation to the textual data associated with them and, if you can, by isolating access to the images in a particular query so that the database can cache that query (just theory, though, so feel free to nuke me on that part).
On the web side, since your question is tagged asp.net, I would guess you would go the route of using an HTTP handler to serve up the images. Then you have all the benefits of the framework at your disposal, and you can keep your domain logic cleaner by only having to pass the key of your image to the handler.
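For completeness, a rough sketch of such a handler (the Pictures table, ImageData column, AppDb connection string name, and fixed JPEG content type are all assumptions; error handling is omitted):
using System.Configuration;
using System.Data.SqlClient;
using System.Web;

// Registered in web.config and requested as e.g. image.ashx?id=42.
public class ImageHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        int id = int.Parse(context.Request.QueryString["id"]);

        using (var connection = new SqlConnection(
            ConfigurationManager.ConnectionStrings["AppDb"].ConnectionString))
        using (var command = new SqlCommand(
            "SELECT ImageData FROM Pictures WHERE Id = @id", connection))
        {
            command.Parameters.AddWithValue("@id", id);
            connection.Open();
            byte[] data = (byte[])command.ExecuteScalar();

            context.Response.ContentType = "image/jpeg";
            // Let browsers and proxies cache the response.
            context.Response.Cache.SetCacheability(HttpCacheability.Public);
            context.Response.BinaryWrite(data);
        }
    }
}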
Here is a step-by-step example (general approach, Spring implementation, Eclipse) of storing images in the file system and holding their metadata in the DB:
http://www.devmanuals.com/tutorials/java/spring/spring3/mvc/Spring3MVCImageUpload.html
Here is an example too -- http://www.journaldev.com/2573/spring-mvc-file-upload-example-tutorial-single-and-multiple-files
Also, you can investigate the codebase of this project -- https://github.com/jdmr/fileUpload . Pay attention to its controller.

How do I cluster an upload folder with ASP.Net?

We have a situation where users are allowed to upload content, and then separately make some changes, then submit a form based on those changes.
This works fine in a single-server, non-failover environment, however we would like some sort of solution for sharing the files between servers that supports failover.
Has anyone run into this in the past? And what kind of solutions were you able to develop? Obviously persisting to the database is one option, but we'd prefer to avoid that.
At a former job we had a cluster of web servers with an F5 load balancer in front of them. We had a very similar problem in that our applications allowed users to upload content, which might include photos and such. These were legacy applications and we did not want to edit them to use a database, and a SAN solution was too expensive for our situation.
We ended up using a file replication service on the two clustered servers. This ran as a service on both machines, using an account that had network access to paths on the opposite server. When a file was uploaded, this backend service synced the file system folders, making the data available to be served from either web server.
Two of the products we reviewed were ViceVersa and PeerSync. I think we ended up using PeerSync.
In our scenario, we have a separate file server that both of our front-end app servers write to; that way, either server has access to the same set of files.
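In code that can be as simple as saving uploads to a configured UNC path instead of a local folder, so every front-end server sees the same files. A sketch, assuming an UploadShare app setting (the setting name is my own):
using System.Configuration;
using System.IO;
using System.Web;

public static class UploadStore
{
    // web.config: <add key="UploadShare" value="\\fileserver\uploads" />
    // The app pool identity needs write access to the share.
    private static readonly string Share =
        ConfigurationManager.AppSettings["UploadShare"];

    public static string Save(HttpPostedFile upload)
    {
        string target = Path.Combine(Share, Path.GetFileName(upload.FileName));
        upload.SaveAs(target); // immediately visible to every web server
        return target;
    }
}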
The best solution for this is usually to provide the shared area on some form of SAN, which will be accessible from all servers and contain failover.
This also has the benefit that you don't have to provide sticky load balancing; the upload can be handled by one server and the edit by another.
A shared SAN with failover is a great solution with a great (high) cost. Are there any similar solutions with failover at a reasonable cost? Perhaps something like DRBD for windows?
The problem with a simple shared file system is the lack of redundancy: what if the file server goes down?
