A file storage format for a file sharing site - ASP.NET

I am implementing a file sharing system in ASP.NET MVC3. I suppose most file sharing sites store files in a standard binary format on a server's file system, right?
I have two options storage-wise: a file system, or a binary data field in a database.
Are there any advantages to storing files (including large ones) in a database, rather than on the file system?
MORE INFO:
Expected average file size is 800 MB, and roughly 3 files per minute are requested for download.

If the files are as big as that, then using the filesystem is almost certainly a better option. Databases are designed to contain relational data grouped into small rows and are optimized for consulting and comparing the values in these relations. Filesystems are optimized for storing fairly large blobs and recalling them by name as a bytestream.
Putting files that big into a database will also make it difficult to manage the space the database occupies. The tools for querying used space, and for removing and replacing data, are better on a filesystem.
The only caveat to using the filesystem is that your application has to run under an account that has the necessary permission to write to the (portion of the) filesystem you use to store these files.
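As a rough illustration, an ASP.NET MVC controller can save uploads to disk and stream them back without ever holding a whole file in memory. This is a minimal sketch; the storage root and the GUID naming scheme are my assumptions:

    using System;
    using System.IO;
    using System.Web;
    using System.Web.Mvc;

    public class FilesController : Controller
    {
        // Hypothetical storage root; in production this comes from config.
        private const string StorageRoot = @"D:\FileStore";

        [HttpPost]
        public ActionResult Upload(HttpPostedFileBase file)
        {
            // Never trust the client-supplied file name; generate our own.
            string storedName = Guid.NewGuid().ToString("N");
            file.SaveAs(Path.Combine(StorageRoot, storedName));
            return Json(new { id = storedName });
        }

        public ActionResult Download(string id)
        {
            // Path.GetFileName strips any "..\" a caller might sneak in.
            string path = Path.Combine(StorageRoot, Path.GetFileName(id));
            // File(path, ...) streams from disk, so an 800 MB download is
            // never buffered in memory as a single byte array.
            return File(path, "application/octet-stream");
        }
    }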

Use FILESTREAM when:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
Here is the MSDN link: https://msdn.microsoft.com/en-us/library/gg471497.aspx
How to use it: https://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/
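For completeness, here is a sketch of reading a FILESTREAM value from C# via SqlFileStream, which streams the blob through the NTFS filesystem rather than the TDS protocol. The dbo.Files table and FileData column are made-up names:

    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    static void CopyFileStreamTo(string connStr, int id, Stream output)
    {
        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();
            // FILESTREAM access must happen inside a transaction.
            using (var tx = conn.BeginTransaction())
            {
                var cmd = new SqlCommand(
                    "SELECT FileData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                    "FROM dbo.Files WHERE Id = @id", conn, tx);
                cmd.Parameters.AddWithValue("@id", id);

                string path;
                byte[] txContext;
                using (var reader = cmd.ExecuteReader())
                {
                    if (!reader.Read()) return;
                    path = reader.GetString(0);
                    txContext = reader.GetSqlBytes(1).Buffer;
                }

                // Reads the file data directly from the NTFS store.
                using (var fs = new SqlFileStream(path, txContext, FileAccess.Read))
                {
                    fs.CopyTo(output);
                }
                tx.Commit();
            }
        }
    }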

Related

Overcoming inode limitation

What's the best practice for storing a large (expanding) number of small files on a server without running into inode limitations?
For a project, I am storing a large number of small files on a server with 2 TB of HD space, but my limitation is the 2,560,000 allowed inodes. Recently the server used up all the inodes and was unable to write new files. I have since moved some files into databases, but others (images and JSON files) remain on the drive. I am currently at 58% inode usage, so a solution is needed soon.
The reason for storing the files individually is to limit the number of database calls: the scripts check whether a file exists and, if so, return results accordingly. Performance-wise this makes sense for my application, but as stated above it has its limitations.
As I understand it, it does not help to move the files into subdirectories, because each inode points to a file (or a directory file), so in fact I would just use up more inodes.
Alternatively, I might be able to bundle the files together in an archive-type file, but that would require some sort of indexing.
Perhaps I am going about this all wrong, so any feedback is greatly appreciated.
On the advice of arkascha I looked into loop devices and found some documentation on losetup. It remains to be tested.
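On the archive-bundling idea above: a zip file already carries its own index (the central directory), so packing many small files into one archive costs a single inode while still allowing lookup by name. A minimal C# sketch with System.IO.Compression, with hypothetical paths:

    using System.IO;
    using System.IO.Compression;

    class Bundle
    {
        // Pack a directory of small files into one archive: one inode total.
        public static void Pack(string sourceDir, string bundlePath)
        {
            ZipFile.CreateFromDirectory(sourceDir, bundlePath,
                CompressionLevel.Fastest, includeBaseDirectory: false);
        }

        // Look an entry up by name; the central directory is the index.
        public static byte[] ReadEntry(string bundlePath, string entryName)
        {
            using (var zip = ZipFile.OpenRead(bundlePath))
            {
                var entry = zip.GetEntry(entryName);
                if (entry == null) return null; // the "file" does not exist
                using (var stream = entry.Open())
                using (var ms = new MemoryStream())
                {
                    stream.CopyTo(ms);
                    return ms.ToArray();
                }
            }
        }
    }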

How to store lots of user-uploaded files in ASP.NET in load-balanced hosting

I'm building an ASP.NET site where users can upload files into their accounts and then download them whenever they are logged in. The files will typically be less than 5 MB, and users can only download the files they have uploaded (i.e. they can't download someone else's file). There are around 100k users, and each could potentially upload around 2 or 3 files. The live site is load balanced.
I'm thinking that storing these files in the central DB (Sql Server) as a BLOB would be nice because...
As the site is load balanced, each node can access the file from the central DB. No need to worry about having a shared folder to store the files.
I can more easily ensure that users only download their own files.
Backing up the DB automatically includes the file BLOBs.
The only downside to this that I've read about is performance, but how bad can it be?
If I were to store these files in the file system, would there be any problem storing them all in one folder?
What is the best approach for this?
If you want it fast, then store the data on the filesystem; that way you don't need to pull that amount of data through SQL Server. You could do one of two things:
1st approach: Store all files in one single folder (NTFS limit: 4,294,967,295 files per volume; FAT limit: 268,435,437).
2nd approach: Create a separate subfolder for each user (user ID). I would prefer this over the 1st approach; a sketch follows below.
With newer versions of SQL Server you can also use FILESTREAM.
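A minimal sketch of the 2nd approach, with the ownership check implied by the path layout. The UNC share is a placeholder, and in a real application the user ID would come from the authenticated session rather than the request:

    using System.IO;
    using System.Web;
    using System.Web.Mvc;

    public class UserFilesController : Controller
    {
        // Shared UNC path, reachable from every load-balanced node (hypothetical).
        private const string Root = @"\\fileserver\uploads";

        private static string UserDir(int userId)
        {
            // One subfolder per user keeps individual directories small and
            // turns "users can only download their own files" into a path check.
            return Path.Combine(Root, userId.ToString());
        }

        [HttpPost]
        public ActionResult Upload(int userId, HttpPostedFileBase file)
        {
            Directory.CreateDirectory(UserDir(userId)); // no-op if present
            string safeName = Path.GetFileName(file.FileName);
            file.SaveAs(Path.Combine(UserDir(userId), safeName));
            return new HttpStatusCodeResult(201);
        }

        public ActionResult Download(int userId, string name)
        {
            // Path.GetFileName strips any "..\" a caller might sneak in.
            string path = Path.Combine(UserDir(userId), Path.GetFileName(name));
            if (!System.IO.File.Exists(path)) return HttpNotFound();
            return File(path, "application/octet-stream", name);
        }
    }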
It would also be interesting to know:
How many users do you have?
How many uploads per day/hour/minute could you have?
How many downloads per day/hour/minute could you have?
How many uploads at the same time could you have?
How many downloads at the same time could you have?
What is the size of the systems you have?
Is it used internally or externally, and what bandwidth/network load do you have, etc.?

Storing and serving many compressed archives with shared underlying content

I have a web server that has many compressed archive files (zip files) available for download. I would like to drastically reduce the disk footprint those archives take on the server.
The key insight is that those archives are in fact slightly different versions of the same uncompressed content. If you uncompressed any two of these many archives and ran a diff on the results, I expect you would find that the diff is about 1% of the total archive size.
Those archives are actually JAR files, but the compression details are, I believe, irrelevant. This explains why serving those archives in a specific compressed format is non-negotiable: it is the basic purpose of the server.
In itself, it is not a problem for me to install differential storage for the content of those archives, drastically reducing the disk footprint of the set of archives. There are numerous ways of doing this, using delta encoding or a compressed filesystem that understands sharing (e.g. I believe btrfs understands block sharing, or I could use snapshotting to enforce it).
The question is: how do I produce compressed zips from those files? The server I have has very little computational power, certainly not enough to recreate JARs on the fly from the block-sharing content.
Is there a programmatic way to expose the shared content at the uncompressed level to the compressed level? An easily-translatable-to-zip incremental compressed format?
Should I look for a caching solution coupled with generating JARs on the fly ? This would at least alleviate the computational pain from generating the JARs that are the most requested.
There is specialized hardware that can produce zips very fast, but I'd rather avoid the expense. It's also not a very scalable solution as the number of requests to the server grows.
If the 1% differences are smeared across all of the entries in all of the jar files, then there's not much you can do without having to recompress a lot.
If on the other hand the 1% differences are concentrated in a few percent of the jar entries, with most of the jar entries unchanged, then there's hope. You can keep each individual jar entry in its own jar file on the server, and for each jar file you want to serve, just keep a list of the individual jar-entry files to combine. It would be easy to write a fast utility to take a set of jar files and merge them into a single jar file, if one doesn't exist already.
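Such a merge utility is short to write with System.IO.Compression. Note that this sketch decompresses and recompresses each entry; a maximally cheap merge would splice the raw deflate data across archives, which needs a lower-level zip library:

    using System.IO;
    using System.IO.Compression;

    // Merge the entries of several single-entry jar files into one jar.
    static void MergeJars(string[] sourceJars, string targetJar)
    {
        using (var target = ZipFile.Open(targetJar, ZipArchiveMode.Create))
        {
            foreach (string source in sourceJars)
            {
                using (var src = ZipFile.OpenRead(source))
                {
                    foreach (var entry in src.Entries)
                    {
                        var copy = target.CreateEntry(entry.FullName,
                                                      CompressionLevel.Fastest);
                        using (var from = entry.Open())
                        using (var to = copy.Open())
                        {
                            from.CopyTo(to); // recompressed here, see caveat above
                        }
                    }
                }
            }
        }
    }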
One approach I've used in the past is to log for some time the actual requests for the zip files. If you find that the requests are highly skewed, then you may be able to use caching to alleviate the cost of producing zip files on the fly.
Basically, implement your differential storage along the lines you suggest. Also allocate some amount, say 10%, of your total storage as an LRU cache (or whatever other replacement algorithm you prefer) for the actual zipped files. Every time a user requests the zip, serve it from the cache if it is ready, or generate it on the fly and put it in the cache if not.
In the general case this may not work well, but in the common case that actual requests are typically to a small concentrated number of files, it may solve the problem.
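A sketch of that cache-or-generate flow, using last-access time as a crude LRU. GenerateZip stands in for whatever reconstructs an archive from the differential store, and the cache path and budget are placeholders:

    using System;
    using System.IO;
    using System.Linq;

    class ZipCache
    {
        private const string CacheDir = "/var/cache/zips";            // hypothetical
        private const long MaxBytes = 100L * 1024 * 1024 * 1024;      // ~10% of disk

        public static string GetOrCreate(string zipName, Action<string> generateZip)
        {
            string path = Path.Combine(CacheDir, Path.GetFileName(zipName));
            if (!File.Exists(path))
            {
                generateZip(path);  // expensive: rebuild from shared content
                EvictIfOverBudget();
            }
            File.SetLastAccessTimeUtc(path, DateTime.UtcNow); // touch for LRU
            return path;
        }

        private static void EvictIfOverBudget()
        {
            var files = new DirectoryInfo(CacheDir).GetFiles()
                            .OrderBy(f => f.LastAccessTimeUtc)
                            .ToList();
            long total = files.Sum(f => f.Length);
            foreach (var f in files)
            {
                if (total <= MaxBytes) break;
                total -= f.Length;
                f.Delete(); // least recently used goes first
            }
        }
    }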
Otherwise, I see your options as:
Use delta encoding on disk and then change the format your clients expect for responses. For example, instead of zip, you can serve them a format which is basically the bits of the delta-encoded files they need to reconstruct the file. On the server side, you save most of the work since you are just serving files more or less unmodified from disk, and then the client has to put them together (the existing client already has to unzip the files, so perhaps this is not an undue burden).
Carefully look at the .zip format and store your files in a specialized way that does most of the .zip work ahead of time. For example, something like a delta encoding, but with the actual hard part of match-finding stored on disk, such that encoding a file can be a very fast process. This would require someone with sophisticated knowledge of the zip format to design, however.

Berkeley DB File Splitting

Our application uses Berkeley DB for temporary storage and persistence. A new issue has arisen where a tremendous amount of data comes in from various input sources, and the underlying file system does not support such large file sizes. Is there any way to split the Berkeley DB files into logical segments or partitions without losing the data inside them? I also need to set this using Berkeley DB properties, not cumbersome programming for this simple task.
To my knowledge, BDB does not support this for you. You can however implement it yourself by creating multiple databases.
I did this before with BDB, programmatically: my code partitioned a potentially large index file into separate files and created a top-level master index over those sub-files.
Modern BDB has means to add additional directories either using DB_CONFIG (recommended) or with API calls.
See if these directives (and corresponding API calls) help:
add_data_dir
set_create_dir
set_data_dir
set_lg_dir
set_tmp_dir
Note that adding these directives is unlikely to transparently "Just Work", but it shouldn't be too hard to use db_dump/db_load to recreate the database files configured with these directives.
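For reference, these directives go one per line in a DB_CONFIG file in the database environment's home directory; the paths below are only examples:

    add_data_dir /disk1/bdb/data
    add_data_dir /disk2/bdb/data
    set_create_dir /disk2/bdb/data
    set_lg_dir /disk1/bdb/logs
    set_tmp_dir /var/tmp/bdb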

MySql Audio Library

I'm coding in ASP.NET and want to store audio files (.mp3, or smaller formats) in a MySQL database, which I can then retrieve based on certain conditions. Is this possible? Are there any preferred methods for having audio files on your web pages (besides embedding them in the HTML)?
Most solutions that store files in a database do not scale well, but you can certainly store audio files, or any other type of file, as a BLOB (binary large object) in MySQL. You can create an ashx handler that retrieves the data from the database and writes the content to the ASP.NET output stream as raw binary. You can then create links that point to the ASHX handler and perform any query logic you want in there based on URL parameters.
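A sketch of such a handler using the MySQL Connector/Net types; the tracks table, its columns, and the connection string are made up:

    using System.Web;
    using MySql.Data.MySqlClient;

    // Audio.ashx: streams an MP3 blob out of MySQL.
    public class Audio : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            int id = int.Parse(context.Request.QueryString["id"]);
            using (var conn = new MySqlConnection(
                "server=localhost;database=media;uid=app;pwd=secret")) // placeholder
            {
                conn.Open();
                var cmd = new MySqlCommand(
                    "SELECT data FROM tracks WHERE id = @id", conn);
                cmd.Parameters.AddWithValue("@id", id);
                byte[] bytes = (byte[])cmd.ExecuteScalar();

                context.Response.ContentType = "audio/mpeg"; // see MIME note below
                context.Response.BinaryWrite(bytes);
            }
        }

        public bool IsReusable { get { return true; } }
    }

A link such as Audio.ashx?id=42 would then serve the file.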
If you are using a MySQL database, it seems to do well (at least in my experience) with blobs. It takes a relatively short time to load the MP3 and if you tune your database for audio, you can probably even get better performance (I pretty much use default settings).
One thing to remember is to define the MIME type so that users know what they are getting when they click a link to your MP3.
Again, all of this is my own experience. YMMV.
I prefer to store large files outside of the database, unless there is some overwhelming need to keep everything there.
You could store the location of the file in the database and have the files outside of the webapp directory, so they can't be accessed directly.
Then, in the URL for playing the music, you can just have a CGI program that sends that data to the browser with the correct MIME type.
