File uploading: what should be the name of the file to save to? - asp.net

I am going to add a file upload control to my ASP.NET 2.0 web page so that users can upload files. The files will be stored on the server in a folder named after the user. What is the best way to name the files when saving them to the server? The choice needs to take security, performance, and flexibility in handling the files into account.
Options I am considering now:
1. Upload with the same name as the input file name
2. User ID + random number + the input file name
3. Random number + current time in seconds, saving the file under that number, with a table to map the number to the user's upload
Anything else? What is the best way?

NEVER EVER use user input for filenames. Don't use the username. Use the user ID instead (I assume your users have a unique ID).
NEVER use the original filename. Use your solution number 3, plus the user ID instead of the username.
For your information, PHP had a vulnerability a few years ago: one could forge an HTTP POST request with a file upload and a file name like "../../anything.php", and the PHP $_FILES array, which is supposed to contain sanitized values, didn't detect this kind of file name, so one could write files anywhere in the filesystem.

I'd use a combination of
User ID
A randomly generated string (e.g. a GUID)
Example PDF file name: 23212-dd503cf8-a548-4584-a0a3-39dc8be618df.pdf
This way, the user can upload as many files as he/she wants, without file name conflict, and you are also able to point out which files belong to which users, just by looking at the file names.
I don't see the need to include any other information in the file name, since upload time/date and such can be retrieved from the file's attributes.
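A minimal sketch of that naming scheme, written in Python for brevity (in .NET the random part would come from something like Guid.NewGuid()). The function name make_stored_name and the extension whitelist are assumptions made for illustration; they are not part of the answer above. Only the extension is taken from the uploaded name, and only after it has been checked against the whitelist.

```python
import os
import uuid

# Hypothetical whitelist; adjust to the file types the site actually accepts.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".txt"}


def make_stored_name(user_id: int, original_name: str) -> str:
    """Build '<user id>-<GUID><ext>' without reusing the rest of the uploaded name."""
    # Only the extension comes from the client-supplied name, and only if it
    # is on the whitelist; everything else is generated on the server.
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {ext!r}")
    return f"{int(user_id)}-{uuid.uuid4()}{ext}"


if __name__ == "__main__":
    # The path components of the hostile name never reach the result.
    print(make_stored_name(23212, "../../report.PDF"))
```

Because the directory part of the client-supplied name is discarded, a forged name such as "../../report.PDF" cannot steer the file outside the upload folder.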
Also, you should store the files in a safe location, which external users, such as visitors of your website, cannot access. Instead, you deliver the file to them through a proxy web page (you read the file from the safe location, and pass the data on to the user). For this solution, a database is needed to keep track of files, their location, etc.
This also makes you able to control which users have access to which files through your code.
Update: Here's a description of how the solution with the proxy web page could be implemented.
Create a Web Form with the name GetFile.aspx
GetFile.aspx takes one query parameter named fileid, which is used to identify the file to get. E.g.: http://www.mypage.com/GetFile.aspx?fileid=100
Use the fileid parameter to look up the file location in the database, so that it can be read and sent to the user. In the Web Form you use Request.QueryString("fileid") to get the file ID and use it in a query that will look something like this (SQL): SELECT FileLocation FROM UserFiles WHERE FileID = 100. Pass the ID as a query parameter rather than concatenating it into the SQL string.
Read the file using a System.IO.FileStream and send its contents with Response.BinaryWrite (Response.Write is meant for text and can corrupt binary data). Remember to set the appropriate content type using Response.ContentType first, so that the client browser handles the requested file correctly (see this post on forums.asp.net and the MSDN article referred to in that post, both of which discuss a method of determining the appropriate content type automatically).
If you choose this approach, it's easy to implement your own simple security or custom actions later on, such as making sure a user is logged into your web site before you send the file, or that users can only access files they uploaded themselves, or logging which users download which files, etc. The possibilities are endless ;-)
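The lookup and ownership check behind such a proxy page could be sketched as follows, in Python with an in-memory SQLite database standing in for the real one. The OwnerUserID column and the function name find_file_for_user are assumptions added for illustration; the answer above only mentions FileID and FileLocation.

```python
import sqlite3
from typing import Optional


def find_file_for_user(conn: sqlite3.Connection, file_id: str, user_id: int) -> Optional[str]:
    """Return the stored location, or None if the file is missing or owned by someone else."""
    try:
        file_id_int = int(file_id)  # the query-string value arrives as text
    except (TypeError, ValueError):
        return None
    # Parameterised query: the fileid value is never concatenated into the SQL.
    row = conn.execute(
        "SELECT FileLocation FROM UserFiles WHERE FileID = ? AND OwnerUserID = ?",
        (file_id_int, user_id),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserFiles (FileID INTEGER PRIMARY KEY, OwnerUserID INTEGER, FileLocation TEXT)"
    )
    conn.execute("INSERT INTO UserFiles VALUES (100, 23212, 'D:/store/23212-dd503cf8.pdf')")
    print(find_file_for_user(conn, "100", 23212))        # the owner gets the location
    print(find_file_for_user(conn, "100", 999))          # another user gets None
    print(find_file_for_user(conn, "100 OR 1=1", 23212)) # malformed input gets None
```

Returning None for both "not found" and "not yours" avoids telling a caller which file IDs exist.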

Take a look at the System.IO.Path class, as it has lots of useful functions you can use, such as:
Check which characters are invalid in a file name:
System.IO.Path.GetInvalidFileNameChars();
Get a random file name:
System.IO.Path.GetRandomFileName();
Get a unique, random file name in the temporary directory (this also creates an empty file with that name):
System.IO.Path.GetTempFileName();

I would go with option #3. A table mapping these files with users will provide other uses down the road, it always does. If you use the mapping, the only advantage of appending the user name or id to the file is if you are trying to debug a problem.
I'd probably use a GUID instead of a random number, but either would work. The important things, in my opinion, are:
No username as any part of the stored file name
Never use the original file name as any part of the stored file name
Use a random number or GUID to ensure there are no duplicate file names
Adding a user ID to the file name will help with manual debugging

There is more to this than meets the eye, which I suspect you already knew!
What sort of files are you talking about? If they are even remotely big, or numerous enough that the whole set could become big, I would immediately suggest that you add some flexibility to your approach.
Create a table that stores the root paths to the various file stores (these could be drives, UNC paths, whatever your environment supports). It will initially have one entry, which will be your first storage location. A nice attribute to maintain with this data is how much can be stored there.
Maintain a table of file-related data (ID {GUID}, create date, foreign key to path data, file size).
Write the file to a root that still has room on it (query all file sizes stored in a root location and compare the total to that root's capacity).
Write the file using a GUID for the name (this obfuscates the file on the file system). It can be written without the file extension if security requires it (sensitive files).
Write the file according to its create date, starting from the root: root/year{number}/month{number}/day{number}/file.extension
With a system of this nature in place, even though you don't need it up front, you can relocate the files more easily. You can better manage the files, and you can better manage collections of files. I have used this system before and found it to be quite flexible. Dealing with files that are stored on a file system but managed from a database can get a bit out of control once the file store becomes large and things need to be moved around. Also, at least in the case of Windows, storing zillions of files in one directory is usually not a good idea (which is the reason for breaking things up by create date).
This complexity is only really needed when you have high volumes and large footprints.
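The root selection and date-based layout described in this answer can be sketched roughly as below, in Python. The names pick_root and build_path, and the use of plain dictionaries in place of the two database tables, are assumptions for illustration only.

```python
import uuid
from datetime import date
from pathlib import PurePosixPath
from typing import Dict, Optional


def pick_root(capacity: Dict[str, int], used: Dict[str, int], size: int) -> Optional[str]:
    """Return the first root that can still take `size` more bytes, or None."""
    for root, limit in capacity.items():
        if used.get(root, 0) + size <= limit:
            return root
    return None


def build_path(root: str, created: date, extension: str = "") -> PurePosixPath:
    """root/year/month/day/<GUID><extension>; pass extension='' to hide the file type."""
    name = f"{uuid.uuid4()}{extension}"
    return (
        PurePosixPath(root)
        / f"{created.year:04d}"
        / f"{created.month:02d}"
        / f"{created.day:02d}"
        / name
    )


if __name__ == "__main__":
    capacity = {"/store1": 100, "/store2": 1000}  # bytes each root may hold
    used = {"/store1": 90}                        # bytes already stored per root
    root = pick_root(capacity, used, 20)          # /store1 is too full for 20 bytes
    print(build_path(root, date(2009, 3, 7), ".pdf"))
```

In a real system the `used` totals would come from summing the file sizes recorded in the file table, as the answer suggests.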

Related

What Exactly are Anonymous Files

A passage in the file documentation caught my eye:
## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)
What exactly is this anonymous file? Does it exist on disk, or only in memory? I'm interested in this as I'm contemplating a program that will potentially need to create/delete thousands of temporary files, and if this happens only in memory it seems like it would have a much lesser impact on system resources.
This Linux SO question appears to suggest this file could be a real disk file, but I'm not sure how relevant that is to this particular example. Additionally, this bigmemory doc seems to hint at real disk-based storage (though I'm assuming the file-based anonymous file is being used):
It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the filebacking argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution since even anonymous backings use disk space which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.
Alternatively, if textConnection is appropriate for use for this type of application (opened/closed hundreds/thousands of times) and is memory only that would satisfy my needs. I was planning on doing this until I read the note in that function's documentation:
As output text connections keep the character vector up to date line-by-line, they are relatively expensive to use, and it is often better to use an anonymous file() connection to collect output.
My C is very rusty, so hopefully more experienced people can correct me, but I think the answer to your question "What exactly is this anonymous file? Does it exist on disk, or only in memory?" is "It exists on disk".
Here is what happens at C level (I'm looking at the source code at http://cran.r-project.org/src/base/R-3/R-3.0.2.tar.gz):
A. Function file_open, defined in src/main/connections.c:554, has the following logic related to an anonymous file (one with an empty description), lines 565-568:
if(strlen(con->description) == 0) {
    temp = TRUE;
    name = R_tmpnam("Rf", R_TempDir);
} else name = R_ExpandFileName(con->description);
So a new temporary filename is generated if no file name was supplied to file.
B. If the name of the file is not equal to stdin, the call R_fopen(name, con->mode) happens at line 585 (there are some subtleties with Win32 and UTF-8 names, but we can ignore them for now).
C. Then the file name is unlinked at line 607. The documentation for unlink says:
The unlink() function removes the link named by path from its directory and decrements the link count of the file which was referenced by the link. If that decrement reduces the link count of the file to zero, and no process has the file open, then all resources associated with the file are reclaimed. If one or more process have the file open when the last link is removed, the link is removed, but the removal of the file is delayed until all references to it have been closed.
So in effect the directory entry is removed, but the file continues to exist for as long as the R process has it open.
D. Finally, R_fopen is defined in src/main/sysutils.c:135 and just calls fopen internally.
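The open-then-unlink behaviour described in steps A to C can be reproduced outside R. The following is a small Python sketch, assuming a POSIX system (on Windows, removing a file that is still open normally fails); it mirrors the sequence of calls rather than R's actual implementation.

```python
import os
import tempfile

# Create a real file in the temporary directory, much as R_tmpnam + R_fopen do.
fd, path = tempfile.mkstemp(prefix="Rf")
handle = os.fdopen(fd, "w+")

# Remove the directory entry while the file is still open.
os.unlink(path)
still_listed = os.path.exists(path)

# The open handle keeps working: the file's storage is only released on close().
handle.write("abc\ndef\n")
handle.seek(0)
lines = handle.read().splitlines()
handle.close()

print(still_listed, lines)
```

So the "anonymous" file has no name on disk after the unlink, but it is still a file backed by the file system, not a purely in-memory buffer.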

File upload and read from database

I am using a file upload mechanism to upload a file for an employee, converting it into a byte[] and storing it in a varbinary(max) column in the database.
What I have to do now is this: if a file has already been uploaded for an employee, simply read it from the table and show the file name. I have only one column to store the file, and it is of type varbinary.
Is it possible to get all the file information from the varbinary field?
If there is any other way around this, please let me know.
If you're not storing the filename, you can't retrieve it.
(Unless the file itself contains its filename in which case you'd need to parse the blob's contents.)
If the name of the file (and any other data about the file that's not part of the file's byte data) needs to be used later, then you need to save that data as well. I'd recommend adding a column for the file name, perhaps one for its type (mime type or something like that for properly sending it back to the client's browser, etc.) and maybe even one for size so you don't have to calculate that on the fly for each file (useful when displaying a grid of files and not wanting to touch the large blob field in the query that populates the grid).
Try to stay away from using the file name for system-internal identity purposes. It's fine for allowing the users to search for a file by name, select it, etc. But when actually making the request to the server to display the file it's better to use a simple integer primary key from the table to actually identify it. (On a side note, it's probably a good idea to put a unique constraint on the file name column.)
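A rough sketch of that table layout, in Python with SQLite standing in for the real database (a BLOB column plays the role of varbinary(max)). The table name EmployeeFiles, the column names, and the two helper functions are hypothetical and only illustrate the idea of storing metadata next to the bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EmployeeFiles (
           FileID      INTEGER PRIMARY KEY,  -- simple integer key used to request the file
           EmployeeID  INTEGER NOT NULL,
           FileName    TEXT    NOT NULL,
           ContentType TEXT    NOT NULL,
           SizeBytes   INTEGER NOT NULL,
           Content     BLOB    NOT NULL
       )"""
)


def save_upload(employee_id, file_name, content_type, data):
    """Store the bytes together with the metadata that cannot be recovered from them."""
    cur = conn.execute(
        "INSERT INTO EmployeeFiles (EmployeeID, FileName, ContentType, SizeBytes, Content) "
        "VALUES (?, ?, ?, ?, ?)",
        (employee_id, file_name, content_type, len(data), data),
    )
    return cur.lastrowid


def list_files(employee_id):
    """Grid query: reads name, type and size without selecting the blob column."""
    return conn.execute(
        "SELECT FileID, FileName, ContentType, SizeBytes FROM EmployeeFiles WHERE EmployeeID = ?",
        (employee_id,),
    ).fetchall()


if __name__ == "__main__":
    save_upload(7, "cv.pdf", "application/pdf", b"%PDF-1.4")
    print(list_files(7))
```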
If you also need help displaying the file to the user, you'll probably want to take the approach that's tried and true for displaying images from a database. Basically it involves having a resource (generally an .aspx page, but could just as well be an HttpHandler instead) which accepts the file ID as a query string parameter and outputs the file.
This resource would have no UI (remove everything from the .aspx except the Page directive) and would manually manipulate the response headers (this is where you'd set the content type from the file's type), write the byte stream to the client, and end the response. From the client's perspective, something like ~/MyContent/MyFile.aspx?fileID=123 would be the file. (You can suggest a file name to the browser for saving purposes in the response headers, which you'd probably want to do with the file's stored name.)
There's no shortage of quick tutorials (some several years old; the technique has been around for a while) on how to do this with images. Just remember that, from the server's perspective, there's essentially no difference between an image and any other kind of file. All the server needs to do is send the type in the response headers and write the file's bytes to the client. How the client handles the file is up to the browser. In the vast majority of cases, the browser will know what to do (display an image, display a PDF via a plugin, save a .doc, etc.).
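As an illustration of the headers such a resource would set before writing the bytes, here is a small Python sketch. The function name download_headers is an assumption, and the choice of "attachment" (which asks the browser to save rather than display) is one option among several; the header names themselves are standard HTTP.

```python
from urllib.parse import quote


def download_headers(file_name: str, content_type: str, size: int) -> dict:
    """Headers a file-serving page might send ahead of the file's bytes."""
    # Plain-ASCII fallback name, with characters that could break the quoted value removed.
    ascii_name = file_name.encode("ascii", "ignore").decode()
    safe = "".join(c for c in ascii_name if c not in '"\\\r\n') or "download"
    # filename* carries the full UTF-8 name, percent-encoded, for browsers that support it.
    encoded = quote(file_name, safe="")
    disposition = "attachment; filename=\"{}\"; filename*=UTF-8''{}".format(safe, encoded)
    return {
        "Content-Type": content_type or "application/octet-stream",
        "Content-Length": str(size),
        "Content-Disposition": disposition,
    }


if __name__ == "__main__":
    for key, value in download_headers("report 2009.pdf", "application/pdf", 1234).items():
        print(f"{key}: {value}")
```

The stored file name is only a suggestion to the browser here; the file itself is still identified by its integer ID in the request.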

Get file upload data from post data in ASP.NET

I am looping through the posted values on a form with a view to doing something with them (so don't have access to the controls themselves). This is the process I have to take on this project so that is why I'm doing it this way.
On the form I will have a file upload box but I am not sure how I would upload the file that has been selected from it as I can't just do Control.SaveAs(). When I return the posted value using Request.Form.Item[i] I get the file name I chose but not the full path like I would expect.
Can someone point me in the right direction please?
Thanks.
If you want to manipulate the uploaded files directly, and not through a FileUpload control, you should use the Request.Files collection and not Request.Form.
File upload controls only pass the file name and the contents. I'm not sure why you would need a folder name, especially since the folder name would be for the client. I can't see that it would have any value to you, since you want to save the file on the server.
As I am unsure of your goals, I would recommend using Server.MapPath("~/Folder") to find a suitable folder to save your uploaded files to.

File upload in ASP.NET - How can I prevent exceptions?

I have a FileUploader control in my web form. If the file being uploaded is already present, I want to delete it, and overwrite it with the newly-uploaded file. But I get an error, as the file is in use by another process, and thus the application can't delete it. Sample code:
if (FUpload.HasFile)
{
    string FileName = Path.GetFileName(FUpload.PostedFile.FileName);
    string Extension = Path.GetExtension(FUpload.PostedFile.FileName);
    string FolderPath = ConfigurationManager.AppSettings["FolderPath"];
    string FilePath = Server.MapPath(FolderPath + FileName);
    if (File.Exists(FilePath))
    {
        File.Delete(FilePath);
    }
    FUpload.SaveAs(FilePath);
}
Is there anything I can do apart from writing the code in try/catch blocks?
Generate a unique temporary file name and save the upload under it. Rename it to your destination name when the upload is complete. You may still have collisions if someone uploads the "same" file name at the same time. You should always be catching file system errors somewhere. If you don't do it here, may I suggest a global error handler in global.asax.
You can save your file under some other name and then, if the destination already exists, use File.Replace to replace the old file.
At the end of the day, due to potential race conditions on your web site (due to, hopefully, concurrent users), you can't get around try/catch. (Why are you averse to it?)
Utkarsh and No Refunds No Returns have the basic answer right -- save it with a temporary file name, then replace/overwrite the existing one if needed. A good approach for this is to use a GUID as the temporary file name, to ensure that there are no collisions on the filename alone.
Depending on the nature of your application, you could get quite a few files stacked up, uploaded by different users, with lots of potential name conflicts. Depending on the nature and scale of your app, as well as its security boundaries, you might consider giving each user his/her own directory, based on user ID (how you'd identify the user in the database). Each user uploads his/her files there. If there's a name collision, you can bounce back to the user (holding the GUID name in session if needed) and ask if he/she wants to overwrite, and know with confidence that the answer is safe.
If the user declines to overwrite, you can delete your temp file.
If the user agrees to overwrite, you can delete the original and write the new one.
In either event, all of this is localized to the user's own directory, and thus (unless multiple users are signed on with the same ID) the behavior is safe.
In general, this will be more robust and safe than arbitrarily overwriting file name collisions.
Again, due to race conditions and other situations beyond your control, you need to use a try/catch block any time you attempt to write to the file system. Why? What if the drive is out of space? What if the file you are attempting to overwrite is legitimately in use by another process? What if the file you are attempting to overwrite has NTFS permissions forbidding the web process from touching it? So on and so forth. You need to be prepared to handle these kinds of exceptions.
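The "temporary name first, then replace, inside a try/catch" pattern from these answers could look roughly like this in Python (in .NET, File.Replace or File.Move would play the role of os.replace). The function name replace_upload is hypothetical.

```python
import os
import tempfile
import uuid


def replace_upload(data: bytes, folder: str, final_name: str) -> bool:
    """Write to a GUID-named temp file, then swap it into place. Returns False on failure."""
    temp_path = os.path.join(folder, f"{uuid.uuid4()}.tmp")
    final_path = os.path.join(folder, final_name)
    try:
        with open(temp_path, "wb") as f:
            f.write(data)
        # os.replace overwrites an existing destination in a single step.
        os.replace(temp_path, final_path)
        return True
    except OSError:
        # Disk full, permission denied, destination locked, and so on.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(replace_upload(b"first version", folder, "photo.jpg"))
        print(replace_upload(b"second version", folder, "photo.jpg"))
        print(os.listdir(folder))
```

The error handling cannot be designed away: the temporary name removes name collisions, but a full disk or a locked destination still has to be caught.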

Best Practice for Saving Images

I am allowing users of my website's admin panel to upload photos. It's a simple process: I check the validity of the image and then save it to a folder. I also have to create a couple of database records for that image to be able to retrieve it later. My saving code is as follows.
The function that uploads and saves the picture in the folder, with a name I construct in another function:
My_HTMLInputFile.PostedFile.SaveAs(HttpContext.Current.Server.MapPath("~/photos\" & pta.FileName))
And the function that creates the database record for that same picture:
Public Function InsertPhoto() As Integer
    Dim pta As New GKPTableAdapters.tblPhotosTableAdapter
    Return pta.InsertPhoto(PhotoCaption, PhotoDescription, ("http://www.myURL.com/photos/" & FileName), IsDefault, IsPicture)
End Function
Now I know that what I am doing is full of best-practice violations, so please point out what I should do. Keep in mind that the users might delete the pictures later, so I want to make sure that I can delete both the database record and the file of the picture. The whole issue of the path is also confusing me :P
Thanks in advance.
Something I've noticed right off the bat is that you are hardcoding the FULL PATH to the image.
I'd just store the image name, and then prepend the relative path when I display it in the application.
If you allow your users to delete the files via your application, you should delete the record in the database, and then delete the file itself by using the File.Delete method.
You may also want to look at your file name generation. If you use an md5 hash of the image data as the file name, for example, you can prevent people from uploading duplicate images and you also don't have to think of a way to generate "unique" names for the images.
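A minimal sketch of content-based naming, in Python; the function name content_name is an assumption. It uses MD5 because that is what the answer suggests. MD5 collisions can be crafted deliberately, so a stronger hash such as SHA-256 (hashlib.sha256) may be preferable if uploads come from untrusted users.

```python
import hashlib


def content_name(data: bytes, extension: str) -> str:
    """File name derived from the image bytes: identical uploads get identical names."""
    return hashlib.md5(data).hexdigest() + extension


if __name__ == "__main__":
    first = content_name(b"same image bytes", ".jpg")
    second = content_name(b"same image bytes", ".jpg")
    print(first == second)  # a repeated upload maps to the same name
```

A duplicate upload then shows up as a name that already exists, which the application can detect before writing the file again.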
Exposing your photos directory directly to the internet may be a bad idea if there are images in there that the public should not see and your naming policy is predictable. People will start guessing image URLs and stumble upon something they are not allowed to see.
