I need a snippet to check a file for validity (I'm allowing users to upload XML files), so I need to check whether an uploaded file is actually XML.
The best I can think of is just checking whether the extension is ".xml". But what if it has been changed?
You can try loading it like this and catch the exception (an XmlException) that is thrown if it isn't well-formed XML:
XDocument xdoc = XDocument.Load("data.xml");
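The answer loads the whole file with XDocument. Below is a minimal sketch of the same idea that streams the upload through XmlReader instead and turns the XmlException into a yes/no answer. The class and method names are hypothetical, and prohibiting DTDs is an added assumption: it guards against entity-expansion attacks on untrusted uploads, but it also rejects otherwise valid files that contain a DOCTYPE.

```csharp
using System.IO;
using System.Xml;

public static class XmlUploadCheck
{
    // Returns true if the stream contains well-formed XML.
    // XmlReader parses forward-only without building a DOM,
    // so a large upload is not held in memory as a tree.
    public static bool IsWellFormedXml(Stream input)
    {
        var settings = new XmlReaderSettings
        {
            // Assumption: untrusted uploads may not use DTDs (blocks entity expansion).
            DtdProcessing = DtdProcessing.Prohibit,
            XmlResolver = null
        };

        try
        {
            // CloseInput defaults to false, so the caller's stream stays open.
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                while (reader.Read()) { } // read every node; throws on malformed input
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

In an ASP.NET page you might pass the upload control's FileContent stream. Well-formedness only tells you the file parses as XML, not that it contains the elements you expect.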
Presumably, if they're uploading XML, then you're going to use it for something afterwards. In that case you should validate the XML against a schema (XSD, etc.) so that you know you won't hit unexpected values or layouts.
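A minimal sketch of schema validation with XmlReaderSettings and XmlSchemaSet, assuming you already have an XSD describing the expected layout. The class and method names are hypothetical. Errors are collected through ValidationEventHandler rather than thrown, so every problem can be reported at once; a document that isn't even well-formed still ends parsing with an XmlException, which is caught and added to the list.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XmlSchemaCheck
{
    // Validates an XML document against an XSD and returns all errors found.
    // An empty list means the document is valid.
    public static List<string> Validate(TextReader xml, TextReader xsd)
    {
        var errors = new List<string>();

        var schemas = new XmlSchemaSet();
        using (XmlReader schemaReader = XmlReader.Create(xsd))
        {
            schemas.Add(null, schemaReader); // null: use the schema's own targetNamespace
        }

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas,
            DtdProcessing = DtdProcessing.Prohibit
        };
        // With a handler attached, validation errors are reported here instead of thrown.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        try
        {
            using (XmlReader reader = XmlReader.Create(xml, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex)
        {
            // Not well-formed: the parser cannot continue past this point.
            errors.Add(ex.Message);
        }
        return errors;
    }
}
```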
In Urlmon.dll, there's a function called FindMimeFromData.
From the documentation:
MIME type detection, or "data sniffing," refers to the process of determining an appropriate MIME type from binary data. The final result depends on a combination of server-supplied MIME type headers, file extension, and/or the data itself. Usually, only the first 256 bytes of data are significant.
So, read the first (up to) 256 bytes from the file and pass it to FindMimeFromData.
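A sketch of calling FindMimeFromData from C# via P/Invoke. The declaration below is the commonly used one and does not come from the answer, so check it against the Win32 documentation before relying on it. urlmon.dll exists only on Windows, so this hypothetical helper returns null on other platforms (OperatingSystem.IsWindows requires .NET 5 or later).

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class UrlmonSniffer
{
    // P/Invoke declaration for FindMimeFromData in urlmon.dll (Windows only).
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int FindMimeFromData(
        IntPtr pBC,               // bind context, unused here
        string? pwzUrl,           // optional URL or file name hint
        byte[] pBuffer,           // the bytes to sniff
        int cbSize,               // number of valid bytes in pBuffer
        string? pwzMimeProposed,  // optional proposed MIME type
        int dwMimeFlags,          // 0 = default behaviour
        out IntPtr ppwzMimeOut,   // receives a CoTaskMem-allocated string
        int dwReserved);

    // Reads up to the first 256 bytes and returns the detected MIME type,
    // or null when not on Windows or when detection fails.
    public static string? DetectMime(Stream input)
    {
        if (!OperatingSystem.IsWindows()) return null;

        var buffer = new byte[256];
        int read = 0, n;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < buffer.Length && (n = input.Read(buffer, read, buffer.Length - read)) > 0)
            read += n;

        int hr = FindMimeFromData(IntPtr.Zero, null, buffer, read, null, 0, out IntPtr mimePtr, 0);
        if (hr != 0 || mimePtr == IntPtr.Zero) return null;
        try
        {
            return Marshal.PtrToStringUni(mimePtr);
        }
        finally
        {
            Marshal.FreeCoTaskMem(mimePtr); // the caller owns the returned string
        }
    }
}
```

The exact string returned for XML input depends on urlmon's heuristics, so treat it as a hint and still parse the file.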
If you must validate the XML (assuming you want to validate the entire thing), you can use the XmlDocument class and catch the exception that is thrown if it's not XML.
I need to create and read a user preferences XML file with Adobe AIR. It will contain around 30 nodes, such as:
<id>18981</id>
<firstrun>false</firstrun>
<background>green</background>
<username>stacker</username>
...
What's a good method to do this?
Write an "XML parser" that reads the values and knows which data types to convert them to, based on the "save preferences model". In other words, you write one method/class that writes the data from the "save preferences model" to XML, and another that reads the XML back into the "save preferences model"; you can use describeType for both. describeType returns an XML description of the model class's properties, the types of those properties, and their accessibility (read/write, read-only, write-only).
For every read/write property you store the value in the XML output. When reading the values back in you do the same thing, except that you use the type attribute from the describeType output to decide whether you need a string-to-Boolean conversion (if (boolValue == "true")) or a string-to-number conversion (parseInt or parseFloat).
You could ultimately store the XML in a local SQL database if you want to keep a history, or else just store the current preferences in a flat file (using FileReference, or in AIR you can use FileStream to write directly to a location).
Edit:
I agree with Joshua's comment below: local shared objects were the first thing I thought of when I saw this. They eliminate the need to write the XML parser/reader, since they handle serializing and deserializing the objects for you (although manually inspecting the LSO is probably ugly). Anyhow, I had done something similar for another project of mine and tried stripping out the relevant code. Note that in this example I didn't use describeType, but the general concept is the same:
http://shaunhusain.com/OnePageSaverLoader/index.php
I am using a file upload mechanism to upload a file for an employee, converting it into a byte[] and passing it to a varbinary(max) column to store it in the database.
Now what I have to do is: if a file has already been uploaded for an employee, read it from the table and show its file name. I have only one column to store the file, and it is of type varbinary.
Is it possible to get all of the file's information from the varbinary field?
If there's any other way around this, please let me know.
If you're not storing the filename, you can't retrieve it.
(Unless the file itself contains its filename in which case you'd need to parse the blob's contents.)
If the name of the file (and any other data about the file that's not part of the file's byte data) needs to be used later, then you need to save that data as well. I'd recommend adding a column for the file name, perhaps one for its type (mime type or something like that for properly sending it back to the client's browser, etc.) and maybe even one for size so you don't have to calculate that on the fly for each file (useful when displaying a grid of files and not wanting to touch the large blob field in the query that populates the grid).
Try to stay away from using the file name for system-internal identity purposes. It's fine for allowing the users to search for a file by name, select it, etc. But when actually making the request to the server to display the file it's better to use a simple integer primary key from the table to actually identify it. (On a side note, it's probably a good idea to put a unique constraint on the file name column.)
If you also need help displaying the file to the user, you'll probably want to take the approach that's tried and true for displaying images from a database. Basically it involves having a resource (generally an .aspx page, but could just as well be an HttpHandler instead) which accepts the file ID as a query string parameter and outputs the file.
This resource would have no UI (remove everything from the .aspx except the Page directive) and would manually manipulate the response headers (this is where you'd set the content type from the file's type), write the byte stream to the client, and end the response. From the client's perspective, something like ~/MyContent/MyFile.aspx?fileID=123 would be the file. (You can suggest a file name to the browser for saving purposes in the response headers, which you'd probably want to do with the file's stored name.)
There's no shortage of quick tutorials (some several years old; the technique has been around for a while) on how to do this with images. Just remember that from the server's perspective there's essentially no difference between an image and any other kind of file. All the server needs to do is send the type in the response headers and write the file's bytes to the client. How the client handles the file is up to the browser. In the vast majority of cases, the browser will know what to do (display an image, display a PDF via a plugin, save a .doc, etc.).
I am able to upload my file through Uploadify + an .ashx handler, but the problem is that I always get ContentType = application/octet-stream.
Let's say I upload an image: I expected it to return "image/pjpeg", but it always returns "application/octet-stream" no matter what file I upload.
Please advise how to get the correct ContentType in the .ashx handler.
I believe the content type is most probably being set by the browser. In any case, different browsers may set different content types for the same file, and they may fall back to a generic content type such as "application/octet-stream" for any binary file (pdf, zip, doc, xls). It's possible that one browser would report a docx as "application/vnd.openxmlformats", another as "application/x-zip-compressed" and yet another as "application/octet-stream". And yet all of them are correct, because docx files are binary files and are also compressed (zip) files.
In short, my suggestion is that you should not rely on the content type sent by the client (beyond a certain extent, such as deciding whether it's text, HTML or binary) and should instead use server-side sniffing logic to determine the type of the file's content. Simple sniffing can be based on the file extension, while a more robust implementation will look at the actual file contents, where the first few bytes typically indicate the file type.
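A minimal sketch of content-based sniffing: compare the first bytes of the upload against a table of known signatures. The class name is hypothetical and the table holds only a few well-known signatures, not a complete list. Note that .docx, .xlsx and other Office Open XML files are ZIP containers, so they match the ZIP signature; that is the same ambiguity described above.

```csharp
public static class MagicBytes
{
    // A few well-known file signatures ("magic numbers"); extend for the types you accept.
    private static readonly (byte[] Signature, string Mime)[] Signatures =
    {
        (new byte[] { 0xFF, 0xD8, 0xFF }, "image/jpeg"),
        (new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, "image/png"),
        (new byte[] { 0x47, 0x49, 0x46, 0x38 }, "image/gif"),        // "GIF8" (GIF87a / GIF89a)
        (new byte[] { 0x25, 0x50, 0x44, 0x46 }, "application/pdf"),  // "%PDF"
        (new byte[] { 0x50, 0x4B, 0x03, 0x04 }, "application/zip"),  // "PK\x03\x04", also .docx/.xlsx
    };

    // Returns the MIME type whose signature matches the start of header,
    // or "application/octet-stream" when nothing matches.
    public static string Sniff(byte[] header)
    {
        foreach (var (signature, mime) in Signatures)
        {
            if (StartsWith(header, signature))
                return mime;
        }
        return "application/octet-stream";
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

Reading the first 8 bytes of the upload is enough for every entry in this table; longer signatures would need a longer header.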
At the moment I get the file's extension like this:
string fileExt = System.IO.Path.GetExtension(filUpload.FileName);
But if the user changes the file's extension (for example, the user could rename "test.txt" to "test.jpg"), I can't get the real extension. What's the solution?
You seem to be asking if you can identify file-type from its content.
Most solutions do rely on the file extension, but there are too many possible file types for all of them to be reliably identified from content alone.
Most approaches use the first several bytes of the file to determine what they are.
Here is one list, here another.
If you are only worried about text vs binary, see this SO question and answers.
See this SO answer for checking if a file is a JPG - this approach can be extended to use other file headers as in the first two links in this answer.
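The linked answer isn't reproduced here. As a minimal sketch, assuming the upload is available as a seekable Stream, the check below peeks at the first three bytes, compares them with the JPEG signature FF D8 FF, and restores the stream position so the file can still be saved afterwards. The class and method names are hypothetical; swapping in another signature extends it to other types.

```csharp
using System;
using System.IO;

public static class JpegCheck
{
    // JPEG files start with the SOI marker FF D8, followed by another marker starting with FF.
    private static readonly byte[] JpegSignature = { 0xFF, 0xD8, 0xFF };

    // Peeks at the start of a seekable stream and restores its position afterwards.
    public static bool LooksLikeJpeg(Stream input)
    {
        if (!input.CanSeek)
            throw new ArgumentException("Stream must be seekable.", nameof(input));

        long start = input.Position;
        try
        {
            var header = new byte[JpegSignature.Length];
            int read = 0, n;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < header.Length && (n = input.Read(header, read, header.Length - read)) > 0)
                read += n;

            if (read < header.Length) return false; // file too short to be a JPEG
            for (int i = 0; i < header.Length; i++)
                if (header[i] != JpegSignature[i]) return false;
            return true;
        }
        finally
        {
            input.Position = start; // leave the stream where we found it
        }
    }
}
```

A "test.txt" renamed to "test.jpg" fails this check regardless of its extension, because its first bytes are plain text.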
Whatever the user renames the file extension to, that is the real file extension.
You should never depend on the file extension to tell you what's in the file, since it can be renamed.
See "how can we check file types before uploading them in asp.net?"
There's no way to get the 'real' file extension: the file extension that you get from the file name is the real one. If the file content is your concern, you can retrieve the content type using the .ContentType property and verify that it is a content type you are expecting, e.g. image/jpeg.
I'm storing some files in my database, and since I'm storing them in binary format without keeping any other information, I have to make sure that all of them are in the same format so that I'll be able to "serve" them later. (If there's a simple way to infer the file type from a byte array, please tell me, but that's not the focus here.)
So what I need to do is validate every uploaded file to make sure it's in the required format.
I've set up a FieldTemplate with a FileUpload control and a CustomValidator:
<asp:FileUpload ID="FileUpload" runat="server" />
<asp:CustomValidator
ID="CustomValidator1"
runat="server"
ErrorMessage="PDF only."
ControlToValidate="FileUpload"
OnServerValidate="CustomValidator1_ServerValidate">
</asp:CustomValidator>
What I'm missing is the code to place in that CustomValidator1_ServerValidate method that checks the uploaded file to make sure it's in the right format (PDF in this case).
Thanks in advance.
Use the FileUpload.PostedFile.ContentType property to validate the MIME type (it should be application/pdf). For security reasons, also validate that the file extension is appropriate (.pdf). You could keep a static hashtable containing mappings from MIME type to file extension(s) and use it as a lookup to validate an extension.
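A sketch of the static lookup this answer describes, using a generic Dictionary in place of a Hashtable. The whitelist contents and the names are hypothetical. Both inputs come from the client, so this only catches inconsistent uploads, not deliberately forged ones.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UploadTypeCheck
{
    // Hypothetical whitelist: MIME type -> allowed extensions (case-insensitive keys).
    private static readonly Dictionary<string, string[]> AllowedExtensions =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["application/pdf"] = new[] { ".pdf" },
            ["image/jpeg"] = new[] { ".jpg", ".jpeg" },
            ["image/png"] = new[] { ".png" },
        };

    // True if the reported MIME type is whitelisted and the file name's
    // extension is one of the extensions registered for it.
    public static bool IsConsistent(string contentType, string fileName)
    {
        if (contentType == null || fileName == null) return false;
        if (!AllowedExtensions.TryGetValue(contentType, out var extensions)) return false;

        string ext = Path.GetExtension(fileName);
        return Array.Exists(extensions, e => e.Equals(ext, StringComparison.OrdinalIgnoreCase));
    }
}
```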
Like ary said, this can all be spoofed. Take a .txt file, rename it to a .pdf file and check the content type: it will be "application/pdf".
However, there is one solution that I have used before. During my brief test with PDF files, I found that the first 3 bytes were always the same. I checked only the first 3 bytes because that seemed enough. The values of the first three bytes are 37, 80, 68 (decimal), i.e. the ASCII characters "%PD", the start of the standard "%PDF" file header.
So I read the bytes (InputFile1.FileContent.ReadByte()), compared them to the 3 bytes above and if they were the same, then I had a PDF file. Also I read somewhere that you should turn off the script execution for the upload directory in IIS. Hope it helps.
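A sketch that combines the checks from this thread into something CustomValidator1_ServerValidate could call: the extension, the client-reported MIME type (the two values mentioned in this thread, application/pdf and text/pdf), and the "%PDF" header. It compares four bytes rather than the three used above; the class name is hypothetical.

```csharp
using System;
using System.IO;

public static class PdfUploadValidator
{
    // ASCII "%PDF"; the answer above compared only the first three (37, 80, 68).
    private static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46 };

    // fileName and contentType come from the browser and can be forged;
    // only the header check looks at what was actually uploaded.
    public static bool IsPdf(string fileName, string contentType, Stream content)
    {
        bool extensionOk = string.Equals(Path.GetExtension(fileName), ".pdf", StringComparison.OrdinalIgnoreCase);
        bool mimeOk = string.Equals(contentType, "application/pdf", StringComparison.OrdinalIgnoreCase)
                   || string.Equals(contentType, "text/pdf", StringComparison.OrdinalIgnoreCase);
        return extensionOk && mimeOk && HasPdfHeader(content);
    }

    private static bool HasPdfHeader(Stream content)
    {
        long start = content.CanSeek ? content.Position : 0;
        var header = new byte[PdfMagic.Length];
        int read = 0, n;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < header.Length && (n = content.Read(header, read, header.Length - read)) > 0)
            read += n;

        if (content.CanSeek)
            content.Position = start; // rewind so the file can still be saved

        if (read < header.Length) return false;
        for (int i = 0; i < header.Length; i++)
            if (header[i] != PdfMagic[i]) return false;
        return true;
    }
}
```

Inside the handler the call might be args.IsValid = FileUpload.HasFile && PdfUploadValidator.IsPdf(FileUpload.FileName, FileUpload.PostedFile.ContentType, FileUpload.FileContent); where FileUpload is the control from the markup above. A matching header only shows that the file claims to be a PDF, not that the rest of it is harmless.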
The FileUpload.PostedFile.ContentType was exactly what I was looking for.
Just a heads-up to whoever is trying to do the same thing: it seems that the MIME type for PDF files can be "application/pdf" or "text/pdf", so be sure to check for both.
The user can spoof it. The solution above does not validate the actual byte content: I could send you an executable disguised as a PDF and this check would not catch it.