I want to use UploadedTextFile.FileContent to read the file's content and save it as a string, but it returns a set of encoding-like characters. This is the code:
string content = new StreamReader(UploadedTextFile.FileContent, true).ReadToEnd();
But the result is something like this:
yua%^##568sda_sdf89 ....
The file content is not in English.
Microsoft Word documents contain a lot of formatting data as well as the text. Try opening a .doc file in Notepad: what you see there is what you are going to get, regardless of the encoding.
If you still want to try to extract the text from the document, there are tools in C# that can help you. I recommend reading the answer in this link.
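For context, the true argument to StreamReader only tells it to look for a byte order mark (BOM) and otherwise fall back to UTF-8; it cannot turn a binary .doc into text. The Python sketch below mimics that detect-the-BOM-then-default logic (decode_with_bom is a hypothetical helper, not a .NET API). Fed the eight-byte header that legacy binary .doc files start with, it returns replacement characters and control codes rather than readable text, which is the kind of output shown in the question.

```python
import codecs

# Byte order marks, longest first: the UTF-32 LE BOM begins with the
# UTF-16 LE BOM, so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> tuple[str, str]:
    """Return (encoding, text), honouring a leading BOM if one is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):].decode(name)
    # No BOM: use the default; bytes that are invalid in it become U+FFFD
    return default, data.decode(default, errors="replace")


# First eight bytes of a legacy binary .doc (OLE2 compound file header)
doc_header = bytes.fromhex("d0cf11e0a1b11ae1")
print(decode_with_bom(doc_header))
```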
I am on AIX. I process some input files, generate a .html file from them using AWK, and send it by mail using /usr/sbin/sendmail.
Strangely, some parts of the mailed HTML contain a ! character.
For example, the .html file contains many lines, one of which includes the word "account", but when it is sent through mail, it is displayed as acco!unt. Please note that the .html file itself doesn't contain any ! characters; they only appear when the report is viewed in Outlook. Even if the .html file is replaced with an .xls (Excel) file, the ! is still present in the mail, but not in the actual file. Can you please tell me how to remove it? Thanks.
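This thread has no answer. One commonly reported cause of this symptom (an assumption here, not something confirmed above) is line length: SMTP (RFC 5321) allows at most 998 characters per line, sendmail breaks longer lines and inserts a ! at the break, and AWK-generated HTML often ends up as very long lines. A common workaround is to send the body with a transfer encoding that keeps lines short. The Python sketch below builds such a message with the standard email package; build_html_mail and the addresses are hypothetical, and piping msg.as_string() into /usr/sbin/sendmail -t is left out.

```python
from email.mime.text import MIMEText


def build_html_mail(html: str, subject: str, sender: str, to: str) -> MIMEText:
    """Wrap an HTML report in a MIME message whose body lines stay short."""
    # With an explicit utf-8 charset, MIMEText base64-encodes the body,
    # which produces 76-character lines, far below the SMTP limit.
    msg = MIMEText(html, "html", "utf-8")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    return msg


# One very long line, as AWK-generated HTML often is (hypothetical content)
report = "<html><body>" + "<td>account</td>" * 500 + "</body></html>"
msg = build_html_mail(report, "Daily report", "reports@example.com", "me@example.com")
longest = max(len(line) for line in msg.as_string().splitlines())
print(longest)
```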
In the past, if I wanted a web page to display as a .DOC Word document, I could do so by doing this in the page load:
Response.AddHeader("content-disposition", "attachment;filename=FullDetail.doc")
Response.ContentType = "application/vnd.word"
I was hoping to output the web page as a .DOCX by doing:
Response.AddHeader("content-disposition", "attachment;filename=FullDetail.docx")
Response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
but it doesn't work. I get an error:
The file FullDetail.docx cannot be opened because there are problems with the contents. The file is corrupt and cannot be opened.
The contents of both files look pretty much identical - just an HTML page.
HR Full Detail Report
etc...
The .doc opens fine. The .docx doesn't. If I rename the .docx to .doc, it opens fine in Word 2010. Any suggestions?
Thanks!
Brad
A .docx file is actually a zip file that contains several other files. For example, create a new MS Word document, put the text "Hello world" in it and save it (example.docx). Then rename the .docx file to "example.zip" and open it. You will see that the content is much more complicated than you might have expected.
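The rename experiment can also be done in code. This Python sketch checks whether some bytes are really a .docx package. looks_like_docx is a hypothetical helper, and treating [Content_Types].xml plus word/document.xml as the telltale parts is a heuristic: they are what Word normally writes, though the packaging rules let the main part have another name. An HTML page saved under a .docx name fails at the first step, because it isn't a zip archive at all.

```python
import io
import zipfile

# Parts Word normally writes into a .docx (heuristic: only the content-types
# part is mandatory by name; the main part is located via _rels/.rels)
EXPECTED_PARTS = {"[Content_Types].xml", "word/document.xml"}


def looks_like_docx(data: bytes) -> bool:
    """True if the bytes are a zip archive holding the usual .docx parts."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False  # e.g. an HTML page that was merely named .docx
    with zipfile.ZipFile(buf) as zf:
        return EXPECTED_PARTS.issubset(zf.namelist())


html_page = b"<html><head><title>HR Full Detail Report</title></head></html>"
print(looks_like_docx(html_page))
```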
Most people find that it is much easier to generate a Word XML file (https://msdn.microsoft.com/en-us/library/bb266220(v=office.12).aspx) or use an API for generating a real docx file (for instance: http://docx.codeplex.com/).
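To give an idea of what the smallest "real" .docx involves, here is a Python sketch that writes one with nothing but zipfile. minimal_docx is a hypothetical helper; the package holds only the three parts a bare-bones document needs (content types, package relationships, and the main document part) with a single paragraph and no styles, so for real reports an API like the one linked above is the practical route.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

PACKAGE_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>
</w:document>"""


def minimal_docx(text: str) -> bytes:
    """Build a one-paragraph .docx package in memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        # escape() turns &, < and > into entities so the text stays valid XML
        zf.writestr("word/document.xml", DOCUMENT.format(text=escape(text)))
    return buf.getvalue()


data = minimal_docx("Hello world")
```

In the ASP.NET page, bytes like these would be sent with Response.BinaryWrite together with the .docx content type from the question.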
How do I add an HTML entity to my CSV?
I have an ASP.NET / SQL Server application that generates HTML, Excel, and CSV files. Some of the data needs to have the ‡ entity in it. How do I get it to output to my CSV correctly? If I have it like this: ‡, then it gets screwed up, but if I output it with the entity code, the CSV outputs that text.
Non-printable characters in a field are sometimes escaped using one of several C-style character escape sequences: \### and \o### (octal), \x## (hex), \d### (decimal), and \u#### (Unicode).
So you can escape your non-ASCII character C-style, as long as the program that reads the CSV understands those escapes.
I'm not sure what you mean by "it gets screwed up".
Regardless, it is up to the receiving program or application to properly interpret the characters.
What this means is that if you put an HTML entity for ‡ (such as &#8225;) in your CSV file, then the application that opens the CSV will have to look for those entities and understand what to do with them. For example, the opening application would have to run an HTML entity decoder in order to display it properly.
If you are looking at the CSV file with Notepad (for example), then of course it won't decode the entities, because Notepad has no idea what HTML entities are or what to do when it finds them.
Even Internet Explorer wouldn't convert the entities for display when opening a CSV file. If you gave the file a .html extension, however, IE would display it with its HTML rendering engine.
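If the aim is for the ‡ itself to show up, another option (not taken from the answers above) is to decode any entities on the server and write the actual character, saving the CSV as UTF-8 with a byte order mark; Excel, for one, uses that BOM to recognise a CSV as UTF-8 instead of reading it in the system code page. A minimal Python sketch, with a hypothetical rows_to_csv_bytes helper:

```python
import csv
import html
import io


def rows_to_csv_bytes(rows: list[list[str]]) -> bytes:
    """Serialise rows to CSV, decoding HTML entities such as &#8225; first."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in rows:
        # html.unescape turns "&#8225;" or "&Dagger;" into the real character
        writer.writerow([html.unescape(field) for field in row])
    # "utf-8-sig" prepends the UTF-8 byte order mark (EF BB BF)
    return out.getvalue().encode("utf-8-sig")


data = rows_to_csv_bytes([["Code", "Note"], ["A1", "Reviewed &#8225;"]])
```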
At the moment i get file extension of the file like :
string fileExt = System.IO.Path.GetExtension(filUpload.FileName);
But if the user changes the file extension (for example, renames "test.txt" to "test.jpg"), I can't get the real extension. What's the solution?
You seem to be asking if you can identify file-type from its content.
Most solutions will attempt this, but there are too many possible file types for every one of them to be identified reliably.
Most approaches use the first several bytes of the file to determine what they are.
Here is one list, and here is another.
If you are only worried about text vs binary, see this SO question and answers.
See this SO answer for checking if a file is a JPG - this approach can be extended to use other file headers as in the first two links in this answer.
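To illustrate the header-checking approach, here is a Python sketch with a handful of well-known signatures. sniff_type and its short labels are hypothetical, and the table is nowhere near complete (see the lists linked above). In the ASP.NET page, the first few bytes would come from the uploaded file's stream rather than from its name.

```python
from typing import Optional

# A few well-known file signatures ("magic numbers"); far from exhaustive.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # .docx and .xlsx are zip packages too
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc / .xls
]


def sniff_type(header: bytes) -> Optional[str]:
    """Guess the file type from its first bytes; None if nothing matches."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


# A text file renamed to "test.jpg" still starts with plain text
print(sniff_type(b"just some text"))
```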
Whatever the user renames the file extension to, that is the real file extension.
You should never depend on the file extension to tell you what's in the file, since it can be renamed.
See "how can we check file types before uploading them in asp.net?"
There's no way to get the 'real' file extension: the file extension that you get from the file name is the real one. If the file content is your concern, you can retrieve the content type using the .ContentType property and verify that it is a content type you are expecting, e.g. image/jpeg. Keep in mind that this value is reported by the browser, so it cannot be fully trusted either.
I am working on a website at the moment which is displaying a strange bug with generated Word documents. The site has a feature that allows the user to download a Word document containing information related to their visit. The file is generated by some VB.NET code, which takes an XML template of the final document and inserts the relevant content.
The strange behaviour is that on some machines the generated .doc file displays fine, while on others it displays as XML when opened in Word. Both behaviours have been seen with the same version of Office (2003), but on separate machines. My question is really whether the error lies with the setup of Word on the individual machines, or whether there is an error in the code.
The code to create the file and download it is as follows:
Response.Clear()
Response.ClearHeaders()
Response.AddHeader("content-disposition", "inline; filename=MyNewFile")
Response.ContentType = "application/msword"
' Create the Word file as a byte array based on an XML template document
Dim objWordGenerator As New WordFileGenerator
Response.BinaryWrite(objWordGenerator.GetWordBytes)
Response.Flush()
Response.Clear()
Response.End()
The actual xml template is quite large so probably not suitable to post here but I can provide any more information if necessary.
Update:
Having managed to fix the original bug (it turns out that the filename being used didn't have the .doc extension), I have found another bit of strange behaviour.
The file now opens in Word correctly; however, when you go to save it, the default file type is XML. When saved as an XML file it opens in Word correctly, but I feel this is slightly confusing behaviour for the end user. I would like the file to default to saving as a DOC file instead. Is there a way to force this?
Update 2:
Below is the section of the XML that relates to the document properties. The rest of the document deals with content, styles, etc., so my assumption is that this is the most relevant section. To reiterate, my problem is that when the downloaded .doc file is opened in Word, the default "Save As" option is an XML file.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
<o:DocumentProperties>
<o:Title>Fancy Word Doc</o:Title>
<o:Author>Bob Bobertson</o:Author>
<o:Characters>999</o:Characters>
<o:Company>A Fancy Company</o:Company>
<o:Version>1.1.1</o:Version>
</o:DocumentProperties>
Cheers
The File -> Save As file type is XML because that is what the file open in Word is. If you want it to say 'Word Document (*.doc)', then you will need to create a real Word document on the server rather than an XML one. Putting a .doc extension on the filename doesn't change its real contents. Word knows the type of the file that is loaded into it and suggests that as the file type when saving. I don't know of any way to override this behavior.
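As the answer says, Word goes by what is in the file. The template in the question is WordprocessingML 2003 XML, recognisable by its <?mso-application progid="Word.Document"?> processing instruction, whereas a binary Word 97-2003 file starts with the OLE2 compound-file signature. A rough Python sketch of telling them apart (word_file_kind and the 1 KB look-ahead are hypothetical, arbitrary choices):

```python
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
UTF8_BOM = b"\xef\xbb\xbf"


def word_file_kind(data: bytes) -> str:
    """Roughly classify what a file named *.doc really contains."""
    if data.startswith(OLE2_MAGIC):
        return "binary Word 97-2003 document"
    if data.startswith(b"PK\x03\x04"):
        return "zip package (e.g. .docx)"
    # Skip an optional UTF-8 BOM, then look for Word's XML processing instruction
    head = data[:1024].removeprefix(UTF8_BOM).lstrip()
    if head.startswith(b"<?xml") and b'progid="Word.Document"' in head:
        return "WordprocessingML 2003 XML"
    return "unknown"


template = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    b'<?mso-application progid="Word.Document"?>\n<w:wordDocument/>'
)
print(word_file_kind(template))
```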
I've been using Office XML with Excel for a while now, and this is very similar to the code that I use to send it down to the client. You might want to try it and see if it works for you.
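' Requires Imports System.Xml at the top of the code-behind file
Dim xml As XmlDocument = New XmlDocument()
' Resolve the template path relative to the web application, not the process
xml.Load(Server.MapPath("report.doc"))
Response.Clear()
Response.ContentType = "application/vnd.ms-word"
Response.AppendHeader("Content-Disposition", "attachment; filename=report.doc")
Response.Write(xml.OuterXml)
Response.End()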
Try it with Firefox and you will probably find that the file is saved with the correct extension.
IIRC, since version 3 IE has preferred to ignore the MIME type and sniff the file content to determine the "correct" file format. Maybe it uses the magic cookie?
Is this Word 2007 or later? Try:
Response.AddHeader("content-disposition", "attachment; filename=""MyNewFile.doc""")
Using attachment (rather than inline) encourages the browser to save the file instead of displaying it.
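The quoting matters because the filename parameter takes either a bare token or a double-quoted string (RFC 6266); single quotes are not quote characters there, so a browser may keep them as part of the saved name. A small Python sketch of building the header value, with a hypothetical content_disposition helper that also adds the RFC 5987 filename* form when the name is not plain ASCII:

```python
from urllib.parse import quote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value with a properly quoted filename."""
    # ASCII fallback for older clients: replace anything that is non-ASCII
    # or would break the quoted-string (double quote, backslash) with "_"
    fallback = "".join(
        c if 32 <= ord(c) < 127 and c not in '"\\' else "_" for c in filename
    )
    value = f'{disposition}; filename="{fallback}"'
    if fallback != filename:
        # RFC 5987 extended form: charset, empty language tag, percent-encoded UTF-8
        value += "; filename*=UTF-8''" + quote(filename, safe="")
    return value


print(content_disposition("MyNewFile.doc"))
```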
I ran some tests and could not reproduce your problem on my system in Word 2003. Without a specific example (and actual file that is misbehaving), it would be pure speculation to make any suggestions.