How do I convert a .docx to html using asp.net?

Word 2007 saves its documents in .docx format, which is really a zip file with a bunch of parts in it, including an XML file that holds the document itself.
I want to be able to drop a .docx file into a folder in my asp.net web app and have the code open the file and render the (XML part of the) document as a web page.
I've been searching the web for more information on this but so far haven't found much. My questions are:
Would you (a) use XSLT to transform the XML to HTML, or (b) use xml manipulation libraries in .net (such as XDocument and XElement in 3.5) to convert to HTML or (c) other?
Do you know of any open source libraries/projects that have done this that I could use as a starting point?
Thanks!

Try this post? I don't know but might be what you are looking for.

I wrote mammoth.js, which is a JavaScript library that converts docx files to HTML. If you want to do the rendering server-side in .NET, there is also a .NET version of Mammoth available on NuGet.
Mammoth tries to produce clean HTML by looking at semantic information -- for instance, mapping paragraph styles in Word (such as Heading 1) to appropriate tags and style in HTML/CSS (such as <h1>). If you want something that produces an exact visual copy, then Mammoth probably isn't for you. If you have something that's already well-structured and want to convert that to tidy HTML, Mammoth might do the trick.
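To make the style-to-tag idea concrete, here is a minimal sketch that does not use Mammoth at all. It opens the .docx as a zip archive, reads word/document.xml with XDocument, and maps paragraph style IDs to HTML tags. The class name DocxSketch and the two-entry style table are made up for illustration, and the IDs "Heading1" and "Heading2" are an assumption about the document's template. Runs, tables, lists and images are all ignored. On .NET Framework, ZipFile needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

public static class DocxSketch
{
    // Main WordprocessingML namespace used inside word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical mapping from paragraph style IDs to HTML tags.
    private static readonly Dictionary<string, string> StyleToTag =
        new Dictionary<string, string>
        {
            { "Heading1", "h1" },
            { "Heading2", "h2" },
        };

    // Opens the .docx as a zip archive and converts its main document part.
    public static string ConvertFile(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found.");
            }
            using (Stream stream = entry.Open())
            {
                return ConvertDocumentXml(XDocument.Load(stream));
            }
        }
    }

    // Emits one HTML element per w:p paragraph, using the style table above.
    public static string ConvertDocumentXml(XDocument document)
    {
        var html = new StringBuilder();
        foreach (XElement paragraph in document.Descendants(W + "p"))
        {
            // The style ID lives at w:pPr/w:pStyle/@w:val; it may be absent.
            string styleId = (string)paragraph
                .Element(W + "pPr")
                ?.Element(W + "pStyle")
                ?.Attribute(W + "val");

            string tag;
            if (styleId == null || !StyleToTag.TryGetValue(styleId, out tag))
            {
                tag = "p";
            }

            // Concatenate the text of every w:t run inside the paragraph.
            string text = string.Concat(
                paragraph.Descendants(W + "t").Select(t => t.Value));

            html.Append('<').Append(tag).Append('>')
                .Append(WebUtility.HtmlEncode(text))
                .Append("</").Append(tag).Append(">\n");
        }
        return html.ToString();
    }
}
```

Calling DocxSketch.ConvertFile(path) from a page or handler returns the HTML fragment as a string. For anything beyond plain headings and paragraphs, a library such as Mammoth is likely to be less work than growing this by hand.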

Word 2007 has an API that you can use to convert to HTML. Here's a post that talks about it http://msdn.microsoft.com/en-us/magazine/cc163526.aspx. You can find documentation around the API, but I remember that there is a convert to HTML function in the API.

This code helps to convert a .docx file to plain text:
function read_file_docx($filename) {
    // Bail out early if no usable file was given.
    if (!$filename || !file_exists($filename)) {
        return false;
    }
    // Note: the zip_* functions are deprecated as of PHP 8.0; ZipArchive replaces them.
    $zip = zip_open($filename);
    if (!$zip || is_numeric($zip)) {
        return false;
    }
    $content = '';
    while ($zip_entry = zip_read($zip)) {
        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;
        // Only the main document part is of interest.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;
        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
        zip_entry_close($zip_entry);
    }
    zip_close($zip);
    // Turn table cell boundaries into spaces and paragraph ends into line breaks.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $striped_content = strip_tags($content);
    // Keeps only basic ASCII letters, digits and punctuation; accented characters are dropped.
    $striped_content = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t#\/\_\(\)]/", "", $striped_content);
    echo nl2br($striped_content);
}

I'm using Interop. It is somewhat problematic but works fine in most cases.
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Word;
This one returns the list of paths of the HTML-converted documents:
// sourceFilePath, destinationFilePath and IsError are fields of the containing class (not shown).
public List<string> GetHelpDocuments()
{
    List<string> lstHtmlDocuments = new List<string>();
    string[] validExtensions = { ".doc", ".docx" };
    // The folder path is empty in this listing; pass the directory to scan here
    // (Directory.GetFiles throws an ArgumentException for an empty path).
    foreach (string _sourceFilePath in Directory.GetFiles(""))
    {
        string extension = System.IO.Path.GetExtension(_sourceFilePath);
        if (validExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
        {
            sourceFilePath = _sourceFilePath;
            destinationFilePath = System.IO.Path.ChangeExtension(_sourceFilePath, ".html");
            if (System.IO.File.Exists(sourceFilePath))
            {
                // If the HTML version already exists, reconvert only when the source is newer.
                if (System.IO.File.Exists(destinationFilePath))
                {
                    if (System.IO.File.GetLastWriteTimeUtc(destinationFilePath) < System.IO.File.GetLastWriteTimeUtc(sourceFilePath))
                    {
                        System.IO.File.Delete(destinationFilePath);
                        ConvertToHtml();
                    }
                }
                else
                {
                    ConvertToHtml();
                }
                lstHtmlDocuments.Add(destinationFilePath);
            }
        }
    }
    return lstHtmlDocuments;
}
And this one converts a document to HTML:
private void ConvertToHtml()
{
    IsError = false;
    if (!System.IO.File.Exists(sourceFilePath))
    {
        return;
    }
    Microsoft.Office.Interop.Word.Application docApp = null;
    try
    {
        docApp = new Microsoft.Office.Interop.Word.Application();
        docApp.Visible = true;
        docApp.DisplayAlerts = WdAlertLevel.wdAlertsNone;
        object fileFormat = WdSaveFormat.wdFormatHTML;
        var doc = docApp.Documents.Open(sourceFilePath);
        doc.SaveAs2(destinationFilePath, fileFormat);
        doc.Close(SaveChanges: false);
    }
    catch
    {
        IsError = true;
    }
    finally
    {
        try
        {
            if (docApp != null)
            {
                docApp.Quit(SaveChanges: false);
            }
        }
        catch { }
        finally
        {
            // Last resort: this kills every WINWORD process on the machine, not just ours.
            foreach (Process p in Process.GetProcessesByName("WINWORD"))
            {
                p.Kill();
            }
        }
        // ReleaseComObject throws on null, so guard against a failed startup.
        if (docApp != null)
        {
            Marshal.ReleaseComObject(docApp);
            docApp = null;
        }
        GC.Collect();
    }
}
Killing Word is not fun, but we can't leave it hanging there and blocking others, right?
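One way to make that kill a bit less drastic is to remember which WINWORD processes existed before the conversion and only kill the ones that appeared afterwards. This is just a sketch under that assumption. WordProcessTracker and its methods are hypothetical names, and it can still hit the wrong process if two conversions (or a user opening Word) overlap in time.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;

public static class WordProcessTracker
{
    // Returns the IDs of all currently running processes with the given name.
    public static HashSet<int> Snapshot(string processName = "WINWORD")
    {
        var ids = new HashSet<int>();
        foreach (Process p in Process.GetProcessesByName(processName))
        {
            using (p)
            {
                ids.Add(p.Id);
            }
        }
        return ids;
    }

    // Pure helper: IDs present now that were not present in the earlier snapshot.
    public static IEnumerable<int> StartedSince(ISet<int> before, IEnumerable<int> now)
    {
        return now.Where(id => !before.Contains(id));
    }

    // Kills only the processes that were started after the snapshot was taken.
    public static void KillStartedSince(ISet<int> before, string processName = "WINWORD")
    {
        foreach (int id in StartedSince(before, Snapshot(processName)).ToList())
        {
            try
            {
                using (Process p = Process.GetProcessById(id))
                {
                    p.Kill();
                }
            }
            catch (ArgumentException) { /* process already exited */ }
            catch (InvalidOperationException) { /* exited between lookup and Kill */ }
            catch (Win32Exception) { /* could not be terminated */ }
        }
    }
}
```

The idea would be to call Snapshot() just before creating the Word application object and KillStartedSince(snapshot) in the cleanup path instead of killing everything named WINWORD.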
On the web page I render the HTML in an iframe.
There is a dropdown that contains the list of help documents. The value is the path to the HTML version and the text is the name of the document.
private void BindHelpContents()
{
    HelpDocuments hDoc = new HelpDocuments(Server.MapPath("~/HelpDocx/docx/"));
    List<string> lstHelpDocuments = hDoc.GetHelpDocuments();
    int index = 1;
    ddlHelpDocuments.Items.Insert(0, new ListItem { Value = "0", Text = "---Select Document---", Selected = true });
    foreach (string strHelpDocument in lstHelpDocuments)
    {
        // Value: full path of the HTML file; Text: file name without the .html extension.
        ddlHelpDocuments.Items.Insert(index, new ListItem { Value = strHelpDocument, Text = System.IO.Path.GetFileNameWithoutExtension(strHelpDocument) });
        index++;
    }
    FetchDocuments();
}
When the selected index changes, the document is rendered in the frame:
protected void RenderHelpContents(object sender, EventArgs e)
{
    try
    {
        if (ddlHelpDocuments.SelectedValue == "0") return;
        string strHtml = ddlHelpDocuments.SelectedValue;
        // Turn the physical path back into an app-relative virtual path.
        string newaspxpage = strHtml.Replace(Server.MapPath("~/"), "~/");
        string pageVirtualPath = VirtualPathUtility.ToAbsolute(newaspxpage);
        documentholder.Attributes["src"] = pageVirtualPath;
    }
    catch
    {
        lblGError.Text = "Selected document doesn't exist, please refresh the page and try again. If that doesn't help, please contact Support";
    }
}

Related

Web Api Help Page XML comments from more than 1 files

I have different plugins in my Web API project, each with its own XML docs, and one centralized Help page. The problem is that Web API's default Help Page supports only a single documentation file:
new XmlDocumentationProvider(HttpContext.Current.Server.MapPath("~/App_Data/Documentation.xml"))
How can I load the documentation from several files? I want to do something like this:
new XmlDocumentationProvider("PluginsFolder/*.xml")

You can modify the installed XmlDocumentationProvider at Areas\HelpPage to do something like the following:
Merge multiple XML documentation files into a single one.
Example code (missing some error checks and validation):
using System.IO;
using System.Xml.Linq;
using System.Xml.XPath;
XDocument finalDoc = null;
foreach (string file in Directory.GetFiles(@"PluginsFolder", "*.xml"))
{
    if (finalDoc == null)
    {
        finalDoc = XDocument.Load(File.OpenRead(file));
    }
    else
    {
        // Append the <member> elements of each additional file to the first document.
        XDocument xdocAdditional = XDocument.Load(File.OpenRead(file));
        finalDoc.Root.XPathSelectElement("/doc/members")
            .Add(xdocAdditional.Root.XPathSelectElement("/doc/members").Elements());
    }
}
// Supply the navigator that the rest of the XmlDocumentationProvider code looks for
_documentNavigator = finalDoc.CreateNavigator();

Kiran's solution works very well. I ended up using his approach, but by creating a copy of XmlDocumentationProvider, called MultiXmlDocumentationProvider, with an altered constructor:
public MultiXmlDocumentationProvider(string xmlDocFilesPath)
{
    XDocument finalDoc = null;
    foreach (string file in Directory.GetFiles(xmlDocFilesPath, "*.xml"))
    {
        using (var fileStream = File.OpenRead(file))
        {
            if (finalDoc == null)
            {
                finalDoc = XDocument.Load(fileStream);
            }
            else
            {
                XDocument xdocAdditional = XDocument.Load(fileStream);
                finalDoc.Root.XPathSelectElement("/doc/members")
                    .Add(xdocAdditional.Root.XPathSelectElement("/doc/members").Elements());
            }
        }
    }
    // Supply the navigator that the rest of the XmlDocumentationProvider code looks for
    _documentNavigator = finalDoc.CreateNavigator();
}
I register the new provider from HelpPageConfig.cs:
config.SetDocumentationProvider(new MultiXmlDocumentationProvider(HttpContext.Current.Server.MapPath("~/App_Data/")));
Creating a new class and leaving the original one unchanged may be more convenient when upgrading, etc.

Rather than create a separate class along the lines of MultiXmlDocumentationProvider, I just added a constructor to the existing XmlDocumentationProvider. Instead of taking a folder name, this takes a list of strings, so you can still specify exactly which files you want to include (if there are other XML files in the directory that the documentation XML files are in, it might get hairy). Here's my new constructor:
public XmlDocumentationProvider(IEnumerable<string> documentPaths)
{
    // Reject a null or empty list (Any() needs System.Linq).
    if (documentPaths == null || !documentPaths.Any())
    {
        throw new ArgumentNullException(nameof(documentPaths));
    }
    XDocument fullDocument = null;
    foreach (var documentPath in documentPaths)
    {
        if (documentPath == null)
        {
            throw new ArgumentNullException(nameof(documentPath));
        }
        using (var stream = File.OpenRead(documentPath))
        {
            var document = XDocument.Load(stream);
            if (fullDocument == null)
            {
                // The first file becomes the base document.
                fullDocument = document;
            }
            else
            {
                // Later files contribute their <member> elements to the base document.
                fullDocument.Root?.XPathSelectElement("/doc/members")
                    .Add(document.Root?.XPathSelectElement("/doc/members").Elements());
            }
        }
    }
    _documentNavigator = fullDocument?.CreateNavigator();
}
The HelpPageConfig.cs looks like this. (Yes, it can be fewer lines, but I don't have a line limit so I like splitting it up.)
var xmlPaths = new[]
{
    HttpContext.Current.Server.MapPath("~/bin/Path.To.FirstNamespace.XML"),
    HttpContext.Current.Server.MapPath("~/bin/Path.To.OtherNamespace.XML")
};
var documentationProvider = new XmlDocumentationProvider(xmlPaths);
config.SetDocumentationProvider(documentationProvider);

I agree with gurra777 that creating a new class is a safer upgrade path. I started with that solution but it involves a fair amount of copy/pasta, which could easily get out of date after a few package updates.
Instead, I am keeping a collection of XmlDocumentationProvider children. For each of the implementation methods, I'm calling into the children to grab the first non-empty result.
public class MultiXmlDocumentationProvider : IDocumentationProvider, IModelDocumentationProvider
{
    private IList<XmlDocumentationProvider> _documentationProviders;
    public MultiXmlDocumentationProvider(string xmlDocFilesPath)
    {
        _documentationProviders = new List<XmlDocumentationProvider>();
        foreach (string file in Directory.GetFiles(xmlDocFilesPath, "*.xml"))
        {
            _documentationProviders.Add(new XmlDocumentationProvider(file));
        }
    }
    public string GetDocumentation(System.Reflection.MemberInfo member)
    {
        // Ask each child provider in turn and take the first non-empty answer.
        return _documentationProviders
            .Select(x => x.GetDocumentation(member))
            .FirstOrDefault(x => !string.IsNullOrWhiteSpace(x));
    }
    //and so on...
The HelpPageConfig registration is the same as in gurra777's answer:
config.SetDocumentationProvider(new MultiXmlDocumentationProvider(HttpContext.Current.Server.MapPath("~/App_Data/")));

Creating dynamic multi language website

I'm planning to implement a multi-language website. My first idea was to use resx files, but I have a requirement that every text must be editable from the administration area.
Can I build such a feature with resx files, should I store the texts in a (schemaless) database, or is there a better way to do this?

You can use XML or SQL tables.
Prepare a page for the administrator that lists all the words to translate.
Based on the language the administrator is logged in with, update the translations of the words in your table or XML file.
Additionally, for best performance, load each language's words into the system cache.
Write some code like this in your aspx so that new words are entered into the table or XML file:
<%=PLang.GetString("YourWordInEnglish")%>
The GetString method:
public static string GetString(string word)
{
    try
    {
        if (String.IsNullOrWhiteSpace(word)) return "";
        Dictionary<string, string> resourcesDictionary = GetResource(GetLanguageID());
        if (resourcesDictionary == null)
            return word;
        // Keys are looked up in lower case, so use the same key everywhere.
        string key = word.ToLower();
        if (!resourcesDictionary.ContainsKey(key))
        {
            // Unknown word: register it so the administrator can translate it later.
            Expression exp = new Expression();
            exp.Word = exp.Translation = word;
            exp.LanguageID = GetLanguageID();
            exp.SiteID = Globals.GetSiteID();
            if (exp.SiteID == 0 && exp.LanguageID == 0)
                return word;
            string created;
            if (FLClass.createExpression(exp, ref resourcesDictionary) > 0
                && resourcesDictionary.TryGetValue(key, out created))
                return created;
            return word;
        }
        return resourcesDictionary[key];
    }
    catch
    {
        // Fall back to the untranslated word on any failure.
        return word;
    }
}
The web service method used for editing:
public class ViewExpressionListEdit : BaseWebService
{
    [WebMethod(EnableSession = true)]
    public bool updateExpression(ExpressionService expressionService)
    {
        Expression expression = new Expression();
        expression.ExpressionID = expressionService.ExpressionID;
        expression.Translation = expressionService.Translation;
        expression.LanguageID = expressionService.LanguageID;
        expression.SiteID = Globals.GetSiteID();
        return FLClass.updateExpression(expression);
    }
}

You can use XML files for translations, parse them on application startup and store translations in cache. You can use the FileSystemWatcher class to see when someone updates the files and then invalidate the cache.
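A rough sketch of that idea follows, with several assumptions: one file per language named like fr.xml, a made-up layout of the form <translations><text key="hello">Bonjour</text></translations>, and a hypothetical class name TranslationCache. Files are loaded lazily on first use rather than at startup. FileSystemWatcher can raise several events for one save, or fire while the file is still being written, so real code would probably want retries or a short delay.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public sealed class TranslationCache : IDisposable
{
    private readonly string _folder;
    private readonly FileSystemWatcher _watcher;

    // One dictionary of key -> translation per language code.
    private readonly ConcurrentDictionary<string, Dictionary<string, string>> _cache =
        new ConcurrentDictionary<string, Dictionary<string, string>>(StringComparer.OrdinalIgnoreCase);

    public TranslationCache(string folder)
    {
        _folder = folder;
        _watcher = new FileSystemWatcher(folder, "*.xml");
        // Any change to a language file drops that language from the cache.
        _watcher.Changed += (s, e) => Invalidate(e.FullPath);
        _watcher.Created += (s, e) => Invalidate(e.FullPath);
        _watcher.Deleted += (s, e) => Invalidate(e.FullPath);
        _watcher.Renamed += (s, e) => { Invalidate(e.OldFullPath); Invalidate(e.FullPath); };
        _watcher.EnableRaisingEvents = true;
    }

    // Returns the translation, or the key itself when nothing is found.
    // The language code should be validated by the caller, since it becomes part of a file name.
    public string Get(string language, string key)
    {
        Dictionary<string, string> table = _cache.GetOrAdd(language, Load);
        string value;
        return table.TryGetValue(key, out value) ? value : key;
    }

    // Removes the cached entries for the language that a file path belongs to.
    public void Invalidate(string path)
    {
        Dictionary<string, string> removed;
        _cache.TryRemove(Path.GetFileNameWithoutExtension(path), out removed);
    }

    private Dictionary<string, string> Load(string language)
    {
        string path = Path.Combine(_folder, language + ".xml");
        if (!File.Exists(path))
        {
            return new Dictionary<string, string>();
        }
        // Throws on duplicate or missing key attributes; acceptable for a sketch.
        return XDocument.Load(path).Root.Elements("text")
            .ToDictionary(e => (string)e.Attribute("key"), e => e.Value,
                          StringComparer.OrdinalIgnoreCase);
    }

    public void Dispose()
    {
        _watcher.Dispose();
    }
}
```

A single instance would typically be created at application start (for example in Application_Start) and shared, so that the admin page only has to rewrite the XML file for the change to show up on the next request.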

XmlReader.ReadToFollowing has state EndOfFile why?

I've produced this code to read an XML file from a string, but it has problems. Notably, the ReadToFollowing() method finds nothing. It seems to scan the whole XML string and then leave the XmlReader in the EndOfFile state. I'm very puzzled by this; ReadStartElement() works, and the next element is read as "heading", as you'd expect.
Here's my code. My idea is to read through the XML, pulling out the fields I require:
List<string> contentfields = new List<string>() { "heading", "shortblurb", "description" };
string xml = @"<filemeta filetype='Audio'><heading>Fatigue &amp; Tiredness</heading><shortblurb>shortblurb</shortblurb><description /><Comments /><AlbumTitle /><TrackNumber /><ArtistName /><Year /><Genre /><TrackTitle /></filemeta>";
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    reader.ReadStartElement("filemeta");
    foreach (String field_str in contentfields)
    {
        reader.ReadToFollowing(field_str);
        if (reader.Name.ToString() == field_str)
        {
            Console.WriteLine(field_str + " " + reader.ReadElementContentAsString());
        }
    }
}
Console.ReadKey();

That's because reader.ReadStartElement("filemeta"); consumes the filemeta start tag and leaves the reader positioned on the next node, which is the heading element.
ReadToFollowing always advances with a read before it starts matching, so it first moves past your heading tag and only then looks for an element named heading. Since that element is already behind it, ReadToFollowing finds nothing and reads to the end of the file.
If you want to avoid this, change your code like this:
List<string> contentfields = new List<string>() { "heading", "shortblurb", "description" };
string xml = @"<filemeta filetype='Audio'><heading>Fatigue &amp; Tiredness</heading><shortblurb>shortblurb</shortblurb><description /><Comments /><AlbumTitle /><TrackNumber /><ArtistName /><Year /><Genre /><TrackTitle /></filemeta>";
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    reader.ReadStartElement("filemeta");
    foreach (String field_str in contentfields)
    {
        // Only seek forward when the reader is not already on the wanted element.
        if (reader.Name.ToString() != field_str)
        {
            reader.ReadToFollowing(field_str);
        }
        //still keep this if because we could have reached the end of the xml document
        if (reader.Name == field_str)
        {
            Console.WriteLine(field_str + " " + reader.ReadElementContentAsString());
        }
    }
}
Console.ReadKey();
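The cursor behaviour described in this answer can be checked with a small console program. This is only an illustrative sketch: ReaderPositionDemo and Describe are made-up names and the XML is a shortened version of the one in the question. It reports which node the reader sits on after ReadStartElement, whether ReadToFollowing found its target, and the resulting ReadState.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderPositionDemo
{
    // Returns "<node after ReadStartElement>|<ReadToFollowing result>|<ReadState>".
    public static string Describe(string xml, string target)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Consumes <filemeta> and leaves the reader on the next node.
            reader.ReadStartElement("filemeta");
            string positionedOn = reader.Name;

            // Advances first, then searches for an element with the target name.
            bool found = reader.ReadToFollowing(target);

            return positionedOn + "|" + found + "|" + reader.ReadState;
        }
    }

    public static void Main()
    {
        const string xml =
            "<filemeta><heading>h</heading><shortblurb>s</shortblurb></filemeta>";

        // Already on <heading>, so seeking "heading" runs off the end of the document.
        Console.WriteLine(Describe(xml, "heading"));    // heading|False|EndOfFile

        // <shortblurb> is still ahead of the cursor, so it is found normally.
        Console.WriteLine(Describe(xml, "shortblurb")); // heading|True|Interactive
    }
}
```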

Display an ashx image using jQuery?

I've been trying to use the jQuery plugin Colorbox to display images I have in my DB through an ashx file. Unfortunately it just spits a bunch of gibberish at the top of the page and no image. Can this be done? Here is what I have so far:
$(document).ready(function () {
    $("a[rel='cbImg']").colorbox();
});
...
<a rel="cbImg" href="HuntImage.ashx?id=15">Click to see image</a>
UPDATE:
My ashx file is writing the binary out:
context.Response.ContentType = "image/bmp";
context.Response.BinaryWrite(ba);

Colorbox has an option called 'photo'. If you set it to true in the options you pass to colorbox(), it forces Colorbox to treat the link as a photo, even when the URL doesn't look like an image file.
$(target).colorbox({photo: true});

You should be setting the src attribute on the client side.
<img src="HuntImage.ashx?id=15" ..../>
The handler (uses System.Drawing, System.Drawing.Imaging, System.IO, System.Web and System.Web.SessionState):
public class ImageRequestHandler : IHttpHandler, IRequiresSessionState
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.Clear();
        if (context.Request.QueryString.Count != 0)
        {
            //Get the stored image and write it to the response.
            var storedImage = context.Session[_Default.STORED_IMAGE] as byte[];
            if (storedImage != null)
            {
                Image image = GetImage(storedImage);
                if (image != null)
                {
                    context.Response.ContentType = "image/jpeg";
                    image.Save(context.Response.OutputStream, ImageFormat.Jpeg);
                }
            }
        }
    }
    private Image GetImage(byte[] storedImage)
    {
        // The stream has to stay open for as long as the Image is in use.
        var stream = new MemoryStream(storedImage);
        return Image.FromStream(stream);
    }
    public bool IsReusable
    {
        get { return false; }
    }
}

It appears that I can't do what I am trying to do using Colorbox with an ashx image. If anyone finds a way, please post it here.
I considered deleting the question, but I will leave it up in case someone else runs into the same issue.

Find this function around line 124 (colorbox 1.3.15):
// Checks an href to see if it is a photo.
// There is a force photo option (photo: true) for hrefs that cannot be matched by this regex.
function isImage(url) {
    return settings.photo || /\.(gif|png|jpg|jpeg|bmp)(?:\?([^#]*))?(?:#(\.*))?$/i.test(url);
}
On line 127, add |ashx after bmp in (gif|png|jpg|jpeg|bmp) so it reads like this:
// Checks an href to see if it is a photo.
// There is a force photo option (photo: true) for hrefs that cannot be matched by this regex.
function isImage(url) {
    return settings.photo || /\.(gif|png|jpg|jpeg|bmp|ashx)(?:\?([^#]*))?(?:#(\.*))?$/i.test(url);
}
This is working just fine for me in Sitecore 6.2 :)

How to get folder size in Adobe Air?

How to get folder size in Adobe Air?

Should be fairly simple using File.size. In case that is confusing: folders in AIR are represented by the File class, which extends FileReference, hence the link to the FileReference documentation.

Recursive folder listings and contents processing
http://cookbooks.adobe.com/post_Recursive_folder_listings_and_contents_processing-9410.html
...has sufficient sample code in it to get you started.

My implementation is:
public static function getFileSize(file:File):Number {
    if (file == null || !file.exists) {
        return 0;
    }
    // A plain file just reports its own size.
    if (!file.isDirectory) {
        return file.size;
    }
    var result:Number = 0;
    for each (var f:File in file.getDirectoryListing()) {
        // Recurse into subfolders; add plain files directly.
        result += f.isDirectory ? getFileSize(f) : f.size;
    }
    return result;
}
