Is pandoc's HTML output safe? - docx

We are considering using pandoc to convert documents from docx to HTML for viewing in a web app.
Can pandoc's HTML output be considered safe for displaying inline in a web page, or should it be further cleaned by another tool?

Related

Linking to other pages on Simple R markdown website

I'm developing a Simple R Markdown Website, and want to include a few links to other .Rmd generated HTML pages on the home page.
Simply, I want to open up "detailed-desc.html" which is rendered from "detailed-desc.Rmd", on clicking a text element on the "index.Rmd" file. I'm not sure how this can be done.
If all your .Rmd files and their rendered HTML results are in the same directory, then you can use the regular markdown syntax and reference the HTML files for links.
[Detailed Description](detailed-desc.html)
With a more complicated site folder structure, it is just
[Detailed Description](./path/to/your/folder/detailed-desc.html)

Alternative to HTML to PDF converter?

I've been using the Winnovative HTML to PDF converter for a few years, but I've noticed the quality can be impaired because images and other elements first have to be rendered in HTML before being converted into PDF format.
Winnovative have another option where you can add objects to the PDF Converter before outputting the result, but as this allows you to add HTML elements, I imagine this works in a similar way to the HTML to PDF converter (in terms of rendering).
Is there an alternative to this so that I can generate a PDF in my ASP.NET Web Application without it first having to be rendered as HTML?
I'm looking for the highest-quality option.
You can use the iTextSharp library. It has an object representation of the whole PDF document, so it allows you to add any elements you need without translating them from HTML elements. It can also convert HTML to PDF, but of course you can do it manually instead by building the PDF document from basic blocks...
If you use version 4.x, it's free to use in commercial projects (LGPL license). Version 5.x is available under the Affero General Public License, so I believe you have to buy it to use it in commercial projects, but the features I've described are available in the 4.x version.
Try http://wkhtmltopdf.org/
It's lightning fast in comparison to iTextSharp.
For step by step installation check out these articles:
http://www.megustaulises.com/2012/12/mvcnet-convert-html-to-pdf-with-pechkin.html
http://w3facility.org/question/how-to-pass-html-as-a-string-using-wkhtmltopdf/
And this manual:
http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html

A "shortcut" method for exporting an ASP.Net Page to pdf/xls

I want to export a few Pages to pdf/xls. By Pages I mean as the eye sees it - a screenshot of the Page's contents. I know how to build pdf/xls documents using 3rd party tools but is there any way to quickly export the rendered contents of say a Panel?
edit: maybe a tool that can render the page's output as a browser would, and save it as an image file?
There is an open source console program named wkhtmltopdf which you could call from ASP.NET to convert the page. It converts to PDF, and its companion wkhtmltoimage converts to an image (JPG, PNG, etc.); both use the WebKit rendering engine.
Check my answer to this question to see an example of how to convert from HTML to PDF using C#:
Easiest way of porting html table data to readable document
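As a rough illustration of calling the console program from application code, here is a minimal sketch in Python rather than C#. It assumes wkhtmltopdf (or wkhtmltoimage) is installed and on the PATH, and that the basic invocation is `wkhtmltopdf <input> <output>`; the function names `build_command` and `convert` and the 60-second timeout are made up for this example.

```python
import shutil
import subprocess


def build_command(source: str, output: str, tool: str = "wkhtmltopdf") -> list[str]:
    """Return the argument list for `<tool> <source> <output>`."""
    return [tool, source, output]


def convert(source: str, output: str, tool: str = "wkhtmltopdf") -> None:
    """Render a URL or local HTML file to `output`.

    Raises FileNotFoundError if the tool is not on PATH and
    subprocess.CalledProcessError if the tool exits with an error.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    # Passing a list (not a shell string) avoids shell quoting problems.
    subprocess.run(build_command(source, output, tool), check=True, timeout=60)
```

Something like `convert("http://localhost/page.aspx", "page.pdf")` would then produce the PDF, and passing `tool="wkhtmltoimage"` with a `.png` output name would produce an image instead. The URL here is only a placeholder.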
I can recommend http://www.screengrab.org/ for Firefox.

Display Word Document inside ASP.Net page

I want to display a Word document that is sitting on my IIS server. I want to display the whole document as is, inside an iframe on my aspx page.
I know I can use the MS Word libraries, but I cannot install Word on the server where the application will be hosted. (Correct me if I am wrong: I cannot use just the DLLs without installing MS Word on the server.)
How can I display the word document in my iFrame?
Probably the easiest way would be to include the Google Docs Viewer.
Other ways could be to use Aspose.Words (commercial) to convert Word to PDF and then use Aspose.Pdf.Kit to convert PDF to images and then display the images online.
PowerTools for Open XML contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See http://bit.ly/1bclyg9

adding images to openxml doc created from altchunk

I need an automated process for creating docx files from xhtml source. The xhtml files contain images (<img> elements) whose "src" attributes point to an external reference. But the docx files need to be readable without a network connection, so I need to find a way to embed the images directly into the docx package (namely, in the /media folder).
So far I've used the altChunk method (as described by Eric White) to create the .docx file. I had hoped to use the OpenXML SDK to insert the image parts into the package. But to do that I need to insert paragraphs (<p> nodes) into the document. Unfortunately the document part contains nothing but a reference to the altChunk (stored separately in the docx package). Of course, once the docx is opened, edited and saved, the altChunk part is removed and its contents are embedded properly in the document.xml. But I don't know of any way to do that programmatically, so that doesn't help.
Other options I’ve considered:
Partitioning the xhtml into segments, separated between each image, then adding each altChunk one at a time, with the appropriate image reference between each one. (Tedious but seems possible)
Inserting the images into the media folder, and then finding a way to embed WordProcessingML directly into the xhtml so that the <img> references the packaged image file. (Questionable at best)
Can anyone think of a better approach?
Well, I sorta solved my own problem: I decided to convert the document to mHtml (which can contain images embedded directly in the file) and then use the altchunk to create the final docx file. However, I still wanted to do some post-processing on the file (to insert endnotes in the Word document), but as mentioned above, this is not possible until after the altchunk has been transformed into docx, which cannot be done programmatically.
So it dawned on me that I could bypass the altchunk path altogether and simply use mHtml as the "gateway" from xHtml to docx. I just transformed the xHtml into mHtml, complete with embedded images and endnotes, then renamed the file with a .doc extension. The resulting document can be opened directly by Word (and will be converted more properly on subsequent save). So far it works great (albeit with some bugs in Mac's version of Word, as well as Word2003).
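The xHtml-to-mHtml step described above could be sketched roughly as follows. This is not the code used in the answer: it is a minimal Python illustration that assumes mHtml is built as a `multipart/related` MIME document in which each image part carries a `Content-Location` header matching the `<img>` src, and that all images are PNG. The function name `build_mhtml` is made up, and whether a given version of Word accepts the result has not been verified.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(xhtml: str, images: dict[str, bytes]) -> bytes:
    """Bundle an XHTML string and its images into one multipart/related document.

    `images` maps each <img> src value used in the XHTML to the raw PNG bytes
    (PNG is an assumption made for this sketch).
    """
    root = MIMEMultipart("related", type="text/html")
    # The first part is the HTML body itself.
    root.attach(MIMEText(xhtml, "html", "utf-8"))
    for src, data in images.items():
        part = MIMEImage(data, _subtype="png")
        # Content-Location lets the reader resolve <img src="..."> to this part.
        part.add_header("Content-Location", src)
        root.attach(part)
    return root.as_bytes()
```

The returned bytes would then be written to a file and given a .doc extension, as described above.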
