How to generate Rich Text Format (RTF) document in ASP.NET?
The RTF specification from Microsoft:
http://msdn.microsoft.com/en-us/library/aa140277%28office.10%29.aspx
Or, in order to bypass the RTF learning curve, you could generate HTML from ASP.NET then shell out and use the html2rtf tool to convert it to rtf:
http://www.gkrueger.com/personal/html2rtf/index.html
Or, if your doc is going to be used in template-like manner, you could create the rtf file by hand in WordPad and save it with special strings that you invent, like $YOUR_NAME_HERE$. Later, you read the your saved rtf file and replace your special strings.
Just follow the spec:
http://en.wikipedia.org/wiki/Rich_Text_Format
Start your file with
MyFile.WriteLine("{\\rtf1 ")
and just follow the spec linked by Spencer.
Related
I have a document like the one mentioned below. There is some text above the table and then there's a table. How do I extract table from the docx file in R or python and then convert it to a csv file or an xlsx file. I don't even mind a .txt file if it retains the exact format of the table. I just don't know what to do with this doc file.
If the document is docx, then it is all XML. The docx file is just a zip container with various XML "parts". Take a look at the Open XML SDK for some ideas on how to parse the file. This SDK is C#, but maybe you can get some ideas from that.
If you are just going to extract the table it should not be too bad ( Updating complex docx documents can get very complicated. I'm working on this now.) My tip to make things easier is to go to the table properties, then to the Alt Text tab and add a unique value to the "Title" field. The value will show up like this within the table properties: <w:tblCaption w:val="TBL1"/>, which will make the table easier to extract from the XML.
If you are going to work with Open XML documents, get the OOXML Chrome Addin. That is great for exploring the internals of docx files.
Note: I saw the link to another SO answer for this. That uses "automation", which is certainly easier to code, but Office via "automation" on the server is not recommended by MS.
You can extract tables from docx using python-docx in python.
Try this:
from docx import Document
import pandas as pd
document = Document(file_path)
tables = []
for index,table in enumerate(document.tables):
df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
for i, row in enumerate(table.rows):
for j, cell in enumerate(row.cells):
df[i][j] = cell.text
pd.DataFrame(df).to_excel("Table# "+str(index)+".xlsx")
I am new to R and have worked for a while as follows. I have the code writen in a word document, then I copy and paste the document with the code into R as to have the code run which works fine, however when the code is long (hundred pages) it takes a significant amount of time in R to start making the code run. This seems rather not a very effective working procedure and I am sure there are other forms to compile the R code.
On another hand one of then that comes to my mind is to import the content of word into R which I am unsure how to do. Have tried with read.table but it does not work, have look on internet as to how to import data, however most explanations are all for data tables etc or internet files in the form of data tables and similar. I have tried saving the document into csv. however word does not include csv have tried with Rich text format and XML package but again the instructions from the packages are for importing tables and similars. I am wondering if there is an effective way for R to import a word document as is in the word document.
Thank you
It's hard to say what the easiest solution would be, without examining the word document. Assuming it only contains code and nothing else, it should be pretty easy to convert it all to plain text from within Word. You can do that by going to File -> Save As, and use 'plain text' under 'Save as type'.
Then edit the filename extension to .R from .txt, download a proper text editor (I can recommend RStudio for R), and open your code in it. Then you will be able to run the code from inside the editor without using copy / paste.
No, read table won't do it.
Microsoft Word has its own format, which includes a lot of meta data over and above the text you enter into it. You'll need a reader/parser that understands the Word format.
A Java developer would use a library like Apache POI to read and parse it into word tokens and n-grams.
Look for Natural Language Processing tools, like this R module:
http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
I am building a web application that requires a number of dynamically generated RTF files to be generated, these files are currently .RTF files but I need to build them all within my application in RTF format, does anyone know a program or something that I can open them in and view the raw RTF code that creates the file eg.
{\rtf\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}{\f1\fdecor
Symbol;}{\f2\fswiss Helv;}}{\colortbl;\red0\green0\blue0;
\red0\green0\blue255;\red0\green255\blue255;\red0\green255\
blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\
green255\blue0;\red255\green255\blue255;}{\stylesheet{\fs20
\snext0Normal;}}{\info{\author John Doe}
{\creatim\yr1990\mo7\dy30\hr10\min48}{\version1}{\edmins0}
{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\widoctrl\ftnbj \sectd\linex0\endnhere \pard\plain \fs20 This is plain text.\par}
Thanks for any help.
J.
Notepad, or any text editor will do. As you are creating the files yourself programatically, you will want to be liberal with carriage returns and line feeds to make the files more readable.
I'm writing a major report, and have two PDF files I'd like to include as appendices. The report is written using ReStructuredText, and rst2pdf will be used to convert it.
Does docutils or rst2pdf have any functionality for external files as appendices?
Docutils has the raw directive for passing data through to the final output untouched. In the documentation they demonstrate this for the LaTeX and HTML outputs. rst2pdf seems to support this directive: in the manual they use the raw directive to include some text/commands in the final PDF (see the section headed Raw Directive) but they do not demonstrate using this directive for including external PDF files.
If rst2pdf does support this feature, you should just be able to use:
.. raw:: pdf
:file: your_pdf_file.pdf
:encoding: the encoding of the PDF file, if different from the
reStructuredText document's encoding.
I have just had a go at doing this (if in doubt, give it a go) and I get a number of UnicodeDecodeErrors, so the feature seems to be supported but I can't get it to work.
You could embed PDFs as images, but that makes no sense for appendixes.
If you only have those files as PDF, you can add them using a PDF manipulation tool, but those usually break page numbering or links or some other piece of the PDFs.
In the end, I couldn't fix this problem directly. I converted the ReStructuredText file to Latex, and included the appendices there.
I plan to use reStructuredText to write documentation with the main purpose of generating some nice HTML pages. For this I use the docutils rst2html.py tool.
However, I may also need to present the documentation in nice plain text format, that is without the reStructuredText markup, and where paragraph wrapping and similar nice formatting is still performed on the text output. But, there is no rst2txt.py tool in the docutils.
Is there a way to convert reStructuredText to nice plain text format, maybe with use of special options to docutils ?
I have also seen this done by rendering to html using rst2html, then converting the html to plain text by using a command-line html browser, such as:
lynx http://lynx.browser.org
links http://links.sourceforge.net
w3m http://w3m.sourceforge.net
elinks http://elinks.or.cz
Each of these browsers has a command-line switch or similar to render its output to a .txt file, so you could create a two line script called 'rst2txt', something like:
rst2html docs.rst docs.html
lynx -dump docs.html > docs.txt
Sphinx has a TextBuilder for txt output format. Just tried it and it seems to do what you are looking for.
However, it might be a little outdated because it is not in the default Makefile. But it worked well on my fairly complex documentation (150 pdf pages). To use it, just add the following target to it:
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) _build/text
#echo
#echo "Build finished."
Also, keep in mind that Sphinx implements only a subset of the rst specs.