How to convert reStructuredText to plain text - restructuredtext

I plan to use reStructuredText to write documentation with the main purpose of generating some nice HTML pages. For this I use the docutils rst2html.py tool.
However, I may also need to present the documentation in a nice plain text format, that is, without the reStructuredText markup but with paragraph wrapping and similar formatting still applied to the text output. However, docutils has no rst2txt.py tool.
Is there a way to convert reStructuredText to a nice plain text format, maybe by using special options to docutils?

I have also seen this done by rendering to HTML using rst2html, then converting the HTML to plain text with a command-line HTML browser, such as:
lynx http://lynx.browser.org
links http://links.sourceforge.net
w3m http://w3m.sourceforge.net
elinks http://elinks.or.cz
Each of these browsers has a command-line switch or similar to render its output to a .txt file, so you could create a two-line script called 'rst2txt', something like:
rst2html docs.rst docs.html
lynx -dump docs.html > docs.txt
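A minimal sketch of the same two-step pipeline, written in Python rather than shell. It assumes rst2html and lynx are both on the PATH (on some installations the docutils tool is named rst2html.py instead); the function names rst2txt_commands and rst2txt are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def rst2txt_commands(rst_path):
    """Return the two commands from the answer above as argument lists."""
    rst = Path(rst_path)
    html = rst.with_suffix(".html")
    return [
        ["rst2html", str(rst), str(html)],  # step 1: reST -> HTML
        ["lynx", "-dump", str(html)],       # step 2: HTML -> text on stdout
    ]


def rst2txt(rst_path):
    """Run both steps and write the text output next to the source file."""
    to_html, to_text = rst2txt_commands(rst_path)
    # Fail early with a clear message if either tool is missing.
    for tool in (to_html[0], to_text[0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(to_html, check=True)
    dumped = subprocess.run(to_text, check=True, capture_output=True, text=True)
    txt_path = Path(rst_path).with_suffix(".txt")
    txt_path.write_text(dumped.stdout)
    return txt_path
```

Calling rst2txt("docs.rst") would leave docs.html and docs.txt beside the source, just as the shell version does.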

Sphinx has a TextBuilder for txt output format. Just tried it and it seems to do what you are looking for.
However, it might be a little outdated, because it is not in the default Makefile. But it worked well on my fairly complex documentation (150 PDF pages). To use it, just add the following target to the Makefile:
text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) _build/text
	@echo
	@echo "Build finished."
Also, keep in mind that Sphinx implements only a subset of the rst specs.

Related

Importing the contents of a word document into R

I am new to R and have been working as follows for a while. I write the code in a Word document, then copy and paste it into R to run it. This works fine, but when the code is long (hundreds of pages), R takes a significant amount of time before it starts running the code. This does not seem like a very effective way of working, and I am sure there are other ways to run R code.
One option that comes to mind is to import the contents of the Word document into R, but I am unsure how to do that. I have tried read.table, but it does not work. I have searched the internet for how to import data, but most explanations cover data tables, or internet files in the form of data tables and the like. I tried saving the document as CSV, but Word does not offer CSV. I also tried Rich Text Format and the XML package, but again the package instructions are about importing tables and similar data. I am wondering whether there is an effective way for R to import a Word document exactly as it appears in Word.
Thank you
It's hard to say what the easiest solution would be without examining the Word document. Assuming it contains only code and nothing else, it should be pretty easy to convert it all to plain text from within Word. You can do that by going to File -> Save As and choosing 'plain text' under 'Save as type'.
Then edit the filename extension to .R from .txt, download a proper text editor (I can recommend RStudio for R), and open your code in it. Then you will be able to run the code from inside the editor without using copy / paste.
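The rename step can also be scripted. The sketch below is in Python and rests on an assumption that is not in the answer above: that Word may have replaced straight quotes, hyphens and spaces with typographic variants that R cannot parse. The names SMART_CHARS, clean_word_text and txt_to_r_script are hypothetical, and the encoding Word used when saving (UTF-8 here) is also an assumption you may need to change.

```python
from pathlib import Path

# Assumed set of typographic characters Word may substitute while typing,
# mapped to the plain ASCII characters that R source code expects.
SMART_CHARS = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u00a0": " ",  # non-breaking space
}


def clean_word_text(text):
    """Replace the typographic characters listed above with ASCII ones."""
    for smart, plain in SMART_CHARS.items():
        text = text.replace(smart, plain)
    return text


def txt_to_r_script(txt_path, encoding="utf-8"):
    """Write a cleaned copy of a Word 'plain text' export with a .R extension."""
    src = Path(txt_path)
    dest = src.with_suffix(".R")
    cleaned = clean_word_text(src.read_text(encoding=encoding))
    dest.write_text(cleaned, encoding="utf-8")
    return dest
```

Note that the replacement is applied everywhere, including inside string literals, so check the result if your code deliberately contains any of these characters.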
No, read.table won't do it.
Microsoft Word has its own format, which includes a lot of metadata over and above the text you enter into it. You'll need a reader/parser that understands the Word format.
A Java developer would use a library like Apache POI to read and parse it into word tokens and n-grams.
Look for Natural Language Processing tools, such as the R packages listed in this CRAN task view:
http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Create an ePub file from markdown with math

I've spent a considerable amount of time trying to figure out how I can take a markdown file that contains TeX math and convert it into an ePub file where the math is rendered properly.
For example:
This is a markdown file. Here is a [link](www.example.com).
Here is some inline math: $\sigma_{i=1}^n \frac{\mu}{100}$
Here is an equation:
$$ y = mx + b $$
How can I convert a markdown file with the above text into an ePub file?
I've experimented with different methods of conversion using Pandoc; however, I still cannot find a solution that renders the math even 50% correctly.
Can anyone provide any help as to how I can do this?
I've tried this solution as well as other Pandoc options without success. Thanks in advance for the help.
Pandoc has an EPUB3 writer. It renders LaTeX math into MathML, which EPUB3 readers are supposed to support (though unfortunately few do so far). Use pandoc -t epub3 to force EPUB3 output, as opposed to EPUB2, which was the default at the time of writing (since Pandoc 2.0, plain epub means EPUB3).
Of course, this isn't much help if you want EPUB2 output or target readers that don't support MathML. Then you could try using the --webtex option, which will use a web service to convert the TeX to an image.
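The two variants described above could be wrapped like this. This is only a sketch: pandoc_epub_command and build_epub are hypothetical helper names, pandoc is assumed to be on the PATH, and the explicit epub2 writer name assumes Pandoc 2.0 or later (older versions call that writer epub).

```python
import subprocess


def pandoc_epub_command(markdown_path, epub_path, use_webtex=False):
    """Build the pandoc argument list for one of the two approaches."""
    if use_webtex:
        # EPUB2 output, with each formula replaced by an image that a
        # web service renders from the TeX source.
        return ["pandoc", markdown_path, "-t", "epub2", "--webtex", "-o", epub_path]
    # EPUB3 output, with formulas converted to MathML.
    return ["pandoc", markdown_path, "-t", "epub3", "-o", epub_path]


def build_epub(markdown_path, epub_path, use_webtex=False):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    cmd = pandoc_epub_command(markdown_path, epub_path, use_webtex)
    subprocess.run(cmd, check=True)
```

For example, build_epub("book.md", "book.epub") targets readers with MathML support, while build_epub("book.md", "book.epub", use_webtex=True) needs network access at conversion time.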

Annotating Adobe Reader PDFs with math symbols

Many of the math textbooks and other literature I read are in PDF format, so I frequently find myself annotating these with the Adobe Reader comments tool.
I did find a helpful guide here, but sometimes I'd like the option of inserting math symbols, too. Has anyone found a reliable way to insert math symbols, TeX, or other arbitrary formatting into the annotations?
So far, the best I've come up with is to type the symbol's Unicode code point in hex, prefixed by "0x", and press Alt+X after it. Maybe with the Adobe JavaScript SDK you could write a script to shortcut this.
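Finding the hex code point for a given symbol is the tedious part of that approach. As a small aid, here is a Python sketch using the standard unicodedata module; the helper name code_point is made up, and you need to know the character's official Unicode name.

```python
import unicodedata


def code_point(name):
    """Return the hex code point (at least four digits) for a Unicode character name."""
    char = unicodedata.lookup(name)  # raises KeyError for unknown names
    return format(ord(char), "04X")


# A few symbols that come up often in math annotations.
for symbol_name in ("N-ARY SUMMATION", "INTEGRAL", "GREEK SMALL LETTER SIGMA"):
    print(symbol_name, code_point(symbol_name))
```

The printed hex values are what you would type before pressing Alt+X.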
I don't think any of the current commercial editors make this easy, which is too bad. I am sure the vendors monitor this site, so there is hope.
In the meantime, here is a manual workaround.
Use TikZ to create your comment boxes. Here are the two examples I found to be most relevant: Boxes and Positioning. Play around with the options to get both the shape and the placement you want. Generate a PDF file from the LaTeX source that contains your comments.
IMPORTANT: if your comments end before the last page of the original document, insert:
\pagebreak{} % create empty page
\thispagestyle{empty} % get rid of page numbers et al
~ % put a space so the page gets generated
before your \end{document}, to get an empty last page. The following command will reuse the last page of your comments document on all subsequent pages of the original document.
Use a recent version of pdftk with the multistamp command to overlay your equations file with your original file like so:
pdftk original.pdf multistamp comments.pdf output out.pdf
Also see this question.
The free (as in speech) PDF tool Okular supports this functionality: put the LaTeX formula directly between $$...$$.

Adding an external PDF as appendix with ReStructuredText

I'm writing a major report, and have two PDF files I'd like to include as appendices. The report is written using ReStructuredText, and rst2pdf will be used to convert it.
Does docutils or rst2pdf have any functionality for external files as appendices?
Docutils has the raw directive for passing data through to the final output untouched. In the documentation they demonstrate this for the LaTeX and HTML outputs. rst2pdf seems to support this directive: in the manual they use the raw directive to include some text/commands in the final PDF (see the section headed Raw Directive) but they do not demonstrate using this directive for including external PDF files.
If rst2pdf does support this feature, you should just be able to use:
.. raw:: pdf
   :file: your_pdf_file.pdf
   :encoding: the encoding of the PDF file, if different from the
      reStructuredText document's encoding.
I have just had a go at doing this (if in doubt, give it a go) and I get a number of UnicodeDecodeErrors, so the feature seems to be supported but I can't get it to work.
You could embed PDFs as images, but that makes no sense for appendixes.
If you only have those files as PDF, you can add them using a PDF manipulation tool, but those usually break page numbering or links or some other piece of the PDFs.
In the end, I couldn't fix this problem directly. I converted the ReStructuredText file to LaTeX and included the appendices there.

Including an image using roxygen documentation

Is it possible to include an image in documentation generated by roxygen? I have a number of functions that are essentially wrappers for ggplot() that I'd like to document by showing an example of the output.
As per the change list from the announcement of R 2.14:
Rd markup has a new \figure tag so that figures can be included in
help pages when converted to HTML or LaTeX. There are examples on
the help pages for par() and points().
From: http://cran.r-project.org/doc/manuals/R-exts.html#Figures
To include figures in help pages, use the \figure markup. There are three forms.
The two commonly used simple forms are \figure{filename} and \figure{filename}{alternate text}. This will include a copy of the figure in either HTML or LaTeX output. In text output, the alternate text will be displayed instead. (When the second argument is omitted, the filename will be used.) Both the filename and the alternate text will be parsed verbatim, and should not include special characters that are significant in HTML or LaTeX.
The expert form is \figure{filename}{options: string}. (The word ‘options:’ must be typed exactly as shown and followed by at least one space.) In this form, the string is copied into the HTML img tag as attributes following the src attribute, or into the second argument of the \Figure macro in LaTeX, which by default is used as options to an \includegraphics call. As it is unlikely that any single string would suffice for both display modes, the expert form would normally be wrapped in conditionals. It is up to the author to make sure that legal HTML/LaTeX is used. For example, to include a logo in both HTML (using the simple form) and LaTeX (using the expert form), the following could be used:
\if{html}{\figure{logo.jpg}{Our logo}}
\if{latex}{\figure{logo.jpg}{options: width=0.5in}}
The files containing the figures should be stored in the directory man/figures. Files with extensions .jpg, .pdf, .png and .svg from that directory will be copied to the help/figures directory at install time. (Figures in PDF format will not display in most HTML browsers, but might be the best choice in reference manuals.) Specify the filename relative to man/figures in the \figure directive.
