knitr html to Word docx using pandoc - r

I have been saving some example R markdown html output to Word using pandoc. I actually only do this so I can add some page breaks for easier printing:
system("pandoc -s Exercise1.html -o Exercise1.docx")
Although the output is acceptable I was wondering if there is a way to keep the original syntax highlighting of the R chunks (just as they are in the original knit HTML document)?
Also, I seem to be loosing all images in the conversion process and have to stick them into Word by hand. Is that normal?

Using the rmarkdown package (baked into RStudio Version 0.98.682, the current preview release) it's very simple to convert Rmd to docx, and code highlighting is included in the docx file.
You just need to include this at the top of your markdown text:
---
title: "Untitled" # obviously you can change this
output: word_document # specifies docx output
---
However, it seems that page breaks are still not supported in this conversion.

Why not convert the markdown directly to Word format?
Anyway, Pandoc does not support syntax highlighting in Word: "Currently, the only output formats that uses this information are HTML and LaTeX."
About the images: the Word file would definitely include those if you'd convert the markdown to Word directly. I am not sure about the HTML source, but I suppose you might have a path issue.

Related

Word to R Markdown Conversion

I have received a file stored in Microsoft Word that includes formatted words (italics, bold). I would like to do some work with the file (extracting sections, inserting words, etc.) and was planning to do this work with R Markdown. I need to keep the formatting (italics, bold) from Word during this conversion. I know I can convert from Markdown to Word, but is the reverse conversion from Word to Markdown also possible? If not, does anyone have any suggestions of how to bring Word into Markdown (relatively) painlessly while maintaining the italics and bold formatting?
From the pandoc manual under "Demos": pandoc -s example30.docx -t markdown -o example35.md
For rmarkdown, please see this answer, Convert docx to Rmarkdown
You could use first pandoc, then RStudio.
In pandoc, pandoc -o output.md originFile.docx. In which, your
output is a markdown from a Word.
Open your RStudio, you can choose your type file at the bottom of the console, whether you select "markdown" or "Rmarkdown". You will be able to change your markdown file.
Also there is Writeage, from convert markdown to word. This is a pulgin in Word

What's the best way to convert between PDF, HTML, and presentation formats in RMarkdown?

I'd like to find an effective and efficient way to convert between PDF, HTML, and presentation formats using the same .Rmd file, under the constraint that the presentation output should be figures only.
I have many .Rmd files that need to be re-formatted depending on which output I need at any given time. Any ideas?
An effective way could be the output tag of the YAML header of the Rmd file, based on which it generates PDF/PPT/Doc/HTML, etc. outputs.
In the top YAML part of Rmd file:
output: ioslides_presentation
for a presentation output.
Similarly, there are more formats described in the documentation.

R Markdown - no ODT and LaTeX options as an output

I found R markdown/knitr useful tool to document my work and generate summary document.
I work with .Rmd (R markdown) files in RStudio.
It seems that knitr provide appropriate functionality to generate .odt (Open Document Text) and .tex (LaTeX) documents from .Rmd.
However, R studio allows to choose .docx, .html and .pdf formats only.
I would like to avoid MS Word format since I prefer open standards and working under Linux.
Is it possible to add .odt and .tex options to Rstudio menu?
It doesn't seem possible to output odt directly in RStudio, but you can always use knitr::knit to produce a markdown document and pandoc to produce the odt:
library(knitr)
knit("myDoc.Rmd")
system("pandoc myDoc.md -o myDoc.odt")
You may have to adjust the pandoc options and adapt the template to get a nice looking result.
As for latex, you can keep the tex sources when compiling to pdf with the following option in your yaml front matter:
---
output:
pdf_document:
keep_tex: true
---

centre title in PDF converted from markdown using Pandoc

I'm converting a markdown document to a PDF document using Pandoc from within R. I'm trying to centre the title.
So far, I've tried:
<center># This is my title</center>
and
-># This is my title<-
but neither have worked. Is there a way to centre the title when converting from markdown to PDF using Pandoc?
pandoc has its own extended version of markdown. This includes a title block.
If the file begins with a title block
% my title
% Me; Someone else
% May 2013
This will be parsed into LaTeX and the resulting pdf as
\title{my title}
\author{Me \and Someone Else}
\date{May 2013}
and then `
\maketitle
called within the document.
This will create the standard centred title.
If you want to change how the title etc is formatted you could use the titling package.
If you want to change how the section headers are formatted, you could use the titlesec package.
To automagically have pandoc implement these you could define your own template. A simpler option is to have a file with your desired latex preamble to be included in the header. and then use the appropriate arguments when calling pandoc (eg -H FILE or --include-in-header=FILE)

Creating a title and numbering of sections on pdf while using R markdown with pandoc

I am using the markdown document created by R and I am trying to create the pdf file from the markdown using pandoc. Everything else works fine but I want the title of the document to appear and the numbering on the sections as in default Latex document. It seems the title defined on rmarkdown appears as a section title for the pdf.
I was able to create double spacing, enter line numbers etc by using a options.sty file. Below is the code that I used to create a pdf.
For options.sty I used:
\usepackage{setspace}
\doublespacing
\usepackage[vmargin=0.75in,hmargin=1in]{geometry}
\usepackage{lineno}
\usepackage{titlesec}
\titleformat{\section}
{\color{red}\normalfont\Large\bfseries}
{\color{red}\thesection}{1em}{}
\titleformat{\subsection}
{\color{blue}\normalfont\Large\bfseries}
{\color{blue}\thesection}{0.8em}{}
\title{Monitoring Stations}
\author{Jdbaba}
I used knitr to create the R markdown file. In the above options.sty file, it seems the program is not taking title and author part. It seems I am missing something.
The code I used to convert markdown to pdf is as follows:
pandoc -H options.sty mydata.md -o mydata.pdf
In latex document, the pdf would have the automatic numbering as well. But my pdf is missing that. Can anyone suggest how numbering can be enabled on the pdf document created using pandoc ?
Thanks.
Pandoc takes the title from a title block in the Markdown file. This is a Pandoc-specific extension to Markdown. The block should have the following format:
% title
% author(s) (separated by semicolons)
% date
So, in your case:
% Monitoring Stations
% Jdbaba
% March 6, 2013
To have the sections numbered, you'll need to run Pandoc with the --number-sections option.

Resources