Word to R Markdown Conversion - r

I have received a file stored in Microsoft Word that includes formatted words (italics, bold). I would like to do some work with the file (extracting sections, inserting words, etc.) and was planning to do this work with R Markdown. I need to keep the formatting (italics, bold) from Word during this conversion. I know I can convert from Markdown to Word, but is the reverse conversion from Word to Markdown also possible? If not, does anyone have any suggestions of how to bring Word into Markdown (relatively) painlessly while maintaining the italics and bold formatting?

From the pandoc manual under "Demos": pandoc -s example30.docx -t markdown -o example35.md
For rmarkdown, please see this answer, Convert docx to Rmarkdown

You could use pandoc first, then RStudio.
With pandoc, run pandoc -o output.md originFile.docx. The output is a Markdown file generated from the Word document.
Open the file in RStudio. You can choose the file type at the bottom of the editor pane, either "Markdown" or "R Markdown", and then edit the Markdown file.
There is also Writage, a plugin for Word that converts between Markdown and Word.
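As a minimal sketch of the pandoc step, the conversion can be wrapped in a small shell function that writes an .Rmd file directly. The function name docx_to_rmd and the file names are hypothetical, and pandoc is assumed to be on the PATH. Pandoc's Markdown writer emits *italic* and **bold** for the corresponding Word formatting, so that markup should carry over.

```sh
#!/bin/sh
# Sketch: convert a Word file to an .Rmd file with pandoc.
# docx_to_rmd is a hypothetical helper name; file names are placeholders.
docx_to_rmd() {
  in_docx=$1
  out_rmd=$2
  # -f docx / -t markdown state the formats explicitly, so the .Rmd
  # extension of the output file does not matter to pandoc.
  # --wrap=none keeps each paragraph on one line, which tends to make
  # later text processing (extracting sections, inserting words) easier.
  pandoc -f docx -t markdown --wrap=none "$in_docx" -o "$out_rmd"
}

# Usage:
#   docx_to_rmd originFile.docx output.Rmd
```

The resulting file has no YAML header or R chunks yet; those would be added by hand in RStudio.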

Related

Compile Bookdown to Markdown?

Is there any way to take a Bookdown project, and build it as Markdown instead of HTML or TeX?
I ask because I need to post-process the final Markdown output from Bookdown, in order to extract R and Python notebooks for download.
In more detail, I am using Bookdown to build a textbook that embeds notebooks to download, where the notebooks contain subsets of the code and text in the bookdown .Rmd files. For example, a single chapter could contain more than one notebook.
In order to do this, I put start and end comment markers in the RMarkdown input text to identify the section that will be a notebook, and then post-process the generated Markdown files to extract the notebook section. As in something like:
<!--- notebook: first_section.Rmd
-->
Some explanation, maybe using Bookdown extra markup such as #a_citation.
```{r}
a <- 1
a
```
<!--- end of notebook
-->
More markdown.
```{r}
# More code not in notebook.
b <- 2
```
Obviously I could use the input RMarkdown pages, but this would be ugly, because all the extended Bookdown markup such as citations, cross-references and so on, would appear in raw and ugly form in the generated notebook. So I'd really like to be able to get the final output Markdown, after merging, resolving of citations and cross references. Is there any way of doing that?
My question is similar to this as-yet unanswered question, but adds my motivation for an official solution to this problem.
With the latest version of bookdown on CRAN, you can use the output format bookdown::markdown_document2, e.g.,
output:
  bookdown::markdown_document2:
    base_format: rmarkdown::md_document
    variant: gfm
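Once the book has been rendered to Markdown, the marker-based extraction described in the question could be sketched as below. This assumes the comment markers reach the generated Markdown unchanged, in the two-line form shown in the question, which may depend on the output variant. The function name extract_notebooks is hypothetical.

```sh
#!/bin/sh
# Sketch: copy the text between "<!--- notebook: NAME" and
# "<!--- end of notebook" markers into a file called NAME.
# extract_notebooks is a hypothetical helper; $1 is the Markdown
# file to scan, $2 is the directory that receives the notebooks.
extract_notebooks() {
  awk -v outdir="$2" '
    /^<!--- notebook:/ {
      # Start marker: take the file name from the rest of the line.
      name = $0
      sub(/^<!--- notebook:[ \t]*/, "", name)
      sub(/[ \t]*(-->)?[ \t]*$/, "", name)
      out = outdir "/" name
      printf "" > out          # create (or truncate) the notebook file
      inside = 1
      skip_close = 1           # the next line may be the closing "-->"
      next
    }
    /^<!--- end of notebook/ {
      # End marker: stop copying and close the current notebook file.
      inside = 0
      close(out)
      skip_close = 1
      next
    }
    skip_close && /^-->[ \t]*$/ { skip_close = 0; next }
    { skip_close = 0 }
    inside { print > out }
  ' "$1"
}

# Usage:
#   extract_notebooks chapter.md notebooks_dir
```

Lines outside any marker pair are ignored, so a chapter can hold several notebooks, each written to its own file.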

Is there some convenient way to convert rmarkdown to pandoc markdown?

I use RStudio to write R Markdown, but sometimes it is not compatible with the Markdown supported by pandoc (math, for example). If there were a way to convert R Markdown to pandoc Markdown, it would be convenient to export my articles to pdf, org, rst, latex...
https://pandoc.org/ also doesn't seem to mention R Markdown support.
I have tried exporting .html from RStudio and using pandoc to convert the HTML file back to Markdown, but that doesn't seem to work.
Actually pandoc is used to create pdf and other formats from r-markdown. Therefore there is an intermediate file with pandoc compatible markdown. You could retain this file by:
rmarkdown::render("document.Rmd", output_format = "pdf_document", run_pandoc = FALSE)

Accents for characters from R Markdown to Microsoft Word

I'm trying to produce a Word document with R Markdown. Usually I create PDF files with no problems. I use LaTeX syntax to create accented characters in normal text, like:
$\'{e}\'{e}$n
This produces "één" correctly when I knit to PDF. But it doesn't seem to work when I knit a Word document; it just shows "$\'{e}\'{e}$n". Does anyone know how to solve this problem? Thanks!

knitr html to Word docx using pandoc

I have been saving some example R markdown html output to Word using pandoc. I actually only do this so I can add some page breaks for easier printing:
system("pandoc -s Exercise1.html -o Exercise1.docx")
Although the output is acceptable I was wondering if there is a way to keep the original syntax highlighting of the R chunks (just as they are in the original knit HTML document)?
Also, I seem to be losing all images in the conversion process and have to insert them into Word by hand. Is that normal?
Using the rmarkdown package (baked into RStudio Version 0.98.682, the current preview release) it's very simple to convert Rmd to docx, and code highlighting is included in the docx file.
You just need to include this at the top of your markdown text:
---
title: "Untitled" # obviously you can change this
output: word_document # specifies docx output
---
However, it seems that page breaks are still not supported in this conversion.
Why not convert the markdown directly to Word format?
Anyway, Pandoc does not support syntax highlighting in Word: "Currently, the only output formats that uses this information are HTML and LaTeX."
About the images: the Word file would definitely include those if you converted the Markdown to Word directly. I am not sure about the HTML source, but I suppose you might have a path issue.

Creating a title and numbering of sections on pdf while using R markdown with pandoc

I am using a Markdown document created from R, and I am trying to create a PDF file from it using pandoc. Everything else works fine, but I want the document title to appear and the sections to be numbered, as in a default LaTeX document. It seems the title defined in R Markdown appears as a section title in the PDF.
I was able to set double spacing, add line numbers, etc. by using an options.sty file. Below is the code that I used to create the PDF.
For options.sty I used:
\usepackage{setspace}
\doublespacing
\usepackage[vmargin=0.75in,hmargin=1in]{geometry}
\usepackage{lineno}
\usepackage{titlesec}
\titleformat{\section}
{\color{red}\normalfont\Large\bfseries}
{\color{red}\thesection}{1em}{}
\titleformat{\subsection}
{\color{blue}\normalfont\Large\bfseries}
{\color{blue}\thesection}{0.8em}{}
\title{Monitoring Stations}
\author{Jdbaba}
I used knitr to create the R markdown file. In the above options.sty file, it seems the program is not taking title and author part. It seems I am missing something.
The code I used to convert markdown to pdf is as follows:
pandoc -H options.sty mydata.md -o mydata.pdf
In a LaTeX document, the PDF would have automatic section numbering as well, but my PDF is missing it. Can anyone suggest how numbering can be enabled in a PDF created using pandoc?
Thanks.
Pandoc takes the title from a title block in the Markdown file. This is a Pandoc-specific extension to Markdown. The block should have the following format:
% title
% author(s) (separated by semicolons)
% date
So, in your case:
% Monitoring Stations
% Jdbaba
% March 6, 2013
To have the sections numbered, you'll need to run Pandoc with the --number-sections option.
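Putting the two pieces together, the command from the question might look like the sketch below. It assumes the title block has already been added at the very top of the Markdown file. The function name build_numbered is hypothetical; -s is added so that the same call also produces a complete document when the output is a .tex file (it is already implied for PDF output).

```sh
#!/bin/sh
# Sketch: build a document with a title and numbered sections.
# build_numbered is a hypothetical helper name.
#   $1  Markdown file that starts with a "% title" block
#   $2  LaTeX header file (options.sty in the question)
#   $3  output file, e.g. mydata.pdf
build_numbered() {
  md=$1
  header=$2
  out=$3
  # --number-sections turns on section numbering;
  # -H inserts the header file into the LaTeX preamble, as in the question.
  pandoc -s --number-sections -H "$header" "$md" -o "$out"
}

# Usage:
#   build_numbered mydata.md options.sty mydata.pdf
```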
