pandoc mmd_title_block appears not to load - docx

I am new to pandoc and am attempting to use it to convert some simple mmd files to docx. These mmd files contain an mmd-style title block in the following form:
Author: Author_name
Title: Title_name
Date: Date_name
I prefer this style to the pandoc style title blocks, so I would like to keep them in the multimarkdown style. The pandoc documentation indicates that there is an extension that will allow me to use them, but when I attempt to use the extension it has no effect on the output. I have tried many permutations of the command to no avail, but an example looks like this:
pandoc -f markdown-pandoc_title_block+mmd_title_block -o test.docx testinput.txt
If I convert the title block to use pandoc's style, the output properly converts the title blocks to the correct format in the resulting Word file, so I know the reference file is okay. Also, when I keep the title block in pandoc's style but use the markdown-pandoc_title_block command, it properly ignores the title block, so I know the problem is not in the disabling of pandoc title blocks.
Suggestions on what I might be doing wrong?

If you upgrade to Pandoc 1.11.1 and try running the following command, it should work fine:
pandoc -f markdown_mmd -t docx test.md -o test.docx
It preserves title, author and date fields.
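A quick way to check whether the MultiMarkdown title block is being picked up is to print standalone markdown before producing the docx. The sketch below is only an illustration: the file name testinput.md and its contents are made up to mirror the question, and it assumes a pandoc recent enough to know the markdown_mmd reader.

```shell
# Create a small input file whose first lines form a MultiMarkdown-style title block.
# (testinput.md and its contents are placeholders mirroring the question.)
cat > testinput.md <<'EOF'
Author: Author_name
Title: Title_name
Date: Date_name

Body text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Print standalone markdown; the metadata should show up at the top
  # if the title block was parsed.
  pandoc -f markdown_mmd -s -t markdown testinput.md ||
    echo "pandoc could not read testinput.md as markdown_mmd" >&2

  # Same reader, docx output (the command from the answer above).
  pandoc -f markdown_mmd -t docx testinput.md -o test.docx ||
    echo "docx conversion failed" >&2
else
  echo "pandoc not found on PATH" >&2
fi
```

If the first command prints the body but no title, author or date, the reader did not recognise the block, and the docx will be missing them too.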

Related

How do I keep title & subtitle when using pandoc to convert .docx to .md in R?

I'm downloading a Google Doc as .docx and then converting to markdown for manipulation and export to multiple formats.
Problem: When I convert using pandoc, it strips title (and subtitle) and does not add any YAML header information. I could add title manually in the header, but I need it to be scripted, so need to not lose the title (ideally) or extract title from docx and add to YAML header, which would then be concatenated to the converted markdown file.
Example Code, where title is lost on conversion from docx to markdown:
require(rmarkdown);require(devtools)
examplefile=paste0(tempdir(),"/example.docx")
download.file("https://file-examples.com/wp-content/uploads/2017/02/file-sample_100kB.docx",destfile=examplefile)
pandoc_convert(examplefile,to="markdown",output = "example.rmd", options=c("--extract-media=."))
render(paste0(tempdir(), "/example.rmd"),"html_document")
browseURL(paste0(tempdir(),"/example.html"))
When converting from docx to markdown (or another markup format like rst) you need to include the -s or --standalone option.
From the pandoc documentation:
-s, --standalone
Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output. For native output, this option causes metadata to be included; otherwise, metadata is suppressed.
Without the -s this data is suppressed.
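The effect is easy to see on a throwaway file. The following is a minimal sketch, not the asker's pipeline: example.docx is generated locally as a stand-in for the downloaded document, and fragment.md and standalone.md are placeholder names.

```shell
if command -v pandoc >/dev/null 2>&1; then
  # Build a small docx with a title (stand-in for the downloaded file).
  printf '%% Sample Title\n\nBody text.\n' | pandoc -f markdown -o example.docx &&
  # Without -s: only the body is written, metadata is suppressed.
  pandoc example.docx -t markdown -o fragment.md &&
  # With -s: the output starts with a metadata header that carries the title.
  pandoc -s example.docx -t markdown -o standalone.md &&
  cat standalone.md ||
    echo "conversion failed" >&2
else
  echo "pandoc not found on PATH" >&2
fi
```

In the R call from the question, the same flag would presumably go into the options vector, for example options = c("-s", "--extract-media=.").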

Converting Rmarkdown to PDF without RStudio

I would like to convert a *.Rmd document to PDF without RStudio being available.
Current approach
Current approach follows the following steps:
*.Rmd document is passed to knitr: knit(input = "report.Rmd")
Obtained md is converted via pandoc:
# Convert
pandoc --smart --to latex \
--latex-engine pdflatex \
-s report.md \
-o report.PDF
Problems
This results in the following problems, the top section of the Rmarkdown document:
---
title: "Report Title"
author: "Person"
output: pdf_document
classoption: landscape
---
and shows as:
all text is centered, whereas I would like for it to be left-aligned:
Possible approach
I would like to make use of the rmarkdown::render; however, despite setting RSTUDIO_PANDOC (as discussed here), the command fails on pandoc not being available.
Desired outcome
I don't care much whether the utilised mechanism makes use of the rmarkdown::render, what I want to achieve is:
Landscape page layout across all pages
Left-aligned text
Ability to exercise minimum control over the document by controlling default fonts
Ideally, I would like to do as much as possible in the *.Rmd file without the need to add parameters to the pandoc command.
Updates, following comments
I'm working on Linux and pandoc is installed; I can execute the pandoc command, pass files to it, and generate exports with no problems. It only fails when called through rmarkdown::render.
Concerning the hooks and *.Rmd files, this is what I'm trying to understand, as I see that the first section of my *.Rmd file is ignored. The current process looks as follows:
*.Rmd (not much in it, just title section and dummy text and code that renders but wrongly justified) >
*.R file running one line knit(input = "report.Rmd") >
*.sh file running pandoc command and generating PDF
Concerning:
if all that is in place, it is indeed just a call to
rmarkdown::render(...)
The rmarkdown::render(...) fails:
Error: pandoc version 1.12.3 is required and was not found ...
However:
>> rmarkdown::pandoc_available()
[1] TRUE
and:
$ pandoc -v
pandoc 1.9.4.1 (...)
The RSTUDIO_PANDOC points to pandoc.
A few things:
"the command fails on pandoc not being available." well you must have pandoc installed in order to call it -- but you didn't say what OS you have. On Linux it is pretty trivial to install pandoc from the package manager; otherwise jgm has binaries for you on the site; "should" be similar on OS X
for different styling you need to modify the LaTeX code, which you can do via numerous hooks to include macro files; see the RMarkdown cheat sheets for detail
if you want to exercise more control, you can supply your own template; I have done so in the tint package (which is also on CRAN)
if all that is in place, it is indeed just a call to rmarkdown::render(...)
Error: pandoc version 1.12.3 is required and was not found
I think the error says it plainly: you need pandoc 1.12.3 and you have pandoc 1.9.4.1
I do not know, however, why such a specific version is required.
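The mismatch can be confirmed from the shell. This is a rough sketch: version_ge is a hypothetical helper written for this example, it relies on GNU sort -V (available on typical Linux systems), and the required version is simply copied from the error message. As far as I know, YAML metadata blocks were only introduced in pandoc 1.12, which may also explain why the header of the *.Rmd file is ignored by pandoc 1.9.4.1.

```shell
# Succeeds when version $1 is at least version $2.
# Hypothetical helper; relies on GNU sort -V for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.12.3"   # taken from the rmarkdown error message

if command -v pandoc >/dev/null 2>&1; then
  # The first line of `pandoc --version` looks like "pandoc 1.9.4.1".
  installed=$(pandoc --version | head -n 1 | awk '{print $2}')
  if version_ge "$installed" "$required"; then
    echo "pandoc $installed is new enough (need $required)"
  else
    echo "pandoc $installed is too old (need $required)"
  fi
else
  echo "pandoc not found on PATH"
fi
```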

knitr html to Word docx using pandoc

I have been saving some example R markdown html output to Word using pandoc. I actually only do this so I can add some page breaks for easier printing:
system("pandoc -s Exercise1.html -o Exercise1.docx")
Although the output is acceptable I was wondering if there is a way to keep the original syntax highlighting of the R chunks (just as they are in the original knit HTML document)?
Also, I seem to be losing all images in the conversion process and have to stick them into Word by hand. Is that normal?
Using the rmarkdown package (baked into RStudio Version 0.98.682, the current preview release) it's very simple to convert Rmd to docx, and code highlighting is included in the docx file.
You just need to include this at the top of your markdown text:
---
title: "Untitled" # obviously you can change this
output: word_document # specifies docx output
---
However, it seems that page breaks are still not supported in this conversion.
Why not convert the markdown directly to Word format?
Anyway, Pandoc does not support syntax highlighting in Word: "Currently, the only output formats that uses this information are HTML and LaTeX."
About the images: the Word file would definitely include those if you converted the markdown to Word directly. I am not sure about the HTML source, but I suppose you might have a path issue.

centre title in PDF converted from markdown using Pandoc

I'm converting a markdown document to a PDF document using Pandoc from within R. I'm trying to centre the title.
So far, I've tried:
<center># This is my title</center>
and
-># This is my title<-
but neither has worked. Is there a way to centre the title when converting from markdown to PDF using Pandoc?
pandoc has its own extended version of markdown. This includes a title block.
If the file begins with a title block
% my title
% Me; Someone else
% May 2013
This will be parsed into LaTeX and the resulting pdf as
\title{my title}
\author{Me \and Someone else}
\date{May 2013}
and then
\maketitle
is called within the document.
This will create the standard centred title.
If you want to change how the title etc is formatted you could use the titling package.
If you want to change how the section headers are formatted, you could use the titlesec package.
To automagically have pandoc implement these you could define your own template. A simpler option is to put your desired LaTeX preamble in a file and have pandoc include it in the header by passing the appropriate argument (e.g. -H FILE or --include-in-header=FILE).
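The header-file route might look roughly like this. It is a sketch under assumptions: header.tex, doc.md and the particular \pretitle and \posttitle settings are invented for illustration, and the output is written as LaTeX so that no TeX installation is needed to try it.

```shell
# Hypothetical preamble file: title formatting via the titling package.
cat > header.tex <<'EOF'
\usepackage{titling}
\pretitle{\begin{center}\Huge\bfseries}
\posttitle{\par\end{center}}
EOF

# Small input document using a pandoc title block (placeholder content).
cat > doc.md <<'EOF'
% my title
% Me; Someone else
% May 2013

Body text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # -H copies header.tex into the preamble of the standalone LaTeX output.
  pandoc -s -H header.tex -t latex -o doc.tex doc.md ||
    echo "conversion failed" >&2
else
  echo "pandoc not found on PATH" >&2
fi
```

Changing the output name to doc.pdf (and dropping -t latex) should produce the PDF directly, provided a LaTeX engine and the titling package are installed.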

Pandoc insert appendix after bibliography

I'm using the knitr package and pandoc in R to convert a .Rmd file to a PDF. Pandoc is linked to a .bib file and automatically inserts the bibliography at the end of the PDF.
The entries in my .bib file look like these, taken from http://johnmacfarlane.net/pandoc/demo/biblio.bib:
@Book{item1,
author="John Doe",
title="First Book",
year="2005",
address="Cambridge",
publisher="Cambridge University Press"
}
@Article{item2,
author="John Doe",
title="Article",
year="2006",
journal="Journal of Generic Studies",
volume="6",
pages="33-34"
}
To build my bibliography, I'm using the following function, taken from: http://quantifyingmemory.blogspot.co.il/2013/02/reproducible-research-with-r-knitr.html
knitsPDF <- function(name) {
library(knitr)
knit(paste0(name, ".Rmd"), encoding = "utf-8")
system(paste0("pandoc -o ", name, ".pdf ", name, ".md --bibliography /Users/.../Desktop/test.bib --csl /Users/.../Desktop/taylor-and-francis-harvard-x.csl"))
}
The contents of my .Rmd file is:
This is some text [@item1]
This is more text [@item2]
# References
The output PDF looks like this:
If I try to insert an appendix, the references still print at the end of the document, like this:
How do I insert an appendix after the references?
With newer pandoc versions, you can specify the bibliography's position with <div id="refs"></div>:
This is some text [@item1]
This is more text [@item2]
# References
<div id="refs"></div>
# appendix
Eventually reference handling will change to make it possible to put the references wherever you like (https://github.com/jgm/pandoc/issues/771), but right now there's no easy way to do it.
As suggested here, you could put your appendix in a separate file, use pandoc to convert it to a LaTeX fragment, then include that fragment using the --include-after-body flag. It would then come after the bibliography.
When working in an Rmarkdown document, enter the following text where the citations are to be located. It can be placed in any part of the document allowing other materials, like an appendix, to follow as necessary. The method relies on pandoc's fenced divs which will work in Rmarkdown.
::: {#refs}
:::
The aforementioned code should not be in an R code chunk; rather, it should be placed on lines by itself. Once processed by pandoc via knitr, this code will produce the same result as <div id="refs"></div> mentioned in the answer by @soca. The two lines of code consistently allow for exact placement of the references in any section of the document.
In the example below, references are placed first under a heading of the same name while all of the code chunks in the document are placed afterwards in a code appendix. Here is the pandoc fenced div placed in Rmarkdown that can be used to generate the image that follows.
# References
::: {#refs}
:::
# Appendix A: R Code
```{r ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}
```
Provided there is a .bib file identified in the yaml frontmatter, the preceding Rmarkdown produces output similar to the following:
Helpful links:
Pandoc User’s Guide - Placement of the Bibliography
Pandoc User’s Guide - Divs and Spans
How can the position of the bibliograpy section be set Latex format
9.6 Custom blocks (*) | R Markdown Cookbook
