Pandoc: Problem with incorrect section numbers in generated DOCX chapter - docx

I am using pandoc to generate my thesis chapters.
When I generate individual chapters, I am using the
Chapter 4: <Title of Chapter> {#chapter-4}
=======================
syntax.
In the generated HTML and the EPUB and PDF documents, the sections are correctly numbered as 4.1, 4.2, etc, but in the generated DOCX document the sections are numbered 1.1, 1.2 etc.
In the 'full thesis' DOCX document all section numbers are correct.
Does anyone know what I can do about this? It's pretty much the same pandoc call across formats so I am not sure what options I can tweak!
Command I am using to generate chapter 4 DOCX (where numbering is incorrect):
pandoc src/index.md src/chapter-4.md --output=./out/docx/chapter-4.docx -s --toc --toc-depth=4 --filter=pandoc-citeproc --self-contained --number-sections --number-offset=3
Command I am using to generate chapter 4 PDF (where numbering is correct):
pandoc src/index.md src/chapter-4.md --output=./out/pdf/chapter-4.pdf -s --toc --toc-depth=4 --filter=pandoc-citeproc --self-contained --number-sections --number-offset=3 -t html5
Command I am using for whole thesis DOCX (where numbering is correct):
pandoc src/index.md src/frontmatter.md src/chapter-1.md src/chapter-2.md src/chapter-3.md src/chapter-4.md src/chapter-5.md src/chapter-7.md src/chapter-8.md src/bibliography.md --toc --number-sections --self-contained --toc-depth=2 --filter=pandoc-citeproc --output=./out/docx/thesis.docx

Related

rticles templates do not compile [duplicate]

Knitting (in RStudio version 1.2.1335) an RMarkdown file to PDF fails when trying to create citations (for pandoc version 2.8.0.1, and R version 3.6.1). (This does not happen when knitting to HTML, for example.)
Here is a small rep. ex. in RMarkdown:
---
title: "Rep. Ex. for 'LaTeX Error: Environment cslreferences undefined'"
output:
  pdf_document: default
bibliography: report.bib
---
```{r generate-bibtex-file, include=FALSE}
knitr::write_bib(file = "report.bib", prefix = "")
```
# Used R version
R 3.6.1 [@base]
# References
Knitting this yields as final output (on my machine):
"C:/PROGRA~1/Pandoc/pandoc" +RTS -K512m -RTS RepEx.utf8.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output RepEx.tex --template "C:\Users\gcb7\Documents\R\win-library\3.6\rmarkdown\rmd\latex\default-1.17.0.2.tex" --highlight-style tango --pdf-engine pdflatex --variable graphics=yes --lua-filter "C:/Users/gcb7/Documents/R/win-library/3.6/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter "C:/Users/gcb7/Documents/R/win-library/3.6/rmarkdown/rmd/lua/latex-div.lua" --variable "geometry:margin=1in" --variable "compact-title:yes" --filter "C:/PROGRA~1/Pandoc/pandoc-citeproc.exe"
output file: RepEx.knit.md
! LaTeX Error: Environment cslreferences undefined.
This seems to have started after a recent update to pandoc 2.8.0.1, and I just found on https://pandoc.org/releases.html that in 2.8 a few changes seem to have been made to the cslreferences environment (but so far nothing seems to have appeared on pandoc-discuss or on the respective GitHub bug tracker).
Any ideas?
According to the release notes you linked, cslreferences was introduced in version 2.8, including a suitable definition of this environment in the pandoc template. However, Rmarkdown uses its own template (C:\Users\gcb7\Documents\R\win-library\3.6\rmarkdown\rmd\latex\default-1.17.0.2.tex in your case), which does not have this definition. This has been fixed on GitHub, see https://github.com/rstudio/rmarkdown/issues/1649.
One workaround would be to copy the relevant lines to a local copy of Rmarkdown's template and specify that via the template field. Alternatively you could add
\newlength{\cslhangindent}
\setlength{\cslhangindent}{1.5em}
\newenvironment{cslreferences}%
{\setlength{\parindent}{0pt}%
\everypar{\setlength{\hangindent}{\cslhangindent}}\ignorespaces}%
{\par}
or
\newenvironment{cslreferences}%
{}%
{\par}
to the resulting tex file via header-includes or similar. Or you could use the pandoc that comes with RStudio, if you have that installed. This can be accomplished by prepending <rstudio-dir>/bin/pandoc/ to the PATH, possibly within .Renviron to make it R specific.
Everything untested, since I do not have pandoc 2.8 ...
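As a rough sketch of the header-includes route, the longer definition quoted above could be written to a small preamble file from the shell. The file name csl-preamble.tex is an arbitrary choice for illustration, and this has not been tried against pandoc 2.8 either.

```shell
#!/bin/sh
# Write the fallback cslreferences definition quoted in the answer into a
# preamble file. The name csl-preamble.tex is hypothetical; any name works.
# The quoted 'EOF' keeps the shell from touching the backslashes.
cat > csl-preamble.tex <<'EOF'
\newlength{\cslhangindent}
\setlength{\cslhangindent}{1.5em}
\newenvironment{cslreferences}%
  {\setlength{\parindent}{0pt}%
  \everypar{\setlength{\hangindent}{\cslhangindent}}\ignorespaces}%
  {\par}
EOF
```

The file can then be referenced from the YAML header via the includes / in_header option of pdf_document, or passed to pandoc directly with -H csl-preamble.tex.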
I had the same issue when using thesisdown.
This was confusing, since the solution from Ralf (adding \newenvironment{cslreferences}) is already included in the template.tex file from thesisdown.
After a while I figured it out:
Changing
\newenvironment{cslreferences}% to
\newenvironment{CSLReferences}% solves the problem.
Specifically, if you are also having this problem with thesisdown, you must alter the template.tex file.
The section in template.tex should then look like this:
$if(csl-refs)$
\newlength{\cslhangindent}
\setlength{\cslhangindent}{1.5em}
\newenvironment{CSLReferences}%
{$if(csl-hanging-indent)$\setlength{\parindent}{0pt}%
\everypar{\setlength{\hangindent}{\cslhangindent}}\ignorespaces$endif$}%
{\par}
$endif$
As also described here.
It seems the default Pandoc template has also used \newenvironment{CSLReferences} since version 2.11 (see Commit).

Specifying multiple simultaneous output formats in knitr (new)

Can I write a YAML header to produce multiple output formats for an R Markdown file using knitr? I could not reproduce the functionality described in the answer for the original question with this title.
This markdown file:
---
title: "Multiple output formats"
output:
  pdf_document: default
  html_document:
    keep_md: yes
---
# This document should be rendered as an html file and as a pdf file
produces a pdf file but no HTML file.
And this file:
---
title: "Multiple output formats"
output:
  html_document:
    keep_md: yes
  pdf_document: default
---
# This document should be rendered as an html file and as a pdf file
produces an HTML file (and an md file) but no pdf file.
This latter example was the solution given to the original question. I have tried knitting with Shift-Ctrl-K and with the Knit button in RStudio, as well as calling rmarkdown::render, but only a single output format is created, regardless of the method I use to generate the output file.
Possibly related, but I could not identify solutions:
How do I produce R package vignettes in multiple formats?
Render all vignette formats #1051
knitr::pandoc can't create pdf and tex files with a single config #769
Multiple formats for pandoc #547
An allusion to multiple output format support in a three year old RStudio blog post
Using R version 3.3.1 (2016-06-21), knitr 1.14, Rmarkdown 1.3
I actually briefly mentioned in Render all vignette formats #1051 and you missed it:
rmarkdown::render('your.Rmd', output_format = 'all')
It is documented on the help page ?rmarkdown::render.
Notwithstanding Yihui Xie's authoritative answer, and with due respect to the author of a great package, there are many cases in which output_format = 'all' is sub-optimal.
One of the issues that this solution raises is that the R script is re-processed from scratch for each format. Proof:
rmarkdown::render("new.Rmd", output_format = c("html_document", "pdf_document"))
processing file: new.spin.Rmd
|....................... | 33%
ordinary text without R code
|............................................... | 67%
label: unnamed-chunk-1
|......................................................................| 100%
ordinary text without R code
output file: new.knit.md
"C:/Users/fabrn/AppData/Local/Pandoc/pandoc" +RTS -K512m -RTS new.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output new.html --lua-filter "C:\Users\fabrn\R\win-library\4.0\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\fabrn\R\win-library\4.0\rmarkdown\rmarkdown\lua\latex-div.lua" --email-obfuscation none --self-contained --standalone --section-divs --template "C:\Users\fabrn\R\win-library\4.0\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:bootstrap" --include-in-header "C:\Users\fabrn\AppData\Local\Temp\RtmpW6Vban\rmarkdown-str3490247b1f1e.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"
Output created: new.html
processing file: new.spin.Rmd
|....................... | 33%
ordinary text without R code
|............................................... | 67%
label: unnamed-chunk-1
|......................................................................| 100%
ordinary text without R code
output file: new.knit.md
"C:/Users/fabrn/AppData/Local/Pandoc/pandoc" +RTS -K512m -RTS new.utf8.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output new.tex --lua-filter "C:\Users\fabrn\R\win-library\4.0\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\fabrn\R\win-library\4.0\rmarkdown\rmarkdown\lua\latex-div.lua" --self-contained --highlight-style tango --pdf-engine pdflatex --variable graphics --variable "geometry:margin=1in"
This really is an issue when it comes to processing big data.
In real-world examples, I usually use latex output as a single rmarkdown::render output, then reprocess the .tex files using pandoc or similar tools (like prince for pdf). So my workflow is like:
rmarkdown::render('new.R', output_format = 'latex_document')
lapply(c("html", "pdf", ...),
function(form) rmarkdown::pandoc_convert("new.tex", output=paste0("new.", form)))
The bottom line is: it all depends on your data. If it is small, output_format='all' is straightforward.
If it is big, you are better off with a common-ground format (latex is a good choice, but html may be better in some cases) as input to conversion tools.

Converting Rmarkdown to PDF without RStudio

I would like to convert a *.Rmd document to PDF without RStudio being available.
Current approach
Current approach follows the following steps:
*.Rmd document is passed to knitr: knit(input = "report.Rmd")
Obtained md is converted via pandoc:
# Convert
pandoc --smart --to latex \
--latex-engine pdflatex \
-s report.md \
-o report.PDF
Problems
This results in the following problems. The top section of the Rmarkdown document:
---
title: "Report Title"
author: "Person"
output: pdf_document
classoption: landscape
---
and shows as:
all text is centered, whereas I would like for it to be left-aligned:
Possible approach
I would like to make use of the rmarkdown::render; however, despite setting RSTUDIO_PANDOC (as discussed here), the command fails on pandoc not being available.
Desired outcome
I don't care much whether the utilised mechanism makes use of the rmarkdown::render, what I want to achieve is:
Landscape page layout across all pages
Left-aligned text
Ability to exercise minimum control over the document by controlling default fonts
Ideally, I would like to do as much as in the *.Rmd file as possible without the need to add parameters to the pandoc command.
Updates, following comments
I'm working on Linux and pandoc is installed; I can execute the pandoc command, pass files and generate exports with no problems. It only fails with the rmarkdown::render function.
Concerning the hooks and *.Rmd files, this is what I'm trying to understand, as I see that the first section of my *.Rmd file is ignored. The current process looks as follows:
*.Rmd (not much in it, just title section and dummy text and code that renders but wrongly justified) >
*.R file running one line knit(input = "report.Rmd") >
*.sh file running pandoc command and generating PDF
Concerning:
if all that is in place, it is indeed just a call to
rmarkdown::render(...)
The rmarkdown::render(...) fails:
Error: pandoc version 1.12.3 is required and was not found ...
However:
>> rmarkdown::pandoc_available()
[1] TRUE
and:
$ pandoc -v
pandoc 1.9.4.1 (...)
The RSTUDIO_PANDOC points to pandoc.
A few things:
"the command fails on pandoc not being available." well you must have pandoc installed in order to call it -- but you didn't say what OS you have. On Linux it is pretty trivial to install pandoc from the package manager; otherwise jgm has binaries for you on the site; "should" be similar on OS X
for different styling you need to modify the LaTeX code which you can via numerous hooks to include macro files; see the RMarkdown cheat sheets for detail
if you want to exercise more control, you can supply your own template; I have done so in the tint package (which is also on CRAN)
if all that is in place, it is indeed just a call to rmarkdown::render(...)
Error: pandoc version 1.12.3 is required and was not found
I think the error says it plainly: you need pandoc 1.12.3 and you have pandoc 1.9.4.1
I do not know, however, why such a specific version is required.
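The version comparison is easy to misread, because 1.9 looks larger than 1.12 when read as a decimal. A minimal shell sketch of the check, assuming a sort that supports -V (GNU coreutils does); the helper name version_ge is made up for this example:

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is at least version B.
# Relies on `sort -V`, which compares dotted versions component by component.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.12.3"     # version named in the rmarkdown error message
installed="1.9.4.1"   # version reported by `pandoc -v` in the question

if version_ge "$installed" "$required"; then
    echo "pandoc $installed is new enough"
else
    echo "pandoc $installed is older than $required"
fi
```

With the two versions from the question, the second branch is taken: component 9 is smaller than component 12, so the installed pandoc is too old.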

Pandoc conversion of markdown to latex with default filename

I'm using the R package knitr to generate a markdown file test.md. This file is then processed by pandoc to produce a variety of output formats, such as html and pdf. Because I want to use bibtex when generating the pdf through latex, I believe I have to tell pandoc to stop at the intermediate latex output, and then run bibtex and pdflatex myself (twice). Here's where I found a slight annoyance in my workflow: the only way I found for pandoc to keep the intermediate tex file, and not go all the way to the pdf, was to specify a hard-coded filename through the -o option with a .tex extension. This is problematic for me because I'm using a config file to run pandoc('test.md', "latex", "config.pandoc") via knitr with options, which I would like to keep generic without hard-coded output filename:
format: latex
o: test.tex
s:
S:
biblio: refs.bib
biblatex:
template: 'template.tex'
default-image-extension: pdf
which in turn becomes the following command for pandoc,
pandoc -s -S --biblio=refs.bib --default-image-extension=pdf --biblatex --template='template.tex' -f markdown -t latex -o test.tex 'test.md'
If I skip the o: test.tex option, pandoc produces a pdf and doesn't keep the intermediate latex file. How can I keep the tex file, without specifying this hard-coded filename?
To solve this problem on my side, I added a new argument ext to the pandoc() function. It is available on Github now (knitr development version 1.3.6). You can override the default file extension, e.g.
library(knitr)
pandoc(..., ext = 'tex')
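For anyone calling pandoc from a shell script instead of through knitr, a similar effect can be had by deriving the output name from the input name. This is only a sketch of that alternative, not the knitr fix itself; the helper name tex_name is invented for illustration, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# tex_name FILE.md: print the matching .tex name (test.md -> test.tex).
# Uses POSIX parameter expansion to strip a trailing ".md".
tex_name() {
    printf '%s.tex\n' "${1%.md}"
}

input="test.md"
output=$(tex_name "$input")

# Show the command that would keep the intermediate LaTeX file.
echo "would run: pandoc -s -f markdown -t latex -o $output $input"
```

Because the .tex name is computed, the same script works for any input file without a hard-coded -o value.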

Creating a title and numbering of sections on pdf while using R markdown with pandoc

I am using the markdown document created by R and I am trying to create the pdf file from the markdown using pandoc. Everything else works fine but I want the title of the document to appear and the numbering on the sections as in default Latex document. It seems the title defined on rmarkdown appears as a section title for the pdf.
I was able to create double spacing, enter line numbers etc. by using an options.sty file. Below is the code that I used to create a pdf.
For options.sty I used:
\usepackage{setspace}
\doublespacing
\usepackage[vmargin=0.75in,hmargin=1in]{geometry}
\usepackage{lineno}
\usepackage{titlesec}
\titleformat{\section}
{\color{red}\normalfont\Large\bfseries}
{\color{red}\thesection}{1em}{}
\titleformat{\subsection}
{\color{blue}\normalfont\Large\bfseries}
{\color{blue}\thesection}{0.8em}{}
\title{Monitoring Stations}
\author{Jdbaba}
I used knitr to create the R markdown file. In the above options.sty file, it seems the program is not taking title and author part. It seems I am missing something.
The code I used to convert markdown to pdf is as follows:
pandoc -H options.sty mydata.md -o mydata.pdf
In a LaTeX document, the PDF would have automatic section numbering as well, but my PDF is missing that. Can anyone suggest how numbering can be enabled on the PDF document created using pandoc?
Thanks.
Pandoc takes the title from a title block in the Markdown file. This is a Pandoc-specific extension to Markdown. The block should have the following format:
% title
% author(s) (separated by semicolons)
% date
So, in your case:
% Monitoring Stations
% Jdbaba
% March 6, 2013
To have the sections numbered, you'll need to run Pandoc with the --number-sections option.
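Since the Markdown file is generated by knitr, the title block may need to be prepended after the fact. A small shell sketch of that step, run here on a throwaway sample file; the helper name add_title_block and the file name mydata-titled.md are made up for this example:

```shell
#!/bin/sh
# add_title_block TITLE AUTHOR DATE < input.md > output.md
# Emits a pandoc title block, a blank line, then the original document.
add_title_block() {
    printf '%% %s\n%% %s\n%% %s\n\n' "$1" "$2" "$3"
    cat
}

# Demo on a throwaway file standing in for the knitr output.
tmpdir=$(mktemp -d)
printf '# Introduction\n\nSome text.\n' > "$tmpdir/mydata.md"

add_title_block "Monitoring Stations" "Jdbaba" "March 6, 2013" \
    < "$tmpdir/mydata.md" > "$tmpdir/mydata-titled.md"

# Show the three title-block lines that pandoc will pick up.
head -n 3 "$tmpdir/mydata-titled.md"
```

The titled file would then be converted with the command from the question plus the numbering flag: pandoc -H options.sty --number-sections mydata-titled.md -o mydata.pdf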
