Reduce file size of R Markdown HTML output - r

If I create a very basic R Markdown file with no images or code and knit it to HTML, the output file is more than 700 kB in size. Is there any way to reduce the HTML file size?
Minimal Example:
---
title: "Hello world!"
output:
  html_document: default
  html_notebook: default
---
Nothing else to say, really.
The output file from html_document is 708.6 kb in size, while html_notebook is 765.7 kb.

The reason for the big file size is that knitting creates self-contained files by default and therefore embeds the JavaScript dependencies (bootstrap, highlight, jquery, navigation) as base64-encoded strings. See: http://rmarkdown.rstudio.com/html_document_format.html#document_dependencies
In your simple case the JavaScript capabilities are not required, so you could do the following:
---
title: "Hello world!"
output:
  html_document:
    self_contained: false
    lib_dir: libs
---
Nothing else to say, really.
This will create an HTML file of size ~2.7kB and a separate libs folder with the JavaScript files. However, the libs folder is nearly 4MB in size, and although you don't necessarily need the JavaScript libraries, the HTML file still tries to load them.
If you are interested in a truly minimal version you can have a look at the html_fragment output option (http://rmarkdown.rstudio.com/html_fragment_format.html):
---
title: "Hello world!"
output:
  html_fragment: default
---
Nothing else to say, really.
This will however not create a full html page but rather html content that can be included into another website. The test.html file is just 36 bytes. Still browsers will be able to display it.
As a last resort you can create a custom html template for pandoc:
http://rmarkdown.rstudio.com/html_document_format.html#custom_templates

The html_vignette format is perfect if you want a smaller file size. As described in the function documentation:
A HTML vignette is a lightweight alternative to html_document suitable for inclusion in packages to be released to CRAN. It reduces the size of a basic vignette from 100k to around 10k.
For your example:
---
title: "Hello world!"
output: rmarkdown::html_vignette
---
Nothing else to say, really.
This results in an output file of about 6kB.
You can read more about the package in the online documentation here.

The simplest, most direct way to prevent the unwanted insertion of the bootstrap libraries into the head of the HTML document is to add the YAML option theme: null.
output:
  html_document:
    theme: null
This is more desirable than self_contained: false because it does not prevent the embedding of images or other components needed to keep the document portable.
In my opinion, it is also more desirable than switching to html_vignette because it does not bring in the other changes imposed by that format.
Please remember that if your document uses a template, the theme argument is ignored and you need to specify theme = NULL in the rmarkdown::render function.
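To compare the options discussed in this thread on your own machine, something like the following may help. It is a minimal sketch, not a benchmark: size_report() and compare_formats() are hypothetical helper names, sizes are reported in units of 1000 bytes, and compare_formats() assumes that the rmarkdown package and pandoc are installed.

```r
# Hypothetical helper: report the size of each file in kB (1 kB = 1000 bytes).
size_report <- function(paths) {
  data.frame(
    file = basename(paths),
    kB = round(file.size(paths) / 1000, 1),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: render one input with several output formats and
# measure each result. `formats` is a named list of rmarkdown format objects.
compare_formats <- function(input, formats, out_dir = tempfile("sizes_")) {
  dir.create(out_dir)
  outputs <- vapply(names(formats), function(label) {
    rmarkdown::render(
      input,
      output_format = formats[[label]],
      output_file = paste0(label, ".html"),
      output_dir = out_dir,
      quiet = TRUE
    )
  }, character(1))
  size_report(outputs)
}

# Example call (not run here; "test.Rmd" is the minimal document from the question):
# compare_formats("test.Rmd", list(
#   default  = rmarkdown::html_document(),
#   no_theme = rmarkdown::html_document(theme = NULL),
#   vignette = rmarkdown::html_vignette()
# ))
```

The self_contained: false variant is left out on purpose, because its real footprint includes the separate libs folder and not just the HTML file.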

Related

Force relative paths in knitr::include_graphics()

I need the output to be portable, so all the file paths need to be relative to the root directory of the project, not absolute. I also need to set the figure size, so I can't fall back on Markdown and just use its image syntax, but knitr insists on creating an absolute path in my output. I've tried setting root.dir, base.dir and fig.path in various combinations, but I keep getting an absolute path in my output. Setting root.dir = "/" does change the path in my output, but setting it to "" or "./" still results in an absolute path, so this does not help produce a portable output.
My rmarkdown::render is being run from the same directory as the Rmd file, the output goes to this same directory, and the graphics files are in a sub-directory of the one with the notebook. Is there a way I can make knitr just use the paths I give it and not try to transform them in any way?
Issues with knitr & paths seem to be a common problem and there are a lot of stub Q&As out there in search results which often don't satisfactorily resolve the problem, or if they do resolve it the questioner and/or answer seem unclear on why it worked.
I've referred to https://yihui.org/knitr/ and https://bookdown.org/yihui/rmarkdown/ and I still don't have a satisfactory understanding of how to control output paths with knitr. Could someone help me understand how to control this, or point me in the direction of a resource that can?
OK, so I calmed down and tried to make a reprex, and I found some interesting behaviour.
The graphics folder is a symlink to a folder in the parent directory. This works fine if I use a basic html_document output, but when I use the revealjs::revealjs_presentation output I get different results. In the basic document the img tag contains:
src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD......."
and in the presentation contains:
src="/home/richardjacton/project/graphics_test/example.jpg"
Now this does not work because I need to be able to move the directory containing the presentation and its assets somewhere else and have it work once built.
If I go in with inspect element and edit it to src="./graphics/example.jpg", that works.
Also, if I make a real directory instead of a symlink to a directory in the parent, then I get src="graphics/example.jpg" in the revealjs output.
---
title: "Example"
author: "Richard J. Acton"
date: "`r Sys.Date()`"
# output:
#   html_document:
#     df_print: paged
output:
  revealjs::revealjs_presentation:
    theme: dark
    highlight: zenburn
    center: true
    self_contained: false
    reveal_plugins: ["notes"]
    reveal_options:
      slideNumber: true
      previewLinks: true
    fig_caption: true
---
# Example
```{r, echo=FALSE, include=FALSE, message=FALSE}
fs::dir_create("../graphics_test/")
download.file("https://i.ytimg.com/vi/0hMr5-05bFc/maxresdefault.jpg", "../graphics_test/example.jpg")
fs::link_create("../graphics_test", "graphics")
# fs::dir_create("graphics")
# download.file("https://i.ytimg.com/vi/0hMr5-05bFc/maxresdefault.jpg", "graphics/example.jpg")
```
```{r, out.width='80%', echo=FALSE}
knitr::include_graphics("graphics/example.jpg")
```
In my real case I'm creating a presentation from inside a {targets} project that also has a {bookdown} project in it, which produces a report based on the pipeline. So the presentation lives in a sub-directory so I can exclude it from the bookdown build. I explicitly built the presentation with an rmarkdown::render call, as the Knit button in the GUI wants to try and render the book: rmarkdown::render("test.Rmd", envir = new.env()). Also, this is all in a directory mounted into a Docker container that's running my RStudio Server instance, so I have a portable and reproducible working environment for any non-R dependencies of my pipeline.
So this is why I was looking for a resource with a more in-depth explanation of what exactly happens when I make a knitr::include_graphics call, because this is the sort of weird edge case I seem to encounter a lot and get stuck trying to debug. Thanks for the response @YihuiXie, I love your work and use it daily. Sorry for offering you the opportunity to engage with this particular headache.
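The reprex suggests the symlink is what makes the difference: a real graphics directory gives a relative src, while a symlink pointing outside the document directory gives an absolute one. A small base R check can at least confirm whether a path is a symlink and whether its resolved target still lies inside the project. This is a diagnostic sketch only, with a hypothetical helper name describe_path(); it does not reproduce knitr's or pandoc's internal path handling, and it assumes a platform that supports symlinks.

```r
# Hypothetical helper: describe how a path resolves relative to a project root.
describe_path <- function(path, root = ".") {
  target <- Sys.readlink(path)  # "" for a non-link, NA if the path does not exist
  resolved <- normalizePath(path, winslash = "/", mustWork = FALSE)
  root <- normalizePath(root, winslash = "/", mustWork = FALSE)
  list(
    is_symlink = !is.na(target) && nzchar(target),
    resolved = resolved,
    # TRUE only if the fully resolved path is still below the project root
    inside_root = startsWith(resolved, paste0(root, "/"))
  )
}

# Example call from the directory that contains the Rmd file (not run here):
# describe_path("graphics")
```

If is_symlink is TRUE and inside_root is FALSE, the assets live outside the directory you intend to move, which matches the case where the absolute path appeared.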

Output options ignored when using 'render_book' ('preamble.tex' ignored)

I'm having some trouble compiling an entire document from many Rmd files using the bookdown approach.
If I knit individual .Rmd files then 'preamble.tex' included in YAML options is taken into account.
If I render the book (with both approaches described here), then 'preamble.tex' is ignored.
To make things concrete, consider the following mwe:
preamble.tex:
\usepackage{times}
index.Rmd:
---
title: "My paper"
site: "bookdown::bookdown_site"
output:
  bookdown::pdf_document2:
    includes:
      in_header: "preamble.tex"
---
01-intro.Rmd:
# Introduction
This chapter is an overview of the methods that we propose to solve an **important problem**.
Then, by knitting 'index.Rmd' or '01-intro.Rmd' the font indicated in 'preamble.tex' is used.
However when rendering with bookdown::render_book('index.Rmd',"bookdown::pdf_book", new_session = T) it is simply ignored.
What is more, in my actual project there are other output options that end up ignored. For example, I use toc: false and it works when knitting single files, but fails when rendering the document.
In this simple example it would be okay to use a single file, but my actual project has many chapters with R chunks within each of them. Thus, building a single file doesn't seem a good idea.
I appreciate any hints on what I am missing here.
Thanks in advance.
What you are missing here is that in your YAML header, preamble.tex is included for the bookdown::pdf_document2 output format and not for bookdown::pdf_book, the format you pass to the output_format argument in bookdown::render_book(). For this reason, other YAML options (like toc: false) do not work either.
Running
bookdown::render_book('index.Rmd', "bookdown::pdf_document2", new_session = T)
instead should work.

Custom highlighting style in rmarkdown

Is there a way to use a custom highlighting style in rmarkdown?
The manual is a bit silent on this; the closest option is to write a full-blown custom CSS file for everything, which would however work only for html_document and not for pdf_document (see https://bookdown.org/yihui/rmarkdown/html-document.html#appearance-and-style )
The newer versions of Pandoc support this:
http://pandoc.org/MANUAL.html#syntax-highlighting
but when anything else but one of the default pandoc styles is specified, rmarkdown throws an error.
For example, when I download zenburn.css from highlight.js library, modify it and want to use it:
```
title: Some title
output:
  html_document:
    theme: readable
    highlight: zenburn.css
```
I get:
Error in match.arg(highlight, html_highlighters()) :
'arg' should be one of “default”, “tango”, “pygments”, “kate”, “monochrome”, “espresso”, “zenburn”, “haddock”, “textmate”
Calls: ... -> pandoc_html_highlight_args -> match.arg
Execution halted
For posterity, since this took more time than it should:
The problem
The answer from @martin_schmelzer is correct (I didn't test @tarleb's solution, since rmarkdown supposedly doesn't play well with Pandoc > 2.0, see: https://github.com/rstudio/rmarkdown/issues/1471 ). However, when you write echo=TRUE on a chunk, the printed output is not tagged as R code, and because of that different rules apply to it. In HTML this means that it has a white background, and in PDF it is formatted through a verbatim environment only. For example, the markdown of the following:
```{r, echo=TRUE}
foo = "bar"
foo
```
will be:
```r
foo = "bar"
foo
```
```
## [1] "bar"
```
And while the first chunk will be highlighted, the second will follow only in text color, but background will always be white. This is very problematic with darker themes, since they often have very light text color and this does not play nicely with white background. See: https://eranraviv.com/syntax-highlighting-style-in-rmarkdown/ for overview of rmarkdown highlighting styles.
HTML solution
This is made more complicated by the switch between highlight.js and pandoc highlighting. If highlighting is not specified, highlight.js is used with its associated tags. There the highlighting is done through an external CSS and .js library, which are then (I presume) embedded into the HTML to make it stand-alone. So no luck there.
If some highlighting style is used however, then pandoc highlighting is used. I.e.,:
---
title: "Foo"
output:
  html_document:
    theme: readable
    highlight: zenburn
---
In this case, there is a solution. Looking at the HTML output, there is this structure:
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
This means that whenever a code block has no class (this applies only to the output of "echo" chunks, since by default rmarkdown assumes R), the background is white. This behaviour can be changed just by including the following chunk in the .Rmd file:
```{css, echo = FALSE}
pre:not([class]) {
color: #333333;
background-color: #cccccc;
}
```
and the behaviour of these blocks can be fully specified accordingly. Here the color and background are the reverse of the zenburn style.
PDF solution
With PDF, it is a little easier to find the problem, but a little more complicated to solve it. If one looks at the .tex file, one can see that while all the chunks with actual code have a lot going on around them, the echo output is wrapped only in a simple verbatim environment.
While the result is more readable than the HTML output, since it does not share the text color defined by the highlighting style, it kind of blends into the text and breaks the feeling of a uniform style across outputs. The solution is, as mentioned in the previous answer, to use:
---
title: "Foo"
output:
  pdf_document:
    highlight: zenburn
    includes:
      in_header: highlight_echo.tex
---
And the following construct, which uses the framed package that is already included:
\usepackage{xcolor}
\definecolor{backgroundecho}{HTML}{cccccc}
\definecolor{textecho}{HTML}{333333}
\let\oldverbatim=\verbatim
\let\oldendverbatim=\endverbatim
\makeatletter
\renewenvironment{verbatim}{
  \def\FrameCommand{
    \hskip-\fboxsep
    \color{textecho}
    \colorbox{backgroundecho}
  }
  \MakeFramed{\@setminipage}
  \oldverbatim
}
{
  \oldendverbatim
  \vskip-2em\@minipagefalse % The size required for this negative space is probably in some variable
  \endMakeFramed
}
\makeatother
This redefines the verbatim environment to have a colored background and colored text. The result is unified formatting of the "echo" chunks.
So thanks again @tarleb and @martin_schmelzer, I wouldn't have been able to solve it without you!
It appears that you are trying to use a CSS file as a highlight style. This won't work (in general), as pandoc expects highlighting styles to be defined using a special JSON format. To use a modified zenburn, one will have to create a new style file via pandoc --print-highlight-style zenburn > myzenburn.style, and then modify the new file myzenburn.style.
To use the new style, one must circumvent R Markdown by passing the necessary options directly to pandoc.
output:
  html_document:
    theme: readable
    pandoc_args: --highlight-style=myzenburn.style
However, this will only work for non-HTML output formats, as knitr interferes whenever highlight.js can be used.
At least for HTML documents, you can simply include your customized styles using the css YAML option:
---
title: Some title
output:
  html_document:
    theme: readable
    css: zenburn.css
---
Concerning PDF documents, you could check the intermediate TeX file. There you will find a block of commands that look like
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.96,0.35,0.01}{\textit{#1}}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.93,0.29,0.53}{\textbf{#1}}}
These are the lines that define the code highlighting. The first one for example defines the color for comments. You could write a header.tex in which you redefine these commands using \renewcommand
\renewcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\renewcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
and include it in your document right before the body.
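If you have several token colours to override, the \renewcommand lines can be generated from R rather than typed by hand. A minimal sketch, assuming a hypothetical helper tok_override() (it is not part of rmarkdown or pandoc); the token names and colours are the ones from the \renewcommand example above:

```r
# Hypothetical helper: build one \renewcommand line for a pandoc highlighting token.
tok_override <- function(token, rgb, style = "textit") {
  stopifnot(length(rgb) == 3, all(rgb >= 0 & rgb <= 1))
  sprintf(
    "\\renewcommand{\\%s}[1]{\\textcolor[rgb]{%s}{\\%s{#1}}}",
    token,
    paste(sprintf("%.2f", rgb), collapse = ","),
    style
  )
}

overrides <- c(
  tok_override("CommentTok", c(0.56, 0.35, 0.01), style = "textit"),
  tok_override("KeywordTok", c(0.13, 0.29, 0.53), style = "textbf")
)

# Write the overrides to a file; tempdir() is used only to keep the sketch
# side-effect free, in practice the file would sit next to the Rmd.
header_file <- file.path(tempdir(), "header.tex")
writeLines(overrides, header_file)
```

The resulting file can then be referenced from the includes option of pdf_document (before_body, to match "right before the body").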
Here is an example in which we alter the highlighting of comments and keywords within the body:
---
title: Some title
output:
  pdf_document:
    keep_tex: true
---
```{r}
# This is a test
head(mtcars)
```
\renewcommand{\CommentTok}[1]{\textcolor[rgb]{0.96,0.35,0.01}{\textit{#1}}}
\renewcommand{\KeywordTok}[1]{\textcolor[rgb]{0.93,0.29,0.53}{\textbf{#1}}}
```{r}
# This is a test
head(mtcars)
```

Importing common YAML in rstudio/knitr document

I have a few Rmd documents that all have the same YAML frontmatter except for the title. How can I keep this frontmatter in one file and have it used for all the documents? It is getting rather large and I don't want to keep every file in step every time I tweak the frontmatter.
I want to still
use the Knit button/Ctrl+Shift+K shortcut in RStudio to do the compile
keep the whole setup portable: would like to avoid writing a custom output format or overriding rstudio.markdownToHTML (as this would require me to carry around a .Rprofile too)
Example
common.yaml:
author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References
# many other options
an example document
---
title: On the Culinary Preferences of Anthropomorphic Cats
---
I do not like green eggs and ham. I do not like them, Sam I Am!
Desired output:
The compiled example document (ie either HTML or PDF), which has been compiled with the metadata in common.yaml injected in. The R code in the YAML (in this case, the date) would be compiled as a bonus, but it is not necessary (I only use it for the date which I don't really need).
Options/Solutions?
I haven't quite got any of these working yet.
With rmarkdown one can create a _output.yaml to put common YAML metadata, but this will put all of that metadata under output: in the YAML so is only good for options under html_document: and pdf_document:, and not for things like author, date, ...
write a knitr chunk to import the YAML, e.g.
---
title: On the Culinary Preferences of Anthropomorphic Cats
```{r echo=F, results='asis'}
cat(readLines('common.yaml'), sep='\n')
```
---
I do not like green eggs and ham. I do not like them, Sam I Am!
This works if I knit('input.Rmd') and then pandoc the output, but not if I use the Knit button from RStudio (which I assume calls render), because this parses the metadata first before running knitr, and the metadata is malformed until knitr has been run.
Makefile: if I was clever enough I could write a Makefile or something to inject common.yaml into input.Rmd, then run rmarkdown::render(), and somehow hook it up to the Knit button of Rstudio, and perhaps somehow save this Rstudio configuration into the .Rproj file so that the whole thing is portable without me needing to edit .Rprofile too. But I'm not clever enough.
EDIT: I had a go at this last option and hooked up a Makefile to the Build command (Ctrl+Shift+B). However, this will build the same target every time I use it via Ctrl+Shift+B, and I want to build the target that corresponds with the Rmd file I currently have open in the editor [as for Ctrl+Shift+K].
Have found two options to do this portably (ie no .Rprofile customisation needed, minimal duplication of YAML frontmatter):
You can provide common yaml to pandoc on the command-line! d'oh!
You can set the knit: property of the metadata to your own function to have greater control over what happens when you Ctrl+Shift+K.
Option 1: common YAML to command line.
Put all the common YAML in its own file
common.yaml:
---
author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References
---
Note it's complete, ie the --- are needed.
Then in the document you can specify the YAML as the last argument to pandoc, and it'll apply the YAML (see this github issue)
in example.rmd:
---
title: On the Culinary Preferences of Anthropomorphic Cats
output:
  html_document:
    pandoc_args: './common.yaml'
---
I do not like green eggs and ham. I do not like them, Sam I Am!
You could even put the html_document: stuff in an _output.yaml since rmarkdown will take that and place it under output: for all the documents in that folder. In this way there can be no duplication of YAML between all documents using this frontmatter.
Pros:
no duplication of YAML frontmatter.
very clean
Cons:
the common YAML is not passed through knit, so the date field above will not be parsed. You will get the literal string "r format(Sys.time(), format='%Y-%m-%d %H:%M:%S %z')" as your date.
from the same github issue:
Metadata definitions seen first are kept and left unchanged, even if conflicting data is parsed at a later point.
Perhaps this could be a problem at some point depending on your setup.
Option 2: override the knit command
This allows for much greater control, though is a bit more cumbersome/tricky.
This link and this one mention an undocumented feature in rmarkdown: the knit: part of the YAML will be executed when one clicks the "Knit" button of Rstudio.
In short:
define a function myknit(inputFile, encoding) that would read the YAML, put it in to the RMD and call render on the result. Saved in its own file myknit.r.
in the YAML of example.rmd, add
knit: (function (...) { source('myknit.r'); myknit(...) })
It seems to have to be on one line. The reason for source('myknit.r') instead of just putting the function definition in the YAML is portability. If I modify myknit.r I don't have to modify every document's YAML. This way, the only common YAML that all documents must repeat in their frontmatter is the knit line; all other common YAML can stay in common.yaml.
Then Ctrl+Shift+K works as I would hope from within Rstudio.
Further notes:
myknit could just be a system call to make if I had a makefile setup.
the injected YAML will be passed through rmarkdown and hence knitted, since it is injected before the call to render.
Preview window: so long as myknit produces a (single) message Output created: path/to/file.html, then the file will be shown in the preview window.
I have found that there can be only one such message in the output [not multiple], or you get no preview window. So if you use render (which makes an "Output created: basename.extension") message and the final produced file is actually elsewhere, you will need to suppress this message via either render(..., quiet=T) or suppressMessages(render(...)) (the former suppresses knitr progress and pandoc output too), and create your own message with the correct path.
Pros:
the YAML frontmatter is knitted
much more control than option 1 if you need to do custom pre- / post-processing.
Cons:
a bit more effort than option 1
the knit: line must be duplicated in each document (though by source('./myknit.r') at least the function definition may be stored in one central location)
Here is the setup for posterity. For portability, you only need to carry around myknit.r and common.yaml. No .Rprofile or project-specific config needed.
example.rmd:
---
title: On the Culinary Preferences of Anthropomorphic Cats
knit: (function (...) { source('myknit.r'); myknit(...) })
---
I do not like green eggs and ham. I do not like them, Sam I Am!
common.yaml [for example]:
author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References
myknit.r:
myknit <- function (inputFile, encoding, yaml='common.yaml') {
  # read in the YAML + src file
  yaml <- readLines(yaml)
  rmd <- readLines(inputFile)
  # insert the YAML in after the first ---
  # I'm assuming all my RMDs have properly-formed YAML and that the first
  # occurrence of --- starts the YAML. You could do proper validation if you wanted.
  yamlHeader <- grep('^---$', rmd)[1]
  # put the yaml in
  rmd <- append(rmd, yaml, after=yamlHeader)
  # write out to a temp file
  ofile <- file.path(tempdir(), basename(inputFile))
  writeLines(rmd, ofile)
  # render with rmarkdown.
  message(ofile)
  ofile <- rmarkdown::render(ofile, encoding=encoding, envir=new.env())
  # copy back to the current directory.
  file.copy(ofile, file.path(dirname(inputFile), basename(ofile)), overwrite=TRUE)
}
Pressing Ctrl+Shift+K/Knit from the editor of example.rmd will compile the result and show a preview. I know it is using common.yaml, because the result includes the date and author whereas example.rmd on its own does not have a date or author.
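One detail to watch: the common.yaml used for Option 1 includes the --- delimiters, while the one injected by myknit must not, or the front matter would be closed too early. A sketch of an injection step that tolerates both, assuming a hypothetical helper inject_yaml() that could stand in for the append() call in myknit:

```r
# Hypothetical helper: insert shared YAML lines into a document's front matter.
inject_yaml <- function(rmd, yaml) {
  # Drop YAML document delimiters so a "complete" common.yaml can be reused
  yaml <- yaml[!grepl("^(---|\\.\\.\\.)\\s*$", yaml)]
  # Locate the opening --- of the document's own front matter
  start <- grep("^---\\s*$", rmd)[1]
  if (is.na(start)) stop("no YAML front matter found in the document")
  append(rmd, yaml, after = start)
}

rmd <- c("---", "title: Cats", "---", "Body text.")
common <- c("---", "author: me", "link-citations: true", "---")
merged <- inject_yaml(rmd, common)
```

The sketch does not detect duplicate keys; if the document and common.yaml both define the same field, both lines end up in the header.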
The first solution proposed by @mathematical.coffee is a good approach, but the example it gave did not work for me (maybe because the syntax has changed). As noted there, this is possible by providing pandoc arguments in the YAML header. For example,
this is the content of a header.yaml file:
title: "Crime and Punishment"
author: "Fyodor Dostoevsky"
Add this to the beginning of the RMarkdown file:
---
output:
  html_document:
    pandoc_args: ["--metadata-file=header.yaml"]
---
See the pandoc manual for the --metadata-file argument.

Knitr & Rmarkdown docx tables

When using knitr and rmarkdown together to create a word document you can use an existing document to style the output.
For example in my yaml header:
output:
  word_document:
    reference_docx: style.docx
    fig_caption: TRUE
Within this style document I have created a default table style. The goal here is to have the kable table output in the correct style.
When I knit the Word document and use style.docx, the tables are not styled according to the table style.
Using the style inspector has not been helpful so far; I am unsure whether the default table style is the wrong style to modify.
Example Code:
```{r kable}
n <- 100
x <- rnorm(n)
y <- 2*x + rnorm(n)
out <- lm(y ~ x)
library(knitr)
kable(summary(out)$coef, digits=2, caption = "Test Captions")
```
I do not have a stylized document I can upload for testing unfortunately.
TL;DR: Want to stylise table output from rmarkdown and knitr automatically (via kable)
Update: So far I have found that changing the 'compact' style in the docx will alter the text contents of the table automatically - but this does not address the overall table styling such as cell colour and alignment.
Update 2: After more research and creation of styles I found that knitr seems to have no problem accessing paragraph styles. However table styles are not under that style category and don't seem to apply in my personal testing.
Update 3: Dabbled with the ReporteRs package. Whilst it was able to produce the tables as desired, the syntax required to do so is laborious. I would much rather the style be applied automatically.
Update 4: You cannot change TableNormal style, nor does setting a Table Normal style work. The XML approach is not what we are looking for. I have a VBA macro that will do the trick, just want to remove that process if possible.
This is essentially a combination of the answer that recommends TableNormal, this post on rmarkdown.rstudio.com and my own experiments to show how to use a TableNormal style to customize tables like those generated by kable:
RMD:
---
output:
  word_document
---
```{r}
knitr::kable(cars)
```
Click "Knit Word" in RStudio. → The document opens in Word, without any custom styles yet.
In that document (not in a new document), add the required styles. This article explains the basics. Key is not to apply direct styles but to modify the styles. See this article on support.office.com on Style basics in Word.
Specifically, to style a table you need to add a table style. My version of Word is non-English, but according to the article linked above table styles are available via "the Design tab, on the Table Tools contextual tab".
Choose TableNormal as style name and define the desired styles. In my experiments most styles worked, however some did not. (Adding a color to the first column and making the first row bold was no problem; highlighting every second row was ignored.) The last screenshot in this answer illustrates this step.
Save the document, e.g. as styles.docx.
Modify the header in the RMD file to use the reference DOCX (see here; don't screw up the indentation – took me 10 minutes find this mistake):
---
output:
  word_document:
    reference_docx: styles.docx
---
Knit to DOCX again – the style should now be applied.
Following the steps I described above yields this output:
And here a screenshot of the table style dialog used to define TableNormal. Unfortunately it is in German, but maybe someone can provide an English version of it:
As this does not seem to work for most users (anyone but me …), I suggest we test this systematically. Essentially, there are 4 steps that can go wrong:
Wrong RMD (unlikely).
Differences in the initially generated DOCX.
Differences in how the TableNormal style is saved in the DOCX.
Differences in how the reference DOCX is used to format the final DOCX.
I therefore suggest using the same minimal RMD posted above (full code on pastebin) to find out where the results start to differ:
My initially generated DOCX.
The same document with TableNormal added: reference.docx
The final document.
The three files are generated on the following system: Windows 7 / R 3.3.0 / RStudio 0.99.896 / pandoc 1.15.2 / Office 2010.
I get the same results on a system with Windows 7 / R 3.2.4 / RStudio 0.99.484 / pandoc 1.13.1 / Office 2010.
I suppose the most likely culprits are the pandoc and the Office versions. Unfortunately, I cannot test other configurations at the moment. Now it would be interesting to see the following: For users where it does not work, what happens …
… if you start from my initial.docx?
If that does not work, what if you use my reference.docx as reference document?
If nothing works, are there eye-catching differences in the generated XML files (inside the DOCX container)? Please share your files and exact version information.
With a number of users running these tests it should be possible to find out what is causing the problems.
This was actually a known issue. Fortunately, it was solved in v2.0 or later releases of pandoc.
And I have tested the newer version, and found that there is a newly added hidden style called "Table". Following @CL.'s suggestion and changing the "Table" style in reference.docx works now.
In addition, look at this entry of pandoc's v2.0 release notes:
Use Table rather than Table Normal for table style (#3275). Table Normal is the default table style and can’t be modified.
As of 2021, I could not get any of the other suggested answers to work.
However, I did discover the {officedown} package, which, amongst other things, supports the styling of tables in .docx documents. You can install {officedown} with remotes::install_github("davidgohel/officedown")
To use {officedown} to render .Rmd to .docx you must replace
output:
  word_document
in your document header with
output:
  officedown::rdocx_document
In addition to this the {officedown} package must be loaded in your .Rmd.
As with the word_document output format, {officedown} allows us to use styles and settings from template documents, again with the reference_docx parameter.
With a reference document styles.docx, a minimal example .Rmd may look like:
---
date: "2038-01-19"
author: "The Reasonabilists"
title: "The end of time as we know it"
output:
  officedown::rdocx_document:
    reference_docx: styles.docx
---
```{r setup, include = FALSE}
# Don't forget about me: I'm important!
library("officedown")
```
{officedown} allows us to go one step further and specify the name of the table style to use in the document's front matter. This table style could be a custom style we created in styles.docx, or it could be one of Word's in-built styles you prefer.
Let's say we created a style My Table:
We could tell {officedown} to use this table style in our front matter as:
output:
  officedown::rdocx_document:
    reference_docx: styles.docx
    tables:
      style: My Table
Putting this all together, knitting the minimal .Rmd:
---
date: "2038-01-19"
author: "The Reasonabilists"
title: "The end of time as we know it"
output:
  officedown::rdocx_document:
    reference_docx: styles.docx
    tables:
      style: My Table
---
```{r setup, include = FALSE}
# Don't forget about me: I'm important!
library(officedown)
```
```{r}
head(mtcars)
```
Resulting in a .docx document which looks like:
TableNormal doesn't work for me either.
On my Dutch version of Word 2016 (Office 365), I found out that I could markup tables with the style Compact.
Input (refdoc.docx contains the Compact style):
---
title: "Titel"
subtitle: "Ondertitel"
author: "`r Sys.getenv('USERNAME')`"
output:
  word_document:
    toc: true
    toc_depth: 2
    fig_width: 6.5
    fig_height: 3.5
    fig_caption: true
    reference_docx: "refdoc.docx"
---
And RMarkdown:
# Methoden {#methoden}
```{r}
knitr::kable(cars)
```
Output:
You need to have a reference_docx: style.docx which has the "Table" style in it (see @Liang Zhang's explanation and links above).
Create a basic reference document using pandoc (source). On the command line (or cmd.exe on Windows) run:
pandoc -o custom-reference.docx --print-default-data-file reference.docx
In this newly created custom-reference.docx file, find the table created (a basic 1 row table with a caption).
While the table is selected, click "Table Design" and find "Modify Table Style":
Modify the style of the table as you wish and use this reference document in your RMD document (see the first answer by @CL.).
Using this reference document, you can also change the table and figure caption styles.
I was able to get my word output to use a default table style that I defined in a reference .docx.
Instead of 'TableNormal', the table style it defaulted to was 'Table'.
I discovered this by knitting an rmarkdown with a kable.
---
date: "December 1, 2017"
output:
  word_document:
    reference_docx: Template.docx
---
`r knitr::kable(source)`
Then I took a look at that generated document's XML to see what style it had defaulted to.
library(XML)
docx.file <- "generated_doc.docx"
## unzip the docx converted by Pandoc
unzip(docx.file, exdir = "temp_dir")
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
## find every table style reference in the document body
tblStyle <- getNodeSet(xmlRoot(doc), "//w:tblStyle")
tblStyle
I defined the 'Table' style to put some color and borders in the reference docx. This works for one standard table style throughout the document, I haven't found a way to use different styles throughout.
This stayed true even after I opened the reference doc and edited it.
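The same check can be done without the XML package or an external unzip program. A minimal sketch using only base R, assuming hypothetical helpers table_styles_in_xml() and table_styles_in_docx(); it relies on a simple regular expression rather than a real XML parser, so treat it as a quick diagnostic only:

```r
# Hypothetical helper: pull the w:val of every <w:tblStyle> element out of raw XML text.
table_styles_in_xml <- function(xml) {
  xml <- paste(xml, collapse = "")
  hits <- regmatches(xml, gregexpr('<w:tblStyle w:val="[^"]*"', xml))[[1]]
  unique(sub('.*w:val="([^"]*)"', "\\1", hits))
}

# Hypothetical helper: read word/document.xml straight from the .docx (a zip archive).
table_styles_in_docx <- function(docx) {
  con <- unz(docx, "word/document.xml")
  on.exit(close(con))
  table_styles_in_xml(readLines(con, warn = FALSE))
}

# Example call on the knitted file (not run here):
# table_styles_in_docx("generated_doc.docx")

sample_xml <- '<w:tbl><w:tblPr><w:tblStyle w:val="Table" /></w:tblPr></w:tbl>'
found <- table_styles_in_xml(sample_xml)
```

Whatever name this returns for your knitted document is the style to define in the reference docx.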
