I have tried styling an MS Word document from R Markdown, but I can't seem to get it right. What am I doing wrong? Below is the code:
---
title: "Test Document"
author: "Moses Otieno"
date: "05/04/2021"
output:
  word_document:
    reference_docx: referent-doc.docx
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
## Test
This is the test
Referent Document https://www.dropbox.com/scl/fi/dyww5eiga334j55t58vcp/referent-doc.docx?dl=0&rlkey=o1ejilu3dfnncar65irh5er7v
Resulting document https://www.dropbox.com/scl/fi/mieqkcvy22eeighdjaby3/Test.docx?dl=0&rlkey=8wyrci2cg0ijyxbfdw5ks1znh
In a nutshell, make a template using an already knitted document containing your content and different types of headers etc.
I had a similar problem that was fixed by knitting my document and saving the resulting Word file (.docx) as "template". I now had a good working file with lots of content examples.
Next, I went into this file and manually changed the styles using "styles" in the Home tab. Open up this feature in your new "template" document by using the little down arrow on the bottom right-hand corner. Click on the various paragraphs and headers in your document to see what they refer to (the style will jump about on the dropdown menu) and make your changes. Save again. Now when you want to knit a new document, it will hopefully apply your changes.
Be aware that you might get caught out by the "Body Text" and "First Paragraph", so you might need to change both of these separately.
output:
  word_document:
    reference_docx: template.docx
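To see which style names pandoc actually wrote into the knitted file (and hence which ones to edit in the template), one option is to read them straight out of the .docx, which is a zip archive. This is a minimal base-R sketch, not a definitive tool: the function names `used_styles` and `docx_used_styles` and the file name `Test.docx` are assumptions for illustration, and the regular expression assumes the style references look like `<w:pStyle w:val="..."/>` or `<w:tblStyle w:val="..."/>`.

```r
# Extract the distinct paragraph and table style ids referenced in the
# text of a document.xml file (passed in as one character string).
used_styles <- function(xml_text) {
  pattern <- '<w:(pStyle|tblStyle) w:val="[^"]+"'
  hits <- regmatches(xml_text, gregexpr(pattern, xml_text))[[1]]
  sort(unique(sub('.*w:val="([^"]+)"', "\\1", hits)))
}

# Read word/document.xml directly from a knitted .docx and list its styles.
docx_used_styles <- function(docx_path) {
  con <- unz(docx_path, "word/document.xml")
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE, encoding = "UTF-8")
  used_styles(paste(lines, collapse = ""))
}

# Example call (file name assumed):
# docx_used_styles("Test.docx")
```

If the result lists, for example, separate ids for the first paragraph and for body text, that would match the caveat above about changing both styles.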
In the YAML part of your project, you have to change your referent.docx into "referent-doc.docx", surrounded by quotation marks (" "):
output:
  word_document:
    reference_docx: "referent-doc.docx"
# quoted with quotation marks, and here located in the same folder
P.S. Be sure to produce a good reference document: start from a .docx file produced by R Markdown (e.g., using the Knit button) so that it contains a few titles, text paragraphs, figures and tables. Then edit the styles, margins, etc. in Microsoft Word and save it as a .docx file.
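Since a wrong file name or location is an easy mistake here, a small check before knitting can help. This is a minimal sketch in base R: the helper name `check_reference_docx` is hypothetical, the file names in the example call are assumed, and it assumes a relative reference_docx path is looked up in the folder of the .Rmd file.

```r
# Hypothetical helper: stop early if the reference document named in the YAML
# header is not where it is expected (assumed: the folder of the .Rmd file).
check_reference_docx <- function(rmd_path, reference_docx) {
  candidate <- file.path(dirname(rmd_path), reference_docx)
  if (!file.exists(candidate)) {
    stop("Reference document not found: ", candidate)
  }
  invisible(candidate)
}

# Example call (file names assumed):
# check_reference_docx("Test.Rmd", "referent-doc.docx")
```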
I've been trying to solve some HTML knitting issues. My HTML does not currently allow me to use HTML in the code, and thus I am unable to create tabsets.
However, while trying to solve that issue a new issue occurred: my HTML output adds a clickable # behind each # Header.
I use the basic Rmarkdown format:
---
title: "Try"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
## Including Plots
And then the output shows me this:
Does anyone have any idea how to solve this?
This is a new feature introduced in the current development version of rmarkdown. See the NEWS file for more info. To disable this feature, you may use:
output:
  html_document:
    anchor_sections: false
I think the question is quite self-explanatory but for avoidance of doubt I'll explain with more detail below:
I have an R Markdown document that works well if converted to HTML or uploaded to GitHub. When converting to PDF (using LaTeX), the results are not so pretty. I find that the biggest problem in a LaTeX PDF document is line breaks. I can fix the line break issue in the PDF document by adding "\ " characters, but that throws my HTML document out of whack too.
Is there a way to manually add line breaks (or "space before/after paragraphs") for the PDF output only?
Thank you!
You can redefine the relevant spacings in the YAML header. \parskip controls the paragraph spacing. Code blocks are shaded using a snugshade environment from the framed package. We can also redefine the shaded environment for code blocks to have some vertical space at the start. Here's a reproducible example. Note: I also added the keep_tex parameter so you can see exactly what the generated tex file looks like, in case this is useful:
---
title: "test"
author: "A.N. Other"
header-includes:
  - \setlength{\parskip}{\baselineskip}
  - \renewenvironment{Shaded}{\vspace{\parskip}\begin{snugshade}}{\end{snugshade}}
output:
  pdf_document:
    keep_tex: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
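If the extra space should be inserted at specific places rather than globally, another possible approach is to emit the LaTeX only when the document is being knitted to PDF. This is a minimal sketch using knitr::is_latex_output(); the helper name `pdf_only_space` and the choice of \vspace{\baselineskip} are assumptions, not part of the answer above.

```r
# Assumed helper: returns a LaTeX spacing command when knitting to PDF/LaTeX
# and an empty string for every other output format (HTML, Word, ...).
pdf_only_space <- function(space = "\\vspace{\\baselineskip}") {
  if (knitr::is_latex_output()) space else ""
}
```

It would be called inline in the text, e.g. `r pdf_only_space()` on its own line between two paragraphs, so the HTML output stays unchanged.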
Once you output to HTML, you can just print the HTML page to PDF. That might be an easy way to keep the original format.
How can I change the rmarkdown settings so that a new paragraph starts with an indented first line (the default in LaTeX) rather than with blank space and no indentation?
That is what I normally get:
---
output: pdf_document
---
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
That is what I want:
In the header of your Rmd document, you can add LaTeX includes. To indent the paragraph, just change parindent, e.g.
output: pdf_document
header-includes:
  - \setlength{\parindent}{4em}
  - \setlength{\parskip}{0em}
Alternatively, you could store the LaTeX commands in a separate file:
output:
  pdf_document:
    includes:
      in_header: header.tex
See the advanced customization section.
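The separate file can be created by hand or from R. This is a minimal sketch that writes the two settings from this answer into header.tex; the function name `write_header_tex` is an assumption.

```r
# Write the paragraph settings into a LaTeX header file that can be
# referenced via includes: in_header: in the YAML header.
write_header_tex <- function(path = "header.tex") {
  header_lines <- c(
    "\\setlength{\\parindent}{4em}",  # indent the first line of each paragraph
    "\\setlength{\\parskip}{0em}"     # no extra vertical space between paragraphs
  )
  writeLines(header_lines, path)
  invisible(path)
}

# Example call, writing header.tex into the working directory:
# write_header_tex()
```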
When using knitr and rmarkdown together to create a word document you can use an existing document to style the output.
For example in my yaml header:
output:
  word_document:
    reference_docx: style.docx
    fig_caption: TRUE
Within this style document I have created a default table style; the goal is to have the kable table output in the correct style.
When I knit the Word document using style.docx, the tables are not styled according to the table style.
Using the style inspector has not been helpful so far, unsure if the default table style is the incorrect style to modify.
Example Code:
```{r kable}
n <- 100
x <- rnorm(n)
y <- 2*x + rnorm(n)
out <- lm(y ~ x)
library(knitr)
kable(summary(out)$coef, digits=2, caption = "Test Captions")
```
I do not have a stylized document I can upload for testing unfortunately.
TL;DR: Want to stylise table output from rmarkdown and knitr automatically (via kable)
Update: So far I have found that changing the 'compact' style in the docx will alter the text contents of the table automatically - but this does not address the overall table styling such as cell colour and alignment.
Update 2: After more research and creation of styles I found that knitr seems to have no problem accessing paragraph styles. However table styles are not under that style category and don't seem to apply in my personal testing.
Update 3: Dabbled with the ReporteRs package. Whilst it was able to produce the tables as desired, the syntax required to do so is laborious. I would much rather have the style applied automatically.
Update 4: You cannot change TableNormal style, nor does setting a Table Normal style work. The XML approach is not what we are looking for. I have a VBA macro that will do the trick, just want to remove that process if possible.
This is essentially a combination of the answer that recommends TableNormal, this post on rmarkdown.rstudio.com and my own experiments to show how to use a TableNormal style to customize tables like those generated by kable:
RMD:
---
output:
  word_document
---
```{r}
knitr::kable(cars)
```
Click "Knit Word" in RStudio. → The document opens in Word, without any custom styles yet.
In that document (not in a new document), add the required styles. This article explains the basics. Key is not to apply direct styles but to modify the styles. See this article on support.office.com on Style basics in Word.
Specifically, to style a table you need to add a table style. My version of Word is non-English, but according to the article linked above table styles are available via "the Design tab, on the Table Tools contextual tab".
Choose TableNormal as style name and define the desired styles. In my experiments most styles worked, however some did not. (Adding a color to the first column and making the first row bold was no problem; highlighting every second row was ignored.) The last screenshot in this answer illustrates this step.
Save the document, e.g. as styles.docx.
Modify the header in the RMD file to use the reference DOCX (see here; don't screw up the indentation – took me 10 minutes to find this mistake):
---
output:
  word_document:
    reference_docx: styles.docx
---
Knit to DOCX again – the style should now be applied.
Following the steps I described above yields this output:
And here a screenshot of the table style dialog used to define TableNormal. Unfortunately it is in German, but maybe someone can provide an English version of it:
As this does not seem to work for most users (anyone but me …), I suggest we test this systematically. Essentially, there are 4 steps that can go wrong:
Wrong RMD (unlikely).
Differences in the initially generated DOCX.
Differences in how the TableNormal style is saved in the DOCX.
Differences in how the reference DOCX is used to format the final DOCX.
I therefore suggest using the same minimal RMD posted above (full code on pastebin) to find out where the results start to differ:
My initially generated DOCX.
The same document with TableNormal added: reference.docx
The final document.
The three files are generated on the following system: Windows 7 / R 3.3.0 / RStudio 0.99.896 / pandoc 1.15.2 / Office 2010.
I get the same results on a system with Windows 7 / R 3.2.4 / RStudio 0.99.484 / pandoc 1.13.1 / Office 2010.
I suppose the most likely culprits are the pandoc and the Office versions. Unfortunately, I cannot test other configurations at the moment. Now it would be interesting to see the following: For users where it does not work, what happens …
… if you start from my initial.docx?
If that does not work, what if you use my reference.docx as reference document?
If nothing works, are there eye-catching differences in the generated XML files (inside the DOCX container)? Please share your files and exact version information.
With a number of users running these tests it should be possible to find out what is causing the problems.
This was actually a known issue. Fortunately, it was fixed in pandoc v2.0 and later releases.
I have tested the newer version and found a newly added hidden style called "Table". Following @CL.'s suggestion, but modifying the "Table" style in reference.docx instead, now works.
In addition, look at this entry of pandoc's v2.0 release notes:
Use Table rather than Table Normal for table style (#3275). Table Normal is the default table style and can’t be modified.
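Which style to edit therefore depends on the pandoc version. This is a minimal sketch of that rule; the function name `table_style_for` is hypothetical and the style names are taken from the answers in this thread.

```r
# Hypothetical helper: name of the table style pandoc is expected to apply,
# based on the release note quoted above (the switch happened in pandoc 2.0).
table_style_for <- function(pandoc_version) {
  if (numeric_version(pandoc_version) >= "2.0") "Table" else "TableNormal"
}

table_style_for("1.15.2")  # version reported in the earlier answer
table_style_for("2.0")
```

The installed version can be obtained with as.character(rmarkdown::pandoc_version()) and passed to the helper.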
As of 2021, I could not get any of the other suggested answers to work.
However, I did discover the {officedown} package, which, amongst other things, supports the styling of tables in .docx documents. You can install {officedown} with remotes::install_github("davidgohel/officedown")
To use {officedown} to render .Rmd to .docx you must replace
output:
  word_document
in your document header with
output:
  officedown::rdocx_document
In addition to this the {officedown} package must be loaded in your .Rmd.
As with the word_document output format, {officedown} allows us to use styles and settings from template documents, again with the reference_docx parameter.
With a reference document styles.docx, a minimal example .Rmd may look like:
---
date: "2038-01-19"
author: "The Reasonabilists"
title: "The end of time as we know it"
output:
  officedown::rdocx_document:
    reference_docx: styles.docx
---
```{r setup, include = FALSE}
# Don't forget about me: I'm important!
library("officedown")
```
{officedown} allows us to go one step further and specify the name of the table style to use in the document's front matter. This table style could be a custom style we created in styles.docx, or it could be one of Word's in-built styles you prefer.
Let's say we created a style My Table:
We could tell {officedown} to use this table style in our front matter as:
output:
  officedown::rdocx_document:
    reference_docx: styles.docx
    tables:
      style: My Table
Putting this all together, knitting the minimal .Rmd:
---
date: "2038-01-19"
author: "The Reasonabilists"
title: "The end of time as we know it"
output:
  officedown::rdocx_document:
    reference_docx: styles.docx
    tables:
      style: My Table
---
```{r setup, include = FALSE}
# Don't forget about me: I'm important!
library(officedown)
```
```{r}
head(mtcars)
```
Resulting in a .docx document which looks like:
TableNormal doesn't work for me either.
On my Dutch version of Word 2016 (Office 365), I found out that I could markup tables with the style Compact.
Input (refdoc.docx contains the Compact style):
---
title: "Titel"
subtitle: "Ondertitel"
author: "`r Sys.getenv('USERNAME')`"
output:
  word_document:
    toc: true
    toc_depth: 2
    fig_width: 6.5
    fig_height: 3.5
    fig_caption: true
    reference_docx: "refdoc.docx"
---
And RMarkdown:
# Methoden {#methoden}
```{r}
knitr::kable(cars)
```
Output:
You need a reference_docx: style.docx that has the "Table" style in it (see @Liang Zhang's explanation and links above).
Create a basic reference document using pandoc (source). On the command line (or cmd.exe on Windows) run:
pandoc -o custom-reference.docx --print-default-data-file reference.docx
In this newly created custom-reference.docx file, find the table (a basic one-row table with a caption).
While the table is selected, click "Table Design" and find "Modify Table Style":
Modify the style of the table as you wish and use this reference document in your RMD document (see the first answer by @CL.).
Using this reference document, you can also change the table and figure caption styles.
I was able to get my word output to use a default table style that I defined in a reference .docx.
Instead of 'TableNormal', the table style it defaulted to was 'Table'.
I discovered this by knitting an rmarkdown with a kable.
---
date: "December 1, 2017"
output:
  word_document:
    reference_docx: Template.docx
---
`r knitr::kable(source)`
Then I took a look at that generated document's XML to see what style it had defaulted to.
library(XML)
docx.file <- "generated_doc.docx"
## unzip the docx converted by Pandoc (a .docx file is a zip archive)
unzip(docx.file, exdir = "temp_dir")
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
## collect every table style reference in the document body
tblStyle <- getNodeSet(xmlRoot(doc), "//w:tblStyle")
tblStyle
I defined the 'Table' style to put some color and borders in the reference docx. This works for one standard table style throughout the document, I haven't found a way to use different styles throughout.
This stayed true even after I opened the reference doc and edited it.
I am writing a Word document with R Markdown in RStudio. I can get many things to work, but at the moment I cannot figure out how to insert a page break. I have found solutions, but only for rendered LaTeX / PDF documents, which is not my case.
Added: To insert a page break, please use \newpage for formats including LaTeX, HTML, Word, and ODT.
https://bookdown.org/yihui/rmarkdown-cookbook/pagebreaks.html
Paragraph before page break.
\newpage
First paragraph on a new page.
Previously: There is a way by using a fifth-level header block (#####) and a docx template defined in YAML.
After creating headingfive.docx in Microsoft Word, you select Modify Style of the Heading 5, and then select Page break before in the Line and Page Breaks tab and save the headingfive.docx file.
---
title: 'Making page break using fifth-level header block'
output:
  word_document:
    reference_docx: headingfive.docx
---
In your Rmd document, you define reference_docx in the YAML header, and now you can use the page-breaking #####.
Please see below.
https://www.r-bloggers.com/r-markdown-how-to-insert-page-breaks-in-a-ms-word-document/
With the help of John MacFarlane and others on the pandoc google group, I put together a filter that does this. Please see:
https://groups.google.com/forum/#!topic/pandoc-discuss/FzLrhk0vVbU
In short, the filter needs to look for something to replace with the openxml for pagebreak. In this case
\newpage
is being replaced with
<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>
This allows for a single latex markup to be interpreted for both pdf and word output.
Joel
What you are trying to do is force a "page break" or "new page" in a word document generated with Pandoc. I have found a way to do this in my environment but I'm not sure it will work in every environment.
My environment:
* R-studio / Pandoc / MS-WORD starting with an "*.Rmd" file and generating a DOCX file.
In my RMD file the key idea is that i've created what acts like a TEMPLATE document (MyFormattingDocument.docx) and in that word document I tweak the STYLES for things like "Heading 1" and/or "Heading 2" and or "footnote" or whatever other predefined styles I want to tweak.
(SEE THIS: http://rmarkdown.rstudio.com/word_document_format.html#style-reference ) for explanation of style reference and how to set the header information in your RMD file to specify a reference document.
SOOOO in my case... i tweak the "Heading 1" style in WORD to include a forced "Page Break Before" in the Paragraph formatting for "Heading 1". Exactly how you force every "Heading 1" to always "Page Break" is different in different versions of Microsoft WORD but if you follow the WORD documentation and modify the "Heading 1" style THEN every "Heading 1" will always have a pagebreak before it.
THEN... you save this template file in the same directory you're working from with the RMD file... and it is USED AS a template. THE CONTENTS of the file are ignored.... so don't worry... you can put sample text in this file and test that the formatting all works.... THE CONTENTS ARE IGNORED but the STYLES are USED in the new word document which will be built by the RMD file so.... then every "Heading 1" will have a break before it.
NOTE: You could obviously do the same with ANY style that has a one-to-one mapping from PANDOC MARKUP so you could instead just use "Heading 3" or whatever.... just look and see in your RMD created DOCX what "STYLE" is being applied and then tweak that style, even if you need to insert some "fake" lines with essentially blank content just for the purpose of forcing a style to appear in the DOCX.
Here is an R script that can be used as a pandoc filter to replace LaTeX page breaks (\newpage) with Word page breaks, per @JAllen's answer above. With this you don't need to compile a pandoc script. Since you are working in R Markdown, I assume you have R available on the system.
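#!/usr/bin/env Rscript
# Read the pandoc JSON AST from standard input
json_in <- file('stdin', 'r')
# JSON for a raw LaTeX \newpage block and its Word (OpenXML) page-break replacement
lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'
ast <- paste(readLines(json_in, warn=FALSE), collapse="\n")
close(json_in)
# Swap the blocks and write the modified AST to standard output
ast <- gsub(lat_newp, doc_newp, ast, fixed=TRUE)
write(ast, "")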
Save this as page-break-filter.R or something like that and make it executable by running chmod +x page-break-filter.R in the terminal.
Then include this filter in the R Markdown YAML like so:
---
title: "Title"
author: "Author"
output:
  word_document:
    pandoc_args: [
      "--filter", "/path/to/page-break-filter.R"
    ]
---
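Before wiring the filter into a document, the substitution itself can be checked in an R session. This is a minimal sketch that reuses the two JSON strings from the script; `replace_page_breaks` and `sample_ast` are made-up names, and the sample assumes pandoc emits the \newpage block exactly in this form, which may differ between pandoc versions.

```r
# The same search and replacement strings as in the filter script.
lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'

# Replace every raw LaTeX \newpage block in a JSON AST string.
replace_page_breaks <- function(ast) {
  gsub(lat_newp, doc_newp, ast, fixed = TRUE)
}

# A made-up, minimal AST fragment containing one \newpage block.
sample_ast <- paste0('{"blocks":[', lat_newp, ']}')
out <- replace_page_breaks(sample_ast)
cat(out, "\n")
```

To see what a given pandoc version actually produces for \newpage, the AST of a small test file can be printed with pandoc -t json and compared against `lat_newp`.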
You can use the R package worded. This avoids the need for a template word file. See https://github.com/davidgohel/worded.
The output parameter needs to be set to worded::rdocx_document and you need to call library(worded).
---
date: "2018-03-27"
author: "David Gohel"
title: "Document title"
output:
  worded::rdocx_document
---
```{r setup, include=FALSE}
library(worded)
```
You can then add <!---CHUNK_PAGEBREAK---> to your document whenever you want a page break.
The package allows various word formatting options using a similar mechanism.
When updating to R 4.0.0, the <!---CHUNK_PAGEBREAK---> solution was not working any more for me.
Instead I could use the run_pagebreak() function from the officer package, still in combination with the officedown package:
---
output: word_document
---
```{r settings}
library(officedown)
library(officer)
```
Hello world on page 1
`r run_pagebreak()`
Hello world on page 2
R Markdown 1.16 introduced a new feature that lets you insert a page break by adding a paragraph that contains only the command \pagebreak or \newpage:
Paragraph before page break.
\pagebreak
First paragraph on a new page.
See also the pagebreaks section in the R Markdown cookbook.
It is not an automated solution, but I have been adding the text '#####page break' to my markdown document and then using find-and-replace in MS Word to replace the text "page break" with "^m" (manual page break).
Sungpil's article was close, but didn't quite work. This was the best solution I found for this:
https://scriptsandstatistics.wordpress.com/2015/12/18/rmarkdown-how-to-inserts-page-breaks-in-a-ms-word-document/
Even better, the author included the Word template to make this work. The R-blogger's link to his template is broken, and the header is formatted wrong. Some notes I took:
1) You might need to include the whole path to the word template in your Rmd header, like so:
output:
  word_document:
    reference_docx: C:/workspace/myproject/mystyles.docx
2) The template at the link above changed some of the default style settings so you'll need to change them back
My solution is not very robust but can work for some of us.
Assuming you need a page break before each level 1 title in your word document, I defined this in the format template used in the yaml field reference_docx: .
In this document you modify the Heading 1 format (or equivalent) to insert a page break before the Title. Do not forget to start your template with the first docx rendered with knitr (pandoc) in RStudio.
Ok, I found this in the markdown docs.
Horizontal Rule / Page Break
Three or more asterisks *** or dashes ---.