Create PDF Hyperlinked Table of Contents inside R - r

This is my first question on StackOverflow, so please let me know if I'm doing anything wrong.
I'm using R to generate a lot of very large PDF documents. My data is about 580,000 observations, and breaks down into 32 categories with each category containing 70 answers to between 20 and 300 questions. Currently I use two for loops (I try to avoid for loops, but for creating these PDFs it was the only way that worked). The first goes through and creates a PDF for the category with a title page, then the second adds a page for each graph showing the results of that question. I'm using ggplot2 and the "pdf" function.
The script works great, creating 32 pdfs (one for each category) with a custom title page and pages for all the questions in that category. I would like to add a Table of Contents after the title page. I know how to add a page with labels and page numbers, but I need one that links to each question.
I've searched this site and Google, but haven't found any way to do this in R. This question: Adding a table of contents to PDF with R plots talks about using RPython. I've also come across sources mentioning "hyperref", LaTeX, Pandoc, and knitr. I know how to use knitr in an R Markdown doc, but that doesn't work for what I'm trying to do. I'm not really sure how to work with any of the others, so solutions using them went over my head.
Is there not a way to work with creating a Table of Contents or just hyperlinks to PDF pages inside R, without going to those other languages?

Have you tried just clicking on the section names in the table of contents? By default, these seem to be hyperlinked, although there isn't any colouration that hints at it.
To help you see what might be happening, add / change your YAML header to add the following:
output:
  pdf_document:
    keep_tex: true
    toc: true
    toc_depth: 3
That will get the intermediate .tex file kept. If you open that up after knitting, you should already see references to hyperref in it.
I then find my table of contents being defined as:
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{3}
\tableofcontents
}
which produces a hyperlinked TOC, but with "black" hyperlinks!
If you want to change the colour and see them show up, you can open the .tex file in RStudio, change "black" to "blue", and have RStudio run "Compile PDF"; the links should then be visible.
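If you would rather not edit the .tex file by hand, the same substitution could be scripted in R. This is only a sketch: recolor_toc_links is a made-up helper name, and it assumes the kept .tex file contains the literal text linkcolor=black as in the snippet above.

```r
# Hypothetical helper: swap the TOC link colour in the lines of a kept .tex file.
recolor_toc_links <- function(tex_lines, from = "black", to = "blue") {
  # fixed = TRUE treats the pattern as literal text, not a regular expression
  sub(paste0("linkcolor=", from), paste0("linkcolor=", to), tex_lines, fixed = TRUE)
}

# Sample lines copied from the TOC definition shown above
tex <- c("{", "\\hypersetup{linkcolor=black}", "\\setcounter{tocdepth}{3}",
         "\\tableofcontents", "}")
recolored <- recolor_toc_links(tex)
cat(recolored, sep = "\n")
```

In practice you would read the file with readLines(), write the result back with writeLines(), and then recompile the PDF.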
If you want your page numbers hyperlinked rather than the description, add the following into your YAML:
header-includes:
- \hypersetup{linktocpage}
Share & Enjoy!

I just remembered I left this open and thought I'd go back and post how I ended up solving it, well, sort of. Instead of an R script, I used an R Markdown file to create a combined PDF, which included all sections with their subsequent questions as different levels. I was able to create a PDF for each section individually with a linked, clickable Table of Contents including all of its questions (pages) and different header levels for title pages.
The key was pandoc.header, which allowed me to create the headers, which show in the TOC. I think neither the for loops, nor the ggplot, which was created for each page, is relevant. Here is an overview of the .rmd :
---
title:
author:
output:
  pdf_document:
    toc: true
---
```{r results = "asis", message=FALSE, warning=FALSE, echo=FALSE, fig.height = 11, fig.width = 8}
library(pander)   # provides pandoc.header()
library(ggplot2)
for (i in seq_along(categories)) {
  pandoc.header(paste("Category ", category_num, ": ", category_description), level = 1)
  # category title page goes here
  for (j in seq_len(numberofquestions)) {  # separate index so the outer loop's i is not overwritten
    pandoc.header(paste("Question ", question_num, ": ", subtitle1), level = 2)
    print(ggplot())
  }
}
```
The only inconvenient part is that each page must have a header to be linked to and I really didn't like the title pages having one, but it looks like I can manually edit that out with what dsz posted.
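For anyone without the pander package, plain cat() in a results = "asis" chunk seems to be enough to produce headers that Pandoc picks up for the TOC. A minimal sketch, where md_header is a hypothetical helper and the category and question names are placeholders rather than my real data:

```r
# Hypothetical helper: build a Markdown header string for a results = "asis" chunk.
md_header <- function(text, level = 1) {
  # Blank lines on both sides so Pandoc parses the header as its own block
  paste0("\n\n", strrep("#", level), " ", text, "\n\n")
}

categories <- c("Category A", "Category B")   # placeholder data
questions  <- c("Question 1", "Question 2")   # placeholder data

for (category in categories) {
  cat(md_header(category, level = 1))
  for (question in questions) {
    cat(md_header(question, level = 2))
    # print(<ggplot object>) would go here
  }
}
```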

Related

R Markdown Grouping of Figures to Prevent Pagebreak

I'm having a problem with assigning LaTeX environments within an RMarkdown for-loop code-chunk.
In short, I've written an R Markdown document and a series of R-scripts to automatically generate PDF reports at the end of a long data analysis pipeline. The main section of the report can have a variable number of sections that I'm generating using a for-loop, with each section containing a \subsection heading, a datatable and plot generated by ggplot. Some of these sections will be very long (spanning several pages) and some will be very short (~1/4 of a page).
At the moment I'm just inserting a \pagebreak at the end of each for-loop iteration, but that leaves a lot of wasted space with the shorter sections, so I'm trying to "group" each section (i.e. the heading, table and chart) so that there can be several per page, but they will break to a new page if the whole section won't fit.
I've tried using a figure or minipage environment, but for some reason those commands are printed as literal text when the plot is included; these work as expected with the heading and data table, but aren't returned properly in the presence of the image.
I've also tried to create a LaTeX samepage environment around the whole subsection (although not sure this will behave correctly with multi-page sections?) and then it appears that the Markdown generated for the plot is not interpreted correctly somewhere along the way (Pandoc?) when it's within that environment and throws an error when compiling the TeX due to the raw Markdown ![]... image tag.
Finally, I've also tried implementing \pagebreak[x] and \nopagebreak[y] hints at various points in the subsection but can't seem to get these to produce the desired page-breaking behaviour.
I've generated an MWE that reproduces my issues below.
I'd be really grateful for any suggestions on how to get around this, or better ways of approaching "grouping" of elements that are generated in a dynamic fashion like this?
---
title: "Untitled"
author: "I don't know what I'm doing"
date: "26/07/2020"
output:
  pdf_document:
    latex_engine: xelatex
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, dev = "cairo_pdf")
```
```{r cars, results='asis'}
for (i in 1:5){
  cat("\\begin{figure}")
  cat(paste0("\\subsection{This is subsection ",i,"}"))
  cat("\\Huge Here's some bulk text that would represent a data table... kasvfkwsvg fiauwe grfiwgiu iudaldbau iausbd ouasbou asdbva asdbaisd i iuahihai hiuh iaiuhqijdblab ihlibljkb liuglugu h uhi uhi uhqw iuh qoijhoijoijoi qwegru wqe grouw egq\\newline")
  plot(mtcars$wt,mtcars[,i])
  cat("\\end{figure}")
}
```
Edit to add: interestingly, these figure and minipage environments seem to work as expected when executing the same example in an .Rnw using knitr... so does that narrow it down to an issue with Pandoc? Again, any help much appreciated!
What happens is that the raw TeX commands are not treated as TeX when going through Markdown. You can fix that by explicitly marking the relevant snippets as LaTeX:
for (i in 1:5){
cat("`\\begin{figure}`{=latex}")
cat(paste0("\\subsection{This is subsection ",i,"}"))
cat("\\Huge Here's some bulk text that would represent a data table... kasvfkwsvg fiauwe grfiwgiu iudaldbau iausbd ouasbou asdbva asdbaisd i iuahihai hiuh iaiuhqijdblab ihlibljkb liuglugu h uhi uhi uhqw iuh qoijhoijoijoi qwegru wqe grouw egq\\newline")
plot(mtcars$wt,mtcars[,i])
cat("`\\end{figure}`{=latex}")
}
See the generic raw attribute section in the pandoc manual for details.
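If many snippets need marking, the wrapping can be pulled into a small helper. This is just a sketch; raw_latex is a name I made up, and it assumes the TeX snippet itself contains no backticks (those would need a longer backtick delimiter).

```r
# Hypothetical helper: mark a TeX snippet as raw LaTeX using Pandoc's raw attribute syntax.
raw_latex <- function(tex) {
  paste0("`", tex, "`{=latex}")
}

# Prints the same strings that the cat() calls in the loop above emit
cat(raw_latex("\\begin{figure}"), "\n")
cat(raw_latex("\\end{figure}"), "\n")
```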

How to reference two times to a single footnote in rmarkdown?

I try to reference to a single footnote in a few places in the text. However, with the code below, I've got two footnotes with the same content.
---
title: "My document"
output: html_document
---
One part of the text [^1].
Two pages later [^1].
[^1]: My footnote
Is it possible to reference more than once to a specific footnote using rmarkdown?
I had the same problem. I used HTML tags, which worked really well. Put whatever you want as superscript between <sup> tags, like below:
<sup> text </sup>
Jan 6 2023 update:
Now I use Quarto, which is really easy, as you can just use [^1]
[^1]: footnote text
My workaround here is to just do it manually using inline LaTeX math mode (e.g., \(^2\)). Annoying, but even if they had a solution, you'd have to remember the citation number anyway.
I would suggest going with the LaTeX solution if you do not have many footnotes per page. By LaTeX solution I mean:
(in Markdown, use LaTeX to superscript the footnote number)
First part$^1$
(and the next one)
Second part$^2$
(at the end of your text add *** to create a line across the document)
(under the line, add the text below:)
1, 2: Text for your footnote
On the other hand, there is a thread created on this specific R-Markdown bug. Maybe take a look at it, in this link.
Hope I helped somehow.
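To avoid having to remember the numbers with the manual superscript approach, the numbering could be tracked in R and emitted with inline code. This is only a sketch under my own assumptions: make_footnotes and the labels "source" and "method" are hypothetical, and the footnote texts themselves still have to be typed at the end of the document.

```r
# Hypothetical helper: hand out stable footnote numbers by label.
make_footnotes <- function() {
  seen <- character(0)
  mark <- function(label) {
    # First use of a label gets the next number; later uses reuse it
    if (!(label %in% seen)) seen <<- c(seen, label)
    paste0("$^{", match(label, seen), "}$")
  }
  list(mark = mark, labels = function() seen)
}

fn <- make_footnotes()
first  <- fn$mark("source")   # first use of "source"
second <- fn$mark("method")   # a different footnote
again  <- fn$mark("source")   # reuses the number given to "source"
```

In the text you would then write inline code such as `r fn$mark("source")` wherever the marker should appear.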

R markdown: can I insert a pdf to the r markdown file as an image?

I am trying to insert a pdf image into an r markdown file. I know it is possible to insert jpg or png images. I was just wondering if it is also possible to insert a pdf image. Thanks very much!
If you are just trying to insert an image that has been exported from, for example, some R analysis into a pdf image, you can also use the standard image options from the knitr engine.
With something like:
```{r, out.width="0.3\\linewidth", include=TRUE, fig.align="center", fig.cap=c("your caption"), echo=FALSE}
knitr::include_graphics("./images/imagename.pdf")
```
Unfortunately you can't specify the initial dimensions of your image output (fig.width and fig.height), which you would need to pre-define in your initial output, but you can specify the ultimate size of the image in your document (out.width). As noted below, however, this is limited to scaling down.
You could also of course leave out the initial directory specification if your files are in the same working directory. Just be aware of operating system differences in specifying the path to the image.
An alternative method is to use the Markdown syntax noted by @hermestrismegistus on this post:
![Image Title](./path/to/image.pdf){width=65%}
This can also be combined for multiple images side by side:
![Image Title](./path/to/image.pdf){width=33%}![Image2 Title](./path/to/image2.pdf){width=33%}![Image3 Title](./path/to/image3.pdf){width=33%}
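If the image list is generated dynamically, that line can be built in R and printed from a results = "asis" chunk. A rough sketch, with md_images being a hypothetical helper and the paths being the same placeholder paths as above:

```r
# Hypothetical helper: build Pandoc image syntax for several files shown side by side.
md_images <- function(paths, titles, width_pct) {
  # No separator between images, so they stay in one paragraph (one row)
  paste0(sprintf("![%s](%s){width=%d%%}", titles, paths, as.integer(width_pct)),
         collapse = "")
}

row <- md_images(
  paths     = c("./path/to/image.pdf", "./path/to/image2.pdf"),
  titles    = c("Image Title", "Image2 Title"),
  width_pct = 50
)
cat(row, "\n")
```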
Edit:
After working more extensively with in-text referencing, I have found R chunks with include_graphics to be the most useful option, partly because of the flexibility in image alignment (justification).
As an example:
```{r image-ref-for-in-text, echo = FALSE, message=FALSE, fig.align='center', fig.cap='Some cool caption', out.width='0.75\\linewidth', fig.pos='H'}
knitr::include_graphics("./folder/folder/plot_file_name.pdf")
```
The reference can later be used in-text, for example: Figure \@ref(fig:image-ref-for-in-text) illustrates blah blah.
Some important things to note using this format:
You can only expand PDF images via a code chunk up to the out.width and out.height conditions set in the original .pdf file. So I would recommend setting them slightly on the larger side in your original image (just note that any chart text will scale accordingly).
The in-text reference code (in this case image-ref-for-in-text) CANNOT contain any underscores (_) but can contain dashes (-). You will know if you get this wrong by an error message stating ! Package caption Error: \caption outside float.
To stop your plots drifting to the wrong sections of your document, but in a way that unfortunately will generate some white space, the above example includes fig.pos='H'. Where H refers to "hold" position. The same can be achieved for the former Markdown option by placing a full-stop (period .) immediately after the last curly bracket.
Example:
![Image Title](./path/to/image.pdf){width=75%}.
Unfortunately, this latter option results in some unsightly full-stops. Another reason I prefer the include_graphics option.
Sorry, I found that there is a similar post before:
Add pdf file in Rmarkdown file
Basically, something like the below works well for HTML output:
<img src="myFirstAlignment2.pdf" alt="some text" width="4200" height="4200">
And something like below works well for the pdf output:
(1)possible solution
\begin{center}
\includegraphics[width=8in]{myFirstAlignment2.pdf}
\end{center}
(2)possible solution
![Alt](myFirstAlignment2.pdf)
The myFirstAlignment2.pdf should be replaced with path\myFirstAlignment2.pdf if the pdf file is not in your working directory.
In relation to the comment of the best answer, there is a way to use the second option, and the output not come out tiny.
Use the following syntax below with the height being a large number. Having text in the brackets is necessary for it to work.
![Alt](./file.pdf){width=100% height=400}
None of the answers outlined worked well for me in terms of sizing the pdf, so adding another answer using the code chunk options for out.height and out.width to control the size:
```{r out.height = "460px", out.width='800px', echo=F}
knitr::include_graphics("./images/imagename.pdf")
```

Cache folder for figures deletes itself after knitting html/pdf on RMarkdown

This problem just seems weird to me: I have several plots and tables being produced in my RMarkdown doc, so I'd like to cache to save on compiling/knitting time. When I change all of the cache options to cache=TRUE, it creates two folders in my directory: "Figures_Full_Study_cache" and "Figures_Full_Study_figures" as it should. Problem is, while the former (_cache) remains, the latter (_figures) deletes itself after compiling.
As it's happening, I can see both the "Figures_Full_Study_cache" and "Figures_Full_Study_figures" folders being created. I can even go into the "Figures_Full_Study_figures" folder and see each individual .png being produced in real time. However, once Markdown/pandoc is done, the "Figures_Full_Study_figures" folder is immediately deleted (I have screenshots of all of this, but I'm too new to Stack Overflow to post images).
Of course, once this file disappears along with the .png images, R generates an error when I try to run it again using cache=TRUE because those files don't exist (the tables work just fine). Any ideas why my cached file is deleting itself? I've included a representative example of how my plots are made using ggplot2 and entered into RMarkdown:
```{r,echo=FALSE, warning=FALSE, fig.width=5,fig.height=3.5,cache=TRUE}
ggplot(Demographics, aes(x=Exp)) +
geom_histogram(binwidth=2,fill="slateblue1", color="black", alpha=.9) +
theme_general # Defined previously, not relevant to this discussion
# My apologies for not being able to post the data... protected research.
```
And here are the options for the Markdown doc in case needed:
---
title: "Figures Full Study"
author: "Name"
date: "Wednesday, August 13, 2014"
output:
  html_document: default
---

How to add a page break in word document generated by RStudio & markdown

I am writing a Word document with R Markdown in RStudio. I can get many things working, but at the moment I can't figure out how to get a page break. I have found solutions, but only for rendered LaTeX/PDF documents, which is not my case.
Added: To insert a page break, please use \newpage for formats including LaTeX, HTML, Word, and ODT.
https://bookdown.org/yihui/rmarkdown-cookbook/pagebreaks.html
Paragraph before page break.
\newpage
First paragraph on a new page.
Previously: There is a way by using a fifth-level header block (#####) and a docx template defined in YAML.
After creating headingfive.docx in Microsoft Word, you select Modify Style of the Heading 5, and then select Page break before in the Line and Page Breaks tab and save the headingfive.docx file.
---
title: 'Making page break using fifth-level header block'
output:
  word_document:
    reference_docx: headingfive.docx
---
In your Rmd document, you define reference_docx in the YAML header, and now you can use the page-breaking #####.
Please see below.
https://www.r-bloggers.com/r-markdown-how-to-insert-page-breaks-in-a-ms-word-document/
With the help of John MacFarlane and others on the pandoc google group, I put together a filter that does this. Please see:
https://groups.google.com/forum/#!topic/pandoc-discuss/FzLrhk0vVbU
In short, the filter needs to look for something to replace with the openxml for pagebreak. In this case
\newpage
is being replaced with
<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>
This allows for a single latex markup to be interpreted for both pdf and word output.
Joel
What you are trying to do is force a "page break" or "new page" in a word document generated with Pandoc. I have found a way to do this in my environment but I'm not sure it will work in every environment.
My environment:
* R-studio / Pandoc / MS-WORD starting with an "*.Rmd" file and generating a DOCX file.
In my RMD file the key idea is that i've created what acts like a TEMPLATE document (MyFormattingDocument.docx) and in that word document I tweak the STYLES for things like "Heading 1" and/or "Heading 2" and or "footnote" or whatever other predefined styles I want to tweak.
(SEE THIS: http://rmarkdown.rstudio.com/word_document_format.html#style-reference ) for explanation of style reference and how to set the header information in your RMD file to specify a reference document.
SOOOO in my case... i tweak the "Heading 1" style in WORD to include a forced "Page Break Before" in the Paragraph formatting for "Heading 1". Exactly how you force every "Heading 1" to always "Page Break" is different in different versions of Microsoft WORD but if you follow the WORD documentation and modify the "Heading 1" style THEN every "Heading 1" will always have a pagebreak before it.
THEN... you save this template file in the same directory you're working from with the RMD file... and it is USED AS a template. THE CONTENTS of the file are ignored.... so don't worry... you can put sample text in this file and test that the formatting all works.... THE CONTENTS ARE IGNORED but the STYLES are USED in the new word document which will be built by the RMD file so.... then every "Heading 1" will have a break before it.
NOTE: You could obviously do the same with ANY style that has a one-to-one mapping from PANDOC MARKUP so you could instead just make all "Heading 3" or whatever.... just look and see in your RMD created DOCX what "STYLE" is being applied and then tweak that style even if you need to insert some "fake" lines with essentially blank content just for the purpose of forcing a style to appear in the DOCX
Here is an R script that can be used as a pandoc filter to replace LaTeX page breaks (\newpage) with Word page breaks, per @JAllen's answer above. With this you don't need to compile a pandoc script. Since you are working in R Markdown, I assume you have R available on the system.
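#!/usr/bin/env Rscript
# Pandoc JSON filter: read the document AST from stdin, swap every raw LaTeX
# \newpage block for a raw OpenXML page break, and write the AST to stdout.
json_in <- file('stdin', 'r')
# Literal JSON for the block to find and for its replacement
lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'
ast <- paste(readLines(json_in, warn=FALSE), collapse="\n")
close(json_in)
# fixed=TRUE matches the JSON text literally rather than as a regex
ast <- gsub(lat_newp, doc_newp, ast, fixed=TRUE)
write(ast, "")  # "" sends the result to stdout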
Save this as page-break-filter.R or something like that and make it executable by running chmod +x page-break-filter.R in the terminal.
Then include this filter in the R Markdown YAML like so:
---
title: "Title"
author: "Author"
output:
  word_document:
    pandoc_args: [
      "--filter", "/path/to/page-break-filter.R"
    ]
---
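Before wiring the filter into a document, the substitution itself can be checked in an R session. The sketch below reuses the two strings from the script; sample_ast is a hand-written, simplified fragment that I made up for illustration, not real pandoc output (a real AST also carries metadata and an API version).

```r
# The same find/replace strings as in the filter script
lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'

# Hypothetical, simplified AST fragment containing a single \newpage block
sample_ast <- paste0('{"blocks":[', lat_newp, ']}')

# fixed = TRUE: match the JSON text literally, as the filter does
converted <- gsub(lat_newp, doc_newp, sample_ast, fixed = TRUE)
cat(converted, "\n")
```

If the printed result still contains "latex", the pattern did not match, which usually points to a difference in how the block is serialized.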
You can use the R package worded. This avoids the need for a template word file. See https://github.com/davidgohel/worded.
The output parameter needs to be set to worded::rdocx_document and you need to call library(worded).
---
date: "2018-03-27"
author: "David Gohel"
title: "Document title"
output:
  worded::rdocx_document
---
```{r setup, include=FALSE}
library(worded)
```
You can then add <!---CHUNK_PAGEBREAK---> to your document whenever you want a page break.
The package allows various word formatting options using a similar mechanism.
When updating to R 4.0.0, the <!---CHUNK_PAGEBREAK---> solution was not working any more for me.
Instead I could use the run_pagebreak() function from the officer package, still in combination with the officedown package:
---
output: word_document
---
```{r settings}
library(officedown)
library(officer)
```
Hello world on page 1
`r run_pagebreak()`
Hello world on page 2
R Markdown 1.16 introduced a new feature which allows you to insert a page break by adding a paragraph that contains only the command \pagebreak or \newpage:
Paragraph before page break.
\pagebreak
First paragraph on a new page.
See also the pagebreaks section in the R Markdown cookbook.
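When the sections are generated in a loop, the break can be emitted from R inside a results = "asis" chunk. A minimal sketch, assuming the sections are already available as Markdown strings; join_with_pagebreaks is a hypothetical helper name:

```r
# Hypothetical helper: join pre-rendered Markdown sections with page breaks.
join_with_pagebreaks <- function(sections) {
  # Blank lines keep \newpage in a paragraph of its own
  paste(sections, collapse = "\n\n\\newpage\n\n")
}

doc_body <- join_with_pagebreaks(c("Paragraph before page break.",
                                   "First paragraph on a new page."))
cat(doc_body, "\n")
```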
It is not an automated solution, but I have been adding the text '#####page break' to my markdown document. Then, in MS Word, I use find-and-replace to replace the text "page break" with "^m" (manual page break).
Sungpil's article was close, but didn't quite work. This was the best solution I found for this:
https://scriptsandstatistics.wordpress.com/2015/12/18/rmarkdown-how-to-inserts-page-breaks-in-a-ms-word-document/
Even better, the author included the Word template to make this work. The R-blogger's link to his template is broken, and the header is formatted wrong. Some notes I took:
1) You might need to include the whole path to the word template in your Rmd header, like so:
output:
  word_document:
    reference_docx: C:/workspace/myproject/mystyles.docx
2) The template at the link above changed some of the default style settings so you'll need to change them back
My solution is not very robust but can work for some of us.
Assuming you need a page break before each level 1 title in your word document, I defined this in the format template used in the yaml field reference_docx: .
In this document you modify the Heading 1 format (or equivalent) to insert a page break before the Title. Do not forget to start your template with the first docx rendered with knitr (pandoc) in RStudio.
Ok, I found this in the markdown docs.
Horizontal Rule / Page Break
Three or more asterisks *** or dashes ---.

Resources