How to cache intermediate results when rendering with parameters? - r

With R Markdown, I am trying to render a parameterized report for different values of a parameter. The Rmd file uses caching.
Caching works as intended when I knit in RStudio with the Knit button: the cache is built the first time and reused on each subsequent knit, even if I change the parameter value in the YAML header.
But when I loop over the parameter values with rmarkdown::render(), the cache is rebuilt at each iteration.
The test.Rmd file:
---
title: "Untitled"
author: "Author"
params:
  id: 0
date: "23/10/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Test `r params$id`
```{r cars, cache=TRUE}
## open and work on large file (simulate)
test <- mtcars
Sys.sleep(10)
```
And the rendering script, render.R:
library(rmarkdown)
library(tidyverse)
1:5 %>%
  walk(function(x) render("test.Rmd",
                          params = list(id = x),
                          output_file = paste0("file", x, ".html")))
The script takes 5 * 10 seconds to run instead of about 10 seconds.
What did I do wrong? How can I get render() to use the cache?

It has nothing to do with the parameters, as the minimal reprex below (test.Rmd) shows after taking out the parameters (and the irrelevant tidyverse):
---
title: "Untitled"
---
```{r, cache=TRUE}
Sys.sleep(10)
```
Then run
for (i in 1:5) rmarkdown::render(
"test.Rmd", output_file = paste0("file", i, ".html")
)
The problem comes from output_file, which changes in each iteration. For R Markdown documents, the output filename determines the knitr chunk option fig.path. For example, when output_file = "file1.html", fig.path is set to file1_files/figure-html/.
When any chunk option of a code chunk changes, knitr will invalidate its cache. In your case, fig.path invalidated the cache each time. To avoid that, you have to stabilize this option, e.g.,
---
title: "Untitled"
---
```{r, cache=TRUE, fig.path='test_files/html/'}
Sys.sleep(2)
```
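To see why a per-output fig.path means a cache miss on every render, here is a simplified model of a cache key: a hash of the chunk code plus its options. chunk_key() is a hypothetical illustration, not knitr's actual hashing code (which differs in its details); the point is only that any hashed option whose value differs between renders produces a different key.

```r
# Simplified model of a chunk cache key (an illustration, NOT knitr's
# actual implementation): hash the chunk code together with its options.
chunk_key <- function(code, options) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  # Sort options by name so the key does not depend on their order
  obj <- list(code = code, options = options[order(names(options))])
  writeBin(serialize(obj, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

code <- "Sys.sleep(10)"

# fig.path derived from output_file: a different value on every iteration
keys_varying <- vapply(1:5, function(i) {
  chunk_key(code, list(cache = TRUE,
                       fig.path = paste0("file", i, "_files/figure-html/")))
}, character(1))

# fig.path pinned to one value, as in the answer above
keys_stable <- vapply(1:5, function(i) {
  chunk_key(code, list(cache = TRUE, fig.path = "test_files/html/"))
}, character(1))

length(unique(keys_varying))  # 5 keys: every render misses the cache
length(unique(keys_stable))   # 1 key: renders 2 to 5 can reuse the cache
```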

Related

rmarkdown read code from file and display with highlight

I have two R Markdown files: main.Rmd, the main file that gets rendered, and example.Rmd, which holds a longer example and is used elsewhere (hence it lives in its own document).
I want to include example.Rmd in main.Rmd with its R Markdown code syntax-highlighted but not executed, as if I had set eval=FALSE and copied all the code into the chunk by hand.
A minimal working example (MWE):
main.Rmd
---
title: This is main.rmd
output: html_document
---
```{r}
# attempt that doesn't work
cat(readLines("example.Rmd"), sep = "\n")
```
and in example.Rmd
---
title: This is example.rmd
output: html_document
---
```{r}
# code that is not executed but shown in main.Rmd
data <- ...
```
Set eval=FALSE in example.Rmd and then include it in main.Rmd using the child chunk option.
example.Rmd
---
title: This is example.Rmd
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```
```{r}
# This is from example.Rmd
x <- rnorm(10)
y <- rnorm(10)
lm(y ~ x)
```
```{r}
# some comments
print("This is from example.Rmd")
```
main.Rmd
---
title: This is main.Rmd
output:
  html_document:
    highlight: haddock
---
```{r example, child="example.Rmd"}
```
Edit
To show the full source code of the R Markdown file, one option is to read the Rmd file and cat() it in a chunk with the chunk option comment="" (so the output lines are not prefixed with ##).
As for syntax highlighting, the chunk option class.output lets you specify a language name for which pandoc supports syntax highlighting.
You can list the language names that pandoc can highlight by running:
pandoc --list-highlight-languages
(If you don't have pandoc installed separately, you can use the pandoc bundled with RStudio. Run rmarkdown::pandoc_exec() to get the path to the pandoc executable.)
The file we are including contains not just R code but also Markdown and YAML, a mix that pandoc cannot highlight out of the box. Still, I have chosen c as the highlighting language just to show the possibility. (I also tried r, but its highlighting was less distinctive.)
---
title: This is main.Rmd
output:
  html_document:
    highlight: tango
---
## Rmarkdown
```{r example, echo=FALSE, class.output="c", comment=""}
cat(readLines("example.Rmd"), sep = "\n")
```
If you want highlighting specific to R Markdown, you can create your own syntax definition. See the pandoc documentation on syntax highlighting and this answer on SO about it.
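Another option, not taken from the answer above, is to write the file out as a fenced code block from a chunk with results='asis', so pandoc treats it as Markdown source. This assumes `markdown` appears in the output of `pandoc --list-highlight-languages`. Because example.Rmd itself contains ``` chunk delimiters, the fence must be longer than any backtick run inside the file. fence_source() is a hypothetical helper name.

```r
# Hypothetical helper: wrap lines in a fenced code block whose fence is
# longer than any run of backticks in the text, so the ``` chunk
# delimiters inside an .Rmd file cannot close the block early.
fence_source <- function(lines, lang = "markdown") {
  matches <- gregexpr("`+", lines)
  # match.length is -1 for lines without backticks; drop those
  lens <- unlist(lapply(matches, attr, which = "match.length"))
  longest <- max(c(0L, lens[lens > 0]))
  fence <- strrep("`", max(3L, longest + 1L))
  c(paste0(fence, lang), lines, fence)
}
```

In main.Rmd, a chunk with echo=FALSE, results='asis' would then call cat(fence_source(readLines("example.Rmd")), sep = "\n").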

Write pdf to figure folder, but delete pngs

This is a follow-up question about this answer. I have set the knitr chunk options to output a png and pdf version of plots in a folder, as well as use the pngs in the knitted report.
However, I'd only like to keep the pdf version of the figure and discard the png file. Is there a knitr-equivalent of on.exit() to clean up the pngs after knitting? Or an option I overlooked?
With the rmarkdown document below, how do I automatically clean up the png version of the plot after knitting? (Or not produce it as a standalone file in the first place)
---
title: "Untitled"
author: "Me"
date: "21/10/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dev = c("png", "pdf"),
  fig.path = here::here(
    "figures",
    gsub("\\.Rmd$", "\\\\", basename(knitr::current_input()))
  )
)
```
```{r my_plot}
library(ggplot2)
ggplot(mpg, aes(displ, hwy)) +
geom_point()
```
This is not exactly what you are looking for, but manually removing eval=FALSE from the following chunk deletes the unwanted png files:
```{r eval=FALSE, include=FALSE}
# Find png files anywhere under figures/ (fig.path may add a subfolder)
fList <- list.files(here::here("figures"), pattern = "\\.png$",
                    recursive = TRUE, full.names = TRUE)
file.remove(fList)
```
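Deleting the pngs from inside the document while it is being knitted can break an html_document: pandoc embeds the images after knitr finishes, so the files must still exist at that point. A safer pattern is to clean up after rmarkdown::render() returns, for example from a small script. remove_pngs(), render_and_clean() and the file name report.Rmd are hypothetical, and the sketch assumes it is run from the project root with figures under figures/ as set by fig.path in the setup chunk.

```r
# Hypothetical helper: delete png files under a figure folder (recursively),
# leaving pdf and other formats in place. Returns the deleted paths.
remove_pngs <- function(dir) {
  pngs <- list.files(dir, pattern = "\\.png$", recursive = TRUE,
                     full.names = TRUE, ignore.case = TRUE)
  ok <- file.remove(pngs)
  invisible(pngs[ok])
}

# Render first, then clean up, so pandoc can still embed the pngs.
render_and_clean <- function(input, fig_dir = "figures") {
  out <- rmarkdown::render(input)
  remove_pngs(fig_dir)
  invisible(out)
}
```

Usage: render_and_clean("report.Rmd") in place of a plain rmarkdown::render() call.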

Why R Markdown Caption cannot take "&"

I am gradually building an R Markdown (.RMD) file, learning by doing. I was able to insert a couple of tables, but I had a problem with one of them. The initial setup is:
---
title: "Untitled"
author: "Me"
date: "5/10/2021"
output: bookdown::pdf_book
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = FALSE,
               fig.align = "left",  # knitr option names use dots
               fig.width = 7,
               fig.height = 7,
               dev = "png",
               cache = FALSE)
```
The original code that generated the error was:
```{r sphistperf}
kable(stock_index_stats,
format="latex",
caption="S&P Historical Performance Statistics")
```
The error message is:
output file: TestCenter.knit.md
! Misplaced alignment tab character &.
<argument> ...}{\relax }}\label {tab:sphistperf}S&
P Historical Performance S...
l.202 ...rf}S&P Historical Performance Statistics}
Error: LaTeX failed to compile TestCenter.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See TestCenter.log for more info.
Error during wrapup:
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
The problem is fixed if I remove the "&" from the caption, which becomes
caption="SP Historical Performance Statistics"
Still, I want the "&" in my caption. Is there a way to keep it? I tried putting an escape character "\" before it, and that did not work. Any suggestions on how to keep the "&"?
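According to the wiki, some characters need escaping in LaTeX, and & is one of them. Inside an R string the backslash itself must be escaped, so the caption needs S\\&P, which reaches LaTeX as S\&P.
Here is a tested version of the R Markdown code: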
---
title: "testing"
author: "akrun"
date: "10/05/2021"
output: bookdown::pdf_book
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r cars}
library(kableExtra)
kable(summary(cars), format = 'latex', caption="Dummy S\\&P Performance")
```
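If captions contain several special characters or come from data, escaping by hand gets tedious, and a small helper can add the backslashes. escape_latex_caption() is a hypothetical name, and this sketch only covers characters that LaTeX escapes with a plain backslash prefix; \, ~ and ^ need commands such as \textbackslash{} instead.

```r
# Hypothetical helper: prefix the LaTeX special characters & % $ # _ { }
# with a backslash. Does not handle \, ~ or ^.
escape_latex_caption <- function(x) {
  gsub("([&%$#_{}])", "\\\\\\1", x)
}

escape_latex_caption("S&P Historical Performance Statistics")
# [1] "S\\&P Historical Performance Statistics"   (R prints the backslash doubled)
```

It would be used as kable(stock_index_stats, format = "latex", caption = escape_latex_caption("S&P Historical Performance Statistics")).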

Using webshot and robservable

I would like to include screenshots from an Observable notebook in an R Markdown document rendered to PDF. Directly including code chunks that call robservable does not work, so I thought I could use the webshot package instead. The example below creates a file, test.html, that does contain the interactive notebook. However, the rendered PDF still does not show any screenshot.
---
title: "robservable and webshot"
output: pdf_document
---
```{r}
knitr::opts_chunk$set(screenshot.opts = list(delay = 5))
library("robservable")
library("webshot")
library("htmlwidgets")
library("dplyr")
```
This is a test.
```{r}
f <- "test.html"
robservable("#d3/horizontal-bar-chart", include = "chart") %>%
saveWidget(f)
```
```{r}
webshot(f)
```

knitr cache: update if data file changes but not other options (e.g., `fig.xyz`)

Suppose I use knitr and have a chunk that takes a while to run. I want this chunk to update if a file changes, but not if I, e.g., change fig.path. The latter suggests that I should set the cache chunk option to 1, but then I cannot use a checksum as suggested here.
Here is an example R Markdown file:
---
title: "Example"
author: "Benjamin Christoffersen"
date: "September 2, 2018"
output: html_document
---
```{r setup, include=FALSE}
data_file <- "~/data.RDS"
knitr::opts_chunk$set(echo = TRUE, cache.extra = tools::md5sum(data_file))
```
```{r load_data}
dat <- readRDS(data_file)
```
```{r large_computation, cache = 1}
Sys.sleep(10)
Sys.time() # just to show that the result does not change
```
```{r make_some_plot}
hist(dat)
```
Running set.seed(1); saveRDS(rnorm(100), "~/data.RDS") and knitting, then running set.seed(2); saveRDS(rnorm(100), "~/data.RDS") and knitting again, shows that large_computation is not updated even though the data file changed, because cache.extra is not in the knitr:::cache1.opts vector. Of course, I could save the md5sum result, compare it with the previously stored value, and use cache.rebuild, or do something similar in the large_computation chunk, but a knitr solution would be nicer. I often change some chunk options (e.g., dpi, fig.width, and fig.height), so using cache = TRUE will not work. I guess one could modify the package to allow adding options to knitr:::cache1.opts.
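The manual alternative mentioned in the question (store the checksum and drive cache.rebuild from it) could look roughly like this. It is a sketch under assumptions: data_changed() and the .md5 stamp file are hypothetical names, not knitr features.

```r
# Hypothetical helper (not part of knitr): returns TRUE when data_file's
# md5 checksum differs from the one stored in stamp_file, then records
# the current checksum for the next knit.
data_changed <- function(data_file, stamp_file = paste0(data_file, ".md5")) {
  current <- unname(tools::md5sum(data_file))
  previous <- if (file.exists(stamp_file)) readLines(stamp_file, warn = FALSE) else ""
  writeLines(current, stamp_file)
  !identical(current, previous)
}
```

In the setup chunk this would become knitr::opts_chunk$set(echo = TRUE, cache.rebuild = data_changed(data_file)). One caveat of this design: the stamp is updated as soon as the check runs, so if the knit then fails, the next knit will not rebuild; deleting the stamp file forces a rebuild.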
If I understand the question correctly, the problem is that cache.extra is not taken into account if cache is set to 1. In fact, this is by design.
The desired behavior is to invalidate the cache of all chunks (including chunks with cache = 1) if an external file (or more general: some value provided to cache.extra) changes.
As mentioned in the question, one way to achieve this is the chunk option cache.rebuild, but instead of manually keeping track of changes in the external file, I'd take advantage of knitr's built-in caching capabilities:
```{r cachecontrol, cache = TRUE, cache.extra = tools::md5sum(data_file)}
knitr::opts_chunk$set(cache.rebuild = TRUE)
```
Adding this as an early chunk, the cache of all subsequent chunks is invalidated if data_file changes. The idea is to cache the chunk that controls caching of subsequent chunks – but only if the external file is unchanged.
Of course, this only works if no global chunk options are changed before the cachecontrol chunk is evaluated.
Full example from the question:
Run set.seed(1); saveRDS(rnorm(100), "data.RDS") with different seeds to generate different external files, then knit:
---
title: "Invalidate all chunks conditional on external file (even if cache=1)"
output: html_document
---
```{r}
data_file <- "data.RDS"
```
```{r cachecontrol, include = FALSE, cache = TRUE, cache.extra = tools::md5sum(data_file)}
# do NOT change global chunk options before this chunk
knitr::opts_chunk$set(cache.rebuild = TRUE)
```
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 8)
```
```{r load_data}
dat <- readRDS(data_file)
```
```{r large_computation, cache = 1}
Sys.sleep(10)
Sys.time() # just to show that the result does not change unless the external file changes
```
```{r make_some_plot}
hist(dat)
```
I found another workaround for cache.extra being ignored when cache = 1 or 2.
Insert the following option hook in the setup chunk. It prepends a comment containing the cache.extra value to the chunk's code, so when cache.extra changes, the code changes and the cache is invalidated.
knitr::opts_hooks$set(cache.extra = function(options){
# invalidate cache
options$code <- c(sprintf("# cache.extra: %s", options$cache.extra), options$code)
options
})
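To see what the hook does without knitting, the same function can be called on a plain list standing in for the chunk options. add_cache_extra_comment is just a name given here to the anonymous function above, and the list is made-up input. The approach relies on the chunk code being part of the cache hash at every cache level.

```r
# The option hook from above as a named function; knitr would call it with
# the chunk's options whenever the cache.extra option is set.
add_cache_extra_comment <- function(options) {
  options$code <- c(sprintf("# cache.extra: %s", options$cache.extra), options$code)
  options
}

# Made-up stand-in for a chunk's options
opts <- list(code = c("Sys.sleep(10)", "Sys.time()"), cache.extra = "abc123")
add_cache_extra_comment(opts)$code
# returns c("# cache.extra: abc123", "Sys.sleep(10)", "Sys.time()")
```

Registering it is then knitr::opts_hooks$set(cache.extra = add_cache_extra_comment).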
