How to cache knitr chunks across two (or more) files?

I want to use some R code in two different *.Rnw files and want to share caching across those files.
I read http://yihui.name/knitr/demo/externalization/
Caching within one file works fine, but when I run the second file the whole code is executed again:
plain.R
## @knitr random1
a <- rnorm(10)
a
doc1.Rnw (and doc2.Rnw)
\documentclass{article}
<<set-options, echo=FALSE, cache=FALSE>>=
options(replace.assign=TRUE)
opts_chunk$set(external=TRUE, cache=TRUE, echo=FALSE, fig=TRUE)
read_chunk('plain.R')
@
\title{Doc 1}
\begin{document}
<<random1>>=
@
\end{document}
Is there a way to share the cache across several documents?

It is entirely possible to reuse the cache across multiple source documents. Please read the cache page carefully to understand when cache will be rebuilt. In your case, the cache is not supposed to be rebuilt unless your two documents have different chunk options (condition 1), or different getOption('width') (condition 3), since your code remains the same (condition 2).
You have to post a reproducible example, otherwise this is not considered a real question.

After completely resetting the example, it turned out that the cache is reused by both files. I'm not sure what caused the problem before.
In a bigger project, however, the chunks are still not cached. I'm not sure what causes the problem; maybe just a different number of spaces in the chunk code.
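For the record, a sketch of how to make the shared cache explicit rather than relying on the default cache directory (the path shared-cache/ is an arbitrary name; knitr only reuses an entry when the chunk options, the code, and getOption('width') are all identical, per the conditions above):

```r
## In the setup chunk of BOTH doc1.Rnw and doc2.Rnw:
library(knitr)
opts_chunk$set(cache = TRUE, cache.path = "shared-cache/")  # identical path in both files
read_chunk("plain.R")
```

With both documents writing to the same cache.path, whichever document is knitted first pays the cost of evaluating the chunk, and the other picks up the cached result.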

Related

Abstracts in rmarkdown that include numbers generated from the rmd itself?

So basically I have been writing a paper in Rmarkdown. The paper includes an abstract which has numbers/results that are generated from the code chunks within the markdown itself. Up to now, the workaround has been to place the abstract at the end of the paper, so that all the code chunks are run and the results generated before they are needed in the abstract.
Now that I am actually working on the final drafts, it would be ideal to have the abstract at the beginning. Is this even possible?
Thank you!
If your values won't change from run to run, one option is to use knitr::load_cache to load values from the cache of later chunks into your abstract section. The main downside is that this only works the second time you knit the document: the first time, load_cache returns NULL, then the later chunk is run and its value cached; the second time, the cache exists and is used in the abstract.
```{r abstract}
y = knitr::load_cache('test-a', 'y')
print(y)
```
```{r test-a, cache=TRUE}
y = 2*pi
```
The first time you knit, the abstract prints NULL; knit it again and it prints the cached value:
[1] 6.283185
This is kind of awkward, but it was the solution recommended by Yihui Xie, the creator of knitr. See this GitHub issue: https://github.com/yihui/knitr/issues/868#issuecomment-68129294
You have to be careful with cached chunks: make sure that nothing would change between runs, and clear the cache before doing your final (two-step) knitting.

Rmd re-run all blocks that read_csv

I have an Rmd document with several code blocks that call read_csv on many different CSVs. The Rmd makes various graphs and tables, and I use cache=TRUE to speed up rendering.
Another program produces the CSVs and generates different results depending on different experiments and configurations. So I want the Rmd to reload those CSVs which have changed and use the cache for those that have not.
At the moment I have a chunk option {r lastrun=100} on each code block that calls read_csv, and I search for lastrun and replace it (e.g. with 101) for the blocks I think should be reread, but I'd like this to be automated somehow.
So right now my Rmd looks like:
```{r lastrun=100}
a<-map_dfr(paste0(1:10, ".a.csv"), read_csv)
```
lots of text, including more code blocks
```{r}
#whatever
print(f(a))
```
```{r lastrun=100}
b<-map_dfr(paste0(1:20, ".b.csv"), read_csv)
```
So on rendering that, if any of *.a.csv or *.b.csv later change, I have to search/replace lastrun or I'll just see the stale cached versions of a and b. I want a and b to update when the files change (I don't need to identify the exact file and reread only that one; reloading the whole block would be fine).
How can this be done?
Thanks
-Neal
Here's a trick that might work:
---
title: 'nothing to report'
output: html_document
---
```{r setup}
files <- c("file1", "file2")
fileinfo <- file.info(files)
```
```{r test1, file=fileinfo['file1',-6], cache = TRUE}
Sys.sleep(5)
```
```{r test2, file=fileinfo['file2',-6], cache = TRUE}
Sys.sleep(5)
```
To test:
1. Render this; it should take 10 seconds.
2. Render it again; it should be instantaneous.
3. Create one or both of the files file1 and file2.
4. Render it again; it should take 5 seconds (if one was created) or 10 seconds (if both were).
5. Touch one of the files (updating its mtime or changing its contents, doesn't matter which).
6. Render it again; there should be a pause.
Notes:
- The files do not need to exist at first; the NA rows in fileinfo change when the files appear, which invalidates the cache just the same.
- The files can be in the current directory, a subdirectory, or any absolute path; the row name is the name given in my files variable.
- With [ , -6], I'm omitting atime, which records when something last "looked at" the file; it updates on each call to file.info, so including it would defeat the purpose of the cache (invalidating on every render).
- Other than that, the chunk currently invalidates on a change in size, isdir, mode, mtime, ctime, or exe, i.e. all remaining fields of file.info; you can easily select specific fields instead of omitting one, as I did above.
- A chunk can depend on multiple files: either be creative in the file= option in order to grab multiple rows, or have the setup block aggregate the summaries.
- On a network drive, fields like mtime can be less reliable; in that case, one might consider using md5sum somehow.
(You will likely want the code in the {r setup} block to be hidden.)
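On a network drive, or whenever content matters more than timestamps, the same idea can be expressed with checksums through the cache.extra chunk option, which the knitr cache documentation suggests for folding extra values into the cache key. A sketch tied back to the original question's .a.csv chunk (the label load-a is made up):

```{r load-a, cache = TRUE, cache.extra = tools::md5sum(paste0(1:10, ".a.csv"))}
a <- map_dfr(paste0(1:10, ".a.csv"), read_csv)
```

Whenever any of the ten files changes on disk, the vector of hashes changes, so the chunk options hash changes and the chunk is re-run; tools::md5sum returns NA for missing files, so files appearing or disappearing invalidates the cache too.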

Can I generate separate HTML file for each header using rmarkdown::render()?

I generate reports using rmarkdown::render() function on a list of .Rmd files and I get one HTML file for each of them.
That was fine until my dataset got bigger and my reports came to contain >100 figures: the HTML files often end up being >100 MB, and some are now very big (~500 MB).
The .Rmd is separated into several chunks, so one might think I just have to split it into smaller files (say, one chunk per file).
That is not (easily) doable, because the .Rmd defines a data-processing workflow: figures generated in chunk 3 require processing done in chunks 1 and 2.
I would like to know if it is possible to split the rendering in several HTML files automatically.
Ideally, I dream of a 'splitHeader' argument to render() that would generate a separate HTML file for each header of a specified level.
I guess an ugly solution is to manually add conditional statements around every chunk/header that I would like rendered (or not) and call render() several times with different arguments. But this is extremely inefficient (and ugly, I said that already)...
Would somebody have suggestions to achieve that ?
I am not sure if this solves your problem (or at least helps): you can have multiple independent .Rmd files (children) dividing the content as you like. In a "mother" file, you include each child with:
```{r child = "yourChild.Rmd"}
```
The child .Rmd files should not contain any header information; that is, delete the YAML header at the top of each child, which looks something like:
---
title: "Your Title"
author: "Your name"
output: html_notebook
---
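Going the other way, a sketch of getting one HTML file per section: render each section's .Rmd on its own. The file names and the saveRDS hand-off are assumptions; the idea is to run the heavy shared processing once and have each standalone section reload the results:

```r
library(rmarkdown)

# Run the shared data-processing once and save the results, e.g.:
#   saveRDS(results, "results.rds")
# Each sectionN.Rmd then starts with:
#   results <- readRDS("results.rds")
# and can be rendered as its own (much smaller) HTML file:
for (section in c("section1.Rmd", "section2.Rmd", "section3.Rmd")) {
  render(section, output_format = "html_document")
}
```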

Tangle knitr code blocks into not one but several files

I am a new user of knitr. I know that knitr can "tangle out" (a term from the literate-programming community) or extract source code blocks into an R script file. Being an org-mode user, I am used to being able to specify a target file for each code block, potentially with the same file for different blocks. When tangling or extracting source in org-mode, instead of one output code file, several code files are produced (this helps with modularity in large projects).
I wonder if something similar is possible in knitr? Can I specify the output file in knitr on a block by block basis?
There are at least two different readings of your question, each requiring slightly different workflows.
If each chunk is going to be written to a separate output document, then to aid modularity you should split the report into multiple documents. Since knitr supports child documents, you can always recombine these into larger documents in any combination you like.
If you want conditional execution of some chunks, and there are a few different combinations of conditions that can be run, use an R Markdown YAML header, and include a params element.
---
params:
  report_type: "weekly" # should be "weekly" or "yearly"
---
You can set which chunks are run by setting the eval and include chunk options.
```{r, some_chunk, eval = params$report_type == "weekly", include = params$report_type == "weekly"}
# chunk contents
```
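For literal tangling, knitr::purl() extracts the code from a document into one R script; there is no built-in per-chunk output file, but if the blocks are split into child documents you can purl each child separately. A sketch (the file names are assumptions):

```r
library(knitr)

# Extract all code chunks of one document into a single script:
purl("analysis.Rmd", output = "analysis.R")

# One script per child document, approximating org-mode's
# per-block tangle targets:
for (child in c("child1.Rmd", "child2.Rmd")) {
  purl(child, output = sub("\\.Rmd$", ".R", child))
}
```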

Rstudio is deleting key files when I knit (both PDF and HTML)

So I am having an R nightmare. I've returned to a project I built under the previous iteration (or perhaps one more) of RStudio. I produced a workable report that I was asked to update, and my current bugbear wasn't around then. Here is what happens:
My report file is "ISS Time Series.Rmd". It calls three other files:
"mystyles.sty", which updates the LaTeX preamble to use some additional packages.
"functions.R" and "load.R". The former contains frequently used functions I've written, and the latter loads the data I'm using.
I source the two .R files in the .Rmd file. When I try to knit the report, whether I get an error or am successful, my two .R files and my one .sty file are deleted. And not just deleted: gone for good.
I do not know what is up. I have ruined my previous work simply by returning to examine the original file.
Please, somebody has to help me here. My workflow is shot to hell if I have to write every last bit of code over and over again in each report.
UPDATE: Even copying the files to another directory doesn't help.
Here is the code block that calls the "load.R" file:
```{r loaddata}
#
# ------- Load Data
#
# This section loads the ISS survey files one at a time and saves them as
# read.SPSS objects within a list. It names these eleven objects as "ISS 2002",
# "ISS 2003", etc... until "ISS 2012". This file may be prohibitively large.
#
source("load.R") # Loads the ISS Survey files
```
Rename your file to ISS_Time_Series.Rmd and try again.
It is the spaces in the document name that make rmarkdown::render() delete the files that have been loaded or sourced.
An issue has already been filed; see https://github.com/rstudio/rmarkdown/issues/580
