With knitr, preserve chunk options when purling chunks into separate files - r

For teaching purposes, I would like to purl the chunks of my .Rnw file into separate files. This answer explains how to do it:
How to purl each chunks in .Rmd file to multiple .R files using Knitr
BUT the method does not preserve the chunk options. Since the chunks I have produce plots, it's important to preserve the fig.width and fig.height options. Ideally I would like a chunk that looks like this:
<<plot, fig.width = 3, fig.height = 5, out.width = '.75\\textwidth'>>=
plot(1, 1)
@
to become a file named plot.R that looks like this:
#+ fig.width = 3, fig.height = 5
plot (1,1)
That is, turn the chunk options fig.width and fig.height into a format that will be recognized by spin(), as purl() does, and get rid of the chunk options that are irrelevant, or create problems for spin() into Word, such as out.width. All in the spirit of creating code files that are user-friendly.

Since the answer you refer to doesn't copy the header line from the results of purl, you lose everything besides the chunk name. While you could adapt it to paste in the headers, it's actually not hard to build a function that parses the output of purl, which is much easier than trying to parse an Rmd or Rnw document, and easier than sorting out exactly how knitr does so.
purl_chunks <- function(input_file){
  purled <- knitr::purl(input_file)    # purl original file; save name to variable
  lines <- readLines(purled)    # read purled file into a character vector of lines
  starts <- grep('^## ----.*-+', lines)    # use grep to find header row indices
  stops <- c(starts[-1] - 1L, length(lines))    # end row indices
  # extract chunk names from headers
  names <- sub('^## ----([^-]([^,=]*[^,=-])*)[,-][^=].*', '\\1', lines[starts])
  names <- ifelse(names == lines[starts], '', paste0('_', names))    # clean if no chunk name
  # make nice file names with chunk number and name (if exists)
  file_names <- paste0('chunk_', seq_along(starts), names, '.R')
  for(chunk in seq_along(starts)){    # loop over header rows
    # save the lines in the chunk to a file
    writeLines(lines[starts[chunk]:stops[chunk]], con = file_names[chunk])
  }
  unlink(purled)    # delete purled file of entire document
}
A couple notes:
While the regex works for what I've thrown at it, it may yet be fallible. Tested: no chunk name; no name but chunk settings; name with hyphens; single-character names; names with spaces, including trailing spaces (before the comma or closing bracket).
As .Rnw and .Rmd files both purl the same, it works for either.
It uses the default setting (1L) for purl's documentation parameter. With 0L there are no chunk headers, so there is nothing to split on. With 2L (which includes text chunks as roxygen comments) it would split badly: the text preceding each code chunk would end up at the end of the previous chunk's file. It could be rebuilt for that case, though, if you wanted text chunks kept with the following code chunk.
While the output header lines of purl don't look exactly like your example above (they start with ## ---- and are filled with hyphens), they spin properly; results obey chunk options.
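If you do want spin()-style `#+` headers without options such as out.width, the header lines can be rewritten before the files are saved. Below is a rough sketch of one way to do it, not something knitr provides: the helper name to_spin_header and its drop argument are made up for illustration, and it assumes that no option value contains a comma and that the chunk label does not end in a hyphen.

```r
# Hypothetical helper, not part of knitr: convert one purl() header line
# into a spin()-style "#+" header and drop unwanted chunk options.
# Assumption: no option value contains a comma (e.g. in a figure caption).
to_spin_header <- function(header, drop = c("out.width", "out.height")) {
  body <- sub("^## ----", "", header)  # strip purl's prefix
  body <- sub("-+$", "", body)         # strip the trailing hyphen filler
  parts <- trimws(strsplit(body, ",", fixed = TRUE)[[1]])
  opt_names <- trimws(sub("=.*$", "", parts))   # text before the first "="
  is_option <- grepl("=", parts, fixed = TRUE)  # a bare chunk label has no "="
  keep <- !(is_option & opt_names %in% drop)
  paste("#+", paste(parts[keep], collapse = ", "))
}

header <- "## ----plot, fig.width = 3, fig.height = 5, out.width = '.75\\\\textwidth'----"
spin_header <- to_spin_header(header)
print(spin_header)
```

Inside purl_chunks(), something like this could be applied to lines[starts] before the writeLines() call. The chunk label is kept in the header, which spin() accepts.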

Related

Source code from Rmd file within another Rmd

I'm attempting to make my code more modular: data loading and cleaning in one script, analysis in another, etc. If I were using R scripts, this would be a simple matter of calling source on data_setup.R inside analysis.R, but I'd like to document the decisions I'm making in an Rmarkdown document for both data setup and analysis. So I'm trying to write some sort of source_rmd function that will allow me to source the code from data_setup.Rmd into analysis.Rmd.
What I've tried so far:
The answer to How to source R Markdown file like `source('myfile.r')`? doesn't work if there are any repeated chunk names (a problem since the chunk named setup has special behavior in RStudio's notebook handling). How to combine two RMarkdown (.Rmd) files into a single output? wants to combine entire documents, not just the code from one, and also requires unique chunk names. I've tried using knit_expand as recommended in Generate Dynamic R Markdown Blocks, but I have to name chunks with variables in double curly-braces, and I'd really like a way to make this easy for my collaborators to use as well. And using knit_child as recommended in How to nest knit calls to fix duplicate chunk label errors? still gives me duplicate label errors.
After some further searching, I've found a solution. There is a package option in knitr that can be set to change the behavior for handling duplicate chunks, appending a number after their label rather than failing with an error. See https://github.com/yihui/knitr/issues/957.
To set this option, use options(knitr.duplicate.label = 'allow').
For the sake of completeness, the full code for the function I've written is
source_rmd <- function(file, local = FALSE, ...){
  options(knitr.duplicate.label = 'allow')

  tempR <- tempfile(tmpdir = ".", fileext = ".R")
  on.exit(unlink(tempR))
  knitr::purl(file, output=tempR, quiet = TRUE)

  envir <- globalenv()
  source(tempR, local = envir, ...)
}
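Note that the function above always evaluates into the global environment and leaves the knitr.duplicate.label option changed for the rest of the session. If that is a concern, a variant along the following lines might do. This is only a sketch: the name source_rmd_env and its envir argument are my own invention, and the small .Rmd file is created here just so the example runs.

```r
library(knitr)

# Hypothetical variant of source_rmd(): evaluates into a caller-supplied
# environment and restores the duplicate-label option afterwards.
source_rmd_env <- function(file, envir = new.env(), ...) {
  old <- options(knitr.duplicate.label = "allow")
  on.exit(options(old), add = TRUE)      # put the previous setting back
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)       # remove the purled script
  knitr::purl(file, output = tmp, quiet = TRUE)
  source(tmp, local = envir, ...)
  invisible(envir)
}

# A throwaway .Rmd file to try it on
rmd <- tempfile(fileext = ".Rmd")
fence <- strrep("`", 3)
writeLines(c("Some text.", "", paste0(fence, "{r setup}"), "x <- 21 * 2", fence), rmd)

e <- source_rmd_env(rmd)
print(e$x)
```

The objects created by the sourced chunks end up in e rather than in the global environment.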

Resources for knitr chunks

I've been trying to find this on the web but haven't had any luck. I'm working on creating a report with R using knitr and was wondering if anyone knows of a good resource for all options involving <<>>=. I've seen some examples like <<setup, include=FALSE, cache=FALSE>>= but I don't know what these mean and would like to know what else I can do.
To let you close the question: everything is here: http://yihui.name/knitr/options/#chunk_options. I will just paste the most important (in my opinion) options below:
Code Evaluation
eval: (TRUE; logical) whether to evaluate the code chunk; it can also be a numeric vector to select which R expression(s) to evaluate, e.g. eval=c(1, 3, 4) or eval=-(4:5)
Text Results
echo: (TRUE; logical or numeric) whether to include R source code in the output file; besides TRUE/FALSE which completely turns on/off the source code, we can also use a numeric vector to select which R expression(s) to echo in a chunk, e.g. echo=2:3 means only echo the 2nd and 3rd expressions, and echo=-4 means to exclude the 4th expression
results: ('markup'; character) takes these possible values:
markup: mark up the results using the output hook, e.g. put results in a special LaTeX environment
asis: output as-is, i.e., write raw results from R into the output document
hold: hold all the output pieces and push them to the end of a chunk
hide: hide results; this option only applies to normal R output (not warnings, messages or errors)
collapse: (FALSE; logical; applies to Markdown output only) whether to, if possible, collapse all the source and output blocks from one code chunk into a single block (by default, they are written to separate <pre></pre> blocks)
warning: (TRUE; logical) whether to preserve warnings (produced by warning()) in the output like we run R code in a terminal (if FALSE, all warnings will be printed in the console instead of the output document); it can also take numeric values as indices to select a subset of warnings to include in the output
error: (TRUE; logical) whether to preserve errors (from stop()); by default, the evaluation will not stop even in case of errors!! if we want R to stop on errors, we need to set this option to FALSE
message: (TRUE; logical) whether to preserve messages emitted by message() (similar to warning)
include: (TRUE; logical) whether to include the chunk output in the final output document; if include=FALSE, nothing will be written into the output document, but the code is still evaluated and plot files are generated if there are any plots in the chunk, so you can manually insert figures; note this is the only chunk option that is not cached, i.e., changing it will not invalidate the cache.
Cache
cache: (FALSE; logical) whether to cache a code chunk; when evaluating code chunks, the cached chunks are skipped, but the objects created in these chunks are (lazy-) loaded from previously saved databases (.rdb and .rdx) files, and these files are saved when a chunk is evaluated for the first time, or when cached files are not found (e.g. you may have removed them by hand)
Plots
fig.path: ('figure/'; character) prefix to be used for figure filenames (fig.path and chunk labels are concatenated to make filenames); it may contain a directory like figure/prefix- (will be created if it does not exist); this path is relative to the current working directory
fig.width, fig.height: (both are 7; numeric) width and height of the plot, to be used in the graphics device (in inches) and have to be numeric
dev: ('pdf' for LaTeX output and 'png' for HTML/markdown; character) the function name which will be used as a graphical device to record plots
An example chunk:
```{r global_options, include = FALSE}
knitr::opts_chunk$set(fig.width = 9, fig.height = 4, fig.path = "Figs/", dev = "svg",
                      echo = FALSE, warning = FALSE, message = FALSE,
                      cache = FALSE, tidy = FALSE, size = "small")
```
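The defaults quoted above can also be checked from the R console. A small sketch, assuming a fresh session in which opts_chunk$set() has not been called yet (otherwise you will see your own settings instead of the defaults):

```r
library(knitr)

# Outside of a knit() run, opts_chunk$get() reports the current global
# chunk options; in a fresh session these are the built-in defaults.
defaults <- opts_chunk$get(c("eval", "echo", "results", "cache",
                             "fig.width", "fig.height"))
str(defaults)
```

Calling opts_chunk$get() without arguments lists every option, which is a quick way to see what else is available.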

Include same chunk twice with different parameters

I have a long .Rnw document which consists mostly of text (typeset in LaTeX) with a few chunks here and there. I have also written a chunk which outputs a specific figure. The figure contains a plot, the values for the plot are currently read from a .csv file and some parameters like colors defined manually within the chunk.
Now I want to have the same figure in a different place in the document, but with different values for the plot and a few other parameters different. Ideally, I would like to include the chunk as a child twice, and pass parameters to it somehow, including the name of the .csv to be used for the plot values. I would hate to copy paste the chunk code with hardcoded parameters, as it is complex enough that potential changes will be difficult to synchronize.
How can I do such "parameterized reuse" of chunks?
update
As requested, a small example
This is saved as include-chunk-reuse.Rnw
<<toReuse, echo=FALSE, results='asis'>>=
l <- 25
@
\newlength{\mylength}
\setlength{\mylength}{\Sexpr{l}pt}
%Omitted: a lot of complicated LaTeX commands
\rule{\mylength}{1pt}
This is the document which is supposed to reuse the chunk. It doesn't even compile, as it complains that the same label is used twice: Error in parse_block(g[-1], g[1], params.src) : duplicate label 'toReuse'
\documentclass{article}
\begin{document}
This is some text. And now comes a 25 pt wide line.
<<first-figure, child='include-chunk-reuse.Rnw'>>=
@
This is some text. The next line is also 25 pt wide. But I would like to call the chunk in a way which makes it 50 pt wide instead.
<<second-figure, child='include-chunk-reuse.Rnw'>>=
@
\end{document}
For the knitr part to work, simply leave out the chunk name in the child document; without it there is no duplicate label.
Passing parameters does not really work as far as I know, but you can just set a global variable before including the child (for example, \Sexpr{l <- 200}).
You are still redefining \mylength, which is why LaTeX will throw an error, so move the first definition of \mylength (the \newlength line) from the child to the main document.
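To see the idea in isolation, here is a rough sketch that can be run from the R console. It uses small .Rmd snippets instead of the .Rnw files from the question so that no LaTeX is needed (that substitution is mine), and the file contents are made up for the demonstration. The child has an unnamed chunk and reads the global variable l, which the parent changes between the two inclusions.

```r
library(knitr)

fence <- strrep("`", 3)  # build chunk fences without writing them literally

# Child document: its chunk is unnamed, so including it twice causes no
# duplicate-label error. It reads the global variable `l`.
child <- tempfile(fileext = ".Rmd")
writeLines(c(
  paste0(fence, "{r, echo=FALSE, results='asis'}"),
  "cat('width:', l, 'pt\\n')",
  fence
), child)
child_path <- normalizePath(child, winslash = "/")

# Parent document: set `l`, include the child, change `l`, include it again.
parent <- c(
  paste0(fence, "{r, include=FALSE}"), "l <- 25", fence,
  paste0(fence, "{r, child='", child_path, "'}"), fence,
  paste0(fence, "{r, include=FALSE}"), "l <- 50", fence,
  paste0(fence, "{r, child='", child_path, "'}"), fence
)

out <- knit(text = parent, quiet = TRUE, envir = new.env())
cat(out)
```

The same pattern carries over to .Rnw: set the variable in a chunk (or \Sexpr) in the main document right before each child inclusion.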
The example below demonstrates two ways to reuse and parametrize a chunk.
Reusing Chunks
The mechanism is explained here. Basically, the simplest way to reuse a chunk is to add another empty chunk with the same label. Alternatively, the chunk option ref.label lets a chunk inherit another chunk's code.
Both approaches of reusing chunks are largely equivalent – with one exception: figures generated in chunks are saved as chunklabel-i.pdf, where i is the figure index counted by chunk. Therefore, if a chunk is reused by repeating its label, figure i from the second use will overwrite figure i from the first use. This is the reason why I use ref.label (and thus distinct chunk labels) in the example below (otherwise, the points on both plots would be green).
In the example below, I used eval = FALSE in order to prevent evaluation of the masterchunk where it is defined. An alternative would be to externalize the chunk and read it by read_chunk().
Parameterizing Chunks
The two most straightforward options to "pass" parameters to a chunk are chunk options and global variables.
Even when a chunk is reused, each use can set different chunk options. The example below exploits this to set different captions.
As all chunks run in the same environment, setting a variable in an early chunk affects subsequent chunks accessing this variable. In the example below, mycolor is modified this way.
\documentclass{article}
\begin{document}
<<masterchunk, eval = FALSE>>=
plot(1:10, col = mycolor)
@
<<config1>>=
mycolor <- "red"
@
<<use1, ref.label = "masterchunk", fig.cap = "Red dots">>=
@
<<config2>>=
mycolor <- "green"
@
<<use2, ref.label = "masterchunk", fig.cap = "Green dots">>=
@
\end{document}

knitr chunks within .bib file

I am using the LaTeX package problems to create solution sets from a database of problems. The database is structured like a bibliography database, in a .bib file. The whole system works beautifully for regular LaTeX, but now some of my solutions have R code (in knitr chunks) in them.
The default sequence of knitting/TeXing/BibTeXing in RStudio is not working-- the code ends up in the document verbatim, along with mangled versions of the chunk delimiters. So, I need to find the right workflow of steps to ensure that the code makes it through.
This package seems to be very set on having two files, one for the database and one for the .tex/.rnw, so I can't make a working example, but something like this:
\documentclass{article}
\usepackage[solution]{problems}
\Database{
@QUESTION{1.1,
problem = {1.1},
solution = {This solution only uses TeX, like $\bar{x}$. It works fine.}}
@QUESTION{1.2,
problem = {1.2},
solution = {This solution includes code
<<>>=
head(iris)
@
It does not work.
}}}
\begin{document}
Problems for this week were 1.1 and 1.2
\problems{1.1}
\problem{1.2}
\end{document}
You will have to knit the .bib file first and then run LaTeX and BibTeX.
Usually you have a .Rnw file that is knitted to .tex, and the LaTeX tools then process the .tex and the .bib file. Here you will instead have to start with a (let's call it) .Rbib file that is knitted to .bib and then processed by LaTeX.
For simplicity, I give the file I called .Rbib above the name bibliography.Rnw but you can choose any extension you like. I chose .Rnw because the syntax used inside is the same as in .Rnw files.
As dummy entries for the bib-file I use data from verbosus.com and added some knitr code.
The first chunk sets global chunk options to prevent knitr from adding the code of the chunks or any markup to the output file. The next chunk shows how for example the title field could be filled with generated content and the \Sexpr{} part is an example how this could be used to add some dynamic text.
<<setup>>=
library(knitr)
opts_chunk$set(    # chunk options are set with opts_chunk, not opts_knit
  echo = FALSE,
  results = "asis"
)
@
@article{article,
author = {Peter Adams},
title = {
<<echo=FALSE, results = "asis">>=
cat("The title of the work")
@
},
journal = {"The title of the journal"},
year = 1993,
number = 2,
pages = {201-213},
month = 7,
note = {An optional note},
volume = 4
}
@book{book,
author = {Peter Babington},
title = {\Sexpr{"The title of the work"}},
publisher = {The name of the publisher},
year = 1993,
volume = 4,
series = 10,
address = {The address},
edition = 3,
month = 7,
note = {An optional note},
isbn = {3257227892}
}
It is important to have the chunk option results = "asis" and to use cat() instead of print(). Otherwise, there would be unwanted characters in the output.
If this is saved as bibliography.Rnw the following is enough to get a .bib file with the chunks evaluated:
knit(input = "bibliography.Rnw", output = "bibliography.bib")
After that only standard LaTeX compilation remains.
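For a quick check that the chunks really are evaluated, here is a cut-down version of the above that can be pasted into the console. It is only a sketch: the entry is shortened, and the files are written to a temporary directory (my choice for the demo, so nothing in your project is touched).

```r
library(knitr)

# A tiny stand-in for bibliography.Rnw, written to a temporary directory
rbib <- file.path(tempdir(), "bibliography.Rnw")
bib  <- file.path(tempdir(), "bibliography.bib")
writeLines(c(
  "@article{article,",
  "  author = {Peter Adams},",
  "  title = {",
  "<<echo=FALSE, results='asis'>>=",
  "cat('The title of the work')",
  "@",
  "  },",
  "  journal = {\\Sexpr{'The title of the journal'}}",
  "}"
), rbib)

# Knit the Rnw-syntax file straight to a .bib file
knit(input = rbib, output = bib, quiet = TRUE)
bib_lines <- readLines(bib)
cat(bib_lines, sep = "\n")
```

The resulting file should contain the generated title and journal in place of the chunk and the \Sexpr{} call.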

A Way in Knitr to Copy a Chunk?

Knitr Mavens,
Background: Using knitr to generate a report with many embedded graphs. In the body of the report, all that's appropriate is the graph, not the code.
For example:
```{r graph_XYZ_subset, echo = FALSE, message = TRUE,
fig.cap = "Text that explains the graph"}
graph.subset <- ggplot() + ...
```
This part works just fine.
However, there is a need to display the key parts of the code (e.g., key statistical analyses and key graph generations)...but in an Addendum.
Which leads to this question: is there a way to copy a knitr chunk from the early parts of a script to a later part?
To ensure accuracy, it's ideal that the code in the Addendum list (display) all the code that was actually executed in the report.
For example:
# ADDENDUM - Code Snippets
### Code to Generate Subset Graph
\\SOMEHOW COPY the code from graph_XYZ_subset to here without executing it.
### Code to Compute the Mean of Means of the NN Factors
\\Copy another knitr chunk which computes the mean of means, etc.
### And So On...
\\Copy chunks till done
* * * * * * * *
Any ideas? Is there a way in knitr to perform these types of chunk copies?
There are several options, four of them listed and briefly explained below. Yihui's explanations in How to reuse chunks might also help.
\documentclass{article}
\begin{document}
\section{Output}
<<mychunk, echo = FALSE>>=
print("Hello World!")
@
\section{Source code}
Option 1: Use an empty chunk with the same label.
<<mychunk, eval = FALSE>>=
@
Option 2: Embed in other chunk (no advantage in this case). Note that the inner chunk reference has no equals sign and no closing at sign.
<<myOtherChunk, eval = FALSE>>=
<<mychunk>>
@
Option 3: Use \texttt{ref.label}.
<<ref.label = "mychunk", eval = FALSE>>=
@
Option 4: Define the chunk in an external file you read in using \texttt{read\_chunk}. Then use Option 1--3 to execute the chunk (with \texttt{eval = TRUE}; default) or show its code (with \texttt{eval = FALSE}).
\end{document}
I usually prefer Option 4. This allows you to separate the programming logic from writing the document.
At the place where mychunk is to be executed and the graph will appear in the PDF, you only have <<mychunk>>= in your Rnw file and don't have to bother with all the code that generates your graph. Developing your code is also easier, because in an interactive session you have all your code in one spot and don't have to scroll through all the text of the report when going from one chunk to the next one.
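As a rough illustration of Option 4, the sketch below can be run from the R console. It uses a small R Markdown text instead of an .Rnw file so that no LaTeX is needed (my substitution), and the script and document contents are made up for the demo. The external script labels its code with a "## ---- mychunk ----" line, read_chunk() loads it, an empty chunk with that label executes it, and a ref.label chunk shows the source without running it.

```r
library(knitr)

fence <- strrep("`", 3)  # build chunk fences without writing them literally

# External script holding the code; "## ---- label ----" lines name the chunks
script <- tempfile(fileext = ".R")
writeLines(c(
  "## ---- mychunk ----",
  "print('Hello World!')"
), script)
script_path <- normalizePath(script, winslash = "/")

doc <- c(
  paste0(fence, "{r, include=FALSE}"),
  paste0("knitr::read_chunk('", script_path, "')"),
  fence,
  "Output:",
  paste0(fence, "{r mychunk, echo=FALSE}"), fence,     # runs the code
  "Source code:",
  paste0(fence, "{r, ref.label='mychunk', eval=FALSE}"), fence  # only shows it
)

out <- knit(text = doc, quiet = TRUE, envir = new.env())
cat(out)
```

The report text then contains no analysis code at all; everything lives in the script, which can also be run on its own.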
EDIT:
The options mentioned above have in common that you need to manually maintain a list of the chunks to show in the appendix. Here are two options to avoid this; unfortunately, both have some drawbacks:
Option 1: Automatically create a list of all chunks that have been executed and show their code.
This can be achieved using a chunk hook that registers all chunk names. Include the following chunk before all other chunks in the document:
<<echo = FALSE>>=
library(knitr)
myChunkList <- c()
listChunks <- function(before, options, envir) {
  if (before) {
    myChunkList <<- c(myChunkList, options$label)
  }
  return(NULL)
}
knit_hooks$set(recordLabel = listChunks) # register the new hook
opts_chunk$set(recordLabel = TRUE) # enable the new hook by default
@
Where you want to show the code (for example in the appendix), insert the following chunk:
<<showCode, ref.label = unique(myChunkList), eval = FALSE>>=
@
Unfortunately, there will be no margin or any other visual separation between the chunks.
Option 2: Using the chunk hook is not always necessary because there is the function all_labels() that returns a list of all chunk labels. However, there might be chunks in your file that don't get executed and you probably don't want to see their code. Moreover, option 1 allows skipping certain chunks simply by setting recordLabel = FALSE in their chunk options.
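A minimal sketch of the all_labels() variant, runnable from the console. It again uses a small R Markdown text rather than an .Rnw file (my substitution, to avoid needing LaTeX), and the chunk names and contents are invented for the demo. The first chunk runs silently; the appendix chunk pulls in the code of all labelled chunks via ref.label and shows it without evaluating it.

```r
library(knitr)

fence <- strrep("`", 3)  # build chunk fences without writing them literally

doc <- c(
  paste0(fence, "{r analysis, echo=FALSE}"),
  "x <- mean(c(2, 4, 6))",
  fence,
  "Appendix:",
  # ref.label collects the code of every chunk known to knitr
  paste0(fence, "{r appendix, ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}"),
  fence
)

out <- knit(text = doc, quiet = TRUE, envir = new.env())
cat(out)
```

Since the analysis chunk has echo=FALSE, its code shows up only once in the output, in the appendix.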
