R Markdown: Always Run Certain Chunks, Skip Others

I need to run repetitive code on multiple data sets. I like to do this in R Markdown files because the drop-down headers make it easier to organize and navigate my code. I rarely knit these files but instead run specific code chunks.
Some variables are the same across datasets: packages to load, a custom function, master csv file, etc. I prefer to include these common elements in a separate code chunk at the top of the rmd file. This facilitates simple modifications if needed, instead of needing to modify the same code within multiple chunks.
In my example below, when I run the Dataset 1 code chunk, I want it to first run the three chunks under the #Setup header and then run the Dataset 1 Chunk. Dataset 2 Chunk is not run.
Similarly, when I run Dataset 2 Chunk, I want it to first run #Setup chunks followed by Dataset 2 Chunk. Dataset 1 is not run.
# Setup
{r Setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,tidy.opts=list(width.cutoff=90),tidy=TRUE)
{r Packages, message=FALSE, warning=FALSE}
rm(list = ls()); invisible(gc()) #clear workspace and perform garbage collection to free up memory.
suppressPackageStartupMessages(
{ library(tidyverse)
library(readxl)
library(ggplot2)
library(rtracklayer)
library(trackViewer)
library(ggplot2)
}
)
# Specific Analyses
## Dataset 1
{r Dataset 1 Code, message = FALSE}
dataset1 <- read_excel("~/Desktop/Dataset1.xlsx", col_names = TRUE)
## Dataset 2
{r Dataset 2 Code, message = FALSE}
dataset2 <- read_excel("~/Desktop/Dataset2.xlsx", col_names = TRUE)

I would do it by putting the setup code in a function and calling that function at the start of each analysis chunk. If setup is slow, you could add a check to it so it only runs once per session (but then watch out if you change the setup data).
For example, replace your Setup and Packages chunks with this:
```{r SetupFunction, include=FALSE}
Setup <- function() {
  knitr::opts_chunk$set(echo = TRUE,
                        tidy.opts = list(width.cutoff = 90),
                        tidy = TRUE)
  # Clear workspace and perform garbage collection to free up memory, but keep this function
  removals <- ls(globalenv())
  removals <- removals[removals != "Setup"]
  rm(list = removals, pos = globalenv())
  gc()
  # Make sure packages are loaded
  suppressPackageStartupMessages({
    library(tidyverse)
    library(readxl)
    library(ggplot2)
    library(rtracklayer)
    library(trackViewer)
  })
  # Define a function. Use `<<-` so it is available globally
  newfn <<- function(...) {
    print("this is newfn")
  }
}
```
Then at the start of each analysis chunk, just call Setup().
The main weakness I see is that the search() list isn't being cleaned up, so if any of your analysis chunks attach variables or packages, it will keep those in other chunks as well. You could fix this by saving the search() value and using detach() later to clean it up, but it's probably not needed. You shouldn't ever use attach(), and to be consistent with the way you have this document set up, you should be putting all your library() calls in the Setup() function.
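The "only runs once per session" check mentioned at the start of this answer could look roughly like the sketch below. It is not part of the original answer: the option name `analysis.setup.done` and the `force` argument are made up for illustration, and the flag is stored in `options()` because a variable in the global environment would be wiped by the `rm()` call in the real setup code.

```r
# Run-once wrapper around the setup work.
# "analysis.setup.done" and `force` are illustrative assumptions.
Setup <- function(force = FALSE) {
  if (!force && isTRUE(getOption("analysis.setup.done"))) {
    return(invisible(FALSE))  # already set up in this session; skip
  }
  # ... slow setup work goes here (library() calls, master csv, etc.) ...
  options(analysis.setup.done = TRUE)
  invisible(TRUE)  # TRUE means the setup work actually ran
}

first  <- Setup()              # runs the setup work
second <- Setup()              # skipped because the flag is set
third  <- Setup(force = TRUE)  # re-run after changing the setup code
```

Calling `Setup(force = TRUE)` is the escape hatch for the caveat above: if you change the setup data, the cached state is stale until you force a re-run.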

Related

kable all tables in markdown report [duplicate]

Using knitr and R Markdown, I can produce a tabularised output from a matrix using the following command:
```{r results='asis'}
kable(head(x))
```
However, I’m searching for a way to make the kable code implicit since I don’t want to clutter the echoed code with it. Essentially, I want this:
```{r table=TRUE}
head(x)
```
… to yield a formatted tabular (rather than the normal output='markdown') output.
I actually thought this must be pretty straightforward since it’s a pretty obvious requirement, but I cannot find any way to achieve this, either via the documentation or on the web.
My approach to create an output hook fails because once the data arrives at the hook, it’s already formatted and no longer the raw data. Even when specifying results='asis', the hook obtains the output as a character string and not as a matrix. Here’s what I’ve tried:
default_output_hook <- knit_hooks$get('output')
knit_hooks$set(output = function (x, options)
if (! is.null(options$table))
kable(x)
else
default_output_hook(x, options)
)
But like I said, this fails since x is not the original matrix but rather a character string, and it doesn’t matter which value for the results option I specify.
Nowadays one can set df_print in the YAML header:
---
output:
html_document:
df_print: kable
---
```{r}
head(iris)
```
I think the other answers are from a time when the following didn't work, but now we can just do:
```{r results='asis', render=pander::pander}
head(x)
```
Or set this for all chunks in the setup chunk, for instance:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, render=pander::pander)
```
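The `render` chunk option works where an output hook does not because the render function receives the original object before it has been printed to text. Its default is `knitr::knit_print`, an S3 generic, so another way to get the same effect is to define a `knit_print` method for data frames. The sketch below follows the pattern in knitr's custom print methods vignette; treat it as an illustration rather than a drop-in replacement for the answers here.

```r
library(knitr)

# Custom knit_print method: render every data frame with kable().
knit_print.data.frame <- function(x, ...) {
  # Leading blank lines keep the table separate from preceding text.
  res <- paste(c("", "", kable(x)), collapse = "\n")
  asis_output(res)
}

# Register the method so knitr finds it even when defined in a chunk.
registerS3method(
  "knit_print", "data.frame", knit_print.data.frame,
  envir = asNamespace("knitr")
)
```

With this in a setup chunk, a later chunk that just evaluates `head(iris)` should come out as a table with no `kable()` call in the echoed code. Matrices would need their own method.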
Lacking a better solution I’m currently re-parsing the character string representation that I receive in the hook. I’m posting it here since it kind of works. However, parsing a data frame’s string representation is never perfect. I haven’t tried the following with anything but my own data and I fully expect it to break on some common use-cases.
reparse <- function (data, comment, ...) {
# Remove leading comments
data <- gsub(sprintf('(^|\n)%s ', comment), '\\1', data)
# Read into data frame
read.table(text = data, header = TRUE, ...)
}
default_output_hook <- knit_hooks$get('output')
knit_hooks$set(output = function (x, options)
if (is.null(options$table))
default_output_hook(x, options)
else {
extra_opts <- if (is.list(options$table)) options$table else list()
paste(kable(do.call(reparse, c(x, options$comment, extra_opts))),
collapse = '\n')
}
)
This will break if the R markdown comment option is set to a character sequence containing a regular expression special char (e.g. *), because R doesn’t seem to have an obvious means of escaping a regular expression.
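One way around the escaping problem is to avoid a regular expression for the prefix altogether and strip it line by line with fixed-string functions. This is a sketch, not part of the hook above; `strip_prefix` is a hypothetical helper name.

```r
# Remove a literal "<comment> " prefix from every line of `text`.
# No regex is involved, so characters such as * or # need no escaping.
strip_prefix <- function(text, comment) {
  prefix <- paste0(comment, " ")
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  has_prefix <- startsWith(lines, prefix)
  lines[has_prefix] <- substring(lines[has_prefix], nchar(prefix) + 1)
  paste(lines, collapse = "\n")
}

cleaned <- strip_prefix("**   A B\n** 1 1 4\n** 2 2 5", "**")
```

Inside `reparse`, the `gsub()` line could then be replaced by `data <- strip_prefix(data, comment)`.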
Here’s a usage example:
```{r table=TRUE}
data.frame(A=1:3, B=4:6)
```
You can pass extra arguments to the reparse function. This is necessary e.g. when the table contains NA values because read.table by default interprets them as strings:
```{r table=list(colClasses=c('numeric', 'numeric'))}
data.frame(A=c(1, 2, NA, 3), B=c(4:6, NA))
```
Far from perfect, but at least it works (for many cases).
Not exactly what you are looking for, but I am posting an answer here (that could not fit in a comment) as your described workflow is really similar to my initial goal and use case when I started to work on my pander package. Although I really like the many chunk options that are available in knitr, I wanted an engine that makes creating documents really easy and automatic, without any tweaks. I am aware that knitr hooks are really powerful, but I just wanted to set a few things in my Rprofile and let the literate programming tool do its job without further trouble. That ended up being Pandoc.brew for me.
The main idea is to specify a few options (what markdown flavour are you using, what's your decimal mark, favorite colors for your charts etc), then simply write your report in a brew syntax without any chunk options, and the results of your code would be automatically transformed to markdown. Then convert that to pdf/docx/odt etc. with Pandoc.

How can I make the output of my function (several ggplot2 graphs) an html file (displaying those graphs)?

I'm writing a personal use package which trains/tests models, and finally runs a myriad of LIME and DALEX explanations on them. I save these as their own ggplot2 objects (say lime_plot_1), and at the end of the function these are all returned to the global environment.
However, what I would like to have happen is that, at the end of the function, not only would I have these graphs in the environment but a small html report would also be rendered - containing all the graphs that were made.
I would like to point out that while I do know I could do this by simply using the function within an Rmarkdown or Rnotebook, I would like to avoid that as I plan on using it as an .R script to streamline the whole process (since I'll be running this with a certain frequency), and from my experience running big chunks in .Rmd tends to crash R.
Ideally, I'd have something like this:
s_plot <- function(...) {
  # 1. construct LIME explanations
  # 2. construct DALEX explanations
  # 3. save explanations as ggplot2 objects and list them under graphs_list
  # 4. render graphs_list as an html file
}
1, 2, and 3 all work but I haven't found a way to tackle 4. that doesn't include doing the whole process in a .Rmd file.
EDIT: Thanks to @Richard Telford's and @Axeman's comments, I figured it out. Below is the function:
s_render <- function(graphs_list = graphs_list, meta = NULL, cacheable = NA){
currentDate <- Sys.Date()
rmd_file <- paste("/path/to/folder",currentDate,"/report.Rmd", sep="")
file.create(rmd_file)
graphs_list <- c(roc_plot, prc_plot, mp_boxplot, vi_plot, corr_plot)
c(Yaml file headers here, just like in a regular .Rmd) %>% write_lines(rmd_file)
rmarkdown::render(rmd_file,
params = list(
output_file = html_document(),
output_dir = rmd_file))}
First, create a simple Rmarkdown file, that takes a parameter. The only objective of this file is to create the report. You can for instance pass a file name:
---
title: "test"
author: "Axeman"
date: "24/06/2019"
output: html_document
params:
file: 'test.RDS'
---
```{r}
plot_list <- readRDS(params$file)
lapply(plot_list, print)
```
I saved this as test.Rmd.
Then in your main script, write the plot list to a temporary file on disk, and pass the file name to your markdown report:
library(ggplot2)
plot_list <- list(
qplot(1:10, 1:10),
qplot(1:10)
)
file <- tempfile()
saveRDS(plot_list, file)
rmarkdown::render('test.Rmd', params = list(file = file))
An .html file with the plots is now on your disk.
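To get this into a single function call, as the question asks, the two steps can be wrapped together. The following is only a sketch under assumptions: `write_report_template` and `render_plot_report` are hypothetical names, the template is a cut-down version of test.Rmd, and rendering needs the rmarkdown package and pandoc. Because `render()` runs in the calling session, ggplot2 is already loaded when the plots are printed.

```r
# Write a minimal parameterised report template to `path`.
write_report_template <- function(path) {
  fence <- strrep("`", 3)  # chunk fence, built at run time
  writeLines(c(
    "---",
    "title: \"Plots\"",
    "output: html_document",
    "params:",
    "  file: \"plots.rds\"",
    "---",
    "",
    paste0(fence, "{r, echo=FALSE}"),
    "plot_list <- readRDS(params$file)",
    "invisible(lapply(plot_list, print))",
    fence
  ), path)
  invisible(path)
}

# Save the plots, render the template against them,
# and return the path of the HTML file.
render_plot_report <- function(plot_list, output_file = "report.html",
                               output_dir = getwd()) {
  rmd <- tempfile(fileext = ".Rmd")
  rds <- tempfile(fileext = ".rds")
  write_report_template(rmd)
  saveRDS(plot_list, rds)
  rmarkdown::render(rmd, params = list(file = rds),
                    output_file = output_file, output_dir = output_dir,
                    quiet = TRUE)
}

# Usage (needs rmarkdown and pandoc): render_plot_report(plot_list)
```

Calling `render_plot_report(graphs_list)` as the last line of the plotting function would then produce the report without the analysis itself living in an .Rmd file.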

how to tell if code is executed within a knitr/rmarkdown context?

Based on some simple tests, interactive() is true when running code within rmarkdown::render() or knitr::knit2html(). That is, a simple .rmd file containing
```{r}
print(interactive())
```
gives an HTML file that reports TRUE.
Does anyone know of a test I can run within a code chunk that will determine whether it is being run "non-interactively", by which I mean "within knit2html() or render()"?
As Yihui suggested on GitHub, isTRUE(getOption('knitr.in.progress')) can be used to detect whether code is being knitted or executed interactively.
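Wrapped in a small helper, that check might be used like this (a minimal sketch; `is_knitting` is just an illustrative name):

```r
# TRUE only while knitr is processing the document.
is_knitting <- function() {
  isTRUE(getOption("knitr.in.progress"))
}

if (is_knitting()) {
  message("Running inside knit()/render()")
} else {
  message("Running interactively or via Rscript")
}
```

Because the option is unset outside of knitting, `getOption()` returns NULL there and `isTRUE()` turns that into FALSE.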
A simple suggestion for rolling your own: see if you can access current output format:
```{r, echo = FALSE}
is_inside_knitr = function() {
!is.null(knitr::opts_knit$get("out.format"))
}
```
```{r}
is_inside_knitr()
```
There are, of course, many things you could check--and this is not the intended use of these features, so it may not be the most robust solution.
I suspect (?) you might just need to roll your own.
If so, here's one approach which seems to perform just fine. It works by extracting the names of all of the functions in the call stack, and then checks whether any of them are named "knit2html" or "render". (Depending on how robust you need this to be, you could do some additional checking to make sure that these are really the functions in the knitr and rmarkdown packages, but the general idea would still be the same.)
```{r, echo=FALSE}
isNonInteractive <- function() {
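# unlist() flattens namespaced calls such as rmarkdown::render into "::", "rmarkdown", "render"
ff <- unlist(lapply(sys.calls(), function(f) as.character(f[[1]])))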
any(ff %in% c("knit2html", "render"))
}
```
```{r}
print(isNonInteractive())
```

A Way in Knitr to Copy a Chunk?

Knitr Mavens,
Background: Using knitr to produce a report with many embedded graphs. In the body of the report, all that's appropriate is the graph, not the code.
For example:
```{r graph_XYZ_subset, echo = FALSE, message = TRUE,
fig.cap = "Text that explains the graph"}
graph.subset <- ggplot() + ...
```
This part works just fine.
However, there is a need to display the key parts of the code (e.g., key statistical analyses and key graph generations)...but in an Addendum.
Which leads to this question: is there a way to copy a knitr chunk from the early parts of a script to a later part?
To ensure accuracy, it's ideal that the code in the Addendum list (display) all the code that was actually executed in the report.
For example:
# ADDENDUM - Code Snippets
### Code to Generate Subset Graph
\\SOMEHOW COPY the code from graph_XYZ_subset to here without executing it.
### Code to Compute the Mean of Means of the NN Factors
\\Copy another knitr chunk which computes the mean of means, etc.
### And So On...
\\Copy chunks till done
* * * * * * * *
Any ideas? Is there a way in knitr to perform these types of chunk copies?
There are several options, four of them listed and briefly explained below. Yihui's explanations in How to reuse chunks might also help.
\documentclass{article}
\begin{document}
\section{Output}
<<mychunk, echo = FALSE>>=
print("Hello World!")
@
\section{Source code}
Option 1: Use an empty chunk with the same label.
<<mychunk, eval = FALSE>>=
@
Option 2: Embed in other chunk (no advantage in this case). Note that there is no equality sign and no at for the inner chunk.
<<myOtherChunk, eval = FALSE>>=
<<mychunk>>
@
Option 3: Use \texttt{ref.label}.
<<ref.label = "mychunk", eval = FALSE>>=
@
Option 4: Define the chunk in an external file you read in using \texttt{read\_chunk}. Then use Option 1--3 to execute the chunk (with \texttt{eval = TRUE}; default) or show its code (with \texttt{eval = FALSE}).
\end{document}
I usually prefer Option 4. This allows you to separate the programming logic from writing the document.
At the place where mychunk is to be executed and the graph will appear in the PDF, you only have <<mychunk>>= in your Rnw file and don't have to bother with all the code that generates your graph. Developing your code is also easier, because in an interactive session you have all your code in one spot and don't have to scroll through all the text of the report when going from one chunk to the next one.
EDIT:
The options mentioned above have in common that you need to manually maintain a list of the chunks to show in the appendix. Here are two options to avoid this; unfortunately, both have some drawbacks:
Option 1: Automatically create a list of all chunks that have been executed and show their code.
This can be achieved using a chunk hook that registers all chunk names. Include the following chunk before all other chunks in the document:
<<echo = FALSE>>=
library(knitr)
myChunkList <- c()
listChunks <- function(before, options, envir) {
if (before) {
myChunkList <<- c(myChunkList, options$label)
}
return(NULL)
}
knit_hooks$set(recordLabel = listChunks) # register the new hook
opts_chunk$set(recordLabel = TRUE) # enable the new hook by default
@
Where you want to show the code (for example in the appendix), insert the following chunk:
<<showCode, ref.label = unique(myChunkList), eval = FALSE>>=
@
Unfortunately, there will be no margin or any other visual separation between the chunks.
Option 2: Using the chunk hook is not always necessary because there is the function all_labels() that returns a list of all chunk labels. However, there might be chunks in your file that don't get executed and you probably don't want to see their code. Moreover, option 1 allows skipping certain chunks simply by setting recordLabel = FALSE in their chunk options.

