purl() within knit() duplicate label error - r

I am knitting a .Rmd file and want to have two outputs: the html and a purl'ed R script each time I run knit. This can be done with the following Rmd file:
---
title: "Purl MWE"
output: html_document
---
```{r}
## This chunk automatically generates a text .R version of this script when running within knitr.
input = knitr::current_input() # filename of input document
output = paste(tools::file_path_sans_ext(input), 'R', sep = '.')
knitr::purl(input,output,documentation=1,quiet=T)
```
```{r}
x=1
x
```
If you do not name the chunk, it works fine and you get html and .R output each time you run knit() (or click knit in RStudio).
However, if you name the chunk it fails. For example:
title: "Purl MWE"
output: html_document
---
```{r}
## This chunk automatically generates a text .R version of this script when running within knitr.
input = knitr::current_input() # filename of input document
output = paste(tools::file_path_sans_ext(input), 'R', sep = '.')
knitr::purl(input,output,documentation=1,quiet=T)
```
```{r test}
x=1
x
```
It fails with:
Quitting from lines 7-14 (Purl.Rmd)
Error in parse_block(g[-1], g[1], params.src) : duplicate label 'test'
Calls: <Anonymous> ... process_file -> split_file -> lapply -> FUN -> parse_block
Execution halted
If you comment out the purl() call, it will work with the named chunk. So there is something about how the purl() call is also naming chunks which causes knit() to think there are duplicate chunk names even when there are no duplicates.
Is there a way to include a purl() command inside a .Rmd file so both outputs (html and R) are produced? Or is there a better way to do this? My ultimate goal is to use the new rmarkdown::render_site() to build a website that updates the HTML and R output each time the site is compiled.

You can allow duplicate labels by including options(knitr.duplicate.label = 'allow') within the file as follows:
title: "Purl MWE"
output: html_document
---
```{r GlobalOptions}
options(knitr.duplicate.label = 'allow')
```
```{r}
## This chunk automatically generates a text .R version of this script when running within knitr.
input = knitr::current_input() # filename of input document
output = paste(tools::file_path_sans_ext(input), 'R', sep = '.')
knitr::purl(input,output,documentation=1,quiet=T)
```
```{r test}
x=1
x
```
This code isn't documented on the knitr website, but you can keep track with the latest changes direct from Github: https://github.com/yihui/knitr/blob/master/NEWS.md

A related approach to #ruaridhw solution would be to wrap the knitr::purl() in callr::r(). See function below that saves the R chunks from a specified R markdown file to a temporary .R file:
# RMD to local R temp file
# inspiration: https://gist.github.com/noamross/a549ee50e8a4fd68b8b1
rmd_chunks_to_r_temp <- function(file){
temp <- tempfile(fileext=".R")
# needed callr so can use when knitting -- else can bump into "duplicate chunk
# label" errors when running when knitting
callr::r(function(file, temp){
knitr::purl(file, output = temp)
},
args = list(file, temp))
}
This function also exists in funspotr:::rmd_chunks_to_r_temp() at brshallo/funspotr.

You can avoid this error with a bash chunk that calls purl in a separate R session. That way there's no need to allow duplicate labels.
An example use case is an Rmd file where the code is run (and not echo'd) throughout the report and then all the code chunks are shown with chunks names and code comments in an Appendix. If you don't require that additional functionality then you would only need up until the bash chunk.
The idea is that report_end signifies where to stop purl such that the appendix code isn't considered "report code". Then read_chunk reads the entire R file into one code chunk which can then be echo'd with syntax highlighting if required.
---
title: "Purl MWE"
output: html_document
---
These code chunks are used in the background of the report however
their source is not shown until the Appendix.
```{r test1, echo=FALSE}
x <- 1
x
```
```{r test2, echo=FALSE}
x <- x + 1
x
```
```{r test3, echo=FALSE}
x <- x + 1
x
```
# Appendix
```{r, eval=TRUE}
report_end <- "^# Appendix"
temp <- tempfile(fileext = ".R")
Sys.setenv(PURL_IN = shQuote("this_file.Rmd"), # eg. knitr::current_input()
PURL_OUT = shQuote(temp),
PURL_END = shQuote(report_end))
```
```{bash, include=FALSE}
Rscript -e "lines <- readLines($PURL_IN, warn = FALSE)" \
-e "knitr::purl(text = lines[1:grep($PURL_END, lines)], output = $PURL_OUT, documentation = 1L)"
```
```{r, include=FALSE}
knitr::read_chunk(temp, labels = "appendix")
unlink(temp)
```
```{r appendix, eval=FALSE, echo=TRUE}
```

Related

Run RMarkdown (.rmd) from inside another .rmd without creating HTML output

I am making my code more modular and would like to run multiple RMarkdown files from one overall RMarkdown. I believe I could do this if I translated all my RMarkdown files to .R scripts and used source(), but I like the document-like nature of RMarkdown and I can describe what I'm doing as I'm doing it in plain text.
The goal is to wrangle data and export a usable .sav file. I want to run clean.rmd from run.rmd, but I don't want any HTML/pdf/etc. output. Removing the output line in the YAML header doesn't prevent output. If there is a way to do this without translating everything to .R scripts, I would be very appreciative. Thank you.
clean.rmd: Script that does the cleaning
---
title: "clean"
author: "jrcalabrese"
date: "12/30/2021"
output: html_document
---
```{r}
library(tidyverse)
library(haven)
```
```{r}
data(cars)
cars <- cars %>%
mutate(newvar = speed + dist)
```
```{r}
write_spss(cars, "~/Documents/cars_new.sav", compress = FALSE)
```
run.rmd: Script that runs clean.rmd
---
title: "run"
author: "jrcalabrese"
date: "12/30/2021"
output: html_document
---
```{r}
rmarkdown::render("~/Documents/clean.rmd")
```
Thank you for your help! This function works:
---
title: "run"
author: "jrcalabrese"
date: "12/30/2021"
#output: html_document
---
```{r}
source_rmd = function(file, ...) {
tmp_file = tempfile(fileext=".R")
on.exit(unlink(tmp_file), add = TRUE)
knitr::purl(file, output=tmp_file)
source(file = tmp_file, ...)
}
```
```{r}
source_rmd("~/Documents/clean.rmd")
```

Run selected chunks from one Rmd in another

I've run my analyses in a source Rmd file and would like to knit a clean version from a final Rmd file using only a few of the chunks from the source. I've seen a few answers with regard to pulling all of the chunks from a source Rmd in Source code from Rmd file within another Rmd and How to source R Markdown file like `source('myfile.r')`?. I share the concern with these posts in that I don't want to port out a separate .R file, which seems to be the only way that read_chunk works.
I think I'm at the point where I can import the source Rmd, but now I'm not sure how to call specific chunks from it in the final Rmd. Here's a reproducible example:
SourceCode.Rmd
---
title: "Source Code"
output:
pdf_document:
latex_engine: xelatex
---
```{r}
# Load libraries
library(knitr) # Create tables
library(kableExtra) # Table formatting
# Create a dataframe
df <- data.frame(x = 1:10,
y = 11:20,
z = 21:30)
```
Some explanatory text
```{r table1}
# Potentially big block of stuff I don't want to have to copy/paste
# But I want it in the final document
kable(df, booktabs=TRUE,
caption="Big long title for whatever") %>%
kable_styling(latex_options=c("striped","HOLD_position")) %>%
column_spec(1, width="5cm") %>%
column_spec(2, width="2cm") %>%
column_spec(3, width="3cm")
```
[Some other text, plus a bunch of other chunks I don't need for anyone to see in the clean version.]
```{r}
save(df, file="Source.Rdata")
```
FinalDoc.Rmd
---
title: "Final Doc"
output:
pdf_document:
latex_engine: xelatex
---
```{r setup, include=FALSE}
# Load libraries and data
library(knitr) # Create tables
library(kableExtra) # Table formatting
opts_chunk$set(echo = FALSE)
load("Source.Rdata")
```
As far as I can tell, this is likely the best way to load up SourceCode.Rmd (from the first linked source above):
```{r}
options(knitr.duplicate.label = 'allow')
source_rmd2 <- function(file, local = FALSE, ...){
options(knitr.duplicate.label = 'allow')
tempR <- tempfile(tmpdir = ".", fileext = ".R")
on.exit(unlink(tempR))
knitr::purl(file, output=tempR, quiet = TRUE)
envir <- globalenv()
source(tempR, local = envir, ...)
}
source_rmd2("SourceCode.Rmd")
```
At this point, I'm at a loss as to how to call the specific chunk table1 from SourceCode.Rmd. I've tried the following as per instructions here with no success:
```{r table1}
```
```{r}
<<table1>>
```
The first seems to do nothing, and the second throws an unexpected input in "<<" error.
I wrote a function source_rmd_chunks() that sources chunk(s) by label name. See gist.

Save Rscript in Rmd format

I have R script.
mydat=read.csv("C:/Users/Admin/Downloads/test.csv", sep=";",dec=",")
View(mydat)
str(mydat)
#deleted after FS
mydat$symboling.<-NULL
mydat$make.<-NULL
mydat$num.of.cylinders.<-NULL
mydat$fuel.type.<-NULL
mydat$aspiration.<-NULL
mydat$num.of.cylinders.<-NULL
#this vars have small num. of obs.
mydat$engine.type.<-NULL
mydat$engine.location.<-NULL
mydat$num.of.doors.<-NULL
mydat=na.omit(mydat)
#Feature Selection
FS=Boruta(normalized.losses.~.,data=mydat)
getSelectedAttributes(FS, withTentative = F)
plot(FS, cex.axis=0.5)
#get scatterplot
scatter.smooth(x=mydat$length.,y=mydat$normalized.losses.,main="normalized losse~length")
#split sample on train and sample
index <- sample(1:nrow(mydat),round(0.70*nrow(mydat)))
train <- mydat[index,]
test <- mydat[-index,]
I have to save it in Rmarkdown format (html).
Of course in Rstudio i can do that:
file-new file-rmarkdown-HTML
and i get this script
```{r cars}
summary(cars)
```
I don't want manually write this prefix ```{r}.
Is it possible to make that those parts of the code that are separated by comments
#
#
were saved in rmarkdown format?
In output i expect for example
```{r}
mydat$symboling.<-NULL
mydat$make.<-NULL
mydat$num.of.cylinders.<-NULL
mydat$fuel.type.<-NULL
mydat$aspiration.<-NULL
mydat$num.of.cylinders.<-NULL
```
You can use the spin() function from the knitr package.
It will produce an .md file (but you can keep the intermediate .Rmd with the precious = TRUE argument), using the '#' chracter as the 'documentation argument:
doc
A regular expression to identify the documentation lines; by default it follows the roxygen convention, but it can be customized, e.g. if you want to use ## to denote documentation, you can use '^##\s*'.
For example:
spin('test.R', precious = TRUE, doc = '#')
produces:
test.Rmd
```{r }
mydat=read.csv("C:/Users/Admin/Downloads/test.csv", sep=";",dec=",")
View(mydat)
str(mydat)
```
deleted after FS
```{r }
mydat$symboling.<-NULL
mydat$make.<-NULL
mydat$num.of.cylinders.<-NULL
mydat$fuel.type.<-NULL
mydat$aspiration.<-NULL
mydat$num.of.cylinders.<-NULL
```
this vars have small num. of obs.
...
test.md
```r
mydat=read.csv("C:/Users/Admin/Downloads/test.csv", sep=";",dec=",")
```
```
## Warning in file(file, "rt"): cannot open file 'C:/Users/Admin/Downloads/
## test.csv': No such file or directory
```
```
## Error in file(file, "rt"): cannot open the connection
```
```r
View(mydat)
```
...
You may also have a look at the stitch() function and sibilings (stitch_rhtml and stitch_rmd), have a look here

Chopped up output using message() in an R Markdown / knitr report

I recently submitted a package to rOpenSci, and they prefer the use of message() rather than cat() for user-side console output. When I made the switch for my package, I noticed a disconcerting change in the formatting of the rendered vignettes. I have reproduced the problem in the following R Markdown report.
---
title: "MWE"
author: "Will Landau"
date: "11/20/2017"
output: html_document
---
```{r testcat}
for(x in LETTERS[1:3]){
cat(x, "\n")
}
```
```{r testmessage}
for(x in LETTERS[1:3]){
message(x)
}
```
```{r testmessage2}
for(x in LETTERS[1:3]){
message(x, "\n", appendLF = FALSE)
}
```
For the first code chunk, I get the desired output: all three lines string together in a single gray box.
## A
## B
## C
But for the second and third chunks, each line is given its own separate gray box.
## A
.
## B
.
## C
How do I keep using message() without chopping up the knitr output like this?
I think I solved it: knitr has a collapse chunk option. All I needed was to put this chunk before any of the other chunks.
```{r setup}
knitr::opts_chunk$set(collapse = TRUE)
```
The output is more condensed than I expected, but after some touching up, the formatting actually looks much better now.
You can try and build the string and put the message function outside the loop, like this:
```{r testmessage}
single_message <- c()
for(x in LETTERS[1:3]){
single_message <- paste(single_message , x, sep = "\n")
}
message(single_message )
```
Note that this example does add a newline at the start, you can prevent this with an extra if, or use the first element outside the loop to initialize single_message.

Rmarkdown Chunk Name from Variable

How can I use a variable as the chunk name? I have a child document which gets called a number of times, and I need to advance the chunk labels in such a manner than I can also cross reference them.
Something like this:
child.Rmd
```{r }
if(!exists('existing')) existing <- 0
existing = existing + 1
myChunk <- sprintf("myChunk-%s",existing)
```
## Analysis Routine `r existing`
```{r myChunk,echo = FALSE}
#DO SOMETHING, LIKE PLOT
```
master.Rmd
# Analysis Routines
Analysis for this can be seen in figures \ref{myChunk-1}, \ref{myChunk-2} and \ref{myChunk-3}
```{r child = 'child.Rmd'}
```
```{r child = 'child.Rmd'}
```
```{r child = 'child.Rmd'}
```
EDIT POTENTIAL SOLUTION
Here is one potential workaround, inspired by SQL injection of all things...
child.Rmd
```{r }
if(!exists('existing')) existing <- 0
existing = existing + 1
myChunk <- sprintf("myChunk-%s",existing)
```
## Analysis Routine `r existing`
```{r myChunk,echo = FALSE,fig.cap=sprintf("The Caption}\\label{%s",myChunk)}
#DO SOMETHING, LIKE PLOT
```
A suggestion to preknit the Rmd file into another Rmd file before knitting&rendering as follows
master.Rmd:
# Analysis Routines
Analysis for this can be seen in figures `r paste(paste0("\\ref{", CHUNK_NAME, 1:NUM_CHUNKS, "}"), collapse=", ")`
###
rmdTxt <- unlist(lapply(1:NUM_CHUNKS, function(n) {
c(paste0("## Analysis Routine ", n),
paste0("```{r ",CHUNK_NAME, n, ", child = 'child.Rmd'}"),
"```")
}))
writeLines(rmdTxt)
###
child.Rmd:
```{r,echo = FALSE}
plot(rnorm(100))
```
To knit & render the Rmd:
devtools::install_github("chinsoon12/PreKnitPostHTMLRender")
library(PreKnitPostHTMLRender) #requires version >= 0.1.1
NUM_CHUNKS <- 5
CHUNK_NAME <- "myChunk-"
preknit_knit_render_postrender("master.Rmd", "test__test.html")
Hope it helps. Cheers!
If you're getting to this level of complexity, I suggest you look at the brew package.
That provides a templating engine where you can dynamically create the Rmd for knitting.
You get to reference R variables in the outer brew environment, and build you dynamic Rmd from there.
Dynamic chunk names are possible with knitr::knit_expand(). Arguments are referenced in the child document, including in the chunk headers, using {{arg_name}}.
So my parent doc contains:
```{r child_include, results = "asis"}
###
# Generate a section for each dataset
###
species <- c("a", "b")
out <- lapply(species, function(sp) knitr::knit_expand("child.Rmd"))
res = knitr::knit_child(text = unlist(out), quiet = TRUE)
cat(res, sep = "\n")
```
And my child doc, which has no YAML header, contains:
# EDA for species {{sp}}
```{r getname-{{sp}}}
paste("The species is", "{{sp}}")
```
See here in the RMarkdown cookbook.

Resources