Dynamically building RMarkdown - r

I have a variable list of *.Rmd files and want to render them dynamically into a single HTML file with RMarkdown.
Like this:
reportFiles <- list()
reportFiles[[1]] <- "F:\\report1.Rmd"
reportFiles[[2]] <- "F:\\report2.Rmd"
outputPath <- "F:\\report.html"
rmarkdown::render(input = reportFiles, output_file = outputPath)
But that doesn't work, and I couldn't find a solution for doing something like this. Every approach I found either creates multiple files, requires knowing in advance which files you want to render, or requires creating a temporary *.Rmd file.

One can combine multiple Rmd files into a single output document by modifying the code posted with the question.
First, the documents must be combined into a single Rmd before processing with rmarkdown::render(). Second, the combined files must satisfy the following constraints.
Only the first Rmd file can contain document header information (the YAML block between the --- lines).
Chunk labels (such as cars or pressure) must be unique across all Rmd files combined into a single Rmd for rendering, because knitr stops on duplicate labels by default.
The general approach is to read the files into a character vector, write the vector to a temporary Rmd file, then render the combined document.
library(rmarkdown)
# list of files to be combined
reports <- c("report1.Rmd","report2.Rmd")
# read the files & combine into a single character vector
theReports <- unlist(lapply(reports,readLines))
# use writeLines() to combine into single Rmd
writeLines(theReports,"tmpReport.Rmd")
# render the combined document
render(input = "tmpReport.Rmd")
When rendered to an HTML document, the output looks like this:
Additional Considerations
We used a character vector instead of a list() to store the file names because the additional complexity of a list() was not needed to drive lapply() in this situation.
Use of a character vector allows the solution to be modified to potentially retrieve a list of files from a subdirectory with list.files(), as in:
reports <- list.files(path="./myReportDir/",
pattern="^report[[:digit:]]+\\.Rmd$",full.names=TRUE)
Also, one can segregate the header information into a file that contains only header info, such as report_header.Rmd.
Next, in order to automate retrieval of the files from a directory, one must ensure that the sort order of the file names matches the intended order of inclusion in the output document. H/T to Petr Kajzar for the regular expression to extract only files with a numbered report name from list.files().
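On the sort order: list.files() returns names alphabetically, so report10.Rmd comes before report2.Rmd. A minimal sketch of one way around this, assuming the files follow the reportN.Rmd naming used above (the helper name sort_reports_numerically is made up for illustration):

```r
# Hypothetical helper: order report files by the number embedded in the name.
sort_reports_numerically <- function(paths) {
  # Pull the digits out of each base name, e.g. "report10.Rmd" -> 10
  nums <- as.integer(sub("^report([[:digit:]]+)\\.Rmd$", "\\1", basename(paths)))
  paths[order(nums)]
}

# Example input in the alphabetical order list.files() would return
reports <- c("myReportDir/report1.Rmd",
             "myReportDir/report10.Rmd",
             "myReportDir/report2.Rmd")
sort_reports_numerically(reports)
```

Zero-padding the file names (report01.Rmd, report02.Rmd, ...) avoids the problem without any extra code.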
Finally, also as suggested by Petr Kajzar in the comments, one can use a truly temporary file to drive rmarkdown::render() as follows.
tmpfile <- tempfile(fileext=".Rmd")
writeLines(theReports,tmpfile)
render(input = tmpfile)
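If the later files do carry their own YAML header, it could be stripped while combining. The following is only a sketch under that assumption; strip_yaml_header() and combine_rmd() are hypothetical helper names, and the sketch does nothing about duplicate chunk labels, which still have to be unique.

```r
# Hypothetical helper: drop a leading YAML header (the block between the
# first two "---" lines) from the lines of an Rmd file, if one is present.
strip_yaml_header <- function(lines) {
  if (length(lines) > 0 && grepl("^---\\s*$", lines[1])) {
    end <- which(grepl("^---\\s*$", lines[-1]))
    if (length(end) > 0) {
      # end[1] indexes lines[-1], so the closing "---" is line end[1] + 1
      return(lines[-seq_len(end[1] + 1)])
    }
  }
  lines
}

# Hypothetical helper: read all files and keep only the first file's header.
combine_rmd <- function(files) {
  parts <- lapply(seq_along(files), function(i) {
    lines <- readLines(files[i], warn = FALSE)
    if (i > 1) lines <- strip_yaml_header(lines)
    c(lines, "")  # blank line so adjacent files do not run together
  })
  unlist(parts)
}

# Usage, assuming report1.Rmd and report2.Rmd exist and rmarkdown is installed:
# tmpfile <- tempfile(fileext = ".Rmd")
# writeLines(combine_rmd(c("report1.Rmd", "report2.Rmd")), tmpfile)
# rmarkdown::render(input = tmpfile, output_file = "report.html",
#                   output_dir = getwd())
```

Passing output_dir matters with a temporary input file, since the output would otherwise be written next to the input in the temporary directory.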
Appendix
To make the example completely reproducible, we include the text of report1.Rmd and report2.Rmd. These files must be copied & saved to a local computer in order for the script above to read, write, and render them.
report1.Rmd
---
title: "report 1"
author: "lg"
date: "7/6/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
report2.Rmd
Notice that the content in the second report conforms to the two constraints listed above.
## Report number 2
This is some text for the second markdown document. Considerations to make concatenation of multiple Rmd files into a single output document work:
1. Files 2 thru N must not have Rmd header information
2. Files that are combined into a single Rmd must not have duplicate section labels
```{r cars2}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure2, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Related

r-exams Questions about the same data on 2 separate xxx.Rmd files

Using R exams, I am developing a pdf exam with several questions (hence several Rmd files) but the questions are connected and would use a dataset created in the first question file. Questions would not be amenable to a cloze format.
Is there a way to write the exercises so that the second exercise can access the data generated by the first exercise ?
The easiest solution is to use a shared environment across the different exercises, in the simplest case the .GlobalEnv. Then you can simply do
exams2pdf(c("ex1.Rmd", "ex2.Rmd"), envir = .GlobalEnv)
and then both exercises will create their variables in the global environment and can re-use existing variables from there. Instead of the .GlobalEnv you can also create myenv <- new.env() and use envir = myenv.
For Rnw (as opposed to Rmd) exercises, it is not necessary to set this option because Sweave() Rnw exercises are always processed in the current environment anyway.
Note that these approaches only work for those exams2xyz() interfaces where the n-th random draw from each exercise can be assured to end up together in the n-th exam. This is the case for PDF output but not for many of the learning management system outputs (Moodle, Canvas, etc.). See: Sharing a random CSV data set across exercises with exams2moodle()
Is it an option to save the data you need to disk in one Rmd file
```{r, echo=FALSE}
saveRDS(df, "my_stored_data.rds")
```
and then load it in the other one
```{r, echo=FALSE}
df <- readRDS("my_stored_data.rds")
```
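For reference, here is the save/load round trip as a self-contained sketch. The temporary directory and the sample data are assumptions made so that the snippet runs anywhere; in the two Rmd files you would use a path both can reach.

```r
# Stand-in for the data created by the first file
df <- head(mtcars)

# First Rmd file: write the object to disk
rds_path <- file.path(tempdir(), "my_stored_data.rds")
saveRDS(df, rds_path)

# Second Rmd file: readRDS() takes the file path and returns the object,
# so the result has to be assigned to a name
df_restored <- readRDS(rds_path)
identical(df, df_restored)
```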
Another option is to render the Rmd files from an R script. If you do that, the Rmd files use the environment of the R script (!) instead of creating their own. Hence they can share objects: one Rmd file can create the data while the other uses it as input.
In this thread: Create sections through a loop with knitr
there is a post from me about doing this. It's basically this:
The first Rmd file:
---
title: "Script 1"
output: html_document
---
```{r setup, include=FALSE}
a_data_frame_created_in_script_1 <- mtcars
```
saved as rmd_test.Rmd
The second one:
---
title: "Script 1"
output: html_document
---
```{r setup}
a_data_frame_created_in_script_1
```
saved as rmd_test_2.Rmd.
And then you have an R-script that does this:
rmarkdown::render("rmd_test.Rmd", output_file = "rmd_test.html")
rmarkdown::render("rmd_test_2.Rmd", output_file = "rmd_test_2.html")
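The sharing works because render() evaluates the document in the calling environment by default (its envir argument defaults to parent.frame()). A sketch that makes this explicit with a dedicated environment; the name shared_env is illustrative, and the render only runs if the two files from above and the rmarkdown package are available:

```r
# Explicit environment shared by both documents (name is illustrative)
shared_env <- new.env()

rmd_files <- c("rmd_test.Rmd", "rmd_test_2.Rmd")

# Only attempt the render when the files and the package are available
if (requireNamespace("rmarkdown", quietly = TRUE) && all(file.exists(rmd_files))) {
  for (f in rmd_files) {
    # Objects created by one document stay in shared_env for the next one
    rmarkdown::render(f, envir = shared_env)
  }
  print(exists("a_data_frame_created_in_script_1", envir = shared_env))
}
```

Note that this does not apply when each file is knitted with the Knit button in RStudio, since that starts a fresh R session per document.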

Modularized R markdown structure

There are a few questions about this already, but they are either unclear or provide solutions that don't work, perhaps because they are outdated:
Proper R Markdown Code Organization
How to source R Markdown file like `source('myfile.r')`?
http://yihui.name/knitr/demo/externalization/
Modularized code structure for large projects
R Markdown/Notebook is nice, but the way it's presented, there is typically a single file that has all the text and all the code chunks. I often have projects where such a single file structure is not a good setup. Instead, I use a single .R master file that loads the other .R files in order. I'd like to replicate this structure using R Notebook, i.e. have a single .Rmd file that pulls in the code from multiple .R files.
The nice thing about working with a project this way is that it allows for the nice normal workflow with RStudio using the .R files but also the neat output from R Notebook/Markdown without duplicating the code.
Minimal example
This is simplified to make the example as small as possible. Two .R files and one master .Rmd file.
start.R
# libs --------------------------------------------------------------------
library(pacman)
p_load(dplyr, ggplot2)
#normally load a lot of packages here
# data --------------------------------------------------------------------
d = iris
#use iris for example, but normally would load data from file
# data manipulation tasks -------------------------------------------------
#some code here to extract useful info from the data
setosa = dplyr::filter(d, Species == "setosa")
plot.R
#setosa only
ggplot(setosa, aes(Sepal.Length)) +
geom_density()
#all together
ggplot(d, aes(Sepal.Length, color = Species)) +
geom_density()
And then the notebook file:
notebook.Rmd:
---
title: "R Notebook"
output:
  html_document: default
  html_notebook: default
---
First we load some packages and data and do slight transformation:
```{r start}
#a command here to load the code from start.R and display it
```
```{r plot}
#a command here to load the code from plot.R and display it
```
Desired output
The desired output is that which one gets from manually copying over the code from start.R and plot.R into the code chunks in notebook.Rmd. This looks like this (some missing due to lack of screen space):
Things I've tried
source
This loads the code, but does not display it. It just displays the source command:
knitr::read_chunk
This command was mentioned here, but actually it does the same as source as far as I can tell: it loads the code but displays nothing.
How do I get the desired output?
The solution is to use knitr's chunk option code. According to knitr docs:
code: (NULL; character) if provided, it will override the code in the
current chunk; this allows us to programmatically insert code into the
current chunk; e.g. a chunk option code =
capture.output(dump('fivenum', '')) will use the source code of the
function fivenum to replace the current chunk
No example is provided, however. It sounds like one has to feed it a character vector, so let's try readLines:
```{r start, code=readLines("start.R")}
```
```{r plot, code=readLines("plot.R")}
```
This produces the desired output and thus allows for a modularized project structure.
Feeding it a file directly does not work (i.e. code="start.R"), but would be a nice enhancement.
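Since the code option wants a character vector, a small wrapper can also make a missing file fail with a readable message. This is just a sketch; read_code() is a hypothetical helper name, and the temporary script stands in for start.R.

```r
# Hypothetical helper: return the lines of an R script as a character vector,
# which is the form the `code` chunk option expects.
read_code <- function(path) {
  if (!file.exists(path)) {
    stop("script not found: ", path)
  }
  readLines(path, warn = FALSE)
}

# Demonstration with a temporary stand-in for start.R
script <- tempfile(fileext = ".R")
writeLines(c("d <- iris", "summary(d$Sepal.Length)"), script)
read_code(script)
```

With read_code() defined in an earlier chunk, the chunk header would read {r start, code=read_code("start.R")}.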
For interoperability with R Notebooks, you can use knitr's read_chunk method as described above. In a notebook, you must call read_chunk in the setup chunk; since you can run notebook chunks in any order, this ensures that the external code will always be available.
Here's a minimal example of using read_chunk to bring code from an external R script into a notebook:
example.Rmd
```{r setup}
knitr::read_chunk("example.R")
```
```{r chunk}
```
example.R
## ---- chunk
1 + 1
When you execute the empty chunk in the notebook, code from the external file will be inserted, and the results displayed inline, as though the chunk contained that code.
As per my comment above, I use the here library to work with projects in folders:
```{r setup, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
library(here)
insert <- function(filename){
readLines(here::here("massive_report_folder", filename))
}
```
and then each chunk looks like
```{r extra_file, echo=FALSE, message=FALSE, warning=FALSE, results='asis', code=insert("extra_file.R")}
```

Proper R Markdown Code Organization

I have been reading about R Markdown (here, here, and here) and using it to create solid reports. I would like to try to use what little code I am running to do some ad hoc analyses and turn them into more scalable data reports.
My question is rather broad: Is there a proper way to organize your code around an R Markdown project? Say, have one script that generates all of the data structures?
For example: Let's say that I have the cars data set and I have brought in commercial data on the manufacturer. What if I wanted to attach the manufacturer to the current cars data set, and then produce a separate summary table for each company using a manipulated data set cars.by.name as well as plot a certain sample using cars.import?
EDIT: Right now I have two files open. One is an R Script file that has all of the data manipulation: subsetting and re-categorizing values. And the other is the R Markdown file where I am building out text to accompany the various tables and plots of interest. When I call an object from the R Script file--like:
```{r}
table(cars.by.name$make)
```
I get an error saying Error in summary(cars.by.name$make) : object 'cars.by.name' not found
EDIT 2: I found this older thread to be helpful. Link
---
title: "Untitled"
author: "Jeb"
date: "August 4, 2015"
output: html_document
---
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r}
table(cars.by.name$make)
```
```{r}
summary(cars)
summary(cars.by.name)
```
```{r}
table(cars.by.name)
```
You can also embed plots, for example:
```{r, echo=FALSE}
plot(cars)
plot(cars.import)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
There is a solution for this sort of problem, explained here.
Basically, if you have an .R file containing your code, there is no need to repeat the code in the .Rmd file, but you can include the code from .R file. For this to work, the chunks of code should be named in the .R file, and then can be included by name in the .Rmd file.
test.R:
## ---- chunk-1 ----
table(cars.by.name$make)
test.Rmd
Just once on top of the .Rmd file:
```{r echo=FALSE, cache= F}
knitr::read_chunk('test.R')
```
For every chunk you're including (replace chunk-1 with the label of that specific chunk in your .R file):
```{r chunk-1}
```
Note that it should be left empty (as is) and in run-time your code from .R will be brought over here and run.
Often times, I have many reports that need to run the same code with slightly different parameters. Calling all my "stats" functions separately, generating the results and then just referencing is what I typically do. The way to do this is as follows:
---
title: "Untitled"
author: "Author"
date: "August 4, 2015"
output: html_document
---
```{r, echo=FALSE, message=FALSE}
directoryPath <- "rawPath" ##Something like /Users/userid/RDataFile
fullPath <- file.path(directoryPath,"myROutputFile.RData")
load(fullPath)
```
Some Text, headers whatever
```{r}
summary(myStructure$value1) #Where myStructure was saved to the .RData file
```
You can save an RData file by using the save.image() command.
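save.image() writes the whole workspace. If only a few objects are needed in the report, save() with explicit object names is narrower. A minimal sketch, using a temporary directory and a made-up myStructure so that it runs on its own:

```r
# Stand-in for the results produced by the separate "stats" script
myStructure <- list(value1 = c(1, 2, 3))

# Save only the objects the report needs
rdata_path <- file.path(tempdir(), "myROutputFile.RData")
save(myStructure, file = rdata_path)

# In the report: load into a dedicated environment to see what came in
report_env <- new.env()
loaded <- load(rdata_path, envir = report_env)
loaded                                   # names of the restored objects
summary(report_env$myStructure$value1)
```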
Hope that helps!

Is there a R markdown analog of \SweaveInput{} for modular report generation?

One of the features I like very much in Sweave is the option to have \SweaveInput{} of separate Sweave files to have a more "modular" report and just be able to comment out parts of the report that I do not want to be generated with a single #\SweaveInput{part_x} rather than having to comment in or out entire blocks of code.
Recently I decided to move to R Markdown for multiple reasons being mainly practicality, the option of interactive (Shiny) integration in the report and the fact that I do not really need the extensive formatting options of LaTeX.
I found that technically pandoc is able to combine multiple Rmd files into one html output by just concatenating them but it would be nice if this behaviour could be called from a "master" Rmd file.
Any answer would be greatly appreciated even if it is just "go back to Sweave, it is not possible in Markdown".
I am using R 3.1.1 for Windows and Linux as well as Rstudio 0.98.1056 and Rstudio server 0.98.983.
Use something like this in the main document:
```{r child="CapsuleRInit.Rmd"}
```
```{r child="CapsuleTitle.Rmd", eval=TRUE}
```
```{r child="CapsuleBaseline.Rmd", eval=TRUE}
```
Use eval=FALSE to skip one child.
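Since the child option accepts a character vector of file names, the switches can also be kept in one place. A sketch of that idea, reusing the file names from above; the variable names parts and children are illustrative:

```r
# TRUE/FALSE switches per child document (file names taken from the answer)
parts <- c(
  "CapsuleRInit.Rmd"    = TRUE,
  "CapsuleTitle.Rmd"    = TRUE,
  "CapsuleBaseline.Rmd" = FALSE   # flip to TRUE to include this part again
)

# Keep only the names of the parts that are switched on
children <- names(parts)[parts]
children
```

In the main document this would sit in an early chunk, followed by a single empty chunk with the header {r child = children}.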
For RStudio users: you can define a main document for latex output, but this does not work for RMD documents, so you always have to switch to the main document for processing. Please support my feature request to RStudio; I tried already twice, but it seems to me that too few people use child docs to put it higher in the priority list.
I don't quite understand some of the terms in the answer above, but the solution relates to defining a custom knit: hook in the YAML header. For multipartite documents this allows you to, for example:
Have a 'main' or 'root' Rmarkdown file with an output: markdown_document YAML header
render all child documents from Rmd ⇒ md ahead of calling render, or not if this is time-limiting
combine multiple files (with the child code chunk option) into one (e.g. for chapters in a report)
write output: html_document (or other format) YAML headers for this compilation output on the fly, prepending to the markdown effectively writing a fresh Rmarkdown file
...then render this Rmarkdown to get the output, deleting intermediate files in the process if desired
The code for all of the above (dumped here) is described here, a post I wrote after working out the usage of custom knit: YAML header hooks recently (here).
The custom knit: function (i.e. the replacement to rmarkdown::render) in the above example is:
(function(inputFile, encoding) {
input.dir <- normalizePath(dirname(inputFile))
rmarkdown::render(input = inputFile, encoding = encoding, quiet=TRUE,
output_file = paste0(input.dir,'/Workbook-tmp.Rmd'))
sink("Workbook-compiled.Rmd")
cat(readLines(headerConn <- file("Workbook-header.yaml")), sep="\n")
close(headerConn)
cat(readLines(rmdConn <- file("Workbook-tmp.Rmd")), sep="\n")
close(rmdConn)
sink()
rmarkdown::render(input = paste0(input.dir,'/Workbook-compiled.Rmd'),
encoding = encoding, output_file = paste0(input.dir,'/../Workbook.html'))
unlink(paste0(input.dir,'/Workbook-tmp.Rmd'))
})
...but all squeezed onto 1 line!
The rest of the 'master'/'root'/'control' file or whatever you want to call it takes care of writing the aforementioned YAML for the final HTML output that goes via an intermediate Rmarkdown file, and its second code chunk programmatically appends child documents through a call to list.files()
```{r include=FALSE}
header.input.file <- "Workbook-header.yaml"
header.input.dir <- normalizePath(dirname(header.input.file))
fileConn <- file(header.input.file)
writeLines(c(
  "---",
  paste0('title: "', rmarkdown::metadata$title,'"'),
  "output:",
  "  html_document:",
  "    toc: true",
  "    toc_depth: 3 # defaults to 3 anyway, but just for ease of modification",
  "    number_sections: TRUE",
  paste0("    css: ",paste0(header.input.dir,'/../resources/workbook_style.css')),
  '    pandoc_args: ["--number-offset=1", "--atx-headers"]',
  "---"),
  fileConn, sep="\n")
close(fileConn)
```
```{r child = list.files(pattern = 'Notes-.*?\\.md')}
# Use file names with strict numeric ordering, e.g. Notes-001-Feb1.md
```
The directory structure would contain a top-level folder with
A final output Workbook.html
A resources subfolder containing workbook_style.css
A documents subfolder containing said main file "Workbook.Rmd" alongside files named as "Notes-001.Rmd", "Notes-002.Rmd" etc. (to ensure a fileglobbing on list.files(pattern = "Notes-.*?\\.Rmd") finds and thus makes them children in the correct order when rendering the main Workbook.Rmd file)
To get proper numbering of files, each constituent "Notes-XXX.Rmd" file should contain the following style YAML header:
---
title: "March: Doing x, y, and z"
knit: (function(inputFile, encoding) { input.dir <- normalizePath(dirname(inputFile)); rmarkdown::render(input = inputFile, encoding = encoding, quiet=TRUE)})
output:
  md_document:
    variant: markdown_github
    pandoc_args: "--atx-headers"
---
```{r echo=FALSE, results='asis', comment=''}
cat("##", rmarkdown::metadata$title)
```
The code chunk at the top of the Rmarkdown document enters the YAML title as a second-level header when evaluated. results='asis' indicates to return plain text-string rather than
[1] "A text string"
You would knit each of these before knitting the main file - it's easier to remove the requirement to render all child documents and just append their pre-produced markdown output.
I've described all of this at the links above, but I thought it'd be bad manners not to leave the actual code with my answer.
I don't know how effective that RStudio feature request website may be... Personally I've not found it hard to look into the source code for these functions, which thankfully are open source, and if there really is something absent rather than undocumented an inner-workings-informed feature request is likely far more actionable by one of their software devs.
I'm not familiar with Sweave; was the above what you were aiming at? If I understand correctly you just want to control the inclusion of documents in a modular fashion. The child = list.files() statement could take care of that: if not through file globbing you can straight-up list files as child = c("file1.md","file2.md")... and switch that statement to change the children. You can also control TRUE/FALSE switches with YAML, whereby the presence of a custom header would set some children to be included, for example
potentially.absent.variable: TRUE
...above the document with a silent include=FALSE hiding the machinations of the first chunk:
```{r include=FALSE}
isTRUE(as.logical(rmarkdown::metadata$potentially.absent.variable))
# ==> FALSE if potentially.absent.variable is absent
# ==> FALSE if anything other than TRUE
# ==> TRUE if TRUE
checkFor <- function(var) {
  isTRUE(as.logical(rmarkdown::metadata[[var]]))
}
```
```{r child = "Optional_file.md", eval = checkFor("potentially.absent.variable")}
```

knitr: Knitting separate Rnw documents within an Rmd document

I have a master R markdown document (Rmd) within which I would like to knit several separate Rnw documents (NO child documents) in one of the chunks. However, when I call knit on the Rnw document, the contained R code chunks do not seem to be processed, resulting in an error when trying to run texi2pdf on them.
Illustration of the situation:
Inside master.Rmd:
```{r my_chunk, echo=FALSE, message=FALSE, results='asis'}
... some code ...
knit("sub.Rnw", output = ..., quiet = TRUE)
tools::texi2pdf(tex_file)
... some code ...
```
Is there some additional configuration required to make this scenario work?
There are a few reasons you can't directly do what you are trying to do (calling knit from within a knit environment)...
Knitr patterns are already set.
[ In this case markdown patterns, so you'd need to set the patterns to 'rnw' patterns. ]
Parsing the chunks (after setting the correct patterns) will add chunk labels to the existing concordance, so unless all chunks are unique you will get a duplicate chunk label error.
[ This is why knit_child exists. ]
The output target and other options are already set, so you either need a completely new knitr environment or to save, modify, restore all pertinent options.
That being said, it seems like completely expected behavior.
Something along the lines of
library(knitr)
files <- list.files( pattern = "\\.Rnw$", path = ".")
files
## [1] "test_extB.Rnw" "test_ext.Rnw"
for( f in files ) {
system( paste0("R -e \"knitr::knit2pdf('", f, "')\"") )
}
list.files( pattern="\\.pdf$", path=".")
## [1] "test_extB.pdf" "test_ext.pdf"
or calling Rscript in a loop should do the trick (based on the info provided), which is essentially what @kohske was expressing in the comments.
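A sketch of the Rscript variant, assuming a Unix-like system, the knitr package installed, and file names without quote characters. knit_args() is a hypothetical helper that only builds the command-line arguments:

```r
# Hypothetical helper: build the arguments for one Rscript call
knit_args <- function(f) {
  c("-e", shQuote(sprintf('knitr::knit2pdf("%s")', f)))
}

files <- list.files(pattern = "\\.Rnw$", path = ".")
rscript <- file.path(R.home("bin"), "Rscript")

for (f in files) {
  # Each file is knitted in its own R process, so nothing is shared
  # with the knitr state of the calling document
  status <- system2(rscript, knit_args(f))
  if (status != 0) warning("knitting failed for ", f)
}
```

Running each file in a separate process sidesteps the pattern, label and output-target conflicts listed above, at the cost of starting R once per file.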

Resources