How to source an R Markdown file like `source('myfile.r')`?

I often have a main R Markdown file or knitr LaTeX file where I source some other R file (e.g., for data processing). However, I was thinking that in some instances it would be beneficial to have these sourced files be their own reproducible documents (e.g., an R Markdown file that not only includes commands for data processing but also produces a reproducible document that explains the data processing decisions).
Thus, I would like to have a command like source('myfile.rmd') in my main R Markdown file that would extract and source all the R code inside the R code chunks of myfile.rmd. Of course, this gives an error, because source() tries to parse the whole file, Markdown text included, as R code.
The following command works:
```{r message=FALSE, results='hide'}
knit('myfile.rmd', tangle=TRUE)
source('myfile.R')
```
where results='hide' can be omitted if the output is wanted. With tangle=TRUE, knitr writes the R code from myfile.rmd into myfile.R, which source() then runs.
However, it doesn't seem perfect:
- it results in the creation of an extra file;
- it needs to appear in its own code chunk if control over the display is required;
- it's not as elegant as a simple source(...).
Thus my question:
Is there a more elegant way of sourcing the R code of an R Markdown file?

It seems you are looking for a one-liner. How about putting this in your .Rprofile?
ksource <- function(x, ...) {
library(knitr)
source(purl(x, output = tempfile()), ...)
}
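A slightly more defensive variant of `ksource()` is sketched below. The name `ksource2` and its `envir` argument are my own, not part of knitr: it deletes the temporary file when done and lets you choose the environment the extracted code is evaluated in. It relies only on `knitr::purl()` and base `sys.source()`.

```r
ksource2 <- function(x, envir = globalenv(), ...) {
  # Tangle the R code into a temporary file that is removed on exit
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)
  knitr::purl(x, output = tmp, quiet = TRUE)
  # Evaluate the extracted code in the requested environment;
  # extra arguments (e.g. chdir) are passed on to sys.source()
  sys.source(tmp, envir = envir, ...)
  invisible(envir)
}
```

Usage: `ksource2("myfile.rmd")` runs the chunks in the global environment, while `ksource2("myfile.rmd", envir = new.env())` keeps the resulting objects out of your workspace.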
However, I do not understand why you want to source() the code in the Rmd file itself. I mean knit() will run all the code in this document, and if you extract the code and run it in a chunk, all the code will be run twice when you knit() this document (you run yourself inside yourself). The two tasks should be separate.
If you really want to run all the code, RStudio has made this fairly easy: Ctrl + Shift + R. It basically calls purl() and source() behind the scenes.

Factor the common code out into a separate R file, and then source that R file into each Rmd file you want it in.
So, for example, let's say I have two reports to make: Flu Outbreaks and Guns vs Butter Analysis. Naturally I'd create two Rmd documents and be done with it.
Now suppose the boss comes along and wants to see the variations of Flu Outbreaks versus Butter prices (controlling for 9mm ammo).
Copying and pasting the analysis code from the existing reports into the new report is a bad idea for code reuse, etc.
I also want it to look nice.
My solution was to factor the project into these files:
- Flu.Rmd
- flu_data_import.R
- Guns_N_Butter.Rmd
- guns_data_import.R
- butter_data_import.R

Within each Rmd file I'd have something like:
```{r include=FALSE}
source('flu_data_import.R')
```
The problem here is that we lose reproducibility: the code in the sourced .R files no longer appears in the rendered report. My solution is to create a common child document to include in each Rmd file. So at the end of every Rmd file I create, I add this:
```{r autodoc, child='autodoc.Rmd', eval=TRUE}
```
And, of course, autodoc.Rmd:
Source Data & Code
----------------------------
<div id="accordion-start"></div>
```{r sourcedata, echo=FALSE, results='asis', warning=FALSE}
library(xtable)  # provides xtable(), used below to print data frames as HTML
if(!exists("autodoc.skip.df")) {
autodoc.skip.df <- list()
}
#Generate the following table:
for (i in ls(.GlobalEnv)) {
if(!i %in% autodoc.skip.df) {
itm <- tryCatch(get(i), error=function(e) NA )
if(typeof(itm)=="list") {
if(is.data.frame(itm)) {
cat(sprintf("### %s\n", i))
print(xtable(itm), type="html", include.rownames=FALSE, html.table.attributes=sprintf("class='exportable' id='%s'", i))
}
}
}
}
```
### Source Code
```{r allsource, echo=FALSE, results='asis', warning=FALSE, cache=FALSE}
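library(plyr)  # llply() and compact()
# c(directory, file name) for each workspace object that carries a srcref; others are dropped
obj_srcs <- compact(llply(ls(all.names = TRUE), function(x) {
  a <- get(x)
  s <- c(normalizePath(getSrcDirectory(a)), getSrcFilename(a))
  if (length(s) > 0) s
}))
# `sourced` is not defined in this answer: it must be a list named by the paths of source()'d files
sourced_srcs <- llply(names(sourced), function(x) c(normalizePath(dirname(x)), basename(x)))
fns <- unique(c(obj_srcs, sourced_srcs))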
for (itm in fns) {
cat(sprintf("#### %s\n", itm[2]))
cat("\n```{r eval=FALSE}\n")
cat(paste(tryCatch(readLines(file.path(itm[1], itm[2])), error=function(e) sprintf("Could not read source file named %s", file.path(itm[1], itm[2]))), sep="\n", collapse="\n"))
cat("\n```\n")
}
```
<div id="accordion-stop"></div>
<script type="text/javascript">
```{r jqueryinclude, echo=FALSE, results='asis', warning=FALSE}
cat(readLines(url("http://code.jquery.com/jquery-1.9.1.min.js")), sep="\n")
```
</script>
<script type="text/javascript">
```{r tablesorterinclude, echo=FALSE, results='asis', warning=FALSE}
cat(readLines(url("http://tablesorter.com/__jquery.tablesorter.js")), sep="\n")
```
</script>
<script type="text/javascript">
```{r jqueryuiinclude, echo=FALSE, results='asis', warning=FALSE}
cat(readLines(url("http://code.jquery.com/ui/1.10.2/jquery-ui.min.js")), sep="\n")
```
</script>
<script type="text/javascript">
```{r table2csvinclude, echo=FALSE, results='asis', warning=FALSE}
cat(readLines(file.path(jspath, "table2csv.js")), sep="\n")
```
</script>
<script type="text/javascript">
$(document).ready(function() {
$('tr').has('th').wrap('<thead></thead>');
$('table').each(function() { $('thead', this).prependTo(this); } );
$('table').addClass('tablesorter');$('table').tablesorter();});
//need to put this before the accordion stuff because the panels being hidden makes table2csv return null data
$('table.exportable').each(function() {$(this).after('<a download="' + $(this).attr('id') + '.csv" href="data:application/csv;charset=utf-8,'+encodeURIComponent($(this).table2CSV({delivery:'value'}))+'">Download '+$(this).attr('id')+'</a>')});
$('#accordion-start').nextUntil('#accordion-stop').wrapAll("<div id='accordion'></div>");
$('#accordion > h3').each(function() { $(this).nextUntil('h3').wrapAll("<div>"); });
$( '#accordion' ).accordion({ heightStyle: "content", collapsible: true, active: false });
</script>
N.B., this is designed for the Rmd -> HTML workflow. It will be an ugly mess if you go with LaTeX or anything else. This Rmd document looks through the global environment for all the source()'ed files and includes their source at the end of your document. It includes jQuery UI and tablesorter, and sets the document up to use an accordion style to show/hide sourced files. It's a work in progress, but feel free to adapt it to your own uses.
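The source-code chunk reads file paths from `names(sourced)`, but `sourced` itself is never defined in the answer. One way to maintain such a list is a hypothetical wrapper around `source()` that records every file it loads. The name `source_tracked` and the choice of storing a timestamp as each value are my own assumptions, not part of the original setup.

```r
# Hypothetical helper: source a file and record its normalized path in a
# global list `sourced` (names = file paths, values = time of sourcing),
# which is the shape names(sourced) in the autodoc chunk expects.
source_tracked <- function(file, ...) {
  path <- normalizePath(file, mustWork = TRUE)
  sourced <- if (exists("sourced", envir = globalenv(), inherits = FALSE)) {
    get("sourced", envir = globalenv())
  } else {
    list()
  }
  sourced[[path]] <- Sys.time()
  assign("sourced", sourced, envir = globalenv())
  # Extra arguments (e.g. echo) are passed on to source()
  source(path, ...)
}
```

You would then write `source_tracked('flu_data_import.R')` in place of `source('flu_data_import.R')` in each report.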
Not a one-liner, I know. Hope it gives you some ideas at least :)

Try the purl function from knitr:
source(knitr::purl("myfile.rmd", quiet=TRUE))

Perhaps one should start thinking differently. My approach is the following:
Write all the code you would normally have had in a .Rmd chunk in a .R file.
The Rmd document you knit (e.g. to HTML) is then left with only:
```{r Chunkname, Chunkoptions}
source("file.R")
```
This way you'll probably end up with a bunch of .R files, and you lose the advantage of running all the code chunk after chunk with Ctrl+Alt+N (or Ctrl+Alt+C, though that normally does not work).
However, I read Mr. Gandrud's book on reproducible research and realized that he uses knitr and .Rmd files solely for creating HTML files. The main analysis itself is an .R file.
I think .Rmd documents rapidly grow too large if you start doing your whole analysis inside.

If you are just after the code, I think something along these lines should work:
1. Read the Markdown/R file with readLines.
2. Use grep to find the code chunks, e.g. by searching for lines that start with `` ```{r `` (or `<<` in an .Rnw file) and for the closing fences.
3. Take the subset of the original lines that contains only the code.
4. Dump this to a temporary file using writeLines.
5. Source this file into your R session.

Wrapping this in a function should give you what you need.
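The steps above can be sketched in base R. This is a minimal illustration under stated assumptions: the helper names are hypothetical, only chunks whose header starts with `{r` are recognised, inline code is skipped, and chunk options such as `eval=FALSE` are ignored, so every R chunk runs. It uses `sys.source()` so the target environment can be chosen.

```r
# Hypothetical helpers: extract the code of the {r ...} chunks of an
# R Markdown file and source it.
extract_rmd_code <- function(lines) {
  # Opening fences such as ```{r}, ```{r label} or ```{r, echo=FALSE}
  is_start <- grepl("^\\s*```+\\s*\\{r[ ,}]", lines)
  # Closing fences consist of backticks only
  is_end <- grepl("^\\s*```+\\s*$", lines)
  keep <- logical(length(lines))
  in_chunk <- FALSE
  for (i in seq_along(lines)) {
    if (!in_chunk && is_start[i]) {
      in_chunk <- TRUE          # the header line itself is not code
    } else if (in_chunk && is_end[i]) {
      in_chunk <- FALSE         # neither is the closing fence
    } else if (in_chunk) {
      keep[i] <- TRUE
    }
  }
  lines[keep]
}

source_rmd_chunks <- function(file, envir = globalenv()) {
  code <- extract_rmd_code(readLines(file, warn = FALSE))
  # Write the code to a temporary file, deleted when the function exits
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)
  writeLines(code, tmp)
  sys.source(tmp, envir = envir)
  invisible(code)
}
```

Non-R fenced blocks (e.g. a Python block) are skipped because their opening line does not match the `{r` header pattern.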

The following hack worked fine for me:
library(readr)
library(stringr)
source_rmd <- function(file_path) {
stopifnot(is.character(file_path) && length(file_path) == 1)
.tmpfile <- tempfile(fileext = ".R")
full_rmd <- read_file(file_path)
codes <- str_match_all(string = full_rmd, pattern = "```(?s)\\{r[^{}]*\\}\\s*\\n(.*?)```")
stopifnot(length(codes) == 1 && ncol(codes[[1]]) == 2)
codes <- paste(codes[[1]][, 2], collapse = "\n")
writeLines(codes, .tmpfile)
cat(sprintf("R code extracted to tempfile: %s\nSourcing tempfile...\n", .tmpfile))
source(.tmpfile)
}

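I use the following custom function (envir = globalenv() matters: by default knit() evaluates the chunks in the calling function's frame, so the objects they create would be discarded when source_rmd() returns):
source_rmd <- function(rmd_file){
knitr::knit(rmd_file, output = tempfile(), envir = globalenv())
}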
source_rmd("munge_script.Rmd")

I would recommend keeping the main analysis and calculation code in .R file and importing the chunks as needed in .Rmd file. I have explained the process here.

sys.source("./your_script_file_name.R", envir = knitr::knit_global())
Put this command before calling the functions defined in your_script_file_name.R.
The "./" prefix makes the path relative to the current directory (e.g. your project folder, if you have already created a Project).
You can see this link for more detail: https://bookdown.org/yihui/rmarkdown-cookbook/source-script.html
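To see what the `envir` argument controls, here is a small base-R sketch that evaluates a script in a fresh environment instead of `knitr::knit_global()`. Inside a knitted document you would pass `knitr::knit_global()` as in the command above; the script contents and the name `double_it` are made up for the illustration.

```r
# Create a throwaway script defining a function (illustrative contents)
script <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", script)

# sys.source() evaluates the file in exactly the environment given by envir
env <- new.env()
sys.source(script, envir = env)

exists("double_it", envir = env, inherits = FALSE)          # TRUE: defined in env
exists("double_it", envir = globalenv(), inherits = FALSE)  # FALSE: global env untouched
env$double_it(21)                                           # 42
```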

I use this one-liner:
```{r optional_chunklabel_for_yourfile_rmd, child = 'yourfile.Rmd'}
```
See:
My .Rmd file becomes very lengthy. Is that possible split it and source() it's smaller portions from main .Rmd?

I would say there is not a more elegant way of sourcing an R Markdown file. The ethos of Rmd is that the report is reproducible and, ideally, self-contained. However, adding to the OP's original solution, the method below avoids permanently creating the intermediate file on disk. It also makes some extra effort to ensure chunk output does not appear in the rendered document:
knit_loc <- tempfile(fileext = ".R")
knitr::knit("myfile.rmd",
output = knit_loc,
quiet = TRUE,
tangle = TRUE)
invisible(capture.output(source(knit_loc, verbose = FALSE)))
I would also add that if the child Markdown document's dependencies are external to your R environment (e.g. it writes a file to disk, downloads some external resource, or interacts with a web API), I would opt for rmarkdown::render() instead of knit():
rmarkdown::render("myfile.rmd")
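`rmarkdown::render()` also accepts an `envir` argument, so if you go that route but still want the objects the child document creates, you can render it in a fresh environment and keep that environment. A minimal sketch: the helper name `render_and_collect` is hypothetical, pandoc must be available, and the rendered output is still written next to the input file.

```r
render_and_collect <- function(rmd, ...) {
  # Fresh environment for the document's chunks; its parent is the global
  # env, so the chunks can still see globally defined objects
  env <- new.env(parent = globalenv())
  rmarkdown::render(rmd, envir = env, quiet = TRUE, ...)
  # Return the environment so the objects the chunks created stay accessible
  env
}
```

Usage: `e <- render_and_collect("myfile.rmd")`, then `ls(e)` lists what the document defined.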

This worked for me:
source("myfile.r", echo = TRUE, keep.source = TRUE)

The answer by @qed is by far the best. Kevin Keena built the function proposed by @Paul Hiemstra, which can help you convert your .Rmd into an .R file and then source the code into another .R file where knitr::purl would not be available.

Related

Export R script without output to PDF/Word

Is it possible to export the R script to a PDF and/or Word document without the outputs (i.e. without whatever the console prints; plots, graphs etc.)? I know about the rmarkdown package, but as far as I know, it exports the script only with outputs.
rmarkdown is very flexible, you don't have to include output. If you set the option eval = FALSE none of the code will be evaluated, so no outputs will be generated.
See here for a detailed list of options available at the code-chunk level.
To follow up on @GregorThomas's answer: if you simply add R code block-formatting around all of your code with eval=FALSE specified and save it with an .rmd extension, you can then click "Knit to PDF" in RStudio (which will automatically add a minimal header). I think you could probably also do rmarkdown::render("myfile.rmd", output_format = "pdf_document"). If you wanted you could set up a little script to do the minimal editing and rendering automatically ...
```{r eval = FALSE}
x <- 2+3
print("hello")
```
Something like (untested!)
printme <- function(file) {
tt <- tempfile(fileext = ".Rmd")
writeLines(c("```{r eval=FALSE}",
readLines(file),
"```"),
tt)
rmarkdown::render(tt, output_format = "pdf_document",
output_file = "out.pdf", output_dir = getwd())  # without output_dir, out.pdf lands next to the temp file
}
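The wrapping step in `printme()` can be split out so it is easy to check before anything is rendered. This sketch (the function name `script_to_rmd` and its `title` argument are my own) also writes an explicit YAML header instead of relying on RStudio to add one.

```r
# Turn the lines of an R script into R Markdown text that shows,
# but never runs, the code (eval=FALSE)
script_to_rmd <- function(script_lines, title = "Code listing") {
  c("---",
    sprintf('title: "%s"', title),
    "output: pdf_document",
    "---",
    "",
    "```{r eval=FALSE}",
    script_lines,
    "```")
}
```

Usage: `writeLines(script_to_rmd(readLines("myfile.R")), "myfile.Rmd")` followed by `rmarkdown::render("myfile.Rmd")`.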

R Sweave: put the whole code of the .Rnw file in Appendix?

I just found this awesome technique to put the code used in the .Rmd file in the appendix (of that same file).
However, I am using R Sweave and not R Markdown, and I would like to know if there is a similar way to put all the code at the end in a single chunk. The code to do that in Markdown does not work in Sweave. I should point out that, unlike this post, I do not have a separate .R file where the calculations are made. Everything is done in the .Rnw file.
Does anybody know how to do it?
Edit: a reproducible example
\documentclass[11pt, twocolumn]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}
\SweaveOpts{concordance=TRUE}
<<reg2, echo=FALSE, print=FALSE>>=
head(mtcars)
@
<<reg3, echo=FALSE, print=FALSE>>=
head(iris)
@
\section*{Appendix}
% the place where I could like to put the whole code
\end{document}
This chunk works to include the code:
<<echo=FALSE, eval=TRUE>>=
filename <- tempfile(fileext=".R")
Stangle("test.Rnw", output = filename, quiet = TRUE)
cat(readLines(filename), sep = "\n")
@
When I include that in your example file, the tangled code of the document is printed in the appendix.
I think it's possible to modify the format a bit; see ?Rtangle for some details. Similar things are possible with knitr, but it's more flexible. I suspect the best method would be similar to the one you found for RMarkdown.
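The tangling step can be wrapped in a small helper that returns the code as a character vector. This is a sketch: `rnw_code` is a hypothetical name, and it passes `annotate = FALSE` (an Rtangle driver option, like `output` and `quiet`) to drop the chunk-header comments Rtangle adds by default.

```r
rnw_code <- function(rnw, annotate = FALSE) {
  # Tangle into a temporary .R file that is removed when the function exits
  out <- tempfile(fileext = ".R")
  on.exit(unlink(out), add = TRUE)
  # output, quiet and annotate are passed through to the Rtangle driver
  utils::Stangle(rnw, output = out, quiet = TRUE, annotate = annotate)
  readLines(out)
}
```

In the appendix chunk you would then call `cat(rnw_code("test.Rnw"), sep = "\n")`.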

Modularized R markdown structure

There are a few questions about this already, but they are either unclear or provide solutions that don't work, perhaps because they are outdated:
Proper R Markdown Code Organization
How to source R Markdown file like `source('myfile.r')`?
http://yihui.name/knitr/demo/externalization/
Modularized code structure for large projects
R Markdown/Notebook is nice, but the way it's presented, there is typically a single file that has all the text and all the code chunks. I often have projects where such a single file structure is not a good setup. Instead, I use a single .R master file that loads the other .R files in order. I'd like to replicate this structure using R Notebook, i.e. have a single .Rmd file that calls the code from multiple .R files.
The nice thing about working with a project this way is that it allows for the nice normal workflow with RStudio using the .R files but also the neat output from R Notebook/Markdown without duplicating the code.
Minimal example
This is simplified to make the example as small as possible. Two .R files and one master .Rmd file.
start.R
# libs --------------------------------------------------------------------
library(pacman)
p_load(dplyr, ggplot2)
#normally load a lot of packages here
# data --------------------------------------------------------------------
d = iris
#use iris for example, but normally would load data from file
# data manipulation tasks -------------------------------------------------
#some code here to extract useful info from the data
setosa = dplyr::filter(d, Species == "setosa")
plot.R
#setosa only
ggplot(setosa, aes(Sepal.Length)) +
geom_density()
#all together
ggplot(d, aes(Sepal.Length, color = Species)) +
geom_density()
And then the notebook file:
notebook.Rmd:
---
title: "R Notebook"
output:
html_document: default
html_notebook: default
---
First we load some packages and data and do slight transformation:
```{r start}
#a command here to load the code from start.R and display it
```
```{r plot}
#a command here to load the code from plot.R and display it
```
Desired output
The desired output is what one gets from manually copying the code from start.R and plot.R into the code chunks in notebook.Rmd.
Things I've tried
source
This loads the code, but does not display it. It just displays the source command.
knitr::read_chunk
This command was mentioned here, but actually it does the same as source as far as I can tell: it loads the code but displays nothing.
How do I get the desired output?
The solution is to use knitr's chunk option code. According to knitr docs:
code: (NULL; character) if provided, it will override the code in the
current chunk; this allows us to programmatically insert code into the
current chunk; e.g. a chunk option code =
capture.output(dump('fivenum', '')) will use the source code of the
function fivenum to replace the current chunk
No example is provided, however. It sounds like one has to feed it a character vector, so let's try readLines:
```{r start, code=readLines("start.R")}
```
```{r plot, code=readLines("plot.R")}
```
This produces the desired output and thus allows for a modularized project structure.
Feeding it a file directly does not work (i.e. code="start.R"), but would be a nice enhancement.
For interoperability with R Notebooks, you can use knitr's read_chunk method as described above. In a notebook, you must call read_chunk in the setup chunk; since you can run notebook chunks in any order, this ensures that the external code will always be available.
Here's a minimal example of using read_chunk to bring code from an external R script into a notebook:
example.Rmd
```{r setup}
knitr::read_chunk("example.R")
```
```{r chunk}
```
example.R
## ---- chunk
1 + 1
When you execute the empty chunk in the notebook, code from the external file will be inserted, and the results displayed inline, as though the chunk contained that code.
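To make the `## ---- label` convention concrete, here is a rough base-R sketch of splitting a script into named chunks at those marker lines. It only illustrates the idea and is not knitr's actual implementation of `read_chunk()`; `split_labeled_chunks` is a hypothetical name, and lines before the first marker are dropped.

```r
split_labeled_chunks <- function(lines) {
  # A marker line looks like "## ---- label" (trailing dashes are allowed)
  pattern <- "^\\s*##\\s*-{4,}\\s*([^\\s-]\\S*)"
  is_marker <- grepl(pattern, lines, perl = TRUE)
  labels <- sub(paste0(pattern, ".*$"), "\\1", lines[is_marker], perl = TRUE)
  # Each line belongs to the most recent marker above it
  group <- cumsum(is_marker)
  keep <- !is_marker & group > 0
  chunks <- split(lines[keep], factor(group[keep], levels = seq_along(labels)))
  names(chunks) <- labels
  chunks
}
```

With `example.R` from above, `split_labeled_chunks(readLines("example.R"))$chunk` would give `"1 + 1"`.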
As per my comment above, I use the here package to work with projects in folders:
```{r setup, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
library(here)
insert <- function(filename){
readLines(here::here("massive_report_folder", filename))
}
```
and then each chunk (each with its own unique label, on a single header line) looks like
```{r extra_file, echo=FALSE, message=FALSE, warning=FALSE, results='asis', code=insert("extra_file.R")}
```

Is there a R markdown analog of \SweaveInput{} for modular report generation?

One of the features I like very much in Sweave is the option to have \SweaveInput{} of separate Sweave files to have a more "modular" report and just be able to comment out parts of the report that I do not want to be generated with a single #\SweaveInput{part_x} rather than having to comment in or out entire blocks of code.
Recently I decided to move to R Markdown for several reasons, mainly practicality, the option of interactive (Shiny) integration in the report, and the fact that I do not really need the extensive formatting options of LaTeX.
I found that technically pandoc is able to combine multiple Rmd files into one html output by just concatenating them but it would be nice if this behaviour could be called from a "master" Rmd file.
Any answer would be greatly appreciated even if it is just "go back to Sweave, it is not possible in Markdown".
I am using R 3.1.1 for Windows and Linux as well as Rstudio 0.98.1056 and Rstudio server 0.98.983.
Use something like this in the main document:
```{r child="CapsuleRInit.Rmd"}
```
```{r child="CapsuleTitle.Rmd", eval=TRUE}
```
```{r child="CapsuleBaseline.Rmd", eval=TRUE}
```
Use eval=FALSE to skip one child.
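Closer to commenting out a single `\SweaveInput{}` line, the on/off switches can also live in one place. A sketch, assuming this goes in an early chunk of the main document (the names `parts` and `selected_children` are hypothetical); a later empty chunk with `child = selected_children` then includes only the parts set to TRUE, in the listed order.

```r
# One switch per part of the report: TRUE = include it
parts <- c(
  "CapsuleRInit.Rmd"    = TRUE,
  "CapsuleTitle.Rmd"    = TRUE,
  "CapsuleBaseline.Rmd" = FALSE
)
# File names of the parts that are switched on
selected_children <- names(parts)[parts]
```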
For RStudio users: you can define a main document for LaTeX output, but this does not work for Rmd documents, so you always have to switch to the main document for processing. Please support my feature request to RStudio; I have already tried twice, but it seems to me that too few people use child docs for it to move higher in the priority list.
I don't quite understand some of the terms in the answer above, but the solution relates to defining a custom knit: hook in the YAML header. For multipartite documents this allows you to, for example:
- Have a 'main' or 'root' Rmarkdown file with an output: md_document YAML header
- render all child documents from Rmd ⇒ md ahead of calling render, or not if this is time-limiting
- combine multiple files (with the child code chunk option) into one (e.g. for chapters in a report)
- write output: html_document (or other format) YAML headers for this compilation output on the fly, prepending them to the Markdown, effectively writing a fresh Rmarkdown file
- ...then render this Rmarkdown to get the output, deleting intermediate files in the process if desired
The code for all of the above (dumped here) is described here, a post I wrote after working out the usage of custom knit: YAML header hooks recently (here).
The custom knit: function (i.e. the replacement to rmarkdown::render) in the above example is:
(function(inputFile, encoding) {
input.dir <- normalizePath(dirname(inputFile))
rmarkdown::render(input = inputFile, encoding = encoding, quiet=TRUE,
output_file = paste0(input.dir,'/Workbook-tmp.Rmd'))
sink("Workbook-compiled.Rmd")
cat(readLines(headerConn <- file("Workbook-header.yaml")), sep="\n")
close(headerConn)
cat(readLines(rmdConn <- file("Workbook-tmp.Rmd")), sep="\n")
close(rmdConn)
sink()
rmarkdown::render(input = paste0(input.dir,'/Workbook-compiled.Rmd'),
encoding = encoding, output_file = paste0(input.dir,'/../Workbook.html'))
unlink(paste0(input.dir,'/Workbook-tmp.Rmd'))
})
...but all squeezed onto 1 line!
The rest of the 'master'/'root'/'control' file or whatever you want to call it takes care of writing the aforementioned YAML for the final HTML output that goes via an intermediate Rmarkdown file, and its second code chunk programmatically appends child documents through a call to list.files()
```{r include=FALSE}
header.input.file <- "Workbook-header.yaml"
header.input.dir <- normalizePath(dirname(header.input.file))
fileConn <- file(header.input.file)
writeLines(c(
"---",
paste0('title: "', rmarkdown::metadata$title,'"'),
"output:",
"  html_document:",
"    toc: true",
"    toc_depth: 3 # defaults to 3 anyway, but just for ease of modification",
"    number_sections: TRUE",
paste0("    css: ",paste0(header.input.dir,'/../resources/workbook_style.css')),
'    pandoc_args: ["--number-offset=1", "--atx-headers"]',
"---"),
fileConn)
close(fileConn)
```
```{r child = list.files(pattern = 'Notes-.*?\\.md')}
# Use file names with strict numeric ordering, e.g. Notes-001-Feb1.md
```
The directory structure would contain a top-level folder with:
- a final output Workbook.html
- a resources subfolder containing workbook_style.css
- a documents subfolder containing said main file "Workbook.Rmd" alongside files named "Notes-001.Rmd", "Notes-002.Rmd" etc. (so that list.files(pattern = "Notes-.*?\\.Rmd") finds them and thus makes them children in the correct order when rendering the main Workbook.Rmd file)
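`list.files()` returns names in alphabetical order, which is why the zero-padded numbers matter: without padding, `Notes-10` sorts before `Notes-2`. If you would rather not pad, a small sketch can sort on the number itself. It assumes every file name starts with `Notes-` followed by digits, and `order_notes` is a hypothetical name.

```r
order_notes <- function(files) {
  # Pull out the digits right after "Notes-" and sort numerically on them
  num <- as.integer(sub("^Notes-([0-9]+).*$", "\\1", basename(files)))
  files[order(num)]
}
```

The child chunk could then use `child = order_notes(list.files(pattern = "^Notes-[0-9]+.*\\.md$"))`.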
To get proper numbering of files, each constituent "Notes-XXX.Rmd" file should contain the following style YAML header:
---
title: "March: Doing x, y, and z"
knit: (function(inputFile, encoding) { input.dir <- normalizePath(dirname(inputFile)); rmarkdown::render(input = inputFile, encoding = encoding, quiet=TRUE)})
output:
  md_document:
    variant: markdown_github
    pandoc_args: "--atx-headers"
---
```{r echo=FALSE, results='asis', comment=''}
cat("##", rmarkdown::metadata$title)
```
The code chunk at the top of the Rmarkdown document enters the YAML title as a second-level header when evaluated. results='asis' inserts the output as raw Markdown text rather than as formatted R output like
[1] "A text string"
You would knit each of these before knitting the main file - it's easier to remove the requirement to render all child documents and just append their pre-produced markdown output.
I've described all of this at the links above, but I thought it'd be bad manners not to leave the actual code with my answer.
I don't know how effective that RStudio feature request website may be... Personally I've not found it hard to look into the source code for these functions, which thankfully are open source, and if there really is something absent rather than undocumented an inner-workings-informed feature request is likely far more actionable by one of their software devs.
I'm not familiar with Sweave; is the above what you were aiming at? If I understand correctly, you just want to control the inclusion of documents in a modular fashion. The child = list.files() statement could take care of that: if not through pattern matching, you can list the files directly, as in child = c("file1.md", "file2.md"), and edit that statement to change the children. You can also control TRUE/FALSE switches with YAML, whereby the presence of a custom header field would set some children to be included, for example
potentially.absent.variable: TRUE
...above the document with a silent include=FALSE hiding the machinations of the first chunk:
```{r include=FALSE}
# TRUE only if the YAML field named by `var` is present and is exactly TRUE:
# ==> FALSE if the field is absent
# ==> FALSE if anything other than TRUE
# ==> TRUE if TRUE
checkFor <- function(var) {
isTRUE(rmarkdown::metadata[[var]])
}
```
```{r child = "Optional_file.md", eval = checkFor("potentially.absent.variable")}
```

knitr: Knitting separate Rnw documents within an Rmd document

I have a master R markdown document (Rmd) within which I would like to knit several separate Rnw documents (NO child documents) in one of the chunks. However, when I call knit on the Rnw document, the contained R code chunks do not seem to be processed, resulting in an error when trying to run texi2pdf on them.
Illustration of the situation:
Inside master.Rmd:
```{r my_chunk, echo=FALSE, message=FALSE, results='asis'}
... some code ...
knit("sub.Rnw", output = ..., quiet = TRUE)
tools::texi2pdf(tex_file)
... some code ...
```
Is there some additional configuration required to make this scenario work?
There are a few reasons you can't directly do what you are trying to do (calling knit from within a knit environment):
1. The knitr patterns are already set (in this case, Markdown patterns, so you'd need to set the patterns to 'rnw' patterns).
2. Parsing the chunks (after setting the correct patterns) will add chunk labels to the existing concordance, so unless all chunks are unique you will get a duplicate chunk label error (this is why knit_child exists).
3. The output target and other options are already set, so you either need a completely new knitr environment or you need to save, modify, and restore all pertinent options.
In other words, this is expected behavior.
Something along the lines of
library(knitr)
files <- list.files( pattern = "\\.Rnw$", path = ".")
files
## [1] "test_extB.Rnw" "test_ext.Rnw"
for( f in files ) {
system( paste0("R -e \"knitr::knit2pdf('", f, "')\"") )
}
list.files( pattern = "\\.pdf$", path = ".")
## [1] "test_extB.pdf" "test_ext.pdf"
or calling Rscript in a loop should do the trick (based on the info provided), which is essentially what @kohske was expressing in the comments.
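A variation on the loop above is sketched below; the function name and its `fun` argument are hypothetical. It uses `system2()` with the `Rscript` that belongs to the running R installation, so each file is knitted in a brand-new R process, and the returned exit codes tell you which runs failed.

```r
knit_in_fresh_session <- function(files, fun = "knitr::knit2pdf") {
  # Use the Rscript of the running R installation rather than whatever is on PATH
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- vapply(files, function(f) {
    # Forward slashes avoid backslash escapes inside the R string on Windows
    path <- normalizePath(f, winslash = "/", mustWork = TRUE)
    expr <- sprintf("%s('%s')", fun, path)
    # Each call starts a new R process, so no knitr state is shared between files
    system2(rscript, c("-e", shQuote(expr)))
  }, numeric(1))
  # Named by file; 0 means success
  invisible(status)
}
```

Usage: `st <- knit_in_fresh_session(list.files(pattern = "\\.Rnw$"))`, then `names(st)[st != 0]` lists the files that failed.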
