Modularized R Markdown structure

There are a few questions about this already, but they are either unclear or provide solutions that don't work, perhaps because they are outdated:
Proper R Markdown Code Organization
How to source R Markdown file like `source('myfile.r')`?
http://yihui.name/knitr/demo/externalization/
Modularized code structure for large projects
R Markdown/Notebook is nice, but as it is usually presented, a single file holds all the text and all the code chunks. I often have projects where such a single-file structure is not a good setup. Instead, I use a single master .R file that loads the other .R files in order. I'd like to replicate this structure using R Notebook, i.e. have a single .Rmd file that calls the code from multiple .R files.
The advantage of working with a project this way is that it keeps the normal RStudio workflow with the .R files while also giving the neat output of R Notebook/Markdown, without duplicating the code.
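For reference, the master-file workflow described above might look roughly like the sketch below. run_scripts() is a hypothetical helper, not part of any package, and the usage line assumes start.R and plot.R sit in the working directory and should run in order in one shared environment.

```r
# master.R: a minimal sketch of the "one master file sources the others" workflow.
# run_scripts() is a hypothetical helper, not part of any package.
run_scripts <- function(files, envir = globalenv()) {
  for (f in files) {
    if (!file.exists(f)) stop("Script not found: ", f)
    message("Running ", f)
    # local = envir evaluates every script in the same environment, so objects
    # created by an earlier script (e.g. d, setosa) are visible to later ones.
    # print.eval = TRUE prints visible top-level values such as ggplot objects.
    source(f, local = envir, print.eval = TRUE)
  }
  invisible(envir)
}

# Usage (assumes both scripts exist in the working directory):
# run_scripts(c("start.R", "plot.R"))
```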
Minimal example
This is simplified to make the example as small as possible. Two .R files and one master .Rmd file.
start.R
# libs --------------------------------------------------------------------
library(pacman)
p_load(dplyr, ggplot2)
#normally load a lot of packages here
# data --------------------------------------------------------------------
d = iris
#use iris for example, but normally would load data from file
# data manipulation tasks -------------------------------------------------
#some code here to extract useful info from the data
setosa = dplyr::filter(d, Species == "setosa")
plot.R
#setosa only
ggplot(setosa, aes(Sepal.Length)) +
  geom_density()
#all together
ggplot(d, aes(Sepal.Length, color = Species)) +
  geom_density()
And then the notebook file:
notebook.Rmd:
---
title: "R Notebook"
output:
  html_document: default
  html_notebook: default
---
First we load some packages and data and do a slight transformation:
```{r start}
#a command here to load the code from start.R and display it
```
```{r plot}
#a command here to load the code from plot.R and display it
```
Desired output
The desired output is what one gets from manually copying the code from start.R and plot.R into the code chunks in notebook.Rmd.
[Screenshot of the desired output, partly cut off for lack of screen space.]
Things I've tried
source
This loads the code but does not display it; the output only shows the source command itself.
knitr::read_chunk
This command was mentioned here, but actually it does the same as source as far as I can tell: it loads the code but displays nothing.
How do I get the desired output?

The solution is to use knitr's chunk option code. According to the knitr docs:
code: (NULL; character) if provided, it will override the code in the
current chunk; this allows us to programmatically insert code into the
current chunk; e.g. a chunk option code =
capture.output(dump('fivenum', '')) will use the source code of the
function fivenum to replace the current chunk
No example is provided, however. It sounds like one has to feed it a character vector, so let's try readLines:
```{r start, code=readLines("start.R")}
```
```{r plot, code=readLines("plot.R")}
```
This produces the desired output and thus allows for a modularized project structure.
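Passing a file name directly (e.g. code="start.R") does not work, because code expects the code itself as a character vector, not a path. Newer versions of knitr provide the separate chunk option file for this, e.g. file = "start.R".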
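If a single chunk should show the code of several scripts, note that code only needs one character vector, so the files can be read and concatenated first. A minimal sketch; read_code() is a hypothetical helper name, and the comment line naming each file is just a readability convention:

```r
# read_code(): hypothetical helper that reads one or more .R files and returns a
# single character vector suitable for the knitr chunk option `code`.
read_code <- function(..., header = TRUE) {
  files <- c(...)
  absent <- files[!file.exists(files)]
  if (length(absent) > 0) stop("File(s) not found: ", paste(absent, collapse = ", "))
  unlist(lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # Optionally prefix each file's code with a comment naming its source file
    if (header) c(paste("#", basename(f)), lines) else lines
  }), use.names = FALSE)
}
```

In the notebook, the helper would be defined in a setup chunk and then used in a later chunk header as code=read_code("start.R", "plot.R").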

For interoperability with R Notebooks, you can use knitr's read_chunk function, as described above. In a notebook, you must call read_chunk in the setup chunk; since notebook chunks can be run in any order, this ensures that the external code is always available.
Here's a minimal example of using read_chunk to bring code from an external R script into a notebook:
example.Rmd
```{r setup}
knitr::read_chunk("example.R")
```
```{r chunk}
```
example.R
## ---- chunk
1 + 1
When you execute the empty chunk in the notebook, code from the external file will be inserted, and the results displayed inline, as though the chunk contained that code.
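To make the ## ---- label convention concrete, here is a simplified, hypothetical base-R re-implementation of the idea behind read_chunk(): every marker line starts a new named chunk, and the lines up to the next marker become its code. This is for illustration only; knitr's actual parser handles more variants (for example the older ## @knitr label form).

```r
# split_chunks(): hypothetical illustration of how "## ---- label" markers divide an
# R script into named chunks. Not knitr's implementation.
split_chunks <- function(lines) {
  # A marker looks like "## ---- label" (trailing dashes optional)
  marker_re <- "^##\\s*-{4,}\\s*([^-\\s]\\S*?)\\s*-*\\s*$"
  is_marker <- grepl(marker_re, lines, perl = TRUE)
  labels <- sub(marker_re, "\\1", lines[is_marker], perl = TRUE)
  # cumsum() numbers each line by the most recent marker above it (0 = before any)
  group <- cumsum(is_marker)
  chunks <- lapply(seq_along(labels), function(i) lines[group == i & !is_marker])
  setNames(chunks, labels)
}

script <- c("## ---- chunk-1 ----", "table(x)", "## ---- chunk", "1 + 1")
split_chunks(script)
```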

As per my comment above, I use the here package to work with projects in folders:
```{r setup, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
library(here)
insert <- function(filename){
  readLines(here::here("massive_report_folder", filename))
}
```
and then each chunk looks like this (the header must be on a single line and must not reuse the setup label):
```{r, echo=FALSE, message=FALSE, warning=FALSE, results='asis', code=insert("extra_file.R")}
```

Related

r-exams Questions about the same data on 2 separate xxx.Rmd files

Using R/exams, I am developing a PDF exam with several questions (hence several Rmd files), but the questions are connected and would use a dataset created in the first question file. The questions are not amenable to a cloze format.
Is there a way to write the exercises so that the second exercise can access the data generated by the first exercise?
The easiest solution is to use a shared environment across the different exercises, in the simplest case the .GlobalEnv. Then you can simply do
exams2pdf(c("ex1.Rmd", "ex2.Rmd"), envir = .GlobalEnv)
and then both exercises will create their variables in the global environment and can re-use existing variables from there. Instead of the .GlobalEnv you can also create myenv <- new.env() and use envir = myenv.
For Rnw (as opposed to Rmd) exercises, it is not necessary to set this option because Sweave() Rnw exercises are always processed in the current environment anyway.
Note that these approaches only work for those exams2xyz() interfaces where the n-th random draw from each exercise is guaranteed to end up together in the n-th exam. This is the case for PDF output but not for many of the learning management system outputs (Moodle, Canvas, etc.). See: Sharing a random CSV data set across exercises with exams2moodle()
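Why a shared envir works can be illustrated without the exams package: if the code of both exercises is evaluated in the same environment, the second finds the objects created by the first; in a separate environment it does not. A base-R sketch, where the exercise code is inlined as text purely for illustration (ex1_code, ex2_code and run_exercise() are hypothetical stand-ins, not how exams processes files internally):

```r
# Hypothetical stand-ins for the R code inside ex1.Rmd and ex2.Rmd
ex1_code <- "set.seed(1); shared_data <- data.frame(x = rnorm(10))"
ex2_code <- "answer <- mean(shared_data$x)"

run_exercise <- function(code, envir) {
  eval(parse(text = code), envir = envir)
}

# Shared environment (analogous to envir = myenv): ex2 finds shared_data
myenv <- new.env()
run_exercise(ex1_code, myenv)
run_exercise(ex2_code, myenv)

# Fresh environment for ex2 alone: shared_data cannot be found
ex2_alone <- tryCatch({
  run_exercise(ex2_code, new.env(parent = baseenv()))
  "ok"
}, error = function(e) conditionMessage(e))
```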
Is it an option to save the data you need to disk in one Rmd file
```{r, echo=FALSE}
saveRDS(df, "my_stored_data.rds")
```
and then load it in the other one
```{r, echo=FALSE}
df <- readRDS("my_stored_data.rds")
```
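readRDS() takes only the file path and returns the stored object, so its result has to be assigned. If the second exercise might be processed before the first one has written the file, a clearer error message helps. A minimal sketch with hypothetical helper names store_data() and fetch_data(); the file name is the illustrative one from above, and both helpers would need to be defined in each Rmd (or in a script sourced by both):

```r
# Hypothetical helpers for handing data from one Rmd exercise to the next via disk.
store_data <- function(x, path = "my_stored_data.rds") {
  saveRDS(x, path)  # overwrite on every run so each random draw stores fresh data
  invisible(path)
}

fetch_data <- function(path = "my_stored_data.rds") {
  if (!file.exists(path)) {
    stop("'", path, "' not found; process the exercise that creates it first")
  }
  readRDS(path)     # returns the stored object; assign it, e.g. df <- fetch_data()
}
```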
Another option is to render the Rmd files from an R script. If you do that, the Rmd files use the environment of the R script (!) instead of creating their own. Hence you can use the same objects (and therefore, of course, let one Rmd script store the data while the other uses it as input).
In this thread: Create sections through a loop with knitr
there is a post from me about doing this. It's basically this:
The first Rmd file:
---
title: "Script 1"
output: html_document
---
```{r setup, include=FALSE}
a_data_frame_created_in_script_1 <- mtcars
```
saved as rmd_test.Rmd
The second one:
---
title: "Script 2"
output: html_document
---
```{r setup}
a_data_frame_created_in_script_1
```
saved as rmd_test_2.Rmd.
And then you have an R-script that does this:
rmarkdown::render("rmd_test.Rmd", output_file = "rmd_test.html")
rmarkdown::render("rmd_test_2.Rmd", output_file = "rmd_test_2.html")

Run an R Markdown (check.Rmd) and an R knitr (test.Rnw) file together

I have the following problem: there are two big documents, one written in R Markdown (check.Rmd) and the other in R knitr (test.Rnw). In the knitr document we have code like the following:
\section{Organisations Test}
\textbf{Running Organisations Checks}
<<CreateOrganisations, echo=FALSE, progress=TRUE, warning=FALSE, eval=TRUE>>=
source("OrganisationsTest.R")
OrganisationsTest(current_schema,server,user,pass)
@
and in the other as follows:
2. check the downwards shock
```{r chunk_Int_Sh_p2, echo=FALSE}
unique(param.int.shock.tab[SHOCKTYPE=="SHOCK_DOWN"&PERIODEND<21|PERIODEND==90, list( Maturity=PERIODEND, Shock_value=100*SHOCKVALUE)])
```
Now the question: how can I combine both so that a single script runs and compiles both, one after the other? To clarify: without any changes to either document, how can I have one script that applies Knit PDF to the first document to create a PDF, and Compile PDF to the other one?
I suppose on Linux one could write a shell script, but what about using RStudio on Windows?
I am really grateful for any hint; I am a little bit helpless!
Addendum: in principle it is as follows: we have two files. To compile the knitr file you would use the Compile PDF button in RStudio, and for the Markdown file the Knit PDF button, BUT we want to put both together and click on one button. How is this possible?
The RStudio buttons "Compile PDF" (for RNW documents) and "Knit PDF" (for RMD documents) are convenient, but in cases like this one it is important to understand what they do in order to reproduce the same or similar behavior.
Summing up, the question asks for a way to convert two files (an RMD and an RNW document) to PDF, preferably using a button like the two buttons mentioned above.
Unfortunately, to my knowledge, it is not possible to add user-defined buttons to the RStudio GUI. But it is straightforward to write an R script that compiles both documents.
In the following I assume two files:
first.Rmd:
This is a RMD file.
```{r, echo=FALSE}
plot(1)
```
second.Rnw:
\documentclass{article}
\begin{document}
This is a RNW file.
<<>>=
plot(1)
@
\end{document}
To compile first.Rmd to PDF, we need the following (see How to convert R Markdown to PDF?):
library(knitr)
library(rmarkdown)
knit(input = "first.Rmd")
render(input = "first.md", output_format = "pdf_document")
The knit call generates first.md from first.Rmd, executing the R code in the chunks. render converts the resulting markdown file to PDF. [Note the addendum at the bottom!]
To compile second.Rnw to PDF, we can simply use knit2pdf:
knit2pdf("second.Rnw")
Copying both snippets into one R script and clicking "Source" is as close as possible to a "one-button-solution".
However, note that the snippets do something very similar to the "Compile PDF" / "Knit PDF" buttons, but not identical: the buttons start a new R session, while the solution above uses the current session.
Before executing the snippets make sure to use the correct working directory.
Both knit and knit2pdf by default use envir = parent.frame(). That means R code in chunks is executed in the calling environment (see What is the difference between parent.frame() and parent.env() in R). This can be a useful feature, for example to "pass" variables to chunks, but it is important to know about it. Otherwise a document might compile just fine in one session (where certain variables exist in the calling environment) but fail to compile in another session (one that is missing these variables). Therefore, this feature is a little bit dangerous in terms of reproducibility. As a solution, envir = new.env(parent = as.environment(2)) could be used; see knitr inherits variables from a user's environment, even with envir = new.env() for more details on that topic.
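The difference between the two environment choices can be checked directly in base R. A minimal sketch; leftover_value is a made-up stand-in for a variable that happens to exist in the user's session:

```r
# A variable that happens to exist in the interactive session
leftover_value <- 42

# Default: new.env() uses the calling frame (here the global environment) as its
# parent, so lookups from it still find leftover_value.
e_default <- new.env()
found_default <- exists("leftover_value", envir = e_default)

# Isolated: the parent is position 2 of the search path (typically the most recently
# attached package), so the global environment is skipped during lookup.
e_isolated <- new.env(parent = as.environment(2))
found_isolated <- exists("leftover_value", envir = e_isolated)

c(default = found_default, isolated = found_isolated)
```

Chunk code evaluated in e_isolated would therefore fail on leftover_value instead of silently using it, which is the reproducibility property described above.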
I just realized the following about render:
If the input requires knitting then knit is called prior to pandoc.
(Source: ?render)
Therefore, knit(input = "first.Rmd"); render(input = "first.md", output_format = "pdf_document") can be simplified to render(input = "first.Rmd", output_format = "pdf_document"). The envir issues of knit from above apply to render as well.
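Since the RStudio buttons compile in a fresh R session, a script can mimic that more closely by running each build in a separate Rscript process. A hedged sketch: build_all() is a hypothetical helper; it assumes rmarkdown, knitr and a LaTeX installation are available and that first.Rmd and second.Rnw are in the working directory. With run = FALSE it only returns the R expressions without executing them.

```r
# build_all(): hypothetical helper that compiles each document in a fresh R process,
# closer to what RStudio's "Knit PDF" / "Compile PDF" buttons do than calling
# render()/knit2pdf() in the current session.
build_all <- function(rmd = "first.Rmd", rnw = "second.Rnw", run = TRUE) {
  rscript <- file.path(R.home("bin"), "Rscript")
  exprs <- c(
    sprintf("rmarkdown::render('%s', output_format = 'pdf_document')", rmd),
    sprintf("knitr::knit2pdf('%s')", rnw)
  )
  if (!run) return(exprs)
  for (e in exprs) {
    # system2() returns the exit status of Rscript; non-zero means the build failed
    status <- system2(rscript, c("-e", shQuote(e)))
    if (status != 0) stop("Build failed: ", e)
  }
  invisible(exprs)
}
```

Sourcing a script that contains build_all() followed by a call to it is then the "one button"; because each document gets its own process, neither build can accidentally depend on objects in the current session.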

Proper R Markdown Code Organization

I have been reading about R Markdown (here, here, and here) and using it to create solid reports. I would like to take the little code I use for ad hoc analyses and turn it into more scalable data reports.
My question is rather broad: Is there a proper way to organize your code around an R Markdown project? Say, have one script that generates all of the data structures?
For example: Let's say that I have the cars data set and I have brought in commercial data on the manufacturer. What if I wanted to attach the manufacturer to the current cars data set, and then produce a separate summary table for each company using a manipulated data set cars.by.name as well as plot a certain sample using cars.import?
EDIT: Right now I have two files open. One is an R script file that has all of the data manipulation: subsetting and re-categorizing values. The other is the R Markdown file where I am building out text to accompany the various tables and plots of interest. When I refer to an object created in the R script file, like:
```{r}
table(cars.by.name$make)
```
I get an error saying Error in summary(cars.by.name$make) : object 'cars.by.name' not found
EDIT 2: I found this older thread to be helpful. Link
---
title: "Untitled"
author: "Jeb"
date: "August 4, 2015"
output: html_document
---
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r}
table(cars.by.name$make)
```
```{r}
summary(cars)
summary(cars.by.name)
```
```{r}
table(cars.by.name)
```
You can also embed plots, for example:
```{r, echo=FALSE}
plot(cars)
plot(cars.import)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
There is a solution for this sort of problem, explained here.
Basically, if you have an .R file containing your code, there is no need to repeat the code in the .Rmd file, but you can include the code from .R file. For this to work, the chunks of code should be named in the .R file, and then can be included by name in the .Rmd file.
test.R:
## ---- chunk-1 ----
table(cars.by.name$make)
test.Rmd
Just once on top of the .Rmd file:
```{r echo=FALSE, cache= F}
knitr::read_chunk('test.R')
```
For every chunk you're including (replace chunk-1 with the label of that specific chunk in your .R file):
```{r chunk-1}
```
Note that the chunk should be left empty (as is); at run time, the code from the .R file will be brought over here and run.
Oftentimes I have many reports that need to run the same code with slightly different parameters. What I typically do is call all my "stats" functions separately, generate and save the results, and then just reference them in the report. The way to do this is as follows:
---
title: "Untitled"
author: "Author"
date: "August 4, 2015"
output: html_document
---
```{r, echo=FALSE, message=FALSE}
directoryPath <- "rawPath" ##Something like /Users/userid/RDataFile
fullPath <- file.path(directoryPath,"myROutputFile.RData")
load(fullPath)
```
Some Text, headers whatever
```{r}
summary(myStructure$value1) #Where myStructure was saved to the .RData file
```
You can save an RData file by using the save.image() command.
Hope that helps!
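As an alternative to save.image(), which stores the whole workspace, save() can store just the objects the report needs, and load() can restore them into a dedicated environment so they don't silently overwrite anything in the document's session. A minimal sketch; the contents of myStructure and the file location are placeholders:

```r
# --- In the long-running analysis script ---
myStructure <- list(value1 = c(3, 1, 4, 1, 5))           # placeholder result
out_file <- file.path(tempdir(), "myROutputFile.RData")  # illustrative path
save(myStructure, file = out_file)                        # save only what the report needs

# --- In the .Rmd setup chunk ---
report_env <- new.env()
loaded <- load(out_file, envir = report_env)  # load() returns the names it restored
summary(report_env$myStructure$value1)
```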

Display .R script in output of .Rmd file

Is it possible to include or display an .R script in the output of an .Rmd file?
Important: I just want to display the .R file!
I tried source(filename.r), but source does not display it.
Any ideas?
**knitr Global Options**
```{r echo=TRUE}
knitr::opts_chunk$set(tidy=FALSE, fig.path='figures/')
```
**Load Libraries**
```{r echo=TRUE}
library(dplyr)
```
```{r echo=TRUE, include=TRUE}
source("external.R")
# the complete source code of the .r file should be displayed here
# possible?
```
What would be the use-case for such a requirement?
Creating .Rmd files helps with documentation. In fact, all my documentation is created using .Rmd.
There are .R scripts which take a long time to run (processing large data). In such cases working with .Rmd is not practical, so I prefer to work with .R scripts.
If the source code of the .R file could be "included & displayed" in the .Rmd, it would be wonderful for documentation purposes.
For this particular case, there is a simple solution. That is, you can assign source code to the chunk option code, then knitr will just take your source code as if it were written in the code chunk, e.g.
```{r, code = readLines('external.R')}
```
Alternatively and equivalently, you can use the file option:
```{r, file = 'external.R'}
```

Using R Markdown chunks in R Sweave(Knitr)

I have an R Markdown file that has my notes and chunks of code. I now want to write an R Sweave (knitr) document to publish a paper using those chunks. I do not want to cut and paste the chunks; I would rather call them directly. That way, if I update the chunks, I don't have to do it in two places. It seems like it would be simple enough, but I cannot figure it out. My code is as follows; test.rmd is my Markdown document, and foo is the chunk in the rmd file.
Test.rnw
<<Setup>>===
read_chunk('test.rmd')
@
<<foo>>==
@
Test.rmd
```{r foo, echo=TRUE}
print(summary(cars))
```
I would expect a summary of cars to be displayed in the output when compiling test.rnw into a PDF, but it isn't. Any help is greatly appreciated.
read_chunk reads chunks from an R script, so call purl first to extract the code of test.Rmd into test.R, and then call read_chunk:
<<Setup>>=
knit_patterns$set(all_patterns[["md"]])
purl("test.Rmd")
knit_patterns$set(all_patterns[["rnw"]])
read_chunk("test.R")
@
<<foo>>=
@
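What purl() plus read_chunk() achieve here can also be sketched directly: find the chunk with the wanted label in the .Rmd text and return the lines up to its closing fence. extract_rmd_chunk() is a hypothetical, simplified helper, not part of knitr; it only understands plain headers of the form {r label, ...} opened by three or more backticks, and bare closing fences.

```r
# extract_rmd_chunk(): hypothetical, simplified illustration of pulling the code of a
# labelled chunk out of .Rmd lines (roughly what purl() + read_chunk() achieve).
extract_rmd_chunk <- function(lines, label) {
  header_re <- "^\\s*`{3,}\\s*\\{r[ ,]+([^ ,}]+).*\\}\\s*$"  # opening fence with label
  fence_re <- "^\\s*`{3,}\\s*$"                                # bare closing fence
  is_header <- grepl(header_re, lines, perl = TRUE)
  labels <- rep(NA_character_, length(lines))
  labels[is_header] <- sub(header_re, "\\1", lines[is_header], perl = TRUE)
  start <- which(labels == label)
  if (length(start) != 1) stop("Expected exactly one chunk labelled '", label, "'")
  # The first bare closing fence after the header ends the chunk
  end <- which(grepl(fence_re, lines, perl = TRUE) & seq_along(lines) > start)[1]
  if (is.na(end)) stop("Chunk '", label, "' has no closing fence")
  if (end == start + 1) character(0) else lines[(start + 1):(end - 1)]
}
```

In principle, its result could be passed to the code chunk option in the Rnw file, e.g. a chunk header such as <<foo-from-rmd, code=extract_rmd_chunk(readLines("test.Rmd"), "foo")>>=, provided the helper is defined in an earlier chunk.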

Resources