Using an R Markdown Document as a source for functions - r

I'm looking into R Markdown for documenting functions I regularly use. I will put them into an R Markdown file to document them and then be able to read my thinking behind the function if i come back to it months later
My question is, if i start a new R project, Is it possible to source the r markdown file and use the library of functions i have created just by calling them similarly to if i was sourcing a regular R file. I dont really wish to maintain two sets of function files
I appreciate this may be a beginners question but any help pointing to tutorials and the like would be greatly appreciated
Thanks

As was mentioned in the comments, you should probably create a package for this purpose. But if you insist on putting function definitions in scripts and document them using RMarkdown files, using read_chunk() from the knitr package might be the way to go.
Note that this approach differs slightly from what you requested. You wanted to have the function definition in the markdown file together with the documentation. And then you wanted to somehow source that file into your R script in order to use the function. I did not find a way to do this (even though it might be possible).
The alternative that I propose puts the function definition in its own R script, say fun.R. The Rmarkdown file then reads the function definition from fun.R and adds documentation. If you want to use the function in some other script, you can simply source fun.R (and not the markdown file). This still means that you have to maintain the code for the function definition only once.
So let me show this with an example. This is fun.R:
## ---- fun
fun <- function(x) x^2
The first line is an identifier that will be used later. The markdown file is as follows:
---
title: "Documentation of fun()"
output: html_document
---
This documents the function `fun()` defined in `fun.R`.
```{r,cache = FALSE}
knitr::read_chunk("fun.R")
```
This is the function definition
```{r fun}
```
This is an example of how to use `fun()`:
```{r use_fun}
fun(3)
```
The first chunk reads in fun.R using knitr::read_chunk. Later on, you can define an empty chunk that has the identifier that was used in fun.R as its name. This will act as if the contents of fun.R were written directly in this file. As you see, you can also use fun() in later chunks. This is a screenshot of the resulting html file:
In a script where you want to use fun() you simply add source("fun.R") to source the function definition.
You could also have several functions in a single R file and still document them separately. Simply put an identifier starting with ## ---- before each function definition and then create empty chunks referring to each one of the identifiers.
This is admittedly somewhat more complicated than what you asked for, because it involves two files instead of just one. But at least there is no redundancy

The klmr/modules package has been superseded by the box package by the same author. It is on CRAN. After the cat command below run these lines to display the roxygen2 help for add2.
box::use(./test)
box::help(test$add2)
Perhaps this is close enough -- you can use the github klmr/modules package (not the CRAN modules package) to combine roxygen2 documentation and code in a single file without creating a package. For example, after installing the modules package copy this to the clipboard and then paste it into the R console to create a single file module with embedded documentation. The subsequent code then imports it, runs a function from it and invokes help. See the documentation of the modules package for more info.
Note that this has the following advantages: (1) everything is in a single file, (2) if you later decide to move to using packages you can use the very same file in your package with roxygen2 -- no need to revise anything, (3) any learning of roxygen2 applies to packages too.
# create a file with our documentation and code
Lines <- "
#' Add two numbers.
#'
#' #param x the first number.
#' #param y the second number.
#' #return The sum.
#' #note This is just a simple example.
#'
#' This function is a simple example intended to show how to use the modules
#' package with roxygen2.
add2 <- function(x, y) x + y
"
cat(Lines, file = "test.R")
# now we can import it
# devtools::install_github("klmr/modules")
library(modules)
test <- import("test") # do not include the .R extension
test$add2(1, 2)
## [1] 3
# this will cause help page to appear
?test$add2

Related

Does roxygen2 work for R scripts in data-raw?

I am using RStudio to create a package for a piece of data analysis I'm doing. To put my raw data into the package, I'm using devtools::use_data_raw() as per this article.
I have a script load-raw-data.R that loads the raw data and assembles it into a dataframe, then calls devtools::use_data() on this dataframe to add it to the package. load-raw-data.R is in /data-raw not /R, as per the article. I've added documentation to the functions in this script via a roxygen2 skeleton, however when I build the documentation the .Rd files for these functions are not built. I presume this is because roxygen2 is only looking in /R. Is there a way to tell roxygen2 to look in /data-raw as well? Or have I misunderstood something along the way?
Update: following #phil's suggestion
#phil - thanks - I tried this for one of the functions (load_data_files) in the load-raw-data.R script (see below for the documentation added to R/data.R), but on rebuilding the package I get an error: 'load_data_files' is not an exported object from 'namespace:clahrcnwlhf'. I have included the #export tag in the documentation in R/data.R. Any thoughts on how I might resolve this?
`# This script loads the individual component files of the raw dataset
# and stitches them together, saving the result as an .RData file
#' load_data_files
#'
#' load_data_files loads in a set of Excel files as dataframes
#'
#' #param fl list of paths of the files to be loaded
#'
#' #return A list of dataframes, one for each of the file paths in fl.
#' #export
"load_data_files"`

Pre- or post-process roxygen snippets

Is there some mechanism by which I can transform the comments that roxygen sees, preferably before it does the roxygen->rd conversion?
For example, suppose I have:
#' My function. Does stuff with numbers.
#'
#' This takes an input `x` and does something with it.
#' #param x a number.
myFunction <- function (x) {
}
Now, suppose I want to do some conversion of the comment before roxygen parses it, for example replacing all instances of things in backticks with \code{}. Ie:
preprocess <- function (txt) {
gsub('`([^ ]+)`', '\\\\code{\\1}', txt)
}
# cat(preprocess('Takes an input `x` and does something with it'.))
# Takes an input \code{x} and does something with it.
Can I feed preprocess into roxygen somehow so that it will run it on the doclets before (or after would work in this case) roxygen does its document generation?
I don't want to do a permanent find-replace in my .r files. As you might guess from my example, I'm aiming towards some rudimentary markdown support in my roxygen comments, and hence wish to keep my .r files as-is to preserve readability (and insert the \code{..} stuff programmatically).
Should I just write my own version of roxygenise that runs preprocess on all detected roxygen-style comments in my files, saves them temporarily somewhere, and then runs the actual roxygenise on those?
Revisiting this a couple of years later, it looks like Roxygen has a function register.preref.parsers that one can use to inject their own parsers into roxygen.
One such use of this is the promising maxygen package (markdown + roxygen = maxygen), which a very neat implementation of markdown processing of roxygen comments (albeit only to the CommonMark spec), and you can see how it is used in that package's macument function. I eagerly await "pandoc + roxygen = pandoxygen"... :)

Making knitr run a r script: do I use read_chunk or source?

I am running R version 2.15.3 with RStudio version 0.97.312. I have one script that reads my data from various sources and creates several data.tables. I then have another r script which uses the data.tables created in the first script. I wanted to turn the second script into a R markdown script so that the results of analysis can be outputted as a report.
I do not know the purpose of read_chunk, as opposed to source. My read_chunk is not working, but source is working. With either instance I do not get to see the objects in my workspace panel of RStudio.
Please explain the difference between read_chunk and source? Why would I use one or the other? Why will my .Rmd script not work
Here is ridiculously simplified sample
It does not work. I get the following message
Error: object 'z' not found
Two simple files...
test of source to rmd.R
x <- 1:10
y <- 3:4
z <- x*y
testing source.Rmd
Can I run another script from Rmd
========================================================
Testing if I can run "test of source to rmd.R"
```{r first part}
require(knitr)
read_chunk("test of source to rmd.R")
a <- z-1000
a
```
The above worked only if I replaced "read_chunk" with "source". I
can use the vectors outside of the code chunk as in inline usage.
So here I will tell you that the first number is `r a[1]`. The most
interesting thing is that I cannot see the variables in RStudio
workspace but it must be there somewhere.
read_chunk() only reads the source code (for future references); it does not evaluate code like source(). The purpose of read_chunk() was explained in this page as well as the manual.
There isn't an option to run a chunk interactively from within knitr AFAIK. However, this can be done easily enough with something like:
#' Run a previously loaded chunk interactively
#'
#' Takes labeled code loaded with load_chunk and runs it in the /global/ envir (unless otherwise specified)
#'
#' #param chunkName The name of the chunk as a character string
#' #param envir The environment in which the chunk is to be evaluated
run_chunk <- function(chunkName,envir=.GlobalEnv) {
chunkName <- unlist(lapply(as.list(substitute(.(chunkName)))[-1], as.character))
eval(parse(text=knitr:::knit_code$get(chunkName)),envir=envir)
}
NULL
In case it helps anyone else, I've found using read_chunk() to read a script without evaluating can be useful in two ways. First, you might have a script with many chunks and want control over which ones run where (e.g., a plot or a table in a specific place). I use source when I want to run everything in a script (for example, at the start of a document to load a standard set of packages or custom functions). I've started using read_chunk early in the document to load scripts and then selectively run the chunks I want where I need them.
Second, if you are working with an R script directly or interactively, you might want a long preamble of code that loads packages, data, etc. Such a preamble, however, could be unnecessary and slow if, for example, prior code chunks in the main document already loaded data.

Is it possible to use R package data in testthat tests or run_examples()?

I'm working on developing an R package, using devtools, testthat, and roxygen2. I have a couple of data sets in the data folder (foo.txt and bar.csv).
My file structure looks like this:
/ mypackage
/ data
* foo.txt, bar.csv
/ inst
/ tests
* run-all.R, test_1.R
/ man
/ R
I'm pretty sure 'foo' and 'bar' are documented correctly:
#' Foo data
#'
#' Sample foo data
#'
#' #name foo
#' #docType data
NULL
#' Bar data
#'
#' Sample bar data
#'
#' #name bar
#' #docType data
NULL
I would like to use the data in 'foo' and 'bar' in my documentation examples and unit tests.
For example, I would like to use these data sets in my testthat tests by calling:
data(foo)
data(bar)
expect_that(foo$col[1], equals(bar$col[1]))
And, I would like the examples in the documentation to look like this:
#' #examples
#' data(foo)
#' functionThatUsesFoo(foo)
If I try to call data(foo) while developing the package, I get the error "data set 'foo' not found". However, if I build the package, install it, and load it - then I can make the tests and examples work.
My current work-arounds are to not run the example:
#' #examples
#' \dontrun{data(foo)}
#' \dontrun{functionThatUsesFoo(foo)}
And in the tests, pre-load the data using a path specific to my local computer:
foo <- read.delim(pathToFoo, sep="\t", fill = TRUE, comment.char="#")
bar <- read.delim(pathToBar, sep=";", fill = TRUE, comment.char="#"
expect_that(foo$col[1], equals(bar$col[1]))
This does not seem ideal - especially since I'm collaborating with others - requiring all the collaborators to have the same full paths to 'foo' and 'bar'. Plus, the examples in the documentation look like they can't be run, even though once the package is installed, they can.
Any suggestions? Thanks much.
Importing non-RData files within examples/tests
I found a solution to this problem by peering at the JSONIO package, which obviously needed to provide some examples of reading files other than those of the .RData variety.
I got this to work in function-level examples, and satisfy both R CMD check mypackage as well as testthat::test_package().
(1) Re-organize your package structure so that example data directory is within inst. At some point R CMD check mypackage told me to move non-RData data files to inst/extdata, so in this new structure, that is also renamed.
/ mypackage
/ inst
/ tests
* run-all.R, test_1.R
/ extdata
* foo.txt, bar.csv
/ man
/ R
/ tests
* run-testthat-mypackage.R
(2) (Optional) Add a top-level tests directory so that your new testthat tests are now also run during R CMD check mypackage.
The run-testthat-mypackage.R script should have at minimum the following two lines:
library("testthat")
test_package("mypackage")
Note that this is the part that allows testthat to be called during R CMD check mypackage, and not necessary otherwise. You should add testthat as a "Suggests:" dependency in your DESCRIPTION file as well.
(3) Finally, the secret-sauce for specifying your within-package path:
barfile <- system.file("extdata", "bar.csv", package="mypackage")
bar <- read.csv(barfile)
# remainder of example/test code here...
If you look at the output of the system.file() command, it is returning the full system path to your package within the R framework. On Mac OS X this looks something like:
"/Library/Frameworks/R.framework/Versions/2.15/Resources/library/mypackage/extdata/bar.csv"
The reason this seems okay to me is that you don't hard code any path features other than those within your package, so this approach should be robust relative to other R installations on other systems.
data() approach
As for the data() semantics, as far as I can tell this is specific to R binary (.RData) files in the top-level data directory. So you can circumvent my example above by pre-importing the data files and saving them with the save() command into your data-directory. However, this assumes you only need to show an example in which the data is already loaded into R, as opposed to also reproducibly demonstrating the upstream process of importing the files.
Per #hadley's comment, the .RData conversion will work well.
As for the broader question of team collaboration with different environments across team members, a common pattern is to agree on a single environment variable, e.g., FOO_PROJECT_ROOT, that everyone on the team will set up appropriately in their environment. From that point on you can use relative paths, including across projects.
An R-specific approach would be to agree on some data/functions that every team member will set up in their .Rprofile files. That's, for example, how devtools finds packages in non-standard locations.
Last but not least, though it is not optimal, you can actually put developer-specific code in your repository. If #hadley does it, it's not such a bad thing. See, for example, how he activates certain behaviors in testthat in his own environment.

Print the sourced R file to an appendix using Sweave

I keep R and Rnw files separate, then load the R data/plots with load("file.R") in the first Sweave chunk. Is there a way that I can print the sourced R file to an appendix without executing all of the code? (i.e., the code is slow enough that I don't want to source() it in an echo=TRUE chunk).
Thanks!
Update -- actually, I don't think my source() idea works.
How about using a Latex package?
Add into your header
\usepackage{fancyvrb}
Then
\VerbatimInput{yourRfile.R}
You can use highlight package to output nicely formatted, colorful code:
highlight("myRfile.R", renderer = renderer_latex(document = F))
But don't forget to put in your latex doc the lengthy preamble which you get with document=T.
You can experiment with code directly:
highlight(output="test.tex",
parser.output = parser(text = deparse(lm)),
renderer = renderer_latex(document = T))
And get
I usually solve this by:
\begin{appendix}
\section{Appendix A}
\subsection{R session information}
<<SessionInforamtaion,echo=F,eval=T,results=tex>>=
toLatex(sessionInfo())
#
\subsection{The simulation's source code}
<<SourceCode,echo=F,eval=T>>=
Stangle(file.path("Projectpath","RnwFile.Rnw"))
SourceCode <- readLines(file.path("Projectpath","Codefile.R"))
writeLines(SourceCode)
#
\end{appendix}
Using this you have to think of a maximum numbers of characters per line.
Separating R and Rnw files sort of defeats the purpose of literate programming. My own approach is to include the code chunks at the appropriate place in the text. If my audience isn't interested in the code, then I might mark it as
<<foo, echo=FALSE>>=
x <- 1:10
#
I might assemble the code in an appendix as
<<appendix-foo, eval=FALSE>>=
<<foo>>
#
which I admit is a bit of a kludge and error prone (forgotten chunks). One quickly wants to bundle the document with supporting material (data sets, useful helper functions, non-R scripts) into an R package, and these are not difficult to create. Building the package automatically creates the pdf and Stangle'd R file, which is exactly what you want. Package building can be a slow process, but installing the package does not require that the vignettes be rebuilt and so is fast and convenient for whomever you're giving the package to.
For twiddling with formatting / text, I use a global option \SweaveOpts{eval=FALSE}.

Resources