Does roxygen2 work for R scripts in data-raw?

I am using RStudio to create a package for a piece of data analysis I'm doing. To put my raw data into the package, I'm using devtools::use_data_raw() as per this article.
I have a script load-raw-data.R that loads the raw data and assembles it into a dataframe, then calls devtools::use_data() on this dataframe to add it to the package. load-raw-data.R is in /data-raw not /R, as per the article. I've added documentation to the functions in this script via a roxygen2 skeleton, however when I build the documentation the .Rd files for these functions are not built. I presume this is because roxygen2 is only looking in /R. Is there a way to tell roxygen2 to look in /data-raw as well? Or have I misunderstood something along the way?
Update: following @phil's suggestion
@phil - thanks - I tried this for one of the functions (load_data_files) in the load-raw-data.R script (see below for the documentation added to R/data.R), but on rebuilding the package I get an error: 'load_data_files' is not an exported object from 'namespace:clahrcnwlhf'. I have included the @export tag in the documentation in R/data.R. Any thoughts on how I might resolve this?
# This script loads the individual component files of the raw dataset
# and stitches them together, saving the result as an .RData file

#' load_data_files
#'
#' load_data_files loads in a set of Excel files as dataframes
#'
#' @param fl list of paths of the files to be loaded
#'
#' @return A list of dataframes, one for each of the file paths in fl.
#' @export
"load_data_files"

Related

Using an R Markdown Document as a source for functions

I'm looking into R Markdown for documenting functions I regularly use. I will put them into an R Markdown file to document them, so that I can read my thinking behind each function if I come back to it months later.
My question is: if I start a new R project, is it possible to source the R Markdown file and use the library of functions I have created, calling them just as if I were sourcing a regular R file? I don't really wish to maintain two sets of function files.
I appreciate this may be a beginner's question, but any help pointing to tutorials and the like would be greatly appreciated.
Thanks
As was mentioned in the comments, you should probably create a package for this purpose. But if you insist on putting function definitions in scripts and document them using RMarkdown files, using read_chunk() from the knitr package might be the way to go.
Note that this approach differs slightly from what you requested. You wanted to have the function definition in the markdown file together with the documentation. And then you wanted to somehow source that file into your R script in order to use the function. I did not find a way to do this (even though it might be possible).
The alternative that I propose puts the function definition in its own R script, say fun.R. The Rmarkdown file then reads the function definition from fun.R and adds documentation. If you want to use the function in some other script, you can simply source fun.R (and not the markdown file). This still means that you have to maintain the code for the function definition only once.
So let me show this with an example. This is fun.R:
## ---- fun
fun <- function(x) x^2
The first line is an identifier that will be used later. The markdown file is as follows:
---
title: "Documentation of fun()"
output: html_document
---
This documents the function `fun()` defined in `fun.R`.
```{r,cache = FALSE}
knitr::read_chunk("fun.R")
```
This is the function definition
```{r fun}
```
This is an example of how to use `fun()`:
```{r use_fun}
fun(3)
```
The first chunk reads in fun.R using knitr::read_chunk(). Later on, you can define an empty chunk that has the identifier used in fun.R as its name; this acts as if the contents of fun.R were written directly in this file. As you can see, you can also use fun() in later chunks, and the rendered HTML file shows both the definition and the example output inline.
In a script where you want to use fun() you simply add source("fun.R") to source the function definition.
You could also have several functions in a single R file and still document them separately. Simply put an identifier starting with ## ---- before each function definition and then create empty chunks referring to each of the identifiers, as in the sketch below.
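For instance, a single file funs.R (the file and function names are illustrative) could hold two separately documented functions:

## ---- fun
fun <- function(x) x^2

## ---- gun
gun <- function(x) x + 1

After knitr::read_chunk("funs.R"), an empty chunk named fun and another named gun would each pull in just the corresponding definition.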
This is admittedly somewhat more complicated than what you asked for, because it involves two files instead of just one. But at least there is no redundancy.
Update: the klmr/modules package used below has been superseded by the box package by the same author, which is on CRAN. After the cat command below, run these lines to display the roxygen2 help for add2:
box::use(./test)
box::help(test$add2)
Perhaps this is close enough: you can use the GitHub klmr/modules package (not the CRAN modules package) to combine roxygen2 documentation and code in a single file without creating a package. For example, after installing the modules package, copy this to the clipboard and then paste it into the R console to create a single-file module with embedded documentation. The subsequent code then imports it, runs a function from it and invokes help. See the documentation of the modules package for more info.
Note that this has the following advantages: (1) everything is in a single file, (2) if you later decide to move to using packages you can use the very same file in your package with roxygen2 -- no need to revise anything, (3) any learning of roxygen2 applies to packages too.
# create a file with our documentation and code
Lines <- "
#' Add two numbers.
#'
#' @param x the first number.
#' @param y the second number.
#' @return The sum.
#' @note This is just a simple example.
#'
#' This function is a simple example intended to show how to use the modules
#' package with roxygen2.
add2 <- function(x, y) x + y
"
cat(Lines, file = "test.R")
# now we can import it
# devtools::install_github("klmr/modules")
library(modules)
test <- import("test") # do not include the .R extension
test$add2(1, 2)
## [1] 3
# this will cause help page to appear
?test$add2

Referencing user-created functions in R from separate scripts

I'm trying to re-use some code that I've already written but often need to re-execute for various projects (i.e., I'd like to apply some object-oriented principles to my R code). I know that a framework exists for publishing new packages on CRAN, but the code I have isn't something that would be valuable for other parties.
Essentially I'd like to either create my own local packages and reference them using a require() call or at the very least call functions that I've saved in separate .r files as-needed.
I've searched around online and found several lengthy articles about creating packages and compiling them using Rtools (I'm on a Windows OS), but since I'm not writing C this seems like overkill for my simple purposes. To offer an example of what I'm referring to: I have a script to remove unwanted characters from string data that I constantly need to copy/paste into new scripts; I don't want to do this, and would prefer to just do something like require(myFunction).
Is there a simple way to solve this problem or am I best served by grabbing RTools and compiling my custom functions locally?
Creating an R package is actually super easy. The link from Alex is how I started my first package. Here's a slightly simplified version that I give to my students. (NB: full credit to Hilary Parker, the author of the original blog post.)
First install devtools and roxygen:
install.packages("devtools")
library("devtools")
install.packages("roxygen2")
library("roxygen2")
Make a new directory for your functions:
setwd("/path/to/parentdirectory")
create("mypackage")
Add your functions to a file (or files) named anything.R in the R directory. The file should look like this; you can have one function per file, or multiple:
mymeanfun <- function(x){
  mean(x)
}

myfilterfun <- function(x, y){
  filter(x, y)
}
Now you should document the code. You can document (and import) using roxygen. Make sure you @import functions from any other packages, and @export the functions you want available. roxygen2 and devtools will take care of everything else (NAMESPACE, requires, etc.) until you get more advanced. Everything else is optional:
#' My Mean Function
#'
#' Takes the mean
#' @param x any default data type
#' @export
#' @examples
#' mymeanfun(c(1,2,3))
mymeanfun <- function(x){
  mean(x)
}

#' My Filter Function
#'
#' Identical to dplyr::filter
#' @param x a data.frame
#' @param y the filtering condition
#' @export
#' @importFrom dplyr filter
myfilterfun <- function(x, y){
  filter(x, y)
}
Now run document() from roxygen2 in the directory you created:
setwd("./mypackage")
document()
You are now up and running. I'd recommend putting it on GitHub and installing from there:
install_github("yourgithubname/mypackage")
From then on, you can just call:
library(mypackage)
Every time you need your functions.
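If you'd rather not push to GitHub, devtools can also install straight from the local source directory (the path shown is illustrative):

devtools::install("/path/to/parentdirectory/mypackage")
library(mypackage)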
For more details and better documentation practices, see Hadley's book.

Best way to use support functions in R to stay DRY

While working on my first R package, I noticed that when the package structure gets created, the "man" directory contains a documentation file for each function/method in the code.
In order to stay DRY (don't repeat yourself), I use some functions as "auxiliary" functions in loops or iterations. How can I tell R that I do not want to provide any documentation for them, given that they should not be called directly by the end user?
Use the roxygen2 and devtools packages to document your functions and build your package.
#' Function 1 Title
#'
#' Describe what function 1
#' does in a paragraph. This function
#' will be exported for external use because
#' it includes the @export tag.
#'
#' @param parameter1 describe the first parameter
#' @param parameter2 describe the second parameter
#' @examples
#' function1(letters[1:10], 1:10)
#' @export
function1 <- function(parameter1, parameter2) {
  paste(parameter1, parameter2)
}

#' Function 2 Title
#'
#' Description here. This will not
#' be added to the NAMESPACE.
#'
#' @param parameter1
function2 <- function(parameter1) {
  parameter1
}
Once you have all your documentation, use the tools in the devtools package to build, document, and check your package. It will automatically update the man files and DESCRIPTION, and add / remove functions from the NAMESPACE.
document()
build()
check()
I also recommend using the rbundler package to control how you load packages.
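As a side note: if you want to keep roxygen comments on an internal helper without generating any Rd file for it, roxygen2 also supports the @noRd tag. A minimal sketch (the function name is illustrative):

#' Internal helper used by the exported functions; no man page is generated.
#' @noRd
helper_fun <- function(x) {
  x
}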
If you do not export them via the NAMESPACE you are not expected to provide documentation.
Another (older) way is to simply create one file, say internal.Rd, and define a bunch of \alias{foo}, \alias{bar}, \alias{frob} entries; that way codetools is happy too. A sketch follows below.
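A minimal sketch of such a file (man/mypackage-internal.Rd; the package and object names are illustrative):

\name{mypackage-internal}
\title{Internal mypackage objects}
\alias{foo}
\alias{bar}
\alias{frob}
\description{Internal objects, not meant to be called by the user.}
\keyword{internal}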
Thanks @Joshua-Ulrich and @Dirk-Eddelbuettel.
According to "Writing R Extensions":
The man subdirectory should contain (only) documentation files for the objects in the package in R documentation (Rd) format. The documentation filenames must start with an ASCII (lower or upper case) letter or digit and have the extension .Rd (the default) or .rd. Further, the names must be valid in ‘file://’ URLs, which means they must be entirely ASCII and not contain ‘%’. See Writing R documentation files, for more information. Note that all user-level objects in a package should be documented; if a package pkg contains user-level objects which are for “internal” use only, it should provide a file pkg-internal.Rd which documents all such objects, and clearly states that these are not meant to be called by the user. See e.g. the sources for package grid in the R distribution for an example. Note that packages which use internal objects extensively should not export those objects from their namespace, when they do not need to be documented (see Package namespaces).
By the way, is there any convention for including comments in the code so that the man files pick up the function description, argument descriptions, etc. directly from the code?

Package .Rd files using roxygen2 package

I have a question about creating an .Rd file for my R package using the roxygen2 package.
It is clear to me that, for documenting R functions, I can use C-c C-o in Emacs to generate comments above the function and then fill them out, followed by roxygenize("pkg"). In this way, I have .Rd files for the R functions. However, I am not sure how to get .Rd files for the data examples and the package itself. Currently, I am using prompt("data") to generate data.Rd and promptPackage("pkg") to generate pkg-package.Rd. I have to put these files into the man folder and then edit them separately. How can I document the data and the package in a similar way to documenting R functions using roxygen2?
Thank you very much!
For data, see this previous question on SO which suggests:
#' This is data to be included in my package
#'
#' @name data-name
#' @docType data
#' @author My Name \email{blahblah@@roxygen.org}
#' @references \url{data_blah.com}
#' @keywords data
NULL
I would suspect that you can do the same for pkg-package.Rd (see the sketch below). If it must be in roxygen format, consider
a manual translation or
the Rd2roxygen package.
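A plausible roxygen block for the package-level page, placed in an R file such as R/pkg-package.R (the description text is illustrative):

#' pkg: a one-line description of the package
#'
#' A longer paragraph describing what the package does.
#'
#' @docType package
#' @name pkg-package
NULL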

Is it possible to use R package data in testthat tests or run_examples()?

I'm working on developing an R package, using devtools, testthat, and roxygen2. I have a couple of data sets in the data folder (foo.txt and bar.csv).
My file structure looks like this:
/ mypackage
/ data
* foo.txt, bar.csv
/ inst
/ tests
* run-all.R, test_1.R
/ man
/ R
I'm pretty sure 'foo' and 'bar' are documented correctly:
#' Foo data
#'
#' Sample foo data
#'
#' @name foo
#' @docType data
NULL
#' Bar data
#'
#' Sample bar data
#'
#' @name bar
#' @docType data
NULL
I would like to use the data in 'foo' and 'bar' in my documentation examples and unit tests.
For example, I would like to use these data sets in my testthat tests by calling:
data(foo)
data(bar)
expect_that(foo$col[1], equals(bar$col[1]))
And, I would like the examples in the documentation to look like this:
#' @examples
#' data(foo)
#' functionThatUsesFoo(foo)
If I try to call data(foo) while developing the package, I get the error "data set 'foo' not found". However, if I build the package, install it, and load it - then I can make the tests and examples work.
My current work-arounds are to not run the example:
#' @examples
#' \dontrun{data(foo)}
#' \dontrun{functionThatUsesFoo(foo)}
And in the tests, pre-load the data using a path specific to my local computer:
foo <- read.delim(pathToFoo, sep="\t", fill = TRUE, comment.char="#")
bar <- read.delim(pathToBar, sep=";", fill = TRUE, comment.char="#")
expect_that(foo$col[1], equals(bar$col[1]))
This does not seem ideal - especially since I'm collaborating with others - requiring all the collaborators to have the same full paths to 'foo' and 'bar'. Plus, the examples in the documentation look like they can't be run, even though once the package is installed, they can.
Any suggestions? Thanks much.
Importing non-RData files within examples/tests
I found a solution to this problem by peering at the JSONIO package, which obviously needed to provide some examples of reading files other than those of the .RData variety.
I got this to work in function-level examples, and it satisfies both R CMD check mypackage and testthat::test_package().
(1) Re-organize your package structure so that the example data directory is within inst. At some point R CMD check mypackage told me to move non-RData data files to inst/extdata, so in this new structure that directory is renamed accordingly.
/ mypackage
/ inst
/ tests
* run-all.R, test_1.R
/ extdata
* foo.txt, bar.csv
/ man
/ R
/ tests
* run-testthat-mypackage.R
(2) (Optional) Add a top-level tests directory so that your new testthat tests are now also run during R CMD check mypackage.
The run-testthat-mypackage.R script should have at minimum the following two lines:
library("testthat")
test_package("mypackage")
Note that this is the part that allows testthat to be called during R CMD check mypackage, and not necessary otherwise. You should add testthat as a "Suggests:" dependency in your DESCRIPTION file as well.
(3) Finally, the secret-sauce for specifying your within-package path:
barfile <- system.file("extdata", "bar.csv", package="mypackage")
bar <- read.csv(barfile)
# remainder of example/test code here...
If you look at the output of the system.file() command, it is returning the full system path to your package within the R framework. On Mac OS X this looks something like:
"/Library/Frameworks/R.framework/Versions/2.15/Resources/library/mypackage/extdata/bar.csv"
The reason this seems okay to me is that you don't hard code any path features other than those within your package, so this approach should be robust relative to other R installations on other systems.
data() approach
As for the data() semantics, as far as I can tell this is specific to R binary (.RData) files in the top-level data directory. So you can circumvent my example above by pre-importing the data files and saving them with the save() command into your data-directory. However, this assumes you only need to show an example in which the data is already loaded into R, as opposed to also reproducibly demonstrating the upstream process of importing the files.
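A sketch of that one-off conversion, run from the package root (the file names follow the example above):

foo <- read.delim("inst/extdata/foo.txt", sep = "\t", fill = TRUE, comment.char = "#")
save(foo, file = "data/foo.RData")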
Per @hadley's comment, the .RData conversion will work well.
As for the broader question of team collaboration with different environments across team members, a common pattern is to agree on a single environment variable, e.g., FOO_PROJECT_ROOT, that everyone on the team will set up appropriately in their environment. From that point on you can use relative paths, including across projects.
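In R, that agreement might look like this (FOO_PROJECT_ROOT and the file path are just the illustrative names from above):

root <- Sys.getenv("FOO_PROJECT_ROOT")
foo <- read.delim(file.path(root, "data", "foo.txt"), sep = "\t")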
An R-specific approach would be to agree on some data/functions that every team member will set up in their .Rprofile files. That's, for example, how devtools finds packages in non-standard locations.
Last but not least, though it is not optimal, you can actually put developer-specific code in your repository. If @hadley does it, it's not such a bad thing. See, for example, how he activates certain behaviors in testthat in his own environment.
