Using source subdirectories within R packages with roxygen2

I would like to use a directory structure within the R folder for the source code of a package. For example, within my R folder I have an algos folder with functions I want to export and document. However, roxygen2 does not seem to traverse subfolders of the R folder by default.
I tried to use the @include tag as follows for a file at `R/algos/algo1.r`:
#' @include algos/algo1.r
but without success. Is there a simple way to use subfolders for the R source code?

Writing R Extensions has this to say (in Section 1.1.5) about subdirectories under the R directory:
The R and man subdirectories may contain OS-specific subdirectories named unix or windows.
The implication is that they can't have any subdirectories other than those two. This is confirmed in an r-devel thread and again later in another r-devel thread.

Another straightforward alternative that I've started to use is to simply define related functions within the same .R file and give each file a name that unifies its functions. In the example above one could have something like algos.R in the /R folder, and within algos.R:
#' roxygen_header1
#' @export
algo1 <- function(...){}

#' roxygen_header2
#' @export
algo2 <- function(...){}
I think this makes navigating /R much more intuitive (at least for the developer, but probably for users too).

Related

Finding help files for moduled package functions in R

I am writing an R package which is organized into modules as per Sebastian Warnholz's modules package. Each function is organized into its own R file under, e.g., R/m1/fun.R. Each one of those files begins with roxygen code. The modules are defined in another file, R/modules.R. Here's an idea of how that file is structured:
#' Module 1
#' @name m1
#' @export
m1 <- modules::module({
  expose("R/m1/fun.R")
  expose("R/m1/foo.R")
})
The package checks and builds cleanly, and I can call functions by issuing m1$fun() and m1$foo(). However, calling the help files doesn't work, no matter what I try (i.e., combinations of ? or help() with the function names, with or without the module prefix m1$).
Actually, I can't even expect the help files to be there, because after running devtools::document(), the roxygen code is not converted into man/*.Rd files. So I guess my problem is getting devtools::document() to search the R subfolders. Running devtools::document("R/m1") doesn't do the trick, though.
One thing that works is putting the function scripts in the parent folder, R/, but then they lose the module scope and the help files (but not the functions themselves) can be seen at package level. Moreover, the "usage" section will state "foo(...)" instead of "m1$foo(...)", which seems inadequate, though I am not sure it is currently fixable. This is my first time working with modules, so I was wondering if there's a cleaner way of organizing my functions and help files.

How to call R script from another R script, both in same package?

I'm building a package that uses two main functions. One of the functions, model.R, requires a special type of simulation, sim.R, and a way to set up the results in a table, table.R.
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, and the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory: R/ has premodel.R, model.R, sim.R, and table.R. I need the model.R script to use functions from both sim.R and table.R, which are located in the same directory of the package (i.e., R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e., folder) named TEST. Within TEST I will create another folder named R, and in that folder I will include all R script(s) containing the different functions in the package.
At a minimum you also need to include a DESCRIPTION and a NAMESPACE file. A man folder (for help files) and a tests folder (for unit tests) are also nice to include.
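For reference, a minimal DESCRIPTION might look like the following sketch (all field values are illustrative):
Package: TEST
Title: What the Package Does
Version: 0.1.0
Description: A minimal example package.
License: GPL-3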
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
As others have pointed out, you don't have to source R files in a package. The package loading mechanism will take care of loading the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions, however. If you have multiple files with R code, situations can arise where the order in which these files are processed matters. Often it doesn't matter, or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly, you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies, as in the sketch below.
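For example, with the file names from the question above, the DESCRIPTION entry could look like this (the order shown is illustrative):
Collate:
    'sim.R'
    'table.R'
    'model.R'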
If all your files are in the R directory, every function will be in memory after you do a package build or devtools::load_all().
You may have issues if you have code in those files that is not inside a function, though.
R loads files in alphabetical order.
Usually, this is not a problem, because functions are evaluated when they are called for execution, not at loading time (i.e., a function can refer to another function not yet defined, even in the same file).
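For instance, this runs fine even though g() is defined after f(), because the lookup only happens when f() is actually called:
f <- function(x) g(x) + 1   # g() does not exist yet at this point
g <- function(x) x * 2
f(3)                        # returns 7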
But if you have code outside a function in model.R, this code will be executed immediately when the file is loaded, and your package build will usually fail with:
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the loose code of model.R in a function so you can call it later, once the package, and any external libraries, have fully loaded.
If this piece of code is there to initialize some value, consider use_data() to have R take care of loading the data into the environment for you.
If this piece of code is just interactive code written to test and develop the package itself, you should consider putting it elsewhere, or wrapping it in a function anyway, as sketched below.
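A minimal sketch of that idea, where run_sim() is a placeholder for one of your own functions:
# Top-level code like this runs at load time and can break the build:
#   fit <- run_sim(100)
# Wrapped in a function, it only runs when you call it explicitly:
init_model <- function(n = 100) {
  run_sim(n)
}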
If you really need that code to be executed at loading time, or really have dependencies to resolve, then you must add a Collate line to the DESCRIPTION file, as already stated by Peter Humburg, to force the order in which R loads the files.
Roxygen2 can help you: put this before your code
#' @include sim.R table.R
then call roxygenize(), and the Collate line will be generated for you in the DESCRIPTION file.
But even doing that, external libraries you depend on may not yet be loaded by the package, leading to failure again at build time.
In conclusion, you'd better not leave code outside of functions in a .R file if it's located inside a package.
Since you're building a package, the reason you're having trouble accessing the other functions in your /R directory is that you first need to run:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.

R package: use a custom file/directories structure in pkg/R and pkg/src folders

I'm writing an R package which is beginning to grow in size, so I would really appreciate being able to use a custom structure in the folders pkg/R/ and (especially) pkg/src/.
For example, let's say I have two families of algorithms of some type A, and some functions of type B, and a main entry point. Ideally R/ or src/ folders would be organized as follow:
typeA/
  algorithms1/
    algo11.ext
    ...
  algorithms2/
    algo21.ext
    ...
typeB/
  function1.ext
  ...
main.ext
with "ext" in {R,cpp,c,f,...}, and potentially two files having the same name.
Is it possible? If yes, how can I do that?
Thanks in advance!
[2012-12-31] EDIT: an idea would be to write a few scripts (maybe inside another R package) to (un)flatten a structured package for testing or distribution. But there is probably a better solution, so I will wait a bit.
As the 'Writing R Extensions' manual indicates here, a Makevars file under pkg/src makes it possible to have nested subfolders for C/C++/Fortran code (see e.g. the RSiena package).
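A minimal Makevars sketch for the layout above, assuming GNU make and C++ sources (the file patterns are illustrative):
# Collect sources from the nested subfolders and build them all
SOURCES = $(wildcard typeA/*/*.cpp typeB/*.cpp) main.cpp
OBJECTS = $(SOURCES:.cpp=.o)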
However, I didn't find anything concerning a custom structure in pkg/R. So I wrote a little package (usable, although it needs improvements) that accomplishes the following tasks:
Load/Unload a package having (potentially) nested folders under pkg/R
Launch R and/or C unit tests on it [basic framework, to be replaced (e.g. RUnit and check)]
Export the package to be CRAN-compatible (flatten R code, generate Makevars file)
I will link it here if it reaches a publishable state. (For the moment I could send it by email).
The official package documentation https://cran.r-project.org/doc/manuals/r-devel/R-exts.html, section 1.1.5 contains this quote:
The R and man subdirectories may contain OS-specific subdirectories named unix or windows.
I tried creating a simple test package with subdirectories in R 3.5.1, and it did not work properly.
Neither devtools::load_all() nor R CMD build successfully exported code from subdirectories of R.

Is it possible to use R package data in testthat tests or run_examples()?

I'm working on developing an R package, using devtools, testthat, and roxygen2. I have a couple of data sets in the data folder (foo.txt and bar.csv).
My file structure looks like this:
/ mypackage
  / data
    * foo.txt, bar.csv
  / inst
    / tests
      * run-all.R, test_1.R
  / man
  / R
I'm pretty sure 'foo' and 'bar' are documented correctly:
#' Foo data
#'
#' Sample foo data
#'
#' @name foo
#' @docType data
NULL

#' Bar data
#'
#' Sample bar data
#'
#' @name bar
#' @docType data
NULL
I would like to use the data in 'foo' and 'bar' in my documentation examples and unit tests.
For example, I would like to use these data sets in my testthat tests by calling:
data(foo)
data(bar)
expect_that(foo$col[1], equals(bar$col[1]))
And, I would like the examples in the documentation to look like this:
#' @examples
#' data(foo)
#' functionThatUsesFoo(foo)
If I try to call data(foo) while developing the package, I get the error "data set 'foo' not found". However, if I build the package, install it, and load it, then I can make the tests and examples work.
My current work-arounds are to not run the example:
#' @examples
#' \dontrun{data(foo)}
#' \dontrun{functionThatUsesFoo(foo)}
And in the tests, pre-load the data using a path specific to my local computer:
foo <- read.delim(pathToFoo, sep="\t", fill = TRUE, comment.char="#")
bar <- read.delim(pathToBar, sep=";", fill = TRUE, comment.char="#")
expect_that(foo$col[1], equals(bar$col[1]))
This does not seem ideal, especially since I'm collaborating with others: it requires all collaborators to have the same full paths to 'foo' and 'bar'. Plus, the examples in the documentation look like they can't be run, even though they can once the package is installed.
Any suggestions? Thanks much.
Importing non-RData files within examples/tests
I found a solution to this problem by peering at the JSONIO package, which obviously needed to provide some examples of reading files other than those of the .RData variety.
I got this to work in function-level examples, and it satisfies both R CMD check mypackage and testthat::test_package().
(1) Re-organize your package structure so that example data directory is within inst. At some point R CMD check mypackage told me to move non-RData data files to inst/extdata, so in this new structure, that is also renamed.
/ mypackage
  / inst
    / tests
      * run-all.R, test_1.R
    / extdata
      * foo.txt, bar.csv
  / man
  / R
  / tests
    * run-testthat-mypackage.R
(2) (Optional) Add a top-level tests directory so that your new testthat tests are now also run during R CMD check mypackage.
The run-testthat-mypackage.R script should have at minimum the following two lines:
library("testthat")
test_package("mypackage")
Note that this is the part that allows testthat to be called during R CMD check mypackage; it is not necessary otherwise. You should add testthat as a "Suggests:" dependency in your DESCRIPTION file as well.
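That is, in DESCRIPTION:
Suggests:
    testthat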
(3) Finally, the secret-sauce for specifying your within-package path:
barfile <- system.file("extdata", "bar.csv", package="mypackage")
bar <- read.csv(barfile)
# remainder of example/test code here...
If you look at the output of the system.file() command, it is returning the full system path to your package within the R framework. On Mac OS X this looks something like:
"/Library/Frameworks/R.framework/Versions/2.15/Resources/library/mypackage/extdata/bar.csv"
The reason this seems okay to me is that you don't hard-code any path features other than those within your package, so this approach should be robust across R installations on other systems.
data() approach
As for the data() semantics, as far as I can tell this is specific to R binary (.RData) files in the top-level data directory. So you can circumvent my example above by pre-importing the data files and saving them with the save() command into your data directory. However, this assumes you only need to show an example in which the data is already loaded into R, as opposed to also reproducibly demonstrating the upstream process of importing the files.
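A sketch of that one-time conversion, reusing the import calls from the question (paths are illustrative):
# Import the raw files once, then store them as .RData under data/
# so that data(foo) and data(bar) work after installation
foo <- read.delim("foo.txt", sep="\t", fill = TRUE, comment.char="#")
bar <- read.delim("bar.csv", sep=";", fill = TRUE, comment.char="#")
save(foo, file = "data/foo.RData")
save(bar, file = "data/bar.RData")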
Per @hadley's comment, the .RData conversion will work well.
As for the broader question of team collaboration with different environments across team members, a common pattern is to agree on a single environment variable, e.g., FOO_PROJECT_ROOT, that everyone on the team will set up appropriately in their environment. From that point on you can use relative paths, including across projects.
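In R, that might look like this (the variable and file names follow the example above):
# Each collaborator sets FOO_PROJECT_ROOT appropriately in their environment
root <- Sys.getenv("FOO_PROJECT_ROOT")
foo <- read.delim(file.path(root, "data", "foo.txt"), sep = "\t")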
An R-specific approach would be to agree on some data/functions that every team member will set up in their .Rprofile files. That's, for example, how devtools finds packages in non-standard locations.
Last but not least, though it is not optimal, you can actually put developer-specific code in your repository. If @hadley does it, it's not such a bad thing. See, for example, how he activates certain behaviors in testthat in his own environment.

Including Script Files in an R Extension Package

I'm creating an R package and I need it to include a couple of non-R script files which get called by one of my functions. I need these script files to be distributed with the package, naturally. So that leaves me with two questions:
a) In which directory of the package tree should I place these files, and is that location mandatory or just convention?
b) Do I need to change any other settings or configurations, or will the files just get copied to the directory from (a) so that I can figure out the path using system.file()?
I've tried to find the answer in the Writing R Extensions document, but it didn't jump out at me. And, of course, I didn't read the whole thing. Am I being too honest here?
I think you want either exec/ at the top level (even though that is labeled 'still experimental') or a subdirectory of inst/, as everything in inst/ gets copied verbatim into the package.
A quick example from the packages I have expanded in source is gdata, which has inst/perl, inst/xls, and inst/bin. You could then call these from R itself by computing the path within the installed package using system.file().
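For instance, a script shipped under inst/perl could be located like this after installation (the script and input file names are illustrative):
perl_script <- system.file("perl", "xls2csv.pl", package = "gdata")
system(paste("perl", perl_script, "input.xls"))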
