How to manage R extension / package documentation with grace (or at least without pain) - r

Now and then I embrace project specific code in R packages. I use the documentation files as suggested by Writing R Extensions to document the application of the code.
So once you set up your project and did all the editing to the .Rd files,
how do you manage a painless and clean versioning without rewriting or intense copy-pasting of all the documentation files in case of code or, even worse, code structure changes?
To be more verbose, my current workflow is that I issue package.skeleton(), do the editing on the .Rd-files followed by R CMD check and R CMD build. When I do changes to my code I need to redo the above maybe appending '.2.0.1' or whatever in order to preserve the precursor version. Before running the R CMD check command I need to repopulate all the .Rd-files with great care in order to get a clean check and succsessful compilation of Tex-files. This is really silly and sometimes a real pain, e.g. if you want to address all the warnings or latex has a bad day.
What tricks do you use? Please share your workflow.

The solution you're looking for is roxygen2.
RStudio provides a handy guide, but briefly you document your function in-line with the function definition:
#' Function doing something
#' Extended description goes here
#' #param x Input foo blah
#' #return A numeric vector length one containing foo
myFunc <- function(x) NULL
If you're using RStudio (and maybe ESS also?) the Build Package command automagically creates the .Rd files for you. If not, you can read the roxygen2 documentation for the commands to generate the docs.

Related

Creating R package using code from script file

I’ve written some R functions and dropped them into a script file using RStudio. These are bits of code that I use over and over, so I’m wondering how I might most easily create an R package out of them (for my own private use).
I’ve read various “how to” guides online but they’re quite complicated. Can anyone suggest an “idiot’s guide” to doing this please?
I've been involved in creating R packages recently, so I can help you with that. Before proceeding to the steps to be followed, there are some pre-requisites, which include:
RStudio
devtools package (for most of the functions involved in creation of a package)
roxygen2 package (for roxygen documentation)
In case you don't have the aforementioned packages, you can install them with these commands respectively:
install.packages("devtools")
install.packages("roxygen2")
Steps:
(1) Import devtools in RStudio by using library(devtools).
(devtools is a core package that makes creating R packages easier with its tools)
(2) Create your package by using:
create_package("~/directory/package_name") for a custom directory.
or
create_package("package_name") if you want your package to be created in current workspace directory.
(3) Soon after you execute this function, it will open a new RStudio session. You will observe that in the old session some lines will be auto-generated which basically tells R to create a new package with required components in the specified directory.
After this, we are done with this old instance of RStudio. We will continue our work on the new RStudio session window.
By far the package creation part is already over (yes, that simple) however, a package isn't directly functionable just by its creation plus the fact that you need to include a function in it requires some additional aspects of a package such as its documentation (where the function's title, parameters, return types, examples etc as mentioned using #param, #return etc - you would be familiar if you see roxygen documentation like in some github repositories) and R CMD checks to get it working.
I'll get to that in the subsequent steps, but just in case you want to verify that your package is created, you can look at:
The top right corner of the new RStudio session, where you can see the package name that you created.
The console, where you will see that R created a new directory/folder in the path that we specified in create_package() function.
The files panel of RStudio session, where you'll notice a bunch of new files and directories within your directory.
(4) As you mentioned in your words, you drop your functions in a script file - hence you will need to create the script first, which can be done using:
use_r("function_name")
A new R script will pop up in your working session, ready to be used.
Now go ahead and write your function(s) in it.
(5) After your done, you need to load the function(s) you have written for your package. This is accomplished by using the devtools::load_all() function.
When you execute load_all() in the console, you'll get to know that the functions have been loaded into your package when you'll see Loading package_name displayed in console.
You can try calling your functions after that in the console to verify that they work as a part of the package.
(6) Now that your function has been written and loaded into your package, it is time to move onto checks. It is a good practice to check the whole package as we make changes to our package. The function devtools::check() offers an easy way to do this.
Try executing check() in the console, it will go through a number of procedures checking your package for warnings/errors and give details for the same as messages on the screen (pertaining to what are the errors/warnings/notes). The R CMD check results at the end will contain the vital logs for you to see what are the errors and warnings you got along with their frequency.
If the functions in your package are written well, (with additional package dependencies taken care of) it will give you two warnings upon execution of check:
The first warning will be regarding the license that your package uses, which is not specified for a new pacakge.
The second should be the one for documentation, warning us that our code is not documented.
To resolve the first issue which is the license, use the use_mit_license("license_holder_name") command (or any other license which suits your package - but then for private use as you mentioned, it doesn't really matter what you specify if only your going to use it or not its to be distributed) with your name as in place of license_holder_name or anything which suits a license name.
This will add the license field in the .DESCRIPTION file (in your files panel) plus create additional files adding the license information.
Also you'll need to edit the .DESCRIPTION file, which have self-explanatory fields to fill-in or edit. Here is an example of how you can have it:
Package: Your_package_name
Title: Give a brief title
Version: 1.0.0.0
Authors#R:
person(given = "Your_first_name",
family = "Your_surname/family_name",
role = c("package_creator", "author"),
email = "youremailaddress#gmail.com",
comment = c(ORCID = "YOUR-ORCID-ID"))
Description: Give a brief description considering your package functionality.
License: will be updated with whatever license you provide, the above step will take care of this line.
Encoding: UTF-8
LazyData: true
To resolve the documentation warning, you'll need to document your function using roxygen documentation. An example:
#' #param a parameter one
#' #param b parameter two
#' #return sum of a and b
#' #export
#'
#' #examples
#' yourfunction(1,2)
yourfunction <- function(a,b)
{
sum <- a+b
return(sum)
}
Follow the roxygen syntax and add attributes as you desire, some may be optional such as #title for specifying title, while others such as #import are required (must) if your importing from other packages other than base R.
After your done documenting your function(s) using the Roxygen skeleton, we can tell our package that we have documented our functions by running devtools::document(). After you execute the document() command, perform check() again to see if you get any warnings. If you don't, then that means you're good to go. (you won't if you follow the steps)
Lastly, you'll need to install the package, for it to be accessible by R. Simply use the install() command (yes the same one you used at the beginning, except you don't need to specify the package here like install("package") since you are currently working in an instance where the package is loaded and is ready to be deployed/installed) and you'll see after a few lines of installation a statement like "Done (package_name)", which indicates the installation of our package is complete.
Now you can try your function by first importing your package using library("package_name") and then calling your desired function from the package. Thats it, congrats you did it!
I've tried to include the procedure in a lucid way (the way I create my R packages), but if you have any doubts feel free to ask.

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependancies your package will need, so that every one knows what to install for your project to work. You can also use packrat if you want to mimic python's virtualenv
R also provide a long format documentation system, which are called vignettes and are similar to a printed notebook : you can display code, text, code results, etc. This is were you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed they are automatically included and available for all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point if your project consists in a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here : a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance) ; b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this :
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.

Finding help files for moduled package functions in R

I am writing an R package which is organized into modules as per Sebastian Warnholz's modules package. Each function is organized into its own R file under, e.g., R/m1/fun.R. Each one of those files begins with roxygen code. The modules are defined in another file, R/modules.R. Here's an idea of how that file is structured:
#' Module 1
#' #name m1
#' #export
m1 <- modules::module({
expose("R/m1/fun.R")
expose("R/m1/foo.R")
})
The package checks and builds cleanly and I can call functions by issuing m1$fun() and m1$foo(). However, calling the help files don't work, no matter what I try (i.e., combinations of ? or help() with the function names, with or without the module prefix m1$.
Actually, I can't even expect the help files to be there, because after running devtools::document(), the roxygen code is not converted into man/*.Rd files. So I guess my problem is having devtools::document() search into the R subfolders. Running devtools::document("R/m1") doesn't do the trick, though.
One thing that works is putting the function scripts in the parent folder, R/, but then they lose the module scope and the help files (but not the functions themselves) can be seen at package level. Moreover, the "usage" section will state "foo(...)" instead of "m1$foo(...)", which sounds inadequate but I am not sure is currently fixable. This is my first time working with modules, so I was wondering if there's a cleaner way of organizing my functions and help files.

how to use utils::globalVariables

Following your recommendations (or trying to do it, at least), I have tried some options, but the problem remains, so there must be something I am missing.
I have included a more complete code
setwd("C:/naapp")
#' #import utils
#' #import devtools
I have tried with and without using suppressForeignCheck
if(getRversion() >= "2.15.1"){
utils::globalVariables(c("eleven"))
utils::suppressForeignCheck(c("eleven"))
}
myFunctionSum <- function(X){print(X+eleven)}
myFunctionMul <- function(X){print(X*eleven)}
myFunction11 <- function(X){
assign("eleven",11,envir=environment(myFunctionMul))
}
maybe I should use a particular environment?
package.skeleton(name = "myPack11", list=ls(),
path = "C:/naapp", force = TRUE,
code_files = character())
I remove the "man" directory from the directory myPack11,
otherwise I would get an error because the help files are empty.
I add the imports utils, and devtools to the descrption
Then I run check
devtools::check("myPack11")
And I still get this note
#checking R code for possible problems ... NOTE
#myFunctionMul: no visible binding for global variable 'eleven'
#myFunctionSum: no visible binding for global variable 'eleven'
#Undefined global functions or variables:eleven
I have tried also to make an enviroment, combining Tomas Kalibera's suggetion and an example I found in the Internet.
myEnvir <- new.env()
myEnvir$eleven <- 11
etc
In this case, I get the same note, but with "myEnvir", instead of "eleven"
First version of the question
I trying to use "globalVariables" from the package utils. I am building an interface in R and I am planning to submit to CRAN. This is my first time, so, sorry if the question is very basic.I have read the help and I have tried to find examples, but I still don't know how to use it.
I have made a little silly example to ilustrate my question, which is:
Where do I have to place this line exactly?:
if(getRversion() >= "2.15.1"){utils::globalVariables("eleven")}
My example has three functions. myFunction11 creates the global variable "eleven" and the other two functions manipulate it. In my real code, I cannot use arguments in the functions that are called by means of a button. Consider that this is just a silly example to learn how to use globalVariables (to avoid binding notes).
myFunction11 <- function(){
assign("eleven",11,envir=environment(myFunctionSum))
}
myFunctionSum <- function(X){
print(X+eleven)
}
myFunctionMul <- function(X){
print(X*eleven)
}
Thank you in advance
I thought that the file globals.R would be automatically generated when using globalsVariables. The problem was that I needed to create the package skeleton, then create the file globals.R, add it to the R directory in the package and check the package.
So, I needed to place this in a different file:
#' #import utils
utils::globalVariables(c("eleven"))
and save it
The documentation clearly says:
## In the same source file (to remind you that you did it) add:
if(getRversion() >= "2.15.1") utils::globalVariables(c(".obj1", "obj2"))
so put it in the same source file as your functions. It can go in any of your R source files, but the comment above recommends you put it close to your code. Looking at a bunch of github packages reveals another common pattern is to have a globals.R function with it in, but this is probably a bad idea. If you later remove the global from your package but neglect to update globals.R you could mask a problem. Putting it right close to the functions that use it will hopefully remind you when you edit those functions.
Make sure you put it outside any function definitions in the file, or it won't get seen.
You cannot modify bindings in a package namespace once the package is loaded (and namespace sealed, and bindings locked). The check tool helps you to spot violations of this restriction, so you find out about the problem when checking the package rather than while running it. globalVariables is just a call to silence check when looking for these violations, which is undesirable in almost all cases. If you really need mutable state in a package, you can create a new environment (using new.env) and bind it to an (unexported) "global" variable in your namespace. This binding will be locked, but this is ok, because in R you can change an environment in place (add/remove elements, effectively modifying the elements).
The best situation is however when you can keep all mutable state in user objects (passed in as arguments into functions, and their modified versions returned as output values of functions).

Building R packages - using environment variables in DESCRIPTION file?

At our site, we have a large amount of custom R code that is used to build a set of packages for internal use and distribution to our R users. We try to maintain the entire library in a versioning scheme so that the version numbers and the date are the same. The problem is that we've gotten to the point where the number of packages is substantial enough that manual modification of the DESCRIPTION file and the package .Rd file is very time consuming, and it would be nice to automate these pieces.
We could write a pre-script that goes through the full set of files and writes the current data and version number. This could be done with out a lot of pain, but it would modify our current build chain and we would have to adapt the various steps.
Is there a way that this can be done without having to do a pre-build file modification step? In other words, can the DESCRIPTION file and the .Rd file contain something akin to an environment variable that will be substituted with the current information when called upon by R CMD build ?
You cannot use environment variables as R, when running R CMD build ... or R CMD INSTALL ..., sees the file as fixed.
But the no problem that cannot be fixed by another layer of indirection saying remains true. Your R source code could simply be files within another layer in which you text substitution according to some pattern. If you like autoconf, you could just have DESCRIPTION.in and have a configure script query the environment variables, or a meta-config file or database, or something else, and have that written out. Similarly you could have a sed or perl or python or R or ... script doing the textual substitution.
I used to let svn fill in the argument to Date: in DESCRIPTION, and also encoded revision numbers in an included header file. It's all scriptable to your heart's content.

Resources