Best practices for developing a suite of dependent R packages

I am starting to work on a family of R packages, all of which share substantial common code that is housed in its own package; let's call it myPackageUtilities. So I have several packages
myPackage1, myPackage2, etc...
All of these packages depend on every method in myPackageUtilities. For a real-world example, please see statnet on CRAN. The idea is that a future developer might create myPackageN and, instead of having to re-write/duplicate all of the supporting code, can simply use myPackageUtilities to get started.
There are complications:
1) Some of the code in myPackageUtilities is intended for end users, and the rest is for internal development purposes. The end-user code needs to be properly documented using roxygen2. This code includes both S3 classes and generics, as well as various helper functions for the user.
2) The dependent packages (myPackage1, myPackage2, etc.) will likely extend S3 generics defined in myPackageUtilities.
My question is: what is the best way to assemble all of this? Here are two natural (but non-exhaustive) options, sketched as DESCRIPTION snippets below:
Include myPackageUtilities under Imports: for all the dependent packages, and force users to separately load myPackageUtilities,
Include myPackageUtilities under Depends: for all the dependent packages, and be very selective about what is exported from myPackageUtilities so as to avoid cluttering the search path. All of the internal (non-exported) code will have to be accessed via ::: in myPackage1, etc.
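To make the two options concrete, here is a minimal sketch of what myPackage1's DESCRIPTION might contain in each case (the version constraint is a placeholder, not something from the question):

Option 1 (Imports):

Package: myPackage1
Imports:
    myPackageUtilities (>= 0.1.0)

Option 2 (Depends):

Package: myPackage1
Depends:
    myPackageUtilities (>= 0.1.0)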
I originally asked a similar question over here, but quickly discovered the situation gets complicated quickly. For example:
If we use Imports: instead of Depends:, then any generics defined in myPackageUtilities aren't found by myPackage1, etc.
This makes using the generic templates provided by myPackageUtilities difficult/impossible, and almost defeats the purpose of this entire setup.
If I define an S3 generic in myPackageUtilities and document it there, how can I have roxygen2 reference those docs in myPackage1?
Perhaps I am deeply misunderstanding how namespaces work, in which case this would be a great place to clear up my misunderstanding!

Welcome to the rabbit hole.
You may be pleasantly surprised to learn that you can import a function from myPackageUtilities into myPackage1 and then export it from myPackage1 to make it accessible from the global environment.
So, when you say that you have a function in myPackageUtilities that should be accessible by the end user when myPackage1 is loaded, this is what I would include in my documentation for fn_name in myPackage1:
#' @importFrom myPackageUtilities fn_name
#' @export fn_name
(See https://github.com/hadley/dplyr/blob/master/R/utils.r for an example)
That still leaves the question of how to link to the original documentation, and I'm afraid I don't have a good answer for that. My current practice is, essentially, to copy the parameter documentation from the original source and then, in my @details section, write "please see the documentation for \code{\link[myPackageUtilities]{fn_name}}".
In the end, I still think your best bet is to export everything from myPackageUtilities that will ever get used outside of myPackageUtilities and do a combination import-export in each package where you want a function from myPackageUtilities to be accessible from the global environment.
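For the S3 generics mentioned in the question, a minimal sketch of this import-then-export pattern might look like the following (summarize_fit and myPackage1_fit are hypothetical names, not anything from statnet).

In myPackageUtilities/R/generics.R:

#' Summarize a fitted object
#' @param x a fitted object
#' @param ... passed to methods
#' @export
summarize_fit <- function(x, ...) UseMethod("summarize_fit")

In myPackage1/R/methods.R:

# import the generic and re-export it so users who only load myPackage1 can call it
#' @importFrom myPackageUtilities summarize_fit
#' @export summarize_fit
NULL

# register a method for myPackage1's own class
#' @export
summarize_fit.myPackage1_fit <- function(x, ...) {
  cat("summary of a myPackage1 fit\n")
  invisible(x)
}

With the generic imported, roxygen2 recognizes summarize_fit.myPackage1_fit as an S3 method and writes the appropriate S3method() directive into NAMESPACE.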

Related

[R][package] Load all functions in a package's NAMESPACE

Tl;dr
I have a complete NAMESPACE file for the package I am building. I want to execute all the importFrom(x, y) clauses in that file so that the functions I use in the package become available. Is there an automated way to do that?
Full question
I'm currently working on building a package that in turn depends on a bunch of other packages. Think: both Hmisc and dplyr.
The other contributors to the package don't really like using devtools::build() every 5 minutes when they debug, which I can understand. They want to be able to toggle a dev_mode boolean which, when set to TRUE, loads all dependencies and sources all scripts in the R/ and data-raw/ folders, so that all functions and data objects are loaded in memory. I don't necessarily think that's a perfectly clean solution, but they're technically my clients and it's a big help to them, so please don't offer a frame challenge unless it's the most user-friendly solution imaginable.
The problem with this solution is that when I load two libraries whose namespaces clash, functions in the package that would otherwise work perfectly start throwing errors. I thus need to improve my import system.
Thanks to thorough documentation (and the help of devtools::check()), my NAMESPACE is complete. I guess I could split it into pieces and eval() some well-chosen parts of it, but it feels like I'm probably not the first person to encounter this difficulty. Is there an automated way to parse NAMESPACE and import the proper functions from the proper packages?
Thanks!
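For what it's worth, here is a rough sketch of the kind of automation being asked about, assuming the NAMESPACE contains only plain importFrom(pkg, fun, ...) directives (no S4 imports, no import() of whole packages); the function name is a placeholder:

load_namespace_imports <- function(namespace_path = "NAMESPACE") {
  decls <- grep("^importFrom\\(", readLines(namespace_path), value = TRUE)
  for (decl in decls) {
    # turn importFrom(pkg, fun1, fun2) into c("pkg", "fun1", "fun2")
    inner <- sub("^importFrom\\(", "", sub("\\)\\s*$", "", decl))
    parts <- trimws(gsub("[\"']", "", strsplit(inner, ",")[[1]]))
    pkg <- parts[1]
    for (fun in parts[-1]) {
      # copy each imported function into the global environment for dev mode
      assign(fun, getExportedValue(pkg, fun), envir = globalenv())
    }
  }
  invisible(decls)
}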

Include library calls in functions?

Is it good practice to include every library I need to execute a function within that function?
For example, my file global.r contains several functions I need for a Shiny app. Currently I have all the needed packages at the top of the file, so when I'm switching projects or copying these functions I have to remember to load the packages/include them in the new code. If the library calls lived inside each function instead, every function would carry its required packages with it. Of course I would have to check every function with a fresh R session, but I think this could help in the long run.
When I try to load a package twice, R doesn't load it again; it just checks that it's already attached. My main question is whether restructuring things this way would slow my functions down.
I've only seen this practice (library calls inside functions) once, so I'm not sure.
As one of the commenters suggests, you should avoid loading packages within a function, since:
The function now has a global side effect; as a general rule, this is something to avoid.
There is a very small performance hit.
The first point is the big one. As with most optimisation, only worry about the second point if it's an issue.
Now that we've established the principle, what are the possible solutions?
In small projects, I have a file called packages.R that includes all the library calls I need. This is sourced at the top of my analysis script. BTW, all my functions are in a file called func.R. This workflow was stolen/adapted from a previous SO question.
If you're only importing a single function, you could use the :: trick, e.g. package::funcA(...) (sketched below). That way you avoid attaching the whole package.
For larger projects, I create an R package that handles all necessary imports. The benefit of creating a package is detailed in this answer on structuring large R projects.
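As a sketch of the :: approach (somePkg and funcA are placeholders), checking for the namespace up front gives a clearer error than letting the :: call fail:

my_wrapper <- function(x) {
  # fail early with an informative message if the dependency is missing
  if (!requireNamespace("somePkg", quietly = TRUE)) {
    stop("Package 'somePkg' is needed for my_wrapper(); please install it.")
  }
  somePkg::funcA(x)
}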

Hiding Undocumented Functions in a Package - Use of .function_name?

I've got some functions I need to make available in a package, and I don't want to export them or write documentation for them. I'd just hide them inside another function, but they need to be available to several functions, so doing that becomes a scoping and maintenance issue. What is the right way to do this? By that I mean: do they need special names, do they go somewhere other than the R subdirectory, can I put them in a single file, etc.? I've checked out the manuals, and what I'm after is like the .internals concept in the core, but I don't see any instructions about how to do this generally. I thought I had seen something about this before but cannot locate it just now. Thanks.
My solution is to remove the unnecessary functions from NAMESPACE and call internal functions via NAME-OF-PACKAGE:::NAME-OF-INTERNAL-FUNCTION. For example, if your package name is RP and the internal function is IFC, the call would be RP:::IFC(). Notice that with :: (two colons) you can only call functions that are exported in NAMESPACE, whereas with ::: (three colons) you can call all functions, internal as well as exported.
After asking on R-help, here is the answer. @Dwin is correct: do not export the internal functions (so fix up your export instructions in NAMESPACE; don't use exportPattern, but rather name the functions explicitly using export). You can call them what you want; there is no special naming convention. You do not have to write Rd files for them if you don't export them.
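To make that concrete, a hand-written NAMESPACE for the hypothetical RP package (publicFun1 and publicFun2 are placeholders) would name the user-facing functions explicitly and simply omit IFC:

export(publicFun1)
export(publicFun2)

IFC() then stays internal, but can still be reached from the console as RP:::IFC() when debugging.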

Is it necessary to export base method extensions in an R package? Documentation implications?

In principle, I could keep these extensions not-exported, and this would also allow me to not-add redundant documentation for these already well-documented methods, while still also passing R CMD check myPackage without any reported WARNINGs.
What are some of the drawbacks, if any? Is this possibly recommended to keep extensions of basic methods compartmentalized within the package that defines them? Alternatively, will this make it more difficult for another package to depend on mine, if certain core method-extensions are not exported?
For example, if I don't document and don't export the following:
setMethod("show", "myPackageSpecialClass", function(object){ show(NA) })
I'm trying to flesh-out some of these finer details of best-practices with namespaces and base method extensions.
If you don't export the methods, then users (either at the command line or trying to use your classes and methods in their own package via imports) won't be able to use them -- your class will be displayed with the show,ANY-method.
You are not documenting the generic show, but rather the method appropriate for your class, show,myPackageSpecialClass-method. If in your NAMESPACE you
import(methods)
exportMethods(show)
(note that there is no way to export just some methods on the generic show) and provide no documentation, R CMD check will complain
* checking for missing documentation entries ... WARNING
Undocumented S4 methods:
generic 'show' and siglist 'myPackageSpecialClass'
All user-level objects in a package (including S4 classes and methods)
should have documentation entries.
See the chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.
Your example (I know it was not meant to be a serious show method :) ) is a good illustration of why methods might be documented: explaining to the user why, every time they try to display the object, they get NA when they were expecting some kind of description of the object.
One approach to documentation is to group methods with the class into a single Rd file, myPackageSpecialClass-class.Rd. This file would contain an alias
\alias{show,myPackageSpecialClass-method}
and a Usage
\S4method{show}{myPackageSpecialClass}(object)
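Put together, a skeleton of myPackageSpecialClass-class.Rd along those lines might look like this (only the method-related sections are shown; the descriptive text is placeholder):

\name{myPackageSpecialClass-class}
\docType{class}
\alias{myPackageSpecialClass-class}
\alias{show,myPackageSpecialClass-method}
\title{Class \code{"myPackageSpecialClass"}}
\usage{
\S4method{show}{myPackageSpecialClass}(object)
}
\arguments{
  \item{object}{An instance of \code{myPackageSpecialClass}.}
}
\description{Describe the class here, including what its show method displays.}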
This works so long as no fancy multiple dispatch is used, i.e., it is clear to which class a method applies. If the user asks for help with ?show, they are always pointed toward the methods package help page. For help on your methods / class, they'd need to ask for that specific type of help. There are several ways of doing this but my favorite is
class ? myPackageSpecialClass
method ? "show,myPackageSpecialClass"
This will not be intuitive to the average user; the (class|method) ? ... formulation is not widely used, and the specification of "generic,signature" requires a lot of understanding about how S4 works, including probably a visit to selectMethod(show, "myPackageSpecialClass") (because the method might be implemented on a class that myPackageSpecialClass inherits from) or showMethods(class="myPackageSpecialClass", where=getNamespace("myPackage")) (because you're wondering what you can do with myPackageSpecialClass).

Where in R do I permanently store my custom functions?

I have several custom functions that I use frequently in R. Rather than source this file (or parts thereof) in each script, is there some way to add this to a base R file such that the functions are always available when I use R?
Yes, create a package. There are numerous tutorials as well as the Writing R Extensions manual that came with your copy of R.
It may seem like too much work at first, but you will probably be glad that you did this in the longer run.
PS And you can then load that package from ~/.Rprofile. For really short code, you can also define it there.
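For example, a few lines like these in ~/.Rprofile would do it (myUtils is a placeholder for your own package name):

if (interactive()) {
  # attach the personal package in interactive sessions only
  suppressMessages(library(myUtils))
}

# really short helpers can simply be defined here; .Rprofile is sourced
# into the workspace at startup
lsn <- function(p) ls(getNamespace(p))  # list a package's internal objects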
A package may be overkill for a few useful functions. I'd argue there's nothing wrong with explicitly source()ing them as you need them; at least it is explicit, so that if you email someone your code, you won't forget to include those other scripts.
Another option is to use the .Rprofile file. You can read about the details in ?Startup. Basically, the idea is that:
...a file called ‘.Rprofile’ is searched for in the current directory or
in the user's home directory (in that order). The user profile file is
sourced into the workspace.
You can read here about how many people use this functionality.
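A sketch of that source()-based setup, assuming the helper scripts live in a folder such as ~/R/functions (the path is a placeholder), whether placed at the top of an analysis script or in .Rprofile:

for (f in list.files("~/R/functions", pattern = "\\.R$", full.names = TRUE)) {
  source(f)  # each file defines one or more helper functions
}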
The accepted answer is best long-term: Make a package.
Luckily, the learning curve for doing this has been dramatically reduced by the devtools package: it automates package creation (a nice assist in getting off on the right foot), encourages good practices (like documenting with roxygen2), and helps with using online version control (Bitbucket, GitHub, or other) and sharing your package with others. It's also very helpful for smoothing your way to CRAN submission.
Good docs at http://adv-r.had.co.nz and http://r-pkgs.had.co.nz .
To create your package, for instance, you can:
install.packages("devtools")
devtools::create("path/to/package/pkgname")
You could also look at the 'mvbutils' package: it lets you set up a hierarchical set of "tasks" (folders with workspace ".RData" files in them) such that you can always see what's in the ancestral tasks (i.e. the ancestors are on the search() path). So you can put your custom functions in the "starting task" where you always start R, and then change to whatever project-specific task you require; you avoid cluttered workspaces, but you'll still be able to use (and edit) your custom functions because the starting task is always ancestral. Objects (including functions) get stored in ".RData" files and are thus loaded/saved automatically, but there are separate text-backup facilities for functions.
There are lots of different ways of working in R, and no "one-size-fits-all" best solution. It's also not easy to find an overview! Speaking just for myself:
I'm not a fan of having to 'source' everything in every time; for one thing, it simply doesn't work with big data sets and/or results of model runs.
I think packages are hard to create and maintain; there is a really significant overhead. After the first 5 packages you write, it does get a bit easier provided you do it on at least a weekly basis so you don't forget how, but really...
In fact, 'mvbutils' also has a bunch of tools for facilitating the creation and (especially) maintenance of packages, designed to interface smoothly with the task-hierarchy system. I use & edit my own packages all the time (including editing mvbutils itself); but if it wasn't for the tools in 'mvbutils', I'd be grinding my teeth in frustration most days of the week.