[R][package] Load all functions in a package's NAMESPACE

TL;DR
I have a complete NAMESPACE file for the package I am building. I want to execute all the importFrom(x, y) clauses in that file so that every function the package relies on is available in memory. Is there an automated way to do that?
Full question
I'm currently working on building a package that in turn depends on a bunch of other packages. Think: both Hmisc and dplyr.
The other contributors to the package don't really like running devtools::build() every 5 minutes when they debug, which I can understand. They want to be able to toggle a dev_mode boolean which, when set to TRUE, loads all dependencies and sources all scripts in the R/ and data-raw/ folders, so that all functions and data objects are loaded in memory. I don't necessarily think that's a perfectly clean solution, but they're technically my clients and it's a big help to them, so please don't offer a frame challenge unless it's the most user-friendly solution imaginable.
The problem with this solution is that when I load two packages whose namespaces clash, functions that would otherwise work perfectly start throwing errors. I thus need to improve my import system.
Thanks to thorough documentation (and the help of devtools::check()), my NAMESPACE is complete. I guess I could split it into pieces and eval() some well-chosen parts of it, but it feels like I'm probably not the first person to encounter this difficulty. Is there an automated way to parse NAMESPACE and import the proper functions from the proper packages?
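For illustration, the kind of automated import I have in mind would look roughly like the sketch below. This is a minimal, hedged sketch only: it assumes the NAMESPACE contains nothing but importFrom() and import() directives, and the function name and target environment are placeholders of my own.

# Sketch: evaluate a source package's NAMESPACE import directives by copying
# each imported function into a chosen environment (here the global one).
load_namespace_imports <- function(path = "NAMESPACE", envir = globalenv()) {
  directives <- parse(path)   # NAMESPACE directives parse as ordinary R calls
  for (d in directives) {
    keyword <- as.character(d[[1]])
    if (keyword == "importFrom") {
      pkg  <- as.character(d[[2]])
      funs <- vapply(as.list(d)[-(1:2)], as.character, character(1))
      for (f in funs) {
        # fetch the exported object and make it visible under its own name
        assign(f, getExportedValue(pkg, f), envir = envir)
      }
    } else if (keyword == "import") {
      # whole-package import: attach the package instead
      library(as.character(d[[2]]), character.only = TRUE)
    }
    # export(), S3method(), etc. are irrelevant here and simply skipped
  }
}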
Thanks!

Related

Include library calls in functions?

Is it good practice to include every library I need to execute a function within that function?
For example, my file global.r contains several functions I need for a Shiny app. Currently I load all the needed packages at the top of that file. When I switch projects or copy these functions, I have to remember to load the packages or add them to the new code; if the library calls were inside each function instead, every function would carry its own dependencies. Of course I would have to check all the functions in a fresh R session, but I think this could help in the long run.
When I load a package twice, R doesn't load it again; it just checks that it's already loaded. My main question is whether restructuring things this way would slow my functions down.
I have only seen that practice (library calls inside functions) once, so I'm not sure.
As one of the commenters suggests, you should avoid loading packages within a function, since:
The function now has a global effect - as a general rule, this is something to avoid.
There is a very small performance hit.
The first point is the big one. As with most optimisation, only worry about the second point if it's an issue.
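To make the first point concrete, here is a small illustration; the function names are made up and dplyr is just an example package.

# Calling library() inside a function changes the search path for the whole
# session -- a global side effect that outlives the function call.
filter_positive_attach <- function(df) {
  library(dplyr)               # attaches dplyr globally, not just for this call
  filter(df, value > 0)
}

# Using :: keeps things local: the namespace is loaded, but nothing is
# attached to the caller's search path.
filter_positive_scoped <- function(df) {
  dplyr::filter(df, value > 0)
}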
Now that we've established the principle, what are the possible solutions?
In small projects, I have a file called packages.R that includes all the library() calls I need. This is sourced at the top of my analysis script. BTW, all my functions are in a file called func.R. This workflow was stolen/adapted from a previous SO question.
If you're only importing a single function, you could use the :: trick, e.g. package::funcA(...). That way you avoid attaching the whole package to the search path.
For larger projects, I create an R package that handles all necessary imports. The benefit of creating a package is detailed in this answer on structuring large R projects.
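Going back to the first option, the small-project layout looks roughly like this; the file names follow the description above and the packages are just examples.

## packages.R -- every library() call in one place
library(dplyr)
library(Hmisc)

## analysis.R -- sourced at the top of the analysis
source("packages.R")
source("func.R")      # all project functions live in this file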

Best practices for developing a suite of dependent R packages

I am starting to work on a family of R packages, all of which share substantial common code that is housed in its own package; let's call it myPackageUtilities. So I have several packages
myPackage1, myPackage2, etc...
All of these packages depend on every method in myPackageUtilities. For a real-world example, please see statnet on CRAN. The idea is that a future developer might create myPackageN, and instead of having to re-write/duplicate all of the supporting code, this future developer can simply use myPackageUtilities to get started.
There are complications:
1) Some of the code in myPackageUtilities is intended for end users, and the rest is for internal development purposes. The end-user code needs to be properly documented using roxygen2. This code includes both S3 classes and generics, as well as various helper functions for the user.
2) The dependent packages (myPackage1, myPackage2, etc.) will likely extend S3 generics defined in myPackageUtilities.
My question is: what is the best way to assemble all of this? Here are two natural (but non-exhaustive) options:
Include myPackageUtilities under Imports: for all the dependent packages, and force users to separately load myPackageUtilities;
Include myPackageUtilities under Depends: for all the dependent packages, and be very selective about what is exported from myPackageUtilities so as to avoid cluttering the search path. All of the internal (non-exported) code will have to be accessed via ::: in myPackage1, etc.
I originally asked a similar question over here, but quickly discovered that the situation gets complicated. For example:
If we use Imports: instead of Depends:, then any generics defined in myPackageUtilities aren't found by myPackage1, etc.
This makes using the generic templates provided by myPackageUtilities difficult/impossible, and almost defeats the purpose of this entire set-up.
If I define an S3 generic in myPackageUtilities and document it there, how can I have roxygen2 reference those docs in myPackage1?
Perhaps I am deeply misunderstanding how namespaces work, in which case this would be a great place to clear up my misunderstanding!
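For concreteness, here is the kind of setup I have in mind, written as a hedged sketch: the generic summarise_fit and its lm method are placeholders, not real functions from these packages.

## In myPackageUtilities/R/summarise_fit.R -- the exported generic
#' Summarise a fitted model
#' @param x a fitted model object
#' @param ... passed on to methods
#' @export
summarise_fit <- function(x, ...) UseMethod("summarise_fit")

## In myPackage1/R/summarise_fit_lm.R -- a method extending the generic,
## with myPackageUtilities listed under Imports: in myPackage1's DESCRIPTION
#' @importFrom myPackageUtilities summarise_fit
#' @export
summarise_fit.lm <- function(x, ...) {
  summary(x)   # placeholder implementation
}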
Welcome to the rabbit hole.
You may be pleasantly surprised to learn that you can import a function from myPackageUtilities into myPackage1 and then export it from myPackage1 to make it accessible from the global environment.
So, when you say that you have a function in myPackageUtilities that should be accessible by the end user when myPackage1 is loaded, this is what I would include in my documentation for fn_name in myPackage1:
#' @importFrom myPackageUtilities fn_name
#' @export fn_name
(See https://github.com/hadley/dplyr/blob/master/R/utils.r for an example)
That still leaves the question of how to link to the original documentation, and I'm afraid I don't have a good answer for that. My current practice is to, essentially, copy the parameter documentation from the original source and then, in my @details section, write "please see the documentation for \code{\link[myPackageUtilities]{fn_name}}".
In the end, I still think your best bet is to export everything from myPackageUtilities that will ever get used outside of myPackageUtilities and do a combination import-export in each package where you want a function from myPackageUtilities to be accessible from the global environment.
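Putting those pieces together, the re-export file in myPackage1 might look something like this sketch (fn_name stands in for the real function; this follows the import-then-export pattern roxygen2 supports):

## myPackage1/R/reexports.R
## Import fn_name from myPackageUtilities and re-export it, so that users
## who load only myPackage1 can still call it directly.
#' @importFrom myPackageUtilities fn_name
#' @export
myPackageUtilities::fn_name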

Best practices to handle personal functions in R

I have written personal functions in R that are not specific to one (or a few) projects.
What are the best practices (in R) for storing that kind of function?
Is the best way to have one file that gets sourced at startup, or is there a better (recommended) way to deal with this situation?
Create a package named "utilities", put your utility functions in that package, aim for one function per file, and store the package in a source control system (e.g., Git, SVN). It will save you time in the long run.
P.S. .Rprofile tends to get accidentally deleted.
If you have many functions, it would be good to make them into a package that you load each time you start working.
It is probably not a good idea to have a monolithic script with a bunch of functions. Instead, break the file up into several files, each of which has either only one function (my preference) or a group of functions that are logically similar. That makes it easier to find things when you need to make changes.
Most people use the .Rprofile file for this. Here are two links which talk about this file in some detail.
http://www.statmethods.net/interface/customizing.html
http://blog.revolutionanalytics.com/2013/10/sample-rprofile.html
At the top of my .Rprofile file I call library() for the various libraries which I normally use. I also have some personal handy functions which I've come to rely on. Because this file is sourced on startup, they are available to me every session.
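For example, the kind of ~/.Rprofile described above might look like this; the packages and the helper function are just examples.

## ~/.Rprofile -- sourced at the start of every session
library(data.table)   # packages used in almost every session
library(ggplot2)

# a small personal helper, available everywhere
round_df <- function(df, digits = 2) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], round, digits = digits)
  df
}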
From my experience, a package is the best choice for personal functions. At first I put every new function into a personal package, which I called My. When I find that some of its functions are similar and worth becoming an independent package, I create a new package and move them there.

Automatically "sourcing" function on change

While I am writing .R functions I constantly need to manually run source("funcname.r") to get the changes reflected in the workspace. I am sure it must be possible to do this automatically. So what I would like is to make changes to my function, save the file, and be able to use the new function in the R workspace without manually sourcing it. How can I do that?
UPDATE: I know about selecting the appropriate lines of code and pressing CTRL+R in the R Editor (RGui), or using Notepad++ and executing the lines in R. But this approach has the disadvantage of making my workspace console "muddled". I would like to stop this practice if at all possible.
You can use RStudio, which has a Source on Save option.
If you are prepared to put your functions into a package, you may enjoy exploring Hadley's devtools package. This provides a suite of tools to write, test and document packages.
https://github.com/hadley/devtools
This approach offers many advantages, chief among them reloading the package with a minimum of retyping.
You will still have to type load_all("yourpackage"), but I find that small amount of typing is small beer compared to the advantages of devtools.
For additional information, including how to setup devtools, have a look at https://github.com/hadley/devtools/wiki/development
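In practice, the devtools loop described above boils down to something like this, run from the package's root directory:

# install.packages("devtools")   # once
library(devtools)

load_all(".")   # after editing and saving a file under R/, reload everything
# ...test interactively, edit again, then simply re-run load_all(".")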
If you're using Eclipse + StatET, you can press CTRL+R+S, which saves your script and sources it. As close to automatic as I can get.
If you can get your text editor to run a system command after it saves the file, then you could use something like AutoIt (on Windows) or a shell script (on a UNIX derivative) to pass a source() call to all running copies of R. But that's a heck of a lot of work for not much gain.
Still, I think it's much more likely to work if it's event-driven on the text editor end, rather than having R constantly scan for updates (or somehow interface with the OS's update-event messaging system).
This is likely not possible (automatically detecting disk changes without intervention or without running at least one line of code).
R needs to read functions into memory, so a change on disk won't be reflected in the workspace without reloading your functions.
If you are into developing R functions, some amount of messiness during your development process is likely inevitable, but perhaps I could suggest that you try writing an R package to house your functions?
This has the advantage of robust documentation for your functions and of lazy loading, so that you have access to your functions and datasets immediately, without sourcing them.
Don't be afraid of making a package; it's easy with package.skeleton(), and it doesn't have to go on CRAN, it can be just for your own personal use without distribution! Just have fun!
Try to accept some messiness during development knowing you are working towards your goal and fighting the good fight of code organization and documentation!
We are only imperfect people, in an imperfect world, but we mean well!
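As a concrete starting point, package.skeleton() can turn a couple of functions sitting in your workspace into a package directory; the package and function names below are just placeholders.

# Two functions currently sitting in the global environment
say_hello <- function(name) paste("Hello,", name)
clamp     <- function(x, lo, hi) pmin(pmax(x, lo), hi)

# Creates a "mytools" source-package directory in the working directory,
# containing the listed objects plus skeleton documentation to fill in
package.skeleton(name = "mytools", list = c("say_hello", "clamp"))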

Where in R do I permanently store my custom functions?

I have several custom functions that I use frequently in R. Rather than source this file (or parts thereof) in each script, is there some way to add these functions to a base R file so that they are always available when I use R?
Yes, create a package. There are numerous tutorials as well as the Writing R Extensions manual that came with your copy of R.
It may seem like too much work at first, but you will probably be glad that you did this in the longer run.
PS And you can then load that package from ~/.Rprofile. For really short code, you can also define it there.
A package may be overkill for a few useful functions. I'd argue there's nothing wrong with explicitly source()ing them as you need them - at least it is explicit, so that if you email someone your code, you won't forget to include those other scripts.
Another option is to use the .Rprofile file. You can read about the details in ?Startup. Basically, the idea is that:
...a file called ‘.Rprofile’ is searched for in the current directory or
in the user's home directory (in that order). The user profile file is
sourced into the workspace.
You can read here about how many people use this functionality.
The accepted answer is best long-term: Make a package.
Luckily, the learning curve for doing this has been dramatically reduced by the devtools package: it automates package creation (a nice assist in getting off on the right foot), encourages good practices (like documenting with roxygen2), helps with using online version control (Bitbucket, GitHub or other), and makes it easier to share your package with others. It's also very helpful for smoothing your way to CRAN submission.
Good docs at http://adv-r.had.co.nz and http://r-pkgs.had.co.nz .
To create your package, for instance, you can:
install.packages("devtools")
devtools::create("path/to/package/pkgname")
You could also look at the 'mvbutils' package: it lets you set up a hierarchical set of "tasks" (folders with workspace ".RData" files in them) such that you can always see what's in the ancestral tasks (i.e. the ancestors are in the search() path). So you can put your custom functions in the "starting task" where you always start R, and then change to whatever project-specific task you require; you avoid cluttered workspaces, but you'll still be able to use (and edit) your custom functions because the starting task is always ancestral. Objects (including functions) are stored in ".RData" files and are thus loaded/saved automatically, but there are separate text-backup facilities for functions.
There are lots of different ways of working in R, and no "one-size-fits-all" best solution. It's also not easy to find an overview! Speaking just for myself:
I'm not a fan of having to 'source' everything in every time; for one thing, it simply doesn't work with big data sets and/or results of model runs.
I think packages are hard to create and maintain; there is a really significant overhead. After the first 5 packages you write, it does get a bit easier provided you do it on at least a weekly basis so you don't forget how, but really...
In fact, 'mvbutils' also has a bunch of tools for facilitating the creation and (especially) maintenance of packages, designed to interface smoothly with the task-hierarchy system. I use & edit my own packages all the time (including editing mvbutils itself); but if it wasn't for the tools in 'mvbutils', I'd be grinding my teeth in frustration most days of the week.

Resources