Official guidelines for using functions newly added to base R - r

I am writing a package that performs a statistical analysis while handling missing values. I am using the wonderful, life-changing function anyNA which was added sometime after 3.0 (commit). Another recently added function that people might want to use is OlsonNames.
So as I am using this function, my package won't work on older versions of R. I see four options for dealing with this.
Make the whole package depend on R >= 3.1 in DESCRIPTION.
Redefine the function in my source.
Redefine the function if the user is using <3.1 and don't define it if they are using >= 3.1 or make the function check the version each time e.g.
anyNA <- function(x)
if(as.numeric(R.Version()$minor) > 3.1){
return(anyNA(x)
} else {
return(any(is.NA(x))
}
}
or
if(as.numeric(R.Version()$minor) > 3.1){
anyNA <- base::anyNA
} else {
anyNA <- function(x) any(is.na(x))
}
I'm not even sure this second one would work in package source code.
Rewrite my code using any(is.na(x)).
My concrete question is is there an official CRAN preference for one of these?
Failing that, are there good reasons to use one over the others? To my eyes they all have failings. 1) It seems unnecessary to require users have R >= 3.1 for the sake of a small function. 2) If I redefine the function, any improvements made to the function in R base won't get used in my package. 3) This mostly seems messy. But also, if the base R version of the function changes I might end up with hard to fix bugs that only occur in certain R versions. 4) Code readability is reduced.

Related

Determining which version of a function is active when many packages are loaded

If I have multiple packages loaded that define functions of the same name, is there an easy way to determine which version of the function is currently the active one? Like, lets say I have base R, the tidyverse, and a bunch of time series packages loaded. I'd like a function which_package("intersect") that would tell me the package name of the active version of the intersect function. I know you can go back and look at all the warning messages you recieved when installing packages, but I think that sort of manual search is not only tedious but also error-prone.
There is a function here that does sort of what I want, except it produces a table for all conflicts rather than the value for one function. I would actually be quite happy with that, and would also accept a similar function as an answer, but I have had problems with the implimentation of function given. As applied to my examples, it inserts vast amounts of white space and many duplicates of the package names (e.g. the %>% function shows up with 132 packages listed), making the output hard to read and hard to use. It seems like it should be easy to remove the white space and duplicates, and I have spent considerable time on various approaches that I expected to work but which had no impact on the outcome.
So, for an example of many conflicts:
install.packages(pkg = c("tidyverse", "fpp3", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink")
lapply(x = c("tidyverse", "fable", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"),
library, character.only = TRUE)
You can pull this information with your own function helper.
which_package <- function(fun) {
if(is.character(fun)) fun <- getFunction(fun)
stopifnot(is.function(fun))
x <- environmentName(environment(fun))
if (!is.null(x)) return(x)
}
This will return R_GlobalEnv for functions that you define in the global environment. There is also the packageName function if you really want to restrict it to packages only.
For example
library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"

Removing/de-registering a specific function from an R package

I may not be using the terminology correctly here so please forgive me...
I have a case of one package 'overwriting' a function with the same name loaded by another package and thus changing the behavior (breaking) of a function.
The specific case:
X <- data.frame ( y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100) )
library(CausalImpact)
a <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # works
library(bfast) # imports quantmod which loads crappy version of as.zoo.data.frame
b <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # Error
I know the error comes from two versions of the function as.zoo.data.frame.
The problematic version is imported by bfast from the package 'quantmod' (see https://github.com/joshuaulrich/quantmod/issues/168). Unfortunately their hotfix did not prevent this error. Super annoying.
I can hack around this specific problem, but I was wondering if there is a general way to like 'de-register' this function variant from the search path. Neither detach nor unloadNamespace remove the offending function (same behavior after). An explanation and similar problem is discussed here and here, but I wasn't able to find a general solution. For instance I'd rather just remove this function than clone and re-write CausalImpact to deal with this behavior.
From R 3.6.0 onwards, there is a new option called "conflicts.policy" to handle this within an established framework. For small issues like this, you can use the new arguments to library(). If you aren't yet to 3.6, the easiest solution might be to explicitly namespace CausalImpact when you need it, i.e. CausalImpact::CausalImpact. That's a mouthful, so you could do causal_impact <- CausalImpact::CausalImpact and use that alias.
# only attach select
library(dplyr, include.only = "select")
# exclude slice/arrange from being attached.
library(dplyr, exclude = c("slice", "arrange"))
library(bfast, exclude = "CausalImpact") should solve your problem.
Attach means that they are available for use without explicit prefixing with their package. In either of these cases, something like dplyr::slice would work just fine.
For more information, you can see ?library. Also, the R-Core member Luke Tierney wrote a blog explaining how the conflicts.policy works. You can find that here
Here's an answer that works, but is less preferable than de-registering a S3 method because it involves replacing the registered version in the S3 Methods table with the desired method:
library(CausalImpact)
library(bfast)
assignInNamespace("as.zoo.data.frame", zoo:::as.zoo.data.frame, ns = asNamespace("zoo"))
based partially on #smingerson's suggestion in the comments

R Package development: overriding a function from one package with a function from another?

I am currently working on developing two packages, below is a simplified
version of my problem:
In package A I have some functions (say "sum_twice"), and I it calls to
another function inside the package (say "slow_sum").
However, in package B, I wrote another function (say "fast_sum"), with
which I wish to replace the slow function in package A.
Now, how do I manage this "overriding" of the "slow_sum" function with the
"fast_sum" function?
Here is a simplified example of such functions (just to illustrate):
############################
##############
# Functions in package A
slow_sum <- function(x) {
sum_x <- 0
for(i in seq_along(x)) sum_x <- sum_x + x[i]
sum_x
}
sum_twice <- function(x) {
x2 <- rep(x,2)
slow_sum(x2)
}
##############
# A function in package B
fast_sum <- function(x) { sum(x) }
############################
If I only do something like slow_sum <- fast_sum, this would not work, since "sum_twice" uses "slow_sum" from the NAMESPACE
of package A.
I tried using the following function when loading package "B":
assignInNamespace(x = "slow_sum", value = B:::fast_sum, ns = "A")
This indeed works, however, it makes the CRAN checks return both a NOTE on
how I should not use ":::", and also a warning for using assignInNamespace
(since it is supposed to not be very safe).
However, I am at a loss.
What would be a way to have "sum_twice" use "fast_sum" instead of
"slow_sum"?
Thank you upfront for any feedback or suggestion,
With regards,
Tal
p.s: this is a double post from here.
UDPATE: motivation for this question
I am developing two packages, one is based solely on R and works fine (but a bit slow), it is dendextend (which is now on CRAN). The other one is meant to speed up the first package by using Rcpp (this is dendextendRcpp which is on github). The second package speeds up the first by overriding some basic functions the first package uses. But in order for the higher levels functions in the first package will use the lower functions in the second package, I have to use assignInNamespace which leads CRAN to throw warnings+NOTES, which ended up having the package rejected from CRAN (until these warnings will be avoided).
The problem is that I have no idea how to approach this issue. The only solution I can think of is either mixing the two packages together (making it harder to maintain, and will automatically require a larger dependency structure for people asking to use the package). And the other option is to just copy paste the higher level functions from dendextend to dendextendRcpp, and thus have them mask the other functions. But I find this to be MUCH less elegant (because that means I will need to copy-paste MANY functions, forcing more double-code maintenance) . Any other ideas? Thanks.
We could put this in sum_twice:
my_sum_ch <- getOption("my_sum", if ("package:fastpkg" %in% search())
"fast_sum" else "slow_sum")
my_sum <- match.fun(my_sum_ch)
If the "my_sum" option were set then that version of my_sum would be used and if not it would make the decision based on whether or not fastpkg had been loaded.
The solution I ended up using (thanks to Uwe and Kurt), is using "local" to create a localized environment with the package options. If you're curious, the function is called "dendextend_options", and is here:
https://github.com/talgalili/dendextend/blob/master/R/zzz.r
Here is an example for its use:
dendextend_options <- local({
options <- list()
function(option, value) {
# ellipsis <- list(...)
if(missing(option)) return(options)
if(missing(value))
options[[option]]
else options[[option]] <<- value
}
})
dendextend_options("a")
dendextend_options("a", 1)
dendextend_options("a")
dendextend_options("a", NULL)
dendextend_options("a")
dendextend_options()

Search all existing functions for package dependencies?

I have a package that I wrote while learning R and its dependency list is quite long. I'm trying to trim it down, for two cases:
I switched to other approaches, and packages listed in Suggests simply aren't used at all.
Only one function out of my whole package relies on a given dependency, and I'd like to switch to an approach where it is loaded only when needed.
Is there an automated way to track down these two cases? I can think of two crude approaches (download the list of functions in all the dependent packages and automate a text search for them through my package's code, or load the package functions without loading the required packages and execute until there's an error), but neither seems particularly elegant or foolproof....
One way to check dependancies in all functions is to use the byte compiler because that will check for functions being available in the global workspace and issue a notice if it does not find said function.
So if you as an example use the na.locf function from the zoo package in any of your functions and then byte compile your function you will get a message like this:
Note: no visible global function definition for 'na.locf'
To correctly address it for byte compiling you would have to write it as zoo::na.locf
So a quick way to test all R functions in a library/package you could do something like this (assuming you didn't write the calls to other functions with the namespace):
Assuming your R files with the functions are in C:\SomeLibrary\ or subfolders there of and then you define a sourceing file as C:\SomeLibrary.r or similar containing:
if (!(as.numeric(R.Version()$major) >=2 && as.numeric(R.Version()$minor) >= 14.0)) {
stop("SomeLibrary needs version 2.14.0 or greater.")
}
if ("SomeLibrary" %in% search()) {
detach("SomeLibrary")
}
currentlyInWorkspace <- ls()
SomeLibrary <- new.env(parent=globalenv())
require("compiler",quietly=TRUE)
pathToLoad <- "C:/SomeLibraryFiles"
filesToSource <- file.path(pathToLoad,dir(pathToLoad,recursive=TRUE)[grepl(".*[\\.R|\\.r].*",dir(pathToLoad,recursive=TRUE))])
for (filename in filesToSource) {
tryCatch({
suppressWarnings(sys.source(filename, envir=SomeLibrary))
},error=function(ex) {
cat("Failed to source: ",filename,"\n")
print(ex)
})
}
for(SomeLibraryFunction in ls(SomeLibrary)) {
if (class(get(SomeLibraryFunction,envir=SomeLibrary))=="function") {
outText <- capture.output(with(SomeLibrary,assign(SomeLibraryFunction,cmpfun(get(SomeLibraryFunction)))))
if(length(outText)>0){
cat("The function ",SomeLibraryFunction," produced the following compile note(s):\n")
cat(outText,sep="\n")
cat("\n")
}
}
}
attach(SomeLibrary)
rm(list=ls()[!ls() %in% currentlyInWorkspace])
invisible(gc(verbose=FALSE,reset=TRUE))
Then start up R with no preloaded packages and source in C:\SomeLibrary.r
And then you should get notes from cmpfun for any call to a function in a package that's not part of the base packages and doesn't have a fully qualified namespace defined.

How to correct for "Error in nullmodel(comm, method) : could not find function "list2env" in the vegan package

I'm busy exploring the package vegan for R, using it to calculate nestedness of community matrices and null models. I'm particularly interested in using the permat functions as well as Oecosimu.
However, when running my program I obtained the following errors:
Error in nullmodel(comm, method) : could not find function "list2env"
Error in nullmodel(m, ALGO) : could not find function "list2env"
I then even ran an example (given below) of how to use these functions given by the R help function, and even these examples gave the same error. Am I suppose to import something else in order to use these functions or how do I go about fixing this?
Examples:
m <- matrix(c(
1,3,2,0,3,1,
0,2,1,0,2,1,
0,0,1,2,0,3,
0,0,0,1,4,3
), 4, 6, byrow=TRUE)
x1 <- permatswap(m, "quasiswap")
summary(x1)
x2 <- permatfull(m)
summary(x2)
x3 <- permatfull(m, "none", mtype="prab")
x3$orig
summary(x3)
x4 <- permatfull(m, strata=c(1,1,2,2))
summary(x4)
Technically, this is a bug in the development version of Vegan on R-Forge. We were failing to declare a dependency on R versions >= 2.12 in DESCRIPTION. I have checked in the relevant change to the source tree to fix this but it will take a day or so before the tarball and binaries are rebuilt by R-Forge.
That said, you should probably update your R to something more recent. Or use the versions of those functions provided in Vegan 2.0-x on CRAN.
list2env is part of R base, which means it comes with the distribution, not in an add-on package. So if you don't have it you're probably either running an old version of R or have a broken installation. The example worked fine for me, with R 2.12.1 and vegan 2.1-0.
Your code works for me without an error message
The most probable cause of your error is your using old versions of R, vegan or permute
The R news for changes says
CHANGES IN R VERSION 2.12.0: NEW FEATURES:
o New list2env() utility function as an inverse of
as.list(<environment>) and for fast multi-assign() to existing
environment. as.environment() is now generic and uses list2env()
as list method.
CHANGES IN R VERSION 2.12.1: BUG FIXES:
o When list2env() created an environment it was missing a PROTECT
call and so was vulnerable to garbage collection.
CHANGES IN R VERSION 2.13.0: NEW FEATURES:
o list2env(envir = NULL) defaults to hashing (with a suitably sized
environment) for lists of more than 100 elements.
So update your version of R and the packages and try again.

Resources