Choose function to load from an R package - r

I like using function reshape from the matlab package, but I need then to specify base::sum(m) each time I want to sum the elements of my matrix or else matlab::sum is called, which only sums by columns..
I need loading package gtools to use the rdirichlet function, but then the function gtools::logit masks the function pracma::logit that I like better..
I gess there are no such things like:
library(loadOnly = "rdirichlet", from = "gtools")
or
library(loadEverythingFrom = "matlab", except = "sum")
.. because functions from the package matlab may internaly work on the matlab::sum function. So the latter must be loaded. But is there no way to get this behavior from the point of view of the user? Something that would feel like:
library(pracma)
library(matlab)
library(gtools)
sum <- base::sum
logit <- pracma::logit
.. but that would not spoil your ls() with all these small utilitary functions?
Maybe I need defining my own default namespace?

To avoid spoiling your ls, you can do something like this:
.ns <- new.env()
.ns$sum <- base::sum
.ns$logit <- pracma::logit
attach(.ns)

To my knowledge there is no easy answer to what you want to achieve. The only dirty hack I can think of is to download the source of the packages "matlab", "gtools", "pracma" and delete the offending functions from their NAMESPACE file prior to installation from source (with R CMD INSTALL package).
However, I would recommend using the explicit notation pracma::logit, because it improves readability of your code for other people and yourself in the future.
This site gives a good overview about package namespaces:
http://r-pkgs.had.co.nz/namespace.html

Related

Determining which version of a function is active when many packages are loaded

If I have multiple packages loaded that define functions of the same name, is there an easy way to determine which version of the function is currently the active one? Like, lets say I have base R, the tidyverse, and a bunch of time series packages loaded. I'd like a function which_package("intersect") that would tell me the package name of the active version of the intersect function. I know you can go back and look at all the warning messages you recieved when installing packages, but I think that sort of manual search is not only tedious but also error-prone.
There is a function here that does sort of what I want, except it produces a table for all conflicts rather than the value for one function. I would actually be quite happy with that, and would also accept a similar function as an answer, but I have had problems with the implimentation of function given. As applied to my examples, it inserts vast amounts of white space and many duplicates of the package names (e.g. the %>% function shows up with 132 packages listed), making the output hard to read and hard to use. It seems like it should be easy to remove the white space and duplicates, and I have spent considerable time on various approaches that I expected to work but which had no impact on the outcome.
So, for an example of many conflicts:
install.packages(pkg = c("tidyverse", "fpp3", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink")
lapply(x = c("tidyverse", "fable", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"),
library, character.only = TRUE)
You can pull this information with your own function helper.
which_package <- function(fun) {
if(is.character(fun)) fun <- getFunction(fun)
stopifnot(is.function(fun))
x <- environmentName(environment(fun))
if (!is.null(x)) return(x)
}
This will return R_GlobalEnv for functions that you define in the global environment. There is also the packageName function if you really want to restrict it to packages only.
For example
library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"

Removing/de-registering a specific function from an R package

I may not be using the terminology correctly here so please forgive me...
I have a case of one package 'overwriting' a function with the same name loaded by another package and thus changing the behavior (breaking) of a function.
The specific case:
X <- data.frame ( y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100) )
library(CausalImpact)
a <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # works
library(bfast) # imports quantmod which loads crappy version of as.zoo.data.frame
b <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # Error
I know the error comes from two versions of the function as.zoo.data.frame.
The problematic version is imported by bfast from the package 'quantmod' (see https://github.com/joshuaulrich/quantmod/issues/168). Unfortunately their hotfix did not prevent this error. Super annoying.
I can hack around this specific problem, but I was wondering if there is a general way to like 'de-register' this function variant from the search path. Neither detach nor unloadNamespace remove the offending function (same behavior after). An explanation and similar problem is discussed here and here, but I wasn't able to find a general solution. For instance I'd rather just remove this function than clone and re-write CausalImpact to deal with this behavior.
From R 3.6.0 onwards, there is a new option called "conflicts.policy" to handle this within an established framework. For small issues like this, you can use the new arguments to library(). If you aren't yet to 3.6, the easiest solution might be to explicitly namespace CausalImpact when you need it, i.e. CausalImpact::CausalImpact. That's a mouthful, so you could do causal_impact <- CausalImpact::CausalImpact and use that alias.
# only attach select
library(dplyr, include.only = "select")
# exclude slice/arrange from being attached.
library(dplyr, exclude = c("slice", "arrange"))
library(bfast, exclude = "CausalImpact") should solve your problem.
Attach means that they are available for use without explicit prefixing with their package. In either of these cases, something like dplyr::slice would work just fine.
For more information, you can see ?library. Also, the R-Core member Luke Tierney wrote a blog explaining how the conflicts.policy works. You can find that here
Here's an answer that works, but is less preferable than de-registering a S3 method because it involves replacing the registered version in the S3 Methods table with the desired method:
library(CausalImpact)
library(bfast)
assignInNamespace("as.zoo.data.frame", zoo:::as.zoo.data.frame, ns = asNamespace("zoo"))
based partially on #smingerson's suggestion in the comments

R: How to check if the libraries that I am loading, I use them for my code in R?

I have used several packages of R libraries for my study. All libraries charge together at the beginning of my code. And here is the problem. It turns out that I have done several tests with different functions that were already in the packages of R. However, in the final code I have not implemented all the functions I have tried. Therefore, I am loading libraries that I do not use.
Would there be any way to check the libraries to know if they really are necessary for my code?
Start by restarting R with a fresh environment, no libraries loaded. For this demonstration, I'm going to define two functions:
zoo1 <- function() na.locf(1:10)
zoo2 <- function() zoo::na.locf(1:10)
With no libraries loaded, let's try something:
codetools::checkUsage(zoo1)
# <anonymous>: no visible global function definition for 'na.locf'
codetools::checkUsage(zoo2)
library(zoo)
# Attaching package: 'zoo'
# The following objects are masked from 'package:base':
# as.Date, as.Date.numeric
codetools::checkUsage(zoo1)
Okay, so we know we can check a single function to see if it is abusing scope and/or using non-base functions. Let's assume that you've loaded your script full of functions (but not the calls to require or library), so let's do this process for all of them. Let's first unload zoo, so that we'll see a complaint again about our zoo1 function:
detach("package:zoo", unload=TRUE)
Now let's iterate over all functions:
allfuncs <- Filter(function(a) is.function(get(a)), ls())
str(sapply(allfuncs, function(fn) capture.output(codetools::checkUsage(get(fn))), simplify=FALSE))
# List of 2
# $ zoo1: chr "<anonymous>: no visible global function definition for 'na.locf'"
# $ zoo2: chr(0)
Now you know to look in the function named zoo1 for a call to na.locf. It'll be up to you to find in which not-yet-loaded package this function resides, but that might be more more reasonable, depending on the number of packages you are loading.
Some side-thoughts:
If you have a script file that does not have everything comfortably ensconced in functions, then just wrap all of the global R code into a single function, say bigfunctionfortest <- function() { as the first line and } as the last. Then source the file and run codetools::checkUsage(bigfunctionfortest).
Package developers have to go through a process that uses this, so that the Imports: and Depends: sections of NAMESPACE (another ref: http://r-pkgs.had.co.nz/namespace.html) will be correct. One good trick to do that will prevent "namespace pollution" is loading the namespace but not the package ... and though that may sound confusing, it often results in using zoo::na.locf for all non-base functions. This gets old quickly (especially if you are using dplyr and such, where most of your daily functions are non-base), suggesting those oft-used functions should be directly imported instead of just referenced wholly. If you're familiar with python, then:
# R
library(zoo)
na.locf(c(1,2,NA,3))
is analagous to
# fake-python
from zoo import *
na_locf([1,2,None,3])
(if that package/function exists). Then the non-polluting variant looks like:
# R
zoo::na.locf(c(1,2,NA,3))
# fake-python
import zoo
zoo.na_locf([1,2,None,3])
where the function's package (and/or subdir packaging) must be used explicitly. There is no ambiguity. It is explicit. This is by some/many considered "A Good Thing (tm)".
(Language-philes will likely say that library(zoo) and from zoo import * are not exactly the same ... a better way to describe what is happening is that they bring everything from zoo into the search path of functions, potentially causing masking as we saw in a console message earlier; while the :: functionality only loads the namespace but does not add it to the search path. Lots of things going on in the background.)

Search R file by package used

Let's say I open R file that I used way back when. On top of the page I see a library loaded, but I don't remember what it does any more. So I think to myself: hm, I wonder where in this long R file is this library used?
Is there a way to list what functions from a given package were used in particula file?
There are certainly other ways to do this but if you can get a list of the functions for the package you could combine readLines (to read the script into R as characters), grepl (to detect matches), and sapply. The way I would grab the functions is using p_funs from the pacman package. (Full disclosure: I am one of the authors).
Here is an example script that I have saved as "test.R"
library(ggplot2)
x <- rnorm(20)
y <- rnorm(20)
qplot(x, y)
summary(x)
and here is a session where I detect which functions are used
script <- readLines("test.R")
funs <- p_funs(ggplot2)
out <- sapply(funs, function(input){any(grepl(input, x = script))})
funs[out]
#[1] "ggplot" "qplot"
If you don't want to install pacman you can use any other method to get a list of the functions in the package. You could replace that with
funs <- objects("package:ggplot2")
and you would essentially get the same answer.
Note that you may get more matches than there actually are in the file - note that the ggplot function wasn't actually in my script but the string "ggplot" is in library(ggplot2). So you may still need to do a little bit of additional digging after the initial sweep through the file.

R Package development: overriding a function from one package with a function from another?

I am currently working on developing two packages, below is a simplified
version of my problem:
In package A I have some functions (say "sum_twice"), and I it calls to
another function inside the package (say "slow_sum").
However, in package B, I wrote another function (say "fast_sum"), with
which I wish to replace the slow function in package A.
Now, how do I manage this "overriding" of the "slow_sum" function with the
"fast_sum" function?
Here is a simplified example of such functions (just to illustrate):
############################
##############
# Functions in package A
slow_sum <- function(x) {
sum_x <- 0
for(i in seq_along(x)) sum_x <- sum_x + x[i]
sum_x
}
sum_twice <- function(x) {
x2 <- rep(x,2)
slow_sum(x2)
}
##############
# A function in package B
fast_sum <- function(x) { sum(x) }
############################
If I only do something like slow_sum <- fast_sum, this would not work, since "sum_twice" uses "slow_sum" from the NAMESPACE
of package A.
I tried using the following function when loading package "B":
assignInNamespace(x = "slow_sum", value = B:::fast_sum, ns = "A")
This indeed works, however, it makes the CRAN checks return both a NOTE on
how I should not use ":::", and also a warning for using assignInNamespace
(since it is supposed to not be very safe).
However, I am at a loss.
What would be a way to have "sum_twice" use "fast_sum" instead of
"slow_sum"?
Thank you upfront for any feedback or suggestion,
With regards,
Tal
p.s: this is a double post from here.
UDPATE: motivation for this question
I am developing two packages, one is based solely on R and works fine (but a bit slow), it is dendextend (which is now on CRAN). The other one is meant to speed up the first package by using Rcpp (this is dendextendRcpp which is on github). The second package speeds up the first by overriding some basic functions the first package uses. But in order for the higher levels functions in the first package will use the lower functions in the second package, I have to use assignInNamespace which leads CRAN to throw warnings+NOTES, which ended up having the package rejected from CRAN (until these warnings will be avoided).
The problem is that I have no idea how to approach this issue. The only solution I can think of is either mixing the two packages together (making it harder to maintain, and will automatically require a larger dependency structure for people asking to use the package). And the other option is to just copy paste the higher level functions from dendextend to dendextendRcpp, and thus have them mask the other functions. But I find this to be MUCH less elegant (because that means I will need to copy-paste MANY functions, forcing more double-code maintenance) . Any other ideas? Thanks.
We could put this in sum_twice:
my_sum_ch <- getOption("my_sum", if ("package:fastpkg" %in% search())
"fast_sum" else "slow_sum")
my_sum <- match.fun(my_sum_ch)
If the "my_sum" option were set then that version of my_sum would be used and if not it would make the decision based on whether or not fastpkg had been loaded.
The solution I ended up using (thanks to Uwe and Kurt), is using "local" to create a localized environment with the package options. If you're curious, the function is called "dendextend_options", and is here:
https://github.com/talgalili/dendextend/blob/master/R/zzz.r
Here is an example for its use:
dendextend_options <- local({
options <- list()
function(option, value) {
# ellipsis <- list(...)
if(missing(option)) return(options)
if(missing(value))
options[[option]]
else options[[option]] <<- value
}
})
dendextend_options("a")
dendextend_options("a", 1)
dendextend_options("a")
dendextend_options("a", NULL)
dendextend_options("a")
dendextend_options()

Resources