See if a variable/function exists in a package?

I'm trying to test whether a particular variable or function exists in a package. For example, suppose I wanted to test whether a function called plot existed in package 'graphics'.
The following tests whether a function plot exists, but not what package it comes from:
exists('plot', mode='function')
Or I can test that something called plot exists in the graphics package, but this doesn't tell me whether it's a function:
'plot' %in% ls('package:graphics')
Is there a nice way to ask "does an object called X exist in package Y of mode Z"? (Essentially, can I restrict exists to a particular package?)
(Yes, I can combine the above two lines to first test that plot is in graphics and then ask for the mode of plot, but what if I had my own function plot masking graphics::plot? Could I then trust the output of exists('plot', mode='function')? )
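To make the masking worry concrete (a quick sketch; plot is just a stand-in for any masked name):
plot <- function() "my own plot" # masks graphics::plot from the global environment
exists('plot', mode = 'function')
# [1] TRUE -- but there's no telling which plot was found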
Background: I'm writing tests for a package of mine and want to check that various functions are exported. I'm using the testthat package, which executes tests in an environment that can see all of the package's internal functions, and I've been stung by exists('myfunction', mode='function') returning TRUE when I had in fact forgotten to export myfunction.

Oh, I found it.
I noticed that in ?ls it says that the first argument ('package:graphics' for me) also counts as an environment. The where argument of exists has the same documentation as the name argument of ls, so I guessed 'package:graphics' might work there too:
exists('plot', where='package:graphics', mode='function')
[1] TRUE # huzzah!
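One caveat worth adding (my note, not part of the original discovery): exists() inherits by default, so with where='package:graphics' it keeps searching the environments that enclose the package on the search path. To restrict the lookup to the package itself, pass inherits = FALSE:
exists('sum', where = 'package:graphics', mode = 'function')
# [1] TRUE -- found by inheritance in base, not in graphics
exists('sum', where = 'package:graphics', mode = 'function', inherits = FALSE)
# [1] FALSE
exists('plot', where = 'package:graphics', mode = 'function', inherits = FALSE)
# [1] TRUE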

environment(plot)
# <environment: namespace:graphics>
find('+')
# [1] "package:base"
find('plot')
# [1] "package:graphics"
find('plot', mode = "function")
# [1] "package:graphics"

All the answers proposed previously walk environments, not namespaces. Their behavior will therefore differ depending on whether the target package has been attached with library(), and in what order packages were attached.
Here is an approach with utils::getFromNamespace():
function_exists <- function(package, funcname) {
  tryCatch({
    # getFromNamespace() throws an error if the object is absent
    utils::getFromNamespace(funcname, package)
    TRUE
  }, error = function(...) FALSE)
}
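For example (note that getFromNamespace() sees unexported objects too; if you specifically want to test that a function is exported, getNamespaceExports() is the stricter check):
function_exists("graphics", "plot")
# [1] TRUE
function_exists("graphics", "no_such_function")
# [1] FALSE
"plot" %in% getNamespaceExports("graphics")
# [1] TRUE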


Why/how do some packages define their functions in a nameless environment?

In my code, I needed to check which package a function comes from (in my case it was exprs(): I needed the one from Biobase, but it turned out to be overridden by rlang).
From this SO question, I thought I could simply use environmentName(environment(functionname)). But for exprs from Biobase that expression returned an empty string:
environmentName(environment(exprs))
# [1] ""
After checking the structure of environment(exprs), I noticed that it has a .Generic member which carries the package name as an attribute:
environment(exprs)$.Generic
# [1] "exprs"
# attr(,"package")
# [1] "Biobase"
So, for now I made this helper function:
pkgparent <- function(functionObj) {
  functionEnv <- environment(functionObj)
  envName <- environmentName(functionEnv)
  if (envName != "")
    return(envName)
  else
    return(attr(functionEnv$.Generic, 'package'))
}
It does the job and correctly returns the package name for a function once its package is loaded, for example:
pkgparent(exprs)
# Error in environment(functionObj) : object 'exprs' not found
library(Biobase)
pkgparent(exprs)
# [1] "Biobase"
library(rlang)
# The following object is masked from ‘package:Biobase’:
# exprs
pkgparent(exprs)
# [1] "rlang"
But I would still like to learn how it happens that some packages' functions are defined in an "unnamed" environment while others look like <environment: namespace:packagename>.
What you’re seeing here is part of how S4 method dispatch works: .Generic belongs to R’s method dispatch mechanism.
The rlang package is a red herring, by the way: the issue presents itself purely due to Biobase’s use of S4.
But more generally, your resolution strategy might fail in other situations, because there are other (albeit rare) reasons why packages might define functions inside a separate environment. This is generally done to define a closure over some variable.
For example, it’s generally impossible to modify variables defined inside a package at the namespace level, because the namespace gets locked when loaded. There are multiple ways to work around this. A simple way, if a package needs a stateful function, is to define this function inside an environment. For example, you could define a counter function that increases its count on each invocation as follows:
counter = local({
  current = 0L
  function() {
    current <<- current + 1L
    current
  }
})
local defines an environment in which the function is wrapped.
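Calling it shows the state being kept, and it reproduces exactly the "nameless environment" situation from the question:
counter()
# [1] 1
counter()
# [1] 2
environmentName(environment(counter))
# [1] ""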
To cope with this kind of situation, what you should do instead is to iterate over parent environments until you find a namespace environment. But there’s a simpler solution, because R already provides a function to find a namespace environment for a given environment (by performing said iteration):
pkgparent = function(fun) {
  nsenv = topenv(environment(fun))
  environmentName(nsenv)
}
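With the question's setup (Biobase attached), this version should give (a sketch):
pkgparent(exprs)
# [1] "Biobase"
pkgparent(counter) # the local() counter from above
# [1] "R_GlobalEnv"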

How to check if a function is anonymous in R

I want to write a function that distinguishes whether a function is being sourced from a package or set up anonymously, in a case where these are the only two possibilities (constraints of a one-liner).
If it's the latter - let's call it a variable anonymous.function - I'd want to 'get it out' of the (function(parameters){inner_workings}) closure as function(){} by something like eval(parse(text = call(anonymous.function()))).
typeof(package::function) returns "closure"
typeof((function(parameters) {inner_workings})) likewise returns "closure"
Should I distinguish the two by comparing environment, or namespace?
environment(package::function) returns <environment: namespace:package>
environment((function(parameters) {inner_workings})) returns <environment: R_GlobalEnv>
I'm hesitant to say that I'll only assume a function is anonymous if its environment is GlobalEnv, as (if I understand correctly) in some situations the stack could be traversed below Global, e.g. if I'm running this function within a function, so I'd rather check whether the environment is a namespace.
Can anyone tell me how I can check for a namespace? I can see it in the reported string as <environment: namespace:...> but can't seem to access that distinction. The only difference I can find is that a namespace environment isn't hashed - and that's only observed upon failing to run env.profile() on an unhashed environment...
Surely there's a simple way to check for anonymous functions without a try/catch statement! :-) Advice would be greatly appreciated; the R documentation seems to run dry at this level.
Edit: ideally accessing the data structure, i.e. without grepping a string!
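For what it's worth, base R can make this check directly with isNamespace(), combined with topenv() to climb out of local frames (a sketch; is_ns is a made-up helper name):
is_ns <- function(fun) isNamespace(topenv(environment(fun)))
is_ns(mean)          # the environment is namespace:base
# [1] TRUE
is_ns(function(x) x) # topenv() here is R_GlobalEnv, not a namespace
# [1] FALSE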
Does packageName work for you? I was unsure about the desired output for a function like f1 <- mean. See below for my version.
is_package_fun <- function(fun)
  !is.null(packageName(environment(fun)))

# defining functions
f1 <- mean
f2 <- function(x) mean(x)

# checking
is_package_fun(mean)
## [1] TRUE
is_package_fun(f1) # not sure about desired output: f1 is the same
                   # object as mean, so its environment is namespace:base
## [1] TRUE
is_package_fun(f2)
## [1] FALSE

retrieve original version of package function even if over-assigned

Suppose I replace a function of a package, for example knitr:::sub_ext.
(Note: I'm particularly interested in the case where it is an internal function, i.e. only accessible by ::: as opposed to ::, but the same answer may work for both.)
library(knitr)
my.sub_ext <- function(x, ext) {
  return("I'm in your package stealing your functions D:")
}
# replace knitr:::sub_ext with my.sub_ext
knitr <- asNamespace('knitr')
unlockBinding('sub_ext', knitr)
assign('sub_ext', my.sub_ext, knitr)
lockBinding('sub_ext', knitr)
Question: is there any way to retrieve the original knitr:::sub_ext after I've done this? Preferably without reloading the package?
(I know some people will want to know why I would do this, so here it is - not required reading for the question.) I've been patching some functions in packages like so (not actually the sub_ext function...):
original.sub_ext <- knitr:::sub_ext
new.sub_ext <- function(x, ext) {
  # some extra code that does something first, e.g.
  x <- do.something.with(x)
  # now call the original knitr:::sub_ext
  original.sub_ext(x, ext)
}
# now set knitr:::sub_ext to new.sub_ext like before.
I agree this is not in general a good idea (in most cases these are quick fixes until changes make their way into CRAN, or they are "feature requests" that would never be approved because they are somewhat case-specific).
The problem with the above is that if I accidentally execute it twice (e.g. it's at the top of a script that I run twice without restarting R in between), then on the second run original.sub_ext is actually the previous new.sub_ext rather than the real knitr:::sub_ext, so I get infinite recursion.
Since sub_ext is an internal function (I wouldn't call it directly, but functions from knitr like knit all call it internally), I can't hope to modify all the functions that call sub_ext to call new.sub_ext manually, hence the approach of replacing the definition in the package namespace.
When you do assign('sub_ext', my.sub_ext, knitr), you are irrevocably overwriting the value previously associated with sub_ext with the value of my.sub_ext. If you first stash the original value, though, it's not hard to reset it when you're done:
library(knitr)
knitr <- asNamespace("knitr")
## Store the original value of sub_ext
.sub_ext <- get("sub_ext", envir = knitr)
## Overwrite it with your own function
my.sub_ext <- function (x, ext) "I'm in your package stealing your functions D:"
assignInNamespace('sub_ext', my.sub_ext, knitr)
knitr:::sub_ext("eg.csv", "pdf")
# [1] "I'm in your package stealing your functions D:"
## Reset when you're done
assignInNamespace('sub_ext', .sub_ext, knitr)
knitr:::sub_ext("eg.csv", "pdf")
# [1] "eg.pdf"
Alternatively, as long as you are just adding lines of code to what's already there, you could add that code using trace(). What's nice about trace() is that, when you are done, you can use untrace() to revert the function's body to its original form:
trace(what = "mean.default",
      tracer = quote({
        a <- 1
        b <- 2
        x <- x * (a + b)
      }),
      at = 1)
mean(1:2)
# Tracing mean.default(1:2) step 1
# [1] 4.5
untrace("mean.default")
# Untracing function "mean.default" in package "base"
mean(1:2)
# [1] 1.5
Note that if the function you are tracing is in a namespace, you'll want to use trace()'s where argument, passing it some other (exported) function that shares the to-be-traced function's namespace. So, to trace an unexported function in knitr's namespace, you could set where = knit.
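For instance (a sketch; the tracer body is just an illustrative message):
library(knitr)
trace("sub_ext",
      tracer = quote(message("sub_ext was called")),
      where = knit)
## ... run something that calls sub_ext internally ...
untrace("sub_ext", where = knit)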

Changing defaults in a function inside a locked package [duplicate]

This question already has answers here:
Setting Function Defaults R on a Project Specific Basis
(2 answers)
Closed 9 years ago.
I am developing my first package, and it is aimed at users who are new to R, so I am trying to minimize the amount of R skill required to use it. As a result, I want a function that changes the defaults of other functions within my package. But I get the following error, "cannot add bindings to a locked environment", which means the environment of the package is locked and I am not allowed to change the default values of its functions.
Here is an example that throws a similar error:
library(ggplot2)
assign(formals(geom_point)$position, "somethingelse", pos="package:ggplot2")
When I try assignInNamespace I get:
assignInNamespace(formals(geom_point)$position, "somethingelse", pos = "package:ggplot2")
Error in bindingIsLocked(x, ns) : no binding for "identity"
Here is an example of what I hope to achieve.
default <- function(x = c("A", "B", "C")) {
  x
}
default()

change.default <- function(x) {
  formals(default)$x <<- x # Notice the global assignment
}
change.default(1:3)
default()
I am aware that this is far from the recommended approach, but I am willing to cut corners to improve the learning curve of the package. Is there a way to achieve this?
This question has been marked as a duplicate of Setting Function Defaults R on a Project Specific Basis. This is a different situation, as this question concerns how to allow the user in an interactive session to change the defaults of a function - not how to actually do it. The old question could not have been solved with the options() function, and it is therefore a different question.
I think the colloquial way to achieve what you want is via options, and packages do in fact do so, e.g., lattice (although it uses its own special options) or ascii.
Furthermore, this is also done in base R, e.g., the famous and notorious default for stringsAsFactors.
If you look at ?read.table or ?data.frame you see: stringsAsFactors = default.stringsAsFactors(). Inspecting this function reveals:
> default.stringsAsFactors
function ()
{
    val <- getOption("stringsAsFactors")
    if (is.null(val))
        val <- TRUE
    if (!is.logical(val) || is.na(val) || length(val) != 1L)
        stop("options(\"stringsAsFactors\") not set to TRUE or FALSE")
    val
}
<bytecode: 0x000000000b068478>
<environment: namespace:base>
The relevant part here is getOption("stringsAsFactors") which produces:
> getOption("stringsAsFactors")
[1] TRUE
Changing it is achieved like this:
> options(stringsAsFactors = FALSE)
> getOption("stringsAsFactors")
[1] FALSE
To do what you want, your package would need to set an option, and the function would take its value from that option. Another function could then change the option:
options(foo = c("A", "B", "C"))
default <- function(x = getOption("foo")) {
  x
}
default()

change.default <- function(x) {
  options(foo = x)
}
change.default(1:3)
default()
If you want your package to set the options when loaded, you need to create a .onAttach or .onLoad function in zzz.R. My afex package, for example, does this and changes the default contrasts. In your case it could look like the following:
.onAttach <- function(libname, pkgname) {
  options(foo = c("A", "B", "C"))
}
ascii does it via .onLoad (roughly, .onLoad runs when the namespace is loaded and .onAttach when the package is attached to the search path; Writing R Extensions has the details).
Preferably, a function has the following things:
Input arguments
A function body which does something with those arguments
Output arguments
So in your situation, where you want to change something about the behavior of a function, changing the input arguments is the best way to go. See for example my answer to another post.
You could also use an option to save some global settings (e.g. which font to use, or the path where the packages you use are stored); see the answer of @James in the question I linked above. But use these things sparingly, as they make the code hard to read. I would primarily use them read-only, i.e. set them once (either by the package or the user) and not allow functions to change them.
The unreadability stems from the fact that the behavior of the function is not determined solely locally (i.e. by the code directly working with it), but also by settings far away. That makes it hard to tell what a function does purely by looking at the code calling it; you have to dig through much more code to fully understand what is going on. Moreover, if other functions can change those options, it becomes even harder to predict what a given function will do, as its behavior then depends on the history of function calls. And here my earlier recommendation for read-only options comes back into play: if the options are read-only, some of these readability problems are lessened.

hiding personal functions in R

I have a few convenience functions in my .Rprofile, such as this handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting and I do this with rm(list=ls()) which deletes all my user created objects AND my custom functions. I'd really like to not blow up my custom functions.
One way around this seems to be creating a package with my custom functions so that my functions end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
Combine attach and sys.source to source into an environment and attach that environment. Here I have two functions in file my_fun.R:
foo <- function(x) {
  mean(x)
}

bar <- function(x) {
  sd(x)
}
Before I load these functions, they are obviously not found:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
Create an environment and source the file into it:
> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)
Still not visible, as we haven't attached anything:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
and when we do so, they are visible, and because we have attached a copy of the environment to the search path, the functions survive being rm()-ed:
> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5
I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember the copy on the search path is just that, a copy. If the functions are fairly stable and you're not editing them then the above might be useful but it is probably more hassle than it is worth if you are developing the functions and modifying them.
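To see the copy semantics concretely (a sketch continuing the example above):
## suppose my_fun.R has been edited since the attach()
sys.source("my_fun.R", envir = myEnv)
foo(1:10)       # still the old foo: the attached copy is unchanged
detach("myEnv") # detach and re-attach to pick up the new definitions
attach(myEnv)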
A second option is to just name them all .foo rather than foo, as ls() will not return objects named like that unless the argument all = TRUE is set:
> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo" ".Random.seed"
Here are two ways:
1) Have each of your function names start with a dot, e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE); therefore they won't be passed to your rm command.
or,
2) Put this in your .Rprofile
attach(list(
  f = function(x) x,
  g = function(x) x * x
), name = "MyFunctions")
The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace, and they will be accessible almost the same as if they were in your workspace. search() will display your search list and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace, the rm command you normally use won't remove them. If you do wish to remove them, use detach("MyFunctions").
Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:
R> q("no")
followed by
M-x R
to create a new session---which re-reads the .Rprofile. Easy, fast, and cheap.
Other than that, private packages are the way in my book.
Another alternative: keep the functions in a separate file which is sourced within .RProfile. You can re-source the contents directly from within R at your leisure.
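A minimal sketch of that setup (the path is just an example):
## in ~/.Rprofile:
source("~/R/my_functions.R")
## after a clean-out, just re-source at the prompt:
rm(list = ls())
source("~/R/my_functions.R")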
I find that often my R environment gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.
The simple function below was my solution. It does 2 things:
1) deletes all non-function objects that do not begin with a capital letter and then
2) saves the environment as an RData file
(requires the R.oo package)
cleanup = function(filename = "C:/mymainR.RData") {
  library(R.oo)
  # create a dataframe listing all personal objects
  everything = ll(envir = 1)
  # get the objects that are not functions
  nonfunction = as.vector(everything[everything$data.class != "function", 1])
  # non-function objects that do not begin with a capital letter should be
  # deleted ('^' anchors the pattern to the first character of the name)
  trash = nonfunction[grep('^[[:lower:]]', nonfunction)]
  remove(list = trash, pos = 1)
  # save the R environment
  save.image(filename)
  print(paste("New, CLEAN R environment saved in", filename))
}
In order to use this function, 3 rules must always be kept:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects that I want to keep permanently available.
3) Obsolete functions must be removed manually with rm.
Obviously this isn't a general solution for everyone...and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can backup as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions) and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one...I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.
Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.
Patrick Mohr
An nth, quick and dirty option would be to use lsf.str() when calling rm(), to list all the functions in the current workspace and spare them... and it lets you name the functions as you wish.
# anchor each function name ('^name$') so only exact matches are kept
pattern <- paste0('^', lsf.str(), '$', collapse = "|")
rm(list = ls()[-grep(pattern, ls())])
I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
Similar to Gavin's answer, the following loads a file of functions but without leaving an extra environment object around:
if('my_namespace' %in% search()) detach('my_namespace'); source('my_functions.R', attach(NULL, name='my_namespace'))
This removes the old version of the namespace if it was attached (useful for development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version you will build up multiple attached environments of the same name.
Should you wish to see which functions have been loaded, look at the output of
ls('my_namespace')
To unload, use
detach('my_namespace')
These attached functions, like a package, will not be deleted by rm(list=ls()).
