Related
I have a bunch of functions and I'm trying to keep my workspace clean by defining them in an environment and attaching the environment. Some of the functions are S3 generics, and they don't seem to play well with this approach.
A minimum example of what I'm experiencing requires 4 files:
testfun.R
ttt.xxx <- function(object) print("x")
ttt <- function(object) UseMethod("ttt")
ttt2 <- function() {
yyy <- structure(1, class="xxx")
ttt(yyy)
}
In testfun.R I define an S3 generic ttt and a method ttt.xxx, I also define a function ttt2 calling the generic.
testenv.R
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
In testenv.R I source testfun.R to an environment, which I attach.
test1.R
source("testfun.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test1.R sources testfun.R to the global environment. Both ttt2 and a direct function call work.
test2.R
source("testenv.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test2.R uses the "attach" approach. ttt2 still works (and prints "x" to the console), but the direct function call fails:
Error in UseMethod("ttt") :
no applicable method for 'ttt' applied to an object of class "xxx"
however, calling ttt and ttt.xxx without arguments show that they are known, ls(pos=2) shows they are on the search path, and sloop::s3_dispatch(ttt(xxx)) tells me it should work.
This questions is related to Confusion about UseMethod search mechanism and the link therein https://blog.thatbuthow.com/how-r-searches-and-finds-stuff/, but I cannot get my head around what is going on: why is it not working and how can I get this to work.
I've tried both R Studio and R in the shell.
UPDATE:
Based on the answers below I changed my testenv.R to:
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
if (is.null(.__S3MethodsTable__.))
.__S3MethodsTable__. <- new.env(parent = baseenv())
for (func in grep(".", ls(envir = test_env), fixed = TRUE, value = TRUE))
.__S3MethodsTable__.[[func]] <- test_env[[func]]
rm(test_env, func)
... and this works (I am only using "." as an S3 dispatching separator).
It’s a little-known fact that you must use .S3method() to define methods for S3 generics inside custom environments (outside of packages).1 The reason almost nobody knows this is because it is not necessary in the global environment; but it is necessary everywhere else since R version 3.6.
There’s virtually no documentation of this change, just a technical blog post by Kurt Hornik about some of the background. Note that the blog post says the change was made in R 3.5.0; however, the actual effect you are observing — that S3 methods are no longer searched in attached environments — only started happening with R 3.6.0; before that, it was somehow not active yet.
… except just using .S3method will not fix your code, since your calling environment is the global environment. I do not understand the precise reason why this doesn’t work, and I suspect it’s due to a subtle bug in R’s S3 method lookup. In fact, using getS3method('ttt', 'xxx') does work, even though that should have the same behaviour as actual S3 method lookup.
I have found that the only way to make this work is to add the following to testenv.R:
if (is.null(.__S3MethodsTable__.)) {
.__S3MethodsTable__. <- new.env(parent = baseenv())
}
.__S3MethodsTable__.$ttt.xxx <- ttt.xxx
… in other words: supply .GlobalEnv manually with an S3 methods lookup table. Unfortunately this relies on an undocumented S3 implementation detail that might theoretically change in the future.
Alternatively, it “just works” if you use ‘box’ modules instead of source. That is, you can replace the entirety of your testenv.R by the following:
box::use(./testfun[...])
This code treats testfun.R as a local module and loads it, attaching all exported names (via the attach declaration [...]).
1 (and inside packages you need to use the equivalent S3method namespace declaration, though if you’re using ‘roxygen2’ then that’s taken care of for you)
First of all, my advice would be: don't try to reinvent R packages. They solve all the problems you say you are trying to solve, and others as well.
Secondly, I'll try to explain what went wrong in test2.R. It calls ttt on an xxx object, and ttt.xxx is on the search list, but is not found.
The problem is how the search for ttt.xxx happens. The search doesn't look for ttt.xxx in the search list, it looks for it in the environment from which ttt was called, then in an object called .__S3MethodsTable__.. I think there are two reasons for this:
First, it's a lot faster. It only needs to look in one or two places, and the table can be updated whenever a package is attached or detached, a relatively rare operation.
Second, it's more reliable. Each package has its own methods table, because two packages can use the same name for generics that have nothing to do with each other, or can use the same class names that are unrelated. So package code needs to be able to count on finding its own definitions first.
Since your call to ttt() happens at the top level, that's where R looks first for ttt.xxx(), but it's not there. Then it looks in the global .__S3MethodsTable__. (which is actually in the base environment), and it's not there either. So it fails.
There is a workaround that will make your code work. If you run
.__S3MethodsTable__. <- list2env(list(ttt.xxx = ttt.xxx))
as the last line of testenv.R, then you'll create a methods table in the global environment. (Normally there isn't one there, because that's user space, and R doesn't like putting things there unless the user asks for it.)
R will find that methods table, and will find the ttt.xxx method that it defines. I wouldn't be surprised if this breaks some other aspect of S3 dispatch, so I don't recommend doing it, but give it a try if you insist on reinventing the package system.
I'm writing a package in which I would like to create a new generic method, called "analyze", wich do different things according to the argument class. Similar to print that has print.lm, print.aov etc.
In the R folder of my package, I created two files, "analyze.lm" and "analyze.aov" that contain eponym functions. However, if I run analyze(fit) on an lm object, it does nothing because R only recognizes analyze.lm and not the root function ("analyze" only).
I've tried adding an "analyze.R" file, which either contained setMethod() (but that errored), setGeneric("analyze", function(x) attributes(x)) (but that did not solve the issue) or an analyze() function that prints "NULL". However, if I then run analyze(fit) on an lm object, in prints NULL instead of running the analyze.lm class method.
How could I create a generic method, similar to base print, that behaves differently according to argument class, and than I maintained splitted in different files (analyze.lm.R, analyze.aov.R etc.). Thanks!
Add a generic function like this:
analyze <- function(object, ...){
UseMethod("analyze")
}
Is there a way to allow a method of an S4 object to directly adjust the values inside the slots of that object without copying the entire object into memory and having to re-write it to the parent environment at the end of the method? Right now I have an object that has slots where it keeps track of its own state. I call a method that advances it to the next state, but right now it seems like I have to assign() each value (or a copy of the object invoking the method) back to the parent environment. As a result, the object oriented code seems to be running a lot slower than code that simply adjusts the various state variables in a loop.
R has three object oriented (OO) systems: S3, S4 and Reference Classes (where the latter were for a while referred to as [[R5]], yet their official name is Reference Classes).
Reference Classes (or refclasses) are new in R 2.12. They fill a long standing need for mutable objects that had previously been filled by non-core packages like R.oo, proto and mutatr. While the core functionality is solid, reference classes are still under active development and some details will change. The most up-to-date documentation for Reference Classes can always be found in ?ReferenceClasses.
There are two main differences between reference classes and S3 and S4:
Refclass objects use message-passing OO
Refclass objects are mutable: the usual R copy on modify semantics do
not apply.
These properties makes this object system behave much more like Java and C#.
read more here:
http://adv-r.had.co.nz/R5.html
http://www.inside-r.org/r-doc/methods/ReferenceClasses
I asked this question on the R-list myself, and found a work-around to simulate a pass by reference, something in the style of :
eval(
eval(
substitute(
expression(object#slot <<- value)
,env=parent.frame(1) )
)
)
Far from the cleanest code around I'd say...
A suggestion coming from the R-help list, uses an environment to deal with these cases.
EDIT : tweaked code inserted.
setClass("MyClass", representation(.cache='environment',masterlist="list"))
setMethod("initialize", "MyClass",
function(.Object, .cache=new.env()) {
.Object#masterlist <- list()
callNextMethod(.Object, .cache=.cache)
})
sv <- function(object,name,value) {} #store value
setMethod("sv",signature=c("MyClass","character","vector"),
function(object, name, value) {
object#.cache$masterlist[[name]] <- value
})
rv <- function(object,name) {} #retrieve value
setMethod("rv",signature=c("MyClass","character"),
function(object, name) {
return(object#.cache$masterlist[[name]])
})
As far as I know (and if I get you correctly), you have to recopy the whole object. You can't easily pass values by reference, it is always passed "by value". So once you have modified (a copy of) your object, you need to recopy it back to your object.
John Chamber is pretty explicit about it in his book Software for Data Analysis. It's a way to avoid surprises or side effects.
I think there are some workaround using environments, but I can't help with this.
Is it possible in R to protect function names (or variables in general) so that they cannot be masked.
I recently spotted that this can be a problem when creating a data frame with the name "new", which masked a function used by lmer and thus stopped it working. (Recovery is easy once you know what the problem is, here "rm(new)" did it.)
There is an easy workaround for your problem, without worrying about protecting variable names (though playing with lockBinding does look fun). If a function becomes masked, as in your example, it is still possible to call the masked version, with the help of the :: operator.
In general, the syntax is packagename::variablename.
(If the function you want has not been exported from the package, then you need three colons instead, :::. This shouldn't apply in this case however.)
Maybe use environments! This is a great way to separate namespaces. For example:
> a <- new.env()
> assign('printer', function(x) print(x), envir=a)
> get('printer', envir=a)('test!')
[1] "test!"
#hdallazuanna recommends (via Twitter)
new <- 1
lockBinding('new', globalenv())
this makes sense when the variable is user created but does not, of course, prevent overwriting a function from a package.
I had the reverse problem from the OP, and I wanted to prevent my custom functions in .Rprofile from being overridden when I defined a variable with the same name as a function, but I ended up putting my functions to ~/.R.R and I added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))
My question is about avoiding namespace pollution when writing modules in R.
Right now, in my R project, I have functions1.R with doFoo() and doBar(), functions2.R with other functions, and main.R with the main program in it, which first does source('functions1.R'); source('functions2.R'), and then calls the other functions.
I've been starting the program from the R GUI in Mac OS X, with source('main.R'). This is fine the first time, but after that, the variables that were defined the first time through the program are defined for the second time functions*.R are sourced, and so the functions get a whole bunch of extra variables defined.
I don't want that! I want an "undefined variable" error when my function uses a variable it shouldn't! Twice this has given me very late nights of debugging!
So how do other people deal with this sort of problem? Is there something like source(), but that makes an independent namespace that doesn't fall through to the main one? Making a package seems like one solution, but it seems like a big pain in the butt compared to e.g. Python, where a source file is automatically a separate namespace.
Any tips? Thank you!
I would explore two possible solutions to this.
a) Think more in a more functional manner. Don't create any variables outside of a function. so, for example, main.R should contain one function main(), which sources in the other files, and does the work. when main returns, none of the clutter will remain.
b) Clean things up manually:
#main.R
prior_variables <- ls()
source('functions1.R')
source('functions2.R')
#stuff happens
rm(list = setdiff(ls(),prior_variables))`
The main function you want to use is sys.source(), which will load your functions/variables in a namespace ("environment" in R) other than the global one. One other thing you can do in R that is fantastic is to attach namespaces to your search() path so that you need not reference the namespace directly. That is, if "namespace1" is on your search path, a function within it, say "fun1", need not be called as namespace1.fun1() as in Python, but as fun1(). [Method resolution order:] If there are many functions with the same name, the one in the environment that appears first in the search() list will be called. To call a function in a particular namespace explicitly, one of many possible syntaxes - albeit a bit ugly - is get("fun1","namespace1")(...) where ... are the arguments to fun1(). This should also work with variables, using the syntax get("var1","namespace1"). I do this all the time (I usually load just functions, but the distinction between functions and variables in R is small) so I've written a few convenience functions that loads from my ~/.Rprofile.
name.to.env <- function(env.name)
## returns named environment on search() path
pos.to.env(grep(env.name,search()))
attach.env <- function(env.name)
## creates and attaches environment to search path if it doesn't already exist
if( all(regexpr(env.name,search())<0) ) attach(NULL,name=env.name,pos=2)
populate.env <- function(env.name,path,...) {
## populates environment with functions in file or directory
## creates and attaches named environment to search() path
## if it doesn't already exist
attach.env(env.name)
if( file.info(path[1])$isdir )
lapply(list.files(path,full.names=TRUE,...),
sys.source,name.to.env(env.name)) else
lapply(path,sys.source,name.to.env(env.name))
invisible()
}
Example usage:
populate.env("fun1","pathtofile/functions1.R")
populate.env("fun2","pathtofile/functions2.R")
and so on, which will create two separate namespaces: "fun1" and "fun2", which are attached to the search() path ("fun2" will be higher on the search() list in this case). This is akin to doing something like
attach(NULL,name="fun1")
sys.source("pathtofile/functions1.R",pos.to.env(2))
manually for each file ("2" is the default position on the search() path). The way that populate.env() is written, if a directory, say "functions/", contains many R files without conflicting function names, you can call it as
populate.env("myfunctions","functions/")
to load all functions (and variables) into a single namespace. With name.to.env(), you can also do something like
with(name.to.env("fun1"), doStuff(var1))
or
evalq(doStuff(var1), name.to.env("fun1"))
Of course, if your project grows big and you have lots and lots of functions (and variables), writing a package is the way to go.
If you switch to using packages, you get namespaces as a side-benefit (provided you use a NAMESPACE file). There are other advantages for using packages.
If you were really trying to avoid packages (which you shouldn't), then you could try assigning your variables in specific environments.
Well avoiding namespace pollution, as you put it, is just a matter of diligently partitioning the namespace and keeping your global namespace uncluttered.
Here are the essential functions for those two kinds of tasks:
Understanding/Navigating the Namespace Structure
At start-up, R creates a new environment to store all objects created during that session--this is the "global environment".
# to get the name of that environment:
globalenv()
But this isn't the root environment. The root is an environment called "the empty environment"--all environments chain back to it:
emptyenv()
returns: <environment: R_EmptyEnv>
# to view all of the chained parent environments (which includes '.GlobalEnv'):
search()
Creating New Environments:
workspace1 = new.env()
is.environment(workspace1)
returns: [1] TRUE
class(workspace1)
returns: [1] "environment"
# add an object to this new environment:
with(workspace1, attach(what="/Users/doug/Documents/test_obj.RData",
name=deparse(substitute(what)), warn.conflicts=T, pos=2))
# verify that it's there:
exists("test_obj", where=workspace1)
returns: [1] TRUE
# to locate the new environment (if it's not visible from your current environment)
parent.env(workspace1)
returns: <environment: R_GlobalEnv>
objects(".GlobalEnv")
returns: [1] "test_obj"
Coming from python, et al., this system (at first) seemed to me like a room full of carnival mirrors. The R Gurus on the other hand seem to be quite comfortable with it. I'm sure there are a number of reasons why, but my intuition is that they don't let environments persist. I notice that R beginners use 'attach', as in attach('this_dataframe'); I've noticed that experienced R users don't do that; they use 'with' instead eg,
with(this_dataframe, tapply(etc....))
(I suppose they would achieve the same thing if they used 'attach' then 'detach' but 'with' is faster and you don't have to remember the second step.) In other words, namespace collisions are avoided in part by limiting the objects visible from the global namespace.