Masking in R - Confusing behavior from a new package

When I attached the package ffbase for the first time it alerted me that the functions %in% and table were being masked from the base package. I use both functions quite a lot, so I immediately investigated what this means and I am not really sure I understand what's going on here.
As far as I can tell, for table this means that a new method was added:
methods(table)
[1] table.default* table.ff
And for %in%, it's truly been overwritten so the default is the ff version, with base playing backup:
getAnywhere(`%in%`)
2 differing objects matching '%in%' were found in the following places
package:ffbase
package:base
namespace:base
namespace:ffbase
I have two questions now. The first is: if a new method is added to an S3 generic, then why would you need to warn about masking? In my mind, table isn't truly masked, because doesn't R just figure out what data type I have and dispatch the correct method?
And secondly, if you have actually overwritten a function then why does it still work if I do base functionality without specifying the right namespace?
x <- c(1, 23)
23 %in% x
[1] TRUE
I would have assumed I would need to use base::`%in%` to get this right?
I suppose this second question really boils down to this: I trust R when it comes to generic method dispatch, because the point of having a class is to provide some way to signal which method you're supposed to use. But if you have this system where package functions (not associated with a class) just get found in order of package load, then I don't understand how R knows when the first one it encounters isn't going to work?
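One way to see why `23 %in% x` can keep working after masking is to build a small stand-in for the situation. The sketch below is only an illustration: `fake_pkg` and the `special_vec` class are hypothetical names, and the delegating body is an assumption about how a well-behaved masking function is typically written, not a copy of ffbase's code. R does not detect that the first match "isn't going to work"; it simply calls the first `%in%` on the search path, and that function itself decides to hand ordinary inputs back to base.

```r
# Hypothetical stand-in for a package that masks %in% (not ffbase's actual code).
fake_pkg <- new.env()
fake_pkg$`%in%` <- function(x, table) {
  if (inherits(x, "special_vec")) {
    # A real package would do its class-specific work here.
    stop("special_vec handling is not part of this sketch")
  }
  base::`%in%`(x, table)  # ordinary inputs are handed back to base
}

# Put the stand-in on the search path, ahead of package:base.
attach(fake_pkg, name = "fake_pkg", warn.conflicts = FALSE)

# Every attached environment that defines %in%, in lookup order.
where <- Filter(
  function(e) exists("%in%", envir = as.environment(e), inherits = FALSE),
  search()
)
print(where)

x <- c(1, 23)
res <- 23 %in% x  # resolves to the masking copy, which delegates to base
print(res)

detach("fake_pkg", character.only = TRUE)
```

The masking message is therefore about name lookup only: it tells you which copy is found first, not that the base behaviour has been lost.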

Related

how to get help on predict method from lme4 package R (really a question about R class methods help in general)

I'm reading through a tutorial that is using the lme4 package and one of the input options to predict is re.form=NA.
m_lmer <- lmer(log(beakh) ~ log(wingl) + (1 | taxon), data = d)
d$predict_lmer_population <- predict(m_lmer, re.form = NA)
I want to get help for the predict call, but clearly doing ?predict is incorrect.
I then tried asking for the class of the model:
> class(m_lmer)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
I then tried ?lmerMod which RStudio automagically changed to ?`lmerMod-class`. I get the addition of ` to the name because of the - "special character" but where did class come from?
The help then describes the "merMod" class, not "lmerMod". Why the name change (leading l dropped)?
After some searching in that help I found a link to predict.merMod
Further searching confirmed I could have done: methods('predict') and found the same method, although it is listed predict.merMod* for some reason (added * symbol).
In the end I feel like I would be able to find something similar much more quickly the next time but it still seems very hard to find good help for class methods in R. I'm not sure if this would work the same for S4 or R6 (from the documentation it seems predict.merMod is a S3 method)? It is not clear why the l was dropped from the class name (lmerMod to merMod) or why the -class suffix is needed when asking for help. I feel like I'm missing some extremely basic lesson on R documentation.
Throwing this "help in R" link in for reference that seems to omit class based methods help and also seems like it should just point to some official R documentation website rather than being such a long SO post ...
How to get help in R?
This is a very good question. There are a bunch of things going on here, having to do with (1) the difference between S3 and S4 classes and methods, and (2) the underlying class structures in lme4.
I want to get help for the predict call, but clearly doing ?predict is incorrect.
?predict gets you help for the generic function which, as you've noticed, isn't useful. In general it's up to the package developers to decide whether their specialized version of a particular method (e.g., the predict() method for merMod objects) is sufficiently special (e.g., has different or unusual arguments) that it should be documented separately. (Writing R Extensions says "If it is necessary or desired to provide an explicit function declaration (in a \usage section) for [a ...] method ...") (emphasis added).
In general, if they're documented, docs for S3 methods will be available as ?function.class, and S4 methods will be documented as ?"function,class-method" (which needs quotation marks since it has a hyphen in it).
The methods() function gives some clues about where to look: if the bbmle and lme4 packages are loaded, predict.merMod* and predict,mle2-method* both show up in the list (the stars mean the functions are hidden, i.e. you can use them by calling predict(my_fit), but the function definitions are not easily available).
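The same lookups can be tried without lme4 or bbmle installed. This is a minimal sketch using only the stats package that ships with R; the choice of `predict.lm` and `predict.loess` as examples is mine, and the help calls are left as comments because they open a help viewer.

```r
# All S3 methods registered for the predict() generic in the attached packages.
predict_methods <- as.character(methods("predict"))
has_lm_method <- "predict.lm" %in% predict_methods
print(has_lm_method)

# Everything that has a method for objects of class "lm".
print(methods(class = "lm"))

# Retrieve a method by generic and class, even if it is not exported
# (those are the entries marked with a star in the methods() listing).
loess_predict <- getS3method("predict", "loess")
print(names(formals(loess_predict)))

# Run these interactively to open the documentation:
#   ?predict.lm                  # S3 method: ?generic.class
#   ?"predict,mle2-method"       # S4 method: needs the bbmle package loaded
```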
I then tried asking for the class of the model:
class(m_lmer)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
lmer() produces an object of class lmerMod, which is a subclass of the merMod class (the other subclass is glmerMod, for GLMMs).
I then tried ?lmerMod which RStudio automagically changed to ?`lmerMod-class`. I get the addition of ` to the name because of the - "special character" but where did class come from?
I don't know that much about RStudio: the "-class" part is specific to methods for S4 classes.
The help then describes the "merMod" class, not "lmerMod". Why the name change (leading l dropped)?
See above.
The most opaque part of all of this (IMO) is figuring out S4 class hierarchies - if you say methods(class = "lmerMod") you only get two results (getL and show), it's hard to figure out that you need to say methods(class = "merMod") to get most of the available methods (i.e., only a few methods are specific to lmerMod and glmerMod subclasses - most are more general).
According to this answer you can find the subclasses of merMod as follows:
library(lme4)
cls <- getClass("merMod")
names(cls@subclasses)
## [1] "lmerMod" "glmerMod" "nlmerMod"
How about the other direction?
cls <- getClass("lmerMod")
names(cls@contains)
## [1] "merMod"
(Don't ask me more questions, I really don't understand S4 classes all that well!)
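For anyone who wants to poke at this without lme4, here is a small self-contained sketch of the same parent/child arrangement. The class names `parentMod` and `childMod` and the generic `describe` are hypothetical stand-ins for merMod, lmerMod and a method such as predict; only the methods package is used.

```r
library(methods)

# Hypothetical classes standing in for merMod (parent) and lmerMod (child).
setClass("parentMod", representation(coefs = "numeric"))
setClass("childMod", contains = "parentMod")

# A hypothetical generic with a method defined for the parent class only.
setGeneric("describe", function(object, ...) standardGeneric("describe"))
setMethod("describe", "parentMod", function(object, ...) {
  paste("model with", length(object@coefs), "coefficients")
})

fit <- new("childMod", coefs = c(0.5, 1.2))

# Walk the hierarchy in both directions.
sub_names <- names(getClass("parentMod")@subclasses)
super_names <- names(getClass("childMod")@contains)
print(sub_names)
print(super_names)

# Dispatch on the child object falls through to the parent's method.
out <- describe(fit)
print(out)

# selectMethod() follows inheritance, so it finds the parent's definition too.
inherited <- selectMethod("describe", "childMod")
```

This is why asking for methods of the child class alone can look so sparse: most of the behaviour is attached to the parent.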

Should I use S3 data frames in my package?

I have a package which uses a data.frame based S4 class:
setClass(Class="foobar",
slots=c(a="character", b="character", c="character"),
contains="data.frame")
Works as intended. However, I observe weird warnings when combining with tidyverse:
df <- data.frame(ID=1:5)
df2 <- new("foobar", df)
as_tibble(df2)
The last statement incites a warning message:
Warning message:
In class(x) <- c(subclass, tibble_class) :
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
This is because tidyverse does not support S4 data frames. This can be circumvented in downstream code by using asS3(df). However, users of my package may be puzzled if they see these warnings. I am now faced with the following choices and I don't really know which would be the most reasonable and correct:
Keep the S4 model and hope that the users won't mind seeing this warning each time they pass my data frames into something else.
Use S3. However, I already have another S4 class defined in published versions of my package. I am afraid that I would break someone's code.
Mix S3 and S4. Is it even allowed?
Is there another solution I might be overlooking?
There is no brilliant solution to this which is entirely within your control.
The tidyverse package may call class<- on any data-frame-like object given to it, and as you have seen this will destroy the S4 nature of any object. This can't be worked around by (for instance) defining a method for coerce or calling setAs, as class<- doesn't use that mechanism. (class<- isn't generic either, you can't set a method for it.) The only way to make tidyverse support S4 is for tidyverse's author to alter the code to use as or similar, and it doesn't look like that is top of their to-do-list.
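The effect can be reproduced in base R without loading tibble. The sketch below reuses the foobar class from the question (with one slot, for brevity) and assigns a multi-element S3 class by hand, which is an assumption about what the conversion does internally based on the call shown in the warning. It then shows the asS3() workaround mentioned in the question.

```r
library(methods)

# Same shape as the class in the question, trimmed to one slot.
setClass("foobar", slots = c(a = "character"), contains = "data.frame")
df2 <- new("foobar", data.frame(ID = 1:5))
print(isS4(df2))

# Assign a multi-element class vector, as the warning's call does,
# and capture the warning text instead of printing it.
msg <- tryCatch({
  x <- df2
  class(x) <- c("tbl_df", "tbl", "data.frame")
  "no warning"
}, warning = function(w) conditionMessage(w))
print(msg)

# Converting explicitly first gives downstream code a plain S3 data frame.
plain <- asS3(df2)
print(isS4(plain))
print(class(plain))
```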
You are correct to be worried about dramatically altering the way your class works when you have released a version of your package already with an S4 class.
If:
your package is quite new and doesn't yet have many users;
you can do all you need to do with S3; and
you don't know of another package which has built new classes on top of yours
then it may be best to redefine it as S3, and include a message when your package is installed or loaded to say
thanks for installing myPackage v2. Code may be incompatible with v1.2 or earlier; see help(blah) for details
otherwise, stick with S4.
You can't exactly mix S3 and S4 for class definitions (you can for method definitions). The closest you can come is setOldClass, which registers an S3 class as an S4 one (whereas you wanted the opposite). Still, that may help you achieve "you can do all you need to do with S3" above.
One other possibility is to define your own version of class<- which checks to see if an object of S4 class foobar is attempting to be coerced to S3 and calls the ordinary class<- if not. The cure is probably worse than the disease in this case; this will slow down all future S3 class conversions (since class<- is now an ordinary function call, not a primitive) but it should work in principle. Another reason that it is not recommended is that you are relying on no other package higher in the search path doing something similar (what if another package author had the same issue and wanted to do the same trick? Then the results would depend on which package was higher up the search path!)

Forcing package to attach without masking warning

Is there a way to specify that a library should not throw warnings regarding name clashes and masked objects whenever it is attached? I imagine a solution would involve editing the description or one of the special functions such as .onAttach but I can't find anything solving this issue.
I ask because the warnings are unneeded. I have defined my own S3 class and the masked function is still called by the default method of the masking function:
median <- function(x, ...) UseMethod("median")
median.default <- stats::median.default
In the event that a user is using median on a typical R data structure such as a vector, the median method in my package will call the masked function automatically, so there is no real need for the user to be aware of the masking.
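To make the pattern concrete, here is a runnable sketch of it. The `temperature` class and its method are hypothetical additions for illustration; the first two definitions are the ones from the question.

```r
# The masking generic and its default, as in the question.
median <- function(x, ...) UseMethod("median")
median.default <- stats::median.default

# Hypothetical class-specific method.
median.temperature <- function(x, ...) {
  paste(stats::median.default(unclass(x), ...), "degrees")
}

# A plain vector falls through to the original stats implementation.
plain_result <- median(c(1, 3, 10))
print(plain_result)

# An object of the new class gets the class-specific method.
temps <- structure(c(20, 22, 30), class = "temperature")
class_result <- median(temps)
print(class_result)
```

The generic really does replace stats::median on the search path, which is why R still reports the masking, even though the visible behaviour for ordinary vectors is unchanged.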
I'm not sure if your question is that you don't want the user to see the warnings, or that you don't want the warnings to occur.
If the former, you might be able to use shhh from the tfse package around your library call. Or, if it's just for yourself, you could set the warn.conflicts = FALSE argument when calling library().
If the latter, it would be clearly more elegant to rewrite the offending method so it doesn't conflict in the namespace.
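For the user-side options, a minimal sketch in base R looks like this. stats4 is used only because it ships with R; substitute the package you are actually attaching.

```r
# Option 1: ask library() not to report conflicts for this one call.
library(stats4, warn.conflicts = FALSE)

# Option 2: swallow all startup messages, which includes the masking notes.
detach("package:stats4", character.only = TRUE)
suppressPackageStartupMessages(library(stats4))

attached <- "package:stats4" %in% search()
print(attached)
```

Both are choices made by the person calling library(), so they do not help a package author who wants the message gone for every user.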

Scoping and functions in R 2.11.1 : What's going wrong?

This question comes from a range of other questions that all deal with essentially the same problem. For some strange reason, using a function within another function sometimes fails, in the sense that variables defined within the local environment of the first function are not found in the second function.
The classical pattern in pseudo-code :
ff <- function(x){
y <- some_value
some_function(y)
}
ff(x)
Error in eval(expr, envir, enclos) :
object 'y' not found
First I thought it had something to do with S4 methods and the scoping in there, but it also happens with other functions. I've had some interaction with the R development team, but all they did was direct me to the bug report site (which is not the most inviting one, I have to say). I never got any feedback.
As the problem keeps arising, I wonder if there is a logical explanation for it. Is it a common mistake made in all these cases, and if so, which one? Or is it really a bug?
Some of those questions :
Using functions and environments
R (statistical) scoping error using transformBy(), part of the doBy package.
How to use acast (reshape2) within a function in R?
Why can't I pass a dataset to a function?
Values not being copied to the next local environment
PS : I know the R-devel list, in case you wondered...
R has both lexical and dynamic scope. Lexical scope works automatically, but dynamic scope must be implemented manually, and requires careful book-keeping. Only functions used interactively for data analysis need dynamic scope, so most authors (like me!) don't learn how to do it correctly.
See also: the standard non-standard evaluation rules.
There are undoubtedly bugs in R, but a lot of the issues that people have been having are quite often errors in the implementation of some_function, not R itself. R has scoping rules (see http://cran.r-project.org/doc/manuals/R-intro.html#Scope) which, when combined with lazy evaluation of function arguments and the ability to eval arguments in other scopes, are extremely powerful but also often lead to subtle errors.
As Dirk mentioned in his answer, there isn't actually a problem with the code that you posted. In the links you posted in the question, there seems to be a common theme: some_function contains code that messes about with environments in some way. This messing is either explicit, using new.env and with, or implicit, using a data argument, in which case the function probably has a line like
y <- eval(substitute(y), data)
The moral of the story is twofold. Firstly, try to avoid explicitly manipulating environments, unless you are really sure that you know what you are doing. And secondly, if a function has a data argument then put all the variables that you need the function to use inside that data frame.
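A small reproduction may help show where the "object not found" error comes from. Both helper functions below are hypothetical; they differ only in where eval() is told to look once a name is not found in the data frame. With the two-argument form, the fallback is the helper's own frame, so a variable that is local to the caller is invisible.

```r
# Hypothetical helper that looks up names in `data`, then in its own frame.
narrow_eval <- function(data, expr) {
  eval(substitute(expr), data)
}

# Same helper, but falling back to the frame of whoever called it.
caller_eval <- function(data, expr) {
  eval(substitute(expr), data, parent.frame())
}

ff <- function(helper) {
  scale_by <- 10                 # local to ff, not in the data frame
  d <- data.frame(x = 1:3)
  helper(d, x * scale_by)
}

# The first helper cannot see scale_by; capture the error message.
bad <- tryCatch(ff(narrow_eval), error = function(e) conditionMessage(e))
print(bad)

# The second helper finds scale_by in ff's frame.
good <- ff(caller_eval)
print(good)
```

Putting scale_by into the data frame, as the second moral suggests, makes both helpers work.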
Well there is no problem in what you posted:
/tmp$ cat joris.r
#!/usr/bin/r -t
some_function <- function(y) y^2
ff <- function(x){
y <- 4
some_function(y) # so we expect 16
}
print(ff(3)) # 3 is ignored
$ ./joris.r
[1] 16
/tmp$
Could you restate and post an actual bug or misfeature?

Finding What You Need in R: focused searching within R and all (3,500+) CRAN Packages

Often in R, there are a dozen functions scattered across as many packages--all of which have the same purpose but of course differ in accuracy, performance, documentation, theoretical rigor, and so on.
How do you locate these--from within R and even from among the CRAN Packages which you have not installed?
So for instance: the generic plot function. Setting secondary ticks is much easier using a function outside of the base package:
minor.tick(nx=n, ny=n, tick.ratio=n)
Of course plot is in R core, but minor.tick is not, it's actually in Hmisc.
Of course, that doesn't show up in the documentation for plot, nor should you expect it to.
Another example: data-input arguments to plot can be supplied by an object returned from the function hexbin, again, this function is from a library outside of R core.
What would be great obviously is a programmatic way to gather these function arguments from the various libraries and put them in a single namespace?
*edit: (trying to re-state my example just above more clearly:) the arguments to plot supplied in R core, e.g., setting the axis tick frequency are xaxp/yaxp; however, one can also set a/t/f via a function outside of the base package, again, as in the minor.tick function from the Hmisc package--but you wouldn't know that just from looking at the plot method signature. Is there a meta function in R for this?*
So far, as i come across them, i've been manually gathering them, each set gathered in a single TextMate snippet (along with the attendant library imports). This isn't that difficult or time consuming, but i can only update my snippet as i find out about these additional arguments/parameters. Is there a canonical R way to do this, or at least an easier way?
Just in case that wasn't clear, i am not talking about the case where multiple packages provide functions directed to the same statistic or view (e.g., 'boxplot' in the base package; 'boxplot.matrix' in gplots; and 'bplots' in Rlab). What i am talking about is the case in which the function name is the same across two or more packages.
The "sos" package is an excellent resource. Its primary interface is the "findFn" command, which accepts a string (your search term), scans the "function" entries in Jonathan Baron's site search database, and returns the entries that contain the search term in a data frame (of class "findFn").
The columns of this data frame are: Count, MaxScore, TotalScore, Package, Function, Date, Score, Description, and Link. Clicking on "Link" in any entry's row will immediately pull up the help page.
An example: suppose you wanted to find all convolution filters across all 1800+ R packages.
library(sos)
cf = findFn("convolve")
This query will look up the term "convolve" anywhere in the function entries; in other words, it doesn't have to be the function name.
Keying in "cf" returns an HTML table of all matches found (23 in this case). This table is an HTML rendering of the data frame i mentioned just above. What is particularly convenient is that each column ("Count", "MaxScore", etc.) is sortable by clicking on the column header, so you can view the results by "Score", by "Package Name", etc.
(As an aside: when running that exact query, one of the results was the function "panel.tskernel" in a package called "latticeExtra". I was not aware this package had any time series filters in it and i doubt i would have discovered it otherwise.)
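findFn needs the sos package and a network connection. For packages that are already installed, base R has narrower tools that need neither. The sketch below is a complement under that assumption, not a replacement for the site-wide search described above.

```r
# Names of objects in the attached packages that match a pattern.
hits <- apropos("convol")
print(hits)

# Run these interactively:
#   help.search("convolution")          # search help pages of installed packages
#   ??convolution                       # shorthand for the same search
#   RSiteSearch("convolution filter")   # opens a web search in the browser
```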
Your question is not easy to answer. There is not one definitive function.
formals is the function that gives the named arguments to a function and their defaults in a named list, but you can always have variable arguments through the ... parameter and hidden named arguments picked up with an embedded hasArg call. To get a list of those you would have to use getAnywhere and then scan the expression for hasArg. I can't think of an automatic way of doing it yourself. That is, if the function's hidden arguments are not documented.
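A short sketch of both halves of that: reading the declared arguments with formals, and scanning a function body for hasArg. The function `f` is a hypothetical example of a function with a "hidden" argument; `stats::sd` and `graphics::plot.default` are used only because they ship with R.

```r
# Declared arguments and whether the function accepts "..." at all.
arg_names <- names(formals(stats::sd))
print(arg_names)
has_dots <- "..." %in% names(formals(graphics::plot.default))
print(has_dots)

# Hypothetical function that reads an undeclared argument via hasArg().
f <- function(x, ...) {
  if (methods::hasArg(verbose)) "verbose" else "quiet"
}

# formals() shows nothing about `verbose` ...
print(names(formals(f)))

# ... but the body can be scanned for calls to hasArg.
uses_hasArg <- "hasArg" %in% all.names(body(f))
print(uses_hasArg)

# The hidden argument changes the behaviour when supplied.
with_flag <- f(1, verbose = TRUE)
without_flag <- f(1)
```

The scan only tells you that hidden arguments exist somewhere in that function; arguments passed on through ... to other functions would need the same treatment for each callee.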
