Is there a way to specify that a library should not throw warnings regarding name clashes and masked objects whenever it is attached? I imagine a solution would involve editing the DESCRIPTION file or one of the special functions such as .onAttach, but I can't find anything that solves this issue.
I ask because the warnings are unnecessary. I have defined my own S3 class, and the masked function is still called by the default method of the masking function:
median <- function(x, ...) UseMethod("median")
median.default <- stats::median.default
If a user calls median on a typical R data structure such as a vector, the median generic in my package calls the masked function automatically, so there is no real need for the user to be aware of the masking.
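For illustration, a minimal sketch of the situation described above (the "interval" class and its method body are hypothetical stand-ins for whatever the package actually defines):
# Hypothetical S3 class with its own median method:
median.interval <- function(x, ...) {
  # package-specific computation for "interval" objects goes here
}
# A plain numeric vector still dispatches to median.default,
# i.e. stats::median.default:
median(c(2, 4, 6, 8))  # 5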
I'm not sure if your question is that you don't want the user to see the warnings, or that you don't want the warnings to occur.
If the former, you might be able to wrap your library call in shhh from the tfse package. Or, if it's just for yourself, you could set warn.conflicts = FALSE when calling library().
If the latter, it would clearly be more elegant to rewrite the offending function so that it doesn't create a conflict in the namespace.
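For the first case, a minimal sketch using base R only (myPackage is a placeholder name):
library(myPackage, warn.conflicts = FALSE)          # skip the conflict report
suppressPackageStartupMessages(library(myPackage))  # silence startup messages too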
This question may seem basic, but it has bothered me for quite a while. The help document for many functions has ... as one of the arguments, but I can never quite get my head around this ... thing.
For example, suppose I have created a model say model_xgboost and want to make a prediction based on a dataset say data_tbl using the predict() function, and I want to know the syntax. So I look at its help document which says:
?predict
**Usage**
predict (object, ...)
**Arguments**
object a model object for which prediction is desired.
... additional arguments affecting the predictions produced.
The usage and the examples didn't really enlighten me, as I still had no idea what the valid syntax/arguments for the function are. An online course uses something like the following, which works:
data_tbl %>%
predict(model_xgboost, new_data = .)
However, looking across the help doc I cannot find the new_data argument. Instead it mentions a newdata argument in its Details section, which actually didn't work when I replaced new_data = . with newdata = .:
Error in `check_pred_type_dots()`:
! Did you mean to use `new_data` instead of `newdata`?
My questions are:
How do I know exactly what argument(s) / syntax can be used for a function like this?
Why new_data but not newdata in this example?
I might be missing something here, but is there any reference/resource about how to use/interpret a help document, in plain English? (A lot of documentation, including R help files, seems to give just a brief sentence like "additional arguments affecting the predictions produced", etc.)
@CarlWitthoft's answer is good; I want to add a little bit of nuance about this particular function. The reason the help page for ?predict is so vague is an unfortunate consequence of the fact that predict() is a generic function in R: that is, it's a function that can be applied to a variety of different object types, using slightly different (but appropriate) methods in each case. As such, the ?predict help page only lists object (which is required as the first argument in all methods) and ..., because different predict methods can take very different arguments/options.
If you call methods("predict") in a clean R session (before loading any additional packages) you'll see a list of 16 methods that base R knows about. After loading library("tidymodels"), the list expands to 69 methods. I don't know what class your object is (check class(model_xgboost)), but assuming that it's of class model_fit, we look at ?predict.model_fit to see
predict(object, new_data, type = NULL, opts = list(), ...)
This tells us that we need to call the new data new_data (and, reading a bit farther down, that it needs to be "A rectangular data object, such as a data frame")
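For concreteness, the inspection steps described above (assuming the model_xgboost object from the question):
methods("predict")    # predict methods visible in the current session
class(model_xgboost)  # e.g. "model_fit" for parsnip/tidymodels models
?predict.model_fit    # help page for the method that is actually dispatched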
The help page for predict says
Most prediction methods which are similar to those for linear
models have an argument ‘newdata’ specifying the first place to
look for explanatory variables to be used for prediction
(emphasis added). I don't know why the parsnip authors (the predict.model_fit method comes from the parsnip package) decided to use new_data rather than newdata; presumably it's in line with the tidyverse style guide, which says
Use underscores (_) (so called snake case) to separate words within a name.
In my opinion this might have been a mistake, but you can see that the parsnip/tidymodels authors realized that people are likely to make this mistake and added an informative error message, as shown in your example and noted e.g. here.
Among other things, the existence of ... in a function definition means you can pass in any additional arguments (values, functions, etc.) you want. In some cases the main function does not even use the ... itself, but passes them on to functions called inside the main function. Simple example:
foo <- function(x, ...) {
  y <- x^2
  plot(x, y, ...)  # the dots are forwarded directly to plot()
}
I know of functions which accept another function as an input argument, in which case the items to pass via ... are specific to the function you selected; see the example below.
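For example, lapply() forwards its ... to whichever function you supply, so the valid extra arguments depend entirely on that function:
# na.rm is an argument of mean(), not of lapply() itself:
lapply(list(c(1, NA, 3), c(4, 5)), mean, na.rm = TRUE)
# probs is an argument of quantile():
lapply(list(1:10, 1:100), quantile, probs = 0.9)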
I have a package which uses a data.frame based S4 class:
setClass(Class="foobar",
slots=c(a="character", b="character", c="character"),
contains="data.frame")
Works as intended. However, I observe weird warnings when combining with tidyverse:
df <- data.frame(ID=1:5)
df2 <- new("foobar", df)
as_tibble(df2)
The last statement triggers a warning message:
Warning message:
In class(x) <- c(subclass, tibble_class) :
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
This is because tidyverse does not support S4 data frames. This can be circumvented in downstream code by using asS3(df). However, users of my package may be puzzled if they see these warnings. I am now faced with the following choices and I don't really know which would be the most reasonable and correct:
Keep the S4 model and hope that the users won't mind seeing this warning each time they pass my data frames into something else.
Use S3. However, I already have another S4 class defined in published versions of my package, and I am afraid I would break someone's code.
Mix S3 and S4. Is it even allowed?
Is there another solution I might be overlooking?
There is no brilliant solution to this which is entirely within your control.
The tidyverse packages may call class<- on any data-frame-like object given to them, and as you have seen this will destroy the S4 nature of any object. This can't be worked around by (for instance) defining a method for coerce or calling setAs, as class<- doesn't use that mechanism. (class<- isn't generic either, so you can't set a method for it.) The only way to make the tidyverse support S4 is for its authors to alter the code to use as() or similar, and it doesn't look like that is at the top of their to-do list.
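A minimal reproduction of that behaviour, using the class from the question:
setClass("foobar", contains = "data.frame")
x <- new("foobar", data.frame(ID = 1:5))
isS4(x)                                       # TRUE
class(x) <- c("tbl_df", "tbl", "data.frame")  # the warning from the question
isS4(x)                                       # FALSE: the S4 nature is gone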
You are correct to be worried about dramatically altering the way your class works when you have released a version of your package already with an S4 class.
If:
your package is quite new and doesn't yet have many users;
you can do all you need to do with S3; and
you don't know of another package which has built new classes on top of yours
then it may be best to redefine it as S3, and include a message when your package is installed or loaded to say
thanks for installing myPackage v2. Code may be incompatible with v1.2 or earlier; see help(blah) for details
otherwise, stick with S4.
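If you do go the S3 route, a minimal sketch of such a startup message (the version numbers and help topic are placeholders):
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "Thanks for installing myPackage v2. ",
    "Code may be incompatible with v1.2 or earlier; ",
    "see help(\"myPackage-changes\") for details."
  )
}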
You can't exactly mix S3 and S4 for class definitions (you can for method definitions). The closest you can come is setOldClass which registers a S3 class as an S4 one (whereas you wanted the opposite). Still, that may help you achieve "you can do all you need to do with S3" above.
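A minimal sketch of setOldClass in action, with a hypothetical S4 generic describe() dispatching on the registered S3 class:
setOldClass(c("foobar", "data.frame"))
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "foobar", function(x) {
  cat("a foobar with", nrow(x), "rows\n")
})
fb <- structure(data.frame(ID = 1:5), class = c("foobar", "data.frame"))
describe(fb)  # S4 dispatch on an S3 object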
One other possibility is to define your own version of class<- which checks to see if an object of S4 class foobar is attempting to be coerced to S3 and calls the ordinary class<- if not. The cure is probably worse than the disease in this case; this will slow down all future S3 class conversions (since class<- is now an ordinary function call, not a primitive) but it should work in principle. Another reason that it is not recommended is that you are relying on no other package higher in the search path doing something similar (what if another package author had the same issue and wanted to do the same trick? Then the results would depend on which package was higher up the search path!)
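Strictly for illustration, and bearing in mind the caveats above, the trick might look like this; note that it only affects code that resolves class<- through the search path:
`class<-` <- function(x, value) {
  if (isS4(x) && methods::is(x, "foobar")) x <- methods::asS3(x)
  base::`class<-`(x, value)
}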
There are a number of tests which, applied to an object of a given class, produce information about that object. Consider objects of class "function". The function is.primitive() (base), or, from rlang, is_closure(), is_primitive_eager() or is_primitive_lazy(), provide information about a function object. However, using methods(class = "function") (with rlang loaded) does not return any of these functions:
[1] as.data.frame as.list coerce coerce<- fortify head latex plot print tail
Using extends(class1 = "function", maybe = TRUE, fullInfo = TRUE) shows two superclasses, "OptionalFunction" and "PossibleMethod".
Using completeClassDefinition(Class = "function", doExtends = TRUE) provides 23 subclasses. However, it appears to me (though I am not sure of this) that all or almost all of the super- and subclasses from these two functions are S4 classes, which I generally do not use. One of these subclasses is "genericFunction", so I tried to apply it to a base R function which I knew to be generic. is(object = plot, class2 = "genericFunction") returns TRUE even though plot() antedates S4 classes; and while there is no "is.generic" test in base R, there is an "isGeneric" test in the methods package. This suggests to me that plot() has been rewritten as an S4 object.
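For concreteness, the tests mentioned above can be run directly; the results for plot depend on your R version and which packages are loaded:
is.primitive(sum)            # TRUE: sum is a primitive
is.primitive(mean)           # FALSE: mean is a closure
rlang::is_primitive_eager(sum)
methods::isGeneric("plot")   # is an S4 generic for plot defined in this session?
is(plot, "genericFunction")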
At any rate, there are a lot of obvious potential properties of functions, like whether they are generic, for which there are no is.<whatever> tests that I can find, and I would like to know if there are other ways I can search for them, e.g., in packages.
A more generic way of asking this same question is whether there is any way of identifying functions that will accept objects of a specified class and not return an error or nonsense. If so, I could take the list of functions in the recommended packages, or in some specified package, and test whether each returns a sensible response when handed a function. This is not exactly an answer (such a method would return TRUE for quote(), for example), but it would at least cut the problem down to size.
When I attached the package ffbase for the first time it alerted me that the functions %in% and table were being masked from the base package. I use both functions quite a lot, so I immediately investigated what this means and I am not really sure I understand what's going on here.
As far as I can tell, for table this means that a new method was added:
methods(table)
[1] table.default* table.ff
And for %in%, it's truly been overwritten so the default is the ff version, with base playing backup:
getAnywhere(`%in%`)
2 differing objects matching '%in%' were found in the following places
package::ffbase
package:base
namespace:base
namespace:ffbase
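For reference, a package that masks %in% while keeping base as the fallback presumably defines something like the following sketch (not ffbase's actual source):
`%in%` <- function(x, table) UseMethod("%in%")
`%in%.default` <- function(x, table) base::`%in%`(x, table)
`%in%.ff` <- function(x, table) {
  # ff-specific implementation goes here
}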
I have two questions now. The first is: if a new method is added to an S3 generic, why would you need to warn about masking? In my mind, table isn't truly masked, because R just figures out what data type I have and dispatches the correct method, doesn't it?
And secondly, if you have actually overwritten a function then why does it still work if I do base functionality without specifying the right namespace?
x <- c(1, 23)
23 %in% x
[1] TRUE
I would have assumed I needed to use base::`%in%` to get this right?
I suppose this second question really boils down to this: I trust R when it comes to generic method dispatch, because the point of having a class is to signal which method you're supposed to use. But if package functions (not associated with a class) simply get loaded in the order the packages are attached, then I don't understand how R knows when the first one it encounters isn't going to work?
This question comes from a range of other questions that all deal with essentially the same problem. For some strange reason, using a function within another function sometimes fails, in the sense that variables defined within the local environment of the first function are not found in the second function.
The classic pattern, in pseudo-code:
ff <- function(x){
y <- some_value
some_function(y)
}
ff(x)
Error in eval(expr, envir, enclos) :
object 'y' not found
First I thought it had something to do with S4 methods and the scoping in there, but it also happens with other functions. I've had some interaction with the R development team, but all they did was direct me to the bug report site (which is not the most inviting one, I have to say). I never got any feedback.
As the problem keeps arising, I wonder if there is a logic explanation for it. Is it a common mistake made in all these cases, and if so, which one? Or is it really a bug?
Some of those questions:
Using functions and environments
R (statistical) scoping error using transformBy(), part of the doBy package.
How to use acast (reshape2) within a function in R?
Why can't I pass a dataset to a function?
Values not being copied to the next local environment
PS: I know about the R-devel list, in case you wondered...
R has both lexical and dynamic scope. Lexical scope works automatically, but dynamic scope must be implemented manually, and requires careful book-keeping. Only functions used interactively for data analysis need dynamic scope, so most authors (like me!) don't learn how to do it correctly.
See also: the standard non-standard evaluation rules.
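A short illustration of the difference (the eval()/parent.frame() idiom is how dynamic scope is emulated in R):
# Lexical scope: a function finds variables where it was defined.
make_adder <- function(n) function(x) x + n  # n is captured lexically
make_adder(2)(5)  # 7
# Dynamic scope must be emulated by explicitly inspecting the caller:
caller_y <- function() eval(quote(y), parent.frame())
f <- function() { y <- 10; caller_y() }
f()  # 10: y is found in the caller's frame, not lexically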
There are undoubtedly bugs in R, but a lot of the issues people have been having are quite often errors in the implementation of some_function, not in R itself. R has scoping rules (see http://cran.r-project.org/doc/manuals/R-intro.html#Scope) which, when combined with lazy evaluation of function arguments and the ability to eval arguments in other scopes, are extremely powerful but also often lead to subtle errors.
As Dirk mentioned in his answer, there isn't actually a problem with the code that you posted. In the links you posted in the question, there seems to be a common theme: some_function contains code that messes about with environments in some way. This messing is either explicit, using new.env and with, or implicit, using a data argument that probably has a line like
y <- eval(substitute(y), data)
The moral of the story is twofold. Firstly, try to avoid explicitly manipulating environments, unless you are really sure that you know what you are doing. And secondly, if a function has a data argument then put all the variables that you need the function to use inside that data frame.
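A minimal sketch of the failing pattern and its fix (some_function here is hypothetical, mimicking the data-argument idiom above):
# Looks in `data` first, then in some_function's own environment:
some_function <- function(expr, data) eval(substitute(expr), data)
ff <- function() {
  y <- 2
  some_function(a + y, data = data.frame(a = 1))
}
ff()  # Error: object 'y' not found
# Fix: give eval() the caller's frame as the enclosure:
some_function2 <- function(expr, data) {
  eval(substitute(expr), data, enclos = parent.frame())
}
ff2 <- function() {
  y <- 2
  some_function2(a + y, data = data.frame(a = 1))
}
ff2()  # 3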
Well there is no problem in what you posted:
/tmp$ cat joris.r
#!/usr/bin/r -t
some_function <- function(y) y^2
ff <- function(x){
y <- 4
some_function(y) # so we expect 16
}
print(ff(3)) # 3 is ignored
/tmp$ ./joris.r
[1] 16
/tmp$
Could you restate the problem and post an actual bug or misfeature?