Method dispatch for generic `plot` function in R

How does R dispatch plot functions? The standard generic is defined as
plot <- function (x, y, ...) UseMethod("plot")
So one would expect every plot method to take the arguments x and y. Yet there is a variety of other methods with different arguments. Some methods only have the argument x:
plot.data.frame <- function (x, ...)
others even have neither x nor y:
plot.formula <- function(formula, data = parent.frame(), ..., subset,
ylab = varnames[response], ask = dev.interactive())
How does this work and where is this documented?
Background
In my package papeR (see GitHub) I want to replace the function plot.data.frame, which is defined in the R package graphics, with my own version. Yet this is usually not allowed:
Do not replace registered S3 methods from base/recommended packages,
something which is not allowed by the CRAN policies and will mean that
everyone gets your method even if your namespace is unloaded.
as Brian Ripley told me the last time I tried to do such a thing. A possible solution is as follows:
If you want to change the behaviour of a generic, say predict(), for
an existing class or two, you could add such a generic in your own
package with default method stats::predict, and then register modified
methods for your generic (in your own package).
For other generics (e.g. toLatex) I could easily implement this, but with plot I am having problems. I added the following to my code:
## overwrite standard generic
plot <- function(x, y, ...)
UseMethod("plot")
## per default fall back to standard generic
plot.default <- function(x, y, ...)
graphics::plot(x, y, ...)
## now specify modified plot function for data frames
plot.data.frame <- function(x, variables = names(x), ...)
This works for data frames and for plots with x and y. Yet it does not work if I try to plot a formula etc.:
Error in eval(expr, envir, enclos) :
argument "y" is missing, with no default
I also tried to use
plot.default <- function(x, y, ...)
UseMethod("graphics::plot")
but then I get
Error in UseMethod("graphics::plot") :
no applicable method for 'graphics::plot' applied to an object of class "formula"
So the follow-up question is: how can I fix this?
[Edit:] Using my solution below fixes the problems within the package. Yet, plot.formula is broken afterwards:
library("devtools")
install_github("hofnerb/papeR")
example(plot.formula, package="graphics") ## still works
library("papeR")
example(plot, package = "papeR") ## works
### BUT
example(plot.formula, package="graphics") ## is broken now

Thanks to @Roland I solved part of my problem.
It seems that the position of the arguments is used for method dispatch (not only their names). Names are, however, partially used. So with Roland's example
> plot.myclass <- function(a, b, ...)
> print(b)
> x <- 1
> y <- 2
> class(x) <- "myclass"
we have
> plot(x, y)
[1] 2
> plot(a = x, b = y)
[1] 2
but if we use the standard argument names
> plot(x = x, y = y)
Error in print(b) (from #1) : argument "b" is missing, with no default
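it doesn't work. As one can see, x is correctly used for dispatch, but the method is then called with the original argument names x and y, which match neither a nor b; both end up in ... and b is "missing". Also, we cannot swap a and b: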
> plot(b = y, a = x)
Error in plot.default(b = y, a = x) :
argument 2 matches multiple formal arguments
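Here dispatch fell through to plot.default (see the error message), and there a partially matches several formal arguments, such as ann, axes and asp. Yet one can use a different order if the argument to dispatch on is the first unnamed argument: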
> plot(b = y, x)
[1] 2
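The observations above can be reproduced with a small toy generic that has the same formals as plot(), i.e. function(x, y, ...). The names toy, f and d are made up for this sketch. The comments describe what we expect from the behaviour seen in this thread: an argument named x wins; otherwise the first unnamed argument is used; if there is neither, the first argument supplied drives dispatch (the last case is what the menarche question further below runs into).

```r
# Toy generic with the same formals as plot(): function(x, y, ...)
toy <- function(x, y, ...) UseMethod("toy")
toy.default    <- function(x, y, ...) "default"
toy.formula    <- function(formula, data = NULL, ...) "formula"
toy.data.frame <- function(x, ...) "data.frame"

f <- y ~ x                          # an object of class "formula"
d <- data.frame(x = 1:3, y = 4:6)   # an object of class "data.frame"

toy(f, data = d)            # "formula": f is the first unnamed argument
toy(x = d, f)               # "data.frame": the argument named x is used
toy(formula = f, data = d)  # "formula": nothing matches x, the first argument is used
toy(data = d, formula = f)  # "data.frame": same rule, the data frame comes first
```

Dispatch only picks the method; the arguments are then passed on with their original names. That is why plot(x = x, y = y) above reaches plot.myclass but leaves a and b missing.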
Solution to the real problem:
I had to use
plot.default <- function(x, y, ...)
graphics::plot(x, y, ...)
The real issue was internally in my plot.data.frame method where I was using something along the lines of:
plot(x1 ~ y1)
without specifying data. Usually this works because data = parent.frame() by default, but somehow it did not work in this case. I now use plot(y1, x1), which works like a charm. So this was my own sloppiness.

Related

How to use object from another function in a function without explicitely passing it to the function

I'm sure there's already an answer to it, but I'm not sure about the right terms to search for.
I thought that in R, a function that uses a variable not defined in the function will look for it in the environment it is called from (the parent environment), no?
So I thought, the following would work:
f_xyz <- function (y, z)
{
x + y + z
}
f_x <- function()
{
x <- 1
y <- 2
z <- 3
f_xyz(y, z)
}
f_x()
x is defined and created in the f_x function, so when calling the f_xyz function WITHIN f_x, I thought it would find x there and use it. However, that is not the case. Instead I get the error Error in f_xyz(y, z) : object 'x' not found.
If I create the f_xyz function in f_x, the error doesn't appear.
What am I missing?
f_x <- function()
{
f_xyz <- function (y, z)
{x + y + z}
x <- 1
y <- 2
z <- 3
f_xyz(y, z)
}
f_x()
[1] 6
R uses lexical scoping:
the values of free variables are searched for in the environment in which the function was defined.
Function f_xyz was defined in the global environment, but x was defined in the environment of f_x, resulting in the error 'object not found'. In the latter case, f_xyz is defined inside f_x, so its enclosing environment is the frame of f_x, where it finds x.
Python's nested functions follow the same lexical rule; looking variables up in the caller's frame instead, as the question expects, would be dynamic scoping, which R does not use.
See here for details.
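The rule becomes visible once a different x lives in the global environment. The sketch below reuses the question's functions; the global x <- 100 and the *_dyn variants are added only for illustration. The last part shows that reaching into the caller's frame is possible with parent.frame(), though passing x as an argument is the cleaner design.

```r
x <- 100                             # a global x, so the lookup is visible

f_xyz <- function(y, z) x + y + z    # defined at top level: its free x is looked up in globalenv()

f_x <- function() {
  x <- 1                             # local to f_x; invisible to f_xyz
  f_xyz(2, 3)
}

f_x()                                       # 105: the global x was used
identical(environment(f_xyz), globalenv())  # TRUE: the defining environment

# Looking x up in the caller's frame (dynamic scoping) must be requested explicitly
f_xyz_dyn <- function(y, z) get("x", envir = parent.frame()) + y + z
f_x_dyn <- function() {
  x <- 1
  f_xyz_dyn(2, 3)
}
f_x_dyn()                                   # 6
```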
Nevertheless, you probably want to use partial application to fix some arguments in advance and pass x explicitly to the function. It is very messy to write a function that relies on something other than its arguments.
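A sketch of that advice: x becomes an ordinary argument of f_xyz, and a small hypothetical helper partial() fixes it in advance (packages such as purrr ship a ready-made partial()).

```r
f_xyz <- function(x, y, z) x + y + z   # x is now an explicit argument

# Hypothetical helper: fix some arguments now, supply the rest later
partial <- function(f, ...) {
  fixed <- list(...)                    # evaluated immediately, so the values are captured
  function(...) do.call(f, c(fixed, list(...)))
}

f_x <- function() {
  x <- 1
  g <- partial(f_xyz, x = x)            # g(y, z) behaves like f_xyz(1, y, z)
  g(y = 2, z = 3)
}

f_x()   # 6
```

Because the fixed values travel with g, g can be handed to other functions without any of them having to know where x came from.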

function return as another function parameter with eval() in R

I am having difficulty learning how to use eval() to evaluate a function.
Suppose I have a function:
sq <- function(y){ y**2 }
You can evaluate this function like this:
call <- match.call(expand.dots = FALSE)
call[[1]] <- as.name('sq')
call$y <- 0.2
call <- call[c(1,3)]
eval(call)
and it will give 0.2^2 = 0.04.
But if I want to calculate something like sq(y), where y = sin(x), I might write:
call <- match.call(expand.dots = FALSE)
call[[1]] <- as.name('sq')
call$y <- as.name('sin')
call$x <- 0.2
call <- call[c(1,3:4)]
eval(call)
It gives me this error:
Error in sq(y = sin, x = 0.2) : unused argument (x = 0.2)
It seems that R does not recognize x as an argument of sin and treats it as an argument of sq instead. How can we tell R that x is an argument of sin?
Also, R seems to be the only language I have learned that uses eval() to evaluate a function (I know C++ and Python, but haven't seen that syntax before). What is the difference (or advantage) of evaluating a function this way instead of calling sq(y=sin(x=0.2))?
Is there a good book or tutorial on its usage, and on when to use which of the two approaches? Thanks!
PS: the example above is a simplified version of the code in the mlogit package I am studying, in which the log-likelihood is returned by calling 'lnl.slogit', passed to 'mlogit.optim' and optimized (Line 407 of https://github.com/cran/mlogit/blob/master/R/mlogit.R). I used the same method as the package code to call two functions, but I got the error above.
The code is trying to pass two things that do not fit: an argument x to sq, although sq has no x argument, and the function sin itself as argument y, although a number is required, not a function.
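To tell R that the value belongs to sin, build the inner call as its own call object and use that object as the value of y. A minimal sketch reusing sq from the question:

```r
sq <- function(y) y^2

cl <- call("sq")              # the call sq()
cl$y <- call("sin", 0.2)      # nested call object: sq(y = sin(0.2))
print(cl)                     # sq(y = sin(0.2))
eval(cl)                      # evaluates sin(0.2) first, then squares it: 0.0394695
```

Setting cl$x, as in the question, adds a second top-level argument to sq; nesting a call keeps the argument inside sin.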
Try this:
x <- 0.2
cl <- call("sq", y = quote(sin(x)))
cl
## sq(y = sin(x))
eval(cl)
## [1] 0.0394695
or maybe what you want is:
x <- 0.2
cl <- call("sq", y = sin(x))
cl
## sq(y = 0.198669330795061)
eval(cl)
## [1] 0.0394695
or
match.fun("sq")(sin(x))
## [1] 0.0394695
or just:
sq(sin(x))
## [1] 0.0394695
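Note that ordinarily you do not have to use eval. Just calling the function with its arguments is enough to evaluate it, as in the last line of code.
The regression functions in R, such as lm(), internally use non-standard evaluation (they rebuild their own call with match.call() and evaluate it as a model.frame() call in the caller's environment) so that the variables in the formula and data are found in the right environment, but ordinarily that is not needed in other contexts.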
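For completeness, here is a minimal working version of the pattern the question copied from mlogit. It has to run inside a function, because match.call() describes the call of the enclosing function. The wrapper name sq_wrapper and the extra tag argument are hypothetical; lm() follows the same outline, but this is a simplified illustration, not its actual code.

```r
sq <- function(y) y^2

# Hypothetical wrapper: receives its arguments unevaluated, keeps only the
# ones sq() understands, and evaluates the rebuilt call in the caller's frame
sq_wrapper <- function(y, ...) {
  cl <- match.call(expand.dots = FALSE)  # e.g. sq_wrapper(y = sin(x), ... = list(tag = "a"))
  keep <- match("y", names(cl), 0L)      # position of the y argument (0 if absent)
  cl <- cl[c(1L, keep)]                  # drop everything else, including ...
  cl[[1L]] <- quote(sq)                  # call sq() instead of sq_wrapper()
  eval(cl, parent.frame())               # sin(x) is evaluated where the user wrote it
}

x <- 0.2
sq_wrapper(y = sin(x), tag = "a")        # 0.0394695
```

The benefit over calling sq(sin(x)) directly is only that the wrapper can inspect, filter or redirect its arguments before anything is evaluated; without such a need, the plain call is simpler.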

Redefinition of generic for plot function breaks plot.formula

CRAN policies do not allow replacing single methods (for generic functions) that are defined in base or recommended packages. Instead, package authors are advised to define their own version of the standard generic together with a xxx.default method, which then calls the original standard generic.
An example from my package papeR is as follows:
## overwrite standard generic
plot <- function(x, y, ...)
UseMethod("plot")
## per default fall back to standard generic
plot.default <- function(x, y, ...)
graphics::plot(x, y, ...)
## now specify modified plot function for data frames
plot.data.frame <- function(x, variables = names(x), ...)
The complete code can be found on GitHub.
For various generics such as toLatex this method works perfectly. Yet, the above plot method definitions break the standard plot.formula function:
library("devtools")
install_github("hofnerb/papeR")
## still works:
example(plot.formula, package="graphics")
### But:
library("papeR")
example(plot.formula, package="graphics")
## Error in eval(expr, envir, enclos) : object 'Ozone' not found
I can still use
graphics:::plot.formula(Ozone ~ Wind, data = airquality)
though. Other plot functions keep working, see e.g.
example("plot", package = "graphics")
Additional Information
In my NAMESPACE I have
importFrom("graphics", "plot", "plot.default", ...)
export(plot, plot.data.frame, ...)
S3method(plot, default)
S3method(plot, data.frame)
For a previous discussion on this topic (with a different focus) see also here.
For a discussion and solution (define a new_class and use plot.new_class) see
https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000463.html
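A minimal sketch of that alternative follows. The class name my_df and the plotting body (boxplots for numeric columns, bar charts otherwise) are hypothetical placeholders. Instead of replacing plot or plot.data.frame, objects get an extra class and a method is defined for that class only. In a package this would be declared with S3method(plot, my_df) in the NAMESPACE; defined at top level, as here, dispatch finds it directly.

```r
# Constructor for a hypothetical subclass of data.frame
my_df <- function(x) {
  class(x) <- c("my_df", class(x))
  x
}

# Method for the new class only; plot.data.frame and plot.formula stay untouched
plot.my_df <- function(x, variables = names(x), ...) {
  for (v in variables) {
    if (is.numeric(x[[v]])) {
      boxplot(x[[v]], main = v, ...)
    } else {
      barplot(table(x[[v]]), main = v, ...)
    }
  }
  invisible(x)
}

d <- my_df(data.frame(a = c(1, 5, 3), b = c("u", "v", "u")))
plot(d)                                 # dispatches to plot.my_df
plot(Ozone ~ Wind, data = airquality)   # the formula method is unaffected
```

The trade-off is that users opt in by converting their data frame first; plain data frames keep the standard graphics behaviour.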

Why does order matter when using "data" and "formula" keyword arguments?

In R, why is it that the order of the data and formula keywords matters when plotting? I thought that with named arguments order isn't supposed to matter...
For an example of what I mean, check out this code:
library(MASS)
data(menarche)
# Correct formulation (apparently):
plot(formula=Menarche/Total ~ Age, data=menarche)
# In contrast, note how the following returns an error:
plot(data=menarche, formula=Menarche/Total ~ Age)
Is this just a quirk of the plot function or is this behavior exhibited in other functions as well?
It is related to S3 methods for the S3 generic plot(). S3 dispatches methods based on the first argument; however, the details are complicated here because formula is allowed as a special exception to the usual generic arguments of plot(), which are x and y plus ...:
> args(plot)
function (x, y, ...)
NULL
Hence, in the first case the plot.formula() method is run because the first argument supplied is a formula, and the supplied arguments match the formals of plot.formula():
> args(graphics:::plot.formula)
function (formula, data = parent.frame(), ..., subset, ylab = varnames[response],
ask = dev.interactive())
NULL
for example:
> debugonce(graphics:::plot.formula)
> plot(formula=Menarche/Total ~ Age, data=menarche)
debugging in: plot.formula(formula = Menarche/Total ~ Age, data = menarche)
debug: {
m <- match.call(expand.dots = FALSE)
[...omitted...]
In contrast, when you call plot(data=menarche, formula=Menarche/Total ~ Age), the first argument is a data frame and hence the graphics:::plot.data.frame method is called:
> plot(data=menarche, formula=Menarche/Total ~ Age)
Error in is.data.frame(x) : argument "x" is missing, with no default
> traceback()
3: is.data.frame(x)
2: plot.data.frame(data = menarche, formula = Menarche/Total ~ Age)
1: plot(data = menarche, formula = Menarche/Total ~ Age)
but because that method expects an argument x, which you didn't supply, you get the error about missing x.
So, in a sense, the order of named arguments doesn't and shouldn't matter. But when S3 generics are in play, method dispatch kicks in first to decide which method the arguments are passed on to, and when no argument is named x and none is unnamed, that decision is made by whichever argument comes first. After that, it is which arguments you supplied (not their order) that will often catch you out, especially when mixing formula methods with non-formula methods.
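Given that rule, the robust way to reach the formula method is to supply the formula first, preferably unnamed. A short sketch with the question's data (MASS is a recommended package, so it normally ships with R):

```r
library(MASS)    # provides the menarche data set
data(menarche)

# Safest form: formula first and unnamed, so dispatch always sees the formula
plot(Menarche/Total ~ Age, data = menarche)

# Also works: named, but still the first argument supplied
plot(formula = Menarche/Total ~ Age, data = menarche)
```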

Object not found error when passing model formula to another function

I have a weird problem with R that I can't seem to work out.
I've tried to write a function that performs K-fold cross validation for a model chosen by the stepwise procedure in R. (I'm aware of the issues with stepwise procedures, it's purely for comparison purposes) :)
Now the issue is that if I define the function parameters (linmod, k, direction) and run the contents of the function, it works flawlessly. BUT if I run it as a function, I get an error saying the datas.train object can't be found.
I've tried stepping through the function with debug() and the object clearly exists, but R says it doesn't when I actually run the function. If I just fit a model using lm() it works fine, so I believe it's a problem with the step function in the loop, while inside a function. (try commenting out the step command, and set the predictions to those from the ordinary linear model.)
#CREATE A LINEAR MODEL TO TEST FUNCTION
lm.cars <- lm(mpg~.,data=mtcars,x=TRUE,y=TRUE)
#THE FUNCTION
cv.step <- function(linmod,k=10,direction="both"){
response <- linmod$y
dmatrix <- linmod$x
n <- length(response)
datas <- linmod$model
form <- formula(linmod$call)
# generate indices for cross validation
rar <- n/k
xval.idx <- list()
s <- sample(1:n, n) # permutation of 1:n
for (i in 1:k) {
xval.idx[[i]] <- s[(ceiling(rar*(i-1))+1):(ceiling(rar*i))]
}
#error calculation
errors <- R2 <- 0
for (j in 1:k){
datas.test <- datas[xval.idx[[j]],]
datas.train <- datas[-xval.idx[[j]],]
test.idx <- xval.idx[[j]]
#THE MODELS+
lm.1 <- lm(form,data= datas.train)
lm.step <- step(lm.1,direction=direction,trace=0)
step.pred <- predict(lm.step,newdata= datas.test)
step.error <- sum((step.pred-response[test.idx])^2)
errors[j] <- step.error/length(response[test.idx])
SS.tot <- sum((response[test.idx] - mean(response[test.idx]))^2)
R2[j] <- 1 - step.error/SS.tot
}
CVerror <- sum(errors)/k
CV.R2 <- sum(R2)/k
res <- list()
res$CV.error <- CVerror
res$CV.R2 <- CV.R2
return(res)
}
#TESTING OUT THE FUNCTION
cv.step(lm.cars)
Any thoughts?
When you created your model lm.cars, its formula was assigned its own environment. This environment stays with the formula unless you explicitly change it. So when you extract the formula with the formula() function, the original environment of the model is included.
I don't know if I'm using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:
cv.step <- function(linmod,k=10,direction="both"){
response <- linmod$y
dmatrix <- linmod$x
n <- length(response)
datas <- linmod$model
.env <- environment() ## identify the environment of cv.step
## extract the formula in the environment of cv.step
form <- as.formula(linmod$call, env = .env)
## The rest of your function follows
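The mechanism can be shown without lm() or step(). model.frame(), which lm() uses internally, evaluates extra arguments such as weights in the data and then in the formula's environment, not in the frame of the function that happens to call it. The two helper functions below are hypothetical; the second applies the same idea as .env above by resetting the formula's environment.

```r
# Hypothetical helpers: both keep their weights in a local variable w
frame_without_fix <- function(form) {
  w <- c(1, 2, 3)
  model.frame(form, data = data.frame(y = 1:3), weights = w)
}

frame_with_fix <- function(form) {
  w <- c(1, 2, 3)
  environment(form) <- environment()  # make free variables resolve in this frame
  model.frame(form, data = data.frame(y = 1:3), weights = w)
}

f <- y ~ 1                            # created at top level: environment(f) is globalenv()
res <- try(frame_without_fix(f))      # fails: w is looked up via globalenv(), not here
frame_with_fix(f)[["(weights)"]]      # the local weights are found
```

Resetting the environment has a cost: variables that were only visible from the formula's original environment are no longer found, so it is safest when everything the model needs is either in data or local to the function.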
Another problem that can cause this is passing a character string to lm instead of a formula. Character vectors have no environment, so when lm converts the string to a formula, the formula apparently does not get the local environment automatically. If one then uses an object as weights that is not in the data argument's data.frame but is a local function argument, one gets a 'not found' error. This behavior is not very easy to understand. It is probably a bug.
Here's a minimal reproducible example. This function takes a data.frame, two variable names and a vector of weights to use.
residualizer = function(data, x, y, wtds) {
#the formula to use
f = "x ~ y"
#residualize
resid(lm(formula = f, data = data, weights = wtds))
}
residualizer2 = function(data, x, y, wtds) {
#the formula to use
f = as.formula("x ~ y")
#residualize
resid(lm(formula = f, data = data, weights = wtds))
}
d_example = data.frame(x = rnorm(10), y = rnorm(10))
weightsvar = runif(10)
And test:
> residualizer(data = d_example, x = "x", y = "y", wtds = weightsvar)
Error in eval(expr, envir, enclos) : object 'wtds' not found
> residualizer2(data = d_example, x = "x", y = "y", wtds = weightsvar)
1 2 3 4 5 6 7 8 9 10
0.8986584 -1.1218003 0.6215950 -0.1106144 0.1042559 0.9997725 -1.1634717 0.4540855 -0.4207622 -0.8774290
It is a very subtle bug. If one goes into the function environment with browser, one can see the weights vector just fine, but it somehow is not found in the lm call!
The bug becomes even harder to track down if one uses the name weights for the weights variable. In this case, since lm can't find the weights object, it falls back to the function weights() from the stats package, thus throwing an even stranger error:
Error in model.frame.default(formula = f, data = data, weights = weights, :
invalid type (closure) for variable '(weights)'
Don't ask me how many hours it took me to figure this out.

Resources