Redefinition of generic for plot function breaks plot.formula - r

CRAN policies do not allow replacing individual methods (for generic functions) that are defined in base or recommended packages. They advise package authors to replace the standard generic instead and to add an xxx.default method which then calls the original standard generic.
An example from my package papeR is as follows:
## overwrite standard generic
plot <- function(x, y, ...)
UseMethod("plot")
## per default fall back to standard generic
plot.default <- function(x, y, ...)
graphics::plot(x, y, ...)
## now specify modified plot function for data frames
plot.data.frame <- function(x, variables = names(x), ...)
The complete code can be found on GitHub.
For various generics such as toLatex this method works perfectly. Yet, the above plot method definitions break the standard plot.formula function:
library("devtools")
install_github("hofnerb/papeR")
## still works:
example(plot.formula, package="graphics")
### But:
library("papeR")
example(plot.formula, package="graphics")
## Error in eval(expr, envir, enclos) : object 'Ozone' not found
I can still use
graphics:::plot.formula(Ozone ~ Wind, data = airquality)
though. Other plot functions keep working, see e.g.
example("plot", package = "graphics")
Additional Information
In my NAMESPACE I have
importFrom("graphics", "plot", "plot.default", ...)
export(plot, plot.data.frame, ...)
S3method(plot, default)
S3method(plot, data.frame)
For a previous discussion on this topic (with a different focus) see also here.

For a discussion and solution (define a new_class and use plot.new_class) see
https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000463.html

Related

optimParallel in Package of the same name cannot find C_dnorm function

I want to optimize a function from a package in R using optimParallel. Until now I have only optimized functions that I wrote in my own environment, and that worked. But functions from any package don't work and I get an error. I checked with .libPaths() that the paths are the same on each node, and I used Sys.info() to check for any differences. Here is an example (which is not meaningful, but it should show my problem):
library(optimParallel)
.libPaths()
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
cl <- makeCluster(2) #also tried to set "master" to my IP
clusterEvalQ(cl, .libPaths())
[[1]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
[[2]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
setDefaultCluster(cl)
optimParallel(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'C_dnorm' not found
#for comparison
optim(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
[1] -5.263924
What am I doing wrong?
Edit: The problem is solved in optimParallel version 0.7-4
The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel
For older versions:
A workaround is to wrap dnorm() into a function defined in the .GlobalEnv.
library("optimParallel")
cl <- makeCluster(2)
setDefaultCluster(cl)
f <- function(x, mean) dnorm(x, mean=mean)
optimParallel(par=0, f, mean=1, method="L-BFGS-B")$par
[1] -5.263924
A more difficult task is to explain why the problem occurs:
optimParallel() uses parallel::parLapply() to evaluate f.
parLapply() has the arguments cl, X, fun.
If we used parLapply() without pre-processing the arguments passed via ... of optimParallel(), f could not have arguments named cl, X, or fun, because this would cause errors like:
Error in lapply(X = x, FUN = f, ...) (from #2) :
formal argument "X" matched by multiple actual arguments
Simply speaking, optimParallel() avoids this error by removing all arguments from f, putting them into an environment and evaluating f in that environment.
One problem of that approach occurs when f is defined in another R package and links to compiled code. That case is illustrated in the question above.
Suggestions for better approaches to handle the issue are welcome. I opened a corresponding question here. As long as there is no better solution, one can use the workaround illustrated above.
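The name clash described above can be reproduced without a cluster. This is a minimal sketch that uses lapply() as a stand-in for parLapply(), since both have a formal argument named X. The functions f and g are hypothetical and are not part of optimParallel.

```r
# Hypothetical objective whose second argument happens to be called X,
# the same name as the first formal argument of lapply()/parLapply().
f <- function(x, X) x + X

# Passing X through ... collides with lapply's own X argument.
msg <- tryCatch(
  lapply(X = 1:3, FUN = f, X = 10),
  error = function(e) conditionMessage(e)
)
print(msg)

# Workaround in the spirit of the answer: fix the extra argument in a
# wrapper, so that nothing named X has to travel through ... anymore.
g <- function(x) f(x, X = 10)
res <- unlist(lapply(1:3, g))
print(res)
```

The wrapper plays the same role as the function defined in the .GlobalEnv in the workaround: all extra arguments are bound before the apply-style function sees them.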
Reasoning that your error message indicated that the parallel processes were not getting adequate information, I looked at the examples in the documentation of the optimParallel package. The first one defines a helper function which will carry an environment with it, but it otherwise resembles yours in some respects.
library(optimParallel)
set.seed(123); x <- rnorm(n=1000, mean=1, sd=2)
negll <- function(par, x) -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
o1 <- optimParallel(par=c(0, 1), fn=negll, x=x, method="L-BFGS-B", lower=c(-Inf, 0.0001))
o1$par
#[1] 1.032256 1.982398
That example also differs from yours in that it uses data to estimate the parameters. I'm not sure what your result means, whereas I do understand the values returned by the modification of that example that I posted here. The minimum of the negative log-likelihood for that particular data (not completely reproducible since I forgot to set a seed) is at a mean of 1.126 and an sd of 2.007.
For an example of how to create a situation where the environment of a non-base package gets carried to the workers, see this prior answer: parallel::clusterExport how to pass nested functions from global environment?

What causes this weird behaviour in the randomForest.partialPlot function?

I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete.) It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w){
partialPlot(x=rf, pred.data=iris, x.var=as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!
The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
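The same lookup can be tried without installing randomForest. The sketch below uses print.htest from the stats package as a stand-in; to my knowledge it is a registered S3 method that is not exported, but treat that as an assumption. getS3method() is an alternative to ::: that does not require knowing which package owns the method.

```r
library(stats)
library(utils)

# List the methods of a generic; non-exported ones are flagged in the output.
print(head(methods("print")))

# Retrieve the method from the S3 registry by generic and class name.
m <- getS3method("print", "htest")

# The triple-colon route described above should return the same function.
same <- identical(m, stats:::print.htest)
print(same)
```

getAnywhere("print.htest") is a third option when neither the package nor the generic is known up front.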
Now, as for the other questions, the function can be written with do.call() and you can pass w without a wrapper.
f <- function(w) {
do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to check for the presence of as.character() inside the function. If you can leave a comment we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.
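To illustrate why do.call() helps, here is a small sketch with a hypothetical function pick_column() that captures its argument with substitute(), in the way many non-standard-evaluation functions do. It is not the partialPlot source, only an assumed stand-in that shows the same kind of failure.

```r
# Hypothetical stand-in for a function that inspects its argument unevaluated.
pick_column <- function(data, col) {
  col <- substitute(col)
  name <- if (is.character(col)) {
    col                 # a literal string was supplied
  } else if (is.name(col)) {
    deparse(col)        # a bare symbol is taken as the column name itself
  } else {
    eval(col)           # any other expression is evaluated
  }
  data[, name]
}

dat <- data.frame(a = 1:3, b = 4:6)
x1 <- "a"

# The symbol x1 is deparsed to "x1", which is not a column of dat.
direct <- tryCatch(pick_column(dat, x1), error = function(e) e)

# do.call() evaluates w first, so pick_column() receives the string itself.
f <- function(w) do.call("pick_column", list(data = dat, col = w))
via_do_call <- f(x1)
print(via_do_call)
```

Because do.call() builds the call from already evaluated values, substitute() inside the callee sees the character string rather than the symbol w.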

Method dispatch for generic `plot` function in R

How does R dispatch plot functions? The standard generic is defined as
plot <- function (x, y, ...) UseMethod("plot")
So usually, all plot methods need arguments x and y. Yet, there exists a variety of other functions with different arguments. Some functions only have argument x:
plot.data.frame <- function (x, ...)
others even have neither x nor y:
plot.formula <- function(formula, data = parent.frame(), ..., subset,
ylab = varnames[response], ask = dev.interactive())
How does this work and where is this documented?
Background
In my package papeR (see GitHub) I want to replace the function plot.data.frame, which is defined in the R package graphics with my own version. Yet, this is usually not allowed
Do not replace registered S3 methods from base/recommended packages,
something which is not allowed by the CRAN policies and will mean that
everyone gets your method even if your namespace is unloaded.
as Brian Ripley let me know last time I tried to do such a thing. A possible solution is as follows:
If you want to change the behaviour of a generic, say predict(), for
an existing class or two, you could add such as generic in your own
package with default method stats::predict, and then register modified
methods for your generic (in your own package).
For other methods I could easily implement this (e.g. toLatex), yet, with plot I am having problems. I added the following to my code:
## overwrite standard generic
plot <- function(x, y, ...)
UseMethod("plot")
## per default fall back to standard generic
plot.default <- function(x, y, ...)
graphics::plot(x, y, ...)
## now specify modified plot function for data frames
plot.data.frame <- function(x, variables = names(x), ...)
This works for data frames and plots with x and y. Yet, it does not work if I try to plot a formula etc:
Error in eval(expr, envir, enclos) :
argument "y" is missing, with no default
I also tried to use
plot.default <- function(x, y, ...)
UseMethod("graphics::plot")
but then I get
Error in UseMethod("graphics::plot") :
no applicable method for 'graphics::plot' applied to an object of class "formula"
So the follow up question is how I can fix this?
[Edit:] Using my solution below fixes the problems within the package. Yet, plot.formula is broken afterwards:
library("devtools")
install_github("hofnerb/papeR")
example(plot.formula, package="graphics") ## still works
library("papeR")
example(plot, package = "papeR") ## works
### BUT
example(plot.formula, package="graphics") ## is broken now
Thanks to @Roland I solved part of my problem.
It seems that the position of the arguments is used for method dispatch (and not only the names). Names are, however, partially used. So with Roland's example
> plot.myclass <- function(a, b, ...)
> print(b)
> x <- 1
> y <- 2
> class(x) <- "myclass"
we have
> plot(x, y)
[1] 2
> plot(a = x, b = y)
[1] 2
but if we use the standard argument names
> plot(x = x, y = y)
Error in print(b) (from #1) : argument "b" is missing, with no default
it doesn't work. As one can see, x is correctly used for the dispatch but b is then "missing". Also, we cannot swap a and b:
> plot(b = y, a = x)
Error in plot.default(b = y, a = x) :
argument 2 matches multiple formal arguments
Yet, one could use a different order if the argument one wants to dispatch for is the first (?) element without name:
> plot(b = y, x)
[1] 2
Solution to the real problem:
I had to use
plot.default <- function(x, y, ...)
graphics::plot(x, y, ...)
The real issue was internally in my plot.data.frame method where I was using something along the lines of:
plot(x1 ~ y1)
without specifying data. Usually this works as data = parent.frame() per default. Somehow in this case this wasn't working. I now use plot(y1, x1) which works like a charm. So this was my sloppiness.

R: Evaluate an expression in a data frame with arguments that are passed as an object

I want to write a function that evaluates an expression in a data frame, but one that does so using expressions that may or may not contain user-defined objects.
I think the magic word is "non-standard evaluation", but I cannot quite figure it out just yet.
One simple example (yet realistic for my purposes): Say, I want to evaluate an lm() call for variables found in a data frame.
mydf <- data.frame(x=1:10, y=1:10)
A function that does so can be written as follows:
f <- function(df, expr){
expr <- substitute(expr)
pf <- parent.frame()
eval(expr, df, pf)
}
Such that I get what I want using the following command.
f(mydf, lm(y~x))
# Call:
# lm(formula = y ~ x)
#
# Coefficients:
# (Intercept) x
# 1.12e-15 1.00e+00
Nice. However, there are cases in which it is more convenient to save the model equation in an object before calling lm(). Unfortunately the above function no longer does it.
fml <- y~x
f(mydf, lm(fml))
# Error in eval(expr, envir, enclos): object 'y' not found
Can someone explain why the second call doesn't work? How could the function be altered, such that both calls would lead to the desired results? (desired=fitted model)
Cheers!
From ?lm, re data argument:
If not found in data, the variables are taken from environment(formula)
In your first case, the formula is created in your eval(expr, df, pf) call, so the environment of the formula is an environment based on df. In the second case, the formula is created in the global environment, which is why it doesn't work.
Because formulas come with their own environment, they can be tricky to handle in NSE.
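The role of the formula environment can be checked directly. This is a minimal sketch with illustrative variable names. It compares a formula created outside the data frame with one created while evaluating inside it, and then shows the plain alternative of passing data explicitly.

```r
mydf <- data.frame(x = 1:10, y = 2 * (1:10))

# Created here: its environment does not contain the column y.
fml <- y ~ x
has_y_outside <- exists("y", envir = environment(fml), inherits = FALSE)

# Created while evaluating inside the data frame: its environment holds the columns.
fml_in <- with(mydf, y ~ x)
has_y_inside <- exists("y", envir = environment(fml_in), inherits = FALSE)

print(c(outside = has_y_outside, inside = has_y_inside))

# Passing data explicitly avoids relying on environment(formula) at all.
fit <- lm(fml, data = mydf)
print(coef(fit))
```

Supplying data = is the least invasive option when the caller controls the lm() call; it leaves the formula's own environment untouched.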
You could try:
with(mydf,
{
print(lm(y~x))
fml <- y~x
print(lm(fml))
}
)
but that probably isn't ideal for you. Short of checking whether any names in the captured parameter resolve to formulas, and re-assigning their environments, you'll have some trouble. Worse, it isn't even necessarily obvious that re-assigning the environment is the right thing to do. In many cases, you do want to look in the formula environment.
There was a loosely related discussion on this issue on R Chat:
Ben Bolker outlines an issue
Josh O'Brien points to some old references

Methods of summary() function

I'm using summary() on the output of the mle() function from stats4; the output belongs to class mle. I would like to find out how summary() estimates the standard deviation of the coefficients returned by mle(), but I do not see summary.mle in the list printed by methods(summary). Why can't I find a summary.mle() function?
(I guess the proper function is summary.mlm(), but I'm not sure of that and don't know why it would be mlm instead of mle.)
It's actually what summary.mle would be if it were an S3 method. S3 methods are created and then dispatched using the generic_function_name.class_of_first_argument mechanism, whereas S4 methods are dispatched on the basis of their argument "signature", which allows consideration of second and later arguments. This is how to get showMethods to display the code that is called when an S4 method is called. This is an instance where only the first argument is used as the signature. You can choose any of the object signatures that appear in the abbreviated output to specify the classes argument, and it is the includeDefs flag that prompts display of the code:
showMethods("summary",classes="mle", includeDefs=TRUE)
#---(output to console)----
Function: summary (package base)
object="mle"
function (object, ...)
{
cmat <- cbind(Estimate = object@coef, `Std. Error` = sqrt(diag(object@vcov)))
m2logL <- 2 * object@min
new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
}
As shown in
>library(stats4)
>showMethods("summary")
Function: summary (package base)
object="ANY"
object="mle"
The summary is interpreted in the S4 way. I don't know how to check the code in R directly, so I searched the source of stats4 for you.
In stats4/R/mle.R, there is:
setMethod("summary", "mle", function(object, ...){
cmat <- cbind(Estimate = object@coef,
`Std. Error` = sqrt(diag(object@vcov)))
m2logL <- 2*object@min
new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
})
So it creates an S4 object of class summary.mle. I guess you can trace the code yourself from here.
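The method can also be retrieved from within R, without reading the package source. A short sketch using functions from the methods package follows; the printed method body may differ slightly between R versions.

```r
library(methods)
library(stats4)

# Check that an S4 method for signature "mle" is defined for summary().
has_method <- existsMethod("summary", "mle")

# Fetch the method definition itself; printing it shows the code.
m <- getMethod("summary", signature = "mle")
print(m)

# Slots of the class of the object that the method returns.
slots <- slotNames("summary.mle")
print(slots)
```

getMethod() returns the definition for exactly the given signature; selectMethod() would additionally follow inheritance if no direct method existed.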

Resources