I'm building a function for many model types which needs to extract the formula used to make the model. Is there a flexible way to do this? For example:
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)
What I need to do is extract the formula object "equation" once the function has been passed the model.
You could get what you wanted by:
model$call
# lm(formula = equation)
If you want to see how I found that out, use:
str(model)
Since you passed the formula as a named object ('equation') from the calling environment, the call stores only that name, so you might then need to evaluate it to get the formula itself:
eval(model$call[[2]])
# z ~ x + y
@JPMac offered a more compact method: formula(model). It's also worth looking at the mechanism used by the formula.lm function. The function named formula is generic, and you can use methods(formula) to see what S3 methods have been defined. Since the formula.lm method has an asterisk at its end, it is not exported, so you need to wrap it in getAnywhere:
> getAnywhere(formula.lm)
A single object matching ‘formula.lm’ was found
It was found in the following places
registered S3 method for formula from namespace stats
namespace:stats
with value
function (x, ...)
{
    form <- x$formula
    if (!is.null(form)) {
        form <- formula(x$terms)
        environment(form) <- environment(x$formula)
        form
    }
    else formula(x$terms)
}
<bytecode: 0x36ff26158>
<environment: namespace:stats>
So it is using "$" to extract the list item named "formula" rather than pulling it from the call. If the $formula item is missing (which it is in your case), it falls back to formula(x$terms), which I suspect is calling formula.default. Looking at the operation of that function, it appears to only be adjusting the environment of the object.
As noted, model$call will get you the call that created the lm object, but if that call contains an object itself as the model formula, you get the object name, not the formula.
The evaluated object, i.e. the formula itself, can be accessed in model$terms (along with a bunch of auxiliary information on how it was treated). This should work regardless of the details of the call to lm.
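The difference between the stored call and the evaluated formula can be checked with a short sketch. It reuses the question's setup (with a seed added for reproducibility); f_model and f_terms are just illustrative names.

```r
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)

# The call only stores the *name* of the object that held the formula.
stored <- model$call$formula
print(stored)

# formula() and the terms component both give the evaluated formula.
f_model <- formula(model)
f_terms <- formula(model$terms)
print(f_model)
```

Here stored is a symbol, while f_model and f_terms are formula objects that can be passed on to other modelling functions.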
I'd like to define a wrapper class that will encapsulate an actual model, and let the user call predict() with either new data frames or a model matrix:
raw_model <- ...
model <- Model(raw_model)
X <- matrix(...)
predict(model, X)
df <- data.frame(...)
predict(model, df)
I thought it would simply be a matter of defining two methods for predict(), dispatching on the types of the first two arguments:
library(methods)
Model <- setClass("Model", slots = "model")
setMethod("predict", signature("Model", "matrix"),
  function(object, newdata, ...) {
    stats::predict(object@model, newdata)
  })
setMethod("predict", signature("Model", "data.frame"),
  function(object, newdata, ...) {
    matrix <- model.matrix(newdata) # or something like that
    stats::predict(object@model, matrix)
  })
However, both calls to setMethod fail with
Error in matchSignature(signature, fdef) :
more elements in the method signature (2) than in the generic signature (1) for function ‘predict’
I understand that an S4 generic is created from the S3 generic predict, whose signature takes just one named argument object, but is there a way to have the S4 methods dispatch on more than just that first argument?
You can make S4 generics dispatch on more than one argument, but you cannot (currently) dispatch on both a named argument and the ... argument. This is the problem with predict: the only named argument is object, and newdata arrives through ....
You can still achieve what you want though, by defining your own generic "one level down" along the lines of
predict2 <- function(model, newdata) { stats::predict(model, newdata) }
setGeneric("predict2", signature = c("model", "newdata"))
setMethod(
  "predict2",
  signature = c("Model", "data.frame"),
  definition = function(model, newdata) {
    matrix <- model.matrix(newdata) # or something like that
    # the argument is called `model` here, and slots are accessed with @
    stats::predict(model@model, matrix)
  }
)
Now you modify predict.Model (and the S4 method for predict with the signature model="Model") to call predict2(model,newdata).
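A self-contained sketch of this "one level down" pattern follows. It is only an illustration under some assumptions: the wrapped model is an lm fit on the built-in mtcars data, the slot is declared with class "ANY", and the matrix method simply converts a matrix with named columns to a data frame rather than building a real model matrix.

```r
library(methods)

# Wrapper class with a single slot that can hold any fitted model.
setClass("Model", slots = c(model = "ANY"))

# Helper generic that dispatches on both arguments.
setGeneric("predict2", function(model, newdata) standardGeneric("predict2"))

setMethod("predict2", signature("Model", "data.frame"),
  function(model, newdata) stats::predict(model@model, newdata))

setMethod("predict2", signature("Model", "matrix"),
  function(model, newdata) {
    # Assumption: the matrix columns are named after the predictors.
    stats::predict(model@model, as.data.frame(newdata))
  })

# The S4 method on predict() dispatches on `object` only and forwards.
setMethod("predict", "Model",
  function(object, newdata, ...) predict2(object, newdata))

fit <- lm(mpg ~ wt + hp, data = mtcars)
m <- new("Model", model = fit)

new_df  <- mtcars[1:3, c("wt", "hp")]
new_mat <- as.matrix(new_df)
p_df  <- predict(m, new_df)
p_mat <- predict(m, new_mat)
```

Both calls go through predict(), but the choice between the data.frame and matrix code paths is made by predict2().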
I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete.) It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w){
partialPlot(x=rf, pred.data=iris, x.var=as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!
The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
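The same lookup steps can be tried without installing anything, using a non-exported method from the stats package as a stand-in for partialPlot.randomForest. This is only a sketch of the technique; via_s3 and via_triple are illustrative names.

```r
# List the S3 methods of a generic; non-exported ones are printed with an asterisk.
print(methods(formula))

# Two ways to retrieve a non-exported method:
via_s3     <- getS3method("formula", "lm")  # look up by generic and class
via_triple <- stats:::formula.lm            # package:::generic.method

# getAnywhere() additionally reports where the object was found.
found <- getAnywhere("formula.lm")
print(found$where)
```

Printing via_s3 or via_triple at the console shows the source of the method.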
Now, as for the other questions, the function can be written with do.call() and you can pass w without a wrapper.
f <- function(w) {
do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to check for the presence of as.character() inside the function. If you can leave a comment we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.
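Why do.call() helps can be seen with a small stand-in function. The sketch below assumes that partialPlot captures x.var unevaluated with substitute(), which the behavior in the question suggests but which is not verified here; show_arg is a hypothetical function, not part of randomForest.

```r
# Hypothetical function that captures its argument unevaluated.
show_arg <- function(x.var) deparse(substitute(x.var))

x1 <- "Sepal.Length"

# A direct call inside a wrapper: the callee sees the symbol `w`.
f_direct <- function(w) show_arg(w)

# do.call() evaluates the argument list first, so the callee sees the value.
f_docall <- function(w) do.call("show_arg", list(x.var = w))

res_direct <- f_direct(x1)
res_docall <- f_docall(x1)
print(res_direct)
print(res_docall)
```

With the direct call the function only ever receives the name of the wrapper's argument, whereas do.call() builds a call that already contains the string.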
I want to write a function that evaluates an expression in a data frame, but one that does so using expressions that may or may not contain user-defined objects.
I think the magic word is "non-standard evaluation", but I cannot quite figure it out just yet.
One simple example (yet realistic for my purposes): Say, I want to evaluate an lm() call for variables found in a data frame.
mydf <- data.frame(x=1:10, y=1:10)
A function that does so can be written as follows:
f <- function(df, expr){
  expr <- substitute(expr)
  pf <- parent.frame()
  eval(expr, df, pf)
}
Such that I get what I want using the following command.
f(mydf, lm(y~x))
# Call:
# lm(formula = y ~ x)
#
# Coefficients:
# (Intercept) x
# 1.12e-15 1.00e+00
Nice. However, there are cases in which it is more convenient to save the model equation in an object before calling lm(). Unfortunately the above function no longer does it.
fml <- y~x
f(mydf, lm(fml))
# Error in eval(expr, envir, enclos): object 'y' not found
Can someone explain why the second call doesn't work? How could the function be altered, such that both calls would lead to the desired results? (desired=fitted model)
Cheers!
From ?lm, re data argument:
If not found in data, the variables are taken from environment(formula)
In your first case, the formula is created in your eval(expr, df, pf) call, so the environment of the formula is an environment based on df. In the second case, the formula is created in the global environment, which is why it doesn't work.
Because formulas come with their own environment, they can be tricky to handle in NSE.
You could try:
with(mydf,
  {
    print(lm(y~x))
    fml <- y~x
    print(lm(fml))
  }
)
but that probably isn't ideal for you. Short of checking whether any names in the captured parameter resolve to formulas, and re-assigning their environments, you'll have some trouble. Worse, it isn't even necessarily obvious that re-assigning the environment is the right thing to do. In many cases, you do want to look in the formula environment.
There was a loosely related discussion on this issue on R Chat:
Ben Bolker outlines an issue
Josh O'Brien points to some old references
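The point about formula environments can be checked directly. In this sketch make_formula and local_var are hypothetical names used only to show that a formula remembers where it was created, and that lm() consults its data argument before that environment.

```r
# A formula keeps a reference to the environment in which it was created.
make_formula <- function() {
  local_var <- 1  # lives only in this function's frame
  y ~ x
}
fml <- make_formula()
has_local <- exists("local_var", envir = environment(fml), inherits = FALSE)
print(has_local)

# When `data` is supplied, variables are looked up there first,
# so the formula's own environment does not get in the way.
mydf <- data.frame(x = 1:10, y = 2 * (1:10))
fit <- lm(fml, data = mydf)
print(coef(fit))
```

Passing the data frame through the data argument is therefore one way to sidestep the lookup problem when the formula is stored in a variable.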
I'm using summary() on the output of the mle() function from stats4; its output belongs to class mle. I would like to find out how summary() estimates the standard deviation of the coefficients returned by mle(), but I do not see summary.mle in the list printed by methods(summary). Why can't I find the summary.mle() function?
(I guess the proper function is summary.mlm(), but I'm not sure of that and don't know why it would be mlm instead of mle.)
The code you are looking for is what summary.mle would be if it were an S3 method, but here it is an S4 method. S3 methods get created and then dispatched using the generic_function_name.class_of_first_argument mechanism, whereas S4 methods are dispatched on the basis of their argument "signature", which allows consideration of second and later arguments. This is how to get showMethods to display the code that is called when an S4 method is called. This is an instance where only the first argument is used as the signature. You can choose any of the object signatures that appear in the abbreviated output to specify the classes argument, and it is the includeDefs flag that prompts display of the code:
showMethods("summary",classes="mle", includeDefs=TRUE)
#---(output to console)----
Function: summary (package base)
object="mle"
function (object, ...)
{
    cmat <- cbind(Estimate = object@coef, `Std. Error` = sqrt(diag(object@vcov)))
    m2logL <- 2 * object@min
    new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
}
As shown in
>library(stats4)
>showMethods("summary")
Function: summary (package base)
object="ANY"
object="mle"
The summary is dispatched in the S4 way. I don't know how to check the code in R directly, so I searched the source of stats4 for you.
In stats4/R/mle.R, there is:
setMethod("summary", "mle", function(object, ...){
    cmat <- cbind(Estimate = object@coef,
                  `Std. Error` = sqrt(diag(object@vcov)))
    m2logL <- 2*object@min
    new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
})
So it creates an S4 object of class summary.mle. I guess you can trace the rest of the code yourself now.
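For completeness, the S4 method can also be retrieved from within R rather than from the package sources. This is a minimal sketch using functions from the methods package; summ_method is just an illustrative name.

```r
library(stats4)

# Check that an S4 method for summary() is defined for class "mle".
has_method <- existsMethod("summary", "mle")
print(has_method)

# Retrieve the method definition; printing it shows the source.
summ_method <- getMethod("summary", "mle")
print(summ_method)
```

getMethod() takes the generic name and the signature, which is the S4 counterpart of looking up generic.class for an S3 method.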
I'm trying to program a model-building function which uses a formula expression, but I have some problems understanding how the model update function works.
Here's a stripped-down function which results in an error when using the update function:
modelx <- function(formula) {
  mf <- mc <- match.call()
  mf <- mf[c(1L, match("formula", names(mf), 0L))]
  mf[[1L]] <- as.name("model.frame")
  mf <- eval(mf, parent.frame())
  y <- model.response(mf, "numeric")
  mt <- attr(mf, "terms")
  X <- model.matrix(mt, mf)
  out <- list(y = y, X = X)
  out$call <- mc
  out
}
The code is pretty much copied from the start of the lm function. Some example data and two models:
y<-x<-x1<-x2<-1:10
model<-modelx(y ~ x)
model1<-modelx(y ~ x1)
Now updating the first model does not work but second does:
model<-update(model, . ~ . + x2)
Error in model.frame.default(formula = y ~ x + x2) :
invalid type (list) for variable 'x'
model1<-update(model1, . ~ . + x2)
If I add a component out$terms <- mt into the output of modelx, everything works in both cases. Why is this component needed and why does the update function work without it in the second case but not in the first case?
If you look at the help for update (?update) it tells you this:
Description
update will update and (by default) re-fit a model. It does this by extracting the call stored in the object, updating the call and (by default) evaluating that call. Sometimes it is useful to call update with only one argument, for example if the data frame has been corrected.
“Extracting the call” in update() and similar functions uses getCall() which itself is a (S3) generic function with a default method that simply gets x$call.
Because of this, update() will often work (via its default method) on new model classes, either automatically, or by providing a simple getCall() method for that class.
Usage
update(object, ...)
getCall(x, ...)
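It looks to me like a name clash. Note that x is the name of a parameter in getCall(x, ...), and the formula methods that update() relies on also take a parameter named x. When the formula stored in the call is re-evaluated inside such a function, the variable x in y ~ x appears to resolve to that local x (the model object, which is a list) rather than to your vector x, hence "invalid type (list) for variable 'x'". That would also explain why the model using x1 is unaffected, and why storing out$terms helps: the formula can then be taken from the terms object, with its original environment, instead of being re-evaluated from the call.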
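The "extract the call, update it, evaluate it" sequence described in the help text can be observed on a bare list, without fitting anything. In this sketch obj is a hypothetical stand-in for a model object that stores only its call, and evaluate = FALSE stops update() before the re-fit.

```r
# A minimal "model" object: just a list with a call component.
obj <- list(call = quote(lm(formula = y ~ x)))

# The default getCall() method simply returns the call component.
extracted <- getCall(obj)
print(extracted)

# update() edits the formula in that call; with evaluate = FALSE
# the modified call is returned instead of being evaluated.
new_call <- update(obj, . ~ . + x2, evaluate = FALSE)
print(new_call)
```

Because no data are touched when evaluate = FALSE, this is a convenient way to inspect what update() is about to run.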