R error in updating formula when using variable named 'x': - r

I'm trying to program a model building function which uses the formula expression but I have some problems understanding how the model update function works.
Here's a stripped down function which results an error when using update function:
modelx <- function(formula) {
mf <- mc <- match.call()
mf <- mf[c(1L, match("formula", names(mf), 0L))]
mf[[1L]] <- as.name("model.frame")
mf <- eval(mf, parent.frame())
y <- model.response(mf, "numeric")
mt <- attr(mf, "terms")
X <- model.matrix(mt, mf)
out<-list(y=y,X=X)
out$call<-mc
out
}
The code is pretty much copied from the start of the lm function. Some example data and two models:
y<-x<-x1<-x2<-1:10
model<-modelx(y ~ x)
model1<-modelx(y ~ x1)
Now updating the first model does not work but second does:
model<-update(model, . ~ . + x2)
Error in model.frame.default(formula = y ~ x + x2) :
invalid type (list) for variable 'x'
model1<-update(model1, . ~ . + x2)
If I add a component out$terms <- mt into the output of modelx, everything works in both cases. Why is this component needed and why does the update function work without it in the second case but not in the first case?

If you look at the help for update (?update) it tells you this:
Description
update will update and (by default) re-fit a model. It does this by extracting the call stored in the object, updating the call and (by default) evaluating that call. Sometimes it is useful to call update with only one argument, for example if the data frame has been corrected.
“Extracting the call” in update() and similar functions uses getCall() which itself is a (S3) generic function with a default method that simply gets x$call.
Because of this, update() will often work (via its default method) on new model classes, either automatically, or by providing a simple getCall() method for that class.
Usage
update(object, ...)
getCall(x, ...)
It looks to me like the clash is occurring because information is being passed through to the getCall function where x is the name of a parameter and this then experiences a name clash with your x and the language is choosing the local x over your x.

Related

Save an object in R using a function parameter as name

I looked all over the website and could not get the correct answer for this dilemma:
I have an UDF for evaluating some classification models, with different datasets, and i wanted to have a single function for evaluating them. I want to have something like the following, that given the name of the model and the data, it computes some metrics (confusion matrix for example) and saves them to an object outside the function.
The problem here is that I want to create this object using the name of the model I am evaluating.
I ended up with something like this:
foo <- function(x) {return(as.character(substitute(x)))}
model1 <- lm(Sepal.Width ~ Sepal.Length, iris)
Validation.func <- function(model_name, dataset){
Pred_Train = predict(model_name, dataset)
assign(paste("Pred_Train_",foo(model_name), sep=''), Pred_Train, envir=globalenv())
Pred_Train_prob = predict(model_name, dataset, type = "prob")
MC_Train = confusionMatrix(Pred_Train, dataset$target_salto)
}
Running it for Validation.func(model1,iris) We would want to get the variable stored as "Pred_Train_model1".
As model_name is not a string we had to try to convert it using the foo function (which is the answer i found in here) foo = function(x)deparse(substitute(x)) I do not get what I want, since it saves the object as: "Pred_Train_model_name" instead of "Pred_Train_model1".
Does anyone know how to solve it?
model_name in your function must be a model object, hence cannot be used in paste function, which expects characters.
I think you want your function to know that the model object is actually called "model1" in the environment where it comes from. I think this is quite tricky attempt since your model object may be called by various names.
The easiest implementation would be to give both model object and the name separately, and the use the former for prediction and the latter for naming the outcome.
func1 <- function(model, model_str, dataset)
{
p <- predict(model, dataset)
assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model1 <- lm(mpg ~ cyl, data=mtcars)
func1(model1, "model1", mtcars)
predict_model1
Another implementation, tricky but works if used with care, would be to give only the character name of the model and obtain the model object by get function from the parent environment.
func2 <- function(model_str, dataset)
{
p <- predict(get(model_str, envir=parent.env(environment())), dataset)
assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model2 <- lm(mpg ~ cyl, data=mtcars)
func2("model2", mtcars)
predict_model2
Finally, in order to give the model object to the function and let the function to find the variable name, then you can use match.call function to recover how the function has been called.
func3 <- function(model, dataset)
{
s <- match.call()
model_str <- as.character(s)[2]
p <- predict(model, dataset)
assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model3 <- lm(mpg ~ cyl, data=mtcars)
func3(model3, mtcars)
predict_model3
So here's a suggestion, that does not exactly solve the problem, but does make the function work.
Validation.func <- function(model_name, dataset){
model_name_obj<- eval(parse(text = model_name))
Pred_Train = predict(model_name_obj, dataset)
assign(paste("Pred_Train_",model_name, sep=''), Pred_Train, envir=globalenv())
Pred_Train_prob = predict(model_name_obj, dataset, type = "prob")
MC_Train = confusionMatrix(Pred_Train, dataset$target_salto)
}
Validation.func("model1", data)
What I did is pretty much the opposite of what you were trying. I passed model_name as a string, and then evaluate it using parse(text = model_name). Note that the evaluated object is now called model_name_obj and it is passed in the predict function.
I got some errors later on in the function, but they are irrelevant to the issue at hand. They had to do with the type argument in predict and about not recognizing the confusionMatrix, because I assume I didn't load the corresponding package.

R Programming: Evaluating an expression when objects exist in multiple environments

Short Version
An expression with two variables, x and y, where x is contained in environment 1
and y is contained in a second environment. How does the programmer evaluate
the expression?
Detailed Version
I have a function that takes a formula and data.frame as arguments. On the
the right hand side of the formula is a call to the function splines::bs to
generate a B-spline basis. The workhorse function does a few things, one of
which requires extracting the bs call from the formula and evaluating it. The
problem I am trying to solve involves evaluating the bs call when argument
values are contained in different environments.
Here are the functions needed to recreate the issue I am working on
library(splines)
extract_bmat <- function(form) {
B <- NULL
rr <- function(x) {
if (is.call(x) && grepl("bs", deparse(x[[1]]))) {
B <<- x
} else if (is.recursive(x)) {
as.call(lapply(as.list(x), rr))
} else {
x
}
}
z <- lapply(as.list(form), rr)
B
}
some_workhorse <- function(formula, data) {
# ... lots of cool stuff ...
# fit <- lm(formula, data)
bmat <- eval(extract_bmat(formula), data)
bmat
}
# The following works when evaluated in the .GlobalEnv
# The eval(extract_bmat(formula), data) call within the some_workhorse
# function works without errors
xi <- c(3, 4.5)
eg_data <- data.frame(x = 1:10, y = sin(1:10))
some_workhorse(y ~ bs(x, knots = xi), data = eg_data)
Now, if the function some_workhorse and the xi vector and eg_data
data.frame are generated within a function environment causes an error.
foo <- function() {
xi_in_foo <- c(2, 3)
eg_data_in_foo <- data.frame(x = 1:10, y = sin(1:10))
some_workhorse(y ~ bs(x, knots = xi_in_foo), data = eg_data_in_foo)
}
foo()
# Error in sort(c(rep(Boundary.knots, ord), knots)) :
# object 'xi_in_foo' not found
The location of the error is within the splines::bs call, but that is not the
important part; xi_in_foo not found is the important issue to address.
I know the issue is related to my poor handling of environments in R. My
primary question is
How should the call eval(extract_bmat(formula), data) within the
some_workhorse function be written so that it works correctly when called in
the .GlobalEnv or when called within a function environment?
Secondary question:
Within the extract_bmat function, I would prefer to define an environment
for B and use assign instead of <<-. I suspect that <<- is the best
option because of the uncertainty in the levels of recursion taking place.
That said, I would like to see other solutions.
Thanks for the help.
You should define your function as
some_workhorse <- function(formula, data) {
# ... lots of cool stuff ...
# fit <- lm(formula, data)
bmat <- eval(extract_bmat(formula), data, environment(formula))
bmat
}
Note that formulas in R capture the environment in which they were created. As long as xi_in_foo exists in the environment where the formula was defined, this should work. Variables will first be looked up in the data list/data.frame and then the formula environment would be used as the enclosing environment. If you weren't using formula,s sometimes people use parent.frame() as the enclos= parameter so that variables are looked for in the environment in which the function was called, rather than were the function was defined as is the default with R's lexical scoping.

access tuneValue from the model object when it is a S4 object - caret, custom model

I am using a custom model in caret which basically builds on the vanilla "cforest" method.
To build my model, I fetch the modelInfo for the cforest model:
newModel <- getModelInfo("cforest", regex=F)[[1]]
I need to implement a custom predict function so I do:
out$predict = function(modelFit, newdata, submodels = NULL) {
# new predict function which needs the tuned parameters
best.params <- modelFit$tuneValue
# rest of the code using best params
}
The content of the new predict function in itself is irrelevant. The point is, I need the tuned values from within the predict function.
While the code works perfectly fine with other models, this won't work with cforest because in this case, modelFit is a "RandomForest" S4 object and I cannot access tuneValue. (The exact error being "Error in modelFit$tuneValue : $ operator not defined for this S4 class")
I explored the "RandomForest" object and it does not appear to contain the tuned values in any slot.
My guess is that, since it is a S4 object, the caret code which stores the tuned values into $tuneValue does not work in this particular case.
Maybe I can save the tuned values manually at some point during the fitting process, but I don't know
1 - when I should do it (when are the tuned values selected?)
2 - where I should save them to have access to them during predict
Does anyone have an idea how I could do this?
Here is a minimal code to generate a RandomForest S4 object:
x <- matrix(rnorm(20*10), 20, 10)
y <- x %*% rnorm(10)
y <- factor(y<mean(y), levels=c(T,F), labels=c("high", "low"))
new_model <- getModelInfo("cforest", regex=F)[[1]]
fit <- train(x=x, y=y, method = new_model)
# this is a RandomForest S4 object
fit$finalModel
Took me a while to figure out but it was actually kind of straightforward. Since the model is a S4 object and I want to add informations in it... I built my own S4 object inheriting from this model!
In order to do this, I had to change the "fit" function.
# loading vanilla model
newModel <- getModelInfo("cforest", regex=F)[[1]]
# backing up old fit fun
newModel$old_fit <- out$fit
# editing fit function to wrap the S4 model into my custom S4 object
newModel$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
tmp <- newModel$old_fit(x, y, wts, param, lev, last, classProbs, ...)
if(isS4(tmp)) {
old_class <- as.character(class(tmp))
# creating custom class with the slots I need
setClass(".custom", slots=list(.threshold="numeric", .levels="character"), contains=old_class)
# instanciating the new class with values taken from the argument of fit()
tmp <- new(".custom", .threshold=param$threshold, .levels=lev, tmp)
}
tmp
}
And now, the model objects are consistently of class ".custom" so I can do:
newModel$predict = function(modelFit, newdata, submodels = NULL) {
if(isS4(modelFit)){
if(class(modelFit)!=".custom")
error("predict() received a S4 object whose class was not '.custom'")
obsLevels <- modelFit#.levels
threshold <- modelFit#.threshold
} else {
obsLevels <- modelFit$obsLevels
threshold <- modelFit$tuneValue$threshold
}
# rest of the code
}
This is great, now my custom model can extend any caret model, regardless on whether it relies on S4 objects, like cforest or svm!

Error in eval(expr, envir, enclos) : could not find function - Nested Functions & Environments

The R code shown below is a minimal working example to reproduce an error that I can't say I understand. Running the script should yield the error, Error in eval(expr, envir, enclos) : could not find function "fitModel". After reading a thing or two on environments I think I understand why this is happening in this case, "fitModel" is not defined in the execution environment of "obscureFunction". This I fixed by making the following change to "myFormula":
myFormula <- "y ~ eval(fitModel(x, a), envir = environment(fitModel))"
I don't understand how "fitModel" can be evaluated in the environment of "fitModel" when the function can't be found in the calling environment of "obscureFunction", in other words I don't understand why this code change works. I also don't understand why the original code works fine if the body of "topFunction" is run without calling it, i.e. if we define "fitModel" and "obscureFunction" in R_GlobalEnv and call "obscureFunction" from the console.
## Minimum Working Example to reproduce error
rm(list = ls())
library(minpack.lm)
topFunction <- function(){
fitModel <- function(x, a){
exp(-a * x)
}
## Create a function to use with lapply()
obscureFunction <- function(){
x <- seq(-1, 1, 0.01)
y <- exp(-0.5 * x)
Data <- data.frame(x, y)
init <- c(a = 1)
myFormula <- "y ~ fitModel(x, a)"
myFormula <- as.formula(myFormula)
nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data)
return(nlsOutput)
}
## Function call
obscureFunction()
## Other calculations done with fitModel()
}
topFunction()
Well, there are two issues here. The first is using a string for a formula. It's better to use
myFormula <- y ~ fitModel(x, a)
The reason is that formulas capture their environment, strings do not. (As noted by #BridieG, the as.formula() will capture the environment; I skipped over that line when reading the code. I still think it's better to create the formula directly.) Having a reference environment makes it easier to find functions used in a formula. So if you were using lm() rather than nlsLM, this would work with these two changes
# myFormula <- "y ~ fitModel(x, a)" ... becomes
myFormula <- y ~ fitModel(x, 1)
#nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data) ...becomes
nlsOutput <- lm(formula = myFormula, data = Data)
This works with the formula syntax (unquoted var names) and not the string because the formula can capture the environment.
At least that's how it should work. Package authors are free to evaluate formulas how ever they want, and the authors of the nlsLM() function decided to ignore the environment assigned to the formula. They do so in this function inside nlsLM()
FCT <- function(par) {
mf[m] <- par
rhs <- eval(formula[[3L]], envir = mf)
res <- lhs - rhs
res <- .swts * res
res
}
So this is the second problem. Here they enforce the evaluation in the mf object which is a data.frame made up of the covariates of the data and the parameter estimates. Had it been written as
rhs <- eval(formula[[3L]], envir = mf, environment(formula))
it would have worked. This is basically what model.frame() does in lm() that allows this to work. We can make our own "corrected" version of the function with
# tested with minpack.lm_1.1-8
nlsLM2<-nlsLM
body(nlsLM2)[[27]][[3]][[3]][[3]]<-quote(rhs<-eval(formula[[3L]], envir = mf, environment(formula)))
And then make these substitutions
# myFormula <- "y ~ fitModel(x, a)" ... becomes
myFormula <- y ~ fitModel(x, a)
#nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data) ...becomes
nlsOutput <- nlsLM2(formula = myFormula, start = init, data = Data)
it work work and return
Nonlinear regression model
model: y ~ fitModel(x, a)
data: Data
a
0.5
residual sum-of-squares: 0
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.49e-08
So there's not really much you can say about how all R functions handle environments and scope. This behavior is unique to how the nlsLM() authors decided to evaluate their parameters.

Extract formula from model in R

I'm building a function for many model types which needs to extract the formula used to make the model. Is there a flexible way to do this? For example:
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)
I what I need to do is extract the formula object "equation" once being passed the model.
You could get what you wanted by:
model$call
# lm(formula = formula)
And if you want to see what I did find out then use:
str(model)
Since you passed 'formula' (bad choice of names by the way) from the calling environment you might then need to extract from the object you passed:
eval(model$call[[2]])
# z ~ x + y
#JPMac offered a more compact method: formula(model). It's also worth looking at the mechanism used by the formula.lm function. The function named formula is generic and you use methods(formula) to see what S3 methods have been defined. Since the formula.lm method has an asterisk at its end, you need to wrap it in `getAnywhere:
> getAnywhere(formula.lm)
A single object matching ‘formula.lm’ was found
It was found in the following places
registered S3 method for formula from namespace stats
namespace:stats
with value
function (x, ...)
{
form <- x$formula
if (!is.null(form)) {
form <- formula(x$terms)
environment(form) <- environment(x$formula)
form
}
else formula(x$terms)
}
<bytecode: 0x36ff26158>
<environment: namespace:stats>
So it is using "$" to extract the list item named "formula" rather than pulling it from the call. If the $formula item is missing (which it is in your case) then It then replaces that with formula(x$terms) which I suspect is calling formula.default and looking at the operation of that function appears to only be adjusting the environment of the object.
As noted, model$call will get you the call that created the lm object, but if that call contains an object itself as the model formula, you get the object name, not the formula.
The evaluated object, ie the formula itself, can be accessed in model$terms (along with a bunch of auxiliary information on how it was treated). This should work regardless of the details of the call to lm.

Resources