Strange glm() behavior in a function - r

Please help me understand the reproducible example below.
I am trying to write a function glm_func() that calls glm(). The glm() call works perfectly fine outside of a function.
However, if I pass the model formula as an argument, glm_func() throws a strange error:
Error in eval(extras, data, env) : object 'modeldata' not found
Can someone help me understand what went wrong?
# Fully reproducible example
# Specify data
aa = data.frame(y=1:100, x1=1:100, x2=rep(1, 100), z=runif(100))
lm_formula = as.formula('y ~ x1 + x2')
weight_var = 'z'
# GLM works as-is outside of a function
model1 = glm(formula = lm_formula, data = aa, weights = aa[[weight_var]])
# Why does this function not work?
glm_func <- function(modeldata, formula, weight){
  thismodel = glm(
    formula = formula,  #<----- Does not work when the formula is passed as an argument
    data = modeldata, weights = modeldata[[weight]])}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
# This function works
glm_func2 <- function(modeldata, weight){
  thismodel = glm(
    formula = y ~ x1 + x2,  #<----- Works if formula is hardcoded
    data = modeldata, weights = modeldata[[weight]])}
glm_func2(modeldata=aa, weight=weight_var)

From help("formula"):
A formula object has an associated environment, and this environment
(rather than the parent environment) is used by model.frame to
evaluate variables that are not found in the supplied data argument.
Formulas created with the ~ operator use the environment in which they
were created. Formulas created with as.formula will use the env
argument for their environment.
From this one would expect that you don't need to care about the environment as long as you use the data argument. Sadly, that's not the case here, because the weights are evaluated in the formula's environment (thanks to user2554330 for pointing this out!).
So, you need to ensure that your function environment is associated with the formula:
glm_func <- function(modeldata, formula, weight){
  environment(formula) <- environment()
  glm(formula = formula, data = modeldata,
      weights = modeldata[[weight]])
}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
#works
Personally, I'd do this instead:
glm_func <- function(modeldata, formula, weight){
  environment(formula) <- environment()
  eval(
    bquote(
      glm(formula = .(formula), data = modeldata,
          weights = modeldata[[weight]])
    )
  )
}
This way, the actual formula is printed when you print the model object.
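For instance (the printed calls below are indicative of the difference, not verbatim output):
glm_func(modeldata = aa, formula = lm_formula, weight = weight_var)
# Call:  glm(formula = y ~ x1 + x2, data = modeldata, weights = modeldata[[weight]])
# whereas the first version records the unsubstituted call:
# Call:  glm(formula = formula, data = modeldata, weights = modeldata[[weight]])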

As #Roland commented, a formula object has an associated environment, so instead of passing a formula object you can pass the variable names and create the formula inside the function.
glm_func <- function(modeldata, resp, predictor, weight){
  glm(formula = reformulate(predictor, resp),
      data = modeldata, weights = modeldata[[weight]])
}
glm_func(modeldata=aa, 'y', c('x1', 'x2'), weight=weight_var)
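For reference, reformulate() just builds the formula from its parts, and in recent R versions the formula it returns picks up the calling function's environment (its env argument defaults to parent.frame()), which is why the weights lookup succeeds here:
reformulate(c('x1', 'x2'), response = 'y')
# y ~ x1 + x2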

Related

Use of variable arguments (dot-dot-dot) in stats::lm in R [duplicate]

This question already has answers here:
Ellipsis Trouble: Passing ... to lm
(2 answers)
Closed last year.
Suppose we have a function that makes a call to stats::lm and takes a formula and a data frame as arguments. Further arguments that we want to pass to stats::lm can be provided using variable arguments:
outer_function <- function(formula, data, ...) {
  z <- stats::lm(formula = formula, data = data, ...)
  return(z)
}
Now suppose we want to use this function and provide an additional argument (weights) that will be passed to stats::lm.
data <- data.frame(replicate(5, rnorm(100)))
weights <- replicate(100, 1)
formula <- X1 ~ X2 + X3
outer_function(formula = formula, data = data, weights = weights)
This produces the following error in stats::lm:
Error in eval(extras, data, env) :
..1 used in an incorrect context, no ... to look in
Debugging the call to stats::lm, I see that the weights argument is correctly passed, but match.call(), which is later used for evaluation inside the function, gives
stats::lm(formula = formula, data = data, weights = ..1)
such that weights is assigned the first element of the ...-list, which is empty.
Can anybody elaborate on why this approach fails? In particular, if weights were a scalar (say 5), the problem would not arise and match.call() would give
stats::lm(formula = formula, data = data, weights = 5)
For now, I am using the following solution for my function:
outer_function <- function(formula, data, ...) {
  args <- list(formula = formula, data = data, ...)
  z <- do.call(stats::lm, args)
  return(z)
}
which works, but I am still wondering whether there is a way around do.call when the arguments in ... are vectors or lists.
I can't think of a work-around as safe and as succinct as do.call. I can explain what's going on, having debugged into the lm call.
In the body of lm, you'll find the statement
mf <- eval(mf, parent.frame())
On the right hand side of the assignment, mf is the call
stats::model.frame(formula = formula, data = data, weights = ..1,
drop.unused.levels = TRUE)
and parent.frame() is the frame of the outer_function call (in other words, the evaluation environment of outer_function). eval is evaluating mf in parent.frame(). Due to S3 dispatch, what is ultimately evaluated in parent.frame() is the call
stats::model.frame.default(formula = formula, data = data, weights = ..1,
drop.unused.levels = TRUE)
In the body of model.frame.default, you'll find the statement
extras <- eval(extras, data, env)
On the right hand side of this assignment, extras is the call
list(weights = ..1)
specifying the arguments from mf matched to the formal argument ... of model.frame.default (just weights, in this case, because model.frame.default has formal arguments named formula, data, and drop.unused.levels); data is the data frame containing your simulated data; and env is your global environment. (env is defined earlier in the body of model.frame.default as environment(formula), which is indeed your global environment, because that is where you defined formula.)
eval is evaluating extras in data with env as an enclosure. An error is thrown here, because the data frame data and your global environment env are not valid contexts for ..n. The symbol ..1 is valid only in the frame of a function with ... as a formal argument.
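A stripped-down illustration of that last point (a minimal sketch, not from the original question):
f <- function(...) ..1   # legal: f has a ... formal argument, so ..1 refers to its first element
f(10)                    # [1] 10
try(..1)                 # at top level: Error : ..1 used in an incorrect context, no ... to look in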
You might have deduced the issue from ?lm, which notes:
All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.
There is no problem when weights is given the value of a constant (i.e., not the name of a variable bound in an environment and not a function call) in the outer_function call, because in that situation match.call does not substitute the symbol ..n. Hence
outer_function(formula = formula, data = data, weights = 5)
works (well, a different error is thrown), but
weights <- 5
outer_function(formula = formula, data = data, weights = weights)
and
outer_function(formula = formula, data = data, weights = rep(1, 100))
don't.
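For completeness, the call-rewriting idea used in the genericModel answer at the bottom of this page can be adapted here too; a sketch, and arguably no safer or shorter than do.call:
outer_function <- function(formula, data, ...) {
  cl <- match.call()            # records weights = weights, not weights = ..1
  cl[[1L]] <- quote(stats::lm)  # turn the recorded call into a call to stats::lm
  eval(cl, parent.frame())      # evaluate where formula, data and weights are visible
}
outer_function(formula = formula, data = data, weights = weights)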

Error in calling `lm` in a `lapply` with `weights` argument

I've encountered a weird behavior when calling lm within a lapply using the weights argument.
My code consists of a list of formulas on which I run linear models via lapply. So far it was working:
dd <- data.frame(y = rnorm(100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100),
                 x4 = rnorm(100),
                 wg = runif(100, 1, 100))
ls.form <- list(
  formula(y ~ x1 + x2),
  formula(y ~ x3 + x4),
  formula(y ~ x1 | x2 | x3),
  formula(y ~ x1 + x2 + x3 + x4)
)
res.no.wg <- lapply(ls.form, lm, data = dd)
However, when I add the weights argument, I get a weird error:
res.with.wg <- lapply(ls.form, lm, data = dd, weights = dd[,"wg"])
Error in eval(extras, data, env) :
..2 used in an incorrect context, no ... to look in
It's as if the ... from lapply were conflicting with the ... of the lm call, but only because of the weights argument.
Any idea what is causing this problem and how to fix it?
NOTE: using the call without the lapply works as expected:
lm(ls.form[[1]], data = dd, weights = dd[,"wg"] )
Call:
lm(formula = ls.form[[1]], data = dd, weights = dd[, "wg"])
Coefficients:
(Intercept) x1 x2
-0.12020 0.06049 -0.01937
EDIT: The final call is a lapply within a function of the type:
f1 <- function(samp, dat, wgt){
  res.with.wg2 <- lapply(ls.form, function(x) {lm(formula = x, data = dat[samp,], weights = dat[samp, wgt])})
}
f1(1:66, dat=dd, wgt = "wg")
I am not really sure why it is not working, but I do think I have a fix for you:
res.with.wg2 <- lapply(ls.form,
function(x) {lm(formula = x, data=dd, weights=dd[,"wg"])})
Hope this helps!
There is a note in the help file for lapply:
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g., bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[i]], ...),
with i replaced by the current (integer or double) index. This is not
normally a problem, but it can be if FUN uses sys.call or match.call
or if it is a primitive function that makes use of the call. This
means that it is often safer to call primitive functions with a
wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is
required to ensure that method dispatch for is.numeric occurs
correctly.
lm uses match.call twice in its opening lines:
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
The solution noted in the help file and by #Florian is to use an anonymous function wrapper.
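A quick way to see what lm() is handed in this situation (a minimal sketch; the output comment is approximate):
f <- function(x, ...) match.call(expand.dots = TRUE)
lapply(list(1), f, weights = 1:3)
# [[1]]
# FUN(x = X[[1L]], weights = ..1)
# lm() receives the same kind of call, so its recorded weights expression is ..n rather than the actual vector.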
Update
For this specific problem of changing the model formula, you can rewrite to avoid calling lm within the lapply by using update instead:
# create base model (the formula here doesn't really matter, but we can specify the weights safely here)
baselm <- lm(y ~ x1, data = dd, weights = dd[,"wg"])
# update with lapply
lapply(ls.form,update,object=baselm)
[[1]]
Call:
lm(formula = y ~ x1 + x2, data = dd, weights = dd[, "wg"])
Coefficients:
(Intercept) x1 x2
0.07561 0.16111 0.15014
...

Save an object in R using a function parameter as name

I looked all over the site and could not find the correct answer for this dilemma:
I have a UDF for evaluating some classification models with different datasets, and I want a single function for evaluating them. I want something like the following: given the name of the model and the data, it computes some metrics (a confusion matrix, for example) and saves them to an object outside the function.
The problem here is that I want to create this object using the name of the model I am evaluating.
I ended up with something like this:
foo <- function(x) {return(as.character(substitute(x)))}
model1 <- lm(Sepal.Width ~ Sepal.Length, iris)
Validation.func <- function(model_name, dataset){
  Pred_Train = predict(model_name, dataset)
  assign(paste("Pred_Train_", foo(model_name), sep=''), Pred_Train, envir=globalenv())
  Pred_Train_prob = predict(model_name, dataset, type = "prob")
  MC_Train = confusionMatrix(Pred_Train, dataset$target_salto)
}
Running Validation.func(model1, iris), we would want the result to be stored in a variable named "Pred_Train_model1".
As model_name is not a string, we tried to convert it with the foo function, foo = function(x) deparse(substitute(x)) (an answer I found here), but I do not get what I want, since the object is saved as "Pred_Train_model_name" instead of "Pred_Train_model1".
Does anyone know how to solve it?
model_name in your function must be a model object, hence it cannot be used in the paste function, which expects character strings.
I think you want your function to know that the model object is actually called "model1" in the environment it comes from. This is quite a tricky attempt, since your model object may be bound to various names.
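To see concretely why the foo()/substitute() route reports "model_name": substitute() captures the expression supplied in the immediately enclosing call, and inside Validation.func that expression is the symbol model_name. A minimal sketch with illustrative names:
name_of <- function(x) deparse(substitute(x))
wrapper <- function(model_name) name_of(model_name)
m <- lm(mpg ~ cyl, data = mtcars)
wrapper(m)   # "model_name" -- the name one level up, not "m"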
The easiest implementation would be to give both the model object and its name separately, and then use the former for prediction and the latter for naming the outcome.
func1 <- function(model, model_str, dataset)
{
  p <- predict(model, dataset)
  assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model1 <- lm(mpg ~ cyl, data=mtcars)
func1(model1, "model1", mtcars)
predict_model1
Another implementation, tricky but workable if used with care, would be to give only the character name of the model and obtain the model object with get() from the parent environment.
func2 <- function(model_str, dataset)
{
  p <- predict(get(model_str, envir=parent.env(environment())), dataset)
  assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model2 <- lm(mpg ~ cyl, data=mtcars)
func2("model2", mtcars)
predict_model2
Finally, to give the model object to the function and let the function find out the variable name itself, you can use match.call to recover how the function was called.
func3 <- function(model, dataset)
{
  s <- match.call()
  model_str <- as.character(s)[2]
  p <- predict(model, dataset)
  assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model3 <- lm(mpg ~ cyl, data=mtcars)
func3(model3, mtcars)
predict_model3
So here's a suggestion that does not exactly solve the problem, but does make the function work.
Validation.func <- function(model_name, dataset){
  model_name_obj <- eval(parse(text = model_name))
  Pred_Train = predict(model_name_obj, dataset)
  assign(paste("Pred_Train_", model_name, sep=''), Pred_Train, envir=globalenv())
  Pred_Train_prob = predict(model_name_obj, dataset, type = "prob")
  MC_Train = confusionMatrix(Pred_Train, dataset$target_salto)
}
Validation.func("model1", data)
What I did is pretty much the opposite of what you were trying: I passed model_name as a string, and then evaluated it using parse(text = model_name). Note that the evaluated object is now called model_name_obj, and that is what gets passed to the predict function.
I got some errors later on in the function, but they are irrelevant to the issue at hand. They had to do with the type argument in predict and with confusionMatrix not being found, presumably because I hadn't loaded the corresponding package.
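As a side note, assuming the model object lives in the global environment, get() is an equivalent and arguably cleaner way than eval(parse(...)) to turn the string back into the object:
model_name_obj <- get(model_name, envir = globalenv())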

Error in eval(expr, envir, enclos) : could not find function - Nested Functions & Environments

The R code shown below is a minimal working example to reproduce an error that I can't say I understand. Running the script should yield the error Error in eval(expr, envir, enclos) : could not find function "fitModel". After reading a thing or two about environments, I think I understand why this happens in this case: "fitModel" is not defined in the execution environment of "obscureFunction". I fixed this by making the following change to "myFormula":
myFormula <- "y ~ eval(fitModel(x, a), envir = environment(fitModel))"
I don't understand how "fitModel" can be evaluated in the environment of "fitModel" when the function can't be found in the calling environment of "obscureFunction"; in other words, I don't understand why this code change works. I also don't understand why the original code works fine if the body of "topFunction" is run without calling it, i.e. if we define "fitModel" and "obscureFunction" in R_GlobalEnv and call "obscureFunction" from the console.
## Minimum Working Example to reproduce error
rm(list = ls())
library(minpack.lm)

topFunction <- function(){
  fitModel <- function(x, a){
    exp(-a * x)
  }
  ## Create a function to use with lapply()
  obscureFunction <- function(){
    x <- seq(-1, 1, 0.01)
    y <- exp(-0.5 * x)
    Data <- data.frame(x, y)
    init <- c(a = 1)
    myFormula <- "y ~ fitModel(x, a)"
    myFormula <- as.formula(myFormula)
    nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data)
    return(nlsOutput)
  }
  ## Function call
  obscureFunction()
  ## Other calculations done with fitModel()
}
topFunction()
Well, there are two issues here. The first is using a string for a formula. It's better to use
myFormula <- y ~ fitModel(x, a)
The reason is that formulas capture their environment, strings do not. (As noted by #BridieG, the as.formula() will capture the environment; I skipped over that line when reading the code. I still think it's better to create the formula directly.) Having a reference environment makes it easier to find functions used in a formula. So if you were using lm() rather than nlsLM, this would work with these two changes
# myFormula <- "y ~ fitModel(x, a)" ... becomes
myFormula <- y ~ fitModel(x, 1)
#nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data) ...becomes
nlsOutput <- lm(formula = myFormula, data = Data)
This works with the formula syntax (unquoted var names) and not the string because the formula can capture the environment.
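A small sketch of that point (not from the original answer):
makeFormula <- function() {
  helper <- function(x) x^2
  y ~ helper(x)   # created with ~, so the formula keeps this function's environment
}
fm <- makeFormula()
exists("helper", envir = environment(fm))    # TRUE: helper is reachable through the formula
environment(as.formula("y ~ helper(x)"))     # <environment: R_GlobalEnv> when run at the console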
At least that's how it should work. Package authors are free to evaluate formulas how ever they want, and the authors of the nlsLM() function decided to ignore the environment assigned to the formula. They do so in this function inside nlsLM()
FCT <- function(par) {
  mf[m] <- par
  rhs <- eval(formula[[3L]], envir = mf)
  res <- lhs - rhs
  res <- .swts * res
  res
}
So this is the second problem. Here they force the evaluation in the mf object, which is a data.frame made up of the covariates of the data and the parameter estimates. Had it been written as
rhs <- eval(formula[[3L]], envir = mf, environment(formula))
it would have worked. This is basically what model.frame() does in lm() that allows this to work. We can make our own "corrected" version of the function with
# tested with minpack.lm_1.1-8
nlsLM2<-nlsLM
body(nlsLM2)[[27]][[3]][[3]][[3]]<-quote(rhs<-eval(formula[[3L]], envir = mf, environment(formula)))
And then make these substitutions
# myFormula <- "y ~ fitModel(x, a)" ... becomes
myFormula <- y ~ fitModel(x, a)
#nlsOutput <- nlsLM(formula = myFormula, start = init, data = Data) ...becomes
nlsOutput <- nlsLM2(formula = myFormula, start = init, data = Data)
it will work and return
Nonlinear regression model
model: y ~ fitModel(x, a)
data: Data
a
0.5
residual sum-of-squares: 0
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.49e-08
So there's not really much you can say about how all R functions handle environments and scope. This behavior is unique to how the nlsLM() authors decided to evaluate their parameters.

R - Pass optional arguments to nested functions

I would like to write a function that calls different sub-functions with parameters specified by strings, such as:
genericModel <- function(model, dat, y, x, ...) {
  fit <- get(model)(get(y) ~ get(x), data = dat, ...)
  return(fit)
}
I am able to get it to work with simple cases:
> d <- data.frame(x.var = rnorm(10), y.var = rnorm(10), w = rep(1, 10))
> genericModel('lm', d, 'y.var', 'x.var')
Call:
get(model)(formula = get(y) ~ get(x), data = dat)
Coefficients:
(Intercept) get(x)
-0.04242 -0.31619
However, I have not been successful in terms of passing other optional arguments by string:
> genericModel('lm', d, 'y.var', 'x.var', weights = 'w')
Error in model.frame.default(formula = get(y) ~ get(x), data = dat, weights = "w", :
variable lengths differ (found for '(weights)')
I know I can do genericModel('lm', d, 'y.var', 'x.var', weights = d$w), but that defeats the purpose of creating a flexible function where I can specify the model and column names by string.
Also, I can foresee complications where the optional parameters include both column names of the data.frame (e.g. weights = w) and generic options for the sub-function (e.g. na.action = na.pass).
EDIT:
Just to clarify, what I am hoping to achieve is:
genericModel('lm', d, 'y.var', 'x.var', weights = 'w')
genericModel('glm', d, 'y.var', 'x.var', family = 'binomial')
To run linear regression and logistic regression, respectively. I need some way to pass the optional arguments when calling genericModel.
Does anyone know how to deal with this? Thanks.
One suggestion: rather than fiddling with strings to specify analysis variables, what you should do is pass the formula. This is also much more flexible, since you'll be able to pass complicated model formulas directly to the underlying functions without any parsing.
If you do this, then obtaining what you want is simple with some language hacking. Get the call to the function, then manipulate it to call the model-fitting function instead.
genericModel <- function(mod, formula, data, ...)
{
  cl <- match.call(expand=TRUE)
  cl[[1]] <- cl$mod
  cl$mod <- NULL
  eval(cl, parent.frame())
}
genericModel(lm, mpg ~ hp, data=mtcars, weights=gear)
genericModel(glm, Volume ~ Girth + Height, data=trees, family=Gamma(link=log))
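A nice side effect of this approach is that the fitted object records the real model call (the output shown as a comment is indicative):
m <- genericModel(lm, mpg ~ hp, data = mtcars, weights = gear)
m$call
# lm(formula = mpg ~ hp, data = mtcars, weights = gear)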
