Save an object in R using a function parameter as name - r

I looked all over the website and could not get the correct answer for this dilemma:
I have an UDF for evaluating some classification models, with different datasets, and i wanted to have a single function for evaluating them. I want to have something like the following, that given the name of the model and the data, it computes some metrics (confusion matrix for example) and saves them to an object outside the function.
The problem here is that I want to create this object using the name of the model I am evaluating.
I ended up with something like this:
foo <- function(x) {return(as.character(substitute(x)))}
model1 <- lm(Sepal.Width ~ Sepal.Length, iris)
Validation.func <- function(model_name, dataset){
Pred_Train = predict(model_name, dataset)
assign(paste("Pred_Train_",foo(model_name), sep=''), Pred_Train, envir=globalenv())
Pred_Train_prob = predict(model_name, dataset, type = "prob")
MC_Train = confusionMatrix(Pred_Train, dataset$target_salto)
}
Running it for Validation.func(model1,iris) We would want to get the variable stored as "Pred_Train_model1".
As model_name is not a string we had to try to convert it using the foo function (which is the answer i found in here) foo = function(x)deparse(substitute(x)) I do not get what I want, since it saves the object as: "Pred_Train_model_name" instead of "Pred_Train_model1".
Does anyone know how to solve it?

model_name in your function must be a model object, hence cannot be used in paste function, which expects characters.
I think you want your function to know that the model object is actually called "model1" in the environment where it comes from. I think this is quite tricky attempt since your model object may be called by various names.
The easiest implementation would be to give both model object and the name separately, and the use the former for prediction and the latter for naming the outcome.
func1 <- function(model, model_str, dataset)
{
p <- predict(model, dataset)
assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model1 <- lm(mpg ~ cyl, data=mtcars)
func1(model1, "model1", mtcars)
predict_model1
Another implementation, tricky but works if used with care, would be to give only the character name of the model and obtain the model object by get function from the parent environment.
func2 <- function(model_str, dataset)
{
p <- predict(get(model_str, envir=parent.env(environment())), dataset)
assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model2 <- lm(mpg ~ cyl, data=mtcars)
func2("model2", mtcars)
predict_model2
Finally, in order to give the model object to the function and let the function to find the variable name, then you can use match.call function to recover how the function has been called.
func3 <- function(model, dataset)
{
s <- match.call()
model_str <- as.character(s)[2]
p <- predict(model, dataset)
assign(paste("predict_", model_str, sep=""), p, envir=globalenv())
}
model3 <- lm(mpg ~ cyl, data=mtcars)
func3(model3, mtcars)
predict_model3

So here's a suggestion, that does not exactly solve the problem, but does make the function work.
Validation.func <- function(model_name, dataset){
model_name_obj<- eval(parse(text = model_name))
Pred_Train = predict(model_name_obj, dataset)
assign(paste("Pred_Train_",model_name, sep=''), Pred_Train, envir=globalenv())
Pred_Train_prob = predict(model_name_obj, dataset, type = "prob")
MC_Train = confusionMatrix(Pred_Train, dataset$target_salto)
}
Validation.func("model1", data)
What I did is pretty much the opposite of what you were trying. I passed model_name as a string, and then evaluate it using parse(text = model_name). Note that the evaluated object is now called model_name_obj and it is passed in the predict function.
I got some errors later on in the function, but they are irrelevant to the issue at hand. They had to do with the type argument in predict and about not recognizing the confusionMatrix, because I assume I didn't load the corresponding package.

Related

Error including correlation structure in function with gamm

I am trying to create my own function that contains 1.) the mgcv gamm function and 2.) a nested autocorrelation (ARMA) argument. I am getting an error when I try to run the function like this:
df <- AirPassengers
df <- as.data.frame(df)
df$month <- rep(1:12)
df$yr <- rep(1949:1960,each=12)
df$datediff <- 1:nrow(df)
try_fxn1 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+s(datediff,bs="ts",k=20), data=dfz,correlation = corARMA(form = ~ 1|yr, p=2))}
try_fxn1(df,"x")
Error in eval(predvars, data, env) : object 'dfz' not found
I know the issue is with the correlation portion of the formula, as when I run the same function without the correlation structure included (as seen below), the function behaves as expected.
try_fxn2 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+ s(datediff,bs="ts",k=20), data=dfz)}
try_fxn2(df,"x")
Any ideas on how I can modify try_fxn1 to make the function behave as expected? Thank you!
You are confusing a vector with the symbolic representation of that vector when building a formula.
You don't want dfz[[colz]] as the response in the formula, you want x or whatever you set colz too. What you are getting is
dfz[[colz]] ~ ...
when what you really want is the variable colz:
colz ~ ...
And you don't want a literal colz but whatever colz evaluates to. To do this you can create a formula by pasting the parts together:
fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
This turns colz into whatever it was storing, not the literal colz:
> fml
[1] "x ~ s(month, bs=\"cc\", k=12) + s(datediff,bs=\"ts\",k=20)"
Then convert the string into a formula object using formula() or as.formula().
The final solution then is:
fit_fun <- function(dfz, colz) {
fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
fml <- formula(fml)
gamm(fml, data = df, correlation = corARMA(form = ~ 1|yr, p=2))
}
This really is not an issue with corARMA() part, other than that triggers somewhat different evaluation code for the formula. The guiding mantra here is to always get a formula as you would type it if not programming with formulas. You would never (or should never) write a formula like
gamm(df[[var]] ~ x + s(z), ....)
While this might work in some settings, it will fail miserably if you ever want to use predict()` and it fails when you have to do something a little more complicated.

Strange glm() behavior in a function

Please help me understand the re-producible example below.
I am trying to write a function glm_func() that would call glm(). It works perfectly fine outside of a function.
However, if I pass the linear model formula as an argument, the function glm_func() gives out a strange error:
Error in eval(extras, data, env) : object 'modeldata' not found
Can someone help me understand what went wrong?
# Fully reproducable example
# Specify data
aa = data.frame(y=1:100, x1=1:100, x2=rep(1, 100), z=runif(100))
lm_formula = as.formula('y ~ x1 + x2')
weight_var = 'z'
# GLM works as-is outside of a function
model1 = glm(formula = lm_formula, data = aa, weights = aa[[weight_var]])
# Why does this function not work?
glm_func <- function(modeldata, formula, weight){
thismodel=glm(
formula = formula, #<----- Does not work if formula is passed from argument
data = modeldata, weights = modeldata[[weight]])}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
# This function works
glm_func2 <- function(modeldata, weight){
thismodel=glm(
formula = y ~ x1 + x2, #<----- Works if formula is hardcoded
data = modeldata, weights = modeldata[[weight]])}
glm_func2(modeldata=aa, weight=weight_var)
From help("formula"):
A formula object has an associated environment, and this environment
(rather than the parent environment) is used by model.frame to
evaluate variables that are not found in the supplied data argument.
Formulas created with the ~ operator use the environment in which they
were created. Formulas created with as.formula will use the env
argument for their environment.
From this one would expect that you don't need to care about the environment if you use the data argument. Sadly that's not the case here because the weights are evaluated within the formula's environment (Thanks to useruser2554330 for pointing this out!).
So, you need to ensure that your function environment is associated with the formula:
glm_func <- function(modeldata, formula, weight){
environment(formula) <- environment()
glm(formula = formula, data = modeldata,
weights = modeldata[[weight]])
}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
#works
Personally, I'd do this instead:
glm_func <- function(modeldata, formula, weight){
environment(formula) <- environment()
eval(
bquote(
glm(formula = .(formula), data = modeldata,
weights = modeldata[[weight]])
)
)
}
This way, the actual formula is printed when you print the model object.
As #Roland commented that a formula object has an associated environment so instead of passing a formula object you can pass the variables and create the formula inside the function.
glm_func <- function(modeldata, resp, predictor, weight){
glm(formula = reformulate(predictor, resp),
data = modeldata, weights = modeldata[[weight]])
}
glm_func(modeldata=aa, 'y', c('x1', 'x2'), weight=weight_var)

Include an object within a function only if it exists

I have a loop that needs to be executed; within which are 6 models. The objects that those models are stored in then need to get passed into a function that executes an AIC analysis. However, sometimes one of the models does not work, which then breaks the code for the AIC function because it does not recognize whatever model that failed because it was not stored as an object.
So, I need a way to pull those models that worked into the AIC function.
Here is an example, but keep in mind it is important that this can all be executed within a loop. Here are three hypothetical models:
hn.1 <- ds(data)
hn.1.obs <- ds(data,formula = ~OBSCODE)
hn.1.obs.mas <- ds(dataformula = ~OBSCODE+MAS)
And this would be my AIC function that compares the models:
summarize_ds_models(hn.1, hn.1.obs, hn.1.obs.mas)
But I get an error if say, the hn.1.obs.mas model failed.
I tried to use "get" and "ls" and I successfully pull the models that exist when I call:
get(ls(pattern='hn.15*'))
But that just returns a character vector, so that when I call:
summarize_ds_models(get(ls(pattern='hn.15*')))
it only conducts the AIC analysis on the first model in the above character vector.
Am I on the right track or is there a better way to do this?
UPDATE with a reproducible example.
Here is a simplified version of my problem:
create and fill two data frames that will be put into a list:
data.frame <- data.frame(x = integer(4),
y = integer(4),
z = integer(4),
i = integer(4))
data.frame$x <- c(1,2,3,4)
data.frame$y <- c(1,4,9,16)
data.frame$z <- c(1,3,8,10)
data.frame$i <- c(1,5,10,15)
data.frame.2 <- data.frame[1:4,1:3]
my.list <- list(data.frame,data.frame.2)
create df to fill with best models from AIC analyses
bestmodels <- data.frame(modelname = character(2))
Here is the function that will run the loop:
myfun <- function(list) {
for (i in 1:length(my.list)){
mod.1 = lm(y ~ x, data = my.list[[i]])
mod.2 = lm(y ~ x + z, data = my.list[[i]])
mod.3 = lm(y ~ i, data = my.list[[i]])
bestmodels[i,1] <- rownames(AIC(mod.1,mod.2,mod.3))[1]#bestmodel is 1st row
}
print(bestmodels)
}
However, on the second iteration of the loop, the AIC function will fail because mod.3 will fail. So, is there a generic way to make it so the AIC function will only execute for those models that worked? The outcome I would want here would be:
> bestmodels
modelname
1 mod.1
2 mod.1
since mod.1 would be chosen for both AIC analyses.
Gregor's comment:
Use a list instead of individual named objects. Then do.call(summarize_ds_models, my_list_of_models). If it isn't done already, you can Filter the list first to make sure only working models are in the list.
solved my problem. Thanks

Inner functions pulls call from outer function and causes error

I'm using a function from the library leaps within another function. The last two rows of the leaps function in question goes:
rval$call <- sys.call(sys.parent())
rval
This apparently causes the call to the outer function to be passed to rval$call. And the actual call to the regsubsets function is needed as an argument later on.
Below an example to illustrate:
library(leaps)
#Create some sample data to perform a regression on
inda <- rnorm(100)
indb <- rnorm(100)
dep <- 2 + 0.1*inda + 0.2*indb + rnorm(100, sd = 0.3)
dfk <- data.frame(dep=dep, inda = inda, indb = indb)
#Create some arbitrary outer function
test <- function(dependent, data){
best.fit <- regsubsets(as.formula(paste0(dependent, " ~ .")), data = data, nvmax = 2)
return(best.fit)
}
#Call outer function
best <- test("dep", dfk)
best$call #Returns "test("dep", dfk)"
So best$call will contain the call to the outer function (test), and not the call to the inner (regsubsets) function. As it's not really an option to change the inner function, is there any way of avoiding this problem?
EDIT:
One way around the problem could be something like this:
test <- function(dependent, data){
thecall <- 'regsubsets(as.formula(paste0(dependent, " ~ .")), data = data, nvmax = 2)'
best.fit <- eval(parse(text = thecall))
#best.fit$call <- [some transformation of thecall
return(best.fit)
}
EDIT2:
The reason I need to access what's inside $call is that it's needed in a predict function that I copied from Introduction to statitical learning:
predict.regsubsets <- function(regsubset_model, newdata, id, ...){
form <- as.formula(regsubset_model$call[[2]])
mat <- model.matrix(form, newdata)
coefi <- coef(regsubset_model, id = id)
xvars <- names(coefi)
mat[, xvars] %*% coefi
}
In the second line it uses $call
I’m still not entirely clear on how this is going to be used but in the case of your test function, you could write the following code:
test = function (dependent, data) {
regsubsets_call = bquote(regsubsets(.(as.formula(paste0(dependent, " ~ ."))),
data = .(substitute(data)), nvmax = 2))
best_fit = eval(regsubsets_call)
best_fit$call = regsubsets_call
best_fit
}
However, the result may not work with downstream functions the package provides (though, realistically, it probably will; I’m guessing summary.regsubsets only uses it to print the call).
What’s going on here?
bquote constructs an unevaluated R expression; it’s similar to quote but it allows you to interpolate values (similar to substitute). substitute(data) means that, rather than putting the actual data.frame into the call (which would lead to a very unwieldy output, it puts the variable name (or expression) the user passed to test. So if the user called it as test('mpg', mtcars), then the resulting expression would be
regsubsets(mpg ~ ., data = mtcars, nvmax = 2)
The resulting call object is then (a) evaluated via eval, and (b) stored in the resulting $call.
Incidentally, the formula can (and, as far as I’m concerned, should) be constructed in the same way; no need to parse a string:
as.formula(bquote(.(as.name(dependent)) ~ .))
Taken together, the whole expression would then become:
formula = as.formula(bquote(.(as.name(dependent)) ~ .))
regsubsets_call = bquote(regsubsets(.(formula), data = .(substitute(data)), nvmax = 2))

Working with dataframe variable names passed as arguments to the R function

I want to fit a statistical model (e.g. a linear one) for arbitrary variables from the dataframe passed as arguments to a function.
E.g.
myLm = function(x,y) {
fit = lm(y~x, data=mtcars)}
myLm("cyl","mpg")
But this does not work.
How do I do that correctly?
You have to pass a formula object to lm function. But you give characters in input of myLM.
You can change your function in this way
myLm = function(x,y) {
fit = lm(formula(paste(x,"~",y)), data=mtcars)
}
myLm("cyl","mpg")
You still give characters input but they are converted inside your function myLM
Why don't you pass a formula object directly:
myLm <- function(f)
lm(f, data=mtcars)
myLm(mpg ~ cyl)

Resources