I'm trying to write a function that runs and compares a set of models given a dataset and a variable name. Essentially, I want to change one model set and have it apply to all relevant dependent variables, comparing an a priori model set rather than using a data-dredging function like glmulti. A simple example:
library(bbmle)  # provides AICctab()
RunModel <- function(dataset, response)
{
m1 <- lm(formula=response ~ 1, data=dataset)
m2 <- lm(formula=response ~ 1 + temperature, data=dataset)
comp <- AICctab(m1, m2, base = TRUE, weights = TRUE, nobs = nrow(dataset))
return(comp)
}
If I hard-code a specific variable name within the function, it runs the models correctly. However, using the code above and passing a text value as the response argument doesn't work:
RunModel(dataset=MyData,response="responsevariablename")
yields an error: invalid type (NULL) for variable 'dataset$response', which I interpret to mean it isn't finding the column I'm telling it to use. My problem must be in how R inserts a text value as an argument in the function.
How do I enter the response variable name so R knows that "formula=response ~" becomes "formula=dataset$responsevariablename ~"?
ETA: working answer based on this solution:
RunModel <- function(dataset, response)
{
# substitute() captures the unquoted column name; eval() looks it up in dataset
resvar <- eval(substitute(response), dataset)
m1 <- lm(formula=resvar ~ 1, data=dataset)
m2 <- lm(formula=resvar ~ 1 + R.biomass, data=dataset)
comp <- AICctab(m1, m2, base = TRUE, weights = TRUE, nobs = nrow(dataset))
return(comp)
}
RunModel(dataset=MyData,response=responsevariablename)
NB: this didn't work when I put quotes around the response argument.
You should be able to use match.call() to achieve this.
See this post
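A minimal base-R sketch of the same idea, accepting the response either unquoted or as a string. It assumes mtcars, a predictor named wt, and base AIC() as hypothetical stand-ins for MyData, temperature and bbmle::AICctab(). The quoted form fails in the working answer because substitute() simply hands back the string, which eval() returns unchanged, so no column is ever looked up; here the string is converted to a symbol and spliced into the formula with reformulate():

```r
# Sketch: mtcars, 'wt' and AIC() are stand-ins for the asker's data,
# predictor and bbmle::AICctab().
run_model <- function(dataset, response, predictor = "wt") {
  # match.call() returns the call with unevaluated arguments:
  # 'response' is a symbol for run_model(mtcars, mpg), a string for "mpg"
  resp <- match.call()$response
  if (is.character(resp)) resp <- as.name(resp)
  m1 <- lm(reformulate("1", response = resp), data = dataset)        # resp ~ 1
  m2 <- lm(reformulate(predictor, response = resp), data = dataset)  # resp ~ wt
  AIC(m1, m2)
}

unquoted <- run_model(mtcars, mpg)
quoted <- run_model(mtcars, "mpg")
print(quoted)
```

Note that a variable holding the name (v <- "mpg"; run_model(mtcars, v)) would be captured as the symbol v, so this sketch does not cover that case.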
I am trying to create my own function that contains 1) the mgcv gamm function and 2) a nested autocorrelation (ARMA) argument. I get an error when I run the function like this:
library(mgcv)  # gamm(); mgcv also attaches nlme, which provides corARMA()
df <- AirPassengers
df <- as.data.frame(df)
df$month <- rep(1:12, times = 12)
df$yr <- rep(1949:1960,each=12)
df$datediff <- 1:nrow(df)
try_fxn1 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+s(datediff,bs="ts",k=20), data=dfz,correlation = corARMA(form = ~ 1|yr, p=2))}
try_fxn1(df,"x")
Error in eval(predvars, data, env) : object 'dfz' not found
I know the issue is with the correlation part of the call, because when I run the same function without the correlation structure (as below), it behaves as expected.
try_fxn2 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+ s(datediff,bs="ts",k=20), data=dfz)}
try_fxn2(df,"x")
Any ideas on how I can modify try_fxn1 to make the function behave as expected? Thank you!
You are confusing a vector with the symbolic representation of that vector when building a formula.
You don't want dfz[[colz]] as the response in the formula; you want x, or whatever you set colz to. What you are getting is
dfz[[colz]] ~ ...
when what you really want is the variable named by colz:
colz ~ ...
And you don't want a literal colz but whatever colz evaluates to. To do this you can create a formula by pasting the parts together:
fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
This substitutes the value stored in colz, not the literal name colz:
> fml
[1] "x ~ s(month, bs=\"cc\", k=12) + s(datediff,bs=\"ts\",k=20)"
Then convert the string into a formula object using formula() or as.formula().
The final solution then is:
fit_fun <- function(dfz, colz) {
fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
fml <- formula(fml)
gamm(fml, data = dfz, correlation = corARMA(form = ~ 1|yr, p=2))
}
This really is not an issue with the corARMA() part, other than that it triggers somewhat different evaluation code for the formula. The guiding mantra here is to always build the formula exactly as you would type it if you were not programming with formulas. You would never (or should never) write a formula like
gamm(df[[var]] ~ x + s(z), ....)
While this might work in some settings, it will fail miserably if you ever want to use predict(), and it fails as soon as you need to do something a little more complicated.
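A minimal sketch of the paste() + formula() pattern from this answer, with lm() and mtcars as hypothetical stand-ins for gamm() and the AirPassengers data so it runs without mgcv. Because the formula names the columns directly, predict() looks them up in newdata:

```r
# Sketch: lm() and mtcars stand in for gamm() and the AirPassengers data.
fit_fun_lm <- function(dfz, colz) {
  # Paste the pieces together, then convert the string to a formula object
  fml <- formula(paste(colz, "~ wt + hp"))
  lm(fml, data = dfz)
}

fit <- fit_fun_lm(mtcars, "mpg")
all.vars(formula(fit))                      # "mpg" "wt" "hp"
preds <- predict(fit, newdata = head(mtcars, 3))
```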
Non-standard evaluation in R: I want to send a formula to a function that uses lm.
I have a data frame with one response, y, and many predictors. I will fit a model inside a function. The function will receive a filter criterion as a string and the name of the predictor variable as a string; the response will enter as a name. The function will filter on the criterion, then fit a model using the predictor variable that was passed to it as a string. I can't get the predictor string to work correctly.
This is very close to using non-standard evaluation with formula.
In fact, I illustrate that solution below; it gets me part of the way there.
Difference: I want to send a string containing the name of my predictor instead of sending the predictor itself to the function.
Use case: eventually I will put this in a Shiny app and let the user select the predictor and response as well as the filter.
Here is what works:
library(dplyr)  # filter(), %>%, tibble() and enquo()
# create a data frame.
n <- 100
levels_1 <- sample(c("a","b","c"),n,replace=TRUE)
levels_2 <- sample(c("a","b","c"),n,replace=TRUE)
d <-tibble(l_1 = levels_1 ,l_2 = levels_2, y = rnorm(n))
# A function that works
my_lm <- function(d,predictor,response,filter_criteria){
d1 <- d %>% filter(l_2 == 'a')
lm(y ~ l_1,data=d1)
}
my_lm(d,l_1,y,'a')
my_lm2 <- function(d,predictor,response,filter_criteria){
enquo_predictor <- enquo(predictor)
enquo_response <- enquo(response)
enquo_filter_criteria <- enquo(filter_criteria)
d1 <- d %>% filter(l_2 == !!filter_criteria)
form <- as.formula(paste(enquo_response, " ~ ", predictor)[2])
# form <- as.formula(paste(enquo_response, " ~ ", enquo_predictor)[2]) wrong way to do it.
lm(form,data=d1)
#lm(!!enqu_preditor ~ !!enquo_response,data=d1)
}
selected_var <- names(d)[1]
selected_var
filter_value <- 'a'
my_lm2(d,l_1,y,filter_value) # This works but is not what I want.
my_lm2(d,selected_var,y,filter_value) # This does not work but is what I want to work.
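One way to get the string version working, as a minimal sketch: capture the unquoted response with deparse(substitute()), keep the predictor as a string, and let reformulate() build the formula. It uses base-R subsetting in place of dplyr::filter() and data.frame() in place of tibble() (assumptions, to keep it dependency-free); my_lm3 is a hypothetical name:

```r
# Base-R sketch: [ subsetting replaces dplyr::filter(), data.frame() replaces tibble()
set.seed(1)
n <- 100
d <- data.frame(l_1 = sample(c("a", "b", "c"), n, replace = TRUE),
                l_2 = sample(c("a", "b", "c"), n, replace = TRUE),
                y = rnorm(n))

my_lm3 <- function(d, predictor, response, filter_criteria) {
  response_name <- deparse(substitute(response))       # unquoted y -> "y"
  d1 <- d[d$l_2 == filter_criteria, , drop = FALSE]     # keep rows where l_2 matches
  form <- reformulate(predictor, response = response_name)  # e.g. y ~ l_1
  lm(form, data = d1)
}

selected_var <- names(d)[1]   # "l_1", e.g. chosen by the user in Shiny
fit <- my_lm3(d, selected_var, y, "a")
```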
I need some clarification on the primary post on Passing a data.frame column name to a function
I need to create a function that takes a testSet, a trainSet, and a colName (aka predictor) as inputs and prints a plot of the dataset with a GAM trend line.
The issue I run into is:
plot.model = function(predictor, spar, train, test) {
mod = gam(Response ~ s(train[[predictor]], spar = spar), data = train)
...
}
#Function Call
plot.model("Predictor1", 1.0, crime.train, crime.test)
I can't simply pass the predictor as a string into the gam function, but I also can't use a string to index the data frame values as shown in the link above. Somehow, I need to pass the colName key to the gam function. The same issue occurs in similar plotting scenarios:
plot <- ggplot(data = test, mapping = aes(x=predictor, y=ViolentCrimesPerPop))
Again, I can't pass a string value for the column name and I can't pass the column values either.
Does anyone have a generic solution for these situations? I apologize if the answer is buried in the above link, but it's not clear to me whether it is.
Note: A working gam function call looks like this:
mod = gam(Response ~ s(Predictor1, spar = 1.0), data = train)
where the train set is a data frame with column names "Response" and "Predictor1".
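Use aes_string instead of aes when you pass a column name as a string, and note that every aesthetic must then be a string (aes_string() has since been deprecated in newer ggplot2 releases; aes(x = .data[[predictor]]) is the tidy-evaluation equivalent):
plot <- ggplot(data = test, mapping = aes_string(x=predictor, y="ViolentCrimesPerPop"))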
For the gam function: this example is copied from the gam() documentation. I have used a vector of predictors; a single predictor is even easier. It's just paste() with a collapse argument.
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
# String manipulate for formula
formula <- as.formula(paste("y~s(", paste(colnames(dat)[2:5], collapse = ")+s("), ")", sep =""))
b <- gam(formula, data=dat)
is the same as
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
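The same formulas can be built with reformulate(), which joins the term strings with + for you. A minimal sketch; 'Response', 'Predictor1' and spar = 1 mirror the question, x0 to x3 mirror the gamSim() example, and only the formulas are built (no model is fitted, so neither gam package is needed):

```r
# Build gam-style formulas from column-name strings with reformulate()
one_term <- reformulate(sprintf("s(%s, spar = %s)", "Predictor1", 1),
                        response = "Response")
many_terms <- reformulate(sprintf("s(%s)", c("x0", "x1", "x2", "x3")),
                          response = "y")
print(one_term)    # Response ~ s(Predictor1, spar = 1)
print(many_terms)  # y ~ s(x0) + s(x1) + s(x2) + s(x3)
```

Inside plot.model(), passing reformulate(sprintf("s(%s, spar = %s)", predictor, spar), response = "Response") to gam() should then give the same model as the hard-coded call, assuming the gam package's s() (which accepts spar), as in the question.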
I'm using a function from the leaps package within another function. The last two lines of the leaps function in question are:
rval$call <- sys.call(sys.parent())
rval
This apparently causes the call to the outer function to be stored in rval$call, but the actual call to the regsubsets function is needed as an argument later on.
Below an example to illustrate:
library(leaps)
#Create some sample data to perform a regression on
inda <- rnorm(100)
indb <- rnorm(100)
dep <- 2 + 0.1*inda + 0.2*indb + rnorm(100, sd = 0.3)
dfk <- data.frame(dep=dep, inda = inda, indb = indb)
#Create some arbitrary outer function
test <- function(dependent, data){
best.fit <- regsubsets(as.formula(paste0(dependent, " ~ .")), data = data, nvmax = 2)
return(best.fit)
}
#Call outer function
best <- test("dep", dfk)
best$call #Returns "test("dep", dfk)"
So best$call will contain the call to the outer function (test), and not the call to the inner (regsubsets) function. As it's not really an option to change the inner function, is there any way of avoiding this problem?
EDIT:
One way around the problem could be something like this:
test <- function(dependent, data){
thecall <- 'regsubsets(as.formula(paste0(dependent, " ~ .")), data = data, nvmax = 2)'
best.fit <- eval(parse(text = thecall))
#best.fit$call <- [some transformation of thecall]
return(best.fit)
}
EDIT2:
The reason I need to access what's inside $call is that it's needed in a predict function that I copied from An Introduction to Statistical Learning:
predict.regsubsets <- function(regsubset_model, newdata, id, ...){
form <- as.formula(regsubset_model$call[[2]])
mat <- model.matrix(form, newdata)
coefi <- coef(regsubset_model, id = id)
xvars <- names(coefi)
mat[, xvars] %*% coefi
}
The second line uses $call.
I’m still not entirely clear on how this is going to be used but in the case of your test function, you could write the following code:
test = function (dependent, data) {
regsubsets_call = bquote(regsubsets(.(as.formula(paste0(dependent, " ~ ."))),
data = .(substitute(data)), nvmax = 2))
best_fit = eval(regsubsets_call)
best_fit$call = regsubsets_call
best_fit
}
However, the result may not work with downstream functions the package provides (though, realistically, it probably will; I’m guessing summary.regsubsets only uses it to print the call).
What’s going on here?
bquote constructs an unevaluated R expression; it's similar to quote, but it allows you to interpolate values (similar to substitute). substitute(data) means that, rather than putting the actual data.frame into the call (which would lead to very unwieldy output), it puts the variable name (or expression) the user passed to test. So if the user called test('mpg', mtcars), the resulting expression would be
regsubsets(mpg ~ ., data = mtcars, nvmax = 2)
The resulting call object is then (a) evaluated via eval, and (b) stored in the resulting $call.
Incidentally, the formula can (and, as far as I’m concerned, should) be constructed in the same way; no need to parse a string:
as.formula(bquote(.(as.name(dependent)) ~ .))
Taken together, the whole expression would then become:
formula = as.formula(bquote(.(as.name(dependent)) ~ .))
regsubsets_call = bquote(regsubsets(.(formula), data = .(substitute(data)), nvmax = 2))
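To see the whole pattern without installing leaps, here is a sketch with a hypothetical inner_fit() that records its call with the same sys.call(sys.parent()) idiom quoted in the question. The naive wrapper ends up with its own call in $call, while the bquote() wrapper stores the constructed call. One deviation from the answer's code: the call is evaluated in parent.frame(), so data is resolved where the user called the wrapper rather than inside it:

```r
# Hypothetical stand-in for leaps::regsubsets(): records its call the same way
inner_fit <- function(formula, data) {
  rval <- list(coefficients = coef(lm(formula, data = data)))
  rval$call <- sys.call(sys.parent())
  rval
}

# Naive wrapper: $call ends up holding the wrapper's own call
test_naive <- function(dependent, data) {
  inner_fit(as.formula(paste0(dependent, " ~ .")), data = data)
}

# bquote() wrapper: build the call, evaluate it, then store it in $call
test_bquote <- function(dependent, data) {
  fml <- as.formula(bquote(.(as.name(dependent)) ~ .))
  the_call <- bquote(inner_fit(.(fml), data = .(substitute(data))))
  fit <- eval(the_call, parent.frame())  # resolve 'data' in the caller's frame
  fit$call <- the_call
  fit
}

naive <- test_naive("mpg", mtcars)
fixed <- test_bquote("mpg", mtcars)
print(naive$call)                      # test_naive("mpg", mtcars)
form <- as.formula(fixed$call[[2]])    # what predict.regsubsets() extracts
```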
I've done a fair amount of reading here on SO and learned that I should generally avoid manipulation of formula objects as strings, but I haven't quite found how to do this in a safe manner:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
# Arguments are unquoted and in the typical form for lm etc
# Do some plotting with lattice using formula & groups (works, not shown)
# Append 'groups' to 'formula':
# Change y ~ x as passed in argument 'formula' to
# y ~ x * gr where gr is the argument 'groups' with
# scoping so it will be understood by aov
new_formula <- y ~ x * gr
# Now do some anova (could do if formula were right)
model <- aov(formula = new_formula, data = data)
# And print the aov table on the plot (can do)
print(summary(model)) # this will do for testing
}
Perhaps the closest I came was to use reformulate but that only gives + on the RHS, not *. I want to use the function like this:
p <- tf(carat ~ color, groups = clarity, data = diamonds)
and get the aov results for carat ~ color * clarity. Thanks in advance.
Solution
Here is a working version based on @Aaron's comment that demonstrates what's happening:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
print(deparse(substitute(groups)))
f <- paste(".~.*", deparse(substitute(groups)))
new_formula <- update.formula(formula, f)
print(new_formula)
model <- aov(formula = new_formula, data = data)
print(summary(model))
}
I think update.formula can solve your problem, but I've had trouble with update within function calls. It works as coded below, but note that I'm passing the column itself to groups, not the variable name. You then add that column to the function's dataset, and update works.
I also don't know if it does exactly what you want in the second example, but take a look at the help file for update.formula and experiment with it a bit.
http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.formula.html
tf <- function(formula,groups,d){
d$groups=groups
newForm = update(formula,~.*groups)
mod = lm(newForm,data=d)
}
dat = data.frame(carat=rnorm(10,0,1),color=rnorm(10,0,1),color2=rnorm(10,0,1),clarity=rnorm(10,0,1))
m = tf(carat~color,dat$clarity,d=dat)
m2 = tf(carat~color+color2,dat$clarity,d=dat)
tf2 <- function(formula, group, d) {
f <- paste(".~.*", deparse(substitute(group)))
newForm <- update.formula(formula, f)
lm(newForm, data=d)
}
mA = tf2(carat~color,clarity,d=dat)
m2A = tf2(carat~color+color2,clarity,d=dat)
EDIT:
As @Aaron pointed out, deparse and substitute solve my problem; I've added tf2 to the code example as the better option so you can see how both work.
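A string-free variant of tf2, as a minimal sketch: substitute() captures the unquoted grouping variable and bquote() splices it into the update template, so no deparse()/paste() round trip is needed. mtcars and its am column stand in for diamonds and clarity, and tf3 is a hypothetical name:

```r
tf3 <- function(formula, data, groups) {
  g <- substitute(groups)                                # e.g. the symbol am
  new_formula <- update(formula, bquote(. ~ . * .(g)))   # y ~ x  becomes  y ~ x * am
  aov(new_formula, data = data)
}

fit <- tf3(mpg ~ wt, data = mtcars, groups = am)
summary(fit)
```

update() expands the * so the fitted terms are wt, am and wt:am, the same model tf2 produces with the pasted ".~.* am" string.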
One technique I use when I have trouble with scoping and calling functions within functions is to pass the parameters as strings and then construct the call within the function from those strings. Here's what that would look like here.
tf <- function(formula, data, groups) {
f <- paste(".~.*", groups)
m <- eval(call("aov", update.formula(as.formula(formula), f), data = as.name(data)))
summary(m)
}
tf("mpg~vs", "mtcars", "am")
See this answer to one of my previous questions for another example of this: https://stackoverflow.com/a/7668846/210673.
Also see this answer to the sister question of this one, where I suggest something similar for use with xyplot: https://stackoverflow.com/a/14858661/210673