I have a large set of model specifications to test, which share a dv but have unique IVs. In the following example
foo <- data.frame(dv = sample(c(0,1), 100, replace=T),
x1 = runif(100),
x2 = runif(100))
I want the first model to only include x1, the second x2, the third both, and the fourth their interaction. So I thought a sensible way would be to build a list of formula statements:
bar <- list("x1",
"x2",
"x1+x2",
"x1*x2")
which I would then use in a llply call from the plyr package to obtain a list of model objects.
require(plyr)
res <- llply(bar, function(i) glm(dv ~ i, data = foo, family = binomial()))
Unfortunately I'm told
Error in model.frame.default(formula = dv ~ i, data = foo, drop.unused.levels = TRUE):variable lengths differ (found for 'i')
Obviously I'm mixing up something fundamental--do I need to manipulate the original foo list in some fashion?
Your problem is with how you are specifying the formula, since inside the function i is a variable. This would work:
glm(paste("dv ~", i), data = foo, family = binomial())
The problem is that dv ~ i isn't a formula. i is (inside the anonymous function) simply a symbol that represents a variable containing a character value.
Try this:
bar <- list("dv~x1",
"dv~x2",
"dv~x1+x2",
"dv~x1*x2")
res <- llply(bar, function(i) glm(i, data = foo, family = binomial()))
But setting statistical issues aside, it might possibly be easier to use something like ?step or ?stepAIC in the MASS package for tasks similar to this?
Related
I've encounter a weird behavior when calling lm within a lapply using the weights argument.
My code consist of a list of formula on which I run a linear model that I call in lapply. So far it was working:
dd <- data.frame(y = rnorm(100),
x1 = rnorm(100),
x2 = rnorm(100),
x3 = rnorm(100),
x4 = rnorm(100),
wg = runif(100,1,100))
ls.form <- list(
formula(y~x1+x2),
formula(y~x3+x4),
formula(y~x1|x2|x3),
formula(y~x1+x2+x3+x4)
)
res.no.wg <- lapply(ls.form, lm, data = dd)
However, when I add the weights argument, I get a weird error:
res.with.wg <- lapply(ls.form, lm, data = dd, weights = dd[,"wg"])
Error in eval(extras, data, env) :
..2 used in an incorrect context, no ... to look in
It's like if the ... from lapply was conflicting with the ... of the lm call but only because of the weights argument.
Any idea was is the cause of this problem and how to fix it?
NOTE: using the call without the lapply works as expected:
lm(ls.form[[1]], data = dd, weights = dd[,"wg"] )
Call:
lm(formula = ls.form[[1]], data = dd, weights = dd[, "wg"])
Coefficients:
(Intercept) x1 x2
-0.12020 0.06049 -0.01937
EDIT The final call is a lapply within a function of the type:
f1 <- function(samp, dat, wgt){
res.with.wg2 <- lapply(ls.form, function(x) {lm(formula = x, data=dat[samp,], weights=dat[samp,wgt])})
}
f1(1:66, dat=dd, wgt = "wg")
I am not really sure why it is not working, but I do think I have a fix for you:
res.with.wg2 <- lapply(ls.form,
function(x) {lm(formula = x, data=dd, weights=dd[,"wg"])})
Hope this helps!
There is a note in the help file for lapply:
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g., bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[i]], ...),
with i replaced by the current (integer or double) index. This is not
normally a problem, but it can be if FUN uses sys.call or match.call
or if it is a primitive function that makes use of the call. This
means that it is often safer to call primitive functions with a
wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is
required to ensure that method dispatch for is.numeric occurs
correctly.
lm uses match.call twice in its opening lines:
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
The solution noted in the help file and by #Florian is to use an anonymous function wrapper.
Update
For this specific problem of changing the model formula, you can rewrite to avoid calling lm within the lapply by using update instead:
# create base model (the formula here doesn't really matter, but we can specify the weights safely here)
baselm <- lm(y+x1,data=dd,weights=dd[,"wg"])
# update with lapply
lapply(ls.form,update,object=baselm)
[[1]]
Call:
lm(formula = y ~ x1 + x2, data = dd, weights = dd[, "wg"])
Coefficients:
(Intercept) x1 x2
0.07561 0.16111 0.15014
...
I am beginning to write a function that builds linear mixed models with nlme. I am encountering an error: Error in eval(expr, envir, enclos) : object 'value' not found, which I believe is due to R not knowing where to find the data frame variables (e.g., value). If this is, in fact, why the error is occurring, how do I tell the function that value and timepoint belong to the variables in Dat in the (reproducible) code below?
require(nlme)
Dat <- data.frame(
id = sample(10:19),
Time = sample(c("one", "two"), 10, replace = T),
Value = sample(1:10)
)
nlme_rct_lmm <- function (data, value, timepoint,
ID) {
#base_level intercept only model
bl_int_only <- gls(value ~ 1,
data = data,
method = "ML",
na.action="na.omit")
#vary intercept across participants
randomIntercept <- lme(value ~ 1,
data = data,
random = ~1|ID,
method = "ML",
na.action = "na.omit")
#add timepoint as a fixed effect
timeFE <- lme(value ~ timepoint,
data = data,
random = ~1|ID,
method = "ML",
na.action = "na.omit")
}
nlme_rct_lmm(Dat, Value, Time, id)
This isn't (as you and I both expected) a problem with evaluation within different frames; rather, it's an issue of consistency between the names of the variables between the formula and the data. R is case-sensitive, so it matters whether you use value or Value, id or ID, etc.. Furthermore, formula interpretation uses non-standard evaluation (NSE), so if you have a variable value equal to the symbol Value, value ~ 1 does not magically get transmuted to Value ~ 1. What I've outlined below works by passing the names of the response, time, and ID variables to the function, because it's the easiest approach. It's a little bit more elegant to the end-user if you use non-standard evaluation, but that's a bit harder to program (and therefore understand, debug, etc.).
Below the easy/boneheaded approach, I also discuss how to implement the NSE approach (scroll all the way down ...)
Note that your example doesn't return anything; with R, that means that all the results will be discarded when it finishes the function. You might want to return the results as a list (or perhaps your real function will do something other stuff with the fitted models, such as a series of model tests, and return those answers as the results ...)
require(nlme)
Dat <- data.frame(
ID = sample(10:19),
Time = sample(c("one", "two"), 10, replace = T),
Value = sample(1:10)
)
nlme_rct_lmm <- function (data, value, timepoint,
ID) {
nullmodel <- reformulate("1",response=value)
fullmodel <- reformulate(c("1",timepoint),response=value)
remodel <- reformulate(paste("1",ID,sep="|"))
#base_level intercept only model
bl_int_only <- gls(nullmodel,
data = data,
method = "ML",
na.action="na.omit")
#vary intercept across participants
randomIntercept <- lme(nullmodel,
data = data,
random = remodel,
method = "ML",
na.action = "na.omit")
#add timepoint as a fixed effect
timeFE <- lme(fullmodel,
data = data,
random = remodel,
method = "ML",
na.action = "na.omit")
}
nlme_rct_lmm(Dat, "Value", "Time", "ID")
If you want something a bit more elegant (but internally obscure) you can substitute the following lines for defining the models. The inner substitute() calls retrieves the symbols that were passed to the function as arguments; the outer substitute() calls insert those symbols into the formula.
nullmodel <- formula(substitute(v~1,list(v=substitute(value))))
fullmodel <- formula(substitute(v~t,list(v=substitute(value),
t=substitute(timepoint))))
remodel <- formula(substitute(~1|i,list(i=substitute(ID))))
Now this would work, without specifying the variables as strings, as you expected: nlme_rct_lmm(Dat, Value, Time, ID)
I am looking for a simple way to pass function arguments that are variable names into formulae in R functions.
Test dataset:
set.seed(4892)
df.pass <- data.frame("alfa"=sample(1:9, 100, replace=T), "beta"=sample(1:9, 100, replace=T),
"theta"=sample(1:9, 100, replace=T), "out"=runif(100, 0, 1))
Example analysis (testing if interaction model is different) to be made into function:
lrtest(glm(out~alfa*beta, family = binomial("logit"), df.pass),
glm(out~alfa + beta, family = binomial("logit"), df.pass))
If the goal is to create the generic function invinteract that solves the problem above with arbitrary variable names and data sets, what would be the simplest way to pass variable names from function() arguments to formulae terms corresponding to the positions of out, alfa and beta?
Inserting the raw variable names into the formulae does not work, because R tries to evaluate the names as objects and finds nothing.
Inserting string variable names directly into formulae does not work either.
Is it necessary to reconstruct the formulae with paste(), or is there a more direct way?
glm also accepts a character string instead of a formula. Thus, you can do this:
mytest <- function(DF, y, x1, x2) {
lrtest(glm(sprintf("%s ~ %s * %s", y, x1, x2), family = binomial("logit"), DF),
glm(sprintf("%s ~ %s + %s", y, x1, x2), family = binomial("logit"), DF))
}
mytest(df.pass, "out", "alfa", "beta")
I would like to write a function and calls different sub-functions with parameters specified by string, such as:
genericModel <- function(model, dat, y, x, ...) {
fit <- get(model)(get(y) ~ get(x), data = dat, ...)
return(fit)
}
I am able to get it to work with simple cases:
> d <- data.frame(x.var = rnorm(10), y.var = rnorm(10), w = rep(1, 10))
> genericModel('lm', d, 'y.var', 'x.var')
Call:
get(model)(formula = get(y) ~ get(x), data = dat)
Coefficients:
(Intercept) get(x)
-0.04242 -0.31619
However, I have not been successful in terms of passing other optional arguments by string:
> genericModel('lm', d, 'y.var', 'x.var', weights = 'w')
Error in model.frame.default(formula = get(y) ~ get(x), data = dat, weights = "w", :
variable lengths differ (found for '(weights)')
I know I can do genericModel('lm', d, 'y.var', 'x.var', weights = d$w), but that defeats the purpose of creating a flexible function where I can specify the model and column names by string.
Also I can foresee complications where the optional parameters include both column names of the data.frame(ex:weights = w) and generic options for the sub-function(ex:na.action=na.pass).
EDIT:
Just to clarify, what I am hoping to achieve is:
genericModel('lm', d, 'y.var', 'x.var', weights = 'w')
genericModel('glm', d, 'y.var', 'x.var', family = 'binomial')
To run linear regression and logistic regression, respectively. I need some way to pass the optional arguments when calling genericModel.
Does anyone know how to deal with this? Thanks.
One suggestion: rather than fiddling with strings to specify analysis variables, what you should do is pass the formula. This is also much more flexible, since you'll be able to pass complicated model formulas directly to the underlying functions without any parsing.
If you do this, then obtaining what you want is simple with some language hacking. Get the call to the function, then manipulate it to call the model-fitting function instead.
genericModel <- function(mod, formula, data, ...)
{
cl <- match.call(expand=TRUE)
cl[[1]] <- cl$mod
cl$mod <- NULL
eval(cl, parent.frame())
}
genericModel(lm, mpg ~ hp, data=mtcars, weights=gear)
genericModel(glm, Volume ~ Girth + Height, data=trees, family=Gamma(link=log))
I've done a fair amount of reading here on SO and learned that I should generally avoid manipulation of formula objects as strings, but I haven't quite found how to do this in a safe manner:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
# Arguments are unquoted and in the typical form for lm etc
# Do some plotting with lattice using formula & groups (works, not shown)
# Append 'groups' to 'formula':
# Change y ~ x as passed in argument 'formula' to
# y ~ x * gr where gr is the argument 'groups' with
# scoping so it will be understood by aov
new_formula <- y ~ x * gr
# Now do some anova (could do if formula were right)
model <- aov(formula = new_formula, data = data)
# And print the aov table on the plot (can do)
print(summary(model)) # this will do for testing
}
Perhaps the closest I came was to use reformulate but that only gives + on the RHS, not *. I want to use the function like this:
p <- tf(carat ~ color, groups = clarity, data = diamonds)
and have the aov results for carat ~ color * clarity. Thanks in Advance.
Solution
Here is a working version based on #Aaron's comment which demonstrates what's happening:
tf <- function(formula = NULL, data = NULL, groups = NULL, ...) {
print(deparse(substitute(groups)))
f <- paste(".~.*", deparse(substitute(groups)))
new_formula <- update.formula(formula, f)
print(new_formula)
model <- aov(formula = new_formula, data = data)
print(summary(model))
}
I think update.formula can solve your problem, but I've had trouble with update within function calls. It will work as I've coded it below, but note that I'm passing the column to group, not the variable name. You then add that column to the function dataset, then update works.
I also don't know if it's doing exactly what you want in the second equation, but take a look at the help file for update.formula and mess around with it a bit.
http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.formula.html
tf <- function(formula,groups,d){
d$groups=groups
newForm = update(formula,~.*groups)
mod = lm(newForm,data=d)
}
dat = data.frame(carat=rnorm(10,0,1),color=rnorm(10,0,1),color2=rnorm(10,0,1),clarity=rnorm(10,0,1))
m = tf(carat~color,dat$clarity,d=dat)
m2 = tf(carat~color+color2,dat$clarity,d=dat)
tf2 <- function(formula, group, d) {
f <- paste(".~.*", deparse(substitute(group)))
newForm <- update.formula(formula, f)
lm(newForm, data=d)
}
mA = tf2(carat~color,clarity,d=dat)
m2A = tf2(carat~color+color2,clarity,d=dat)
EDIT:
As #Aaron pointed out, it's deparse and substitute that solve my problem: I've added tf2 as the better option to the code example so you can see how both work.
One technique I use when I have trouble with scoping and calling functions within functions is to pass the parameters as strings and then construct the call within the function from those strings. Here's what that would look like here.
tf <- function(formula, data, groups) {
f <- paste(".~.*", groups)
m <- eval(call("aov", update.formula(as.formula(formula), f), data = as.name(data)))
summary(m)
}
tf("mpg~vs", "mtcars", "am")
See this answer to one of my previous questions for another example of this: https://stackoverflow.com/a/7668846/210673.
Also see this answer to the sister question of this one, where I suggest something similar for use with xyplot: https://stackoverflow.com/a/14858661/210673