I am writing a function that takes two variables and separately regresses each of them on a set of controls expressed as a one-sided formula. Right now I'm using the following to make the formula for one of the regressions, but it feels a bit hacked-up:
foo <- function(x, y, controls) {
cl <- match.call()
xn <- cl[["x"]]
xf <- as.formula(paste(xn, deparse(controls)))
}
I'd prefer to do this using update.formula(), but of course update.formula(controls, x ~ .) and update.formula(controls, as.name(x) ~ .) don't work. What should I be doing?
Here's one approach:
right <- ~ a + b + c
left <- ~ y
left_2 <- substitute(left ~ ., list(left = left[[2]]))
update(right, left_2)
But I think you'll have to either paste text strings together or use substitute. To the best of my knowledge, there is no function that creates a two-sided formula from two one-sided formulas (or anything equivalent).
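For what it's worth, reformulate can build the `response ~ .` template, which update then fills in from the one-sided formula. A minimal sketch, assuming the response is supplied as a string; add_response is a hypothetical helper name, not something from this thread:

```r
# Hypothetical helper (not from the thread): attach a response to a
# one-sided formula of controls.
add_response <- function(controls, response) {
  # reformulate(".", response = "y") builds the template  y ~ .
  template <- reformulate(".", response = response)
  # update() replaces "." on the right-hand side with the old right-hand side
  update(controls, template)
}

controls <- ~ a + b + c
f <- add_response(controls, "y")
all.vars(f)
```

This still relies on reformulate turning a string into a symbol, so it is arguably pasting in disguise, but it keeps the string handling out of your own function.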
I am not sure about update.formula(), but I have used the approach you take here of pasting text and converting it via as.formula in the past with success. My reading of help(update.formula) does not make me think you can substitute the left-hand side as you desire.
Lastly, trust the dispatching mechanism. If your object is of class formula, just call update, which is preferred over calling update.formula explicitly.
Related
I'm trying to understand why
foo = function(d,y,x) {
fit = with(d, lm(y ~ x))
}
foo(myData, Y, X)
won't work, where for instance
myData = data.frame(Y=rnorm(50), X=runif(50))
The bit that seems tricky to me is passing the arguments x and y to a formula, as in lm(y ~ x).
@DMT's answer explains what's going on nicely.
Here are the hoops to jump through if you want things to work as you expect:
lmwrap <- function(d,y,x) {
ys <- deparse(substitute(y))
xs <- deparse(substitute(x))
f <- reformulate(xs,response=ys)
return(lm(f,data=d))
}
mydata <- data.frame(X=1:10,Y=rnorm(10))
lmwrap(mydata,Y, X)
Or it can be simplified a bit if you pass the column names as strings rather than symbols.
lmwrap <- function(d, y, x) {
  # y and x are already character strings here, so no deparse(substitute()) is needed
  f <- reformulate(x, response = y)
  return(lm(f, data = d))
}
lmwrap(mydata, "Y", "X")
This approach will be a bit fragile, e.g. if you pass arguments through another function. Also, getting the "Call" part of the formula to read Y~X takes more trickery ...
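As for the "Call" part, one possible bit of trickery is to splice the formula into the call with bquote before evaluating it. This is only a sketch under the string-argument variant above; lmwrap2 is a made-up name:

```r
set.seed(1)
mydata <- data.frame(X = 1:10, Y = rnorm(10))

lmwrap2 <- function(d, y, x) {
  f <- reformulate(x, response = y)
  # bquote() inserts the formula object itself into the call, so the
  # stored call shows Y ~ X instead of the symbol f
  eval(bquote(lm(.(f), data = d)))
}

fit <- lmwrap2(mydata, "Y", "X")
fit$call
```

The data argument in the printed call will still read d, since only the formula is spliced in.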
Y and X are your column names, not variables. In this case they wouldn't be arguments to your function unless you passed them in as strings and essentially called
lm(mydata[,"Y"]~ mydata[,"X"])
If you were to run ls() in your console, Y and X would most likely not be there, so the function won't work. Try printing x and y just before the fit = call: you'll most likely get an "object not found" error, because Y and X exist only as columns of the data frame, and lm can't evaluate them either.
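To see this for yourself, you can catch the error rather than let it stop the script. A small sketch, assuming no objects called Y or X exist in your workspace:

```r
myData <- data.frame(Y = rnorm(50), X = runif(50))
foo <- function(d, y, x) with(d, lm(y ~ x))

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    foo(myData, Y, X)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# The names exist only as columns of the data frame
c("Y", "X") %in% names(myData)
```

Inside foo, y is a promise for the symbol Y, which R evaluates in the calling environment, not inside myData.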
One way to do this in your form is the following
lmwrap <- function(df, yname, xname) {
  # index the data frame that was actually passed in (df, not d)
  fit = lm(df[, yname] ~ df[, xname])
}
lmwrap(mydata,"Y", "X")
But you could just make the lm call as you normally would.
I need to use the mixed-model function lme many times in my code, but I do not know how to use it within a function. Called directly, lme works fine, but when used within a function it throws errors:
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = x ~1 , random = ~ 1|y/z,
data=cc,
method="REML")
}
on calling this function:
myfunc (dbcon2, birthweight, sire, dam)
I get the error :
Error in model.frame.default(formula = ~x + y + z, data = list(animal
= c("29601/9C1", : invalid type (list) for variable 'x'
I think there is a different procedure for this that I am unaware of. Any help would be greatly appreciated.
Thanks in advance
Not sure if this is what you are looking for, but as @akrun correctly pointed out, you can use paste. I am using paste0 here, which is a special case of paste that concatenates strings with no separator.
The idea is to concatenate the variable names into the text of the formula. Since paste returns a character string, which cannot be used as a formula to build the model, you need to convert that string with as.formula, wrapped around the paste0 call.
To see this, try building a formula with paste:
formula <-paste0("mpg~", paste0("hp","+", "am"))
print(formula)
[1] "mpg~hp+am"
class(formula)
[1] "character" ##This should ideally be a formula rather than character
formula <- as.formula(formula) ##conversion of character string to formula
class(formula)
[1] "formula"
To work inside a model, you always need a formula object. Also, try to learn about the collapse and sep options of paste; they are very handy.
I don't have your data, so I have used the mtcars data instead.
library("nlme")
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = as.formula(paste0(x," ~1")) , random = as.formula(paste0("~", "1|",y,"/",z)),
data=cc,
method="REML")
}
models <- myfunc(cc=mtcars, x="hp", y="mpg", z="am")
summary(models)
You can read more about paste by typing ?paste in your console.
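If you'd rather keep the string handling in one place, the two formulas can be built by a small helper and inspected before fitting. This is just a sketch: lme_formulas is a hypothetical name, and it assumes the column names are passed as strings.

```r
# Hypothetical helper: builds the two formulas lme() needs from column names.
lme_formulas <- function(x, y, z) {
  list(
    fixed  = reformulate("1", response = x),           # x ~ 1
    random = as.formula(paste0("~ 1 | ", y, "/", z))   # ~ 1 | y/z
  )
}

fs <- lme_formulas("birthweight", "sire", "dam")
fs$fixed
fs$random
```

With nlme loaded, these could then be passed on as lme(fixed = fs$fixed, random = fs$random, data = cc, method = "REML").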
Let me explain my goal first because while the title expresses my strategy, I don't think it is likely to be the only way to solve the problem.
I have an R function to which I pass fitted model objects, like those from lm, and the function extracts the model frame, saves that as a data frame, standardizes the variables in the new data frame, then refits the model with the standardized variables to ease the interpretation of the model's coefficients.
Example code without wrapping it in a function:
mod <- lm(mpg ~ wt, data = mtcars)
new_data <- model.frame(mod)
new_data <- data.frame(lapply(new_data, FUN = scale))
standardized_mod <- update(mod, data = new_data)
Now a summary of standardized_mod by virtue of being fitted with standardized data will give standardized coefficients.
This isn't the most efficient way of doing things, I admit, since I could do something like multiplying the estimates and SEs by each variable's standard deviation. But in the context of the function, I'm trying to be more flexible; this gets less straightforward when working with survey package objects and the like. I also use the same logic to fit models with interaction terms for simple slopes analysis. But this is beside the main point of the question; I just want to offer some explanation to avoid getting bogged down with "there's other ways to standardize coefficients" responses. I'm more interested in this general problem with formulae than the specific application.
The solution above falls apart when a function is applied to any of the variables. For example,
mod <- lm(mpg ~ log(wt), data = mtcars)
new_data <- model.frame(mod)
new_data <- data.frame(lapply(new_data, FUN = scale), check.names = FALSE)
standardized_mod <- update(mod, data = new_data)
This will break on update(mod, data = new_data), because lm is going to look for a column called wt to apply log to in new_data, which only has columns called mpg and log(wt).
What I would like to do is manipulate the model formula in such a way that it goes from mpg ~ log(wt) to mpg ~ `log(wt)`. Of course, if it was just log I was worried about, I might be able to get something really hacky going to address it. But I'd like to be able to do the same regardless of the function in the formula, like if it's poly or some such.
Here are some solutions I've considered:
Instead of update, re-fit the model with lm directly and use the . for the RHS of the formula. This would work for some cases, but has big drawbacks, too. This will ignore any interaction terms in the original formula or other arithmetic uses of the formula from the original model. It also won't fix the problem if the function was applied to the LHS of the formula in the original model.
Use some kind of convoluted regex matching to isolate terms that appear to be functions on the basis of being right before (, but as a general rule I'm fearful of using string manipulation since it may fail in confusing ways. I'm not completely ruling this route out, but I haven't wrapped my head around how to do it safely and am not sure how to match terms with functions without accidentally capturing other parts of the formula.
I've tried messing around with the terms object and trying to use that as a way to use update on the formula itself, but haven't had much luck figuring out how to edit the terms object in the right ways.
We can avoid having to re-create the formula like this. mm0 is the model matrix without the intercept column. Scale it, giving mm0_std. Now compute the new standardized lm:
mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
response <- mod$model[1]
mm0 <- model.matrix(mod)[, -1]
mm0_std <- scale(mm0)
mod_std <- lm(cbind(response, mm0_std))
If you do want the formula this will give it:
formula(mod_std)
## mpg ~ `log(wt)` + qsec + `log(wt):qsec`
## <environment: 0x000000000b1988c8>
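If you need this for arbitrary fitted lm objects, the same steps can be wrapped in a function. A sketch only: standardize_lm is a made-up name, and it drops the intercept by column name with drop = FALSE so that a model with a single predictor still yields a matrix.

```r
# Hypothetical wrapper around the model-matrix approach above.
standardize_lm <- function(mod) {
  response <- model.frame(mod)[1]            # one-column data frame
  mm <- model.matrix(mod)
  # drop the intercept by name; drop = FALSE keeps a one-column matrix a matrix
  mm <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
  lm(cbind(response, scale(mm)))
}

mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
mod_std <- standardize_lm(mod)
formula(mod_std)

# Also works with a single transformed predictor
mod1_std <- standardize_lm(lm(mpg ~ log(wt), data = mtcars))
```

As in the original, only the predictors are scaled here; the response is left on its own scale.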
I've thought of another potential solution as well, but I've not extensively tested it and it uses regex, which is in my understanding not the most R way of doing things.
mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
new_data <- model.frame(mod)
new_data <- data.frame(lapply(new_data, FUN = scale), check.names = FALSE)
We have the usual start, above.
Now I pull the variable names from the terms object.
vars <- as.character(attributes(terms(mod))$variables)
vars <- vars[-1] # gets rid of "list"
And save the full formula as a string.
char_form <- as.character(deparse(formula(mod)))
Now I iterate through the variables and use regex to surround each one in backticks. This gets around the trickier regex I was worried about with regard to detecting which variables had functions applied.
for (var in vars) {
backtick_name <- paste("`", var, "`", sep = "")
char_form <- gsub(var, backtick_name, char_form, fixed = TRUE)
}
If I want to specify a variable not to standardize, like the outcome variable, I can exclude it from the vars vector programmatically. For instance, I can do this:
response <- as.character(formula(mod))[2]
vars <- vars[vars != response]
Of course, we can remove the response by dropping the first item in the list, but the above is for demonstrative purposes.
Now I can refit the model with the new data and new formula.
new_model <- update(mod, formula = as.formula(char_form), data = new_data)
In this narrow case, I don't really need to use update since I have all I need for lm. But if I was starting with a glm object or some other model, other user-supplied arguments like family are preserved.
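One caveat with the string replacement is that it can misfire when one variable name is a substring of another (say wt and log(wt) in the same formula). A possible alternative is to walk the formula as a call and swap each matching sub-expression for a symbol. This is a sketch, not a tested general solution: quote_vars is a hypothetical helper, and it assumes the deparsed text of each sub-expression matches the model frame's column name.

```r
# Hypothetical helper: replace every sub-expression whose text matches a
# model-frame column name with a single symbol of that name.
quote_vars <- function(expr, vars) {
  txt <- paste(deparse(expr, width.cutoff = 500L), collapse = "")
  if (txt %in% vars) return(as.name(txt))
  if (is.call(expr)) {
    # rebuild the call with the same function and transformed arguments
    args <- lapply(as.list(expr)[-1], quote_vars, vars = vars)
    expr <- as.call(c(list(expr[[1]]), args))
  }
  expr
}

mod <- lm(mpg ~ log(wt) * qsec, data = mtcars)
new_data <- data.frame(lapply(model.frame(mod), FUN = scale), check.names = FALSE)

vars <- names(model.frame(mod))
old_form <- formula(mod)
# evaluating the rebuilt `~` call yields a formula again
new_form <- eval(quote_vars(old_form, vars), environment(old_form))
new_model <- update(mod, formula = new_form, data = new_data)

# wt and log(wt) together are handled without double-wrapping
mixed <- quote_vars(quote(mpg ~ wt + log(wt)), c("mpg", "wt", "log(wt)"))
```

Because whole sub-expressions are matched from the outside in, log(wt) is replaced before the walk ever reaches the wt inside it.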
Note: Weights and offsets can be problematic here, but it's not an intractable problem. I think the most straightforward thing to do is explicitly exclude columns named "(weights)" and "(offset)" from the model frame before scaling, then cbinding it back together afterwards. Then the user can use conditionals or some such to decide when to supply weights = `(weights)` and offset = `(offset)` arguments to update.
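The weights case might look something like the following. It is a sketch under the assumption that the model frame stores the weights in a column literally named "(weights)", as lm does; I have not tried it with other model classes.

```r
mod <- lm(mpg ~ wt, data = mtcars, weights = cyl)
mf <- model.frame(mod)

# Columns that the model frame adds for weights and offsets
special <- names(mf) %in% c("(weights)", "(offset)")

# Scale everything else, then bind the untouched columns back on
scaled <- data.frame(lapply(mf[!special], FUN = scale), check.names = FALSE)
new_data <- cbind(scaled, mf[special])

# Refit, pointing the weights argument at the preserved column
refit <- update(mod, data = new_data, weights = `(weights)`)
```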
Built-in functions in R can be used in formula objects, for example
reg1 = lm(y ~ log(x), data = data1)
How can I write my functions such that they can be used in formula objects?
fnMyFun = function(x) {
return(x^2)
}
reg2 = lm(y ~ fnMyFun(x), data = data1)
What you've got certainly works. One problem is that different modelling functions handle formulas in different ways. I think that as long as you return something that model.matrix can make sense of, you'll be fine. That would mean
The function is vectorised; i.e., given a vector of length N, it returns a result also of length N.
It returns an atomic vector or matrix (not a list, and not of type raw).
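Both conditions can be checked quickly with simulated data. In this sketch the data are made up and fnBasis is a hypothetical second function that returns a matrix:

```r
set.seed(42)
data1 <- data.frame(x = runif(30))
data1$y <- 1 + 2 * data1$x^2 + rnorm(30, sd = 0.1)

# Vectorised: length-N input, length-N output
fnMyFun <- function(x) x^2
reg2 <- lm(y ~ fnMyFun(x), data = data1)

# Returning a matrix also works; each column gets its own coefficient
fnBasis <- function(x) cbind(lin = x, sq = x^2)
reg3 <- lm(y ~ fnBasis(x), data = data1)

names(coef(reg2))
names(coef(reg3))
```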
I want to run a multiple comparisons analysis for the different variables of a model. My idea is as follows:
library(multcomp)
set.seed(123)
x1 <- gl(4,10)
x2 <- gl(5,2,40)
y <- rnorm(40)
fm1 <- lm(y ~ x1 + x2)
for(var in c('x1', 'x2'))
{
mc1 <- glht(fm1, linfct=mcp(var='Tukey'))
print(summary(mc1))
}
When I run, I get the following error:
Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘var’ have been specified in ‘linfct’ but cannot be found in ‘model’!
That is, it is not possible to use a character string to specify the argument name passed to mcp.
Does anyone know a solution?
It's generally better to avoid working with strings representing code wherever possible - it prevents errors that are hard to debug, and aesthetically is much more elegant. This problem turns out to be fairly easy to solve if you use do.call and the setNames function:
var <- "x1"
cmp <- do.call(mcp, setNames(list("Tukey"), var))
glht(fm1, linfct = cmp)
You can't use substitute here because it does not allow you to modify the names of function parameters. I have some intuition for why this is reasonable, but not enough to explain it :/
If you're a package author, it's a good idea to provide an alternative version of functions that use unusual syntax so they can be accessed programmatically without jumping through hoops.
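The mechanism can be seen without multcomp at all. In this sketch, show_args is a hypothetical stand-in that simply reports the argument names it receives:

```r
# Hypothetical stand-in for a function that cares about argument names
show_args <- function(...) names(list(...))

var <- "x1"

# Writing var = ... uses the literal name "var"
direct <- show_args(var = "Tukey")

# setNames() builds a list named by the *value* of var;
# do.call() turns those list names into argument names
via_do_call <- do.call(show_args, setNames(list("Tukey"), var))

direct
via_do_call
```

The first call sees an argument called var, the second an argument called x1, which is what mcp needs.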
(Update: Make sure to see Hadley's answer for the better way of doing this, without resorting to string-pasting. My answer will still be useful for explaining why that is harder-than-usual in this case.)
The peculiarities of mcp() require you to use the relatively brute force approach of pasting together the expression you'd like to evaluate and then passing it through eval(parse()).
The tricky bit is that mcp() interprets its first argument in a nonstandard way. Within mcp(), x1 = 'Tukey' does not (as it normally would) mean "assign a value of 'Tukey' to the argument x1". Instead, the whole thing is interpreted as a symbolic description of the intended contrasts. (In this, it is much like more familiar formula objects such as the y ~ x1 + x2 in your lm() call).
for(var in c('x1', 'x2')) {
# Construct a character string with the expression you'd type at the command
# line. For example : "mcp(x1 = 'Tukey')"
exprString <- paste("mcp(", var, "='Tukey')")
# eval(parse()) it to get an 'mcp' object.
LINFCT <- eval(parse(text = exprString))
mc1 <- glht(fm1, linfct = LINFCT)
print(summary(mc1))
}
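To make the symbolic interpretation a bit more concrete, here is a sketch that only builds the call and never evaluates it, so multcomp is not needed. The pasted string parses to a call whose argument is named after the value of var, and the same call can be assembled from parts without any string pasting:

```r
var <- "x1"

# The call obtained by parsing the pasted string
from_string <- parse(text = paste0("mcp(", var, " = 'Tukey')"))[[1]]

# The same call assembled from a function name and a named argument list
from_parts <- as.call(c(list(as.name("mcp")), setNames(list("Tukey"), var)))

names(from_string)
deparse(from_parts)
```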
Have you tried eval(parse(text='variable')), or assign?