How is formula specified in R's lm function - r

I have a function in R, lets say
myfunction <- function(formula,data)
Among other things, the function contains a call to lm(). Formula should include the covariates, and should be specified as
formula = x1 + x2 + ... + x_n
Data contains columns Z and W, where the response
y=data$Z/data$W
I only want to have formula including the covariates, since the function modifies the response variable for each iteration.
The call for lm() should then work with
lm(y~formula,data=data)

why would you do that? It is cleaner to pass the whole formula in myfunction, for instance:
myfunction <- function(formula,data) {
data = data*2 # this is an example of data manipulation
lm(formula=formula, data=data)
}
then use myfunction as you would use lm
If you REALLY want to create complexity (for nothing?), you can also use the fact that lm will coerce whatever string you pass as an argument into a proper formula object
myfunction2 <- function(formula2,data) {
data = data*2 # this is an example of data manipulation
lm(formula=paste0("y~",formula2), data=data)
}

Related

lme within a user defined function in r

I need to use mixed model lme function many times in my code. But I do not know how to use it within a function. If used otherwise, the lme function works just well but when used within the function, it throws errors:
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = x ~1 , random = ~ 1|y/z,
data=cc,
method="REML")
}
on calling this function:
myfunc (dbcon2, birthweight, sire, dam)
I get the error :
Error in model.frame.default(formula = ~x + y + z, data = list(animal
= c("29601/9C1", : invalid type (list) for variable 'x'
I think, there is a different procedure for using this which I am unaware of. Any help would be greatly appreciated.
Thanks in advance
Not sure if you are looking for this, you may try to use this, as correctly pointed out by #akrun, you may use paste, I am using paste0 however(its a special case of paste), paste concatenates two strings:
Here the idea is to concatenate the variable names with the formula, but since paste converts it to a string hence you can't refer that as formula to build a model,so you need to convert that string to a formula using as.formula which is wrapped around paste0 statement.
To understand above, Try writing a formula like below using paste:
formula <-paste0("mpg~", paste0("hp","+", "am"))
print(formula)
[1] "mpg~hp+am"
class(formula)
[1] "character" ##This should ideally be a formula rather than character
formula <- as.formula(formula) ##conversion of character string to formula
class(formula)
[1] "formula"
To work inside a model, you would always require a formula object, also please also try to learn about collapse and sep option in paste they are very handy.
I don't have your data , hence I have used mtcars data to represent the same.
library("nlme")
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = as.formula(paste0(x," ~1")) , random = as.formula(paste0("~", "1|",y,"/",z)),
data=cc,
method="REML")
}
models <- myfunc(cc=mtcars, x="hp", y="mpg", z="am")
summary(models)
You can read more about paste by typing ?paste in your console.

Pass df column names to nested equation in Graph Printing Function

I need some clarification on the primary post on Passing a data.frame column name to a function
I need to create a function that will take a testSet, trainSet, and colName(aka predictor) as inputs to a function that prints a plot of the dataset with a GAM model trend line.
The issue I run into is:
plot.model = function(predictor, train, test) {
mod = gam(Response ~ s(train[[predictor]], spar = 1), data = train)
...
}
#Function Call
plot.model("Predictor1", 1.0, crime.train, crime.test)
I can't simply pass the predictor as a string into the gam function, but I also can't use a string to index the data frame values as shown in the link above. Somehow, I need to pass the colName key to the game function. This issue occurs in other similar scenarios regarding plotting.
plot <- ggplot(data = test, mapping = aes(x=predictor, y=ViolentCrimesPerPop))
Again, I can't pass a string value for the column name and I can't pass the column values either.
Does anyone have a generic solution for these situations. I apologize if the answer is buried in the above link, but it's not clear to me if it is.
Note: A working gam function call looks like this:
mod = gam(Response ~ s(Predictor1, spar = 1.0), data = train)
Where the train set is a data frame with column names "Response" & "Predictor".
Use aes_string instead of aes when you pass a column name as string.
plot <- ggplot(data = test, mapping = aes_string(x=predictor, y=ViolentCrimesPerPop))
For gam function:: Example which is copied from gam function's documentation. I have used vector, scalar is even easier. Its just using paste with a collapse parameter.
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
# String manipulate for formula
formula <- as.formula(paste("y~s(", paste(colnames(dat)[2:5], collapse = ")+s("), ")", sep =""))
b <- gam(formula, data=dat)
is same as
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)

Inner functions pulls call from outer function and causes error

I'm using a function from the library leaps within another function. The last two rows of the leaps function in question goes:
rval$call <- sys.call(sys.parent())
rval
This apparently causes the call to the outer function to be passed to rval$call. And the actual call to the regsubsets function is needed as an argument later on.
Below an example to illustrate:
library(leaps)
#Create some sample data to perform a regression on
inda <- rnorm(100)
indb <- rnorm(100)
dep <- 2 + 0.1*inda + 0.2*indb + rnorm(100, sd = 0.3)
dfk <- data.frame(dep=dep, inda = inda, indb = indb)
#Create some arbitrary outer function
test <- function(dependent, data){
best.fit <- regsubsets(as.formula(paste0(dependent, " ~ .")), data = data, nvmax = 2)
return(best.fit)
}
#Call outer function
best <- test("dep", dfk)
best$call #Returns "test("dep", dfk)"
So best$call will contain the call to the outer function (test), and not the call to the inner (regsubsets) function. As it's not really an option to change the inner function, is there any way of avoiding this problem?
EDIT:
One way around the problem could be something like this:
test <- function(dependent, data){
thecall <- 'regsubsets(as.formula(paste0(dependent, " ~ .")), data = data, nvmax = 2)'
best.fit <- eval(parse(text = thecall))
#best.fit$call <- [some transformation of thecall
return(best.fit)
}
EDIT2:
The reason I need to access what's inside $call is that it's needed in a predict function that I copied from Introduction to statitical learning:
predict.regsubsets <- function(regsubset_model, newdata, id, ...){
form <- as.formula(regsubset_model$call[[2]])
mat <- model.matrix(form, newdata)
coefi <- coef(regsubset_model, id = id)
xvars <- names(coefi)
mat[, xvars] %*% coefi
}
In the second line it uses $call
I’m still not entirely clear on how this is going to be used but in the case of your test function, you could write the following code:
test = function (dependent, data) {
regsubsets_call = bquote(regsubsets(.(as.formula(paste0(dependent, " ~ ."))),
data = .(substitute(data)), nvmax = 2))
best_fit = eval(regsubsets_call)
best_fit$call = regsubsets_call
best_fit
}
However, the result may not work with downstream functions the package provides (though, realistically, it probably will; I’m guessing summary.regsubsets only uses it to print the call).
What’s going on here?
bquote constructs an unevaluated R expression; it’s similar to quote but it allows you to interpolate values (similar to substitute). substitute(data) means that, rather than putting the actual data.frame into the call (which would lead to a very unwieldy output, it puts the variable name (or expression) the user passed to test. So if the user called it as test('mpg', mtcars), then the resulting expression would be
regsubsets(mpg ~ ., data = mtcars, nvmax = 2)
The resulting call object is then (a) evaluated via eval, and (b) stored in the resulting $call.
Incidentally, the formula can (and, as far as I’m concerned, should) be constructed in the same way; no need to parse a string:
as.formula(bquote(.(as.name(dependent)) ~ .))
Taken together, the whole expression would then become:
formula = as.formula(bquote(.(as.name(dependent)) ~ .))
regsubsets_call = bquote(regsubsets(.(formula), data = .(substitute(data)), nvmax = 2))

Working with dataframe variable names passed as arguments to the R function

I want to fit a statistical model (e.g. a linear one) for arbitrary variables from the dataframe passed as arguments to a function.
E.g.
myLm = function(x,y) {
fit = lm(y~x, data=mtcars)}
myLm("cyl","mpg")
But this does not work.
How do I do that correctly?
You have to pass a formula object to lm function. But you give characters in input of myLM.
You can change your function in this way
myLm = function(x,y) {
fit = lm(formula(paste(x,"~",y)), data=mtcars)
}
myLm("cyl","mpg")
You still give characters input but they are converted inside your function myLM
Why don't you pass a formula object directly:
myLm <- function(f)
lm(f, data=mtcars)
myLm(mpg ~ cyl)

User defined functions as formula input

Built-in functions in R can be used in formula objects, for example
reg1 = lm(y ~ log(x), data = data1)
How can I write my functions such that they can be used in formula objects?
fnMyFun = function(x) {
return(x^2)
}
reg2 = lm(y ~ fnMyFun(x), data = data1)
What you've got certainly works. One problem is that different modelling functions handle formulas in different ways. I think that as long as you return something that model.matrix can make sense of, you'll be fine. That would mean
The function is vectorised; ie given a vector of length N, it returns a result also of length N
It has to return an atomic vector or matrix (but not a list, or of type raw)

Resources