Conditional expression for if variable present in model - r

I want to write a condition that checks whether a certain variable is present in a linear model.
For example, if there is a B in the linear model
model <- lm(Y ~ A + B + C)
I want to do something. I have used the summary function before to refer to R-squared.
summary(model)$r.squared
Probably I am looking for something like this
if (B %in% summary(model)$xxx)
or
if (B %in% summary(model)[xxx])
But I can't find xxx. Please help =)

Try this:
if ("B" %in% all.vars(formula(model))) ...

Another way:
if ("B" %in% names(coef(model)))

Yet another way:
if ("B" %in% variable.names(model)) ...

One option is to grab the model terms from the fitted model and interrogate the term.labels attribute. Using some dummy data:
set.seed(1)
DF <- data.frame(Y = rnorm(100), A = rnorm(100), B = rnorm(100), C = rnorm(100))
model <- lm(Y ~ A + B + C, data = DF)
The terms object contains the labels in an attribute:
> attr(terms(model), "term.labels")
[1] "A" "B" "C"
So check if "B" is in that set of labels:
> if("B" %in% attr(terms(model), "term.labels")) {
+ summary(model)$r.squared
+ }
[1] 0.003134009

A (somewhat inelegant) possible solution would be:
length(grep("\\bB\\b",formula(model))) > 0
where \\b matches the word boundary and B is the variable name you're looking for.
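The checks suggested above are not interchangeable. As a rough sketch (the data frame `DF2` and the model `model2` are made up for illustration), a factor predictor shows the difference: coefficient names carry the factor levels, while `all.vars` also returns the response.

```r
set.seed(1)
# Hypothetical data: B is a factor, so its coefficients are named "Bb" and "Bc"
DF2 <- data.frame(Y = rnorm(30), A = rnorm(30),
                  B = factor(rep(c("a", "b", "c"), 10)))
model2 <- lm(Y ~ A + B, data = DF2)

in_terms <- "B" %in% attr(terms(model2), "term.labels")  # B is a model term
in_vars  <- "B" %in% all.vars(formula(model2))           # also matches the response "Y"
in_coefs <- "B" %in% names(coef(model2))                 # misses B: names are "Bb", "Bc"
```

If the question is "was B used as a predictor?", the term.labels check is probably the closest match.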

Related

How to paste formula into model.matrix function in R?

By way of simplified example, say you have the following data:
n <- 10
df <- data.frame(x1 = rnorm(n, 3, 1), x2 = rnorm(n, 0, 1))
And you wish to create a model matrix of the following form:
model.matrix(~ df$x1 + df$x2)
or more preferably:
model.matrix(~ x1 + x2, data = df)
but instead by pasting the formula into model.matrix. I have experimented with the following but encounter errors with all of them:
form1 <- "df$x1 + df$x2"
model.matrix(~ as.formula(form1))
model.matrix(~ eval(parse(text = form1)))
model.matrix(~ paste(form1))
model.matrix(~ form1)
I've also tried the same with the more preferable structure:
form2 <- "x1 + x2, data = df"
Is there a direct solution to this problem? Or is the model.matrix function not conducive to this approach?
Do you mean something like this?
expr <- "~ x1 + x2"
model.matrix(as.formula(expr), df)
You need to give df as the data argument outside of as.formula, as the data argument defines the environment within which to evaluate the formula.
If you don't want to specify the data argument you can do
model.matrix(as.formula("~ df$x1 + df$x2"))
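A minimal sketch of the same idea when the formula text is assembled from column names rather than typed by hand. The names `rhs`, `form`, and `mm` are placeholders, not part of the original answer.

```r
set.seed(1)
df <- data.frame(x1 = rnorm(10, 3, 1), x2 = rnorm(10, 0, 1))

# Build the right-hand side from the column names, then convert once
rhs  <- paste(names(df), collapse = " + ")   # "x1 + x2"
form <- as.formula(paste("~", rhs))

# The data frame is passed separately, not pasted into the formula string
mm <- model.matrix(form, data = df)
colnames(mm)
```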

How to remove intercept from formula

I have a formula which I would like to use to create a model matrix, but for my use I need to stop the user from adding an intercept as this will be taken care of at a later stage in the regression. How can I remove the intercept from the formula and is there a better option than update?
You can do this a few ways. The first option specified below is probably the best way of going about this.
# Create dataset and form for example
dta <- data.frame(y = rnorm(3), x = rnorm(3), z = rnorm(3))
form <- y ~ x + z
# Default model matrix (intercept included)
(X <- model.matrix(form, dta))
# Option 1 (my default option)
tf <- terms(form)
attr(tf, "intercept") <- 0
model.matrix(tf, dta)
# Option 2
X[, !colnames(X) %in% "(Intercept)"]
# Option 3
form2 <- update(form, . ~ . - 1)
model.matrix(form2, dta)
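One caveat worth checking before picking an option: with a factor predictor, the options do not produce the same matrix. The sketch below uses a made-up data frame `dta2` with a three-level factor `g`. Dropping the intercept column afterwards keeps the contrast coding, while removing the intercept in the formula gives the factor one column per level.

```r
set.seed(1)
# Hypothetical data with a three-level factor g
dta2 <- data.frame(y = rnorm(6), x = rnorm(6),
                   g = factor(rep(c("a", "b", "c"), 2)))
form_g <- y ~ x + g

# Dropping the intercept column afterwards keeps treatment contrasts for g
X_full    <- model.matrix(form_g, dta2)
X_dropped <- X_full[, colnames(X_full) != "(Intercept)", drop = FALSE]

# Removing the intercept in the formula gives g one column per level
X_noint <- model.matrix(update(form_g, . ~ . - 1), dta2)

colnames(X_dropped)
colnames(X_noint)
```

Which behaviour is wanted depends on how the intercept is handled at the later regression stage.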

How to write the program using "lm" command?

I tried to predict the t121 column using the "lm" command like this:
Model<-lm(t121 ~ t1 + t2 + ..... +t120, mydata)
My data has more than 100 dependent variables, so it is difficult to predict each column with a separate "lm" call. That is why I want to write a loop over my data, like the one I have written below:
for(j in 120:179){
model[[j+1]]<-lm(t[j+1] ~ add1(t1:t[j]),mydata)
}
In place of add1 I also tried add.bigq and sum, but none of these three commands is correct. Which command is suitable for that place?
From what I understand, you want to write a loop that allows you to use lm with different formulas. The nice thing about lm is that it can take objects of the class formula as its first argument. Let's see how that works.
# Create a data set
df <- data.frame(col1=(1:10+rnorm(10)), col2 = 1:10, col3 = rnorm(10), col4 = rnorm(10))
If we want to run lm on col1 as the dependent and col2 as the independent variable, then we can do this:
model_a <- lm(col1 ~ col2, data = df)
form_b <- as.formula("col1 ~ col2")
model_b <- lm(form_b, data = df)
all.equal(model_a,model_b)
# [1] "Component “call”: target, current do not match when deparsed"
So the only thing that differed between the two models is that the function call was different (in model_b we used form_b, not col1 ~ col2). Other than that, the models are identical.
So now you know how to use the formula class to run lm. You can easily construct formulas with paste, by setting collapse to " + ":
ind_vars <- paste(names(df)[-1],collapse = " + ")
form_lm <- paste(names(df)[1], "~", ind_vars)
form_lm
# [1] "col1 ~ col2 + col3 + col4"
If we want three different models, we can do a couple of things, for example:
lis <- list()
for (i in 2:length(names(df))) {
  ind_vars <- paste(names(df)[2:i], collapse = "+")
  form_lm <- paste(names(df)[1], "~", ind_vars)
  lis[[i - 1]] <- lm(form_lm, data = df)
}
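Applied to the shape of the original question (predict each later column from all earlier ones), the loop might look like the sketch below. The data frame here is a small made-up stand-in for `mydata`, with six columns instead of 180, and `reformulate` is used in place of `paste` to build each formula.

```r
set.seed(1)
# Hypothetical stand-in for mydata: columns t1..t6, 20 rows
mydata <- as.data.frame(matrix(rnorm(120), nrow = 20))
names(mydata) <- paste0("t", 1:6)

models <- list()
for (j in 3:5) {
  # Predict t(j+1) from t1..tj
  f <- reformulate(paste0("t", 1:j), response = paste0("t", j + 1))
  models[[paste0("t", j + 1)]] <- lm(f, data = mydata)
}
names(models)
```

Storing the fits under the response name avoids the gaps that indexing a list by `j + 1` would leave.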

How to construct a big regular formula for a model in R?

I am trying to create a model to predict "y" from data "D" that contains predictors x1 to x100 and 200 other variables. Since the Xs are not stored consecutively, I can't select them by column position.
I can't use ctree(y ~ ., data = D) because of the other variables. Is there a way to refer to them as x1:100 in the model,
instead of writing out a very long call like this?
ctree(y ~ x1 + x2 + x..... x100)
Any recommendation would be appreciated.
Two more. The simplest in my mind is to subset the data:
ctree(y ~ ., data = D[, c("y", paste0("x", 1:100))])
Or a more functional approach to building dynamic formulas:
ctree(reformulate(paste0("x", 1:100), "y"), data = D)
Construct your formula as a text string, and convert it with as.formula.
vars <- names(D)[1:100] # or wherever your desired predictors are
fm <- paste("y ~", paste(vars, collapse="+"))
fm <- as.formula(fm)
ctree(fm, data=D, ...)
You can use this:
fmla <- as.formula(paste("y", paste0("x", 1:100, collapse = " + "), sep = " ~ "))
ctree(fmla, data = D)
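Since the question notes that the x columns are not stored consecutively, selecting them by name pattern may be safer than by position. A small sketch under that assumption follows. The data frame is made up, with five x columns instead of 100, and `lm` stands in for `ctree` only so that the example needs no extra package.

```r
set.seed(1)
# Hypothetical data: x1..x5 interleaved with unrelated columns
D <- data.frame(y = rnorm(20), x1 = rnorm(20), other1 = rnorm(20),
                x2 = rnorm(20), x3 = rnorm(20), other2 = rnorm(20),
                x4 = rnorm(20), x5 = rnorm(20))

# Select predictors by name pattern, not by position
xs  <- grep("^x[0-9]+$", names(D), value = TRUE)
fml <- reformulate(xs, response = "y")

fit <- lm(fml, data = D)   # lm used only to keep the sketch dependency-free
```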

Is there a better alternative than string manipulation to programmatically build formulas?

Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside and I'm jealous.
I'm writing a function that fits multiple models. Parts of the formulas for these models remain the same and part change from one model to the next. The clumsy way would be to have the user input the formula parts as character strings, do some character manipulation on them, and then use as.formula.
But before I go that route, I just want to make sure that I'm not overlooking some cleaner way of doing it that would allow the function to accept formulas in the standard R format (e.g. extracted from other formula-using objects).
I want something like...
> LHS <- y~1; RHS <- ~a+b; c(LHS,RHS);
y ~ a + b
> RHS2 <- ~c;
> c(LHS, RHS, RHS2);
y ~ a + b + c
or...
> LHS + RHS;
y ~ a + b
> LHS + RHS + RHS2;
y ~ a + b + c
...but unfortunately neither syntax works. Does anybody know if there is something that does? Thanks.
reformulate will do what you want.
reformulate(termlabels = c('x','z'), response = 'y')
## y ~ x + z
Or without an intercept
reformulate(termlabels = c('x','z'), response = 'y', intercept = FALSE)
## y ~ x + z - 1
Note that you cannot construct formulae with multiple responses such as x+y ~z+b
reformulate(termlabels = c('x','y'), response = c('z','b'))
z ~ x + y
To extract the terms from an existing formula (given your example)
attr(terms(RHS), 'term.labels')
## [1] "a" "b"
To get the response is slightly different, a simple approach (for a single variable response).
as.character(LHS)[2]
## [1] "y"
combine_formula <- function(LHS, RHS){
.terms <- lapply(RHS, terms)
new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
response <- as.character(LHS)[2]
reformulate(new_terms, response)
}
combine_formula(LHS, list(RHS, RHS2))
## y ~ a + b + c
## <environment: 0x577fb908>
I think it would be more sensible to specify the response as a character vector, something like
combine_formula2 <- function(response, RHS, intercept = TRUE){
  .terms <- lapply(RHS, terms)
  new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
  # response is already a character string, so it is passed on unchanged
  reformulate(new_terms, response, intercept)
}
combine_formula2('y', list(RHS, RHS2))
You could also define a + operator that works with formulae (by defining a new method for formula objects):
`+.formula` <- function(e1,e2){
.terms <- lapply(c(e1,e2), terms)
reformulate(unique(unlist(lapply(.terms, attr, which = 'term.labels'))))
}
RHS + RHS2
## ~a + b + c
You can also use update.formula using . judiciously
update(~a+b, y ~ .)
## y ~ a + b
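Building on update, here is one possible way to get the behaviour asked for in the question (combining LHS, RHS, and RHS2) without any string manipulation. The helper name `add_rhs` is made up for this sketch. It pulls the right-hand-side expression out of a formula and splices it into an update template with `bquote`.

```r
LHS  <- y ~ 1
RHS  <- ~ a + b
RHS2 <- ~ c

# Hypothetical helper: append the right-hand side of `extra` to `base`
add_rhs <- function(base, extra) {
  rhs <- extra[[length(extra)]]               # last element is the RHS expression
  update(base, eval(bquote(. ~ . + .(rhs))))  # template of the form: . ~ . + <rhs>
}

# Fold any number of right-hand sides onto the starting formula
combined <- Reduce(add_rhs, list(RHS, RHS2), LHS)
attr(terms(combined), "term.labels")
```

Because the pieces stay language objects throughout, terms such as interactions or function calls should pass through without quoting issues, though this has not been tested beyond the simple case shown.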

Resources