Running multiple linear regression in R

How would one run a multiple linear regression in R with > 100 covariates?
Is there a faster way than writing out y ~ x1 + x2 + x3 + ... + x100?

lm(y ~ ., data = YourData)
You can use the . shorthand, which takes all columns in the supplied data other than the response column as covariates.
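For example, a minimal sketch (the data frame and column names here are made up):

set.seed(1)
YourData <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

fit      <- lm(y ~ ., data = YourData)       # y regressed on every other column
fit_drop <- lm(y ~ . - x3, data = YourData)  # the same, but excluding x3
summary(fit)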

Related

Subsetting with dredge function (MuMIn)

I'm trying to subset a series of models dredged from a global model that has both linear and non-linear terms but no interactions, e.g.
Glblm <- Y ~ X1 + X2 + X3 + I(X3^2) + X4 + X5 + X6 + I(X6^2) + X7 + I(X7^2)
I want to specify that X3^2 should never appear without X3, but X3 could appear alone without X3^2 (and the same for X6 & X7).
I have tried the following as I understood from the documentation:
ssm <- dredge(Glblm, subset = (X3 | !I(X3^2)) && (X6 | !I(X6^2)) && (X7 | !I(X7^2)))
I also tried making a subset first, as I read in https://stackoverflow.com/questions/55252019/dredge-in-mumin-r-keeps-models-with-higher-order-terms-without-their-respective, e.g.
hbfsubset <- expression(dc(X3, `I(X3^2)`) & dc(`X6`, `I(X6^2)`) & dc(`X7`, `I(X7^2)`))
ssm <- dredge(Glblm, subset = hbfsubset)
Neither has produced a subset of models; instead, the full list of models is returned when inspecting ssm using:
model.sel(ssm)
Any help would be greatly appreciated.
A reproducible example is needed to pinpoint the issue, including what type of model you are fitting.
In simple linear models (lm, like the examples in the MuMIn manual), the names of the fitted terms are exactly what you typed in the global model, but this may not be the case in more complex models (e.g. glmmTMB).
Here is an example:
library(MuMIn)
library(glmmTMB)
# a simple linear model, using Cement data from MuMIn
m1 <- lm(y ~ X1 + I(X1^2) + X2 + I(X2^2), data = Cement, na.action = "na.fail")
# dredge without a subset
d1 <- dredge(m1)
# 16 models produced
# dredge with a subset
d1_sub <- dredge(m1, subset = dc(`X1`, `I(X1^2)`) & dc(`X2`, `I(X2^2)`))
# 9 models produced, works totally fine
# a glmmTMB linear model
m2 <- glmmTMB(y ~ X1 + I(X1^2) + X2 + I(X2^2), data = Cement, na.action = "na.fail")
# dredge without a subset
d2 <- dredge(m2)
# 16 models produced
# dredge with a subset
d2_sub <- dredge(m2, subset = dc(`X1`, `I(X1^2)`) & dc(`X2`, `I(X2^2)`))
# 16 models produced, subset didn't work and no warning or error produced
# this is because the term names of a glmmTMB object in dredge() are no longer the same as in the typed global model:
names(d2_sub)
# [1] "cond((Int))" "disp((Int))" "cond(X1)" "cond(I(X1^2))" "cond(X2)" "cond(I(X2^2))" "df" "logLik" "AICc"
# [10] "delta" "weight"
# e.g., now the X1 in the typed global model is actually called cond(X1)
# what will work for glmmTMB:
d2_sub <- dredge(m2, subset = dc(`cond(X1)`, `cond(I(X1^2))`) & dc(`cond(X2)`, `cond(I(X2^2))`))
# 9 models produced
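More generally, a quick way to see the exact term names to backquote in the subset expression is to ask MuMIn directly; a sketch reusing m2 from above (assuming your MuMIn version exports the getAllTerms() helper):

# list the term names dredge() will use for this model object;
# for glmmTMB these carry the cond()/disp() prefixes
getAllTerms(m2)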

Running all Combinations of Dummy Variables Through a Regression Equation

I want to extract output from a regression for all possible combinations of dummy variables while keeping the continuous predictor variables fixed.
The problem is that my model contains over 100 interaction combinations, and manually calculating all of these would be quite tedious. Is there an efficient method for iteratively calculating the output?
The only way I can think of is to write a loop that generates all desired combinations to subsequently feed into the predict() function.
Some context:
I am trying to identify the regional differences of automobile resale prices by the model of car.
My model looks something like this:
lm(price ~ age + mileage + region_dummy_1 + ... + region_dummy_n + model_dummy_1 + ... + model_dummy_n + region_dummy_1 * model_dummy_1 + ... + region_dummy_1 * model_dummy_n, data = data)
My question is:
How do I produce a table of predicted prices for every model/region combination?
Use .*.
lm(price ~ .*.)
Here's a small reproducible example:
> df <- data.frame(y = rnorm(100,0,1),
+ x1 = rnorm(100,0,1),
+ x2 = rnorm(100,0,1),
+ x3 = rnorm(100,0,1))
>
> lm(y ~ .*., data = df)
Call:
lm(formula = y ~ . * ., data = df)
Coefficients:
(Intercept)           x1           x2           x3        x1:x2        x1:x3
   -0.02036      0.08147      0.02354     -0.03055      0.05752     -0.02399
      x2:x3
    0.24065
How does it work?
. is shorthand for "all predictors", and * includes the two-way interaction term.
For example, consider a data frame with 3 columns: Y (the response variable) and 2 predictors (X1 and X2). The syntax lm(Y ~ X1*X2) is shorthand for lm(Y ~ X1 + X2 + X1:X2), where X1:X2 is the interaction term.
Extending this simple case, imagine we have a data frame with 3 predictors, X1, X2, and X3. lm(Y ~ .*.) is equivalent to lm(Y ~ X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3).
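To then get the requested table of predicted prices for every model/region combination, one common pattern is expand.grid() plus predict(), holding the continuous predictors fixed at their means. A sketch, assuming hypothetical factor columns region and model and numeric columns age and mileage (R expands the factors and their interaction into the dummy terms written out above):

fit <- lm(price ~ age + mileage + region * model, data = cars_df)

# every region/model combination, continuous predictors held at their means
newdata <- expand.grid(
  region  = levels(cars_df$region),
  model   = levels(cars_df$model),
  age     = mean(cars_df$age),
  mileage = mean(cars_df$mileage)
)
newdata$predicted_price <- predict(fit, newdata = newdata)
newdata   # one row per region/model combination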

Multiple linear regression and ANOVA in R [closed]

I have a table which consists of 20 football teams and 6 variables: X1, X2, X3, X4, X5 and X6.
X1 = % of shots at goal which result in a goal
X2 = goals scored outside the box
X3 = ratio of short to long passes
X4 = Number of ball crosses
X5 = Average number of goals conceded
X6 = Number of yellow cards received
and then I have a Y column which is the number of league points each team has.
How would I perform multiple linear regression and ANOVA on this? I am at a complete loss with R.
Thanks
Multiple Linear Regression
The lm() function in base R does exactly what you want (no need to use glm() if you are only running linear regression):
Reg = lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = mydata)
If Y and the X's are the only columns in your data.frame, you can use this much simpler syntax:
Reg = lm(Y ~ ., data = mydata)
The . means "all other columns".
To see the regression output (as suggested by @Manassa Mauler):
summary(Reg)
Refer to ?lm and ?glm for more information.
ANOVA
If you want to compare nested models with the "intercept-only" model, you can do something like the following:
fit0 = lm(Y ~ 1, data = mydata)
fit1 = update(fit0, . ~ . + X1)
fit2 = update(fit1, . ~ . + X2)
fit3 = update(fit2, . ~ . + X3)
fit4 = update(fit3, . ~ . + X4)
fit5 = update(fit4, . ~ . + X5)
fit6 = update(fit5, . ~ . + X6)
This successively adds one variable at a time to the intercept-only model.
To compare them, use the anova() function:
anova(fit0, fit1, fit2, fit3, fit4, fit5, fit6, test = "F")
Refer to ?anova or ?anova.lm for more information.
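Equivalently, a single anova() call on the full model reproduces the same sequential (Type I) F tests, since terms are tested in the order they appear in the formula:

anova(Reg)   # F test for X1, then X2 given X1, and so on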
Alternatively, just set up the multiple regression using the glm() function and then extract the results using summary():
model <- glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = mydata)
summary(model)

GLM and GAM modelling in RStudio [duplicate]

I need to create a probit model without the intercept. How can I remove the intercept from a probit model in R?
You don't say how you are intending to fit the probit model, but if it uses R's formula notation to describe the model then you can supply either + 0 or - 1 as part of the formula to suppress the intercept:
mod <- foo(y ~ 0 + x1 + x2, data = bar)
or
mod <- foo(y ~ x1 + x2 - 1, data = bar)
(Both are pseudo R code, of course; substitute your own modelling function, data, and variables.)
If this is a model fitting by glm() then something like:
mod <- glm(y ~ x1 + x2 - 1, data = bar, family = binomial(link = "probit"))
should do it (again substituting your data and variable names as appropriate).
Also, if you have an existing formula object, foo, you can remove the intercept with update like this:
foo <- y ~ x1 + x2
bar <- update(foo, ~ . -1)
# bar == y ~ x1 + x2 - 1
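update() works the same way on a fitted model object, refitting it directly; a sketch continuing the pseudo example above:

mod  <- glm(y ~ x1 + x2, data = bar, family = binomial(link = "probit"))
mod0 <- update(mod, . ~ . - 1)   # refit the same model without the intercept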

How to add one variable each time into the regression model?

I have a question about how to add one variable at a time to a regression model in order to evaluate the adjusted R-squared.
For example,
lm(y~x1)
next time, I want to do
lm(y~x1+x2)
and then,
lm(y~x1+x2+x3)
I tried paste(), but it does not work; for example, lm(y ~ paste("x1", "x2", sep = "+")).
Any idea?
Assuming you fit 3 variables in your linear regression model (x1, x2 and x3):
lm.fit1 = lm(y ~ x1 + x2 + x3)
Introducing an additional variable (x4) can be achieved by using the update function:
lm.fit2 = update(lm.fit1, .~. + x4)
You could even introduce an interaction term if required:
lm.fit3 = update(lm.fit2, .~. + x2:x3)
See ?update for further details on adding variables to regression models.
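If you want to build the formulas programmatically instead (which is where the paste() attempt in the question goes wrong, because lm() needs a formula object rather than a character string), reformulate() does the conversion. A sketch, with the data frame and variable names assumed:

predictors <- c("x1", "x2", "x3")
adj_r2 <- sapply(seq_along(predictors), function(i) {
  f <- reformulate(predictors[1:i], response = "y")  # y ~ x1, then y ~ x1 + x2, ...
  summary(lm(f, data = mydata))$adj.r.squared
})
adj_r2   # adjusted R-squared after each added variable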
