We are given the following dataset: [dataset used for linear regression][1]
[1]: https://github.com/Iron-Maiden-19/regression/blob/master/shel2x.csv
We fit this linear regression model (Model A):
modelA <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8,data=shel2x)
which is fine, but then we are given a problem I am unsure how to solve: fit Model B and compare its AIC to that of Model A. Here is Model B:
Y = β0 + β1*X1 + β2*X2 + β3*X2^2 + β4*X4 + β5*X6 + ε
I know the beta values represent coefficients from the first model, but how do I run this regression, and how do I form the equation for it?
In R, you perform a linear regression just the way you already have.
modelA <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8,data=shel2x)
ModelA is a linear model of the form:
Y = beta0 + beta1*X1 + beta2*X2 + beta3*X3 + beta4*X4 + beta5*X5 + beta6*X6 + beta7*X7 + beta8*X8
So, to fit model B, you would just create another linear model in the following manner:
modelB <- lm(Y ~ X1 + X2 + I(X2^2) + X4 + X6, data = shel2x)
Note the I() around the quadratic term: inside a formula, X2^2 on its own is interpreted as just X2, so the wrapper is required.
Then calling:
summary(modelA)
summary(modelB)
should give you the summary output for the two separate linear models. Note, however, that summary() on an lm object does not report AIC; to compare the two models directly, call AIC(modelA, modelB) and prefer the model with the smaller value. AIC penalizes the number of estimated parameters, so it favors the more parsimonious model when the fits are otherwise comparable.
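Since shel2x is not reproduced here, a minimal sketch with simulated stand-in data (same column names, made-up values) shows the comparison; AIC() accepts several fitted models and returns a table:

```r
# Simulated stand-in for shel2x (hypothetical data; the real file has columns Y, X1..X8)
set.seed(1)
shel2x <- data.frame(matrix(rnorm(100 * 9), ncol = 9))
names(shel2x) <- c("Y", paste0("X", 1:8))

modelA <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = shel2x)
# The quadratic term must be wrapped in I() inside a formula
modelB <- lm(Y ~ X1 + X2 + I(X2^2) + X4 + X6, data = shel2x)

AIC(modelA, modelB)  # one row per model; lower AIC is preferred
```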
Suppose I have 6 independent variables, x1, x2, x3, x4, x5, x6, and I want to fit a model with all main effects and all two-way interactions. My syntax is:
lm(y ~ . + .^2, data = d)
x1 is actually an ID variable, so to discard its main effect I can write lm(y ~ . - x1 + .^2, data = d).
But how can I write the syntax to remove the interaction terms involving the ID variable (x1)?
How about this:
lm(y ~ . + (. - x1) ^ 2, data = d)
which leads to the following terms:
(Intercept), x1, x2, x3, x4, x5, x6,
x2:x3, x2:x4, x2:x5, x2:x6, x3:x4, x3:x5, x3:x6, x4:x5, x4:x6, x5:x6
That is, x1 keeps its main effect but appears in no interaction.
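A quick check on simulated data (d is a hypothetical stand-in) confirms that (. - x1)^2 excludes x1 from the interactions while the leading . keeps its main effect:

```r
# Hypothetical data frame with the same column names as the question
set.seed(1)
d <- data.frame(matrix(rnorm(50 * 7), ncol = 7))
names(d) <- c("y", paste0("x", 1:6))

fit <- lm(y ~ . + (. - x1)^2, data = d)
nm <- names(coef(fit))
nm[grepl("x1", nm)]  # only the main effect "x1" survives, no x1 interactions
```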
I have encountered logistic regression code with the format
mlogit(Y ~ 0 | X1 + X2 + ... + Xn)
What is the difference between this and syntax of the form
mlogit(Y ~ X1 + X2 + ... + Xn)
I am familiar with syntax of the form
glm(Y ~ X1 + X2 + ... + Xn, family = "binomial")
as standard R syntax for multiple logistic regression of outcome Y against predictors. I am not familiar with the Y ~ 0 | X1 + ... + Xn form and would appreciate a clarification of the difference.
I have an issue that concerns itself with extracting output from a regression for all possible combinations of dummy variable while keeping the continuous predictor variables fixed.
The problem is that my model contains over 100 combinations of interactions and manually calculating all of these will be quite tedious. Is there an efficient method for iteratively calculating output?
The only way I can think of is to write a loop that generates all desired combinations to subsequently feed into the predict() function.
Some context:
I am trying to identify the regional differences of automobile resale prices by the model of car.
My model looks something like this:
lm(price ~ age + mileage + region_dummy_1 + ... + region_dummy_n + model_dummy_1 + ... + model_dummy_n + region_dummy_1 * model_dummy_1 + ... + region_dummy_1 * model_dummy_n, data = data)
(Note that in lm() the formula comes first; the data frame is passed via the data argument.)
My question is:
How do I produce a table of predicted prices for every model/region combination?
Use .*.
lm(price ~ .*.)
Here's a small reproducible example:
> df <- data.frame(y = rnorm(100,0,1),
+ x1 = rnorm(100,0,1),
+ x2 = rnorm(100,0,1),
+ x3 = rnorm(100,0,1))
>
> lm(y ~ .*., data = df)
Call:
lm(formula = y ~ . * ., data = df)
Coefficients:
(Intercept) x1 x2 x3 x1:x2 x1:x3
-0.02036 0.08147 0.02354 -0.03055 0.05752 -0.02399
x2:x3
0.24065
How does it work?
. is shorthand for "all other columns as predictors", and * crosses terms, expanding to the main effects plus their two-way interaction.
For example, consider a data frame with 3 columns: Y (the dependent variable) and 2 predictors (X1 and X2). The syntax lm(Y ~ X1*X2) is shorthand for lm(Y ~ X1 + X2 + X1:X2), where X1:X2 is the interaction term.
Extending this simple case, imagine we have a data frame with 3 predictors, X1, X2, and X3. lm(Y ~ .*.) is equivalent to lm(Y ~ X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3).
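To get the asker's table of predicted prices for every model/region combination, one sketch (all variable names here are hypothetical stand-ins for the asker's dummies) is to build the grid with expand.grid() and feed it to predict(), holding the continuous predictors at their means:

```r
# Simulated stand-in for the asker's car-resale data
set.seed(1)
cars <- data.frame(
  price   = rnorm(200, 20000, 3000),
  age     = runif(200, 0, 10),
  mileage = runif(200, 0, 150000),
  region  = factor(sample(c("north", "south", "east"), 200, replace = TRUE)),
  model   = factor(sample(c("sedan", "suv"), 200, replace = TRUE))
)
fit <- lm(price ~ age + mileage + region * model, data = cars)

# One row per region/model combination, continuous predictors fixed at their means
grid <- expand.grid(region  = levels(cars$region),
                    model   = levels(cars$model),
                    age     = mean(cars$age),
                    mileage = mean(cars$mileage))
grid$predicted_price <- predict(fit, newdata = grid)
grid  # the requested table of predicted prices
```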
It's a bit of a long question so thanks for bearing with me.
Here's my data
https://www.dropbox.com/s/jo22d68a8vxwg63/data.csv?dl=0
I constructed a mixed effect model
library(lme4)
mod <- lmer(sqrt(y) ~ x1 + I(x1^2) + x2 + I(x2^2) + x3 + I(x3^2) + x4 + I(x4^2) + x5 + I(x5^2) +
x6 + I(x6^2) + x7 + I(x7^2) + x8 + I(x8^2) + (1|loc) + (1|year), data = data)
All the predictors are standardised and I am interested in how y changes with x5 while keeping the other variables at their mean values (equal to 0, since all the variables are standardised).
This is how I do it.
# make all predictors except x5 equal to zero
data$x1<-0
data$x2<-0
data$x3<-0
data$x4<-0
data$x6<-0
data$x7<-0
data$x8<-0
# Use the predict function
library(merTools)
fitted <- predictInterval(merMod = mod, newdata = data, level = 0.95, n.sims = 1000,stat = "median",include.resid.var = TRUE)
Now I want to plot the fitted as a quadratic function of x5. I do this:
i<-order(data$x5)
plot(data$x5[i],fitted$fit[i],type="l")
I expected this to produce a plot of y as a quadratic function of x5, but the resulting plot does not show any quadratic curve. Can anyone tell me what I am doing wrong here?
predictInterval() comes from the merTools package loaded above, but you can do this with predict() alone. The trick is just to make sure you set your random effects to 0. Here's how you can do that:
newdata <- data
newdata[,paste0("x", setdiff(1:8,5))] <- 0
y <- predict(mod, newdata=newdata, re.form=NA)
plot(data$x5, y)
The re.form = NA part drops the random effects, so the predictions come from the fixed effects alone.
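Independent of the mixed-model details, the ordering step matters whenever you draw a fitted curve with lines(); a minimal base-R sketch with simulated data:

```r
# Simulate a quadratic relationship in one predictor
set.seed(1)
x5 <- rnorm(100)
y  <- 1 + 2 * x5 - 3 * x5^2 + rnorm(100)
fit <- lm(y ~ x5 + I(x5^2))

ord <- order(x5)                  # sort by the predictor first,
plot(x5, y)
lines(x5[ord], fitted(fit)[ord])  # otherwise lines() zig-zags between points
```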
Closed 6 years ago. This question needs details or clarity and is not currently accepting answers.
So, I have a table which consists of 20 football teams and 6 variables, the variables are X1, X2, X3, X4, X5 and X6.
X1 = % of shots at goal which result in a goal
X2 = goals scored outside the box
X3 = ratio of short to long passes
X4 = Number of ball crosses
X5 = Average number of goals conceded
X6 = # yellow cards received
and then I have a Y column which is the number of league points each team has.
How would I perform multiple linear regression and ANOVA on this? I am at a complete loss with R.
Thanks
The data is this:
Multiple Linear Regression
The lm() in base R does exactly what you want (no need to use glm if you are only running linear regression):
Reg = lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = mydata)
If Y and the X's are the only columns in your data.frame, you can use this much simpler syntax:
Reg = lm(Y ~ ., data = mydata)
The . means "all other columns".
To see the regression output (as suggested by @Manassa Mauler):
summary(Reg)
Refer to ?lm and ?glm for more information.
ANOVA
If you want to compare nested models with the "intercept-only" model, you can do something like the following:
fit0 = lm(Y ~ 1, data = mydata)
fit1 = update(fit0, . ~ . + X1)
fit2 = update(fit1, . ~ . + X2)
fit3 = update(fit2, . ~ . + X3)
fit4 = update(fit3, . ~ . + X4)
fit5 = update(fit4, . ~ . + X5)
fit6 = update(fit5, . ~ . + X6)
This successively adds an additional variable to the intercept-only model.
To compare them, use the anova() function:
anova(fit0, fit1, fit2, fit3, fit4, fit5, fit6, test = "F")
Refer to ?anova or ?anova.lm for more information.
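On simulated stand-in data (mydata is hypothetical here, with fewer predictors for brevity), the chain of update() calls and the F-test comparison look like this:

```r
# Hypothetical stand-in for the asker's table
set.seed(1)
mydata <- data.frame(matrix(rnorm(20 * 7), ncol = 7))
names(mydata) <- c("Y", paste0("X", 1:6))

fit0 <- lm(Y ~ 1, data = mydata)  # intercept-only model
fit1 <- update(fit0, . ~ . + X1)  # add one predictor at a time
fit2 <- update(fit1, . ~ . + X2)
fit3 <- update(fit2, . ~ . + X3)

anova(fit0, fit1, fit2, fit3, test = "F")  # each row tests the term just added
```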
Alternatively, set up the multiple regression with the glm function and extract the results using summary. With its default family = gaussian, glm fits the same model as lm:
model <- glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = mydata)
summary(model)
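As a sanity check, glm with the default gaussian family reproduces lm's coefficients exactly (simulated data; mydata is a hypothetical stand-in):

```r
# Small simulated data set with the same shape of formula
set.seed(1)
mydata <- data.frame(matrix(rnorm(30 * 3), ncol = 3))
names(mydata) <- c("Y", "X1", "X2")

m_lm  <- lm(Y ~ X1 + X2, data = mydata)
m_glm <- glm(Y ~ X1 + X2, data = mydata)  # family defaults to gaussian

all.equal(coef(m_lm), coef(m_glm))  # TRUE: same fitted coefficients
```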