What is the meaning of Y~0|X1+X2... syntax - r

I have encountered logistic regression code with the format
mlogit(Y ~ 0| X1 + X2 +...Xn)
What is the difference between this and syntax of the form
mlogit(Y ~ X1 + X2 +... Xn)
I am familiar with the use of syntax of the form
glm(Y ~ X1 + X2 + ... Xn, family = 'binomial')
as standard R syntax for multiple logistic regression of outcome Y against predictors. I am not familiar with the syntax of Y ~ 0| X1 + ... Xn and would appreciate a clarification of what the difference is.
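A rough sketch of how the two formulas differ, using only base R to show how the expression parses. As far as I understand the mlogit documentation, the package treats `|` as a separator between formula parts: variables before the bar are alternative-specific (one shared coefficient), variables after it are individual-specific (one coefficient per alternative), and `0` leaves the first part empty. So `Y ~ X1 + X2` would put X1 and X2 in the first part, while `Y ~ 0 | X1 + X2` puts them in the second. The helper `rhs_parts` below is hypothetical; it only splits the right-hand side at the bar and does not fit anything.

```r
# Hypothetical helper: split the right-hand side of a formula at a top-level `|`.
rhs_parts <- function(f) {
  rhs <- f[[length(f)]]
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    list(before_bar = rhs[[2]], after_bar = rhs[[3]])
  } else {
    list(before_bar = rhs, after_bar = NULL)
  }
}

two_part <- rhs_parts(Y ~ 0 | X1 + X2)
one_part <- rhs_parts(Y ~ X1 + X2)

two_part$before_bar  # 0: nothing in the first part
two_part$after_bar   # X1 + X2: everything sits in the second part
one_part$before_bar  # X1 + X2: everything sits in the first part
```

Note that glm() has no such convention, which is why the bar never shows up in ordinary glm(Y ~ X1 + X2, family = 'binomial') calls.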

Related

Linear Regression with coefficients question

We are given the following dataset, [dataset used for linear regression](https://github.com/Iron-Maiden-19/regression/blob/master/shel2x.csv), and we fit this linear regression model, Model A:
modelA <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8,data=shel2x)
That works, but I am unsure how to solve the next part: fit Model B and compare its AIC to that of modelA. Here is Model B:
Y = β0 + β1X1 + β2X2 + β3X2^2 + β4X4 + β5X6 + ε
I know the beta values are coefficients, as in the first model, but how do I run this regression, and how do I write it as a model formula?
In R, you perform a linear regression just the way you already have.
modelA <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8,data=shel2x)
ModelA is a linear model of the form:
Y = beta0 + beta1*X1 + beta2*X2 + beta3*X3 + beta4*X4 + beta5*X5 + beta6*X6 + beta7*X7 + beta8*X8
So, to fit model B, you would just create another linear model in the following manner:
# I() is required: inside a formula, a bare X2^2 is read as just X2
modelB <- lm(Y ~ X1 + X2 + I(X2^2) + X4 + X6, data=shel2x)
Then calling:
summary(modelA)
summary(modelB)
should give you the summary output for the two separate linear models. For an lm fit, summary() does not report the AIC, so call AIC(modelA, modelB) to get both values side by side. Whether modelB has the smaller AIC depends on the data: AIC penalizes every extra parameter, so it favors the more parsimonious model only when the dropped terms improve the fit too little to offset that penalty.
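A minimal sketch of the comparison on simulated data, since I haven't run anything on shel2x itself. The data frame `sim` and its coefficients are made up for illustration; the point is only to show what a squared term needs inside a formula and what AIC() returns.

```r
set.seed(1)
# Simulated stand-in for shel2x (assumption: the real data is not used here).
n <- 200
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X4 = rnorm(n), X6 = rnorm(n))
sim$Y <- 1 + 2 * sim$X1 - sim$X2 + 0.5 * sim$X2^2 + rnorm(n)

# Inside a formula, ^ means "crossing", so X2^2 collapses to X2.
no_square   <- lm(Y ~ X1 + X2 + X2^2 + X4 + X6, data = sim)
with_square <- lm(Y ~ X1 + X2 + I(X2^2) + X4 + X6, data = sim)

names(coef(no_square))    # no squared term
names(coef(with_square))  # includes "I(X2^2)"

# AIC() accepts several fitted models and returns one row per model.
AIC(no_square, with_square)
```

The lower AIC is the preferred model under this criterion; the same AIC(modelA, modelB) call works on the real fits.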

Syntax to remove interaction terms of one variable with all other variables in the dataset

Suppose I have 6 independent variables: x1, x2, x3, x4, x5, x6. I want to fit a model with all main effects and all two-way interactions. My syntax would be:
lm(y~.+.^2, data=d)
x1 is actually an ID variable. So if I want to discard its main effect, I can write lm(y~.+-x1+.^2, data=d)
But how can I write the syntax to remove the interaction terms that involve the ID (the x1 variable)?
How about this:
lm(y ~ . + (. - x1) ^ 2, data = d)
which leads to
(Intercept)
x1
x2
x3
x4
x5
x6
x2:x3
x2:x4
x2:x5
x2:x6
x3:x4
x3:x5
x3:x6
x4:x5
x4:x6
x5:x6
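If you want to check the term list yourself without fitting anything, something like the following should do. The data frame `d` here is toy data with the same column names as in the question; the values are made up.

```r
set.seed(1)
# Toy data with the same column names as in the question (values are made up).
d <- data.frame(y = rnorm(20), x1 = 1:20, x2 = rnorm(20), x3 = rnorm(20),
                x4 = rnorm(20), x5 = rnorm(20), x6 = rnorm(20))

# terms() expands the formula without fitting the model.
term_labels <- attr(terms(y ~ . + (. - x1)^2, data = d), "term.labels")
term_labels

# Interaction terms are the labels that contain a colon.
interactions <- term_labels[grepl(":", term_labels, fixed = TRUE)]
any(grepl("x1", interactions, fixed = TRUE))  # is x1 in any interaction?
```

The main effect of x1 is kept because the leading `.` still includes it; only the part inside the parentheses has x1 removed before squaring.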

Running all Combinations of Dummy Variables Through a Regression Equation

I have an issue with extracting output from a regression for all possible combinations of dummy variables while keeping the continuous predictor variables fixed.
The problem is that my model contains over 100 combinations of interactions and manually calculating all of these will be quite tedious. Is there an efficient method for iteratively calculating output?
The only way I can think of is to write a loop that generates all desired combinations to subsequently feed into the predict() function.
Some context:
I am trying to identify the regional differences of automobile resale prices by the model of car.
My model looks something like this:
lm(price ~ age + mileage + region_dummy_1 + ... + region_dummy_n + model_dummy_1 + ... + model_dummy_n + region_dummy_1 * model_dummy_1 + ... + region_dummy_1 * model_dummy_n, data = data)
My question is:
How do I produce a table of predicted prices for every model/region combination?
Use .*.
lm(price ~ .*., data = data)
Here's a small reproducible example:
> df <- data.frame(y = rnorm(100,0,1),
+ x1 = rnorm(100,0,1),
+ x2 = rnorm(100,0,1),
+ x3 = rnorm(100,0,1))
>
> lm(y ~ .*., data = df)
Call:
lm(formula = y ~ . * ., data = df)
Coefficients:
(Intercept)          x1          x2          x3       x1:x2       x1:x3
   -0.02036     0.08147     0.02354    -0.03055     0.05752    -0.02399
      x2:x3
    0.24065
How does it work?
. is shorthand for "all columns of the data other than the response", and * expands to the main effects plus their interaction.
For example, consider a data frame with 3 columns: Y (the dependent variable) and 2 predictors (X1 and X2). The syntax lm(Y ~ X1*X2) is shorthand for lm(Y ~ X1 + X2 + X1:X2), where X1:X2 is the interaction term.
Extending this simple case, imagine a data frame with 3 predictors, X1, X2, and X3. lm(Y ~ .*.) is equivalent to lm(Y ~ X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3).
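For the table of predicted prices the question asks about, one option (a sketch, not part of the answer above) is to build every region/model combination with expand.grid() and pass it to predict(). This assumes region and model are stored as factor columns rather than hand-built dummies; the data frame `resale` and all of its values are made up.

```r
set.seed(1)
# Made-up data: region and model stored as factors rather than hand-built dummies.
resale <- data.frame(
  age     = runif(300, 1, 10),
  mileage = runif(300, 10, 150),
  region  = factor(sample(c("north", "south", "west"), 300, replace = TRUE)),
  model   = factor(sample(c("A", "B", "C", "D"), 300, replace = TRUE))
)
resale$price <- 30 - 1.5 * resale$age - 0.05 * resale$mileage + rnorm(300)

# region * model expands to both main effects plus their interaction.
fit <- lm(price ~ age + mileage + region * model, data = resale)

# One row per region/model combination, continuous predictors held at their means.
grid <- expand.grid(
  region  = levels(resale$region),
  model   = levels(resale$model),
  age     = mean(resale$age),
  mileage = mean(resale$mileage)
)
grid$predicted_price <- predict(fit, newdata = grid)
grid
```

Holding age and mileage at their means is just one choice; any fixed values can go into the grid instead.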

Remove dependent variable from formula for model.matrix

I'm just learning how to deal with model.matrix. For example, to create out-of-sample predictions I extract the formula from my model, say it's a linear model.
Using the function formula(mymodel) extracts that:
form <- formula(y ~ x1 + x2 * x3)
Now, to create predictions I need a model.matrix without my y. I could type that by hand:
X <- model.matrix(~ x1 + x2 * x3, data=out.of.sample.data)
However, is there a way, for example using update, to get rid of the left-hand side of my formula?
Thanks!
It can be done with update by setting the response variable to NULL:
form <- formula(y ~ x1 + x2 * x3)
newform <- update(form, NULL ~ .)
This is how I usually do this. I'm not aware of a built-in function for this.
df <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
mymodel <- lm(y ~ x1 + x2 + x3, data = df)
# as.character() on a two-sided formula returns c("~", lhs, rhs); keep the rhs
form_vars_only <- formula(paste("~", as.character(formula(mymodel))[3]))
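Another base R option, as a sketch: the stats package has delete.response(), which works on a terms object and drops the left-hand side. The out-of-sample data below is made up and deliberately has no y column.

```r
form <- y ~ x1 + x2 * x3

# delete.response() works on a terms object and drops the left-hand side.
rhs_only <- formula(delete.response(terms(form)))
rhs_only

# Made-up out-of-sample data with no y column.
set.seed(1)
out.of.sample.data <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5))
X <- model.matrix(rhs_only, data = out.of.sample.data)
colnames(X)
```

For a fitted model the same idea is delete.response(terms(mymodel)), which avoids any string manipulation of the formula.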

GLM and GAM modelling in RStudio [duplicate]

I need to create a probit model without the intercept. So, how can I remove the intercept from a probit model in R?
You don't say how you are intending to fit the probit model, but if it uses R's formula notation to describe the model then you can supply either + 0 or - 1 as part of the formula to suppress the intercept:
mod <- foo(y ~ 0 + x1 + x2, data = bar)
or
mod <- foo(y ~ x1 + x2 - 1, data = bar)
(both using pseudo R code of course - substitute your modelling function and data/variables.)
If this is a model fitted by glm(), then something like:
mod <- glm(y ~ x1 + x2 - 1, data = bar, family = binomial(link = "probit"))
should do it (again substituting in your data and variable names as appropriate.)
Also, if you have an existing formula object, foo, you can remove the intercept with update like this:
foo <- y ~ x1 + x2
bar <- update(foo, ~ . -1)
# bar == y ~ x1 + x2 - 1
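A small runnable check of the glm() version, on simulated data standing in for `bar` (the variable names follow the pseudo code above; the values and coefficients are made up):

```r
set.seed(1)
# Simulated data (assumption: stands in for the asker's `bar`).
bar <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
bar$y <- rbinom(200, 1, pnorm(0.8 * bar$x1 - 0.5 * bar$x2))

with_int    <- glm(y ~ x1 + x2,     data = bar, family = binomial(link = "probit"))
without_int <- glm(y ~ x1 + x2 - 1, data = bar, family = binomial(link = "probit"))

names(coef(with_int))     # "(Intercept)" "x1" "x2"
names(coef(without_int))  # "x1" "x2"
```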
