GLM and GAM modelling in RStudio

I need to create a probit model without the intercept. So, how can I remove the intercept from a probit model in R?

You don't say how you are intending to fit the probit model, but if it uses R's formula notation to describe the model then you can supply either + 0 or - 1 as part of the formula to suppress the intercept:
mod <- foo(y ~ 0 + x1 + x2, data = bar)
or
mod <- foo(y ~ x1 + x2 - 1, data = bar)
(Both are pseudo-R code, of course; substitute your own modelling function, data, and variable names.)
If the model is fitted by glm(), then something like:
mod <- glm(y ~ x1 + x2 - 1, data = bar, family = binomial(link = "probit"))
should do it (again substituting your data and variable names as appropriate).

Also, if you have an existing formula object, foo, you can remove the intercept with update() like this:
foo <- y ~ x1 + x2
bar <- update(foo, ~ . -1)
# bar == y ~ x1 + x2 - 1
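Putting it together, here is a minimal end-to-end sketch with simulated data (all variable names here are hypothetical, not from the question):
# Simulate two predictors and a binary response whose true model has no intercept
set.seed(1)
bar <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
bar$y <- rbinom(100, 1, pnorm(0.5 * bar$x1 - 0.3 * bar$x2))
mod <- glm(y ~ x1 + x2 - 1, data = bar, family = binomial(link = "probit"))
coef(mod)  # note: no "(Intercept)" term in the output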

Related

Use linear model for propensity score matching (R MatchIt)

I want to use the propensity score fitted from a linear model to match observations using the MatchIt library.
For example, suppose df is a dataframe. If I were to balance its treatment column in terms of x1, x2 and x3 using the propensity score from a logit, I would set distance = 'glm' and link = 'logit':
m <- matchit(formula = treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = 'glm',
             link = 'logit')
How can I do the same with a linear model instead of a logit?
As per the documentation:
When link is prepended by "linear.", the linear predictor is used instead of the predicted probabilities.
Hence, I tried:
m <- matchit(formula = treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = 'glm',
             link = 'linear.logit')
I'm afraid that doing this (link = 'linear.logit') would still use the score from the log-odds of the logit model, not from a linear model.
Is there any way I can just use a linear model instead of a generalized linear model?
You cannot do this from within matchit() (and in general you shouldn't, which is why it is not allowed), but you can always estimate the propensity scores outside matchit() however you want and then supply them to matchit(). To use a linear probability model, you would run the following:
fit <- lm(treatment ~ x1 + x2 + x3, data = df)
ps <- fitted(fit)  # fitted values from the linear probability model
m <- matchit(treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = ps)
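One caveat that is not in the original answer: a linear probability model can produce fitted values below 0 or above 1, so it may be worth inspecting the scores before matching on them:
range(ps)  # values outside [0, 1] mean the linear model is extrapolating beyond probabilities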

Running all Combinations of Dummy Variables Through a Regression Equation

I have an issue that concerns extracting output from a regression for all possible combinations of dummy variables while keeping the continuous predictor variables fixed.
The problem is that my model contains over 100 combinations of interactions and manually calculating all of these will be quite tedious. Is there an efficient method for iteratively calculating output?
The only way I can think of is to write a loop that generates all desired combinations to subsequently feed into the predict() function.
Some context:
I am trying to identify the regional differences of automobile resale prices by the model of car.
My model looks something like this:
lm(price ~ age + mileage + region_dummy_1 + ... + region_dummy_n + model_dummy_1 + ... + model_dummy_n + region_dummy_1 * model_dummy_1 + ... + region_dummy_1 * model_dummy_n, data = data)
My question is:
How do I produce a table of predicted prices for every model/region combination?
Use .*.
lm(price ~ .*.)
Here's a small reproducible example:
> df <- data.frame(y = rnorm(100,0,1),
+ x1 = rnorm(100,0,1),
+ x2 = rnorm(100,0,1),
+ x3 = rnorm(100,0,1))
>
> lm(y ~ .*., data = df)
Call:
lm(formula = y ~ . * ., data = df)
Coefficients:
(Intercept)           x1           x2           x3        x1:x2        x1:x3
   -0.02036      0.08147      0.02354     -0.03055      0.05752     -0.02399
      x2:x3
    0.24065
How does it work?
. is shorthand for "all predictors", and * includes the two-way interaction term.
For example, consider a data frame with 3 columns: Y (the dependent variable) and 2 predictors (X1 and X2). The syntax lm(Y ~ X1*X2) is shorthand for lm(Y ~ X1 + X2 + X1:X2), where X1:X2 is the interaction term.
Extending this simple case, imagine we have a data frame with 3 predictors, X1, X2, and X3. lm(Y ~ .*.) is equivalent to lm(Y ~ X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3).
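To get the table of predicted prices the question asks for, one option (a sketch with hypothetical names; it assumes region and model are stored as factor columns rather than hand-built dummies, and that the fitted lm object is stored in fit) is to build every region/model combination with expand.grid(), hold the continuous predictors at their means, and call predict():
grid <- expand.grid(region = levels(data$region),
                    model  = levels(data$model))
grid$age     <- mean(data$age)      # hold continuous predictors at their means
grid$mileage <- mean(data$mileage)
grid$price_hat <- predict(fit, newdata = grid)
grid  # one predicted price per region/model combination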

Remove dependent variable from formula for model.matrix

I'm just learning how to deal with model.matrix. For example, to create out-of-sample predictions I extract the formula from my model, say it's a linear model.
The function formula(mymodel) extracts it:
form <- formula(y ~ x1 + x2 * x3)
Now, to create predictions I need a model.matrix without my y. I could type that by hand:
X <- model.matrix(~ x1 + x2 * x3, data=out.of.sample.data)
However, is there a way, using for example update, to get rid of the left-hand side of my formula?
Thanks!
It can be done with update by setting the response variable to NULL:
form <- formula(y ~ x1 + x2 * x3)
newform <- update(form, NULL ~ .)
This is how I usually do this. I'm not aware of a built-in function for this.
df = data.frame(y=rnorm(10), x1=rnorm(10), x2=rnorm(10), x3=rnorm(10))
mymodel = lm(y ~ x1 + x2 + x3, df)
# as.character() on a two-sided formula gives c("~", "y", "<rhs>"); element 3 is the RHS
form_vars_only = formula(paste("~", strsplit(as.character(formula(mymodel)), "~")[[3]]))
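Either response-free formula can then be passed straight to model.matrix(); newdata below stands in for a hypothetical out-of-sample data frame with the same predictor columns:
X <- model.matrix(form_vars_only, data = newdata)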

lme4 random effect structure with dredge

I have constructed an lme4 model for model selection with dredge() (from the MuMIn package), but I am having trouble aligning the random effects with the relevant fixed effects. The structure of my full model is as follows.
fullModel <- glmer(y ~ x1 + x2 +
                     (0 + x1 | Year) + (0 + x1 | Country) +
                     (0 + x2 | Year) + (0 + x2 | Country) +
                     (1 | Year) + (1 | Country),
                   family = binomial('logit'), data = alldata)
With this model structure, model selection in dredge produces three combinations of fixed effects (x1, x2, and x1 + x2), but the random-effect structure stays the same as in the full model, so even when the only fixed effect is x1, the random effects still include (0 + x2 | Year) + (0 + x2 | Country). For example, the model with only x1 as the fixed effect will still have x2 within the random-effects structure, as follows.
y ~ x1 + (0 + x1 | Year) + (0 + x1 | Country) + (0 + x2 | Year) + (0 + x2 | Country) + (1 | Year) + (1 | Country), family = binomial('logit')
Is there a way to configure dredge not to select random effects that have other fixed effects specified in them? I have about 50 such predictors (x1 … x50).
You cannot do that out of the box, as dredge currently omits all (x|g) expressions, but you can write a "wrapper" around (g)lmer that replaces the "|" terms in the formula with something else (e.g. re(x, g)), so that dredge thinks they are fixed effects. Example:
glmerwrap <- function(formula) {
    cl <- origCall <- match.call()
    cl[[1L]] <- as.name("glmer")  # replace 'glmerwrap' with 'glmer'
    # replace "re" with "|" in the formula:
    f <- as.formula(do.call("substitute", list(formula, list(re = as.name("|")))))
    environment(f) <- environment(formula)
    cl$formula <- f
    x <- eval.parent(cl)  # evaluate the modified call
    # store the original call and formula in the result:
    x@call <- origCall
    attr(x@frame, "formula") <- formula
    x
}
formals(glmerwrap) <- formals(lme4::glmer)
Following example(glmer):
# note the use of re(x,group) instead of (x|group)
(fm <- glmerwrap(cbind(incidence, size - incidence) ~ period +
                   re(1, herd) + re(1, obs),
                 family = binomial, data = cbpp))
Now,
dredge(fm)
manipulates both fixed and random effects.
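A brief usage note that is not in the original answer: dredge() comes from the MuMIn package, and it refuses to run unless na.action is set to na.fail (so that all candidate models are fitted to the same rows). A typical session would therefore look something like:
library(lme4)
library(MuMIn)
options(na.action = "na.fail")  # required by dredge() to guard against unequal subsets
dd <- dredge(fm)
head(dd)  # ranked model table over the fixed-effect combinations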

How to add one variable each time into the regression model?

I have a question about how to add one variable at a time to a regression model in order to evaluate the adjusted R-squared.
For example,
lm(y~x1)
next time, I want to do
lm(y~x1+x2)
and then,
lm(y~x1+x2+x3)
I tried paste, but it does not work. For example: lm(y ~ paste("x1", "x2", sep = "+")).
Any idea?
Assuming you fit 3 variables to your linear regression model: x1, x2 and x3
lm.fit1 = lm(y ~ x1 + x2 + x3)
Introducing an additional variable (x4) can be achieved by using the update function:
lm.fit2 = update(lm.fit1, .~. + x4)
You could even introduce an interaction term if required:
lm.fit3 = update(lm.fit2, .~. + x2:x3)
Further details on adding variables to regression models can be obtained here
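As for why the paste() attempt in the question fails: lm() expects a formula object, not a character string, so the string would have to be converted with as.formula() or built with reformulate(). A sketch of the full loop the question describes (assuming a hypothetical data frame df with columns y, x1, x2, x3):
vars <- c("x1", "x2", "x3")
adj_r2 <- sapply(seq_along(vars), function(k) {
    f <- reformulate(vars[1:k], response = "y")  # y ~ x1, then y ~ x1 + x2, ...
    summary(lm(f, data = df))$adj.r.squared
})
names(adj_r2) <- vars  # adjusted R-squared after each variable is added
adj_r2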