I'm experimenting with interaction terms in model formulas. I wondered whether it's possible to fit a regression with an interaction that involves only one of the two dummy variables of a factor. This works in regular linear regression using the lm() function, but the same formula fails with the ols() function in the rms package. Does anyone know why?
Here's my example
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:gear, data=mtcars)
summary(regular_lm)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data=mtcars)
summary(regular_lm)
And now the rms example
library(rms)
dd <- datadist(mtcars)
options(datadist = "dd")
regular_ols <- ols(mpg ~ wt + cyl + gear + cyl:gear, data=mtcars)
regular_ols
# The next call fails with:
# Error in if (!length(fname) || !any(fname == zname)) { :
# missing value where TRUE/FALSE needed
regular_ols <- ols(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data=mtcars)
This experiment might not be the wisest thing to do statistically, since the estimates seem to change substantially, but I'm curious as to why ols() fails, since it is supposed to use the "same fitting routines used by lm".
I don't know exactly, but it has to do with the way the formula is evaluated rather than with the way the fit is done once the model has been translated. Using traceback() shows that the problem occurs within Design(eval.parent(m)); using options(error=recover) gets you to the point where you can see that
Browse[1]> fname
[1] "wt" "cyl" "gear"
Browse[1]> zname
[1] NA
in other words, zname is some internal variable that hasn't been set right because the Design function can't quite handle defining the interaction between cylinders and the (gear==4) dummy on the fly.
This works though:
mtcars$cylgr <- with(mtcars,interaction(cyl,gear == "4"))
regular_ols <- ols(mpg ~ wt + cyl + gear + cylgr, data=mtcars)
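A related way to avoid building the dummy inside the formula is to precompute the product column yourself. The sketch below uses only lm(), so that it runs without rms; the names d, gear4 and cyl_gear4 are made up for illustration, and whether ols() accepts the precomputed column in the same way is an assumption that has not been checked here. Unlike interaction(cyl, gear == "4"), which creates a multi-level factor, this keeps cyl numeric, so it corresponds to the cyl:I(gear == "4") term from the question.

```r
# Work on a copy so the built-in data set is left untouched
d <- mtcars
d$gear <- factor(d$gear)

# Hypothetical helper columns: a 0/1 dummy for gear == 4 and its product with cyl
d$gear4 <- as.numeric(d$gear == "4")
d$cyl_gear4 <- d$cyl * d$gear4

# Formula with the interaction defined on the fly (as in the question)
fit_inline <- lm(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data = d)

# Same model with the interaction supplied as an ordinary column
fit_precomputed <- lm(mpg ~ wt + cyl + gear + cyl_gear4, data = d)

# Both fits span the same column space, so the fitted values agree
all.equal(unname(fitted(fit_inline)), unname(fitted(fit_precomputed)))
```

If the two fits agree, the precomputed column can be passed to any modelling function that only needs plain variable names in its formula.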
Related
I'm having a hard time getting a fixest object to play nicely with ggeffects in R, when fixed effects are included.
When I run the following code:
library(fixest)     # feols()
library(ggeffects)  # ggeffect(), ggpredict()
m <- feols(mpg ~ disp + gear + hp | cyl, mtcars,
           cluster = c("am", "cyl"))
summary(m)
marg1 <- ggeffect(m, terms = c("disp"))
I get an error reading:
Can't compute marginal effects, 'effects::Effect()' returned an error.
Reason: non-conformable arguments
You may try 'ggpredict()' or 'ggemmeans()'.
However, there are no problems when I remove the fixed effects term, or include it as an ordinary regressor instead of placing it after the | separator:
m <- feols(mpg ~ disp + gear + hp + cyl, mtcars,
cluster = c("am", "cyl"))
summary(m)
marg1 <- ggeffect(m, terms = c("disp"))
ggpredict also returns an error on my data (Could not compute variance-covariance matrix of predictions. No confidence intervals are returned.) but I am unable to replicate that same error using the toy data.
Using the built-in mtcars data set I've just run the following bit of code:
my_mtcars <- mtcars
my_mtcars$cyl <- as.factor(my_mtcars$cyl)
my_mtcars$gear <- as.factor(my_mtcars$gear)
Fit a simple GLM: mpg ~ cyl + gear
my_glm <- glm(mpg ~ cyl + gear, data = my_mtcars)
summary(my_glm)
I want to offset cyl. If I set the offset equal to the glm parameter estimates for cyl, I should get the same model with respect to gear:
library(dplyr)  # provides case_when()
my_mtcars$cyl_offset <- case_when(
my_mtcars$cyl == 4 ~ 0,
my_mtcars$cyl == 6 ~ -6.656,
my_mtcars$cyl == 8 ~ -10.542
)
Fit same model but use offset instead of normal term
my_glm <- glm(mpg ~ offset(cyl_offset) + gear, data = my_mtcars)
summary(my_glm)
I do indeed get the same parameter estimates (intercept + gear), and the same residual deviance, but smaller standard errors. I wasn't expecting that - should I have been?
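One plausible reading of the smaller standard errors is that the offset treats the cyl effects as known constants: two fewer coefficients are estimated, so the residual degrees of freedom go up and the uncertainty in the cyl estimates is no longer propagated. The sketch below is only an illustration of that idea; it rebuilds the offset from the fitted coefficients rather than from rounded values, and the names full_glm, offset_glm and cyl_effect are introduced here for the example.

```r
my_mtcars <- mtcars
my_mtcars$cyl <- as.factor(my_mtcars$cyl)
my_mtcars$gear <- as.factor(my_mtcars$gear)

# Full model: cyl and gear are both estimated
full_glm <- glm(mpg ~ cyl + gear, data = my_mtcars)

# Offset taken directly from the fitted cyl coefficients (baseline cyl == 4 is 0)
cyl_effect <- c("4" = 0,
                "6" = coef(full_glm)[["cyl6"]],
                "8" = coef(full_glm)[["cyl8"]])
my_mtcars$cyl_offset <- unname(cyl_effect[as.character(my_mtcars$cyl)])

# Offset model: only the intercept and gear are estimated
offset_glm <- glm(mpg ~ offset(cyl_offset) + gear, data = my_mtcars)

# Same residual deviance, but different residual degrees of freedom
c(full = deviance(full_glm), offset = deviance(offset_glm))
c(full = df.residual(full_glm), offset = df.residual(offset_glm))

# Standard errors of the gear coefficients under each model
se_full <- sqrt(diag(vcov(full_glm)))[c("gear4", "gear5")]
se_offset <- sqrt(diag(vcov(offset_glm)))[c("gear4", "gear5")]
rbind(se_full, se_offset)
```

Comparing the two rows of the last table shows how much of the difference comes from treating the cyl effects as fixed.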
I'm trying to: (1) estimate multiple models where only the dependent variable changes, and (2) tabulate the results with the stargazer package.
The following code works, but I have to repeat a line of code for each model:
library(stargazer)
data(mtcars)
reg1 <- lm(mpg ~ am + gear + carb, data=mtcars)
reg2 <- lm(cyl ~ am + gear + carb, data=mtcars)
reg3 <- lm(disp ~ am + gear + carb, data=mtcars)
stargazer(reg1, reg2, reg3,
title="Regression Results", type="text",
df=FALSE, digits=3)
You can see that the (trimmed) output has the correct headings for the dependent variables (mpg, cyl, disp):
Regression Results
==================================================
                     Dependent variable:
               ------------------------------
                 mpg        cyl        disp
                 (1)        (2)        (3)
--------------------------------------------------
am              3.545*     -0.176     -40.223
               (1.897)    (0.615)    (48.081)
If I use lapply and paste, it ends up changing the headings of the dependent variables in stargazer:
dependents <- c('mpg', 'cyl', 'disp')
outs <- lapply(dependents, function(x) {
fit <- lm(paste(x,'~', 'am + gear + carb'), data=mtcars)})
stargazer(outs[[1]], outs[[2]], outs[[3]],
title="Regression Results", type="text",
df=FALSE, digits=3)
gives the output where x is the heading for the dependent variables:
Regression Results
==================================================
                     Dependent variable:
               ------------------------------
                             x
                 (1)        (2)        (3)
--------------------------------------------------
am              3.545*     -0.176     -40.223
               (1.897)    (0.615)    (48.081)
Is there any way for me to fix this? Thank you.
If you create the formula before you run the regression it should work. I just separated the formula creation and the regression.
dependents <- c('mpg', 'cyl', 'disp')
outs <- lapply(dependents, function(x) {
formula <- as.formula(paste(x,'~', 'am + gear + carb'))
fit <- lm(formula, data=mtcars)})
stargazer(outs[[1]], outs[[2]], outs[[3]],
title="Regression Results", type="text",
df=FALSE, digits=3)
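As a variation on the same idea, the formula can also be built with base R's reformulate() instead of paste(). The sketch below only checks that each fitted model carries the intended response variable in its stored formula; it does not call stargazer, so how the table headings come out with this variant is an assumption left for you to verify. The name rhs is made up for the example.

```r
dependents <- c("mpg", "cyl", "disp")
rhs <- c("am", "gear", "carb")  # right-hand-side terms shared by all models

outs <- lapply(dependents, function(x) {
  # Build a real formula object first, then fit
  f <- reformulate(termlabels = rhs, response = x)
  lm(f, data = mtcars)
})

# The response recorded in each fit's formula
responses <- sapply(outs, function(fit) all.vars(formula(fit))[1])
responses
```

Because the list outs already holds every model, the stargazer call does not need to spell out outs[[1]], outs[[2]], and so on by hand if it is invoked through do.call().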
When writing a statistical model, I usually include many covariates to adjust the model, so I have to retype the same variables again and again. Even though I could copy and paste, the model ends up looking very long. Could I create a single variable that stands in for many variables? E.g.:
fm <- lm(y ~ a+b+c+d+e, data)
I could create a variable like: model1 = a+b+c+d+e, then the model looks like:
fm <-lm(y ~ model1, data)
I tried many approaches, such as model1 <- c(a+b+c+d), but none of them worked.
Could someone help me with this?
How about saving it as a formula?
model <- ~a+b+c+d
You can then extract the terms using terms(), or modify the formula using update().
Example:
model <- mpg ~ disp + wt + cyl
lm(model, mtcars)
## Call:
## lm(formula = model, data = mtcars)
##
## Coefficients:
## (Intercept) disp wt cyl
## 41.107678 0.007473 -3.635677 -1.784944
model <- update(model, ~. + qsec)
lm(model, mtcars)
## Call:
## lm(formula = model, data = mtcars)
##
## Coefficients:
## (Intercept) disp wt cyl qsec
## 30.17771 0.01029 -4.55318 -1.24109 0.55277
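To round out the example, here is a small sketch of pulling the pieces back out of a stored formula with terms() and all.vars(); the object names term_labels and variables are only illustrative.

```r
model <- mpg ~ disp + wt + cyl

# Labels of the right-hand-side terms
term_labels <- attr(terms(model), "term.labels")
term_labels

# Every variable mentioned in the formula, response first
variables <- all.vars(model)
variables
```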
Edit:
As Kristoffer Winther Balling mentioned in the comments, a cleverer way to do this is to save the formula as a string (e.g. "mpg ~ disp + wt + cyl") and then use as.formula. You can then use familiar paste or other string manipulation functions to change the formula.
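A minimal sketch of the string-based approach, assuming the covariates are kept in a character vector (the names covariates and model_string are made up here):

```r
# Covariates stored once and reused across models
covariates <- c("disp", "wt", "cyl")

# Assemble the formula text, then convert it to a formula object
model_string <- paste("mpg ~", paste(covariates, collapse = " + "))
model <- as.formula(model_string)

fit <- lm(model, data = mtcars)
coef(fit)

# Adding a covariate is then just string manipulation
model_more <- as.formula(paste(model_string, "+ qsec"))
fit_more <- lm(model_more, data = mtcars)
```

Keeping the covariate names in one vector means the long list is written once and can be shared by several models.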
I want to run through a long vector of potential explanatory variables,
regressing a response variable on each in turn. Rather than paste together
the model formula, I'm thinking of using reformulate(),
as demonstrated here.
The function fun() below seems to do the job, fitting the desired model. Notice, though, that
it records in its call element the name of the constructed formula object
rather than its value.
## (1) Function using programmatically constructed formula
fun <- function(XX) {
ff <- reformulate(response="mpg", termlabels=XX)
lm(ff, data=mtcars)
}
fun(XX=c("cyl", "disp"))
#
# Call:
# lm(formula = ff, data = mtcars) <<<--- Note recorded call
#
# Coefficients:
# (Intercept) cyl disp
# 34.66099 -1.58728 -0.02058
## (2) Result of directly specified formula (just for purposes of comparison)
lm(mpg ~ cyl + disp, data=mtcars)
#
# Call:
# lm(formula = mpg ~ cyl + disp, data = mtcars) <<<--- Note recorded call
#
# Coefficients:
# (Intercept) cyl disp
# 34.66099 -1.58728 -0.02058
My question: Is there any danger in this? Can this become a
problem if, for instance, I want to later apply update, or predict or
some other function to the model fit object, (possibly from some other environment)?
A slightly more awkward alternative that does, nevertheless, get the recorded
call right is to use eval(substitute()). Is this in any way a generally safer construct?
fun2 <- function(XX) {
ff <- reformulate(response="mpg", termlabels=XX)
eval(substitute(lm(FF, data=mtcars), list(FF=ff)))
}
fun2(XX=c("cyl", "disp"))$call
## lm(formula = mpg ~ cyl + disp, data = mtcars)
I'm always hesitant to claim there are no situations in which something involving R environments and scoping might bite, but ... after some more exploration, my first usage above does look safe.
It turns out that the printed call is a bit of a red herring.
The formula that actually gets used by other functions (and the one extracted by formula() and as.formula()) is the one stored in the terms element of the fit object, and it gets the actual formula right. (The terms element contains an object of class "terms", which is just a "formula" with a bunch of attached attributes.)
To see that all of the proposals in my question and the associated comments store the same "formula" object (up to the associated environment), run the following.
## First the three approaches in my post
formula(fun(XX=c("cyl", "disp")))
# mpg ~ cyl + disp
# <environment: 0x026d2b7c>
formula(lm(mpg ~ cyl + disp, data=mtcars))
# mpg ~ cyl + disp
formula(fun2(XX=c("cyl", "disp"))$call)
# mpg ~ cyl + disp
# <environment: 0x02c4ce2c>
## Then Gabor Grothendieck's idea
XX = c("cyl", "disp")
ff <- reformulate(response="mpg", termlabels=XX)
formula(do.call("lm", list(ff, quote(mtcars))))
## mpg ~ cyl + disp
To confirm that formula() really is deriving its output from the terms element of the fit object, have a look at stats:::formula.lm and stats:::formula.terms.
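The same point can be checked directly on a fit produced by fun(). This is a small sketch rather than an exhaustive test of every scoping situation; the objects fit, fit_wt and new_car are introduced only for the example.

```r
fun <- function(XX) {
  ff <- reformulate(response = "mpg", termlabels = XX)
  lm(ff, data = mtcars)
}
fit <- fun(XX = c("cyl", "disp"))

# The recorded call only holds the symbol ff ...
fit$call$formula

# ... while formula() recovers the real formula from the terms element
deparse(formula(fit))
deparse(formula(fit$terms))

# update() and predict() work from the stored formula/terms, not the printed call
fit_wt <- update(fit, . ~ . + wt)
new_car <- data.frame(cyl = 4, disp = 100)
predict(fit, newdata = new_car)
```

Here update() succeeds because mtcars is visible from the calling environment; a data argument that existed only inside the function would be a separate issue from the formula itself.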