Is it possible to add multiple variables in the same regression model? - r

These are the regression models that I want to obtain. I want to select many variables at the same time to develop a multivariate model, since my data frame has 357 variables.
summary(lm(formula = bci_bci ~ bti_acp, data = qog))
summary(lm(formula = bci_bci ~ wdi_pop, data = qog))
summary(lm(formula = bci_bci ~ ffp_sl, data = qog))

Instead of listing all your variables joined by + signs, you can also use the shorthand notation . to include every other column of the data frame as an explanatory variable (except, of course, the target variable on the left-hand side).
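For reference, writing a few predictors out explicitly just means joining them with + on the right-hand side; this is only a sketch that assumes the column names from your question exist in qog:
# Hypothetical: assumes qog contains the columns bti_acp, wdi_pop and ffp_sl
summary(lm(bci_bci ~ bti_acp + wdi_pop + ffp_sl, data = qog))
With the . shorthand, using the built-in mtcars data as an example: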
data("mtcars")
mod <- lm(mpg ~ ., data = mtcars)
summary(mod)
#>
#> Call:
#> lm(formula = mpg ~ ., data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.4506 -1.6044 -0.1196 1.2193 4.6271
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 12.30337 18.71788 0.657 0.5181
#> cyl -0.11144 1.04502 -0.107 0.9161
#> disp 0.01334 0.01786 0.747 0.4635
#> hp -0.02148 0.02177 -0.987 0.3350
#> drat 0.78711 1.63537 0.481 0.6353
#> wt -3.71530 1.89441 -1.961 0.0633 .
#> qsec 0.82104 0.73084 1.123 0.2739
#> vs 0.31776 2.10451 0.151 0.8814
#> am 2.52023 2.05665 1.225 0.2340
#> gear 0.65541 1.49326 0.439 0.6652
#> carb -0.19942 0.82875 -0.241 0.8122
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.65 on 21 degrees of freedom
#> Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
#> F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
par(mfrow=c(2,2))
plot(mod)
par(mfrow=c(1,1))
Created on 2021-12-21 by the reprex package (v2.0.1)
If you want to include all two-way interactions, the notation would be this:
lm(mpg ~ (.)^2, data = mtcars)
If you want to include all three-way interactions, the notation would be this:
lm(mpg ~ (.)^3, data = mtcars)
If you create very large models (with many variables or interactions), make sure that you also perform some model size reduction after that, e.g. using the function step(). It's very likely that not all your predictors are actually going to be informative, and many could be correlated, which causes problems in multivariate models. One way out of this could be to remove any predictors that are highly correlated to other predictors from the model.
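A minimal sketch of such a reduction, assuming you start from the full mtcars model fitted above:
# Backward elimination by AIC using base R's step()
mod_full <- lm(mpg ~ ., data = mtcars)
mod_reduced <- step(mod_full, direction = "backward", trace = FALSE)
summary(mod_reduced)
# Inspect pairwise correlations among the predictors; highly correlated
# pairs are candidates for dropping before or after the reduction
round(cor(mtcars[, -1]), 2)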

Related

Fixed effect counts in modelsummary

I have a modelsummary of three fixed effects regressions like so:
remotes::install_github("lrberge/fixest")
remotes::install_github("vincentarelbundock/modelsummary")
library(fixest)
library(modelsummary)
mod1 <- feols(mpg ~ hp | cyl, data = mtcars)
mod2 <- feols(mpg ~ wt | cyl, data = mtcars)
mod3 <- feols(mpg ~ drat | cyl, data = mtcars)
modelsummary(list(mod1, mod2, mod3), output = "markdown")
|                | Model 1 | Model 2 | Model 3 |
|----------------|---------|---------|---------|
| hp             | -0.024  |         |         |
|                | (0.015) |         |         |
| wt             |         | -3.206  |         |
|                |         | (1.188) |         |
| drat           |         |         | 1.793   |
|                |         |         | (1.564) |
| Num.Obs.       | 32      | 32      | 32      |
| R2             | 0.754   | 0.837   | 0.745   |
| R2 Adj.        | 0.727   | 0.820   | 0.718   |
| R2 Within      | 0.080   | 0.392   | 0.048   |
| R2 Within Adj. | 0.047   | 0.371   | 0.014   |
| AIC            | 167.9   | 154.6   | 169.0   |
| BIC            | 173.8   | 160.5   | 174.9   |
| RMSE           | 2.94    | 2.39    | 2.99    |
| Std.Errors     | by: cyl | by: cyl | by: cyl |
| FE: cyl        | X       | X       | X       |
Instead of having the table merely show whether certain fixed effects were present, is it possible to show the number of fixed-effect levels that were estimated?
The raw models do contain this information:
> mod1
OLS estimation, Dep. Var.: mpg
Observations: 32
Fixed-effects: cyl: 3
Standard-errors: Clustered (cyl)
Estimate Std. Error t value Pr(>|t|)
hp -0.024039 0.015344 -1.56664 0.25771
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 2.94304 Adj. R2: 0.727485
Within R2: 0.07998
Yes, you’ll need to define a glance_custom.fixest() method. See this section of the docs for detailed instructions and many examples:
https://vincentarelbundock.github.io/modelsummary/articles/modelsummary.html#customizing-existing-models-part-i
And here’s an example with fixest:
library(fixest)
library(tibble)
library(modelsummary)
models <- list(
  feols(mpg ~ hp | cyl, data = mtcars),
  feols(mpg ~ hp | am, data = mtcars),
  feols(mpg ~ hp | cyl + am, data = mtcars)
)
glance_custom.fixest <- function(x, ...) {
  tibble::tibble(`# FE` = paste(x$fixef_sizes, collapse = " + "))
}
modelsummary(models, gof_map = c("nobs", "# FE"))
|          | (1)     | (2)     | (3)     |
|----------|---------|---------|---------|
| hp       | -0.024  | -0.059  | -0.044  |
|          | (0.015) | (0.000) | (0.016) |
| Num.Obs. | 32      | 32      | 32      |
| # FE     | 3       | 2       | 3 + 2   |

How do I extract variables that have a low p-value in R

I have a logistic model with plenty of interactions in R.
I want to extract only the terms, whether interactions or plain predictor variables, that are significant.
It's fine if I can only see the significant interactions, since I can then work out which (possibly non-significant) variables went into them.
Thank you.
This is the most I have
broom::tidy(logmod)[,c("term", "estimate", "p.value")]
Here is a way. After fitting the logistic model, use a logical condition to get the significant predictors and a regular expression (via grepl) to get the interactions. These two index vectors can be combined with &; in the case below this returns no significant interactions at the alpha = 0.05 level.
fit <- glm(am ~ hp + qsec*vs, mtcars, family = binomial)
summary(fit)
#>
#> Call:
#> glm(formula = am ~ hp + qsec * vs, family = binomial, data = mtcars)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.93876 -0.09923 -0.00014 0.05351 1.33693
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 199.02697 102.43134 1.943 0.0520 .
#> hp -0.12104 0.06138 -1.972 0.0486 *
#> qsec -10.87980 5.62557 -1.934 0.0531 .
#> vs -108.34667 63.59912 -1.704 0.0885 .
#> qsec:vs 6.72944 3.85348 1.746 0.0808 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 43.230 on 31 degrees of freedom
#> Residual deviance: 12.574 on 27 degrees of freedom
#> AIC: 22.574
#>
#> Number of Fisher Scoring iterations: 8
alpha <- 0.05
pval <- summary(fit)$coefficients[,4]
sig <- pval <= alpha
intr <- grepl(":", names(coef(fit)))
coef(fit)[sig]
#> hp
#> -0.1210429
coef(fit)[sig & intr]
#> named numeric(0)
Created on 2022-09-15 with reprex v2.0.2
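If you prefer to stay close to the broom::tidy() output you already have, the same filtering can be done on the tidy data frame; this is just a sketch using dplyr on top of the fit and alpha objects defined above:
library(dplyr)
tidy_fit <- broom::tidy(fit)
# all significant terms
filter(tidy_fit, p.value <= alpha)
# significant interaction terms only
filter(tidy_fit, p.value <= alpha, grepl(":", term))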

Different independent variables and table from summary-values

I have a problem that I have been trying to solve for a couple of hours now but I simply can't figure it out (I'm new to R btw..).
Basically, what I'm trying to do (using mtcars to illustrate) is to make R test different independent variables (while adjusting for "cyl" and "disp") for the same dependent variable ("mpg"). The best solution I have been able to come up with is:
lm <- lapply(mtcars[,4:6], function(x) lm(mpg ~ cyl + disp + x, data = mtcars))
summary <- lapply(lm, summary)
... where 4:6 corresponds to columns "hp", "drat" and "wt".
This actually works OK, but the problem is that the summary appears with an "x" instead of, for instance, "hp":
$hp
Call:
lm(formula = mpg ~ cyl + disp + x, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.0889 -2.0845 -0.7745 1.3972 6.9183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
cyl -1.22742 0.79728 -1.540 0.1349
disp -0.01884 0.01040 -1.811 0.0809 .
x -0.01468 0.01465 -1.002 0.3250
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
Questions:
Is there a way to fix this? And have I done this in the smartest way using lapply, or would it be better to use for instance for loops or other options?
Ideally, I would also very much like to make a table showing, for instance, only the estimate and p-value for each variable. Can this somehow be done?
Best regards
One approach to get the name of the variable displayed in the summary is to loop over the names of the variables and set up the formula using paste and as.formula:
lm <- lapply(names(mtcars)[4:6], function(x) {
  formula <- as.formula(paste0("mpg ~ cyl + disp + ", x))
  lm(formula, data = mtcars)
})
summary <- lapply(lm, summary)
summary
#> [[1]]
#>
#> Call:
#> lm(formula = formula, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.0889 -2.0845 -0.7745 1.3972 6.9183
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
#> cyl -1.22742 0.79728 -1.540 0.1349
#> disp -0.01884 0.01040 -1.811 0.0809 .
#> hp -0.01468 0.01465 -1.002 0.3250
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.055 on 28 degrees of freedom
#> Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
#> F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
Concerning the second part of your question: one way to achieve this is to use broom::tidy from the broom package, which gives you the regression results as a tidy data frame:
lapply(lm, broom::tidy)
#> [[1]]
#> # A tibble: 4 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 34.2 2.59 13.2 1.54e-13
#> 2 cyl -1.23 0.797 -1.54 1.35e- 1
#> 3 disp -0.0188 0.0104 -1.81 8.09e- 2
#> 4 hp -0.0147 0.0147 -1.00 3.25e- 1
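If you would rather have a single table with, say, only the estimate and p-value of the varying predictor, you could bind the tidy results together; a sketch, assuming the list of models created above (here named lm):
library(dplyr)
vars <- names(mtcars)[4:6]            # "hp", "drat", "wt"
tidy_list <- lapply(lm, broom::tidy)
names(tidy_list) <- vars
bind_rows(tidy_list, .id = "variable") %>%
  filter(term == variable) %>%        # keep only the varying predictor's row
  select(variable, estimate, p.value)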
We could use reformulate to create the formula for the lm
lst1 <- lapply(names(mtcars)[4:6], function(x) {
  fmla <- reformulate(c("cyl", "disp", x), response = "mpg")
  model <- lm(fmla, data = mtcars)
  model$call <- deparse(fmla)
  model
})
Then, get the summary
summary1 <- lapply(lst1, summary)
summary1[[1]]
#Call:
#"mpg ~ cyl + disp + hp"
#Residuals:
# Min 1Q Median 3Q Max
#-4.0889 -2.0845 -0.7745 1.3972 6.9183
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
#cyl -1.22742 0.79728 -1.540 0.1349
#disp -0.01884 0.01040 -1.811 0.0809 .
#hp -0.01468 0.01465 -1.002 0.3250
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 3.055 on 28 degrees of freedom
#Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
#F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09

Get model estimates from another reference level, without running new model?

I am wondering if there is a simple way to change which factor level the intercept represents, perhaps mathematically, without re-running large models. As an example:
mtcars$cyl<-as.factor(mtcars$cyl)
summary(
lm(mpg~cyl+hp,data=mtcars)
)
Output:
Call:
lm(formula = mpg ~ cyl + hp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.818 -1.959 0.080 1.627 6.812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.65012 1.58779 18.044 < 2e-16 ***
cyl6 -5.96766 1.63928 -3.640 0.00109 **
cyl8 -8.52085 2.32607 -3.663 0.00103 **
hp -0.02404 0.01541 -1.560 0.12995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.146 on 28 degrees of freedom
Multiple R-squared: 0.7539, Adjusted R-squared: 0.7275
F-statistic: 28.59 on 3 and 28 DF, p-value: 1.14e-08
Now I can change the reference level to 6 cyl, and can see how 8 cyl now compares to 6 cyl, rather than 4 cyl:
mtcars$cyl<-relevel(mtcars$cyl,"6")
summary(
lm(mpg~cyl+hp,data=mtcars)
)
Output:
Call:
lm(formula = mpg ~ cyl + hp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.818 -1.959 0.080 1.627 6.812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.68246 2.22805 10.18 6.48e-11 ***
cyl4 5.96766 1.63928 3.64 0.00109 **
cyl8 -2.55320 1.97867 -1.29 0.20748
hp -0.02404 0.01541 -1.56 0.12995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.146 on 28 degrees of freedom
Multiple R-squared: 0.7539, Adjusted R-squared: 0.7275
F-statistic: 28.59 on 3 and 28 DF, p-value: 1.14e-08
What I am wondering is: is there a way to get these values without re-running the model? You can see that the comparison between 4 cyl and 6 cyl is the same in each model (-5.96 and 5.96), but how would I get the estimate for the 'other' comparison from a single model (e.g. obtain the -2.55 shown in the second model using only the first model's output)? Of course in this case, it takes a fraction of a second to run the other model. But with very large models, it would be convenient to be able to change the reference level without re-running. Are there relatively simple ways to convert all of the estimates and standard errors to be based off of a different reference level, or is it too complicated to do such a thing?
Any solutions for lme4, glmmTMB, or rstanarm models would be appreciated.
Here's a function that will give you the coefficients for every rearrangement of a given factor variable without having to run the model again or specify contrasts:
rearrange_model_factors <- function(model, var) {
  var <- deparse(substitute(var))
  coefs <- coef(model)

  # Convert the dummy coefficients of `var` to absolute values by adding the intercept
  level_coefs <- grep(paste0("^", var), names(coefs))
  coefs[level_coefs] <- coefs[1] + coefs[level_coefs]

  # Rename the intercept after the baseline level missing from the dummies
  used_levels <- gsub(var, "", names(coefs[level_coefs]))
  all_levels <- levels(model$model[[var]])
  names(coefs)[1] <- paste0(var, setdiff(all_levels, used_levels))

  # For every permutation of the levels, re-express the coefficients
  # relative to the first level of that permutation, which becomes the intercept
  level_coefs <- grep(paste0("^", var), names(coefs))
  levs <- coefs[level_coefs]
  perms <- gtools::permutations(length(levs), length(levs))
  perms <- lapply(seq(nrow(perms)), function(i) levs[perms[i, ]])
  lapply(perms, function(x) {
    x[-1] <- x[-1] - x[1]
    coefs[level_coefs] <- x
    names(coefs)[level_coefs] <- names(x)
    names(coefs)[1] <- "(Intercept)"
    coefs
  })
}
Suppose you had a model like this:
iris_mod <- lm(Sepal.Width ~ Species + Sepal.Length, data = iris)
To see how your coefficients would change if Species were in a different order, you would just do:
rearrange_model_factors(iris_mod, Species)
#> [[1]]
#> (Intercept) Speciesversicolor Speciesvirginica Sepal.Length
#> 1.6765001 -0.9833885 -1.0075104 0.3498801
#>
#> [[2]]
#> (Intercept) Speciesvirginica Speciesversicolor Sepal.Length
#> 1.6765001 -1.0075104 -0.9833885 0.3498801
#>
#> [[3]]
#> (Intercept) Speciessetosa Speciesvirginica Sepal.Length
#> 0.69311160 0.98338851 -0.02412184 0.34988012
#>
#> [[4]]
#> (Intercept) Speciesvirginica Speciessetosa Sepal.Length
#> 0.69311160 -0.02412184 0.98338851 0.34988012
#>
#> [[5]]
#> (Intercept) Speciessetosa Speciesversicolor Sepal.Length
#> 0.66898976 1.00751035 0.02412184 0.34988012
#>
#> [[6]]
#> (Intercept) Speciesversicolor Speciessetosa Sepal.Length
#> 0.66898976 0.02412184 1.00751035 0.34988012
Or with your own example:
mtcars$cyl <- as.factor(mtcars$cyl)
rearrange_model_factors(lm(mpg ~ cyl + hp, data = mtcars), cyl)
#> [[1]]
#> (Intercept) cyl6 cyl8 hp
#> 28.65011816 -5.96765508 -8.52085075 -0.02403883
#>
#> [[2]]
#> (Intercept) cyl8 cyl6 hp
#> 28.65011816 -8.52085075 -5.96765508 -0.02403883
#>
#> [[3]]
#> (Intercept) cyl4 cyl8 hp
#> 22.68246309 5.96765508 -2.55319567 -0.02403883
#>
#> [[4]]
#> (Intercept) cyl8 cyl4 hp
#> 22.68246309 -2.55319567 5.96765508 -0.02403883
#>
#> [[5]]
#> (Intercept) cyl4 cyl6 hp
#> 20.12926741 8.52085075 2.55319567 -0.02403883
#>
#> [[6]]
#> (Intercept) cyl6 cyl4 hp
#> 20.12926741 2.55319567 8.52085075 -0.02403883
We need a bit of exposition to see why this works.
Although the function above only runs the model once, let's start by creating a list containing 3 versions of mtcars, where the baseline factor levels of cyl are all different.
df_list <- list(mtcars_4 = within(mtcars, cyl <- factor(cyl, c(4, 6, 8))),
                mtcars_6 = within(mtcars, cyl <- factor(cyl, c(6, 4, 8))),
                mtcars_8 = within(mtcars, cyl <- factor(cyl, c(8, 4, 6))))
Now we can extract the coefficients of your model for all three versions at once using lapply. For clarity, we will remove the hp coefficient, which remains static across all three versions anyway:
coefs <- lapply(df_list, function(x) coef(lm(mpg ~ cyl + hp, data = x))[-4])
coefs
#> $mtcars_4
#> (Intercept) cyl6 cyl8
#> 28.650118 -5.967655 -8.520851
#>
#> $mtcars_6
#> (Intercept) cyl4 cyl8
#> 22.682463 5.967655 -2.553196
#>
#> $mtcars_8
#> (Intercept) cyl4 cyl6
#> 20.129267 8.520851 2.553196
Now, we remind ourselves that the coefficient for each factor level is given relative to the baseline level. For the non-intercept coefficients, we can therefore add the intercept to their coefficients to get their absolute values. These numbers then represent the expected value of mpg when hp equals 0 for each of the three levels of cyl:
coefs <- lapply(coefs, function(x) c(x[1], x[-1] + x[1]))
coefs
#> $mtcars_4
#> (Intercept) cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_6
#> (Intercept) cyl4 cyl8
#> 22.68246 28.65012 20.12927
#>
#> $mtcars_8
#> (Intercept) cyl4 cyl6
#> 20.12927 28.65012 22.68246
Since we now have all three values as absolutes, let's rename "Intercept" to the appropriate factor level:
coefs <- mapply(function(x, y) { names(x)[1] <- y; x},
x = coefs, y = c("cyl4", "cyl6", "cyl8"), SIMPLIFY = FALSE)
coefs
#> $mtcars_4
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_6
#> cyl6 cyl4 cyl8
#> 22.68246 28.65012 20.12927
#>
#> $mtcars_8
#> cyl8 cyl4 cyl6
#> 20.12927 28.65012 22.68246
Finally, let's rearrange the order so we can compare the absolute values of all three factor levels:
coefs <- lapply(coefs, function(x) x[order(names(x))])
coefs
#> $mtcars_4
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_6
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_8
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
We can see they are all the same. This is why the ordering of factors is arbitrary in lm: changing the order of the factor levels gives the same numerical predictions in the end, even if the summary appears different.
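A quick way to convince yourself of this (a sketch reusing the df_list created above) is to check that the fitted values do not change when cyl is releveled:
# Fitted values are identical no matter which cyl level is the baseline
fits <- lapply(df_list, function(x) fitted(lm(mpg ~ cyl + hp, data = x)))
all.equal(fits$mtcars_4, fits$mtcars_6)  # TRUE
all.equal(fits$mtcars_4, fits$mtcars_8)  # TRUE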
TL;DR
So the answer to your question of where you get the -2.55 if you only have the first model is: take the difference between the non-intercept coefficients. In this case
(-8.520851) -(-5.967655)
#> [1] -2.553196
Alternatively, add the intercept to the non-intercept coefficients: this shows what the intercept would be if any of the levels were the baseline, and you can then get the coefficient of any level relative to any other by simple subtraction. That's how my function rearrange_model_factors works.
Created on 2020-10-05 by the reprex package (v0.3.0)
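To make the arithmetic concrete, here is a small sketch using only the numbers from the first model (baseline cyl = 4):
b0   <- 28.650118   # (Intercept): expected mpg for cyl4 at hp = 0
cyl6 <- -5.967655
cyl8 <- -8.520851
b0 + cyl6      # 22.682463: the intercept if cyl6 were the baseline
b0 + cyl8      # 20.129267: the intercept if cyl8 were the baseline
cyl8 - cyl6    # -2.553196: the cyl8 coefficient relative to cyl6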

Testing the difference between marginal effects calculated across factors

I'm trying to test the difference between two marginal effects. I can get R to calculate the effects, but I can't find any resource explaining how to test their difference.
I've looked in the margins documentation and other marginal-effects packages but have not been able to find something that tests the difference.
data("mtcars")
mod<-lm(mpg~as.factor(am)*disp,data=mtcars)
(marg<-margins(model = mod,at = list(am = c("0","1"))))
at(am) disp am1
0 -0.02758 0.4518
1 -0.05904 0.4518
summary(marg)
factor am AME SE z p lower upper
am1 1.0000 0.4518 1.3915 0.3247 0.7454 -2.2755 3.1791
am1 2.0000 0.4518 1.3915 0.3247 0.7454 -2.2755 3.1791
disp 1.0000 -0.0276 0.0062 -4.4354 0.0000 -0.0398 -0.0154
disp 2.0000 -0.0590 0.0096 -6.1353 0.0000 -0.0779 -0.0402
I want to produce a test that decides whether or not the marginal effects in each row of marg are significantly different; i.e., that the slopes in the marginal effects plots are different. This appears to be true because the confidence intervals do not overlap -- indicating that the effect of displacement is different for am=0 vs am=1.
We discuss in the comments below that we can test contrasts using emmeans, but that is a test of the average response across am=0 and am=1.
library(emmeans)
emm <- emmeans(mod, ~ as.factor(am) * disp)
emm
am disp emmean SE df lower.CL upper.CL
0 231 18.8 0.763 28 17.2 20.4
1 231 19.2 1.164 28 16.9 21.6
cont<-contrast(emm,list(`(0-1)`=c(1,-1)))
cont
contrast estimate SE df t.ratio p.value
(0-1) -0.452 1.39 28 -0.325 0.7479
Here the p-value is large indicating that average response when am=0 is not significantly different than when am=1.
Is it reasonable to do this (like testing the difference of two means)?
smarg<-summary(marg)
(z=as.numeric((smarg$AME[3]-smarg$AME[4])/sqrt(smarg$SE[3]^2+smarg$SE[4]^2)))
[1] 2.745
2*pnorm(-abs(z))
[1] 0.006044
This p-value appears to agree with the analysis of non overlapping confidence intervals.
If I understand your question, it can be answered using emtrends:
library(emmeans)
emt = emtrends(mod, "am", var = "disp")
emt # display the estimated slopes
## am disp.trend SE df lower.CL upper.CL
## 0 -0.0276 0.00622 28 -0.0403 -0.0148
## 1 -0.0590 0.00962 28 -0.0787 -0.0393
##
## Confidence level used: 0.95
pairs(emt) # test the difference of slopes
## contrast estimate SE df t.ratio p.value
## 0 - 1 0.0315 0.0115 28 2.745 0.0104
For the question "Are the slopes statistically different, indicating that the effect of displacement is different for am=0 vs am=1?", you can get the p-value associated with the comparison directly from the summary of the lm() fit.
> summary(mod)
Call:
lm(formula = mpg ~ as.factor(am) * disp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.6056 -2.1022 -0.8681 2.2894 5.2315
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.157064 1.925053 13.068 1.94e-13 ***
as.factor(am)1 7.709073 2.502677 3.080 0.00460 **
disp -0.027584 0.006219 -4.435 0.00013 ***
as.factor(am)1:disp -0.031455 0.011457 -2.745 0.01044 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.907 on 28 degrees of freedom
Multiple R-squared: 0.7899, Adjusted R-squared: 0.7674
F-statistic: 35.09 on 3 and 28 DF, p-value: 1.27e-09
Notice that the p-value for the as.factor(am)1:disp term is 0.01044, which matches the output from pairs(emt) in Russ Lenth's answer.
(Posting as an answer because I don't yet have enough reputation to post a comment.)
I'm not sure, but you're probably looking for contrasts or pairwise comparisons of marginal effects? You can do this using the emmeans package:
library(margins)
library(emmeans)
library(magrittr)
data("mtcars")
mod <- lm(mpg ~ as.factor(am) * disp, data = mtcars)
marg <- margins(model = mod, at = list(am = c("0", "1")))
marg
#> Average marginal effects at specified values
#> lm(formula = mpg ~ as.factor(am) * disp, data = mtcars)
#> at(am) disp am1
#> 0 -0.02758 0.4518
#> 1 -0.05904 0.4518
emmeans(mod, c("am", "disp")) %>%
contrast(method = "pairwise")
#> contrast estimate SE df t.ratio p.value
#> 0,230.721875 - 1,230.721875 -0.452 1.39 28 -0.325 0.7479
emmeans(mod, c("am", "disp")) %>%
contrast()
#> contrast estimate SE df t.ratio p.value
#> 0,230.721875 effect -0.226 0.696 28 -0.325 0.7479
#> 1,230.721875 effect 0.226 0.696 28 0.325 0.7479
#>
#> P value adjustment: fdr method for 2 tests
Or simply use summary():
library(margins)
data("mtcars")
mod <- lm(mpg ~ as.factor(am) * disp, data = mtcars)
marg <- margins(model = mod, at = list(am = c("0", "1")))
marg
#> Average marginal effects at specified values
#> lm(formula = mpg ~ as.factor(am) * disp, data = mtcars)
#> at(am) disp am1
#> 0 -0.02758 0.4518
#> 1 -0.05904 0.4518
summary(marg)
#> factor am AME SE z p lower upper
#> am1 1.0000 0.4518 1.3915 0.3247 0.7454 -2.2755 3.1791
#> am1 2.0000 0.4518 1.3915 0.3247 0.7454 -2.2755 3.1791
#> disp 1.0000 -0.0276 0.0062 -4.4354 0.0000 -0.0398 -0.0154
#> disp 2.0000 -0.0590 0.0096 -6.1353 0.0000 -0.0779 -0.0402
Created on 2019-06-07 by the reprex package (v0.3.0)
