How to modify variable labels in gtsummary table - r

As recommended in the tutorial for gtsummary's tbl_regression function, I am using the labelled package to assign attribute labels to my regression variables. However, when my regression formula includes a quadratic term, the resulting table includes the same variable label twice:
library(gtsummary)
library(labelled)
library(tidyverse)
df <- as_tibble(mtcars)
var_label(df) <- list(disp = "Displacement", vs = "Engine type")
c("disp", "disp + I(disp^2)") %>%
  map(
    ~ paste("vs", .x, sep = " ~ ") %>%
      as.formula() %>%
      glm(data = df,
          family = binomial(link = "logit")) %>%
      tbl_regression(exponentiate = TRUE)) %>%
  tbl_merge()
Is there a way to modify the label for the quadratic term in this case?

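If you assign the labels inside tbl_regression() with its label argument, you get the table you want. The "I(disp^2) terms have not been found in x" message is harmless: the first model has no quadratic term, so that label is simply ignored there.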
library(gtsummary)
c("disp", "disp + I(disp^2)") %>%
  purrr::map(
    ~ paste("vs", .x, sep = " ~ ") %>%
      as.formula() %>%
      glm(data = mtcars, family = binomial(link = "logit")) %>%
      tbl_regression(
        exponentiate = TRUE,
        label = list(
          disp = "Displacement",
          `I(disp^2)` = "Displacement^2"
        )
      )
  ) %>%
  tbl_merge() %>%
  as_kable()
#> ✖ `I(disp^2)` terms have not been found in `x`.
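| Characteristic | OR   | 95% CI     | p-value | OR   | 95% CI     | p-value |
|----------------|------|------------|---------|------|------------|---------|
| Displacement   | 0.98 | 0.96, 0.99 | 0.002   | 0.99 | 0.92, 1.07 | 0.8     |
| Displacement^2 |      |            |         | 1.00 | 1.00, 1.00 | 0.8     |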
Created on 2022-09-19 with reprex v2.0.2
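The names in the label list above have to match the term names R assigns to the model coefficients, which is why the quadratic term is written with backticks as `I(disp^2)`. A quick base R check of those names, using the mtcars data from the question (no gtsummary needed):

```r
# Fit the quadratic model from the question with base R only
fit_quad <- glm(vs ~ disp + I(disp^2), data = mtcars,
                family = binomial(link = "logit"))

# These coefficient names are the term names a label list is matched against
term_names <- names(coef(fit_quad))
print(term_names)
```

A name that does not appear among a given model's terms (here `I(disp^2)` for the linear-only model) has nothing to attach to, which is presumably what the "not been found" message reports.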

Related

Likelihood ratio test p-values in gtsummary

How do I incorporate likelihood ratio test p-values in a gtsummary output table?
library(gtsummary)
trial %>%
  select(response, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE)
You can use add_global_p(test = "LR") to add the LRT p-value. The test argument is passed on to car::Anova(), so in the background the function calls car::Anova(mod = x, type = "III", test = "LR") to calculate the p-value.
library(gtsummary)
tbl <-
  trial %>%
  select(response, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial)
  ) %>%
  add_global_p(test = "LR")
#> add_global_p: Global p-values for variable(s) `add_global_p(include = "grade")`
#> were calculated with
#> `car::Anova(mod = x$model_obj, type = "III", test = "LR")`
Created on 2021-05-12 by the reprex package (v2.0.0)
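To see what the likelihood ratio test computes, here is a minimal base R sketch. It uses mtcars instead of trial (an assumption, since trial ships with gtsummary) and compares a one-predictor logistic model with the intercept-only model. For a model with a single predictor, this is the same comparison a type III LR test of that predictor makes.

```r
# Model with one predictor and the intercept-only (null) model
full <- glm(vs ~ mpg, data = mtcars, family = binomial)
null <- glm(vs ~ 1, data = mtcars, family = binomial)

# LR statistic: the drop in deviance when the predictor is added
lr_stat <- deviance(null) - deviance(full)
df_diff <- df.residual(null) - df.residual(full)
p_manual <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# The same test through stats::anova()
p_anova <- anova(null, full, test = "LRT")[2, "Pr(>Chi)"]

print(c(statistic = lr_stat, df = df_diff, p_value = p_manual))
```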

How to combine odds ratios and confidence intervals in one column

I am trying to combine the ORs and confidence intervals in one column to get results of the form 1.10(0.52,2.29).
library(gtsummary)
trial %>%
  select(response, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
You can use the modify_table_styling() function to merge two or more columns. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
tbl <-
  trial %>%
  select(response, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE
  ) %>%
  modify_table_styling(
    columns = estimate,
    rows = !is.na(ci),
    cols_merge_pattern = "{estimate} ({ci})"
  ) %>%
  modify_header(estimate ~ "**OR (95% CI)**") %>%
  modify_footnote(estimate ~ "OR = Odds Ratio, CI = Confidence Interval",
                  abbreviation = TRUE)
Created on 2021-05-03 by the reprex package (v2.0.0)
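Outside gtsummary, you get the same "estimate (ci)" layout by pasting formatted numbers together. Here is a minimal base R sketch. It assumes Wald intervals from confint.default(), and gtsummary may compute its glm intervals differently, so the numbers need not match its table. It also uses mtcars in place of trial.

```r
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Odds ratios and Wald 95% CIs on the exponentiated scale (intercept dropped)
or <- exp(coef(fit))[-1]
ci <- exp(confint.default(fit))[-1, , drop = FALSE]

# Merge into one "OR (lower, upper)" string per term
or_ci <- sprintf("%.2f (%.2f, %.2f)", or, ci[, 1], ci[, 2])
names(or_ci) <- names(or)
print(or_ci)
```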

Can tbl_summary() be customized to display significance stars in the footer?

The gtsummary package in R has a neat new function, add_significance_stars(), which adds significance stars to coefficient estimates with small p-values in regression models. However, this function only works on tbl_regression or tbl_uvregression objects.
Is there a similar method that can be applied to a tbl_summary object so that stars mark significant p-values?
library(tidyverse)
library(gtsummary)
This is a tbl_summary object with p-values displayed in a column:
mtcars %>%
  select(gear, mpg, disp, hp, wt) %>%
  tbl_summary(by = "gear") %>%
  add_p()
[Screenshot: tbl_summary object]
This is a tbl_regression object with the stars displayed in the desired fashion and explained in the footer:
mtcars %>%
  select(gear, mpg, disp, hp, wt) %>%
  lm(formula = gear ~ mpg + disp + hp + wt) %>%
  tbl_regression(intercept = TRUE) %>%
  add_significance_stars()
[Screenshot: tbl_regression object]
The purpose of add_significance_stars() is to replace the p-values with stars. If you'd like to add stars to the p-values in a tbl_summary(), you can define a function that appends stars to significant p-values and pass it to add_p() through pvalue_fun. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
fmt_pvalue_with_stars <- function(x) {
  dplyr::case_when(
    x < 0.001 ~ paste0(style_pvalue(x), "***"),
    x < 0.01 ~ paste0(style_pvalue(x), "**"),
    x < 0.05 ~ paste0(style_pvalue(x), "*"),
    TRUE ~ style_pvalue(x)
  )
}
tbl <-
  mtcars %>%
  select(am, hp, cyl) %>%
  tbl_summary(by = am) %>%
  add_p(pvalue_fun = fmt_pvalue_with_stars) %>%
  modify_footnote(p.value ~ "*p<0.05; **p<0.01; ***p<0.001")
Created on 2021-04-24 by the reprex package (v2.0.0)
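The same idea also works without gtsummary's style_pvalue(), as a plain vectorized base R function. This is a sketch only. The helper name add_stars() and the fixed three-decimal format are illustrative choices and do not reproduce gtsummary's formatting, since style_pvalue() rounds differently.

```r
# Hypothetical helper: format p-values and append significance stars
add_stars <- function(p) {
  # "<0.001" for very small values, otherwise three decimals
  label <- ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))
  stars <- ifelse(p < 0.001, "***",
           ifelse(p < 0.01, "**",
           ifelse(p < 0.05, "*", "")))
  out <- paste0(label, stars)
  out[is.na(p)] <- NA_character_  # keep missing p-values missing
  out
}

starred <- add_stars(c(0.0004, 0.004, 0.03, 0.2, NA))
print(starred)
```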

Fit models with robust standard errors

I am using the following R code to run several linear regression models and extract the results to a data frame:
library(tidyverse)
library(broom)
data <- mtcars
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
models <- expand.grid(outcomes, exposures) %>%
  group_by(Var1) %>% rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")")) %>%
  group_by(model_id = row_number(), frm) %>%
  do(tidy(lm(.$frm, data = data))) %>%
  mutate(lci = estimate - (1.96 * std.error),
         uci = estimate + (1.96 * std.error))
How can I modify my code to use robust standard errors, as in Stata?
* example of using robust standard errors in Stata
regress y x, robust
There is a comprehensive discussion of robust standard errors for lm models on Cross Validated (Stack Exchange).
You can update your code in the following way:
library(sandwich)
models <- expand.grid(outcomes, exposures) %>%
  group_by(Var1) %>% rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")")) %>%
  group_by(model_id = row_number(), frm) %>%
  do(cbind(
    tidy(lm(.$frm, data = data)),
    # robust (HC1) standard errors, as in Stata's "regress y x, robust"
    robSE = sqrt(diag(vcovHC(lm(.$frm, data = data), type = "HC1")))
  )) %>%
  mutate(
    lci = estimate - (1.96 * std.error),
    uci = estimate + (1.96 * std.error),
    lciR = estimate - (1.96 * robSE),
    uciR = estimate + (1.96 * robSE)
  )
The important line is this:
sqrt(diag(vcovHC(lm(.$frm, data = data), type = "HC1")))
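vcovHC() returns the heteroskedasticity-consistent covariance matrix of the coefficients. Extract the variances on its diagonal with diag() and take their square root with sqrt() to get the robust standard errors. type = "HC1" applies the n/(n - k) small-sample correction, which matches Stata's robust option.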
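To make the diagonal step concrete, here is a minimal base R sketch of the HC1 sandwich estimator that vcovHC(type = "HC1") computes: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n/(n - k). The model wt ~ factor(gear) is one of the question's models. The comparison with sandwich runs only if that package is installed.

```r
fit <- lm(wt ~ factor(gear), data = mtcars)

X <- model.matrix(fit)  # n x k design matrix
e <- residuals(fit)     # OLS residuals
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^-1
meat  <- crossprod(X * e)     # X' diag(e^2) X: row i of X scaled by e_i
vcov_hc1 <- bread %*% meat %*% bread * n / (n - k)

robust_se <- sqrt(diag(vcov_hc1))
print(robust_se)

if (requireNamespace("sandwich", quietly = TRUE)) {
  print(all.equal(robust_se, sqrt(diag(sandwich::vcovHC(fit, type = "HC1")))))
}
```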

R print equation of linear regression on the plot itself

How do we print the equation of a line on a plot?
I have 2 independent variables and would like an equation like this:
y=mx1+bx2+c
where x1 = cost and x2 = targeting.
I can plot the best-fit line, but how do I print the equation on the plot?
Maybe I can't print both independent variables in one equation, but how do I do it for, say, y=mx1+c at least?
Here is my code:
fit=lm(Signups ~ cost + targeting)
plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
abline(lm(Signups ~ cost))
I tried to automate the output a bit:
fit <- lm(mpg ~ cyl + hp, data = mtcars)
summary(fit)
##Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.90833 2.19080 16.847 < 2e-16 ***
## cyl -2.26469 0.57589 -3.933 0.00048 ***
## hp -0.01912 0.01500 -1.275 0.21253
plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(coef(fit)[1:2])
## rounded coefficients for better output
cf <- round(coef(fit), 2)
## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("mpg = ", cf[1],
             ifelse(sign(cf[2]) == 1, " + ", " - "), abs(cf[2]), " cyl ",
             ifelse(sign(cf[3]) == 1, " + ", " - "), abs(cf[3]), " hp")
## printing of the equation
mtext(eq, 3, line=-2)
Hope it helps,
alex
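The sign check above is written out for exactly two slopes. A small generalization handles any number of coefficients. The helper name format_equation() is hypothetical. You can pass the result to mtext() or text() just like eq above.

```r
# Hypothetical helper: build "y = b0 +/- b1 x1 +/- b2 x2 ..." from any lm fit
format_equation <- function(fit, digits = 2) {
  cf <- round(coef(fit), digits)
  response <- all.vars(formula(fit))[1]
  slopes <- cf[-1]
  # same sign check as above: " - " plus the absolute value for negative slopes
  parts <- paste0(ifelse(slopes < 0, " - ", " + "), abs(slopes), " ", names(slopes))
  paste0(response, " = ", cf[[1]], paste0(parts, collapse = ""))
}

fit <- lm(mpg ~ cyl + hp, data = mtcars)
eq <- format_equation(fit)
print(eq)
```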
You can use text() (see ?text). In addition, you should not use abline(lm(Signups ~ cost)), as this is a different model (see my answer on CV: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?). At any rate, consider:
set.seed(1)
Signups <- rnorm(20)
cost <- rnorm(20)
targeting <- rnorm(20)
fit <- lm(Signups ~ cost + targeting)
summary(fit)
# ...
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.1494 0.2072 0.721 0.481
# cost -0.1516 0.2504 -0.605 0.553
# targeting 0.2894 0.2695 1.074 0.298
# ...
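dev.new()  # opens a new plot window on any OS; windows() exists only on Windows
plot(cost, Signups, xlab = "cost", ylab = "Signups", main = "Signups")
abline(coef(fit)[1:2])  # fitted line for cost, with targeting held at 0
text(-2, -2, adj = c(0, 0), labels = "Signups = .15 -.15cost + .29targeting")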
Here's a solution using tidyverse packages.
The key is the broom package, which simplifies the process of extracting model data. For example:
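library(tidyverse)
library(broom)  # tidy() lives in broom, which library(tidyverse) does not attach
fit1 <- lm(mpg ~ cyl, data = mtcars)
summary(fit1)
fit1 %>%
  tidy() %>%
  select(estimate, term)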
Result
# A tibble: 2 x 2
estimate term
<dbl> <chr>
1 37.9 (Intercept)
2 -2.88 cyl
I wrote a function to extract and format the information using dplyr:
get_formula <- function(object) {
  object %>%
    tidy() %>%
    mutate(
      # the intercept shows a sign only if negative; other terms get "+" or "-"
      sign = case_when(
        term == "(Intercept)" & estimate < 0 ~ "-",
        term == "(Intercept)" ~ "",
        estimate < 0 ~ "-",
        TRUE ~ "+"
      ),
      term = if_else(term == "(Intercept)", "", term),
      estimate = as.character(round(abs(estimate), digits = 2)),
      term = if_else(term == "", paste(sign, estimate), paste(sign, estimate, term))
    ) %>%
    summarize(terms = paste(term, collapse = " ")) %>%
    pull(terms)
}
get_formula(fit1)
Result
[1] " 37.88 - 2.88 cyl"
Then use ggplot2 to plot the line and add a caption:
mtcars %>%
  ggplot(mapping = aes(x = cyl, y = mpg)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
  labs(
    x = "Cylinders", y = "Miles per Gallon",
    caption = paste("mpg =", get_formula(fit1))
  )
[Plot using geom_smooth()]
This approach of plotting a line really only makes sense for visualizing the relationship between two variables. As @Glen_b pointed out in the comments, the slope we get from modelling mpg as a function of cyl (-2.88) doesn't match the slope we get from modelling mpg as a function of cyl and other variables (-1.29). For example:
fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
summary(fit2)
fit2 %>%
  tidy() %>%
  select(estimate, term)
Result
# A tibble: 5 x 2
estimate term
<dbl> <chr>
1 40.8 (Intercept)
2 -1.29 cyl
3 0.0116 disp
4 -3.85 wt
5 -0.0205 hp
That said, if you want to plot the cyl slope from a model that includes variables not shown in the plot, use geom_abline() instead and get the slope and intercept with broom functions. Keep in mind that the model intercept is the predicted mpg when all other predictors are zero, so this line can sit well away from the data points. As far as I know, geom_smooth() formulas can't reference variables that aren't already mapped as aesthetics.
mtcars %>%
  ggplot(mapping = aes(x = cyl, y = mpg)) +
  geom_point() +
  geom_abline(
    slope = fit2 %>% tidy() %>% filter(term == "cyl") %>% pull(estimate),
    intercept = fit2 %>% tidy() %>% filter(term == "(Intercept)") %>% pull(estimate),
    color = "blue"
  ) +
  labs(
    x = "Cylinders", y = "Miles per Gallon",
    caption = paste("mpg =", get_formula(fit2))
  )
[Plot using geom_abline()]
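One further option, offered as a suggestion rather than part of the answer, is to hold the other predictors at their sample means instead of using the raw intercept, which corresponds to disp, wt and hp all equal to zero. The line keeps the same cyl slope but passes through the point of means of the data. This is a base R sketch. You can feed its two values to geom_abline(intercept = ..., slope = ...) or to abline(a, b).

```r
fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
b <- coef(fit2)

# Other predictors held at their sample means
others <- c("disp", "wt", "hp")
intercept_at_means <- b[["(Intercept)"]] + sum(b[others] * colMeans(mtcars[others]))
slope_cyl <- b[["cyl"]]

print(c(intercept = intercept_at_means, slope = slope_cyl))
```

Because an OLS fit with an intercept passes through the means of all variables, this line goes through (mean cyl, mean mpg).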
