Change `gtsummary::tbl_regression` columns - r

I would like to reformat the column in gtsummary::tbl_regression similar to tbl_summary using the statistic argument. However, I cannot find the corresponding argument to make this adjustment. Thank you for your help pointing me to the argument!
For example, instead of:
library(dplyr)
library(gtsummary)
glm(response ~ age, trial, family = binomial(link = "logit")) %>%
tbl_regression(exponentiate = TRUE)
Created on 2021-07-13 by the reprex package (v0.3.0)
I would like:
Characteristic
OR (95% CI; p value)
Age
1.02 (1.00,1.04; 0.10)

You can merge columns in gtsummary, but I will say that this feature is not documented for users because it is still being thought out and it is possible that is implementation may change slightly in a future release. Example below!
library(gtsummary)
glm(response ~ age, trial, family = binomial(link = "logit")) %>%
tbl_regression(exponentiate = TRUE) %>%
modify_table_styling(
column = estimate,
rows = !is.na(estimate),
cols_merge_pattern = "{estimate} ({ci}; {p.value})",
label = "**OR (95% CI; p value)**"
) %>%
modify_footnote(estimate ~ "OR = Odds Ratio, CI = Confidence Interval",
abbreviation = TRUE)

Related

Use purrr to run multiple regression models with changing outcomes and then extract residuals

I would like to run some regression models with different y (so the independent variables stay the same for all the models), and then extract the residuals from each of these models and add them to the original data set.
I will use diamonds to show what I came up with:
# In my example, the models are: x or y or z = carat + cut + color + clarity + price
dependent = c("x", "y", "z")
model = function(y, dataset) {
a = map(
setNames(y, y), ~ glm(reformulate(termlabels = c("carat", "cut", "color", "clarity", "price"),
response = y),
family = gaussian(link = "identity"),
data = dataset
))
resids = map_dfr(a, residuals)
df = bind_cols(dataset, resids)
print(df)
}
model(y = dependent, dataset = diamonds)
But this code doesn't work. I would also like to have sensible names for the residuals that are added as new columns, otherwise it is difficult to differentiate the residuals when the number of models is big.
generate example data
library(tidyverse)
set.seed(101)
dd <- diamonds
dependent <- c("x", "y", "z")
for (d in dependent) {
dd[[d]] <- rnorm(nrow(diamonds))
}
process
library(broom)
res <- (dependent
## set names so .id = argument works downstream
%>% setNames(dependent)
## construct list of formulas
%>% map(reformulate, termlabels = c("carat", "cut", "color", "clarity", "price"))
## fit glmes
%>% map(glm, family = gaussian(link = "identity"), dd,
na.action = na.exclude)
## compute resids (add observation number) and collapse to tibble
%>% map_dfr(~tibble(.obs=seq(nrow(dd)), .resid = residuals(.)), .id = "response")
## widen data → residuals from each response variable as a column
%>% pivot_wider(names_from = "response", values_from = ".resid",
names_prefix ="res_")
%>% select(-.obs)
)
## combine with original data
res2 <- bind_cols(dd, res)
Notes:
it's not obvious to me why you're using glm(., family = gaussian(link = "identity)) here, unless it's as a placeholder to something more complicated you're doing with your real data. (If this is your actual model then using lm() will be simpler and faster.)
adding na.action = na.exclude to the [g]lm() call will include NA values in the predictions, residuals, etc., which will help your residuals line up better with the original data.

Likelihood ratio test pvalues in gtsummary

How do l incorporate likelihood ratio test p values in gtsummary output table?
library(gtsummary)
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE)
You can use add_global_p(test = "LR") to add the LRT p-value. In the background, the function is using car::Anova(mod = x, type = "III", test = "LR") to calculate the p-value
library(gtsummary)
#> #BlackLivesMatter
tbl <-
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial)
) %>%
add_global_p(test = "LR")
#> add_global_p: Global p-values for variable(s) `add_global_p(include = "grade")`
#> were calculated with
#> `car::Anova(mod = x$model_obj, type = "III", test = "LR")`
Created on 2021-05-12 by the reprex package (v2.0.0)

How can to combine odds ratios and the confidence intervals

I am trying to combine the ORs and confidence interval in one column so as to achieve the following results 1.10(0.52,2.29)
library(gtsummary)
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE
)
You can use the modify_table_styling() function to merge two or more columns. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
tbl <-
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE
) %>%
modify_table_styling(
columns = estimate,
rows = !is.na(ci),
cols_merge_pattern = "{estimate} ({ci})"
) %>%
modify_header(estimate ~ "**OR (95% CI)**") %>%
modify_footnote(estimate ~ "OR = Odds Ratio, CI = Confidence Interval",
abbreviation = TRUE)
Created on 2021-05-03 by the reprex package (v2.0.0)

Get confidence intervals and exp with broom from nested coxph-models

Data and libraries:
test <- tibble(start=c(1,2,5,2,1,7,3,4,8,8),
age=c(2,3,6,7,8,9,9,9,14,17),
event=c(1,1,0,1,1,1,1,0,0,0),
x=c(1,0,0,1,0,1,1,1,0,0),
sex=c(0,0,0,0,0,1,1,1,1,1))
library(tidyverse)
library(broom)
library(survival)
I want to nest several grouped tibbles and create coxph objects and extract and nest data with tidy and glance (from broom package). In the tidy output I also want the data to be exponentiated and with confidence intervals. This works:
coxph_obj <- (coxph(Surv(start, event) ~ x + sex + age, test))
tidy(coxph_obj, exponentiate = TRUE, conf.int = TRUE)
However, I dont know how to get exponentiate = TRUE, conf.int = TRUE to work in tidied = map(fit, tidy) below:
test %>%
nest(data = -sex) %>%
mutate(
fit = map(data, ~ coxph(Surv(start, event) ~ x + sex + age, data = test)),
tidied = map(fit, tidy),
glanced = map(fit, glance)
)
unnest(c(tidied, glanced), names_repair = "universal" )
Answer provided by Ben in a comment:
"What does using tidied = map(fit, tidy, exponentiate = TRUE, conf.int = TRUE) give you in your mutate"

sort `caret` models in `bwplot()`

I am plotting box-plots of the accuracy scores of resamples of yearly models trained with caret.
The models are named by the years they refer to: 2000, 2001, 2002, ..., 2010.
I want the models to appear in the box-plots with ascending order based on the year i.e. name of the model.
The summary of resamples based on the below code
fit.year.res <- resamples(fit.year)
summary(fit.year.res)
looks like this:
But then, the different yearly models in the box plot are not sorted:
scales <- list(x=list(relation="free"), y=list(relation="free"))
bwplot(fit.year.res, scales=scales)
I have tried converting the models element of resamples fit.year.res$models to factor from character, but it didn't make nay difference.
I am not aware of an easy solution using bwplot method from caret package. Perhaps there is one but my lattice skills are lacking. I recommend plotting the boxplots manually using ggplot2. This way you will have much better control over the final plot.
Since you did not post an example with data I will use one of the examples from ?caret:::bwplot.resamples
library(caret)
library(party)
library(RWeka)
load(url("http://topepo.github.io/caret/exampleModels.RData"))
resamps <- resamples(list(CART = rpartFit,
CondInfTree = ctreeFit,
MARS = earthFit))
bwplot(resamps,
metric = "RMSE")
produces:
To make the plot manually using ggplot you will need some data manipulation:
library(tidyverse)
resamps$values %>% #extract the values
select(1, ends_with("RMSE")) %>% #select the first column and all columns with a name ending with "RMSE"
gather(model, RMSE, -1) %>% #convert to long table
mutate(model = sub("~RMSE", "", model)) %>% #leave just the model names
ggplot()+ #call ggplot
geom_boxplot(aes(x = RMSE, y = model)) -> p1 #and plot the box plot
p1
To set a specific order on the y axis:
p1 +
scale_y_discrete(limits = c("MARS", "CART", "CondInfTree"))
If you prefer lattice
library(lattice)
resamps$values %>%
select(1, ends_with("RMSE")) %>%
gather(model, RMSE, -1) %>%
mutate(model = sub("~RMSE", "", model)) %>%
{bwplot(model ~ RMSE, data = .)}
to change the order change the levels of model (this approach also works with ggplot2):
resamps$values %>%
select(1, ends_with("RMSE")) %>%
gather(model, RMSE, -1) %>%
mutate(model = sub("~RMSE", "", model),
model = factor(model, levels = c("MARS", "CART", "CondInfTree"))) %>%
{bwplot(model ~ RMSE, data = .)}
The function bwplot.resamples is used to generate this plot and if you look at the underlying code, the variables are factorized based on their average performance under the metric of interest.
Below I have the relevant code that does the factorization:
bwplot.resamples <- function (x, data = NULL, models = x$models, metric = x$metric, ...)
{
....
avPerf <- ddply(subset(plotData, Metric == metric[1]),
.(Model),
function(x) c(Median = median(x$value, na.rm = TRUE)))
avPerf <- avPerf[order(avPerf$Median),]
......
}
I guess what you need to do is to make the plot manually:
data(BloodBrain)
gbmFit <- train(bbbDescr[,-3], logBBB,"gbm",tuneLength=6,
trControl = trainControl(method = "cv"),verbose=FALSE)
glmnetFit <- train(bbbDescr[,-3], logBBB,"glmnet",tuneLength=6,
trControl = trainControl(method = "cv"))
rfFit <- train(bbbDescr[,-3], logBBB,"rf",tuneLength=6,
trControl = trainControl(method = "cv"))
knnFit <- train(bbbDescr[,-3], logBBB,"knn",tuneLength=6,
trControl = trainControl(method = "cv"))
resamps <- resamples(list(gbm = gbmFit,glmnet=glmnetFit,knn=knnFit,rf=rfFit))
If you plot, you can see they are sorted according to the medians (the solid dot):
bwplot(resamps,metric="MAE")
You can access the values under $values and make a function to plot it, something like below:
plotMet = function(obj,metric,var_order){
mat = obj$values
mat = mat[,grep(metric,colnames(mat))]
colnames(mat) = gsub("[~][^ ]*","",colnames(mat))
boxplot(mat[,var_order],horizontal=TRUE,las=2,xlab=metric)
}
plotMet(resamps,"MAE",c("rf","knn","gbm","glmnet"))
Also not a very good idea to name your models with numbers.. try something like model_2000, model_2001 etc

Resources