Likelihood ratio test p-values in gtsummary - r

How do I incorporate likelihood ratio test p-values in a gtsummary output table?
library(gtsummary)
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE)

You can use add_global_p(test = "LR") to add the LRT p-value. In the background, the function uses car::Anova(mod = x, type = "III", test = "LR") to calculate the p-value, as the message printed below confirms.
library(gtsummary)
#> #BlackLivesMatter
tbl <-
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial)
) %>%
add_global_p(test = "LR")
#> add_global_p: Global p-values for variable(s) `add_global_p(include = "grade")`
#> were calculated with
#> `car::Anova(mod = x$model_obj, type = "III", test = "LR")`
Created on 2021-05-12 by the reprex package (v2.0.0)
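For a model with a single predictor, the Type III likelihood ratio test reported by car::Anova() amounts to comparing the fitted model against the intercept-only model. The sketch below reproduces that comparison with base R's anova(), assuming the gtsummary package is installed for its trial data. The object names (full, null, lrt, p_manual) are illustrative and not part of the original answer.

```r
# Only needed for the `trial` example data
library(gtsummary)

# Model with the predictor of interest and the intercept-only model;
# both drop the rows where `response` is missing, so they use the same data
full <- glm(response ~ grade, data = trial, family = binomial)
null <- update(full, . ~ 1)

# Likelihood ratio test: change in deviance compared to a chi-squared distribution
lrt <- anova(null, full, test = "LRT")
lrt

# The same p-value computed by hand from the two deviances
lr_stat  <- deviance(null) - deviance(full)
df_diff  <- df.residual(null) - df.residual(full)
p_manual <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
p_manual
```

With more than one predictor in the model, the reduced model would drop only the term being tested rather than everything, so this shortcut applies to the univariate case shown here.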

Related

How to modify variable labels in gtsummary table

As recommended in the tutorial for gtsummary's tbl_regression function, I am using the labelled package to assign attribute labels to my regression variables. However, when my regression formula includes a quadratic term, the resulting table includes the same variable label twice:
library(gtsummary)
library(labelled)
library(tidyverse)
df <- as_tibble(mtcars)
var_label(df) <- list( disp = "Displacement", vs = "Engine type")
c("disp", "disp + I(disp^2)") %>%
map(
~ paste("vs", .x, sep = " ~ ") %>%
as.formula() %>%
glm(data = df,
family = binomial(link = "logit")) %>%
tbl_regression(exponentiate = TRUE)) %>%
tbl_merge()
Is there a way to modify the label for the quadratic term in this case?
If you assign the labels inside tbl_regression() with the label argument, you can label the linear and quadratic terms separately and get the table you want.
library(gtsummary)
c("disp", "disp + I(disp^2)") %>%
purrr::map(
~ paste("vs", .x, sep = " ~ ") %>%
as.formula() %>%
glm(data = mtcars, family = binomial(link = "logit")) %>%
tbl_regression(
exponentiate = TRUE,
label = list(
disp = "Displacement",
`I(disp^2)` = "Displacement^2"
)
)
) %>%
tbl_merge() %>%
as_kable()
#> ✖ `I(disp^2)` terms have not been found in `x`.
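| Characteristic | OR   | 95% CI     | p-value | OR   | 95% CI     | p-value |
|:---------------|:----:|:----------:|:-------:|:----:|:----------:|:-------:|
| Displacement   | 0.98 | 0.96, 0.99 | 0.002   | 0.99 | 0.92, 1.07 | 0.8     |
| Displacement^2 |      |            |         | 1.00 | 1.00, 1.00 | 0.8     |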
Created on 2022-09-19 with reprex v2.0.2
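The backticks around I(disp^2) in the label list are needed because that is the literal name of the quadratic term in the model, and it is not a syntactic R name. A quick way to check which term names a fitted model exposes is shown below. This is a small base R illustration, and fit is an illustrative object name.

```r
# Same quadratic model as in the answer, fitted on mtcars
fit <- glm(vs ~ disp + I(disp^2), data = mtcars, family = binomial(link = "logit"))

# Coefficient names: the intercept, the linear term, and the quadratic term
names(coef(fit))

# Term labels from the formula; the quadratic term is listed as I(disp^2)
attr(terms(fit), "term.labels")
```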

Change `gtsummary::tbl_regression` columns

I would like to reformat the column in gtsummary::tbl_regression similar to tbl_summary using the statistic argument. However, I cannot find the corresponding argument to make this adjustment. Thank you for your help pointing me to the argument!
For example, instead of:
library(dplyr)
library(gtsummary)
glm(response ~ age, trial, family = binomial(link = "logit")) %>%
tbl_regression(exponentiate = TRUE)
Created on 2021-07-13 by the reprex package (v0.3.0)
I would like:
| Characteristic | OR (95% CI; p value)   |
|:---------------|:----------------------:|
| Age            | 1.02 (1.00,1.04; 0.10) |
You can merge columns in gtsummary, but note that this feature is not documented for users because it is still being thought out, and its implementation may change slightly in a future release. Example below!
library(gtsummary)
glm(response ~ age, trial, family = binomial(link = "logit")) %>%
tbl_regression(exponentiate = TRUE) %>%
modify_table_styling(
columns = estimate,
rows = !is.na(estimate),
cols_merge_pattern = "{estimate} ({ci}; {p.value})",
label = "**OR (95% CI; p value)**"
) %>%
modify_footnote(estimate ~ "OR = Odds Ratio, CI = Confidence Interval",
abbreviation = TRUE)
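To make the merge pattern concrete, the sketch below builds the same "estimate (CI; p-value)" string by hand for the age coefficient. It is only an illustration: the object names are made up, and it uses Wald confidence limits for simplicity, which is an assumption and may give slightly different limits from those shown in the gtsummary table.

```r
# `trial` ships with gtsummary; the model is the one from the question
fit <- glm(response ~ age, data = gtsummary::trial, family = binomial(link = "logit"))

# Odds ratio and p-value for `age`
odds_ratio <- unname(exp(coef(fit)["age"]))
p_value    <- summary(fit)$coefficients["age", "Pr(>|z|)"]

# Wald confidence limits on the odds ratio scale
# (assumption: chosen for simplicity, not necessarily what the table reports)
ci <- unname(exp(confint.default(fit)["age", ]))

# Same layout as the pattern "{estimate} ({ci}; {p.value})"
merged <- sprintf("%.2f (%.2f, %.2f; %.2f)", odds_ratio, ci[1], ci[2], p_value)
merged
```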

How to combine odds ratios and confidence intervals

I am trying to combine the ORs and confidence intervals in one column, so that each cell looks like 1.10 (0.52, 2.29).
library(gtsummary)
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE
)
You can use the modify_table_styling() function to merge two or more columns. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
tbl <-
trial %>%
select(response, grade) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE
) %>%
modify_table_styling(
columns = estimate,
rows = !is.na(ci),
cols_merge_pattern = "{estimate} ({ci})"
) %>%
modify_header(estimate ~ "**OR (95% CI)**") %>%
modify_footnote(estimate ~ "OR = Odds Ratio, CI = Confidence Interval",
abbreviation = TRUE)
Created on 2021-05-03 by the reprex package (v2.0.0)

R | How to get accuracy from cv.glmnet

I've been using the cv.glmnet function in R to fit a lasso logistic regression model.
Here's my code. I'm using the iris dataset.
df = iris %>%
mutate(Species = as.character(Species)) %>%
filter(!(Species =="setosa")) %>%
mutate(Species = as.factor(Species))
X = data.matrix(df %>% select(-Species))
y = df$Species
Model = cv.glmnet(X, y, alpha = 1, family = "binomial")
How do I get the model accuracy from the cv.glmnet object (Model)?
If I were using caret on a normal logistic regression model, the accuracy would already be in the output.
train_control = trainControl(method = "cv", number = 10)
M2 = train(Species ~., data = df, trControl = train_control,
method = "glm", family = "binomial")
M2$results
but a cv.glmnet object doesn't seem to contain this information.
You want to add type.measure = 'class', as in Model2 below; otherwise the default for family = 'binomial' is 'deviance'.
library(dplyr)   # %>%, mutate, filter, select
library(glmnet)  # cv.glmnet
df = iris %>%
mutate(Species = as.character(Species)) %>%
filter(!(Species =="setosa")) %>%
mutate(Species = as.factor(Species))
X = data.matrix(df %>% select(-Species))
y = df$Species
Model = cv.glmnet(X, y, alpha = 1, family = "binomial")
Model2 = cv.glmnet(X, y, alpha = 1, family = "binomial", type.measure = 'class')
Then cvm gives the cross-validated misclassification rate, so the accuracy is 1 - cvm.
Model2$lambda ## lambdas used in CV
Model2$cvm ## mean cross-validated error for each of those lambdas
If you want results for the best lambda, you can use lambda.min
Model2$lambda.min ## lambda with the lowest cvm
Model2$cvm[Model2$lambda==Model2$lambda.min] ## cvm for lambda.min
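Putting the pieces together, the sketch below turns the misclassification rate into an accuracy. It relies on the fact that lambda.min is the lambda with the lowest cvm, so the error at lambda.min is simply min(cvm). The object names (cv_fit, cv_accuracy, train_accuracy) and the seed are illustrative choices. The training accuracy at the end is an extra in-sample check, not a cross-validated estimate.

```r
library(glmnet)

# Two-class version of iris, as in the question
df <- droplevels(subset(iris, Species != "setosa"))
X  <- data.matrix(df[, names(df) != "Species"])
y  <- df$Species

# Fold assignment is random, so fix a seed for reproducibility
set.seed(1)
cv_fit <- cv.glmnet(X, y, alpha = 1, family = "binomial", type.measure = "class")

# Cross-validated accuracy at lambda.min (1 - lowest misclassification rate)
cv_accuracy <- 1 - min(cv_fit$cvm)
cv_accuracy

# In-sample accuracy of the model at lambda.min, for comparison only
pred <- predict(cv_fit, newx = X, s = "lambda.min", type = "class")
train_accuracy <- mean(as.vector(pred) == as.character(y))
train_accuracy
```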

Prediction of single bagged tree models dependent on pre-processing using caret

I'm using the caret package to predict a time series with method treebag. caret estimates bagging regression trees with 25 bootstrap replications.
What I'm struggling to understand is how the final prediction of that 'treebag model' relates to the predictions made by each of the 25 trees, depending on whether I use caret::preProcess, or not.
I am aware of this question and the linked resources therein. (But could not draw the right conclusions from it.)
Here is an example using the economics data. Let's say I want to predict unemploy_rate, which has to be created first.
# packages
library(caret)
library(tidyverse)
# data
data("economics")
economics$unemploy_rate <- economics$unemploy / economics$pop * 100
x <- economics[, -c(1, 7)]
y <- economics[["unemploy_rate"]]
I wrote a function that extracts the 25 individual trees from the train object, makes a prediction for each tree, averages these 25 predictions, and compares this average with the prediction from the train object. It returns a plot.
predict_from_treebag <- function(model) {
# extract 25 trees from train object
bagged_trees <- map(.x = model$finalModel$mtrees, .f = pluck, "btree")
# make a prediction for each tree
pred_trees <- map(bagged_trees, .f = predict, newdata = x)
names(pred_trees) <- paste0("tree_", seq_along(pred_trees))
# aggregate predictions
pred_trees <- as.data.frame(pred_trees) %>%
add_column(date = economics$date, .before = 1) %>%
gather(tree, value, matches("^tree")) %>%
group_by(date) %>%
mutate(mean_pred_from_trees = mean(value)) %>%
ungroup()
# add prediction from train object
pred_trees$bagging_model_prediction = predict(model, x)
pred_trees <- pred_trees %>%
gather(model, pred_value, 4:5)
# plot
p <- ggplot(data = pred_trees, aes(date)) +
geom_line(aes(y = value, group = tree), alpha = .2) +
geom_line(aes(y = pred_value, col = model)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom"
)
p
}
Now I estimate two models, the first will be unscaled, the second will be centered and scaled.
preproc_opts <- list(unscaled = NULL,
scaled = c("center", "scale"))
# estimate the models
models <- map(preproc_opts, function(preproc)
train(
x = x,
y = y,
trControl = trainControl(method = "none"), # since there are no tuning parameters for this model
metric = "RMSE",
method = "treebag",
preProcess = preproc
))
# apply predict_from_treebag to each model
imap(.x = models,
.f = ~{predict_from_treebag(.x) + labs(title = .y)})
The results are shown below. The unscaled model prediction is the average of the 25 trees but why is each prediction from the 25 trees a constant when I use preProcess?
Thank you for any advice where I might be wrong.
The problem is in this part of the code:
pred_trees <- map(bagged_trees, .f = predict, newdata = x)
inside the function predict_from_treebag. Here predict dispatches to predict.rpart, which you can confirm by checking the class of an extracted tree:
class(bagged_trees[[1]])
predict.rpart does not know that caret pre-processed the data: the trees were fit on centered and scaled predictors, but here they receive the raw x.
Here is a quick fix:
predict_from_treebag <- function(model) {
# extract 25 trees from train object
bagged_trees <- map(.x = model$finalModel$mtrees, .f = pluck, "btree")
x <- economics[, -c(1, 7)]
# make a prediction for each tree
newdata = if(is.null(model$preProcess)) x else predict(model$preProcess, x)
pred_trees <- map(bagged_trees, .f = predict, newdata = newdata)
names(pred_trees) <- paste0("tree_", seq_along(pred_trees))
# aggregate predictions
pred_trees <- as.data.frame(pred_trees) %>%
add_column(date = economics$date, .before = 1) %>%
gather(tree, value, matches("^tree")) %>%
group_by(date) %>%
mutate(mean_pred_from_trees = mean(value)) %>%
ungroup()
# add prediction from train object
pred_trees$bagging_model_prediction = predict(model, x)
pred_trees <- pred_trees %>%
gather(model, pred_value, 4:5)
# plot
p <- ggplot(data = pred_trees, aes(date)) +
geom_line(aes(y = value, group = tree), alpha = .2) +
geom_line(aes(y = pred_value, col = model)) +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom"
)
p
}
Now after running:
preproc_opts <- list(unscaled = NULL,
scaled = c("center", "scale"))
models <- map(preproc_opts, function(preproc)
train(
x = x,
y = y,
trControl = trainControl(method = "none"), # since there are no tuning parameters for this model
metric = "RMSE",
method = "treebag",
preProcess = preproc
))
map2(.x = models,
.y = names(models),
.f = ~{predict_from_treebag(.x) + labs(title = .y)})
the result is as expected.
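The scale mismatch can also be reproduced on a small scale with a single rpart tree. The sketch below is an illustration on mtcars rather than the economics data, and all object names are made up for the example. It fits a tree on centered and scaled predictors and then predicts once with the raw predictors and once with the transformed ones.

```r
library(caret)
library(rpart)

x <- mtcars[, c("disp", "hp", "wt")]
y <- mtcars$mpg

# Center and scale the predictors, as train(preProcess = ...) would
pp <- preProcess(x, method = c("center", "scale"))
x_scaled <- predict(pp, x)

# Fit one regression tree on the scaled predictors
tree <- rpart(y ~ ., data = cbind(x_scaled, y = y))

# Raw predictors are on a different scale than the split points,
# so observations tend to end up in the same leaf
pred_raw <- predict(tree, newdata = x)

# Applying the stored pre-processing first gives the intended predictions
pred_scaled <- predict(tree, newdata = x_scaled)

# Number of distinct predicted values in each case
length(unique(pred_raw))
length(unique(pred_scaled))
```

The quick fix above does the same thing for each of the 25 bagged trees by calling predict(model$preProcess, x) before predict.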
