How can I make a forest plot of mixed-model coefficients and their corresponding confidence intervals?
I tried this code:
model <- lme(fixed = score ~ Age + Sex + yearsofeducation + walkspeed,
             random = ~ 1 | ID,
             data = DB,
             na.action = na.omit, method = "ML")
plot_summs(model)
However, I want the ORs in the forest plot to be ordered in a descending fashion.
Thanks for the help.
I would call this a "coefficient plot", not a "forest plot". (A forest plot is used in meta-analyses, when you are comparing the magnitude of estimates of the same effect from many different studies.)
example setup
This is a slightly silly example, but it should be close enough to yours. (It's not clear to me why you're mentioning ORs (= odds ratios?); those typically come from a logistic regression ...)
library(nlme)
mtcars <- transform(mtcars, cylgear = interaction(cyl, gear))
m1 <- lme(mpg ~ disp + hp + drat + wt + qsec,
          random = ~ 1 | cylgear,
          data = mtcars)
coefficient plots: dotwhisker
You could get approximately what you want directly from the dotwhisker package, but it won't sort effects (or not easily, as far as I know):
library(dotwhisker)
library(broom.mixed) ## required to 'tidy' (process) lme fits
dwplot(m1, effects = "fixed")
coefficient plots: tidyverse
I usually do the processing myself, as I prefer increased flexibility.
library(tidyverse)
tt <- (m1
## extract estimates and CIs
|> tidy(effects = "fixed", conf.int = TRUE)
## usually *don't* want to compare intercept (dwplot does this automatically)
|> filter(term != "(Intercept)")
## scale parameters by 2SD - usually necessary for comparison
|> dotwhisker::by_2sd(data = mtcars)
## take only the bits we need, rename some (cosmetic)
|> select(term, estimate, lwr = conf.low, upr = conf.high)
## order terms by estimate value
|> mutate(across(term, ~reorder(factor(.), estimate)))
)
gg0 <- (ggplot(tt,
aes(estimate, term))
+ geom_pointrange(aes(xmin = lwr, xmax = upr))
+ geom_vline(xintercept = 0, lty = 2)
)
print(gg0)
The only remaining (and possibly tricky) question here is what to do if you have positive and negative coefficients of similar magnitude. If you want to sort by absolute value, then:
|> mutate(across(term, ~reorder(factor(.), estimate,
                                FUN = function(x) mean(abs(x)))))
although this gets a bit ugly.
If you like the tidyverse you can substitute forcats::fct_reorder for reorder.
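For example, here is a sketch of the same absolute-value sort written with fct_reorder (applied to the tidied table tt from above; it just redoes the ordering step):
## forcats equivalent of the absolute-value reorder above
tt <- tt |>
  mutate(term = forcats::fct_reorder(term, estimate,
                                     .fun = function(x) mean(abs(x))))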
I’m just adding one more option to Ben Bolker’s excellent answer: using the modelsummary package. (Disclaimer: I am the author.)
With that package, you can use the modelplot() function to create a forest plot, and the coef_map argument to rename and reorder coefficients. If you are estimating a logit model and want the odds ratios, you can use the exponentiate argument.
The order in which you insert coefficients in the coef_map vector sorts them in the plot, from bottom to top. For example:
library(lme4)
library(modelsummary)
mod <- lmer(mpg ~ wt + drat + (1 | gear), data = mtcars)
modelplot(
  mod,
  coef_map = c("(Intercept)" = "Constant",
               "drat" = "Rear Axle Ratio",
               "wt" = "Weight"))
I've recently written my first ggplot2 stat and geom methods. I want to write another that uses the data passed in ggplot2::ggplot(data=) to add a p-value as a caption to the figure. Is that possible?
For example, I would like to write something like this:
library(ggplot2)
mtcars |>
ggplot(aes(x = mpg, y = cyl)) +
add_pvalue()
Where add_pvalue() would calculate a p-value (e.g. an anova p-value for different mean MPG by the number of cylinders), and add the p-value as a caption, labs(caption = "p = 0.45").
Thank you!
Daniel, it's possible. You can use this example. Hope that helps you!
library(ggplot2)
library(glue)
p_value <- 0.05
mtcars |>
  ggplot(aes(x = mpg, y = cyl)) +
  geom_point() +
  labs(caption = glue("p = {p_value}"))
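Instead of hard-coding p_value, you could compute it first. For instance, a sketch of the ANOVA p-value for mean mpg by number of cylinders that the question mentions:
# one-way ANOVA of mpg by cylinder count; pull the F-test p-value
p_value <- anova(lm(mpg ~ factor(cyl), data = mtcars))$`Pr(>F)`[1]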
You could do something like the following, picking your preferred statistical model, "types" of p-values, and formatting of the p-value. If you wanted to build in lots of functionality to make it useful for a wide variety of models, you'd want to add conditional extractor functions for those models.
# Packages
library(ggplot2)
library(dplyr)
library(rlang)
# Define "add_pvalue()" function
# adds p-value from linear regression of y on x
# note that this assumes x and y are reals or integers
add_pvalue <- function(ggplot_obj) {
  # Get x and y variable names from the ggplot object
  x <- ggplot_obj$mapping$x %>%
    rlang::quo_get_expr() %>%
    deparse()
  y <- ggplot_obj$mapping$y %>%
    rlang::quo_get_expr() %>%
    deparse()
  # Build regression model formula, fit model, return model summary
  mod <- paste0(y, " ~ ", x) %>%
    as.formula() %>%
    lm(data = ggplot_obj$data) %>%
    summary()
  # Extract two-tailed t-test p-value from model object (reformat as desired)
  pval <- mod$coefficients[x, "Pr(>|t|)"]
  # Add p-value as plot caption
  ggplot_obj +
    labs(caption = paste0("p = ", pval))
}
# Example with p-value for linear model and 95% confidence intervals
mtcars %>%
ggplot(aes(x = mpg, y = cyl)) %>%
add_pvalue() +
geom_smooth(method = "lm", se = TRUE, level = 0.95)
#> `geom_smooth()` using formula 'y ~ x'
Note that blindly fitting a linear regression or ANOVA to your data is probably not the best decision since x or y may not be real or integer types. If they aren't, this won't really make sense since some models either throw runtime errors or employ one-hot encoding when passed other types of variables.
Similarly, the p-values you obtain may be utterly meaningless if, for example, each row in the data is not an independent observation, you run lots of models that invalidate the sampling assumptions of p-values, your hypothesis doesn't match the test, etc.
Finally, you could also try using the output of stat_smooth() that is produced when you call geom_smooth() to do this. The upside would be that you wouldn't need to fit the model twice to have both that geom and the p-value (using the standard error and coefficients plus normal distribution to get the p-value). That's a bit outside of the scope and would be more limiting since you're stuck with the models it employs and the same issues plague those as well. It's also pretty annoying to extract those: Method to extract stat_smooth line fit
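For what it's worth, the usual route to those fitted values is ggplot_build(); a minimal sketch (this assumes geom_smooth() is the first layer of the plot):
p <- ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_smooth(method = "lm")
# data[[1]] holds the computed smooth: x, y, ymin, ymax, se
smooth_df <- ggplot_build(p)$data[[1]]
head(smooth_df)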
I've fitted a logistic regression model that predicts a binary outcome vs from mpg (mtcars dataset). The plot is shown below. How can I determine the mpg value for any particular vs value? For example, I'm interested in finding out what the mpg value is when the probability of vs is 0.50. Appreciate any help anyone can provide!
model <- glm(vs ~ mpg, data = mtcars, family = binomial)
ggplot(mtcars, aes(mpg, vs)) +
geom_point() +
stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE)
The easiest way to calculate predicted values from your model is with the predict() function. Then you can use a numerical solver to find where the predicted curve crosses a particular value. For example:
findInt <- function(model, value) {
  function(x) {
    predict(model, data.frame(mpg = x), type = "response") - value
  }
}
uniroot(findInt(model, .5), range(mtcars$mpg))$root
# [1] 20.52229
Here findInt just takes the model and a particular target value and returns a function that uniroot can solve for 0 to find your solution.
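The same closure works for any target probability; e.g., a quick sketch:
# mpg at which predicted P(vs = 1) equals 0.25, 0.50, 0.75
sapply(c(0.25, 0.50, 0.75),
       function(p) uniroot(findInt(model, p), range(mtcars$mpg))$root)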
You can solve for mpg directly as follows:
mpg = (log(p/(1-p)) - coef(model)[1])/coef(model)[2]
Detailed explanation:
When you fit the regression model, the equation you are fitting is the following:
log(p/(1-p)) = a + b*mpg
Where p is the probability that vs=1, a is the intercept and b is the coefficient of mpg. From the model fit results (just type model or summary(model)) we see that a = -8.8331 and b = 0.4304. We want to find mpg when p=0.5. So, the equation we need to solve is:
log(0.5/(1-0.5)) = -8.8331 + 0.4304*mpg
log(1) = 0 = -8.8331 + 0.4304*mpg
Rearranging,
mpg = 8.8331/0.4304 = 20.523
In general, to solve for mpg for any value of p:
mpg = (log(p/(1-p)) + 8.8331)/0.4304
Or, to make it more easily reproducible:
mpg = (log(p/(1-p)) - coef(model)[1])/coef(model)[2]
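As a quick check, this closed-form solution reproduces the uniroot answer from above (sketch):
p <- 0.5
(log(p/(1 - p)) - coef(model)[1]) / coef(model)[2]
# ~20.52, matching the numerical solution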
My data are binary, with two continuous independent variables. For both predictors, as they get bigger, there are more positive responses. I have plotted the data in a heatplot showing the density of positive responses along the two variables. There are the most positive responses in the top right corner and negative responses in the bottom left, with a gradient change visible along both axes.
I would like to plot a line on the heatplot showing where a logistic regression model predicts that positive and negative responses are equally likely. (My model is of the form response~predictor1*predictor2+(1|participant).)
My question: How can I figure out the line based on this model at which the positive response rate is 0.5?
I tried using predict(), but that works the opposite way: I have to give it values of the predictors rather than the response rate I want. I also tried a function I had used before when I had only one predictor (function(x) (log(x/(1-x)) - fixef(fit)[1])/fixef(fit)[2]), but I can only get single values out of it, not a line, and only for one predictor at a time.
Using a simple example logistic regression model fitted to the mtcars dataset, and the algebra described here, I can produce a heatmap with a decision boundary using:
library(ggplot2)
library(tidyverse)
data("mtcars")
m1 = glm(am ~ hp + wt, data = mtcars, family = binomial)
# Generate combinations of hp and wt across their observed range. Only
# generating 50 values of each here, which is not a lot but since each
# combination is included, you get 50 x 50 rows
pred_df = expand.grid(
  hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
pred_df$pred_p = predict(m1, pred_df, type = "response")
# For a given value of hp (predictor1), find the value of
# wt (predictor2) that will give predicted p = 0.5
find_boundary = function(hp_val, coefs) {
  beta_0 = coefs['(Intercept)']
  beta_1 = coefs['hp']
  beta_2 = coefs['wt']
  # p = 0.5 exactly when the linear predictor is 0, so solve
  # 0 = beta_0 + beta_1 * hp + beta_2 * wt for wt
  (-beta_0 - beta_1 * hp_val) / beta_2
}
# Find the boundary value of wt for each of the 50 values of hp
# Using the algebra in the linked question you can instead find
# the slope and intercept of the boundary, so you could potentially
# skip this step
boundary_df = pred_df %>%
  select(hp) %>%
  distinct() %>%
  mutate(wt = find_boundary(hp, coef(m1)))
ggplot(pred_df, aes(x = hp, y = wt)) +
  geom_tile(aes(fill = pred_p)) +
  geom_line(data = boundary_df)
Producing a heatmap of pred_p over hp and wt, with the 0.5 decision boundary drawn as a line (figure not shown).
Note that this only takes into account the fixed effects from the model, so if you want to somehow take into account random effects this could be more complex.
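As an aside, since pred_df already contains predicted probabilities on a grid, you could let ggplot2 trace the boundary for you with geom_contour() instead of doing the algebra (a sketch; this draws the same p = 0.5 level set):
ggplot(pred_df, aes(x = hp, y = wt)) +
  geom_tile(aes(fill = pred_p)) +
  geom_contour(aes(z = pred_p), breaks = 0.5, colour = "black")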