I've fitted a logistic regression model that predicts a binary outcome, vs, from mpg (mtcars dataset). The plot is shown below. How can I determine the mpg value for any particular vs value? For example, I'm interested in finding out what the mpg value is when the probability of vs is 0.50. Appreciate any help anyone can provide!
library(ggplot2)

model <- glm(vs ~ mpg, data = mtcars, family = binomial)

ggplot(mtcars, aes(mpg, vs)) +
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE)
The easiest way to calculate predicted values from your model is with the predict() function. Then you can use a numerical solver to find particular intercepts. For example
findInt <- function(model, value) {
  function(x) {
    predict(model, data.frame(mpg = x), type = "response") - value
  }
}

uniroot(findInt(model, .5), range(mtcars$mpg))$root
# [1] 20.52229
Here findInt takes the model and a particular target value and returns a function that uniroot can solve for zero, which gives your solution.
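If you want this for several target probabilities at once, the same closure can be reused; a minimal sketch, reusing model and findInt from above (the search interval must bracket each target):

# Solve for mpg at a few target probabilities; each target must be attainable
# within range(mtcars$mpg) so that uniroot finds a sign change
targets <- c(0.25, 0.50, 0.75)
sapply(targets, function(p) uniroot(findInt(model, p), range(mtcars$mpg))$root)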
You can solve for mpg directly as follows:
mpg = (log(p/(1-p)) - coef(model)[1])/coef(model)[2]
Detailed explanation:
When you fit the regression model, the equation you are fitting is the following:
log(p/(1-p)) = a + b*mpg
Where p is the probability that vs=1, a is the intercept and b is the coefficient of mpg. From the model fit results (just type model or summary(model)) we see that a = -8.8331 and b = 0.4304. We want to find mpg when p=0.5. So, the equation we need to solve is:
log(0.5/(1-0.5)) = -8.8331 + 0.4304*mpg
log(1) = 0 = -8.8331 + 0.4304*mpg
Rearranging,
mpg = 8.8331/0.4304 = 20.523
In general, to solve for mpg for any value of p:
mpg = (log(p/(1-p)) + 8.8331)/0.4304
Or, to make it more easily reproducible:
mpg = (log(p/(1-p)) - coef(model)[1])/coef(model)[2]
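As a quick check, the closed-form inversion and the uniroot approach agree; a small sketch, assuming the model fitted above (qlogis(p) is just log(p/(1-p))):

# Invert the logit directly: mpg = (log(p/(1-p)) - intercept) / slope
p <- 0.5
(qlogis(p) - coef(model)[1]) / coef(model)[2]
# roughly 20.52, matching the uniroot result above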
The function centiles.pred is a great option for extracting z-scores from a gamlss model, as in the following code:
library(gamlss)
FIT = gamlss(mpg ~ disp, data = mtcars, family = BCPE)
NEWDATA = data.frame(disp = 300, mpg = 17)
centiles.pred(FIT, xvalues = NEWDATA$disp, xname = "disp", yval = NEWDATA$mpg, type = "z-scores")
However, the help page of centiles.pred says "A restriction of the function is that it applies to models with only one explanatory variable". In many cases you have more than one explanatory variable, as in the following example:
FIT = gamlss(mpg ~ disp + qsec, data = mtcars, family = BCPE)
My question is:
Is there a workable way to calculate z-scores and centiles (also for type = "standard-centiles" and type = "centiles" in centiles.pred) from a gamlss model with more than one explanatory variable?
The function predictAll(FIT, newdata = ) gives the fitted parameters (mu, sigma, nu, tau) for new values of disp and qsec [see Stasinopoulos et al. (2017), page 143]. Then, with the fitted (mu, sigma, nu, tau), use the cdf of BCPE (i.e. pBCPE) to find the probability (say p) of being below your corresponding new value of mpg. The z-score is then given by qNO(p).
NOTE: example code

# Fitted parameters (mu, sigma, nu, tau) for the new covariate values
NEWDATA = data.frame(disp = 300, qsec = 50)
params <- predictAll(FIT, newdata = NEWDATA)

# For mpg = 17, the probability of being below that value under the fitted BCPE
p <- pBCPE(17, params$mu, params$sigma, params$nu, params$tau)

# The z-score is then
z <- qNO(p)

qNO() is the quantile function (inverse cdf) of the standard normal distribution.
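To cover the centile part of the question too: the centile is just 100 times that probability, and a fitted centile value can be read off with the BCPE quantile function; a minimal sketch continuing from the code above (object names are just illustrative):

# Centile of mpg = 17 at these covariate values
centile <- 100 * p

# Conversely, e.g. the 50th centile of mpg at these covariate values
q50 <- qBCPE(0.50, params$mu, params$sigma, params$nu, params$tau)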
How can I make a forest plot for mixed-model coefficients and their corresponding confidence intervals?
I tried this code:

library(nlme)
library(jtools)  # for plot_summs()

Model = lme(fixed = score ~ Age + Sex + yearsofeducation + walkspeed,
            random = ~ 1 | ID,
            data = DB,
            na.action = na.omit, method = "ML")

plot_summs(Model)
However, I want the ORs in the forest plot to be ordered in a descending fashion.
Thanks for the help.
I would call this a "coefficient plot", not a "forest plot". (A forest plot is used in meta-analyses, when you are comparing the magnitude of estimates of the same effect from many different studies.)
example setup
This is a slightly silly example, but should be close enough to yours (not clear to me why you're mentioning OR (= odds ratios?), these are typically from a logistic regression ... ?)
library(nlme)
mtcars <- transform(mtcars, cylgear = interaction(cyl, gear))
m1 <- lme(mpg ~ disp + hp + drat + wt + qsec,
random = ~1|cylgear,
data = mtcars)
coefficient plots: dotwhisker
You could get approximately what you want directly from the dotwhisker package, but it won't sort effects (or not easily, as far as I know):
library(dotwhisker)
library(broom.mixed) ## required to 'tidy' (process) lme fits
dwplot(m1, effects = "fixed")
coefficient plots: tidyverse
I usually do the processing myself, as I prefer increased flexibility.
library(tidyverse)
tt <- (m1
## extract estimates and CIs
|> tidy(effects = "fixed", conf.int = TRUE)
## usually *don't* want to compare intercept (dwplot does this automatically)
|> filter(term != "(Intercept)")
## scale parameters by 2SD - usually necessary for comparison
|> dotwhisker::by_2sd(data = mtcars)
## take only the bits we need, rename some (cosmetic)
|> select(term, estimate, lwr = conf.low, upr = conf.high)
## order terms by estimate value
|> mutate(across(term, ~reorder(factor(.), estimate)))
)
gg0 <- (ggplot(tt,
aes(estimate, term))
+ geom_pointrange(aes(xmin = lwr, xmax = upr))
+ geom_vline(xintercept = 0, lty = 2)
)
print(gg0)
The only remaining, possibly tricky, question here is what to do if you have positive and negative coefficients of similar magnitude. If you want to sort by absolute value, then:
|> mutate(across(term, ~reorder(factor(.), estimate,
                                FUN = function(x) mean(abs(x)))))
although this gets a bit ugly.
If you like the tidyverse you can substitute forcats::fct_reorder for reorder.
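For instance, a minimal sketch of the fct_reorder variant, assuming the tt data frame built above (only the reordering step changes):

library(forcats)

# Reorder terms by absolute estimate; with one row per term,
# fct_reorder's default summary just picks up that value
tt_sorted <- tt |>
  mutate(term = fct_reorder(term, abs(estimate)))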
I’m just adding one more option to Ben Bolker’s excellent answer: using the modelsummary package. (Disclaimer: I am the author.)
With that package, you can use the modelplot() function to create a forest plot, and the coef_map argument to rename and reorder coefficients. If you are estimating a logit model and want the odds ratios, you can use the exponentiate argument.
The order in which you insert coefficients in the coef_map vector sorts them in the plot, from bottom to top. For example:
library(lme4)
library(modelsummary)
mod <- lmer(mpg ~ wt + drat + (1 | gear), data = mtcars)
modelplot(
mod,
coef_map = c("(Intercept)" = "Constant",
"drat" = "Rear Axle Ratio",
"wt" = "Weight"))
I would like to plot the fitted line and the shaded 95% confidence interval bands (for example using polygon) from a glm model (family binomial), or using ggplot. For linear models (lm) I have previously been able to plot the confidence intervals from the predictions, as they included the fit and the lower and upper limits, but I do not know how to do it here. I have tried the function predict.glm with the optional argument se.fit set to TRUE, and then using the prediction +/- 1.96 * std.error to calculate the confidence intervals, but it did not work for me.
Thanks in advance for your help. You can find the data I used here (it contains 10 variables and 996 observations): https://drive.google.com/file/d/1Yu7Dk2eh0R1ztKiuNTtN_W5Yg4C2Ne-2/view?usp=sharing Code and figure below:
# Models
mod = glm(site ~ S + age + pH + soil + peat +
            spruce + I(spruce^2) + pine + birch +
            tsumma + I(tsumma^2),
          data = test.dat, family = binomial)
# Means of all covariates
means = apply(test.dat[,c("S", "pH","soil", "spruce", "pine","birch", "tsumma")],2,mean,na.rm=T)
# Calculate the constant given by all other covariates being at their means and assuming only pine on the plot
const = mod$coefficients[1]+
mod$coefficients["S"]*means["S"]+
mod$coefficients["pH"]*means["pH"]+
mod$coefficients["soil"]*means["soil"]+
mod$coefficients["spruce"]*means["spruce"]+
mod$coefficients["I(spruce^2)"]*means["spruce"]*means["spruce"]+
mod$coefficients["pine"]*means["pine"]+
mod$coefficients["birch"]*means["birch"]+
mod$coefficients["tsumma"]*means["tsumma"]+
mod$coefficients["I(tsumma^2)"]*means["tsumma"]*means["tsumma"]
# Plot
age = seq(from=min(test.dat$age,na.rm=T),to=150,length=100)
lin= const + mod$coefficients["age"]*age
Pr = exp(lin) / (exp(lin)+1)
par(mar = c(4, 4, 1.5, 0.3))
plot(age,Pr,type="l", ylim=c(0,.5),las=1, main="Probability of hotspot", ylab="Probability of occurrence",xlab="Forest age (years)")
You can use the sjPlot package, indicating the term to plot while holding the others constant:
library(sjPlot)
set.seed(888)
data = mtcars
data$vs = data$vs + rnorm(nrow(data))
mod = glm(am ~ disp + vs + carb + I(vs^2), data = data, family = "binomial")
plot_model(mod,type="pred",terms="disp")
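Since plot_model() returns a ggplot object, it can be tweaked further; a small sketch (the axis labels here are chosen just for illustration):

p <- plot_model(mod, type = "pred", terms = "disp")
p + ggplot2::labs(x = "Displacement", y = "Predicted probability of am")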
Or derive it like you did, except I think you might need to create an extra column for the squared term, so that you can hold the other terms at their means, and then use the predict() function (which dispatches to predict.glm for a glm):
data$vs2 = data$vs^2
mod = glm(am ~ disp + vs + carb+ vs2,data=data,family="binomial")
varMeans = colMeans(mod$model)[c("vs","carb","vs2")]
pred_disp = seq(min(data$disp),max(data$disp),length.out=100)
df = data.frame(
disp = pred_disp,
t(replicate(length(pred_disp),varMeans))
)
pred = predict(mod, df, se.fit = TRUE)  # on the link (logit) scale by default
plot(df$disp, plogis(pred$fit), type = "l")
lines(df$disp, plogis(pred$fit + 1.96*pred$se.fit), col = "blue", lty = 2)
lines(df$disp, plogis(pred$fit - 1.96*pred$se.fit), col = "blue", lty = 2)
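Since the question also asked for a ggplot version, here is a minimal sketch, assuming the same mod and df built above; the interval is computed on the link scale and then transformed with plogis:

library(ggplot2)

# Predictions and standard errors on the link (logit) scale
pred <- predict(mod, df, se.fit = TRUE)
df$fit <- plogis(pred$fit)
df$lwr <- plogis(pred$fit - 1.96 * pred$se.fit)
df$upr <- plogis(pred$fit + 1.96 * pred$se.fit)

ggplot(df, aes(disp, fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line() +
  labs(x = "disp", y = "Predicted probability of am")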