Confidence intervals for predicted probabilities from predict.lrm - r

I am trying to determine confidence intervals for predicted probabilities from a binomial logistic regression in R. The model is estimated using lrm (from the package rms) to allow for clustering standard errors on survey respondents (each respondent appears up to 3 times in the data):
I am able to estimate a predicted probability for the outcome using predict.lrm:
What I want to determine is a 95% confidence interval for this predicted probability. I have tried specifying, but this is not permissible in predict.lrm when type="fitted".
I have spent the last few hours scouring the Internet for how to do this with lrm to no avail (obviously). Can anyone point me toward a method for determining this confidence interval? Alternatively, if it is impossible or difficult with lrm models, is there another way to estimate a logit with clustered standard errors for which confidence intervals would be more easily obtainable?

The help file for predict.lrm has a clear example. Here is a slight modification of it:
L <- predict(fit, newdata=data.frame(...), se.fit=TRUE)
plogis(with(L, linear.predictors + 1.96*cbind(-se.fit, se.fit)))
For some problems you may want to use the gendata or Predict functions, e.g.
L <- predict(fit, gendata(fit, var1=1), se.fit=TRUE) # leave other vars at median/mode
Predict(fit, var1=1:2, var2=3) # leave other vars at median/mode; gives CLs


Accuracy of predicted model in R

I am trying to calculate the accuracy of a predicted model with respect to the real case. In this case, I am using linear regression to predict my desired value. Plots show good agreement between the predicted and real values, but not as good as my accuracy calculation suggests. I am using the following code to calculate the accuracy of my model in RStudio:
predicted <- import.list$y1
actual <- import.list$T_stp_cool
comparison <- data.frame(actual,predicted)
difference <- ((actual-predicted)/actual)
In this case, accuracy is 99% most of the time but comparing the plots of the real case and predicted one does not show 99% accuracy. What's the most accurate way to calculate the accuracy of my model?
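The accuracy formula itself is not shown above, but if the relative differences are averaged with their signs, positive and negative errors can cancel and make the fit look better than the plot does. A minimal sketch of some commonly used error measures follows. The helper name `regression_errors` and the numbers are made up for illustration; in the question, `actual` and `predicted` would come from `import.list`.

```r
# Hypothetical helper: common error measures for a regression model
regression_errors <- function(actual, predicted) {
  err <- actual - predicted
  c(
    MAE  = mean(abs(err)),                 # mean absolute error
    RMSE = sqrt(mean(err^2)),              # root mean squared error
    MAPE = 100 * mean(abs(err / actual)),  # mean absolute percentage error (needs actual != 0)
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)  # share of variance explained
  )
}

# Made-up numbers for illustration only
actual    <- c(10, 12, 14, 16)
predicted <- c(11, 11, 15, 15)

errors <- regression_errors(actual, predicted)
print(round(errors, 3))

# Signed relative errors largely cancel here, which overstates the accuracy
signed_mean <- mean((actual - predicted) / actual)
print(signed_mean)
```

Which measure is most appropriate depends on the application; RMSE and MAE are in the units of the response, while MAPE is unstable when the actual values are near zero.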

What post-hoc test should be used for a glmer model with a continuous and a categorical predictor variable?

I'm a bit of a newbie with stats and R, so need a bit of direction to find a suitable post-hoc test for my glmer model.
The model has a binary dependent variable (absent/present), and the predictors are interaction terms between a continuous variable (e.g. temp) and a categorical variable (species, n=3). Only the interaction terms, rather than the continuous variables in isolation, produce significant results when an anova is run on the model. Species by itself has a large effect because one species is much rarer than the others. I'm trying to tease apart how the presence of these species varies across pH and between species.
I've tried lsmeans with a Tukey adjustment, emmeans, and Firth's bias-reduced logistic regression (logistf). I ran the effects function on the interaction terms, so I had a rough expectation of what a post hoc test could show, but the results from logistf were not what I was expecting. emmeans and Tukey both gave the same results and ignored the continuous variable, I assume because it's not a factor.
When I run Firth's regression, it produces chi-squared values that are infinite or p-values that are astronomically small, even though what I saw through effects suggested no significant difference. With the interaction term, I can't tell whether there truly is an effect of the environmental variable or whether the significant effect arises from the difference between species. Based on what I have seen of the logistf function, I didn't think it would produce a chi-squared score. Is this an issue in my code, or is it because of my data?
If I wasn't clear enough about something please let me know and if anyone has any suggestions or advice, they would be massively appreciated. Thanks!
The model and test code I used are below:
###glmer model
Large<-glmer(Abs.Pres~ Species:Q.Depth+Species:Conductivity+Species:Temp+Species:pH+Species:DO.P+(1|QID),
Output: Analysis of Variance Table
                     npar  Sum Sq Mean Sq F value
Species:Q.Depth         3 234.904  78.301 78.3014
Species:Conductivity    3  32.991  10.997 10.9970
Species:Temp            3  39.001  13.000 13.0004
Species:pH              3  25.369   8.456  8.4562
Species:DO.P            3  34.930  11.643 11.6434
Lp<-logistf(Abs.Pres~Species:pH, data=Stacked_Pref, contrasts.arg=list(pH="contr.treatment", Species="contr.sum"))
> Lp
logistf(formula = Abs.Pres ~ Species:pH, data = Stacked_Pref,
contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood
                         coef   se(coef) lower 0.95 upper 0.95    Chisq            p
(Intercept)         1.9711411 0.57309880  0.8552342  3.1015114 12.09107 5.066380e-04
SpeciesGoby:pH     -0.3393185 0.07146049 -0.4804047 -0.2003108 23.31954 1.371993e-06
SpeciesMosquito:pH -0.3001385 0.07127771 -0.4408186 -0.1614419 18.24981 1.937453e-05
SpeciesRFBE:pH     -0.4771393 0.07232469 -0.6200179 -0.3365343 45.73750 1.352096e-11
Likelihood ratio test=267.0212 on 3 df, p=0, n=3945

Test of second differences for average marginal effects in logistic regression

I have a question similar to the one here: Testing the difference between marginal effects calculated across factors. I used the same code to generate average marginal effects for two groups. The difference is that I am running a logistic rather than linear regression model. My average marginal effects are on the probability scale, so emmeans will not provide the correct contrast. Does anyone have any suggestions for how to test whether there is a significant difference in the average marginal effects between group 1 and group 2?
Thank you so much,
It is a bit unclear what the issue really is, but I'll try. I'm supposing your logistic regression model was fitted using, say, glm:
mod <- glm(cbind(heads, tails) ~ treat, data = mydata, family = binomial())
If you then do
emm <- emmeans(mod, "treat")
emm ### marginal means
pairs(emm) ### differences
Your results will be presented on the logit scale.
If you want them on the probability scale, you can do
summary(emm, type = "response")
summary(pairs(emm), type = "response")
However, the latter will back-transform the differences of logits, thereby producing odds ratios.
If you actually want differences of probabilities rather than ratios of odds, use regrid(), which will construct a new grid of values after back-transforming (and hence it will forget the logit transformation):
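To make the distinction between the two scales concrete, here is a base-R sketch with made-up counts (the contents of `mydata` are hypothetical). It contrasts the back-transformed difference of logits, which is an odds ratio, with a difference of probabilities whose standard error comes from the delta method. As far as I understand it, this is roughly what a contrast on the regridded scale amounts to, but treat it as an illustration rather than a description of emmeans internals.

```r
# Made-up data: successes/failures for two treatments (hypothetical)
mydata <- data.frame(treat = factor(c("A", "B")),
                     heads = c(30, 45),
                     tails = c(70, 55))
mod <- glm(cbind(heads, tails) ~ treat, data = mydata, family = binomial())

# Linear predictors (logit scale) for each treatment and their covariance
nd  <- data.frame(treat = factor(c("A", "B"), levels = levels(mydata$treat)))
X   <- model.matrix(~ treat, nd)
eta <- drop(X %*% coef(mod))
V   <- X %*% vcov(mod) %*% t(X)

# Difference of logits, back-transformed: an odds ratio (B vs A)
odds_ratio <- exp(eta[2] - eta[1])

# Delta method: d plogis(eta) / d eta = p * (1 - p)
p   <- plogis(eta)
J   <- diag(p * (1 - p))
V_p <- J %*% V %*% J            # approximate covariance on the probability scale

# Difference of probabilities (B minus A), its SE, and a Wald z statistic
diff_p  <- p[2] - p[1]
se_diff <- sqrt(V_p[1, 1] + V_p[2, 2] - 2 * V_p[1, 2])
z_stat  <- diff_p / se_diff

print(c(odds_ratio = unname(odds_ratio), diff_p = unname(diff_p),
        se_diff = se_diff, z = unname(z_stat)))
```

A test of second differences would apply the same idea to a contrast of contrasts: build the contrast vector on the probability scale and use `V_p` for its variance.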
It seems possible that two or more factors are present and you want contrasts of contrasts on the probability scale. In that case, extend this idea by calling regrid() on the table of EMMs to put everything on the probability scale, then follow the analogous procedure used in the linked article.

Plotting standard errors for effects

I have a lme4 model I have run for a hierarchical logistic regression, and I'm plotting the effects using the effects package. I would like to create an effects graph with the standard error of the mean as the error bars. I can get the point estimates, 95% confidence intervals, and standard errors into a dataframe. The standard errors, however, seem at odds with the confidence limit parameters, see below for an example in a regular glm.
library(dplyr)   # mutate(), %>%
library(effects) # Effect()
mtcars <- mtcars %>%
  mutate(vs = factor(vs))
glm1 <- glm(am ~ vs, mtcars, family = "binomial")
(glm1_eff <- Effect("vs", glm1) %>%
  as.data.frame())
  vs       fit        se     lower     upper
1  0 0.3333333 0.4999999 0.1580074 0.5712210
2  1 0.5000000 0.5345225 0.2596776 0.7403224
My understanding is that the fit column displays the point estimate of the probability that am equals 1, and that lower and upper are the limits of the 95% confidence interval for that probability. Note that the standard error does not seem to correspond to the confidence interval (e.g., .33 + .50 > .57).
Here's what I am shooting for. As opposed to a 95% confidence interval, I would like to have an effects plot with +- the standard error of the mean.
Are the standard errors in log-odds instead of probability? Is there a simple way to convert them to probabilities and plot them so that I can make the graph?
John Fox shared this helpful response:
From ?Effect: "se: (for "eff" objects) a vector of standard errors for the effect, on the scale of the linear predictor." So the standard errors are on the log-odds scale. You could use the delta method to get standard errors on the probability scale but that would be very ill-advised, since the approach to asymptotic normality of estimated probabilities will be much slower than of log-odds. Effect() computes confidence limits on the scale of the linear predictor (log-odds for a logit model) and then inverse-transforms them to the scale of the response (probabilities).
All of the information you need to create a custom plot is in the "eff" object returned by Effect(); the contents of the object are documented in ?Effect.
I agree, by the way, that the method could be improved, and I'll do that when I have a chance. In particular, it invites misunderstanding to report the effects and confidence limits on the scale of the response but to show standard errors for the linear-predictor scale.
I'm answering the mystery first, then addressing the "show SE on the plot" question.
Explanation of the SE mystery: All math in a GLM needs to be done on the link scale because this is the additive scale (where stuff can be added up). So...
The values in the column "fit" are the predicted probabilities of success (or the "predictions on the response scale"). Their values are expit(b0) and expit(b0 + b1), where expit() is the inverse logit function. The SEs are on the link scale. An SE on the response scale doesn't make much sense because the response scale is non-linear (although it's kind of weird to have stats on the response and link scales in the same table). "lower" and "upper" are on the response scale, so these are the CIs of the predicted probabilities of success. They are computed as expit(b0 ± 1.96SE) and expit(b0 + b1 ± 1.96SE), each with the SE from its own row. To recover these values with what is given:
library(boot) # inv.logit and logit functions
expit.pred_0 <- 1/3 # fit 0
expit.pred_1 <- 1/2 # fit 1
se1 <- 1/2
se2 <- .5345225
inv.logit(logit(expit.pred_0) - qnorm(.975)*se1)
inv.logit(logit(expit.pred_0) + qnorm(.975)*se1)
inv.logit(logit(expit.pred_1) - qnorm(.975)*se2)
inv.logit(logit(expit.pred_1) + qnorm(.975)*se2)
> inv.logit(logit(expit.pred_0) - qnorm(.975)*se1)
[1] 0.1580074
> inv.logit(logit(expit.pred_0) + qnorm(.975)*se1)
[1] 0.5712211
> inv.logit(logit(expit.pred_1) - qnorm(.975)*se2)
[1] 0.2596776
> inv.logit(logit(expit.pred_1) + qnorm(.975)*se2)
[1] 0.7403224
Showing an SE computed from a glm on the response (non-additive) scale doesn't make sense because the SE is only additive on the link scale. In other words, multiplying the SE by some quantile on the response scale (the scale of the plot you envision, with probability on the y axis) is meaningless. The limits of a CI are computed on the link scale and then back-transformed, so they make sense for plotting.
I frequently see researchers plotting SE bars computed from a linear model, like you envision, even though the statistics presented are from a GLM. These SEs are meaningful in a sense, I guess, but they often imply absurd consequences (like probabilities that could be less than zero or greater than one), so don't do that either.
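If bars narrower than a 95% interval are still wanted, one option consistent with the reasoning above is to add and subtract one SE on the link scale and back-transform the endpoints. The sketch below refits the question's model with base R only (no effects or dplyr) and uses predict() with se.fit = TRUE; the column names in `bars` are my own. The resulting bars are asymmetric around the fitted probability, and whether they are a sensible thing to display is a judgment call.

```r
# Refit the question's model with base R only
d    <- transform(mtcars, vs = factor(vs))
glm1 <- glm(am ~ vs, data = d, family = binomial)

# Predictions and standard errors on the link (log-odds) scale
nd   <- data.frame(vs = factor(levels(d$vs), levels = levels(d$vs)))
pred <- predict(glm1, newdata = nd, type = "link", se.fit = TRUE)

z <- qnorm(0.975)
bars <- data.frame(
  vs       = nd$vs,
  fit      = plogis(pred$fit),
  lower_ci = plogis(pred$fit - z * pred$se.fit),  # 95% limits, back-transformed
  upper_ci = plogis(pred$fit + z * pred$se.fit),
  lower_se = plogis(pred$fit - pred$se.fit),      # fit minus 1 SE, back-transformed
  upper_se = plogis(pred$fit + pred$se.fit)       # fit plus 1 SE, back-transformed
)
print(bars)
```

The `lower_ci` and `upper_ci` columns should agree with the lower and upper values shown in the question, and `lower_se` and `upper_se` can be passed to a plotting function such as arrows() or geom_errorbar().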

Combining ROC estimates from multiple imputation data

I have used the following R packages: mice, mitools, and pROC.
Basic design: 3 predictor measures with missing data rates between 5% and 70% on n~1,000. 1 binary target outcome variable.
Analytic Goal: Determine the AUROC of each of the 3 predictors.
I used the mice package to impute data and now have m datasets of imputed data.
Using the following command, I am able to get AUROC curves for each of m datasets:
fit1<-with(imp2, (roc(target, symptom1, ci=TRUE)))
fit2<-with(imp2, (roc(target, symptom2, ci=TRUE)))
fit3<-with(imp2, (roc(target, symptom3, ci=TRUE)))
I can see the estimates for each of m datasets without any problems.
To combine the parameters, I attempted to use mitools
I get the following error message:
"Error in pool(fit): Object has no vcov() method".
When combining coefficient estimates from m datasets, my understanding is that this is a simple average of the coefficients. However, the error term is more complex.
My question: How do I pool the "m" ROC parameter estimates (AUROC and 95% C.I. or S.E.) to get an accurate estimate of the error term for significance testing/95% Confidence Intervals?
Thank you for any help in advance.
I think the following works to combine the estimates.
pROC produces a point estimate for the AUROC as well as a 95% Confidence Interval.
To combine the AUROC from the m imputed datasets, simply average the m AUROC estimates.
To create an appropriate standard error estimate and then a 95% C.I., I converted the 95% C.I.s into S.E.s. Using the standard formulas (Multiple Imputation FAQ), I computed the within, between, and total variance for the estimate. Once I had the standard error, I converted that back to a 95% C.I.
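The procedure just described can be sketched in base R as follows. The function name `pool_auc` and the example numbers are hypothetical. The sketch assumes each 95% C.I. is symmetric and normal-theory based, so that the S.E. can be recovered as (upper - lower) / (2 * 1.96), and it uses the usual Rubin's rules for the total variance and degrees of freedom.

```r
# Hypothetical helper: pool m AUROC estimates with Rubin's rules.
# Assumes symmetric, normal-theory CIs, so SE = (upper - lower) / (2 * z).
pool_auc <- function(auc, lower, upper, level = 0.95) {
  m  <- length(auc)
  z  <- qnorm(1 - (1 - level) / 2)
  se <- (upper - lower) / (2 * z)

  q_bar   <- mean(auc)                          # pooled point estimate
  within  <- mean(se^2)                         # within-imputation variance
  between <- var(auc)                           # between-imputation variance
  total   <- within + (1 + 1 / m) * between     # total variance

  # Rubin's degrees of freedom for the t reference distribution
  df   <- (m - 1) * (1 + within / ((1 + 1 / m) * between))^2
  half <- qt(1 - (1 - level) / 2, df) * sqrt(total)

  list(estimate = q_bar, se = sqrt(total), total_var = total, df = df,
       lower = q_bar - half, upper = q_bar + half)
}

# Made-up AUROCs and CIs from m = 3 imputed datasets (each SE = 0.02)
auc   <- c(0.70, 0.72, 0.74)
lower <- auc - qnorm(0.975) * 0.02
upper <- auc + qnorm(0.975) * 0.02

pooled <- pool_auc(auc, lower, upper)
print(pooled)
```

With real output, `auc`, `lower` and `upper` would be collected from the m fitted roc objects; if the intervals were produced by a method that does not give symmetric limits, the S.E. conversion above would only be approximate.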
If anyone has any better suggestions, I would very much appreciate it.
I would use bootstrapping with the boot package to assess the different sources of variance. For instance for the variance due to imputation, you could use something like this:
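library(boot)
# Variable names inside the function are placeholders; the originals were lost
bootstrap.imputation <- function(d, i, symptom) {
  resampled <- d[i, ]  # bootstrap resample of the rows
  imputed <- ...       # here the code you use to generate one imputed dataset, but apply it to resampled
  ...                  # rest of the function is missing in the source; it must return the statistic
}
boot.n <- 2000
# symptom must be passed by name: a 4th positional argument would be taken as boot()'s sim
boot(dataset, bootstrap.imputation, boot.n, symptom = "symptom1") # symptom1 is passed with ... to bootstrap.imputation
boot(dataset, bootstrap.imputation, boot.n, symptom = "symptom2")
boot(dataset, bootstrap.imputation, boot.n, symptom = "symptom3")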
You can then do the same to assess the variance of the AUC: impute your data and apply the bootstrap again (or use the built-in functions of pROC).
