How to estimate the odds ratio with CI for X in a logistic regression containing the square of X using R?

I am trying to calculate odds ratios in R for variables that enter a logistic regression with both linear and quadratic terms. Let's say there is X and X^2 in the model. I know how to get the odds ratio (for a unit change of X) when X takes a specific value, but I don't know how to calculate a confidence interval for this estimate. I found this reference showing how it's done in SAS: http://support.sas.com/kb/35/189.html , but I would like to do it in R. Any suggestions?
@BenBolker
Here is an example:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata <- transform(mydata, gpaSquared = gpa^2, greSquared = gre^2)
model <- glm(admit ~ gpa + gpaSquared + gre, family = binomial(logit), data = mydata)
In this example the odds ratio for gpa depends on the actual value of gpa (e.g. the effect of a unit change in gpa when gpa = 4). I can calculate the log odds at gpa = 5 and gpa = 4 and get the odds ratio from those, but I don't know how to get a CI for the OR. (Please ignore that in the example the squared term is not statistically significant...)

m <- glm(y ~ X1 + I(X1^2) + X2, data = mydata, family = binomial(link = "logit"))  # note I(): a bare X1^2 in a formula does not square the variable
summary(m)
confint(m)          # 95% CI for the coefficients using the profiled log-likelihood
confint.default(m)  # CIs using standard errors
exp(coef(m))        # exponentiated coefficients (odds ratios)
exp(confint(m))     # 95% CI for the exponentiated coefficients
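That gives CIs for the individual coefficients, but the question asks for the OR of a unit change in gpa at a specific value x, where the log-OR is the linear combination b1 + b2*((x+1)^2 - x^2) = b1 + b2*(2x + 1) of the gpa and gpaSquared coefficients. The SAS note's delta-method calculation can be reproduced from vcov(); here is a minimal sketch against the model fitted above (the function name or_ci is my own, and the numbers are not checked against the SAS note):
or_ci <- function(model, x, level = 0.95) {
  b <- coef(model)[c("gpa", "gpaSquared")]
  V <- vcov(model)[c("gpa", "gpaSquared"), c("gpa", "gpaSquared")]
  # log-OR for a unit increase at x: b1*((x+1) - x) + b2*((x+1)^2 - x^2)
  L <- c(1, 2 * x + 1)
  est <- sum(L * b)                    # point estimate of the log-OR
  se  <- sqrt(drop(t(L) %*% V %*% L))  # delta-method SE of the linear combination
  z   <- qnorm(1 - (1 - level) / 2)
  exp(c(OR = est, lower = est - z * se, upper = est + z * se))
}
or_ci(model, x = 3)  # OR and 95% CI for gpa going from 3 to 4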

Related

95% CI for the ICC in linear mixed effects model (multilevel model, hierarchical model)

I fitted a linear mixed effects model to predict the math score as the outcome, with x = participant factor (nominal or ordinal) as the fixed effect and schl as the random effect. Then I compared it with the simple linear regression model using compare_performance, and while the output gives the ICC, I was not sure how to calculate the 95% CI for it. (For the coefficients I used confint and it did the job.)
library(lme4)         # lmer()
library(performance)  # compare_performance()
lm1  <- lm(math ~ gender, data = df)
lme1 <- lmer(math ~ gender + (1 | schl), data = df)
compare_performance(lm1, lme1)
The ICC was 0.15.
From this gist by Peter Dahlgren, taken in turn from a CrossValidated answer by @Ashe, here is the crux:
calc.icc <- function(y) {
  sumy <- summary(y)
  # the varcor element must match the grouping factor's name (here 'schl')
  (sumy$varcor$schl[1]) / (sumy$varcor$schl[1] + sumy$sigma^2)
}
boot.icc <- bootMer(lme1, calc.icc, nsim = 1000)
# draw the usual 95% upper and lower confidence limits from the bootstrap distribution
quantile(boot.icc$t, c(0.025, 0.975))
You can (and should) check that this calc.icc() function gives the same results as your compare_performance() function. Since this uses parametric bootstrapping, you can substitute any ICC function you like, as long as it takes a fitted model as input and returns the ICC as a single numeric value. (Also, because it uses PB, it will be slow; there are potentially faster approximate methods, but PB is reliable and easy to program.)
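For instance, a quick consistency check against the performance package (the same package that provides compare_performance()) might look like this; the hand-rolled value should agree with the reported ICC (~0.15 here):
calc.icc(lme1)
performance::icc(lme1)  # compare against the reported ICC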

standard error & variance of model predictions using merTools::predictInterval

I would like to estimate the standard error and variance of predictions from a linear mixed model. I'm using merTools::predictInterval to estimate prediction intervals because I want to include some of the uncertainty in the random effects (in addition to the uncertainty in the fixed effects). Is it acceptable to use the simulations from merTools::predictInterval to estimate the se and variance of predictions? If so, how should I calculate them? I can think of 2 ways:
To get variance corresponding to the prediction interval (i.e. including residual variance), I would first get the simulated predictions:
predictions <- merTools::predictInterval(...,
                                         include.resid.var = TRUE,
                                         returnSims = TRUE)
1. Then I could estimate the variance using the normal approximation (take half the width of the interval as an estimate of 1.96 standard errors, divide by 1.96, and square):
var1 <- ((predictions$upr - predictions$lwr)/2/1.96)^2
2. Or I could just take the variance of the simulated values:
var2 <- apply(X = attr(x = predictions, which = 'sim.results'), MARGIN = 1, FUN = var)
The SE would then be the square root of the variance. To get the SE and/or variance relating to the confidence interval, I could repeat this with include.resid.var = FALSE in the merTools::predictInterval call.
Is either of these methods acceptable? Is one preferable to the other?
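If it helps, the two can be compared directly; a sketch, assuming predictions was created with returnSims = TRUE as above:
se1 <- sqrt(var1)  # normal-approximation SE from the interval width
se2 <- sqrt(var2)  # empirical SD of the simulated predictions
summary(se1 - se2)            # differences should be small if the simulations are roughly normal
plot(se1, se2); abline(0, 1)  # points near the diagonal indicate agreement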

Calculating logLik by hand from a logistic regression

I ran a mixed model logistic regression, adjusting my model with a genetic relationship matrix, using an R package known as GMMAT (function: glmmkin()).
My output from the model includes (taken from the user manual):
theta: the dispersion parameter estimate [1] and the variance component parameter estimate [2]
coefficients: fixed effects parameter estimates (including the intercept).
linear.predictors: the linear predictors.
fitted.values: fitted mean values on the original scale.
Y: a vector of length equal to the sample size for the final working vector.
P: the projection matrix with dimensions equal to the sample size.
residuals: residuals on the original scale. NOT rescaled by the dispersion parameter.
cov: covariance matrix for the fixed effects (including the intercept).
converged: a logical indicator for convergence.
I am trying to obtain the log-likelihood in order to compute variance explained. My first instinct was to pull apart the logLik.glm function in order to compute this "by hand" and I got stuck at trying to compute AIC. I used the answer from here.
I did a sanity check with a logistic regression run with stats::glm() where the model1$aic is 4013.232 but using the Stack Overflow answer I found, I obtained 30613.03.
My question is -- does anyone know how to compute log likelihood from a logistic regression by hand using the output that I have listed above in R?
No statistical insight here, just the solution I see from looking at glm.fit. This only works if you did not specify weights while fitting the models (or, if you did, you would need to include those weights in the model object).
get_logLik <- function(s_model, family = binomial(logit)) {
  n <- length(s_model$y)
  wt <- rep(1, n)  # or s_model$prior_weights if the field exists
  deviance <- sum(family$dev.resids(s_model$y, s_model$fitted.values, wt))
  mod_rank <- sum(!is.na(s_model$coefficients))  # or s_model$rank if the field exists
  aic <- family$aic(s_model$y, rep(1, n), s_model$fitted.values, wt, deviance) + 2 * mod_rank
  log_lik <- mod_rank - aic / 2  # invert aic = -2*logLik + 2*rank
  return(log_lik)
}
For example...
model <- glm(vs ~ mpg, data = mtcars, family = binomial(logit))
logLik(model)
# 'log Lik.' -12.76667 (df=2)
sparse_model <- model[c("theta", "coefficients", "linear.predictors", "fitted.values", "y", "P", "residuals", "cov", "converged")]
get_logLik(sparse_model)
#[1] -12.76667

unscale predictor coefficients lmer model fit with an unscaled response

I have fitted a lmer model, and now I am trying to interpret the coefficients in terms of the real coefficients instead of scaled ones.
My top model is:
lmer(logcptplus1~scale.t6+scale.logdepth+(1|location) + (1|Fyear),data=cpt, REML=TRUE)
so both predictor variables are scaled, one of them being scaled log values. My response variable is not scaled, just logged.
To scale my predictor variables, I used the scale(data$column, center = TRUE, scale = TRUE) function in R.
The output for my model is:
Fixed effects:
                Estimate Std. Error t value
(Intercept)      3.31363    0.15163  21.853
scale.t6        -0.34400    0.10540  -3.264
scale.logdepth  -0.58199    0.06486  -8.973
So how can I obtain real estimates from these coefficients, which are based on my scaled predictor variables?
NOTE: I understand how to unscale my predictor variables, just not how to unscale/transform the coefficients
Thanks
The scale function does a z-transform of the data, which means it takes the original values, subtracts the mean, and then divides by the standard deviation.
to_scale <- 1:10
using_scale <- scale(to_scale, center = TRUE, scale = TRUE)
by_hand <- (to_scale - mean(to_scale))/sd(to_scale)
identical(as.numeric(using_scale), by_hand)
[1] TRUE
Therefore, to put a slope back on the original scale of the covariate, you divide the scaled coefficient by the standard deviation of that covariate (the mean shift only affects the intercept, not the slope). The scale function holds onto the mean and sd for you. So, if we assume that your covariate values are the using_scale vector for the scale.t6 regression coefficient, we can write a function to do the work for us.
get_real <- function(coef, scaled_covariate){
  # collect the mean and standard deviation stored by scale()
  mean_sd <- unlist(attributes(scaled_covariate)[-1])
  # reverse the z-transformation: x_scaled = (x - mean)/sd, so the slope
  # on the original scale is the scaled slope divided by the sd
  answer <- coef / mean_sd[2]
  # this value will have a name, remove it
  names(answer) <- NULL
  # return unscaled coef
  return(answer)
}
get_real(-0.3440, using_scale)
[1] -0.1136187
Note that this recovers only the slopes; the intercept on the original scale is the scaled-model intercept minus the sum of coef * mean/sd over all scaled covariates.
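A quick way to convince yourself of the slope formula is a toy comparison of scaled and unscaled fits (plain lm() on made-up data, purely for illustration):
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)
y <- 2 + 0.5 * x + rnorm(100)
xs <- scale(x)
coef(lm(y ~ x))["x"]                               # slope from the unscaled fit (~0.5)
coef(lm(y ~ xs))["xs"] / attr(xs, "scaled:scale")  # same value after dividing by the sd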

Is it possible to have similar standard errors for marginal effects under a probit regression for all estimates?

Data: Data
Code:
## Regression
probit_enae <- glm(emploi ~ genre + filiere + satisfaction + competence + anglais,
                   family = binomial(link = "probit"), data = ENAE_Probit.df)
summary(probit_enae) #Summary output of the regression
confint(probit_enae) #Gives the 95% confidence interval for the estimated coefficients
## Marginal effects
mfx_enae = mfx(probit_enae)
So, when you run the mfx command to get the marginal effects, it works. But then you get a standard error estimate (9.901) that is the same across all estimated parameters. Is this normal behavior?
Thanks
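If the mfx package is the one intended here, its probit helper is probitmfx(), which reports a separate standard error for each marginal effect; an identical SE across all terms is not expected and usually suggests the call is not computing what was intended. A sketch of the call, assuming the same formula and data as above:
library(mfx)
mfx_enae <- probitmfx(emploi ~ genre + filiere + satisfaction + competence + anglais,
                      data = ENAE_Probit.df)
mfx_enae  # each marginal effect should get its own standard error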
