unscale predictor coefficients from an lmer model fit with an unscaled response - r

I have fitted an lmer model, and now I am trying to interpret the coefficients in terms of the original (unscaled) variables instead of the scaled ones.
My top model is:
lmer(logcptplus1~scale.t6+scale.logdepth+(1|location) + (1|Fyear),data=cpt, REML=TRUE)
So both predictor variables are scaled, with one of them being a scaled log value. My response variable is not scaled, just logged.
To scale my predictor variables, I used the scale(data$column, center = TRUE, scale = TRUE) function in R.
The output for my model is:
Fixed effects:
                Estimate Std. Error t value
(Intercept)      3.31363    0.15163  21.853
scale.t6        -0.34400    0.10540  -3.264
scale.logdepth  -0.58199    0.06486  -8.973
So how can I obtain estimates on the original scale of my variables from these coefficients, which are based on my scaled predictor variables?
NOTE: I understand how to unscale my predictor variables, just not how to unscale/transform the coefficients.
Thanks

The scale function does a z-transform of the data, which means it takes the original values, subtracts the mean, and then divides by the standard deviation.
to_scale <- 1:10
using_scale <- scale(to_scale, center = TRUE, scale = TRUE)
by_hand <- (to_scale - mean(to_scale))/sd(to_scale)
identical(as.numeric(using_scale), by_hand)
[1] TRUE
Therefore, to reverse the scaling of a model coefficient all you need to do is divide the coefficient by the standard deviation of the covariate; the mean only matters for the intercept (for each scaled covariate, subtract coefficient * mean/sd from the scaled intercept). The scale function holds onto the mean and sd for you as attributes. So, if we assume that your covariate values are the using_scale vector for the scale.t6 regression coefficient, we can write a function to do the work for us.
get_real <- function(coef, scaled_covariate){
  # the standard deviation used by scale() is stored as an attribute
  covariate_sd <- attr(scaled_covariate, "scaled:scale")
  # reverse the z-transformation: a slope per standard deviation becomes a slope per original unit
  answer <- coef / covariate_sd
  # this value will have a name, remove it
  names(answer) <- NULL
  # return unscaled coef
  return(answer)
}
get_real(-0.3440, using_scale)
[1] -0.1136195
In other words, dividing by the standard deviation converts a slope per standard deviation of the covariate into a slope per original unit; because scaling is a linear (monotonic) transformation, the fitted model itself is unchanged and only the units of the coefficients differ.
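Applied to the model in the question, a minimal sketch of the full back-transformation; the column names cpt$t6 and cpt$logdepth are assumptions about what the unscaled covariates are called in your data:
# hypothetical column names: assumes cpt$t6 and cpt$logdepth hold the original, unscaled covariates
sd_t6 <- sd(cpt$t6);       mean_t6 <- mean(cpt$t6)
sd_ld <- sd(cpt$logdepth); mean_ld <- mean(cpt$logdepth)

b_t6 <- -0.34400 / sd_t6   # slope of logcptplus1 per original unit of t6
b_ld <- -0.58199 / sd_ld   # slope of logcptplus1 per original unit of log depth
a    <- 3.31363 - (-0.34400) * mean_t6 / sd_t6 - (-0.58199) * mean_ld / sd_ld  # unscaled intercept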

Related

get pairwise difference from emmeans with quadratic covariate interaction

I have a factor X with three levels and a continuous covariate Z.
To predict the continuous variable Y, I have the model
model <- lm(Y ~ X * poly(Z, 2, raw = TRUE))
I know that the emmeans package in R has the function emtrends() to estimate the pairwise difference between factor level slopes and does a p-value adjustment.
emtrends(model, pairwise ~ X, var = "Z")
However, this works when Z enters as a linear term; here I have a quadratic term. I guess this means I have to look at pairwise differences at pre-specified values of Z and get something like the local "slope" (trend)?
Is this possible to do with emmeans? How would I need to do the p-value adjustment, and does it scale with the number of grid points? That is, as the number of grid values at which I do the comparison increases, will Bonferroni become too conservative?
Also, how would I do the pairwise comparison of the mean (prediction) at different grid values with emmeans (or is this the same regardless of using poly(), since it relies only on model predictions)?
Thanks.
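One way this is commonly approached (a sketch only, not a definitive answer; the Z values below are arbitrary placeholders for whatever grid points are of interest) is to evaluate the local slopes, or the predictions themselves, at chosen values of Z:
library(emmeans)

# local slopes of Y with respect to Z for each level of X, evaluated at chosen Z values,
# with pairwise comparisons of those slopes within each Z value
emtrends(model, pairwise ~ X | Z, var = "Z", at = list(Z = c(-1, 0, 1)))

# pairwise comparisons of the predicted means themselves at the same Z values
emmeans(model, pairwise ~ X | Z, at = list(Z = c(-1, 0, 1)))
By default the pairwise adjustment is applied within each Z value; whether to adjust across the whole grid on top of that is a separate judgement call.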

standard error & variance of model predictions using merTools::predictInterval

I would like to estimate the standard error and variance of predictions from a linear mixed model. I'm using merTools::predictInterval to estimate prediction intervals because I want to include some of the uncertainty in the random effects (in addition to the uncertainty in the fixed effects). Is it acceptable to use the simulations from merTools::predictInterval to estimate the se and variance of predictions? If so, how should I calculate them? I can think of 2 ways:
To get variance corresponding to the prediction interval (i.e. including residual variance), I would first get the simulated predictions:
predictions <- merTools::predictInterval(...,
                                         include.resid.var = TRUE,
                                         returnSims = TRUE)
1. Then I could estimate variance using the normal approximation (calculate the distance between the fit and the upper/lower interval and then divide that by 1.96):
var1 <- ((predictions$upr - predictions$lwr)/2/1.96)^2
2. Or I could just take the variance of the simulated values:
var2 <- apply(X = attr(x = predictions, which = 'sim.results'), MARGIN = 1, FUN = var)
The SE would then be the square root of the variance. To get the SE and/or variance relating to the confidence interval, I could repeat this with include.resid.var = FALSE in the merTools::predictInterval call.
Is either of these methods acceptable? Is one preferable to the other?
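For what it's worth, a small sketch (reusing var1 and var2 from the code above) that turns both variance estimates into standard errors and compares them:
se1 <- sqrt(var1)   # SE from the normal approximation of the interval width
se2 <- sqrt(var2)   # SE from the variance of the simulated predictions
summary(se1 - se2)  # the two should be close if the simulated predictions are roughly normal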

Calculating logLik by hand from a logistic regression

I ran a mixed-model logistic regression, adjusting my model with a genetic relationship matrix, using an R package known as GMMAT (function: glmmkin()).
My output from the model includes (taken from the user manual):
theta: the dispersion parameter estimate [1] and the variance component parameter estimate [2]
coefficients: fixed effects parameter estimates (including the intercept).
linear.predictors: the linear predictors.
fitted.values: fitted mean values on the original scale.
Y: a vector of length equal to the sample size for the final working vector.
P: the projection matrix with dimensions equal to the sample size.
residuals: residuals on the original scale. NOT rescaled by the dispersion parameter.
cov: covariance matrix for the fixed effects (including the intercept).
converged: a logical indicator for convergence.
I am trying to obtain the log-likelihood in order to compute variance explained. My first instinct was to pull apart the logLik.glm function in order to compute this "by hand", and I got stuck trying to compute the AIC. I used the answer from here.
I did a sanity check with a logistic regression run with stats::glm(), where model1$aic is 4013.232, but using the Stack Overflow answer I found I obtained 30613.03.
My question is -- does anyone know how to compute log likelihood from a logistic regression by hand using the output that I have listed above in R?
No statistical insight here, just the solution I see from looking at glm.fit. This only works if you did not specify weights while fitting the model (or, if you did, you would need to include those weights in the model object).
get_logLik <- function(s_model, family = binomial(logit)) {
  n <- length(s_model$y)
  wt <- rep(1, n) # or s_model$prior.weights if that field exists
  deviance <- sum(family$dev.resids(s_model$y, s_model$fitted.values, wt))
  mod_rank <- sum(!is.na(s_model$coefficients)) # or s_model$rank if that field exists
  aic <- family$aic(s_model$y, rep(1, n), s_model$fitted.values, wt, deviance) + 2 * mod_rank
  log_lik <- mod_rank - aic/2
  return(log_lik)
}
For example...
model <- glm(vs ~ mpg, data = mtcars, family = binomial(logit))
logLik(model)
# 'log Lik.' -12.76667 (df=2)
sparse_model <- model[c("theta", "coefficients", "linear.predictors", "fitted.values", "y", "P", "residuals", "cov", "converged")]
get_logLik(sparse_model)
#[1] -12.76667
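As a sanity check on the same mtcars example, this agrees with computing the Bernoulli log-likelihood directly from the fitted probabilities:
# direct computation of the 0/1 (Bernoulli) log-likelihood at the fitted probabilities
sum(dbinom(model$y, size = 1, prob = model$fitted.values, log = TRUE))
#[1] -12.76667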

How to estimate the odds ratio with CI for X in a logistic regression containing the square of X using R?

I am trying to calculate odds ratios in R for variables that enter a logistic regression with both linear and quadratic terms. Let's say there is X and X^2 in the model. I know how to get the odds ratio (for a unit change of X) when X takes a specific value, but I don't know how to calculate a confidence interval for this estimate. I found this reference showing how it's done in SAS: http://support.sas.com/kb/35/189.html , but I would like to do it in R. Any suggestions?
#BenBolker
Here is an example:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata <- transform(mydata, gpaSquared = gpa^2, greSquared = gre^2)
model <- glm(admit ~ gpa + gpaSquared + gre, family = binomial(logit), data = mydata)
In this example the odds ratio for gpa depends on the actual value of gpa (e.g. the effect of a unit change in gpa when gpa = 4). I can calculate the log odds for gpa = 5 and gpa = 4 and get the odds ratio from those, but I don't know how to get a CI for the OR. (Please ignore that in this example the squared term is not statistically significant.)
m <- glm(x ~ X1 + I(X1^2) + X2, data = data, family = binomial(link = "logit"))
summary(m)
confint(m)          # 95% CI for the coefficients using the profiled log-likelihood
confint.default(m)  # CIs using standard errors
exp(coef(m))        # exponentiated coefficients
exp(confint(m))     # 95% CI for the exponentiated coefficients
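To get a CI for the OR of a one-unit change in gpa at a specific value of gpa, one standard option is the delta method applied to the relevant linear combination of coefficients. A sketch using the model fitted in the example above; g = 3 is an arbitrary starting value:
g <- 3                                   # arbitrary gpa value at which to evaluate the effect
L <- c(0, 1, 2 * g + 1, 0)               # contrast over (Intercept), gpa, gpaSquared, gre
est <- sum(L * coef(model))              # log odds ratio for gpa going from g to g + 1
se  <- drop(sqrt(t(L) %*% vcov(model) %*% L))
exp(est)                                 # point estimate of the OR
exp(est + c(-1, 1) * 1.96 * se)          # approximate (Wald) 95% CI for the OR
This is the same linear-combination idea as in the SAS note: the contrast vector just encodes how the linear predictor changes when gpa moves from g to g + 1.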

Scale back linear regression coefficients in R from scaled and centered data

I'm fitting a linear model using OLS and have scaled my regressors with the scale function in R because of the different units of measure between the variables. Then I fit the model using the lm command and get the coefficients of the fitted model. As far as I know, the coefficients of the fitted model are not in the same units as the original regressor variables and therefore must be scaled back before they can be interpreted. I have been searching for a direct way to do this but couldn't find anything. Does anyone know how to do that?
Please have a look at the code; could you help me implement what you proposed?
library(zoo)
filename="DataReg4.csv"
filepath=paste("C:/Reg/",filename, sep="")
separator=";"
readfile=read.zoo(filepath, sep=separator, header=T, format = "%m/%d/%Y", dec=".")
readfile=as.data.frame(readfile)
str(readfile)
DF=readfile
DF=as.data.frame(scale(DF))
fm=lm(USD_EUR~diff_int+GDP_US+Net.exports.Eur,data=DF)
summary(fm)
plot(fm)
I'm sorry, here is the data:
http://www.mediafire.com/?hmcp7urt0ag8187
If you used the scale function with default arguments then your regressors will be centered (subtracting their mean) and divided by their standard deviations. You can interpret the coefficients without transforming them back to the original units:
Holding everything else constant, on average, a one standard deviation change in one of the regressors is associated with a change in the dependent variable corresponding to the coefficient of that regressor.
If you have included an intercept term in your model keep in mind that the interpretation of the intercept will change. The estimated intercept now represents the average level of the dependent variable when all of the regressors are at their average levels. This is a result of subtracting the mean from each variable.
To interpret a coefficient in the original units of its regressor, just calculate the standard deviation of that regressor and divide the coefficient by it.
To de-scale or back-transform regression coefficients from a regression done with scaled predictor variable(s) and a non-scaled response variable, the intercept and slope should be calculated as:
A = As - Bs*Xmean/sdx
B = Bs/sdx
thus the regression is,
Y = As - Bs*Xmean/sdx + Bs/sdx * X
where
As = intercept from the scaled regression
Bs = slope from the scaled regression
Xmean = the mean of the original (unscaled) predictor variable
sdx = the standard deviation of the original (unscaled) predictor variable
This can be adjusted if Y was also scaled but it appears you decided not to do that ultimately with your dataset.
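To make the formulas concrete, a minimal self-contained sketch with simulated data standing in for the question's variables (a single predictor and an unscaled response):
set.seed(1)
x <- rnorm(100, mean = 10, sd = 3)            # original, unscaled predictor
y <- 2 + 0.5 * x + rnorm(100)                 # unscaled response

fit_scaled <- lm(y ~ scale(x))
As <- coef(fit_scaled)[1]                     # intercept from the scaled regression
Bs <- coef(fit_scaled)[2]                     # slope from the scaled regression

B <- Bs / sd(x)                               # slope per original unit of x
A <- As - Bs * mean(x) / sd(x)                # intercept on the original scale

c(A, B)                                       # matches the fit on the unscaled predictor
coef(lm(y ~ x))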
If I understand your description (which is unfortunately code-free at the moment), you are getting standardized regression coefficients for Y ~ As + Bs*Xs where all those "s" items are scaled variables. The coefficients then are the predicted change, on a standard deviation scale of Y, associated with a change in X of one standard deviation of X. The scale function would have recorded the means and standard deviations in attributes of the scaled object. If not, then you will have those estimates somewhere in your console log. The estimated change dY for a change dX in X should be: dY*(1/sdY) = Bs*dX*(1/sdX). Predictions (back on the original scale of Y, if Y was scaled too) should be something along these lines:
Yest = Ymean + As*(sdY) + Bs*(Xs)*(sdY)
You probably should not have needed to standardize the Y values, and I'm hoping that you didn't because it makes dealing with the adjustment for the means of the X's easier. Put some code and example data in if you want implemented and checked answers. I think #DanielGerlance is correct in saying to multiply rather than divide by the SD's.
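If the response was scaled as well (as it is in the code above, where the whole data frame was passed to scale), a sketch using the question's objects; readfile holds the unscaled data, fm is the fit on the scaled data, and diff_int is just one regressor picked as an example:
sdY <- sd(readfile$USD_EUR);  Ymn <- mean(readfile$USD_EUR)
sdX <- sd(readfile$diff_int); Xmn <- mean(readfile$diff_int)

Bs <- coef(fm)["diff_int"]    # standardized slope: SDs of USD_EUR per SD of diff_int
B  <- Bs * sdY / sdX          # slope in original units: USD_EUR per unit of diff_int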

Resources