Calculate α and β in Probit Model in R - r

I am facing the following issue: I want to calculate α and β from the following probit model in R, which is defined as:
Probability = F(α + β sprd )
where sprd denotes the explanatory variable, α and β are constants, F is the cumulative normal distribution function.
I can calculate probabilities for the entire dataset, the coefficients (see code below), etc., but I do not know how to get the constants α and β.
The purpose is to determine, in Excel, the spread that corresponds to a given probability, e.g. which spread corresponds to 50%.
Thank you in advance!
Probit model coefficients
probit <- glm(Y ~ X, family = binomial(link = "probit"))
summary(probit)
Call:
glm(formula = Y ~ X, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4614 -0.6470 -0.3915 -0.2168 2.5730
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3566755 0.0883634 -4.036 5.43e-05 ***
X -0.0058377 0.0007064 -8.264 < 2e-16 ***

From the help("glm") page you can see that glm() returns an object with a component named coefficients.
An object of class "glm" is a list containing at least the following
components:
coefficients a named vector of coefficients
So after you call glm() that object will be a list, and you can access each element using $name_element.
Reproducible example (not a probit model, but the coefficients are accessed in the same way):
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
# fit model
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
Now glm.D93$coefficients will print the vector with all the coefficients:
glm.D93$coefficients
# (Intercept) outcome2 outcome3 treatment2 treatment3
#3.044522e+00 -4.542553e-01 -2.929871e-01 1.337909e-15 1.421085e-15
You can assign that and access each individually:
coef <- glm.D93$coefficients
coef[1] # your alpha
#(Intercept)
# 3.044522
coef[2] # your beta
# outcome2
#-0.4542553

I've seen in your deleted post that you are not convinced by @RLave's answer. Here are some simulations to convince you:
# (large) sample size
n <- 10000
# covariate
x <- (1:n)/n
# parameters
alpha <- -1
beta <- 1
# simulated data
set.seed(666)
y <- rbinom(n, 1, prob = pnorm(alpha + beta*x))
# fit the probit model
probit <- glm(y ~ x, family = binomial(link="probit"))
# get estimated parameters - very close to the true parameters -1 and 1
coef(probit)
# (Intercept) x
# -1.004236 1.029523
The estimated parameters are given by coef(probit), or probit$coefficients.
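To get from the coefficients to "which spread corresponds to a given probability", the model Probability = F(α + β sprd) can be inverted as sprd = (F^-1(p) - α) / β. A minimal sketch on the simulated data above; the helper name spread_for is illustrative and not part of any package:

```r
# Refit the simulated probit model from above
set.seed(666)
n <- 10000
x <- (1:n) / n
y <- rbinom(n, 1, prob = pnorm(-1 + 1 * x))
probit <- glm(y ~ x, family = binomial(link = "probit"))

# Estimated alpha (intercept) and beta (slope)
alpha_hat <- unname(coef(probit)[1])
beta_hat  <- unname(coef(probit)[2])

# Invert p = pnorm(alpha + beta * x) for x (hypothetical helper)
spread_for <- function(p) (qnorm(p) - alpha_hat) / beta_hat

x50 <- spread_for(0.5)  # value of x at which the fitted probability is 50%

# Sanity check: the fitted probability at x50 should be 0.5
predict(probit, newdata = data.frame(x = x50), type = "response")
```

For p = 0.5, qnorm(0.5) is 0, so the result reduces to -alpha_hat / beta_hat. In Excel, the same formula should be expressible with NORM.S.INV(p) in place of qnorm(p), once the two estimates are copied over.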

Related

Equivalence of a mixed model fitted by lme and lmer

I have fitted a mixed effects model considering both functions widely used in R, namely: the lme function from the nlme package and the lmer function from the lme4 package.
To refit the lme model with lme4 under the same parametrization, I used the information from the following topic, since this is only possible in lme4 in a hacky way: Heterocesdastic model of mixed effects via lmer function
I apologize for hosting the data in a link, however, I couldn't find an internal R database that has variables that might match my problem.
Data: https://drive.google.com/file/d/1jKFhs4MGaVxh-OPErvLDfMNmQBouywoM/view?usp=sharing
The fitted models were:
library(nlme)
library(lme4)
ModLME = lme(Var1~I(Var2)+I(Var2^2),
random = ~1|Var3,
weights = varIdent(form=~1|Var4),
Dataone, method="REML")
ModLMER = lmer(Var1~I(Var2)+I(Var2^2)+(1|Var3)+(0+dummy(Var4,"1")|Var5),
Dataone, REML = TRUE,
control=lmerControl(check.nobs.vs.nlev="ignore",
check.nobs.vs.nRE="ignore"))
Which are equivalent, see:
all.equal(REMLcrit(ModLMER), c(-2*logLik(ModLME)))
[1] TRUE
all.equal(fixef(ModLME), fixef(ModLMER), tolerance=1e-7)
[1] TRUE
> summary(ModLME)
Linear mixed-effects model fit by REML
Data: Dataone
AIC BIC logLik
-209.1431 -193.6948 110.5715
Random effects:
Formula: ~1 | Var3
(Intercept) Residual
StdDev: 0.05789852 0.03636468
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Var4
Parameter estimates:
0 1
1.000000 5.641709
Fixed effects: Var1 ~ I(Var2) + I(Var2^2)
Value Std.Error DF t-value p-value
(Intercept) 0.9538547 0.01699642 97 56.12093 0
I(Var2) -0.5009804 0.09336479 97 -5.36584 0
I(Var2^2) -0.4280151 0.10038257 97 -4.26384 0
summary(ModLMER)
Linear mixed model fit by REML. t-tests use Satterthwaites method [lmerModLmerTest]
Formula: Var1 ~ I(Var2) + I(Var2^2) + (1 | Var3) + (0 + dummy(Var4, "1") |
Var5)
Data: Dataone
Control: lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.nRE = "ignore")
REML criterion at convergence: -221.1
Scaled residuals:
Min 1Q Median 3Q Max
-4.1151 -0.5891 0.0374 0.5229 2.1880
Random effects:
Groups Name Variance Std.Dev.
Var3 (Intercept) 6.466e-12 2.543e-06
Var5 dummy(Var4, "1") 4.077e-02 2.019e-01
Residual 4.675e-03 6.837e-02
Number of obs: 100, groups: Var3, 100; Var5, 100
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.95385 0.01700 95.02863 56.121 < 2e-16 ***
I(Var2) -0.50098 0.09336 92.94048 -5.366 5.88e-07 ***
I(Var2^2) -0.42802 0.10038 91.64017 -4.264 4.88e-05 ***
However, the residuals of these models are not similar. In the model fitted by lmer, a group of residuals mysteriously appears that lies close to a straight line. How can this be solved so that the residuals are identical? I believe the problem is in the lme4 model.
aa=plot(ModLME, main="LME")
bb=plot(ModLMER, main="LMER")
gridExtra::grid.arrange(aa,bb,ncol=2)
I can tell you what's going on and what should in principle fix it, but at the moment the fix doesn't work ...
The residuals being plotted take all of the random effects into account, which in the case of the lmer fit includes the individual-level random effects (the (0+dummy(Var4,"1")|Var5) term), which leads to weird residuals for the Var4==1 group. To illustrate this:
plot(ModLMER, col = Dataone$Var4+1)
i.e., you can see that the weird residuals are exactly the ones in red == those for which Var4==1.
In theory we should be able to get the same residuals via:
res <- Dataone$Var1 - predict(ModLMER, re.form = ~(1|Var3))
i.e., ignore the group-specific observation-level random effect term. However, it looks like there is a bug at the moment ("contrasts can be applied only to factors with 2 or more levels").
An extremely hacky solution is to construct the random-effect predictions without the observation-level term yourself:
## fixed-effect predictions
p0 <- predict(ModLMER, re.form = NA)
## construct RE prediction, Var3 term only:
Z <- getME(ModLMER, "Z")
b <- drop(getME(ModLMER, "b"))
## zero out observation-level components
b[101:200] <- 0
## add RE predictions to fixed predictions
p1 <- drop(p0 + Z %*% b)
## plot fitted vs residual
plot(p1, Dataone$Var1 - p1)
For what it's worth, this also works:
library(glmmTMB)
ModGLMMTMB <- glmmTMB(Var1~I(Var2)+I(Var2^2)+(1|Var3),
dispformula = ~factor(Var4),
REML = TRUE,
data = Dataone)

How do I find the p-value for my random effect in my linear mixed effect model?

I am running the following line of code in R:
model = lme(divedepth ~ oarea, random=~1|deployid, data=GDataTimes, method="REML")
summary(model)
and I am seeing this result:
Linear mixed-effects model fit by REML
Data: GDataTimes
AIC BIC logLik
2512718 2512791 -1256352
Random effects:
Formula: ~1 | deployid
(Intercept) Residual
StdDev: 9.426598 63.50004
Fixed effects: divedepth ~ oarea
Value Std.Error DF t-value p-value
(Intercept) 25.549003 3.171766 225541 8.055135 0.0000
oarea2 12.619669 0.828729 225541 15.227734 0.0000
oarea3 1.095290 0.979873 225541 1.117787 0.2637
oarea4 0.852045 0.492100 225541 1.731447 0.0834
oarea5 2.441955 0.587300 225541 4.157933 0.0000
[snip]
Number of Observations: 225554
Number of Groups: 9
However, I cannot find the p-value for the random variable: deployID. How can I see this value?
As stated in the comments, there is stuff about significance tests of random effects in the GLMM FAQ. You should definitely consider:
why you are really interested in the p-value (it's not never of interest, but it's an unusual case)
the fact that the likelihood ratio test is extremely conservative for testing variance parameters (in this case it gives a p-value that's 2x too large)
Here's an example that shows that the lme() fit and the corresponding lm() model without the random effect have commensurate log-likelihoods (i.e., they're computed in a comparable way) and can be compared with anova():
Load packages and simulate data (with zero random effect variance)
library(lme4)
library(nlme)
set.seed(101)
dd <- data.frame(x = rnorm(120), f = factor(rep(1:3, 40)))
dd$y <- simulate(~ x + (1|f),
newdata = dd,
newparams = list(beta = rep(1, 2),
theta = 0,
sigma = 1))[[1]]
Fit models (note that you cannot compare a model fitted with REML to a model without random effects).
m1 <- lme(y ~ x , random = ~ 1 | f, data = dd, method = "ML")
m0 <- lm(y ~ x, data = dd)
Test:
anova(m1, m0)
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 4 328.4261 339.5761 -160.2131
## m0 2 3 326.4261 334.7886 -160.2131 1 vs 2 6.622332e-08 0.9998
Here the test correctly identifies that the two models are identical and gives a p-value of essentially 1 (0.9998).
If you use lme4::lmer instead of lme you have some other, more accurate (but slower) options (the RLRsim package and pbkrtest::PBmodcomp for simulation-based tests): see the GLMM FAQ.
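The factor of 2 mentioned above can be applied by hand. A minimal sketch, assuming the usual boundary argument that the null distribution of the likelihood ratio statistic for a single variance parameter is a 50:50 mixture of a point mass at zero and a chi-squared distribution with 1 df. The data here are simulated with base R rather than with lme4's simulate(), and the names lr, p_naive and p_boundary are illustrative:

```r
library(nlme)

# Simulated data with no true group effect (assumption: base-R simulation)
set.seed(101)
dd <- data.frame(x = rnorm(120), f = factor(rep(1:3, 40)))
dd$y <- 1 + dd$x + rnorm(120)

# Fit both models by maximum likelihood so the log-likelihoods are comparable
m1 <- lme(y ~ x, random = ~ 1 | f, data = dd, method = "ML")
m0 <- lm(y ~ x, data = dd)

# Likelihood ratio statistic (truncated at 0 to guard against rounding)
lr <- max(0, as.numeric(2 * (logLik(m1) - logLik(m0))))

# Naive p-value from chi-squared(1), as reported by anova()
p_naive <- pchisq(lr, df = 1, lower.tail = FALSE)

# Boundary-corrected p-value: half the naive one when lr > 0
p_boundary <- if (lr > 0) 0.5 * p_naive else 1
c(naive = p_naive, boundary = p_boundary)
```

This is only the simple asymptotic correction; the simulation-based tests mentioned above are the more accurate route.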

Zero-inflated two-part models in GLMMadaptive (R): anova on fixed effects zero-part?

I'm running a hurdle lognormal model using the GLMMadaptive package in R. Both the continuous part as well as the zero-part have categorical variables defined in the fixed effects. I would like to run an ANOVA on these categorical variables to detect if there is a main effect.
I've seen that with the glmmTMB package you can run an ANOVA separately on the conditional model and the zero-part model, as is demonstrated here.
Is there a similar strategy available for the GLMMadaptive package? (glmmTMB does not support hurdle lognormal models, as far as I understand.) Perhaps using the joint_tests function from the emmeans package? If so, how do you specify that you want to test the zero-part model? At the moment, emmeans::joint_tests(hurdlemodel) only gives the F-tests for the conditional part of the model.
Or, as an alternative, could you compare the fit of models that exclude the variable of interest against the full model, as is demonstrated for the relevance of random effects in this vignette?
Many thanks!
The suggestion by Russ Lenth in the comments is implemented below, using the data and model in the GLMMadaptive two-part model vignette:
library(GLMMadaptive)
library(emmeans)
# data generating code from the vignette:
{
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)
betas <- c(1.5, 0.05, 0.05, -0.03) # fixed effects coefficients non-zero part
shape <- 2 # shape/size parameter of the negative binomial distribution
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
D22 <- 0.4 # variance of random intercepts zero part
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE]))
# we simulate negative binomial longitudinal data
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
# we set the extra zeros
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
}
#create categorical time variable
DF$time_categorical[DF$time<2.5] <- "early"
DF$time_categorical[DF$time>=2.5] <- "late"
DF$time_categorical <- as.factor(DF$time_categorical)
#model with interaction in fixed effects zero part and adding nesting in zero part as in model above
km3 <- mixed_model(y ~ sex * time_categorical, random = ~ 1 | id, data = DF,
family = hurdle.lognormal(), n_phis = 1,
zi_fixed = ~ sex * time_categorical, zi_random = ~ 1 | id)
#### ATTEMPT at QDRG function in emmeans ####
coef_zero_part <- fixef(km3, sub_model = "zero_part")
vcov_zero_part <- vcov(km3)[9:12,9:12]
qd_km3 <- emmeans::qdrg(formula = ~ sex * time_categorical, data = DF,
coef = coef_zero_part, vcov = vcov_zero_part)
Output:
> joint_tests(qd_km3)
model term df1 df2 F.ratio p.value
sex 1 Inf 11.509 0.0007
time_categorical 1 Inf 0.488 0.4848
sex:time_categorical 1 Inf 1.077 0.2993
> emmeans(qd_km3, pairwise ~ sex|time_categorical)
$emmeans
time_categorical = early:
sex emmean SE df asymp.LCL asymp.UCL
male -1.592 0.201 Inf -1.99 -1.198
female -1.035 0.187 Inf -1.40 -0.669
time_categorical = late:
sex emmean SE df asymp.LCL asymp.UCL
male -1.914 0.247 Inf -2.40 -1.429
female -0.972 0.188 Inf -1.34 -0.605
Confidence level used: 0.95
$contrasts
time_categorical = early:
contrast estimate SE df z.ratio p.value
male - female -0.557 0.270 Inf -2.064 0.0390
time_categorical = late:
contrast estimate SE df z.ratio p.value
male - female -0.942 0.306 Inf -3.079 0.0021
Checking if contrasts correspond with zero-part fixed effects:
> fixef(km3, sub_model = "zero_part")
(Intercept) sexfemale time_categoricallate sexfemale:time_categoricallate
-1.5920415 0.5568072 -0.3220390 0.3849780
> (-1.5920) - (-1.5920 + 0.5568)
[1] -0.5568 #matches contrast within "early" level of "time_categorical"
> (-1.5920 + -0.3220) - (-1.5920 + -0.3220 + 0.5568 + 0.3850)
[1] -0.9418 #matches contrast within "late" level of "time_categorical"
The function emmeans::qdrg() can sometimes be used to create the needed object for a model not directly supported by emmeans. See its documentation. In very simple models (e.g., those inheriting from lm), it may be enough to supply the object and data arguments.
That usually does not work for more sophisticated models, in which case you will need to specify data, the fixed-effects formula for the conditional or zero part of the model, and the associated regression coefficients (coef) and variance-covariance matrix (vcov) for the part of the model in question. Often with models like this with multiple components, you likely will have to pick a subset of the coefficients and covariance matrix. These all must conform: the length of coef must equal the number of rows and columns of vcov and the number of columns in the model matrix generated by formula [which may be checked via model.matrix(formula, data = data)].
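The conformity requirement can be checked before calling qdrg(). A minimal sketch using only base R, with a plain lm on the built-in warpbreaks data standing in for the model; for a two-part model, the zero-part coefficients and the matching sub-block of the covariance matrix would be substituted for b and V:

```r
# Stand-in model (assumption: a plain lm instead of a two-part model)
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

# The three pieces that would be handed to qdrg()
X <- model.matrix(~ wool * tension, data = warpbreaks)  # from formula + data
b <- coef(warp.lm)                                      # would go to coef =
V <- vcov(warp.lm)                                      # would go to vcov =

# All of these must agree
length(b)   # number of coefficients
ncol(X)     # number of model-matrix columns
dim(V)      # rows and columns of the covariance matrix

# The coefficient order must also match the model-matrix columns
identical(names(b), colnames(X))
```

If the names do not line up, the coefficient vector and covariance matrix are probably in a different order, or belong to a different part of the model, than the formula implies.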
qdrg() will not work for a multivariate model -- or at least it's tricky -- because the implied model involves other factor(s) that delineate the levels of the multivariate response. If there are special provisions for, say, spline smoothing, that is another instance where qdrg() probably can't be made to work.
Once qdrg() actually runs and produces results, it is a good idea to use it to estimate some contrasts that are estimated by the model parameterization. For example, suppose that the model was fitted with the default contr.treatment contrasts. Then the regression coefficients are interpretable as a comparison with the first level as a reference level. Accordingly, if we computed rg <- qdrg(...), and one of the factors is "treat", look at contrast(rg, "trt.vs.ctrl1", simple = "treat"), and check to see if the first set of estimated contrasts matches the main-effect estimates for treat.
I will illustrate all of this with a simple lm model, ignoring the fact that it is already supported by emmeans.
> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
Here is the reference grid
> rg <- qdrg(~ wool * tension, coef = coef(warp.lm), vcov = vcov(warp.lm),
+ df = df.residual(warp.lm), data = warpbreaks)
Here is a sanity check -- First, look at the model summary:
> summary(warp.lm)$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.55556 3.646761 12.217842 2.425903e-16
woolB -16.33333 5.157299 -3.167032 2.676803e-03
tensionM -20.55556 5.157299 -3.985721 2.280796e-04
tensionH -20.00000 5.157299 -3.877999 3.199282e-04
woolB:tensionM 21.11111 7.293523 2.894501 5.698287e-03
woolB:tensionH 10.55556 7.293523 1.447251 1.543266e-01
Second, look at selected contrasts:
> contrast(rg, "trt.vs.ctrl1", simple = "wool")
tension = L:
contrast estimate SE df t.ratio p.value
B - A -16.33 5.16 48 -3.167 0.0027
tension = M:
contrast estimate SE df t.ratio p.value
B - A 4.78 5.16 48 0.926 0.3589
tension = H:
contrast estimate SE df t.ratio p.value
B - A -5.78 5.16 48 -1.120 0.2682
> contrast(rg, "trt.vs.ctrl1", simple = "tension")
wool = A:
contrast estimate SE df t.ratio p.value
M - L -20.556 5.16 48 -3.986 0.0005
H - L -20.000 5.16 48 -3.878 0.0006
wool = B:
contrast estimate SE df t.ratio p.value
M - L 0.556 5.16 48 0.108 0.9863
H - L -9.444 5.16 48 -1.831 0.1338
P value adjustment: dunnettx method for 2 tests
Comparing with the regression coefficients, we do confirm that the first contrast for wool is estimated as -16.33, matching the regression coefficient for woolB. Also, the first set of contrasts for tension are estimated as -20.556 and -20.0, matching the regression coefficients for tensionM and tensionH. The SEs and t ratios match as well. (The P values for the second set do not match due to the multiplicity adjustment.)

weights option in GAM

My dataset has many redundant observations (but each observation should be counted), so I am considering using the 'weights' option in GAM, because it significantly reduces computation time.
The documentation of the gam function (in the mgcv package) states that the two are 'equivalent' (from ?gam, on the weights argument):
"Note that a weight of 2, for example, is equivalent to having made exactly the same observation twice."
But it does not seem right.
library(mgcv)
yy = c(5,2,8,9)
xx = 1:4
wgts = c(3,2,4,1)
yy2 = rep(yy, wgts)
xx2 = rep(xx, wgts)
mod1 = gam(yy2 ~ xx2)
mod2 = gam(yy ~ xx, weights = wgts)
mod3 = gam(yy ~ xx, weights = wgts / mean(wgts))
predict(mod1,data.frame(xx2=1:4))
predict(mod2,data.frame(xx=1:4))
predict(mod3,data.frame(xx=1:4))
The estimates are identical in all three models.
Standard errors are the same in models 2 and 3 but different in model 1.
GCV is different in all three models.
I understand GCVs can be different. But how can we say that the models are identical if standard errors are different? Is this an error, or is there any good explanation for this?
The issue you saw is not specific to GAM. You have used gam to fit a parametric model, in which case gam behaves almost the same as lm. To answer your questions, it is sufficient to focus on the linear regression case. What happens to a linear model will happen to GLMs and GAMs, too. Here is how we can reproduce the issue with lm:
yy <- c(5,2,8,9)
xx <- 1:4
wgts <- c(3,2,4,1)
yy2 <- rep(yy,wgts)
xx2 <- rep(xx,wgts)
fit1 <- lm(yy2 ~ xx2)
fit2 <- lm(yy ~ xx, weights = wgts)
fit3 <- lm(yy ~ xx, weights = wgts/mean(wgts))
summary1 <- summary(fit1)
summary2 <- summary(fit2)
summary3 <- summary(fit3)
pred1 <- predict(fit1, list(xx2 = xx), interval = "confidence", se.fit = TRUE)
pred2 <- predict(fit2, list(xx = xx), interval = "confidence", se.fit = TRUE)
pred3 <- predict(fit3, list(xx = xx), interval = "confidence", se.fit = TRUE)
All models have the same regression coefficients, but other results may differ. You asked:
For weighted regression fit2 and fit3, why is almost everything the same except residual standard error?
Why is weighted regression (fit2 or fit3) not equivalent to ordinary regression with ties?
Your first question is about the scaling invariance of weighted least squares with respect to the weights. Here is a brief summary:
If we rescale W by an arbitrary positive value, only the residual standard error and the unscaled covariance change. Such a change does not imply a different, non-equivalent model; in fact, nothing related to prediction is affected. In weighted regression, don't just look at sigma2; it is only a marginal variance. What is really of interest is the overall variance after combining it with the weights. If you divide your weights by 2, sigma2 halves and the unscaled covariance doubles, so you still get the same result when multiplying them together.
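The scaling invariance can be checked numerically. A minimal sketch with the same toy data, where fit_a and fit_b are illustrative names for fits whose weights differ by a factor of 2:

```r
yy <- c(5, 2, 8, 9)
xx <- 1:4
wgts <- c(3, 2, 4, 1)

fit_a <- lm(yy ~ xx, weights = wgts)      # original weights
fit_b <- lm(yy ~ xx, weights = wgts / 2)  # weights divided by 2

s_a <- summary(fit_a)
s_b <- summary(fit_b)

# Residual variance scales with the weights ...
s_b$sigma^2 / s_a$sigma^2            # 0.5

# ... the unscaled covariance scales inversely ...
s_b$cov.unscaled / s_a$cov.unscaled  # every entry is 2

# ... so their product, the covariance of the coefficients, is unchanged
all.equal(vcov(fit_a), vcov(fit_b))
```

Since vcov() is the product of sigma^2 and the unscaled covariance, standard errors, t values and prediction intervals for the mean are identical across the two fits.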
summary2$coef
summary3$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.128713 3.128697 0.6803832 0.5664609
#xx 1.683168 1.246503 1.3503125 0.3094222
pred2
pred3
#$fit
# fit lwr upr
#1 3.811881 -5.0008685 12.62463
#2 5.495050 -0.1299942 11.12009
#3 7.178218 0.6095820 13.74685
#4 8.861386 -1.7302209 19.45299
#
#$se.fit
# 1 2 3 4
#2.048213 1.307343 1.526648 2.461646
#
#$df
#[1] 2
#
#$residual.scale ## for `pred2`
#[1] 3.961448
#
#$residual.scale ## for `pred3`
#[1] 2.50544
Your second question is about the meaning of weights. Weights are used to model a heteroscedastic response, to overcome the leverage effect in ordinary least squares regression. Weights are proportional to the reciprocal variance: you give bigger weights to data with smaller expected errors. Weights can be non-integer, so they do not have a natural interpretation in terms of repeated data. Thus, what is written in the mgcv documentation is not rigorously correct.
The real difference between fit1 and fit2 is the degrees of freedom. Check the above table for (n - p). n is the number of data points you have, while p is the number of non-NA coefficients, so n - p is the residual degrees of freedom. For both models we have p = 2 (intercept and slope), but for fit1 we have n = 10 while for fit2 we have n = 4. This has a dramatic effect on inference, as the standard errors for coefficients and predictions (hence confidence intervals) now differ. These two models are far from being equivalent.
summary1$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.128713 1.5643486 1.360766 0.21068210
#xx2 1.683168 0.6232514 2.700625 0.02704784
summary2$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.128713 3.128697 0.6803832 0.5664609
#xx 1.683168 1.246503 1.3503125 0.3094222
pred1
#$fit
# fit lwr upr
#1 3.811881 1.450287 6.173475
#2 5.495050 3.987680 7.002419
#3 7.178218 5.417990 8.938446
#4 8.861386 6.023103 11.699669
#
#$se.fit
# 1 2 3 4
#1.0241066 0.6536716 0.7633240 1.2308229
#
#$df # note, this is `10 - 2 = 8`
#[1] 8
#
#$residual.scale
#[1] 1.980724
pred2
#$fit
# fit lwr upr
#1 3.811881 -5.0008685 12.62463
#2 5.495050 -0.1299942 11.12009
#3 7.178218 0.6095820 13.74685
#4 8.861386 -1.7302209 19.45299
#
#$se.fit
# 1 2 3 4
#2.048213 1.307343 1.526648 2.461646
#
#$df # note, this is `4 - 2 = 2`
#[1] 2
#
#$residual.scale ## for `pred2`
#[1] 3.961448
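The degrees-of-freedom explanation can be verified directly: both fits have the same weighted residual sum of squares, so their standard errors should differ only by the square root of the ratio of residual degrees of freedom. A minimal sketch with the same data, where se1 and se2 are illustrative names:

```r
yy <- c(5, 2, 8, 9)
xx <- 1:4
wgts <- c(3, 2, 4, 1)
yy2 <- rep(yy, wgts)
xx2 <- rep(xx, wgts)

fit1 <- lm(yy2 ~ xx2)                # replicated data, n = 10
fit2 <- lm(yy ~ xx, weights = wgts)  # weighted data,   n = 4

se1 <- coef(summary(fit1))[, "Std. Error"]
se2 <- coef(summary(fit2))[, "Std. Error"]

# Residual degrees of freedom: 8 versus 2
c(df.residual(fit1), df.residual(fit2))

# Rescaling fit2's standard errors by sqrt(2 / 8) reproduces fit1's
se2 * sqrt(df.residual(fit2) / df.residual(fit1))
se1
```

So if the weights really are replication counts, the weighted fit can be converted to the replicated-data inference by this rescaling together with the larger df in the t quantiles.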

lmer linear contrasts : Kenward Rogers or Satterthwaite DF and SE

In R, I am searching for a way to estimate confidence intervals for linear contrasts in lmer models, using either Kenward-Roger or Satterthwaite degrees of freedom and SE.
For example, I can compute a CI for a fixed-effect parameter in a mixed model in R, as SAS does, using the t-distribution (with df from KR) and the SE.
mod<-lmerTest::lmer(y~time1+treatment+time1:treatment+(1|PersonID),data=data)
lmerTest::summary(mod,ddf = "Kenward-Roger")
This output:
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 49.0768 1.0435 56.4700 47.029 < 2e-16 ***
time1 5.8224 0.5963 48.0000 9.764 5.51e-13 ***
treatment 1.6819 1.4758 56.4700 1.140 0.2592
time1:treatment 2.0425 0.8433 48.0000 2.422 0.0193 *
Allows a CI for time1 like:
5.8224+abs(qt(0.05/2, 48))*0.5963 #7.021342
5.8224-abs(qt(0.05/2, 48))*0.5963 #4.623458
I would like to do this same thing for a linear contrast of the fixed coefficients. This is the p-value but there is no SE output.
pbkrtest::KRmodcomp(mod,matrix(c(0,0,1,0),nrow = 1))
stat ndf ddf F.scaling p.value
Ftest 1.2989 1.0000 56.4670 1 0.2592
Is there any way to get a SE or a CI from lmer linear contrasts that uses this type of df?
For this, you have at least two options: using the lsmeans package, or doing it manually (using functions vcovAdj.lmerMod and pbkrtest::get_Lb_ddf). Personally, I go with the latter if the contrast to be tested is not very "simple", because I find the syntax in lsmeans a bit complicated.
To exemplify, take the following model:
library(pbkrtest)
library(lme4)
library(nlme) # for the 'Orthodont' data
# 'age' is a numeric variable, while 'Sex' and 'Subject' are factors
model <- lmer(distance ~ age : Sex + (1 | Subject), data = Orthodont)
Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ age:Sex + (1 | Subject)
…
Fixed Effects:
(Intercept) age:SexMale age:SexFemale
16.7611 0.7555 0.5215
from which we would like to obtain stats on the difference between the coefficients for age in males and females (i.e., age:SexMale - age:SexFemale).
Using lsmeans:
library(lsmeans)
# Evaluate the contrast at a value of 'age' set to 1,
# so that the resulting value is equal to the regression coefficient
lsm = lsmeans(model, pairwise ~ age : Sex, at = list(age = 1))$contrast
produces:
contrast estimate SE df t.ratio p.value
1,Male - 1,Female 0.2340135 0.06113276 42.64 3.828 0.0004
Alternatively, doing the calculation manually:
# Specify the contrasts: age:SexMale - age:SexFemale
# Must have the same order as the fixed effects in the model
K = c("(Intercept)" = 0, "age:SexMale" = 1, "age:SexFemale" = -1)
# Retrieve the adjusted variance-covariance matrix, to calculate the SE
V = pbkrtest::vcovAdj.lmerMod(model, 0)
# Point estimate, SE and df
point_est = sum(K * fixef(model))
SE = sqrt(sum(K * (V %*% K)))
df = pbkrtest::get_Lb_ddf(model, K)
alpha = 0.05 # significance level
# Calculate confidence interval for the difference between the 'age' coefficients for males and females
Delta_age_CI = point_est + SE * qt(c(0.5 * alpha, 1 - 0.5 * alpha), df)
will result in a point estimate equal to 0.2340135, SE 0.06113276, df 42.63844, and confidence interval [0.1106973, 0.3573297]
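The arithmetic of the manual route can be wrapped in a small helper. A minimal sketch in base R, with an ordinary lm on the built-in warpbreaks data standing in for the mixed model, so the covariance matrix is plain vcov() and the df is the residual df rather than the Kenward-Roger values. The function name contrast_ci is hypothetical:

```r
# Hypothetical helper: point estimate, SE and CI for the contrast K' beta
contrast_ci <- function(K, beta, V, df, alpha = 0.05) {
  est <- sum(K * beta)
  se  <- sqrt(drop(t(K) %*% V %*% K))
  ci  <- est + se * qt(c(alpha / 2, 1 - alpha / 2), df)
  c(estimate = est, SE = se, lower = ci[1], upper = ci[2])
}

# Stand-in model (assumption: lm instead of lmer)
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

# Contrast: tensionM - tensionH, in the order of coef(fit)
K <- c("(Intercept)" = 0, woolB = 0, tensionM = 1, tensionH = -1)

res <- contrast_ci(K, coef(fit), vcov(fit), df.residual(fit))
res
```

For the lmer case, the same helper could be fed the adjusted covariance matrix V and the df computed with pbkrtest as in the answer above, in place of vcov(fit) and df.residual(fit).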
