How to get coefficients and their confidence intervals in mixed effects models? - r

In lm and glm models, I use functions coef and confint to achieve the goal:
m = lm(resp ~ 0 + var1 + var1:var2) # var1 categorical, var2 continuous
coef(m)
confint(m)
Now I have added a random effect to the model, i.e. I switched to a mixed effects model fitted with the lmer function from the lme4 package. But now the functions coef and confint no longer work for me:
> mix1 = lmer(resp ~ 0 + var1 + var1:var2 + (1|var3))
# var1, var3 categorical, var2 continuous
> coef(mix1)
Error in coef(mix1) : unable to align random and fixed effects
> confint(mix1)
Error: $ operator not defined for this S4 class
I tried to Google and read the docs, but with no result. Please point me in the right direction.
EDIT: I was also thinking whether this question fits more to https://stats.stackexchange.com/ but I consider it more technical than statistical, so I concluded it fits best here (SO)... what do you think?

Not sure when it was added, but confint() is now implemented in lme4.
For example, the following works:
library(lme4)
m = lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
confint(m)

There are two newer packages, lmerTest and lsmeans, that can calculate 95% confidence limits for lmer and glmer output; maybe you can look into those. I think coefplot2 can do it too, though (as Ben points out below) in a less sophisticated way, from the standard errors on the Wald statistics, as opposed to the Kenward-Roger and/or Satterthwaite df approximations used in lmerTest and lsmeans. It is just a shame that there are still no built-in plotting facilities in the lsmeans package (as there are in the effects package, which by the way also returns 95% confidence limits on lmer and glmer objects, but does so by refitting a model without any of the random factors, which is evidently not correct).
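For instance, a minimal sketch of the lmerTest route, using the sleepstudy data that ships with lme4 (this example is mine, not the original poster's):
library(lme4)
library(lmerTest)   # re-fits lmer models with Satterthwaite df and p-values
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
coef(summary(m))    # fixed-effect table now includes df, t values and p-values
confint(m)          # profile confidence intervals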

I suggest that you use good old lme (in package nlme). It provides confidence intervals via intervals(), and if you need intervals for contrasts, there is a series of choices (estimable in gmodels, contrast in the contrast package, glht in multcomp).
Why p-values and confint are absent from lmer: see http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76742.html .
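A rough sketch of that route, reusing the sleepstudy example from above (my illustration, not part of the original answer):
library(nlme)
data(sleepstudy, package = "lme4")
m_lme <- lme(Reaction ~ Days, random = ~ Days | Subject, data = sleepstudy)
intervals(m_lme, which = "fixed")   # confidence intervals for the fixed effects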

Assuming a normal approximation for the fixed effects (which confint would also have done), we can obtain 95% confidence intervals as
estimate ± 1.96 * standard error.
The following does not apply to the variance components/random effects.
library("lme4")
mylm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# standard error of coefficient
days_se <- sqrt(diag(vcov(mylm)))[2]
# estimated coefficient
days_coef <- fixef(mylm)[2]
upperCI <- days_coef + 1.96*days_se
lowerCI <- days_coef - 1.96*days_se
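As a quick cross-check (my addition, not part of the original answer), lme4's built-in Wald method should give essentially the same interval:
confint(mylm, parm = "Days", method = "Wald")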

I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):
fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4)
you can get the full coefficient table with coef(summary(m)); if you have loaded lmerTest before fitting the model, or convert the model after fitting (and then loading lmerTest) via coef(summary(as(m,"merModLmerTest"))), then the coefficient table will include p-values. (The coefficient table is a matrix; you can extract the columns via e.g. ctab[,"Estimate"], ctab[,"Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
As stated above you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. If you use confint(m, method="Wald") you'll get the standard +/- 1.96SE confidence intervals. (lme uses intervals(m) instead of confint().)
If you prefer to use broom.mixed:
tidy(m,effects="fixed") gives you a table with estimates, standard errors, etc.
tidy(as(m,"merModLmerTest"), effects="fixed") (or fitting with lmerTest in the first place) includes p-values
adding conf.int=TRUE gives (Wald) CIs
adding conf.method="profile" (along with conf.int=TRUE) gives likelihood profile CIs
You can also get confidence intervals by parametric bootstrap (method="boot"), which is considerably slower but more accurate in some circumstances.
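A minimal sketch putting those pieces together, assuming the sleepstudy data from lme4 (my example, not the answerer's):
library(lmerTest)     # fitting with lmerTest so the coefficient table carries p-values
library(broom.mixed)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
tidy(m, effects = "fixed", conf.int = TRUE)                            # Wald CIs
tidy(m, effects = "fixed", conf.int = TRUE, conf.method = "profile")   # profile CIs
confint(m, method = "boot", nsim = 200)                                # parametric bootstrap CIs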

To find the coefficients, you can simply use the summary() function, which also works on lme4 fits:
m <- lmer(resp ~ 0 + var1 + var1:var2 + (1 | var3))  # the mixed model from the question
m_summary <- summary(m)
To get all coefficients:
m_summary$coefficients
If you want a (Wald) confidence interval, take the estimate plus or minus 1.96 times the standard error:
est <- m_summary$coefficients[, "Estimate"]
se  <- m_summary$coefficients[, "Std. Error"]
CI  <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
print(CI)

I'd suggest the tab_model() function from the sjPlot package as an alternative. Clean and readable output, ready for markdown. Reference here and examples here.
For those more visually inclined, plot_model() from the same package might come in handy too.
An alternative solution is the parameters package, using the model_parameters() function.
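A short sketch of these alternatives with the sleepstudy data (the calls are as documented for sjPlot and parameters, but treat this as illustrative):
library(lme4)
library(sjPlot)
library(parameters)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
tab_model(m)           # HTML table with estimates and 95% CIs
plot_model(m)          # forest plot of the fixed effects
model_parameters(m)    # coefficient table with CIs, returned as a data frame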

Related

How to obtain R^2 for robust mixed effect model (rlmer command; robustlmm)?

I estimated a robust mixed effect model with the rlmer command from the robustlmm package. Is there a way to obtain the marginal and conditional R^2 values?
Just going to answer that myself. I could not find a package, or rather a function, in R that handles rlmerMod objects the way e.g. r.squaredGLMM handles lmerMod objects, but I found a quick workaround. Basically you just have to extract the variance components for the fixed effects, random effects and residuals and then manually calculate the marginal and conditional R^2 based on the formula provided by Nakagawa & Schielzeth (2013).
library(robustlmm)
library(insight)
library(lme4)
data(Dyestuff, package = "lme4")
robust.model <- rlmer(Yield ~ 1 + (1 | Batch), data = Dyestuff)
# extract the variance components with insight's helpers
var.fix <- get_variance_fixed(robust.model)      # fixed-effects variance
var.ran <- get_variance_random(robust.model)     # random-effects variance
var.res <- get_variance_residual(robust.model)   # residual variance
# marginal and conditional R^2 (Nakagawa & Schielzeth 2013)
R2m <- var.fix / (var.fix + var.ran + var.res)
R2c <- (var.fix + var.ran) / (var.fix + var.ran + var.res)
Literature:
Nakagawa, S. and Schielzeth, H. (2013), A general and simple method for obtaining R2 from generalized linear mixed‐effects models. Methods Ecol Evol, 4: 133-142. doi:10.1111/j.2041-210x.2012.00261.x

How do I obtain the solution for the random effects using the lme4 package?

I have a model similar to this:
model=lmer(y ~ (1|ID) + Factor.A + Factor.B, data=df)
I would like to obtain the solution for the random effects, but I could only obtain the solution for the fixed effects, using these calls:
coef(summary(model))
summary(model)
I tried this code too:
coef(model)
but I suppose this output is not the solution for the random effects. Is there a way to obtain the solution for the random effects using the lme4 package or another one?
Using only the lme4 package, you can most conveniently get the conditional modes along with the conditional standard deviations via as.data.frame(ranef(fitted_model)):
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
as.data.frame(ranef(fm1))
## grpvar term grp condval condsd
## 1 Subject (Intercept) 308 2.2575329 12.070389
## 2 Subject (Intercept) 309 -40.3942719 12.070389
## 3 Subject (Intercept) 310 -38.9563542 12.070389
## ... etc.
I'm not sure I would be comfortable calling these "standard errors" - there's a whole can of worms here about what kind of inferences you can make on the observed conditional values of random variables ... According to Doug Bates:
Regarding the terminology, I prefer to call the quantities that are
returned by the ranef extractor "the conditional modes of the random
effects". If you want to be precise, these are the conditional modes
(for a linear mixed model they are also the conditional means) of the
random effects B given Y = y, evaluated at the parameter estimates.
One can also evaluate the conditional variance-covariance of B given Y
= y and hence obtain a prediction interval.
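For example, a rough sketch of such an interval built from the conditional standard deviations shown above (my addition; a plus/minus 1.96 * condsd band, with the usual caveats about inference on conditional modes):
re <- as.data.frame(ranef(fm1))
re$lower <- re$condval - 1.96 * re$condsd
re$upper <- re$condval + 1.96 * re$condsd
head(re)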
I think clearly stating your question and what you are trying to do would be helpful. However, based on the comments, I think I know what you are trying to do.
As @Marius said, ranef(model) will give you the (random) intercepts.
The package arm has a se.ranef function that gives you "standard errors". I am not sure how these are calculated. See this link to make sure that it is doing what you want it to:
https://rdrr.io/cran/arm/man/se.coef.html
So all together:
library(lme4)
model <- lmer(y ~ (1|ID) + Factor.A + Factor.B, data = df)
ranef(model)      # conditional modes ("solutions") of the random effects
library(arm)
se.ranef(model)   # approximate standard errors for those modes

How to get 95% CIs using ezANOVA()

This is a programming question for people who like to use the ez package in R. I am accustomed to using linear mixed effects models with lmer(). Among the useful outputs of lmer(), I get a coefficient value for each of my experimental factors, and using pvals.fnc() I can easily get 95% confidence intervals (CI) to report together with the model coefficients.
I have recently started using ezANOVA, and I would like to know: Is there a mainstream way to get the same output? That is, I'd like to get a value for the coefficient of an experimental factor and a CI to go along with it. Here is sample code to make this concrete:
library(languageR) #necessary to use pvals.fnc()
library(lme4) #necessary for lmer()
library(ez) #necessary for ezANOVA
data(ANT) #load sample data
If I were using lmer, I would estimate my model and then get 95% CIs for the coefficients:
model_lmer = lmer( formula = rt ~ cue*flank + (1|subnum), data = ANT)
pvals.fnc(model_lmer, withMCMC=T)$fixed
So, for example, I know that the estimate of the interaction between cue and flank (when cue has the level "center" and flank has the level "congruent") is -3.9511 and the 95% CI is [-12.997, 5.535]
Now say that I want to run an anova by-subjects and by-items using ezANOVA, and I want to get 95% CIs for the by-subject estimates. This is my model:
model.f1 = ezANOVA(data=ANT, dv=rt, wid=subnum, within=.(cue,flank), return_aov=T)
But in the output, I don't see the model estimates when I do:
model.f1$ANOVA
And I don't know how to calculate the 95% CIs corresponding to those estimates. I think I should be able to use ezBoot() but I tried and I'm not sure how to implement it.
Any suggestions? Thanks for your help!
This answer was provided by the author of the ez package in another forum. I'm copying it here in case someone else finds it useful:
"One somewhat hacky way to get CIs for effects is to use ezStats() to get the means and FLSD, compute the difference between the means to get the effect, and divide the FLSD by sqrt(2) to get the CI."
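A hedged sketch of that workaround with the ANT data used above (the call follows ez's documented interface, but verify the column names on your installed version):
library(ez)
library(languageR)
data(ANT)
stats <- ezStats(data = ANT, dv = rt, wid = subnum, within = .(cue, flank))
stats
# the difference between two condition means estimates an effect;
# FLSD / sqrt(2) gives an approximate half-width for its 95% CI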

Logistic Regression Using R

I am running logistic regressions using R right now, but I cannot seem to get many useful model fit statistics. I am looking for metrics similar to SAS:
http://www.ats.ucla.edu/stat/sas/output/sas_logit_output.htm
Does anyone know how (or what packages) I can use to extract these stats?
Thanks
Here's a Poisson regression example:
## from ?glm:
d.AD <- data.frame(counts = c(18,17,15,20,10,20,25,13,12),
                   outcome = gl(3,1,9),
                   treatment = gl(3,3))
glm.D93 <- glm(counts ~ outcome + treatment,data = d.AD, family=poisson())
Now define a function to fit an intercept-only model with the same response, family, etc., compute summary statistics, and combine them into a table (matrix). The formula .~1 in the update command below means "refit the model with the same response variable [denoted by the dot on the LHS of the tilde] but with only an intercept term [denoted by the 1 on the RHS of the tilde]"
glmsumfun <- function(model) {
  glm0 <- update(model, . ~ 1)  ## refit with intercept only
  ## apply the built-in logLik (log-likelihood), AIC, and
  ## BIC (Bayesian/Schwarz Information Criterion) functions
  ## to the models with and without predictors ('model' and 'glm0');
  ## combine the results in a two-column matrix with appropriate
  ## row and column names
  matrix(c(logLik(model), BIC(model), AIC(model),
           logLik(glm0), BIC(glm0), AIC(glm0)), ncol = 2,
         dimnames = list(c("logLik", "SC", "AIC"), c("full", "intercept_only")))
}
Now apply the function:
glmsumfun(glm.D93)
The results:
            full intercept_only
logLik -23.38066      -26.10681
SC      57.74744       54.41085
AIC     56.76132       54.21362
EDIT:
anova(glm.D93,test="Chisq") gives a sequential analysis of deviance table containing df, deviance (=-2 log likelihood), residual df, residual deviance, and the likelihood ratio test (chi-squared test) p-value.
drop1(glm.D93) gives a table with the AIC values (df, deviances, etc.) for each single-term deletion; drop1(glm.D93,test="Chisq") additionally gives the LRT test p value.
Certainly glm with a family="binomial" argument is the function most commonly used for logistic regression. The default handling of contrasts of factors is different. R uses treatment contrasts and SAS (I think) uses sum contrasts. You can look these technical issues up on R-help. They have been discussed many, many times over the last ten+ years.
I see Greg Snow mentioned lrm in 'rms'. It has the advantage of being supported by several other functions in the 'rms' suite of methods. I would use it too, but learning the rms package may take some additional time. I didn't see an option that would create SAS-like output.
If you want to compare the packages on similar problems, the UCLA Statistical Computing pages have another resource: http://www.ats.ucla.edu/stat/r/dae/default.htm , where a large number of methods are exemplified in SPSS, SAS, Stata and R.
Using the lrm function in the rms package may give you the output that you are looking for.
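For example, a minimal sketch of lrm (my illustration; the mtcars variables are just stand-ins for a real binary outcome):
library(rms)
fit <- lrm(am ~ mpg + wt, data = mtcars)
fit    # printing shows the likelihood-ratio chi-square, pseudo R2, C statistic, etc.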

Heteroscedasticity robust standard errors with the PLM package

I am trying to learn R after using Stata and I must say that I love it. But now I am having some trouble. I am about to do some multiple regressions with Panel Data so I am using the plm package.
Now I want to get the same results with plm in R as I get with the lm function and with Stata when I perform a heteroscedasticity-robust, entity fixed-effects regression.
Let's say that I have a panel dataset with the variables Y, ENTITY, TIME, V1.
I get the same standard errors in R with this code
lm.model<-lm(Y ~ V1 + factor(ENTITY), data=data)
coeftest(lm.model, vcov.=vcovHC(lm.model, type="HC1"))
as when I perform this regression in Stata
xi: reg Y V1 i.ENTITY, robust
But when I perform this regression with the plm package I get other standard errors
plm.model<-plm(Y ~ V1, index=c("ENTITY","YEAR"), model="within", effect="individual", data=data)
coeftest(plm.model, vcov.=vcovHC(plm.model, type="HC1"))
Have I missed setting some options?
Does the plm model use some other kind of estimation and if so how?
Can I in some way have the same standard errors with plm as in Stata with , robust
By default the plm package does not use the exact same small-sample correction for panel data as Stata. However in version 1.5 of plm (on CRAN) you have an option that will emulate what Stata is doing.
plm.model<-plm(Y ~ V1, index=c("ENTITY","YEAR"), model="within",
               effect="individual", data=data)
coeftest(plm.model, vcov.=function(x) vcovHC(x, type="sss"))
This should yield the same clustered by group standard-errors as in Stata (but as mentioned in the comments, without a reproducible example and what results you expect it's harder to answer the question).
For more discussion on this and some benchmarks of R and Stata robust SEs see Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R.
See also:
Clustered standard errors in R using plm (with fixed effects)
Is it possible that your Stata code is different from what you are doing with plm?
plm's "within" option with "individual" effects means a model of the form:
yit = a + Xit*B + eit + ci
What plm does is demean the data so that ci drops from the equation:
yit_bar = Xit_bar*B + eit_bar
where the "bar" suffix means that each variable has had its mean subtracted. The mean is calculated over time within each individual, which is why the effect is for the individual. You could also have a fixed time effect that would be common to all individuals, in which case the effect would be through time as well (that is irrelevant in this case though).
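A tiny sketch of that within (demeaning) transformation, using the Grunfeld data that ships with plm (my illustration; the point estimates match, the standard errors differ because of the degrees of freedom):
library(plm)
data(Grunfeld, package = "plm")
within.fit <- plm(inv ~ value, data = Grunfeld, index = c("firm", "year"), model = "within")
coef(within.fit)
# the same slope from manually demeaning y and x within each firm
gd <- transform(Grunfeld,
                inv_bar   = ave(inv, firm),
                value_bar = ave(value, firm))
coef(lm(I(inv - inv_bar) ~ 0 + I(value - value_bar), data = gd))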
I am not sure what the xi command does in Stata, but I think it expands an interaction, right? Then it seems to me that you are trying to use one dummy variable per ENTITY, as was highlighted by @richardh.
For your Stata and plm codes to match you must be using the same model.
You have two options: (1) you xtset your data in Stata and use xtreg with the fe modifier, or (2) you use plm with the pooling model and one dummy per ENTITY.
Matching Stata to R:
xtset entity year
xtreg y v1, fe robust
Matching plm to Stata:
plm(Y ~ V1 + as.factor(ENTITY), index=c("ENTITY","YEAR"), model="pooling", effect="individual", data=data)
Then use vcovHC with one of the modifiers. Make sure to check this paper that has a nice review of all the mechanics behind the "HC" options and the way they affect the variance covariance matrix.
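For completeness, a hedged sketch of option (2) with the Stata-style small-sample correction mentioned in the other answer (assuming the same Y, V1, ENTITY, YEAR panel data frame from the question):
library(plm)
library(lmtest)
pool.model <- plm(Y ~ V1 + as.factor(ENTITY), index = c("ENTITY", "YEAR"),
                  model = "pooling", data = data)
coeftest(pool.model, vcov. = function(x) vcovHC(x, type = "sss"))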
Hope this helps.
