How to compute reliability estimates from lmer / lme results? - r

Disclaimer: Sorry for including images. I tried writing formulas in markdown format but failed
I'm coming from a commercial MLM (HLM7) software and would like to replicate some numbers in R.
Specifically i'm looking for a function or formula computing the reliability for the least squares estimates of each level1 coefficient across the set of J level-2 units
Below is an example based on the simple sleepstudy data. What I'm looking for is a way to compute reliability values not only in this very example, but also in situations where there are more level1 variables.
From HLM7 manual (Raudenbush, Bryk (2002), p.11) a definition of reliability is given:
Equation 3.58 in Hierarchical Linear Models (2nd ed.)
which is followed by the notion that:
I used the sleepstudy data from lme4 package to compute a random intercept and slope model with lme4::lmer:
library(lme4)
m <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
summary(m)
And with HLM7 software
Fixed and random effects estimates are pretty similar (differences in rounding occur), but HLM7 will also provide it's reliability estimates:
----------------------------------------------------
Random level-1 coefficient Reliability estimate
----------------------------------------------------
INTRCPT1, G0 0.730
DAYS, G1 0.815
----------------------------------------------------
And this is something I'd like to be able to get from lmer() results.
Any ideas?
Thanks a lot

I needed to include reliability measures for a school project and could not find any package that provided these values. Thus, I had to create them via dplyr:
parameter variance/(parameter variance + (residual variance/n clusters)). Hope this helps should anyone else seek the answer to this question.

Related

Longitudinal analysis using sampling weigths in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a parameter weights, but I'm not sure if it treats them as sampling weights. Still, this function can't support multinomial family.
So, to resume: can you enlighten me on how to do a pre-post test analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with variables used to calculate the weights as predictors. It happens that glmer returns a lot of problems (convergence, high eigenvalues...), so I give another look at #ThomasLumley answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now if a can use participants id as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.875e+01 1.000e+00 18.746 0.0339 *
groupWoman -1.903e+01 1.536e+00 -12.394 0.0513 .
timePre 5.443e-09 5.443e-09 1.000 0.5000
groupWoman:timePre 2.877e-01 1.143e+00 0.252 0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs) I would recommend just using svy_glm.
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.

Can emmeans apply sphericity corrections to repeated-measures contrasts?

I am analysing a 2-way repeated-measures dataset, modelled as follows:
model= lmer(result ~ treatment * time + (1|subject), data=df)
... where every subject receives every treatment and is tested at every time. However when analysing the contrasts, there appears to be no correction for sphericity. Here I am using emmeans to test for a difference between each treatment and the control-treatment, at each level of "time"...
emm <- emmeans(model,c("treatment","time")
contrast(emm, "trt.vs.ctrl", ref="Control", by="time")
When I look at the output from contrast(), I confirmed there is no G-G correction by comparing with the output from GraphPad Prism for the same dataset.
Is there a simple way of achieving a sphericity-corrected analysis of the contrasts?
Thank's to the commentors for identifying this solution. the afex() packages is specifically designed for repeated measures factorial designs, and allows the appropriate corrections.
aov_ez (in the afex package) automatically applies corrections for non-sphericity
emmeans should specify that the model is multivariate
contrast (in the emmeans package) should specify the appropriate adjustment to "p"
The solution therefore would look like this...
library(afex)
library(emmeans)
model= aov_ez("subject","result",df,within=c("treatment","time"),type="III")
emm= emmeans(model,c("treatment","time"),model="multivariate")
contrast(emm,"trt.vs.ctrl",ref="Control",by="time",adjust="dunn")
Hope this is helpful for anyone else with a similar question!

Consider autocorrelation in a Linear Quantile mixed models (LQMM)

(I am using R and the lqmm package)
I was wondering how to consider autocorrelation in a Linear Quantile mixed models (LQMM).
I have a data frame that looks like this:
df1<-data.frame( Time=seq(as.POSIXct("2017-11-13 00:00:00",tz="UTC"),
as.POSIXct("2017-11-13 00:1:59",tz="UTC"),"sec"),
HeartRate=rnorm(120, mean=60, sd=10),
Treatment=rep("TreatmentA",120),
AnimalID=rep("ID01",120),
Experiment=rep("Exp01",120))
df2<-data.frame( Time=seq(as.POSIXct("2017-08-11 00:00:00",tz="UTC"),
as.POSIXct("2017-08-11 00:1:59",tz="UTC"),"sec"),
HeartRate=rnorm(120, mean=62, sd=14),
Treatment=rep("TreatmentB",120),
AnimalID=rep("ID02",120),
Experiment=rep("Exp02",120))
df<-rbind(df1,df2)
head(df)
With:
The heart rates (HeartRate) that are measured every second on some animals (AnimalID). These measures are carried during an experiment (Experiment) with different treatment possible (Treatment). Each animal (AnimalID) was observed for multiple experiments with different treatments. I wish to look at the effect of the variable Treatment on the 90th percentile of the Heart Rates but including Experiment as a random effect and consider the autocorrelation (as heart rates are taken every second). (If there is a way to include AnimalID as random effect as well it would be even better)
Model for now:
library(lqmm)
model<-lqmm(fixed= HeartRate ~ Treatment, random= ~1| Exp01, data=df, tau=0.9)
Thank you very much in advance for your help.
Let me know if you need more information.
For resources on thinking about this type of problem you might look at chapters 17 and 19 of Koenker et al. 2018 Handbook of Quantile Regression from CRC Press. Neither chapter has nice R code to go from, but they discuss different approaches to the kind of data you're working with. lqmm does use nlme machinery, so there may be a way to customize the covariance matrices for the random effects, but I suspect it would be easiest to either ask for help from the package author or to do a deep dive into the package code to figure out how to do that.
Another resource is the quantile regression model for mixed effects models accounting for autocorrelation in 'Quantile regression for mixed models with an application to examine blood pressure trends in China' by Smith et al. (2015). They model a bivariate response with a copula, but you could do the simplified version with univariate response. I think their model only at this points incorporates lag-1 correlation structure within subjects/clusters. The code for that model does not seem to be available online either though.

Variable selection methods

I have been doing variable selection for a modeling problem.
I have used trial and error for the selection (adding / removing a variable) with a decrease in error. However, I have the challenge as the number of variables grows into the hundreds that manual variable selection can not be performed as the model takes 1/2 hour to compute, rendering the task impossible.
Would you happen to know of any other packages than the regsubsets from the leaps package (which when tested with the same trial and error variables produced a higher error, it did not include some variables which were lineraly dependant - excluding some valuable variables).
You need a better (i.e. not flawed) approach to model selection. There are plenty of options, but one that should be easy to adapt to your situation would be using some form of regularization, such as the Lasso or the elastic net. These apply shrinkage to the sizes of the coefficients; if a coefficient is shrunk from its least squares solution to zero, that variable is removed from the model. The resulting model coefficients are slightly biased but they have lower variance than the selected OLS terms.
Take a look at the lars, glmnet, and penalized packages
Try using the stepAIC function of the MASS package.
Here is a really minimal example:
library(MASS)
data(swiss)
str(swiss)
lm <- lm(Fertility ~ ., data = swiss)
lm$coefficients
## (Intercept) Agriculture Examination Education Catholic
## 66.9151817 -0.1721140 -0.2580082 -0.8709401 0.1041153
## Infant.Mortality
## 1.0770481
st1 <- stepAIC(lm, direction = "both")
st2 <- stepAIC(lm, direction = "forward")
st3 <- stepAIC(lm, direction = "backward")
summary(st1)
summary(st2)
summary(st3)
You should try the 3 directions and ckeck which model works better with your test data.
Read ?stepAIC and take a look at the examples.
EDIT
It's true stepwise regression isn't the greatest method. As it's mentioned in GavinSimpson answer, lasso regression is a better/much more efficient method. It's much faster than stepwise regression and will work with large datasets.
Check out the glmnet package vignette:
http://www.stanford.edu/~hastie/glmnet/glmnet_alpha.html

How to get coefficients and their confidence intervals in mixed effects models?

In lm and glm models, I use functions coef and confint to achieve the goal:
m = lm(resp ~ 0 + var1 + var1:var2) # var1 categorical, var2 continuous
coef(m)
confint(m)
Now I added random effect to the model - used mixed effects models using lmer function from lme4 package. But then, functions coef and confint do not work any more for me!
> mix1 = lmer(resp ~ 0 + var1 + var1:var2 + (1|var3))
# var1, var3 categorical, var2 continuous
> coef(mix1)
Error in coef(mix1) : unable to align random and fixed effects
> confint(mix1)
Error: $ operator not defined for this S4 class
I tried to google and use docs but with no result. Please point me in the right direction.
EDIT: I was also thinking whether this question fits more to https://stats.stackexchange.com/ but I consider it more technical than statistical, so I concluded it fits best here (SO)... what do you think?
Not sure when it was added, but now confint() is implemented in lme4.
For example the following example works:
library(lme4)
m = lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
confint(m)
There are two new packages, lmerTest and lsmeans, that can calculate 95% confidence limits for lmer and glmer output. Maybe you can look into those? And coefplot2, I think can do it too (though as Ben points out below, in a not so sophisticated way, from the standard errors on the Wald statistics, as opposed to Kenward-Roger and/or Satterthwaite df approximations used in lmerTest and lsmeans)... Just a shame that there are still no inbuilt plotting facilities in package lsmeans (as there are in package effects(), which btw also returns 95% confidence limits on lmer and glmer objects but does so by refitting a model without any of the random factors, which is evidently not correct).
I suggest that you use good old lme (in package nlme). It has confint, and if you need confint of contrasts, there is a series of choices (estimable in gmodels, contrast in contrasts, glht in multcomp).
Why p-values and confint are absent in lmer: see http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76742.html .
Assuming a normal approximation for the fixed effects (which confint would also have done), we can obtain 95% confidence intervals by
estimate + 1.96*standard error.
The following does not apply to the variance components/random effects.
library("lme4")
mylm <- lmer(Reaction ~ Days + (Days|Subject), data =sleepstudy)
# standard error of coefficient
days_se <- sqrt(diag(vcov(mylm)))[2]
# estimated coefficient
days_coef <- fixef(mylm)[2]
upperCI <- days_coef + 1.96*days_se
lowerCI <- days_coef - 1.96*days_se
I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):
fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4)
you can get the full coefficient table with coef(summary(m)); if you have loaded lmerTest before fitting the model, or convert the model after fitting (and then loading lmerTest) via coef(summary(as(m,"merModLmerTest"))), then the coefficient table will include p-values. (The coefficient table is a matrix; you can extract the columns via e.g. ctab[,"Estimate"], ctab[,"Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
As stated above you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. If you use confint(m, method="Wald") you'll get the standard +/- 1.96SE confidence intervals. (lme uses intervals(m) instead of confint().)
If you prefer to use broom.mixed:
tidy(m,effects="fixed") gives you a table with estimates, standard errors, etc.
tidy(as(m,"merModLmerTest"), effects="fixed") (or fitting with lmerTest in the first place) includes p-values
adding conf.int=TRUE gives (Wald) CIs
adding conf.method="profile" (along with conf.int=TRUE) gives likelihood profile CIs
You can also get confidence intervals by parametric bootstrap (method="boot"), which is considerably slower but more accurate in some circumstances.
To find the coefficient, you can simply use the summary function of lme4
m = lm(resp ~ 0 + var1 + var1:var2) # var1 categorical, var2 continuous
m_summary <- summary(m)
to have all coefficients :
m_summary$coefficient
If you want the confidence interval, multiply the standart error by 1.96:
CI <- m_summary$coefficient[,"Std. Error"]*1.96
print(CI)
I'd suggest tab_model() function from sjPlot package as alternative. Clean and readable output ready for markdown. Reference here and examples here.
For those more visually inclined plot_model() from the same package might come handy too.
Alternative solution is via parameters package using model_parameters() function.

Resources