Can emmeans apply sphericity corrections to repeated-measures contrasts? - r

I am analysing a 2-way repeated-measures dataset, modelled as follows:
model= lmer(result ~ treatment * time + (1|subject), data=df)
... where every subject receives every treatment and is tested at every time. However, when analysing the contrasts, there appears to be no correction for sphericity. Here I am using emmeans to test for a difference between each treatment and the control treatment, at each level of "time"...
emm <- emmeans(model, c("treatment", "time"))
contrast(emm, "trt.vs.ctrl", ref = "Control", by = "time")
Looking at the output from contrast(), I confirmed there is no G-G correction by comparing it with the output from GraphPad Prism for the same dataset.
Is there a simple way of achieving a sphericity-corrected analysis of the contrasts?

Thanks to the commenters for identifying this solution. The afex package is specifically designed for repeated-measures factorial designs, and allows the appropriate corrections.
aov_ez (in the afex package) automatically applies corrections for non-sphericity
emmeans should specify that the model is multivariate
contrast (in the emmeans package) should specify the appropriate adjustment to the p-values
The solution therefore would look like this...
library(afex)
library(emmeans)
model <- aov_ez("subject", "result", df, within = c("treatment", "time"), type = "III")
emm <- emmeans(model, c("treatment", "time"), model = "multivariate")
contrast(emm, "trt.vs.ctrl", ref = "Control", by = "time", adjust = "dunnettx")
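Note also that afex reports sphericity-corrected omnibus tests directly; as far as I can tell, you can request them like this (a sketch on the model above):
summary(model)                   # univariate within-subject tests with GG/HF corrections
anova(model, correction = "GG")  # Greenhouse-Geisser correction explicitly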
Hope this is helpful for anyone else with a similar question!

How to compute reliability estimates from lmer / lme results?

I'm coming from a commercial MLM package (HLM7) and would like to replicate some of its numbers in R.
Specifically, I'm looking for a function or formula that computes the reliability of the least-squares estimates of each level-1 coefficient across the set of J level-2 units.
Below is an example based on the simple sleepstudy data. What I'm looking for is a way to compute reliability values not only in this very example, but also in situations where there are more level-1 variables.
From the HLM7 manual (Raudenbush & Bryk, 2002, p. 11), a definition of reliability is given (Equation 3.58 in Hierarchical Linear Models, 2nd ed.): for each level-2 unit j,
$\lambda_{qj} = \mathrm{reliability}(\hat\beta_{qj}) = \tau_{qq} / (\tau_{qq} + V_{qj})$
which is followed by the notion that the reliability of each level-1 coefficient can be summarised by averaging across the J level-2 units:
$\lambda_q = \frac{1}{J} \sum_{j} \frac{\tau_{qq}}{\tau_{qq} + V_{qj}}$
I used the sleepstudy data from the lme4 package to fit a random-intercept and random-slope model with lme4::lmer:
library(lme4)
m <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
summary(m)
I fitted the same model with the HLM7 software.
The fixed- and random-effects estimates are pretty similar (differences in rounding occur), but HLM7 will also provide its reliability estimates:
----------------------------------------------------
 Random level-1 coefficient    Reliability estimate
----------------------------------------------------
 INTRCPT1, G0                  0.730
 DAYS, G1                      0.815
----------------------------------------------------
And this is something I'd like to be able to get from lmer() results.
Any ideas?
Thanks a lot
I needed to include reliability measures for a school project and could not find any package that provided these values, so I computed them myself (via dplyr) as:
parameter variance / (parameter variance + (residual variance / n per cluster))
Hope this helps should anyone else seek the answer to this question.
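Here is a minimal lme4-based sketch of that computation (the helper name mlm_reliability is mine, not from any package; it assumes the random-effect terms mirror the columns of the fixed-effects design, as in the sleepstudy model above):
library(lme4)

# Average reliability of the per-group least-squares estimates of each
# level-1 coefficient: lambda_q = mean_j[ tau_qq / (tau_qq + V_qj) ]
mlm_reliability <- function(model) {
  tau2   <- diag(VarCorr(model)[[1]])   # random-effect variances (tau_qq)
  sigma2 <- sigma(model)^2              # level-1 residual variance
  X      <- getME(model, "X")           # design matrix for the coefficients
  grp    <- getME(model, "flist")[[1]]  # level-2 grouping factor
  lambda <- sapply(levels(grp), function(g) {
    Xj <- X[grp == g, , drop = FALSE]
    Vj <- sigma2 * diag(solve(crossprod(Xj)))  # sampling variances of OLS estimates
    tau2 / (tau2 + Vj)
  })
  rowMeans(lambda)                      # average across the J level-2 units
}

m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
round(mlm_reliability(m), 3)
# should come close to HLM7's 0.730 (intercept) and 0.815 (Days)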

Variable selection methods

I have been doing variable selection for a modeling problem.
I have used trial and error for the selection (adding/removing a variable and checking whether the error decreases). However, as the number of variables grows into the hundreds, manual variable selection becomes impossible: the model takes half an hour to compute each time.
Do you know of any packages other than regsubsets from the leaps package? When I tested regsubsets against the same trial-and-error variables it produced a higher error; it dropped some variables that were linearly dependent, thereby excluding some valuable ones.
You need a better (i.e. not flawed) approach to model selection. There are plenty of options, but one that should be easy to adapt to your situation would be using some form of regularization, such as the Lasso or the elastic net. These apply shrinkage to the sizes of the coefficients; if a coefficient is shrunk from its least squares solution to zero, that variable is removed from the model. The resulting model coefficients are slightly biased but they have lower variance than the selected OLS terms.
Take a look at the lars, glmnet, and penalized packages.
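For example, a minimal lasso sketch with glmnet (the data here are simulated placeholders; substitute your own predictor matrix and response):
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)                # 100 obs, 20 candidate predictors
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(100)   # only the first 3 matter

cv <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 gives the lasso penalty
coef(cv, s = "lambda.1se")         # coefficients shrunk exactly to zero drop out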
Try using the stepAIC function of the MASS package.
Here is a really minimal example:
library(MASS)
data(swiss)
str(swiss)
fit <- lm(Fertility ~ ., data = swiss)   # avoid calling the object "lm"
fit$coefficients
##  (Intercept)  Agriculture  Examination    Education     Catholic
##   66.9151817   -0.1721140   -0.2580082   -0.8709401    0.1041153
## Infant.Mortality
##        1.0770481
st1 <- stepAIC(fit, direction = "both")
st2 <- stepAIC(fit, direction = "forward")
st3 <- stepAIC(fit, direction = "backward")
summary(st1)
summary(st2)
summary(st3)
You should try the 3 directions and check which model works best on your test data.
Read ?stepAIC and take a look at the examples.
EDIT
It's true that stepwise regression isn't the greatest method. As mentioned in GavinSimpson's answer, lasso regression is a better/much more efficient method. It's much faster than stepwise regression and will work with large datasets.
Check out the glmnet package vignette:
http://www.stanford.edu/~hastie/glmnet/glmnet_alpha.html

Pseudo R squared for cumulative link function

I have an ordinal dependent variable and am trying to use a number of independent variables to predict it. I use R. The function I use is clm in the ordinal package, which fits a cumulative link model, with a probit link to be precise.
I tried the function pR2 in the package pscl to get the pseudo R squared with no success.
How do I get pseudo R squareds with the clm function?
Thanks so much for your help.
There are a variety of pseudo-R^2 measures. I don't like to use any of them because I do not see the results as having a meaning in the real world. They do not estimate effect sizes of any sort and they are not particularly good for statistical inference. Furthermore, in situations like this with multiple observations per entity, I think it is debatable which value for "n" (the number of subjects) or which degrees of freedom are appropriate. Some people use McFadden's R^2, which would be relatively easy to calculate, since clm returns a list with one of its components named "logLik". You just need to know that the log-likelihood is only a multiplicative constant (-2) away from the deviance. If one had the model in the first example:
library(ordinal)
data(wine)
fm1 <- clm(rating ~ temp * contact, data = wine)
fm0 <- clm(rating ~ 1, data = wine)
( McF.pR2 <- 1 - fm1$logLik/fm0$logLik )
[1] 0.1668244
I had seen this question on CrossValidated and was hoping to see the more statistically sophisticated participants over there take this one on, but they saw it as a programming question and dumped it over here. Perhaps their opinion of R^2 as a worthwhile measure is as low as mine?
I recommend using the nagelkerke function from the rcompanion package to get pseudo R-squared values.
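For instance, applied to the wine model fitted above (a sketch; as far as I know nagelkerke accepts clm fits and reports McFadden, Cox and Snell, and Nagelkerke values):
library(ordinal)
library(rcompanion)

data(wine)
fm1 <- clm(rating ~ temp * contact, data = wine)
nagelkerke(fm1)   # pseudo-R-squared table plus a likelihood-ratio test vs. the null model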
When your predictor or outcome variables are categorical or ordinal, the R-squared will typically be lower than with truly numeric data. R-squared is only a very weak indicator of a model's fit, and you shouldn't choose a model based on it alone.

repeated measure anova using regression models (LM, LMER)

I would like to run a repeated-measures ANOVA in R using regression models instead of the 'Analysis of Variance' (aov) function.
Here is an example of my AOV code for 3 within-subject factors:
m.aov<-aov(measure~(task*region*actiontype) + Error(subject/(task*region*actiontype)),data)
Can someone give me the exact syntax to run the same analysis using regression models? I want to make sure to respect the independence of residuals, i.e. use specific error terms as with AOV.
In a previous post I read an answer of the type:
lmer(DV ~ 1 + IV1*IV2*IV3 + (IV1*IV2*IV3|Subject), dataset)
I am really not sure about this solution since it still treats variables as between subjects, and I don't understand how adding random factors would change this.
Does someone know how to run repeated measure anova with lm/lmer taking into account residual independence?
Many thanks,
Solene
I have some worked examples with more detail here: https://keithlohse.github.io/mixed_effects_models/lohse_MER_chapter_02.html
But if you want a mixed model that is homologous to your ANOVA, you can include a random intercept for subject plus one for each subject:factor combination involving your within-subject factors. E.g.,
aov(DV~W1*W2*W3 + Error(SUBJECT/(W1*W2*W3)),data)
has a mixed-model equivalent of:
lmer(DV ~
       # Fixed Effects
       W1*W2*W3 +
       # Random Effects
       (1|SUBJECT) + (1|W1:SUBJECT) + (1|W2:SUBJECT) + (1|W3:SUBJECT),
     data = DATA,
     REML = TRUE)
With REML set to TRUE and a balanced design, you should get degrees of freedom and F-values that are identical to your ANOVA. ML tends to underestimate variance components, so if you are comparing nested models and need to use ML, your results will not match precisely. If you are not comparing nested models and can use REML, then the ANOVA and the mixed model should match (again, in a balanced design).
In response to @skan's earlier answer and other ideas people might have: I am not saying this is THE random-effects structure (it might be more appropriate to include random slopes for W1 rather than random intercepts), but if you have one observation per subject:condition, these random effects produce an equivalent result.
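A toy check of that equivalence (simulated, balanced data with one observation per subject-condition cell; all names here are placeholders):
library(lme4)
library(lmerTest)  # adds F-tests with Satterthwaite df for lmer models

set.seed(1)
d <- expand.grid(SUBJECT = factor(1:12),
                 W1 = factor(1:2), W2 = factor(1:2), W3 = factor(1:3))
d$DV <- rnorm(12)[d$SUBJECT] + rnorm(nrow(d))  # subject effect + noise

# Classical repeated-measures ANOVA
summary(aov(DV ~ W1*W2*W3 + Error(SUBJECT/(W1*W2*W3)), data = d))

# Mixed-model counterpart; compare the F-tables for the main effects
m <- lmer(DV ~ W1*W2*W3 +
            (1|SUBJECT) + (1|W1:SUBJECT) + (1|W2:SUBJECT) + (1|W3:SUBJECT),
          data = d, REML = TRUE)
anova(m)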
If your aov example is right (maybe you don't want to nest things) you want this:
lmer(measure ~ (task*region*actiontype) + (1|subject/(task:region:actiontype)), data)
If by residual independence you mean that the intercept and slope should be estimated as independent (uncorrelated) random effects, you need to specify them separately:
+ (1|yourfactors) + (0 + variable|yourfactors)
or use the double-bar shorthand:
+ (variable||yourfactors)
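For instance, with the sleepstudy data that ships with lme4, these two specifications are equivalent:
library(lme4)

# uncorrelated intercept and slope, written out or with the || shorthand
m1 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)
m2 <- lmer(Reaction ~ Days + (Days || Subject), sleepstudy)
VarCorr(m1)   # same variance components as m2, with no intercept-slope correlation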
Anyway, if you read the help files you will find that lme4 can't deal with the most general problems.

How to do a Tukey HSD test with the Anova command (car package)

I'm dealing with an unbalanced design/sample and originally learned aov(). I now know that for my ANOVA tests I need to use Type III sums of squares, which involves fitting with lm() rather than aov().
The problem is getting post-hoc tests (specifically Tukey's HSD) using lm(). All the research I've done says that simint in the multcomp package would work, but now that the package has been updated, that command no longer seems to be available. It also seems to rely on going through aov() for the calculation.
Essentially all of the Tukey HSD tests I've found for R assume that you use aov() for the comparison rather than lm(). To get the Type III Sum of Squares I need for the unbalanced design I have to use:
mod<-lm(Snavg~StudentEthnicity*StudentGender)
Anova(mod, type="III")
How do I use a Tukey HSD test with my mod using lm()? Or conversely, calculate my ANOVA using Type III and still be able to run a Tukey HSD test?
Thanks!
Try HSD.test in agricolae
library(agricolae)
data(sweetpotato)
model<-lm(yield~virus, data=sweetpotato)
comparison <- HSD.test(model, "virus", group=TRUE,
                       main="Yield of sweetpotato\nDealt with different virus")
Output
Study: Yield of sweetpotato
Dealt with different virus
HSD Test for yield
Mean Square Error: 22.48917
virus, means
      yield  std.err replication
cc 24.40000 2.084067            3
fc 12.86667 1.246774            3
ff 36.33333 4.233727            3
oo 36.90000 2.482606            3
alpha: 0.05 ; Df Error: 8
Critical Value of Studentized Range: 4.52881
Honestly Significant Difference: 12.39967
Means with the same letter are not significantly different.
Groups, Treatments and means
a     oo    36.9
ab    ff    36.33333
bc    cc    24.4
c     fc    12.86667
As an initial note: unless it's been changed, to get the correct results for Type III sums of squares you need to set the contrast coding for the factor variables. This can be done inside the lm call or with options(). The example below uses options().
I would be cautious about using HSD.test and similar functions with unbalanced designs unless the documentation addresses their use in these situations. The documentation for TukeyHSD mentions that it adjusts for "mildly unbalanced" designs. I don't know if HSD.test handles things differently. You'd have to check additional documentation for the package or the original reference cited for the function.
As a side note, enclosing the whole HSD.test function in parentheses will cause it to print the results. See example below.
In general, I would recommend using the flexible emmeans (née lsmeans) or multcomp packages for all your post-hoc comparison needs. emmeans is particularly useful for doing mean separations on interactions or for examining contrasts among treatments. [EDIT: Caveat that I am the author of these pages.]
With an unbalanced design, you may want to report the E.M. (or L.S.) means instead of the arithmetic means. See SAEPER: What are least square means?. [EDIT: Caveat that I am the author of this page.] Note in the example below that the marginal means reported by emmeans are different than those reported by HSD.test.
Also note that the "Tukey" in glht has nothing to do with Tukey HSD or Tukey-adjusted comparisons; it just sets up the contrasts for all pairwise tests, as the output says.
However, the adjust="tukey" in emmeans functions does mean to use Tukey-adjusted comparisons, as the output says.
The following example is partially adapted from ARCHBS: One-way Anova.
### EDIT: Some code changed to reflect changes to some functions
### in the emmeans package
if(!require(car)){install.packages("car")}
library(car)
data(mtcars)
mtcars$cyl.f = factor(mtcars$cyl)
mtcars$carb.f = factor(mtcars$carb)
options(contrasts = c("contr.sum", "contr.poly"))
model = lm(mpg ~ cyl.f + carb.f, data=mtcars)
library(car)
Anova(model, type="III")
if(!require(agricolae)){install.packages("agricolae")}
library(agricolae)
(HSD.test(model, "cyl")$groups)
if(!require(emmeans)){install.packages("emmeans")}
library(emmeans)
marginal = emmeans(model, ~ cyl.f)
pairs(marginal, adjust="tukey")
if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)
cld(marginal, adjust="tukey", Letters=letters)
if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)
mc = glht(model, mcp(cyl.f = "Tukey"))
summary(mc, test=adjusted("single-step"))
cld(mc)
I found HSD.test() to be very picky about the way the lm() (or aov()) model passed to it has been built.
There was no output from HSD.test() with my data when I had coded the lm() like this:
model<-lm(sweetpotato$yield ~ sweetpotato$virus)
out <- HSD.test(model,"virus", group=TRUE, console=TRUE)
Output was only:
Name: virus
sweetpotato$virus
The output was equally bad when using the same logic for aov():
model <- aov(sweetpotato$yield ~ sweetpotato$virus)
To get output from HSD.test(), the lm() (or aov()) model must be constructed strictly using the logic presented in MYaseen208's answer:
model <- lm(yield~virus, data=sweetpotato)
Hope this helps someone who's not getting a proper output from HSD.test().
I was stuck with the same problem of HSD.test() printing nothing. You need to pass console=TRUE to the function so that it prints the results automatically.
For example:
HSD.test(alturacrit.anova, "fator", console=TRUE)
Hope it helps!
