R - 2x2 mixed ANOVA with repeated measures simple effect analysis

I would like to ask how to correctly perform a simple main effect analysis in R when there is an interaction effect between the Group and Stage variables.
One of my friends did the same analysis in SPSS (using a Bonferroni correction), and I am trying to reproduce his results in R.
I have a data set with the following structure:
ID Group Stage Y
1 I pre 0.123
1 I post 0.453
2 II pre 0.676
2 II post 0.867
3 I pre 0.324
3 I post 0.786
4 II pre 0.986
4 II post 0.112
... ... ... ...
This is a 2x2 mixed ANOVA design (1 between-subjects variable 'Group' and 1 within-subjects variable 'Stage', which constitutes the repeated measure of the dependent variable y).
I analysed it using the ezANOVA function:
ezANOVA(data = dat, dv = y, wid = ID, between = Group, within = Stage, detailed = TRUE, type = "III")
I found a significant Stage*Group interaction, so I have to determine the simple effects using a Bonferroni correction. I tried to do that with several methods. For example, to test the simple effect of Stage within group I, I tried:
dataGroupI <- subset(dat, Group == "I")
ezANOVA(data = dataGroupI, dv = y, wid = ID, within = Stage, detailed = TRUE, type = "III") # method 1
aov(data = dataGroupI, y ~ Stage + Error(ID/Stage)) # method 2
t.test(y ~ Stage, data = dataGroupI, paired = TRUE) # method 3
But every method gave me a different p-value, and none of them matched the values calculated in SPSS. Interestingly, the main-effect p-values and the other calculations agreed between SPSS and R, so I conclude that I am using the wrong method for the simple main effect analysis.
I would be very thankful if you could help me.

If you want R to give you the same numbers as SPSS, do this:
#pairwise comparisons
library(asbio)
bonf <- pairw.anova(data$dv, data$group, method="bonf") #also try "tukey" or "lsd"
print(bonf)
#plot(bonf) # can plot the CIs
For each pairwise comparison, this prints the mean difference, the lower and upper confidence bounds, a reject/do-not-reject decision, and the adjusted p-value.
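Note that pairw.anova() treats all observations as independent, so it ignores the repeated-measures pairing. If the goal is SPSS-style simple effects of Stage within each Group, a minimal sketch of an alternative (my own code, using the column names from the question) is one paired t-test per group with a Bonferroni adjustment across the tests:
# Sketch: simple effect of Stage within each Group, paired by ID,
# Bonferroni-adjusted across the two groups. Assumes a data frame 'dat'
# with columns ID, Group, Stage, and y, as in the question.
library(tidyr)
wide <- pivot_wider(dat, id_cols = c(ID, Group),
                    names_from = Stage, values_from = y)
p_raw <- sapply(split(wide, wide$Group),
                function(g) t.test(g$pre, g$post, paired = TRUE)$p.value)
p.adjust(p_raw, method = "bonferroni")
Whether this reproduces the SPSS numbers depends on which error term SPSS uses; the sketch above uses each group's own paired-differences error.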

Related

emmeans: regrid() for binomial GLMM with user-defined link function

I have fitted a binomial GLMM in R with a modified link function with a fixed guessing probability as suggested in this thread - except that the guessing probability is 1/2 and not 1/3. Therefore the sigmoidal activation in my case becomes:
P(correct) = 0.5 + 0.5*(exp(term)/(1 + exp(term))).
My model looks like this:
library(lme4)
m <- 2
mod <- glmer(correct ~ group * stim_strength + (stim_strength | subject),
             family = binomial(link = mafc.logit(m)), data = obs_data)
where: guessing probability is 1/m; correct is a categorical variable indicating correct/incorrect response; group is a factor with two levels; stim_strength is numerical with values in [0,1]; mafc.logit is the function suggested in the thread.
I'm essentially fitting separate psychometric curves of the stimulus strength (stim_strength) for the two groups, while taking into account the inter-subject fluctuations in slope and intercept (random effect structure (stim_strength|subject))
This is what I get:
library(sjPlot) # provides plot_model
plot_model(mod, type = 'emm', terms = c('stim_strength', 'group'))
[plot: model-predicted %correct as a function of stim_strength, by group]
The model describes the data nicely, and I now want to perform some post-hoc analyses on it. Specifically, I want to run for example:
library(emmeans)
mod.emm = emmeans(mod, ~group|stim_strength, at=list(stim_strength=c(.25,.75)))
confint(regrid(mod.emm))
contrast(regrid(mod.emm), 'pairwise', simple = 'group', combine = TRUE, adjust = 'holm')
i.e. compute confidence intervals for the %correct of the two groups at some specified values of stim_strength, and compare the %correct of the two groups at these values.
Note that I'm using regrid(), because I want the analyses to be done on the back-transformed values, not on the linear part of the model!
However, regrid() won't work with a user-defined link function. In fact, the regrid() is simply ignored here, as you can see e.g. from the output of the confint() call above (the estimates are labelled prob, but they are clearly not transformed to [.5, 1]):
stim_strength = 0.25:
group prob SE df asymp.LCL asymp.UCL
1 -1.329 0.173 Inf -1.716 -0.942
2 -0.553 0.161 Inf -0.913 -0.192
stim_strength = 0.75:
group prob SE df asymp.LCL asymp.UCL
1 1.853 0.372 Inf 1.018 2.687
2 3.375 0.395 Inf 2.489 4.261
Similarly, when adding type='response' in emmeans, I get the message:
Unknown transformation "mafc.logit(2)": no transformation done
Any workaround?
Thanks!
Looking at the linked suggestion, it appears that mafc.logit() is a function that returns a list with all the information needed to implement the transform. All you need to do is update the emmGrid object with that information:
mod.emm <- update(mod.emm, tran = mafc.logit(2))
confint(regrid(mod.emm), adjust = 'holm')
# etc...
See, for example, this vignette section and possibly other parts of that vignette.
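If update() did not recognize the object returned by mafc.logit(), the emmeans vignette on transformations also allows supplying a make.link()-style list by hand. A sketch under that assumption, built from the guessing-rate formula in the question (my code, not part of the original answer):
# Hand-built transformation object: emmeans accepts a list with
# linkfun, linkinv, mu.eta, and name, the same shape as stats::make.link().
mafc_tran <- list(
  linkfun = function(mu) qlogis(2 * mu - 1),       # inverse of mu = 0.5 + 0.5*plogis(eta)
  linkinv = function(eta) 0.5 + 0.5 * plogis(eta), # back-transform to [.5, 1]
  mu.eta  = function(eta) 0.5 * dlogis(eta),       # d mu / d eta
  name    = "mafc.logit(2)"
)
mod.emm <- update(mod.emm, tran = mafc_tran)
confint(regrid(mod.emm))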

Repeated measures ANOVA and link to mixed-effect models in R

I have a problem when performing a two-way rm ANOVA in R on the following data (link : https://drive.google.com/open?id=1nIlFfijUm4Ib6TJoHUUNeEJnZnnNzO29):
subjectnbr is the subject id; blockType and linesTTL are the independent variables; RT2 is the dependent variable.
I first performed the rm ANOVA using ezANOVA with the following code:
ANOVA_RTS <- ezANOVA(
data=castRTs
, dv=RT2
, wid=subjectnbr
, within = .(blockType,linesTTL)
, type = 2
, detailed = TRUE
, return_aov = FALSE
)
ANOVA_RTS
The result is correct (I double-checked using Statistica).
However, when I perform the rm ANOVA using the lme function, I do not get the same answer and I have no clue why.
Here is my code:
library(nlme) # for lme
lmeRTs <- lme(
RT2 ~ blockType*linesTTL,
random = ~1|subjectnbr/blockType/linesTTL,
data=castRTs)
anova(lmeRTs)
Here are the outputs of both ezANOVA and lme.
I hope I have been clear enough and have given you all the information needed.
I'm looking forward to your help, as I have been trying to figure this out for at least 4 hours!
Thanks in advance.
Here is a step-by-step example of how to reproduce the ezANOVA results with nlme::lme.
The data
We read in the data and ensure that all categorical variables are factors.
# Read in data
library(tidyverse);
df <- read.csv("castRTs.csv");
df <- df %>%
  mutate(
    blockType = factor(blockType),
    linesTTL = factor(linesTTL));
Results from ezANOVA
As a check, we reproduce the ez::ezANOVA results.
## ANOVA using ez::ezANOVA
library(ez);
model1 <- ezANOVA(
data = df,
dv = RT2,
wid = subjectnbr,
within = .(blockType, linesTTL),
type = 2,
detailed = TRUE,
return_aov = FALSE);
model1;
# $ANOVA
# Effect DFn DFd SSn SSd F p
#1 (Intercept) 1 13 2047405.6654 34886.767 762.9332235 6.260010e-13
#2 blockType 1 13 236.5412 5011.442 0.6136028 4.474711e-01
#3 linesTTL 1 13 6584.7222 7294.620 11.7348665 4.514589e-03
#4 blockType:linesTTL 1 13 1019.1854 2521.860 5.2538251 3.922784e-02
# p<.05 ges
#1 * 0.976293831
#2 0.004735442
#3 * 0.116958989
#4 * 0.020088855
Results from nlme::lme
We now run nlme::lme
## ANOVA using nlme::lme
library(nlme);
model2 <- anova(lme(
RT2 ~ blockType * linesTTL,
random = list(subjectnbr = pdBlocked(list(~1, pdIdent(~blockType - 1), pdIdent(~linesTTL - 1)))),
data = df))
model2;
# numDF denDF F-value p-value
#(Intercept) 1 39 762.9332 <.0001
#blockType 1 39 0.6136 0.4382
#linesTTL 1 39 11.7349 0.0015
#blockType:linesTTL 1 39 5.2538 0.0274
Results/conclusion
We can see that the F test results from both methods are identical. The somewhat complicated structure of the random effect definition in lme arises from the fact that you have two crossed random effects. Here "crossed" means that for every combination of blockType and linesTTL there exists an observation for every subjectnbr.
Some additional (optional) details
To understand the role of pdBlocked and pdIdent, we need to take a look at the corresponding two-level mixed-effects model.
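A sketch of that model in my own notation (an assumption about the intended form):
$$
\mathbf{y}_i = X_i \boldsymbol\beta + Z_i \mathbf{b}_i + \boldsymbol\varepsilon_i,
\qquad \mathbf{b}_i \sim \mathcal{N}(\mathbf{0}, \Psi),
\qquad \boldsymbol\varepsilon_i \sim \mathcal{N}(\mathbf{0}, \sigma^2 I),
$$
where i indexes subjects (subjectnbr), X_i holds the fixed-effect dummies for blockType, linesTTL, and their interaction, Z_i holds the corresponding random-effect dummies, and Ψ is the random-effects variance-covariance matrix discussed next.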
The predictor variables are your categorical variables blockType and linesTTL, which are generally encoded using dummy variables.
The variance-covariance matrix for the random effects can take different forms, depending on the underlying correlation structure of your random effect coefficients. To be consistent with the assumptions of a two-level repeated measure ANOVA, we must specify a block-diagonal variance-covariance matrix pdBlocked, where we create diagonal blocks for the offset ~1, and for the categorical predictor variables blockType pdIdent(~blockType - 1) and linesTTL pdIdent(~linesTTL - 1), respectively. Note that we need to subtract the offset from the last two blocks (since we've already accounted for the offset).
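Concretely, the pdBlocked(list(~1, pdIdent(~blockType - 1), pdIdent(~linesTTL - 1))) specification makes Ψ block-diagonal,
$$
\Psi = \begin{pmatrix} \tau_0^2 & 0 & 0 \\ 0 & \tau_B^2 I_2 & 0 \\ 0 & 0 & \tau_L^2 I_2 \end{pmatrix},
$$
one variance for the subject offset and one shared variance for each factor's two levels. The same crossed structure can be written more compactly in lme4; a sketch (mine, not part of the original answer):
# Equivalent crossed random effects in lme4 notation:
#   (1 | subjectnbr)           ~ the ~1 block
#   (1 | subjectnbr:blockType) ~ the pdIdent(~blockType - 1) block
#   (1 | subjectnbr:linesTTL)  ~ the pdIdent(~linesTTL - 1) block
library(lme4)
mod_lmer <- lmer(
  RT2 ~ blockType * linesTTL +
    (1 | subjectnbr) + (1 | subjectnbr:blockType) + (1 | subjectnbr:linesTTL),
  data = df
)
anova(mod_lmer) # F statistics; use lmerTest::lmer for p-values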
Some relevant/interesting resources
Pinheiro and Bates, Mixed-Effects Models in S and S-PLUS, Springer (2000)
Potvin and Schutz, Statistical power for the two-factor repeated measures ANOVA, Behavior Research Methods, Instruments & Computers, 32, 347-356 (2000)
Deming Mi, How to understand and apply mixed-effect models, Department of Biostatistics, Vanderbilt University

R: predict.averaging is not taking an offset into account when plotting

I'm currently trying to use the predict.averaging function in MuMIn to create some graphs from model averaging I've done on some GLMMs. I'm interested in whether the number of insects caught per daylight hour in some traps changes when the traps are left out for different lengths of time; I included offset(log(Daylight)) in my GLMMs to account for this. But when I use the predict function, it doesn't take the offset into account: I get the same graph that I would get if I hadn't included the offset in the first place. Yet I know the offset is having an effect, judging from the output of my model-averaged GLMMs, and it's the kind of effect I would expect from observations of my data.
Does anyone know why this problem might be and how I might make predict.averaging take the offset into account? I've included the code that I'm using below:
# global model for total insect abundance
glmm11 <- glmmadmb(
  Total_polls ~ Max_temp + Wind + Precipitation + Veg_height + Season + Year +
    log(Mean.nectar + 1) + I(log(Nectar + 1) - log(Mean.nectar + 1)) +
    Pan_colour * Assoc_col + Treatment * Area * Depth +
    (1 | Transect) + offset(log(Daylight)),
  data = ab, zeroInflation = FALSE, family = "nbinom")
# make predictions based on model averaging output (subset delta < 2)
preds<-predict(ave21, full = F, type = "response", backtransform = FALSE) # on the response scale
Where ave21 is a model averaging object generated using pdredge and model.avg that was constrained to have the offset in every model: model11 <- pdredge(glmm11, cluster = clust, fixed = ~offset(log(Daylight))+(1|Transect)). The object itself looks like this:
Call:
model.avg(object = get.models(object = model11, subset = delta <
2))
Component model call:
glmmadmb(formula = Total_polls ~ <3 unique rhs>, data = ab, family = nbinom,
zeroInflation = FALSE)
Component models:
df logLik AICc delta weight
1/2/3/4/5/6/7/8/9/10/11/12 20 -864.14 1769.22 0.00 0.47
1/2/3/4/5/6/7/8/9/10/11/12/13 23 -861.39 1770.03 0.81 0.31
1/3/4/5/6/7/8/9/10/11/12 19 -865.97 1770.79 1.57 0.21
Term codes:
 1 = Area
 2 = Assoc_col
 3 = Depth
 4 = I(log(Nectar + 1) - log(Mean.nectar + 1))
 5 = Max_temp
 6 = Pan_colour
 7 = Season
 8 = Treatment
 9 = Year
10 = log(Mean.nectar + 1)
11 = offset(log(Daylight))
12 = Area:Depth
13 = Assoc_col:Pan_colour
Which I then used to get predictions:
pred_results<-cbind(glmm21$frame, preds) # append original dataframe to predictions
plot(pred_results$preds~pred_results$Treatment) # Treatment = trap duration (hours)
This code might go around the houses a little as I borrowed it off of a fellow PhD student. The graph I get when I plot my predictions looks like this:[Model predictions vs. Trap duration (hours)][1], which is very different from the view given by the summary results of my model averaging:
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) -5.896725 0.948102 0.949386 6.211 < 2e-16 ***
Treatment24 -0.714283 0.130226 0.130403 5.478 < 2e-16 ***
Treatment48 -0.983881 0.122416 0.122582 8.026 < 2e-16 ***
Any help would be great, as I can't find any specific instances where this has been addressed on the site to date. Thank you in advance and please let me know if you need me to add anything to make this question better.
Tom
[1]:
https://i.stack.imgur.com/Pn4dK.jpg
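A possible manual workaround (my sketch, not a confirmed fix): if the averaged link-scale predictions really do omit the offset term, it can be added back by hand before back-transforming. This assumes the component models' predict methods accept type = "link", which glmmADMB's does.
# Sketch: add the offset back on the link scale, then back-transform.
# Assumes 'ave21' and the data frame 'ab' from the question.
eta <- predict(ave21, full = FALSE, type = "link", backtransform = FALSE)
pred_total <- exp(eta + log(ab$Daylight)) # expected catch over the whole deployment
pred_rate  <- exp(eta)                    # expected catch per daylight hour
plot(pred_total ~ ab$Treatment)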

Converting Repeated Measures mixed model formula from SAS to R

There are several questions and posts about mixed models for more complex experimental designs, so I thought this simpler model would help other beginners in this process, as well as me.
My question is how to formulate, in R, a repeated-measures ANCOVA equivalent to this SAS proc mixed procedure:
proc mixed data=df1;
FitStatistics=akaike
class GROUP person day;
model Y = GROUP X1 / solution alpha=.1 cl;
repeated / type=cs subject=person group=GROUP;
lsmeans GROUP;
run;
Here is the SAS output using the data created in R (below):
Effect     panel   Estimate   Error     DF   t Value   Pr > |t|   Alpha   Lower      Upper
Intercept          -9.8693    251.04    7    -0.04     0.9697     0.1     -485.49    465.75
panel      1       -247.17    112.86    7    -2.19     0.0647     0.1     -460.99   -33.3510
panel      2       0          .         .    .         .          .       .          .
X1                 20.4125    10.0228   7    2.04      0.0811     0.1     1.4235     39.4016
Below is how I formulated the model in R using the 'nlme' package, but I am not getting similar coefficient estimates:
## create reproducible example fake panel data set:
set.seed(94); subject.id = abs(round(rnorm(10)*10000,0))
set.seed(99); sds = rnorm(10,15,5);means = 1:10*runif(10,7,13);trends = runif(10,0.5,2.5)
this = NULL; set.seed(98)
for(i in 1:10) { this = c(this,rnorm(6, mean = means[i], sd = sds[i])*trends[i]*1:6)}
set.seed(97)
that = sort(rep(rnorm(10,mean = 20, sd = 3),6))
df1 = data.frame(day = rep(1:6,10), GROUP = c(rep('TEST',30),rep('CONTROL',30)),
Y = this,
X1 = that,
person = sort(rep(subject.id,6)))
## use package nlme
require(nlme)
## run repeated measures mixed model using compound symmetry covariance structure:
summary(lme(Y ~ GROUP + X1, random = ~ +1 | person,
correlation=corCompSymm(form=~day|person), na.action = na.exclude,
data = df1,method='REML'))
Now, the output from R, which I now realize is similar to the output from lm():
Value Std.Error DF t-value p-value
(Intercept) -626.1622 527.9890 50 -1.1859379 0.2413
GROUPTEST -101.3647 156.2940 7 -0.6485518 0.5373
X1 47.0919 22.6698 7 2.0772934 0.0764
I believe I'm close on the specification, but I am not sure what piece I'm missing to make the results match (within reason). Any help would be appreciated!
UPDATE: Using the code in the answer below, the R output becomes:
> summary(model2)
Scroll to bottom for the parameter estimates -- look! identical to SAS.
Linear mixed-effects model fit by REML
Data: df1
AIC BIC logLik
776.942 793.2864 -380.471
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUPCONTROL GROUPTEST Residual
StdDev: 184.692 14.56864 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
TEST CONTROL
1.000000 3.068837
Fixed effects: Y ~ GROUP + X1
Value Std.Error DF t-value p-value
(Intercept) -9.8706 251.04678 50 -0.0393178 0.9688
GROUPTEST -247.1712 112.85945 7 -2.1900795 0.0647
X1 20.4126 10.02292 7 2.0365914 0.0811
Please try the model below:
model1 <- lme(
Y ~ GROUP + X1,
random = ~ GROUP | person,
correlation = corCompSymm(form = ~ day | person),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model1)
I think the random = ~ groupvar | subjvar option in R's lme provides a similar result to the repeated / subject=subjvar group=groupvar option in SAS/MIXED in this case.
Edit:
Comparing the SAS/MIXED covariance structure with a revised R model, model2:
model2 <- lme(
Y ~ GROUP + X1,
random = list(person = pdDiag(form = ~ GROUP - 1)),
correlation = corCompSymm(form = ~ day | person),
weights = varIdent(form = ~ 1 | GROUP),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model2)
So, I think these covariance structures are very similar (σ_g1 = τ_g² + σ_1).
Edit 2:
Covariance parameter estimates (SAS/MIXED):
Cov Parm   Subject   Group           Estimate
Variance   person    GROUP TEST       8789.23
CS         person    GROUP TEST        125.79
Variance   person    GROUP CONTROL    82775
CS         person    GROUP CONTROL    33297
So
TEST group diagonal element
= 125.79 + 8789.23
= 8915.02
CONTROL group diagonal element
= 33297 + 82775
= 116072
where diagonal element = σ_k1 + σ_k2.
Covariance parameter estimates (R lme):
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUP1TEST GROUP2CONTROL Residual
StdDev: 14.56864 184.692 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
1TEST 2CONTROL
1.000000 3.068837
So
TEST group diagonal element
= 14.56864^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + 93.28885^2
= 8913.432
CONTROL group diagonal element
= 184.692^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + (3.068837 * 93.28885)^2
= 116070.5
where diagonal element = τ_g² + σ_1 + σ_g².
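As a quick cross-check (my suggestion, not part of the original answer), nlme can print the marginal variance-covariance matrix implied by model2 directly, which makes the comparison with SAS's V matrix easier; note that getVarCov() can be picky about some weights/correlation combinations:
# Sketch: marginal covariance matrix for one subject; the diagonal should
# reproduce the group's diagonal element computed above.
library(nlme)
getVarCov(model2, individuals = as.character(df1$person[1]), type = "marginal")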
Oooh, this is going to be a tricky one, and if it's even possible using standard nlme functions, it is going to take some serious study of Pinheiro/Bates.
Before you spend the time doing that, though, you should make absolutely sure that this is the exact model you need. Perhaps there's something else that might fit the story of your data better, or something R can do more easily that is just as good, but not quite the same.
First, here's my take on what you're doing in SAS with this line:
repeated / type=cs subject=person group=GROUP;
This type=cs subject=person is inducing correlation between all the measurements on the same person, and that correlation is the same for all pairs of days. The group=GROUP is allowing the correlation for each group to be different.
In contrast, here's my take on what your R code is doing:
random = ~ +1 | person,
correlation=corCompSymm(form=~day|person)
This code is actually adding almost the same effect in two different ways: the random line adds a random effect for each person, and the correlation line induces correlation between all the measurements on the same person. These two things are nearly identical; if the correlation is positive, you get the exact same result by including either one. I'm not sure what happens when you include both, but I do know that only one is necessary. Regardless, this code uses the same correlation for all individuals; it does not allow each group to have its own correlation.
To let each group have their own correlation, I think you have to build a more complicated correlation structure up out of two different pieces; I've never done this but I'm pretty sure I remember Pinheiro/Bates doing it.
You might consider instead adding a random effect for person and then letting the variance differ between the groups with weights=varIdent(form=~1|group) (from memory, check my syntax, please). This won't be quite the same, but it tells a similar story. The story in SAS is that the measurements on some individuals are more correlated than the measurements on other individuals; thinking about what that means, the measurements for individuals with higher correlation will be closer together than the measurements for individuals with lower correlation. In contrast, the story in R is that the variability of measurements within individuals varies; thinking about that, measurements with higher variability will have lower correlation. So they do tell similar stories, but come at it from opposite sides.
It is even possible (but I would be surprised) that these two models end up being different parameterizations of the same thing. My intuition is that the overall measurement variability will be different in some way. But even if they aren't the same thing, it would be worth writing out the parameterizations just to be sure you understand them and to make sure that they are appropriately describing the story of your data.
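A sketch of that suggested alternative, spelled out (my code; since the answer quotes the syntax from memory, treat it as something to check against ?varIdent):
# Random person effect plus group-specific residual variances.
library(nlme)
model_alt <- lme(
  Y ~ GROUP + X1,
  random = ~ 1 | person,
  weights = varIdent(form = ~ 1 | GROUP),
  data = df1, method = "REML"
)
summary(model_alt)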

Anova Type 2 and Contrasts

The study design of the data I have to analyse is simple. There is 1 control group (CTRL) and
2 different treatment groups (TREAT_1 and TREAT_2). The data also include 2 covariates, COV1 and COV2. I have been asked to check whether there is a linear or quadratic treatment effect in the data.
I created a dummy data set to explain my situation:
df1 <- data.frame(
Observation = c(rep("CTRL",15), rep("TREAT_1",13), rep("TREAT_2", 12)),
COV1 = c(rep("A1", 30), rep("A2", 10)),
COV2 = c(rep("B1", 5), rep("B2", 5), rep("B3", 10), rep("B1", 5), rep("B2", 5), rep("B3", 10)),
Variable = c(3944133, 3632461, 3351754, 3655975, 3487722, 3644783, 3491138, 3328894,
3654507, 3465627, 3511446, 3507249, 3373233, 3432867, 3640888,
3677593, 3585096, 3441775, 3608574, 3669114, 4000812, 3503511, 3423968,
3647391, 3584604, 3548256, 3505411, 3665138,
4049955, 3425512, 3834061, 3639699, 3522208, 3711928, 3576597, 3786781,
3591042, 3995802, 3493091, 3674475)
)
plot(Variable ~ Observation, data = df1)
As you can see from the plot, there is a linear relationship between the control and the treatment groups. To check whether this linear effect is statistically significant, I changed the contrasts using the contr.poly() function and fitted a linear model like this:
contrasts(df1$Observation) <- contr.poly(levels(df1$Observation))
lm1 <- lm(log(Variable) ~ Observation, data = df1)
summary.lm(lm1)
From the summary we can see that the linear effect is statistically significant:
Observation.L 0.029141 0.012377 2.355 0.024 *
Observation.Q 0.002233 0.012482 0.179 0.859
However, this first model does not include either of the two covariates. Including them results in a non-significant p-value for the linear relationship:
lm2 <- lm(log(Variable) ~ Observation + COV1 + COV2, data = df1)
summary.lm(lm2)
Observation.L 0.04116 0.02624 1.568 0.126
Observation.Q 0.01003 0.01894 0.530 0.600
COV1A2 -0.01203 0.04202 -0.286 0.776
COV2B2 -0.02071 0.02202 -0.941 0.354
COV2B3 -0.02083 0.02066 -1.008 0.320
So far so good. However, I have been told to conduct a Type II ANOVA rather than Type I. To conduct a Type II ANOVA, I used the Anova() function from the car package.
Anova(lm2, type="II")
Anova Table (Type II tests)
Response: log(Variable)
Sum Sq Df F value Pr(>F)
Observation 0.006253 2 1.4651 0.2453
COV1 0.000175 1 0.0820 0.7763
COV2 0.002768 2 0.6485 0.5292
Residuals 0.072555 34
The problem with using Type II here is that you do not get p-values for the linear and quadratic effects, so I do not know whether the effect is linear and/or quadratic.
I found out that the following code produces the same p-value for Observation as the Anova() function, but its result also does not include any p-values for the linear or quadratic effects:
lm2 <- lm(log(Variable) ~ Observation + COV1 + COV2, data = df1)
lm3 <- lm(log(Variable) ~ COV1 + COV2, data = df1)
anova(lm2, lm3)
Does anybody know how to combine a Type II ANOVA with the contrasts function to obtain the p-values for the linear and quadratic effects?
Help would be very much appreciated.
Best
Peter
I found one partial workaround for this, but it may require further correction. The documentation for drop1() from the stats package indicates that it produces Type II sums of squares (although this page, http://www.statmethods.net/stats/anova.html, declares that drop1() produces Type III sums of squares, and I didn't spend much time poring over http://afni.nimh.nih.gov/sscc/gangc/SS.html to cross-check the sums-of-squares calculations). You could use it to calculate everything manually, but I suspect you're asking this question because it would be nice if someone had already worked through it.
Anyway, I added a second vector to the dummy data called Observation2, and set it up with just the linear contrasts (you can only specify one set of contrasts for a given vector at a given time):
df1[,"Observation2"]<-df1$Observation
contrasts(df1$Observation2, how.many=1)<-contr.poly
Then created a third linear model:
lm3 <- lm(log(Variable) ~ Observation2 + COV1 + COV2, data = df1)
And conducted F tests with drop1() to compare the F statistics from Type II ANOVAs between the two models. For lm2, which contains both the linear and quadratic terms:
drop1(lm2, test="F")
And for lm3, which contains just the linear contrast:
drop1(lm3, test="F")
This doesn't include a direct comparison of the models against each other, although the F statistic is higher (and the p-value accordingly lower) for the linear model, which would lead one to rely on it instead of the quadratic model.
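One further option (my sketch, not part of the workaround above): enter the linear and quadratic contrast scores as separate numeric predictors, so that car::Anova() reports a Type II test for each polynomial term directly.
# Sketch: polynomial contrast scores as numeric columns.
library(car)
cp <- contr.poly(3) # rows follow the level order: CTRL, TREAT_1, TREAT_2
df1$Obs.L <- cp[as.numeric(df1$Observation), 1]
df1$Obs.Q <- cp[as.numeric(df1$Observation), 2]
lm4 <- lm(log(Variable) ~ Obs.L + Obs.Q + COV1 + COV2, data = df1)
Anova(lm4, type = "II") # separate Type II F tests for Obs.L and Obs.Q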
