R - Mixed Design ANOVA post hoc test

I have the following data structure (with example values):
id var1 var2 value
1 true tr 1.34
2 true ct 4.89
3 false mm 2.38
4 true tr 1.28
The data are saved in 'longData'. So 'var1' is a between-subjects variable that can be true or false, 'var2' is a within-subjects factor with three levels (tr, ct, mm), and 'value' is a numeric value.
I've made a mixed design ANOVA like this:
anovaResult <- ezANOVA(data = longData,
                       dv = .(value),
                       wid = .(id),
                       within = .(var2),
                       between = .(var1),
                       type = 3)
The result showed a significant interaction between var1 and var2. Now I would like to examine this interaction further, but I don't know how. I've heard about the emmeans package (estimated marginal means seem to be the statistic of choice here; since I am new to statistics, feel free to advise me otherwise), but I could not get the command to work. This is probably because I am new to R and do not fully understand the syntax.
Can anyone provide me with a working example of how to test the interaction between the two factors? I would not say no to an explanation of how to interpret the results as well.
I know this is a lot to ask, but I cannot figure it out by myself and have to present results soon, without much time to learn statistics and R.
Thank you.

It would help to provide an example dataset.
However, you can run a Tukey test:
mod1 <- aov(value ~ Factor1*Factor2, data = df)
TukeyHSD(mod1)
Or run emmeans on an ANOVA with an interaction:
mod1 <- aov(value ~ Factor1*Factor2, data = df)
library(emmeans)
emmeans(mod1, pairwise ~ Factor1*Factor2)
Or fit a mixed model, which is what you seem to be doing (lmer() is in lme4, and Anova() here is car's):
library(lme4)
library(car)
mod1 <- lmer(value ~ Factor1*Factor2 + (1|subject), data = df)
Anova(mod1)
summary(mod1)
emmeans(mod1, pairwise ~ Factor1*Factor2)
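Since a working example with your own names may help, here is a sketch using longData, id, var1, and var2 from above (assuming longData is the long-format data frame described in the question). The simple = argument of pairs() restricts the Tukey-adjusted comparisons to one factor at a time, which is usually what "examining the interaction" calls for:
library(lme4)
library(emmeans)
# random intercept per subject accounts for the repeated measures on var2
m <- lmer(value ~ var1 * var2 + (1 | id), data = longData)
emm <- emmeans(m, ~ var1 * var2)   # estimated marginal means for all 6 cells
pairs(emm, simple = "var1")        # true vs. false within each level of var2
pairs(emm, simple = "var2")        # tr/ct/mm contrasts within each level of var1
Each row of the pairs() output is one contrast: the estimate is the difference between marginal means, and the p value is Tukey-adjusted within each family of comparisons.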

Related

Repeated measures ANOVA - different results for SPSS versus R

I am trying to run a repeated-measures ANOVA in R and compared the output to SPSS, and the results differ a lot! Maybe I am making a mistake somewhere, but I cannot figure it out.
So some sample data:
id is the subject. Every subject gives one rating for each of three items (res_1, res_2 and res_3). I want to test the overall effect of item.
id <- c(1, 2, 3, 4, 5, 6)
res_1 <- c(1, 1, 1, 2, 2, 1)
res_2 <- c(4, 5, 2, 4, 4, 3)
res_3 <- c(4, 5, 6, 3, 6, 6)
## wide format for SPSS
table <- as.data.frame(cbind(id, res_1, res_2, res_3))
## reshape to long format
library(reshape2)
table <- melt(table, id.vars = "id")
colnames(table) <- c("id", "item", "rating")
aov.out <- aov(rating ~ item + Error(id/item), data = table)
summary(aov.out)
And here is my SPSS code (from the wide-format data):
GLM item_1 item_2 item_3
/WSFACTOR=factor1 3 Polynomial
/METHOD=SSTYPE(3)
/PRINT=DESCRIPTIVE
/CRITERIA=ALPHA(.05)
/WSDESIGN=factor1.
The results I get are:
R: p value 0.0526 (Error: Within)
SPSS: p value 0.003 (tests of within-subjects effects)
Does anyone have a suggestion that may explain the difference?
If I do a non-parametric Friedman test, I get the same results in SPSS and R.
Actually, looking at my data, summary(aov.out) matches SPSS's "tests of within-subjects contrasts" (but I learned to look at the tests of within-subjects effects).
Thanks!
There's a lot of material out there; I am a bit surprised that googling 'spss versus R anova' did not bring you to links explaining the difference in sums of squares between SPSS (type III) and R (type I), as well as the difference in how contrasts are handled.
These are the top two results that I found:
http://myowelt.blogspot.ca/2008/05/obtaining-same-anova-results-in-r-as-in.html and
https://stats.stackexchange.com/questions/40958/testing-anova-hypothesis-with-contrasts-in-r-and-spss
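As a rough sketch of the recipe those links describe, applied to your own data (this assumes the long-format table from your question; note that id must be a factor for the Error() stratification to work as intended, and that SPSS uses sum-to-zero contrasts by default):
table$id <- factor(table$id)                       # Error() strata need a factor, not a numeric id
options(contrasts = c("contr.sum", "contr.poly"))  # SPSS-style effect coding
library(ez)                                        # ezANOVA reports type-III tests
ezANOVA(data = table, dv = rating, wid = id, within = item, type = 3)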

How to replicate Stata "factor" command in R

I'm trying to replicate some Stata results in R and am having a lot of trouble. Specifically, I want to recover the same eigenvalues as Stata does in exploratory factor analysis. To provide a specific example, the factor help in Stata uses bg2 data (something about physician costs) and gives you the following results:
webuse bg2
factor bg2cost1-bg2cost6
(obs=568)
Factor analysis/correlation Number of obs = 568
Method: principal factors Retained factors = 3
Rotation: (unrotated) Number of params = 15
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 0.85389 0.31282 1.0310 1.0310
Factor2 | 0.54107 0.51786 0.6533 1.6844
Factor3 | 0.02321 0.17288 0.0280 1.7124
Factor4 | -0.14967 0.03951 -0.1807 1.5317
Factor5 | -0.18918 0.06197 -0.2284 1.3033
Factor6 | -0.25115 . -0.3033 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000
I'm interested in the eigenvalues in the first column of the table. When I use the same data in R, I get the following results:
library(foreign)  # read.dta() reads Stata .dta files
bg2 <- read.dta("bg2.dta")
eigen(cor(bg2))
$values
[1] 1.7110112 1.4036760 1.0600963 0.8609456 0.7164879 0.6642889 0.5834942
As you can see, these values are quite different from Stata's results. It is likely that the two programs are using different means of calculating the eigenvalues, but I've tried a wide variety of different methods of extracting the eigenvalues, including most (if not all) of the options in R commands fa, factanal, principal, and maybe some other R commands. I simply cannot extract the same eigenvalues as Stata. I've also read through Stata's manual to try and figure out exactly what method Stata uses, but couldn't figure it out with enough specificity.
I'd love any help! Please let me know if you need any additional information to answer the question.
I would advise against carrying out a factor analysis on all the variables in the bg2 data, as one of the variables is clinid, an arbitrary identifier 1..568 that carries no information, except by accident.
Sensibly or not, you are not using the same data: you worked on the 6 cost variables in Stata but on those PLUS the identifier in R.
Another way to notice that would be to spot that you got 6 eigenvalues in one case and 7 in the other.
Nevertheless, the important principle is that eigen(cor(bg2)) just gives you the eigenvalues from a principal component analysis based on the correlation matrix, so you can verify that pca in Stata matches what you report from R.
So far, so clear.
But your larger question remains. I don't know how to mimic Stata's (default) factor analysis in R. You may need a factor analysis expert, if any hang around here.
In short, PCA is not equal to principal axis method factor analysis.
Different methods of calculating eigenvalues are not the issue here. I'd bet that given the same matrix Stata and R match up well in reporting eigenvalues. The point is that different techniques mean different eigenvalues in principle.
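One avenue worth exploring, though I can't vouch that it reproduces Stata's numbers exactly: the psych package's fa() function with fm = "pa" performs principal-axis ("principal factors") extraction, which is at least the same family of method as Stata's default. A sketch, assuming psych and foreign are installed:
library(foreign)
library(psych)
bg2 <- read.dta("bg2.dta")
costs <- bg2[, grep("^bg2cost", names(bg2))]   # drop the clinid identifier
pa <- fa(costs, nfactors = 3, rotate = "none", fm = "pa")
pa$values   # eigenvalues of the common-factor (reduced) correlation matrix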
P.S. I am not an R person, but I think what you call R commands are strictly R functions. In turn I am open to correction on that small point.

Multiple comparisons using glht with repeated measures ANOVA

I'm using the following code to try to get at post-hoc comparisons for my cell means:
library(nlme)
result.lme3 <- lme(Response ~ Pressure*Treatment*Gender*Group, data = mydata,
                   random = ~ 1 | Subject/Pressure/Treatment)
aov.result <- aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3, linfct = mcp(???? = "Tukey")))
but I don't know how to refer to Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction variable with the interaction() function, which contains all combinations of the levels of your 4 variables.
In my tests, the estimates shown for the various levels of this variable give the joint effect of the active levels plus the interaction effect.
For example, if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in the parentheses for clarity), the interaction variable will have a level like "infection.y:acetaminophen.y", which shows the effect on temperature of infection, acetaminophen, and the interaction of the two, compared with the intercept (where both variables are n).
If instead the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
then to get the same coefficient for the case where both variables are y, you would have to add the two simple effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
Then in glht you use:
summary(glht(model, linfct = mcp(interaction_var = "Tukey")))
to get your post hoc comparisons, where interaction_var <- interaction(infection, acetaminophen).
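Here is what that looks like with your variable names (a sketch assuming your mydata frame; multcomp's glht() accepts lme fits, and mcp() needs the combined factor to appear in the model formula):
library(nlme)
library(multcomp)
# one factor whose levels are all the Pressure x Treatment x Gender x Group cells
mydata$cell <- interaction(mydata$Pressure, mydata$Treatment,
                           mydata$Gender, mydata$Group)
m <- lme(Response ~ cell, data = mydata, random = ~ 1 | Subject)
summary(glht(m, linfct = mcp(cell = "Tukey")))
With 4 x 2 x 2 x 3 = 48 cells this produces a very large number of pairwise comparisons, so expect heavy multiplicity adjustment.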
NOTE: I never tested this methodology with nested and mixed models, so beware!

One-way repeated measures ANOVA with unbalanced data

I'm new to R; I've read these forums (for help with R) for a while now, but this is my first time posting. After googling each error here, I still can't figure out and fix my mistakes.
I am trying to run a one-way repeated measures ANOVA with unequal sample sizes. Here is a toy version of my data and the code that I'm using. (If it matters, my real data have 12 bins with 14 to 20 values in each bin.)
## the data: average probability for a subject, given reaction time bin
## the data: average probability for a subject, given reaction time bin
bin1 <- c(0.37, 0.00, 0.00, 0.16, 0.00, 0.00, 0.08, 0.06)
bin2 <- c(0.33, 0.21, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.09, 0.10, 0.04)
bin3 <- c(0.07, 0.41, 0.07, 0.00, 0.10, 0.00, 0.30, 0.25, 0.08, 0.15, 0.32, 0.18)
## creating the data frame
# dependent variable column
probability <- c(bin1, bin2, bin3)
# condition column
bin <- c(rep("bin1", 8), rep("bin2", 11), rep("bin3", 12))
# subject column (in the order that will match them up with their respective
# values in the dependent variable column)
subject <- c("S2","S3","S5","S7","S8","S9","S11","S12","S1","S2","S3","S4","S7",
             "S9","S10","S11","S12","S13","S14","S1","S2","S3","S5","S7","S8","S9","S10",
             "S11","S12","S13","S14")
# putting together the data frame
dataFrame <- data.frame(cbind(probability, bin, subject))
## one-way repeated measures anova
test <- aov(probability ~ bin + Error(subject/bin), data = dataFrame)
These are the errors I get:
Error in qr.qty(qr.e, resp) :
invalid to change the storage mode of a factor
In addition: Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors
3: In aov(probability ~ bin + Error(subject/bin), data = dataFrame) :
Error() model is singular
Sorry for the complexity (assuming it is complex; it is to me). Thank you for your time.
For an unbalanced repeated-measures design, it might be easiest to
use lme (from the nlme package):
## this should be the same as the data you constructed above, just
## a slightly more compact way to do it.
## this should be the same as the data you constructed above, just
## a slightly more compact way to do it
datList <- list(
  bin1 = c(0.37, 0.00, 0.00, 0.16, 0.00, 0.00, 0.08, 0.06),
  bin2 = c(0.33, 0.21, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.09, 0.10, 0.04),
  bin3 = c(0.07, 0.41, 0.07, 0.00, 0.10, 0.00, 0.30, 0.25, 0.08, 0.15, 0.32, 0.18))
subject <- c("S2","S3","S5","S7","S8","S9","S11","S12",
             "S1","S2","S3","S4","S7","S9","S10","S11","S12","S13","S14",
             "S1","S2","S3","S5","S7","S8","S9","S10","S11","S12","S13","S14")
d <- data.frame(probability = do.call(c, datList),
                bin = paste0("bin", rep(1:3, sapply(datList, length))),
                subject)
library(nlme)
m1 <- lme(probability ~ bin, random = ~ 1 | subject/bin, data = d)
summary(m1)
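If you want a single overall test for bin rather than the coefficient table, anova() on the fitted lme object gives the conditional F test:
anova(m1)   # overall F test for the effect of bin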
The only real problem is that some aspects of the interpretation etc.
are pretty far from the classical sum-of-squares-decomposition approach
(e.g. it's fairly tricky to do significance tests of variance components).
Pinheiro and Bates (Springer, 2000) is highly recommended reading if you're
going to head in this direction.
It might be a good idea to simulate/make up some balanced data and do the
analysis with both aov() and lme(), look at the output, and make sure
you can see where the correspondences are/know what's going on.
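A minimal sketch of that sanity check, with entirely made-up balanced data:
set.seed(1)
bal <- data.frame(
  subject = factor(rep(paste0("S", 1:10), each = 3)),
  bin = factor(rep(paste0("bin", 1:3), times = 10)),
  probability = runif(30))
summary(aov(probability ~ bin + Error(subject/bin), data = bal))
summary(lme(probability ~ bin, random = ~ 1 | subject/bin, data = bal))
The "Error: subject:bin" stratum in the aov() table corresponds to the bin-within-subject random effect in the lme() fit.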

coxph stratified by year

I think this should be something very easy, but I can't quite get my head around it.
I have the following code:
library(survival)
cox <- coxph(Surv(SURV, DEAD)~YEAR, data)
summary(cox)
but I would like the results broken down by individual year.
Here's what the SPSS syntax and solution would look like:
COXREG surv /STATUS=dead(1) /CONTRAST (year)=Indicator(1)
/METHOD=ENTER year /PRINT=CI(95)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
EXECUTE.
and the same thing in Stata:
xi: stcox i.year
Here's the output of
str(data)
You did not show us str(data) or how to construct a reproducible example that gives "data". I suspect that "YEAR" will turn out to be a numeric vector. If it had been a factor variable, you would have seen n-1 coefficients: the reference year is absorbed into the baseline hazard (a Cox model has no explicit intercept), and the other coefficients match up to the remaining year values. You told the SPSS engine that "year" was an "INDICATOR", but you didn't offer the same courtesy to the R engine.
Try this:
data$year.ind <- factor(data$YEAR) # equivalent of SPSS INDICATOR
                                   # or SAS /CLASS
cox.mdl <- coxph(Surv(SURV, DEAD) ~ year.ind, data)
as.matrix(coef(cox.mdl))
summary(cox.mdl)
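If you need a different reference category (SPSS's Indicator(1) takes the first category as the reference, which matches R's default of treating the first factor level as the baseline), relevel() changes it; "1995" below is just a hypothetical year label:
data$year.ind <- relevel(data$year.ind, ref = "1995")   # hypothetical year label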
R often splits computing and display of results to allow more freedom. I assume you need the predict function of coxph (?predict.coxph).
There are examples at the bottom of the documentation page, most likely you want
predict(cox, type="terms")
