How to obtain Tukey compact letter display from a GLM with interactions - r

I have set of data that I've analyzed with a generalized linear model that has three categorical factors in 3-way interaction (factorA, factorB, factorC) and a fourth continuous factor (factorD) that is simply added in the model. I am trying to obtain a set of Tukey letter groups (ie, compact letter display) from the model but haven't found a way to include the interaction successfully. I'm not interested in including factorD, just the three in the interaction.
I have gotten the Tukey-adjusted pairwise comparisons with this:
lsmeans(my.glm, factorA*factorB*factorC)
But I was not able to figure out how to produce a compact letters display from that. It can be done with multcomp package but I could only find ways to do it with main effects with that package, not interactions.
So then I tried the agricolae package, as this post (https://stats.stackexchange.com/questions/31547/how-to-obtain-the-results-of-a-tukey-hsd-post-hoc-test-in-a-table-showing-groupe) discusses that that should work. However, following the instructions in that answer led to a non-functional response from HSD.test. Specifically, I could get the main effects tests to work fine, e.g. HSD.test(my.glm,"factorA") but I could not get the interactions to work. I tried this:
intxns<-with(my.data, interaction(factorA,factorB,factorC))
HSD.test(my.glm,"intxns",group=TRUE)
But a get an error that indicates the HSD.test function didn't recognize "intxns" as a valid object, it looks like this (also, I checked the intxns object and it looks good and the number of rows matched the number of residuals of my glm):
Name: inxtns
factorA factorB factorC factorD
I get that same error if I just put nonsense into the factor field in the HSD.test function call. I checked the inxtns object and it looks good and the number of rows matched the number of residu
The agricolae notes don't actually cover the use of interactions in HSD.test, but I assume it can work.
Does anyone know how to get HSD.test to work with interactions? Or is there any other function you've gotten to work to produce compact letter displays for a glm with interactions?
I've been working on this for a number of days now and haven't been able to find a solution, hopefully I'm not missing something obvious.
Thanks!

I don't know how you've specified your glm model, but for HSD.test, it's looking to match the particular treatment name with the same name specified in the glm formula as well as the data frame. This is why your main effect, factorA will work, but not the 3-way interaction. For multiple comparison tests on interactions, I find it easiest to generate the interactions separately and add them to the data frame as additional columns. The glm model can then be specified using the new variables which code for the interaction.
For example,
set.seed(42)
glm.dat <- data.frame(y = rnorm(1000), factorA = sample(letters[1:2],
size = 1000, replace = TRUE),
factorB = sample(letters[1:2], size = 1000, replace = TRUE),
factorC = sample(letters[1:2], size = 1000, replace = TRUE))
# Generate interactions explicitly and add them to the data.frame
glm.dat$factorAB <- with(glm.dat, interaction(factorA, factorB))
glm.dat$factorAC <- with(glm.dat, interaction(factorA, factorC))
glm.dat$factorBC <- with(glm.dat, interaction(factorB, factorC))
glm.dat$factorABC <- with(glm.dat, interaction(factorA, factorB, factorC))
# General linear model
glm.mod <- glm(y ~ factorA + factorB + factorC + factorAB + factorAC +
factorBC + factorABC, family = 'gaussian', data = glm.dat)
# Multiple comparison test
library(agricolae)
comp <- HSD.test(glm.mod, trt = "factorABC", group = TRUE)
giving
comp$groups giving
trt means M
1 a.a.a 0.070052189 a
2 a.b.b 0.035684571 a
3 b.a.a 0.020517535 a
4 b.b.b -0.008153257 a
5 a.b.a -0.036136140 a
6 a.a.b -0.078891136 a
7 b.a.b -0.080845419 a
8 b.b.a -0.115808772 a

Related

weighting not works in 'aov' function of R

I have got in trouble with implementing weighted dataset by aov function in R.
For example my dataset "data_file" has target var "Y", and four independent var named (treat, V1 ,V2, V3).
Assuming:
V1(2 groups) & treat(3 groups) --> categorical,
V2 and v3 --> continuous.
I want to check baseline comparisons of independent variables among treat groups.
I ran aov test for this purpose,example:
base_V2_aov <- aov(data_file$V2 ~ data_file$treat)
base_V2_anov <- anova(base_V2_aov)
base_V2
It worked and showed significant difference of V2 among "treat" group but other variables were non significant, then i decided to weight my data based on V2 and run aov test in weighted data.
I used mnps function in Twang package for weighting.
mnps.data <- mnps(treat ~ V2, data_file, estimand = "ATE", stop-method = "es.mean", n.trees=5000, varbose = F)
data_file$ weight <- get.weights(mnps.data, stop.method = "es.mean")
I have read in one stackoverflow answer that survey packages does not support weighting for one-way ANOVA test, but aov function does.
So i ran this code:
base_V2_aov <- aov(data_file$V2 ~ data_file$trea, weights(data_file$weight))
base_V2_anov <- anova(base_V2_aov)
print(base_V2_anov)
It showes an error:
Error: $ operator is invalid for atomic vectors
I tried :
base_V2_aov <- aov(data_file$V2 ~ data_file$trea, weights(weight))
It did not find object "weight"
I also checked this :
base_V2_aov <- aov(data_file$V2 ~ data_file$trea, weights(data_file))
It did not show error, but the results were exactly the same as without weighting(i expected to change base on significant difference without weighting)
I want to know what is the appropriate "object" for weights in aov function?
It seems you should use:
weight = your weighting variable in the aov arguments.
I replicate something like your dataset, and after using above code the results of the comparisons between groups were different which showed that weighting method had worked.

Reporting interactions in a linear model in r

I am trying to report all the interactions in a linear model that reads:
mod1.lme <- lm(volume ~ Group * Treatment + Group + Treatment, data = df)
Group is a factor variable with 3 levels: A, B and C.
The result that I currently get is for (I made up the data):
These two estimates are in reference to Treatment:A, but I would like to see each effect independently. So the output that I would like to get is:
Treatment:A
Treatment:B
Treatment:C
If I eliminate the intercept adding -1 at the end I get:
What is the best way to code this?
Thanks
The reason you are seeing the output that you are, is that one of the factor levels of Treatment becomes a reference level. When interpreting the model the coefficients become "the difference in effect from the reference level". This is necessary as long as the model includes an intercept, so the only way to get the interpretation with all coefficients shown is to remove the intercept as shown below.
mod1.lme <- lm(volume ~ Group * Treatment - 1, data = df)
Edit:
To change the name of the interaction effect, one would have to edit the name manually
sum.lm <- summary(mod1.lme)
rownames(sum.lm$coef) <- c("groupA","groupB","groupC", "groupA:Treatment", "groupB:Treatment", "groupC:Treatment")
or alternatively use another package for summaries such as sjPlot
library(sjPlot)
tab_model(mod1.lme, pred.labels = c("groupA","groupB","groupC", "groupA:Treatment", "groupB:Treatment", "groupC:Treatment"))

Discrepancy emmeans in R (using ezAnova) vs estimated marginal means in SPSS

So this is a bit of a hail mary, but I'm hoping someone here has encountered this before. I recently switched from SPSS to R, and I'm now trying to do a mixed-model ANOVA. Since I'm not confident in my R skills yet, I use the exact same dataset in SPSS to compare my results.
I have a dataset with
dv = RT
within = Session (2 levels), Cue (3 levels), Flanker (2 levels)
between = Group(3 levels).
no covariates.
unequal number of participants per group level (25,25,23)
In R I'm using the ezAnova package to do the mixed-model anova:
results <- ezANOVA(
data = ant_rt_correct
, wid = subject
, dv = rt
, between = group
, within = .(session, cue, flanker)
, detailed = T
, type = 3
, return_aov = T
)
In SPSS I use the following GLM:
GLM rt.1.center.congruent rt.1.center.incongruent rt.1.no.congruent rt.1.no.incongruent
rt.1.spatial.congruent rt.1.spatial.incongruent rt.2.center.congruent rt.2.center.incongruent
rt.2.no.congruent rt.2.no.incongruent rt.2.spatial.congruent rt.2.spatial.incongruent BY group
/WSFACTOR=session 2 Polynomial cue 3 Polynomial flanker 2 Polynomial
/METHOD=SSTYPE(3)
/EMMEANS=TABLES(group*session*cue*flanker)
/PRINT=DESCRIPTIVE
/CRITERIA=ALPHA(.05)
/WSDESIGN=session cue flanker session*cue session*flanker cue*flanker session*cue*flanker
/DESIGN=group.
The results of which line up great, ie:
R: Session F(1,70) = 46.123 p = .000
SPSS: Session F(1,70) = 46.123 p = .000
I also ask for the means per cell using:
descMeans <- ezStats(
data = ant_rt_correct
, wid = subject
, dv = rt
, between = group
, within = .(session, cue, flanker) #,cue,flanker)
, within_full = .(location,direction)
, type = 3
)
Which again line up perfectly with the descriptives from SPSS, e.g. for the cell:
Group(1) - Session(1) - Cue(center) - Flanker(1)
R: M = 484.22
SPSS: M = 484.22
However, when I try to get to the estimated marginal means, using the emmeans package:
eMeans <- emmeans(results$aov, ~ group | session | cue | flanker)
I run into descrepancies as compared to the Estimated Marginal Means table from the SPSS GLM output (for the same interactions), eg:
Group(1) - Session(1) - Cue(center) - Flanker(1)
R: M = 522.5643
SPSS: M = 484.22
It's been my understanding that the estimated marginal means should be the same as the descriptive means in this case, as I have not included any covariates. Am I mistaken in this? And if so, how come the two give different results?
Since the group sizes are unbalanced, I also redid the analyses above after making the groups of equal size. In that case the emmeans became:
Group(1) - Session(1) - Cue(center) - Flanker(1)
R: M =521.2954
SPSS: M = 482.426
So even with equal group sizes in both conditions, I end up with quite different means. Keep in mind that the rest of the statistics and the descriptive means áre equal between SPSS and R. What am I missing... ?
Thanks!
EDIT:
The plot thickens.. If I perform the ANOVA using the AFEX package:
results <- aov_ez(
"subject"
,"rt"
,ant_rt_correct
,between=c("group")
,within=c("session", "cue", "flanker")
)
)
and then take the emmeans again:
eMeans <- emmeans(results, ~ group | session | cue | flanker)
I suddenly get values much closer to that of SPSS (and the descriptive means)
Group(1) - Session(1) - Cue(center) - Flanker(1)
R: M = 484.08
SPSS: M = 484.22
So perhaps ezANOVA is doing something fishy somewhere?
I suggest you try this:
library(lme4) ### I'm guessing you need to install this package first
mod <- lmer(rt ~ session + cue + flanker + (1|group),
data = ant_rt_correct)
library(emmeans)
emm <- emmeans(mod, ~ session * cue * flanker)
pairs(emm, by = c("cue", "flanker") # simple comparisons for session
pairs(emm, by = c("session", "flanker") # simple comparisons for cue
pairs(emm, by = c("session", "cue") # simple comparisons for flanker
This fits a mixed model with random intercepts for each group. It uses REML estimation, which is likely to be what SPSS uses.
In contrast, ezANOVA fits a fixed-effects model (no within factor at all), and aov_ez uses the aov function which produces an analysis that ignores the inter-block effects. Those make a difference especially with unbalanced data.
An alternative is to use afex::mixed, which in fact uses lme4::lmer to fit the model.

LMER Factor vs numeric Interaction

I am attempting to use lmer to model my data.
My data has 2 independent variables and a dependent variable.
The first is "Morph" and has values "Identical", "Near", "Far".
The second is "Response" which can be "Old" or "New".
The dependent variable is "Fix_Count".
So here is a sample dataframe and what I currently have for running the linear model.
Subject <- c(rep(1, times = 6), rep(2, times = 6))
q <- c("Identical", "Near", "Far")
Morph <- c(rep(q, times = 4))
t <- c(rep("old", times = 3),rep("new", times=3))
Response <- c(rep(t, times = 2))
Fix_Count <- sample(1:9, 12, replace = T)
df.main <- data.frame(Subject,Morph, Response, Fix_Count, stringsAsFactors = T)
df.main$Subject <- as.factor(df.main$Subject)
res = lmer(Fix_Count ~ (Morph * Response) + (1|Subject), data=df.main)
summary(res)
And the output looks like this:
The issue is I do not want it to do combination but an overall interaction of Morph:Response.
I can get it to do this by converting Morph to numeric instead of factor. However I'm not sure conceptually that makes sense as the values don't properly represent 1,2,3 but low-mid-high (ordered but qualitative).
So: 1. Is it possible to run lmer to get interaction effects between 2 factor variables?
2. Or do you think numeric is a fine way to class "Identica", "Near", "Far"?
3. I have tried setting contrasts to see if that can help, but sometimes I get an error and other times it seems like nothing is changed. If contrasts would help, could you explain how I would implement this?
Thank you so much for any help you can offer. I have also posted this question to stack exchange as I am unsure if this is a coding issue or a stats issue. However I can remove it from the less relevant forum once I know.
Best, Kirk
Two problems I see. First, you should be using a factor variable for Subject. It's clearly not a continuous or integer variable. And to (possibly) address part of your question, there is an interaction function designed to work with regression formulas. I'm pretty sure that the formula interface will interpret the "*" operator that you used as a call to interaction, but the labeling of the output may be different and perhaps more to your liking. I get the same number of coefficients with:
res = lmer(Fix_Count ~ interaction(Morph , Response) + (1|Subject), data=df.main)
But that's not an improvement.
However, they differ from the model created with Morph*Response. Probably there is a different set of contrast options.
The way to get an overall statistical test of the interaction is to compare nested models:
res_simple = lmer(Fix_Count ~ Morph + Response + (1|Subject), data=df.main)
And then do an anova for the model comparison:
anova(res,res_simple)
refitting model(s) with ML (instead of REML)
Data: df.main
Models:
res_simple: Fix_Count ~ Morph + Response + (1 | Subject)
res: Fix_Count ~ interaction(Morph, Response) + (1 | factor(Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
res_simple 6 50.920 53.830 -19.460 38.920
res 8 54.582 58.461 -19.291 38.582 0.3381 2 0.8445
My opinion is that it is sufficiently close to the boundary for stats vs coding that it could have been acceptable on either forum. (You are not supposed to cross post, however.) If you are satisfied with a coding answer then we are done. If you need help with understanding model comparison, then you may need to edit your CV.com's question to request a more theory-based answer than mine. (I checked to make sure the anova results are the same regardless of whether you use the interaction function or the "*" operator.)

"Vectorizing" this for-loop in R? (suppressing interaction main effects in lm)

When interactions are specified in lm, R includes main effects by default, with no option to suppress them. This is usually appropriate and convenient, but there are certain instances (within estimators, ratio LHS variables, among others) where this isn't appropriate.
I've got this code that fits a log-transformed variable to a response variable, independently within subsets of the data.
Here is a silly yet reproducible example:
id = as.factor(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,7,7,8,8,8,9,9,9,9,10))
x = rexp(length(id))
y = rnorm(length(id))
logx = log(x)
data = data.frame(id,y,logx)
for (i in data$id){
sub = subset(data, id==i) #This splits the data by id
m = lm(y~logx-1,data=sub) #This gives me the linear (log) fit for one of my id's
sub$x.tilde = log(1+3)*m$coef #This linearizes it and gives me the expected value for x=3
data$x.tilde[data$id==i] = sub$x.tilde #This puts it back into the main dataset
data$tildecoeff[data$id==i] = m$coef #This saves the coefficient (I use it elsewhere for plotting)
}
I want to fit a model like the following:
Y = B(X*id) +e
with no intercept and no main effect of id. As you can see from the loop, I'm interested in the expectation of Y when X=3, constrained the fit through the origin (because Y is a (logged) ratio of Y[X=something]/Y[X=0].
But if I specify
m = lm(Y~X*as.factor(id)-1)
there is no means of suppressing the main effects of id. I need to run this loop several hundred times in an iterative algorithm, and as a loop it is far too slow.
The other upside of de-looping this code is that it'll be much more convenient to get prediction intervals.
(Please, I don't need pious comments about how leaving out main effects and intercepts is improper -- it usually is, but I can promise that it isn't in this instance).
Thanks in advance for any ideas!
I think you want
m <- lm(y ~ 0 + logx : as.factor(id))
see R-intro '11.1 Defining statistical models; formulae'

Resources