lmer factor vs. numeric interaction in R

I am attempting to use lmer to model my data.
My data has 2 independent variables and a dependent variable.
The first is "Morph" and has values "Identical", "Near", "Far".
The second is "Response" which can be "Old" or "New".
The dependent variable is "Fix_Count".
So here is a sample dataframe and what I currently have for running the linear model.
library(lme4)

# toy data: 2 subjects x 3 Morph levels x 2 Response levels
Subject <- c(rep(1, times = 6), rep(2, times = 6))
q <- c("Identical", "Near", "Far")
Morph <- rep(q, times = 4)
t <- c(rep("old", times = 3), rep("new", times = 3))
Response <- rep(t, times = 2)
Fix_Count <- sample(1:9, 12, replace = TRUE)
df.main <- data.frame(Subject, Morph, Response, Fix_Count, stringsAsFactors = TRUE)
df.main$Subject <- as.factor(df.main$Subject)

res <- lmer(Fix_Count ~ Morph * Response + (1 | Subject), data = df.main)
summary(res)
The summary output gives a separate coefficient for each dummy-coded level and each pairwise combination. The issue is that I do not want the per-combination estimates but an overall test of the Morph:Response interaction. I can get that by converting Morph to numeric instead of factor. However, I'm not sure that makes sense conceptually, as the values don't really represent 1, 2, 3 but low-mid-high (ordered but qualitative).
So: 1. Is it possible to run lmer to get interaction effects between 2 factor variables?
2. Or do you think numeric is a fine way to class "Identical", "Near", "Far"?
3. I have tried setting contrasts to see if that can help, but sometimes I get an error and other times it seems like nothing is changed. If contrasts would help, could you explain how I would implement this?
Thank you so much for any help you can offer. I have also posted this question to Stack Exchange, as I am unsure whether this is a coding issue or a stats issue; however, I can remove it from the less relevant forum once I know.
Best, Kirk

Two problems I see. First, you should be using a factor variable for Subject; it's clearly not a continuous or integer variable. And to (possibly) address part of your question, there is an interaction function designed to work with regression formulas. I'm pretty sure the formula interface interprets the "*" operator you used as a call to interaction, but the labeling of the output may be different and perhaps more to your liking. I get the same number of coefficients with:
res <- lmer(Fix_Count ~ interaction(Morph, Response) + (1 | Subject), data = df.main)
But that's not an improvement. The coefficient estimates do differ from those of the model created with Morph*Response, though; probably a different set of contrast options is in play.
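If you want to see which contrasts each fit is using, a quick sketch with standard accessors (res is the model fit above):
contrasts(df.main$Morph)    # default treatment (dummy) coding
contrasts(df.main$Response)
head(model.matrix(res))     # the fixed-effects columns built from the formula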
The way to get an overall statistical test of the interaction is to compare nested models:
res_simple <- lmer(Fix_Count ~ Morph + Response + (1 | Subject), data = df.main)
And then do an anova() for the model comparison:
anova(res, res_simple)
refitting model(s) with ML (instead of REML)
Data: df.main
Models:
res_simple: Fix_Count ~ Morph + Response + (1 | Subject)
res: Fix_Count ~ interaction(Morph, Response) + (1 | factor(Subject))
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
res_simple  6 50.920 53.830 -19.460   38.920
res         8 54.582 58.461 -19.291   38.582 0.3381      2     0.8445
My opinion is that this question is close enough to the boundary between stats and coding that it could have been acceptable on either forum. (You are not supposed to cross-post, however.) If you are satisfied with a coding answer, then we are done. If you need help understanding model comparison, you may need to edit your Cross Validated question to request a more theory-based answer than mine. (I checked to make sure the anova results are the same regardless of whether you use the interaction function or the "*" operator.)
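On the contrasts part of the question: one common implementation is sum-to-zero ("effects") coding. A minimal sketch using the toy data above (res_sum is just an illustrative name); this changes how the individual coefficients are parameterized, not the likelihood-ratio test from anova():
contrasts(df.main$Morph) <- contr.sum(3)     # effects coding instead of dummy coding
contrasts(df.main$Response) <- contr.sum(2)
res_sum <- lmer(Fix_Count ~ Morph * Response + (1 | Subject), data = df.main)
summary(res_sum)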

Related

linear mixed models: (in)appropriate factor level setup? + rank deficient matrix warning

I'm trying to run a mixed-effects model to look at the effects of three treatments on weight in two different groups. I have two questions; I'm really sorry if they're stupid ones. I have next to no experience using R. My supervisor said that I should use RStudio and gave me some code (which he didn't write either, so he can't really help me with it, and nor can anybody else in my lab), and I'm uncertain about whether it's doing what I need it to be doing. I'd really appreciate some help.
The first question is about the way my factor levels function in the analysis. My groups and factor levels are set up as follows:
# standalone factor copies (note: the model below uses the data columns and BOX,
# not these copies)
GT    <- as.factor(data$GT)
AT    <- as.factor(data$AT)
FM    <- as.factor(data$FM)
batch <- as.factor(data$BOX)
# re-level the factors on the data frame itself
data$GT <- factor(data$GT, levels = c("WT", "m5"))
data$AT <- factor(data$AT, levels = c("no", "yes"))
data$FM <- factor(data$FM, levels = c("no", "yes"))
Notably, everyone in FM is also in AT.
(i.e. for both WT and M5, there's ATno/FMno, ATyes/FMno, and ATyes/FMyes, but no ATno/FMyes treatments)
This is the code I've been given:
Weight <- data[, 53:58]
Weight.new <- organise_data(data = Weight, groups = group.info,
                            new_var_name = 'Time')
Weight.new$Time <- factor(Weight.new$Time,
                          levels = c('Weight_W8', 'Weight_W9', 'Weight_W10',
                                     'Weight_W11', 'Weight_W12', 'Weight_W13'))
Weight.new$Time <- as.numeric(Weight.new$Time)
Weight.lmm <- lmer(value ~ GT + AT + FM + Time + GT*FM + AT*FM + GT*AT*FM +
                     (1 | BOX),
                   na.action = na.omit, data = Weight.new)
summary(Weight.lmm)
Will this mean that results for ATyes include both ATyes/FMno and ATyes/FMyes? If so, given that ATyes/FMno and ATyes/FMyes are different treatment groups, they should be looked at separately, right? How should I do this?
Second, two of the interactions (AT * FM and GT * AT * FM) are always dropped with the following warning:
fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients
I understand there can be a number of reasons for this, and that it's not necessarily fatal, but here I'm worried that it's because I'm using the model inappropriately, or have specified some terms incorrectly. Is this an appropriate way to have structured things? Again, I'm sorry if this is completely obvious; I'm just a bit inexperienced and overwhelmed, and there's nobody else I can ask. Any advice would be extremely appreciated.
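(As a note on that warning: the empty ATno/FMyes cell can be seen directly by tabulating the design, and the two nested treatments can be collapsed into a single factor. A sketch assuming the data frame above; Trt is an illustrative name:)
xtabs(~ AT + FM, data = data)  # the ATno/FMyes cell is empty, so the AT:FM and
                               # GT:AT:FM columns are collinear and get dropped
data$Trt <- droplevels(interaction(data$AT, data$FM))  # 3 levels: the cells that exist
# Trt could then replace AT and FM in the formula, e.g. value ~ GT * Trt + Time + (1 | BOX)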

Reporting interactions in a linear model in r

I am trying to report all the interactions in a linear model that reads:
mod1.lme <- lm(volume ~ Group * Treatment + Group + Treatment, data = df)
Group is a factor variable with 3 levels: A, B and C.
The result that I currently get (the data are made up) shows only two interaction estimates, both in reference to Treatment:A, but I would like to see each effect independently. The output that I would like to get is:
Treatment:A
Treatment:B
Treatment:C
If I eliminate the intercept by adding -1 at the end, I get a coefficient for each Group level, but still not the interaction labels I am after.
What is the best way to code this?
Thanks
The reason you are seeing the output that you are is that one of the factor levels of Treatment becomes the reference level. When interpreting the model, the coefficients become "the difference in effect from the reference level". This is necessary as long as the model includes an intercept, so the only way to get all coefficients shown is to remove the intercept, as shown below.
mod1.lme <- lm(volume ~ Group * Treatment - 1, data = df)
Edit:
To change the names of the interaction effects, one would have to edit them manually:
sum.lm <- summary(mod1.lme)
rownames(sum.lm$coefficients) <- c("groupA", "groupB", "groupC",
                                   "groupA:Treatment", "groupB:Treatment",
                                   "groupC:Treatment")
or alternatively use another package for summaries such as sjPlot
library(sjPlot)
tab_model(mod1.lme, pred.labels = c("groupA", "groupB", "groupC",
                                    "groupA:Treatment", "groupB:Treatment",
                                    "groupC:Treatment"))

glmer extracts coefficient only for main (predictor) factor, not contrast

Here is the glmer model:
model <- glmer(ACC ~ Group * M_O * Lblock + (1 | Subject) + (1 | hand),
               data = learndata_long3, family = "binomial")
The 'Lblock' factor has 9 levels; the others each have 2.
The results look like this:
summary(model)$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.437931021 0.16334362 2.68104155 7.339340e-03
Group1 -0.032138148 0.14961572 -0.21480463 8.299196e-01
M_O1 0.135726477 0.04115871 3.29763642 9.750230e-04
Lblock1 0.301264476 0.08343952 3.61057288 3.055214e-04
Lblock2 0.623913565 0.08247767 7.56463576 3.889529e-14
Lblock3 1.022046512 0.08235930 12.40960689 2.317880e-35
Lblock4 1.399407518 0.08337615 16.78426631 3.181367e-63
Lblock5 1.741198402 0.08541505 20.38514752 2.265326e-92
Lblock6 2.065315516 0.08843600 23.35378765 1.261292e-120
Lblock7 2.268393650 0.09075950 24.99345703 7.201546e-138
Lblock8 2.637079325 0.09707420 27.16560426 1.656429e-162
All I want is to extract one estimate per factor, like:
Estimate: Group / M_O / Lblock
How can I do that? Just sum up and then average over the Lblock levels? Or...?
Very new to these fields, thanks for your help.
Thanks for clarifying a bit. I think what you are expecting is something similar to the output of an ANOVA? But this will not be possible with your data, as you have two random effects specified.
As you are running a logistic regression, you should read up a bit on how to interpret those models. (I'm just putting this here because you said you were new to this.)
https://stats.idre.ucla.edu/stata/output/logistic-regression-analysis/
Now, if you want to test the contribution of one of your factors to the model, you have to create nested models and compare them with a likelihood-ratio test using the anova() function in R.
For example, let's say that you had the same model you had above, but without any interactions specified:
m1 <- glmer(ACC ~ Group + M_O + Lblock + (1 | Subject) + (1 | hand),
            data = learndata_long3, family = "binomial")
And then one without the Group predictor:
m2 <- glmer(ACC ~ M_O + Lblock + (1 | Subject) + (1 | hand),
            data = learndata_long3, family = "binomial")
Then we compare whether having the Group predictor significantly improved the model:
anova(m1,m2)
This will give you a p-value telling you whether the addition of Group significantly improves model fit.
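The same pattern extends to the interactions themselves: refit the full factorial model from the question and compare it to the additive m1. A sketch (m0 is just an illustrative name):
m0 <- glmer(ACC ~ Group * M_O * Lblock + (1 | Subject) + (1 | hand),
            data = learndata_long3, family = "binomial")
anova(m0, m1)  # likelihood-ratio test of all the interaction terms at once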
If this all seems like a lot, which it is if you're not familiar with model comparison, I'd recommend looking at this tutorial from Bodo Winter. It's directed at people who are new to mixed models and want a conceptual foundation of what is going on. I don't know what field you are in, but I think the examples are pretty accessible to everyone.
https://arxiv.org/abs/1308.5499
Please let me know if you need any other clarifications or have any questions during the tutorial.

Ordinal regression - proportional odds assumption not met for variable in interaction

I am trying to analyze a dataset with an ordinal response (0-4) and three categorical factors. I'm interested in the interactions of all three factors as well as the main effects. I used the clm function of the package "ordinal" and checked the assumptions using the nominal_test function. It revealed a significant difference for one of the predictors, and now I don't know how to proceed. I tried to put the problematic factor and all its interactions in the "nominal" argument (see code) and R gives me warnings. Nevertheless, I made several likelihood-ratio tests, always comparing a model including an interaction with one missing it (anova(without, with, test = "Chisq")), and get some nice significant results. Still, I feel like I have no clue what I'm doing here and I don't trust the results. So my question is: Is what I did OK? What else can I do? Or is the data just 'unanalyzable'?
Here is the code for the test:
# this is the model
res <- clm(cue ~ intention:outcome:age +
             intention:outcome +
             intention:age +
             outcome:age +
             intention + outcome + age +
             Gender,
           data = xdata)
# proportional odds assumption
nominal_test(res)
#                       Df  logLik    AIC    LRT  Pr(>Chi)
# <none>                   -221.50 467.00
# intention              3 -215.05 460.11 12.891  0.004879 **
# outcome                3 -219.44 468.87  4.124  0.248384
# age
# Gender                 3 -219.50 469.00  3.994  0.262156
# intention:outcome
# intention:age
# outcome:age            6 -217.14 470.28  8.716  0.190199
# intention:outcome:age 12 -188.09 424.19 66.808 1.261e-09 ***
And here is an example of how I tried to solve it and check the 3-way interaction of all three predictors (I did the same for the 2-way interactions as well):
res <- clm(cue ~ outcome:age +
             outcome + age +
             Gender,
           nominal = ~ intention:age:outcome +
             intention:age +
             intention:outcome +
             intention,
           data = xdata)
res.red <- clm(cue ~ outcome:age +
                 outcome + age +
                 Gender,
               nominal = ~ intention:age +
                 intention:outcome +
                 intention,
               data = xdata)
anova(res, res.red, test = "Chisq")
#         no.par    AIC  logLik LR.stat df Pr(>Chisq)
# res.red     26 412.50 -180.25
# res         33 424.11 -179.05  2.3945  7     0.9348
And here is the warning that R gives me when I try to fit the model:
Warning message:
(-3) not all thresholds are increasing: fit is invalid
In addition: Absolute convergence criterion was met, but relative criterion was not met
I'm especially concerned about the phrase "fit is invalid"... I don't know what to do with this and would be happy about any idea or hint!
Thank you!
Have you tried using a more general model like the partial proportional odds model? Your data only has to be nominal, not ordinal, to use this model. If you find huge differences between the log-likelihoods, your assumption of ordinality is not met.
You can use vglm() from the VGAM package. Here are a few examples.
As I don't know what your data looks like, I can't say whether it's unanalyzable, but the code would be something like this:
library(VGAM)
res <- vglm(cue ~ intention:outcome:age +
              intention:outcome +
              intention:age +
              outcome:age +
              intention + outcome + age +
              Gender,
            family = cumulative(parallel = FALSE ~ intention),
            data = xdata)
summary(res)
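For the deviance comparison below to stay within one framework, you would presumably refit the reduced (fully proportional-odds) model with vglm() as well, rather than reusing the clm fit. A sketch mirroring the formula above:
res.red <- vglm(cue ~ intention:outcome:age +
                  intention:outcome +
                  intention:age +
                  outcome:age +
                  intention + outcome + age +
                  Gender,
                family = cumulative(parallel = TRUE),  # proportional odds for all predictors
                data = xdata)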
I think you could use pchisq(), as proposed in the examples I posted above, to compare both models like you did before with anova():
# reduced-model deviance minus full-model deviance, on the difference in df
pchisq(deviance(res.red) - deviance(res),
       df = df.residual(res.red) - df.residual(res), lower.tail = FALSE)

How to obtain Tukey compact letter display from a GLM with interactions

I have a set of data that I've analyzed with a generalized linear model that has three categorical factors in a 3-way interaction (factorA, factorB, factorC) and a fourth, continuous factor (factorD) that is simply added in the model. I am trying to obtain a set of Tukey letter groups (i.e., a compact letter display) from the model but haven't found a way to include the interaction successfully. I'm not interested in including factorD, just the three in the interaction.
I have gotten the Tukey-adjusted pairwise comparisons with this:
lsmeans(my.glm, ~ factorA * factorB * factorC)
But I was not able to figure out how to produce a compact letter display from that. It can be done with the multcomp package, but I could only find ways to do it for main effects with that package, not interactions.
So then I tried the agricolae package, as this post (https://stats.stackexchange.com/questions/31547/how-to-obtain-the-results-of-a-tukey-hsd-post-hoc-test-in-a-table-showing-groupe) suggests that should work. However, following the instructions in that answer led to a non-functional response from HSD.test. Specifically, I could get the main-effects tests to work fine, e.g. HSD.test(my.glm, "factorA"), but I could not get the interactions to work. I tried this:
intxns <- with(my.data, interaction(factorA, factorB, factorC))
HSD.test(my.glm, "intxns", group = TRUE)
But I get an error indicating that the HSD.test function didn't recognize "intxns" as a valid object; it looks like this (also, I checked the intxns object and it looks good, and the number of rows matches the number of residuals of my glm):
Name: intxns
factorA factorB factorC factorD
I get that same error if I just put nonsense into the factor field in the HSD.test function call.
The agricolae documentation doesn't actually cover the use of interactions in HSD.test, but I assume it can work.
Does anyone know how to get HSD.test to work with interactions? Or is there any other function you've gotten to produce compact letter displays for a glm with interactions?
I've been working on this for a number of days now and haven't been able to find a solution; hopefully I'm not missing something obvious.
Thanks!
I don't know how you've specified your glm model, but HSD.test looks to match the treatment name given to it with the same name specified in the glm formula as well as in the data frame. This is why your main effect, factorA, will work, but not the 3-way interaction. For multiple-comparison tests on interactions, I find it easiest to generate the interactions separately and add them to the data frame as additional columns. The glm model can then be specified using the new variables, which code for the interaction.
For example,
set.seed(42)
glm.dat <- data.frame(y = rnorm(1000),
                      factorA = sample(letters[1:2], size = 1000, replace = TRUE),
                      factorB = sample(letters[1:2], size = 1000, replace = TRUE),
                      factorC = sample(letters[1:2], size = 1000, replace = TRUE))
# Generate interactions explicitly and add them to the data.frame
glm.dat$factorAB  <- with(glm.dat, interaction(factorA, factorB))
glm.dat$factorAC  <- with(glm.dat, interaction(factorA, factorC))
glm.dat$factorBC  <- with(glm.dat, interaction(factorB, factorC))
glm.dat$factorABC <- with(glm.dat, interaction(factorA, factorB, factorC))
# General linear model
glm.mod <- glm(y ~ factorA + factorB + factorC + factorAB + factorAC +
                 factorBC + factorABC, family = 'gaussian', data = glm.dat)
# Multiple comparison test
library(agricolae)
comp <- HSD.test(glm.mod, trt = "factorABC", group = TRUE)
giving:
comp$groups
     trt        means M
1  a.a.a  0.070052189 a
2  a.b.b  0.035684571 a
3  b.a.a  0.020517535 a
4  b.b.b -0.008153257 a
5  a.b.a -0.036136140 a
6  a.a.b -0.078891136 a
7  b.a.b -0.080845419 a
8  b.b.a -0.115808772 a
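As an aside: since the question already gets Tukey-adjusted comparisons from lsmeans, the compact letter display can also come from that route via cld() from multcomp. An untested sketch, assuming my.glm as in the question:
library(lsmeans)
library(multcomp)
lsm <- lsmeans(my.glm, ~ factorA * factorB * factorC)
cld(lsm, Letters = letters)  # letters over the cells of the 3-way interaction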
