I built a model in R with the lmer function:
lmer(DV ~ IV1 + IV1:IV2 - 1 + (1|Group/Participant))
This was correctly specified and I got the results I expected.
I'm now trying to replicate these results in SPSS. So far I have:
MIXED DV BY IV1 IV2
/FIXED IV1 IV1*IV2 | NOINT SSTYPE(3)
/METHOD REML
/RANDOM= INTERCEPT | SUBJECT(Group*Participant) COVTYPE(VC)
/PRINT= SOLUTION TESTCOV.
My results are not remotely similar, and I believe it is because of the differences between the : and * terms.
How can I replicate IV1 + IV1:IV2 in SPSS?
If I'm understanding the R formula documentation properly, the ":" there is the same as the "*" or "BY" specification in SPSS.
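The distinction can be checked directly in R with a toy design (hypothetical two-level factors, not from the original data):

```r
d <- data.frame(IV1 = factor(c("a", "a", "b", "b")),
                IV2 = factor(c("x", "y", "x", "y")))
colnames(model.matrix(~ IV1:IV2, d))  # interaction columns only
colnames(model.matrix(~ IV1*IV2, d))  # main effects plus the interaction
```

So R's `*` expands to main effects plus interaction, while `:` (like SPSS's `*` in /FIXED) is the bare interaction term.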
If you want a second, uncorrelated random effect with just Group as the subject specification, simply add a second RANDOM subcommand, such as:
/RANDOM= INTERCEPT | SUBJECT(Group) COVTYPE(VC)
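The reason two RANDOM subcommands are needed is that lme4's `a/b` nesting shorthand expands to two separate terms; `lme4::findbars()` makes the expansion visible (a sketch with the variable names from the question):

```r
library(lme4)
# a/b expands to a + a:b inside a random-effects term, so
# (1|Group/Participant) is (1|Group) + (1|Group:Participant)
findbars(DV ~ IV1 + (1 | Group/Participant))
# the two expanded bar terms (1 | Group and 1 | Group:Participant)
# mirror the two /RANDOM subcommands in SPSS
```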
I was using the glmer code for a logistic regression model with 2.5 million observations. However, after I added the multi-level component (a few hundred thousand groups), the data was too large to run in a timely manner on my computer. I want to try a generalized additive model instead, but I am confused about how to write the code.
The glmer code is as follows:
mylogit.m1a <- glmer(outcome ~ exposure*risk + tenure.yr + CurrentAge +
                       percap.inc.k + employment + rentership + pop.change + pop.den.k +
                       (1 | geo_id / house_id),
                     data = temp, family = "binomial",
                     control = glmerControl(optimizer = "bobyqa", calc.derivs = FALSE))
print(Sys.time()-start)
The example I found writes the gam like this:
ga_model = gam(
Reaction ~ Days + s(Subject, bs = 're') + s(Days, Subject, bs = 're'),
data = sleepstudy,
method = 'REML'
)
But I am confused about why there are two terms in parentheses, and about what I should put in parentheses to specify the model correctly.
The details are given in ?smooth.construct.re.smooth.spec:
Exactly how the random effects are implemented is best seen by
example. Consider the model term ‘s(x,z,bs="re")’. This will
result in the model matrix component corresponding to ‘~x:z-1’
being added to the model matrix for the whole model.
So s(Days, Subject, bs = "re") is equivalent to the (0 + Days|Subject) term in the lmer model: both of them encode "random variation in slope with respect to day across subjects".
So your (1 | geo_id / house_id) would be translated to mgcv syntax as
s(geo_id, bs = "re") + s(geo_id, house_id, bs = "re")
(the nesting syntax a/b expands in general to a + a:b).
A couple of other comments:
you should probably use bam() as a drop-in replacement for gam() (much faster)
you may very well run into problems with memory usage: mgcv doesn't use sparse matrices for the random effects terms, so they can get big
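Putting the pieces together, a sketch of the full model in mgcv syntax (assuming the same columns exist in `temp` as in the original glmer call; the grouping variables must be factors for bs = "re" smooths):

```r
library(mgcv)
# grouping variables must be factors for random-effect smooths
temp$geo_id   <- factor(temp$geo_id)
temp$house_id <- factor(temp$house_id)

ga_model <- bam(
  outcome ~ exposure * risk + tenure.yr + CurrentAge + percap.inc.k +
    employment + rentership + pop.change + pop.den.k +
    s(geo_id, bs = "re") + s(geo_id, house_id, bs = "re"),
  data = temp, family = binomial,
  method = "fREML",     # fast REML, the default criterion for bam()
  discrete = TRUE       # discretized covariates: further speedup on large data
)
```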
Aside from R function nlme::lme(), I'm wondering how else I can model the Level-1 residual variance-covariance structure?
ps. My search showed I could possibly use glmmTMB package but it seems it is not about Level-1 residuals but random-effects themselves (see below code).
glmmTMB::glmmTMB(y ~ times + ar1(times | subjects), data = data) ## DON'T RUN
nlme::lme (y ~ times, random = ~ times | subjects,
correlation = corAR1(), data = data) ## DON'T RUN
glmmTMB can effectively be used to model level-1 residuals, by adding an observation-level random effect to the model (and, if necessary, suppressing the level-1 variance via dispformula ~ 0). For example, comparing the same fit in lme and glmmTMB:
library(glmmTMB)
library(nlme)
data("sleepstudy" ,package="lme4")
ss <- sleepstudy
ss$times <- factor(ss$Days) ## needed for glmmTMB
I initially tried with random = ~Days|Subject, but neither lme nor glmmTMB was happy (overfitted):
lme1 <- lme(Reaction ~ Days, random = ~1|Subject,
correlation=corAR1(form=~Days|Subject), data=ss)
m1 <- glmmTMB(Reaction ~ Days + (1|Subject) +
ar1(times + 0 | Subject),
dispformula=~0,
data=ss,
REML=TRUE,
start=list(theta=c(4,4,1)))
Unfortunately, in order to get a good answer with glmmTMB I did have to tweak the starting values ...
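To confirm the two parameterizations agree, one can compare the fitted log-likelihoods and the estimated AR1 correlation (a sketch, assuming lme1 and m1 from the code above):

```r
# log-likelihoods should match (up to small numerical differences)
logLik(lme1)
logLik(m1)

# AR1 correlation estimate from the nlme fit:
coef(lme1$modelStruct$corStruct, unconstrained = FALSE)

# glmmTMB reports the AR1 correlation in its variance-components output:
VarCorr(m1)
```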
I originally ran my data in SPSS because figuring out the lmer package took some time for me to learn. I spent a few weeks writing up a script in R, but my output in R is different than what I'm getting using SPSS.
I have 3 Fixed Effects: Group, Session, and TrialType.
When I ran a mixed model in SPSS, I got the interaction Group*Session p=.08 OR p=.02, depending on which covariance structure I used. This is partly the reason I wanted to use R, because I didn't have enough information to help me decide which structure to use.
Here are my models in R. I'm using Log Likelihood Test to get a p-value for this Group*Session interaction.
Mod2 = lmer(accuracy ~ group*session*trialtype + (trialtype|subject), REML=F, data=data,
            control = lmerControl(optimizer = "optimx", optCtrl=list(method='L-BFGS-B')))
Mod5 = lmer(accuracy ~ session + trialtype + group + session*trialtype + trialtype*group + (trialtype|subject),
data=data, REML=FALSE,
control = lmerControl(optimizer = "optimx", optCtrl=list(method='L-BFGS-B')))
anova(Mod2, Mod5)
Data: data
Models:
Mod5: accuracy ~ session + trialtype + group + session * trialtype +
Mod5: trialtype * group + (trialtype | subject)
Mod2: accuracy ~ group * session * trialtype + (trialtype | subject)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
Mod5 23 -961.32 -855.74 503.66 -1007.3
Mod2 27 -956.32 -832.38 505.16 -1010.3 2.9989 4 0.558
I'll also note that I added the lmerControl based on the 2 warning/error messages I was getting. When I added this, I got the singular boundary warning message.
Is it possible that R is not recognizing a grouping variable in my data? I'm not sure how to identify this or correct it.
Here is my syntax from SPSS:
MIXED Acc BY Test TrialType Group
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0,
ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=Test TrialType Group Test*TrialType Test*Group TrialType*Group Test*TrialType*Group |
SSTYPE(3)
/METHOD=ML
/PRINT=COVB DESCRIPTIVES G SOLUTION
/RANDOM=INTERCEPT TrialType | SUBJECT(Subject) COVTYPE(CS)
/REPEATED=Test | SUBJECT(Subject) COVTYPE(ID).
The first thing to do to figure this out is to make sure the log-likelihood values for the fitted models are the same: if the models aren't producing the same results, the test statistics wouldn't be expected to match. Even if the models are the same, in R you're using a chi-square statistic rather than the F statistic used in SPSS Statistics MIXED. The p values often would differ, though not usually by as much as from .02-.08 to .558, so I suspect you haven't actually got strictly comparable results here.
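A quick way to run that check in R (a sketch, assuming Mod2 is the fitted lmer model from the question): SPSS MIXED reports -2 log likelihood, while lme4 reports the log-likelihood, so rescale before comparing.

```r
# compare this to the "-2 Log Likelihood" line in the SPSS output;
# if the numbers differ, the two programs are not fitting the same model
-2 * as.numeric(logLik(Mod2))
```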
I'm trying to compare a difference I encountered in mixed model analyses using the package lme4. Maybe my statistical background is not sharp enough but I just can't figure what the "+0" in the code is and what the resulting difference (to a model without +0) implies.
Here my example with the +0:
lmer(Yield ~ Treatment + 0 + (1|Batch) + (1|Irrigation), data = D)
in contrast to:
lmer(Yield ~ Treatment + (1|Batch) + (1|Irrigation), data = D)
Does anyone have a smart explanation for what the +0 is and what it does to the results?
Models with + 0 usually mean "without an overall intercept" (in the fixed effects). By default, models include an intercept; you can make that explicit with + 1.
Most discussions of regression modelling will recommend including an intercept, unless there's good reason to believe the outcome will be 0 when the predictors are all zero (maybe true of some physical processes?).
Compare:
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + 0 + (Days | Subject), sleepstudy)
summary(fm1)
summary(fm2)
paying attention to the fixed effects.
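Concretely, the difference shows up in the fixed-effect coefficients (a sketch repeating the fits above on the built-in sleepstudy data):

```r
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + 0 + (Days | Subject), sleepstudy)
fixef(fm1)  # (Intercept) and Days
fixef(fm2)  # Days only: the + 0 removes the fixed-effect intercept
```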
In Stata, I know that if I use the following command, I can get the logits for each possible combination between my dependent variable (thkbins) and my two predictor variables (cc & tv):
melogit thkbins cc#tv || school:,
Is there a way to produce a similar output in R? I have been using the glmer command from the lme4 package, and while I can get the output with the interaction term, it isn't exactly what I can produce in Stata.
model1 <- glmer(thkbin ~ cc + tv + cc*tv + (1|school),
data=thkdata, family = binomial, nAGQ = 7)
summary(model1)
I would use clmm from the package ordinal (tutorial here):
model <- clmm(DepVar ~ IndVar + (1|WithinVar), data = df)
I hope this helps.