What would be the best "formula" for this mixed-effects model in R?

I have the following study, which I want to analyze with a mixed-effects model:
"Subjects" are divided into two groups ("Group": treatment A or B).
"Weight" is recorded before and 3 months after treatment ("Time"; repeated measures).
I also need to adjust for the subjects' "age" and "gender".
The main question is whether the two groups differ in their effect on weight.
For the mixed-effects model, I was considering the following syntax with the lmer function from the lme4 package:
lmer(weight ~ Group*Time + age + (1|subject) + (1|gender), data=mydata)
Is this syntax correct, or do I need more complex terms such as the ones below?
(time|subject)
(time + 1|subject)
(1|subject) + (1|Group:subject) + (1|Time:subject)
I have looked at various sources on the internet, but the literature seems very confusing.

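gender should not be a random effect (intercept). It doesn't meet any of the usual requirements for being treated as random: with only two levels there is no meaningful distribution of levels from which to estimate a variance, so it belongs in the fixed part of the model.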
(time|subject)
and
(time + 1|subject)
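are the same, because the intercept is included by default in both. Either form means you are allowing the intercept and the fixed effect of time to vary across subjects (a random intercept and a random slope for time).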
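One way to confirm that the two spellings are the same model is to let lme4 build both and compare the random-effect columns it creates per subject. A sketch, assuming lme4 is installed; the data are invented and deliberately have three time points per subject, because with only a before and an after measurement each subject would contribute as many random effects (intercept and slope) as observations, which lmer rejects by default.

```r
library(lme4)

set.seed(3)
# Hypothetical long-format data: 30 subjects, weight recorded at time 0, 1 and 2
d <- data.frame(
  subject = factor(rep(1:30, each = 3)),
  time    = rep(0:2, times = 30)
)
d$weight <- rnorm(nrow(d), mean = 80, sd = 5)

# lFormula() parses the formula and builds the model matrices without fitting
f1 <- lFormula(weight ~ time + (time | subject), data = d)
f2 <- lFormula(weight ~ time + (time + 1 | subject), data = d)

# Random-effect columns per subject: an intercept and a slope for time in both
f1$reTrms$cnms
identical(f1$reTrms$cnms, f2$reTrms$cnms)
```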
(1|subject) + (1|Group:subject) + (1|Time:subject)
makes very little sense. It says that Time is nested in subject, because (1|Time:subject) is the same as (1|subject:Time), and (1|subject) + (1|subject:Time) is how nested random effects are specified. The addition of (1|Group:subject) seems bizarre, and I would be surprised if such a model were identified. Your research question is "whether the two groups differ", which means you want to know the fixed effect of Group, so (1|Group:subject) does not make sense.
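Both equivalences (the a/b shorthand for a + a:b, and a:b being the same grouping as b:a) can be checked on lme4's built-in Pastes data, where cask is nested within batch. A sketch, assuming lme4 is installed; the three fits below should give the same log-likelihood.

```r
library(lme4)

# Pastes: 60 strength measurements, cask (a-c) nested within batch (A-J)
fm_slash <- lmer(strength ~ 1 + (1 | batch/cask), data = Pastes)
fm_expl  <- lmer(strength ~ 1 + (1 | batch) + (1 | batch:cask), data = Pastes)
fm_swap  <- lmer(strength ~ 1 + (1 | batch) + (1 | cask:batch), data = Pastes)

# The same model written three ways: the log-likelihoods agree
sapply(list(slash = fm_slash, explicit = fm_expl, swapped = fm_swap),
       function(m) as.numeric(logLik(m)))
```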
The model:
lmer(weight ~ Group*Time + age + gender + (1|subject), data=mydata)
makes sense.
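To see what this model reports, here is a sketch on simulated data, assuming lme4 is installed. The sample size, effect sizes and level names (A/B, pre/post, F/M) are invented for illustration. The GroupB:Timepost coefficient is the one that answers the research question: it is the difference between the groups in the pre-to-post change in weight.

```r
library(lme4)

set.seed(1)
n_subj <- 40
# One row per subject: hypothetical treatment group, age and gender
subj <- data.frame(
  subject = factor(seq_len(n_subj)),
  Group   = factor(rep(c("A", "B"), each = n_subj / 2)),
  age     = round(runif(n_subj, 30, 70)),
  gender  = factor(rep(c("F", "M"), length.out = n_subj))
)
# Two rows per subject: weight before ("pre") and 3 months after ("post")
mydata <- subj[rep(seq_len(n_subj), each = 2), ]
mydata$Time <- factor(rep(c("pre", "post"), times = n_subj),
                      levels = c("pre", "post"))

# Simulated weights: group B loses 3 kg more than group A (assumed effect)
subj_int <- rnorm(n_subj, sd = 5)
mydata$weight <- with(mydata,
  80 + 0.1 * age + 5 * (gender == "M") +
  (Time == "post") * ifelse(Group == "B", -4, -1) +
  subj_int[as.integer(subject)] + rnorm(nrow(mydata), sd = 2))

# gender is a fixed covariate; only subject gets a random intercept
fit <- lmer(weight ~ Group * Time + age + gender + (1 | subject), data = mydata)
fixef(fit)["GroupB:Timepost"]
```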
Finally, this question should be on Cross Validated.

Related

Boundary (singular) fit in lmer

I know this error has already been issued in stackoverflow, but the solution for the other questions doesn't seem to apply to my problem.
I have a very simple model that predicts energy expenditure based on the number of days.
a<-lmer(energy ~ days + (1|PCBType), data = stp_summary_v1 )
and the model gives the warning:
boundary (singular) fit: see ?isSingular
I cannot share the data, but here is the distribution: [distribution plot not reproduced]
What I've already tried without success:
a<-lmer(log(energy) ~ days + (1|PCBType), data = stp_summary_v1)
a<-lmer(scale(energy) ~ days + (1|PCBType), data = stp_summary_v1)
a<-lmer(log(energy) ~ log(days) + (1|PCBType), data = stp_summary_v1)
add more independent variables
change glmer() family
change the independent variable
Any idea why I keep getting this warning?
With only two levels of PCBType, this variable should be a fixed effect.
By specifying it as random you are asking the software to estimate a variance for a normally distributed variable from only 2 observations, which of course does not make any sense and is almost certainly the cause of the singular fit.
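A minimal sketch of the suggested fix, using a simulated stand-in for stp_summary_v1 (the real data aren't available, and the level names typeA/typeB are hypothetical). With PCBType as a fixed effect, the model estimates one coefficient for the difference between the two types instead of trying to estimate a variance from two levels.

```r
set.seed(42)
# Hypothetical stand-in for stp_summary_v1: two PCB types, 30 days each
stp <- data.frame(
  PCBType = factor(rep(c("typeA", "typeB"), each = 30)),
  days    = rep(1:30, times = 2)
)
stp$energy <- 100 + 2 * stp$days + ifelse(stp$PCBType == "typeB", 10, 0) +
  rnorm(nrow(stp), sd = 5)

# PCBType as a fixed effect: PCBTypetypeB is the typeB - typeA difference
a_fixed <- lm(energy ~ days + PCBType, data = stp)
coef(a_fixed)
# For the original lmer fit, lme4's isSingular(a) tests the same boundary
# condition that triggers the "boundary (singular) fit" message
```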

Linear regression with ordinal variable

I have five cognitive variables (memory, cognitive flexibility, critical thinking, verbal, and attention) and one ordinal variable (adversity scores from 1-10). I have cortical thickness as my outcome variable (or dependent variable).
How should I set up my regression?
I was thinking of doing this:
lm(cortical_thickness ~ memory + cognitive_flexibility + critical_thinking + verbal + attention + adversity_score)
or should I set it up like this instead:
lm(cortical_thickness ~ (memory + cognitive_flexibility + critical_thinking + verbal + attention)* adversity_score)
In my opinion, this is more of a statistical question, and it is not as trivial as it sounds: the question is how to deal with ordinal predictors in multiple linear regression.
The simple answer is to treat your 10-point ordinal predictor as a continuous variable, in which case I would use:
model1 <- lm(cortical_thickness ~ memory + cognitive_flexibility + critical_thinking + verbal + attention + adversity_score, data=yourdataset)
How best to model it depends strongly on your data, so I think you should ask this question, with your data, here: https://stats.stackexchange.com/
Although you can run lm, ordinal regression models do not satisfy the assumptions of linear regression so model comparisons and other tests won't be valid.
R does have ordinal regression functions which you may wish to try. Four such packages are listed here.
Regarding which model to use, run both models and compare them. If fm1 and fm2 are the two models then
anova(fm1, fm2)
will compare them; this works for at least clm and polr. See ?clm in the ordinal package or ?polr in the MASS package for examples, including the use of anova.
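The same run-both-and-compare idea can be applied to the predictor itself: treat adversity_score as continuous (as in model1 above) or as an ordered categorical predictor, and test one against the other with anova(). This sketch uses lm with an ordered factor rather than clm/polr, runs on simulated data, and leaves out the five cognitive scores for brevity (they would enter both formulas in the same way).

```r
set.seed(7)
n <- 200
# Hypothetical data: adversity score 1-10 and a continuous outcome
yourdataset <- data.frame(adversity_score = sample(1:10, n, replace = TRUE))
yourdataset$cortical_thickness <- 2.5 - 0.02 * yourdataset$adversity_score +
  rnorm(n, sd = 0.1)

# fm1: adversity as a numeric score (a single linear slope)
fm1 <- lm(cortical_thickness ~ adversity_score, data = yourdataset)
# fm2: adversity as an ordered factor (9 coefficients, polynomial contrasts)
fm2 <- lm(cortical_thickness ~ factor(adversity_score, ordered = TRUE),
          data = yourdataset)

# fm1 is nested in fm2, so the F test asks whether the extra 8 df improve the fit
anova(fm1, fm2)
```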

How to decide when and how to include covariates in a linear mixed-effects model in lme4

I am running a linear mixed-effects model in R, and I'm not sure how to include a covariate of no interest in the model, or even how to decide if I should do that.
I have two within-subject variables, let's call them A and B with two levels each, with lots of observations per participant. I'm interested in how their interaction changes across 4 groups. My outcome is reaction time. At the simplest level, I have this model:
RT ~ 1 + A*B*Groups + (1+A | SubjectID)
I would like to add Gender as a covariate of no interest. I have no theoretical reason to assume it affects anything, but it's really imbalanced across groups, so I'd like to include it. The first part of my question is: What is the best way to do this?
Is it this model:
RT ~ 1 + A*B*Groups + Gender + (1+A | SubjectID)
or this:
RT ~ 1 + A*B*Groups*Gender + (1+A | SubjectID)
? Or some other way? My worry about the second model is that it inflates the number of terms somewhat unreasonably, and I'm also worried about overfitting.
The second part of my question: When selecting the best model, when should I add the covariate to see if it makes any difference at all? Let me explain what I mean.
Let's say I start with the simplest model I mentioned above, but without the slope for A, so this:
RT ~ 1 + A*B*Groups + (1 | SubjectID)
Should I add the covariate first, either as a main effect ( + Gender) or as part of the interaction (*Gender), and then see if adding a slope for A makes a difference (by using the anova() function), or can I go ahead with adding the slope (which is theoretically more important) first, and then see if gender matters at all?
Following are some suggestions regarding your two questions.
I would recommend an iterative modelling strategy.
Start with
RT ~ 1 + A*B*Groups*Gender + (1+A | SubjectID)
and see if the problem is tractable. The above model includes both the additive (main) effects and all interaction terms between A, B, Groups and Gender.
If the problem is not tractable, discard the interaction terms between Gender and the other covariates, and model
RT ~ 1 + A*B*Groups + Gender + (1+A | SubjectID)
It's difficult to make a statement about potential overfitting without any details on the number of observations.
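The size of the fixed part can at least be counted in advance. A base-R sketch, assuming a balanced design with two levels each for A and B (as stated in the question), four groups and two genders; the level labels are hypothetical. It counts the fixed-effect columns each formula implies (the random part is not included).

```r
# Hypothetical balanced design: A and B with 2 levels, 4 groups, 2 genders
design <- expand.grid(
  A      = c("a1", "a2"),
  B      = c("b1", "b2"),
  Groups = c("g1", "g2", "g3", "g4"),
  Gender = c("F", "M")
)

# Number of fixed-effect coefficients (including the intercept) per formula
n_full     <- ncol(model.matrix(~ 1 + A * B * Groups * Gender, data = design))
n_additive <- ncol(model.matrix(~ 1 + A * B * Groups + Gender, data = design))
c(full = n_full, additive = n_additive)
```

With these level counts the full interaction needs 32 coefficients versus 17 for the additive Gender model, which is the inflation the question worries about.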
Concerning your second question: generally, I would recommend a Bayesian approach; take a look at the rstan-based brms R package, which lets you use the same lme4/glmm formula syntax, making it easy to translate models. Model comparison and predictive performance are very broad terms, and there are various ways to explore and compare the predictive performance of these types of nested/hierarchical Bayesian models. See, for example, the papers by Piironen and Vehtari and by Vehtari and Ojanen.

How to convert Afex or car ANOVA models to lmer? Observed variables

In the afex package we can find this example of ANOVA analysis:
data(obk.long, package = "afex")
# estimate mixed ANOVA on the full design:
# can be written in any of these ways:
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), data = obk.long,
observed = "gender")
aov_4(value ~ treatment * gender + (phase*hour|id), data = obk.long,
observed = "gender")
aov_ez("id", "value", obk.long, between = c("treatment", "gender"),
within = c("phase", "hour"), observed = "gender")
My question is: how can I write the same model in lme4?
In particular, I don't know how to include the "observed" term.
If I just write
lmer(value ~ treatment * gender + (phase*hour|id), data = obk.long,
observed = "gender")
I get an error saying that observed is not a valid option.
Furthermore, if I just remove the observed option lmer produces the error:
Error: number of observations (=240) <= number of random effects (=240) for term (phase * hour | id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable.
Where in the lmer syntax do I specify the "between" or "within" variables? As far as I know, you just write the dependent variable on the left side, all other variables on the right side, and the error term as (1|id).
The car package uses the idata argument for the within-subject (intra-subject) variables.
I might not know enough about classical ANOVA theory to answer this question completely, but I'll take a crack. First, a couple of points:
the observed argument appears only to be relevant for the computation of effect size.
observed: ‘character’ vector indicating which of the variables are
observed (i.e, measured) as compared to experimentally
manipulated. The default effect size reported (generalized
eta-squared) requires correct specification of the obsered [sic]
(in contrast to manipulated) variables.
... so I think you'd be safe leaving it out.
If you want to override the error, you can use
control=lmerControl(check.nobs.vs.nRE="ignore")
... but this probably isn't the right way forward.
I think, but am not sure, that this is the right way:
m1 <- lmer(value ~ treatment * gender + (1|id/phase:hour), data = obk.long,
control=lmerControl(check.nobs.vs.nRE="ignore",
check.nobs.vs.nlev="ignore"),
contrasts=list(treatment=contr.sum,gender=contr.sum))
This specifies that the interaction of phase and hour varies within id. The residual variance and the (phase-by-hour within id) variance are confounded (which is why we need the overriding lmerControl() specification), so don't trust those particular variance estimates. However, the main effects of treatment and gender should be handled just the same. If you load lmerTest instead of lme4 and run summary(m1) or anova(m1), it gives the same degrees of freedom (10) for the fixed (gender and treatment) effects as are computed by afex.
lme gives comparable answers, but needs to have the phase-by-hour interaction constructed beforehand:
library(nlme)
obk.long$ph <- with(obk.long,interaction(phase,hour))
m2 <- lme(value ~ treatment * gender,
random=~1|id/ph, data = obk.long,
contrasts=list(treatment=contr.sum,gender=contr.sum))
anova(m2,type="marginal")
I don't know how to reconstruct afex's tests of the random effects.
As Ben Bolker correctly says, simply leave observed out.
Furthermore, I would not recommend doing what you want to do. Using a mixed model for a data set without replications within each cell of the design per participant is somewhat questionable, as it is not really clear how to specify the random-effects structure. Importantly, the Barr et al. maxim "keep it maximal" does not work here, as you realized. The problem is that the model is overparametrized (hence the error from lmer).
I recommend using the ANOVA. More discussion on exactly this question can be found in a Cross Validated thread where Ben and I discussed it more thoroughly.
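The arithmetic behind the lmer error can be checked in base R without afex. A sketch, assuming obk.long has 16 participants, which is what 240 observations over 3 phases x 5 hours = 15 cells per participant implies: (phase*hour|id) asks for 15 random effects per participant, so there are exactly as many random effects as observations.

```r
# Within-subject cells of obk.long: phase (3 levels) x hour (5 levels)
within <- expand.grid(phase = c("pre", "post", "fup"), hour = factor(1:5))

# Random effects per participant implied by (phase*hour | id)
re_per_id <- ncol(model.matrix(~ phase * hour, data = within))

n_id  <- 16                      # assumed number of participants
n_obs <- n_id * nrow(within)     # one observation per cell and participant
c(random_effects = re_per_id * n_id, observations = n_obs)
```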

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality, data = speedData)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I look at the coefficients of these models, I notice that they only produce random intercepts for each participant. I then tried to create a model that produces both random intercepts and slopes, but I can't get the syntax right for either function. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is equivalent only if looks and personality have unique values at each nested level.
It's not clear what continuous variable you want to define your slopes: if you have a continuous variable x and groups g, then (x|g) or equivalently (1+x|g) will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
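As a concrete instance of y ~ x + (x|g), lme4's built-in sleepstudy data can stand in, with Reaction as y, Days as x and Subject as g. A sketch, assuming lme4 is installed:

```r
library(lme4)

# sleepstudy: reaction time (ms) on Days 0-9 for 18 subjects
fm_slopes <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Per-subject intercepts and Days slopes (fixed effect + subject's deviation)
head(coef(fm_slopes)$Subject)
# Standard deviations of intercepts and slopes, and their correlation
VarCorr(fm_slopes)
```

coef() combines the fixed effects with each subject's random deviations, so every subject gets its own intercept and slope; ranef() would return only the deviations.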
Update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting is confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
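The same problem can be reproduced on lme4's sleepstudy data, where each subject-by-day combination holds exactly one observation. A sketch, assuming lme4 is installed; DaysF is a hypothetical helper column created only for this example.

```r
library(lme4)

ss <- sleepstudy
ss$DaysF <- factor(ss$Days)  # hypothetical helper: day as a grouping factor

# Subject/DaysF expands to Subject + Subject:DaysF; the second term has
# 180 levels for 180 observations, so lmer stops with an error by default
res <- tryCatch(
  lmer(Reaction ~ Days + (1 | Subject/DaysF), data = ss),
  error = function(e) e
)
inherits(res, "error")
conditionMessage(res)

# Dropping the lowest nesting level leaves an estimable model
ok <- lmer(Reaction ~ Days + (1 | Subject), data = ss)
```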
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.
