Boundary (singular) fit in lmer - r

I know this warning has already been asked about on Stack Overflow, but the solutions to the other questions don't seem to apply to my problem.
I have a very simple model that predicts energy expenditure based on the number of days.
a<-lmer(energy ~ days + (1|PCBType), data = stp_summary_v1 )
and the model gives the warning:
boundary (singular) fit: see ?isSingular
I cannot share the data, but here is the distribution:
What I've already tried without success:
a<-lmer(log(energy) ~ days + (1|PCBType), data = stp_summary_v1)
a<-lmer(scale(energy) ~ days + (1|PCBType), data = stp_summary_v1)
a<-lmer(log(energy) ~ log(days) + (1|PCBType), data = stp_summary_v1)
add more independent variables
change glmer() family
change the independent variable
Any idea why I keep getting this warning?

With only two levels of PCBType, this variable should be a fixed effect.
By specifying it as random you are asking the software to estimate a variance for a normally distributed variable from only 2 observations, which of course does not make any sense and is almost certainly the cause of the singular fit.
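As a minimal sketch of the fixed-effect alternative, the snippet below fits an ordinary linear model with PCBType as a two-level factor. The data frame toy is hypothetical (simulated here because the original stp_summary_v1 was not shared), and the effect sizes are made up purely for illustration.

```r
# Hypothetical toy data standing in for stp_summary_v1 (the real data were not shared).
set.seed(1)
toy <- data.frame(
  days    = rep(1:20, times = 2),
  PCBType = factor(rep(c("A", "B"), each = 20))
)
toy$energy <- 5 + 0.3 * toy$days + ifelse(toy$PCBType == "B", 2, 0) + rnorm(40)

# PCBType enters as an ordinary fixed effect, so no group-level variance is estimated.
fit_fixed <- lm(energy ~ days + PCBType, data = toy)
coef(fit_fixed)
```

With no random effect left in the model, lm is sufficient and the singular-fit warning cannot arise.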

Related

Syntax error when fitting a Bayesian logistic regression

I am attempting to model binary species traits, where presence is represented by 1 and absence by 0, as a function of some sampling variables. To accomplish this, I have constructed a brms model and added a phylogenetic structure to it. Here is the model I used:
model <- brms::brm(male_head | trials(1 + 0) ~
                     PC1 + PC2 + PC3 +
                     (1|gr(phylo, cov = covariance_matrix)),
                   data = data,
                   family = binomial(),
                   prior = prior,
                   data2 = list(covariance_matrix = covariance_matrix))
Each line of my df represents one observation with a binary outcome.
Initially, I was unsure about which arguments to use in the trials() function. Since my species are non-repeated and some have the traits I'm modeling while others do not, I thought that trials(1 + 0) might be appropriate. I recall seeing a vignette that suggested this, but I can't find it now. Is this syntax correct?
Furthermore, for some reason I'm unaware of, the model is producing one estimate for each line of my predictors. As my df has 362 lines, the model summary displays a lengthy list of 362 estimates. I would prefer to have one estimate for each sampling variable instead. Although I have managed to achieve this by making the treatment effect a random effect (i.e., (1|PC1) + (1|PC2) + (1|PC3)), I don't think this is the appropriate approach. I also tried bernoulli(), but with no success either. Do you have any suggestions for how I can address this issue?
EDIT:
For some reason the values of my sampling variables/principal components were being read as factors. The second part of this question was solved.

With a lmer model, can I extract the fitted values of 'y' for the whole model

I am a novice with R and this is a very basic question. I am using lmer to fit a mixed model to a data frame, as follows:
model1=lmer(Mass~Season + Area + Month + (1|Season:Month), data=Transdata)
and then using ggplot2 to plot the fitted data and various diagnostics. For example, for fitted values:
ggplot(model1, aes(x = Season, y = Mass)) + geom_point()
gives me a plot of Mass per Season, shown separately for each of the 3 different areas and 4 different months. Is there a way in which I can get a single estimate of the mean Mass per Season integrated across the different Areas and Months (i.e. from the fixed effects), and e.g. the SEs for each estimate?
Probably the easiest way to do this is
library(emmeans)
emmeans(model1, ~ Season)
I think you could reparameterize/modify the formula so that the parameters corresponded to means per season (rather than the default, which is to fit an intercept corresponding to the first season and then parameterize the model in terms of differences between seasons), but using emmeans is easier.
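As a rough illustration of the reparameterization mentioned above, the sketch below uses a hypothetical single-factor data set (toy, with made-up Season and Mass values) and plain lm rather than lmer, so that it runs in base R. Dropping the intercept with 0 + Season makes each coefficient a per-season mean instead of a difference from the first season.

```r
# Hypothetical toy data: two seasons, ten observations each.
set.seed(42)
toy <- data.frame(Season = factor(rep(c("dry", "wet"), each = 10)))
toy$Mass <- ifelse(toy$Season == "wet", 12, 10) + rnorm(20)

# Default parameterization: intercept (first season) plus a difference.
fit_default <- lm(Mass ~ Season, data = toy)

# Reparameterized: one coefficient per season, no intercept.
fit_means <- lm(Mass ~ 0 + Season, data = toy)

# In this one-factor case the coefficients are simply the observed season means.
season_means <- tapply(toy$Mass, toy$Season, mean)
coef(fit_means)
```

Once other factors such as Area and Month are in the model, coefficients obtained this way refer to the reference levels of those factors rather than averaging over them, which is part of why emmeans is the easier route.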
Is there a way in which I can get a single estimate of the mean Mass per Season integrated across the different Areas and Months (i.e. from the fixed effects), and e.g. the SEs for each estimate?
If I have understood the question properly, surely just the output from summary(model1) will provide this. It will give a separate estimate for each level of Season, apart from the reference level, and each estimate is then the expected difference in Mass for each Season relative to the reference level, keeping the other fixed effects constant, which would seem to answer your question.
Edit: After re-reading the question, the title seems to ask a different question to the body. As for the title:
With a lmer model, can I extract the fitted values of 'y' for the whole model
Yes, you can simply run
fitted(model1)

What will be best "formula" for this mixed effects model

I have the following study, which I want to analyze with a mixed effects model:
Subjects are divided into two groups ("Group": Treatment A and B).
"Weight" is recorded before and 3 months after treatment ("Time"; repeated measures).
I also need to correct for the subjects' "age" and "gender".
The main question is whether the two groups differ in their effect on weight.
For the mixed effects model, I was considering the following syntax with the lmer function of the lme4 package:
lmer(weight ~ Group*Time + age, (1|subject) + (1|gender), data=mydata)
Is this syntax correct, or do I need to use more complex terms such as the ones given below:
(time|subject)
(time + 1|subject)
(1|subject) + (1|Group:subject) + (1|Time:subject)
I have tried to consult different sources on the internet, but the literature seems to be very confusing.
gender should not be a random effect (intercept). It doesn't meet any of the usual requirements for it to be treated as random.
(time|subject)
and
(time + 1|subject)
are the same. Both allow the effect of time to vary across the levels of subject (a random slope for time, in addition to a random intercept for subject).
(1|subject) + (1|Group:subject) + (1|Time:subject)
makes very little sense. This says that Time is nested in subject because (1|Time:subject) is the same as (1|subject:Time) and (1|subject) + (1|subject:Time) is the definition of how to specify nested random effects. The addition of (1|Group:subject) seems bizarre and I would be surprised if such a model is identified. Your research question is "Whether two groups differ" so this means you want to know the fixed effect of Group, so (1|Group:subject) does not make sense.
The model:
lmer(weight ~ Group*Time + age + gender + (1|subject), data=mydata)
makes sense.
Finally, this question should be on Cross Validated.

How to convert Afex or car ANOVA models to lmer? Observed variables

In the afex package we can find this example of ANOVA analysis:
data(obk.long, package = "afex")
# estimate mixed ANOVA on the full design:
# can be written in any of these ways:
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), data = obk.long,
        observed = "gender")
aov_4(value ~ treatment * gender + (phase*hour|id), data = obk.long,
      observed = "gender")
aov_ez("id", "value", obk.long, between = c("treatment", "gender"),
       within = c("phase", "hour"), observed = "gender")
My question is: how can I write the same model in lme4?
In particular, I don't know how to include the "observed" term.
If I just write
lmer(value ~ treatment * gender + (phase*hour|id), data = obk.long,
observed = "gender")
I get an error telling me that observed is not a valid option.
Furthermore, if I just remove the observed option lmer produces the error:
Error: number of observations (=240) <= number of random effects (=240) for term (phase * hour | id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable.
Where in the lmer syntax do I specify the "between" or "within" variables? As far as I know, you just write the dependent variable on the left side and all other variables on the right side, and the error term as (1|id).
The package "car" uses the idata for the intra-subject variable.
I might not know enough about classical ANOVA theory to answer this question completely, but I'll take a crack. First, a couple of points:
the observed argument appears only to be relevant for the computation of effect size.
observed: ‘character’ vector indicating which of the variables are
observed (i.e, measured) as compared to experimentally
manipulated. The default effect size reported (generalized
eta-squared) requires correct specification of the obsered [sic]
(in contrast to manipulated) variables.
... so I think you'd be safe leaving it out.
if you want to override the error you can use
control=lmerControl(check.nobs.vs.nRE="ignore")
... but this probably isn't the right way forward.
I think but am not sure that this is the right way:
m1 <- lmer(value ~ treatment * gender + (1|id/phase:hour), data = obk.long,
           control=lmerControl(check.nobs.vs.nRE="ignore",
                               check.nobs.vs.nlev="ignore"),
           contrasts=list(treatment=contr.sum,gender=contr.sum))
This specifies that the interaction of phase and hour varies within id. The residual variance and (phase by hour within id) variance are confounded (which is why we need the overriding lmerControl() specification), so don't trust those particular variance estimates. However, the main effects of treatment and gender should be handled just the same. If you load lmerTest instead of lme4 and run summary(m1) or anova(m1), it gives you the same degrees of freedom (10) for the fixed (gender and treatment) effects that are computed by afex.
lme gives comparable answers, but needs to have the phase-by-hour interaction constructed beforehand:
library(nlme)
obk.long$ph <- with(obk.long,interaction(phase,hour))
m2 <- lme(value ~ treatment * gender,
          random=~1|id/ph, data = obk.long,
          contrasts=list(treatment=contr.sum,gender=contr.sum))
anova(m2,type="marginal")
I don't know how to reconstruct afex's tests of the random effects.
As Ben Bolker correctly says, simply leave observed out.
Furthermore, I would not recommend doing what you want to do. Using a mixed model for a data set without replications within each cell of the design per participant is somewhat questionable, as it is not really clear how to specify the random effects structure. Importantly, the Barr et al. maxim of "keep it maximal" does not work here, as you realized. The problem is that the model is overparametrized (hence the error from lmer).
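To make the overparametrization concrete, here is a small counting sketch in base R. The design in toy is hypothetical: a balanced layout with 16 ids, 3 phases and 5 hours, chosen only so that the totals match the 240 reported in the error message, with exactly one observation per id-by-phase-by-hour cell.

```r
# Hypothetical balanced design with one observation per id x phase x hour cell.
toy <- expand.grid(
  id    = factor(1:16),
  phase = factor(c("pre", "post", "fup")),
  hour  = factor(1:5)
)

# Number of observations.
n_obs <- nrow(toy)

# A term like (phase*hour | id) needs one random effect per column of the
# phase*hour design matrix, for every level of id.
n_re_terms       <- ncol(model.matrix(~ phase * hour, data = toy))
n_random_effects <- n_re_terms * nlevels(toy$id)

c(observations = n_obs, random_effects = n_random_effects)
```

Under these assumptions there are as many random effects as observations, so nothing is left over to separate them from the residual variance.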
I recommend using the ANOVA. More discussion on exactly this question can be found in a Cross Validated thread where Ben and I discussed this more thoroughly.

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
                        gender + looks:gender + personality:gender +
                        looks:personality,
                      random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
                           looks:gender + personality:gender +
                           (1|participant) + (1|looks) + (1|personality),
                         data = speedData, REML = FALSE)
Also, when I run the coefficients of these models I notice that it only produces random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes. I can't seem to get the syntax correct for either function to do this. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
  gender + looks:gender + personality:gender +
  looks:personality + (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
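The difference between the two specifications can be sketched by counting grouping levels in base R. The data frame toy below is hypothetical (4 participants, each seeing the same 3 looks categories, with made-up labels); it is not the speed dating data.

```r
# Hypothetical design: 4 participants, each rating the same 3 "looks" categories.
toy <- expand.grid(
  participant = factor(1:4),
  looks       = factor(c("attractive", "average", "ugly"))
)

# Grouping by looks alone, as in (1|looks): labels are shared across participants.
n_crossed <- nlevels(toy$looks)

# Grouping by looks within participant, as in (1|participant:looks),
# which is one of the terms that (1|participant/looks) expands to.
n_nested <- nlevels(interaction(toy$participant, toy$looks, drop = TRUE))

c(crossed = n_crossed, nested = n_nested)
```

Because the looks labels repeat across participants, the crossed term has far fewer groups than the nested one, so the two formulas describe different models unless the labels are recoded to be unique within each participant.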
It's not clear what continuous variable you want to define your slopes: if you have a continuous variable x and groups g, then (x|g) or equivalently (1+x|g) will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
              gender + looks:gender + personality:gender +
              looks:personality +
              (1|participant/looks),
            data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.
