Problem with heteroscedastic residuals using lme with varIdent - r

I'm having problems when I try to correct heteroscedasticity in mixed models fitted with lme.
My experimental design consists of 12 independent enclosures (Encl) holding populations of lizards (Subject_ID). We applied two crossed treatments (Lag: 3 levels; Var: 2 levels) and repeated the experiment over two years (Year), so individuals that survived the first year were measured again the next year. The response is snout-vent length (SVL) in mm, and Sex (males and females) is also included. This was my model:
ctrl <- lmeControl(maxIter = 200, msMaxIter = 200, msVerbose = TRUE, opt = "optim")
options(contrasts = c("contr.sum", "contr.poly"))
model.SVL <- lme(SVL~Lag*Var*Sex*Year, random=list(~1|Subject_ID, ~1|Encl), control=ctrl, data=data)
Bartlett tests showed heteroscedasticity across several of the three-way interactions, so I tried to correct it with varIdent. However, the heteroscedasticity was not fixed, and now the QQ plot indicates a leptokurtic distribution.
model.SVL2 <- lme(SVL~Lag*Var*Sex*Year, random=list(~1|Subject_ID, ~1|Encl), control=ctrl, weights=varIdent(form=~1|Lag*Var*Sex*Year), data=data)
What could be the problem?
I think the problem is using varIdent together with Subject_ID as a random factor; if I remove Subject_ID, this doesn't happen. Maybe it is because many individuals do not survive both years, so it is a random factor with many levels but few replicates per level.
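A rough sketch, not a definitive fix: it may help to check how many observations fall into each varIdent stratum, since a Lag*Var*Sex*Year stratification can leave many nearly empty cells whose variance parameters are hard to estimate, and then to try a coarser stratification. Variable names below simply reuse those from the model above; the Sex*Year grouping is an arbitrary illustration.
with(data, table(Lag, Var, Sex, Year))  # observations per candidate variance stratum
model.SVL3 <- lme(SVL ~ Lag*Var*Sex*Year,
                  random = list(~1 | Subject_ID, ~1 | Encl),
                  weights = varIdent(form = ~1 | Sex*Year),  # coarser, purely illustrative grouping
                  control = ctrl, data = data)
plot(model.SVL3, resid(., type = "pearson") ~ fitted(.) | Sex*Year)  # residuals by stratum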

Related

Crossed or nested random effects in lme

I have doubts about how to specify the random effects structure in my mixed models.
My experimental design consists of 12 independent enclosures (Encl) holding populations of lizards (Subject_ID). We applied two crossed treatments (Lag: 3 levels; Var: 2 levels) and repeated the experiment over two years (Year), so individuals that survived the first year were measured again the next year. The response is snout-vent length (SVL) in mm, and Sex (males and females) is also included. Individuals were redistributed to different enclosures and treatments in the second year, so I include the Encl:Year interaction as a new column (Encl_Year).
This was my model:
ctrl <- lmeControl(maxIter = 200, msMaxIter = 200, msVerbose = TRUE, opt = "optim")
options(contrasts = c("contr.sum", "contr.poly"))
model.SVL <- lme(SVL~Lag*Var*Sex*Year, random=list(~1|Subject_ID, ~1|Encl_Year), weights=varIdent(form=~1|Lag*Var*Sex), control=ctrl, data=data)
But I don't know how to define the random effects correctly. It is not a crossed random-effects model, because not all Subject_ID levels appear in every enclosure (Encl:Year), but it is not nested either, because the same individuals appear in different enclosures across years. What would be the most correct way to write the model?
Depending on the order:
random=list(~1|Subject_ID, ~1|Encl_Year)
or
random=list(~1|Encl_Year, ~1|Subject_ID)
the results change quite a lot. I also tried a crossed random-effects model:
data$Dummy <- factor(1)
data <- groupedData(SVL ~ 1 | Dummy, data)
model.SVL <- lme(SVL~Lag*Var*Sex*Year, random=pdBlocked(list(pdIdent(~ 0 + Subject_ID), pdIdent(~ 0 + Encl_Year))), control=ctrl, weights=varIdent(form=~1|Lag*Var*Sex), data=data)
I should add that I use the lme function because there is heteroscedasticity in the residuals, which I have corrected with the varIdent function.
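For what it is worth, lme interprets random = list(...) as nested grouping with the first element outermost, which is why the two orderings above fit different models. A minimal sketch, reusing the names above purely for illustration:
m1 <- lme(SVL ~ Lag*Var*Sex*Year,
          random = list(~1 | Subject_ID, ~1 | Encl_Year),   # Encl_Year nested within Subject_ID
          weights = varIdent(form = ~1 | Lag*Var*Sex),
          control = ctrl, data = data)
m2 <- lme(SVL ~ Lag*Var*Sex*Year,
          random = list(~1 | Encl_Year, ~1 | Subject_ID),   # Subject_ID nested within Encl_Year
          weights = varIdent(form = ~1 | Lag*Var*Sex),
          control = ctrl, data = data)
logLik(m1); logLik(m2)  # typically differ because the implied nesting differs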

Diagnostic plots fail with LMMs

I've been working on the following problem recently: We sent 18 people, 9 each, several times to two different clubs "N" and "O". These people arrived at the club either between 8 and 10 am (10) or between 10 and 12 pm (12). Each club consists of four sectors with ascending price classes. At the end of each test run, the subjects filled out a questionnaire reflecting a score for their satisfaction depending on the different parameters. The aim of the study is to find out how satisfaction can be modelled as a function of the club. You can download the data as csv for one week with this link (without spaces): https: // we.tl/t-I0UXKYclUk
After some trial and error, I fitted the following model using the lme4 package in R (the other candidate models were singular, had overly strong internal correlations, or had higher AIC/BIC):
mod <- lmer(Score ~ Club + (1|Sector:Subject) + (1|Subject), data = dl)
Now I wanted to create some diagnostic plots as indicated here.
plot(resid(mod), dl$Score)
plot(mod, col=dl$Club)
library(lattice)
qqmath(mod, id=0.05)
Unfortunately, it turns out that there are still patterns in the residuals that can be attributed to the club but are not captured by the model. I have already tried to incorporate the club into the random effects, but this leads to singularities. Does anyone have a suggestion on how I can deal with these patterns in the residuals? Thank you!
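One quick way to see the pattern directly is to plot the residuals against Club; a minimal sketch using the objects from the code above:
boxplot(resid(mod) ~ dl$Club, xlab = "Club", ylab = "Residuals")  # residual distribution per club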

How to add level-2 predictors in multilevel regression (package nlme)

I have a question concerning multi level regression models in R, specifically how to add predictors for my level 2 "measure".
Please consider the following example (this is not a real dataset, so the values might not make much sense in reality):
date        id  count  bmi   poll
2012-08-05   1      3  20.5  1500
2012-08-06   1      2  20.5  1400
2012-08-05   2      0  23    1500
2012-08-06   2      3  23    1400
The data contains
different persons ("id"...so it's two persons)
the body mass index of each person ("bmi", so it doesn't vary within an id)
the number of heart problems each person has on a specific day ("count"). So person 1 had three problems on August 5th, whereas person 2 had no problems on that day
the amount of pollutants (like ozone or sulfur dioxide) which was measured on that given day
My general research question is whether the amount of pollutants affects the number of heart problems in the population.
In a first step, this could be a simple linear regression:
lm(count ~ poll)
However, my data for each day is, so to speak, clustered within persons. I have two measures from person 1 and two measures from person 2.
So my basic idea was to set up a multilevel model with persons (id) as my level 2 variable.
I used the nlme package for this analysis:
lme(fixed=count ~ poll, random = ~poll|id, ...)
No problems so far.
However, the true influence on level 2 might not only come from the fact that I have different persons. Rather it would be much more likely that the effect WITHIN a person might come from his or her bmi (and many other person related variables, like age, amount of smoking and so on).
To make a long story short:
How can I specify such level two predictors in the lme function?
Or in other words: how can I set up a model where the relationship between heart problems and pollution is different/clustered/moderated by a person's body mass index (and, as I said, maybe additionally by this person's amount of smoking or age)?
Unfortunately, I don't have a clue how to tell R what I want. I know of other software (one of them called HLM) which is capable of doing what I want, but I'm quite sure that R can do this as well...
So, many thanks for any help!
deschen
Short answer: you do not have to, as long as you correctly specify random effects. The lme function automatically detects which variables are level 1 or 2. Consider this example using Oxboys where each subject was measured 9 times. For the time being, let me use lmer in the lme4 package.
library(nlme)
library(dplyr)
library(lme4)
library(lmerTest)
Oxboys %>%                                                        #1
  filter(as.numeric(Subject) < 25) %>%                            #2
  mutate(Group = rep(LETTERS[1:3], each = 72)) %>%                #3
  lmer(height ~ Occasion * Group + (1 | Subject), data = .) %>%   #4
  anova()                                                         #5
Here I am picking 24 subjects (#2) and arranging them into 3 groups (#3) to make this data balanced. Now the design of this study is a split-plot design with a repeated-measures factor (Occasion) with q=9 levels and a between-subject factor (Group) with p=3 levels. Each group has n=8 subjects. Occasion is a level-1 variable while Group is level 2.
In #4, I did not specify which variable is level 1 or 2, but lmer gives you the correct output. How do I know it is correct? Let us check the multilevel model's degrees of freedom for the fixed effects. If your data are balanced, the Kenward–Roger approximation (available through lmerTest) will give you exact dfs and F/t-ratios according to this article. That is, in this example the dfs for the tests of Group, Occasion, and their interaction should be p-1 = 2, q-1 = 8, and (p-1)(q-1) = 16, respectively. The df for the Subject error term is (n-1)p = 21 and the df for the Subject:Occasion error term is p(n-1)(q-1) = 168. In fact, these are the "exact" values we get from the anova output (#5).
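Note that lmerTest's anova() uses the Satterthwaite approximation by default; Kenward-Roger degrees of freedom can be requested explicitly (this needs the pbkrtest package). Assuming the fitted model from the pipe above has been saved to an object, say fit:
anova(fit, ddf = "Kenward-Roger")  # 'fit' is the lmer model from the pipe above, saved to an object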
I do not know what algorithm lme uses for approximating dfs, but lme does give you the same dfs. So I am assuming that it is accurate.
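Applied to the question's own data, the same principle just means putting bmi into the fixed part; a minimal sketch (the data frame name df is an assumption):
library(nlme)
# The level-2 predictor bmi goes straight into the fixed formula;
# the cross-level moderation is the poll:bmi interaction.
m <- lme(fixed = count ~ poll * bmi, random = ~ poll | id, data = df)
summary(m)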

Multiple comparisons using glht with repeated-measures ANOVA

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3 <- lme(Response ~ Pressure*Treatment*Gender*Group, mydata, ~1 | Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3, linfct=mcp(???? = "Tukey")))
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction dummy variable with the interaction() function, which contains all combinations of the levels of your 4 variables.
I ran many tests; the estimates shown for the various levels of this variable give the joint effect of the active levels plus the interaction effect.
For example if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in parentheses for clarity) the interaction variable will have a level like "infection.y:acetaminophen.y", which shows the effect on temperature of infection, acetaminophen, and their interaction, compared with the intercept (where both variables are n).
Instead if the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
to get the same coefficient for the case when both vars are y, you would have to add the two simple effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
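To make that last point concrete, here is a small sketch (the data frame d is hypothetical; coefficient names assume default treatment contrasts with reference level "n" for both factors):
fit <- lm(temperature ~ infection * acetaminophen, data = d)
b <- coef(fit)
# Effect of the "both y" cell relative to the n/n reference cell:
b["infectiony"] + b["acetaminopheny"] + b["infectiony:acetaminopheny"]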
Then in glht you use:
summary(glht(model, linfct = mcp(interaction_var = "Tukey")))
to achieve your post-hoc comparisons, where interaction_var <- interaction(infection, acetaminophen).
TO BE NOTED: I never tested this methodology with nested and mixed models, so beware!
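Applied to the model in the question, the approach would look roughly like this (the column name cellmeans is made up, and as noted above this is untested with nested random effects):
library(nlme)
library(multcomp)
# Collapse the four factors into one cell-means factor and refit the model on it.
mydata$cellmeans <- interaction(mydata$Pressure, mydata$Treatment,
                                mydata$Gender, mydata$Group, drop = TRUE)
result.cell <- lme(Response ~ cellmeans, data = mydata,
                   random = ~ 1 | Subject/Pressure/Treatment)
summary(glht(result.cell, linfct = mcp(cellmeans = "Tukey")))  # note: this yields very many pairwise comparisons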

3-way nested ANOVA in R

I'm trying to figure out the model for a fully factorial experiment.
I have the following factors
Treatment Day Hour Subject ResponseVariable
10 days of measurements, 4 different time points within each day, 2 different treatments, and 12 subjects (6 subjects in treatment 1 and 6 different subjects in treatment 2).
For each day I measured the 6 subjects in treatment 1 and the other 6 in treatment 2, at 4 different time points.
For Subjects, I have 12 different subjects, but Subjects 1-6 are in Treatment 1 and Subjects 7-12 are in Treatment 2. The subjects did not change treatments, so I measured the same set of subjects for each treatment on each of the 10 days.
So what's tripping me up is specifying the correct error term.
I thought I had the general model down but R is giving me "Error() model is singular"
aov(ResponseVariable ~ T + R + S + T:R + T:S + R:S + Error(T/S))
any thoughts would help?
I've gotten the same error, and I think my problem was missing observations. Are you missing any observations? I believe they're less of a problem for linear mixed-effects models, and I've read that some people use lme instead of repeated-measures ANOVA in those cases (see the sketch below).
Your error term can be interpreted as "the S effect within each T". It sounds from your description as though that's what you want, so I don't think that's what's causing your error message.
One note: I see you've got a variable named "T". Did R let you do that? T is normally shorthand for TRUE, so that might be part of your problem.
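Following up on the suggestion to use lme when observations are missing, a rough sketch of a mixed-model version (the data frame name dat is an assumption, and Day and Hour are assumed to be coded as factors; the factor names follow the question):
library(nlme)
m <- lme(ResponseVariable ~ Treatment * Day * Hour,
         random = ~ 1 | Subject,          # subjects are nested in Treatment, measured repeatedly
         data = dat, na.action = na.omit) # drops rows with missing observations
anova(m)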
