Split plot design with nested random effects and interactions between random effects - r

I am trying to analyse a split plot design for a plant growth experiment with these variables:
Biomass (dependent variable)
Transect (sub plot factor with three levels)
Treatment (main plot factor with two levels)
Block (2 blocks in total, serving as replicates of the treatment)
Location (multiple locations within each transect point)
I know what the random effect structure should look like. However, I can’t work out how to write this as an R formula. Could someone please help me? It’s probably very easy, but I have been looking for hours and hours and can’t find it.
Random effects should be:
Block
Interaction Block and Treatment
Location nested within Transect
(Location nested within Transect), interaction with Treatment
So perhaps something like:
(1|block) + (1|block*treatment) + (1|location:transect) +
(1|(location:transect)*treatment)

OK, I'll take a shot at this.
First: in 'modern' mixed model approaches it is not practical to treat a two-level categorical variable as random. In 'classical' method-of-moment/SSQ ratio approaches it works, although the power is terrible; in modern methods you will end up with 'singular models' (do a web search for "GLMM FAQ" or search here and on CrossValidated for more info). (The exception to this statement is if you go full-Bayesian and put regularizing priors on the random-effects parameters ...) Therefore, I'm going to take block as a fixed effect.
This would be (I think) your maximal model:
~ treatment*block + (treatment|transect/location)
treatment*block (which expands to 1 + block + treatment + block:treatment): the baseline biomass (intercept) could differ between blocks, the treatments could differ, and the treatment effect could differ between blocks.
(treatment|transect/location) (which expands to (1+treatment|transect) + (1+treatment|transect:location)): the intercept and treatment effect vary among transects and among locations within transects. (This assumes that transects are uniquely coded between blocks, i.e. you don't have a transect 001 in both blocks; rather they are labeled something like A001 and B001. If not, you need something like (1+treatment|block:(transect/location)) ...)
This also assumes you have multiple observations per transect/location/treatment combination. If not (if each treatment is observed only once per location), then the full interaction will be confounded with the residual variation and you instead need something like (1+treatment|transect) + (1|transect:location).
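A minimal sketch of how this might be fitted with lme4, assuming the data sit in a data frame called dat (a placeholder name introduced here) with columns biomass, treatment, block, transect and location:
library(lme4)
# Maximal model described above; 'dat' is a placeholder for your data frame
m_max <- lmer(biomass ~ treatment * block + (treatment | transect/location), data = dat)
summary(m_max)
# If each treatment is observed only once per location, use the reduced structure instead:
m_red <- lmer(biomass ~ treatment * block + (treatment | transect) + (1 | transect:location),
              data = dat)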

Related

Linear mixed effects models in R - mixed advice on random effects factors with less than 5 levels

I ran an experiment where I moved subjects' arms about the elbow to a reference position, returned them to a home position, and then asked the subjects to try to replicate the position, all without vision of their arm. I then measured the error in their position matching as an estimate of their upper limb position sense acuity. This experiment aimed to compare position sense acuity between a group of older and a group of young adults.
The experiment was designed such that subjects performed 4 repeats at each of 3 reference positions for both extension and flexion movements of the elbow. To make best use of the data (avoid averaging and potential loss of data across repeated measure levels with a mixed ANOVA), I would like to analyse the effect of age group (2 levels) on matching error whilst controlling for reference position (3 levels) and movement direction (2 levels) in a linear mixed effects model, but I’m having some issues working out how to model the random effects.
On the one hand, I have fairly consistently read that random effects factors typically need a minimum of 5-6 levels to achieve a robust estimate of variance (e.g. pg. 33, https://lme4.r-forge.r-project.org/book/Ch2.pdf and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5970551/), which makes me think that it would not be possible to model reference position (3 levels) and movement direction (2 levels) with random intercepts in this way…
Err_model_1 <- lmer(error ~ age_group + (1|subjects) + (1|move_direct) + (1|ref_pos))
In which case I was considering including them as fixed factors instead, however, I have also read that for designs using within-subjects measures as fixed factors, they should also be modelled with a random slope for a maximal model that minimizes Type I error rates (Barr 2013; https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00328/full) which somewhat contradicts the minimum 5-6 number of levels rule, but I think would look as follows…
Err_model_2 <- lmer(error ~ age_group * move_direct * ref_pos + (1 + move_direct + ref_pos | subjects))
Is there something specific to using the within-subjects measures as fixed factors in Err_model_2 that allows you to model them as a random effect validly?
Is there another way I would be able to model movement direction and reference position as random effects in this kind of model?
Any other help or comments would be appreciated, thanks!

Lavaan - CFA - categorical variables - the last threshold is strange

I want to perform a multiple group CFA with lavaan in R.
I have several categorical variables and some variables contain 11 categories. So these variables will have 10 thresholds. In the results below you can see that the 10th threshold is smaller than the 9th, i.e., the thresholds are not in increasing order.
Several variables with 11 categories have the same problem.
Question:
Why are the thresholds distorted?
R-code:
model2<-'range = ~ NA*gvjbevn + gvhlthc + gvslvol + gvslvue + gvcldcr + gvpdlwk
goals = ~ NA*sbprvpv + sbeqsoc + sbcwkfm
range~~1*range
goals~~1*goals
gvhlthc ~~ gvslvol
gvcldcr ~~ gvpdlwk
'
cfa.model2<-cfa(model2, ordered=varcat, estimator="WLSMV",data=sub)
summary(cfa.model2,fit.measures=TRUE,standardized=TRUE, modindices=TRUE)
Label assignment of the thresholds is sorted alphabetically, i.e. c('t1','t10','t2','t3', ...), but summary() sorts it "properly".
You can try to add additional factors to check if your scale corresponds to:
c('t1','t10','t11','t12',...,'t2','t3'....)
There is not much you can do on your side, except keep track of which row corresponds to which threshold.
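A quick base-R illustration of the alphabetical ordering described above (nothing lavaan-specific):
# Alphabetical sorting places "t10" and "t11" immediately after "t1"
sort(paste0("t", 1:11))
#  [1] "t1"  "t10" "t11" "t2"  "t3"  "t4"  "t5"  "t6"  "t7"  "t8"  "t9"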
Well, it seems like I cannot add a comment due to not having enough reputation, so I can only reply with an answer, although this is not a proper answer (it will definitely not solve your issue, though I hope it points in the right direction).
For your example to be reproducible, you should provide the community with the data to fit the model.
On the other hand, I guess your problem must have to do with the nature of the category: it's possible that your 11th category does not mean "the highest level of agreement" with the item, or that the response categories are not ordered from 1 to 11, or something similar. Given that the rest of the thresholds seem to accurately represent a continuous, monotonically increasing scale, and that this same problem happens precisely in the same category in different variables (at least the two that you are showing), there must be something about the response scale in those items.
In summary, it seems to be more of a problem of interpretation of the parameters of the model rather than a statistical issue.
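If you want to inspect the response scale directly, here is a quick sketch reusing the data frame sub and one affected item (gvjbevn) from the question's code; adjust the names to your data:
# Frequencies per observed category, and a check that the categories really run 1 to 11
table(sub$gvjbevn, useNA = "ifany")
sort(unique(sub$gvjbevn))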

Incorporating time series into a mixed effects model in R (using lme4)

I've had a search for similar questions and come up short so apologies if there are related questions that I've missed.
I'm looking at the amount of time spent on feeders (dependent variable) across various conditions with each subject visiting feeders 30 times.
Subjects are exposed to feeders of one type which will have a different combination of being scented/unscented, having visual patterns/being blank, and having these visual or scented patterns presented in one of two spatial arrangements.
So far my model is:
mod<-lmer(timeonfeeder ~ scent_yes_no + visual_yes_no +
pattern_one_or_two + (1|subject), data=data)
How can I incorporate the visit numbers into the model to see if these factors have an effect on the time spent on the feeders over time?
You have a variety of choices (this question might be marginally better for CrossValidated).
As @Dominix suggests, you can allow for a linear increase or decrease in time on feeder over time. It probably makes sense to allow this change to vary across birds:
timeonfeeder ~ time + ... + (time|subject)
you could allow for an arbitrary pattern of change over time (i.e. not just linear):
timeonfeeder ~ factor(time) + ... + (1|subject)
this probably doesn't make sense in your case, because you have a large number of time points (30 visits per subject), so it would require many parameters (it would be more sensible if you had, say, 3 time points per individual)
you could allow for a more complex pattern of change over time via an additive model, i.e. modeling change over time with a cubic spline. For example:
library(mgcv)
gamm(timeonfeeder ~ s(time) + ... , random = list(subject = ~1))
(1) this assumes the temporal pattern is the same across subjects; (2) because gamm() uses lme rather than lmer under the hood you have to specify the random effect as a separate argument. (You could also use the gamm4 package, which uses lmer under the hood.)
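For example, a minimal gamm4 sketch, assuming the visit number is stored in a numeric column called time and keeping the fixed effects from the original model:
library(gamm4)
# Smooth trend over visits plus a random intercept per subject (lmer-style random term)
g1 <- gamm4(timeonfeeder ~ s(time) + scent_yes_no + visual_yes_no + pattern_one_or_two,
            random = ~ (1 | subject), data = data)
summary(g1$gam)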
You might want to allow for temporal autocorrelation. For example,
lme(timeonfeeder ~ time + ... ,
random = ~ time|subject,
correlation = corAR1(form= ~time|subject) , ...)
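Putting that last option together, a minimal sketch assuming the visit number (1-30) is stored in a numeric column called time:
library(nlme)
# Linear trend over visits that varies by subject, with AR(1) correlation within subject
mod_ar1 <- lme(timeonfeeder ~ time + scent_yes_no + visual_yes_no + pattern_one_or_two,
               random = ~ time | subject,
               correlation = corAR1(form = ~ time | subject),
               data = data)
summary(mod_ar1)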

Repeated-measures ANOVA in R: +Error(subject) or + Error(subject/ VI1 * VI2)?

I have spent a lot of time on multiple posts and tutorials, but I still do not understand which "rule" I have to apply to my current data, and why.
My experiment follows a within-subject design: every subject (n=17) performed a task in 2 conditions, across 5 blocks of trials. The dependent variable is the mean RT, the fixed effects are condition and block, and the random effect is subject.
I would like to analyse the interaction between condition and block.
Using aov
I first aggregated my data:
ag<-aggregate(RT~condition+block+subject,data=d, FUN=mean)
But then I don't know if I have to include my within-subject factors into my Error term:
(1)aov<-aov(RT~ condition * block + Error(subject/(condition * block)), data=ag)
OR
(2)aov<-aov(RT~ condition * block + Error(subject), data=ag)
I have seen on several posts that the within factors have to be included in the error term, as in (1), but I do not understand how the dfs are calculated.
Using lmer
Additionally, I would like to attempt using lmer instead of aov.
I suspect that the equivalent of (2) would be:
lmer(RT ~ 1 + (1|subject) + condition*block, data = ag)
But if (1) is the correct one, I cannot figure out how it would have to be specified using lmer.
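For reference, a commonly cited lmer analogue of error structure (1) on the aggregated data is sketched below (this mapping is an assumption, not part of the original question); the subject:condition:block term is omitted because, with one mean per cell, it is confounded with the residual:
library(lme4)
# Sketch of the lmer counterpart of aov(RT ~ condition*block + Error(subject/(condition*block)))
m1 <- lmer(RT ~ condition * block +
             (1 | subject) + (1 | subject:condition) + (1 | subject:block),
           data = ag)
summary(m1)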

Multiple comparisons using glht with repeated-measures ANOVA

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3 <- lme(Response ~ Pressure*Treatment*Gender*Group, data = mydata, random = ~1|Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3, linfct = mcp(???? = "Tukey")))
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks
I solved a similar problem by creating an interaction dummy variable with the interaction() function, which contains all combinations of the levels of your 4 variables.
I ran many tests; the estimates shown for the various levels of this variable give the joint effect of the active levels plus the interaction effect.
For example if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(I put the possible levels in parentheses for clarity.) The interaction variable will have a level like "infection.y:acetaminophen.y", which shows the effect on temperature of infection, acetaminophen, and the interaction of the two, in comparison with the intercept (where both variables are n).
Instead, if the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
to get the same coefficient for the case when both variables are y, you would have to add the two simple effects plus the interaction effect. The result is the same, but I prefer using interaction() since it is cleaner and more elegant.
Then in glht you use:
summary(glht(model, linfct = mcp(interaction_var = 'Tukey')))
to obtain your post-hoc comparisons, where interaction_var <- interaction(infection, acetaminophen).
Note: I never tested this methodology with nested or mixed models, so beware!
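An untested sketch (per the caveat above) of how the interaction() approach might be wired into the original design; cellmeans is a name introduced here for illustration:
library(nlme)
library(multcomp)
# Build a single factor holding every Pressure x Treatment x Gender x Group combination
mydata$cellmeans <- interaction(mydata$Pressure, mydata$Treatment,
                                mydata$Gender, mydata$Group)
result.cell <- lme(Response ~ cellmeans, data = mydata,
                   random = ~1|Subject/Pressure/Treatment)
# Tukey-style comparisons among all cell means
summary(glht(result.cell, linfct = mcp(cellmeans = "Tukey")))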
