Linear mixed effects models in R - mixed advice on random effects factors with fewer than 5 levels

I ran an experiment where I moved each subject's arm about the elbow to a reference position, returned it to a home position, and then asked them to try to replicate the reference position, all without vision of their arm. I then measured the error in their position matching as an estimate of their upper limb position sense acuity. The experiment aimed to compare position sense acuity between a group of older adults and a group of young adults.
The experiment was designed such that subjects performed 4 repeats at each of 3 reference positions for both extension and flexion movements of the elbow. To make the best use of the data (and avoid the averaging and potential loss of data across repeated-measures levels that a mixed ANOVA would entail), I would like to analyse the effect of age group (2 levels) on matching error whilst controlling for reference position (3 levels) and movement direction (2 levels) in a linear mixed effects model, but I'm having some trouble working out how to model the random effects.
On the one hand, I have fairly consistently read that random effects factors typically need a minimum of 5-6 levels to achieve a robust estimate of variance (e.g. pg. 33, https://lme4.r-forge.r-project.org/book/Ch2.pdf and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5970551/), which makes me think that it would not be possible to model reference position (3 levels) and movement direction (2 levels) with random intercepts in this way:
Err_model_1 <- lmer(error ~ age_group + (1|subjects) + (1|move_direct) + (1|ref_pos),
                    data = dat)  # 'dat' is a placeholder for the data frame
In that case I was considering including them as fixed factors instead. However, I have also read that in designs using within-subjects measures as fixed factors, those measures should also be modelled with random slopes to give a maximal model that minimizes Type I error rates (Barr 2013; https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00328/full). This somewhat contradicts the minimum 5-6 levels rule, but I think it would look as follows:
Err_model_2 <- lmer(error ~ age_group * move_direct * ref_pos + (1 + move_direct + ref_pos | subjects),
                    data = dat)
Is there something specific to using the within-subjects measures as fixed factors in Err_model_2 that allows you to model them validly as random effects?
Is there another way I would be able to model movement direction and reference position as random effects in this kind of model?
Any other help or comments would be appreciated, thanks!

Related

Use of svyglm and svydesign with R for multistage stratified cluster design

I have a complicated data set which was produced by a multistage stratified cluster design. I had originally analysed this using glm, but now realise that I have to use svyglm. I'm not quite sure how best to model the data using svyglm. I was wondering if anyone could help shed some light.
I am attempting to see the effect that a variety of covariates taken at time 1 have on a binary outcome taken at time 2.
The sampling strategy was as follows: state -> urban/rural -> district -> subdistrict -> village. Within each village, individuals were randomly selected, with each of these having an id (uniqid).
I have a variable in the df for each of these stages of the sampling strategy. I also have the following variables: outcome, age, sex, income, marital_status, urban_or_rural_area, uniqid, weights. The formula that I want for my regression equation is outcome ~ age + sex + income + marital_status + urban_or_rural_area. Weights are coded by the weights variable. I have set the family to binomial(link = logit).
If anyone has any idea how such an approach could be coded in R with svyglm I would be most appreciative. I'm quite confused as to what should be passed as id, fpc and nest. Do I have to specify all levels of the stratified design or just some?
Any direction, or resources which explain this well would be massively appreciated.
You don't really give enough information about the design: which of the geographical units are strata and which are clusters. For example, my guess is that you sample both urban and rural in all states, and that you don't sample all villages, but I don't know whether you sample all districts or subdistricts. I also don't know whether your overall sampling fraction is large or small (i.e., whether the with-replacement approximation is OK).
Let's pretend you sample just some districts, so districts are your Primary Sampling Units, and that the overall sampling fraction of people is small. The design command is
your_design <- svydesign(id = ~district, weights = ~weights,
                         strata = ~interaction(state, urban_rural, drop = TRUE),
                         data = your_data_frame)
That is, the strata are combinations of state and urban/rural and any combinations that aren't in your data set don't exist in the population (maybe some states are all-rural or all-urban). Within each stratum you have districts, and only some of these appear in the sample. In your geographical hierarchy, districts are then the first level that is sampled rather than exhaustively enumerated.
You don't need fpc unless you want to specify the full multistage design without replacement.
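If you did want the full multistage without-replacement specification, a sketch would look like the following; the population-count columns (n_districts, n_subdistricts, n_villages) are hypothetical names for counts you would need to have in your data:

your_design_full <- svydesign(
  id = ~district + subdistrict + village,                 # one term per sampling stage
  strata = ~interaction(state, urban_rural, drop = TRUE),
  fpc = ~n_districts + n_subdistricts + n_villages,       # hypothetical population counts per stage
  weights = ~weights,
  data = your_data_frame
)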
The nest option is not about how the survey was done but is about how variables are coded. The US National Center for Health Statistics (bless their hearts) set up a lot of designs that have many strata and two primary sampling units per stratum. They call these primary sampling units 1 and 2; that is, they reuse the names 1 and 2 in every stratum. The svydesign function is set up to expect different sampling unit names in different strata, and to verify that each sampling unit name appears in just one stratum, as a check against data errors. This check has to be disabled for NCHS surveys and perhaps some others that also reuse sampling unit names. You can always leave out the nest option at first; svydesign will tell you if it might be needed.
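For example, an NCHS-style design with recycled PSU labels might be declared like this (a sketch; the psu and stratum variable names are illustrative):

# psu and stratum are illustrative names; PSU labels 1 and 2 recur in every stratum
nchs_design <- svydesign(id = ~psu, strata = ~stratum, weights = ~weights,
                         nest = TRUE, data = your_data_frame)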
Finally, the models:
svyglm(outcome ~ age + sex + income + marital_status + urban_or_rural_area,
       design = your_design, family = quasibinomial)
Using binomial or quasibinomial will give identical answers, but binomial will produce a harmless warning about non-integer weights; quasibinomial suppresses that warning.

Split plot design with nested random effects and interactions between random effects

I am trying to analyse a split plot design for a plant growth experiment with these variables:
Biomass (dependent variable)
Transect (sub plot factor with three levels)
Treatment (main plot factor with two levels)
Block (2 blocks in total, serving as replicates of the treatment)
Location (multiple locations within each transect point)
I know what the random effects structure should look like; however, I can't work out how to write it in R syntax. Could someone please help me? It's probably very easy, but I have been looking for hours and hours and can't find it.
Random effects should be:
Block
Interaction Block and Treatment
Location nested within Transect
(Location nested within Transect), interaction with Treatment
So perhaps something like:
(1|block) + (1|block:treatment) + (1|location:transect) +
  (1|location:transect:treatment)
OK, I'll take a shot at this.
First: in 'modern' mixed-model approaches it is not practical to treat a two-level categorical variable as random. In 'classical' method-of-moments/sums-of-squares-ratio approaches it works, although the power is terrible; in modern methods you will end up with 'singular models' (do a web search for "GLMM FAQ", or search here and on CrossValidated, for more info). (The exception to this statement is if you go fully Bayesian and put regularizing priors on the random-effects parameters ...) Therefore, I'm going to take block as a fixed effect.
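(As a quick illustration, with made-up names, you can check for this directly with lme4::isSingular():)

library(lme4)
# Sketch with a hypothetical data frame 'dat': 'block' has only two levels,
# so its variance estimate will usually collapse to the boundary
fit <- lmer(biomass ~ treatment + (1 | block), data = dat)
isSingular(fit)  # TRUE signals a singular fit (e.g. an estimated variance of 0)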
This would be (I think) your maximal model:
~ treatment*block + (treatment|transect/location)
treatment*block (expands to 1 + block + treatment + block:treatment): the baseline biomass (intercept) could differ between blocks, the treatments could differ, and the treatment effect could differ between blocks.
(treatment|transect/location) (expands to (1+treatment|transect) + (1+treatment|transect:location)): the intercept and treatment effect vary among transects and among locations within transects. (This assumes that transects are uniquely coded between blocks, i.e. you don't have a transect 001 in both blocks; rather, they are labeled something like A001 and B001. If not, you need something like (1+treatment|block:(transect/location)).)
This also assumes you have multiple observations per transect/location/treatment combination. If not (if each treatment is observed only once per location), then the full interaction will be confounded with the residual variation and you instead need something like (1+treatment|transect) + (1|transect:location).
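Putting the pieces together as a complete call (a sketch; dat stands in for your data frame):

library(lme4)
# Maximal model from above; use the reduced random-effects structure from
# the previous paragraph if each treatment occurs only once per location
fit <- lmer(biomass ~ treatment * block + (treatment | transect/location),
            data = dat)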

Fitting random factors for a linear model using lme4

I have 4 random factors and I want to fit a linear mixed model for them using lme4, but I have struggled to fit the model.
Assume A is nested within B (2 levels), which is in turn nested within each of xx preceptors (P). All responded to xx Ms (M).
I want to fit my model to get variances for each factor and their interactions.
I have used the following code to fit the model, but I was unsuccessful.
lme4::lmer(value ~ A +
             (1 + A | B) +
             (1 + P | A) +
             (1 + P | M),
           data = myData, na.action = na.exclude)
I have also read some interesting materials here, but I still struggle to fit the model. Any help?
At a guess: if the nesting structure is (P (teachers) / B (occasions) / A (participants)), meaning that the occasions for one teacher are assumed to be completely independent of the occasions for any other teacher, that participants in turn are never shared across occasions or teachers, and that questions (M) are shared across all teachers, occasions, and participants, then:
value ~ 1 + (1 | P/B/A) + (1 | M)
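As a complete call, reusing the data arguments from the question:

library(lme4)
# occasions (B) nested in teachers (P), participants (A) nested in occasions;
# questions (M) crossed with all of them
fit <- lmer(value ~ 1 + (1 | P/B/A) + (1 | M),
            data = myData, na.action = na.exclude)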
Some potential issues:
as you hint in the comments, it may not be practical to fit random effects for factors with small numbers of levels (say, < 5); this is likely to lead to the dreaded "singular model" message (see the GLMM FAQ for more detail).
if all of the questions (M) are answered by every participant, then in principle it's possible to fit a model that takes account of the among-question correlation within participants: the maximal model would be ~ 1 + (M | P / B / A) (which would look for among-question correlations at the level of teacher, occasion within teacher, and participant within occasion within teacher). However, this is very unlikely to work in practice (especially if each participant answers each question only once, in which case the teacher:occasion:participant:question variance will be confounded with the residual variance in a linear model). In this case, you will get an error about "probably unidentifiable": see e.g. this question for more explanation/detail.

Incorporating time series into a mixed effects model in R (using lme4)

I've had a search for similar questions and come up short so apologies if there are related questions that I've missed.
I'm looking at the amount of time spent on feeders (dependent variable) across various conditions with each subject visiting feeders 30 times.
Subjects are exposed to feeders of one type which will have a different combination of being scented/unscented, having visual patterns/being blank, and having these visual or scented patterns presented in one of two spatial arrangements.
So far my model is:
mod <- lmer(timeonfeeder ~ scent_yes_no + visual_yes_no +
              pattern_one_or_two + (1 | subject), data = data)
How can I incorporate the visit numbers into the model to see if these factors have an effect on the time spent on the feeders over time?
You have a variety of choices (this question might be marginally better for CrossValidated).
as @Dominix suggests, you can allow for a linear increase or decrease in time on feeder over time. It probably makes sense to allow this change to vary across birds:
timeonfeeder ~ time + ... + (time|subject)
you could allow for an arbitrary pattern of change over time (i.e. not just linear):
timeonfeeder ~ factor(time) + ... + (1|subject)
this probably doesn't make sense in your case, because you have a large number of observations, so it would require many parameters (it would be more sensible if you had, say, 3 time points per individual)
you could allow for a more complex pattern of change over time via an additive model, i.e. modeling change over time with a cubic spline. For example:
library(mgcv)
# gamm() takes its random effect in lme's named-list form
gamm(timeonfeeder ~ s(time) + ... ,
     random = list(subject = ~1))
(1) This assumes the temporal pattern is the same across subjects; (2) because gamm() uses lme rather than lmer under the hood, you have to specify the random effect as a separate argument. (You could also use the gamm4 package, which uses lmer under the hood; see the sketch below.)
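A minimal gamm4 sketch, filling in the fixed effects from your original model (note the lmer-style random-effects formula):

library(gamm4)
# gamm4 takes an lme4-style random-effects formula rather than lme's list form
fit <- gamm4(timeonfeeder ~ s(time) + scent_yes_no + visual_yes_no +
               pattern_one_or_two,
             random = ~ (1 | subject), data = data)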
You might want to allow for temporal autocorrelation. For example,
library(nlme)
lme(timeonfeeder ~ time + ... ,
    random = ~ time | subject,
    correlation = corAR1(form = ~ time | subject), ...)

Standardisation in MuMIn package in R

I am using the 'MuMIn' package in R to select models and calculate effect sizes of the input variables (rain, brk, onset, wid). To make my effect sizes comparable between variables, I standardised them using the standardize function in the arm package. Here is the code that I am following:
For reference, please refer to the appendix of this paper: http://onlinelibrary.wiley.com/doi/10.1111/j.1420-9101.2010.02210.x/full
Grueber et al. 2011: Multimodel inference in ecology and evolution: challenges and solutions
data1 <- read.csv("data.csv", header = TRUE)  # reads the data
global.model <- lmer(yld.res ~ rain + brk + onset + wid + (1|state),
                     data = data1, REML = FALSE)  # prepares a global model
stdz.model <- standardize(global.model, standardize.y = FALSE)  # standardises the input variables
model.set <- dredge(stdz.model)  # generates the full submodel set
top.models <- get.models(model.set, subset = delta < 2)  # selects models with delta AIC < 2
model.avg(top.models)  # calculates the average effect size of input variables
Here is the result of model.avg(top.models) which gives the average effect size of each input variable
Coefficients:
        (Intercept)       brk     rain      wid    onset
subset -4.281975e-14 -106.0919 51.54688 39.82837 35.68766
I have read about how the standardize function works: it subtracts the mean and divides by 2 SD.
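In other words, for a numeric predictor x it does something like:

# Gelman-style rescaling as used by arm::standardize: center, then divide by 2 SD
x_std <- (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))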
My question is this: since I have standardised the input variables, shouldn't the effect sizes be between -1 and 1? Or are the effect sizes shown in the output correct?
Please advise.
Thanks a lot.
This is more of a statistical question than a programming question, but: you've only standardized the predictor variables, not the response variable (you specified standardize.y=FALSE); therefore, each of your coefficients represents the expected change of the response (in the response's units!) per 2 SD change in the predictor. If the range of the response is large (as it must be in your example), then there could be a very large change. For example, if I were analyzing the change in elephant weight measured in milligrams, I could expect very large changes in the response for reasonably small changes in the predictors (e.g. sex, age, food availability). You should probably use standardize.y=TRUE if you want truly nondimensional/unitless effect sizes. Even nondimensional effects aren't necessarily constrained to be between -1 and +1, but it would be surprising for them to be so large.
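Concretely, re-running your own pipeline with the response rescaled as well would look like:

stdz.model2 <- standardize(global.model, standardize.y = TRUE)  # rescale predictors and response
model.set2 <- dredge(stdz.model2)
top.models2 <- get.models(model.set2, subset = delta < 2)
model.avg(top.models2)  # averaged coefficients are now on a nondimensional scale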
By the way, I think your standardize function comes from the arm package, not from MuMIn (library("sos"); findFn("standardize", sortby = "Function")).
