How to define random effects in the linear mixed effects model? - r

I read a paper which applied linear mixed-effects model for data analysis. I am confused about defining random effects in the equations.
First, how to define a combined random effect, such as 𝜀𝑓𝑖𝑒𝑙𝑑−𝑠𝑡𝑎𝑏𝑖𝑙𝑖𝑡𝑦 where 𝑓𝑖𝑒𝑙𝑑 indicates plot number and 𝑠𝑡𝑎𝑏𝑖𝑙𝑖𝑡𝑦 indicates somewhat classification results.
Second, how to include random effects in the slope term, such as intercept + slope * (var1 + random effect) + residuals
I do not know how to write code to represent this equations.
I expect an expression of these equations.

Like Nate mentioned, the lme4 package will do all that you'd need. Their vignette here will have the examples for your answer, particularly section 2.2.
Simple REs can be written using (1 | group) which will add a group-specific intercept estimated, and a random effect on the intercept varying by group for the fixed effect x let's say, can be written as (1 + x | group).

Related

Multilevel model using glmer: Singularity issue

I'm using R to run a logistic multilevel model with random intercepts. I'm using the frequentist approach (glmer). I'm not able to use Bayesian methods due to the research centre's policy.
When I run my code it says that my model is singular. I'm not sure why or how to fix the issue. Any advice would be appreciated!
More information about the multilevel model I used:
I'm using a multilevel modelling method used in intersectionality research called multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA). The method uses individual level data as level 2 (the intersection group) and nests individuals within their intersections.
My outcome is binary and I have three categorical variables as fixed effects (gender, martial status, and disability). The random effect (level 2) is called intersect1 which includes each unique combination of the categorical variables (gender x marital x disability).
This is the code:
MAIHDA_full <- glmer(IPV_pos ~ factor(sexgender) + factor(marital) + factor(disability) + (1|intersect1), data=Data, family=binomial, control=glmerControl(optimizer=”bobyqa”,optCtrl=list(maxfun=2e5)))
The usual reason for a singular fit with mixed effects models is that either the random structure is overfitted - typically because of the inclusion of random slopes, or in the case such as this where we only have random intercepts, then the variation in the intercepts is so small that the model cannot detect it.
Looking at your model formula I suspect the issue is:
The random effect (level 2) is called intersect1 which includes each unique combination of the categorical variables (gender x marital x disability).
If I have understood this correctly, the model is equivalent to:
IPV_pos ~ sexgender + marital + disability + (1 | sexgender:marital:disability)
It is likely that any variation in sexgender:marital:disability is captured by the fixed effects, leading to near-zero variation in the random intercepts.
I suspect you will find almost identical results if you don't use any random effect.

Mixed effect model or multiple regressions comparison in nested setup

I have a response Y that is a percentage ranging between 0-1. My data is nested by taxonomy or evolutionary relationship say phylum/genus/family/species and I have one continuous covariate temp and one categorial covariate fac with levels fac1 & fac2.
I am interested in estimating:
is there a difference in Y between fac1 and fac2 (intercept) and how much variance is explained by that
does each level of fac responds differently in regard to temp (linearly so slope)
is there a difference in Y for each level of my taxonomy and how much variance is explained by those (see varcomp)
does each level of my taxonomy responds differently in regard to temp (linearly so slope)
A brute force idea would be to split my data into the lowest taxonomy here species, do a linear beta regression for each species i as betareg(Y(i)~temp) . Then extract slope and intercepts for each speies and group them to a higher taxonomic level per fac and compare the distribution of slopes (intercepts) say, via Kullback-Leibler divergence to a distribution that I get when bootstrapping my Y values. Or compare the distribution of slopes (or interepts) just between taxonomic levels or my factor fac respectively.Or just compare mean slopes and intercepts between taxonomy levels or my factor levels.
Not sure is this is a good idea. And also not sure of how to answer the question of how many variance is explained by my taxonomy level, like in nested random mixed effect models.
Another option may be just those mixed models, but how can I include all the aspects I want to test in one model
say I could use the "gamlss" package to do:
library(gamlss)
model<-gamlss(Y~temp*fac+re(random=~1|phylum/genus/family/species),family=BE)
But here I see no way to incorporate a random slope or can I do:
model<-gamlss(Y~re(random=~temp*fac|phylum/genus/family/species),family=BE)
but the internal call to lme has some trouble with that and guess this is not the right notation anyways.
Is there any way to achive what I want to test, not necessarily with gamlss but any other package that inlcuded nested structures and beta regressions?
Thanks!
In glmmTMB, if you have no exact 0 or 1 values in your response, something like this should work:
library(glmmTMB)
glmmTMB(Y ~ temp*fac + (1 + temp | phylum/genus/family/species),
data = ...,
family = beta_family)
if you have zero values, you will need to do something . For example, you can add a zero-inflation term in glmmTMB; brms can handle zero-one-inflated Beta responses; you can "squeeze" the 0/1 values in a little bit (see the appendix of Smithson and Verkuilen's paper on Beta regression). If you have only a few 0/1 values it won't matter very much what you do. If you have a lot, you'll need to spend some serious time thinking about what they mean, which will influence how you handle them. Do they represent censoring (i.e. values that aren't exactly 0/1 but are too close to the borders to measure the difference)? Are they a qualitatively different response? etc. ...)
As I said in my comment, computing variance components for GLMMs is pretty tricky - there's not necessarily an easy decomposition, e.g. see here. However, you can compute the variances of intercept and slope at each taxonomic level and compare them (and you can use the standard deviations to compare with the magnitudes of the fixed effects ...)
The model given here might be pretty demanding, depending on the size of your phylogeny - for example, you might not have enough replication at the phylum level (in which case you could fit the model ~ temp*(fac + phylum) + (1 + temp | phylum:(genus/family/species)), i.e. pull out the phylum effects as fixed effects).
This is assuming that you're willing to assume that the effects of fac, and its interaction with temp, do not vary across the phylogeny ...

Restricted Cubic Spline output in R rms package after cph

I am developing a COX regression model in R.
The model I am currently using is as follows
fh <- cph(S ~ rcs(MPV,4) + rcs(age,3) + BMI + smoking + hyperten + gender +
rcs(FVCPP,3) + TLcoPP, x=TRUE, y=TRUE, surv=TRUE, time.inc=2*52)
If I then want to look at this with
print(fh, latex = TRUE)
I get 3 coefs/SE/Wald etc for MPV (MVP, MVP' and MVP'') and 2 for age (age, age').
Could someone please explain to me what these outputs are? i.e. I believe they are to do with the restricted cubic splines I have added.
When you write rcs(MPV,4), you define the number of knots to use in the spline; in this case 4. Similarly, rcs(age,3) defines a spline with 3 knots. Due to identifiability constraints, 1 knot from each spline is subtracted out. You can think of this as defining an intercept for each spline. So rcs(Age,3) is a linear combination of 2 nonlinear basis functions and an intercept, while rcs(MPV,4) is a linear combination of 3 nonlinear basis functions and an intercept, i.e.,
and
In the notation above, what you get out from the print statement are the regression coefficients and , with corresponding standard errors, p-values etc. The intercepts and are typically set to zero, but they are important, because without them, the model fitting routine how have no idea of where on the y-axis to constrain the splines.
As a final note, you might actually be more interested in the output of summary(fh).

predict and multiplicative variables / interaction terms in probit regressions

I want to determine the marginal effects of each dependent variable in a probit regression as follows:
predict the (base) probability with the mean of each variable
for each variable, predict the change in probability compared to the base probability if the variable takes the value of mean + 1x standard deviation of the variable
In one of my regressions, I have a multiplicative variable, as follows:
my_probit <- glm(a ~ b + c + I(b*c), family = binomial(link = "probit"), data=data)
Two questions:
When I determine the marginal effects using the approach above, will the value of the multiplicative term reflect the value of b or c taking the value mean + 1x standard deviation of the variable?
Same question, but with an interaction term (* and no I()) instead of a multiplicative term.
Many thanks
When interpreting the results of models involving interaction terms, the general rule is DO NOT interpret coefficients. The very presence of interactions means that the meaning of coefficients for terms will vary depending on the other variate values being used for prediction. The right way to go about looking at the results is to construct a "prediction grid", i.e. a set of values that are spaced across the range of interest (hopefully within the domain of data support). The two essential functions for this process are expand.grid and predict.
dgrid <- expand.grid(b=fivenum(data$b)[2:4], c=fivenum(data$c)[2:4]
# A grid with the upper and lower hinges and the medians for `a` and `b`.
predict(my_probit, newdata=dgrid)
You may want to have the predictions on a scale other than the default (which is to return the linear predictor), so perhaps this would be easier to interpret if it were:
predict(my_probit, newdata=dgrid, type ="response")
Be sure to read ?predict and ?predict.glm and work with some simple examples to make sure you are getting what you intended.
Predictions from models containing interactions (at least those involving 2 covariates) should be thought of as being surfaces or 2-d manifolds in three dimensions. (And for 3-covariate interactions as being iso-value envelopes.) The reason that non-interaction models can be decomposed into separate term "effects" is that the slopes of the planar prediction surfaces remain constant across all levels of input. Such is not the case with interactions, especially those with multiplicative and non-linear model structures. The graphical tools and insights that one picks up in a differential equations course can be productively applied here.

repeated measure anova using regression models (LM, LMER)

I would like to run repeated measure anova in R using regression models instead an 'Analysis of Variance' (AOV) function.
Here is an example of my AOV code for 3 within-subject factors:
m.aov<-aov(measure~(task*region*actiontype) + Error(subject/(task*region*actiontype)),data)
Can someone give me the exact syntax to run the same analysis using regression models? I want to make sure to respect the independence of residuals, i.e. use specific error terms as with AOV.
In a previous post I read an answer of the type:
lmer(DV ~ 1 + IV1*IV2*IV3 + (IV1*IV2*IV3|Subject), dataset))
I am really not sure about this solution since it still treats variables as between subjects, and I don't understand how adding random factors would change this.
Does someone know how to run repeated measure anova with lm/lmer taking into account residual independence?
Many thanks,
Solene
I have some worked examples with more detail here: https://keithlohse.github.io/mixed_effects_models/lohse_MER_chapter_02.html
But if you want to get a mixed model that is homologous to your ANOVA, you can include random intercepts for your each subject:factor with your within-subject factors. E.g.,
aov(DV~W1*W2*W3 + Error(SUBJECT/(W1*W2*W3)),data)
has a mixed-model equivalent of:
lmer(speed ~
# Fixed Effects
W1*W2*W3 +
# Random Effects
(1|SUBJECT) + (1|W1:SUBJECT) + (1|W2:SUBJECT) + (1|W3:SUBJECT),
data = DATA,
REML = TRUE)
With REML set to TRUE and a balanced design, you should get degrees of freedom and f-values that are identical to your ANOVA. ML tends to underestimate variance components, so if you are comparing nested models and need to use ML your results will not match precisely. If you are not comparing nested models and can use REML, then the ANOVA and mixed-model should match (again, in a balanced design).
To #skan's earlier answer and other ideas people might have, I am not saying this is THE random-effects structure (as it might be more appropriate to include random slopes for W1 compared to random-intercepts), but if you have one observation per subject:condition, then these random-effects produce an equivalent result.
If your aov example is right (maybe you don't want to nest things) you want this:
lmer(measure~(task*region*actiontype) + 1(1|subject/(task:region:actiontype))
If residual independence means intercept and slope independently calculated you need to specify them separately:
+(1|yourfactors)+(0+variable|yourfactors)
or use the symbol:
+(1||yourfactors)
Anyway if you read the help files you can find that lme4 can't deal with the most general problems.

Resources