I am using the lmer() function (lme4 package) in R to analyse a longitudinal study in which I measured 120 subjects 6 times each. Initially, I specified a model like this:
library(lme4)
model1 = lmer(DV ~ 1 + X1*X2 + (1+X1|SubjectID), REML=FALSE)
X1 is a time-varying variable (level-1) and X2 is a subject-level variable (level-2).
Because these subjects are nested within several teams, I was advised to include a random intercept at the team level (level-3). However, I have only found how to include both a random intercept and a random slope at that level:
model2 = lmer(DV ~ 1 + X1*X2 + (1+X1|TeamID/SubjectID), REML=FALSE)
Does anyone know how to add only a level-3 random intercept to model 1?
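By using the term (1|SubjectID) you're telling the model to expect differing baselines (intercepts) for different instances of SubjectID, and nothing else. To tell the model that the response to the fixed effect X1 may also differ between subjects, we use (1+X1|SubjectID). The same logic applies to teams: (1|TeamID) gives a team-level random intercept without a slope. Therefore, you just need the terms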
(1|TeamID) + (1+X1|SubjectID)
in your model.
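As a minimal sketch of the full call, here is the model fitted to simulated data. The data frame d, the layout of 10 teams with 12 subjects each, and all effect sizes are assumptions made up for illustration; only the formula follows the question.

```r
library(lme4)

set.seed(42)
# Hypothetical layout: 10 teams x 12 subjects x 6 occasions (120 subjects)
d <- expand.grid(occasion = 1:6, SubjectID = factor(1:120))
subj <- as.integer(d$SubjectID)
d$TeamID <- factor((subj - 1) %/% 12 + 1)
team <- as.integer(d$TeamID)

d$X1 <- rnorm(nrow(d))     # time-varying predictor (level 1)
d$X2 <- rnorm(120)[subj]   # subject-level predictor (level 2)

# Simulated random effects: team intercept, subject intercept, subject slope
team_int   <- rnorm(10, sd = 0.5)[team]
subj_int   <- rnorm(120, sd = 1)[subj]
subj_slope <- rnorm(120, sd = 0.5)[subj]
d$DV <- 1 + team_int + subj_int + (0.5 + subj_slope) * d$X1 +
  0.3 * d$X2 + rnorm(nrow(d))

# Level-3 random intercept only, plus subject-level intercept and slope
model3 <- lmer(DV ~ 1 + X1 * X2 + (1 | TeamID) + (1 + X1 | SubjectID),
               data = d, REML = FALSE)
VarCorr(model3)
```

Because the subject IDs here are unique across teams, (1|TeamID) + (1+X1|SubjectID) describes the nesting correctly; if subject labels were reused in different teams, SubjectID would first have to be made unique.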
By the way there's plenty of good information about this on Cross Validated.
I'm using R to run a logistic multilevel model with random intercepts. I'm using the frequentist approach (glmer). I'm not able to use Bayesian methods due to the research centre's policy.
When I run my code it says that my model is singular. I'm not sure why or how to fix the issue. Any advice would be appreciated!
More information about the multilevel model I used:
I'm using a multilevel modelling method used in intersectionality research called multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA). The method uses individual level data as level 2 (the intersection group) and nests individuals within their intersections.
My outcome is binary and I have three categorical variables as fixed effects (gender, marital status, and disability). The random effect (level 2) is called intersect1 which includes each unique combination of the categorical variables (gender x marital x disability).
This is the code:
MAIHDA_full <- glmer(IPV_pos ~ factor(sexgender) + factor(marital) + factor(disability) + (1|intersect1), data=Data, family=binomial, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
A singular fit in a mixed effects model usually has one of two causes. Either the random structure is overfitted, typically because random slopes were included, or, as in this case where there are only random intercepts, the variation in the intercepts is so small that the model cannot detect it.
Looking at your model formula I suspect the issue is:
The random effect (level 2) is called intersect1 which includes each unique combination of the categorical variables (gender x marital x disability).
If I have understood this correctly, the model is equivalent to:
IPV_pos ~ sexgender + marital + disability + (1 | sexgender:marital:disability)
It is likely that any variation in sexgender:marital:disability is captured by the fixed effects, leading to near-zero variation in the random intercepts.
I suspect you will find almost identical results if you don't use any random effect.
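One way to check this is to fit the model with and without the random intercept and compare. The sketch below does that on simulated data: the factor levels, sample size and effect sizes are invented for illustration, and the outcome is generated from main effects only, so it shows how to run the comparison rather than what your data will give.

```r
library(lme4)

set.seed(1)
n <- 2000
# Hypothetical categorical predictors (2 x 3 x 2 = 12 intersections)
Data <- data.frame(
  sexgender  = factor(sample(c("a", "b"), n, replace = TRUE)),
  marital    = factor(sample(c("m1", "m2", "m3"), n, replace = TRUE)),
  disability = factor(sample(c("no", "yes"), n, replace = TRUE))
)
Data$intersect1 <- interaction(Data$sexgender, Data$marital,
                               Data$disability, drop = TRUE)

# Simulated outcome driven by main effects only (an assumption)
eta <- -1 + 0.5 * (Data$sexgender == "b") +
  0.4 * (Data$marital == "m3") + 0.6 * (Data$disability == "yes")
Data$IPV_pos <- rbinom(n, 1, plogis(eta))

fit_glmer <- glmer(IPV_pos ~ sexgender + marital + disability +
                     (1 | intersect1),
                   data = Data, family = binomial,
                   control = glmerControl(optimizer = "bobyqa",
                                          optCtrl = list(maxfun = 2e5)))
fit_glm <- glm(IPV_pos ~ sexgender + marital + disability,
               data = Data, family = binomial)

isSingular(fit_glmer)   # TRUE means a variance estimate sits on the boundary
VarCorr(fit_glmer)      # estimated standard deviation of the random intercept
cbind(glmer = fixef(fit_glmer), glm = coef(fit_glm))  # side-by-side estimates
```

If the intercept variance is estimated at or near zero, the two columns of coefficients should be close to each other.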
I have a question about using linear mixed model effects in R using lmer.
I have a repeated measures experiment with 117 participants. They all perform a task with 5 categories (Prime_Name). The dependent variable is reaction time (Score). I want to compare those 5 categories with each other. There is a lot of missing data, so I think an RM ANOVA is not an option.
I have two questions:
Am I using the correct analysis if I do a linear mixed model effect analysis in R with lmer?
I am not sure if my model is completely correct, especially for the random effects. When do you use only "+ (1|Resp_ID)" and when do you use "+ (Prime_Name|Resp_ID)"?
Two options:
Option 1:
model <- lmer(Score ~ Prime_Name + (1|Resp_ID), data=df)
Option 2:
model <- lmer(Score ~ Prime_Name + (Prime_Name|Resp_ID), data=df)
Any help will be appreciated.
Thank you
I run a Cox Regression with two categorical variables (x1 and x2) and their interaction. I need to know the significance of the overall effect of x1, x2 and of the interaction.
The overall effect of the interaction:
I know how to find out the overall effect of the interaction using anova():
library(survival)
fit_x1_x2 <- coxph(Surv(time, death) ~ x1 + x2 , data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_x2, fit_full)
But how are we supposed to use anova() to find out the overall effect of x1 or x2? What I tried is this:
The overall effect of x1
fit_x2_ia <- coxph(Surv(time, death) ~ x2 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x2_ia, fit_full)
The overall effect of x2
fit_x1_ia <- coxph(Surv(time, death) ~ x1 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_ia, fit_full)
I am not sure whether this is how we are supposed to use anova(). The fact that the output shows zero degrees of freedom makes me sceptical. I am even more puzzled that both times, for the overall effect of x1 and of x2, the test is reported as significant, although the log-likelihood values of the two models are the same and the Chi-square value is zero.
Here is the data I used
set.seed(1) # make it reproducible
df <- data.frame(x1= rnorm(1000), x2= rnorm(1000)) # generate data
df$death <- rbinom(1000,1, 1/(1+exp(-(1 + 2 * df$x1 + 3 * df$x2 + df$x1 * df$x2)))) # dead or not
library(tidyverse) # for cut_number() function
df$x1 <- cut_number(df$x1, 4); df$x2 <- cut_number(df$x2, 4) # make predictors to groups
df$time <- rnorm(1000); df$time[df$time<0] <- -df$time[df$time<0] # add survival times
The two models you have constructed for "overall effect" do not satisfy the statistical property of being hierarchical, i.e. properly nested. Specifically, if you look at the actual models that get constructed with that code, you should see that they are the same model with different labels for the two-way crossed effects. In both cases you have 15 estimated coefficients (hence zero degrees of freedom difference), and you will note that the x1 parameter in the full model has the same coefficient as the x2[-3.2532,-0.6843):x1[-0.6973,-0.0347) parameter in the "reduced" model looking for an x1-effect, namely 0.19729. The crossing operator is basically filling in all the missing cells for the main effects with interaction results.
There really is little value in looking at interaction models without all of the main effects if you want to stay within the bounds of generally accepted statistical practice.
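The claim that the "reduced" and full models are the same model can be checked on the design matrices without fitting anything. This is a sketch using the simulated predictors from the question; cut4() is a hypothetical base R stand-in for cut_number(), so the exact cutpoints may differ slightly. model.matrix() keeps an intercept column that coxph() drops, which is why it shows 16 columns where coxph reports 15 coefficients.

```r
set.seed(1)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

# Hypothetical helper: split a numeric vector into quartile groups
cut4 <- function(x) {
  cut(x, breaks = quantile(x, probs = 0:4 / 4), include.lowest = TRUE)
}
df$x1 <- cut4(df$x1)
df$x2 <- cut4(df$x2)

X_full <- model.matrix(~ x1 + x2 + x1:x2, data = df)  # full model
X_red  <- model.matrix(~ x2 + x1:x2, data = df)       # "x1 removed"

ncol(X_full)   # number of columns in the full design
ncol(X_red)    # same number: R re-expands x1:x2 to fill the gap

# If both matrices span the same space, stacking them adds no rank
qr(X_full)$rank
qr(cbind(X_full, X_red))$rank
```

Equal column counts and an unchanged rank after stacking mean the two formulas are reparameterisations of one another, which is why anova() reports zero degrees of freedom.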
If you type:
fit_full
... you should get a summary of the model that has p-values for the x1 levels, the x2 levels, and the interaction levels. Because you chose to categorize each variable at four arbitrary cutpoints, you will end up with a total of 15 parameter estimates. If instead you made no cuts and modeled the linear effects and the linear-by-linear interaction, you could get three p-values directly. I'm guessing there was suspicion that the effects were not linear; if so, I think a cubic spline model might be more parsimonious and distort the biological reality less than discretization into 4 disjoint levels. If you thought the effects might be non-linear but ordinal, there is an ordered version of factor-classed variables, but the results are generally confusing to the uninitiated.
The answer from 42- is informative, but after reading it I still did not know how to determine the three p-values, or whether this is possible at all. So I talked to the professor of biostatistics at my university. His answer was quite simple and I share it in case others have similar questions.
In this design it is not possible to determine the three p-values for the overall effects of x1, x2 and their interaction. If we want to know the p-values of the three overall effects, we need to keep the continuous variables as they are. Breaking the variables up into groups answers a different question, hence we cannot test the hypothesis of the overall effects no matter which statistical model we use.
I am fairly new to linear mixed models, and I'm trying to generate a model using lmer in which I test the effects of:
Group (fixed): 2 levels
Treatment (fixed): 2 levels (unstimulated and stimulated)
Group * Treatment
on the dependent variable "Outcome", considering the random effect of "Subject".
In this experiment, each subject in the two groups had one arm stimulated and one unstimulated.
So far, the model I came up with is
lmer(Outcome ~ Group + Treatment + Group*Treatment + (1|Subject), REML=FALSE, data= data)
However, I'm not sure of how to specify that each subject has one arm unstimulated and one stimulated.
Can anybody please help?
If your question is more about an appropriate model specification for your case, I would say that it depends on your study and your goals. What you are describing is in line with your formula, and it makes sense as it is. You are already accounting for the Subject effect with (1|Subject), and Treatment distinguishes the treated arm from the untreated arm. I would suggest you check out this post, which discusses fixed and mixed effects.
Regarding the way of specifying models in lmer using formulas, my first comment is that the following 3 are equivalent:
Outcome ~ Group + Treatment + Group*Treatment
Outcome ~ Group + Treatment + Group:Treatment
Outcome ~ Group*Treatment
The third is a compact form of the second, and the first is redundant. I would also suggest trying the following alternatives, which are valid too, so that you get more familiar with the formula notation:
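The equivalence of the three fixed-effects formulas can be confirmed in base R by asking terms() how each one expands. This is only a small check on the notation; no data are needed, and the object names f1 to f3 are made up for the example.

```r
f1 <- Outcome ~ Group + Treatment + Group*Treatment
f2 <- Outcome ~ Group + Treatment + Group:Treatment
f3 <- Outcome ~ Group*Treatment

# Expand each formula into its individual model terms
term_labels <- lapply(list(f1, f2, f3),
                      function(f) attr(terms(f), "term.labels"))
term_labels

# All three expand to the same set of terms
identical(term_labels[[1]], term_labels[[2]]) &&
  identical(term_labels[[2]], term_labels[[3]])
```

The a*b operator is shorthand for a + b + a:b, and duplicated terms are dropped, which is why the first form adds nothing to the third.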
model2 <- lmer(Outcome ~ Treatment +(1+Treatment|Group)+(1|Subject), REML=FALSE, data= data);coef(summary(model2));ranef(model2)
model3 <- lmer(Outcome ~ Treatment +(0+Treatment|Group)+(1|Subject), REML=FALSE, data= data);coef(summary(model3));ranef(model3)
model4 <- lmer(Outcome ~ Treatment +(1|Group)+(1|Subject), REML=FALSE, data= data);coef(summary(model4));ranef(model4)
I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I run the coefficients of these models I notice that it only produces random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes. I can't seem to get the syntax correct for either function to do this. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
It's not clear what continuous variable you want to define your slopes: if you have a continuous variable x and groups g, then (x|g) or equivalently (1+x|g) will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
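As a generic illustration of the (x|g) syntax, here is a random-slopes fit on the sleepstudy data that ships with lme4, used instead of the speed dating data because it has an obvious continuous predictor (Days) and grouping factor (Subject). It is a sketch of the notation, not a model for the question's data.

```r
library(lme4)

# Random intercept and random slope for Days within each Subject
m_a <- lmer(Reaction ~ Days + (Days | Subject),
            data = sleepstudy, REML = FALSE)

# Same model with the intercept written out explicitly
m_b <- lmer(Reaction ~ Days + (1 + Days | Subject),
            data = sleepstudy, REML = FALSE)

all.equal(logLik(m_a), logLik(m_b))  # the two spellings give the same fit

# Per-subject intercepts and slopes (fixed effect plus random deviation)
head(coef(m_a)$Subject)
```

coef() returns one row per subject with both an "(Intercept)" and a "Days" column, which is what distinguishes this from a random-intercept-only model.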
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
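The level counts behind both points can be seen on a toy data set with the same shape: one observation per participant/looks/personality combination. The data frame speed_toy, its 10 participants and its level names are invented for illustration and are not Field's data.

```r
# Hypothetical design: every participant rates every looks x personality cell once
speed_toy <- expand.grid(
  participant = factor(1:10),
  looks       = factor(c("attractive", "average", "ugly")),
  personality = factor(c("high", "some", "none"))
)

n_obs <- nrow(speed_toy)

# Crossed term (1|looks): only as many groups as there are looks categories
n_looks <- nlevels(speed_toy$looks)

# Nested term participant:looks: one group per participant-by-looks pair
n_part_looks <- with(speed_toy,
                     nlevels(interaction(participant, looks, drop = TRUE)))

# Lowest nesting level: one group per observation
n_lowest <- with(speed_toy,
                 nlevels(interaction(participant, looks, personality,
                                     drop = TRUE)))

c(obs = n_obs, looks = n_looks,
  participant_looks = n_part_looks, lowest = n_lowest)
```

The lowest nesting level has as many groups as there are rows, so its variance cannot be separated from the residual variance. The much smaller level count for looks alone also shows why (1|looks) is a different term from looks nested within participant.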
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.