How to indicate paired observations with lmer mixed models in R

I am fairly new to linear mixed models, and I'm trying to generate a model using lmer in which I test the effects of:
Group (fixed): 2 levels
Treatment (fixed): 2 levels (unstimulated and stimulated)
Group * Treatment
on the dependent variable "Outcome", considering the random effect of "Subject".
In this experiment, each subject in the two groups had one arm stimulated and one unstimulated.
So far, the model I came up with is
lmer(Outcome ~ Group + Treatment + Group*Treatment + (1|Subject), REML=FALSE, data= data)
However, I'm not sure how to specify that each subject has one arm unstimulated and one stimulated.
Can anybody please help?

If your question is more about an appropriate model specification for your case, I would say that it depends on your study and your goals. What you describe is consistent with your formula, and it makes sense as it is: you are already accounting for the Subject effect with (1|Subject), and Treatment distinguishes the stimulated arm from the unstimulated arm. I would suggest you check out this post, which discusses fixed and mixed effects.
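To make the pairing concrete: the model expects the data in "long" format, with two rows per subject, one per arm. A minimal sketch with hypothetical toy values, just to show the layout:
data <- data.frame(
  Subject   = factor(rep(1:4, each = 2)),   # same ID on both rows encodes the pairing
  Group     = factor(rep(c("A", "B"), each = 4)),
  Treatment = factor(rep(c("unstimulated", "stimulated"), times = 4)),
  Outcome   = c(5.1, 6.3, 4.8, 6.0, 5.5, 5.7, 4.9, 6.2)  # made-up values
)
As long as both arms of a subject carry the same Subject ID, (1|Subject) encodes the pairing and no extra term is needed.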
Regarding the way of specifying models in lmer using formulas, my first comment is that the following 3 are equivalent:
Outcome ~ Group + Treatment + Group*Treatment
Outcome ~ Group + Treatment + Group:Treatment
Outcome ~ Group*Treatment
The third is a compact form of the second, and the first is redundant: Group*Treatment already expands to Group + Treatment + Group:Treatment, so the extra main-effect terms in the first add nothing. I would also suggest you try the following alternatives, which are valid too, so that you get more familiar with the formula notation:
model2 <- lmer(Outcome ~ Treatment + (1 + Treatment | Group) + (1 | Subject), REML = FALSE, data = data)
coef(summary(model2)); ranef(model2)
model3 <- lmer(Outcome ~ Treatment + (0 + Treatment | Group) + (1 | Subject), REML = FALSE, data = data)
coef(summary(model3)); ranef(model3)
model4 <- lmer(Outcome ~ Treatment + (1 | Group) + (1 | Subject), REML = FALSE, data = data)
coef(summary(model4)); ranef(model4)
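To convince yourself that the three fixed-effect formulas above are indeed equivalent, a quick sketch is to fit two of them and compare the estimates (the model names here are just illustrative):
model1a <- lmer(Outcome ~ Group + Treatment + Group*Treatment + (1|Subject), REML = FALSE, data = data)
model1c <- lmer(Outcome ~ Group*Treatment + (1|Subject), REML = FALSE, data = data)
all.equal(fixef(model1a), fixef(model1c))  # TRUE: R drops the duplicated main-effect terms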

What will be the best "formula" for this mixed effects model?

I have the following study, which I want to analyze with a mixed effects model:
"Subjects" are divided into two "Group" levels (Treatment A and B).
"Weight" is recorded before treatment and 3 months after it ("Time"; repeated measures).
I also need to correct for the subjects' "age" and "gender".
The main question is: do the two groups differ in their effect on weight?
For the mixed effects model, I was considering the following syntax with the lmer function of the lme4 package:
lmer(weight ~ Group*Time + age + (1|subject) + (1|gender), data=mydata)
Is this syntax correct, or do I need to use more complex terms such as the ones given below?
(time|subject)
(time + 1|subject)
(1|subject) + (1|Group:subject) + (1|Time:subject)
I have tried different sources on the internet, but the literature seems very confusing.
gender should not be a random effect (intercept). It doesn't meet any of the usual requirements for it to be treated as random.
(time|subject)
and
(time + 1|subject)
are the same: both allow the effect of time to vary across the levels of subject (a random slope for time, in addition to the random intercept).
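If you did want the time effect to vary by subject, a sketch of that model (using the question's variable names, and keeping gender as a fixed effect per the point above) would be:
lmer(weight ~ Group*Time + age + gender + (Time|subject), data = mydata)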
(1|subject) + (1|Group:subject) + (1|Time:subject)
makes very little sense. This says that Time is nested in subject, because (1|Time:subject) is the same as (1|subject:Time), and (1|subject) + (1|subject:Time) is the standard way to specify nested random effects. The addition of (1|Group:subject) seems bizarre, and I would be surprised if such a model were identified. Your research question is whether the two groups differ, so you want the fixed effect of Group, and (1|Group:subject) does not make sense.
The model:
lmer(weight ~ Group*Time + age + gender + (1|subject), data=mydata)
makes sense.
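Since the research question is whether the groups differ in their effect on weight, one way to test the Group:Time interaction is a likelihood-ratio comparison; a sketch (the model names are illustrative, and REML = FALSE is needed when comparing models that differ in fixed effects):
m_full <- lmer(weight ~ Group*Time + age + gender + (1|subject), data = mydata, REML = FALSE)
m_red  <- lmer(weight ~ Group + Time + age + gender + (1|subject), data = mydata, REML = FALSE)
anova(m_red, m_full)  # LR test of the Group:Time interaction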
Finally, this question should be on Cross Validated.

How to find overall significance for main effects in a dummy interaction using anova()

I ran a Cox regression with two categorical variables (x1 and x2) and their interaction. I need to know the significance of the overall effect of x1, of x2, and of the interaction.
The overall effect of the interaction:
I know how to find out the overall effect of the interaction using anova():
library(survival)
fit_x1_x2 <- coxph(Surv(time, death) ~ x1 + x2 , data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_x2, fit_full)
But how are we supposed to use anova() to find out the overall effect of x1 or x2? What I tried is this:
The overall effect of x1
fit_x2_ia <- coxph(Surv(time, death) ~ x2 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x2_ia, fit_full)
The overall effect of x2
fit_x1_ia <- coxph(Surv(time, death) ~ x1 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_ia, fit_full)
I am not sure whether this is how we are supposed to use anova(). The fact that the output shows zero degrees of freedom makes me sceptical. I am even more puzzled that both times, for the overall effect of x1 and for that of x2, the test is significant, although the log-likelihood values of the models are the same and the Chi-squared value is zero.
Here is the data I used
set.seed(1) # make it reproducible
df <- data.frame(x1= rnorm(1000), x2= rnorm(1000)) # generate data
df$death <- rbinom(1000,1, 1/(1+exp(-(1 + 2 * df$x1 + 3 * df$x2 + df$x1 * df$x2)))) # dead or not
library(tidyverse) # for cut_number() function
df$x1 <- cut_number(df$x1, 4); df$x2 <- cut_number(df$x2, 4) # make predictors to groups
df$time <- rnorm(1000); df$time[df$time<0] <- -df$time[df$time<0] # add survival times
The two models you have constructed for the "overall effect" really do not appear to satisfy the statistical property of being hierarchical, i.e., properly nested. Specifically, if you look at the actual models that get constructed with that code, you should see that they are the same model with different labels for the two-way crossed effects. In both cases you have 15 estimated coefficients (hence the zero degrees-of-freedom difference), and you will note that the x1 parameter in the full model has the same coefficient as the x2[-3.2532,-0.6843):x1[-0.6973,-0.0347) parameter in the "reduced" model looking for an x1 effect, namely 0.19729. The crossing operator is basically filling in all the missing cells for the main effects with interaction results.
There really is little value in looking at interaction models without all of the main effects if you want to stay within the bounds of generally accepted statistical practice.
If you type:
fit_full
... you should get a summary of the model that has p-values for the x1 levels, the x2 levels, and the interaction levels. Because you chose to categorize these with four arbitrary cutpoints each, you end up with a total of 15 parameter estimates. If instead you made no cuts and modeled the linear effects and the linear-by-linear interaction, you could get three p-values directly. I'm guessing there was a suspicion that the effects were not linear; if so, I think a cubic spline model might be more parsimonious and distort the biological reality less than discretization into 4 disjoint levels. If you thought the effects might be non-linear but ordinal, there is an ordered version of factor-classed variables, but the results are generally confusing to the uninitiated.
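For concreteness, a sketch of that continuous approach (df_continuous is a hypothetical name for the data frame before the cut_number() step):
library(survival)
fit_lin <- coxph(Surv(time, death) ~ x1 * x2, data = df_continuous)
summary(fit_lin)  # one p-value each for x1, x2, and the x1:x2 interaction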
The answer from 42- is informative, but after reading it I still did not know how to determine the three p-values, or whether this is possible at all. So I talked to a professor of biostatistics at my university. His answer was quite simple, and I share it in case others have similar questions.
In this design it is not possible to determine the three p-values for the overall effects of x1, x2, and their interaction. If we want the p-values of the three overall effects, we need to keep the continuous variables as they are. Breaking the variables up into groups answers a different question, hence we cannot test the hypothesis of the overall effects no matter which statistical model we use.

R: Specifying random effects using glmer command

I am analyzing categorical data from a questionnaire conducted in different schools to see what factors might have influenced pupils' responses. I am therefore building a mixed model using the glmer command from R's lme4 package. For each survey question response I have six predictor variables, and I want to include School as a random effect in such a way that both the intercept and slope vary by school. I have searched long and hard, both online and offline, and have found conflicting accounts of the correct way to code this and, being an R novice, am not sure which is right! Here is what I've come up with (where Like is the response variable):
LikeM1 <- glmer(Like ~ Treatment + Regularity + Learn + Age + Gender +
Organisation_Membership_Summary + (1 + Like|School),
data = MagpieData, na.action = "na.omit", family = binomial(logit))
Have I specified School as a random effect correctly, so that both the intercept and slope vary by School, or not? I should perhaps mention that, since my data are categorical, all my variables are factors in R.
If you want both the slope and the intercept to vary by group, the general form is: y ~ x + (1 + x | group). In the parentheses, the 1 indicates that the intercept should vary by group, and the x indicates that the coefficient of predictor x should vary by group. You've got a lot of predictors in your model. I'd start with one predictor at a time to make interpretation a bit easier.
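For instance, a sketch applying that form to the question's variables with a single predictor (note that the term in parentheses uses the predictor Treatment, not the response Like):
library(lme4)
LikeM1 <- glmer(Like ~ Treatment + (1 + Treatment | School),
                data = MagpieData, na.action = "na.omit",
                family = binomial(logit))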
I think you want to do this:
LikeM1 <- glmer(Like ~ Treatment + Regularity + Learn + Age + Gender +
                  Organisation_Membership_Summary +
                  (1 | School) +
                  (0 + Treatment + Regularity + Learn + Age + Gender +
                     Organisation_Membership_Summary | School),
                data = MagpieData, na.action = "na.omit",
                family = binomial(logit))
The first parenthesized term in the formula is the random intercept and the second contains the random slopes. This link provides a really good explanation.

Specifying level-3 random intercept in R

I am using the lmer() function (lme4 package) in R to analyse a longitudinal study in which I measured 120 subjects 6 times. At first, I specified a model like this:
library(lme4)
model1 = lmer(DV ~ 1 + X1*X2 + (1+X1|SubjectID), REML = FALSE)
X1 is a time-varying variable (level-1) and X2 is a subject-level variable (level-2).
Because these subjects are nested within several teams, I was advised to include a random intercept at the team level (level-3). However, I have only found how to include both a random intercept and a random slope:
model2 = lmer(DV ~ 1 + X1*X2 + (1+X1|TeamID/SubjectID), REML = FALSE)
Does anyone know how to add only a level-3 random intercept to model 1?
By using the term (1|SubjectID) you're telling the model to expect differing baselines only across instances of SubjectID. To also allow the response to the fixed effect X1 to differ across subjects, we use (1+X1|SubjectID). Therefore, you just need the terms
(1|TeamID) + (1+X1|SubjectID)
in your model.
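Putting it together, a sketch of the full call (model3 is just an illustrative name) would be:
model3 = lmer(DV ~ 1 + X1*X2 + (1|TeamID) + (1+X1|SubjectID), REML = FALSE)
This adds only a random intercept at the team level while keeping the random intercept and the X1 slope at the subject level.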
By the way there's plenty of good information about this on Cross Validated.

Stata's xtlogit (fe, re) equivalent in R?

Stata allows for fixed effects and random effects specifications of logistic regression through the xtlogit fe and xtlogit re commands, respectively. I was wondering what the equivalent commands for these specifications in R are.
The only similar specification I am aware of is the mixed effects logistic regression
mymixedlogit <- glmer(y ~ x1 + x2 + x3 + (1 | x4), data = d, family = binomial)
but I am not sure whether this maps to any of the aforementioned commands.
The glmer command is used to quickly fit logistic regression models with varying intercepts and varying slopes (or, equivalently, a mixed model with fixed and random effects).
To fit a varying intercept multilevel logistic regression model in R (that is, a random effects logistic regression model), you can run the following using the in-built "mtcars" data set:
library(lme4)  # for glmer()
data(mtcars)
head(mtcars)
m <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
summary(m)
# and you can examine the fixed and random effects
fixef(m); ranef(m)
To fit the analogous varying-intercept model in Stata, you of course use the xtlogit command (using the similar, but not identical, in-built "auto" data set in Stata):
sysuse auto
xtset gear_ratio
xtlogit foreign weight, re
I'll add that I find the entire reference to "fixed" versus "random" effects ambiguous, and I prefer to refer to the structure of the model itself (e.g., are the intercepts varying? which slopes are varying, if any? is the model nested in 2 levels or more? are the levels cross-classified or not?). For a similar view, see Andrew Gelman's thoughts on "fixed" versus "random" effects.
Update: Ben Bolker's excellent comment below points out that in R it's more informative when using predict commands to use the data=mtcars option instead of, say, the dollar notation:
data(mtcars)
m1 <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
m2 <- glmer(am ~ 1 + wt + (1|gear), family="binomial", data=mtcars)
p1 <- predict(m1); p2 <- predict(m2)
names(p1) # not that informative...
names(p2) # very informative!
