I have students in classrooms in schools, and I want to know whether test scores depend on the school.
My basic model is:
basemodel <- lmer(test ~ schoolnumber +
(1 | schoolnumber/classnumber), data=mydata)
Should I also try to add the student level?
Doesn't work:
model1 <- lmer(test ~ schoolnumber +
(1 | schoolnumber/classnumber/studentID), data=ED)
Doesn't work:
model2 <- lmer(test ~ schoolnumber +
(1 | schoolnumber/classnumber) + (1 | studentID), data=ED)
Doesn't work:
model3 <- lmer(test ~ schoolnumber +
(1 + studentID | schoolnumber/classnumber), data=ED)
model4 <- lmer(test ~ schoolnumber + studentID +
(1 | schoolnumber/classnumber), data=ED)
When I add studentID, I get this warning:
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Also, my current test score is a standardised score: raw scores were converted to z-scores and then linearly transformed to standard scores, 100 + 15z.
Am I okay to use these linearly transformed scores, or should I be using something else? I've seen code elsewhere saying to use scale().
As Roland says, if schoolnumber is categorical (a factor variable), then your first model should fail:
~ schoolnumber + (1 | schoolnumber/classnumber)
includes schoolnumber both as a fixed categorical predictor and as a random-effects grouping variable. ~ (1 | schoolnumber/classnumber) would make more sense.
If you get rid of schoolnumber as a fixed effect predictor, then
~ (1 | schoolnumber/classnumber) + (1|studentID)
should work. I wouldn't recommend adding studentID as a fixed effect.
I'm assuming that students are labeled uniquely, i.e. that there isn't a student 1A57 in school number 1 and a different student 1A57 in school number 2 ...
How large is your data set at each level (observations, students, classes, schools)? I'm guessing that students are nested within schools but crossed among classes, i.e. each student is in only one school but in more than one class. As long as students are labeled uniquely, it won't matter as much how you specify the model (see the sketch below).
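A minimal sketch of both steps, reusing the data frame ED and the column names from the question (the uniqueness check is just one way to verify the labeling assumption):
library(lme4)
# Does any studentID appear in more than one school?
any(rowSums(table(ED$studentID, ED$schoolnumber) > 0) > 1)
# If IDs are only unique within classes, make them globally unique
ED$studentID <- interaction(ED$schoolnumber, ED$classnumber, ED$studentID,
                            drop = TRUE)
# Model without schoolnumber as a fixed effect; the studentID term is only
# identifiable if there is more than one observation per student
m <- lmer(test ~ 1 + (1 | schoolnumber/classnumber) + (1 | studentID),
          data = ED)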
I am trying to determine whether there is a significant effect of treatment on microbiome diversity between two timepoints (two timepoints x three treatments).
Can somebody please explain how to model this using linear mixed models using the nlme library in R?
Particularly how to handle repeated sampling of the same subject over time.
I have seen the three following syntaxes used but don't really understand the difference between them.
model1 <- lme(diversity ~ treatment * timepoint,
random = ~ 1 | mouseID,
data = alpha_df)
model2 <- lme(diversity ~ treatment * timepoint,
random = ~ timepoint | mouseID,
data = alpha_df)
model3 <- lme(diversity ~ treatment * timepoint,
random = ~ 1 + timepoint | mouseID,
data = alpha_df)
I think model3 is the correct one for my use but I am not sure.
Thanks in advance!
~ 1 | mouseID means "one intercept per mouse". There is a main, fixed intercept (actually there are three intercepts, one per treatment), and the random intercepts of the mice are normally distributed around the main intercept.
~ timepoint | mouseID is the same as ~ 1 + timepoint | mouseID. It means "one regression line (i.e. an intercept and a slope) per mouse". There is a main slope (actually three main slopes, because of the interaction term with the treatments), and the random slopes are normally distributed around the main slope.
So the "biggest" model is ~ 1 + timepoint | mouseID. If there is a biological justification for assuming that the mice all have the same diversity value at time 0, you can drop the intercept: random = ~ 0 + timepoint | mouseID.
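If you want to choose between the random structures empirically, a likelihood-ratio test is one option. A minimal sketch, assuming alpha_df as in the question (the fixed effects are identical, so the default REML fits are comparable):
library(nlme)
m_int   <- lme(diversity ~ treatment * timepoint,
               random = ~ 1 | mouseID, data = alpha_df)
m_slope <- lme(diversity ~ treatment * timepoint,
               random = ~ 1 + timepoint | mouseID, data = alpha_df)
# Tests the random slope; the variance parameter sits on the boundary,
# so this p-value is conservative
anova(m_int, m_slope)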
level-1 variable:
income - continuous
level-2 variables:
state's general weather - a three-level categorical variable (hot/moderate/cool), effect coded, which generates two variables because it has three levels (weather_ef1, weather_ef2)
enrolled in university - binary: yes/no (effect coded: yes = -1, no = 1)
DV:
math score
grouping variable: household
model 1: (fixed slope)
The DV is predicted by income, enrollment, and the interaction between enrollment and income.
In this case:
lmer(y ~ 1 + income + enrollment + income*enrollment + (1 | householdID), data = data)
lmer(y ~ 1 + income + enrollment + income:enrollment + (1 | householdID), data = data)
Which one denotes the interaction, : or *?
Further, do I have to use factor(enrollment),
or is that unnecessary because it is already effect coded?
model 2: (fixed slope)
The DV is predicted by income, weather, and the interaction between income and weather.
lmer(y ~ 1 + income + weather_ef1 + weather_ef2 + weather_ef1*income
+ weather_ef2*income + (1 | householdID), data)
lmer(y ~ 1 + income + weather_ef1 + weather_ef2 + weather_ef1:income
+ weather_ef2:income + (1 | householdID), data)
I'm still confused about whether * or : is right.
I think the weather_ef variables are already effect coded, so I shouldn't
have to use factor(weather_ef1) and the like.
From the documentation (use ?formula):
The * operator denotes factor crossing: a*b interpreted as a+b+a:b.
In other words a*b adds the main effects of a and b and their interaction. So in your model when you use income*enrollment this is the same as income + enrollment + income:enrollment. The two versions you described for each model should give identical results. You could just have used:
lmer(y~ 1 + income*enrollment+ (1|householdID), data=data)
which also describes the same model.
If your variables are effect coded then you don't need to use factor but be careful about the interpretation of the effects.
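A quick way to convince yourself of the equivalence, sketched with the data frame and column names from your model 1:
library(lme4)
fit_star  <- lmer(y ~ 1 + income * enrollment + (1 | householdID),
                  data = data)
fit_colon <- lmer(y ~ 1 + income + enrollment + income:enrollment +
                  (1 | householdID), data = data)
# Same expanded terms, hence identical fixed-effect estimates
all.equal(fixef(fit_star), fixef(fit_colon))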
I'm analyzing some longitudinal data using the lme4 package (lmer function) with 3 levels: measurement points nested in individuals nested in households. I'm interested in linear and non-linear change curves surrounding a specific life event. My model has several time predictors: linear change before and after the event occurs, and non-linear change (i.e., squared time variables) before and after the event occurs. Additionally, I have several Level-2 predictors that do not vary with time (i.e., personality traits) and some control variables (e.g., age, gender). So far I have not included any random slopes or cross-level interactions.
This is my model code:
model.RI <- lmer(outcome ~ time + female_c + age_c + age_c2 +
                 preLin + preLin.sq + postLin + postLin.sq +
                 per1.c + per2.c + per3.c + per4.c + per5.c +
                 (1 | ID) + (1 | House))
outcome = my dependent variable
time = year 1, year 2, year 3 ... (until year 9); this variable symbolizes something like a testing effect
female_c = gender centered
age_c = age centered
age_c2 = age squared centered
preLin = time variable indicating time to the event (this variable is 0 after the event has occurred and is -1 e.g. one year ahead of the event, -2 two years ahead of the event etc.)
preLin.sq = squared values of preLin
postLin = time variable indicating time after the event (this variable is 0 before the event and increases after the event has occurred; e.g. is +1 one year after the event)
postLin.sq = squared values of postLin
per1.c until per5.c = personality traits on Level 2 (centered)
ID = indicating the individual
House = indicating the household
I was wondering how I could plot the predicted values of this lmer model (e.g., using ggplot2?). I've plotted change curves using method = "gam" in R. This is a rather data-driven method to inspect the data without pre-defining whether the curve is linear or quadratic or whatever. I would now like to check whether my parametric lmer model is comparable to that data-driven gam plot I already have. Do you have any advice on how to do this?
I would be more than happy to get some help on this! Please also feel free to ask if I was not precise enough on my explanation of what I would like to do!
Thanks a lot!
This is what my gam plot looks like (see the linked figure), and I hope to get something similar when plotting the predicted values of my lmer model!
You can use the ggpredict() function from the ggeffects package. If you want to plot predicted values of time (preLin), you would simply write:
ggpredict(model.RI, "preLin")
The function returns a data frame (see the package articles), which you can use in ggplot, but you can also plot the results directly:
ggpredict(model.RI, "preLin") %>% plot()
or
p <- ggpredict(model.RI, "preLin")
plot(p)
You could also use the sjPlot package; however, for marginal effects / predicted values, sjPlot::plot_model() internally just calls ggeffects::ggpredict(), so the results would be essentially identical.
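For completeness, a sketch of the corresponding sjPlot call (type = "pred" requests predicted values):
library(sjPlot)
plot_model(model.RI, type = "pred", terms = "preLin")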
Another note on your model: if you have longitudinal data, you should also include your time variable as a random slope. I'm not sure how postLin actually relates to preLin, but if preLin captures all your measurements, you should at least write your model like this:
model.RI <- lmer(
outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
(1 + preLin | ID) + (1 + preLin | House)
)
If you also assume a quadratic trend for each person (ID), you could even add the squared term as a random slope.
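A sketch of that extension; note that adding preLin.sq as a random slope adds several variance/covariance parameters, so it may not converge with smaller samples:
model.RI <- lmer(
outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
(1 + preLin + preLin.sq | ID) + (1 + preLin | House)
)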
As your example figure suggests splines, you could also try this:
library(splines)
model.RI <- lmer(
outcome ~ time + female_c + age_c + age_c2 + bs(preLin) +
postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
(1 + preLin | ID) + (1 + preLin | House)
)
p <- ggpredict(model.RI, "preLin")
plot(p)
Examples for splines are also demonstrated on the website I mentioned above.
Edit:
Another note is related to nesting: you're currently modelling a fully crossed or cross-classified model. If it's completely nested, the random parts would look like this:
... + (1 + preLin | House / ID)
(see also this small code-example).
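To make the distinction concrete: (1 | House/ID) expands to (1 | House) + (1 | House:ID), i.e. an ID effect defined within each household, whereas the cross-classified form treats ID as a globally unique label. A minimal sketch with the two random-effects structures side by side (fixed effects trimmed, and the data argument omitted as in the models above):
# cross-classified: ID labels are unique across households
m_crossed <- lmer(outcome ~ preLin + (1 + preLin | ID) + (1 + preLin | House))
# fully nested: ID effects are defined within each household
m_nested  <- lmer(outcome ~ preLin + (1 + preLin | House/ID))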
I constructed a mixed effect model with three fixed effects and one random effect.
mdl1 <- lmer(yld.res ~ veg + rep + rip + (1|state),
REML=FALSE,data=data2)
I want to get the most parsimonious model from the above model. To do this, I want to drop one independent variable at a time and see whether doing so improves the fit (by looking at the AICc value). But when I use drop1, it gives me the following error:
drop1(mdl1, test="F")
Error in match.arg(test) : 'arg' should be one of “none”, “Chisq”, “user”
I am not really sure how to go about this and would really appreciate any help.
If you just use drop1() with the default test="none" it will give you the AIC values corresponding to the model with each fixed effect dropped in turn.
Here's a slightly silly example (it probably doesn't make sense to test the model with a quadratic but no linear term):
library('lme4')
fm1 <- lmer(Reaction ~ Days + I(Days^2) + (Days | Subject), sleepstudy)
drop1(fm1)
## Single term deletions
##
## Model:
## Reaction ~ Days + I(Days^2) + (Days | Subject)
## Df AIC
## <none> 1764.3
## Days 1 1769.2
## I(Days^2) 1 1763.9
How badly do you need AICc rather than AIC? That could be tricky/require some hacking ...
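If you do need AICc, one simple option is to apply the small-sample correction AICc = AIC + 2k(k+1)/(n - k - 1) by hand, taking the parameter count from logLik() (itself a debatable choice for mixed models, which is part of why this is tricky); packages such as AICcmodavg and MuMIn also provide AICc() functions. A sketch using fm1 from above:
k <- attr(logLik(fm1), "df")  # number of estimated parameters
n <- nobs(fm1)                # number of observations
AIC(fm1) + 2 * k * (k + 1) / (n - k - 1)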
This is a question about lme() syntax. My response variable is 'response'. My fixed variable is 'year'. I have 2 random variables: 'student' which is nested within 'school'.
I want to include a year*school interaction, but I do NOT want to include a year*student one.
This is the syntax I have so far, but it seems to include two random year terms (one per level) where I only want one.
lme1 = lme(response ~ 1 + year, random = ~ year | school/student,
method = "REML", data = data)
This would be much easier to do in lmer from the lme4 package. Assuming each student has a unique identifier, it would look like this (untested).
lmer(response ~ year + (year | school) + (1 | student), data = data)
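If you have to stay with lme(), it also accepts a named list of one-sided formulas, one per nesting level from outermost to innermost, which lets the two levels have different random structures. A sketch with the names from the question:
library(nlme)
lme1 <- lme(response ~ 1 + year,
            random = list(school = ~ year, student = ~ 1),
            method = "REML", data = data)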