Root mean square error in R - mixed effect model

Could you please tell me how to get/compute the RMSE (root mean square error) in R when you fit a mixed effects model? Here is my model output:
Data: na.omit(binh)
AIC BIC logLik
888.6144 915.1201 -436.3072
Random effects:
Formula: ~1 | Study
(Intercept) Residual
StdDev: 3.304345 1.361858
Fixed effects: Eeff ~ ADF + CP + DE + ADF2 + DE2
Value Std.Error DF t-value p-value
(Intercept) -0.66390 18.870908 158 -0.035181 0.9720
ADF 1.16693 0.424561 158 2.748556 0.0067
CP 0.25723 0.097524 158 2.637575 0.0092
DE -36.09593 12.031791 158 -3.000046 0.0031
ADF2 -0.03708 0.011014 158 -3.366625 0.0010
DE2 4.77918 1.932924 158 2.472513 0.0145
Correlation:
(Intr) ADF CP DE ADF2
ADF -0.107
CP -0.032 0.070
DE 0.978 -0.291 -0.043
ADF2 0.058 -0.982 -0.045 0.250
DE2 -0.978 0.308 0.039 -0.997 -0.265
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.28168116 -0.45260885 0.06528363 0.57071734 2.54144168
Number of Observations: 209
Number of Groups: 46

You don't give details of the function you used to fit your model, but such functions tend to store their residuals under the same name, which you can check with str(). RMSE is then easily calculated from the residuals:
# fit an example model (IGF is a groupedData object shipped with nlme,
# so lme() picks up the grouping automatically)
library(nlme)
r <- lme(conc ~ age, data = IGF)
# RMSE: square root of the mean squared residual
r.rmse <- sqrt(mean(r$residuals^2))
And in comments below, Ben Bolker points out that objects made by model-fitting functions should have a residuals() method, making it possible to do this (although some types of models may return residuals that have been transformed):
r.rmse <- sqrt(mean(residuals(r)^2))

The same result can be obtained from:
library(lme4)    # lmer() comes from lme4, not nlme
library(sjstats)
fit <- lmer(Yield ~ Species + (1 | Population/variety), data = df1, REML = TRUE)
rmse(fit)
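As a sanity check, either helper can be compared against the manual calculation on a built-in dataset. The sketch below is an assumption on my part rather than part of the original answers: it uses rmse() from the performance package, where much of sjstats' model-statistics code has since moved.
library(lme4)
library(performance)          # assumption: rmse() is available here
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
rmse(fit)                     # package helper
sqrt(mean(residuals(fit)^2))  # manual calculation; the two should agree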

Related

Interpreting output from emmeans::contrast

I have data from a longitudinal study and fitted a regression using the lme4::lmer function. After that I calculated the contrasts for these data, but I am having difficulty interpreting my results, as they were unexpected. I think I might have made a mistake in the code. Unfortunately I couldn't replicate my results with an example, but I will post both the failed example and my actual results below.
My results:
library(lme4)
library(lmerTest)
library(emmeans)
#regression
regmemory <- lmer(memory ~ as.factor(QuartileConsumption)*Age+
(1 + Age | ID) + sex + education +
HealthScore, CognitionData)
#results
summary(regmemory)
#Fixed effects:
# Estimate Std. Error df t value Pr(>|t|)
#(Intercept) -7.981e-01 9.803e-02 1.785e+04 -8.142 4.15e-16 ***
#as.factor(QuartileConsumption)2 -8.723e-02 1.045e-01 2.217e+04 -0.835 0.40376
#as.factor(QuartileConsumption)3 5.069e-03 1.036e-01 2.226e+04 0.049 0.96097
#as.factor(QuartileConsumption)4 -2.431e-02 1.030e-01 2.213e+04 -0.236 0.81337
#Age -1.709e-02 1.343e-03 1.989e+04 -12.721 < 2e-16 ***
#sex 3.247e-01 1.520e-02 1.023e+04 21.355 < 2e-16 ***
#education 2.979e-01 1.093e-02 1.061e+04 27.266 < 2e-16 ***
#HealthScore -1.098e-06 5.687e-07 1.021e+04 -1.931 0.05352 .
#as.factor(QuartileConsumption)2:Age 1.101e-03 1.842e-03 1.951e+04 0.598 0.55006
#as.factor(QuartileConsumption)3:Age 4.113e-05 1.845e-03 1.935e+04 0.022 0.98221
#as.factor(QuartileConsumption)4:Age 1.519e-03 1.851e-03 1.989e+04 0.821 0.41174
#contrasts
emmeans(regmemory, poly ~ QuartileConsumption * Age)$contrast
#$contrasts
# contrast estimate SE df z.ratio p.value
# linear 0.2165 0.0660 Inf 3.280 0.0010
# quadratic 0.0791 0.0289 Inf 2.733 0.0063
# cubic -0.0364 0.0642 Inf -0.567 0.5709
The interaction terms in the regression results are not significant, but the linear contrast is. Shouldn't the p-value for the contrast be non-significant?
Below is the code I wrote to try to recreate these results, but failed:
library(dplyr)
library(lme4)
library(lmerTest)
library(emmeans)
data("sleepstudy")
#create quartile column
sleepstudy$Quartile <- sample(1:4, size = nrow(sleepstudy), replace = T)
#regression
model1 <- lmer(Reaction ~ Days * as.factor(Quartile) + (1 + Days | Subject), data = sleepstudy)
#results
summary(model1)
#Fixed effects:
# Estimate Std. Error df t value Pr(>|t|)
#(Intercept) 258.1519 9.6513 54.5194 26.748 < 2e-16 ***
#Days 9.8606 2.0019 43.8516 4.926 1.24e-05 ***
#as.factor(Quartile)2 -11.5897 11.3420 154.1400 -1.022 0.308
#as.factor(Quartile)3 -5.0381 11.2064 155.3822 -0.450 0.654
#as.factor(Quartile)4 -10.7821 10.8798 154.0820 -0.991 0.323
#Days:as.factor(Quartile)2 0.5676 2.1010 152.1491 0.270 0.787
#Days:as.factor(Quartile)3 0.2833 2.0660 155.5669 0.137 0.891
#Days:as.factor(Quartile)4 1.8639 2.1293 153.1315 0.875 0.383
#contrast
emmeans(model1, poly ~ Quartile*Days)$contrast
#contrast estimate SE df t.ratio p.value
# linear -1.91 18.78 149 -0.102 0.9191
# quadratic 10.40 8.48 152 1.227 0.2215
# cubic -18.21 18.94 150 -0.961 0.3379
In this example, the p-value for the linear contrast is non-significant, just like the interactions from the regression. Did I do something wrong, or are these results to be expected?
Look at the emmeans() call for the original model:
emmeans(regmemory, poly ~ QuartileConsumption * Age)
This requests marginal means for combinations of QuartileConsumption and Age, and polynomial contrasts of those results. It appears that Age is a quantitative variable, so in computing the marginal means we just use the mean value of Age (see the documentation for ref_grid() and vignette("basics", "emmeans")). So the marginal-means display, which wasn't shown in the OP, will be in this general form:
QuartileConsumption Age emmean
------------------------------------
1 <mean> <est1>
2 <mean> <est2>
3 <mean> <est3>
4 <mean> <est4>
... and the contrasts shown will be the linear, quadratic, and cubic trends of those four estimates, in the order shown.
Note that these marginal means have nothing to do with the interaction effect; they are just predictions from the model for the four levels of QuartileConsumption at the mean Age (and mean education and mean health score), averaged over the two sexes, if I understand the data structure correctly. So essentially the polynomial contrasts estimate polynomial trends of the 4-level factor at the mean age. And note in particular that Age is held constant, so we are certainly not looking at any effects of Age.
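A hedged two-step rewrite of that call (my own illustration, not from the original answer) makes explicit that the contrasts act only on those four marginal means:
emm <- emmeans(regmemory, ~ QuartileConsumption)  # Age is held at its mean
contrast(emm, "poly")  # linear/quadratic/cubic trends across the 4 levels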
I am guessing that what you want to do to examine the interaction is to assess how the Age trend varies over the four levels of that factor. If that is the case, one useful thing to do would be something like the following (note that var must match the variable's name in the model, here "Age", not "age"):
slopes <- emtrends(regmemory, ~ QuartileConsumption, var = "Age")
slopes        # display the estimated slope at each level
pairs(slopes) # pairwise comparisons of these slopes
See vignette("interactions", "emmeans") and the section on interactions with covariates.
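For a self-contained illustration of that workflow, here is a hedged sketch built on the sleepstudy example from the question (since Quartile is assigned at random, the slopes should not differ reliably, which is the point of the comparison):
library(lme4)
library(emmeans)
data("sleepstudy")
set.seed(42)
sleepstudy$Quartile <- factor(sample(1:4, nrow(sleepstudy), replace = TRUE))
m <- lmer(Reaction ~ Days * Quartile + (1 + Days | Subject), data = sleepstudy)
slopes <- emtrends(m, ~ Quartile, var = "Days")  # slope of Days at each level
slopes
pairs(slopes)  # pairwise comparisons of these slopes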

Getting p-values for mixed model run using lmer function

I've run some mixed models using lmer and they don't give p-values. I would like to know if there is a way to get p-values for these models. Someone suggested the afex package. I've looked into it and am confused and overwhelmed. At https://rdrr.io/rforge/afex/man/mixed.html, for example, the code looks very complicated and involved; it's so overwhelming it makes me wonder whether this is really what I need to do! Below is an example of a mixed model I ran; I would like p-values for the fixed effects and the correlations of fixed effects. Any help would be appreciated!
Linear mixed model fit by REML ['lmerMod']
Formula: score ~ group + condition + (1 | subject) + (1 | token_set) + (1 | list)
Data: EN_JT_1
REML criterion at convergence: 744.9
Scaled residuals:
Min 1Q Median 3Q Max
-3.5860 -0.0364 0.2183 0.5424 1.6575
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 0.006401 0.08000
token_set (Intercept) 0.001667 0.04083
list (Intercept) 0.000000 0.00000
Residual 0.084352 0.29043
Number of obs: 1704, groups: subject, 71; token_set, 24; list, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.99796 0.02425 41.156
groupHS -0.08453 0.02741 -3.084
groupSB -0.03103 0.03034 -1.023
conditionEN-GJT-D-ENG -0.10329 0.01990 -5.190
conditionEN-GJT-D-NNS -0.01288 0.02617 -0.492
conditionEN-GJT-D-NTR -0.19250 0.02596 -7.415
Correlation of Fixed Effects:
(Intr) gropHS gropSB cEN-GJT-D-E cEN-GJT-D-NN
groupHS -0.452
groupSB -0.409 0.361
cEN-GJT-D-E -0.410 0.000 0.000
cEN-GJT-D-NN -0.531 0.000 0.000 0.380
cEN-GJT-D-NT -0.535 0.000 0.000 0.383 0.700
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
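A minimal sketch of one common route, offered as an assumption rather than this thread's own answer: loading lmerTest and refitting makes summary() report Satterthwaite degrees of freedom and p-values for the fixed effects.
library(lme4)
library(lmerTest)  # masks lme4::lmer with a version that adds p-values
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(m)  # the coefficient table now includes df and Pr(>|t|) columns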

Visualising crossed random effect for lme

I am new to mixed models and have some problems. I've got a model:
lmer(F2 ~ (phoneme|individual) + (1|word) + age + frequency + (1|zduration), data = nurse_female)
Linear mixed model fit by REML ['lmerMod']
Formula:
F2 ~ (phoneme | individual) + (1 | word) + age + frequency +
(1 | zduration)
Data: nurse_female
REML criterion at convergence: 654.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.09203 -0.20332 0.03263 0.25273 1.37056
Random effects:
Groups Name Variance Std.Dev. Corr
zduration (Intercept) 0.27779 0.5271
word (Intercept) 0.04488 0.2118
individual (Intercept) 0.34181 0.5846
phonemeIr 0.54227 0.7364 -0.82
phonemeVr 1.52090 1.2332 -0.93 0.91
Residual 0.06326 0.2515
Number of obs: 334, groups:
zduration, 280; word, 116; individual, 23
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.79167 0.32138 5.575
age -0.01596 0.00508 -3.142
frequencylow -0.37587 0.18560 -2.025
frequencymid -1.18901 0.27738 -4.286
frequencyvery high -0.68365 0.26564 -2.574
Correlation of Fixed Effects:
(Intr) age frqncyl frqncym
age -0.811
frequencylw -0.531 -0.013
frequencymd -0.333 -0.006 0.589
frqncyvryhg -0.356 0.000 0.627 0.389
The model predicts the normalised formant values of vowels such as the one in NURSE for female speakers. Without getting too much into it, there are roughly 3 possible variants, which I coded under phoneme as <Er, Ir, Vr>. individual identifies the speaker. I managed to plot the F2 variance of each speaker using the random effects.
But how do I plot the model predictions for the F2 values for each speaker with phoneme on the x-axis (i.e. 3 marks for <Er, Ir, Vr>) and F2 on the y-axis?
I tried a few ways but none of them worked.
Thanks in advance. If you need further information or data, just say so.
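One possible approach, sketched under the assumption that the fitted model is stored as m and that nurse_female holds the modelling data (the variable names come from the question; the plotting code itself is my suggestion, not from the thread):
library(ggplot2)
nurse_female$pred <- predict(m)  # predictions including the random effects (BLUPs)
ggplot(nurse_female, aes(x = phoneme, y = pred)) +
  geom_point() +
  facet_wrap(~ individual) +
  labs(y = "Predicted F2")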

predict() in lmer regression, but I need only 2 categories

I am attempting to estimate a multilevel model. My code is:
fullModel2 <- lmer(pharmexp_2001 ~ gdp_1000_gm + health_exp_per_cap_1000_gm + life_exp +
labour_cost_1000_gm + (year_gm|lowerID), data=adat, REML=F)
which results in the following model:
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: pharmexp_2001 ~ gdp_1000_gm + health_exp_per_cap_1000_gm + life_exp +
labour_cost_1000_gm + (year_gm | lowerID)
Data: adat
AIC BIC logLik deviance df.resid
1830.2 1859.9 -906.1 1812.2 191
Scaled residuals:
Min 1Q Median 3Q Max
-2.5360 -0.6853 -0.0842 0.4923 4.0051
Random effects:
Groups Name Variance Std.Dev. Corr
lowerID (Intercept) 134.6851 11.6054
year_gm 0.4214 0.6492 -1.00
Residual 487.5324 22.0801
Number of obs: 200, groups: lowerID, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) -563.7924 75.4125 -7.476
gdp_1000_gm -0.9050 0.2051 -4.413
health_exp_per_cap_1000_gm 37.5394 6.3943 5.871
life_exp 8.8571 0.9498 9.326
labour_cost_1000_gm -1.3573 0.4684 -2.898
Correlation of Fixed Effects:
(Intr) g_1000 h____1 lif_xp
gdp_1000_gm -0.068
hl____1000_ 0.374 -0.254
life_exp -0.996 0.072 -0.393
lbr_c_1000_ -0.133 -0.139 -0.802 0.142
I know that the correlation of -1 between the random intercept and slope is a problem, but I have a bigger one. I have to plot my results, but I need only 2 lines: one for lowerID=0 and one for lowerID=1. So I want to plot pharmexp_2001 on the y-axis against year on the x-axis, but with only 2 lines (by lowerID). I know that I have to use predict.merMod, but how can I plot only these two lines? Currently my plot has 21 lines (because I analyse pharmaceutical expenditure in 21 countries).
Welcome to the site, @Eszter Takács!
You only need to specify the two IDs in newdata. Here is an example based on the sleepstudy data in R. I assume you want to plot the predicted values on the y-axis. Just replace the code with your data and variables and you will obtain the predicted values for lowerID==0 and lowerID==1. Then you can use your plotting code to draw the two lines for the two IDs.
> (fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy, REML=F))
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
Data: sleepstudy
AIC BIC logLik deviance
1763.9393 1783.0971 -875.9697 1751.9393
Random effects:
Groups Name Std.Dev. Corr
Subject (Intercept) 23.781
Days 5.717 0.08
Residual 25.592
Number of obs: 180, groups: Subject, 18
Fixed Effects:
(Intercept) Days
251.41 10.47
> newdata = sleepstudy[sleepstudy$Subject==308 | sleepstudy$Subject==333,]
> str(p <- predict(fm1,newdata)) # new data, all RE
Named num [1:20] 254 274 293 313 332 ...
- attr(*, "names")= chr [1:20] "1" "2" "3" "4" ...
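To close the loop on the plotting step, a short sketch continuing the sleepstudy example above (the ggplot2 part is my assumption, not part of the original answer):
library(ggplot2)
newdata$pred <- predict(fm1, newdata)  # predictions for the two retained subjects
ggplot(newdata, aes(x = Days, y = pred, colour = Subject)) +
  geom_line()  # one line per ID, analogous to lowerID==0 and lowerID==1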

gls() vs. lme() in the nlme package

In the nlme package there are two functions for fitting linear models (lme and gls).
What are the differences between them in terms of the types of models that can be fit, and the fitting process?
What is the design rationale for having two functions to fit linear mixed models, where most other systems (e.g. SAS, SPSS) have only one?
Update: Added bounty. Interested to know the differences in the fitting process, and the rationale.
From Pinheiro & Bates (2000), Section 5.4, p. 250:
The gls function is used to fit the extended linear model, using either maximum likelihood or restricted maximum likelihood. It can be viewed as an lme function without the argument random.
For further details, it would be instructive to compare the lme analysis of the Orthodont dataset (starting on p. 147 of the same book) with the gls analysis (starting on p. 250). To begin, compare
orth.lme <- lme(distance ~ Sex * I(age-11), data=Orthodont)
summary(orth.lme)
Linear mixed-effects model fit by REML
Data: Orthodont
AIC BIC logLik
458.9891 498.655 -214.4945
Random effects:
Formula: ~Sex * I(age - 11) | Subject
Structure: General positive-definite
StdDev Corr
(Intercept) 1.7178454 (Intr) SexFml I(-11)
SexFemale 1.6956351 -0.307
I(age - 11) 0.2937695 -0.009 -0.146
SexFemale:I(age - 11) 0.3160597 0.168 0.290 -0.964
Residual 1.2551778
Fixed effects: distance ~ Sex * I(age - 11)
Value Std.Error DF t-value p-value
(Intercept) 24.968750 0.4572240 79 54.60945 0.0000
SexFemale -2.321023 0.7823126 25 -2.96687 0.0065
I(age - 11) 0.784375 0.1015733 79 7.72226 0.0000
SexFemale:I(age - 11) -0.304830 0.1346293 79 -2.26421 0.0263
Correlation:
(Intr) SexFml I(-11)
SexFemale -0.584
I(age - 11) -0.006 0.004
SexFemale:I(age - 11) 0.005 0.144 -0.754
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.96534486 -0.38609670 0.03647795 0.43142668 3.99155835
Number of Observations: 108
Number of Groups: 27
orth.gls <- gls(distance ~ Sex * I(age-11), data=Orthodont)
summary(orth.gls)
Generalized least squares fit by REML
Model: distance ~ Sex * I(age - 11)
Data: Orthodont
AIC BIC logLik
493.5591 506.7811 -241.7796
Coefficients:
Value Std.Error t-value p-value
(Intercept) 24.968750 0.2821186 88.50444 0.0000
SexFemale -2.321023 0.4419949 -5.25124 0.0000
I(age - 11) 0.784375 0.1261673 6.21694 0.0000
SexFemale:I(age - 11) -0.304830 0.1976661 -1.54214 0.1261
Correlation:
(Intr) SexFml I(-11)
SexFemale -0.638
I(age - 11) 0.000 0.000
SexFemale:I(age - 11) 0.000 0.000 -0.638
Standardized residuals:
Min Q1 Med Q3 Max
-2.48814895 -0.58569115 -0.07451734 0.58924709 2.32476465
Residual standard error: 2.256949
Degrees of freedom: 108 total; 104 residual
Notice that the estimates of the fixed effects are the same (to 6 decimal places), but the standard errors are different, as is the correlation matrix. The mean structure of the two models is identical; what differs is the covariance structure implied by the random effects in lme, and that is what changes the standard errors.
Interesting question.
In principle the only difference is that gls can't fit models with random effects, whereas lme can. So the commands
fm1 <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
           correlation = corAR1(form = ~ 1 | Mare))
and
lm1 <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
           correlation = corAR1(form = ~ 1 | Mare))
ought to give the same result, but they don't: the fitted parameters differ slightly.
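One way to make the comparison concrete (a sketch of my own, assuming the two fits above run as described; anova() in nlme accepts gls and lme objects together):
library(nlme)
fm1 <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
           correlation = corAR1(form = ~ 1 | Mare))
lm1 <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
           correlation = corAR1(form = ~ 1 | Mare))
anova(fm1, lm1)  # AIC, BIC and logLik of the two fits side by side
coef(fm1)        # gls coefficient estimates
fixef(lm1)       # lme fixed-effect estimates, for comparison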
