I have fit a mixed-effects model with nested random effects to my data. How do I get the R-squared and F-ratio for this model?
library(lmerTest)  # lmer() with Satterthwaite t-tests, as shown in the summary output
set.seed(111)
df <- data.frame(level = rep(c("A","B"), times = 8),
time = rep(c("1","2","3","4"), each = 4),
x1 = rnorm(16,3,1),
x2 = rnorm(16,3,1))
mod <- lmer(x1 ~ x2 + I(x2^2) + (1|time/level), df)
summary(mod)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: x1 ~ x2 + I(x2^2) + (1 | time/level)
   Data: df
REML criterion at convergence: 47.9
Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.72702 -0.41979  0.00653  0.43709  2.36393
Random effects:
 Groups     Name        Variance Std.Dev.
 level:time (Intercept) 0.00     0.00
 time       (Intercept) 0.00     0.00
 Residual               1.02     1.01
Number of obs: 16, groups:  level:time, 8; time, 4
Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)
(Intercept)  3.58299    0.81911 13.00000   4.374 0.000753 ***
x2          -0.59777    0.54562 13.00000  -1.096 0.293147
I(x2^2)      0.07686    0.09356 13.00000   0.822 0.426136
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
        (Intr) x2
x2      -0.868
I(x2^2)  0.660 -0.928
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
For the R-squared you can use the r.squaredLR function from the MuMIn package:
library(MuMIn)
r.squaredLR(mod)
Output:
[1] 0.1017782
attr(,"adj.r.squared")
[1] 0.1086741
For the F-ratio, maybe you want this:
anova(mod)
Output:
Analysis of Variance Table
npar Sum Sq Mean Sq F value
x2 1 0.81407 0.81407 0.7981
I(x2^2) 1 0.68850 0.68850 0.6750
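Both random-effect variances in the summary above are estimated as zero (hence the singular-fit message), so the fixed-effects part should coincide with an ordinary linear model. A minimal sketch, assuming you are willing to drop the random effects on that basis, reads the R-squared and the overall F-ratio from lm(). The names fit_lm and s are only illustrative:

```r
# Recreate the data from the question
set.seed(111)
df <- data.frame(level = rep(c("A", "B"), times = 8),
                 time  = rep(c("1", "2", "3", "4"), each = 4),
                 x1    = rnorm(16, 3, 1),
                 x2    = rnorm(16, 3, 1))

# Same fixed effects, no random effects
# (assumption: the random-effect variances really are ~0)
fit_lm <- lm(x1 ~ x2 + I(x2^2), data = df)
s <- summary(fit_lm)

s$r.squared       # multiple R-squared
s$adj.r.squared   # adjusted R-squared
s$fstatistic      # overall F value with numerator and denominator df

# p-value of the overall F-test
pf(s$fstatistic["value"], s$fstatistic["numdf"], s$fstatistic["dendf"],
   lower.tail = FALSE)
```

This shortcut only makes sense for a singular fit like this one; with non-zero variance components the lm() numbers would no longer describe the mixed model.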
I am trying to model the relation between the scar acquisition rate of a wild animal population and year. I have already calculated the yearly rates.
Looking at the plot, it seems to me that rates rise through the middle of the period and then fall again. I have tried to fit a polynomial LM with the code
model1 <- lm(Rate~poly(year, 2, raw = TRUE),data=yearlyratesub)
summary(model1)
model1
I have plotted using:
library(ggplot2)
g <- ggplot(yearlyratesub, aes(year, Rate)) + geom_point(shape = 1) +
  geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE))
g
The model output was:
Call:
lm(formula = Rate ~ poly(year, 2, raw = TRUE), data = yearlyratesub)
Residuals:
Min 1Q Median 3Q Max
-0.126332 -0.037683 -0.002602 0.053222 0.083503
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.796e+03 3.566e+03 -2.467 0.0297 *
poly(year, 2, raw = TRUE)1 8.747e+00 3.545e+00 2.467 0.0297 *
poly(year, 2, raw = TRUE)2 -2.174e-03 8.813e-04 -2.467 0.0297 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0666 on 12 degrees of freedom
Multiple R-squared: 0.3369, Adjusted R-squared: 0.2264
F-statistic: 3.048 on 2 and 12 DF, p-value: 0.08503
How can I interpret that? The overall model p-value is not significant, but the intercept and the individual slopes are.
Should I try a fit other than x², or group the values and test between groups, e.g. with an ANOVA? I know the LM fits poorly, but I guess that is because I have few data points, and maybe x² is not the right shape.
I would be happy about any input on the model and on interpreting the output.
Grouping
Since the data was not provided (next time please provide a complete reproducible question including all inputs), we used the data in the Note at the end. The model is highly significant if we group the points using the indicated breakpoints.
g <- factor(findInterval(yearlyratesub$year, c(2007.5, 2014.5))+1); g
## [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
## Levels: 1 2 3
fm <- lm(rate ~ g, yearlyratesub)
summary(fm)
giving
Call:
lm(formula = rate ~ g, data = yearlyratesub)
Residuals:
Min 1Q Median 3Q Max
-0.064618 -0.018491 0.006091 0.029684 0.046831
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.110854 0.019694 5.629 0.000111 ***
g2 0.127783 0.024687 5.176 0.000231 ***
g3 -0.006714 0.027851 -0.241 0.813574
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03939 on 12 degrees of freedom
Multiple R-squared: 0.7755, Adjusted R-squared: 0.738
F-statistic: 20.72 on 2 and 12 DF, p-value: 0.0001281
We could consider combining the outer two groups.
g2 <- factor(g == 2)
fm2 <- lm(rate ~ g2, yearlyratesub)
summary(fm2)
giving:
Call:
lm(formula = rate ~ g2, data = yearlyratesub)
Residuals:
Min 1Q Median 3Q Max
-0.064618 -0.016813 0.007096 0.031363 0.046831
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.10750 0.01341 8.015 2.19e-06 ***
g2TRUE 0.13114 0.01963 6.680 1.52e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03793 on 13 degrees of freedom
Multiple R-squared: 0.7744, Adjusted R-squared: 0.757
F-statistic: 44.62 on 1 and 13 DF, p-value: 1.517e-05
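Whether combining the outer two groups costs anything can be checked with an F-test between the two nested grouping models. A minimal sketch, using the data from the Note and storing the grouping factors as columns (the column names g and g2 inside the data frame are my choice, not part of the original code):

```r
# Data from the Note at the end
yearlyratesub <- data.frame(
  year = c(2004:2015, 2017:2019),
  rate = c(0.14099813521287, 0.0949946651016247, 0.0904788394070601,
           0.11694517831575, 0.26786193592875, 0.256346628540479,
           0.222029818828298, 0.180116679856725, 0.285467976459104,
           0.174019208113095, 0.28461698734932, 0.0574827955982996,
           0.103378448084776, 0.114593695172686, 0.141105952837639))

# Three groups split at 2007.5 and 2014.5; g2 merges the outer two
yearlyratesub$g  <- factor(findInterval(yearlyratesub$year, c(2007.5, 2014.5)) + 1)
yearlyratesub$g2 <- factor(yearlyratesub$g == 2)

fm  <- lm(rate ~ g,  yearlyratesub)   # three group means
fm2 <- lm(rate ~ g2, yearlyratesub)   # middle group vs. the rest

# F-test of the simpler model (fm2) against the fuller one (fm)
cmp <- anova(fm2, fm)
cmp
```

A large p-value in the second row means the data give no reason to keep the outer groups separate, which is the same question the g3 coefficient answers in the three-group summary.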
Sinusoid
Looking at the graph, the points seem to turn up at the left and right edges, which suggests a sinusoidal fit of the form a + b * cos(c * year). With algorithm = "plinear" the parameters a and b enter linearly, so only c needs a starting value.
fm3 <- nls(rate ~ cbind(a = 1, b = cos(c * year)),
yearlyratesub, start = list(c = 0.5), algorithm = "plinear")
summary(fm3)
giving
Formula: rate ~ cbind(a = 1, b = cos(c * year))
Parameters:
Estimate Std. Error t value Pr(>|t|)
c 0.4999618 0.0001449 3449.654 < 2e-16 ***
.lin.a 0.1787200 0.0150659 11.863 5.5e-08 ***
.lin.b 0.0753754 0.0205818 3.662 0.00325 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.05688 on 12 degrees of freedom
Number of iterations to convergence: 2
Achieved convergence tolerance: 5.241e-08
Comparison
Plotting the fits and looking at their residual sum of squares and AIC we have
plot(yearlyratesub)
# fm0 from Note at end, fm and fm2 are grouping models, fm3 is sinusoidal
L <- list(fm0 = fm0, fm = fm, fm2 = fm2, fm3 = fm3)
for(i in seq_along(L)) {
lines(fitted(L[[i]]) ~ year, yearlyratesub, col = i, lwd = 2)
}
legend("topright", names(L), col = seq_along(L), lwd = 2)
giving the following, where lower residual sum of squares (RSS) and lower AIC (which takes the number of parameters into account) are better. Based on RSS, fm fits most closely, with fm2 fitting almost as well. Once the number of parameters is taken into account via AIC, fm2 has the lowest value and so is the most favored by that criterion.
cbind(RSS = sapply(L, deviance), AIC = sapply(L, AIC))
## RSS AIC
## fm0 0.05488031 -33.59161
## fm 0.01861659 -49.80813
## fm2 0.01870674 -51.73567
## fm3 0.04024237 -38.24512
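To see how AIC trades fit against parameter count, the value for a linear model can be rebuilt from its RSS. A minimal sketch for fm2, using the Gaussian log-likelihood formula; the variable names n, rss, k and aic_manual are illustrative:

```r
# Data from the Note at the end
yearlyratesub <- data.frame(
  year = c(2004:2015, 2017:2019),
  rate = c(0.14099813521287, 0.0949946651016247, 0.0904788394070601,
           0.11694517831575, 0.26786193592875, 0.256346628540479,
           0.222029818828298, 0.180116679856725, 0.285467976459104,
           0.174019208113095, 0.28461698734932, 0.0574827955982996,
           0.103378448084776, 0.114593695172686, 0.141105952837639))
g   <- factor(findInterval(yearlyratesub$year, c(2007.5, 2014.5)) + 1)
g2  <- factor(g == 2)
fm2 <- lm(rate ~ g2, yearlyratesub)

n   <- nobs(fm2)               # number of observations
rss <- deviance(fm2)           # residual sum of squares
k   <- length(coef(fm2)) + 1   # regression coefficients plus the error variance

# -2 * logLik + 2 * k for a Gaussian linear model
aic_manual <- n * (log(2 * pi) + log(rss / n) + 1) + 2 * k

all.equal(aic_manual, AIC(fm2))
```

The 2 * k term is the penalty: fm has one more coefficient than fm2, so its slightly smaller RSS has to outweigh an extra 2 units of AIC, which it does not here.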
Note
yearlyratesub <-
structure(list(year = c(2004, 2005, 2006, 2007, 2008, 2009, 2010,
2011, 2012, 2013, 2014, 2015, 2017, 2018, 2019), rate = c(0.14099813521287,
0.0949946651016247, 0.0904788394070601, 0.11694517831575, 0.26786193592875,
0.256346628540479, 0.222029818828298, 0.180116679856725, 0.285467976459104,
0.174019208113095, 0.28461698734932, 0.0574827955982996, 0.103378448084776,
0.114593695172686, 0.141105952837639)), row.names = c(NA, -15L
), class = "data.frame")
fm0 <- lm(rate ~ poly(year, 2, raw = TRUE), yearlyratesub)
summary(fm0)
giving
Call:
lm(formula = rate ~ poly(year, 2, raw = TRUE), data = yearlyratesub)
Residuals:
Min 1Q Median 3Q Max
-0.128335 -0.038289 -0.002715 0.054090 0.084792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.930e+03 3.621e+03 -2.466 0.0297 *
poly(year, 2, raw = TRUE)1 8.880e+00 3.600e+00 2.467 0.0297 *
poly(year, 2, raw = TRUE)2 -2.207e-03 8.949e-04 -2.467 0.0297 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06763 on 12 degrees of freedom
Multiple R-squared: 0.3381, Adjusted R-squared: 0.2278
F-statistic: 3.065 on 2 and 12 DF, p-value: 0.0841
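The identical t value for all three coefficients of the raw polynomial is most likely a symptom of year and year^2 being almost perfectly collinear, while the overall F-test asks a different question (do the linear and quadratic terms together beat an intercept-only model?). A minimal sketch that refits the same curve with orthogonal polynomials, where the two terms can be read separately; fm0_orth is a name I made up:

```r
# Data from the Note
yearlyratesub <- data.frame(
  year = c(2004:2015, 2017:2019),
  rate = c(0.14099813521287, 0.0949946651016247, 0.0904788394070601,
           0.11694517831575, 0.26786193592875, 0.256346628540479,
           0.222029818828298, 0.180116679856725, 0.285467976459104,
           0.174019208113095, 0.28461698734932, 0.0574827955982996,
           0.103378448084776, 0.114593695172686, 0.141105952837639))

fm0      <- lm(rate ~ poly(year, 2, raw = TRUE), yearlyratesub)  # raw terms
fm0_orth <- lm(rate ~ poly(year, 2), yearlyratesub)              # orthogonal terms

# Same fitted curve, up to rounding, so R-squared and the overall F are unchanged
same_fit <- all.equal(fitted(fm0), fitted(fm0_orth), tolerance = 1e-6)
same_fit

# Near-perfect correlation between the raw predictors
cor(yearlyratesub$year, yearlyratesub$year^2)

# Linear and quadratic contributions, now uncorrelated with each other
coef(summary(fm0_orth))
```

In the orthogonal parameterization the first coefficient measures the overall linear trend and the second the curvature, so a significant quadratic term next to a weak linear one is consistent with a rise and fall rather than a steady trend.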
I am trying to fit a linear model with 4 predictors. The problem is that one parameter is not estimated: the variable I put last in my lm formula always comes back as NA. My code is:
AllData <- read.csv("AllBandReflectance.csv", header = TRUE)
Swir2ref <- AllData$band7
x1 <- AllData$X1
x2 <- AllData$X2
y1 <- AllData$Y1
y2 <- AllData$Y2
linear.model <- lm( Swir2ref ~ x1 + y1 +x2 +y2 , data = AllData )
summary(linear.model)
Call:
lm(formula = Swir2ref ~ x1 + y1 + x2 + y2, data = AllData)
Residuals:
Min 1Q Median 3Q Max
-0.027277 -0.008793 -0.000689 0.010085 0.035097
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.595593 0.002006 296.964 <2e-16 ***
x1 0.002175 0.003462 0.628 0.532
y1 0.001498 0.003638 0.412 0.682
x2 0.022671 0.018786 1.207 0.232
y2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01437 on 67 degrees of freedom
Multiple R-squared: 0.02876, Adjusted R-squared: -0.01473
F-statistic: 0.6613 on 3 and 67 DF, p-value: 0.5787
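The "(1 not defined because of singularities)" line usually means that the last predictor is an exact linear combination of the earlier ones. AllBandReflectance.csv is not available here, so the sketch below uses made-up data in which y2 is deliberately built from the other three predictors; the relation y2 = x1 + y1 - x2 is purely hypothetical. It shows how alias() and the rank of the design matrix reveal the dependence:

```r
set.seed(1)
n  <- 71
x1 <- rnorm(n)
y1 <- rnorm(n)
x2 <- rnorm(n)
y2 <- x1 + y1 - x2                  # hypothetical exact linear dependence
resp <- 0.6 + rnorm(n, sd = 0.01)   # made-up response

fit <- lm(resp ~ x1 + y1 + x2 + y2)
coef(fit)        # y2 is NA, as in the question

# Which combination of earlier terms reproduces the dropped one
alias(fit)

# Rank of the design matrix is smaller than its number of columns
X <- model.matrix(~ x1 + y1 + x2 + y2)
c(rank = qr(X)$rank, columns = ncol(X))
```

Running alias(linear.model) on the real fit should show which combination of x1, y1 and x2 reproduces y2; one of the four then has to be dropped, since their effects cannot be separated.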
I have trouble understanding the difference between these two notations.
According to the R intro, y~x1/x2 means that x2 is nested within x1. If x1 is a factor and x2 a continuous variable, is lm(y~x1/x2) a correct representation of a nested ANCOVA?
What confuses me is that some online help topics suggest using aov(y~x1+Error(x2)) to represent a nested ANOVA, yet the two calls give completely different results.
For example:
x2 = rnorm(1000,2)
x1 = rep( c("A","B"), each=500)
y = x2*3+rnorm(1000)
Under this scenario I would expect x2 to be significant and x1 to be non significant.
summary(aov(y~x1+Error(x2)))
Error: x2
Df Sum Sq Mean Sq
x1 1 9262 9262
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 0.0 0.0003 0 0.985
Residuals 997 967.9 0.9708
aov() works as expected. However, lm()....
summary(lm( y~x1/x2))
Call:
lm(formula = y ~ x1/x2)
Residuals:
Min 1Q Median 3Q Max
-3.4468 -0.6352 0.0092 0.6526 2.8294
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.08727 0.09566 0.912 0.3618
x1B -0.24501 0.13715 -1.786 0.0743 .
x1A:x2 2.94012 0.04362 67.401 <2e-16 ***
x1B:x2 3.06272 0.04326 70.806 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9838 on 996 degrees of freedom
Multiple R-squared: 0.9058, Adjusted R-squared: 0.9055
F-statistic: 3191 on 3 and 996 DF, p-value: < 2.2e-16
x1 is marginally significant, and in many iterations it is highly significant. How can these results be so different?
What am I missing? Aren't those two formulas supposed to represent the same thing, or am I misunderstanding something about the underlying statistics?
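One way to see what each formula asks for is to let R expand it. A minimal sketch, under my reading that y ~ x1/x2 expands to x1 + x1:x2 (an intercept per level of x1 plus a separate x2 slope within each level), whereas Error(x2) declares x2 as an error stratum rather than a covariate. The seed and the names fit_common and fit_nested are illustrative:

```r
# How R expands the nesting operator (no data needed)
nested_terms <- attr(terms(y ~ x1/x2), "term.labels")
nested_terms

# Simulated data as in the question, with a seed for reproducibility
set.seed(42)
x2 <- rnorm(1000, 2)
x1 <- factor(rep(c("A", "B"), each = 500))
y  <- x2 * 3 + rnorm(1000)

fit_common <- lm(y ~ x2)      # one intercept, one slope
fit_nested <- lm(y ~ x1/x2)   # intercept and slope per level of x1

# Joint F-test: does letting intercept and slope vary by x1 improve the fit?
cmp <- anova(fit_common, fit_nested)
cmp
```

The x1B row in summary(lm(y ~ x1/x2)) is the difference in intercepts at x2 = 0 only, so it can look significant in some simulated samples even though x1 has no overall effect; the joint comparison above is closer to the question "does x1 matter?".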
Call:
glm(formula = Y1 ~ 0 + x1 + x2 + x3 + x4 + x5, family = quasibinomial(link = cauchit))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5415 0.2132 0.3988 0.6614 1.8426
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 -0.7280 0.3509 -2.075 0.03884 *
x2 -0.9108 0.3491 -2.609 0.00951 **
x3 0.2377 0.1592 1.494 0.13629
x4 -0.2106 0.1573 -1.339 0.18151
x5 3.6982 0.8658 4.271 2.57e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 0.8782731)
Null deviance: 443.61 on 320 degrees of freedom
Residual deviance: 270.17 on 315 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 12
Here is the output from glm in R.
Is there a way to extract the dispersion parameter (0.8782731 in this case) programmatically, instead of copying and pasting it? Thanks.
You can extract it from the output of summary:
data(iris)
mod <- glm((Petal.Length > 5) ~ Sepal.Width, data=iris)
summary(mod)
#
# Call:
# glm(formula = (Petal.Length > 5) ~ Sepal.Width, data = iris)
#
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -0.3176 -0.2856 -0.2714 0.7073 0.7464
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.38887 0.26220 1.483 0.140
# Sepal.Width -0.03561 0.08491 -0.419 0.676
#
# (Dispersion parameter for gaussian family taken to be 0.2040818)
#
# Null deviance: 30.240 on 149 degrees of freedom
# Residual deviance: 30.204 on 148 degrees of freedom
# AIC: 191.28
#
# Number of Fisher Scoring iterations: 2
summary(mod)$dispersion
# [1] 0.2040818
The str function in R is often helpful to solve these sorts of questions. For instance, I looked at str(summary(mod)) to answer the question.
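To see where that number comes from, the same value can be rebuilt by hand. A minimal sketch on a quasibinomial fit to iris (my own example model, not the one from the question): for families with an estimated dispersion, summary() reports the sum of squared Pearson residuals divided by the residual degrees of freedom.

```r
data(iris)

# Example quasi-family model; the 0/1 response is made up for illustration
mod_q <- glm(as.integer(Petal.Length > 5) ~ Sepal.Width,
             family = quasibinomial, data = iris)

# Components available in the summary object
names(summary(mod_q))

# Dispersion as reported by summary()
disp_summary <- summary(mod_q)$dispersion

# Same quantity by hand: Pearson chi-square over residual df
disp_manual <- sum(residuals(mod_q, type = "pearson")^2) / df.residual(mod_q)

all.equal(disp_summary, disp_manual)
```

For plain binomial and poisson families summary() fixes the dispersion at 1, so the hand calculation only matches for quasi-families and others where the dispersion is estimated.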