I used lmer() to fit mixed models. When I use the anova function to retrieve the ANOVA results, everything works. However, when I try to calculate eta squared, I consistently get the error below. Any ideas?
Dyestuff is a dataset from the lme4 package (available once lmerTest is loaded). I use package ‘lme4’ version 1.1-21, package ‘lmerTest’ version 3.1-0 and package ‘sjstats’ version 0.17.7.
library(lmerTest)  # loads lme4 as well
library(sjstats)
fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
am <- anova(fm1, test="F")
eta_sq(am, partial = FALSE, ci.lvl = NULL, n = 1000, method = c("dist", "quantile"))
Error: Result 2 is not a length 1 atomic vector
In addition: Warning message:
In tidy.anova(model) :
The following column names in ANOVA output were not recognized or transformed: NumDF, DenDF
tl;dr
it may be theoretically difficult to compute eta-squared for mixed models, see e.g. this CV question (it does suggest some ways of computing R^2 values for mixed models, which might satisfy your need for an effect size)
practically speaking, the proximal problem seems to be that internally the eta-squared computation in sjstats expects that the anova() method will return a table containing a row corresponding to the residual variance. ?anova.lmerModLmerTest returns a table with only rows corresponding to the fixed effect terms (not the residual variance).
in any case you might expect to have trouble computing an eta-squared for a model with no non-trivial fixed effects (i.e. a fixed-effect intercept only) ...
This might be more appropriate for the sjstats issues list but I'll use this space to share what I've figured out so far.
Fitting an intercept-only model gives a similar error even if it's just an lm() fit (which ought to work if anything does):
fm0 <- lm(Yield ~ 1 , Dyestuff)
am0 <- anova(fm0, test="F")
eta_sq(am0)
Error: Result 2 must be a single double, not a double vector of length 0
Run rlang::last_error() to see where the error occurred.
However: fitting a non-trivial (more fixed effects than just the intercept) lmer(Test) model also fails:
fm2 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
am2 <- anova(fm2, test="F")
eta_sq(am2)
Error: Result 2 must be a single double, not a double vector of length 0
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
In tidy.anova(model) :
The following column names in ANOVA output were not recognized or transformed: NumDF, DenDF
(From what I can tell the warning message is actually harmless.)
The proximal cause of this problem seems to be that the internal sjstats:::aov_stat_summary() function returns a table with only a single row, for the SSQ/MSQ/etc. due to Days; it should also have a row for the residual SSQ/MSQ/etc.
sjstats:::aov_stat_summary(am2)
## term sumsq meansq NumDF DenDF statistic p.value
## 1 Days 30030.94 30030.94 1 16.99998 45.85296 3.263825e-06
The problem is that the number of terms is internally computed as (nrow(aov.sum)-1), which doesn't make sense here.
Compare this with what we get with a 1+Days model using lm():
fm3 <- lm(Reaction ~ Days , sleepstudy)
am3 <- anova(fm3, test="F")
sjstats:::aov_stat_summary(am3)
## term df sumsq meansq statistic p.value
## 1 Days 1 162702.7 162702.652 71.46442 9.894096e-15
## 2 Residuals 178 405251.6 2276.694 NA NA
Digging a little deeper, we can see that this is a direct consequence of the way the anova() results are reported for mixed models:
anova(fm2)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## Days 30031 30031 1 17 45.853 3.264e-06 ***
Note there is no "residuals" row. In contrast:
anova(fm3)
## Analysis of Variance Table
## Response: Reaction
## Df Sum Sq Mean Sq F value Pr(>F)
## Days 1 162703 162703 71.464 9.894e-15 ***
## Residuals 178 405252 2277
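To make concrete why the missing "Residuals" row matters, here is a minimal sketch of the classical eta-squared computation (SS_effect / SS_total) done by hand on an lm() fit. It is not the sjstats implementation; the warpbreaks data set (shipped with base R) and the name eta_sq_manual are stand-ins chosen for illustration.

```r
# Classical eta-squared from a sequential (Type I) ANOVA table.
fit <- lm(breaks ~ wool + tension, data = warpbreaks)
tab <- anova(fit)

# Named vector of sums of squares, one entry per row of the table
ss <- setNames(tab[["Sum Sq"]], rownames(tab))
is_resid <- names(ss) == "Residuals"

# The total SS is only available because the table has a Residuals row
eta_sq_manual <- ss[!is_resid] / sum(ss)
print(eta_sq_manual)
```

With sequential sums of squares these values add up to the model R^2. The table returned by anova() for an lmerTest fit has no "Residuals" row, so sum(ss) would not be a total sum of squares there.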
I think that if you use the function anova_stats from sjstats package it works.
fm2 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
am2 <- anova_stats(fm2, test="F")
Related
I need to extract the standard error of a variance component from the output of lmer.
library(lme4)
model <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)
The following produces the estimate of the variance component:
s2 <- VarCorr(model)$Subject[1]
This is NOT the standard error of the variance, and I want the standard error. How can I get it?
EDIT :
Perhaps I did not make clear what I meant by "standard error of the variance component", so I am editing my post.
In Chapter 12, Experiments with Random Factors, of the book Design and Analysis of Experiments by Douglas C. Montgomery, Example 12-2 at the end of the chapter is done in SAS. In Example 12-2, the model is a two-factor factorial random effects model. The output is given in Table 12-17.
I am trying to fit the model in R with lmer.
library(lme4)
fit <- lmer(y~(1|operator)+(1|part),data=dat)
R code for extracting the Estimate values, annotated by 4 in Table 12-17:
est_ope=VarCorr(fit)$operator[1]
est_part = VarCorr(fit)$part[1]
sig = summary(fit)$sigma
est_res = sig^2
Now I want to extract the Std Errors, annotated by 5 in Table 12-17, from the lmer output.
Many Thanks !
I think you are looking for the Wald standard error of the variance estimates. Please note that (as often pointed out by Doug Bates) Wald standard errors are often very poor estimates of the uncertainty of variances, because the likelihood profiles are often far from quadratic on the variance scale ... I'm assuming you know what you're doing and have some good use for these numbers ...
This can be (now) done using the merDeriv package.
library(lme4)
library(merDeriv)
m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
sqrt(diag(vcov(m1, full = TRUE)))
vv <- vcov(m1, full = TRUE)
colnames(vv)
## [1] "(Intercept)" "Days"
## [3] "cov_Subject.(Intercept)" "cov_Subject.Days.(Intercept)"
## [5] "cov_Subject.Days" "residual"
If we want the standard errors of the random-effect variance and covariance components, we take the square root of the diagonal and keep elements 3 to 5 (element 6 is the residual variance):
sqrt(diag(vv)[3:5])
## [1] 288.78602 46.67876 14.78208
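Since Wald standard errors can be poor for variance parameters, a profile-likelihood confidence interval is a common alternative. The following is a minimal sketch and not part of the merDeriv approach: it uses confint() from lme4 on a random-intercept model fitted by ML, where squaring the interval limits for the standard deviations gives limits for the variances (this shortcut would not apply to correlation parameters).

```r
library(lme4)

# Random-intercept model, fitted by ML
model <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)

# Profile-likelihood intervals for the variance parameters only ("theta_"),
# reported on the standard-deviation scale
ci_sd <- confint(model, parm = "theta_", method = "profile",
                 oldNames = FALSE, quiet = TRUE)

# Squaring is monotone for non-negative values, so the squared limits
# are interval limits on the variance scale
ci_var <- ci_sd^2
print(ci_var)
```

The intervals are typically asymmetric around the point estimate, which is exactly the information a single standard error cannot convey.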
old answer
library("lme4")
model <- lmer(Reaction ~ Days + (1|Subject), sleepstudy, REML=FALSE)
(At present it's quite a bit harder to do this for the REML estimates ...)
Extract the deviance function parameterized in terms of standard deviations and correlations rather than in terms of Cholesky factors (note this is an internal function, so there's no guarantee that it will keep working in the same way in the future ...)
dd.ML <- lme4:::devfun2(model,useSc=TRUE,signames=FALSE)
Extract parameters as standard deviations on original scale:
vv <- as.data.frame(VarCorr(model)) ## need ML estimates!
pars <- vv[,"sdcor"]
## will need to be careful about order if using this for
## a random-slopes model ...
Now compute the second-derivative (Hessian) matrix:
library("numDeriv")
hh1 <- hessian(dd.ML,pars)
vv2 <- 2*solve(hh1) ## 2* converts from log-likelihood to deviance scale
sqrt(diag(vv2)) ## get standard errors
These are the standard errors of the standard deviations. To get approximate standard errors of the variances, multiply each one by twice the corresponding standard deviation estimate (when you transform a value, its standard error scales according to the derivative of the transformation, and the derivative of sd^2 is 2*sd).
I think this should do it, but you might want to double-check it ...
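The conversion from the standard-deviation scale to the variance scale is a delta-method step: if v = s^2 then dv/ds = 2*s, so SE(v) is approximately 2 * s * SE(s). A minimal sketch follows; the helper name se_var_from_sd and the numbers are hypothetical and are not output from the model above.

```r
# Delta method for v = s^2: SE(v) is roughly 2 * s * SE(s)
se_var_from_sd <- function(sd_hat, se_sd) {
  stopifnot(length(sd_hat) == length(se_sd))
  2 * sd_hat * se_sd
}

# Hypothetical estimates, for illustration only
sd_hat <- c(group = 3, residual = 2)
se_sd  <- c(group = 0.5, residual = 0.1)
se_var <- se_var_from_sd(sd_hat, se_sd)
print(se_var)

# With the objects computed above, the call would be:
#   se_var_from_sd(pars, sqrt(diag(vv2)))
```

This is a first-order approximation, so it inherits the caveats about Wald standard errors for variance parameters.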
I'm not really sure what you mean by "standard error of variance component". My best guess (based on your code) is that you want the standard error of the random effect. You can get this using package arm:
library(arm)
se.ranef(model)
#$Subject
# (Intercept)
#308 9.475668
#309 9.475668
#310 9.475668
#330 9.475668
#331 9.475668
#332 9.475668
#333 9.475668
#334 9.475668
#335 9.475668
#337 9.475668
#349 9.475668
#350 9.475668
#351 9.475668
#352 9.475668
#369 9.475668
#370 9.475668
#371 9.475668
#372 9.475668
This is actually the square root of the conditional variance-covariance matrix of the random effect:
sqrt(attr(ranef(model, condVar = TRUE)$Subject, "postVar"))
mn2=lmer(pun~ pre + (pre|pro), REML = TRUE, data = pro)
summary(mn2)
coe2=coef(mn2)
coe2
# Variance-covariance matrix (covariance)
as.data.frame(VarCorr(mn2))
# Extract the fixed-effect coefficients
fixef(mn2)
# Extract the deviations a - alpha and b - beta
re=as.data.frame(ranef(mn2))
I have two models which I am running across an imputed dataset in order to produce pooled estimates. My understanding is that because both models are run through multiple imputed data frames, I have to pool or essentially "average out" all the regression model estimates into one "overall" estimate. Below are the steps I did:
#1 IMPUTE MASTER DATASET
imputed_data <- mice(master, m=20, maxit=50, seed=5798713)
#2 RUN LINEAR MODEL
model.linear <- with(imputed_data, lm(outcome~exposure+age+gender+weight))
summary(pool(model.linear))
#3 RUN NON-LINEAR RESTRICTED CUBIC SPLINE (3-KNOT) MODEL
model.rcs <- with(imputed_data, lm(outcome~rcs(exposure,3)+age+gender+weight))
summary(pool(model.rcs))
#4 COMPARE BOTH MODELS USING POOL.COMPARE FUNCTION
pool.compare(model.rcs, model.linear)
Both the linear and RCS models produce "pooled" estimates, 95% CIs, and p-values once I use "summary(pool(..))". However, when I run the "pool.compare" function, I get an error that states:
Error: Model 'fit0' not contained in 'fit1'
In addition: Warning message:
'pool.compare' is deprecated.
Use 'D1' instead.
See help("Deprecated")
I'm confused as to why the model says fit0 is not contained in fit1 when the "exposure", "outcome", and all the covariates listed are the same between the linear and RCS models. Is there an option that I'm missing here?
Any help/guidance would be very appreciated.
P.S. I am unfortunately unable to provide a sample datacut considering how large the imputed dataset is. Let me know how I can better improve my question if there's any confusion.
As the warning says, pool.compare is deprecated. Use D1 instead:
library(mice)
library(rms)
D1(model.rcs, model.linear)
# test statistic df1 df2 dfcom p.value riv
# 1 ~~ 2 6.248565 2 8.635754 20 0.02098072 0.449098
In some examples there is only a warning, but in others pool.compare gives both an error and a warning:
pool.compare(model.rcs, model.linear)
#Error: Model 'fit0' not contained in 'fit1'
#In addition: Warning message:
# 'pool.compare' is deprecated.
#Use 'D1' instead.
#See help("Deprecated")
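The error is presumably caused by the model itself, i.e. the rcs term. The models used in the D1 call above were fitted as follows (the reproducible example further below compares two plain linear models, where pool.compare only warns):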
imp <- mice(nhanes)
model.linear <- with(imp, lm(age ~ bmi + hyp + chl))
model.rcs <- with(imp, lm(age ~ rcs(bmi, 3) + hyp + chl))
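One plausible reason a spline model fails a "contained in" check is that such checks are often based on coefficient names. The sketch below illustrates that idea only; it is an assumption about how the nesting test works, not a copy of the mice code. It uses ns() from the splines package as a stand-in for rcs(), the built-in mtcars data, and a hypothetical helper nested_by_name().

```r
library(splines)

fit_small  <- lm(mpg ~ hp, data = mtcars)
fit_linear <- lm(mpg ~ wt + hp, data = mtcars)
fit_spline <- lm(mpg ~ ns(wt, 3) + hp, data = mtcars)

# Hypothetical check: is every coefficient name of `small` present in `big`?
nested_by_name <- function(small, big) {
  all(names(coef(small)) %in% names(coef(big)))
}

print(names(coef(fit_spline)))          # no coefficient is called "wt"
small_in_linear  <- nested_by_name(fit_small, fit_linear)   # TRUE
linear_in_spline <- nested_by_name(fit_linear, fit_spline)  # FALSE
```

The linear model is a special case of the spline model mathematically, but a name-based check cannot see that because the spline basis columns carry different names.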
Reproducible example
imp <- mice(nhanes2, print=FALSE, m=50, seed=00219)
fit0 <- with(data=imp,expr=lm(bmi~age+hyp))
fit1 <- with(data=imp,expr=lm(bmi~age+hyp+chl))
stat <- pool.compare(fit1, fit0)
#Warning message:
#'pool.compare' is deprecated.
#Use 'D1' instead.
#See help("Deprecated")
stat <- D1(fit1, fit0)
stat
# test statistic df1 df2 dfcom p.value riv
# 1 ~~ 2 7.606026 1 16.2182 20 0.01387548 0.3281893
I'm trying to perform a repeated-measures ANOVA on my field data (field_data) with the following code:
#First convert variables into factors:
field_data$month = factor(field_data$month,
levels=unique(field_data$month))
field_data$distance = factor(field_data$distance)
field_data$CO2.f = factor(field_data$CO2,
ordered = TRUE)
# Model
model_CO2 = clmm(CO2.f ~ month + distance + month:distance + (1|nest),
data = field_data,
threshold = "equidistant")
# Anova of the model
Anova(model_CO2,
type = "II")
I've run the same code for a couple of weeks without any problems, but today when I tried to run it again I got some errors: everything seems to work fine until I make the model, but when I run the Anova I get the following message:
Error in eval(predvars, data, env) : object 'CO2.f' not found
I've run:
is.data.frame(field_data)
and also tried changing the Excel file that holds my data, but nothing seems to work.
Any ideas?
Thanks!
Update:
I've tried to run this in another computer and it seems to work fine. Does anyone have an idea why this is happening?
Update 2:
When I run the model_CO2, I get results:
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: CO2.f ~ month + distance + month:distance + (1 | nest)
data: field_data
link threshold nobs logLik AIC niter max.grad
logit equidistant 123 -551.68 1137.36 1213(4936) 5.30e-04
Random effects:
Groups Name Variance Std.Dev.
nest (Intercept) 3.521 1.876
Number of groups: nest 5
Coefficients:
monthjuly monthseptember distance0.25 m
-1.1232 1.6078 0.4056
distance0.5 m distance0.75 m distance1 m
2.5090 0.6062 2.5836
monthjuly:distance0.25 m monthseptember:distance0.25 m monthjuly:distance0.5 m
0.2939 -0.3551 -1.2597
monthseptember:distance0.5 m monthjuly:distance0.75 m monthseptember:distance0.75 m
-1.7791 0.3966 0.4520
monthjuly:distance1 m monthseptember:distance1 m
-1.4307 -0.7867
Thresholds:
threshold.1 spacing
-3.61618 0.08347
This is why I think it has something to do with the ANOVA and not with the factor. However, I've tried other kinds of ANOVA (to see if the problem was one library) and I keep getting the same error.
R's mice contains a function, pool.compare, to compare nested models fit to imputed objects. If I try to include an interaction term:
library(mice)
imput = mice(nhanes2)
mi1 <- with(data=imput, expr=lm(bmi~age*hyp))
mi0 <- with(data=imput, expr=lm(bmi~age+hyp))
pc <- pool.compare(mi1, mi0, method="Wald")
then it returns the following error:
Error in pool(fit1) :
Different number of parameters: coef(fit): 6, vcov(fit): 5
It sounds like the variance-covariance matrix doesn't include the interaction term as its own variable. What's the best way around this?
The problem appears to be that some of your parameters cannot be estimated in some of your imputed data sets. When I run the code, I see
( fit1<-mi1$analyses[[1]] )
# lm(formula = bmi ~ age * hyp)
#
# Coefficients:
# (Intercept) age2 age3 hyp2 age2:hyp2
# 28.425 -5.425 -3.758 1.200 3.300
# age3:hyp2
# NA
In this set, it was not possible to estimate age3*hyp2 (presumably because there were no observations in this group).
This causes the discrepancy in coef(fit1) and vcov(fit1) since the covariance cannot be estimated for that term.
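The mismatch can be reproduced without mice. The sketch below builds a small made-up data frame (hypothetical values, not nhanes2) in which the age3/hyp2 cell is empty, so that the interaction coefficient is aliased. Note that in R 3.5.0 and later vcov() pads aliased terms with NA by default, so complete = FALSE is used here to show the reduced matrix.

```r
# Made-up data: no observations with age == 3 and hyp == 2
d <- data.frame(
  age = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)),
  hyp = factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1)),
  bmi = c(22, 25, 24, 27, 21, 26, 23, 28, 20, 22, 21)
)
print(table(d$age, d$hyp))   # the (3, 2) cell is 0

fit <- lm(bmi ~ age * hyp, data = d)

# Coefficients that could not be estimated are reported as NA
aliased <- names(which(is.na(coef(fit))))
n_coef  <- length(coef(fit))                      # counts the NA term too
n_vcov  <- nrow(vcov(fit, complete = FALSE))      # estimable terms only
print(c(coef = n_coef, vcov = n_vcov))
```

For the imputed fits above, something like sapply(mi1$analyses, function(f) anyNA(coef(f))) should flag which imputations are affected.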
What to do in this case is more of a statistical problem than a programming problem. If you are unsure of what would be appropriate for your data, I suggest you consult with the statisticians over at Cross Validated.
Least-squares means with their standard errors for an aov object can be obtained with the model.tables function:
npk.aov <- aov(yield ~ block + N*P*K, npk)
model.tables(npk.aov, "means", se = TRUE)
I wonder how to get the generalized least squares means with their standard errors from nlme or lme4 objects:
library(nlme)
data(Machines)
fm1Machine <- lme(score ~ Machine, data = Machines, random = ~ 1 | Worker )
Any comment and hint will be highly appreciated. Thanks
lme and nlme fit through maximum likelihood or restricted maximum likelihood (the latter is the default), so your results will be based on either of those methods
summary(fm1Machine) will provide you with the output that includes the means and standard errors:
....irrelevant output deleted
Fixed effects: score ~ Machine
Value Std.Error DF t-value p-value
(Intercept) 52.35556 2.229312 46 23.48507 0
MachineB 7.96667 1.053883 46 7.55935 0
MachineC 13.91667 1.053883 46 13.20514 0
Correlation:
....irrelevant output deleted
Because you have fitted the fixed effects with an intercept, you get an intercept term in the fixed effects result instead of a result for MachineA. The results for MachineB and MachineC are contrasts with the intercept, so to get the means for MachineB and MachineC, add the value of each to the intercept mean. But the standard errors are not the ones you would like.
To get the information you are after, fit the model so it doesn't have an intercept term in the fixed effects (see the -1 at the end of the fixed effects):
fm1Machine <- lme(score ~ Machine-1, data = Machines, random = ~ 1 | Worker )
This will then give you the means and standard error output you want:
....irrelevant output deleted
Fixed effects: score ~ Machine - 1
Value Std.Error DF t-value p-value
MachineA 52.35556 2.229312 46 23.48507 0
MachineB 60.32222 2.229312 46 27.05867 0
MachineC 66.27222 2.229312 46 29.72765 0
....irrelevant output deleted
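The same means and standard errors can also be recovered from the original intercept model without refitting, by combining the coefficients with their covariance matrix. This is a minimal sketch of that idea; the contrast matrix L is written by hand for this three-level factor and the object names are illustrative.

```r
library(nlme)

fm <- lme(score ~ Machine, data = Machines, random = ~ 1 | Worker)

beta <- fixef(fm)   # (Intercept), MachineB, MachineC
V    <- vcov(fm)    # covariance matrix of the fixed effects

# Each row picks the linear combination that gives one machine's mean
L <- rbind(MachineA = c(1, 0, 0),
           MachineB = c(1, 1, 0),
           MachineC = c(1, 0, 1))

means <- drop(L %*% beta)
ses   <- sqrt(diag(L %*% V %*% t(L)))
print(cbind(mean = means, se = ses))
```

The standard error of a sum depends on the covariance between the intercept and the contrast, which is why simply reusing the Std.Error column of the intercept model does not work.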
To quote Douglas Bates from
http://markmail.org/message/dqpk6ftztpbzgekm
"I have a strong suspicion that, for most users, the definition of lsmeans is "the numbers that I get from SAS when I use an lsmeans statement". My suggestion for obtaining such numbers is to buy a SAS license and use SAS to fit your models."