How to get 95% CIs using ezANOVA() - r

This is a programming question for people who like to use the ez package in R. I am accustomed to using linear mixed effects models with lmer(). Among the useful outputs of lmer(), I get a coefficient value for each of my experimental factors, and using pvals.fnc() I can easily get 95% confidence intervals (CIs) to report together with the model coefficients.
I have recently started using ezANOVA, and I would like to know: Is there a mainstream way to get the same output? That is, I'd like to get a value for the coefficient of an experimental factor and a CI to go along with it. Here is sample code to make this concrete:
library(languageR) #necessary to use pvals.fnc()
library(lme4) #necessary for lmer()
library(ez) #necessary for ezANOVA
data(ANT) #load sample data
If I were using lmer, I would estimate my model and then get 95% CIs for the coefficients:
model_lmer = lmer( formula = rt ~ cue*flank + (1|subnum), data = ANT)
pvals.fnc(model_lmer, withMCMC=T)$fixed
So, for example, I know that the estimate of the interaction between cue and flank (when cue has the level "center" and flank has the level "congruent") is -3.9511 and the 95% CI is [-12.997, 5.535].
Now say that I want to run an ANOVA by subjects and by items using ezANOVA, and I want to get 95% CIs for the by-subject estimates. This is my model:
model.f1 = ezANOVA(data=ANT, dv=rt,wid=subnum,within=.(cue,flank),return_aov=T)
But in the output, I don't see the model estimates when I do:
model.f1$ANOVA
And I don't know how to calculate the 95% CIs corresponding to those estimates. I think I should be able to use ezBoot(), but I tried it and couldn't work out how to implement it.
Any suggestions? Thanks for your help!

This answer was provided by the author of the "ez" package in another forum. I'm copying it here in case someone else finds it useful:
"One somewhat hacky way to get CIs for effects is to use ezStats () to get the means
and FLSD, compute the difference between the means to get the effect,
and divide the FLSD by sqrt(2) to get the CI"
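For what it's worth, here is a rough sketch of how that suggestion could be applied to the ANT example above. This is my own reading of the advice, not code from the package author; ezStats() returns a table of cell means together with the FLSD:
library(ez)
library(languageR) # for the ANT data
data(ANT)
stats.f1 <- ezStats(data = ANT, dv = rt, wid = subnum, within = .(cue, flank))
stats.f1 # one row per cue x flank cell, with Mean and FLSD columns
# effect estimate: difference between two cell means; CI half-width: FLSD/sqrt(2)
ci_half_width <- stats.f1$FLSD / sqrt(2)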

Related

SE for logistic regression predictions

I have been tasked with calculating the SE for logistic regression point estimates (where all my predictor variables are factors). I typically use ggpredict to estimate my predictions, which provides CIs. However, we are comparing our results to estimates from program MARK, and we find readers have a better grasp of our plots when they show SEs rather than 95% CIs.
Based on reading the package notes, it appears I can simply calculate (conf.high - predicted)/1.96. Am I correct? Or am I missing something, and that is not the correct way to calculate the SE for the predicted estimates? If I am wrong, any ideas on how I can do this, or do I need to just use CIs?
Thank you very much for your help.
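In case it helps, here is a small sketch of the back-calculation described in the question, assuming the reported interval is symmetric around the prediction on the scale shown (for logistic models ggpredict builds the CI on the link scale, so this is only approximate). The model fit and the terms argument are placeholders:
library(ggeffects)
# fit: a previously fitted glm(..., family = binomial) with factor predictors (placeholder)
preds <- ggpredict(fit, terms = "group") # "group" is a placeholder factor name
preds$se_approx <- (preds$conf.high - preds$predicted) / 1.96
preds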

Survdiff (package survival) with frequency weights (obtained after Coarsened Exact Matching - package matchit)

I am doing a counterfactual impact evaluation on survival data. More precisely, I try to evaluate the impact of vocational training on time spent in unemployment. I use the Kaplan Meier estimator of the survival curve (package survival).
Before doing Kaplan Meier, I use coarsened exact matching (aim is ATT) to get the control and treatment groups close in terms of pretreatment covariates (package MatchIt).
For the Kaplan Meier estimator, I have to use the weights from the matching, which works well using the weights option and robust standard errors of survfit:
library(survival)
library(survminer)
kp_cem <- survfit(Surv(time=time_cem,event=status_cem)~treatment_cem, data=data_impact_cem, robust=TRUE, weights=weights)
However, when I try to use a log-rank test to test for a difference in survival curves between the treatment and control groups, I cannot take the frequency weights from the matching into account, so the test statistic is not correct.
log_rank <- survdiff(Surv(time=time_cem,event=status_cem)~treatment_cem, data=data_impact_cem,rho=0)
I tried the pval = TRUE option of ggsurvplot (package survminer), but the problem is the same: the frequency weights are not taken into account.
How can I include frequency weights in survdiff? Are there other packages to compute log-rank test taking into account frequency weights (obtained after matching)?
There are at least two ways to do this:
First, you can use the survey::svylogrank function, as @IRTFM suggests. This will treat the weights as sampling weights, but I think that's OK with the robust standard errors that svylogrank uses.
Second, you can use survival::coxph. The log-rank test is the score test in a Cox model, and coxph takes frequency weights. Use robust=TRUE if you want a robust score test: it will be at the bottom of the output of summary(your_cox_model), and you can extract it as summary(your_cox_model)$robscore.
Thank you very much @Thomas Lumley and @IRTFM for your answers.
Here is how I apply your two suggestions (I added some comments + references).
1. Using survey::svylogrank
I don't feel very comfortable using sampling weights when what I really have are frequency weights.
How should I specify the survey design? The weights come from Coarsened Exact Matching (matchit with method = "cem"), which is a class of stratum matching.
Should I specify the strata and the weights in the survey design? In the MatchIt vignette Estimating Effects After Matching (p. 27), it is suggested to use only the weights and robust standard errors in the survival analysis, not the strata.
Here is how I specify the design and how I obtain the log-rank test with the survey package, taking the weights from the matching into account:
library(survey)
design_weights <- svydesign(id=~ibis, strata=~subclass, weights=~weights, data=data_impact_cem)
log_rank <- svylogrank(Surv(time=time_cem,event=status_cem)~treatment_cem, design=design_weights, rho=0)
2. Using survival::coxph
Thank you for this piece of information; being quite new to survival analysis, I had overlooked this nice property, the equivalence between the score test of a Cox model and the log-rank test. For anyone wanting more information on this subject, I found this book very instructive: Moore, D. (2016). Applied Survival Analysis Using R. New York, NY: Springer (p. 58).
I find this second option more attractive than the first one involving survey. Here is how I apply it:
library(survival)
cox_cem <- coxph(Surv(time=time_cem,event=status_cem)~treatment_cem, data=data_impact_cem, robust=TRUE, weights=weights)
sum_cox_cem <- summary(cox_cem)
# robust score test (= weighted log-rank test, see answer above)
score_test <- sum_cox_cem$robscore[["test"]]
score_test <- round(score_test, 3)
pvalue <- sum_cox_cem$robscore[["pvalue"]]
pvalue <- if (pvalue < 0.001) "<0.001" else round(pvalue, 3)
Here is the difference between the two test statistics (they turn out to be quite close in the end).
Still, I wonder why a weights option does not exist in survdiff.

Confidence intervals for proportions by svypredmeans()

I am trying to obtain 95% CIs for proportions that are actually predictive marginal means, as computed with the survey package for R. I'm including a reproducible example that makes no sense content-wise but hopefully illustrates my purpose well:
library(survey)
data(api)
dstrat <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
# 1.- marginal means for groups according to a variable
svyglm(I(sch.wide=="Yes") ~ awards+comp.imp, design=dstrat, family=quasipoisson()) %>% svypredmeans(., ~yr.rnd)
# 2.- the confidence intervals I'd like to compute by svyciprop(), in an obviously wrong way
svyby(I(sch.wide=="Yes") ~ awards+comp.imp, ~yr.rnd, design=dstrat, svyciprop, vartype="ci", method="xlogit")
What I can't figure out is how to enter the right arguments into svyciprop(), or if this is even possible. The function svyciprop() takes a single formula, and this does not seem compatible with the way predictive marginal means are computed, nor with the output of svypredmeans().
Thanks beforehand for any help!
EDIT Apologies, because the code was right, but there was a typo. However, there's a follow-up question: the estimates for the predictive marginal means in step 1 do not match the corresponding estimates in step 2. Can someone shed light on why they differ?
Your code does not work for a very simple reason: R is case-sensitive. Change sch.wide=="yes" to sch.wide=="Yes" and it should work.
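As a follow-up for the CI part, here is a hedged sketch: assuming the object returned by svypredmeans() supports the usual survey extractors (coef/SE/confint) like other survey estimates, Wald-type 95% CIs for the predictive marginal means can be obtained directly:
library(survey)
library(magrittr) # for the pipe used above
data(api)
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
pm <- svyglm(I(sch.wide=="Yes") ~ awards+comp.imp, design=dstrat, family=quasipoisson()) %>% svypredmeans(., ~yr.rnd)
pm          # predictive marginal means with standard errors
confint(pm) # Wald 95% CIs, assuming confint() has a method for this object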

R language, how to use bootstraps to generate maximum likelihood and AICc?

Sorry for quite a stupid question. I am doing multiple comparisons of morphologic traits through correlations of bootstrapped data. I'm curious whether such multiple comparisons affect my level of inference, as well as about the effect of potential multicollinearity in my data. Perhaps a reasonable option would be to use my bootstraps to generate maximum likelihoods and then generate AICc values to make comparisons across all of my parameters, to see what comes out as most important... the problem is that although the approach is (more or less) clear to me, I don't know how to implement it in R. Can anybody be so kind as to throw some light on this for me?
So far, here an example (using R language, but not my data):
library(boot)
data(iris)
head(iris)
# The function
pearson <- function(data, indices){
  dt <- data[indices, ]
  c(
    cor(dt[, 1], dt[, 2], method = 'p'),
    median(dt[, 1]),
    median(dt[, 2])
  )
}
# One example: iris$Sepal.Length ~ iris$Sepal.Width
# I calculate the Pearson correlation with 1000 replications
set.seed(12345)
dat <- iris[,c(1,2)]
dat <- na.omit(dat)
results <- boot(dat, statistic=pearson, R=1000)
# 95% CIs
boot.ci(results, type="bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = results, type = "bca")
Intervals :
Level BCa
95% (-0.2490, 0.0423 )
Calculations and Intervals on Original Scale
plot(results)
I have several more pairs of comparisons.
More of a Cross Validated question.
Multicollinearity shouldn't be a problem if you're just assessing the relationship between two variables (in your case correlation). Multicollinearity only becomes an issue when you fit a model, e.g. multiple regression, with several highly correlated predictors.
Multiple comparisons are always a problem, though, because they increase your type-I error rate. The way to address that is to apply a multiple-comparison correction, e.g. Bonferroni-Holm or the less conservative FDR. That can have its downsides, though, especially if you have a lot of predictors and few observations: it may lower your power so much that you won't be able to find any effect, no matter how big it is.
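For the correction itself, a tiny sketch with base R's p.adjust (the p-values here are made up purely for illustration):
# adjust a vector of correlation p-values for multiple comparisons
pvals <- c(0.001, 0.020, 0.048, 0.300) # made-up values
p.adjust(pvals, method = "holm") # Bonferroni-Holm
p.adjust(pvals, method = "fdr")  # Benjamini-Hochberg false discovery rate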
In a high-dimensional setting like this, your best bet may be some sort of regularized regression method. With regularization, you put all predictors into your model at once, similarly to multiple regression; the trick is that you constrain the model so that all of the regression slopes are pulled towards zero, and only the ones with big effects "survive". The machine-learning versions of regularized regression are called ridge, LASSO, and elastic net, and they can be fitted using the glmnet package. There are also Bayesian equivalents in so-called shrinkage priors, such as the horseshoe (see e.g. https://avehtari.github.io/modelselection/regularizedhorseshoe_slides.pdf). You can fit Bayesian regularized regression using the brms package.
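A minimal sketch of the glmnet route, using the iris columns from your example as stand-ins for your trait predictors (the choice of columns is arbitrary):
library(glmnet)
data(iris)
x <- as.matrix(iris[, 2:4]) # predictors (arbitrary choice for illustration)
y <- iris$Sepal.Length      # response
cvfit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 is the LASSO; 0 < alpha < 1 gives elastic net
coef(cvfit, s = "lambda.min")         # slopes shrunk towards zero; unimportant ones are exactly zero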

How to get coefficients and their confidence intervals in mixed effects models?

In lm and glm models, I use functions coef and confint to achieve the goal:
m = lm(resp ~ 0 + var1 + var1:var2) # var1 categorical, var2 continuous
coef(m)
confint(m)
Now I added random effect to the model - used mixed effects models using lmer function from lme4 package. But then, functions coef and confint do not work any more for me!
> mix1 = lmer(resp ~ 0 + var1 + var1:var2 + (1|var3))
# var1, var3 categorical, var2 continuous
> coef(mix1)
Error in coef(mix1) : unable to align random and fixed effects
> confint(mix1)
Error: $ operator not defined for this S4 class
I tried to google and use docs but with no result. Please point me in the right direction.
EDIT: I was also wondering whether this question fits better on https://stats.stackexchange.com/, but I consider it more technical than statistical, so I concluded it fits best here (SO)... what do you think?
Not sure when it was added, but now confint() is implemented in lme4.
For example, the following works:
library(lme4)
m = lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
confint(m)
There are two newer packages, lmerTest and lsmeans, that can calculate 95% confidence limits for lmer and glmer output; maybe you can look into those. coefplot2 can do it too, though (as Ben points out below) in a less sophisticated way, from the standard errors on the Wald statistics, as opposed to the Kenward-Roger and/or Satterthwaite df approximations used in lmerTest and lsmeans. It's just a shame that there are still no built-in plotting facilities in lsmeans (as there are in the effects package, which also returns 95% confidence limits for lmer and glmer objects, but does so by refitting a model without any of the random factors, which is evidently not correct).
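As a small illustration of the lmerTest route mentioned above (just a sketch, using the sleepstudy data):
library(lmerTest) # masks lme4::lmer; adds Satterthwaite df and p-values
m <- lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)
summary(m)$coefficients   # coefficient table now includes df and Pr(>|t|)
confint(m, parm = "Days") # profile 95% CI for the Days slope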
I suggest that you use good old lme (in package nlme). It has confint, and if you need confint of contrasts, there is a series of choices (estimable in gmodels, contrast in the contrast package, glht in multcomp).
Why p-values and confint are absent in lmer: see http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76742.html .
Assuming a normal approximation for the fixed effects (which confint would also have done), we can obtain 95% confidence intervals as
estimate ± 1.96 * standard error.
The following does not apply to the variance components/random effects.
library("lme4")
mylm <- lmer(Reaction ~ Days + (Days|Subject), data =sleepstudy)
# standard error of coefficient
days_se <- sqrt(diag(vcov(mylm)))[2]
# estimated coefficient
days_coef <- fixef(mylm)[2]
upperCI <- days_coef + 1.96*days_se
lowerCI <- days_coef - 1.96*days_se
I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):
fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4)
you can get the full coefficient table with coef(summary(m)); if you have loaded lmerTest before fitting the model, or convert the model after fitting (and then loading lmerTest) via coef(summary(as(m,"merModLmerTest"))), then the coefficient table will include p-values. (The coefficient table is a matrix; you can extract the columns via e.g. ctab[,"Estimate"], ctab[,"Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
As stated above you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. If you use confint(m, method="Wald") you'll get the standard +/- 1.96SE confidence intervals. (lme uses intervals(m) instead of confint().)
If you prefer to use broom.mixed (a consolidated sketch follows after this list):
tidy(m,effects="fixed") gives you a table with estimates, standard errors, etc.
tidy(as(m,"merModLmerTest"), effects="fixed") (or fitting with lmerTest in the first place) includes p-values
adding conf.int=TRUE gives (Wald) CIs
adding conf.method="profile" (along with conf.int=TRUE) gives likelihood profile CIs
You can also get confidence intervals by parametric bootstrap (method="boot"), which is considerably slower but more accurate in some circumstances.
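A consolidated sketch of the broom.mixed calls listed above, using the sleepstudy example as a stand-in for your own model:
library(lme4)
library(broom.mixed)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
tidy(m, effects = "fixed")                                            # estimates and standard errors
tidy(m, effects = "fixed", conf.int = TRUE)                           # with Wald CIs
tidy(m, effects = "fixed", conf.int = TRUE, conf.method = "profile")  # with likelihood profile CIs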
To find the coefficients, you can simply use the summary function from lme4:
library(lme4)
mix1 <- lmer(resp ~ 0 + var1 + var1:var2 + (1|var3)) # var1, var3 categorical, var2 continuous
mix1_summary <- summary(mix1)
To see all the coefficients:
mix1_summary$coefficients
If you want a (Wald) confidence interval, multiply the standard error by 1.96 to get the half-width, then add and subtract it from the estimate:
half_width <- mix1_summary$coefficients[, "Std. Error"] * 1.96
cbind(lower = mix1_summary$coefficients[, "Estimate"] - half_width,
      upper = mix1_summary$coefficients[, "Estimate"] + half_width)
I'd suggest the tab_model() function from the sjPlot package as an alternative: clean and readable output ready for markdown. See the package reference and examples for details.
For those more visually inclined, plot_model() from the same package might come in handy too.
An alternative solution is the model_parameters() function from the parameters package.
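A short sketch of these alternatives, assuming m is a fitted lmer model like those in the answers above:
library(sjPlot)
tab_model(m)        # HTML table of estimates with 95% CIs
plot_model(m)       # forest plot of the fixed-effect estimates
library(parameters)
model_parameters(m) # estimates, CIs, and p-values as a data frame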
