I want to fit a mixed model with data containing missing values.
The imputation is performed with mice.
How can I compare the original data model fit to the mice one?
Example code:
## dummy data
set.seed(123)
DF <- data.frame(countryname = rep(LETTERS[1:10], each = 10),
                 x1 = sample(10, 100, replace = TRUE),
                 x2 = sample(5, 100, replace = TRUE),
                 y = sample(10, 100, replace = TRUE))
# introduce NAs (to be imputed below)
DF[sample(100,10),c("x1")] <- NA
DF[sample(100,10),c("x2")] <- NA
DF[sample(100,10),c("y")] <- NA
#
library(mice)
imp = mice(data = DF, m = 10, printFlag = FALSE)
fit = with(imp, expr=lme4::lmer(y~ x1+x2+ (1 | countryname)))
library(broom.mixed)
pool(fit)
summary(pool(fit))
## fit to original data
fitor <- lme4::lmer(y ~ x1 + x2 + (1 | countryname), data = DF)
## how to compare model estimates for fit and fitor?
## example output
##
## =======================================
## base w/SES
## ---------------------------------------
## (Intercept) 0.105 -0.954 ***
## (0.058) (0.085)
## x1 -0.497 *** -0.356 ***
## (0.058) (0.054)
## x2 -0.079 -0.102 *
## (0.043) (0.040)
## ---------------------------------------
## R2 0.039 0.157
## Nobs 4073 4073
## =======================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
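One possible starting point for the comparison (a sketch; it assumes mice >= 3.0, where summary(pool(...)) returns a data frame, and uses broom.mixed::tidy() on the original fit) is to extract both sets of fixed-effect estimates and merge them by term:
est_mice <- summary(pool(fit))[, c("term", "estimate", "std.error")]
est_orig <- broom.mixed::tidy(fitor, effects = "fixed")[, c("term", "estimate", "std.error")]
merge(est_mice, est_orig, by = "term", suffixes = c(".mice", ".orig"))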
I'm trying to create my first function in R, to calculate a moderation analysis.
But now a problem has occurred that I cannot solve: when I run my function, I don't get any output.
I also tried print() and return(), with the same result.
Any recommendations?
Moderation <- function(Mod, UV, AV) {
meanUV <- mean(UV, na.rm = TRUE)
sdUV <- sd(UV, na.rm = TRUE)
ZUV <-((UV - meanUV)/sdUV)
meanMod <- mean(Mod, na.rm = TRUE)
sdMod <- sd(Mod, na.rm = TRUE)
ZMod <- (Mod - meanMod)/sdMod
Interaktion <- ZUV*ZMod
Moderation.fit <- 'AV ~ ZUV + ZMod + Interaktion'
summary(sem(model = Moderation.fit, data = MyData, meanstructure = TRUE))
}
Moderation(MyData$SKK, MyData$ZDT2, MyData$HO4)
Thank you for your help!
The problem in your function is that you refer to MyData in the last line even though you do not pass this dataset to the function. The function does look into your global environment and finds MyData there, but your z-scored variables and Interaktion are not included in that version of MyData, so the function reports an error about "missing variables in dataset".
It works with the following changes:
Moderation <- function(Mod, UV, AV) {
require(lavaan) #This ensures the function works even if you did not do library(lavaan) first.
ZMod <- scale(Mod) #I simplified stuff here. The "scale" function is equivalent to what you did.
ZUV <- scale(UV)
Interaktion <- ZUV*ZMod
MyData <- as.data.frame(list(ZMod=ZMod, ZUV=ZUV, AV=AV, Interaktion=Interaktion)) #Here, MyData is generated so that you can use it further below
Moderation.fit <- 'AV ~ ZUV + ZMod + Interaktion' #nothing changed here
summary(sem(model = Moderation.fit, data = MyData, meanstructure = TRUE))
}
I tested it with the popular mtcars dataset:
data(mtcars)
Moderation(mtcars$am, mtcars$hp, mtcars$mpg)
##lavaan 0.6-7 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 32
##
##Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
##Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
##Regressions:
## Estimate Std.Err z-value P(>|z|)
## AV ~
## ZUV -4.043 0.560 -7.225 0.000
## ZMod 2.633 0.513 5.134 0.000
## Interaktion 0.014 0.527 0.026 0.979
##
##Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .AV 20.094 0.505 39.785 0.000
##
##Variances:
## Estimate Std.Err z-value P(>|z|)
## .AV 7.670 1.917 4.000 0.000
Good luck and have fun with it!
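A variant of the same idea (just a sketch, with hypothetical function and argument names): pass the data frame and the column names into the function, so that it never has to reach into the global environment at all:
Moderation2 <- function(data, mod, uv, av) {
  require(lavaan)
  d <- data.frame(ZMod = as.numeric(scale(data[[mod]])),  # z-score the moderator
                  ZUV = as.numeric(scale(data[[uv]])),    # z-score the predictor
                  AV = data[[av]])                        # outcome stays on its original scale
  d$Interaktion <- d$ZUV * d$ZMod                         # interaction of the two z-scores
  summary(sem('AV ~ ZUV + ZMod + Interaktion', data = d, meanstructure = TRUE))
}
Moderation2(mtcars, "am", "hp", "mpg")  # same estimates as the call above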
stargazer(model1, model2, title = "Models", header=FALSE,
dep.var.labels.include = FALSE,
column.labels = c("Count", "Percentage"),
style = "ajs",
report = "vcp*",
single.row = TRUE)
This is my code to create regression tables with stargazer. However, the p-values still show up below the coefficient estimates. How do I get the p-values to show up next to the coefficient estimates?
You may replace the standard errors with p-values. Put the models into a list, which allows you to use lapply to extract the p-values.
model1 <- lm(mpg ~ hp, mtcars)
model2 <- lm(mpg ~ hp + cyl, mtcars)
model.lst <- list(model1, model2)
stargazer::stargazer(model.lst, title = "Models", header=FALSE,
dep.var.labels.include = FALSE,
column.labels = c("Count", "Percentage"),
style = "ajs",
report = "vcs*",
single.row = TRUE, type="text",
se=lapply(model.lst, function(x) summary(x)$coef[,4]))
# Models
# =================================================================
# Count Percentage
# 1 2
# -----------------------------------------------------------------
# hp -.068 (0.000)*** -.019 (.213)
# cyl -2.265 (0.000)***
# Constant 30.099 (0.000)*** 36.908 (0.000)***
# Observations 32 32
# R2 .602 .741
# Adjusted R2 .589 .723
# Residual Std. Error 3.863 (df = 30) 3.173 (df = 29)
# F Statistic 45.460*** (df = 1; 30) 41.422*** (df = 2; 29)
# -----------------------------------------------------------------
# Notes: *P < .05
# **P < .01
# ***P < .001
Note that this is also possible with texreg, which might look a little cleaner; the package is also well maintained.
texreg::screenreg(model.lst, single.row=TRUE,
reorder.coef=c(2:3, 1),
custom.model.names=c("Count", "Percentage"),
override.se=lapply(model.lst, function(x) summary(x)$coef[,4]),
override.pvalues=lapply(model.lst, function(x) summary(x)$coef[,4]),
digits=3
)
# ===================================================
# Count Percentage
# ---------------------------------------------------
# hp -0.068 (0.000) *** -0.019 (0.213)
# cyl -2.265 (0.000) ***
# (Intercept) 30.099 (0.000) *** 36.908 (0.000) ***
# ---------------------------------------------------
# R^2 0.602 0.741
# Adj. R^2 0.589 0.723
# Num. obs. 32 32
# ===================================================
# *** p < 0.001; ** p < 0.01; * p < 0.05
I'm trying to compare two models built using multiple imputation. When I try to compare the models, mice's pool.compare() gives the error Error: No glance method for objects of class call, or Error: unequal number of imputations for 'fit1' and 'fit0', even though I'm using the same imputed dataset. Here is a reproducible example:
library(mice)
library(miceadds)
library(lmerTest)
imp <- mice(nhanes, maxit = 2, m = 4)
summary(m0 <- pool(with(imp, lmerTest::lmer(bmi ~ 1 + (1 | chl)))))
summary(m1 <- pool(with(imp, lmerTest::lmer(bmi ~ 1 + hyp + (1 | chl)))))
pool.compare(m0, m1)
Error: No glance method for objects of class call
You need to compare the objects before pooling, and the order of the arguments matters: the larger model comes first, i.e. m1 before m0. (Note: I used lme4 here.)
library(mice)
library(miceadds)
set.seed(42)
imp <- mice(nhanes, maxit = 2, m = 4)
summary(pool(m0 <- with(imp, lme4::lmer(bmi ~ 1 + (1 | chl)))))
# boundary (singular) fit: see ?isSingular
# estimate std.error statistic df p.value
# (Intercept) 26.60791 0.9722573 27.36715 18.24326 4.440892e-16
summary(pool(m1 <- with(imp, lme4::lmer(bmi ~ 1 + hyp + (1 | chl)))))
# boundary (singular) fit: see ?isSingular
# estimate std.error statistic df p.value
# (Intercept) 27.2308286 3.759095 7.2439857 5.181367 0.0006723643
# hyp -0.5310514 2.746281 -0.1933711 4.928222 0.8543848658
pool.compare(m1, m0)
# $call
# pool.compare(fit1 = m1, fit0 = m0)
#
# $call11
# with.mids(data = imp, expr = lme4::lmer(bmi ~ 1 + hyp + (1 |
# chl)))
#
# $call12
# mice(data = nhanes, m = 4, maxit = 2)
#
# $call01
# with.mids(data = imp, expr = lme4::lmer(bmi ~ 1 + (1 | chl)))
#
# $call02
# mice(data = nhanes, m = 4, maxit = 2)
#
# $method
# [1] "wald"
#
# $nmis
# age bmi hyp chl
# 0 9 8 10
#
# $m
# [1] 4
#
# $qbar1
# (Intercept) hyp
# 27.2308286 -0.5310514
#
# $qbar0
# (Intercept)
# 26.60791
#
# $ubar1
# [1] 6.916910 3.560812
#
# $ubar0
# [1] 0.8786098
#
# $deviances
# NULL
#
# $Dm
# [,1]
# [1,] 0.03739239
#
# $rm
# [1] 1.118073
#
# $df1
# [1] 1
#
# $df2
# [1] 10.76621
#
# $pvalue
# [,1]
# [1,] 0.850268
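Note that in current versions of mice (3.x), pool.compare() is deprecated in favor of D1() (and its relatives D2() and D3()); if the call above warns or errors on a recent installation, the equivalent Wald comparison should be (a sketch, using the same fitted objects as above):
D1(m1, m0)  # pooled Wald test of the nested models, larger model first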
I'm trying to reproduce the 95% CI that Stata produces when you run a model with clustered standard errors. For example:
regress api00 acs_k3 acs_46 full enroll, cluster(dnum)
Regression with robust standard errors Number of obs = 395
F( 4, 36) = 31.18
Prob > F = 0.0000
R-squared = 0.3849
Number of clusters (dnum) = 37 Root MSE = 112.20
------------------------------------------------------------------------------
| Robust
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
acs_k3 | 6.954381 6.901117 1.008 0.320 -7.041734 20.9505
acs_46 | 5.966015 2.531075 2.357 0.024 .8327565 11.09927
full | 4.668221 .7034641 6.636 0.000 3.24153 6.094913
enroll | -.1059909 .0429478 -2.468 0.018 -.1930931 -.0188888
_cons | -5.200407 121.7856 -0.043 0.966 -252.193 241.7922
------------------------------------------------------------------------------
I am able to reproduce the coefficients and the standard errors:
library(readstata13)
library(texreg)
library(sandwich)
library(lmtest)
clustered.se <- function(model_result, data, cluster) {
  # keep only the rows and columns actually used in the fitted model
  model_variables <-
    intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- rownames(model_result$model)
  data <- data[model_rows, model_variables]
  cl <- data[[cluster]]
  M <- length(unique(cl))  # number of clusters
  N <- nrow(data)          # number of observations
  K <- model_result$rank   # number of estimated parameters
  # Stata's small-sample correction for clustered standard errors
  dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
  # sum the estimating equations within each cluster
  uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum))
  vcovCL <- dfc * sandwich(model_result, meat = crossprod(uj) / N)
  standard.errors <- coeftest(model_result, vcov. = vcovCL)[, 2]
  p.values <- coeftest(model_result, vcov. = vcovCL)[, 4]
  list(vcovCL = vcovCL,
       standard.errors = standard.errors,
       p.values = p.values)
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <-
lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll,
data = elemapi2)
clustered_se <-
clustered.se(model_result = lm1,
data = elemapi2,
cluster = "dnum")
htmlreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
star.symbol = "\\*",
digits = 7
)
=============================
Model 1
-----------------------------
(Intercept) -5.2004067
(121.7855938)
acs_k3 6.9543811
(6.9011174)
acs_46 5.9660147 *
(2.5310751)
full 4.6682211 ***
(0.7034641)
enroll -0.1059909 *
(0.0429478)
-----------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
=============================
*** p < 0.001, ** p < 0.01, * p < 0.05
Alas, I cannot reproduce the 95% confidence interval:
screenreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
digits = 7,
ci.force = TRUE
)
========================================
Model 1
----------------------------------------
(Intercept) -5.2004067
[-243.8957845; 233.4949710]
acs_k3 6.9543811
[ -6.5715605; 20.4803228]
acs_46 5.9660147 *
[ 1.0051987; 10.9268307]
full 4.6682211 *
[ 3.2894567; 6.0469855]
enroll -0.1059909 *
[ -0.1901670; -0.0218148]
----------------------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
========================================
* 0 outside the confidence interval
If I do it 'by hand', I get the same thing as with texreg:
level <- 0.95
a <- 1-(1 - level)/2
coeff <- lm1$coefficients
se <- clustered_se$standard.errors
lb <- coeff - qnorm(a)*se
ub <- coeff + qnorm(a)*se
> lb
(Intercept) acs_k3 acs_46 full enroll
-243.895784 -6.571560 1.005199 3.289457 -0.190167
> ub
(Intercept) acs_k3 acs_46 full enroll
233.49497100 20.48032276 10.92683074 6.04698550 -0.02181481
What is Stata doing and how can I reproduce it in R?
PS: This is a follow up question.
PS2: The Stata data is available here.
It looks like Stata is using confidence intervals based on t(36) rather than Z (i.e. Normal errors).
Taking the values from the Stata output:
coef <- 6.954381; rse <- 6.901117; lwr <- -7.041734; upr <- 20.9505
(upr-coef)/rse
## [1] 2.028095
(lwr-coef)/rse
## [1] -2.028094
Computing/cross-checking the tail values for t(36):
pt(2.028094,36)
## [1] 0.975
qt(0.975,36)
## [1] 2.028094
I don't know how you pass confidence intervals to texreg. Since you haven't given a fully reproducible example (I don't have elemapi2.dta), I can't say exactly how you would get the degrees of freedom, but it looks like you would want:
tdf <- length(unique(elemapi2$dnum)) - 1
level <- 0.95
a <- 1 - (1 - level)/2
se <- clustered_se$standard.errors
lwr <- coef(lm1) - qt(a, tdf) * se
upr <- coef(lm1) + qt(a, tdf) * se
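If your texreg version is recent enough to have the override.ci.low and override.ci.up arguments (worth checking with args(texreg::screenreg)), you should then be able to pass the t-based bounds straight through. A sketch:
screenreg(
  lm1,
  override.ci.low = lwr,
  override.ci.up = upr,
  ci.force = TRUE,
  digits = 7
)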
Indeed, Stata is using the t distribution rather than the normal distribution. There is now a really easy way to get confidence intervals that match Stata into texreg, using lm_robust from the estimatr package, which you can install from CRAN with install.packages("estimatr").
> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, clusters = cyl, se_type = "stata")
> screenreg(lmro)
===========================
Model 1
---------------------------
(Intercept) 30.10 *
[13.48; 46.72]
hp -0.07
[-0.15; 0.01]
---------------------------
R^2 0.60
Adj. R^2 0.59
Num. obs. 32
RMSE 3.86
===========================
* 0 outside the confidence interval
When I attempt to run wald.test (from the aod package) on a categorical variable on my linear model, I get the following error:
Error in L %*% V : non-conformable arguments
The code that I'm having trouble with:
m1 <- glm(comment_count ~ factor(has_conflicts) + factor(base_repo_id) + **snip**, data = mydata)
summary(m1) # shows that base_repo_id's factors are coefficients 3 through 12
# Determine whether base_repo_id matters
wald.test(b = coef(m1), Sigma = vcov(m1), Terms = 3:12)
As I understand it, wald.test's b parameter takes the regression coefficients, Sigma takes the estimated variance-covariance matrix, and Terms selects the coefficients I want to test. So why am I getting the error?
In principle your code looks OK, so it must be something about the particular fit to your data that did not work. Maybe there were problems with non-identified parameters or a singular covariance matrix, or something like that?
If I create a random data set with the variables above, then everything runs smoothly:
set.seed(1)
mydata <- data.frame(
comment_count = rpois(500, 3),
has_conflicts = sample(0:1, 500, replace = TRUE),
base_repo_id = sample(1:11, 500, replace = TRUE)
)
m1 <- glm(comment_count ~ factor(has_conflicts) + factor(base_repo_id),
data = mydata)
The Wald test can then be carried out by base R's anova() (which in the Gaussian case is equivalent to the Wald test):
m0 <- update(m1, . ~. - factor(base_repo_id))
anova(m0, m1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: comment_count ~ factor(has_conflicts)
## Model 2: comment_count ~ factor(has_conflicts) + factor(base_repo_id)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 498 1426.1
## 2 488 1389.2 10 36.91 0.2256
Or you can use aod:
library("aod")
wald.test(b = coef(m1), Sigma = vcov(m1), Terms = 3:12)
## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 13.0, df = 10, P(> X2) = 0.23
Or lmtest:
library("lmtest")
waldtest(m1, "factor(base_repo_id)", test = "Chisq")
## Wald test
##
## Model 1: comment_count ~ factor(has_conflicts) + factor(base_repo_id)
## Model 2: comment_count ~ factor(has_conflicts)
## Res.Df Df Chisq Pr(>Chisq)
## 1 488
## 2 498 -10 12.966 0.2256
Or car:
library("car")
linearHypothesis(m1, names(coef(m1))[3:12])
## Linear hypothesis test
##
## Hypothesis:
## factor(base_repo_id)2 = 0
## factor(base_repo_id)3 = 0
## factor(base_repo_id)4 = 0
## factor(base_repo_id)5 = 0
## factor(base_repo_id)6 = 0
## factor(base_repo_id)7 = 0
## factor(base_repo_id)8 = 0
## factor(base_repo_id)9 = 0
## factor(base_repo_id)10 = 0
## factor(base_repo_id)11 = 0
##
## Model 1: restricted model
## Model 2: comment_count ~ factor(has_conflicts) + factor(base_repo_id)
##
## Res.Df Df Chisq Pr(>Chisq)
## 1 498
## 2 488 10 12.966 0.2256
I have met the same problem.
The error means that the dimensions of the two matrices L and V do not match.
Check whether there are NA elements among your coefficients:
vcov() removes NA elements automatically, which changes the size of the matrix so that the dimensions no longer conform.
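A quick way to check whether this is what is happening (a sketch; m1 is the fitted model from above):
any(is.na(coef(m1)))                # TRUE if some coefficients were not estimable
length(coef(m1)) == nrow(vcov(m1))  # FALSE when vcov() has dropped NA coefficients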