Estimating regression paths in lavaan, df and test statistics - r

I am trying to compare structural equation models using lavaan in R.
I have four latent variables: three are measured by eight observed variables each, and one is measured by two observed variables.
When I run the measurement model, the user model test has 293 degrees of freedom with 58 model parameters. When I run the structural model, which adds three regression paths, I receive the same model statistics: the same degrees of freedom (293) and the same number of model parameters (58).
Because the fits are identical, when I try to compare the models with anova() there is no difference in degrees of freedom and no chi-square difference; it is the same model output.
So, I receive the following warning:
Warning message:
In lavTestLRT(object = object, ..., model.names = NAMES) :
lavaan WARNING: some models have the same degrees of freedom
semPaths() shows the estimated regression coefficients, and the parameter estimates in the output show the regression coefficients for the structural model, but the fit indices (AIC, BIC, etc.), chi-square, and degrees of freedom are identical.
I thought I had simply passed the wrong model to the summary() function, but no, that was not it.
I am trying not to be a dolt, but I cannot figure out why lavaan gives me exactly the same df and chi-square when I am estimating three additional paths/parameters.
Any insight is welcome. Again, I apologize if I am missing the obvious.
Here is the code:
# Pre-Post Measurement Model 1 - (TM)
MeasTM1 <- '
posttransp8 =~ post_Understand_Successful_Work +
post_Purpose_Assignment +
post_Assignment_Objectives_Course +
post_Instructor_Identified_Goal +
post_Steps_Required +
post_Assignment_Instructions +
post_Detailed_Directions +
post_Knew_How_Evaluated
preskills8 =~ pre_Express_Ideas_Write +
pre_Express_Ideas_Speak +
pre_Collaborate_Academic +
pre_Analyz + pre_Synthesize +
pre_Apply_New_Contexts +
pre_Consider_Ethics +
pre_Capable_Self_Learn
postskills8 =~ post_Express_Ideas_Write +
post_Express_Ideas_Speak +
post_Collaborate_Academic +
post_Analyz + post_Synthesize +
post_Apply_New_Contexts +
post_Consider_Ethics +
post_Capable_Self_Learn
postbelong2 =~ post_Belong_School_Commty + post_Helped_Belong_School_Commty
'
fitMeasTM1 <- sem(MeasTM1, data=TILTSEM)
summary(fitMeasTM1, standardized=TRUE, fit.measures=TRUE)
semPaths(fitMeasTM1, whatLabels = "std", layout = "tree")
# Pre-Post Factor Model 1 - (TM)
#Testing regression on Pre-Post Skills
FactTM1 <- '
#latent factors
posttransp8 =~ post_Understand_Successful_Work +
post_Purpose_Assignment +
post_Assignment_Objectives_Course +
post_Instructor_Identified_Goal +
post_Steps_Required +
post_Assignment_Instructions +
post_Detailed_Directions +
post_Knew_How_Evaluated
preskills8 =~ pre_Express_Ideas_Write +
pre_Express_Ideas_Speak +
pre_Collaborate_Academic +
pre_Analyz + pre_Synthesize +
pre_Apply_New_Contexts +
pre_Consider_Ethics +
pre_Capable_Self_Learn
postskills8 =~ post_Express_Ideas_Write +
post_Express_Ideas_Speak +
post_Collaborate_Academic +
post_Analyz + post_Synthesize +
post_Apply_New_Contexts +
post_Consider_Ethics +
post_Capable_Self_Learn
postbelong2 =~ post_Belong_School_Commty + post_Helped_Belong_School_Commty
#regressions
postskills8 ~ preskills8 + postbelong2 + posttransp8
'
fitFactTM1 <- sem(FactTM1, data=TILTSEM)
summary(fitFactTM1, standardized=TRUE, fit.measures=TRUE)
semPaths(fitFactTM1, whatLabels = "std", layout = "tree")
anova(fitMeasTM1,fitFactTM1)
Here is the model output for the two models (to show that they are identical):
=========================Pre-Post Measurement Model 1 - (TM)=============================
Estimator ML
Optimization method NLMINB
Number of model parameters 58
Used Total
Number of observations 521 591
Model Test User Model:
Test statistic 1139.937
Degrees of freedom 293
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 4720.060
Degrees of freedom 325
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.807
Tucker-Lewis Index (TLI) 0.786
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -13335.136
Loglikelihood unrestricted model (H1) -12765.167
Akaike (AIC) 26786.271
Bayesian (BIC) 27033.105
Sample-size adjusted Bayesian (BIC) 26849.000
Root Mean Square Error of Approximation:
RMSEA 0.074
90 Percent confidence interval - lower 0.070
90 Percent confidence interval - upper 0.079
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.068
=========================Pre-Post Factor Model 1 - (TM)======================
Estimator ML
Optimization method NLMINB
Number of model parameters 58
Used Total
Number of observations 521 591
Model Test User Model:
Test statistic 1139.937
Degrees of freedom 293
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 4720.060
Degrees of freedom 325
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.807
Tucker-Lewis Index (TLI) 0.786
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -13335.136
Loglikelihood unrestricted model (H1) -12765.167
Akaike (AIC) 26786.271
Bayesian (BIC) 27033.105
Sample-size adjusted Bayesian (BIC) 26849.000
Root Mean Square Error of Approximation:
RMSEA 0.074
90 Percent confidence interval - lower 0.070
90 Percent confidence interval - upper 0.079
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.068

You aren't using up more degrees of freedom.
One thing that makes lavaan::sem() dangerous compared with lavaan::lavaan() is that its defaults are easy to forget or to miss. If you look at ?lavaan::sem, you will see those defaults:
The sem function is a wrapper for the more general lavaan function, but setting the following default options: int.ov.free = TRUE, int.lv.free = FALSE, auto.fix.first = TRUE (unless std.lv = TRUE), auto.fix.single = TRUE, auto.var = TRUE, auto.cov.lv.x = TRUE, auto.efa = TRUE, auto.th = TRUE, auto.delta = TRUE, and auto.cov.y = TRUE
You can find out what this means via ?lavOptions:
auto.cov.lv.x: If TRUE, the covariances of exogenous latent variables are included in the model and set free.
By default, then, all exogenous latent variables (i.e., all of your latent factors) are already correlated in your measurement model. The three regressions onto postskills8 in the structural model merely reparameterize the three covariances postskills8 already had with the other factors, so both models estimate 58 parameters, both have a saturated latent-level structure, and the fit is therefore identical.
I'm also not sure how you are identifying the 2-item factor, so I'm surprised this does not throw a warning, unless you're ignoring it.
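To turn this into a comparison that actually tests the structural paths, you can inspect the latent covariances sem() added and fit a genuinely nested null model with the three paths fixed to zero. A minimal sketch, reusing the model strings and data from the question:

# the free latent covariances sem() added behind the scenes
subset(parameterEstimates(fitMeasTM1), op == "~~" & lhs != rhs)

# nested null model: same measurement structure, three paths fixed to 0
FactTM1null <- paste(MeasTM1,
                     "postskills8 ~ 0*preskills8 + 0*postbelong2 + 0*posttransp8",
                     sep = "\n")
fitFactTM1null <- sem(FactTM1null, data = TILTSEM)
anova(fitFactTM1null, fitFactTM1)

Because postskills8 is endogenous in the null model, sem() should no longer add its covariances with the other factors, so the null model has three fewer parameters (df = 296) and anova() yields a proper 3-df chi-square test of the paths.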

Related

lme4 1.1-27.1 error: pwrssUpdate did not converge in (maxit) iterations

Sorry, I know this error has been discussed before, but each answer on Stack Overflow seems specific to the data at hand.
I'm attempting to run the following negative binomial model in lme4:
Model5.binomial <- glmer.nb(countvariable ~ waves + var1 + dummycodedvar2 + dummycodedvar3 + (1 | record_id), data = datadfomit)
However, I receive the following error when attempting to run the model:
Error in f_refitNB(lastfit, theta = exp(t), control = control) :
pwrssUpdate did not converge in (maxit) iterations
I first ran the model with only three predictor variables (waves, var1, dummycodedvar2) and got the same error, but centering the predictors fixed the problem and the model ran fine.
Now, with four variables (all centered), I expected the model to run smoothly, but I receive the error again.
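(For concreteness, the centering described above might look like this; a sketch using the variable names from the model:)

# subtract each continuous predictor's mean; scales are left untouched
datadfomit$waves <- as.numeric(scale(datadfomit$waves, scale = FALSE))
datadfomit$var1  <- as.numeric(scale(datadfomit$var1,  scale = FALSE))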
Since every answer on this site seems to point towards a problem in the data, data that replicates the problem can be found here:
https://file.io/3vtX9RwMJ6LF
Your response variable has a lot of zeros.
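A quick check makes this visible (a sketch, using the data and response name from the question):

# proportion of zero counts in the response
mean(data$countvariable == 0)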
I would suggest fitting a model that accounts for this, such as a zero-inflated model. The GLMMadaptive package can fit zero-inflated negative binomial mixed-effects models:
library(GLMMadaptive)

# zero-inflated negative binomial mixed model; var1 also enters the zero part
fm <- mixed_model(countvariable ~ waves + var1 + dummycodedvar2 + dummycodedvar3,
                  random = ~ 1 | record_id,
                  data = data,
                  family = zi.negative.binomial(),
                  zi_fixed = ~ var1,
                  zi_random = ~ 1 | record_id)
summary(fm)
Random effects covariance matrix:
StdDev Corr
(Intercept) 0.8029
zi_(Intercept) 1.0607 -0.7287
Fixed effects:
Estimate Std.Err z-value p-value
(Intercept) 1.4923 0.1892 7.8870 < 1e-04
waves -0.0091 0.0366 -0.2492 0.803222
var1 0.2102 0.0950 2.2130 0.026898
dummycodedvar2 -0.6956 0.1702 -4.0870 < 1e-04
dummycodedvar3 -0.1746 0.1523 -1.1468 0.251451
Zero-part coefficients:
Estimate Std.Err z-value p-value
(Intercept) 1.8726 0.1284 14.5856 < 1e-04
var1 -0.3451 0.1041 -3.3139 0.00091993
log(dispersion) parameter:
Estimate Std.Err
0.4942 0.2859
Integration:
method: adaptive Gauss-Hermite quadrature rule
quadrature points: 11
Optimization:
method: hybrid EM and quasi-Newton
converged: TRUE

How to interpret significant test of moderators and null percentage of explained variance in meta-analysis using "metafor" package?

I'm currently working on a meta-analysis of proportions (number of mosquitoes transmitting a disease / number of mosquitoes tested) using the metafor package (Viechtbauer, 2010). My aim is to compute a summary effect size for each of five mosquito species. So far, my analysis strategy is:
Using the PFT (double arcsine) transformation to normalize the data (I have lots of 0 and 1 values)
Running an overall model to assess heterogeneity (the test for residual heterogeneity is significant)
Assessing the need for a three-level model, since several measures come from each of the articles included in the meta-analysis (the LRT is significant for "measure" nested in "article")
Using subgroup analysis, with mosquito species as the moderator
Assessing residual heterogeneity
Testing some moderators to try to explain the residual heterogeneity
When I performed the subgroup analysis, I got a significant test of moderators for the variable "Specie". But I then wanted to know what proportion of the variance is explained by this variable, and I obtained -0.9% (truncated at 0%) (I used the "pseudo R-squared" method suggested by W. Viechtbauer here).
So, my question is: is it possible/coherent to have a significant test of moderators and yet no variance explained by that moderator? How can this be explained?
As I use REML estimation, I can't use an LRT to test whether the variable is significant (and I would rather not switch back to ML just to compute an LRT).
Thanks in advance if someone can help me,
Best regards,
Alex
If useful, here is an excerpt of the code I used:
ies.da <- escalc(xi = data_test[, "n"],
                 ni = data_test[, "n_tested"],
                 data = data,
                 measure = "PFT",
                 add = 0)

subganal.specie.mv <- rma.mv(yi, vi,
                             data = ies.da,
                             mods = ~ factor(Specie),
                             method = "REML",
                             random = ~ 1 | article/measure)

subganal.no.specie.mv <- rma.mv(yi, vi,
                                data = ies.da,
                                method = "REML",
                                random = ~ 1 | article/measure)

pseudo.r.squared <- (sum(subganal.no.specie.mv$sigma2) - sum(subganal.specie.mv$sigma2)) /
                    sum(subganal.no.specie.mv$sigma2)
As a result, I get a significant test of moderators:
subganal.specie.mv
Multivariate Meta-Analysis Model (k = 165; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0184 0.1358 21 no article
sigma^2.2 0.0215 0.1466 165 no article/measure
Test for Residual Heterogeneity:
QE(df = 161) = 897.9693, p-val < .0001
Test of Moderators (coefficients 2:4):
QM(df = 3) = 12.3578, p-val = 0.0063
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 0.6172 0.1051 5.8729 <.0001 0.4112 0.8232 ***
factor(Specie)1 -0.0123 0.1240 -0.0990 0.9211 -0.2554 0.2308
factor(Specie)2 -0.2110 0.1178 -1.7913 0.0733 -0.4419 0.0199 .
factor(Specie)3 -0.2299 0.1008 -2.2813 0.0225 -0.4274 -0.0324 *
But my "pseudo R squared" is null:
pseudo.r.squared
[1] -0.009012437
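(For what it's worth, an LRT for the moderator would require refitting both models with ML; a minimal sketch reusing the objects above:)

# REML fits cannot be compared via LRT when fixed effects differ; refit with ML
fit.ml.full <- update(subganal.specie.mv, method = "ML")
fit.ml.null <- update(subganal.no.specie.mv, method = "ML")
anova(fit.ml.null, fit.ml.full)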

Compute the confidence interval of the hazard ratio of a survival regression model

I tried to get the confidence interval of the hazard ratio from the survreg() function, but it's not as straightforward as with coxph().
Call:
survreg(formula = Surv(survival, DIED) ~ AGE + GENDER + PLATE +
NEUTRO + NIH, data = IMRAWandIST, dist = "exponential")
Value Std. Error z p
PLATE 0.00236 0.000367 6.42 1.39e-10
NEUTRO -0.02726 0.016695 -1.63 1.03e-01
Scale fixed at 1
Exponential distribution
Loglik(model)= -4628.6 Loglik(intercept only)= -4736.1
Chisq= 215 on 5 degrees of freedom, p= 0
Number of Newton-Raphson Iterations: 5
n=917 (28 observations deleted due to missingness)
# estimate of beta
a <- c(coefficients(summary(fitexp)))
print(coef <- a * -1)
# estimate of HR
print(HR <- exp(coef))
The result doesn't include a CI, only standard errors, so my question is: how can I transform the SE of an AFT coefficient onto the PH scale and then compute the CI of the HR?
I'm kind of stuck here. Can someone help?
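For an exponential AFT model (Scale fixed at 1), the PH coefficients are simply the negated AFT coefficients and the standard errors carry over unchanged, so Wald intervals can be exponentiated onto the HR scale. A minimal sketch, assuming fitexp is the fitted survreg object:

tab  <- summary(fitexp)$table          # columns: Value, Std. Error, z, p
beta <- -tab[, "Value"]                # PH coefficient = -(AFT coefficient)
se   <- tab[, "Std. Error"]            # SE is unchanged by the sign flip
HR   <- exp(beta)
cbind(HR,
      lower = exp(beta - 1.96 * se),
      upper = exp(beta + 1.96 * se))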

R: loglikelihood of Saturated Model in GLM

Let LL = log-likelihood. Then
Residual deviance = 2(LL(saturated model) - LL(proposed model))
However, when I use the glm function, it seems that
Residual deviance = -2 LL(proposed model)
For example,
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(mylogit)
###
Residual deviance: 458.52 on 394 degrees of freedom
AIC: 470.52
#Residual deviance
-2*logLik(mylogit)
##'log Lik.' 458.5175 (df=6)
#AIC
-2*logLik(mylogit)+2*(5+1)
##470.5175
Where is LL(saturated model), and how can I get its value in R?
Thank you.
I have got the answer: the two coincide only when the log-likelihood of the saturated model is 0, which for discrete models implies that the probability of the observed data under the saturated model is 1. Binary data is pretty much the only case where this is true (because the individual fitted probabilities become either zero or one). See here and here for details.
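You can verify this numerically for the example above: with ungrouped binary data the saturated model predicts each observation exactly, so its log-likelihood is zero and the residual deviance collapses to -2 LL(proposed model):

# saturated model: fitted probability equals the observed outcome for each row
sum(dbinom(mydata$admit, size = 1, prob = mydata$admit, log = TRUE))  # 0
deviance(mylogit)                    # 458.5175
-2 * as.numeric(logLik(mylogit))     # identical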

Access z-value and other statistics in output of Zelig relogit

I want to compute a logit regression for rare events. I decided to use the Zelig package (relogit function) to do so.
Usually, I use stargazer to extract and save regression results. However, there seem to be compatibility issues between these two packages (Using stargazer with Zelig).
I now want to extract the following information from the Zelig relogit output:
Coefficients, z values, p values, number of observations, log likelihood, AIC
I have managed to extract the p-values and coefficients, but I failed at the rest. I am sure these values must be accessible somehow, because they are reported in the summary() output (although I did not manage to store the summary output as an R object). The summary cannot be processed in the same way as a regular glm summary (https://stats.stackexchange.com/questions/176821/relogit-model-from-zelig-package-in-r-how-to-get-the-estimated-coefficients).
A reproducible example:
##Initiate package, model and data
require(Zelig)
data(mid)
z.out1 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years,
data = mid, model = "relogit")
##Call summary on output (reports in console most of the needed information)
summary(z.out1)
##Storing the summary fails and only produces a useless object
summary(z.out1) -> z.out1.sum
##Some of the output I can access as follows
z.out1$get_coef() -> z.out1.coeff
z.out1$get_pvalue() -> z.out1.p
z.out1$get_se() -> z.out1.se
I did not find similar commands for other elements, such as z values and AIC. Since they are shown in the summary() call, they should be accessible somehow.
The summary call result:
Model:
Call:
z5$zelig(formula = conflict ~ major + contig + power + maxdem +
mindem + years, data = mid)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.0742 -0.4444 -0.2772 0.3295 3.1556
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.535496 0.179685 -14.111 < 2e-16
major 2.432525 0.157561 15.439 < 2e-16
contig 4.121869 0.157650 26.146 < 2e-16
power 1.053351 0.217243 4.849 1.24e-06
maxdem 0.048164 0.010065 4.785 1.71e-06
mindem -0.064825 0.012802 -5.064 4.11e-07
years -0.063197 0.005705 -11.078 < 2e-16
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3979.5 on 3125 degrees of freedom
Residual deviance: 1868.5 on 3119 degrees of freedom
AIC: 1882.5
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
Use from_zelig_model() for deviance, AIC, and the like; it converts the Zelig fit into a regular glm object:
m <- from_zelig_model(z.out1)
m$aic
...
Z-values are coefficient / standard error:
z.out1$get_coef()[[1]]/z.out1$get_se()[[1]]
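Putting it together, everything on the list can be pulled either from the stored pieces or from the converted glm object (a sketch, using the same names as above):

m <- from_zelig_model(z.out1)        # plain glm object
AIC(m)                               # AIC
logLik(m)                            # log likelihood
nobs(m)                              # number of observations
z <- z.out1$get_coef()[[1]] / z.out1$get_se()[[1]]   # z values
2 * pnorm(-abs(z))                   # two-sided p values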
