Cross-validation for structural equation modeling in R

Not sure why it is so difficult to find information on this topic.
I want to cross-validate my SEM model (N = 360). I've pulled 70% of the data into a training set and built the model, first on theory and then using modification indices. I also have a test data frame with the observed values (for well-being), but I want to use the model to predict those values. lavPredict() only seems to predict values of latent variables. Perhaps I'm missing something, but it doesn't seem as straightforward as in lmer or basic linear regression. Does one just use the model fit indices from the test dataset? It seems like one should be able to compare observed and predicted values in SEM.
I've included some data here: https://drive.google.com/file/d/1AX50DFNik30Qsyiyp6XnPMETNfVXK83r/view?usp=sharing
Here is the final model I arrived at using the training dataset. When I go to test it, I just get this:
Error in lavPredict(fit.latent.8, newdata = test) :
inherits(object, "lavaan") is not TRUE
Thanks much!
fit.latent.8 <- '#factor loadings; measurement model portion
pl =~ exercisescore + mindfulnessscore + promistscore
sl =~ family_support + friendshipcount + friendshipnet + sense_of_community
trauma =~ neglectscore + abusescore + exposure + family_support + age + sesscore
#regressions: structural model
wellbeing ~ age + gender + ethnicity + sesscore + resiliencescore + pl + emotionalsupportscore + trauma
resiliencescore ~ age + sesscore + emotionalsupportscore + pl
emotionalsupportscore ~ sl + gender
#Covariances
friendshipnet~~age
friendshipnet ~~ abusescore
'
train.1 <- sem(fit.latent.8, data = train, meanstructure = TRUE, std.lv = TRUE)
summary(train.1, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE, estimates = FALSE)
modindices(train.1, sort. = TRUE, minimum.value = 10)
test.1 <- sem(fit.latent.8, data = test, meanstructure = TRUE, std.lv = TRUE)
summary(test.1, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE, estimates = FALSE)
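The error itself comes from passing the model-syntax string (fit.latent.8) to lavPredict() instead of a fitted lavaan object (here train.1). For the predicted-versus-observed comparison, a minimal sketch, assuming that in your lavaan version lavPredict() with type = "yhat" returns model-implied values that include wellbeing (worth checking the column names first):
pred <- lavPredict(train.1, newdata = test, type = "yhat")
colnames(pred)                              # check that "wellbeing" is among the returned columns
obs <- test$wellbeing
cor(pred[, "wellbeing"], obs)               # predicted vs. observed correlation
sqrt(mean((pred[, "wellbeing"] - obs)^2))   # out-of-sample RMSE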

Related

svyglm - how to code for a logistic regression model across all variables?

In R, using glm (or lm) you can include all variables simply by using a . in the formula, as shown in "How to succinctly write a formula with many variables from a data frame?", for example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so first I create my survey design:
des <- svydesign(ids = ~id, weights = ~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y ~ ., design = des, family = "binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames() to extract all the variable names from a design object and then reformulate(), probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
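Putting the pieces together with the objects from the question (a sketch; it assumes y is the response column in df, that id and wt are the only design-metadata columns, and it swaps in quasibinomial(), which svyglm recommends over family = "binomial" to avoid warnings about non-integer successes):
vars <- setdiff(colnames(des), c("id", "wt", "y"))   # drop design metadata and the response
f <- reformulate(vars, response = "y")
binom <- svyglm(f, design = des, family = quasibinomial())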

how to visualize the coefficients from different models in just one plot?

I have two different datasets. To each one I apply the same plm regression. I would like to know how I can visualize, in the same plot, the estimated coefficients of each model.
mainstream <- plm(log(sum_plays) ~ cancel_public_events + close_public_transport + internationaltravel + restrictions_on_gatherings + school_closing + stay_at_home_requirements + workplace_closing + new_cases_per_million + new_deaths_per_million,
                  data = top200, model = "within")
long_tail <- plm(log(sum_plays) ~ cancel_public_events + close_public_transport + internationaltravel + restrictions_on_gatherings + school_closing + stay_at_home_requirements + workplace_closing + new_cases_per_million + new_deaths_per_million,
                 data = bottom, model = "within")
I can make the plot for each individual model; however, I want to have the information from both plots in just one, ideally differentiating the coefficients by colour (i.e. coefficients from "mainstream" in red and coefficients from "long_tail" in blue).
a <- plot_model(long_tail, transform = NULL, show.values = TRUE, value.offset = .3, terms = c("workplace_closing", "stay_at_home_requirements", "school_closing", "close_public_transport", "internationaltravel", "restrictions_on_gatherings", "cancel_public_events"), title = "Coefficients for Long-Tail Music Consumption")
b <- plot_model(mainstream, transform = NULL, show.values = TRUE, value.offset = .3, terms = c("workplace_closing", "stay_at_home_requirements", "school_closing", "close_public_transport", "internationaltravel", "restrictions_on_gatherings", "cancel_public_events"), title = "Coefficients for Mainstream Music Consumption")
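One option, sketched here on the assumption that sjPlot's plot_models() handles plm objects the same way plot_model() does above, is to pass both fitted models to plot_models(), which overlays their estimates in a single forest plot and separates the models by colour:
library(sjPlot)
plot_models(mainstream, long_tail,
            transform = NULL, show.values = TRUE,
            m.labels = c("Mainstream", "Long tail"),
            colors = c("red", "blue"))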

Fitting two coefplot in one graph using par(mfrow()) method

I'm trying to arrange two coefplot objects into one graph via the par(mfrow = c(1, 2)) method, but it didn't work out. What did I do wrong? Or is it that coefplot just doesn't work this way? What would be an alternative method?
I've referenced this earlier thread, but I tend to think that mine is quite a different issue.
# load the data
dat <- readRDS(url("https://www.dropbox.com/s/88h7hmiroalx3de/act.rds?dl=1"))
#fit two models
library(lme4)
action1.fit <- glmer(act1 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
    (1 | region_id), data = dat, family = binomial, control = glmerControl(optimizer = "bobyqa"),
    nAGQ = 10)
action2.fit <- glmer(act2 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
    (1 | region_id), data = dat, family = binomial, control = glmerControl(optimizer = "bobyqa"),
    nAGQ = 10)
# plot the two models individually
library(coefplot)
# construct coefplot objects
coefplot:::buildModelCI(action1.fit)
coefplot:::buildModelCI(action2.fit)
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# arrange two plots in one graph
par(mfrow=c(1,2))
coefplot(action1.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# didn't work ???
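par(mfrow = ...) only affects base graphics, and coefplot() builds its plots with ggplot2, so the layout setting is silently ignored. A sketch of an alternative, assuming the two glmer fits above: save the two ggplot objects that coefplot() returns and arrange them with gridExtra (patchwork would work too).
library(gridExtra)
p1 <- coefplot(action1.fit, coefficients = c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
               intercept = FALSE, color = "brown3")
p2 <- coefplot(action2.fit, coefficients = c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
               intercept = FALSE, color = "steelblue")
grid.arrange(p1, p2, ncol = 2)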

How to use predict function with my pooled results from mice()?

Hi, I just started using R as part of a module in school. I have a data set with missing data and I have used mice() to impute the missing values. I'm now trying to use the predict() function with my pooled results. However, I observed the following error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('mipo', 'data.frame')"
I have included my entire code below and I'd greatly appreciate it if y'all can help a novice out. Thanks!
```{r}
library(magrittr)
library(dplyr)
train = read.csv("Train_Data.csv", na.strings=c("","NA"))
test = read.csv("Test_Data.csv", na.strings=c("","NA"))
cols <- c("naCardiac", "naFoodNutrition", "naGenitourinary", "naGastrointestinal", "naMusculoskeletal", "naNeurological", "naPeripheralVascular", "naPain", "naRespiratory", "naSkin")
train %<>%
  mutate(across(all_of(cols), factor))
test %<>%
  mutate(across(all_of(cols), factor))
str(train)
str(test)
```
```{r}
library(mice)
md.pattern(train)
```
```{r}
miTrain = mice(train, m = 5, maxit = 50, meth = "pmm")
```
```{r}
model = with(miTrain, lm(LOS ~ Age + Gender + Race + Temperature + RespirationRate + HeartRate + SystolicBP + DiastolicBP + MeanArterialBP + CVP + Braden + SpO2 + FiO2 + PO2_POCT + Haemoglobin + NumWBC + Haematocrit + NumPlatelets + ProthrombinTime + SerumAlbumin + SerumChloride + SerumPotassium + SerumSodium + SerumLactate + TotalBilirubin + ArterialpH + ArterialpO2 + ArterialpCO2 + ArterialSaO2 + Creatinine + Urea + GCS + naCardiac + naFoodNutrition + naGenitourinary + naGastrointestinal + naMusculoskeletal + naNeurological + naPeripheralVascular + naPain + naRespiratory + naSkin))
model
summary(model)
```
```{r}
modelResults = pool(model)
modelResults
```
```{r}
pred = predict(modelResults, newdata = test)
PredTest = data.frame(test$PatientID, pred)
str(PredTest)
summary(PredTest)
```
One slightly hacky way to achieve this may be to take one of the fitted models stored inside the with() result and replace its stored coefficients with the final pooled estimates. I haven't done detailed testing, but it seems to work on this simple example:
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2)
fit <- with(data = imp, exp = lm(bmi ~ hyp + chl))
pooled <- pool(fit)
# Copy one of the fitted lm models fit to
# one of the imputed datasets
pooled_lm = fit$analyses[[1]]
# Replace the fitted coefficients with the pooled
# estimates (need to check they are replaced in
# the correct order)
pooled_lm$coefficients = summary(pooled)$estimate
# Predict - predictions seem to match the
# pooled coefficients rather than the original
# lm that was copied
predict(fit$analyses[[1]], newdata = nhanes)
predict(pooled_lm, newdata = nhanes)
As far as I know predict() for a linear regression should only depend
on the coefficients, so you shouldn't have to replace other
stored values in the fitted model (but you would have to if applying
methods other than predict()).
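Applied to the objects in the question, the same idea would look roughly like this (a sketch only: it assumes the order of summary(modelResults)$estimate matches coef() of the individual fits, that test has the same columns and factor levels as train, and that rows with missing predictors will come back as NA):
pooled_lm <- model$analyses[[1]]                          # one of the fitted lm models
pooled_lm$coefficients <- summary(modelResults)$estimate  # overwrite with pooled estimates
pred <- predict(pooled_lm, newdata = test)
PredTest <- data.frame(test$PatientID, pred)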

Predict function for Heckman model

I use the example from the sampleSelection package
## Greene( 2003 ): example 22.8, page 786
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
# Two-step estimation
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
# ML estimation
test2 = selection( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
pr2 <- predict(test2,Mroz87)
pr1 <- predict(test1,Mroz87)
My problem is that the predict function does not work. I get this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('selection', 'maxLik', 'maxim', 'list')"
The predict function works for many models so I wonder why I get an error for heckman regression models.
-----------UPDATE-----------
I made some progress, but I still need your help. I built an original Heckman model for comparison:
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87[1:600,] )
After that I start building it on my own. The Heckman model requires a selection equation,
z_i* = w_i'γ + u_i, where z_i = 1 if z_i* > 0 and z_i = 0 if z_i* <= 0,
and then an outcome equation y_i = x_i'β + e_i that is estimated ONLY for the cases where z_i* > 0.
I build the probit model first:
library(MASS)
#probit1 = probit(lfp ~ age + I( age^2 ) + faminc + kids + educ, Mroz87, x = TRUE, print.level = print.level - 1, iterlim = 30)
myprobit <- glm(lfp ~ age + I( age^2 ) + faminc + kids + educ, family = binomial(link = "probit"),
data = Mroz87[1:600,])
summary(myprobit)
The model is exactly the same as the one used by the heckit command.
Then I build a lm model:
#get predictions for the variables (the data is not needed but I specify it anyway)
selectvar <- predict(myprobit, newdata = Mroz87[1:600,])
#bind the prediction to the table (I build a new one in my case)
newdata = cbind(Mroz87[1:600,],selectvar)
#Build an lm model for the subset where zi>0
lm1 = lm(wage ~ exper + I( exper^2 ) + educ + city , newdata, subset = selectvar > 0)
summary(lm1)
My issue now is that the lm model does not match the one created by heckit. I have no idea why. Any ideas?
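A likely reason for the mismatch: heckit()'s second stage keeps the cases with observed lfp == 1 (not the cases with predicted z* > 0) and adds the inverse Mills ratio from the probit as an extra regressor. A sketch of that two-step procedure done by hand on the same Mroz87[1:600, ] subsample (the object names est, xb and imr are just for illustration):
est <- Mroz87[1:600, ]
probit1 <- glm(lfp ~ age + I(age^2) + faminc + kids + educ,
               family = binomial(link = "probit"), data = est)
xb <- predict(probit1, type = "link")    # linear predictor w'gamma
est$imr <- dnorm(xb) / pnorm(xb)         # inverse Mills ratio
step2 <- lm(wage ~ exper + I(exper^2) + educ + city + imr,
            data = est, subset = lfp == 1)
coef(step2)   # the outcome-equation coefficients should match those from heckit()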
Implementation
Here is an implementation of the predict.selection function -- it produces 4 different types of predictions (which are explained here):
library(Formula)
library(sampleSelection)
predict.selection = function(objSelection, dfPred,
                             type = c('link', 'prob', 'cond', 'uncond')) {
  type = match.arg(type)  # switch() below needs a single type, not the whole default vector
  # construct the Formula object
tempS = evalq(objSelection$call$selection)
tempO = evalq(objSelection$call$outcome)
FormHeck = as.Formula(paste0(tempO[2], '|', tempS[2], '~', tempO[3], '|', tempS[3]))
# regressor matrix for the selection equation
mXSelection = model.matrix(FormHeck, data = dfPred, rhs = 2)
# regressor matrix for the outcome equation
mXOutcome = model.matrix(FormHeck, data = dfPred, rhs = 1)
# indices of the various parameters in selectionObject$estimate
vIndexBetaS = objSelection$param$index$betaS
vIndexBetaO = objSelection$param$index$betaO
vIndexErr = objSelection$param$index$errTerms
# get the estimates
vBetaS = objSelection$estimate[vIndexBetaS]
vBetaO = objSelection$estimate[vIndexBetaO]
dLambda = objSelection$estimate[vIndexErr['rho']]*
objSelection$estimate[vIndexErr['sigma']]
# depending on the type of prediction requested, return
# TODO allow the return of multiple prediction types
pred = switch(type,
link = mXSelection %*% vBetaS,
prob = pnorm(mXSelection %*% vBetaS),
uncond = mXOutcome %*% vBetaO,
cond = mXOutcome %*% vBetaO +
dnorm(temp <- mXSelection %*% vBetaS)/pnorm(temp) * dLambda)
return(pred)
}
Test
Suppose you estimate the following Heckman sample selection model using MLE:
data(Mroz87)
# define a new variable
Mroz87$kids = (Mroz87$kids5 + Mroz87$kids618 > 0)
# create the estimation sample
Mroz87Est = Mroz87[1:600, ]
# create the hold out sample
Mroz87Holdout = Mroz87[601:nrow(Mroz87), ]
# estimate the model using MLE
heckML = selection(selection = lfp ~ age + I(age^2) + faminc + kids + educ,
outcome = wage ~ exper + I(exper^2) + educ + city, data = Mroz87Est)
summary(heckML)
The different types of predictions are computed as below:
vProb = predict(heckML, Mroz87Holdout, type = 'prob')
vLink = predict(heckML, Mroz87Holdout, type = 'link')
vCond = predict(heckML, Mroz87Holdout, type = 'cond')
vUncond = predict(heckML, Mroz87Holdout, type = 'uncond')
You can verify these computations on a platform that produces these outputs, such as Stata.
