I'm building a structural equation model that incorporates 4 latent variables: physical lifestyle, social lifestyle, trauma score, and the DV (well-being).
We have a 7-question survey of just well-being, but I think it would be sounder (less measurement error) to combine three surveys (well-being, depression, and anxiety) into a latent dependent variable. When I used only the scaled scores from the surveys, I received a warning that the covariance matrix was not positive definite, so I decided to use the individual survey questions as indicators instead. However, when I do this and look at the modification indices, the output suggests that the residuals are not currently correlated. I thought correlated residuals were the default for any latent variable, so I am wondering whether I am specifying the well-being latent variable correctly (that is, whether it is just a matter of adding all the questions that will ultimately comprise this latent variable).
Below is the entire model. The latent variable "well-being" currently only has questions from the PHQ-9 depression survey and the GAD-7 general anxiety survey (I will also be adding the questions from the well-being survey). I've added the output for the modification indices below that.
I've included some data here: https://drive.google.com/file/d/1AX50DFNik30Qsyiyp6XnPMETNfVXK83r/view?usp=sharing
Thanks much!
fit.latent_wb <- '
#factor loadings; measurement model portion
pl =~ exercisescore + mindfulnessscore + promistscore
sl =~ family_support + friendshipcount + friendshipnet +
sense_of_community + sesscore + ethnicity
trauma =~ neglectscore + abusescore + exposure + family_support + age
wb =~ phq9_1 + phq9_2 + phq9_3 + phq9_4 + phq9_5 + phq9_6 +
phq9_7 + phq9_8 + phq9_9 + gad7_1 + gad7_2 + gad7_3 + gad7_4 +
gad7_5+ gad7_6+ gad7_7
#regressions: structural model
wb ~ age + gender + ethnicity + sesscore + resiliencescore +
pl + emotionalsupportscore + trauma
resiliencescore ~ age + sesscore + emotionalsupportscore + sl
emotionalsupportscore ~ sl + gender
friendshipnet~~age
exercisescore~~sense_of_community
'
fit.latent_wb <- sem(fit.latent_wb, data = total, meanstructure = TRUE, std.lv = TRUE)
summary(fit.latent_wb, fit.measures = TRUE,standardized = TRUE, rsquare = TRUE, estimates = FALSE)
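For reference, here is a minimal sketch of how lavaan treats indicator residuals by default and how modification indices can be requested. It uses lavaan's built-in HolzingerSwineford1939 data rather than the survey data, and the two-factor model and the names toy_model and toy_fit are only illustrative assumptions. By default, residual covariances between indicators are fixed to zero, which is why modification indices can propose freeing them.

```r
library(lavaan)

# Illustrative two-factor model on lavaan's built-in example data
# (an assumption for this sketch, not the well-being model above)
toy_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
toy_fit <- cfa(toy_model, data = HolzingerSwineford1939, std.lv = TRUE)

# Estimated parameters: residual covariances between different indicators
# are absent because lavaan fixes them to zero by default
est <- parameterEstimates(toy_fit)
indicators <- paste0("x", 1:6)
resid_cov <- subset(est, op == "~~" & lhs != rhs & lhs %in% indicators)
nrow(resid_cov)

# Modification indices, largest first; rows with op "~~" are candidate
# (co)variances that are currently fixed
mi <- modindices(toy_fit, sort. = TRUE)
head(subset(mi, op == "~~"))
```

A large index for a pair such as x1 ~~ x2 would suggest a residual covariance worth considering, ideally only when there is a substantive reason for it (for example, similarly worded items).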
Output for Mod Indices:
I am trying to run a regression with panel data from the Michigan Consumers Survey. It is the first time I am using panel data in R, so I am not very familiar with the "plm" package that is needed. I am setting up my panel data for fixed effects on individuals (CASEID) and time (YYYY):
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
Then I am using the following regression:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
However R is showing me the following error:
> mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, :
empty model
Does anyone know what I am doing wrong?
Could you give the link to this specific survey? I found various datasets with this name.
I suspect (only suspect) that your data isn't panel data; please check the CASEID variable.
Changing the order of formula and data in plm won't solve your problem.
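A quick way to check that suspicion is sketched below in base R. The data frame toy is a hypothetical stand-in for Michigan_survey with made-up values; only the column names CASEID, YYYY and ICS come from the question. If CASEID identifies a single interview rather than a respondent observed in several years, the within transformation removes all variation, which is one situation in which plm can be left with nothing to estimate.

```r
# Hypothetical stand-in for Michigan_survey: each CASEID appears only once
toy <- data.frame(
  CASEID = 1:6,
  YYYY   = rep(c(2001, 2002), each = 3),
  ICS    = c(90, 85, 100, 70, 95, 80)
)

# How many rows does each individual contribute?
obs_per_id <- table(toy$CASEID)
max(obs_per_id)

# The within transformation subtracts each individual's mean;
# with one row per individual, every demeaned value is zero
ics_within <- toy$ICS - ave(toy$ICS, toy$CASEID)
all(ics_within == 0)
```

Running the same table() call on the real CASEID column shows whether any respondent is observed more than once.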
I think the error comes from how you wrote the model. Your code is this:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
In my view, you have to specify the index in the call and follow the argument order of the plm package. I would write your call as follows:
mod_1 <- plm(ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq,
data = Michigan_panel,
index= c("CASEID", "YYYY"),
model = "within")
1. Different Approach
As far as I know, we can also write this model in a more elegant form.
library(plm)
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
attach(Michigan_panel)
y <- cbind(ICS)
X <- cbind(ICE,PX1Q2,RATEX,ZLB,INCOME,AGE,EDUC,MARRY,SEX,AGE_sq)
model1 <- plm(y~X+factor(CASEID)+factor(YYYY), data=Michigan_panel, model="within")
summary(model1)
detach()
Adding factor(CASEID) and factor(YYYY) will add dummy variables to your model.
I am still new to R and still struggling. I am trying to do a logistic regression using categorical and continuous variables, and I am supposed to select the right variables for my model. There are 27 variables and 8,000 observations.
I have gone through a couple of articles online, including ones on stepwise regression by AIC, and all I do is confuse myself more. I was also told to select my variables from the correlation matrix, but when I compute the correlations I don't seem to find any, especially with the categorical variables. I also tried to fit the full model and I get some variables with a p-value less than 0.5. This is the code:
d4 <- d3[,c('SW','MOI','YOI','DOI_CMC','RMOB','RYOB','RDOB_CMC',
'RCA','Region','TPR','DPR','NV','HEL','Has_Radio','Has_TV',
'Religion','WI','MOFB','YOB','DOB_CMC','DOFB_CMC','AOR','MTFBI',
'DSOUOM_CMC','RW','RH','RBMI')]
cor(d4)
d5 <- cor(d4)
round(cor(d4),2)
When I select the significant variables and try to apply logistic regression, all the p-values are between 0.9 and 1. See code:
d3 <- lm(TPR ~ SW + MOI + RMOB + RYOB + RCA + Region + TPR + DPR +
NV + HEL + Has_Radio + Has_TV + Religion + WI + MOFB +
YOB + DOB_CMC + DOFB_CMC + AOR + MTFBI + DSOUOM_CMC +
RW + RH + RBMI,
data = d3, family = "binomial")
summary(d3)
I need help with this please!!
Here is the sample of d3
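As a point of comparison, here is a minimal sketch of a logistic regression fitted with glm() on simulated data. The data frame toy and the variables y, x_num and x_cat are hypothetical and not taken from d3. Two things in the call above are worth checking against it: lm() does not fit a logistic model (it disregards the family argument with a warning), and the response TPR also appears among the predictors.

```r
set.seed(42)

# Simulated data: a binary outcome, one continuous and one categorical predictor
# (all names here are hypothetical)
n <- 500
toy <- data.frame(
  x_num = rnorm(n),
  x_cat = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * toy$x_num))

# glm() with family = binomial fits the logistic regression;
# factor predictors are expanded into dummy variables automatically
logit_fit <- glm(y ~ x_num + x_cat, data = toy, family = binomial)
summary(logit_fit)

# Store the model under a new name so the data frame is not overwritten
class(logit_fit)
```

Keeping the fitted model in its own object also avoids replacing the data frame d3 with the model, as happens in the call above.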
I am running a gam model based on a large dataset with many variables. My response variable is the level of "recruitment" by a herd every fall/autumn. This is calculated by the fawn:female ratio every fall/autumn over a 60 year period.
My problem is that there are many years and study sites where only between 1 and 10 females are recorded. This means that the ratio is not robust. For example, if one female and one fawn are seen, recruitment is 100%, but if one more female is seen, it drops to 50%!
I need to tell the model that years/study sites with smaller sample sizes should be weighted less than those with larger sample sizes as these smaller sample sizes are no doubt affecting the results.
Above is a table of the females observed every year and a histogram of the same.
My model is as follows:
gamFIN <- gam(Fw.FratioFall
~ s(year)
+ s(percentage_woody_coverage)
+ s(kmRoads.km2)
+ s(WELLS_ACTIVEinsideD)
+ s(d3)
+ s(WT_DEER_springsurveys)
+ s(BadlandsCoyote.1000_mi)
+ s(Average_mintemp_winter, BadlandsCoyote.1000_mi)
+ s(BadlandsCoyote.1000_mi, WELLS_ACTIVEinsideD)
+ s(BadlandsCoyote.1000_mi, d3)
+ s(YEAR, bs = "re") + s(StudyArea, bs = "re"), method = "REML", select = T, data = mydata)
How might I tell the model to weight my response variable by the sample size it is based on?
Do not model this as a ratio for your outcome. Instead, model the fawn counts as your outcome and account for the female counts via an offset() term, using logged values on the RHS of the formula. That is, you should be offsetting with the log of the female count. So the formula would look like this:
Fawns
~ s(year)
+ all_those_smooth_terms
+ offset( lnFemale_counts)
The Poisson family used for count outcomes has a log link by default, which is the reason for logging the female counts.
Edit (Gavin's correct. The default for gam is not a log link, so the family has to be specified):
gamFIN <- gam(FawnFall ~ s(year) + s(percentage_woody_coverage) + s(kmRoads.km2) +
s(WELLS_ACTIVEinsideD) + s(d3) + s(WT_DEER_springsurveys) +
s(BadlandsCoyote.1000_mi) + s(Average_mintemp_winter, BadlandsCoyote.1000_mi) +
s(BadlandsCoyote.1000_mi, WELLS_ACTIVEinsideD) + s(BadlandsCoyote.1000_mi, d3) +
s(YEAR, bs = "re") + s(StudyArea, bs = "re") + offset(log(FemaleFall)),
family="poisson", method = "REML", select = T, data = mydata)
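A small simulated sketch of the offset idea follows. The data frame toy and the columns fawns and females are hypothetical, and the single smooth of year stands in for the full set of terms. With a Poisson family and a log link, adding offset(log(females)) means the smooths describe the fawn:female rate, while rows based on few females carry correspondingly less information.

```r
library(mgcv)
set.seed(1)

# Simulated surveys: number of females seen and fawns counted per record
# (all names and values here are hypothetical)
n <- 200
toy <- data.frame(
  year    = sample(1960:2019, n, replace = TRUE),
  females = rpois(n, lambda = 20) + 1   # +1 keeps log(females) finite
)
true_rate <- exp(-0.5 + 0.01 * (toy$year - 1990))
toy$fawns <- rpois(n, lambda = toy$females * true_rate)

# Counts as the response, log(females) as the offset
rate_fit <- gam(fawns ~ s(year) + offset(log(females)),
                family = poisson, method = "REML", data = toy)
summary(rate_fit)
```

Records with zero females would need separate handling, since log(0) is not finite.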
I have a serious problem with R. I have not figured out how to run a logit regression with an instrumental variable.
The tricky thing is that I have 2 independent variables that form an interaction term, but the instrument only works for one of the two independent variables. Further, I have a couple of controls.
I tried a couple of things with the ivreg function from the AER package, but I could not figure out what I have to type in the regression command.
I would be so grateful if somebody could help me.
I think this post is what you need:
http://www.r-bloggers.com/a-simple-instrumental-variables-problem/
The code in the post:
library(AER)
library(lmtest)
data("CollegeDistance")
cd.d<-CollegeDistance
simple.ed.1s<- lm(education ~ distance,data=cd.d)
cd.d$ed.pred<- predict(simple.ed.1s)
simple.ed.2s<- lm(wage ~ urban + gender + ethnicity + unemp + ed.pred , data=cd.d)
simple.comp<- encomptest(wage ~ urban + gender + ethnicity + unemp + ed.pred , wage ~ urban + gender + ethnicity + unemp + education , data=cd.d)
first.stage.ftest<- encomptest(education ~ tuition + gender + ethnicity + urban , education ~ distance , data=cd.d)
library(arm)
coefplot(lm(wage ~ urban + gender + ethnicity + unemp + education,data=cd.d),vertical=FALSE,var.las=1,varnames=c("Education","Unemp","Hispanic","Af-am","Female","Urban","Education"))
coefplot(simple.ed.2s , vertical=FALSE,var.las=1,varnames=c("Education","Unemp","Hispanic","Af-am","Female","Urban","Education"))
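For comparison, the same two-stage idea can be written in one call with ivreg() from AER. The sketch below reuses the CollegeDistance variables from the code above, and the object name iv_fit is just illustrative. Note that it is a linear IV model, so it shows the command syntax rather than a logit with an instrument.

```r
library(AER)
data("CollegeDistance")

# Formula layout: outcome ~ regressors | instruments
# Exogenous regressors appear on both sides of the bar;
# education is the endogenous regressor, instrumented by distance
iv_fit <- ivreg(
  wage ~ urban + gender + ethnicity + unemp + education |
    urban + gender + ethnicity + unemp + distance,
  data = CollegeDistance
)
summary(iv_fit)
```

Unlike the manual two-step version with predict(), ivreg() reports standard errors that account for the first stage.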
First of all, I am relatively new to R and haven't used lavaan (or growth models) before, so please excuse my ignorance.
I am doing my thesis and analyzing the U.S. financial industry during the financial crisis of 2007. I therefore have individual banks and several variables for each bank across time (from 2007-2013), some are time-variant (such as ROA or capital adequacy) and some are time-invariant (such as size or age). Some variables are also time-variant but not multi-level since they apply to all firms (such as the average ROA of the U.S. financial industry).
First of all, can I use lavaan's growth curve model ("growth") in this instance? The example given in the tutorial is for either time-varying variables (c) that influence the outcome (DV) or time-invariant variables (x1 & x2) that influence the slope (s) and intercept (i). What about time-varying variables that influence the slope and intercept? I couldn't find an example of this syntax.
Also, how do I specify the "groups" (i.e. different banks) in my analysis? Is it actually possible to do a multi-level growth curve model in lavaan (or R for that matter)?
Last but not least, I could not find out how to import a multilevel dataset into R. My dataset is basically a 3-dimensional matrix (different variables for different firms across time), so how do I input that via SPSS (or notepad?)?
Any help is much appreciated, I am basically lost on how to implement this model and sincerely need some assistance...
Thank you all in advance for your time!
Harry
edit: Here is the syntax that I have come up with so far. Do you think it makes sense?
ETHthesismodel <- '
# intercept and slope with fixed coefficients
i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
#regressions (independent variables that influence the slope & intercept)
i ~ high_constr_2007 + high_constr_2008 + ... + low_constr_2007 + low_constr_2008 + ... + ... diff_2013
s ~ high_constr_2007 + high_constr_2008 + ... + low_constr_2007 + low_constr_2008 + ... + ... diff_2013
# time-varying covariates (control variables)
t1 ~ size_2007 + cap_adeq_2007 + brand_2007 +... + acquisitions_2007
t2 ~ size_2008 + cap_adeq_2008 + brand_2008 + ... + acquisitions_2008
...
t7 ~ size_2013 + cap_adeq_2013 + brand_2013 + ... + acquisitions_2013
'
fit <- growth(ETHthesismodel, data = inputdata,
group = "bank")
summary(fit)
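On the data-layout question, lavaan's growth() works with one row per unit and one column per variable and time point (wide format). Below is a minimal base R sketch of getting from a long bank-by-year table to that layout. The data frame long, its values and the column roa are hypothetical.

```r
# Hypothetical long-format data: one row per bank and year
long <- data.frame(
  bank = rep(c("A", "B", "C"), each = 3),
  year = rep(2007:2009, times = 3),
  roa  = c(1.2, 0.8, 0.5, 0.9, 0.7, 0.6, 1.5, 1.1, 1.0)
)

# Reshape to wide format: one row per bank, one roa column per year
wide <- reshape(long, idvar = "bank", timevar = "year",
                direction = "wide", sep = "_")
names(wide)
nrow(wide)
```

In this layout each bank is a row (a case), so the banks would not be passed through the group argument, which in lavaan requests a multi-group analysis with a separate set of parameters per group.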