R: Logit Regression with Instrumental Variable and Interaction Term

I am stuck on a problem in R: I cannot figure out how to run a logit regression with an instrumental variable.
The tricky part is that two of my independent variables enter as an interaction term, but the instrument applies to only one of them. I also have several control variables.
I tried a few things with ivreg from the AER package, but I could not work out what to type in the regression command.
I would be grateful for any help.

I think this post is what you need:
http://www.r-bloggers.com/a-simple-instrumental-variables-problem/
The code in the post:
library(AER)     # provides the CollegeDistance data
library(lmtest)  # provides encomptest()
data("CollegeDistance")
cd.d <- CollegeDistance
# First stage: regress the endogenous variable (education) on the instrument (distance)
simple.ed.1s <- lm(education ~ distance, data = cd.d)
cd.d$ed.pred <- predict(simple.ed.1s)
# Second stage: use the first-stage prediction in place of education
simple.ed.2s <- lm(wage ~ urban + gender + ethnicity + unemp + ed.pred, data = cd.d)
# Encompassing test: second stage with predicted versus observed education
simple.comp <- encomptest(wage ~ urban + gender + ethnicity + unemp + ed.pred, wage ~ urban + gender + ethnicity + unemp + education, data = cd.d)
# Encompassing test for the first stage
first.stage.ftest <- encomptest(education ~ tuition + gender + ethnicity + urban, education ~ distance, data = cd.d)
library(arm)  # provides coefplot()
# Coefficient plots: plain OLS, then the manual second stage
coefplot(lm(wage ~ urban + gender + ethnicity + unemp + education, data = cd.d), vertical = FALSE, var.las = 1, varnames = c("Education", "Unemp", "Hispanic", "Af-am", "Female", "Urban", "Education"))
coefplot(simple.ed.2s, vertical = FALSE, var.las = 1, varnames = c("Education", "Unemp", "Hispanic", "Af-am", "Female", "Urban", "Education"))
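The code quoted from the post is a linear two-stage regression, while the question asks about a logit with an interaction. One commonly used alternative for that case is two-stage residual inclusion (a control-function approach), which is a different technique from the one in the post. The following is only a minimal sketch on simulated data, assuming a continuous endogenous variable; all variable names (x1, x2, c1, z, v_hat) are hypothetical.

```r
# Simulated data; every variable name here is hypothetical.
set.seed(1)
n  <- 2000
z  <- rnorm(n)   # instrument for x1
x2 <- rnorm(n)   # exogenous variable that interacts with x1
c1 <- rnorm(n)   # exogenous control
u  <- rnorm(n)   # unobserved confounder
x1 <- 0.8 * z + 0.3 * x2 + 0.2 * c1 + u + rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.7 * x1 + 0.4 * x2 + 0.3 * x1 * x2 + 0.2 * c1 + u))
d  <- data.frame(y, x1, x2, c1, z)

# Stage 1: regress the endogenous variable on the instrument
# and on all exogenous regressors.
stage1  <- lm(x1 ~ z + x2 + c1, data = d)
d$v_hat <- residuals(stage1)

# Stage 2: logit with the observed x1, the interaction, and the
# first-stage residual added as an extra regressor.
stage2 <- glm(y ~ x1 * x2 + c1 + v_hat, data = d, family = binomial)
coef(summary(stage2))
```

The standard errors printed by the second stage ignore that v_hat was estimated, so in practice both stages would usually be bootstrapped together.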

Related

Problems with Fixed effects panel data

I am trying to run a regression with a panel data from the Michigan Consumers Survey. It is the first time I am using panel data on R so I am not very aware of the package "plm" that is needed. I am setting my panel data for fixed effects on individuals (CASEID) and time (YYYY):
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
Then I am using the following regression:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
However R is showing me the following error:
> mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, :
empty model
Does anyone know what I am doing wrong?
Could you link to the specific survey? I found several datasets with this name.
I suspect (only suspect) that your data isn't panel data; please check the CASEID variable.
Changing the order of formula and data in plm won't solve your problem.
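A quick way to act on that suspicion is to count how often each CASEID occurs. If every individual appears only once, the within transformation subtracts each observation from itself and no regressor retains any variation, which is one plausible cause of the "empty model" error. The sketch below uses a small made-up data frame as a stand-in for Michigan_survey; only the column names come from the question.

```r
# Hypothetical stand-in for Michigan_survey; values are made up.
Michigan_survey <- data.frame(
  CASEID = 1:6,
  YYYY   = c(2019, 2019, 2020, 2020, 2021, 2021),
  ICS    = c(90, 85, 70, 75, 80, 82)
)

# Number of rows observed for each individual.
obs_per_id <- table(Michigan_survey$CASEID)

# Distribution of panel lengths: how many individuals appear once, twice, ...
table(obs_per_id)

# TRUE means no individual is observed more than once,
# so there is no within-individual variation to estimate from.
all(obs_per_id == 1)
```

On the real data, the same two calls show whether enough respondents are re-interviewed for a fixed-effects model to be estimable.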
I think the error comes from how you wrote the model. Your code is this:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
In my view, you should pass the index to plm explicitly and follow the package's argument order (formula first, then data). I would write your call as follows:
mod_1 <- plm(ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq,
data = Michigan_panel,
index= c("CASEID", "YYYY"),
model = "within")
1. Different Approach
As far as I know, this model can also be written more compactly:
library(plm)
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
attach(Michigan_panel)
y <- cbind(ICS)
X <- cbind(ICE,PX1Q2,RATEX,ZLB,INCOME,AGE,EDUC,MARRY,SEX,AGE_sq)
model1 <- plm(y~X+factor(CASEID)+factor(YYYY), data=Michigan_panel, model="within")
summary(model1)
detach(Michigan_panel)
Adding factor(CASEID) and factor(YYYY) adds individual and year dummy variables to your model.

Create Well-Being Latent Variable in Lavaan from Depression/Anxiety Questionnaire

I'm building a structural equation model that incorporates 4 latent variables: physical lifestyle, social lifestyle, trauma score, and the DV (well-being).
We have a 7-question survey of well-being alone, but I think it would be more sound (less measurement error) to combine three surveys (well-being, depression, and anxiety) into a latent dependent variable. When I used only the scaled scores from the surveys, I received a warning that the covariance matrix was not positive definite, so I decided to use the individual survey questions instead. However, when I do this and look at the modification indices, the output suggests that the residuals are not currently correlated. I thought correlated residuals were the default for any latent variable, so I am wondering whether I am specifying the well-being latent variable correctly (that is, whether it is just a matter of adding in all the questions that will ultimately make up this latent variable).
Below is the entire model. The latent variable "well-being" currently only has questions from the phq 9, Depression Survey; and the General Anxiety Survey (but will also be adding in the well-being survey). I've added the output for the modification indices below that.
I've included some data here: https://drive.google.com/file/d/1AX50DFNik30Qsyiyp6XnPMETNfVXK83r/view?usp=sharing
Thanks much!
fit.latent_wb <- '
#factor loadings; measurement model portion
pl =~ exercisescore + mindfulnessscore + promistscore
sl =~ family_support + friendshipcount + friendshipnet +
sense_of_community + sesscore + ethnicity
trauma =~ neglectscore + abusescore + exposure + family_support + age
wb =~ phq9_1 + phq9_2 + phq9_3 + phq9_4 + phq9_5 + phq9_6 +
phq9_7 + phq9_8 + phq9_9 + gad7_1 + gad7_2 + gad7_3 + gad7_4 +
gad7_5+ gad7_6+ gad7_7
#regressions: structural model
wb ~ age + gender + ethnicity + sesscore + resiliencescore +
pl + emotionalsupportscore + trauma
resiliencescore ~ age + sesscore + emotionalsupportscore + sl
emotionalsupportscore ~ sl + gender
friendshipnet~~age
exercisescore~~sense_of_community
'
fit.latent_wb <- sem(fit.latent_wb, data = total, meanstructure = TRUE, std.lv = TRUE)
summary(fit.latent_wb, fit.measures = TRUE,standardized = TRUE, rsquare = TRUE, estimates = FALSE)
Output for Mod Indices:

Implementing 2SLS with fixed effects and an endogenous interaction term AND weights in R

I want to model the following in R:
outcome = beta Var1 + beta Var2+ beta Var1:Var2+ controls + county FE + year FE
I have two instruments for Var1. I also need to weight by county population. What I need is to run a 2SLS regression, with two instruments for Var1, with county and year fixed effects, all weighted by county population.
The felm package doesn't seem to allow me to instrument for the interaction term. plm has not implemented weights for 2SLS analyses, so I can't use that package and weight by county population.
My questions are:
Can anyone recommend a package that allows me to instrument with the interaction term, include my two fixed effects, AND weight by county population?
If not, is there an easy way to correct my 2SLS standard errors if I use felm but use predicted values from the first-stage regression instead of felm's native 2SLS calculation?
Any and all help is appreciated! Thanks!
ETA: I also just tried ivreg from the AER package. I used this command:
test <- ivreg(data = mydata, outcome~ Var1 + Var2 +
Var1:Var2 + Control1+ Control2 + Control3 + Control4 +
Control5 + Control6 + Control7+
Control8 + as.factor(FIPS) + as.factor(Year) | Var2+
Control1 + Control2 + Control3 + Control4 +
Control5 + Control6 + Control7 +
Control8 + Instrument1+ Instrument2,
weights = mydata$population)
This got me the error:
In ivreg.fit(X, Y, Z, weights, offset, ...) :
more regressors than instruments
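The warning reflects a counting rule: the instrument side of the formula (after the |) must supply at least as many columns as the regressor side. In the command above, the fixed-effect dummies are not repeated after the |, and nothing instruments the Var1:Var2 interaction, which is endogenous whenever Var1 is. The sketch below only counts columns on simulated data with base R; the data values, the single control, and the choice of Instrument1:Var2 and Instrument2:Var2 as instruments for the interaction are assumptions for illustration, not a tested specification.

```r
# Simulated stand-in for mydata; values are made up, names follow the question.
set.seed(42)
n <- 200
mydata <- data.frame(
  Var1 = rnorm(n), Var2 = rnorm(n), Control1 = rnorm(n),
  Instrument1 = rnorm(n), Instrument2 = rnorm(n),
  FIPS = rep(1:20, each = 10), Year = rep(2001:2010, times = 20)
)

# Regressor side as in the question (one control shown for brevity).
X <- model.matrix(~ Var1 + Var2 + Var1:Var2 + Control1 +
                    as.factor(FIPS) + as.factor(Year), data = mydata)

# Instrument side as in the question: no fixed effects,
# and nothing that instruments the interaction.
Z_question <- model.matrix(~ Var2 + Control1 + Instrument1 + Instrument2,
                           data = mydata)

# Instrument side with the fixed effects repeated and
# instrument-by-Var2 interactions added (an assumption).
Z_fixed <- model.matrix(~ Var2 + Control1 + as.factor(FIPS) + as.factor(Year) +
                          Instrument1 + Instrument2 +
                          Instrument1:Var2 + Instrument2:Var2, data = mydata)

# Compare column counts: the instrument side needs at least as many as X.
c(regressors = ncol(X), question = ncol(Z_question), fixed = ncol(Z_fixed))
```

If the count holds, the same terms used for Z_fixed would go after the | in the ivreg call. Whether the instruments are valid, and how the weights interact with the estimator, still has to be checked separately.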

How do I run a plm regression with a categorical dependent variable (panel data)?

I'm wondering if I can use categorical dependent variables in a panel data regression?
Currently I'm using plm for continuous or clogit for binary dependent variables, e.g.
model_income <- plm(income ~ education + age + children, data = incomedata, model = "random")
model_death <- clogit(death ~ food + medicine + workouts, data = deathdata, method = "approximate")
But how about a categorical dependent variable such as
religion ~ education + placeofbirth
where religion can be buddhist, catholic, protestant, muslim, etc.?
Thank you!

Lavaan: how to specify interaction terms in SEM

I am using lavaan and have only observed variables (no latent variables).
I would like to include an interaction term in the model, but not sure how to do this.
This is what I have
model4 <-'
interac =~ var1 * var2
Ent ~ age
presu ~ age + interac
protein ~ age + fat
fat ~ age
tempo ~ age +interac+protein
score ~sex+education+presu+tempo
'
fit <- sem(model4, data=mydata)
summary(fit, fit.measures=TRUE)
(all variables have been scaled before starting, because I had some issues with some variables being 100 times larger than others).
I am wondering whether this is correct? I don't have the main effects of the interaction in the regression? Shouldn't these be included?
When I add the interaction term directly in the regression (var1*var2), I get 1 as estimates, so that must be wrong...
No, it is not correct. For manifest variables interaction, you have two alternatives:
1 - create the interaction term outside lavaan, e.g.:
mydata$interac <- mydata$var1 * mydata$var2
or
2 - use the : operator:
model4 <-'
Ent ~ age
presu ~ age + var1:var2 #interaction and age as predictors
protein ~ age + fat
fat ~ age
tempo ~ age + var1:var2 + protein #interaction, age and protein as predictors
score ~sex+education+presu+tempo
'
fit <- sem(model4, data=mydata)
summary(fit, fit.measures=TRUE)
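On the question of main effects: as in an ordinary regression, the product term is usually entered together with var1 and var2 themselves, otherwise the interaction coefficient absorbs their separate effects. A minimal sketch of option 1 with main effects included follows, assuming simulated data and a single outcome; the names model_int and fit_int and the data values are hypothetical.

```r
# Simulated data; only the variable names echo the question.
set.seed(7)
n <- 300
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n), age = rnorm(n))
mydata$presu <- 0.5 * mydata$var1 + 0.3 * mydata$var2 +
  0.4 * mydata$var1 * mydata$var2 + 0.2 * mydata$age + rnorm(n)

# Option 1 from the answer: build the product term as a regular column.
mydata$interac <- mydata$var1 * mydata$var2

# Main effects (var1, var2) enter next to the product term.
model_int <- '
presu ~ age + var1 + var2 + interac
'

# Fit only if lavaan is installed.
if (requireNamespace("lavaan", quietly = TRUE)) {
  fit_int <- lavaan::sem(model_int, data = mydata)
  print(lavaan::parameterEstimates(fit_int))
}
```

Centering var1 and var2 before forming the product (the question mentions the variables were already scaled) keeps the main effects interpretable at the mean of the other variable.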
