I am trying to do this simple instrumental variables estimation in R using the package systemfit and two stage least squares (2SLS):
y = b + b1*x1 + b2*x2 + b3*w + e
where x1 and x2 are the endogenous variables I would like to instrument, w is an exogenous variable, and e is the residual. My two instruments are z1 and z2. I want to use z1 for x1 and z2 for x2. Thus, my first-stage regressions would be
x1 = c + c1*z1 + c2*z2 + c3*w + e1
x2 = d + d1*z1 + d2*z2 + d3*w + e2
I have tried:
systemfit(y ~ x1 + x2 + w, inst = ~ z1 + z2 + w)
But I am unsure whether this is correct...
Why don't you use the ivreg function from the AER package? You could try it and compare the results.
#install.packages("AER") # if not already installed
library(AER)
?ivreg
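For the model in the question, the ivreg call would look like this (a sketch; yourDataFrame is a placeholder for the actual data):

library(AER)
# 2SLS: the full instrument set (z1, z2, and the exogenous w) goes after the |
fit_iv <- ivreg(y ~ x1 + x2 + w | z1 + z2 + w, data = yourDataFrame)
summary(fit_iv, diagnostics = TRUE)  # adds weak-instrument and Wu-Hausman tests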
I think the systemfit function can handle only one endogenous variable per equation. Try doing this in two steps:
# first-stage regressions: one instrument per endogenous variable, plus the exogenous w
lm1 <- lm(x1 ~ z1 + w, data = yourDataFrame)
lm2 <- lm(x2 ~ z2 + w, data = yourDataFrame)
# save the first-stage fitted values
yourDataFrame$x1.1st.step <- fitted(lm1)
yourDataFrame$x2.1st.step <- fitted(lm2)
# second stage: regress y on the fitted values and w
lm.2nd.step <- lm(y ~ x1.1st.step + x2.1st.step + w, data = yourDataFrame)
# note: the standard errors from this manual second stage are not valid 2SLS
# standard errors; use ivreg (or similar) for inference
I would definitely use ivreg to estimate 2SLS models. Sometimes installing the AER package can be tricky if you do not have an up-to-date version of R (check which package version fits your R version if you get stuck!).
I am using the following model in R:
glmer(y ~ x + z + (1|id), weights = specification, family = binomial, data = data)
where:
y ~ Binomial(specification, p)
logit(p) = intercept + a*x + b*z
a and b are the coefficients of the x and z variables, and the coefficient a itself depends on another variable I:
a = a0 + a1*I
One of the variables (here x) depends on another variable (here I), so I need a hierarchical model to capture the heterogeneity in the effect of x.
I would appreciate it if anyone could help me with this problem.
Sorry if the question does not look professional! This is one of my first questions here.
I'm not perfectly sure I understand the question, but: if Logit(y) = intercept + a*x + b*z and a = a0 + a1*I, then it would seem that
Logit(y) = intercept + (a0+a1*I)*x + b*z
This looks like a straightforward interaction model:
glmer(y ~ 1 + x + x:I + z + (1|id), ...)
to make it more explicit, this could also be written as
glmer(y ~ 1 + x + I(x*I) + z + (1|id), ...)
(although the use of I as a predictor variable and in the I() function is a little bit confusing at first glance ...)
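A minimal simulated sketch (all data, names, and effect sizes hypothetical) showing the interaction form recovering a0 and a1:

library(lme4)

set.seed(42)
n  <- 2000
id <- factor(rep(1:100, each = 20))  # grouping factor for the random intercept
x  <- rnorm(n)
z  <- rnorm(n)
I  <- rnorm(n)                       # the moderator; the name clashes with I() but still works here
u  <- rnorm(100, sd = 0.5)           # random intercepts by id
a  <- 0.5 + 0.8 * I                  # a = a0 + a1*I
y  <- rbinom(n, 1, plogis(-1 + a * x + 0.3 * z + u[id]))
dat <- data.frame(y, x, z, I, id)

fit <- glmer(y ~ 1 + x + x:I + z + (1 | id), family = binomial, data = dat)
fixef(fit)  # the x and x:I estimates should land near 0.5 and 0.8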
I am trying to produce a nice regression table of marginal effects and p-values from the probitmfx function, where the p-values are reported under the marginal effect of each covariate. A picture of what I'd like it to look like is here: Similar Output from Stata.
I tried the stargazer function, as suggested here, but this does not seem to work if I don't have an OLS / probit model object.
data_T1 <- read_dta("xxx")
#specification (1)
T1_1 <- probitmfx(y ~ x1 + x2 + x3, data=data_T1)
#specification (2)
T1_2 <- probitmfx(y ~ x1 + x2 + x3 + x4 + x5, data=data_T1)
#this is what I tried but does not work
table1 <- stargazer(coef = list(T1_1$mfxest[,1], T1_2$mfxest[,1]),
                    p = list(T1_1$mfxest[,4], T1_2$mfxest[,4]), type = "text")
Any suggestions how I can design such a table in R?
You can probably use the parameters package to produce a beautiful table:
Code:
library(mfx)
library(parameters)
# simulate some data
set.seed(12345)
n <- 1000
x <- rnorm(n)
# binary outcome
y <- ifelse(pnorm(1 + 0.5 * x + rnorm(n)) > 0.5, 1, 0)
data <- data.frame(y, x)
mod <- probitmfx(formula = y ~ x, data = data)
print_html(model_parameters(mod))
This produces an HTML table that can be embedded in an R Markdown document.
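If you need markdown rather than HTML output, the package also ships (to my knowledge) a markdown printer that works the same way:

print_md(model_parameters(mod))  # markdown version of the same table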
My issue concerns extracting output from a regression for all possible combinations of dummy variables while keeping the continuous predictor variables fixed.
The problem is that my model contains over 100 combinations of interactions, and manually calculating all of these would be quite tedious. Is there an efficient method for calculating the output iteratively?
The only way I can think of is to write a loop that generates all desired combinations to subsequently feed into the predict() function.
Some context:
I am trying to identify the regional differences of automobile resale prices by the model of car.
My model looks something like this:
lm(price ~ age + mileage + region_dummy_1 + ... + region_dummy_n + model_dummy_1 + ... + model_dummy_n + region_dummy_1 * model_dummy_1 + ... + region_dummy_1 * model_dummy_n, data = data)
My question is:
How do I produce a table of predicted prices for every model/region combination?
Use .*.
lm(price ~ .*., data = data)
Here's a small reproducible example:
> df <- data.frame(y = rnorm(100,0,1),
+ x1 = rnorm(100,0,1),
+ x2 = rnorm(100,0,1),
+ x3 = rnorm(100,0,1))
>
> lm(y ~ .*., data = df)
Call:
lm(formula = y ~ . * ., data = df)
Coefficients:
(Intercept) x1 x2 x3 x1:x2 x1:x3
-0.02036 0.08147 0.02354 -0.03055 0.05752 -0.02399
x2:x3
0.24065
How does it work?
. is shorthand for "all predictors", and * includes the two-way interaction term.
For example, consider a data frame with 3 columns: Y (the dependent variable) and 2 predictors (X1 and X2). The syntax lm(Y ~ X1*X2) is shorthand for lm(Y ~ X1 + X2 + X1:X2), where X1:X2 is the interaction term.
Extending this simple case, imagine we have a data frame with 3 predictors, X1, X2, and X3. lm(Y ~ .*.) is equivalent to lm(Y ~ X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3).
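To get the actual table of predicted prices the question asks for, the loop the asker considered can be replaced by expand.grid() plus predict(). A sketch with hypothetical columns region, model, age, and mileage:

set.seed(1)
df <- data.frame(price   = rnorm(500, 20000, 3000),
                 age     = runif(500, 0, 10),
                 mileage = runif(500, 0, 150000),
                 region  = factor(sample(letters[1:4], 500, replace = TRUE)),
                 model   = factor(sample(LETTERS[1:5], 500, replace = TRUE)))
fit <- lm(price ~ age + mileage + region * model, data = df)

# one row per region/model combination, continuous predictors held at their means
grid <- expand.grid(region  = levels(df$region),
                    model   = levels(df$model),
                    age     = mean(df$age),
                    mileage = mean(df$mileage))
grid$pred_price <- predict(fit, newdata = grid)
head(grid)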
It's a bit of a long question so thanks for bearing with me.
Here's my data
https://www.dropbox.com/s/jo22d68a8vxwg63/data.csv?dl=0
I constructed a mixed-effects model:
library(lme4)
mod <- lmer(sqrt(y) ~ x1 + I(x1^2) + x2 + I(x2^2) + x3 + I(x3^2) + x4 + I(x4^2) + x5 + I(x5^2) +
x6 + I(x6^2) + x7 + I(x7^2) + x8 + I(x8^2) + (1|loc) + (1|year), data = data)
All the predictors are standardised, and I am interested in knowing how y changes with changes in x5 while keeping the other variables at their mean values (equal to 0, since all the variables are standardised).
This is how I do it.
# make all predictors except x5 equal to zero
data$x1<-0
data$x2<-0
data$x3<-0
data$x4<-0
data$x6<-0
data$x7<-0
data$x8<-0
# Use the predict function
library(merTools)
fitted <- predictInterval(merMod = mod, newdata = data, level = 0.95, n.sims = 1000, stat = "median", include.resid.var = TRUE)
Now I want to plot the fitted values as a quadratic function of x5. I do this:
i<-order(data$x5)
plot(data$x5[i],fitted$fit[i],type="l")
I expected this to produce a plot of y as a quadratic function of x5, but the plot I get does not show any quadratic curve at all. Can anyone tell me what I am doing wrong here?
predictInterval comes from the merTools package, but you can do this with predict. The trick is just to make sure you set your random effects to 0. Here's how you can do that:
newdata <- data
# set every predictor except x5 to zero
newdata[, paste0("x", setdiff(1:8, 5))] <- 0
y <- predict(mod, newdata = newdata, re.form = NA)
plot(data$x5, y)
The re.form = NA part drops the random effects from the prediction.
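As a small follow-up, ordering by x5 before drawing the line (as in the question) gives the smooth quadratic; note the model predicts sqrt(y), so the curve is on that scale:

i <- order(newdata$x5)
plot(newdata$x5[i], y[i], type = "l", xlab = "x5", ylab = "predicted sqrt(y)")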
I want to estimate a regression with two exogenous variables, two endogenous variables, and a pair of fixed effects. Each endogenous variable has its own instrument.
Y = b0 + b1*X1 + b2*X2 + b3*Q + b4*W + C1*factor(id) + C2*factor(firm)
W = d0 + d1*X3
Q = e0 + e1*X4
Here is the part where I generate data for y, x, Q, and W:
require(lfe)
oldopts <- options(lfe.threads=1)
x <- rnorm(1000)
x2 <- rnorm(length(x))
id <- factor(sample(20,length(x),replace=TRUE))
firm <- factor(sample(13,length(x),replace=TRUE))
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
u <- rnorm(length(x))
y <- x + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
x3 <- rnorm(length(x))
x4 <- 5*rnorm(length(x))^2
Q <- 0.3*x3 - 0.3*rnorm(length(x),sd=0.3) - 0.7*id.eff[id]
W <- 0.3*log(x4)- 2*x + 0.1*x2 - 0.2*y+ rnorm(length(x),sd=0.6)
y <- y + Q + W
I can estimate the coefficients using the old lfe syntax
reg <- felm(y~x+x2+G(id)+G(firm),iv=list(Q~x3,W~x4))
But the package strongly discourages the use of the old syntax, and I do not know how to specify different first-stage equations in the new syntax.
If I try this line, both x3 and x4 will be used in both the Q and W first-stage equations.
reg_new <- felm(y ~ x + x2 | id+firm | (Q|W ~x3 + x4))
I'm sorry for the late answer. As the author of the lfe package, I am not aware of any theory for using different sets of instruments for different endogenous variables. It should not have been allowed in the old syntax either. If one of the instruments is uncorrelated with one of the endogenous variables, its coefficient in the first stage will simply be estimated as zero.
The theory for IV estimation by means of two-stage regression simply uses some matrix identities to split the IV estimation into two stages of ordinary regression, for convenience and reduction to well-known methods. As far as I'm aware, there is no IV with separate sets of instruments for the endogenous variables.
See e.g. Wikipedia's entry on this:
https://en.wikipedia.org/wiki/Instrumental_variable#Estimation
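Given that, the new-syntax call from the question is in fact the standard 2SLS specification, with both instruments entering both first stages (a sketch reusing the simulated data above):

# both x3 and x4 instrument both Q and W, as 2SLS requires
reg_new <- felm(y ~ x + x2 | id + firm | (Q | W ~ x3 + x4))
summary(reg_new)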