I'm trying to fit a number of linear models as shown below. It is important that all interaction terms are sorted lexicographically. Note that the second model is missing the main effect for x.
x = rnorm(100)
y = rnorm(100)
z = x + y + rnorm(100)
m1 = glm(z ~ x + y + x:y)
m2 = glm(z ~ y + x:y)
The models don't behave as expected with respect to the interaction terms:
m1:
       Estimate Std. Error t value Pr(>|t|)
x:y     -0.1565     0.1151  -1.360   0.1770

m2:
       Estimate Std. Error t value Pr(>|t|)
y:x     -0.2776     0.1416  -1.961   0.0528 .
I understand that there may be a way to use the interaction() function with the lex.order argument but I can't figure out how or, indeed, whether this is the best way to go. Advice?
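One possible workaround (my suggestion, not necessarily the canonical route): interaction() with lex.order is aimed at factors, but x and y here are numeric, so x:y is simply their elementwise product. Wrapping that product in I() pins down the term's label no matter how the formula machinery reorders terms:
m2 <- glm(z ~ y + I(x * y))
summary(m2)  # the interaction row is always labelled "I(x * y)"
The fitted coefficients are identical to the x:y version; only the label changes.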
I am trying to do this simple instrumental variables estimation in R using the package systemfit and two stage least squares (2SLS):
y = b + b1*x1 + b2*x2 + b3*w + e
where x1 and x2 are the endogenous variables I would like to instrument, w is an exogenous variable, and e is the residual. My two instruments are z1 and z2. I want to use z1 for x1 and z2 for x2. Thus, my first-stage regressions would be
x1 = c + c1*z1 + c2*z2 + c3*w + e1
x2 = d + d1*z1 + d2*z2 + d3*w + e2
I have tried:
systemfit(y ~ x1 + x2 + w, method = "2SLS", inst = ~ z1 + z2 + w)
but I am unsure whether this is correct...
Why don't you use the ivreg() function from the AER package? You could try it and compare the results.
#install.packages("AER") # if not already installed
library(AER)
?ivreg
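For instance, a minimal sketch of the ivreg call for your model (assuming your variables live in a data frame called yourDataFrame):
library(AER)
# two-part formula: regressors | instruments;
# the exogenous variable w appears on both sides
fit <- ivreg(y ~ x1 + x2 + w | z1 + z2 + w, data = yourDataFrame)
summary(fit, diagnostics = TRUE)  # adds weak-instrument and Wu-Hausman tests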
I think the systemfit function can handle only one endogenous variable per equation. Try doing this in two steps:
# first stage: regress each endogenous variable on the instruments
# and the exogenous variable
lm1 <- lm(x1 ~ z1 + w, data = yourDataFrame)
lm2 <- lm(x2 ~ z2 + w, data = yourDataFrame)
# keep the fitted values
yourDataFrame$x1.1st.step <- lm1$fitted
yourDataFrame$x2.1st.step <- lm2$fitted
# second stage: regress y on the fitted values and w
lm.2nd.step <- lm(y ~ x1.1st.step + x2.1st.step + w, data = yourDataFrame)
Note that the standard errors from this manual second stage are not the correct 2SLS standard errors, since they ignore the first-stage estimation.
I would definitely use ivreg to estimate 2SLS models. Sometimes installing the AER package can be tricky if you do not have an updated version of R (check which package version fits your R version if you get stuck!).
I am quite new to R and I am having trouble figuring out how to select variables in a multivariate linear regression in R.
Pretend I have the following formulas:
P = aX + bY
Q = cZ + bY
I have a data frame with column P, Q, X, Y, Z and I need to find a, b and c.
If I do a simple multivariate regression:
result <- lm(cbind(P, Q) ~ X + Y + Z - 1)
it estimates a coefficient for Z in P's equation and for X in Q's equation, even though my formulas say those terms should be absent.
If I instead run the two regressions individually, the estimate of "b" differs between them.
How can I select the variables to consider in a multivariate regression?
Thank you,
Edson
P = aX + bY;
Q = cZ + bY
In lavaan you could do it by adding an equality constraint, i.e., giving two parameters the same custom label:
P ~ X + b*Y
Q ~ Z + b*Y
See also http://lavaan.ugent.be/tutorial/syntax2.html
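A complete sketch, assuming the variables sit in a data frame called yourDataFrame:
library(lavaan)
# model syntax as a string; the shared label b forces the two
# Y coefficients to be equal across the equations
model <- '
  P ~ X + b*Y
  Q ~ Z + b*Y
'
fit <- sem(model, data = yourDataFrame)
summary(fit)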
I'm estimating several ordinary least squares linear regressions in R. I want to constrain the estimated coefficients across the regressions such that they're the same. For example, I have the following:
z1 ~ x + y
z2 ~ x + y
And I would like the estimated coefficient on y in the first regression to be equal to the estimated coefficient on x in the second.
Is there a straightforward way to do this? Thanks in advance.
More detailed edit
I'm trying to estimate a system of linear demand functions, where the corresponding welfare function is quadratic. The welfare function has the form:
W = 0.5*ax*(Qx^2) + 0.5*ay*(Qy^2) + 0.5*bxy*Qx*Qy + 0.5*byx*Qy*Qx + cx*Qx + cy*Qy
Therefore, it follows that the demand functions are:
dW/dQx = Px = 2*0.5*ax*Qx + 0 + 0.5*bxy*Qy + 0.5*byx*Qy + 0 + cx
dW/dQx = Px = ax*Qx + 0.5*(bxy + byx)*Qy + cx
and
dW/dQy = Py = ay*Qy + 0.5*(byx + bxy)*Qx + cy
I would like to constrain the system so that byx = bxy (the cross-product coefficients in the welfare function). If this condition holds, the two demand functions become:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
I have price (Px and Py) and quantity (Qx and Qy) data, but what I'm really interested in is the welfare (W) which I have no data for.
I know how to calculate and code all the matrix formulae for constrained least squares (which would take a fair few lines of code to get the coefficients, standard errors, measures of fit etc that come standard with lm()). But I was hoping there might be an existing R function (i.e. something that can be done to the lm() function) so that I wouldn't have to code all of this.
For your specified regression:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
We can introduce a grouping factor:
id <- factor(rep.int(c("Px", "Py"), c(length(Px), length(Py))),
             levels = c("Px", "Py"))
We also need to stack the data:
z <- c(Px, Py)  ## stacked response
x <- c(Qx, Qy)  ## covariate with a group-specific slope (ax for Px, ay for Py)
y <- c(Qy, Qx)  ## covariate carrying the common slope bxy
Then we can fit everything in a single call to lm with the formula:
z ~ 0 + id + x:id + y
This gives the two intercepts (cx, cy), the two own slopes on x (ax, ay), and one shared coefficient bxy on y.
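A quick simulated check of this setup (the true parameter values below are made up for illustration):
set.seed(42)
n  <- 200
Qx <- rnorm(n); Qy <- rnorm(n)
Px <-  1 + 2.0 * Qx + 0.5 * Qy + rnorm(n)  # cx =  1, ax = 2.0, bxy = 0.5
Py <- -1 + 1.5 * Qy + 0.5 * Qx + rnorm(n)  # cy = -1, ay = 1.5, bxy = 0.5
id <- factor(rep(c("Px", "Py"), each = n), levels = c("Px", "Py"))
z  <- c(Px, Py)
x  <- c(Qx, Qy)
y  <- c(Qy, Qx)
coef(lm(z ~ 0 + id + x:id + y))  # returns cx, cy, bxy, ax, ay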
If the x and y values are the same in both regressions, then you could use this model:
lm(I(z1 + z2) ~ x + y)  # divide the coefficients by 2
If they are separate datasets, you could rbind the two after renaming z2 to z1.
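A small simulated illustration of the summing trick (made-up data):
set.seed(1)
x  <- rnorm(100); y <- rnorm(100)
z1 <- 1 + 2 * x + 3 * y + rnorm(100)
z2 <- 1 + 2 * x + 3 * y + rnorm(100)
coef(lm(I(z1 + z2) ~ x + y)) / 2  # approximately recovers 1, 2, 3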
I am using R to replicate a study and obtain mostly the same results the
author reported. At one point, however, I calculate marginal effects that seem to be unrealistically small. I would greatly appreciate if you could have a look at my reasoning and the code below and see if I am mistaken at one point or another.
My sample contains 24535 observations, the dependent variable "x028bin" is a
binary variable taking on the values 0 and 1, and there are furthermore 10
explaining variables. Nine of those independent variables have numeric levels, the independent variable "f025grouped" is a factor consisting of different religious denominations.
I would like to run a probit regression including dummies for religious denomination and then compute marginal effects. In order to do so, I first eliminate missing values and use cross-tabs between the dependent and independent variables to verify that there are no small or 0 cells. Then I run the probit model which works fine and I also obtain reasonable results:
probit4AKIE <- glm(x028bin ~ x003 + x003squ + x025secv2 + x025terv2 + x007bin + x04chief + x011rec + a009bin + x045mod + c001bin + f025grouped, family=binomial(link="probit"), data=wvshm5red2delna, na.action=na.pass)
summary(probit4AKIE)
However, when calculating marginal effects with all variables at their means from the probit coefficients and a scale factor, the marginal effects I obtain are much too small (e.g. 2.6042e-78).
The code looks like this:
ttt <- cbind(wvshm5red2delna$x003,
wvshm5red2delna$x003squ,
wvshm5red2delna$x025secv2,
wvshm5red2delna$x025terv2,
wvshm5red2delna$x007bin,
wvshm5red2delna$x04chief,
wvshm5red2delna$x011rec,
wvshm5red2delna$a009bin,
wvshm5red2delna$x045mod,
wvshm5red2delna$c001bin,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped) #I put variable "f025grouped" 9 times because this variable consists of 9 levels
ttt <- as.data.frame(ttt)
xbar <- as.matrix(mean(cbind(1,ttt[1:19]))) #1:19 position of variables in dataframe ttt
betaprobit4AKIE <- probit4AKIE$coefficients
zxbar <- t(xbar) %*% betaprobit4AKIE
scalefactor <- dnorm(zxbar)
marginprobit4AKIE <- scalefactor * betaprobit4AKIE[2:20] #2:20 are the positions of variables in the output of the probit model 'probit4AKIE' (variables need to be in the same ordering as in data.frame ttt), the constant in the model occupies the first position
marginprobit4AKIE #in this step I obtain values that are much too small
I apologize that I cannot provide you with a working example, as my dataset is much too large. Any comment would be greatly appreciated. Thanks a lot.
Best,
Tobias
@Gavin is right, and it's better to ask at the sister site (Cross Validated).
In any case, here's my trick to interpret probit coefficients.
The probit regression coefficients are the same as the logit coefficients, up to a scale factor of about 1.6. So, if the fit of a probit model is Pr(y=1) = pnorm(.5 - .3*x), this is equivalent to the logistic model Pr(y=1) = invlogit(1.6*(.5 - .3*x)).
And I use this to make a graphic, using the invlogit function from the arm package. Another possibility is to multiply all coefficients (including the intercept) by 1.6 and then apply the 'divide by 4' rule (see the book by Gelman and Hill): divide the new coefficients by 4, and you get an upper bound on the predictive difference corresponding to a unit difference in x.
Here's an example.
set.seed(123)  # for reproducibility
x1 = rbinom(100, 1, .5)
x2 = rbinom(100, 1, .3)
x3 = rbinom(100, 1, .9)
ystar = -.5 + x1 + x2 - x3 + rnorm(100)
y = ifelse(ystar > 0, 1, 0)
probit = glm(y ~ x1 + x2 + x3, family = binomial(link = 'probit'))
xbar <- colMeans(cbind(1, x1, x2, x3))  # covariate means (plus the intercept column)
# now the graphic, i.e., the marginal effect of x1, x2 and x3
library(arm)
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*x + probit$coef[3]*xbar[3] + probit$coef[4]*xbar[4]))) #x1
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*xbar[2] + probit$coef[3]*x + probit$coef[4]*xbar[4]))) #x2
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*xbar[2] + probit$coef[3]*xbar[3] + probit$coef[4]*x))) #x3
This will do the trick for probit or logit:
mfxboot <- function(modform, dist, data, boot = 1000, digits = 3){
  x <- glm(modform, family = binomial(link = dist), data)
  # get marginal effects: average density of the linear predictor
  pdf <- ifelse(dist == "probit",
                mean(dnorm(predict(x, type = "link"))),
                mean(dlogis(predict(x, type = "link"))))
  marginal.effects <- pdf * coef(x)
  # start bootstrap
  bootvals <- matrix(rep(NA, boot * length(coef(x))), nrow = boot)
  set.seed(1111)
  for(i in 1:boot){
    samp1 <- data[sample(1:nrow(data), nrow(data), replace = TRUE), ]
    x1 <- glm(modform, family = binomial(link = dist), samp1)
    # note: use the bootstrap refit x1 here, not the original fit x
    pdf1 <- ifelse(dist == "probit",
                   mean(dnorm(predict(x1, type = "link"))),
                   mean(dlogis(predict(x1, type = "link"))))
    bootvals[i, ] <- pdf1 * coef(x1)
  }
  res <- cbind(marginal.effects,
               apply(bootvals, 2, sd),
               marginal.effects / apply(bootvals, 2, sd))
  if(names(x$coefficients[1]) == "(Intercept)"){
    res1 <- res[2:nrow(res), ]  # drop the intercept row
    res2 <- matrix(as.numeric(sprintf(paste0("%.", digits, "f"), res1)),
                   nrow = dim(res1)[1])
    rownames(res2) <- rownames(res1)
  } else {
    res2 <- matrix(as.numeric(sprintf(paste0("%.", digits, "f"), res)),
                   nrow = dim(res)[1])
    rownames(res2) <- rownames(res)
  }
  colnames(res2) <- c("marginal.effect", "standard.error", "z.ratio")
  return(res2)
}
Source: http://www.r-bloggers.com/probitlogit-marginal-effects-in-r/
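For example, a hypothetical call reusing the simulated probit data from the earlier answer:
dat <- data.frame(y, x1, x2, x3)  # from the simulation above
mfxboot(y ~ x1 + x2 + x3, "probit", dat, boot = 200)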