Selecting variables in a multivariate regression in R - r

I am quite new to R and I am having trouble figuring out how to select variables in a multivariate linear regression in R.
Pretend I have the following formulas:
P = aX + bY
Q = cZ + bY
I have a data frame with column P, Q, X, Y, Z and I need to find a, b and c.
If I do a simple multivariate regression:
result <- lm( cbind( P, Q ) ~ X + Y + Z - 1 )
It calculates a coefficient for "c" on P's regression and for "a" on Q's regression.
If I calculate the regressions individually then "b" will be different in each regression.
How can I select the variables to consider in a multivariate regression?
Thank you,
Edson

P = aX + bY;
Q = cZ + bY
In lavaan you can do this by adding an equality constraint, i.e. giving the two parameters the same custom label:
P ~ X + b*Y
Q ~ Z + b*Y
See also http://lavaan.ugent.be/tutorial/syntax2.html
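A minimal sketch of the shared-label idea on simulated data. The data, sample size and true values (a = 2, b = 0.5, c = -1) are made up for illustration, and lavaan must be installed. Note that lavaan's default does not model the means, so this corresponds to a regression with intercepts rather than to the `- 1` in the lm() call; the simulated data have zero intercepts, so it makes no difference here.

```r
library(lavaan)

set.seed(1)
n <- 500
X <- rnorm(n); Y <- rnorm(n); Z <- rnorm(n)
# Assumed true model: a = 2, b = 0.5, c = -1
P <- 2 * X + 0.5 * Y + rnorm(n, sd = 0.5)
Q <- -1 * Z + 0.5 * Y + rnorm(n, sd = 0.5)
dat <- data.frame(P, Q, X, Y, Z)

# The label b appears in both equations, so both slopes on Y are forced equal
model <- '
  P ~ a*X + b*Y
  Q ~ c*Z + b*Y
'
fit <- sem(model, data = dat)
est <- coef(fit)
est[c("a", "b", "c")]
```

Because Z is absent from the P equation and X from the Q equation, no coefficient is estimated for them, which is what the question asks for.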

Related

How to translate simple linear model Y = β0 + β1*X + ε into a matrix in R

I have this simple linear model:
Y = β0 + β1*X + ε
The layout of the data is given below (n = how many β values there are, in this case only 1):
X Y
X1 Y1
X2 Y2
. .
Xn Yn
So my desired matrix would be:
X Y
X1 Y1
My question is, I need to translate the model Y = β0 + β1*X + ε into a matrix in R. I don't have any physical data to insert, I am just wanting to translate the simple linear model into a matrix form. How would I do this in R. I've made matrices using a dataset before, but the lack of data for this is throwing me off on how to do it.
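One possible reading, sketched under assumptions: in matrix form the model is Y = Xβ + ε, where the design matrix has a column of ones (for β0) and a column holding the X values (for β1). The numbers below are placeholders, not data from the question; model.matrix() is part of base R.

```r
# Placeholder X values, only to give the matrix a concrete shape
X <- c(2.1, 3.4, 1.8, 4.0)

# Design matrix for Y = b0 + b1*X + e: an intercept column and an X column
design <- model.matrix(~ X)
design

# With placeholder coefficients, the fitted part of the model is design %*% beta
beta <- c(1, 0.5)            # hypothetical (b0, b1)
fitted_part <- design %*% beta
```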

Compute effect sizes of path coefficients in SEM with R

I am currently using the lavaan package in R for structural equation models. I would like to compute the effect sizes (i.e., partial eta-squared) for each of my path coefficients. Is there already a package that does this?
For instance, how can I compute the effect size of the c, a and b regression coefficients?
set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# indirect effect (a*b)
ab := a*b
# total effect
total := c + (a*b)
'
fit <- sem(model, data = Data)
summary(fit)
Ideally the method should also work when building models based on latent variables.

Manual backward elimination when the intercept is not significant in R?

I am using the following code to do Backward Elimination:
regressor = lm (formula = y ~ a + b + c + d + e, data = dataset)
summary (regressor)
then remove the predictor with the highest P-value.
eg. if c has the largest p-value, then
regressor = lm (formula = y ~ a + b + d + e, data = dataset)
summary (regressor)
Repeat until all remaining variables have a p-value below the significance level.
But I ran into a problem: the intercept has the largest p-value, and I cannot specify or remove it in "regressor".
Could someone help me out here, please?
It seems like what you're asking is how to run a regression without an intercept? Then you can do so using Y~x1+x2...-1 as your formula in lm().
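A small sketch of that formula on made-up data (the data frame and the true coefficients are assumptions for illustration). Keep in mind that dropping the intercept forces the fit through the origin, so R-squared from the two fits is not directly comparable.

```r
set.seed(10)
# Made-up data with two predictors and no true intercept
dataset <- data.frame(a = rnorm(50), b = rnorm(50))
dataset$y <- 2 * dataset$a - dataset$b + rnorm(50)

with_int <- lm(y ~ a + b, data = dataset)       # default: intercept included
no_int   <- lm(y ~ a + b - 1, data = dataset)   # same as y ~ 0 + a + b

names(coef(with_int))
names(coef(no_int))
```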

New lfe/felm syntax, variable-specific instrument

I want to estimate a regression with two exogenous variables, two endogenous variables and a pair of fixed effects. Each endogenous variable has its own instrument.
Y = b0 + b1*X1 + b2*X2 + b3*Q + b4*W + C1*factor(id) + C2*factor(firm)
W = d0 + d1*X3
Q = e0 + e1*X4
Here is the part where I use generated data for Y, X, Q, W
require(lfe)
oldopts <- options(lfe.threads=1)
x <- rnorm(1000)
x2 <- rnorm(length(x))
id <- factor(sample(20,length(x),replace=TRUE))
firm <- factor(sample(13,length(x),replace=TRUE))
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
u <- rnorm(length(x))
y <- x + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
x3 <- rnorm(length(x))
x4 <- 5*rnorm(length(x))^2
Q <- 0.3*x3 - 0.3*rnorm(length(x),sd=0.3) - 0.7*id.eff[id]
W <- 0.3*log(x4)- 2*x + 0.1*x2 - 0.2*y+ rnorm(length(x),sd=0.6)
y <- y + Q + W
I can estimate the coefficients using the old lfe syntax
reg <- felm(y~x+x2+G(id)+G(firm),iv=list(Q~x3,W~x4))
But the package strongly discourages the use of old syntax and I do not know how to specify different first stage equations in the new syntax.
If I try this line, both x3 and x4 would be used for both Q and W first stage equations.
reg_new <- felm(y ~ x + x2 | id+firm | (Q|W ~x3 + x4))
I'm sorry for the late answer. As the author of the lfe package, I am not aware of any theory for using different sets of instruments for different endogenous variables. It should not have been allowed in the old syntax either. If one of the instruments is uncorrelated with one of the endogenous variables, its coefficient in the first stage will simply be estimated as zero. The theory for IV-estimation by means of two stage regression simply uses some matrix identities to split the IV-estimation into two stages of ordinary regression, for convenience and reduction to well-known methods. As far as I'm aware, there is no IV with separate sets of instruments for the endogenous variables.
See e.g. Wikipedia's entry on this:
https://en.wikipedia.org/wiki/Instrumental_variable#Estimation
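The point about irrelevant instruments can be illustrated with plain lm() first stages. This is a simplified sketch on made-up data (no exogenous covariates or fixed effects, and the coefficients 0.3 and 0.5 are assumptions), not the felm internals. In a finite sample the coefficient on the irrelevant instrument comes out close to zero rather than exactly zero.

```r
set.seed(1)
n  <- 5000
x3 <- rnorm(n)
x4 <- rnorm(n)

# Assumed data-generating process: Q depends only on x3, W only on x4
Q <- 0.3 * x3 + rnorm(n, sd = 0.3)
W <- 0.5 * x4 + rnorm(n, sd = 0.6)

# Both instruments enter both first stages
fs_Q <- lm(Q ~ x3 + x4)
fs_W <- lm(W ~ x3 + x4)

round(coef(fs_Q), 3)   # coefficient on x4 should be near zero
round(coef(fs_W), 3)   # coefficient on x3 should be near zero
```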

Constrained linear regression coefficients in R [duplicate]

This question already has an answer here:
R : constraining coefficients and error variance over multiple subsample regressions [closed]
I'm estimating several ordinary least squares linear regressions in R. I want to constrain the estimated coefficients across the regressions such that they're the same. For example, I have the following:
z1 ~ x + y
z2 ~ x + y
And I would like the estimated coefficient on y in the first regression to be equal to the estimated coefficient on x in the second.
Is there a straight-forward way to do this? Thanks in advance.
More detailed edit
I'm trying to estimate a system of linear demand functions, where the corresponding welfare function is quadratic. The welfare function has the form:
W = 0.5*ax*(Qx^2) + 0.5*ay*(Qy^2) + 0.5*bxy*Qx*Qy + 0.5*byx*Qy*Qx + cx*Qx + cy*Qy
Therefore, it follows that the demand functions are:
dW/dQx = Px = 2*0.5*ax*Qx + 0 + 0.5*bxy*Qy + 0.5*byx*Qy + 0 + cx
dW/dQx = Px = ax*Qx + 0.5*(bxy + byx)*Qy + cx
and
dW/dQy = Py = ay*Qy + 0.5*(byx + bxy)*Qx + cy
I would like to constrain the system so that byx = bxy (the cross-product coefficients in the welfare function). If this condition holds, the two demand functions become:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
I have price (Px and Py) and quantity (Qx and Qy) data, but what I'm really interested in is the welfare (W) which I have no data for.
I know how to calculate and code all the matrix formulae for constrained least squares (which would take a fair few lines of code to get the coefficients, standard errors, measures of fit etc that come standard with lm()). But I was hoping there might be an existing R function (i.e. something that can be done to the lm() function) so that I wouldn't have to code all of this.
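For your specified regression (with the intercepts and the cross term written as they follow from your derivation):
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
We can introduce a grouping factor:
id <- factor(rep.int(c("Px", "Py"), c(length(Px), length(Py))),
             levels = c("Px", "Py"))
We also need to combine data:
z <- c(Px, Py) ## response
x <- c(Qx, Qy) ## own quantity (coefficient differs by equation)
y <- c(Qy, Qx) ## cross quantity (shared coefficient bxy)
Then we can fit a linear model using lm with a formula:
z ~ x + y + id + x:id
The coefficient on y is the shared bxy. The intercept is cx and the id coefficient is cy - cx; the coefficient on x is ax and the x:id interaction is ay - ax.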
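A sketch of the stacking idea on simulated data, assuming the demand system Px = ax*Qx + bxy*Qy + cx and Py = ay*Qy + bxy*Qx + cy that follows from the derivation in the question. The sample size, noise level and true values are made up for illustration.

```r
set.seed(42)
n  <- 500
Qx <- runif(n, 1, 10)
Qy <- runif(n, 1, 10)

# Assumed true parameters, for illustration only
ax <- -2; ay <- -3; bxy <- 0.5; cx <- 20; cy <- 30
Px <- ax * Qx + bxy * Qy + cx + rnorm(n, sd = 0.5)
Py <- ay * Qy + bxy * Qx + cy + rnorm(n, sd = 0.5)

# Stack the two equations into one long data set
id <- factor(rep.int(c("Px", "Py"), c(length(Px), length(Py))),
             levels = c("Px", "Py"))
z <- c(Px, Py)   # response
x <- c(Qx, Qy)   # own quantity
y <- c(Qy, Qx)   # cross quantity, one shared coefficient

fit <- lm(z ~ x + y + id + x:id)
coef(fit)        # the coefficient named "y" estimates bxy
```

The usual lm() output (standard errors, fit measures) then comes for free, although it treats the error variance as equal across the two equations.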
If the x and y values are the same, then you could use this model:
lm( I(z1+z2)~ x +y ) # Need to divide coefficients by 2
If they are separate data then you could rbind the two datasets after renaming z2 to z1.
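A quick check, on made-up data, that the two suggestions agree when x and y are shared and every coefficient is constrained to be equal across the two regressions. The variable names follow the question; the simulated values are assumptions.

```r
set.seed(7)
n  <- 100
x  <- rnorm(n)
y  <- rnorm(n)
# Both responses share the same assumed coefficients
z1 <- 1 + 2 * x - y + rnorm(n)
z2 <- 1 + 2 * x - y + rnorm(n)

# Option 1: regress the sum, then halve the coefficients
fit_sum  <- lm(I(z1 + z2) ~ x + y)
coef_sum <- coef(fit_sum) / 2

# Option 2: stack the two data sets under a common response name
stacked   <- rbind(data.frame(z = z1, x = x, y = y),
                   data.frame(z = z2, x = x, y = y))
fit_stack <- lm(z ~ x + y, data = stacked)

coef_sum
coef(fit_stack)
```

The point estimates match; the standard errors need separate care, since the two fits use different numbers of rows.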
