Constrained linear regression coefficients in R [duplicate] - r

This question already has an answer here:
R : constraining coefficients and error variance over multiple subsample regressions [closed]
(1 answer)
Closed 6 years ago.
I'm estimating several ordinary least squares linear regressions in R. I want to constrain the estimated coefficients across the regressions such that they're the same. For example, I have the following:
z1 ~ x + y
z2 ~ x + y
And I would like the estimated coefficient on y in the first regression to be equal to the estimated coefficient on x in the second.
Is there a straightforward way to do this? Thanks in advance.
More detailed edit
I'm trying to estimate a system of linear demand functions, where the corresponding welfare function is quadratic. The welfare function has the form:
W = 0.5*ax*(Qx^2) + 0.5*ay*(Qy^2) + 0.5*bxy*Qx*Qy + 0.5*byx*Qy*Qx + cx*Qx + cy*Qy
Therefore, it follows that the demand functions are:
dW/dQx = Px = 2*0.5*ax*Qx + 0 + 0.5*bxy*Qy + 0.5*byx*Qy + 0 + cx
dW/dQx = Px = ax*Qx + 0.5*(bxy + byx)*Qy + cx
and
dW/dQy = Py = ay*Qy + 0.5*(byx + bxy)*Qx + cy
I would like to constrain the system so that byx = bxy (the cross-product coefficients in the welfare function). If this condition holds, the two demand functions become:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
I have price (Px and Py) and quantity (Qx and Qy) data, but what I'm really interested in is the welfare (W) which I have no data for.
I know how to calculate and code all the matrix formulae for constrained least squares (which would take a fair few lines of code to get the coefficients, standard errors, measures of fit, etc. that come standard with lm()). But I was hoping there might be an existing R function, or some way of using lm() itself, so that I wouldn't have to code all of this.

For your specified regression:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
We can introduce a grouping factor:
id <- factor(rep.int(c("Px", "Py"), c(length(Px), length(Py))),
levels = c("Px", "Py"))
We also need to combine data:
z <- c(Px, Py) ## response
x <- c(Qx, Qy) ## covariate 1
y <- c(Qy, Qx) ## covariate 2 (cross quantity, shared coefficient)
Then we can fit a single linear model using lm with the formula:
z ~ id + x:id + y - 1
Here id gives the two intercepts (cx and cy), x:id the two own-quantity slopes (ax and ay), and y the single shared cross coefficient (bxy); the -1 drops the redundant global intercept.
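A minimal sketch of this stacked fit, using made-up noiseless data (all parameter values below are invented for illustration):

```r
## Hypothetical demand data: ax = -2, ay = -3, shared bxy = -1, cx = 10, cy = 12
set.seed(42)
Qx <- runif(50); Qy <- runif(50)
Px <- -2 * Qx - 1 * Qy + 10   # noiseless for clarity
Py <- -3 * Qy - 1 * Qx + 12

id <- factor(rep(c("Px", "Py"), each = 50), levels = c("Px", "Py"))
z  <- c(Px, Py)   # stacked responses
x  <- c(Qx, Qy)   # own quantity in each equation
y  <- c(Qy, Qx)   # cross quantity (coefficient constrained equal)

## id gives the two intercepts, x:id the two own slopes, y the one shared slope
fit <- lm(z ~ id + x:id + y - 1)
round(coef(fit), 3)
```

Because the constraint is baked into the design (one column for the shared covariate), lm() returns a single estimate for bxy along with the usual standard errors.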

If the x and y values are the same, then you could use this model:
lm(I(z1 + z2) ~ x + y)  # need to divide the coefficients by 2
If they are separate data then you could rbind the two datasets after renaming z2 to z1.
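A quick sketch of the rename-and-rbind approach, with made-up data frames in which both equations happen to share the same coefficients:

```r
## Hypothetical data: both equations follow z = 2*x + 3*y (no intercept, no noise)
d1 <- data.frame(z1 = 2 * (1:6) + 3 * (6:1), x = 1:6, y = 6:1)
d2 <- data.frame(z2 = 2 * (2:7) + 3 * (7:2), x = 2:7, y = 7:2)

names(d2)[1] <- "z1"              # rename z2 so the columns line up
pooled <- rbind(d1, d2)

fit <- lm(z1 ~ x + y - 1, data = pooled)
coef(fit)                         # recovers 2 and 3
```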

Related

Multiplication of different FLAGs - fixed effects model

I want to perform a regression in a fixed-effects model. To construct such a model, I have multiple FLAGs, like the following:
y ~ x + z + FlagYear1 + FlagYear2 + FlagYear3 + FlagCountry1 + FlagCountry2
I want to perform another regression in which I have fixed effects for Year * Country, so that the model will be equal to this
y ~ x + z + FlagYear1Country1 + FlagYear1Country2 + FlagYear2Country1 + FlagYear2Country2 + FlagYear3Country1 + FlagYear3Country2
As I have 26 countries and 8 years in my model, it would be very time-consuming to construct all the FLAGs manually. I know there is a command to do this automatically in Stata; how can I do the same in R?
If by 'FLAG' you are referring to 0/1 coded indicator variables (or dummy variables), then R has an easy way to enter all of these interactions into a formula.
If you have factor variables country with 26 levels and year with 8 levels then you can use
y ~ x + z + country*year
and this will expand the factors into every combination of country and year.
Look at the documentation for formula to understand how this works.
If you already have the indicator variables then you could use
y ~ x + z + (FlagYear1 + FlagYear2 + FlagYear3) * (FlagCountry1 + FlagCountry2)
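A small illustration of the factor expansion, using toy factors (not the asker's 26x8 data):

```r
## Toy panel: 3 countries x 2 years, plus one numeric covariate
d <- expand.grid(country = factor(c("A", "B", "C")),
                 year    = factor(c("2001", "2002")))
d$x <- seq_len(nrow(d))

## country*year expands to both main effects plus every country:year dummy
mm <- model.matrix(~ x + country * year, data = d)
colnames(mm)
```

With treatment contrasts this yields an intercept, x, two country dummies, one year dummy, and the two interaction dummies; with 26 countries and 8 years the same one-line formula generates all 25 x 7 interaction columns automatically.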

How to translate simple linear model Y = β0 + β1*X + ε into a matrix in R

I have this simple linear model:
Y = β0 + β1*X + ε
The layout of the data is given below (n is the number of observations; there is only one slope coefficient, β1):
X    Y
X1   Y1
X2   Y2
.    .
Xn   Yn
So my desired matrix should follow this layout, with a column for X and a column for Y.
My question is: I need to translate the model Y = β0 + β1*X + ε into matrix form in R. I don't have any actual data to insert; I just want to express the model as matrices. How would I do this in R? I've built matrices from a dataset before, but the lack of data here is throwing me off.
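In matrix form the model is Y = Xβ + ε, where the design matrix X has a column of ones (for β0) and a column of x values. A sketch using arbitrary numbers as stand-ins, since there is no data:

```r
## Suppose n = 5 observations of the predictor (arbitrary stand-in values)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0)

## Design matrix: first column of ones (for beta0), second column x (for beta1)
X <- cbind(1, x)
colnames(X) <- c("(Intercept)", "x")
X

## model.matrix builds the same matrix directly from a formula
all.equal(X, model.matrix(~ x), check.attributes = FALSE)   # TRUE
```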

Imposing a restriction on a piecewise model to ensure continuity in R

I have the following fitted model w/o restriction:
reg <- lm(y ~ indi_x + x + inter)
where indi_x = indicator variable for x > 14 and inter = interaction variable for indi_x and x.
I want to impose the restriction that indi_x + (inter * 14) = 0 to fit the two segments at x = 14. I've been using the I() function within lm but am not getting the output I want.
Thanks!
If I understand correctly, you have two slopes that are joined at x = 14, and you want to infer the individual slopes (and possibly the common intercept?)
This would do it:
reg <- lm(y ~ x + I(pmax(x - 14, 0)))
The hinge term pmax(x - 14, 0) is zero at x = 14 and grows linearly beyond it, so the two segments are forced to join at x = 14. Its coefficient is the change in slope, so the absolute slope of the second segment is the sum of the two slope coefficients.
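A sketch of the hinge (broken-stick) approach on simulated data that is continuous at x = 14 by construction (slopes and intercept below are made up):

```r
## Piecewise-linear data: slope 2 below x = 14, slope 5 above (change of +3)
x <- seq(4, 24, by = 0.5)
y <- 1 + 2 * x + 3 * pmax(x - 14, 0)   # no noise, for clarity

## The hinge term is zero at x = 14, so the fitted line is continuous there
fit <- lm(y ~ x + I(pmax(x - 14, 0)))
round(coef(fit), 3)   # intercept 1, first slope 2, slope change 3
```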

Curve fitting "best fit in 3d " with matlab or R

I have a problem with fitting a curve to a 3D point set (point cloud) in space. The curve-fitting tools I have looked at mostly create a surface when given a point set [x, y, z], but that is not what I want: I would like to fit a curve to the point set, not a surface.
So please help me: what is the best solution for curve fitting in space (3D)?
Particularly, my data looks like polynomial curve in 3d.
Equation is
z ~ ax^2 + bxy + cy^2 + d
and there is not any pre-estimated coefficients [a,b,c,d].
Thanks.
xyz <- read.table( text="x y z
518315,750 4328698,260 101,139
518315,429 4328699,830 101,120
518315,570 4328700,659 101,139
518315,350 4328702,050 101,180
518315,389 4328702,849 101,190
518315,239 4328704,020 101,430", header=TRUE, dec=",")
With a bit of data we can now demonstrate a rather hackish effort in the direction you suggest, although this really is estimating a surface, despite your best efforts to convince us otherwise:
xyz <- read.table(text="x y z
518315,750 4328698,260 101,139
518315,429 4328699,830 101,120
518315,570 4328700,659 101,139
518315,350 4328702,050 101,180
518315,389 4328702,849 101,190
518315,239 4328704,020 101,430", header=TRUE, dec=",")
lm( z ~ I(x^2)+I(x*y) + I(y^2), data=xyz)
#---------------
Call:
lm(formula = z ~ I(x^2) + I(x * y) + I(y^2), data = xyz)
Coefficients:
(Intercept) I(x^2) I(x * y) I(y^2)
-1.182e+05 -3.187e-07 9.089e-08 NA
Because x barely varies across these six points, I(x * y) is almost perfectly collinear with I(y^2), which prevents an estimate of the y^2 coefficient (hence the NA). You can also use nls to estimate parameters for non-linear surfaces.
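The NA disappears once the predictors are no longer collinear, i.e. when the data are spread over a grid. A sketch with made-up coefficients on a synthetic surface:

```r
## Synthetic surface z = 2*x^2 - 1*x*y + 0.5*y^2 + 3 on a 7 x 7 grid
g <- expand.grid(x = -3:3, y = -3:3)
g$z <- 2 * g$x^2 - 1 * g$x * g$y + 0.5 * g$y^2 + 3

## All four coefficients are now estimable (no NA)
fit <- lm(z ~ I(x^2) + I(x * y) + I(y^2), data = g)
round(coef(fit), 3)   # intercept 3, then 2, -1, 0.5
```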
I suppose that you want to fit a parametrized curve of this type:
r(t) = a + bt + ct^2
Therefore, you will have to do three independent fits:
x = ax + bx*t + cx*t^2
y = ay + by*t + cy*t^2
z = az + bz*t + cz*t^2
and obtain nine fitting parameters ax, ay, az, bx, by, bz, cx, cy, cz. Your data contains the positions x, y, z, and you also need to include a time variable t = 1, 2, ..., N, assuming that the points are sampled at equal time intervals.
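The three independent fits can be sketched in R (the question mentions Matlab or R), assuming equally spaced samples and made-up quadratic coefficients:

```r
## Made-up curve r(t) = a + b*t + c*t^2, sampled at t = 1..10 with no noise
t <- 1:10
x <- 1 + 2.0 * t + 0.3 * t^2
y <- 4 - 1.0 * t + 0.1 * t^2
z <- 2 + 0.5 * t - 0.2 * t^2

## One quadratic fit per coordinate: nine parameters in total
fx <- lm(x ~ t + I(t^2))
fy <- lm(y ~ t + I(t^2))
fz <- lm(z ~ t + I(t^2))
rbind(x = coef(fx), y = coef(fy), z = coef(fz))
```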
If the 'time' parameter of your data points is unknown/random, then I suppose you will have to estimate it yourself as another fitting parameter, one per data point. So what I suggest is the following:
1. Assume some reasonable parameters a, b, c.
2. Write a function which calculates the time t_i of each data point by minimizing the squared distance between that point and the tentative curve r(t).
3. Calculate the sum of all (r(t) - R(t))^2 between the curve and your dataset R. This will be your fitting score, or figure of merit.
4. Use Matlab's genetic algorithm routine ga() to obtain optimal a, b, c which minimize the figure of merit defined above.
Good luck!

Selecting variables in a multivariate regression in R

I am quite new to R and I am having trouble figuring out how to select variables in a multivariate linear regression in R.
Pretend I have the following formulas:
P = aX + bY
Q = cZ + bY
I have a data frame with column P, Q, X, Y, Z and I need to find a, b and c.
If I do a simple multivariate regression:
result <- lm( cbind( P, Q ) ~ X + Y + Z - 1 )
It estimates a coefficient for Z ("c") in P's regression and for X ("a") in Q's regression, which I don't want.
And if I run the two regressions individually, then "b" comes out different in each.
How can I select the variables to consider in a multivariate regression?
Thank you,
Edson
P = aX + bY;
Q = cZ + bY
in lavaan you could do it by adding an equality constraint, i.e. giving two parameters the same custom label:
P ~ X + b*Y
Q ~ Z + b*Y
See also http://lavaan.ugent.be/tutorial/syntax2.html
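If you prefer to stay in base R, the same equality constraint can be imposed by stacking the two equations into one regression. A sketch with noiseless made-up data (a = 2, b = 3, c = 4):

```r
## Hypothetical data satisfying P = a*X + b*Y and Q = c*Z + b*Y
set.seed(7)
X <- runif(40); Y <- runif(40); Z <- runif(40)
P <- 2 * X + 3 * Y
Q <- 4 * Z + 3 * Y

eq   <- factor(rep(c("P", "Q"), each = 40))
resp <- c(P, Q)
own  <- c(X, Z)   # X in P's equation, Z in Q's
yy   <- c(Y, Y)   # Y appears in both equations with the same coefficient b

## own:eq gives separate a and c; yy gives the single shared b; no intercept
fit <- lm(resp ~ own:eq + yy - 1)
round(coef(fit), 3)
```

This fits both equations jointly with the coefficient on Y forced equal, which is exactly the lavaan equality constraint expressed through the design matrix.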
