Output from scatter3d R script - how to read the equation
I am using scatter3d to find a fit in my R script. I did so, and here is the output:
Call:
lm(formula = y ~ (x + z)^2 + I(x^2) + I(z^2))
Residuals:
Min 1Q Median 3Q Max
-0.78454 -0.02302 -0.00563 0.01398 0.47846
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.051975 0.003945 -13.173 < 2e-16 ***
x 0.224564 0.023059 9.739 < 2e-16 ***
z 0.356314 0.021782 16.358 < 2e-16 ***
I(x^2) -0.340781 0.044835 -7.601 3.46e-14 ***
I(z^2) 0.610344 0.028421 21.475 < 2e-16 ***
x:z -0.454826 0.065632 -6.930 4.71e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05468 on 5293 degrees of freedom
Multiple R-squared: 0.6129, Adjusted R-squared: 0.6125
F-statistic: 1676 on 5 and 5293 DF, p-value: < 2.2e-16
Based on this, what is the equation of the best fit line? I'm not really sure how to read this output. Can someone explain? Thanks!
This is a basic regression output table. The parameter estimates (the "Estimate" column) are the best-fit coefficients corresponding to the different terms in your model. If you aren't familiar with this terminology, I would suggest working through a linear models and regression tutorial; there are thousands around the web. I would also encourage you to play with some simpler 2D simulations.
For example, let's make some data with an intercept of 2 and a slope of 0.5:
# Simulate data
set.seed(12345)
x = seq(0, 10, len=50)
y = 2 + 0.5 * x + rnorm(length(x), 0, 0.1)
data = data.frame(x, y)
Now when we look at the fit, you'll see that the Estimate column shows these same values:
# Fit model
fit = lm(y ~ x, data=data)
summary(fit)
> summary(fit)
Call:
lm(formula = y ~ x, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.26017 -0.06434 0.02539 0.06238 0.20008
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.011759 0.030856 65.20 <2e-16 ***
x 0.501240 0.005317 94.27 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1107 on 48 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9945
F-statistic: 8886 on 1 and 48 DF, p-value: < 2.2e-16
Pulling these out, we can then plot the best-fit line:
# Make plot
dev.new(width=4, height=4)
plot(x, y, ylim=c(0,10))
abline(fit$coef[1], fit$coef[2])
Coming back to your model: it's not a plane but rather a paraboloid surface (using 'y' as the third dimension since you used 'z' already):
y = -0.051975 + 0.224564 * x + 0.356314 * z
    - 0.340781 * x^2 + 0.610344 * z^2 - 0.454826 * x * z
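If you'd rather not copy the coefficients by hand, you can pull them straight out of the fitted model. A minimal sketch, assuming you refit the model yourself as fit <- lm(y ~ (x + z)^2 + I(x^2) + I(z^2)) (scatter3d fits this internally; the coefficient names below are R's standard names for this formula):
b <- coef(fit)
# evaluate the fitted surface at arbitrary (x, z)
surface <- function(x, z) {
  b["(Intercept)"] + b["x"] * x + b["z"] * z +
    b["I(x^2)"] * x^2 + b["I(z^2)"] * z^2 + b["x:z"] * x * z
}
# or let predict() do the same arithmetic:
# predict(fit, newdata = data.frame(x = 0.5, z = 0.5))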
Related
How to formulate time period dummy variable in lm()
I am analysing whether the effects of x_t on y_t differ during and after a specific time period. I am trying to regress the following model in R using lm():
$y_t = b_0 + [b_1(1-D_t) + b_2 D_t]x_t$
where $D_t$ is a dummy variable with the value 1 over the time period and 0 otherwise. Is it possible to use lm() for this formula?
observationNumber <- 1:80
obsFactor <- cut(observationNumber, breaks = c(0, 55, 81), right = FALSE)
fit <- lm(y ~ x * obsFactor)
For example:
y <- runif(80)
x <- rnorm(80) + c(rep(0, 54), rep(1, 26))
fit <- lm(y ~ x * obsFactor)
summary(fit)

Call:
lm(formula = y ~ x * obsFactor)

Residuals:
Min 1Q Median 3Q Max
-0.48375 -0.29655 0.05957 0.22797 0.49617

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.50959 0.04253 11.983 <2e-16 ***
x -0.02492 0.04194 -0.594 0.554
obsFactor[55,81) -0.06357 0.09593 -0.663 0.510
x:obsFactor[55,81) 0.07120 0.07371 0.966 0.337
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3116 on 76 degrees of freedom
Multiple R-squared: 0.01303, Adjusted R-squared: -0.02593
F-statistic: 0.3345 on 3 and 76 DF, p-value: 0.8004

obsFactor[55,81) is zero if observationNumber < 55 and one if it is greater or equal; its coefficient lets the intercept shift in the second period (drop that main effect from the formula if your model forces a single common $b_0$). The coefficient for $x_t$ is your $b_1$. x:obsFactor[55,81) is the product of the dummy and the variable $x_t$; its coefficient is the slope difference $b_2 - b_1$, so $b_2$ is the sum of the x and x:obsFactor[55,81) coefficients.
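If you want the paper's parameters directly, you can assemble them from the fitted coefficients. A minimal sketch, assuming the fit object from above:
b0 <- coef(fit)["(Intercept)"]
b1 <- coef(fit)["x"]                                    # slope outside the period
b2 <- coef(fit)["x"] + coef(fit)["x:obsFactor[55,81)"]  # slope during the period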
Why I() "AsIs" is necessary when making a linear polynomial model in R?
I'm trying to understand the role of the I() base function in R when fitting a linear polynomial model, compared with the function poly(). When I calculate the model using
q + q^2
q + I(q^2)
poly(q, 2)
I get different answers. Here is an example:
set.seed(20)
q <- seq(from=0, to=20, by=0.1)
y <- 500 + .1 * (q-5)^2
noise <- rnorm(length(q), mean=10, sd=80)
noisy.y <- y + noise
model3 <- lm(noisy.y ~ poly(q,2))
model1 <- lm(noisy.y ~ q + I(q^2))
model2 <- lm(noisy.y ~ q + q^2)
I(q^2)==I(q)^2
I(q^2)==q^2
summary(model1)
summary(model2)
summary(model3)
Here is the output:

> summary(model1)

Call:
lm(formula = noisy.y ~ q + I(q^2))

Residuals:
Min 1Q Median 3Q Max
-211.592 -50.609 4.742 61.983 165.792

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 489.3723 16.5982 29.483 <2e-16 ***
q 5.0560 3.8344 1.319 0.189
I(q^2) -0.1530 0.1856 -0.824 0.411
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568

> summary(model2)

Call:
lm(formula = noisy.y ~ q + q^2)

Residuals:
Min 1Q Median 3Q Max
-219.96 -54.42 3.30 61.06 170.79

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 499.5209 11.1252 44.900 <2e-16 ***
q 1.9961 0.9623 2.074 0.0393 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 79.16 on 199 degrees of freedom
Multiple R-squared: 0.02117, Adjusted R-squared: 0.01625
F-statistic: 4.303 on 1 and 199 DF, p-value: 0.03933

> summary(model3)

Call:
lm(formula = noisy.y ~ poly(q, 2))

Residuals:
Min 1Q Median 3Q Max
-211.592 -50.609 4.742 61.983 165.792

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 519.482 5.588 92.966 <2e-16 ***
poly(q, 2)1 164.202 79.222 2.073 0.0395 *
poly(q, 2)2 -65.314 79.222 -0.824 0.4107
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568

Why is I() necessary when fitting a polynomial model in R? Also, is it normal that the poly() function doesn't give the same result as q + I(q^2)?
The formula syntax in R is described in the ?formula help page. Inside a formula, the ^ symbol does not have its usual meaning of exponentiation. Rather, it expands interactions among the terms at its base. For example
y ~ (a+b)^2
is the same as
y ~ a + b + a:b
But if you do
y ~ a + b^2
y ~ a + b  # same as above; there is no way to "interact" b with itself
that caret just includes the b term, because a variable can't interact with itself. So ^ and * inside formulas have nothing to do with multiplication, just as + doesn't really mean addition for variables in the usual sense. If you want the "usual" meaning of ^2, you need to wrap the term in I(), the "as is" function; otherwise the model isn't fitting a squared term at all.
And the poly() function by default returns orthogonal polynomials, as described on its help page. This helps to reduce collinearity among the covariates. But if you don't want the orthogonal versions and just want the "raw" polynomial terms, then just pass raw=TRUE to your poly() call. For example
lm(noisy.y ~ poly(q, 2, raw=TRUE))
will return the same estimates as model1.
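A quick way to see what each formula actually fits is to inspect the columns of the design matrix it generates. A minimal sketch with a small made-up data frame (the names d, a, b are hypothetical):
d <- data.frame(y = rnorm(5), a = rnorm(5), b = rnorm(5))
colnames(model.matrix(y ~ (a + b)^2, d))   # "(Intercept)" "a" "b" "a:b"
colnames(model.matrix(y ~ a + b^2, d))     # "(Intercept)" "a" "b" -- no squared term
colnames(model.matrix(y ~ a + I(b^2), d))  # "(Intercept)" "a" "I(b^2)"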
General Linear Model interpretation of parameter estimates in R
I have a data set that looks like
"","OBSERV","DIOX","logDIOX","OXYGEN","LOAD","PRSEK","PLANT","TIME","LAB"
"1",1011,984.06650389,6.89169348002254,"L","H","L","RENO_N","1","KK"
"2",1022,1790.7973641,7.49041625445373,"H","H","L","RENO_N","1","USA"
"3",1031,661.95870145,6.4952031694744,"L","H","H","RENO_N","1","USA"
"4",1042,978.06853583,6.88557974511529,"H","H","H","RENO_N","1","KK"
"5",1051,270.92290942,5.60183431332639,"N","N","N","RENO_N","1","USA"
"6",1062,402.98269729,5.99889362626069,"N","N","N","RENO_N","1","USA"
"7",1071,321.71945701,5.77367991426247,"H","L","L","RENO_N","1","KK"
"8",1082,223.15260359,5.40785585845064,"L","L","L","RENO_N","1","USA"
"9",1091,246.65350151,5.507984523849,"H","L","H","RENO_N","1","USA"
"10",1102,188.48323034,5.23900903921703,"L","L","H","RENO_N","1","KK"
"11",1141,267.34994025,5.58855843790491,"N","N","N","RENO_N","1","KK"
"12",1152,452.10355987,6.11391126834609,"N","N","N","RENO_N","1","KK"
"13",2011,2569.6672555,7.85153169693888,"N","N","N","KARA","1","USA"
"14",2021,604.79620572,6.40489155123453,"N","N","N","KARA","1","KK"
"15",2031,2610.4804449,7.86728956188212,"L","H",NA,"KARA","1","KK"
"16",2032,3789.7097503,8.24004471210954,"L","H",NA,"KARA","1","USA"
"17",2052,338.97054188,5.82591320649553,"L","L","L","KARA","1","KK"
"18",2061,391.09027375,5.96893841249289,"H","L","H","KARA","1","USA"
"19",2092,410.04420258,6.01626496505788,"N","N","N","KARA","1","USA"
"20",2102,313.51882368,5.74785940190679,"N","N","N","KARA","1","KK"
"21",2112,1242.5931417,7.12495571830002,"H","H","H","KARA","1","KK"
"22",2122,1751.4827969,7.46821802066524,"H","H","L","KARA","1","USA"
"23",3011,60.48026048,4.10231703874031,"N","N","N","RENO_S","1","KK"
"24",3012,257.27729731,5.55015448107691,"N","N","N","RENO_S","1","USA"
"25",3021,46.74282552,3.84466077914493,"N","N","N","RENO_S","1","KK"
"26",3022,73.605375516,4.29871805996994,"N","N","N","RENO_S","1","KK"
"27",3031,108.25433812,4.68448344109116,"H","H","L","RENO_S","1","KK"
"28",3032,124.40704234,4.82355878915293,"H","H","L","RENO_S","1","USA"
"29",3042,123.66859296,4.81760535031397,"L","H","L","RENO_S","1","KK"
"30",3051,170.05332632,5.13611207209694,"N","N","N","RENO_S","1","USA"
"31",3052,95.868704018,4.56297958887925,"N","N","N","RENO_S","1","KK"
"32",3061,202.69261215,5.31169060558111,"N","N","N","RENO_S","1","USA"
"33",3062,70.686307069,4.25825187761015,"N","N","N","RENO_S","1","USA"
"34",3071,52.034715526,3.95191110210073,"L","H","H","RENO_S","1","KK"
"35",3072,93.33525462,4.53619789950355,"L","H","H","RENO_S","1","USA"
"36",3081,121.47464906,4.79970559129829,"H","H","H","RENO_S","1","USA"
"37",3082,94.833869239,4.55212661590867,"H","H","H","RENO_S","1","KK"
"38",3091,68.624596439,4.22865101914209,"H","L","L","RENO_S","1","USA"
"39",3092,64.837097371,4.17187792984139,"H","L","L","RENO_S","1","KK"
"40",3101,32.351569811,3.47666254561192,"L","L","L","RENO_S","1","KK"
"41",3102,29.285124102,3.37707967726539,"L","L","L","RENO_S","1","USA"
"42",3111,31.36974463,3.44584388158928,"L","L","H","RENO_S","1","USA"
"43",3112,28.127853881,3.33676032670116,"L","L","H","RENO_S","1","KK"
"44",3121,91.825330102,4.51988818660262,"H","L","H","RENO_S","1","KK"
"45",3122,136.4559307,4.91600171048243,"H","L","H","RENO_S","1","USA"
"46",4011,126.11889968,4.83722511024933,"H","L","H","RENO_N","2","KK"
"47",4022,76.520259821,4.33755554003153,"L","L","L","RENO_N","2","KK"
"48",4032,93.551979795,4.53851721545715,"L","L","H","RENO_N","2","USA"
"49",4041,207.09703422,5.33318744777751,"H","L","L","RENO_N","2","USA"
"50",4052,383.44185307,5.94918798759058,"N","N","N","RENO_N","2","USA"
"51",4061,156.79345897,5.05492939129363,"N","N","N","RENO_N","2","USA"
"52",4071,322.72413197,5.77679787769979,"L","H","L","RENO_N","2","USA"
"53",4082,554.05710342,6.31726775620079,"H","H","H","RENO_N","2","USA"
"54",4091,122.55552697,4.80856420867156,"N","N","N","RENO_N","2","KK"
"55",4102,112.70050456,4.72473389805434,"N","N","N","RENO_N","2","KK"
"56",4111,94.245481423,4.54590288271731,"L","H","H","RENO_N","2","KK"
"57",4122,323.16498582,5.77816298482521,"H","H","L","RENO_N","2","KK"
I define a linear model in R using lm as
lm1 <- lm(logDIOX ~ 1 + OXYGEN + LOAD + PLANT + TIME + LAB, data=data)
and I want to interpret the estimated coefficients. However, when I extract the coefficients I get multiple NAs (I'm assuming it's due to linear dependencies among the variables). How can I then interpret the coefficients? I only have one intercept that somehow represents one of the levels of each of the included factors in the model. Is it possible to get an estimate for each factor level?

> summary(lm1)

Call:
lm(formula = logDIOX ~ OXYGEN + LOAD + PLANT + TIME + LAB, data = data)

Residuals:
Min 1Q Median 3Q Max
-0.90821 -0.32102 -0.08993 0.27311 0.97758

Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2983 0.2110 34.596 < 2e-16 ***
OXYGENL -0.4086 0.1669 -2.449 0.017953 *
OXYGENN -0.7567 0.1802 -4.199 0.000113 ***
LOADL -1.0645 0.1675 -6.357 6.58e-08 ***
LOADN NA NA NA NA
PLANTRENO_N -0.6636 0.2174 -3.052 0.003664 **
PLANTRENO_S -2.3452 0.1929 -12.158 < 2e-16 ***
TIME2 -0.9160 0.2065 -4.436 5.18e-05 ***
LABUSA 0.3829 0.1344 2.849 0.006392 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5058 on 49 degrees of freedom
Multiple R-squared: 0.8391, Adjusted R-squared: 0.8161
F-statistic: 36.5 on 7 and 49 DF, p-value: < 2.2e-16
For the NA part of your question, have a look at: linear regression "NA" estimate just for last coefficient. It means one of your variables can be described as a linear combination of the rest. For factors and their levels, the way R works is to fold the first level of each factor into the intercept and report the other levels as differences from it. I think it will be clearer with a single-factor regression:
lm1 <- lm(logDIOX ~ 1 + OXYGEN, data=df)

> summary(lm1)

Call:
lm(formula = logDIOX ~ 1 + OXYGEN, data = df)

Residuals:
Min 1Q Median 3Q Max
-1.7803 -0.7833 -0.2027 0.6597 3.1229

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.5359 0.2726 20.305 <2e-16 ***
OXYGENL -0.4188 0.3909 -1.071 0.289
OXYGENN -0.1896 0.3807 -0.498 0.621
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.188 on 54 degrees of freedom
Multiple R-squared: 0.02085, Adjusted R-squared: -0.01542
F-statistic: 0.5749 on 2 and 54 DF, p-value: 0.5662

What this result says is that for OXYGEN="H" the intercept is 5.5359, for OXYGEN="L" the intercept is 5.5359-0.4188=5.1171, and for OXYGEN="N" the intercept is 5.5359-0.1896=5.3463. Hope this helps.
UPDATE: Following your comment, I generalize to your model:
when OXYGEN="H", LOAD="H", PLANT="KARA", TIME=1, LAB="KK", then: logDIOX = 7.2983
when OXYGEN="L", LOAD="H", PLANT="KARA", TIME=1, LAB="KK", then: logDIOX = 7.2983-0.4086 = 6.8897
when OXYGEN="L", LOAD="L", PLANT="KARA", TIME=1, LAB="KK", then: logDIOX = 7.2983-0.4086-1.0645 = 5.8252
etc.
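If you'd rather not add up coefficients by hand, predict() does the same arithmetic. A minimal sketch, assuming the lm1 fit on the full model from the question (R will warn that the fit is rank-deficient because of the dropped LOADN term, but it still returns fitted values):
newdat <- expand.grid(OXYGEN = c("H", "L", "N"), LOAD = "H",
                      PLANT = "KARA", TIME = "1", LAB = "KK")
cbind(newdat, logDIOX = predict(lm1, newdata = newdat))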
Need help modeling data with a log() function
I have some data that Excel will fit pretty nicely with a logarithmic trend. I want to pass the same data into R and have it tell me the coefficients and the intercept. What form should the data be in, and what function should I call to have it figure out the coefficients? Ultimately, I want to do this thousands of times so that I can project into the future.
Passing Excel these values produces this trendline function:
y = -0.099ln(x) + 0.7521
Data:
y <- c(0.7521, 0.683478429, 0.643337383, 0.614856858, 0.592765647, 0.574715813, 0.559454895, 0.546235287, 0.534574767, 0.524144076, 0.514708368)
For context, the data points represent the % of our user base that is retained on a given day.
The question omitted the values of x, but working backwards it seems you were using 1, 2, 3, ..., so try the following:
x <- 1:11
y <- c(0.7521, 0.683478429, 0.643337383, 0.614856858, 0.592765647, 0.574715813, 0.559454895, 0.546235287, 0.534574767, 0.524144076, 0.514708368)
fm <- lm(y ~ log(x))
giving:
> coef(fm)
(Intercept)      log(x)
     0.7521     -0.0990
and
plot(y ~ x, log = "x")
lines(fitted(fm) ~ x, col = "red")
You can get the same results with:
y <- c(0.7521, 0.683478429, 0.643337383, 0.614856858, 0.592765647, 0.574715813, 0.559454895, 0.546235287, 0.534574767, 0.524144076, 0.514708368)
t <- seq(along=y)

> summary(lm(y~log(t)))

Call:
lm(formula = y ~ log(t))

Residuals:
Min 1Q Median 3Q Max
-3.894e-10 -2.288e-10 -2.891e-11 1.620e-10 4.609e-10

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.521e-01 2.198e-10 3421942411 <2e-16 ***
log(t) -9.900e-02 1.261e-10 -784892428 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.972e-10 on 9 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 6.161e+17 on 1 and 9 DF, p-value: < 2.2e-16

For larger projects I recommend encapsulating the data in a data frame, like
df <- data.frame(y, t)
lm(formula = y ~ log(t), data=df)
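Since the question mentions projecting into the future, note that either fit extrapolates with predict(). A minimal sketch, assuming the fm object from the first answer (the future day numbers are hypothetical):
future <- data.frame(x = 12:30)  # days beyond the observed range
predict(fm, newdata = future)    # projected retention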
Two stage least square in R
I want to run a two-stage probit least-squares regression in R. Does anyone know how to do this? Is there any package out there? I know it's possible to do it using Stata, so I imagine it's possible with R.
You might want to be more specific when you say 'two-stage probit least squares'. Since you refer to a Stata program that implements this, I am guessing you are talking about the CDSIMEQ package, which implements the Amemiya (1978) procedure for the Heckit model (a.k.a. generalized Tobit, a.k.a. Tobit type II model, etc.). As Grant said, systemfit will do a Tobit for you, but not with two equations. The micEcon package did have a Heckit (but the package has split so many times I don't know where it is now). If you want what CDSIMEQ does, it can easily be implemented in R. I wrote a function that replicates CDSIMEQ:
tspls <- function(formula1, formula2, data) {
  # The continuous model
  mf1 <- model.frame(formula1, data)
  y1 <- model.response(mf1)
  x1 <- model.matrix(attr(mf1, "terms"), mf1)

  # The dichotomous model
  mf2 <- model.frame(formula2, data)
  y2 <- model.response(mf2)
  x2 <- model.matrix(attr(mf2, "terms"), mf2)

  # The matrix of all the exogenous variables
  X <- cbind(x1, x2)
  X <- X[, unique(colnames(X))]

  J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1))
  J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2))
  for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1
  for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1

  # Step 1:
  cat("\n\tNOW THE FIRST STAGE REGRESSION")
  m1 <- lm(y1 ~ X - 1)
  m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit"))
  print(summary(m1))
  print(summary(m2))

  yhat1 <- m1$fitted.values
  yhat2 <- X %*% coef(m2)

  PI1 <- m1$coefficients
  PI2 <- m2$coefficients
  V0 <- vcov(m2)
  sigma1sq <- sum(m1$residuals ^ 2) / m1$df.residual
  sigma12 <- 1 / length(y2) * sum(y2 * m1$residuals / dnorm(yhat2))

  # Step 2:
  cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS")
  m1 <- lm(y1 ~ yhat2 + x1 - 1)
  m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))
  sm1 <- summary(m1)
  sm2 <- summary(m2)
  print(sm1)
  print(sm2)

  # Step 3:
  cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n")
  gamma1 <- m1$coefficients[1]
  gamma2 <- m2$coefficients[1]

  cc <- sigma1sq - 2 * gamma1 * sigma12
  dd <- gamma2 ^ 2 * sigma1sq - 2 * gamma2 * sigma12

  H <- cbind(PI2, J1)
  G <- cbind(PI1, J2)

  XX <- crossprod(X)                           # X'X
  HXXH <- solve(t(H) %*% XX %*% H)             # (H'X'XH)^(-1)
  HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H   # H'X'X V0 X'X H
  Valpha1 <- cc * HXXH + gamma1 ^ 2 * HXXH %*% HXXVXXH %*% HXXH

  GV <- t(G) %*% solve(V0)                     # G'V0^(-1)
  GVG <- solve(GV %*% G)                       # (G'V0^(-1)G)^(-1)
  Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG

  ans1 <- coef(sm1)
  ans2 <- coef(sm2)
  ans1[,2] <- sqrt(diag(Valpha1))
  ans2[,2] <- sqrt(diag(Valpha2))
  ans1[,3] <- ans1[,1] / ans1[,2]
  ans2[,3] <- ans2[,1] / ans2[,2]
  ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE)
  ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE)

  cat("Continuous:\n")
  print(ans1)
  cat("Dichotomous:\n")
  print(ans2)
}
For comparison, we can replicate the example from the author of CDSIMEQ in their article about the package:
> library(foreign)
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta")
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4,
+       dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7,
+       data = cdsimeq)

	NOW THE FIRST STAGE REGRESSION

Call:
lm(formula = y1 ~ X - 1)

Residuals:
Min 1Q Median 3Q Max
-1.885921 -0.438579 -0.006262 0.432156 2.133738

Coefficients:
Estimate Std. Error t value Pr(>|t|)
X(Intercept) 0.010752 0.020620 0.521 0.602187
Xexog3 0.158469 0.021862 7.249 8.46e-13 ***
Xexog2 -0.009669 0.021666 -0.446 0.655488
Xexog1 0.159955 0.021260 7.524 1.19e-13 ***
Xexog4 0.316575 0.022456 14.097 < 2e-16 ***
Xexog5 0.497207 0.021356 23.282 < 2e-16 ***
Xexog6 -0.078017 0.021755 -3.586 0.000352 ***
Xexog7 0.161177 0.022103 7.292 6.23e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6488 on 992 degrees of freedom
Multiple R-squared: 0.5972, Adjusted R-squared: 0.594
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16

Call:
glm(formula = y2 ~ X - 1, family = binomial(link = "probit"))

Deviance Residuals:
Min 1Q Median 3Q Max
-2.49531 -0.59244 0.01983 0.59708 2.41810

Coefficients:
Estimate Std. Error z value Pr(>|z|)
X(Intercept) 0.08352 0.05280 1.582 0.113692
Xexog3 0.21345 0.05678 3.759 0.000170 ***
Xexog2 0.21131 0.05471 3.862 0.000112 ***
Xexog1 0.45591 0.06023 7.570 3.75e-14 ***
Xexog4 0.39031 0.06173 6.322 2.57e-10 ***
Xexog5 0.75955 0.06427 11.818 < 2e-16 ***
Xexog6 0.85461 0.06831 12.510 < 2e-16 ***
Xexog7 -0.16691 0.05653 -2.953 0.003152 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.14 on 992 degrees of freedom
AIC: 770.14

Number of Fisher Scoring iterations: 6

	NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS

Call:
lm(formula = y1 ~ yhat2 + x1 - 1)

Residuals:
Min 1Q Median 3Q Max
-2.32152 -0.53160 0.04886 0.53502 2.44818

Coefficients:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.257592 0.021451 12.009 <2e-16 ***
x1(Intercept) 0.012185 0.024809 0.491 0.623
x1exog3 0.042520 0.026735 1.590 0.112
x1exog2 0.011854 0.026723 0.444 0.657
x1exog1 0.007773 0.028217 0.275 0.783
x1exog4 0.318636 0.028311 11.255 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7803 on 994 degrees of freedom
Multiple R-squared: 0.4163, Adjusted R-squared: 0.4128
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16

Call:
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))

Deviance Residuals:
Min 1Q Median 3Q Max
-2.49610 -0.58595 0.01969 0.59857 2.41281

Coefficients:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26287 0.16061 7.863 3.75e-15 ***
x2(Intercept) 0.07080 0.05276 1.342 0.179654
x2exog1 0.25093 0.06466 3.880 0.000104 ***
x2exog2 0.22604 0.05389 4.194 2.74e-05 ***
x2exog5 0.12912 0.09510 1.358 0.174544
x2exog6 0.95609 0.07172 13.331 < 2e-16 ***
x2exog7 -0.37128 0.06759 -5.493 3.94e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.21 on 993 degrees of freedom
AIC: 768.21

Number of Fisher Scoring iterations: 6

	NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS

Continuous:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.25759209 0.1043073 2.46955009 0.01369540
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445
x1exog3 0.04252006 0.1291588 0.32920764 0.74206810
x1exog2 0.01185438 0.1290754 0.09184073 0.92684309
x1exog1 0.00777347 0.1363643 0.05700519 0.95455252
x1exog4 0.31863627 0.1367881 2.32941597 0.02003661
Dichotomous:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26286574 0.7395166 1.7076909 0.0876937093
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867
x2exog1 0.25092561 0.3126763 0.8025092 0.4222584495
x2exog2 0.22603717 0.2739307 0.8251618 0.4092797527
x2exog5 0.12911922 0.4822986 0.2677163 0.7889176766
x2exog6 0.95609385 0.2823662 3.3860070 0.0007091758
x2exog7 -0.37128221 0.3265478 -1.1369920 0.2555416141
systemfit will also do the trick.
There are several packages available in R to do two-stage least squares. Here are a few:
sem: Two-Stage Least Squares
Zelig: Link removed, no longer functional (28.07.11)
Let me know if these serve your purpose.
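For ordinary (linear) two-stage least squares, as opposed to the two-stage probit procedure above, the tsls() function in the sem package is a common starting point. A minimal sketch using the Kmenta example that ships with the sem package:
library(sem)
data(Kmenta)  # classic supply/demand data included with sem
# demand equation; D, F and A are the exogenous instruments
fit2sls <- tsls(Q ~ P + D, instruments = ~ D + F + A, data = Kmenta)
summary(fit2sls)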