General Linear Model interpretation of parameter estimates in R
I have a data set that looks like
"","OBSERV","DIOX","logDIOX","OXYGEN","LOAD","PRSEK","PLANT","TIME","LAB"
"1",1011,984.06650389,6.89169348002254,"L","H","L","RENO_N","1","KK"
"2",1022,1790.7973641,7.49041625445373,"H","H","L","RENO_N","1","USA"
"3",1031,661.95870145,6.4952031694744,"L","H","H","RENO_N","1","USA"
"4",1042,978.06853583,6.88557974511529,"H","H","H","RENO_N","1","KK"
"5",1051,270.92290942,5.60183431332639,"N","N","N","RENO_N","1","USA"
"6",1062,402.98269729,5.99889362626069,"N","N","N","RENO_N","1","USA"
"7",1071,321.71945701,5.77367991426247,"H","L","L","RENO_N","1","KK"
"8",1082,223.15260359,5.40785585845064,"L","L","L","RENO_N","1","USA"
"9",1091,246.65350151,5.507984523849,"H","L","H","RENO_N","1","USA"
"10",1102,188.48323034,5.23900903921703,"L","L","H","RENO_N","1","KK"
"11",1141,267.34994025,5.58855843790491,"N","N","N","RENO_N","1","KK"
"12",1152,452.10355987,6.11391126834609,"N","N","N","RENO_N","1","KK"
"13",2011,2569.6672555,7.85153169693888,"N","N","N","KARA","1","USA"
"14",2021,604.79620572,6.40489155123453,"N","N","N","KARA","1","KK"
"15",2031,2610.4804449,7.86728956188212,"L","H",NA,"KARA","1","KK"
"16",2032,3789.7097503,8.24004471210954,"L","H",NA,"KARA","1","USA"
"17",2052,338.97054188,5.82591320649553,"L","L","L","KARA","1","KK"
"18",2061,391.09027375,5.96893841249289,"H","L","H","KARA","1","USA"
"19",2092,410.04420258,6.01626496505788,"N","N","N","KARA","1","USA"
"20",2102,313.51882368,5.74785940190679,"N","N","N","KARA","1","KK"
"21",2112,1242.5931417,7.12495571830002,"H","H","H","KARA","1","KK"
"22",2122,1751.4827969,7.46821802066524,"H","H","L","KARA","1","USA"
"23",3011,60.48026048,4.10231703874031,"N","N","N","RENO_S","1","KK"
"24",3012,257.27729731,5.55015448107691,"N","N","N","RENO_S","1","USA"
"25",3021,46.74282552,3.84466077914493,"N","N","N","RENO_S","1","KK"
"26",3022,73.605375516,4.29871805996994,"N","N","N","RENO_S","1","KK"
"27",3031,108.25433812,4.68448344109116,"H","H","L","RENO_S","1","KK"
"28",3032,124.40704234,4.82355878915293,"H","H","L","RENO_S","1","USA"
"29",3042,123.66859296,4.81760535031397,"L","H","L","RENO_S","1","KK"
"30",3051,170.05332632,5.13611207209694,"N","N","N","RENO_S","1","USA"
"31",3052,95.868704018,4.56297958887925,"N","N","N","RENO_S","1","KK"
"32",3061,202.69261215,5.31169060558111,"N","N","N","RENO_S","1","USA"
"33",3062,70.686307069,4.25825187761015,"N","N","N","RENO_S","1","USA"
"34",3071,52.034715526,3.95191110210073,"L","H","H","RENO_S","1","KK"
"35",3072,93.33525462,4.53619789950355,"L","H","H","RENO_S","1","USA"
"36",3081,121.47464906,4.79970559129829,"H","H","H","RENO_S","1","USA"
"37",3082,94.833869239,4.55212661590867,"H","H","H","RENO_S","1","KK"
"38",3091,68.624596439,4.22865101914209,"H","L","L","RENO_S","1","USA"
"39",3092,64.837097371,4.17187792984139,"H","L","L","RENO_S","1","KK"
"40",3101,32.351569811,3.47666254561192,"L","L","L","RENO_S","1","KK"
"41",3102,29.285124102,3.37707967726539,"L","L","L","RENO_S","1","USA"
"42",3111,31.36974463,3.44584388158928,"L","L","H","RENO_S","1","USA"
"43",3112,28.127853881,3.33676032670116,"L","L","H","RENO_S","1","KK"
"44",3121,91.825330102,4.51988818660262,"H","L","H","RENO_S","1","KK"
"45",3122,136.4559307,4.91600171048243,"H","L","H","RENO_S","1","USA"
"46",4011,126.11889968,4.83722511024933,"H","L","H","RENO_N","2","KK"
"47",4022,76.520259821,4.33755554003153,"L","L","L","RENO_N","2","KK"
"48",4032,93.551979795,4.53851721545715,"L","L","H","RENO_N","2","USA"
"49",4041,207.09703422,5.33318744777751,"H","L","L","RENO_N","2","USA"
"50",4052,383.44185307,5.94918798759058,"N","N","N","RENO_N","2","USA"
"51",4061,156.79345897,5.05492939129363,"N","N","N","RENO_N","2","USA"
"52",4071,322.72413197,5.77679787769979,"L","H","L","RENO_N","2","USA"
"53",4082,554.05710342,6.31726775620079,"H","H","H","RENO_N","2","USA"
"54",4091,122.55552697,4.80856420867156,"N","N","N","RENO_N","2","KK"
"55",4102,112.70050456,4.72473389805434,"N","N","N","RENO_N","2","KK"
"56",4111,94.245481423,4.54590288271731,"L","H","H","RENO_N","2","KK"
"57",4122,323.16498582,5.77816298482521,"H","H","L","RENO_N","2","KK"
I define a linear model in R using lm as
lm1 <- lm(logDIOX ~ 1 + OXYGEN + LOAD + PLANT + TIME + LAB, data=data)
and I want to interpret the estimated coefficients. However, when I extract the coefficients I get 'NA's (I assume this is due to linear dependencies among the variables). How should I interpret the coefficients in that case? I only get one intercept, which somehow represents one level of each of the factors included in the model. Is it possible to get an estimate for each factor level?
> summary(lm1)
Call:
lm(formula = logDIOX ~ OXYGEN + LOAD + PLANT + TIME + LAB, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.90821 -0.32102 -0.08993 0.27311 0.97758
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2983 0.2110 34.596 < 2e-16 ***
OXYGENL -0.4086 0.1669 -2.449 0.017953 *
OXYGENN -0.7567 0.1802 -4.199 0.000113 ***
LOADL -1.0645 0.1675 -6.357 6.58e-08 ***
LOADN NA NA NA NA
PLANTRENO_N -0.6636 0.2174 -3.052 0.003664 **
PLANTRENO_S -2.3452 0.1929 -12.158 < 2e-16 ***
TIME2 -0.9160 0.2065 -4.436 5.18e-05 ***
LABUSA 0.3829 0.1344 2.849 0.006392 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5058 on 49 degrees of freedom
Multiple R-squared: 0.8391, Adjusted R-squared: 0.8161
F-statistic: 36.5 on 7 and 49 DF, p-value: < 2.2e-16
For the NA part of your question you can have a look here: linear regression "NA" estimate just for last coefficient. In short, one of your variables can be written as a linear combination of the others, so lm cannot estimate a separate coefficient for it.
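If you want R to name the dependency for you, alias() will do it. A minimal sketch, assuming your fitted model is the lm1 above (in the data you show, LOAD is "N" exactly when OXYGEN is "N", so the LOADN dummy duplicates the OXYGENN dummy):

## alias() reports the exact linear dependency behind the NA coefficient
alias(lm1)   # the "Complete" section expresses LOADN in terms of the other columns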
As for the factors and their levels: R folds the first level of each factor into the intercept and reports, for every other level, its difference from that reference level. I think it will be clearer with a single-factor regression:
lm1 <- lm(logDIOX ~ 1 + OXYGEN , data=df)
> summary(lm1)
Call:
lm(formula = logDIOX ~ 1 + OXYGEN, data = df)
Residuals:
Min 1Q Median 3Q Max
-1.7803 -0.7833 -0.2027 0.6597 3.1229
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.5359 0.2726 20.305 <2e-16 ***
OXYGENL -0.4188 0.3909 -1.071 0.289
OXYGENN -0.1896 0.3807 -0.498 0.621
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.188 on 54 degrees of freedom
Multiple R-squared: 0.02085, Adjusted R-squared: -0.01542
F-statistic: 0.5749 on 2 and 54 DF, p-value: 0.5662
What this result says is: for OXYGEN="H" the intercept is 5.5359, for OXYGEN="L" it is 5.5359 - 0.4188 = 5.1171, and for OXYGEN="N" it is 5.5359 - 0.1896 = 5.3463.
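If you would rather see one estimate per level than differences from a reference, you can drop the intercept; with a single factor the coefficients are then simply the per-level means. A minimal sketch (note that with several factors only the first one is expanded this way):

## "~ 0 + ..." removes the intercept, so each OXYGEN level gets its own estimate
lm0 <- lm(logDIOX ~ 0 + OXYGEN, data = df)
coef(lm0)
# OXYGENH OXYGENL OXYGENN
#  5.5359  5.1171  5.3463   (the same numbers as intercept plus differences)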
Hope this helps
UPDATE:
Following your comment, here is the generalization to your full model.
when
OXYGEN = "H"
LOAD = "H"
PLANT = "KARA"
TIME = "1"
LAB = "KK"
then:
logDIOX = 7.2983
when
OXYGEN = "L"
LOAD = "H"
PLANT = "KARA"
TIME = "1"
LAB = "KK"
then:
logDIOX = 7.2983 - 0.4086 = 6.8897
when
OXYGEN = "L"
LOAD = "L"
PLANT = "KARA"
TIME = "1"
LAB = "KK"
then:
logDIOX = 7.2983 - 0.4086 - 1.0645 = 5.8252
etc.
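Instead of adding coefficients by hand, predict() does the same bookkeeping. A sketch (because of the aliased LOADN term R will warn that the fit is rank-deficient; that is harmless for these combinations):

newdat <- data.frame(OXYGEN = c("H", "L", "L"),
                     LOAD   = c("H", "H", "L"),
                     PLANT  = "KARA",
                     TIME   = "1",
                     LAB    = "KK")
predict(lm1, newdata = newdat)  # 7.2983, 6.8897, 5.8252 -- matching the sums above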
Related
How to formulate time period dummy variable in lm()
I am analysing whether the effects of x_t on y_t differ during and after a specific time period. I am trying to regress the following model in R using lm(): y_t = b_0 + [b_1(1-D_t) + b_2 D_t]x_t where D_t is a dummy variable with the value 1 over the time period and 0 otherwise. Is it possible to use lm() for this formula?
observationNumber <- 1:80
obsFactor <- cut(observationNumber, breaks = c(0, 55, 81), right = FALSE)
fit <- lm(y ~ x * obsFactor)

For example:

y <- runif(80)
x <- rnorm(80) + c(rep(0, 54), rep(1, 26))
fit <- lm(y ~ x * obsFactor)
summary(fit)

Call:
lm(formula = y ~ x * obsFactor)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48375 -0.29655  0.05957  0.22797  0.49617

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.50959    0.04253  11.983   <2e-16 ***
x                  -0.02492    0.04194  -0.594    0.554
obsFactor[55,81)   -0.06357    0.09593  -0.663    0.510
x:obsFactor[55,81)  0.07120    0.07371   0.966    0.337
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3116 on 76 degrees of freedom
Multiple R-squared: 0.01303, Adjusted R-squared: -0.02593
F-statistic: 0.3345 on 3 and 76 DF, p-value: 0.8004

obsFactor[55,81) is zero if observationNumber < 55 and one otherwise. The intercept is your $b_0$ and the coefficient for $x_t$ is your $b_1$. x:obsFactor[55,81) is the product of the dummy and $x_t$; its coefficient is the change in slope during the period, i.e. $b_2 - b_1$, so $b_2$ is the sum of the x and x:obsFactor[55,81) estimates.
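If you prefer coefficients that are $b_1$ and $b_2$ directly, rather than a slope plus a change in slope, you can fit a separate slope per period with a common intercept, which mirrors the original equation exactly. A minimal sketch on the same simulated data:

fit2 <- lm(y ~ x:obsFactor)  # common intercept, one slope per period
coef(fit2)
# (Intercept)        -> b_0
# x:obsFactor[0,55)  -> b_1 (slope before observation 55)
# x:obsFactor[55,81) -> b_2 (slope from observation 55 onward)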
Why is I() ("AsIs") necessary when making a linear polynomial model in R?
I'm trying to understand the role of the base function I() in R when fitting a linear polynomial model, versus the function poly. When I fit the model using q + q^2, q + I(q^2), and poly(q, 2) I get different answers. Here is an example:

set.seed(20)
q <- seq(from = 0, to = 20, by = 0.1)
y <- 500 + 0.1 * (q - 5)^2
noise <- rnorm(length(q), mean = 10, sd = 80)
noisy.y <- y + noise
model3 <- lm(noisy.y ~ poly(q, 2))
model1 <- lm(noisy.y ~ q + I(q^2))
model2 <- lm(noisy.y ~ q + q^2)
I(q^2) == I(q)^2
I(q^2) == q^2
summary(model1)
summary(model2)
summary(model3)

Here is the output:

> summary(model1)

Call:
lm(formula = noisy.y ~ q + I(q^2))

Residuals:
     Min       1Q   Median       3Q      Max
-211.592  -50.609    4.742   61.983  165.792

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 489.3723    16.5982  29.483   <2e-16 ***
q             5.0560     3.8344   1.319    0.189
I(q^2)       -0.1530     0.1856  -0.824    0.411
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568

> summary(model2)

Call:
lm(formula = noisy.y ~ q + q^2)

Residuals:
    Min      1Q  Median      3Q     Max
-219.96  -54.42    3.30   61.06  170.79

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 499.5209    11.1252  44.900   <2e-16 ***
q             1.9961     0.9623   2.074   0.0393 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 79.16 on 199 degrees of freedom
Multiple R-squared: 0.02117, Adjusted R-squared: 0.01625
F-statistic: 4.303 on 1 and 199 DF, p-value: 0.03933

> summary(model3)

Call:
lm(formula = noisy.y ~ poly(q, 2))

Residuals:
     Min       1Q   Median       3Q      Max
-211.592  -50.609    4.742   61.983  165.792

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  519.482      5.588  92.966   <2e-16 ***
poly(q, 2)1  164.202     79.222   2.073   0.0395 *
poly(q, 2)2  -65.314     79.222  -0.824   0.4107
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568

Why is I() necessary when fitting a polynomial model in R? Also, is it normal that the poly function doesn't give the same result as q + I(q^2)?
The formula syntax in R is described in the ?formula help page. Inside a formula, the ^ symbol does not have the usual meaning of exponentiation. Rather, it expands interactions among the terms at its base. For example

y ~ (a + b)^2

is the same as

y ~ a + b + a:b

But if you do

y ~ a + b^2
y ~ a + b  # same as above; there is no way to "interact" b with itself

that caret just includes the b term, because b cannot interact with itself. So ^ and * inside formulas have nothing to do with multiplication, just as + doesn't really mean addition of variables in the usual sense. If you want the usual definition of ^2, you need to wrap it in the "as is" function I(); otherwise you aren't fitting a squared term at all.

And poly() by default returns orthogonal polynomials, as described on its help page. This helps reduce collinearity among the covariates. If you don't want the orthogonal versions and just want the "raw" polynomial terms, pass raw=TRUE to your poly call. For example

lm(noisy.y ~ poly(q, 2, raw = TRUE))

will return the same estimates as model1.
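You can see this directly in the design matrix: inside a formula, q^2 contributes no new column, while I(q^2) does. A minimal sketch with a toy vector:

q <- 1:5
colnames(model.matrix(~ q + q^2))     # "(Intercept)" "q" -- the square silently vanishes
colnames(model.matrix(~ q + I(q^2)))  # "(Intercept)" "q" "I(q^2)"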
Extract data from partial least squares regression in R
I want to use partial least squares regression to find the most representative variables for predicting my data. Here is my code:

library(pls)
potion <- read.table("potion-insomnie.txt", header = TRUE)
potionTrain <- potion[1:182, ]
potionTest <- potion[183:192, ]
potion1 <- plsr(Sommeil ~ Aubepine + Bave + Poudre + Pavot, data = potionTrain, validation = "LOO")

summary(lm(potion1)) gives me this:

Call:
lm(formula = potion1)

Residuals:
     Min       1Q   Median       3Q      Max
-14.9475  -5.3961   0.0056   5.2321  20.5847

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.63931    1.67955  22.410  < 2e-16 ***
Aubepine    -0.28226    0.05195  -5.434 1.81e-07 ***
Bave        -1.79894    0.26849  -6.700 2.68e-10 ***
Poudre       0.35420    0.72849   0.486    0.627
Pavot       -0.47678    0.52027  -0.916    0.361
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.845 on 177 degrees of freedom
Multiple R-squared: 0.293, Adjusted R-squared: 0.277
F-statistic: 18.34 on 4 and 177 DF, p-value: 1.271e-12

I deduced that only the variables Aubepine and Bave are representative, so I refit the model with just these two variables:

potion1 <- plsr(Sommeil ~ Aubepine + Bave, data = potionTrain, validation = "LOO")

And I plot:

plot(potion1, ncomp = 2, asp = 1, line = TRUE)

[plot of predicted vs. measured values, not shown]

The problem is that I can see the regression line on the plot, but I cannot get its equation and R². Is that possible? And is the first part the same as a multiple linear regression (ANOVA)?
pacman::p_load(pls)
data(mtcars)
potion <- mtcars
potionTrain <- potion[1:28, ]
potionTest <- potion[29:32, ]

potion1 <- plsr(mpg ~ cyl + disp + hp + drat, data = potionTrain, validation = "LOO")
coef(potion1)   # coefficients
scores(potion1) # scores

## R^2:
R2(potion1, estimate = "train")
## cross-validated R^2:
R2(potion1)
## Both:
R2(potion1, estimate = "all")
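To get the equation and R² of the line through the predicted-vs-measured points (the other half of the question), you can pull out the fitted values and regress them on the measured response yourself. A minimal sketch continuing the mtcars stand-in above (drop() strips the 3-d array that predict() returns):

measured  <- potionTrain$mpg
predicted <- drop(predict(potion1, ncomp = 2))  # fitted values using 2 components
lineFit   <- lm(predicted ~ measured)
coef(lineFit)                 # intercept and slope of the plotted line
summary(lineFit)$r.squared    # its R^2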
Different results of lm with same dataset written in two different languages (English and Korean)
Results of the lm function applied to two datasets (numeric variables + categorical variables), one written in English and the other written in Korean, are different. Apart from the categorical variables, the numeric variables are exactly the same. What could explain the difference in the results?

#data
df3 <- repmis::source_DropboxData("df3_v0.1.csv", "gg30a74n4ew3zzg", header = TRUE)

#the one written in korean
out1 <- lm(YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + DTD9, data = df3)
summary(out1)

#the one written in eng
df3$SANJI[df3$SANJI == "전북"] <- "JB"
df3$SANJI[df3$SANJI == "충북"] <- "CHB"
df3$SANJI[df3$SANJI == "경북"] <- "KB"
df3$SANJI[df3$SANJI == "전남"] <- "JN"
df3$SANJI2[df3$SANJI2 == "고창"] <- "Gochang"
df3$SANJI2[df3$SANJI2 == "괴산"] <- "Goesan"
df3$SANJI2[df3$SANJI2 == "단양"] <- "Danyang"
df3$SANJI2[df3$SANJI2 == "봉화"] <- "Fenghua"
df3$SANJI2[df3$SANJI2 == "신안"] <- "Sinan"
df3$SANJI2[df3$SANJI2 == "안동"] <- "Andong"
df3$SANJI2[df3$SANJI2 == "영광"] <- "younggang"
df3$SANJI2[df3$SANJI2 == "영양"] <- "youngyang"
df3$SANJI2[df3$SANJI2 == "영주"] <- "youngju"
df3$SANJI2[df3$SANJI2 == "예천"] <- "Yecheon"
df3$SANJI2[df3$SANJI2 == "의성"] <- "Yusaeng"
df3$SANJI2[df3$SANJI2 == "제천"] <- "Jechon"
df3$SANJI2[df3$SANJI2 == "진안"] <- "Jinan"
df3$SANJI2[df3$SANJI2 == "청송"] <- "Changsong"
df3$SANJI2[df3$SANJI2 == "해남"] <- "Haenam"
out2 <- lm(YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + DTD9, data = df3)
summary(out2)

#the one written in korean
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 +
#    DTD9, data = df3)
#Residuals:
#    Min      1Q  Median      3Q     Max
#-98.836 -23.173  -2.261  22.626 111.367
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) 970.33251   84.12479  11.534  < 2e-16 ***
#SANJI전남    -33.75664   12.53277  -2.693 0.008158 **
#SANJI전북    -44.17939   11.22274  -3.937 0.000144 ***
#SANJI충북    -44.09285    9.16736  -4.810 4.74e-06 ***
#TAmin8      -25.56618    3.36053  -7.608 9.37e-12 ***
#TMINup18do6   4.58052    0.96528   4.745 6.19e-06 ***
#typ_rain6    -0.19754    0.02862  -6.903 3.23e-10 ***
#DTD9        -16.15975    2.65128  -6.095 1.59e-08 ***
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared: 0.58, Adjusted R-squared: 0.5538
#F-statistic: 22.1 on 7 and 112 DF, p-value: < 2.2e-16

#the one written in eng
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 +
#    DTD9, data = df3)
#Residuals:
#    Min      1Q  Median      3Q     Max
#-98.836 -23.173  -2.261  22.626 111.367
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) 926.23966   84.32621  10.984  < 2e-16 ***
#SANJIJB      -0.08654   12.32752  -0.007    0.994
#SANJIJN      10.33620   13.09434   0.789    0.432
#SANJIKB      44.09285    9.16736   4.810 4.74e-06 ***
#TAmin8      -25.56618    3.36053  -7.608 9.37e-12 ***
#TMINup18do6   4.58052    0.96528   4.745 6.19e-06 ***
#typ_rain6    -0.19754    0.02862  -6.903 3.23e-10 ***
#DTD9        -16.15975    2.65128  -6.095 1.59e-08 ***
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared: 0.58, Adjusted R-squared: 0.5538
#F-statistic: 22.1 on 7 and 112 DF, p-value: < 2.2e-16
Your overall model fits are the same; you just have different reference levels for your factor (SANJI). A different reference level also changes your intercept, but it does not change the estimates of your continuous covariates. You can use relevel() to force a particular reference level (assuming SANJI is already a factor), or create the factor() explicitly with a levels= argument. Otherwise the default order is alphabetical, and alphabetical order need not be the same in the two languages.
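For example, forcing the English version onto the same reference level as the Korean run (a sketch; "KB" corresponds to 경북, the baseline in the Korean output) makes the two coefficient tables line up:

df3$SANJI <- relevel(factor(df3$SANJI), ref = "KB")
out2 <- lm(YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + DTD9, data = df3)
summary(out2)  # now reports SANJICHB, SANJIJB, SANJIJN against the KB baseline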
Output from scatter3d R script - how to read the equation
I am using scatter3d to find a fit in my R script. I did so, and here is the output:

Call:
lm(formula = y ~ (x + z)^2 + I(x^2) + I(z^2))

Residuals:
     Min       1Q   Median       3Q      Max
-0.78454 -0.02302 -0.00563  0.01398  0.47846

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.051975   0.003945 -13.173  < 2e-16 ***
x            0.224564   0.023059   9.739  < 2e-16 ***
z            0.356314   0.021782  16.358  < 2e-16 ***
I(x^2)      -0.340781   0.044835  -7.601 3.46e-14 ***
I(z^2)       0.610344   0.028421  21.475  < 2e-16 ***
x:z         -0.454826   0.065632  -6.930 4.71e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05468 on 5293 degrees of freedom
Multiple R-squared: 0.6129, Adjusted R-squared: 0.6125
F-statistic: 1676 on 5 and 5293 DF, p-value: < 2.2e-16

Based on this, what is the equation of the best fit? I'm not really sure how to read this. Can someone explain? Thanks!
This is a basic regression output table. The parameter estimates (the "Estimate" column) are the best-fit coefficients for the different terms in your model. If you aren't familiar with this terminology, I suggest reading a linear model and regression tutorial; there are thousands around the web. I would also encourage you to play with some simpler 2D simulations. For example, let's make some data with an intercept of 2 and a slope of 0.5:

# Simulate data
set.seed(12345)
x <- seq(0, 10, len = 50)
y <- 2 + 0.5 * x + rnorm(length(x), 0, 0.1)
data <- data.frame(x, y)

Now when we look at the fit, you'll see that the Estimate column shows these same values:

# Fit model
fit <- lm(y ~ x, data = data)
summary(fit)

> summary(fit)

Call:
lm(formula = y ~ x, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.26017 -0.06434  0.02539  0.06238  0.20008

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.011759   0.030856   65.20   <2e-16 ***
x           0.501240   0.005317   94.27   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1107 on 48 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9945
F-statistic: 8886 on 1 and 48 DF, p-value: < 2.2e-16

Pulling these out, we can then plot the best-fit line:

# Make plot
dev.new(width = 4, height = 4)
plot(x, y, ylim = c(0, 10))
abline(fit$coef[1], fit$coef[2])
It's not a plane but rather a paraboloid surface (using 'y' as the third dimension, since you used 'z' already):

y = -0.051975 + 0.224564 * x + 0.356314 * z - 0.340781 * x^2 + 0.610344 * z^2 - 0.454826 * x * z
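To sanity-check the reading, you can code the surface as a function and compare it against predict() on the fitted object. A minimal sketch ("fit" is a hypothetical name for the lm object behind this output):

## Hypothetical helper evaluating the fitted paraboloid at (x, z)
surface <- function(x, z) {
  -0.051975 + 0.224564 * x + 0.356314 * z -
    0.340781 * x^2 + 0.610344 * z^2 - 0.454826 * x * z
}
surface(0, 0)  # -0.051975, the intercept
## should match predict(fit, newdata = data.frame(x = 0, z = 0)) up to rounding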