R: lm interaction terms with categorical and squared continuous variables

I am trying to get an lm fit for my data. The problem I am having is that I want to fit a linear model (a 1st-order polynomial) when the factor is "true" and a 2nd-order polynomial when the factor is "false". How can I get that done using only one lm call?
a=c(1,2,3,4,5,6,7,8,9,10)
b=factor(c("true","false","true","false","true","false","true","false","true","false"))
c=c(10,8,20,15,30,21,40,25,50,31)
DumbData<-data.frame(cbind(a,c))
DumbData<-cbind(DumbData,b=b)
I have tried
Lm2<-lm(c~a + b + b*I(a^2), data=DumbData)
summary(Lm2)
that results in:
summary(Lm2)
Call:
lm(formula = c ~ a + b + b * I(a^2), data = DumbData)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.74483 1.12047 -0.665 0.535640
a 4.44433 0.39619 11.218 9.83e-05 ***
btrue 6.78670 0.78299 8.668 0.000338 ***
I(a^2) -0.13457 0.03324 -4.049 0.009840 **
btrue:I(a^2) 0.18719 0.01620 11.558 8.51e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7537 on 5 degrees of freedom
Multiple R-squared: 0.9982, Adjusted R-squared: 0.9967
F-statistic: 688 on 4 and 5 DF, p-value: 4.896e-07
Here I(a^2) appears in both fits, and I want one fit to be first order and the other second order.
If one tries with:
Lm2<-lm(c~a + b + I(b*I(a^2)), data=DumbData)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(b, I(a^2)) : * not meaningful for factors
How can I get the proper interaction terms here?
Thanks Andrie, there are still some things I am missing here. In this example the variable b is a logical one; if it is a factor with two levels it does not work, so I guess I have to convert the factor variable to a logical one. The other thing I am missing is the negation in the condition, I(!b*a^2); without the ! I get:
Call: lm(formula = c ~ a + I(b * a^2), data = dat)
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2692 1.8425 3.945 0.005565 **
a 2.3222 0.3258 7.128 0.000189 ***
I(b * a^2) 0.3005 0.0355 8.465 6.34e-05 ***
I cannot relate the formulas with and without the ! condition, which is a bit strange to me.

Try something along the following lines:
dat <- data.frame(
a=c(1,2,3,4,5,6,7,8,9,10),
b=c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE),
c=c(10,8,20,15,30,21,40,25,50,31)
)
fit <- lm(c ~ a + I(!b * a^2), dat)
summary(fit)
This results in:
Call:
lm(formula = c ~ a + I(!b * a^2), data = dat)
Residuals:
Min 1Q Median 3Q Max
-4.60 -2.65 0.50 2.65 4.40
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.5000 2.6950 3.896 0.005928 **
a 3.9000 0.4209 9.266 3.53e-05 ***
I(!b * a^2)TRUE -13.9000 2.4178 -5.749 0.000699 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.764 on 7 degrees of freedom
Multiple R-squared: 0.9367, Adjusted R-squared: 0.9186
F-statistic: 51.75 on 2 and 7 DF, p-value: 6.398e-05
Note:
I made use of the logical values TRUE and FALSE.
These will coerce to 1 and 0, respectively.
I used the negation !b inside the formula.
Be careful, though: ! has lower precedence than * in R, so I(!b * a^2) parses as I(!(b * a^2)), a logical that is TRUE exactly when b is FALSE (for nonzero a). That is why the output above shows a TRUE level rather than a continuous quadratic term; to multiply a^2 by the negation itself, write I((!b) * a^2).

Ummm ...
Lm2<-lm(c~a + b + b*I(a^2), data=DumbData)
You say that "The problem I am having is that I want to fit a linear model (1st-order polynomial) when the factor is 'true' and a second-order polynomial when the factor is 'false'. How can I get that done using only one lm."
From that I infer that you don't want b to be directly in the model? In addition, a^2 should be included only if b is false.
So that would be...
lm(c~ a + I((!b) * a^2))
If b is true (that is, !b equals FALSE) then a^2 is multiplied by zero (FALSE) and omitted from the equation.
The only problem is that you have defined b as factor instead of logical. That can be cured.
# b=factor(c("true","false","true","false","true","false","true","false","true","false"))
# could use TRUE and FALSE instead of "true" and "false"
# alternatively, after defining b as above, do
# b <- b=="true" -- that would convert b to logical (i.e. boolean TRUE and FALSE values)
To be exact, you wrapped a character vector in factor() when defining b, so it was already a factor before being added to the data frame ("DumbData").
Another minor point about the way you defined the data frame.
a=c(1,2,3,4,5,6,7,8,9,10)
b=factor(c("true","false","true","false","true","false","true","false","true","false"))
c=c(10,8,20,15,30,21,40,25,50,31)
DumbData<-data.frame(cbind(a,c))
DumbData<-cbind(DumbData,b=b)
Here, cbind is unnecessary. You could have it all on one line:
DumbData <- data.frame(a, b, c)
# shorter and cleaner!!
In addition, to convert b to logical use:
DumbData <- data.frame(a, b=b=="true", c)
Note: you need to say b=b=="true". It seems redundant, but the LHS (b) gives the name of the variable in the data frame, whereas the RHS (b=="true") is an expression that evaluates to a logical (boolean) value.
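Putting the advice above together, a minimal end-to-end sketch (defining b as logical from the start; the data values are those from the question):

```r
# data from the question, with b logical rather than factor
a <- 1:10
b <- rep(c(TRUE, FALSE), 5)
c <- c(10, 8, 20, 15, 30, 21, 40, 25, 50, 31)
DumbData <- data.frame(a, b, c)

# note the parentheses: (!b) * a^2, not !(b * a^2)
fit <- lm(c ~ a + I((!b) * a^2), data = DumbData)
coef(fit)  # intercept, slope in a, and an a^2 term active only when b is FALSE
```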

Related

SLR of transformed data in R

For Y = % of population with income below the poverty level and X = per capita income of the population, I have constructed a Box-Cox plot and found that lambda = 0.02020:
bc <- boxcox(lm(Percent_below_poverty_level ~ Per_capita_income, data=tidy.CDI), plotit=T)
bc$x[which.max(bc$y)] # gives lambda
Now I want to fit a simple linear regression using the transformed data, so I've entered this code
transform <- lm((Percent_below_poverty_level**0.02020) ~ (Per_capita_income**0.02020))
transform
But all I get is the error message
'Error in terms.formula(formula, data = data) : invalid power in formula'. What is my mistake?
You could use bcPower() from the car package.
## make sure you do install.packages("car") if you haven't already
library(car)
data(Prestige)
p <- powerTransform(prestige ~ income + education + type,
                    data=Prestige,
                    family="bcPower")
summary(p)
# bcPower Transformation to Normality
# Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
# Y1 1.3052 1 0.9408 1.6696
#
# Likelihood ratio test that transformation parameter is equal to 0
# (log transformation)
# LRT df pval
# LR test, lambda = (0) 41.67724 1 1.0765e-10
#
# Likelihood ratio test that no transformation is needed
# LRT df pval
# LR test, lambda = (1) 2.623915 1 0.10526
mod <- lm(bcPower(prestige, 1.3052) ~ income + education + type, data=Prestige)
summary(mod)
#
# Call:
# lm(formula = bcPower(prestige, 1.3052) ~ income + education +
# type, data = Prestige)
#
# Residuals:
# Min 1Q Median 3Q Max
# -44.843 -13.102 0.287 15.073 62.889
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -3.736e+01 1.639e+01 -2.279 0.0250 *
# income 3.363e-03 6.928e-04 4.854 4.87e-06 ***
# education 1.205e+01 2.009e+00 5.999 3.78e-08 ***
# typeprof 2.027e+01 1.213e+01 1.672 0.0979 .
# typewc -1.078e+01 7.884e+00 -1.368 0.1746
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 22.25 on 93 degrees of freedom
# (4 observations deleted due to missingness)
# Multiple R-squared: 0.8492, Adjusted R-squared: 0.8427
# F-statistic: 131 on 4 and 93 DF, p-value: < 2.2e-16
Powers (more often represented by ^ than ** in R, FWIW) have a special meaning inside formulas (they represent interactions among variables rather than mathematical operations). So if you did want to power-transform both sides of your equation you would use the I() or "as-is" operator:
I(Percent_below_poverty_level^0.02020) ~ I(Per_capita_income^0.02020)
However, I think you should do what @DaveArmstrong suggested anyway:
it's only the response variable that gets transformed
the Box-Cox transformation is actually (y^lambda-1)/lambda (although the shift and scale might not matter for your results)
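A minimal sketch of that advice, transforming only the response with the full Box-Cox form (the data here are made up, since tidy.CDI is not shown; the variable names are taken from the question):

```r
set.seed(1)
# hypothetical stand-in for tidy.CDI
d <- data.frame(Per_capita_income = runif(100, 5000, 30000))
d$Percent_below_poverty_level <-
  exp(2 - 0.00005 * d$Per_capita_income + rnorm(100, sd = 0.2))

lambda <- 0.02020
bc_transform <- function(y, lambda) (y^lambda - 1) / lambda  # full Box-Cox form
fit <- lm(bc_transform(Percent_below_poverty_level, lambda) ~ Per_capita_income,
          data = d)
coef(fit)
```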

R equivalent of Stata's for-loop over macros

I have a variable x that is between 0 and 1, or (0,1].
I want to generate 10 dummy variables for 10 deciles of variable x. For example x_0_10 takes value 1 if x is between 0 and 0.1, x_10_20 takes value 1 if x is between 0.1 and 0.2, ...
The Stata code to do above is something like this:
forval p=0(10)90 {
local Next=`p'+10
gen x_`p'_`Next'=0
replace x_`p'_`Next'=1 if x<=`Next'/100 & x>`p'/100
}
Now, I am new at R and I wonder how I can do above in R?
cut is your friend here; its output is a factor, which, when used in models, R will auto-expand into the 10 dummy variables.
set.seed(2932)
x = runif(1e4)
y = 3 + 4 * x + rnorm(1e4)
x_cut = cut(x, 0:10/10, include.lowest = TRUE)
summary(lm(y ~ x_cut))
# Call:
# lm(formula = y ~ x_cut)
#
# Residuals:
# Min 1Q Median 3Q Max
# -3.7394 -0.6888 0.0028 0.6864 3.6742
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.16385 0.03243 97.564 <2e-16 ***
# x_cut(0.1,0.2] 0.43932 0.04551 9.654 <2e-16 ***
# x_cut(0.2,0.3] 0.85555 0.04519 18.933 <2e-16 ***
# x_cut(0.3,0.4] 1.26441 0.04588 27.556 <2e-16 ***
# x_cut(0.4,0.5] 1.66181 0.04495 36.970 <2e-16 ***
# x_cut(0.5,0.6] 2.04538 0.04574 44.714 <2e-16 ***
# x_cut(0.6,0.7] 2.44771 0.04533 53.999 <2e-16 ***
# x_cut(0.7,0.8] 2.80875 0.04591 61.182 <2e-16 ***
# x_cut(0.8,0.9] 3.22323 0.04545 70.919 <2e-16 ***
# x_cut(0.9,1] 3.60092 0.04564 78.897 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.011 on 9990 degrees of freedom
# Multiple R-squared: 0.5589, Adjusted R-squared: 0.5585
# F-statistic: 1407 on 9 and 9990 DF, p-value: < 2.2e-16
See ?cut for more customization options.
You can also pass cut directly in the RHS of the formula, which would make using predict a bit easier:
reg = lm(y ~ cut(x, 0:10/10, include.lowest = TRUE))
idx = sample(length(x), 500)
plot(x[idx], y[idx])
x_grid = seq(0, 1, length.out = 500L)
lines(x_grid, predict(reg, data.frame(x = x_grid)),
col = 'red', lwd = 3L, type = 's')
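If you really want ten named 0/1 columns, as the Stata loop produces, rather than a factor, here is a direct sketch:

```r
set.seed(1)
x <- runif(100)  # x in (0, 1]

# one indicator per decile bin (p/10, (p+1)/10], mirroring the Stata loop
dummies <- sapply(0:9, function(p) as.integer(x > p / 10 & x <= (p + 1) / 10))
colnames(dummies) <- paste0("x_", 0:9 * 10, "_", 0:9 * 10 + 10)

head(dummies)  # columns x_0_10, x_10_20, ..., x_90_100
```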
This won't fit well into a comment, but for the record, the Stata code can be simplified down to
forval p = 0/9 {
gen x_`p' = x > `p'/10 & x <= (`p' + 1)/10
}
Note that, contrary to the OP's claim, values of x that are exactly zero will be mapped to zero for all these variables, both in their code and in mine (which is intended as a simplification of their code, not a correct way to do it, modulo a difference of taste in variable names). That follows from the fact that 0 is not greater than 0. Similarly, values that are exactly 0.1, 0.2, 0.3, and so on will in principle go in the lower bin, not the higher bin, though that is complicated by the fact that most multiples of 0.1 don't have exact binary representations (0.5 is clearly an exception).
Indeed, depending on details about their set-up that the OP doesn't tell us, indicator variables (dummy variables, in their terminology) may well be available in Stata without a loop or made quite unnecessary by factor variable notation. In that respect Stata is closer to R than may at first appear.
While not answering the question directly, the signal here to Stata and R users alike is that Stata need not be so awkward as might be inferred from the code in the question.

variable lengths differ in R using lm

I get a nasty surprise when running lm in R:
variable lengths differ (found for 'returnsandp')
I run the following model:
# regress apple price return on s&p price return
attach(NewSetSlide.ex)
resultr = lm(returnapple ~ returnsandp)
summary(resultr)
It cannot get any simpler than that, but for some reason I get the error above.
I checked that the lengths of returnapple and returnsandp are exactly the same. So what on earth is going on?
The data.frame in question:
NewSetSlide.ex <- structure(list(returnapple = c(0.1412251, 0.07665801, 0.02560235,
0.09638143, 0.06384145, 0.05163189, -0.1076969, 0.05121892, 0.06428114,
0.09939652, 0.07271771, 0.06923432, 0.02873109, 0.0721757, -0.0121841,
0.07196034, 0.1012038, -0.06786657, 0.06142434, 0.09644931, -0.02754909,
0.005786519, 0.04099078, -0.03320592, -0.03292676, -0.06908485,
-0.01878077, 0.08340874, -0.01004186, -0.1064195, -0.07524236,
-0.006677446, 0.133327, -0.139921, 0.06528701, -0.036831, 0.09006266,
0.01813659, 0.07127628, 0.004334296, -0.02659846, 0.05333548,
0.04774654, 0.1288835, 0.05323629, -0.00006978558, 0.0634182,
-0.0533224, 0.03270362, 0.1026693, -0.05655361, 0.09680779, 0.01662336,
-0.01170586, -0.01063646, 0.0638476, -0.0542103, -0.01501973,
0.1307637, -0.005598485, 0.02798327, 0.1962269, 0.006725292,
0), returnsandp = c(0.1159772758, 0.007614392, 0.1104467964,
0.0359706698, 0.0152313579, 0.0331342721, 0.0189951476, 0.0330947526,
0.0749868297, -0.0124064592, 0.0323295771, -0.0303030364, 0.0113188732,
0.0101582303, -0.0151743475, 0.0174258083, -0.0088341409, -0.0092159647,
-0.0388593467, 0.0134979946, 0.0054655738, -0.05935645, 0.0174692125,
-0.0164511628, 0.1063320628, -0.0034796438, -0.0000602649, -0.0151122528,
0.0223743915, 0.0740851449, 0.0086287811, -0.0028700134, -0.0045942764,
0.0540510532, 0.0121340172, -0.0048475787, -0.0119945162, -0.034724078,
0.0425088143, 0.0650615875, 0.0450610926, 0.0023665278, 0.0714892769,
0.052793919, -0.0141481377, 0.0502292875, 0.0141095206, -0.0586828306,
0.071192607, -0.0854386059, 0.05472933, 0.0214771911, -0.0282882713,
0.1317668962, 0.0369236189, 0.0263898652, -0.0114502121, 0.0060341972,
0.0479144906, 0.0482236974, 0.0349588397, -0.0241661652, -0.2176304161,
-0.0853488645)), class = "data.frame", row.names = c(NA, -64L))
Based on @Dave2e's comment:
It is better to use the data=NewSetSlide.ex argument inside the lm function call to avoid naming conflicts; moreover, you should avoid using the attach function. Please see below (the NewSetSlide.ex data frame was taken from the question above):
resultr <- lm(returnapple ~ returnsandp, data = NewSetSlide.ex)
summary(resultr)
Output:
Call:
lm(formula = returnapple ~ returnsandp, data = NewSetSlide.ex)
Residuals:
Min 1Q Median 3Q Max
-0.166599181 -0.041838291 0.003778841 0.044034591 0.166774667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.028595156 0.008677931 3.29516 0.0016294 **
returnsandp -0.035466006 0.160976847 -0.22032 0.8263478
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0672294 on 62 degrees of freedom
Multiple R-squared: 0.0007822871, Adjusted R-squared: -0.01533413
F-statistic: 0.04853977 on 1 and 62 DF, p-value: 0.8263478
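For the record, the usual cause of this error can be reproduced: a leftover object in the workspace with the same name as a column shadows the attached column. A sketch (with made-up data of the same shape):

```r
NewSetSlide.ex <- data.frame(returnapple = rnorm(64), returnsandp = rnorm(64))
returnsandp <- rnorm(10)  # stray global object with the same name, wrong length

attach(NewSetSlide.ex)
# the global returnsandp (length 10) masks the attached column (length 64)
err <- tryCatch(lm(returnapple ~ returnsandp),
                error = function(e) conditionMessage(e))
detach(NewSetSlide.ex)
err  # reports that the variable lengths differ
```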

General Linear Model interpretation of parameter estimates in R

I have a data set that looks like
"","OBSERV","DIOX","logDIOX","OXYGEN","LOAD","PRSEK","PLANT","TIME","LAB"
"1",1011,984.06650389,6.89169348002254,"L","H","L","RENO_N","1","KK"
"2",1022,1790.7973641,7.49041625445373,"H","H","L","RENO_N","1","USA"
"3",1031,661.95870145,6.4952031694744,"L","H","H","RENO_N","1","USA"
"4",1042,978.06853583,6.88557974511529,"H","H","H","RENO_N","1","KK"
"5",1051,270.92290942,5.60183431332639,"N","N","N","RENO_N","1","USA"
"6",1062,402.98269729,5.99889362626069,"N","N","N","RENO_N","1","USA"
"7",1071,321.71945701,5.77367991426247,"H","L","L","RENO_N","1","KK"
"8",1082,223.15260359,5.40785585845064,"L","L","L","RENO_N","1","USA"
"9",1091,246.65350151,5.507984523849,"H","L","H","RENO_N","1","USA"
"10",1102,188.48323034,5.23900903921703,"L","L","H","RENO_N","1","KK"
"11",1141,267.34994025,5.58855843790491,"N","N","N","RENO_N","1","KK"
"12",1152,452.10355987,6.11391126834609,"N","N","N","RENO_N","1","KK"
"13",2011,2569.6672555,7.85153169693888,"N","N","N","KARA","1","USA"
"14",2021,604.79620572,6.40489155123453,"N","N","N","KARA","1","KK"
"15",2031,2610.4804449,7.86728956188212,"L","H",NA,"KARA","1","KK"
"16",2032,3789.7097503,8.24004471210954,"L","H",NA,"KARA","1","USA"
"17",2052,338.97054188,5.82591320649553,"L","L","L","KARA","1","KK"
"18",2061,391.09027375,5.96893841249289,"H","L","H","KARA","1","USA"
"19",2092,410.04420258,6.01626496505788,"N","N","N","KARA","1","USA"
"20",2102,313.51882368,5.74785940190679,"N","N","N","KARA","1","KK"
"21",2112,1242.5931417,7.12495571830002,"H","H","H","KARA","1","KK"
"22",2122,1751.4827969,7.46821802066524,"H","H","L","KARA","1","USA"
"23",3011,60.48026048,4.10231703874031,"N","N","N","RENO_S","1","KK"
"24",3012,257.27729731,5.55015448107691,"N","N","N","RENO_S","1","USA"
"25",3021,46.74282552,3.84466077914493,"N","N","N","RENO_S","1","KK"
"26",3022,73.605375516,4.29871805996994,"N","N","N","RENO_S","1","KK"
"27",3031,108.25433812,4.68448344109116,"H","H","L","RENO_S","1","KK"
"28",3032,124.40704234,4.82355878915293,"H","H","L","RENO_S","1","USA"
"29",3042,123.66859296,4.81760535031397,"L","H","L","RENO_S","1","KK"
"30",3051,170.05332632,5.13611207209694,"N","N","N","RENO_S","1","USA"
"31",3052,95.868704018,4.56297958887925,"N","N","N","RENO_S","1","KK"
"32",3061,202.69261215,5.31169060558111,"N","N","N","RENO_S","1","USA"
"33",3062,70.686307069,4.25825187761015,"N","N","N","RENO_S","1","USA"
"34",3071,52.034715526,3.95191110210073,"L","H","H","RENO_S","1","KK"
"35",3072,93.33525462,4.53619789950355,"L","H","H","RENO_S","1","USA"
"36",3081,121.47464906,4.79970559129829,"H","H","H","RENO_S","1","USA"
"37",3082,94.833869239,4.55212661590867,"H","H","H","RENO_S","1","KK"
"38",3091,68.624596439,4.22865101914209,"H","L","L","RENO_S","1","USA"
"39",3092,64.837097371,4.17187792984139,"H","L","L","RENO_S","1","KK"
"40",3101,32.351569811,3.47666254561192,"L","L","L","RENO_S","1","KK"
"41",3102,29.285124102,3.37707967726539,"L","L","L","RENO_S","1","USA"
"42",3111,31.36974463,3.44584388158928,"L","L","H","RENO_S","1","USA"
"43",3112,28.127853881,3.33676032670116,"L","L","H","RENO_S","1","KK"
"44",3121,91.825330102,4.51988818660262,"H","L","H","RENO_S","1","KK"
"45",3122,136.4559307,4.91600171048243,"H","L","H","RENO_S","1","USA"
"46",4011,126.11889968,4.83722511024933,"H","L","H","RENO_N","2","KK"
"47",4022,76.520259821,4.33755554003153,"L","L","L","RENO_N","2","KK"
"48",4032,93.551979795,4.53851721545715,"L","L","H","RENO_N","2","USA"
"49",4041,207.09703422,5.33318744777751,"H","L","L","RENO_N","2","USA"
"50",4052,383.44185307,5.94918798759058,"N","N","N","RENO_N","2","USA"
"51",4061,156.79345897,5.05492939129363,"N","N","N","RENO_N","2","USA"
"52",4071,322.72413197,5.77679787769979,"L","H","L","RENO_N","2","USA"
"53",4082,554.05710342,6.31726775620079,"H","H","H","RENO_N","2","USA"
"54",4091,122.55552697,4.80856420867156,"N","N","N","RENO_N","2","KK"
"55",4102,112.70050456,4.72473389805434,"N","N","N","RENO_N","2","KK"
"56",4111,94.245481423,4.54590288271731,"L","H","H","RENO_N","2","KK"
"57",4122,323.16498582,5.77816298482521,"H","H","L","RENO_N","2","KK"
I define a linear model in R using lm as
lm1 <- lm(logDIOX ~ 1 + OXYGEN + LOAD + PLANT + TIME + LAB, data=data)
and I want to interpret the estimated coefficients. However, when I extract the coefficients I get multiple 'NAs' (I'm assuming it's due to linear dependencies among the variables). How can I then interpret the coefficients? I only have one intercept that somehow represents one of the levels of each of the included factors in the model. Is it possible to get an estimate for each factor level?
> summary(lm1)
Call:
lm(formula = logDIOX ~ OXYGEN + LOAD + PLANT + TIME + LAB, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.90821 -0.32102 -0.08993 0.27311 0.97758
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2983 0.2110 34.596 < 2e-16 ***
OXYGENL -0.4086 0.1669 -2.449 0.017953 *
OXYGENN -0.7567 0.1802 -4.199 0.000113 ***
LOADL -1.0645 0.1675 -6.357 6.58e-08 ***
LOADN NA NA NA NA
PLANTRENO_N -0.6636 0.2174 -3.052 0.003664 **
PLANTRENO_S -2.3452 0.1929 -12.158 < 2e-16 ***
TIME2 -0.9160 0.2065 -4.436 5.18e-05 ***
LABUSA 0.3829 0.1344 2.849 0.006392 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5058 on 49 degrees of freedom
Multiple R-squared: 0.8391, Adjusted R-squared: 0.8161
F-statistic: 36.5 on 7 and 49 DF, p-value: < 2.2e-16
For the NA part of your question you can have a look here: linear regression "NA" estimate just for last coefficient. One of your variables can be described as a linear combination of the rest.
For factors and their levels, the way R works is to fold the first factor level into the intercept and report the remaining levels as differences from that baseline. I think it will be clearer with a single-factor regression:
lm1 <- lm(logDIOX ~ 1 + OXYGEN , data=df)
> summary(lm1)
Call:
lm(formula = logDIOX ~ 1 + OXYGEN, data = df)
Residuals:
Min 1Q Median 3Q Max
-1.7803 -0.7833 -0.2027 0.6597 3.1229
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.5359 0.2726 20.305 <2e-16 ***
OXYGENL -0.4188 0.3909 -1.071 0.289
OXYGENN -0.1896 0.3807 -0.498 0.621
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.188 on 54 degrees of freedom
Multiple R-squared: 0.02085, Adjusted R-squared: -0.01542
F-statistic: 0.5749 on 2 and 54 DF, p-value: 0.5662
What this result says is that for
OXYGEN="H" the intercept is 5.5359, for OXYGEN="L" it is 5.5359-0.4188=5.1171, and for OXYGEN="N" it is 5.5359-0.1896=5.3463.
Hope this helps
UPDATE:
Following your comment, I generalize to your model.
when
OXYGEN = "H"
LOAD = "H"
PLANT= "KARA"
TIME=1
LAB="KK"
then:
logDIOX =7.2983
when
OXYGEN = "L"
LOAD = "H"
PLANT= "KARA"
TIME=1
LAB="KK"
then:
logDIOX =7.2983-0.4086 =6.8897
when
OXYGEN = "L"
LOAD = "L"
PLANT= "KARA"
TIME=1
LAB="KK"
then:
logDIOX =7.2983-0.4086-1.0645 =5.8252
etc.
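To see which levels the intercept absorbs, you can inspect the design matrix that lm builds; a small sketch with made-up data (factor names borrowed from the question):

```r
set.seed(1)
df <- data.frame(y = rnorm(8),
                 OXYGEN = rep(c("H", "L"), each = 4),
                 LOAD   = rep(c("H", "L"), times = 4))
fit <- lm(y ~ OXYGEN + LOAD, data = df)

# "H" is first alphabetically, so it is the baseline folded into the intercept
colnames(model.matrix(fit))  # "(Intercept)" "OXYGENL" "LOADL"
```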

Changing significance notation in R

R has certain significance codes to denote statistical significance. In the sample output below, for example, a dot . indicates significance at the 10% level.
Dots can be very hard to see, especially when I copy-paste to Excel and display it in Times New Roman.
I'd like to change it such that:
* = significant at 10%
** = significant at 5%
*** = significant at 1%
Is there a way I can do this?
> y = c(1,2,3,4,5,6,7,8)
> x = c(1,3,2,4,5,6,8,7)
> summary(lm(y~x))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-1.0714 -0.3333 0.0000 0.2738 1.1191
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2143 0.6286 0.341 0.74480
x 0.9524 0.1245 7.651 0.00026 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8067 on 6 degrees of freedom
Multiple R-squared: 0.907, Adjusted R-squared: 0.8915
F-statistic: 58.54 on 1 and 6 DF, p-value: 0.0002604
You can create your own formatting function with
mystarformat <- function(x) symnum(x, corr = FALSE, na = FALSE,
cutpoints = c(0, 0.01, 0.05, 0.1, 1),
symbols = c("***", "**", "*", " "))
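A quick sanity check of the formatter on a few arbitrary p-values (the function is repeated here so the snippet is self-contained):

```r
mystarformat <- function(x) symnum(x, corr = FALSE, na = FALSE,
                                   cutpoints = c(0, 0.01, 0.05, 0.1, 1),
                                   symbols = c("***", "**", "*", " "))

as.character(mystarformat(c(0.005, 0.03, 0.07, 0.5)))  # "***" "**" "*" " "
```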
And you can write your own coefficient formatter
show_coef <- function(mm) {
mycoef<-data.frame(coef(summary(mm)), check.names=F)
mycoef$signif = mystarformat(mycoef$`Pr(>|t|)`)
mycoef$`Pr(>|t|)` = format.pval(mycoef$`Pr(>|t|)`)
mycoef
}
And then with your model, you can run it with
mm <- lm(y~x)
show_coef(mm)
# Estimate Std. Error t value Pr(>|t|) signif
# (Intercept) 0.2142857 0.6285895 0.3408993 0.7447995
# x 0.9523810 0.1244793 7.6509206 0.0002604 ***
One should be aware that the stargazer package reports significance levels on a different scale than some other statistical software, such as Stata.
In R (stargazer) you get (* p<0.1; ** p<0.05; *** p<0.01), whereas in Stata you get (* p<0.05, ** p<0.01, *** p<0.001).
This means that what is significant with one * in R output may appear not to be significant to a Stata user.
Sorry for the late response, but I found a great solution to this.
Just do the following:
install.packages("stargazer")
library(stargazer)
stargazer(your_regression, type = "text")
This displays everything in a beautiful way with your desired format.
Note: If you leave type = "text" out, then you'll get the LaTeX code.
