Separate output when results='hold' in knitr - r

Is there a way to separate or section the output of a `{r results='hold'}` chunk in knitr without manually inserting a print('--------') call or the like in between? The problem with such extra print lines is that they make the code much harder to read.
For example, you need to be quite trained to see where the output from one line stops and the next begins in the output from this:
```{r, results='hold'}
summary(lm(Sepal.Length ~ Species, iris))
#print('-------------') # Not a solution
summary(aov(Sepal.Length ~ Species, iris))
```
Output:
```
Call:
lm(formula = Sepal.Length ~ Species, data = iris)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6880 -0.3285 -0.0060  0.3120  1.3120 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5148 on 147 degrees of freedom
Multiple R-squared:  0.6187,  Adjusted R-squared:  0.6135
F-statistic: 119.3 on 2 and 147 DF,  p-value: < 2.2e-16
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
Maybe some knitr option or a more general R printing option?
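One direction to explore (a sketch, not a documented knitr option, so treat it as an assumption to test): leave the chunk in its default `results='markup'` mode, where knitr passes each contiguous piece of printed output to the output hook separately, and make the hook append the separator. The chunk source then stays free of print() noise:

```r
# In a setup chunk: wrap knitr's current output hook so every piece of
# printed output is followed by a horizontal rule.
library(knitr)

default_output <- knit_hooks$get("output")  # may be NULL outside a knit
knit_hooks$set(output = function(x, options) {
  x <- paste0(x, strrep("-", 60), "\n")
  if (is.function(default_output)) default_output(x, options) else x
})
```

This changes the hook globally for the document; `knit_hooks$restore()` undoes it if a later chunk needs the stock behaviour.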

Related

lm() summary: "essentially perfect fit: summary may be unreliable" warning for standardized data

I am a beginner in R and statistics in general. I am trying to build a regression model with 4 variables (one of them is nominal data, with 3 categories). I think I managed to build a model with my raw data, but when I standardized my data set, lm gave me the "essentially perfect fit: summary may be unreliable" message.
This is the summary of the model with the raw data:
```
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -2.0713     5.9131  -0.350    0.727    
Vdownload          8.6046     0.5286  16.279  < 2e-16 ***
DownloadDegisim    2.8854     0.6822   4.229 4.25e-05 ***
Vupload           -4.2877     0.5418  -7.914 7.32e-13 ***
Saglayici2        -8.2084     0.6043 -13.583  < 2e-16 ***
Saglayici3        -9.8869     0.5944 -16.634  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.885 on 138 degrees of freedom
Multiple R-squared:  0.8993,  Adjusted R-squared:  0.8956
F-statistic: 246.5 on 5 and 138 DF,  p-value: < 2.2e-16
```
I wrote this code to standardize my data:
```
memnuniyet_scaled <- scale(Vdownload, center = TRUE, scale = TRUE)
Vdownload_scaled  <- scale(Vdownload, center = TRUE, scale = TRUE)
Vupload_scaled    <- scale(Vupload, center = TRUE, scale = TRUE)
DownloadD_scaled  <- scale(DownloadDegisim, center = TRUE, scale = TRUE)
result <- lm(memnuniyet_scaled ~ Vdownload_scaled + DownloadD_scaled + Vupload_scaled + Saglayıcı)
summary(result)
```
And this is the summary with my standardized data:
```
Coefficients:
                   Estimate Std. Error    t value Pr(>|t|)    
(Intercept)       5.079e-17  5.493e-17  9.250e-01    0.357    
Vdownload_scaled  1.000e+00  6.667e-17  1.500e+16   <2e-16 ***
DownloadD_scaled -4.591e-17  8.189e-17 -5.610e-01    0.576    
Vupload_scaled    9.476e-18  6.337e-17  1.500e-01    0.881    
Saglayici2       -6.523e-17  7.854e-17 -8.300e-01    0.408    
Saglayici3       -8.669e-17  7.725e-17 -1.122e+00    0.264    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.75e-16 on 138 degrees of freedom
Multiple R-squared:  1,  Adjusted R-squared:  1
F-statistic: 2.034e+32 on 5 and 138 DF,  p-value: < 2.2e-16
```
I do know that the R-squared value should not change under standardization, and I have no idea what I did wrong.
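An observation on the code above, worth double-checking because it would explain the perfect fit: the first line assigns `scale(Vdownload)` to `memnuniyet_scaled`, so the response being modelled is a scaled copy of a predictor. A minimal sketch of that symptom, using `mtcars` as stand-in data:

```r
# If the response is accidentally a scaled copy of a predictor,
# lm() recovers it exactly: slope 1, R-squared 1, and the
# "essentially perfect fit" warning from summary().
y_wrong <- scale(mtcars$wt)            # meant to be the response, but
fit <- lm(y_wrong ~ scale(mtcars$wt))  # it is the predictor itself
summary(fit)$r.squared                 # essentially 1
```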

Chinese characters in summary that I cannot knit to pdf

```
model2 = lm(1/glyhb ~ gender + age + gender:age, data = diabetes)
summary(model2)
```
```
Call:
lm(formula = 1/glyhb ~ gender + age + gender:age, data = diabetes)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.149383 -0.019681  0.002455  0.029739  0.163164 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.2498158  0.0120415  20.746  < 2e-16 ***
genderfemale      0.0123141  0.0151608   0.812    0.417    
age              -0.0011384  0.0002363  -4.817 2.09e-06 ***
genderfemale:age -0.0002195  0.0003030  -0.724    0.469    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04778 on 386 degrees of freedom
  (因为不存在,13个观察量被删除了)
Multiple R-squared:  0.164,  Adjusted R-squared:  0.1575
F-statistic: 25.24 on 3 and 386 DF,  p-value: 6.264e-15
```
Error message:
```
! LaTeX Error: Unicode character 因 (U+56E0)
               not set up for use with LaTeX.
```
Can anyone help me change the line under the residual standard error (it is R's "(13 observations deleted due to missingness)" note, printed in Chinese) to English, please?
Thanks in advance!
Setting the LaTeX engine in the YAML header:
```
output:
  pdf_document:
    latex_engine: xelatex
```
works!
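If switching the LaTeX engine is not an option, another approach sometimes suggested (an assumption to verify on your setup) is to ask R for English diagnostic messages, so the "(… observations deleted due to missingness)" note is emitted in English in the first place:

```r
# Run before fitting the model (e.g. in the first chunk): R's translated
# messages, including the missingness note printed by summary(), follow
# the LANGUAGE environment variable.
Sys.setenv(LANGUAGE = "en")
```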

Small sample (20-25 observations) - Robust standard errors (Newey-West) do not change coefficients/standard errors. Is this normal?

I am running a simple regression (OLS):
```
> lm_1 <- lm(Dependent_variable_1 ~ Independent_variable_1, data = data_1)
> summary(lm_1)

Call:
lm(formula = Dependent_variable_1 ~ Independent_variable_1,
    data = data_1)

Residuals:
    Min      1Q  Median      3Q     Max 
-143187  -34084   -4990   37524  136293 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         330853      13016  25.418  < 2e-16 ***
`GDP YoY% - Base`  3164631     689599   4.589 0.000118 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 66160 on 24 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.4674,  Adjusted R-squared:  0.4452
F-statistic: 21.06 on 1 and 24 DF,  p-value: 0.0001181
```
The autocorrelation and heteroskedasticity tests follow:
```
> dwtest(lm_1, alternative = "two.sided")

        Durbin-Watson test

data:  lm_1
DW = 0.93914, p-value = 0.001591
alternative hypothesis: true autocorrelation is not 0

> bptest(lm_1)

        studentized Breusch-Pagan test

data:  lm_1
BP = 9.261, df = 1, p-value = 0.002341
```
Then I run a regression with standard errors robust to autocorrelation and heteroskedasticity (HAC, Newey-West):
```
> coeftest(lm_1, vocv=NeweyWest(lm_1,lag=2, prewhite=FALSE))

t test of coefficients:

                        Estimate Std. Error t value  Pr(>|t|)    
(Intercept)               330853      13016 25.4185 < 2.2e-16 ***
Independent_variable_1   3164631     689599  4.5891 0.0001181 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
and I get the same coefficients and standard errors.
Is this normal? Is it due to the small sample size?
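One thing worth checking before blaming the sample size: in the coeftest() call above the covariance matrix is passed as `vocv=`, but the argument in lmtest is spelled `vcov.` (with a trailing dot). A misspelled argument is absorbed by `...` and silently ignored, so the default OLS standard errors come back unchanged. A corrected call, sketched on stand-in `mtcars` data in place of the original `lm_1`:

```r
library(lmtest)    # coeftest()
library(sandwich)  # NeweyWest()

# Stand-in model; substitute your own lm_1 here.
lm_1 <- lm(mpg ~ wt, data = mtcars)

# Note the argument name: `vcov.`, not `vocv` -- with the typo the HAC
# matrix never reaches coeftest() and OLS standard errors are reported.
coeftest(lm_1, vcov. = NeweyWest(lm_1, lag = 2, prewhite = FALSE))
```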

Using summary() and summary.lm() on planned comparisons in R - why do the outputs differ?

I have run an independent ANOVA in R, with (sound) manipulation as my independent variable with three levels: congruent (KON), incongruent (INK), and no sound (control). Furthermore, I have constructed planned comparisons: the first comparison, c1, contrasts KON & INK against the control group, and the second, c2, contrasts KON against INK. The outputs look like this:
```
summary(model)

                                         Df Sum Sq Mean Sq F value Pr(>F)  
Manipulation                              2  11.97   5.985   2.388 0.0975 .
  Manipulation: control vs. Experimental  1   7.97   7.970   3.181 0.0778 .
  Manipulation: INK vs. KON               1   4.00   3.999   1.596 0.2097  
Residuals                                91 228.01   2.506                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
```
summary.lm(model)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5062 -1.3333 -0.3333  1.1398  4.4111 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      3.0317     0.1647  18.411   <2e-16 ***
Manipulationc1  -0.2214     0.1172  -1.889   0.0621 .  
Manipulationc2  -0.2531     0.2003  -1.263   0.2097    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.583 on 91 degrees of freedom
Multiple R-squared:  0.04988,  Adjusted R-squared:  0.02899
F-statistic: 2.388 on 2 and 91 DF,  p-value: 0.0975
```
What strikes me is that R uses my pre-defined labels for the comparisons, i.e. "control vs. Experimental" and "INK vs. KON", in the summary() output, yet it uses something else ("Manipulationc1", "Manipulationc2") in the summary.lm() output. Why is this?
Furthermore, it seems odd that the p-value of the first comparison differs across the two outputs: 0.0778 with summary() and 0.0621 with summary.lm(). Where does this difference come from?
You should inspect class(model):
```
M <- aov(formula = Petal.Length ~ Species, data = iris)
summary(M)
summary.lm(M)
class(M)
```
The first class is "aov", so summary(M) is the same as summary.aov(M).
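To spell the dispatch out with a small self-contained check: summary() is an S3 generic, so the first matching class in the model's class vector decides which method runs.

```r
# aov() returns an object of class c("aov", "lm"), so the generic
# summary() picks summary.aov(); summary.lm() forces the lm method.
M <- aov(Petal.Length ~ Species, data = iris)
class(M)   # "aov" "lm"
```

summary.aov() prints the ANOVA table, including any split-term labels attached to the contrasts, while summary.lm() prints per-coefficient t-tests, which is why the labels and layout differ between the two outputs.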

Display category labels in regression output in R

Working through this R linear modelling tutorial, I'm finding that the format of the model output is annoyingly different from that shown in the text, and I can't for the life of me work out why. For example, here is the code:
```
pitch = c(233, 204, 242, 130, 112, 142)
sex = c(rep("female", 3), rep("male", 3))
my.df = data.frame(sex, pitch)
xmdl = lm(pitch ~ sex, my.df)
summary(xmdl)
```
Here is the output I get:
```
Call:
lm(formula = pitch ~ sex, data = my.df)

Residuals:
      1       2       3       4       5       6 
  6.667 -22.333  15.667   2.000 -16.000  14.000 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  177.167      7.201  24.601 1.62e-05 ***
sex1          49.167      7.201   6.827  0.00241 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.64 on 4 degrees of freedom
Multiple R-squared:  0.921,  Adjusted R-squared:  0.9012
F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
```
In the tutorial the line for Coefficients has "sexmale" instead of "sex1". What setting do I need to activate to achieve this?
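A likely cause (an assumption, since the session options are not shown): a numbered label like `sex1` is what sum-to-zero contrasts produce, whereas the tutorial output uses R's default treatment contrasts, which label each dummy by factor level. Resetting the default should restore the "sexmale" label:

```r
pitch <- c(233, 204, 242, 130, 112, 142)
sex   <- c(rep("female", 3), rep("male", 3))
my.df <- data.frame(sex, pitch)

# With sum contrasts the dummy is labelled by number ("sex1"):
options(contrasts = c("contr.sum", "contr.poly"))
names(coef(lm(pitch ~ sex, my.df)))   # "(Intercept)" "sex1"

# Restoring R's default treatment contrasts labels it by level:
options(contrasts = c("contr.treatment", "contr.poly"))
names(coef(lm(pitch ~ sex, my.df)))   # "(Intercept)" "sexmale"
```

Note that options(contrasts = ...) is a global setting, which is why the same lm() call can print differently in two sessions.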
