Extracting selected output in R using summary

model <- glm(am ~ disp + hp, data = mtcars, family = binomial)
T1 <- summary(model)
T1
This is the output I get:
Call:
glm(formula = am ~ disp + hp, family = binomial, data = mtcars)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9665  -0.3090  -0.0017   0.3934   1.3682

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.40342    1.36757   1.026   0.3048
disp        -0.09518    0.04800  -1.983   0.0474 *
hp           0.12170    0.06777   1.796   0.0725 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 16.713 on 29 degrees of freedom
AIC: 22.713
Number of Fisher Scoring iterations: 8
I want to extract only the coefficients and the null deviance, as shown below. How do I do it? I tried using $coefficients, but it only shows the coefficient values.
Coefficients:
(Intercept)        disp          hp
 1.40342203 -0.09517972  0.12170173
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 16.713 on 29 degrees of freedom
AIC: 22.713

Update:
Try this:
coef(model)          # named vector of coefficient estimates
model$coefficients   # same values, pulled straight from the model object
model$null.deviance  # null deviance
model$deviance       # residual deviance
model$aic            # AIC
If you type T1$ in RStudio, a completion window pops up and you can select whichever component you need.
T1$null.deviance
T1$coefficients
> T1$null.deviance
[1] 43.22973
> T1$coefficients
               Estimate Std. Error   z value   Pr(>|z|)
(Intercept)  1.40342203 1.36756660  1.026218 0.30478864
disp        -0.09517972 0.04800283 -1.982794 0.04739044
hp           0.12170173 0.06777320  1.795721 0.07253897
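For reference, here is a minimal sketch that reassembles the requested display from the fitted model and its summary object (the components used here are standard parts of a summary.glm object):

# Rebuild the desired display from the model and summary(model)
print(coef(model))  # named coefficient vector
cat("Null deviance:", T1$null.deviance, "on", T1$df.null, "degrees of freedom\n")
cat("Residual deviance:", T1$deviance, "on", T1$df.residual, "degrees of freedom\n")
cat("AIC:", T1$aic, "\n")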

Related

How does R handle NA values vs deleted values with regressions

Say I have a table, I remove all the inapplicable values, and I run a regression. If I ran the exact same regression on the same table, but this time, instead of removing the inapplicable values, I turned them into NA values, would the regression still give me the same coefficients?
The regression would omit any NA values prior to doing the analysis (i.e., delete any row that contains an NA in any of the predictor variables or the outcome variable). You can check this by comparing the degrees of freedom and other statistics for both models.
Here's a toy example:
head(mtcars)
# check the data set size (all non-missings)
dim(mtcars) # has 32 rows
# Introduce some missings
set.seed(5)
mtcars[sample(1:nrow(mtcars), 5), sample(1:ncol(mtcars), 5)] <- NA
head(mtcars)
# Create an alternative where all missings are omitted
mtcars_NA_omit <- na.omit(mtcars)
# Check the data set size again
dim(mtcars_NA_omit) # Now only has 27 rows
# Now compare some simple linear regressions
summary(lm(mpg ~ cyl + hp + am + gear, data = mtcars))
summary(lm(mpg ~ cyl + hp + am + gear, data = mtcars_NA_omit))
Comparing the two summaries, you can see that they are identical, with one exception: the first model's summary includes a note that 5 cases have been dropped due to missingness, which is exactly what we did manually in our mtcars_NA_omit example.
# First, original model
Call:
lm(formula = mpg ~ cyl + hp + am + gear, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-5.0835 -1.7594 -0.2023  1.4313  5.6948

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.64284    7.02359   4.220 0.000352 ***
cyl         -1.04494    0.83565  -1.250 0.224275
hp          -0.03913    0.01918  -2.040 0.053525 .
am           4.02895    1.90342   2.117 0.045832 *
gear         0.31413    1.48881   0.211 0.834833
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.947 on 22 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7635
F-statistic: 21.98 on 4 and 22 DF, p-value: 2.023e-07
# Second model where we dropped missings manually
Call:
lm(formula = mpg ~ cyl + hp + am + gear, data = mtcars_NA_omit)
Residuals:
    Min      1Q  Median      3Q     Max
-5.0835 -1.7594 -0.2023  1.4313  5.6948

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.64284    7.02359   4.220 0.000352 ***
cyl         -1.04494    0.83565  -1.250 0.224275
hp          -0.03913    0.01918  -2.040 0.053525 .
am           4.02895    1.90342   2.117 0.045832 *
gear         0.31413    1.48881   0.211 0.834833
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.947 on 22 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7635
F-statistic: 21.98 on 4 and 22 DF, p-value: 2.023e-07
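As a quick programmatic check of the claim above, you can also compare the two fits directly (a small sketch continuing the same example):

# Both fits should agree coefficient-for-coefficient and use 27 rows
fit_na   <- lm(mpg ~ cyl + hp + am + gear, data = mtcars)
fit_omit <- lm(mpg ~ cyl + hp + am + gear, data = mtcars_NA_omit)
all.equal(coef(fit_na), coef(fit_omit))  # TRUE
nobs(fit_na) == nobs(fit_omit)           # TRUE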

Model outcome = mortality (count), exposure = climate (continuous), RStudio

I have run a Poisson model with quasi-Poisson errors in RStudio:
glm(formula = MI ~ corr_data$Temperature + corr_data$Humidity +
    corr_data$Sun + corr_data$Rain, family = quasipoisson(),
    data = corr_data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.5323  -1.1149  -0.1346   0.8591   3.2585

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)            3.9494713  1.2068332   3.273  0.00144 **
corr_data$Temperature -0.0281248  0.0144238  -1.950  0.05381 .
corr_data$Humidity    -0.0099800  0.0144047  -0.693  0.48992
corr_data$Sun         -0.0767811  0.0414464  -1.853  0.06670 .
corr_data$Rain        -0.0003076  0.0004211  -0.731  0.46662
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 1.873611)
Null deviance: 249.16 on 111 degrees of freedom
Residual deviance: 206.36 on 107 degrees of freedom
(24 observations deleted due to missingness)
I have read that the dispersion parameter should ideally be close to 1. I have some zero values in my cumulative rainfall measures. How do I best go about finding an appropriate model?
I next tried a negative binomial model:
Call:
glm.nb(formula = Incidence ~ Humidity + Sun + Rain, data = corr_data,
    init.theta = 22.16822882, link = log)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.53274  -0.85380  -0.08705   0.73230   2.48643

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.3626266  1.0970701   1.242   0.2142
Humidity     0.0111537  0.0124768   0.894   0.3713
Sun         -0.0295395  0.0345469  -0.855   0.3925
Rain        -0.0006170  0.0003007  -2.052   0.0402 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(22.1682) family taken to be 1)
Null deviance: 120.09 on 111 degrees of freedom
Residual deviance: 113.57 on 108 degrees of freedom
(24 observations deleted due to missingness)
AIC: 578.3
Number of Fisher Scoring iterations: 1
              Theta:  22.2
          Std. Err.:  11.8
 2 x log-likelihood:  -568.299
Any advice would be very much appreciated. I am new to R and to modelling!
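As a starting point, the quasi-Poisson dispersion estimate quoted above can be reproduced by hand: it is the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal sketch, assuming the model and variable names from the question:

# Reproduce the reported dispersion estimate (about 1.87 here)
fit <- glm(MI ~ Temperature + Humidity + Sun + Rain,
           family = quasipoisson(), data = corr_data)
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)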

Getting around with predictors stacked with the intercept

My factor "Hours" is a categorical predictor with values 1 and 2. When I applied as.factor, the category for value 1 seems to have been folded into the intercept. Is there a way to stop that from happening?
Call:
glm(formula = Appointment.Status ~ as.factor(Hours), family = binomial,
    data = data_appt)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.5593  -0.5593  -0.5593  -0.4781   2.1098

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -2.11132    0.04523 -46.681  < 2e-16 ***
as.factor(Hours)2  0.33508    0.05435   6.166 7.02e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 10871 on 13970 degrees of freedom
Residual deviance: 10832 on 13969 degrees of freedom
AIC: 10836
Number of Fisher Scoring iterations: 4
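One way to keep the first level out of the intercept is to suppress the intercept in the formula, so each level of Hours gets its own coefficient. A minimal sketch (this fits the same model, just reparameterised):

# 0 + (or - 1) removes the intercept; each factor level gets its own estimate
glm(Appointment.Status ~ 0 + as.factor(Hours),
    family = binomial, data = data_appt)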

How to pull out Dispersion parameter in R

Call:
glm(formula = Y1 ~ 0 + x1 + x2 + x3 + x4 + x5, family = quasibinomial(link = cauchit))
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5415   0.2132   0.3988   0.6614   1.8426

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  -0.7280     0.3509  -2.075  0.03884 *
x2  -0.9108     0.3491  -2.609  0.00951 **
x3   0.2377     0.1592   1.494  0.13629
x4  -0.2106     0.1573  -1.339  0.18151
x5   3.6982     0.8658   4.271 2.57e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 0.8782731)
Null deviance: 443.61 on 320 degrees of freedom
Residual deviance: 270.17 on 315 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 12
Here is the output from glm in R. Is there a way to pull out the dispersion parameter (0.8782731 in this case) programmatically, instead of just copying and pasting? Thanks.
You can extract it from the output of summary:
data(iris)
mod <- glm((Petal.Length > 5) ~ Sepal.Width, data=iris)
summary(mod)
#
# Call:
# glm(formula = (Petal.Length > 5) ~ Sepal.Width, data = iris)
#
# Deviance Residuals:
#     Min       1Q   Median       3Q      Max
# -0.3176  -0.2856  -0.2714   0.7073   0.7464
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.38887    0.26220   1.483    0.140
# Sepal.Width -0.03561    0.08491  -0.419    0.676
#
# (Dispersion parameter for gaussian family taken to be 0.2040818)
#
# Null deviance: 30.240 on 149 degrees of freedom
# Residual deviance: 30.204 on 148 degrees of freedom
# AIC: 191.28
#
# Number of Fisher Scoring iterations: 2
summary(mod)$dispersion
# [1] 0.2040818
The str function in R is often helpful for questions like this; for instance, I looked at str(summary(mod)) to find the answer here.
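For example, either of these reveals the available components:

str(summary(mod))    # full structure of the summary object
names(summary(mod))  # component names, including "dispersion"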

How is 95% CI calculated using confint in R?

I use the example provided on the confint help page:
> fit <- lm(100/mpg ~ disp + hp + wt + am, data=mtcars)
> summary(fit)
Call:
lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-1.6923 -0.3901  0.0579  0.3649  1.2608

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.740648   0.738594   1.003  0.32487
disp        0.002703   0.002715   0.996  0.32832
hp          0.005275   0.003253   1.621  0.11657
wt          1.001303   0.302761   3.307  0.00267 **
am          0.155815   0.375515   0.415  0.68147
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6754 on 27 degrees of freedom
Multiple R-squared: 0.8527, Adjusted R-squared: 0.8309
F-statistic: 39.08 on 4 and 27 DF, p-value: 7.369e-11
> confint(fit)
                   2.5 %      97.5 %
(Intercept) -0.774822875 2.256118188
disp        -0.002867999 0.008273849
hp          -0.001400580 0.011949674
wt           0.380088737 1.622517536
am          -0.614677730 0.926307310
> confint(fit, "wt")
       2.5 %   97.5 %
wt 0.3800887 1.622518
> confint.default(fit, "wt")
       2.5 %   97.5 %
wt 0.4079023 1.594704
> 1.001303 + 1.96*0.302761
[1] 1.594715
> 1.001303 - 1.96*0.302761
[1] 0.4078914
So the 95% CI obtained from confint.default is based on asymptotic normality. What about for confint?
Thanks
You can check out the code for each method.
# View code for 'default'
confint.default
# View code of lm objects
getAnywhere(confint.lm)
The difference appears to be that the default method uses normal quantiles, while the method for linear models uses t quantiles instead.
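As a sketch, the t-quantile calculation for the wt coefficient reproduces the confint(fit, "wt") interval shown above:

# t-based CI: estimate +/- qt(0.975, residual df) * standard error
est <- unname(coef(fit)["wt"])
se  <- summary(fit)$coefficients["wt", "Std. Error"]
est + c(-1, 1) * qt(0.975, df.residual(fit)) * se
# [1] 0.3800887 1.6225175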
