How to pull out the dispersion parameter in R

Call:
glm(formula = Y1 ~ 0 + x1 + x2 + x3 + x4 + x5, family = quasibinomial(link = cauchit))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5415 0.2132 0.3988 0.6614 1.8426
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 -0.7280 0.3509 -2.075 0.03884 *
x2 -0.9108 0.3491 -2.609 0.00951 **
x3 0.2377 0.1592 1.494 0.13629
x4 -0.2106 0.1573 -1.339 0.18151
x5 3.6982 0.8658 4.271 2.57e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 0.8782731)
Null deviance: 443.61 on 320 degrees of freedom
Residual deviance: 270.17 on 315 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 12
Here is the output from glm in R.
Is there a way to pull out the dispersion parameter (0.8782731 in this case) programmatically, instead of just copying and pasting? Thanks.

You can extract it from the output of summary:
data(iris)
mod <- glm((Petal.Length > 5) ~ Sepal.Width, data=iris)
summary(mod)
#
# Call:
# glm(formula = (Petal.Length > 5) ~ Sepal.Width, data = iris)
#
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -0.3176 -0.2856 -0.2714 0.7073 0.7464
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.38887 0.26220 1.483 0.140
# Sepal.Width -0.03561 0.08491 -0.419 0.676
#
# (Dispersion parameter for gaussian family taken to be 0.2040818)
#
# Null deviance: 30.240 on 149 degrees of freedom
# Residual deviance: 30.204 on 148 degrees of freedom
# AIC: 191.28
#
# Number of Fisher Scoring iterations: 2
summary(mod)$dispersion
# [1] 0.2040818
The str function in R is often helpful to solve these sorts of questions. For instance, I looked at str(summary(mod)) to answer the question.
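The same `$dispersion` field exists for a quasibinomial fit like the one in the question. A minimal sketch with made-up simulated data:

```r
set.seed(42)
# Hypothetical binary response and two predictors
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 * d$x1 - 0.3 * d$x2))

fit <- glm(y ~ x1 + x2, data = d, family = quasibinomial())

# The estimated dispersion, as printed in the summary header
phi <- summary(fit)$dispersion
phi
```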

Extracting selected output in R using summary
model <- glm(am ~ disp + hp, data=mtcars, family=binomial)
T1<-summary(model)
T1
This is the output I get:
Call:
glm(formula = am ~ disp + hp, family = binomial, data = mtcars)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9665 -0.3090 -0.0017 0.3934 1.3682
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.40342 1.36757 1.026 0.3048
disp -0.09518 0.04800 -1.983 0.0474 *
hp 0.12170 0.06777 1.796 0.0725 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 16.713 on 29 degrees of freedom
AIC: 22.713
Number of Fisher Scoring iterations: 8
I want to extract only the coefficients and the null deviance, as shown below. How do I do it? I tried $coefficients, but that only shows the coefficient values.
Coefficients:
(Intercept) disp hp
1.40342203 -0.09517972 0.12170173
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 16.713 on 29 degrees of freedom
AIC: 22.713
Update:
Try this:
coef(model)
model$coefficients
model$null.deviance
model$deviance
model$aic
If you type T1$ in RStudio, a completion popup appears and you can select whatever you need.
T1$null.deviance
T1$coefficients
> T1$null.deviance
[1] 43.22973
> T1$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.40342203 1.36756660 1.026218 0.30478864
disp -0.09517972 0.04800283 -1.982794 0.04739044
hp 0.12170173 0.06777320 1.795721 0.07253897
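To reproduce just the pieces you listed, the summary object also stores the deviance lines and AIC as plain numbers (T1 is the stored summary, as in your code):

```r
model <- glm(am ~ disp + hp, data = mtcars, family = binomial)
T1 <- summary(model)

# Named vector of estimates only
coef(model)

# Full coefficient table (Estimate, Std. Error, z value, Pr(>|z|))
T1$coefficients

# Deviance lines, rebuilt from the stored components
cat(sprintf("Null deviance: %.3f on %d degrees of freedom\n",
            T1$null.deviance, T1$df.null))
cat(sprintf("Residual deviance: %.3f on %d degrees of freedom\n",
            T1$deviance, T1$df.residual))
cat(sprintf("AIC: %.3f\n", T1$aic))
```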

Model outcome = mortality (count), exposure = climate (continuous), RStudio

I have run a Poisson model with quasi-Poisson errors in RStudio:
glm(formula = MI ~ corr_data$Temperature + corr_data$Humidity +
corr_data$Sun + corr_data$Rain, family = quasipoisson(),
data = corr_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.5323 -1.1149 -0.1346 0.8591 3.2585
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9494713 1.2068332 3.273 0.00144 **
corr_data$Temperature -0.0281248 0.0144238 -1.950 0.05381 .
corr_data$Humidity -0.0099800 0.0144047 -0.693 0.48992
corr_data$Sun -0.0767811 0.0414464 -1.853 0.06670 .
corr_data$Rain -0.0003076 0.0004211 -0.731 0.46662
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 1.873611)
Null deviance: 249.16 on 111 degrees of freedom
Residual deviance: 206.36 on 107 degrees of freedom
(24 observations deleted due to missingness)
I have read that the dispersion parameter should ideally be close to 1. I also have some zero values in my cumulative rainfall measures. How do I best go about finding an appropriate model?
I next tried a negative binomial model:
Call:
glm.nb(formula = Incidence ~ Humidity + Sun + Rain, data = corr_data,
init.theta = 22.16822882, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.53274 -0.85380 -0.08705 0.73230 2.48643
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.3626266 1.0970701 1.242 0.2142
Humidity 0.0111537 0.0124768 0.894 0.3713
Sun -0.0295395 0.0345469 -0.855 0.3925
Rain -0.0006170 0.0003007 -2.052 0.0402 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(22.1682) family taken to be 1)
Null deviance: 120.09 on 111 degrees of freedom
Residual deviance: 113.57 on 108 degrees of freedom
(24 observations deleted due to missingness)
AIC: 578.3
Number of Fisher Scoring iterations: 1
Theta: 22.2
Std. Err.: 11.8
2 x log-likelihood: -568.299
Any advice would be very much appreciated. I am new to R and to modelling!
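One way to see where the quasi-Poisson dispersion estimate comes from is to compute it by hand: it is the sum of squared Pearson residuals divided by the residual degrees of freedom. A sketch with simulated, deliberately overdispersed count data (your variable names are not used):

```r
set.seed(1)
d <- data.frame(temp = rnorm(112), hum = rnorm(112))
mu <- exp(1 + 0.2 * d$temp)
d$count <- rnbinom(112, mu = mu, size = 2)  # overdispersed counts

fit <- glm(count ~ temp + hum, data = d, family = quasipoisson())

# Dispersion as reported by summary()
summary(fit)$dispersion

# Same quantity computed manually
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
phi
```

A value well above 1, as here and in your output (1.87), is the usual signal to consider a negative binomial model instead.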

linear model having 4 predictors

I am trying to fit a linear model with 4 predictors. The problem is that my code doesn't estimate one of the parameters: whichever variable I put last in my lm formula gets no estimate. My code is:
AllData <- read.csv("AllBandReflectance.csv",header = T)
Swir2ref <- AllData$band7
x1 <- AllData$X1
x2 <- AllData$X2
y1 <- AllData$Y1
y2 <- AllData$Y2
linear.model <- lm( Swir2ref ~ x1 + y1 +x2 +y2 , data = AllData )
summary(linear.model)
Call:
lm(formula = Swir2ref ~ x1 + y1 + x2 + y2, data = AllData)
Residuals:
Min 1Q Median 3Q Max
-0.027277 -0.008793 -0.000689 0.010085 0.035097
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.595593 0.002006 296.964 <2e-16 ***
x1 0.002175 0.003462 0.628 0.532
y1 0.001498 0.003638 0.412 0.682
x2 0.022671 0.018786 1.207 0.232
y2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01437 on 67 degrees of freedom
Multiple R-squared: 0.02876, Adjusted R-squared: -0.01473
F-statistic: 0.6613 on 3 and 67 DF, p-value: 0.5787
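The NA row, together with the note "(1 not defined because of singularities)", means that within your data y2 is an exact linear combination of the other predictors, so lm() drops it. A minimal sketch with made-up data reproducing the symptom and detecting the dependency with alias():

```r
set.seed(7)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y2 <- d$x1 + d$x2          # exact linear combination of x1 and x2
d$resp <- rnorm(30)

m <- lm(resp ~ x1 + x2 + y2, data = d)
coef(m)    # y2 is NA: dropped due to singularity
alias(m)   # reports the dependency: y2 = x1 + x2
```

Running alias() on your own model should show how y2 is built from x1, y1, and x2; the fix is to drop the redundant column or collect new, non-collinear data.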

Using specified regressors in glm() in R

Question related to R, glm() function:
I have a dataset obtained as:
mydata <- read.csv("data.csv", header = TRUE)
which contains the variable 'y' (y is binary 0 or 1) and 60 regressors. Three of these regressors are 'avg','age' and 'income' (all three are numerical).
I want to use glm function for logistic regression, as below:
model <-glm(y~., data = mydata, family = binomial)
Can you tell me how I may proceed if I don't want to use the three specified variables (avg, age and income) in the glm() function, and use only the remaining 57 variables?
You can simply exclude those three variables from mydata before running glm().
Here I create some sample data:
set.seed(1)
mydata<-replicate(10,rnorm(100,300,50))
mydata<-data.frame(dv=sample(c(0,1),100,replace = TRUE),mydata)
> head(mydata)
dv X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 268.6773 268.9817 320.4701 344.6837 353.7220 303.8652 282.9467 264.6216 245.6546 222.9299
2 1 309.1822 302.1058 384.4437 247.6351 394.7827 285.1566 375.1212 398.5786 208.6958 309.7161
3 1 258.2186 254.4539 379.3294 398.5669 269.8501 240.8379 326.4154 295.5001 349.7641 313.2211
4 0 379.7640 307.9014 283.4546 280.8184 280.4566 300.5646 327.1096 299.2991 299.4069 244.0632
5 0 316.4754 267.2708 185.7382 382.7073 279.1889 349.5801 293.1663 243.8272 270.0186 332.5476
6 0 258.9766 388.3644 424.8831 375.6106 281.2171 379.6984 243.1633 232.7935 291.1026 248.3550
If I run your specified model on the data as-is, all the variables appear on the right-hand side:
model<-glm(data=mydata, dv~.,family=binomial(link = 'logit'))
> summary(model)
Call:
glm(formula = dv ~ ., family = binomial(link = "logit"), data = mydata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8891 -1.0853 -0.5163 1.0237 1.8303
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.4330825 4.1437180 -0.587 0.5571
X1 -0.0020482 0.0049025 -0.418 0.6761
X2 -0.0059021 0.0046298 -1.275 0.2024
X3 0.0123246 0.0047991 2.568 0.0102 *
X4 0.0024804 0.0046856 0.529 0.5966
X5 0.0025348 0.0039545 0.641 0.5215
X6 -0.0005905 0.0047417 -0.125 0.9009
X7 -0.0001758 0.0040737 -0.043 0.9656
X8 0.0042362 0.0041170 1.029 0.3035
X9 -0.0007664 0.0042471 -0.180 0.8568
X10 -0.0042089 0.0043094 -0.977 0.3287
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.59 on 99 degrees of freedom
Residual deviance: 125.11 on 89 degrees of freedom
AIC: 147.11
Number of Fisher Scoring iterations: 4
Now I exclude X1 and X2 from mydata and run the model again:
mydata2<-mydata[,-match(c('X1','X2'), colnames(mydata))]
model2<-glm(data=mydata2, dv~.,family=binomial(link = 'logit'))
> summary(model2)
Call:
glm(formula = dv ~ ., family = binomial(link = "logit"), data = mydata2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8983 -1.0724 -0.4521 1.1132 1.7792
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.8725545 3.6357314 -1.340 0.18019
X3 0.0124982 0.0047930 2.608 0.00912 **
X4 0.0031911 0.0045971 0.694 0.48758
X5 0.0015992 0.0038101 0.420 0.67467
X6 -0.0003295 0.0046554 -0.071 0.94357
X7 0.0003372 0.0039961 0.084 0.93275
X8 0.0038889 0.0040737 0.955 0.33977
X9 -0.0010014 0.0042078 -0.238 0.81189
X10 -0.0041691 0.0042232 -0.987 0.32356
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.59 on 99 degrees of freedom
Residual deviance: 126.93 on 91 degrees of freedom
AIC: 144.93
Number of Fisher Scoring iterations: 4
The . ("everything") on the right side of the formula can be modified by subtracting terms:
model <- glm(y~ . - avg - age - income, data = mydata,
family = binomial)
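A quick sketch confirming the subtraction form drops only the named columns (simulated data; avg, age, and income stand in for your variables):

```r
set.seed(3)
mydata <- data.frame(y = rbinom(50, 1, 0.5),
                     avg = rnorm(50), age = rnorm(50), income = rnorm(50),
                     x1 = rnorm(50), x2 = rnorm(50))

model <- glm(y ~ . - avg - age - income, data = mydata, family = binomial)
names(coef(model))  # only the intercept, x1 and x2 remain
```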

R- How to save the console data into a row/matrix or data frame for future use?

I'm performing multiple regression to find the best model to predict prices; the R console output is below.
I'd like to store the first column (the estimates) in a row/matrix or data frame for future use, such as deploying to the web with R Shiny. The fitted model is:
*(Price = 698.8 + 0.116*Voltage - 70.72*VendorCHICONY - 36.6*VendorDELTA - 66.8*VendorLITEON - 14.86*H)*
Can somebody kindly advise? Thanks in advance.
Call:
lm(formula = Price ~ Voltage + Vendor + H, data = PSU2)
Residuals:
Min 1Q Median 3Q Max
-10.9950 -0.6251 0.0000 3.0134 11.0360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 698.821309 276.240098 2.530 0.0280 *
Voltage 0.116958 0.005126 22.818 1.29e-10 ***
VendorCHICONY -70.721088 9.308563 -7.597 1.06e-05 ***
VendorDELTA -36.639685 5.866688 -6.245 6.30e-05 ***
VendorLITEON -66.796531 6.120925 -10.913 3.07e-07 ***
H -14.869478 6.897259 -2.156 0.0541 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.307 on 11 degrees of freedom
Multiple R-squared: 0.9861, Adjusted R-squared: 0.9799
F-statistic: 156.6 on 5 and 11 DF, p-value: 7.766e-10
Use coef on your lm output.
e.g.
m <- lm(Sepal.Length ~ Sepal.Width + Species, iris)
summary(m)
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
# Residuals:
# Min 1Q Median 3Q Max
# -1.30711 -0.25713 -0.05325 0.19542 1.41253
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 2.2514 0.3698 6.089 9.57e-09 ***
# Sepal.Width 0.8036 0.1063 7.557 4.19e-12 ***
# Speciesversicolor 1.4587 0.1121 13.012 < 2e-16 ***
# Speciesvirginica 1.9468 0.1000 19.465 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.438 on 146 degrees of freedom
# Multiple R-squared: 0.7259, Adjusted R-squared: 0.7203
# F-statistic: 128.9 on 3 and 146 DF, p-value: < 2.2e-16
coef(m)
# (Intercept) Sepal.Width Speciesversicolor Speciesvirginica
# 2.2513932 0.8035609 1.4587431 1.9468166
See also names(m) which shows you some things you can extract, e.g. m$residuals (or equivalently, resid(m)).
And also methods(class='lm') will show you some other functions that work on a lm.
> methods(class='lm')
[1] add1 alias anova case.names coerce confint cooks.distance deviance dfbeta dfbetas drop1 dummy.coef effects extractAIC family
[16] formula hatvalues influence initialize kappa labels logLik model.frame model.matrix nobs plot predict print proj qr
[31] residuals rstandard rstudent show simulate slotsFromS3 summary variable.names vcov
(oddly, 'coef' is not in there? ah well)
Besides, I'd like to know if there is a command to show the "residual percentage", i.e. (actual value - fitted value) / actual value. Currently the residuals() command only shows the values below, but I need the percentage instead.
residuals(fit3ab)
1 2 3 4 5 6
-5.625491e-01 -5.625491e-01 7.676578e-15 -8.293815e+00 -5.646900e+00 3.443652e+00
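There is no built-in "residual percentage" extractor, but since the actual values can be recovered as fitted(m) + residuals(m), the ratio is a one-liner. A sketch on a built-in dataset (fit3ab stands in for your model):

```r
m <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

actual <- fitted(m) + residuals(m)   # equals the observed response
pct_resid <- residuals(m) / actual   # (actual - fitted) / actual
head(pct_resid)
```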