Extract only coefficients whose p-values are significant from a logistic model - r

I have run a logistic regression and stored the fitted model as score. Accordingly, summary(score) gives me the following:
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.3616 -0.9806 -0.7876  1.2563  1.9246

Coefficients:
                Estimate  Std. Error    z value    Pr(>|z|)
(Intercept) -4.188286233  1.94605597 -2.1521921 0.031382230 *
Overall     -0.013407201  0.06158168 -0.2177141 0.827651866
RTN         -0.052959314  0.05015013 -1.0560154 0.290961160
Recorded     0.162863294  0.07290053  2.2340482 0.025479900 *
PV          -0.086743611  0.02950620 -2.9398438 0.003283778 **
Expire      -0.035046322  0.04577103 -0.7656878 0.443862068
Trial        0.007220173  0.03294419  0.2191637 0.826522498
Fitness      0.056135418  0.03114687  1.8022810 0.071501212 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 757.25 on 572 degrees of freedom
Residual deviance: 725.66 on 565 degrees of freedom
AIC: 741.66

Number of Fisher Scoring iterations: 4
What I am hoping to achieve is to get the variable names and coefficients of those variables which have a *, **, or *** next to their Pr(>|z|) value. In other words, I want the aforementioned variables and coefficients with a Pr(>|z|) < .05.
Ideally, I'd like to get them in a data frame. Unfortunately, the following code I've tried does not work.
variable_try <-
summary(score)$coefficients[if(summary(score)$coefficients[, 4] <= .05,
summary(score)$coefficients[, 1]),]
Error: unexpected ',' in "variable_try <-
summary(score)$coefficients[if(summary(score)$coefficients[,4] < .05,"

What about this:
data.frame(summary(score)$coef[summary(score)$coef[,4] <= .05, 4])
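That returns only the significant p-values, though. A small extension of the same idea (still assuming the fitted model is named score) keeps the variable names and estimates alongside the p-values in one data frame:

cf <- summary(score)$coefficients                 # full coefficient matrix
sig <- cf[cf[, 4] <= .05, c(1, 4), drop = FALSE]  # rows with Pr(>|z|) <= .05
data.frame(variable = rownames(sig), sig, row.names = NULL)

With the output above this returns (Intercept), Recorded, and PV together with their estimates; drop = FALSE just guards against the matrix collapsing to a vector when only one row qualifies.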

Related

How to use emmeans function to back transform glm summary

I've run an Interrupted Time Series Analysis using a GLM and need to be able to exponentiate the estimates in order to validate the results. I have been recommended the emmeans package, but I'm not quite sure how to use it.
Base R summary is below:
summary(fit1a)
Call:
glm(formula = `Subject Total` ~ Quarter + int2 + time_since_intervention2,
family = "poisson", data = df)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.4769 -0.5111  0.1240  0.6103  0.9128

Coefficients:
                         Estimate Std. Error z value            Pr(>|z|)
(Intercept)               3.54584    0.09396  37.737 <0.0000000000000002 ***
Quarter                  -0.02348    0.01018  -2.306              0.0211 *
int2                     -0.23652    0.21356  -1.108              0.2681
time_since_intervention2 -0.02624    0.04112  -0.638              0.5234
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 63.602 on 23 degrees of freedom
Residual deviance: 13.368 on 20 degrees of freedom
AIC: 140.54

Number of Fisher Scoring iterations: 4
Can't really understand how to get started with emmeans. How would I code it in order to get the estimates for Quarter, int2 and time_since_intervention2 on the response scale?
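One sketch of the usual starting points, assuming fit1a is the Poisson model above and that int2 is a 0/1 intervention indicator (both assumptions, since the data aren't shown):

# Rate ratios straight from the model: exponentiate the log-scale estimates
exp(coef(fit1a))     # multiplicative effect on the expected count per unit
exp(confint(fit1a))  # matching confidence intervals

# With emmeans: estimated means back-transformed to the response (count) scale
library(emmeans)
emmeans(fit1a, ~ int2, at = list(int2 = c(0, 1)), type = "response")

type = "response" is what undoes the log link; without it emmeans reports everything on the link (log) scale.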

Predict response vs manual for logistic regression probability

I am trying to manually calculate the probability for a given x with a logistic regression model.
My model looks like this: fit2 <- glm(ability ~ stability, data = df2)
I created a function that gives me the response:
estimator <- function(x){
  predict(fit2, type = "response", newdata = data.frame(stability = x))
}
This function gives me 0.5304603 for the value x = 550.
Then I create the manual version. For this I use the formula p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), so the code looks like this:
est <- function(par, x){
  x = c(1, x)
  exp(par %*% x) / (1 + exp(par %*% x))
}
where par = fit2$coefficients and x = 550.
But this code returns 0.6295905.
Why?
edit:
summary(fit2):
Call:
glm(formula = ability ~ stability, data = df2)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
-0.78165 -0.33738  0.09462  0.31582  0.72823

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.531574   0.677545   -2.26  0.03275 *
stability    0.003749   0.001229    3.05  0.00535 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1965073)

    Null deviance: 6.7407 on 26 degrees of freedom
Residual deviance: 4.9127 on 25 degrees of freedom
AIC: 36.614

Number of Fisher Scoring iterations: 2
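The "gaussian family" line in that output is the key: no family was passed to glm, so fit2 is an ordinary linear regression rather than a logistic one, and predict(type = "response") simply returns the linear predictor. A short sketch, using the coefficients printed above, showing where both numbers come from:

# Gaussian model: the "response" IS the linear predictor
-1.531574 + 0.003749 * 550           # ~0.5305, what predict() returns
# The manual est() wraps that same value in the logistic transform
plogis(-1.531574 + 0.003749 * 550)   # ~0.6296, what est() returns

So the two calculations disagree because only one of them applies the logistic transform. For an actual logistic regression, refit with fit2 <- glm(ability ~ stability, family = binomial, data = df2); then the two will agree.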

illustration of overdispersion

I am trying to plot the best-fitting Poisson distribution over a histogram to show overdispersion in the data. I came across a piece of code. The first part draws a histogram and fits a Poisson model. So far so good.
hist(patents$ncit, nclass = 14, col = "light blue", prob = TRUE,
     xlab = "Number of citations", ylab = "", main = "",
     cex.lab = 1.5, cex.axis = 1.3)
Call:
glm(formula = ncit ~ 1, family = poisson, data = patents)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.7513 -1.7513 -0.4604  0.3596  6.4405

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.42761    0.01164   36.72   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 13359 on 4808 degrees of freedom
Residual deviance: 13359 on 4808 degrees of freedom
AIC: 20350

Number of Fisher Scoring iterations: 6
In the second part of the code the best-fitting Poisson line is drawn. I do not understand where the exp(0.32723) comes from.
lines(0:14,dpois(0:14,exp(0.32723)),col="red",lwd=2)
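A note on where that number comes from: in an intercept-only Poisson GLM with the default log link, the fitted mean is exp(intercept), which is exactly the sample mean of the response. The 0.32723 does not match the intercept printed above (0.42761), so it presumably came from a run on different data; either way, the value being exponentiated is the model's intercept. A sketch that ties the line to the model rather than to a hard-coded number:

# Fit the intercept-only Poisson model and recover its mean
m <- glm(ncit ~ 1, family = poisson, data = patents)
lambda_hat <- exp(coef(m)[1])                      # here exp(0.42761), about 1.53
all.equal(unname(lambda_hat), mean(patents$ncit))  # TRUE: it is just the sample mean
lines(0:14, dpois(0:14, lambda_hat), col = "red", lwd = 2)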

getting effect sizes from a quasibinomial glm

I have some data to which I've fit a GLM from the quasibinomial family. The model shows high significance for a number of different factors. However, I was wondering how I can get effect sizes for these factors. Does anyone have any idea?
The model results:
Call:
glm(formula = freq ~ K + res_dist + K:trail_cost + trail_cost:res_dist:K,
family = "quasibinomial", data = data)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.2072 -0.3505 -0.1406  0.2714  1.7746

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  1.222e+00  1.786e-01   6.842 7.24e-11 ***
K                           -1.741e-05  2.107e-06  -8.265 1.19e-14 ***
res_distrandom              -1.419e+00  2.386e-01  -5.949 1.02e-08 ***
K:trail_cost                 1.381e-01  3.930e-02   3.515 0.000531 ***
K:res_distrandom:trail_cost  1.564e-01  2.532e-02   6.176 3.03e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 0.2932451)

    Null deviance: 126.028 on 230 degrees of freedom
Residual deviance:  66.616 on 226 degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
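For a quasibinomial model with the default logit link, the most common effect-size measure is the odds ratio, obtained by exponentiating the coefficients. A minimal sketch, assuming the fitted object above is named model:

# Effect sizes as odds ratios (multiplicative change in the odds per unit)
exp(coef(model))
exp(confint(model))  # matching confidence intervals

For the interaction terms (K:trail_cost and the three-way term) the exponentiated value is a ratio of odds ratios rather than a simple odds ratio, so interpret those with care.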

linear regression r comparing multiple observations vs single observation

Based upon answers to my question, I am supposed to get the same intercept and regression coefficient for the two models below, but they are not the same. What is going on?
Is something wrong with my code, or is the original answer wrong?
# linear regression: average qty per price point vs all quantities

# Full data: 30 observations at each of four price points
x1 = rnorm(30, 20, 1);   y1 = rep(3, 30)
x2 = rnorm(30, 17, 1.5); y2 = rep(4, 30)
x3 = rnorm(30, 12, 2);   y3 = rep(4.5, 30)
x4 = rnorm(30, 6, 3);    y4 = rep(5.5, 30)
x = c(x1, x2, x3, x4)
y = c(y1, y2, y3, y4)
plot(y, x)
cor(y, x)
fit = lm(x ~ y)    # regression on all 120 observations
attributes(fit)
summary(fit)

# Collapsed data: one (x, y) point per price level
xdum = c(20, 17, 12, 6)
ydum = c(3, 4, 4.5, 5.5)
plot(ydum, xdum)
cor(ydum, xdum)
fit1 = lm(xdum ~ ydum)    # regression on the four summary points
attributes(fit1)
summary(fit1)
> summary(fit)

Call:
lm(formula = x ~ y)

Residuals:
    Min      1Q  Median      3Q     Max
-8.3572 -1.6069 -0.1007  2.0222  6.4904

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  40.0952     1.1570   34.65   <2e-16 ***
y            -6.1932     0.2663  -23.25   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.63 on 118 degrees of freedom
Multiple R-squared: 0.8209, Adjusted R-squared: 0.8194
F-statistic: 540.8 on 1 and 118 DF, p-value: < 2.2e-16
> summary(fit1)

Call:
lm(formula = xdum ~ ydum)

Residuals:
      1       2       3       4
-0.9615  1.8077 -0.3077 -0.5385

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  38.2692     3.6456  10.497  0.00895 **
ydum         -5.7692     0.8391  -6.875  0.02051 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.513 on 2 degrees of freedom
Multiple R-squared: 0.9594, Adjusted R-squared: 0.9391
F-statistic: 47.27 on 1 and 2 DF, p-value: 0.02051
You are not calculating xdum and ydum in a comparable fashion, because rnorm will only approximate the mean value you specify, particularly when you are sampling only 30 cases. This is easily fixed, however:
coef(fit)
#(Intercept) y
# 39.618472 -6.128739
xdum <- c(mean(x1),mean(x2),mean(x3),mean(x4))
ydum <- c(mean(y1),mean(y2),mean(y3),mean(y4))
coef(lm(xdum~ydum))
#(Intercept) ydum
# 39.618472 -6.128739
In theory the two fits coincide if (and only if) the group means in the full model equal the exact points used in the collapsed model.
That is not the case in your models, so the results are slightly different. For example, the mean of x1:
x1 = rnorm(30, 20, 1)
> mean(x1)
[1] 20.08353
where the point version is 20.
There are similar tiny differences in your other rnorm samples:
> mean(x2)
[1] 17.0451
> mean(x3)
[1] 11.72307
> mean(x4)
[1] 5.913274
Not that this really matters, but FYI: the standard nomenclature is that Y is the dependent variable and X is the independent variable, which you have reversed here. It makes no difference to the fit, of course.
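As an aside, here is a reproducible version of the whole check (set.seed is added so the draws, and hence the group means, are fixed):

set.seed(42)                                       # fix the random draws
x1 = rnorm(30, 20, 1);   y1 = rep(3, 30)
x2 = rnorm(30, 17, 1.5); y2 = rep(4, 30)
x3 = rnorm(30, 12, 2);   y3 = rep(4.5, 30)
x4 = rnorm(30, 6, 3);    y4 = rep(5.5, 30)
x = c(x1, x2, x3, x4)
y = c(y1, y2, y3, y4)
fit = lm(x ~ y)                                    # all 120 points
xdum = c(mean(x1), mean(x2), mean(x3), mean(x4))   # observed group means
ydum = c(3, 4, 4.5, 5.5)
fit1 = lm(xdum ~ ydum)                             # four mean points
coef(fit); coef(fit1)                              # identical coefficients

The coefficients match exactly here because every group has the same number of observations (30), so the unweighted fit through the group means reproduces the full fit.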
