How to use the emmeans function to back-transform a glm summary in R

I've run an interrupted time series analysis using a GLM and need to exponentiate the outcomes in order to validate them. The emmeans package has been recommended to me, but I'm not quite sure how to use it.
Base R summary is below:
summary(fit1a)
Call:
glm(formula = `Subject Total` ~ Quarter + int2 + time_since_intervention2,
family = "poisson", data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4769 -0.5111 0.1240 0.6103 0.9128
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.54584 0.09396 37.737 <0.0000000000000002 ***
Quarter -0.02348 0.01018 -2.306 0.0211 *
int2 -0.23652 0.21356 -1.108 0.2681
time_since_intervention2 -0.02624 0.04112 -0.638 0.5234
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 63.602 on 23 degrees of freedom
Residual deviance: 13.368 on 20 degrees of freedom
AIC: 140.54
Number of Fisher Scoring iterations: 4
I can't really understand how to get started with emmeans. How would I write the code to get the estimates for Quarter, int2 and time_since_intervention2 on the response scale?
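A minimal sketch on synthetic data shaped like the question's df (all three predictors numeric; the coefficients used to simulate are made up). Because the Poisson link is log, exponentiating coefficients gives rate ratios on the response scale; emmeans, if installed, back-transforms predictions with type = "response":

```r
set.seed(1)
df <- data.frame(
  Quarter = 1:24,
  int2 = rep(c(0, 1), each = 12),
  time_since_intervention2 = c(rep(0, 12), 1:12)
)
df$`Subject Total` <- rpois(24, exp(3.5 - 0.02 * df$Quarter - 0.2 * df$int2))

fit1a <- glm(`Subject Total` ~ Quarter + int2 + time_since_intervention2,
             family = "poisson", data = df)

# Rate ratios with Wald confidence intervals, all on the response scale:
exp(cbind(RateRatio = coef(fit1a), confint.default(fit1a)))

# Predicted counts at chosen predictor settings via emmeans, if installed:
if (requireNamespace("emmeans", quietly = TRUE)) {
  print(emmeans::emmeans(fit1a, ~ int2, at = list(int2 = c(0, 1)),
                         type = "response"))
}
```

A rate ratio of, say, 0.98 for Quarter would mean the expected count shrinks by about 2% per quarter.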

Related

Predict response vs manual for logistic regression probability

I am trying to manually calculate the probability for a given x with a logistic regression model.
My model looks like this: fit2 <- glm(ability ~ stability, data = df2)
I created a function that gives me the response:
estimator <- function(x){
predict(fit2, type = "response", newdata = data.frame(stability=x))
}
This function gives me 0.5304603 for the value x = 550.
Then I create the manual version. For this I use the formula p = e^(B0 + B1*x) / (1 + e^(B0 + B1*x)), so the code looks like this:
est <- function(par, x){
x = c(1,x)
exp(par%*%x)/(1+exp(par%*%x))
}
where par = fit2$coefficients and x = 550, but this code returns 0.6295905.
Why?
edit:
summary(fit2):
Call:
glm(formula = ability ~ stability, data = df2)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.78165 -0.33738 0.09462 0.31582 0.72823
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.531574 0.677545 -2.26 0.03275 *
stability 0.003749 0.001229 3.05 0.00535 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.1965073)
Null deviance: 6.7407 on 26 degrees of freedom
Residual deviance: 4.9127 on 25 degrees of freedom
AIC: 36.614
Number of Fisher Scoring iterations: 2
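The summary gives the answer away: the dispersion line says "gaussian family", because glm() without a family= argument defaults to gaussian, so fit2 is ordinary least squares, not logistic regression. A sketch on synthetic data (column names borrowed from the question, values invented) showing the difference:

```r
set.seed(42)
df2 <- data.frame(stability = runif(27, 400, 700))
df2$ability <- as.numeric(runif(27) < plogis(-1.5 + 0.0037 * df2$stability))

fit_gaussian <- glm(ability ~ stability, data = df2)   # what the question fit
fit_logistic <- glm(ability ~ stability, data = df2,
                    family = binomial)                 # an actual logistic model

nd <- data.frame(stability = 550)

# For the gaussian fit, type = "response" is just the linear predictor
# B0 + B1*x -- no inverse-logit is ever applied:
p1 <- predict(fit_gaussian, newdata = nd, type = "response")
b  <- coef(fit_gaussian)
stopifnot(all.equal(unname(p1), unname(b[1] + b[2] * 550)))

# The manual formula e^(B0+B1*x)/(1+e^(B0+B1*x)) only reproduces predict()
# when the model was fit with family = binomial:
p2 <- predict(fit_logistic, newdata = nd, type = "response")
stopifnot(all.equal(unname(p2), plogis(sum(coef(fit_logistic) * c(1, 550)))))
```

So the two numbers differ because the manual function applies the logistic inverse link to coefficients that were estimated on the identity link.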

illustration of overdispersion

I am trying to plot the best-fitting Poisson distribution over a histogram to show over-dispersion in the data. I came across a piece of code. The first part creates a histogram and fits a Poisson model. So far so good.
hist(patents$ncit, nclas=14,col="light blue",prob=T,
xlab="Number of citations",ylab="",main="",
cex.lab=1.5,cex.axis=1.3)
glm(formula = ncit ~ 1, family = poisson, data = patents)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7513 -1.7513 -0.4604 0.3596 6.4405
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.42761 0.01164 36.72 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 13359 on 4808 degrees of freedom
Residual deviance: 13359 on 4808 degrees of freedom
AIC: 20350
Number of Fisher Scoring iterations: 6
In the second part of the code, a best-fitting Poisson line is constructed. I do not understand where the exp(0.32723) comes from:
lines(0:14,dpois(0:14,exp(0.32723)),col="red",lwd=2)
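The number inside exp() is the intercept of an intercept-only Poisson GLM: exponentiating it back-transforms the log-link estimate into the rate lambda, which for this model equals the sample mean. (Note the summary shown has intercept 0.42761, so 0.32723 apparently comes from a different fit than the one printed.) A sketch on synthetic counts:

```r
# Synthetic stand-in for patents$ncit; the true rate 1.5 is invented.
set.seed(7)
ncit <- rpois(500, lambda = 1.5)

m <- glm(ncit ~ 1, family = poisson)

# exp(intercept) recovers the rate, which equals the sample mean:
lambda_hat <- exp(coef(m)[["(Intercept)"]])
stopifnot(all.equal(lambda_hat, mean(ncit), tolerance = 1e-6))

# Overlay the fitted Poisson pmf on the histogram, as in the question:
hist(ncit, prob = TRUE, col = "light blue",
     xlab = "Number of citations", ylab = "", main = "")
lines(0:10, dpois(0:10, lambda_hat), col = "red", lwd = 2)
```

Over-dispersion then shows up as the histogram having a fatter right tail (and more zeros) than the red Poisson curve.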

Clinical prediction rules (calibration, discrimination and validation) in R

This is my glm model, which I was able to create, but when I now try to run the Hosmer-Lemeshow goodness-of-fit test I get an error I don't understand:
Call:
glm(formula = BC.result ~ Diabetic + Low.diastolic + Pulse, family =
"binomial", data = Data2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0805 -0.4559 -0.3144 -0.2437 2.8259
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.86311 0.98715 -5.939 2.86e-09 ***
Diabetic 1.21963 0.37395 3.262 0.001108 **
Low.diastolic 1.27095 0.35074 3.624 0.000291 ***
Pulse 0.02361 0.00780 3.027 0.002470 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 276.40 on 485 degrees of freedom
Residual deviance: 249.71 on 482 degrees of freedom
(14 observations deleted due to missingness)
AIC: 257.71
> hl<- hoslem.test(model3$BC.result, fitted(model3, g=10))
Error in model.frame.default(formula = cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
:
variable lengths differ (found for 'cutyhat')
Can anyone explain this error with the Hosmer-Lemeshow goodness-of-fit test? Thanks.
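Two things are likely going wrong in that call: g = 10 is misplaced inside fitted() instead of being an argument to hoslem.test(), and the raw outcome vector is longer than the fitted values because 14 rows with NAs were dropped during fitting. A sketch with synthetic data (variable names from the question, values invented) showing the length mismatch and the fix:

```r
set.seed(3)
Data2 <- data.frame(
  BC.result = rbinom(500, 1, 0.1),
  Diabetic  = rbinom(500, 1, 0.3),
  Pulse     = rnorm(500, 75, 10)
)
Data2$Pulse[1:14] <- NA   # mimic "14 observations deleted due to missingness"

model3 <- glm(BC.result ~ Diabetic + Pulse, family = binomial, data = Data2)

# The lengths differ, which is exactly what the error complains about:
length(Data2$BC.result)   # 500 raw observations
length(fitted(model3))    # 486 after NA rows are dropped

# model3$y is the response vector actually used in the fit, so it matches:
stopifnot(length(model3$y) == length(fitted(model3)))

# With the ResourceSelection package installed, the corrected call would be:
# hoslem.test(model3$y, fitted(model3), g = 10)
```

Note also that model3$BC.result is NULL: the column lives in Data2, not in the glm object, which is another reason to use model3$y.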

getting effect sizes from a quasibinomial glm

I have some data to which I've fit a glm from the quasibinomial family. The model shows high significance for a number of different factors. However, I was wondering how I can get effect sizes for these different factors. Does anyone have any idea?
The model results:
Call:
glm(formula = freq ~ K + res_dist + K:trail_cost + trail_cost:res_dist:K,
family = "quasibinomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2072 -0.3505 -0.1406 0.2714 1.7746
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.222e+00 1.786e-01 6.842 7.24e-11 ***
K -1.741e-05 2.107e-06 -8.265 1.19e-14 ***
res_distrandom -1.419e+00 2.386e-01 -5.949 1.02e-08 ***
K:trail_cost 1.381e-01 3.930e-02 3.515 0.000531 ***
K:res_distrandom:trail_cost 1.564e-01 2.532e-02 6.176 3.03e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 0.2932451)
Null deviance: 126.028 on 230 degrees of freedom
Residual deviance: 66.616 on 226 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
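One common choice: a quasibinomial fit uses the logit link, so exponentiating coefficients gives odds ratios, and exponentiating confidence-interval limits gives intervals for those ratios. A sketch on synthetic data (a single invented predictor stands in for the question's factors; confint.default gives Wald intervals):

```r
set.seed(11)
d <- data.frame(x = rnorm(200))
d$freq <- rbinom(200, 1, plogis(0.5 + 0.8 * d$x))

fit <- glm(freq ~ x, family = quasibinomial, data = d)

# Odds ratios as effect sizes, with Wald CIs back-transformed the same way:
odds_ratios <- exp(coef(fit))
ci          <- exp(confint.default(fit))
cbind(OR = odds_ratios, ci)
```

An odds ratio of 2 for a term would mean a one-unit increase in that predictor doubles the odds of the outcome, holding the others fixed; note the quasibinomial dispersion estimate already widens the standard errors these intervals are built from.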

Extract only coefficients whose p values are significant from a logistic model

I have run a logistic regression, whose model object I name "score". Accordingly, summary(score) gives me the following:
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3616 -0.9806 -0.7876 1.2563 1.9246
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.188286233 1.94605597 -2.1521921 0.031382230 *
Overall -0.013407201 0.06158168 -0.2177141 0.827651866
RTN -0.052959314 0.05015013 -1.0560154 0.290961160
Recorded 0.162863294 0.07290053 2.2340482 0.025479900 *
PV -0.086743611 0.02950620 -2.9398438 0.003283778 **
Expire -0.035046322 0.04577103 -0.7656878 0.443862068
Trial 0.007220173 0.03294419 0.2191637 0.826522498
Fitness 0.056135418 0.03114687 1.8022810 0.071501212 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 757.25 on 572 degrees of freedom
Residual deviance: 725.66 on 565 degrees of freedom
AIC: 741.66
Number of Fisher Scoring iterations: 4
What I am hoping to achieve is to get the variable names and coefficients of those variables which have a *, **, or *** next to their Pr(>|z|) value. In other words, I want those variables and coefficients with Pr(>|z|) < .05.
Ideally, I'd like to get them in a data frame. Unfortunately, the following code I've tried does not work.
variable_try <-
summary(score)$coefficients[if(summary(score)$coefficients[, 4] <= .05,
summary(score)$coefficients[, 1]),]
Error: unexpected ',' in "variable_try <-
summary(score)$coefficients[if(summary(score)$coefficients[,4] < .05,"
What about this:
data.frame(summary(score)$coef[summary(score)$coef[,4] <= .05, 4])
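The suggested one-liner only keeps column 4 (the p-values); to get the names and the estimates, keep all columns when subsetting. A sketch on a synthetic fit (predictors x1, x2 are invented; the same subsetting applies to any glm named score):

```r
set.seed(5)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(-0.5 + 1.2 * d$x1))   # x2 has no true effect

score <- glm(y ~ x1 + x2, family = binomial, data = d)

# Row-subset the coefficient matrix on its p-value column; drop = FALSE
# keeps it a matrix (and its row names) even if only one row survives:
cf  <- summary(score)$coefficients
sig <- as.data.frame(cf[cf[, "Pr(>|z|)"] <= 0.05, , drop = FALSE])
sig   # rows: significant terms; columns: estimate, SE, z value, p-value
```

The row names of sig are the variable names, and the Estimate column holds the coefficients, so this delivers both in one data frame. (The original attempt failed because if(cond, a, b) is not valid R syntax; subsetting with a logical vector, as here, is the idiomatic replacement.)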
