predict(type = "response") vs manual calculation of a logistic regression probability in R

I am trying to manually calculate the probability for a given x with a logistic regression model.
My model looks like this:
fit2 <- glm(ability ~ stability, data = df2)
I created a function that gives me the response:
estimator <- function(x){
  predict(fit2, type = "response", newdata = data.frame(stability = x))
}
This function gives me 0.5304603 for the value x = 550.
Then I create the manual version. For this I use the formula p = e^(B0 + B1*x) / (1 + e^(B0 + B1*x)), so the code looks like this:
est <- function(par, x){
  x <- c(1, x)
  exp(par %*% x) / (1 + exp(par %*% x))
}
where par = fit2$coefficients and x = 550.
But this code returns 0.6295905.
Why?
Edit:
summary(fit2):
Call:
glm(formula = ability ~ stability, data = df2)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.78165 -0.33738 0.09462 0.31582 0.72823
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.531574 0.677545 -2.26 0.03275 *
stability 0.003749 0.001229 3.05 0.00535 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.1965073)
Null deviance: 6.7407 on 26 degrees of freedom
Residual deviance: 4.9127 on 25 degrees of freedom
AIC: 36.614
Number of Fisher Scoring iterations: 2
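As a quick sketch of how the two calculations can be compared (the names fit2_logit, p_manual and p_predict are illustrative, and it assumes ability is a 0/1 outcome; note that the summary above reports a gaussian dispersion parameter, i.e. no family was given in the original glm() call):
# Refit with an explicit binomial family, then compare the manual logistic
# formula against predict(type = "response") for the same x.
fit2_logit <- glm(ability ~ stability, data = df2, family = binomial)
b <- coef(fit2_logit)
x <- 550
p_manual  <- exp(b[1] + b[2] * x) / (1 + exp(b[1] + b[2] * x))
p_predict <- predict(fit2_logit, type = "response",
                     newdata = data.frame(stability = x))
all.equal(unname(p_manual), unname(p_predict))  # expected to be TRUE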

Related

How to use the emmeans function to back-transform a glm summary

I've run an Interrupted Time Series Analysis using a GLM and need to be able to exponentiate the outcomes in order to validate them. The emmeans package has been recommended to me, but I'm not quite sure how to use it.
Base R summary is below:
summary(fit1a)
Call:
glm(formula = `Subject Total` ~ Quarter + int2 + time_since_intervention2,
family = "poisson", data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4769 -0.5111 0.1240 0.6103 0.9128
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.54584 0.09396 37.737 <0.0000000000000002 ***
Quarter -0.02348 0.01018 -2.306 0.0211 *
int2 -0.23652 0.21356 -1.108 0.2681
time_since_intervention2 -0.02624 0.04112 -0.638 0.5234
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 63.602 on 23 degrees of freedom
Residual deviance: 13.368 on 20 degrees of freedom
AIC: 140.54
Number of Fisher Scoring iterations: 4
I can't really understand how to get started with emmeans. How would I code this in order to get the estimates for Quarter, int2 and time_since_intervention2 on the response scale?
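One hedged sketch, assuming fit1a and df are as defined above (exponentiating the coefficients is a base-R route; the emmeans call back-transforms estimated marginal means rather than individual coefficients, and if int2 is numeric it is evaluated at its mean):
# For a Poisson GLM with a log link, exponentiating the coefficients and their
# confidence intervals gives rate ratios on the response scale.
exp(cbind(RateRatio = coef(fit1a), confint(fit1a)))
# With emmeans, type = "response" back-transforms the estimated marginal means.
library(emmeans)
emmeans(fit1a, specs = ~ int2, type = "response")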

Equation from interaction in a logistic regression (GLM)

I have the following glm regression:
fitglm <- glm(Resp ~ Doses * Seasons, data = DataJenipa, family = binomial(link = "probit"))
which gives this summary:
Call:
glm(formula = Resp ~ Doses * Seasons, family = binomial(link = "probit"),
data = DataJenipa)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6511 -0.4289 -0.3035 -0.3035 2.6079
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.63423 0.26604 -2.384 0.0171 *
Doses -0.23989 0.09339 -2.569 0.0102 *
Seasons2 -1.06117 0.44979 -2.359 0.0183 *
Doses:Seasons2 0.23989 0.14380 1.668 0.0953 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 208.05 on 399 degrees of freedom
Residual deviance: 195.71 on 396 degrees of freedom
AIC: 203.71
To visualize my model, I'm using interact_plot (from the jtools package):
interact_plot(fitglm, pred = Doses, modx = Seasons, plot.points = T, point.shape = T, interval = F, modx.labels = c("Summer", "Winter"), line.thickness = 1.5)
and I get a plot with one fitted line per season.
How do I get the two equations for those two lines? (Something like: Summer(Y) = -0.63423 - 0.23989x, and so on.)
I know my example is wrong, but how do I get these two equations from the graph?
I already found a way!
I simply need to run two separate glm regressions, each one using only one season's data (without the Doses*Seasons interaction). Doing that, I get each line and its coefficients to build my equations.
So:
fitglmSummer <- glm(Resp ~ Doses, data=DataSummer,family=binomial(link = "probit"))
fitglmWinter <- glm(Resp ~ Doses, data=DataWinter,family=binomial(link = "probit"))
Thanks!
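For comparison, a hedged sketch of reading the two equations directly off the interaction fit printed above, without refitting (cf, summer_eta and winter_eta are illustrative names; the values are on the probit link scale):
# Coefficients of the single interaction model
cf <- coef(fitglm)
# Summer is the reference season: eta = (Intercept) + Doses * x
summer_eta <- function(x) cf["(Intercept)"] + cf["Doses"] * x
# Winter (Seasons2): intercept and slope are both shifted by the Seasons2 terms
winter_eta <- function(x) (cf["(Intercept)"] + cf["Seasons2"]) +
  (cf["Doses"] + cf["Doses:Seasons2"]) * x
# pnorm() maps the link-scale values back to probabilities, e.g.
pnorm(summer_eta(2))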

Clinical prediction rules (calibration, discrimination and validation) in R

This is my glm model, which I was able to create, but when I now try to run the Hosmer-Lemeshow GOF test I get an error I don't understand:
Call:
glm(formula = BC.result ~ Diabetic + Low.diastolic + Pulse, family =
"binomial", data = Data2)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0805 -0.4559 -0.3144 -0.2437 2.8259
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.86311 0.98715 -5.939 2.86e-09 ***
Diabetic 1.21963 0.37395 3.262 0.001108 **
Low.diastolic 1.27095 0.35074 3.624 0.000291 ***
Pulse 0.02361 0.00780 3.027 0.002470 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 276.40 on 485 degrees of freedom
Residual deviance: 249.71 on 482 degrees of freedom
(14 observations deleted due to missingness)
AIC: 257.71
> hl<- hoslem.test(model3$BC.result, fitted(model3, g=10))
Error in model.frame.default(formula = cbind(y0 = 1 - y, y1 = y) ~ cutyhat) :
  variable lengths differ (found for 'cutyhat')
Can anyone explain this error with the Hosmer-Lemeshow goodness-of-fit test?
Thanks
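For what it's worth, a hedged sketch of the usual call, assuming hoslem.test() comes from the ResourceSelection package and the fitted model object is model3 (note that in the call above g = 10 is passed to fitted() rather than to hoslem.test(), and 14 rows were dropped for missingness, so the observed and fitted vectors can end up with different lengths):
# Use the response vector actually stored in the fit (model3$y) so its length
# matches fitted(model3), and pass g = 10 to hoslem.test() itself.
library(ResourceSelection)
hl <- hoslem.test(model3$y, fitted(model3), g = 10)
hl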

Find value of covariates given a probability in logistic regression

I have the model
am.glm = glm(formula=am ~ hp + I(mpg^2), data=mtcars, family=binomial)
which gives
> summary(am.glm)
Call:
glm(formula = am ~ hp + I(mpg^2), family = binomial, data = mtcars)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5871 -0.5376 -0.1128 0.1101 1.6937
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.71428 8.45330 -2.214 0.0268 *
hp 0.04689 0.02367 1.981 0.0476 *
I(mpg^2) 0.02811 0.01273 2.207 0.0273 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 20.385 on 29 degrees of freedom
AIC: 26.385
Number of Fisher Scoring iterations: 7
Given a value of hp, I would like to find the values of mpg that would lead to a 50% probability of am.
I haven't managed to find anything that can output such predictions, but I have managed to code something using
# Coefficients (the model above has three: intercept, hp, and I(mpg^2))
glm.intercept <- as.numeric(coef(am.glm)[1])
glm.hp.beta <- as.numeric(coef(am.glm)[2])
glm.mpg.sq.beta <- as.numeric(coef(am.glm)[3])
# Constants
prob <- 0.5   # target probability
c <- log(prob / (1 - prob))
hp <- 120
# Solve intercept + hp.beta*hp + mpg.sq.beta*mpg^2 = logit(prob) for mpg
polyroot(c(glm.intercept + glm.hp.beta * hp - c, 0, glm.mpg.sq.beta))
Is there a more elegant solution? Perhaps a predict function equivalent?
Interesting problem!
How about the solution below? Basically, create newdata in which your variable of interest samples the range of observed values, predict for that vector of values, and find the lowest value that meets your criterion.
# Your desired threshold
prob <- 0.5
# Create a sampling grid: fixed hp, mpg spanning the observed range
df_new <- data.frame(
  hp = rep(120, times = 100),
  mpg = seq(from = range(mtcars$mpg)[1],
            to = range(mtcars$mpg)[2],
            length.out = 100))
# Predict on the grid (type = "response" so predictions are probabilities, not link values)
df_new$am <- predict(am.glm, newdata = df_new, type = "response")
# Find the lowest mpg whose predicted probability exceeds the threshold
df_new[min(which(df_new$am > prob)), 'mpg']
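A further hedged alternative (the names f, hp_fixed and target are illustrative): since predict() can return probabilities directly, uniroot() can solve for the mpg that hits the threshold exactly, rather than scanning a grid.
# Solve predict(type = "response") == target for mpg at a fixed hp
hp_fixed <- 120
target <- 0.5
f <- function(mpg) {
  predict(am.glm, newdata = data.frame(hp = hp_fixed, mpg = mpg),
          type = "response") - target
}
uniroot(f, interval = range(mtcars$mpg))$root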

Extracting model equation from glm function in R

I've made a logistic regression to combine two independent variables in R, using the pROC package, and I obtain this:
summary(fit)
#Call: glm(formula = Case ~ X + Y, family = "binomial", data = data)
#Deviance Residuals:
# Min 1Q Median 3Q Max
#-1.5751 -0.8277 -0.6095 1.0701 2.3080
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -0.153731 0.538511 -0.285 0.775281
#X -0.048843 0.012856 -3.799 0.000145 ***
#Y 0.028364 0.009077 3.125 0.001780 **
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#(Dispersion parameter for binomial family taken to be 1)
#Null deviance: 287.44 on 241 degrees of freedom
#Residual deviance: 260.34 on 239 degrees of freedom
#AIC: 266.34
#Number of Fisher Scoring iterations: 4
fit
#Call: glm(formula = Case ~ X + Y, family = "binomial", data = data)
#Coefficients:
# (Intercept) X Y
# -0.15373 -0.04884 0.02836
#Degrees of Freedom: 241 Total (i.e. Null); 239 Residual
#Null Deviance: 287.4
#Residual Deviance: 260.3 AIC: 266.3
Now I need to extract some information from this output, but I'm not sure how to do it.
First, I need the model equation: suppose the fit defines a combined predictor called CP; would it be CP = -0.15 - 0.05X + 0.03Y?
Then the resulting combined predictor should have a median value in each group, so that I can compare the medians of the two groups, Cases and Controls, which I used to fit the regression (in other words, my X and Y variables have length N = N1 + N2, where N1 = number of Controls, for which Case = 0, and N2 = number of Cases, for which Case = 1).
I hope I have explained everything.
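A hedged sketch of both steps, assuming fit and data are as above (the CP column name is illustrative, and it assumes fit was built from the same rows as data, with no missing values dropped):
# The combined predictor is the linear predictor of the fit, on the logit scale:
# CP = -0.153731 - 0.048843 * X + 0.028364 * Y
data$CP <- predict(fit, type = "link")   # equivalently fit$linear.predictors
# Median of CP within each group (Case = 0 controls, Case = 1 cases)
tapply(data$CP, data$Case, median)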
