Equation from interaction in a logistic regression (GLM) in R

I have the following glm regression:
fitglm <- glm(Resp ~ Doses * Seasons, data = DataJenipa,
              family = binomial(link = "probit"))
which gives this summary:
Call:
glm(formula = Resp ~ Doses * Seasons, family = binomial(link = "probit"),
data = DataJenipa)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6511 -0.4289 -0.3035 -0.3035 2.6079
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.63423 0.26604 -2.384 0.0171 *
Doses -0.23989 0.09339 -2.569 0.0102 *
Seasons2 -1.06117 0.44979 -2.359 0.0183 *
Doses:Seasons2 0.23989 0.14380 1.668 0.0953 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 208.05 on 399 degrees of freedom
Residual deviance: 195.71 on 396 degrees of freedom
AIC: 203.71
To visualize my model, I'm using interact_plot (from jtools package)
interact_plot(fitglm, pred = Doses, modx = Seasons, plot.points = TRUE,
              point.shape = TRUE, interval = FALSE,
              modx.labels = c("Summer", "Winter"), line.thickness = 1.5)
and I get the following plot:
How do I get the equations of the two lines in the plot? Something like Summer: Y = -0.63423 - 0.23989x, and so on.
I know my example is wrong, but how do I recover these two equations from the graphic?

I already found a way!
I simply need to run two separate glm regressions, one for each season (dropping the Doses*Seasons interaction). Each fit then gives me one line's coefficients for my equation. (Since the original model contains the full interaction, the separate fits reproduce the same fitted lines.)
So:
fitglmSummer <- glm(Resp ~ Doses, data = DataSummer, family = binomial(link = "probit"))
fitglmWinter <- glm(Resp ~ Doses, data = DataWinter, family = binomial(link = "probit"))
Thanks!
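Alternatively, the two equations can be read directly off the interaction model's coefficients, since Summer is the reference level of Seasons. A minimal sketch using the posted estimates:

```r
# Probit-scale equations read off the posted interaction-model coefficients.
# Summer is the reference level of Seasons:
#   Summer: eta = -0.63423 - 0.23989 * Doses
#   Winter: eta = (-0.63423 - 1.06117) + (-0.23989 + 0.23989) * Doses
summer_eta <- function(doses) -0.63423 - 0.23989 * doses
winter_eta <- function(doses) (-0.63423 - 1.06117) + (-0.23989 + 0.23989) * doses

# Probabilities come from the probit link, i.e. the standard normal CDF:
pnorm(summer_eta(2))
pnorm(winter_eta(2))
```

Note that the Winter slope is the main Doses effect plus the interaction term, which here cancels to zero.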

Related

Predict response vs manual for logistic regression probability

I am trying to manually calculate the probability for a given x with a logistic regression model.
My model looks like this:
fit2 <- glm(ability ~ stability, data = df2)
I created a function that gives me the response:
estimator <- function(x) {
  predict(fit2, type = "response", newdata = data.frame(stability = x))
}
This function gives me 0.5304603 for the value x = 550.
Then I create the manual version. For this I use the formula p = e^(B0 + B1*x) / (1 + e^(B0 + B1*x)), so the code looks like this:
est <- function(par, x) {
  x <- c(1, x)
  exp(par %*% x) / (1 + exp(par %*% x))
}
where par = fit2$coefficients and x = 550.
But this code returns 0.6295905. Why?
Edit:
summary(fit2):
Call:
glm(formula = ability ~ stability, data = df2)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.78165 -0.33738 0.09462 0.31582 0.72823
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.531574 0.677545 -2.26 0.03275 *
stability 0.003749 0.001229 3.05 0.00535 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.1965073)
Null deviance: 6.7407 on 26 degrees of freedom
Residual deviance: 4.9127 on 25 degrees of freedom
AIC: 36.614
Number of Fisher Scoring iterations: 2
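The posted summary already contains the explanation: it reports a gaussian family, meaning glm() was called without family = binomial, so it fit an ordinary linear model and predict(type = "response") returns B0 + B1*x rather than the logistic transform. A check with the posted coefficients:

```r
# The summary shows "Dispersion parameter for gaussian family": glm() defaulted
# to family = gaussian, i.e. an ordinary linear model with an identity link.
b0 <- -1.531574
b1 <-  0.003749
b0 + b1 * 550           # the linear prediction, ~0.5304 (what predict() returned)
plogis(b0 + b1 * 550)   # the logistic transform of it, ~0.6296 (the manual value)
```

Refitting with family = binomial(link = "logit") would make the two calculations agree.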

How does the predict function handle continuous values with a 0 in R for a Poisson log-link model?

I am using a Poisson GLM on some dummy data to predict ClaimCounts based on two variables, frequency and Judicial Orientation.
Dummy Data Frame:
data5 <- data.frame(
  Year = c("2006","2006","2006","2007","2007","2007","2008","2009","2010","2010","2009","2009"),
  JudicialOrientation = c("Defense","Plaintiff","Plaintiff","Neutral","Defense","Plaintiff",
                          "Defense","Plaintiff","Neutral","Neutral","Plaintiff","Defense"),
  Frequency = c(0.0, 0.06, 0.07, 0.04, 0.03, 0.02, 0, 0.1, 0.09, 0.08, 0.11, 0),
  ClaimCount = c(0, 5, 10, 3, 4, 0, 7, 8, 15, 16, 17, 12),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50),
  Exposure = c(10, 20, 30, 1, 2, 4, 3, 2, 1, 54, 12, 13)
)
Model GLM:
ClaimModel <- glm(ClaimCount ~ JudicialOrientation + Frequency,
                  family = poisson(link = "log"), offset = log(Exposure),
                  data = data5, na.action = na.pass)
Call:
glm(formula = ClaimCount ~ JudicialOrientation + Frequency, family = poisson(link = "log"),
data = data5, na.action = na.pass, offset = log(Exposure))
Deviance Residuals:
Min 1Q Median 3Q Max
-3.7555 -0.7277 -0.1196 2.6895 7.4768
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3493 0.2125 -1.644 0.1
JudicialOrientationNeutral -3.3343 0.5664 -5.887 3.94e-09 ***
JudicialOrientationPlaintiff -3.4512 0.6337 -5.446 5.15e-08 ***
Frequency 39.8765 6.7255 5.929 3.04e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 149.72 on 11 degrees of freedom
Residual deviance: 111.59 on 8 degrees of freedom
AIC: 159.43
Number of Fisher Scoring iterations: 6
I am using log(Exposure) as an offset as well.
I then want to use this GLM to predict claim counts for the same observations:
data5$ExpClaimCount <- predict(ClaimModel, newdata=data5, type="response")
If I understand correctly, the Poisson GLM prediction should then be:
ClaimCount = exp(-.3493 + -3.3343*JudicialOrientationNeutral +
-3.4512*JudicialOrientationPlaintiff + 39.8765*Frequency + log(Exposure))
However, I tried this manually for some of the observations (in Excel, =EXP(-0.3493+0+0+LOG(10)) for observation 1, for example) and did not get the correct answer.
Is my understanding of the GLM equation incorrect?
You are right with the assumption about how predict() for a Poisson GLM works. This can be verified in R:
co <- coef(ClaimModel)
p1 <- with(data5,
exp(log(Exposure) + # offset
co[1] + # intercept
ifelse(as.numeric(JudicialOrientation)>1, # factor term
co[as.numeric(JudicialOrientation)], 0) +
Frequency * co[4])) # linear term
all.equal(p1, predict(ClaimModel, type="response"), check.names=FALSE)
[1] TRUE
As indicated in the comments, you probably get the wrong results in Excel because of the different base of the logarithm: Excel's LOG defaults to base 10, while R's log is the natural logarithm (Excel's LN).
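As a spot check on the formula, observation 1 (JudicialOrientation = Defense, the reference level; Frequency = 0; Exposure = 10) can be worked by hand with the rounded coefficients from the summary:

```r
# Observation 1: JudicialOrientation = "Defense" (reference level), Frequency = 0,
# Exposure = 10, so only the intercept and the offset contribute.
eta <- -0.3493 + log(10)   # R's log() is the natural log (Excel: LN, not LOG)
exp(eta)                   # ~7.05 expected claims
```

Using LOG(10) in Excel would instead add 1 (log base 10 of 10) to the linear predictor, which is why the manual numbers did not match.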

How to calculate R logistic regression standard error values manually? [duplicate]

This question already has answers here: Extract standard errors from glm (3 answers). Closed 6 years ago.
I want to know how the standard error values from a logistic regression in R are calculated manually.
I took some sample data and applied a binomial logistic regression to it:
data = data.frame(x = c(1,2,3,4,5,6,7,8,9,10),y = c(0,0,0,1,1,1,0,1,1,1))
model = glm(y ~ x, data = data, family = binomial(link = "logit"))
My summary of the model is as follows, and I have no idea how the standard errors were calculated:
> summary(model)
Call:
glm(formula = y ~ x, family = binomial(link = "logit"), data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9367 -0.5656 0.2641 0.6875 1.2974
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.9265 2.0601 -1.421 0.1554
x 0.6622 0.4001 1.655 0.0979 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13.4602 on 9 degrees of freedom
Residual deviance: 8.6202 on 8 degrees of freedom
AIC: 12.62
Number of Fisher Scoring iterations: 5
It would be great if someone could answer this... thanks in advance!
You can calculate these as the square roots of the diagonal elements of the unscaled covariance matrix output by summary(model):
sqrt(diag(summary(model)$cov.unscaled)*summary(model)$dispersion)
# (Intercept) x
# 2.0600893 0.4000937
For your model, the dispersion parameter is 1 so the last term (summary(model)$dispersion) could be ignored if you want.
To get this unscaled covariance matrix, you invert X'WX, where X is the model matrix and W is the diagonal matrix of IRLS weights:
X = model.matrix(model)
fittedVals = model$fitted.values
W = diag(fittedVals*(1 - fittedVals))
solve(t(X)%*%W%*%X)
# (Intercept) x
# (Intercept) 4.2439753 -0.7506158
# x -0.7506158 0.1600754
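Since the question includes the data, the whole chain can be checked end to end. A sketch that refits the model and compares the manual standard errors against summary()'s:

```r
# Reproduce the standard errors from scratch with the question's data.
data <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10),
                   y = c(0,0,0,1,1,1,0,1,1,1))
model <- glm(y ~ x, data = data, family = binomial(link = "logit"))

X <- model.matrix(model)            # design matrix: intercept column and x
p <- model$fitted.values
W <- diag(p * (1 - p))              # IRLS weights at convergence
se <- sqrt(diag(solve(t(X) %*% W %*% X)))
cbind(manual = se,
      summary = summary(model)$coefficients[, "Std. Error"])
```

Both columns should agree to within numerical tolerance, since summary() computes its covariance matrix from the same weighted design matrix.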

Syntax for stepwise logistic regression in R

I am trying to conduct a stepwise logistic regression in R with a dichotomous DV. I have researched the step() function, which uses AIC to select a model and essentially requires a null and a full model. Here's the syntax I've been trying (I have a lot of IVs, but N is 100,000+):
Full = glm(WouldRecommend_favorability ~ i1 + i2 + i3 + i4 + i5 + i6.....i83 + b14 + Shift_recoded,
           data = ee2015, family = "binomial")
Nothing = glm(WouldRecommend_favorability ~ 1, data = ee2015, family = "binomial")
Full_Nothing_Step = step(Nothing, scope = Full,Nothing, scale = 0,
                         direction = c('both'), trace = 1, keep = NULL,
                         steps = 1000, k = 2)
One thing I am not sure about is the order in which "Nothing" and "Full" should be entered in the step() call. Whichever way I try, printing a summary of "Full_Nothing_Step" only gives me a summary of either "Nothing" or "Full":
Call:
glm(formula = WouldRecommend_favorability ~ 1, family = "binomial",
data = ee2015)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.8263 0.1929 0.1929 0.1929 0.1929
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.97538 0.01978 201 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 25950 on 141265 degrees of freedom
Residual deviance: 25950 on 141265 degrees of freedom
AIC: 25952
Number of Fisher Scoring iterations: 6
I am pretty familiar with logistic regression in general but am new to R.
As the documentation states, you can give scope as a single formula or as a list with upper and lower bounds to search over.
In the example below, the initial model is lm1, and the stepwise procedure runs in both directions. The upper bound of the search is the model with all two-way interactions, while the lower bound is the model with all main effects. You can easily adapt this to a glm model and add the additional arguments you want.
Be sure to read through the help page, though.
lm1 <- lm(Fertility ~ ., data = swiss)
slm1 <- step(lm1,
             scope = list(upper = as.formula(Fertility ~ .^2),
                          lower = as.formula(Fertility ~ .)),
             direction = "both")
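For the question's models, scope = list(lower = formula(Nothing), upper = formula(Full)) should work the same way. Here is a runnable toy version with built-in data (mtcars stands in for ee2015; the variables are illustrative, not from the question):

```r
# Toy stepwise logistic regression: start from the null model and search
# between it and a small full model, selecting by AIC in both directions.
null_fit <- glm(am ~ 1, data = mtcars, family = binomial)
step_fit <- step(null_fit,
                 scope = list(lower = ~ 1, upper = ~ mpg + wt),
                 direction = "both", trace = 0)
formula(step_fit)   # the AIC-selected model
```

Starting from the null model and giving both bounds in scope is what makes step() actually search, rather than just returning the model it started from.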

Extracting model equation from glm function in R

I've fitted a logistic regression to combine two independent variables in R (working with the pROC package), and I obtain this:
summary(fit)
#Call: glm(formula = Case ~ X + Y, family = "binomial", data = data)
#Deviance Residuals:
# Min 1Q Median 3Q Max
#-1.5751 -0.8277 -0.6095 1.0701 2.3080
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -0.153731 0.538511 -0.285 0.775281
#X -0.048843 0.012856 -3.799 0.000145 ***
#Y 0.028364 0.009077 3.125 0.001780 **
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#(Dispersion parameter for binomial family taken to be 1)
#Null deviance: 287.44 on 241 degrees of freedom
#Residual deviance: 260.34 on 239 degrees of freedom
#AIC: 266.34
#Number of Fisher Scoring iterations: 4
fit
#Call: glm(formula = Case ~ X + Y, family = "binomial", data = data)
#Coefficients:
# (Intercept) X Y
# -0.15373 -0.04884 0.02836
#Degrees of Freedom: 241 Total (i.e. Null); 239 Residual
#Null Deviance: 287.4
#Residual Deviance: 260.3 AIC: 266.3
Now I need to extract some information from this output, but I'm not sure how to do it.
First, I need the model equation: if fit defines a combined predictor called CP, would it be CP = -0.15 - 0.05X + 0.03Y?
Then the resulting combined predictor should have a median value, so that I can compare the medians of the two groups, Cases and Controls, that I used to fit the regression (in other words, my X and Y variables are N-dimensional with N = N1 + N2, where N1 is the number of Controls, for which Case = 0, and N2 is the number of Cases, for which Case = 1).
I hope to have explained everything.
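On the first point: with the posted coefficients, the combined predictor on the logit scale is CP = -0.1537 - 0.0488X + 0.0284Y, which predict(fit, type = "link") computes directly. A sketch with simulated stand-in data (X, Y, and Case here are fabricated for illustration, not the question's data):

```r
# Simulated stand-in for the question's 242 observations.
set.seed(1)
n <- 242
X <- rnorm(n, 50, 10)
Y <- rnorm(n, 60, 15)
Case <- rbinom(n, 1, plogis(-0.15 - 0.05 * X + 0.03 * Y))
fit <- glm(Case ~ X + Y, family = "binomial")

CP <- predict(fit, type = "link")   # the combined predictor b0 + b1*X + b2*Y
tapply(CP, Case, median)            # median CP in Controls (0) vs Cases (1)
```

The same two lines at the end, applied to the real fit and data, give the group medians of the combined predictor.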
