Warning: glm.fit: algorithm did not converge [duplicate]

This question already has answers here:
Why am I getting "algorithm did not converge" and "fitted prob numerically 0 or 1" warnings with glm?
(3 answers)
library(MASS)
library(randomForest)
Bdata <- read.csv("banking_data.csv")
head(Bdata)
Bdata$y <- ifelse(Bdata$y == "y", 1, 0)
intercept_model <- glm(y ~ 1, family = binomial("logit"), data = Bdata)
summary(intercept_model)
# back-transform the intercept from log-odds to a probability
exp(intercept_model$coefficients[1]) / (1 + exp(intercept_model$coefficients[1]))
I tried to run this code, but it produces the warning:
glm.fit: algorithm did not converge
The summary output is:
Call:
glm(formula = y ~ 1, family = binomial("logit"), data = Bdata)

Deviance Residuals:
       Min         1Q     Median         3Q        Max
-2.409e-06 -2.409e-06 -2.409e-06 -2.409e-06 -2.409e-06

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -26.57    1754.75  -0.015    0.988

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 0.0000e+00  on 41187  degrees of freedom
Residual deviance: 2.3896e-07  on 41187  degrees of freedom
AIC: 2

Number of Fisher Scoring iterations: 25

Without more information on your data it will be hard to help you. For general advice, see this existing post.
One possible cause is the absence of one of the two classes in your y data. For instance:
y <- rep(1, 1000)
df <- data.frame(y = y)
reg <- glm(y ~ 1, data = df, family = binomial("logit"))
reg
will produce the same warning. Did you check the balance of y with table(df$y)?
Another option is to increase the maximum number of iterations:
reg <- glm(y ~ 1, data = df, family = binomial("logit"), maxit = 100)
reg
However, raising the iteration limit will not help if your y variable contains only one class (for example, only "no").
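One likely root cause here (an assumption; the question never shows the raw labels): if the CSV stores y as "yes"/"no", then ifelse(Bdata$y == "y", 1, 0) maps every row to 0, which would match the null deviance of exactly 0 in the output above. A minimal sketch of the check and fix, applied to the freshly loaded data:
Bdata <- read.csv("banking_data.csv")
# Inspect the raw labels before recoding (assumed here to be "yes"/"no")
table(Bdata$y)
# Recode against a label that actually occurs, then verify both classes survive
Bdata$y <- ifelse(Bdata$y == "yes", 1, 0)
table(Bdata$y)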

Related

glm.fit: fitted probabilities numerically 0 or 1 occurred when building multiple logistic regression

I am having an issue when building a multiple logistic regression. I am adding variables to the model sequentially, keeping or discarding each according to AIC. However, certain variables trigger the warning glm.fit: fitted probabilities numerically 0 or 1 occurred; for example, adding caregiver2 produces the error shown below.
I have checked that all variables, including the response variable, are factors.
# AIC (158.64)
summary(glm(bin_secondary_outcome ~ 1,
            data = data, family = binomial))
# AIC (155.92) IMPROVEMENT
summary(glm(bin_secondary_outcome ~ gender,
            data = data, family = binomial))
# AIC (151.07) IMPROVEMENT
summary(glm(bin_secondary_outcome ~ gender + age_months_cat,
            data = data, family = binomial))
# AIC (85.417) IMPROVEMENT
summary(glm(bin_secondary_outcome ~ gender + age_months_cat +
              freshfood.wash_hands1,
            data = data, family = binomial))
# AIC (NA)
summary(glm(bin_secondary_outcome ~ gender + age_months_cat +
              freshfood.wash_hands1 + caregiver2,
            data = data, family = binomial))
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'object' in selecting a method for function 'summary':
  (converted from warning) glm.fit: fitted probabilities numerically 0 or 1 occurred
I am confused because when I run a glm with only caregiver2 as a predictor, I can't see what the issue is:
> summary(glm(bin_secondary_outcome ~ caregiver2, data = data, family = binomial))

Call:
glm(formula = bin_secondary_outcome ~ caregiver2, family = binomial,
    data = data)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.3018 -1.1501 -0.3018  1.2049  1.8930

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.06454    0.20750  -0.311    0.756
caregiver21 16.63061  979.61005   0.017    0.986
caregiver22  0.35222    0.79145   0.445    0.656
caregiver23 -1.54490    1.11492  -1.386    0.166

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 155.26  on 111  degrees of freedom
Residual deviance: 143.80  on 108  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 151.8

Number of Fisher Scoring iterations: 15
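(An aside, not from the original thread: the huge estimate and standard error for caregiver21 are the classic signature of separation; in combination with the other predictors, some level of caregiver2 likely contains only one outcome value. A quick check, assuming the data frame data from the question:)
# Cross-tabulate the suspect factor against the outcome;
# a zero cell means that level predicts the outcome perfectly
table(data$caregiver2, data$bin_secondary_outcome)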

R survival package - survreg reports singularities when using big intervals

I came across an issue when trying to do an interval regression using the survival package. The goal was merely to estimate the constant from interval-censored data.
> library(survival)
> y = with(data, Surv(lb, ub, type = "interval2"))
> model_y = survreg(y ~ 1, dist = "gaussian")
> model_y
Call:
survreg(formula = y ~ 1, dist = "gaussian")

Coefficients: (1 not defined because of singularities)
(Intercept)
         NA

Scale= 256178.4

Loglik(model)= -6186.5   Loglik(intercept only)= -6186.5
n=2556 (132 observations deleted due to missingness)
For the intercept, NA is reported, along with the remark "(1 not defined because of singularities)".
However, once I scale down the bounds of the intervals by a constant factor, proper results are obtained (this works for integer factors greater than 3):
> library(survival)
> y = with(data, Surv(lb/4, ub/4, type = "interval2"))
> model_y = survreg(y ~ 1, dist = "gaussian")
> model_y
Call:
survreg(formula = y ~ 1, dist = "gaussian")

Coefficients:
(Intercept)
    567.184

Scale= 64000.89

Loglik(model)= -6186   Loglik(intercept only)= -6186
n=2556 (132 observations deleted due to missingness)
Does anyone have any ideas on why that is?
Kind regards
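(A hedged note, not part of the original post: the symptom is consistent with numerical overflow in survreg's optimizer at that scale, and since a Gaussian model is linear in location and scale, estimates from rescaled bounds can be mapped back exactly. A minimal sketch, assuming data has the lb and ub columns used above:)
library(survival)
k <- 4  # rescaling factor that made the fit succeed
y_scaled <- with(data, Surv(lb / k, ub / k, type = "interval2"))
fit <- survreg(y_scaled ~ 1, dist = "gaussian")
coef(fit) * k   # intercept mapped back to the original scale
fit$scale * k   # scale parameter mapped back to the original scale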

Weighted logistic regression in R

Given sample data of proportions of successes, plus sample sizes and independent variable(s), I am attempting logistic regression in R.
The following code does what I want and seems to give sensible results, but it does not look like a sensible approach; in effect it doubles the size of the data set:
datf <- data.frame(prop = c(0.125, 0, 0.667, 1, 0.9),
                   cases = c(8, 1, 3, 3, 10),
                   x = c(11, 12, 15, 16, 18))
datf2 <- rbind(datf, datf)
datf2$success <- rep(c(1, 0), each = nrow(datf))
datf2$cases <- round(datf2$cases * ifelse(datf2$success, datf2$prop, 1 - datf2$prop))
fit2 <- glm(success ~ x, weights = cases, data = datf2, family = "binomial")
datf$proppredicted <- 1 / (1 + exp(-predict(fit2, datf)))
plot(datf$x, datf$proppredicted, type = "l", col = "red", ylim = c(0, 1))
points(datf$x, datf$prop, cex = sqrt(datf$cases))
producing a chart (fitted curve in red, observed proportions as points sized by case count; image not reproduced here) which looks reasonably sensible.
But I am not happy about the use of datf2 as a way of separating the successes and failures by duplicating the data. Is something like this necessary?
As a lesser question, is there a cleaner way of calculating the predicted proportions?
No need to construct artificial data like that; glm can fit your model from the dataset as given.
> glm(prop ~ x, family = binomial, data = datf, weights = cases)

Call:  glm(formula = prop ~ x, family = binomial, data = datf, weights = cases)

Coefficients:
(Intercept)            x
    -9.3533       0.6714

Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
Null Deviance:      17.3
Residual Deviance: 2.043    AIC: 11.43
You will get a warning about "non-integer #successes", but that is because glm is being silly. Compare to the model on your constructed dataset:
> fit2

Call:  glm(formula = success ~ x, family = "binomial", data = datf2,
    weights = cases)

Coefficients:
(Intercept)            x
    -9.3532       0.6713

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:      33.65
Residual Deviance: 18.39    AIC: 22.39
The regression coefficients (and therefore the predicted values) are basically equal. However, your residual deviance and AIC are suspect because you've created artificial data points.
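As for the lesser question: the manual inverse-logit is also unnecessary, because predict() with type = "response" applies the inverse link for you.
# Cleaner predicted proportions than 1 / (1 + exp(-predict(...)))
datf$proppredicted <- predict(fit2, newdata = datf, type = "response")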

glm in well separated groups does not find the coefficient and p-values

Disclaimer: I am very new to binomial glm.
This seems very basic to me, yet glm returns something that is either incorrect or that I don't know how to interpret. First I was using my primary data and getting errors; then I tried to replicate the error and I see the same thing: I define two columns, indep and dep, and the glm results do not make sense, to me at least...
Any help will be really appreciated. I have a second question on handling NAs in my glm, but first I wish to take care of this :(
set.seed(100)
x <- rnorm(24, 50, 2)
y <- rnorm(24, 25, 2)
j <- c(rep(0, 24), rep(1, 24))
d <- data.frame(dep = as.factor(j), indep = c(x, y))
mod <- glm(dep ~ indep, data = d, family = binomial)
summary(mod)
Which brings back:
Call:
glm(formula = dep ~ indep, family = binomial, data = d)

Deviance Residuals:
       Min         1Q     Median         3Q        Max
-9.001e-06 -7.612e-07  0.000e+00  2.110e-08  1.160e-05

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   92.110 168306.585   0.001        1
indep         -2.409   4267.658  -0.001        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.6542e+01  on 47  degrees of freedom
Residual deviance: 3.9069e-10  on 46  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 25
What is happening? I see the warning, but in this case these two groups really are separated...
Barplot of the random data (image not reproduced here).
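(An aside, not from the original thread: with this seed the two groups' indep values do not overlap at all, which is exactly complete separation; the likelihood keeps improving as the coefficient grows without bound, so glm stops after 25 iterations with a huge, meaningless standard error. A quick check under the setup above:)
# Per-group ranges of indep; they do not overlap, confirming separation
tapply(d$indep, d$dep, range)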

Binary logistic regression with a dichotomous predictor

I'm getting puzzled by a binary logistic regression in R with (obviously) a dichotomous outcome variable (coded 0 and 1) and a dichotomous predictor variable (coded 0 and 1). A contingency table suggests the predictor is excellent, yet it does not come out as significant in my logistic regression. I found the same effect with a dummy problem, so I wonder if somebody can help me spot the issue when I use a 'perfect' predictor?
outcome <- c(0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1)
predictor <- outcome
model <- glm(outcome ~ predictor, family = binomial)
summary(model)
Call:
glm(formula = outcome ~ predictor, family = binomial)

Deviance Residuals:
             Min              1Q          Median              3Q             Max
-0.000006547293 -0.000006547293 -0.000006547293  0.000006547293  0.000006547293

Coefficients:
             Estimate  Std. Error z value Pr(>|z|)
(Intercept) -24.56607 53484.89343 -0.00046  0.99963
predictor    49.13214 79330.94390  0.00062  0.99951

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 15.15820324648649020  on 10  degrees of freedom
Residual deviance:  0.00000000047153748  on  9  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 23
My question is why "predictor" comes out with p = .999 rather than something very small, given that it should perfectly predict the outcome here. Thanks in advance.
Edit: The output is the same if I change the main command to outcome ~ as.factor(predictor)
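(A note, not from the original thread: with a perfect predictor the data are completely separated, so the Wald z-test printed by summary() is uninformative; the standard error diverges along with the estimate, which is why the p-value paradoxically approaches 1. A likelihood-ratio test compares deviances instead and does flag the predictor:)
# The deviance drops from 15.158 to essentially 0, so the
# likelihood-ratio test yields a very small p-value
anova(model, test = "Chisq")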
