Issues with logit regression in R

I am trying to run a logit regression and I tried two approaches:
m.logit <- glm(p4 ~ scale(log(gdp,orthodox,swb)),
data = happiness,
family = binomial("logit"))
summary(m.logit)
Throws: Error in summary(m.logit) : object 'm.logit' not found
While
m1.logit <- glm(p4 ~ gdp + orthodox + swb, family = binomial(link = "logit"), data = happiness)
Throws: Error in eval(family$initialize) : y values must be 0 <= y <= 1
I kind of understood the errors (in the former case m.logit is not found, and in the latter, I need to transform the variables I think...) but don't know how to solve it...
Any help?

Related

Trouble in GAM model in R software

I am trying to run the following code on R:
m <- gam(Flp_pop ~ s(Flp_CO, bs = "cr", k = 30), data = data, family = poisson, method = "REML")
My dataset is like this:
[screenshot of the dataset; image not available]
But when I try to execute, I get this error message:
"Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I am very new to R, maybe it is a very basic question. But does anyone know why this is happening?
Thanks!
The Poisson distribution has support on the non-negative integers, and you are passing a continuous variable as the response. Here's an example with simulated data:
library("mgcv")
library("gratia")
library("dplyr")
df <- data_sim("eg1", seed = 2) %>% # simulate Gaussian response
mutate(yabs = abs(y)) # make y non negative
mp <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
# fails
which reproduces the error you saw
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The warnings are of the form:
r$> warnings()[1]
Warning message:
In dpois(y, y, log = TRUE) : non-integer x = 7.384012
This indicates the problem: to evaluate the likelihood, the model computes the Poisson probability mass of your response values given the estimated model. At a non-integer value such as the one shown, dpois() returns a mass of 0 along with the warning.
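To see this behaviour in isolation, here is a minimal sketch in base R. It reuses the non-integer value from the warning above; the variable names are only illustrative.

```r
# Poisson log-mass evaluated at a non-integer value:
# R treats the mass as 0, so the log-mass is -Inf, and it also warns.
x <- 7.384012
ll_bad <- suppressWarnings(dpois(x, lambda = x, log = TRUE))

# The same call at an integer value gives a finite log-mass.
ll_ok <- dpois(round(x), lambda = x, log = TRUE)

is.infinite(ll_bad)  # TRUE
is.finite(ll_ok)     # TRUE
```

A log-likelihood of -Inf is the kind of value that can later turn a convergence check into a missing value.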
If we'd passed the original Gaussian variable as the response, which includes negative values, the function would have errored out earlier:
mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
which raises this error:
r$> mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
Error in eval(family$initialize) :
negative values not allowed for the 'Poisson' family
An immediate but not necessarily advisable solution is just to use the quasipoisson family
mq <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = quasipoisson, method = "REML")
which uses the same mean-variance relationship as the Poisson distribution but not the actual distribution, so we can get away with abusing it.
Better would be to ask yourself why you were trying to fit a model that is ostensibly for counts to a response that is a continuous (non-negative) variable?
If the answer is you had a count but then normalised it in some way (say by dividing by some measure of effort like area surveyed or length of observation time) then you should use an offset of the form + offset(log(effort_var)) added to the model formula, and use the original non-normalised integer variable as the response.
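As a rough sketch of the offset approach, assuming simulated data in which `count` is the raw integer response and `effort` is a hypothetical effort variable (neither name comes from the question):

```r
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n)
effort <- runif(n, 1, 10)                    # hypothetical effort, e.g. area surveyed
mu <- effort * exp(1 + sin(2 * pi * x))      # expected count scales with effort
sim <- data.frame(count = rpois(n, mu), x = x, effort = effort)

# Model the raw counts; the offset puts the smooth on the rate (count per unit effort)
m_off <- gam(count ~ s(x, bs = "cr") + offset(log(effort)),
             data = sim, family = poisson, method = "REML")
summary(m_off)
```

The response stays an integer, so the Poisson likelihood is well defined, and the smooth is interpreted on the scale of the rate.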
If you really have a continuous response and the Poisson was an oversight, try fitting with family = Gamma(link = "log") or family = tw().
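For illustration, both families can be tried on a simulated positive, continuous response. The data and object names below are made up for this sketch and are not the asker's dataset.

```r
library(mgcv)

set.seed(2)
n <- 200
x <- runif(n)
mean_y <- exp(1 + sin(2 * pi * x))
# Gamma-distributed response with the mean above (rate = shape / mean)
sim <- data.frame(y = rgamma(n, shape = 5, rate = 5 / mean_y), x = x)

m_gamma <- gam(y ~ s(x, bs = "cr"), data = sim,
               family = Gamma(link = "log"), method = "REML")
m_tw <- gam(y ~ s(x, bs = "cr"), data = sim,
            family = tw(), method = "REML")

AIC(m_gamma, m_tw)
```

Note that the Gamma family needs strictly positive values, so a response containing exact zeros would need a different choice, such as tw().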
If it's something else, you should edit your question to include that info and perhaps we here can help or the question could be migrated to CrossValidated if the issue is more statistical in nature.

Error when adjusting a GLM: Error in eval(family$initialize)

I am trying to fit the generalized linear model defined below.
It must be noted that the response variable Var1, as well as the regressor variable Var2, have zero values, for which a constant has been added to avoid problems when applying the log.
model = glm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), data = data2)
However, I get an error when producing the diagnostic plot with the hnp function:
library(hnp)
hnp(model)
Gaussian model (glm object)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
To get around this, I tried supplying the diagnostic, simulation and fitting functions manually before building the plot, but the error message is still there.
dfun <- function(obj) resid(obj)
sfun <- function(n, obj) simulate(obj)[[1]]
ffun <- function(resp) glm(resp ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), data = data2)
hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
I followed some guidance I found, such as supplying initial values to the estimation algorithm (both for the linear predictor and for the means), but this was not enough to solve the problem. See the code below:
fit = lm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)), data=data2)
coefficients(fit)
(Intercept) log(Var2+2)
32.961103 -8.283306
model = glm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), start = c(32.96, -8.28), data = data2)
hnp(model)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
The error persists even when I implement the half-normal plot manually.
dfun <- function(obj) resid(obj)
sfun <- function(n, obj) simulate(obj)[[1]]
ffun <- function(resp) glm(resp ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), data = data2, start = c(32.96, -8.28))
hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
I also tried refitting the model after removing the zeros from the data, but the problem still persists.
I suspect what you meant to fit is a log-transformed response variable regressed on your predictors. There is more detail available elsewhere on the difference between a log-link GLM and a log-transformed response. Essentially, with a log link you model the log of the mean while the errors remain additive on the original (exponentiated) scale, whereas with a log-transformed response the errors are additive on the log scale. I am not very familiar with hnp, but my guess is that there are problems simulating the response variable.
If I run your regression like this using the data provided, it looks ok
# divide by the same exposure used in the question's offset, log(Var3/Var4)
data2$Y = with(data2, log((Var1 + 2) / (Var3 / Var4)))
model = glm(Y ~ log(Var2+2), data = data2)
hnp(model)
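The difference between the two specifications can be sketched on simulated data. Everything below (the variable names, the coefficients, the log-normal error structure) is an assumption made for illustration and is unrelated to data2.

```r
set.seed(3)
n <- 300
x <- runif(n, 1, 10)
# Multiplicative errors on y, i.e. additive Gaussian errors on log(y)
sim <- data.frame(x = x, y = exp(0.5 + 0.2 * x + rnorm(n, sd = 0.3)))

# (a) Gaussian GLM with log link: log(E[y]) is linear in x,
#     errors are assumed additive with constant variance on the scale of y
m_link <- glm(y ~ x, family = gaussian(link = "log"), data = sim)

# (b) Log-transformed response: E[log(y)] is linear in x,
#     errors are assumed additive with constant variance on the log scale
m_trans <- lm(log(y) ~ x, data = sim)

coef(m_link)
coef(m_trans)
```

The slopes are usually similar, but the two models make different assumptions about the residuals, which matters for simulation-based diagnostics: a Gaussian model on the original scale can simulate non-positive responses, and those are not valid starting values under a log link.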

Logistic regression - eval(family$initialize) : y values must be 0 <= y <= 1

I am trying to perform logistic regression using R in a dataset provided here : http://archive.ics.uci.edu/ml/machine-learning-databases/00451/
It is about breast cancer. This dataset contains a column Classification which contains only 1 (if patient doesn't have cancer) or 2 (if patient has cancer)
library(ISLR)
dataCancer <- read.csv("~/Desktop/Isep/Machine Leaning/TD/Project_Cancer/dataR2.csv")
attach(dataCancer)
#Step : Split data into training and testing data
training = (BMI>25)
testing = !training
training_data = dataCancer[training,]
testing_data = dataCancer[testing,]
Classification_testing = Classification[testing]
#Step : Fit a logistic regression model using training data
classification_model = glm(Classification ~ ., data =
training_data,family = binomial )
When running my script I get :
> classification_model = glm(Classification ~ ., data = training_data,family = binomial )
Error in eval(family$initialize) : y values must be 0 <= y <= 1
> summary(classification_model)
Error in summary(classification_model) : object 'classification_model' not found .
I added as.factor(dataCancer$Classification) as seen in other posts but it has not solved my problem.
Can you suggest a way to map the values of Classification into the range 0 to 1, given what this column contains?
You added the as.factor(dataCancer$Classification) in the script, but even if the dataset dataCancer is attached, a command like the one above does not transform the dataset variable Classification into a factor. It only returns a factor on the console.
Since you want to fit the model on the training dataset, you either specify
training_data$Classification <- as.factor(training_data$Classification)
classification_model <- glm(Classification ~ ., data =
training_data, family = binomial)
or call as.factor directly in the glm formula
classification_model <- glm(as.factor(Classification) ~ ., data =
training_data, family = binomial)
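The point about assignment can be checked on a small made-up data frame. The `toy` data and its columns are hypothetical and only mimic a response coded 1/2.

```r
set.seed(4)
toy <- data.frame(cls = rep(c(1, 2), each = 50),
                  x = c(rnorm(50, mean = 0), rnorm(50, mean = 1)))

# A numeric 1/2 response triggers the error from the question
msg <- tryCatch(glm(cls ~ x, data = toy, family = binomial),
                error = function(e) conditionMessage(e))
msg

# Calling as.factor() without assigning the result leaves the column unchanged
invisible(as.factor(toy$cls))
is.factor(toy$cls)  # FALSE

# Assigning the result back changes the column, and the model then fits
toy$cls <- as.factor(toy$cls)
is.factor(toy$cls)  # TRUE
fit <- glm(cls ~ x, data = toy, family = binomial)
```

With a factor response, glm treats the first level as failure and all other levels as success, so here the model describes the probability of class 2.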
classification_model = glm(Classification ~ ., data = training_data,family = binomial )
Error in eval(family$initialize) : y values must be 0 <= y <= 1
This is because Classification holds numeric values, not a factor. Make sure you assigned the result of as.factor back to the column:
dataCancer$Classification <- as.factor(dataCancer$Classification)
Ideally, 1,0 or 1,2 will not matter as long as it's a factor. But, if doing the above also doesn't help, then you can try converting 1,2 to 1,0 and then trying the same code.
Of course, the second error occurs because the glm call failed, so the model object classification_model was never created.
You need to recode the dependent variable as 0/1, for example with the code below.
library(car)
dataCancer$Classification <- recode(dataCancer$Classification, "1=0; 2=1")
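If you would rather not load an extra package, the same recoding can be sketched in base R. The vector below is a made-up stand-in for the Classification column, with 2 mapped to 1 and 1 mapped to 0.

```r
# Hypothetical stand-in for dataCancer$Classification (values 1 or 2)
cls <- c(1, 2, 2, 1, 2)

# TRUE/FALSE from the comparison becomes 1/0
cls01 <- as.integer(cls == 2)
cls01  # 0 1 1 0 1
```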

Multilevel moderated mediation with continuous variables

I am a beginner in R, so please forgive me if my question reflects insufficient background.
I am trying to run a moderated mediation model using the mediation and lme4 libraries.
All of my variables are continuous. My data have a nested structure with individuals nested in branches (Branch).
In the model I'm trying to test, my predictor/independent variable (abranch) is at the branch level. My mediator (bmed) and outcome (cout) are at the individual level. And the effect of the mediator is moderated by another individual level variable (dmod). So in my model I have abranch predicting bmed, and bmed*dmod are predicting cout.
This is the syntax I've used:
med.fit <- glmer(
bmed ~ abranch + (1|Branch),
family = binomial(link = "logit"),
data = Dataset
)
out.fit <- glmer(
cout ~ dmod*bmed + (1+bmed|Branch),
family = binomial(link = "logit"),
data = Dataset
)
I was then thinking of using:
med.out <- mediate(med.fit, out.fit, treat = "abranch", mediator = "bmed",
sims = 100)
summary(med.out)
But even before getting to the last two lines, I get the following error:
Error in eval(family$initialize, rho) : y values must be 0 <= y <= 1
I now realize that this is because I'm using the "binomial"/logit family whereas my DV is continuous and not between 0 and 1. What can I do, given the nature of my variables?

error in gamma link specification for lmer

I'm trying to fit a mixed-effects model with a gamma distribution. The most basic model has one fixed predictor and 1 random effect. No matter which link I specify (I've tried log, identity and inverse), I obtain the following error. My real data has zeros in Y, but even when I apply simulated data with only positive Y as below, it throws the same error.
mockdf = data.frame(y = rnorm(100,77,6.5), x1 = sample(letters,100,replace = T), x2 = seq(1900,1999,1))
mod = lmer(y ~ (1|x1) + x2, family = gamma(link = 'identity'), na.action = na.exclude, data = mockdf)
Error in gamma(link = "identity") :
supplied argument name 'link' does not match 'x'
I searched through SO and couldn't find another person who ran into this error. Is my syntax incorrect?
Thanks for your help.
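One thing worth checking here is capitalisation: in R, lowercase `gamma` is the gamma function, while the family constructor is `Gamma`. The sketch below shows the difference in base R, using a fixed-effects glm on made-up positive data. For the mixed model itself, the lme4 function that takes a `family` argument is `glmer` rather than `lmer`.

```r
gamma(5)                         # the gamma function: (5 - 1)! = 24

fam <- Gamma(link = "identity")  # a family object, usable in glm() and similar
class(fam)                       # "family"

# Passing link= to the gamma function reproduces the error from the question
err <- tryCatch(gamma(link = "identity"),
                error = function(e) conditionMessage(e))
err

# Simulated strictly positive response with mean 77 (illustrative only)
set.seed(5)
mockdf <- data.frame(y = rgamma(100, shape = 140, rate = 140 / 77),
                     x2 = seq(1900, 1999, 1))
fit <- glm(y ~ x2, family = Gamma(link = "log"), data = mockdf)
```

The Gamma family also requires a strictly positive response, so zeros in Y would still need separate handling.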
