in my GLM model I tested for not significant variables. One of variable in Geography where values woould be country names. So test showed me that variable GeographyCountryA is significant so I want to keep it but not sure how to properly address it ? below gives an error:
Error in eval(predvars, data, env) : object 'GeographyGermany' not
found
Code:
churn_model_rl <- glm(data = train_churn, formula = Exited ~ Age+
+NumOfProducts+IsActiveMember+Gender+GeographyGermany,family = binomial(link = "logit"))
Related
I made a Multinomial Logistic Regression model using library(nnet) in R.
I notice I, one, get an error, and two, after using the step() function, my predictor variables convert into the variable I'm attempting to predict, solely (Depression).
summary(multinom_model)$call
produces:
multinom(formula = out ~ ., data = train)
Warning message:
In sqrt(diag(vc)) : NaNs produced
BUT
mult_model <- step(multinom_model, trace = FALSE)
summary(mult_model)$call
this code produces:
multinom(formula = out ~ Depressed, data = train)
Why is this happening? Also, both models predict the same output on the test data. Does it have to do with the warning message? How do I fix that?
I am trying to run the following code on R:
m <- gam(Flp_pop ~ s(Flp_CO, bs = "cr", k = 30), data = data, family = poisson, method = "REML")
My dataset is like this:
enter image description here
But when I try to execute, I get this error message:
"Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I am very new to R, maybe it is a very basic question. But does anyone know why this is happening?
Thanks!
The Poisson distribution has support on the non-negative integers and you are passing a continuous variable as the response. Here's an example with simulated data
library("mgcv")
library("gratia")
library("dplyr")
df <- data_sim("eg1", seed = 2) %>% # simulate Gaussian response
mutate(yabs = abs(y)) # make y non negative
mp <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
# fails
which reproduces the error you saw
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The warnings are of the form:
$> warnings()[1]
Warning message:
In dpois(y, y, log = TRUE) : non-integer x = 7.384012
Indicating the problem; the model is evaluating the probability mass for your response data given the estimated model and you're evaluating this at the indicated non-integer value, which returns a 0 mass plus the warning.
If we'd passed the original Gaussian variable as the response, which includes negative values, the function would have errored out earlier:
mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
which raises this error:
r$> mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
Error in eval(family$initialize) :
negative values not allowed for the 'Poisson' family
An immediate but not necessarily advisable solution is just to use the quasipoisson family
mq <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = quasipoisson, method = "REML")
which uses the same mean variance relationship as the Poisson distribution but not the actual distribution so we can get away with abusing it.
Better would be to ask yourself why you were trying to fit a model that is ostensibly for counts to a response that is a continuous (non-negative) variable?
If the answer is you had a count but then normalised it in some way (say by dividing by some measure of effort like area surveyed or length of observation time) then you should use an offset of the form + offset(log(effort_var)) added to the model formula, and use the original non-normalised integer variable as the response.
If you really have a continuous response and the poisson was an over sight, try fitting with family = Gamma(link = "log")) or family = tw().
If it's something else, you should edit your question to include that info and perhaps we here can help or the question could be migrated to CrossValidated if the issue is more statistical in nature.
null <- glm(Status ~ Idade, family = "binomial", data = train_data)
Error in model.frame.default(formula = Status ~ Age, data = train_data, :
variable lengths differ (found for 'Age')
When I run glm I get no errors. All the variables are in a single dataset and there are no missing values. I divided de file in:
dim(train_data)
dim(test_data)
The error only occurs when i use the train_data and the test_data. When I use the whole file, I don't have errors.
How do I solve the problem?
Did you use anything like
attach(data)
If so, your response Status would be based on the full dataset.
I am trying to run generalized linear model on my balanced(using SMOTE) train dataset but when I run the below R code I get an error saying
"Error: $ operator is invalid for atomic vectors"
Don't really know what it means. Any help would be highly appreciated!
model.glm<- train(Accident_Severity ~ ., data= smote_train,
method = "glm",metric = RMSE, trControl= "ctrl")
You have misspecified the options for the train function. This may work for you:
model.glm <- train(Accident_Severity ~ ., data = smote_train,
method = "glm", metric = "Kappa", trControl= trainControl())
In your original function call, the option trControl = "ctrl" caused the error message that you got. However, it's also likely that the option metric = "RMSE" will not work with your data (I am assuming that your variable Accident_Severity is a factor variable and that you are trying to fit a classification model).
#Logistic Regression
glm.fit <- glm(recent_cannabis_use~.,data = drug_use_train, family = binomial)
summary(glm.fit)
predict(glm.fit, with(drug_use_train, data.frame(Gender = "Male")), type = "response")
Trying to find the predicted probability for recent_canabis_use for a male.
You should use predict(glm.fit, newdata = data.frame(Gender = "Male")). Using with in this case is not warranted, since you are not accessing any of the variables in drug_use_train.
Note that this assumes your formula is, upon expansion, recent_cannabis_use ~ Gender. If you have other variable and you want to explore only the effect of Gender, you will need to set (pre-calculate or make up) all other variables to some fixed value (remember how coefficients are interpreted - change in y with one unit change of x, provided everything else stays the same). See for example this post.