How to glm on a linear model? - r

null <- glm(Status ~ Age, family = "binomial", data = train_data)
Error in model.frame.default(formula = Status ~ Age, data = train_data, :
variable lengths differ (found for 'Age')
When I run glm on the whole file I get no errors. All the variables are in a single dataset and there are no missing values. I divided the file into:
dim(train_data)
dim(test_data)
The error only occurs when I use train_data and test_data. When I use the whole file, I don't get any errors.
How do I solve the problem?

Did you use anything like
attach(data)
before fitting? If so, your response Status could be taken from the full dataset rather than from train_data, which would explain the length mismatch.
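A minimal sketch of how this error can arise, using made-up data (the data frame `full`, the 70-row split and `train_no_status` are assumptions for illustration, not the asker's data). When a variable named in the formula is not a column of `data`, R looks it up in the formula's environment (or an attached data frame) and may pick up a full-length vector:

```r
set.seed(1)
# Hypothetical full dataset with 100 rows
full <- data.frame(Status = rbinom(100, 1, 0.5), Age = rnorm(100, 40, 10))
train_data <- full[1:70, ]

# Simulate the problem: a full-length Status exists outside the data frame,
# and the data passed to glm() has no Status column
Status <- full$Status
train_no_status <- train_data["Age"]

msg <- tryCatch(
  glm(Status ~ Age, family = binomial, data = train_no_status),
  error = function(e) conditionMessage(e)
)
print(msg)

# When every variable in the formula is a column of `data`,
# the columns take priority and the fit uses only the 70 training rows
fit <- glm(Status ~ Age, family = binomial, data = train_data)
nobs(fit)
```

Checking names(train_data) and running search() to spot attached data frames are quick ways to see where each variable is coming from.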

Related

Why do my variables disappear after using feature selection with step()?

I made a Multinomial Logistic Regression model using library(nnet) in R.
I notice two things: first, I get a warning, and second, after using the step() function, my predictor variables are reduced to a single one (Depression).
summary(multinom_model)$call
produces:
multinom(formula = out ~ ., data = train)
Warning message:
In sqrt(diag(vc)) : NaNs produced
But
mult_model <- step(multinom_model, trace = FALSE)
summary(mult_model)$call
this code produces:
multinom(formula = out ~ Depressed, data = train)
Why is this happening? Also, both models predict the same output on the test data. Does it have to do with the warning message? How do I fix that?

GLM model: addressing variables

In my GLM model I tested for non-significant variables. One of the variables is Geography, whose values are country names. The test showed that GeographyCountryA is significant, so I want to keep it, but I am not sure how to address it properly. The code below gives an error:
Error in eval(predvars, data, env) : object 'GeographyGermany' not found
Code:
churn_model_rl <- glm(data = train_churn, formula = Exited ~ Age +
NumOfProducts + IsActiveMember + Gender + GeographyGermany, family = binomial(link = "logit"))
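One possible way around this, sketched on made-up data (the data frame below is hypothetical; only the column names come from the question). `GeographyGermany` is the name R gives to a coefficient of the factor `Geography`, not a column of the data, so the formula has to refer to `Geography` itself, for example through an indicator built with `I()`:

```r
set.seed(42)
n <- 200
# Hypothetical stand-in for train_churn with the columns named in the question
train_churn <- data.frame(
  Exited = rbinom(n, 1, 0.3),
  Age = round(runif(n, 18, 80)),
  NumOfProducts = sample(1:4, n, replace = TRUE),
  IsActiveMember = rbinom(n, 1, 0.5),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Geography = factor(sample(c("France", "Germany", "Spain"), n, replace = TRUE))
)

# Keep only the Germany-vs-rest contrast of Geography
churn_model_rl <- glm(
  Exited ~ Age + NumOfProducts + IsActiveMember + Gender +
    I(Geography == "Germany"),
  data = train_churn,
  family = binomial(link = "logit")
)
names(coef(churn_model_rl))
```

An alternative with the same effect would be to create a 0/1 column for Germany in the data frame first and use that column in the formula.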

Issues with logit regression in r

I am trying to run a logit regression and I tried two approaches:
m.logit <- glm(p4 ~ scale(log(gdp,orthodox,swb)),
data = happiness,
family = binomial("logit"))
summary(m.logit)
Throws: Error in summary(m.logit) : object 'm.logit' not found
While
m1.logit <- glm(p4 ~ gdp + orthodox + swb, family = binomial(link = "logit"), data = happiness)
Throws: Error in eval(family$initialize) : y values must be 0 <= y <= 1
I kind of understood the errors (in the former case m.logit is not found, and in the latter, I need to transform the variables I think...) but don't know how to solve it...
Any help?

model frame default error variable lengths differ for logistic regression in R

I am new to R and I am trying to create a logit model. I created a train and test set for my data and when I am trying to create a logit model, I keep getting the following error message:
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
variable lengths differ (found for 'creditScore')
What am I doing wrong/what can I do to fix this to run the model?
This is the code I used to create the test and train sets:
data <- subset(mortDefault2001,select=c(1,2,3,4,6))
train <- data[1:80000,]
test <- data[80001:99999,]
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
variable lengths differ (found for 'creditScore')
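A likely cause, sketched on a small made-up stand-in for mortDefault2001 (the values, the `houseAge` column and the 800/200 split are assumptions; the sketch also assumes `default` is one of the columns kept in `train`). Writing the response as `mortDefault2001$default` takes it from the full data frame, while the predictors come from the shorter `train`. Referring to the response by its column name makes R look it up in `train` as well:

```r
set.seed(7)
# Hypothetical stand-in for mortDefault2001 (smaller, made-up values)
mortDefault2001 <- data.frame(
  creditScore = round(rnorm(1000, 700, 50)),
  houseAge = sample(1:40, 1000, replace = TRUE),
  default = rbinom(1000, 1, 0.1)
)
train <- mortDefault2001[1:800, ]
test <- mortDefault2001[801:1000, ]

# Response named by column, so it has the same length as the predictors
model <- glm(default ~ ., family = binomial(link = "logit"), data = train)
nobs(model)

# Predicted default probabilities for the held-out rows
p <- predict(model, newdata = test, type = "response")
head(p)
```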

Running into 'Error in eval(predvars, data, env) : object 'Age' not found' while using predict function in logistic regression

#Logistic Regression
glm.fit <- glm(recent_cannabis_use~.,data = drug_use_train, family = binomial)
summary(glm.fit)
predict(glm.fit, with(drug_use_train, data.frame(Gender = "Male")), type = "response")
I am trying to find the predicted probability of recent_cannabis_use for a male.
You should use predict(glm.fit, newdata = data.frame(Gender = "Male")). Using with in this case is not warranted, since you are not accessing any of the variables in drug_use_train.
Note that this assumes your formula is, upon expansion, recent_cannabis_use ~ Gender. If you have other variables and you want to explore only the effect of Gender, you will need to set (pre-calculate or make up) all other variables to some fixed value (remember how coefficients are interpreted: the change in y for a one-unit change in x, provided everything else stays the same). See for example this post.
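A minimal sketch of that advice, on made-up data (the stand-in data frame and its single extra predictor `Age` are assumptions; the real drug_use_train may have more columns, each of which would need a value in `newdata`). Here Age is held at its training mean while Gender is set to "Male":

```r
set.seed(3)
n <- 300
# Hypothetical stand-in for drug_use_train
drug_use_train <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Age = round(runif(n, 18, 65))
)
drug_use_train$recent_cannabis_use <- rbinom(
  n, 1,
  plogis(-1 + 0.5 * (drug_use_train$Gender == "Male") - 0.01 * drug_use_train$Age)
)

glm.fit <- glm(recent_cannabis_use ~ ., data = drug_use_train, family = binomial)

# newdata must contain every predictor used in the model;
# the other predictor (Age) is fixed at its training mean
new_male <- data.frame(
  Gender = factor("Male", levels = levels(drug_use_train$Gender)),
  Age = mean(drug_use_train$Age)
)
p_male <- predict(glm.fit, newdata = new_male, type = "response")
p_male
```

Leaving Age out of `new_male` would reproduce the "object 'Age' not found" error from the question title, since predict() needs a value for each predictor.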
