Why do my variables disappear after using feature selection with step()?

I made a multinomial logistic regression model using library(nnet) in R.
I notice two things: one, I get a warning, and two, after using the step() function, every predictor is dropped except Depressed, which is essentially the variable I'm attempting to predict (depression).
summary(multinom_model)$call
produces:
multinom(formula = out ~ ., data = train)
Warning message:
In sqrt(diag(vc)) : NaNs produced
BUT
mult_model <- step(multinom_model, trace = FALSE)
summary(mult_model)$call
this code produces:
multinom(formula = out ~ Depressed, data = train)
Why is this happening? Also, both models predict the same output on the test data. Does it have to do with the warning message? How do I fix that?
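For reference, one hedged way to see what step() actually did is to re-run the selection with tracing on and to look at the standard errors behind the warning; this assumes the multinom_model and train objects from the question.
library(nnet)
# Re-run the selection with the AIC trace printed, to see which terms
# step() dropped at each stage.
mult_model <- step(multinom_model, trace = TRUE)
# NaN standard errors (the sqrt(diag(vc)) warning) usually point to
# separation or collinearity among the predictors.
summary(multinom_model)$standard.errors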

Related

Compare regression with robust standard errors to null using Wald's Test in R

I am running a regression model that looks like this:
wwMLR <- lm(contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4)
Having used check_heteroscedasticity(wwMLR) from the performance package, I can see that the regression model violates the assumption of homoscedasticity. Because of this, I have built a model with robust standard errors, shown below:
library(estimatr)
wwMLR_hc3 <- lm_robust(formula = contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4,
se_type = "HC3", alpha = 0.0482)
What I would like to do now is compare this regression model to a null using Wald's Test. The null model looks like the below:
wwnull_hc3 <- lm_robust(formula = contAOMIdiff ~ 1, data = wwMeanWide4,
se_type = "HC3", alpha = 0.0482)
When I try to compare these using a Wald's Test:
library(lmtest)
waldtest(wwMLR_hc3, wwnull_hc3, vcov = vcovHC)
I get an error:
Error in eval(predvars, data, env) : object 'contAOMIdiff' not found
contAOMIdiff is my response variable in the regression. I am not sure why it can't be found, but I assume this may be a compatibility issue between the lm_robust model type and the waldtest() function.
If anyone has any ideas on how I can get this to work, or an alternative way to run a Wald test on these two models, I would be very grateful.
I have found a similar question here, which has not been answered: R Wald test for cluster robust se's
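One possible workaround (a sketch, not tested on the original data): fit the same models with lm() and hand an HC3 covariance from the sandwich package to lmtest::waldtest(), which sidesteps the lm_robust/waldtest incompatibility.
library(lmtest)
library(sandwich)
# Refit both models with lm(); the robust covariance is supplied at test time.
wwMLR_lm  <- lm(contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4)
wwNull_lm <- lm(contAOMIdiff ~ 1, data = wwMeanWide4)
# Wald test of the full model against the intercept-only model,
# using HC3 robust standard errors.
waldtest(wwMLR_lm, wwNull_lm, vcov = function(m) vcovHC(m, type = "HC3"))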

How to glm on a linear model?

null <- glm(Status ~ Idade, family = "binomial", data = train_data)
Error in model.frame.default(formula = Status ~ Age, data = train_data, :
variable lengths differ (found for 'Age')
When I run glm on the whole file I get no errors. All the variables are in a single dataset and there are no missing values. I split the file into:
dim(train_data)
dim(test_data)
The error only occurs when I use the train_data and the test_data. When I use the whole file, I don't get errors.
How do I solve the problem?
Did you use anything like
attach(data)
If so, your response Status would be based on the full dataset.
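As an illustration (variable names taken from the question; the attach() call and the full_data name are hypothetical), this is how the mismatch arises and how to avoid it:
# Problematic pattern: Status is looked up in the attached full data set,
# while Idade comes from train_data, so the variable lengths differ.
# attach(full_data)
# glm(Status ~ Idade, family = "binomial", data = train_data)
# Safer: don't attach; let every variable in the formula be resolved
# inside the data frame passed to data =.
null <- glm(Status ~ Idade, family = "binomial", data = train_data)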

model frame default error variable lengths differ for logistic regression in R

I am new to R and I am trying to create a logit model. I created a train and test set for my data, but when I try to fit the model I keep getting the following error message:
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ .,
data = train,:variable lengths differ (found for 'creditScore')
What am I doing wrong/what can I do to fix this to run the model?
This is the code I used to create the test and train sets:
data <- subset(mortDefault2001,select=c(1,2,3,4,6))
train <- data[1:80000,]
test <- data[80001:99999,]
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
variable lengths differ (found for 'creditScore')
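A hedged sketch of the usual fix: refer to the response by its column name inside the formula, so that both the response and the predictors come from train (80,000 rows) rather than pulling default from the full mortDefault2001 data frame.
# Response and predictors both come from `train`, so their lengths match.
model <- glm(default ~ ., family = binomial(link = "logit"), data = train)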

Error when calculating prediction error for logistic regression model

I am getting the following error when trying to calculate the prediction error for a logistic regression model: $ operator is invalid for atomic vectors.
Here is the code and data I am using:
install.packages("ElemStatLearn")
library(ElemStatLearn)
# training data
train = vowel.train
# only looking at the first two classes
train.new = train[1:3]
# test data
test = vowel.test
test.new = test[1:3]
# performing the logistic regression
train.new$y <- as.factor(train.new$y)
mylogit <- glm(y ~ ., data = train.new, family = "binomial")
train.logit.values <- predict(mylogit, newdata=test.new, type = "response")
# this is where the error occurs (below)
train.logit.values$se.fit
I tried to make it of type list, but that did not seem to work. I am wondering if there is a quick fix so that I can obtain either the prediction error or the misclassification rate.
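A possible sketch (assuming mylogit and test.new from the code above): predict() with type = "response" returns a plain numeric vector, which is why $se.fit fails; asking for se.fit = TRUE returns a list instead.
# Returns a list with components $fit, $se.fit and $residual.scale.
pred <- predict(mylogit, newdata = test.new, type = "response", se.fit = TRUE)
head(pred$se.fit)   # per-observation standard errors of the predictions
# Misclassification rate at a 0.5 cutoff, assuming the response in both
# sets has been recoded to a binary 0/1 variable named y.
pred_class <- ifelse(pred$fit > 0.5, 1, 0)
mean(pred_class != test.new$y)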

Increasing number of iterations for lme4 version 1.1-7

I am encountering a problem with iterations while trying to fit a mixed-effects binomial regression using the glmer function from the package lme4, version 1.1-7.
When I run the model using the code:
model <- glmer(Clinical.signs ~ cloacal +(1|Chicken_ID), family = binomial,
data = viral_load_9)
I get the error:
Error: pwrssUpdate did not converge in (maxit) iterations
When I follow the advice given here, using the code:
model <- glmer(Clinical.signs ~ cloacal +(1|Chicken_ID), family = binomial,
data = viral_load_9,
control=glmerControl(optimizer="bobyqa",
optCtrl = list(maxfun = 100000)))
I still have the exact same error message.
Any suggestions on what might be wrong with my code will be gratefully received.
-----------------------------------------------------------------
Following the advice from aosmith (thanks for the suggestion!), I am including the data and the code so that others can replicate the results I am getting. Note that the code works fine for the variable "oral" and produces "model_1", but when I run it with the variable "cloacal", I get the error message noted above.
Chicken_ID <- c(44,44,45,45,46,46,47,47,48,48,49,49,50,50,51,51,52,52,53,55,55)
oral <- c(-0.4827578,-0.1845839,-1.3772797,-0.7809318,-0.4827578,1.6044598,0.1135901,0.411764,-0.1845839,1.6044598,-0.1845839,1.6044598,-1.6754536,0.709938,-1.0791057,0.709938,0.1135901,1.0081119,0.411764,-1.6754536,-0.1845839)
cloacal <- c(-0.9833258,0.450691,-1.1267275,0.7374944,-1.1267275,1.0242977,-1.5569325,1.0242977,0.3072893,1.0242977,-0.1229157,1.1676994,-1.5569325,0.5940927,0.450691,0.3072893,-1.1267275,0.7374944,0.1638876,-1.5569325,1.1676994)
clinical.signs <- c("YES","YES","NO","YES","NO","YES","NO","YES","YES","YES","YES","YES","NO","YES","YES","YES","NO","YES","YES","NO","YES")
clinical.signs <- factor(clinical.signs)
viral_load <- data.frame(Chicken_ID, oral, cloacal, clinical.signs)
library(lme4)
# This model (using "oral") fits without problems:
model_1 <- glmer(clinical.signs ~ oral + (1 | Chicken_ID),
                 family = binomial, data = viral_load)
summary(model_1)
# This model (using "cloacal") triggers the pwrssUpdate error:
model_2 <- glmer(clinical.signs ~ cloacal + (1 | Chicken_ID),
                 family = binomial, data = viral_load)
It may not be a problem with your code. See this question on Cross Validated.
Some things you can do to prevent convergence failures (a short sketch follows this list):
Rescale continuous predictors
Try different optimizers using glmerControl()
Check your data for sparsity: if there aren't enough outcomes or observations at certain levels of the predictors, the model may fail to converge.
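A minimal sketch of these suggestions, using the reproducible data above; the optimizer and maxfun values are illustrative and not guaranteed to make this particular model converge.
library(lme4)
# Check for sparse outcomes: very few "NO" responses can make the fit unstable.
table(viral_load$clinical.signs)
# Try a different optimizer with a larger iteration budget.
model_2 <- glmer(clinical.signs ~ cloacal + (1 | Chicken_ID),
                 family = binomial, data = viral_load,
                 control = glmerControl(optimizer = "Nelder_Mead",
                                        optCtrl = list(maxfun = 2e5)))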

Resources