Error when calculating prediction error for logistic regression model - r

I am getting the following error when trying to calculate the prediction error for a logistic regression model: $ operator is invalid for atomic vectors.
Here is the code and data I am using:
install.packages("ElemStatLearn")
library(ElemStatLearn)
# training data
train = vowel.train
# only looking at the first two classes
train.new = train[1:3]
# test data
test = vowel.test
test.new = test[1:3]
# performing the logistic regression
train.new$y <- as.factor(train.new$y)
mylogit <- glm(y ~ ., data = train.new, family = "binomial")
train.logit.values <- predict(mylogit, newdata=test.new, type = "response")
# this is where the error occurs (below)
train.logit.values$se.fit
I tried converting it to a list, but that did not work. Is there a quick fix so that I can obtain either the prediction error or the misclassification rate?
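The error arises because predict() on a glm fit returns a plain numeric vector unless se.fit = TRUE is passed; only then is the result a list with fit and se.fit components. A minimal sketch of both behaviours, plus a misclassification rate, follows. It assumes the built-in mtcars data and a 0.5 probability cutoff purely for illustration, not the vowel data from the question.

```r
# Illustrative binary model (assumption: mtcars stands in for the vowel data)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Without se.fit, predict() returns an atomic numeric vector, so `$` fails on it
p_vec <- predict(fit, newdata = mtcars, type = "response")
is.atomic(p_vec)

# With se.fit = TRUE, predict() returns a list: fit, se.fit, residual.scale
p_list <- predict(fit, newdata = mtcars, type = "response", se.fit = TRUE)
head(p_list$se.fit)

# Misclassification rate using an assumed 0.5 cutoff on the predicted probability
pred_class <- as.integer(p_list$fit > 0.5)
misclass_rate <- mean(pred_class != mtcars$am)
misclass_rate
```

Note that if the response factor has more than two levels, a binomial glm treats the first level as failure and all other levels as success, so the response may need to be restricted to two classes first.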

Related

Why do my variables disappear after using feature selection with step()?

I made a Multinomial Logistic Regression model using library(nnet) in R.
I notice two problems: first, I get a warning (shown below), and second, after using the step() function, my predictor variables collapse into a single variable (Depression), which is the one I'm attempting to predict.
summary(multinom_model)$call
produces:
multinom(formula = out ~ ., data = train)
Warning message:
In sqrt(diag(vc)) : NaNs produced
However, after running step():
mult_model <- step(multinom_model, trace = FALSE)
summary(mult_model)$call
this code produces:
multinom(formula = out ~ Depressed, data = train)
Why is this happening? Also, both models predict the same output on the test data. Does it have to do with the warning message? How do I fix that?
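To see which terms step() removed and in what order, one option is to inspect the anova component that step() attaches to the model it returns. The sketch below uses lm() on the built-in mtcars data as a stand-in (an assumption; it is not the multinom model from the question), since it only aims to show how the AIC-based selection path can be examined.

```r
# Stand-in full model (assumption: lm on mtcars instead of multinom on the asker's data)
full_model <- lm(mpg ~ ., data = mtcars)

# With no scope given, step() performs backward elimination by AIC
selected <- step(full_model, trace = FALSE)

# The selection path: one row per step, with the term dropped and the resulting AIC
selected$anova

# Compare the terms before and after selection
kept <- attr(terms(selected), "term.labels")
dropped <- setdiff(attr(terms(full_model), "term.labels"), kept)
kept
dropped
```

Calling step() with trace left at its default prints the same information while the search runs, which can help show whether a single predictor really does minimise the AIC.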

Compare regression with robust standard errors to null using Wald's Test in R

I am running a regression model that looks like this:
wwMLR <- lm(contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4)
Having used check_heteroscedasticity(wwMLR) from the performance package, I can see that the regression model violates the assumption of homoscedasticity. Because of this, I have built a model with robust standard errors, shown below:
library(estimatr)
wwMLR_hc3 <- lm_robust(formula = contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4,
                       se_type = "HC3", alpha = 0.0482)
What I would like to do now is compare this regression model to a null using Wald's Test. The null model looks like the below:
wwnull_hc3 <- lm_robust(formula = contAOMIdiff ~ 1, data = wwMeanWide4,
                        se_type = "HC3", alpha = 0.0482)
When I try to compare these using a Wald's Test:
library(lmtest)
waldtest(wwMLR_hc3, wwnull_hc3, vcov = vcovHC)
I get an error:
Error in eval(predvars, data, env) : object 'contAOMIdiff' not found
contAOMIdiff is my response variable in the regression. I am not sure why it can't be found but I am assuming this may be a compatibility issue between the lm_robust model type and the waldtest() function.
If anyone has any ideas on how I can get this to work, or an alternate way to run a Wald's Test on these two models I would be very grateful.
I have found a similar question here, which has not been answered: R Wald test for cluster robust se's
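One possible workaround, sketched under the assumption that plain lm() fits are acceptable, is to fit both models with lm() and let waldtest() apply the HC3 covariance through sandwich::vcovHC(), since both functions are designed to work with lm objects. The mtcars variables below are placeholders for contAOMIdiff, PHQ9 and KVIQtot, and the helper name hc3 is my own.

```r
library(lmtest)
library(sandwich)

# Placeholder models (assumption: mtcars columns stand in for the asker's variables)
full_fit <- lm(mpg ~ wt + hp, data = mtcars)
null_fit <- lm(mpg ~ 1, data = mtcars)

# Hypothetical helper: HC3 heteroscedasticity-consistent covariance for a fitted lm
hc3 <- function(x) vcovHC(x, type = "HC3")

# Wald test of the full model against the intercept-only model with robust covariance
wald_res <- waldtest(null_fit, full_fit, vcov = hc3)
wald_res
```

The coefficient estimates from lm() and lm_robust() are the same; only the standard errors differ, so the robust covariance can be supplied at test time instead of at fit time.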

model frame default error variable lengths differ for logistic regression in R

I am new to R and I am trying to create a logit model. I created a train and test set for my data and when I am trying to create a logit model, I keep getting the following error message:
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
  variable lengths differ (found for 'creditScore')
What am I doing wrong/what can I do to fix this to run the model?
This is the code I used to create the test and train sets:
data <- subset(mortDefault2001,select=c(1,2,3,4,6))
train <- data[1:80000,]
test <- data[80001:99999,]
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
variable lengths differ (found for 'creditScore')
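The error is consistent with the response being taken from the full data frame (mortDefault2001$default) while the predictors come from the 80,000-row train subset, so the lengths cannot match. A minimal sketch of the usual fix, naming the response by its column name so that it is looked up in data, is given below. The simulated loans data frame and its ltv column are invented for illustration, and the sketch assumes the default column is among those kept by subset().

```r
# Simulated stand-in data (assumption: not the real mortDefault2001 data)
set.seed(42)
n <- 200
loans <- data.frame(
  creditScore = rnorm(n, mean = 700, sd = 50),
  ltv = runif(n, 0.5, 1)
)
loans$default <- rbinom(n, 1, plogis((680 - loans$creditScore) / 40))

train <- loans[1:150, ]
test <- loans[151:200, ]

# Reproduces the problem: the response has 200 values, the predictors only 150
err <- tryCatch(
  glm(loans$default ~ ., family = binomial(link = "logit"), data = train),
  error = function(e) conditionMessage(e)
)
err

# Fix: refer to the response by column name so it is taken from `train`
model <- glm(default ~ ., family = binomial(link = "logit"), data = train)

# Predicted default probabilities for the held-out rows
test_prob <- predict(model, newdata = test, type = "response")
head(test_prob)
```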

Predicting responses for new observations using a model developed with multiple imputation via MICE

I have developed a model via multiple imputation using mice. I want to use this model to predict responses for new observations (containing no missing data), including standard errors. Passing the model object created in mice to predict() doesn't work.
A simple example using the in-built nhanes dataset. Say I wanted to develop a logistic regression model with the form age == 3 ~ bmi + hyp + chl, and use this model to predict, say, prob(age = 3 | bmi = 20, hyp = 2 and chl = 190)
library('mice')
imp<-mice(nhanes, seed = 1)
#create model on each imputed dataset
model <- with(imp, glm(age == 3 ~ bmi + hyp + chl, family = binomial))
#pool models into one
poolmodel <- pool(model)
#new data
newdata <- data.frame(bmi = 20, hyp = 2, chl = 190)
#attempt to predict response using predict() function
pred <- predict(object = model, newdata = newdata, type = 'link', se.fit = TRUE)
Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('mira', 'matrix')"
pred <- predict(object = poolmodel, newdata = newdata, type = 'link', se.fit = TRUE)
Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('mipo', 'mira', 'matrix')"
Obviously it would be straightforward to calculate predicted responses and errors manually using the pooled coefficients and the pooled covariance matrix. The real problem, however, is much larger, and the model relies on a few splines and interactions, which complicates the calculations considerably. I would rather use existing functions that can do all this for me.
Is there a simple solution in R that will output predicted responses for any given (pooled) model object and any given set of new observations, without having to make cumbersome code modifications?
One way to do this is to stack all the imputed datasets together and fit the model on this stacked dataset. After that, you can use predict() as normal. The parameter estimates generated by pool() are the averages of the parameter estimates obtained by fitting the same model on each imputed dataset separately. Of course, in this case the standard error for each covariate is underestimated.
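As an alternative to stacking, predictions can be made from each per-imputation fit and then combined with Rubin's rules, which keeps the between-imputation variance in the standard error. This is a hedged sketch rather than a built-in mice feature: it assumes the fitted models are available in the analyses component of the mira object, and the names pooled_fit and pooled_se are my own.

```r
library(mice)

imp <- mice(nhanes, seed = 1, printFlag = FALSE)
model <- with(imp, glm(age == 3 ~ bmi + hyp + chl, family = binomial))
newdata <- data.frame(bmi = 20, hyp = 2, chl = 190)

# Predict on the link scale from each of the m fitted glm objects
preds <- lapply(model$analyses, predict, newdata = newdata,
                type = "link", se.fit = TRUE)
fits <- sapply(preds, function(p) p$fit)
ses <- sapply(preds, function(p) p$se.fit)

# Rubin's rules: total variance = within + (1 + 1/m) * between
m <- length(fits)
pooled_fit <- mean(fits)
within_var <- mean(ses^2)
between_var <- var(fits)
pooled_se <- sqrt(within_var + (1 + 1 / m) * between_var)

# Back-transform the pooled linear predictor to a probability
plogis(pooled_fit)
```

Because each fit is an ordinary glm object, splines and interactions in the formula are handled by predict() itself, so no manual design-matrix work is needed. With only 25 rows in nhanes, the individual fits may emit convergence warnings.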

Predict linearRidge with dummy variable

I am trying to do a ridge regression using the code below with the GenCont data from the ridge package:
library(ridge)
data(GenCont)
GenCont_df <- as.data.frame(GenCont)
GenCont_df$SNP1 <- as.factor(GenCont_df$SNP1)
mod2 <- linearRidge(Phenotypes ~ SNP1+SNP2, data = GenCont_df)
predict(mod2, GenCont_df, na.action = na.pass, all.coef = FALSE, scaling = "scale")
But if I use dummy variables in the model, I get this error:
Error in X[, ll] : subscript out of bounds
Is there a way to predict with dummy variables in ridge regression in R?
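One workaround that may be worth trying, though I have not verified it against linearRidge itself, is to expand the factor into numeric indicator columns before fitting, so that fitting and prediction both see plain numeric columns. The sketch below only shows the expansion step with base R's model.matrix(); the toy data frame is made up and is not GenCont.

```r
# Toy data (assumption: invented values, only mimicking the GenCont column names)
toy <- data.frame(
  Phenotypes = c(1.2, 0.7, 1.9, 1.1, 0.4, 1.6),
  SNP1 = factor(c(0, 1, 2, 0, 1, 2)),
  SNP2 = c(0, 0, 1, 1, 2, 2)
)

# Expand the factor into 0/1 indicator columns, dropping the intercept column
dummies <- model.matrix(~ SNP1, data = toy)[, -1, drop = FALSE]

# Rename explicitly: the default names (SNP11, SNP12) could clash with real SNP columns
colnames(dummies) <- paste0("SNP1_level", levels(toy$SNP1)[-1])

# All-numeric data frame that could be used for both fitting and prediction
toy_numeric <- cbind(toy[, c("Phenotypes", "SNP2")], dummies)
str(toy_numeric)
```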
