Predict linearRidge with dummy variable - r

I am trying to fit a ridge regression with the GenCont data from the ridge package, using the code below:
library(ridge)
data(GenCont)
GenCont_df <- as.data.frame(GenCont)
GenCont_df$SNP1 <- as.factor(GenCont_df$SNP1)
mod2 <- linearRidge(Phenotypes ~ SNP1+SNP2, data = GenCont_df)
predict(mod2, GenCont_df, na.action = na.pass, all.coef = FALSE, scaling = "scale")
But when I use a dummy variable (a factor) in the model, I get this error:
Error in X[, ll] : subscript out of bounds
Is there a way to predict with dummy variables in ridge regression in R?
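One workaround that may help (a sketch, not the package's documented solution) is to expand the factor into numeric dummy columns yourself with model.matrix() before fitting, so that linearRidge() and predict() only ever see numeric predictors:

library(ridge)
data(GenCont)
GenCont_df <- as.data.frame(GenCont)
GenCont_df$SNP1 <- as.factor(GenCont_df$SNP1)

# Expand the factor into numeric dummy columns (drop the intercept column)
X <- model.matrix(~ SNP1 + SNP2, data = GenCont_df)[, -1]
dat <- data.frame(Phenotypes = GenCont_df$Phenotypes, X)

mod2 <- linearRidge(Phenotypes ~ ., data = dat)
predict(mod2, dat, na.action = na.pass, all.coef = FALSE, scaling = "scale")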

Related

How to plot the roc of a glm model with multiple terms in R?

I have a glm model with multiple terms. I need to plot the ROC curve and find the AUC. I tried using roc() and multiclass.roc() but get Error in plot.new() : figure margins too large.
library(AER)
library(pROC)
data("Affairs")
str(Affairs)
Affairs$affairs <- as.factor(Affairs$affairs)
m3 <- glm(affairs ~ gender + age + yearsmarried + religiousness + rating,
          family = binomial, data = Affairs)
honk <- roc(affairs ~ gender + age + yearsmarried + religiousness + rating, data = Affairs)
plot(honk)
honk$auc
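One approach that may work (a sketch, assuming roc() comes from the pROC package and that a binary affair/no-affair outcome is what you are after; the column hadaffair below is a hypothetical name) is to dichotomise the response, fit the glm on that, and pass the observed response together with the fitted probabilities to roc(), rather than a formula listing all the model terms:

library(AER)
library(pROC)
data("Affairs")

# Hypothetical binary outcome: any affair vs none
Affairs$hadaffair <- factor(as.integer(Affairs$affairs > 0))

m3 <- glm(hadaffair ~ gender + age + yearsmarried + religiousness + rating,
          family = binomial, data = Affairs)

# roc() wants the observed response and the predicted probabilities
prob <- predict(m3, type = "response")
honk <- roc(Affairs$hadaffair, prob)
plot(honk)
honk$auc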

Error: $ operator not defined for this S4 class while running hoslem.test

I'm working on an optimization of a logistic regression model made with glm; the optimization is a lasso regression using glmnet. I want to compare both models using the output of a Hosmer-Lemeshow test, and I get the following output.
For the glm I get
> hl <- hoslem.test(trainingDatos$Exited, fitted(logit.Mod))
> hl
Hosmer and Lemeshow goodness of fit (GOF) test
data: trainingDatos$Exited, fitted(logit.Mod)
X-squared = 2.9161, df = 8, p-value = 0.9395
And when I try to run the test for the lasso regression I get
> hll <- hoslem.test(trainingDatos$Exited, fitted(lasso.model), g=10)
Error in cut.default(yhat, breaks = qq, include.lowest = TRUE) :
'x' must be numeric
I also tried to use the coefficients of the lasso regression to make it numeric, and I get:
> hll <- hoslem.test(trainingDatos$Exited, fitted(lasso.model$beta), g=10)
Error: $ operator not defined for this S4 class
But when I treat it as an S4 object:
> hll <- hoslem.test(trainingDatos$Exited, fitted(lasso.model@beta), g=10)
Error in fitted(lasso.model#beta) :
trying to get slot "beta" from an object (class "lognet") that is not an S4 object
Is there any way to run the test for my lasso regression?
Here is my full code for the lasso regression (I can't share the data set right now, sorry):
#Creation of Training Data Set
input_ones <- Datos[which(Datos$Exited == 1), ] #All 1s
input_zeros <- Datos[which(Datos$Exited == 0), ] #All 0s
set.seed(100)
#Training 1s
input_ones_training_rows <- sample(1:nrow(input_ones), 0.7*nrow(input_ones))
#Training 0s
input_zeros_training_rows <- sample(1:nrow(input_zeros), 0.7*nrow(input_ones))
training_ones <- input_ones[input_ones_training_rows, ]
training_zeros <- input_zeros[input_zeros_training_rows, ]
trainingDatos <- rbind(training_ones, training_zeros)
library(glmnet)
#Conversion of training data into matrix form
x <- model.matrix(Exited ~ CreditScore + Geography + Gender
+ Age + Tenure + Balance + IsActiveMember
+ EstimatedSalary, trainingDatos)[,-1]
#Defining numeric response variable
y <- trainingDatos$Exited
set.seed(100)
#Grid search to find best lambda
cv.lasso<-cv.glmnet(x, y, alpha = 1, family = "binomial")
#Creation of the model
lasso.model <- glmnet(x, y, alpha = 1, family = "binomial",
lambda = cv.lasso$lambda.1se)
coef(cv.lasso, cv.lasso$lambda.1se)
#Now trying to run the test
library(ResourceSelection)
set.seed(12657)
hll <- hoslem.test(trainingDatos$Exited, fitted(lasso.model), g=10)#numeric value error
hll <- hoslem.test(trainingDatos$Exited, fitted(lasso.model$beta), g=10)#$ not defined for S4
hll <- hoslem.test(trainingDatos$Exited, fitted(lasso.model@beta), g=10)# says "beta" is not an S4 slot
glmnet has its own predict() method for obtaining fitted values, and as noted above, the errors come from using fitted() instead. Running such tests can also be easier with the gofcat package, whose functions take supported model objects directly; your glm model, for instance, can be passed as hosmerlem(logit.Mod).
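A minimal sketch of that predict() route, assuming x and y are the training matrix and response built above: ask glmnet for fitted probabilities with predict(..., type = "response") and hand them to hoslem.test() as a plain numeric vector.

library(glmnet)
library(ResourceSelection)

# Fitted probabilities on the training data; drop() turns the one-column
# matrix returned by predict() into a numeric vector
phat <- drop(predict(lasso.model, newx = x, type = "response"))

hll <- hoslem.test(trainingDatos$Exited, phat, g = 10)
hll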

model frame default error variable lengths differ for logistic regression in R

I am new to R and I am trying to create a logit model. I created a train and test set for my data, but when I try to fit the model I keep getting the following error message:
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
variable lengths differ (found for 'creditScore')
What am I doing wrong/what can I do to fix this to run the model?
This is the code I used to create the test and train sets:
data <- subset(mortDefault2001,select=c(1,2,3,4,6))
train <- data[1:80000,]
test <- data[80001:99999,]
model <- glm(mortDefault2001$default ~.,family=binomial(link='logit'),data=train)
Error in model.frame.default(formula = mortDefault2001$default ~ ., data = train, :
variable lengths differ (found for 'creditScore')
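A likely cause (assuming default is one of the columns kept by subset()) is that mortDefault2001$default is taken from the full data set, so its length no longer matches the 80,000-row train data frame. A sketch of the fix is to name the column alone and let data = train supply it:

data <- subset(mortDefault2001, select = c(1, 2, 3, 4, 6))
train <- data[1:80000, ]
test <- data[80001:99999, ]

# Refer to the column by name only, so glm() looks it up inside 'train'
model <- glm(default ~ ., family = binomial(link = "logit"), data = train)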

Plot ROC in R for a GLM model

pred1 <- prediction(predictions=glm.prob2,labels = train_data)
error: Error in prediction(predictions = glm.prob2, labels = train_data) :
Number of predictions in each run must be equal to the number of labels for each run.
I have used a glm model to predict the output and am trying to produce pred1 as a variable for plotting the ROC curve.
Here is the full code
View(train_data)
library(ROCR)
pred1 <- prediction(predictions=glm.prob2,labels = train_data)
perf1<-performance(pred1,measure ="TP.rate",x.measure = "FP.rate")
plot(perf1)
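One reading of that error (a sketch, assuming the observed response lives in a column of train_data, hypothetically called outcome here, and that glm.prob2 holds the predicted probabilities for those same rows) is that labels must be the response vector itself, not the whole data frame:

library(ROCR)

# labels must be the observed 0/1 response, the same length as the predictions
pred1 <- prediction(predictions = glm.prob2, labels = train_data$outcome)

# ROCR spells these measures "tpr" and "fpr"
perf1 <- performance(pred1, measure = "tpr", x.measure = "fpr")
plot(perf1)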

Error when calculating prediction error for logistic regression model

I am getting the following error: $ operator is invalid for atomic vectors. It occurs when I try to calculate the prediction error for a logistic regression model.
Here is the code and data I am using:
install.packages("ElemStatLearn")
library(ElemStatLearn)
# training data
train = vowel.train
# only looking at the first two classes
train.new = train[1:3]
# test data
test = vowel.test
test.new = test[1:3]
# performing the logistic regression
train.new$y <- as.factor(train.new$y)
mylogit <- glm(y ~ ., data = train.new, family = "binomial")
train.logit.values <- predict(mylogit, newdata=test.new, type = "response")
# this is where the error occurs (below)
train.logit.values$se.fit
I tried to make it of type list, but that did not seem to work. Is there a quick fix so that I can obtain either the prediction error or the misclassification rate?
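A sketch of two possible quick fixes, assuming the response has been recoded to a binary 0/1 outcome: predict() with type = "response" returns a plain numeric vector, so $se.fit only exists if you ask for it, and the misclassification rate can be computed straight from the predicted probabilities.

# Ask predict() for standard errors: the result is then a list with $fit and $se.fit
pred <- predict(mylogit, newdata = test.new, type = "response", se.fit = TRUE)
pred$se.fit

# Misclassification rate at a 0.5 cutoff (assumes test.new$y is coded 0/1)
pred.class <- ifelse(pred$fit > 0.5, 1, 0)
mean(pred.class != test.new$y)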
