R function for finding the sensitivity given an alpha value

I am new to data analysis with R so any help is appreciated.
I have a dataset with some explanatory variables and one target variable. The target variable is either Yes or No only. So I would like to use logistic regression for model fitting.
This is how I plot a ROC curve:
myModel <- train(
  myTarget ~ .,
  myTrainData,
  method = "glm",
  metric = "ROC",
  trControl = myControl,
  na.action = na.pass
)
myPred <- predict(myModel, newdata = myTestData, type = "prob")
eval <- evalm(data.frame(myPred, myTestData$myTarget))  # evalm() is from the MLeval package
eval$roc
Now I would like to find the sensitivity for a given alpha value (Type I error rate), and show the information like the following. How can I achieve it?
confusionMatrix(myPred, reference = myTestData$myTarget)
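For a given Type I error rate alpha, the sensitivity can be read off the ROC curve at specificity = 1 - alpha. A minimal sketch using pROC's coords() on simulated data (the roc object here is a stand-in for one built from myPred and myTestData$myTarget):

```r
library(pROC)

set.seed(1)
truth <- rbinom(300, 1, 0.5)             # toy Yes/No outcome
prob  <- plogis(2 * truth + rnorm(300))  # toy predicted probabilities
roc_obj <- roc(truth, prob)

alpha <- 0.05  # desired Type I error rate
# sensitivity (and the threshold achieving it) at specificity = 1 - alpha
coords(roc_obj, x = 1 - alpha, input = "specificity",
       ret = c("threshold", "specificity", "sensitivity"))
```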

Related

Automate variable selection based on varimp in R

In R, I have a logistic regression model as follows
train_control <- trainControl(method = "cv", number = 3)
logit_Model <- train(result ~ ., data = df,
                     trControl = train_control,
                     method = "glm",
                     family = binomial(link = "logit"))
calculatedVarImp <- varImp(logit_Model, scale = FALSE)
I use multiple datasets that run through the same code, so the variable importance changes for each dataset. Is there a way to get the names of the variables whose overall importance is less than n (e.g. 1), so I can automate removing those variables and rerunning the model?
I was unable to get the information from the calculatedVarImp variable by subsetting the Overall value:
lowVarImp <- subset(calculatedVarImp, importance$Overall < 1)
Also, is there a better way of doing variable selection?
Thanks in advance
You're using the caret package. Not sure if you're aware of this, but caret has a method for stepwise logistic regression using the Akaike Information Criterion: glmStepAIC.
It iteratively adds or drops single predictors and stops when no step further lowers the AIC.
train_control <- trainControl(method = "cv", number = 3)
logit_Model <- train(y ~ ., data = train_data,
                     trControl = train_control,
                     method = "glmStepAIC",
                     family = binomial(link = "logit"),
                     na.action = na.omit)
logit_Model$finalModel
This is about as automated as it gets, but it may be worth reading up on the downsides of stepwise selection before relying on it.
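As for the subsetting part of the question: varImp() returns an object whose importance element is a data frame, with the predictor names as row names and the scores in the Overall column, so the low-importance names can be pulled out directly. A sketch on simulated data (column names are made up):

```r
library(caret)

set.seed(42)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df$result <- factor(ifelse(df$x1 + rnorm(100) > 0, "yes", "no"))

logit_Model <- train(result ~ ., data = df, method = "glm",
                     family = binomial(link = "logit"),
                     trControl = trainControl(method = "cv", number = 3))
vi <- varImp(logit_Model, scale = FALSE)

# rows of vi$importance are the predictors; keep the names scoring below the cutoff
low_vars <- rownames(vi$importance)[vi$importance$Overall < 1]
df_reduced <- df[, !(names(df) %in% low_vars)]
```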

How to pass a glm control argument for earth using caret (maxit)

When fitting earth with a glm model, one can pass arguments to the glm call. For example:
mars_fit <- earth(formula = response ~ x1 + x2, data = sim_dat,
                  glm = list(family = binomial, control = list(maxit = 50)))
The caret equivalent looks like this:
fit_control <- trainControl(method = "cv", number = 10)
mars_grid <- expand.grid(degree = 1:2, nprune = 2:10)
mars_fit <- train(factor(response) ~ x1 + x2, method = "earth",
                  trControl = fit_control, data = sim_dat,
                  tuneGrid = mars_grid,
                  glm = list(control = list(maxit = 50)))
but the glm list is not passed. Any advice?
Edit 1:
Reading https://github.com/topepo/caret/issues/554, caret's author says the argument is either caught in the ... (dots) argument or should be passed in the tuning grid. When passed through the tuning grid, since glm is a list, train() complains that degree and nprune do not belong to the method, which is not true.
Edit 2:
Opened https://github.com/topepo/caret/issues/1018
The issue was solved in this commit:
https://github.com/topepo/caret/commit/2ce2cf4c5889791b7dbca5d8896fcc6dc0d0bcfc

Stepwise Regression with ROC

I am learning data science with R on DataCamp. In one exercise, I have to build a stepwise regression model. Even though I create the stepwise model successfully, the roc() function doesn't accept the response and gives this error: "'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead"
I want to learn how to handle this problem, so I posted my code below.
# Specify a null model with no predictors
null_model <- glm(donated ~ 1, data = donors, family = "binomial")
# Specify the full model using all of the potential predictors
full_model <- glm(donated ~ ., data = donors, family = "binomial")
# Use a forward stepwise algorithm to build a parsimonious model
step_model <- step(null_model, scope = list(lower = null_model, upper = full_model), direction = "forward")
# Estimate the stepwise donation probability
step_prob <- predict(step_model, type = "response")
# Plot the ROC of the stepwise model
library(pROC)
ROC <- roc( step_prob, donors$donated)
plot(ROC, col = "red")
auc(ROC)
I changed the order of the roc() function's arguments, and the error was solved; roc() expects the response first and the predictor second.
library(pROC)
ROC <- roc(donors$donated, step_prob)
plot(ROC, col = "red")
auc(ROC)
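A self-contained sketch of the argument order (toy data standing in for the DataCamp donors set); naming the arguments makes the intent explicit and avoids the mix-up entirely:

```r
library(pROC)

set.seed(1)
donated   <- rbinom(200, 1, 0.4)           # toy binary response
step_prob <- plogis(donated + rnorm(200))  # toy fitted probabilities

# response first, predictor second; naming them removes any ambiguity
ROC <- roc(response = donated, predictor = step_prob)
auc(ROC)
```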

R - how to set a specific number of PCA components to train a prediction model

Using train() and preProcess() I want to build a predictive model using PCA with the first 7 principal components as my predictors.
The code below works, but I'm not able to specify the number of PCs:
predModel2 <- train(diagnosis~., data=training2, method = "glm", preProcess = "pca")
I've tried this to specify the number of PCs but I don't know how to incorporate it into train():
training_pre <- preProcess(training[, ILcols], method = c("center", "scale", "pca"), pcaComp = 7)
I've tried using:
predModel2 <- train(diagnosis~., data=training2, method = "glm", preProcess = "pca", pcaComp=7)
Error in train.default(x, y, weights = w, ...) : Stopping
UPDATE:
It seems I can get around this by using predict() first:
training2_pca <- predict(training_pre, training2)
train(diagnosis ~ ., data = training2_pca, method = "glm")
All preprocessing should be done within the training folds or, in this case, resamples. That prevents 'data leaks', so the first of the above approaches should be preferred.
The pcaComp argument goes into trainControl() via its preProcOptions argument. Using the iris data, KNN, and the first two principal components as an example:
predModel2 <- train(Species ~ ., data = iris, method = "knn", preProcess = "pca",
                    trControl = trainControl(preProcOptions = list(pcaComp = 2)))
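To confirm how many components were actually retained, the fitted train object stores the preProcess result, whose rotation matrix has one column per kept PC. The element names here come from caret's internals, so treat this as an assumption worth verifying on your caret version:

```r
library(caret)

predModel2 <- train(Species ~ ., data = iris, method = "knn", preProcess = "pca",
                    trControl = trainControl(preProcOptions = list(pcaComp = 2)))

# one column per retained principal component
ncol(predModel2$preProcess$rotation)
```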

R Formula Interface Excluded Variables Still Referenced With predict()?

I'm using caret to train a gbm model in R. I've used the formula interface to exclude certain variables from my model:
gbmTune <- train(Outcome ~ . - VarA - VarB - VarC, data = train,
                 method = "gbm",
                 metric = "ROC",
                 tuneGrid = gbmGrid,
                 trControl = cvCtrl,
                 verbose = FALSE)
When I try to use predict() against my test set, R complains about new factor levels for a variable I asked to be excluded. The only solution I've been able to come up with is to set those variables to NULL before training my model, i.e. remove them entirely. That doesn't seem like the right answer.
I'm fairly new at this, so I would love to know what I'm doing wrong!
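One explanation: with the formula interface, Outcome ~ . - VarA removes VarA from the model terms, but the variable stays in the underlying model frame, so predict() still checks its factor levels on new data. A common workaround is to drop the columns before calling train(), sketched here with made-up data and glm standing in for gbm to keep it light:

```r
library(caret)

set.seed(7)
train_df <- data.frame(Outcome = factor(sample(c("yes", "no"), 100, TRUE)),
                       VarA = rnorm(100), VarB = rnorm(100),
                       VarC = rnorm(100), VarD = rnorm(100))

excluded  <- c("VarA", "VarB", "VarC")
train_sub <- train_df[, setdiff(names(train_df), excluded)]

# the model never sees the excluded columns, so predict() won't ask for them
model <- train(Outcome ~ ., data = train_sub, method = "glm",
               trControl = trainControl(method = "cv", number = 3))
```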
