Caret's train() function quotation marks problem in R

I am using caret's train() function inside my own function to train multiple models. Because train() cannot handle a quoted character string as the left-hand side of the formula, I tried removing the quotes with base R's noquote() function. Because other parts of my function need the input with quotation marks, I cannot strip the quotes from the input values beforehand. Thanks in advance!
Code:
i <- "Dax.Shepard"
celeg_lgr = train(noquote(i) ~ ., method = "glm",
family = binomial(link = "logit"), data = celeb_trn,
trControl = trainControl(method = 'cv', number = 5))
Running this code results in the following error:
Error in model.frame.default(form = op ~ ., data = celeb_trn, na.action = na.fail) :
variable lengths differ (found for 'Dax.Shepard')
PS.
Running the code like this does not result in any error:
celeg_lgr = train(Dax.Shepard ~ ., method = "glm",
family = binomial(link = "logit"), data = celeb_trn,
trControl = trainControl(method = 'cv', number = 5))
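One likely fix (an assumption, not from the thread): noquote() only changes how the string prints, not its underlying character value, so the formula still sees a string rather than a variable name. Building a real formula object from the string with base R's reformulate() sidesteps this; a minimal sketch using the question's celeb_trn data:

```r
# noquote() only affects printing; build an actual formula object instead.
i <- "Dax.Shepard"
f <- reformulate(".", response = i)   # equivalent to Dax.Shepard ~ .

celeg_lgr <- train(f, method = "glm",
                   family = binomial(link = "logit"), data = celeb_trn,
                   trControl = trainControl(method = "cv", number = 5))
```

as.formula(paste(i, "~ .")) would work the same way; either approach produces a genuine formula object that train() can evaluate against the data.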

Related

How to construct a cross-validation plot using LOOCV

loocv <- glm(data_set, tree)$delta[1]
I haven't received a result; an error pops up:
Error in glm(data_set, tree) : 'family' not recognized
The traceback is:
2. stop("'family' not recognized")
1. glm(data_set, tree)
So the syntax for glm is as follows:
glm(formula, family = gaussian, data, weights, subset,
    na.action, start = NULL, etastart, mustart, offset,
    control = list(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, ...)
Note that the second argument to this is family; tree is not an acceptable family. The allowed values are:
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
Were you trying to fit a tree-based model like a CART? If so, I recommend using a package like tree or rpart to fit it to your data.
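As a hedged sketch of that suggestion (the asker's data and response column are unknown, so `data_set` and `outcome` below are placeholder names), fitting a CART with rpart would look like:

```r
library(rpart)

# 'data_set' and 'outcome' are placeholders for the asker's data frame
# and response column.
fit <- rpart(outcome ~ ., data = data_set, method = "class")

printcp(fit)          # cross-validated error by complexity parameter
plot(fit); text(fit)  # draw the fitted tree
```

rpart does its own internal cross-validation (reported by printcp()), which may cover what the asker was trying to get from cv.glm's delta.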

preprocessing (center and scale) only specific variables (numeric variables)

I have a dataframe that consists of numerical and non-numerical variables. I am trying to fit a logistic regression model predicting my variable "risk" from all other variables, optimizing AUC with 6-fold cross-validation.
However, I want to center and scale only the numerical explanatory variables. My code raises no errors or warnings, but I can't figure out how to tell train(), through preProcess (or some other way), to center and scale just the numeric variables.
Here is the code:
test <- train(risk ~ .,
method = "glm",
data = df,
family = binomial(link = "logit"),
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv",
number = 6,
classProbs = TRUE,
summaryFunction = prSummary),
metric = "AUC")
You could preprocess all numerical variables in the original df first and then apply train() to the scaled data frame:
library(dplyr)
library(caret)
df <- df %>%
dplyr::mutate_if(is.numeric, scale)
test <- train(risk ~ .,
method = "glm",
data = df,
family = binomial(link = "logit"),
trControl = trainControl(method = "cv",
number = 6,
classProbs = TRUE,
summaryFunction = prSummary),
metric = "AUC")
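Another option (not from the thread) is to wrap the preprocessing in a recipe, which caret::train() accepts in place of a formula; step_center()/step_scale() restricted to numeric predictors leave factors untouched. A sketch, assuming a recipes version that provides all_numeric_predictors():

```r
library(caret)
library(recipes)

# Center and scale only numeric predictors; factor columns pass through.
rec <- recipe(risk ~ ., data = df) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors())

test <- train(rec, data = df, method = "glm",
              family = binomial(link = "logit"),
              trControl = trainControl(method = "cv", number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")
```

Unlike mutate_if() on the whole data frame, the recipe's centering/scaling statistics are re-estimated inside each resample, which avoids leaking information from the held-out fold.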

Error building partial dependence plots for RF using FinalModel output from caret's train() function

I am using the following code to fit and test a random forest classification model:
control <- trainControl(method = 'repeatedcv',
                        number = 5, repeats = 3,
                        search = 'grid')
tunegrid <- expand.grid(.mtry = (1:12))
rf_gridsearch <- train(y = river$stat_bino,
                       x = river[, colnames(river) != "stat_bino"],
                       data = river,
                       method = 'rf',
                       metric = 'Accuracy',
                       ntree = 600,
                       importance = TRUE,
                       tuneGrid = tunegrid, trControl = control)
Note, I am using
train(y = river$stat_bino, x = river[, colnames(river) != "stat_bino"], ...
rather than train(stat_bino ~ ., ...) so that my categorical variables are not turned into dummy variables
(solution here: variable encoding in K-fold validation of random forest using package 'caret').
I would like to extract the FinalModel and use it to make partial dependence plots for my variables (using code below), but I get an error message and don't know how to fix it.
model1 <- rf_gridsearch$finalModel
library(pdp)
partial(model1, pred.var = "MAXCL", type = "classification", which.class = "1", plot = TRUE)
Error in eval(stats::getCall(object)$data) :
  ..1 used in an incorrect context, no ... to look in
Thanks for any solutions here!
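One likely fix (an assumption, not confirmed in the thread): partial() tries to re-evaluate the training data from the call stored inside the model object, and that lookup fails for a finalModel extracted from caret. Supplying the predictors explicitly through partial()'s train argument sidesteps the lookup:

```r
library(pdp)

# Pass the predictor data explicitly so partial() does not try to
# reconstruct it from the call stored inside finalModel.
preds <- river[, colnames(river) != "stat_bino"]
partial(rf_gridsearch$finalModel, pred.var = "MAXCL",
        type = "classification", which.class = "1",
        train = preds, plot = TRUE)
```

pdp also supports caret train objects directly, so passing rf_gridsearch itself instead of rf_gridsearch$finalModel may work without the extra argument.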

Using caret with recipes is leading to difficulties with resample

I've been piping recipes into caret::train(), which has been going well, but now that I've tried some step transformations I'm getting the error:
Error in resamples.default(model_list) :
There are different numbers of resamples in each model
when I compare models with and without the transformations. The same code with step_center and step_scale works fine.
library(caret)
library(tidyverse)
library(tidymodels)
formula <- price ~ carat
model_recipe <- recipe(formula, data = diamonds)
quadratic_model_recipe <- recipe(formula, data = diamonds) %>%
step_poly(all_predictors())
model_list <- list(
linear_model = NULL,
quadratic = NULL
)
model_list$linear_model <-
model_recipe %>% train(
data = diamonds,
method = "lm",
trControl = trainControl(method = "cv"))
model_list$quadratic_model <-
quadratic_model_recipe %>% train(
data = diamonds,
method = "lm",
trControl = trainControl(method = "cv"))
resamp <- resamples(model_list)
quadratic = NULL should have been quadratic_model = NULL. As written, the trained model is assigned to model_list$quadratic_model, which adds a third element to the list and leaves the original quadratic element as NULL, so resamples() sees a NULL entry and reports different numbers of resamples.
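The corrected list definition, with names matching the assignments made later:

```r
# Element names must match the names used when storing the fits
# (model_list$linear_model and model_list$quadratic_model).
model_list <- list(
  linear_model    = NULL,
  quadratic_model = NULL
)
```

With the names aligned, both assignments overwrite an existing element instead of appending a new one, and resamples(model_list) receives exactly two fitted models.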

Creating Custom Folds For Caret CV

I'm using the caret package to fit and cross-validate a model:
model <- caret::train(mpg ~ wt
+ drat
+ disp
+ qsec
+ as.factor(am),
data = mtcars,
method = "lm",
trControl = caret::trainControl(method = "cv",
repeats=5,
returnData =FALSE))
However, I'd like to pass trainControl a custom set of indices defining my folds. This can be done via the index and indexOut arguments.
model <- caret::train(wt ~ + disp + drat,
data = mtcars,
method = "lm",
trControl = caret::trainControl(method = "cv",
returnData =FALSE,
index = indicies$train,
indexOut = indicies$test))
What I'm struggling with is that I only want to test on rows of mtcars where mtcars$am == 0. Thus createFolds() alone won't work, because you can't add a criterion. Does anyone know of another way to index rows into k folds while restricting indicies$test to rows where mtcars$am == 0?
I think this should work. Just feed index a list containing the desired row indices.
index = list(which(mtcars$am == 0))
model <- caret::train(
wt ~ +disp + drat,
data = mtcars,
method = "lm",
trControl = caret::trainControl(
method = "cv",
returnData = FALSE,
index = index
)
)
The index argument is a list, so you can supply as many resampling iterations as you want by adding more elements to that list.
Thanks for your help. I got there in the end by modifying the output from createFolds. mtcars isn't the best example because it's such a small dataset, but you get the idea:
folds<-caret::createFolds(mtcars,k=2)
indicies<-list()
#Create training folds
indicies$train<-lapply(folds,function(x) which(!1:nrow(mtcars) %in% x))
#Create test folds based output "folds" and with criterion added
indicies$test<-lapply(folds,function(x) which(1:nrow(mtcars) %in% x & mtcars[,"am"]==0))
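A self-contained sketch of the same idea, for reference: folds come from createFolds() on the row indices, training sets are the fold complements, and test sets are fold members filtered to the am == 0 criterion.

```r
library(caret)

n <- nrow(mtcars)
folds <- createFolds(seq_len(n), k = 2)

indicies <- list()
# Training rows: everything outside the fold.
indicies$train <- lapply(folds, function(x) setdiff(seq_len(n), x))
# Test rows: fold members that also satisfy the criterion am == 0.
indicies$test  <- lapply(folds, function(x) intersect(x, which(mtcars$am == 0)))

model <- train(wt ~ disp + drat, data = mtcars, method = "lm",
               trControl = trainControl(method = "cv",
                                        index = indicies$train,
                                        indexOut = indicies$test))
```

Because index and indexOut are supplied explicitly, trainControl's method and number are effectively overridden by the custom folds.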
