I am using caret's train() function inside my own function to train multiple models. Because caret cannot handle the quotation marks in the input, I tried removing the quotes with base R's noquote() function. I cannot strip the quotes beforehand, because other parts of my function need the input as a quoted string. Thanks in advance!
i <- "Dax.Shepard"
celeg_lgr = train(noquote(i) ~ ., method = "glm",
family = binomial(link = "logit"), data = celeb_trn,
trControl = trainControl(method = 'cv', number = 5))
Running this code results in the following error:
Error in model.frame.default(form = op ~ ., data = celeb_trn, na.action = na.fail) :
variable lengths differ (found for 'Dax.Shepard')
Running the code like this does not result in any error:
celeg_lgr = train(Dax.Shepard ~ ., method = "glm",
family = binomial(link = "logit"), data = celeb_trn,
trControl = trainControl(method = 'cv', number = 5))
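One common way around this, sketched here on the assumption that the column name arrives as a character string, is to build the formula from the string with as.formula() instead of using noquote(). noquote() only changes how a string is printed, so the formula still sees the length-one variable i rather than the column it names. The example uses base R's glm() and the built-in mtcars data (with am standing in for Dax.Shepard) so that it runs without caret; the same formula object can be passed as the first argument to train().

```r
# Assumed stand-ins: "am" plays the role of "Dax.Shepard",
# and a subset of mtcars plays the role of celeb_trn.
i <- "am"
dat <- mtcars[, c("am", "wt", "hp")]

# Build the formula from the string; i itself stays a quoted string
f <- as.formula(paste(i, "~ ."))

# The formula object works anywhere a literal formula would
fit <- glm(f, family = binomial(link = "logit"), data = dat)
print(f)
```

Because i is left untouched, the rest of the function can keep using it as an ordinary character string.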
loocv <- glm(data_set, tree)$delta[1]
I haven't received a result; instead an error pops up: Error in glm(data_set, tree) : 'family' not recognized
stop("'family' not recognized")
glm(data_set, tree)
So the syntax for glm is as follows:
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart, offset,
control = list(…), model = TRUE, method = "glm.fit",
x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, …)
Note that the second argument is family, so in glm(data_set, tree) the object tree is matched to family. tree is not an acceptable family; the allowed values are:
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
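To make the argument order concrete, here is a minimal sketch that uses the built-in mtcars data rather than the asker's data_set: the formula comes first, and the family is passed by name so nothing else can be matched to it by position.

```r
# Formula first, family by name; mtcars is an assumed stand-in data set
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# Inspect which family and link were actually used
family(fit)$family
family(fit)$link
```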
Were you trying to fit a tree-based model such as CART? If so, I recommend using a package like tree or rpart to fit it to your data.
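If a classification tree is what was intended, a minimal sketch with rpart could look like the following. The iris data and the Species response are assumptions made only for illustration, since the question does not show what data_set contains.

```r
library(rpart)

# Fit a CART-style classification tree; iris is an assumed example data set
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# Predicted classes on the training data, and the share predicted correctly
pred <- predict(tree_fit, newdata = iris, type = "class")
mean(pred == iris$Species)
```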
I have a data frame that consists of numerical and non-numerical variables. I am trying to fit a logistic regression model predicting my variable "risk" from all other variables, optimizing AUC using 6-fold cross-validation.
However, I want to center and scale all numerical explanatory variables. My code raises no errors or warnings, but I cannot figure out how to tell train(), through preProcess or in some other way, to center and scale only my numerical variables.
Here is the code:
test <- train(risk ~ .,
method = "glm",
data = df,
family = binomial(link = "logit"),
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv",
number = 6,
classProbs = TRUE,
summaryFunction = prSummary),
metric = "AUC")
You could preprocess all numerical variables in the original df first and then apply train() to the scaled df:
library(dplyr)  # provides %>% and mutate_if()
df <- df %>%
  dplyr::mutate_if(is.numeric, scale)
test <- train(risk ~ .,
method = "glm",
data = df,
family = binomial(link = "logit"),
trControl = trainControl(method = "cv",
number = 6,
classProbs = TRUE,
summaryFunction = prSummary),
metric = "AUC")
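The same idea can be sketched in base R. The small data frame below is made up for illustration (its column names are assumptions, not the asker's data). One detail worth knowing is that scale() returns a one-column matrix, so the sketch wraps it in as.numeric() to keep plain numeric columns.

```r
# Hypothetical data: one factor outcome, one numeric and one character predictor
df <- data.frame(
  risk  = factor(c("yes", "no", "yes", "no")),
  age   = c(20, 30, 40, 50),
  group = c("a", "b", "a", "b")
)

# Find the numeric columns and center/scale only those
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(scale(x)))

str(df)
```

The factor and character columns are left untouched, so the result can be passed to train() without the preProcess argument.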
I am using the following code to fit and test a random forest classification model:
control <- trainControl(method = 'repeatedcv',
                        number = 5, repeats = 3,
                        search = 'grid')
tunegrid <- expand.grid(.mtry = (1:12))
rf_gridsearch <- train(y = river$stat_bino,
                       x = river[, colnames(river) != "stat_bino"],
                       data = river,
                       method = 'rf',
                       metric = 'Accuracy',
                       ntree = 600,
                       importance = TRUE,
                       tuneGrid = tunegrid, trControl = control)
Note that I am using
train(y = river$stat_bino, x = river[, colnames(river) != "stat_bino"], ...)
rather than train(stat_bino ~ ., ...)
so that my categorical variables will not be turned into dummy variables
(solution here: variable encoding in K-fold validation of random forest using package 'caret').
I would like to extract the finalModel and use it to make partial dependence plots for my variables (using the code below), but I get an error message and don't know how to fix it.
model1 <- rf_gridsearch$finalModel
library(pdp)
partial(model1, pred.var = "MAXCL", type = "classification", which.class = "1", plot = TRUE)
Error in eval(stats::getCall(object)$data) :
..1 used in an incorrect context, no ... to look in
Thanks for any solutions here!
I've been using recipes to pipe into caret::train, which has been going well. Now that I've tried some step_* transformations, though, I'm getting this error:
Error in resamples.default(model_list) :
There are different numbers of resamples in each model
when I compare models with and without the transformations. The same code with step_center and step_scale works fine.
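library(caret)
library(recipes)
library(ggplot2)  # for the diamonds data set
formula <- price ~ carat
model_recipe <- recipe(formula, data = diamonds)
quadratic_model_recipe <- recipe(formula, data = diamonds) %>%
  ...  # the transformation step is missing from the original post
model_list <- list(
  linear_model = NULL,
  quadratic = NULL
)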
model_list$linear_model <-
model_recipe %>% train(
data = diamonds,
method = "lm",
trControl = trainControl(method = "cv"))
model_list$quadratic_model <-
quadratic_model_recipe %>% train(
data = diamonds,
method = "lm",
trControl = trainControl(method = "cv"))
resamp <- resamples(model_list)
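quadratic = NULL should have been quadratic_model = NULL. As written, model_list ends up with three elements (linear_model, quadratic and quadratic_model), and the leftover quadratic entry is still NULL, which is presumably why resamples() reports different numbers of resamples.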
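The effect of the mismatched name can be reproduced in base R without fitting anything. In this sketch the strings "fit 1" and "fit 2" are placeholders standing in for the fitted train objects.

```r
# Pre-declared names, one of which does not match the later assignment
model_list <- list(linear_model = NULL, quadratic = NULL)
model_list$linear_model <- "fit 1"     # fills the existing slot
model_list$quadratic_model <- "fit 2"  # creates a new, third slot

names(model_list)           # the stale "quadratic" entry is still there
is.null(model_list[["quadratic"]])

# Starting from an empty list avoids the problem entirely
clean_list <- list()
clean_list$linear_model <- "fit 1"
clean_list$quadratic_model <- "fit 2"
names(clean_list)
```

Starting from list() means there are no names to keep in sync, so a stale NULL entry cannot sneak into the comparison.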
I'm using the caret package to fit and cross-validate a model:
model <- caret::train(mpg ~ wt
+ drat
+ disp
+ qsec
+ as.factor(am),
data = mtcars,
method = "lm",
trControl = caret::trainControl(method = "cv",
returnData =FALSE))
However, I'd like to pass trainControl a custom set of indices defining my folds. This can be done via index and indexOut:
model <- caret::train(wt ~ + disp + drat,
data = mtcars,
method = "lm",
trControl = caret::trainControl(method = "cv",
returnData =FALSE,
index = indicies$train,
indexOut = indicies$test))
What I'm struggling with is that I only want to test on rows of mtcars where mtcars$am == 0. createFolds won't work on its own because it has no way to add a criterion. Does anyone know of any other functions that split rows into K folds while applying a criterion such as mtcars$am == 0 when creating indicies$test?
I think this should work. Just feed the index with the desired row index.
index = list(which(mtcars$am == 0))
model <- caret::train(
  wt ~ +disp + drat,
  data = mtcars,
  method = "lm",
  trControl = caret::trainControl(
    method = "cv",
    returnData = FALSE,
    index = index
  )
)
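The index argument is a list, so you can supply as many resampling iterations as you want by adding more elements to it. Note that each element of index holds the rows used for training in that iteration; the rows held out for testing go in indexOut, which defaults to the rows not in index.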
Thanks for your help. I got there in the end by modifying the output from createFolds. mtcars is not the best example because it's such a small dataset, but you get the idea:
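# folds is the output of createFolds(); the call below is an assumed example
folds <- caret::createFolds(mtcars$wt, k = 5)
indicies <- list()
#Create training folds
indicies$train <- lapply(folds, function(x) which(!1:nrow(mtcars) %in% x))
#Create test folds based on the output "folds", with the criterion added
indicies$test <- lapply(folds, function(x) which(1:nrow(mtcars) %in% x & mtcars[, "am"] == 1))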
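For reference, the same construction can be sketched in base R without createFolds. The number of folds (4), the seed, and the names fold_id, train_rows and test_rows are assumptions made for this illustration, and the criterion here is am == 0 as in the original question.

```r
set.seed(1)
n <- nrow(mtcars)

# Randomly assign each row to one of 4 folds (fold count is an assumption)
fold_id <- sample(rep(1:4, length.out = n))
folds <- split(seq_len(n), fold_id)

# Training rows: everything outside the fold
train_rows <- lapply(folds, function(x) setdiff(seq_len(n), x))

# Test rows: rows inside the fold that also satisfy the criterion am == 0
test_rows <- lapply(folds, function(x) x[mtcars$am[x] == 0])

lengths(test_rows)
```

The two lists can then be passed to trainControl(index = train_rows, indexOut = test_rows). With a dataset this small, it is worth checking that no element of test_rows is empty before training.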