Error: The tuning parameter grid should have columns parameter - r

I am trying to run a 10-fold cross-validated lasso regression in R, but when I pass tuneGrid to train(), it shows this error and I don't know how to fix it. Here is my code:
ctrlspecs<-trainControl(method="cv",number=10, savePredictions="all", classProb=TRUE)
lambdas<-c(seq(0,2,length=3))
foldlasso<-train(y1~x1,data=train_dat, method="glm", mtryGrid=expand.grid(alpha=1,lambda=lambdas),
trControl=ctrlspecs,tuneGrid=expand.grid(.alpha=1,.lambda=lambdas),na.action=na.omit)

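Clean up your code! Also, a lasso needs method = "glmnet" (method = "glm" has nothing to tune), and the grid goes in tuneGrid, not mtryGrid:
ctrlspecs <- trainControl(
  method = "cv",
  number = 10,
  savePredictions = "all",
  classProbs = TRUE
)
lambdas <- seq(0, 2, length.out = 3)
foldlasso <- train(
  y1 ~ x1,  # note: glmnet needs at least two predictor columns
  data = train_dat,
  method = "glmnet",  # requires the glmnet package
  tuneGrid = expand.grid(alpha = 1, lambda = lambdas),  # alpha = 1 is the lasso
  trControl = ctrlspecs,
  na.action = na.omit
)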
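The error message names the columns caret expects for the chosen method. As a rough sketch (assuming the caret package is installed), modelLookup() lists the tunable parameters per method, which shows why a grid with alpha and lambda is rejected for "glm" but would match "glmnet":

```r
library(caret)

# Tunable parameters that caret lists for each method
glm_params    <- as.character(modelLookup("glm")$parameter)
glmnet_params <- as.character(modelLookup("glmnet")$parameter)

print(glm_params)     # a single placeholder entry: glm has nothing to tune
print(glmnet_params)

# A lasso grid: alpha fixed at 1, lambda varied
lasso_grid <- expand.grid(alpha = 1, lambda = seq(0, 2, length.out = 3))

# The grid's column names must match the method's parameter names
grid_ok_for_glm    <- setequal(names(lasso_grid), glm_params)
grid_ok_for_glmnet <- setequal(names(lasso_grid), glmnet_params)
```

Running modelLookup() on the method name before building a grid is a cheap way to catch this kind of mismatch.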

Related

How to adjust ggplot axis in R package "caret" for resample class?

I have two models trained with the R package caret, and I'd like to compare their performance. The resamples object works with ggplot, but an error occurs when I try to adjust the x-axis: Error: Discrete value supplied to continuous scale. Thanks for any help.
library(caret)
data("mtcars")
mydata = mtcars[, -c(8,9)]
set.seed(100)
model_rf <- train(
  hp ~ .,
  data = mydata,
  tuneLength = 5,
  method = "ranger",
  metric = "RMSE",
  preProcess = c('center', 'scale'),
  trControl = trainControl(
    method = "repeatedcv",
    number = 5,
    repeats = 5,
    verboseIter = TRUE,
    savePredictions = "final"
  )
)
model_rp <- train(
  hp ~ .,
  data = mydata,
  method = "rpart",
  metric = "RMSE",
  preProcess = c('center', 'scale'),
  trControl = trainControl(
    method = "repeatedcv",
    number = 5,
    repeats = 5,
    verboseIter = TRUE,
    savePredictions = "final"
  )
)
Resamples <- resamples(list("RF" = model_rf, "RP" = model_rp))
ggplot(Resamples, metric = "RMSE")
ggplot(Resamples, metric = "RMSE") + scale_x_continuous(limits = c(0,60), breaks = seq(0,60,10))
## Error: Discrete value supplied to continuous scale
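If you change scale_x_continuous to scale_y_continuous, the error goes away. The plot appears to be drawn with flipped coordinates, so the metric shown along the horizontal axis is actually on the y scale:
ggplot(Resamples, metric = "RMSE") +
  scale_y_continuous(limits = c(0,60), breaks = seq(0,60,10))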
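A toy illustration of this behaviour with plain ggplot2, assuming (not confirmed here) that caret's plot puts the model names on x and flips the coordinates. The toy data frame is made up:

```r
library(ggplot2)

# Made-up resampling summary: one row per model
toy <- data.frame(Model = c("RF", "RP"), RMSE = c(25, 40))

# Model names on x (discrete), metric on y (continuous), then flip
base_plot <- ggplot(toy, aes(x = Model, y = RMSE)) +
  geom_point() +
  coord_flip()

# Works: the metric lives on the y scale even though it is drawn horizontally
good_plot <- base_plot +
  scale_y_continuous(limits = c(0, 60), breaks = seq(0, 60, 10))
good_built <- ggplot_build(good_plot)

# Fails at build time: x holds the discrete model names
bad_plot <- base_plot +
  scale_x_continuous(limits = c(0, 60), breaks = seq(0, 60, 10))
bad_result <- tryCatch(
  ggplot_build(bad_plot),
  error = function(e) conditionMessage(e)
)
print(bad_result)
```

With coord_flip(), scales keep their original names, so the axis you see at the bottom is still controlled by scale_y_*.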

Error: The tuning parameter grid should have columns mtry

I'm trying to train a random forest model using caret in R. I want to tune the parameters to get the best values, using the expand.grid function. However, I keep getting this error:
Error: The tuning parameter grid should have columns mtry
This is my code. The data I use here is called scoresWithResponse:
ctrlCV = trainControl(method = 'cv', number = 10 , classProbs = TRUE , savePredictions = TRUE, summaryFunction = twoClassSummary )
rfGRID = expand.grid(interaction.depth = c(2, 3, 5, 6, 7, 8, 10),
n.trees = c(50,75,100,125,150,200,250),
shrinkage = seq(.005, .2,.005),
n.minobsinnode = c(5,7,10, 12 ,15, 20),
nodesize = c(1:10),
mtry = c(1:10))
RF_loop_trn = c()
RF_loop_tst = c()
for(i in (1:5)){
  print(i)
  IND = createDataPartition(y = scoresWithResponse$response, p=0.75, list = FALSE)
  scoresWithResponse.trn = scoresWithResponse[IND, ]
  scoresWithResponse.tst = scoresWithResponse[-IND,]
  rfFit = train(response~., data = scoresWithResponse.trn,
                importance = TRUE,
                method = "rf",
                metric="ROC",
                trControl = ctrlCV,
                tuneGrid = rfGRID,
                classProbs = TRUE,
                summaryFunction = twoClassSummary
  )
  RF_loop_trn[i] = auc(roc(scoresWithResponse.trn$response,predict(rfFit,scoresWithResponse.trn, type='prob')[,1]))
  RF_loop_tst[i] = auc(roc(scoresWithResponse.tst$response,predict(rfFit,scoresWithResponse.tst, type='prob')[,1]))
}
After investigating for some time, I found several suggestions: reinstalling the caret package from GitHub, adding a . before each parameter in expand.grid, adding a dot only before the mtry parameter (i.e. .mtry), and passing mtry to the train function instead of expand.grid. I tried all of them and they all produce the same error.
Where and how should I add the mtry parameter? What is causing this error?
I can't comment yet, so I'll post this as an answer.
Have you tried passing the mtry argument directly to the train function? For example:
rfGRID = expand.grid(interaction.depth = c(2, 3, 5, 6, 7, 8, 10),
n.trees = c(50,75,100,125,150,200,250),
shrinkage = seq(.005, .2,.005),
n.minobsinnode = c(5,7,10, 12 ,15, 20),
nodesize = c(1:10)
)
rfFit = train(response~., data = scoresWithResponse.trn,
importance = TRUE,
method = "rf",
metric="ROC",
trControl = ctrlCV,
tuneGrid = rfGRID,
classProbs = TRUE,
summaryFunction = twoClassSummary,
mtry = 1000
)
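For what it's worth, caret's "rf" method exposes only mtry as a tuning parameter, so a grid with extra columns is rejected no matter where mtry is placed; interaction.depth, n.trees, shrinkage and n.minobsinnode are the tuning parameters of method = "gbm". A minimal sketch, assuming caret is installed (grid_matches is a hypothetical helper, not part of caret, and the grid values are shortened for illustration):

```r
library(caret)

# Hypothetical helper: TRUE when the grid's columns are exactly the
# tuning parameters caret lists for `method`
grid_matches <- function(method, grid) {
  setequal(names(grid), as.character(modelLookup(method)$parameter))
}

# Grid for method = "rf": a single column, mtry
rf_grid <- expand.grid(mtry = 1:10)

# The gbm-style columns from the question belong to method = "gbm"
gbm_grid <- expand.grid(
  interaction.depth = c(2, 3, 5),
  n.trees = c(50, 100),
  shrinkage = c(0.01, 0.1),
  n.minobsinnode = c(5, 10)
)

# Mixing the two sets of columns does not match "rf"
mixed_grid <- expand.grid(mtry = 1:2, n.trees = c(50, 100))

rf_ok    <- grid_matches("rf", rf_grid)
gbm_ok   <- grid_matches("gbm", gbm_grid)
mixed_ok <- grid_matches("rf", mixed_grid)
```

Fixed randomForest arguments such as nodesize or ntree are not tuned by caret; they can be passed through the ... of train() alongside tuneGrid = rf_grid.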

Error with caret and summaryFunction mnLogLoss: columns consistent with 'lev'

I'm trying to use log loss as the loss function for training with caret, using the data from Kaggle's Kobe Bryant shot selection competition.
This is my script:
library(caret)
data <- read.csv("./data.csv")
data$shot_made_flag <- factor(data$shot_made_flag)
data$team_id <- NULL
data$team_name <- NULL
train_data_kaggle <- data[!is.na(data$shot_made_flag),]
test_data_kaggle <- data[is.na(data$shot_made_flag),]
inTrain <- createDataPartition(y=train_data_kaggle$shot_made_flag,p=.8,list=FALSE)
train <- train_data_kaggle[inTrain,]
test <- train_data_kaggle[-inTrain,]
folds <- createFolds(train$shot_made_flag, k = 10)
ctrl <- trainControl(method = "repeatedcv", index = folds, repeats = 3, summaryFunction = mnLogLoss)
res <- train(shot_made_flag~., data = train, method = "gbm", preProc = c("zv", "center", "scale"), trControl = ctrl, metric = "logLoss", verbose = FALSE)
And this is the traceback of the error:
7: stop("'data' should have columns consistent with 'lev'")
6: ctrl$summaryFunction(testOutput, lev, method)
5: evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels,
metric = metric, method = method)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(shot_made_flag ~ ., data = train, method = "gbm",
preProc = c("zv", "center", "scale"), trControl = ctrl, metric = "logLoss",
verbose = FALSE)
1: train(shot_made_flag ~ ., data = train, method = "gbm", preProc = c("zv",
"center", "scale"), trControl = ctrl, metric = "logLoss",
verbose = FALSE)
When I use defaultSummary as the summaryFunction and specify no metric in train, it works, but it doesn't with mnLogLoss. I'm guessing it expects the data in a different format from what I am passing, but I can't find where the error is.
From the help file for defaultSummary:
To use twoClassSummary and/or mnLogLoss, the classProbs argument of trainControl should be TRUE. multiClassSummary can be used without class probabilities but some statistics (e.g. overall log loss and the average of per-class area under the ROC curves) will not be in the result set.
Therefore, I think you need to change your trainControl() to the following:
ctrl <- trainControl(method = "repeatedcv", index = folds, repeats = 3, summaryFunction = mnLogLoss, classProbs = TRUE)
If you do this and run your code you will get the following error:
Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
You just need to change the 0/1 levels of shot_made_flag to something that can be a valid R variable name:
data$shot_made_flag <- ifelse(data$shot_made_flag == 0, "miss", "made")
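A quick base-R check of whether factor levels survive make.names() unchanged, as a sketch on a made-up vector (shot_flag stands in for data$shot_made_flag, and is_valid_level is a hypothetical helper):

```r
# Made-up stand-in for data$shot_made_flag, including one unlabeled row
shot_flag <- factor(c(0, 1, 1, 0, NA))

# A level is a valid R name when make.names() leaves it unchanged
is_valid_level <- function(lv) lv == make.names(lv)

before_ok <- is_valid_level(levels(shot_flag))

# Relabel the levels; NA entries stay NA
relabeled <- factor(shot_flag, levels = c("0", "1"), labels = c("miss", "made"))
after_ok <- is_valid_level(levels(relabeled))

print(make.names(levels(shot_flag)))  # "X0" "X1"
print(levels(relabeled))              # "miss" "made"
```

Relabeling through factor(..., labels = ) keeps the NA rows intact, which matters here because the NA rows are the ones split off as the Kaggle test set.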
With the above changes your code will look like this:
library(caret)
data <- read.csv("./data.csv")
data$shot_made_flag <- ifelse(data$shot_made_flag == 0, "miss", "made")
data$shot_made_flag <- factor(data$shot_made_flag)
data$team_id <- NULL
data$team_name <- NULL
train_data_kaggle <- data[!is.na(data$shot_made_flag),]
test_data_kaggle <- data[is.na(data$shot_made_flag),]
inTrain <- createDataPartition(y=train_data_kaggle$shot_made_flag,p=.8,list=FALSE)
train <- train_data_kaggle[inTrain,]
test <- train_data_kaggle[-inTrain,]
folds <- createFolds(train$shot_made_flag, k = 3)
ctrl <- trainControl(method = "repeatedcv", classProbs = TRUE, index = folds, repeats = 3, summaryFunction = mnLogLoss)
res <- train(shot_made_flag~., data = train, method = "gbm", preProc = c("zv", "center", "scale"), trControl = ctrl, metric = "logLoss", verbose = FALSE)

R - Caret RFE gives "task 1 failed - Stopping" error when using pickSizeBest

I am using the caret R package to train an SVM model. My code is as follows:
options(show.error.locations = TRUE)
svmTrain <- function(svmType, subsetSizes, data, seeds, metric){
  svmFuncs$summary <- function(...) c(twoClassSummary(...), defaultSummary(...), prSummary(...))
  data_x <- data.frame(data[,2:ncol(data)])
  data_y <- unlist(data[,1])
  FSctrl <- rfeControl(
    method = "cv",
    number = 10,
    rerank = TRUE,
    verbose = TRUE,
    functions = svmFuncs,
    saveDetails = TRUE,
    seeds = seeds
  )
  TRctrl <- trainControl(
    method = "cv",
    savePredictions = TRUE,
    classProbs = TRUE,
    verboseIter = TRUE,
    sampling = "down",
    number = 10,
    search = "random",
    repeats = 3,
    returnResamp = "all",
    allowParallel = TRUE
  )
  svmProf <- rfe(
    x = data_x,
    y = data_y,
    sizes = subsetSizes,
    metric = metric,
    rfeControl = FSctrl,
    method = svmType,
    preProc = c("center", "scale"),
    trControl = TRctrl,
    selectSize = pickSizeBest(data, metric = "AUC", maximize = TRUE),
    tuneLength = 5
  )
}
data1a = openTable(3, 'a')
data1b = openTable(3, 'b')
data = rbind(data1a, data1b)
last <- roundToTens(ncol(data)-1)
subsetSizes <- c( 3:9, seq(10, last, 10) )
svmTrain <- svmTrain("svmRadial", subsetSizes, data, seeds, "AUC")
When I comment out the pickSizeBest line, the algorithm runs fine. However, when I leave it in, it gives the following error:
Error in { (from svm.r#58) : task 1 failed - "Stopping"
Row 58 is svmProf <- rfe( x = data_x,..
I tried to work out whether I am using pickSizeBest the wrong way, but I cannot find the problem. Could somebody help me?
Many thanks!
EDIT: I just realized that pickSizeBest(data, ...) should not be given data. However, I still do not know what should go there.
I can't run your example, but I would suggest that you just pass the function pickSizeBest, i.e.:
[...]
trControl = TRctrl,
selectSize = pickSizeBest,
tuneLength = 5
[...]
The functionality is described here:
http://topepo.github.io/caret/recursive-feature-elimination.html#backwards-selection
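To see what the size-selection function receives, here is a sketch on made-up numbers, assuming caret is installed. pickSizeBest takes a data frame of resampled performance with a Variables column, not the raw training data, and in the built-in caretFuncs list it sits in the selectSize slot:

```r
library(caret)

# Made-up performance profile: one row per subset size
perf <- data.frame(
  Variables = 1:7,
  RMSE = c(1.20, 1.10, 1.05, 1.01, 1.01, 1.03, 1.00)
)

# Subset size with the best (here: lowest) RMSE
best_size <- pickSizeBest(perf, metric = "RMSE", maximize = FALSE)
print(best_size)

# The helper functions handed to rfeControl() carry the selector
# as a list element, so it can be swapped there
funcs <- caretFuncs
funcs$selectSize <- pickSizeBest
has_selector <- is.function(funcs$selectSize)
```

This is why calling pickSizeBest(data, ...) up front fails: the function is meant to be called by rfe() itself once the resampling results exist.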

R {caret} trainControl(index = createFolds()) changes default tune parameters?

I have spent a lot of time reading the {caret} documentation as well as the {randomForest} documentation, but I still don't understand what's going on here.
When I run the following code with the parameter index = createFolds() in trainControl, the values tried of mtry are 2, 28, and 54:
# Specify fit parameters
set.seed(55555)
fc = trainControl(method = "cv", returnResamp = "all", verboseIter = T,
                  index = createFolds(fct.og$cover[trn.idx]))
# randomForest Model 1
set.seed(55555)
rf.M1 = train(x = fct.og[trn.idx, -55],
              y = fct.og[trn.idx, 55],
              method = "rf", trControl = fc,
              preProcess = c("nzv", "center", "scale"),
              verbose = T)
However, when I don't specify the parameter index = createFolds() in trainControl(), the values tried of mtry are 2, 12, and 22:
# Specify fit parameters
set.seed(55555)
fc = trainControl(method = "cv", returnResamp = "all", verboseIter = T)
# randomForest Model 1
set.seed(55555)
rf.M1 = train(x = fct.og[trn.idx, -55],
              y = fct.og[trn.idx, 55],
              method = "rf", trControl = fc,
              preProcess = c("nzv", "center", "scale"),
              verbose = T)
I know that by default train() uses tuneLength = 3, so I get why only three values are tried. I just don't understand what's going on behind the scenes that causes the values of mtry to differ. I'm sure this is user error and I'm missing something obvious in the documentation.
One last question: the model runs a lot faster when trainControl(index = createFolds()) is used. Is there a reason why? It seems the default of trainControl(method = "cv", number = 10) is the same as the default of createFolds(k = 10)?
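One possible explanation for the speed difference, sketched on a made-up response vector and assuming caret is installed: by default createFolds() returns the held-out rows of each fold (returnTrain = FALSE), whereas the index argument of trainControl() expects the rows used for training. Passing the default output as index would therefore fit each resample on roughly one tenth of the data:

```r
library(caret)

set.seed(55555)
# Made-up two-class response with 200 rows
y <- factor(rep(c("a", "b"), each = 100))

# Default: each list element holds the rows that are held out
held_out <- createFolds(y, k = 10)

# returnTrain = TRUE: each element holds the rows used for fitting
train_rows <- createFolds(y, k = 10, returnTrain = TRUE)

held_out_sizes <- lengths(held_out)
train_sizes <- lengths(train_rows)
print(range(held_out_sizes))
print(range(train_sizes))

# Sketch of a control object that trains on the larger part of each fold
fc <- trainControl(method = "cv", index = train_rows, returnResamp = "all")
```

This sketch only addresses the run time; it does not by itself explain why the mtry candidates differ.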
