I have a multiclass problem. For example, take the mtcars dataset, where we want to predict the number of cylinders, cyl.
data(mtcars)
I want to use xgboost and fit it using the caret package. For that, I create a grid of hyperparameters using
xgb_grid_param = expand.grid(
nrounds = 1000,
eta = c(0.01, 0.001, 0.0001),
max_depth = c(2, 4),
gamma = 0,
colsample_bytree =1,
min_child_weight =1
)
I can create training control parameters as
xgb_tr_ctrl = trainControl(
method = "cv",
number = 5,
repeats =2,
verboseIter = TRUE,
returnData = FALSE,
returnResamp = "all",
allowParallel = TRUE
)
When I then try to run the train function in caret using:
model <- train(factor(cyl)~., data = mtcars, method = "xgbTree",
trControl = xgb_grid_param, tuneGrid=xgb_grid_param)
I get the error:
Error in trControl$classProbs && any(classLevels != make.names(classLevels)) :
invalid 'x' type in 'x && y'
How do I fix this error, and how do I instruct xgbTree to use mlogloss to optimize the learning?
For another method I could solve "invalid 'x' type in 'x && y'" by setting the label attribute as the last column of the data frame / matrix.
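One likely cause, sketched here under stated assumptions: the train call above passes the tuning grid as trControl, so caret looks up classProbs on a data frame, gets NULL, and NULL && ... fails. The base R lines below reproduce that mechanism without caret. The wrapper fit_cyl_model is hypothetical and is only defined, not run. It assumes caret and xgboost are installed, that recent caret versions also expect a subsample column in the xgbTree grid, and that caret's multiClassSummary with metric = "logLoss" is an acceptable stand-in for xgboost's mlogloss.

```r
# The tuning grid from the question; it is a plain data frame.
xgb_grid_param <- expand.grid(
  nrounds = 1000,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4),
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = 1
)

# A data frame has no classProbs element, so `$` returns NULL ...
class_probs <- xgb_grid_param$classProbs
print(is.null(class_probs))

# ... and NULL on the left-hand side of && raises an error in R.
and_result <- tryCatch(
  class_probs && TRUE,
  error = function(e) "error"
)
print(and_result)

# Hypothetical corrected call (defined only, not executed here).
fit_cyl_model <- function() {
  # Class levels must be valid R names when classProbs = TRUE,
  # so "4", "6", "8" become "cyl4", "cyl6", "cyl8".
  dat <- mtcars
  dat$cyl <- factor(paste0("cyl", dat$cyl))

  ctrl <- caret::trainControl(
    method = "cv",
    number = 5,
    classProbs = TRUE,
    summaryFunction = caret::multiClassSummary
  )

  # Assumption: recent caret versions also require a subsample column.
  grid <- cbind(xgb_grid_param, subsample = 1)

  caret::train(
    cyl ~ ., data = dat,
    method = "xgbTree",
    trControl = ctrl,    # the control object, not the grid
    tuneGrid = grid,
    metric = "logLoss",  # multiclass log loss from multiClassSummary
    maximize = FALSE
  )
}
```

Calling fit_cyl_model() would run the full cross-validated search, which can take a while with nrounds = 1000.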
Related
I have a dataset on which I have no problem building an xgbTree model without weights, but once I include weights, even if they are all 1, the model doesn't converge. I get the error
Something is wrong; all the RMSE metric values are missing:
and when I print the warnings, the last message is
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... : There were missing values in resampled performance measures.
This is a drive link to the RData file containing the data; it was too big to print, and smaller samples didn't always reproduce the error.
It contains 3 objects: input_x, input_y, and wts. The last one is just a vector of 1s, but eventually it should ideally accept numbers on the interval (0,1). The code I used is shown below. Note the comment next to the weights argument that produces the error.
nrounds<-1000
tune_grid <- expand.grid(
nrounds = seq(from = 200, to = nrounds, by = 50),
eta = c(0.025, 0.05, 0.1, 0.3),
max_depth = c(2, 3, 4, 5),
gamma = 0,
colsample_bytree = 1,
min_child_weight = 1,
subsample = 1
)
tune_control <- caret::trainControl(
method = "cv",
number = 3,
verboseIter = FALSE,
allowParallel = TRUE
)
xgb_tune <- caret::train(
x = input_x,
y = input_y,
weights = wts, # If I remove this line, the code works fine. When included, even if just 1s, it throws an error.
trControl = tune_control,
tuneGrid = tune_grid,
method = "xgbTree",
verbose = TRUE
)
EDIT 13.10.2021, thanks to @waterpolo:
The correct way to specify weights is via the weights argument to caret::train:
xgb_tune <- caret::train(
x = input_x,
y = input_y,
weights = wts,
trControl = tune_control,
tuneGrid = tune_grid,
method = "xgbTree",
verbose = TRUE
)
See a more detailed answer here: Non-tree model error when using xgbTree method with Caret and weights to target variable when applying the varImp function
Old incorrect answer below:
According to the function source, the weights argument is called wts.
Line:
if (!is.null(wts))
xgboost::setinfo(x, 'weight', wts)
Running
xgb_tune <- caret::train(
x = input_x,
y = input_y,
wts = wts,
trControl = tune_control,
tuneGrid = tune_grid,
method = "xgbTree",
verbose = TRUE
)
should produce the desired result.
Just wanted to add @missuse's response from another post (Non-tree model error when using xgbTree method with Caret and weights to target variable when applying the varImp function). The correct argument is weights.
Code:
xgb_tune <- caret::train(x = input_x,
y = input_y,
weights = wts,
trControl = tune_control,
tuneGrid = tune_grid,
method = "xgbTree",
verbose = TRUE
)
The other thing I found was that I needed to use weights > 1, or I would receive the same error message as you. For example, inverse weighting produced that same message. Hope this helps.
Thanks @missuse for the lovely response in the other thread!
I'm working on tuning parameters for a neural network exercise on the Boston dataset. I have been getting a persistent error:
Error: The tuning parameter grid should have columns size, decay
The following is the setup of my caret tuning:
caret_control <- trainControl(method = "repeatedcv",
number = 10,
repeats = 3)
caret_grid <- expand.grid(batch_size=seq(60,120,20),
dropout=0.5,
size=100,
decay = 0,
lr=2e-6,
activation = "relu")
caret_t <- train(medv ~ ., data = chasRad,
method = "nnet",
metric="RMSE",
trControl = caret_control,
tuneGrid = caret_grid,
verbose = FALSE)
Here chasRad is a 12x506 matrix. Could anyone help fix the error, which seems to be triggered by the expanded grid?
The error you're getting should be interpreted as:
"The tuning parameter grid should ONLY have columns size, decay".
You're passing in four additional parameters that nnet can't tune in caret. For a full list of parameters that are tunable, run modelLookup(model = 'nnet').
To tune only size and decay, replace your caret_grid with:
caret_grid <- expand.grid(size=seq(from = 1, to = 10, by = 1),
decay = seq(from = 0.1, to = 0.5, by = 0.1))
and your code will run.
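As a quick check before calling train, the grid's column names can be compared against the tunable parameters. This is a minimal base R sketch: the vector required is copied from the error message, and with caret loaded the same names could be read from modelLookup(model = 'nnet')$parameter. The names extra and fixed_grid are illustrative only.

```r
# Columns caret expects for method = "nnet" (taken from the error message).
required <- c("size", "decay")

# The grid from the question.
caret_grid <- expand.grid(
  batch_size = seq(60, 120, 20),
  dropout = 0.5,
  size = 100,
  decay = 0,
  lr = 2e-6,
  activation = "relu"
)

# Columns that nnet cannot tune through caret.
extra <- setdiff(names(caret_grid), required)
print(extra)

# Keep only the tunable columns and drop the rows that become duplicates.
fixed_grid <- unique(caret_grid[, required, drop = FALSE])
print(fixed_grid)
```

Dropping the extra columns leaves a single size/decay combination here, so in practice a wider range for size and decay is probably what you want to tune over.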
I am using caret to fit an xgboost model.
1. However, I get the following error:
"Error: The tuning parameter grid should have columns nrounds,
max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample"
The code:
library(caret)
library(doParallel)
library(dplyr)
library(pROC)
library(xgboost)
## Create train/test indexes
## preserve class indices
set.seed(42)
my_folds <- createFolds(train_churn$churn, k = 10)
# Compare class distribution
i <- my_folds$Fold1
table(train_churn$churn[i]) / length(i)
my_control <- trainControl(
summaryFunction = twoClassSummary,
classProbs = TRUE,
verboseIter = TRUE,
savePredictions = TRUE,
index = my_folds
)
my_grid <- expand.grid(nrounds = 500,
max_depth = 7,
eta = 0.1,
gammma = 1,
colsample_bytree = 1,
min_child_weight = 100,
subsample = 1)
set.seed(42)
model_xgb <- train(
class ~ ., data = train_churn,
metric = "ROC",
method = "xgbTree",
trControl = my_control,
tuneGrid = my_grid)
2. I also want to get a prediction by averaging the predictions of the models fitted on each fold.
I know it's a tad late, but check the spelling of gamma in your grid of tuning parameters. You misspelled it as gammma (with three m's).
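A typo like this is easy to miss by eye, so a small comparison of column names can help. The sketch below uses only base R; compare_grid_columns is a hypothetical helper, not part of caret, and xgb_required is copied from the error message in the question.

```r
# Hypothetical helper: report grid columns that are missing or unexpected.
compare_grid_columns <- function(grid, required) {
  list(
    missing = setdiff(required, names(grid)),
    unexpected = setdiff(names(grid), required)
  )
}

# Column names listed in the error message for xgbTree.
xgb_required <- c("nrounds", "max_depth", "eta", "gamma",
                  "colsample_bytree", "min_child_weight", "subsample")

# The grid from the question, including the misspelled column.
my_grid <- expand.grid(nrounds = 500,
                       max_depth = 7,
                       eta = 0.1,
                       gammma = 1,
                       colsample_bytree = 1,
                       min_child_weight = 100,
                       subsample = 1)

check <- compare_grid_columns(my_grid, xgb_required)
print(check)

# Rename the misspelled column so the grid matches what caret expects.
names(my_grid)[names(my_grid) == "gammma"] <- "gamma"
print(compare_grid_columns(my_grid, xgb_required))
```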
I am trying to tune the hyperparameters of xgboost for a classification problem using the caret library. Because my dataset has many factor variables and xgboost expects numeric data, I created dummy variables using feature hashing, but when I run caret's train, I get an error.
#Using Feature hashing to convert all the factor variables to dummies
objTrain_hashed = hashed.model.matrix(~., data=train1[,-27], hash.size=2^15, transpose=FALSE)
#created a dense matrix which is normally accepted by xgboost method in R
#Hoping I could pass it caret as well
dmodel <- xgb.DMatrix(objTrain_hashed[, ], label = train1$Walc)
xgb_grid_1 = expand.grid(
nrounds = 500,
max_depth = c(5, 10, 15),
eta = c(0.01, 0.001, 0.0001),
gamma = c(1, 2, 3),
colsample_bytree = c(0.4, 0.7, 1.0),
min_child_weight = c(0.5, 1, 1.5)
)
xgb_trcontrol_1 = trainControl(
method = "cv",
number = 3,
verboseIter = TRUE,
returnData = FALSE,
returnResamp = "all", # save losses across all models
classProbs = TRUE, # set to TRUE for AUC to be computed
summaryFunction = twoClassSummary,
allowParallel = TRUE
)
xgb_train1 <- train(Walc ~.,dmodel,method = 'xgbTree',trControl = xgb_trcontrol_1,
metric = 'accuracy',tunegrid = xgb_grid_1)
I am getting the following error
Error in as.data.frame.default(data) :
cannot coerce class "xgb.DMatrix" to a data.frame
Any suggestions on how I can proceed?
This is because you are passing dmodel as the data argument in the train call at the end of your code. Try passing objTrain_hashed instead, which is a matrix and not an xgb.DMatrix.
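A minimal sketch of the idea, assuming train is given a predictor matrix and a separate outcome vector instead of a formula plus an xgb.DMatrix. The toy data frame stands in for train1 and is not the asker's data, base model.matrix is swapped in for hashed.model.matrix so the example needs no extra packages, and fit_from_matrix is a hypothetical wrapper that is defined but not run.

```r
# Hypothetical toy data standing in for train1.
toy <- data.frame(
  school = factor(c("GP", "MS", "GP", "MS")),
  age = c(15, 16, 17, 18),
  Walc = factor(c("low", "high", "low", "high"))
)

# Encode the predictors as a plain numeric matrix; the target is kept separate.
x <- model.matrix(~ . - 1, data = toy[, c("school", "age")])
y <- toy$Walc

print(is.matrix(x))
print(dim(x))

# Defined only, not run: pass x and y directly, without an xgb.DMatrix.
fit_from_matrix <- function(x, y, ctrl, grid) {
  caret::train(
    x = x,
    y = y,
    method = "xgbTree",
    trControl = ctrl,
    tuneGrid = grid   # note the capital G in tuneGrid
  )
}
```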
How about sparse.model.matrix() instead of hashed.model.matrix()?
It works on my PC.
Don't convert the result to an xgb.DMatrix().
Pass the output of sparse.model.matrix() to the train() function as it is.
Like this:
model_data <- sparse.model.matrix(Y~., raw_data)
and
xgb_train1 <- train(Y ~.,model_data, <bla bla> ...)
I hope it works. Thank you.
I want to use caret's train function to investigate xgboost results.
#open file with train data
trainy <- read.csv('')
# open file with test data
test <- read.csv('')
# we dont need ID column
##### Removing IDs
trainy$ID <- NULL
test.id <- test$ID
test$ID <- NULL
##### Extracting TARGET
trainy.y <- trainy$TARGET
trainy$TARGET <- NULL
# set up the cross-validated hyper-parameter search
xgb_grid_1 = expand.grid(
nrounds = 1000,
eta = c(0.01, 0.001, 0.0001),
max_depth = c(2, 4, 6, 8, 10),
gamma = 1
)
# pack the training control parameters
xgb_trcontrol_1 = trainControl(
method = "cv",
number = 5,
verboseIter = TRUE,
returnData = FALSE,
returnResamp = "all", # save losses across all models
classProbs = TRUE, # set to TRUE for AUC to be computed
summaryFunction = twoClassSummary,
allowParallel = TRUE
)
# train the model for each parameter combination in the grid,
# using CV to evaluate
xgb_train_1 = train(
x = as.matrix(trainy),
y = as.factor(trainy.y),
trControl = xgb_trcontrol_1,
tuneGrid = xgb_grid_1,
method = "xgbTree"
)
I see this error
Error in train.default(x = as.matrix(trainy), y = as.factor(trainy.y), trControl = xgb_trcontrol_1, :
At least one of the class levels is not a valid R variable name;
I have looked at other cases but still can't understand what I should change. R is quite different from Python for me for now.
As far as I can see, I should do something with the y class variable, but what and how exactly? Why didn't the as.factor function work?
I solved this issue; I hope it will help other novices.
I needed to transform all data to factor type, like this:
trainy[] <- lapply(trainy, factor)
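Another approach that addresses the message directly, offered as a sketch and not as the method used above: with classProbs = TRUE, caret wants the class levels of the outcome to be valid R variable names, and levels such as "0" and "1" are not. Base R's make.names can rename them. The vector target is a hypothetical stand-in for trainy.y.

```r
# Hypothetical 0/1 target standing in for trainy.y.
target <- c(0, 1, 1, 0, 1)
y <- as.factor(target)
print(levels(y))

# "0" and "1" are not syntactically valid names; make.names prefixes them.
print(make.names(levels(y)))

# Rename the levels in place; the observations keep their class membership.
levels(y) <- make.names(levels(y))
print(table(y))
```

The renamed factor can then be passed as y to train in place of as.factor(trainy.y).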