"incorrect number of probabilities" error in R

When using gafs() (feature selection in caret), I always get the error below when I run the following code on my dataset. The dataset is large; if I use only 500 rows the error does not occur, but it always appears when I use the full dataset.
Code:
x_chembl_R_gafs_cont <- gafsControl(
  functions = rfGA,
  method = "cv",
  repeats = 5,
  number = 5,
  verbose = TRUE,
  genParallel = TRUE,
  allowParallel = TRUE
)
x_chembl_r_gafs <- system.time(
  x_chembl_r_result <- gafs(
    x = x_chembl_R_tr_data,
    y = y_chembl_R_tr_data,
    iters = 25,
    method = "lm",
    gafsControl = x_chembl_R_gafs_cont
  )
)
Error:
Fold03 1 8875.134 (116)
Error in { : task 1 failed - "incorrect number of probabilities"
How can I solve this problem?

Related

R script error: attempt to apply non-function

I am trying to implement the Gravitational Search Algorithm in R to tune XGBoost, but I am facing an error: Error in xgb_model$set_params(as.list(particle_positions[i, ])) : attempt to apply non-function
The error appears when I try to evaluate the initial position:
# Evaluate the initial particle positions
for (i in 1:n_particles) {
  xgb_model$set_params(as.list(particle_positions[i, ]))
  resampling <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                             verboseIter = FALSE)
  model_fit <- train(
    x = as.matrix(train[, -15]), y = train[, 15],
    method = "xgbTree", trControl = resampling,
    metric = "Accuracy", tuneLength = 0,
    maximize = TRUE
  )
  best_positions[i, ] <- particle_positions[i, ]
  best_values[i] <- model_fit$results[1, "Accuracy"]
}
Any idea what I am doing wrong?
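The error means xgb_model$set_params is NULL: set_params() is a scikit-learn idiom, and neither caret nor the R xgboost package exposes such a method. A hedged sketch of one possible repair, assuming the columns of particle_positions are named after xgbTree's tuning parameters (nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample), is to pass each particle through train()'s tuneGrid instead:
# Sketch: supply each particle's hyper-parameters via a one-row tuneGrid
# instead of the non-existent $set_params() method.
for (i in 1:n_particles) {
  grid_i <- as.data.frame(as.list(particle_positions[i, ]))   # one-row grid
  resampling <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                             verboseIter = FALSE)
  model_fit <- train(
    x = as.matrix(train[, -15]), y = train[, 15],
    method = "xgbTree", trControl = resampling,
    metric = "Accuracy", tuneGrid = grid_i, maximize = TRUE
  )
  best_positions[i, ] <- particle_positions[i, ]
  best_values[i] <- model_fit$results[1, "Accuracy"]
}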

Error in data$update_params(params = params) : [LightGBM] [Fatal] Cannot change max_bin after constructed Dataset handle

I installed the lightgbm package in RStudio and am trying to run a model with it. The script is based on Retip. The function is this:
> fit.lightgbm
function (training, testing)
{
  train <- as.matrix(training)
  test <- as.matrix(testing)
  coltrain <- ncol(train)
  coltest <- ncol(test)
  dtrain <- lightgbm::lgb.Dataset(train[, 2:coltrain], label = train[, 1])
  lightgbm::lgb.Dataset.construct(dtrain)
  dtest <- lightgbm::lgb.Dataset.create.valid(dtrain, test[, 2:coltest],
                                              label = test[, 1])
  valids <- list(test = dtest)
  params <- list(objective = "regression", metric = "rmse")
  modelcv <- lightgbm::lgb.cv(params, dtrain, nrounds = 5000, nfold = 10,
                              valids, verbose = 1, early_stopping_rounds = 1000,
                              record = TRUE, eval_freq = 1L, stratified = TRUE,
                              max_depth = 4, max_leaf = 20, max_bin = 50)
  best.iter <- modelcv$best_iter
  params <- list(objective = "regression_l2", metric = "rmse")
  model <- lightgbm::lgb.train(params, dtrain, nrounds = best.iter, valids,
                               verbose = 0, early_stopping_rounds = 1000,
                               record = TRUE, eval_freq = 1L, max_depth = 4,
                               max_leaf = 20, max_bin = 50)
  print(paste0("End training"))
  return(model)
}
However, when I run the function as in Retip,
lightgbm <- fit.lightgbm(training, testing)
I get this fatal error:
Error in data$update_params(params = params) :
[LightGBM] [Fatal] Cannot change max_bin after constructed Dataset handle.
Only when I change max_bin to 255 (the LightGBM default) is there no error.
I went through the documentation and these issues:
What is the right way for hyper parameter tuning for LightGBM classification? #1339
[Python] max_bin weird behaviour #1053
Any ideas or suggestions as to what should be done?
This was cross-posted to https://github.com/microsoft/LightGBM/issues/4019 and has been answered there.
Construction of the Dataset object in LightGBM handles some important pre-processing steps (see this prior answer) that happen before training, and none of the Dataset parameters can be changed after construction.
Passing max_bin=50 into lgb.Dataset() instead of lgb.cv() / lgb.train() in the original post's code will result in successful training without this error.
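For illustration, a minimal sketch of that change against fit.lightgbm above (only the Dataset construction changes; max_bin should then be dropped from the lgb.cv() and lgb.train() calls):
# Dataset parameters such as max_bin are fixed when the Dataset is
# constructed, so pass them to lgb.Dataset() via its params argument.
dtrain <- lightgbm::lgb.Dataset(train[, 2:coltrain],
                                label = train[, 1],
                                params = list(max_bin = 50))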

How to prevent "algorithm did not converge" errors in neuralnet / Caret / R?

I am trying to train a neural network using the train function with neuralnet as my method parameter, to predict the times table.
I am scaling my training data set as well.
Even though I've tried different learningrate, stepmax, and threshold values for neuralnet, one of the k folds fails every time I train the network with train, saying:
1: Algorithm did not converge in 1 of 1 repetition(s) within the stepmax.
2: predictions failed for Fold05.Rep1: layer1=8, layer2=0, layer3=0 Error in cbind(1, pred) %*% weights[[num_hidden_layers + 1]] :
requires numeric/complex matrix/vector arguments
I am guessing this is because the initial weights are random, so each time I happen to get some weights that will not converge.
Is there any way to prevent this? Maybe by re-training the particular fold that failed, using different weights? (See the sketch after the code below.)
Here is my code:
library(caret)
library(neuralnet)

# Create the dataset
tt = data.frame(multiplier = rep(1:10, times = 10),
                multiplicand = rep(1:10, each = 10))
tt = cbind(tt, data.frame(product = tt$multiplier * tt$multiplicand))

# Splitting
indexes = createDataPartition(tt$product, times = 1, p = 0.7, list = FALSE)
tt.train = tt[indexes,]
tt.test = tt[-indexes,]

# Pre-process
preProc <- preProcess(tt, method = c('center', 'scale'))
tt.preProcessed <- predict(preProc, tt)
tt.preProcessed.train <- tt.preProcessed[indexes,]
tt.preProcessed.test <- tt.preProcessed[-indexes,]

# Train
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3)
tune.grid <- expand.grid(layer1 = 8, layer2 = 0, layer3 = 0)
tt.cv <- train(product ~ .,
               data = tt.preProcessed.train,
               method = 'neuralnet',
               tuneGrid = tune.grid,
               trControl = train.control,
               linear.output = TRUE,
               algorithm = 'backprop',
               learningrate = 0.01,
               stepmax = 500000,
               lifesign = 'minimal',
               threshold = 0.01)
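caret does not expose a per-fold retry hook, but one way to act on the re-training idea above is to repeat the whole train() call under a different seed until every fold produces a valid resample. This is only a sketch, and it assumes a failed fold shows up as an NA row in the fitted object's $resample table:
# Sketch: retry the full cross-validated training with a new seed until
# no fold fails; a complete failure of train() is caught and retried too.
fit <- NULL
for (seed in 1:20) {
  set.seed(seed)
  candidate <- tryCatch(
    suppressWarnings(
      train(product ~ .,
            data = tt.preProcessed.train,
            method = 'neuralnet',
            tuneGrid = tune.grid,
            trControl = train.control,
            linear.output = TRUE,
            algorithm = 'backprop',
            learningrate = 0.01,
            stepmax = 500000,
            threshold = 0.01)
    ),
    error = function(e) NULL
  )
  if (!is.null(candidate) && !anyNA(candidate$resample$RMSE)) {
    fit <- candidate
    break
  }
}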

R - Caret RFE gives "task 1 failed - Stopping" error when using pickSizeBest

I am using the caret R package to train an SVM model. My code is as follows:
options(show.error.locations = TRUE)

svmTrain <- function(svmType, subsetSizes, data, seeds, metric){
  svmFuncs$summary <- function(...) c(twoClassSummary(...),
                                      defaultSummary(...), prSummary(...))
  data_x <- data.frame(data[, 2:ncol(data)])
  data_y <- unlist(data[, 1])
  FSctrl <- rfeControl(method = "cv",
                       number = 10,
                       rerank = TRUE,
                       verbose = TRUE,
                       functions = svmFuncs,
                       saveDetails = TRUE,
                       seeds = seeds)
  TRctrl <- trainControl(method = "cv",
                         savePredictions = TRUE,
                         classProbs = TRUE,
                         verboseIter = TRUE,
                         sampling = "down",
                         number = 10,
                         search = "random",
                         repeats = 3,
                         returnResamp = "all",
                         allowParallel = TRUE)
  svmProf <- rfe(x = data_x,
                 y = data_y,
                 sizes = subsetSizes,
                 metric = metric,
                 rfeControl = FSctrl,
                 method = svmType,
                 preProc = c("center", "scale"),
                 trControl = TRctrl,
                 selectSize = pickSizeBest(data, metric = "AUC", maximize = TRUE),
                 tuneLength = 5)
}

data1a = openTable(3, 'a')
data1b = openTable(3, 'b')
data = rbind(data1a, data1b)
last <- roundToTens(ncol(data) - 1)
subsetSizes <- c(3:9, seq(10, last, 10))
svmTrain <- svmTrain("svmRadial", subsetSizes, data, seeds, "AUC")
When I comment out the pickSizeBest line, the algorithm runs fine. However, when I do not, it gives the following error:
Error in { (from svm.r#58) : task 1 failed - "Stopping"
Line 58 is svmProf <- rfe(x = data_x, ...
I tried to find out whether I was using pickSizeBest the wrong way, but I cannot find the problem. Could somebody help me?
Many thanks!
EDIT: I just realized that pickSizeBest(data, ...) should not use data. However, I still do not know what should be added there.
I can't run your example, but I would suggest that you just pass the function pickSizeBest, i.e.:
[...]
trControl = TRctrl,
selectSize = pickSizeBest,
tuneLength = 5
[...]
The functionality is described here:
http://topepo.github.io/caret/recursive-feature-elimination.html#backwards-selection
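(The difference: selectSize = pickSizeBest(data, metric = "AUC", maximize = TRUE) calls the function right away and passes its return value, which likely explains the "Stopping" error, whereas selectSize = pickSizeBest passes the function itself so that rfe can call it internally with the resampling results.)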

R not valid variable name for caret function

I want to use caret's train function to investigate xgboost results.
# open file with train data
trainy <- read.csv('')
# open file with test data
test <- read.csv('')

# we don't need the ID column
##### Removing IDs
trainy$ID <- NULL
test.id <- test$ID
test$ID <- NULL

##### Extracting TARGET
trainy.y <- trainy$TARGET
trainy$TARGET <- NULL

# set up the cross-validated hyper-parameter search
xgb_grid_1 = expand.grid(
  nrounds = 1000,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1
)

# pack the training control parameters
xgb_trcontrol_1 = trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",              # save losses across all models
  classProbs = TRUE,                 # set to TRUE for AUC to be computed
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

# train the model for each parameter combination in the grid,
# using CV to evaluate
xgb_train_1 = train(
  x = as.matrix(trainy),
  y = as.factor(trainy.y),
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)
I see this error:
Error in train.default(x = as.matrix(trainy), y = as.factor(trainy.y), trControl = xgb_trcontrol_1, :
At least one of the class levels is not a valid R variable name;
I have looked at other cases but still can't understand what I should change; R is quite different from Python for me for now.
As I can see, I should do something with the y class variable, but what and how exactly? Why didn't the as.factor function work?
I solved this issue; hope it will help other novices.
I needed to transform all the data to factor type, like this:
trainy[] <- lapply(trainy, factor)
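For context, the error is triggered by classProbs = TRUE: caret uses the class levels as column names for the predicted class probabilities, so the levels must be syntactically valid R variable names. A minimal sketch of that alternative fix, assuming TARGET holds labels such as 0/1 that are not valid names:
# Turn levels like "0"/"1" into valid R names such as "X0"/"X1"
trainy.y <- as.factor(trainy.y)
levels(trainy.y) <- make.names(levels(trainy.y))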
