Using the iris dataset, I tuned a k-nearest-neighbors classifier with iterative (Bayesian) search for multiclass classification. However, an error occurs when the macro-weighted version of f_meas (created with metric_tweak) is used in metric_set.
I would appreciate any ideas.
Thank you so much for your support!
library(tidyverse)
library(tidymodels)
tidymodels_prefer()
# function
f_meas_weighted <- metric_tweak("f_meas_weighted", f_meas, estimator = "macro_weighted")
# workflow
set.seed(2023)
df <- iris
splits <- initial_split(df, strata = Species, prop = 4/5)
df_train <- training(splits)
df_test <- testing(splits)
df_rec <- recipe(Species ~ ., data = df_train)
knn_model <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")
df_wflow <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(df_rec)
set.seed(2023)
knn_cv <-
  df_wflow %>%
  tune_bayes(
    # error here
    metrics = metric_set(f_meas_weighted),
    resamples = vfold_cv(df_train, strata = "Species", v = 2),
    control = control_bayes(verbose = TRUE, save_pred = TRUE)
  )
❯ Generating a set of 5 initial parameter results
x Fold1: internal:
Error in `metric_set()`:
! Failed to compute `f_meas...
Caused by error in `f_meas.data.f...
! formal argument "estimato...
x Fold2: internal:
Error in `metric_set()`:
! Failed to compute `f_meas...
Caused by error in `f_meas.data.f...
! formal argument "estimato...
✓ Initialization complete
Error in `estimate_tune_results()`:
! All of the models failed. See the .notes column.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
All models failed. Run `show_notes(.Last.tune.result)` for more information.
✖ Optimization stopped prematurely; returning current results.
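The truncated message (formal argument "estimato...) suggests that `estimator` is being supplied twice when the tweaked metric is called. Assuming the second value comes from metric_set() itself, one workaround is a small wrapper that fixes the estimator in its body, accepts and ignores any incoming `estimator`, and is registered with yardstick's new_class_metric(). This is a sketch, not checked against the exact yardstick version in the question; the name f_meas_macro_weighted and the toy predictions are hypothetical.

```r
library(yardstick)

# Hypothetical wrapper: fixes estimator = "macro_weighted" inside the body
# instead of through metric_tweak()
f_meas_macro_weighted <- function(data, truth, estimate, estimator = NULL,
                                  na_rm = TRUE, case_weights = NULL, ...) {
  # `estimator` is accepted but ignored, so a value forwarded by metric_set()
  # cannot collide with the fixed value below; other arguments such as
  # event_level pass through `...`
  f_meas(
    data = data,
    truth = !!rlang::enquo(truth),
    estimate = !!rlang::enquo(estimate),
    estimator = "macro_weighted",
    na_rm = na_rm,
    case_weights = !!rlang::enquo(case_weights),
    ...
  )
}
# Register it as a class metric so metric_set() and tune accept it
f_meas_macro_weighted <- new_class_metric(f_meas_macro_weighted, direction = "maximize")

# Toy three-class predictions (hypothetical)
lvls <- c("setosa", "versicolor", "virginica")
preds <- data.frame(
  truth    = factor(c("setosa", "versicolor", "virginica",
                      "setosa", "versicolor", "virginica"), levels = lvls),
  estimate = factor(c("setosa", "versicolor", "versicolor",
                      "setosa", "virginica", "virginica"), levels = lvls)
)

# Compare a direct call with a call through metric_set()
direct  <- f_meas(preds, truth, estimate, estimator = "macro_weighted")
via_set <- metric_set(f_meas_macro_weighted)(preds, truth = truth, estimate = estimate)
via_set
```

If this behaves as expected, the wrapper can replace f_meas_weighted in the tune_bayes() call, i.e. metrics = metric_set(f_meas_macro_weighted). The metric is still reported as f_meas, with macro_weighted in the .estimator column.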
Related
I have an imbalanced data set and am using the tidymodels framework to build predictive models. To correct for the imbalance, I use the ROSE upsampling algorithm, which has two arguments I'd like to tune: over_ratio and minority_prop.
To do so, I set each argument to tune() in the recipe step and then built a CV grid with the corresponding column names. However, the minority_prop argument is not recognized when I run the CV search.
# data
set.seed(20)
y <- rbinom(100, 1, 0.1)
X <- MASS::mvrnorm(100, c(1,2), diag(2))
dat <- cbind(y,X)
dat <- data.frame(dat)
dat$y <- as.factor(dat$y)
# define the recipe
my_recipe <-
  recipe(y ~ ., data = dat) |>
  step_rose(y, over_ratio = tune(), minority_prop = tune(),
            skip = TRUE) %>%
  step_normalize(all_numeric_predictors(), skip = FALSE)
# MODEL
mod <-
  svm_rbf(mode = "classification", cost = tune(),
          rbf_sigma = tune()) %>%
  set_engine("kernlab")
# set the workflow
svc_workflow <- workflow() %>%
  # add the recipe
  add_recipe(my_recipe) %>%
  # add the model
  add_model(mod)
grid_svc <- expand.grid(rbf_sigma = seq(0, 10, 2), cost = seq(0, 10, 2),
                        over_ratio = seq(0.5, 1.5, 0.5), minority_prop = seq(0.5, 0.8, 0.15))
# cv tuning
doParallel::registerDoParallel()
cv_tuning <- tune_grid(svc_workflow,
                       resamples = vfold_cv(dat),
                       grid = grid_svc,
                       metrics = metric_set(f_meas, precision, recall,
                                            accuracy, pr_auc))
I then receive the following error.
Error in `check_grid()`:
! The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'minority_prop'.
Run `rlang::last_error()` to see where the error occurred.
I tried tuning only over over_ratio without minority_prop and it worked. What am I doing wrong?
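The check_grid() error means that tune's parameter set for this workflow does not contain minority_prop. A quick way to see which tune() placeholders tune actually recognizes is extract_parameter_set_dials(). The sketch below rebuilds the question's workflow on stand-in data (assuming the themis package, which provides step_rose(), is installed) and compares the recognized ids with the grid's column names. A plausible explanation, which this sketch does not confirm, is that the installed themis version does not register minority_prop as a tunable argument; in that case, updating themis or fixing minority_prop to a constant would be the options to try.

```r
library(tidymodels)
library(themis)  # provides step_rose()

set.seed(20)
# Stand-in for the question's simulated data (MASS not needed here)
dat <- data.frame(
  y  = factor(rbinom(100, 1, 0.1)),
  V2 = rnorm(100, mean = 1),
  V3 = rnorm(100, mean = 2)
)

my_recipe <- recipe(y ~ ., data = dat) %>%
  step_rose(y, over_ratio = tune(), minority_prop = tune(), skip = TRUE) %>%
  step_normalize(all_numeric_predictors())

mod <- svm_rbf(mode = "classification", cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab")

svc_workflow <- workflow() %>%
  add_recipe(my_recipe) %>%
  add_model(mod)

# Parameters that tune will accept as grid columns for this workflow
svc_params <- extract_parameter_set_dials(svc_workflow)
svc_params$id

# Any name printed here is a grid column that check_grid() would reject
grid_cols <- c("rbf_sigma", "cost", "over_ratio", "minority_prop")
setdiff(grid_cols, svc_params$id)
```

Building the recipe and workflow does not fit anything, so this check runs quickly even though kernlab is named as the engine.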
I am using the code below to build a model and make predictions with tidymodels. I am fairly new to tidymodels, so my approach may be entirely wrong, but here is the problem.
When the data types in the test set differ from those in the training set, I get the error below. Otherwise (when the train and test data structures are identical), the code works fine. I assumed that the preprocessing steps would handle this when processing the test data.
If anyone has encountered this problem, please let me know a possible solution.
I searched for this issue but haven't found anything similar.
Thanks for looking into it.
Code:
library(tidymodels)
library(dplyr)
mt1 <- mtcars ## assume this is the train data
mt2 <- mtcars ## assume this is the test data
mt2$mpg <- as.character(mt2$mpg) ## just forcing them to be character to reproduce the problem in my actual data
mt2$qsec <- as.character(mt2$qsec)
dp_pipe <- recipe(am ~ ., data = mt1) %>%
  update_role(cyl, vs, new_role = "drop_vars") %>%
  update_role(mpg,
              disp,
              drat, wt, qsec, new_role = "to_numeric") %>%
  step_rm(has_role("drop_vars")) %>%
  step_mutate_at(has_role(match = "to_numeric"), fn = as.numeric)
# Cross folds
folds = vfold_cv(mt1, v = 10)
# define parameter grid to be tuned
my_grid = tibble(penalty = 10^seq(-2, -1, length.out = 10))
# define lasso model
lasso_mod = linear_reg(mode = "regression",
                       penalty = tune(),
                       mixture = 1) %>%
  set_engine("glmnet")
# add everything to a workflow
wf = workflow() %>%
  add_model(lasso_mod) %>%
  add_recipe(dp_pipe)
# tune the workflow
my_res <- wf %>%
  tune_grid(resamples = folds,
            grid = my_grid,
            control = control_grid(verbose = FALSE, save_pred = TRUE),
            metrics = metric_set(rmse))
best_mod = my_res %>% select_best("rmse")
best_mod
final_fitted = finalize_workflow(wf, best_mod) %>% fit(data = mt1)
# predict on the training data (works)
final_fitted %>%
  predict(mt1)
# predict on the test data (this call fails)
final_fitted %>%
  predict(mt2)
The error I get:
> Error: ! Can't convert `data$mpg` <character> to match type of `mpg` <double>. Run `rlang::last_error()` to see where the error occurred.
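The message appears to come from a check that a fitted workflow runs on new data before any recipe step is applied: the workflow stores the column types it saw during training (through the hardhat package) and tries to convert new data to those types, so the step_mutate_at() conversion in the recipe never gets a chance to run on mt2. Assuming that is the cause, one fix is to coerce mismatched columns back to the training types before calling predict(). A minimal sketch with dplyr follows; the names mismatched and mt2_fixed are hypothetical.

```r
library(dplyr)

mt1 <- mtcars                       # training data
mt2 <- mtcars                       # test data
mt2$mpg  <- as.character(mt2$mpg)   # reproduce the type mismatch
mt2$qsec <- as.character(mt2$qsec)

# Columns whose class in the test data differs from the training data
mismatched <- names(mt1)[vapply(names(mt1), function(col) {
  !identical(class(mt1[[col]]), class(mt2[[col]]))
}, logical(1))]

# Coerce those columns back to numeric before calling predict()
mt2_fixed <- mt2 %>%
  mutate(across(all_of(mismatched), as.numeric))

mismatched
```

After this, predict(final_fitted, mt2_fixed) receives the same column types as mt1. Separately, update_role(..., new_role = "to_numeric") takes mpg, disp, drat, wt, and qsec out of the predictor role, so the lasso model may not be using those columns at all; summary(dp_pipe) shows the role assigned to each variable.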
I am trying to replicate the Bayesian hyperparameter-tuning examples from this post: https://www.r-bloggers.com/2020/05/bayesian-hyperparameters-optimization/. When I run my code, I receive the following error: Error: All of the models failed. See the .notes column. Run rlang::last_error() to see where the error occurred.
Here is my current code. The error occurs when running the code starting at the tuned_PI line. Please let me know if you have any suggestions; I am very new to the tidymodels package and to hyperparameter tuning.
# draw 70% of the row indices at random (sample(n, size) samples from 1..n)
training_index <- sample(nrow(data), size = floor(nrow(data) * 0.70))
test_index <- setdiff(seq_len(nrow(data)), training_index)
# Get the training data and test data
training_data <- data[training_index, ]
test_data <- data[test_index, ]
model_tune <- rand_forest(mtry = tune(), min_n = tune(), trees = tune()) %>%
  set_engine("ranger", seed = 222) %>%
  set_mode("classification")
set.seed(1234)
folds <- vfold_cv(training_data, v = 5, strata = DEATH_EVENT)
tune_wf <- workflow() %>%
  add_model(model_tune) %>%
  add_formula(DEATH_EVENT ~ .)
tuned_PI <- tune_wf %>%
  tune_bayes(resamples = folds,
             param_info = parameters(mtry(range = c(1, 10)), min_n(range = c(1, 10)), trees(range = c(480, 540))),
             metrics = metric_set(sensitivity),
             objective = prob_improve(trade_off = 0.01))
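One common reason for "All of the models failed" with set_mode("classification") and metric_set(sensitivity) is an outcome stored as numbers rather than as a factor: parsnip's classification models require a factor outcome, so every fit fails. The question does not show how DEATH_EVENT is stored, so this is an assumption; the .notes column mentioned in the error would show the actual message. The sketch below uses a hypothetical stand-in data frame with DEATH_EVENT coded as 0/1 integers and converts it before the split.

```r
set.seed(1234)
# Hypothetical stand-in for `data`; assumes DEATH_EVENT is stored as 0/1 integers
data <- data.frame(
  age               = round(runif(100, 40, 95)),
  ejection_fraction = round(runif(100, 15, 70)),
  DEATH_EVENT       = rbinom(100, 1, 0.3)
)
class(data$DEATH_EVENT)  # "integer": not accepted as a classification outcome

# Convert the outcome to a factor. yardstick treats the first level as the
# event by default, so listing 1 first makes sensitivity() measure deaths.
data$DEATH_EVENT <- factor(data$DEATH_EVENT, levels = c(1, 0),
                           labels = c("died", "survived"))
table(data$DEATH_EVENT)
```

The conversion has to happen before training_data and the folds are created, so that every resample sees the factor outcome.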
I am trying to fit a lasso regression. It worked previously in my R environment, but now I get the error: x Fold1: internal: Error in mapply(FUN = eval_safely, calls, names(calls), SIMPLIFY = FALSE, : object 'eval_safely' not found
# Create a recipe
lr_mod_recipe <- recipe(Lead_week ~ .,
                        data = Data_train) %>%
  step_normalize(MeterDistance) %>%
  step_dummy(all_nominal(), -all_outcomes())
# Specify the model
lasso_lr <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")
# mixture = 1 is lasso, mixture = 0 is ridge
# Create a workflow
lr_mod_workflow <- workflow() %>%
  add_model(lasso_lr) %>%
  add_recipe(lr_mod_recipe)
lr_mod_workflow
# Set tuning grid
grid_lasso <- tibble(penalty = 10^(seq(from = -3, to = 1, length.out = 100)))
grid_lasso
lasso_tune <- lr_mod_workflow %>%
  tune_grid(resamples = cv_folds,
            grid = grid_lasso,
            metrics = class_metrics)
I get this error when I run my last chunk. I do not know what the issue is, and I am even more surprised because this lasso regression worked previously but no longer does.
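An "object ... not found" error raised from inside tune's own internals, in code that used to work, often points to a mismatch between installed package versions (for example, tune updated without its companion packages, or an older version still loaded in the session or in parallel workers). This is an assumption; the question does not show the versions. The sketch below lists the installed versions of the packages involved so they can be compared.

```r
# Packages involved in tune_grid() for this workflow
pkgs <- c("tidymodels", "tune", "workflows", "parsnip", "recipes",
          "yardstick", "rsample", "dials", "hardhat", "glmnet")

# Installed version of each package, or NA if it is not installed
installed <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(utils::packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

print(installed)
```

If the versions look out of step, updating them together (for example with install.packages() or tidymodels::tidymodels_update()) and then restarting R before rerunning the chunk is a reasonable next step.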
I am new to R and a bit confused about the steps to follow for a classification task with tidymodels.
I am using the Kaggle dataset from https://www.kaggle.com/c/home-credit-default-risk to classify the TARGET variable.
The dataset has missing and negative values, and contains both numeric and categorical data.
ISSUE: I get an error from collect_metrics() after fitting the model in step 4.
Steps I followed:
library(tidyverse)
library(caret)
library(tidymodels)
After some EDA, I followed the steps below:
1. Data Partition - Train / test
set.seed(1234)
data_split <- initial_split(dt1, strata = TARGET)
dt1_train <- training(data_split)
dt1_test <- testing(data_split)
dim(dt1_train)
dim(dt1_test)
########## output ############
[1] 230634 122
[1] 76877 122
2. Recipes
rec2 <- recipe(TARGET ~ ., data = dt1_train) %>%
  step_rm(contains("SK_ID_CURR")) %>%                          # removing id var
  step_impute_median(all_numeric_predictors()) %>%
  step_impute_mode(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_range(all_numeric_predictors(), min = 1, max = 2) %>%   # rescale to [1, 2] so every value is strictly positive
  step_BoxCox(all_numeric_predictors()) %>%                    # Box-Cox works on strictly positive numbers only
  step_normalize(all_numeric_predictors()) %>%
  step_zv(all_numeric_predictors()) %>%
  step_nzv(all_numeric_predictors()) %>%
  step_corr(all_numeric_predictors())
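Box-Cox is only defined for strictly positive values, so any rescaling placed before it has to keep every value above zero (a rescale to [0, 1], the default of step_range(), maps each column's minimum to exactly 0). An alternative that accepts zeros and negative values directly is the Yeo-Johnson transformation, step_YeoJohnson() in recipes, which makes the rescaling step unnecessary. The sketch below swaps it in on a tiny hypothetical data frame with Home Credit-style column names and shows the usual prep()/bake() pattern; it is an illustration, not a tuned recipe for the real data.

```r
library(recipes)

# Tiny hypothetical data with missing values, negative values, and a nominal column
toy <- data.frame(
  TARGET      = factor(c("0", "1", "0", "1", "0", "1", "0", "0")),
  DAYS_BIRTH  = c(-12000, -15000, NA, -9000, -20000, -11000, -13000, -16000),
  AMT_CREDIT  = c(4e5, 2.5e5, 6e5, NA, 3e5, 5e5, 4.5e5, 3.5e5),
  CODE_GENDER = factor(c("M", "F", NA, "F", "M", "F", "M", "F"))
)

rec_yj <- recipe(TARGET ~ ., data = toy) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_impute_mode(all_nominal_predictors()) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%   # defined for zeros and negatives too
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

prepped <- prep(rec_yj, training = toy)   # estimate medians, modes, lambdas, means, sds
baked   <- bake(prepped, new_data = NULL) # the processed training data
baked
```

Placing step_YeoJohnson() before step_dummy() keeps the 0/1 indicator columns out of the transformation.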
3. RFE
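Sampling to reduce the data size for faster RFE results:
prepd_rec2 <- prep(rec2, training = dt1_train)        # estimate the recipe on the training set
dt1_train_baked <- bake(prepd_rec2, new_data = NULL)  # the processed training set (also used in step 4)
dt1_train_baked_sample <- dt1_train_baked %>% sample_frac(0.05)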
dim(dt1_train_baked_sample)
######## output ##########
[1] 11532 72
control <- rfeControl(functions = rfFuncs, method = "cv", verbose = FALSE)
system.time(
  RFE_res <- rfe(x = subset(dt1_train_baked_sample, select = -TARGET),
                 y = dt1_train_baked_sample$TARGET,
                 sizes = c(7, 15, 20),
                 rfeControl = control)
)
RFE_res$optVariables[1:15]
######## output ##########
[1] "EXT_SOURCE_2" "EXT_SOURCE_1"
[3] "DAYS_BIRTH" "AMT_INCOME_TOTAL"
[5] "CODE_GENDER_M" "DAYS_ID_PUBLISH"
[7] "AMT_CREDIT" "REG_CITY_NOT_WORK_CITY"
[9] "CNT_FAM_MEMBERS" "AMT_ANNUITY"
[11] "NAME_EDUCATION_TYPE_Higher.education" "REGION_POPULATION_RELATIVE"
[13] "NAME_EDUCATION_TYPE_Secondary...secondary.special" "REG_CITY_NOT_LIVE_CITY"
[15] "DAYS_REGISTRATION"
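Rather than typing the 15 selected variables into the model formula by hand, the formula can be built from the RFE result with base R's reformulate(). In the sketch below, top_vars is a hypothetical stand-in for RFE_res$optVariables[1:15], shortened to three names.

```r
# Stand-in for RFE_res$optVariables[1:15]
top_vars <- c("EXT_SOURCE_2", "EXT_SOURCE_1", "DAYS_BIRTH")

# Build the formula TARGET ~ EXT_SOURCE_2 + EXT_SOURCE_1 + DAYS_BIRTH
knn_formula <- reformulate(top_vars, response = "TARGET")
print(knn_formula)
```

The resulting formula object can be passed straight to parsnip, e.g. fit(knn_Spec, knn_formula, data = dt1_train_baked).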
4. Model Building
knn_Spec <- nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification")
knn_Spec
knn_fit <- knn_Spec %>%
  fit(TARGET ~ EXT_SOURCE_2 + EXT_SOURCE_1 + DAYS_BIRTH + AMT_INCOME_TOTAL + CODE_GENDER_M +
        DAYS_ID_PUBLISH + AMT_CREDIT + REG_CITY_NOT_WORK_CITY + CNT_FAM_MEMBERS + AMT_ANNUITY +
        NAME_EDUCATION_TYPE_Higher.education + REGION_POPULATION_RELATIVE +
        NAME_EDUCATION_TYPE_Secondary...secondary.special + REG_CITY_NOT_LIVE_CITY +
        DAYS_REGISTRATION,
      data = dt1_train_baked)
knn_fit
knn_fit %>% collect_metrics()
Error: No `collect_metric()` exists for this type of object
I am not sure how to get results such as accuracy, specificity, sensitivity, and class predictions/probabilities from this.
I also tried the code below, but it gives an error:
knn_workflow <- workflow() %>%
  add_recipe(rec2) %>%
  add_model(knn_fit)
Error: `spec` must be a `model_spec`
Passing the model specification (knn_Spec) instead builds the workflow, but collect_metrics() still fails:
knn_workflow <- workflow() %>%
  add_recipe(rec2) %>%
  add_model(knn_Spec)
knn_workflow %>% collect_metrics()
Error: No `collect_metric()` exists for this type of object.
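collect_metrics() works on the results of resampling or tuning functions such as fit_resamples(), tune_grid(), and last_fit(), not on a single parsnip fit or an unfitted workflow, which is why both calls above fail. Likewise, add_model() needs the model specification rather than the fitted model, as the `spec` error says. One way to get test-set metrics and predictions is to pass the workflow and the initial_split() object to last_fit(). The sketch below uses simulated data with hypothetical predictors x1 and x2 in place of the Home Credit table and assumes the kknn package is installed.

```r
library(tidymodels)

set.seed(1234)
# Hypothetical two-class data; "1" is listed first so that yardstick
# treats it as the event of interest (the first level is the default event)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- tibble(
  x1 = x1,
  x2 = x2,
  TARGET = factor(ifelse(x1 + x2 + rnorm(n, sd = 0.5) > 0, "1", "0"),
                  levels = c("1", "0"))
)

data_split <- initial_split(dat, strata = TARGET)

knn_rec <- recipe(TARGET ~ ., data = training(data_split)) %>%
  step_normalize(all_numeric_predictors())

knn_spec <- nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification")

# The workflow takes the model specification, not a fitted model
knn_workflow <- workflow() %>%
  add_recipe(knn_rec) %>%
  add_model(knn_spec)

# last_fit() fits on the training set and evaluates on the test set in one call
knn_final <- last_fit(
  knn_workflow,
  split   = data_split,
  metrics = metric_set(accuracy, sens, spec, roc_auc)
)

collect_metrics(knn_final)      # accuracy, sens, spec, roc_auc on the test set
collect_predictions(knn_final)  # .pred_class plus .pred_1 / .pred_0 probabilities
```

For cross-validated estimates on the training data instead, fit_resamples(knn_workflow, resamples = vfold_cv(dt1_train, strata = TARGET), metrics = ...) followed by collect_metrics() follows the same pattern.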