I am trying to fit a lasso regression. It worked previously in my R environment, but now I get the error: x Fold1: internal: Error in mapply(FUN = eval_safely, calls, names(calls), SIMPLIFY = FALSE, : object 'eval_safely' not found
# Create a recipe
lr_mod_recipe <- recipe(Lead_week ~ ., data = Data_train) %>%
  step_normalize(MeterDistance) %>%
  step_dummy(all_nominal(), -all_outcomes())
# Specify the model
lasso_lr <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")
# mixture = 1 is lasso, mixture = 0 is ridge
# Create a workflow
lr_mod_workflow <- workflow() %>%
  add_model(lasso_lr) %>%
  add_recipe(lr_mod_recipe)
lr_mod_workflow
# Set tuning grid
grid_lasso <- tibble(penalty = 10^(seq(from = -3, to = 1, length.out = 100)))
grid_lasso
lasso_tune <- lr_mod_workflow %>%
  tune_grid(resamples = cv_folds,
            grid = grid_lasso,
            metrics = class_metrics)
I get this error when I try to execute my last chunk. I do not know what the issue is, and I'm even more surprised because my lasso regression worked previously but now fails with this error.
Using the iris dataset, I tuned a knn classifier with iterative (Bayesian) search for multiclass classification. However, an error is generated when the macro-weighted version of f_meas (as created by metric_tweak) is used in metric_set.
I would appreciate any ideas.
Thank you so much for your support!
library(tidyverse)
library(tidymodels)
tidymodels_prefer()
# function
f_meas_weighted <- metric_tweak("f_meas_weighted", f_meas, estimator = "macro_weighted")
# workflow
set.seed(2023)
df <- iris
splits <- initial_split(df, strata = Species, prop = 4/5)
df_train <- training(splits)
df_test <- testing(splits)
df_rec <- recipe(Species ~ ., data = df_train)
knn_model <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")
df_wflow <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(df_rec)
set.seed(2023)
knn_cv <-
  df_wflow %>%
  tune_bayes(
    # error here
    metrics = metric_set(f_meas_weighted),
    resamples = vfold_cv(df_train, strata = "Species", v = 2),
    control = control_bayes(verbose = TRUE, save_pred = TRUE)
  )
❯ Generating a set of 5 initial parameter results
x Fold1: internal:
Error in `metric_set()`:
! Failed to compute `f_meas...
Caused by error in `f_meas.data.f...
! formal argument "estimato...
x Fold2: internal:
Error in `metric_set()`:
! Failed to compute `f_meas...
Caused by error in `f_meas.data.f...
! formal argument "estimato...
✓ Initialization complete
Error in `estimate_tune_results()`:
! All of the models failed. See the .notes column.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
All models failed. Run `show_notes(.Last.tune.result)` for more information.
✖ Optimization stopped prematurely; returning current results.
I have an imbalanced data set and am using the tidymodels framework to build predictive models. To correct for the imbalance, I use the upsampling ROSE algorithm, which has two arguments I'd like to tune, namely over_ratio and minority_prop.
To do so, I set each argument to tune() in the recipe step and then built a CV grid with the corresponding column names. However, the minority_prop argument is not recognized when I run the CV search.
# data
set.seed(20)
y <- rbinom(100, 1, 0.1)
X <- MASS::mvrnorm(100, c(1,2), diag(2))
dat <- cbind(y,X)
dat <- data.frame(dat)
dat$y <- as.factor(dat$y)
# define the recipe
my_recipe <-
  recipe(y ~ ., data = dat) |>
  step_rose(y, over_ratio = tune(), minority_prop = tune(),
            skip = TRUE) %>%
  step_normalize(all_numeric_predictors(), skip = FALSE)
# MODEL
mod <-
  svm_rbf(mode = "classification", cost = tune(),
          rbf_sigma = tune()) %>%
  set_engine("kernlab")
# set the workflow
svc_workflow <- workflow() %>%
  # add the recipe
  add_recipe(my_recipe) %>%
  # add the model
  add_model(mod)
grid_svc <- expand.grid(rbf_sigma = seq(0, 10, 2), cost = seq(0, 10, 2),
                        over_ratio = seq(0.5, 1.5, 0.5),
                        minority_prop = seq(0.5, 0.8, 0.15))
# cv tuning
doParallel::registerDoParallel()
cv_tuning <- tune_grid(svc_workflow,
                       resamples = vfold_cv(dat),
                       grid = grid_svc,
                       metrics = metric_set(f_meas, precision, recall,
                                            accuracy, pr_auc))
I then receive the following error.
Error in `check_grid()`:
! The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'minority_prop'.
Run `rlang::last_error()` to see where the error occurred.
I tried tuning only over over_ratio without minority_prop and it worked. What am I doing wrong?
I am using the code below to build a model and predict with it using tidymodels. I am fairly new to tidymodels, so maybe my approach is entirely wrong. Here is the problem.
When a column's data type in the test dataset differs from its type in the training dataset, I get the error shown below. Otherwise the code works fine (that is, when the train and test data structures are identical). I assumed that the preprocessing step would handle this when processing the test data.
If anyone knows or has encountered this problem, please let me know a possible solution.
I searched for this issue but haven't found anything of this sort.
Thanks for looking into it.
Code:
library(tidymodels)
library(dplyr)
mt1 <- mtcars ## assume this is the train data
mt2 <- mtcars ## assume this is the test data
mt2$mpg <- as.character(mt2$mpg) ## just forcing them to be character to reproduce the problem in my actual data
mt2$qsec <- as.character(mt2$qsec)
dp_pipe <- recipe(am ~ ., data = mt1) %>%
  update_role(cyl, vs, new_role = "drop_vars") %>%
  update_role(mpg, disp, drat, wt, qsec, new_role = "to_numeric") %>%
  step_rm(has_role("drop_vars")) %>%
  step_mutate_at(has_role(match = "to_numeric"), fn = as.numeric)
# Cross folds
folds = vfold_cv(mt1, v = 10)
# define parameter grid to be tuned
my_grid = tibble(penalty = 10^seq(-2, -1, length.out = 10))
# define lasso model
lasso_mod = linear_reg(mode = "regression",
                       penalty = tune(),
                       mixture = 1) %>%
  set_engine("glmnet")
# add everything to a workflow
wf = workflow() %>%
  add_model(lasso_mod) %>%
  add_recipe(dp_pipe)
# tune the workflow
my_res <- wf %>%
  tune_grid(resamples = folds,
            grid = my_grid,
            control = control_grid(verbose = FALSE, save_pred = TRUE),
            metrics = metric_set(rmse))
best_mod = my_res %>% select_best("rmse")
best_mod
final_fitted = finalize_workflow(wf, best_mod) %>% fit(data=mt1)
# predicted for train
final_fitted %>%
  predict(mt1)
# predicted for test (this call raises the error)
final_fitted %>%
  predict(mt2)
Error at my end:
> Error: ! Can't convert `data$mpg` <character> to match type of `mpg`
> <double>. Run `rlang::last_error()` to see where the error occurred.
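One possible workaround, offered as a sketch rather than a confirmed fix: the error message suggests that the column types of the new data are checked against the training types before the recipe steps run, so step_mutate_at() never gets a chance to convert them. Under that assumption, coercing the test columns to the training types before calling predict() avoids the mismatch. The helper match_types() below is hypothetical (it is not part of tidymodels) and uses base R only.

```r
# Reproduce the mismatch from the question
mt1 <- mtcars                      # train data
mt2 <- mtcars                      # test data
mt2$mpg  <- as.character(mt2$mpg)
mt2$qsec <- as.character(mt2$qsec)

# Hypothetical helper: for every column shared by `new` and `ref`,
# convert it to numeric if it is numeric in `ref` but not in `new`.
match_types <- function(new, ref) {
  shared <- intersect(names(new), names(ref))
  for (col in shared) {
    if (is.numeric(ref[[col]]) && !is.numeric(new[[col]])) {
      new[[col]] <- as.numeric(new[[col]])
    }
  }
  new
}

mt2_fixed <- match_types(mt2, mt1)
```

With the types aligned, final_fitted %>% predict(mt2_fixed) should see the same column types as in training.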
I am trying to replicate the examples of hyperparameter tuning using Bayesian searching from this site: https://www.r-bloggers.com/2020/05/bayesian-hyperparameters-optimization/ , and when running my code, received the following error: Error: All of the models failed. See the .notes column. Run rlang::last_error() to see where the error occurred.
Here is my current code. The error occurs when running the code starting on the tuned_PI line. Please let me know if you have any suggestions. I am very new to the tidymodels package and hyperparameter tuning.
training_index <- sample(nrow(data)*0.70)
test_index <- setdiff(seq(1:nrow(data)), training_index )
# Get the training data and test data
training_data <- data[training_index, ]
test_data <- data[test_index, ]
model_tune <- rand_forest(mtry = tune(), min_n = tune(), trees = tune()) %>%
  set_engine("ranger", seed = 222) %>%
  set_mode("classification")
set.seed(1234)
folds <- vfold_cv(training_data, v = 5, strata = DEATH_EVENT)
tune_wf <- workflow() %>%
  add_model(model_tune) %>%
  add_formula(DEATH_EVENT ~ .)
tuned_PI <- tune_wf %>%
  tune_bayes(resamples = folds,
             param_info = parameters(mtry(range = c(1, 10)),
                                     min_n(range = c(1, 10)),
                                     trees(range = c(480, 540))),
             metrics = metric_set(sensitivity),
             objective = prob_improve(trade_off = 0.01))
I am trying to follow this tutorial here - https://juliasilge.com/blog/xgboost-tune-volleyball/
I am using it on the most recent Tidy Tuesday dataset about great lakes fishing - trying to predict agency based on many other values.
ALL of the code below works except the final row where I get the following error:
> final_res <- last_fit(final_xgb, stock_folds)
Error: Each element of `splits` must be an `rsplit` object.
I searched that error and came to this page - https://github.com/tidymodels/rsample/issues/175
That issue describes it as a bug that seems to have been fixed, but it concerns initial_time_split, not the initial_split that I am using. I would rather not change the split, because then I would have to rerun the xgboost tuning that took 9 hours. What went wrong here?
# Setup ----
library(tidyverse)
library(tidymodels)
stocked <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-08/stocked.csv')
stocked_modeling <- stocked %>%
  mutate(AGENCY = case_when(
    AGENCY != "OMNR" ~ "other",
    TRUE ~ AGENCY
  )) %>%
  select(-SID, -MONTH, -DAY, -LATITUDE, -LONGITUDE, -GRID, -STRAIN, -AGEMONTH,
         -MARK_EFF, -TAG_NO, -TAG_RET, -LENGTH, -WEIGHT, -CONDITION, -LOT_CODE,
         -NOTES, -VALIDATION, -LS_MGMT, -STAT_DIST, -ST_SITE, -YEAR_CLASS, -STOCK_METH) %>%
  mutate_if(is.character, factor) %>%
  drop_na()
# Start making model ----
set.seed(123)
stock_split <- initial_split(stocked_modeling, strata = AGENCY)
stock_train <- training(stock_split)
stock_test <- testing(stock_split)
xgb_spec <- boost_tree(
  trees = 1000,
  tree_depth = tune(), min_n = tune(), loss_reduction = tune(),
  sample_size = tune(), mtry = tune(),
  learn_rate = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")
xgb_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), stock_train),
  learn_rate(),
  size = 20
)
xgb_workflow <- workflow() %>%
  add_formula(AGENCY ~ .) %>%
  add_model(xgb_spec)
set.seed(123)
stock_folds <- vfold_cv(stock_train, strata = AGENCY)
doParallel::registerDoParallel()
# BEWARE, THIS CODE BELOW TOOK 9 HOURS TO RUN
set.seed(234)
xgb_res <- tune_grid(
  xgb_workflow,
  resamples = stock_folds,
  grid = xgb_grid,
  control = control_grid(save_pred = TRUE)
)
# Explore results
best_auc <- select_best(xgb_res, "roc_auc")
final_xgb <- finalize_workflow(
  xgb_workflow,
  best_auc
)
final_res <- last_fit(final_xgb, stock_folds)
If we look at the documentation of last_fit(), we see that split must be:
An rsplit object created from `rsample::initial_split()`.
You accidentally passed the cross-validation folds object stock_folds as split, but you should have passed the rsplit object stock_split instead:
final_res <- last_fit(final_xgb, stock_split)
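The difference between the two kinds of object can be checked directly. The following is a minimal sketch on mtcars, chosen only for illustration; the names demo_split and demo_folds are hypothetical and not from the question, and only rsample is needed.

```r
library(rsample)

set.seed(123)
demo_split <- initial_split(mtcars)    # one train/test split
demo_folds <- vfold_cv(mtcars, v = 5)  # a collection of resamples

# A single split inherits from "rsplit"; the folds object does not,
# although each of its elements does.
split_is_rsplit <- inherits(demo_split, "rsplit")
folds_is_rsplit <- inherits(demo_folds, "rsplit")
fold1_is_rsplit <- inherits(demo_folds$splits[[1]], "rsplit")
```

Since last_fit() takes the finalized workflow and the original split, the existing tuning results in xgb_res should be reusable as they are, without repeating the 9-hour run.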