step_mutate() couldn't find the function str_remove() - r

I have a recipe with the step_mutate() function in between, performing text data transformations on titanic dataset, supported by the stringr package.
library(tidyverse)
library(tidymodels)
extract_title <- function(x) stringr::str_remove(str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"), "\\.")
rf_recipe <-
recipe(Survived ~ ., data = titanic_train) %>%
step_impute_mode(Embarked) %>%
step_mutate(Cabin = if_else(is.na(Cabin), "Yes", "No"),
Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))) %>%
step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>%
update_role(PassengerId, Name, new_role = "id")
This set of transformations works perfectly well with rf_recipe %>% prep() %>% bake(new_data = NULL).
When I try to fit a random forests model with hyperparameter tunning and 10-fold cross validation within a workflow, all models fail. The output of the .notes columns explicitly says that there was a problem with mutate() column Title: couldn't find the function str_remove().
doParallel::registerDoParallel()
rf_res <-
tune_grid(
rf_wf,
resamples = titanic_folds,
grid = rf_grid,
control = control_resamples(save_pred = TRUE)
)
As this post suggests I've explicitly told R that str_remove should be found in stringr package. Why this isn't working and what could be causing it?

I don't think this will fix the error, but just in case the str_extract function is not written stringr :: str_extract, did you load the package?

The error shows up because step_knn_impute() and subsequently the gower::gower_topn function transforms all characters to factors. To overcome this issue I had to apply prep()and bake() functions, without the inclusion of the recipe in the workflow.
prep_recipe <- prep(rf_recipe)
train_processed <- bake(prep_recipe, new_data = NULL)
test_processed <- bake(prep_recipe, new_data = titanic_test %>%
mutate(across(where(is.character), as.factor)))
Now the models converge.

Related

Normalizing features with tensorflow in R

I am currently starting to work with Keras/Tensorflow in R and am therefore working through the tensorflow tutorial.
However, when I try to normalize the feature space in the same way as described in the tutorial, I receive an error message/exception.
I found a kaggle notebook online that tried to reproduce the tensorflow tutorial as well, and it als got stuck at the exact same error message. See https://www.kaggle.com/code/kewagbln/boston-housing-regression-with-tensorflow/notebook.
Does anyone understand why I am getting the error message? Ultimately, I am not even coding on my own but just copying out of the tutorial and it still does not work.
To provide some more information: I am running the following code:
rm(list = ls())
library(keras)
library(tensorflow)
library(tfdatasets)
tensorflow::set_random_seed(42)
boston_housing <- dataset_boston_housing()
c(train_data, train_labels) %<-% boston_housing$train
c(test_data, test_labels) %<-% boston_housing$test
paste0("Training entries: ", length(train_data), ", labels: ", length(train_labels))
library(dplyr)
column_names <- c('CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT')
train_df <- train_data %>%
as_tibble(.name_repair = "minimal") %>%
setNames(column_names) %>%
mutate(label = train_labels)
test_df <- test_data %>%
as_tibble(.name_repair = "minimal") %>%
setNames(column_names) %>%
mutate(label = test_labels)
spec <- feature_spec(train_df, label ~ . ) %>%
step_numeric_column(all_numeric(), normalizer_fn = scaler_standard()) %>%
fit()
spec
layer <- layer_dense_features(
feature_columns = dense_features(spec),
dtype = tf$float32
)
layer(train_df)
input <- layer_input_from_dataset(train_df %>% select(-label))
output <- input %>%
layer_dense_features(dense_features(spec)) %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 1)
model <- keras_model(input, output)
summary(model)
Overall, the code runs just fine and I can train a simple neural network. The exception is being raised when calling layer(train_df). This, however, seems to have no impact on the overall model construction.

How do I display test statistic (F value) for anova using tbl_summary

Here is the code that I am using to output a anova summary table with:
hiv %>%
select(education, sexfirsttime) %>%
mutate(education=
factor(education, levels= c("no education", "primary","secondary","college"))) %>%
tbl_summary(missing="no",
by=education,
statistic = all_continuous() ~"{mean} ({sd})",
label = sexfirsttime ~ "Age of first time sex") %>%
add_p(test= all_continuous() ~ "aov") %>%
modify_header(statistic ~ "**Test Statistic**")
After executing the code, I get the following error message:
Error: Error in update= argument input. Select from ‘variable’, ‘test_name’, ‘var_type’, ‘var_label’, ‘row_type’, ‘label’, ‘stat_1’, ‘stat_2’, ‘stat_3’, ‘stat_4’, ‘test_result’, ‘p.value’
When I try replacing statistic in modify_header with test_result, the output that I get a bizarre output as shown is in the image.
I am fairly new to using gtsummary. Any help would be greatly appreciated. Thank you.
Use the most recent version of gtsummary and try again. In the most recent version, the handling of "aov" tests was made more consistent with the other tests, including returning the "statistic" column.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'
tbl <-
trial %>%
select(grade, age, marker) %>%
tbl_summary(
by = grade,
missing = "no"
) %>%
add_p(all_continuous() ~ "aov") %>%
# add a header (which also unhides a hidden column)
modify_header(statistic ~ "**Test Statistic**") %>%
# add a function to format the column
modify_fmt_fun(statistic ~ style_sigfig)
Created on 2021-10-17 by the reprex package (v2.0.1)

tidy function cannot be used within future_map?

I have R code below.
for the last row, when I used map() function, it worked well.
however, when I changed to future_map() function, I got the following error message:
"Error: Problem with mutate() column model.
i model = future_map(splits, fun1).
x no applicable method for 'tidy' applied to an object of class "c('lmerMod', 'merMod')""
any idea on what's wrong? thanks.
fun1 <- function(data) {
data %>% analysis %>%
lmer(val ~ period + (1 | id), data = .) %>% tidy
}
plan(multisession)
raw %>%
nest(data = -c(analyte, var)) %>%
mutate(boot = future_map(data, ~ bootstraps(
data = .x,
times = 5,
strata = id
),
.progress = T)) %>%
unnest(boot) %>%
mutate(model =future_map(splits, fun1))
I experienced exactly the same problem with one of my scripts. In order to get future_map to work properly with tidy, I needed to explicitly reference the broom package (i.e. I needed to use broom::tidy in place of tidy). In your example, you are attempting to extract summary statistics from a mixed model, so the code should run without error if we modify fun1 to be as follows:
fun1 <- function(data) {
data %>% analysis %>%
lmer(val ~ period + (1 | id), data = .) %>% broom.mixed::tidy
}
UPDATE (13-Dec-2021):
After a bit more reading, I now understand that the problem, as described in the original post, is due to the broom.mixed package not being attached in the R environment(s) where the future is evaluated. Instead of modifying fun1 (which is a very hacky way of resolving the problem), we should make use of the .options argument of future_map to guarantee that broom.mixed is attached (and all associated functions are available) in the future environments. The following code should run without error:
fun1 <- function(data) {
data %>%
analysis %>%
lmer(val ~ period + (1 | id), data = .) %>%
tidy
}
plan(multisession)
raw %>%
nest(data = -c(analyte, var)) %>%
mutate(boot = future_map(data, ~ bootstraps(data = .x,
times = 5,
strata = id),
.progress = T)) %>%
unnest(boot) %>%
mutate(model = future_map(splits,
fun1,
.options = furrr_options(packages = "broom.mixed")))
My take-home from this is that it's probably good practice to always list the packages that we need to use (as a character vector) using the .options argument of future_map, just to be on the safe side. I hope this helps!

R: Error in is_symbol(x) : object '.' not found (keras)

I am using the R programming language. I am trying to follow the R tutorial over here on neural networks (lstm) and time series: https://blogs.rstudio.com/ai/posts/2018-06-25-sunspots-lstm/
I decided to create my own time series data ("y.mon") for this tutorial (the same format and the same variable names) :
library(tidyverse)
library(glue)
library(forcats)
library(timetk)
library(tidyquant)
library(tibbletime)
library(cowplot)
library(recipes)
library(rsample)
library(yardstick)
library(keras)
library(tfruns)
library(dplyr)
library(lubridate)
library(tibbletime)
library(timetk)
index = seq(as.Date("1749/1/1"), as.Date("2016/1/1"),by="day")
index <- format(as.Date(index), "%Y/%m/%d")
value <- rnorm(97520,27,2.1)
final_data <- data.frame(index, value)
y.mon<-aggregate(value~format(as.Date(index),
format="%Y/%m"),data=final_data, FUN=sum)
y.mon$index = y.mon$`format(as.Date(index), format = "%Y/%m")`
y.mon$`format(as.Date(index), format = "%Y/%m")` = NULL
y.mon %>%
mutate(index = paste0(index, '/01')) %>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index) -> y.mon
From here on, I follow the instructions in the tutorial (replacing the "sun_spots data" with "y.mon". Everything works fine until this point (I posted a question yesterday that got closed for being too detailed https://stackoverflow.com/questions/65527230/r-error-in-is-symbolx-object-not-found-keras - the code can be followed from the rstudio tutorial) :
#ERROR
coln <- colnames(compare_train)[4:ncol(compare_train)]
cols <- map(coln, quo(sym(.)))
rsme_train <-
map_dbl(cols, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_train
Error in is_symbol(x) : object '.' not found
I found another stackoverflow post which deals with a similar problem:Getting error message while calculating rmse in a time series analysis
According to this stackoverflow post, this first error can be resolved like this:
coln <- colnames(compare_train)[4:ncol(compare_train)]
rsme_train <-
map_df(coln, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>%
pull(.estimate) %>%
mean()
rsme_train
However, the following section of the tutorial has a similar section in which the same error persists even after applying the corrections:
compare_test %>% write_csv(str_replace(model_path, ".hdf5", ".test.csv"))
compare_test[FLAGS$n_timesteps:(FLAGS$n_timesteps + 10), c(2, 4:8)] %>% print()
cols <- map(coln, quo(sym(.)))
rsme_test <-
map_dbl(cols, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_test
#errors:
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
object 'model_path' not found
Error in is_symbol(x) : object '.' not found
These errors are preventing me from finishing the rest of the tutorial.
Can someone please show me how to fix these?
Thanks
Try using coln in map_dbl :
rsme_test <- map_dbl(coln, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()

extract weights from a RWeka SMOreg model

I am using the awesome RWeka package in order to fit a SMOreg model as implemented in Weka. While everything is working fine, I have some problem extracting the weights from the fitted model.
As all Weka classifier object, my model has a nice print method that shows me all the features and their relative weights. However, I am not able to extract this weights in any way.
You can see for yourself by running the following code:
library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
model_SMOreg <- SMOreg_classifier(mpg ~ ., data = mtcars)
Now, if you simply call the model
model_SMOreg
you'll see that it prints all the features used in the model with their relative weight. I would like to access those weights as a vector or, even better, as a 2-columns table with one column containing the names of the features and the other containing the weights.
I am working on a Windows 7 x64 system, using RStudio Version 1.0.153, R 3.4.2 Short Summer and RWeka 0.4-35.
Does someone know how to do this ?
I think you cannot get this in numeric format.
attr(model_SMOreg, "meta")$class # "Weka_classifier"
getAnywhere("print.Weka_classifier")
Result:
A single object matching ‘print.Weka_classifier’ was found
It was found in the following places
registered S3 method for print from namespace RWeka
namespace:RWeka
with value
function (x, ...)
{
writeLines(.jcall(x$classifier, "S", "toString"))
invisible(x)
}
<bytecode: 0x8328630>
<environment: namespace:RWeka>
So we see: print.Weka_classifier() makes a .writeLines() call which in turn makes a rJava::.jcall call, which returns a string.
Thus, I think you need to parse the weights yourself, perhaps by calling the capture.output() method.
Based on the suggestion of #knb I have wrote a function to extract the weights from a SMOreg model and return a tibble with one column for the features name and one for the features weight, with the row arranged following the absolute value of the weight.
Note that this function only works for the SMOreg classifier, as the output of other classifiers is slightly different in terms of layout. However, I think the function can be easily adapted for other classifiers.
library(stringr)
library(tidyverse)
extract_weights_from_SMOreg <- function(model) {
oldw <- getOption("warn")
options(warn = -1)
raw_output <- capture.output(model)
trimmed_output <- raw_output[-c(1:3,(length(raw_output) - 4): length(raw_output))]
df <- data_frame(features_name = vector(length = length(trimmed_output) + 1, "character"),
features_weight = vector(length = length(trimmed_output) + 1, "numeric"))
for (line in 1:length(trimmed_output)) {
string_as_vector <- trimmed_output[line] %>%
str_split(string = ., pattern = " ") %>%
unlist(.)
numeric_element <- trimmed_output[line] %>%
str_split(string = ., pattern = " ") %>%
unlist(.) %>%
as.numeric(.)
position_mul <- string_as_vector[is.na(numeric_element)] %>%
str_detect(string = ., pattern = "[*]") %>%
which(.)
numeric_element <- numeric_element %>%
`[`(., c(1:position_mul))
text_element <- string_as_vector[is.na(numeric_element)]
there_is_plus <- string_as_vector[is.na(numeric_element)] %>%
str_detect(string = ., pattern = "[+]") %>%
sum(.)
if (there_is_plus) { sign_is <- "+"} else { sign_is <- "-"}
feature_weight <- numeric_element[!is.na(numeric_element)]
if (sign_is == "-") {df[line, "features_weight"] <- feature_weight * -1} else {df[line, "features_weight"] <- numeric_element[!(is.na(numeric_element))]}
df[line, "features_name"] <- paste(text_element[(position_mul + 1): length(text_element)], collapse = " ")
}
intercept_line <- raw_output[length(raw_output) - 4]
there_is_plus_intercept <- intercept_line %>%
str_detect(string = ., pattern = "[+]") %>%
sum(.)
if (there_is_plus_intercept) { intercept_sign_is <- "+"} else { intercept_sign_is <- "-"}
numeric_intercept <- intercept_line %>%
str_split(string = ., pattern = " ") %>%
unlist(.) %>%
as.numeric(.) %>%
`[`(., length(.))
df[nrow(df), "features_name"] <- "intercept"
if (intercept_sign_is == "-") {df[nrow(df), "features_weight"] <- numeric_intercept * -1} else {df[nrow(df), "features_weight"] <- numeric_intercept}
options(warn = oldw)
df <- df %>%
arrange(desc(abs(features_weight)))
return(df)
}
Here an example for one model
library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
mpg_model_weights <- extract_weights_from_SMOreg(SMOreg_classifier(data = mtcars, mpg ~ .))
mpg_model_weights

Resources