Normalizing features with tensorflow in R

I am currently starting to work with Keras/Tensorflow in R and am therefore working through the tensorflow tutorial.
However, when I try to normalize the feature space in the same way as described in the tutorial, I receive an error message/exception.
I found a Kaggle notebook online that tried to reproduce the tensorflow tutorial as well, and it also got stuck at the exact same error message. See https://www.kaggle.com/code/kewagbln/boston-housing-regression-with-tensorflow/notebook.
Does anyone understand why I am getting the error message? Ultimately, I am not even coding on my own but just copying out of the tutorial and it still does not work.
To provide some more information: I am running the following code:
rm(list = ls())
library(keras)
library(tensorflow)
library(tfdatasets)
tensorflow::set_random_seed(42)
boston_housing <- dataset_boston_housing()
c(train_data, train_labels) %<-% boston_housing$train
c(test_data, test_labels) %<-% boston_housing$test
paste0("Training entries: ", length(train_data), ", labels: ", length(train_labels))
library(dplyr)
column_names <- c('CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT')
train_df <- train_data %>%
as_tibble(.name_repair = "minimal") %>%
setNames(column_names) %>%
mutate(label = train_labels)
test_df <- test_data %>%
as_tibble(.name_repair = "minimal") %>%
setNames(column_names) %>%
mutate(label = test_labels)
spec <- feature_spec(train_df, label ~ . ) %>%
step_numeric_column(all_numeric(), normalizer_fn = scaler_standard()) %>%
fit()
spec
layer <- layer_dense_features(
feature_columns = dense_features(spec),
dtype = tf$float32
)
layer(train_df)
input <- layer_input_from_dataset(train_df %>% select(-label))
output <- input %>%
layer_dense_features(dense_features(spec)) %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 1)
model <- keras_model(input, output)
summary(model)
Overall, the code runs just fine and I can train a simple neural network. The exception is being raised when calling layer(train_df). This, however, seems to have no impact on the overall model construction.
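For reference, standard scaling of the kind requested through scaler_standard() amounts to subtracting each column's training mean and dividing by its training standard deviation. The base R sketch below shows that computation on made-up numbers. It does not use tfdatasets, it neither reproduces nor resolves the exception, and the variance estimator used by scaler_standard() may differ from sd(); the data frames and their values are assumptions for illustration only.

```r
# Toy data standing in for two Boston housing columns (values are made up)
train <- data.frame(CRIM = c(0.1, 0.5, 2.3, 8.9), RM = c(6.5, 5.9, 7.1, 6.0))
test  <- data.frame(CRIM = c(0.3, 4.0),           RM = c(6.2, 6.8))

# Estimate the statistics on the training set only
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

# Apply the same centering and scaling to both sets
train_scaled <- scale(train, center = mu, scale = sigma)
test_scaled  <- scale(test,  center = mu, scale = sigma)

# After scaling, each training column has mean ~0 and sd ~1
colMeans(train_scaled)
apply(train_scaled, 2, sd)
```

Scaling the test set with the training statistics keeps information from the test set out of the preprocessing step.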

Related

step_mutate() couldn't find the function str_remove()

I have a recipe that uses step_mutate() partway through to perform text transformations on the Titanic dataset, supported by the stringr package.
library(tidyverse)
library(tidymodels)
extract_title <- function(x) stringr::str_remove(str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"), "\\.")
rf_recipe <-
recipe(Survived ~ ., data = titanic_train) %>%
step_impute_mode(Embarked) %>%
step_mutate(Cabin = if_else(is.na(Cabin), "Yes", "No"),
Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))) %>%
step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>%
update_role(PassengerId, Name, new_role = "id")
This set of transformations works perfectly well with rf_recipe %>% prep() %>% bake(new_data = NULL).
When I try to fit a random forest model with hyperparameter tuning and 10-fold cross-validation within a workflow, all models fail. The .notes column explicitly says that there was a problem with mutate() column Title: couldn't find the function str_remove().
doParallel::registerDoParallel()
rf_res <-
tune_grid(
rf_wf,
resamples = titanic_folds,
grid = rf_grid,
control = control_resamples(save_pred = TRUE)
)
As this post suggests, I've explicitly told R that str_remove should be found in the stringr package. Why isn't this working, and what could be causing it?
I don't think this will fix the error, but just in case: str_extract is not written as stringr::str_extract in your helper. Did you load the package?
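A minimal sketch of what the comment above hints at: the same helper with both stringr calls namespaced, so that it only requires stringr to be installed, not attached. Whether this resolves the failure inside the parallel workers is not confirmed anywhere in this thread, and the example names are hypothetical.

```r
# Same pattern as in the question, with every stringr call namespaced
extract_title <- function(x) {
  stringr::str_remove(
    stringr::str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"),
    "\\."
  )
}

# Hypothetical names in the style of the Titanic data
titles <- extract_title(c(
  "Braund, Mr. Owen Harris",
  "Heikkinen, Miss. Laina",
  "Doe, Dr. John"
))

# Names without a matching title come back as NA
is.na(titles)
```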
The error shows up because step_impute_knn(), and subsequently the gower::gower_topn function, transforms all characters to factors. To work around this, I had to apply the prep() and bake() functions myself instead of including the recipe in the workflow.
prep_recipe <- prep(rf_recipe)
train_processed <- bake(prep_recipe, new_data = NULL)
test_processed <- bake(prep_recipe, new_data = titanic_test %>%
mutate(across(where(is.character), as.factor)))
Now the models converge.

R: Error in is_symbol(x) : object '.' not found (keras)

I am using the R programming language. I am trying to follow the R tutorial over here on neural networks (lstm) and time series: https://blogs.rstudio.com/ai/posts/2018-06-25-sunspots-lstm/
I decided to create my own time series data ("y.mon") for this tutorial (the same format and the same variable names) :
library(tidyverse)
library(glue)
library(forcats)
library(timetk)
library(tidyquant)
library(tibbletime)
library(cowplot)
library(recipes)
library(rsample)
library(yardstick)
library(keras)
library(tfruns)
library(dplyr)
library(lubridate)
library(tibbletime)
library(timetk)
index = seq(as.Date("1749/1/1"), as.Date("2016/1/1"),by="day")
index <- format(as.Date(index), "%Y/%m/%d")
value <- rnorm(97520,27,2.1)
final_data <- data.frame(index, value)
y.mon<-aggregate(value~format(as.Date(index),
format="%Y/%m"),data=final_data, FUN=sum)
y.mon$index = y.mon$`format(as.Date(index), format = "%Y/%m")`
y.mon$`format(as.Date(index), format = "%Y/%m")` = NULL
y.mon %>%
mutate(index = paste0(index, '/01')) %>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index) -> y.mon
From here on, I follow the instructions in the tutorial, replacing the "sun_spots" data with "y.mon". Everything works fine up to the point below. (I posted a question yesterday that got closed for being too detailed: https://stackoverflow.com/questions/65527230/r-error-in-is-symbolx-object-not-found-keras. The code can be followed from the RStudio tutorial.)
#ERROR
coln <- colnames(compare_train)[4:ncol(compare_train)]
cols <- map(coln, quo(sym(.)))
rsme_train <-
map_dbl(cols, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_train
Error in is_symbol(x) : object '.' not found
I found another Stack Overflow post which deals with a similar problem: "Getting error message while calculating rmse in a time series analysis".
According to this stackoverflow post, this first error can be resolved like this:
coln <- colnames(compare_train)[4:ncol(compare_train)]
rsme_train <-
map_df(coln, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>%
pull(.estimate) %>%
mean()
rsme_train
However, the next part of the tutorial contains similar code, and there the same error persists even after applying the corrections:
compare_test %>% write_csv(str_replace(model_path, ".hdf5", ".test.csv"))
compare_test[FLAGS$n_timesteps:(FLAGS$n_timesteps + 10), c(2, 4:8)] %>% print()
cols <- map(coln, quo(sym(.)))
rsme_test <-
map_dbl(cols, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_test
#errors:
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
object 'model_path' not found
Error in is_symbol(x) : object '.' not found
These errors are preventing me from finishing the rest of the tutorial.
Can someone please show me how to fix these?
Thanks
Try using coln directly in map_dbl:
rsme_test <- map_dbl(coln, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
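If a list of symbols is still wanted, as in the tutorial's cols, one option is rlang::syms(), which avoids the quo(sym(.)) construction entirely. The sketch below is self-contained under a few assumptions: the data frame, its column names, and rmse_one() are hypothetical stand-ins (rmse_one() is not yardstick::rmse(), it only mimics the truth/estimate interface to show how !!col is injected).

```r
library(rlang)

# Hypothetical stand-in for compare_train: one truth column, two prediction columns
compare_toy <- data.frame(
  value       = c(1, 2, 3),
  pred_train1 = c(1.1, 1.9, 3.2),
  pred_train2 = c(0.8, 2.1, 2.9)
)

coln <- colnames(compare_toy)[2:ncol(compare_toy)]

# Turn the column names into a list of symbols
cols <- syms(coln)

# Hypothetical helper mimicking the truth/estimate interface of an RMSE function
rmse_one <- function(data, truth, estimate) {
  truth    <- eval_tidy(enquo(truth), data)
  estimate <- eval_tidy(enquo(estimate), data)
  sqrt(mean((truth - estimate)^2, na.rm = TRUE))
}

# !!col injects each symbol so it is looked up as a column of the data
rmse_all <- vapply(
  cols,
  function(col) rmse_one(compare_toy, truth = value, estimate = !!col),
  numeric(1)
)

mean(rmse_all)
```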

I don't understand how map() and prepper() work

I understand how map() behaves in code like the following.
iris %>%
group_nest(Species) %>%
mutate(lm_mod = map(data,function(x){
lm(Sepal.Width~Sepal.Length,x)
}))
In my head, the above code works roughly like this loop:
lm_mod <- list()
for (i in levels(iris$Species)) {
data <- iris[iris$Species == i, ]  # rows for one species
lm_mod[[i]] <- lm(Sepal.Width ~ Sepal.Length, data)
}
But I'm confused when I encounter new code.
folds %>%
mutate(recipes = map(splits, prepper, recipe = recipe_code))
In my head, this becomes something like:
for(i in len(folds))
folds$recipes <- prepper(splits) ??recipe = recipe_code??
Where does the recipe_code go in?
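As a sketch of how purrr handles this: any extra arguments given to map() after the function are forwarded unchanged to every call, so each element of splits becomes the first argument and recipe = recipe_code is passed along as a named argument. The example below uses toy_prepper(), a hypothetical stand-in for prepper() that only takes a split first and a recipe second, and plain strings in place of real splits and recipes.

```r
library(purrr)

# Hypothetical stand-in for prepper(): the split comes first, the recipe second
toy_prepper <- function(split_obj, recipe) {
  paste(recipe, "prepped on", split_obj)
}

splits      <- list("fold1", "fold2")  # placeholders for rsample splits
recipe_code <- "my_recipe"             # placeholder for a recipe object

# 1. Extra argument forwarded by map()
via_dots <- map(splits, toy_prepper, recipe = recipe_code)

# 2. The same call written with an explicit anonymous function
via_lambda <- map(splits, function(s) toy_prepper(s, recipe = recipe_code))

# 3. The same thing as a for loop
via_loop <- vector("list", length(splits))
for (i in seq_along(splits)) {
  via_loop[[i]] <- toy_prepper(splits[[i]], recipe = recipe_code)
}

identical(via_dots, via_lambda)
identical(via_dots, via_loop)
```

So recipe_code does not vary across iterations; only the element taken from splits does.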

Error: in [Seize]->Timeout->[Release]: Expecting a single value: [extent=11]

I'm using the R package simmer to run a simulation. However, I receive this error message every time I run it:
Error: 'truck0' at 48.73 in [Seize]->Timeout->[Release]: Expecting a
single value: [extent=11].
What is wrong with this?
This is my R script:
rm(list=ls())
#load packages
library(simmer)
library(simmer.plot)
#create a simulation environment
env <- simmer("Terminal")
env
#create a truck trajectory
truck <- trajectory("Truck path", verbose = TRUE)
truck
#draw model
truck %>%
seize("frontdesk",1) %>%
timeout(function() rnorm(11.27671,3.233562)) %>%
release("frontdesk",1) %>%
seize("gate-in",1) %>%
timeout(function() rnorm(17.54509,9.915719)) %>%
release("gate-in",1) %>%
seize("station",1) %>%
timeout(function() rnorm(12.68418,12.55247)) %>%
release("station",1) %>%
seize("lashing",1) %>%
timeout(function() rnorm(28.87726,21.0809)) %>%
release("lashing",1) %>%
seize("control",1) %>%
timeout(function() rnorm(12.70417,3.711475)) %>%
release("control",1) %>%
seize("frontdesk end",1) %>%
timeout(function()rnorm(11.27671,3.233562)) %>%
release("frontdesk end",1)
env <- lapply(1:100, function(i) {
simmer("Terminal") %>%
add_resource("frontdesk", 2) %>%
add_resource("gate-in", 2) %>%
add_resource("station", 1) %>%
add_resource("lashing", 15) %>%
add_resource("control", 1) %>%
add_resource("frontdesk end", 2) %>%
add_generator(name = "truck" ,
trajectory = truck,
distribution = function() rnorm(1,24.992,36.015)) %>%
run(660) %>%
wrap()
})
As the error indicates, timeout activities expect a single value, and you are providing 11 in this case. Because of this:
timeout(function() rnorm(11.27671,3.233562))
rnorm's first parameter is the number of samples (which is truncated to 11 in this case). What are you trying to do here? If that's supposed to be mean=11.27, sd=3.23, then you need to pass n = 1 as the first argument:
timeout(function() rnorm(1, 11.27671,3.233562))
so that you get a single sample per call, as required. And the same applies for all the other timeouts.
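The difference can be checked directly in base R, without simmer. In this sketch the variable names wrong and right are only for illustration:

```r
set.seed(1)

# First argument is n: 11.27671 is taken as 11 draws with mean 3.233562 and sd 1
wrong <- rnorm(11.27671, 3.233562)

# n = 1: a single draw with mean 11.27671 and sd 3.233562
right <- rnorm(1, 11.27671, 3.233562)

length(wrong)  # 11
length(right)  # 1
```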
EDIT: Also, I do not recommend using a normal distribution for service times, because a normal distribution may return negative values (that are by default coerced to positive), and thus you may get unexpected results.
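As one possible alternative, a gamma distribution only produces positive values and can be parameterized to match a given mean and standard deviation. This is an assumption chosen for illustration, not something the answer prescribes; which distribution actually fits the measured service times would need to be checked against the data.

```r
# Target moments taken from the first timeout in the question
service_mean <- 11.27671
service_sd   <- 3.233562

# Gamma parameters with the same mean and standard deviation:
# mean = shape / rate, variance = shape / rate^2
shape <- (service_mean / service_sd)^2
rate  <- service_mean / service_sd^2

set.seed(42)
draws <- rgamma(10000, shape = shape, rate = rate)

# Sample mean and sd should be close to the targets, and no draw is negative
c(mean = mean(draws), sd = sd(draws), min = min(draws))

# In a trajectory this could be used as:
# timeout(function() rgamma(1, shape = shape, rate = rate))
```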

Error when trying to load dl format using igraph

I am trying to load the Kapferer mine dataset into R using the igraph function "read_graph".
The code is very simple, however it throws an error.
test_g <-read_graph("http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/kapmine.dat", format = "dl")
Error in read.graph.dl(file, ...) : At foreign.c:3050 : syntax
error, unexpected $end, expecting DL in line 1, Parse error
As can be seen by following the link, the file does begin with DL. The only clue I can find is a message from 2015 which basically says to file a bug report.
Can dl files not be loaded by igraph at the moment, or is there some trick to it?
As there doesn't seem to be a clear way to load dl files, I have written a loader that seems to work well for the dl graphs on the Pajek website. The function is a bit scrappy and has not been extensively tested, but it may be useful to those who want to use graphs that are not available in a more common format. If more up-to-date information on loading these datasets exists, this code can be ignored.
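library(tidyverse)  # enframe, slice, select, pull, str_squish, separate, map, pivot_longer
library(igraph)     # graph_from_data_frame
load_dl_graph <- function(file_path, directed){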
raw_mat <- readLines(file_path) %>%
enframe()
row_labels_row <- grep( "ROW LABELS:", raw_mat$value)
column_labels_row <- grep( "COLUMN LABELS:", raw_mat$value)
level_labels_row <- grep("LEVEL LABELS:",raw_mat$value )
data_table_row <- grep( "DATA:", raw_mat$value)
row_labels <- raw_mat %>%
slice((row_labels_row+1):(column_labels_row-1)) %>%
select(from = value)
column_labels <- raw_mat %>%
slice((column_labels_row+1):(level_labels_row-1)) %>% pull(value)
table_levels <- raw_mat %>%
slice((level_labels_row+1):(data_table_row-1)) %>% pull(value)
data_df <- raw_mat %>%
slice((data_table_row+1):nrow(.)) %>%
select(value) %>%
mutate(value = str_squish(value)) %>%
separate(col = value, into = column_labels, sep = " ") %>%
mutate(table_id = rep(1:length(table_levels), each = nrow(.)/length(table_levels)))
tables_list <- 1:length(table_levels) %>%
map(~{
data_df %>%
filter(table_id ==.x) %>%
select(-table_id) %>%
bind_cols(row_labels,.) %>%
pivot_longer(cols = 2:ncol(.), names_to = "to", values_to = "values") %>%
filter(values ==1) %>%
select(-values) %>%
graph_from_data_frame(., directed = directed)
})
names(tables_list) <- table_levels
return(tables_list)
}
