Error in `validate_index()`: ! Column `Date` (index) must not contain `NA` using modeltime package in R

I'm having trouble visualizing and calibrating time series using the modeltime package. In the following code, I encounter two similar errors:
# Webscrape
webpage <- read_html("https://www.westmetall.com/en/markdaten.php?action=table&field=LME_Al_cash")
alu_table <- html_nodes(webpage, "table") %>%
html_table(fill = TRUE)
alu_df <- map_dfr(alu_table, magrittr::extract, c("date", "LME Aluminium Cash-Settlement"))
# Names
names(alu_df)[1] <- "Date"
names(alu_df)[2] <- "Aluminium_Prices"
# Clear rows in data frame
alu_df_cleaned <- alu_df %>%
# Remove unwanted rows
filter(Date != "date" & Aluminium_Prices != "LME Aluminium Cash-Settlement")
# Change Date type
alu_df_cleaned$Date <- dmy(alu_df_cleaned$Date)
# Proper structure of integers to translate them to the numeric class
alu_df_cleaned$Aluminium_Prices <- gsub(",", "", alu_df_cleaned$Aluminium_Prices)
alu_df_cleaned$Aluminium_Prices <- as.numeric(alu_df_cleaned$Aluminium_Prices)
# Sorting prices by Date
alu_df_cleaned <- alu_df_cleaned %>%
arrange(Date)
# Check if there are any NA's
sum(is.na(alu_df_cleaned$Aluminium_Prices))
df_na <- as.data.frame(
cbind(
map(
map(alu_df_cleaned$Aluminium_Prices, is.na), sum)
)
)
rownames(subset(df_na, df_na$V1 != 0))
# Convert to tsibble
alu_ts_data <- alu_df_cleaned %>%
mutate(row_name = row_number()) %>%
tsibble::as_tsibble(index = Date, key = row_name)
# Drop NA's from the dataset
alu_ts_data <- alu_ts_data %>%
drop_na()
# Split the data into train and test
set.seed(1353)
splits <- initial_time_split(alu_ts_data)
train <- training(splits)
test <- testing(splits)
# Visualise the Splits
splits %>%
tk_time_series_cv_plan() %>%
plot_time_series_cv_plan(.date_var = Date,
.value = Aluminium_Prices)
Error in `validate_index()`:
! Column `Date` (index) must not contain `NA`.
Backtrace:
1. splits %>% tk_time_series_cv_plan() %>% ...
10. tidyr:::unnest.data.frame(., .value)
11. tidyr::unchop(...)
12. tidyr:::df_unchop(...)
13. vctrs::list_unchop(col, ptype = col_ptype)
14. vctrs (local) `<fn>`()
16. tsibble:::vec_restore.tbl_ts(x = x, to = to)
17. tsibble::build_tsibble(...)
18. tsibble:::validate_index(tbl, !!qindex)
Error in validate_index(tbl, !!qindex) :
# Multiple models
# Auto ARIMA Model
arima_auto_fit <- arima_reg() %>%
set_engine("auto_arima") %>%
fit(Aluminium_Prices ~ Date, data = train)
# Boosted ARIMA Model
arima_boost_fit <- arima_boost() %>%
set_engine("auto_arima_xgboost") %>%
fit(Aluminium_Prices ~ Date, data = train)
# Exponential Smoothing Model
ets_fit <- exp_smoothing() %>%
set_engine("ets") %>%
fit(Aluminium_Prices ~ Date, data = train)
# Prophet Model
prophet_fit <- prophet_reg() %>%
set_engine("prophet") %>%
fit(Aluminium_Prices ~ Date, data = train)
# Linear Model
lm_fit <- linear_reg() %>%
set_engine("lm") %>%
fit(Aluminium_Prices ~ Date, data = train)
# Put in Table and Calibrate
models_tbl <- modeltime_table(
arima_auto_fit,
arima_boost_fit,
ets_fit,
prophet_fit,
lm_fit
)
# Calibrate
calibrate_tbl <- models_tbl %>%
modeltime::modeltime_calibrate(new_data = test, quiet = FALSE)
Error: Column `Date` (index) must not contain `NA`.
Error: Column `Date` (index) must not contain `NA`.
Error: Column `Date` (index) must not contain `NA`.
Error: Column `Date` (index) must not contain `NA`.
Error: Column `Date` (index) must not contain `NA`.
── Model Calibration Failure Report ────────────────────────
All models failed Modeltime Calibration:
- Model 1: Failed Calibration.
- Model 2: Failed Calibration.
- Model 3: Failed Calibration.
- Model 4: Failed Calibration.
- Model 5: Failed Calibration.
Potential Solution: Use `modeltime_calibrate(quiet = FALSE)` AND Check the Error/Warning Messages for clues as to why your model(s) failed calibration.
── End Model Calibration Failure Report ────────────────────
Error in `validate_modeltime_calibration()`:
! All models failed Modeltime Calibration.
Backtrace:
1. models_tbl %>% ...
3. modeltime:::modeltime_calibrate.mdl_time_tbl(...)
4. modeltime:::validate_modeltime_calibration(ret)
Error in validate_modeltime_calibration(ret)
I checked my dataset for NAs and found two. I then dropped them, but that did not resolve the problem. I would greatly appreciate any help.
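The backtrace above ends in tsibble:::vec_restore.tbl_ts() and tsibble::build_tsibble(), which suggests the error is raised while the tsibble class is being rebuilt, not because the data itself still contains missing values. One thing that may be worth trying, as an assumption and not a confirmed fix, is to keep the data as a plain tibble before splitting. The sketch below uses made-up toy values in place of the scraped prices, and `alu_tbl` is a hypothetical name.

```r
library(tibble)
library(rsample)

# Toy stand-in for the cleaned price data (values are invented).
# With the real data, something like tibble::as_tibble(alu_ts_data)
# would drop the tsibble class before splitting.
alu_tbl <- tibble(
  Date = seq(as.Date("2022-01-03"), by = "day", length.out = 20),
  Aluminium_Prices = seq(2800, by = 5, length.out = 20)
)

# Time-ordered split: the first 75% of rows go to training
splits <- initial_time_split(alu_tbl, prop = 0.75)
train <- training(splits)
test <- testing(splits)

# Neither piece carries the tsibble class any more
class(train)
```

If the split then plots and calibrates without the `validate_index()` error, the tsibble class was the likely trigger.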

Related

R - Looping linear regression results for time series

I want to run linear regressions of NZD against a number of securities.
I have some code that runs the regression, but rather than applying it to each security separately, I would prefer to loop through the list of securities and produce a file with the R² results from each linear regression.
My dependent variable is called: nzdusd
The independent variables I would like to loop through are spx, adxy, vix
Code as it currently stands with spx (I would like to use the same code to loop through adxy and vix as well):
library(tseries)
library(lmtest)
library(dplyr)
library(lubridate)
# 3 month regression, change variable here to get number of days
# e.g. 3 months sd = 60
# inputs
# 3 month regression
sd <- 60
# loading my market data from a saved location (variables nzdusd,spx, adxy, vix)
my_path <- file.path("K:", "X", "bbg_daily.Rdata")
load(file = my_path)
# Transform NZD into percentage change
pct.nzdusd <- nzdusd %>%
select(date, PX_LAST) %>%
mutate(lag = lag(PX_LAST),
pct_chg = (PX_LAST - lag) * 100 / lag) %>%
select(date, pct_chg)
# SPX(S&P 500)
myfun <- function(x) {
deparse(substitute(x))
}
# ^=^=^=^=^=^=^=^=^=^=^=^=^=^=
mysec_str <- myfun(spx)
mysec <- spx
z <- 5 # Series ID
# ^=^=^=^=^=^=^=^=^=^=^=^=^=^=
# Transform into percentage change
mypct <- mysec %>%
select(date, PX_LAST) %>%
mutate(lag = lag(PX_LAST),
pct_chg = (PX_LAST - lag) * 100 / lag) %>%
select(date, pct_chg)
assign(paste("pct.", mysec_str, sep = ""),mypct)
# join times series
ts <- paste("ts_", z, sep ="")
ts <- (inner_join(x = pct.nzdusd, y = mypct, by = "date"))
# get last row
last_row <- ts %>% slice(n())
end_dt <- last_row [1,1]
# start date declared above depending on regression
start_dt <- ts[((nrow (ts))-sd),1]
# getting subset of time series
ts_sub <- subset(ts,
date >= as.POSIXct(start_dt) &
date <= as.POSIXct(end_dt))
# regression
reg.ts = lm(pct_chg.x~pct_chg.y, ts_sub)
r2 <- summary(reg.ts)$r.squared
assign(paste(mysec_str, ".r2", sep = ""),r2)
stderr <- sqrt(deviance(reg.ts)/df.residual(reg.ts))
assign(paste(mysec_str, ".stderr", sep = ""),stderr)
#===================================================
r2 <- c(spx.r2, adxy.r2, vix.r2)
my_path2 <- file.path ("K:","x")
save (r2, file = my_path2 )
So far I've written the code by copying and pasting and then replacing spx with the other variable names. But I know the code could be a lot slicker with a loop, particularly if I want to add many more independent variables.
It's hard to know without reprex data, but for running multiple models I've found that pivoting longer, nesting by independent variable, and then mutating over the nested data works well. If your data contains just your dependent and independent variables, you can:
library(tidyverse)
ts_sub %>%
  # Keep the dependent variable (nzdusd) outside the nested data
  pivot_longer(-nzdusd, names_to = "independent_var", values_to = "values") %>%
  nest_by(independent_var) %>%
  mutate(model = list(lm(nzdusd ~ values, data = data)))
See: https://dplyr.tidyverse.org/reference/nest_by.html
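For comparison, the loop the question asks for can also be sketched in base R. This is a minimal sketch on randomly generated toy data, assuming one wide data frame with the percentage changes already joined by date; `toy` and `predictors` are hypothetical names, not objects from the question.

```r
set.seed(1)

# Toy stand-in for the joined percentage-change series (random numbers)
toy <- data.frame(
  nzdusd = rnorm(60),
  spx    = rnorm(60),
  adxy   = rnorm(60),
  vix    = rnorm(60)
)

predictors <- c("spx", "adxy", "vix")

# Fit nzdusd ~ <predictor> once per predictor and keep only the R-squared
r2 <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "nzdusd"), data = toy)
  summary(fit)$r.squared
})

r2  # named numeric vector, one R-squared per predictor
```

Adding another security then only means adding its column name to `predictors`.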

Future dataset is incomplete when using Fable Prophet

I'm trying to view the out of sample performance scores after running fable prophet. Please note, the forecast is grouped based on type and the forecast is looking 5 observations ahead.
Here is the code:
library(tibble)
library(tsibble)
library(fable.prophet)
lax_passengers <- read.csv("https://raw.githubusercontent.com/mitchelloharawild/fable.prophet/master/data-raw/lax_passengers.csv")
library(dplyr)
library(lubridate)
lax_passengers <- lax_passengers %>%
mutate(datetime = mdy_hms(ReportPeriod)) %>%
group_by(month = yearmonth(datetime), type = Domestic_International) %>%
summarise(passengers = sum(Passenger_Count)) %>%
ungroup()
lax_passengers <- as_tsibble(lax_passengers, index = month, key = type)
fit <- lax_passengers %>%
model(
mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative")),
)
fit
test_tr <- lax_passengers %>%
slice(1:(n()-5)) %>%
stretch_tsibble(.init = 12, .step = 1)
fc <- test_tr %>%
model(
mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative")),
) %>%
forecast(h = 5)
fc %>% accuracy(lax_passengers)
When I run fc %>% accuracy(lax_passengers), I get the following warning:
Warning message:
The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
5 observations are missing between 2019 Apr and 2019 Aug
How do I make the future dataset complete? I believe the performance score isn't accurate because of the 5 missing observations.
It seems like when I try to stretch the tsibble, it doesn't slice correctly as it doesn't remove the last 5 observations from each type.
The slice() function removes rows from the entire dataset, so it is only removing the last 5 rows from your last key (type=="International"). To remove the last 5 rows from all keys, you'll need to group by keys and slice.
test_tr <- lax_passengers %>%
group_by_key() %>%
slice(1:(n()-5)) %>%
ungroup() %>%
stretch_tsibble(.init = 12, .step = 1)
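The difference between slicing the whole table and slicing per key can be seen on a small toy table. This sketch uses a plain tibble with `group_by(type)` as a stand-in for `group_by_key()` on the tsibble; the data and the names `toy`, `ungrouped` and `grouped` are made up for illustration.

```r
library(dplyr)

# Two keys with 8 rows each, ordered by key like the tsibble in the question
toy <- tibble(
  type = rep(c("Domestic", "International"), each = 8),
  t    = rep(1:8, times = 2)
)

# Ungrouped: the last 5 rows of the whole table are dropped,
# so only the last key loses observations
ungrouped <- toy %>%
  slice(1:(n() - 5)) %>%
  count(type)

# Grouped: the last 5 rows are dropped within each key
grouped <- toy %>%
  group_by(type) %>%
  slice(1:(n() - 5)) %>%
  ungroup() %>%
  count(type)

ungrouped
grouped
```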

R: Tibble Conversions

I am using the R programming language. I am following this tutorial over here: https://blogs.rstudio.com/ai/posts/2018-06-25-sunspots-lstm/
I am trying to prepare my data in the same way as this example over here:
# Core Tidyverse
library(tidyverse)
library(glue)
library(forcats)
# Time Series
library(timetk)
library(tidyquant)
library(tibbletime)
# Visualization
library(cowplot)
# Preprocessing
library(recipes)
# Sampling / Accuracy
library(rsample)
library(yardstick)
# Modeling
library(keras)
library(tfruns)
#here is what I am trying to copy
sun_spots <- datasets::sunspot.month %>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index)
sun_spots
# A time tibble: 3,177 x 2
# Index: index
index value
<date> <dbl>
1 1749-01-01 58
2 1749-02-01 62.6
3 1749-03-01 70
4 1749-04-01 55.7
5 1749-05-01 85
6 1749-06-01 83.5
7 1749-07-01 94.8
8 1749-08-01 66.3
9 1749-09-01 75.9
10 1749-10-01 75.5
# ... with 3,167 more rows
In this example, the formatted data is of dimension 3,177 x 2.
I figured, that I should be able to simulate data in a similar form (using the same names as the data in the tutorial):
index = seq(as.Date("1749/1/1"), as.Date("2016/1/1"),by="day")
index <- format(as.Date(index), "%Y/%m/%d")
value <- rnorm(97520,27,2.1)
final_data <- data.frame(index, value)
y.mon<-aggregate(value~format(as.Date(index),
format="%Y/%m"),data=final_data, FUN=sum)
y.mon$index = y.mon$`format(as.Date(index), format = "%Y/%m")`
y.mon$`format(as.Date(index), format = "%Y/%m")` = NULL
#resulting file is y.mon
Now, when I try to convert my file to the required format:
y.mon_mod <- y.mon%>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index)
I get the following error:
Error: Problem with `mutate()` input `index`.
x 'origin' must be supplied
i Input `index` is `as_date(index)`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In tk_tbl.data.frame(.) :
Warning: No index to preserve. Object otherwise converted to tibble successfully.
Does anyone know why this error happens? I checked my environment and it says that the "namespace" library has been loaded. Is it because my "date" (index) variable is not in the correct format? Does anybody know how to fix this problem?
Thanks
Change your index column so that it can be converted to a date object, here by appending a day to each year/month value.
library(dplyr)
library(lubridate)
library(tibbletime)
library(timetk)
y.mon %>%
mutate(index = paste0(index, '/01')) %>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index) -> y.mon
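Why the appended day matters can be checked in base R alone. The sketch below uses two hand-written year/month strings in the same shape as the aggregated index; `ym`, `no_day` and `with_day` are hypothetical names.

```r
# Year/month strings in the same shape as the aggregated index
ym <- c("1749/01", "1749/02")

# Without a day component the strings cannot be parsed as dates
no_day <- as.Date(ym, format = "%Y/%m")

# Appending a day gives a complete date that parses cleanly
with_day <- as.Date(paste0(ym, "/01"), format = "%Y/%m/%d")

is.na(no_day)
with_day
```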

fable package error: no applicable method for 'model' applied to

Here is my code:
library(fpp3)
val <- seq(1,100,1)
time <- seq.Date(as.Date("2010-01-01"), by = "day", length.out = 100 )
df <- data.frame(val = val, time = time)
fit <- df %>% as_tibble(., index = time) %>%
model(arima = ARIMA(val))
It generates error:
Error in UseMethod("model") :
no applicable method for 'model' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
I am not sure what I am doing wrong. I do not see how it is different from this fable example
Here we need as_tsibble instead of as_tibble. According to ?model:
.data - A data structure suitable for the models (such as a tsibble)
library(dplyr)
library(fpp3)
df %>%
as_tsibble(., index = time) %>%
model(arima = ARIMA(val))
# A mable: 1 x 1
# arima
# <model>
#1 <ARIMA(0,1,0)>
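A quick way to see the difference is to compare the classes of the two objects. This is a small sketch reusing the data frame from the question; `tbl` and `tsb` are hypothetical names.

```r
library(tsibble)

val <- seq(1, 100, 1)
time <- seq.Date(as.Date("2010-01-01"), by = "day", length.out = 100)
df <- data.frame(val = val, time = time)

# A plain tibble has no time index, so it lacks the tsibble class
tbl <- tibble::as_tibble(df)

# A tsibble carries the "tbl_ts" class and knows its index column
tsb <- as_tsibble(df, index = time)

class(tbl)
class(tsb)
```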

How to forecast an arima with Dynamic regression models for grouped data?

I'm trying to forecast an ARIMA model with regressors (regression with ARIMA errors) for several time series at once, using grouped data.
I'm new to tidy data, so basically I'm reproducing this example (https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html) with a multivariate time series and a multivariate model.
Here is a reproducible example:
library(tidyverse); library(tidyquant)
library(timetk); library(sweep)
library(forecast)
library(tsibble)
library(fpp3)
# using package data
bike_sales
# grouping data
monthly_qty_by_cat2 <- bike_sales %>%
mutate(order.month = as_date(as.yearmon(order.date))) %>%
group_by(category.secondary, order.month) %>%
summarise(total.qty = sum(quantity), price.m = mean(price))
# using nest
monthly_qty_by_cat2_nest <- monthly_qty_by_cat2 %>%
group_by(category.secondary) %>%
nest()
monthly_qty_by_cat2_nest
# Forecasting Workflow
# Step 1: Coerce to a ts object class
monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
mutate(data.ts = map(.x = data,
.f = tk_ts,
select = -order.month, # take off date
start = 2011,
freq = 12))
# Step 2: modeling an ARIMA(y ~ x)
# make a function to map
modeloARIMA_reg <- function(y,x) {
result <- ARIMA(y ~ x)
return(list(result))}
# map the function
monthly_qty_by_cat2_fit <- monthly_qty_by_cat2_ts %>%
mutate(fit.arima = map(data.ts, modeloARIMA_reg))
monthly_qty_by_cat2_fit
Here I don't know whether map is using the right variable as y (the dependent variable), but when I carry on and try the forecast, an error appears:
# Step 3: Forecasting the model
monthly_qty_by_cat2_fcast <- monthly_qty_by_cat2_fit %>%
mutate(fcast.ets = map(fit.arima, forecast))
# this gives me this error
# Error: Problem with `mutate()` input `fcast.arima`.
# x non-numeric argument to binary operator
# i Input `fcast.arima` is `map(fit.arima, forecast)`.
# i The error occured in group 1: category.secondary = "Cross Country Race".
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
# In mean.default(x, na.rm = TRUE) :
# argument is not numeric or logical: returning NA
Two questions emerge:
I don't know how to supply the mean of the independent variable (x) for each group;
and how to pass this new data as a forecast argument.
PS: The result doesn't need to be a tibble or nested; I just need the point forecast and the CI (total.qty lo.95 hi.95)
Well, this code solved the problem for me.
It makes one forecast for each time series (grouped tsibble) and uses each series' own mean value as the future data in the forecast.
Any comments are welcome.
# MY FLOW
monthly_qty_by_cat2 <-
sweep::bike_sales %>%
mutate(order.month = yearmonth(order.date)) %>%
group_by(category.secondary, order.month) %>%
summarise(total.qty = sum(quantity), price.m = mean(price)) %>%
as_tsibble(index=order.month, key=category.secondary) # coerce to a tsibble
# mean for the future
futuro <- monthly_qty_by_cat2 %>%
group_by(category.secondary) %>%
mutate(fut_x = mean(price.m)) %>%
do(price.m = head(.$fut_x,1))
# as.numeric
futuro$price.m <- as.numeric(futuro$price.m)
futuro
# make values in the future
future_x <- new_data(monthly_qty_by_cat2, 3) %>%
left_join(futuro, by = "category.secondary")
future_x
# model and forecast
fc <- monthly_qty_by_cat2 %>%
group_by(category.secondary) %>%
model(ARIMA(total.qty ~ price.m)) %>%
forecast(new_data=future_x) %>%
hilo(level = 95) %>%
unpack_hilo("95%")
fc
# Tidy the forecast
fc_tibble <- fc %>% as_tibble() %>% select(-total.qty)
fc_tibble
# the end
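The per-group mean used as future data can probably be computed more directly with summarise(). This is a minimal sketch on a made-up plain tibble, not the answer's tsibble; as far as I know, summarise() on a tsibble also keeps the index, so the real data would likely need as_tibble() first. The `toy` table and its values are invented.

```r
library(dplyr)

# Toy stand-in for the monthly data: two categories, three months each
toy <- tibble(
  category.secondary = rep(c("A", "B"), each = 3),
  price.m = c(10, 20, 30, 100, 200, 300)
)

# One row per category holding the mean price, ready to be joined
# onto the future rows by category.secondary
futuro <- toy %>%
  group_by(category.secondary) %>%
  summarise(price.m = mean(price.m), .groups = "drop")

futuro
```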
