For context, I'm a novice R user, so please forgive any incorrect terminology/processes. I am actively trying to improve my coding ability, but recently have become stumped.
I have the following data set where A * B * C = Output:
Date A B C Output
1/1/2013 177352 0.908329198 0.237047935 38187
1/2/2013 240724 0.852033865 0.237273592 48666
1/3/2013 243932 0.908380204 0.237039845 52524
1/4/2013 221485 0.820543152 0.236356733 42955
1/5/2013 202590 0.818066045 0.240900973 39925
1/6/2013 238038 0.770057722 0.247344561 45339
1/7/2013 271511 0.794258796 0.241252029 52026
1/8/2013 283434 0.807817693 0.233810703 53534
1/9/2013 275016 0.843220031 0.243769917 56530
1/10/2013 255266 0.797791324 0.238562428 48583
1/11/2013 226564 0.815791564 0.236153417 43648
1/12/2013 214366 0.800066242 0.237961133 40812
1/13/2013 256946 0.764845532 0.237640186 46702
1/14/2013 282298 0.816537843 0.234257528 53998
I have a few years' worth of data and I'm trying to forecast Output using A, B, and C. However, when I model A, B, and C individually, the resulting Output becomes very skewed. If I forecast just Output, I lose the input factors.
What is the best package/code to accomplish this task? I've tried Googling and searching here for a number of different methods, but haven't found the solution I'm looking for.
Here is some of the code:
library(lubridate) # provides mdy()
library(forecast) # provides forecast()
DataSet1[,"Date"] <- mdy(DataSet[,"Date"])
DataSet1
TotalSet <- ts(DataSet1, frequency = 365, start =c(2013,1))
DataA <- ts(DataSet1$A, frequency = 365, start = c(2013,1))
DataB <- ts(DataSet1$B, frequency = 365, start = c(2013,1))
DataC <- ts(DataSet1$C, frequency = 365, start = c(2013,1))
OutputData <- ts(DataSet$Output, frequency = 365, start = c(2013,1))
ADecompose <- decompose(DataA)
BDecompose <- decompose(DataB)
CDecompose <- decompose(DataC)
OutputDecompose <- decompose(OutputData)
DataAHW <- HoltWinters(DataA, seasonal = "mult")
DataBHW <- HoltWinters(DataB, seasonal = "mult")
DataCHW <- HoltWinters(DataC, seasonal = "mult")
OutputDataHW <- HoltWinters(OutputData, seasonal = "mult")
# forecast.HoltWinters() is not exported by current versions of the forecast package; call the generic forecast() instead
FC.A <- forecast(DataAHW)
FC.B <- forecast(DataBHW)
FC.C <- forecast(DataCHW)
FC.Output <- forecast(OutputDataHW)
plot(FC.A)
plot(FC.B)
plot(FC.C)
plot(FC.Output)
Here is another model I built for Output. I've also plugged A, B, and C into it individually and then combined the results in Excel. I'm sure there is a more appropriate way to handle this, but given my lack of experience I am reaching out for help.
dataset <- testData
##FORECAST
forecastingFuntion <- function(dataset, lenghtOfForecast)
{
dataset[,"Date"] <- mdy(dataset[,"Date"])
myts <- ts(dataset[,"Output"], start = c(2013,1), frequency = 365)
hwModel <- HoltWinters(myts, seasonal = "mult")
future <- data.frame(predict(hwModel, n.ahead = lenghtOfForecast, level = 0.9))
fittedValues <- data.frame(as.numeric(hwModel$fitted[,"xhat"]))
names(fittedValues) <- "fit"
futureDates <- c()
predicitedValues <- rbind(fittedValues, future)
for(i in 1: lenghtOfForecast)
{
futureDateSingle <- data.frame(dataset[nrow(dataset),"Date"] + days(i))
futureDates <- rbind(futureDates, futureDateSingle)
}
names(futureDates) <- "Date"
dates <- data.frame(dataset[366:(nrow(dataset)),"Date"])
names(dates) <- "Date"
dates <- rbind(dates, futureDates)
predictedData <- data.frame(predicitedValues, dates)
names(predictedData) <- c("predictedValues","Date")
finalData2 <- mergeData <- merge(predictedData, dataset, all.x = T, all.y = F, by = "Date")
finalData2
}
finalData2 <- forecastingFuntion(testData, 612)
rm(list=setdiff(ls(), c("finalData2")))
write.csv(finalData2, file="B2BForecastVisits.csv")
Thanks!
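One way to keep the inputs in the picture is to do the combining step in R instead of Excel: forecast A, B and C separately, multiply the forecasts, and compare the product with a direct forecast of Output. Below is a minimal sketch of that idea, not a recommendation of a particular model. It assumes made-up weekly-seasonal data in place of the real columns, uses a hypothetical helper called hw_forecast(), and fixes the Holt-Winters smoothing parameters at arbitrary values so that the example is deterministic; on real data you would normally let HoltWinters() estimate them.

```r
# Made-up stand-ins for columns A, B and C (16 weeks of daily data, weekly pattern).
set.seed(42)
n <- 7 * 16
day <- seq_len(n)
A <- ts(200000 + 30000 * sin(2 * pi * day / 7) + rnorm(n, sd = 5000), frequency = 7)
B <- ts(0.80 + 0.05 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.01), frequency = 7)
C <- ts(0.24 + 0.005 * cos(2 * pi * day / 7) + rnorm(n, sd = 0.002), frequency = 7)
Output <- A * B * C

h <- 14  # forecast horizon in days

# Hypothetical helper: multiplicative Holt-Winters point forecasts.
# alpha, beta and gamma are fixed arbitrarily here (an assumption of this sketch).
hw_forecast <- function(x, h) {
  fit <- HoltWinters(x, alpha = 0.3, beta = 0.1, gamma = 0.2,
                     seasonal = "multiplicative")
  as.numeric(predict(fit, n.ahead = h))
}

fc <- data.frame(
  step = seq_len(h),
  A = hw_forecast(A, h),
  B = hw_forecast(B, h),
  C = hw_forecast(C, h)
)
fc$OutputFromInputs <- fc$A * fc$B * fc$C   # product of the component forecasts
fc$OutputDirect <- hw_forecast(Output, h)   # Output forecast on its own

head(fc)
```

Putting the two Output columns side by side makes it easy to see where the product of the component forecasts drifts away from the direct forecast.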
I want to extract the .metrics (RMSE) from a rolling-origin forecast resampling result
(tibble: 52 x 5), grouped by the "id" column, which identifies the slices.
Reproducible code is given below. Here is my attempt:
metric <- resamples_fitted$.resample_results
metric
all <- metric[[1]][[".metrics"]]
res <- unlist(all)
estimate <- res[ grepl(".estimate", names(res))]
I want to get the ".estimate" for each slice in a data frame, so that there is one RMSE per slice. Here is the full code:
library(tidymodels)
library(modeltime)
library(modeltime.resample)
library(tidyverse)
library(timetk)
library(resample)
resample_spec <- rolling_origin(
data = m750,
initial = 200,
assess = 3,
cumulative = TRUE,
skip = 1,
overlap = 0 )
resamples_fitted <- m750_models %>%
modeltime_fit_resamples(
resamples = resample_spec,
control = control_resamples(verbose = FALSE)
)
resamples_fitted
metric <- resamples_fitted$.resample_results
metric
all <- metric[[1]][[".metrics"]]
res <- unlist(all)
estimate <- res[ grepl(".estimate", names(res))]
After spending some time, I found a solution. It may not be very elegant, but it solves the problem.
metric <- resamples_fitted$.resample_results
metric
all <- metric[[1]][[ ".metrics"]]
res <- unlist(all)
estimate <- res[ grepl(".estimate", names(res))]
typeof(estimate)
dat <- as.data.frame(sapply(estimate, as.numeric))
data <- dat[complete.cases(dat), ]
all_id <- data.frame(metric[[1]]$id)
al <- cbind(all_id,data)
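The same result can probably be reached without unlist() and grepl() by reading the RMSE out of each per-slice metric table directly. The sketch below uses a small hand-made list in place of metric[[1]][[".metrics"]]; the numbers and the rsq rows are invented, and it assumes each element is a data frame with .metric and .estimate columns.

```r
# Hand-made stand-in for metric[[1]][[".metrics"]]: one metric table per slice.
# All values are invented for illustration.
slice_ids <- c("Slice1", "Slice2", "Slice3")
metrics_list <- list(
  data.frame(.metric = c("rmse", "rsq"), .estimator = "standard",
             .estimate = c(120.5, 0.91)),
  data.frame(.metric = c("rmse", "rsq"), .estimator = "standard",
             .estimate = c(98.2, 0.94)),
  data.frame(.metric = c("rmse", "rsq"), .estimator = "standard",
             .estimate = c(143.0, 0.88))
)

# Pull the single RMSE value out of each table.
rmse <- vapply(metrics_list,
               function(m) m$.estimate[m$.metric == "rmse"],
               numeric(1))

# One row per slice.
rmse_by_slice <- data.frame(id = slice_ids, rmse = rmse)
rmse_by_slice
```

With the real object, slice_ids would be metric[[1]]$id and metrics_list would be metric[[1]][[".metrics"]].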
I am trying to use a "for loop" in R. I have a vector of length 44 with 4401 observations read from the data file "data.csv".
I am converting it to a matrix so that I can treat each column as a time series.
I want to extract each column, forecast it, and then collect the results in a matrix.
What is the easiest way to do that?
library(forecast)
data<-read.table(file="data.csv",sep=",",row.names=NULL,header=FALSE)
x <- matrix(1:47, ncol = 1, byrow = FALSE)
for (i in 1:4401)
{
y <- data[i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel=auto.arima(y_ts)
forecast=predict(AutoArimaModel, 3)
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
ym = data.matrix(y)
z = rbind(ym,output)
x = cbind(x,z)}
It only runs for i = 1 and then gives me the error below:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
So, your code needed a partial rewrite!
If I understand correctly, you want 3 forecasts for each time series. I used the .xlsx data that you provided.
library(forecast)
library(readxl)
data<-read_excel("data.xlsx",col_names = F)
z <- NULL
data <- t(data)
forecast_horizon <- 3
for (i in 1:ncol(data)){
y <- data[,i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel <- auto.arima(y_ts)
forecast <- tryCatch(predict(AutoArimaModel, forecast_horizon),
error = function(e) data.frame(pred = rep(NA,forecast_horizon)))
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
z = cbind(z,output)
}
Note the use of tryCatch: one of the time series produces an error when accessing the predictions (you can investigate further why this is the case), so the fallback fills the forecasts for that series with NA instead of stopping the loop.
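The fallback pattern can be seen in isolation in the sketch below. It uses a toy stand-in for the forecasting step (a hypothetical toy_forecast() that repeats the series mean and deliberately fails on missing values), so the only thing it demonstrates is how tryCatch() keeps the loop going and leaves NA in the failing column.

```r
forecast_horizon <- 3

# Hypothetical stand-in for auto.arima() + predict(): repeats the series mean
# and raises an error when the series contains missing values.
toy_forecast <- function(y, h) {
  if (anyNA(y)) stop("series contains missing values")
  rep(mean(y), h)
}

series <- list(ok = c(1, 2, 3), broken = c(1, NA, 3))

z <- NULL
for (name in names(series)) {
  output <- tryCatch(
    toy_forecast(series[[name]], forecast_horizon),
    error = function(e) rep(NA_real_, forecast_horizon)  # fallback: all-NA forecasts
  )
  z <- cbind(z, output)
}
colnames(z) <- names(series)
z
```

The "broken" column ends up as three NA values while the "ok" column holds its forecasts, which is the behaviour the loop above relies on.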
Use the tibbletime package: https://www.business-science.io/code-tools/2017/09/07/tibbletime-0-0-1.html
Read the data with readr::read_csv such that it's a tibble. Turn it into a tibbletime with your date vector. Use tmap_* functions as described in the article to encapsulate your forecasting code and map them to the columns of the tibbletime.
The article should have all the info you need to implement this.
The problem seems to be your data source. This works:
n_col <- 5
n_rows <- 44
#generate data
data <- data.frame(replicate(n_col, rnorm(n_rows)))
x <- matrix(1:47, ncol = 1, byrow = FALSE)
for (i in seq_len(n_col)) {
y <- data[i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel=auto.arima(y_ts)
forecast=predict(AutoArimaModel, 3)
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
ym = data.matrix(y)
z = rbind(ym,output)
x = cbind(x,z)}
x
As an aside, I think I would approach it like this, especially if you have 4,401 fields to perform an auto.arima on:
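library(forecast)
library(future.apply)
plan(multisession) # plan(multiprocess) is deprecated in recent versions of future
y_ts <- ts(data, start = c(2016, 1), end = c(2019, 8), frequency = 12)
do.call(
  cbind,
  # loop over column indices: lapply() on a ts matrix would walk single elements, not columns
  future_lapply(seq_len(ncol(y_ts)),
    function(j) {
      y_t <- y_ts[, j]
      AutoArimaModel <- auto.arima(y_t)
      fc <- predict(AutoArimaModel, 3)
      output <- matrix(fc$pred, ncol = 1, byrow = FALSE)
      ym <- data.matrix(y_t)
      rbind(ym, output)
    }
  )
)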
When combining forecasts using combinef(), the resulting hts time series has generic level names such as A, AA, AB... It doesn't retain the level names from the supplied hts time series.
In the example below, the bottom-level names are "A10A" and "A10B", while the resulting bottom-level names are "AA" and "BB".
Is there a way to retain the original level names in the combined forecast object?
library(forecast)
library(hts)
abc <- ts(5 + matrix(sort(rnorm(20)), ncol = 2, nrow = 10))
colnames(abc) <- c("A10A", "A10B")
y <- hts(abc, characters = c(3, 1))
h <- 12
ally <- aggts(y)
allf <- matrix(NA, nrow = h, ncol = ncol(ally))
for(i in 1:ncol(ally))
allf[,i] <- forecast(auto.arima(ally[,i]), h = h)$mean
allf <- ts(allf)
y.f <- combinef(allf, get_nodes(y), weights = NULL, keep = "gts", algorithms = "lu")
At the end of your code, add the following lines:
colnames(allf) <- colnames(ally)
colnames(y.f$bts) <- colnames(y$bts)
y.f$nodes <- y$nodes
y.f$labels <- y$labels
I have a monthly dataset of the performance (in %) of different sectors in a company, in the form:
Date |Sector |Value
2016-01-01 |Sect 1 |-20
2016-02-01 |Sect 1 |10
2016-01-01 |Sect 2 |23
2016-02-01 |Sect 1 |10
The data has 20 sectors and monthly values through June 2018. Now I want to forecast Value for the next month. I used the code below:
combine_ts <- function(data, h=1, frequency= 12, start= c(2016,5),
end=c(2018,6))
{
results <- list()
sectgrowthsub <- data[!duplicated(sectgrowthdf2[,2]),]
sectgrowthts <- ts(sectgrowthsub[,3], frequency = frequency, start = start,
end = end)
for (i in 1:(nrow(sectgrowthsub))) {
results[[i]] <- data.frame(Date =
format(as.Date(time(forecast(auto.arima(sectgrowthts), h)$mean)), "%b-%y"),
SectorName = rep(sectgrowthsub[,2], h),
PointEstimate = forecast(auto.arima(sectgrowthts),
h=h)$mean[i])
}
return(data.table::rbindlist(results))
}
fore <- combine_ts(sectgrowthsub)
The problem is that the forecast Value is the same for all the sectors.
Any help is much appreciated.
I took the liberty of simplifying the problem a little bit and removed the function to better show the process of modeling groups separately:
library(magrittr)
library(forecast)
dat <- data.frame(value = c(rnorm(36, 5),
rnorm(36, 50)),
group = rep(1:2, each = 36))
# make a list where each element is a group's timeseries
sect_list <- dat %>%
  split(dat$group) %>%
  lapply(function(x) ts(x[["value"]], frequency = 12, start = 1))
# then forecast on each group's time series
fc <- lapply(sect_list, function(x) data.frame(PointEstimate = forecast(x, h = 1)$mean)) %>%
  do.call(rbind, .) # turn into one big data.frame
fc
PointEstimate
1 5.120082
2 49.752510
Let me know if you get hung up on any parts of this.
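If you also want the sector name and the forecast date next to each point estimate, as in the question, the same split-then-lapply idea can return one row per sector. Here is a minimal sketch in base R. It assumes invented data for two sectors, a hypothetical helper called forecast_sector(), and a mean-only arima(order = c(0, 0, 0)) model as a stand-in for auto.arima() so that it runs without the forecast package.

```r
# Invented long-format data: 30 months for each of two sectors.
set.seed(1)
months <- seq(as.Date("2016-01-01"), by = "month", length.out = 30)
dat <- data.frame(
  Date = rep(months, times = 2),
  Sector = rep(c("Sect 1", "Sect 2"), each = 30),
  Value = c(rnorm(30, 5), rnorm(30, 50))
)

# Hypothetical helper: fit a model to ONE sector and return a one-row data frame.
forecast_sector <- function(d) {
  d <- d[order(d$Date), ]
  y <- ts(d$Value, frequency = 12, start = c(2016, 1))
  fit <- arima(y, order = c(0, 0, 0))  # stand-in model, swap in auto.arima(y) if desired
  next_month <- seq(max(d$Date), by = "month", length.out = 2)[2]
  data.frame(
    Date = format(next_month, "%b-%y"),
    SectorName = d$Sector[1],
    PointEstimate = as.numeric(predict(fit, n.ahead = 1)$pred)
  )
}

# Each sector gets its own model, so the estimates differ between sectors.
fore <- do.call(rbind, lapply(split(dat, dat$Sector), forecast_sector))
rownames(fore) <- NULL
fore
```

The important difference from the code in the question is that the model is fitted inside the per-sector function, on that sector's rows only, rather than once on a single shared series.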
I'm using the rpart library in R to try to forecast Australian electricity consumption (an example from the book Introductory Time Series with R):
library(rpart)
www <- "http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat"
CBE <- read.table(www, header = T)
Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
plot(cbind(Elec.ts))
fit <- rpart(elec~elec, method="anova", data=CBE)
pre <- predict(fit)
Elec.predict <- ts(pre[], start = 1958, freq = 12)
plot(cbind(Elec.ts,Elec.predict ))
Put simply, the R program does not run when I try to create a model using only the elec data itself.
Am I using it wrong?
How can I use this library properly?
I solved the problem with the script below.
I have created a GitHub site with all the information about the script and the time series data: http://alvarojoao.github.io/timeseriesExamples
library(caret)
library(ggplot2)
library(pls)
library(data.table)
library(rpart)
library(bst)
library(plyr)
nLag <- 12
khorizon <- 1
www <- "./databases/elec.dat"
CBE <- read.table(www, header = T)
base <- CBE
variable <- 'elec'
base$elec = (base$elec-min(base$elec))/(max(base$elec)-min(base$elec))
base <- setDT(base)[, paste0(variable, 1:nLag) := shift(elec, 1:nLag)][]
base <- base[(nLag+1):nrow(base),]
Elec.ts <- ts(CBE[, 1], start = 1958, freq = 12)
acf(CBE$elec)
plot(cbind(Elec.ts))
timeSlices <- createTimeSlices(1:nrow(base),
initialWindow =nrow(base)*2/3, horizon = khorizon , fixedWindow = FALSE)
str(timeSlices,max.level = 1)
trainSlices <- timeSlices[[1]]
testSlices <- timeSlices[[2]]
predTest <- numeric(0) # empty vector for the predictions
trueTest <- numeric(0) # empty vector for the observed values
for(i in 1:length(trainSlices)){
plsFitTime <- train(elec ~ .,
data = base[trainSlices[[i]],],
method = "treebag"
)
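pred <- predict(plsFitTime,base[testSlices[[i]],])
true <- base$elec[testSlices[[i]]]
# collect the results of this slice (predTest and trueTest start out empty)
predTest <- c(predTest, pred)
trueTest <- c(trueTest, true)
}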
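The key step in the script above is that the model is no longer asked to learn elec from elec itself; it gets lagged values of the series as predictors. That step can be sketched in base R with embed(). The series below is simulated, the column names are made up, and lm() is used only as a lightweight stand-in for the bagged trees trained with caret.

```r
set.seed(123)
n_lag <- 3

# Simulated series standing in for the scaled elec column.
y <- as.numeric(arima.sim(list(ar = 0.7), n = 60)) + 10

# embed() puts the current value in column 1 and lags 1..n_lag in the next columns.
lagged <- as.data.frame(embed(y, n_lag + 1))
names(lagged) <- c("y", paste0("lag", seq_len(n_lag)))

# Train on all but the last row, then predict the held-out final observation.
train_rows <- seq_len(nrow(lagged) - 1)
fit <- lm(y ~ ., data = lagged[train_rows, ])
pred <- predict(fit, newdata = lagged[nrow(lagged), ])

c(predicted = unname(pred), actual = lagged$y[nrow(lagged)])
```

Each row of the lagged table pairs one observation with the n_lag observations before it, which is what shift(elec, 1:nLag) builds in the data.table version.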