I am getting the following error when I try to refit the ARIMA model.
new_model <- Arima(data,model=old_model)
Error in Ops.Date(driftmod$coeff[2], time(x)) :
* not defined for "Date" objects
Note: The class of data is zoo. I also tried using xts, but I got the same error.
Edit: As suggested by Joshua, here is the reproducible example.
library('zoo')
library('forecast')
#Creating sample data
sample_range <- seq(from=1, to=10, by=1)
x<-sample(sample_range, size=61, replace=TRUE)
ts<-seq.Date(as.Date('2017-03-01'),as.Date('2017-04-30'), by='day')
dt<-data.frame(ts=ts,data=x)
#Split the data to training set and testing set
noOfRows<-NROW(dt)
trainDataLength=floor(noOfRows*0.70)
trainData<-dt[1:trainDataLength,]
testData<-dt[(trainDataLength+1):noOfRows,]
# Use zoo, so that we get dates as index of dataframe
trainData.zoo<-zoo(trainData[,2:ncol(trainData)], order.by=as.Date((trainData$ts), format='%Y-%m-%d'))
testData.zoo<-zoo(testData[,2:ncol(testData)], order.by=as.Date((testData$ts), format='%Y-%m-%d'))
#Create Arima Model Using Forecast package
old_model<-Arima(trainData.zoo,order=c(2,1,2),include.drift=TRUE)
# Refit the old model with testData
new_model<-Arima(testData.zoo,model=old_model)
The ?Arima page says that y (the first argument) should be a ts object. My guess is that the first call to Arima coerces your zoo object to ts, but the second call does not.
An easy way to work around this is to explicitly coerce to ts:
# Refit the old model with testData
new_model <- Arima(as.ts(testData.zoo), model = old_model)
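The error message itself hints at the cause: multiplication is not defined for Date objects, and a zoo series indexed by Date returns Dates from time(). Below is a minimal base-R sketch of that behaviour, assuming (as the traceback suggests) that the drift coefficient gets multiplied by time(x). The names drift, dates and plain_series are illustrative only and do not come from the forecast package.

```r
# Illustrative value; 'drift' stands in for the fitted drift coefficient.
drift <- 0.1
dates <- seq(as.Date("2017-03-01"), by = "day", length.out = 5)

# Multiplying a number by Date values fails, as in the Ops.Date error above.
msg <- tryCatch(drift * dates, error = function(e) conditionMessage(e))
print(msg)

# A ts object has a numeric time index, so the same product is well defined.
plain_series <- ts(c(3, 1, 4, 1, 5))
drift_term <- drift * time(plain_series)
print(is.numeric(drift_term))
```

This is why coercing with as.ts() before refitting avoids the error: the time index becomes numeric.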
I have output from a 2-component mixture model run using the flexmix package in R. I am trying to extract the model coefficients, which seem to be stored in a list (mix2@components$Comp.1) inside an object of "Formal class FLXcomponent". I would like to store the estimates from each component in separate data frames.
library(flexmix)
### Simulated data for regression mixture model using flexmix
### Class 1
x<-seq(from=1,to=2, by=0.01)
y<-seq(from=0,to=1, by=0.01)
z<-x+y+y^2
class_label <- c(rep(c(1), length(z)))
dat1<-data.frame(x,y,z,class_label)
### Class2
x<-seq(from=2,to=3, by=0.01)
y<-seq(from=10,to=11, by=0.01)
z<-x^2+y+y^2
class_label <- c(rep(c(2), length(z)))
dat2<-data.frame(x,y,z,class_label)
simdat<-rbind(dat1,dat2)
### Run the model
mix2 <- flexmix(z ~ x+y+x^2+y^2, data=simdat, k=2)
out2<-summary(mix2)
out2
### Extract model coefficients for Component 1
mix2@components$Comp.1
str(mix2@components$Comp.1)
mix2@components[[1]][["Comp.1"]][,1]
mix2@components$Comp.1[,1]
I tried using the getSlots() function in R on mix2, but this gives an error:
getSlots(mix2@components$Comp.1)
Error in .getClassesFromCache(Class) :
class should be either a character-string name or a class definition
How can I extract the coefficients in the model components and save them in a dataframe?
For instance, neither of the approaches below works:
outdat<-as.data.frame(mix2@components[[1]][["Comp.1"]][,1])
outdat<-as.data.frame(mix2@components$Comp.1)
This seems to work, although I am open to other (better) approaches.
mix2 <- flexmix(z ~ I(x^2)+I(y^2), data=simdat, k=2)
p1<-parameters(mix2, component=1)[[1]]
p2<-parameters(mix2, component=1)[[2]]
and so on.
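If the goal is one data frame per component, a possible sketch is below. It assumes that parameters() called without a component argument returns a matrix with one column per component, which is what I would expect for the default Gaussian regression driver. The simulated data and the names sim, fit, coef_df and comp_list are made up for illustration.

```r
library(flexmix)

# Two well-separated linear regimes with a little noise (illustrative data).
set.seed(42)
n <- 100
x <- runif(2 * n)
cls <- rep(1:2, each = n)
y <- ifelse(cls == 1, 1 + 2 * x, 10 - 3 * x) + rnorm(2 * n, sd = 0.2)
sim <- data.frame(x = x, y = y)

fit <- flexmix(y ~ x, data = sim, k = 2)

# Assumed layout: one column per component, one row per parameter.
coef_df <- as.data.frame(parameters(fit))

# Split into a list holding one single-column data frame per component.
comp_list <- lapply(names(coef_df), function(nm) coef_df[, nm, drop = FALSE])
names(comp_list) <- names(coef_df)
print(comp_list)
```

Keeping drop = FALSE preserves the parameter names as row names in each per-component data frame.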
I am trying to figure out how to apply knn.reg function to predict y (which in this case is the mpg of the Auto dataset) for a specific value of x (it's the 'horsepower' variable of the same dataset).
At first, I used a knn.reg function to build a knn regression model with k=10, which looks like this:
#Preliminary setup
library(ISLR)
library(fastDummies)
library(leaps)
library(boot)
library(FNN)
library(caTools)
df<-Auto
df$origin <- as.factor(df$origin)
df <- dummy_cols(df, select_columns = "origin")
df <- df[,!(names(df) %in% c("name", "origin","origin_1"))]
#Attempted models
knn.model<-knn.reg(train=df$horsepower, y=df$mpg, k=10)
split<-sample.split(df$mpg, SplitRatio=0.8)
train=df[split,]
test=df[!split,]
# y must match the rows of train, so use train$mpg rather than df$mpg
knn.model<-knn.reg(train=train[c('horsepower')], test=test[c('horsepower')], y=train$mpg, k=10)
I've tried two models that either include or exclude test data that is split from the original data, but I think I would like to use the entire dataset as the training data.
After constructing these models, I tried to use predict() function to estimate the mpg of a vehicle when its horsepower is 200, which would look something like this:
mpg<-c(200)
predict(knn.model, newdata=mpg)
The problem with predict(), however, was that it threw an error saying that predict() can't be applied to an object of class "knnRegCV".
I am unsure if I should use a function other than predict(), or if the code I have is missing something essential. I'd appreciate any suggestions or comments that can help me address this issue. Massive thank you in advance!
The function predict() does not have a method for the object that the knn.reg() function returns, but you can easily use the test= argument. Using your first knn.model:
knn.reg(train=df$horsepower, test=200, y=df$mpg, k=10)
# Prediction:
# [1] 12.45
Since you have only one predictor, you need to create a data frame to estimate more than one value:
pred <- data.frame(horsepower=c(100, 150, 200, 250))
knn.reg(train=df$horsepower, test=pred, y=df$mpg, k=10)
# Prediction:
# [1] 17.90 14.50 12.45 12.90
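For intuition about what the test= argument computes, here is a hand-rolled sketch of k-nearest-neighbour regression with a single predictor. It uses the built-in mtcars data instead of Auto so that it runs without extra packages. The helper knn_predict is hypothetical and is not FNN's implementation; in particular, tie handling may differ from knn.reg().

```r
# Hypothetical helper: mean response of the k training points nearest to each query.
knn_predict <- function(train_x, train_y, query_x, k = 10) {
  stopifnot(length(train_x) == length(train_y), k <= length(train_x))
  vapply(query_x, function(x0) {
    # Indices of the k training points closest to x0 on the predictor axis.
    nearest <- order(abs(train_x - x0))[seq_len(k)]
    mean(train_y[nearest])
  }, numeric(1))
}

# Predict mpg at three horsepower values using the whole data set as training data.
preds <- knn_predict(mtcars$hp, mtcars$mpg, query_x = c(100, 150, 200), k = 5)
print(preds)
```

With k equal to the number of training rows, every prediction collapses to the overall mean of the response, which is a handy sanity check.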
When forecasting with the forecast library, I noticed that the following code does not run as expected:
library(forecast)
library(dplyr)
df1 <- data.frame(gp=gl(20,5), dt=seq(1:100))
get <- function (df1){
ts1 <- ts((df1%>%filter(gp==2))$dt)
as.numeric(forecast(ar(ts1),15)$mean)
}
print(get(df1))
The error returned is:
Error in ts(x) : 'ts' object must have one or more observations
Maybe it is caused by the ar or ar.burg function, because if you change the function to ets or something else, the code works well.
What is stranger is that if you change the code to:
library(forecast)
library(dplyr)
df1 <- data.frame(gp=gl(20,5), dt=seq(1:100))
ts1 <- ts((df1%>%filter(gp==2))$dt)
get <- function (ts1){
as.numeric(forecast(ar(ts1),15)$mean)
}
print(get(ts1))
then the code runs correctly. I think this may be a bug in the ar function, and the problem is somehow related to scope. Any thoughts about this?
The problem is to do with scoping. forecast() tries to find the time series used to fit the model. The functions from the forecast package (such as ets) store this information in the model object, so it is easy for forecast() to find it. But ar() is from the stats package, and it does not store the time series used to fit the model. So forecast() goes looking for it. If you run your code outside of the get() function, it works ok because forecast() manages to find the ts1 object in the local environment. But within the get() function it causes an error.
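A quick way to see this with base R alone: an ar() fit keeps the name of the series but not the data, so anything that needs the original values has to look that name up again. In the sketch below, fit_inside and local_series are made-up names, and lh is a built-in dataset.

```r
# Fit inside a function so the series only exists in the function's environment.
fit_inside <- function(values) {
  local_series <- ts(values)
  ar(local_series)
}

fit <- fit_inside(as.numeric(lh))

print(fit$series)              # the name of the series, stored as a string
print("x" %in% names(fit))     # is a copy of the data stored as 'x'?
print(exists("local_series"))  # is the series visible from the top level?
```

The check uses names(fit) rather than fit$x because $ on a list does partial matching and could silently pick up another component whose name starts with "x".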
One simple fix is to add the information to the fitted model before calling forecast:
library(forecast)
library(dplyr)
df1 <- data.frame(gp=gl(20,5), dt=seq(1:100))
ts1 <- ts((df1%>%filter(gp==2))$dt)
get <- function (ts1){
fit <- ar(ts1)
fit$x <- ts1
as.numeric(forecast(fit,15)$mean)
}
print(get(ts1))
Alternatively, use predict instead of forecast:
library(dplyr)
df1 <- data.frame(gp=gl(20,5), dt=seq(1:100))
ts1 <- ts((df1%>%filter(gp==2))$dt)
get <- function (ts1){
fit <- ar(ts1)
as.numeric(predict(fit,n.ahead=15)$pred)
}
print(get(ts1))
Q: What is the right way to set the frequency in an xts object given a set of dates? Ideally, auto.arima() called on this xts object would yield the same results as when called on an analogous ts object.
Detail: I was surprised to find different results from an auto.arima() fit depending on whether I passed a ts or an xts object. I found that the difference had to do with the frequency (which, in the case of xts, was being reset to 1 despite my setting it to 12 in the constructor). Below, setting up sim_ts_12 and estimating the intended model was relatively straightforward. But in my initial attempts at working with xts (sim_xts and sim_xts_12_not) I estimated the wrong model. I finally estimated the right model using xts (sim_xts_12, sim_ts2xts_12), but both of those approaches seem wrong in some way. I'd expect working with xts to be simpler than ts, but that doesn't seem to be the case here. Am I missing something?
sim <- scan(file="./sim.dat")
sim_ts_12 <- ts(sim, start=c(2016,1), frequency=12)
sim_ts2xts_12 <- as.xts(sim_ts_12)
sim_xts <- xts(x=sim, order.by=seq.Date(from=as.Date("2016-01-01"), by="month", length.out = length(sim)))
sim_xts_12_not <- xts(x=sim, order.by=seq.Date(from=as.Date("2016-01-01"), by="month", length.out = length(sim)), frequency=12)
sim_xts_12 <- sim_xts
attr(sim_xts_12, 'frequency') <- 12
auto.arima(sim_ts_12) # ARIMA(0,1,1)(0,1,0)[12]
auto.arima(sim_ts2xts_12) # ARIMA(0,1,1)(0,1,0)[12]
auto.arima(sim_xts) # ARIMA(0,1,1) with drift
auto.arima(sim_xts_12_not) # ARIMA(0,1,1) with drift
auto.arima(sim_xts_12) # ARIMA(0,1,1)(0,1,0)[12]
txt <- "0.04767597 0.07217235 0.03954613 0.03698637 0.04283896
0.03534811 0.04198519 0.04129214 0.04576022 0.03966146
0.03656881 0.04396736 0.04459328 0.07062732 0.03477407
0.0340033 0.039136 0.0347761 0.03819997 0.03634627
0.03966617 0.03455635 0.03009606 0.03927688 0.03959629
0.06554147 0.02908742 0.02619443 0.03179742 0.02468108
0.02612955 0.02300656 0.02988827 0.01878513 0.01399028
0.02601922 0.0250159 0.05610426 0.01537538 0.01231939
0.01330564 0.008744173 0.01296571 0.005741129 0.01674992
0.003210812 -0.007936987 0.01018758"
sim.dat <- scan(text=txt, what=numeric() )
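To see what the frequency attribute changes, independent of xts, here is a small base-R sketch. The values are placeholders (1 to 24), not the sim data. With frequency = 12 the series carries a seasonal period that a seasonal model search can use, while the default frequency of 1 carries none.

```r
values <- 1:24  # placeholder data, two years of monthly observations

monthly <- ts(values, start = c(2016, 1), frequency = 12)
plain   <- ts(values)

print(frequency(monthly))  # seasonal period attached to the series
print(frequency(plain))    # default when no frequency is given
print(cycle(monthly))      # position of each observation within the year
```

Checking frequency() on whatever object is passed to auto.arima() is a cheap way to confirm which of the two situations applies.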
UPDATE, NOT A DUPLICATE: The possible duplicate question/answer does not address the best practice method for handling frequency in an xts. The question does not ask for it, nor does the answer address it. The answer handles ts.
I have fitted a TBATS model around my seasonal time-series data and used the forecast package to obtain predictions. My R code is:
library("forecast")
data = read.csv("data.csv")
season_info <- msts(data,seasonal.periods=c(24,168))
model <- tbats(season_info)
forecasted <- forecast(model,h=24,level=90)
forecasted
Now, I have a variable called 'forecasted' that outputs as such:
> forecasted
Point Forecast Lo 90 Hi 90
6.940476 5080.641 4734.760 5426.523
6.946429 5024.803 4550.111 5499.496
6.952381 4697.625 4156.516 5238.733
6.958333 4419.105 3832.765 5005.446
6.964286 4262.782 3643.528 4882.037
6.970238 4187.629 3543.062 4832.196
6.976190 4349.196 3684.444 5013.947
6.982143 4484.108 3802.574 5165.642
6.988095 4247.858 3551.955 4943.761
6.994048 3851.379 3142.831 4559.927
7.000000 3575.951 2855.962 4295.941
7.005952 3494.943 2764.438 4225.449
7.011905 3501.354 2760.968 4241.739
7.017857 3445.563 2695.781 4195.345
I need to gather the forecasted values from the 'Point Forecast' column and store them in a CSV file. I read the documentation for TBATS and the forecast method online, but it does not say how to extract just the forecasted values while ignoring the other columns, such as 'Lo 90' and 'Hi 90'.
I'm looking for this output in my CSV:
hour,forecasted_value
0,5080.641
1,5024.803
2,4697.625
...
They are stored in three parts. You can look at the object structure with str(ret):
library(forecast)
fit <- tbats(USAccDeaths)
ret <- forecast(fit)
ret$upper # Upper interval
ret$lower # Lower interval
ret$mean # Point forecast
You can obtain the output shown by using print():
library("forecast")
data = read.csv("data.csv")
season_info <- msts(data,seasonal.periods=c(24,168))
model <- tbats(season_info)
forecasted <- forecast(model,h=24,level=90)
dfForec <- print(forecasted)
This gives you a data.frame, so you can pick out the columns you want, e.g. dfForec[, 1] for only the point forecast, and then use write.csv(dfForec[, 1, drop = FALSE], ...) to write it to a flat file.
Use the mean element of the forecast object to get your point forecast:
library("forecast")
data = read.csv("data.csv")
season_info <- msts(data,seasonal.periods=c(24,168))
model <- tbats(season_info)
forecasted <- (forecast(model,h=24,level=90))$mean
or, if forecasted holds the full forecast object:
forecasted$mean
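To get from the point forecasts to the CSV layout asked for in the question, one possible sketch follows. The vector point_forecast is a stand-in for as.numeric(forecasted$mean), filled with the first three values from the printed forecast in the question. The zero-based hour column and the temporary file path are assumptions.

```r
# Stand-in for as.numeric(forecasted$mean); values copied from the printed forecast.
point_forecast <- c(5080.641, 5024.803, 4697.625)

out <- data.frame(
  hour = seq_along(point_forecast) - 1,  # 0, 1, 2, ...
  forecasted_value = point_forecast
)

csv_path <- tempfile(fileext = ".csv")  # replace with the real output path
write.csv(out, csv_path, row.names = FALSE, quote = FALSE)

# Read the file back to inspect what was written.
csv_lines <- readLines(csv_path)
print(csv_lines)
```

Setting row.names = FALSE keeps R's row labels out of the file, and quote = FALSE leaves the header unquoted.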