I have seasonal time series data for weekly retail sales and am trying to use the tslm function of Hyndman's forecast package to fit a model with regressors in addition to trend and season.
The issue I'm running into is that when I build the tslm, before adding any regressors (only trend + season), I get a perfect fit (R^2 =1) on the training data!
A perfect fit is problematic because any additional covariate I add to the model (# of items being sold, distribution, etc.) has no impact on predictions (it comes out insignificant). Just looking at the data, I know these other regressors matter, so I'm not sure where I'm going wrong. Hoping somebody in the community can help me out.
Some information about the data I am using:
Dataset contains weekly data from 2014 - 2017
Training data contains 156 weekly observations (2014 - 2016)
Test data contains 48 observations in 2017
I am using weekly seasonality to build the time series:
ts.train <- ts(df.train$sales, freq=365.25/7)
m.lm <- tslm(ts.train ~ trend + season + items, data=df.train)
p.lm <- forecast(m.lm,
h=48,
newdata=data.frame(items=df.test$items))
If I leave "items" out of the formula, the predictions do not change at all.
I appreciate any input and guidance!
"items" probably contributes too many variables (if it is a set of dummy variables), which would explain the perfect fit. See: https://www.otexts.org/fpp2/useful-predictors.html
For example, you need only 6 dummy variables to account for the 7 days of the week.
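The point about needing one dummy fewer than the number of categories can be checked directly in base R. This is a minimal sketch on made-up weekday labels (the `days` vector is hypothetical, not the asker's data): with an intercept plus all 7 dummies, the design matrix has 8 columns but only rank 7.

```r
# Hypothetical weekday labels covering four full weeks
days <- factor(rep(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), times = 4))

# With an intercept, model.matrix() drops one level: 1 intercept + 6 dummies
X_ok <- model.matrix(~ days)

# Forcing all 7 dummies next to an intercept makes the columns linearly dependent
X_all <- cbind(Intercept = 1, model.matrix(~ days - 1))

rank_ok  <- qr(X_ok)$rank    # equals ncol(X_ok): full column rank
rank_all <- qr(X_all)$rank   # one less than ncol(X_all): rank deficient
c(ncol(X_ok), rank_ok, ncol(X_all), rank_all)
```

More generally, once the number of estimated coefficients reaches the number of observations, a linear model can reproduce the training data exactly, which is one way an R^2 of 1 can arise.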
I am trying to predict stock price movement using different machine learning algorithms with various technical indicators as features. I intend to predict whether the stock price will go up or down 1 day ahead, 14 days ahead, and 30 days ahead.
I am a little bit confused about how to compute the target variables to make the predictions correctly.
So far I have computed daily returns for each firm and constructed a class variable to predict 1-day ahead.
data <- data %>% group_by(company) %>% mutate(ret =(`CLOSING PRICE` / lag(`CLOSING PRICE`)-1))
data$class <- ifelse(data$ret >= 0, "Up", "Down")
The problem now is that I am not sure how to properly make predictions 14 and 30 days ahead.
The accuracy of all the models (SVM, RF, and DT) is very similar, around 82-85%, for 1-day ahead predictions. Is this something to be concerned about or is it logical that the accuracy is very similar for all the models?
You need to decide what you want to predict at these horizons and compute the target accordingly, as you've already done for the 1-day interval. Some options: you could do something similar to your 1-day setup, that is, calculate whether the closing price at day 14 or day 30 is above or below the closing price on day 0, then predict a binary response of "up" or "down". Or you could calculate the actual difference in price between those days and use that as your response; this would be a regression problem rather than a binary classification one. However you decide to define the response, you compute it the same way in your training data and use that to train your models.
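As a rough illustration of the first option, the sketch below computes an h-step-ahead return and an Up/Down label per company in base R. The data frame `prices`, its columns, and the helper `forward_return` are hypothetical; a horizon of 2 rows is used only so the toy data stays small, and 14 or 30 would work the same way on real data. Note that the target looks forward in time, so the last h rows of each company have no label.

```r
# Hypothetical toy data: two companies, six closing prices each
prices <- data.frame(
  company = rep(c("A", "B"), each = 6),
  close   = c(10, 11, 12, 11, 13, 14, 20, 19, 18, 19, 17, 16)
)

# Return from day t to day t + h; the last h entries are NA (no future price yet)
forward_return <- function(x, h) {
  n <- length(x)
  future <- c(x[-seq_len(h)], rep(NA, h))[seq_len(n)]
  future / x - 1
}

h <- 2
# ave() applies the helper within each company, keeping the original row order
prices$ret_h <- ave(prices$close, prices$company,
                    FUN = function(x) forward_return(x, h))
prices$class_h <- ifelse(prices$ret_h >= 0, "Up", "Down")
prices
```

Rows with an NA label would be dropped before training, and the features for day t should only use information available up to day t.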
It's not unusual for different models to offer similar accuracy, especially if you've taken time to tune them all before testing. Do make sure you test against some unseen data, as some models are more prone to over-fitting than others.
The following is a (not perfectly) seasonal time series that I am trying to fit an ARIMA model to:
data plot
I made it stationary (confirmed by an ADF test) using first-order regular differencing and first-order seasonal differencing, after which it looked like this:
acf and pacf of the stationary data
The seasonal components are at 12 and 24, both significant in the acf plot, while in the pacf plot, only the 12th one is significant.
I tried a lot of combinations for p,d,q,P,D,Q, but no matter what the combination, the residuals always seem to have the first seasonal lag as significant in both the acf and pacf plots.
I decided to go with ARIMA(4,1,4)(1,1,1)[96] (even though 12 is the frequency of the data (monthly), it happens to show seasonality at roughly 96 intervals, i.e. every 8 years) because it was giving the best log-likelihood score, but it still doesn't fit the seasonal components.
residual acf and pacf
Can anyone please suggest what should be improved or tried in order for the model to fit all the lags?
Sharing the r file and dataset here: https://drive.google.com/drive/folders/1okMUkBj2W2nF9NkoX4igq2-7QP-cgnSO?usp=sharing
I've got this prediction problem for daily data across several years. My data has both yearly and weekly seasonality.
I tried using the following recurrence (which I just came up with, from nowhere if you like): $x_n = \frac{1}{4}\left(x_{n-738} + x_{n-364} + x_{n-7} + \frac{1}{6}\left(x_{n-1} + x_{n-2} + x_{n-3} + x_{n-4} + x_{n-5} + x_{n-6}\right)\right)$
Basically, I am taking into consideration some of the previous days in the week before the day I am trying to predict and also the corresponding day a year and two years earlier. I am doing an average over them.
My question is: can one try to improve the prediction by replacing the coefficients 1/4,1/6 etc with coefficients that would make the mean squared residual smaller?
Personally, I see your problem as a regression.
If you have enough data, I would go for a dedicated time series prediction model.
You said that the data has yearly and weekly seasonality. To cope with that, you can build two models, one with a weekly window and one dealing with the yearly pattern, and then combine them somehow (with a linear combination or even another model).
However, if you don't have enough data, you could try passing the above x_i as features to a regression model such as linear regression, an SVM, or a feed-forward neural network; in theory it will find the coefficients that produce a small enough loss (error).
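The linear regression variant of this idea can be sketched in base R as follows. Everything here is an assumption for illustration: the series `x` is simulated, and only two features are used (the value 7 days earlier and the mean of the previous 6 days) so the toy example stays short. On real data the yearly lags from the question would be added as further columns in the same way.

```r
set.seed(1)
n <- 400
t <- seq_len(n)
# Hypothetical daily series with a weekly pattern plus noise
x <- 10 + 3 * sin(2 * pi * t / 7) + rnorm(n, sd = 0.5)

# Lagged features, aligned so that every row has all its lags available
max_lag <- 7
idx <- (max_lag + 1):n
lag_k <- function(k) x[idx - k]
features <- data.frame(
  y      = x[idx],
  lag7   = lag_k(7),
  recent = rowMeans(sapply(1:6, lag_k))
)

# Hand-picked equal weights, in the spirit of the averaging recurrence
fixed_pred <- (features$lag7 + features$recent) / 2
mse_fixed  <- mean((features$y - fixed_pred)^2)

# Least squares chooses the weights (and an intercept) that minimise squared error
fit     <- lm(y ~ lag7 + recent, data = features)
mse_fit <- mean(residuals(fit)^2)

coef(fit)
c(fixed = mse_fixed, fitted = mse_fit)
```

On the data used for fitting, the least squares weights cannot do worse than the hand-picked ones, since the hand-picked weights are one of the candidates. Whether the improvement carries over to new data needs to be checked on a held-out period.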
I need to forecast sales on a daily basis using an ARIMA model with the weekdays as independent variables.
So I built the following model, where:
d is my data,
Total is sales, and Monday, Tuesday, ..., Sunday are my independent variables.
I am using library(forecast):
fit = arima(d$Total, xreg = cbind(Sunday, Monday, Tuesday, Wednesday, Thursday, Friday), order = c(1, 1, 1))
Please help me proceed further and predict future values.
How do I decide p, d, and q, and how do I plot forecasted vs. actual values?
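One possible way to proceed, sketched with base R's `arima` and `predict` on simulated data: future values of the weekday dummies are known in advance, so they can be supplied through `newxreg`. The series, the weekday effects, and the helper `dow_dummies` are all hypothetical, and `method = "ML"` is a choice made here for numerical robustness, not something taken from the question. For choosing p, d, and q, `auto.arima` from the forecast package also accepts an `xreg` argument.

```r
set.seed(42)
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Hypothetical helper: six 0/1 columns, with Monday as the baseline level
dow_dummies <- function(days) {
  f <- factor(days, levels = day_names)
  m <- model.matrix(~ f)[, -1, drop = FALSE]
  colnames(m) <- day_names[-1]
  m
}

# Simulated daily sales: weekday effects plus ARIMA(1,1,1) noise (20 weeks)
n <- 140
days <- rep(day_names, length.out = n)
effect <- c(Mon = 0, Tue = 1, Wed = 2, Thu = 2, Fri = 4, Sat = 6, Sun = 5)
noise <- cumsum(as.numeric(arima.sim(list(ar = 0.5, ma = 0.3), n = n, sd = 0.5)))
total <- 100 + unname(effect[days]) + noise

xreg <- dow_dummies(days)
fit <- arima(total, order = c(1, 1, 1), xreg = xreg, method = "ML")

# Forecast two weeks ahead; the weekday pattern simply continues from Monday
h <- 14
future_days <- rep(day_names, length.out = h)
fc <- predict(fit, n.ahead = h, newxreg = dow_dummies(future_days))
fc$pred
```

To compare forecasts with actual values, one option is to hold out the last few weeks, fit on the rest, and plot `fc$pred` against the held-out observations.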
I have a day level dataset for 3 years,
I ran auto.arima() in R on it for simple time series forecasting and it gave me a (2,1,2) model.
When I used this model to predict the variable for the next year, the forecast became constant after a few days, which can't be correct.
As I have a daily data for 3 years, and a frequency of 364 days, is ARIMA incapable of handling daily data with large frequencies?
Any help will be appreciated
This sounds like you are trying to forecast too far into the future.
The forecast for tomorrow is likely to be the most accurate, but the forecasts for the following days are influenced less and less by the past data, so they settle around some constant when you forecast too far into the future. "Too far into the future" probably means two or more time points.
Let's say you have data up to time point T+100, which you used to estimate your ARIMA(2,1,2) model. You can "forecast" the value at time T+1 by pretending you only have data up to time T and using your ARIMA(2,1,2) model to forecast T+1. Then move ahead by one period, pretend you only have data up to time T+1, and "forecast" T+2. Repeating this up to T+100 lets you assess the one-step-ahead forecasting accuracy of your ARIMA(2,1,2) model, for example by calculating the mean squared error (MSE) of the "forecasts".
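The flattening can be reproduced on simulated data. This sketch uses base R's `arima` with a simpler ARIMA(1,1,0) standing in for the ARIMA(2,1,2) from the question; the series is made up, and `method = "ML"` is a choice made here. The step between consecutive forecasts shrinks geometrically, so the forecast path levels off.

```r
set.seed(123)
# Hypothetical non-seasonal series: integrated AR(1) noise
x <- cumsum(as.numeric(arima.sim(list(ar = 0.6), n = 300)))

fit  <- arima(x, order = c(1, 1, 0), method = "ML")
pred <- as.numeric(predict(fit, n.ahead = 60)$pred)

# Change between consecutive forecasts: large at first, then vanishing
steps <- diff(pred)
round(head(steps, 5), 4)
round(tail(steps, 5), 8)
```

A model without seasonal terms has no way to reproduce a weekly or yearly pattern at long horizons, so a flat forecast is the expected behaviour rather than a bug.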
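The procedure described above can be sketched as a loop over forecast origins. Two assumptions differ from the text: the series is simulated, and the model (an ARIMA(1,1,0) used for brevity) is refitted at every origin, which is a common variant of the idea rather than reusing one set of estimated coefficients.

```r
set.seed(7)
# Hypothetical series standing in for the real data
x <- cumsum(as.numeric(arima.sim(list(ar = 0.6), n = 160)))

n_test  <- 30                                   # number of one-step "forecasts"
n_total <- length(x)
origins <- (n_total - n_test):(n_total - 1)     # last observed index at each step

errors <- sapply(origins, function(t_end) {
  # Pretend only x[1:t_end] is known, then forecast the next point
  fit  <- arima(x[1:t_end], order = c(1, 1, 0), method = "ML")
  pred <- as.numeric(predict(fit, n.ahead = 1)$pred)
  x[t_end + 1] - pred
})

mse <- mean(errors^2)
mse
```

The same loop with `n.ahead` set to a longer horizon gives an idea of how quickly accuracy degrades as the forecast reaches further ahead.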