I'm new to R but have some experience with ARIMA models. Now I wanted to learn a bit about neural networks for forecasting.
I tried to repeat the procedure from Rob's post. It worked great for the data set he used. It also worked great for imaginary datasets I created.
But then I tried to use real-life data (revenue data for 7 years monthly) and the resulting forecasts are strangely flat. My code:
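library(forecast)
# read the monthly revenue data once and convert it to a monthly ts starting January 2011
x <- read.csv("Revenue.csv", header = TRUE)
y <- ts(x, frequency = 12, start = c(2011, 1))
# fit a neural network autoregression and print the selected model
(fit <- nnetar(y))
# 20-step forecast with bootstrapped prediction intervals
fcast <- forecast(fit, PI = TRUE, h = 20, bootstrap = TRUE)
autoplot(fcast)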
The result is an almost straight line (attached as picture 1). That strikes me as odd, because the trend has been positive so far: there was a revenue growth of more than 100% every year. Still the result of nnetar is that the revenue will stabilise. How is that possible?
As a comparison I used auto.arima on the same data set (picture 2). It shows a clear upward trend.
One suggestion, even if it's hard to help without a data sample.
It appears that nnetar is not capturing the trend in your data very well.
You could try passing a trend as an external regressor (the xreg argument).
For example, for a deterministic trend:
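h <- 20  # forecast horizon, as in the question
# deterministic linear trend 1, 2, ..., n over the training sample
Trend <- seq_along(y)
(fit <- nnetar(y, xreg = Trend))
# future trend values continue at n + 1 and must have exactly h entries
(f <- forecast(fit, h = h, xreg = length(y) + seq_len(h)))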
An alternative would be to use more lags or seasonal lags (the p and P arguments of nnetar).
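The lag suggestion can be tried directly through the p and P arguments. This is only a minimal sketch: it assumes the forecast package is installed, uses the built-in AirPassengers series as a stand-in for the revenue data (which was not posted), and the values p = 3 and P = 2 are arbitrary illustrations rather than tuned choices.

```r
library(forecast)

set.seed(123)        # nnetar starts from random weights, so fix the seed
y <- AirPassengers   # stand-in monthly series, not the poster's revenue data

# p = number of non-seasonal lags, P = number of seasonal lags used as inputs
fit <- nnetar(y, p = 3, P = 2)
fit                  # prints the fitted NNAR specification

fc <- forecast(fit, h = 20)
autoplot(fc)
```

Whether the extra lags help with a strongly trending series has to be judged from the plot and from a holdout comparison; the sketch only shows where the arguments go.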
I am using the forecast package in R and I created an MA(1) model with the Arima function. I plotted the time series itself (the $x component of ma_model), the fitted values (the $fitted component) and the residuals (the $residuals component). Strangely, the time series looks identical to the fitted values although there are non-zero residuals. Here is the code that I used:
library(forecast)
ma_model<-Arima(ts(generationData$Price[1:200]), order=c(0,1,0))
plot(ma_model$fitted, main = "Fitted")
plot(ma_model$x, main = "X")
plot(ma_model$residuals, main = "Residuals")
Here is the result
Surely the fitted values can't be equal to the real time series, especially when there are residuals. Can anyone explain this to me? I'd appreciate every comment.
Update: I tried order=c(0,0,20), so I have an MA(20) or AR(20) model (I am not sure which parameter stands for MA and which for AR). Now the fitted curve and the original time series look quite similar (but not exactly equal). Is this possible and usual? I'd appreciate every further comment.
Any comments on this issue?
I am not sure about your output, but from the code it seems that you just took the difference in the model, not the MA.
I think it should be order=c(0,0,1) instead of order=c(0,1,0) for building the MA model.
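In order = c(p, d, q) the three positions are the AR order, the degree of differencing and the MA order. With c(0,1,0) the model is a random walk, so each fitted value is simply the previous observation, and the fitted curve is the series shifted by one step. That would explain why the two plots look identical. A small sketch with base R's arima() on simulated data (the series x below is a hypothetical stand-in for generationData$Price, which was not posted):

```r
set.seed(1)
# Hypothetical stand-in for generationData$Price[1:200]
x <- 50 + cumsum(rnorm(200))

# order = c(p, d, q): AR order, differencing, MA order
rw  <- arima(x, order = c(0, 1, 0))   # differencing only (random walk)
ma1 <- arima(x, order = c(0, 0, 1))   # MA(1) with a mean, no differencing

# stats::arima stores residuals, so fitted = observed - residual
fitted_rw  <- x - residuals(rw)
fitted_ma1 <- x - residuals(ma1)

# After the first point, the random-walk fit is the series lagged by one step,
# so this maximum difference is numerically zero
max(abs(as.numeric(fitted_rw)[-1] - x[-length(x)]))

plot(x, type = "l", main = "Series, random-walk fit (red), MA(1) fit (blue)")
lines(as.numeric(fitted_rw), col = "red", lty = 2)
lines(as.numeric(fitted_ma1), col = "blue", lty = 3)
```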
I've tried searching but couldn't find a specific answer to this question. So far I understand that time series forecasting is possible using SVM. I've gone through a few papers and articles that do this, but they don't show any code; they only explain the algorithm (which I didn't quite understand). Some have done it in Python.
My problem is this: I have one company's (univariate) sales data from 2010 to 2017, and I need to forecast the sales values for 2018 using SVM in R.
Would you be kind enough to present and explain the R code for this using a small example?
I really do appreciate your inputs and efforts!
Thanks!!!
Let's assume you have monthly data, for example derived from the AirPassengers data set. You don't need time-series-type data, just a data frame containing time steps and values; let's name them x and y. Next you fit an SVM model (the svm function used below comes from the e1071 package) and specify the time steps you need to forecast. Use the predict function to compute the forecast for the given time steps. That's it.
However, the support vector machine is not commonly regarded as the best method for time series forecasting, especially for long series. It can perform well a few observations ahead, but I wouldn't expect good results when forecasting, e.g., daily data for a whole year ahead (though it obviously depends on the data). Simple R code for an SVM-based forecast:
library(e1071)  # provides svm()
# prepare sample data in the form of data frame with cols of timesteps (x) and values (y)
data(AirPassengers)
monthly_data <- unclass(AirPassengers)
months <- 1:144
DF <- data.frame(months,monthly_data)
colnames(DF)<-c("x","y")
# train an svm model, consider further tuning parameters for lower MSE
svmodel <- svm(y ~ x,data=DF, type="eps-regression",kernel="radial",cost=10000, gamma=10)
#specify timesteps for forecast, eg for all series + 12 months ahead
nd <- 1:156
#compute forecast for all the 156 months
prognoza <- predict(svmodel, newdata=data.frame(x=nd))
#plot the observed series (blue) and the SVM fit plus forecast (red) on common axes
ylim <- range(DF$y, prognoza)
xlim <- range(nd)
plot(DF$x, DF$y, col="blue", ylim=ylim, xlim=xlim, type="l")
lines(nd, prognoza, col="red")
I use Hyndman's forecast package to produce a somewhat accurate tbats forecast at the weekly level, but I have significant errors on holidays. How can I include holidays in the model? Also, Arima has been shown to fit my weekly data poorly, so holidays would have to be added in a non-ARIMA way.
I have seen two solutions. One (https://robjhyndman.com/hyndsight/dailydata/) shows how to add holidays as dummy variables alongside Fourier terms. The problem is that dummy variables take the value 1 or 0, and I know that different holidays have different effects that a single 1 or 0 would not capture. Black Friday, for example, is very different from Chinese New Year.
Another solution I have seen is here: https://robjhyndman.com/hyndsight/forecast7-part-2/, where the change allowing covariates in nnetar is used as an alternative to auto.arima with seasonal dummy variables. The problem is that I don't see how to write the R code to input my holidays. An example would be useful.
The benchmark for time series modeling for use by official statistics agencies is x13-arima-seats by the US Census bureau. It deals with seasonal effects as well as with "parametric" holidays including, say, the Chinese New Year as well as Easter.
The functionality is available in R via the seasonal package which installs and uses the underlying x13-arima-seats binary.
And there is also a full-feature interactive website giving access to most-if-not-all features.
Have you read about Facebook's prophet package?
Haven't used it but from reading the documentation, it seems like a quick implementation that also accounts for holidays:
https://cran.r-project.org/web/packages/prophet/prophet.pdf
Implements a procedure for forecasting time series data based on
an additive model where non-linear trends are fit with yearly and weekly
seasonality, plus holidays [...]
https://cran.r-project.org/web/packages/prophet/vignettes/quick_start.html
The following did everything I needed it to do.
library(forecast)
k <- 23  # number of Fourier (sine/cosine) pairs used for the yearly seasonality
#forecast holidays
#bool list of future holidays
holidayf <- c(0,0,0,0,0,1,0,0,0,1,1,1,1,1,0,0,0)
h <- length(holidayf)
#given holidays
holiday <- df[,2]
y <- ts(df[,1],start = 2011,frequency = 52)
z <- fourier(y, K=k)
zf <- fourier(y, K=k, h=h)
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=h)
fc %>% autoplot()
summary(fit)
To solve the problem of different holidays having different effects, I simply added additional holiday dummy variables. For example, you can make one vector of good holidays and one vector of bad holidays, cbind them, and put them in xreg. I did not show this in the above code, but it is straightforward.
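The idea of separate dummy columns per holiday type can be sketched as follows. To keep the example self-contained it uses simulated weekly data, leaves out the Fourier terms, and swaps in base R's arima() with a fixed AR(1) order instead of auto.arima(); the holiday calendars (promo, closure) and effect sizes are hypothetical.

```r
set.seed(42)
n <- 156   # three years of weekly training data
h <- 8     # forecast horizon in weeks

# Hypothetical holiday calendars over training + forecast weeks (1 = holiday week)
week    <- seq_len(n + h)
promo   <- as.integer(week %% 52 == 47)   # a "good" holiday, lifts sales
closure <- as.integer(week %% 52 == 5)    # a "bad" holiday, lowers sales
X_all   <- cbind(promo = promo, closure = closure)

# Simulated sales: level + separate holiday effects + AR(1) noise
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 5))
y <- 200 + 50 * promo[1:n] - 30 * closure[1:n] + noise

# One coefficient is estimated per dummy column, so each holiday type
# gets its own effect size
fit <- arima(y, order = c(1, 0, 0), xreg = X_all[1:n, ])
coef(fit)

# Future dummy values must be supplied for the forecast period
fc <- predict(fit, n.ahead = h, newxreg = X_all[n + 1:h, ])
fc$pred
```

With auto.arima() the same matrix would be combined with the Fourier terms, e.g. xreg = cbind(z, X_all[1:n, ]) for fitting and cbind(zf, X_all[n + 1:h, ]) for forecasting.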
Here is the plot of the initial data (after performing a log transformation).
It is evident that there is both a linear trend and a seasonal pattern. I can address both by taking the first and twelfth (seasonal) differences: diff(diff(data), 12). After doing so, here is the plot of the resulting data.
This data does not look great. While the mean is constant, we see a funneling effect as time progresses. Here are the ACF/PACF:
Any suggestions for possible fits to try? I used the auto.arima() function, which suggested an ARIMA(2,0,2)(1,0,2)[12] model. However, once I took the residuals from the fit, it was clear there was still some sort of structure in them. Here is the plot of the residuals from the fit as well as the ACF/PACF of the residuals.
There does not appear to be a seasonal pattern regarding which lags have spikes in the ACF/PACF of residuals. However, this is still something not captured by the previous steps. What do you suggest I do? How could I go about building a better model that has better model diagnostics (which at this point is just a better looking ACF and PACF)?
Here is my simplified code thus far:
library(TSA)
library(forecast)
beer <- read.csv('beer.csv', header = TRUE)
beer <- ts(beer$Production, start = c(1956, 1), frequency = 12)
# transform data
boxcox <- BoxCox.ar(beer) # 0 in confidence interval
beer.log <- log(beer)
# first and seasonal (lag-12) differences remove the linear and seasonal trend
firstDifference <- diff(diff(beer.log), 12)
acf(firstDifference)
pacf(firstDifference)
eacf(firstDifference)
plot(armasubsets(firstDifference, nar=12, nma=12))
# fitting the model
auto.arima(firstDifference, ic = 'bic') # from the forecast package
modelFit <- arima(firstDifference, order = c(1, 0, 0),
                  seasonal = list(order = c(2, 0, 0), period = 12))
# assessing model
resid <- modelFit$residuals
acf(resid, lag.max = 15)
pacf(resid, lag.max = 15)
Here is the data, if interested (I think you can use an html to csv converter if you would like): https://docs.google.com/spreadsheets/d/1S8BbNBdQFpQAiCA4J18bf7PITb8kfThorMENW-FRvW4/pubhtml
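Beyond eyeballing the ACF and PACF, a Ljung-Box test gives a numeric check for leftover autocorrelation in the residuals. The sketch below is only an illustration under assumptions: it uses the built-in AirPassengers series as a stand-in for the beer data, and the (0,1,1)(0,1,1)[12] order is chosen purely for the example. It also puts the differencing inside the model rather than differencing by hand first.

```r
# Stand-in data: built-in monthly series (the beer data is not reproduced here)
y <- log(AirPassengers)

# Differencing is specified inside the model (d = 1, D = 1)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

res <- residuals(fit)
acf(res, lag.max = 36)
pacf(res, lag.max = 36)

# Ljung-Box test: H0 = no autocorrelation up to lag 24;
# fitdf = number of estimated ARMA coefficients (here 2)
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)
lb$p.value
```

A small p-value would suggest that structure remains in the residuals; a large one is consistent with white noise but does not prove the model is adequate.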
Jane,
There are a few things going on here.
Instead of taking logs, we used the Tsay variance-change test, which shows that the variance increased after period 118. Weighted least squares deals with that.
March becomes higher beginning at period 111. An alternative to an AR(12) term or seasonal differencing is to identify seasonal dummies. The model we found has seasonal dummies for 7 of the 12 months, a couple of level shifts, an AR(2) term and 2 outliers.
Here is the fit and forecasts.
Here are the residuals.
ACF of residuals
Note: I am a developer of the software Autobox. All models are wrong. Some are useful.
Here is Tsay's paper
http://onlinelibrary.wiley.com/doi/10.1002/for.3980070102/abstract
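The seasonal-dummy idea mentioned above can be tried in plain R. This is not a reproduction of the Autobox model: it is an ordinary least squares sketch on the built-in AirPassengers series (logged, as a stand-in for the beer data), with a linear trend and one dummy per month, and without the level shifts, outliers or weighting described in the answer.

```r
y <- log(AirPassengers)        # stand-in monthly series
month <- factor(cycle(y))      # month index 1..12 as a factor
trend <- as.numeric(time(y))   # linear time trend

# lm() expands the factor into 11 monthly dummies (January is the baseline)
fit <- lm(as.numeric(y) ~ trend + month)
round(coef(fit), 3)

# Autocorrelation left in the residuals would still need an ARMA term
acf(residuals(fit), lag.max = 36)
```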
I'm new to R, so I'm having trouble with this time series data.
For example (the real data is way larger)
data <- c(7,5,3,2,5,2,4,11,5,4,7,22,5,14,18,20,14,22,23,20,23,16,21,23,42,64,39,34,39,43,49,59,30,15,10,12,4,2,4,6,7)
ts <- ts(data,frequency = 12, start = c(2010,1))
So if I try to decompose the data to adjust it
ts.decompose <- decompose(ts)
ts.adjust <- ts - ts.decompose$seasonal
ts.hw <- HoltWinters(ts.adjust)
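library(forecast)
# forecast 8.0 and later no longer export forecast.HoltWinters() and plot.forecast(); call the generics
ts.forecast <- forecast(ts.hw, h = 10)
plot(ts.forecast)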
But for the first values I get negative values. Why is this happening?
Well, you are forecasting the seasonally adjusted time series, and of course the deseasonalized series ts.adjust can already contain negative values by itself, and in fact, it actually does.
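This can be checked directly with base R, using the data from the question (the series is named x here, an arbitrary name, to avoid masking the ts() function):

```r
data <- c(7,5,3,2,5,2,4,11,5,4,7,22,5,14,18,20,14,22,23,20,23,16,21,23,
          42,64,39,34,39,43,49,59,30,15,10,12,4,2,4,6,7)
x <- ts(data, frequency = 12, start = c(2010, 1))

# Seasonally adjusted series: observed minus the estimated seasonal component
adj <- x - decompose(x)$seasonal

min(adj)       # smallest adjusted value
sum(adj < 0)   # how many adjusted observations are negative
```

Months with a large positive seasonal component and a small observed value end up below zero after the subtraction.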
In addition, even if the original series contained only positive values, Holt-Winters can yield negative forecasts. It is not constrained.
I would suggest trying to model your original (not seasonally adjusted) time series directly using ets() in the forecast package. It usually does a good job in detecting seasonality. (And it can also yield negative forecasts or prediction intervals.)
I very much recommend this free online forecasting textbook. Given your specific question, this may also be helpful.