Problem with creating stationarity in a time series - r

I have a problem with my code. I want to forecast stock returns with an ARIMA model in R, but I cannot make my data stationary. Besides transforming the stock prices into returns, I also tried the diff function to difference my time series. I always assumed that either of these two methods would make the data stationary. However, when I run an augmented Dickey-Fuller test (adf.test in R), the p-value indicates that the data remain non-stationary. What am I doing wrong?
Thanks in advance.

You can decompose the time series into seasonal, trend and remainder components and then work with the seasonally adjusted series:
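library(forecast)  # ma(), seasadj()
library(tseries)
# Smooth the original series with a moving average of order 7
data$moving_average <- ma(data$original, order = 7)
# Drop the NAs at both ends and treat 30 observations as one seasonal period
moving_average <- ts(na.omit(data$moving_average), frequency = 30)
# STL decomposition into seasonal, trend and remainder components
decomposition <- stl(moving_average, s.window = "periodic")
# Remove the seasonal component
stationary <- seasadj(decomposition)
plot(decomposition)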
This plots the data together with its seasonal, trend and remainder components.
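It may also be worth double-checking how the p-value is read. The null hypothesis of adf.test is that the series has a unit root, so a small p-value is evidence for stationarity, not against it. Here is a minimal sketch on made-up data (the simulated price series and all variable names are assumptions, not the asker's data):

```r
library(tseries)  # provides adf.test()

set.seed(123)
# Hypothetical daily prices: a random walk in logs, so prices have a unit root
log_price <- log(100) + cumsum(rnorm(500, mean = 0.0005, sd = 0.01))
price <- exp(log_price)

# Log returns are the first difference of the log prices
returns <- diff(log(price))

# H0 of the ADF test: the series has a unit root (is non-stationary)
p_price   <- adf.test(price)$p.value    # usually large: H0 is not rejected
p_returns <- adf.test(returns)$p.value  # usually small: H0 is rejected

c(price = p_price, returns = p_returns)
```

adf.test may warn that the true p-value is smaller than the printed one; for the returns that is the expected outcome.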

Related

How do I decide between different forecasting model families to automate forecasting for 150 time series?

I have weekly time series data for multiple departments (retail domain) and based on some research, I am automating the process of finding model parameters for each time series. So far, I have implemented the following models for each time series in a for loop:
1) ARIMA (auto.arima in R)
2) stlf (cannot use R's ets function since I have weekly data)
3) TBATS
4) Regression on ARIMA errors (using fourier terms)
5) Baseline models: naive & mean
I want to understand how to choose models for each time series. I have multiple approaches to this:
1) Choose model with lowest RMSE on test data (risk: overfitting on test data)
2) Choose model with lowest RMSE based on cross-validation of time series (tsCV)
3) Choose one family of models for all the time series based on which family gives lowest average RMSE score across all the time series.
Are there any ways I can improve my approach? Any disadvantages to any of the above approaches? Any better approach?
Thanks a lot!
Forecast your data with all the forecasting methods mentioned above, then calculate the MAPE for each, check which model gives the best results, and use that model to forecast your data.
Also try different transformations of your input data, such as log, inverse, etc.
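Option 2 from the question (time series cross-validation) could be sketched roughly as follows. This is only an illustration: AirPassengers stands in for one department's series, and the candidate list is limited to simple benchmark methods from the forecast package to keep it fast. More expensive models can be added as further functions with the same (x, h) signature.

```r
library(forecast)

# Stand-in series; replace with one department's weekly ts
y <- AirPassengers

# Candidate forecast functions with the (x, h) signature that tsCV() expects
candidates <- list(
  naive  = function(x, h) naive(x, h = h),
  snaive = function(x, h) snaive(x, h = h),
  drift  = function(x, h) rwf(x, drift = TRUE, h = h),
  mean   = function(x, h) meanf(x, h = h)
)

# One-step rolling-origin errors, summarised as RMSE per candidate
cv_rmse <- sapply(candidates, function(f) {
  e <- tsCV(y, f, h = 1)
  sqrt(mean(e^2, na.rm = TRUE))
})

cv_rmse
names(which.min(cv_rmse))  # candidate with the lowest cross-validated RMSE
```

Running this inside the existing for loop gives one cv_rmse vector per series, which can then be used either to pick a model per series or to average across series for approach 3.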

R: Displaying ARIMA forecast as extension of past data after log transformation

My goal: I want to understand a time series, a strongly auto-regressive one (ACF and PACF output told me that) and make a forecast.
So what I did was I first transformed my data into a ts, then decomposed the time series, checked its stationarity (the series wasn't stationary). Then I conducted a log transformation and found an Arima model that fits the data best - I checked the accuracy with accuracy(x) - I selected the model with the accuracy output closest to 0.
Was this the correct procedure? I'm new to statistics and R and would appreciate some criticism if that wasn't correct.
When building the Arima model I used the following code:
ARIMA <- Arima(log(mydata2), order=c(2,1,2), seasonal=list(order=c(0,1,1), period=12))
The result I received was a log function and the data from the past (the data I used to build the model) wasn't displayed in the diagram. So then to transform the log into the original scale I used the following code:
ARIMA_FORECAST <- forecast(ARIMA, h=24, lambda=0)
Is that correct? I found it somewhere on the web and don't really understand it.
Now my main question: How can I plot the original data and the ARIMA_FORECAST in one diagram? I mean displaying it the way the forecasts are displayed if no log transformation is undertaken - the forecast should be displayed as the extension of the data from the past, confidence intervals should be there too.
The simplest approach is to set the Box-Cox transformation parameter $\lambda=0$ within the modelling function, rather than taking explicit logarithms (see https://otexts.org/fpp2/transformations.html). Then the transformation will be automatically reversed when the forecasts are produced. This is simpler than the approach described by @markus. For example:
library(forecast)
# estimate an ARIMA model to log data
ARIMA <- auto.arima(AirPassengers, lambda=0)
# make a forecast
ARIMA_forecast <- forecast(ARIMA)
# Plot forecasts and data
plot(ARIMA_forecast)
Or if you prefer ggplot graphics:
library(ggplot2)
autoplot(ARIMA_forecast)
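To see what the automatic back-transformation amounts to, here is a base-R sketch that does it by hand. The model order is only illustrative and not a recommendation, and the 1.96 multiplier assumes approximately normal errors on the log scale.

```r
# Fit on the log scale with base R's arima(); the order is illustrative
fit_log <- arima(log(AirPassengers), order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast on the log scale, then back-transform the forecast and ~95% limits
p <- predict(fit_log, n.ahead = 24)
point <- exp(p$pred)
lower <- exp(p$pred - 1.96 * p$se)
upper <- exp(p$pred + 1.96 * p$se)

# History and forecast in one plot, on the original scale
ts.plot(AirPassengers, point, lower, upper,
        lty = c(1, 1, 2, 2),
        col = c("black", "blue", "grey40", "grey40"))
```

The limits are exponentiated rather than the standard errors, which is why the interval is asymmetric around the forecast on the original scale.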
The package forecast provides the functions autolayer and geom_forecast that might help you to draw the desired plot. Here is an example using the AirPassengers data. I use the function auto.arima to estimate the model.
library(ggplot2)
library(forecast)
# log-transform data
dat <- log(AirPassengers)
# estimate an ARIMA model
ARIMA <- auto.arima(dat)
# make a forecast
ARIMA_forecast <- forecast(ARIMA, h = 24, lambda = 0)
Since your data is of class ts you can use the autoplot function from ggplot2 to plot your original data and add the forecast with the autolayer function from forecast.
autoplot(AirPassengers) + forecast::autolayer(ARIMA_forecast)

Arimax forecasting

I need to forecast daily sales using an ARIMA model with the weekdays as independent variables.
So I built the model as follows:
d = data,
Total = sales; Monday, Tuesday, ..., Sunday are my independent variables.
I am using library(forecast):
fit = arima(d$Total, xreg=cbind(Sunday,Monday,Tuesday,Wednesday,Thursday,Friday), order=c(1,1,1))
Please help me proceed further and predict future values.
How do I decide p, d and q, and how do I plot forecasted vs. actual values?
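One possible way to proceed, sketched on made-up data: let auto.arima choose p, d and q for the regression errors, and pass the future weekday dummies to forecast. Everything below (the simulated sales, the variable names, the 14-day horizon) is an assumption for illustration only.

```r
library(forecast)

set.seed(1)
n <- 140  # days of history (hypothetical)
h <- 14   # days to forecast

# Day-of-week factor covering the history plus the forecast horizon
dow <- factor(rep(1:7, length.out = n + h))

# Six dummy columns; the first day level is the baseline, which avoids
# perfect collinearity with an intercept
X <- model.matrix(~ dow)[, -1]

# Simulated sales: an uplift on days 6 and 7 plus AR(1) noise (made-up data)
sales <- 100 + 15 * (as.integer(dow[1:n]) >= 6) +
  as.numeric(arima.sim(list(ar = 0.5), n = n))

# auto.arima() searches over p, d and q (by AICc) for the regression errors
fit <- auto.arima(sales, xreg = X[1:n, ])

# Future regressor values are known in advance because they are calendar days
fc <- forecast(fit, xreg = X[(n + 1):(n + h), ])
plot(fc)
```

To compare forecasts with actual values, one option is to hold back the last h days when fitting and then overlay them on the plot with lines().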

Difference between simulate() and forecast() in "forecast" package

I am working on building a time series model.
However, I am having trouble understanding what the difference is between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values for up to 10 years. The data is hourly and we have a year's worth of data.
When using forecast to predict the next 1000-step-ahead estimation, I got the following plot.
Using forecast method
Then I used the simulate function to simulate the next 1000 simulated values and got the following plot.
Using simulate method
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values.
simulate(arima1, nsim=1000, future=TRUE, bootstrap=TRUE)
where arima1 is my trained arima model, bootstrap residuals are used because the model residuals are not very normal.
Per definition in the forecast package, future=TRUE means that we are simulating future values based on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give me much more realistic results, while the forecasted values from forecast() just converge to a constant after several iterations (with far less fluctuation than the results from simulate())?
A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
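The relationship between the two can be checked numerically. The sketch below uses a made-up AR(1) series rather than the data from the question, and the number of paths (2000) is an arbitrary choice.

```r
library(forecast)

set.seed(42)
# Made-up stationary AR(1) series standing in for the real data
y <- arima.sim(list(ar = 0.7), n = 300)
fit <- Arima(y, order = c(1, 0, 0))

h <- 20
fc <- forecast(fit, h = h)

# 2000 simulated future sample paths, one per column
paths <- replicate(2000, as.numeric(simulate(fit, nsim = h, future = TRUE)))

# Each path fluctuates like the data, but their average is close
# to the point forecast at every horizon
path_mean <- rowMeans(paths)
max(abs(path_mean - as.numeric(fc$mean)))
```

Plotting a few columns of paths next to fc$mean shows the same contrast as in the question: individual paths look like the data, while their mean is smooth.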

Reproduce ARIMA Forecast (Coefficients from R Arima())

I am quite new to R and to ARIMA models, and I have a question about the ARIMA model that I obtained in R.
I will use the US unemployment rate as an example, the data range is from Jan, 1948 to Feb, 2015, total of 806 observations. After looking at the AICc, I decided to use ARIMA(2,1,2) model. (BTW I am using Arima() function from "forecast" package in R)
The output is the following:
Series: log.unemp
ARIMA(2,1,2)
ar1 1.6406
ar2 -0.7499
ma1 -1.5943
ma2 0.7893
sigma^2 estimated as 0.001307: log likelihood=1530.14
AIC=-3050.27 AICc=-3050.2 BIC=-3026.82
The code is
fit.best <- Arima(log.unemp, c(2, 1, 2), include.constant=FALSE)
print(fit.best)
Then I want to measure the forecast performance of this model. That is, to calculate things like RMSE, Theil's U, etc. But I do not know how to do that. The reason is that it seems that I do not know how to derive the forecast equation from this output to calculate the fitted values.
So could anyone help me on this? How should I derive the forecast equation from this output? Also, after obtaining the equation, how can I do the forecast in Excel to calculate the fitted values from the first data point (there are some numbers that are not available when you are calculating the fitted value for t=1)?
Thanks!
You can use summary(fit.best) to view the RMSE.
Or, if you want to calculate it yourself, you can derive the residuals and fitted values like this:
fitted=log.unemp-fit.best$residuals
about the equation see this
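For the equation itself, here is a sketch, assuming R's sign convention (MA terms enter with a plus sign) and no constant, as in the fit above. Writing $B$ for the backshift operator, $y_t$ for the log unemployment rate and $e_t$ for the error, ARIMA(2,1,2) is

$$(1 - \phi_1 B - \phi_2 B^2)(1 - B)\, y_t = (1 + \theta_1 B + \theta_2 B^2)\, e_t$$

Expanding the left-hand side and solving for $y_t$ gives

$$y_t = (1 + \phi_1)\, y_{t-1} - (\phi_1 - \phi_2)\, y_{t-2} - \phi_2\, y_{t-3} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}$$

With the reported coefficients ($\phi_1 = 1.6406$, $\phi_2 = -0.7499$, $\theta_1 = -1.5943$, $\theta_2 = 0.7893$), the one-step fitted value is

$$\hat{y}_t = 2.6406\, y_{t-1} - 2.3905\, y_{t-2} + 0.7499\, y_{t-3} - 1.5943\, \hat{e}_{t-1} + 0.7893\, \hat{e}_{t-2}, \qquad \hat{e}_t = y_t - \hat{y}_t$$

In a spreadsheet, one common simplification is to start at $t = 4$ and set the residuals that are not yet available to zero. The first few fitted values will then probably not match R's exactly, because R initialises the recursion differently, but the two should agree closely for later observations.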
You can use the forecast package:
fit.best <- Arima(log.unemp, c(2, 1, 2), include.constant=FALSE)
my_forecast <- forecast(fit.best, h=10)
my_forecast #will show the next 10 periods
# or use some detailed data like
plot(my_forecast$residuals)
Fit the ARIMA model with the code below:
arimafit = arima(log.unemp, order=c(2,1,2))
Forecast with the code below:
arima_future = forecast(arimafit, h=3)
Here forecast is the function that produces forecasts for however many months ahead you want.
h=3 means it will forecast the next 3 months.
If you want to check the RMSE on test data, you can use the DMwR package:
metrics = as.data.frame(DMwR::regr.eval(<test_data_vector>, arima_future$mean))
test_data_vector is the test data vector, which you create when splitting your main dataset into train and test sets.
arima_future$mean is the vector of point forecasts produced by the forecast call above.
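As an alternative that stays within the forecast package, accuracy() can compute the error measures on a hold-out set directly. The sketch below uses the built-in Nile series as a stand-in, since the unemployment data is not available here; the split year and the model order are arbitrary choices for illustration.

```r
library(forecast)

# Stand-in data: the built-in Nile series (1871-1970), not the unemployment data
train <- window(Nile, end = 1950)
test  <- window(Nile, start = 1951)

fit <- Arima(train, order = c(1, 1, 1))
fc  <- forecast(fit, h = length(test))

# accuracy() reports training-set and test-set measures in one table
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE", "MAPE")]

# The test-set RMSE by hand, from the point forecasts stored in fc$mean
rmse_manual <- sqrt(mean((test - fc$mean)^2))
rmse_manual
```

The same table also includes a Theil's U column for the test set, which covers the other measure asked about in the question.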
